A "law of large numbers," closely related to the central limit theorem, states: If repeated random samples of size N are drawn from any population (of whatever form) having a mean $X_p$ and a standard deviation $SD_p$, then as N becomes large the sampling distribution of sample means approaches normality as a limit, with mean $X_p$ and standard deviation $SD_p/\sqrt{N}$. This theorem says that "no matter how unusual a distribution we start with, provided N is sufficiently large, we can count on a sampling distribution which is approximately normal. Since it is the sampling distribution, and not the population, which will be used in significance tests, this means that whenever N is large we can completely relax the assumption about the normality of the population and still make use of the normal curve in our tests" (5). Both the central limit theorem and the law of large numbers assume that simple random samples have been drawn, and are less appropriate with more complex sample designs.

Many times when we want to compute the standard error of a sample estimate (in this case the estimate is of the mean), the standard deviation in the population is not known. Therefore the sample standard deviation is frequently used as an estimate of the population value. Thus the estimate of the standard error would become $SD_s/\sqrt{N}$.

We are now prepared to add a margin of error to the estimate of average annual number of physician visits derived from one sample from the population. We know that in a normal distribution a certain percent of the cases lie within one standard error of the mean, a larger percent within two standard errors, etc. For example, we know that 95 percent of the cases lie within 1.96 standard errors of the mean. Therefore, if we put a "confidence interval" of 1.96 standard errors around the sample estimate of the mean, the population mean falls within this interval for 95 out of 100 samples on the average (see Figure 3).
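The two claims above (that the sampling distribution of the mean has standard deviation close to $SD_p/\sqrt{N}$, and that an interval of 1.96 estimated standard errors covers the population mean in roughly 95 of 100 samples) can be checked with a small simulation. The following is only a sketch under stated assumptions: the skewed exponential population, the sample size of 100, the 2,000 repetitions, and the function name `simulate_sampling` are illustrative choices, not part of the original text.

```python
import math
import random
import statistics


def simulate_sampling(n=100, repetitions=2000, z=1.96, seed=0):
    """Draw repeated simple random samples from a skewed population.

    Assumption for illustration: the population is exponential with
    mean 1 and standard deviation 1, which is far from normal.
    """
    rng = random.Random(seed)
    pop_mean, pop_sd = 1.0, 1.0
    sample_means = []
    covered = 0
    for _ in range(repetitions):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        mean = statistics.fmean(sample)
        # Estimated standard error: sample SD divided by sqrt(N).
        se = statistics.stdev(sample) / math.sqrt(n)
        sample_means.append(mean)
        # Does the interval mean +/- z * se include the population mean?
        if mean - z * se <= pop_mean <= mean + z * se:
            covered += 1
    return {
        "mean_of_means": statistics.fmean(sample_means),
        "sd_of_means": statistics.stdev(sample_means),
        "theoretical_se": pop_sd / math.sqrt(n),
        "coverage": covered / repetitions,
    }


result = simulate_sampling()
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

In a run of this sketch, `sd_of_means` should land near `theoretical_se` (0.1 here) and `coverage` should land near .95, although with a skewed population and a moderate N the coverage may fall slightly short of the nominal level.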
The flip side of this coin is, of course, that 5 times out of 100, on the average, this interval does not include the population mean. This 95 percent confidence level is frequently used (or p = .05, i.e., the probability of being wrong is 5 percent), but for 99 percent confidence, for example, a wider interval of 2.58 standard errors should be used.

Let us take a numerical example. Assume that a sample of 500 persons is taken from a population and that the average number of physician visits per year for this sample is 5.1. Assume that the sample standard deviation is 2.2 visits.* The estimate of the standard error is therefore $2.2/\sqrt{500} = .098$, and we can say that for 95 of 100 samples the population mean lies within $5.1 \pm 1.96(.098) = 5.1 \pm .192$, or that we are 95 percent certain that the average number of physician visits in the population is between 4.91 and 5.29. This error margin is based only on a consideration of sampling error and assumes that nonsampling errors have been eliminated.

FIGURE 3: Comparison of Confidence Intervals with the Sampling Distribution of the Mean, Showing Why 95 Percent Confidence Intervals Include $X_p$ 95 Percent of the Time. [Figure not reproduced; its labels mark distances of 1.96 S, where S = standard error, $X_1$ = mean of sample 1, $X_2$ = mean of sample 2, and $X_p$ = population mean.] (Adapted from Blalock, ref. (2), p. 159.)

In the example above we were estimating a numerical average, but frequently one will want to estimate a population proportion (p), for example the proportion or percent of persons who smoke. The process for estimating the standard error in this case is very similar to the one shown above. The standard deviation of a proportion is simply $\sqrt{pq}$, where q is 1 − p. The standard error (which is the standard deviation of the sampling distribution) is $\sqrt{pq}/\sqrt{N}$, or $\sqrt{pq/N}$, where N is the sample size.
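Before turning to the proportion example, the arithmetic of the physician-visit example above can be reproduced in a few lines. This is a minimal sketch; the function name `mean_confidence_interval` and its parameter names are hypothetical, chosen only for illustration.

```python
import math


def mean_confidence_interval(sample_mean, sample_sd, n, z=1.96):
    """Return (standard_error, lower, upper) for a sample mean.

    The standard error is estimated as the sample SD over sqrt(N),
    and the interval is sample_mean +/- z standard errors.
    """
    standard_error = sample_sd / math.sqrt(n)
    margin = z * standard_error
    return standard_error, sample_mean - margin, sample_mean + margin


# Values from the text: N = 500, mean = 5.1 visits, SD = 2.2 visits.
se, lower, upper = mean_confidence_interval(5.1, 2.2, 500)
print(f"SE = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
# prints: SE = 0.098, 95% CI = (4.91, 5.29)
```

Passing a larger `z` widens the interval, which is how a higher confidence level would be obtained.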
Let's assume that we again use the sample standard deviation as an estimate of the population standard deviation, and that the proportion of persons who smoke in the sample is $p_s = .41$. The estimate of the standard error, in the case of a sample of 500, is therefore $\sqrt{.41(.59)/500} = .022$. We can then say, with a five percent chance of being wrong, that the population proportion is $.41 \pm 1.96(.022)$, or between .367 and .453. This calculation of the confidence interval is based on the normal approximation to the binomial distribution, and assumes that both Np and N(1 − p), or Nq, are greater than 10. The confidence interval of a sample proportion is frequently used because of its relative simplicity.

*The standard deviation is a measure of dispersion of the cases around the mean and is computed as the square root of the sum of the squared deviations of each case from the mean, divided by the number of cases minus one. In this example this would be $SD_s = \sqrt{\sum (X_i - X_s)^2 / (500 - 1)} = 2.2$, where $X_i$ is the value for case i and $X_s$ is the sample mean.
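The smoking-proportion example above can be sketched the same way, including the check that Np and Nq both exceed 10 before the normal approximation is used. The function name `proportion_confidence_interval` and the `min_count` parameter are hypothetical names introduced for this illustration.

```python
import math


def proportion_confidence_interval(p, n, z=1.96, min_count=10):
    """Return (standard_error, lower, upper) for a sample proportion.

    Uses the normal approximation to the binomial, so it refuses to
    run unless both N*p and N*q are greater than min_count.
    """
    q = 1.0 - p
    if n * p <= min_count or n * q <= min_count:
        raise ValueError("normal approximation needs Np and Nq above min_count")
    standard_error = math.sqrt(p * q / n)
    margin = z * standard_error
    return standard_error, p - margin, p + margin


# Values from the text: N = 500, sample proportion of smokers = .41.
se, lower, upper = proportion_confidence_interval(0.41, 500)
print(f"SE = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
# prints: SE = 0.022, 95% CI = (0.367, 0.453)
```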