Friday, September 11, 2009
External Validity, Sampling, and the Normal Distribution (Class Exercise)

External Validity
External validity is the approximate truth of conclusions that involve generalizations. It is the degree to which the conclusions of your study would hold for other persons, in other places, and at other times. There are two approaches: the sampling model and the proximal similarity model.

The Sampling Model for External Validity
Age distribution of NHIS 2005 respondents compared with U.S. Census 2000 data and 2008 estimates. Where the Census age bands differ from the NHIS bands, the Census band is given in parentheses.

Age group   Frequency   Percent   Valid percent   Census 2000      Census 2008 estimate
18-25            3654       3.7            11.6   13.0 (18-24)     10.8 (18-24)
26-39            8156       8.3            26.0   29.9 (25-39)     20.4 (25-39)
40-55            9530       9.7            30.3   28.7             21.8
Over 55         10087      10.2            32.1   28.3 (>=55)      23.4 (>=55)
Total           31427      31.9           100.0
Missing         67222      68.1
Total           98649     100.0

Sources: NHIS 2005; U.S. Census 2000, 2008 (http://www.census.gov/population/www//cen2000/briefs/phc-t9/index.html)

[Figure: The Proximal Similarity Model for External Validity]

[Figure: The Different Groups in the Sampling Model]

Sampling Distribution
The sampling distribution is a theoretical concept: if you drew an infinite number of samples of the same size from a population and computed a statistic (such as the mean) for each one, the distribution of those statistics would resemble a bell-shaped, or normal, curve.

The Sampling Distribution
The sampling distribution is theoretical and rests on the Central Limit Theorem. Across an infinite number of samples of the same size, the sample means converge around the same central value, and fewer and fewer samples deviate far from it. The "average of the averages" is close to the population parameter. The spread of the sampling distribution is described by the standard error and the 68.27, 95.4, and 99.7 percent rule.

Normal Curve Distribution
The normal curve is symmetrical and bell-shaped. In a perfect normal curve, the mode, median, and mean are equal and sit at the center of the distribution. A fixed proportion of the observations lies between the mean and any given number of standard deviations from it.

Normal Curve Distribution: 68.27% of Responses
34.13% of your sample lies between the mean and 1 standard deviation (SD) to the right, and another 34.13% lies between the mean and 1 SD to the left. Taken together, 68.27% of your findings fall within the range from 1 SD below the mean to 1 SD above it.
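To make the sampling-distribution and normal-curve ideas concrete, here is a minimal Python sketch. It assumes a hypothetical, right-skewed population of ages generated at random (not the NHIS or Census figures above), and `normal_coverage` and `simulate_sample_means` are illustrative names, not functions from any library. The first function computes the share of a normal curve within k SDs of the mean; the second repeatedly samples the population to approximate the sampling distribution of the mean.

```python
import math
import random
import statistics


def normal_coverage(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    # For a standard normal Z, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} SD: {normal_coverage(k):.2%}")

    # Hypothetical, right-skewed population of 10,000 ages (illustrative only, not survey data).
    rng = random.Random(42)
    population = [18 + rng.expovariate(1 / 25) for _ in range(10_000)]
    pop_mean = statistics.fmean(population)
    pop_sd = statistics.pstdev(population)

    sample_size = 50
    means = simulate_sample_means(population, sample_size, n_samples=2_000)
    se = pop_sd / math.sqrt(sample_size)  # theoretical standard error of the mean

    share_within_1se = sum(abs(m - pop_mean) <= se for m in means) / len(means)
    print(f"population mean:      {pop_mean:.2f}")
    print(f"average of the means: {statistics.fmean(means):.2f}")
    print(f"means within 1 SE:    {share_within_1se:.1%}")
```

Running it prints the normal-curve shares 68.27%, 95.45%, and 99.73%. Even though the hypothetical population is skewed, the "average of the averages" should land close to the population mean, and roughly 68% of the simulated sample means should fall within 1 standard error of it. Sampling with replacement is a simplifying choice that avoids a finite-population correction.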
95.4% of Responses
Moving out to 2 SD from the mean adds about 13.59% of the distribution on each side (27.18% in total). So about 95.45% (68.27% + 27.18%) of your sample findings or scores lie within 2 SD of the mean in both directions.

99.7% of Responses
Moving out to 3 SD from the mean adds about 2.14% on each side (4.28% in total), so about 99.73% (68.27% + 27.18% + 4.28%) of your findings or scores lie within 3 SD of the mean. Applied to the sampling distribution, this means about 99.7% of randomly drawn samples would yield a sample statistic (e.g., mean age, smoking prevalence) within 3 standard errors of the population value.

Standard Deviation
The standard deviation is an estimate of the dispersion of responses or scores around the mean of a sample. The SD is the square root of the variance; unlike the variance, it is expressed in the same units as the original data, which makes it easier to interpret.

Standard Error or Sampling Error
The difference between your sample estimate and the true value (the population parameter, which is unknown) is the sampling error. The standard error is the standard deviation of the sampling distribution: it describes how large that error typically is. It is estimated from the SD of your survey data divided by the square root of the sample size (SE = SD / √n).

Sampling Error
The standard error gives you an idea of the precision of your statistical estimate. Because it is estimated from the SD, the greater the SD, the greater the standard error; and the larger the sample size, the smaller the standard error.

Sampling error:
- generally decreases as the sample size increases, but not proportionally (the standard error shrinks with the square root of n, so quadrupling the sample size only halves it);
- depends on the variability of the characteristic of interest in the population;
- can be accounted for and reduced by an appropriate sampling plan;
- can be measured and controlled in probability sample surveys.
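The relationship between variance, SD, and standard error can be sketched in a few lines of Python. This is a minimal illustration assuming the usual estimate SE = s / √n for a sample mean under simple random sampling; the ages are hypothetical values, and `standard_error_of_mean` and `se_for_size` are illustrative helper names, not library functions.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Estimate the standard error of the mean as s / sqrt(n), where s is the sample SD."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to estimate the SD")
    return statistics.stdev(values) / math.sqrt(n)


def se_for_size(sd: float, n: int) -> float:
    """Standard error of a mean for a given SD and sample size n."""
    return sd / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical ages from a small sample (illustrative values only).
    ages = [23, 35, 41, 29, 52, 60, 47, 38, 31, 44]
    variance = statistics.variance(ages)  # sample variance (n - 1 denominator)
    sd = statistics.stdev(ages)           # equals the square root of the variance
    print(f"variance = {variance:.2f}, SD = {sd:.2f}, sqrt(variance) = {math.sqrt(variance):.2f}")

    mean = statistics.fmean(ages)
    se = standard_error_of_mean(ages)
    print(f"mean = {mean:.1f}, estimated SE = {se:.2f}")
    # Rough +/- 2 SE range (normal approximation; small samples usually use a t multiplier).
    print(f"rough 95% range for the mean: {mean - 2 * se:.1f} to {mean + 2 * se:.1f}")

    # Quadrupling n only halves the SE: it shrinks with 1 / sqrt(n), not 1 / n.
    for n in (100, 400, 1600):
        print(f"n = {n:5d}: SE = {se_for_size(sd, n):.3f}")
```

The last loop illustrates the "but not proportionally" point: each fourfold increase in sample size cuts the standard error only in half, while a population with a larger SD produces a proportionally larger standard error at every sample size.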