Solution Set #5
Statistics 480, Winter 2006

10.4  n = 120, t = 100, s = 27

$$\hat{N} = \frac{nt}{s} = \frac{120(100)}{27} = 444.44 \approx 445$$

$$\hat{V}(\hat{N}) = \frac{t^2\, n\,(n - s)}{s^3}$$

$$B = 2\sqrt{\hat{V}(\hat{N})} = 2\sqrt{\frac{(100)^2 (120)(120 - 27)}{(27)^3}} = 150.6$$

11.2  k = 10

The ten observed values $y_i$ are 4.8, 4.6, 4.8, 4.0, 7.6, 5.0, 5.2, 3.6, 6.2, and 6.8, so $\sum y_i = 52.6$.

The estimated mean, given by Equation (11.2), is

$$\bar{y} = \frac{1}{k}\sum y_i = \frac{1}{10}(52.6) = 5.26$$

The estimated variance of $\bar{y}$, given by Equation (11.3) and ignoring the fpc, is

$$\hat{V}(\bar{y}) = \frac{s_k^2}{k} = \frac{(1.247)^2}{10}$$

with $B = 2\sqrt{\hat{V}(\bar{y})} = 0.789$.

11.6  N = 493, n = 30, n_1 = 24, $\sum y_{1j} = 235.3$

The estimator of the population mean is $\bar{y}_1$, given by Equation (11.5), which yields the estimate

$$\bar{y}_1 = \frac{1}{n_1}\sum y_{1j} = \frac{1}{24}(235.3) = 9.804$$

The quantity $(N_1 - n_1)/N_1$ must be estimated by $(N - n)/N$, since $N_1$ is unknown. The estimated variance of $\bar{y}_1$, given by Equation (11.6), then becomes

$$\hat{V}(\bar{y}_1) = \left(\frac{N - n}{N}\right)\frac{s_1^2}{n_1} = \left(\frac{493 - 30}{493}\right)\frac{36.06}{24}$$

with $B = 2\sqrt{\hat{V}(\bar{y}_1)} = 2.376$.

11.8  When $N_1$ (= 420) is known, the estimated total amount of past-due accounts for the store is, by Equation (11.7),

$$\hat{\tau}_1 = \frac{N_1}{n_1}\sum y_{1j} = \frac{420}{24}(235.3) = 4117.75$$

The estimated variance of $\hat{\tau}_1$ is, by Equation (11.8),

$$\hat{V}(\hat{\tau}_1) = N_1^2\left(\frac{N_1 - n_1}{N_1}\right)\frac{s_1^2}{n_1} = 420^2\left(\frac{420 - 24}{420}\right)\frac{36.06}{24}$$

with $B = 2\sqrt{\hat{V}(\hat{\tau}_1)} = 999.81$.

11.10  Dropping ten points (like grains of rice) at random on the page of rectangles should provide a sample in which each rectangle's probability of selection is proportional to its area. This could also be done more accurately by using pairs of random numbers to locate rectangles on the grid. The results of one such study are presented in the table below. The weights are now the reciprocals of the observed areas.

Rectangle   y = area   w = 1/y     wy    wy - rw
   12           8      0.125000    1     0.00478
   57          18      0.055556    1     0.55768
   29          10      0.100000    1     0.20382
   91           3      0.333333    1    -1.65393
   53           6      0.166667    1    -0.32696
   33          12      0.083333    1     0.33652
   65           5      0.200000    1    -0.59236
   81          15      0.066667    1     0.46921
   60          16      0.062500    1     0.50239
   47          16      0.062500    1     0.50239

Variable     n    Mean     Median   StDev
wy          10   1.0000   1.0000   0.000
w           10   0.1256   0.0917   0.088
wy - rw     10   0.0000   0.2700   0.698

The ratio estimate of the mean area is given by

$$\hat{\mu} = r = \frac{\sum w_i y_i}{\sum w_i} = \frac{10}{1.256} = 7.962$$

Its margin of error is

$$2\sqrt{\hat{V}(r)} = 2\sqrt{\frac{N - n}{N}}\;\frac{1}{\bar{w}}\;\frac{s_r}{\sqrt{n}} = 2\sqrt{\frac{90}{100}}\;\frac{0.698}{0.1256\sqrt{10}} = 2(1.667) = 3.334$$

The unweighted mean of the 10 observed areas is 10.90. The weighting does a good job of scaling this down toward the true value (7.40).

11.11  With the underlying assumption that nonresponse rates may differ from stratum to stratum, but that the actual nonrespondents are somewhat randomly arranged within each stratum, we can proceed to a weighting class adjustment. Assuming the original stratified random sample sizes are not known, the poststratification population sizes must be estimated from the data that were actually observed. These are shown below.

$$\hat{N}_1 = 70\left(\frac{6}{32}\right) \approx 13 \qquad \hat{N}_2 = 70\left(\frac{10}{32}\right) \approx 22 \qquad \hat{N}_3 = 70\left(\frac{5}{32}\right) \approx 11$$

$$\hat{N}_4 = 70\left(\frac{7}{32}\right) \approx 15 \qquad \hat{N}_5 = 70\left(\frac{4}{32}\right) \approx 9$$

Using these estimated population sizes with the observed data, and following the standard calculations for stratified random sampling from Chapter 5, yields an estimate of the total student enrollment in large lecture sections of 70(670.405) = 46,928. The margin of error is 70(111.727) = 7821. These values are quite close to those from the original stratified random sampling analysis.

11.12  The original table of cell counts (row totals 30 and 70) must be adjusted iteratively so that the row totals are close to 50 each and the column totals are close to 40 and 60. The first step is to multiply row 1 by 50/30 and row 2 by 50/70.
This results in:

            Col 1   Col 2   Total
  Row 1     20      30      50
  Row 2     28.6    21.4    50
  Total     48.6    51.4    100

Now the column totals are off, so the second step is to multiply column 1 by 40/48.6 and column 2 by 60/51.4, producing:

            Col 1   Col 2   Total
  Row 1     16.5    35      51.5
  Row 2     23.5    25      48.5
  Total     40      60      100

This throws the row totals off a bit, so the third step is to multiply row 1 by 50/51.5 and row 2 by 50/48.5, producing:

            Col 1   Col 2   Total
  Row 1     16      34      50
  Row 2     24.2    25.8    50
  Total     40.2    59.8    100

We could do another iteration, but this is close enough. The population size of N = 600 units is now allocated among the poststrata in proportion to the adjusted cell counts; that is, the Region A owner-occupied cell (upper left) is assigned an estimated population size of 600(0.16) = 96. The final estimated population sizes, $\hat{N}_i$, are:

            Col 1   Col 2
  Row 1     96      204
  Row 2     145     155

Using these population sizes and the data from the survey, a stratified sampling estimate and its margin of error can be calculated using the methods of Chapter 5.

11.15  Bootstrap

(a) Mean. The distribution of 400 sample means from bootstrap samples is shown in the histogram below. The bootstrap interval estimate of the population mean covers the middle 95% of these values and ranges from $127,156 to $196,157. The traditional confidence interval from simple random sampling (Chapter 4) is ($124,104, $192,488), so the two methods give similar results.

[Figure: Bootstrap Means Histogram (x-axis: Mean)]

(b) Median. The distribution of 400 sample medians from bootstrap samples is shown in the histogram below. The bootstrap interval estimate of the population median covers the middle 95% of these values and ranges from $125,551 to $164,870. This interval is shifted toward lower values than the interval for the mean, primarily because of the skewness in the distribution of typical housing values.

[Figure: Bootstrap Medians Histogram (x-axis: Median)]

(c) Ratio. The distribution of 400 sample ratios of mean 2002 values to mean 1994 values from bootstrap samples is shown in the histogram below.
The bootstrap interval estimate of the population ratio covers the middle 95% of these values and ranges from 1.43 to 1.46. The traditional ratio estimate from Chapter 6 gives an interval of (1.37, 1.51).

[Figure: Bootstrap Ratios Histogram (x-axis: Ratio)]
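
The percentile bootstrap intervals used in Problem 11.15 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the housing data are not reproduced in this solution set, so `values_1994` and `values_2002` are made-up placeholder numbers, and `bootstrap_percentile_interval` and `ratio_of_means` are hypothetical helper names, not part of any course software. The sketch takes 400 resamples and keeps the middle 95% of the resampled statistics, as described above.

```python
import random
import statistics


def bootstrap_percentile_interval(data, stat, n_boot=400, level=0.95, seed=0):
    """Return the bounds of the middle `level` share of bootstrap statistics."""
    rng = random.Random(seed)
    n = len(data)
    # Draw n observations with replacement and recompute the statistic,
    # repeating n_boot times; sort so percentiles can be read off by index.
    replicates = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    # Number of replicates trimmed from each tail (10 of 400 for a 95% level).
    cut = int(round((1.0 - level) / 2.0 * n_boot))
    return replicates[cut], replicates[n_boot - 1 - cut]


def ratio_of_means(pairs):
    """Ratio of mean 2002 value to mean 1994 value for (v1994, v2002) pairs."""
    return sum(y for _, y in pairs) / sum(x for x, _ in pairs)


# Hypothetical right-skewed housing values in dollars (NOT the course data).
values_1994 = [70000, 76000, 81000, 88000, 93000,
               99000, 108000, 121000, 140000, 275000]
values_2002 = [101000, 109000, 118000, 126000, 135000,
               142000, 155000, 176000, 204000, 398000]

mean_ci = bootstrap_percentile_interval(values_2002, statistics.mean)
median_ci = bootstrap_percentile_interval(values_2002, statistics.median)
# Resample the (1994, 2002) pairs jointly so each unit keeps both values.
ratio_ci = bootstrap_percentile_interval(
    list(zip(values_1994, values_2002)), ratio_of_means
)

print("95% bootstrap interval, mean:  ", mean_ci)
print("95% bootstrap interval, median:", median_ci)
print("95% bootstrap interval, ratio: ", ratio_ci)
```

With skewed data like the placeholder values, the interval for the median would typically sit below the interval for the mean, which is the pattern noted in part (b). The fixed `seed` is a design choice that makes the sketch reproducible; a different seed gives slightly different endpoints.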