Frequently Asked Questions Re Lecture 11
Statistics 480, Winter 2006
February 9, 2006

For ratio estimation, I understand that I need ȳ and x̄. But what else?

If your goal is to estimate µ_y, then you'd need µ_x. If your goal is to estimate the total of the y's, τ_y, then you'd need to know τ_x.

When would τ_y be more important to know than µ_y?

Estimating the total weight of a shipment of oranges would be one example. Estimating the total value of inventoried items could be another. Still, the question's unstated premise is correct: we're often more interested in population averages than in population totals. So why all the fuss (in lectures, at least) about population totals? The answer is that when it comes to calculating variances of estimates, estimated totals are easier to work with than estimated means. If you have separate estimates of a mean for two strata, then the estimator of the combined mean will be a weighted average of those estimates, and the variance of this estimator involves squaring the weights attached to the variances of the separate estimates. On the other hand, the total τ_y for stratum A plus stratum B is estimated by the sum of the estimates of τ_yA and τ_yB, and, given that the strata are sampled independently, the variance of τ̂_y would just be Var(τ̂_yA) + Var(τ̂_yB).

What if you plot x versus y and see a negative trend? Could you still use regression estimation?

Sure, provided the trend is roughly linear and the errors are homoskedastic. You wouldn't want to use ratio or difference estimation in that case, of course.

I'm still unsure about the difference estimate. Is it pretty much just an SRS estimate on the differences, δ_i ≡ y_i − x_i?

Yes, precisely. Nothing more to it than that.

How do we determine which is y and which is x?

In difference, ratio, and regression estimation, x is the variable for which we already know µ or τ, while y is the variable whose mean or total we want to know.

Why would we look for trends in a plot to see whether there's bias in a regression estimate?

We're not using regression in an ordinary way here.
Note, for instance, that we don't start by assuming that Y = b0 + b1 x + ε, or anything like that. All we're assuming is that the sample is selected at random. Since we're not making those model assumptions, we can't deduce from them that the regression estimate is unbiased. It isn't, of course. For more on this, you might look at the discussion on pp. 210-11 of the text, §6.8.
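
To make the roles of x, y, and the known µ_x in the answers above concrete, here is a minimal Python sketch of the ratio, difference, and regression estimates of µ_y from a simple random sample. The function names, the sample values, and the assumed µ_x are all made up for illustration and do not come from the lecture or the text. The formulas are the usual textbook forms, which may differ in notation from the ones used in class.

```python
from statistics import mean


def ratio_estimate(x, y, mu_x):
    """Ratio estimate of mu_y: (ybar / xbar) * mu_x."""
    return mean(y) / mean(x) * mu_x


def difference_estimate(x, y, mu_x):
    """Difference estimate of mu_y: mu_x plus the sample mean of d_i = y_i - x_i."""
    diffs = [yi - xi for xi, yi in zip(x, y)]
    return mu_x + mean(diffs)


def regression_estimate(x, y, mu_x):
    """Regression estimate of mu_y: ybar + b * (mu_x - xbar), b = least-squares slope."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar + b * (mu_x - xbar)


# Hypothetical sample and known population mean of x (illustration only).
x = [2.0, 4.0, 6.0, 8.0]
y = [3.0, 5.0, 7.0, 9.0]
mu_x = 6.0

print("ratio:     ", ratio_estimate(x, y, mu_x))
print("difference:", difference_estimate(x, y, mu_x))
print("regression:", regression_estimate(x, y, mu_x))
```

In this toy sample y is exactly x + 1, so the difference and regression estimates agree, while the ratio estimate comes out somewhat higher because the relationship does not pass through the origin. To estimate τ_y instead of µ_y, the same functions could be called with τ_x in place of µ_x.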