ROBE Math 1530 Fall 2010
Mega-chapter (13 & 16 together) Introduction to Inference

For starters, you want to read pp. 231-233. Slowly and carefully. Those pages present a very basic introduction to the material in the next several chapters. These notes are not comprehensive coverage of the material; they are designed to be lectured with. Chapter 13 is particularly well written, and you may choose to read it. In a very broad sense, chapters 13 and 16 cover the same material, just with different levels of detail. These notes cover that material in 3 parts: Confidence Intervals, Hypothesis Testing, and (chapter 14, with its own notes) Inference in Practice.

Things we already know and can use:
Chapter 3: A population's mean is 'μ' and its standard deviation is 'σ' (these must be given).
Chapter 7: We plan to take an SRS of size n
Chapter 2: and compute the sample mean x̄.
Chapter 10: By the Central Limit Theorem, x̄ lives in a Normal distribution with μ(x̄) = μ and σ(x̄) = σ/√n.

Leading Questions
♦ The middle 95% of all z-scores are between _____ and _____.
♦ The middle 95% of any Normal distribution have z-scores between _____ and _____.
♦ For example: IQ scores follow a Normal distribution with mean 100 and standard deviation 15. The middle 95% of IQ scores are between _____ and _____.
♦ For example, messier case: Suppose we take SRS's of size n = 25 and find the mean x̄ of those 25 IQ scores. According to the Central Limit Theorem, all of the different possible x̄'s that we might get follow a Normal distribution, but with mean μ(x̄) = _____ and standard deviation σ(x̄) = _____. Now, the middle 95% of x̄'s Normal distribution still has z-scores between -1.96 and 1.96. So, the middle 95% of x̄'s fall between _____ and _____.

New Example
Consider SAT scores instead. Suppose we know that for SAT's, σ = 209, but that the value of μ for SAT's is unknown.
Regardless of not knowing the μ for SAT's, we can still take an SRS of 50 SAT's and calculate the sample mean x̄.

Hypothetically, if we took every possible sample of n = 50 SAT scores and calculated x̄ for every single one, those x̄'s would have an approximately Normal distribution with
mean μ(x̄) = the same as the unknown μ for all SAT scores, and
standard deviation σ(x̄) = σ/√n = 209/√50 = 29.56.

Observe:
● WHATEVER μ is, 95% of the possible x̄'s we might get fall between μ ± 2(209/√50), or μ ± 59.11.
● For the same 95% of samples, μ will fall between x̄ ± 59.11.

Say we take an SRS of 50 SAT scores and find the sample mean x̄ is 1104. AGAIN: this is one sample, out of millions of samples that could have been found. It might be an unlucky one. Or it might not. We DON'T KNOW. But we use the sample we have. x̄ ± 59.11 becomes 1104 ± 59.11, or 1044.89 to 1163.11.

Then the "95% confidence interval" for μ of all SAT's is 1044.89 to 1163.11. We are now 95% confident that the mean for all SAT's is between 1044.89 and 1163.11.

We do NOT, repeat NOT, know for certain whether the real value of μ for SAT's is between the two numbers. We will not ever know for certain. What we have is a method that we know, in the long run (probability! of x̄), will be right 95% of the time. There are lots of samples of n = 50 SAT scores out there, and each sample has its own x̄. 95% of the intervals will contain the real value of μ for SAT's. So, we use the one interval, from the one sample we had, and say we have "95% confidence" in it.

In this mix, probability applies to all possible x̄'s before we take a sample. Confidence (in the same amount) is attached to the interval we build around a particular x̄ once we have it.
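The arithmetic above can be sketched in a few lines of Python (a sketch of the calculation, not a Minitab substitute; the sample mean 1104 and σ = 209 come from the running example):

```python
from math import sqrt

sigma = 209      # known population standard deviation for SAT scores
n = 50           # sample size
xbar = 1104      # sample mean from our one SRS

# 95% margin of error: 2 standard deviations of xbar, per the 68-95-99.7 rule
margin = 2 * sigma / sqrt(n)
low, high = xbar - margin, xbar + margin

print(round(margin, 2))                # 59.11
print(round(low, 2), round(high, 2))   # 1044.89 1163.11
```

The same two lines of arithmetic work for any SRS of 50 SAT's; only `xbar` changes from sample to sample.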
margin of error and confidence level

We carefully built a 95% confidence interval for the unknown mean of all SAT scores: 1104 ± 59.11, or 1044.89 to 1163.11. Because 95% of all samples have x̄'s between μ ± 59.11, any SRS of 50 SAT's would give a 95% confidence interval x̄ ± 59.11.

The ± 59.11 part is called the margin of error. This is how close we think any SRS's x̄ is likely to be to the real value of μ (based on wanting 95% confidence in the answer).

1104 ± 59.11 is a 95% confidence interval for the unknown mean μ of all SAT scores
↑ This part is the interval, but it has a confidence level attached.

"I think the real μ for all SAT scores is between 1044.89 and 1163.11," and I am 95% confident of that.

Note the boxes on p. 234, especially the one about 'Interpreting a Confidence Interval.'

95% confidence is all very well and good, but what else is out there?

* A 68% confidence interval for the mean μ of all SAT scores would look like x̄ ± 1(σ/√n), or x̄ ± 1(209/√50), or x̄ ± 29.56. Using the same old sample that we had, the 68% confidence interval would be 1104 ± 29.56, or 1074.44 to 1133.56. "We can be 68% confident that the real μ for all SAT's is between 1074.44 and 1133.56." The 68% confidence interval is narrower (more precise), but we pay for that by having less confidence in the result.

* A 99.7% confidence interval for the mean μ of all SAT scores would look like x̄ ± 3(σ/√n), or x̄ ± 3(209/√50), or x̄ ± 88.67 (assuming n = 50). Using the same old sample that we had, the 99.7% confidence interval would be 1104 ± 88.67, or 1015.33 to 1192.67. "We can be 99.7% confident that the real μ for all SAT's is between 1015.33 and 1192.67." The 99.7% confidence interval has more confidence, but we pay for that by having a much broader (less precise) interval.
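The three intervals above differ only in the number of standard deviations used. A short Python loop (reusing the running example's x̄ = 1104) makes the width-versus-confidence tradeoff visible:

```python
from math import sqrt

sigma, n, xbar = 209, 50, 1104

# number of standard deviations for each confidence level (68-95-99.7 rule)
for level, num_sd in [("68%", 1), ("95%", 2), ("99.7%", 3)]:
    margin = num_sd * sigma / sqrt(n)
    print(level, round(xbar - margin, 2), "to", round(xbar + margin, 2))
# 68% 1074.44 to 1133.56
# 95% 1044.89 to 1163.11
# 99.7% 1015.33 to 1192.67
```

More confidence always means a wider interval; the center (x̄) never moves.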
* Look carefully at what changed for each different confidence level:
68%'s margin of error was 1(209/√50), or 29.56
95%'s margin of error was 2(209/√50), or 59.11
99.7%'s margin of error was 3(209/√50), or 88.67

The 1, 2, and 3 (numbers of standard deviations) are also known as critical values for the Standard Normal distribution. Critical values, usually labeled z*, are just z-scores that pick out a given percent between -z* and +z* under the standard Normal curve. Since all Normal curves scale into the Standard Normal, you will always find the exact same percent between μ - z*(σ/√n) and μ + z*(σ/√n) under any Normal curve with standard deviation σ/√n.

We are not limited to 68, 95, or 99.7% confidence. There are lists of z* for different confidence levels, and software can calculate them. We can build a confidence interval with any confidence level we want, and we're going to let software do just that. (See also the top C row and the near-bottom z* row in table C.)

♦ Just for giggles, because it makes this Statistical Inference business a bit easier to learn, chapter 13, 'Introduction to Inference,' always assumes the "simple conditions" in a box on p. 232:
1) Every sample we need to use is an SRS, with no bias, nonresponse, etc.
2) The variable (ACT scores, SAT scores, cholesterol levels) has a Normal distribution.
3) We don't know μ (that's what building a confidence interval is for), but we do have a σ to work with.

"Known σ" is unrealistic. Not all samples are SRS's. While many populations are approximately Normal, this is only approximate, and many variables just aren't even close to Normal. But we still want to do inference. So, we adjust. p.
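Software really can calculate z* for any confidence level. A sketch using Python's standard library (the `z_star` helper is our own name, not a library function): the middle C of the standard Normal leaves (1 - C)/2 in each tail, so z* is the point with area 1 - (1 - C)/2 to its left.

```python
from statistics import NormalDist

def z_star(confidence):
    """Critical value z*: the middle `confidence` fraction of N(0,1)
    lies between -z* and +z*."""
    tail = (1 - confidence) / 2        # area left over in each tail
    return NormalDist().inv_cdf(1 - tail)

for C in (0.70, 0.80, 0.90, 0.95):
    print(f"{C:.0%}: z* = {z_star(C):.3f}")
# 70%: z* = 1.036
# 80%: z* = 1.282
# 90%: z* = 1.645
# 95%: z* = 1.960
```

These match the z* row at the bottom of table C.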
286 (inside chapter 16): standard error

σ is a population standard deviation, i.e., a parameter (if available, σ must be given, NOT calculated).

σ(x̄) is 'the standard deviation of the mean,' that is, the standard deviation of x̄'s sampling distribution (a parameter, so to speak, in the super-population of all possible samples of size n). As long as the sample is not too big a part of the population, σ(x̄) = σ/√n.

s is the sample's (internal) standard deviation (see chapter 2), and is a statistic. Specifically, s is the statistic that estimates σ when σ is unknown. There is a formula, but we use our calculators to find s. Re-learn to use yours!!

Examples
12 12 12 12 12 has s = 0
12 12 12 12 13 has s = 0.4472...
12 12 12 12 15 has s = 1.3416...

s/√n is the 'standard error of the mean,' aka SEmean or SE(x̄).

example
Suppose we have an SRS of quiz grades: 10 8 9 8 8 5 9 7 9 9 6
n = _____
x̄ = _____
s = _____
s/√n = _____ is the standard error of the mean

Given σ, z = (x̄ - μ)/(σ/√n), and every z-score ever lives in the same N(0,1) distribution. Without σ, (x̄ - μ)/(s/√n) is not a z-score anymore. BUT we STILL need to 'measure' x̄. When we substitute s in place of σ, we get something else (Box p. 287): the t-statistic

t = (x̄ - μ)/(s/√n)

t does not use σ/√n, so t is not a z-score, and t does not have a NORMAL DISTRIBUTION. So it does not belong in table A. So, what is t's 'sampling distribution?' t measures the difference between x̄ and μ (in terms of 'standard error,' not standard deviation). But t depends on the sample data for both x̄ and s: two statistics instead of one. All z-scores live in the same N(0,1) [tall bell curve, very flat tails] distribution. A "t-distribution" is bell-ish shaped: not as tall in the middle, with much fatter tails, and it has n - 1 degrees of freedom. Each value of the degrees of freedom gives its own distribution, the t(n-1) distribution.
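Your calculator fills in those blanks; so does Python's standard library, whose `statistics.stdev` is exactly the sample standard deviation s from chapter 2. A quick check on the 11 quiz grades:

```python
from math import sqrt
from statistics import mean, stdev   # stdev is the sample standard deviation s

grades = [10, 8, 9, 8, 8, 5, 9, 7, 9, 9, 6]

n = len(grades)
xbar = mean(grades)
s = stdev(grades)
se = s / sqrt(n)                     # standard error of the mean, s/sqrt(n)

print(f"n = {n}, xbar = {xbar:.1f}, s = {s:.4f}, SE = {se:.4f}")
# n = 11, xbar = 8.0, s = 1.4832, SE = 0.4472
```

The same call with `[12, 12, 12, 12, 13]` reproduces the s = 0.4472... example above.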
The more degrees of freedom a t has, the closer it is to a z-score.

example
A t statistic for the quiz data 10 8 9 8 8 5 9 7 9 9 6, n = 11, would have 11 - 1 = 10 degrees of freedom.

see table C "t distribution critical values"

                        Confidence level
               70%      80%      90%      95%
df = 10      1.093    1.372    1.812    2.228
...
z*           1.036    1.282    1.645    1.960

Box p. 289: Draw an SRS from a population with unknown mean μ. A level C t confidence interval for μ is x̄ ± t*(s/√n), where t* is from the t(n-1) distribution with middle area C%.

example
Find a 90% confidence interval for μ = the mean of all quiz grades, using the SRS
10 8 9 8 8 5 9 7 9 9 6 6 3 6 10 10 8 7 7 4 6 7
x̄ is a statistic we need to calculate from the data. s is another statistic we need to calculate from the data. t* with df = 21 and 90% confidence is a value we need to look up?

But, for t statistic problems in Math 1530, we will use technology, as most Statistics users do, so that the t-distribution table (or lack thereof) is less of an issue. Then we can concentrate on using the result and less on number crunching. Let's let MINITAB do it all!!

** t (confidence intervals and tests) on Minitab **
ENTER your data into C1 (skip this step if you just have x̄, s, and n).
CLICK on Stat > Basic Statistics > 1-Sample t (we deal only in one-sample problems) and a dialog box opens up:
  ○ Samples in columns: put C1 here
  OR
  ○ Summarized data
      Sample size: put n here
      Mean: put x̄ here
      Standard deviation: put s here
  Test mean: ← fill in this only if you have a hypothesis test.
  Options: CLICK on this button to specify a confidence level and/or an alternative hypothesis.
[The alternative should be 'not equal' for our plain CI to work.]
  Confidence level: 90% ← this is where we tell it
  OK, OK

With our example data in C1, and specifying a 90% confidence interval, here is what Minitab told me:

One-Sample T: quiz grades
Variable      N     Mean    StDev  SE Mean            90% CI
quiz grades  22  7.36364  1.91598  0.40849  (6.66073, 8.06654)

"Based on the sample given, we are 90% confident the mean μ of all quiz grades is between 6.66 and 8.07."

The Minitab bit is way too easy? We just type in some numbers and crank out a confidence interval. What roadblocks does reality have for this? We can truly be 90% confident that the mean μ of all quiz scores is between 6.66 and 8.07 IF
● the data is an SRS of all quiz scores (in this case, I say that it is) [plus, none of this is justified without 'SRS'], and
● the population of all quiz scores is approximately Normal.

But Robe gives quizzes to each of her classes on different days of the week, at different times of day, on different colors of paper, and there are several different editions of each quiz. Perhaps the population of all quiz scores is NOT Normal. We just don't know for sure about this population. Using x̄ means outliers could be a problem. We ought to graph the data anyway.

See p. 297, Robustness of the t procedures. A statistical procedure is called 'robust' if its interpretation stands up to 'conditions' or 'assumptions' that are not met just right. For instance, no statistical procedure is robust against a biased sample. On the other hand, a very large (random) sample size n is robust against outliers. The sample mean x̄ for 120 Math 1530 test #2 grades is not bothered by one really low outlier.
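Minitab's numbers are nothing magical. A Python sketch reproduces them from the formula x̄ ± t*(s/√n); the t* = 1.721 below is the table C critical value for df = 21 and 90% confidence (that row isn't excerpted in these notes, so treat it as a looked-up table value):

```python
from math import sqrt
from statistics import mean, stdev

grades = [10, 8, 9, 8, 8, 5, 9, 7, 9, 9, 6,
          6, 3, 6, 10, 10, 8, 7, 7, 4, 6, 7]

n = len(grades)                        # 22
xbar = mean(grades)
s = stdev(grades)
se = s / sqrt(n)

t_star = 1.721                         # table C: df = 21, 90% confidence
low, high = xbar - t_star * se, xbar + t_star * se

print(round(xbar, 5), round(s, 5), round(se, 5))  # 7.36364 1.91598 0.40849
print(round(low, 2), round(high, 2))              # 6.66 8.07
```

Mean, StDev, SE Mean, and the 90% CI all match the Minitab output above (to rounding).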
(Your three test grades, on the other hand...)

'Are we justified in using the t calculations with this sample?' When there is some question about 'Normal population' or 'is the sample big enough' (reality, not 'simple conditions'), the only shape we can check is a graph of the data itself (histogram, stemplot, dotplot). The t statistic is sensitive enough that the shape of the data set affects how valid it is to use t.

Box p. 298:
For n < 15, the t procedures are valid ONLY IF the data is fairly symmetric around 1 peak with no skewness or outliers (i.e., the data itself is roughly Normal).
For 15 ≤ n < 40, the t procedures are valid ONLY IF the data has no outliers or bad skew.
For n ≥ 40, that is "large n" as far as means are concerned, and the t procedures are valid regardless of the sample's shape. "Large n" is big enough to overcome the effects of outliers.

NOTE example 16.5 p. 298; note also that 'large n' is relative.

In class, in the Stat Cave, at this point, we will do an exercise in which the students use Minitab to create their own confidence intervals.

A hypothesis test, or test of hypotheses, or significance test, or test of significance, is the other form of statistical inference that we will look at this semester. (Starts on p. 238 in chapter 13.) There is a claimed or supposed value of a parameter, but you doubt the claim. "Professor Robe claims that the mean on the Math 1530 final is 70%. You've heard vicious rumors, and don't believe her." An outcome (sample result) that would rarely happen if the claim were true is 'evidence against' the claim.

Example
Prof. Robe brings a giant 'coin' made out of paper plates to class. She says that it is a fair coin. Suppose she 'flips' the coin 10 times and gets 10 'heads' in a row. '10 heads' is possible for a fair coin. The probability is (1/2)^10 = 1/1024 = 0.000976... (very unlikely, if the coin is fair). Now we don't believe the coin is fair.
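The coin arithmetic in one line, since each flip is an independent 1/2 chance:

```python
# Ten independent fair-coin flips all landing heads: multiply ten halves together.
p = 0.5 ** 10
print(p)          # 0.0009765625, i.e. exactly 1/1024
```

A result this rare under "the coin is fair" is exactly the kind of sample outcome that counts as evidence against the claim.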
Getting "10 heads in a row" is 'evidence against' the claim "my coin is fair."

A hypothesis test for a population mean μ comes in fairly distinct stages:

I. Recognize that it is a hypothesis test problem.
► There is both a claim (about a mean μ) and a potential problem, or
► there is a question about 'evidence' or 'statistical significance' or 'significance level.'

II. Verbally pin down the parameter (μ is 'the mean SSHS score among older college students') AND state the hypotheses:
H0, the null hypothesis. H0: μ = the problem's number (very specific)
Ha, the alternative hypothesis. Ha: μ > or < or ≠ the problem's number (vague)
Example p. 242 #8

III. (p. 242) Calculate (or fish out) a statistic and a test statistic (your statistic measured against H0's value of μ). A sample is inherently imperfect, but it has a sampling distribution, and we can use that.
► If x̄ and μ are the respective means, and σ is given, then a z-score tests how far x̄ is from μ: z = (x̄ - μ)/(σ/√n). Example p. 253 #13.34
► What if σ isn't given? Without σ, (x̄ - μ)/(σ/√n) is not available. BUT we STILL need to 'measure' x̄. When we substitute s in place of σ, we get the t-statistic t = (x̄ - μ)/(s/√n). Example p. 301 #16

IV. (p. 242) Once you have a z (or a t) statistic, the next step is to find the P-value (you may want to memorize the definition; blue box, p. 242).
► If σ is given and the 'test statistic' is a z-score, then the P-value is a table A probability. (Say z = -1.23 and Ha is 'less than'; then we look in table A and the P-value is 0.1093.) Example p. 253: now look up the actual P-value for #13.34.

How does the P-value work for t's?

example
Suppose Professor Robe claims that μ for the most recent quiz is 7. That seems too low to you. You felt good about that quiz and even helped other people study. You think μ is actually higher than her claimed μ = 7.
Ah. A hypothesis test situation. The population is all scores on the most recent quiz.
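That table A lookup is just the standard Normal cumulative probability, which Python's standard library can compute directly. A sketch for the z = -1.23, Ha 'less than' example:

```python
from statistics import NormalDist

z = -1.23

# Ha is 'less than', so the P-value is the area to the LEFT of z under N(0,1),
# which is exactly what table A tabulates.
p_value = NormalDist().cdf(z)
print(round(p_value, 4))   # 0.1093
```

(For 'greater than' you would take 1 minus the cdf; for 'not equal,' double the tail area.)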
The sample is the SRS: 10 8 9 8 8 5 9 7 6 3 6 10 10 8 7 7 6 7 9 6 9 4
The parameter is μ = the mean of all scores on the most recent quiz.
H0: μ = 7
Ha: μ > 7
x̄ is a statistic we need to calculate from the data. s is another statistic we need to calculate from the data. t = (x̄ - μ)/(s/√n) is going to have df = 21, and we don't have a df = 21 table, and... Ick. Let the technology do it!!

ENTER the data into column 1 (I called mine 'quiz grades').
CLICK on Stat (button across the top) > Basic Statistics > 1-Sample t, and the dialog box opens up:
  ○ Samples in columns: put C1 here
  Test mean: 7 ← fill this in (H0's value) when you have a hypothesis test.
  Options: CLICK on this button to specify an alternative hypothesis. In our case:
    Alternative: greater than
  OK, OK

Here is what Minitab told me:

One-Sample T: quiz grades
Test of mu = 7 vs > 7
                                       90% Lower
Variable      N     Mean    StDev  SE Mean    Bound     T      P
quiz grades  22  7.36364  1.91598  0.40849  6.66073  0.89  0.192

So, what makes the difference between one P-value and another, a 'high' P-value versus a 'low' P-value?
□ A low P-value happens when
  □ x̄ is far from μ
  □ the z (or t) test statistic is large-ish
  □ x̄ is unlikely if H0 is true
  □ there is 'significant' or 'strong' evidence against H0
□ A high P-value happens when
  □ x̄ is closer to μ
  □ the z (or t) test statistic is small-ish
  □ x̄ is consistent with 'H0 is true'
  □ there is no 'significance,' and little evidence against H0

V. (p. 248) Handle a significance level:
□ P-value < α means the sample is "statistically significant" at level α [so we reject H0].
□ P-value > α means the sample is "not statistically significant" at level α [so we do not reject H0].

Example
Refer back to our hypothesis test with quiz grades. Is the sample statistically significant at level α = 5%? Is the sample statistically significant at level α = 20%?

In class, in the Stat Cave, at this point, we will do an exercise in which the students use Minitab to run their own hypothesis test.
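Minitab's T column can be checked by hand from t = (x̄ - μ)/(s/√n). A sketch (the P-value itself needs the t(21) distribution, which the standard library doesn't provide, so below we simply reuse Minitab's P = 0.192 to answer the α questions):

```python
from math import sqrt
from statistics import mean, stdev

grades = [10, 8, 9, 8, 8, 5, 9, 7, 6, 3, 6,
          10, 10, 8, 7, 7, 6, 7, 9, 6, 9, 4]

mu0 = 7                                  # H0's value of mu
xbar, s, n = mean(grades), stdev(grades), len(grades)
t = (xbar - mu0) / (s / sqrt(n))
print(round(t, 2))                       # 0.89, matching Minitab's T

# Compare Minitab's P-value against each significance level
p_value = 0.192
print(p_value < 0.05)                    # False: NOT significant at alpha = 5%
print(p_value < 0.20)                    # True: significant at alpha = 20%
```

So at α = 5% we do not reject H0, but at the much looser α = 20% we would.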
Example p. 303 #30 (needs Minitab – note: this is matched pairs data)