Stat 145 Course Notes – Chapter 20
Testing Hypotheses About Proportions

Hypotheses

In Statistics, a hypothesis proposes a model for the world. Then we look at the data.

If the data are consistent with that model, we have no reason to disbelieve the hypothesis. Data consistent with the model lend support to the hypothesis, but do not prove it.

But if the facts are inconsistent with the model, we need to make a choice as to whether they are inconsistent enough to disbelieve the model. If they are inconsistent enough, we can reject the model.

Think about the logic of jury trials: To prove someone is guilty, we start by assuming they are innocent. We retain that hypothesis until the facts make it unlikely beyond a reasonable doubt. Then, and only then, we reject the hypothesis of innocence and declare the person guilty.

The same logic used in jury trials is used in statistical tests of hypotheses: We begin by assuming that a hypothesis is true. Next we consider whether the data are consistent with the hypothesis. If they are, all we can do is retain the hypothesis we started with. If they are not, then like a jury, we ask whether they are unlikely beyond a reasonable doubt.

The statistical twist is that we can quantify our doubt. We can use the model proposed by our hypothesis to calculate the probability that the event we’ve witnessed could happen. That’s just the probability we’re looking for: it quantifies exactly how surprised we are to see our results. This probability is called a P-value.

When the data are consistent with the model from the null hypothesis, the P-value is high and we are unable to reject the null hypothesis. In that case, we have to “retain” the null hypothesis we started with. We can’t claim to have proved it; instead we “fail to reject the null hypothesis” when the data are consistent with the null hypothesis model and in line with what we would expect from natural sampling variability.
Explain this concept in real words: if the data don’t seem too weird…

If the P-value is low enough, we’ll “reject the null hypothesis,” since what we observed would be very unlikely were the null model true.
Explain this concept in real words: if the data seem too weird to be true…

Testing Hypotheses

The null hypothesis, which we denote H0, specifies a population model parameter of interest and proposes a value for that parameter. We might have, for example, H0: p = 0.20, as in the chapter example.

We want to compare our data to what we would expect given that H0 is true. We can do this by finding out how many standard deviations away from the proposed value we are. We then ask how likely it is to get results like we did if the null hypothesis were true.

Example: According to a June 2004 Gallup poll, 28% of Americans “said there have been times in the last year when they haven’t been able to afford medical care” (“U.S. Minorities Still Struggle to Pay for Healthcare,” The Gallup Organization, July 27, 2004). Is this proportion higher for black Americans than for all Americans?

A Trial as a Hypothesis Test

Hypothesis testing is very much like a court trial. The null hypothesis is that the defendant is innocent. We then present the evidence: collect data. Then we judge the evidence: “Could these data plausibly have happened by chance if the null hypothesis were true?” If they were very unlikely to have occurred, then the evidence raises more than a reasonable doubt in our minds about the null hypothesis.

Ultimately we must make a decision. How unlikely is unlikely? Some people advocate setting rigid standards: 1 time out of 20 (0.05) or 1 time out of 100 (0.01).
But if you have to make the decision, you must judge for yourself in any particular situation whether the probability is small enough to constitute “reasonable doubt.”
Point out that we might all have different thresholds for this “reasonable doubt” and that the students should find out what is typical in their fields.

What to Do with an “Innocent” Defendant

If the evidence is not strong enough to reject the presumption of innocence, the jury returns with a verdict of “not guilty.” The jury does not say that the defendant is innocent. All it says is that there is not enough evidence to convict, to reject innocence. The defendant may, in fact, be innocent, but the jury has no way to be sure.

Said statistically, we will fail to reject the null hypothesis. We never declare the null hypothesis to be true, because we simply do not know whether it’s true or not. Sometimes in this case we say that the null hypothesis has been retained.

In a trial, the burden of proof is on the prosecution. In a hypothesis test, the burden of proof is on the unusual claim. The null hypothesis is the ordinary state of affairs, so it’s the alternative to the null hypothesis that we consider unusual (and for which we must marshal evidence).
Mention that the alternative hypothesis is typically the “research hypothesis,” the one we want to prove. This is kind of like wanting to prove guilt but assuming innocence in a jury trial.

The Reasoning of Hypothesis Testing

There are four basic parts to a hypothesis test:
Hypotheses
Model
Mechanics
Conclusion
Let’s look at these parts in detail…

Hypotheses

The null hypothesis: To perform a hypothesis test, we must first translate our question of interest into a statement about model parameters. In general, we have H0: parameter = value.
The alternative hypothesis: The alternative hypothesis, HA, contains the values of the parameter we accept if we reject the null.
HA comes in three basic forms:
HA: parameter < value
HA: parameter ≠ value
HA: parameter > value

Example: We want to test whether the proportion of black Americans who have not been able to afford medical care in the past year is higher than 28%.
Our null hypothesis is: $H_0: p = 0.28$
Our alternative hypothesis is: $H_A: p > 0.28$

Model

To plan a statistical hypothesis test, specify the model you will use to test the null hypothesis and the parameter of interest. All models require assumptions, so state the assumptions and check any corresponding conditions.

Your plan should end with a statement like “Because the conditions are satisfied, it is appropriate to model the sampling distribution of the proportion with a Normal model.” Watch out, though: your model step might instead end with “Because the conditions are not satisfied, I can’t proceed with the test.” If that’s the case, stop and reconsider.

Each test we discuss in the book has a name that you should include in your report. The test about proportions is called a one-proportion z-test.

Example: Check the conditions…
Independence Assumption: Gallup had a random sample of 801 black Americans in its survey. There is no reason to think that the answer one person gives is dependent on the answer another person gives, so we can reasonably assume that the answers of respondents are independent.
Random Sampling Condition: We just mentioned that the sample was a random sample.
10% Condition: 801 respondents is certainly less than 10% of the population of black Americans.
Success/Failure Condition: $np_0 = 801(0.28) = 224.28 \ge 10$ and $nq_0 = 801(0.72) = 576.72 \ge 10$, so the sample is large enough.
Since the conditions are met, we can model the sampling distribution of the proportion with a Normal distribution. We can therefore use the one-proportion z-test.

One-Proportion z-Test

The conditions for the one-proportion z-test are the same as for the one-proportion z-interval.
We test the hypothesis $H_0: p = p_0$ using the statistic $z = \frac{\hat{p} - p_0}{SD(\hat{p})}$, where $SD(\hat{p}) = \sqrt{\frac{p_0 q_0}{n}}$ and $q_0 = 1 - p_0$. When the conditions are met and the null hypothesis is true, this statistic follows the standard Normal model, so we can use that model to obtain a P-value.

Mechanics

Under “mechanics” we place the actual calculation of our test statistic from the data. Different tests will have different formulas and different test statistics. Usually, the mechanics are handled by a statistics program or calculator, but it’s good to know the formulas.

The ultimate goal of the calculation is to obtain a P-value: The P-value is the probability that the observed statistic (or an even more extreme value) could occur if the null model were correct.
Two things to point out here: 1) it’s a null model; 2) “as weird as or weirder than.”
If the P-value is small enough, we’ll reject the null hypothesis.

Note: The P-value is a conditional probability: it’s the probability that the observed results could have happened if the null hypothesis is true.

Example: Our null model is Normal with mean 0.28 and SD $SD(\hat{p}) = \sqrt{\frac{(0.28)(0.72)}{801}} \approx 0.0159$. The observed proportion (as reported by Gallup) is $\hat{p} = 0.38$, so the z-value is $z = \frac{0.38 - 0.28}{0.0159} \approx 6.29$. The sample proportion is 6.29 SDs above the mean. It’s probably pretty clear without even looking at Table Z that the chance of being 6.29 or more SDs above the mean is close to zero, so the P-value ≈ 0.

Conclusion

The conclusion in a hypothesis test is always a statement about the null hypothesis. The conclusion must state either that we reject or that we fail to reject the null hypothesis. And, as always, the conclusion should be stated in context.

Your conclusion about the null hypothesis should never be the end of a testing procedure. Often there are actions to take or policies to change.
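As a rough illustration of the mechanics for the Gallup example, here is a minimal Python sketch using only the standard library. The helper names `null_sd` and `success_failure_ok` are hypothetical, and the observed proportion of 0.38 is an assumption inferred from the reported z-value of 6.29 rather than a figure quoted directly in these notes.

```python
import math
from statistics import NormalDist


def null_sd(p0, n):
    """SD of the sample proportion under the null model: sqrt(p0 * q0 / n)."""
    return math.sqrt(p0 * (1 - p0) / n)


def success_failure_ok(p0, n, minimum=10):
    """Success/Failure Condition: expect at least `minimum` successes and failures."""
    return n * p0 >= minimum and n * (1 - p0) >= minimum


# Gallup example from the notes: H0: p = 0.28, HA: p > 0.28, n = 801.
n, p0 = 801, 0.28
p_hat = 0.38  # ASSUMPTION: inferred from the reported z of 6.29

conditions_met = success_failure_ok(p0, n)
sd = null_sd(p0, n)

# Test statistic: how many null-model SDs the observation lies from p0.
z = (p_hat - p0) / sd

# Right-tailed P-value: probability of a z this large or larger under H0.
p_value = 1 - NormalDist().cdf(z)

print(f"SD = {sd:.4f}, z = {z:.2f}, P-value = {p_value:.2e}")
```

Without intermediate rounding the z-value comes out near 6.30; the 6.29 in the notes reflects rounding the SD to 0.0159 before dividing. Either way the P-value is essentially zero.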
Example: Our P-value says that if the true proportion of black Americans who could not afford medical care in the past year is 28%, we would almost never see results as extreme as or more extreme than ours. This suggests that more than 28% of black Americans have not been able to afford medical care in the past year.
Note: Statistics can tell us that this disparity exists. But it is up to people in other disciplines, with whom statisticians work, to make equitable policy changes.

Alternative Alternatives

Recall that there are three possible alternative hypotheses:
HA: parameter < value
HA: parameter ≠ value
HA: parameter > value

HA: parameter ≠ value is known as a two-sided alternative because we are equally interested in deviations on either side of the null hypothesis value. For two-sided alternatives, the P-value is the probability of deviating in either direction from the null hypothesis value.

Example: Suppose that based on Dr. Miller’s past experience, she thinks that 70% of her students in Stat 145 drink. The Spring 2005 class survey revealed that 183 out of 249 students drink. Does the survey give reason to believe that the percentage of Stat 145 students who drink differs from 70%?
Hypotheses: $H_0: p = 0.70$ and $H_A: p \ne 0.70$
Model: Check the conditions…
Independence Assumption: There is no reason to think that the answer one person gives is dependent on the answer another person gives, so we can reasonably assume that the answers of respondents are independent.
Random Sampling Condition: This was not a random sample, but there is no reason to expect bias.
10% Condition: 249 respondents is certainly less than 10% of the population of all students who could take Stat 145.
Success/Failure Condition: $np_0 = 249(0.70) = 174.3 \ge 10$ and $nq_0 = 249(0.30) = 74.7 \ge 10$, so the sample is large enough.
Since the conditions are met, we can model the sampling distribution of the proportion with a Normal distribution.
We can therefore use the one-proportion z-test.
Mechanics: Our null model is Normal with mean 0.70 and SD $SD(\hat{p}) = \sqrt{\frac{(0.70)(0.30)}{249}} \approx 0.029$. The observed proportion is $\hat{p} = \frac{183}{249} \approx 0.735$, so the z-value is $z = \frac{0.735 - 0.70}{0.029} \approx 1.21$. The sample proportion is 1.21 SDs above the mean. Our P-value is: P-value = $2P(z > 1.21) = 2(0.1131) = 0.2262$
Make sure to draw the picture to show the P-value. Talk about using Z = -1.21 as well.
Conclusion: Our P-value says that if the true proportion of Stat 145 students who drink is 70%, we would see results as extreme as or more extreme than we did about 22.6% of the time. This suggests that Dr. Miller’s estimate is reasonable.

The other two alternative hypotheses are called one-sided alternatives. A one-sided alternative focuses on deviations from the null hypothesis value in only one direction. Thus, the P-value for one-sided alternatives is the probability of deviating only in the direction of the alternative away from the null hypothesis value.
For an alternative hypothesis of the form HA: parameter < value, we have a left-tailed test: the P-value is the probability in the lower tail of the null model.
For an alternative hypothesis of the form HA: parameter > value, we have a right-tailed test: the P-value is the probability in the upper tail of the null model.

P-Values and Decisions: What to Tell About a Hypothesis Test

How small should the P-value be in order for you to reject the null hypothesis? It turns out that our decision criterion is context-dependent.
When we’re screening for a disease and want to be sure we treat all those who are sick, we may be willing to reject the null hypothesis of no disease with a fairly large P-value.
A longstanding hypothesis, believed by many to be true, needs stronger evidence (and a correspondingly small P-value) to reject it.
Another factor in choosing a P-value threshold is the importance of the issue being tested.

Your conclusion about any null hypothesis should be accompanied by the P-value of the test. If possible, it should also include a confidence interval for the parameter of interest.
Don’t just declare the null hypothesis rejected or not rejected.
Report the P-value to show the strength of the evidence against the hypothesis. This will let each reader decide whether or not to reject the null hypothesis.

A significance level (denoted by $\alpha$) is sometimes listed to identify a specific value below which we would reject the null:
If the P-value < $\alpha$, reject the null hypothesis.
If the P-value ≥ $\alpha$, fail to reject the null hypothesis.
But remember, it is far better to report the P-value so that each reader can make their own decision.

Example: Suppose that, after performing a hypothesis test, we get a P-value of 0.03.
What would our decision be at the 0.01 level? We would fail to reject the null hypothesis at the 0.01 level, since 0.03 > 0.01.
What would our decision be at the 0.05 level? We would reject the null hypothesis at the 0.05 level, since 0.03 < 0.05.
What additional information does the P-value give us? Knowing the P-value allows us to make up our own minds about how strong the evidence is against the null hypothesis without having hard and fast rules binding us to decisions.

Example: What proportion of US teens (aged 13 to 17) know someone who is in an abusive relationship? Is the figure higher than 10%? According to a May 2005 Gallup Youth Survey, 12% of US teens (aged 13 to 17) reported that they “know someone in [their] own age group who is in an abusive relationship with a boyfriend or girlfriend” (“Adolescents Not Invulnerable to Abusive Relationships,” The Gallup Organization, May 24, 2005).
Hypotheses: $H_0: p = 0.10$ and $H_A: p > 0.10$
Model: Check the conditions…
Independence Assumption: There is no reason to think that the answer one person gives is dependent on the answer another person gives, so we can reasonably assume that the answers of respondents are independent.
Random Sampling Condition: Gallup reported that this was a randomly selected national sample of 1028 teenagers.
10% Condition: The 1028 respondents (a sample size not given in the problem statement, but stated in the Random Sampling Condition) are certainly fewer than 10% of the population of all US teens (aged 13 to 17).
Success/Failure Condition: $np_0 = 1028(0.10) = 102.8 \ge 10$ and $nq_0 = 1028(0.90) = 925.2 \ge 10$, so the sample is large enough.
Since the conditions are met, we can model the sampling distribution of the proportion with a Normal distribution. We can therefore use the one-proportion z-test.
Mechanics: Our null model is Normal with mean 0.10 and SD $SD(\hat{p}) = \sqrt{\frac{(0.10)(0.90)}{1028}} \approx 0.0094$. The observed proportion is $\hat{p} = 0.12$, so the z-value is $z = \frac{0.12 - 0.10}{0.0094} \approx 2.13$. The sample proportion is 2.13 SDs above the mean. Our P-value is: P-value = $P(z > 2.13) = 0.0166 \approx 0.017$
Don’t forget to draw a picture for the students.
Conclusion: Our P-value says that if the true proportion of US teens (aged 13 to 17) who know a peer in an abusive romantic relationship is 0.10, we would see results as extreme as or more extreme than we did about 1.7% of the time. We reject the null hypothesis and find that we have enough evidence to conclude that more than 10% of US teens know peers in an abusive romantic relationship.

What Can Go Wrong?

Hypothesis tests are so widely used, and so widely misused, that the issues involved are addressed in their own chapter (Chapter 21). There are a few issues that we can talk about already, though:
Don’t base your null hypothesis on what you see in the data. Think about the situation you are investigating and develop your null hypothesis appropriately.
Don’t base the alternative hypothesis on the data, either. Again, you need to think about the situation.
Don’t make your null hypothesis what you want to show to be true. You can reject the null hypothesis, but you can never “accept” or “prove” the null.
Don’t forget to check the conditions. We need randomization, independence, and a sample that is large enough to justify the use of the Normal model.
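
To tie the chapter's two later worked examples to the three forms of the alternative hypothesis and to the significance-level rule, the following is a minimal Python sketch, not a definitive implementation. The function names `one_proportion_z_test` and `decide`, and the labels "less", "two-sided", and "greater", are hypothetical choices made for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(p_hat, p0, n, alternative):
    """Return (z, P-value) for H0: p = p0 under the Normal null model.

    alternative is "less" (left-tailed), "greater" (right-tailed),
    or "two-sided". These labels are assumptions of this sketch.
    """
    sd = sqrt(p0 * (1 - p0) / n)  # SD of p-hat when H0 is true
    z = (p_hat - p0) / sd
    std_normal = NormalDist()  # mean 0, SD 1
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "two-sided":
        # Deviations in either direction count as "as extreme or more".
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p_value


def decide(p_value, alpha):
    """Apply the rule: reject H0 only when the P-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Stat 145 survey: 183 of 249 students, H0: p = 0.70, two-sided alternative.
z_stat145, p_stat145 = one_proportion_z_test(183 / 249, 0.70, 249, "two-sided")

# Gallup Youth Survey: p-hat = 0.12, n = 1028, H0: p = 0.10, HA: p > 0.10.
z_teens, p_teens = one_proportion_z_test(0.12, 0.10, 1028, "greater")

print(f"Stat 145: z = {z_stat145:.2f}, P-value = {p_stat145:.4f}")
print(f"Teens:    z = {z_teens:.2f}, P-value = {p_teens:.4f}")
print(decide(0.03, 0.01), "|", decide(0.03, 0.05))
```

Because the sketch does not round intermediate values, its results differ slightly from the hand calculations in the notes: roughly z = 1.20 and P-value 0.229 for the Stat 145 survey, and roughly z = 2.14 and P-value 0.016 for the teen survey. The conclusions are unchanged.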