Stat 201 - Project 2 Guidance - Fall 09

Here is an example of the JMP output we expect to see, as well as instructions on how to get the JMP output. See the project description for instructions on how to take a random sample. You may also find it useful to look over some of the JMP tutorials we have prepared at http://web.utk.edu/~cwiek/201Tutorials/

#1) Come up with some good answers! Just use your common sense.

#2)
a) Let’s choose to analyze the % of people who did not vote in the election last year. The proportion is 29.0%. This was found from Analyze > Distribution, choosing 43 Voted for Barack Obama.

43 Voted for Barack Obama?
Frequencies
Level               Count   Prob
Yes                   154   0.26598
No, Someone Else      257   0.44387
No, I Didn't Vote     168   0.29016
Total                 579   1.00000

b) After selecting a random subsample of size 45 (for example), we can repeat the steps in (a) to get the results for the subsample. The confidence interval can be obtained by clicking the red arrow, selecting Confidence Interval, then 0.90. Be sure to write up the value of p̂, report the 90% CI that JMP gives, and give an interpretation of what the interval tells us!

43 Voted for Barack Obama?
Confidence Intervals
Level               Count   Prob      Lower CI   Upper CI   1-Alpha
Yes                    13   0.28889   0.192261   0.409462   0.900
No, Someone Else       17   0.37778   0.269041   0.500378   0.900
No, I Didn't Vote      15   0.33333   0.230125   0.455446   0.900
Total                  45

c) Show whether n p̂ ≥ 10 and n(1 - p̂) ≥ 10 for your sample, and comment on whether your CI in (b) was appropriate.

d) Report whether the CI includes p: it does for this example, but it may not for yours. In fact there’s a 10% chance it won’t since it’s a 90% confidence interval!

#3)
a) Let’s choose to analyze GPA. You can find μ (the average of all respondents) by clicking Analyze, then Distribution, and selecting 11 Your GPA. Be sure to report the value of μ! Note: In this variable there are three outliers whose values are 0.
This probably corresponds to people providing bogus answers. Zero is not a valid GPA for our analysis, so a better analysis (and one you should do) is to eliminate the bogus observations. See Project 1 guidance/example project or the JMP tutorials for a refresher on how to exclude rows.

11 Your GPA
Moments
Mean             3.1561485
Std Dev          0.5370149
Std Err Mean     0.0223176
Upper 95% Mean   3.199982
Lower 95% Mean   3.1123151
N                579

b) After taking a random subsample of size 45 (for example), you can click Analyze, Distribution once again to generate the relevant histogram. After reading p. 592 in your book, make sure you comment on whether the histogram is Normal enough to make a CI for the mean. In this subset, there looks to be an outlier at 1.25. Does this mean the confidence interval will still be valid? Is this Normal enough?

11 Your GPA
Moments
Mean             3.0348889
Std Dev          0.5449088
Std Err Mean     0.0812302
Upper 95% Mean   3.1985976
Lower 95% Mean   2.8711802
N                45

c) JMP automatically reports a 95% CI (the lower value is labeled Lower 95% Mean, the upper value is labeled Upper 95% Mean) in the output of Analyze, Distribution. Be sure to include this output and interpret the interval! Also be sure to comment on whether μ is inside the interval so we know if your interval got it right. In this case, it did!

#4)
a) Let’s choose to analyze GPA based on UT being the first choice for college. We can make histograms for both categories by once again using Analyze, Distribution. Here, I entered GPA for Y, and selected UT first choice for the “By” selection. You will do something similar. Note, your total sample size may be slightly less than what you started with due to eliminating extreme outliers. This is ok! Once again read p. 592 in your book, and comment as to whether BOTH histograms are nearly Normal.

b) To conduct a hypothesis test, I need to specify the null and alternative hypothesis. Here’s one line of reasoning.
One can imagine that the better-than-average students may have had as their first choice a number of highly selective schools such as Harvard, M.I.T., Juilliard, etc. They might have used UT as a safety school when they were rejected, or perhaps did not have the funds to afford a private school even if they were accepted. Thus, one can imagine that the GPAs of these better-than-average students who did not have UT as their first choice may be higher than the GPAs of students who did have UT as their first choice. Thus, if µUT is the average GPA of those who had UT as their first choice, and µnotUT is the average GPA of those who did not have UT as their first choice, the hypothesis pair is:

Ho: µUT - µnotUT = 0   vs.   HA: µUT - µnotUT < 0

I performed the test by going to Fit Y by X, entering in GPA for Y and UT first choice for X, then clicking the red arrow and selecting t-test.

Note: JMP will perform the hypothesis test with the null set up as Ho: µ1 - µ2 = 0 where group 1 is the category that comes LAST alphabetically, and group 2 is the category that comes FIRST alphabetically. Thus, specify your null and alternative hypothesis in part (b) in the same manner. In this case, “UT first choice” had two categories, “Yes” and “No”, so this is why the null hypothesis I used was about µUT - µnotUT (Yes minus No) and not the other way around.

t Test
Yes-No
Assuming unequal variances
Difference     -0.19989   t Ratio      -1.31844
Std Err Dif     0.15161   DF           27.37117
Upper CL Dif    0.11099   Prob > |t|     0.1983
Lower CL Dif   -0.51076   Prob > t       0.9009
Confidence      0.95      Prob < t       0.0991

In your writeup, also report the difference in averages (-0.19989 here), its standard error (0.15161 here), and the p-value for the test (0.0991 here [match up the > or < with the sign in your HA, or use Prob > |t| if you used ≠ in HA]), and give interpretations of each! Be sure to state your conclusion! Also, BE CAREFUL of what difference JMP is actually calculating.
Here, JMP saying “Yes-No” right under t Test means that JMP calculated (GPA of UT first choice) minus (GPA of UT not-first choice). This is critical to your interpretation!

c) Do a little research in Chapter 21 (specifically, the section on Making Errors, starting on p. 543) to answer this question!

#5)
a) Let’s analyze Parents Married and Voted for Barack Obama. Go to Fit Y by X, enter in one variable for Y, the other for X (doesn’t matter which in this case). To modify the contingency table to include the correct output, click the red arrow next to Contingency Table, delete Total %, Row %, Column %, and select Expected and Cell Chi Square.

Contingency Table
10 Parents married? By 43 Voted for Barack Obama?

                  Yes        No, Someone Else   No, I Didn't Vote   Total
No   Count        59         52                 57                  168
     Expected     44.6839    74.5699            48.7461
     Cell Chi^2   4.5867     6.8312             1.3976
Yes  Count        95         205                111                 411
     Expected     109.316    182.43             119.254
     Cell Chi^2   1.8748     2.7923             0.5713
Total             154        257                168                 579

b) Report which cell has the largest Cell Chi^2 and tell us what a large value means. In this example, Parents Married = No and Voted for Barack Obama = No, Someone Else has the largest value. In general, large values mean ……

c) Write out the null and alternative hypotheses, report the p-value (found in the row labeled Pearson), and state your conclusion. Here, we have strong evidence that there is an association between whether someone’s parents are married and how they voted. Very intriguing.

Tests
N     DF   -LogLike    RSquare (U)
579   2    9.1836793   0.0148

Test               ChiSquare   Prob>ChiSq
Likelihood Ratio   18.367      0.0001*
Pearson            18.054      0.0001*

d) This is a good question!

e) Another good question! Review Chapter 21 for this.

#6) Here is the relevant output that you should provide. Be sure to answer all the questions thoughtfully. Once again, it is ok to have slightly fewer observations than you started with, due to eliminating some extreme outliers.

a) To make a regression line, go to Analyze, Fit Y by X, enter in Fastest Speed for Y, and Desired Weight for X.
From the red arrow menu, select Fit Line.

Bivariate Fit of 25 Fastest Speed Achieved Driving By 04 Desired Weight (Lbs.)

Linear Fit
25 Fastest Speed Achieved Driving = 62.994527 + 0.2797353*04 Desired Weight (Lbs.)

Summary of Fit
RSquare                      0.260123
RSquare Adj                  0.242916
Root Mean Square Error       14.11963
Mean of Response             104.1778
Observations (or Sum Wgts)   45

b) To get a 95% confidence interval for the slope, right click (Ctrl-Click on Mac) on the word Intercept in the Parameter Estimates section, select Columns from the menu that pops up, and then Lower 95%. Repeat once more and also select Upper 95%. These represent the lower and upper numbers in the 95% CI.

Parameter Estimates
Term                       Estimate    Std Error   t Ratio   Prob>|t|   Lower 95%   Upper 95%
Intercept                  72.248928   3.48428     20.74     <.0001*    65.40546    79.092397
04 Desired Weight (Lbs.)   0.2173344   0.022965    9.46      <.0001*    0.1722285   0.2624404

c) Do this part “by hand” (i.e., no JMP output is needed here).

d) To get the residuals plot, click the red arrow next to Linear Fit after you have made the regression line, and select Plot Residuals from the menu that pops up.

e) Provide answers to the 3 questions asked. Also provide similar JMP output (for each gender) as shown in parts (a), (b) and (d) here.
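
Optional hand check for #2 (b) and (c). This is only a rough Python sketch, not part of the JMP workflow, and the function names are made up for illustration. It uses the "No, I Didn't Vote" row of the example subsample (15 out of 45) and computes two 90% intervals: the usual large-sample interval p̂ ± z*·sqrt(p̂(1 - p̂)/n), and a Wilson score interval. The score interval reproduces the limits JMP printed in #2 (b) (0.230125 to 0.455446), which suggests that is the kind of interval JMP reports there. So a by-hand answer from the large-sample formula may differ slightly from JMP's.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(count, n, conf=0.90):
    """Large-sample interval: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = count / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z* is about 1.645 for 90%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def wilson_interval(count, n, conf=0.90):
    """Wilson score interval for a proportion."""
    p_hat = count / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    margin = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin


def success_failure_ok(count, n, minimum=10):
    """n*p_hat >= 10 and n*(1 - p_hat) >= 10, i.e. both raw counts are >= 10."""
    return count >= minimum and (n - count) >= minimum


if __name__ == "__main__":
    count, n = 15, 45  # "No, I Didn't Vote" in the example subsample
    print("Large-sample 90% CI:", wald_interval(count, n))
    print("Wilson score 90% CI:", wilson_interval(count, n))
    print("Conditions in (c) met:", success_failure_ok(count, n))
```

Swap in your own count and subsample size for 15 and 45. The check in (c) is the same as asking whether at least 10 people fall in the category and at least 10 fall outside it.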
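
Optional hand check for #5 (a) through (c). Again, this is only a sketch in Python with made-up names, offered to show where JMP's Expected and Cell Chi^2 numbers come from. Each expected count is (row total × column total) / grand total, each cell's contribution is (Count - Expected)² / Expected, and the Pearson statistic is the sum of the six contributions. The p-value line relies on a shortcut that holds only when DF = 2, as it is for this 2 × 3 table; for other tables you would need a chi-square table or software.

```python
from math import exp

# Observed counts from the example contingency table in #5 (a).
# Rows: parents married? (No, Yes).
# Columns: Yes / No, Someone Else / No, I Didn't Vote.
observed = [
    [59, 52, 57],
    [95, 205, 111],
]


def chi_square_cells(table):
    """Return (expected counts, per-cell chi-square contributions)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    cells = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    return expected, cells


expected, cells = chi_square_cells(observed)
pearson = sum(sum(row) for row in cells)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# With exactly 2 degrees of freedom, the chi-square upper-tail area is exp(-x/2).
p_value = exp(-pearson / 2) if df == 2 else None

if __name__ == "__main__":
    for exp_row, cell_row in zip(expected, cells):
        print([round(e, 4) for e in exp_row], [round(c, 4) for c in cell_row])
    print("Pearson chi-square:", round(pearson, 3), "DF:", df, "p-value:", p_value)
```

The rounded values should line up with the Expected and Cell Chi^2 entries and with the Pearson row of the Tests output in the example.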