Project #3: Statistical Analysis Report (Statistics 201)

Submitted To: Professor Jamie Paul, University of Tennessee

Report Prepared By: Miranda Ceane, Undergraduate Student, Business Department, University of Tennessee: Knoxville, TN 37916

November 15, 2016

Executive Summary

On Monday, November 14, 2016, I began analyzing the University of Tennessee-Knoxville’s Statistics 201 data set from a survey conducted in the spring of 2016. In this data set, 1015 college students answered the questions that we will analyze in this report. The programs JMP and Microsoft Excel were used to create graphs, tables, and charts so that the data could be better interpreted. Each section of this report corresponds to the project question with the same number. For example, Section 3 addresses all of #3 and is divided into subsections. Section 1 shows that the sample I used is in fact random. Section 2 answers questions about whether or not college students have cheated academically. Section 3 covers college students’ average hours of sleep per night. I am unaware of the exact sampling protocol and methods that were used to collect data for this survey, but I hope to clear up any questions about the data set in this report.

Section 1

For this section, I took a random sample of size 100 from the data collected in the Spring 2016 survey taken by Statistics 201 students. Below is a screenshot that shows the last sixteen rows of that sample, along with the original row number of each observation.

Section 2

Section 2 answers all of the questions for #2. For this section, we are asked to analyze the answers to the survey question “Q-34 Have you ever cheated in college (academic cheating only)?”

a. Assuming that my sample of 100 students is an accurate representation of the total population, the proportion of students who answered “No” to cheating in college is 0.80, or 80% of the sample.
From my random sample of 100, I created a histogram that shows the count and the probability of each response. A screenshot is included below. The responses are: No (not cheating), Yes: looked at answers, Yes: other, and Yes: plagiarism.

b. The three conditions that must be met before constructing a confidence interval for a proportion are the Randomness Condition, the 10% Condition, and the Success/Failure Condition.

Randomness Condition: This condition is met because we took a random sample of 100.

10% Condition: This condition states that our sample can be no more than 10% of the entire population. Our population is 1015. 10% of our population is 101.5, and our random sample is just below that, at 100. This condition is met.

Success/Failure Condition: This condition states that the number of successes (np-hat) and the number of failures (nq-hat) must each be greater than or equal to 10. For our sample, the proportion of successes is 0.80 and the proportion of failures is 0.20.
np-hat = (100)(0.80) = 80, which is greater than 10.
nq-hat = (100)(0.20) = 20, which is greater than 10.
This condition is met.

c. The confidence interval is (0.726696, 0.857498). We can be 90% confident that the true proportion of Stats 201 students who have not cheated in college is contained in this interval. Attached below is a screenshot of the confidence interval for this data set.

d. Using the full data set, I used JMP to display the true population proportion of students who answered “No” to cheating in college (academic cheating only). The screenshot is attached below. The true population proportion (p) of those who said no is 0.74.

e. Our 90% confidence interval, 72.7% to 85.7%, does contain the true population proportion of 74%.

f. We would expect approximately 90% of our classmates’ confidence intervals to contain the true population proportion.
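
The figures in parts b, c, and e can be checked with a few lines of Python. This is only a sketch, not the JMP procedure itself: it assumes the reported interval is a Wilson (score) interval, because that formula appears to reproduce the reported bounds for p-hat = 0.80, n = 100, and 90% confidence. The function name `wilson_interval` and the variable names are illustrative and do not come from the original analysis.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence):
    """Score (Wilson) confidence interval for a proportion (assumed method)."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Values taken from the report.
n = 100            # sample size
successes = 80     # students in the sample who answered "No"
population = 1015  # students in the full data set
p_true = 0.74      # population proportion reported in part d

# 10% Condition and Success/Failure Condition from part b.
ten_percent_ok = n <= 0.10 * population
success_failure_ok = successes >= 10 and (n - successes) >= 10

# 90% interval from part c, and the containment check from part e.
low, high = wilson_interval(successes, n, confidence=0.90)
contains_true = low <= p_true <= high

print(ten_percent_ok, success_failure_ok)
print(round(low, 4), round(high, 4), contains_true)
```

Under these assumptions the bounds should land close to the reported (0.726696, 0.857498). A simple Wald interval, p-hat plus or minus z times sqrt(p-hat × q-hat / n), would give a slightly different range of roughly (0.734, 0.866), which is why the score formula is assumed here.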
Section 3

For this section, #3, we move to analyzing “Q35- Average Hours of Sleep per Night”. We will focus on interpreting confidence intervals, hypothesis tests, and margins of error.

a. From the random sample of 100, I created a histogram for the variable “Average Hours of Sleep per Night”. The screenshot of that histogram is below.

b. The three conditions for a confidence interval for a mean are the Randomness Condition, the 10% Condition, and the Normal Enough Condition. The first two are the same as in Section 2; the Normal Enough Condition replaces the Success/Failure Condition because we are now working with a mean instead of a proportion.

Randomness Condition: The sample is still a random sample of 100 taken from the population of 1015. This condition is met.

10% Condition: 1015 x 0.10 = 101.5. Our random sample of 100 is less than 101.5. This condition is met.

Normal Enough Condition: A sample size of 100 is large enough to construct a confidence interval, and the data appear to be nearly normal. This condition is met.

c. Most adults function best with 7-9 hours of sleep per night. We are trying to see if, on average, college students are getting less than the minimum number of hours of sleep needed to function best.
Null hypothesis (H0): μ = 7
Alternative hypothesis (Ha): μ < 7

d. The P-value is .3349. If the true mean were 7 hours of sleep, there would be about a 33.5% chance of getting a sample mean as low as or lower than the one we observed. Below are two screenshots that show the histogram along with the test of the mean and the confidence interval. I used two separate screenshots so that each one is zoomed in and easier to read.

e. We fail to reject the null hypothesis because the P-value (.3349) is greater than 0.05. There is not enough evidence to conclude that college students average less than 7 hours of sleep per night.

f. We are 95% confident that the mean number of hours slept by college students each night is contained within the interval 6.75 to 7.16.

g. The margin of error was calculated as 0.209. To find that number, subtract the lower bound of the confidence interval from the upper bound, and then divide by two.
Margin of error: 7.1637242 - 6.7462758 = 0.4174484, and 0.4174484 / 2 = 0.2087242, or 0.209.

h. The margin of error here is 0.209, which is equal to the margin of error calculated in part g.

i.
Because we failed to reject the null hypothesis, the only error we could have made is a Type II error, which means failing to reject a null hypothesis that is actually false. To reduce the chance of making a Type II error, you can pick a higher alpha level or increase the sample size.

Closing

Thank you for allowing me the opportunity to analyze and report on this data set. I hope it is as informative for you as it has been for me. I found the variables interesting and easy to work with. Working with this data set has also helped to improve my skills with JMP and Excel, and I expect to be able to use these skills in the future. If you have any questions or concerns about the report, please feel free to contact me at any time.

Regards,
Miranda Ceane