Project #1: Statistical Analysis Report (Statistics 201)

Submitted To:
Professor Jamie Paul
University of Tennessee

Report Prepared By:
Miranda Ceane
Undergraduate Student, Business Department
University of Tennessee: Knoxville, TN 37916

September 13, 2016

Executive Summary

This report summarizes the analysis results associated with the survey completed by most Statistics 201 students during the 2016 spring semester at the University of Tennessee. The purpose of this report is to document and graphically show the survey’s data and the relationships between specific variables. I am unaware of the sampling protocol and methods used for this survey, but the data was monitored by professors at the University of Tennessee, who took responsibility for grading the subsets and surveys.

The technology used for this report is JMP, statistical software that makes analysis of data quick and accurate if used correctly. All of my graphs, data charts, and summary statistics were produced in JMP.

Section 1 of this report shows which random sample of the original data set I used. Section 2 goes into depth with graphical analysis and displays of different variables from the data set. All analysis uses only the random sample of the original data set. Each subsection (a, b, c…) states which question is being answered or which graph is displayed. Section 3 focuses on two specific variables and their relationship with one another.

Section 1

Because the last two digits of my student I.D. number are 91 (000414991), I took a random sample of 791 rows from the data.

Section 2

2a. An example of a categorical variable in this data set is “q8-Greek life,” which indicates whether or not the person is involved in Greek life. An ordinal variable is “q-12 Economic class,” which allowed students to describe whether they come from a lower, middle, or upper socio-economic class, or somewhere in between.
An example of a quantitative variable in this data set is “q-16 Hours work weekly,” which records the number of hours the person works during the week.

2b. Below are a bar chart and a pie chart of the categorical variable “q-30 Water drinking behavior,” which records responses to the question “Which statement best describes your behavior when you drink water on campus?”

2c. For the two categorical variables “q1-Gender” (Male or Female) and “q33-Tobacco Usage” (Smoke, Other [Dip, Chew], Multiple Products, No Usage), I believe there is a relationship. I expect a moderate relationship between the two, with more men than women using tobacco.

2d. Below are a mosaic plot and a contingency table that show the relationship between “q1-Gender” and “q33-Tobacco Usage”.

2e. The relationship between these two variables (“q1-Gender” and “q33-Tobacco Usage”) in this random sample shows that females use tobacco less than males. For example, 4.72% of females in the data set (20 women out of 424) smoke, compared to 10.63% of men (39 men out of 367). Also, 11.44% of men use dip while 0% of females do. The percentage of females who do not use any type of tobacco product in this data set is 19.88 percentage points higher than the male percentage. The mosaic plot clearly shows that males in this survey use more tobacco than females, and that tobacco use is not a prominent trait among women.

2f. The mosaic plot in part (d) confirmed my expectation in part (c), although in part (c) I expected the relationship to be only moderate. I now consider the association between these two variables strong. Females in this data set do not use multiple tobacco products at all, and the proportion of non-users is much higher among females than among males.

Section 3

3a.
Below are a histogram and the Quantiles/Summary Statistics tables that display the data for the variable “q7-HS GPA.” The shape of the histogram is fairly symmetric, and it is difficult to see any unusual features in it. Using the Distribution feature in JMP, I determined that there are several outliers, including GPAs as high as 5.0 and as low as 2.3. The mean is 3.73, the median is 3.8, and the standard deviation is 0.384. All of these values appear in the second table.

3b. The variables “q32-Driven Drunk” and “q7-HS GPA” may be related, and I believe a graphical display will make any relationship between them more evident. Those who have driven drunk most likely have a lower high school GPA than those who have not, although I do not think this relationship will be strong. It must also be kept in mind that in this survey, as in any other, nonresponse or other types of bias may be an issue because of the controversial nature of this particular question.

3c. Below are histograms displaying the data for the variables “q32-Driven Drunk” and “q7-HS GPA”. Underneath the histograms is the Summary Statistics table, which includes the mean, median, and standard deviation, as well as the quantiles. For those who have not driven drunk, the mean GPA is 3.79, the median is 3.8, and the standard deviation is 0.37. For those who have driven drunk, the mean GPA is 3.61, the median is 3.7, and the standard deviation is 0.39.

3d. Below is a side-by-side box plot that shows the data for “q7-HS GPA” and “q32-Driven Drunk” directly next to each other.

3e. My suspicion about which group of drivers would have the higher average high school GPA was confirmed, but the relationship is stronger than I expected. The data suggest that those who have not driven drunk tend to stay in a higher range of GPAs, with occasional outliers.
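As a rough check on the comparison in Section 3, the following Python sketch works through the arithmetic using only the summary statistics reported in part 3c. It is a minimal illustration, not part of the JMP analysis: the function names are hypothetical, and because the sizes of the two groups are not reported here, the standardized difference weights both groups equally, which is an assumption.

```python
import math

# Summary statistics reported in part 3c (high school GPA by group).
NOT_DRIVEN_DRUNK = {"mean": 3.79, "median": 3.8, "sd": 0.37}
DRIVEN_DRUNK = {"mean": 3.61, "median": 3.7, "sd": 0.39}


def mean_difference(a, b):
    """Difference in mean GPA (group a minus group b)."""
    return a["mean"] - b["mean"]


def standardized_difference(a, b):
    """Mean difference divided by the root mean square of the two SDs.

    Assumption: group sizes are not reported, so both groups
    are given equal weight when combining the standard deviations.
    """
    rms_sd = math.sqrt((a["sd"] ** 2 + b["sd"] ** 2) / 2)
    return mean_difference(a, b) / rms_sd


if __name__ == "__main__":
    diff = mean_difference(NOT_DRIVEN_DRUNK, DRIVEN_DRUNK)
    d = standardized_difference(NOT_DRIVEN_DRUNK, DRIVEN_DRUNK)
    print(f"Difference in mean GPA: {diff:.2f}")
    print(f"Standardized difference: {d:.2f}")
```

With the values from part 3c, the difference in mean GPA is 0.18 points, which is roughly 0.47 standard deviations under the equal-weight assumption. Knowing the actual group sizes would allow a properly weighted version of the same calculation.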
Conclusion

Analyzing and working with these data sets has been interesting. Some of the relationships between variables were unexpected, either because I had misjudged some of the data or because I originally thought particular variables could not be related. Using the software posed a new challenge, but I feel that I have properly grasped the concepts and usage of JMP. Being able to manipulate the presentation of the graphs is extremely helpful in displaying data for analysis. I hope this analysis sheds light on any questions you may have about the data and variables.

Regards,
Miranda Ceane