Lec02: Descriptive Statistics, IOE 265 W10

Descriptive Statistics

Topics
I. Concept of Location and Dispersion
II. Measures of Location
III. Measures of Dispersion
IV. Box Plots

I. Location and Dispersion
- Most common descriptive statistics are related to measuring either location or dispersion (variation).
  - Location ~ central tendency
  - Dispersion ~ spread of distribution
- Classic example to demonstrate these concepts: outcomes of throwing darts.
  - On or off location
  - Low or high dispersion

Lecture Exercise: Identify on/off target and high/low dispersion for each
[Figure: four dart patterns drawn with x marks, labeled A to D]
A. __________  B. __________
C. __________  D. __________

Target / dispersion analysis and general problem solving
- First, address problems in order of importance.
  - Highest priority: address features that have the strongest cause-effect relationship with end-customer satisfaction.
- Next, we typically try to reduce dispersion, then shift the mean to target as necessary to meet end-customer needs.
  - Stabilize the process, then center the process as necessary.

II. Measures of Location
- Mean
- Median
- Trimmed Mean

Mean
- Mean (also known as the average) is a measure of the center of a distribution.
- Typical notation used to represent the mean of a sample of data is $\bar{X}$; the Greek letter $\mu$ is used to represent the mean of a population.

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_N}{N}$$

- Example: suppose five students take a test and their scores are 70, 68, 71, 69 and 98.
  Mean = (70 + 68 + 71 + 69 + 98)/5 = 75.2

Median
- Median (also known as the 50th percentile) is the middle observation in a data set.
- Rank the data set and select the middle value.
  - If odd number of observations, the middle value is observation number (N + 1)/2.
  - If even number of observations, the middle value is interpolated as midway between observation numbers N/2 and (N/2) + 1.
- Prior data values, ranked: 68, 69, 70, 71 and 98.
  - Median is 70.
  - If another student with a score of 60 were included, the new median would be (69 + 70)/2 = 69.5.

Mean vs. Median
- Which is a better measure of location for the following set of test scores?
  - 68, 70, 69, 71 and 98
  - Mean = 75.2, Median = 70.0

Trimmed Mean
- Trimmed mean is a compromise between mean and median.
- 10% trimmed mean:
  - First, eliminate the smallest 10% of values and the largest 10% of values.
  - Then, re-compute the mean.
- Trimmed means are gaining popularity.
  - Less sensitive than the mean to outliers, but not as robust as the median.

Trimmed Mean (Example from Devore Textbook)
- Variable: life (hours) of incandescent lamps.
- Sample size = 20
- How many values will be trimmed in the 10% trimmed mean?
- Mean = 965.0, Median = 1009.5, Trimmed Mean = 971.4
- How are these values impacted by sample size, by distribution?
- What might be some useful applications?

III. Measures of Dispersion
- Range
- Standard Deviation
- Variance

Range
- Range is the maximum value in a data set minus the minimum value.
- Example: test scores 70, 68, 71, 69 and 98. Range = 98 - 68 = 30.
- Note: the range is often preferred over the standard deviation for small data sets (e.g., if the number of observations in a sample data set is < 10).

Standard Deviation
- Sample standard deviation (S, or St Dev) measures the dispersion of the individual observations from the mean.
- For a sample data set, standard deviation is also referred to as the sample standard deviation or the root-mean-square, $S_{rms}$.

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

- Units for S are the same as for the variable being analyzed.
  - E.g., if we measure mpg, then S will be in mpg.
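The location and dispersion measures defined so far can be checked on the lecture's five test scores. The following is a minimal sketch, not part of the original slides: the helper names `trimmed_mean` and `sample_std` are hypothetical, and the trimming rule (drop a whole number of observations from each end, rounded down) is an assumption, since textbooks and software packages may handle fractional counts differently. A 20% trim is used only because 10% of five observations is less than one value.

```python
import math
import statistics

scores = [70, 68, 71, 69, 98]  # test scores from the lecture example


def trimmed_mean(values, percent):
    """Mean after dropping the smallest and largest `percent` % of values.

    Assumption: the count dropped from each end is rounded down to a
    whole number of observations.
    """
    ordered = sorted(values)
    k = (len(ordered) * percent) // 100  # observations dropped per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def sample_std(values):
    """Sample standard deviation using the n - 1 divisor."""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))


mean = sum(scores) / len(scores)
median = statistics.median(scores)                 # odd n: middle value
median_with_60 = statistics.median(scores + [60])  # even n: midway value
data_range = max(scores) - min(scores)
s = sample_std(scores)
trimmed_20 = trimmed_mean(scores, 20)              # drops 68 and 98

print(mean, median, median_with_60, data_range, round(s, 2), trimmed_20)
```

Under these assumptions the mean, median, and range agree with the slide values (75.2, 70, and 30), and the 20% trimmed mean lands on the median because only the three middle scores remain.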
Why divide by n - 1?
- To correct an estimating error; we'll cover this in detail in chapter 6 (point estimation theory).
- What you should know now: n - 1 is referred to as the "degrees of freedom".
  - Degrees of freedom (dof) are a measure of the amount of information from the sample data that has been used in estimating a sample statistic.
  - Every time a statistic is calculated from a sample, one degree of freedom is used up.
  - So, when we calculate the sample standard deviation, we divide by n - 1 because the sample mean ($\bar{X}$) has to be calculated first, and this calculation uses 1 dof.

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

Effects of Extreme Values
- Test scores 70, 68, 71, 69 and 98: sample standard deviation is 12.79.
- Suppose you exclude the score of 98: sample standard deviation is reduced to 1.3!
- Standard deviation may be severely influenced by extreme values in the sample data set. (Note: these values may not necessarily be mistakes.)
- We may reduce the effects of any individual observation by increasing the sample size.

Variance
- Variance is the square of the standard deviation.
- Represents the average squared deviation of each observation from the sample mean.

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

- Prior example where standard deviation = 12.79:
  - Variance = $S^2$ = 163.7, i.e. $(12.79)^2$ up to rounding.

Skewness
- Some software packages provide skewness.*
- Skewness is a measure of relative symmetry.
  - Zero indicates symmetry.
  - Positive skewness shows a long right tail.
  - Negative skewness shows a long left tail.
* Actual calculation outside scope of class.

Kurtosis
- Some software packages provide kurtosis.*
- Kurtosis (K) is a measure of the peakedness of a distribution (relative to normal).
  - K = 3: normal, bell-shaped distribution (mesokurtic). (Note: some software reports normal as 0.)
  - K < 3 (or negative relative to 0): flatter peak, fatter shoulders, shorter tails.
  - K > 3 (or positive relative to 0): more peaked than normal, with longer tails.
* Actual calculation outside scope of class.

Using Software to Calculate Descriptive Statistics
- In practice, we rarely calculate statistics by hand. So, let us explore some useful Excel functions.
  - Mean: =average(array)
  - Median: =median(array)
  - Std Dev: =stdev(array)
  - Variance: =var(array)
  - Range: =max(array)-min(array)

Minitab Results
- Of course, all advanced statistical software will automatically compute descriptive statistics.

Descriptive Statistics: Score

Variable   N    Mean    Median   TrMean   StDev   SE Mean
Score      16   82.78   83.50    83.32    9.17    2.29

Variable   Minimum   Maximum
Score      63.00     95.00

IV. Box Plots
[Figure: annotated box plot. Labels, from top to bottom:]
- Extreme outlier(s), marked *: more than 3.0 f_s beyond Q1 or Q3
- Mild outlier(s): more than 1.5 f_s but no more than 3.0 f_s beyond Q1 or Q3
- Upper limit: Q3 + 1.5 f_s
- Upper whisker: highest value within the upper limit
- Q3: 75th percentile (third quartile, or upper fourth)
- Median: 50th percentile
- Q1: 25th percentile (first quartile, or lower fourth)
- f_s = Q3 - Q1
- Lower limit: Q1 - 1.5 f_s
- Lower whisker: lowest value within the lower limit

Box Plots: differences in notation/calculation
- Minitab calculates quartiles (Q1, Q3).
- Some textbooks (including Devore) refer to the lower fourth and upper fourth.
- Roughly the same, but with some differences:
  - Lower fourth = median of the smallest n/2 observations (n even) or of the smallest (n + 1)/2 observations (n odd).
  - Q1: observation at position (n + 1)/4 (if not an integer, then interpolate).
  - Upper fourth = median of the largest n/2 observations (n even) or of the largest (n + 1)/2 observations (n odd).
  - Q3: observation at position 3(n + 1)/4 (if not an integer, then interpolate).

Box Plot Information
- Box plot shows:
  - Location: line for the median. Note: some software will also include a dot for the mean.
  - Dispersion: the box shows the range from the 25th to the 75th percentile.
  - Departures from symmetry: one side of the box or one whisker can be longer than the other, suggesting a lack of symmetry.
  - Identification of mild and extreme outliers.

Box Plot - MPG Example
[Figure: Boxplot of MPG; horizontal axis MPG, 17 to 23]

Box Plots vs. Histogram
- Note: the wider box to the left of the median in the box plot suggests more spread to the left than to the right.
- A similar pattern is shown in the histogram.
[Figure: Boxplot of MPG and Histogram of MPG (vertical axis: Frequency), both on an MPG axis from 17 to 23; Median = 20.1]

Multiple Box Plot Example
- For MPG data, suppose you also collected data for tire pressures (grouped as normal or low).
- Does this stratification variable help explain the bi-modal distribution?

Summary of concepts
- Most common descriptive statistics are related to measuring either location or dispersion (variation).
  - Location ~ central tendency (mean, median, trimmed mean)
  - Dispersion ~ spread of distribution (range, standard deviation and variance)
- Extreme observations (or outliers) can have an important effect on some of these statistics.
- Box plots are another graphical tool that can help us identify extreme observations and distribution shapes.
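The box plot limits from Section IV can be sketched in code using the lecture's five test scores. This is an illustrative sketch rather than how Minitab or any other package actually computes its box plots: the function names `fourths`, `quartile_at`, and `classify` are hypothetical, linear interpolation between neighboring ranks is an assumption, and `quartile_at` assumes at least three observations so that both quartile positions fall inside the data.

```python
import statistics


def fourths(values):
    """Lower and upper fourths: medians of the lower and upper halves.

    For odd n the overall median belongs to both halves, matching the
    smallest/largest (n + 1)/2 observations rule.
    """
    ordered = sorted(values)
    half = (len(ordered) + 1) // 2  # n/2 if n is even, (n + 1)/2 if n is odd
    return statistics.median(ordered[:half]), statistics.median(ordered[-half:])


def quartile_at(values, position):
    """Value at a 1-based rank position, interpolating linearly if needed."""
    ordered = sorted(values)
    lower = int(position)  # rank at or just below the position
    frac = position - lower
    if frac == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


def classify(x, q1, q3):
    """Return 'none', 'mild', or 'extreme' using the 1.5 f_s and 3.0 f_s limits."""
    fs = q3 - q1
    distance = max(q1 - x, x - q3)  # how far x lies outside the box
    if distance > 3.0 * fs:
        return "extreme"
    if distance > 1.5 * fs:
        return "mild"
    return "none"


scores = [70, 68, 71, 69, 98]
n = len(scores)

lower_fourth, upper_fourth = fourths(scores)  # Devore-style fourths
q1 = quartile_at(scores, (n + 1) / 4)         # position (n + 1)/4
q3 = quartile_at(scores, 3 * (n + 1) / 4)     # position 3(n + 1)/4

labels = {x: classify(x, lower_fourth, upper_fourth) for x in scores}
print(lower_fourth, upper_fourth, q1, q3, labels)
```

On these five scores the fourths are 69 and 71, so the score of 98 is classified as an extreme outlier, while the interpolated quartiles (68.5 and 84.5) give a much wider box that does not flag 98 at all. With so few observations the two conventions can disagree sharply; on larger samples they are roughly the same.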