Chapter 1: Statistics Success Stories and Cautionary Tales

1.1 / What Is Statistics?
statistics: a collection of procedures and principles for gathering data and analyzing information to help people make decisions when faced with uncertainty.

1.2 / Seven Statistical Stories with Morals

Case Study 1.1: Who Are Those Speedy Drivers?
dotplot: a simple summary of a long list of numbers, used to help see the patterns within data. In the plot, each dot represents an individual response.
five-number summary: the lowest value, the lower quartile (1/4), the median (1/2), the upper quartile (3/4), and the highest value.
Moral of the story: simple summaries of data can tell an interesting story and are easier to digest than long lists.
data: a plural word referring to numbers or non-numerical labels (such as male/female) collected from a set of entities (people, cities, and so on).
median: the value in the middle of a numerical list of data when the numbers are put in order. For an even number of entities, the median is the average of the middle two values.
lower quartile and upper quartile: (roughly) the medians of the lower and upper halves of the data.

Case Study 1.2: Safety in the Skies?
Moral of the story: when discussing the change in the rate or risk of occurrence of something, make sure you also include the base rate or baseline risk.
rate: the number of times something occurs per number of opportunities for it to occur.
risk: the potential for a bad outcome in the future. It can be estimated from the past rate for that outcome, if it is assumed the future is like the past.
base rate / baseline risk: the rate or risk at a beginning time period or under specific conditions.

Case Study 1.3: Did Anyone Ask Whom You’ve Been Dating?
Moral of the story: a representative sample of only a few thousand, or perhaps even a few hundred, can give reasonably accurate information about a population of many millions.
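The moral of Case Study 1.3 can be checked with the rule of thumb these notes give for the most common types of sample surveys: the margin of error is approximately 1 divided by the square root of the number of individuals in the sample. The Python sketch below only evaluates that approximation; the function name and the sample sizes are illustrative choices, not part of the case study.

```python
import math


def approx_margin_of_error(n):
    """Rule-of-thumb margin of error for a sample survey of n individuals."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    # Approximation from the notes: 1 / sqrt(n).
    return 1 / math.sqrt(n)


# Illustrative sample sizes (assumed, not taken from the case study).
for n in (100, 400, 1600, 2500):
    moe = approx_margin_of_error(n)
    print(f"n = {n:>5}: margin of error is about {moe:.1%}")
```

Under this approximation, 400 respondents give a margin of error of about 5% and 2,500 respondents about 2%. The size of the population does not appear in the formula, which is consistent with a few thousand respondents being enough for a population of many millions.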
population: a collection of all individuals about which information is desired.
random sample: a subset of the population selected so that every individual has a specified probability of being part of the sample.
sample survey: a study in which opinions or other information are gathered from each individual included in the sample.
margin of error: in a properly conducted survey, a number that is added to and subtracted from the sample information to produce an interval that is 95% certain to contain the truth about the population. In the most common types of sample surveys, the margin of error is approximately equal to 1 divided by the square root of the number of individuals in the sample.

Case Study 1.4: Who Are Those Angry Women?
Extensive nonresponse from a random sample, or the use of a self-selected (i.e., all-volunteer) sample, will probably produce biased results. Those who voluntarily respond to surveys tend to care about the issue and therefore have stronger and different opinions than those who do not respond.
Moral of the story: an unrepresentative sample, even a large one, tells you almost nothing about the population.
nonresponse bias: bias that can occur when many people who are selected for the sample either do not respond at all or do not respond to some of the key survey questions. It may occur even when an approximately random sample is selected and contacted.
self-selected sample / volunteer sample: a sample from a survey that does not attempt to contact a random sample but instead asks anyone who wishes to respond to do so. In most cases, this kind of sample tells you nothing about the larger population at all; it tells you only about those who responded.

Case Study 1.5: Does Prayer Lower Blood Pressure?
Moral of the story: cause-and-effect conclusions cannot generally be made on the basis of an observational study.
observational study: one in which participants are merely observed and measured. Comparisons based on observational studies are comparisons of naturally occurring groups.
variable: a characteristic that differs from one individual to the next. A variable may be numerical or categorical.
confounding variable: a variable that is not the main concern of the study, but may be partially responsible for the observed results.

Case Study 1.6: Does Aspirin Reduce Heart Attack Rates?
Moral of the story: unlike with observational studies, cause-and-effect conclusions can generally be made on the basis of randomized experiments.
randomized experiment: a study in which treatments are randomly assigned to participants.
treatment: a specific regimen or procedure assigned to participants by the experimenter.
random assignment: an assignment in which each participant has a specified probability of being assigned to each treatment.
placebo: a pill or treatment designed to look just like the active treatment but with no active ingredients.
statistically significant: describes a relationship or difference that is large enough to be unlikely to have occurred in the sample if there were no relationship or difference in the population.

Case Study 1.7: Does the Internet Increase Loneliness and Depression?
Moral of the story: a “statistically significant” finding does not necessarily have practical importance. When a study reports a statistically significant finding, find out the magnitude of the relationship or difference.
A secondary moral to this story is that the implied direction of cause and effect may be wrong. In this case, it could be that people who were more lonely and depressed were more prone to using the Internet. And as the follow-up research makes clear, remember that “truth” doesn’t necessarily remain fixed across time. Any study should be viewed in the context of society at the time it was done.

1.3 / The Common Elements in the Seven Stories
In every story, data are used to make a judgment about a situation.
The Discovery of Knowledge:
(1) Asking the right question(s).
(2) Collecting useful data, which includes deciding how much is needed.
(3) Summarizing and analyzing data, with the goal of answering the questions.
(4) Making decisions and generalizations based on the observed data.
(5) Turning the data and subsequent decisions into new knowledge.
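
Step (3), summarizing data, is what the five-number summary from Case Study 1.1 does. The Python sketch below follows the definitions in these notes: the median averages the two middle values when the count is even, and the quartiles are taken as the medians of the lower and upper halves. Two things are assumptions made for illustration: the speeds are made-up values rather than the case-study data, and for an odd count the median is left out of both halves (the notes say only "roughly", and other conventions exist).

```python
def median(sorted_values):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    # Even count: average the two middle values.
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (lowest, lower quartile, median, upper quartile, highest)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower_half = data[:half]
    # Assumption: with an odd count, the median is excluded from both halves.
    upper_half = data[half + 1:] if n % 2 == 1 else data[half:]
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])


# Made-up speeds, for illustration only (not the case-study data).
speeds = [80, 55, 95, 70, 110, 65, 90, 60, 100, 75, 85]
print(five_number_summary(speeds))  # (55, 65, 80, 95, 110)
```

Five numbers replace the list of eleven, which is the point of the moral in Case Study 1.1: the summary is easier to digest than the raw list.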