Chapter 8 Producing Data: Sampling
BPS - 5th Ed. Chapter 8

Population and Sample
- Researchers often want to answer questions about some large group of individuals. This group is called the population.
- Often the researchers cannot measure (or survey) all individuals in the population, so they measure a subset of individuals chosen to represent the entire population. This subset is called a sample.
- A sample design describes exactly how to choose a sample from the population.
- The researchers then use statistical techniques to draw conclusions about the population based on the sample.

Choosing a Sample to Represent a Population
Choosing a representative sample from a population is not easy.
- Goals:
  - Individuals in the sample are representative of the population (that is, they provide accurate information about the population).
  - Minimize the cost of obtaining the sample (money, time, personnel, etc.).
- Sample designs (some examples):
  - Sample surveys
  - Simple random sample
  - Stratified random sample

Bad Sampling Designs
- Voluntary response sampling: allowing individuals to choose to be in the sample
- Convenience sampling: selecting individuals that are easiest to reach
- Both of these techniques are biased: they systematically favor certain outcomes.

Voluntary Response
- To prepare for her book Women and Love, Shere Hite sent questionnaires to 100,000 women asking about love, sex, and relationships.
  - 4.5% responded
  - Hite used those responses to write her book
- Moore (Statistics: Concepts and Controversies, 1997) noted:
  - respondents “were fed up with men and eager to fight them…”
  - “the anger became the theme of the book…”
  - “but angry women are more likely” to respond

Convenience Sampling
- Sampling mice from a large cage to study how a drug affects physical activity
  - a lab assistant reaches into the cage to select the mice one at a time until 10 are chosen
- Which mice will likely be chosen?
- Could this sample yield biased results?

Probability Sample
- a sample chosen by chance
- we must know what samples are possible and what chance, or probability, each possible sample has of being selected
- Four basic types:
  - Simple random sampling (SRS)
  - Stratified sampling
  - Systematic sampling
  - Cluster sampling

Simple Random Sampling
- Each individual in the population has the same chance of being chosen for the sample.
- Every possible sample of size n out of a population of N has the same chance of being selected.
- Example: for a simple random sample of size n = 2 from a population of N = 4, each of the 6 possible samples is equally likely to occur.

Simple Random Sampling
- Simple random sampling requires a list of all the individuals in the population. This list is called a frame.
- If we do not have a frame, a different sampling method must be used.
- How to get a random selection from our frame:
  - “drawing names out of a hat”
  - table of random digits
  - technology: list and number the individuals, then use software (such as MINITAB) to take a random sample

MINITAB Example
Today’s MINITAB problem: select a random sample of 5 people from a fictitious class.
Session Window:
    MTB > sample 5 c1 c2
    MTB > samp 5 c1 c3
    MTB > SAMPL 5 C1 C4
    MTB > name c4 'Sample3'

MINITAB – Select Random Sample
- Choose Enable Commands from the Editor menu.
- Open a MINITAB file (i.e., a worksheet) containing the frame (e.g., the names of each class member) in some column, say column 1 (c1) of the worksheet.
- Type the following command:
    MTB> sample n cx cy
- The command is explained on the next page.

MINITAB – Select Random Sample (cont.)
    MTB> sample n cx cy
where:
- sample (or samp) is the command
- n is the number of sample members desired (i.e., the sample size)
- cx is the column from which the sample is drawn
- cy is the column where the sampled names are placed
Note: Substitute the correct values from the specific problem for the placeholders n, cx, and cy.

Table of Random Digits
Table B on pg. 692 of the text
- each entry is equally likely to be any of the 10 digits 0 through 9
- entries are independent of each other (knowledge of one entry gives no information about any other entry)
- each pair of entries is equally likely to be any of the 100 pairs 00, 01, …, 99
- each triple of entries is equally likely to be any of the 1000 values 000, 001, …, 999

Choosing a Simple Random Sample (SRS) using Table B
- STEP 1: Label each individual in the population.
- STEP 2: Use Table B to select labels at random.

Stratified Random Sample
- first, divide the population into groups of similar individuals, called strata
- second, choose a separate SRS in each stratum
- third, combine these SRSs to form the full sample

Stratified Random Sample Example
Suppose a university has the following student demographics:

    Undergraduate   Graduate   First Professional   Special
    55%             20%        5%                   20%

A stratified random sample of 100 students could be chosen as follows: select an SRS of 55 undergraduates, an SRS of 20 graduates, an SRS of 5 first professional students, and an SRS of 20 special students; combine these 100 students.

Multistage Sample
- several stages of sampling are carried out
- useful for large-scale sample surveys
- samples at each stage may be SRSs, but are often stratified
- stages may involve other random sampling techniques as well (cluster, systematic, random digit dialing, …)
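
As a rough illustration of the simple random sampling and stratified examples above, the following Python sketch (not part of the original slides, which use MINITAB) lists every possible sample of size n = 2 from a population of N = 4 and then draws the stratified sample of 100 students. The labels A through D, the assumed university size of 10,000 students, the function names, and the fixed seed are all hypothetical choices made for illustration.

```python
import random
from itertools import combinations


def all_possible_samples(population, n):
    """Return every possible sample of size n (order within a sample ignored)."""
    return list(combinations(population, n))


def stratified_sample(strata, sizes, rng):
    """Draw a separate SRS from each stratum, then combine the SRSs."""
    sample = []
    for name, members in strata.items():
        # rng.sample draws without replacement, so every subset of the
        # requested size within this stratum is equally likely.
        sample.extend(rng.sample(members, sizes[name]))
    return sample


# Hypothetical population of N = 4 labeled individuals.
possible = all_possible_samples(["A", "B", "C", "D"], 2)
print(len(possible))  # 6 possible samples, as stated on the slide

# Percentages from the slide; the 10,000-student total is an assumption.
shares = {"Undergraduate": 55, "Graduate": 20, "First Professional": 5, "Special": 20}
strata = {
    name: [f"{name}-{i}" for i in range(pct * 100)]
    for name, pct in shares.items()
}

# With a total sample of 100, each stratum's sample size equals its percentage.
sizes = dict(shares)

rng = random.Random(8)  # fixed seed only so the sketch is repeatable
sample = stratified_sample(strata, sizes, rng)
print(len(sample))  # 100 students in the combined sample
```

Allocating each stratum's sample size in proportion to its share of the population mirrors the 55/20/5/20 split in the example; other allocations are possible but are not discussed in the slides.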

Cautions about Sample Surveys
- Undercoverage: some individuals or groups in the population are left out of the process of choosing the sample
- Nonresponse: individuals chosen for the sample cannot be contacted or refuse to cooperate/respond
- Response bias: behavior of the respondent or interviewer may lead to inaccurate answers or measurements
- Wording of questions: confusing or leading (biased) questions; words with different meanings

Nonresponse
- To prepare for her book Women and Love, Shere Hite sent questionnaires to 100,000 women asking about love, sex, and relationships.
  - 4.5% responded
  - Hite used those responses to write her book
  - angry women are more likely to respond

Response Bias
- A door-to-door survey is being conducted to determine drug use (past or present) of members of the community.
- Respondents may give socially acceptable answers (maybe not the truth!).
- For this survey on drug use, would it matter if a police officer is conducting the interview? (bias from interviewer)

Asking the Uninformed
Washington Post National Weekly Edition (April 10-16, 1995, p. 36)
- A 1978 poll done in Cincinnati asked people whether they “favored or opposed repealing the 1975 Public Affairs Act.”
- There was no such act!
- About one third of those asked expressed an opinion about it.

Response Bias: Wording of Questions
- A newsletter distributed by a politician to his constituents gave the results of a “nationwide survey on Americans’ attitudes about a variety of educational issues.”
- One of the questions in the survey: “Should your legislature adopt a policy to assist children in failing schools to opt out of that school and attend an alternative school--public, private, or parochial--of the parents’ choosing?”
- From the wording of this question, can you speculate on what answer was desired? Explain.

Wording: Deliberate Bias
- “If you found a wallet with $20 in it, would you return the money?”
- “If you found a wallet with $20 in it, would you do the right thing and return the money?”

Wording: Unintentional Bias
- “I have taught several students over the past few years.”
  - How many students do you think I have taught?
  - How many years am I referring to?
- “Over the past few days, how many servings of fruit have you eaten?”
  - How many days are you considering?
  - What constitutes a serving?

Wording: Ordering of Questions
- “How often do you normally go out on a date? about ___ times a month.”
- “How happy are you with life in general?”
- There is a strong association between the answers to these questions when they are asked in this order.
- If the ordering is reversed, there would be no strong association between these questions.

Inferences about the Population
- Values calculated from samples are used to make conclusions (inferences) about unknown values in the population.
- Variability:
  - different samples from the same population may yield different results for a particular value of interest
  - estimates from random samples will be closer to the true values in the population if the samples are larger
  - how close the estimates will likely be to the true values can be calculated; this is called the margin of error
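
The point about variability can be checked with a small simulation. The Python sketch below is an illustration only and is not from the slides: the true population proportion of 0.6, the two sample sizes, the number of repeated samples, and the seed are all assumed values. It draws many random samples of each size and measures how much the sample proportions vary.

```python
import random
import statistics


def sample_proportion(p, n, rng):
    """Proportion of 'yes' answers in a random sample of size n.

    Each draw is 'yes' with probability p, which approximates sampling
    from a population much larger than the sample.
    """
    return sum(rng.random() < p for _ in range(n)) / n


def spread_of_estimates(p, n, repetitions, rng):
    """Standard deviation of the sample proportion over repeated samples."""
    estimates = [sample_proportion(p, n, rng) for _ in range(repetitions)]
    return statistics.stdev(estimates)


TRUE_P = 0.6          # assumed true proportion in the population
REPETITIONS = 500     # assumed number of repeated samples per sample size

rng = random.Random(8)  # fixed seed only so the sketch is repeatable
spread_small = spread_of_estimates(TRUE_P, 100, REPETITIONS, rng)
spread_large = spread_of_estimates(TRUE_P, 1000, REPETITIONS, rng)

print(f"spread of estimates, n = 100:  {spread_small:.3f}")
print(f"spread of estimates, n = 1000: {spread_large:.3f}")
```

Under these assumptions the estimates from samples of 1000 cluster more tightly around the true value than those from samples of 100, which is the behavior that a margin of error quantifies.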