Chapter 3, Association: Contingency, Correlation, and Regression
Chapter 4
Part 1b
Hamid Semiyari
February 4, 2010

Table of contents
1 Chapter 3, Association: Contingency, Correlation, and Regression
   • How can we explore the association between two categorical variables?
   • How can we explore the association between two quantitative variables?
   • How Can We Predict the Outcome of a Variable?
   • What Are Some Cautions in Analyzing Associations?
2 Chapter 4
   • Experimental design
   • Sampling Techniques
   • How Accurate Are Results from Surveys with Random Sampling?
   • Revisiting Bias

Chapter 3
Association: Contingency, Correlation, and Regression

When we analyze data on two variables, our first step is to distinguish between the response variable and the explanatory variable.

Response Variable, Explanatory Variable, Association
• The response variable is the outcome variable on which comparisons are made.
• The explanatory variable is the variable that may explain or influence the response; it defines the groups being compared.
• An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable.
Consider the following example: are pesticides present less often in organic foods?

Example (ctd.): Are pesticides present less often in organic foods?
• What is the response variable? The pesticide status is the response variable.
• What is the explanatory variable? The food type is the explanatory variable.

                Pesticide Residues
Food Type       Yes       No
Organic         29        98
Conventional    19,485    7,086
Example (ctd.): Are pesticides present less often in organic foods?
• What proportion of organic foods contain pesticides? 29 of the 127 organic foods contained pesticide residues, so the proportion with pesticides is 29/127 = 0.228.
• What proportion of conventionally grown foods contain pesticides? 19,485 of the 26,571 conventionally grown foods contained pesticide residues, so the proportion is 19,485/26,571 = 0.733, much higher than for organic foods.
• What proportion of all sampled items contain pesticide residues? (29 + 19,485) of (127 + 26,571) foods contained pesticide residues, so the proportion is 19,514/26,698 = 0.731.
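The three proportions above are simple ratios of counts taken from the table. A minimal Python sketch of that arithmetic (the variable names are illustrative choices, not part of the slides):

```python
# Counts from the Food Type x Pesticide Status table.
organic_yes, organic_no = 29, 98
conv_yes, conv_no = 19_485, 7_086

organic_total = organic_yes + organic_no   # 127 organic items
conv_total = conv_yes + conv_no            # 26,571 conventional items

# Conditional proportion with residues within each food type.
p_organic = organic_yes / organic_total
p_conv = conv_yes / conv_total

# Overall proportion with residues, ignoring food type.
p_all = (organic_yes + conv_yes) / (organic_total + conv_total)

print(f"organic:      {organic_yes}/{organic_total} = {p_organic:.3f}")
print(f"conventional: {conv_yes}/{conv_total} = {p_conv:.3f}")
print(f"all foods:    {organic_yes + conv_yes}/{organic_total + conv_total} = {p_all:.3f}")
```

The overall proportion (0.731) is almost the same as the conventional one because conventional items make up more than 99% of the 26,698 sampled items, so the overall figure hides the much lower organic proportion.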
Contingency Table
The Food Type and Pesticide Status table is called a contingency table. A contingency table:
• displays two categorical variables;
• lists the categories of one variable in the rows;
• lists the categories of the other variable in the columns;
• has frequencies (counts) as its entries.
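To see where the frequencies in a contingency table come from, here is a minimal Python sketch that cross-tabulates individual observations. The helper name `contingency_table` and the five records are hypothetical toy values chosen for illustration; they are not the pesticide data.

```python
from collections import Counter

def contingency_table(records, row_levels, col_levels):
    """Cross-tabulate (row_category, col_category) pairs into a nested dict of counts.

    Every (row, col) combination appears in the result, with 0 for
    combinations that never occur in the data.
    """
    counts = Counter(records)
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}

# Hypothetical toy records: (food type, pesticide status) for five items.
records = [
    ("Organic", "No"),
    ("Organic", "Yes"),
    ("Conventional", "Yes"),
    ("Conventional", "Yes"),
    ("Conventional", "No"),
]

table = contingency_table(records, ["Organic", "Conventional"], ["Yes", "No"])
for row, cells in table.items():
    print(row, cells)
```

Each entry is a frequency, and the entries add up to the number of records.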
[Figure: Food Type and Pesticide Status]

Food Type and Pesticide Status (ctd.)
• What is the sum over each row? Each row sums to 1.00, because each row gives the conditional distribution of pesticide status within one food type.
• What proportion of organic foods contained pesticide residues? 0.23
• What proportion of conventional foods contained pesticide residues? 0.73

                Pesticide Residues
Food Type       Yes     No      Total
Organic         0.23    0.77    1.00
Conventional    0.73    0.27    1.00
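The row proportions above come from dividing each count by its row total. A minimal sketch, assuming the table is stored as a nested dictionary (a representation chosen here for illustration; `row_proportions` is a hypothetical helper name):

```python
def row_proportions(table):
    """Divide each count by its row total, giving the conditional
    distribution of the column variable within each row."""
    proportions = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        proportions[row] = {col: count / row_total for col, count in cells.items()}
    return proportions

food_counts = {
    "Organic":      {"Yes": 29,     "No": 98},
    "Conventional": {"Yes": 19_485, "No": 7_086},
}

props = row_proportions(food_counts)
for row, cells in props.items():
    rounded = {col: round(p, 2) for col, p in cells.items()}
    # Each row is a conditional distribution, so it sums to 1 (up to float rounding).
    print(row, rounded, "row sum:", round(sum(cells.values()), 6))
```

Rounded to two decimals, this reproduces the 0.23/0.77 and 0.73/0.27 rows of the table.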
[Figure: Food Type and Pesticide Status (ctd.)]

Example: Gender and Admission Status
The 2 × 2 table below classifies the 4,526 applicants to the six largest graduate programs at UC Berkeley according to gender and admission status.

Gender    Admitted   Denied
Male      1,198      1,493
Female    557        1,278

Are gender and admission status associated?
Gender and Admission Status (ctd.)
To determine whether an association exists, we compute the conditional proportions of one variable within each category of the other variable.

Gender    Admitted                     Denied                Total
Male      1198/(1198 + 1493) = 0.445   1493/2691 = 0.555     1.00
Female    557/(557 + 1278) = 0.304     1278/1835 = 0.696     1.00

In this example, a larger proportion of male applicants were admitted (44.5%) than of female applicants (30.4%). Is this evidence of gender bias? (We will discuss this further in Section 3.4.)

Is there an association?
• When you form a contingency table, first decide which variable to treat as the response variable. In some cases, either variable could be the response variable.
• Studying the conditional proportions helps you judge whether there is an association between the variables.
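Because either variable could serve as the response, it can help to compute conditional proportions in both directions. The sketch below uses the Berkeley counts from the table; the helper names `within_rows` and `within_columns` are hypothetical, chosen for illustration.

```python
def within_rows(table):
    """Conditional proportions of the column variable within each row."""
    return {
        row: {col: count / sum(cells.values()) for col, count in cells.items()}
        for row, cells in table.items()
    }

def within_columns(table):
    """Conditional proportions of the row variable within each column."""
    cols = list(next(iter(table.values())))
    col_totals = {c: sum(cells[c] for cells in table.values()) for c in cols}
    return {
        c: {row: cells[c] / col_totals[c] for row, cells in table.items()}
        for c in cols
    }

berkeley = {
    "Male":   {"Admitted": 1198, "Denied": 1493},
    "Female": {"Admitted": 557,  "Denied": 1278},
}

# Admission status as the response: admission rates within each gender.
by_gender = within_rows(berkeley)
# Gender as the response: gender mix within each admission outcome.
by_status = within_columns(berkeley)

for label, dist in list(by_gender.items()) + list(by_status.items()):
    print(label, {k: round(v, 3) for k, v in dist.items()})
```

The row version reproduces 0.445/0.555 and 0.304/0.696. The column version answers a different question, for example the share of admitted applicants who were male (1198/1755, about 0.683); the question about admission rates on the slide corresponds to the row version.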
In practice, when we investigate the association between two variables, there are three types of cases:
• Both variables are categorical, such as food type and pesticide status. The data can then be displayed in a contingency table, and, as we have seen, we can explore the association by comparing conditional proportions.
• One variable is categorical and the other is quantitative, such as gender and income. We can then compare the categories (such as females and males) using summaries of center and spread for the quantitative variable (such as the mean and standard deviation of income) and graphics such as side-by-side box plots.
• Both variables are quantitative, as in the examples below. We then analyze how the outcome on the response variable tends to change as the value of the explanatory variable changes.
How Can We Predict the Outcome of a Variable? What Are Some Cautions in Analyzing Associations? In practice, when we investigate the association between two variables, there are three types of cases: (.) Both variables are categorial, such as food type and pesticide status. The data can then be displayed in a contingency table. We’ve seen that we can then explore the association by comparing conditional proportions. (.) One variables could be categorial and one could be quantitative, such as income and gender. We can then compare the categories (such as females and males) using summaries of center and spread for quantitative variable (such as the mean and the standard deviation of income) and graphic such as side-by-side box plots. (.) Both variables could be quantitative, such as below examples. We then analyze how outcome on the response variable tends to change as the value of the explanatory variable changes. Hamid Semiyari Part1b Chapter3, Association: Contingency, Correlation, and Regression Chapter 4 How can we explore the association between two categorial variables? How can we explore the association between two quantitative variables? How Can We Predict the Outcome of a Variable? What Are Some Cautions in Analyzing Associations? In practice, when we investigate the association between two variables, there are three types of cases: (.) Both variables are categorial, such as food type and pesticide status. The data can then be displayed in a contingency table. We’ve seen that we can then explore the association by comparing conditional proportions. (.) One variables could be categorial and one could be quantitative, such as income and gender. We can then compare the categories (such as females and males) using summaries of center and spread for quantitative variable (such as the mean and the standard deviation of income) and graphic such as side-by-side box plots. (.) Both variables could be quantitative, such as below examples. 
We then analyze how outcome on the response variable tends to change as the value of the explanatory variable changes. Hamid Semiyari Part1b Chapter3, Association: Contingency, Correlation, and Regression Chapter 4 How can we explore the association between two categorial variables? How can we explore the association between two quantitative variables? How Can We Predict the Outcome of a Variable? What Are Some Cautions in Analyzing Associations? In practice, when we investigate the association between two variables, there are three types of cases: (.) Both variables are categorial, such as food type and pesticide status. The data can then be displayed in a contingency table. We’ve seen that we can then explore the association by comparing conditional proportions. (.) One variables could be categorial and one could be quantitative, such as income and gender. We can then compare the categories (such as females and males) using summaries of center and spread for quantitative variable (such as the mean and the standard deviation of income) and graphic such as side-by-side box plots. (.) Both variables could be quantitative, such as below examples. We then analyze how outcome on the response variable tends to change as the value of the explanatory variable changes. Hamid Semiyari Part1b Chapter3, Association: Contingency, Correlation, and Regression Chapter 4 How can we explore the association between two categorial variables? How can we explore the association between two quantitative variables? How Can We Predict the Outcome of a Variable? What Are Some Cautions in Analyzing Associations? (a) Plastic (lb) and Household Size Size 2 3 3 6 4 2 1 5 Plastic 0.85 1.81 2.19 3.05 2.19 1.41 0.27 2.83 (b) Years Owned (Car) versus Value Years 1 5 10 8 6 3 2 Value 14,000 9,000 6,000 7,000 8,000 11,000 12,500 Hamid Semiyari Part1b Chapter3, Association: Contingency, Correlation, and Regression Chapter 4 How can we explore the association between two categorial variables? 
Relationships Between Quantitative Variables

Three tools are used to describe, picture, and quantify the relationship between two quantitative variables:
- Scatter-plot: a two-dimensional graph of the data values.
- Correlation: a statistic that measures the strength of a linear relationship between two quantitative variables.
- Regression equation: an equation that describes the average relationship between a quantitative response variable and an explanatory variable.

Looking for Patterns (Trends) with Scatter-plots

Positive Association
Two variables have a positive association when the values of one variable tend to increase as the values of the other variable increase. Example: the data in (a), Plastic (lb) and Household Size.

Negative Association
Two variables have a negative association when the values of one variable tend to decrease as the values of the other variable increase. Example: the data in (b), Years Owned (Car) versus Value.
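Because $s_x$, $s_y$ and $n - 1$ are all positive, the sign of the correlation matches the sign of $\sum (x - \bar{x})(y - \bar{y})$. The sketch below uses that fact to label the direction of association in data sets (a) and (b); the function names are illustrative, not from the slides.

```python
def cross_product_sum(xs, ys):
    """Sum of (x - mean_x) * (y - mean_y); its sign equals the sign of r."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

def direction(xs, ys):
    """Classify the association as positive, negative, or none."""
    s = cross_product_sum(xs, ys)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "none"

# (a) Household size (x) and plastic in lb (y)
size = [2, 3, 3, 6, 4, 2, 1, 5]
plastic = [0.85, 1.81, 2.19, 3.05, 2.19, 1.41, 0.27, 2.83]

# (b) Years a car has been owned (x) and its value (y)
years = [1, 5, 10, 8, 6, 3, 2]
value = [14000, 9000, 6000, 7000, 8000, 11000, 12500]

print("(a):", direction(size, plastic))
print("(b):", direction(years, value))
```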
The Scatter-plot
With two quantitative variables, it is common to denote the response variable by y and the explanatory variable by x. In a scatter-plot, the explanatory variable x is placed on the horizontal axis and the response variable y is placed on the vertical axis. The (x, y) pair of values for each observation is represented by a dot relative to the two axes.
[Figure: Scatter-plot of Internet Usage versus Gross Domestic Product (GDP)]

The Correlation
How can we summarize the strength of an association? The correlation is one of the most useful statistics. The correlation, denoted r, is a single number that describes the direction and strength of the linear relationship between two quantitative variables.
- A positive r-value indicates a positive association.
- A negative r-value indicates a negative association.
- An r-value close to +1 or -1 indicates a strong linear association.
- An r-value close to 0 indicates a weak linear association.

Calculating the Correlation

$$ r = \frac{1}{n-1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right) $$

where n is the number of points, $\bar{x}$ and $\bar{y}$ are the means, and $s_x$ and $s_y$ are the standard deviations of x and y. The sum is taken over all n observations.
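The definition of r translates directly into code. This sketch assumes sample standard deviations (dividing by n - 1, as Python's `statistics.stdev` does) and uses the driving-experience data from the example below; the function name `correlation` is hypothetical.

```python
import statistics

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of z_x * z_y, using sample standard deviations."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # divide by n - 1
    z_products = (((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

# Driving experience (x) and monthly auto insurance premium (y)
experience = [5, 2, 12, 9, 15, 6, 25, 16]
premium = [64, 87, 50, 71, 44, 56, 42, 60]
print(round(correlation(experience, premium), 2))
```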
Computation is simpler with the equivalent formula:

$$ r = \frac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{\sum x^2 - (\sum x)^2/n}\,\sqrt{\sum y^2 - (\sum y)^2/n}} $$

Graph the Data to Check Whether the Correlation Is Appropriate
Always construct a scatter-plot to visually examine the association between two quantitative variables. The correlation is appropriate only for describing a linear relationship.

Driving Experience vs. Monthly Auto Insurance Premium
A random sample of eight drivers insured by the same company and having similar auto insurance policies was selected. Here x is driving experience and y is the monthly auto insurance premium.

 x    y     xy    x^2    y^2
 5   64    320     25   4096
 2   87    174      4   7569
12   50    600    144   2500
 9   71    639     81   5041
15   44    660    225   1936
 6   56    336     36   3136
25   42   1050    625   1764
16   60    960    256   3600

Σx = 90, Σy = 474, Σxy = 4739, Σx^2 = 1396, Σy^2 = 29642
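The column totals in the table can be checked with a few lines of Python (a sketch; the variable names are illustrative):

```python
# Rows of the driving-experience table as (x, y) pairs
rows = [(5, 64), (2, 87), (12, 50), (9, 71), (15, 44), (6, 56), (25, 42), (16, 60)]

totals = {
    "sum_x": sum(x for x, _ in rows),
    "sum_y": sum(y for _, y in rows),
    "sum_xy": sum(x * y for x, y in rows),
    "sum_x2": sum(x * x for x, _ in rows),
    "sum_y2": sum(y * y for _, y in rows),
}
print(totals)  # {'sum_x': 90, 'sum_y': 474, 'sum_xy': 4739, 'sum_x2': 1396, 'sum_y2': 29642}
```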
Compute the correlation coefficient between the two sets of data and then interpret the result.

Using Σx = 90, Σy = 474, Σxy = 4739, Σx^2 = 1396, Σy^2 = 29642, and n = 8:

$$ r = \frac{4739 - (90)(474)/8}{\sqrt{1396 - (90)^2/8}\,\sqrt{29642 - (474)^2/8}} = \frac{-593.5}{\sqrt{383.5}\,\sqrt{1557.5}} \approx -0.77 $$

The relationship is strong, but not very strong. Driving experience and monthly auto insurance premium are negatively related: drivers with more experience tend to pay lower monthly premiums.
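The same arithmetic can be done from the column totals alone. A minimal sketch, with a hypothetical helper name:

```python
import math

def correlation_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Shortcut formula for r computed from column totals."""
    s_xy = sum_xy - sum_x * sum_y / n  # -593.5 for this table
    s_xx = sum_x2 - sum_x ** 2 / n     # 383.5
    s_yy = sum_y2 - sum_y ** 2 / n     # 1557.5
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))

r = correlation_from_sums(n=8, sum_x=90, sum_y=474, sum_xy=4739, sum_x2=1396, sum_y2=29642)
print(f"r = {r:.2f}")  # r = -0.77
```

Working from totals is convenient by hand, but with large values the subtraction can lose precision in floating point; the definition-based formula is numerically safer.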
100 Cars on the Lot of a Used-Car Dealership
Would you expect a positive association, a negative association, or no association between the age of a car and the mileage on its odometer?

Regression Line
The regression line predicts the value of the response variable y as a straight-line function of the value of the explanatory variable x. It has the form

$$ \hat{y} = a + bx $$

Important terms: $\hat{y}$ is the predicted value, a is the y-intercept, and b is the slope.
How Can Anthropologists Predict Height Using Human Remains?
Regression equation: $\hat{y} = 61.4 + 2.4x$, where $\hat{y}$ is the predicted height and x is the length of the femur (thighbone), both measured in centimeters.

Use the regression equation to predict the height of a person whose femur length was 50 centimeters:

$$ \hat{y} = 61.4 + 2.4(50) = 181.4 \text{ cm} $$

Regression Line: Interpreting the y-Intercept
The y-intercept is the predicted value of y when x = 0. It helps in plotting the line, but it may not have any interpretive value if no observations had x values near 0.
Regression Line: Interpreting the Slope
The slope measures the change in the predicted value of the response variable for every one-unit increase in the explanatory variable.
Example: a 1 cm increase in femur length corresponds to a 2.4 cm increase in predicted height.
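A minimal sketch of the anthropology equation as a Python function (the name `predicted_height` is hypothetical) shows the prediction at 50 cm, the y-intercept, and the slope as the change in prediction per 1 cm of femur length:

```python
def predicted_height(femur_cm, a=61.4, b=2.4):
    """Predicted height (cm) from femur length (cm) using y-hat = a + b * x."""
    return a + b * femur_cm

print(round(predicted_height(50), 1))                           # 181.4
print(round(predicted_height(51) - predicted_height(50), 1))    # 2.4, the slope
print(predicted_height(0))                                      # 61.4, the y-intercept
```

`predicted_height(0)` returns the intercept, 61.4 cm, which has no practical meaning here: no observed femur has a length near 0.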
Regression Line: Slope Values: Positive, Negative, Equal to 0
[Figure: regression lines with positive, negative, and zero slope]

Residual
The prediction error, also called the residual, for an observation is computed by $y - \hat{y}$, that is, the difference between the observed value and the predicted value.
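To make the definition concrete, the sketch below computes the residuals $y - \hat{y}$ for the eight drivers in the auto-insurance example of this section, using the fitted line $\hat{y} = 76.6605 - 1.5476x$ reported for that example. The helper name `predict` is hypothetical.

```python
# Driving experience (x) and monthly premium (y) from the auto-insurance example
xs = [5, 2, 12, 9, 15, 6, 25, 16]
ys = [64, 87, 50, 71, 44, 56, 42, 60]


def predict(x):
    """Least-squares line reported for this data: y_hat = 76.6605 - 1.5476 x."""
    return 76.6605 - 1.5476 * x


# Residual = observed value - predicted value
residuals = [y - predict(x) for x, y in zip(xs, ys)]
for x, y, e in zip(xs, ys, residuals):
    print(f"x={x:2d}  y={y}  residual={e:+.4f}")
```

For a least-squares line the residuals sum to (essentially) zero, so positive and negative prediction errors balance out.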
- A large residual indicates an unusual observation.
- Large residuals can easily be found by constructing a histogram of the residuals.

"Least Squares Method" Yields the Regression Line
- Residual sum of squares: $\sum (\text{residuals})^2 = \sum (y - \hat{y})^2$
- The optimal line through the data is the line that minimizes the residual sum of squares.
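The least-squares idea can be checked numerically. This sketch, reusing the insurance data from this section, evaluates the residual sum of squares for the reported least-squares line and for a few alternative lines; the alternatives are hypothetical, picked only for comparison.

```python
xs = [5, 2, 12, 9, 15, 6, 25, 16]
ys = [64, 87, 50, 71, 44, 56, 42, 60]


def rss(a, b):
    """Residual sum of squares, sum of (y - (a + b x))^2, for the line y_hat = a + b x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


best = rss(76.6605, -1.5476)  # reported least-squares line
print(f"least-squares line: RSS = {best:.2f}")

# Moving the intercept or slope away from the least-squares values increases the RSS
for a, b in [(78.0, -1.5476), (76.6605, -1.4), (70.0, -1.2)]:
    print(f"a={a}, b={b}: RSS = {rss(a, b):.2f}")
```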
The Slope and y-Intercept
The slope b and y-intercept a of the least-squares regression line can be computed as follows, where r is the correlation, $s_x$ and $s_y$ are the standard deviations of x and y, and n is the number of observations:
$$b = r\,\frac{s_y}{s_x} \quad\text{or}\quad b = \frac{\sum xy - (\sum x)(\sum y)/n}{\sum x^2 - (\sum x)^2/n}, \qquad a = \bar{y} - b\,\bar{x}$$
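Both slope formulas give the same number. The sketch below computes b both ways for the insurance data from this section. It uses the sample standard deviation (divisor n - 1), although any consistent choice of divisor cancels in $r\,s_y/s_x$. The helpers `mean` and `sample_sd` are illustrative.

```python
import math

xs = [5, 2, 12, 9, 15, 6, 25, 16]
ys = [64, 87, 50, 71, 44, 56, 42, 60]
n = len(xs)


def mean(v):
    return sum(v) / len(v)


def sample_sd(v):
    m = mean(v)
    return math.sqrt(sum((t - m) ** 2 for t in v) / (len(v) - 1))


# Computational formula: b = [sum xy - (sum x)(sum y)/n] / [sum x^2 - (sum x)^2/n]
sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
b_computational = sxy / sxx

# Correlation formula: b = r * (s_y / s_x)
r = sxy / math.sqrt(sxx * syy)
b_from_r = r * sample_sd(ys) / sample_sd(xs)

a = mean(ys) - b_computational * mean(xs)  # a = y_bar - b * x_bar
print(f"b = {b_computational:.4f} (computational), {b_from_r:.4f} (r * s_y / s_x)")
print(f"a = {a:.4f}")
```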
Driving Experience vs. Monthly Auto Insurance Premium
A random sample of eight drivers insured with a company and having similar auto-insurance policies was selected.

| Driving Experience ($x$) | Monthly Auto Insurance Premium ($y$) | $xy$ | $x^2$ | $y^2$ |
|---|---|---|---|---|
| 5 | 64 | 320 | 25 | 4096 |
| 2 | 87 | 174 | 4 | 7569 |
| 12 | 50 | 600 | 144 | 2500 |
| 9 | 71 | 639 | 81 | 5041 |
| 15 | 44 | 660 | 225 | 1936 |
| 6 | 56 | 336 | 36 | 3136 |
| 25 | 42 | 1050 | 625 | 1764 |
| 16 | 60 | 960 | 256 | 3600 |
| $\sum x = 90$ | $\sum y = 474$ | $\sum xy = 4739$ | $\sum x^2 = 1396$ | $\sum y^2 = 29642$ |
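The column totals in the table can be reproduced with a few lines of Python (a sketch; the variable names are illustrative).

```python
# Rows of the table: (driving experience x, monthly premium y)
data = [(5, 64), (2, 87), (12, 50), (9, 71), (15, 44), (6, 56), (25, 42), (16, 60)]

sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)
sum_xy = sum(x * y for x, y in data)
sum_x2 = sum(x * x for x, _ in data)
sum_y2 = sum(y * y for _, y in data)

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 90 474 4739 1396 29642
```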
Revisiting Driving Experience vs. Monthly Auto Insurance Premium
Find the regression line; that is, find the slope and the y-intercept, using
$n = 8$, $\sum x = 90$, $\sum y = 474$, $\sum xy = 4739$, $\sum x^2 = 1396$, $\sum y^2 = 29642$.
$$b = \frac{\sum xy - (\sum x)(\sum y)/n}{\sum x^2 - (\sum x)^2/n} = \frac{4739 - (90)(474)/8}{1396 - (90)^2/8} = \frac{-593.5}{383.5} = -1.5476$$
$\bar{x} = \sum x / n = 90/8 = 11.25$ and $\bar{y} = \sum y / n = 474/8 = 59.25$
$a = \bar{y} - b\,\bar{x} = 59.25 - (-1.5476)(11.25) = 76.6605$
Thus our estimated regression line $\hat{y} = a + bx$ is $\hat{y} = 76.6605 - 1.5476x$.
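Repeating the slide's arithmetic in code is a useful check. Carrying b at full precision gives $a \approx 76.6604$; the value 76.6605 above comes from rounding b to -1.5476 before computing a. The sketch shows both routes.

```python
n = 8
sum_x, sum_y, sum_xy, sum_x2 = 90, 474, 4739, 1396  # column totals from the table

b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)  # -593.5 / 383.5
x_bar, y_bar = sum_x / n, sum_y / n                            # 11.25, 59.25
a = y_bar - b * x_bar                                          # full-precision b
a_slide = y_bar - round(b, 4) * x_bar                          # b rounded to -1.5476 first

print(f"b = {b:.4f}")
print(f"a = {a:.4f} (full precision), {a_slide:.4f} (rounded b)")
```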
Similarities between Correlation and Regression
- Both assume a linear association between x and y.
- The signs of the correlation (r) and the regression slope (b) are always the same for any given data, because $b = r\,(s_y/s_x)$ and the standard deviations $s_x$ and $s_y$ are positive.
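A quick numerical check, sketched with a hypothetical helper `r_and_slope`: r and b share the numerator $\sum xy - (\sum x)(\sum y)/n$ and both have positive denominators, so they must have the same sign. Reversing the sign of every y flips both.

```python
import math


def r_and_slope(xs, ys):
    """Return (correlation r, least-squares slope b) for predicting ys from xs."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    # Same numerator sxy; the denominators sqrt(sxx * syy) and sxx are positive
    return sxy / math.sqrt(sxx * syy), sxy / sxx


xs = [5, 2, 12, 9, 15, 6, 25, 16]
ys = [64, 87, 50, 71, 44, 56, 42, 60]
r, b = r_and_slope(xs, ys)
print(f"r = {r:.4f}, b = {b:.4f}")
```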
Differences between Correlation and Regression
- The regression line for predicting y from x is different from the line for predicting x from y. However, the correlation between x and y is the same as the correlation between y and x.
- The value of the regression slope depends on the units of measurement, while the value of the correlation does not.
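The sketch below illustrates both differences with the insurance data from this section. The rescaled response (every y multiplied by 100) is a hypothetical unit change chosen only for illustration; `slope` and `corr` are illustrative helpers.

```python
import math


def slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    return sxy / sxx


def corr(xs, ys):
    """Pearson correlation between xs and ys."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return sxy / math.sqrt(sxx * syy)


xs = [5, 2, 12, 9, 15, 6, 25, 16]
ys = [64, 87, 50, 71, 44, 56, 42, 60]

print(slope(xs, ys), slope(ys, xs))  # y-on-x and x-on-y slopes differ
print(corr(xs, ys), corr(ys, xs))    # correlation is symmetric

ys_scaled = [100 * y for y in ys]    # hypothetical change of units for y
print(slope(xs, ys_scaled))          # slope is multiplied by 100
print(corr(xs, ys_scaled))           # correlation is unchanged
```

Note that the x-on-y slope is not simply the reciprocal of the y-on-x slope; the two lines coincide only when $|r| = 1$.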
r-Squared ($r^2$)
- The coefficient of determination, denoted by $r^2$, measures the proportion of the variability in y that is accounted for by the linear association between x and y.
- In a regression context, $r^2$ can be interpreted as the proportional reduction in error: how much smaller the sum of squared prediction errors is when y is predicted with the regression line instead of with $\bar{y}$.
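The proportional-reduction-in-error reading of $r^2$ can be verified directly: compare the sum of squared errors when every y is predicted by $\bar{y}$ with the sum when y is predicted by the regression line. A sketch using this section's insurance data:

```python
xs = [5, 2, 12, 9, 15, 6, 25, 16]
ys = [64, 87, 50, 71, 44, 56, 42, 60]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Least-squares slope and intercept
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sxy / sxx
a = y_bar - b * x_bar

# Sum of squared errors when predicting every y by y_bar, and by the line
sse_mean = sum((y - y_bar) ** 2 for y in ys)
sse_line = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

pre = (sse_mean - sse_line) / sse_mean   # proportional reduction in error
r_squared = sxy ** 2 / (sxx * sse_mean)  # square of the correlation
print(f"PRE = {pre:.4f}, r^2 = {r_squared:.4f}")
```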
Revisiting Driving Experience vs. Monthly Auto Insurance Premium
$r \approx -0.77$, hence $r^2 \approx (-0.77)^2 = 0.5929$.
This means that the prediction error (sum of squared errors) using the regression line to predict y is about 59% smaller than the prediction error using $\bar{y}$ to predict y.
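Computing r from the column totals, $r = \dfrac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{[\sum x^2 - (\sum x)^2/n][\sum y^2 - (\sum y)^2/n]}}$, gives $r \approx -0.768$ and an unrounded $r^2 \approx 0.590$; the value 0.5929 comes from squaring r after rounding it to -0.77. A short sketch:

```python
import math

n = 8
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 90, 474, 4739, 1396, 29642

s_xy = sum_xy - sum_x * sum_y / n  # -593.5
s_xx = sum_x2 - sum_x ** 2 / n     # 383.5
s_yy = sum_y2 - sum_y ** 2 / n     # 1557.5

r = s_xy / math.sqrt(s_xx * s_yy)
print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}")
print(f"with r rounded first: (-0.77)^2 = {(-0.77) ** 2:.4f}")
```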
Extrapolation
Extrapolation: using a regression line to predict y-values for x-values outside the observed range of the data.
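One defensive pattern, sketched below under the assumption that the observed x range of the insurance example (2 to 25) is the only range we trust, is to refuse predictions outside that range. The function name `predict_premium` is hypothetical. Plugging x = 60 into the line would give $76.6605 - 1.5476(60) \approx -16.2$, a negative premium, which shows how a line can break down outside the data.

```python
xs = [5, 2, 12, 9, 15, 6, 25, 16]  # observed driving experience
X_MIN, X_MAX = min(xs), max(xs)     # observed range: 2 to 25


def predict_premium(x):
    """Predict with y_hat = 76.6605 - 1.5476 x, refusing to extrapolate."""
    if not X_MIN <= x <= X_MAX:
        raise ValueError(
            f"x = {x} is outside the observed range [{X_MIN}, {X_MAX}]; "
            "the fitted line may not hold there"
        )
    return 76.6605 - 1.5476 * x


print(f"{predict_premium(10):.2f}")  # inside the observed range: allowed
try:
    predict_premium(60)              # far outside the observed range
except ValueError as err:
    print("refused:", err)
```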
Extrapolation is not recommended because there is no assurance that the same regression line extends beyond the observed range of x.
Example (extrapolation): Height in inches (y) and weight in pounds (x) may be roughly linearly associated. Suppose that a regression equation has been obtained.
Can you confidently predict the height of a person whose weight is 350 pounds?

Be Cautious of Influential Outliers
- Both influential observations and regression outliers affect the regression line.
- Influential observations: usually have relatively small or large x values; they typically affect the regression slope.
- Regression outliers: are far from the rest of the observations; they typically affect the y-intercept.
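One simple way to spot influential observations, sketched here for this section's insurance data, is to refit the line with each observation left out and see how much the slope moves. Leave-one-out refitting is a standard diagnostic but is not one described in these slides; `fit` is a hypothetical helper.

```python
def fit(xs, ys):
    """Least-squares intercept a and slope b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - b * x_bar, b


xs = [5, 2, 12, 9, 15, 6, 25, 16]
ys = [64, 87, 50, 71, 44, 56, 42, 60]
a_all, b_all = fit(xs, ys)

# Refit with each observation left out; a large shift in b flags an influential point
for i in range(len(xs)):
    a_i, b_i = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    print(f"drop (x={xs[i]}, y={ys[i]}): b = {b_i:.4f} "
          f"(change {b_i - b_all:+.4f}), a = {a_i:.4f}")
```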
Outlier
Data points that diverge from the overall pattern and have large residuals are called outliers. Outliers limit the fit of the regression equation to the data. This is illustrated in the scatter plots below: the coefficient of determination is larger when the outlier is not present.
With outlier (scatter plot): regression equation $\hat{y} = 97.51 - 3.32x$; coefficient of determination $R^2 = 0.55$.
Without outlier (scatter plot): regression equation $\hat{y} = 104.78 - 4.10x$; coefficient of determination $R^2 = 0.94$.
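The effect of an outlier on the fit can be checked numerically. This is a minimal Python sketch, assuming made-up data (not the data behind the scatter plots) and an illustrative helper `fit_line`. It fits the least-squares line $\hat{y} = a + bx$ with and without one point that has an ordinary x value but a large residual.

```python
def fit_line(xs, ys):
    """Least-squares line y-hat = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                    # total variation
    return a, b, 1 - ss_res / ss_tot

# Hypothetical data lying close to a straight downward line
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [96, 93, 87, 84, 80, 75, 72, 68, 63, 60]

# One hypothetical outlier: x near the middle of the range, y far above the trend
xs_out = xs + [5]
ys_out = ys + [110]

a_without, b_without, r2_without = fit_line(xs, ys)
a_with, b_with, r2_with = fit_line(xs_out, ys_out)

print(f"without outlier: y-hat = {a_without:.2f} + ({b_without:.2f})x, R^2 = {r2_without:.2f}")
print(f"with outlier:    y-hat = {a_with:.2f} + ({b_with:.2f})x, R^2 = {r2_with:.2f}")
```

With these made-up numbers, $R^2$ drops from nearly 1 to about 0.64. The slope changes only slightly (about $-4.06$ to $-4.23$), while the intercept rises by about 3.6. This matches the remark that a regression outlier mainly affects the y-intercept.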
Influential
Influential points are data points with extreme values that greatly affect the slope of the regression line. Note that this influential point, unlike the outlier discussed above, did not reduce the coefficient of determination. In fact, the coefficient of determination was larger when the influential point was present.
Without influential point: regression equation $\hat{y} = 92.54 - 2.5x$; slope $= -2.5$; coefficient of determination $R^2 = 0.46$.
With influential point: regression equation $\hat{y} = 87.59 - 1.6x$; slope $= -1.6$; coefficient of determination $R^2 = 0.52$.
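A similar sketch for an influential point follows. The data and the helper names `fit_line` and `leverages` are illustrative assumptions. The extra point at x = 16 lies far to the right of the other x values, so it has high leverage. The code measures leverage with the simple-regression formula $h_i = 1/n + (x_i - \bar{x})^2 / S_{xx}$.

```python
def fit_line(xs, ys):
    """Least-squares line y-hat = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

def leverages(xs):
    """Leverage h_i = 1/n + (x_i - mean)^2 / Sxx for simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]

# Hypothetical data with a roughly linear downward trend
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [88, 86, 80, 82, 76, 75, 70, 72]

# One hypothetical point with an extreme x value that lies above the trend
xs_infl = xs + [16]
ys_infl = ys + [70]

a_without, b_without, r2_without = fit_line(xs, ys)
a_with, b_with, r2_with = fit_line(xs_infl, ys_infl)
h = leverages(xs_infl)

print(f"without influential point: y-hat = {a_without:.2f} + ({b_without:.2f})x, R^2 = {r2_without:.2f}")
print(f"with influential point:    y-hat = {a_with:.2f} + ({b_with:.2f})x, R^2 = {r2_with:.2f}")
print(f"leverage of the point at x = 16: {h[-1]:.2f}")
```

The slope flattens from about $-2.54$ to about $-1.22$. In this made-up data $R^2$ falls (about 0.92 to about 0.66), whereas in the slide example it rose. Whether $R^2$ rises or falls depends on where the influential point sits relative to the trend of the other points.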
Correlation Does Not Imply Causation
Sometimes, a correlation between two quantitative variables can be spurious.
Lurking Variable
A lurking variable is a variable (usually unobserved) that influences the association between the variables of interest.
Lurking Variable
Amount of ice cream consumption (x) and number of drowning incidents (y) have been shown to have a positive correlation. "Does eating ice cream cause people to drown?" No. "High heat" (in the summer months) is a lurking variable that causes both x and y to go up, but there is no direct connection between ice cream and drowning.
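The ice cream example can be mimicked with a small simulation. The sketch assumes a hypothetical model in which temperature drives both ice cream consumption and drowning incidents, with no direct link between them. All coefficients and noise levels are made up. The code then removes the linear effect of temperature from each variable and correlates the least-squares residuals. This makes the association essentially disappear.

```python
import random

def mean(values):
    return sum(values) / len(values)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(xs, ys):
    """Residuals y - y-hat from the least-squares line of ys on xs."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return [y - (a + b * x) for x, y in zip(xs, ys)]

rng = random.Random(2010)  # fixed seed so the run is reproducible
n = 1000

# Hypothetical model: temperature (the lurking variable) drives both variables
temperature = [rng.uniform(15, 35) for _ in range(n)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]

raw_r = pearson_r(ice_cream, drownings)
# Correlate what is left of each variable after removing the linear effect of temperature
adjusted_r = pearson_r(residuals(temperature, ice_cream), residuals(temperature, drownings))

print(f"correlation of ice cream and drownings: {raw_r:.2f}")
print(f"after adjusting for temperature:        {adjusted_r:.2f}")
```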
Simpson's Paradox
An association between two categorical variables can also be spurious (hence, association does not imply causation). Recall the data from UC Berkeley.
Gender   Admitted       Denied
Male     1198 (44.5%)   1493 (55.5%)
Female    557 (30.4%)   1278 (69.6%)

Proportionately, more males than females were admitted to graduate programs. Does this constitute evidence of a gender bias?

Let us break down the 2 × 2 table into two sub-tables according to the selectivity of the program.

Highly Selective Programs
Gender   Admitted      Denied
Male     334 (25.6%)    972 (74.4%)
Female   451 (26.5%)   1251 (73.5%)

Moderately Selective Programs
Gender   Admitted      Denied
Male     864 (62.4%)   521 (37.6%)
Female   106 (79.7%)    27 (20.3%)

Here, in both highly and moderately selective programs, the apparent gender bias is reversed: a higher proportion of female applicants than male applicants was admitted.
Selectivity of the Programs
In the original table, the female applicants appear to be at a disadvantage. This is because they tended to apply to the more competitive programs (92.8% of all female applications vs. 48.5% of all male applications), where the acceptance rate was low. This example illustrates the phenomenon called Simpson's paradox.
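The percentages in the Berkeley tables, and the 92.8% vs. 48.5% split, can be recomputed directly from the counts. This Python sketch uses only the counts shown in the tables. The dictionary layout and variable names are illustrative choices. Collapsing the two program types reproduces the original 2 × 2 table, where the direction of the gender difference flips.

```python
# Counts (admitted, denied) taken from the Berkeley tables
counts = {
    "Highly selective": {"Male": (334, 972), "Female": (451, 1251)},
    "Moderately selective": {"Male": (864, 521), "Female": (106, 27)},
}

def admission_rate(admitted, denied):
    return admitted / (admitted + denied)

# Admission rate by program type and gender (the two sub-tables)
by_program = {
    program: {g: admission_rate(*c) for g, c in groups.items()}
    for program, groups in counts.items()
}
for program, rates in by_program.items():
    print(program, {g: f"{r:.1%}" for g, r in rates.items()})

# Collapse over program type to recover the original 2 x 2 table
totals = {}
for groups in counts.values():
    for gender, (admitted, denied) in groups.items():
        t = totals.setdefault(gender, [0, 0])
        t[0] += admitted
        t[1] += denied
overall = {g: admission_rate(*t) for g, t in totals.items()}
print("Overall", {g: f"{r:.1%}" for g, r in overall.items()})

# Share of each gender's applications that went to highly selective programs
share_highly = {g: sum(counts["Highly selective"][g]) / sum(totals[g]) for g in totals}
print("Applied to highly selective", {g: f"{s:.1%}" for g, s in share_highly.items()})
```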
Confounding
Confounding occurs when two explanatory variables, which are associated with each other, both affect the response variable. When two variables are confounded, it is unclear which one is the true cause of the response.

Confounding
Consider an experiment in which the longevities of three brands of tires are examined: the Brand A tire is tested by Dattatreya, the Brand B tire by Ekachakra, and the Brand C tire by Firaki. Suppose that the Brand A tire, driven by Dattatreya, lasted the longest. Can we conclude that this brand really lasts the longest? Or could it be that Dattatreya was the best driver?

Chapter 4: Gathering Data

Experimental design
The goal of every statistical study is to collect data and then use the data to make decisions. Any decision you make using the results of a statistical study is only as good as the process used to obtain the data. If the process is flawed, then the resulting decision is questionable. While you may never have to develop a statistical study, it is likely that you will have to interpret the results of one. Before you interpret the results of a study, you should determine whether or not the results are valid. In other words, you should be familiar with how to design a statistical study.
Design a Statistical Study
Guidelines
1. Identify the variable(s) of interest (the focus) and the population of the study.
2. Develop a detailed plan for collecting data. If you use a sample, make sure the sample is representative of the population.
3. Collect the data.
4. Describe the data using descriptive statistics techniques.
5. Interpret the data and make decisions about the population using inferential statistics.
6. Identify any possible errors.
How to Collect Data
There are several ways to collect data. Often, the focus of the study dictates the best way to collect data. The following is a summary of four methods of data collection.
Take a Census
A census is a count or measure of an entire population. Taking a census provides complete information, but it is often costly and difficult to perform.
Use Sampling
A sample is a count or measure of part of a population. The statistics calculated from a sample are used to estimate various population parameters. For instance, every year the U.S. Census Bureau samples the U.S. population to update the most recent census data. Using sampling is often more practical than taking a census.
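To make the census-versus-sample distinction concrete, here is a minimal Python sketch. It assumes a made-up population of 10,000 simulated heights (not real data): the population mean is the parameter a census would measure, and the mean of a random sample of 400 is the statistic used to estimate it. The function name `estimate_mean` is hypothetical.

```python
import random
import statistics


def estimate_mean(population, sample_size, seed=None):
    """Return (parameter, statistic): the population mean and the mean of a random sample."""
    rng = random.Random(seed)
    parameter = statistics.mean(population)       # what a census would measure
    sample = rng.sample(population, sample_size)  # measure only part of the population
    statistic = statistics.mean(sample)           # used to estimate the parameter
    return parameter, statistic


# Hypothetical population: 10,000 simulated heights in inches, not real data
pop_rng = random.Random(1)
population = [pop_rng.gauss(67, 4) for _ in range(10_000)]

parameter, statistic = estimate_mean(population, sample_size=400, seed=2)
print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The sample mean will not equal the population mean exactly, but it is typically close, at a fraction of the cost of measuring all 10,000 members.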
Use a Simulation
A simulation is the use of a mathematical or physical model to reproduce the conditions of a situation or process. Collecting data from a simulation often involves the use of computers. Simulations allow you to study situations that are impractical or even dangerous to create in real life, and they often save time and money. For instance, automobile manufacturers use simulations with dummies to study the effects of crashes on humans.
Perform an Experiment
When performing an experiment, a treatment is applied to part of a population and responses are observed. Another part of the population is often used as a control group; this group receives no treatment or is given a placebo. After responses from both groups are observed, the results are compared. For instance, to test the effect of imposing a new marketing strategy, the strategy could be applied in one region and the results compared with a similar region that does not use it. Here each region acts as a block (a group of experimental units), and care must be taken to ensure that the blocks are similar.
Once you determine which method you will use to collect data, you might decide that a survey can help you. A survey can be used to take a census or a sample. A survey is an investigation of one or more characteristics of a population. Most often, surveys are carried out on people by asking them questions. A disadvantage of using a survey to collect data is that the wording of the questions can lead to biased results.
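The slide does not say how subjects are split into a treatment group and a control group; a common choice, assumed here, is random assignment. The sketch below shuffles a list of hypothetical subject IDs and splits it in half. `assign_groups` is an illustrative name, not a library function.

```python
import random


def assign_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    treatment = shuffled[:half]  # receives the treatment
    control = shuffled[half:]    # receives no treatment or a placebo
    return treatment, control


# Hypothetical subject IDs S01 .. S20
subjects = [f"S{i:02d}" for i in range(1, 21)]
treatment, control = assign_groups(subjects, seed=42)
print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides who lands in each group, the two groups tend to be similar before the treatment is applied, so a difference in responses afterward is easier to attribute to the treatment.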
Deciding Upon Methods of Data Collection
For each of the following studies, which method of data collection would you use?
A study of the effects of changing flight patterns on the number of airplane accidents. Because it is impractical to create this situation, you would want to use a simulation.
A study of the effect of aspirin on preventing heart attacks. In this study, you want to measure the effect that a treatment (taking an aspirin) has on patients. So, you would want to perform an experiment.
A study of the weights of all linemen in the National Football League. Because National Football League teams keep accurate physical records of all players, you could take a census.
A study of U.S. residents’ approval rating of the U.S. president. It would be nearly impossible to ask every person in the U.S. whether or not he or she approves of the president’s job performance. So, you should use sampling to collect these data.
To collect unbiased data, it is important that the sample be representative of the population. Appropriate sampling techniques must be used to ensure that inferences about the population are valid. Remember that when a study is done with faulty data, the results are questionable. A biased sample is one that is not representative of the population from which it is drawn. For instance, a sample consisting of only 18- to 22-year-old college students would not be representative of the entire 18- to 22-year-old population in the country.
Random Sample: Simple Random Sample
A random sample is one in which every member of the population has an equal chance of being selected. A simple random sample is a sample in which every possible sample of the same size has the same chance of being selected. One way to collect a simple random sample is to assign a different number to each member of the population and then use random numbers (from a table or a random number generator) to select the members whose numbers are drawn.
Random Sample: Stratified Sample
When it is important for the sample to have members from each segment of the population, you should use a stratified sample. Depending on the focus of the study, members of the population are divided into two or more different subsets, called strata, that share a similar characteristic such as age, gender, ethnicity, or even political preference. A sample is then randomly selected from each stratum. Using a stratified sample ensures that each segment of the population is represented.
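Both procedures can be sketched in a few lines of Python. The simple random sample follows the numbering idea above; the stratified version assumes equal allocation (the same number drawn from every stratum), which is only one option, since proportional allocation is also common. The function names and the small population of (name, age group) pairs are hypothetical.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, seed=None):
    """Number the members 0..N-1, draw n distinct random numbers, and return those members."""
    rng = random.Random(seed)
    numbers = rng.sample(range(len(population)), n)  # every set of n numbers is equally likely
    return [population[i] for i in numbers]


def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Group members into strata with stratum_of(member), then draw a simple random sample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    # rng.sample raises ValueError if a stratum has fewer than n_per_stratum members
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


# Hypothetical population of 30 people, each tagged with an age group (the stratum)
groups = ["18-29", "30-49", "50+"]
people = [(f"P{i}", groups[i % 3]) for i in range(30)]

print(simple_random_sample(people, 5, seed=1))
print(stratified_sample(people, stratum_of=lambda p: p[1], n_per_stratum=2, seed=1))
```

A simple random sample of 5 could, by chance, miss an age group entirely; the stratified sample always contains members of every group.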
Random Sample: Cluster Sample
When the population falls into naturally occurring subgroups, each having similar characteristics, a cluster sample may be the most appropriate. To select a cluster sample, divide the population into groups, called clusters, and select all of the members in one or more (but not all) of the clusters.
A type of sample that often leads to biased results (and is not recommended) is a convenience sample. A convenience sample consists only of available members of the population.
Identify Sampling Techniques
You are doing a study to determine the opinion of students at your school regarding gun control. Identify the sampling technique you are using if you select the samples listed.
You select a class at random and question each student in the class. Because each class is a naturally occurring subgroup (a cluster) and you question each student in the class, this is a cluster sample.
You divide the student population with respect to majors and randomly select and question some students in each major. Because the students are divided into strata (majors) and a sample is selected from each major, this is a stratified sample.
You assign each student a number and generate random numbers. You then question each student whose number is randomly selected. Each sample of the same size has an equal chance of being selected, and each student has an equal chance of being selected, so this is a simple random sample.
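The cluster-sample case from the gun-control example ("select a class at random and question each student") can be sketched as follows. The class names and rosters are hypothetical, and `cluster_sample` is an illustrative name; the key difference from a stratified sample is that whole clusters are chosen at random and every member of a chosen cluster is kept.

```python
import random


def cluster_sample(clusters, k, seed=None):
    """Randomly choose k of the clusters (not all) and return every member of the chosen ones."""
    if not 0 < k < len(clusters):
        raise ValueError("choose at least one cluster, but not all of them")
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)  # sorted keys give a reproducible order
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical classes at a school; each class is a naturally occurring cluster
classes = {
    "MATH101": ["Ana", "Ben", "Caro"],
    "HIST210": ["Dev", "Eli"],
    "CHEM150": ["Fay", "Gus", "Hal", "Ivy"],
}
print(cluster_sample(classes, k=1, seed=7))
```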
Margin of Error
For a random sample of size $n$, the approximate margin of error (in percent) is defined to be $\frac{1}{\sqrt{n}} \times 100$.
Example of Margin of Error
Suppose that 72% of 512 people indicated "life is good" in a sample survey. The margin of error is $\frac{1}{\sqrt{512}} \times 100 \approx 4.4$. So, the true percentage of people who feel "life is good" may be as low as $72 - 4.4 = 67.6$% and as high as $72 + 4.4 = 76.4$%.
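The formula and the worked example can be checked with a few lines of Python. This is a sketch of the rule of thumb above only (it is usually described as an approximate 95% margin of error for a percentage estimated from a simple random sample); the function names are hypothetical.

```python
import math


def margin_of_error(n):
    """Approximate margin of error, in percentage points, for a sample of size n: (1 / sqrt(n)) * 100."""
    return 100 / math.sqrt(n)


def plausible_range(sample_percent, n):
    """Return (low, high): the sample percentage minus and plus the margin of error."""
    moe = margin_of_error(n)
    return sample_percent - moe, sample_percent + moe


moe = margin_of_error(512)
low, high = plausible_range(72, 512)
print(f"margin of error: {moe:.1f}")         # 4.4
print(f"range: {low:.1f}% to {high:.1f}%")   # 67.6% to 76.4%
```

Because the margin shrinks like $1/\sqrt{n}$, quadrupling the sample size only halves it: a sample of 100 gives 10 percentage points, while a sample of 400 gives 5.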
Be Wary of Sources of Potential Bias in Sample Surveys
Sampling bias occurs from using nonrandom samples or from undercoverage (i.e., the sampling frame excludes some parts of the population).
Nonresponse bias occurs when some sampled individuals cannot be reached or when some individuals fail to answer some questions, which produces missing data.
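A small simulation can make sampling bias and nonresponse bias concrete. The sketch below is entirely hypothetical: the population of 10,000 people, the two made-up groups ("rural" with 80% support, "urban" with 50% support), and the response probabilities of 0.9 for supporters and 0.5 for opponents are assumptions chosen for illustration, not data from the slides. It compares a simple random sample with a sample from a frame that leaves out the rural group (undercoverage) and with a sample in which supporters are more likely to answer (nonresponse).

```python
import random


def make_population():
    """Hypothetical population; 1 = supports the policy, 0 = does not."""
    rural = [("rural", 1)] * 2400 + [("rural", 0)] * 600   # assumed 80% support
    urban = [("urban", 1)] * 3500 + [("urban", 0)] * 3500  # assumed 50% support
    return rural + urban


def proportion(people):
    """Share of people whose answer is 1 (support)."""
    return sum(answer for _, answer in people) / len(people)


def simulate(n=1000, seed=1):
    rng = random.Random(seed)  # seeded so the run is reproducible
    population = make_population()
    true_p = proportion(population)  # 5900 / 10000 = 0.59

    # Simple random sample from the whole population: unbiased.
    srs = rng.sample(population, n)

    # Undercoverage: the sampling frame lists only urban residents.
    frame = [person for person in population if person[0] == "urban"]
    undercovered = rng.sample(frame, n)

    # Nonresponse: supporters respond with prob. 0.9, opponents with 0.5
    # (assumed rates), so the people who answer are not representative.
    contacted = rng.sample(population, n)
    responders = [person for person in contacted
                  if rng.random() < (0.9 if person[1] == 1 else 0.5)]

    return {
        "true": true_p,
        "srs": proportion(srs),
        "undercoverage": proportion(undercovered),
        "nonresponse": proportion(responders),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>13}: {100 * value:.1f}%")
```

Under these assumed numbers, the simple random sample should land near the true 59%, the undercovered frame near 50%, and the nonresponse sample near 72%; the exact values depend on the seed. Taking a larger sample does not remove either bias, because both come from *who* ends up in the sample rather than from how many.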
Response bias occurs when some individuals give incorrect responses, or when the question wording or the interviewing method influences the responses, for instance, the way the interviewer dresses or the way he or she talks.