Department of Economics
Prof. Derek DeLia
Econometrics 220:322:B6
Summer 2009

APPLICATIONS OF THE MULTIPLE REGRESSION MODEL

The model: $Y_i = \beta_1 + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$ ($i = 1, \ldots, n$).
Isolate the impact of one independent variable while holding the others fixed.
Take the assumptions of the CLR model as given.
Assume the error is normally distributed.
Inference is similar to that for the simple model, but now df = n – k.
IMPORTANT: k is the number of slopes plus the intercept.

GPA example

Y (GPA)   X (Parents' income in $1,000)   Hours of study per week
4.0       21.0                            25
3.0       15.0                            14
3.5       15.0                            16
2.0        9.0                             1
3.0       12.0                            11
3.5       18.0                            13
2.5        6.0                             3
2.5       12.0                             4

Simple regression results (with p-values in parentheses)

$\widehat{GPA}$ = 1.38 + 0.12*INCOME, R² = 0.78
                 (0.01)  (0.00)
  Income is a significant predictor of GPA at the 1% level.
  An extra $1,000 of income is associated with an additional 0.12 points of GPA.
  Income explains 78% of the variation in GPA.

$\widehat{GPA}$ = 2.15 + 0.08*HOURS, R² = 0.91
                 (0.00)  (0.00)
  Hours is a significant predictor of GPA at the 1% level.
  An extra hour of study is associated with an additional 0.08 points of GPA.
  Hours studied explains 91% of the variation in GPA.

Multiple regression results (with p-values in parentheses)

$\widehat{GPA}$ = 2.01 + 0.02*INCOME + 0.07*HOURS, R² = 0.92
                 (0.01)  (0.66)         (0.03)
  HOURS retains its predictive power.
  An extra $1,000 of income is associated with only an additional 0.02 points of GPA, holding hours of study fixed. The value 0.02 is not significantly different from zero at the 5% level.
  An extra hour of study is associated with an increase of 0.07 points of GPA, holding income fixed. The value 0.07 is significantly different from zero at the 5% level.
  Income and hours studied together explain 92% of the variation in GPA.

A note on R²
In multiple regression, adding more variables can never decrease R² and may increase it just by chance.
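The GPA regressions and the behavior of R² can be reproduced from the eight observations in the table. The following is a minimal sketch in plain Python, not the software used in the course; the helper names `solve` and `ols` are illustrative assumptions, and the estimates come from solving the normal equations directly.

```python
# Sketch: OLS on the eight-student GPA data from the table above.
# The helpers solve() and ols() are illustrative names, not course code.

GPA = [4.0, 3.0, 3.5, 2.0, 3.0, 3.5, 2.5, 2.5]
INCOME = [21.0, 15.0, 15.0, 9.0, 12.0, 18.0, 6.0, 12.0]  # in $1,000
HOURS = [25, 14, 16, 1, 11, 13, 3, 4]                    # per week


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, *regressors):
    """Return (coefficients, R-squared); an intercept is added automatically."""
    rows = [[1.0] + [float(x[i]) for x in regressors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


b_inc, r2_inc = ols(GPA, INCOME)
b_hrs, r2_hrs = ols(GPA, HOURS)
b_both, r2_both = ols(GPA, INCOME, HOURS)

print("GPA on INCOME:       ", [round(b, 2) for b in b_inc], round(r2_inc, 2))
print("GPA on HOURS:        ", [round(b, 2) for b in b_hrs], round(r2_hrs, 2))
print("GPA on INCOME, HOURS:", [round(b, 2) for b in b_both], round(r2_both, 2))
```

Adding INCOME to the HOURS regression raises R² only from about 0.91 to 0.92, which illustrates why a higher R² alone says little about whether the added variable matters.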
Another way to measure goodness of fit is adjusted R².
Adjusted R² corrects for the number of estimated coefficients, which is useful for models with many independent variables: adjusted R² = 1 – (1 – R²)(n – 1)/(n – k).

Ex. 4.1: Auto sales
How do personal income and interest rates affect sales of automobiles?

Data
  Time periods t: quarterly data from 1975-95 ==> N=82
  Dependent variable: Real auto sales in $billions – SR
  Independent variables:
    Real personal income in $billions – YPR
    Real interest rate in annual percentage – RR

Model
  $SR_t = \beta_1 + \beta_2 YPR_t + \beta_3 RR_t + \varepsilon_t$

Results (standard errors in parentheses)
  [Estimated equation not recoverable; only the standard errors survive: (0.2213) (0.0015) (0.4361)]

Interpret the coefficient for YPR and determine whether it is statistically significant at the 1% level.
Interpret the coefficient for RR and determine whether it is statistically significant at the 5% level.
How would the coefficient for YPR change if RR were omitted? (Hint: Consider the possible relationships between YPR and RR.)

Standardized coefficients
Allow comparison of slope estimates that are in different units (e.g., dollars of income, hours of study).
Measure each variable as the number of standard deviations from its mean.
This amounts to a regression where everything is measured as "Z-scores":
  $STGPA_i = \beta_2^{*} STINCOME_i + \beta_3^{*} STHOURS_i + \varepsilon_i$
  STGPA = (GPA – mean of GPA)/(Std dev of GPA)
  STINCOME & STHOURS are defined similarly.

Results (p-values in parentheses):
  STGPA = 0.14*STINCOME + 0.83*STHOURS, R² = 0.92
          (0.66)          (0.03)
  A one standard deviation increase in income is associated with a 0.14 standard deviation increase in GPA, holding hours of study fixed.
  A one standard deviation increase in hours studied is associated with a 0.83 standard deviation increase in GPA, holding income fixed.
  The intercept disappears, since it is impossible to standardize a constant.
  Notice that the p-values and R² are no different from the original regression.

Useful fact
Let $\hat{\beta}$ be the ordinary slope coefficient for independent variable X and dependent variable Y, and let $\hat{\beta}^{*}$ be the corresponding standardized coefficient.
Then the following equation is true: $\hat{\beta}^{*} = \hat{\beta} \, (s_X / s_Y)$, where $s_X$ is the sample standard deviation of X and $s_Y$ is the sample standard deviation of Y.
We will prove this for simple regression, but the formula works for multiple regression too.

Estimating elasticities
Elasticities are fundamental to economic analysis (e.g., price, income, and production elasticities).
They measure the sensitivity of quantity purchased to price, quantity purchased to income, production output to labor employed, etc.
We want to know what percentage change in Y is associated with a 1% change in X.

Example: Elasticity of GPA with respect to hours studied
  Elasticity = %change in GPA/%change in hours
  From micro, elasticity = $\frac{dGPA}{dHOURS} \cdot \frac{HOURS}{GPA}$
  The first term is the slope of the regression line.
  For the second term we will use the average values of GPA and HOURS.
  Elasticity = 0.07*(10.875/3) = 0.25 ==> a 10% increase in hours studied is associated with a 2.5% increase in GPA.
  Notice that the elasticity will be different at different points on the regression line (i.e., when GPA and HOURS are not equal to their average values).

Constant elasticity specification
Measure the variables in natural logs:
  $\log(GPA_i) = \beta_1 + \beta_2 \log(INCOME_i) + \beta_3 \log(HOURS_i) + \varepsilon_i$
Notice that the income elasticity can also be written as dlog(GPA)/dlog(INCOME), where the d refers to the derivative with respect to income.

Results (p-values in parentheses):
  LOG(GPA) = 0.57 + 0.05*LOG(INCOME) + 0.19*LOG(HOURS), R² = 0.93
            (0.05)  (0.64)              (0.01)
  A 10% increase in income is associated with a 0.5% increase in GPA, holding hours studied fixed.
  This elasticity is the same at every point of the curve.
  Similarly, a 10% increase in hours studied is associated with a 1.9% increase in GPA, holding income fixed.
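The standardized-coefficient formula, the elasticity at the means, and the constant-elasticity (log-log) estimates can all be checked against the eight GPA observations. This is a rough sketch in plain Python under the assumption that the data are exactly the eight rows of the GPA table; `solve`, `ols`, and `zscores` are illustrative helper names, not part of the course materials.

```python
# Sketch: standardized coefficients and elasticities for the GPA data.
# Helper names (solve, ols, zscores) are illustrative assumptions.
import math
from statistics import mean, stdev

GPA = [4.0, 3.0, 3.5, 2.0, 3.0, 3.5, 2.5, 2.5]
INCOME = [21.0, 15.0, 15.0, 9.0, 12.0, 18.0, 6.0, 12.0]
HOURS = [25.0, 14.0, 16.0, 1.0, 11.0, 13.0, 3.0, 4.0]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, slope_2, ...]."""
    rows = [[1.0] + [x[i] for x in regressors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    return solve(xtx, xty)


def zscores(values):
    """Express each value as sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


beta = ols(GPA, INCOME, HOURS)

# Standardized coefficients two ways: regress z-scores, or rescale the slopes.
beta_std = ols(zscores(GPA), zscores(INCOME), zscores(HOURS))
rescaled_income = beta[1] * stdev(INCOME) / stdev(GPA)
rescaled_hours = beta[2] * stdev(HOURS) / stdev(GPA)

# Elasticity of GPA with respect to HOURS, evaluated at the sample means.
elasticity_at_means = beta[2] * mean(HOURS) / mean(GPA)

# Constant-elasticity specification: regress log(GPA) on logged regressors.
beta_log = ols([math.log(v) for v in GPA],
               [math.log(v) for v in INCOME],
               [math.log(v) for v in HOURS])

print("standardized (z-score regression):", [round(b, 2) for b in beta_std[1:]])
print("standardized (slope * s_X / s_Y): ",
      round(rescaled_income, 2), round(rescaled_hours, 2))
print("elasticity at means:", round(elasticity_at_means, 2))
print("log-log coefficients:", [round(b, 2) for b in beta_log])
```

The elasticity at the means uses the unrounded slope, so it comes out slightly below the 0.07*(10.875/3) hand calculation but rounds to the same 0.25.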
Cobb-Douglas production function
Data: output Q, capital K, labor L
Need a simple way to represent the usual isoquants.
Cobb-Douglas model: $Q_i = \beta_1 K_i^{\beta_2} L_i^{\beta_3} e^{\varepsilon_i}$
Need to estimate the β's.
The model can be "linearized" by taking logs:
  $\log Q_i = \log \beta_1 + \beta_2 \log K_i + \beta_3 \log L_i + \varepsilon_i$
Re-write as
  $\log Q_i = \beta_1' + \beta_2 \log K_i + \beta_3 \log L_i + \varepsilon_i$, where $\beta_1' = \log \beta_1$
Now the model is linear in the parameters.
The slope coefficients can be interpreted as elasticities.

Modeling nonlinear effects
Average cost curve
Average cost (AC) as a function of output (Q) for firms i=1,…,N.
The curve is U-shaped, so use a quadratic (squared) term:
  $AC_i = \beta_1 + \beta_2 Q_i + \beta_3 Q_i^2 + \varepsilon_i$
Using calculus, we expect $\beta_3 > 0$ (i.e., the second derivative w.r.t. Q, $2\beta_3$, needs to be positive for a U-shaped curve).

Ex 5.1: Cost function for S&L industry
LAC: average annual operating expenses in $millions.
Q: total assets in $millions.
Estimation results (p-values in parentheses)
  [Estimated equation not recoverable; only the p-values survive: (0.002) (0.013) (0.005)]
Notice that the coefficient for the squared term is positive & statistically significant.
Using this equation, we can find the optimal scale of operation (i.e., the value of Q that minimizes average total cost).
Setting the first derivative equal to zero, $\beta_2 + 2\beta_3 Q = 0$, gives $Q = -\beta_2/(2\beta_3)$, which with the estimated coefficients is Q=569.
We already know the second derivative is positive, which means we have found a minimum (and not a maximum).
Therefore, the optimal scale of operation is $569 million in total assets.

Dummy variables
AKA dichotomous or indicator variables.
Ex: output can be produced by two types of machines, A or B. Is there a difference in average output between the two types?
Regression framework: $Y_i = \beta_1 + \beta_2 X_{2i} + \varepsilon_i$
X2i is a dummy variable: X2i equals 1 if machine A is used at factory i and equals 0 if machine B is used at factory i.
Notice that adding a dummy variable for machine B would be redundant (i.e., perfect collinearity).
  $E(Y_i) = \beta_1 + \beta_2$ if machine A is used at factory i
  $E(Y_i) = \beta_1$ if machine B is used at factory i
To answer the original question, test whether $\beta_2 = 0$.
QUESTION: Why use the regression model when you could just use a t-test?
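With a single dummy variable and no other regressors, the regression and the equal-variance two-sample t-test give the same numbers. The sketch below illustrates this on made-up output figures for the two machine types; the data and all variable names are hypothetical and serve only to show the mechanics.

```python
# Sketch: regression on one dummy versus a pooled two-sample t-test.
# The output figures are invented for illustration only.
from math import sqrt
from statistics import mean

output_a = [52.0, 48.0, 55.0, 51.0, 50.0]        # factories using machine A
output_b = [45.0, 47.0, 44.0, 49.0, 46.0, 43.0]  # factories using machine B

y = output_a + output_b
x = [1.0] * len(output_a) + [0.0] * len(output_b)  # dummy: 1 = machine A
n = len(y)

# Simple regression of Y on the dummy.
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
ss_resid = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
se_slope = sqrt(ss_resid / (n - 2) / sxx)
t_regression = slope / se_slope

# Equal-variance (pooled) two-sample t-test on the same data.
mean_a, mean_b = mean(output_a), mean(output_b)
ss_within = (sum((v - mean_a) ** 2 for v in output_a)
             + sum((v - mean_b) ** 2 for v in output_b))
pooled_var = ss_within / (n - 2)
t_pooled = (mean_a - mean_b) / sqrt(
    pooled_var * (1 / len(output_a) + 1 / len(output_b)))

print("intercept (mean of B):    ", round(intercept, 3))
print("slope (mean A - mean B):  ", round(slope, 3))
print("t from regression:        ", round(t_regression, 4))
print("t from pooled t-test:     ", round(t_pooled, 4))
```

The two approaches coincide here because there is only one regressor; they stop coinciding once other explanatory variables are added to the regression.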
What if there were 3 machines A, B, and C?
Use two dummy (contrast) variables ==> the dummy for one machine (the reference category) is omitted to avoid perfect collinearity.

Modeling interaction effects
Production Y depends on units of labor L:
  $Y_i = \beta_1 + \beta_2 L_i + \varepsilon_i$
The marginal productivity of labor is $\partial Y / \partial L = \beta_2$.
Does the marginal productivity of labor depend on which machine is used?
  $Y_i = \beta_1 + \beta_2 L_i + \beta_3 (X_{2i} L_i) + \varepsilon_i$, where X2i is the machine dummy
To answer the question, test whether $\beta_3 = 0$.
Model where the machine can have both level and slope effects:
  $Y_i = \beta_1 + \beta_2 L_i + \beta_3 X_{2i} + \beta_4 (X_{2i} L_i) + \varepsilon_i$
  The level effect is measured by $\beta_3$.
  The slope effect is measured by $\beta_4$.

Eviews example: Salary difference between men and women

Simple t test from earlier lecture
SALARY – worker salary in $
SEX – 1 for female, 0 for male

Test for Equality of Means of SALARY
Categorized by values of SEX
Date: 07/28/06   Time: 16:04
Sample: 1 206
Included observations: 206

Method              df         Value      Probability
t-test              204        3.860190   0.0002
Anova F-statistic   (1, 204)   14.90107   0.0002

Analysis of Variance
Source of Variation   df    Sum of Sq.   Mean Sq.
Between               1     1.53E+09     1.53E+09
Within                204   2.10E+10     1.03E+08
Total                 205   2.25E+10     1.10E+08

Category Statistics
SEX   Count   Mean        Std. Dev.   Std. Err. of Mean
0     105     21,867.06   11640.63    1136.009
1     101     16,412.58   8292.053    825.0901
All   206     19192.78    10476.42    729.9270

This is strong evidence (p = 0.0002) that average salaries differ between men and women: men earn about $5,454 more on average.

Simple linear regression

Dependent Variable: SALARY
Method: Least Squares
Date: 05/17/08   Time: 22:53
Sample: 1 206
Included observations: 206

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          21867.06      989.3988     22.10136      0.0000
SEX        -5454.476     1413.007     -3.860190     0.0002

R-squared            0.068072    Mean dependent var      19192.78
Adjusted R-squared   0.063504    S.D. dependent var      10476.42
S.E. of regression   10138.32    Akaike info criterion   21.29569
Sum squared resid    2.10E+10    Schwarz criterion       21.32800
Log likelihood       -2191.456   F-statistic             14.90107
Durbin-Watson stat   1.790377    Prob(F-statistic)       0.000152

The regression gives the same evidence of a difference: the t-statistic and p-value match the t test.
Notice that the average salary difference (the SEX coefficient, -5454.476) is the same using both methods.

Multiple linear regression
COLLEGE – 1 if worker has college degree, 0 if not
AGE – worker's age in years

Dependent Variable: SALARY
Method: Least Squares
Date: 05/17/08   Time: 23:01
Sample: 1 206
Included observations: 206

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          11682.86      2078.997     5.619467      0.0000
SEX        -5404.359     1252.466     -4.314974     0.0000
COLLEGE    9779.992      1418.050     6.896792      0.0000
AGE        205.9574      52.61257     3.914605      0.0001

R-squared            0.296498    Mean dependent var      19192.78
Adjusted R-squared   0.286050    S.D. dependent var      10476.42
S.E. of regression   8852.112    Akaike info criterion   21.03393
Sum squared resid    1.58E+10    Schwarz criterion       21.09855
Log likelihood       -2162.494   F-statistic             28.37834
Durbin-Watson stat   1.817756    Prob(F-statistic)       0.000000

Adjusting for age and college education reduces the estimated salary difference only slightly, from about $5,454 to about $5,404, and it remains statistically significant.
Make sure you know how to interpret all of the coefficients and how to do statistical inference (i.e., significance tests and confidence intervals for the slopes).
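As a worked illustration of the inference steps, the reported Eviews numbers can be plugged into the usual formulas. This is a sketch only: it uses the normal critical value (about 1.96) as a large-sample stand-in for the t critical value with 202 degrees of freedom, so the interval is slightly narrower than the exact t-based one, and the variable names are illustrative.

```python
# Sketch: inference for the SEX coefficient from the reported Eviews output.
# Assumption: the normal quantile approximates the t quantile with df = 202.
from statistics import NormalDist

n, k = 206, 4                        # observations; intercept plus 3 slopes
coef_sex, se_sex = -5404.359, 1252.466
r_squared = 0.296498

# t-statistic for H0: coefficient on SEX equals zero.
t_stat = coef_sex / se_sex

# Approximate 95% confidence interval for the SEX coefficient.
z_crit = NormalDist().inv_cdf(0.975)
ci_low = coef_sex - z_crit * se_sex
ci_high = coef_sex + z_crit * se_sex

# Adjusted R-squared from R-squared, n, and k.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k)

# In the simple regression, intercept + SEX coefficient is the female mean.
female_mean = 21867.06 + (-5454.476)

print("t-statistic:", round(t_stat, 4))
print("approx. 95% CI:", (round(ci_low, 1), round(ci_high, 1)))
print("adjusted R-squared:", round(adj_r_squared, 6))
print("implied female mean salary:", round(female_mean, 2))
```

The interval excludes zero, which agrees with the reported p-value of 0.0000 for SEX, and the implied female mean matches the category statistics from the t test.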