Copyright © 2011 Pearson Education, Inc.

Multiple Regression, Chapter 23

Page 1:

Multiple Regression

Chapter 23

Page 3:

23.1 The Multiple Regression Model

A chain is considering where to locate a new restaurant. Is it better to locate it far from the competition or in a more affluent area?

Use multiple regression to describe the relationship between several explanatory variables and the response.

Multiple regression separates the effects of each explanatory variable on the response and reveals which ones really matter.

Page 4:

23.1 The Multiple Regression Model

Multiple regression model (MRM): model for the association in the population between multiple explanatory variables and a response.

k: the number of explanatory variables in the multiple regression (k = 1 in simple regression).

Page 5:

23.1 The Multiple Regression Model

The observed response y is linearly related to the k explanatory variables x1, x2, …, xk by the equation

y = β0 + β1x1 + β2x2 + … + βkxk + ε,  ε ~ N(0, σ²)

The unobserved errors ε in the model
1. are independent of one another,
2. have equal variance, and
3. are normally distributed around the regression equation.
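The model and its error assumptions can be illustrated with a small simulation. This is an illustrative sketch only (all numbers are invented, not the chapter's data), fitting by least squares with NumPy:

```python
import numpy as np

# Simulate data from the MRM: y = b0 + b1*x1 + b2*x2 + eps, eps ~ N(0, sigma^2).
# All numeric values here are made up for illustration.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(50, 10, n)     # e.g. income in $000
x2 = rng.integers(0, 5, n)     # e.g. number of competitors
eps = rng.normal(0, 5, n)      # errors: independent, equal variance, normal
y = 10 + 2.0 * x1 - 3.0 * x2 + eps

# Fit by least squares: design matrix with an intercept column.
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # estimates of (b0, b1, b2), close to the true (10, 2, -3)
```

With this much data the estimated slopes land close to the true coefficients used to generate the response.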

Page 6:

23.1 The Multiple Regression Model

While the SRM bundles all but one explanatory variable into the error term, multiple regression allows for the inclusion of several variables in the model.

In the MRM, residuals departing from normality may suggest that an important explanatory variable has been omitted.

Page 7:

23.2 Interpreting Multiple Regression

Example: Women’s Apparel Stores

Response variable: sales at stores in a chain of women’s apparel (annually in dollars per square foot of retail space).

Two explanatory variables: median household income in the area (thousands of dollars) and number of competing apparel stores in the same mall.

Page 8:

23.2 Interpreting Multiple Regression

Example: Women’s Apparel Stores

Begin with a scatterplot matrix, a table of scatterplots arranged as in a correlation matrix.

Using a scatterplot matrix to understand data can save considerable time later when interpreting the multiple regression results.
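The correlation matrix that accompanies a scatterplot matrix can be computed directly. The sketch below uses invented store-level data (Sales, Income, Competitors are simulated here, not the chapter's dataset):

```python
import numpy as np

# Hypothetical store-level data; values are invented for illustration.
rng = np.random.default_rng(1)
income = rng.normal(60, 15, 100)
competitors = np.clip(np.round(income / 20 + rng.normal(0, 1, 100)), 0, None)
sales = 60 + 8 * income - 24 * competitors + rng.normal(0, 68, 100)

# Each row of np.corrcoef's input is one variable; the result is the
# symmetric correlation matrix that the scatterplot matrix visualizes.
corr = np.corrcoef(np.vstack([sales, income, competitors]))
print(np.round(corr, 2))
```

Reading the first row of this matrix alongside the first row of the scatterplot matrix shows how each pairwise correlation summarizes the corresponding plot.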

Page 9:

23.2 Interpreting Multiple Regression

Scatterplot Matrix: Women’s Apparel Stores

Page 10:

23.2 Interpreting Multiple Regression

Example: Women’s Apparel Stores

The scatterplot matrix for this example

Confirms a positive linear association between sales and median household income.

Shows a weak association between sales and number of competitors.

Page 11:

23.2 Interpreting Multiple Regression

Correlation Matrix: Women’s Apparel Stores

Page 12:

23.2 Interpreting Multiple Regression

R-squared and se

The equation of the fitted model for estimating sales in the women’s apparel stores example is

ŷ = 60.3587 + 7.966 Income − 24.165 Competitors
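As a quick check of how the fitted equation is used, the sketch below evaluates it at Income = 70 (a median income of $70,000, since Income is in thousands) and 3 competitors, the same scenario used later for the prediction interval:

```python
# Point prediction from the fitted equation:
#   Sales-hat = 60.3587 + 7.966*Income - 24.165*Competitors
# Income is measured in thousands of dollars, so $70,000 -> Income = 70.
def predict_sales(income, competitors):
    return 60.3587 + 7.966 * income - 24.165 * competitors

print(round(predict_sales(70, 3), 2))  # ≈ 545.48 dollars per square foot
```

This reproduces the roughly $545 per square foot estimate used in the prediction-interval example later in the chapter.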

Page 13:

23.2 Interpreting Multiple Regression

R-squared and se

R2 indicates that the fitted equation explains 59.47% of the store-to-store variation in sales.

For this example, R2 is larger than the r2 values for separate SRMs fitted for each explanatory variable; it is also larger than their sum.

For this example, se = $68.03.

Page 14:

23.2 Interpreting Multiple Regression

R-squared and se

R̄² is known as the adjusted R-squared. It adjusts R² for both the sample size n and the model size k. It is always smaller than R².

The residual degrees of freedom, n − k − 1, is the divisor in se. R̄² and se move in opposite directions when an explanatory variable is added to the model (R̄² goes up while se goes down).
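The adjustment can be written as R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1). A small sketch, using the example's R² = 0.5947 and k = 2 but an invented sample size of n = 65 (the slides do not state n):

```python
def adjusted_r2(r2, n, k):
    # R-bar^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# R^2 = 0.5947 and k = 2 from the example; n = 65 is a made-up sample size.
print(round(adjusted_r2(0.5947, 65, 2), 4))  # ≈ 0.5816, below R^2 as expected
```

The adjusted value is always below R², and the gap widens as k grows relative to n.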

Page 15:

23.2 Interpreting Multiple Regression

Marginal and Partial Slopes

Partial slope: slope of an explanatory variable in a multiple regression that statistically excludes the effects of other explanatory variables.

Marginal slope: slope of an explanatory variable in a simple regression.

Page 16:

23.2 Interpreting Multiple Regression

Partial Slopes: Women’s Apparel Stores

Page 17:

23.2 Interpreting Multiple Regression

Partial Slopes: Women’s Apparel Stores

The slope b1 = 7.966 for Income implies that a store in a location where median household income is $10,000 higher sells, on average, $79.66 more per square foot than a store in a less affluent location with the same number of competitors.

The slope b2 = -24.165 implies that, among stores in equally affluent locations, each additional competitor lowers average sales by $24.165 per square foot.

Page 18:

23.2 Interpreting Multiple Regression

Marginal and Partial Slopes

Partial and marginal slopes only agree when the explanatory variables are uncorrelated.

In this example they do not agree. For instance, the marginal slope for Competitors is 4.6352. It is positive because more affluent locations tend to draw more competitors. The MRM separates these effects but the SRM does not.
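A simulation makes the marginal/partial distinction concrete. In the invented data below, Competitors is positively correlated with Income, so its marginal slope comes out positive even though its partial slope (its true effect holding Income fixed) is negative; the numbers are illustrative, not the chapter's:

```python
import numpy as np

# Invented data: affluent areas attract competitors, so Income and
# Competitors are positively correlated.
rng = np.random.default_rng(2)
n = 2000
income = rng.normal(60, 15, n)
competitors = 0.1 * income + rng.normal(0, 1, n)
sales = 60 + 8 * income - 24 * competitors + rng.normal(0, 30, n)

# Marginal slope: simple regression of Sales on Competitors alone.
marginal = np.polyfit(competitors, sales, 1)[0]

# Partial slope: multiple regression of Sales on Income and Competitors.
X = np.column_stack([np.ones(n), income, competitors])
partial = np.linalg.lstsq(X, sales, rcond=None)[0][2]

print(round(marginal, 1), round(partial, 1))  # marginal > 0, partial < 0
```

The simple regression credits Competitors with the sales lift that really comes from Income; the multiple regression separates the two effects and recovers the negative partial slope.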

Page 19:

23.2 Interpreting Multiple Regression

Path Diagram

Path diagram: schematic drawing of the relationships among the explanatory variables and the response.

Collinearity: very high correlations among the explanatory variables that make the estimates in multiple regression uninterpretable.

Page 20:

23.2 Interpreting Multiple Regression

Path Diagram: Women’s Apparel Stores

Income has a direct positive effect on sales and an indirect negative effect on sales via the number of competitors.

Page 21:

23.3 Checking Conditions

Conditions for Inference

Use the residuals from the fitted MRM to check that the errors in the model

are independent; have equal variance; and follow a normal distribution.

Page 22:

23.3 Checking Conditions

Calibration Plot

Calibration plot: scatterplot of the response y on the fitted values ŷ.

R² is the square of the correlation between y and ŷ; the more tightly the data cluster along the diagonal line in the calibration plot, the larger the R² value.
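A numeric check of this relationship on invented data: for a least squares fit with an intercept, R² computed from sums of squares equals the squared correlation between y and ŷ.

```python
import numpy as np

# Verify numerically that R^2 equals the squared correlation of y with y-hat.
rng = np.random.default_rng(3)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 - 3 * x2 + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ b

r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
corr_sq = np.corrcoef(y, fitted)[0, 1] ** 2
print(round(r2, 6), round(corr_sq, 6))  # the two values agree
```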

Page 23:

23.3 Checking Conditions

Calibration Plot: Women’s Apparel Stores

Page 24:

23.3 Checking Conditions

Residual Plots

A plot of residuals versus the fitted values ŷ is used to identify outliers and to check the similar-variances condition.

Plots of residuals versus each explanatory variable are used to verify that the relationships are linear.

Page 25:

23.3 Checking Conditions

Residual Plot: Women’s Apparel Stores

This plot of residuals versus fitted values of y has no evident pattern.

Page 26:

23.3 Checking Conditions

Residual Plot: Women’s Apparel Stores

This plot of residuals versus Income has no evident pattern.

Page 27:

23.3 Checking Conditions

Check Normality: Women’s Apparel Stores

The quantile plot indicates that the nearly normal condition is satisfied.

Page 28:

23.4 Inference in Multiple Regression

Inference for the Model: F-test

F-test: test of the explanatory power of the MRM as a whole.

F-statistic: ratio of the sample variance of the fitted values to the variance of the residuals.

Page 29:

23.4 Inference in Multiple Regression

Inference for the Model: F-test

The F-statistic

F = (R²/k) / [(1 − R²)/(n − k − 1)]

is used to test the null hypothesis that all slopes are equal to zero: H0: β1 = β2 = … = βk = 0.
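A sketch of the computation, using the example's R² = 0.5947 and k = 2 with an invented sample size of n = 65 (the slides do not report n):

```python
def f_statistic(r2, n, k):
    # F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# R^2 = 0.5947 and k = 2 from the example; n = 65 is a made-up sample size.
print(round(f_statistic(0.5947, 65, 2), 1))  # ≈ 45.5
```

A value this far above 1 would be compared to an F distribution with k and n − k − 1 degrees of freedom, yielding a tiny p-value.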

Page 30:

23.4 Inference in Multiple Regression

F-test Results in Analysis of Variance Table

The F-statistic has a p-value of <0.0001; reject H0. Income and Competitors together explain statistically significant variation in sales.

Page 31:

23.4 Inference in Multiple Regression

Inference for One Coefficient

The t-statistic is used to test each slope using the null hypothesis H0: βj = 0.

The t-statistic is calculated as

t = (bj − 0) / se(bj)
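A minimal sketch of the calculation; the estimate and standard error below are invented, not taken from the regression output:

```python
def t_statistic(b, se_b, b0=0.0):
    # t = (b_j - hypothesized value) / se(b_j); H0: beta_j = 0 by default
    return (b - b0) / se_b

# Hypothetical slope estimate and standard error (not from the slides):
print(round(t_statistic(7.966, 1.8), 2))
```

The resulting t is compared to a t distribution with n − k − 1 degrees of freedom to get the p-value reported in the output.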

Page 32:

23.4 Inference in Multiple Regression

t-test Results for Women’s Apparel Stores

The t-statistics and associated p-values indicate that both slopes are significantly different from zero.

Page 33:

23.4 Inference in Multiple Regression

Prediction Intervals

An approximate 95% prediction interval is given by ŷ ± 2se.

For example, the 95% prediction interval for sales per square foot at a location with a median income of $70,000 and 3 competitors is approximately $545.47 ± $136.06 per square foot.
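The arithmetic behind that interval, using ŷ from the fitted equation and se = $68.03 from the example:

```python
# Approximate 95% prediction interval: y-hat ± 2*se.
# y-hat comes from the fitted equation at Income = 70, Competitors = 3,
# and se = 68.03 is the residual standard error from the example.
y_hat = 60.3587 + 7.966 * 70 - 24.165 * 3
se = 68.03
lo, hi = y_hat - 2 * se, y_hat + 2 * se
print(round(lo, 2), round(hi, 2))  # ≈ 409.42 to 681.54
```

The half-width 2 × 68.03 = $136.06 matches the interval quoted on the slide.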

Page 34:

23.5 Steps in Fitting a Multiple Regression

1. What is the problem to be solved? Do these data help in solving it?

2. Check the scatterplots of the response versus each explanatory variable (scatterplot matrix).

3. If the scatterplots appear straight enough, fit the multiple regression model. Otherwise find a transformation.

4. Obtain the residuals and fitted values from the regression.

Page 35:

23.5 Steps in Fitting a Multiple Regression (Continued)

5. Use the residual plot of e vs. ŷ to check the similar-variances condition.

6. Construct residual plots of e vs. each explanatory variable. Look for patterns.

7. Check whether the residuals are nearly normal.

8. Use the F-statistic to test the null hypothesis that the collection of explanatory variables has no effect on the response.

9. If the F-statistic is statistically significant, test and interpret the individual partial slopes.
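Steps 3–8 can be sketched in a few lines on simulated data (all values invented): the residuals and fitted values feed the diagnostic plots, and R² feeds the F-statistic.

```python
import numpy as np

# A compact sketch of steps 3-8 on simulated data:
# fit the MRM, extract fitted values and residuals, compute the F-statistic.
rng = np.random.default_rng(4)
n, k = 200, 2
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 5 + 1.5 * x1 - 2.0 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ b
resid = y - fitted                       # step 4: residuals and fitted values

r2 = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)
f = (r2 / k) / ((1 - r2) / (n - k - 1))  # step 8: overall F-statistic
print(round(r2, 3), round(f, 1))
```

In practice the `resid` and `fitted` arrays would be plotted (steps 5–7) before the F-statistic and individual slopes are interpreted.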

Page 36:

4M Example 23.1: SUBPRIME MORTGAGES

Motivation

A banking regulator would like to verify how lenders use credit scores to determine the interest rates paid by subprime borrowers. The regulator would like to separate the effect of the credit score from other variables such as the loan-to-value (LTV) ratio, the borrower's income, and the value of the home.

Page 37:

4M Example 23.1: SUBPRIME MORTGAGES

Method

Use multiple regression on data obtained for 372 mortgages from a credit bureau. The explanatory variables are the LTV, credit score, income of the borrower, and home value. The response is the annual percentage rate of interest on the loan (APR).

Page 38:

4M Example 23.1: SUBPRIME MORTGAGES

Method: Find correlations among the variables:

Page 39:

4M Example 23.1: SUBPRIME MORTGAGES

Method: Check the scatterplot matrix (e.g., APR vs. LTV).

The linearity and no-obvious-lurking-variables conditions are satisfied.

Page 40:

4M Example 23.1: SUBPRIME MORTGAGES

Mechanics: Fit the model and check conditions.

Page 41:

4M Example 23.1: SUBPRIME MORTGAGES

Mechanics: Plot residuals versus fitted values.

Similar variances condition is satisfied.

Page 42:

4M Example 23.1: SUBPRIME MORTGAGES

Mechanics

The nearly normal condition is not satisfied; the residuals are skewed.

Page 43:

4M Example 23.1: SUBPRIME MORTGAGES

Message

Regression analysis shows that the characteristics of the borrower (credit score) and loan LTV affect interest rates in the market. These two factors together explain almost half of the variation in interest rates. Neither income of the borrower nor the home value improves a model with these two variables.

Page 44:

Best Practices

Know the context of your model.

Examine plots of the overall model and individual explanatory variables before interpreting the output.

Check the overall F-statistic before looking at the t-statistics.

Page 45:

Best Practices (Continued)

Distinguish marginal from partial slopes.

Let your software compute prediction intervals in multiple regression.

Page 46:

Pitfalls

Don’t confuse a multiple regression with several simple regressions.

Do not become impatient.

Don’t believe that you have all of the important variables.

Page 47:

Pitfalls (Continued)

Do not think that you have found causal effects.

Do not interpret an insignificant t-statistic to mean that an explanatory variable has no effect.

Don’t think that the order of the explanatory variables in a regression matters.
