Econ 140, Lecture 14: Multiple Regression Models

Uploaded by jamese on 07-Jan-2016



Today’s plan

• How to read the estimated coefficients

• Functional form

• Testing the explanatory power of the model

• Adjustment to R²

Reading coefficients

• With a bivariate model we could easily determine how a change in X affects Y:

$\hat{Y} = \hat{a} + \hat{b} X$

• With a multivariate model,

$\hat{Y} = \hat{a} + \hat{b}_1 X_1 + \hat{b}_2 X_2$,

determining how a change in X2 affects Y is more complicated

• For a multivariate regression, you must hold X1 constant to determine the effect of a change in X2 on Y

– For this reason we call the slope coefficients in a multivariate regression the partial regression coefficients
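One way to see numerically what "holding X1 constant" means is the Frisch-Waugh-Lovell result: the partial coefficient on X2 equals the simple slope obtained after removing from both Y and X2 the part explained by X1. The sketch below checks this on synthetic data; the data-generating numbers and the helper name `ols` are hypothetical and are not taken from the lecture's spreadsheets.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic (hypothetical) data in which X1 and X2 are correlated.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(12, 2, n)
x2 = 0.8 * x1 + rng.normal(30, 5, n)
y = 4.0 + 0.06 * x1 + 0.02 * x2 + rng.normal(0, 0.5, n)
ones = np.ones(n)

# Multiple regression of Y on a constant, X1 and X2.
a_hat, b1_hat, b2_hat = ols(np.column_stack([ones, x1, x2]), y)

# "Hold X1 constant": remove the part of Y and of X2 that X1 explains.
Z = np.column_stack([ones, x1])
y_resid = y - Z @ ols(Z, y)
x2_resid = x2 - Z @ ols(Z, x2)

# Simple slope of what is left of Y on what is left of X2.
b2_partial = (x2_resid @ y_resid) / (x2_resid @ x2_resid)

print(b2_hat, b2_partial)  # the two numbers agree up to rounding error
```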

Reading coefficients: example

• Back to our earnings and education example from L11.xls

• For our estimated multivariate regression equation, the expectation of Y is:

E(Y) = 4.135 + 0.057 X1 + 0.023 X2

• If we hold age constant at 30, the expectation of Y becomes:

E(Y) = 4.135 + 0.057 X1 + 0.023 (30)

= 4.135 + 0.023 (30) + 0.057 X1

= 4.825 + 0.057 X1

– What we’re doing here is looking at the relationship between education and earnings for 30-year-olds

– This can also be done for any other age, e.g. 50-year-olds:

E(Y) = 4.135 + 0.057 X1 + 0.023 (50) = 5.285 + 0.057 X1
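The arithmetic of fixing age can be sketched in a few lines, using the estimated coefficients quoted above (X1 = years of education, X2 = age). The helper name `line_for_age` is hypothetical. Fixing age only shifts the intercept; the slope on education stays at 0.057 whatever age we choose.

```python
# Estimated equation from the earnings example: E(Y) = 4.135 + 0.057 X1 + 0.023 X2
A, B_EDUC, B_AGE = 4.135, 0.057, 0.023

def line_for_age(age):
    """Intercept and slope of E(Y) as a function of education, with age held fixed."""
    return A + B_AGE * age, B_EDUC

for age in (30, 50):
    intercept, slope = line_for_age(age)
    print(f"age {age}: E(Y) = {intercept:.3f} + {slope:.3f} X1")
```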

Functional form

• Our earnings and years-of-education example has some foundation in economic theory, but it is basically an ‘ad-hoc’ specification: we know we want to test the relationship between earnings and years of schooling.

• Let’s look at another example that is based on economic theory: the Cobb-Douglas production function $Y = A L^{\alpha} K^{\beta}$

• If we want to test for constant returns to scale, we test whether

$\alpha + \beta = 1$

Functional form (2)

• We can get the equation into a form we can estimate by taking logs:

$\ln Y = \ln A + \alpha \ln L + \beta \ln K$

– This is called log-linear form since all the variables are in logs

– The model is now linear in parameters so we can use least squares to estimate it

– The log-linear form gives us estimated coefficients that are elasticities: the estimates of α and β give us the elasticities of output with respect to labor and capital
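To illustrate that least squares on the logged variables recovers the Cobb-Douglas exponents, here is a minimal sketch on simulated firms. The "true" values A = 2, α = 0.6, β = 0.4, the noise level and the seed are all hypothetical choices; L14-1.xls is not used.

```python
import numpy as np

# Hypothetical technology used only to simulate data: Y = A * L**alpha * K**beta
A_TRUE, ALPHA_TRUE, BETA_TRUE = 2.0, 0.6, 0.4

rng = np.random.default_rng(1)
n = 500
L = np.exp(rng.normal(4.0, 0.8, n))    # labor input
K = np.exp(rng.normal(5.0, 0.8, n))    # capital input
noise = rng.normal(0.0, 0.05, n)       # multiplicative error, additive after taking logs
Y = A_TRUE * L**ALPHA_TRUE * K**BETA_TRUE * np.exp(noise)

# Log-linear form: ln Y = ln A + alpha ln L + beta ln K (linear in the parameters)
X = np.column_stack([np.ones(n), np.log(L), np.log(K)])
(ln_A_hat, alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)

print(f"A ~ {np.exp(ln_A_hat):.3f}, alpha ~ {alpha_hat:.3f}, beta ~ {beta_hat:.3f}")
print(f"alpha + beta ~ {alpha_hat + beta_hat:.3f}")
```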

Example with longitudinal data

• L14-1.xls is on the web. It contains information on companies in the UK private sector, taken from DATASTREAM (the corresponding US source is COMPUSTAT)

• Note that this is a longitudinal data set: we are analyzing the same agents (the companies) over time

• I have calculated the true change in output for a 100% change in labor and for a 10% change in labor

– Note that the larger the increase in the independent variable, the further the true change is from the change implied by reading the coefficient as an elasticity
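The gap between the coefficient and the true effect can be sketched with the exact log-log relation %ΔY = exp(b ln(1 + %ΔX)) - 1, with percentage changes written as fractions. The value b = 0.67 below is purely illustrative; the helper name `true_pct_change` is hypothetical. For a 10% rise the exact change (about 6.6%) is close to the approximation (6.7%), while for a 100% rise it is about 59.1% against 67%.

```python
import math

def true_pct_change(b, pct_change_x):
    """Exact % change in Y for a given % change in X when ln Y = a + b ln X.

    Both percentages are fractions: 0.10 means 10%.
    """
    return math.exp(b * math.log(1 + pct_change_x)) - 1

b = 0.67  # illustrative elasticity

for dx in (0.10, 1.00):
    approx = b * dx                 # reading the coefficient as an elasticity
    exact = true_pct_change(b, dx)
    print(f"{dx:.0%} rise in X: approx {approx:.2%}, exact {exact:.2%}")
```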

Example with longitudinal data (2)

• If we want to calculate the true change, we need to calculate (with percentage changes written as fractions):

$\%\Delta Y = \exp\left(\hat{b} \ln(1 + \%\Delta X)\right) - 1$

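• If we want to estimate the Cobb-Douglas production function, we use the partial slope coefficients

• We can calculate the partial slope coefficients:

$\hat{b}_1 = \dfrac{\sum x_1 y \sum x_2^2 - \sum x_2 y \sum x_1 x_2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2} = \dfrac{(80.064)(59.491) - (67.165)(60.156)}{(78.847)(59.491) - (60.156)^2} = \dfrac{722.71}{1071.94} = 0.67$

$\hat{b}_2 = \dfrac{\sum x_2 y \sum x_1^2 - \sum x_1 y \sum x_1 x_2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2} = \dfrac{(67.165)(78.847) - (80.064)(60.156)}{1071.94} = \dfrac{479.43}{1071.94} = 0.45$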
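The two-regressor formulas can be checked directly from the sums of squares and cross-products (deviations from means) used in the calculation above. The Σx1² entry, 78.847, was read from a garbled slide and is the value that reproduces the slide's denominator of 1071.94; the function name is hypothetical.

```python
# Deviation sums used in the Cobb-Douglas calculation (x1, x2 logged inputs, y logged output)
S_x1y, S_x2y = 80.064, 67.165
S_x1x1, S_x2x2, S_x1x2 = 78.847, 59.491, 60.156

def two_regressor_slopes(s1y, s2y, s11, s22, s12):
    """Partial slope coefficients of a regression on two regressors, from deviation sums."""
    denom = s11 * s22 - s12**2
    b1 = (s1y * s22 - s2y * s12) / denom
    b2 = (s2y * s11 - s1y * s12) / denom
    return b1, b2

b1, b2 = two_regressor_slopes(S_x1y, S_x2y, S_x1x1, S_x2x2, S_x1x2)
print(f"b1 = {b1:.2f}, b2 = {b2:.2f}, b1 + b2 = {b1 + b2:.2f}")
```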

Example with longitudinal data (3)

• Adding our estimates together we find:

$\hat{\alpha} + \hat{\beta} = \hat{b}_1 + \hat{b}_2 = 1.12$

• Later on we’ll test the constraint that α + β = 1

Phillips Curve

• The Phillips Curve is an example of ad-hoc variable inclusion

[Figure: wage inflation (W) plotted against unemployment (Un)]

• The equation representing this relationship between unemployment and wage inflation is:

$W = a + b \dfrac{1}{U_n}$
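Because the Phillips Curve equation is nonlinear in Un but linear in a and b, it can be estimated by least squares once 1/Un is used as the regressor. The sketch below does this on synthetic data; the values a = -1, b = 12, the unemployment range and the noise level are hypothetical.

```python
import numpy as np

# Synthetic (hypothetical) data: unemployment in percent, wage inflation in percent
rng = np.random.default_rng(2)
u = rng.uniform(2.0, 10.0, 60)
a_true, b_true = -1.0, 12.0
w = a_true + b_true / u + rng.normal(0.0, 0.3, u.size)

# W = a + b (1/Un): regress W on a constant and the transformed variable 1/Un
X = np.column_stack([np.ones_like(u), 1.0 / u])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, w, rcond=None)
print(f"a ~ {a_hat:.2f}, b ~ {b_hat:.2f}")
```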

Phillips Curve (2)

• With ad-hoc specification we don’t know what other variables are relevant

– we need to make informed guesses based on what we know of economic theory

The story so far

• Functional form

• Omitted variable bias

• Types of data

– Cross section: earnings and education

– Panel/longitudinal: Cobb-Douglas

– Time-series: Phillips Curve

Variation in multivariate models

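• Let our model be $Y = a + b_1 X_1 + b_2 X_2 + e$

• We still want to calculate:

$\hat{b}_1, \quad \hat{\sigma}^2_{\hat{b}_1}, \quad \hat{b}_2, \quad \hat{\sigma}^2_{\hat{b}_2}$

– We now look at how to calculate these values.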

Variation in multivariate models (2)

• It still holds that the variance of the regression line is

$\hat{\sigma}^2_{YX} = \dfrac{\sum e^2}{n - k}$

• It also still holds that:

$\hat{\sigma}_{\hat{b}_1} = \sqrt{\hat{\sigma}^2_{\hat{b}_1}}$

Test statistics in multivariate models

• We will start with the sum of squares identity, where:

Total = Explained + Residual

or

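$\sum (Y - \bar{Y})^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2$

• But the composition of the ESS (explained sum of squares) will be different; writing lower-case letters for deviations from the means, our sum of squares identity will look like this:

$\sum y^2 = \hat{b}_1 \sum x_1 y + \hat{b}_2 \sum x_2 y + \sum e^2$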

• As you add more independent variables to the model, more terms get added to the ESS
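A quick numerical check of the two-regressor sum of squares identity, on synthetic data (all numbers and the seed are hypothetical): the total sum of squares equals b̂1Σx1y + b̂2Σx2y plus the residual sum of squares, and that explained part equals Σ(Ŷ - Ȳ)².

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
X1 = rng.normal(12, 2, n)
X2 = rng.normal(38, 10, n)
Y = 4.1 + 0.06 * X1 + 0.02 * X2 + rng.normal(0, 0.8, n)

# Fit Y = a + b1 X1 + b2 X2 + e by least squares
X = np.column_stack([np.ones(n), X1, X2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
a_hat, b1_hat, b2_hat = coef
e = Y - X @ coef

# Lower-case letters: deviations from sample means
y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()

tss = np.sum(y**2)
ess = b1_hat * np.sum(x1 * y) + b2_hat * np.sum(x2 * y)
rss = np.sum(e**2)
print(f"TSS = {tss:.4f}, ESS + RSS = {ess + rss:.4f}, R^2 = {ess / tss:.4f}")
```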

Test statistics in multivariate models (2)

$R^2 = \dfrac{\text{S of S Explained}}{\text{S of S Total}} = \dfrac{\hat{b}_1 \sum x_1 y + \hat{b}_2 \sum x_2 y}{\sum y^2}$

• Now let’s look back to an example from an earlier lecture

– we looked at the earnings returns to education (b1) and age (b2)

– calculate the test statistics and consider model problems

• Our R² is:

Test statistics in multivariate models (3)

• On an exam you may be asked to estimate the regression line, given a matrix of products and cross-products like this:

  | y | x1 | x2
y | 25.05 | 15.75 | 164.37
x1 | | 163.00 | 276.17
x2 | | | 6394.97

• You will also be given these values:

$n = 36, \quad \bar{Y} = 5.77, \quad \bar{X}_1 = 12.83, \quad \bar{X}_2 = 38.53$
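The exam-style calculation can be sketched by solving the two normal equations in deviation form, then using the fact that the fitted line passes through the point of means. The numbers are the ones given above; the dictionary layout is just a convenient (hypothetical) way to hold them. Because the means are rounded, the intercept comes out near 4.14 rather than exactly 4.135.

```python
import numpy as np

# Products and cross-products in deviation form, plus n and the sample means
S = {"yy": 25.05, "x1y": 15.75, "x2y": 164.37,
     "x1x1": 163.00, "x1x2": 276.17, "x2x2": 6394.97}
n, y_bar, x1_bar, x2_bar = 36, 5.77, 12.83, 38.53

# Normal equations:  [Sx1x1 Sx1x2] [b1]   [Sx1y]
#                    [Sx1x2 Sx2x2] [b2] = [Sx2y]
M = np.array([[S["x1x1"], S["x1x2"]],
              [S["x1x2"], S["x2x2"]]])
rhs = np.array([S["x1y"], S["x2y"]])
b1, b2 = np.linalg.solve(M, rhs)

# Intercept from the point of means
a = y_bar - b1 * x1_bar - b2 * x2_bar
print(f"Y_hat = {a:.3f} + {b1:.3f} X1 + {b2:.3f} X2")
```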

Test statistics in multivariate models (4)

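• The regression line we calculated earlier is:

$\hat{Y} = 4.135 + 0.057 X_1 + 0.023 X_2$

• We can start our calculations with the variance of the regression line:

$\hat{\sigma}^2_{YX} = \dfrac{\sum y^2 - \hat{b}_1 \sum x_1 y - \hat{b}_2 \sum x_2 y}{n - 3} = \dfrac{25.05 - 0.057(15.75) - 0.023(164.37)}{36 - 3} = \dfrac{20.371}{33} = 0.617$

• Taking the square root, we find the root mean square error:

$\text{Root MSE} = \hat{\sigma}_{YX} = \sqrt{0.617} \approx 0.785$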

Test statistics in multivariate models (5)

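• We can then calculate:

$\hat{\sigma}^2_{\hat{b}_1} = \hat{\sigma}^2_{YX} \dfrac{\sum x_2^2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2} = 0.617 \times \dfrac{6394.97}{(163)(6394.97) - (276.17)^2} = 0.617 \times \dfrac{6394.97}{966110.24} = 0.0041$

• Taking the square root gives us:

$\hat{\sigma}_{\hat{b}_1} = \sqrt{0.0041} = 0.064$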

Test statistics in multivariate models (6)

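• We can then calculate:

$\hat{\sigma}^2_{\hat{b}_2} = \hat{\sigma}^2_{YX} \dfrac{\sum x_1^2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2} = 0.617 \times \dfrac{163}{966110.24} = 0.000104$

• Taking the square root gives:

$\hat{\sigma}_{\hat{b}_2} = \sqrt{0.000104} = 0.01$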
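The chain from the cross-product matrix to the residual variance, the Root MSE and the standard errors of the two slopes can be sketched as follows, with k = 3 estimated parameters (a, b1, b2). All inputs are the earnings-example values quoted in this lecture.

```python
import math

# Earnings example: deviation sums, estimated slopes and sample size
S_yy, S_x1y, S_x2y = 25.05, 15.75, 164.37
S_x1x1, S_x2x2, S_x1x2 = 163.00, 6394.97, 276.17
b1, b2, n, k = 0.057, 0.023, 36, 3

# Residual sum of squares from the sum of squares identity, then its variance
rss = S_yy - b1 * S_x1y - b2 * S_x2y
sigma2_yx = rss / (n - k)
root_mse = math.sqrt(sigma2_yx)

# Variances of the slope estimates in the two-regressor model
denom = S_x1x1 * S_x2x2 - S_x1x2**2
var_b1 = sigma2_yx * S_x2x2 / denom
var_b2 = sigma2_yx * S_x1x1 / denom
se_b1, se_b2 = math.sqrt(var_b1), math.sqrt(var_b2)

print(f"sigma^2_YX = {sigma2_yx:.3f}, Root MSE = {root_mse:.3f}")
print(f"se(b1) = {se_b1:.3f}, se(b2) = {se_b2:.4f}")
```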

Hypothesis test on education

• We can form the null hypothesis $H_0: b_1 = 0$

• The t-ratio is calculated:

$t = \dfrac{\hat{b}_1 - b_1}{\hat{\sigma}_{\hat{b}_1}} = \dfrac{0.057 - 0}{0.064} = 0.891$

• For a significance level of 5% we have a table t value of $t_{\alpha/2,\,33} = 2.035$

• Since $|t| < t_{\alpha/2,\,33}$, we accept the null hypothesis

• Recall that the purpose of the test was to examine whether or not education has an effect on earnings. Can we accept this given what we know about economics?

Hypothesis test on age

• We construct another hypothesis test: $H_0: b_2 = 0$

• The t-ratio is calculated:

$t = \dfrac{\hat{b}_2 - b_2}{\hat{\sigma}_{\hat{b}_2}} = \dfrac{0.023 - 0}{0.01} = 2.3$

• For a significance level of 5% we have a table t value of $t_{\alpha/2,\,33} = 2.035$

• Since $|t| > t_{\alpha/2,\,33}$, we reject the null hypothesis
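Both t-tests follow the same recipe, sketched below with the lecture's estimates and standard errors and the table value 2.035 (two-tailed 5%, 33 df). The function name `t_test` is hypothetical, and the critical value is hard-coded from the table rather than computed.

```python
T_CRIT = 2.035  # two-tailed 5% table value with 33 degrees of freedom

def t_test(b_hat, se, b_null=0.0, t_crit=T_CRIT):
    """Return the t-ratio for H0: b = b_null and whether H0 is rejected."""
    t = (b_hat - b_null) / se
    return t, abs(t) > t_crit

for name, b_hat, se in [("education (b1)", 0.057, 0.064), ("age (b2)", 0.023, 0.01)]:
    t, reject = t_test(b_hat, se)
    print(f"{name}: t = {t:.3f}, {'reject' if reject else 'accept'} H0")
```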

Looking at R²

• Let’s look at R²:

$R^2 = \dfrac{\hat{b}_1 \sum x_1 y + \hat{b}_2 \sum x_2 y}{\sum y^2} = \dfrac{0.057(15.75) + 0.023(164.37)}{25.05} = \dfrac{4.678}{25.05} = 0.187$

• This is a rather low R²

– This means that the regression equation doesn’t explain the variation in Y well

– The regression equation only explains about 1/5 of the variation in Y
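The R² arithmetic above takes only a couple of lines, using the explained sum of squares from the cross-products of the earnings example:

```python
S_yy, S_x1y, S_x2y = 25.05, 15.75, 164.37
b1, b2 = 0.057, 0.023

ess = b1 * S_x1y + b2 * S_x2y   # explained sum of squares
r2 = ess / S_yy                 # share of the variation in Y explained
print(f"ESS = {ess:.3f}, R^2 = {r2:.3f}")
```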

Looking at R² (2)

• What should we do about the form of our estimated equation when years of education are shown to be statistically insignificant at our chosen significance level?

• We chose a 5% significance level for our test, but we might have been able to reject the null at a different significance level

• Remember: with a hypothesis test we want to limit type I errors, where we falsely reject a true null

Testing explanatory power

• What if we examined the regression equation as a whole?

• To do so, we look at this null hypothesis:

$H_0: b_1 = b_2 = 0$

– This says that neither of the independent variables has any explanatory power

– To test this, we will use an F test

Testing explanatory power (2)

• The F statistic that we’re looking at can be found in the LINEST output

• The F test comes from the ANOVA table for the multivariate case, which looks like this:

Source of variation | Sum of squares | Degrees of freedom | Mean squared deviation
Explained | $\hat{b}_1 \sum x_1 y + \hat{b}_2 \sum x_2 y$ | 2 | $\left(\hat{b}_1 \sum x_1 y + \hat{b}_2 \sum x_2 y\right)/2$
Residual | $\sum e^2$ | n - 3 | $\sum e^2/(n - 3)$
Total | $\sum y^2$ | n - 1 | $\sum y^2/(n - 1)$

Testing explanatory power (3)

• The F statistic will look like:

$F = \dfrac{\left(\hat{b}_1 \sum x_1 y + \hat{b}_2 \sum x_2 y\right)/2}{\sum e^2/(n - 3)} = \dfrac{4.72/2}{20.33/33} = \dfrac{2.36}{0.62} = 3.81$

• Using the F table, you choose a significance level and use the degrees of freedom in the numerator and denominator, or $F_{0.05,\,2,\,33}$

– The 1st row in the table is the df in the numerator

– The 1st column is the df in the denominator

– The 2nd column is the significance level
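The F statistic is the explained mean square over the residual mean square. The sketch below uses the sums of squares from the F slide (4.72 and 20.33) with k = 3 parameters; the function name is hypothetical. Without rounding the mean squares it gives about 3.83, while the slide rounds them to 2.36 and 0.62 and reports 3.81. Either way F lies between the 5% value (3.29) and the 1% value (5.27) used below.

```python
def f_statistic(ess, rss, n, k):
    """F statistic for H0: all slope coefficients are zero.

    ess, rss: explained and residual sums of squares; k counts all parameters incl. the intercept.
    """
    return (ess / (k - 1)) / (rss / (n - k))

F = f_statistic(ess=4.72, rss=20.33, n=36, k=3)
print(f"F = {F:.2f}")
```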

Testing explanatory power (4)

• If our calculated F statistic is greater than (to the right of) our F table value, we reject the null

• If our calculated F statistic is less than (to the left of) our F table value, we accept the null

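[Figure: F distribution with the F table value marked; the region to its left is labeled “H0: Accepting the null” and the region to its right “H1: Rejecting the null”]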

Testing explanatory power (5)

• Looking at the F table, we find that there is no value for exactly 33 df

– We have to approximate using 30 df instead

– Our approximated F value is $F_{0.05,\,2,\,33} \approx 3.29$

• We reject the null because $F > F_{0.05,\,2,\,33}$

• Had we picked a 1% significance level, our F table value would be $F_{0.01,\,2,\,33} \approx 5.27$

– and we would’ve accepted the null because $F < F_{0.01,\,2,\,33}$

Testing explanatory power (6)

• In summary, we’re more likely to reject the null at a greater significance level

• In this case, we rejected at a 5% significance level and accepted at a 1% level

• Graphically:

[Figure: F axis marking the 5% critical value (3.29), the calculated F* value (3.81) and the 1% critical value (5.27)]

Testing explanatory power (7)

• The t-test suggests that we should remove years of education from our regression

• An F-test on the joint hypothesis rejects the null, but the test is weak. At a lower significance level (1 percent), we would have accepted the null.

• In this instance, we want to keep the years of education variable in the equation because of what we know of economic theory

• What to do? Conclude that the economic theory is weak. Obtain more data and try again!

Adjustment to R²

• Adding variables to a regression never lowers R², so the more variables we add, the higher (or at least no lower) R² will be

– R² is important, but it isn’t the sole criterion for judging a model’s explanatory power

• Adjusted R² adjusts for the loss in degrees of freedom associated with adding independent variables to the regression
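To see why R² alone can mislead, the sketch below adds a regressor that is pure noise by construction and compares R² before and after. The data, the seed and the helper name `r_squared` are hypothetical. The R² of the larger model is never below that of the smaller one, even though the extra variable has no real explanatory power.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on X (X must include a constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - resid @ resid / np.sum((y - y.mean())**2)

rng = np.random.default_rng(4)
n = 36
x1 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(size=n)
junk = rng.normal(size=n)   # unrelated to y by construction

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, junk])
r2_small, r2_big = r_squared(X_small, y), r_squared(X_big, y)
print(r2_small, r2_big)
```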

Adjustment to R² (2)

• Adjusted R² is written as

$\text{Adj } R^2 = 1 - (1 - R^2)\dfrac{n - 1}{n - k}$

n : sample size

k : number of parameters in the regression
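Applying the formula to the earnings example (R² = 0.187, n = 36, k = 3) gives an adjusted R² of about 0.138, noticeably below the unadjusted value. A minimal sketch, with a hypothetical function name:

```python
def adjusted_r2(r2, n, k):
    """Adj R^2 = 1 - (1 - R^2)(n - 1)/(n - k); k counts all parameters incl. the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

print(f"{adjusted_r2(0.187, 36, 3):.3f}")
```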

What’s next

• Restricted least squares and the Cobb-Douglas production function

• Including qualitative indicators into the regression equation (e.g. race, gender, marital status).