
ECO375 Tutorial 8: Instrumental Variables

Matt Tudball

University of Toronto Mississauga

November 16, 2017

Matt Tudball (University of Toronto) ECO375H5 November 16, 2017 1 / 22


Review: Endogeneity

Instrumental variables are used to deal with endogeneity in multiple regression models.

Recall assumption MLR.4:

MLR.4: The error $u_i$ has an expected value of zero conditional on all $x_i$, that is, $E(u_i \mid x_{i1}, x_{i2}, \dots, x_{ik}) = 0$ for $i = 1, \dots, n$.

If $x_j$ is correlated with the error term $u$, perhaps because it is correlated with some omitted variable $x_{k+1}$, then MLR.4 will be violated and our OLS estimators will be biased and inconsistent.


Review: IV Estimator

In the last lecture you considered the simple model

$$y_i = \beta_0 + \beta_1 x_{i1} + u_i$$

where $\mathrm{Cov}(x_{i1}, u_i) \neq 0$ and hence $x_{i1}$ is endogenous.

If we have a valid instrument $z_i$ for $x_{i1}$ then it must satisfy the properties

(1) $\mathrm{Cov}(z_i, u_i) = 0$ (exogeneity condition)
(2) $\mathrm{Cov}(z_i, x_{i1}) \neq 0$ (relevance condition)

Condition (1) is difficult to test and must generally be justified via economic theory.

Condition (2) can be tested with a t-test on $\pi_1$ from the regression

$$x_{i1} = \pi_0 + \pi_1 z_i + v_i$$


Review: IV Estimator

Using the moment restrictions

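$$E[u_i] = E[y_i - \beta_0 - \beta_1 x_{i1}] = 0$$
$$E[z_i u_i] = E[z_i (y_i - \beta_0 - \beta_1 x_{i1})] = 0$$

we can show that $\beta_1$ takes the form

$$\beta_1 = \frac{\mathrm{Cov}(z_i, y_i)}{\mathrm{Cov}(z_i, x_{i1})}$$

and $\beta_0$ takes the form

$$\beta_0 = E(y_i) - \beta_1 E(x_{i1})$$

Replacing the covariances with their sample analogues, we obtain the IV estimator for $\beta_1$:

$$\hat\beta_1^{IV} = \frac{\sum_{i=1}^n (z_i - \bar z)(y_i - \bar y)}{\sum_{i=1}^n (z_i - \bar z)(x_{i1} - \bar x_1)}$$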
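As a minimal sketch of the sample-analogue formula, the following Python snippet computes the IV slope and intercept from three lists. The function names and the data are made up for illustration; the data contain no error term, so the estimator should recover the coefficients used to build `y`.

```python
from statistics import fmean


def dev_cross(a, b):
    """Sum of cross-products of deviations from the mean.

    The 1/n factor of the sample covariance cancels in the IV ratio,
    so it is omitted here.
    """
    a_bar, b_bar = fmean(a), fmean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def iv_estimate(z, x, y):
    """Return (beta0_hat, beta1_hat) for y = beta0 + beta1*x + u, using z as the IV."""
    beta1 = dev_cross(z, y) / dev_cross(z, x)
    beta0 = fmean(y) - beta1 * fmean(x)
    return beta0, beta1


# Hypothetical data: y = 1 + 3x exactly (no error term).
z = [0.0, 1.0, 2.0, 3.0, 4.0]
x = [1.0, 1.5, 4.0, 4.5, 7.0]
y = [1.0 + 3.0 * xi for xi in x]

b0, b1 = iv_estimate(z, x, y)
print(b0, b1)
```

With real data the two sums would be computed in the same way; only the inputs change.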


Review: IV Inference

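With the IV estimator for $\beta_1$ in hand, we now need an estimator for the variance of $\hat\beta_1^{IV}$ in order to conduct valid inference.

Let’s assume MLR.5 (homoscedasticity) holds conditional on $z_i$, such that

$$E(u_i^2 \mid z_i) = \sigma^2 = \mathrm{Var}(u_i)$$

Then we can show that the asymptotic variance of $\hat\beta_1^{IV}$ takes the form

$$\mathrm{Var}(\hat\beta_1^{IV}) = \frac{\sigma^2}{n \sigma_{x_1}^2 \rho_{x_1,z}^2}$$

Estimating $\sigma^2$ and $\sigma_{x_1}^2$ by their sample analogues and $\rho_{x_1,z}^2$ by $R_{x_1,z}^2$ (which is the R-squared from the regression of $x_{i1}$ on $z_i$), we obtain the estimator for the variance of $\hat\beta_1^{IV}$

$$\widehat{\mathrm{Var}}(\hat\beta_1^{IV}) = \frac{\hat\sigma^2}{SST_{x_1} R_{x_1,z}^2}$$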
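The variance estimator can be sketched in a few lines of Python. The data below are invented, and the degrees-of-freedom correction `n - 2` for the error variance is an assumption of this sketch (the slide does not state which divisor is used). Note that the residuals are built from the actual regressor, not from first-stage fitted values.

```python
from statistics import fmean


def dev_cross(a, b):
    """Sum of cross-products of deviations from the mean."""
    a_bar, b_bar = fmean(a), fmean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def iv_variance(z, x, y):
    """Return (var_iv, sigma2_hat, sst_x, r2_xz) for the simple IV model."""
    n = len(y)
    beta1 = dev_cross(z, y) / dev_cross(z, x)
    beta0 = fmean(y) - beta1 * fmean(x)
    # IV residuals use the observed x.
    resid = [yi - beta0 - beta1 * xi for xi, yi in zip(x, y)]
    sigma2_hat = sum(r * r for r in resid) / (n - 2)  # assumed divisor
    sst_x = dev_cross(x, x)
    # R-squared from the simple regression of x on z.
    r2_xz = dev_cross(z, x) ** 2 / (dev_cross(z, z) * sst_x)
    var_iv = sigma2_hat / (sst_x * r2_xz)
    return var_iv, sigma2_hat, sst_x, r2_xz


# Hypothetical data, for illustration only.
z = [0.0, 1.0, 2.0, 3.0, 4.0]
x = [1.0, 1.5, 4.0, 4.5, 7.0]
y = [3.9, 5.8, 12.7, 14.9, 21.6]

var_iv, sigma2_hat, sst_x, r2_xz = iv_variance(z, x, y)
print("estimated variance:", var_iv, "standard error:", var_iv ** 0.5)
print("R-squared of x on z:", r2_xz)
```

Dividing by `r2_xz`, which lies between 0 and 1, is what inflates the variance relative to the same expression without that term.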


Weak IV

Let’s recall the estimated variance of the OLS estimator:

$$\widehat{\mathrm{Var}}(\hat\beta_1^{OLS}) = \frac{\hat\sigma^2}{SST_{x_1}}$$

Note that the denominator of the estimated variance of the IV estimator also contains the term $0 < R_{x_1,z}^2 < 1$, which is the R-squared from a regression of $x_{i1}$ on $z_i$.

This suggests that the variance of the IV estimator will always be higher than the variance of the OLS estimator.

It also indicates that the weaker the relationship between $x_{i1}$ and $z_i$, the higher the variance of the IV estimator $\hat\beta_1^{IV}$.

A weak instrument will also exacerbate the finite-sample bias of the IV estimator. As we will see below, weak instruments (even valid ones) may actually produce estimates that are more biased than OLS.


Monte Carlo Results: OVB Correction

Let’s generate some Monte Carlo results to visually compare the IV and OLS estimators in the case where there is a relevant omitted variable.

First let’s visualise the distribution of the OLS estimates in this case.

In our simulation we are going to generate 1000 replications of samples of size $n = 50$ from the following data-generating process:

$$x_{i2} \sim N(1, 2)$$
$$x_{i1} = 2x_{i2} + N(2, 2)$$
$$y_i = 3x_{i1} + 2x_{i2} + N(0, 1)$$

In each replication we are going to calculate the OLS estimator from the regression of $y_i$ on $x_{i1}$ (i.e. omitting $x_{i2}$, which is correlated with both $x_{i1}$ and $y_i$).

You can therefore think of $u_i = 2x_{i2} + \varepsilon_i$ where $\varepsilon_i \sim N(0, 1)$.
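One way to reproduce this experiment is sketched below in plain Python. Two things are assumptions of the sketch rather than facts from the slides: the second argument of $N(\cdot, \cdot)$ is read as a variance, and the seed is arbitrary. The function names are illustrative.

```python
import random
from statistics import fmean


def ols_slope(x, y):
    """Slope from the simple OLS regression of y on x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def simulate_ols(reps=1000, n=50, seed=375):
    """OLS slopes of y on x1 across replications, with x2 omitted."""
    rng = random.Random(seed)
    sd = 2 ** 0.5  # assumption: N(m, 2) denotes variance 2
    estimates = []
    for _ in range(reps):
        x2 = [rng.gauss(1, sd) for _ in range(n)]
        x1 = [2 * a + rng.gauss(2, sd) for a in x2]
        y = [3 * b + 2 * a + rng.gauss(0, 1) for a, b in zip(x2, x1)]
        estimates.append(ols_slope(x1, y))
    return estimates


ols_estimates = simulate_ols()
print("mean OLS estimate:", round(fmean(ols_estimates), 2))
```

A histogram of `ols_estimates` would give the kind of picture the tutorial refers to.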


Monte Carlo Results: OVB Correction

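Notice that the estimates are centred around $\hat\beta_1^{OLS} \approx 3.8$ while the true value of $\beta_1$ is 3. This gap matches the omitted variable bias implied by the data-generating process: $2\,\mathrm{Cov}(x_{i1}, x_{i2}) / \mathrm{Var}(x_{i1}) = 0.8$.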


Monte Carlo Results: OVB Correction

Let’s now try running a simulation with the same data-generating process as before but introducing a valid IV $z_i$ for $x_{i1}$.

$$x_{i2} \sim N(1, 2)$$
$$z_i \sim N(3, 1)$$
$$x_{i1} = 3z_i + 2x_{i2} + N(2, 2)$$
$$y_i = 3x_{i1} + 2x_{i2} + N(0, 1)$$

Notice that $z_i$ is uncorrelated with $x_{i2}$. Since $u_i = 2x_{i2} + \varepsilon_i$ we can show that

$$\mathrm{Cov}(z_i, u_i) = \mathrm{Cov}(z_i, 2x_{i2} + \varepsilon_i) = 2\,\mathrm{Cov}(z_i, x_{i2}) + \mathrm{Cov}(z_i, \varepsilon_i) = 0$$

and therefore $z_i$ satisfies condition (1) of IV validity from the "Review: IV Estimator" slide.

Since $z_i$ enters the equation that determines $x_{i1}$, we also know that it is relevant. It therefore also satisfies condition (2) of IV validity.

Therefore $z_i$ is a valid instrument and it should return a consistent estimate of $\beta_1 = 3$.


Monte Carlo Results: OVB Correction

Notice that, although the estimates are clustered much more closely around the true value of $\beta_1 = 3$, the distribution is heavily skewed to the left.

It’s also true that the average $\hat\beta_1^{IV} \approx 2.9$, indicating that the IV estimator is biased in finite samples.


Monte Carlo Results: Weak IV

In the last Monte Carlo simulation $x_{i1}$ and $z_i$ were related according to the coefficient $\pi_1 = 3$.

Let’s see what happens to the mean and variance of our estimates when we reduce that to $\pi_1 = 0.3$.

Here the average $\hat\beta_1^{IV} \approx 4.1$, which is actually further from the true value than the estimate produced by OLS.
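The strong and weak instrument cases can be compared with a short simulation. This is a sketch under stated assumptions: the second argument of $N(\cdot, \cdot)$ is read as a variance, the seed is arbitrary, and the function names are illustrative. It reports the median and interquartile range rather than the mean, because with a weak instrument a handful of extreme replications can dominate the sample average, so the averages it produces need not match the figures quoted in the tutorial.

```python
import random
from statistics import fmean, median, quantiles


def iv_slope(z, x, y):
    """Simple IV slope: sample Cov(z, y) / sample Cov(z, x)."""
    z_bar, x_bar, y_bar = fmean(z), fmean(x), fmean(y)
    szy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    szx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    return szy / szx


def simulate_iv(pi1, reps=1000, n=50, seed=375):
    """IV slopes across replications for a given first-stage coefficient pi1."""
    rng = random.Random(seed)
    sd = 2 ** 0.5  # assumption: N(m, 2) denotes variance 2
    estimates = []
    for _ in range(reps):
        x2 = [rng.gauss(1, sd) for _ in range(n)]
        z = [rng.gauss(3, 1) for _ in range(n)]
        x1 = [pi1 * c + 2 * a + rng.gauss(2, sd) for a, c in zip(x2, z)]
        y = [3 * b + 2 * a + rng.gauss(0, 1) for a, b in zip(x2, x1)]
        estimates.append(iv_slope(z, x1, y))
    return estimates


def iqr(values):
    """Interquartile range."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


strong = simulate_iv(pi1=3.0)
weak = simulate_iv(pi1=0.3)
print("pi1 = 3.0: median", round(median(strong), 2), "IQR", round(iqr(strong), 2))
print("pi1 = 0.3: median", round(median(weak), 2), "IQR", round(iqr(weak), 2))
```

The spread of the estimates with $\pi_1 = 0.3$ should be visibly larger than with $\pi_1 = 3$, in line with the variance formula's dependence on $R_{x_1,z}^2$.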


IV With Additional Exogenous Variables

Let’s extend this model somewhat. Suppose there is an additional explanatory variable $x_{i2}$ which satisfies MLR.4 (i.e. $\mathrm{Cov}(x_{i2}, u_i) = 0$) such that our regression takes the form

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i$$

How does this change our estimation procedure?

The relevance condition can be restated as testing that $\pi_1 \neq 0$ in the multiple regression

$$x_{i1} = \pi_0 + \pi_1 z_i + \pi_2 x_{i2} + v_i$$

The moment restrictions needed for identification are now

$E[u_i] = 0$ (as before)
$E[z_i u_i] = 0$ (as before)
$E[x_{i2} u_i] = 0$ (which is the standard MLR.4 condition)


IV With Additional Exogenous Variables

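We can replace these moment restrictions with their sample analogues in order to obtain the FOCs that we need to solve

$$0 = \sum_{i=1}^n \left(y_i - \hat\beta_0 - \hat\beta_1^{IV} x_{i1} - \hat\beta_2 x_{i2}\right)$$
$$0 = \sum_{i=1}^n z_i \left(y_i - \hat\beta_0 - \hat\beta_1^{IV} x_{i1} - \hat\beta_2 x_{i2}\right)$$
$$0 = \sum_{i=1}^n x_{i2} \left(y_i - \hat\beta_0 - \hat\beta_1^{IV} x_{i1} - \hat\beta_2 x_{i2}\right)$$

We can solve this system of equations to obtain the estimates $\hat\beta_0$, $\hat\beta_1^{IV}$ and $\hat\beta_2$.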
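The three first-order conditions are linear in the unknowns, so they can be written as a 3-by-3 system and solved directly. The sketch below does this in plain Python with a small Gaussian elimination routine; the helper names and the data are invented for illustration, and the data contain no error term so the solution should reproduce the coefficients used to build `y`.

```python
def solve_linear(a, b):
    """Solve the square system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta


def iv_with_control(y, x1, x2, z):
    """Solve the sample moment conditions for (beta0, beta1, beta2).

    Each condition is sum_i w_i * (y_i - b0 - b1*x1_i - b2*x2_i) = 0
    with w_i equal to 1, z_i and x2_i in turn.
    """
    weights = [[1.0, zi, x2i] for zi, x2i in zip(z, x2)]
    regressors = [[1.0, x1i, x2i] for x1i, x2i in zip(x1, x2)]
    k = 3
    lhs = [[sum(w[j] * r[l] for w, r in zip(weights, regressors)) for l in range(k)]
           for j in range(k)]
    rhs = [sum(w[j] * yi for w, yi in zip(weights, y)) for j in range(k)]
    return solve_linear(lhs, rhs)


# Hypothetical data: y = 1 + 2*x1 - 0.5*x2 exactly (no error term).
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]
x1 = [1.0, 2.5, 2.0, 4.5, 5.0, 5.5]
y = [1.0 + 2.0 * a - 0.5 * b for a, b in zip(x1, x2)]

beta = iv_with_control(y, x1, x2, z)
print(beta)
```

The exogenous regressor serves as its own instrument here, which is why `x2` appears in both the weights and the regressors.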


Two-Stage Least Squares (2SLS): Motivation

Notice that the IV estimator was derived for the case in which we had one endogenous variable $x_{i1}$ and one instrument $z_i$.

What happens if we have more instruments, $z_{i1}$ and $z_{i2}$, than we have endogenous variables, $x_{i1}$ (let’s ignore having an exogenous $x_{i2}$ for now)?

We cannot use the standard IV estimator since it is unclear which instrument(s) should appear in the covariances $\mathrm{Cov}(z_i, x_{i1})$ and $\mathrm{Cov}(z_i, y_i)$.

We need to find some way of aggregating the information contained in $z_{i1}$ and $z_{i2}$.


Two-Stage Least Squares (2SLS): Predicted Instrument

Note that since $z_{i1}$ and $z_{i2}$ are both individually uncorrelated with $u_i$, any linear combination of $z_{i1}$ and $z_{i2}$ is also uncorrelated with $u_i$.

The linear combination that is most correlated with $x_{i1}$ (i.e. the one that maximises the relevance of the IVs) is obtained from the first stage regression

$$x_{i1} = \pi_0 + \pi_1 z_{i1} + \pi_2 z_{i2} + \varepsilon_i = x_{i1}^* + \varepsilon_i$$

The “best IV” for $x_{i1}$ is therefore the linear combination

$$x_{i1}^* = \pi_0 + \pi_1 z_{i1} + \pi_2 z_{i2}$$

We can estimate $x_{i1}^*$ by OLS since all of the explanatory variables $z_{i1}$ and $z_{i2}$ are exogenous:

$$\hat x_{i1} = \hat\pi_0 + \hat\pi_1 z_{i1} + \hat\pi_2 z_{i2}$$

We can then use $\hat x_{i1}$ as an IV for $x_{i1}$ and obtain $\hat\beta_1^{IV}$.


Two-Stage Least Squares (2SLS): Procedure

As suggested in the last slide, the estimator we obtain from this procedure is called the Two-Stage Least Squares (2SLS) estimator.

The reason for this name is that the estimator can be obtained by running two OLS regressions.

1. Run OLS on the first stage regression

$$x_{i1} = \pi_0 + \pi_1 z_{i1} + \pi_2 z_{i2} + \varepsilon_i$$

and calculate the predicted values $\hat x_{i1}$.

2. Run OLS on the second stage regression, which replaces $x_{i1}$ with $\hat x_{i1}$,

$$y_i = \beta_0 + \beta_1 \hat x_{i1} + u_i$$

obtaining $\hat\beta_1^{2SLS}$.

It can be shown that $\hat\beta_1^{2SLS} = \hat\beta_1^{IV}$, where the IV estimator uses $\hat x_{i1}$ as the instrument.

Between step 1 and step 2 we can also compute an F-test for $H_0: \pi_1 = \pi_2 = 0$ to test the relevance condition.
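The two steps, and the claim that the second-stage slope coincides with the IV estimate that uses the fitted values as the instrument, can be checked numerically. The following is a minimal sketch in plain Python; the helper names and the data are made up for illustration.

```python
from statistics import fmean


def solve_linear(a, b):
    """Solve the square system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta


def ols(columns, y):
    """OLS coefficients (intercept first) of y on a list of regressor columns."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[j] * r[l] for r in rows) for l in range(k)] for j in range(k)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(k)]
    return solve_linear(xtx, xty)


def iv_slope(z, x, y):
    """Simple IV slope using z as the instrument for x."""
    z_bar, x_bar, y_bar = fmean(z), fmean(x), fmean(y)
    szy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    szx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    return szy / szx


# Hypothetical data, for illustration only.
z1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
z2 = [1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
x1 = [1.2, 1.9, 3.8, 3.1, 5.4, 6.6, 6.1, 7.3]
y = [2.0, 3.1, 5.9, 4.8, 8.2, 9.9, 9.5, 10.8]

# Step 1: first stage, then fitted values.
pi0, pi1, pi2 = ols([z1, z2], x1)
x1_hat = [pi0 + pi1 * a + pi2 * b for a, b in zip(z1, z2)]

# Step 2: second stage with x1 replaced by its fitted values.
beta0_2sls, beta1_2sls = ols([x1_hat], y)

beta1_iv = iv_slope(x1_hat, x1, y)
print(beta1_2sls, beta1_iv)
```

The two printed numbers agree up to floating-point error because the first-stage residuals are orthogonal to the fitted values by construction.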


Two-Stage Least Squares (2SLS): Intuition

Note that when we are doing 2SLS, we are replacing the endogenous variable $x_{i1}$ with its predicted value from an OLS regression with $z_{i1}$ and $z_{i2}$ as explanatory variables.

We can think of these predicted values as containing the variation in $x_{i1}$ that is uncorrelated with $u_i$.

Therefore, in the second stage regression, we are able to estimate $\beta_1$ by using only the part of the relationship between $y_i$ and $x_{i1}$ that comes from variation in $x_{i1}$ uncorrelated with $u_i$.

Since we are only using a portion of the total variation in $x_{i1}$, our IV estimators have a higher variance than the corresponding OLS estimators.


Two-Stage Least Squares (2SLS): ivregress

To implement 2SLS in Stata we need the command ivregress.

Suppose we want to run the regression

$$\text{lwage}_i = \beta_0 + \beta_1 \text{educ}_i + u_i$$

and we have two instruments $\text{fatheduc}_i$ and $\text{motheduc}_i$.

Then to estimate $\hat\beta_1^{2SLS}$ in Stata we would type the command

ivregress 2sls lwage (educ = fatheduc motheduc)

Note that this command does not estimate 2SLS in the way we outlined on the "Two-Stage Least Squares (2SLS): Procedure" slide. The coefficient estimates are the same, but the procedure that Stata uses accounts for the fact that the second-stage regressor is itself a first-stage prediction when calculating the standard errors for $\hat\beta_1^{2SLS}$. The manual two-step procedure takes those predicted values as given in the second stage, so the standard errors it reports are not valid.
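To see where the two sets of standard errors part ways, it helps to compare the residuals each approach uses to estimate $\sigma^2$. The sketch below is an illustration on made-up data with a single instrument, not a description of Stata's internals: the manual second stage builds residuals from the fitted values $\hat x_{i1}$, whereas the usual 2SLS formula builds them from the observed $x_{i1}$. The divisor `n - 2` is an assumption of the sketch.

```python
from statistics import fmean


def simple_ols(x, y):
    """Return (intercept, slope) from the simple OLS regression of y on x."""
    x_bar, y_bar = fmean(x), fmean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


# Hypothetical data, for illustration only.
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.5, 2.0, 4.5, 5.0, 5.5]
y = [2.1, 4.0, 2.9, 6.2, 7.1, 6.8]
n = len(y)

# First stage and fitted values.
pi0, pi1 = simple_ols(z, x)
x_hat = [pi0 + pi1 * zi for zi in z]

# Second stage: the coefficients are the 2SLS estimates.
b0, b1 = simple_ols(x_hat, y)

# Residuals as the manual second-stage regression computes them.
naive_resid = [yi - b0 - b1 * xh for xh, yi in zip(x_hat, y)]
# Residuals based on the observed regressor.
correct_resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

naive_sigma2 = sum(r * r for r in naive_resid) / (n - 2)
correct_sigma2 = sum(r * r for r in correct_resid) / (n - 2)
print("sigma^2 from second-stage residuals:", naive_sigma2)
print("sigma^2 from residuals using observed x:", correct_sigma2)
```

The coefficients are identical in both approaches; only the error-variance estimate, and therefore the reported standard error, differs.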


In-Class Exercise 1

In this exercise we are going to use 2SLS to estimate the effect of education on wages. Download the dataset CARD.dta from my website (matthewtudball.com). Consider the following regression:

$$\text{lwage}_i = \beta_0 + \beta_1 \text{educ}_i + u_i$$

where $\text{lwage}_i$ is the natural logarithm of wage and $\text{educ}_i$ is years of education.

1. Estimate the model by OLS and record the estimate $\hat\beta_1^{OLS}$.

2. Consider the three potential instruments $\text{nearc4}_i$ (distance to nearest 4-year college), $\text{fatheduc}_i$ (father’s years of education) and $\text{motheduc}_i$ (mother’s years of education). Recall that each of these three instruments must be uncorrelated with $u_i$, which contains all of the variables other than $\text{educ}_i$ that are related to wages $\text{lwage}_i$. Break into small groups of 2 or 3 and discuss which (if any) of these instruments will satisfy that assumption.


In-Class Exercise 1

3. Estimate the first stage of this regression by regressing $\text{educ}_i$ on $\text{nearc4}_i$, $\text{fatheduc}_i$ and $\text{motheduc}_i$. Run an F-test to test whether the instruments are jointly significant. What do you conclude about whether these instruments satisfy the relevance condition?

4. Estimate the model using the command ivregress.

5. Estimate the model again using the manual two-step procedure from the "Two-Stage Least Squares (2SLS): Procedure" slide. You should end up running two OLS regressions. How do your estimates compare to those in the previous question? How about the standard errors?


In-Class Exercise 2

This is a variation on Computer Exercise 2 from Wooldridge Chapter 15. Download the dataset FERTIL2.dta from my website. This dataset includes, for women in Botswana during 1988, information on number of children, years of education, age, and religious and economic status.

1. Estimate the model

$$\text{children}_i = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{age}_i + \beta_3 \text{age}_i^2 + u_i$$

by OLS. Holding age fixed, what is the estimated effect of another year of education on fertility? Do you think this estimate has a causal interpretation? You may talk in small groups about this.

2. The variable $\text{frsthalf}_i$ is a dummy equal to 1 if the woman was born during the first six months of the year. Do you think $\text{frsthalf}_i$ is a reasonable IV for $\text{educ}_i$? Test the relevance condition (Hint: you need to run a regression).


In-Class Exercise 2

3. Estimate the model from part 1 by using $\text{frsthalf}_i$ as an IV for $\text{educ}_i$. Compare the estimated effect of education with the OLS estimate from part 1. Interpret the coefficient on $\text{educ}_i$. Do you think this estimate has a causal interpretation?

4. Add the binary variables $\text{electric}_i$, $\text{tv}_i$ and $\text{bicycle}_i$ to the model and assume these are exogenous. Estimate the equation by OLS and 2SLS and compare the estimated coefficients on $\text{educ}_i$. Interpret the coefficient on $\text{tv}_i$ and explain why television ownership has a negative effect on fertility. You may talk in small groups.
