Instrumental Variables

Yona Rubinstein

July 2016

Yona Rubinstein (LSE) Instrumental Variables 07/16 1 / 31

The Limitation of Panel Data

So far we have learned how to account for selection on time-invariant factors using fixed effects.

Indeed, overweight people are more likely to be on a diet than others, and accounting for their time-invariant factors conditions out some of the bias.

Yet even overweight people choose whether and when to diet.

The causal interpretation of before-after comparisons, that is, of fixed-effects estimates, requires the diet regime to be independent of developments in a person's weight.

If people start a diet when they gain weight and stop once they have lost enough, then fixed-effects estimates might exaggerate the impact of diet on weight.

Hence, the causal interpretation of fixed-effects models is not robust to selection on shocks and/or time-varying unobservables.


When Do We Start a Diet?

Why do people go on a diet on a particular day?

It might be no more than a coincidence. In that case the particular timing mimics a randomized trial.

Yet it might also reflect particular circumstances, for example a recent weight gain or a particular "need" to lose a few pounds.

In that case the exact timing is no longer independent of the subject's weight (gain), and it does not mimic a randomized trial, even holding subjects' fixed effects constant.

If people start a diet when they gain a few extra pounds, above and beyond their "long-term" weight, then the regression coefficient of weight on diet might overstate (or understate) the causal impact of dieting on weight.


The Causal Model

To illustrate, let's consider the following model:

$$Y_{it} = \beta_0 + \beta_D D_{it} + U_{it}, \tag{1}$$

where $D_{it} = 1$ if person $i$ is on a diet in period $t$ and 0 otherwise.

Let $\beta_D$ denote the causal impact of diet on weight.

Let's allow the person-specific error term to reflect time-invariant factors shaping person $i$'s "long-term" weight ($\theta_i$) and person-time-specific fluctuations around her/his mean:

$$U_{it} = \theta_i + \varepsilon_{it}, \tag{2}$$

where $\varepsilon_{it}$ reflects the latter.

Equation (1) is a version of the linear causal model. The person-time error term ($\varepsilon_{it}$) is the part of the potential outcome left over after controlling for $\theta_i$.

This error term is uncorrelated with diet by assumption. If this assumption is correct, the population regression of $Y_{it}$ on $D_{it}$ and $\theta_i$ recovers the causal impact of diet on weight.

The Concern

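The problem we initially want to tackle is how to estimate $\beta_D$ when the reasons people start a diet at a particular time are unobserved.

The key concern is that, conditional on the person fixed effects ($\theta_i$), a person's diet status at time $t$ is correlated with the fluctuations ($\varepsilon_{it}$) in their weight.

In that case the regression coefficient, without controlling for person fixed effects, is:

$$\beta_D^{OLS} = \beta_D + (\bar{\theta}_1 - \bar{\theta}_0) + (\bar{\varepsilon}_1 - \bar{\varepsilon}_0), \tag{3}$$

where:

$$\bar{\varepsilon}_0 = E(\varepsilon_{it}^0 \mid D_{it} = 0) = E(\varepsilon_{it} \mid D_{it} = 0),$$

$$\bar{\varepsilon}_1 = E(\varepsilon_{it}^1 \mid D_{it} = 1) = E(\varepsilon_{it} \mid D_{it} = 1),$$

and $\bar{\theta}_0$, $\bar{\theta}_1$ are the corresponding means of $\theta_i$ among those off and on a diet.

Controlling for person fixed effects eliminates part of the bias, $(\bar{\theta}_1 - \bar{\theta}_0)$, but not necessarily all of it. The fixed-effects estimator equals:

$$\beta_D^{FE} = \beta_D + (\bar{\varepsilon}_1 - \bar{\varepsilon}_0), \tag{4}$$

where $\beta_D^{FE}$ is the fixed-effects estimator of $\beta_D$.

If people start a diet when they gain weight, then $\bar{\varepsilon}_1 > \bar{\varepsilon}_0$ and $\beta_D^{FE}$ lies above $\beta_D$; when dieting reduces weight ($\beta_D < 0$), it underestimates the size of the causal impact of diet on weight.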


The Bias

The problem is that the timing is not random!

If people go on a diet when they gain weight and stop when they lose weight, then:

$$\bar{\varepsilon}_0 < 0, \qquad \bar{\varepsilon}_1 > 0.$$

In that case:

$$\beta_D^{FE} = \beta_D + (\bar{\varepsilon}_1 - \bar{\varepsilon}_0) > \beta_D. \tag{5}$$

If people start a diet when they gain weight, then the regression coefficient $\beta_D^{FE}$ is biased upward; when dieting reduces weight ($\beta_D < 0$), it underestimates the size of the causal impact of diet on weight.

The Instrumental Variables method (hereafter IV) can be used to estimate $\beta_D$.
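The bias in equations (3) to (5) can be illustrated with a small simulation. This is only a sketch under assumed values: the sample sizes, the coefficients and the rule that puts people on a diet when $\theta_i$ and the current shock are high are all hypothetical and not taken from any data set.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_periods = 2000, 6
beta_0, beta_D = 80.0, -2.0  # hypothetical: dieting lowers weight by 2 units

theta = rng.normal(0.0, 1.0, size=(n_people, 1))          # time-invariant factor
eps = rng.normal(0.0, 1.0, size=(n_people, n_periods))    # person-time shock
noise = rng.normal(0.0, 1.0, size=(n_people, n_periods))  # unrelated reasons to diet

# Assumed selection rule: a diet is more likely when theta and the current shock are high.
D = (0.8 * theta + eps + noise > 0).astype(float)
Y = beta_0 + beta_D * D + theta + eps

# Pooled OLS slope on a binary regressor = difference in group means.
beta_ols = Y[D == 1].mean() - Y[D == 0].mean()

# Fixed-effects (within) estimator: demean Y and D person by person.
Y_w = Y - Y.mean(axis=1, keepdims=True)
D_w = D - D.mean(axis=1, keepdims=True)
beta_fe = (D_w * Y_w).sum() / (D_w ** 2).sum()

print(f"true {beta_D:.2f}, pooled OLS {beta_ols:.2f}, FE {beta_fe:.2f}")
```

In this setup the fixed-effects estimate removes the part of the bias that comes from $\theta_i$ but still lies above the true $\beta_D$, because diet status remains correlated with $\varepsilon_{it}$.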


The Source of the Problem

The problem we face is that subjects make choices and that these choices are not independent of the outcomes of interest.

Subjects' choices are influenced by two types of factors: (i) the outcome of interest, and (ii) other factors that are independent of the outcome of interest.

Let's assume that there is a variable $Z_{it}$, observed by the econometrician, that is correlated with (ii) and uncorrelated with (i).

A mathematical representation of subjects' choices is spelled out below:

$$D_{it} = \alpha_0 + \alpha_Z Z_{it} + \alpha_U U_{it} + \varepsilon_{it}, \tag{6}$$

where $\varepsilon_{it}$ is by construction a "left-over" uncorrelated with $Z_{it}$ or $U_{it}$.

Clearly, $D_{it}$ is "contaminated" by $U_{it}$ as long as:

$$\alpha_U \neq 0.$$


Decomposition of the Choice / Treatment Status Equation

Let's consider the following decomposition of the choice / treatment status equation into the three components in (6):

$$D_{it} = \alpha_0 + \underbrace{\alpha_Z Z_{it}}_{\text{the "effect" of an exogenous source } Z_{it} \text{ on } D_{it}} + \underbrace{\alpha_U U_{it}}_{\text{the "effect" of an endogenous source } U_{it} \text{ on } D_{it}} + \underbrace{\varepsilon_{it}}_{\text{"left-over", uncorrelated with } Z_{it} \text{ or } U_{it}}. \tag{7}$$

We have a problem using $D_{it}$ to identify the impact of treatment ($D_{it}$) as long as $D_{it}$ is "contaminated" by $U_{it}$:

$$\alpha_U \neq 0.$$


A Solution

An intuitive solution is to decompose $D_{it}$ into these parts.

If we could condition out $U_{it}$, that is, obtain a measure of $D_{it}$ that is "clean" of $U_{it}$, we could obtain a consistent ("good") estimate of the causal impact of $D$ on $Y$.

For example, if we had access to the variable $Z_{it}$ we could decompose $D_{it}$ into two parts:

$$D_{it} = \underbrace{\alpha_0 + \alpha_Z Z_{it}}_{\text{"clean"}} + \underbrace{\alpha_U U_{it} + \varepsilon_{it}}_{\text{"contaminated"}} \tag{8}$$

and use only the first part to estimate the causal impact of $D$ on $Y$.

This is known as the Instrumental Variables / IV method.


Summarizing

So far we have learned how to account for selection on time-invariant factors using fixed effects.

Indeed, overweight people are more likely to be on a diet than others, and accounting for their time-invariant factors conditions out some of the bias.

Yet even overweight people choose whether and when to diet.

The causal interpretation of before-after comparisons, that is, of fixed-effects estimates, requires the diet regime to be independent of developments in a person's weight.

If people start a diet when they gain weight and stop once they have lost enough, then fixed-effects estimates might exaggerate the impact of diet on weight.

Hence, the causal interpretation of fixed-effects models is not robust to selection on shocks and/or time-varying unobservables.


Back to the Basics

The treatment effects literature is about how some outcome of interest, such as earnings, is affected by some treatment, for instance working in the financial industry.

Although treatment effects must be related to structural models, where the outcome of interest is the left-hand side variable and the treatment is a right-hand side variable, treatment effect models have a terminology and setup all their own.

Notation. As in our previous notes, let $i$ index individuals and let $D_i$ denote a treatment indicator, equal to 1 if a person is treated and 0 otherwise (we could of course use other letters).

$D_i = 1$ indicates exposure to treatment, for example $D_i = 1$ if a first-born child lives in a three-child family and $D_i = 0$ if she lives in a two-child family.

$Y_i^0$ denotes the potential outcome that would occur when person $i$ is not treated ($D_i = 0$); $Y_i^1$ denotes the potential outcome that would occur when person $i$ is treated ($D_i = 1$).

The "problem" is that these are not both observed.

Treatment Effects and Observed Outcomes

Recall that the treatment effect for individual $i$ is:

$$\Delta_i \equiv Y_i^1 - Y_i^0 = \beta_{Di}. \tag{9}$$

If the treatment has the same impact on everyone, then:

$$\beta_{Di} = \beta_D. \tag{10}$$

Let's start with the simplest case of a homogeneous (common) treatment effect, that is, $\beta_{Di} = \beta_D$.

In this case the potential outcomes are:

$$Y_i^0 = \beta_0 + U_i, \qquad Y_i^1 = \beta_0 + \beta_D + U_i. \tag{11}$$

The observed outcomes will be:

$$Y_{0i} = (\beta_0 + \bar{U}_0) + \varepsilon_i, \qquad Y_{1i} = (\beta_0 + \bar{U}_1) + \beta_D + \varepsilon_i, \tag{12}$$

where $\bar{U}_d$ is the mean of $U_i$ in the group with $D_i = d$, and $\varepsilon_i$ is an error term with zero mean within each group.

The OLS Regression Coefficient

Substituting equation (12) into the observed outcome gives (for simplicity of notation we denote $Y_{Di}$ as $Y_i$, and $\bar{U}_1$, $\bar{U}_0$ are the group means of $U_i$):

$$Y_i = D_i (\beta_0 + \beta_D + \bar{U}_1 + \varepsilon_i) + (1 - D_i)(\beta_0 + \bar{U}_0 + \varepsilon_i), \tag{13}$$

which leads to the following linear reduced-form model:

$$Y_{Di} = \beta_0 + \underbrace{\beta_D D_i}_{\text{the treatment effect}} + \underbrace{(\bar{U}_0 + (\bar{U}_1 - \bar{U}_0) D_i + \varepsilon_i)}_{U_i \,=\, \text{the error term}}.$$

The OLS estimator for $\beta_D$ is:

$$\beta_D^{OLS} = E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = \beta_D + (\bar{U}_1 - \bar{U}_0). \tag{14}$$

Our concern is that individuals are not randomly selected on the "unobservables", and therefore $(\bar{U}_1 - \bar{U}_0) \neq 0$.
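Equation (14) can be checked numerically. The sketch below uses made-up parameter values and an assumed selection rule in which people with a high unobservable are more likely to be treated; it then compares the difference in mean outcomes with the true effect plus the gap in mean unobservables.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta_0, beta_D = 1.0, 0.5  # hypothetical values

U = rng.normal(size=n)                        # unobservable
D = (U + rng.normal(size=n) > 0).astype(int)  # assumed selection on U
Y = beta_0 + beta_D * D + U

# OLS slope on a binary regressor = difference in group means.
ols = Y[D == 1].mean() - Y[D == 0].mean()

# Gap in the mean unobservable between treated and untreated.
selection = U[D == 1].mean() - U[D == 0].mean()

print(f"OLS {ols:.3f} = beta_D {beta_D} + selection term {selection:.3f}")
```

In sample the two sides agree exactly, so the OLS coefficient recovers the causal effect only when the selection term is zero.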

Instrumental Variables: the Intuition behind the IV Approach

An intuitive solution is to "find" a shifter of $D_i$ that is not correlated with the "unobservables". Let's assume that we observe a variable $Z_i$ that:

1. Influences individuals' treatment status, which means that $Z_i$ and $D_i$ are correlated;

2. Is exogenous to the outcome of interest ($Y$), which means that $U_i$ and $Z_i$ are not correlated.

Then we could decompose the variation in the treatment status $D_i$ into two parts by estimating the following regression model by OLS:

$$D_i = \alpha_0 + \alpha_Z Z_i + V_i \tag{15}$$

and obtain:

$$\hat{D}_i(Z_i) = \hat{\alpha}_0 + \hat{\alpha}_Z Z_i, \qquad \hat{V}_i = D_i - \hat{D}_i(Z_i).$$


The Intuition behind the IV Approach

Note that, given assumptions (1) and (2) on the previous slide:

$$\operatorname{cov}(\hat{D}_i, \hat{V}_i) = 0, \qquad \operatorname{cov}(\hat{D}_i, U_i) = 0.$$

Therefore, if $D_i$ and $U_i$ are correlated, it must be the case that $U_i$ and $V_i$ are correlated. In that case, by estimating equation (15) we decompose the variation in the treatment status into two parts: (i) the first part, $\hat{D}_i(Z_i)$, which is "clean" of the "influence" of the $U$s, and (ii) the second part, which is "contaminated" by the $U$s:

$$D_i = \underbrace{\alpha_0 + \alpha_Z Z_i}_{\text{"clean": } \hat{D}_i(Z_i)} + \underbrace{\alpha_U U_i + \varepsilon_i}_{\text{"contaminated": } V_i} \tag{16}$$

We'll use only the first part to estimate the causal impact of $D$ on $Y$. This is known as the Instrumental Variables / IV method.
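The decomposition can be sketched on simulated data. Everything below is an assumption made for illustration: the instrument is drawn independently of $U_i$, and the coefficients of the treatment rule are invented. The helper `cov` is defined here and is not part of any library.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
U = rng.normal(size=n)  # unobservable in the outcome equation
Z = rng.normal(size=n)  # instrument, drawn independently of U
# Binary treatment driven by Z, by U and by an unrelated shock (assumed rule).
D = (Z + 0.8 * U + rng.normal(size=n) > 0).astype(float)

def cov(a, b):
    """Sample covariance of two 1-D arrays (hypothetical helper)."""
    return np.mean((a - a.mean()) * (b - b.mean()))

# First stage by OLS: D = alpha_0 + alpha_Z * Z + V.
alpha_Z = cov(D, Z) / cov(Z, Z)
alpha_0 = D.mean() - alpha_Z * Z.mean()
D_hat = alpha_0 + alpha_Z * Z  # "clean" part
V_hat = D - D_hat              # "contaminated" part

print("cov(D_hat, V_hat):", cov(D_hat, V_hat))  # zero by construction of OLS
print("cov(D_hat, U):    ", cov(D_hat, U))      # close to zero: Z is exogenous
print("cov(V_hat, U):    ", cov(V_hat, U))      # clearly positive
```

The fitted part is uncorrelated with $U_i$ up to sampling noise, while the residual carries the whole correlation between $D_i$ and $U_i$.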


The IV Estimator

Let's assume that we are interested in estimating the following causal relationship:

$$Y_i = \beta_0 + \beta_D D_i + U_i. \tag{17}$$

We can obtain a consistent estimate of $\beta_D$ using an instrument $Z_i$ that fulfills the following conditions:

1. It is correlated with treatment status $D_i$:

$$\operatorname{cov}(Z_i, D_i) \neq 0$$

2. It is uncorrelated with any other determinants of the dependent variable $Y_i$, that is, uncorrelated with the error term in the outcome equation:

$$\operatorname{cov}(Z_i, U_i) = 0$$

The latter is called an exclusion restriction, since $Z_i$ can be said (assumed) to be excluded from the causal model of interest in equation (17).

In this case we obtain a consistent estimate of $\beta_D$ by using the IV estimator:

$$\beta_D^{IV} = \frac{\operatorname{cov}(Z_i, Y_i)}{\operatorname{cov}(Z_i, D_i)}. \tag{18}$$
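A short derivation shows why the ratio in (18) recovers $\beta_D$ under the two conditions above. It uses only equation (17) and the linearity of the covariance:

```latex
\begin{aligned}
\operatorname{cov}(Z_i, Y_i)
  &= \operatorname{cov}(Z_i,\; \beta_0 + \beta_D D_i + U_i) \\
  &= \beta_D \operatorname{cov}(Z_i, D_i) + \operatorname{cov}(Z_i, U_i) \\
  &= \beta_D \operatorname{cov}(Z_i, D_i)
  \qquad \text{since } \operatorname{cov}(Z_i, U_i) = 0, \\[4pt]
\frac{\operatorname{cov}(Z_i, Y_i)}{\operatorname{cov}(Z_i, D_i)}
  &= \beta_D
  \qquad \text{since } \operatorname{cov}(Z_i, D_i) \neq 0 .
\end{aligned}
```

Condition 2 removes the bias term and condition 1 makes the division possible.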


The IV Method using TSLS

Let's assume that we are interested in estimating the following causal relationship:

$$Y_i = \beta_0 + \beta_D D_i + U_i. \tag{19}$$

Assuming that we observe a valid instrument $Z_i$, we can obtain a consistent estimate of $\beta_D$ using the following two-step procedure:

First stage: decompose $D_i$ into its "clean" and "contaminated" elements by estimating the following equation:

$$D_i = \alpha_0 + \alpha_Z Z_i + V_i. \tag{20}$$

Second stage: impute $\hat{D}_i = \hat{\alpha}_0 + \hat{\alpha}_Z Z_i$ and use it to estimate the causal relationship between $D_i$ and $Y_i$ by estimating the following equation:

$$Y_i = \beta_0 + \beta_D \hat{D}_i + W_i. \tag{21}$$


Does the TSLS Provide the IV Estimator?

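What do we actually do when we estimate equation (21)? To address this question let's write it explicitly:

$$Y_i = \beta_0 + \beta_D (\hat{\alpha}_0 + \hat{\alpha}_Z Z_i) + W_i. \tag{22}$$

Note that the OLS estimator of $\beta_D$ in the equation above, that is, the IV/TSLS estimator, is:

$$\beta_D^{IV} = \frac{\hat{\alpha}_Z \operatorname{cov}(Y_i, Z_i)}{\hat{\alpha}_Z^2 \operatorname{var}(Z_i)} = \frac{\operatorname{cov}(Y_i, Z_i)}{\hat{\alpha}_Z \operatorname{var}(Z_i)}. \tag{23}$$

Yet, since $\hat{\alpha}_Z$, estimated in the first stage, is:

$$\hat{\alpha}_Z = \frac{\operatorname{cov}(D_i, Z_i)}{\operatorname{var}(Z_i)}, \tag{24}$$

the IV/TSLS estimator equals:

$$\beta_D^{IV} = \frac{\operatorname{cov}(Y_i, Z_i) / \operatorname{var}(Z_i)}{\operatorname{cov}(D_i, Z_i) / \operatorname{var}(Z_i)} = \frac{\operatorname{cov}(Y_i, Z_i)}{\operatorname{cov}(D_i, Z_i)}. \tag{25}$$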
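The equality between the two-stage procedure and the covariance ratio can be verified on simulated data. This is a minimal sketch: the data-generating process and the true values are assumptions, and `ols` is a small helper written for this example rather than a library function.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
beta_0, beta_D = 1.0, 2.0  # hypothetical true values

U = rng.normal(size=n)                        # unobservable
Z = rng.integers(0, 2, size=n).astype(float)  # binary instrument, independent of U
D = (Z + 0.8 * U + rng.normal(size=n) > 0.5).astype(float)  # endogenous treatment
Y = beta_0 + beta_D * D + U

def ols(y, x):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return y.mean() - slope * x.mean(), slope

# First stage: regress D on Z and impute the fitted values.
a0, aZ = ols(D, Z)
D_hat = a0 + aZ * Z

# Second stage: regress Y on the imputed treatment.
_, beta_tsls = ols(Y, D_hat)

# Direct IV formula and the naive OLS benchmark.
beta_iv = np.cov(Z, Y)[0, 1] / np.cov(Z, D)[0, 1]
_, beta_ols = ols(Y, D)

print(f"TSLS {beta_tsls:.3f}, IV ratio {beta_iv:.3f}, OLS {beta_ols:.3f}")
```

The two IV numbers coincide up to floating-point error and sit close to the assumed $\beta_D$, while the OLS coefficient is pushed upward by selection on $U_i$.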


Are We Sure that the IV Formula is Correct?

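What we actually do is estimate the following equation:

$$Y_i = \beta_0 + \beta_D (\hat{\alpha}_0 + \hat{\alpha}_Z Z_i) + W_i. \tag{26}$$

In this case:

$$\beta_D = \frac{\hat{\alpha}_Z \operatorname{cov}(Y_i, Z_i)}{\hat{\alpha}_Z^2 \operatorname{var}(Z_i)} = \frac{\operatorname{cov}(Y_i, Z_i)}{\hat{\alpha}_Z \operatorname{var}(Z_i)}. \tag{27}$$

Yet, since:

$$\hat{\alpha}_Z = \frac{\operatorname{cov}(D_i, Z_i)}{\operatorname{var}(Z_i)}, \tag{28}$$

then:

$$\beta_D^{IV} = \frac{\operatorname{cov}(Y_i, Z_i) / \operatorname{var}(Z_i)}{\operatorname{cov}(D_i, Z_i) / \operatorname{var}(Z_i)} = \frac{\operatorname{cov}(Y_i, Z_i)}{\operatorname{cov}(D_i, Z_i)}. \tag{29}$$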


The IV Formula in the Binary Case

Consider a case where both the treatment and the instrument are binary:

$$D_i \in \{0, 1\}, \qquad Z_i \in \{0, 1\}.$$

We already proved that the regression coefficient on a binary regressor equals the gap in the means of the dependent variable between the two groups, that is:

$$\frac{\operatorname{cov}(Y_i, Z_i)}{\operatorname{var}(Z_i)} = E(Y \mid Z = 1) - E(Y \mid Z = 0),$$

$$\frac{\operatorname{cov}(D_i, Z_i)}{\operatorname{var}(Z_i)} = E(D \mid Z = 1) - E(D \mid Z = 0).$$

The IV estimator in this case equals the gap in mean outcomes divided by the gap in mean treatment, conditional on $Z$:

$$\beta_D^{IV} = \frac{E(Y \mid Z = 1) - E(Y \mid Z = 0)}{E(D \mid Z = 1) - E(D \mid Z = 0)}. \tag{30}$$
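Equation (30) is easy to compute directly. The function name `wald_estimate` and the tiny data set below are made up for illustration; the sketch also checks that the ratio of mean gaps matches the covariance ratio from the general IV formula.

```python
import numpy as np

def wald_estimate(y, d, z):
    """Gap in mean outcomes divided by gap in mean treatment, by binary z."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    outcome_gap = y[z == 1].mean() - y[z == 0].mean()
    treatment_gap = d[z == 1].mean() - d[z == 0].mean()
    return outcome_gap / treatment_gap

# Tiny invented data set, for illustration only.
z = np.array([0, 0, 0, 0, 1, 1, 1, 1])
d = np.array([0, 0, 0, 1, 0, 1, 1, 1])
y = np.array([1.0, 2.0, 1.5, 3.0, 2.0, 3.5, 4.0, 3.0])

wald = wald_estimate(y, d, z)
cov_ratio = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]
print(wald, cov_ratio)
```

Here the mean outcome gap is 1.25 and the mean treatment gap is 0.5, so both expressions give 2.5 up to floating-point error.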


The Effect of Schooling on Earnings: Angrist and Krueger 1991

One of the most studied questions in public policy is the effect of schooling on labor market outcomes.

In the reduced-form sense we are interested in estimating the causal impact of schooling on earnings:

$$Y_i = \beta_0 + \beta_S S_i + U_i, \tag{31}$$

where $Y$ measures earnings (log hourly, for instance) and $S$ stands for schooling.

A key concern is that years of schooling may be endogenous, with pre-schooling levels of ability affecting both schooling choices and earnings given education levels.

Angrist and Krueger (1991) exploit variation in schooling levels that arises from the differential impact of compulsory schooling laws to estimate $\beta_S$.


Angrist and Krueger 1991

In the US a child is entitled to drop out of school once he/she turns 16. School districts typically require a child to have turned six by January 1st of the year in which the student enters school. Therefore children born at different times of the year have accumulated different lengths of schooling by the time they turn 16.


Angrist and Krueger 1991: First Stage

Angrist and Krueger estimated the first stage using the following specification:

$$S_i = \alpha_0 + \alpha_{Z1} Z_1 + \alpha_{Z2} Z_2 + \alpha_{Z3} Z_3 + V_i. \tag{32}$$


Angrist and Krueger 1991: Second Stage for Men Born 1930-1939

Angrist and Krueger estimated the second stage using the following specification:

$$Y_i = \beta_0 + \beta_S \hat{S}_i + W_i. \tag{33}$$


The IV and the Wald Estimator

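Table 1 shows average years of education and average log weekly earnings for individuals born between 1930 and 1939, comparing those born in the first quarter with those born in the second through fourth quarters, using the 1980 census.

Table 1: Summary Statistics, a Subset of the Angrist and Krueger 1991 Data

| Variable | 1st Q | 2nd, 3rd & 4th Q | Difference |
|---|---|---|---|
| Years of Schooling | 12.6881 | 12.7969 | 0.1088 |
| Weekly Earnings (log) | 5.8916 | 5.9027 | 0.0111 |

OLS Estimate: 0.0709
Ratio: 0.1020

The IV estimator is:

$$\beta_S^{IV} = \frac{\operatorname{cov}(Y_i, Z_i) / \operatorname{var}(Z_i)}{\operatorname{cov}(S_i, Z_i) / \operatorname{var}(Z_i)} = \frac{\bar{Y}_{2\text{-}4} - \bar{Y}_1}{\bar{S}_{2\text{-}4} - \bar{S}_1} = \frac{0.0111}{0.1088} = 0.1020, \tag{34}$$

where the subscripts 1 and 2-4 denote the first quarter and the second through fourth quarters, as in Table 1.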
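The arithmetic behind equation (34) can be reproduced from the four group means in Table 1. The variable names are ours; the numbers are the ones reported in the table.

```python
# Group means from Table 1 (first quarter vs. second through fourth quarters).
schooling_q1, schooling_rest = 12.6881, 12.7969
log_earnings_q1, log_earnings_rest = 5.8916, 5.9027

schooling_gap = schooling_rest - schooling_q1       # gap in mean treatment
earnings_gap = log_earnings_rest - log_earnings_q1  # gap in mean outcome
wald = earnings_gap / schooling_gap                 # ratio of the two gaps

print(f"{schooling_gap:.4f} {earnings_gap:.4f} {wald:.4f}")
# prints: 0.1088 0.0111 0.1020
```

The ratio of about 0.102 is the Wald (IV) estimate of the return to a year of schooling, compared with the OLS estimate of 0.0709 reported in the table.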


Compliance

While the instrument affects choices, subjects choose based on other factors too. To discuss the implications, let's consider a case where both the treatment and the instrument are binary, that is, $Z_i \in \{0, 1\}$ and $D_i \in \{0, 1\}$.

Specifically, we could disaggregate the population into four groups:

1. Non-compliers ($C_i = 0$): always take treatment;
2. Non-compliers ($C_i = 0$): never take treatment;
3. Compliers ($C_i = 1$): take treatment;
4. Compliers ($C_i = 1$): do not take treatment.

What does the IV identify?


Local Average Treatment Effect

What is identified by the IV? Let $C$ denote the proportion of compliers. In that case the IV estimator can be expressed as:

$$\beta_D^{IV} = \frac{\left[E(Y_i^1 \mid C_i = 1) - E(Y_i^0 \mid C_i = 1)\right]}{\left[E(D_i \mid Z_i = 1) - E(D_i \mid Z_i = 0)\right]} \cdot C, \tag{35}$$

since $\left[E(Y_i \mid Z_i = 1 \,\&\, C_i = 0) - E(Y_i \mid Z_i = 0 \,\&\, C_i = 0)\right] = 0$.

Note that:

$$C = E(D_i \mid Z_i = 1) - E(D_i \mid Z_i = 0).$$

Denote by $\bar{Y}^0_{C=1}$ and $\bar{Y}^1_{C=1}$ the mean potential outcomes of the compliers; then:

$$\beta_D^{IV} = \bar{Y}^1_{C=1} - \bar{Y}^0_{C=1}. \tag{36}$$

Hence, the IV identifies the effect on the compliers. This is known as the Local Average Treatment Effect (LATE).

If the treatment effect varies across groups, this should be taken into account.
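A simulation can make the LATE result concrete. The population shares, the treatment effects and the baseline outcomes below are all invented for this sketch; the only structure carried over from the slides is that non-compliers do not respond to the instrument.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000

# Assumed population: 50% compliers, 25% always-takers, 25% never-takers.
group = rng.choice(["complier", "always", "never"], size=n, p=[0.5, 0.25, 0.25])
Z = rng.integers(0, 2, size=n)  # randomly assigned binary instrument

# Always-takers take treatment, never-takers do not, compliers follow Z.
D = np.where(group == "always", 1, np.where(group == "never", 0, Z))

# Heterogeneous effects (made up): 2.0 for compliers, 0.5 for everyone else.
effect = np.where(group == "complier", 2.0, 0.5)
Y0 = 1.0 + (group == "always") * 1.0 + rng.normal(size=n)  # untreated outcome
Y = Y0 + effect * D

first_stage = D[Z == 1].mean() - D[Z == 0].mean()  # estimates the complier share C
beta_iv = (Y[Z == 1].mean() - Y[Z == 0].mean()) / first_stage

print(f"complier share {first_stage:.3f}, IV estimate {beta_iv:.3f}")
```

The IV estimate lands near the effect assumed for compliers (2.0) rather than near the population average effect (1.25 in this setup), which is the sense in which the estimate is "local".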

The IV Estimator and Treatment Effects

Let's now allow for heterogeneous treatment effects.

Specifically, let's consider a case with two effects: one for those influenced by the "instrument", $\beta_{DC}$, and another for those who are not influenced by the "instrument", $\beta_{DNC}$.

If, for instance, children from high-income families benefit from schooling less than children from low-income families, then we might find the IV estimator of the returns to schooling to be higher than the OLS estimator, even if the OLS estimator is unbiased.

This can happen if children from low-income families are more influenced by the instrument (mandatory schooling, income shocks, tuition).

In that case:

$$\beta_D^{IV} = \beta_{DC} > \beta_D = \beta_D^{OLS}.$$

Thus, we should be careful with the interpretation of the IV estimator even when the identifying assumptions hold.


Take Home Message

Selection into "treatment" is endogenous to the evaluated outcomes. For instance:

Able students are less likely to drop out of school and earn more regardless of schooling.

Productive firms are more likely to get external finance and have higher profits than less productive firms regardless of the external finance.

Selection into "treatment" is also determined by other factors external to the subject.

TSLS allows us to control for self-selection into "treatment" by using external factors, the instrumental variables, that influence subjects' selection into "treatment".

TSLS provides a consistent estimate of the causal impact on those subjects whose treatment status is shifted by the instrument.

