regression discontinuity design - wordpress.com · this method was developed to estimate treatment...
TRANSCRIPT
Regression Discontinuity Design
Yona Rubinstein
July 2016
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 1 / 34
Outline
The basic idea of Regression Discontinuity Design (RDD).
Sharp versus Fuzzy discontinuities.
Estimating regression discontinuity.
Interpretation and Internal validity.
Heterogeneous effects and external validity.
Applications from recent literature.
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 2 / 34
Regression Discontinuity Design (RDD) in Brief
This method was developed to estimate treatment effects innon-experimental settings.
The basic idea of Regression Discontinuity Design (RDD) is thefollowing:
Observations / subjects (e.g. individuals, firms, governments,economies etc.) are ‘treated’based on a known cutoff rule.
There is an observed variable, X , that assigns subjects to treatment.
Subjects are treated if X ≥ C and subjects are not treated ifX < C , where C is the cutoff value.
This cutoff rule should generate discontinuity in outcomes (Y ) — ifindeed treatment matters. X might have a direct effect on theoutcome of interest (Y ). Yet in the absence of treatment therelationship between X and Y should be "smooth".
We can evaluate treatment effects at the margin by estimating therelationship between Y and X around the cutoff point.
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 3 / 34
Applications
Finance: credit constraints (Keys, et al, 2010).
Political Economy: incumbency effects (Lee, 2008). Do votersinfluence government policies? (Lee, Moretti and Buttler, 2004)
Health: the effect of medical treatment on infants (Almond et al.,2010); eligibility for Medicare (Card, Dobkin and Maestas, 2008);health insurance (Anderson, 2012).
Information: Yelp ratings effect (Anderson and Magruder, 2012;Lucas, 2012).
Education: class size (Angrist and Lavy, 1999), demand for schooling(Black, 1999), returns to schooling (Thistlethwaite and Campbell,1960; Abdulkadiroglu et al. 2014)
Environment: air quality and housing prices (Chay and Greenstone,2005)
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 4 / 34
Pros and Cons: Credible yet Limited
Pros:
Internal validity: some key identifying assumptions can beempirically verified; specifically the absence of other discontinuities
Easy to estimate (like RTC)
Credible causal estimates of treatment effects.Transparency: treatment and outcomes can be illustrated usinggraphical methods
Cons:
Treatment effects are local (LATE)
Limits external validity.
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 5 / 34
RDD and Randomized Treatment-Control (RTC) Setting
Assignment to treatment and control is not random!
Subjects cannot precisely manipulate the assignment variable X ,that is whether they are treated or not. It is a function of X .
A consequence of this is that the variation in treatment near thethreshold is randomized as though from a randomized experiment.
Subjects that are just below the threshold are similar/comparable tosubjects just above the X threshold.
The RD designs can be analyzed– and tested– likerandomized experiments.
Therefore RDD is a way of estimating treatment effects in anonexperimental setting where treatment is determined by whether anobserved “assignment”variable.
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 6 / 34
RDD as a Nonparametric Differences- in-Differences
Has similar flavor to Diff-in-Diff experiment setting.Graphical presentation of an RD design using raw data is helpful andinformative.A graph provides a sense of whether the “jump” in the outcome ofinterest at the cutoff is unusually large compared to bumps away fromthe cutoff.Plot outcome (Y ) against the observed “assignment variable" (X ). Iftreatment matters we should observe sharp, discontinuous change inY at the cutoff value of C .At the very same time we should not observe and observe sharp,discontinuous change in any explanatory variable at the cutoff valueof X = C .The first change in the first diff; the second change is thesecond diff.Note that treatment only depends on X we never observe subjectswith the same X for a treatment and a control (“sharp RDD”).Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 7 / 34
RDD Terminology
The variable X is called the “assignment variable" or the“forcing variable".
The cutoff rule can be a single variable or multiple variables; forsimplicity of illustrations, we’ll discuss the single variable setting.
We label the cutoff value — the threshold —with the letter C .
Outcomes: Y 0 and Y 1 are the potential outcomes of interest withand without treatment.
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 8 / 34
Two Types of RDD Settings
Sharp RDD: Assignment to treatment occurs through known anddeterministic decision rule that depends only on X ; i.e. if X ≥ C .Subject i is treated with probability 1. If Xi < C and not treatedotherwise:
Di ={1 if Xi ≥ C0 if Xi < C
}. (1)
Fuzzy RDD: The probability of being treated "jumps" at Xi = C :
Pr (Di = 1|X ) ={g1 (X ) if Xi ≥ Cg0 (X ) if Xi < C
}, (2)
where g1 (X = C ) > g1 (X = C ) .
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 9 / 34
RDD Settings and Compliance
Let Zi be an indicator variable that equals 1 if Xi ≥ C and 0 otherwise:Zi = 1 (Xi ≥ C ) . (3)
1 Sharp RDD: Assignment to treatment depends only on Zi :
Di ={1 if Zi = 10 if Zi = 0
}. (4)
2 Fuzzy RDD: Assignment to treatment depends on Zi , on X andon "unobservables":
Pr (Di = 1|X ) ={E (Di = 1|X ,Z = 1) if Zi = 1E (Di = 1|X ,Z = 0) if Zi = 0
}. (5)
Sharp RDD is a particular case of the Fuzzy RDD with perfectcompliance. We’ll discuss that later.Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 10 / 34
Discontinuity in Treatment: Sharp RDD Visually
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 11 / 34
Discontinuity in Treatment: Fuzzy RDD Visually
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 12 / 34
Continuity in Potential Outcomes
Y 0 (X ) and Y 0 (X ) continuous at threshold C ; common effect.
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 13 / 34
Continuity in Potential Outcomes
Y 0 (X ) and Y 0 (X ) continuous at threshold C ; heterogeneous effects.
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 14 / 34
Sharp RDD: Graphical Example
Simulated data with STATA: C = 140; "gen y = 100 + 80*T +2*x + rnormal(0, 20)"
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 15 / 34
Nonlinear Effect of X on Y, Graphical Example
Simulated data with STATA: C = 140; "gen y = 10000 + 0*T -100*x +x2 + rnormal(0, 1000)"
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 16 / 34
Nonlinear Effect of X on Y and Miss-specified Sharp RDD:
What if nonlinear? Could result in a biased treatment effect if oneassumes a linear model?
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 17 / 34
Sharp RDD: Estimation
Two practical strategies to implement a RDD.
1 First, globally using all data: control for the direct effect of X on Yin a very general and rigorous form.
2 Second, locally using data in a narrow window / margin aroundthreshold: control for the local direct effect of X on Y
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 18 / 34
Estimating Sharp RDD using All Data
Specify the potential outcomes models as a flexible function of theforcing variable X .
Potential outcomes in the absence of treatment:
Y 0i = β0 + f0 (Xi − C ) + U0i . (6)
Potential outcomes with treatment:
Y 1i = β0 + βD + f1 (Xi − C ) + U1i . (7)
f 0 (Xi ) and f 1 (Xi ) are general continuous functions of X where:
f 0 (0) = f 1 (0) = 0. (8)
The average treatment effect for subjects with X = C (LATE) is:
βD = Y1Xi=C = C − Y
0Xi=C . (9)
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 19 / 34
Sharp RDD using All Data in Practice
Practically we can estimate the local treatment effect using a fullyinteracted — switching regression —model:
Yi = (1−Di )Y 0i +DiY 0i (10)
Where:Di = 1 (Xi ≥ C ) . (11)
By introducing (6) and (7) into the switching regression modelabove we obtain:
Yi = β0 + βDDi + f0 (Xi − C ) + f (Xi − C )Di + Ui (12)
where:f (Xi − C ) = f 1 (Xi − C )− f 0 (Xi − C )
βD is the average treatment effect for subjects with Xi = C —theLocal Average Treatment Effect at (X = C ).Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 20 / 34
in Practice (cont.)
A common practical practice is to approximate model f (Xi − C ) witha pth order polynomial, specifically:
f 0 (Xi − C ) = f (xi ) = β1xi + β2x2i + β3x
3i + β4x
4i . . .+ βpx
pi ,(13)
f 1 (Xi − C ) = g (xi ) = γ1xi + γ2x2i + γ3x
3i + γ4x
4i . . .+ γpx
pi ,
Note that xi = Xi − C .We estimate the model in (12) using the following specification:
Yi = β0 + βDDi + (14)
β1xi + β2x2i + β3x
3i + β4x
4i . . .+ βpx
pi +
δ1xiDi + δ2x2i Di + δ3x3i Di + δ4x4i Di . . .+ δpxpi Di +
Ui ,
Note that δj = γj − βj .
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 21 / 34
Estimating Sharp RDD using Windows
Estimate, this time, using observations with Xi such that:
C − ∆ ≤ Xi ≤ C + ∆
Specifically, estimate the following model using the sub-sample ∆above:
Yi∆ = β0 + βDDi + f̃0 (xi ) + f̃ (xi )Di + Ui∆ (15)
f̃ 0 (xi ) and f̃ (xi ) are general continuous functions of X ; might be abit less flexible than the comparable f 0 (xi ) and f (xi ).
Note that as ∆→ 0 the estimated βOLSD → βD
lim∆→0
E [Yi |C ≤ Xi ≤ C + ∆]− E [Yi |C − ∆ ≤ Xi ≤ C ] =
E[(Y 1i − Y 0i
)|Xi = C
]= βD .
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 22 / 34
Sharp RDD: Winning the Next Election, Lee 2008
Lee (2008) uses a Sharp Regression Discontinuity Design to estimatethe causal effect of incumbency in U.S. House elections.
The population sample of contains more than six thousands electionsover the 1946—98 period.
The assignment variable in this setting —X— is the fraction of votesawarded to Democrats in the previous election.
The cutoff rule, C ,is 50%
When the fraction exceeds 50%, that is when X ≥ 0.5 a Democrat iselected and the party becomes the incumbent party in the nextelection.
Both the share of votes and the probability of winning the nextelection are considered as outcome variables.
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 23 / 34
Sharp RDD: Probability of Winning Election t+1, Lee 2008
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 24 / 34
Sharp RDD: Probability of Winning Election t-1, Lee 2008
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 25 / 34
Fuzzy RDD
The causal model - represented by the switching regression model(12) — is the same as in the Sharp RD Design:
Yi = β0 + βDDi + f0 (xi ) + f (xi )Di + Ui (16)
Yet, unlike the Sharp RDD, treatment itself (Di ) is correlated withwhether the subject is above or below the cutoff (Zi ) but not solelydetermined by that:
Zi = 1 (xi ≥ 0) ,Di 6= Zi .
Hence, we can write the probability of treatment as:
Pr (Di = 1|X ) = α0 + βZZi + g (xi ) (17)
Note that the Sharp RDD is a special case where:
α0 = 0, βZ = 1, g (xi ) = 0.Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 26 / 34
Fuzzy RDD as an IV
Hence, the cutoff rule serves as an instrumental variable.Assuming that Y is a smooth and continuos function of X , then thecutoff rule reflected in Z can serve as an instrumental variable since itfulfills the two necessary conditions:
1 Relevancy: Z affects the probability of treatment (D).2 Exclusion Restriction: Z does not have a direct effect on Yconditional on X .
We use 2SLS to estimate the parameter of interest βD .
1 First stage: since Di = Pr (Di = 1|X ) = α0 + βZZi + g (xi ) + Vi
Di = α0 + βZZi + g (xi ) + Vi . (18)
2 Second stage:Yi = β0 + βD D̂i + f (xi ) +Wi . (19)
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 27 / 34
Fuzzy RDD: an Example from Angrist and Lavy (1999)
Angrist and Lavy (1999) study the impact of class size oneducational attainment.
Angrist and Lavy exploit an old Talmudic rule —.“Maimonides’Rule”—that classes should be split if they have more than 40 students inIsrael. Specifically:
1 A school with 40 students has only one class of 40.2 A school with 41 students has two classes. with 21 and 20 students.
This rule implies that class C’s size at school S , MSC , as a functionof enrollment ES is:
MSC =ES
int[ES−140
]+ 1
,
where int [A] is the integer part of the real number A.
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 28 / 34
The Maimonides’Rule as an Instrumental Variable
Note, that the rule was NOT followed strictly! Yet, a very clever IV.
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 29 / 34
OLS and Fuzzy RD Estimates
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 30 / 34
Fuzzy Local RDD
Similarly to the sharp RDD, we can zoom in around the threshold andget non-parametric estimates.Specifically, we restrict the sample to the following range of X :
C − ∆ ≤ Xi ≤ C + ∆
Then we estimate the model using 2SLS.
1 The reduced form estimate is:
E [Yi |C ≤ Xi ≤ C + ∆]− E [Yi |C − ∆ ≤ Xi ≤ C ] (20)
2 The first stage estimate is:
E [Di |C ≤ Xi ≤ C + ∆]− E [Di |C − ∆ ≤ Xi ≤ C ] (21)
The Wald estimator is:
lim∆→0
E [Yi |C ≤ Xi ≤ C + ∆]− E [Yi |C − ∆ ≤ Xi ≤ C ]E [Di |C ≤ Xi ≤ C + ∆]− E [Di |C − ∆ ≤ Xi ≤ C ]
= βD (22)
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 31 / 34
Parametric or Non-Parametric? Bias versus Noise
Similarly to the sharp RDD, we can "zoom in" around the thresholdand get non-parametric estimates.When would parametric or non-parametric or window size matter?
Small effectRelationship between Y and X different away from cutoff,Functional form not well captured by polynomials (or other functionalform).
What do we trade-off?
Zoom In: consistent yet noisy estimate. smaller window can besubject to greater noise, robust to functional form assumptions.
Zoom Out: effi cient, yet biased estimate; contamination by X or"unobservables" the functional form fails to condition out.
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 32 / 34
RDD: Some Concerns
1 First concern: functional form and latent variables around cutoff2 Second concern: subject manipulate X around the cutoff. Note thatif subjects can’t perfectly manipulate it, then there will still berandomness in treatment.
Addressing 1st concern: plot RD pictures for other variables thatwe observe, but that shouldn’t be affected by treatment (D), seeLee (2008)
Addressing 2nd concern: test whether subjects are bunching aroundthe threshold by looking for discontinuities in the distribution ofcharacteristics that shouldn’t be affected by the treatment(gender?, age?, etc.).
Additional checks: falsification test
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 33 / 34
Take Home Message
RDD allows to identify causal effects using observational data.
Makes use of treatment assignment that isn’t random, but whereprocess follows some known and arbitrary cutoff rule.
Very useful with large samples; allows for nonparametric estimationand assessment of internal validity.
Very common scenario in practice, and estimator likely to be ofincreasing use.
If treatment effect is heterogeneous estimators’interpretation isLATE
Yona Rubinstein (LSE) Regression Discontinuity Design 07/16 34 / 34