notes on experiments and quasi-experiments in economics s ...€¦ · sergio firpo insper sergio...
TRANSCRIPT
-
“Notes on Experiments and quasi-experiments inEconomics”
São Paulo School of Advanced Science on Learning fromData
07/Aug/2019
Sergio FirpoInsper
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Introduction
How to determine if there is a Causal Effect of a Program on SomeVariable of Interest.
Example 1: Workers Training / Qualification Program:
Workers undergoing the program are compared with non-programworkers. In general, can it be argued that the average difference inlabor income between the groups is unequivocally due to the trainingeffect?
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Example 2: Conditional cash transfer programs:
Children whose families belong to the program are compared tochildren who do not belong to the program. In general, it can be saidthat the average difference in education (measured in some way, suchas matricule, lag, number of failures) between the decreasing groupsis due, unequivocally, to the effect of the program?
Answer for both examples: NOSergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
What are the conditions under which it can be determined if there isa causal effect?
Evaluation Design:
Groups chosen in a random way;
Groups chosen based on observables;
Groups are result of self-selection;
Assumptions about selection into treatment:
Researcher has access to observable determinants of selection;
Researcher does not have access to important determinants ofselection.
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Plan of this talk
Introduction: Identifying Program Effect, Causality, Potential Results,Heterogeneity, and Selection
Random Experiments: Advantages, Disadvantages, and Examples
Selection based on Observables
Basic Concepts
Matching
Propensity-Score Methods
Selection based on Non-Observables
Instrumental Variables
Regression-Discontinuity Designs
Synthetic Control MethodsSergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Identification
Identifying β in the equation below
Yi = α+ β · Ti + Ui
where Y is outcome variable and T treatment assignment dummy.
Causality
Causality for us is related to possibility of manipulation or of takingan action.
Examples:
“I took a pill and my headache disappeared.”
“She got a nice job because she graduated from an elite school.”
“He got fired because is old.”Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Potential outcomes and treatment effect heterogeneity
Potential Outcomes:Yi(0), outcome had i not received the treatment;Yi(1), outcome had i received the treatment;
We are interested in :
βi = Yi(1)− Yi(0)
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Central problem of causal inference (Holland, 1986):
We cannot observe both Yi (1) and Yi (0) simultaneously for same uniti.
Thus, we cannot observe βi = Yi (1)− Yi (0)
We observe Yi:
Yi = Ti · Yi (1) + (1− Ti) · Yi (0)= Yi (0) + (Yi (1)− Yi (0)) · Ti
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
If we could observe both potential outcomess:
We would then have individual treatment effects:
βi = Yi (1)− Yi (0)
And we would estimate the average treatment effect, ATE by
1
N
N∑i=1
(Yi (1)− Yi (0))P→ E [Y (1)− Y (0)] = ATE
We observe nevertheless {Yi, Ti, Xi}Ni=1 where Xi is a vector ofcovariates.
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
From the definition of Y , we have
Yi = αi + βi · Ti= α+ β · Ti + Ui
where
Ui = αi − α+ (βi − β) · Ti
α = E [αi]
β = E [βi] = ATE
and if
Ti ⊥⊥ (Yi (1) , Yi (0))⇒ E [Ui|Ti] = Cov (Ti, Ui) = 0
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Sometimes we are interested in the ATT , or the average treatmenteffect on the treated:
ATT = E [Y (1)− Y (0) |T = 1]
For the case of random experiments
Ti ⊥⊥ (Yi (1) , Yi (0))
then ATT = ATESergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Selection based on observables:
Given X, potential outcomes are independent of the treatment:
Pr [T = 1|Y (1) , Y (0) , X] = Pr [T = 1|X] ≡ p (X)
where p (X), or the probability of being treated given X, is theso-called propensity-score.
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Conditional independence assumption (given X) between T and thepotential outcomes Y (1) and Y (0) a.k.a treatment ignorability orunconfoundedness.
T is ignorable if
Pr [T = 1|Y (1) , Y (0) , X] = Pr [T = 1|X]or
(Y (1) , Y (0)) ⊥⊥ T |X
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Given ignorability, there are several ways to identify ATE from data.
Imputation:
ATE = E [Y (1)]− E [Y (0)]= E [E [Y (1) |X]]− E [E [Y (0) |X]]= E [E [Y (1) |X,T = 1]]− E [E [Y (0) |X,T = 0]]= E [E [Y |X,T = 1]]− E [E [Y |X,T = 0]]
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
If we are interested in ATT , then
ATT = E [Y (1) |T = 1]− E [Y (0) |T = 1]= E [Y |T = 1]− E [E [Y (0) |X,T = 1] |T = 1]= E [Y |T = 1]− E [E [Y (0) |X,T = 0] |T = 1]= E [Y |T = 1]− E [E [Y |X,T = 0] |T = 1]
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Thus, we can estimate ATE and ATT by:
ÂTEimp =1
N
N∑i=1
Ê [Y |Xi, T = 1]−1
N
N∑i=1
Ê [Y |Xi, T = 0]
ÂTT imp =1
N
N∑i=1
Tip̂·(Yi − Ê [Y |Xi, T = 0]
)
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Matching
For individual i, if Yi (1) observed (Ti = 1), then Yi (0) is missing.
Find j, such that Tj = 0 and:
‖Xj −Xi‖ ≤ ‖Xl −Xi‖ , ∀ l, Tl = 0
Then Ŷi (0) = Yj (0) = Yj .
ATT is estimated by
ÂTTmatch,1 =1
N
N∑i=1
Tip̂·(Yi − Ŷi (0)
)
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Implementation questions:
1 How to deal with ties?
2 With or without replacement?
3 What distance ‖·‖ should be used?
4 ‖Xi −Xj‖ is different from zero, in general. That generates bias. Howto deal with that? Bias increases with dimension of X.
5 Statistical inference.
6 Number of matches: bias-variance tradeoff
7 Bootstrap fails
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Finally, note that one interpretation for matching is as a particular caseof imputation method. For ATT :
ÂTT imp =1
N
N∑i=1
Tip̂·(Yi − Ê [Y |Xi, T = 0]
)ÂTTmatch,1 =
1
N
N∑i=1
Tip̂·(Yi − Ŷi (0)
)
and difference between Ê [Y |Xi, T = 0] and Ŷi (0) may not even exist.If, for example, E [Y |X,T = 0] is estimated by nearest-neighbor withfixed number of neighbors, these methods are algebraically identical.
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Inverse probability weighting
ATE = E [E [Y |X,T = 1]]− E [E [Y |X,T = 0]]
= E
[T
p (X)· Y]− E
[(1− T
1− p (X)
)· Y]
ATT =E [T · Y ]
p− E [p (X) · E [Y |X,T = 0]]
p
= E
[T
p· Y]− E
[(p (X)
p· 1− p1− p (X)
· 1− T1− p
)· Y]
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Selection on Unobservables: Instrumental Variables Method
How to estimate ATE when
Pr [T = 1|Y (1) , Y (0) , X] 6= Pr [T = 1|X]
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Let Z be a dummy instrumental variable.
Define potential treatment Ti (z), as the treatment that individual iwould have had she got Z = z. Observed treatment is then:
Ti = Ti (1) · Zi + Ti (0) · (1− Zi)
And observed outcome still is:
Yi = Yi (1) · Ti + Yi (0) · (1− Ti)
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Identification assumptions:
Instrument validity:
(Y (1) , Y (0) , T (1) , T (0)) ⊥⊥ Z
Monotonicity:
T (1) ≥ T (0)
Four types, but one is ruled out (defiers):
T (1) T (0) type1 1 always-taker (a)1 0 complier (c)0 1 defier (d)0 0 never-taker (n)
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
One can obtain an average treatment effect that is ‘local’, as it is anATE for “compliers”, or simply, LATE:
LATE = E [Y (1)− Y (0) |T (1) > T (0)]
=E [Y |Z = 1]− E [Y |Z = 0]E [T |Z = 1]− E [T |Z = 0]
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Regression Discontinuity Design
Novamente, o ideal seria estimar ATE = E [Y (1)− Y (0)]. Lembre-seque:
Yi = Yi (0) + (Yi (1)− Yi (0)) · Ti= α+ β · Ti + Yi (0)− α+ (Yi (1)− Yi (0)− β) · Ti= α+ β · Ti + Ui
onde
β = ATE
Ui = Yi (0)− α+ (Yi (1)− Yi (0)− β) · Ti
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Say that there is some continuous variable Z, sucht that E[T |Z] is welldefined.
There is a point z0, in which the probability of being treated jumps:
T+ ≡ lim�↓0
Pr [T = 1|Z = z0 + �]
T− ≡ lim�↓0
Pr [T = 1|Z = z0 − �]
T+ 6= T−
This is the source of exogenous variation that allows us to identify ATEat z0.
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Fuzzy design versus sharp design
sharp design
Pr[T = 1|Z < z0] = 0Pr[T = 1|Z > z0] = 1
fuzzy design:
0 < Pr[T = 1|Z < z0] < 10 < Pr[T = 1|Z > z0] < 1
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Assumption: E[Y (1)|Z = z] and E[Y (0)|Z = z] are continuous atz = z0.
Define
Y + ≡ lim�↓0
E [Y |Z = z0 + �]
Y − ≡ lim�↓0
E [Y |Z = z0 − �]
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Then ATE at z0 is
ATE(z0) = E [Y (1)− Y (0) |Z = z0]
=Y + − Y −
T+ − T−
Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1
-
Figure 1: Simple Linear RD Setup
0
1
2
3
4
Forcing variable (X)
Out
com
e va
riabl
e (Y
)
A''
B'
C C'C''
τ
-
Figure 2: Nonlinear RD
0.00
0.50
1.00
1.50
2.00
2.50
3.00
3.50
4.00
0 0.5 1 1.5 2 2.5 3 3.5 4
Forcing variable (X)
Out
com
e va
riabl
e (Y
)
Xd
E FD
B'
B
A'
AE[Y(1)|X]
E[Y(0)|X]
Observed
Observed
-
Figure 3: Randomized Experiment as a RD Design
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
0 0.5 1 1.5 2 2.5 3 3.5 4
Forcing variable (random number, X)
Out
com
e va
riabl
e (Y
)
E[Y(1)|X]
E[Y(0)|X]
Observed (treatment)
Observed (control)
-
Figure 5. Treatment, Observables, and Unobservables in four research designs.
A. Randomized Experiment
B. Regression Discontinuity Design
C. Matching on Observables
D. Instrumental Variables
-
Figure 6a: Share of vote in next election, bandwidth of 0.02 (50 bins)
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5
-
Figure 6b: Share of vote in next election, bandwidth of 0.01 (100 bins)
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5
-
Figure 6c: Share of vote in next election, bandwidth of 0.005 (200 bins)
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5
-
Figure 7a: Winning the next election, bandwidth of 0.02 (50 bins)
0.00
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
0.90
1.00
-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5
-
Figure 7b: Winning the next election, bandwidth of 0.01 (100 bins)
0.00
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
0.90
1.00
-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5
-
Figure 7c: Winning the next election, bandwidth of 0.005 (200 bins)
0.00
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
0.90
1.00
-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5
-
Figure 8: Density of the forcing variable (vote share in previous election)
0.00
0.50
1.00
1.50
2.00
2.50
-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5
-
Figure 9: Discontinuity in baseline covariate (share of vote in prior election)
0.20
0.30
0.40
0.50
0.60
0.70
0.80
-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5
-
Grupo de Controle Sintético (Abadie e Gardeazabal, AER 2003)
� Muitas vezes interesse recai sobre efeitos de eventos ou intervenções queacontecem em níveis agregados
� Exemplos:
efeito de salário mínimo sobre emprego: compara-se grupo afetado (eg,estado nos EUA) com um que não tenha sido afetado pela intervenção;
efeito de progressão continuada: comparam-se escolas da rede estadualde SP com de outras redes estaduais não afetadas.
-
� Em geral, para se estimar efeitos de eventos agregados, acompanhamosa evolução temporal da unidade afetada pela intervenção e comparamosessa evolução com a de uma outra unidade que não tenha sido afetada.
� Por exemplo, vemos o que aconteceu com salários e desemprego médioantes e depois da intervenção para tratados e controles (di¤-in-di¤).
� Como escolher grupos de comparação? No caso do progressão continuadaem SP, qual UF usar como comparação? RJ? O fato de ser estado contíguonão pode ter efeito nos municípios fronteiriços?
-
� Descrição do Método
Suponha que haja J+1 regiões e que 1a região foi afetada pelo trata-mento (intervenção)
Existem disponíveis J regiões de controle. SejaW =�w2; :::; wJ+1
�|,onde wj � 0 for j = 2; :::; J + 1 e w2+ :::+wJ+1 = 1. Cada valorde W representa um controle potencial.
Seja X1 um vetor k � 1 de características prétratamento. De modesimilar, X0 é uma matriz k � J que contém as mesmas variáveis nasregiões não afetadas.
O vetorW � é escolhido como o argumento que minimiza kX1 �X0Wksujeito a wj � 0 for j = 2; :::; J + 1 e w2 + :::+ wJ+1 = 1.
-
Em geral, kAkV =pA|V A onde V é matriz simétrica, positiva e
semidenida.
-
� Vantagens
Como controle, usamos a combinação convexa das unidades de com-paração que mais se assemelham ao valor das características observáveisdo grupo afetado antes do tratamento.
Evita-se extrapolação
Força pesquisador a demonstrar anidades entre unidades afetadas enão-afetadas usando caracterísitcas quanticáveis.
Pode acomodar presença de fatores não-observáveis.
Inferência válida e independente do número de unidades disponíveispara comparação.
-
� Exemplo: Californias Proposition 99, a qual aumentou imposto sobre ciga-rros em 25 cents/pack.
� Ver em Synthetic Control Methods for Comparative Case Studies: Es-timating the E¤ect of Californias Tobacco Control Program de AlbertoAbadie, A. Diamond and J. Hainmueller (2007).
-
California’s Proposition 99
• In 1988, California was first to pass comprehensive tobaccocontrol legislation (Proposition 99)
– increased cigarette tax by 25 cents/pack
– earmarked tax revenues to health and anti-smoking bud-
gets
– funded anti-smoking media campaigns
– spurred clean-air ordinances throughout the state
– produced more than $100 million per year in anti-tobacco
projects
• What was the effect of Proposition 99 on tobacco consump-tion in California?
-
1970 1975 1980 1985 1990 1995 2000
020
4060
8010
012
014
0
year
per−
capi
ta c
igar
ette
sal
es (
in p
acks
)
Californiarest of the U.S.
Passage of Proposition 99
California and the Rest of the U.S.
-
Smoking Prevalence Predictors Means
California Average ofVariables Real Synthetic 38 control statesLn(GDP per capita) 10.08 9.86 9.86Percent aged 15-24 17.40 17.40 17.29Retail price 89.42 89.41 87.27Beer consumption per capita 24.28 24.20 23.75Cigarette sales per capita 1988 90.10 91.62 114.20Cigarette sales per capita 1980 120.20 120.43 136.58Cigarette sales per capita 1975 127.10 126.99 132.81
Note: All variables except lagged cigarette sales are averaged for the 1980-1988 period (beer consumption is averaged 1984-1988).
-
1970 1975 1980 1985 1990 1995 2000
020
4060
8010
012
014
0
year
per−
capi
ta c
igar
ette
sal
es (
in p
acks
)
Californiasynthetic California
Passage of Proposition 99
California and synthetic California
-
1970 1975 1980 1985 1990 1995 2000
020
4060
8010
012
014
0
year
per−
capi
ta c
igar
ette
sal
es (
in p
acks
)
Californiarest of the U.S.
Passage of Proposition 99
California and the Rest of the U.S.
-
1970 1975 1980 1985 1990 1995 2000
020
4060
8010
012
014
0
year
per−
capi
ta c
igar
ette
sal
es (
in p
acks
)
Californiasynthetic California
Passage of Proposition 99
California and synthetic California
-
1970 1975 1980 1985 1990 1995 2000
−30
−20
−10
010
2030
year
gap
in p
er−
capi
ta c
igar
ette
sal
es (
in p
acks
)
Passage of Proposition 99
Smoking Gap Between California and synthetic California