notes on experiments and quasi-experiments in economics s ...€¦ · sergio firpo insper sergio...

“Notes on Experiments and quasi-experiments inEconomics”

São Paulo School of Advanced Science on Learning fromData

07/Aug/2019

Sergio FirpoInsper

Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

Introduction

How to determine if there is a Causal Effect of a Program on SomeVariable of Interest.

Example 1: Workers Training / Qualification Program:

Workers undergoing the program are compared with non-programworkers. In general, can it be argued that the average difference inlabor income between the groups is unequivocally due to the trainingeffect?


Example 2: Conditional cash transfer programs:

Children whose families belong to the program are compared tochildren who do not belong to the program. In general, it can be saidthat the average difference in education (measured in some way, suchas matricule, lag, number of failures) between the decreasing groupsis due, unequivocally, to the effect of the program?

Answer for both examples: NOSergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

What are the conditions under which it can be determined if there isa causal effect?

Evaluation Design:

Groups chosen in a random way;

Groups chosen based on observables;

Groups are result of self-selection;

Assumptions about selection into treatment:

Researcher has access to observable determinants of selection;

Researcher does not have access to important determinants ofselection.


Plan of this talk

Introduction: Identifying Program Effect, Causality, Potential Results,Heterogeneity, and Selection

Random Experiments: Advantages, Disadvantages, and Examples

Selection based on Observables

Basic Concepts

Matching

Propensity-Score Methods

Selection based on Non-Observables

Instrumental Variables

Regression-Discontinuity Designs

Synthetic Control MethodsSergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

Identification

Identifying β in the equation below

Yi = α+ β · Ti + Ui

where Y is outcome variable and T treatment assignment dummy.

Causality

Causality for us is related to possibility of manipulation or of takingan action.

Examples:

“I took a pill and my headache disappeared.”

“She got a nice job because she graduated from an elite school.”

“He got fired because is old.”Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

Potential outcomes and treatment effect heterogeneity

Potential Outcomes:Yi(0), outcome had i not received the treatment;Yi(1), outcome had i received the treatment;

We are interested in :

βi = Yi(1)− Yi(0)


Central problem of causal inference (Holland, 1986):

We cannot observe both Yi (1) and Yi (0) simultaneously for same uniti.

Thus, we cannot observe βi = Yi (1)− Yi (0)

We observe Yi:

Yi = Ti · Yi (1) + (1− Ti) · Yi (0)= Yi (0) + (Yi (1)− Yi (0)) · Ti


If we could observe both potential outcomess:

We would then have individual treatment effects:

βi = Yi (1)− Yi (0)

And we would estimate the average treatment effect, ATE by

1

N

N∑i=1

(Yi (1)− Yi (0))P→ E [Y (1)− Y (0)] = ATE

We observe nevertheless {Yi, Ti, Xi}Ni=1 where Xi is a vector ofcovariates.


From the definition of Y , we have

Yi = αi + βi · Ti= α+ β · Ti + Ui

where

Ui = αi − α+ (βi − β) · Ti

α = E [αi]

β = E [βi] = ATE

and if

Ti ⊥⊥ (Yi (1) , Yi (0))⇒ E [Ui|Ti] = Cov (Ti, Ui) = 0


Sometimes we are interested in the ATT , or the average treatmenteffect on the treated:

ATT = E [Y (1)− Y (0) |T = 1]

For the case of random experiments

Ti ⊥⊥ (Yi (1) , Yi (0))

then ATT = ATESergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

Selection based on observables:

Given X, potential outcomes are independent of the treatment:

Pr [T = 1|Y (1) , Y (0) , X] = Pr [T = 1|X] ≡ p (X)

where p (X), or the probability of being treated given X, is theso-called propensity-score.


Conditional independence assumption (given X) between T and thepotential outcomes Y (1) and Y (0) a.k.a treatment ignorability orunconfoundedness.

T is ignorable if

Pr [T = 1|Y (1) , Y (0) , X] = Pr [T = 1|X]or

(Y (1) , Y (0)) ⊥⊥ T |X


If we are interested in ATT , then

ATT = E [Y (1) |T = 1]− E [Y (0) |T = 1]= E [Y |T = 1]− E [E [Y (0) |X,T = 1] |T = 1]= E [Y |T = 1]− E [E [Y (0) |X,T = 0] |T = 1]= E [Y |T = 1]− E [E [Y |X,T = 0] |T = 1]


Thus, we can estimate ATE and ATT by:

ÂTEimp =1

N

N∑i=1

Ê [Y |Xi, T = 1]−1

N

N∑i=1

Ê [Y |Xi, T = 0]

ÂTT imp =1

N

N∑i=1

Tip̂·(Yi − Ê [Y |Xi, T = 0]

)


Matching

For individual i, if Yi (1) observed (Ti = 1), then Yi (0) is missing.

Find j, such that Tj = 0 and:

‖Xj −Xi‖ ≤ ‖Xl −Xi‖ , ∀ l, Tl = 0

Then Ŷi (0) = Yj (0) = Yj .

ATT is estimated by

ÂTTmatch,1 =1

N

N∑i=1

Tip̂·(Yi − Ŷi (0)

)


Implementation questions:

1 How to deal with ties?

2 With or without replacement?

3 What distance ‖·‖ should be used?

4 ‖Xi −Xj‖ is different from zero, in general. That generates bias. Howto deal with that? Bias increases with dimension of X.

5 Statistical inference.

6 Number of matches: bias-variance tradeoff

7 Bootstrap fails


Finally, note that one interpretation for matching is as a particular caseof imputation method. For ATT :

ÂTT imp =1

N

N∑i=1

Tip̂·(Yi − Ê [Y |Xi, T = 0]

)ÂTTmatch,1 =

1

N

N∑i=1

Tip̂·(Yi − Ŷi (0)

)

and difference between Ê [Y |Xi, T = 0] and Ŷi (0) may not even exist.If, for example, E [Y |X,T = 0] is estimated by nearest-neighbor withfixed number of neighbors, these methods are algebraically identical.


Inverse probability weighting

ATE = E [E [Y |X,T = 1]]− E [E [Y |X,T = 0]]

= E

[T

p (X)· Y]− E

[(1− T

1− p (X)

)· Y]

ATT =E [T · Y ]

p− E [p (X) · E [Y |X,T = 0]]

p

= E

[T

p· Y]− E

[(p (X)

p· 1− p1− p (X)

· 1− T1− p

)· Y]


Selection on Unobservables: Instrumental Variables Method

How to estimate ATE when

Pr [T = 1|Y (1) , Y (0) , X] 6= Pr [T = 1|X]


Let Z be a dummy instrumental variable.

Define potential treatment Ti (z), as the treatment that individual iwould have had she got Z = z. Observed treatment is then:

Ti = Ti (1) · Zi + Ti (0) · (1− Zi)

And observed outcome still is:

Yi = Yi (1) · Ti + Yi (0) · (1− Ti)


Identification assumptions:

Instrument validity:

(Y (1) , Y (0) , T (1) , T (0)) ⊥⊥ Z

Monotonicity:

T (1) ≥ T (0)

Four types, but one is ruled out (defiers):

T (1) T (0) type1 1 always-taker (a)1 0 complier (c)0 1 defier (d)0 0 never-taker (n)


One can obtain an average treatment effect that is ‘local’, as it is anATE for “compliers”, or simply, LATE:

LATE = E [Y (1)− Y (0) |T (1) > T (0)]

=E [Y |Z = 1]− E [Y |Z = 0]E [T |Z = 1]− E [T |Z = 0]


Regression Discontinuity Design

Novamente, o ideal seria estimar ATE = E [Y (1)− Y (0)]. Lembre-seque:

Yi = Yi (0) + (Yi (1)− Yi (0)) · Ti= α+ β · Ti + Yi (0)− α+ (Yi (1)− Yi (0)− β) · Ti= α+ β · Ti + Ui

onde

β = ATE

Ui = Yi (0)− α+ (Yi (1)− Yi (0)− β) · Ti


Say that there is some continuous variable Z, sucht that E[T |Z] is welldefined.

There is a point z0, in which the probability of being treated jumps:

T+ ≡ lim�↓0

Pr [T = 1|Z = z0 + �]

T− ≡ lim�↓0

Pr [T = 1|Z = z0 − �]

T+ 6= T−

This is the source of exogenous variation that allows us to identify ATEat z0.


Fuzzy design versus sharp design

sharp design

Pr[T = 1|Z < z0] = 0Pr[T = 1|Z > z0] = 1

fuzzy design:

0 < Pr[T = 1|Z < z0] < 10 < Pr[T = 1|Z > z0] < 1


Assumption: E[Y (1)|Z = z] and E[Y (0)|Z = z] are continuous atz = z0.

Define

Y + ≡ lim�↓0

E [Y |Z = z0 + �]

Y − ≡ lim�↓0

E [Y |Z = z0 − �]


Then ATE at z0 is

ATE(z0) = E [Y (1)− Y (0) |Z = z0]

=Y + − Y −

T+ − T−


Figure 1: Simple Linear RD Setup

0

1

2

3

4

Forcing variable (X)

Out

com

e va

riabl

e (Y

)

A''

B'

C C'C''

τ

Figure 2: Nonlinear RD

0.00

0.50

1.00

1.50

2.00

2.50

3.00

3.50

4.00

0 0.5 1 1.5 2 2.5 3 3.5 4

Forcing variable (X)

Out

com

e va

riabl

e (Y

)

Xd

E FD

B'

B

A'

AE[Y(1)|X]

E[Y(0)|X]

Observed

Observed

Figure 3: Randomized Experiment as a RD Design

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

0 0.5 1 1.5 2 2.5 3 3.5 4

Forcing variable (random number, X)

Out

com

e va

riabl

e (Y

)

E[Y(1)|X]

E[Y(0)|X]

Observed (treatment)

Observed (control)

Figure 5. Treatment, Observables, and Unobservables in four research designs.

A. Randomized Experiment

B. Regression Discontinuity Design

C. Matching on Observables

D. Instrumental Variables

Figure 6a: Share of vote in next election, bandwidth of 0.02 (50 bins)

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

Figure 6b: Share of vote in next election, bandwidth of 0.01 (100 bins)

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

Figure 6c: Share of vote in next election, bandwidth of 0.005 (200 bins)

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

Figure 7a: Winning the next election, bandwidth of 0.02 (50 bins)

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

1.00

-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

Figure 7b: Winning the next election, bandwidth of 0.01 (100 bins)

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

1.00

-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

Figure 7c: Winning the next election, bandwidth of 0.005 (200 bins)

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

1.00

-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

Figure 8: Density of the forcing variable (vote share in previous election)

0.00

0.50

1.00

1.50

2.00

2.50

-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

Figure 9: Discontinuity in baseline covariate (share of vote in prior election)

0.20

0.30

0.40

0.50

0.60

0.70

0.80

-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

Grupo de Controle Sintético (Abadie e Gardeazabal, AER 2003)

� Muitas vezes interesse recai sobre efeitos de eventos ou intervenções queacontecem em níveis agregados

� Exemplos:

efeito de salário mínimo sobre emprego: compara-se grupo afetado (eg,estado nos EUA) com um que não tenha sido afetado pela intervenção;

efeito de progressão continuada: comparam-se escolas da rede estadualde SP com de outras redes estaduais não afetadas.

� Em geral, para se estimar efeitos de eventos agregados, acompanhamosa evolução temporal da unidade afetada pela intervenção e comparamosessa evolução com a de uma outra unidade que não tenha sido afetada.

� Por exemplo, vemos o que aconteceu com salários e desemprego médioantes e depois da intervenção para tratados e controles (di¤-in-di¤).

� Como escolher grupos de comparação? No caso do progressão continuadaem SP, qual UF usar como comparação? RJ? O fato de ser estado contíguonão pode ter efeito nos municípios fronteiriços?

� Descrição do Método

Suponha que haja J+1 regiões e que 1a região foi afetada pelo trata-mento (intervenção)

Existem disponíveis J regiões de controle. SejaW =�w2; :::; wJ+1

�|,onde wj � 0 for j = 2; :::; J + 1 e w2+ :::+wJ+1 = 1. Cada valorde W representa um controle potencial.

Seja X1 um vetor k � 1 de características prétratamento. De modesimilar, X0 é uma matriz k � J que contém as mesmas variáveis nasregiões não afetadas.

O vetorW � é escolhido como o argumento que minimiza kX1 �X0Wksujeito a wj � 0 for j = 2; :::; J + 1 e w2 + :::+ wJ+1 = 1.

Em geral, kAkV =pA|V A onde V é matriz simétrica, positiva e

semidenida.

� Vantagens

Como controle, usamos a combinação convexa das unidades de com-paração que mais se assemelham ao valor das características observáveisdo grupo afetado antes do tratamento.

Evita-se extrapolação

Força pesquisador a demonstrar anidades entre unidades afetadas enão-afetadas usando caracterísitcas quanticáveis.

Pode acomodar presença de fatores não-observáveis.

Inferência válida e independente do número de unidades disponíveispara comparação.

� Exemplo: Californias Proposition 99, a qual aumentou imposto sobre ciga-rros em 25 cents/pack.

� Ver em Synthetic Control Methods for Comparative Case Studies: Es-timating the E¤ect of Californias Tobacco Control Program de AlbertoAbadie, A. Diamond and J. Hainmueller (2007).

California’s Proposition 99

• In 1988, California was first to pass comprehensive tobaccocontrol legislation (Proposition 99)

– increased cigarette tax by 25 cents/pack

– earmarked tax revenues to health and anti-smoking bud-

gets

– funded anti-smoking media campaigns

– spurred clean-air ordinances throughout the state

– produced more than $100 million per year in anti-tobacco

projects

• What was the effect of Proposition 99 on tobacco consump-tion in California?

1970 1975 1980 1985 1990 1995 2000

020

4060

8010

012

014

0

year

per−

capi

ta c

igar

ette

sal

es (

in p

acks

)

Californiarest of the U.S.

Passage of Proposition 99

California and the Rest of the U.S.

Smoking Prevalence Predictors Means

California Average ofVariables Real Synthetic 38 control statesLn(GDP per capita) 10.08 9.86 9.86Percent aged 15-24 17.40 17.40 17.29Retail price 89.42 89.41 87.27Beer consumption per capita 24.28 24.20 23.75Cigarette sales per capita 1988 90.10 91.62 114.20Cigarette sales per capita 1980 120.20 120.43 136.58Cigarette sales per capita 1975 127.10 126.99 132.81

Note: All variables except lagged cigarette sales are averaged for the 1980-1988 period (beer consumption is averaged 1984-1988).

1970 1975 1980 1985 1990 1995 2000

020

4060

8010

012

014

0

year

per−

capi

ta c

igar

ette

sal

es (

in p

acks

)

Californiasynthetic California


California and synthetic California

1970 1975 1980 1985 1990 1995 2000

020

4060

8010

012

014

0

year

per−

capi

ta c

igar

ette

sal

es (

in p

acks

)

Californiarest of the U.S.


California and the Rest of the U.S.

1970 1975 1980 1985 1990 1995 2000

020

4060

8010

012

014

0

year

per−

capi

ta c

igar

ette

sal

es (

in p

acks

)

Californiasynthetic California


California and synthetic California

1970 1975 1980 1985 1990 1995 2000

−30

−20

−10

010

2030

year

gap

in p

er−

capi

ta c

igar

ette

sal

es (

in p

acks

)


Smoking Gap Between California and synthetic California

notes on experiments and quasi-experiments in economics s ...€¦ · sergio firpo insper sergio...

Documents