notes on experiments and quasi-experiments in economics s ...€¦ · sergio firpo insper sergio...

55
“Notes on Experiments and quasi-experiments in Economics” ao Paulo School of Advanced Science on Learning from Data 07/Aug/2019 Sergio Firpo Insper Sergio Firpo Lectures 2 and 3 - SPSAS 1/1

Upload: others

Post on 17-Oct-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

  • “Notes on Experiments and quasi-experiments inEconomics”

    São Paulo School of Advanced Science on Learning fromData

    07/Aug/2019

    Sergio FirpoInsper

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Introduction

    How to determine if there is a Causal Effect of a Program on SomeVariable of Interest.

    Example 1: Workers Training / Qualification Program:

    Workers undergoing the program are compared with non-programworkers. In general, can it be argued that the average difference inlabor income between the groups is unequivocally due to the trainingeffect?

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Example 2: Conditional cash transfer programs:

    Children whose families belong to the program are compared tochildren who do not belong to the program. In general, it can be saidthat the average difference in education (measured in some way, suchas matricule, lag, number of failures) between the decreasing groupsis due, unequivocally, to the effect of the program?

    Answer for both examples: NOSergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • What are the conditions under which it can be determined if there isa causal effect?

    Evaluation Design:

    Groups chosen in a random way;

    Groups chosen based on observables;

    Groups are result of self-selection;

    Assumptions about selection into treatment:

    Researcher has access to observable determinants of selection;

    Researcher does not have access to important determinants ofselection.

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Plan of this talk

    Introduction: Identifying Program Effect, Causality, Potential Results,Heterogeneity, and Selection

    Random Experiments: Advantages, Disadvantages, and Examples

    Selection based on Observables

    Basic Concepts

    Matching

    Propensity-Score Methods

    Selection based on Non-Observables

    Instrumental Variables

    Regression-Discontinuity Designs

    Synthetic Control MethodsSergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Identification

    Identifying β in the equation below

    Yi = α+ β · Ti + Ui

    where Y is outcome variable and T treatment assignment dummy.

    Causality

    Causality for us is related to possibility of manipulation or of takingan action.

    Examples:

    “I took a pill and my headache disappeared.”

    “She got a nice job because she graduated from an elite school.”

    “He got fired because is old.”Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Potential outcomes and treatment effect heterogeneity

    Potential Outcomes:Yi(0), outcome had i not received the treatment;Yi(1), outcome had i received the treatment;

    We are interested in :

    βi = Yi(1)− Yi(0)

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Central problem of causal inference (Holland, 1986):

    We cannot observe both Yi (1) and Yi (0) simultaneously for same uniti.

    Thus, we cannot observe βi = Yi (1)− Yi (0)

    We observe Yi:

    Yi = Ti · Yi (1) + (1− Ti) · Yi (0)= Yi (0) + (Yi (1)− Yi (0)) · Ti

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • If we could observe both potential outcomess:

    We would then have individual treatment effects:

    βi = Yi (1)− Yi (0)

    And we would estimate the average treatment effect, ATE by

    1

    N

    N∑i=1

    (Yi (1)− Yi (0))P→ E [Y (1)− Y (0)] = ATE

    We observe nevertheless {Yi, Ti, Xi}Ni=1 where Xi is a vector ofcovariates.

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • From the definition of Y , we have

    Yi = αi + βi · Ti= α+ β · Ti + Ui

    where

    Ui = αi − α+ (βi − β) · Ti

    α = E [αi]

    β = E [βi] = ATE

    and if

    Ti ⊥⊥ (Yi (1) , Yi (0))⇒ E [Ui|Ti] = Cov (Ti, Ui) = 0

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Sometimes we are interested in the ATT , or the average treatmenteffect on the treated:

    ATT = E [Y (1)− Y (0) |T = 1]

    For the case of random experiments

    Ti ⊥⊥ (Yi (1) , Yi (0))

    then ATT = ATESergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Selection based on observables:

    Given X, potential outcomes are independent of the treatment:

    Pr [T = 1|Y (1) , Y (0) , X] = Pr [T = 1|X] ≡ p (X)

    where p (X), or the probability of being treated given X, is theso-called propensity-score.

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Conditional independence assumption (given X) between T and thepotential outcomes Y (1) and Y (0) a.k.a treatment ignorability orunconfoundedness.

    T is ignorable if

    Pr [T = 1|Y (1) , Y (0) , X] = Pr [T = 1|X]or

    (Y (1) , Y (0)) ⊥⊥ T |X

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Given ignorability, there are several ways to identify ATE from data.

    Imputation:

    ATE = E [Y (1)]− E [Y (0)]= E [E [Y (1) |X]]− E [E [Y (0) |X]]= E [E [Y (1) |X,T = 1]]− E [E [Y (0) |X,T = 0]]= E [E [Y |X,T = 1]]− E [E [Y |X,T = 0]]

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • If we are interested in ATT , then

    ATT = E [Y (1) |T = 1]− E [Y (0) |T = 1]= E [Y |T = 1]− E [E [Y (0) |X,T = 1] |T = 1]= E [Y |T = 1]− E [E [Y (0) |X,T = 0] |T = 1]= E [Y |T = 1]− E [E [Y |X,T = 0] |T = 1]

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Thus, we can estimate ATE and ATT by:

    ÂTEimp =1

    N

    N∑i=1

    Ê [Y |Xi, T = 1]−1

    N

    N∑i=1

    Ê [Y |Xi, T = 0]

    ÂTT imp =1

    N

    N∑i=1

    Tip̂·(Yi − Ê [Y |Xi, T = 0]

    )

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Matching

    For individual i, if Yi (1) observed (Ti = 1), then Yi (0) is missing.

    Find j, such that Tj = 0 and:

    ‖Xj −Xi‖ ≤ ‖Xl −Xi‖ , ∀ l, Tl = 0

    Then Ŷi (0) = Yj (0) = Yj .

    ATT is estimated by

    ÂTTmatch,1 =1

    N

    N∑i=1

    Tip̂·(Yi − Ŷi (0)

    )

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Implementation questions:

    1 How to deal with ties?

    2 With or without replacement?

    3 What distance ‖·‖ should be used?

    4 ‖Xi −Xj‖ is different from zero, in general. That generates bias. Howto deal with that? Bias increases with dimension of X.

    5 Statistical inference.

    6 Number of matches: bias-variance tradeoff

    7 Bootstrap fails

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Finally, note that one interpretation for matching is as a particular caseof imputation method. For ATT :

    ÂTT imp =1

    N

    N∑i=1

    Tip̂·(Yi − Ê [Y |Xi, T = 0]

    )ÂTTmatch,1 =

    1

    N

    N∑i=1

    Tip̂·(Yi − Ŷi (0)

    )

    and difference between Ê [Y |Xi, T = 0] and Ŷi (0) may not even exist.If, for example, E [Y |X,T = 0] is estimated by nearest-neighbor withfixed number of neighbors, these methods are algebraically identical.

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Inverse probability weighting

    ATE = E [E [Y |X,T = 1]]− E [E [Y |X,T = 0]]

    = E

    [T

    p (X)· Y]− E

    [(1− T

    1− p (X)

    )· Y]

    ATT =E [T · Y ]

    p− E [p (X) · E [Y |X,T = 0]]

    p

    = E

    [T

    p· Y]− E

    [(p (X)

    p· 1− p1− p (X)

    · 1− T1− p

    )· Y]

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Selection on Unobservables: Instrumental Variables Method

    How to estimate ATE when

    Pr [T = 1|Y (1) , Y (0) , X] 6= Pr [T = 1|X]

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Let Z be a dummy instrumental variable.

    Define potential treatment Ti (z), as the treatment that individual iwould have had she got Z = z. Observed treatment is then:

    Ti = Ti (1) · Zi + Ti (0) · (1− Zi)

    And observed outcome still is:

    Yi = Yi (1) · Ti + Yi (0) · (1− Ti)

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Identification assumptions:

    Instrument validity:

    (Y (1) , Y (0) , T (1) , T (0)) ⊥⊥ Z

    Monotonicity:

    T (1) ≥ T (0)

    Four types, but one is ruled out (defiers):

    T (1) T (0) type1 1 always-taker (a)1 0 complier (c)0 1 defier (d)0 0 never-taker (n)

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • One can obtain an average treatment effect that is ‘local’, as it is anATE for “compliers”, or simply, LATE:

    LATE = E [Y (1)− Y (0) |T (1) > T (0)]

    =E [Y |Z = 1]− E [Y |Z = 0]E [T |Z = 1]− E [T |Z = 0]

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Regression Discontinuity Design

    Novamente, o ideal seria estimar ATE = E [Y (1)− Y (0)]. Lembre-seque:

    Yi = Yi (0) + (Yi (1)− Yi (0)) · Ti= α+ β · Ti + Yi (0)− α+ (Yi (1)− Yi (0)− β) · Ti= α+ β · Ti + Ui

    onde

    β = ATE

    Ui = Yi (0)− α+ (Yi (1)− Yi (0)− β) · Ti

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Say that there is some continuous variable Z, sucht that E[T |Z] is welldefined.

    There is a point z0, in which the probability of being treated jumps:

    T+ ≡ lim�↓0

    Pr [T = 1|Z = z0 + �]

    T− ≡ lim�↓0

    Pr [T = 1|Z = z0 − �]

    T+ 6= T−

    This is the source of exogenous variation that allows us to identify ATEat z0.

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Fuzzy design versus sharp design

    sharp design

    Pr[T = 1|Z < z0] = 0Pr[T = 1|Z > z0] = 1

    fuzzy design:

    0 < Pr[T = 1|Z < z0] < 10 < Pr[T = 1|Z > z0] < 1

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Assumption: E[Y (1)|Z = z] and E[Y (0)|Z = z] are continuous atz = z0.

    Define

    Y + ≡ lim�↓0

    E [Y |Z = z0 + �]

    Y − ≡ lim�↓0

    E [Y |Z = z0 − �]

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Then ATE at z0 is

    ATE(z0) = E [Y (1)− Y (0) |Z = z0]

    =Y + − Y −

    T+ − T−

    Sergio Firpo Lectures 2 and 3 - SPSAS 1 / 1

  • Figure 1: Simple Linear RD Setup

    0

    1

    2

    3

    4

    Forcing variable (X)

    Out

    com

    e va

    riabl

    e (Y

    )

    A''

    B'

    C C'C''

    τ

  • Figure 2: Nonlinear RD

    0.00

    0.50

    1.00

    1.50

    2.00

    2.50

    3.00

    3.50

    4.00

    0 0.5 1 1.5 2 2.5 3 3.5 4

    Forcing variable (X)

    Out

    com

    e va

    riabl

    e (Y

    )

    Xd

    E FD

    B'

    B

    A'

    AE[Y(1)|X]

    E[Y(0)|X]

    Observed

    Observed

  • Figure 3: Randomized Experiment as a RD Design

    0.0

    0.5

    1.0

    1.5

    2.0

    2.5

    3.0

    3.5

    4.0

    0 0.5 1 1.5 2 2.5 3 3.5 4

    Forcing variable (random number, X)

    Out

    com

    e va

    riabl

    e (Y

    )

    E[Y(1)|X]

    E[Y(0)|X]

    Observed (treatment)

    Observed (control)

  • Figure 5. Treatment, Observables, and Unobservables in four research designs.

    A. Randomized Experiment

    B. Regression Discontinuity Design

    C. Matching on Observables

    D. Instrumental Variables

  • Figure 6a: Share of vote in next election, bandwidth of 0.02 (50 bins)

    0.10

    0.20

    0.30

    0.40

    0.50

    0.60

    0.70

    0.80

    -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

  • Figure 6b: Share of vote in next election, bandwidth of 0.01 (100 bins)

    0.10

    0.20

    0.30

    0.40

    0.50

    0.60

    0.70

    0.80

    -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

  • Figure 6c: Share of vote in next election, bandwidth of 0.005 (200 bins)

    0.10

    0.20

    0.30

    0.40

    0.50

    0.60

    0.70

    0.80

    -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

  • Figure 7a: Winning the next election, bandwidth of 0.02 (50 bins)

    0.00

    0.10

    0.20

    0.30

    0.40

    0.50

    0.60

    0.70

    0.80

    0.90

    1.00

    -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

  • Figure 7b: Winning the next election, bandwidth of 0.01 (100 bins)

    0.00

    0.10

    0.20

    0.30

    0.40

    0.50

    0.60

    0.70

    0.80

    0.90

    1.00

    -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

  • Figure 7c: Winning the next election, bandwidth of 0.005 (200 bins)

    0.00

    0.10

    0.20

    0.30

    0.40

    0.50

    0.60

    0.70

    0.80

    0.90

    1.00

    -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

  • Figure 8: Density of the forcing variable (vote share in previous election)

    0.00

    0.50

    1.00

    1.50

    2.00

    2.50

    -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

  • Figure 9: Discontinuity in baseline covariate (share of vote in prior election)

    0.20

    0.30

    0.40

    0.50

    0.60

    0.70

    0.80

    -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

  • Grupo de Controle Sintético (Abadie e Gardeazabal, AER 2003)

    � Muitas vezes interesse recai sobre efeitos de eventos ou intervenções queacontecem em níveis agregados

    � Exemplos:

    efeito de salário mínimo sobre emprego: compara-se grupo afetado (eg,estado nos EUA) com um que não tenha sido afetado pela intervenção;

    efeito de progressão continuada: comparam-se escolas da rede estadualde SP com de outras redes estaduais não afetadas.

  • � Em geral, para se estimar efeitos de eventos agregados, acompanhamosa evolução temporal da unidade afetada pela intervenção e comparamosessa evolução com a de uma outra unidade que não tenha sido afetada.

    � Por exemplo, vemos o que aconteceu com salários e desemprego médioantes e depois da intervenção para tratados e controles (di¤-in-di¤).

    � Como escolher grupos de comparação? No caso do progressão continuadaem SP, qual UF usar como comparação? RJ? O fato de ser estado contíguonão pode ter efeito nos municípios fronteiriços?

  • � Descrição do Método

    Suponha que haja J+1 regiões e que 1a região foi afetada pelo trata-mento (intervenção)

    Existem disponíveis J regiões de controle. SejaW =�w2; :::; wJ+1

    �|,onde wj � 0 for j = 2; :::; J + 1 e w2+ :::+wJ+1 = 1. Cada valorde W representa um controle potencial.

    Seja X1 um vetor k � 1 de características prétratamento. De modesimilar, X0 é uma matriz k � J que contém as mesmas variáveis nasregiões não afetadas.

    O vetorW � é escolhido como o argumento que minimiza kX1 �X0Wksujeito a wj � 0 for j = 2; :::; J + 1 e w2 + :::+ wJ+1 = 1.

  • Em geral, kAkV =pA|V A onde V é matriz simétrica, positiva e

    semidenida.

  • � Vantagens

    Como controle, usamos a combinação convexa das unidades de com-paração que mais se assemelham ao valor das características observáveisdo grupo afetado antes do tratamento.

    Evita-se extrapolação

    Força pesquisador a demonstrar anidades entre unidades afetadas enão-afetadas usando caracterísitcas quanticáveis.

    Pode acomodar presença de fatores não-observáveis.

    Inferência válida e independente do número de unidades disponíveispara comparação.

  • � Exemplo: Californias Proposition 99, a qual aumentou imposto sobre ciga-rros em 25 cents/pack.

    � Ver em Synthetic Control Methods for Comparative Case Studies: Es-timating the E¤ect of Californias Tobacco Control Program de AlbertoAbadie, A. Diamond and J. Hainmueller (2007).

  • California’s Proposition 99

    • In 1988, California was first to pass comprehensive tobaccocontrol legislation (Proposition 99)

    – increased cigarette tax by 25 cents/pack

    – earmarked tax revenues to health and anti-smoking bud-

    gets

    – funded anti-smoking media campaigns

    – spurred clean-air ordinances throughout the state

    – produced more than $100 million per year in anti-tobacco

    projects

    • What was the effect of Proposition 99 on tobacco consump-tion in California?

  • 1970 1975 1980 1985 1990 1995 2000

    020

    4060

    8010

    012

    014

    0

    year

    per−

    capi

    ta c

    igar

    ette

    sal

    es (

    in p

    acks

    )

    Californiarest of the U.S.

    Passage of Proposition 99

    California and the Rest of the U.S.

  • Smoking Prevalence Predictors Means

    California Average ofVariables Real Synthetic 38 control statesLn(GDP per capita) 10.08 9.86 9.86Percent aged 15-24 17.40 17.40 17.29Retail price 89.42 89.41 87.27Beer consumption per capita 24.28 24.20 23.75Cigarette sales per capita 1988 90.10 91.62 114.20Cigarette sales per capita 1980 120.20 120.43 136.58Cigarette sales per capita 1975 127.10 126.99 132.81

    Note: All variables except lagged cigarette sales are averaged for the 1980-1988 period (beer consumption is averaged 1984-1988).

  • 1970 1975 1980 1985 1990 1995 2000

    020

    4060

    8010

    012

    014

    0

    year

    per−

    capi

    ta c

    igar

    ette

    sal

    es (

    in p

    acks

    )

    Californiasynthetic California

    Passage of Proposition 99

    California and synthetic California

  • 1970 1975 1980 1985 1990 1995 2000

    020

    4060

    8010

    012

    014

    0

    year

    per−

    capi

    ta c

    igar

    ette

    sal

    es (

    in p

    acks

    )

    Californiarest of the U.S.

    Passage of Proposition 99

    California and the Rest of the U.S.

  • 1970 1975 1980 1985 1990 1995 2000

    020

    4060

    8010

    012

    014

    0

    year

    per−

    capi

    ta c

    igar

    ette

    sal

    es (

    in p

    acks

    )

    Californiasynthetic California

    Passage of Proposition 99

    California and synthetic California

  • 1970 1975 1980 1985 1990 1995 2000

    −30

    −20

    −10

    010

    2030

    year

    gap

    in p

    er−

    capi

    ta c

    igar

    ette

    sal

    es (

    in p

    acks

    )

    Passage of Proposition 99

    Smoking Gap Between California and synthetic California