
The Supervised Learning Approach To Estimating Heterogeneous Causal Regime Effects

Thai T. Pham

Stanford Graduate School of Business
[email protected]

May, 2016


Introduction

Observations

Many sequential treatment settings: patients make adjustments in medications in multiple periods; students decide whether to follow an educational honors program over multiple years; in the labor market, the unemployed might participate in a set of programs (job search, subsidized job, training) sequentially.

Heterogeneity in treatment sequence reactions: medication effects can be heterogeneous across patients and across time; the same holds for educational programs and labor market programs.

Hard to set up sequential randomized experiments in reality.


Introduction

Contributions

Develop a nonparametric framework using supervised learning to estimate heterogeneous treatment regime effects from observational (or experimental) data.

Treatment Regime: a set of functions of characteristics and intermediate outcomes.

Propose using a supervised learning approach (deep learning), which gives good estimation accuracy and is robust to model misspecification.

Propose a matching-based testing method for the estimation of heterogeneous treatment regime effects.

Propose a matching-based kernel estimator for the variance of heterogeneous treatment regime effects (if time allows).


Introduction

Contributions (cont’d)

In this paper, we:

Focus on a dynamic setting with multiple treatments applied sequentially (in contrast to a single treatment).

Focus on the heterogeneous (in contrast to average) effect of a sequence of treatments, i.e., a treatment regime.

Focus on observational data (in contrast to experimental data).


An Illustrative Model: The Setup

Setup - Motivational Dataset

The North Carolina Honors Program Dataset.

There are 24,112 observations in total.

X0 = [Y0, d1, d2, d3], where Y0 is the Math test score at the end of 8th grade and d1, d2, d3 are census-data dummy variables.

W0,W1 ∈ {0,1} are treatment variables.

Y1: end of 9th grade Math test score.

Y2: end of 10th grade Math test score (object of interest).

Y0,Y1,Y2 are pre-scaled to have zero mean and unit variance.


An Illustrative Model: The Setup

Setup - Model

End of eighth grade:

Students' initial information X0, which includes Math test score Y0 and other personal information (d1, d2, d3), is observed.

Decide to follow honors (W0 = 1) or standard (W0 = 0) program.

End of ninth grade:

X0,W0, and Math test score Y1 are observed.

Decide to switch or stay in current program (W1 = 1 or 0).

End of tenth grade:

X0,W0,Y1,W1, and Math test score Y2 are observed.

Object of interest: Y2 (it could be any function of X0, Y1, Y2).


An Illustrative Model: The Setup

Potential Outcome (PO) Framework

Treatment regime d = (d0, d1) has

d0: X0 → W0 ∈ {0, 1} and d1: X0 × W0 × Y1 → W1 ∈ {0, 1}.

Potential Outcome Y1(W0) = Y1. Also, the observed outcome

Y1 = W0 · Y1(1) + (1 − W0) · Y1(0).

Similarly, Y2^d = Y2 if the subject follows regime d. We also write Y2 = Y2^d = Y2(W0, W1) when d0 maps to W0 and d1 maps to W1. We have

Y2 = W0 W1 · Y2(1,1) + W0 (1 − W1) · Y2(1,0) + (1 − W0) W1 · Y2(0,1) + (1 − W0)(1 − W1) · Y2(0,0).


An Illustrative Model: The Setup

Types of Treatment Regime

Static Treatment Regime: subjects specify (or are assigned) the whole treatment plan based only on the initial covariates (X0).

So d: X0 → (W0, W1) ∈ {0, 1}^2.

Dynamic Treatment Regime: subjects choose (or are assigned) the initial treatment based on the initial covariates (X0); then subsequently choose (or are assigned) the next treatment based on the initial covariates (X0), the first-period treatment (W0), and the intermediate outcome (Y1); and so on.

This is our original setup.


An Illustrative Model: The Setup

Potential Outcome (PO) Framework (Cont’d)

Objective: Estimate $E[Y_2^{d} - Y_2^{d'}]$ for individuals (or on average), and derive the heterogeneous optimal regime

$$d^*(C) = \arg\max_{d} E\big[Y_2^{d} \,\big|\, C\big]$$

for individual covariates C.

Difficulties:

Fundamental Problem of Causal Inference: for each subject, we never observe both $Y_2^{d}$ and $Y_2^{d'}$.

Selection Bias: students following d may fundamentally be different from those following d' (e.g., students with good test scores choose the honors program in each period).


An Illustrative Model: Identification Results

Identification Result - Static Treatment Regime

Theorem (Identification Result - STR)

Let $d_0 = d_0(X_0)$ and $d_1 = d_1(X_0)$. Then (with assumptions)

$$E\left[\frac{Y_2 \cdot \mathbf{1}\{W_0 = d_0\} \cdot \mathbf{1}\{W_1 = d_1\}}{P(W_0 = d_0 \mid X_0)\, P(W_1 = d_1 \mid X_0)} \,\middle|\, X_0\right] = E\left[Y_2^{d} \,\middle|\, X_0\right].$$

Corollary:

$$E\Bigg[\underbrace{Y_2 \cdot \left[\frac{W_0 W_1}{e_0 e_1} - \frac{(1-W_0)(1-W_1)}{(1-e_0)(1-e_1)}\right]}_{\text{observed/estimable, Transformed Outcome}} \,\Bigg|\, X_0\Bigg] = E\Big[\underbrace{Y_2(1,1) - Y_2(0,0)}_{\text{unobserved, PO}} \,\Big|\, X_0\Big].$$

Here, $e_0 = P(W_0 = 1 \mid X_0)$ and $e_1 = P(W_1 = 1 \mid X_0)$.
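As a concrete illustration of the corollary, here is a minimal numpy sketch that builds the STR transformed outcome; the arrays and the constant propensity estimates are hypothetical placeholders, not the paper's data or fitted models.

```python
import numpy as np

# Placeholder data: observed treatments, final outcome, and estimated
# propensity scores e0_hat = P(W0 = 1 | X0), e1_hat = P(W1 = 1 | X0).
rng = np.random.default_rng(0)
n = 1_000
W0 = rng.integers(0, 2, n)
W1 = rng.integers(0, 2, n)
Y2 = rng.normal(size=n)
e0_hat = np.full(n, 0.5)   # would come from a fitted propensity model
e1_hat = np.full(n, 0.5)

# Transformed outcome from the corollary: regressing T on X0 targets
# E[Y2(1,1) - Y2(0,0) | X0].
T = Y2 * (W0 * W1 / (e0_hat * e1_hat)
          - (1 - W0) * (1 - W1) / ((1 - e0_hat) * (1 - e1_hat)))
```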


An Illustrative Model: Identification Results

Identification Result - Dynamic Treatment Regime

Theorem (Identification Result - DTR)

Let $d_0 = d_0(X_0)$ and $d_1 = d_1(X_0, X_1, Y_1, W_0)$. Then (with assumptions)

In period T = 1:

$$E\left[\underbrace{\frac{Y_2 \cdot \mathbf{1}\{W_1 = d_1\}}{P(W_1 = d_1 \mid X_0, X_1, Y_1, W_0)}}_{\text{observed/estimable}} \,\middle|\, X_0, X_1, Y_1, W_0\right] = E\Big[\underbrace{Y_2^{d_1}}_{\text{PO}} \,\Big|\, X_0, X_1, Y_1, W_0\Big].$$

In period T = 0:

$$E\left[\frac{Y_2 \cdot \mathbf{1}\{W_1 = d_1\} \cdot \mathbf{1}\{W_0 = d_0\}}{P(W_1 = d_1 \mid X_0, X_1, Y_1, W_0)\, P(W_0 = d_0 \mid X_0)} \,\middle|\, X_0\right] = E\Big[Y_2^{d} \,\Big|\, X_0\Big].$$
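A similar sketch for the dynamic case, assuming hypothetical arrays d0 and d1 that hold the regime's prescribed treatments for each unit and e0_hat, e1_hat the estimated propensities; this illustrates the two displayed identities and is not the authors' code.

```python
import numpy as np

def dtr_transformed_outcomes(Y2, W0, W1, d0, d1, e0_hat, e1_hat):
    """Period-1 and period-0 transformed outcomes for a dynamic regime d = (d0, d1).

    d0, d1 : prescribed treatments per unit (d1 may depend on X0, Y1, W0)
    e0_hat : estimated P(W0 = 1 | X0)
    e1_hat : estimated P(W1 = 1 | X0, Y1, W0)
    """
    p0 = np.where(d0 == 1, e0_hat, 1 - e0_hat)      # P(W0 = d0 | X0)
    p1 = np.where(d1 == 1, e1_hat, 1 - e1_hat)      # P(W1 = d1 | history)
    T1 = Y2 * (W1 == d1) / p1                       # period T = 1 identification
    T0 = Y2 * (W1 == d1) * (W0 == d0) / (p1 * p0)   # period T = 0 identification
    return T0, T1
```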


Model Estimation

Challenges In Traditional Approach

Goal: Specify a relation between the transformed outcome T and covariates C.

Econometric approaches assume T = h(C; β) + ε for a fixed (linear) function h(·) and E[ε | C] = 0, and estimate β by minimizing

$$\|T - h(C;\beta)\|^2.$$

Problem: Linear models need not give good estimates in general.


Model Estimation

Machine Learning Approach

Machine learning methods generally give much more accurate estimates than traditional econometric models.

Empirical comparisons of different machine learning methods withlinear regressions:

Caruana and Niculescu-Mizil (2006)1

Morton, Marzban, Giannoulis, Patel, Aparasu, and Kakadiaris (2014)2

1 Caruana, R. and A. Niculescu-Mizil (2006), "An Empirical Comparison of Supervised Learning Algorithms," Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA.

2 Morton, A., E. Marzban, G. Giannoulis, A. Patel, R. Aparasu, and I. A. Kakadiaris (2014), "A Comparison of Supervised Machine Learning Techniques for Predicting Short-Term In-Hospital Length of Stay Among Diabetic Patients," 13th International Conference on Machine Learning and Applications.


Model Estimation

Machine Learning Approach (Cont’d)

Goal: Specify a relation between the transformed outcome T and covariates C.

Machine learning (ML) methods allow h(·) to vary in terms of complexity, and estimate β by minimizing $\|T - h(C;\beta)\|^2 + \lambda g(\beta)$, where g penalizes complex models.

Data set = (Training, Validation, Test). Use the training set (with cross-validation) to choose the optimal h(·) in terms of complexity, the validation set to choose the optimal hyperparameter λ, and the test set to evaluate performance (a minimal sketch of this protocol is given below).

RMSE is the comparison criterion.

Hence, the ML approach is flexible and performance-oriented.
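A minimal sketch of this train/validation/test protocol, using ridge regression purely as a stand-in penalized model h(·; β) with penalty λ g(β); the data, the grid of λ values, and the model choice are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
C = rng.uniform(size=(6_000, 10))                                 # covariates
T = 2 * C[:, 0] - C[:, 1] + rng.normal(scale=0.5, size=6_000)     # toy transformed outcome

C_tr, T_tr = C[:4_000], T[:4_000]               # training set
C_val, T_val = C[4_000:5_000], T[4_000:5_000]   # validation set
C_te, T_te = C[5_000:], T[5_000:]               # test set

best = (None, np.inf)
for lam in [0.01, 0.1, 1.0, 10.0]:              # candidate hyperparameters lambda
    model = Ridge(alpha=lam).fit(C_tr, T_tr)
    rmse = mean_squared_error(T_val, model.predict(C_val)) ** 0.5
    if rmse < best[1]:
        best = (model, rmse)

# Final comparison criterion: RMSE of the selected model on the test set.
test_rmse = mean_squared_error(T_te, best[0].predict(C_te)) ** 0.5
print(round(test_rmse, 3))
```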


Model Estimation

Estimating Model

Propensity Score Estimation: Use logistic regression or other ML techniques such as Random Forest, Gradient Boosting, etc.

Full Model Estimation: (though many ML techniques would work) we use a deep learning method from the machine learning literature called the "Multilayer Perceptron."

It possesses the universal approximation property: it can approximate any continuous function on any compact subset of R^n.


Model Estimation

Multilayer Perceptron (MLP)

Assume we want to estimate T = h(C; β) + ε. An MLP (with one hidden layer) considers

$$h(C;\beta) = \sum_{j=1}^{K} \alpha_j\, \sigma(\gamma_j^{T} C + \theta_j) \quad \text{and} \quad \beta = \big(K,\ (\alpha_j, \gamma_j, \theta_j)_{j=1}^{K}\big),$$

where σ is a sigmoid function such as σ(x) = 1/(1 + exp(−x)). Empirically, the MLP (and deep learning in general) is shown to work very well (LeCun et al.3, Mnih et al.4). A code sketch follows after the footnotes.

3 LeCun, Y., Y. Bengio, and G. Hinton (2015), "Deep Learning," Nature 521, 436-444 (28 May).

4 Mnih, V., K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis (2015), "Human-level control through deep reinforcement learning," Nature 518, 529-533 (26 February).
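A minimal sketch of a one-hidden-layer MLP regressor for T = h(C; β) + ε, using scikit-learn's MLPRegressor with a logistic (sigmoid) activation as one possible implementation; K, the penalty, and the toy data are illustrative choices rather than the configuration used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
C = rng.uniform(size=(5_000, 10))
T = np.sin(3 * C[:, 0]) + C[:, 1] ** 2 + rng.normal(scale=0.1, size=5_000)

# One hidden layer of K sigmoid units:
# h(C; beta) = sum_j alpha_j * sigma(gamma_j' C + theta_j)
K = 50
mlp = MLPRegressor(hidden_layer_sizes=(K,), activation="logistic",
                   alpha=1e-4, max_iter=2_000, random_state=0)
mlp.fit(C, T)
print(mlp.predict(C[:5]))
```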


Model Estimation: Testing Method

Matching Based Testing Method

The identification results relate the unobserved difference of potential outcomes Z to the observed (or estimable) transformed outcome T:

$$E[T \mid C] = E[Z \mid C].$$

For example, in the STR: $Z = Y_2^{d} - Y_2^{d'}$ and $C = X_0$.

Randomly draw M units with treatment regime d. Denote by $x_0^{d,m}$ and $y_2^{d,m}$ the covariates and corresponding outcomes.

For each m, determine $x_0^{d',m} = \arg\min_{x_0^{i}:\ \text{regime} = d'} \|x_0^{i} - x_0^{d,m}\|_2$.

Let $\tilde{\tau}_m = y_2^{d,m} - y_2^{d',m}$. Here, $\tilde{\tau}$ is a proxy for the unobserved Z.

Let $\hat{\tau}$ be the estimator which fits $x_0$ to T.

Define $\hat{\tau}_m = \frac{1}{2}\big(\hat{\tau}(x_0^{d,m}) + \hat{\tau}(x_0^{d',m})\big)$.

Define the matching loss $\mathcal{M} = \sqrt{\frac{1}{M}\sum_{m=1}^{M}(\tilde{\tau}_m - \hat{\tau}_m)^2}$ (a code sketch follows below).
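A minimal numpy sketch of the matching loss, assuming tau_hat is any fitted effect estimator callable on covariate arrays; the function name and defaults are hypothetical.

```python
import numpy as np

def matching_loss(x0_d, y2_d, x0_dp, y2_dp, tau_hat, M=500, seed=0):
    """Root mean squared gap between the matched proxy tau_tilde and the fit tau_hat.

    x0_d, y2_d   : covariates and outcomes of units following regime d
    x0_dp, y2_dp : covariates and outcomes of units following regime d'
    tau_hat      : callable mapping an (n, p) covariate array to effect estimates
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x0_d), size=min(M, len(x0_d)), replace=False)
    gaps = []
    for m in idx:
        # nearest regime-d' unit in Euclidean distance
        j = int(np.argmin(np.linalg.norm(x0_dp - x0_d[m], axis=1)))
        tau_tilde = y2_d[m] - y2_dp[j]                                    # matched proxy for Z
        tau_fit = 0.5 * (tau_hat(x0_d[m:m + 1])[0] + tau_hat(x0_dp[j:j + 1])[0])
        gaps.append((tau_tilde - tau_fit) ** 2)
    return float(np.sqrt(np.mean(gaps)))
```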


Simulations: Setup

Simulation Setup

Test the ability of our method in adapting to heterogeneity in the treatment regime effect.

50,000 obs for training; 5,000 obs for validation; 5,000 for testing.

$X_0 \sim U([0,1]^{10})$; $W_0 \in \{0,1\}$; $Y_1 \in \mathbb{R}$ with standard normal noise; $W_1 \in \{0,1\}$; $Y_2 \in \mathbb{R}$ with standard normal noise.

$e_0(X_0) = e_1(X_0) = e_1(X_0, W_0, Y_1) = 0.5$.

$$\tau_1(X_0) = E\big[Y_1^{W_0=1} - Y_1^{W_0=0} \,\big|\, X_0\big] = \xi(X_0[1])\,\xi(X_0[2]), \quad \text{and}$$

$$\tau_2(X_0, W_0, Y_1) = E\big[Y_2^{W_1=1} - Y_2^{W_1=0} \,\big|\, X_0, W_0, Y_1\big] = \rho(Y_1)\,\rho(W_0)\,\xi(X_0[1]),$$

where

$$\xi(x) = \frac{2}{1 + e^{-12(x-1/2)}}, \qquad \rho(x) = 1 + \frac{1}{1 + e^{-20(x-1/3)}}.$$

(A data-generating sketch follows below.)
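A sketch of a matching data-generating process; since the slide does not spell out the baseline response, a zero baseline plus standard normal noise in each period is assumed here.

```python
import numpy as np

def xi(x):
    return 2.0 / (1.0 + np.exp(-12.0 * (x - 0.5)))

def rho(x):
    return 1.0 + 1.0 / (1.0 + np.exp(-20.0 * (x - 1.0 / 3.0)))

def simulate(n, seed=0):
    rng = np.random.default_rng(seed)
    X0 = rng.uniform(size=(n, 10))              # X0 ~ U([0,1]^10)
    W0 = rng.integers(0, 2, n)                  # e0 = 0.5
    tau1 = xi(X0[:, 0]) * xi(X0[:, 1])          # first-period effect, X0[1] and X0[2] on the slide
    Y1 = W0 * tau1 + rng.normal(size=n)         # assumed zero baseline + N(0,1) noise
    W1 = rng.integers(0, 2, n)                  # e1 = 0.5
    tau2 = rho(Y1) * rho(W0) * xi(X0[:, 0])     # second-period effect
    Y2 = W1 * tau2 + rng.normal(size=n)
    return X0, W0, Y1, W1, Y2, tau1, tau2

X0, W0, Y1, W1, Y2, tau1, tau2 = simulate(50_000)   # training-size draw
```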


Simulations: Results

Simulation Results

Table: Performance In Terms of Root Mean Squared Error (RMSE)

Method          Linear Regression (LR)    Multilayer Perceptron (MLP)
STR                     1.75                         1.66
DTR: T = 0              0.74                         0.13
DTR: T = 1              1.10                         0.20

* sdv(TO: T = 0) = 2.41; sdv(true effect: T = 0) = 1.34.
* sdv(TO: T = 1) = 3.21; sdv(true effect: T = 1) = 2.52.

Comments:
MLP returns very good results and outperforms LR.
The static setting does not fit here, as the RMSEs for STR are poor.


Simulations: Results

Simulation Results (cont’d)5

[Figure: six scatter-plot panels. First row: MLP: T = 0, True Effect: T = 0, LR: T = 0. Second row: MLP: T = 1, True Effect: T = 1, LR: T = 1.]

Figure: Heterogeneous Treatment Regime Effect Using Validation and Test Data. The first row corresponds to period T = 0 and the second row corresponds to period T = 1. In each period: the middle picture visualizes the true treatment effect; the left one is the estimated effect using the Multilayer Perceptron; and the right one is the estimated effect using Linear Regression.

5 We thank Wager and Athey (2015) for sharing their visualization code.


Empirical Application: Estimation of Propensity Scores

Propensity Score Estimation

Use the North Carolina Honors Program data (from the illustrative model).

Estimate P(W0 = 1 | X0), P(W1 = 1 | X0), and P(W1 = 1 | X0, Y1, W0).

Treat this as a probabilistic classification problem and use Random Forest (a sketch follows below).
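A minimal sketch of this propensity-score step with scikit-learn's RandomForestClassifier; the arrays below are simulated placeholders standing in for the North Carolina data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2_000
X0 = rng.normal(size=(n, 4))          # stand-in for [Y0, d1, d2, d3]
W0 = rng.integers(0, 2, n)
Y1 = rng.normal(size=n)
W1 = rng.integers(0, 2, n)

# P(W0 = 1 | X0): probabilistic classification.
rf0 = RandomForestClassifier(n_estimators=500, random_state=0).fit(X0, W0)
e0_hat = rf0.predict_proba(X0)[:, 1]

# P(W1 = 1 | X0, Y1, W0): augment the features with the observed history.
# P(W1 = 1 | X0) for the static regime would be fit analogously on X0 alone.
H = np.column_stack([X0, Y1, W0])
rf1 = RandomForestClassifier(n_estimators=500, random_state=0).fit(H, W1)
e1_hat = rf1.predict_proba(H)[:, 1]
```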


Empirical Application: Static Treatment Regime Estimation

Model Estimation - Static Treatment Regime

Use three methods: Linear Regression, Gradient Boosting, and Multilayer Perceptron.

Method                   Validation Matching Loss    Test Matching Loss
Linear Regression                 11.15                     9.77
Gradient Boosting                  5.03                     4.89
Multilayer Perceptron              3.20                     3.27

Comments:
MLP outperforms the other methods.
All results are poor, which signals the dynamic nature of the data.


Empirical Application: Dynamic Treatment Regime Estimation

Model Estimation - Dynamic Treatment Regime

Period T = 1

Method                   Validation Matching Loss    Test Matching Loss
Linear Regression                  1.29                     1.29
Gradient Boosting                  0.94                     1.01
Multilayer Perceptron              0.94                     1.01

* sdv(TO: val) = 4.06; sdv(est. true effect: val) = 0.85.
* sdv(TO: test) = 4.03; sdv(est. true effect: test) = 0.92.


Empirical Application: Dynamic Treatment Regime Estimation

Model Estimation - Dynamic Treatment Regime(Cont’d)

Period T = 0

Method                   Validation Matching Loss    Test Matching Loss
Linear Regression                  3.29                     3.45
Gradient Boosting                  1.51                     1.63
Multilayer Perceptron              1.14                     1.60

DTR: use only students who follow the optimal treatment in T = 1.

* sdv(TO: val) = 6.94; sdv(est. true effect: val) = 0.84.
* sdv(TO: test) = 7.45; sdv(est. true effect: test) = 0.98.

Remark: The results are worse than those in the simulations due to unobserved heterogeneity.


Empirical Application: Heterogeneous Optimal Regime Estimation

Heterogeneous Optimal Regime

Static Treatment Regime


Empirical Application: Heterogeneous Optimal Regime Estimation

Heterogeneous Optimal Regime (cont’d)

STR: Estimated gain per student from the heterogeneous optimal regime over the homogeneous optimal regime (0,0) (a computational sketch is given below):

$$\frac{1}{\sum_{(0,0)\ \text{not opt}} 1}\left(\sum_{(1,1)\ \text{opt}}\big[Y_2(1,1) - Y_2(0,0)\big] + \sum_{(1,0)\ \text{opt}}\big[Y_2(1,0) - Y_2(0,0)\big] + \sum_{(0,1)\ \text{opt}}\big[Y_2(0,1) - Y_2(0,0)\big]\right) = 0.91.$$

DTR: Estimated gain per student in T = 0 from the heterogeneous optimal W0 over the homogeneous optimal treatment W0 = 0:

$$\frac{\sum_{W_0=1\ \text{optimal}}\big[Y_2^{W_0=1} - Y_2^{W_0=0}\big]}{\#\,\text{obs used in } T = 0 \text{ s.t. } W_0 = 1 \text{ opt}} = 0.74.$$

* mean(Y2) = 0; min(Y2) = −4.06; max(Y2) = 3.66.
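A sketch of how the STR gain per student could be computed from per-student effect estimates relative to (0,0) (hypothetical arrays tau11, tau10, tau01, e.g. fits of the corresponding transformed outcomes); labeling (0,0) as optimal when all estimated effects are non-positive is an assumption of this illustration.

```python
import numpy as np

def str_gain_per_student(tau11, tau10, tau01):
    """Average estimated gain of the heterogeneous optimal STR over regime (0,0).

    tau11, tau10, tau01 : per-student estimated effects of regimes (1,1), (1,0),
                          and (0,1) relative to (0,0).
    """
    effects = np.column_stack([tau11, tau10, tau01])
    best = effects.max(axis=1)       # effect of the best alternative regime
    not_opt_00 = best > 0            # students for whom (0,0) is not optimal
    return float(best[not_opt_00].sum() / not_opt_00.sum())
```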


Conclusion

Conclusion

We developed a nonparametric framework using supervised learning to estimate heterogeneous causal regime effects.

Our model addresses the dynamic treatment setting, population heterogeneity, and the difficulty of setting up sequential randomized experiments in reality.

We introduced a machine learning approach, in particular deep learning, which demonstrates strong estimation power and is robust to model misspecification.

We also introduced a matching-based testing method for the estimation of heterogeneous treatment regime effects. A matching-based kernel estimator for the variance of these effects is introduced in the Appendix.


Appendix

Variance Estimation - Matching Kernel Approach

Matching: M matching pairs $\big[(x_0^{d,m}, y_2^{d,m});\ (x_0^{d',m}, y_2^{d',m})\big]$, $m = 1, \ldots, M$.

Fix $x_0^{\mathrm{new}}$. To estimate $\sigma^2(x_0^{\mathrm{new}}) = \mathrm{Var}\big(Y_2^{d} - Y_2^{d'} \,\big|\, x_0^{\mathrm{new}}\big)$, we define $\varepsilon_m = \tilde{\tau}_m - \hat{\tau}_m$.

Let $x_0^{\mathrm{mean},m} = \frac{x_0^{d,m} + x_0^{d',m}}{2}$. An estimator for $\sigma^2(x_0^{\mathrm{new}})$ is

$$\hat{\sigma}^2(x_0^{\mathrm{new}}) = \frac{\sum_{m=1}^{M} K\big(H^{-1}[x_0^{\mathrm{mean},m} - x_0^{\mathrm{new}}]\big)\, \varepsilon_m^2}{\sum_{m=1}^{M} K\big(H^{-1}[x_0^{\mathrm{mean},m} - x_0^{\mathrm{new}}]\big)}.$$

(A code sketch follows below.)
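A minimal numpy sketch of this estimator, assuming a Gaussian product kernel and a scalar bandwidth with H = h·I; the slide leaves K and H generic, so both are illustrative choices.

```python
import numpy as np

def sigma2_hat(x_new, x_mean, eps, h=0.5):
    """Matching-kernel estimate of Var(Y2^d - Y2^d' | x0 = x_new).

    x_mean : (M, p) midpoints x0^{mean,m} of the matched covariate pairs
    eps    : (M,) residuals eps_m = tau_tilde_m - tau_hat_m
    h      : scalar bandwidth, i.e. H = h * I (an assumption of this sketch)
    """
    u = (x_mean - x_new) / h
    # Gaussian product kernel K(H^{-1}[x0^{mean,m} - x_new]); any kernel K could be substituted.
    w = np.exp(-0.5 * np.sum(u ** 2, axis=1))
    return float(np.sum(w * eps ** 2) / np.sum(w))
```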
