Page 1:

Large Scale Causal Inference with Machine Learning

Ph.D. Thesis Defense Presentation of

Thai T. Pham

Graduate School of Business, Stanford University ([email protected])

May 30, 2017

Committee: Guido Imbens, Han Hong, Mohsen Bayati, Paulo Somaini. University Chair: Gabriel Carroll

Page 2:

Outline

1 Average Treatment Effect Estimation in High Dimensional Observational Data: Practical Recommendations

2 A Deep Causal Inference Approach to Measuring the Effects of Forming Group Loans in Online Non-profit Microfinance Platform

3 The Supervised Learning Approach to Estimating Heterogeneous Causal Regime Effects (time permitting)

Page 3:

PART I:

Average Treatment Effect Estimation in High Dimensional Observational Data: Practical Recommendations

(Joint work with Guido Imbens)

Page 4:

Observational High-Dim ATE Estimation

Contributions

Review a rich list of average treatment effect (ATE) estimators in high dimensional observational data.

Systematically simulate data to compare those estimators, treating these simulated datasets as observational.

Provide a set of diagnostic tests which help applied researchers decide which estimators to use and which estimates are the most credible.

Run diagnostic tests on both simulated and real data to test their performance.

Page 5:

Observational High-Dim ATE Estimation

Causal Inference Setting

The observed data $(Y_i, W_i, X_i)_{i=1}^n$ include:

Covariates (features) $X \in \mathbb{R}^p$, where $p$ is large relative to $n$;
Treatment $W \in \{0, 1\}$;
Outcome $Y \in \mathbb{R}$.

We use the Rubin Causal Model with the main assumptions of SUTVA, Unconfoundedness, and Overlap (see Imbens and Rubin (2015) and Rosenbaum and Rubin (1983)).

We are interested in estimating the ATE of $W$ on $Y$:

$$\tau := \mathbb{E}[Y(1) - Y(0)],$$

where $Y(1), Y(0)$ are potential outcomes.

Page 6:

Observational High-Dim ATE Estimation

Methodology Design

Naive Estimator: $\mathrm{mean}(\{Y_i \mid W_i = 1\}) - \mathrm{mean}(\{Y_i \mid W_i = 0\})$.

Regularized Regression: Elastic-net ($L_2$ loss) or Dantzig selector (max-norm loss) to estimate $\mu(w, x) = \mathbb{E}[Y \mid X = x, W = w]$.

Inverse Propensity Score Weighting: the Naive Estimator weighted by the inverse of the propensity score (p-score) $e(x) = P(W = 1 \mid X = x)$.
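As a point of reference, a minimal Python sketch (not the authors' code) of the first and third estimators might look as follows; the lasso-penalized logistic p-score model and the clipping of extreme p-scores are illustrative choices, not part of the slide.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

def naive_ate(y, w):
    # mean({Y_i | W_i = 1}) - mean({Y_i | W_i = 0})
    return y[w == 1].mean() - y[w == 0].mean()

def ipw_ate(y, w, x):
    # Estimate e(x) = P(W = 1 | X = x), then weight outcomes by its inverse.
    model = LogisticRegressionCV(penalty="l1", solver="saga", max_iter=5000)
    e_hat = np.clip(model.fit(x, w).predict_proba(x)[:, 1], 0.01, 0.99)
    return np.mean(w * y / e_hat) - np.mean((1 - w) * y / (1 - e_hat))
```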

Page 7:

Observational High-Dim ATE Estimation

Methodology Design (cont’d)

Double Selection Estimator (DSE): (Belloni et al. (2014)).

- For $w \in \{0, 1\}$, fit
$$\hat{\gamma}_w = \arg\min_{\gamma} \sum_{i: W_i = w} (Y_i - X_i\gamma)^2 + \lambda\|\gamma\|_1$$
and
$$\hat{\delta} = \arg\min_{\delta} \sum_{i} (W_i - X_i\delta)^2 + \lambda\|\delta\|_1.$$

- $Z$ is the set of covariates with non-zero coefficients in $\hat{\gamma}_0 \cup \hat{\gamma}_1 \cup \hat{\delta}$.

- For $w \in \{0, 1\}$, refit
$$\hat{\gamma}^*_w = \arg\min_{\gamma} \sum_{i: W_i = w} (Y_i - Z_i\gamma)^2.$$

Then
$$\hat{\tau}_{DSE} = \frac{1}{n} \sum_{i=1}^n Z_i(\hat{\gamma}^*_1 - \hat{\gamma}^*_0).$$
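A minimal sketch of this three-step logic with off-the-shelf lasso tools, assuming numpy arrays y, w, x; the cross-validated penalty and the included intercepts are implementation conveniences, not part of the slide:

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

def dse_ate(y, w, x):
    # Step 1: lasso of W on X, and lasso of Y on X within each arm.
    support = set(np.flatnonzero(LassoCV(cv=5).fit(x, w).coef_))
    for arm in (0, 1):
        support |= set(np.flatnonzero(LassoCV(cv=5).fit(x[w == arm], y[w == arm]).coef_))
    z = x[:, sorted(support)]
    # Step 2: refit least squares on the union Z of selected covariates, per arm.
    fits = {arm: LinearRegression().fit(z[w == arm], y[w == arm]) for arm in (0, 1)}
    # Step 3: average Z_i(gamma*_1 - gamma*_0) over the full sample.
    return np.mean(fits[1].predict(z) - fits[0].predict(z))
```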

Page 8:

Observational High-Dim ATE Estimation

Methodology Design (cont’d)

Doubly Robust Estimator (DRE): $\hat{\tau}_{DRE} = \frac{1}{n}\sum_{i=1}^n \widehat{IC}_i$, where

$$\widehat{IC}_i = \frac{W_i\,(Y_i - \hat{\mu}(1, X_i))}{\hat{e}(X_i)} - \frac{(1 - W_i)\,(Y_i - \hat{\mu}(0, X_i))}{1 - \hat{e}(X_i)} + \hat{\mu}(1, X_i) - \hat{\mu}(0, X_i).$$

Targeted Maximum Likelihood Estimator (TMLE): (van der Laan and Rubin (2006)). $\hat{\tau}_{TMLE}$ is $\hat{\tau}_{DRE}$ with $\hat{\mu}(w, x)$ replaced by its updated version $\hat{\mu}(w, x) + \hat{\varepsilon}\,H(w, x)$, where

$$H(w, x) = \frac{w}{\hat{e}(x)} - \frac{1 - w}{1 - \hat{e}(x)} \qquad \text{and} \qquad \hat{\varepsilon} = \frac{\sum_{i=1}^n H(W_i, X_i)\,(Y_i - \hat{\mu}(W_i, X_i))}{\sum_{i=1}^n H(W_i, X_i)^2}.$$
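Both formulas translate almost line by line into code; a sketch, assuming mu1, mu0, and e are arrays holding the fitted values of µ(1, X_i), µ(0, X_i), and e(X_i):

```python
import numpy as np

def dre(y, w, mu1, mu0, e):
    ic = w * (y - mu1) / e - (1 - w) * (y - mu0) / (1 - e) + mu1 - mu0
    return ic.mean()

def tmle(y, w, mu1, mu0, e):
    # Clever covariate H(w, x) and its fitted fluctuation epsilon.
    h = w / e - (1 - w) / (1 - e)
    mu_w = np.where(w == 1, mu1, mu0)
    eps = np.sum(h * (y - mu_w)) / np.sum(h ** 2)
    # Re-apply the DRE formula to the updated mu(w, x) + eps * H(w, x).
    return dre(y, w, mu1 + eps / e, mu0 - eps / (1 - e), e)
```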

Page 9:

Observational High-Dim ATE Estimation

Methodology Design (cont’d)

Approximate Residual Balancing Estimator (ARBE): (Athey et al. (2016)). For $w \in \{0, 1\}$, fit

$$\hat{\gamma}_w = \arg\min_{\gamma}\; (1 - \xi)\|\gamma\|_2^2 + \xi\,\big\|\bar{X} - X_w^T\gamma\big\|_\infty^2 \quad \text{s.t.} \quad \sum_{i: W_i = w} \gamma_i = 1,\; \gamma \geq 0,$$

and

$$\hat{\beta}_w = \arg\min_{\beta} \sum_{i: W_i = w} (Y_i - X_i\beta)^2 + \lambda\big((1 - \alpha)\|\beta\|_2^2 + \alpha\|\beta\|_1\big).$$

Then, $\hat{\tau}_{ARBE}$ is

$$\Big[\bar{X}\hat{\beta}_1 + \sum_{i: W_i = 1} \hat{\gamma}_{1,i}\,(Y_i - X_i\hat{\beta}_1)\Big] - \Big[\bar{X}\hat{\beta}_0 + \sum_{i: W_i = 0} \hat{\gamma}_{0,i}\,(Y_i - X_i\hat{\beta}_0)\Big].$$

Page 10:

Observational High-Dim ATE Estimation

Methodology Design (cont’d)

The ARBE estimator:

$$\hat{\tau}_{ARBE} = \Big[\bar{X}\hat{\beta}_1 + \sum_{i: W_i = 1} \hat{\gamma}_{1,i}\,(Y_i - X_i\hat{\beta}_1)\Big] - \Big[\bar{X}\hat{\beta}_0 + \sum_{i: W_i = 0} \hat{\gamma}_{0,i}\,(Y_i - X_i\hat{\beta}_0)\Big].$$

Approximate Balance: ARBE with $\hat{\beta}_0 = \hat{\beta}_1 = 0$ (Zubizarreta (2015)).

Inverse Propensity Residual Weighting: ARBE with $\hat{\gamma}_w$ replaced by weighted p-scores (Farrell (2015)).

Page 11:

Observational High-Dim ATE Estimation

Methodology Design (cont’d)

Propensity Tree: Use sample splitting, with one sample building the tree and a separate sample estimating the treatment effects (Athey and Imbens (2016)).

Propensity Forest: The average of many propensity trees. The generalization from propensity tree to propensity forest is similar to that from a single tree to a random forest (Wager and Athey (2016)).

Page 12:

Observational High-Dim ATE Estimation

Methodology Design (cont’d)

Double Machine Learning Estimator (DMLE): DRE with sample splitting (Chernozhukov et al. (2016)). Consider the Interactive Model:

$$Y = g_0(W, X) + \xi \quad \text{and} \quad W = m_0(X) + \nu.$$

The steps to estimate $\tau$ in this setting are as follows.

1. Fix $K \geq 2$. Let $\{1, \ldots, n\} = \cup_{k=1}^K I_k$ with $|I_k| = n/K$, and let $I_k^c = \cup_{j \neq k} I_j$. Define

$$\hat{\tau}(I_k, I_k^c) = \frac{1}{n/K} \sum_{i \in I_k} \widehat{IC}_i, \quad \text{where } \widehat{IC}_i \text{ is}$$

$$\frac{W_i\,(Y_i - \hat{g}_{I_k^c}(1, X_i))}{\hat{m}_{I_k^c}(X_i)} - \frac{(1 - W_i)\,(Y_i - \hat{g}_{I_k^c}(0, X_i))}{1 - \hat{m}_{I_k^c}(X_i)} + \hat{g}_{I_k^c}(1, X_i) - \hat{g}_{I_k^c}(0, X_i).$$

Here, $\hat{g}_{I_k^c}(1, \cdot)$, $\hat{g}_{I_k^c}(0, \cdot)$, and $\hat{m}_{I_k^c}(\cdot)$ are estimators of $g_0(1, \cdot)$, $g_0(0, \cdot)$, and $m_0(\cdot)$, fit on the sample $I_k^c$.

2. The final estimator is

$$\hat{\tau}_{DMLE} = \frac{1}{K} \sum_{k=1}^K \hat{\tau}(I_k, I_k^c).$$
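A cross-fitting sketch of these two steps, with random forests standing in for the generic first-stage learners (the comparisons later use both Lasso and Forest variants); the p-score clipping is an illustrative safeguard:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

def dmle_interactive(y, w, x, K=2):
    fold_estimates = []
    for tr, te in KFold(n_splits=K, shuffle=True).split(x):
        # Fit g(1, .), g(0, .), and m(.) on the complement sample I_k^c.
        g1 = RandomForestRegressor().fit(x[tr][w[tr] == 1], y[tr][w[tr] == 1])
        g0 = RandomForestRegressor().fit(x[tr][w[tr] == 0], y[tr][w[tr] == 0])
        m = RandomForestClassifier().fit(x[tr], w[tr])
        e = np.clip(m.predict_proba(x[te])[:, 1], 0.01, 0.99)
        g1p, g0p = g1.predict(x[te]), g0.predict(x[te])
        # Average the influence curve over the held-out fold I_k.
        ic = (w[te] * (y[te] - g1p) / e
              - (1 - w[te]) * (y[te] - g0p) / (1 - e) + g1p - g0p)
        fold_estimates.append(ic.mean())
    return np.mean(fold_estimates)   # tau_DMLE: average over the K folds
```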

Page 13:

Observational High-Dim ATE Estimation

Methodology Design (cont’d)

A special case of DMLE: the Partially Linear Model:

$$Y = W\tau + g_0(X) + \xi \quad \text{and} \quad W = m_0(X) + \nu.$$

The estimation is the same as in the Interactive Model, except that now $\hat{\tau}(I_k, I_k^c)$ is the root $\tau$ of

$$\frac{1}{n} \sum_{i \in I_k} \Big(Y_i - \hat{l}_{I_k^c}(X_i) - \tau\,\big(W_i - \hat{m}_{I_k^c}(X_i)\big)\Big)\,\big(W_i - \hat{m}_{I_k^c}(X_i)\big) = 0.$$

Here, $\hat{l}_{I_k^c}(\cdot)$ and $\hat{m}_{I_k^c}(\cdot)$ are estimators, fit on the sample $I_k^c$, of $l_0(\cdot)$ and $m_0(\cdot)$, where $m_0(\cdot)$ is defined above and $l_0(X) = \mathbb{E}[Y \mid X]$.
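Since the moment condition is linear in τ, each fold-level estimate has a closed form; a sketch under the same conventions as above:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestRegressor

def dmle_partially_linear(y, w, x, K=2):
    fold_estimates = []
    for tr, te in KFold(n_splits=K, shuffle=True).split(x):
        l_hat = RandomForestRegressor().fit(x[tr], y[tr]).predict(x[te])
        m_hat = RandomForestRegressor().fit(x[tr], w[tr]).predict(x[te])
        u, v = y[te] - l_hat, w[te] - m_hat      # residualized Y and W
        fold_estimates.append(np.sum(u * v) / np.sum(v * v))  # root of the moment
    return np.mean(fold_estimates)
```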

Page 14:

Observational High-Dim ATE Estimation

Simulation Design - General Setting

Run N = 100 simulation replications for the following settings:

n: 1000
p: 100 or 300
X: Independence, Dependence, or Clustering
W (p-score model): Linear or Nonlinear
β_W: Dense or Sparse
Y (outcome model): Linear, Partially Linear, Nonlinear, or Mixed
β: Dense or Sparse

(See the Appendix for the full specification of this setting.)

Page 15:

Observational High-Dim ATE Estimation

Simulation Design - Special Setting

Inspired by Athey et al. (2016), run N = 100 simulation replications for settings in which some variables have strong “double effects.”

$(n, p) \in \{1000\} \times \{100, 300\}$ and $X \sim N(0, I_{p \times p})$.

$W \sim \text{Bernoulli}\big(1 - e^{-\theta}\big)$, where $\theta = 0.89 \times \log\big(1 + e^{-2 - 2X\beta_W}\big)$.

Consider dense, sparse, and extremely sparse $\beta_W$.

$Y = X\beta + \theta\,\frac{2W - 1}{2} + \eta$, where $\eta \sim N(0, 1)$. Consider both dense and sparse $\beta$.

When $\theta$ increases (due to $X\beta_W$), $1 - e^{-\theta}$ increases, which means $P(W = 1 \mid X)$ increases. Thus, in this case, both $\theta$ and $W$ have direct positive effects on $Y$, hence the name “double effects.”

Page 16:

Observational High-Dim ATE Estimation

Evaluation Metrics

In each simulation, we determine the oracle estimator, which attains the semi-parametric efficiency bound:

$$\hat{\tau}_{oracle} = \frac{1}{n} \sum_{i=1}^n \left\{ \frac{(Y_i - \mu_1(X_i))\,W_i}{P(W_i = 1 \mid X_i)} - \frac{(Y_i - \mu_0(X_i))\,(1 - W_i)}{P(W_i = 0 \mid X_i)} + \mu_1(X_i) - \mu_0(X_i) \right\},$$

where $\mu_i(X) = \mathbb{E}[Y \mid X, W = i]$ for $i \in \{0, 1\}$. This $\hat{\tau}_{oracle}$ satisfies $\hat{\tau}_{oracle} = \arg\min_{\hat{\tau}} \mathbb{E}[(\hat{\tau} - \tau)^2]$. We then define the “efficiency,” which measures the bias of $\hat{\tau}_{oracle}$:

$$\text{efficiency} = |\hat{\tau}_{oracle} - \tau|.$$

For each estimator $\hat{\tau}$, obtain N estimates $\hat{\tau}_i$ ($i = 1, \ldots, N$). Calculate:

(a) absolute median (scaled) error: $\big|\,\text{median}\big(\frac{\hat{\tau} - \tau}{\text{efficiency}}\big)\big|$; and

(b) median absolute (scaled) error: $\text{median}\big(\frac{|\hat{\tau} - \tau|}{\text{efficiency}}\big)$.

Page 17:

Observational High-Dim ATE Estimation

Results - General Setting

Table: τ = 10; independent X; dense, linear p-score; dense, linear outcome

Error Metric                    Absolute Median     Median Absolute
p                                100      300        100      300
Efficiency                       0.02     0.00       0.09     0.09
Naive                          109.48   180.65     109.48   180.65
Elastic-net                      4.03     3.42       4.03     3.42
Dantzig                          0.88     4.49       1.15     4.49
IPW Lasso                       79.80   166.40      79.80   166.40
IPW Elastic-net                 75.59   165.36      75.59   165.36
IPW Forest                     101.40   174.38     101.40   174.38
IPW Residual                     2.84     3.25       2.84     3.25
DSE                              0.11     0.33       0.68     0.90
DRE                            100.67   172.55     100.67   172.55
TMLE                             1.24     1.55       1.42     1.60
ARBE                             0.97     2.21       1.11     2.21
ARBE (Allow Neg Weights)         0.91     2.04       1.17     2.04
Approx Balance                  31.08   100.12      31.08   100.12
ProTree                        109.33   181.33     109.33   181.33
ProForest                      107.07   177.73     107.07   177.73
DMLE (Partial Linear) Lasso     80.97   161.09      80.97   161.09
DMLE (Partial Linear) Forest    90.48   165.63      90.48   165.63
DMLE (Interactive) Lasso        93.32   170.03      93.32   170.03
DMLE (Interactive) Forest       95.11   169.75      95.11   169.75

Page 18:

Observational High-Dim ATE Estimation

Results - Special Setting

Table: τ = 1.55; sparse p-score; sparse outcome

Error Metric                    Absolute Median     Median Absolute
p                                100      300        100      300
Efficiency                       0.00     0.01       0.07     0.07
Naive                            3.50     3.72       3.50     3.72
Elastic-net                     23.16    20.18      23.16    20.18
Dantzig                         25.22    21.74      25.22    21.74
IPW Lasso                        6.45     6.61       6.45     6.61
IPW Elastic-net                  6.42     6.28       6.42     6.28
IPW Forest                       5.52     4.53       5.52     4.53
IPW Residual                    22.29    20.45      22.29    20.45
DSE                             28.34    25.25      28.34    25.25
DRE                              3.42     2.69       3.42     2.69
TMLE                            18.24    15.50      18.24    15.50
ARBE                            22.45    21.37      22.45    21.37
ARBE (Allow Neg Weights)        27.63    24.28      27.63    24.28
Approx Balance                  16.09    15.18      16.09    15.18
ProTree                          5.27     5.18       5.27     5.18
ProForest                        6.79     5.77       6.79     5.77
DMLE (Partial Linear) Lasso      5.41     5.50       5.41     5.50
DMLE (Partial Linear) Forest     3.20     2.24       3.20     2.24
DMLE (Interactive) Lasso        12.72    10.68      12.72    10.68
DMLE (Interactive) Forest        4.48     3.69       4.48     3.69

Page 19:

Observational High-Dim ATE Estimation

Diagnostic Tests for Determining The Best Estimator

Use each of the proposed estimators to estimate the effect of $W$ on $Y$ on each of two halves of the data, cut at the median of each covariate, in addition to the estimate on all the data. So if we have $p$ covariates, we obtain $p$ pairs of estimates $(\hat{\tau}_i^1, \hat{\tau}_i^2)_{i=1}^p$ besides the full-sample estimate $\hat{\tau}$.

For each estimator $\hat{\tau}$, define $\hat{\tau}_i^{\,diff} = \big|\frac{\hat{\tau}_i^1 + \hat{\tau}_i^2}{2} - \hat{\tau}\big|$ for $i = 1, \ldots, p$, and let $M_{\hat{\tau}}$, $m_{\hat{\tau}}$, and $\sigma_{\hat{\tau}}$ be the max, mean, and s.d. of $\{\hat{\tau}_i^{\,diff} \mid i = 1, \ldots, p\}$.

Split the dataset randomly into $K$ folds. In turn, we estimate the effect of $W$ on $Y$ using each fold and output the in-sample goodness of fit (GOF) on the same fold. So for each estimator $\hat{\tau}$, we obtain $K$ values $(GOF_i^{\hat{\tau}})_{i=1}^K$. Let $G_M^{\hat{\tau}}$, $G_m^{\hat{\tau}}$, and $G_\sigma^{\hat{\tau}}$ be their max, mean, and s.d.
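A sketch of the covariate-split diagnostic, assuming estimator(y, w, x) wraps any of the ATE estimators above as a function; it presumes each half still contains both treated and control units:

```python
import numpy as np

def split_diagnostics(estimator, y, w, x):
    tau_full = estimator(y, w, x)
    diffs = []
    for j in range(x.shape[1]):
        lo = x[:, j] <= np.median(x[:, j])      # half below the median of covariate j
        tau1 = estimator(y[lo], w[lo], x[lo])
        tau2 = estimator(y[~lo], w[~lo], x[~lo])
        diffs.append(abs((tau1 + tau2) / 2 - tau_full))
    diffs = np.asarray(diffs)
    return diffs.max(), diffs.mean(), diffs.std()   # M, m, sigma of the diffs
```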

Page 20:

Observational High-Dim ATE Estimation

Diagnostic Tests (cont’d)

Two-sample approximate one-sided t-tests: For estimators $\hat{\tau}_1, \hat{\tau}_2$ with $G_M^{\hat{\tau}_1} > G_M^{\hat{\tau}_2}$ and $G_m^{\hat{\tau}_1} > G_m^{\hat{\tau}_2}$, we form two test statistics:

$$G_{max} = \frac{G_M^{\hat{\tau}_1} - G_M^{\hat{\tau}_2}}{\sqrt{\frac{(G_\sigma^{\hat{\tau}_1})^2}{K} + \frac{(G_\sigma^{\hat{\tau}_2})^2}{K}}} \quad \text{and} \quad G_{mean} = \frac{G_m^{\hat{\tau}_1} - G_m^{\hat{\tau}_2}}{\sqrt{\frac{(G_\sigma^{\hat{\tau}_1})^2}{K} + \frac{(G_\sigma^{\hat{\tau}_2})^2}{K}}},$$

with $df = (K - 1)\big((G_\sigma^{\hat{\tau}_1})^2 + (G_\sigma^{\hat{\tau}_2})^2\big)^2 \big/ \big((G_\sigma^{\hat{\tau}_1})^4 + (G_\sigma^{\hat{\tau}_2})^4\big)$.

Two-sample approximate one-sided t-tests and one-sided F-tests: For two estimators $\hat{\tau}_1$ and $\hat{\tau}_2$ with $M_{\hat{\tau}_1} > M_{\hat{\tau}_2}$, $m_{\hat{\tau}_1} > m_{\hat{\tau}_2}$, and $\sigma_{\hat{\tau}_1} > \sigma_{\hat{\tau}_2}$, we form three test statistics:

$$T_{max} = \frac{M_{\hat{\tau}_1} - M_{\hat{\tau}_2}}{\sqrt{\sigma_{\hat{\tau}_1}^2/p + \sigma_{\hat{\tau}_2}^2/p}} \quad \text{and} \quad T_{mean} = \frac{m_{\hat{\tau}_1} - m_{\hat{\tau}_2}}{\sqrt{\sigma_{\hat{\tau}_1}^2/p + \sigma_{\hat{\tau}_2}^2/p}},$$

with $df = \frac{(p - 1)(\sigma_{\hat{\tau}_1}^2 + \sigma_{\hat{\tau}_2}^2)^2}{\sigma_{\hat{\tau}_1}^4 + \sigma_{\hat{\tau}_2}^4}$, and $F = \frac{\sigma_{\hat{\tau}_1}^2}{\sigma_{\hat{\tau}_2}^2}$ with $df_1 = df_2 = p - 1$.
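The mean-based GOF statistic, for instance, is a standard Welch-style computation (the max-based and T statistics are identical in form); scipy supplies the one-sided p-value:

```python
import numpy as np
from scipy import stats

def gof_mean_test(gm1, gm2, gs1, gs2, K):
    # One-sided Welch-style test that estimator 1 has larger mean GOF.
    t = (gm1 - gm2) / np.sqrt(gs1**2 / K + gs2**2 / K)
    df = (K - 1) * (gs1**2 + gs2**2) ** 2 / (gs1**4 + gs2**4)
    return t, stats.t.sf(t, df)   # survival function = one-sided p-value
```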

Page 21:

Observational High-Dim ATE Estimation

Diagnostic Tests on General Simulated Data

Table: τ = 10; independent X; dense, linear p-score; dense, linear outcome

Method                         mean diff  max diff  sd diff  mean GOF  max GOF  sd GOF
Naive                             0.05      0.24     0.05      8.81      9.09    0.35
Elastic-net                       0.04      0.15     0.03      4.35      5.80    1.34
Dantzig                           0.06      0.18     0.04      3.98      5.64    1.12
IPW Lasso                         1.05      1.72     0.31     10.12     10.79    0.40
IPW Elastic-net                   1.13      1.82     0.28     10.12     10.79    0.40
IPW Forest                        0.09      0.24     0.06     10.12     10.79    0.40
IPW Residual                      0.07      0.19     0.04      4.29      5.28    1.74
DSE                               0.03      0.13     0.03      3.87      6.15    2.43
DRE                               0.07      0.26     0.05      3.32      3.46    0.15
TMLE                              0.06      0.17     0.03      4.35      5.80    1.34
ARBE                              0.08      0.19     0.04      3.93      7.00    2.42
ARBE (Allow Neg Weights)          0.07      0.16     0.04      4.03      5.66    1.94
Approx Balance                    1.33      1.66     0.14     10.02     10.69    0.40
ProTree                           0.16      0.54     0.13      8.29      8.90    0.54
ProForest                         0.32      1.07     0.22      3.32      3.46    0.15
DMLE (Partial Linear) Lasso       1.14      1.57     0.24      7.21      7.97    0.81
DMLE (Partial Linear) Forest      0.08      0.22     0.05      3.30      3.50    0.18
DMLE (Interactive) Lasso          0.73      1.14     0.18      6.37      6.90    0.53
DMLE (Interactive) Forest         0.12      0.38     0.10      3.34      3.48    0.13

Page 22:

Observational High-Dim ATE Estimation

Diagnostic Tests on Special Simulated Data

Table: τ = 1.55; sparse p-score; sparse outcome

Method                         mean diff  max diff  sd diff  mean GOF  max GOF  sd GOF
Naive                             0.01      0.31     0.03      1.57      1.65    0.05
Elastic-net                       0.20      0.60     0.10      1.20      1.37    0.11
Dantzig                           0.68      1.26     0.24      1.13      1.34    0.15
IPW Lasso                         0.03      0.25     0.03      1.79      1.88    0.08
IPW Elastic-net                   0.03      0.27     0.03      1.79      1.88    0.08
IPW Forest                        0.03      0.26     0.03      1.79      1.88    0.08
IPW Residual                      0.24      0.64     0.10      1.24      1.40    0.11
DSE                               0.08      0.25     0.06      0.82      0.94    0.08
DRE                               0.02      0.17     0.03      0.55      0.56    0.01
TMLE                              0.41      0.97     0.19      1.20      1.37    0.11
ARBE                              0.15      0.47     0.07      1.22      1.40    0.12
ARBE (Allow Neg Weights)          0.18      0.40     0.06      1.18      1.41    0.13
Approx Balance                    0.10      0.17     0.03      1.77      1.87    0.08
ProTree                           0.09      0.30     0.07      1.28      1.32    0.05
ProForest                         0.10      0.26     0.07      0.55      0.56    0.01
DMLE (Partial Linear) Lasso       0.08      0.21     0.04      1.33      1.49    0.09
DMLE (Partial Linear) Forest      0.02      0.05     0.01      0.60      0.62    0.01
DMLE (Interactive) Lasso          0.22      0.36     0.05      1.08      1.22    0.08
DMLE (Interactive) Forest         0.05      0.14     0.03      0.58      0.60    0.01

Page 23:

Observational High-Dim ATE Estimation

Diagnostic Test Results on Real Data

Inspired by Athey, Imbens, Pham, and Wager (2017): Consider six ATT estimators on the non-experimental Lalonde data.

* ATT (on experimental data) ≈ 1794.

Table: Diagnostic Tests on Lalonde Data

             Naive      OLS       DSE       ARBE      DRE       DMLE
ATT       -8497.52    689.86    684.26   1153.27   1257.25   1504.47
mean diff    904.60    755.40    792.37    679.25    404.49   1545.18
max diff    1398.59   1454.24   1439.00   1153.69   1136.67   2045.67
sd diff      463.34    476.18    466.44    375.11    429.89    524.02
mean GOF    9646.83   6981.61   7000.44   7097.15   5241.02   5011.23
max GOF     9694.39   7047.13   7052.17   7138.70   5332.31   5092.06
sd GOF        55.83     56.97     57.82     60.21     88.37     80.15

* The diagnostic tests select DRE as the best estimator here.
* DMLE appears better but is unstable here (running it multiple times returns largely different estimates).

Page 24:

Observational High-Dim ATE Estimation

Conclusion & Future Study

Main take-aways:

Propose a procedure of using diagnostic tests for selecting the most credible estimators in high dimensional observational data.

This procedure has three important characteristics: first, it is intuitive; second, it works reasonably well on both simulated and real datasets; and third, it is easy to implement.

Future work:

Apply the proposed diagnostic test procedure to more datasets.

Adjust the procedure to fit really big data, in which computational time is a concern.

Adjust the procedure to fit more datasets.

Page 25:

PART II:

A Deep Causal Inference Approach to Measuring the Effects of Forming Group Loans in Online Non-profit Microfinance Platform

(Joint work with Yuanyuan Shen)

Page 26:

Deep Causal Inference on Kiva

Contributions

Kiva: an online non-profit crowdsourcing microfinance platform that raises funds for the poor in the third world.

The borrowers on Kiva are small business owners and individuals in urgent need of money. To raise funds as fast as possible, they have the option to form groups to request loans. They rely on the Field Partners, who connect them with the non-profit-seeking lenders.

*MFI: Microfinance Institution

Page 27:

Deep Causal Inference on Kiva

Contributions (cont’d)

While it is generally believed that group loans pose less risk for investors than individual loans do, we study whether this is the case in a philanthropic online marketplace.

In particular, we measure the average treatment effect (ATE) of forming group loans on funding time while controlling for the loan sizes and other factors.

Because loan descriptions (in the form of texts) play an important role in lenders' decision process on Kiva, we make use of this information through cutting-edge deep learning and natural language processing techniques.

Page 28:

Deep Causal Inference on Kiva

Contributions (cont’d)

Deep learning has been extremely successful in solving prediction tasks (self-driving cars, translation, AlphaGo, etc.).

We are aware of no other work that uses deep learning in causal inference, except Hartford et al. (2017) for their Deep IV model; but they use a simple deep learning model in a low-dim setting with numerical covariates.

This is the first paper that uses one of the most advanced deep learning techniques to deal with unstructured text data in a way that can take advantage of its superior prediction power to answer causal questions.

Page 29:

Deep Causal Inference on Kiva

Data

Our cleaned data set has 995,911 loan entries from Kiva, from Jan 1, 2006 to May 10, 2016.

Outcome Y: funding time. We use only funded loans (95.2% of the loans) and focus our attention on these. Fastest funding time: 15 seconds; longest funding time: 154 days; average funding time: 7.11 days.

Treatment W: whether to form a group loan. 14.3% of the loans are group loans.

Covariates X: include a loan description and other information.

Page 30:

Deep Causal Inference on Kiva

Data (cont’d)

Variable             Description                                         Type
description_texts    Loan description                                    Text
loan_amount          The amount the borrower requests                    Numerical
sector               Purpose of the loan                                 Categorical (15)
risker               Whether lenders or partners bear the default risk   Binary
gender               The gender of the borrower(s)                       Binary

“James is a 35-year-old mixed crop farmer. He is married toZipporah, a housewife. They are blessed with two childrenage seven and four years old, respectively. James has beenpracticing farming for the past two years with a monthlyincome of KES 18,000. James is applying for his third loanfrom KADET LTD after repaying the previous loanssuccessfully. He will use the loan to buy poultry feed andone-day-old chicks for rearing...”

Page 31:

Deep Causal Inference on Kiva

Estimation Methodology - Preprocessing

We also use the Rubin Causal Model, with similar notations and the usual assumptions.

Preprocessing of Text Data: Use GloVe (Pennington et al. (2014)) to create 100-dim word embedding vectors. Depending on usage, we may take the average of the word vectors in a loan description to create loan vectors.

GloVe Method: Find a vector representation for each word in a way that makes sense both semantically (meaning) and syntactically (grammar). A good vector representation of words, called embeddings, should capture relations such as

$$\text{king} - \text{man} + \text{woman} \approx \text{queen}.$$
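The loan-vector construction then reduces to averaging word vectors; a sketch, assuming glove is a dict-like map from word to 100-dim numpy vector (e.g., parsed from the published GloVe files):

```python
import numpy as np

def loan_vector(description, glove, dim=100):
    # Average the embeddings of the in-vocabulary tokens of the description.
    vecs = [glove[tok] for tok in description.lower().split() if tok in glove]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```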

Page 32:

Deep Causal Inference on Kiva

Estimation Methodology - Baseline Model

Regularized Linear Regression (LR) without Text Data: Use LR with elastic-net to estimate $\mu(w, x) = \mathbb{E}[Y \mid X = x, W = w]$. Let $\hat{Y}_1$ and $\hat{Y}_0$ be the estimated $\mu(1, X)$ and $\mu(0, X)$ on the full sample.

The ATE estimate: $\mathrm{mean}(\hat{Y}_1) - \mathrm{mean}(\hat{Y}_0)$. The s.e. is estimated by $\sqrt{\hat{V}_1 + \hat{V}_0}$, where

$$\hat{V}_1 = \frac{\mathrm{var}(Y_i - \hat{Y}_{1,i} \mid i: W_i = 1)}{n_t - 1} \quad \text{and} \quad \hat{V}_0 = \frac{\mathrm{var}(Y_i - \hat{Y}_{0,i} \mid i: W_i = 0)}{n_c - 1}.$$

Regularized Linear Regression with Text Data: First obtain loan vectors (from the preprocessing step), combine them with the other covariates to create full covariate vectors, and proceed as above.
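A sketch of the estimate and its standard error, assuming y1_hat and y0_hat hold the fitted µ(1, X_i) and µ(0, X_i) for every unit i:

```python
import numpy as np

def baseline_ate_se(y, w, y1_hat, y0_hat):
    ate = y1_hat.mean() - y0_hat.mean()
    n_t, n_c = np.sum(w == 1), np.sum(w == 0)
    v1 = np.var(y[w == 1] - y1_hat[w == 1], ddof=1) / (n_t - 1)
    v0 = np.var(y[w == 0] - y0_hat[w == 0], ddof=1) / (n_c - 1)
    return ate, np.sqrt(v1 + v0)
```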

Page 33:

Deep Causal Inference on Kiva

Estimation Methodology - Advanced Model I

The first advanced method is the Double Selection Estimator (DSE), proposed by Belloni et al. (2014).

DSE is easy to implement and computationally inexpensive. Moreover, it gives reasonably good estimates in many cases (see Part I).

Page 34:

Deep Causal Inference on Kiva

Estimation Methodology - Advanced Model II

The second advanced method is the Doubly Robust Estimator (DRE), with main ideas dating back to Robins et al. (1994).

We use DRE to take advantage of the deep learning methods in estimating the infinite dimensional components.

More importantly, DRE has the double robustness property: it is a consistent estimator of the ATE if either the outcome model or the propensity score (p-score) model is correctly specified, or both. In a high-dimensional setting, it is extremely difficult to guarantee correct specification of both models; thus, this property is essential.

Page 35:

Deep Causal Inference on Kiva

Estimation Methodology - Advanced Model II (cont’d)

Doubly Robust Estimator (DRE):

Estimate the conditional expectations and the p-score with estimators $\hat{\mu}(1, x)$, $\hat{\mu}(0, x)$, and $\hat{e}(x)$.

The DRE $\hat{\tau}_{DRE}$ for $\tau$ is the average of the influence curves:

$$\widehat{IC}_i = \frac{W_i\,[Y_i - \hat{\mu}(1, X_i)]}{\hat{e}(X_i)} - \frac{(1 - W_i)\,[Y_i - \hat{\mu}(0, X_i)]}{1 - \hat{e}(X_i)} + \hat{\mu}(1, X_i) - \hat{\mu}(0, X_i).$$

We follow Lunceford and Davidian (2004) and use an empirical sandwich estimator for the s.e.:

$$\sqrt{\frac{1}{n^2} \sum_{i=1}^n \big(\widehat{IC}_i - \hat{\tau}_{DRE}\big)^2}.$$
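In code, the sandwich standard error is a one-liner over the influence-curve values (ic here is assumed to be the vector of IC_i from the display above):

```python
import numpy as np

def dre_sandwich_se(ic):
    # sqrt( (1/n^2) * sum_i (IC_i - tau_hat)^2 )
    return np.sqrt(np.sum((ic - ic.mean()) ** 2) / len(ic) ** 2)
```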

Page 36:

Deep Causal Inference on Kiva

Estimation Methodology - Advanced Model III

The third advanced method is the Targeted Maximum Likelihood Estimator (TMLE), which can be regarded as a more general version of DRE (van der Laan and Rubin (2006)).

TMLE possesses all the nice properties of DRE. In addition, it is more accurate, as it is DRE applied to an updated, better version of the initial infinite dimensional component estimates.

Page 37:

Deep Causal Inference on Kiva

Deep Learning Models - Multilayer Perceptron (MLP)

Figure: MLP for p-score Estimation

Estimate: $A_1 \in \mathbb{R}^{n_1 \times d}$, $A_2 \in \mathbb{R}^{n_2 \times n_1}$, $A_s \in \mathbb{R}^{n_T \times T}$, $A_3 \in \mathbb{R}^{n_3 \times (n_2 + n_T)}$, $A_4 \in \mathbb{R}^{2 \times n_3}$, $b_1 \in \mathbb{R}^{n_1}$, $b_2 \in \mathbb{R}^{n_2}$, $b_s \in \mathbb{R}^{n_T}$, $b_3 \in \mathbb{R}^{n_3}$, and $b_4 \in \mathbb{R}^2$.

The relations:

$h_1 = \mathrm{ReLU}(A_1 X + b_1)$;
$h_2 = \mathrm{ReLU}(A_2 h_1 + b_2)$;
$s_{final} = \mathrm{ReLU}(A_s s + b_s)$, where $s \in \mathbb{R}^T$; let $H_2 = [h_2, s_{final}]'$;
$h_3 = \mathrm{ReLU}(A_3 H_2 + b_3)$;
$out = \mathrm{logistic}(A_4 h_3 + b_4)$.

Here, $\mathrm{ReLU}(x) = \max(0, x)$.
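A numpy sketch of the forward pass only, with params assumed to hold already-estimated weights in the shapes above (training and the conversion of the two output units into a single p-score are omitted):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(X, s, params):
    A1, b1, A2, b2, As, bs, A3, b3, A4, b4 = params
    h1 = relu(A1 @ X + b1)                # first hidden layer on the covariates
    h2 = relu(A2 @ h1 + b2)               # second hidden layer
    s_final = relu(As @ s + bs)           # hidden representation of the text input s
    H2 = np.concatenate([h2, s_final])    # concatenate the two branches
    h3 = relu(A3 @ H2 + b3)               # joint hidden layer
    return logistic(A4 @ h3 + b4)         # two output units, as on the slide
```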

Page 38:

Deep Causal Inference on Kiva

Deep Learning Models - Deep LSTM

Multilayer Recurrent Neural Network with Long Short Term Memory Cells, known shortly as Deep LSTM (with attention): it captures sequential relations!

Figure: Deep LSTM for p-score Estimation

(See the Appendix for the LSTM cell.)

Page 39:

Deep Causal Inference on Kiva

Results

Estimation of the p-score and outcome models: Compare MLP and Deep LSTM with Regularized Linear/Logistic Regression (RLR) and Random Forest (Forest).

Table: Result Comparison on the Test Set. For p-score estimation, we use F1 score and accuracy (both in [0, 1]; the higher the better); for outcome estimation, we use RMSE (the lower the better).

            Method       p-score (F1)   p-score (acc)   Treated (RMSE)   Control (RMSE)
w/o text    RLR              0.41           88.2%            8.94             9.30
            Forest           0.56           90.2%            8.82             8.81
with text   RLR              0.80           94.6%            8.63             9.15
            Forest           0.82           95.5%            7.91             8.32
            MLP              0.95           98.6%            7.27             7.76
            Deep LSTM        0.98           99.3%            7.24             7.70

Page 40:

Deep Causal Inference on Kiva

Results (cont’d)

ATE Estimate: By double robustness, use DRE & TMLE (Deep LSTM)

            Method               ATE      std
w/o text    Naive                0.17    0.027
            Baseline             2.28    0.026
            DSE                 -0.70    0.026
            DRE (RLR)           -0.61    1.623
            DRE (Forest)        -2.62    0.475
            TMLE (RLR)          -1.03    1.690
            TMLE (Forest)       -4.38    0.644
with text   Baseline             2.87    0.026
            DSE                 -0.52    0.025
            DRE (RLR)           -0.19    1.623
            DRE (Forest)        -1.26    0.472
            DRE (MLP)           -2.93    0.090
            DRE (Deep LSTM)     -3.30    0.167
            TMLE (RLR)          -1.00    0.990
            TMLE (Forest)      -12.60    0.150
            TMLE (MLP)          -2.78    0.091
            TMLE (Deep LSTM)    -3.29    0.167

Page 41:

Deep Causal Inference on Kiva

Conclusion & Future Study

Main take-aways:

Machine learning, and especially deep learning, is very useful in dealing with textual or other unstructured data. GloVe is, on its own, a very important algorithm for giving meaning to texts.

Deep learning can also be useful in causal inference (e.g., in the DRE and TMLE estimators).

Future Work:

Work with more estimators.

Apply the diagnostic tests (from Part I) to select the best estimators (probably using a small subset of the data).

Estimate the ATE on subgroups, e.g., important sectors.

Page 42:

PART III:

The Supervised Learning Approach to Estimating Heterogeneous Causal Regime Effects

Page 43:

Estimation of Heterogeneous Causal Regime Effects

Contributions

Many sequential treatment settings: patients adjust medications over multiple periods; students decide whether to follow an educational honors program over multiple years; in the labor market, the unemployed might participate in a set of programs (job search, subsidized job, training) sequentially.

Heterogeneity in treatment sequence reactions: medication effects can be heterogeneous across patients and across time; the same holds for educational programs and the labor market.

It is hard to set up sequential randomized experiments in reality.

Page 44:

Estimation of Heterogeneous Causal Regime Effects

Contributions (cont’d)

Develop a nonparametric framework using supervised learning to estimate heterogeneous treatment regime effects from observational (or experimental) data.

Treatment Regime: a set of functions of characteristics and intermediate outcomes.

Propose using a supervised learning approach (Multilayer Perceptron, or MLP), which gives good estimation accuracy and is robust to model misspecification.

Propose a matching-based approach for testing the performance of heterogeneous treatment regime effect estimators.

Propose a matching-based kernel estimator for the variance of the heterogeneous treatment regime effect estimates.

Page 45:

Estimation of Heterogeneous Causal Regime Effects

Contributions (cont’d)

In this paper, we

Focus on a dynamic setting with multiple treatments applied sequentially (in contrast to a single treatment).

Focus on the heterogeneous (in contrast to average) effect of a sequence of treatments, i.e., a treatment regime.

Focus on observational data (in contrast to experimental data).

We apply the proposed methodology to both simulated data and the North Carolina Honors Program data and obtain good results.

Page 46:

Estimation of Heterogeneous Causal Regime Effects

Simulation Results¹

[Figure: six scatter-plot panels (plotted points not recoverable). Panel titles: MLP: T = 0; True Effect: T = 0; OLS: T = 0; MLP: T = 1; True Effect: T = 1; OLS: T = 1.]

Figure: Heterogeneous Treatment Regime Effect Using Validation and Test Data. The first row corresponds to period T = 0 and the second row to period T = 1. In each period, the middle picture visualizes the true treatment effect, the left one is the effect estimated by MLP, and the right one is the effect estimated by OLS.

¹We thank Wager and Athey (2016) for sharing visualization code.

Page 47:

References (presentation order)

Imbens, G. and T. Pham, (2017), “Average Treatment Effect Estimation in High Dimensional Observational Data: Practical Recommendations,” Stanford GSB, Technical Report.

Pham, T. and Y. Shen, (2017), “A Deep Causal Inference Approach to Measuring the Effects of Forming Group Loans in Online Non-profit Microfinance Platform,” Stanford GSB, Technical Report.

Pham, T., (2016), “The Supervised Learning Approach to Estimating Heterogeneous Causal Regime Effects,” Stanford GSB, Technical Report.

Imbens, G., and D. Rubin, (2015), “Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction,” Cambridge University Press.

Rosenbaum, P. R., and D. B. Rubin, (1983), “The central role of the propensity score in observational studies for causal effects,” Biometrika, 70(1): 41–55.

Page 48:

References (cont’d)

Belloni, A., V. Chernozhukov, and C. Hansen, (2014), “Inference on treatment effects after selection among high-dimensional controls,” The Review of Economic Studies, 81(2): 608–650.

van der Laan, M. J. and D. Rubin, (2006), “Targeted Maximum Likelihood Learning,” The International Journal of Biostatistics, 2(1): Article 11.

Athey, S., G. Imbens, and S. Wager, (2016), “Approximate Residual Balancing: De-Biased Inference of Average Treatment Effects in High Dimensions,” preprint at arXiv:1604.07125v3.

Zubizarreta, J. S., (2015), “Stable weights that balance covariates for estimation with incomplete outcome data,” Journal of the American Statistical Association, 110(511): 910–922.

Farrell, M. H., (2015), “Robust inference on average treatment effects with possibly more covariates than observations,” Journal of Econometrics, 189(1): 1–23.

Page 49:

References (cont’d)

Athey, S. and G. Imbens, (2016), “Recursive Partitioning for Heterogeneous Causal Effects,” Proceedings of the National Academy of Sciences, 113(27): 7353–7360.

Wager, S. and S. Athey, (2016), “Estimation and Inference of Heterogeneous Treatment Effects using Random Forests,” preprint at arXiv:1510.04342v3.

Chernozhukov, V., D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, and W. Newey, (2016), “Double Machine Learning for Treatment and Causal Parameters,” MIT, Working Paper.

Hartford, J., G. Lewis, K. Leyton-Brown, and M. Taddy, (2017), “Counterfactual Prediction with Deep Instrumental Variables Networks,” forthcoming, ICML.

Pennington, J., R. Socher, and C. D. Manning, (2014), “GloVe: Global Vectors for Word Representation,” Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543.

Page 50:

References (cont’d)

Robins, J. M., A. Rotnitzky, and L. P. Zhao, (1994), “Estimation of regression coefficients when some regressors are not always observed,” Journal of the American Statistical Association, 89: 846–66.

Lunceford, J. K. and M. Davidian, (2004), “Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study,” Statistics in Medicine, 23: 2937–60.

Pham, T., (2017), “Balancing Method for High Dimensional Causal Inference,” arXiv preprint arXiv:1702.04473.

Athey, S., G. Imbens, T. Pham, and S. Wager, (2017), “Estimating Average Treatment Effects: Supplementary Analyses and Remaining Challenges,” The American Economic Review, 107(5): 278–81.

Pham, T. and W. Chen, (2016), “The Instrumental Variable Method for Estimating Local Average Treatment Regime Effects,” arXiv preprint arXiv:1611.03545.

Page 51:

Appendix

Simulation Design - General Setting

Run N = 100 simulation replications for the following settings:

$(n, p) \in \{1000\} \times \{100, 300\}$.

Consider three cases for covariates $X$:

Independence: $X \sim N(0, I_{p \times p})$;

Dependence: $X \sim N(0, \Sigma)$, where $\Sigma_{jk} = \frac{1}{2^{|j-k|}}$;

Clustering:
1. Generate independent cluster centers $c_k \sim N(0, I_{p \times p})$, $k = 1, \ldots, 10$.
2. Draw a center $C$ uniformly at random from this set of cluster centers and generate $X \sim N(C, I_{p \times p})$.
3. Note that in this setting only, we simulate the treatment $W$ so that $W = 1$ with probability 0.15 for the first five clusters and with probability 0.85 for the last five clusters.

Page 52:

Appendix

Simulation Design - General Setting (cont’d)

Consider two cases for the p-score model:

Linear: $W \sim \text{Bernoulli}(\theta)$, where $\theta = \frac{1}{1 + e^{X\beta_W}}$;

Nonlinear: $W \sim \text{Bernoulli}(\theta)$, where $\theta = 1 - \frac{1}{(1 + e^{X\beta_W})^{1.23}}$.

Consider $\beta_W \propto (1, 1/\sqrt{2}, \ldots, 1/\sqrt{p})$ (dense) and $\beta_W \propto (\underbrace{1, \ldots, 1}_{10}, \underbrace{0, \ldots, 0}_{p-10})$ (sparse).

Consider four cases for the outcome model, with $\eta \sim N(0, 1)$:

Linear: $Y = X\beta + 10W + \eta$;

Partially linear: $Y = \frac{2}{1 + e^{X\beta}} + 10W + \eta$;

Nonlinear: $Y = \frac{2W - 1}{1 + e^{X\beta}} + \eta$;

Mixed: $Y = \frac{2W - 1}{1 + e^{X\beta}} + X\beta + \eta$.

Consider $\beta \propto (1, 1, \ldots, 1)$ (dense) and $\beta \propto (1, 1/2^2, \ldots, 1/p^2)$ (sparse).
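One replication of, say, the independent-X, dense, linear p-score, dense, linear outcome cell follows directly from these formulas (the proportionality constants are taken as 1 here for illustration):

```python
import numpy as np

def simulate_general(n=1000, p=100, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))                      # independent covariates
    beta_w = 1.0 / np.sqrt(np.arange(1, p + 1))      # dense beta_W, up to scaling
    theta = 1.0 / (1.0 + np.exp(X @ beta_w))         # linear p-score
    W = rng.binomial(1, theta)
    beta = np.ones(p)                                # dense beta, up to scaling
    Y = X @ beta + 10.0 * W + rng.normal(size=n)     # linear outcome, tau = 10
    return X, W, Y
```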

Page 53:

Appendix

Long Short Term Memory (LSTM) Cell

Figure: LSTM Cell. Source: http://colah.github.io/

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$; $\quad i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$;
$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$; $\quad C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$;
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$; $\quad h_t = o_t * \tanh(C_t)$.
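A numpy sketch of one cell step implementing these updates (σ is the logistic function; * is elementwise):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wf, bf, Wi, bi, Wc, bc, Wo, bo):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(Wf @ z + bf)          # forget gate
    i = sigmoid(Wi @ z + bi)          # input gate
    c_tilde = np.tanh(Wc @ z + bc)    # candidate cell state
    c = f * c_prev + i * c_tilde      # new cell state
    o = sigmoid(Wo @ z + bo)          # output gate
    h = o * np.tanh(c)                # new hidden state
    return h, c
```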
