
Page 1

Solving Heterogeneous Estimating Equations Using Forest Based Algorithms

Susan Athey, Stanford University

Chicago, September 2016

with Julie Tibshirani and Stefan Wager

Also based on Athey and Imbens (2016) and Wager and Athey (2015)

Page 2

Heterogeneous Parameter Estimation

Estimating heterogeneous treatment effects lies at the heart of many modern statistical problems:

- Personalized medicine: Which form of cancer therapy is most appropriate for this specific patient?

- Targeted advertising/offers: What is the best offer for this individual?

- Fairness in machine learning: If we automate early screening in college applications, we need to make sure the system doesn't implicitly discriminate against minorities.

Page 3

Policy Estimation and Evaluation

Heterogeneous treatment effect estimation is closely related to the problem of optimal policy estimation and evaluation.

- What is the best policy for assigning individuals to medicine?

- Can I reject the hypothesis that my new assignment policy generates the same benefits as alternatives?

- How can I learn optimal policies while minimizing the cost from misassignment along the way?

Two strands of literature crossing many disciplines:

- (Offline) policy estimation and evaluation (contextual bandits (Langford et al), many others; Athey and Wager (in progress))

- (Online) multi-armed bandits with state-contingent decision rules (Langford et al, many others; Athey, Du and Imbens (2016))

Page 4

ML Methods for Causal Inference: Treatment Effect Heterogeneity

- ML methods perform well in practice, but many do not have well-established statistical properties (see Chen and White (1999) for an early analysis of neural nets)

- Unlike prediction, the ground truth for causal parameters is not directly observed

- Need valid confidence intervals for many applications (A/B testing, drug trials); challenges include adaptive model selection and multiple testing

- Different possible questions of interest, e.g.:
  - Identifying subgroups (Athey and Imbens, 2016)
  - Testing for heterogeneity across all covariates (List, Shaikh, and Xu, 2016)
  - Robustness to model specification (Athey and Imbens, 2015)
  - Personalized estimates (Wager and Athey, 2015; Taddy et al 2014; others)

NB: See Mullainathan et al on prediction policy; ML in time series

Page 5

ML Methods for Causal Inference: More general models

- Much of the recent literature bringing ML methods to causal inference focuses on a single binary treatment in an environment with unconfoundedness

- Economic models often have more complex estimation approaches:
  - Quantile regression
  - Instrumental variables
  - Panel regression
  - Consumer choice
  - Euler equations
  - Survival analysis

Page 6

ML Methods for Causal Inference: More general models
Average Treatment Effects with IV or Unconfoundedness

- In a series of influential papers, Belloni, Chernozhukov, Hansen, et al generalized LASSO methods to average treatment effect estimation through instrumental variables models, unconfoundedness, and also moment-based methods

- See also Athey, Imbens and Wager (2016), who combine regularized regression and high-dimensional covariate balancing for average treatment effect estimation; and references therein on more recent papers on ATE estimation in high dimensions

Heterogeneous Treatment Effects

- Imai and Ratkovic (2013) analyze treatment effect heterogeneity with LASSO

- Targeted ML (van der Laan, 2006) can be used as a semi-parametric approach to estimating treatment effect heterogeneity

- See Wager and Athey (2015) and Athey and Imbens (2015) for further references on treatment effect heterogeneity

Page 7

Forests for Semi-Parametric Parameter Heterogeneity

- Semi-parametric GMM/ML uses kernel weighting to estimate a personalized model for each individual, weighting nearby observations more.

- Problem: curse of dimensionality

- We propose forest methods to determine which dimensions matter for the "nearby" metric, reducing the curse of dimensionality.

- Estimate the model for each point using "forest-based" weights: the fraction of trees in which an observation appears in the same leaf as the target

- We derive splitting rules optimized for the objective

- Computational trick:
  - Use an approximation to the gradient to construct pseudo-outcomes
  - Then apply a splitting rule inspired by regression trees to these pseudo-outcomes

Page 8

Related Work
(Semi-parametric) local maximum likelihood/GMM

- Local maximum likelihood (Hastie and Tibshirani, 1987) weights nearby observations; e.g. local linear regression. See Loader, C. (1999); also Hastie and Tibshirani (1990) on GAM.

- Lewbel (2006): asymptotic properties of kernel-based local GMM

- Other approaches include sieves; Chen (2007) reviews

Score-based test statistics for parameter heterogeneity

- Andrews (1993), Hansen (1992), and many others, e.g. structural breaks, using scores of estimating equations

- Zeileis et al (2008) apply this literature to split points, when estimating models in the leaves of a single tree.

Splitting rules

- CART: MSE of predictions for regression, Gini impurity for classification, survival (see Bou-Hamad et al (2011))

- Statistical tests, multiple testing corrections: Su et al (2009)

- Causal trees/forests: adaptive v. honest estimation (Athey and Imbens, 2016); propensity forests (Wager and Athey, 2015)

Page 9

The potential outcomes framework

For a set of i.i.d. subjects i = 1, ..., n, we observe a tuple (Xi, Yi, Wi), comprised of

- A feature vector Xi ∈ R^p,

- A response Yi ∈ R, and

- A treatment assignment Wi ∈ {0, 1}.

Following the potential outcomes framework (Holland, 1986, Imbens and Rubin, 2015, Rosenbaum and Rubin, 1983, Rubin, 1974), we posit the existence of quantities Yi(0) and Yi(1).

- These correspond to the response we would have measured given that the i-th subject received treatment (Wi = 1) or no treatment (Wi = 0).

Page 10

The potential outcomes framework

For a set of i.i.d. subjects i = 1, ..., n, we observe a tuple (Xi, Yi, Wi), comprised of

- A feature vector Xi ∈ R^p,

- A response Yi ∈ R, and

- A treatment assignment Wi ∈ {0, 1}.

The goal is to estimate the conditional average treatment effect

τ(x) = E[Y(1) − Y(0) | X = x].

NB: In experiments, we only get to see Yi = Yi(Wi).

Page 11

The potential outcomes framework

If we make no further assumptions, estimating τ(x) is not possible.

- We assume that we have measured enough features to achieve unconfoundedness (Rosenbaum and Rubin, 1983):

{Yi(0), Yi(1)} ⊥⊥ Wi | Xi.

- When this assumption holds, methods based on matching or propensity score estimation are usually consistent.

Page 12

Baseline method: k-NN matching

Consider the k-NN matching estimator for τ(x):

τ̂(x) = (1/k) Σ_{i ∈ S1(x)} Yi − (1/k) Σ_{i ∈ S0(x)} Yi,

where S1(x)/S0(x) is the set of the k nearest treated/control units to x. This is consistent given unconfoundedness and regularity conditions.

- Pro: Transparent asymptotics and good, robust performance when p is small.

- Con: Acute curse of dimensionality, even when p = 20 and n = 20k.

NB: Kernels have similar qualitative issues as k-NN.
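
To make the baseline concrete, here is a minimal base-R sketch of the k-NN matching estimator above (the function name and argument layout are illustrative, not from the talk):

```r
# tau_hat(x): mean outcome of the k nearest treated units minus the
# mean outcome of the k nearest control units, in Euclidean distance.
knn_match <- function(x, X, Y, W, k = 10) {
  d  <- sqrt(colSums((t(X) - x)^2))   # distances from x to each row of X
  s1 <- order(d[W == 1])[1:k]         # k nearest treated units
  s0 <- order(d[W == 0])[1:k]         # k nearest control units
  mean(Y[W == 1][s1]) - mean(Y[W == 0][s0])
}
```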

Page 13

Adaptive nearest neighbor matching

Random forests are a popular heuristic for adaptive nearest neighbors estimation introduced by Breiman (2001).

- Pro: Excellent empirical track record.

- Con: Often used as a black box, without statistical discussion.

There has been considerable interest in using forest-like methods for treatment effect estimation, but without formal theory.

- Green and Kern (2012) and Hill (2011) have considered using Bayesian forest algorithms (BART, Chipman et al., 2010).

- Several authors have also studied related tree-based methods: Athey and Imbens (2016), Su et al. (2009), Taddy et al. (2014), Wang and Rudin (2015), Zeileis et al. (2008), ...

Wager and Athey (2015) provide the first formal results allowing random forests to be used for provably valid asymptotic inference.

Page 14

Prior work on the statistics of random forests

- There has been considerable interest in characterizing regimes where random forests are consistent: Breiman et al. (1984), Denil et al. (2014), Lin and Jeon (2006), Meinshausen (2006), Scornet et al. (2015), ...

- Some have focused on simplified forest algorithms, in order to obtain qualitative insights about forests: Arlot and Genuer (2014), Biau (2012), Biau et al. (2008), Buhlmann and Yu (2002), Duroux and Scornet (2016), Geurts et al. (2006), ...

- There are some recent results on variance estimation for forests (Mentch and Hooker, 2016; Sexton and Laake, 2009), but they do not yield formally valid confidence intervals.

Page 15

Prior work on the statistics of random forests

- Mentch and Hooker, 2016: "Finally, recall that these confidence intervals are for the expected prediction θ_kn and not necessarily for the true value of the underlying regression function θ. If the tree building method employed is consistent so that θ_kn → θ, then as the sample size increases, the tree should be (on average) producing more accurate predictions, but in order to claim that our confidence intervals are asymptotically valid for θ we need for this convergence to occur at a rate of √n."

- But the flavor of forest studied in the paper does not converge at that rate; it is bias-dominated.
  - They stay closer to the typical random forest (no sample splitting)
  - The proof method requires small subsamples << √n, which limits the depth of trees and contributes to bias
  - They don't rule out super-efficient forests that have wildly inconsistent predictions (e.g. always predict 0)

So a different approach is needed.

Page 16

Making k-NN matching adaptive

Athey and Imbens (2016) introduce the causal tree: it defines neighborhoods for matching based on recursive partitioning (Breiman, Friedman, Olshen, and Stone, 1984), and they advocate sample splitting (with a modified splitting rule) to get assumption-free confidence intervals for treatment effects in each leaf.

[Figure: a Euclidean neighborhood, as used for k-NN matching, versus a tree-based neighborhood.]

Page 17

Valid Confidence Intervals

Athey and Imbens (2016) highlight the perils of adaptive estimation for confidence intervals: a tradeoff between MSE and coverage.

Page 18

Application: Treatment Effect Heterogeneity in Estimating Position Effects in Search

- Queries highly heterogeneous
  - Tens of millions of unique search phrases each month
  - Query mix changes month to month for a variety of reasons
  - Behavior conditional on query is fairly stable

- Desire for segments. Want to understand heterogeneity and make decisions based on it
  - "Tune" algorithms separately by segment
  - Want to predict outcomes if query mix changes
    - For example, bring on a new syndication partner with more queries of a certain type

Page 19

Relevance v. Position

Loss in CTR from Link Demotion (US All Non-Navigational)

                                 Control (1st Pos.)   (1,3)    (1,5)    (1,10)
  Original CTR of Position             25.4%           6.9%     4.0%     2.1%
  Gain from Increased Relevance                        4.9%     3.5%     1.7%
  Loss from Demotion                                  13.5%    17.9%    21.6%

Page 20

Search Experiment Tree: Effect of Demoting Top Link (Test Sample Effects)

- Some data excluded with probability p(x): proportions do not match population
- Highly navigational queries excluded

Page 21
Page 22

Use Test Sample for Segment Means & Std Errors to Avoid Bias

Variance of estimated treatment effects in the training sample is 2.5 times that in the test sample (adaptive estimates are biased).

        Honest Estimates                       Adaptive Estimates
Treat. Effect  Std. Err.  Proportion   Treat. Effect  Std. Err.  Proportion
   -0.124        0.004      0.202         -0.124        0.004      0.202
   -0.134        0.010      0.025         -0.135        0.010      0.024
   -0.010        0.004      0.013         -0.007        0.004      0.013
   -0.215        0.013      0.021         -0.247        0.013      0.022
   -0.145        0.003      0.305         -0.148        0.003      0.304
   -0.111        0.006      0.063         -0.110        0.006      0.064
   -0.230        0.028      0.004         -0.268        0.028      0.004
   -0.058        0.010      0.017         -0.032        0.010      0.017
   -0.087        0.031      0.003         -0.056        0.029      0.003
   -0.151        0.005      0.119         -0.169        0.005      0.119
   -0.174        0.024      0.005         -0.168        0.024      0.005
    0.026        0.127      0.000          0.286        0.124      0.000
   -0.030        0.026      0.002         -0.009        0.025      0.002
   -0.135        0.014      0.011         -0.114        0.015      0.010
   -0.159        0.055      0.001         -0.143        0.053      0.001
   -0.014        0.026      0.001          0.008        0.050      0.000
   -0.081        0.012      0.013         -0.050        0.012      0.013
   -0.045        0.023      0.001         -0.045        0.021      0.001
   -0.169        0.016      0.011         -0.200        0.016      0.011
   -0.207        0.030      0.003         -0.279        0.031      0.003
   -0.096        0.011      0.023         -0.083        0.011      0.022
   -0.096        0.005      0.069         -0.096        0.005      0.070
   -0.139        0.013      0.013         -0.159        0.013      0.013
   -0.131        0.006      0.078         -0.128        0.006      0.078

Page 23

From trees to random forests (Breiman, 2001)

Suppose we have a training set {(Xi, Yi, Wi)}_{i=1}^n, a test point x, and a tree predictor

τ̂(x) = T(x; {(Xi, Yi, Wi)}_{i=1}^n).

Random forest idea: build and average many different trees T*:

τ̂(x) = (1/B) Σ_{b=1}^B T*_b(x; {(Xi, Yi, Wi)}_{i=1}^n).

Page 24

From trees to random forests (Breiman, 2001)

Suppose we have a training set {(Xi, Yi, Wi)}_{i=1}^n, a test point x, and a tree predictor

τ̂(x) = T(x; {(Xi, Yi, Wi)}_{i=1}^n).

Random forest idea: build and average many different trees T*:

τ̂(x) = (1/B) Σ_{b=1}^B T*_b(x; {(Xi, Yi, Wi)}_{i=1}^n).

We turn T into T* by (see the sketch below):

- Bagging / subsampling the training set (Breiman, 1996); this helps smooth over discontinuities (Buhlmann and Yu, 2002).

- Selecting the splitting variable at each step from m out of p randomly drawn features (Amit and Geman, 1997).
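
For a regression forest, both randomizations correspond to standard arguments of the randomForest R package; a minimal sketch (the data-generating process here is illustrative, not from the talk):

```r
library(randomForest)

n <- 1000; p <- 10
X <- matrix(runif(n * p), n, p)
Y <- X[, 1] + rnorm(n)

# replace = FALSE with sampsize < n gives subsampling rather than bootstrap
# bagging; mtry = m draws m of the p features as split candidates.
rf <- randomForest(x = X, y = Y, ntree = 2000, mtry = 3,
                   replace = FALSE, sampsize = ceiling(n^0.7))
predict(rf, newdata = matrix(0.5, nrow = 1, ncol = p))
```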

Page 25

Statistical inference with regression forests

Honest trees do not use the same data to select the partition (splits) and make predictions. Ex: split-sample trees, propensity trees.

Theorem. (Wager and Athey, 2015) Regression forests are asymptotically Gaussian and centered,

(μ̂n(x) − μ(x)) / σn(x) ⇒ N(0, 1),   σn²(x) →p 0,

given the following assumptions (+ technical conditions):

1. Honesty. Individual trees are honest.

2. Subsampling. Individual trees are built on random subsamples of size s ≍ n^β, where β_min < β < 1.

3. Continuous features. The features Xi have a density that is bounded away from 0 and ∞.

4. Lipschitz response. The conditional mean function μ(x) = E[Y | X = x] is Lipschitz continuous.

Page 26

Variance estimation for regression forests

We estimate the variance of the regression forest using the infinitesimal jackknife for random forests (Wager, Hastie, and Efron, 2014). For each of the b = 1, ..., B trees comprising the forest, define

- The estimated response as μ̂*_b(x), and

- The number of times the i-th observation was used as N*_bi.

Then, defining Cov* as the covariance taken with respect to all the trees comprising the forest, we set

σ̂² = ((n − 1)/n) · (n/(n − s))² · Σ_{i=1}^n Cov*[μ̂*_b(x), N*_bi]².
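
A direct base-R transcription of this formula (argument names are illustrative): mu_b collects each tree's prediction at x, and N records how often each observation entered each tree.

```r
# Infinitesimal jackknife variance estimate at a single point x.
#   mu_b : length-B vector of tree predictions at x
#   N    : B-by-n matrix, N[b, i] = times observation i was used in tree b
#   s    : subsample size used to grow each tree
ij_variance <- function(mu_b, N, s) {
  n    <- ncol(N)
  covs <- cov(N, mu_b)                       # Cov*[mu_b(x), N_bi], i = 1..n
  (n - 1) / n * (n / (n - s))^2 * sum(covs^2)
}
# A pointwise 95% CI is then mu_hat(x) +/- 1.96 * sqrt(ij_variance(mu_b, N, s)).
```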

Page 27

Causal forest example

We have n = 20k observations whose features are distributed as X ∼ U([−1, 1]^p) with p = 6; treatment assignment is random. All the signal is concentrated along two features.

The plots below depict τ̂(x) for 10k random test examples, projected into the 2 signal dimensions.

[Figure: three panels over (x1, x2): true effect τ(x), causal forest estimate, k-NN estimate.]


Software: causalTree for R (Athey, Kong, and Wager, 2015), available on GitHub: susanathey/causalTree
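
For reference, a sketch of fitting one honest causal tree with that package, following the style of its README (a data frame df with outcome y, treatment w, and covariates x1–x4 is assumed here, and argument names may have changed across package versions):

```r
# devtools::install_github("susanathey/causalTree")
library(causalTree)

fit <- causalTree(y ~ x1 + x2 + x3 + x4, data = df, treatment = df$w,
                  split.Rule = "CT",     # causal-tree splitting criterion
                  cv.option = "CT",      # matching cross-validation criterion
                  split.Honest = TRUE, cv.Honest = TRUE,
                  xval = 5, cp = 0, minsize = 20)

# causalTree objects inherit from rpart, so the usual pruning applies.
opcp   <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = opcp)
```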

Page 28

Causal forest example

We have n = 20k observations whose features are distributed as X ∼ U([−1, 1]^p) with p = 20; treatment assignment is random. All the signal is concentrated along two features.

The plots below depict τ̂(x) for 10k random test examples, projected into the 2 signal dimensions.

[Figure: three panels over (x1, x2): true effect τ(x), causal forest estimate, k-NN estimate.]


Software: causalTree for R (Athey, Kong, and Wager, 2015), available on GitHub: susanathey/causalTree

Page 29

Causal forest example

The causal forest dominates k-NN in terms of both bias and variance. With p = 20, the relative mean-squared error (MSE) for τ is

MSE for k-NN (tuned on test set) / MSE for forest (heuristically tuned) = 19.2.

[Figure: estimated vs. true treatment effect, one panel for the causal forest and one for the k-NN estimate.]


For p = 6, the corresponding MSE ratio for τ is 2.2.

Page 30

Application: General Social Survey

The General Social Survey is an extensive survey, collected since 1972, that seeks to measure demographics, political views, social attitudes, etc. of the U.S. population.

Of particular interest to us is a randomized experiment, for which we have data between 1986 and 2010.

- Question A: Are we spending too much, too little, or about the right amount on welfare?

- Question B: Are we spending too much, too little, or about the right amount on assistance to the poor?

Treatment effect: how much less likely people are to answer "too much" to question B than to question A.

- We want to understand how the treatment effect depends on covariates: political views, income, age, hours worked, ...

NB: This dataset has also been analyzed by Green and Kern (2012) using Bayesian additive regression trees (Chipman, George, and McCulloch, 2010).

Page 31

Application: General Social Survey

A causal forest analysis uncovers strong treatment heterogeneity (n = 28,686, p = 12).

[Figure: estimated CATE (roughly 0.25 to 0.40) plotted over income (less to more, x-axis) and political views (more liberal to more conservative, y-axis).]

Page 32

Bigger picture: Solving estimating equations with random forests

We have i = 1, . . . , n i.i.d. samples, each of which has an observable quantity Oi, and a set of auxiliary covariates Xi.

Examples:

- Non-parametric regression: Oi = {Yi}.
- Treatment effect estimation: Oi = {Yi, Wi}.
- Instrumental variables regression: Oi = {Yi, Wi, Zi}.

Our parameter of interest, θ(x), is characterized by an estimating equation:

E[ψ_{θ(x), ν(x)}(Oi) | Xi = x] = 0 for all x ∈ X,

where ν(x) is an optional nuisance parameter.

Page 33

The GMM Setup: Examples

Our parameter of interest, θ(x), is characterized by

E[ψ_{θ(x), ν(x)}(Oi) | Xi = x] = 0 for all x ∈ X,

where ν(x) is an optional nuisance parameter.

- Quantile regression, where θ(x) = F_x^{−1}(q) for q ∈ (0, 1):

ψ_{θ(x)}(Yi) = q · 1({Yi > θ(x)}) − (1 − q) · 1({Yi ≤ θ(x)})

- IV regression, with treatment assignment W and instrument Z. We care about the treatment effect τ(x):

ψ_{τ(x), μ(x)} = ( Zi (Yi − Wi τ(x) − μ(x)),  Yi − Wi τ(x) − μ(x) ).
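
Written out as R functions, the two moment functions above look as follows (a literal per-observation transcription; names are illustrative):

```r
# Quantile regression score: E[psi | X = x] = 0 pins down the q-th quantile.
psi_quantile <- function(y, theta, q) {
  q * (y > theta) - (1 - q) * (y <= theta)
}

# IV regression score in (tau, mu), for a single observation:
# instrument moment and intercept moment.
psi_iv <- function(y, w, z, tau, mu) {
  r <- y - w * tau - mu                # residual of the structural equation
  c(z * r, r)
}
```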

Page 34

Solving heterogeneous estimating equations

The classical approach is to rely on local solutions (Fan and Gijbels, 1996; Hastie and Tibshirani, 1990; Loader, 1999):

Σ_{i=1}^n α(x; Xi) ψ_{θ(x), ν(x)}(Oi) = 0,

where the weights α(x; Xi) are obtained from, e.g., a kernel.

We use random forests to get good data-adaptive weights. This has the potential to help mitigate the curse of dimensionality.

- Building many trees with small leaves, then solving the estimating equation in each leaf, and finally averaging the results is a bad idea. Quantile and IV regression are badly biased in very small samples.

- Using the RF as an "adaptive kernel" protects against this effect.
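
In the quantile case, the weighted estimating equation above has a closed-form solution: θ̂(x) is where the α-weighted empirical CDF of Y first crosses q. A minimal base-R sketch (function name illustrative):

```r
# Solve sum_i alpha(x; X_i) * psi_theta(Y_i) = 0 for the quantile score:
# return the smallest y at which the weighted CDF reaches q.
forest_weighted_quantile <- function(y, alpha, q) {
  o  <- order(y)
  cw <- cumsum(alpha[o]) / sum(alpha)  # weighted empirical CDF of Y
  y[o][which(cw >= q)[1]]
}
```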

Page 35

The random forest kernel

[Figure: many tree partitions combine into a forest-weighted neighborhood around the target point.]

Forests induce a kernel via averaging tree-based neighborhoods. This idea was used by Meinshausen (2006) for quantile regression.

Page 36

Solving estimating equations with random forests

We want to use an estimator of the form

Σ_{i=1}^n α(x; Xi) ψ_{θ(x), ν(x)}(Oi) = 0,

where the weights α(x; Xi) are from a random forest.

Key Challenges:

- How do we grow trees that yield an expressive yet stable neighborhood function α(·; Xi)?

- We do not have access to a "prediction error" for θ̂(x), so how should we optimize splitting?

- How should we account for nuisance parameters?

- Split evaluation rules need to be computationally efficient, as they will be run many times for each split in each tree.

Page 37

Step #1: Conceptual motivation

Following CART (Breiman et al., 1984), we use greedy splits. Each split directly seeks to improve the fit as much as possible.

- For regression trees, in large samples, the best split is the one which increases the heterogeneity of the predictions the most.

- The same fact also holds locally for estimating equations.

We split a parent node P into two children C1 and C2. In large samples and with no computational constraints, we would like to maximize

∆(C1, C2) = n_C1 n_C2 (θ̂_C1 − θ̂_C2)²,

where θ̂_C1, θ̂_C2 solve the estimating equation in the children.

Page 38

Step #2: Practical realization

Computationally, solving the estimating equation in each possible child to get θ̂_C1 and θ̂_C2 can be prohibitively expensive.

To avoid this problem, we use a gradient-based approximation. The same idea underlies gradient boosting (Friedman, 2001).

θ̂_C ≈ θ̃_C := θ̂_P − (1/|{i : Xi ∈ C}|) Σ_{i: Xi ∈ C} ξᵀ A_P^{−1} ψ_{θ̂_P, ν̂_P}(Oi),

A_P = (1/|{i : Xi ∈ P}|) Σ_{i: Xi ∈ P} ∇ψ_{θ̂_P, ν̂_P}(Oi),

where θ̂_P and ν̂_P are obtained by solving the estimating equation once in the parent node, and ξ is a vector that picks out the θ-coordinate from the (θ, ν) vector.

Page 39

Step #2: Practical realization

In practice, this idea leads to a split-relabel algorithm:

1. Relabel step: Start by computing pseudo-outcomes

Ỹi = −ξᵀ A_P^{−1} ψ_{θ̂_P, ν̂_P}(Oi) ∈ R.

2. Split step: Apply a CART-style regression split to the Ỹi.

This procedure has several advantages, including the following (see the sketch below):

- Computationally, the most demanding part of growing a tree is scanning over all possible splits. Here, we reduce to a regression split that can be efficiently implemented.

- Statistically, we only have to solve the estimating equation once. This reduces the risk of hitting a numerically unstable leaf, which can be a risk with methods like IV.

- From an engineering perspective, we can write a single, optimized split-step algorithm, and then use it everywhere.
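
The relabel step itself is a few lines of linear algebra; a generic base-R sketch (names illustrative): Psi holds the scores ψ_{θ̂_P, ν̂_P}(Oi) for the parent's observations, A_P the average Jacobian, and xi the selector for the θ-coordinate.

```r
# Pseudo-outcomes Ytilde_i = -xi' A_P^{-1} psi_i for all i in the parent.
#   Psi : n_P-by-k matrix of scores, one row per observation
#   A_P : k-by-k average Jacobian of the score in the parent
#   xi  : length-k selector vector for the theta-coordinate
relabel <- function(Psi, A_P, xi) {
  -as.vector(Psi %*% solve(t(A_P), xi))
}
# The split step then applies any fast CART regression split to relabel(...).
```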

Page 40

Step #3: Variance correction

Conceptually, we saw that in large samples we want splits that maximize the heterogeneity of the θ̂(Xi). In small samples, we need to account for sampling variance.

We need to penalize for the following two sources of variance.

- Our plug-in estimates for the heterogeneity of θ̂(Xi) will be overly optimistic about the large-sample parameter heterogeneity. We need to correct for this kind of over-fitting.

- We anticipate "honest" estimation, and want to avoid leaves where the estimating equation is unstable. For example, with IV regression, we want to avoid leaves with an unusually weak first-stage coefficient.

This is a generalization of the analysis of Athey and Imbens (2016) for treatment effect estimation.

Page 41

Simulation example: Quantile regression

In quantile regression, we want to estimate the q-th quantile of the conditional distribution of Y given X, namely θ(x) = F_x^{−1}(q).

- Meinshausen (2006) used the random forest kernel for quantile regression. However, he used standard CART regression splitting instead of a tailored splitting rule.

- In our split-relabel paradigm, quantile splits reduce to classification splits (θ̂_P is the q-th quantile of the parent; see the sketch below):

Ỹi = 1({Yi > θ̂_P}).

- To estimate many quantiles, we do multi-class classification.
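
A sketch of one quantile relabel-and-split step, in base R plus rpart (y_parent and the covariate data frame x_parent are assumed to hold the observations in the current node):

```r
library(rpart)

q <- 0.5
theta_P <- quantile(y_parent, q)            # parent's q-th quantile
ytilde  <- factor(y_parent > theta_P)       # pseudo-outcome = exceedance flag

# A quantile split is now an ordinary classification split on ytilde.
split_fit <- rpart(ytilde ~ ., data = x_parent, method = "class",
                   maxdepth = 1)            # a single CART split
```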

Page 42

Simulation example: Quantile regression

[Figure: two panels of Y vs. X ("Case 1: Mean Shift" and "Case 2: Scale Shift"), each showing the truth, quantregForest, and the split-relabel method.]

The above examples show quantile estimates at q = 0.1, 0.5, 0.9, on Gaussian data with n = 2,000 and p = 40. The package quantregForest implements the method of Meinshausen (2006).

Page 43

Simulation example: Instrumental variables

We want to estimate heterogeneous treatment effects with endogenous treatment assignment: Yi is the outcome, Wi is the treatment assignment, and Zi is an instrument satisfying:

{Yi(w)}_{w ∈ W} ⊥⊥ Zi | Xi.

- Our split-relabel formalism tells us to use pseudo-outcomes (see the sketch below)

Ỹi = (Zi − Z̄_P) ((Yi − Ȳ_P) − τ̂_P (Wi − W̄_P)),

where τ̂_P is the IV solution in the parent, and Ȳ_P, W̄_P, Z̄_P are averages over the parent.

- This is just IV regression residuals projected onto the instruments.
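
Concretely, the IV relabel step reduces to a few vectorized lines; a base-R sketch over the observations in a parent node (function name illustrative):

```r
# Pseudo-outcomes for an IV split: the IV residual, centered within the
# parent, projected onto the centered instrument.
iv_relabel <- function(y, w, z) {
  tau_P <- cov(z, y) / cov(z, w)            # IV solution in the parent
  (z - mean(z)) * ((y - mean(y)) - tau_P * (w - mean(w)))
}
```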

Page 44

Simulation example: Instrumental variables

Using IV forests is important

[Figure: τ vs. X, comparing the true treatment effect, the raw correlation, a causal forest, and an IV forest.]

We have spurious correlations:

- OLS of Y on W given X has two jumps, at X1 = −1/3 and at X1 = 1/3.

- The causal effect τ(X) only has a jump at X1 = −1/3.

- n = 10,000, p = 20.

The response function is

Yi = (2Wi − 1) · 1({X1,i > −1/3}) + (3Ai − 1.5) · 1({X1,i > 1/3}) + 2εi.

Ai is correlated with Wi.

Page 45

Simulation example: Instrumental variables

Using IV splits is important

[Figure: τ vs. X, comparing the true treatment effect, an IV forest with causal splits, and an IV forest with IV splits.]

We have useless correlations:

- The joint distribution of (Wi, Yi) is independent of the covariates Xi.

- But: the causal effect τ(X) has a jump at X1 = 0.

- n = 5,000, p = 20.

The response function is

Yi = 2 · 1({X1,i ≤ 0}) Ai + 1({X1,i > 0}) Wi + (1 + 0.73 · 1({X1,i > 0})) εi.

Ai is correlated with Wi.

Page 46

Empirical Application: Family Size

Angrist and Evans (1998) study the effect of family size on women's labor market outcomes. Understanding heterogeneity can guide policy.

- Outcomes: participation, female income, hours worked, etc.

- Treatment: more than two kids

- Instrument: first two kids same sex

- First-stage effect of same sex on more than two kids: .06

- Reduced-form effect of same sex on probability of work, income: .008, $132

- LATE estimates of the effect of kids on probability of work, income: .133, $2200

Page 47

Treatment Effects

Effect on Participation / Baseline Probability of Working

[Figure: two panels vs. Father's Income [$1k/year], with curves for mothers aged 18, 20, 22, and 24 years. Left: CATE on participation (about 0.00 to 0.20). Right: baseline probability of working (about 0.2 to 0.7).]

Page 48

Treatment Effects

Effect on Participation / Effect relative to Baseline

[Figure: two panels vs. Father's Income [$1k/year], with curves for mothers aged 18, 20, 22, and 24 years. Left: CATE (0.00 to 0.20). Right: CATE / (Probability of Working) (0.0 to 0.4).]

Page 49

Treatment Effects

Effect on Earnings / Baseline Earnings

[Figure: two panels vs. Father's Income [$1,000/year], with curves for mothers aged 18, 20, 22, and 24 years. Left: CATE on earnings ($500 to $1,500 per year). Right: mother's baseline income ($4,000 to $8,000 per year).]

Page 50

Alternative: Two Random Forests

Our approach can be compared to building two honest random forests and comparing the ratios; one can also use all partitions from both forests to ensure the same weighting for both outcomes. Ongoing work: compare alternative approaches.