causal inference msm

Post on 27-Nov-2014


Causal Inference: Marginal Structural Models

Rhoderick Machekano

Center for Health Care Research and Policy
Department of Medicine

Case Western Reserve University
Cleveland, OH 44109

rhoderick.machekano@case.edu

March, 2010

Association

We are interested in understanding why values of an outcome variable Y vary over the units in a population.
Y is the response variable, i.e. the variable to be explained.
In associational inference, we are satisfied with discovering how the values of Y are associated with the values of the variables defined on the units (attributes A) in our population.
The conditional distribution characterizes how the values of Y change as A varies, e.g. E(Y | A = a) = β0 + β1 a.
The parameter β1 is the associational parameter.
Associational inference consists of estimating and testing the parameters β using observed data on Y and A.

R. Machekano ( Center for Health Care Research and Policy Department of Medicine Case Western Reserve University Cleveland, OH 44109 rhoderick.machekano@case.edu )Causal Inference March 2010 2 / 27

Causal Inference

Causal inference addresses comparisons of different treatments if applied to the same units.
Rubin’s causal model posits the existence of potential outcomes for each unit.
Causal inference (CI) is a prediction problem: what would have happened under different treatment options?

The Potential Outcomes Framework

Describes the responses that would have been observed had an individual/unit been subjected to each possible treatment.
Causal effect with a binary treatment: y1i − y0i

It is critical that each unit be potentially exposable to any of the exposures.
Example: the method of instruction a student receives can be a cause of the student’s performance on a test, but the student’s race or gender cannot be a cause.
We observe only the one potential outcome corresponding to the treatment received; the other potential outcome(s) are missing. This is the Fundamental Problem of Causal Inference.
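The missing-data structure described here can be sketched in R with made-up numbers. The variable names, sample size and outcome model below are assumptions for illustration only and do not come from the slides:

```r
# Hypothetical illustration: simulate both potential outcomes for each unit,
# then hide the one that does not correspond to the treatment received.
set.seed(1)
n  <- 8
y0 <- round(rnorm(n, mean = 100, sd = 10))    # outcome under control (assumed model)
y1 <- y0 + round(rnorm(n, mean = 5, sd = 3))  # outcome under treatment (assumed model)
treat <- rbinom(n, size = 1, prob = 0.5)      # treatment actually received

# Complete data: only available in a simulation.
complete <- data.frame(unit = 1:n, treat, y0, y1, effect = y1 - y0)

# Observed data: one potential outcome per unit, the other is NA.
observed <- data.frame(
  unit  = 1:n,
  treat = treat,
  y0    = ifelse(treat == 0, y0, NA),
  y1    = ifelse(treat == 1, y1, NA)
)
observed$effect <- observed$y1 - observed$y0  # NA for every unit

print(complete)
print(observed)
```

Every entry of `observed$effect` is `NA`, which is the fundamental problem in miniature: no unit-level effect can be computed from the observed data alone.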

Fundamental Problem of Causal Inference¹

Hypothetical complete data

Unit   Pre-treatment   Treatment    Potential      Treatment
i      inputs          indicator    outcomes       effect
       Xi              Ti           y0i    y1i     y1i − y0i
1      1     50        0            69     75        6
2      1     98        0            111    108      -3
3      2     80        1            92     102      10
4      1     98        1            112    111      -1
...
100    1     104       1            111    114       3

Observed data

Unit   Pre-treatment   Treatment    Potential      Treatment
i      inputs          indicator    outcomes       effect
       Xi              Ti           y0i    y1i     y1i − y0i
1      1     50        0            69     ?         ?
2      1     98        0            111    ?         ?
3      2     80        1            ?      102       ?
4      1     98        1            ?      111       ?
...
100    1     104       1            ?      114       ?

Effect of mathematics program of study on fourth graders

We wish to compare a novel program of study to a standard program of study for fourth graders. The outcome is a score on a test at the end of the year.

¹ Source: Gelman & Hill

Approaches to the fundamental problem of causal inference

Is causal inference impossible?
Two solutions to the fundamental problem:

1. Scientific solution
2. Statistical solution

Scientific solution

Exploits homogeneity and invariance assumptions to find close substitutes.
Assuming all else remains the same, a measurement before treatment can be substituted for the other potential outcome.
Examples include studies in animals (rats), the physical sciences, etc.

Statistical Solution

The average causal effect T is the expected value of the difference Y1i − Y0i over the units in the population (this T is a population quantity, distinct from the treatment indicator T that appears in the conditioning events below).
T = E(Y1) − E(Y0) implies that information on different units that can be observed can be used to gain knowledge about T.
The observed data only give us E(YT | T = 1) = E(Y1 | T = 1) and E(YT | T = 0) = E(Y0 | T = 0).

The way units come to be selected into the different exposures is very important.
We want to compare outcomes from similar units, which is what randomization provides.
The goal is to have treatment assignment T independent of the potential outcomes and all other variables.
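Why independence is enough can be written out in a few lines. This is a sketch under the assumption stated above, that assignment T is independent of the potential outcomes (Y_0, Y_1). To avoid the clash of symbols, the average causal effect is written \tau here, which is notation introduced for this sketch only:

```latex
\begin{align*}
E(Y \mid T = 1) - E(Y \mid T = 0)
  &= E(Y_1 \mid T = 1) - E(Y_0 \mid T = 0)
     && \text{(the observed outcome is the potential outcome received)} \\
  &= E(Y_1) - E(Y_0)
     && \text{(since } (Y_0, Y_1) \perp T\text{)} \\
  &= \tau .
\end{align*}
```

Without independence the second equality can fail, and the difference in observed group means need not equal the average causal effect.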

Observational Studies

[Figure: diagram titled "Non-randomized T" with nodes X, T and Y]

Not all studies can be randomized.
Units often end up treated or not based on characteristics that are predictive of the outcome, which produces systematic differences between the groups.
The solution is statistical adjustment.

Causal Inference in Observational Studies

Assume conditional independence: conditional on the confounding covariates, the distribution of treatments across units is random with respect to the potential outcomes.
The distribution of the potential outcomes (y0, y1) is the same across treatment levels after conditioning on the confounding covariates X, i.e. y0, y1 ⊥ T | X (ignorability).
If the choice of treatment is based on covariates that are predictive of the outcome but are not measured, then we have a non-ignorable treatment mechanism.
With ignorability satisfied, we can use regression modeling that adjusts for the confounders.
Imbalance and lack of overlap create problems.
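What ignorability buys can be sketched as follows. This is a standard identification argument rather than something taken from the slides, and it additionally assumes overlap, i.e. 0 < pr(T = 1 | X) < 1, so that every conditional mean below is defined:

```latex
\begin{align*}
E(Y_1) &= E_X\{ E(Y_1 \mid X) \}
          && \text{(iterated expectations)} \\
       &= E_X\{ E(Y_1 \mid T = 1, X) \}
          && \text{(ignorability: } Y_1 \perp T \mid X\text{)} \\
       &= E_X\{ E(Y \mid T = 1, X) \}
          && \text{(for treated units the observed } Y \text{ is } Y_1\text{)}
\end{align*}
```

The same steps with T = 0 give E(Y_0), so E(Y_1) − E(Y_0) is a function of observable quantities. A regression of Y on T and X is one way to estimate the inner conditional means; with imbalance or poor overlap the model has to extrapolate into regions of X where only one treatment group is observed.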

Propensity scores as a solution to observational studies

Propensity scores are a tool to achieve balance and overlap through matching.
The inverse of the propensity score can be used as a weight for each data point.

Marginal Structural Models for point treatment studies

If we have a binary treatment (exposed vs. unexposed), the causal effect is given by E(Y1) − E(Y0).

When treatment assignment is associated with prognostic factors X, subjects who are exposed are a selective subgroup, and the sample average of their outcomes may systematically over- or underestimate the population mean counterfactual.
When there are no unmeasured confounders, this selection bias can be corrected by weighting each unit’s data by the inverse of the propensity score πi.

\[
E(Y_t) = \frac{\sum_{i=1}^{n} I(T_i = t)\, y_i / \pi_{ti}}{\sum_{i=1}^{n} I(T_i = t) / \pi_{ti}}
\]

We can model the counterfactual outcome: E(Yt) = β0 + β1t
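As a rough sketch, the weighted-mean formula for E(Yt) can be coded directly. The function name `ipw_mean` and its arguments are hypothetical, and `p_t` is assumed to hold each unit's probability of receiving level `t` given its covariates (the quantity written πti in the formula):

```r
# Normalized IPW estimate of E(Y_t): a weighted mean of the outcomes of
# units with T_i = t, using weights 1 / pi_ti.
# y: observed outcomes; treat: treatment received; t: level of interest;
# p_t: probability of receiving level t given covariates, one per unit.
ipw_mean <- function(y, treat, t, p_t) {
  ind <- as.numeric(treat == t)   # I(T_i = t)
  sum(ind * y / p_t) / sum(ind / p_t)
}

# Tiny made-up data set with one binary covariate x.
x     <- c(0, 0, 0, 0, 1, 1, 1, 1)
treat <- c(1, 0, 0, 0, 1, 1, 1, 0)
y     <- c(2, 1, 1, 1, 4, 4, 4, 3)
p1    <- ave(treat, x)            # empirical pr(T = 1 | x) within each stratum

ey1 <- ipw_mean(y, treat, t = 1, p_t = p1)
ey0 <- ipw_mean(y, treat, t = 0, p_t = 1 - p1)
c(EY1 = ey1, EY0 = ey0, effect = ey1 - ey0)
```

The coefficient β1 in the model E(Yt) = β0 + β1t for a binary t is the difference between these two weighted means.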

Inverse Probability Weighting

1. Calculate weights wi = 1 / pr(Ti = t | Xi)

2. Perform a weighted regression of the outcome Y on the exposure variable T

3. Why does this work?
   Creates a pseudo-population consisting of wi copies of unit i.
   In the pseudo-population, treatment T is unconfounded.
   The distribution of the counterfactuals in the pseudo-population is the same as in the study population.
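One way to see why the weighting works is the following expectation calculation. It is a sketch that assumes ignorability (Y_t ⊥ T | X) and positivity (\pi_t(X) = pr(T = t | X) > 0), with the propensity written as a function of X for this sketch only:

```latex
\begin{align*}
E\!\left[ \frac{I(T = t)\, Y}{\pi_t(X)} \right]
  &= E\!\left[ \frac{I(T = t)\, Y_t}{\pi_t(X)} \right]
     && \text{(on the event } T = t,\ Y = Y_t\text{)} \\
  &= E\!\left[ \frac{E\{ I(T = t) \mid X \}\, E(Y_t \mid X)}{\pi_t(X)} \right]
     && \text{(iterated expectations and } Y_t \perp T \mid X\text{)} \\
  &= E\{ E(Y_t \mid X) \} = E(Y_t)
     && \text{(because } E\{ I(T = t) \mid X \} = \pi_t(X)\text{)}
\end{align*}
```

Replacing Y by the constant 1 in the same calculation shows that the weighted indicators have expectation 1, which is why dividing by the sum of the weights yields a weighted mean.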

Toy Example of Effect of Weighting

Xi       A     A     A     B     B     B     C     C     C
Y1i      1     1     1     2     2     2     3     3     3
Ti       1     0     0     1     1     1     1     1     0
πi       1/3   1/3   1/3   1     1     1     2/3   2/3   2/3
Ti/πi    3     0     0     1     1     1     1.5   1.5   0

True exposure counterfactual mean E(Y1) = 2

Mean among exposed E(Y | T = 1) = 13/6

IPW mean = (3×1 + 0×1 + 0×1 + 1×2 + 1×2 + 1×2 + 1.5×3 + 1.5×3 + 0×3) / 9 = 2
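The arithmetic of the toy example can be checked with a few lines of R. The variable names are chosen for this sketch; the data values are those of the table, with the propensity taken as the proportion exposed within each stratum of X:

```r
# Toy example: nine units in three strata of X.
x     <- rep(c("A", "B", "C"), each = 3)
y1    <- rep(c(1, 2, 3), each = 3)        # potential outcome under exposure
treat <- c(1, 0, 0, 1, 1, 1, 1, 1, 0)     # exposure actually received

# Propensity score: proportion exposed within each stratum of X.
p <- ave(treat, x)                        # 1/3, 1 and 2/3 by stratum
w <- treat / p                            # T_i / pi_i; zero for unexposed units

true_mean    <- mean(y1)                  # uses every unit's Y1 (unobservable in practice)
exposed_mean <- mean(y1[treat == 1])      # naive mean among the exposed
ipw_est      <- sum(w * y1) / length(y1)  # only exposed units contribute

c(true = true_mean, exposed = exposed_mean, ipw = ipw_est)
```

Because the weights of the unexposed units are zero, the IPW line uses only outcomes that would actually be observed, yet it recovers the mean over all nine units.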

Example: Complete Data

> head(simdat)
       covar       y1        y0 treat  outcome
1 0.08317643 1.650394 0.8060006     1 1.650394
2 0.82471339 3.546720 2.5126394     0 2.512639
3 0.98284952 4.650695 2.2191206     1 4.650695
4 0.91146710 3.433277 2.1412135     0 2.141214
5 0.81274144 3.584094 2.4440184     0 2.444018
6 0.71152091 4.019010 2.6391156     1 4.019010

> tail(simdat)
        covar       y1       y0 treat  outcome
45 0.16681624 2.558554 1.155048     1 2.558554
46 0.30581451 2.131925 1.416699     1 2.131925
47 0.49911482 2.943167 2.886796     1 2.943167
48 0.24429569 2.524773 1.514149     1 2.524773
49 0.08419395 1.004389 1.284473     1 1.004389
50 0.65493092 3.387608 2.819581     0 2.819581

Treatment Mechanism

\[ \Pr(T = 1 \mid w) = \frac{1}{1 + \exp\{-(1 - 2w)\}} \]
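In R this mechanism is the inverse logit of 1 − 2w, available as `plogis()`. The sketch below draws treatments from it. The seed, the sample size and the uniform distribution for w are assumptions, since the slides do not say how the covariate was generated:

```r
set.seed(2010)
n  <- 50
w  <- runif(n)                    # assumed covariate distribution on (0, 1)
pa <- plogis(1 - 2 * w)           # pr(T = 1 | w) = 1 / (1 + exp(-(1 - 2w)))
A  <- rbinom(n, size = 1, prob = pa)

# The propensity falls from plogis(1) at w = 0 to plogis(-1) at w = 1,
# so units with small w are more likely to be treated.
range(pa)
tapply(A, w > 0.5, mean)          # share treated for w <= 0.5 and for w > 0.5
```

Because the covariate drives treatment, any outcome that also depends on w will be confounded in a simple comparison of treated and untreated units.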

Estimation

Causal effect from the potential outcomes

> mean(y1)-mean(y0)
[1] 1.103484

Unadjusted estimate

> mean(Y[A==1])-mean(Y[A==0])
[1] 0.6900297

Adjusted regression

> display(lm(Y~A+w))   # display() is from the arm package
lm(formula = Y ~ A + w)
            coef.est coef.se
(Intercept) 0.81     0.25
A           1.24     0.19
w           2.11     0.32
---
n = 50, k = 3
residual sd = 0.59, R-Squared = 0.56

Inverse Probability of Treatment Weighted Estimator

We know the true propensity of treatment, i.e. the probability of receiving treatment or control given the covariates.

Calculate the weights (pa here denotes the known probability pr(A = 1 | w) from the treatment mechanism)

wt=ifelse(A==1,1/pa,1/(1-pa))

Perform a weighted regression of the outcome on exposure

> display(lm(Y~A, weights=wt))
lm(formula = Y ~ A, weights = wt)
            coef.est coef.se
(Intercept) 2.11     0.19
A           1.01     0.25
---
n = 50, k = 2
residual sd = 1.18, R-Squared = 0.26
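The whole comparison can be reproduced in outline with simulated data. The sketch below reuses the treatment mechanism pr(T = 1 | w) from the slides, but the outcome model, the seed and the sample size are assumptions made for illustration, so the numbers will not match the output shown in the slides:

```r
set.seed(42)
n  <- 5000
w  <- runif(n)                          # confounder (assumed uniform)
pa <- plogis(1 - 2 * w)                 # true propensity, as in the slides
A  <- rbinom(n, 1, pa)                  # treatment received

# Assumed outcome model: w raises both potential outcomes; true effect is 1.
y0 <- 1 + 2 * w + rnorm(n, sd = 0.5)
y1 <- y0 + 1
Y  <- ifelse(A == 1, y1, y0)            # observed outcome

truth      <- mean(y1) - mean(y0)
unadjusted <- mean(Y[A == 1]) - mean(Y[A == 0])
adjusted   <- unname(coef(lm(Y ~ A + w))["A"])

wt  <- ifelse(A == 1, 1 / pa, 1 / (1 - pa))
ipw <- unname(coef(lm(Y ~ A, weights = wt))["A"])

round(c(truth = truth, unadjusted = unadjusted,
        adjusted = adjusted, ipw = ipw), 3)
```

Under this assumed model the unadjusted difference is biased downward, because treated units tend to have smaller w and w raises the outcome, while the adjusted and weighted estimates should land near the true effect. Note that the standard errors reported by `lm()` with weights do not account for the weighting and are not a reliable guide to the precision of the IPW estimate.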

Application: Effect of Job Training Program on earnings

> head(nws)
  age educ black married nodegree      re75       re78 hisp treat educ_cat4
1  42   16     0       1        0     0.000   100.4854    0     0         4
2  20   13     0       0        0  3317.468  4793.7451    0     0         3
3  37   12     0       1        0 22781.855 25564.6699    0     0         2
4  48   12     0       1        0 20839.355 20550.7441    0     0         2
5  51   12     0       1        0 21575.178 22783.5879    0     0         2
6  18   11     0       0        1  1455.532  2157.4807    0     0         1

Examine Exposure-Covariate Association

Covariate     Exposed   Unexposed   p-value
Age           25.8      33.4        < 0.001
Black         84%       9.7%        < 0.001
Hispanic      5.6%      6.7%        0.694
Married       19%       73%         < 0.001
Non-degreed   71%       30%         < 0.001
Education     10        12          < 0.001
1975 Salary   1532      14380       < 0.001

There are systematic differences between subjects exposed to the job training program and those not exposed.

Naive Estimates - using regression methods

> nws.unadjusted=glm(re78~treat)
> display(nws.unadjusted)
glm(formula = re78 ~ treat)
            coef.est coef.se
(Intercept) 15750.30    79.84
treat       -9401.16   801.98
---
n = 18667, k = 2
residual deviance = 2.198893e+12, null deviance = 2.215081e+12 (difference = 16188578920.9)
overdispersion parameter = 117808349.3
residual sd is sqrt(overdispersion) = 10853.96

> mean(re78[treat==1])-mean(re78[treat==0])
[1] -9401.156

Adjusting for pre-treatment covariates

> nws.adjusted=glm(re78~treat+age+educ+married+black+hisp+nodegree+re75+re74)
> display(nws.adjusted)
glm(formula = re78 ~ treat + age + educ + married + black + hisp +
    nodegree + re75 + re74)
            coef.est coef.se
(Intercept) 4427.01  449.18
treat        640.50  583.24
age         -110.15    5.88
educ         247.02   28.80
married      180.24  144.33
black       -387.15  190.58
hisp         -50.18  228.22
nodegree     645.35  179.45
re75           0.51    0.01
re74           0.30    0.01
---
n = 18667, k = 10
residual deviance = 1.074142e+12, null deviance = 2.215081e+12 (difference = 1.140939e+12)
overdispersion parameter = 57573147.6
residual sd is sqrt(overdispersion) = 7587.70

Inverse Weighted Estimator

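# Propensity model: logistic regression of treatment on pre-treatment covariates
propensity.model = glm(treat~age+educ+married+black+hisp+nodegree+re75+re74, family="binomial")

# Estimated propensity score pr(treat = 1 | covariates) for each subject
phat = predict(propensity.model, type="response")

# Unstabilized weights: inverse probability of the treatment actually received
wt = ifelse(treat==1,1/phat,1/(1-phat))

# Stabilized weights: numerator is the marginal probability of the treatment received
st.wt = ifelse(treat==1,mean(treat)/phat,mean(1-treat)/(1-phat))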
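Before using such weights it is common to look at their spread. The sketch below does this on simulated data; the data-generating model and variable names are assumptions, since the nws data are not reproduced here. It compares unstabilized and stabilized weights and computes Kish's approximate effective sample size, a diagnostic that does not appear in the slides:

```r
set.seed(7)
n     <- 2000
x     <- rnorm(n)                               # assumed single confounder
treat <- rbinom(n, 1, plogis(-1 + 1.5 * x))     # assumed treatment model

fit  <- glm(treat ~ x, family = binomial)
phat <- predict(fit, type = "response")

wt    <- ifelse(treat == 1, 1 / phat, 1 / (1 - phat))
st.wt <- ifelse(treat == 1, mean(treat) / phat, mean(1 - treat) / (1 - phat))

# In expectation, stabilized weights average 1 and unstabilized weights
# average 2 for a binary treatment; large maxima flag influential units.
summary(wt)
summary(st.wt)

# Kish effective sample size: a small value relative to n signals that
# a few units dominate the weighted analysis.
ess <- function(w) sum(w)^2 / sum(w^2)
c(n = n, ess_wt = ess(wt), ess_st = ess(st.wt))
```

Stabilization rescales the weights within each treatment group, so it changes their range but not the relative weight of units inside a group.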

Causal Effect estimation

> nws.weighted=glm(re78~treat,weights=wt)
> display(nws.weighted)
glm(formula = re78 ~ treat, weights = wt)
            coef.est coef.se
(Intercept) 15647.00   91.82
treat       -7116.53  151.84
---
n = 18667, k = 2
residual deviance = 2.937325e+12, null deviance = 3.282995e+12 (difference = 345669632293.3)
overdispersion parameter = 157370755.5
residual sd is sqrt(overdispersion) = 12544.75

Distribution of propensity scores

Improved causal estimate

> nws.weighted=glm(re78~treat,weights=wt2)
> display(nws.weighted)
glm(formula = re78 ~ treat, weights = wt2)
            coef.est coef.se
(Intercept) 8037.06  159.77
treat        493.40  204.14
---
n = 6809, k = 2
residual deviance = 1.182667e+12, null deviance = 1.183682e+12 (difference = 1014917902.6)
overdispersion parameter = 173742743.2
residual sd is sqrt(overdispersion) = 13181.15
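The slides do not show how `wt2` or the reduced sample of 6809 subjects was obtained. One common possibility, shown here purely as a hypothetical sketch on simulated data, is to restrict the analysis to the region where the estimated propensity scores of the two groups overlap and then recompute the weights. All names and the data-generating model below are assumptions:

```r
set.seed(11)
n     <- 4000
x     <- rnorm(n)                                # assumed confounder
treat <- rbinom(n, 1, plogis(-2 + 2 * x))        # assumed, strongly confounded
y     <- 1 + 2 * x + 1 * treat + rnorm(n)        # assumed outcome; true effect 1
dat   <- data.frame(x, treat, y)

dat$phat <- predict(glm(treat ~ x, family = binomial, data = dat),
                    type = "response")

# Common support: scores between the larger of the two group minima
# and the smaller of the two group maxima.
lo  <- max(tapply(dat$phat, dat$treat, min))
hi  <- min(tapply(dat$phat, dat$treat, max))
sub <- dat[dat$phat >= lo & dat$phat <= hi, ]

# Refit the propensity model and recompute weights on the restricted sample.
sub$phat <- predict(glm(treat ~ x, family = binomial, data = sub),
                    type = "response")
sub$wt2  <- ifelse(sub$treat == 1, 1 / sub$phat, 1 / (1 - sub$phat))

c(n_full = nrow(dat), n_overlap = nrow(sub))
coef(lm(y ~ treat, data = sub, weights = wt2))["treat"]
```

Restricting the sample changes the target population: the estimate then describes the effect among subjects whose covariates make both treatment and control plausible, not among the full cohort.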

Pros and Cons of IPW estimators

Accounts for selection bias (and more) in observational studies.
Is more efficient than propensity matching because it uses all available data.
Can be extended to time-varying confounders and exposures.
Behaves badly when a few individuals have very large weights, which increases variance. This happens when background characteristics are strongly predictive of treatment.
Sensitive to model specification.
