May 24, 2000 NIDA/NIMH Substance Abuse Conference
Research Designs, Statistical Strategies for Dealing with Selection Bias in Treatment Delivery, and Limitations
Naihua Duan
UCLA and RAND
May 2000
Selection bias in treatment assignment/delivery
Research designs
Mitigating for overt selection bias
Dealing with hidden selection bias
Discussions
Selection Bias in Treatment Delivery
In naturalistic settings:
Pre-treatment health → treatment delivered
Pre-treatment health → outcome
Treated group dissimilar from untreated group
Direct comparison of treated vs. untreated results in biased estimate for treatment effect
Need to mitigate selection bias in order to assess treatment effect more appropriately
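To make the bias concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption, not data from the talk: the selection rule (sicker people are treated with probability 0.8, healthier people with probability 0.2), the outcome model, and the true treatment effect of 1.0.

```python
import random
import statistics

def naive_difference(n=20000, true_effect=1.0, seed=0):
    """Treated-minus-untreated mean outcome when sicker people are treated more often."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        health = rng.gauss(0.0, 1.0)            # pre-treatment health, higher = healthier
        p_treat = 0.8 if health < 0 else 0.2    # assumed selection rule (hypothetical)
        t = 1 if rng.random() < p_treat else 0
        # Outcome depends on pre-treatment health and on treatment.
        y = 2.0 * health + true_effect * t + rng.gauss(0.0, 1.0)
        (treated if t else untreated).append(y)
    return statistics.mean(treated) - statistics.mean(untreated)

naive = naive_difference()
print(f"true effect: 1.00, naive treated-vs-untreated difference: {naive:.2f}")
```

Under these assumptions the direct comparison comes out negative even though the treatment helps, because the treated group started out sicker.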
Selection Bias in Treatment Delivery: Typology
Overt selection bias
Treatment related to covariates
$T \not\perp X$
Given covariates, treatment independent of outcome
$T \perp Y \mid X$ (ignorability)
Like a stratified randomized experiment
Hidden selection bias
Given covariates, treatment still related to outcome
$T \not\perp Y \mid X$
Rosenbaum (1995) Observational Studies, Springer-Verlag
Research Designs
Ideal randomized clinical trial (RCT)
Imperfect RCT with noncompliance
Randomized encouragement design (RED)
Observational studies
Settings: controlled vs. naturalistic
Treatment assignment/delivery: mandated vs. choice
Treated vs. untreated groups: balance vs. imbalance
Research questions: efficacy vs. adoption, program effect, and efficacy
Analytic strength: internal validity vs. external validity
Randomized Clinical Trial
[Flow diagram] Recruit, consent, enroll → Randomize
Assign to Tx: adoption = compliance; not adopted = non-compliance
Assign to control: adoption = non-compliance; not adopted = compliance
Intensive efforts made to mandate assignment
Randomized Encouragement Design
[Flow diagram] Recruit, consent, enroll → Randomize
Encourage Tx: adoption = compliance; not adopted = non-compliance
No encouragement: adoption = non-compliance; not adopted = compliance
Encouragement: training, providing information, case management, reducing barriers (child care, transportation, flexible hours, reducing co-payment, ...), decorating the waiting room, ...
Randomized Encouragement Design: Features
Analogous to marketing experiment
Encouragement → higher adoption rate?
→ better overall outcomes? → better outcomes for new users?
Naturalistic, incorporate user preferences, facilitate choice
Broader participation, external validity, dissemination
Zelen (1979 NEJM, 1990 Stat. in Medicine: randomized consent design), Holland (1988) in Clogg CC, ed. Sociological Methodology, Hirano et al. (2000, Biostatistics), Wells et al. (2000, JAMA), Duan et al. (2000, manuscript)
Mitigating Overt Selection Bias
Assume overt selection bias: $T \not\perp X$
Assume no hidden selection bias: $T \perp Y \mid X$
Covariate adjustment through ANCOVA
Stratification (through propensity score method)
Matching (through propensity score method)
Covariate Adjustment
$Y = \beta_0 + \beta_1 T + \beta_2 X \;(+\; \beta_3\, T X) + \varepsilon$
Extrapolation can be risky when imbalance is substantial
[Figure: Y versus X (pre-Tx health), for T = 1 and T = 0]
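A minimal sketch of covariate adjustment, fitting Y on an intercept, T, and X by ordinary least squares. The simulated data, the coefficient values, and the helper name `ols` are illustrative assumptions. The adjustment recovers the effect here only because the simulated outcome really is linear in X.

```python
import random

def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(1)
rows, y = [], []
for _ in range(5000):
    x = rng.gauss(0.0, 1.0)                                   # pre-Tx health
    t = 1 if rng.random() < (0.7 if x < 0 else 0.3) else 0    # overt selection on x
    rows.append([1.0, float(t), x])
    y.append(0.5 + 1.0 * t + 2.0 * x + rng.gauss(0.0, 1.0))   # assumed true effect = 1.0

b0, b1, b2 = ols(rows, y)
print(f"adjusted treatment effect: {b1:.2f}, slope on X: {b2:.2f}")
```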
Limitations for Covariate Adjustment
Extrapolation can be risky when imbalance is substantial
Compare apples and oranges, rely on model to adjust
Careful model diagnosis is essential
Multivariate imbalance might be more problematic
Why so popular?
Ease of push-button analysis
Almost always gives an answer
Could be a bad answer!
Stratification When Covariate Is Univariate
Stratify, then compare by stratum
Compare apples and apples, oranges and oranges
[Figure: Y versus X (pre-Tx health), for T = 1 and T = 0]
Stratification: Procedure
Stratify, then compare treated vs. untreated by stratum
Two-sample comparison within each stratum
ANCOVA within each stratum
Assess interactions across strata
Synthesize treatment effects across strata
Weighted average
Overall intervention effect on treated
Overall intervention effect on untreated
Overall intervention effect on entire pool
Can be specified as ANCOVA with interactions
Nonparametric regression of Y on X, stratified by T
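The procedure above can be sketched for a single covariate as follows. The simulated data, the choice of five quantile-based strata, and the function name `stratified_effects` are assumptions made for illustration. Strata with no treated or no untreated subjects are reported and skipped rather than extrapolated.

```python
import math
import random
import statistics

def stratified_effects(data, cuts):
    """data: list of (x, t, y) with t in {0, 1}; cuts: sorted stratum boundaries on x."""
    strata = {}
    for x, t, y in data:
        s = sum(x > c for c in cuts)                  # stratum index of this subject
        strata.setdefault(s, ([], []))[t].append(y)   # slot 0 = untreated, slot 1 = treated
    rows = []
    for s in sorted(strata):
        y0, y1 = strata[s]
        if not y0 or not y1:
            print(f"stratum {s}: no overlap, skipped")
            continue
        # Two-sample comparison within the stratum.
        rows.append((len(y1), len(y0), statistics.mean(y1) - statistics.mean(y0)))

    def weighted(weights):
        return sum(w * r[2] for w, r in zip(weights, rows)) / sum(weights)

    return {
        "treated": weighted([r[0] for r in rows]),        # weights = treated counts
        "untreated": weighted([r[1] for r in rows]),      # weights = untreated counts
        "pool": weighted([r[0] + r[1] for r in rows]),    # weights = stratum sizes
    }

rng = random.Random(2)
data = []
for _ in range(20000):
    x = rng.gauss(0.0, 1.0)
    t = 1 if rng.random() < 1.0 / (1.0 + math.exp(x)) else 0   # low x treated more often
    data.append((x, t, 1.0 * t + x + rng.gauss(0.0, 1.0)))     # assumed true effect = 1.0

cuts = statistics.quantiles([x for x, _, _ in data], n=5)      # five strata
effects = stratified_effects(data, cuts)
print(effects)
```

Some imbalance remains inside each stratum, so the stratified estimates are expected to be close to, but not exactly at, the simulated effect.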
Covariate Adjustment, Nonparametric Version
OK for low dimension X
Curse of dimensionality for high dimension X
[Figure: Y versus X (pre-Tx health), for T = 1 and T = 0]
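For a one-dimensional X, the nonparametric version can be sketched with a kernel smoother fitted separately to each treatment group. The Nadaraya-Watson smoother, the bandwidth of 0.2, and the simulated data are all assumptions chosen for illustration; the talk does not name a particular smoother.

```python
import math
import random

def kernel_mean(x0, xs, ys, h):
    """Nadaraya-Watson estimate of E[Y | X = x0] with a Gaussian kernel."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def effect_at(x0, data, h=0.2):
    """Difference between the T = 1 and T = 0 smooths of Y on X, evaluated at x0."""
    est = {}
    for arm in (0, 1):
        xs = [x for x, t, _ in data if t == arm]
        ys = [y for _, t, y in data if t == arm]
        est[arm] = kernel_mean(x0, xs, ys, h)
    return est[1] - est[0]

rng = random.Random(3)
data = []
for _ in range(4000):
    x = rng.gauss(0.0, 1.0)
    t = 1 if rng.random() < 1.0 / (1.0 + math.exp(x)) else 0
    # Outcome is nonlinear in x; the assumed true effect is 1.0 everywhere.
    data.append((x, t, x * x + 1.0 * t + rng.gauss(0.0, 1.0)))

effect0 = effect_at(0.0, data)
print(f"estimated effect at x = 0: {effect0:.2f}")
```

With several covariates the local neighbourhoods empty out quickly, which is the curse of dimensionality noted above.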
Stratification: Features
Why not used as widely as ANCOVA?
Does not always give an answer
Provides warning where imbalance is too severe
Not a push-button operation, but not difficult
How to stratify?
Clinical judgement
Usually not critical; sensitivity analysis recommended
Cochran-Rubin-Rosenbaum recommend 5 strata
How to stratify with multi-dimensional covariates?
Curse of dimensionality
Use propensity score method to reduce dimensionality
Propensity Score Method
Assume overt selection bias, no hidden selection bias
$T \perp Y \mid X$
$\pi = \pi(X) = P(T = 1 \mid X)$ is the propensity score
Example: $\operatorname{logit}(\pi(X)) = \alpha_0 + \alpha_1 X$
$\pi(X)$ is a balancing score (most parsimonious)
$T \perp X \mid \pi(X)$
Given $\pi(X)$, treatment independent of outcome
$T \perp Y \mid \pi(X)$
Need only stratify by propensity score
Other dimensions of X can be neglected in assessing treatment effect
Propensity Score Method: Procedure
Estimate $\pi(X) = P(T = 1 \mid X)$
Logistic regression of T on X
Stratify sample (X, T, and Y) by estimated $\pi(X)$ or by the fitted linear predictor in X
Sort out apples and oranges
Analyze each stratum, compare treated vs. untreated
Two sample comparison within stratum
ANCOVA within stratum
Assess interactions across strata
Synthesize treatment effects across strata
Weighted average...
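The steps above can be sketched end to end in plain Python. This is a sketch under stated assumptions: the logistic regression is fitted by simple gradient ascent (any standard fitting routine would do), the data are simulated with a true effect of 1.0, and the helper names are hypothetical.

```python
import math
import random
import statistics

def fit_logistic(X, t, steps=100, lr=1.0):
    """Logistic regression of T on X by gradient ascent; the intercept comes first."""
    k = len(X[0]) + 1
    beta = [0.0] * k
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * k
        for xi, ti in zip(X, t):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            resid = ti - 1.0 / (1.0 + math.exp(-z))
            grad[0] += resid
            for j, v in enumerate(xi, start=1):
                grad[j] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def ps_stratified_effect(X, t, y, n_strata=5):
    """Stratify on the estimated propensity score, compare by stratum, then average."""
    beta = fit_logistic(X, t)
    score = [1.0 / (1.0 + math.exp(-(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))))
             for xi in X]
    cuts = statistics.quantiles(score, n=n_strata)
    labels = [sum(s > c for c in cuts) for s in score]
    total, weight = 0.0, 0
    for stratum in range(n_strata):
        y1 = [yi for yi, ti, li in zip(y, t, labels) if li == stratum and ti == 1]
        y0 = [yi for yi, ti, li in zip(y, t, labels) if li == stratum and ti == 0]
        if not y1 or not y0:
            continue                      # no overlap: nothing to compare in this stratum
        size = len(y1) + len(y0)
        total += size * (statistics.mean(y1) - statistics.mean(y0))
        weight += size
    return total / weight                 # weighted average over the entire pool

rng = random.Random(4)
X, t, y = [], [], []
for _ in range(3000):
    x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    p = 1.0 / (1.0 + math.exp(-0.5 * (x1 + x2)))
    ti = 1 if rng.random() < p else 0
    X.append((x1, x2))
    t.append(ti)
    y.append(1.0 * ti + x1 + x2 + rng.gauss(0.0, 1.0))   # assumed true effect = 1.0

ps_effect = ps_stratified_effect(X, t, y)
print(f"propensity-stratified effect: {ps_effect:.2f}")
```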
Propensity Score Method: Stratification for Y
Stratify, then compare by stratum
Compare apples and apples, oranges and oranges
[Figure: Y versus X, for T = 0 and T = 1]
Propensity Score Method: Model Specification
Specification of propensity score model
Lean towards over-fitting vs. under-fitting?
Model diagnosis: are the covariates balanced across treatment groups within each stratum?
Stratify by propensity score and key covariates (one or two)?
Model misspecification less serious than ANCOVA?
Only rank of estimated propensity score is used
Stratification not sensitive to minor perturbations in model
Limited empirical evidence (Drake 1993 Biometrics, Dehejia and Wahba 1999 JASA)
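One way to run the balance diagnosis mentioned above is to compute, for each covariate, a standardized difference between treated and untreated subjects within each stratum. The standardized difference is a common choice but an assumption here, since the talk does not specify a diagnostic, and the tiny data set is purely hypothetical.

```python
import statistics

def standardized_difference(x, t):
    """(mean treated - mean untreated) / pooled SD for one covariate."""
    x1 = [v for v, ti in zip(x, t) if ti == 1]
    x0 = [v for v, ti in zip(x, t) if ti == 0]
    pooled_sd = ((statistics.variance(x1) + statistics.variance(x0)) / 2) ** 0.5
    return (statistics.mean(x1) - statistics.mean(x0)) / pooled_sd

def balance_by_stratum(x, t, stratum):
    """Standardized difference of covariate x within each stratum label."""
    out = {}
    for s in sorted(set(stratum)):
        xs = [v for v, si in zip(x, stratum) if si == s]
        ts = [ti for ti, si in zip(t, stratum) if si == s]
        out[s] = standardized_difference(xs, ts)
    return out

# Hypothetical data: stratum 0 is badly imbalanced, stratum 1 less so.
x = [1, 2, 3, 4, 1, 2, 3, 4]
t = [0, 0, 1, 1, 0, 1, 0, 1]
stratum = [0, 0, 0, 0, 1, 1, 1, 1]
balance = balance_by_stratum(x, t, stratum)
for s, d in balance.items():
    print(f"stratum {s}: standardized difference = {d:.2f}")
```

Large values within a stratum suggest the propensity model or the stratum boundaries need another look.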
Propensity Score Method: Options
Stratification
Matching (case-control)
Curse of dimensionality relevant, less critical
Mahalanobis distance matching
Match on propensity score (+ a few key covariates?)
Design stage vs. analysis stage
Primary vs. secondary data collection
ANCOVA: regress Y on T and propensity score (+ a few key covariates? + interactions?)
Nonparametric regression? Stratified by T?
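As an illustration of the matching option, the sketch below pairs each treated subject with the untreated subject whose score is closest (1:1, with replacement) and averages the outcome differences, giving an effect on the treated. The simulated data are an assumption, and for brevity the true propensity stands in for an estimated score.

```python
import bisect
import math
import random
import statistics

def match_on_score(score, t, y):
    """Effect on the treated via 1:1 nearest-neighbour matching with replacement."""
    controls = sorted((s, yi) for s, ti, yi in zip(score, t, y) if ti == 0)
    keys = [s for s, _ in controls]
    diffs = []
    for s, ti, yi in zip(score, t, y):
        if ti != 1:
            continue
        j = bisect.bisect_left(keys, s)
        # Candidates: the controls just below and just above the treated score.
        cands = [c for c in (j - 1, j) if 0 <= c < len(keys)]
        best = min(cands, key=lambda c: abs(keys[c] - s))
        diffs.append(yi - controls[best][1])
    return statistics.mean(diffs)

rng = random.Random(5)
score, t, y = [], [], []
for _ in range(4000):
    x = rng.gauss(0.0, 1.0)
    p = 1.0 / (1.0 + math.exp(-x))      # true propensity, used in place of an estimate
    ti = 1 if rng.random() < p else 0
    score.append(p)
    t.append(ti)
    y.append(1.0 * ti + x + rng.gauss(0.0, 1.0))   # assumed true effect = 1.0

att = match_on_score(score, t, y)
print(f"matched estimate of the effect on the treated: {att:.2f}")
```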
Dimension Reduction
Fundamental challenge in ANCOVA
Valid assessment of treatment effect can be obtained using nonparametric regression of Y on X, stratified by T
Curse of dimensionality
No obvious way to reduce dimensionality?
Propensity score method is an elegant way to reduce dimensionality
Alternative dimension reduction methods?
Slicing regression (Duan and Li 1991 Annals of Statistics, Li 1991 JASA): use inverse regression of X on Y...
Propensity Score Method: References
Rosenbaum and Rubin (1983 Biometrika, 1984 JASA)
Lavori, Dawson, and Mueller (1994 Stat. in Medicine)
Rosenbaum (1995) Observational Studies, Springer-Verlag
Rubin (1997) Annals of Internal Medicine
D’Agostino (1998 Stat. in Medicine)
Normand et al. (2000 manuscript)
Hirano et al. (2000 manuscript)
Dealing with Hidden Selection Bias
$T \not\perp Y \mid X$
Very challenging problem, no easy solutions
Given X, how does treatment depend on outcome?
Overt selection bias can be made to look like stratified randomized experiment
Hidden selection bias cannot be made to…
Rosenbaum-Rubin’s sensitivity analysis
Instrumental variable analysis a la Rubin Causal Model
Selection modeling
Rosenbaum’s Sensitivity Analysis: General Principle
How robust is the observed treatment effect against hidden selection bias?
Analogous to pattern mixture model for missing data
Formulate a family of plausible models for hidden selection bias (from mild to severe)
Assess treatment effect under each model
Determine how much hidden selection bias wipes out treatment effect
Is this much hidden selection bias realistic?
Specificity analysis
Unobserved Confounder Model
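$\operatorname{logit}(\pi(X_i)) = \alpha_0 + \alpha_1 X_i + \gamma U_i$, $0 \le U_i \le 1$
$\gamma > 0$: maximum impact of unobserved hidden bias
$\Gamma = \exp(\gamma)$ is the upper bound on the odds ratio between $\pi(X_i)$'s | X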
Example: 2 x 2 table (analyzed with Fisher’s exact test)
Worst case scenario for hidden bias:
Unobserved health is a perfect predictor of survival
Healthy patients are more likely to receive treatment
Ui = 1 for all survivors; = 0 for all deceased
Null distribution is a tilted hypergeometric distribution
Given $\Gamma$, derive P-value under tilted hypergeometric distribution
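For the 2 x 2 example, the worst-case P-value can be sketched directly. In the worst case above, the number of treated survivors follows a hypergeometric distribution whose terms are tilted by Γ raised to that count, and Γ = 1 gives back the one-sided Fisher exact test. The table counts below are hypothetical, and `tilted_p_value` and `gamma_cap` (standing for Γ) are names introduced only for this sketch.

```python
from math import comb

def tilted_p_value(a, n_treated, n_survivors, n_total, gamma_cap=1.0):
    """Upper-bound one-sided P-value for a 2x2 table under hidden bias Gamma.

    a          observed number of treated survivors
    gamma_cap  Gamma = exp(gamma); Gamma = 1 reproduces Fisher's exact test
    """
    lo = max(0, n_treated + n_survivors - n_total)
    hi = min(n_treated, n_survivors)
    # Hypergeometric terms, tilted by Gamma**k for k treated survivors.
    weights = {
        k: comb(n_survivors, k) * comb(n_total - n_survivors, n_treated - k) * gamma_cap ** k
        for k in range(lo, hi + 1)
    }
    return sum(w for k, w in weights.items() if k >= a) / sum(weights.values())

# Hypothetical table: 10 treated (8 survive), 10 untreated (3 survive).
p_values = {g: tilted_p_value(8, 10, 11, 20, g) for g in (1.0, 1.5, 2.0, 3.0)}
for g, p in p_values.items():
    print(f"Gamma = {g}: worst-case P-value = {p:.4f}")
```

Reading down the output shows how much hidden bias it takes before the finding stops being significant.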
Rosenbaum’s Sensitivity Analysis: Limitations
Does not give THE answer (should we expect one?)
Rosenbaum’s sensitivity analysis is based on permutation test (tilted by hidden selection bias)
Permutation test is the foundation for randomized trials, but rarely used: heavy computation burden
Used more in recent years, e.g., COMMIT
Special software required for tilted permutation test
Programming logic not difficult
Very heavy computation burden
Inertia for users to stay with familiar packages
Instrumental Variable (IV) Analysis for RED, a la Rubin Causal Model
Encouragement intervention serves as instrumental variable
Assume binary intervention (I = 0, 1)
binary treatment (T = 0, 1)
T(0)  T(1)  Category
0     0     Never takers
0     1     Compliers (new users)
1     0     Defiers (assumed to be absent)
1     1     Always takers
Very likely different beyond observed characteristics
IV Analysis: Observed Compliance Status
I = 0:
Untreated: C or N
Treated: A or D
I = 1:
Untreated: N or D
Treated: C or A
Randomized encouragement design
Compliance status distributed similarly across intervention groups
%(C) = %(treated | I = 1) - %(treated | I = 0)
= %(untreated | I = 0) - %(untreated | I = 1)
IV Analysis: Intervention Effect by Subgroups
Key assumption:
Effect of encouragement intervention mediated entirely through treatment (exclusion restriction)
Always takers and never takers: no treatment variation
→ no intervention effect [exclusion restriction]
→ cannot assess treatment effect
Intervention effect manifested entirely through compliers
Complier Average Causal Effect
Treatment “Efficacy” on compliers:
CACE = Program effect / Incremental adoption rate
Program effect = intent-to-treat effect for encouragement intervention on outcome
Incremental adoption rate = intent-to-treat effect for encouragement intervention on adoption
Distribute intervention effect on outcome over compliers
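The ratio above can be sketched on simulated data from a randomized encouragement design. The mix of always takers, compliers, and never takers, their hidden baseline differences, and the true effect of 2.0 are all assumptions for illustration; defiers are assumed absent and the exclusion restriction holds by construction.

```python
import random
import statistics

def cace(enc, t, y):
    """Program effect (ITT on outcome) divided by incremental adoption (ITT on adoption)."""
    def mean_by_arm(v, arm):
        return statistics.mean(vi for vi, ei in zip(v, enc) if ei == arm)
    program_effect = mean_by_arm(y, 1) - mean_by_arm(y, 0)
    incremental_adoption = mean_by_arm(t, 1) - mean_by_arm(t, 0)
    return program_effect / incremental_adoption, program_effect, incremental_adoption

rng = random.Random(6)
enc, t, y = [], [], []
for _ in range(20000):
    kind = rng.choices(["always", "complier", "never"], weights=[0.3, 0.4, 0.3])[0]
    ei = rng.randint(0, 1)                        # randomized encouragement
    ti = 1 if kind == "always" or (kind == "complier" and ei == 1) else 0
    baseline = {"always": 1.0, "complier": 0.0, "never": -1.0}[kind]   # hidden differences
    enc.append(ei)
    t.append(ti)
    y.append(baseline + 2.0 * ti + rng.gauss(0.0, 1.0))   # assumed true effect = 2.0

estimate, program_effect, incremental_adoption = cace(enc, t, y)
print(f"program effect = {program_effect:.2f}, "
      f"incremental adoption = {incremental_adoption:.2f}, CACE = {estimate:.2f}")
```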
IV Analysis: External Validity
Treatment effect estimable only for compliers (new users)
Intrinsic limitation of design (RED or imperfect RCT)
Should we be concerned about treatment effect for always takers and never takers?
Yes for efficacy trials, less so for RED
Never taker might never adopt treatment voluntarily
Mandate vs. choice
Universal dissemination vs. practical dissemination
Always takers more critical; absent for new treatments
Presence of defiers likely to cancel some intervention effect
IV estimate is conservative for true CACE
IV Analysis: Discussions
Exclusion restriction needs to be entertained carefully
Likelihood and Bayesian methods available under weaker assumptions
Non-randomized encouragement design (observational studies with instrumental variables)
Example: McClellan et al. JAMA 1994, distance to alternative types of hospitals
IV analysis usually deflates precision substantially
Bias-variance trade-off?
Combine propensity score analysis with IV analysis?
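A rough way to see the loss of precision, written as a first-order approximation that treats the incremental adoption rate as if it were known (an assumption made here for illustration, not a result stated in the talk). The subscripts Y and T denote the intent-to-treat effects of encouragement on outcome and on adoption.

```latex
\widehat{\mathrm{CACE}} = \frac{\widehat{\mathrm{ITT}}_Y}{\widehat{\mathrm{ITT}}_T},
\qquad
\mathrm{SE}\bigl(\widehat{\mathrm{CACE}}\bigr) \approx
\frac{\mathrm{SE}\bigl(\widehat{\mathrm{ITT}}_Y\bigr)}{\bigl|\widehat{\mathrm{ITT}}_T\bigr|}
```

For a hypothetical design in which encouragement raises adoption by 20 percentage points, the standard error of the IV estimate is about five times that of the program effect.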
IV Analysis: References
Sommer and Zeger (1991 Stat. in Medicine)
Angrist, Imbens, and Rubin (1996 JASA)
Imbens and Rubin (1997 Annals of Statistics)
Little and Yau (1998 Psych Methods)
Hirano, Imbens, Rubin, and Zhou (2000 Biostatistics)
Wells, et al. (2000, manuscript)
Discussions
Formulate research questions
Treatment effect for whom? Adoption?
Careful design usually more effective than analytic solutions
Matching to avoid severe imbalance
Promising methods for mitigating overt selection bias
Careful modeling warranted
Propensity score method worth exploring
Nonparametric regression worth exploring
Hidden selection bias very challenging
Rosenbaum’s sensitivity analysis warranted
IV analysis and selection model require careful assessment