personalized medicine and dynamic treatment …personalized medicine and dynamic treatment regimes...
Post on 23-Jun-2020
5 Views
Preview:
TRANSCRIPT
Personalized Medicine and DynamicTreatment Regimes
Marie Davidian and Butch Tsiatis
Department of StatisticsNorth Carolina State University
April 21, 2014
Trends and Innovations in Clinical Trial Statistics Conference
1/118
Acknowledgement
IMPACT: Innovative Methods Program for AdvancingClinical Trials• A joint venture of the University of North Carolina at Chapel
Hill, Duke University, and North Carolina State University• Supported by Program Project grant P01 CA142538 from
the National Cancer Institute, Statistical Methods forCancer Clinical Trials
• Comprises five interrelated research projects, includingDiscovery of Dynamic Treatment Regimes
This short course is part of IMPACT’s education andoutreach efforts
2/118
Course Outline
Introduction to Personalized Medicine and Dynamic TreatmentRegimes
Estimation of Optimal Treatment Regimes for a Single Decision
Estimation of Optimal Treatment Regimes from a ClassificationPerspective
SMARTs and Estimation of Optimal Treatment Regimes forMultiple Decisions
3/118
Roadmap
8:00 am – 8:45 am Marie will give an introduction to dynamictreatment regimes
8:45 am – 10:00 am Butch will discuss estimation of optimaltreatment regimes in the single decisionsetting
10:00 am – 10:20 am Break
10:20 am – 10:50 am Butch will discuss estimation of optimalregimes as a classification problem
10:50 am – 12:00 pm Marie will discuss SMART studies andestimation of optimal treatment regimesin the multiple decision setting
4/118
Course Outline
Introduction to Personalized Medicine and Dynamic TreatmentRegimes
Estimation of Optimal Treatment Regimes for a Single Decision
Estimation of Optimal Treatment Regimes from a ClassificationPerspective
SMARTs and Estimation of Optimal Treatment Regimes forMultiple Decisions
5/118
What is Personalized Medicine?
“The right treatment for the right patient (at the right time) ”
6/118
What is Personalized Medicine?
In general: One size does not fit all• Multiple treatment options may be available• Patient heterogeneity
I Across patients: What works for one patient may not workfor another
I Within patients: What works now may not work later
Premise: Use information on a patient’s characteristics todetermine which treatment option s/he should receive (andwhen. . . )• Genetic/genomic, demographic,. . .• Physiologic/clinical measures, medical history,. . .
7/118
Popular Perspective on Personalized Medicine
Subgroup identification/targeted treatment:• Are there subgroups of patients who are likely to benefit
from a particular treatment?• Can biomarkers be developed to identify such patients?• Can a treatment be developed that targets a subgroup that
is very likely to benefit from that treatment?
Focus: Treating and targeting treatment for subgroups of thepatient population• “Right patient for the treatment ”
8/118
Another Perspective on Personalized Medicine
Clinical practice: Clinicians make (a series of) treatmentdecision(s) over the course of a patient’s disease or disorder• Fixed schedule , milestone in the disease process, event
necessitating a decision• Multiple treatment options• Accrued information on the patient
Focus: Given information on patient’s characteristics, can wedetermine the treatment from among the available options mostlikely to benefit him/her?• “Right treatment for the patient ”• This is the perspective we will take in this course
9/118
Clinical Decision-Making
How are these decisions made?• Clinical judgment , practice guidelines• Synthesize all information on a patient up to the point of a
decision to determine next treatment action• Goal: “Individualize ” the decision to the patient• Can this be formalized and made evidence-based ?
10/118
Dynamic Treatment Regime
Formalizing personalized medicine: At any decision• Construct a rule that takes as input the available
information on the patient to that point and dictates thenext treatment from among the possible, feasible options
Dynamic treatment regime: A set of formal rules , eachcorresponding to a decision point• Dynamic because the treatment action varies depending
on the accrued information
11/118
Dynamic Treatment Regime
Assume: There is a clinical outcome by which treatmentbenefit can be assessed• Survival time, CD4 count, indicator of no myocardial
infarction within 30 days, . . .• Larger outcomes are better
Intuitively: Rules should depend on characteristics (variables ,covariates ) that exhibit a qualitative interaction with treatment• “Tailoring variables ”
12/118
Tailoring Variables
Ave
rage
Out
com
e
Trt A
Trt ATrt B
Trt B
No Mutation Mutation
13/118
Tailoring Variables
Ave
rage
Out
com
e
Trt A
Trt ATrt B
Trt B
No Mutation Mutation
14/118
Single Decision Point
Simple example: Which treatment to give patients whopresent with primary operable breast cancer ?• Treatment options : L-phenylalanine mustard and
5-fluorouracil (PF, c1) or c1 + tamoxifen (PFT, c2)• Available information: age (years), progesterone receptor
(PR) level (fmol)• Outcome: Disease-free survival to three years
Gail and Simon (1985):• Data on ∼ 1,300 patients in a National Surgical Adjuvant
Breast and Bowel Project (NSABP) clinical trial
15/118
Single Decision Point
Gail and Simon rule/regime:• “If age < 50 years and PR < 10 fmol, give c1 (coded as 1);
otherwise, give c2 (coded as 0)”• Mathematically: The rule d is (rectangular region )
d(age,PR) = I(age < 50 and PR < 10)
Alternatively: Rules of form (hyperplane )
d(age,PR) = I{age + 8.7 log(PR)− 60 > 0}
16/118
Multiple Decision Points
Two decision points:• Decision 1 : Induction chemotherapy (options c1, c2)• Decision 2 : Maintenance treatment for patients who
respond (options m1, m2), Salvage chemotherapy for thosewho don’t (options s1, s2)
• Outcome : Survival time
17/118
Multiple Decision Points
• At baseline: Information x1 (accrued information h1 = x1)
• Decision 1: Two options {c1, c2}; rule 1: d1(x1)⇒ x1 → {c1, c2}• Between decisions 1 and 2: Collect additional information x2,
including responder status
• Accrued information h2 = {x1, chemotherapy at decision 1, x2}• Decision 2: Four options {m1,m2, s1, s2}; rule 2: d2(h2)⇒
h2 → {m1,m2} (responder), h2 → {s1, s2} (nonresponder)
• Regime : d = (d1,d2)
18/118
Summary
Single decision: 1 decision point• Baseline information x ∈ X• Set of treatment options a ∈ A• Decision rule d(x), d : X → A• Dynamic treatment regime: d
Multiple decisions: K decision points• Baseline information x1, intermediate information xk
between decisions k − 1 and k , k = 2, . . . ,K• Set of treatment options at decision k ak ∈ Ak
• Accrued information h1 = x1 ∈ H1,
hk = {x1,a1, x2,a2, . . . , xk−1,ak−1, xk} ∈ Hk , k = 2, . . . ,K
• Decision rules d1(h1),d2(h2), . . . ,dK (hK ), dk : Hk → Ak
• Dynamic treatment regime d = (d1,d2, . . . ,dK )
19/118
Considerations
Challenge: Infinitude of possible regimes d• D = class of all possible dynamic treatment regimes• Can we find the “best ” set of rules; i.e., the “best ” dynamic
treatment regime in D?• How do we define “best? ”
20/118
Optimal Dynamic Treatment Regime
An optimal dynamic treatment regime: Should satisfy• If all patients in the population were to receive treatment
according to regime d , the expected (average) outcome forthe population would be as large as possible
• If an individual patient were to receive treatment accordingto regime d = (d1, . . . ,dK ), his/her expected outcomewould be as large as possible given the informationavailable on him/her
• Can we formalize this?
21/118
Potential Outcomes
Single decision: Possible treatment options a ∈ A• For a randomly chosen patient from the population, define
the random variable Y ∗(a) = the outcome the patient wouldachieve if s/he were to receive treatment option a
• “Potential outcome ”• E.g., if A = {0,1} (two possible treatment options)
I Y ∗(1)= the outcome a patient would achieve if s/he weregiven treatment 1
I Similarly for Y ∗(0)
• E{Y ∗(1)} is the average outcome if all patients in thepopulation were to receive 1; similarly for E{Y ∗(0)}
22/118
Potential Outcomes
Potential outcome for a regime: For any d ∈ D• Y ∗(d) = the outcome a patient would achieve if s/he
received treatment according to regime d• If A = {0,1} and patient has baseline information X = x
d(x) = 0 or 1
• Potential outcome under d
Y ∗(d) = Y ∗(1)d(X ) + Y ∗(0){1− d(X )}
23/118
Optimal Dynamic Treatment Regime
Thus:• E{Y ∗(d)|X = x} is the expected outcome for a patient with
information x if s/he were to receive treatment according toregime d ∈ D
• E{Y ∗(d)} = E [ E{Y ∗(d)|X} ] is the expected (average)outcome for the population if all patients were to receivetreatment according to regime d ∈ D
Optimal regime: dopt is a regime in D such that• E{Y ∗(d)} ≤ E{Y ∗(dopt )} for all d ∈ D• E{Y ∗(d)|X = x} ≤ E{Y ∗(dopt )|X = x} for all d ∈ D and
all x ∈ X
24/118
Optimal Dynamic Treatment Regime
Multiple decisions: Same idea, only more complicated• Baseline information X1
• Potential outcomes under a regime d ∈ D
X ∗2 (d), . . . ,X ∗K (d),Y ∗(d)
• Patient presents with information X1; all subsequentinformation determined by d
Optimal regime: dopt is a regime in D such that• E{Y ∗(d)} ≤ E{Y ∗(dopt )} for all d ∈ D• E{Y ∗(d)|X1 = x1} ≤ E{Y ∗(dopt )|X1 = x1} for all d ∈ D
and x1 ∈ H1
25/118
Value of a Regime
Value of regime d: The expected (average) outcome for thepopulation if all patients were to receive treatment according toregime d ∈ D• Value of regime d is E{Y ∗(d)}• Sometimes written as V (d) = E{Y ∗(d)}
26/118
Important Philosophical Point
Distinguish between:• The “best ” treatment for a patient• The “best ” treatment decision for a patient given the
information available
Best treatment for a patient: Option abest ∈ A correspondingto the largest Y ∗(a) for that patient
Best treatment decision given the information available:• We cannot hope to determine abest because we can never
see all the potential outcomes on a given patient• We can hope to make the optimal decision given the
information available, i.e., find dopt that makesE{Y ∗(d)|X = x} (and the value, E{Y ∗(d)}) as large aspossible
27/118
Estimation of Optimal Treatment Regimes
Result: This perspective on personalized medicine boils downto discovery of optimal dynamic treatment regimes using data• Existing data from observational studies (e.g., registries),
previously conducted clinical trials• Prospectively collected data from clinical trials conducted
specifically for this purpose (coming up)
28/118
Estimation of Optimal Treatment Regimes
Single decision: Data (Xi ,Ai ,Yi), i = 1, . . . ,n• n subjects indexed by i• Xi = information observed on subject i• Ai = observed treatment actually received by subject i• Yi = observed outcome for subject i• Goal: Under suitable assumptions , estimate dopt using
these data
29/118
Estimation of Optimal Treatment Regimes
Multiple decisions: Data
(X1i ,A1i ,X2i ,A2i , . . . ,X(K−1)i ,A(K−1)i ,XKi ,Yi), i = 1, . . . ,n
• X1i = Baseline information observed on subject i• Xki , k = 2, . . . ,K = intermediate information between
decisions k − 1 and k on subject i• Aki , k = 1, . . . ,K = observed treatment actually received
by subject i at decision k• Yi = observed outcome for subject i ; can be ascertained
after decision K or can be a function of X2i , . . . ,XKi
• Goal: Under suitable assumptions , estimatedopt = (dopt
1 , . . . ,doptK ) using these data
30/118
Estimation of Optimal Treatment Regimes
Challenges:1. The optimal dynamic treatment regime dopt is defined in
terms of potential outcomes (not the observed data)2. This sounds hard ; does it really have to be?3. Were all possibly useful tailoring variables that clinicians
used in the study at each decision point recorded in thedata?
31/118
Estimation of Optimal Treatment Regimes
Challenge 1: dopt is defined in terms of potential outcomes• Need to be able to express the definition of dopt
equivalently in terms of the data• Possible under certain assumptions• Butch will demonstrate in the single decision case• Also possible in the multiple decision case (but harder)
32/118
Estimation of Optimal Treatment Regimes
Challenge 2: Can’t we just piece together results from severalstudies to figure out the optimal regime?
• Study comparing induction chemotherapies based on response
• Study comparing maintenance therapies based on survival timeamong responders to induction therapy
• Study comparing salvage therapies based on survival timeamong nonresponders
• Wouldn’t the regime that uses the “best ” option in each studyhave to have the “best ” average outcome?
• Delayed effects: The induction therapy with the highestproportion of responders might have other effects that rendersubsequent treatments less effective in regard to survival
• Result: Must consider the entire sequence of decisions
33/118
Clinical Trials for Discovery of Treatment Regimes
Challenge 3: Conduct a clinical trial specifically designed forestimation of optimal dynamic treatment regimes• SMART: Sequential Multiple Assignment Randomized Trial• Randomize subjects to the treatment options at each
decision point• Collect extensive, detailed information initially and
intermediate to decision points on possible tailoringvariables
Later: Marie will have more to say on both of these issues
34/118
Clinical Trials for Discovery of Treatment Regimes
Cancer
Response
NoResponse
Response
NoResponse
C
C
1
2
M1
M2
M1
M2
S1
S2
S1
S2
Randomization at •s
35/118
Course Outline
Introduction to Personalized Medicine and Dynamic TreatmentRegimes
Estimation of Optimal Treatment Regimes for a Single Decision
Estimation of Optimal Treatment Regimes from a ClassificationPerspective
SMARTs and Estimation of Optimal Treatment Regimes forMultiple Decisions
36/118
Optimal Regime
Assume: Large outcomes are good
An optimal regime:• A regime that, if followed by all patients in the population,
yields the largest outcome on average• That is, yields the largest value
Goal: Given data (evidence ) from a clinical trial orobservational study, estimate an optimal regime satisfying thisdefinition• For now : Consider regimes involving a single decision /rule
37/118
Statistical Framework
Simplest setting: A single decision with two treatment options• A = {0,1}
Observed data: (Yi ,Xi ,Ai), i = 1, . . . ,n, iid• Yi outcome, Xi baseline covariates, Ai = 0,1 treatment
received
Treatment regime: A single rule• A function d : X → {0, 1}
38/118
Statistical Framework
Breast cancer example: Which treatment to give patients whopresent with primary operable breast cancer ?• Two treatment options (0 or 1), x =(age, PR)• Possible rules
d(age,PR) = I(age < 50 and PR < 10)
d(age,PR) = I{age + 8.7log(PR)− 60 > 0}
Goal, restated:• Let D be the class of all possible regimes d• Estimate dopt ∈ D such that, if dopt were followed by all
patients in the population, it would lead to largest averageoutcome (value ) among all regimes in D
39/118
Potential Outcomes
Reminder: We can hypothesize potential outcomes• Y ∗(1) = outcome that would be achieved if patient were to
receive 1; Y ∗(0) defined similarly• E{Y ∗(1)} is the average outcome if all patients in the
population were to receive 1; and similarly for E{Y ∗(0)}• We observe
Y = Y ∗(1)A + Y ∗(0)(1− A)
40/118
Potential Outcomes
No unmeasured confounders: Assume that
Y ∗(0),Y ∗(1) ⊥⊥ A|X
• X contains all information used to assign treatments• Automatically satisfied for data from a randomized trial• Standard but unverifiable assumption for observational
studies• Implies that
E{Y ∗(1)} = E [E{Y ∗(1)|X}]= E [E{Y ∗(1)|X ,A = 1}]= E{E(Y |X ,A = 1) }
and similarly for E{Y ∗(0)}
41/118
Potential Outcomes
E{Y ∗(1)} = E{E(Y |X ,A = 1) }
Implication for estimating E{Y ∗(1)}: Similarly for E{Y ∗(0)}• E(Y |X ,A) = Q(X ,A) is the regression of Y on X and A• E(Y |X ,A) is unknown• Posit a model Q(X ,A;β) for Q(X ,A)
• Estimate β based on observed data =⇒ β(e.g., least squares)
• Estimator for E{Y ∗(1)}
n−1n∑
i=1
Q(Xi ,1; β)
42/118
Potential Outcomes
Potential outcome for a regime:• For any d ∈ D, define Y ∗(d) to be the potential outcome
for a patient if s/he were given treatment according toregime d
Y ∗(d) = Y ∗(1)d(X ) + Y ∗(0){1− d(X )}
• E{Y ∗(d)} is the average outcome for the population if allpatients were treated according to regime d
• That is, E{(Y ∗(d)} = V (d) is the value of regime d
43/118
Value of a Regime
Y ∗(d) = Y ∗(1)d(X ) + Y ∗(0){1− d(X )}
Value of regime d: Using no unmeasured confounders
E{Y ∗(d)} = E [E{Y ∗(d)|X}]
= E[E{Y ∗(1)|X}d(X ) + E{Y ∗(0)|X}{1− d(X )}
]= E
[E(Y |X ,A = 1)d(X ) + E(Y |X ,A = 0){1− d(X )}
]= E [Q(X ,1)d(X ) + Q(X ,0){1− d(X )}],
where E(Y |X ,A) = Q(X ,A)
44/118
Estimating the Value of a Regime
E{Y ∗(d)} = E [Q(X ,1)d(X ) + Q(X ,0){1− d(X )}]
Again: E(Y |X ,A) is not known• Posit a model Q(X ,A;β) for E(Y |X ,A)
• Estimate β based on observed data =⇒ β(e.g., least squares)
• Estimate V (d) = E{Y ∗(d)} by
V (d) = n−1n∑
i=1
[Q(Xi ,1, β)d(Xi) + Q(X1,0, β){1− d(Xi)}]
45/118
Optimal Regime
Reminder: dopt is a regime in D such that• E{Y ∗(d)} ≤ E{Y ∗(dopt )} for all d ∈ D• E{Y ∗(d)|X = x} ≤ E{Y ∗(dopt )|X = x} for all d ∈ D and
x ∈ X
Optimal regime:
dopt (x) = arg maxa={0,1} E{Y ∗(a)|X = x}
• Thus
dopt (x) = I[E{Y ∗(1)|X = x} > E{Y ∗(0)|X = x}]= I{Q(x ,1) > Q(x ,0) }
46/118
Estimating the Optimal Regime
“Regression estimator”:• Estimate dopt by
doptREG(x) = I{Q(x ,1; β) > Q(x ,0; β) }
• Estimator for V (dopt ) = E{Y ∗(dopt )}
VREG(doptREG) = n−1
n∑i=1
[Q(X ,1i , β)dopt
REG(Xi)+Q(X ,0i , β){1−doptREG(Xi)}
]Concern: Q(X ,A;β) may be misspecified , so dopt
REG could befar from dopt
47/118
Optimal Restricted Regime
Alternative perspective: Q(X ,A;β) defines a class of regimes
d(x , β) = I{Q(x ,1;β) > Q(x ,0;β)},
indexed by β, that may or may not contain dopt
• E.g., suppose in truth
E(Y |X ,A) = exp{1 + X1 + 2X2 + 3X1X2 + A(1− 2X1 + X2)}
=⇒ dopt (x) = I(x2 ≥ 2x1 − 1) (hyperplane)
48/118
Optimal Restricted Regime
Posited model:
Q(X ,A;β) = β0 + β1X1 + β2X2 + A(β3 + β4X1 + β5X2)
• Regimes I{Q(x ,1;β) > Q(x ,0;β)} define a class ofregimes Dη with elements
I(x2 ≥ η1x1+η0) or I(x2 ≤ η1x1+η0), η0 = −β3/β5, η1 = −β4/β5
depending on the sign of β5
• Parameter η is defined as a function of β• The optimal regime in this case is contained in Dη• However, the estimated regime I{Q(x ,1; β) > Q(x ,0; β}
may not estimate the optimal regime within Dη if theposited model is incorrect
49/118
Optimal Restricted Regime
Suggests: Consider directly a restricted class of regimes Dηwith elements of form
d(x ; η) = dη(x) indexed by η
• Such regimes may be motivated by a regression model orbased on cost , feasibility in practice, interpretability; e.g.,
d(x ; η) = I(x1 < η0, x2 < η1)
• Dη may or may not contain dopt but is still of interest
• Optimal restricted regime doptη (x) = d(x ; ηopt ),
ηopt = arg maxη E{Y ∗(dη)}
50/118
Estimating the Optimal Restricted Regime
Optimal restricted regime: doptη (x) = d(x ; ηopt ),
ηopt = arg maxη E{Y ∗(dη)} = arg maxη V (dη)
Approach:• Directly estimate the value V (dη) = E{Y ∗(dη)} for any
fixed η =⇒ V (dη)
• Estimate the optimal restricted regime by finding
ηopt = arg maxηV (dη) =⇒ doptη (x) = d(x ; ηopt )
• We refer to this as a value search estimator for doptη
51/118
Value Search Estimators
Required: A “good ” estimator for V (dη)
• Missing data analogy• Let Cη denote η-regime consistency indicator
Cη = Ad(X ; η) + (1− A){1− d(X ; η)}
• “Full data ” are {X ,Y ∗(dη)}; “observed data ” are(X ,Cη,CηY )
• =⇒ Only a subset of subjects have observed outcomesunder dη; the rest are missing
52/118
Value Search Estimators
Cη = Ad(X ; η) + (1− A){1− d(X ; η)}
Propensity scores:• π(X ) = pr(A = 1|X ) is the propensity score for treatment• Randomized trial: π(X ) is known• Observational study: Posit a model π(X ; γ) (e.g., logistic
regression) and fit using (Ai ,Xi), i = 1, . . . ,n =⇒ γ.• Propensity of receiving treatment consistent with dη
πc(X ; η) = pr(Cη = 1|X ) = E(Cη|X )
= E [Ad(X ; η) + (1− A){1− d(X ; η)}|X ]
= π(X )d(X ; η) + {1− π(X )}{1− d(X ; η)}
• Write πc(X ; η, γ) with π(X ; γ)
53/118
Value Search Estimators
Estimators for V (dη) = E{Y ∗(dη)}: For fixed η• Inverse probability weighted estimator
VIPWE (dη) = n−1n∑
i=1
Cη,iYi
πc(Xi ; η, γ).
• Consistent for V (dη) if π(X ; γ) (hence πc(X ; η, γ)) is correct
54/118
Value Search Estimators
Consistency:
E{
CηYπc(X ; η)
}= E
{CηY ∗(dη)
πc(X ; η)
}= E
[E{
CηY ∗(dη)
πc(X ; η)
∣∣∣∣Y ∗(dη),X}]
= E[
E{Cη|Y ∗(dη),X}Y ∗(dη)
πc(X ; η)
]= E
[E{Cη|X}Y ∗(dη)
πc(X ; η)
]= E
{πc(X ; η)Y ∗(dη)
πc(X ; η)
}= E{Y ∗(dη)}
55/118
Value Search Estimators
Estimators for V (dη) = E{Y ∗(dη)}: For fixed η• Doubly robust augmented inverse probability weighted
estimator
VAIPWE (dη) = n−1n∑
i=1
{Cη,iYi
πc(Xi ; η, γ)−
Cη,i − πc(Xi ; η, γ)
πc(Xi ; η, γ)m(Xi ; η, β)
}m(X ; η, β) = E{Y ∗(dη)|X} = Q(X ,1;β)d(X ; η)+Q(0,X ;β){1−d(X ; η)}
and Q(X ,A;β) is a model for E(Y |X ,A)
• Consistent if either π(X , γ) or Q(X ,A;β) is correct
56/118
Augmented Estimator
Under MAR: Y ∗(dη)⊥⊥Cη|X• If γ
p−→ γ∗ and βp−→ β∗, this estimator
p−→
E{
CηYπc(X ; η, γ∗)
− Cη − πc(X ; η, γ∗)
πc(X ; η, γ∗)m(X ; η, β∗)
}= E
[Y ∗(dη) +
{Cη − πc(X ; η, γ∗)
πc(X ; η, γ∗)
}{Y ∗(dη)−m(X ; η, β∗)}
]= E{Y ∗(dη)}+ E
[{Cη − πc(X ; η, γ∗)
πc(X ; η, γ∗)
}{Y ∗(dη)−m(X ; η, β∗)}
]• Hence the estimator is consistent if either
I π(X ; γ∗) = π(X )⇒ πc(X ; η, γ∗) = πc(X ; η)(propensity correct)
I Q(X ,A;β∗) = Q(X ,A)⇒ m(X ; η, β∗) = m(X ; η)(regression correct )
I Double robustness
57/118
Value Search Estimators
Result: Estimators ηopt for ηopt obtained by maximizingVIPWE (dη) or VAIPWE (dη) in η
• Estimated optimal restricted regime doptη (x) = d(x ; ηopt )
• Non-smooth functions of η; must use suitable optimizationtechniques
• Estimators for V (doptη ) = E{Y ∗(dopt
η )}
VIPWE (doptη,IPWE ) or VAIPWE (dopt
η,AIPWE )
Can calculate standard errors• Semiparametric theory : AIPWE is more efficient than
IPWE for estimating V (dη) = E{Y ∗(dη)}• =⇒ Estimating regimes based on AIPWE should be
“better ”
58/118
Simulation Studies
Extensive simulations: One representative scenario• True E(Y |X ,A) of form
Qt (X ,A;β) = exp{β0+β1X 21 +β2X 2
2 +β3X1X2+A(β4+β5X1+β6X2)}
• Misspecified model for E(Y |X ,A)
Qm(X ,A;β) = β0 + β1X1 + β2X2 + A(β3 + β4X1 + β5X2)
• =⇒ Dη = {I(η0 + η1X1 + η2X2 > 0)}, dopt ∈ Dη• True propensity score
logit{πt (X ; γ)} = γ0 + γ1X 21 + γ2X 2
2
• Misspecified propensity score
logit{πm(X ; γ)} = γ0 + γ1X1 + γ2X2
59/118
Simulation Studies
Here: Both the correct and misspecified outcome regressionmodels define a class of regimes
Dη = {I(η0 + η1x1 + η2x2 > 0)}
so that dopt ∈ Dη• Other scenarios with dopt /∈ Dη• 1000 Monte Carlo data sets with n = 500
60/118
Simulation Studies
• Truth: V (dopt ) = E{Y ∗(dopt )} = 3.71
• V (dopt ) obtained for each data set using 106 Monte Carlosimulations
Method V (dopt ) SE Cov V (dopt )
REG-Qt 3.70 (0.14) – – 3.71 (0.00)REG-Qm 3.44 (0.18) – – 3.27 (0.19)
PS correct
IPWE 4.01 (0.26) 0.28 86.1 3.63 (0.07)AIPWE-Qt 3.72 (0.15) 0.15 94.7 3.70 (0.01)AIPWE-Qm 3.85 (0.21) 0.23 91.8 3.66 (0.07)
PS incorrect
IPWE 4.06 (0.22) 0.23 69.4 3.42 (0.20)AIPWE-Qt 3.72 (0.15) 0.15 95.2 3.70 (0.01)AIPWE-Qm 3.81 (0.18) 0.19 94.1 3.57 (0.20)
61/118
Simulation Studies
Performance: Empirical CDFs of over 1000 data sets ofV (dopt )/V (dopt ) under true and misspecified models
62/118
Application: NSABP Trial
Recall: Two treatment options, n = 1276• A = 0 if PFT (c2), = 1 if PF (c1)• Y = 1 if patient survived disease-free to 3 years, = 0
otherwise• x = {age, log(PR+1)}=(x1, x2)
• Consider regimes of the form
d(x ; η) = I(age < η0 and PR < η1)
• Gail and Simon : η0 = 50, η1 = 10
63/118
Application: NSABP Trial
IPWE and AIPWE:• Propensity: π = n−1∑n
i=1 Ai (randomized trial )• Outcome regression: With expit(u) = eu/(1 + eu)
Q(x ,a;β) = expit{β0 + β1x1 + β2x2 + a(β3 + β4x1 + β5x2)}
Fit by logistic regression
Results: Regimes of form
d(x ; η) = I(age < η0 and PR < η1)
• Gail and Simon : η0 = 50, η1 = 10• Estimated optimal regimes :
ηopt0 ηopt
1 V (doptη ) (95% CI)
IPWE 56 5 0.68 (0.64,0.72)AIPWE 60 9 0.69 (0.65,0.72)
64/118
Discussion
• Two approaches to estimation of optimal regimes for asingle decision point
• Regression methods – estimate an optimal regime basedon a posited regression model
• Value search methods – estimate an optimal treatmentregime within a specified class by maximizing the value
• Robustness to misspecification (AIPWE)• Both methods may be extended to multiple decision points
(later)• Next: Alternative classification perspective for single
decision
65/118
References
Gail, M. and Simon, R. (1985). Testing for qualitativeinteractions between treatment effects and patient subsets.Biometrics 41, 361–372.
Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2012).A robust method for estimating optimal treatment regimes.Biometrics 68, 1010–1018.
66/118
BreakPlease return by 10:20 am
67/118
Course Outline
Introduction to Personalized Medicine and Dynamic TreatmentRegimes
Estimation of Optimal Treatment Regimes for a Single Decision
Estimation of Optimal Treatment Regimes from a ClassificationPerspective
SMARTs and Estimation of Optimal Treatment Regimes forMultiple Decisions
68/118
Classification Methods
Generic classification situation:• Z = outcome , class , label ; here, Z = {0, 1} (binary )• X = vector of covariates, features taking values in X , the
feature space• d is a classifier: d : X → {0, 1}• D is a family of classifiers , e.g.,
I Hyperplanes of the form
I(η0 + η1X1 + η2X2 > 0)
I Rectangular regions of the form
I(X1 < a1) + I(X1 ≥ a1,X2 < a2)
69/118
Classification Methods
Generic classification problem:• Training set: (Xi ,Zi), i = 1, . . . ,n• Find classifier d ∈ D that minimizes
I Classification error
n∑i=1
{Zi − d(Xi )}2
I Weighted classification error
n∑i=1
wi{Zi − d(Xi )}2
70/118
Classification Methods
Approaches:• This problem has been studied extensively by statisticians
and computer scientists• Machine learning (supervised learning)• Many methods and software are available• Recursive partitioning (CART ): Rectangular regions• Support vector machines: Hyperplanes, etc.
71/118
Value Search Estimators, Revisited
Recall: Estimation of dη ∈ restricted class Dη
ηopt = arg maxηV (dη) = arg maxηE{Y ∗(dη)}
• Doubly robust AIPWE
VAIPWE (dη) = n−1n∑
i=1
{Cη,iYi
πc(Xi ; η, γ)−
Cη,i − πc(Xi ; η, γ)
πc(Xi ; η, γ)m(Xi ; η, β)
}
Cη,i = Aid(Xi ; η) + (1− Ai){1− d(Xi ; η)}πc(Xi ; η, γ) = π(Xi ; γ)d(Xi ; η) + {1− π(Xi ; γ)}{1− d(Xi ; η)}m(Xi ; η, β) = Q(Xi ,1; β)d(Xi ; η) + {1−Q(Xi ,0, β)}{1− d(Xi ; η)}
72/118
Value Search Estimators, Revisited
Algebra: VAIPWE (dη) may be rewritten as
n−1n∑
i=1
d(Xi ; η)C(Xi) + terms not involving d
C(Xi) =
{AiYi
π(Xi , γ)− Ai − π(Xi , γ)
π(Xi ; γ)Q(Xi ,1; β)
}−
{(1− Ai)Yi
1− π(Xi ; γ)+
Ai − π(Xi ; γ)
1− π(Xi ; γ)Q(Xi ,0; β)
},
• The contrast function is
E{C(Xi)|Xi} ≈ C(Xi) = Q(Xi ,1)−Q(Xi ,0)
73/118
Contrast Function
E{C(Xi)|Xi} ≈ C(Xi) = Q(Xi ,1)−Q(Xi ,0)
Result: C(Xi) can be viewed as an estimator for the contrastfunction for subject i• If we knew the functions Q(Xi ,1) and Q(Xi ,0), we should
assign treatment
I{C(Xi) > 0} = I{Q(Xi ,1)−Q(Xi ,0) > 0}
to patient i .
74/118
Classification Perspective
ηopt = arg maxηn∑
i=1
d(Xi ; η)C(Xi)
Further algebra: Another identity
d(Xi ; η)C(Xi) = −|C(Xi)|[I{C(Xi) > 0} − d(Xi ; η)]2
+ |C(Xi)|I{C(Xi) > 0}
• Hence
ˆηopt = arg minηn∑
i=1
|C(Xi)| [I{C(Xi) > 0} − d(Xi ; η)]2,
75/118
Classification Perspective
ηopt = arg minηn∑
i=1
|C(Xi)| [I{C(Xi) > 0} − d(Xi ; η)]2
Alternative formulation: This can be viewed as a weightedclassification problem with• Label I{C(Xi) > 0}• Classifier d(Xi ; η)
• Weight |C(Xi)|
76/118
NSABP Trial, Revisited
Recall: Two treatment options, n = 1276• A = 0 if PFT (c2), = 1 if PF (c1)• Y = 1 if patient survived disease-free to 3 years, = 0
otherwise• x = {age, log(PR+1)}=(x1, x2)
AIPWE: Classification formulation• Propensity: π = n−1∑n
i=1 Ai (randomized trial )• Outcome regression: With expit(u) = eu/(1 + eu)
Q(x ,a;β) = expit{β0 + β1x1 + β2x2 + a(β3 + β4x1 + β5x2)}
Fit by logistic regression
77/118
NSABP Trial, Revisited
Classifier: Recursive partitioning (CART ):• R function rpart with default settings except for weights|C(Xi)| and labels I{C(Xi) > 0}
Results: Regime of the form
d(x ; η) = I(age < η0 and PR < η1)
ηopt0 ηopt
1 V (doptη ) (95% CI)
IPWE 56.0 5.0 0.68 (0.64,0.72)AIPWE 60.0 9.0 0.69 (0.65,0.72)CART 59.5 16.5 0.68 (0.65,0.72)
78/118
Discussion
• Estimation of optimal regime using “off-the-shelf ”classification methods
• Estimated contrast functions constructed independently ofclass of regimes
• Form of estimated optimal regime determined byclassification method
• Extension to multiple decisions ongoing
79/118
References
Zhang, B., Tsiatis, A. A., Davidian, M., Zhang, M., and Laber, E.B. (2012). Estimating optimal treatment regimes from aclassification perspective. Stat 1, 103–114.
Zhao, Y., Zeng, D., Rush, A. J., and Kosorok, M. R. (2012).Estimating individualized treatment rules using outcomeweighted learning. Journal of the American StatisticalAssociation 107, 1106–1118.
80/118
Course Outline
Introduction to Personalized Medicine and Dynamic TreatmentRegimes
Estimation of Optimal Treatment Regimes for a Single Decision
Estimation of Optimal Treatment Regimes from a ClassificationPerspective
SMARTs and Estimation of Optimal Treatment Regimes forMultiple Decisions
81/118
Recap: Multiple Decision Points
Two decision points:• Decision 1 : Induction chemotherapy (options c1, c2)• Decision 2 : Maintenance treatment for patients who
respond (options m1, m2), Salvage chemotherapy for thosewho don’t (options s1, s2)
• Outcome : Survival time
82/118
Recap: Multiple Decision Points
• At baseline: Information x1 (accrued information h1 = x1)
• Decision 1: Two options {c1, c2}; rule 1: d1(x1)⇒ x1 → {c1, c2}• Between decisions 1 and 2: Collect additional information x2,
including responder status
• Accrued information h2 = {x1, chemotherapy at decision 1, x2}• Decision 2: Four options {m1,m2, s1, s2}; rule 2: d2(h2)⇒
h2 → {m1,m2} (responder), h2 → {s1, s2} (nonresponder)
• Regime : d = (d1,d2)
83/118
Recap: Multiple Decision Points
In general: K decision points• Baseline information x1, intermediate information xk
between decisions k − 1 and k , k = 2, . . . ,K• Set of treatment options at decision k ak ∈ Ak
• Accrued information h1 = x1 ∈ H1,
hk = {x1,a1, x2,a2, . . . , xk−1,ak−1, xk} ∈ Hk , k = 2, . . . ,K
• Decision rules d1(h1),d2(h2), . . . ,dK (hK ), dk : Hk → Ak
• Dynamic treatment regime d = (d1,d2, . . . ,dK )
• D is the set of all possible K -decision regimes
84/118
Recap: Optimal Regime for Multiple Decisions
Optimal regime: dopt ∈ D such that a patient with baselineinformation X1 = x1 who receives all K treatments according todopt has expected outcome as large as possible
Potential outcomes under a regime d ∈ D:• Baseline information X1, potential outcomes
X ∗2 (d), . . . ,X ∗K (d),Y ∗(d)
dopt satisfies:• E{Y ∗(d)} ≤ E{Y ∗(dopt )} for all d ∈ D• E{Y ∗(d)|X1 = x1} ≤ E{Y ∗(dopt )|X1 = x1} for all d ∈ D
and x1 ∈ H1
85/118
Recap: Delayed Effects
Recall: Can not “piece together” an optimal regime fromseveral separate studies• In the cancer example, the regime that uses the “best ”
options from separate studies of inductionchemotherapies , maintenance therapies , and salvagetherapies would not necessarily lead to dopt
• Delayed effects: The induction therapy with the highestproportion of responders might have other effects thatrender subsequent treatments less effective in regard tosurvival
• Result: Require data from a study involving the entiresequence of decisions
86/118
Estimation of Optimal Treatment Regimes
K decisions: Data
(X1i ,A1i ,X2i ,A2i , . . . ,X(K−1)i ,A(K−1)i ,XKi ,Yi), i = 1, . . . ,n
• X1i = Baseline information observed on subject i• Xki , k = 2, . . . ,K = intermediate information between
decisions k − 1 and k on subject i• Aki , k = 1, . . . ,K = observed treatment actually received
by subject i at decision k• Hi = accrued information for subject i up to decision k
H1i = X1i , Hki = (X1i ,A1i , . . . ,A(k−1)i ,Xki), k = 2, . . . ,K
• Yi = observed outcome for subject i ; can be ascertainedafter decision K or can be a function of X2i , . . . ,XKi
87/118
Data Sources
Goal: Under suitable assumptions , estimate dopt using thesedata
Studies:• Existing data from longitudinal observational studies,
previously conducted clinical trials with follow-up• Prospectively collected data from clinical trials conducted
specifically for this purpose
88/118
Assumptions
Single decision setting: Require the assumption of nounmeasured confounders• Treatment received is independent of potential outcomes
that would be achieved under all treatment options givenbaseline X
• Unverifiable for observational studies• Automatically satisfied in a randomized trial
89/118
Assumptions
Multiple decision setting: Require an analogous assumptionat all decision points• Sequential randomization assumption• Treatment received at each decision k = 1, . . . ,K is
independent of potential outcomes that would be achievedunder all treatment options at all decisions given accruedinformation Hk
• Unverifiable for longitudinal observational studies• Randomized trial ?
90/118
SMART
Randomized study: Sequential Multiple AssignmentRandomized Trial – SMART• Randomize subjects to treatment options that are feasible
given their accrued information at each decision point• Sequential randomization assumption is automatically
satisfied
91/118
SMART
Cancer
Response
NoResponse
Response
NoResponse
C
C
1
2
M1
M2
M1
M2
S1
S2
S1
S2
Randomization at •s
92/118
SMART
Embedded regimes: The SMART for the cancer exampleembeds 8 simple regimes for j , `,p = 1,2 of the form
“Give cj followed by m` if response, else sp if nonresponse”
• E.g., for j = 1, ` = 1, p = 1
d1(h1) = c1 for all h1
d2(h2) = m1 if response= s1 if nonresponse
• These embedded regimes use no baseline information atdecision 1 and response status only at decision 2
• By randomization , there will be subjects whose actualtreatment received is consistent with following at least oneof these embedded regimes
93/118
SMART
Result: Multi-purpose• Can estimate the value for each embedded regime and
formally compare (e.g., using suitable inverse probabilityweighted estimators)
• Can collect extensive, detailed information at baseline andintermediate to decisions 1 and 2 to inform estimation ofan optimal regime
SMART design principles:• Embed simple regimes• Base sample size on conventional comparisons, e.g.,
decision 1 treatments• Base sample size on embedded regimes (how?)• Open problem
94/118
Estimation of Optimal Treatment Regimes
Goal, restated: Estimate dopt satisfying• E{Y ∗(d)} ≤ E{Y ∗(dopt )} for all d ∈ D• E{Y ∗(d)|X1 = x1} ≤ E{Y ∗(dopt )|X1 = x1} for all d ∈ D
and x1 ∈ H1
Sequential randomization assumption: Data from• A SMART• A fabulous longitudinal observational study
For definiteness: Take K = 2 and Ak = {0,1}, k = 1,2• Recall accrued information
H1i = X1i , H2i = (X1i ,A1i ,X2i)
95/118
Characterizing the Optimal Regime
Optimal regime dopt : Follows from backward induction(dynamic programming )• Formally in terms of potential outcomes• Sequential randomization assumption allows equivalent
expressions in terms of observed data (X1,A1,X2,A2,Y )(as for single decision and no unmeasured confounders)
96/118
Characterizing the Optimal Regime
Optimal regime dopt : Backward induction• Decision 2: Q2(H2,A2) = E(Y |H2,A2)
dopt2 (h2) = I{Q2(h2,1) > Q2(h2,0)} = arg maxa2={0,1}Q2(h2,a2)
Y2(h2) = max{Q2(h2,0),Q2(h2,1)}
• Decision 1: Q1(H1,A1) = E{Y2(H2)|H1,A1}
dopt1 (h1) = I{Q1(h1,1) > Q1(h1,0)]} = arg maxa1={0,1}Q1(h1,a1)
Y1(h1) = max{Q1(h1,0),Q2(h1,1)}
• dopt = (dopt1 ,dopt
2 )
• The value of dopt is V (dopt ) = E{Y1(H1)}• Y2(h2) and Y1(h1) are referred to as the value functions
97/118
Q-Learning
Q-learning: May be thought of as a generalization of theregression estimator to sequential decisions• Reinforcement learning in computer science• Posit models for the “Q-functions ”• Involves some complications not present in the single
decision case
98/118
Q-Learning
Estimation of dopt :• Decision 2: Posit and fit a model Q2(H2,A2;β2) by
regressing Y on H2,A2 (e.g., least squares) and estimate
doptQ,2(h2) = I{Q2(h2,1; β2) > Q2(h2,0; β2)}
• For each i , form “predicted value ”
Y 2i = Y2i(H2i ; β2) = max{Q2(H2i ,0; β2),Q2(H2i ,1; β2)}
• Decision 1: Posit and fit a model Q1(H1,A1;β1) by
regressing Y 2 on H1,A1 (e.g., least squares) and estimate
doptQ,1(h1) = I{Q1(h1,1; β1) > Q2(h1,0; β1)}
• Estimated regime doptQ = (dopt
Q,1, doptQ,2)
99/118
Q-Learning
Estimation of V (dopt ):• For each i , form
Y 1i = Y1i(H1i ; β1) = max{Q1(H1i ,0; β1),Q1(H1i ,1; β1)}
• Estimate the value V (dopt ) by
VQ(dopt ) = n−1n∑
i=1
Y 1i
100/118
Q-Learning
Decision 2: Positing/fitting Q2(H2,A2;β2) is a standardregression problem• Data (H2i ,A2i ,Yi), i = 1, . . . ,n• For continuous Y , linear models are often used, e.g.
Q2(h2,a2;β2) = hT2 β21 + a2(hT
2 β22), h2 = (1,hT2 )T
• Standard model selection methods, diagnostics, etc• Under the linear model
dopt2 (h2) = I(hT
2 β22 > 0)
Y2(H2;β2) = HT2 β21 + (HT
2 β22)I(HT2 β22 > 0)
101/118
Q-Learning
Y2(H2;β2) = HT2 β21 + (HT
2 β22)I(HT2 β22 > 0)
Decision 1: It is common to assume a linear model forQ1(H1,A1) = E{Y2(H2;β2)|H1,A1}
Q1(h1,a1;β1) = hT1 β11 + a2(hT
1 β12), h1 = (1,hT1 )T
• “Data ” (H1i ,A1i ,Y 2i), i = 1 . . . ,n
• This is clearly a nonstandard regression problem; more ina moment
• Under the linear model (if it were correct )
dopt1 (h1) = I(hT
1 β12 > 0)
102/118
Q-Learning
Decision 1: Even if the Decision 2 linear model is correctlyspecified, a linear model at Decision 1 is almost certainlyincorrect• In truth Q2(h2,a2) = −x1 + a2(0.5a1 + x2) and
X2|X1 = x1,A1 = a1 ∼ N (ξx1, σ2)
• May show that
Q1(h1,a1) = −x1 +σ√2π
exp{−(0.5a1 + ξx1)2/(2σ2)}
+(0.5a1 + ξx1)Φ{(0.5a1 + ξx1)/σ}
• Not linear in general
103/118
Q-Learning
Issues and challenges:• Need not use linear models• Regardless, as in the single decision case, incorrect model
specification will impact quality of estimation of dopt
• Modeling at decisions K − 1, . . . ,1 challenging due to needto model max
• More flexible models for Q-functions can be used• Because of nonsmooth max operator, standard asymptotic
theory is invalid• Considerable current research
104/118
Other Regression Methods
A-learning: Related, alternative class of methods• g-estimation• In general, for k = 1, . . . ,K
doptk (hk ) = I{Qk (hk ,1) > Qk (hk ,0)} = I{Qk (hk ,1)−Qk (hk ,0) > 0}
depends only on the sign of the contrast function
Qk (hk ,1)−Qk (hk ,0)
• Model the contrast functions instead of the full Q-functions• Along with propensity scores for each decision known in a
SMART)
105/118
Value Search Methods
Generalization to K > 1:• Consider directly a restricted class of regimes Dη with
elements dη = (dη,1, . . . ,dη,K ); at decision k
dη,k (hk ) = dk (hk ; ηk )
• Based on cost , feasibility , interpretability at each decision• Optimal restricted regime dopt
η
ηopt = arg maxηE{Y ∗(dη)} = arg maxηV (dη)
• Estimator V (dη) for fixed η; maximize in η to obtain ηopt
• Required: A “good ” V (dη)
106/118
Value Search Methods
Extend missing data analogy to monotone dropout: K = 2• “Full data ”
{X1,X ∗2 (dη),Y ∗(dη)}• Define η-regime consistency indicator Cη
• Cη =∞: If a patient’s actual treatments A1,A2 are allconsistent with following dη, then
(X1,X2,Y ) = {X1,X ∗2 (dη),Y ∗(dη)}
• Cη = 2: If actual A1 is consistent with following dη but A2 isnot, then
(X1,X2) = {X1,X ∗2 (dη)}but Y ∗(dη) is “missing ” (“dropout ” before decision 2)
• Cη = 1: If neither of A1,A2 is consistent with following dη,both X ∗2 (dη),Y ∗(dη) are “missing ” (“dropout ” beforedecision 1)
107/118
Value Search Methods
Propensity scores: At decision k = 1, . . . ,K
πk (Hk ) = pr(Ak = 1|Hk )
• Randomized trial (SMART): πk (hk ) is known• Observational study: Posit and fit models πk (hk ; γk )
• Can express propensities of receiving treatment consistentwith dη through decision k in terms of πk (hk )
Result: Can develop IPWE and doubly-robust AIPWEestimators for V (dη) in terms of Cη and πk (hk )
108/118
Value Search Methods
Issues and challenges:• As for K = 1, is nonstandard optimization problem• IPWE (leading term for AIPWE) involves only subjects with
Cη =∞ (consistent with following regime for all Kdecisions )
• May become infeasible for K > 3• Simulation evidence: Performance comparable to
Q-learning with correct models; AIPWE is robust to modelmodel misspecification while Q-learning is not
109/118
Application: STAR*D
STAR*D: Sequenced Treatment Alternatives to RelieveDepression – SMART with ∼ 4000 subjects• Goal: Strategies for patients whose depressive disorder
does not benefit sufficiently from citalopram• Here: A simplified version• Run-in: All received citalopram• Decision 1: 1260 non-responders randomized to augment
(0) or switch (1)• Decision 2: 330 non-responders to decision 1 treatment
randomized to augment (0) or switch (1)• Response: Quick Inventory of Depressive
Symptomatology (QIDS) score
12-week QIDS < 5 (remission) OR > 50% decrease from entry
110/118
Simplified STAR*D schematic
Non-response to citalopram
Augment
Switch
Response
Non-Response
Response
Non-Response
Augment
Augment
Switch
Switch
Continue current
treatment
Continue current
treatment
Randomization at •s
111/118
Application: STAR*D
Data:• Baseline (at decision 1): X1 = (X11,X12), X11 = QIDS at
decision 1, X12 = QIDS slope from entry to decision 1• Decision 2: X2 = (X21,X22), X21 = QIDS at decision 2,
X22 = QIDS slope from decision 1 to decision 2• Treatment: Ak = 0,1, k = 1,2• Outcome: Cumulative average negative QIDS
Y = −I(X21 ≤ L0)X21 − I(X21 > L0)(X21 + T )/2
L0 = max(5,Xentry/2), T = ending QIDS
(function of accrued information )
112/118
Application: STAR*D
Y = −I(X21 ≤ L0)X21 − I(X21 > L0)(X21 + T )/2
L0 = max(5,Xentry/2), T = ending QIDS
Q-learning:• Decision 2:
Q2(H2,A2;β2) = −{I(X21 ≤ L0) + I(X21 > L0)/2}X21
+I(X21 > L0)(β20 + β21X21 + β22X22 + β23A2)/2
• Decision 1:
Q1(H1,A1) = β10 + β11X11 + β12X12 + A1(β13 + β14X12)
113/118
Application: STAR*D
Estimated regime:
doptQ,2(h2) = always switch
doptQ,1(h1) = I(x12 > −0.97) ( switch if QIDS slope > -0.97)
• VQ(dopt ) = −7.97
114/118
Discussion
• Two classes of methods for estimation of optimal regimesfor multiple decision points
• Q- and A-learning (sequential regression methods) –estimate an optimal regime based on sequential positedregression models
• Potential for model misspecification is high• Value search methods – robustness to misspecification• Limitation to small K due to need for “regime consistency ”
115/118
References
Lunceford, J., Davidian, M., and Tsiatis, A. A. (2002).Estimation of the survival distribution of treatment regimes intwo-stage randomization designs in clinical trials. Biometrics58, 48–57.
Murphy, S. A. (2005). An experimental design for thedevelopment of adaptive treatment strategies. Statistics inMedicine 24, 1455–1481.
Schulte, P. J., Tsiatis, A. A., Laber, E. B., and Davidian, M.(2014). Q- and A-learning methods for estimating optimaldynamic treatment regimes. Statistical Science, in press.
Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2013).Robust estimation of optimal dynamic treatment regimes forsequential treatment decisions. Biometrika 100, 681–694.
116/118
Closing Remarks
• Estimation of optimal treatment regimes is a wide openarea of research
• SMARTs are the “gold standard ” data source forestimation of optimal regimes
• Design considerations for SMARTs?• High-dimensional covariate information? Regression
model selection ?• “Black box ” vs. restricted class of regimes?• Inference ?• Balancing multiple outcomes (e.g., efficacy vs. toxicity )?• . . .
117/118
Thought Leaders
2013 MacArthur Fellow Susan Murphy and Jamie Robins
118/118
top related