TRANSCRIPT
Getting the best of both worlds: exploiting observational data to
maximise the relevance of RCTs for decision-making
Health Economics at Kings seminar, 14.10.2015
Richard [email protected]
Acknowledgements
• Collaborators: Jas Sekhon, Erin Hartman (UC Berkeley), Adam Steventon (Health Foundation), Noemi Kreif, Stephen O’Neill (LSHTM)
• Hartman E, Grieve R, Ramsahai R & Sekhon JS (2015). From SATE to PATT: combining experimental with observational studies to estimate population treatment effects. JRSS(A). doi: 10.1111/rssa.1209
• Steventon A, Grieve R, Bardsley M (2015). An approach to assess generalizability in comparative effectiveness research: a case study of the Whole Systems Demonstrator cluster randomized trial. Med Decis Mak. doi: 10.1177/0272989X15585131
• Team for Health Economics, Policy & Technology Assessment (THETA): http://theta.lshtm.ac.uk/
• Funding: Senior Research Fellowship, National Institute for Health Research
Research programme: CEA for target population
e.g. NIHR fellowship research programme: Analytical methods to improve the relevance of cost-effectiveness studies for decision-making (IMPROVE study)
A. External validity: exclusions from randomised study
   Output: reweight cost-effectiveness to target population
B. Non-compliance: switch to alternative treatment
   Output: present analytical framework for handling treatment switching in CEA
C. Missing data: questionnaires not returned
   Output: develop CEA methods to recognise data ‘missing not at random’
D. Confounding: observed and unobserved characteristics differ by treatment group
   Output: extend approaches allowing for unobserved confounding
Output A-D: modelling framework for accurate cost-effectiveness for target population
Context
• Policy-makers want cost-effectiveness evidence for relevant treatments and for the target population of interest
• Target population: patients eligible for treatment in practice
• Economic evaluation methods recommend the use of RCT data
• This assumes external validity without adequate justification
• What are we assuming?
• How can we test the underlying assumptions?
• What external data are required to test the assumptions, and improve external validity?
Context: what do we want to estimate? (see Imai et al, 2008)
Population versus sample effects
• Sample average treatment effect for the treated (SATT)
  – e.g. treatment effects for treated in RCT
• Population average treatment effect for the treated (PATT)
  – e.g. treatment effects for treated in target population
• SATT ≠ PATT if there is heterogeneity, or if the treatment (or control) differs between settings
• Patient and contextual characteristics differ across settings
• These characteristics may modify the relative effectiveness, costs and cost-effectiveness
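To illustrate why SATT and PATT can differ, the sketch below averages the same stratum-specific effects over two different case mixes. All numbers and names (`effect_by_stratum`, `rct_mix`, `target_mix`, `average_effect`) are hypothetical and chosen only for illustration; they are not results from the studies discussed in this talk.

```python
# Hypothetical stratum-specific treatment effects (e.g. difference in survival).
effect_by_stratum = {"teaching": 0.10, "non_teaching": 0.02}

# Hypothetical share of treated patients in each stratum, by setting.
rct_mix = {"teaching": 0.2, "non_teaching": 0.8}
target_mix = {"teaching": 0.5, "non_teaching": 0.5}


def average_effect(effects, mix):
    """Weight each stratum's effect by that stratum's share of treated patients."""
    return sum(effects[s] * mix[s] for s in effects)


satt = average_effect(effect_by_stratum, rct_mix)     # treated in the RCT
patt = average_effect(effect_by_stratum, target_mix)  # treated in the target population
print(f"SATT = {satt:.3f}, PATT = {patt:.3f}")
```

With a homogeneous effect (the same value in both strata) the two averages coincide, whatever the case mix.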
Context
• We propose a quantitative approach that combines RCT and non-randomised study (NRS) data
• Program evaluation literature on external validity (Hotz et al, 2005; Heckman and Vytlacil 2008; Imai et al 2008; Allcott 2014)
• Biostatistics literature on generalisability from trials (Stuart et al, 2011)
• Our approach builds on the causal inference literatures:
  – Harnesses large observational data (Electronic Medical Records (EMRs), population-wide disease registries and claims data)
  – Defines underlying assumptions
  – Reweights RCT estimates for target population
  – Tests underlying assumptions
  – Aims to give unbiased estimates for the target population
Structure of talk
• Running example I: simple clinical intervention
• Define key assumptions identifying population effects
• Method for testing them
• Example II: complex health service intervention
• Areas for further research
Running example: pulmonary artery catheterisation (PAC)
• Common invasive device for monitoring flow in intensive care units (ICUs)
• Routine practice, despite limited evidence (but strong beliefs)
• Highly influential observational study: PAC increases mortality
• UK multicentre RCT: PAC has no effect on survival, and is not cost-effective
• Concern that the seminal RCT lacked external validity
• Prospective non-randomised study (NRS)
• Accessed UK intensive care database of over 1.5 million admissions
• Included data from 50 centres, where patients had PAC (or no PAC)
• NRS had the same protocol, casemix, resource use and endpoints as the RCT
Intervention: pulmonary artery catheter (PAC). UK RCT and UK NRS: good overlap
            RCT                        NRS
Inclusion   General UK ICUs            General UK ICUs
            Admission 01-04            Admission 03-04
            Equipoise in centre        No equipoise required
            Consent                    No consent
            Might benefit from PAC     PACs: would benefit; No PACs: admitted to ICU
Exclusion   Specialist centres         Specialist centres
            Children, transplants      Children, transplants
N           506 PACs; 508 No PACs      1052 PACs; 31,447 No PACs
Characteristics and outcomes of PAC patients: RCT vs NRS
Variables                  RCT PAC (n=506)   NRS PAC (n=1,051)
Mean age                   64.2              61.9
% Elective surgical        6.3               9.3
% Emergency surgical       28.1              23.1
% Ventilated admission     88.9              86.2
% Teaching hospital        21.7              42.5
% In-hospital mortality    68.4              59.3
Mean hospital cost (£)     18,612            19,577
Strategy to maximise external validity of cost-effectiveness estimates from RCTs
Estimand of interest: PATT. Uses a single RCT and data on the target population from an NRS
General approach
1. Define the target population from observational data: patients eligible for treatment, taken as those who received treatment in the NRS
2. Estimate treatment effectiveness within the RCT: match controls to treated within the trial to maintain internal validity (SATT)
3. Reweight the RCT to the characteristics of the treated in the target population
4. Assess generalisability with a placebo test of the key assumptions: reweighted RCT vs target population
5. If the test passes, estimate PATT: treatment effectiveness after reweighting to the target population
Defining assumptions: terminology
• Y_{ist}: potential outcomes for patient i, study sample s, treatment t
• Sample ‘assignment’ S: whether in RCT (s=1) or NRS (s=0)
• Two-arm comparison T: treatment (t=1) vs control (t=0)
• Set of covariates W, common to both RCT and NRS settings
• W explains sample assignment to RCT vs target population
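Identifying PATT from RCT
$$PATT = E(Y_{i01} - Y_{i00} \mid S_i = 0, T_i = 1)$$
$$PATT = E_{01}\{E(Y_i \mid W_i, S_i = 1, T_i = 1) - E(Y_i \mid W_i, S_i = 1, T_i = 0)\}$$
where E_{01} denotes the expectation over the distribution of W among the treated in the target population (S=0, T=1).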
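The identification result on this slide can be read as post-stratification: estimate the treated-versus-control contrast within each covariate stratum of the RCT, then average those contrasts over the covariate distribution of the treated in the target population. The following is a minimal sketch under that reading, with a single categorical covariate; the function name `patt_by_stratum` and the toy data are hypothetical, and every stratum seen in the target population is assumed to contain both arms in the RCT.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def patt_by_stratum(rct_rows, nrs_treated_strata):
    """rct_rows: (stratum, treated_flag, outcome) per RCT patient.
    nrs_treated_strata: stratum label for each treated patient in the NRS.
    """
    outcomes = defaultdict(lambda: {0: [], 1: []})
    for stratum, treated, outcome in rct_rows:
        outcomes[stratum][treated].append(outcome)
    # Inner expectations: E(Y | W, S=1, T=1) - E(Y | W, S=1, T=0) for each W.
    effect = {w: mean(arms[1]) - mean(arms[0]) for w, arms in outcomes.items()}
    # Outer expectation: average over W among the treated in the target population.
    return mean([effect[w] for w in nrs_treated_strata])


# Hypothetical toy data: binary outcome (1 = survived).
rct_rows = [
    ("teaching", 1, 1), ("teaching", 1, 0), ("teaching", 0, 0), ("teaching", 0, 0),
    ("non_teaching", 1, 0), ("non_teaching", 1, 1), ("non_teaching", 0, 0), ("non_teaching", 0, 1),
]
nrs_treated_strata = ["teaching", "teaching", "teaching", "non_teaching"]

patt = patt_by_stratum(rct_rows, nrs_treated_strata)
print(patt)
```

A stratum that appears in the target population but not in the RCT raises a `KeyError` here, which is a blunt reminder that the covariate distributions need to overlap.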
Identifying PATT from RCT: key assumptions
1. Treatment invariant to sample assignment (consistency)
Individual’s potential outcomes are the same for RCT or target population, e.g. for t=1:
$$Y_{i01} = Y_{i11}$$
2. Strong ignorability of sample assignment for treated
Potential outcomes are independent of sample assignment for the same W:
$$(Y_{01}, Y_{11}) \perp S \mid (W, T = 1)$$
Testing assumptions: placebo tests (Jones, Health Econ 2007)
• Test assumptions by comparing outcomes between the settings
• Are outcomes for the RCT treated, after reweighting, the same as for the NRS treated?
• Placebo tests: assess whether we recover a zero effect with the current model
• Null hypothesis: data are inconsistent with a valid research design
• Will fail to reject the null if:
  a) there is a lack of power
  b) we observe outcome differences between settings, because
     i) selection into the RCT is conditional on potential outcomes
     ii) treatment differs between settings
• i.e. a good result is a small mean difference and a low P value
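Because the null hypothesis here is that the design is invalid, the placebo test behaves like an equivalence test. A minimal sketch follows, assuming a two one-sided tests (TOST) procedure with a normal approximation; the function name `equivalence_p_value`, the margin of 0.10 and the input values are hypothetical and are not taken from the studies in this talk.

```python
from statistics import NormalDist


def equivalence_p_value(diff, se, margin):
    """P value for H0: |true difference| >= margin (design invalid).

    diff: observed difference in mean outcome, NRS treated minus reweighted RCT treated.
    se: standard error of that difference.
    margin: largest difference still regarded as equivalent (an assumption).
    """
    z = NormalDist()
    p_lower = 1 - z.cdf((diff + margin) / se)  # H0: true difference <= -margin
    p_upper = z.cdf((diff - margin) / se)      # H0: true difference >= +margin
    return max(p_lower, p_upper)


margin = 0.10  # hypothetical equivalence margin
p_precise = equivalence_p_value(diff=-0.03, se=0.02, margin=margin)
p_noisy = equivalence_p_value(diff=-0.03, se=0.20, margin=margin)
print(p_precise < 0.05, p_noisy < 0.05)
```

The same small difference gives a low P value when it is estimated precisely and a high one when the standard error is large, which is the sense in which lack of power leads to failing the placebo test.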
Estimation
Aim: adjust RCT data to target population
• Main approach uses maximum entropy reweighting (MaxEnt)
  – Reweight each stratum from the RCT to represent the target population
  – Can use aggregated data from the target population
  – Weights chosen according to constraints, e.g. covariate means
  – Harnesses a search algorithm: see Kapur and Kesavan (1992)
  – Of the possible distributions that satisfy the constraints, pick the one closest to uniform
  – Avoids imposing assumptions about the distribution of the weights
• Also consider regression and propensity score approaches
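As a rough illustration of maximum entropy reweighting with covariate-mean constraints, the sketch below finds weights of the form w_i proportional to exp(lambda . x_i) by plain gradient descent on the convex dual problem. This is not the search algorithm or the R implementation used in the study; the function name `maxent_weights`, the fixed step size (adequate for a few binary indicators) and the ten RCT patients are assumptions for illustration. The target means are the NRS PAC percentages for teaching hospital and elective surgery reported earlier in the talk.

```python
import math


def maxent_weights(covariates, target_means, steps=5000, step_size=1.0):
    """Weights summing to 1 that match the target means and stay closest to uniform."""
    k = len(target_means)
    centred = [[x[j] - target_means[j] for j in range(k)] for x in covariates]
    lam = [0.0] * k
    for _ in range(steps):
        scores = [sum(l * c for l, c in zip(lam, row)) for row in centred]
        top = max(scores)  # subtract the maximum for numerical stability
        unnorm = [math.exp(s - top) for s in scores]
        total = sum(unnorm)
        weights = [u / total for u in unnorm]
        # Dual gradient: weighted covariate means minus target means.
        grad = [sum(w * row[j] for w, row in zip(weights, centred)) for j in range(k)]
        lam = [l - step_size * g for l, g in zip(lam, grad)]
    return weights


# Hypothetical RCT patients: (teaching hospital, elective surgery) indicators.
rct_covariates = [(0, 0)] * 5 + [(1, 0)] * 2 + [(0, 1)] * 2 + [(1, 1)]
target_means = (0.425, 0.093)  # NRS PAC patients: 42.5% teaching, 9.3% elective

weights = maxent_weights(rct_covariates, target_means)
reweighted = [sum(w * x[j] for w, x in zip(weights, rct_covariates)) for j in range(2)]
print([round(m, 3) for m in reweighted])
```

Only aggregate target means are needed, and patients with identical covariates receive identical weights, which matches the idea of reweighting strata rather than individuals.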
Implementing approach in PAC case study
1. Within RCT, for each PAC patient find a matched control, to estimate SATT
2. Placebo tests: contrast weighted outcomes for RCT PAC versus NRS PAC
3. Reweight SATT using covariate information from the NRS, to estimate PATT
Report measures of uncertainty that allow for:
• correlation between costs and outcomes
• uncertainty in weighting to target population (Bickel and Savov, 2008)
Each aspect implemented in R
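One common way to let uncertainty measures reflect the correlation between costs and outcomes is to bootstrap whole patients, so that each patient's cost and QALY stay paired. The sketch below does this for incremental net benefit at £20,000 per QALY. It is written in Python purely for illustration (the study itself was implemented in R), it omits the reweighting step, which would need to be re-run inside each replicate to capture weighting uncertainty, and all data and names (`inb`, `bootstrap_interval`) are hypothetical.

```python
import random

WTP = 20_000  # willingness to pay per QALY (GBP)


def mean(values):
    return sum(values) / len(values)


def inb(treated, control, wtp=WTP):
    """Incremental net benefit: wtp * (difference in QALYs) - (difference in costs)."""
    d_cost = mean([c for c, _ in treated]) - mean([c for c, _ in control])
    d_qaly = mean([q for _, q in treated]) - mean([q for _, q in control])
    return wtp * d_qaly - d_cost


def bootstrap_interval(treated, control, reps=2000, seed=1):
    """Percentile interval; patients are resampled so (cost, QALY) pairs stay together."""
    rng = random.Random(seed)
    draws = sorted(
        inb(rng.choices(treated, k=len(treated)), rng.choices(control, k=len(control)))
        for _ in range(reps)
    )
    return draws[int(0.025 * reps)], draws[int(0.975 * reps) - 1]


# Hypothetical (cost in GBP, QALYs) per patient.
treated = [(18000, 0.50), (21000, 0.62), (15000, 0.40), (25000, 0.70), (19500, 0.55)]
control = [(17000, 0.48), (20000, 0.58), (14000, 0.45), (23000, 0.60), (18500, 0.50)]

point = inb(treated, control)
lower, upper = bootstrap_interval(treated, control)
print(round(point), lower < upper)
```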
[Figure: RCT (PAC) versus NRS (PAC). P values from t-tests on baseline characteristics before reweighting (axis from 0 to 1). Characteristics: age<57, 57<age<67, 67<age<75, age>75, elective surgery, emergency surgery, non-surgical, small unit, medium unit, large unit, teaching hospital]
[Figure: Reweighting the RCT for the NRS. P values from t-tests on baseline characteristics, before and after reweighting with MaxEnt (axis from 0 to 1). Characteristics: age<57, 57<age<67, 67<age<75, age>75, elective surgery, emergency surgery, non-surgical, small unit, medium unit, large unit, teaching hospital]
Placebo tests, mortality: NRS treated minus RCT treated after weighting by MaxEnt
Null hypothesis: the study design is invalid
                                  Mortality difference   P value   Power
Overall                           -3%                    0.05      96%
Subgroup: Teaching hospital       -4%                    0.12      27%
Subgroup: Non-teaching hospital   -3%                    0.05      85%
Subgroup: Non-surgical            -4%                    0.06      83%
Subgroup: Elective surgery        8%                     0.46      8%
P values corrected for multiple comparisons
[Figure: PATT (MaxEnt) versus SATT, mean difference in survival (PAC - no PAC). Horizontal axis from -0.4 to 0.8. Rows: SATT, overall, elective surgical, non-surgical, non-teaching, teaching]
[Figure: PATT (MaxEnt) versus SATT, incremental net benefits (PAC - no PAC) at £20,000 per QALY. Horizontal axis from -25000 to 100000. Rows: SATT, overall, elective surgical, non-surgical, non-teaching, teaching]
Interpretation and implications (I)
• In this example, placebo tests passed overall
• Approach extends previous work for improving external validity (Hotz et al, 2005; Imai et al 2008; Stuart et al, 2011)
• Defines and tests key assumptions
• Harnesses RCTs and large observational data
• Current approach applies to settings with full compliance
• Future work: identify complier-average causal effect for target population, under non-compliance
• What to do when assumptions fail: conclude that results lack external validity?
Example II: Telehealth
“the remote exchange of data between a patient and health care
professionals as part of the diagnosis and management of health care
conditions”
Telehealth devices enable items such as blood glucose level and weight to be
measured by the patient and transmitted to health care professionals
working remotely.
Whole System Demonstrator (WSD) trial: Telehealth vs ‘usual care’ (Steventon et al, 2012)
• Adult patients with diabetes, COPD, heart failure
• Cluster randomised design
• Randomised 179 GP practices, 3230 adults, 3 English counties
• Intervention: broad class of Telehealth devices
• Control: usual care, services available at the trial sites
• Blinding: patients at consent, not for recruiters
• Outcomes: emergency admissions, outpatient visits
WSD outcomes: emergency admissions
Percentage difference in emergency admissions: -20.6% (95% CI -33.8% to -7.4%)
WSD: interpretation
• BMJ paper: “..results suggest Telehealth helped patients avoid need for emergency admissions..”
• Department of Health: "We funded a three-year randomised control trial..which clearly demonstrated that if implemented appropriately, telehealth can reduce emergency admissions by 20%...”
• David Cameron: "We've trialled it, it's been a huge success, and now we're on a drive to roll this out nationwide.”
Generalising effectiveness from RCTs to target populations (see Hartman et al, 2015)
• Policy makers want: average treatment effect for target population
• RCT provides: sample average treatment effects
• For an RCT to provide unbiased effectiveness estimates for the target population, we assume:
  – RCT participants have similar characteristics to the target population
  – ‘intervention’ in the RCT is the same as in routine practice
  – ‘control’ in the RCT is the same as ‘usual care’ in routine practice
WSD: baseline characteristics
                                     Control group (N=1584)   Intervention group (N=1570)
Age                                  70.8 (11.7)              69.7 (11.6)
Female, no. (%)                      643 (40.6)               647 (41.2)
COPD, no. (%)                        786 (49.6)               739 (47.1)
Diabetes, no. (%)                    342 (21.6)               406 (25.9)
Heart failure, no. (%)               456 (28.8)               425 (27.1)
Mean (SD) no. chronic conditions     1.8 (1.80)               1.8 (1.78)
Mean (SD) Combined Model score       0.26 (0.20)              0.26 (0.20)
Did RCT control arm have usual care?
[Figure: Emergency admissions, before and after RCT]
WSD: re-analysis to improve generalisability
Step 1. Define target population from observational data: patients who met RCT inclusion criteria and received usual care (n=88,830)
Step 2. From target population, find individuals (n=1,293) who match RCT controls (n=1,293) on 65 baseline covariates
Step 3. Placebo test: contrast outcomes from matched target population (n=1,293) vs RCT controls (n=1,293) (A)
Step 4. Sensitivity analysis: re-estimate Telehealth effectiveness, RCT Telehealth arm (n=1,229) vs matched target population (n=1,229) (B)
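Step 2 relies on finding, for each RCT control, a similar individual in the target population. The slides do not say which matching algorithm was used, so the sketch below swaps in a deliberately simple one: greedy 1:1 nearest-neighbour matching without replacement on covariates scaled by their pooled standard deviation. The function name `match_to_controls`, the two covariates and the data are hypothetical.

```python
import statistics


def match_to_controls(rct_controls, population):
    """Return (control_index, population_index) pairs, one per RCT control."""
    k = len(rct_controls[0])
    pooled = rct_controls + population
    # Scale each covariate by its pooled standard deviation (1.0 if constant).
    scale = [statistics.pstdev([row[j] for row in pooled]) or 1.0 for j in range(k)]
    available = list(range(len(population)))
    pairs = []
    for i, unit in enumerate(rct_controls):
        def distance(p):
            return sum(((unit[j] - population[p][j]) / scale[j]) ** 2 for j in range(k))
        best = min(available, key=distance)
        pairs.append((i, best))
        available.remove(best)  # without replacement
    return pairs


# Hypothetical covariates: (age, has_copd).
rct_controls = [(70, 1), (55, 0)]
population = [(71, 1), (40, 0), (56, 0), (69, 0)]

pairs = match_to_controls(rct_controls, population)
print(pairs)
```

Greedy matching depends on the order of the controls and can be far from optimal with many covariates; it is shown only to make the step concrete.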
WSD: placebo results
Placebo test (A): RCT (control) vs matched target population (usual care)
Incidence rate ratios
Emergency admissions per head     1.22 (1.05, 1.43)
Outpatient attendances per head   1.03 (0.94, 1.13) a
a: passes placebo test
From Steventon et al. (2015).
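To make the incidence rate ratios concrete, the sketch below computes a ratio with a Wald interval on the log scale and then checks whether an interval lies inside an equivalence range. It assumes simple Poisson counts and ignores the clustering of the trial, so it is not the analysis behind the table; the event counts, the range of 0.80 to 1.25 and the function names are hypothetical. The two intervals passed to `passes_placebo` are the ones reported in the table.

```python
import math
from statistics import NormalDist


def incidence_rate_ratio(events_a, n_a, events_b, n_b, level=0.95):
    """Rate ratio (a vs b) with a Wald interval on the log scale, assuming Poisson counts."""
    irr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return irr, irr * math.exp(-z * se_log), irr * math.exp(z * se_log)


def passes_placebo(lower, upper, range_lo, range_hi):
    """True when the whole interval lies inside the equivalence range."""
    return range_lo <= lower and upper <= range_hi


# Hypothetical counts: 600 admissions among 1,000 people vs 500 among 1,000.
irr, lower, upper = incidence_rate_ratio(600, 1000, 500, 1000)
print(round(irr, 2), lower < irr < upper)

EQUIVALENCE_RANGE = (0.80, 1.25)  # hypothetical range, not taken from the source
emergency_ok = passes_placebo(1.05, 1.43, *EQUIVALENCE_RANGE)
outpatient_ok = passes_placebo(0.94, 1.13, *EQUIVALENCE_RANGE)
print(emergency_ok, outpatient_ok)
```

Under this assumed range, only the outpatient attendances interval sits wholly inside it, in line with the footnote marking that outcome as passing.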
WSD: re-analyses
(A) Placebo test: RCT control arm vs target population (usual care)
(B) Estimated effect of Telehealth: RCT Telehealth arm vs target population (usual care)
(C) Estimated effect of Telehealth: RCT Telehealth arm vs RCT control arm b
Incidence rate ratios
                                  (A)                   (B)                 (C)
Emergency admissions per head     1.22 (1.05, 1.43)     1.12 (0.95, 1.31)   0.90 (0.77, 1.05)
Outpatient attendances per head   1.03 (0.94, 1.13) a   1.04 (0.95, 1.14)   1.02 (0.93, 1.12)
a: passes placebo test, point estimate and confidence interval lie within range
b: we use data from 1229 in the Telehealth arm and 1229 in the control arm to ensure consistency across the re-analyses. See Steventon et al (2015) for details.
Interpretation and implications
• Framework for testing assumptions for generalisability from RCT
• For ‘simple’ interventions, placebo tests satisfied (Hartman et al, 2015)
• Why do generalisability assumptions fail for Telehealth vs usual care?
• RCT control arm did not receive usual care, or differences in unobserved characteristics vs target population?
• RCT participation encouraged the control arm to seek more help
• More generally, better outcomes in RCT control arm vs usual care
• If RCT controls do not receive usual care, other designs are required
  – RCT assumes: control therapy = usual care and no unobserved confounders
  – Matched analysis assumes no unobserved confounding
Discussion and future work
• Framework for providing relevant estimates of treatment effectiveness for target populations
• Lack of evidence on patient/public/clinician preferences or beliefs
• Big data technology gives the opportunity to survey large numbers
• Insight into behavioural dimensions
  – Example: Health Economics MOdelling (HEMO) for maintaining blood supply
  – RCT (n=50,000), observational data (n=2 million) and stated preference survey (n=100,000)
• More accurate predictions of how effective and cost-effective a complex intervention will be in practice
• Other areas: phase II and III RCTs
• Evidence from several RCTs
References
• Hartman E, Grieve R, Ramsahai R & Sekhon JS (2015). From SATE to PATT: combining experimental with observational studies to estimate population treatment effects. JRSS(A). doi: 10.1111/rssa.1209
• Heckman et al (1998). Characterizing selection bias using experimental data. Econometrica 66(Sep): 1017-1098.
• Hellerstein and Imbens (1999). Imposing moment restrictions from auxiliary data by weighting. Review Econ and Statistics 81(1): 1-14.
• Hotz et al (2005). Predicting the efficacy of future training programs using past experiences at other locations. Journal of Econometrics 125(1-2): 241-70.
• Imai et al (2008). Misunderstandings among experimentalists and observationalists about causal inference. JRSS(A) 171(2): 481-502.
• Steventon A, Grieve R, Bardsley M (2015). An approach to assess generalizability in comparative effectiveness research: a case study of the Whole Systems Demonstrator cluster randomized trial. Med Decis Mak. doi: 10.1177/0272989X15585131
• Stuart et al (2011). The use of propensity scores to assess the generalisability of results from randomised trials. JRSS(A) 174: 369-86.