Getting the best of both worlds: exploiting observational data to maximise the relevance of RCTs for decision-making

Health Economics at Kings seminar, 14.10.2015

Richard [email protected]

Acknowledgements

• Collaborators: Jas Sekhon, Erin Hartman (UC Berkeley), Adam Steventon (Health Foundation), Noemi Kreif, Stephen O’Neill (LSHTM)
• Hartman E, Grieve R, Ramsahai R & Sekhon JS (2015). From SATE to PATT: combining experimental with observational studies to estimate population treatment effects. JRSS(A) doi: 10.1111/rssa.1209
• Steventon A, Grieve R, Bardsley M (2015). An approach to assess generalizability in comparative effectiveness research: a case study of the Whole Systems Demonstrator cluster randomized trial. Med Decis Mak doi: 10.1177/0272989X15585131
• Team for Health Economics, Policy & Technology Assessment (THETA) http://theta.lshtm.ac.uk/
• Funding: Senior Research Fellowship, National Institute for Health Research

e.g. NIHR fellowship: research programme

Analytical methods to improve the relevance of cost-effectiveness studies for decision-making (IMPROVE study)

Research programme: CEA for target population

A. External validity (exclusions from randomised study): reweight cost-effectiveness to target population
B. Non-compliance (switch to alternative treatment): present analytical framework for handling treatment switching in CEA
C. Missing data (questionnaires not returned): develop CEA methods to recognise data ‘missing not at random’
D. Confounding (observed and unobserved characteristics differ by treatment group): extend approaches allowing for unobserved confounding

Output A-D: modelling framework for accurate cost-effectiveness for target population

Context

• Policy-makers want cost-effectiveness evidence for relevant treatments and for the target population of interest
• Target population: patients eligible for treatment in practice
• Economic evaluation methods recommend the use of RCT data
• External validity is assumed without adequate justification
• What are we assuming?
• How can we test the underlying assumptions?
• What external data are required to test the assumptions, and improve external validity?

Context: what do we want to estimate? (see Imai et al, 2008)

Population versus sample effects
• Sample average treatment effect for the treated (SATT)
– e.g. treatment effects for treated in RCT
• Population average treatment effect for the treated (PATT)
– e.g. treatment effects for treated in target population
• SATT≠PATT if treatment effects are heterogeneous, or if the treatment (or control) differs between settings
• Patient and contextual characteristics differ across settings
• These characteristics may modify the relative effectiveness, costs and cost-effectiveness

Context

• We propose a quantitative approach that combines an RCT and a non-randomised study (NRS)
• Program evaluation on external validity (Hotz et al, 2005; Heckman and Vytlacil 2008; Imai et al 2008; Allcott 2014)
• Biostatistics on generalisability from trials (Stuart et al, 2011)
• Our approach builds on the causal inference literatures:
• Harnesses large observational data (Electronic Medical Records (EMRs), population-wide disease registries and claims data)
• Defines underlying assumptions
• Reweights RCT estimates for target population
• Tests underlying assumptions
• Aims to give unbiased estimates for the target population

Structure of talk

• Running example I: simple clinical intervention
• Define key assumptions identifying population effects
• Method for testing them
• Example II: complex health service intervention
• Areas for further research

Running example: pulmonary artery catheterisation (PAC)

• Common invasive device for monitoring flow in intensive care units (ICUs)
• Routine practice, limited evidence (but strong beliefs)
• Highly influential observational study: PAC increases mortality
• UK multicentre RCT: PAC had no effect on survival, and was not cost-effective
• Concern that the seminal RCT lacked external validity
• Prospective non-randomised study (NRS)
• Accessed UK intensive care database of over 1.5 million admissions
• Included data from 50 centres, where patients had PAC (or no PAC)
• NRS had the same protocol, casemix, resource use and endpoints as the RCT

Intervention: pulmonary artery catheter (PAC). UK RCT and UK NRS: good overlap

Criterion | RCT | NRS
Inclusion | General UK ICUs | General UK ICUs
Inclusion | Admission 01-04 | Admission 03-04
Inclusion | Equipoise in centre | No equipoise required
Inclusion | Consent | No consent
Inclusion | Might benefit from PAC | PACs: would benefit; No PACs: admitted to ICU
Exclusion | Specialist centres | Specialist centres
Exclusion | Children, transplants | Children, transplants
N | 506 PACs; 508 No PACs | 1052 PACs; 31,447 No PACs

Characteristics and outcomes of PAC patients: RCT vs NRS

Variable | RCT PAC (n=506) | NRS PAC (n=1,051)
Mean age | 64.2 | 61.9
% Elective surgical | 6.3 | 9.3
% Emergency surgical | 28.1 | 23.1
% Ventilated at admission | 88.9 | 86.2
% Teaching hospital | 21.7 | 42.5
% In-hospital mortality | 68.4 | 59.3
Mean hospital cost (£) | 18,612 | 19,577

Strategy to maximise external validity of cost-effectiveness estimates from RCTs

Estimand of interest: PATT
Data: single RCT, and data on target population from NRS

1. Define target population according to those who received treatment in the NRS
2. Estimate effectiveness within RCT: match controls to treated within the trial to maintain internal validity (SATT)
3. Reweight RCT to characteristics of target population treated
4. Placebo test to test key assumptions
5. Estimate PATT

General approach

1. Target population defined from observational data: patients eligible for treatment
2. Estimate treatment effectiveness in RCT
3. Reweight RCT to target population
4. Assess generalisability with placebo test: reweighted RCT vs target population
5. If test passes, estimate treatment effectiveness after reweighting to target population

Defining assumptions: terminology

• $Y_{ist}$: potential outcomes for patient i, study sample s, treatment t
• Sample ‘assignment’: whether in RCT (s=1) or NRS (s=0)
• Two-arm comparison: treatment (t=1) vs control (t=0)
• Set of covariates, W, common to both RCT and NRS settings
• W explains sample assignment to RCT vs target population

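Identifying PATT from RCT

$$ PATT = E\left( Y_{i01} - Y_{i00} \mid S_i = 0, T_i = 1 \right) $$

$$ PATT = E_{01}\left\{ E\left( Y_i \mid W_i, S_i = 1, T_i = 1 \right) - E\left( Y_i \mid W_i, S_i = 1, T_i = 0 \right) \right\} $$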

Identifying PATT from RCT: key assumptions

1. Treatment invariant to sample assignment (consistency)
An individual’s potential outcomes are the same whether in the RCT or the target population, e.g. for t=1:

$$ Y_{i01} = Y_{i11} $$

2. Strong ignorability of sample assignment for treated
Potential outcomes are independent of sample assignment for the same W:

$$ (Y_{01}, Y_{11}) \perp S \mid (W, T = 1) $$
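The identification of the PATT from RCT data can be illustrated with a simple stratified version: estimate the treatment effect within each covariate stratum of W in the RCT, then average those effects using the stratum shares of treated patients in the target population. The sketch below is illustrative only. The function name, the two strata and all numbers are hypothetical, and the talk's main estimator is maximum entropy reweighting rather than this post-stratification.

```python
def stratified_patt(rct_rows, target_treated_counts):
    """Post-stratified PATT estimate (illustrative sketch).

    rct_rows: list of (stratum, treated, outcome) tuples from the RCT,
              with treated coded 1 (treatment) or 0 (control).
    target_treated_counts: dict mapping stratum -> number of treated
              patients in the target population (e.g. from the NRS).
    """
    # Accumulate outcome sums and counts per stratum and arm.
    sums = {}
    for stratum, treated, outcome in rct_rows:
        arms = sums.setdefault(stratum, {0: [0.0, 0], 1: [0.0, 0]})
        arms[treated][0] += outcome
        arms[treated][1] += 1

    n_target = sum(target_treated_counts.values())
    patt = 0.0
    for stratum, n in target_treated_counts.items():
        arms = sums[stratum]
        # Within-stratum effect in the RCT: treated mean minus control mean.
        effect = arms[1][0] / arms[1][1] - arms[0][0] / arms[0][1]
        # Weight by the stratum's share of the target treated population.
        patt += (n / n_target) * effect
    return patt


# Hypothetical data: survival (1 = survived) in two hospital-type strata.
rct = (
    [("teaching", 1, y) for y in (1, 1, 0, 1)]
    + [("teaching", 0, y) for y in (1, 0, 0, 1)]
    + [("non_teaching", 1, y) for y in (0, 1, 0, 0)]
    + [("non_teaching", 0, y) for y in (1, 0, 0, 1)]
)
target = {"teaching": 40, "non_teaching": 60}
print(stratified_patt(rct, target))
```

With heterogeneous stratum effects, as in this made-up data, the reweighted estimate differs from the unweighted RCT estimate, which is the SATT versus PATT distinction.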

Testing assumptions: placebo tests (Jones, Health Econ 2007)

• Test assumptions by comparing outcomes between the settings
• Are outcomes for the RCT treated, after reweighting, the same as for the NRS treated?
• Placebo tests: assess whether we recover a zero effect with the current model
• Null hypothesis: data are inconsistent with a valid research design
• Will accept the null if:
a) lack of power
b) outcome differences are observed between settings:
i) selection into RCT conditional on potential outcomes
ii) treatment differs between settings
• i.e. a good result is a small mean difference and a low P value
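A test whose null hypothesis is "the design is invalid", and where a low P value is the good result, can be set up as an equivalence test. The sketch below uses two one-sided tests (TOST) with a normal approximation on the difference between the NRS treated mean and the reweighted RCT treated mean. This is an assumed stand-in, not necessarily the exact procedure used in the talk. The function names, the equivalence margin and the data are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def weighted_mean_and_var(y, w):
    """Weighted mean of y, and an approximate variance of that mean
    (weights are normalised to sum to one and treated as fixed)."""
    total = sum(w)
    p = [wi / total for wi in w]
    mean = sum(pi * yi for pi, yi in zip(p, y))
    var = sum(pi ** 2 * (yi - mean) ** 2 for pi, yi in zip(p, y))
    return mean, var


def placebo_tost(y_rct, w_rct, y_nrs, margin):
    """Equivalence (TOST) placebo test.

    Null hypothesis: |mean(NRS treated) - weighted mean(RCT treated)| >= margin.
    Returns (difference, p_value); a small p_value supports equivalence.
    margin is an analyst-chosen equivalence margin (an assumption here).
    """
    m_rct, v_rct = weighted_mean_and_var(y_rct, w_rct)
    m_nrs, v_nrs = weighted_mean_and_var(y_nrs, [1.0] * len(y_nrs))
    diff = m_nrs - m_rct
    se = sqrt(v_rct + v_nrs)
    z = NormalDist()
    p_lower = 1.0 - z.cdf((diff + margin) / se)  # tests H0: diff <= -margin
    p_upper = z.cdf((diff - margin) / se)        # tests H0: diff >= +margin
    return diff, max(p_lower, p_upper)


# Hypothetical binary mortality outcomes (1 = died).
rct_outcomes = [0, 1] * 200
rct_weights = [1.0] * 400          # e.g. weights from a reweighting step
nrs_outcomes = [0, 1] * 200
print(placebo_tost(rct_outcomes, rct_weights, nrs_outcomes, margin=0.2))
```

Because the null is non-equivalence, an underpowered comparison fails the test even when the point estimate is close to zero, which matches the "lack of power" caveat on the slide.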

Estimation

AIM: adjust RCT data to target population
Main approach uses maximum entropy reweighting (MaxEnt)

– Reweights each stratum from the RCT to represent the target population
– Can use aggregated data from the target population
– Weights chosen according to constraints, e.g. covariate means
– Harnesses a search algorithm: see Kapur and Kesavan (1992)
– Of the possible distributions that satisfy the constraints, picks the one closest to uniform
– Avoids imposing assumptions about the distribution of the weights

• Also consider regression and propensity score approaches
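The maximum entropy idea can be sketched as follows: among all weights on the RCT patients whose weighted covariate means equal the target population means, choose those closest to uniform. The solution has the form w_i proportional to exp(lambda · x_i). The sketch below finds lambda by plain gradient descent on the dual problem, which is a simple stand-in for the search algorithm cited on the slide. The function name, covariates and target means are hypothetical.

```python
from math import exp


def maxent_weights(X, target_means, step=1.0, max_iter=5000, tol=1e-10):
    """Maximum entropy weights matching covariate means (illustrative).

    X: list of covariate rows for RCT patients (here binary indicators).
    target_means: covariate means in the target population; aggregate
                  data are enough, no patient-level target data needed.
    Returns weights that sum to one.
    """
    k = len(target_means)
    lam = [0.0] * k
    for _ in range(max_iter):
        # Weights implied by the current multipliers: w_i ∝ exp(lam · (x_i - m)).
        raw = [
            exp(sum(l * (x - m) for l, x, m in zip(lam, row, target_means)))
            for row in X
        ]
        total = sum(raw)
        w = [r / total for r in raw]
        # Gradient of the dual = weighted covariate means minus target means.
        grad = [
            sum(wi * (row[j] - target_means[j]) for wi, row in zip(w, X))
            for j in range(k)
        ]
        if max(abs(g) for g in grad) < tol:
            break
        lam = [l - step * g for l, g in zip(lam, grad)]
    return w


# Hypothetical RCT sample: columns are (teaching hospital, elective surgery).
X = [[0, 0]] * 40 + [[1, 0]] * 20 + [[0, 1]] * 30 + [[1, 1]] * 10
target = [0.425, 0.1]  # hypothetical target population means
w = maxent_weights(X, target)
print(sum(wi * row[0] for wi, row in zip(w, X)))  # reweighted teaching share
```

The fixed step size is adequate for a few bounded covariates like these; a production implementation would use a more robust optimiser.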

Implementing approach in PAC case study

1. Within RCT, for each PAC patient find a matched control, to estimate SATT
2. Placebo tests: contrast weighted outcomes of RCT PAC versus NRS PAC
3. Reweight SATT using covariate information from NRS, to estimate PATT

Report measures of uncertainty that allow for:
– correlation between costs and outcomes
– uncertainty in weighting to target population (Bickel and Savov, 2008)

Each aspect implemented in R

RCT (PAC) versus NRS (PAC): P values on baseline characteristics before reweighting

[Figure: P values from t-tests (axis 0 to 1) for each baseline characteristic: age<57, 57<age<67, 67<age<75, age>75, elective surgery, emergency surgery, non-surgical, small unit, medium unit, large unit, teaching hospital.]

Reweighting the RCT for the NRS baseline characteristics: before and after reweighting with MaxEnt

[Figure: P values from t-tests (axis 0 to 1), before reweighting and after reweighting, for each baseline characteristic: age<57, 57<age<67, 67<age<75, age>75, elective surgery, emergency surgery, non-surgical, small unit, medium unit, large unit, teaching hospital.]

Placebo tests, mortality: NRS treated minus RCT treated, after weighting by MaxEnt

Null hypothesis: the study design is invalid

Group | Mortality difference | P value | Power
Overall | -3% | 0.05 | 96%
Subgroup: teaching hospital | -4% | 0.12 | 27%
Subgroup: non-teaching hospital | -3% | 0.05 | 85%
Subgroup: non-surgical | -4% | 0.06 | 83%
Subgroup: elective surgery | 8% | 0.46 | 8%

P values corrected for multiple comparisons

PATT (MaxEnt) versus SATT: mean difference in survival (PAC - no PAC)

[Figure: estimates on an axis from -0.4 to 0.8 for SATT and for PATT overall and by subgroup: elective surgical, non-surgical, non-teaching, teaching.]

PATT (MaxEnt) versus SATT: incremental net benefits (PAC - no PAC), at £20,000 per QALY

[Figure: estimates on an axis from -25,000 to 100,000 for SATT and for PATT overall and by subgroup: elective surgical, non-surgical, non-teaching, teaching.]
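For readers less familiar with the outcome measure, incremental net benefit is conventionally defined as below. The symbols are notation introduced here for illustration and do not appear on the slides: $\lambda$ is the willingness-to-pay threshold (£20,000 per QALY in this comparison), $\Delta E$ is the mean difference in QALYs (PAC minus no PAC) and $\Delta C$ is the mean difference in cost.

$$ INB = \lambda \, \Delta E - \Delta C $$

A positive INB favours PAC at that threshold, and a negative INB favours no PAC.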

Interpretation and implications (I)

• In this example, placebo tests passed overall
• Approach extends previous work on improving external validity (Hotz et al, 2005; Imai et al 2008; Stuart et al, 2011)
• Defines and tests key assumptions
• Harnesses RCTs and large observational data
• Current approach applies to settings with full compliance
• Future work: identify complier-average causal effect for target population, under non-compliance
• What to do when assumptions fail: conclude that results lack external validity?

Example II: Telehealth

“the remote exchange of data between a patient and health care professionals as part of the diagnosis and management of health care conditions”

Telehealth devices enable items such as blood glucose level and weight to be measured by the patient and transmitted to health care professionals working remotely.

Whole System Demonstrator Trial (WSD): Telehealth vs ‘usual care’ (Steventon et al, 2012)

• Adult patients with diabetes, COPD, heart failure
• Cluster randomised design
• Randomised 179 GP practices, 3230 adults, 3 English counties
• Intervention: broad class of Telehealth devices
• Control: usual care, services available at the trial sites
• Blinding: patients at consent, not for recruiters
• Outcomes: emergency admissions, outpatient visits

WSD outcomes: emergency admissions

Percentage difference in emergency admissions: -20.6% (95% CI -33.8% to -7.4%)

WSD: interpretation

• BMJ paper: “..results suggest Telehealth helped patients avoid need for emergency admissions..”

• Department of Health: "We funded a three-year randomised control trial..which clearly demonstrated that if implemented appropriately, telehealth can reduce emergency admissions by 20%...”

• David Cameron: "We've trialled it, it's been a huge success, and now we're on a drive to roll this out nationwide.”

Generalising effectiveness from RCTs to target populations (see Hartman et al, 2015)

• Policy makers want: average treatment effect for target population
• RCT: sample average treatment effects
• For the RCT to provide unbiased effectiveness estimates for the target population, it is assumed that:
– RCT participants have similar characteristics to the target population
– ‘intervention’ in the RCT is the same as in routine practice
– ‘control’ in the RCT is the same as ‘usual care’ in routine practice

WSD: baseline characteristics

Characteristic | Control group (N=1584) | Intervention group (N=1570)
Age | 70.8 (11.7) | 69.7 (11.6)
Female, no (%) | 643 (40.6) | 647 (41.2)
COPD, no (%) | 786 (49.6) | 739 (47.1)
Diabetes, no (%) | 342 (21.6) | 406 (25.9)
Heart failure, no (%) | 456 (28.8) | 425 (27.1)
Mean (SD) no. chronic conditions | 1.8 (1.80) | 1.8 (1.78)
Mean (SD) Combined Model score | 0.26 (0.20) | 0.26 (0.20)

Did RCT control arm have usual care?

[Figure: emergency admissions, before and after RCT.]

WSD: re-analysis to improve generalisability

Step 1. Define target population from observational data, patients who met RCT inclusion criteria and received usual care (n=88,830)

Step 2. From target population find individuals (n=1,293), who match RCT controls (n=1,293), on 65 baseline covariates

Step 3. Placebo test: contrast outcomes from matched target population (n=1,293), vs RCT controls (n=1,293) (A)

Step 4. Sensitivity analysis: re-estimate Telehealth effectiveness, RCT Telehealth arm (n=1,229) vs matched target population (n=1,229) (B)
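The matching in Step 2 can be sketched as greedy 1:1 nearest-neighbour matching without replacement on standardised covariates. This is a deliberate simplification for illustration and not necessarily the matching algorithm used in Steventon et al (2015). The function name, the two covariates and the data are hypothetical.

```python
from math import sqrt
from statistics import pstdev


def match_controls(rct_controls, target_pool):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    rct_controls, target_pool: lists of covariate rows (same columns).
    Returns a list of (rct_index, target_index) pairs.
    Assumes the target pool is at least as large as the RCT control group.
    """
    k = len(rct_controls[0])
    everyone = rct_controls + target_pool
    # Standardise each covariate by its pooled standard deviation
    # (fall back to 1.0 if a covariate is constant).
    scale = [pstdev([row[j] for row in everyone]) or 1.0 for j in range(k)]

    used = set()
    pairs = []
    for i, control in enumerate(rct_controls):
        best_j, best_d = None, float("inf")
        for j, candidate in enumerate(target_pool):
            if j in used:
                continue
            d = sqrt(sum(((control[m] - candidate[m]) / scale[m]) ** 2
                         for m in range(k)))
            if d < best_d:
                best_j, best_d = j, d
        used.add(best_j)
        pairs.append((i, best_j))
    return pairs


# Hypothetical covariates: (age, has COPD).
rct_controls = [[70, 1], [60, 0]]
target_pool = [[59, 0], [71, 1], [80, 1], [40, 0]]
print(match_controls(rct_controls, target_pool))
```

Once matched, the placebo test in Step 3 compares outcomes between the two sides of each pair; greedy matching depends on the order of the RCT controls, which is one reason more sophisticated algorithms are often preferred.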

WSD: placebo results

Outcome | Placebo test (A): RCT (control) vs matched target population (usual care)
Emergency admissions per head | 1.22 (1.05, 1.43)
Outpatient attendances per head | 1.03 (0.94, 1.13) a

Incidence rate ratios.
a: passes placebo test
From Steventon et al. (2015).

WSD: re-analyses

Outcome | Placebo test (A): RCT control arm vs target population (usual care) | Estimated effect of Telehealth (B): RCT Telehealth arm vs target population (usual care) | Estimated effect of Telehealth (C): RCT Telehealth arm vs RCT control arm b
Emergency admissions per head | 1.22 (1.05, 1.43) | 1.12 (0.95, 1.31) | 0.90 (0.77, 1.05)
Outpatient attendances per head | 1.03 (0.94, 1.13) a | 1.04 (0.95, 1.14) | 1.02 (0.93, 1.12)

Incidence rate ratios.
a Passes placebo test: point estimate and confidence interval lie within range
b We use data from 1229 in the Telehealth arm and 1229 in the control arm to ensure consistency across the re-analyses. See Steventon et al (2015) for details.

Interpretation and implications

• Framework for testing assumptions for generalisability from RCT
• For ‘simple’ interventions, placebo tests were satisfied (Hartman et al, 2015)
• Why do generalisability assumptions fail for Telehealth vs usual care?
• RCT control arm did not receive usual care, or differences in unobserved characteristics vs target population?
• RCT participation encouraged the control arm to seek more help
• More generally, better outcomes for RCT control arm vs usual care
• If RCT controls do not receive usual care, other designs are required
– RCT assumes: control therapy = usual care and no unobserved confounders
– Matched analysis assumes no unobserved confounding

Discussion and future work

• Framework for providing relevant estimates of treatment effectiveness for target populations
• Lack of evidence on patient/public/clinician preferences or beliefs
• Big data technology: opportunity to survey large numbers
• Insight into behavioural dimensions
– Example: Health Economics MOdelling (HEMO) for maintaining blood supply
– RCT (n=50,000), observational (2 million), and stated preference survey (n=100,000)
• More accurate predictions of how effective and cost-effective a complex intervention will be in practice
• Other areas: phase II and III RCTs
• Evidence from several RCTs

References

• Hartman E, Grieve R, Ramsahai R & Sekhon JS (2015). From SATE to PATT: combining experimental with observational studies to estimate population treatment effects. JRSS(A) doi: 10.1111/rssa.1209
• Heckman et al (1998). Characterizing selection bias using experimental data. Econometrica 66(Sep):1017-1098
• Hellerstein and Imbens (1999). Imposing moment restrictions from auxiliary data by weighting. Review Econ and Statistics 81(1):1-14
• Hotz et al (2005). Predicting the efficacy of future training programs using past experiences at other locations. Journal of Econometrics 125(1-2):241-70
• Imai et al (2008). Misunderstandings among experimentalists and observationalists about causal inference. JRSS(A) 171(2):481-502
• Steventon A, Grieve R, Bardsley M (2015). An approach to assess generalizability in comparative effectiveness research: a case study of the Whole Systems Demonstrator cluster randomized trial. Med Decis Mak doi: 10.1177/0272989X15585131
• Stuart et al (2011). The use of propensity scores to assess the generalisability of results from randomised trials. JRSS(A) 174:369-86
