2. michael walsh - fragility - mcmaster university's...

The Fragile Study: Can I trust its results?

Michael Walsh, MD MSc McMaster University

The Second David Sackett Symposium

1

Speaker Michael Walsh has no potential for conflict of interest with this presentation

Faculty/Presenter Disclosure

2

DISCLAIMER

Observational vs RCT?

•  Good observational = good for some things
•  Good RCT = informative about treatment effect

“Good” not always clear

Starting position: Randomized + p<0.05 = Truth

3

WHEN DO WE BELIEVE AN EFFECT EXISTS?

✓ Random allocation
✓ Randomization list concealed
✓ Sufficient follow-up
✓ Analysis of all patients in their allocated group
✓ Blinding
✓ Equal treatment
✓ Similar baseline prognosis

4

WHY IS THIS DIFFICULT?

Concealment

•  Unclear or inadequate in ~49%

Sufficient follow-up

•  ?

Analysis of patients in their allocated group

•  48% stated ITT, of which 17% did not; 75% unclear how

Blinding

•  Variable and unclear meaning

Similar baseline prognosis

•  ?

Effect of spin, publication bias etc…

Pildal J et al. Int J Epi. 2007; 36:847-857; Hollis S et al. BMJ. 1999; 319:670-674; Devereaux PJ et al. JAMA. 2001;285:2000–3.

5

WHY IS THIS DIFFICULT?

Multiple ways of presenting point estimates

Too much information

•  Many therapies turn out less effective than initially reported
•  Exponential growth pattern of RCTs

Akl E et al. Cochrane Database of Systematic Reviews 2011;3. Strippoli et al. J Am Soc Neph. 2004;15:411-419.

6

MODERN STATISTICS TO THE RESCUE!

THERE IS A LOT OF INFORMATION TO DIGEST FOR A BUSY CLINICIAN

8

P-VALUES

“…the accept/reject philosophy of significance testing based on the “magical” p=0.05 barrier remains dominant in the minds of many non-statisticians.”

•  S. Pocock. BMJ. 1985.

11

MODERN STATISTICS TO THE RESCUE!

We are not intuitively good statisticians

• Belief in small numbers

• Based on heuristics of sample representativeness

13

Tversky and Kahneman. Psychological Bulletin. 1971; 76(2):105-110.

HOW CAN WE MAKE THE BELIEVABILITY OF A TREATMENT EFFECT MORE TRANSPARENT?

14

95% CONFIDENCE INTERVALS

15

[Figure: x-axis Relative Risk Reduction (%), 0 to 100]

Yusuf S et al. Prog Cardiovascular Dis. 1985;27(5):335-371.

NUMBER OF EVENTS

16

Thorlund K et al. PLoS ONE 6(10);2011:e25491

[Figure: curves labelled 30% RRR and 20% RRR, repeated across two panels]

NUMBER OF EVENTS

17

UNIT FRAGILITY

Relative change in the effect size if one unit is transferred in the 2x2 table

Comparison to the minimal clinically important difference (MCID) and the threshold for statistical significance

Hasn’t really caught on

•  Difficulty framing change in relative effect size?
•  Difficulty determining (widely accepted) MCIDs?
•  Not enough advertising?

18

Feinstein. J Clin Epi. 1990; 43(2):201-209; Walter. J Clin Epi. 1991; 44(12): 1373-1378
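One possible reading of unit fragility can be sketched in Python. This is an illustration only: it assumes the effect size is the relative risk reduction and that the "one unit" is a single treated patient moved from non-event to event. The function names are hypothetical, and the cited papers by Feinstein and Walter may define the quantity differently.

```python
def rrr(events_tx, n_tx, events_ctl, n_ctl):
    """Relative risk reduction: 1 - (risk in treatment / risk in control)."""
    return 1 - (events_tx / n_tx) / (events_ctl / n_ctl)


def unit_fragility(events_tx, n_tx, events_ctl, n_ctl):
    """Relative change in the RRR after one treated non-event becomes an event.

    Assumption: the unit is moved within the treatment arm only.
    """
    before = rrr(events_tx, n_tx, events_ctl, n_ctl)
    after = rrr(events_tx + 1, n_tx, events_ctl, n_ctl)
    return (after - before) / before


# Small trial with 1/100 events on treatment and 9/100 on placebo
print(round(unit_fragility(1, 100, 9, 100), 3))
```

With 1 versus 9 events in arms of 100, moving one patient lowers the RRR from about 89% to about 78%, a relative change of -12.5%.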

THE FRAGILITY INDEX

            Treatment    Control
Cases       a            b
Noncases    c            d

Fisher’s Exact Test

            Treatment    Control
Cases       a+n          b
Noncases    c-n          d

Increase n until Fisher’s Exact Test gives p≥0.05

n = Fragility Index

19
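The procedure on this slide can be sketched in plain Python. This is a minimal sketch, not the speaker's code: Fisher's exact test is implemented directly from the hypergeometric distribution, the two-sided p-value sums all tables no more probable than the observed one (one common convention), and events are added to whichever arm has fewer events, which is an assumption. The function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Rows are cases/noncases, columns are treatment/control.
    """
    cases, n_tx, n_ctl = a + b, a + c, b + d
    denom = comb(n_tx + n_ctl, cases)

    def prob(x):
        # Hypergeometric probability of x cases in the treatment arm
        return comb(n_tx, x) * comb(n_ctl, cases - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, cases - n_ctl), min(cases, n_tx)
    # Sum the tables that are no more likely than the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fragility_index(a, b, c, d, alpha=0.05):
    """Smallest n such that moving n noncases to cases gives p >= alpha."""
    if a > b:
        # Assumption: modify the arm with fewer events, so mirror the table
        a, b, c, d = b, a, d, c
    n = 0
    while n < c and fisher_exact_two_sided(a + n, b, c - n, d) < alpha:
        n += 1
    return n


# 1/100 events on treatment versus 9/100 on placebo
print(fragility_index(1, 9, 99, 91))
```

For 1 versus 9 events in arms of 100 the sketch returns a Fragility Index of 1: one extra event in the treatment arm moves the p-value from about 0.02 to about 0.06. A result that is not significant to begin with returns 0.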

EXAMPLE

Trial 1    Tx A (n=1159)    Placebo (n=1157)    RRR (95% CI)       p-value
Death      90               118                 24% (1 to 43%)     0.04
Death      90+2             118                 22% (-1 to 40%)    0.06

20

Powered for a 33% RRR under assumption control event rate 12.5% (actual 13.8%)
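The relative risk reduction and its interval for the Tx A example can be approximated with a short sketch. It assumes a Wald interval on the log relative risk, which is one common method. The slides do not say how their intervals were computed, so small differences are expected: for 90 versus 118 deaths this approximation gives an upper bound near 41% rather than 43%. The function name is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def rrr_with_ci(events_tx, n_tx, events_ctl, n_ctl, level=0.95):
    """Return (RRR, lower, upper) using a Wald interval on log(RR)."""
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    # Standard error of log(RR) for two independent binomial arms
    se = sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rr_low, rr_high = exp(log(rr) - z * se), exp(log(rr) + z * se)
    # RRR = 1 - RR, so the bounds swap places
    return 1 - rr, 1 - rr_high, 1 - rr_low


for deaths_tx in (90, 92):
    est, low, high = rrr_with_ci(deaths_tx, 1159, 118, 1157)
    print(f"{deaths_tx} deaths: RRR {est:.0%} ({low:.0%} to {high:.0%})")
```

Under this approximation the lower bound of the interval drops below zero once two deaths are added to the Tx A arm, which matches the loss of statistical significance shown in the table.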

EXAMPLE

Trial 1    Tx A (n=100)     Placebo (n=100)     P value    Effect (95% CI)
MI         1                9                   0.02       RRR 89% (14 to 99%)
MI         1+1              9                   0.06       RR 0.22 (0.05 to 1.00)

Trial 2    Tx B (n=4000)    Placebo (n=4000)    P value    Effect (95% CI)
MI         200              250                 0.02       RRR 20% (4 to 33%)
MI         200+9            250                 0.05       RR 0.84 (0.70 to 1.00)

21

FRAGILITY INDEX

Sample sizes: 800 to 3000

Baseline risk: 20% to 50%

Powered for ~20% relative risk reduction (80% power, alpha=0.05)

True Treatment Effect: 0, 10 or 20% relative risk reduction

Total events: 360 to 600
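The link between baseline risk, the 20% relative risk reduction, and the sample sizes above can be illustrated with a standard two-proportion calculation. This is a sketch under an assumed formula (normal approximation with unpooled variance); the slides do not state which method was used, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(control_risk, rrr, alpha=0.05, power=0.80):
    """Patients per arm needed to detect a given RRR (two-sided test)."""
    p1 = control_risk
    p2 = control_risk * (1 - rrr)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


for risk in (0.20, 0.50):
    n = n_per_group(risk, 0.20)
    print(f"baseline risk {risk:.0%}: {2 * n} patients in total")
```

Under these assumptions a 50% baseline risk needs roughly 770 patients in total and a 20% baseline risk roughly 2,900, close to the 800 to 3000 range on the slide.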

22

FRAGILITY INDEX

23

[Figure: Fragility Index (y-axis, 0 to 40) against Mean Difference from True RRR (x-axis, 0 to 0.15), with points labelled 9 events and 4 events]

Pretty Close to the Truth

There really is a treatment effect but not this big

Suggests a statistically significant treatment effect when there really isn’t one

FRAGILITY INDEX

24

[Figure: histogram of Frequency (y-axis, 0 to 40) by Fragility Index (x-axis, 0 to 200), with the 25% and 50% marks indicated]

399 RCTs from top medical journals with at least one statistically significant result in the abstract (p<0.05 or 95% CI excludes null)

IS FRAGILITY PLAUSIBLE?

Loss to follow-up

•  often greater than a few patients and unclear if a random process
•  Post-randomization exclusions

Incomplete/inadequate blinding

•  Subjectivity in outcome assessment
•  Differential cointervention

Goofy, weird things

•  Murphy’s law…

25

CONCLUSIONS

We should probably be conservative in our belief about treatment effects

Statistics help determine belief in a treatment effect but are fallible

Several options for figuring out if statistical tests are likely misleading

Simple, intuitive aids to tempering belief may be useful

26

ACKNOWLEDGEMENTS

PJ Devereaux, David Sackett, Gordon Guyatt

Sadeesh Srinathan, Danny McAuley, Marko Mrkobrada, Amber Molnar, Oren Levine, Neil Dattani, Andrew Burke, Christine Ribic, Lehana Thabane, Stephen Walter, Janice Pogue

27

QUESTIONS?

28

METHODS

Eligibility

•  Parallel limb RCT with 1:1 randomization
•  Dichotomous outcome
•  Nominally statistically significant (p<0.05 or 95% CI excludes null)

Search

•  2006-2010

Fragility Index computed for each trial

29


30

31

1273 Trials Identified

874 Trials Excluded

•  383 Design not two-parallel group or 2x2 factorial
•  64 Allocation not 1:1
•  427 Abstract did not report a statistically significant dichotomous result

399 Eligible Trials

32

Characteristic                                    Number (n=399)

Journal, n (%)
  New England Journal of Medicine                 165 (41.3)
  Lancet                                          112 (28.1)
  Journal of the American Medical Association     48 (12.0)
  Annals of Internal Medicine                     33 (8.3)
  British Medical Journal                         41 (10.3)

Sample Size, median (min to max)                  682 (15 to 112,604)
Number of Outcome Events, median (min to max)     112 (8 to 5,142)

Reported p-value, n (%)
  <0.05 to 0.01                                   186 (46.6)
  <0.01 to 0.001                                  168 (42.1)
  <0.001                                          45 (11.3)

Included Outcome, n (%)
  Primary                                         263 (65.9)
  Composite                                       132 (33.1)
  Time-to-event                                   206 (51.6)
  Adjusted                                        35 (8.8)

FRAGILITY INDEX

Absolute Fragility Index (AFI)

•  median 8
•  25th percentile 3

Relative Fragility Index

•  median 17%
•  25th percentile 5.1%

Loss to follow-up exceeded AFI

•  32%

33

34

Median Fragility Index: 8

25% Fragility Index ≤ 3

10% Fragility Index 0

35

Characteristic          β-coefficient (95% CI)    p-value

Reported p-value
  <0.05 to 0.01         Referent
  <0.01 to 0.001        11.6 (4.0 to 19.3)        0.003
  <0.001                39.2 (4.9 to 73.5)        0.03

Number of Events
  8 to 51               Referent
  52 to 112             7.3 (4.5 to 10.1)         <0.001
  113 to 281            10.0 (6.7 to 13.3)        <0.001
  282 to 5142           48.1 (27.7 to 68.6)       <0.001

Sample Size
  15 to 286             Referent
  287 to 682            6.6 (2.7 to 10.6)         0.001
  683 to 2522           9.5 (4.0 to 15.0)         0.001
  2523 to 112,604       39.5 (19.6 to 59.3)       <0.001

36

[Figure: Fragility Index (y-axis, 0 to 1000) against Total Sample Size (x-axis, 0 to 100000)]

37

[Figure: Fragility Index (y-axis, 0 to 1000) against Total Number of Events (x-axis, 0 to 5000)]

38

Characteristic                                β-coefficient (95% CI)    p-value

Primary Outcome                               -8.8 (-23.2 to 5.7)       0.23
Time-to-Event Outcome                         -0.3 (-10.8 to 10.1)      0.95
Composite Outcome                             5.7 (-7.8 to 19.2)        0.41
Adjusted Analysis                             -2.7 (-15.5 to 10.1)      0.68
Intention-to-Treat Analysis                   -6.9 (-18.0 to 4.2)       0.22
Allocation Concealment Unclear/Inadequate     -9.8 (-17.6 to -1.9)      0.02

Lost to Follow-up
  ≤1%                                         Referent
  >1 to 5%                                    4.6 (-7.5 to 16.6)        0.46
  >5 to 10%                                   -3.0 (-10.5 to 4.6)       0.44
  >10%                                        5.3 (-12.7 to 23.2)       0.57
  Not reported                                15.9 (-2.7 to 34.5)       0.09

PAIRWISE CORRELATIONS OF FRAGILITY INDEX

             AFI R (p-value)
N            0.28 (<0.001)
# Events     0.64 (<0.001)

39

LOST TO FOLLOW-UP

306 trials reported lost to follow-up clearly

•  Median 9

Total Lost > Fragility

•  162 trials (53%)

Lost from one group > Fragility

•  132 trials (43%)

40
