2. michael walsh - fragility - mcmaster university's...
TRANSCRIPT
The Fragile Study Can I trust its results?
Michael Walsh, MD MSc McMaster University
The Second David Sackett Symposium
1
Faculty/Presenter Disclosure
Speaker Michael Walsh has no potential for conflict of interest with this presentation
2
DISCLAIMER
Observational vs RCT?
Good observational = good for some things
Good RCT = informative about treatment effect
“Good” not always clear
Starting position: Randomized + p<0.05 = Truth
3
WHEN DO WE BELIEVE AN EFFECT EXISTS?
✓ Random allocation
✓ Randomization list concealed
✓ Sufficient follow-up
✓ Analysis of all patients in their allocated group
✓ Blinding
✓ Equal treatment
✓ Similar baseline prognosis
4
WHY IS THIS DIFFICULT?
Concealment
• Unclear or inadequate in ~49%
Sufficient follow-up
• ?
Analysis of patients in their allocated group
• 48% stated ITT, of which 17% did not; 75% unclear how
Blinding
• Variable and unclear meaning
Similar baseline prognosis
• ?
Effect of spin, publication bias etc…
Pildal J et al. Int J Epi. 2007;36:847-857; Hollis S et al. BMJ. 1999;319:670-674; Devereaux PJ et al. JAMA. 2001;285:2000–3.
5
WHY IS THIS DIFFICULT?
Multiple ways of presenting point estimates
Too much information
• Many therapies turn out less effective than initially reported
• Exponential growth pattern of RCTs
Akl E et al. Cochrane Database of Systematic Reviews 2011;3. Strippoli et al. J Am Soc Neph. 2004;15:411-419.
6
7
MODERN STATISTICS TO THE RESCUE!
THERE IS A LOT OF INFORMATION TO DIGEST FOR A BUSY CLINICIAN
8
9
10
P-VALUES
“…the accept/reject philosophy of significance testing based on the “magical” p=0.05 barrier remains dominant in the minds of many non-statisticians.”
• S. Pocock. BMJ. 1985.
11
MODERN STATISTICS TO THE RESCUE!
We are not intuitively good statisticians
• Belief in small numbers
• Based on heuristics of sample representativeness
13
Tversky and Kahneman. Psychological Bulletin. 1971;76(2):105-110.
HOW CAN WE MAKE THE BELIEVABILITY OF A TREATMENT EFFECT MORE TRANSPARENT?
14
95% CONFIDENCE INTERVALS
15
[Figure: 95% confidence intervals; x-axis: Relative Risk Reduction (%), 0 to 100]
Yusuf S et al. Prog Cardiovascular Dis. 1985;27(5):335-371.
NUMBER OF EVENTS
16
Thorlund K et al. PLoS ONE 6(10);2011:e25491
[Figure labels: 30% RRR, 20% RRR]
NUMBER OF EVENTS
17
UNIT FRAGILITY
Relative change in the effect size if one unit transferred in the 2x2 table
Comparison to minimally clinically important difference and threshold for statistical significance
Hasn’t really caught on
• Difficulty framing change in relative effect size?
• Difficulty determining (widely accepted) MCIDs?
• Not enough advertising?
18
Feinstein. J Clin Epi. 1990; 43(2):201-209; Walter. J Clin Epi. 1991; 44(12): 1373-1378
THE FRAGILITY INDEX
            Treatment    Control
Cases       a            b
Noncases    c            d
Fisher’s Exact Test
            Treatment    Control
Cases       a+n          b
Noncases    c-n          d
Increase n until Fisher’s Exact Test gives p≥0.05
n = Fragility Index
19
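The procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function names `fisher_two_sided` and `fragility_index` are hypothetical, the two-sided p-value sums all tables that are no more probable than the observed one (a common convention for Fisher's exact test), and events are always moved into the treatment column, as drawn on the slide.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table
    cases: a (treatment), b (control); noncases: c (treatment), d (control)."""
    cases = a + b
    treated = a + c
    total = a + b + c + d
    lo = max(0, cases - (total - treated))
    hi = min(cases, treated)
    # Hypergeometric log-probability of x cases in the treatment group.
    log_probs = {
        x: log_comb(treated, x)
        + log_comb(total - treated, cases - x)
        - log_comb(total, cases)
        for x in range(lo, hi + 1)
    }
    # Sum every table that is no more likely than the observed one.
    cutoff = log_probs[a] + 1e-7
    p = sum(exp(lp) for lp in log_probs.values() if lp <= cutoff)
    return min(1.0, p)


def fragility_index(a, b, c, d, alpha=0.05):
    """Smallest n such that moving n treatment noncases to cases
    gives p >= alpha (assumes the treatment group has fewer events)."""
    n = 0
    while n <= c and fisher_two_sided(a + n, b, c - n, d) < alpha:
        n += 1
    return n


# Small trial: 1/100 events on treatment vs 9/100 on control.
print(fragility_index(1, 9, 99, 91))
```

A trial whose result is not significant to begin with returns 0. Published p-values are often computed with other tests (for example chi-square), so the count from this sketch can differ slightly from a figure quoted alongside such p-values.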
EXAMPLE
Trial 1    Tx A (n=1159)    Placebo (n=1157)    RRR (95% CI)       p-value
Death      90               118                 24% (1 to 43%)     0.04
Death      90+2             118                 22% (-1 to 40%)    0.06
20
Powered for a 33% RRR under assumption control event rate 12.5% (actual 13.8%)
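As a rough check on the relative risk reductions quoted for this trial, the sketch below computes an RRR and an approximate 95% confidence interval from the event counts. It assumes the usual log relative-risk (Wald) interval; the slide does not say which method produced its intervals, so the limits may differ slightly. The function name `rrr_with_ci` is hypothetical.

```python
from math import exp, log, sqrt


def rrr_with_ci(events_tx, n_tx, events_ctl, n_ctl, z=1.96):
    """Relative risk reduction with an approximate 95% CI.

    Uses the standard error of log(RR); returns (rrr, lower, upper)
    as proportions, where lower and upper bound the RRR."""
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    se = sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl)
    rr_low = exp(log(rr) - z * se)
    rr_high = exp(log(rr) + z * se)
    # A high RR corresponds to a low RRR, so the bounds swap.
    return 1 - rr, 1 - rr_high, 1 - rr_low


rrr, lower, upper = rrr_with_ci(90, 1159, 118, 1157)
print(f"RRR {rrr:.0%} (95% CI {lower:.0%} to {upper:.0%})")
```

For 90/1159 versus 118/1157 deaths this gives an RRR of about 24% with a lower limit just above zero, which is why shifting only a couple of events is enough to make the interval cross the null.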
EXAMPLE
Trial 1    Tx A (n=100)     Placebo (n=100)     P value    RRR (95% CI)
MI         1                9                   0.02       89% (14 to 99%)
MI         1+1              9                   0.06       0.22 (0.05 – 1.00)
Trial 2    Tx B (n=4000)    Placebo (n=4000)    P value    RR (95% CI)
MI         200              250                 0.02       20% (4 to 33%)
MI         200+9            250                 0.05       0.84 (0.70 – 1.00)
21
FRAGILITY INDEX
Sample sizes 800 to 3000
Baseline risk: 20% to 50%
Powered for ~20% relative risk reduction (80% power, alpha=0.05)
True Treatment Effect: 0, 10 or 20% relative risk reduction
Total events: 360 to 600
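The sample-size range on this slide is roughly what a standard two-proportion power calculation gives for these inputs. The sketch below assumes the usual normal-approximation formula with equal allocation; it illustrates the arithmetic and is not the simulation code behind the slide. The function name `total_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def total_sample_size(control_risk, rrr, alpha=0.05, power=0.80):
    """Total patients (two equal arms) needed to detect a given
    relative risk reduction, using the normal approximation."""
    p1 = control_risk
    p2 = control_risk * (1 - rrr)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n_per_group = (
        (z_alpha + z_beta) ** 2
        * (p1 * (1 - p1) + p2 * (1 - p2))
        / (p1 - p2) ** 2
    )
    return 2 * ceil(n_per_group)


# Baseline risks at the two ends of the range on the slide.
for risk in (0.50, 0.20):
    print(risk, total_sample_size(risk, rrr=0.20))
```

With a 20% relative risk reduction, a 50% baseline risk needs a little under 800 patients and a 20% baseline risk a little under 3000, in line with the range quoted above.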
22
FRAGILITY INDEX
23
[Figure: Fragility Index (y-axis, 0 to 40) versus Mean Difference from True RRR (x-axis, 0 to 0.15)]
9 events
4 events
Pretty Close to the Truth
There really is a treatment effect but not this big
Suggests a statistically significant treatment effect when there really isn’t one
FRAGILITY INDEX
24
[Figure: histogram of Fragility Index (x-axis, 0 to 200) against Frequency (y-axis, 0 to 40)]
25%
50%
399 RCTs from top medical journals with at least one statistically significant result in the abstract (p<0.05 or 95% CI excludes null)
IS FRAGILITY PLAUSIBLE?
Loss to follow-up
• often greater than a few patients and unclear if a random process
• Post-randomization exclusions
Incomplete/inadequate blinding
• Subjectivity in outcome assessment
• Differential cointervention
Goofy, weird things
• Murphy’s law…
25
CONCLUSIONS
We should probably be conservative in our belief about treatment effects
Statistics help determine belief in a treatment effect but are fallible
Several options for figuring out if statistical tests are likely misleading
Simple, intuitive aids to tempering belief may be useful
26
ACKNOWLEDGEMENTS PJ Devereaux David Sackett Gordon Guyatt
Sadeesh Srinathan Danny McAuley Marko Mrkobrada Amber Molnar Oren Levine Neil Dattani Andrew Burke Christine Ribic Lehana Thabane Stephen Walter Janice Pogue
27
QUESTIONS?
28
METHODS
Eligibility
• Parallel limb RCT with 1:1 randomization
• Dichotomous outcome
• Nominally statistically significant (p<0.05 or 95% CI excludes null)
Search
• 2006-2010
Fragility Index computed for each trial
29
30
31
1273 Trials Identified
874 Trials Excluded
383 Design not two-parallel group or 2x2 factorial
64 Allocation not 1:1
427 Abstract did not report a statistically significant dichotomous result
399 Eligible Trials
32
Characteristic                                   Number (n=399)
Journal, n (%)
  New England Journal of Medicine                165 (41.3)
  Lancet                                         112 (28.1)
  Journal of the American Medical Association    48 (12.0)
  Annals of Internal Medicine                    33 (8.3)
  British Medical Journal                        41 (10.3)
Sample Size, median (min to max)                 682 (15 to 112,604)
Number of Outcome Events, median (min to max)    112 (8 to 5,142)
Reported p-value, n (%)
  <0.05 to 0.01                                  186 (46.6)
  <0.01 to 0.001                                 168 (42.1)
  <0.001                                         45 (11.3)
Included Outcome, n (%)
  Primary                                        263 (65.9)
  Composite                                      132 (33.1)
  Time-to-event                                  206 (51.6)
  Adjusted                                       35 (8.8)
FRAGILITY INDEX
Absolute Fragility Index
• median 8
• 25th percentile 3
Relative Fragility Index
• median 17%
• 25th percentile 5.1%
Loss to follow-up exceeded AFI
• 32%
33
34
Median Fragility Index: 8
25% Fragility Index ≤ 3
10% Fragility Index 0
35
Characteristic            β-coefficient (95% CI)    p-value
Reported p-value
  <0.05 to 0.01           Referent
  <0.01 to 0.001          11.6 (4.0 to 19.3)        0.003
  <0.001                  39.2 (4.9 to 73.5)        0.03
Number of Events
  8 to 51                 Referent
  52 to 112               7.3 (4.5 to 10.1)         <0.001
  113 to 281              10.0 (6.7 to 13.3)        <0.001
  282 to 5142             48.1 (27.7 to 68.6)       <0.001
Sample Size
  15 to 286               Referent
  287 to 682              6.6 (2.7 to 10.6)         0.001
  683 to 2522             9.5 (4.0 to 15.0)         0.001
  2523 to 112,604         39.5 (19.6 to 59.3)       <0.001
36
[Figure: Fragility Index (y-axis, 0 to 1000) versus Total Sample Size (x-axis, 0 to 100000)]
37
[Figure: Fragility Index (y-axis, 0 to 1000) versus Total Number of Events (x-axis, 0 to 5000)]
38
Characteristic                                β-coefficient (95% CI)    p-value
Primary Outcome                               -8.8 (-23.2 to 5.7)       0.23
Time-to-Event Outcome                         -0.3 (-10.8 to 10.1)      0.95
Composite Outcome                             5.7 (-7.8 to 19.2)        0.41
Adjusted Analysis                             -2.7 (-15.5 to 10.1)      0.68
Intention-to-Treat Analysis                   -6.9 (-18.0 to 4.2)       0.22
Allocation Concealment Unclear/Inadequate     -9.8 (-17.6 to -1.9)      0.02
Lost to Follow-up
  ≤1%                                         Referent
  >1 to 5%                                    4.6 (-7.5 to 16.6)        0.46
  >5 to 10%                                   -3.0 (-10.5 to 4.6)       0.44
  >10%                                        5.3 (-12.7 to 23.2)       0.57
  Not reported                                15.9 (-2.7 to 34.5)       0.09
PAIRWISE CORRELATIONS OF FRAGILITY INDEX
AFI         R (p-value)
N           0.28 (<0.001)
# Events    0.64 (<0.001)
39
LOST TO FOLLOW-UP
306 trials reported lost to follow-up clearly
• Median 9
Total Lost > Fragility
• 162 trials (53%)
Lost from one group > Fragility
• 132 trials (43%)
40