nicholas jewell medicres world congress 2011

35
Good Biostatistical Report Practices Being Honest in Data Analysis Nicholas P. Jewell Departments of Statistics & School of Public Health (Biostatistics) University of California, Berkeley March 26, 2011 1

Upload: medicres

Post on 20-Aug-2015

506 views

Category:

Health & Medicine


1 download

TRANSCRIPT

Page 1: Nicholas Jewell MedicReS World Congress 2011

Good Biostatistical Report Practices

Being Honest in Data Analysis

Nicholas P. Jewell Departments of Statistics &

School of Public Health (Biostatistics) University of California, Berkeley

March 26, 2011

1

Page 2: Nicholas Jewell MedicReS World Congress 2011

2

Page 3: Nicholas Jewell MedicReS World Congress 2011

3

Page 4: Nicholas Jewell MedicReS World Congress 2011

References

•  Bailar, J.C. and Mosteller, F. “Guidelines for statistical reporting in articles for medical journals: Amplifications and explanations,” Annals of Internal Medicine, 108, 1988, 266-273.

•  Lang, T.A. and Secic M. “How to Report Statistics in Medicine: Annotated Guidelines for Authors, Editors, and Reviewers,” 2nd Edition, Amrican College of Physicians, 2006.

•  Altman, D. “Statistical reviewing for medical journals, Statistics in Medicine, 17, 1998, 2661-2674.

© Nicholas P. Jewell, 2011

4

Page 5: Nicholas Jewell MedicReS World Congress 2011

Overview •  Opportunistic Data Torturing

–  Multiple comparisons –  Multiple analyses –  Subgroup analyses

•  Procustean Data Torturing –  Subgroup analyses (or lack thereof) –  Meta-analyses

•  Sensitivity Analyses/Proxies/Assumptions/Honesty •  Blinding the Statistician (who funds the statistician?) •  Pre-specified Data Driven Analyses •  Dissemination and Publication

5

© Nicholas P. Jewell, 2011

Page 6: Nicholas Jewell MedicReS World Congress 2011

What is the Recipe for a Good Biostatistical Review?

The Four Hs •  What basic questions should be asked?

–  GBRS Questions provide an excellent checklist (guidelines etc)

–  How was the Data Collected (e.g. design/sample size issues/loss to follow up)

–  How were the variables measured? –  How was the data analyzed? (e.g. methods/

assumptions, software, pre-specified methods) –  How were the results reported? (e.g. sensitivity,

interpretation, uncertainty, conflicts of interest) 6

© Nicholas P. Jewell, 2011

Page 7: Nicholas Jewell MedicReS World Congress 2011

Deeper Questions The Five Ws

•  What is the population of interest? •  What is the data-generating mechanism (e.g.

assumptions, missingness or data filtering) •  What are the parameters of interest that describe the

data-generating mechanism? (e.g. causal inference) •  Which of these parameters are identifiable and how can

they be estimated effectively? •  Where can I find the data and the software?

7

© Nicholas P. Jewell, 2011

Page 8: Nicholas Jewell MedicReS World Congress 2011

The MIRA Trial •  Gates Foundation study to determine the

effectiveness of a latex diaphragm in the reduction of heterosexual acquisition of HIV among women

•  Two arm, randomized, controlled trial •  Primary intervention: diaphragm and gel provision

to diaphragm arm (nothing to control arm). •  Secondary Intervention: Intensive condom provision

and counseling given to both arms, plus treatment of STIs

•  Trial is not blinded •  5000 women seen for 18 months in three sites in

Zimbabwe and South Africa 8

© Nicholas P. Jewell, 2009

Page 9: Nicholas Jewell MedicReS World Congress 2011

MIRA Trial: Basic Intention to Treat Results

•  Basic Intent-to-Treat Analysis: – 158 new HIV infections in Diaphragm Arm – 151 new HIV infections in Control Arm

•  ITT estimate of Relative Risk is 1.05 with a 95% CI of (0.84, 1.30)

•  End of story . . . . .?

9

© Nicholas P. Jewell, 2009

Page 10: Nicholas Jewell MedicReS World Congress 2011

MIRA Trial: Basic Intention to Treat Results

•  However condom use differed between the two arms: – 53.5% in Diaphragm Arm (by visit) – 85.1% in Control Arm (by visit)

•  Could this mean that the diaphragm was more effective than it appeared from the basic analysis?

•  To make sense of this—we’d like to understand the role of condom use in mediating the effect of treatment assignment on HIV infection. 10

© Nicholas P. Jewell, 2009

Page 11: Nicholas Jewell MedicReS World Congress 2011

Most Important Public Health Questions

1. What is the effectiveness of providing study product in environment of country-level standard condom counseling?

(in environment of no condom counseling?) 2. How does providing study product alone compare to

consistent condom use alone in reducing HIV transmission?

3. How does providing the study product alone compare to unprotected sex, in terms of risk of HIV infection?

None of these questions are answered by basic ITT analysis 11

© Nicholas P. Jewell, 2009

Page 12: Nicholas Jewell MedicReS World Congress 2011

Reasons effectiveness may be different in environment of intensive condom counseling vs. environment of

country-level standard condom counseling

A. In trials without blinding, effect of intensive condom counseling may be different for treatment arm and control arm.

B. Effects of intensive condom counseling may be different for those who adhere to assigned treatment and those who don’t.

C. Intensive condom counseling may itself affect adherence to assigned treatment.

12

© Nicholas P. Jewell, 2009

Page 13: Nicholas Jewell MedicReS World Congress 2011

Reason A: In unblinded trials, effect of intensive condom counseling may be different for treatment arm and control arm.

Treatment Arm

Control Arm

Non-adherers (25%) Adherers (75%)

50 infections 40

50

Treatment Arm Control Arm

Non-adherers (25%) Adherers (75%)

50 40

50 10

RR:170/200=0.85

RR:106/80=1.33

Study Product (or placebo) Users Condom Users No Condom Counseling

Intensive Condom Counseling

Assume: Condoms 80% protective, Study Product 20% efficacy, and Intensive Condom Counseling less effect in treatment arm

8

50 infections

40 40

50 50

8

10 10

Page 14: Nicholas Jewell MedicReS World Congress 2011

MIRA Trial—Randomization of Diaphragm

HIV Infection

Tx Assignment (Diaphragm use)

Condom Use

14

© Nicholas P. Jewell, 2009

Page 15: Nicholas Jewell MedicReS World Congress 2011

Estimating Direct Effects: Adjusting for a Mediator (condom use)

HIV Infection

Treatment Group (Diaphragm use)

Condom Use

DIRECT EFFECT

•  We want to estimate the direct effect of diaphragm provision, at a set level of condom use. (Petersen et al. 2006, Robins and Greenland 1992, Pearl 2000, Rosenblum et al. 2009) •  Still ITT interpretation •  Requires Stronger Assumptions than basic ITT

15

© Nicholas P. Jewell, 2009

Page 16: Nicholas Jewell MedicReS World Congress 2011

Estimating Direct Effects: Stratify on condom use?

(danger: condom use is not randomized)

HIV Infection

Treatment Group (Diaphragm use)

Condom Use

Confounders

after stratification on condom use

HIV Infection

Treatment Group (Diaphragm use)

Confounders randomization hasn’t ruled out confounding of direct effect!

Now have to adjust for confounders (but we are still ITT)

Page 17: Nicholas Jewell MedicReS World Congress 2011

Why Stratification/Regression Gives Biased Estimates When There Are

Confounders as Causal Intermediates

Treatment Group (R)

HIV Status (H)

Condom Use (C)

Diaphragm Use (D) (confounder)

Using Regression, if we control for D, we don’t get the direct effect that we want. © Nicholas P. Jewell, 2009 © Nicholas P. Jewell, 2009

Page 18: Nicholas Jewell MedicReS World Congress 2011

Results of Direct Effects Analysis

•  Relative Risk of HIV infection between Diaphragm arm and Control arm by end of Trial, with Condom Use Fixed at “Never”: 0.59 (95% CI: 0.26, 4.56)

•  Relative Risk of HIV infection between Diaphragm arm and Control arm by end of Trial, with Condom Use Fixed at “Always”: 0.96 (95% CI: 0.59, 1.45)

Conclusion: No definitive evidence from direct effects analysis that diaphragms prevent (or don’t prevent) HIV.

© Nicholas P. Jewell, 2009

18

© Nicholas P. Jewell, 2009

Page 19: Nicholas Jewell MedicReS World Congress 2011

Pain Trials with Self-Reporting •  Pain is the most disturbing symptom of peripheral

neuropathy among diabetic patients

•  As many as 45% of patients with diabetes develop peripheral neuropathies

•  Gabapentin was suggested as a treatment option

•  To evaluate the effect of Gabapentin, a randomized, double-blind, placebo-controlled trial was conducted

•  165 patients with a 1- to 5-year history of pain attributed to diabetic neuropathy enrolled at 20 different sites

© Nicholas P. Jewell, 2011

19

Page 20: Nicholas Jewell MedicReS World Congress 2011

Backonja et al. Trial in JAMA •  The main outcome was daily pain severity as measured on an

11-point Likert scale (0 no pain- 10 worst possible pain)

•  Eighty-four patients received gabapentin, 81 received placebo

•  By intention-to-treat analysis, gabapentin-treated patients' mean daily pain score (baseline 6.4, end point 3.9) was significantly lower (P<.001) than the placebo-treated patients' score (baseline 6.5, end point 5.1)

•  Concluded that gabapentin appears to be efficacious for the treatment of pain associated with diabetic peripheral neuropathy

© Nicholas P. Jewell, 2011

20

Page 21: Nicholas Jewell MedicReS World Congress 2011

Handling Treatment-Related Side Effects

•  Treat side effects singly by removing those individuals from the data analysis and seeing if that changed the results

© Nicholas P. Jewell, 2011

21

Page 22: Nicholas Jewell MedicReS World Congress 2011

Perception Effect •  Patients have a perception about the treatment they

receive •  In general we may think of the patients assigning a

degree of certainty (probability) to receiving the active treatment, measured by a variable P

•  In most cases we do not observe the patient’s perception on a continuous scale

© Nicholas P. Jewell, 2011

P =

10−1

⎨ ⎪

⎩ ⎪

convinced on treatmentnot sure if on treatment or placebo

convinced on placebo

22

Page 23: Nicholas Jewell MedicReS World Congress 2011

Application to Unmasking of Blinded Treatment Assignment Due to Side Effects

Outcome (pain score)

Treatment Group

Side effects Confounders

•  side effects are treatment-related and therefore potentially unmasking

•  outcome is measured subjectively

•  actual use of the drug (adherence) is a potential confounder 23

© Nicholas P. Jewell, 2011

Page 24: Nicholas Jewell MedicReS World Congress 2011

Data •  Outcome:

mean pain score for the last 7 diary entries

•  Baseline covariates: age, sex, race, height, weight, baseline pain, baseline sleep

•  Treatment: gabapentin, placebo

•  Perception: changes from 0 to 1 when a treatment-related side effect occurs

© Nicholas P. Jewell, 2011

24

Page 25: Nicholas Jewell MedicReS World Congress 2011

Parameters of Interest •  What is the treatment effect if all the patients

remained unknowledgeable about their treatment? (Perception fixed at 0)

•  What is the treatment effect if all the patients thought they were receiving the active treatment? (Perception fixed at 1)

•  (The difference between these two parameters can be thought of as a perception bias)

© Nicholas P. Jewell, 2011

preferred ITT parameter

25

Page 26: Nicholas Jewell MedicReS World Congress 2011

Estimates

© Nicholas P. Jewell, 2011

26

Page 27: Nicholas Jewell MedicReS World Congress 2011

Subgroups—Interaction?

27

© Nicholas P. Jewell, 2011

Page 28: Nicholas Jewell MedicReS World Congress 2011

28

© Nicholas P. Jewell, 2011

Page 29: Nicholas Jewell MedicReS World Congress 2011

29

© Nicholas P. Jewell, 2011

Page 30: Nicholas Jewell MedicReS World Congress 2011

30

© Nicholas P. Jewell, 2011

Page 31: Nicholas Jewell MedicReS World Congress 2011

Reviewer Comments

•  Figure 4- The steroid finding is worthy of comment, even if the interaction is not statistically significant (the power for interactions is low). It may well be that the new agent has little benefit for those not treated with steroids,”

•  Figure 4 has been deleted as suggested by the editor. While there was not a significant reduction in relative risk of confirmed PUBS in the subgroup not taking steroids at baseline, VIGOR was only designed to detect a significant reduction overall, and not within any specific subgroup. There were, however, reductions in relative risk in this subgroup albeit not significant. Whenever many subgroups are examined, it becomes likely that there will be no apparent effect in some of them purely by chance. This is why the test for interaction, although underpowered as you rightly pointed out, is the appropriate way to assess subgroup effects. When there is no treatment-by-subgroup interaction, the more reliable estimate of treatment effect within a subgroup is the effect in the entire cohort [1].”

31

© Nicholas P. Jewell, 2011

Page 32: Nicholas Jewell MedicReS World Congress 2011

Meta-Analyses

•  Random or Fixed Effects –  heterogeneity –  Measure of association (RR or ER)

•  Zero-event trials/use of all evidence –  Drug safety studies (Vioxx, Celebrex, Avandia, etc) –  Continuity corrections –  Study duration

32

© Nicholas P. Jewell, 2009

Page 33: Nicholas Jewell MedicReS World Congress 2011

Meta-Analyses and No Event Studies

•  One study: 30 events under Tx, 3 under placebo (5000 individuals in both arms) – Risk difference of 0.0054 with exact 95%

confidence interval of (0.0032, 0.0076), p < 0.0001 •  Now add three (short duration) studies,all with

5000 in both arms, no events in either arms in all cases – exact 95% confidence interval of (-0.0044, 0.0001),

p = 0.46

33

© Nicholas P. Jewell, 2011

Page 34: Nicholas Jewell MedicReS World Congress 2011

Multiplicity

34

© Nicholas P. Jewell, 2011

Page 35: Nicholas Jewell MedicReS World Congress 2011

References •  Ioannidis, J.P.A. “Why most published research findings are false,” PLoS Med, 2(8),

2005, e124. •  Siegfried, T. “Odds are, it’s wrong,” Science News, March 27, 2010, 26-29. •  Bailar, J.C. III., ”Science. Statistics, and deception," Annals of Internal Medicine, 104,

1986, 259-260. •  Mills. J.S., “Data torturing,” New England Journal of Medicine, 329, 1993, 1196-1199. •  Young, J., Hubbard, A.E., Eskenazi, B. and Jewell, N. P., “A machine-learning

algorithm for estimating and ranking the impact of environmental risk factors in exploratory epidemiological studies”, to appear, 2011.

•  A.E. Hubbard, Ahern, J., Fleischer, N.L., Lippman, S.A., Jewell, N.P., Bruckner, T., Satariano, W.A., “To GEE or not to GEE: Comparing estimating function and likelihood-based methods for estimating the associations between neighboring risk factors and health,” Epidemiology, 21, 2010, 467-474.

•  Van der Laan, M., Hubbard, A., Jewell, N.P., ”Semi-parametric models versus faith-based inference," to appear, Epidemiology, 21, 2010, 479-481.

35

© Nicholas P. Jewell, 2011