bad science (2015)
TRANSCRIPT
“Torture numbers and they will tell you anything”*
Peter Kamerman, Brain Function Research Group, University of the Witwatersrand, South Africa
* Gregg Easterbrook
Bad science Science under threat
Bad science Paper retractions are on the rise
[Figure: number of retracted articles by year of retraction, plotted separately for retracted biomedical research and retractions in other scientific fields]
Grieneisen & Zhang, 2012
Bad science Almost half of retractions are for scientific misconduct
Van Noorden, 2011; Wagner & Williams, 2008
Bad science Biomedical publications are more likely to be retracted
[Figure: percent of retractions (%) plotted against percent of all articles (%) by discipline; medicine accounts for a disproportionate share of retractions]
Grieneisen & Zhang, 2012
Bad science Fortunately, retractions are rare
[Figure: percentage of records retracted per year, shown for biomedical research and for other scientific fields]
Grieneisen & Zhang, 2012
“80% of non-randomized studies turn out to be wrong, as do 25% of supposedly gold-standard randomized trials, and as much as 10% of the platinum-standard large randomized trials”
John Ioannidis (Health Research and Policy, Stanford School of Medicine)
Bad science Where is it going wrong?
Two broad categories:
• Publication bias
• Poor study design, execution and analysis
Publication bias Vanishing studies
[Figure: proportion of trials published; negative trials (median: 0.4) vs positive trials (median: 0.7)]
Hopewell et al., 2009
Publication bias Inflated estimates of effect size
[Funnel plot: precision vs effect size; a trim-and-fill analysis suggests published effect sizes are inflated by roughly 10%]
Finnerup et al., 2015
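To make the funnel-plot idea concrete, here is a minimal sketch of an Egger-style regression test for funnel-plot asymmetry, a common companion to trim-and-fill when screening for publication bias. The effect sizes and standard errors below are made-up illustration values, not data from Finnerup et al., 2015.

```python
# Egger-style funnel-plot asymmetry test (illustrative sketch only).
import numpy as np
from scipy import stats

# Hypothetical per-trial effect sizes and their standard errors.
effects = np.array([-0.9, -0.7, -0.8, -0.5, -1.2, -0.4, -1.0, -0.6])
ses = np.array([0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45])

# Egger's test: regress the standardized effect (effect/SE) on precision (1/SE).
# An intercept far from zero suggests small-study (publication) bias.
res = stats.linregress(1 / ses, effects / ses)
t_int = res.intercept / res.intercept_stderr
p_int = 2 * stats.t.sf(abs(t_int), df=len(effects) - 2)
print(f"intercept = {res.intercept:.2f}, p = {p_int:.3f}")
```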
Publication bias Drugs susceptible to bias
* Number of participants in a negative trial needed to increase the NNT to 11
Finnerup et al., 2015
Poor study design, execution and analysis The experimental method
[Pipeline diagram: Experimental design → Data collection (raw data) → Data cleaning (tidy data) → Basic data analysis (summary statistics) → Hypothesis testing (P value)]
Leek & Peng, 2015
Poor study design, execution and analysis The experimental method
[Pipeline diagram as before, annotated: every step from experimental design through basic data analysis receives little scrutiny; the P value receives lots of scrutiny]
Leek & Peng, 2015
The P value: Statistical Hypothesis Inference Testing
The P value has been likened to:
• A mosquito (annoying and impossible to swat away);
• The emperor's new clothes (fraught with obvious problems that everyone ignores);
• A “sterile intellectual rake” (ravishes science, but leaves it with no progeny)
Nuzzo, 2014; Lambdin, 2012
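As a concrete reminder of what a P value does and does not tell you, here is a minimal simulation sketch (illustrative only, not from the talk): when the null hypothesis is true, P values are uniformly distributed, so "P < 0.05" still occurs in about 5% of experiments.

```python
# Under a true null hypothesis, P values are uniform on [0, 1].
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pvals = np.array([
    # Two groups drawn from the SAME distribution: any "effect" is noise.
    stats.ttest_ind(rng.normal(0, 1, 30), rng.normal(0, 1, 30)).pvalue
    for _ in range(10_000)
])
print(f"false-positive rate at alpha = 0.05: {np.mean(pvals < 0.05):.3f}")  # ~0.05
```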
Poor study design, execution and analysis
“Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital”
Aaron Levenstein (Baruch College, CUNY)
Poor study design, execution and analysis The experimental method
[Pipeline diagram as before, annotated to locate poor decisions in data analysis within the pipeline]
Leek & Peng, 2015
Poor study design, execution and analysis
“The vast majority of data analysis is not performed by people properly trained to perform data analysis…[there is] a fundamental shortage of data analytic skill”
Jeff Leek (Johns Hopkins Bloomberg School of Public Health)
Poor analysis Common errors in data analysis
• Reactive rather than prospective analysis plan;
• Not understanding the basic principles underlying the choice of statistical test;
• Not viewing the data;
• Not assessing, or hiding, variance and error estimates;
• Not understanding what a P value means;
• Not correcting for multiple comparisons (see the sketch after this list);
• Over-fitting models
Nuzzo, 2014; Lambdin, 2012
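A minimal simulation sketch of why uncorrected multiple comparisons mislead (hypothetical numbers, not from the talk): testing 20 null endpoints at alpha = 0.05 yields a roughly 64% chance of at least one "significant" result per study; a Bonferroni-adjusted threshold restores the intended 5%.

```python
# Family-wise error with and without Bonferroni correction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
m, runs = 20, 2_000          # m null endpoints per simulated "study"
hits_raw = hits_bonf = 0
for _ in range(runs):
    p = np.array([
        stats.ttest_ind(rng.normal(0, 1, 25), rng.normal(0, 1, 25)).pvalue
        for _ in range(m)    # every endpoint is a true null
    ])
    hits_raw += (p < 0.05).any()
    hits_bonf += (p < 0.05 / m).any()   # Bonferroni-adjusted threshold

print(f"uncorrected family-wise error: {hits_raw / runs:.2f}")   # ~0.64
print(f"Bonferroni family-wise error:  {hits_bonf / runs:.2f}")  # ~0.05
```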
Poor analysis What should you look out for?
• Retrospective registration of a trial on a trials database;
• Primary end-points not clearly stated;
• Analyses do not directly address the primary end-point(s);
Nuzzo, 2014; Lambdin, 2012
"
• No CONSORT flow diagram
• Analysis of per protocol vs intention-to-treat population;
• Method of imputation not specified (e.g., LOCF, BOCF);
• No correction for multiple comparisons;
What should you look out for?
UNIVERSITY OF THE WITWATERSRAND Nuzzo, 2014; Lambdin, 2012
Poor analysis
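For readers unfamiliar with the imputation methods named above, here is a minimal sketch of last observation carried forward (LOCF). The column names and scores are hypothetical, purely for illustration.

```python
# LOCF imputation: carry each participant's last observed value forward.
import pandas as pd

scores = pd.DataFrame({
    "week0": [6.0, 7.0, 5.0],
    "week4": [4.0, None, 4.5],   # participant 2 dropped out after baseline
    "week8": [3.0, None, None],  # participant 3 dropped out after week 4
})

locf = scores.ffill(axis=1)      # fill missing values from the left (earlier visits)
print(locf)
# BOCF would instead carry week0 (baseline) forward for every missing value.
```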
Poor study design, execution and analysis The experimental method
[Pipeline diagram as before, annotated to locate poor design and execution within the pipeline: the experimental design and data collection steps]
Leek & Peng, 2015
Poor design and execution Common errors in study design
• No sample size calculation (see the sketch after this list);
• No or inappropriate randomization;
• No concealment;
• Study too short;
• Biased sampling;
• Biased/inappropriate measurements;
• Not assessing potential confounders
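As a sketch of what an a priori sample size calculation involves, here is the standard normal-approximation formula for a two-arm parallel trial comparing means. The target difference and SD are hypothetical; a real calculation must use the trial's own assumptions.

```python
# A priori sample size for a two-sample comparison of means
# (two-sided alpha = 0.05, power = 0.90; illustrative numbers).
from scipy import stats

alpha, power = 0.05, 0.90
delta, sd = 1.0, 2.0   # smallest clinically important difference; common SD

z_a = stats.norm.ppf(1 - alpha / 2)   # ~1.96
z_b = stats.norm.ppf(power)           # ~1.28
n_per_group = 2 * (sd / delta) ** 2 * (z_a + z_b) ** 2
print(f"n per group ≈ {n_per_group:.0f}")   # ≈ 84
```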
Bad science Interpreting the data
Filters to apply:
Filter I: Are the methods valid?
Filter II: Are the results clinically important?
Filter III: Are the results important for my practice?
American Society for Reproductive Medicine, 2008
Bad science Interpreting the data
Filter I: Are the methods valid?
• Was the assignment of patients randomized?
• Was the randomization concealed?
• Was follow-up sufficiently long and complete?
• Were all patients analyzed in the groups they were allocated to?
American Society for Reproductive Medicine, 2008
Bad science Interpreting the data
Filter II: Are the results clinically important?
• Was the treatment effect large enough to be clinically relevant?
• Was the treatment effect precise?
• Are the conclusions based on the question posed and the results obtained?
American Society for Reproductive Medicine, 2008
Bad science Interpreting the data
Is it clinically important?
• Effect size (minimal clinically important difference)
• Direction of change
• Precision
Bad science Typical measures of effect size in pain studies
Absolute measures
• Absolute change from baseline
• Numbers needed to treat (NNT)
Relative measures
• Percentage change from baseline
• Risk ratio/relative risk (RR)
• Odds ratio (OR)
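To show how these measures relate, here is a minimal worked sketch on a hypothetical trial (40/100 responders on drug vs 25/100 on placebo; the counts are invented for illustration).

```python
# Absolute and relative effect-size measures from a 2x2 outcome table.
responders_drug, n_drug = 40, 100
responders_placebo, n_placebo = 25, 100

risk_drug = responders_drug / n_drug             # 0.40
risk_placebo = responders_placebo / n_placebo    # 0.25

arr = risk_drug - risk_placebo                   # absolute risk reduction: 0.15
nnt = 1 / arr                                    # numbers needed to treat: ~6.7
rr = risk_drug / risk_placebo                    # risk ratio: 1.6
odds_drug = risk_drug / (1 - risk_drug)
odds_placebo = risk_placebo / (1 - risk_placebo)
odds_ratio = odds_drug / odds_placebo            # odds ratio: 2.0

print(f"NNT ≈ {nnt:.1f}, RR = {rr:.2f}, OR = {odds_ratio:.2f}")
```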
Bad science Precision of the estimate
Trial | Mean pain difference: Drug − Placebo | P value | Change from baseline: Drug | 95% CI of change from baseline: Drug
1 | −1.7 | < 0.001 | −2.1 | −2.4 to −1.8
2 | −0.5 | 0.2 | −1.5 | −1.8 to −1.2
3 | −2.3 | < 0.001 | −3.6 | −3.8 to −3.3
4 | −0.3 | 0.1 | −3.4 | −3.7 to −3.2
Modelled: delta = 1, n = 234 per group, common SD = 2.2, power = 0.9
Bad science Interpreting the data
Filter III: Are the results important for your practice?
• Is the study population similar to the patients in your practice?
• Is the intervention feasible in your own clinical setting?
• What are your patient’s personal risks and potential benefits from the therapy?
• What alternative treatments are available?
American Society for Reproductive Medicine, 2008
“The average human has one breast and one testicle”
Desmond MacHale (School of Mathematical Sciences, University College Cork, Ireland)