The Reproducibility Mindset: Enhancing Big Data Quality in Medicine and Research
Keith A. Baggerly
Bioinformatics and Computational Biology
UT M. D. Anderson Cancer Center
[email protected]
Big Data Analytics, June 13, 2017
1
What Makes Data High Quality?
My (bioinformatics) viewpoint: big data is thousands of measurements per sample
Findability / Accessibility (publication, data sharing)
Accuracy / Precision (bias, variance)
Clarity / Labeling / Metadata (sanity checks)
Generality (experimental design, confounding)
Relevance to the Problem at Hand
2
What Makes Inference High Quality?
Mostly the same criteria applied to the methods used
Can we find/access the code?
Can we understand it? Is the workflow clear?
Are the methods employed appropriate?
Will results replicate? (design, prespecification, train/test)
These criteria also apply to any preprocessing of “raw” data into the final form (often important for big data)
None of this is inherently complicated
3
Relevance to Medicine and Research?
Many biomedical claims are failing tests of replicability
2005, 2009 Ioannidis (et al)
2012 Begley and Ellis
2015 Freedman et al
2016 NIH Rigor and Reproducibility Initiative
2017 NSF, NASEM
We’re getting better, but there’s still room for improvement
Let’s look at some examples...
4
A Proteomics Case Study
Petricoin et al (2002), Lancet, 359(9306):572-77
100 ovarian cancer patients
100 normal controls
16 patients with “benign disease”
Use 50 cancer and 50 normal spectra to train a classification method; test the algorithm on the remaining samples.
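The split described on this slide can be sketched in R. This is a minimal illustration that assumes the training spectra were chosen at random, which the slide does not state; `labels`, `trainIdx`, and `testIdx` are hypothetical names.

```r
# Sample labels in the counts given on the slide.
labels <- c(rep("cancer", 100), rep("normal", 100), rep("benign", 16))

# Fixed seed so the same split can be regenerated later.
set.seed(1)

# Assumption: 50 cancer and 50 normal samples are drawn at random for training.
trainIdx <- c(sample(which(labels == "cancer"), 50),
              sample(which(labels == "normal"), 50))

# Everything not used for training is held out for testing.
testIdx <- setdiff(seq_along(labels), trainIdx)

table(labels[trainIdx])
table(labels[testIdx])
```

Recording the seed and the index vectors alongside the results is what lets someone else check which samples were held out.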
5
Which Group is Different?
6
Really?
7
Processing Can Trump Biology: Design!
8
Using Cell Lines to Predict Sensitivity
Potti et al (2006), Nature Medicine, 12:1294-300.
The main conclusion: we can use microarray data from cell lines (the NCI60) to define drug response “signatures”, which can predict whether patients will respond.
They provide examples using 7 commonly used agents.
This got people at MDA very excited.
9
Their Gene List and Ours
> temp <- cbind(
+   sort(rownames(pottiUpdated)[fuRows]),
+   sort(rownames(pottiUpdated)[
+     [email protected] <= fuCut]));
> colnames(temp) <- c("Theirs", "Ours");
> temp
     Theirs       Ours
...
[3,] "1881_at"    "1882_g_at"
[4,] "31321_at"   "31322_at"
[5,] "31725_s_at" "31726_at"
[6,] "32307_r_at" "32308_r_at"
...
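The listing suggests the two gene lists differ by a one-row offset. A minimal sketch of how such an offset could be checked follows. The probe IDs come from the listing, but the `filler_*` entries, the array order, and the helper `offsetMatch` are hypothetical, made up for illustration.

```r
# Hypothetical array order; "filler_*" entries are invented placeholders.
probeIds <- c("filler_a", "1881_at", "1882_g_at",
              "filler_b", "31321_at", "31322_at",
              "filler_c", "31725_s_at", "31726_at",
              "filler_d", "32307_r_at", "32308_r_at", "filler_e")

theirs <- c("1881_at", "31321_at", "31725_s_at", "32307_r_at")
ours   <- c("1882_g_at", "31322_at", "31726_at", "32308_r_at")

# Fraction of IDs in 'a' that fall in 'b' after shifting 'a' by k rows.
offsetMatch <- function(a, b, ids, k) {
  shifted <- match(a, ids) + k
  ok <- !is.na(shifted) & shifted >= 1 & shifted <= length(ids)
  sum(ids[shifted[ok]] %in% b) / length(a)
}

# Try shifts of -1, 0, and +1 rows.
matchByOffset <- sapply(-1:1, function(k) offsetMatch(theirs, ours, probeIds, k))
names(matchByOffset) <- -1:1
matchByOffset
```

A shift that takes the agreement from near zero to near complete is strong evidence of an indexing error rather than a biological difference.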
10
Predicting Response: Docetaxel
Potti et al (2006), Nature Medicine, 12:1294-300, Fig 1d
Chang et al, Lancet 2003, 362:362-9, Fig 2 top
11
Predicting Response: Adriamycin
Potti et al (2006), Nature Medicine, 12:1294-300, Fig 2c
Holleman et al, NEJM 2004, 351:533-42, Fig 1
12
We Tried Matching Their Validation Samples
43 samples are mislabeled.
16 samples don’t match because the genes are mislabeled.
All of the validation data are wrong.
We reported this to Duke and the NCI in mid-Nov 2009.
13
A Catalyzing Event: July 16, 2010
Jul 19/20: Letter to Varmus; Duke resuspends trials.
Oct 22/9: First call for paper retraction.
Nov 9: Duke terminates trials.
Nov 19: Call for Nat Med retraction; Potti resigns.
14
Age-Related Macular Degeneration (AMD)
AREDS, 2001
Awh et al, 2013: Genomics matters!
15
However...
Chew et al, 2014: No it doesn’t!
Awh et al, 2015: Yes it does!
16
Do we Need to Genotype?
Awh et al, 2015, Fig 2
Statistical arbitration sought
17
Challenges
Data cleaning
Experimenter degrees of freedom
How many genes were examined before choosing CFH and ARMS2?
How were genotype groups defined? Was this algorithmic?
Are we working with the same data?
Will the claims hold in an independent test set?
18
Meta-Analysis: Dietary Reference Intakes (DRIs)
Recommended Dietary Allowance (RDA): the average daily dietary intake level that is sufficient to meet the nutrient requirement of nearly all (97 to 98 percent) healthy individuals in a group.
Estimated Average Requirement (EAR): a nutrient intake value that is estimated to meet the requirement of half the healthy individuals in a group.
Tolerable Upper Intake Level (UL): the highest daily nutrient intake likely to pose no risk of adverse health effects to almost all individuals in the general population. As intake increases above the UL, the risk of adverse effects increases.
Adequate Intake (AI): the fallback when there is too little evidence to set an EAR
19
Nomenclature
Intakes in IU: we consume this
Serum levels in ng/mL (1 ng/mL = 2.5 nmol/L): we link this to outcome
Requirements in terms of either
20
Modeling Intakes and Requirements
IOM 2000: DRIs in Dietary Assessment, Fig 4.2
Assume normality and model.
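Under the normality assumption, the EAR and RDA definitions reduce to the median and roughly the 97.5th percentile of the requirement distribution. A minimal sketch follows; the values of `ear` and `sdReq` are invented for illustration and are not taken from any IOM report.

```r
# Hypothetical requirement distribution (illustrative numbers only).
ear   <- 100   # Estimated Average Requirement: median of requirements
sdReq <- 15    # assumed SD of individual requirements

# Under normality, the RDA is the 97.5th percentile, about EAR + 2 SD.
rda <- qnorm(0.975, mean = ear, sd = sdReq)

# Fraction of individuals whose requirement is met at each intake level.
coverage <- pnorm(c(ear, rda), mean = ear, sd = sdReq)
names(coverage) <- c("EAR", "RDA")
round(coverage, 3)
```

The same two lines with a different percentile show how sensitive the recommended level is to the assumed spread of requirements.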
21
Defining Requirements in Other Units
Durazo-Arvizu et al, 2010, Fig 1
When is the nutrient product not biochemically limiting?
If vitamin D is too low to regulate calcium, parathyroid hormone (PTH) will increase and leach Ca from bones.
Requirements use serum levels (ng/mL); intakes use IU.
22
Priemel et al, Figure 4d: OV/BV
OV/BV values ≥ 1.2% or 2% are bad.
Priemel et al recommended targeting >30 ng/mL.
23
The Official Cutpoint
20 ng/mL = 50 nmol/L.
Why? Because 97.5% isn’t 100%.
“The number ... above 50 nmol/L was counted by inspection...
At ... 50 nmol/L, there were seven data points reflecting ... (OV/BV > 2 percent).
This suggested ... 50 nmol/L met the needs of 99 percent ... (that is, only 7 of 675 surpassed the measure).”
IOM report, p.276.
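The 99 percent in the quote comes from dividing by all 675 subjects. The sketch below shows how the figure depends on that denominator. The `nearCut` values are hypothetical counts of subjects with serum levels near the cutpoint, and the calculation assumes, for illustration only, that all seven failures fall among them.

```r
failures <- 7     # data points with OV/BV > 2 percent (from the quote)
total    <- 675   # denominator used in the quote

# Coverage as computed in the quoted passage.
coverageAll <- 1 - failures / total

# Hypothetical numbers of subjects near the cutpoint (invented values).
nearCut <- c(50, 100, 200)
coverageNear <- 1 - failures / nearCut

round(c(all = coverageAll, coverageNear), 3)
```

If the relevant group is much smaller than 675, the same seven failures imply coverage well below the 97.5 percent target.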
24
Come Again?
Is this picture reasonable?
25
Zooming In on 20 ng/mL
This rate of problems is way too high
26
Mapping Serum to Intake, IOM 2011 Fig 5.4
Are cohort averages (dots) vertically close to a curve?
How can we model variation in attainments?
27
SD(Attainments) for the Studies Used
Here, σY /σX ≈ 4. (One study used SEM.)
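Whether a study reports SD or SEM matters because SEM = SD / sqrt(n): the SEM describes uncertainty in the cohort mean, not the spread of individual attainments. A minimal sketch with invented values (`n`, `sem`, and `meanLevel` are hypothetical):

```r
# Hypothetical study summary (illustrative values only).
n         <- 64    # subjects in the study arm
sem       <- 1.5   # reported standard error of the mean, ng/mL
meanLevel <- 25    # reported mean serum level, ng/mL

# SEM = SD / sqrt(n), so the SD of individual attainments is:
sdIndividual <- sem * sqrt(n)

# Approximate 2-sigma ranges for the cohort mean and for individuals.
ranges <- rbind(
  cohortMean = meanLevel + c(-2, 2) * sem,
  individual = meanLevel + c(-2, 2) * sdIndividual
)
colnames(ranges) <- c("lower", "upper")
ranges
```

A band of about 2 SEM around a fitted curve therefore says little about where any one person's serum level will land.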
28
All the Data
IOM in red (about 2 SEM); prediction in black.
29
Doing it Better: IMPACT, May 8
Zehir et al, Nat Med
10945 samples from 10336 patients
30
Most Data are Publicly Available
From the Paper
The Supplementary Information (meta-data, annotation)
The cBio Portal http://cbioportal.org/msk-impact
GitHub repositories of their data processing pipelines
Not BAM level raw data, but somatic mutation calls, variant allele fractions, and the like
31
We Can Check It: TP53
The uber-tumor suppressor: break anywhere
32
We Can Check It: KRAS
A key oncogene: break in very specific places
33
The Bottom Lines
These cases may be pathological.
But we see similar problems a lot.
The most common mistakes are simple.
Confounding in the Experimental Design
Mixing up the sample labels
Mixing up the gene labels
Mixing up the group labels
(Most mixups involve simple switches or offsets)
This simplicity is often hidden.
Incomplete documentation
This is fixable.
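One simple sanity check for the first mistake on this slide is to cross-tabulate biological group against processing batch before any modeling. A minimal sketch follows; the `sampleSheet` contents are hypothetical.

```r
# Hypothetical sample sheet: biological group and processing batch per sample.
sampleSheet <- data.frame(
  group = rep(c("cancer", "normal"), each = 6),
  batch = rep(c("day1", "day2"), each = 6)
)

# Cross-tabulate the design factors.
design <- table(sampleSheet$group, sampleSheet$batch)
design

# TRUE when each group appears in only one batch; if the batches differ,
# group and batch effects then cannot be separated.
confounded <- all(rowSums(design > 0) == 1)
confounded
```

Empty cells in the table are a warning that an apparent group difference may come from processing rather than biology.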
34
Reasons for Hope
1. Our Own (Evolving!) Experience & Sanity Checks
2. Better tools (knitr, markdown, GitHub)
3. Journals, Code and Data
4. The IOM, the FDA, and IDEs*
5. The NCI and Trials it Funds
6. OSTP, Congress, Science, Nature
As I perform an analysis, am I confident I or someone else could easily get the same results again, or modify the analysis if need be?
35
Some Places to Learn More
Karl Broman’s Tools for RR Course
Roger Peng’s Coursera course and notes (2013)
Christopher Gandrud’s book (2e, 2015)
Yihui Xie’s book (2e, 2015)
Hadley Wickham’s R Packages book (2015)
NAS meeting, Feb 26-7, 2015
SISBID Reproducible Research Short Course, July 2016
ASA Webinar, Nov 16, 2016
36
Thanks!