The Reproducibility Mindset: Enhancing Big Data Quality in Medicine and Research

Keith A. Baggerly

Bioinformatics and Computational Biology

UT M. D. Anderson Cancer Center

Big Data Analytics, June 13, 2017


What Makes Data High Quality?

My (bioinformatics) viewpoint: big data is thousands of measurements per sample

Findability / Accessibility (publication, data sharing)

Accuracy / Precision (bias, variance)

Clarity / Labeling / Metadata (sanity checks)

Generality (experimental design, confounding)

Relevance to the Problem at Hand


What Makes Inference High Quality?

Mostly the same criteria applied to the methods used

Can we find/access the code?

Can we understand it? Is the workflow clear?

Are the methods employed appropriate?

Will results replicate? (design, prespecification, train/test)

These criteria also apply to any preprocessing of “raw” data into the final form (often important for big data)

None of this is inherently complicated


Relevance to Medicine and Research?

Many biomedical claims are failing tests of replicability

2005, 2009 Ioannidis (et al)

2012 Begley and Ellis

2015 Freedman et al

2016 NIH Rigor and Reproducibility Initiative

2017 NSF, NASEM

We’re getting better, but there’s still room for improvement

Let’s look at some examples...

http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124

http://www.nature.com/ng/journal/v41/n2/full/ng.295.html

http://www.nature.com/nature/journal/v483/n7391/full/483531a.html

http://www.plosbiology.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pbio.1002165&representation=PDF

https://www.nih.gov/research-training/rigor-reproducibility

http://www.mrsec.harvard.edu/2017NSFReliability/index.html

https://www.youtube.com/playlist?list=PLGJm1x3XQeK0FeRdgKcyvyBH8TKUBtwFv


A Proteomics Case Study

Petricoin et al (2002), Lancet, 359(9306):572-77

100 ovarian cancer patients

100 normal controls

16 patients with “benign disease”

Use 50 cancer and 50 normal spectra to train a classification method; test the algorithm on the remaining samples.

http://www.sciencedirect.com/science/article/pii/S0140673602077462
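A prespecified split like this one is easy to make reproducible by fixing the random seed in a script. The sketch below is an illustration under assumptions, not the procedure Petricoin et al used: the sample IDs, the helper name `split_samples`, and the seed are hypothetical; only the group sizes come from this slide.

```python
import random

def split_samples(cancer, normal, benign, n_train=50, seed=2002):
    """Hypothetical helper: draw n_train cancer and n_train normal samples
    for training; everything else (including all benign cases) is test."""
    rng = random.Random(seed)  # fixed seed so the split can be re-run exactly
    train = set(rng.sample(cancer, n_train)) | set(rng.sample(normal, n_train))
    test = [s for s in cancer + normal + benign if s not in train]
    return sorted(train), test

# Hypothetical sample IDs matching the group sizes on this slide.
cancer = [f"cancer_{i:03d}" for i in range(100)]
normal = [f"normal_{i:03d}" for i in range(100)]
benign = [f"benign_{i:03d}" for i in range(16)]

train, test = split_samples(cancer, normal, benign)
print(len(train), len(test))  # 100 116
```

Because the seed is part of the script, anyone rerunning it gets the same training and test sets, which is what makes the train/test claim checkable later.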


Which Group is Different?


Really?


Processing Can Trump Biology: Design!
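A quick design sanity check is to cross-tabulate the biological group against each processing variable (run date, plate, batch). In this minimal sketch the labels and helper names are hypothetical; it flags the worst case, where every batch holds only one group, so processing effects cannot be separated from biology.

```python
from collections import Counter

def crosstab(groups, batches):
    """Count samples for each (batch, group) pair."""
    return Counter(zip(batches, groups))

def fully_confounded(groups, batches):
    """True if every batch contains samples from only one group,
    so batch effects and group effects cannot be separated."""
    groups_per_batch = {}
    for g, b in zip(groups, batches):
        groups_per_batch.setdefault(b, set()).add(g)
    return all(len(gs) == 1 for gs in groups_per_batch.values())

# Hypothetical design: all cancers run on day 1, all controls on day 2.
groups = ["cancer"] * 4 + ["normal"] * 4
batches = ["day1"] * 4 + ["day2"] * 4
print(crosstab(groups, batches))
print(fully_confounded(groups, batches))  # True

# A blocked design mixes groups within each batch.
balanced = ["day1", "day2"] * 4
print(fully_confounded(groups, balanced))  # False
```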


Using Cell Lines to Predict Sensitivity

Potti et al (2006), Nature Medicine, 12:1294-300.

The main conclusion: we can use microarray data from cell lines (the NCI60) to define drug response “signatures”, which can predict whether patients will respond.

They provide examples using 7 commonly used agents.

This got people at MDA very excited.

http://www.nature.com/nm/journal/v12/n11/full/nm1491.html


Their Gene List and Ours

> temp <- cbind(sort(rownames(pottiUpdated)[fuRows]),
+               sort(rownames(pottiUpdated)[[email protected] <= fuCut]))
> colnames(temp) <- c("Theirs", "Ours")
> temp
     Theirs       Ours
...
[3,] "1881_at"    "1882_g_at"
[4,] "31321_at"   "31322_at"
[5,] "31725_s_at" "31726_at"
[6,] "32307_r_at" "32308_r_at"
...
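The pairs above look like an off-by-one offset: each of “their” probesets sits one position before “ours”. A minimal sketch of checking for a constant shift, assuming both lists index the same ordered probeset annotation. The probe IDs are the ones on the slide, but `ordered_ids` is a toy stand-in for the real annotation (assumed adjacency), and the helper name is hypothetical.

```python
from collections import Counter

def index_shifts(theirs, ours, ordered_ids):
    """For paired IDs, return how many positions 'ours' sits after 'theirs'
    in a reference ordering of all IDs."""
    pos = {pid: i for i, pid in enumerate(ordered_ids)}
    return [pos[o] - pos[t] for t, o in zip(theirs, ours)]

# Pairs shown on the slide.
theirs = ["1881_at", "31321_at", "31725_s_at", "32307_r_at"]
ours = ["1882_g_at", "31322_at", "31726_at", "32308_r_at"]

# Toy reference ordering (assumption): each "ours" probeset immediately
# follows the corresponding "theirs" probeset.
ordered_ids = ["1881_at", "1882_g_at", "31321_at", "31322_at",
               "31725_s_at", "31726_at", "32307_r_at", "32308_r_at"]

shifts = index_shifts(theirs, ours, ordered_ids)
print(shifts)           # [1, 1, 1, 1]
print(Counter(shifts))  # Counter({1: 4})
```

A single constant shift across many pairs points to an offset in the labels rather than to random disagreement.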


Predicting Response: Docetaxel

Potti et al (2006), Nature Medicine, 12:1294-300, Fig 1d

Chang et al, Lancet 2003, 362:362-9, Fig 2 top


http://www.sciencedirect.com/science/article/pii/S0140673603140238


Predicting Response: Adriamycin

Potti et al (2006), Nature Medicine, 12:1294-300, Fig 2c

Holleman et al, NEJM 2004, 351:533-42, Fig 1


http://www.nejm.org/doi/full/10.1056/NEJMoa033513


We Tried Matching Their Validation Samples

43 samples are mislabeled.

16 samples don’t match because the genes are mislabeled.

All of the validation data are wrong.

We reported this to Duke and the NCI in mid-Nov 2009.

http://bioinformatics.mdanderson.org/Supplements/ReproRsch-All/Modified/index.html
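One way to check sample labels, sketched here as an assumption rather than the procedure actually used: correlate each reported profile with every raw profile and see whether the best match carries the same name. The profiles are tiny hypothetical vectors in which the labels of S1 and S2 have been swapped; the helper names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def best_matches(reported, raw):
    """Map each reported sample name to the raw sample it correlates with best."""
    return {name: max(raw, key=lambda r: pearson(prof, raw[r]))
            for name, prof in reported.items()}

# Hypothetical expression profiles (4 genes each).
raw = {"S1": [1.0, 2.0, 3.0, 4.0],
       "S2": [4.0, 3.0, 2.0, 1.0],
       "S3": [1.0, 3.0, 1.0, 3.0]}
# Reported data with the S1 and S2 labels swapped.
reported = {"S1": [4.1, 2.9, 2.1, 0.9],
            "S2": [0.9, 2.1, 3.0, 4.2],
            "S3": [1.1, 2.9, 1.0, 3.1]}

matches = best_matches(reported, raw)
mislabeled = [n for n, m in matches.items() if n != m]
print(mislabeled)  # ['S1', 'S2']
```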


A Catalyzing Event: July 16, 2010

Jul 19/20: Letter to Varmus; Duke resuspends trials.

Oct 22/9: First call for paper retraction.

Nov 9: Duke terminates trials.

Nov 19: Call for Nat Med retraction; Potti resigns.

http://www.cancerletter.com/articles/20131204_3

http://www.cancerletter.com/articles/20100803



http://www.cancerletter.com/articles/20101123_1


Age-Related Macular Degeneration (AMD)

AREDS, 2001

Awh et al, 2013 Genomics matters!

https://www.ncbi.nlm.nih.gov/pubmed/11594942



However...

Chew et al, 2014 No it doesn’t!

Awh et al, 2015 Yes it does!




Do we Need to Genotype?

Awh et al, 2015, Fig 2

Statistical arbitration sought



Challenges

Data cleaning

Experimenter degrees of freedom

How many genes were examined before choosing CFH and ARMS2?

How were genotype groups defined? Was this algorithmic?

Are we working with the same data?

Will the claims hold in an independent test set?


Meta-Analysis: Dietary Reference Intakes (DRIs)

Recommended Dietary Allowance (RDA): the average daily dietary intake level that is sufficient to meet the nutrient requirement of nearly all (97 to 98 percent) healthy individuals in a group.

Estimated Average Requirement (EAR): a nutrient intake value that is estimated to meet the requirement of half the healthy individuals in a group.

Tolerable Upper Intake Level (UL): the highest daily nutrient intake likely to pose no risk of adverse health effects to almost all individuals in the general population. As intake increases above the UL, the risk of adverse effects increases.

Adequate Intake (AI): the fallback when the evidence is insufficient to set an EAR (and hence an RDA)

https://www.ncbi.nlm.nih.gov/books/NBK45182/


Nomenclature

Intakes in IU: we consume this.

Serum levels in ng/mL (1 ng/mL = 2.5 nmol/L): we link this to outcome.

Requirements can be stated in terms of either.
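The unit bookkeeping can be made explicit. A minimal sketch: the 2.5 factor for serum 25(OH)D is the rounded value on this slide (more precisely about 2.496), and 1 µg of vitamin D = 40 IU is the standard intake conversion, added here for illustration. None of these functions maps intake to serum level; that relationship has to be modeled, as on the later slides.

```python
NMOL_PER_NG_ML = 2.5   # serum 25(OH)D: 1 ng/mL is about 2.5 nmol/L (rounded)
IU_PER_MICROGRAM = 40  # vitamin D intake: 1 microgram = 40 IU

def ng_ml_to_nmol_l(x):
    """Convert a serum level from ng/mL to nmol/L."""
    return x * NMOL_PER_NG_ML

def nmol_l_to_ng_ml(x):
    """Convert a serum level from nmol/L to ng/mL."""
    return x / NMOL_PER_NG_ML

def iu_to_micrograms(iu):
    """Convert a vitamin D intake from IU to micrograms."""
    return iu / IU_PER_MICROGRAM

print(ng_ml_to_nmol_l(20))    # 50.0
print(nmol_l_to_ng_ml(75))    # 30.0
print(iu_to_micrograms(600))  # 15.0
```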


Modeling Intakes and Requirements

IOM 2000: DRIs in Dietary Assessment, Fig 4.2

Assume normality and model.

https://www.nap.edu/catalog/9956/dietary-reference-intakes-applications-in-dietary-assessment
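Under the normality assumption, the RDA is conventionally set two standard deviations above the EAR (RDA = EAR + 2 SD), which covers about 97.7% of the population, matching the “97 to 98 percent” in the definition. A minimal sketch using Python's `statistics.NormalDist`; the EAR and SD values and the helper names are hypothetical placeholders.

```python
from statistics import NormalDist

def rda_from_ear(ear, sd):
    """RDA = EAR + 2 SD, assuming requirements are normally distributed."""
    return ear + 2 * sd

def fraction_covered(intake, ear, sd):
    """Fraction of the population whose requirement is at or below `intake`."""
    return NormalDist(mu=ear, sigma=sd).cdf(intake)

# Hypothetical requirement distribution, in arbitrary units.
ear, sd = 10.0, 1.5
rda = rda_from_ear(ear, sd)
print(rda)                                       # 13.0
print(round(fraction_covered(rda, ear, sd), 4))  # 0.9772
print(round(fraction_covered(ear, ear, sd), 2))  # 0.5
```

The EAR covers half the population by construction; the RDA's coverage depends entirely on the normality assumption holding in the upper tail.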


Defining Requirements in Other Units

Durazo-Arvizu et al, 2010, Fig 1

When is the nutrient product not biochemically limiting?

If vitamin D is too low to regulate calcium, parathyroid hormone (PTH) will increase and leach Ca from bones.

Requirements use serum levels (ng/mL); intakes use IU.



Priemel et al, Figure 4d: OV/BV

OV/BV (osteoid volume/bone volume) values ≥ 1.2% or 2% are bad.

Priemel et al recommended targeting >30 ng/mL.

http://onlinelibrary.wiley.com/doi/10.1359/jbmr.090728/abstract



The Official Cutpoint

20 ng/mL = 50 nmol/L.

Why? Because 97.5% isn’t 100%.

“The number ... above 50 nmol/L was counted by inspection...

At ... 50 nmol/L, there were seven data points reflecting ...(OV/BV > 2 percent).

This suggested ...50 nmol/L met the needs of 99 percent ... (that is, only 7 of 675 surpassed the measure).”

IOM report, p.276.
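The arithmetic in the quoted passage, written out: 7 exceedances out of 675 samples leaves about 99% not exceeding the OV/BV threshold. The sketch only reproduces that calculation (the helper name is hypothetical); the denominator is all 675 samples, and the following slides question whether this count supports the cutpoint.

```python
def share_not_exceeding(n_exceeding, n_total):
    """Share of samples that do NOT exceed the problem threshold."""
    return 1 - n_exceeding / n_total

share = share_not_exceeding(7, 675)
print(round(share, 4))      # 0.9896
print(round(100 * share))   # 99
```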


Come Again?

Is this picture reasonable?


Zooming In on 20 ng/mL

This rate of problems is way too high


Mapping Serum to Intake, IOM 2011 Fig 5.4

Are cohort averages (dots) vertically close to a curve?

How can we model variation in attainments?

http://www.nationalacademies.org/hmd/Reports/2010/Dietary-Reference-Intakes-for-Calcium-and-Vitamin-D.aspx


SD(Attainments) for the Studies Used

Here, σY/σX ≈ 4. (One study used SEM.)


All the Data

IOM interval in red (about 2 SEM); prediction interval in black.
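The distinction at issue: about 2 SEM describes uncertainty in a cohort average, while individual attainments scatter with the between-person SD, which is larger by a factor of √n. A minimal sketch with hypothetical numbers (mean, SD, n) and hypothetical helper names; the individual band uses roughly mean ± 2 SD and ignores the small extra uncertainty in the mean itself.

```python
from math import sqrt

def sem(sd, n):
    """Standard error of a mean of n observations with standard deviation sd."""
    return sd / sqrt(n)

def band(center, half_width):
    """Symmetric interval (low, high) around center."""
    return (center - half_width, center + half_width)

# Hypothetical cohort: mean attained serum level 60 nmol/L, SD 20 nmol/L, n = 100.
mean, sd, n = 60.0, 20.0, 100
print(band(mean, 2 * sem(sd, n)))  # (56.0, 64.0): where the cohort mean likely lies
print(band(mean, 2 * sd))          # (20.0, 100.0): roughly where most individuals lie
```

Setting an intake so that a cohort's average clears a cutpoint says little about the fraction of individuals who clear it.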


Doing it Better: IMPACT, May 8

Zehir et al, Nat Med: 10,945 samples from 10,336 patients

http://www.nature.com/nm/journal/vaop/ncurrent/full/nm.4333.html


Most Data are Publicly Available

From the Paper

The Supplementary Information (meta-data, annotation)

The cBio Portal http://cbioportal.org/msk-impact

GitHub repositories of their data processing pipelines

Not BAM-level raw data, but somatic mutation calls, variant allele fractions, and the like



We Can Check It: TP53

The uber-tumor suppressor: break anywhere


We Can Check It: KRAS

A key oncogene: break in very specific places
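The contrast between TP53 (mutations spread along the gene) and KRAS (mutations at a few codons) can be summarized as the share of mutations falling in the k most frequently hit codons. A minimal sketch; the position lists are hypothetical toys, not MSK-IMPACT data, though codons 12, 13, and 61 are the well-known KRAS hotspots. The helper name is hypothetical.

```python
from collections import Counter

def top_k_share(codons, k=3):
    """Fraction of mutations that fall in the k most frequently mutated codons."""
    counts = Counter(codons)
    top = sum(c for _, c in counts.most_common(k))
    return top / len(codons)

# Hypothetical toy inputs: protein positions of observed missense mutations.
kras_like = [12] * 6 + [13] * 2 + [61] * 1 + [146] * 1
tp53_like = [175, 248, 273, 245, 249, 282, 220, 158, 176, 193]

print(top_k_share(kras_like))  # 0.9
print(top_k_share(tp53_like))  # 0.3
```

Run on the public mutation calls, a check like this should show a much higher share for KRAS than for TP53; a result that did not would be a red flag about the data.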


The Bottom Lines

These cases may be pathological.

But we see similar problems a lot.

The most common mistakes are simple.

Confounding in the Experimental Design

Mixing up the sample labels

Mixing up the gene labels

Mixing up the group labels

(Most mixups involve simple switches or offsets.)

This simplicity is often hidden.

Incomplete documentation

This is fixable.


Reasons for Hope

1. Our Own (Evolving!) Experience & Sanity Checks

2. Better tools (knitr, markdown, GitHub)

3. Journals, Code and Data

4. The IOM, the FDA, and IDEs*

5. The NCI and Trials it Funds

6. OSTP, Congress, Science, Nature

As I perform an analysis, am I confident that I or someone else could easily get the same results again, or modify the analysis if need be?

http://yihui.name/knitr/

https://daringfireball.net/projects/markdown/

https://github.com/


Some Places to Learn More

Karl Broman’s Tools for RR Course

Roger Peng’s Coursera course and notes (2013)

Christopher Gandrud’s book (2e, 2015)

Yihui Xie’s book (2e, 2015)

Hadley Wickham’s R Packages book (2015)

NAS meeting, Feb 26-27, 2015

SISBID Reproducible Research Short Course, July 2016

ASA Webinar, Nov 16, 2016

http://kbroman.org/Tools4RR/

https://itunes.apple.com/us/book/id961495566?mt=11

http://www.amazon.com/Reproducible-Research-Studio-Second-Chapman-ebook/dp/B010ACWGBI/ref=tmm_kin_title_0?_encoding=UTF8&sr=&qid=

http://www.amazon.com/Dynamic-Documents-knitr-Second-Chapman-ebook/dp/B00ZBYPJEW/ref=tmm_kin_title_0?_encoding=UTF8&sr=&qid=

http://www.amazon.com/R-Packages-Hadley-Wickham-ebook/dp/B00VAYCHL0/ref=pd_sim_351_6?ie=UTF8&refRID=1E8HS30WBHRCW45SEWXM

http://sites.nationalacademies.org/DEPS/BMSA/DEPS_153236

https://github.com/SISBID/Module3

https://www.amstat.org/asa/files/pdfs/EDU-ReproducibleResearchWebinarTranscript.pdf


Thanks!