nci clinical genomics data sharing ncra sept 2016

The Genomic Data Commons: Data Sharing and Cancer Research Knowledge Ecosystem. Warren Kibbe, PhD, [email protected], @wakibbe. September 26th, 2016

Upload: warren-kibbe

Post on 09-Feb-2017


TRANSCRIPT

Page 1: Nci clinical genomics data sharing ncra sept 2016

The Genomic Data Commons

Data Sharing and Cancer Research Knowledge Ecosystem

Warren Kibbe, PhD

[email protected]

@wakibbe

September 26th, 2016

Page 2: Nci clinical genomics data sharing ncra sept 2016

Cancer Data Sharing

Genomic Data Commons

Cancer Data Ecosystem

• Support open science
• Support data reusability
• Precision oncology
• Improve patient access to clinical trials

Reduce the risk, improve early detection, outcomes and survivorship in cancer

Page 3: Nci clinical genomics data sharing ncra sept 2016

Cancer Moonshot: Genomic Data Commons, Data sharing, Ecosystem

The era of precision oncology is predicated on the integration of research, care, and molecular medicine and the availability of data for modeling, risk analysis, and optimal care

The Genomic Data Commons is a completely new way of making scientific data available to the cancer research community

Page 4: Nci clinical genomics data sharing ncra sept 2016

The Cancer Genomic Data Commons (GDC) is an existing effort to standardize and simplify submission of genomic data to NCI and follow the principles of FAIR – Findable, Accessible, Attributable, Interoperable, Reusable, and Provide Recognition.

The GDC is part of the NIH Big Data to Knowledge (BD2K) initiative and an example of the NIH Commons

Genomic Data Commons

Microattribution, nanopublications, tracking the use of data, annotation of data, and use of algorithms support the data/software/metadata life cycle, providing credit and allowing analysis of the impact of data, software, analytics, algorithms, curation, and knowledge sharing

Force11 white paper: https://www.force11.org/group/fairgroup/fairprinciples

Page 5: Nci clinical genomics data sharing ncra sept 2016

Cancer Research Data Ecosystem – Cancer Moonshot BRP

[Diagram labels]

Well characterized Research Data Sets, Cancer Cohorts, Patient Data

EHR, Lab Data, Imaging, PROs, Smart Devices, Decision Support

Learning from every cancer patient

Active research participation

Research information donor

Clinical Research, Observational Studies, Clinical Trials

Proteogenomics, Imaging Data

Discovery, Patient-engaged Research

Surveillance, Big Data

Implementation Research

SEER, GDC

Page 6: Nci clinical genomics data sharing ncra sept 2016

Cancer Research Data Commons Ecosystem

[Diagram labels]

Genomic Data Commons

Imaging Data Commons

Proteomics Data Commons

Clinical Data Commons (Cohorts / Indiv.)

SEER (Populations)

Data Validation and Harmonization

Data Contributors and Consumers: Researchers, Patients, Clinicians, Institutions

Page 7: Nci clinical genomics data sharing ncra sept 2016

Why is the GDC important? A data ecosystem?

How does this move cancer research forward?

Page 8: Nci clinical genomics data sharing ncra sept 2016

Page 9: Nci clinical genomics data sharing ncra sept 2016

Biological Scales

Page 10: Nci clinical genomics data sharing ncra sept 2016

Page 11: Nci clinical genomics data sharing ncra sept 2016

Digital technology is changing rapidly

Page 12: Nci clinical genomics data sharing ncra sept 2016

Cancer therapy is changing

Page 13: Nci clinical genomics data sharing ncra sept 2016

Application of Cancer Genomics is changing

Page 14: Nci clinical genomics data sharing ncra sept 2016

What about Data Sharing?

Making data available for discovery, validation, new therapies

Working toward a National Learning Health System for Cancer

Maximizing the impact, reuse, and reproducibility of cancer research

We need to change the incentives for data sharing

Page 15: Nci clinical genomics data sharing ncra sept 2016

NIH Genomic Data Sharing Policy

https://gds.nih.gov/

Went into effect January 25, 2015

NCI guidance: http://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data

Requires public sharing of genomic data sets

Page 16: Nci clinical genomics data sharing ncra sept 2016

http://cancer.gov/brp

Page 17: Nci clinical genomics data sharing ncra sept 2016

BRP RECOs

Page 18: Nci clinical genomics data sharing ncra sept 2016

BRP RECOs

Page 19: Nci clinical genomics data sharing ncra sept 2016

BRP RECOs

Page 20: Nci clinical genomics data sharing ncra sept 2016

BRP RECOs

Page 21: Nci clinical genomics data sharing ncra sept 2016

BRP and a Cancer Data Ecosystem

All the BRP Recos need some IT/informatics/analytics infrastructure

Page 22: Nci clinical genomics data sharing ncra sept 2016

What is the GDC?

Page 23: Nci clinical genomics data sharing ncra sept 2016

Genomic Data Commons Data Portal

Page 24: Nci clinical genomics data sharing ncra sept 2016

The NCI Genomic Data Commons User Interface: Home Page

Page 25: Nci clinical genomics data sharing ncra sept 2016

The NCI Genomic Data Commons User Interface: Sample Browser

Page 26: Nci clinical genomics data sharing ncra sept 2016

The NCI Genomic Data Commons User Interface: Data Submission Dashboard

Dashboard panels: Clinical data, Biospecimen data, Molecular data, Files uploaded

Page 27: Nci clinical genomics data sharing ncra sept 2016

Development of the NCI Genomic Data Commons (GDC) To Foster the Molecular Diagnosis and Treatment of Cancer

GDC

Bob Grossman, PI, Univ. of Chicago

Ontario Inst. Cancer Res.

Leidos

Institute of Medicine, Towards Precision Medicine, 2011

Page 28: Nci clinical genomics data sharing ncra sept 2016
Page 29: Nci clinical genomics data sharing ncra sept 2016
Page 30: Nci clinical genomics data sharing ncra sept 2016

Power Calculation for Cancer Driver Discovery

Discovery of Cancer Drivers With 2% Prevalence

Lung adeno.: +2,900

Colorectal: +1,200

Ovarian: +500

Lawrence et al, Nature 2014

Need to resequence >100,000 tumors to identify all cancer drivers at >2% prevalence
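Why low-prevalence drivers demand large cohorts can be illustrated with a deliberately simplified binomial model. The sketch below is not the power calculation of Lawrence et al., which also accounts for background mutation rates and genome-wide significance; it only asks how many tumors must be sequenced before at least a chosen number of them are expected to carry a driver at a given prevalence. The function names, the threshold of 10 carriers, and the 90% target are illustrative assumptions.

```python
from math import comb


def prob_at_least(n_tumors: int, prevalence: float, min_hits: int) -> float:
    """P(X >= min_hits) for X ~ Binomial(n_tumors, prevalence)."""
    # Sum the probabilities of seeing fewer than min_hits carriers,
    # then take the complement.
    p_fewer = sum(
        comb(n_tumors, i) * prevalence**i * (1 - prevalence) ** (n_tumors - i)
        for i in range(min_hits)
    )
    return 1.0 - p_fewer


def min_tumors_needed(prevalence: float, min_hits: int, power: float) -> int:
    """Smallest cohort size whose probability of min_hits carriers reaches power."""
    n = min_hits  # fewer tumors than required carriers can never succeed
    while prob_at_least(n, prevalence, min_hits) < power:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical settings: require 10 carriers with 90% probability.
    for prevalence in (0.10, 0.05, 0.02):
        n = min_tumors_needed(prevalence, min_hits=10, power=0.9)
        print(f"prevalence {prevalence:.0%}: about {n} tumors")
```

Even in this toy model the required cohort grows quickly as prevalence falls, and the requirement applies per tumor type, which is one reason aggregate targets reach into the tens of thousands.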

Page 31: Nci clinical genomics data sharing ncra sept 2016

GDC Content

GDC

Current
• TCGA: 11,353 cases
• TARGET: 3,178 cases

Coming soon
• Foundation Medicine: 18,000 cases
• Cancer studies in dbGaP: ~4,000 cases

Planned (1-3 years)
• NCI-MATCH: ~5,000 cases
• Clinical Trial Sequencing Program: ~3,000 cases
• Cancer Driver Discovery Program: ~5,000 cases
• Human Cancer Model Initiative: ~1,000 cases
• APOLLO – VA-DoD: ~8,000 cases

Total: ~58,000 cases
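The ~58,000 total can be checked against the cohort figures on this slide. The following is a minimal sketch: the case counts come from the slide (with "~" values treated as exact), while the assignment of cohorts to the "Current", "Coming soon", and "Planned" groups is an assumption inferred from the slide layout.

```python
# Case counts as listed on the slide; approximate ("~") values are used as-is.
# The grouping by status is an assumption based on the slide layout.
GDC_CASES = {
    "Current": {"TCGA": 11_353, "TARGET": 3_178},
    "Coming soon": {"Foundation Medicine": 18_000, "Cancer studies in dbGaP": 4_000},
    "Planned (1-3 years)": {
        "NCI-MATCH": 5_000,
        "Clinical Trial Sequencing Program": 3_000,
        "Cancer Driver Discovery Program": 5_000,
        "Human Cancer Model Initiative": 1_000,
        "APOLLO - VA-DoD": 8_000,
    },
}


def cases_by_status(content: dict) -> dict:
    """Sum the case counts within each status group."""
    return {status: sum(cohorts.values()) for status, cohorts in content.items()}


def total_cases(content: dict) -> int:
    """Sum the case counts across all groups."""
    return sum(cases_by_status(content).values())


if __name__ == "__main__":
    for status, count in cases_by_status(GDC_CASES).items():
        print(f"{status}: {count:,} cases")
    print(f"Total: {total_cases(GDC_CASES):,} cases")
```

The exact sum of the listed figures is 58,531, consistent with the rounded total on the slide.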

Page 32: Nci clinical genomics data sharing ncra sept 2016

What Makes GDC Special?

• Stores raw genomic data, allowing continuous reanalysis as computation methods and genome annotations improve
• NCI commitment to maintain long-term storage of cancer genomic data in the GDC with free access to researchers
• Utilizes shared bioinformatic pipelines to facilitate cross-study comparisons and integrated analysis of multiple data types
• Maintains harmonized clinical data in a highly structured and extensible schema
• Enables researchers to comply with the NIH Genomic Data Sharing policy as well as journal requirements for data sharing

GDC

The explanatory power of data in the GDC will grow over time as it accrues more cases => GDC will promote precision oncology

Page 33: Nci clinical genomics data sharing ncra sept 2016

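Other Cancer Data Sharing Efforts

(Each row pairs an effort's signature activities with the data it holds; the effort names appeared as logos on the slide and are not in the transcript.)

• Signature efforts: BRCA Challenge; somatic variant sharing. Data: isolated genetic variants; no raw sequencing data.
• Signature efforts: precision medicine questions; somatic variant sharing. Data: panel gene resequencing; clinical response.
• Signature efforts: clinical trial; public-private partnerships. Data: comprehensive genomics; detailed clinical phenotype data.
• Signature efforts: clinical trial access; clinical/genomic data aggregation. Data: EHR data; clinical sequencing.
• Signature efforts: clinical oncology standards. Data: EHR data; clinical sequencing.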

Page 34: Nci clinical genomics data sharing ncra sept 2016

GDC

Towards a Cancer Knowledge System

Continue genomic investigations of cancer
=> Need > 100,000 cases analyzed
=> Embrace all genomic platforms
=> Relationship of relapse and primary biopsies

Incorporate associated clinical annotations
=> Clinical trial data
=> Observational, longitudinal standard-of-care data
=> N-of-1 clinical data

Promote and curate biological investigations of cancer genetic variants
=> Driver vs. passenger mutations
=> Multiple phenotypic assays
=> Alterations in regulatory pathways – proteomics
=> Mechanisms of therapeutic resistance
=> Functional genomic investigations

Integrative models for high-dimensional data

Page 35: Nci clinical genomics data sharing ncra sept 2016

GDC

Utility of a Cancer Knowledge System

Identify low-frequency cancer drivers

Define genomic determinants of response to therapy

Compose clinical trial cohorts sharing targeted genetic lesions

Cancer information donors

Genomic Data Commons

Page 36: Nci clinical genomics data sharing ncra sept 2016

Genomic Data Commons / Cloud Pilot Ecosystem – Near Term

[Diagram labels]

Genomic Data Commons: Data Submission, Harmonization, Visualization & Download

Access: APIs, Web Interface

Users: Researchers, Contributors

Authentication & Authorization thru eRA Commons and dbGaP

Cloud pilots: Broad FireCloud, ISB CGC, SBG CGC

Commons @ AWS, Commons @ GCP, Commons @ Azure

New algorithms, tools, pipelines, visualizations

DockStore
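The diagram lists APIs as one route by which researchers reach GDC data. As a hedged sketch of what such programmatic access can look like, the code below builds a query for open-access case metadata against the public GDC REST API. The base URL, the `/cases` endpoint, and the field names are not given in the slides and are assumptions based on the public GDC API; `build_cases_url` and `fetch_json` are hypothetical helper names. Controlled-access data additionally requires authorization through eRA Commons and dbGaP, which this sketch does not cover.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint; the slides do not give an API address.
GDC_API = "https://api.gdc.cancer.gov"


def build_cases_url(program: str, size: int = 5) -> str:
    """Build a /cases query URL for one program (for example "TCGA")."""
    # GDC-style filter: match cases whose project belongs to the program.
    filters = {
        "op": "in",
        "content": {"field": "project.program.name", "value": [program]},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "submitter_id,project.project_id,primary_site",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_API}/cases?{urllib.parse.urlencode(params)}"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Fetch a URL and decode the JSON body (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A call such as `fetch_json(build_cases_url("TCGA"))` would be expected to return a JSON document with matching cases listed under `data.hits`, assuming the endpoint behaves as publicly documented. Keeping URL construction separate from the network call makes the query easy to inspect before it is sent.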

Page 37: Nci clinical genomics data sharing ncra sept 2016

Support the Precision Medicine Initiative

• Expand data model to include other data (e.g. imaging and proteomics)

• Allow easy publication of persistent links to data, annotations, algorithms, tools, workflows

• Measure usage and impact

• Change incentives for public contributions

The Genomic Data Commons and Cloud Pilots

Page 38: Nci clinical genomics data sharing ncra sept 2016

Cancer data ecosystem

[Diagram labels]

Well characterized research data sets, Cancer cohorts, Patient data

EHR, lab data, imaging, PROs, smart devices, decision support

Learning from every cancer patient

Active research participation

Research information donor

Clinical Research, Observational studies, Clinical trials

Proteogenomics, Imaging data

Discovery, Patient engaged Research

Surveillance, Big Data

Implementation research

SEER

Page 39: Nci clinical genomics data sharing ncra sept 2016

GDC Acknowledgements

NCI Center for Cancer Genomics: Lou Staudt, Zhining Wang, Martin Ferguson, JC Zenklusen, Daniela Gerhard, Deb Steverson

Univ. of Chicago: Bob Grossman, Allison Heath, Mike Ford, Zhenyu Zhang

Ontario Institute for Cancer Research: Vincent Ferretti, Francois Gerthoffert, JunJun Zhang

Leidos Biomedical Research: Mark Jensen, Sharon Gaheen, Himanso Sahni

NCI CBIIT: Tony Kerlavage, Tanja Davidsen

Page 40: Nci clinical genomics data sharing ncra sept 2016

Cancer Genomics Project Teams

CGC Pilot Team Principal Investigators
• Gad Getz, Ph.D. - Broad Institute - http://firecloud.org
• Ilya Shmulevich, Ph.D. - ISB - http://cgc.systemsbiology.net/
• Deniz Kural, Ph.D. - Seven Bridges - http://www.cancergenomicscloud.org

NCI Project Officer & CORs
• Anthony Kerlavage, Ph.D. - Project Officer
• Juli Klemm, Ph.D. - COR, Broad Institute
• Tanja Davidsen, Ph.D. - COR, Institute for Systems Biology
• Ishwar Chandramouliswaran, MS, MBA - COR, Seven Bridges Genomics

GDC Principal Investigator
• Robert Grossman, Ph.D. - University of Chicago
• Allison Heath, Ph.D. - University of Chicago
• Vincent Ferretti, Ph.D. - Ontario Institute for Cancer Research

NCI Leadership Team
• Doug Lowy, M.D.
• Lou Staudt, M.D., Ph.D.
• Stephen Chanock, M.D.
• George Komatsoulis, Ph.D.
• Warren Kibbe, Ph.D.

Center for Cancer Genomics Partners
• JC Zenklusen, Ph.D.
• Daniela Gerhard, Ph.D.
• Zhining Wang, Ph.D.
• Liming Yang, Ph.D.
• Martin Ferguson, Ph.D.

Page 41: Nci clinical genomics data sharing ncra sept 2016

Questions?

Warren Kibbe, Ph.D.

[email protected]

@wakibbe

Page 42: Nci clinical genomics data sharing ncra sept 2016

www.cancer.gov www.cancer.gov/espanol