NCI Clinical Genomics Data Sharing, NCRA, Sept 2016
The Genomic Data Commons
Data Sharing and
Cancer Research Knowledge Ecosystem
Warren Kibbe, PhD
@wakibbe
September 26th, 2016
Cancer Data Sharing
Genomic Data Commons
Cancer Data Ecosystem
• Support open science
• Support data reusability
• Precision Oncology
• Improve patient access to clinical trials
Reduce the risk, improve early detection, outcomes and survivorship in cancer
Cancer Moonshot: Genomic Data Commons, Data sharing, Ecosystem
The era of precision oncology is predicated on the integration of research, care, and molecular medicine and the availability of data for modeling, risk analysis, and optimal care
The Genomic Data Commons is a completely new way of making
scientific data available to the cancer research community
The Cancer Genomic Data Commons (GDC) is an existing effort to standardize and simplify submission of genomic data to NCI and follow the principles of FAIR – Findable, Accessible, Attributable, Interoperable, Reusable, and Provide Recognition.
The GDC is part of the NIH Big Data to Knowledge (BD2K) initiative and an example of the NIH Commons
Genomic Data Commons
Microattribution, nanopublications, tracking the use of data, annotation of data, use of algorithms; supports the data/software/metadata life cycle to provide credit and analyze impact of data, software, analytics, algorithm, curation and knowledge sharing
Force11 white paper: https://www.force11.org/group/fairgroup/fairprinciples
Cancer Research Data Ecosystem – Cancer Moonshot BRP
Well-characterized research data sets
Cancer cohorts
Patient data
EHR, lab data, imaging, PROs, smart devices, decision support
Learning from every cancer patient
Active research participation
Research information donor
Clinical research
Observational studies
Proteogenomics
Imaging data
Clinical trials
Discovery
Patient-engaged research
Surveillance
Big Data
Implementation research
SEER
GDC
Cancer Research Data Commons Ecosystem
Genomic Data Commons
Data Validation and Harmonization
Imaging Data Commons
Proteomics Data Commons
Clinical Data Commons (Cohorts / Indiv.)
SEER (Populations)
Data Contributors and Consumers
Researchers
Patients
Clinicians
Institutions
Why is the GDC important? A data ecosystem?
How does this move cancer research forward?
Biological Scales
Digital technology is changing rapidly
Cancer therapy is changing
Application of Cancer Genomics is changing
What about Data Sharing?
Making data available for discovery, validation, new therapies
Working toward a National Learning Health System for Cancer
Maximizing the impact, reuse, and reproducibility of cancer research
We need to change the incentives for data sharing
NIH Genomic Data Sharing Policy
https://gds.nih.gov/ Went into effect January 25, 2015
NCI guidance: http://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data
Requires public sharing of genomic data sets
http://cancer.gov/brp
BRP Recommendations
BRP and a Cancer Data Ecosystem
All the BRP recommendations need some IT/informatics/analytics infrastructure
What is the GDC?
Genomic Data Commons Data Portal
The NCI Genomic Data Commons User Interface: Home Page
The NCI Genomic Data Commons User Interface: Sample Browser
Clinical data Biospecimen data
Molecular data Files uploaded
The NCI Genomic Data Commons User Interface: Data Submission Dashboard
Development of the NCI Genomic Data Commons (GDC) to Foster the Molecular Diagnosis and Treatment of Cancer
GDC
Bob Grossman, PI, Univ. of Chicago
Ontario Inst. Cancer Res.
Leidos
Institute of Medicine, Towards Precision Medicine, 2011
Discovery of Cancer Drivers With 2% Prevalence
Lung adeno. +2,900
Colorectal +1,200
Ovarian +500
Lawrence et al, Nature 2014
Power Calculation for Cancer Driver Discovery
Need to resequence >100,000 tumors to identify all cancer drivers at >2% prevalence
GDC Content
Current
• TCGA: 11,353 cases
• TARGET: 3,178 cases
Coming soon
• Foundation Medicine: 18,000 cases
• Cancer studies in dbGaP: ~4,000 cases
Planned (1-3 years)
• NCI-MATCH: ~5,000 cases
• Clinical Trial Sequencing Program: ~3,000 cases
• Cancer Driver Discovery Program: ~5,000 cases
• Human Cancer Model Initiative: ~1,000 cases
• APOLLO - VA-DoD: ~8,000 cases
Total: ~58,000 cases
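The ~58,000 total can be checked by adding up the per-program case counts listed on this slide. The following is a minimal sketch, not part of the original talk: the dictionary name and function are hypothetical, and the approximate ("~") figures are treated as exact numbers purely for the sake of the tally.

```python
# Case counts as listed on the GDC Content slide.
# Assumption: figures marked "~" on the slide are treated as exact here.
GDC_CASES = {
    "TCGA": 11_353,
    "TARGET": 3_178,
    "Foundation Medicine": 18_000,
    "Cancer studies in dbGaP": 4_000,
    "NCI-MATCH": 5_000,
    "Clinical Trial Sequencing Program": 3_000,
    "Cancer Driver Discovery Program": 5_000,
    "Human Cancer Model Initiative": 1_000,
    "APOLLO (VA-DoD)": 8_000,
}


def total_cases(counts):
    """Return the sum of all per-program case counts."""
    return sum(counts.values())


if __name__ == "__main__":
    # Print the tally with thousands separators for comparison with the slide.
    print(f"{total_cases(GDC_CASES):,} cases")
```

The sum lands slightly above 58,000, which is consistent with the rounded total quoted on the slide.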
What Makes GDC Special?
• Stores raw genomic data, allowing continuous reanalysis as computational methods and genome annotations improve
• NCI commitment to maintain long-term storage of cancer genomic data in the GDC with free access to researchers
• Utilizes shared bioinformatic pipelines to facilitate cross-study comparisons and integrated analysis of multiple data types
• Maintains harmonized clinical data in a highly structured and extensible schema
• Enables researchers to comply with the NIH Genomic Data Sharing policy as well as journal requirements for data sharing
The explanatory power of data in the GDC will grow over time as it accrues more cases => GDC will promote precision oncology
Other Cancer Data Sharing Efforts
• Signature efforts: BRCA Challenge; somatic variant sharing. Data: isolated genetic variants; no raw sequencing data.
• Signature efforts: precision medicine questions; somatic variant sharing. Data: panel gene resequencing; clinical response.
• Signature efforts: clinical trial; public-private partnerships. Data: comprehensive genomics; detailed clinical phenotype data.
• Signature efforts: clinical trial access; clinical/genomic data aggregation. Data: EHR data; clinical sequencing.
• Signature efforts: clinical oncology standards. Data: EHR data; clinical sequencing.
GDC
Towards a Cancer Knowledge System
Continue genomic investigations of cancer
=> Need > 100,000 cases analyzed
=> Embrace all genomic platforms
=> Relationship of relapse and primary biopsies
Incorporate associated clinical annotations
=> Clinical trial data
=> Observational, longitudinal standard-of-care data
=> N-of-1 clinical data
Promote and curate biological investigations of cancer genetic variants
=> Driver vs. passenger mutations
=> Multiple phenotypic assays
=> Alterations in regulatory pathways - proteomics
=> Mechanisms of therapeutic resistance
=> Functional genomic investigations
Integrative models for high-dimensional data
GDC
Utility of a Cancer Knowledge System
Identify low-frequency cancer drivers
Define genomic determinants of response to therapy
Compose clinical trial cohorts sharing targeted genetic lesions
Cancer information donors
Genomic Data Commons
Genomic Data Commons / Cloud Pilot Ecosystem - Near Term
Data Submission, Harmonization, Visualization & Download
Broad FireCloud
ISB CGC
SBG CGC
Researchers
APIs
Web Interface
Researchers
Contributors
Authentication & Authorization through eRA Commons and dbGaP
Commons @ AWS
Commons @ GCP
Commons @ Azure
New algorithms, tools, pipelines, visualizations
DockStore
Genomic Data Commons
Support the Precision Medicine Initiative
• Expand data model to include other data (e.g. imaging and proteomics)
• Allow easy publication of persistent links to data, annotations, algorithms, tools, workflows
• Measure usage and impact
• Change incentives for public contributions
The Genomic Data Commons and Cloud Pilots
Cancer data ecosystem
Well-characterized research data sets
Cancer cohorts
Patient data
EHR, lab data, imaging, PROs, smart devices, decision support
Learning from every cancer patient
Active research participation
Research information donor
Clinical research
Observational studies
Proteogenomics
Imaging data
Clinical trials
Discovery
Patient-engaged research
Surveillance
Big Data
Implementation research
SEER
GDC Acknowledgements
NCI Center for Cancer Genomics: Lou Staudt, Zhining Wang, Martin Ferguson, JC Zenklusen, Daniela Gerhard, Deb Steverson
Univ. of Chicago: Bob Grossman, Allison Heath, Mike Ford, Zhenyu Zhang
Ontario Institute for Cancer Research: Vincent Ferretti, Francois Gerthoffert, JunJun Zhang
Leidos Biomedical Research: Mark Jensen, Sharon Gaheen, Himanso Sahni
NCI CBIIT: Tony Kerlavage, Tanja Davidsen
Cancer Genomics Project Teams
CGC Pilot Team Principal Investigators
• Gad Getz, Ph.D. - Broad Institute - http://firecloud.org
• Ilya Shmulevich, Ph.D. - ISB - http://cgc.systemsbiology.net/
• Deniz Kural, Ph.D. - Seven Bridges - http://www.cancergenomicscloud.org
NCI Project Officer & CORs
• Anthony Kerlavage, Ph.D. - Project Officer
• Juli Klemm, Ph.D. - COR, Broad Institute
• Tanja Davidsen, Ph.D. - COR, Institute for Systems Biology
• Ishwar Chandramouliswaran, MS, MBA - COR, Seven Bridges Genomics
GDC Principal Investigator
• Robert Grossman, Ph.D. - University of Chicago
• Allison Heath, Ph.D. - University of Chicago
• Vincent Ferretti, Ph.D. - Ontario Institute for Cancer Research
NCI Leadership Team
• Doug Lowy, M.D.
• Lou Staudt, M.D., Ph.D.
• Stephen Chanock, M.D.
• George Komatsoulis, Ph.D.
• Warren Kibbe, Ph.D.
Center for Cancer Genomics Partners
• JC Zenklusen, Ph.D.
• Daniela Gerhard, Ph.D.
• Zhining Wang, Ph.D.
• Liming Yang, Ph.D.
• Martin Ferguson, Ph.D.
www.cancer.gov www.cancer.gov/espanol