madrid icgc pcawg_2016_slideshare
TRANSCRIPT
Canceromatic III - Session I: Pan-Cancer analysis - Changing landscape of data and tools available for reproducible cancer genomics workflows: report from the ICGC trenches.
Nov 14th 2016
B.F. Francis Ouellette [email protected]
• Senior Scientist & Associate Director, Informatics and Biocomputing, Ontario Institute for Cancer Research, Toronto, ON
• Associate Professor, Department of Cell and Systems Biology, University of Toronto, Toronto, ON.
ONTARIO INSTITUTE FOR CANCER RESEARCH
You are free to:
• Copy, share, adapt, or re-mix;
• Photograph, film, or broadcast;
• Blog, live-blog, or post video of;
This presentation. Provided that: You attribute the work to its author and respect the rights and licenses associated with its components.
Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero. Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at: http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites
Cancer-om-atics: Jul 6-9 2009
Cancer-om-atics II: Mar 28-30 2011
Canceromatics III: Nov 13-16 2016
Disclaimers
I do not (and will not) profit in any way, shape or form, from any of the brands, products or companies I may mention.
I am a big proponent of Open Access, Open Source, Open Data and Open Courseware.
I am on the SAB of many NIH funded projects (SGD, Galaxy, GenomeSpace, H3ABionet, and HMP2), as well as Elixir and Genome Canada’s SIAC, and the NRC’s KMAC. This comes with a bias on how science should be done!
Outline
• Introduction
• ICGC
• PCAWG
• Closing remarks
adapted from https://goo.gl/fQJAz1
ICGC
PCAWG
Docker Testing
Cancer is a Disease of the Genome
Challenge in Treating Cancer:
• Every tumour is different
• Every cancer patient is different
Adapted from Tom Hudson
https://www.cancer.gov/research/areas/genomics
Large-Scale Studies of Cancer Genomes
• Johns Hopkins: > 18,000 genes analyzed for mutations; 11 breast and 11 colon tumors. L.D. Wood et al, Science, Oct. 2007
• Wellcome Trust Sanger Institute: 518 genes analyzed for mutations; 210 tumors of various types. C. Greenman et al, Nature, Mar. 2007
• TCGA (NIH): Multiple technologies; brain (glioblastoma multiforme), lung (squamous carcinoma), and ovarian (serous cystadenocarcinoma). F.S. Collins & A.D. Barker, Sci. Am, Mar. 2007
Lessons learned from early studies
• Heterogeneity within and across tumor types
• High rate of abnormalities (driver vs passenger)
• Sample quality matters
• Consent and controlled data access are complicated
MR Stratton et al. Nature 458, 719-724 (2009) doi:10.1038/nature07943
Analysis Data Types
• Simple Somatic Mutations (SSM or SNV)
• Copy Number Alterations (CNA or CNV)
• Structural Variants (SV)
• Germline variants (SNPs)
• Gene Expression (micro-arrays and RNASeq)
• miRNA Expression (RNASeq)
• Epigenomics (Arrays and Methylation)
• Splicing Variation (RNASeq)
• Protein Expression (Arrays)
Rationale for the ICGC:
• Scope is huge
• Reduce duplication of effort
• Standardization and uniform quality measures
• Merging of datasets
• Spectrum of many cancers varies across the world
• Accelerate the dissemination of genomic and analytical methods
International Cancer Genome Consortium
Collect ~500 tumour/normal pairs from each of 50 different major cancer types; 25,000 T/N pairs!
Comprehensive genome analysis of each T/N pair:
• Genome
• Transcriptome
• Methylome
• Clinical data
Make the data available to the research community & public.
Identify genome changes:
…GATTATTCCAGGTAT…
…GATTATTGCAGGTAT…
…GATTATTGCAGGTAT…
Adapted from Tom Hudson
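The "identify genome changes" idea on this slide can be illustrated with a minimal Python sketch that compares the two distinct 15-base snippets position by position and reports where they differ. The slide does not say which snippet is tumour and which is normal, so the `reference` and `sample` labels are assumptions, and real variant calling works on aligned reads with quality models rather than plain string comparison.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


# The two distinct snippets from the slide; which is which is an assumption.
reference = "GATTATTCCAGGTAT"
sample = "GATTATTGCAGGTAT"

substitutions = find_substitutions(reference, sample)
for pos, ref_base, alt_base in substitutions:
    print(f"position {pos}: {ref_base} > {alt_base}")
```

For these two snippets the only difference is a single C to G substitution.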
International Cancer Genome Consortium: http://icgc.org
Data submission → Validation (dictionary) → Validation (across fields) → Indexing → Happy users
http://goo.gl/1EcyR
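The submission flow on this slide (dictionary validation, then validation across fields, then indexing) can be sketched as below. This is only an illustration and not the ICGC DCC implementation: the field names, the allowed values in `DICTIONARY`, and the cross-field rule are all hypothetical.

```python
# Hypothetical dictionary of allowed values per field (not the real ICGC dictionary).
DICTIONARY = {
    "donor_sex": {"male", "female"},
    "donor_vital_status": {"alive", "deceased"},
}


def validate_dictionary(record: dict) -> list[str]:
    """Stage 1: every dictionary-controlled field must hold an allowed value."""
    errors = []
    for field, allowed in DICTIONARY.items():
        value = record.get(field)
        if value not in allowed:
            errors.append(f"{field}: {value!r} is not an allowed value")
    return errors


def validate_across_fields(record: dict) -> list[str]:
    """Stage 2: check consistency between fields (hypothetical rule)."""
    errors = []
    if record.get("donor_vital_status") == "alive" and record.get("donor_age_at_death") is not None:
        errors.append("donor_age_at_death must be empty when donor_vital_status is 'alive'")
    return errors


def submit(records: list[dict]) -> tuple[dict, dict]:
    """Validate each record, then index the ones that passed by donor_id."""
    index, rejected = {}, {}
    for record in records:
        # Cross-field checks run only when the dictionary checks found nothing.
        errors = validate_dictionary(record) or validate_across_fields(record)
        if errors:
            rejected[record["donor_id"]] = errors
        else:
            index[record["donor_id"]] = record
    return index, rejected


records = [
    {"donor_id": "D1", "donor_sex": "female", "donor_vital_status": "alive", "donor_age_at_death": None},
    {"donor_id": "D2", "donor_sex": "unknown", "donor_vital_status": "alive", "donor_age_at_death": None},
    {"donor_id": "D3", "donor_sex": "male", "donor_vital_status": "alive", "donor_age_at_death": 70},
]
index, rejected = submit(records)
print(sorted(index), sorted(rejected))
```

Running the stages in order means a submitter sees vocabulary problems first and consistency problems only once the individual fields are valid.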
ICGC needs to deal with different kinds of users!
Biologists/Clinicians: Web interface to processed data, providing:
• Affected gene lists with consequences
• Impact on pathways
Power users:
• Application Programming Interface (API) to get to data
• Availability and integration with cloud resources
ICGC Data Coordinating Centre: dcc.icgc.org
BRAF missense mutations in colorectal cancer
https://dcc.icgc.org/
https://dcc.icgc.org/icgc-in-the-cloud
http://www.cancercollaboratory.org/
http://docs.icgc.org/
User and submitter documentation
Software development discussions
https://discuss.icgc.org/
Some challenges:
So, we have lots of data. Is it all generated the same way?
Every country/group has basically been submitting:
• Simple Somatic Mutations (SSM or SNV)
• Copy Number Alterations (CNA or CNV)
• Structural Variants (SV)
• Germline variants (SNPs)
• Gene Expression (micro-arrays and RNASeq)
• miRNA Expression (RNASeq)
• Epigenomics (Arrays and Methylation)
• Splicing Variation (RNASeq)
• Protein Expression (Arrays)
Are they all using the same pipelines?
No
http://goo.gl/CekF6y
Missing Clinical Data?
http://goo.gl/CekF6y
Are we all using the same definition for controlled access data?
No
ICGC BAM/FASTQ
TCGA BAM/FASTQ
ICGC Open Data (includes TCGA Open Data)
ICGC Controlled Access Datasets
• Detailed Phenotype and Outcome data: Region of residence, Risk factors, Examination, Surgery, Radiation, Sample, Slide, Specific histological features, Analyte, Aliquot, Donor notes
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files
ICGC OA Datasets
• Cancer Pathology: Histologic type or subtype, Histologic nuclear grade
• Patient/Person: Gender, Age range, Vital status, Survival time, Relapse type, Status at follow-up
• Gene Expression (normalized)
• DNA methylation
• Computed Copy Number and Loss of Heterozygosity
• Newly discovered somatic variants
http://goo.gl/w4mrV
ICGC
TCGA
Differences between ICGC & TCGA
• Different tumour types
• Different geographic rules
• Many countries vs one jurisdiction
• Different definitions of what is controlled
• Different data access rules
ICGC Controlled Access Datasets
• Detailed Phenotype and Outcome data
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files
• Germ line variants
ICGC Open Access Datasets
• Cancer Pathology: Histologic type or subtype, Histologic nuclear grade
• Patient/Person: Gender, Age range, Vital status, Survival time, Relapse type, Status at follow-up
• Gene Expression (normalized)
• DNA methylation
• Computed Copy Number and Loss of Heterozygosity
• Somatic variants from Exome or WGS
http://goo.gl/w4mrV
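The open versus controlled split on this slide lends itself to a simple lookup. The sketch below encodes a subset of the ICGC data types listed there and checks whether a requester may read a given type. The key names and the `has_controlled_approval` flag are illustrative assumptions and do not reflect the actual ICGC authorization mechanism.

```python
from enum import Enum


class Access(Enum):
    OPEN = "open"
    CONTROLLED = "controlled"


# Tier per data type, taken from the slide; the key names are made up for this sketch.
ICGC_ACCESS = {
    "gene_expression_normalized": Access.OPEN,
    "dna_methylation": Access.OPEN,
    "copy_number_and_loh": Access.OPEN,
    "somatic_variants": Access.OPEN,
    "gene_expression_probe_level": Access.CONTROLLED,
    "raw_genotype_calls": Access.CONTROLLED,
    "genome_sequence_files": Access.CONTROLLED,
    "germline_variants": Access.CONTROLLED,
}


def can_read(data_type: str, has_controlled_approval: bool) -> bool:
    """Open data is readable by anyone; controlled data needs approval."""
    tier = ICGC_ACCESS[data_type]  # raises KeyError for unknown data types
    return tier is Access.OPEN or has_controlled_approval


print(can_read("somatic_variants", has_controlled_approval=False))
print(can_read("germline_variants", has_controlled_approval=False))
print(can_read("germline_variants", has_controlled_approval=True))
```

Keeping the tier in one table makes it easy to see how a second project with different definitions of "controlled" would need its own table rather than sharing this one.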
TCGA Controlled Access Datasets
• Primary sequence data (BAM and FASTQ files)
• SNP6 array level 1 and level 2 data
• Exon array level 1 and level 2 data
• Somatic variants from whole genome sequencing
• Certain information in MAFs
• A full list of controlled-access data types can be found at: http://goo.gl/K1h7zu
TCGA Open Access Datasets
• De-identified clinical and demographic data
• Gene expression data
• Copy number alterations in regions of the genome
• Epigenetic data
• Summaries of data compiled across individuals
• Anonymized single amplicon DNA sequence data
• Somatic variants from scrubbed exome sequencing
http://goo.gl/A1rMRB
Can we do better?
From ICGC/TCGA
Each group has been free to decide on its own whether to sequence exomes or whole genomes.
A bit more than 10% of all genomes were done with whole genome sequencing.
A steering committee was formed, and we decided to analyze these whole genomes in a robust way, with the primary question of figuring out what was hidden in the genomic sequence of cancer patients!
Steering Committee of PCAWG
• Peter Campbell, Sanger Inst.
• Gady Getz, Broad
• Jan Korbel, EMBL
• Lincoln Stein, OICR
• Josh Stuart, UCSC
PanCancer Analysis of Whole Genomes (PCAWG)
• > 2,800 T/N pairs with clinical data, from 20 tumour types, for whole genome analysis
• Aligned with one standard pipeline
• Genomic variants determined with 3 pipelines
• 17 working groups
• Start writing papers now
Deliverables for PCAWG will include:
• 1st PANCANCER analysis on > 2,800 cancer tumours from a WGS perspective
• RNA, SSM, CNV, Methylation analysis & germline
• Published (executable) pipelines
• Docker / Dockstore
• Multiple cloud access to data
• Multiple portal access to data
https://dcc.icgc.org/pcawg
Working Groups (1/2)
1 Novel somatic mutation calling methods
2 Analysis of mutations in regulatory regions
3 Integration of transcriptome and genome
4 Integration of epigenome and genome
5 Consequences of somatic mutations on pathway and network activity
6 Patterns of structural variations, signatures, genomic correlations, retrotransposons, mobile elements
7 Mutation signatures and processes
8 Germline cancer genome
Working Groups (2/2)
9 Inferring driver mutations and identifying cancer genes and pathways
10 Translating cancer genomes to the clinic
11 Evolution and heterogeneity
12 Exploratory: portals, visualization and software infrastructure
13 Molecular subtypes and classification
14 Analysis of mutations in non-coding RNA
15 Exploratory: mitochondrial
16 Exploratory: pathogens
Tech Technical working group
http://dockstore.org
PCAWG pipelines now on Dockstore
DOCKSTORE testing group
• Andrew Duncan, OICR
• Christina Yung, OICR
• Denis Yuen, OICR
• Zhibin Lu, OICR
• Brian O’Connor, UCSC
• Alex Buchanan, OHSU
• Kyle Ellrott, OHSU
• Francis Ouellette, OICR
• Gordon Saksena, Broad
• Junjun Zhang, OICR
• Miguel Vazquez, CNIO
• Oliver Hofmann, Australia
• Solomon Shorser, OICR
• Adam Strucka, OHSU
Challenges:
• Too many conference calls!
• Too many clouds
• Even though we learned from what not to do with ICGC, we had to learn what not to do in the clouds.
• TCGA and ICGC have different authorization protocols
• Not all data can exist everywhere
• Dockstore testing is taking too long!
Other projects in planning
• ICGC to finish in Spring of 2018
• Planning for ICGCmed
• ICGC 1: 25,000 tumours (DNA, RNA, Epigenome, Clinical data)
• ICGCmed: 200,000 tumours (DNA, RNA, Epigenome, Clinical trial)
• ICGC1 was the picture, ICGCmed will be the movie (before and after treatment).
• Submission system with one place for data and metadata
• Tools/links directory portal
29,647
2,834
1477
915
20
17
http://bioinformatics.ca/
12
0-Toronto
1-Bethesda
2-Hinxton
3-Madrid
4-Queensland
5-Kyoto
6-Cannes
7-Heidelberg
8-Toronto
9-Beijing
10-Mumbai
11-Boston
10
Informatics & BioComputing @ OICR
9
1
Bioinformatics.ca workshops Content
http://bioinformatics-ca.github.io/
https://goo.gl/CGu13q1
Acknowledgments
ICGC/OICR Project leaders: Tom Hudson, John McPherson, Lincoln Stein, Jared Simpson, Paul Boutros, Vincent Ferretti, Francis Ouellette, Jennifer Jennings, Christine Yung
DCC Software Developer: Vincent Ferretti, Dusan Andric, Phuong-My Do, Francois Gerthoffert, Terry Lin, Michael Moncada, Vitalii Slobodianyk, Bob Tiernay, Douglas Wong, Linda Xiang, Junjun Zhang
Ouellette Lab: Alysha Moncrieffe, Ann Meyer, Zhibin Lu
Web Dev: Joseph Yamada, Kaman Wu, Kim Cullion, Koji Miyauchi, Miyuki Fukuma
ICGC DCC Biocuration: Hardeep Nahal, Marc Perry
Research IT/Systems: David Sutton, Bob Gibson, David Magda, Rob Naccarato, Brian Ott, Gino Yearwood
EGA: Jordi Rambla De Argila, Arcadi Navarro, Audald Iloret, Mauricio Moldes
http://oicr.on.ca http://icgc.org
… and all the patients and their families who are putting their hopes into our work!
http://icgc.org
http://dcc.icgc.org
http://docs.icgc.org
[email protected] http://bioinformatics.ca
We are hiring:
• OICR Director
• Genome Technology Director
• Junior Faculty in Informatics & Biocomputing
• PDFs
Interested? Ask Paul Boutros or me
Muchas gracias! (Thank you very much!)