cancergenomics.no
Big data challenges in personalized cancer medicine
Bioinformatics activities in the Norwegian Cancer Genomics Consortium (NCGC)
Sigve Nakken, Postdoctoral fellow, Eivind Hovig's group
Norwegian Cancer Genomics Consortium (NCGC)
Department of Tumor Biology, ICR, OUS
radium.no/myklebost
Norwegian Cancer Genomics Consortium (NCGC)
• Founded by oncologists and cancer scientists across the country (Tromsø, Trondheim, Bergen, Oslo)
• Contributing to and following the national prioritization of "Individualized cancer treatment based on the gene profile of the tumour" as the most important topic in cancer research
• Has obtained grants of NOK 75 million (≈ USD 10 million) from the Research Council
• Industrial partners: OCC, PubGene, BergenBio
• Project divided into work packages; WP4: Data handling and establishment of national infrastructure
NCGC sample cohorts
Cancer type | REK approval | Sequencing | Samples | Analysis
Melanoma | Approved | Done | 115 | On-going
Colon cancer | Approved | Done | 100 | On-going
Multiple myeloma | Approved | On-going | | On-going
Lymphoma | Approved | Done | 76 | On-going
Leukemia | Approved | On-going | 41 | On-going
Sarcoma | Approved | On-going | | -
Prostate | Approved | On-going | 75 | -
Breast cancer | Approved | On-going | | -
Ovarian cancer | Submitted | | |
NCGC cancer genome sequencing
• Exome sequencing
• Goal: identify and characterize the acquired genetic changes in the tumor sample by massively parallel deep sequencing
  - SNVs and insertions/deletions (indels)
  - Copy number aberrations
  - Structural rearrangements
Cancer genome sequencing (II)
Variant calling pipeline
Cancer genome sequencing (III)
• How deep should I sequence my tumor sample (e.g., to detect a mutant subpopulation present at X percent)?
• Biological complexity
  - Tumor purity
  - Ploidy
  - Local CNAs
• Technical biases
  - Uneven coverage (GC content)
  - PCR artefacts
  - Sequencing quality/errors
  - Oxidation (DNA extraction + library prep)
• Other
  - Tumor-control mismatch
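The depth question can be made concrete with a simple binomial model. The sketch below is an illustration under strong assumptions: no sequencing errors, no GC or PCR bias, uniform coverage, diploid normal cells, and a hypothetical minimum of 5 supporting reads for a call. It combines tumor purity, local copy number, and subclone size into an expected variant allele fraction (VAF), then searches for the smallest depth that detects that VAF with 95% probability. All function names and defaults are hypothetical.

```python
from math import comb
from typing import Optional


def expected_vaf(purity: float, cell_fraction: float = 1.0,
                 tumor_copy_number: int = 2, mutant_copies: int = 1) -> float:
    """Expected variant allele fraction of a somatic mutation.

    Assumes contaminating normal cells are diploid at the locus, all tumor cells
    share `tumor_copy_number`, and the mutation sits on `mutant_copies` alleles
    in a fraction `cell_fraction` of the tumor cells.
    """
    mutant_alleles = purity * cell_fraction * mutant_copies
    total_alleles = purity * tumor_copy_number + (1.0 - purity) * 2
    return mutant_alleles / total_alleles


def detection_power(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` variant reads), with reads ~ Binomial(depth, vaf).

    Ignores sequencing errors and coverage variation (simplifying assumption).
    """
    p_too_few = sum(comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
                    for k in range(min_alt_reads))
    return 1.0 - p_too_few


def min_depth(vaf: float, min_alt_reads: int = 5, power: float = 0.95,
              max_depth: int = 5000) -> Optional[int]:
    """Smallest depth reaching the requested detection power (None if above max_depth)."""
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_power(depth, vaf, min_alt_reads) >= power:
            return depth
    return None


# Example: 60% tumor purity, subclone present in half of the tumor cells
vaf = expected_vaf(purity=0.6, cell_fraction=0.5)  # ≈ 0.15
print(vaf, min_depth(vaf))
```

Under these assumptions a subclone in half of the cells of a 60%-pure, diploid tumor has an expected VAF of about 0.15. Lower purity, smaller subclones, or copy number gains on the wild-type allele shrink the VAF and push the required depth up quickly, before any of the technical biases listed above are accounted for.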
Somatic variant calling
• Two key components:
  - Read alignment: mapping each read to its proper position in the genome
  - Mutation calling: quantifying the likelihood of a true somatic mutation
• Best-practice workflows are defined, but there are still many different algorithms to choose from
• Hence the need for a benchmark
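The mutation-calling step can be illustrated with a simplified tumor/normal log-likelihood-ratio score, loosely in the spirit of the LOD scores used by somatic callers such as MuTect. This sketch assumes alt/total read counts are already available from the alignment step, uses one flat per-read error rate instead of base qualities, and applies illustrative thresholds; it is not the scoring of any particular pipeline, and the function names are hypothetical.

```python
import math


def log10_likelihood(alt_reads: int, depth: int, allele_fraction: float,
                     error_rate: float) -> float:
    """log10 P(read counts | allele fraction), without the binomial coefficient
    (it cancels in the likelihood ratios below).

    A read shows the alternate base if it comes from the alt allele and is read
    correctly, or from the reference allele and is misread as that specific
    alternate base (probability error_rate / 3).
    """
    p_alt = allele_fraction * (1 - error_rate) + (1 - allele_fraction) * error_rate / 3
    ref_reads = depth - alt_reads
    return alt_reads * math.log10(p_alt) + ref_reads * math.log10(1 - p_alt)


def tumor_lod(alt_reads: int, depth: int, error_rate: float = 1e-3) -> float:
    """Evidence for a variant in the tumor: observed fraction vs. fraction 0."""
    f_hat = alt_reads / depth
    return (log10_likelihood(alt_reads, depth, f_hat, error_rate)
            - log10_likelihood(alt_reads, depth, 0.0, error_rate))


def normal_lod(alt_reads: int, depth: int, error_rate: float = 1e-3) -> float:
    """Evidence that the normal is reference (f = 0) rather than a
    heterozygous germline variant (f = 0.5)."""
    return (log10_likelihood(alt_reads, depth, 0.0, error_rate)
            - log10_likelihood(alt_reads, depth, 0.5, error_rate))


def is_somatic(t_alt: int, t_depth: int, n_alt: int, n_depth: int,
               tumor_threshold: float = 6.0, normal_threshold: float = 2.0) -> bool:
    # Thresholds are illustrative placeholders, not tuned values
    return (tumor_lod(t_alt, t_depth) >= tumor_threshold
            and normal_lod(n_alt, n_depth) >= normal_threshold)


print(is_somatic(t_alt=10, t_depth=50, n_alt=0, n_depth=30))   # True
print(is_somatic(t_alt=10, t_depth=50, n_alt=12, n_depth=30))  # False: likely germline
```

Real callers add much more on top of such a score (base and mapping qualities, strand bias, artefact filters), which is one reason why pipelines built from the same two components can still disagree.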
ICGC mutation benchmark
• Purpose: assess concordance and accuracy of somatic SNV/indel calling among the variant calling pipelines used in different research groups
• Evaluate the impact of different algorithms (aligner, caller, etc.)
• NCGC: optimize and verify our running pipeline ("ICGC stamp")
• Participants were given raw sequence reads from a medulloblastoma (MB99) genome (tumor + normal) at ~40X coverage; task: submit somatic SNVs and indels
• Coordinated by CNAG, Barcelona (Ivo Gut's lab)
• Weekly global telephone conferences
• BM1.2
SNVs – how well do we agree?
InDels – how well do we agree?
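One simple way to put a number on how well two pipelines agree is the Jaccard index of their call sets (shared calls divided by all calls made by either). This is an illustration only; the slides do not state which agreement measure was used, and the variant key (chromosome, position, ref, alt) and pipeline names below are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def pairwise_jaccard(calls: Dict[str, Set[Variant]]) -> Dict[Tuple[str, str], float]:
    """Jaccard index |A ∩ B| / |A ∪ B| for every pair of pipelines."""
    agreement = {}
    for a, b in combinations(sorted(calls), 2):
        union = calls[a] | calls[b]
        shared = calls[a] & calls[b]
        # Two empty call sets are treated as perfectly concordant
        agreement[(a, b)] = len(shared) / len(union) if union else 1.0
    return agreement


calls = {
    "pipeline_A": {("chr1", 1000, "C", "T"), ("chr2", 2000, "G", "A")},
    "pipeline_B": {("chr2", 2000, "G", "A"), ("chr3", 3000, "A", "G")},
}
print(pairwise_jaccard(calls))
```

Exact matching on the full key is deliberately strict: indels in particular can be represented differently by different callers, so a real comparison would normalize (left-align) indels first.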
Verification of calls: GOLD set
• 300X sequencing of the same genome
• Six different pipelines called somatic SNVs and indels
• SNVs with concordance > 3 accepted
• SNVs with concordance < 3, and all indels, reviewed manually
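The gold-set rule can be sketched by counting how many pipelines support each candidate. The slide does not say how a concordance of exactly 3 was handled; this sketch sends such SNVs to manual review, which is an assumption. The variant key and function names are hypothetical, and treating "ref and alt of different length" as an indel is a simplification.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def is_indel(variant: Variant) -> bool:
    _, _, ref, alt = variant
    return len(ref) != len(alt)


def triage_gold_set(calls: Dict[str, Set[Variant]], accept_above: int = 3):
    """Split the union of all pipeline calls into auto-accepted and manual-review sets.

    SNVs supported by more than `accept_above` pipelines are accepted; all indels,
    and SNVs with lower support (including exactly `accept_above`, an assumption),
    go to manual review.
    """
    support = Counter(v for pipeline_calls in calls.values() for v in pipeline_calls)
    accepted, manual_review = set(), set()
    for variant, n_pipelines in support.items():
        if is_indel(variant) or n_pipelines <= accept_above:
            manual_review.add(variant)
        else:
            accepted.add(variant)
    return accepted, manual_review


snv_all = ("chr1", 100, "C", "T")   # called by all six pipelines
snv_some = ("chr2", 200, "G", "A")  # called by three pipelines
indel = ("chr3", 300, "AT", "A")    # called by all six pipelines
calls = {f"pipeline_{i}": {snv_all, indel} for i in range(6)}
for i in range(3):
    calls[f"pipeline_{i}"].add(snv_some)
accepted, manual_review = triage_gold_set(calls)
print(accepted, manual_review)
```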
Accuracy – SNV/InDels
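Once a gold set exists, each submission can be scored against it. Precision, recall, and their harmonic mean (F1) are a common choice; the slide does not name its metrics, so treat this as an assumed, minimal illustration with hypothetical names.

```python
from typing import Dict, Hashable, Set


def accuracy_vs_gold(called: Set[Hashable], gold: Set[Hashable]) -> Dict[str, float]:
    """Precision, recall and F1 of a call set measured against a gold set."""
    tp = len(called & gold)  # calls confirmed by the gold set
    fp = len(called - gold)  # calls absent from the gold set
    fn = len(gold - called)  # gold-set mutations the pipeline missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


print(accuracy_vs_gold(called={"v1", "v2", "v3", "v4"}, gold={"v2", "v3", "v4", "v5", "v6"}))
```

Reporting precision and recall separately shows whether a pipeline errs toward false positives or toward missed mutations, which a single accuracy number hides.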
Impact of aligner-caller combination
Benchmark manuscript
Improved accuracy – SNVs/InDels
[Figure: SNV and InDel accuracy panels; revised pipeline labelled EH_rev]
Interpretation of variants
• Which variants/genes are of functional relevance?
  - Is my variant a frequent mutation? In which cancer types?
  - Is my variant likely to alter the activity of the encoded protein?
  - Is my variant known as a drug sensitivity marker?
  - Which mutant genes are known drug targets?
• Annotation pipeline: Variant calling → Functional annotation → Prioritization
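The prioritization stage can be sketched as a ranking over variants that have already been functionally annotated, with one field per question above. Everything here is hypothetical: the field names, the tier rules, and the frequency cut-off of 3 stand in for whatever rules a real pipeline uses, and in practice the annotations would come from external databases rather than being typed in.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotatedVariant:
    # Hypothetical annotation fields, one per question on the slide
    gene: str
    protein_change: str
    cohort_frequency: int          # occurrences in a reference cancer mutation catalogue
    predicted_damaging: bool       # e.g. consensus of functional-effect predictors
    drug_sensitivity_marker: bool  # known marker of drug sensitivity
    gene_is_drug_target: bool      # gene product is a known drug target


def priority_tier(v: AnnotatedVariant) -> int:
    """Lower tier = higher priority. The rules below are illustrative only."""
    if v.drug_sensitivity_marker:
        return 1
    if v.gene_is_drug_target and v.predicted_damaging:
        return 2
    if v.cohort_frequency >= 3 or v.predicted_damaging:
        return 3
    return 4


def prioritize(variants: List[AnnotatedVariant]) -> List[AnnotatedVariant]:
    # Sort by tier, then by how often the mutation is seen in other tumors
    return sorted(variants, key=lambda v: (priority_tier(v), -v.cohort_frequency))


# Placeholder genes and protein changes, not real annotations
toy = [
    AnnotatedVariant("GENE_D", "p.?", 0, False, False, False),
    AnnotatedVariant("GENE_C", "p.?", 10, False, False, False),
    AnnotatedVariant("GENE_B", "p.?", 5, True, False, True),
    AnnotatedVariant("GENE_A", "p.?", 0, False, True, False),
]
print([v.gene for v in prioritize(toy)])
```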
Variants – phenotypic effect?
• Computational prediction of damaging variants
• Machine learning
• Numerous algorithms: SIFT, PolyPhen2, MutationTaster, MutationAssessor, Provean, FATHMM, etc.
• Challenge: many have been trained on Mendelian disease mutations; gain-of-function mutations are hard to predict
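Because the predictors often disagree, a common pragmatic step is to combine their binary verdicts. The sketch below takes a simple majority vote over whichever tools returned a prediction; the voting rule and function names are assumptions, not a method from the slides. Only the SIFT cut-off (scores below 0.05 predicted deleterious) reflects that tool's documented convention; the other tools' verdicts are assumed to be binarized upstream.

```python
from typing import Dict, Optional

SIFT_DELETERIOUS_BELOW = 0.05  # SIFT convention: lower scores = more damaging


def sift_is_damaging(score: float) -> bool:
    return score < SIFT_DELETERIOUS_BELOW


def consensus_damaging(predictions: Dict[str, Optional[bool]],
                       min_fraction: float = 0.5) -> Optional[bool]:
    """Majority vote over available predictions; None means the tool gave no
    prediction for this variant. Returns None if no tool voted."""
    votes = [p for p in predictions.values() if p is not None]
    if not votes:
        return None
    return sum(votes) / len(votes) > min_fraction


print(consensus_damaging({
    "SIFT": sift_is_damaging(0.01),
    "PolyPhen2": True,            # assumed already binarized
    "MutationAssessor": False,    # assumed already binarized
    "FATHMM": None,               # no prediction available
}))  # True (2 of 3 votes)
```

A vote like this inherits the training bias noted above: if most tools were trained on Mendelian loss-of-function mutations, a gain-of-function driver can be outvoted.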
Variants – clinical associations?
• Recent promising resources/data on clinically associated variants
Which genes are key drivers?
• Which genes show significantly more mutations than expected by chance?
  - Requires sophisticated modeling of the background mutation rates
  - MutSigCV
• Which genes are enriched with functionally biased variants?
  - IntOGen
Lawrence et al., Nature (2013)
Gonzalez-Perez et al., Nature Methods (2013)
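The core question behind recurrence-based driver detection can be shown with a deliberately naive test: given a single, uniform background mutation rate per base, how surprising is the observed number of mutations in a gene? This is far simpler than MutSigCV, which models gene- and patient-specific background rates using covariates such as expression and replication timing; the uniform rate, the Poisson model, and the numbers in the example are assumptions for illustration.

```python
import math


def poisson_sf(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected), summed in log space."""
    if observed <= 0:
        return 1.0
    if expected <= 0:
        return 0.0
    k = observed
    log_term = -expected + k * math.log(expected) - math.lgamma(k + 1)  # log P(X = k)
    log_terms, top = [], log_term
    while True:
        log_terms.append(log_term)
        top = max(top, log_term)
        k += 1
        log_term += math.log(expected / k)  # P(X = k) = P(X = k - 1) * expected / k
        # Stop once past the mode and the remaining terms are negligible
        if k > expected and log_term < top - 40:
            break
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


def gene_burden_test(observed_mutations: int, coding_bases: int, n_samples: int,
                     background_rate_per_base: float):
    """Expected count under a uniform background rate, and the Poisson upper-tail p-value."""
    expected = coding_bases * n_samples * background_rate_per_base
    return expected, poisson_sf(observed_mutations, expected)


# Hypothetical gene: 1,500 coding bases, 100 tumors, background rate 1e-6 per base
expected, p = gene_burden_test(12, 1500, 100, 1e-6)
print(f"expected {expected:.2f} mutations, p = {p:.2e}")
```

The weakness of this sketch is exactly the point made above: if the background rate is underestimated (e.g., for a late-replicating or lowly expressed gene, or a hypermutated sample), large, neutral genes look like drivers, which is why the background model needs to be sophisticated.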
NCGC – data trends
Mutational heterogeneity – across cancer types
Mutational heterogeneity – within cancer types
[Figure panels: CRC (colorectal cancer), melanoma]
Functional heterogeneity
Mutational signatures
• Distinct mutational patterns (mutation types and sequence context) that reflect underlying mutational processes
• Mathematical framework to infer the k mutational signatures contributing to a cohort
• What is the relative contribution of each process in each sample?
• S1: Alkylating agents (?); S2: UV damage; S3: Aging
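Signature inference starts from a per-sample catalogue of mutation types in their sequence context. A widely used representation (the 96-channel catalogue from Alexandrov and colleagues) records each SNV as one of six pyrimidine-referenced substitution classes together with its 5' and 3' neighbouring bases; a factorization method such as non-negative matrix factorization is then applied to the samples × 96 matrix to obtain k signatures and their per-sample contributions. The slide does not say which framework was used, so the sketch below covers only the catalogue step, and its input format (reference trinucleotide and alt base, both on the + strand) is an assumption.

```python
from collections import Counter
from typing import Iterable, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = "ACGT"
# The six substitution classes, written with a pyrimidine (C or T) reference base
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
# 96 channels = 6 substitution classes x 4 five-prime bases x 4 three-prime bases
CHANNELS = [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS for five in BASES for three in BASES]


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def channel(context: str, alt: str) -> str:
    """Map an SNV to its 96-channel label.

    `context` is the reference trinucleotide (5' base, ref base, 3' base) and
    `alt` the alternate base, both on the + strand (assumed input format).
    """
    if context[1] in "AG":  # purine reference: describe it on the opposite strand
        context = reverse_complement(context)
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def catalogue(snvs: Iterable[Tuple[str, str]]) -> List[int]:
    """Count SNVs per channel, in the fixed CHANNELS order (one row of the matrix)."""
    counts = Counter(channel(ctx, alt) for ctx, alt in snvs)
    return [counts.get(ch, 0) for ch in CHANNELS]


print(channel("AGT", "A"))  # A[C>T]T: G>A on the + strand is C>T on the - strand
print(channel("TCC", "T"))  # T[C>T]C: C>T at a dipyrimidine, typical of UV damage
```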
In progress/future plans
• Evaluation of more read aligners/variant callers
• Integration of improved calling of copy number aberrations
• Inference of clonal population structure
• Report per tumor case: QC, mutated cancer genes, actionable targets, etc.
• Improved tools for visualization of results
Other activities
Acknowledgements
• NCGC Principal investigators
• Department of Tumor Biology
• Leonardo Meza-Zepeda, Susanne Lorenz, Ola Myklebost
• Daniel Vodak, Ghislain Fournous, Lars Birger Aasheim, Eivind Hovig
• ICGC Technical Validation group