cancergenomics.no Big data challenges in personalized cancer medicine Bioinformatics activities in the Norwegian Cancer Genomics Consortium (NCGC) Sigve Nakken Postdoctoral fellow, Eivind Hovigs group Norwegian Cancer Genomics Consortium (NCGC) Department of Tumor Biology, ICR, OUS


Post on 18-Jan-2016


Page 1

cancergenomics.no

Big data challenges in personalized cancer medicine

Bioinformatics activities in the Norwegian Cancer Genomics Consortium (NCGC)

Sigve Nakken
Postdoctoral fellow, Eivind Hovig's group

Norwegian Cancer Genomics Consortium (NCGC)
Department of Tumor Biology, ICR, OUS

Page 2

radium.no/myklebost

Norwegian Cancer Genomics Consortium (NCGC)

• Founded by oncologists and cancer scientists across the country (Tromsø, Trondheim, Bergen, Oslo)

• Contributing to and following the national prioritization of “Individualized cancer treatment based on the gene profile of the tumour” as the most important topic in cancer research

• Has obtained grants of NOK 75 million (≈ USD 10 million) from the Research Council

• Industrial partners: OCC, PubGene, BergenBio

• Project divided into work packages
  • WP4: Data handling and establishment of national infrastructure

Page 3

NCGC sample cohorts

Cancer type        REK approvals   Sequencing   Samples   Analysis
Melanoma           Approved        Done         115       On-going
Colon cancer       Approved        Done         100       On-going
Multiple myeloma   Approved        On-going               On-going
Lymphoma           Approved        Done         76        On-going
Leukemia           Approved        On-going     41        On-going
Sarcoma            Approved        On-going               -
Prostate           Approved        On-going     75        -
Breast cancer      Approved        On-going               -
Ovarian cancer     Submitted

Page 4

NCGC cancer genome sequencing

• Exome sequencing
• Goal: identify & characterize the acquired genetic changes in the tumor sample by massively parallel deep sequencing
  • SNVs & insertions/deletions
  • Copy number aberrations
  • Structural rearrangements

Page 5

Cancer genome sequencing (II)

Variant calling pipeline

Page 6

Cancer genome sequencing (III)

• How deep should I sequence my tumor sample? (to detect a mutant subpopulation at X percent?)
• Biological complexity
  • Tumor purity
  • Ploidy
  • Local CNAs
• Technical biases
  • Uneven coverage (GC)
  • PCR artefacts
  • Sequencing quality/errors
  • Oxidation (DNA extraction + library prep)
• Other
  • Tumor-control mismatch
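The depth question above can be made concrete with a back-of-the-envelope calculation. The sketch below is not the NCGC method: it assumes reads sample alleles binomially, ignores every technical bias listed above, and uses hypothetical function names and a hypothetical "at least 5 variant reads" detection rule. Purity and local copy number enter only through the expected variant allele fraction.

```python
from math import comb


def expected_vaf(purity, subclone_fraction, tumor_copies=2,
                 mutant_copies=1, normal_copies=2):
    """Expected fraction of reads carrying the variant at one locus.

    purity: fraction of cells in the sample that are tumor cells.
    subclone_fraction: fraction of tumor cells carrying the mutation.
    tumor_copies / normal_copies: local copy number in tumor / normal cells.
    """
    mutant_alleles = purity * subclone_fraction * mutant_copies
    total_alleles = purity * tumor_copies + (1 - purity) * normal_copies
    return mutant_alleles / total_alleles


def detection_probability(depth, vaf, min_alt_reads=5):
    """P(at least min_alt_reads variant reads) under binomial sampling."""
    p_fewer = sum(comb(depth, k) * vaf**k * (1 - vaf)**(depth - k)
                  for k in range(min_alt_reads))
    return 1.0 - p_fewer


def required_depth(vaf, min_alt_reads=5, power=0.95, max_depth=5000):
    """Smallest depth reaching the requested detection probability, or None."""
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, vaf, min_alt_reads) >= power:
            return depth
    return None


# A mutation in 20% of tumor cells, 60% purity, diploid locus: VAF is about 0.06.
vaf = expected_vaf(purity=0.6, subclone_fraction=0.2)
print(round(vaf, 3), required_depth(vaf))
```

Because the model leaves out sequencing errors and uneven coverage, the depth it returns is best read as a lower bound.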

Page 7

Somatic variant calling

• Two key components
  • Read alignment – mapping each read to its proper position in the genome
  • Mutation calling – quantify the likelihood of a true somatic mutation
• Best-practice workflows defined
  • Still many different algorithms to choose from
• Need for benchmark
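To illustrate what "quantify the likelihood of a true somatic mutation" can mean, here is a minimal log-odds sketch. It is a simplified stand-in, not the algorithm of any specific caller: it assumes a single uniform per-read error rate, two read outcomes (reference or variant), and hypothetical thresholds of 6.0 and 2.0 chosen only for the example.

```python
from math import log10


def log10_likelihood(n_ref, n_alt, allele_fraction, error_rate=0.001):
    """Log10 likelihood of the read counts given a true variant allele fraction."""
    # A read shows the variant if it comes from a variant allele and is read
    # correctly, or comes from a reference allele and is misread.
    p_alt = allele_fraction * (1 - error_rate) + (1 - allele_fraction) * error_rate
    return n_alt * log10(p_alt) + n_ref * log10(1 - p_alt)


def somatic_log_odds(tumor_ref, tumor_alt, normal_ref, normal_alt, error_rate=0.001):
    """Return (tumor_lod, normal_lod).

    tumor_lod: variant at the observed fraction vs. pure sequencing noise.
    normal_lod: normal is homozygous reference vs. heterozygous germline.
    """
    depth = tumor_ref + tumor_alt
    f_hat = tumor_alt / depth if depth else 0.0
    tumor_lod = (log10_likelihood(tumor_ref, tumor_alt, f_hat, error_rate)
                 - log10_likelihood(tumor_ref, tumor_alt, 0.0, error_rate))
    normal_lod = (log10_likelihood(normal_ref, normal_alt, 0.0, error_rate)
                  - log10_likelihood(normal_ref, normal_alt, 0.5, error_rate))
    return tumor_lod, normal_lod


def is_candidate_somatic(tumor_ref, tumor_alt, normal_ref, normal_alt,
                         tumor_threshold=6.0, normal_threshold=2.0):
    """Hypothetical decision rule: both log-odds must clear their thresholds."""
    tumor_lod, normal_lod = somatic_log_odds(tumor_ref, tumor_alt,
                                             normal_ref, normal_alt)
    return tumor_lod >= tumor_threshold and normal_lod >= normal_threshold


print(is_candidate_somatic(90, 10, 50, 0))   # variant in tumor, clean normal
print(is_candidate_somatic(90, 10, 25, 25))  # variant also present in normal
```

Real callers add base and mapping qualities, strand bias filters and panels of normals. The choices made at each of those steps are part of why pipelines disagree and a benchmark is needed.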

Page 8

ICGC mutation benchmark

• Purpose: Assess concordance & accuracy of somatic SNV/indel calling among variant calling pipelines used in different research groups
• Evaluate impact of different algorithms (aligner, caller etc.)
• NCGC: optimize and verify running pipeline (“ICGC stamp”)
• Participants were given raw sequence reads from a medulloblastoma (MB99) genome (tumor + normal), ~40X coverage
  • Task: submit somatic indels + SNVs
• Coordinated by CNAG, Barcelona (Ivo Gut’s lab)
• Weekly global telephone conferences
• BM1.2

Page 9

SNVs – how well do we agree?

Page 10

InDels – how well do we agree?

Page 11

Verification of calls – GOLD set

• 300X sequencing of the same genome
• Six different pipelines called somatic SNVs and InDels
• SNVs with concordance of > 3 accepted
• SNVs with concordance < 3 and all indels reviewed manually
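The concordance rule for SNVs can be sketched as a simple vote count across pipelines. The function and variable names below are hypothetical, and a variant is represented as a (chromosome, position, ref, alt) tuple purely for illustration. The slide does not say what happened to SNVs called by exactly 3 pipelines, so this sketch assumes they go to manual review.

```python
from collections import Counter


def split_by_concordance(call_sets, threshold=3):
    """Split SNV calls into auto-accepted and manual-review sets.

    call_sets: dict mapping pipeline name -> iterable of variant tuples.
    A variant is accepted when more than `threshold` pipelines called it.
    Everything else, including exactly `threshold` (an assumption),
    is returned for manual review.
    """
    counts = Counter()
    for calls in call_sets.values():
        counts.update(set(calls))  # each pipeline votes at most once per variant
    accepted = {variant for variant, n in counts.items() if n > threshold}
    to_review = set(counts) - accepted
    return accepted, to_review, counts


# Hypothetical calls from six pipelines.
a = ("1", 1000, "C", "T")
b = ("2", 2000, "G", "A")
c = ("3", 3000, "T", "G")
calls = {
    "p1": [a, b], "p2": [a, b], "p3": [a, b],
    "p4": [a], "p5": [a, c], "p6": [],
}
accepted, to_review, counts = split_by_concordance(calls)
print(sorted(accepted), sorted(to_review))
```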

Page 12

Accuracy – SNV/InDels
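The transcript does not preserve which accuracy measures the figure on this slide showed. One common way to score a call set against a gold set is precision, recall and F1, sketched below with hypothetical names and toy identifiers in place of real variants.

```python
def accuracy_metrics(called, gold):
    """Precision, recall and F1 of a call set against a gold-standard set."""
    called, gold = set(called), set(gold)
    true_pos = len(called & gold)
    false_pos = len(called - gold)
    false_neg = len(gold - called)
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    recall = true_pos / (true_pos + false_neg) if gold else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example: 4 calls, 5 gold variants, 3 shared.
metrics = accuracy_metrics(called={"v1", "v2", "v3", "v4"},
                           gold={"v2", "v3", "v4", "v5", "v6"})
print(metrics)
```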

Page 13

Impact of aligner-caller combination

Page 14

Benchmark manuscript

Page 15

Improved accuracy – SNVs/InDels

• EH_rev

• EH_rev

Page 16

Interpretation of variants

• Which variants/genes are of functional relevance?
  • Is my variant a frequent mutation? Which cancer types?
  • Is my variant likely to alter the activity of the encoded protein?
  • Is my variant known as a drug sensitivity marker?
  • Which mutant genes are known drug targets?
• Annotation pipeline
  • Variant calling
  • Functional annotation
  • Prioritization
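The prioritization step can be pictured as ranking annotated variants by the questions listed above. The sketch below is purely illustrative: the record fields, the weights and the placeholder gene names are all hypothetical, and nothing here reflects how the NCGC annotation pipeline actually scores variants.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedVariant:
    gene: str                      # placeholder gene symbol
    protein_change: str            # placeholder protein-level change
    recurrent_in_cancer: bool      # frequent mutation in some cancer type?
    predicted_damaging: bool       # likely to alter protein activity?
    drug_sensitivity_marker: bool  # known drug sensitivity marker?
    gene_is_drug_target: bool      # is the mutated gene a known drug target?


def priority_score(variant):
    """Arbitrary illustrative weights: clinical evidence outranks predictions."""
    return (4 * variant.drug_sensitivity_marker
            + 3 * variant.gene_is_drug_target
            + 2 * variant.recurrent_in_cancer
            + 1 * variant.predicted_damaging)


def prioritize(variants):
    """Return variants sorted from highest to lowest priority score."""
    return sorted(variants, key=priority_score, reverse=True)


variants = [
    AnnotatedVariant("GENE_A", "p.X1Y", False, True, False, False),
    AnnotatedVariant("GENE_B", "p.X2Y", True, True, True, True),
    AnnotatedVariant("GENE_C", "p.X3Y", True, False, False, True),
]
ranked = prioritize(variants)
print([v.gene for v in ranked])
```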

Page 17

Variants – phenotypic effect?

• Computational prediction of damaging variants
• Machine learning
• Numerous algorithms
  • SIFT, PolyPhen2, MutationTaster, MutationAssessor, Provean, FATHMM, etc.
• Challenge: many have been trained with Mendelian disease mutations
  • Gain-of-function mutations hard to predict

Page 18

Variants – clinical associations?

• Recent promising resources/data on clinically associated variants

Page 19

Which genes are key drivers?

• Which genes show significantly more mutations than random expectation?
  • Requires sophisticated modeling of the background mutation rates
  • MutSigCV
• Which genes are enriched with functionally biased variants?
  • IntOGen

Lawrence et al., Nature (2013)

Gonzalez-Perez et al., Nature Methods (2013)
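As a naive baseline for "more mutations than random expectation", one can compare a gene's observed mutation count with a Poisson expectation from a single flat background rate. This is deliberately the oversimplified version that the slide says is insufficient, not MutSigCV. The function names and the example rate of 2e-6 mutations per base pair are assumptions for illustration only.

```python
from math import exp


def poisson_sf(k, mean):
    """P(X >= k) for X ~ Poisson(mean), computed from the cumulative sum."""
    if k <= 0:
        return 1.0
    term = exp(-mean)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mean / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def gene_p_value(observed, coding_length_bp, n_samples, background_rate_per_bp):
    """One-sided p-value for seeing at least `observed` mutations in a gene.

    Assumes one flat background rate for every gene and sample, which is
    exactly the simplification that dedicated methods replace.
    """
    expected = coding_length_bp * n_samples * background_rate_per_bp
    return poisson_sf(observed, expected)


# A 3 kb gene in 100 tumors at 2e-6 mutations/bp: about 0.6 mutations expected.
print(gene_p_value(12, 3000, 100, 2e-6))  # far more than expected
print(gene_p_value(1, 3000, 100, 2e-6))   # consistent with background
```

With a flat rate, long or highly mutable genes tend to come out as false positives, which motivates gene-specific background models. P-values across all genes also need multiple-testing correction.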

Page 20

NCGC – data trends

Page 21

Mutational heterogeneity – across cancer types

Page 22

Mutational heterogeneity – within cancer types

CRC Melanoma

Page 23

Functional heterogeneity

Page 24

Mutational signatures

• Distinct mutational patterns (mutation types & sequence context) that reflect underlying mutational processes
• Mathematical framework to infer the k mutational signatures contributing to a cohort
• What is the relative contribution of each process in each sample?

S1 – Alkylating agents (?)
S2 – UV damage
S3 – Aging
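The slide does not name the mathematical framework. One widely used formulation is non-negative matrix factorization, written out below as an assumption. The symbols $V$, $W$, $H$, $m$ and $n$ are introduced here for illustration and do not come from the slides.

```latex
% V: m x n matrix of mutation counts (m mutation classes, n samples)
% W: m x k matrix whose columns are the k signatures
% H: k x n matrix of signature exposures per sample
\[
  V \approx W H, \qquad
  (W, H) = \arg\min_{W \ge 0,\; H \ge 0} \lVert V - W H \rVert_F^2 ,
\]
% relative contribution of signature i in sample j,
% assuming each column of W is normalized to sum to 1
\[
  c_{ij} = \frac{H_{ij}}{\sum_{l=1}^{k} H_{lj}} .
\]
```

Under this formulation, the three signatures S1 to S3 correspond to $k = 3$ columns of $W$, and $c_{ij}$ answers the question of how much each process contributes to each sample.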

Page 25

In progress/future plans

• Evaluation of more read aligners/variant callers
• Integration of improved calling of copy number aberrations
• Inference of clonal population structure
• Report per tumor case – QC, mutated cancer genes, actionable targets etc.
• Improved tools for visualization of results

Page 26

Other activities

Page 27

Acknowledgements

• NCGC Principal investigators

Department of Tumor Biology

• Leonardo Meza-Zepeda, Susanne Lorenz, Ola Myklebost

• Daniel Vodak, Ghislain Fournous, Lars Birger Aasheim, Eivind Hovig

• ICGC Technical Validation group