using encode to interpret mutational patterns in cancer · 2016-06-09 · dna, compared to their...

39
Using ENCODE to interpret mutational patterns in cancer Shamil Sunyaev Broad Institute of M.I.T. and Harvard

Upload: others

Post on 01-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Using ENCODE to interpret mutational patterns in cancer

Shamil Sunyaev

Broad Institute of M.I.T. and Harvard

Page 2: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Disclosures

This work was funded by NIH

Nothing to disclose

Page 3: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Data on de novo mutations

Germ-line mutations Somatic mutations

Change in DNA sequence

Change in DNA sequence

Page 4: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Why is this of any interest?

Statistical genetics of cancer

Evolution

Biology

Methods

As a key parameter

Of mutation rate

Of DNA repair

Of DNA replication

Page 5: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Genetic mapping and mapping by natural selection

Most of gene mapping methods (linkage, association) rely on recombination and are

only applicable to sexual systems

Many methods to detect selection signals (selective sweeps, extended haplotypes) are

similarly limited to sexual populations

What about cancer?

Page 6: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Identifying selected genes/functional units by recurrence

Page 7: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

This signature of selection is completely (!!!) confounded by mutation rate variation

“Non-functional” regions may not serve as an ideal

null model if mutation rate is correlated with “functionality”

Other samples may not serve as an ideal null model

if mutation rate variation is sample-specific

All of this is exacerbated in the search for non-coding drivers!

Everything is more difficult in search for non-coding drivers

Page 8: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Somatic cancer mutation density is associated with replication timing

Lawrence, et al., Nature 2013

Page 9: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Somatic mutation rate depends on expression

Mutation rate is reduced in transcribed regions compared to intergenic regions

The reduction of mutation rate is proportional to expression level

The effect is attributed to transcription coupled repair (TCR), which is supported by the strand bias

Hanawalt & Spivak, Nat Rev Mol Cell Biol 2008

Page 10: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data
Page 11: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Predicting local mutation rate at 1Mb scale

Polak, Karlic et al., Nature 2015

Page 12: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Tissue specificity

Page 13: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

55-86% of regional variation is explained by 184 chromatin tracks from more than

80 tissues

Page 14: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Cell type specificity

Skin_FibroblastsfBrain

Skin_KeratinocytesiPS

fSpinalLungfHeartfSkinH1

NPCHeart

fAdrenalfLung

epithelial_cancerfPlacentafKidneyLiver

gastricfThymus

epithelial_normalhematopoietic

fStomachfMuscle

Skin_Melanocytes

0.0

0.2

0.4

0.6

0.8

variance.explained

CtoT.mel.all.rate

Page 15: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Mutation rate is reduced in regulatory regions marked by accessible chromatin

mut

atio

n de

nsity

0e+00

1e−05

2e−05

3e−05

4e−05

5e−05

6e−05

CLL Lung Melanoma Colon MM

regionDHS peakDHSflanksgenome

Polak et al., Nature Biotech. 2014

Page 16: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data
Page 17: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Implicating nucleotide excision repair (NER)

melanoma genomes

depl

etio

n of

mut

atio

n de

nsity

in D

HS

0.0

0.2

0.4

0.6

0.8

1.0

NERgenomes with mutations in NER genes genomes without mutations in NER genes

Page 18: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

High mutation density in TFBS due to NER

2 6 4 | N A T U R E | V O L 5 3 2 | 1 4 A P R I L 2 0 1 6

LETTERdoi:10.1038/nature17661

Nucleotide excision repair is impaired by binding of transcription factors to DNARadhakrishnan Sabarinathan1, Loris Mularoni1, Jordi Deu-Pons1, Abel Gonzalez-Perez1 & Núria López-Bigas1,2

Somatic mutations are the driving force of cancer genome evolution1. The rate of somatic mutations appears to be greatly variable across the genome due to variations in chromatin organization, DNA accessibility and replication timing2–5. However, other variables that may influence the mutation rate locally are unknown, such as a role for DNA-binding proteins, for example. Here we demonstrate that the rate of somatic mutations in melanomas is highly increased at active transcription factor binding sites and nucleosome embedded DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data6, we show that the higher mutation rate at these sites is caused by a decrease of the levels of nucleotide excision repair (NER) activity. Our work demonstrates that DNA-bound proteins interfere with the NER machinery, which results in an increased rate of DNA mutations at the protein binding sites. This finding has important implications for our understanding of mutational and DNA repair processes and in the identification of cancer driver mutations.

The accumulation of somatic mutations in cells results from the interplay of mutagenic processes, both internal and exogenous, and mechanisms of DNA repair. Detailed early biochemical studies7,8 and recent efforts to sequence the genomes of tumours from different can-cer types9,10 have shed light on this. Mutational signatures associated with various tumorigenic mechanisms have been identified across cancer types11, and genomic features such as chromatin organization, DNA accessibility, and DNA replication timing2–5 have been associated with the variation of somatic mutation rates at the megabase scale. Two recent studies proposed a causal relationship between the accessibility of chromosomal areas to the DNA repair machinery and their mutational burden. Supek and Lehner12 pointed to variable repair of DNA mis-matches as the basis of the megabase scale variation of somatic mutation rates across the human genome. Polak et al.4 attributed lower somatic mutation rates at DNase I hypersensitive sites (DHS) than at their flank-ing regions and the rest of the genome in cell lines and primary tumours to higher accessibility to the global genome repair machinery. Similarly, nucleo some occupancy has been linked to regional mutation rate vari-ation between nucleosome-bound DNA and linker regions13–16, while two recent studies found a relation between transcription factor binding sites (TFBS) and nucleotide substitution rates. Reijns et al.17 detected increased levels of nucleotide substitutions around TFBS in the yeast genome, which was attributed to DNA-binding proteins acting as partial barriers to the polymerase δ mediated displacement of polymerase α synthesized DNA. Katainen et al.18 found that CTCF/cohesin-binding sites are frequently mutated in colorectal tumours and in a small subset of tumours of other cancer types, and suggested that these mutations are probably caused by challenged DNA replication under aberrant conditions.

To determine the impact of DNA-binding proteins on DNA repair, we analysed the somatic mutation burden at TFBS in the genomes of 38 primary melanomas sequenced by The Cancer Genome Atlas10,19. We found that the mutation rate was approximately five times higher

in active TFBS, that is, those overlapping DHS (Fig. 1a) than in their flanking regions (P < 2.2 × 10−16, chi-square test). We determined that this elevated mutation rate could not be explained by the sequence context (Fig. 1a), and that it did not occur at inactive TFBS (Fig. 1a and Extended Data Fig. 1), indicating that it is directly related to the protein being bound to DNA. Furthermore, this enrichment for muta-tions appeared at the active binding sites of most transcription fac-tors (TFs) (Fig. 1b, Extended Data Fig. 2 and Supplementary Table 1); the signal was discernible in most individual melanomas (Fig. 1c and Supplementary Table 2), and it increased with genome-wide mutation rate. Moreover, the signal was also apparent across the genome of a sample taken from normal human skin20 (Fig. 1c), which indicates that the accumulation of mutations in TFBS results of a normal process rather than a pathogenic effect in tumour cells.

Most somatic mutations in melanocytes are caused by exposure to ultraviolet (UV) radiation11. UV radiation causes specific DNA lesions or DNA photoproducts of cyclobutane pyrimidine dimers (CPDs) and pyrimidine–pyrimidone (6–4) photoproducts (6–4PPs), at the sites of dipyrimidines21. As expected, C > T (G> A) mutations predominated in melanomas over other nucleotide changes (Fig. 1d), both within TFBS and at their flanks. This could be explained by either a faulty DNA repair7,8 or a higher probability of UV induced lesions22,23 at protein-bound DNA.

Next, we focused on active TFBS in distal regions from transcription start sites (TSS), and again found increased mutation rate at binding sites, flanked by periodic peaks of mutation rate observed at a distance of ∼170 bp, which coincides well with the size of the DNA wrapped around nucleosomes (∼146 bp) and the linker DNA, and could not be explained by sequence context (Fig. 2a). When we superimposed the nucleosomes positioning signals from the ENCODE data24 and these mutation rate peaks, we verified that their positions matched well. Furthermore, we found that the peak of mutation rate observed at the centre of DHS regions occurred exclusively at TFBS and was absent from DHS sites without TFBS (Fig. 2b and Extended Data Fig. 3). This corroborated the model that whatever process was causing the increase in the mutation rate, it required that the proteins be bound to the DNA.

We then determined if the cause of the higher mutation rate in TFBS and nucleosomes was the reduced accessibility to the protein-bound DNA of the NER machinery. Non-repaired lesions would be bypassed by polymerases carrying out translesional DNA synthesis, thus result-ing in mutations25. To test this, we assembled nucleotide-resolution maps of the NER activity of the two products of UV-induced DNA damage, CPDs and 6–4PPs, generated by ref. 6 using XR-seq in irra-diated skin fibroblasts6. In XR-seq, the excised ∼ 30-mer around the site of damage generated during nucleotide excision repair is isolated and subjected to high-throughput sequencing. When we analysed the genome-wide signal of this NER map, we found a strong decrease in the amount of CPD and 6–4PP repair at the centre of TFBS (Fig. 3a and Extended Data Fig. 4a), compared to their flanking regions. The decrease was apparent both in wild-type cells (NHF1), and CS-B

1Research Program on Biomedical Informatics, IMIM Hospital del Mar Medical Research Institute and Universitat Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Spain. 2Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluís Companys 23, 08010 Barcelona, Spain.

© 2016 Macmillan Publishers Limited. All rights reserved

1 4 A P R I L 2 0 1 6 | V O L 5 3 2 | N A T U R E | 2 5 9

LETTERdoi:10.1038/nature17437

Differential DNA repair underlies mutation hotspots at active promoters in cancer genomesDilmi Perera1*, Rebecca C. Poulos1*, Anushi Shah1, Dominik Beck1, John E. Pimanda1,2 & Jason W. H. Wong1

Promoters are DNA sequences that have an essential role in controlling gene expression. While recent whole cancer genome analyses have identified numerous hotspots of somatic point mutations within promoters, many have not yet been shown to perturb gene expression or drive cancer development1–4. As such, positive selection alone may not adequately explain the frequency of promoter point mutations in cancer genomes. Here we show that increased mutation density at gene promoters can be linked to promoter activity and differential nucleotide excision repair (NER). By analysing 1,161 human cancer genomes across 14 cancer types, we find evidence for increased local density of somatic point mutations within the centres of DNase I-hypersensitive sites (DHSs) in gene promoters. Mutated DHSs were strongly associated with transcription initiation activity, in which active promoters but not enhancers of equal DNase I hypersensitivity were most mutated relative to their flanking regions. Notably, analysis of genome-wide maps of NER5 shows that NER is impaired within the DHS centre of active gene promoters, while XPC-deficient skin cancers do not show increased promoter mutation density, pinpointing differential

NER as the underlying cause of these mutation hotspots. Consistent with this finding, we observe that melanomas with an ultraviolet-induced DNA damage mutation signature show greatest enrichment of promoter mutations, whereas cancers that are not highly dependent on NER, such as colon cancer, show no sign of such enrichment. Taken together, our analysis has uncovered the presence of a previously unknown mechanism linking transcription initiation and NER as a major contributor of somatic point mutation hotspots at active gene promoters in cancer genomes.

To examine systematically the frequency of somatic point muta-tions at gene promoters, we curated mutation calls from 1,161 whole cancer genomes across 14 cancer types from The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium (ICGC) and published mutations6 (Fig. 1a, see Supplementary Table 1 for full list of samples and sources). Using cell-type-specific epigenomic data, we calculated the average mutation density across promoter DHSs, enhancer DHSs, genic and heterochromatin regions (Methods). In many cancer types, the average mutation density at promoter DHSs was higher than that of enhancer DHSs, with melanoma, lung and

1Prince of Wales Clinical School and Lowy Cancer Research Centre, UNSW Australia, Sydney 2052, Australia. 2Department of Haematology, Prince of Wales Hospital, Sydney 2031, Australia.*These authors contributed equally to this work.

Mela

noma

Lung

Colon

Ovaria

n

Oesop

hage

alLiv

er

Lympho

ma

Pancr

eatic

Breas

t

Renal

Prosta

teCLL

Med

ullob

lasto

ma

Astroc

ytoma

0.1

1

10

100

Mut

atio

ns p

er M

b

Promoter DHSs

Coding exons

Heterochromatin

Enhancer DHSs

Intronic

Mela

noma

Ovaria

nLu

ng

Pancr

eatic

Oesop

hage

al

Renal

Breas

t

Lympho

maLiv

er

Colon

0

2

4

6

Mut

atio

n de

nsity

ratio

(DH

S/±

1 kb

DH

S fl

ank)

a b

c d

Mela

noma

Astroc

ytoma

Lung

Ovaria

n

Oesop

hage

al

Prosta

te

Pancr

eatic

Breas

t

Med

ullob

lasto

maLiv

er

Renal

Lympho

maCLL

Colon

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

Mut

atio

n de

nsity

ratio

(DH

S/±

1 kb

DH

S fl

ank)

PromoterEnhancer

Cancer SamplesMedian

mutations Astrocytoma 100 102Breast 119 3,569CLL 28 1,912Colon 47 18,718Oesophageal 16 16,234Liver 244 9,080Lung 24 46,020Lymphoma 44 3,815Medulloblastoma 100 1,100Melanoma 36 66,843Ovarian 93 7,142Pancreatic 201 5,203Prostate 14 1,983Renal 95 3,894Total 1,161

100 102104 106

100 102104 106

Figure 1 | Pan-cancer analysis of mutation distribution across the genome reveals enrichment at promoter DHSs. a, Summary of the number of samples and median mutations associated with each cancer type analysed, with a histogram for each cancer type showing the total mutation count across each sample. CLL, chronic lymphocytic leukaemia. b, Mutation density across selected genomic regions for each cancer type.

DHSs are defined as ±75 bp of the DHS centre. c, Comparison of mutation density within promoter DHSs and enhancer DHSs relative to their flanking regions (±1 kb). d, Distribution of promoter DHS/DHS flank ratios of individual cancer samples with at least 8,602 mutations separated by cancer type. Box-plot shows median, quartiles, maximum and minimum. Individual cancer samples are shown as grey dots.

© 2016 Macmillan Publishers Limited. All rights reserved

Page 19: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Overall, mutation density is low in early replicating regions, active regulatory elements

and highly expressed genes.

Overall, mutation … but this is on average

This is aggregate. How about the effects of individual mutagens?

How about individual samples?

Page 20: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Example: APOBEC mutagenesis

APOBECs are cytidine deaminases involved in cancer mutagenesis (primarily APOBEC3A)

APOBEC creates strand-coordinated mutation clusters. APOBEC acts on ssDNA.

APOBEC has a characteristic signature: TCW→TTW or TCW→TGW

Page 21: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Focus on APOBEC

Enrichment of APOBEC signature and of cluster density varies by cancer types

It also varies by sample within cancer type

APOBEC activity results on sample-specific mutation properties

Page 22: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

For mutations in clusters

119 breast and 24 lung cancer samples Kazanov et al., Cell Reports 2015

Page 23: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Dependency on the enrichment of the APOBEC signature

Page 24: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

We can model mutations within APOBEC signature as a mixture of APOBEC induced

mutations and other mutations

Modelling a mixture of mutation origins

Page 25: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Joint analysis of all samples

Page 26: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

The effect of epigenomic features on cancer mutations may be mutagen-dependent

APOBEC mutations are unique in the genomic distribution

Mutation models have to be sample-specific

Conclusions

Page 27: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Cluster regulatory elements by all covariates

Assume Poisson statistics within clusters

Use “wrong” tissue types as a control to derive FDR

Search for non-coding drivers

D’Antonio, Weghorn et al., (in review)

Page 28: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Search for non-coding drivers

Page 29: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Search for non-coding drivers

Page 30: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Precisely estimating local mutation rate is very difficult

It is practical to model a set of samples rather

than a cancer type or an individual sample

We opt for a hierarchical model

In search for a better statistical approach

Page 31: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

In search for a better statistical approach

Fitting a parametric model of mutation rate heterogeneity

Posterior distribution for “functional” sites

Data across genomes

Known covariates

“Neutral” density in the locus

Page 32: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Modelling heterogeneity of mutation rates

LUAD

0 2 4 60.000.050.100.150.200.250.300.35

Count per gene

P(s|θ)

Synonymous

0 5 100.00

0.05

0.10

0.15

Count per gene

P(m|θ)

Missense

0 1 2 30.00.10.20.30.40.50.60.7

Count per gene

P(k|θ)

Nonsense

Page 33: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Modelling heterogeneity of mutation rates

MEL

0 2 4 60.0

0.1

0.2

0.3

0.4

0.5

Count per gene

P(s|θ)

Synonymous

0 5 100.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

Count per gene

P(m|θ)

Missense

0 1 2 30.0

0.2

0.4

0.6

0.8

Count per gene

P(k|θ)

Nonsense

Page 34: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Finding genes under selection (LUSC)

LUSC

0 5 10 15

10-1

10-2

10-3

10-4

10-5

�pos

PDF

0 2 4 6 8

10-1

10-2

10-3

10-4

10-5

�neg

0.0 0.2 0.4 0.6 0.8 1.00.0

0.2

0.4

0.6

0.8

1.0

False positive rate

Truepositiverate

Known cancer genes

AUC: 0.68AUC: 0.59

0.0 0.2 0.4 0.6 0.8 1.00.0

0.2

0.4

0.6

0.8

1.0

False positive rate

Cell essential genes

AUC: 0.59

Page 35: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Finding genes under selection (LUAD)

LUAD

0 5 10 15

10-1

10-2

10-3

10-4

10-5

�pos

PDF

0 2 4 6 8 10

10-1

10-2

10-3

10-4

10-5

�neg

0.0 0.2 0.4 0.6 0.8 1.00.0

0.2

0.4

0.6

0.8

1.0

False positive rate

Truepositiverate

Known cancer genes

AUC: 0.62AUC: 0.58

0.0 0.2 0.4 0.6 0.8 1.00.0

0.2

0.4

0.6

0.8

1.0

False positive rate

Cell essential genes

AUC: 0.64

Page 36: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Finding genes under selection (MEL)

MEL

0 5 10 15

10-1

10-2

10-3

10-4

10-5

�pos

PDF

0 2 4 6 8

10-1

10-2

10-3

10-4

10-5

�neg

0.0 0.2 0.4 0.6 0.8 1.00.0

0.2

0.4

0.6

0.8

1.0

False positive rate

Truepositiverate

Known cancer genes

AUC: 0.63AUC: 0.54

0.0 0.2 0.4 0.6 0.8 1.00.0

0.2

0.4

0.6

0.8

1.0

False positive rate

Cell essential genes

AUC: 0.53

Page 37: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Understanding mutation rate heterogeneity helps understand basic biology and develop

statistical methods of cancer genomics

Mutation rate varies by cancer type and by sample

Epigenomic features are key

We need better statistical approaches

Conclusions

Page 38: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Acknowledgments

Page 39: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data

Acknowledgments

The lab: Daniel Jordan, Ivan Adzhubei, Paz Polak, Donate Weghorn, Mashaal Sohail, Dana Vuzman, Daniel Balick, Sung Chun, Jae-Hoon Sul, Chris Cassa, Sebastian Akle, David Radke

Collaborators: John Stamatoyannopoulos, Bob Thurman, Rosa Karlic, Amnon Koren, Dmitri Gordenin, Marat Kazanov, Steven Roberts