using encode to interpret mutational patterns in cancer · 2016-06-09 · dna, compared to their...
![Page 1: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/1.jpg)
Using ENCODE to interpret mutational patterns in cancer
Shamil Sunyaev
Broad Institute of M.I.T. and Harvard
![Page 2: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/2.jpg)
Disclosures
This work was funded by NIH
Nothing to disclose
![Page 3: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/3.jpg)
Data on de novo mutations
Germ-line mutations Somatic mutations
Change in DNA sequence
Change in DNA sequence
![Page 4: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/4.jpg)
Why is this of any interest?
Statistical genetics of cancer
Evolution
Biology
Methods
As a key parameter
Of mutation rate
Of DNA repair
Of DNA replication
![Page 5: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/5.jpg)
Genetic mapping and mapping by natural selection
Most gene mapping methods (linkage, association) rely on recombination and are only applicable to sexual systems
Many methods to detect selection signals (selective sweeps, extended haplotypes) are similarly limited to sexual populations
What about cancer?
![Page 6: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/6.jpg)
Identifying selected genes/functional units by recurrence
![Page 7: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/7.jpg)
This signature of selection is completely (!!!) confounded by mutation rate variation
“Non-functional” regions may not serve as an ideal null model if mutation rate is correlated with “functionality”
Other samples may not serve as an ideal null model if mutation rate variation is sample-specific
All of this is exacerbated in the search for non-coding drivers!
Everything is more difficult in the search for non-coding drivers
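A minimal sketch of the confounding described on this slide, assuming mutation counts in an element are Poisson distributed. All numbers and names below are hypothetical and chosen only for illustration: the same recurrence count looks like strong evidence of selection under a genome-wide average rate, and unremarkable once the null uses an elevated local rate.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed from the exact pmf."""
    cdf = 0.0
    term = math.exp(-lam)          # P(X = 0)
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)      # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)

# Hypothetical values, not taken from the talk.
n_samples = 100
element_length = 1000              # bp
observed = 8                       # mutations in the element across all samples
genome_rate = 2e-5                 # mutations per bp per sample, genome average
local_rate = 8e-5                  # the same element if its local rate is 4x higher

for label, rate in [("genome-wide null", genome_rate), ("local null", local_rate)]:
    expected = rate * element_length * n_samples
    print(label, "expected:", expected, "P(X >= observed):", poisson_sf(observed, expected))
```

The recurrence signal is identical in both rows; only the assumed background rate changes, which is why the null model has to track local and sample-specific mutation rate.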
![Page 8: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/8.jpg)
Somatic cancer mutation density is associated with replication timing
Lawrence, et al., Nature 2013
![Page 9: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/9.jpg)
Somatic mutation rate depends on expression
Mutation rate is reduced in transcribed regions compared to intergenic regions
The reduction of mutation rate is proportional to expression level
The effect is attributed to transcription coupled repair (TCR), which is supported by the strand bias
Hanawalt & Spivak, Nat Rev Mol Cell Biol 2008
![Page 10: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/10.jpg)
![Page 11: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/11.jpg)
Predicting local mutation rate at 1Mb scale
Polak, Karlic et al., Nature 2015
![Page 12: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/12.jpg)
Tissue specificity
![Page 13: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/13.jpg)
55-86% of regional variation is explained by 184 chromatin tracks from more than 80 tissues
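To make "variance explained" concrete, here is a small sketch that regresses per-window mutation density on chromatin covariates and reports R². It is an illustration under stated assumptions: the data are synthetic, the two tracks are hypothetical, and ordinary least squares stands in for whatever regression method the cited study used, which the slides do not state.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(features, y):
    """Fraction of the variance in y explained by a least-squares linear fit."""
    rows = [[1.0] + list(f) for f in features]            # intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)                                # normal equations
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic 1 Mb windows with two hypothetical chromatin tracks per window.
random.seed(0)
windows = [(random.random(), random.random()) for _ in range(500)]
density = [5.0 - 3.0 * a + 1.0 * h + random.gauss(0, 0.5) for a, h in windows]

print("R^2, both tracks:", round(r_squared(windows, density), 2))
print("R^2, second track only:", round(r_squared([(h,) for _, h in windows], density), 2))
```

Comparing R² across track sets, for example tracks from the matched cell type against tracks from unrelated cell types, is the kind of comparison behind the tissue-specificity slides.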
![Page 14: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/14.jpg)
Cell type specificity
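[Figure: variance explained (axis 0.0 to 0.8) in the melanoma C>T mutation rate (CtoT.mel.all.rate) by chromatin features of each cell type. Cell types shown: Skin_Fibroblasts, fBrain, Skin_Keratinocytes, iPS, fSpinal, Lung, fHeart, fSkin, H1, NPC, Heart, fAdrenal, fLung, epithelial_cancer, fPlacenta, fKidney, Liver, gastric, fThymus, epithelial_normal, hematopoietic, fStomach, fMuscle, Skin_Melanocytes.]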
![Page 15: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/15.jpg)
Mutation rate is reduced in regulatory regions marked by accessible chromatin
[Figure: mutation density (axis 0e+00 to 6e−05) in CLL, Lung, Melanoma, Colon and MM genomes, by region: DHS peak, DHS flanks, genome.]
Polak et al., Nature Biotech. 2014
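The comparison on this slide, mutation density inside DHS peaks against density in their flanks, can be sketched as below. This is an illustrative toy, not the published pipeline: coordinates are on a single hypothetical chromosome, the flank width is an assumed parameter, and overlapping flanks of neighbouring peaks are not handled.

```python
from bisect import bisect_left

def count_in(sorted_positions, start, end):
    """Number of positions p with start <= p < end."""
    return bisect_left(sorted_positions, end) - bisect_left(sorted_positions, start)

def density_ratio(mutations, peaks, flank=1000):
    """Mutation density inside peaks divided by the density in their flanks."""
    muts = sorted(mutations)
    in_count = in_len = fl_count = fl_len = 0
    for start, end in peaks:
        in_count += count_in(muts, start, end)
        in_len += end - start
        left = max(0, start - flank)                      # clip at chromosome start
        fl_count += count_in(muts, left, start) + count_in(muts, end, end + flank)
        fl_len += (start - left) + flank
    return (in_count / in_len) / (fl_count / fl_len)

# Toy data: two 300 bp peaks and a handful of mutation positions.
peaks = [(1000, 1300), (5000, 5300)]
mutations = [200, 600, 900, 1050, 1300, 1700, 2100,
             4200, 4600, 4900, 5400, 5900, 6100]
ratio = density_ratio(mutations, peaks)
print("peak / flank density ratio:", round(ratio, 3))
```

A ratio below 1 corresponds to the depletion in accessible regulatory regions shown on the slide; a ratio above 1 corresponds to the hotspots at bound sites discussed on the following slides.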
![Page 16: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/16.jpg)
![Page 17: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/17.jpg)
Implicating nucleotide excision repair (NER)
[Figure: depletion of mutation density in DHS (axis 0.0 to 1.0) in melanoma genomes, comparing genomes with mutations in NER genes and genomes without mutations in NER genes.]
![Page 18: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/18.jpg)
High mutation density in TFBS due to NER
Nature, vol. 532, p. 264, 14 April 2016
Letter. doi:10.1038/nature17661
Nucleotide excision repair is impaired by binding of transcription factors to DNA
Radhakrishnan Sabarinathan1, Loris Mularoni1, Jordi Deu-Pons1, Abel Gonzalez-Perez1 & Núria López-Bigas1,2
Somatic mutations are the driving force of cancer genome evolution1. The rate of somatic mutations appears to be greatly variable across the genome due to variations in chromatin organization, DNA accessibility and replication timing2–5. However, other variables that may influence the mutation rate locally are unknown, such as a role for DNA-binding proteins, for example. Here we demonstrate that the rate of somatic mutations in melanomas is highly increased at active transcription factor binding sites and nucleosome embedded DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data6, we show that the higher mutation rate at these sites is caused by a decrease of the levels of nucleotide excision repair (NER) activity. Our work demonstrates that DNA-bound proteins interfere with the NER machinery, which results in an increased rate of DNA mutations at the protein binding sites. This finding has important implications for our understanding of mutational and DNA repair processes and in the identification of cancer driver mutations.
The accumulation of somatic mutations in cells results from the interplay of mutagenic processes, both internal and exogenous, and mechanisms of DNA repair. Detailed early biochemical studies7,8 and recent efforts to sequence the genomes of tumours from different cancer types9,10 have shed light on this. Mutational signatures associated with various tumorigenic mechanisms have been identified across cancer types11, and genomic features such as chromatin organization, DNA accessibility, and DNA replication timing2–5 have been associated with the variation of somatic mutation rates at the megabase scale. Two recent studies proposed a causal relationship between the accessibility of chromosomal areas to the DNA repair machinery and their mutational burden. Supek and Lehner12 pointed to variable repair of DNA mismatches as the basis of the megabase scale variation of somatic mutation rates across the human genome. Polak et al.4 attributed lower somatic mutation rates at DNase I hypersensitive sites (DHS) than at their flanking regions and the rest of the genome in cell lines and primary tumours to higher accessibility to the global genome repair machinery. Similarly, nucleosome occupancy has been linked to regional mutation rate variation between nucleosome-bound DNA and linker regions13–16, while two recent studies found a relation between transcription factor binding sites (TFBS) and nucleotide substitution rates. Reijns et al.17 detected increased levels of nucleotide substitutions around TFBS in the yeast genome, which was attributed to DNA-binding proteins acting as partial barriers to the polymerase δ mediated displacement of polymerase α synthesized DNA. Katainen et al.18 found that CTCF/cohesin-binding sites are frequently mutated in colorectal tumours and in a small subset of tumours of other cancer types, and suggested that these mutations are probably caused by challenged DNA replication under aberrant conditions.
To determine the impact of DNA-binding proteins on DNA repair, we analysed the somatic mutation burden at TFBS in the genomes of 38 primary melanomas sequenced by The Cancer Genome Atlas10,19. We found that the mutation rate was approximately five times higher in active TFBS, that is, those overlapping DHS (Fig. 1a) than in their flanking regions (P < 2.2 × 10^−16, chi-square test). We determined that this elevated mutation rate could not be explained by the sequence context (Fig. 1a), and that it did not occur at inactive TFBS (Fig. 1a and Extended Data Fig. 1), indicating that it is directly related to the protein being bound to DNA. Furthermore, this enrichment for mutations appeared at the active binding sites of most transcription factors (TFs) (Fig. 1b, Extended Data Fig. 2 and Supplementary Table 1); the signal was discernible in most individual melanomas (Fig. 1c and Supplementary Table 2), and it increased with genome-wide mutation rate. Moreover, the signal was also apparent across the genome of a sample taken from normal human skin20 (Fig. 1c), which indicates that the accumulation of mutations in TFBS results from a normal process rather than a pathogenic effect in tumour cells.
Most somatic mutations in melanocytes are caused by exposure to ultraviolet (UV) radiation11. UV radiation causes specific DNA lesions or DNA photoproducts of cyclobutane pyrimidine dimers (CPDs) and pyrimidine–pyrimidone (6–4) photoproducts (6–4PPs), at the sites of dipyrimidines21. As expected, C > T (G > A) mutations predominated in melanomas over other nucleotide changes (Fig. 1d), both within TFBS and at their flanks. This could be explained by either a faulty DNA repair7,8 or a higher probability of UV induced lesions22,23 at protein-bound DNA.
Next, we focused on active TFBS in distal regions from transcription start sites (TSS), and again found increased mutation rate at binding sites, flanked by periodic peaks of mutation rate observed at a distance of ∼170 bp, which coincides well with the size of the DNA wrapped around nucleosomes (∼146 bp) and the linker DNA, and could not be explained by sequence context (Fig. 2a). When we superimposed the nucleosomes positioning signals from the ENCODE data24 and these mutation rate peaks, we verified that their positions matched well. Furthermore, we found that the peak of mutation rate observed at the centre of DHS regions occurred exclusively at TFBS and was absent from DHS sites without TFBS (Fig. 2b and Extended Data Fig. 3). This corroborated the model that whatever process was causing the increase in the mutation rate, it required that the proteins be bound to the DNA.
We then determined if the cause of the higher mutation rate in TFBS and nucleosomes was the reduced accessibility to the protein-bound DNA of the NER machinery. Non-repaired lesions would be bypassed by polymerases carrying out translesional DNA synthesis, thus resulting in mutations25. To test this, we assembled nucleotide-resolution maps of the NER activity of the two products of UV-induced DNA damage, CPDs and 6–4PPs, generated by ref. 6 using XR-seq in irradiated skin fibroblasts6. In XR-seq, the excised ∼30-mer around the site of damage generated during nucleotide excision repair is isolated and subjected to high-throughput sequencing. When we analysed the genome-wide signal of this NER map, we found a strong decrease in the amount of CPD and 6–4PP repair at the centre of TFBS (Fig. 3a and Extended Data Fig. 4a), compared to their flanking regions. The decrease was apparent both in wild-type cells (NHF1), and CS-B
1Research Program on Biomedical Informatics, IMIM Hospital del Mar Medical Research Institute and Universitat Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Spain. 2Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluís Companys 23, 08010 Barcelona, Spain.
© 2016 Macmillan Publishers Limited. All rights reserved
Nature, vol. 532, p. 259, 14 April 2016
Letter. doi:10.1038/nature17437
Differential DNA repair underlies mutation hotspots at active promoters in cancer genomes
Dilmi Perera1*, Rebecca C. Poulos1*, Anushi Shah1, Dominik Beck1, John E. Pimanda1,2 & Jason W. H. Wong1
Promoters are DNA sequences that have an essential role in controlling gene expression. While recent whole cancer genome analyses have identified numerous hotspots of somatic point mutations within promoters, many have not yet been shown to perturb gene expression or drive cancer development1–4. As such, positive selection alone may not adequately explain the frequency of promoter point mutations in cancer genomes. Here we show that increased mutation density at gene promoters can be linked to promoter activity and differential nucleotide excision repair (NER). By analysing 1,161 human cancer genomes across 14 cancer types, we find evidence for increased local density of somatic point mutations within the centres of DNase I-hypersensitive sites (DHSs) in gene promoters. Mutated DHSs were strongly associated with transcription initiation activity, in which active promoters but not enhancers of equal DNase I hypersensitivity were most mutated relative to their flanking regions. Notably, analysis of genome-wide maps of NER5 shows that NER is impaired within the DHS centre of active gene promoters, while XPC-deficient skin cancers do not show increased promoter mutation density, pinpointing differential NER as the underlying cause of these mutation hotspots. Consistent with this finding, we observe that melanomas with an ultraviolet-induced DNA damage mutation signature show greatest enrichment of promoter mutations, whereas cancers that are not highly dependent on NER, such as colon cancer, show no sign of such enrichment. Taken together, our analysis has uncovered the presence of a previously unknown mechanism linking transcription initiation and NER as a major contributor of somatic point mutation hotspots at active gene promoters in cancer genomes.
To examine systematically the frequency of somatic point mutations at gene promoters, we curated mutation calls from 1,161 whole cancer genomes across 14 cancer types from The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium (ICGC) and published mutations6 (Fig. 1a, see Supplementary Table 1 for full list of samples and sources). Using cell-type-specific epigenomic data, we calculated the average mutation density across promoter DHSs, enhancer DHSs, genic and heterochromatin regions (Methods). In many cancer types, the average mutation density at promoter DHSs was higher than that of enhancer DHSs, with melanoma, lung and
1Prince of Wales Clinical School and Lowy Cancer Research Centre, UNSW Australia, Sydney 2052, Australia. 2Department of Haematology, Prince of Wales Hospital, Sydney 2031, Australia. *These authors contributed equally to this work.
[Figure 1, panels b to d (plot residue omitted). Panel b: mutations per Mb (log axis, 0.1 to 100) by cancer type for promoter DHSs, coding exons, heterochromatin, enhancer DHSs and intronic regions. Panel c: mutation density ratio (DHS / ±1 kb DHS flank, axis 0.0 to 4.0) for promoter and enhancer DHSs by cancer type. Panel d: mutation density ratio (DHS / ±1 kb DHS flank, axis 0 to 6) of individual samples by cancer type.]

Figure 1a table (per-cancer histograms of total mutation count per sample, log axis 10^0 to 10^6, omitted):

| Cancer | Samples | Median mutations |
| --- | --- | --- |
| Astrocytoma | 100 | 102 |
| Breast | 119 | 3,569 |
| CLL | 28 | 1,912 |
| Colon | 47 | 18,718 |
| Oesophageal | 16 | 16,234 |
| Liver | 244 | 9,080 |
| Lung | 24 | 46,020 |
| Lymphoma | 44 | 3,815 |
| Medulloblastoma | 100 | 1,100 |
| Melanoma | 36 | 66,843 |
| Ovarian | 93 | 7,142 |
| Pancreatic | 201 | 5,203 |
| Prostate | 14 | 1,983 |
| Renal | 95 | 3,894 |
| Total | 1,161 | |
Figure 1 | Pan-cancer analysis of mutation distribution across the genome reveals enrichment at promoter DHSs. a, Summary of the number of samples and median mutations associated with each cancer type analysed, with a histogram for each cancer type showing the total mutation count across each sample. CLL, chronic lymphocytic leukaemia. b, Mutation density across selected genomic regions for each cancer type.
DHSs are defined as ±75 bp of the DHS centre. c, Comparison of mutation density within promoter DHSs and enhancer DHSs relative to their flanking regions (±1 kb). d, Distribution of promoter DHS/DHS flank ratios of individual cancer samples with at least 8,602 mutations separated by cancer type. Box-plot shows median, quartiles, maximum and minimum. Individual cancer samples are shown as grey dots.
![Page 19: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/19.jpg)
Overall, mutation density is low in early replicating regions, active regulatory elements and highly expressed genes.
Overall, mutation … but this is on average
This is aggregate. How about the effects of individual mutagens?
How about individual samples?
![Page 20: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/20.jpg)
Example: APOBEC mutagenesis
APOBECs are cytidine deaminases involved in cancer mutagenesis (primarily APOBEC3A)
APOBEC creates strand-coordinated mutation clusters. APOBEC acts on ssDNA.
APOBEC has a characteristic signature: TCW→TTW or TCW→TGW
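The signature stated above (TCW→TTW or TCW→TGW, with W = A or T) can be checked per mutation as in the sketch below. The function names and the trinucleotide-plus-alternate-base input format are assumptions made for this example. Because a mutation may be reported on either strand, a G-centred context is first converted to its reverse complement.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def is_apobec_signature(context, alt):
    """True if a substitution matches TCW -> TTW or TCW -> TGW (W = A or T).

    context: reference trinucleotide centred on the mutated base.
    alt:     mutant base, reported on the same strand as context.
    """
    if context[1] == "G":
        # Re-express the event on the strand where the mutated base is C.
        context, alt = revcomp(context), COMPLEMENT[alt]
    return (context[0] == "T" and context[1] == "C"
            and context[2] in ("A", "T") and alt in ("T", "G"))

examples = [("TCA", "T"), ("TCT", "G"), ("TGA", "A"), ("TCG", "T"), ("ACA", "T")]
for ctx, alt in examples:
    print(ctx, ">", alt, is_apobec_signature(ctx, alt))
```

Counting these matches per sample, relative to the number of TCW motifs available, gives the enrichment that the following slides compare across samples.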
![Page 21: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/21.jpg)
Focus on APOBEC
Enrichment of the APOBEC signature and of cluster density varies by cancer type
It also varies by sample within a cancer type
APOBEC activity results in sample-specific mutation properties
![Page 22: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/22.jpg)
For mutations in clusters
119 breast and 24 lung cancer samples
Kazanov et al., Cell Reports 2015
![Page 23: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/23.jpg)
Dependency on the enrichment of the APOBEC signature
![Page 24: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/24.jpg)
We can model mutations within the APOBEC signature as a mixture of APOBEC-induced mutations and other mutations
Modelling a mixture of mutation origins
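One simple way to parameterise such a mixture is sketched below. It is an assumption for illustration, not necessarily the model used in the talk: mutations from other processes are taken to hit cytosines in TCW motifs in proportion to how common the motif is among all cytosines, so the excess over that expectation is attributed to APOBEC. All argument names are hypothetical.

```python
def apobec_mixture(n_tcw, n_c, ctx_tcw, ctx_c):
    """Split signature-matching mutations into APOBEC-induced and other.

    n_tcw:   mutations matching TCW -> TTW or TCW -> TGW in a sample
    n_c:     all C -> T or C -> G mutations in the same sample
    ctx_tcw: number of TCW motifs in the sequence considered
    ctx_c:   number of cytosines in the sequence considered
    Returns (enrichment, APOBEC mixture weight, estimated APOBEC-induced count).
    """
    enrichment = (n_tcw / n_c) / (ctx_tcw / ctx_c)
    # With no enrichment the background alone explains every matching mutation.
    weight = 1.0 - 1.0 / enrichment if enrichment > 1.0 else 0.0
    return enrichment, weight, n_tcw * weight

# Hypothetical sample: 30% of C mutations fall in a motif holding 10% of cytosines.
enrichment, weight, n_apobec = apobec_mixture(n_tcw=300, n_c=1000, ctx_tcw=100, ctx_c=1000)
print("enrichment:", round(enrichment, 2))
print("APOBEC weight:", round(weight, 3), "APOBEC-induced:", round(n_apobec, 1))
```

Because the background expectation is computed from all C mutations, including the APOBEC-induced ones, this estimate is conservative; a joint fit across samples would avoid that approximation.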
![Page 25: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/25.jpg)
Joint analysis of all samples
![Page 26: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/26.jpg)
The effect of epigenomic features on cancer mutations may be mutagen-dependent
APOBEC mutations are unique in their genomic distribution
Mutation models have to be sample-specific
Conclusions
![Page 27: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/27.jpg)
Cluster regulatory elements by all covariates
Assume Poisson statistics within clusters
Use “wrong” tissue types as a control to derive FDR
Search for non-coding drivers
D’Antonio, Weghorn et al., (in review)
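The three steps on this slide can be sketched as follows. This is one reading of the bullets under stated assumptions, not the method of the cited manuscript: cluster labels are taken as given (the clustering by covariates is not shown), each cluster shares a single Poisson rate, and the false discovery rate is estimated as the fraction of hits obtained when the same test is run on elements annotated in a "wrong" tissue. All data and names are hypothetical.

```python
import math
from collections import defaultdict

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf, term = 0.0, math.exp(-lam)
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)

def element_pvalues(elements):
    """Poisson p-value per element, using the pooled rate of its cluster."""
    tot_count, tot_len = defaultdict(int), defaultdict(int)
    for e in elements:
        tot_count[e["cluster"]] += e["count"]
        tot_len[e["cluster"]] += e["length"]
    pvals = []
    for e in elements:
        rate = tot_count[e["cluster"]] / tot_len[e["cluster"]]   # mutations per bp
        pvals.append(poisson_sf(e["count"], rate * e["length"]))
    return pvals

def empirical_fdr(matched_p, control_p, threshold):
    """Expected false hits (from the control set) divided by observed hits."""
    hits = sum(p <= threshold for p in matched_p)
    false_hits = sum(p <= threshold for p in control_p) * len(matched_p) / len(control_p)
    return min(1.0, false_hits / hits) if hits else 0.0

# Hypothetical elements: matched-tissue set with one recurrent element, and a control set.
matched = [{"cluster": "a", "length": 1000, "count": c} for c in [1] * 9 + [11]]
control = [{"cluster": "a", "length": 1000, "count": 2} for _ in range(10)]
matched_p, control_p = element_pvalues(matched), element_pvalues(control)
print("smallest matched p-value:", min(matched_p))
print("FDR at 1e-3:", empirical_fdr(matched_p, control_p, 1e-3))
```

Pooling the tested element into its own cluster rate is a simplification that makes the test slightly conservative; a leave-one-out rate would remove that effect.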
![Page 28: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/28.jpg)
Search for non-coding drivers
![Page 29: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/29.jpg)
Search for non-coding drivers
![Page 30: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/30.jpg)
Precisely estimating local mutation rate is very difficult
It is practical to model a set of samples rather than a cancer type or an individual sample
We opt for a hierarchical model
In search of a better statistical approach
![Page 31: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/31.jpg)
In search of a better statistical approach
Fitting a parametric model of mutation rate heterogeneity
Posterior distribution for “functional” sites
Data across genomes
Known covariates
“Neutral” density in the locus
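A hierarchical model of this kind can be illustrated with a gamma-Poisson mixture. The slides do not name the parametric family, so the gamma prior, the method-of-moments fit and all numbers below are assumptions for the sketch. Per-locus rates are drawn from a gamma distribution fitted across loci, counts are Poisson given the rate, and the neutral count observed in a locus updates the prior to a posterior for that locus.

```python
import math

def fit_gamma_moments(counts):
    """Method-of-moments gamma prior (shape, scale) for overdispersed Poisson counts."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    if var <= mean:
        raise ValueError("no overdispersion: a single Poisson rate is enough")
    return mean ** 2 / (var - mean), (var - mean) / mean

def neg_binom_pmf(k, shape, scale):
    """P(count = k) when rate ~ Gamma(shape, scale) and count ~ Poisson(rate)."""
    p = 1.0 / (1.0 + scale)
    log_pmf = (math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
               + shape * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def posterior_rate(shape, scale, neutral_count):
    """Gamma posterior (shape, scale) for a locus rate given its neutral count."""
    return shape + neutral_count, scale / (1.0 + scale)

# Hypothetical neutral (for example synonymous) counts per gene across a cohort.
counts = [0, 0, 0, 0, 1, 1, 2, 3, 5, 8]
shape, scale = fit_gamma_moments(counts)
post_shape, post_scale = posterior_rate(shape, scale, neutral_count=5)
print("prior mean rate:", round(shape * scale, 3))
print("posterior mean for a locus with 5 neutral mutations:", round(post_shape * post_scale, 3))
print("P(count = 0) under the fitted model:", round(neg_binom_pmf(0, shape, scale), 3))
```

The posterior mean sits between the cohort-wide prior mean and the locus's own count, which is the shrinkage that makes per-locus rate estimates usable when counts are small. Known covariates could enter by letting the prior parameters depend on them; that extension is not shown.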
![Page 32: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/32.jpg)
Modelling heterogeneity of mutation rates
[Figure, LUAD: distributions of mutation count per gene, in three panels: P(s|θ) for synonymous, P(m|θ) for missense and P(k|θ) for nonsense mutations.]
![Page 33: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/33.jpg)
Modelling heterogeneity of mutation rates
[Figure, MEL: distributions of mutation count per gene, in three panels: P(s|θ) for synonymous, P(m|θ) for missense and P(k|θ) for nonsense mutations.]
![Page 34: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/34.jpg)
Finding genes under selection (LUSC)
[Figure, LUSC: two histograms (log-scale vertical axis, 10^-1 to 10^-5) of per-gene statistics whose symbols were lost in extraction (subscripts "pos" and "neg"), and ROC curves (true positive rate against false positive rate). Known cancer genes: AUC 0.68 and AUC 0.59. Cell essential genes: AUC 0.59.]
![Page 35: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/35.jpg)
Finding genes under selection (LUAD)
[Figure, LUAD: two histograms (log-scale vertical axis, 10^-1 to 10^-5) of per-gene statistics whose symbols were lost in extraction (subscripts "pos" and "neg"), and ROC curves (true positive rate against false positive rate). Known cancer genes: AUC 0.62 and AUC 0.58. Cell essential genes: AUC 0.64.]
![Page 36: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/36.jpg)
Finding genes under selection (MEL)
[Figure, MEL: two histograms (log-scale vertical axis, 10^-1 to 10^-5) of per-gene statistics whose symbols were lost in extraction (subscripts "pos" and "neg"), and ROC curves (true positive rate against false positive rate). Known cancer genes: AUC 0.63 and AUC 0.54. Cell essential genes: AUC 0.53.]
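For reference, the AUC values reported on the selection slides summarise how well a per-gene score ranks a reference gene set (known cancer genes or cell essential genes) above the remaining genes. A minimal sketch of the computation follows, with hypothetical scores and labels; it uses the rank interpretation of AUC, the probability that a randomly chosen positive gene scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical selection scores; label 1 marks genes in the reference set.
scores = [0.9, 0.2, 0.8, 0.1]
labels = [1, 1, 0, 0]
print("AUC:", roc_auc(scores, labels))
```

An AUC of 0.5 means the score ranks reference genes no better than chance, so values in the range shown on the slides indicate a modest but non-random signal. The pairwise loop is quadratic in the number of genes; a rank-sum formulation gives the same value faster for genome-scale gene lists.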
![Page 37: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/37.jpg)
Understanding mutation rate heterogeneity helps understand basic biology and develop statistical methods of cancer genomics
Mutation rate varies by cancer type and by sample
Epigenomic features are key
We need better statistical approaches
Conclusions
![Page 38: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/38.jpg)
Acknowledgments
![Page 39: Using ENCODE to interpret mutational patterns in cancer · 2016-06-09 · DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data](https://reader033.vdocuments.net/reader033/viewer/2022060514/5f84f07635a7bb58bc2afb1f/html5/thumbnails/39.jpg)
Acknowledgments
The lab: Daniel Jordan, Ivan Adzhubei, Paz Polak, Donate Weghorn, Mashaal Sohail, Dana Vuzman, Daniel Balick, Sung Chun, Jae-Hoon Sul, Chris Cassa, Sebastian Akle, David Radke
Collaborators: John Stamatoyannopoulos, Bob Thurman, Rosa Karlic, Amnon Koren, Dmitri Gordenin, Marat Kazanov, Steven Roberts