![Page 1: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/1.jpg)
Computational analyses of yeast and human chromatin
William Stafford Noble
Department of Genome Sciences
Department of Computer Science and Engineering
University of Washington
![Page 2: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/2.jpg)
Outline
• Sequence-based models of nucleosome positioning
• Footprinting protein binding sites genomewide
![Page 3: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/3.jpg)
![Page 4: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/4.jpg)
Organization of cis-regulatory sequences
[Figure labels: nucleus; genomic DNA packaged into chromatin; chromatin fiber; gene ‘domains’; genes; DNaseI hypersensitive site; trans-factor complex]
![Page 5: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/5.jpg)
4/43 (9.3%)
33/49 (67.3%)
108/146 (73.9%)
![Page 6: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/6.jpg)
![Page 7: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/7.jpg)
![Page 8: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/8.jpg)
![Page 9: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/9.jpg)
Overall approach
Microarray data from (Yuan et al. 2006).
![Page 10: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/10.jpg)
Sequence spectrum
• Compute frequencies of substrings of length k (k-mers) for k = 1 up to 6.
• Treat reverse complements as the same k-mer.
• The resulting vector contains 2772 entries.
k = 1: A/T 0.5966, C/G 0.4249
k = 2: AA/TT 0.1931, AC/GT 0.1116, AG/CT 0.1288, AT/AT 0.0815, CA/TG 0.1674, CC/GG 0.0901, CG/CG 0.0172, GA/TC 0.1330, GC/GC 0.0429, TA/TA 0.0515
k = 3: AAA/TTT 0.0730, AAC/GTT 0.0343, AAG/CTT 0.0472, AAT/ATT 0.0386, ...
k = 6: ..., TTTAAA/TTTAAA 0.0043
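A minimal sketch of the spectrum computation described on this slide. The function names are illustrative, and the normalisation (each count divided by the number of valid k-mers of that length in the sequence) is an assumption, since the slide does not state its denominator.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string over ACGT."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Collapse a k-mer and its reverse complement onto one key."""
    return min(kmer, reverse_complement(kmer))


def spectrum_keys(max_k=6):
    """List every canonical k-mer for k = 1..max_k, in a fixed order."""
    keys = []
    for k in range(1, max_k + 1):
        kmers = {canonical("".join(p)) for p in product("ACGT", repeat=k)}
        keys.extend(sorted(kmers))
    return keys


def sequence_spectrum(seq, max_k=6):
    """Return the k-mer frequency vector of seq, aligned with spectrum_keys."""
    seq = seq.upper()
    keys = spectrum_keys(max_k)
    counts = dict.fromkeys(keys, 0)
    totals = {k: 0 for k in range(1, max_k + 1)}
    for k in range(1, max_k + 1):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing N etc.
                counts[canonical(kmer)] += 1
                totals[k] += 1
    # Assumed normalisation: divide by the number of valid k-mers of that length.
    return [counts[key] / totals[len(key)] if totals[len(key)] else 0.0
            for key in keys]
```

Counting canonical k-mers for k = 1 to 6 gives 2 + 10 + 32 + 136 + 512 + 2080 = 2772 keys, which matches the vector length stated above.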
![Page 11: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/11.jpg)
Primary results
![Page 12: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/12.jpg)
The SVM recapitulates array data
![Page 13: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/13.jpg)
10bp periodicity
AA periodicity, Drew & Travers 1986
AA/TT/AT periodicity, Segal 2006
Periodicity in SVM score, Peckham 2007
![Page 14: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/14.jpg)
Comparison of yeast models
Segal 2006:
• The model is positional.
• The model is generative.
• Compare predicted positions with 199 sites from the literature.
• 54% are within 35 bp.
• Expect 39% by chance.
• The model explains >50% of the signal.
• The model performs 15% better than chance.
Peckham 2007:
• The model is compositional.
• The model is discriminative.
• Compare predicted positions with sites derived from (Yuan 2006).
• 50% are within 40 bp.
• Expect 33% by chance.
• The model explains ~50% of the signal.
• The model performs 17% better than chance.
![Page 15: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/15.jpg)
![Page 16: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/16.jpg)
Two data sets
• Dennis et al., Genome Research, 2007
– 25 kb regions upstream of 42 genes
– 50-mer probes every 20 bp
– 3 arrays, 3 copies of each probe, forward and reverse strand → 18 measurements per probe
• Ozsolak et al., Nature Biotechnology, 2007
– 1.5 kb regions upstream of 3692 genes
– 50-mer probes every 10 bp
– 7 cell lines
![Page 17: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/17.jpg)
Cross-validation results
![Page 18: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/18.jpg)
Complementary aspects of chromatin accessibility
Dennis and A375 SVMs accurately identify low MNase accessibility.
MEC SVM accurately identifies high MNase accessibility.
Strong MNase digestion (MEC) allows the recognition of nucleosome disfavoring sequences.
Weak MNase digestion (A375) allows the recognition of nucleosome forming sequences.
![Page 19: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/19.jpg)
Yeast and human concordance
Each model was applied to the human ENCODE regions.
0.862 0.849
![Page 20: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/20.jpg)
Low- and high-scoring regions
A375 SVM scores are averaged over 1000 top- and bottom-scoring regions.
Flanking lines indicate the standard error of the mean.
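The averaging behind such a plot can be sketched as follows. This is an illustration only: it assumes each region contributes a score profile of equal length aligned on the same positions, and the name `mean_and_sem` is hypothetical.

```python
import math


def mean_and_sem(profiles):
    """Per-position mean and standard error of the mean.

    profiles: list of equal-length lists of scores, one list per region.
    """
    n = len(profiles)
    if n < 2:
        raise ValueError("need at least two profiles to estimate the SEM")
    width = len(profiles[0])
    means, sems = [], []
    for j in range(width):
        column = [p[j] for p in profiles]
        m = sum(column) / n
        # Sample variance (n - 1 denominator), then SEM = sqrt(var / n).
        var = sum((x - m) ** 2 for x in column) / (n - 1)
        means.append(m)
        sems.append(math.sqrt(var / n))
    return means, sems
```

Plotting `means[j] ± sems[j]` at each position would give the central curve with its flanking lines.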
![Page 21: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/21.jpg)
Dinucleotide frequencies
• MNase cleavage bias is unlikely to account for such large differences.
• Nucleosome forming sequences exhibit a 3 bp periodicity of CG and GC dinucleotides.
• Nucleosome disfavoring sequences tend to be low complexity.
![Page 22: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/22.jpg)
Transcription start sites
A375 – weak digestion
Recognizes nucleosome forming sequences
MEC – strong digestion
Recognizes nucleosome disfavoring sequences
SVM scores are averaged over all TSSs in the ENCODE regions.
![Page 23: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/23.jpg)
Summary
• An SVM can discriminate between MNase protected and MNase accessible sequences with high accuracy.
• The model learns to recognize complementary phenomena, depending upon the degree of MNase digestion.
• The model recapitulates known features of human chromatin.
• Most nucleosome positioning is boundary-event driven.
![Page 24: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/24.jpg)
![Page 25: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/25.jpg)
![Page 26: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/26.jpg)
Methodology
![Page 27: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/27.jpg)
60% of DNaseI cleavage occurs in intergenic regions
![Page 28: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/28.jpg)
Individual footprints
![Page 29: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/29.jpg)
Problem definition
• Given
– Cut-counts at each position
– Unique mappability (Boolean) of each position
– Size range of footprints
– Size of the background window
• Return
– A ranked list of non-overlapping footprints, each associated with a statistical confidence score
![Page 30: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/30.jpg)
Scoring a candidate footprint
Foreground window
Background window
A depletion score
![Page 31: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/31.jpg)
Depletion score: binomial distribution
• The probability that a window of size a within the target region will contain x or fewer cuts
– a: effective foreground window size
– b: effective background window size
– B: # of cuts in the background window
• Score all overlapping windows of width kmin to kmax.
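The slide's formula did not survive extraction, so the following is only one plausible reading: if each of the B background cuts is assumed to fall independently into the foreground with probability a/b, the score is the binomial tail P(X ≤ x) for X ~ Binomial(B, a/b). The function names, the `flank` parameter, the choice of a background region that spans the foreground plus its flanks, and the use of mappable-position counts as "effective" sizes are all assumptions made for this sketch.

```python
from math import comb


def depletion_score(x, a, b, B):
    """P(X <= x) for X ~ Binomial(B, a / b); small values mean depletion."""
    if not 0 < a <= b:
        raise ValueError("expected 0 < a <= b")
    p = a / b
    upper = min(x, B)
    return sum(comb(B, i) * p ** i * (1 - p) ** (B - i)
               for i in range(upper + 1))


def score_windows(cuts, mappable, kmin, kmax, flank):
    """Score every window of width kmin..kmax as (score, start, end).

    Windows are half-open [start, end). The background is assumed to be the
    foreground plus `flank` positions on each side, clipped to the region.
    """
    n = len(cuts)
    scored = []
    for start in range(n):
        for width in range(kmin, kmax + 1):
            end = start + width
            if end > n:
                break
            lo, hi = max(0, start - flank), min(n, end + flank)
            a = sum(mappable[start:end])  # effective foreground size
            b = sum(mappable[lo:hi])      # effective background size
            if a == 0:
                continue
            x = sum(c for c, m in zip(cuts[start:end], mappable[start:end]) if m)
            B = sum(c for c, m in zip(cuts[lo:hi], mappable[lo:hi]) if m)
            scored.append((depletion_score(x, a, b, B), start, end))
    return scored
```

Under this reading, a window with no cuts surrounded by heavily cut flanks receives a score close to zero.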
![Page 32: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/32.jpg)
Greedy selection
• Generate a non-overlapping set of high-scoring windows
– Sort all of the depletion scores in ascending order
– Traverse the sorted list, accepting a scored window if it does not overlap a previously accepted window
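The two steps above can be sketched as below, assuming each scored window is a `(score, start, end)` tuple with half-open coordinates and that a lower depletion score is better. The function name is illustrative.

```python
def greedy_select(scored_windows):
    """Return non-overlapping windows, best (lowest) score first.

    scored_windows: iterable of (score, start, end), half-open [start, end).
    """
    accepted = []
    for score, start, end in sorted(scored_windows):
        # Keep the window only if it is disjoint from every accepted window.
        if all(end <= s or start >= e for _, s, e in accepted):
            accepted.append((score, start, end))
    return accepted
```

The linear overlap check keeps the sketch short; at genome scale an occupancy array or interval tree would likely be a better fit.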
![Page 33: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/33.jpg)
Empirical null model
• Shuffle the cut-counts at the level of genomic positions, together with the mappability information of each position
• Repeat the depletion scoring and greedy selection procedure on the shuffled data
• Generate a ranked list of footprints
• Estimate false discovery rate using Storey method.
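A rough sketch of the shuffling step and of an empirical FDR estimate derived from it. The estimator below is a simplification, not the full Storey procedure: it omits the estimate of the null proportion (effectively setting it to 1, which is conservative). All names are hypothetical.

```python
import random
from bisect import bisect_right


def shuffle_positions(cuts, mappable, rng):
    """Permute positions, keeping each cut count with its mappability flag."""
    paired = list(zip(cuts, mappable))
    rng.shuffle(paired)
    return [c for c, _ in paired], [m for _, m in paired]


def empirical_fdr(observed, null):
    """Pair each observed score (lower is better) with an FDR estimate.

    observed: scores of footprints called on the real data.
    null: scores of footprints called on the shuffled data.
    """
    obs, nul = sorted(observed), sorted(null)
    scale = len(obs) / len(nul)  # corrects for unequal list sizes
    fdr = []
    for t in obs:
        n_null = bisect_right(nul, t)  # null footprints at least this good
        n_obs = bisect_right(obs, t)   # observed footprints at least this good
        fdr.append(min(1.0, n_null * scale / n_obs))
    # Enforce monotonicity so the estimate never decreases with the score.
    for i in range(len(fdr) - 2, -1, -1):
        fdr[i] = min(fdr[i], fdr[i + 1])
    return list(zip(obs, fdr))
```

For example, `shuffle_positions(cuts, mappable, random.Random(0))` yields one reproducible shuffled data set that can be passed through the same scoring and selection steps.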
![Page 34: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/34.jpg)
Evaluation: gold standard
• MacIsaac set [MacIsaac et al. 2006]
– Conserved regulatory sites in yeast
– Identified from ChIP data
– 4387 sites with stringent thresholds
• Imperfect
– Conservatively defined
– Different experimental conditions
• Only used to compare different footprint detectors
![Page 35: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/35.jpg)
Evaluation: metric
• Recall = TP / (TP+FN)
• Precision = TP / (TP+FP)
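A small sketch of how these two metrics might be computed for footprint calls. It reads recall as the fraction of gold-standard sites that fall inside a footprint and precision as the fraction of footprints that contain a gold-standard site; the full-containment criterion and the `(chrom, start, end)` half-open interval representation are assumptions made here.

```python
def precision_recall(footprints, gold_sites):
    """Return (precision, recall) for footprint calls against gold sites.

    Both arguments are lists of (chrom, start, end) half-open intervals.
    """
    def contains(fp, site):
        # Assumed criterion: the site lies entirely within the footprint.
        return fp[0] == site[0] and fp[1] <= site[1] and site[2] <= fp[2]

    sites_found = sum(any(contains(fp, s) for fp in footprints)
                      for s in gold_sites)
    footprints_hit = sum(any(contains(fp, s) for s in gold_sites)
                         for fp in footprints)
    recall = sites_found / len(gold_sites) if gold_sites else 0.0
    precision = footprints_hit / len(footprints) if footprints else 0.0
    return precision, recall
```

Note that the two numerators count different objects (sites for recall, footprints for precision), so they need not be equal when one footprint covers several sites.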
![Page 36: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/36.jpg)
Results
“What fraction of the MacIsaac motifs are in footprints?”
“What fraction of the footprints contain a MacIsaac motif?”
![Page 37: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/37.jpg)
Results
• Binomial scoring performs better than the simple ratio.
• The rank transformation yields better results.
• Larger background widths are better.
• Using the double scoring scheme does not always help.
![Page 38: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/38.jpg)
Results
• 238,133 candidate footprints
• 4514 are significant at q<0.05.
• Estimated 10,716 footprints in total.
• Our algorithm identifies 40.0% of these at q<0.05.
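The slide does not show how the 40.0% figure is obtained. One reading that reproduces it, offered here as an assumption, is that about 95% of the 4514 footprints called at q < 0.05 are expected to be true, and that this count is divided by the estimated total.

```python
candidates = 238_133           # candidate footprints
significant = 4_514            # footprints with q < 0.05
q_threshold = 0.05
estimated_true_total = 10_716  # estimated number of true footprints

# Assumed reading: at q < 0.05, roughly 95% of the calls are true positives.
expected_true_among_significant = significant * (1 - q_threshold)
fraction_recovered = expected_true_among_significant / estimated_true_total

# Null proportion implied by the estimated total (also an inference).
implied_null_fraction = 1 - estimated_true_total / candidates

print(f"{fraction_recovered:.1%}")     # 40.0%
print(f"{implied_null_fraction:.3f}")  # 0.955
```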
![Page 39: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/39.jpg)
• Scan footprints with MacIsaac motifs, using q<0.05.
• 36.6% of the footprints contain a motif.
• Also scan intergenic regions.
• Every motif occurs more frequently in footprints than in intergenic regions.
![Page 40: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/40.jpg)
Footprints contain known motifs
Motif information content is inversely correlated with phastCons score (p < 0.0022).
![Page 41: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/41.jpg)
Motif discovery
15 sites, E=7e-12
41 sites, E=1e-29
8 sites, E=6e-11
28 sites, E=3e-6
7/8 sites occur in sigma LTRs associated with retrotransposons
![Page 42: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/42.jpg)
MCM1
• The first motif matches the core of the TRANSFAC MCM1 motif.
![Page 43: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/43.jpg)
Motif discovery
41 sites, E=1e-29
28 sites, E=3e-6
• 108 occurrences in footprints.
• Of these, 42 are within 250bp 5’ of the start of a gene.
• 35 occurrences in footprints.
• Of these, 22 are within 250bp 5’ of the start of a gene.
![Page 44: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/44.jpg)
Global view of chromatin organization
![Page 45: Computational analyses of yeast and human chromatin](https://reader035.vdocuments.net/reader035/viewer/2022062721/56813811550346895d9fc5cb/html5/thumbnails/45.jpg)
Summary
• Digital genomic footprinting provides a nucleotide-level map of DNaseI accessibility across the yeast genome.
• This map enables identification of individual protein binding sites.
• Dramatically improves the signal-to-noise ratio for motif searching.
• The method can be performed on any organism whose genome is sequenced, exposing its entire cis-regulatory framework in a single experiment.