![Page 1: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/1.jpg)
Two bioinformatics applications of dynamic Bayesian networks
William Stafford Noble
Department of Genome Sciences
Department of Computer Science and Engineering
University of Washington
![Page 2: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/2.jpg)
Outline
• Segmenting genomic data
– Background: DNA, chromatin and DNase I
– Simple solution
– Wavelets
– Hierarchical model
• Matching peptides to mass spectra
– Background: tandem mass spectrometry
– Modeling peptide fragmentation
![Page 3: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/3.jpg)
The human genome in vivo
[Figure labels: nucleus; genomic DNA packaged into chromatin; chromatin fiber; gene 'domains'; genes; DNase I hypersensitive site; trans-factor complex]
![Page 4: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/4.jpg)
Measuring chromatin accessibility
![Page 5: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/5.jpg)
![Page 6: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/6.jpg)
A simple hidden Markov model
• Each state contains a single Gaussian.
• The model has six parameters (two transitions, two means, two standard deviations).
• The parameters are initialized randomly and trained in an unsupervised fashion via expectation-maximization.
• EM is re-started 100 times, and we select the parameters that yield the highest likelihood.
• The original data set is then segmented using either Viterbi or posterior decoding.
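The bullets above can be sketched in NumPy as follows. This is a minimal illustration, not the code behind the slides: the function names, the fixed uniform start distribution, the small pseudocount on transition counts, and the floor on the standard deviations are all assumptions added to keep the sketch stable.

```python
import numpy as np

def log_gauss(x, mu, sd):
    """Log density of each observation (rows) under each state's Gaussian (columns)."""
    return -0.5 * np.log(2 * np.pi * sd**2) - (x[:, None] - mu) ** 2 / (2 * sd**2)

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def forward_backward(x, pi, A, mu, sd):
    """Log-space forward and backward passes; also returns the log-likelihood."""
    logB, logA = log_gauss(x, mu, sd), np.log(A)
    T, K = logB.shape
    alpha, beta = np.empty((T, K)), np.zeros((T, K))
    alpha[0] = np.log(pi) + logB[0]
    for t in range(1, T):
        alpha[t] = logB[t] + logsumexp(alpha[t - 1][:, None] + logA, axis=0)
    for t in range(T - 2, -1, -1):
        beta[t] = logsumexp(logA + logB[t + 1] + beta[t + 1], axis=1)
    return alpha, beta, logB, logsumexp(alpha[-1], axis=0)

def em_step(x, pi, A, mu, sd):
    """One expectation-maximization update of transitions, means and deviations."""
    alpha, beta, logB, ll = forward_backward(x, pi, A, mu, sd)
    gamma = np.exp(alpha + beta - ll)  # posterior state probabilities
    xi = np.exp(alpha[:-1, :, None] + np.log(A)
                + (logB[1:] + beta[1:])[:, None, :] - ll)
    counts = xi.sum(axis=0) + 1e-6     # pseudocount (assumption) keeps log(A) finite
    A = counts / counts.sum(axis=1, keepdims=True)
    weight = gamma.sum(axis=0) + 1e-12
    mu = (gamma * x[:, None]).sum(axis=0) / weight
    sd = np.sqrt((gamma * (x[:, None] - mu) ** 2).sum(axis=0) / weight) + 1e-3
    return A, mu, sd

def fit(x, n_restarts=100, n_iter=50, seed=0):
    """Random restarts of EM; keep the parameters with the highest likelihood."""
    rng = np.random.default_rng(seed)
    pi = np.array([0.5, 0.5])          # fixed start distribution (assumption)
    best = None
    for _ in range(n_restarts):
        stay = rng.uniform(0.5, 0.99, size=2)
        A = np.array([[stay[0], 1 - stay[0]], [1 - stay[1], stay[1]]])
        mu = rng.choice(x, size=2, replace=False)
        sd = np.full(2, x.std())
        for _ in range(n_iter):
            A, mu, sd = em_step(x, pi, A, mu, sd)
        ll = forward_backward(x, pi, A, mu, sd)[3]
        if best is None or ll > best[0]:
            best = (ll, pi, A, mu, sd)
    return best

def viterbi(x, pi, A, mu, sd):
    """Most probable state path under the fitted model."""
    logB, logA = log_gauss(x, mu, sd), np.log(A)
    T, K = logB.shape
    delta, back = np.log(pi) + logB[0], np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logA  # scores[i, j]: best path ending in i, then i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

Calling `fit(x)` on a one-dimensional signal and passing the returned parameters to `viterbi` gives a two-state segmentation. Posterior decoding would instead take the per-position argmax of the `gamma` matrix computed inside `em_step`.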
[Figure: two-state model with states "Open chromatin" and "Closed chromatin"; slide annotation: "very ^"]
![Page 7: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/7.jpg)
1.5 megabases
![Page 8: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/8.jpg)
A problem, and two solutions
• Problem: We are interested in phenomena occurring at multiple scales.
• Solution #1: Perform a wavelet smooth prior to HMM analysis.
• Solution #2: Build a more complex probability model.
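Solution #1 can be illustrated with a small smoothing routine. The slides do not say which wavelet family was used, so the Haar wavelet here is an assumption, as are the function name `haar_smooth` and the edge padding. The sketch computes the Haar approximation at a chosen level and inverts the transform with all detail coefficients set to zero.

```python
import numpy as np

def haar_smooth(x, levels):
    """Keep only the Haar approximation at the given level (details zeroed)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    pad = (-n) % (2**levels)            # pad so the length divides 2**levels
    approx = np.pad(x, (0, pad), mode="edge")
    for _ in range(levels):             # forward transform, low-pass branch only
        approx = (approx[0::2] + approx[1::2]) / np.sqrt(2)
    for _ in range(levels):             # inverse transform with zero details
        approx = np.repeat(approx, 2) / np.sqrt(2)
    return approx[:n]
```

Each extra level doubles the width of the averaging window, so the smoothed signal passed to the HMM emphasizes features at a coarser scale.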
![Page 9: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/9.jpg)
![Page 10: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/10.jpg)
![Page 11: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/11.jpg)
![Page 12: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/12.jpg)
![Page 13: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/13.jpg)
Change point model
• Four-state model:
– major DNase hypersensitive site (DHS),
– minor DHS,
– intermediate sensitivity region, and
– insensitive region.
• Continuous mixture of Gaussians at each state.
• Gamma distribution of lengths within each region.
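To make the generative story concrete, the sketch below samples a signal from a simplified version of such a model. Several parts are assumptions rather than details from the slides: each state emits from a single Gaussian instead of a continuous mixture, the next state is chosen uniformly among the other states, and every numeric parameter is supplied by the caller.

```python
import numpy as np

STATES = ["insensitive", "intermediate", "minor DHS", "major DHS"]

def sample_segmentation(n, means, sds, shape, scale, seed=0):
    """Sample n observations with gamma-distributed segment lengths."""
    rng = np.random.default_rng(seed)
    signal, labels = [], []
    state = int(rng.integers(len(means)))
    while len(signal) < n:
        # Segment length drawn from a gamma distribution, at least one position
        length = max(1, int(round(rng.gamma(shape[state], scale[state]))))
        signal.extend(rng.normal(means[state], sds[state], size=length))
        labels.extend([state] * length)
        # At a change point, jump to a different state (uniform choice: assumption)
        others = [s for s in range(len(means)) if s != state]
        state = int(rng.choice(others))
    return np.array(signal[:n]), np.array(labels[:n])
```

The returned label array indexes into `STATES`, so a sampled track can be compared against a segmentation inferred from it.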
![Page 14: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/14.jpg)
![Page 15: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/15.jpg)
Spanning the gaps
Beginning in State 1 (Insensitive)
![Page 16: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/16.jpg)
Spanning the gaps
Beginning in State 4 (Major DHS)
![Page 17: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/17.jpg)
Selecting the number of states
![Page 18: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/18.jpg)
Improved fit to the data
Each panel is a QQ plot of the difference between the observed residuals and the theoretical Gaussian.
[Panels: insensitive; intermediate sensitivity; minor DHS; major DHS]
![Page 19: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/19.jpg)
Capturing different scales
![Page 20: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/20.jpg)
Enrichment of biologically relevant features
![Page 21: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/21.jpg)
Future directions
• Many types of genomic data
– Phylogenetic conservation scores
– Various histone modifications
– Replication timing, etc.
• Perform segmentations in multiple dimensions simultaneously.
• Assign statistical significance to observed segments.
![Page 22: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/22.jpg)
Shotgun proteomics
[Workflow diagram labels: training PSMs; probability model; trained model; test PSMs; evaluation]
PSM = peptide-spectrum match
![Page 23: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/23.jpg)
Peptide sequence influences peak height
![Page 24: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/24.jpg)
Bayesian network
• We model peptide fragmentation using a Bayesian network.
• Nodes represent random variables, and edges represent conditional dependencies.
• Each node stores a conditional probability table (CPT) giving Pr(node|parents).
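[Figure: nodes "Is b-ion observed?" and "b-ion intensity", with the CPT for "b-ion intensity"]

| Is b-ion observed? | intensity > 50% | intensity < 50% |
|---|---|---|
| no b-ion observed | 0.00 | 1.00 |
| b-ion observed | 0.25 | 0.75 |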
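A conditional probability table of this kind can be stored as a nested dictionary. In the sketch below the intensity probabilities follow one reading of the slide's table (no b-ion observed implies intensity below 50% with probability 1.00). The prior `P_OBSERVED` does not appear on the slide and is a placeholder chosen only for illustration.

```python
# Prior on the parent node: hypothetical values, not taken from the slide.
P_OBSERVED = {True: 0.6, False: 0.4}

# Pr(b-ion intensity | is b-ion observed?), as read from the slide's table.
CPT_INTENSITY = {
    False: {">50%": 0.00, "<50%": 1.00},
    True: {">50%": 0.25, "<50%": 0.75},
}

def joint(observed, intensity):
    """Pr(observed, intensity) = Pr(observed) * Pr(intensity | observed)."""
    return P_OBSERVED[observed] * CPT_INTENSITY[observed][intensity]

def marginal_intensity(intensity):
    """Sum the joint over the parent node to get Pr(intensity)."""
    return sum(joint(obs, intensity) for obs in (True, False))
```

The joint distribution of the whole network factors the same way: one table lookup per node, multiplied together.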
![Page 25: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/25.jpg)
Ion series modeled in a Markov chain
[Figure: a chain of five repeated node pairs, each with "Is b-ion observed?" and "b-ion intensity"]
~ PepHMM (Han et al., 2005).
![Page 26: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/26.jpg)
A more realistic model
[Figure nodes: "Is b-ion observed?"; "b-ion intensity"; "N-term AA"; "C-term AA"; "Is ion detectable?"; "Fractional m/z"; "Is proton mobile?"]
![Page 27: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/27.jpg)
Ion series modeled in a Markov chain
$$\mathrm{LOR}_b = \log \frac{\Pr(\text{b-ions}, \text{peptide} \mid \text{model})}{\Pr(\text{b-ions}, \text{peptide} \mid \text{null model})}$$
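As a toy illustration of a log-odds ratio between a Markov-chain model and a null model, the sketch below scores a binary sequence of "b-ion observed?" values. It is a simplification under stated assumptions: it ignores the peptide sequence and ion intensities, and the transition probabilities in the usage lines are invented for the example.

```python
import math

def chain_log_prob(obs, p_first, trans):
    """Log probability of a 0/1 sequence under a first-order Markov chain."""
    lp = math.log(p_first[obs[0]])
    for prev, cur in zip(obs, obs[1:]):
        lp += math.log(trans[prev][cur])
    return lp

def log_odds_ratio(obs, model, null):
    """log Pr(obs | model) - log Pr(obs | null model)."""
    return chain_log_prob(obs, *model) - chain_log_prob(obs, *null)

# Hypothetical parameters: the model favors runs of observed ions,
# while the null model treats every position as a fair coin flip.
MODEL = ({0: 0.5, 1: 0.5}, {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}})
NULL = ({0: 0.5, 1: 0.5}, {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}})

lor = log_odds_ratio([1, 1, 1, 0], MODEL, NULL)
```

A positive value means the sequence is better explained by the model than by the null model. Computing one such ratio per ion series would give a vector of scores for each peptide-spectrum match.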
![Page 28: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/28.jpg)
Vectors of log-odds ratios
Correct peptide-spectrum matches Incorrect peptide-spectrum matches
![Page 29: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/29.jpg)
Binary classifier
![Page 30: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/30.jpg)
Model Evaluation: Accuracy
| Model | Redundant TP/FP | Unique TP/FP |
|---|---|---|
| Bayes Net | 285/300, 95% | 137/144, 95.1% |
| SEQUEST | 288/300, 96% | 136/144, 94.4% |
| InsPecT | 274/300, 91.3% | 131/144, 90.9% |
[Workflow diagram labels: training PSMs; probability model; trained model; test PSMs; evaluation]
![Page 31: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/31.jpg)
An incorrect identification
SEQUEST: LRPGAELLEGAHVGNFVEMK
Bayes net: HQDETQDALNALDLLTNEK
Blue = b and y, green = a, red = ammonia loss, magenta = water loss, sienna = +2
This peptide does not appear in E. coli, the organism from which this protein sample was derived.
![Page 32: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/32.jpg)
Co-eluting peptides
SEQUEST: AFPEAVLFIHPLDAK
Bayes net: DVFVHFSALQGNQFK
Blue = b and y, green = a, red = ammonia loss, magenta = water loss, sienna = +2
![Page 33: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/33.jpg)
Future directions
• Build a single Bayesian network that includes all ion types.
• Produce more descriptive outputs from the Bayesian network for input to the classifier.
• Add more biophysical details to the model: chromatography retention time, a better mass-to-charge estimate, etc.
• Generate a better (larger, more accurate) gold standard data set.
![Page 34: Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering](https://reader036.vdocuments.net/reader036/viewer/2022062511/5514c6e0550346935c8b4908/html5/thumbnails/34.jpg)
Acknowledgments
• DNase I hypersensitivity
– John Stamatoyannopoulos
– Pete Sabo
– Scott Kuehn
– many others in the Stam lab
• Wavelet analysis: Bob Thurman
• Change point model
– Charles Lawrence
– Heng Lian
– William Thompson
• Mass spectrometry
– Aaron Klammer
– Jeff Bilmes
– Sheila Reynolds
– Michael MacCoss