Cédric Notredame (21/10/2015) Uncovering Sequences Mysteries With Hidden Markov Model
Cédric Notredame (21/04/23)
Uncovering Sequences Mysteries With Hidden Markov Model
Cédric Notredame
Our Scope
-Understand the principle of HMMs
-Understand HOW HMMs are used in biology
-Look under the hood once
Outline
-Reminder of Bayesian probabilities
-Application to gene prediction
-Application to transmembrane (TM) prediction
-HMMs and Markov chains
-Application to domain/protein family prediction
-Future applications
Conditional Probabilities and Bayes Theorem
I now send you an essay which I have found among the papers of our deceased friend Mr Bayes, and which, in my opinion, has great merit... In an introduction which he has writ to this Essay, he says, that his design at first in thinking on the subject of it was, to find out a method by which we might judge concerning the probability that an event has to happen, in given circumstances, upon supposition that we know nothing concerning it but that, under the same circumstances, it has happened a certain number of times, and failed a certain other number of times.
Bayes
“The Durbin…”
What is a Probabilistic Model?
Dice = probabilistic model
-Each possible outcome has a probability (1/6)
Biological questions:
-What kind of dice would generate coding DNA?
-And non-coding DNA?
Which Parameters?
Dice = probabilistic model
Parameters: probability of each outcome
-A priori estimation: 1/6 for each number
OR
-Through observation: measure frequencies on a large number of events
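In code, the "through observation" option is just counting. A minimal Python sketch, assuming a short list of hypothetical rolls (illustration only, not data from the talk); `estimate_face_probabilities` is an illustrative name:

```python
from collections import Counter


def estimate_face_probabilities(rolls, faces=range(1, 7)):
    """Estimate P(face) as the observed frequency of each face."""
    counts = Counter(rolls)
    total = len(rolls)
    return {face: counts[face] / total for face in faces}


# Hypothetical observations, for illustration only
rolls = [6, 2, 6, 4, 6, 1, 6, 3, 5, 6]
probs = estimate_face_probabilities(rolls)
print(probs)
```

With only ten rolls the estimate for a six (0.5) is very noisy, which is why the frequencies have to be measured on a large number of events.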
Which Parameters?
Model: intracellular/extracellular proteins
Parameters: probability of each outcome
1- Make a set of inside (intracellular) proteins using annotation
2- Make a set of outside (extracellular) proteins using annotation
3- COUNT frequencies on the two sets
Model accuracy depends on the training set
Maximum Likelihood Models
Model: intracellular/extracellular proteins
1- Make a training set
2- Count frequencies
Model accuracy depends on the training set
Maximum likelihood model:
The model probabilities MAXIMISE the probability of the data
Maximum Likelihood Models
Model: intracellular/extracellular proteins
Model probability MAXIMISES data probability AND data probability MAXIMISES model probability
P(Model ¦ Data) is maximised
¦ means GIVEN!
Maximum Likelihood Model
Maximum Likelihood Models
Model: intracellular/extracellular proteins
Model probability MAXIMISES data probability AND data probability MAXIMISES model probability
P(Model ¦ Data) is maximised
Maximum Likelihood Model
P(Data ¦ Model) is maximised
Maximum Likelihood Models
Model: Intra/Extra-Cell Proteins
Data: 11121112221212122121112221112121112211111 (only 1s and 2s)
P(Coin ¦ Data) > P(Dice ¦ Data)
Maximum Likelihood Model
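To see which model explains this data better, one can compare its likelihood under each. A minimal sketch, assuming the coin is a two-outcome model emitting "1" or "2" with probability 1/2 each, the dice is fair with faces "1" to "6", rolls are independent, and both models have the same prior probability (the priors are an assumption, not from the slides):

```python
import math

DATA = "11121112221212122121112221112121112211111"

# Assumed models: a fair coin with faces "1"/"2", a fair dice with faces "1".."6"
COIN = {"1": 0.5, "2": 0.5}
DICE = {str(face): 1 / 6 for face in range(1, 7)}


def log_likelihood(data, model):
    """log P(Data | Model), treating each symbol as an independent draw."""
    return sum(math.log(model[symbol]) for symbol in data)


ll_coin = log_likelihood(DATA, COIN)
ll_dice = log_likelihood(DATA, DICE)

# Bayes with equal priors P(Coin) = P(Dice) = 0.5 (assumption):
# P(Coin | Data) = P(Data | Coin) / (P(Data | Coin) + P(Data | Dice)), in log form
p_coin_given_data = 1 / (1 + math.exp(ll_dice - ll_coin))
print(ll_coin, ll_dice, p_coin_given_data)
```

Every symbol costs a factor 1/2 under the coin but 1/6 under the dice, so over 41 symbols the coin is overwhelmingly favoured.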
Conditional Probabilities
Conditional Probabilities
The probability that something happens IF something else ALSO happens
P(Win Lottery ¦ Participation)
Conditional Probability
The probability that something happens IF something else ALSO happens
Dice 1: P(6 ¦ Dice 1) = 1/6
Dice 2 (loaded!): P(6 ¦ Dice 2) = 1/2
Joint Probability
The probability that something happens AND something else ALSO happens
The comma means AND: P(6,D2) is the probability of rolling a 6 AND using D2
P(6 ¦ D1) = 1/6, P(6 ¦ D2) = 1/2
P(6,D2) = P(6 ¦ D2) * P(D2) = 1/2 * 1/100 = 0.005
Joint Probability
Question: what is the probability of rolling a 6, given that the loaded dice (DL) is used 1% of the time and the fair dice (DF) 99% of the time?
P(6) = P(6,DF) + P(6,DL) = P(6 ¦ DF) * P(DF) + P(6 ¦ DL) * P(DL) = 1/6 * 0.99 + 1/2 * 0.01 = 0.17
(vs. 1/6 ≈ 0.167 for a fair dice alone)
Joint Probability
P(6) = P(6,DF) + P(6,DL) = P(6 ¦ DF) * P(DF) + P(6 ¦ DL) * P(DL) = 1/6 * 0.99 + 1/2 * 0.01 = 0.17 (vs. 1/6 ≈ 0.167 for a fair dice alone)
Unsuspected heterogeneity in the training set -> inaccurate parameter estimation
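The same marginalisation, written with exact fractions. A minimal sketch; the variable names are illustrative. Using `Fraction` avoids rounding and shows that the result is exactly 0.17:

```python
from fractions import Fraction

p_loaded = Fraction(1, 100)   # P(DL): the loaded dice is used 1% of the time
p_fair = 1 - p_loaded         # P(DF)
p6_fair = Fraction(1, 6)      # P(6 | DF)
p6_loaded = Fraction(1, 2)    # P(6 | DL)

# P(6) = P(6, DF) + P(6, DL): sum the joint probabilities over both dice
p6 = p6_fair * p_fair + p6_loaded * p_loaded
print(p6, float(p6))
```

The 1% of loaded rolls shifts P(6) only from about 0.167 to 0.17, which is why this kind of hidden heterogeneity in a training set is easy to miss.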
Bayes Theorem
X: model, data or any event; Y: model, data or any event
$$P(X_i \mid Y) = \frac{P(Y \mid X_i)\,P(X_i)}{\sum_j P(Y \mid X_j)\,P(X_j)}$$
Bayes Theorem
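X: model, data or any event; Y: model, data or any event
X̄ ("not X") is the complement of X, so X_T = X + X̄ covers every case and
$$P(Y) = P(Y,X) + P(Y,\bar{X})$$
$$P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y \mid X)\,P(X) + P(Y \mid \bar{X})\,P(\bar{X})}$$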
Bayes Theorem
X: model, data or any event; Y: model, data or any event
$$P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)}$$
-P(X ¦ Y): probability of observing X IF Y is fulfilled
-P(Y ¦ X) * P(X): probability of observing Y AND X simultaneously
-Divide by P(Y) ("remove" P(Y)) to get P(X ¦ Y)
Bayes Theorem
X: model, data or any event; Y: model, data or any event
$$P(X \mid Y) = \frac{P(X,Y)}{P(Y)}$$
-P(X ¦ Y): probability of observing X IF Y is fulfilled
-P(X,Y): probability of observing Y and X simultaneously
-Divide by P(Y) ("remove" P(Y)) to get P(X ¦ Y)
Using Bayes Theorem
Question: the dice gave three 6s in a row. IS IT LOADED???
We will use Bayes Theorem to test our belief: if the dice was loaded (model), what would be the probability of this model given the data (three 6s in a row)?
Using Bayes Theorem
Question: the dice gave three 6s in a row. IS IT LOADED???
P(D1) = 0.99, P(D2) = 0.01, P(6 ¦ D1) = 1/6, P(6 ¦ D2) = 1/2
The occasionally dishonest casino…
Using Bayes Theorem
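Question: the dice gave three 6s in a row. IS IT LOADED???
P(D1) = 0.99, P(D2) = 0.01, P(6 ¦ D1) = 1/6, P(6 ¦ D2) = 1/2
$$P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)} \qquad \text{with } X: D_2,\; Y: 6^3 \text{ (three 6s)}$$
$$P(D_2 \mid 6^3) = \frac{P(6^3 \mid D_2)\,P(D_2)}{P(6^3 \mid D_1)\,P(D_1) + P(6^3 \mid D_2)\,P(D_2)}$$
The two terms of the denominator are the probabilities of getting 6^3 with D1 and with D2.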
Using Bayes Theorem
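Question: the dice gave three 6s in a row. IS IT LOADED???
P(D1) = 0.99, P(D2) = 0.01, P(6 ¦ D1) = 1/6, P(6 ¦ D2) = 1/2
$$P(X \mid Y) = \frac{P(X,Y)}{P(Y)}$$
$$P(D_2 \mid 6^3) = \frac{P(6^3 \mid D_2)\,P(D_2)}{P(6^3 \mid D_1)\,P(D_1) + P(6^3 \mid D_2)\,P(D_2)} = \frac{(1/2)^3 \times 0.01}{(1/6)^3 \times 0.99 + (1/2)^3 \times 0.01} \approx 0.21$$
Probably NOT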
Posterior Probability
Question: the dice gave three 6s in a row. IS IT LOADED???
$$P(D_2 \mid 6^3) = \frac{P(6^3 \mid D_2)\,P(D_2)}{P(6^3 \mid D_1)\,P(D_1) + P(6^3 \mid D_2)\,P(D_2)} = 0.21$$
0.21 is a posterior probability: it is estimated AFTER the data is obtained
P(6^3 ¦ D2) is the likelihood of the hypothesis
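The posterior can be computed for any number of consecutive 6s. A minimal sketch, assuming the priors and emission probabilities given above (P(D2) = 0.01, P(6 ¦ D1) = 1/6, P(6 ¦ D2) = 1/2); `posterior_loaded` is a hypothetical helper name:

```python
from fractions import Fraction


def posterior_loaded(n_sixes, p_loaded=Fraction(1, 100)):
    """P(D2 | n consecutive 6s) by Bayes Theorem, with exact fractions."""
    like_fair = Fraction(1, 6) ** n_sixes      # P(6^n | D1)
    like_loaded = Fraction(1, 2) ** n_sixes    # P(6^n | D2)
    numerator = like_loaded * p_loaded          # P(6^n, D2)
    evidence = like_fair * (1 - p_loaded) + numerator  # P(6^n)
    return numerator / evidence


for n in (3, 6):
    print(n, posterior_loaded(n), float(posterior_loaded(n)))
```

`posterior_loaded(3)` gives 3/14 ≈ 0.21, as on the slide; with six 6s in a row the same function gives 81/92 ≈ 0.88, so enough evidence eventually overcomes the low prior.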
Debunking Headlines
Headline: 50% of the crimes are committed by migrants.
P(Migrant) = 0.1, P(Criminal) = 0.0001, P(M ¦ C) = 0.5
Question: are 50% of the migrants criminals??
$$P(C \mid M) = \frac{P(M \mid C)\,P(C)}{P(M)} = \frac{0.5 \times 0.0001}{0.1} = 0.0005$$
NO: only 0.05% of migrants are criminals (NOT 50%!)
Debunking Headlines
Headline: 50% of gene promoters contain TATA.
P(T) = 0.1, P(P) = 0.0001, P(T ¦ P) = 0.5
Question: is TATA a good gene predictor?
$$P(P \mid T) = \frac{P(T \mid P)\,P(P)}{P(T)} = \frac{0.5 \times 0.0001}{0.1} = 0.0005$$
NO
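Both headline examples are the same one-line inversion of Bayes Theorem. A minimal sketch using the slide's illustrative numbers; `bayes_invert` is a hypothetical name:

```python
def bayes_invert(p_b_given_a, p_a, p_b):
    """P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# TATA example: P(T | P) = 0.5, P(P) = 0.0001, P(T) = 0.1
p_promoter_given_tata = bayes_invert(0.5, 0.0001, 0.1)
print(p_promoter_given_tata)
```

Swapping in the migrant numbers gives the same 0.0005, because the two examples share identical probabilities.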
Bayes Theorem
Bayes Theorem reveals the trade-off between
-Sensitivity: finding ALL the genes, and
-Specificity: finding ONLY genes
TATA = high sensitivity / low specificity
Markov Chains
What is a Markov Chain?
Simple chain: one dice
-Each roll is the same
-A roll does not depend on the previous one
Markov chain: two dice
-You only use ONE dice at a time: the fair OR the loaded
-Which dice you roll only depends on the previous roll
What is a Markov Chain?
Biological sequences tend to behave like Markov chains
Question/example:
Is it possible to tell whether my sequence is a CpG island?
What is a Markov Chain?
Question:
Identify CpG island sequences
Old-fashioned solution:
-Slide a window of size: the captain's height (an arbitrary choice)
-Measure the % of CpG
-Plot it against the sequence
-Decide
Sliding Window Methods
[Figure: sliding-window averages plotted along the sequence]
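The old-fashioned method fits in a few lines. A minimal sketch, assuming "CpG content" means the fraction of adjacent pairs that are a C followed by a G; the function name, example sequence and window size are hypothetical, and the window size is exactly the arbitrary choice criticised above:

```python
def cpg_fraction_profile(seq, window):
    """For each window start, the fraction of adjacent pairs that are 'CG'."""
    if window < 2:
        raise ValueError("window must contain at least one pair of bases")
    seq = seq.upper()
    n_pairs = window - 1
    profile = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        n_cg = sum(1 for i in range(n_pairs) if chunk[i:i + 2] == "CG")
        profile.append(n_cg / n_pairs)
    return profile


profile = cpg_fraction_profile("ATATCGCGCGATAT", window=4)
print(profile)
```

Deciding still requires picking a threshold on this profile, and both the threshold and the window size change where the boundaries appear to fall.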
What is a Markov Chain?
Question:
Identify CpG island sequences
Bayesian solution:
-Make a CpG Markov chain
-Run the sequence through the chain
-What is the likelihood for the chain to produce the sequence?
[Figure: a Markov chain whose states A, C, G and T are connected by transitions]
Transition probabilities: probability of a transition from G to C
$$A_{GC} = P(x_i = C \mid x_{i-1} = G)$$
$$P(\text{sequence}) = P(x_L, x_{L-1}, x_{L-2}, \ldots, x_1)$$
Remember: P(X,Y) = P(X ¦ Y) * P(Y)
In the Markov chain, x_L only depends on x_{L-1}, so
$$P(\text{sequence}) = P(x_L \mid x_{L-1})\,P(x_{L-1} \mid x_{L-2}) \cdots P(x_2 \mid x_1)\,P(x_1)$$
$$P(\text{sequence}) = P(x_L \mid x_{L-1})\,P(x_{L-1} \mid x_{L-2}) \cdots P(x_1) = P(x_1) \prod_{i=2}^{L} A_{x_{i-1} x_i}$$
with, for example, $A_{GC} = P(x_i = C \mid x_{i-1} = G)$
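The product formula translates directly into code. A minimal sketch, assuming the CpG+ transition values from the tables later in this talk (read with rows as the preceding nucleotide) and a uniform P(x_1) = 0.25, which the slides do not specify; it adds logs instead of multiplying probabilities:

```python
import math

# CpG+ transition probabilities as read from the slides: PLUS[previous][next]
PLUS = {
    "A": {"A": 0.18, "C": 0.27, "G": 0.42, "T": 0.12},
    "C": {"A": 0.17, "C": 0.36, "G": 0.27, "T": 0.18},
    "G": {"A": 0.16, "C": 0.33, "G": 0.37, "T": 0.12},
    "T": {"A": 0.08, "C": 0.35, "G": 0.38, "T": 0.18},
}


def markov_log_prob(seq, trans, initial=None):
    """log P(seq) = log P(x_1) + sum over i = 2..L of log A_{x(i-1) x(i)}."""
    if initial is None:
        initial = {base: 0.25 for base in "ACGT"}  # assumption: uniform P(x_1)
    logp = math.log(initial[seq[0]])
    for prev, nxt in zip(seq, seq[1:]):
        logp += math.log(trans[prev][nxt])
    return logp


print(math.exp(markov_log_prob("CGCG", PLUS)))
```

The rows sum to 0.98 or 0.99 rather than exactly 1 because the slide values are rounded to two decimals.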
[Figure: the A, C, G, T chain with a begin state B]
Arbitrary begin and end states can be added to the chain.
By convention, only the begin state is added.
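[Figure: the A, C, G, T chain with a begin state B and an end state E]
Adding an end state E, entered with transition probability T (a probability, not the nucleotide T), defines length probabilities:
$$P(\text{all the sequences of length } L) = T(1-T)^{L-1}$$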
[Figure: the A, C, G, T chain with begin state B and end state E]
The transitions are probabilities: the sum of the probabilities of all the possible sequences of all possible lengths is 1
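Why the total is 1: for each length L the probabilities of all sequences add up to T(1-T)^{L-1}, and summing over all lengths gives a geometric series (assuming 0 < T ≤ 1):

$$\sum_{L=1}^{\infty} T(1-T)^{L-1} = T \sum_{n=0}^{\infty} (1-T)^{n} = T \cdot \frac{1}{1-(1-T)} = 1$$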
Using Markov Chains to Predict
What is a Prediction
Given a sequence, we want to know the probability that this sequence is a CpG island
1- We need a training set:
-CpG+ sequences
-CpG- sequences
2- We will measure the transition frequencies and treat them as probabilities
What is a Prediction
Is my sequence a CpG island?
2- We will measure the transition frequencies and treat them as probabilities
$$A^+_{GC} = \frac{N^+_{GC}}{\sum_X N^+_{GX}}$$
= ratio between the number of G->C transitions and all the transitions G->X starting from G
Transition GC: a G followed by a C, e.g. in GCCGCTGCGCGA
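Counting transitions is a direct translation of the formula. A minimal sketch on the single example sequence from the slide (a real training set would contain many CpG+ sequences); `estimate_transitions` is a hypothetical name, and no pseudocounts are added, so transitions never seen in training get no entry:

```python
from collections import defaultdict


def estimate_transitions(training_seqs):
    """A_st = N_st / sum_X N_sX, counted over adjacent pairs in all sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in training_seqs:
        for s, t in zip(seq, seq[1:]):
            counts[s][t] += 1  # N_st: base s followed by base t
    return {
        s: {t: n / sum(row.values()) for t, n in row.items()}
        for s, row in counts.items()
    }


a_plus = estimate_transitions(["GCCGCTGCGCGA"])
print(a_plus["G"])  # {'C': 0.8, 'A': 0.2}
```

In this sequence a G is followed by a C four times and by an A once, hence A_GC = 4/5.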
What is a Prediction
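Is my sequence a CpG island?
2- We will measure the transition frequencies and treat them as probabilities
Rows: preceding nucleotide; columns: following nucleotide

| + | A | C | G | T |
|---|---|---|---|---|
| A | 0.18 | 0.27 | 0.42 | 0.12 |
| C | 0.17 | 0.36 | 0.27 | 0.18 |
| G | 0.16 | 0.33 | 0.37 | 0.12 |
| T | 0.08 | 0.35 | 0.38 | 0.18 |

| - | A | C | G | T |
|---|---|---|---|---|
| A | 0.30 | 0.21 | 0.28 | 0.21 |
| C | 0.32 | 0.30 | 0.08 | 0.30 |
| G | 0.25 | 0.25 | 0.30 | 0.20 |
| T | 0.17 | 0.24 | 0.29 | 0.29 |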
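What is a Prediction
Is my sequence a CpG island?
3- Evaluate the probability for each of these models to generate our sequence
$$P(\text{seq} \mid M^+) = \prod_{i=1}^{L} A^+_{x_{i-1} x_i} \qquad P(\text{seq} \mid M^-) = \prod_{i=1}^{L} A^-_{x_{i-1} x_i}$$
Rows: preceding nucleotide; columns: following nucleotide

| + | A | C | G | T |
|---|---|---|---|---|
| A | 0.18 | 0.27 | 0.42 | 0.12 |
| C | 0.17 | 0.36 | 0.27 | 0.18 |
| G | 0.16 | 0.33 | 0.37 | 0.12 |
| T | 0.08 | 0.35 | 0.38 | 0.18 |

| - | A | C | G | T |
|---|---|---|---|---|
| A | 0.30 | 0.21 | 0.28 | 0.21 |
| C | 0.32 | 0.30 | 0.08 | 0.30 |
| G | 0.25 | 0.25 | 0.30 | 0.20 |
| T | 0.17 | 0.24 | 0.29 | 0.29 |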
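Using the Log Odds
Is my sequence a CpG island?
4- Measure the log odds
The log odds compares the two models. Log2 gives a value in bits (the standard); dividing by LEN, the sequence length, gives a less spread-out score distribution.
$$S(\text{seq}) = \log \frac{P(\text{seq} \mid M^+)}{P(\text{seq} \mid M^-)} \approx \frac{1}{\mathrm{LEN}} \sum_{i=1}^{L} \log_2 \frac{A^+_{x_{i-1} x_i}}{A^-_{x_{i-1} x_i}}$$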
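Using the Log Odds
Is my sequence a CpG island?
4- Measure the log odds
$$S(\text{seq}) = \log \frac{P(\text{seq} \mid M^+)}{P(\text{seq} \mid M^-)} \approx \frac{1}{\mathrm{LEN}} \sum_{i=1}^{L} \log_2 \frac{A^+_{x_{i-1} x_i}}{A^-_{x_{i-1} x_i}}$$
Positive: more likely to be CpG than not
Negative: more likely NOT to be CpG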
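The length-normalised score in bits can be sketched as follows, assuming the + and - transition values from the tables in this talk (rows read as the preceding nucleotide). The initial-nucleotide term is left out on the assumption that it is the same in both models and cancels; `log_odds_bits` is a hypothetical name:

```python
import math

# Transition tables as read from the slides: TABLE[previous][next]
PLUS = {
    "A": {"A": 0.18, "C": 0.27, "G": 0.42, "T": 0.12},
    "C": {"A": 0.17, "C": 0.36, "G": 0.27, "T": 0.18},
    "G": {"A": 0.16, "C": 0.33, "G": 0.37, "T": 0.12},
    "T": {"A": 0.08, "C": 0.35, "G": 0.38, "T": 0.18},
}
MINUS = {
    "A": {"A": 0.30, "C": 0.21, "G": 0.28, "T": 0.21},
    "C": {"A": 0.32, "C": 0.30, "G": 0.08, "T": 0.30},
    "G": {"A": 0.25, "C": 0.25, "G": 0.30, "T": 0.20},
    "T": {"A": 0.17, "C": 0.24, "G": 0.29, "T": 0.29},
}


def log_odds_bits(seq, normalise=True):
    """Sum of log2(A+ / A-) over adjacent pairs, optionally divided by LEN."""
    score = sum(
        math.log2(PLUS[prev][nxt] / MINUS[prev][nxt])
        for prev, nxt in zip(seq, seq[1:])
    )
    return score / len(seq) if normalise else score


for s in ("CGCGCG", "ATTATA"):
    print(s, round(log_odds_bits(s), 3))
```

The CG step dominates: A+_CG = 0.27 against A-_CG = 0.08, worth about 1.75 bits each time it occurs.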
Using the Log Odds
Is my sequence a CpG island?
5- Plot the score distribution
[Figure: histogram of scores; x axis: score in bits, with 0 marked; y axis: number of sequences]
Using the Log Odds
Is my sequence a CpG island?
5- Plot the score distribution
[Figure: histogram of scores; x axis: score in bits, with 0 marked; y axis: number of sequences]
Things can go wrong:
-bad training set
-bad parameter estimation
Using the Log Odds
Is my sequence a CpG island?
-The Markov chain is a good discriminator
-Problem: what to do with long sequences that are partly CpG and partly non-CpG?
-How can we make a prediction nucleotide by nucleotide?
-We want to uncover the HIDDEN boundaries
Hidden Markov Models
Hidden Markov Model: Switching Dice
-If you are cheating, you want to switch dice without telling!
-The MODEL switch is HIDDEN
Simple chain: one dice
-Each roll is the same
-A roll does not depend on the previous one
Markov chain: two dice
-You only use ONE dice at a time: the fair OR the loaded
-Which dice you roll only depends on the previous roll
Using HMMs
Question: I want to find the CpG boundaries
The chain had four symbols: A, G, C, T
The model has eight states: A+, A-, G+, G-, C+, C-, T+, T-
There is no one-to-one correspondence between symbols and states:
the state of each symbol is hidden; an A can be either in state A+ or in state A-
Using HMMs
Question: I want to find the CpG boundaries
1- Define the model topology
+ states: A+ G+ C+ T+
- states: A- G- C- T-
EVERY transition is possible
C+ to G- costs more (it is less probable)
Using HMMs
Question: I want to find the CpG boundaries
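2- Parameterise the model: count frequencies…
Rows: preceding nucleotide; columns: following nucleotide

| + | A | C | G | T |
|---|---|---|---|---|
| A | 0.18 | 0.27 | 0.42 | 0.12 |
| C | 0.17 | 0.36 | 0.27 | 0.18 |
| G | 0.16 | 0.33 | 0.37 | 0.12 |
| T | 0.08 | 0.35 | 0.38 | 0.18 |

| - | A | C | G | T |
|---|---|---|---|---|
| A | 0.30 | 0.21 | 0.28 | 0.21 |
| C | 0.32 | 0.30 | 0.08 | 0.30 |
| G | 0.25 | 0.25 | 0.30 | 0.20 |
| T | 0.17 | 0.24 | 0.29 | 0.29 |

We also need the + to - transitions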
Using HMMs
Question: I want to find the CpG boundaries
3-FORCE the model to emit your sequence: Viterbi
One can use the model to emit any sequence. The succession of states used is called a PATH (π) because it is a walk through the model
G+ C+ G+ C+ T+ C+ C+ C- C- G- T- ….
Using HMMs
Question: I want to find the CpG boundaries
3- FORCE the model to emit your sequence: Viterbi
The path with the occasionally dishonest casino:
-Switch dice: transition, with probability $A_{FL} = P(\pi_i = L \mid \pi_{i-1} = F)$
-Roll the dice: emission; the state L emits a symbol with a probability, e.g. $P(\text{emit 6 with } L) = E_L(6) = P(x_i = 6 \mid \pi_i = L) = 0.5$
Two states: Fair and Loaded
Six emissions for state Fair: 1: 0.16, 2: 0.16, 3: 0.16, 4: 0.16, 5: 0.16, 6: 0.16
Six emissions for state Loaded: 1: 0.10, 2: 0.10, 3: 0.10, 4: 0.10, 5: 0.10, 6: 0.50
Fair: 1: 0.16, 2: 0.16, 3: 0.16, 4: 0.16, 5: 0.16, 6: 0.16
Loaded: 1: 0.10, 2: 0.10, 3: 0.10, 4: 0.10, 5: 0.10, 6: 0.50
Emissions of L with their probabilities, e.g. $P(\text{emit 6 with } L) = E_L(6) = P(x_i = 6 \mid \pi_i = L) = 0.5$
Switch dice: transition, $A_{FL} = P(\pi_i = L \mid \pi_{i-1} = F)$
Roll the dice: emission
[Figure: the eight states A+, A-, G+, G-, C+, C-, T+, T-]
8 STATES, 1 EMISSION per state
Using HMMs
Question: I want to find the CpG boundaries
3-FORCE the model to emit your sequence: Viterbi
The path:
-goes from state to state with a probability, e.g. $A_{G+,C+} = P(\pi_i = C+ \mid \pi_{i-1} = G+)$
-in each state, it EMITS a symbol with probability 1, e.g. $E_{G+}(G) = P(x_i = G \mid \pi_i = G+) = 1$
Using HMMs
Question: I want to find the CpG boundaries
3-FORCE the model to emit your sequence: Viterbi
We are interested in the joint probability of the PATH π (a chain of states such as G+, C-…) and our sequence x:
$$P(x,\pi) = A_{0,\pi_1}\, E_{\pi_1}(x_1) \prod_{i=2}^{L} A_{\pi_{i-1},\pi_i}\, E_{\pi_i}(x_i)$$
Using HMMs
Question: I want to find the CpG boundaries
3-FORCE the model to emit your sequence: Viterbi
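$$P(x,\pi) = A_{0,\pi_1}\, E_{\pi_1}(x_1) \prod_{i=2}^{L} A_{\pi_{i-1},\pi_i}\, E_{\pi_i}(x_i)$$
Example: path π = C+ G- C- G+ emitting x = C G C G:
$$P(x,\pi) = A_{0,C+} \cdot 1 \cdot A_{C+,G-} \cdot 1 \cdot A_{G-,C-} \cdot 1 \cdot A_{C-,G+} \cdot 1$$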
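The joint probability of one given path can be computed directly. The CpG model's + to - transition values are not given on the slides, so this minimal sketch uses the occasionally dishonest casino instead: emissions come from the slides (exactly 1/6 for Fair instead of the rounded 0.16), while the transition and begin probabilities are ASSUMED values for illustration. As in the C+ G- C- G+ example, no end state is included; `joint_prob` is a hypothetical name:

```python
# Emissions from the slides
EMIT = {
    "F": {face: 1 / 6 for face in range(1, 7)},
    "L": {**{face: 0.1 for face in range(1, 6)}, 6: 0.5},
}
# Assumed parameters (hypothetical, not given on the slides)
START = {"F": 0.5, "L": 0.5}                                        # A_{0k}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}  # A_{kl}


def joint_prob(rolls, path):
    """P(x, pi) = A_{0,pi1} E_{pi1}(x1) * prod over i >= 2 of A_{pi(i-1),pi(i)} E_{pi(i)}(x_i)."""
    if len(rolls) != len(path):
        raise ValueError("one state per roll is required")
    p = START[path[0]] * EMIT[path[0]][rolls[0]]
    for i in range(1, len(rolls)):
        p *= TRANS[path[i - 1]][path[i]] * EMIT[path[i]][rolls[i]]
    return p


print(joint_prob([6, 6, 6], "LLL"), joint_prob([6, 6, 6], "FFF"))
```

Each candidate path gets its own score; the prediction is the path with the highest one, which is what Viterbi finds without enumerating every path.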
Using HMMs
Question: I want to find the CpG boundaries
3-FORCE the model to emit your sequence: Viterbi
To make a prediction, we must identify the best-scoring path:
$$\pi^* = \arg\max_{\pi} P(x,\pi)$$
Scoring one given path, such as $A_{0,C+} \cdot 1 \cdot A_{C+,G-} \cdot 1 \cdot A_{G-,C-} \cdot 1 \cdot A_{C-,G+} \cdot 1$, is NOT a prediction
Using HMMs
Question: I want to find the CpG boundaries
3-FORCE the model to emit your sequence: Viterbi
To make a prediction, we must identify the best-scoring path:
$$\pi^* = \arg\max_{\pi} P(x,\pi)$$
We do this recursively with the VITERBI algorithm
[Figure: Viterbi matrix. Each symbol of the sequence (C, G, A, G, …) gets a column of the eight states A+ G+ C+ T+ A- G- C- T-; best partial paths such as G+ C+ G- A- are extended one column at a time]
[Figure: Viterbi matrix for the sequence G C G A G C, with one column of states A+ G+ C+ T+ A- G- C- T- per symbol]
Trace back: G+ C+ G- A- G- C-
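Initialisation ($i = 0$): $V_0(0) = 1$, $V_k(0) = 0$ for every $k > 0$
Recursion ($i = 1..L$):
$$V_l(i) = E_l(x_i) \max_k \left(V_k(i-1)\, A_{kl}\right) \qquad \mathrm{ptr}_i(l) = \arg\max_k \left(V_k(i-1)\, A_{kl}\right)$$
Termination:
$$P(x,\pi^*) = \max_k \left(V_k(L)\, A_{k0}\right) \qquad \pi^*_L = \arg\max_k \left(V_k(L)\, A_{k0}\right)$$
Traceback ($i = L..1$): $\pi^*_{i-1} = \mathrm{ptr}_i(\pi^*_i)$
-k and l are two states
-$V_k(i)$: score of the best path for $x_1 \ldots x_i$ that finishes in state k at position i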
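The recursion is short enough to implement directly. A minimal sketch on the occasionally dishonest casino, using the slide emissions and the same ASSUMED transition and begin probabilities as a hypothetical setup (not from the slides). It works in log space (sums of logs instead of products) and has no explicit end state, so termination simply picks the best final state:

```python
import math

STATES = ("F", "L")
# Assumed parameters (hypothetical, not given on the slides)
START = {"F": 0.5, "L": 0.5}                                        # A_{0k}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}  # A_{kl}
# Emissions from the slides (exactly 1/6 for the fair dice)
EMIT = {
    "F": {face: 1 / 6 for face in range(1, 7)},
    "L": {**{face: 0.1 for face in range(1, 6)}, 6: 0.5},
}


def viterbi(rolls):
    """Most probable state path pi* and its log score log P(x, pi*)."""
    # Position 1: log V_l(1) = log A_{0l} + log E_l(x_1)
    v = {l: math.log(START[l]) + math.log(EMIT[l][rolls[0]]) for l in STATES}
    pointers = []  # one dict per position i >= 2: ptr_i(l) = best previous state
    for x in rolls[1:]:
        ptr, new_v = {}, {}
        for l in STATES:
            # max over k of log V_k(i-1) + log A_kl
            best_k = max(STATES, key=lambda k: v[k] + math.log(TRANS[k][l]))
            ptr[l] = best_k
            new_v[l] = v[best_k] + math.log(TRANS[best_k][l]) + math.log(EMIT[l][x])
        pointers.append(ptr)
        v = new_v
    # Termination without an end state: best-scoring final state
    last = max(STATES, key=lambda k: v[k])
    # Trace back through the stored pointers
    path = [last]
    for ptr in reversed(pointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return "".join(path), v[last]


print(viterbi([1, 2, 6, 6, 6, 6, 6, 6, 3, 1]))
```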
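Viterbi recursion ($i=1..L$, with $V_0(0)=1$, $V_k(0)=0$ for every $k \neq 0$; $k$ and $l$ are two states):
$V_l(i) = E_l(x_i) \max_k \big(V_k(i-1) A_{kl}\big)$
Multiplying probabilities can cause an underflow problem: a product of many numbers below 1 quickly falls below the smallest representable floating-point value.
Usually, probability multiplications are replaced with log additions:
$\log(a \cdot b) = \log a + \log b$
so the recursion becomes $\log V_l(i) = \log E_l(x_i) + \max_k \big(\log V_k(i-1) + \log A_{kl}\big)$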
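A quick Python illustration of the underflow and of the log-space fix, assuming IEEE double-precision floats. The numbers are hypothetical (1000 positions that each contribute a factor 0.25), and `log_viterbi_step` is an illustrative helper name, not part of any library.

```python
import math

probs = [0.25] * 1000  # hypothetical: one factor of 0.25 per position

product = 1.0
for p in probs:
    product *= p  # 0.25**1000 = 2**-2000, far below the smallest double

log_score = 0.0
for p in probs:
    log_score += math.log(p)  # log(a*b) = log(a) + log(b): no underflow

print(product)    # 0.0: the probability underflowed
print(log_score)  # about -1386.29: still perfectly usable


def log_viterbi_step(prev_log_v, log_trans, log_emit):
    """One log-space Viterbi step:
    log V_l(i) = log E_l(x_i) + max_k(log V_k(i-1) + log A_kl).
    log_emit maps each state l to log E_l(x_i) for the current symbol."""
    return {
        l: log_emit[l] + max(prev_log_v[k] + log_trans[k][l] for k in prev_log_v)
        for l in log_emit
    }
```

Because log is monotonic, taking the max of log scores selects the same predecessor as taking the max of probabilities, so the best path is unchanged.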
Using HMMs
Question: I want to know the probability of my sequence given the model
In theory, you must sum over ALL the possible paths: $P(x) = \sum_{\pi} P(x,\pi)$. In practice:
$P(x,\pi^*)$ is a good approximation
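To see why summing over all paths is only feasible "in theory", the sketch below enumerates every path of a hypothetical two-state model (made-up probabilities, not the lecture's eight-state model) and compares the exact sum with the single best path. With $|S|$ states there are $|S|^L$ paths, so brute force only works for tiny sequences.

```python
from itertools import product

# Hypothetical two-state CpG model; all numbers are made up for illustration.
STATES = ("+", "-")
START = {"+": 0.5, "-": 0.5}                                    # A_0k
TRANS = {"+": {"+": 0.8, "-": 0.1}, "-": {"+": 0.1, "-": 0.8}}  # A_kl
END = {"+": 0.1, "-": 0.1}                                      # A_k0
EMIT = {"+": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
        "-": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}}


def joint(x, path):
    """P(x, pi): product of transitions and emissions along one path."""
    p = START[path[0]] * EMIT[path[0]][x[0]]
    for i in range(1, len(x)):
        p *= TRANS[path[i - 1]][path[i]] * EMIT[path[i]][x[i]]
    return p * END[path[-1]]


x = "CGCA"
scores = {path: joint(x, path) for path in product(STATES, repeat=len(x))}
p_x = sum(scores.values())     # exact P(x): sum over all 2**4 = 16 paths
p_best = max(scores.values())  # P(x, pi*): the best path only
print(p_best / p_x)            # fraction of P(x) carried by pi*
```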
Using HMMs
Question: I want to know the probability of my sequence given the model
$P(x,\pi^*)$ is a good approximation, but…
the Forward algorithm gives the exact value of $P(x)$
Viterbi
Initiation: $V_0(0)=1$, $V_k(0)=0$ for every $k \neq 0$ ($k$ and $l$ are two states)
Recursion ($i=1..L$): $V_l(i) = E_l(x_i) \max_k \big(V_k(i-1) A_{kl}\big)$
Termination: $P(x,\pi^*) = \max_k \big(V_k(L) A_{k0}\big)$
Forward
Initiation: $F_0(0)=1$, $F_k(0)=0$ for every $k \neq 0$
Recursion ($i=1..L$): $F_l(i) = E_l(x_i) \sum_k F_k(i-1) A_{kl}$
Termination: $P(x) = \sum_k F_k(L) A_{k0}$
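Going from Viterbi to Forward only swaps the max for a sum. A minimal Python sketch, assuming a hypothetical two-state model with made-up numbers and 0-based positions; `brute_force` enumerates every path and is there only to check `forward` on short sequences.

```python
from itertools import product

# Hypothetical two-state CpG model; all numbers are made up for illustration.
STATES = ("+", "-")
START = {"+": 0.5, "-": 0.5}                                    # A_0k
TRANS = {"+": {"+": 0.8, "-": 0.1}, "-": {"+": 0.1, "-": 0.8}}  # A_kl
END = {"+": 0.1, "-": 0.1}                                      # A_k0
EMIT = {"+": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
        "-": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}}


def forward(x):
    """P(x) with the Forward recursion (plain probabilities, short x only)."""
    # Position 0: F_l = A_0l * E_l(x_0), i.e. the recursion applied to F_0(0) = 1.
    F = {l: START[l] * EMIT[l][x[0]] for l in STATES}
    for c in x[1:]:
        # F_l(i) = E_l(x_i) * sum_k F_k(i-1) * A_kl   (Viterbi would use max here)
        F = {l: EMIT[l][c] * sum(F[k] * TRANS[k][l] for k in STATES)
             for l in STATES}
    # Termination: P(x) = sum_k F_k(L) * A_k0
    return sum(F[k] * END[k] for k in STATES)


def brute_force(x):
    """Sum of P(x, pi) over every path; exponential, for checking only."""
    total = 0.0
    for path in product(STATES, repeat=len(x)):
        p = START[path[0]] * EMIT[path[0]][x[0]]
        for i in range(1, len(x)):
            p *= TRANS[path[i - 1]][path[i]] * EMIT[path[i]][x[i]]
        total += p * END[path[-1]]
    return total


print(forward("CGCA"), brute_force("CGCA"))
```

Forward costs $O(L\,|S|^2)$ operations instead of the $|S|^L$ paths of the brute-force sum.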
[Figure: one step of each recursion, from the states A+ … T- at position i-1 to G+ and G- at position i. Viterbi keeps the Max over the incoming paths; Forward takes their Sum.]
Posterior Decoding of
Hidden Markov Models
Why Posterior Decoding?
-Viterbi is BRUTAL!!!!
-It does not associate individual predictions with a probability
Question: What is the probability that nucleotide 1300 really is a CpG boundary?
ANSWER: The Backward algorithm (combined with the Forward algorithm)
Posterior Decoding?
Question: What is the probability that nucleotide 1300 really is a CpG boundary?
$P(x,\pi_i=l)$: probability of sequence $x$ WITH position $i$ in state $l$
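Posterior Decoding
$P(x,\pi_i=l) = P(x_1 \dots x_i, \pi_i=l) \cdot P(x_{i+1} \dots x_L \mid \pi_i=l)$
-First factor, $F_l(i) = P(x_1 \dots x_i, \pi_i=l)$: Forward algorithm
-Second factor, $B_l(i) = P(x_{i+1} \dots x_L \mid \pi_i=l)$: Backward algorithm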
Forward
Initiation: $F_0(0)=1$, $F_k(0)=0$ for every $k \neq 0$
Recursion ($i=1..L$): $F_l(i) = E_l(x_i) \sum_k F_k(i-1) A_{kl}$
Termination: $P(x) = \sum_k F_k(L) A_{k0}$
Backward
Initiation: $B_k(L) = A_{k0}$ for every $k$
Recursion ($i=L-1..1$): $B_k(i) = \sum_l A_{kl} E_l(x_{i+1}) B_l(i+1)$
Termination: $P(x) = \sum_l A_{0l} E_l(x_1) B_l(1)$
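Forward and Backward must return the same $P(x)$, which makes a handy sanity check. A minimal Python sketch, assuming a hypothetical two-state model with made-up numbers, plain probabilities and short sequences only:

```python
# Hypothetical two-state CpG model; all numbers are made up for illustration.
STATES = ("+", "-")
START = {"+": 0.5, "-": 0.5}                                    # A_0k
TRANS = {"+": {"+": 0.8, "-": 0.1}, "-": {"+": 0.1, "-": 0.8}}  # A_kl
END = {"+": 0.1, "-": 0.1}                                      # A_k0
EMIT = {"+": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
        "-": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}}


def forward(x):
    """P(x) via the Forward recursion, run from the first position to the last."""
    F = {l: START[l] * EMIT[l][x[0]] for l in STATES}
    for c in x[1:]:
        F = {l: EMIT[l][c] * sum(F[k] * TRANS[k][l] for k in STATES)
             for l in STATES}
    return sum(F[k] * END[k] for k in STATES)


def backward(x):
    """P(x) via the Backward recursion, run from the last position to the first."""
    # Initiation: B_k(L) = A_k0
    B = {k: END[k] for k in STATES}
    # Recursion: B_k(i) = sum_l A_kl * E_l(x_{i+1}) * B_l(i+1), for i = L-1 .. 1
    for c in reversed(x[1:]):
        B = {k: sum(TRANS[k][l] * EMIT[l][c] * B[l] for l in STATES)
             for k in STATES}
    # Termination: P(x) = sum_l A_0l * E_l(x_1) * B_l(1)
    return sum(START[l] * EMIT[l][x[0]] * B[l] for l in STATES)


for seq in ("CGCG", "ATATCGCG", "T"):
    print(seq, forward(seq), backward(seq))  # equal up to rounding
```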
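Forward recursion ($i=1..L$): $F_l(i) = E_l(x_i) \sum_k F_k(i-1) A_{kl}$
Backward recursion ($i=L-1..1$): $B_k(i) = \sum_l A_{kl} E_l(x_{i+1}) B_l(i+1)$
$P(x,\pi_i=l) = F_l(i)\,B_l(i)$
$P(x,\pi_i=l) = P(\pi_i=l \mid x)\,P(x) = F_l(i)\,B_l(i)$
$P(\pi_i=l \mid x) = \dfrac{F_l(i)\,B_l(i)}{P(x)}$
with $P(x) = \sum_k F_k(L) A_{k0} = \sum_l A_{0l} E_l(x_1) B_l(1)$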
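Posterior decoding combines the two tables position by position. A minimal Python sketch, assuming a hypothetical two-state model with made-up numbers; list index `i` holds position $i+1$ of the 1-based notation used on the slides.

```python
# Hypothetical two-state CpG model; all numbers are made up for illustration.
STATES = ("+", "-")
START = {"+": 0.5, "-": 0.5}                                    # A_0k
TRANS = {"+": {"+": 0.8, "-": 0.1}, "-": {"+": 0.1, "-": 0.8}}  # A_kl
END = {"+": 0.1, "-": 0.1}                                      # A_k0
EMIT = {"+": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
        "-": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}}


def forward_table(x):
    """F[i][l] holds F_l(i+1): Forward values for every position."""
    F = [{l: START[l] * EMIT[l][x[0]] for l in STATES}]
    for c in x[1:]:
        prev = F[-1]
        F.append({l: EMIT[l][c] * sum(prev[k] * TRANS[k][l] for k in STATES)
                  for l in STATES})
    return F


def backward_table(x):
    """B[i][k] holds B_k(i+1): Backward values for every position."""
    B = [{k: END[k] for k in STATES}]  # B_k(L) = A_k0
    for c in reversed(x[1:]):
        nxt = B[0]
        B.insert(0, {k: sum(TRANS[k][l] * EMIT[l][c] * nxt[l] for l in STATES)
                     for k in STATES})
    return B


def posterior(x):
    """P(pi_i = l | x) = F_l(i) * B_l(i) / P(x) for every position and state."""
    F, B = forward_table(x), backward_table(x)
    p_x = sum(F[-1][k] * END[k] for k in STATES)
    return [{l: F[i][l] * B[i][l] / p_x for l in STATES} for i in range(len(x))]


for base, p in zip("CGCGATATAT", posterior("CGCGATATAT")):
    print(base, round(p["+"], 3))  # posterior probability of state "+"
```

Each position gets its own probability, without choosing a window size.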
Sliding Window
$P(\pi_i=l \mid x)$
Free From The Sliding Window of Arbitrary Size!!!!
$P(\pi_i=l \mid x)$
Posterior Decoding is Less Sensitive to the Parameterisation of the model.
Training HMMs
Training HMMs?
Case 1 - Set of annotated data
Parameters can be estimated on this data, where the PATH is known.
Case 2 - NO annotated data, only a model
-Parameterise the model so that $P(\text{Model} \mid \text{data})$ is maximal
-Start with random parameters
-Iterate using Baum-Welch (an EM algorithm) or Viterbi training
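Case 1 amounts to counting. A minimal Python sketch, assuming a toy annotated sequence; the `pseudocount` parameter is an addition of this sketch (not from the slide) that keeps unseen transitions and emissions from getting probability 0, and begin/end transitions are left out for brevity.

```python
from collections import Counter


def estimate_from_annotation(x, path, states, alphabet, pseudocount=1.0):
    """Estimate A_kl and E_k(b) by counting, when the state path is known.

    With pseudocount=0, every state must occur in the path and have a
    successor, otherwise a row would be divided by zero.
    """
    trans_counts = Counter(zip(path, path[1:]))  # (k, l) -> number of k->l steps
    emit_counts = Counter(zip(path, x))          # (k, b) -> times k emitted b
    trans, emit = {}, {}
    for k in states:
        row = {l: trans_counts[(k, l)] + pseudocount for l in states}
        total = sum(row.values())
        trans[k] = {l: n / total for l, n in row.items()}
        erow = {b: emit_counts[(k, b)] + pseudocount for b in alphabet}
        etotal = sum(erow.values())
        emit[k] = {b: n / etotal for b, n in erow.items()}
    return trans, emit


# Hypothetical annotated sequence: "+" = CpG island, "-" = background.
trans, emit = estimate_from_annotation("CGCGATAT", "++++----", "+-", "ACGT")
print(trans)
print(emit)
```

Case 2 has no path to count on; Baum-Welch replaces these counts with expected counts computed from the Forward and Backward tables.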
Training HMMs?
Difficult !!!!
What Matters About
Hidden Markov Models
HMM and Markov Chains
Bayes Theorem
-Markov Chain: When There is no Hidden State
-Hidden Markov Models: When a Nucleotide can be in different HIDDEN states
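For a plain Markov chain the path is the sequence itself, so $P(x)$ is a single product with no sum or max over hidden paths. A minimal Python sketch with a hypothetical nucleotide chain (made-up numbers in which C->G is rare, mimicking CpG depletion):

```python
import math

BASES = "ACGT"
# Hypothetical first-order chain; all numbers are made up for illustration.
INIT = {b: 0.25 for b in BASES}
TRANS = {a: {b: 0.25 for b in BASES} for a in BASES}
TRANS["C"] = {"A": 0.3, "C": 0.3, "G": 0.1, "T": 0.3}  # C->G is rare


def markov_chain_log_prob(x):
    """log P(x) = log P(x_1) + sum_i log A_{x_{i-1} x_i}.

    The states ARE the nucleotides, so there is a single path and
    nothing to sum or maximise over (unlike an HMM)."""
    logp = math.log(INIT[x[0]])
    for a, b in zip(x, x[1:]):
        logp += math.log(TRANS[a][b])
    return logp


print(markov_chain_log_prob("ACGT"))
```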
Three Algorithms for HMMs
Viterbi:
-Make the state assignments
-Predict
Forward: evaluate the sequence probability under the considered model
Backward and Posterior Decoding:
-Evaluate the probability of each prediction
-Window-free
Applications of HMMs
What To Do with an HMM?
Transmembrane domain predictions
www.cbs.dtu.dk/services/TMHMM/
What To Do with an HMM?
RNA structure Prediction/Fold Recognition
SCFG: Stochastic Context-Free Grammars (Sean Eddy)
What To Do with an HMM?
Gene Prediction
State-of-the-art gene predictors use HMMs
GeneMark: Prokaryotes
GenScan: Eukaryotes
GeneMark
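A Typical HMM for Coding DNA
[Figure: a chain of amino-acid states from S (start) to E (end); each state can emit one of the 64 codons. Example emission probabilities: state G (glycine): GGG 0.02, GGA 0.00, GGT 0.6, GGC 0.38; state W (tryptophan): TGG 1.00.]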
A Typical HMM for Coding DNA
Emission: codon frequency
Transition: dipeptide frequency
GeneMark HMM
Order-5 HMM: the 6th nucleotide depends on the 5 previous ones
$P(\text{GGG TGG} \mid \text{model}) = P(\text{GGG}) \cdot P(\text{GGG} \to \text{TGG}) \cdot P(\text{TGG})$
Takes into account codon bias AND dipeptide composition
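The slide's formula can be checked by hand with a few lines of Python. A sketch only: the codon emission values come from the slide, but the G->W dipeptide transition (0.05) is a hypothetical number, and the model works at the codon level rather than reproducing GeneMark's order-5 nucleotide model.

```python
# Codon emission probabilities copied from the slide; amino-acid states use
# one-letter codes (G = glycine, W = tryptophan).
EMIT = {
    "G": {"GGG": 0.02, "GGA": 0.00, "GGT": 0.6, "GGC": 0.38},
    "W": {"TGG": 1.00},
}
# Dipeptide transition probability: HYPOTHETICAL value, not given on the slide.
TRANS = {("G", "W"): 0.05}


def coding_prob(codons, amino_acids):
    """P(codons | model) along a known amino-acid path:
    emission * transition * emission * ... (start/end transitions ignored,
    as in the slide's formula)."""
    p = EMIT[amino_acids[0]][codons[0]]
    for i in range(1, len(codons)):
        p *= TRANS[(amino_acids[i - 1], amino_acids[i])]  # dipeptide term
        p *= EMIT[amino_acids[i]][codons[i]]              # codon-bias term
    return p


print(coding_prob(["GGG", "TGG"], ["G", "W"]))  # 0.02 * 0.05 * 1.00
```

Swapping GGG for the preferred glycine codon GGT raises the score 30-fold (0.6 vs 0.02), which is how codon bias enters the prediction.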
What To Do with an HMM?
Family and Domain Identification
Pfam, SMART, Prosite Profiles
What To Do with an HMM?
Bayesian Phylogenetic Inference
[Figure: example phylogenetic tree with leaves chite, wheat, trybr, mouse]
morphbank.ebc.uu.se/mrbayes/manual.php
What To Do with an HMM?
Metabolic Networks: Bayesian Networks
www.cs.huji.ac.il/~nirf/
Collections of
Domain HMMs
What is a Domain HMM ?
SAM, HMMER, PFtools
Emission Proba
Using Domain HMMs
Question: I want to Compare my HMM with all the sequences in SwissProt
Very Similar to Dynamic Programming
Requires an adapted Viterbi: Pair-HMM
Using Domain HMMs
Question: What are the Available Collections of Pre-computed HMMs?
InterPro unites many collections
InterPro: The Idea of Domains
InterPro: A Federation of Databases
Using InterPro: Asking a question
Which Domains does the oncogene FosB contain?
Finding Domains
-How can I be sure that the domain Prediction of my Protein is real ?
Use the EMBnet pfscan
Using EMBNet PFscan
Posterior Decoding With EMBNet PFscan
Important Position that is Well conserved in our sequence
[Figure: Posterior vs Prior]
The Inside of Pfam
A Typical Pfam Domain
HMMER Package:
Going Further: Building and Using
HMMs
HMMer2: hmmer.wustl.edu (used to create and distribute Pfam)
PFtools: www.isrec.isb-sib.ch/ftp-server/pftools/ (used to create and distribute Prosite)
SAM T02: www.cse.ucsc.edu/research/compbio/sam.html
EMBOSS Online
www.hgmp.mrc.ac.uk/SOFTWARE/EMBOSS
Jemboss: a Java applet interacting with an EMBOSS server
HMMer
EMBASSY (HMMer)
In The End: Markov Uncovered
HMM and Markov Chains
Domain Collections
Gene Prediction
Bayesian Phylogenetic Inference
HMM and Markov Chains
Domain Collections
Profile HMMs, Generalized Profiles
Interactive Tools