Hidden Markov Models
with slides from Lise Getoor, Sebastian Thrun, William Cohen, and Yair Weiss
Outline
- Markov Models
- Hidden Markov Models
- The Main Problems in HMMs
- Context
- Implementation Issues
- Applications of HMMs
Weather: A Markov Model
[Figure: a three-state Markov chain over Sunny, Rainy, Snowy. Sunny→Sunny 80%, Sunny→Rainy 15%, Sunny→Snowy 5%; Rainy→Sunny 38%, Rainy→Rainy 60%, Rainy→Snowy 2%; Snowy→Sunny 75%, Snowy→Rainy 5%, Snowy→Snowy 20%.]
Ingredients of a Markov Model
- States: S = {S_1, S_2, ..., S_N}
- State transition probabilities: a_ij = P(q_{t+1} = S_j | q_t = S_i)
- Initial state distribution: π_i = P[q_1 = S_i]

[Figure: the weather chain from the previous slide.]
Ingredients of Our Markov Model
- States: S = {S_sunny, S_rainy, S_snowy}
- Initial state distribution: π = (0.7, 0.25, 0.05)
- State transition probabilities (rows = current state, columns = next state, ordered sunny, rainy, snowy):

      A = | 0.80  0.15  0.05 |
          | 0.38  0.60  0.02 |
          | 0.75  0.05  0.20 |

[Figure: the weather chain, as before.]
Probability of a Seq. of States
Given the model above (π and A), what is the probability of the state sequence (sunny, rainy, rainy, rainy, snowy, snowy)?

P(seq) = P(S_sunny) P(S_rainy | S_sunny) P(S_rainy | S_rainy) P(S_rainy | S_rainy) P(S_snowy | S_rainy) P(S_snowy | S_snowy)
       = 0.7 · 0.15 · 0.6 · 0.6 · 0.02 · 0.2 = 0.0001512
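This chain-rule computation is easy to sketch in a few lines of Python (not part of the original deck; states are indexed 0 = sunny, 1 = rainy, 2 = snowy):

```python
# Probability of a state sequence under the weather Markov model from the slides.
pi = [0.7, 0.25, 0.05]
A = [[0.80, 0.15, 0.05],
     [0.38, 0.60, 0.02],
     [0.75, 0.05, 0.20]]

def seq_prob(states):
    """P(q_1, ..., q_T) = pi[q_1] * product of A[q_t][q_{t+1}]."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p

# sunny, rainy, rainy, rainy, snowy, snowy
print(seq_prob([0, 1, 1, 1, 2, 2]))  # ≈ 0.0001512
```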
Outline
- Markov Models
- Hidden Markov Models
- The Main Problems in HMMs
- Context
- Implementation Issues
- Applications of HMMs
Hidden Markov Models
[Figure: the weather Markov chain (hidden, NOT OBSERVABLE), with observation probabilities attached to each state: Sunny: shorts 60%, coat 30%, umbrella 10%; Rainy: shorts 5%, coat 30%, umbrella 65%; Snowy: shorts 0%, coat 50%, umbrella 50%.]
Ingredients of an HMM

- States: S = {S_1, S_2, ..., S_N}
- State transition probabilities: a_ij = P(q_{t+1} = S_j | q_t = S_i)  (prob. of moving from state i to state j)
- Initial state distribution: π_i = P[q_1 = S_i]
- Observations: {v_1, v_2, ..., v_M}
- Observation probabilities: b_j(k) = P(O_t = v_k | q_t = S_j)  (prob. of emitting output v_k in state j)

[Figure: the weather chain with its observation probabilities, as on the previous slide.]
Ingredients of Our HMM

- States: S = {S_sunny, S_rainy, S_snowy}
- Observations: O = {O_shorts, O_coat, O_umbrella}
- Initial state distribution: π = (0.7, 0.25, 0.05)
- State transition probabilities:

      A = | 0.80  0.15  0.05 |
          | 0.38  0.60  0.02 |
          | 0.75  0.05  0.20 |

- Observation probabilities (rows = states sunny, rainy, snowy; columns = shorts, coat, umbrella):

      B = | 0.60  0.30  0.10 |
          | 0.05  0.30  0.65 |
          | 0.00  0.50  0.50 |
Three Basic Problems
- Evaluation (aka likelihood): compute P(O | λ) for an HMM λ
- Decoding (aka inference): given an observed output sequence O, compute the most likely state at each time step, or the most likely state sequence q* = argmax_q P(q | O, λ)
- Training (aka learning): find λ* = argmax_λ P(O | λ)
Probability of an Output Sequence
Given the weather HMM (π, A, B above), what is the probability of this output sequence?

P(O) = P(O_coat, O_coat, O_umbrella, O_umbrella, ...)
     = Σ over all Q = (q_1, ..., q_7) of P(O | q_1, ..., q_7) P(q_1, ..., q_7)
     = 0.7 · 0.8 · 0.3 · 0.1 · 0.6 · ... + ...

Expanding the sum naively gives an exponential number of terms (N^T of them).
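For short sequences the naive sum over all state sequences can be evaluated by brute force. A sketch (not from the slides; states 0 = sunny, 1 = rainy, 2 = snowy; observations 0 = shorts, 1 = coat, 2 = umbrella; a length-4 sequence keeps the enumeration small):

```python
from itertools import product

# Brute-force evaluation: P(O) = sum over all state sequences Q of P(O|Q)P(Q).
pi = [0.7, 0.25, 0.05]
A = [[0.80, 0.15, 0.05], [0.38, 0.60, 0.02], [0.75, 0.05, 0.20]]
B = [[0.60, 0.30, 0.10], [0.05, 0.30, 0.65], [0.00, 0.50, 0.50]]

def brute_force_likelihood(obs):
    total = 0.0
    for q in product(range(3), repeat=len(obs)):  # 3**T state sequences
        p = pi[q[0]] * B[q[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[q[t - 1]][q[t]] * B[q[t]][obs[t]]
        total += p
    return total

print(brute_force_likelihood([1, 1, 2, 2]))  # P(coat, coat, umbrella, umbrella)
```

The forward algorithm on the next slide computes the same quantity in O(N^2 T) time instead of O(N^T).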
The Forward Algorithm
[Figure: the trellis, states S1, S2, S3 unrolled over time, with observations O1, O2, O3, ...]

α_t(i) = P(O_1, ..., O_t, q_t = S_i)

P(O) = P(O_1, ..., O_T) = Σ_{i=1}^N P(O_1, ..., O_T, q_T = S_i) = Σ_{i=1}^N α_T(i)
The Forward Algorithm (cont.)
[Figure: the same trellis.]

α_{t+1}(j) = P(O_1, ..., O_{t+1}, q_{t+1} = S_j)
           = Σ_{i=1}^N P(O_1, ..., O_{t+1}, q_t = S_i, q_{t+1} = S_j)
           = Σ_{i=1}^N P(O_{t+1}, q_{t+1} = S_j | O_1, ..., O_t, q_t = S_i) P(O_1, ..., O_t, q_t = S_i)
           = Σ_{i=1}^N P(O_{t+1}, q_{t+1} = S_j | q_t = S_i) α_t(i)
           = b_j(O_{t+1}) Σ_{i=1}^N a_ij α_t(i)

(first get to state i, then move to state j, then emit output O_{t+1})

Initialization: α_1(i) = π_i b_i(O_1)
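A minimal Python sketch of the forward algorithm on the weather HMM (model values from the slides; the code itself is an illustration, not part of the deck):

```python
# Forward algorithm: alpha[t][i] = P(O_1..O_t, q_t = S_i).
pi = [0.7, 0.25, 0.05]
A = [[0.80, 0.15, 0.05], [0.38, 0.60, 0.02], [0.75, 0.05, 0.20]]
B = [[0.60, 0.30, 0.10], [0.05, 0.30, 0.65], [0.00, 0.50, 0.50]]

def forward(obs):
    alpha = [[pi[i] * B[i][obs[0]] for i in range(3)]]       # alpha_1(i)
    for t in range(1, len(obs)):
        alpha.append([B[j][obs[t]] * sum(alpha[-1][i] * A[i][j] for i in range(3))
                      for j in range(3)])                    # recursion step
    return alpha

def likelihood(obs):
    return sum(forward(obs)[-1])                             # P(O) = sum_i alpha_T(i)

print(likelihood([1, 1, 2, 2]))  # P(coat, coat, umbrella, umbrella)
```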
Exercise
What is the probability of observing AB?

[Figure: a two-state HMM with states s1 and s2. Transitions: a_11 = 0.4, a_12 = 0.6, a_21 = 0.7, a_22 = 0.3. Emissions: b_1(A) = 0.2, b_1(B) = 0.8; b_2(A) = 0.3, b_2(B) = 0.7.]

a. Initial state s1:
   P(AB) = b_1(A) · (a_11 · b_1(B) + a_12 · b_2(B)) = 0.2 · (0.4 · 0.8 + 0.6 · 0.7) = 0.148

b. Initial state chosen at random:
   P(AB) = 0.5 · 0.148 + 0.5 · 0.3 · (0.3 · 0.7 + 0.7 · 0.8) = 0.1895
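The exercise can be checked with a generic forward pass. A sketch, using the parameter values as read off the slide (the exact assignment of numbers to a_ij and b_i is inferred from the stated answers, so treat it as a reconstruction):

```python
# Generic forward pass; pi, A, B are lists/matrices, obs is a list of symbol indices.
def forward_prob(pi, A, B, obs):
    alpha = [pi[i] * B[i][obs[0]] for i in range(len(pi))]
    for o in obs[1:]:
        alpha = [B[j][o] * sum(alpha[i] * A[i][j] for i in range(len(pi)))
                 for j in range(len(pi))]
    return sum(alpha)

A2 = [[0.4, 0.6], [0.7, 0.3]]   # rows: from s1, from s2
B2 = [[0.2, 0.8], [0.3, 0.7]]   # rows: s1, s2; columns: A, B
AB = [0, 1]                     # the observation sequence "AB"

print(forward_prob([1.0, 0.0], A2, B2, AB))  # part a: 0.148
print(forward_prob([0.5, 0.5], A2, B2, AB))  # part b: 0.1895
```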
The Backward Algorithm
[Figure: the trellis again.]

β_t(i) = P(O_{t+1}, O_{t+2}, ..., O_T | q_t = S_i)

β_T(i) = 1
β_t(i) = Σ_{j=1}^N a_ij b_j(O_{t+1}) β_{t+1}(j)

P(O) = Σ_{i=1}^N π_i b_i(O_1) β_1(i)
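The backward pass mirrors the forward pass, filling the trellis from right to left. A sketch for the weather HMM (illustration only, not from the deck):

```python
# Backward algorithm: beta[t][i] = P(O_{t+1}..O_T | q_t = S_i).
pi = [0.7, 0.25, 0.05]
A = [[0.80, 0.15, 0.05], [0.38, 0.60, 0.02], [0.75, 0.05, 0.20]]
B = [[0.60, 0.30, 0.10], [0.05, 0.30, 0.65], [0.00, 0.50, 0.50]]

def backward(obs):
    beta = [[1.0] * 3]                                   # beta_T(i) = 1
    for t in range(len(obs) - 2, -1, -1):
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * beta[0][j] for j in range(3))
                        for i in range(3)])              # recursion, right to left
    return beta

def likelihood_backward(obs):
    b1 = backward(obs)[0]
    return sum(pi[i] * B[i][obs[0]] * b1[i] for i in range(3))  # P(O)
```

Run on the same observation sequence, `likelihood_backward` agrees with the forward algorithm's P(O).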
The Forward-Backward Algorithm
[Figure: the trellis, with α_t(i) computed from the left and β_t(i) from the right.]

P(O) = Σ_i α_t(i) β_t(i)   for any t

=> from this identity you can derive the formulas for both the forward and the backward algorithm.
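The identity is easy to verify numerically on the weather HMM. A self-contained sketch (it repeats the two passes so it can run on its own; not part of the deck):

```python
# Check that sum_i alpha_t(i) * beta_t(i) gives the same P(O) at every t.
pi = [0.7, 0.25, 0.05]
A = [[0.80, 0.15, 0.05], [0.38, 0.60, 0.02], [0.75, 0.05, 0.20]]
B = [[0.60, 0.30, 0.10], [0.05, 0.30, 0.65], [0.00, 0.50, 0.50]]

def forward(obs):
    al = [[pi[i] * B[i][obs[0]] for i in range(3)]]
    for t in range(1, len(obs)):
        al.append([B[j][obs[t]] * sum(al[-1][i] * A[i][j] for i in range(3))
                   for j in range(3)])
    return al

def backward(obs):
    be = [[1.0] * 3]
    for t in range(len(obs) - 2, -1, -1):
        be.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * be[0][j] for j in range(3))
                      for i in range(3)])
    return be

obs = [1, 2, 0, 1]                     # coat, umbrella, shorts, coat
al, be = forward(obs), backward(obs)
per_t = [sum(al[t][i] * be[t][i] for i in range(3)) for t in range(len(obs))]
print(per_t)                           # the same value at every t
```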
Finding the Best State Sequence

We would like to find the most likely path (and not just the most likely state at each time slice).

The Viterbi algorithm is an efficient method for finding the MPE (most probable explanation):

Q* = argmax_Q P(Q | O) = argmax_Q P(Q, O)

δ_1(i) = π_i b_i(O_1)
δ_{t+1}(j) = b_j(O_{t+1}) · max_i δ_t(i) a_ij
ψ_{t+1}(j) = argmax_i δ_t(i) a_ij

P(Q*) = max_i δ_T(i),  q*_T = argmax_i δ_T(i)

and we backtrack to reconstruct the path: q*_t = ψ_{t+1}(q*_{t+1}).
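A Viterbi sketch in Python for the weather HMM (states 0 = sunny, 1 = rainy, 2 = snowy; observations 0 = shorts, 1 = coat, 2 = umbrella; an illustration, not from the deck):

```python
# Viterbi: most likely state sequence for an observation sequence.
pi = [0.7, 0.25, 0.05]
A = [[0.80, 0.15, 0.05], [0.38, 0.60, 0.02], [0.75, 0.05, 0.20]]
B = [[0.60, 0.30, 0.10], [0.05, 0.30, 0.65], [0.00, 0.50, 0.50]]

def viterbi(obs):
    delta = [pi[i] * B[i][obs[0]] for i in range(3)]     # delta_1(i)
    psi = []                                             # backpointers
    for o in obs[1:]:
        best = [max(range(3), key=lambda i: delta[i] * A[i][j]) for j in range(3)]
        delta = [B[j][o] * delta[best[j]] * A[best[j]][j] for j in range(3)]
        psi.append(best)
    q = [max(range(3), key=lambda i: delta[i])]          # q*_T
    for back in reversed(psi):                           # backtrack
        q.insert(0, back[q[0]])
    return q

print(viterbi([2, 2, 2]))  # three umbrella days
```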
Hidden Markov Models
[Figure: the weather HMM again: the hidden weather chain (NOT OBSERVABLE) with the observation probabilities for shorts, coat, and umbrella, as before.]
Learning the Model with EM
Problem: find the HMM λ that makes the data most likely.

- E-Step: compute P(q_t = S_i | O, λ) for the given λ
- M-Step: compute a new λ under these expectations (this is now an ordinary Markov-model estimation problem)
E-Step
Calculate

γ_t(i) = P(q_t = S_i | O, λ)
ξ_t(i, j) = P(q_t = S_i, q_{t+1} = S_j | O, λ)

using the forward-backward algorithm, for a fixed model λ.
The M Step: generate λ' = (π, a, b)

π_i = expected number of times in state S_i at time 1 = γ_1(i)

a_ij = (expected number of transitions from S_i to S_j) / (expected number of transitions from S_i)
     = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)

b_i(k) = (expected number of times in S_i observing v_k) / (expected number of times in S_i)
       = Σ_{t: O_t = v_k} γ_t(i) / Σ_{t=1}^T γ_t(i)
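A single Baum-Welch iteration fits in a page of Python. A sketch for one observation sequence, using the weather HMM as the starting model (the observation sequence and function names are my own, not the slides'):

```python
# One EM (Baum-Welch) iteration: E step computes gamma and xi from the
# forward/backward passes, M step re-estimates (pi, A, B) by expected counts.
pi = [0.7, 0.25, 0.05]
A = [[0.80, 0.15, 0.05], [0.38, 0.60, 0.02], [0.75, 0.05, 0.20]]
B = [[0.60, 0.30, 0.10], [0.05, 0.30, 0.65], [0.00, 0.50, 0.50]]
N, M = 3, 3

def forward(obs):
    al = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        al.append([B[j][obs[t]] * sum(al[-1][i] * A[i][j] for i in range(N))
                   for j in range(N)])
    return al

def backward(obs):
    be = [[1.0] * N]
    for t in range(len(obs) - 2, -1, -1):
        be.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * be[0][j] for j in range(N))
                      for i in range(N)])
    return be

def baum_welch_step(obs):
    T = len(obs)
    al, be = forward(obs), backward(obs)
    PO = sum(al[T - 1][i] for i in range(N))
    # E step
    gamma = [[al[t][i] * be[t][i] / PO for i in range(N)] for t in range(T)]
    xi = [[[al[t][i] * A[i][j] * B[j][obs[t + 1]] * be[t + 1][j] / PO
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # M step
    new_pi = gamma[0]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(N)]
             for i in range(N)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T)) for k in range(M)]
             for i in range(N)]
    return new_pi, new_A, new_B

new_pi, new_A, new_B = baum_welch_step([1, 2, 1, 2, 2])  # coat/umbrella days
```

The chosen sequence avoids the shorts symbol, so no state's expected count is zero; a production implementation would also need scaling and smoothing.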
Understanding the EM Algorithm
The best way to understand the EM algorithm: start with the M step and understand what quantities it needs; then look at the E step to see how it computes those quantities with the help of the forward-backward algorithm.
Summary (Learning)
- Given observation sequence O
- Guess an initial model λ
- Iterate:
  - Calculate expected time spent in state S_i at time t (and transitions from S_i to S_j) using the forward-backward algorithm
  - Find the new model λ by frequency counts
Implementing HMM Algorithms
Quantities get very small for long sequences. Taking logarithms helps in the Viterbi algorithm and in computing the alphas and betas, but is not helpful in computing the gammas. The normalization (scaling) method helps with all of these problems; see the note by ChengXiang Zhai.
Problems with HMMs
- Zero probabilities. Training sequence: AAABBBAAA; test sequence: AAABBBCAAA (the unseen symbol C gets probability zero)
- Finding the "right" number of states and the right structure
- Numerical instabilities
Outline
- Markov Models
- Hidden Markov Models
- The Main Problems in HMMs
- Context
- Implementation Issues
- Applications of HMMs
Three Problems
- What bird is this? (time series classification)
- How will the song continue? (time series prediction)
- Is this bird abnormal? (outlier detection)
Time Series Classification
Train one HMM λ_l for each bird l. Given a time series O, calculate

P(bird l | O) = P(O | λ_l) P(l) / Σ_{l'} P(O | λ_{l'}) P(l')
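The Bayes-rule classification can be sketched with two small per-class HMMs. The "warbler" and "sparrow" models below are made up for illustration (uniform priors, two states, two symbols); only the formula comes from the slide:

```python
# Classify a time series by comparing P(O | lambda_l) P(l) across per-class HMMs.
def forward_prob(pi, A, B, obs):
    alpha = [pi[i] * B[i][obs[0]] for i in range(len(pi))]
    for o in obs[1:]:
        alpha = [B[j][o] * sum(alpha[i] * A[i][j] for i in range(len(pi)))
                 for j in range(len(pi))]
    return sum(alpha)

models = {  # hypothetical birds: (pi, A, B) each
    "warbler": ([0.9, 0.1], [[0.8, 0.2], [0.3, 0.7]], [[0.9, 0.1], [0.2, 0.8]]),
    "sparrow": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.4, 0.6], [0.6, 0.4]]),
}

def classify(obs, prior=0.5):
    like = {l: forward_prob(*m, obs) * prior for l, m in models.items()}
    z = sum(like.values())
    return {l: p / z for l, p in like.items()}  # posterior P(bird l | O)

posterior = classify([0, 0, 0, 0])
print(posterior)
```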
Outlier Detection
Train an HMM λ. Given a time series O, calculate the probability P(O | λ). If it is abnormally low, raise a flag.
Time Series Prediction
Train an HMM λ. Given a time series O, calculate the distribution over the final state via P(q_T = S_i | O, λ), then 'hallucinate' new states and observations according to a and b.
Typical HMM in Speech Recognition
20-dimensional frequency space, clustered using EM
Use Bayes rule + Viterbi for classification
Linear HMM representing one phoneme
[Rabiner 86] + everyone else
Typical HMM in Robotics
[Blake/Isard 98, Fox/Dellaert et al 99]
IE with Hidden Markov Models
Given a sequence of observations, e.g.:

  Yesterday Pedro Domingos spoke this example sentence.

and a trained HMM (states: person name, location name, background), find the most likely state sequence (Viterbi):

  s* = argmax_s P(s, o)

Any words said to be generated by the designated "person name" state are extracted as a person name:

  Person name: Pedro Domingos
HMM for Segmentation
Simplest Model: One state per entity type
What is a "symbol"?

Cohen => "Cohen", "cohen", "Xxxxx", "Xx", ... ?
4601 => "4601", "9999", "9+", "number", ... ?

[Figure: an abstraction hierarchy over symbols. All splits into Numbers (3-digits 000..999, 5-digits 00000..99999, others 0..99, 0000..9999, 000000..), Chars (A..z), Multi-letter words (aa..), and Delimiters (. , / - + ? #).]

Datamold: choose the best abstraction level using a holdout set.
HMM Example: “Nymble”
[Bikel et al. 1998], [BBN "IdentiFinder"]

Task: Named Entity Extraction. Train on ~500k words of news wire text.

States: Person, Org, five other name classes, and Other, plus start-of-sentence and end-of-sentence.

Transition probabilities: P(s_t | s_{t-1}, o_{t-1}), backing off to P(s_t | s_{t-1}), then to P(s_t).
Observation probabilities: P(o_t | s_t, s_{t-1}) or P(o_t | s_t, o_{t-1}), backing off to P(o_t | s_t), then to P(o_t).

Results:

  Case   Language  F1
  Mixed  English   93%
  Upper  English   91%
  Mixed  Spanish   90%

Other examples of shrinkage for HMMs in IE: [Freitag and McCallum '99]
Passage Selection (e.g., for IR)
[Figure: a document plus a query, with collection information, split into relevant passages and background passages.]

How is a relevant passage different from a background passage in terms of language modeling?
HMMs: Main Lessons
HMMs: Generative probabilistic models of time series (with hidden state)
Forward-Backward: Algorithm for computing probabilities over hidden states
Learning models: EM, iterates estimation of hidden state and model fitting
Extremely practical, best known methods in speech, computer vision, robotics, …
Numerous extensions exist (continuous observations, states; factorial HMMs, controllable HMMs=POMDPs, …)