CSE 586, Spring 2011, Advanced Computer Vision
Robert Collins
Graphical Models: Hidden Markov Models
Hidden Markov Models
Note: a good background reference is LR Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proc. of the IEEE, Vol.77, No.2, pp.257-286, 1989.
Markov Chain
Set of states, e.g. {S1,S2,S3,S4,S5}
Table of transition probabilities (aij = prob of going from state Si to Sj)
a11 a12 a13 a14 a15
a21 a22 a23 a24 a25
a31 a32 a33 a34 a35
a41 a42 a43 a44 a45
a51 a52 a53 a54 a55
P(Sj | Si)
Prob of starting in each state, e.g. π1, π2, π3, π4, π5
P(Si)
Computation: prob of an ordered sequence S2 S1 S5 S4
P(S2) P(S1 | S2) P(S5 | S1) P(S4 | S5) = π2 a21 a15 a54
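The chain probability is just the start probability times a product of transition entries. A minimal sketch, with made-up numbers for π and A (the slides leave them unspecified):

```python
import numpy as np

# Hypothetical numbers for illustration; the slides do not specify them.
# pi[i] = prob of starting in state Si; A[i, j] = a_ij = P(Sj | Si).
pi = np.array([0.2, 0.3, 0.1, 0.3, 0.1])
A = np.full((5, 5), 0.2)   # uniform transitions; each row sums to 1

def chain_prob(states, pi, A):
    """P(start state) times the product of transition probs along the chain."""
    p = pi[states[0]]
    for s_prev, s_next in zip(states, states[1:]):
        p *= A[s_prev, s_next]
    return p

# Sequence S2 S1 S5 S4 (0-based indices): pi_2 * a_21 * a_15 * a_54
print(chain_prob([1, 0, 4, 3], pi, A))
```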
Note: this picture is a state transition diagram, not a graphical model!
Hidden Markov Model
Set of states, e.g. {S1,S2,S3,S4,S5}
Table of transition probabilities (aij = prob of going from state Si to Sj)
a11 a12 a13 a14 a15
a21 a22 a23 a24 a25
a31 a32 a33 a34 a35
a41 a42 a43 a44 a45
a51 a52 a53 a54 a55
P(Sj | Si)
Prob of starting in each state, e.g. π1, π2, π3, π4, π5
P(Si)
Set of observable symbols, e.g. {v1,v2,v3} (each of the five states can emit symbols from this set)
Prob of seeing each symbol in a given state (bik = prob of seeing vk in state Si)
b11 b12 b13
b21 b22 b23
b31 b32 b33
b41 b42 b43
b51 b52 b53
P(vk | Si)
Hidden Markov Model
Computation: Prob of ordered sequence (S2,v3) (S1,v1) (S5,v2) (S4,v1)
P(S2,S1,S5,S4,v3,v1,v2,v1) = P(S2) P(v3|S2) P(S1|S2) P(v1|S1) P(S5|S1) P(v2|S5) P(S4|S5) P(v1|S4) = π2 b23 a21 b11 a15 b52 a54 b41
= (π2 a21 a15 a54) (b23 b11 b52 b41) = P(S2,S1,S5,S4) P(v3,v1,v2,v1 | S2,S1,S5,S4)
HMM as Graphical Model
Given an HMM with states S1,S2,...,Sn, symbols v1,v2,...,vk, transition probs A={aij}, initial probs π={πi}, and observation probs B={bik}:

Let state random variables be X1,X2,X3,X4,... with Xt being the state at time t. Allowable values of Xt (a discrete random variable) are {S1,S2,...,Sn}.

Let observed random variables be O1,O2,O3,... with Ot observed at time t. Allowable values of Ot (a discrete random variable) are {v1,v2,...,vk}.

    X1 --A--> X2 --A--> X3 --A--> X4     (markov chain)
    |         |         |         |
    B         B         B         B
    v         v         v         v
    O1        O2        O3        O4     (observations)
Oi are observed variables. Xj are hidden (latent) variables.
Verify: what is P(X1,X2,X3,X4,O1,O2,O3,O4)?
P(X1,X2,X3,X4,O1,O2,O3,O4) = P(X1) P(O1|X1) P(X2|X1) P(O2|X2) P(X3|X2) P(O3|X3) P(X4|X3) P(O4|X4)
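This factorization is straightforward to evaluate directly. A small sketch with assumed toy values for π, A, and B (not from the slides):

```python
import numpy as np

# Assumed toy parameters (5 states, 3 symbols); not from the slides.
pi = np.array([0.2, 0.3, 0.1, 0.3, 0.1])   # pi[i] = P(X1 = Si)
A = np.full((5, 5), 0.2)                   # A[i, j] = P(Sj | Si)
B = np.array([[0.5, 0.3, 0.2]] * 5)        # B[i, k] = P(vk | Si)

def joint_prob(states, obs, pi, A, B):
    """P(X1..XT, O1..OT) following the chain-rule factorization above."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return p

# (S2,v3) (S1,v1) (S5,v2) (S4,v1): pi_2 b_23 a_21 b_11 a_15 b_52 a_54 b_41
print(joint_prob([1, 0, 4, 3], [2, 0, 1, 0], pi, A, B))
```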
Three Computations for HMMs
HMM Problem 1
What is the likelihood of observing a sequence, e.g. O1,O2,O3 ?
Note: there are multiple ways that a given sequence could be emitted, involving different sequences of hidden states X1,X2,X3.
One (inefficient) way to compute the answer: generate all sequences of three states Sx,Sy,Sz and compute P(Sx,Sy,Sz,O1,O2,O3) [which we know how to do]. Summing up over all sequences of three states gives us our answer.
Drawback: there are N^3 such sequences (and in general N^T for an observation sequence of length T) that can be formed from N states.
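The brute-force sum can be sketched as follows, with an assumed toy two-state, two-symbol model for illustration:

```python
import itertools
import numpy as np

# Assumed toy two-state, two-symbol model (not from the slides).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])     # A[i, j] = P(Sj | Si)
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])     # B[i, k] = P(vk | Si)

def brute_force_likelihood(obs, pi, A, B):
    """Sum the joint over all N**T hidden-state sequences: O(T * N**T)."""
    N, T = len(pi), len(obs)
    total = 0.0
    for states in itertools.product(range(N), repeat=T):
        p = pi[states[0]] * B[states[0], obs[0]]
        for t in range(1, T):
            p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
        total += p
    return total

print(brute_force_likelihood([0, 1], pi, A, B))
```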
HMM Problem 1
Better (efficient) solution is based on message passing / belief propagation: compute the forward messages α_t(j) = P(O1,...,Ot, Xt=Sj) recursively, and the marginal likelihood is then P(O1,...,OT) = Σ_j α_T(j).
This is the sum-product procedure of message passing. It is called the forward-backward procedure in HMM terminology.
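A minimal sketch of the forward pass (the α recursion), again with an assumed toy two-state model:

```python
import numpy as np

# Assumed toy two-state, two-symbol model (not from the slides).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def forward_likelihood(obs, pi, A, B):
    """alpha_t(j) = P(O1..Ot, Xt=Sj); likelihood = sum_j alpha_T(j). O(T*N^2)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # sum over previous state, then emit
    return alpha.sum()

print(forward_likelihood([0, 1], pi, A, B))
```

This gives the same answer as the exponential brute-force sum, but in time linear in the sequence length.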
HMM Problem 2
What is the most likely sequence of hidden states X1,X2,X3 given that we have observed O1,O2,O3?
Again, there is an inefficient way based on explicitly generating the exponential number of possible state sequences... AND, there is a more efficient way using message passing.
HMM Problem 2
We want the argmax over X1,...,XT of P(X1,...,XT, O1,...,OT): replace the sums of the forward recursion with max.
Define: δ_t(j) = max over X1,...,X(t-1) of P(X1,...,X(t-1), Xt=Sj, O1,...,Ot)
Then: δ_t(j) = [ max_i δ_(t-1)(i) a_ij ] b_j(Ot), and the best final state is argmax_j δ_T(j); backtracking recovers the full sequence.
This is the max-product procedure of message passing. It is called the Viterbi algorithm in HMM terminology
Viterbi, under the hood
[State-space trellis figure: one column of nodes S1..S4 for each of X1..X4, a source node B before the first column, a sink node E after the last column, and observations O1 O2 O3 O4 attached to the columns.]
1) Build a state-space trellis. 2) Add a source and sink node. 3) Given observations, 4) assign weights to edges based on the observations.
Edge weights: the edge B → (X1=Si) is weighted P(Si)P(O1|Si); the edge (Xt=Si) → (Xt+1=Sj) is weighted P(Sj|Si)P(Ot+1|Sj); edges into E have weight 1.
Do multistage dynamic programming to find the max product path.
(note: in some implementations, each edge is weighted by -log(wi) of the weights shown here, so that standard minimum-length-path DP can be performed.)
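The trellis DP above can be sketched as follows, using an assumed toy two-state, two-symbol model (the backpointer array plays the role of remembering which incoming edge was best at each node):

```python
import numpy as np

# Assumed toy two-state, two-symbol model (not from the slides).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def viterbi(obs, pi, A, B):
    """Multistage DP over the trellis: max-product with backpointers."""
    delta = pi * B[:, obs[0]]          # weights of B -> (X1 = Si) edges
    backptr = []
    for o in obs[1:]:
        scores = delta[:, None] * A    # scores[i, j]: extend path ending at Si to Sj
        backptr.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * B[:, o]
    path = [int(delta.argmax())]       # best last state
    for bp in reversed(backptr):       # trace backpointers to recover the path
        path.append(int(bp[path[-1]]))
    return path[::-1]

print(viterbi([0, 0, 1], pi, A, B))
```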
HMM Problem 3
Given a training set of observation sequences {{O1,O2,...}, {O1,O2,...}, {O1,O2,...}},
how do we determine the transition probs aij, initial state probs πi, and observation probs bik?
This is a learning problem. It is the hardest problem of the three. We assume the topology of the HMM is known (number and connectivity of the states); otherwise it is even harder.
Two popular approaches:
• Segmental K-means algorithm
• Baum-Welch algorithm (EM-based)
Segmental K-Means
(Steve Mills, U.Nottingham)

Note: this is essentially a clustering problem!
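The clustering analogy can be made concrete with a rough sketch of the segmental K-means loop: hard-assign each observation to a state, re-estimate the parameters by counting, re-segment with Viterbi, and repeat. All parameters, the random initialization, and the smoothing constant `eps` are made up for illustration; the original slides (lost in this transcript) may differ in details:

```python
import numpy as np

rng = np.random.default_rng(0)

def viterbi(obs, pi, A, B):
    """Most likely state sequence for one observation sequence."""
    delta = pi * B[:, obs[0]]
    backptr = []
    for o in obs[1:]:
        scores = delta[:, None] * A
        backptr.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * B[:, o]
    path = [int(delta.argmax())]
    for bp in reversed(backptr):
        path.append(int(bp[path[-1]]))
    return path[::-1]

def reestimate(seqs, labels, N, K, eps=1e-3):
    """Count-based estimates of pi, A, B from hard state assignments
    (eps is a made-up smoothing constant to avoid zero probabilities)."""
    pi = np.full(N, eps)
    A = np.full((N, N), eps)
    B = np.full((N, K), eps)
    for obs, states in zip(seqs, labels):
        pi[states[0]] += 1
        for t in range(1, len(obs)):
            A[states[t - 1], states[t]] += 1
        for s, o in zip(states, obs):
            B[s, o] += 1
    return (pi / pi.sum(),
            A / A.sum(axis=1, keepdims=True),
            B / B.sum(axis=1, keepdims=True))

def segmental_kmeans(seqs, N, K, iters=20):
    """Alternate (1) histogram re-estimation from hard labels and
    (2) Viterbi re-segmentation -- the clustering analogy above."""
    labels = [list(rng.integers(0, N, size=len(obs))) for obs in seqs]
    for _ in range(iters):
        pi, A, B = reestimate(seqs, labels, N, K)
        labels = [viterbi(obs, pi, A, B) for obs in seqs]
    return pi, A, B

pi, A, B = segmental_kmeans([[0, 0, 1, 1, 0, 0], [1, 1, 0, 0, 1, 1]], N=2, K=2)
print(pi, A, B, sep="\n")
```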
Note: This part assumes continuous-valued observations are output. For a discrete set of observation symbols, we could just do histograms here.
Baum-Welch Algorithm
Same thing, but with dancing bears.
Basic idea is to use EM instead of K-means as the way of assessing ownership weights (fractional assignments of observations to states) before computing observation statistics at each state.
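The "fractional ownership" idea can be sketched as follows: the γ quantities from a forward-backward pass give each state a soft share of each observation, and B is re-estimated as a weighted histogram. All parameters are assumed toy values, and this shows only one piece (the E-step ownerships and the B update) of a full Baum-Welch iteration, not the complete algorithm:

```python
import numpy as np

# Assumed toy two-state, two-symbol model (not from the slides).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
obs = [0, 0, 1, 0]
T, N = len(obs), len(pi)

# Forward pass: alpha[t, i] = P(O1..Ot, Xt = Si)
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

# Backward pass: beta[t, i] = P(O(t+1)..OT | Xt = Si)
beta = np.ones((T, N))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

# gamma[t, i] = P(Xt = Si | O): the fractional ownership of observation t.
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)

# Soft re-estimate of B: a weighted histogram instead of hard Viterbi counts.
B_new = np.zeros_like(B)
for t, o in enumerate(obs):
    B_new[:, o] += gamma[t]
B_new /= B_new.sum(axis=1, keepdims=True)
print(B_new)
```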