CSE 586, Spring 2011, Advanced Computer Vision
Robert Collins
Graphical Models: Hidden Markov Models
Hidden Markov Models
Note: a good background reference is LR Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proc. of the IEEE, Vol.77, No.2, pp.257-286, 1989.
Markov Chain
Set of states, e.g. {S1,S2,S3,S4,S5}
Table of transition probabilities (aij = prob of going from state Si to Sj)
a11 a12 a13 a14 a15
a21 a22 a23 a24 a25
a31 a32 a33 a34 a35
a41 a42 a43 a44 a45
a51 a52 a53 a54 a55
P(Sj | Si)
Prob of starting in each state, e.g. π1, π2, π3, π4, π5
P(Si)
Computation: prob of an ordered sequence S2 S1 S5 S4
P(S2) P(S1 | S2) P(S5 | S1) P(S4 | S5) = π2 a21 a15 a54
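The chain probability is just the start probability times a product of transition entries. A minimal sketch, with made-up numbers for π and A (the slides leave them unspecified):

```python
import numpy as np

# Hypothetical numbers for illustration; the slides do not specify them.
# pi[i] = prob of starting in state Si; A[i, j] = a_ij = P(Sj | Si).
pi = np.array([0.2, 0.3, 0.1, 0.3, 0.1])
A = np.full((5, 5), 0.2)   # uniform transitions; each row sums to 1

def chain_prob(states, pi, A):
    """P(start state) times the product of transition probs along the chain."""
    p = pi[states[0]]
    for s_prev, s_next in zip(states, states[1:]):
        p *= A[s_prev, s_next]
    return p

# Sequence S2 S1 S5 S4 (0-based indices): pi_2 * a_21 * a_15 * a_54
print(chain_prob([1, 0, 4, 3], pi, A))
```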
Note: this picture is a state transition diagram, not a graphical model!
Hidden Markov Model
Set of states, e.g. {S1,S2,S3,S4,S5}
Table of transition probabilities (aij = prob of going from state Si to Sj)
a11 a12 a13 a14 a15
a21 a22 a23 a24 a25
a31 a32 a33 a34 a35
a41 a42 a43 a44 a45
a51 a52 a53 a54 a55
P(Sj | Si)
Prob of starting in each state, e.g. π1, π2, π3, π4, π5
P(Si)
Set of observable symbols, e.g. {v1,v2,v3} (each of the five states can emit symbols from this set)
Prob of seeing each symbol in a given state (bik = prob of seeing vk in state Si)
b11 b12 b13
b21 b22 b23
b31 b32 b33
b41 b42 b43
b51 b52 b53
P(vk | Si)
Hidden Markov Model
Computation: Prob of ordered sequence (S2,v3) (S1,v1) (S5,v2) (S4,v1)
P(S2,S1,S5,S4,v3,v1,v2,v1) = P(S2) P(v3|S2) P(S1|S2) P(v1|S1) P(S5|S1) P(v2|S5) P(S4|S5) P(v1|S4) = π2 b23 a21 b11 a15 b52 a54 b41
= (π2 a21 a15 a54) (b23 b11 b52 b41) = P(S2,S1,S5,S4) P(v3,v1,v2,v1 | S2,S1,S5,S4)
HMM as Graphical Model
Given an HMM with states S1,S2,...,Sn, symbols v1,v2,...,vk, transition probs A={aij}, initial probs π={πi}, and observation probs B={bik}:

Let state random variables be X1,X2,X3,X4,... with Xt being the state at time t. Allowable values of Xt (a discrete random variable) are {S1,S2,...,Sn}.

Let observed random variables be O1,O2,O3,... with Ot observed at time t. Allowable values of Ot (a discrete random variable) are {v1,v2,...,vk}.

    X1 --A--> X2 --A--> X3 --A--> X4     (markov chain)
    |         |         |         |
    B         B         B         B
    v         v         v         v
    O1        O2        O3        O4     (observations)
Oi are observed variables. Xj are hidden (latent) variables.
Verify: what is P(X1,X2,X3,X4,O1,O2,O3,O4)?
P(X1,X2,X3,X4,O1,O2,O3,O4) = P(X1) P(O1|X1) P(X2|X1) P(O2|X2) P(X3|X2) P(O3|X3) P(X4|X3) P(O4|X4)
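This factorization is straightforward to evaluate directly. A small sketch with assumed toy values for π, A, and B (not from the slides):

```python
import numpy as np

# Assumed toy parameters (5 states, 3 symbols); not from the slides.
pi = np.array([0.2, 0.3, 0.1, 0.3, 0.1])   # pi[i] = P(X1 = Si)
A = np.full((5, 5), 0.2)                   # A[i, j] = P(Sj | Si)
B = np.array([[0.5, 0.3, 0.2]] * 5)        # B[i, k] = P(vk | Si)

def joint_prob(states, obs, pi, A, B):
    """P(X1..XT, O1..OT) following the chain-rule factorization above."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return p

# (S2,v3) (S1,v1) (S5,v2) (S4,v1): pi_2 b_23 a_21 b_11 a_15 b_52 a_54 b_41
print(joint_prob([1, 0, 4, 3], [2, 0, 1, 0], pi, A, B))
```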
Three Computations for HMMs
HMM Problem 1
What is the likelihood of observing a sequence, e.g. O1,O2,O3 ?
Note: there are multiple ways that a given sequence could be emitted, involving different sequences of hidden states X1,X2,X3.
One (inefficient) way to compute the answer: generate all sequences of three states Sx,Sy,Sz and compute P(Sx,Sy,Sz,O1,O2,O3) [which we know how to do]. Summing up over all sequences of three states gives us our answer.
Drawback: there are N^3 such sequences (and in general N^T for an observation sequence of length T) that can be formed from N states.
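The brute-force sum can be sketched as follows, with an assumed toy two-state, two-symbol model for illustration:

```python
import itertools
import numpy as np

# Assumed toy two-state, two-symbol model (not from the slides).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])     # A[i, j] = P(Sj | Si)
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])     # B[i, k] = P(vk | Si)

def brute_force_likelihood(obs, pi, A, B):
    """Sum the joint over all N**T hidden-state sequences: O(T * N**T)."""
    N, T = len(pi), len(obs)
    total = 0.0
    for states in itertools.product(range(N), repeat=T):
        p = pi[states[0]] * B[states[0], obs[0]]
        for t in range(1, T):
            p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
        total += p
    return total

print(brute_force_likelihood([0, 1], pi, A, B))
```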
HMM Problem 1
Better (efficient) solution is based on message passing / belief propagation: compute the forward messages α_t(j) = P(O1,...,Ot, Xt=Sj) recursively, and the marginal likelihood is then P(O1,...,OT) = Σ_j α_T(j).
This is the sum-product procedure of message passing. It is called the forward-backward procedure in HMM terminology.
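A minimal sketch of the forward pass (the α recursion), again with an assumed toy two-state model:

```python
import numpy as np

# Assumed toy two-state, two-symbol model (not from the slides).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def forward_likelihood(obs, pi, A, B):
    """alpha_t(j) = P(O1..Ot, Xt=Sj); likelihood = sum_j alpha_T(j). O(T*N^2)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # sum over previous state, then emit
    return alpha.sum()

print(forward_likelihood([0, 1], pi, A, B))
```

This gives the same answer as the exponential brute-force sum, but in time linear in the sequence length.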
HMM Problem 2
What is the most likely sequence of hidden states X1,X2,X3 given that we have observed O1,O2,O3?
Again, there is an inefficient way based on explicitly generating the exponential number of possible state sequences... AND, there is a more efficient way using message passing.
HMM Problem 2
We want the argmax over X1,...,XT of P(X1,...,XT, O1,...,OT): replace the sums of the forward recursion with max.
Define: δ_t(j) = max over X1,...,X(t-1) of P(X1,...,X(t-1), Xt=Sj, O1,...,Ot)
Then: δ_t(j) = [ max_i δ_(t-1)(i) a_ij ] b_j(Ot), and the best final state is argmax_j δ_T(j); backtracking recovers the full sequence.
This is the max-product procedure of message passing. It is called the Viterbi algorithm in HMM terminology
Viterbi, under the hood
[State-space trellis figure: one column of nodes S1..S4 for each of X1..X4, a source node B before the first column, a sink node E after the last column, and observations O1 O2 O3 O4 attached to the columns.]
1) Build a state-space trellis. 2) Add a source and sink node. 3) Given observations, 4) assign weights to edges based on the observations.
Edge weights: the edge B → (X1=Si) is weighted P(Si)P(O1|Si); the edge (Xt=Si) → (Xt+1=Sj) is weighted P(Sj|Si)P(Ot+1|Sj); edges into E have weight 1.
Do multistage dynamic programming to find the max product path.
(note: in some implementations, each edge is weighted by -log(wi) of the weights shown here, so that standard minimum-length-path DP can be performed.)
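The trellis DP above can be sketched as follows, using an assumed toy two-state, two-symbol model (the backpointer array plays the role of remembering which incoming edge was best at each node):

```python
import numpy as np

# Assumed toy two-state, two-symbol model (not from the slides).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def viterbi(obs, pi, A, B):
    """Multistage DP over the trellis: max-product with backpointers."""
    delta = pi * B[:, obs[0]]          # weights of B -> (X1 = Si) edges
    backptr = []
    for o in obs[1:]:
        scores = delta[:, None] * A    # scores[i, j]: extend path ending at Si to Sj
        backptr.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * B[:, o]
    path = [int(delta.argmax())]       # best last state
    for bp in reversed(backptr):       # trace backpointers to recover the path
        path.append(int(bp[path[-1]]))
    return path[::-1]

print(viterbi([0, 0, 1], pi, A, B))
```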
HMM Problem 3
Given a training set of observation sequences {{O1,O2,...}, {O1,O2,...}, {O1,O2,...}},
how do we determine the transition probs aij, initial state probs πi, and observation probs bik?
This is a learning problem. It is the hardest problem of the three. We assume the topology of the HMM is known (number and connectivity of the states); otherwise it is even harder.
Two popular approaches:
• Segmental K-means algorithm
• Baum-Welch algorithm (EM-based)
Segmental K-Means
(Steve Mills, U.Nottingham)

Note: this is essentially a clustering problem!
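The clustering analogy can be made concrete with a rough sketch of the segmental K-means loop: hard-assign each observation to a state, re-estimate the parameters by counting, re-segment with Viterbi, and repeat. All parameters, the random initialization, and the smoothing constant `eps` are made up for illustration; the original slides (lost in this transcript) may differ in details:

```python
import numpy as np

rng = np.random.default_rng(0)

def viterbi(obs, pi, A, B):
    """Most likely state sequence for one observation sequence."""
    delta = pi * B[:, obs[0]]
    backptr = []
    for o in obs[1:]:
        scores = delta[:, None] * A
        backptr.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * B[:, o]
    path = [int(delta.argmax())]
    for bp in reversed(backptr):
        path.append(int(bp[path[-1]]))
    return path[::-1]

def reestimate(seqs, labels, N, K, eps=1e-3):
    """Count-based estimates of pi, A, B from hard state assignments
    (eps is a made-up smoothing constant to avoid zero probabilities)."""
    pi = np.full(N, eps)
    A = np.full((N, N), eps)
    B = np.full((N, K), eps)
    for obs, states in zip(seqs, labels):
        pi[states[0]] += 1
        for t in range(1, len(obs)):
            A[states[t - 1], states[t]] += 1
        for s, o in zip(states, obs):
            B[s, o] += 1
    return (pi / pi.sum(),
            A / A.sum(axis=1, keepdims=True),
            B / B.sum(axis=1, keepdims=True))

def segmental_kmeans(seqs, N, K, iters=20):
    """Alternate (1) histogram re-estimation from hard labels and
    (2) Viterbi re-segmentation -- the clustering analogy above."""
    labels = [list(rng.integers(0, N, size=len(obs))) for obs in seqs]
    for _ in range(iters):
        pi, A, B = reestimate(seqs, labels, N, K)
        labels = [viterbi(obs, pi, A, B) for obs in seqs]
    return pi, A, B

pi, A, B = segmental_kmeans([[0, 0, 1, 1, 0, 0], [1, 1, 0, 0, 1, 1]], N=2, K=2)
print(pi, A, B, sep="\n")
```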
Note: This part assumes continuous-valued observations are output. For a discrete set of observation symbols, we could just do histograms here.
Baum-Welch Algorithm
Same thing, but with dancing bears.
Basic idea is to use EM instead of K-means as the way of assessing ownership weights (fractional assignments of observations to states) before computing observation statistics at each state.
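The "fractional ownership" idea can be sketched as follows: the γ quantities from a forward-backward pass give each state a soft share of each observation, and B is re-estimated as a weighted histogram. All parameters are assumed toy values, and this shows only one piece (the E-step ownerships and the B update) of a full Baum-Welch iteration, not the complete algorithm:

```python
import numpy as np

# Assumed toy two-state, two-symbol model (not from the slides).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
obs = [0, 0, 1, 0]
T, N = len(obs), len(pi)

# Forward pass: alpha[t, i] = P(O1..Ot, Xt = Si)
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

# Backward pass: beta[t, i] = P(O(t+1)..OT | Xt = Si)
beta = np.ones((T, N))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

# gamma[t, i] = P(Xt = Si | O): the fractional ownership of observation t.
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)

# Soft re-estimate of B: a weighted histogram instead of hard Viterbi counts.
B_new = np.zeros_like(B)
for t, o in enumerate(obs):
    B_new[:, o] += gamma[t]
B_new /= B_new.sum(axis=1, keepdims=True)
print(B_new)
```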