
CSE586, Robert Collins

CSE 586, Spring 2011

Advanced Computer Vision

Graphical Models:Hidden Markov Models


Hidden Markov Models

Note: a good background reference is LR Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proc. of the IEEE, Vol.77, No.2, pp.257-286, 1989.


Markov Chain

Set of states, e.g. {S1,S2,S3,S4,S5}

Table of transition probabilities (aij = Prob of going from state Si to Sj)

a11 a12 a13 a14 a15

a21 a22 a23 a24 a25

a31 a32 a33 a34 a35

a41 a42 a43 a44 a45

a51 a52 a53 a54 a55

P(Sj | Si)

Prob of starting in each state, e.g. π1, π2, π3, π4, π5

P(Si)

Computation: prob of an ordered sequence S2 S1 S5 S4

P(S2) P(S1 | S2) P(S5 | S1) P(S4 | S5) = π2 a21 a15 a54

Note: this picture is a state transition diagram, not a graphical model!
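This chain-rule computation can be checked with a minimal sketch. The numeric values of π and A below are made up purely for illustration, since the slide leaves them symbolic:

```python
import numpy as np

# Hypothetical numbers for illustration only; the slide keeps pi and A symbolic.
# A[i, j] = a_ij = P(S_j | S_i); each row sums to 1.  pi[i] = P(S_i).
pi = np.array([0.1, 0.3, 0.2, 0.2, 0.2])
A = np.array([
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.5, 0.1, 0.1, 0.2, 0.1],
    [0.1, 0.3, 0.3, 0.2, 0.1],
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.1, 0.1, 0.2, 0.4, 0.2],
])

def chain_prob(states):
    """Probability of an ordered state sequence (states are 0-indexed)."""
    p = pi[states[0]]
    for i, j in zip(states, states[1:]):
        p *= A[i, j]
    return p

# The slide's example S2 S1 S5 S4 (0-indexed: 1, 0, 4, 3) = pi_2 a_21 a_15 a_54
p = chain_prob([1, 0, 4, 3])
```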


Hidden Markov Model

Set of states, e.g. {S1,S2,S3,S4,S5}

Table of transition probabilities (aij = Prob of going from state Si to Sj)

a11 a12 a13 a14 a15

a21 a22 a23 a24 a25

a31 a32 a33 a34 a35

a41 a42 a43 a44 a45

a51 a52 a53 a54 a55

P(Sj | Si)

Prob of starting in each state, e.g. π1, π2, π3, π4, π5

P(Si)

Set of observable symbols, e.g. {v1, v2, v3} (in the slide's figure, the same symbol set is attached to each of the five states)

Prob of seeing each symbol in a given state (bik = Prob of seeing vk in state Si)

b11 b12 b13
b21 b22 b23
b31 b32 b33
b41 b42 b43
b51 b52 b53

P(vk | Si)


Hidden Markov Model

Computation: Prob of ordered sequence (S2,v3) (S1,v1) (S5,v2) (S4,v1)

P(S2,S1,S5,S4,v3,v1,v2,v1) = P(S2) P(v3|S2) P(S1|S2) P(v1|S1) P(S5|S1) P(v2|S5) P(S4|S5) P(v1|S4) = π2 b23 a21 b11 a15 b52 a54 b41

= (π2 a21 a15 a54) (b23 b11 b52 b41) = P(S2,S1,S5,S4) P(v3,v1,v2,v1 | S2,S1,S5,S4)
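The same chain-rule computation with the emission terms folded in can be sketched as follows. The π, A, B values are made up (uniform transitions and a shared emission row), chosen only so the arithmetic is easy to verify by hand:

```python
import numpy as np

# Illustrative values only (not from the slide).
pi = np.array([0.1, 0.3, 0.2, 0.2, 0.2])   # pi_i = P(S_i)
A = np.full((5, 5), 0.2)                   # a_ij = P(S_j | S_i), uniform here
B = np.array([[0.5, 0.3, 0.2]] * 5)        # b_ik = P(v_k | S_i), same row per state

def joint_prob(states, obs):
    """P(state sequence, observation sequence), both 0-indexed."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return p

# Slide example (S2,v3) (S1,v1) (S5,v2) (S4,v1): states 1,0,4,3 and symbols 2,0,1,0
p = joint_prob([1, 0, 4, 3], [2, 0, 1, 0])
# Factorizes as (state-sequence prob) * (emission prob given the states):
chain = pi[1] * A[1, 0] * A[0, 4] * A[4, 3]
emit = B[1, 2] * B[0, 0] * B[4, 1] * B[3, 0]
```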


HMM as Graphical Model

X1 → X2 → X3 → X4    markov chain (transition probs A on each horizontal edge)
 ↓     ↓     ↓     ↓    (observation probs B on each vertical edge)
O1    O2    O3    O4    observations

Let state random variables be X1, X2, X3, X4, ... with Xt being the state at time t. Allowable values of Xt (a discrete random variable) are {S1, S2, ..., Sn}.

Given an HMM with states S1, S2, ..., Sn, symbols v1, v2, ..., vk, transition probs A = {aij}, initial probs π = {πi}, and observation probs B = {bik}.

Let observed random variables be O1, O2, O3, ... with Ot observed at time t. Allowable values of Ot (a discrete random variable) are {v1, v2, ..., vk}.


HMM as Graphical Model

Let state random variables be X1, X2, X3, X4, ... with Xt being the state at time t. Allowable values of Xt (a discrete random variable) are {S1, S2, ..., Sn}.

Given an HMM with states S1, S2, ..., Sn, symbols v1, v2, ..., vk, transition probs A = {aij}, initial probs π = {πi}, and observation probs B = {bik}.

Let observed random variables be O1, O2, O3, ... with Ot observed at time t. Allowable values of Ot (a discrete random variable) are {v1, v2, ..., vk}.

X1 → X2 → X3 → X4    markov chain (A on each horizontal edge)
 ↓     ↓     ↓     ↓    (B on each vertical edge)
O1    O2    O3    O4    observations

Oi are observed variables. Xj are hidden (latent) variables.


HMM as Graphical Model

X1 → X2 → X3 → X4    markov chain (A on each horizontal edge)
 ↓     ↓     ↓     ↓    (B on each vertical edge)
O1    O2    O3    O4    observations

Verify: What is P(X1,X2,X3,X4,O1,O2,O3,O4)?

P(X1,X2,X3,X4,O1,O2,O3,O4) = P(X1) P(O1|X1) P(X2|X1) P(O2|X2) P(X3|X2) P(O3|X3) P(X4|X3) P(O4|X4)


Three Computations for HMMs

HMM Problem 1

What is the likelihood of observing a sequence, e.g. O1,O2,O3 ?

Note: there are multiple ways that a given sequence could be emitted, involving different sequences of hidden states X1, X2, X3.

One (inefficient) way to compute the answer: generate all sequences of three states Sx, Sy, Sz and compute P(Sx,Sy,Sz,O1,O2,O3) [which we know how to do]. Summing up over all such state sequences gives us our answer.

Drawback: there are N^3 length-3 state sequences over N states (N^T in general for T observations), so the enumeration grows exponentially with the length of the observation sequence.


HMM Problem 1

Better (efficient) solution is based on message passing / belief propagation: the required marginal is computed by pushing each sum over a hidden state inward and reusing partial sums, rather than enumerating whole state sequences. (The slide showed the marginalization equations as figures.)

This is the sum-product procedure of message passing. It is called the forward-backward procedure in HMM terminology.
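The forward half of this procedure can be sketched in a few lines: a running vector α folds in the sum over the previous state at each step, giving cost O(T n²) instead of O(nᵀ). The model values below are made up for a tiny sanity check against brute-force enumeration:

```python
import numpy as np
from itertools import product

def forward_likelihood(pi, A, B, obs):
    """P(O_1..O_T) via the forward pass.
    alpha_t(j) = P(O_1..O_t, X_t = S_j); cost O(T n^2) instead of O(n^T)."""
    alpha = pi * B[:, obs[0]]              # alpha_1(j) = pi_j b_j(O_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # sum over previous state, then emit
    return alpha.sum()

# Tiny hypothetical 2-state model; check against brute-force enumeration.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
obs = [0, 1, 0]

brute = sum(
    pi[s[0]] * B[s[0], obs[0]]
    * np.prod([A[s[t - 1], s[t]] * B[s[t], obs[t]] for t in range(1, len(obs))])
    for s in product(range(2), repeat=len(obs))
)
assert abs(forward_likelihood(pi, A, B, obs) - brute) < 1e-12
```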


HMM Problem 2

What is the most likely sequence of hidden states X1, X2, X3 given that we have observed O1, O2, O3?

Again, there is an inefficient way based on explicitly generating the exponential number of possible state sequences... AND there is a more efficient way using message passing.


HMM Problem 2

Define: δt(j) = max over X1, ..., Xt-1 of P(X1, ..., Xt-1, Xt = Sj, O1, ..., Ot)

Then: δ1(j) = πj bj(O1), and δt(j) = [max over i of δt-1(i) aij] bj(Ot). The best final state is argmax over j of δT(j), and the most likely state sequence is recovered by backtracking through the argmaxes.

This is the max-product procedure of message passing. It is called the Viterbi algorithm in HMM terminology.


Viterbi, under the hood

The trellis (figure): one column per time step X1, X2, X3, X4, each column containing one node per state S1, S2, S3, S4; a source node B on the left, a sink node E on the right, and the observations O1 O2 O3 O4 written under the columns.

1) Build a state-space trellis.
2) Add a source node B and a sink node E.
3) Take the given observations O1 O2 O3 O4.
4) Assign weights to edges based on the observations:
   edge B → (X1 = Si) gets weight P(Si) P(O1|Si);
   edge (Xt = Si) → (Xt+1 = Sj) gets weight P(Sj|Si) P(Ot+1|Sj);
   edges into E get weight 1.

Do multistage dynamic programming to find the max product path.

(note: in some implementations, each edge weight wi is replaced by -log(wi), so that standard shortest-path (min-sum) dynamic programming can be performed.)
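The trellis DP with the -log-weight trick can be sketched directly: max-product over paths becomes a min-sum shortest path. The two-state model below is hypothetical, and the result is checked by enumerating all paths:

```python
import numpy as np
from itertools import product

def viterbi(pi, A, B, obs):
    """Most likely state sequence via min-sum DP on the -log-weighted trellis."""
    T, n = len(obs), len(pi)
    cost = np.empty((T, n))                      # cost[t, j]: best -log prob ending in S_j
    back = np.zeros((T, n), dtype=int)           # back[t, j]: best predecessor of S_j
    cost[0] = -np.log(pi) - np.log(B[:, obs[0]])
    for t in range(1, T):
        step = cost[t - 1][:, None] - np.log(A)  # step[i, j]: arrive at S_j from S_i
        back[t] = step.argmin(axis=0)
        cost[t] = step.min(axis=0) - np.log(B[:, obs[t]])
    path = [int(cost[-1].argmin())]              # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical model and observations, checked by enumerating all n^T paths.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
obs = [0, 1, 1]

def joint(states):
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return p

best = max(product(range(2), repeat=len(obs)), key=joint)
assert tuple(viterbi(pi, A, B, obs)) == best
```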


HMM Problem 3

Given a training set of observation sequences{{O1,O2,...}, {O1,O2,...}, {O1,O2,...}}

how do we determine the transition probs aij, initial state probs πi, and observation probs bik?

This is a learning problem, and it is the hardest of the three. We assume the topology of the HMM (the number of states and their connectivity) is known; otherwise it is even harder.

Two popular approaches:

• Segmental K-means algorithm

• Baum-Welch algorithm (EM-based)

Segmental K-Means

(Algorithm walkthrough adapted from slides by Steve Mills, U. Nottingham; the step-by-step illustrations were presented as figures.)

Note: this is essentially a clustering problem!

Note: This part assumes continuous-valued observations are output. For a discrete set of observation symbols, we could just do histograms here.

Really?


Baum-Welch Algorithm

Same thing, but with dancing bears.

fractional assignments of observations to states

Basic idea is to use EM instead of K-means as a way of assessing ownership weights before computing observation statistics at each state.
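One EM iteration for a discrete-output HMM can be sketched as below: the E-step computes the fractional assignments γt(i) = P(Xt = Si | O) and transition ownerships ξt(i,j) via a forward-backward pass, and the M-step re-estimates π, A, B from the expected counts. Variable names and the toy model are mine, and this is an unscaled textbook sketch, not production code (real implementations rescale α and β to avoid underflow):

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One EM iteration for a discrete-output HMM (unscaled sketch).
    E-step: gamma_t(i) and xi_t(i,j) from forward-backward.
    M-step: re-estimate pi, A, B from expected counts."""
    n, T = len(pi), len(obs)
    alpha = np.zeros((T, n))                       # forward pass
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta = np.ones((T, n))                         # backward pass
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()                   # P(O) under the input model
    gamma = alpha * beta / likelihood              # fractional state ownership
    xi = np.array([alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
                   for t in range(T - 1)]) / likelihood
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for t, o in enumerate(obs):
        new_B[:, o] += gamma[t]
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, likelihood

# Hypothetical 2-state, 2-symbol model; EM never decreases the likelihood.
pi0 = np.array([0.5, 0.5])
A0 = np.array([[0.6, 0.4], [0.3, 0.7]])
B0 = np.array([[0.8, 0.2], [0.2, 0.8]])
obs = [0, 0, 1, 1, 0]
pi1, A1, B1, L0 = baum_welch_step(pi0, A0, B0, obs)
_, _, _, L1 = baum_welch_step(pi1, A1, B1, obs)
assert L1 >= L0 - 1e-12
```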