EE3J2 Data Mining

Lecture 14: Introduction to Hidden Markov Models

Martin Russell


Objectives

Limitations of sequence matching

Introduction to hidden Markov models (HMMs)


Sequence retrieval using DP

……

AAGDTDTDTDD

AABBCBDAAAAAAA

BABABABBCCDF

GGGGDDGDGDGDGDTDTD

DGDGDGDGD

AABCDTAABCDTAABCDTAAB

CDCDCDTGGG

GGAACDTGGGGGAAA

…….

…….

Corpus of sequential data

‘query’ sequence Q

…BBCCDDDGDGDGDCDTCDTTDCCC…

Dynamic Programming Distance Calculation

Calculate ad(S,Q) for each sequence S in corpus

$$\hat{S} = \arg\min_{S} ad(S, Q)$$
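The retrieval step on this slide can be sketched in a few lines. This is a minimal illustration, assuming that ad(S,Q) is an accumulated DP distance with unit costs for substitution, insertion and deletion; the slides do not state the costs, and the names `accumulated_distance` and `retrieve` are hypothetical.

```python
def accumulated_distance(s: str, q: str) -> int:
    """Accumulated DP distance between s and q.

    Assumption: unit cost for each substitution, insertion and deletion.
    """
    prev = list(range(len(q) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(q, start=1):
            curr.append(min(
                prev[j - 1] + (a != b),  # substitution (free if symbols match)
                prev[j] + 1,             # symbol of s left unmatched
                curr[j - 1] + 1,         # symbol of q left unmatched
            ))
        prev = curr
    return prev[-1]


def retrieve(corpus, query):
    """Return the corpus sequence S that minimises ad(S, Q)."""
    return min(corpus, key=lambda s: accumulated_distance(s, query))


corpus = ["AAGDTDTDTDD", "DGDGDGDGD", "CDCDCDTGGG"]
best = retrieve(corpus, "DGDGDGDCD")
```

Here `best` is the corpus entry closest to the query under the assumed costs. A different cost scheme would change the distances but not the structure of the search.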


Limitations of ‘template matching’

This type of analysis is sometimes referred to as template matching

The ‘templates’ are the sequences in the corpus

Can think of each template as representing a ‘class’

Problem is to determine which class best fits the query

Performance will depend on precisely which template is used to represent the class


Alternative path shapes

The basic units of path considered so far are:

[Figure: basic path units for substitution, insertion and deletion]

Others are possible and may have advantages, e.g:

[Figure: alternative path shapes for substitution, insertion and deletion]


Example


Hidden Markov Models (HMMs)

One solution is to replace the individual template sequence with an ‘average’ sequence

But what is an ‘average sequence’?

One solution is to use a type of statistical model called a Hidden Markov Model


HMMs

Suppose the following sequences are in the same class:

– ABC, YBBC, ABXC, AZ

Compute alignments:

[Figure: three alignment grids, with Y B B C, A B X C and A Z on the horizontal axes, each aligned against A B C on the vertical axis]


Finite State Network Representation

The sequence consists of 3 ‘states’

– First state is ‘realised’ as A (twice) or Y (once)

– Second state ‘realised’ as B (three times) or X (once)

– Second state can be repeated or deleted

– Third state can be ‘realised’ as C (twice) or Z (once)


Network representation

Directed graph representation Each state associated with a set of probabilities

– Called the ‘state emission’ probabilities

$$p(A) = \tfrac{2}{3}, \quad p(Y) = \tfrac{1}{3}, \quad p(B) = p(C) = p(X) = p(Z) = 0$$
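As a rough illustration, the state emission probabilities can be read off as relative frequencies of the symbols aligned to each state. The sketch below assumes the counts given on the previous slide (A twice and Y once for the first state, and so on); the labels `"state 1"` to `"state 3"` and the function name are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Symbols aligned to each of the 3 states, taken from the counts on the
# "Finite State Network Representation" slide (assumed, not re-derived here).
aligned = {
    "state 1": ["Y", "A", "A"],
    "state 2": ["B", "B", "B", "X"],
    "state 3": ["C", "C", "Z"],
}


def emission_probabilities(symbols):
    """Relative frequency of each symbol among those aligned to one state."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return {sym: Fraction(n, total) for sym, n in counts.items()}


emissions = {state: emission_probabilities(syms) for state, syms in aligned.items()}
```

Any symbol that never aligns to a state simply has no entry, which corresponds to an emission probability of 0 for that state.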


Transition probabilities

Transition probabilities control insertions and deletions of symbols

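[Figure: transition network with arc probabilities 1, 0.67, 0.33, 0.5, 0.5 and 1]

$$A = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.67 & 0.33 & 0 \\ 0 & 0 & 0.5 & 0.5 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$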

ajk=Prob(state k follows state j)

Basic rule for drawing transition networks: Connect state j to state k if ajk > 0
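The drawing rule can be applied mechanically to a transition matrix. The sketch below uses the 5 x 5 matrix as read from this slide and assumes that states 1 and 5 act as non-emitting entry and exit states, which is an interpretation rather than something the slide states; `network_arcs` and `rows_are_valid` are hypothetical names.

```python
# Transition matrix as read from the slide; A[j-1][k-1] = a_jk.
# Assumption: state 1 is an entry state and state 5 an exit state.
A = [
    [0, 1, 0,    0,    0],
    [0, 0, 0.67, 0.33, 0],
    [0, 0, 0.5,  0.5,  0],
    [0, 0, 0,    0,    1],
    [0, 0, 0,    0,    0],
]


def network_arcs(matrix):
    """List (j, k, a_jk) for every pair of states with a_jk > 0 (1-based)."""
    return [(j + 1, k + 1, a)
            for j, row in enumerate(matrix)
            for k, a in enumerate(row)
            if a > 0]


def rows_are_valid(matrix, tol=1e-9):
    """Every row except the final (exit) row should sum to 1."""
    return all(abs(sum(row) - 1.0) < tol for row in matrix[:-1])


arcs = network_arcs(A)
```

The arc from state 2 to state 4 is the one that allows the middle symbol to be deleted, and the self-loop on state 3 is the one that allows it to be repeated.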


Formal Definition

A Hidden Markov Model (HMM) for the symbols 1, 2, …, K consists of:

– A number of states N

– An N × N state transition probability matrix A

– For each state j a set of probabilities pj(1), … , pj(K), where pj(k) is the probability that symbol k occurs for state j
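One possible way to hold these three components in code is sketched below. The class name `HMM`, its field names and the two-state `toy` model are all hypothetical, and the check that every row of A sums to 1 is an assumption that would need relaxing for a model with a dedicated exit state.

```python
from dataclasses import dataclass


@dataclass
class HMM:
    n_states: int        # N
    transitions: list    # N x N matrix, transitions[j][k] = a_jk
    emissions: list      # emissions[j] maps each symbol to its probability in state j

    def validate(self, tol=1e-9):
        """Check shapes and that each distribution sums to 1."""
        assert len(self.transitions) == self.n_states
        assert len(self.emissions) == self.n_states
        for row in self.transitions:
            assert len(row) == self.n_states
            assert all(a >= 0 for a in row)
            assert abs(sum(row) - 1.0) < tol
        for dist in self.emissions:
            assert all(p >= 0 for p in dist.values())
            assert abs(sum(dist.values()) - 1.0) < tol
        return True


# Hypothetical two-state model over the symbols "A" and "B".
toy = HMM(
    n_states=2,
    transitions=[[0.9, 0.1], [0.2, 0.8]],
    emissions=[{"A": 0.5, "B": 0.5}, {"A": 0.1, "B": 0.9}],
)
is_valid = toy.validate()
```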


Alignment paths for HMMs

For HMMs, alignment paths are called state sequences

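[Figure: alignment path, labelled ‘State sequence’, for the symbols Y A B B B X B C (horizontal axis) against the states A, B, C (vertical axis)]

$$p(YABBBXBC) = p_2(Y)\, a_{22}\, p_2(A)\, a_{23}\, p_3(B) \cdots a_{34}\, p_4(C)$$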
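The product along a state sequence can be computed directly. The sketch below is an illustration under stated assumptions: it uses the three-state example built from ABC, YBBC, ABXC and AZ, numbers the emitting states 2, 3 and 4, and takes 2/3 and 1/3 as the exact values behind the rounded 0.67 and 0.33. That example has no self-loop on state 2, so it is evaluated on ABXC rather than on the sequence in the figure; `path_probability` is a hypothetical name.

```python
from fractions import Fraction as F

# Transition probabilities a_jk between emitting states (assumed exact values).
A = {(2, 3): F(2, 3), (2, 4): F(1, 3), (3, 3): F(1, 2), (3, 4): F(1, 2)}

# Emission probabilities p_j(symbol), from the alignment counts.
P = {
    2: {"A": F(2, 3), "Y": F(1, 3)},
    3: {"B": F(3, 4), "X": F(1, 4)},
    4: {"C": F(2, 3), "Z": F(1, 3)},
}


def path_probability(symbols, states):
    """Multiply emission and transition probabilities along one state sequence."""
    assert len(symbols) == len(states)
    prob = P[states[0]].get(symbols[0], F(0))
    for prev, curr, sym in zip(states, states[1:], symbols[1:]):
        prob *= A.get((prev, curr), F(0)) * P[curr].get(sym, F(0))
    return prob


p = path_probability("ABXC", [2, 3, 3, 4])
```

Any transition or emission that the model does not allow contributes a factor of 0, so impossible state sequences get probability 0.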


The optimal state sequence

Let M be a HMM and s a sequence

Probability on previous slide depends on the state sequence x and the model, so we write:

$$p(s, x \mid M)$$

By analogy with dynamic programming, the optimal state sequence $\hat{x}$ is the sequence such that:

$$p(s, \hat{x} \mid M) = \max_{x} p(s, x \mid M), \quad \text{or,} \quad \hat{x} = \arg\max_{x} p(s, x \mid M)$$
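For a very short sequence the maximisation can be carried out by brute force, which makes the definition concrete before any dynamic programming is introduced. This sketch assumes the three-state example model with emitting states numbered 2, 3 and 4 and exact transition values 2/3 and 1/3; the function names are hypothetical. The number of candidates grows exponentially with sequence length, so this is for illustration only.

```python
from fractions import Fraction as F
from itertools import product

A = {(2, 3): F(2, 3), (2, 4): F(1, 3), (3, 3): F(1, 2), (3, 4): F(1, 2)}
P = {
    2: {"A": F(2, 3), "Y": F(1, 3)},
    3: {"B": F(3, 4), "X": F(1, 4)},
    4: {"C": F(2, 3), "Z": F(1, 3)},
}


def joint_probability(symbols, states):
    """p(s, x | M) for one candidate state sequence x."""
    prob = P[states[0]].get(symbols[0], F(0))
    for prev, curr, sym in zip(states, states[1:], symbols[1:]):
        prob *= A.get((prev, curr), F(0)) * P[curr].get(sym, F(0))
    return prob


def best_state_sequence(symbols):
    """Exhaustive arg max over every possible state sequence."""
    candidates = product(sorted(P), repeat=len(symbols))
    return max(candidates, key=lambda x: joint_probability(symbols, x))


x_hat = best_state_sequence("YBBC")
p_hat = joint_probability("YBBC", x_hat)
```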


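Computing the optimal state sequence: The ‘state-symbol’ trellis

[Figure: state-symbol trellis with the symbols Y A B B B X B C on the horizontal axis and the states A, B, C on the vertical axis]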

Rule: connect state j at symbol m with state k at symbol m+1 if ajk > 0
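The trellis rule translates into a short loop over symbol positions and state pairs. The sketch below assumes the 5 x 5 transition matrix of the three-state example, with states 2, 3 and 4 as the emitting states; `trellis_arcs` is a hypothetical name, and trellis nodes are written as (state, symbol position) pairs.

```python
# A[j-1][k-1] = a_jk; states 1 and 5 are assumed to be entry and exit states.
A = [
    [0, 1, 0,    0,    0],
    [0, 0, 0.67, 0.33, 0],
    [0, 0, 0.5,  0.5,  0],
    [0, 0, 0,    0,    1],
    [0, 0, 0,    0,    0],
]


def trellis_arcs(matrix, n_symbols, emitting_states):
    """Connect (j, m) to (k, m + 1) whenever a_jk > 0."""
    arcs = []
    for m in range(1, n_symbols):
        for j in emitting_states:
            for k in emitting_states:
                if matrix[j - 1][k - 1] > 0:
                    arcs.append(((j, m), (k, m + 1)))
    return arcs


arcs = trellis_arcs(A, n_symbols=3, emitting_states=[2, 3, 4])
```

Every state sequence with non-zero probability corresponds to a left-to-right path through these arcs.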


More examples


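Dynamic Programming a.k.a. Viterbi Decoding

[Figure: state-symbol trellis for Y A B B B X B C against the states A, B, C, with symbol index k running from 1 to K]

Writing $\hat{\alpha}_k(j)$ for the probability of the best partial state sequence that ends in state j at symbol k:

$$\hat{\alpha}_k(4) = \max\left\{ \hat{\alpha}_{k-1}(2)\, a_{24}\, p_4(B),\ \hat{\alpha}_{k-1}(3)\, a_{34}\, p_4(B) \right\}$$

$$p(s, \hat{x} \mid M) = \hat{\alpha}_K(4)$$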
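A compact version of Viterbi decoding for the three-state example might look like the following. It is a sketch under assumptions rather than a reference implementation: states 1 and 5 are treated as non-emitting entry and exit states, 2/3 and 1/3 stand in for the rounded 0.67 and 0.33, and exact fractions are used so that no scaling is needed. All names are hypothetical.

```python
from fractions import Fraction as F

A = {(1, 2): F(1), (2, 3): F(2, 3), (2, 4): F(1, 3),
     (3, 3): F(1, 2), (3, 4): F(1, 2), (4, 5): F(1)}
P = {
    2: {"A": F(2, 3), "Y": F(1, 3)},
    3: {"B": F(3, 4), "X": F(1, 4)},
    4: {"C": F(2, 3), "Z": F(1, 3)},
}
ENTRY, EXIT = 1, 5
STATES = sorted(P)  # emitting states


def a(j, k):
    """Transition probability a_jk, 0 if the arc is absent."""
    return A.get((j, k), F(0))


def viterbi(symbols):
    """Return (probability of the best state sequence, that state sequence)."""
    # best[j]: probability of the best partial path ending in state j
    best = {j: a(ENTRY, j) * P[j].get(symbols[0], F(0)) for j in STATES}
    backpointers = []
    for sym in symbols[1:]:
        new_best, pointers = {}, {}
        for k in STATES:
            prev = max(STATES, key=lambda j: best[j] * a(j, k))
            new_best[k] = best[prev] * a(prev, k) * P[k].get(sym, F(0))
            pointers[k] = prev
        best = new_best
        backpointers.append(pointers)
    # The path must be able to leave through the exit state.
    last = max(STATES, key=lambda j: best[j] * a(j, EXIT))
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last] * a(last, EXIT), path


prob, states = viterbi("ABXC")
```

When the returned probability is 0 the model cannot generate the sequence at all, and the returned path is then arbitrary.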


Sequence retrieval using HMMs

Corpus of pre-built HMMs

‘query’ sequence Q

…BBCCDDDGDGDGDCDTCDTTDCCC…

Viterbi Decoding

Calculate p(Q|M) for each HMM M in corpus

$$\hat{M} = \arg\max_{M} p(Q \mid M)$$
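Retrieval with HMMs has the same shape as retrieval with templates, except that each corpus entry is scored by a probability to be maximised rather than a distance to be minimised. The sketch below makes several assumptions: each model is written in the common (pi, A, B) form with an initial state distribution instead of explicit entry and exit states, the score is the Viterbi (best path) probability, and the two models `"DG"` and `"CT"` are invented purely for illustration.

```python
def viterbi_score(model, seq):
    """Probability of the best state sequence for seq under one model."""
    pi, A, B = model["pi"], model["A"], model["B"]
    n = len(pi)
    best = [pi[j] * B[j].get(seq[0], 0.0) for j in range(n)]
    for sym in seq[1:]:
        best = [max(best[j] * A[j][k] for j in range(n)) * B[k].get(sym, 0.0)
                for k in range(n)]
    return max(best)


def retrieve(models, query):
    """Return the name of the model that gives the query the highest score."""
    return max(models, key=lambda name: viterbi_score(models[name], query))


# Hypothetical two-state models: one favours alternating D/G, the other C/T.
switch = [[0.1, 0.9], [0.9, 0.1]]
models = {
    "DG": {"pi": [0.5, 0.5], "A": switch,
           "B": [{"D": 0.7, "G": 0.1, "C": 0.1, "T": 0.1},
                 {"D": 0.1, "G": 0.7, "C": 0.1, "T": 0.1}]},
    "CT": {"pi": [0.5, 0.5], "A": switch,
           "B": [{"D": 0.1, "G": 0.1, "C": 0.7, "T": 0.1},
                 {"D": 0.1, "G": 0.1, "C": 0.1, "T": 0.7}]},
}
best_model = retrieve(models, "DGDGDG")
```

For long queries the products underflow in floating point, so a practical version would usually work with log probabilities.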


HMM Construction

Suppose we have a set of HMMs, each representing a different class (e.g. protein sequence)

Given an unknown sequence s:

– Use Viterbi decoding to compare s with each HMM

– Compute $\hat{p}(s \mid M) = p(s, \hat{x} \mid M)$

But how do we obtain the HMM in the first place?


HMM training

Given a set of example sequences S a HMM M can be built such that p(S|M) is locally maximised

Procedure is as follows:

– Obtain an initial estimate of a suitable model M0

– Apply an algorithm – the ‘Baum-Welch’ algorithm – to obtain a new model M1 such that p(S|M1) ≥ p(S|M0)

– Repeat to produce a sequence of HMMs M0, M1,…,Mn with:

p(S|M0) ≤ p(S|M1) ≤ p(S|M2) ≤… ≤ p(S|Mn)
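The iteration can be illustrated with a small, unscaled Baum-Welch re-estimation step. This is a sketch under several assumptions that the slides do not state: the model is in the common (pi, A, B) form with an initial state distribution, p(S|M) is the product of the forward probabilities of the individual sequences, the initial model M0 is an arbitrary hand-picked guess, and the sequences are short enough that no numerical scaling is needed. All function names are hypothetical.

```python
def forward(pi, A, B, seq):
    n = len(pi)
    alpha = [[pi[j] * B[j][seq[0]] for j in range(n)]]
    for sym in seq[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][sym]
                      for j in range(n)])
    return alpha


def backward(A, B, seq):
    n = len(A)
    beta = [[1.0] * n]
    for sym in reversed(seq[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][sym] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def likelihood(pi, A, B, seqs):
    """p(S|M), assuming the sequences are independent."""
    p = 1.0
    for s in seqs:
        p *= sum(forward(pi, A, B, s)[-1])
    return p


def baum_welch_step(pi, A, B, seqs, alphabet):
    """One re-estimation step: returns the parameters of the next model."""
    n = len(pi)
    pi_num = [0.0] * n
    a_num = [[0.0] * n for _ in range(n)]
    a_den = [0.0] * n
    b_num = [{c: 0.0 for c in alphabet} for _ in range(n)]
    b_den = [0.0] * n
    for seq in seqs:
        alpha, beta = forward(pi, A, B, seq), backward(A, B, seq)
        p_seq = sum(alpha[-1])
        for t, sym in enumerate(seq):
            for i in range(n):
                gamma = alpha[t][i] * beta[t][i] / p_seq  # state occupancy
                if t == 0:
                    pi_num[i] += gamma
                b_num[i][sym] += gamma
                b_den[i] += gamma
                if t + 1 < len(seq):
                    a_den[i] += gamma
                    for j in range(n):
                        a_num[i][j] += (alpha[t][i] * A[i][j] * B[j][seq[t + 1]]
                                        * beta[t + 1][j] / p_seq)
    new_pi = [x / len(seqs) for x in pi_num]
    new_A = [[a_num[i][j] / a_den[i] for j in range(n)] for i in range(n)]
    new_B = [{c: b_num[i][c] / b_den[i] for c in alphabet} for i in range(n)]
    return new_pi, new_A, new_B


seqs = ["ABC", "YBBC", "ABXC", "AZ"]
alphabet = "ABCXYZ"
# Hand-picked initial model M0 (an assumption, not taken from the slides).
pi = [0.6, 0.3, 0.1]
A = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]
B = [{c: (0.3 if c in pair else 0.1) for c in alphabet} for pair in ("AY", "BX", "CZ")]

history = [likelihood(pi, A, B, seqs)]
for _ in range(5):
    pi, A, B = baum_welch_step(pi, A, B, seqs, alphabet)
    history.append(likelihood(pi, A, B, seqs))
```

The list `history` holds p(S|M0), p(S|M1), and so on, and should be non-decreasing up to rounding error. Which maximum the iteration approaches depends on the initial model.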


Local optimality

[Figure: P(S|M) plotted against the model M, showing the sequence M0, M1, …, Mn, a local maximum and the global maximum]


Summary

Hidden Markov Models

Importance of HMMs for sequence matching

Viterbi decoding

HMM training


Summary

Review of template matching

Hidden Markov Models

Dynamic programming for HMMs
