EE3J2 Data Mining

Lecture 14: Introduction to Hidden Markov Models

Martin Russell

Post on 19-Dec-2015


Slide 2

Objectives

Limitations of sequence matching

Introduction to hidden Markov models (HMMs)

Slide 3

Sequence retrieval using DP

Corpus of sequential data:

…
AAGDTDTDTDD
AABBCBDAAAAAAA
BABABABBCCDF
GGGGDDGDGDGDGDTDTD
DGDGDGDGD
AABCDTAABCDTAABCDTAAB
CDCDCDTGGG
GGAACDTGGGGGAAA
…

‘Query’ sequence Q:

…BBCCDDDGDGDGDCDTCDTTDCCC…

Dynamic programming distance calculation: calculate ad(S,Q) for each sequence S in corpus, then retrieve

$$\hat{S} = \arg\min_{S} \, ad(S, Q)$$
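The retrieval loop on this slide can be sketched in Python. This is a minimal sketch, assuming that ad(S,Q) is an accumulated DP distance with unit costs for substitution, insertion and deletion; the costs, and the names `accumulated_distance` and `retrieve`, are illustrative assumptions rather than the lecture's own definitions.

```python
def accumulated_distance(s, q, sub_cost=1, ins_cost=1, del_cost=1):
    """DP distance between sequences s and q (unit costs are an assumption)."""
    rows, cols = len(s) + 1, len(q) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, cols):
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            match = 0 if s[i - 1] == q[j - 1] else sub_cost
            d[i][j] = min(d[i - 1][j - 1] + match,  # substitution (or match)
                          d[i - 1][j] + del_cost,   # deletion
                          d[i][j - 1] + ins_cost)   # insertion
    return d[-1][-1]


def retrieve(corpus, query):
    """Return the corpus sequence S that minimises ad(S, Q)."""
    return min(corpus, key=lambda s: accumulated_distance(s, query))


corpus = ["AAGDTDTDTDD", "AABBCBDAAAAAAA", "BABABABBCCDF", "DGDGDGDGD"]
best = retrieve(corpus, "DGDGDGDCD")
print(best)
```

Every corpus sequence is compared with the query, so the cost grows with the size of the corpus as well as with the sequence lengths.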

Slide 4

Limitations of ‘template matching’

This type of analysis is sometimes referred to as template matching

The ‘templates’ are the sequences in the corpus

Can think of each template as representing a ‘class’

Problem is to determine which class best fits the query

Performance will depend on precisely which template is used to represent the class

Slide 5

Alternative path shapes

The basic units of path considered so far are: substitution, insertion, deletion

Others are possible and may have advantages, e.g.:

[Figure: alternative path shapes for substitution, insertion and deletion]

Slide 6

Example

Slide 7

Hidden Markov Models (HMMs)

One solution is to replace the individual template sequence with an ‘average’ sequence

But what is an ‘average sequence’?

One solution is to use a type of statistical model called a Hidden Markov Model

Slide 8

HMMs

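Suppose the following sequences are in the same class:

– ABC, YBBC, ABXC, AZ

Compute alignments:

[Figure: three alignment grids, with YBBC, ABXC and AZ along the horizontal axis, each aligned against A, B, C on the vertical axis]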

Slide 9

Finite State Network Representation

The sequence consists of 3 ‘states’

– First state is ‘realised’ as A (twice) or Y (once)

– Second state ‘realised’ as B (three times) or X (once)

– Second state can be repeated or deleted

– Third state can be ‘realised’ as C (twice) or Z (once)

Slide 10

Network representation

Directed graph representation

Each state is associated with a set of probabilities

– Called the ‘state emission’ probabilities

E.g. for the first state:

$$p(A) = \tfrac{2}{3}, \quad p(Y) = \tfrac{1}{3}, \quad p(B) = p(C) = p(X) = p(Z) = 0$$
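The emission probabilities can be obtained by counting how often each symbol is ‘realised’ by each state. The following is a minimal sketch; the alignments (one state index per symbol, states numbered 1 to 3) are an assumption, inferred from the counts quoted on the finite state network slide rather than read from the alignment figures.

```python
from collections import Counter, defaultdict

# Assumed alignments: (sequence, state index for each symbol).
aligned = [
    ("YBBC", [1, 2, 2, 3]),
    ("ABXC", [1, 2, 2, 3]),
    ("AZ",   [1, 3]),        # second state deleted
]

# Count how many times each state is realised as each symbol.
counts = defaultdict(Counter)
for symbols, states in aligned:
    for symbol, state in zip(symbols, states):
        counts[state][symbol] += 1

# Normalise the counts for each state to get emission probabilities.
emission = {
    state: {sym: n / sum(c.values()) for sym, n in c.items()}
    for state, c in counts.items()
}
print(emission[1])
```

Symbols that never occur for a state get no entry, which corresponds to a probability of 0.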

Slide 11

Transition probabilities

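Transition probabilities control insertions and deletions of symbols

[Figure: five-state transition network with arc probabilities 1, 0.67, 0.33, 0.5, 0.5 and 1]

$$A = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.67 & 0.33 & 0 \\ 0 & 0 & 0.5 & 0.5 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

$a_{jk}$ = Prob(state k follows state j)

Basic rule for drawing transition networks: connect state j to state k if $a_{jk} > 0$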
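The transition matrix can be estimated by counting transitions along the aligned state sequences, and the network arcs then follow from the drawing rule. This is a sketch under assumptions: states are numbered 1 to 5 with state 1 a non-emitting entry state and state 5 a non-emitting exit state, and the three state paths are inferred from the example sequences YBBC, ABXC and AZ rather than taken from the slides.

```python
N = 5

# Assumed state paths (1 = entry, 2-4 = emitting states, 5 = exit).
paths = [
    [1, 2, 3, 3, 4, 5],  # YBBC: middle state repeated
    [1, 2, 3, 3, 4, 5],  # ABXC: middle state repeated
    [1, 2, 4, 5],        # AZ:   middle state deleted
]

# Count transitions j -> k along every path.
counts = [[0] * N for _ in range(N)]
for path in paths:
    for j, k in zip(path, path[1:]):
        counts[j - 1][k - 1] += 1

# Normalise each row; the exit state has no outgoing transitions.
A = []
for row in counts:
    total = sum(row)
    A.append([round(c / total, 2) if total else 0.0 for c in row])

# Drawing rule: connect state j to state k if a_jk > 0.
arcs = [(j + 1, k + 1) for j in range(N) for k in range(N) if A[j][k] > 0]

for row in A:
    print(row)
print(arcs)
```

The arc from the first emitting state to the last one is what allows the middle state to be deleted, and the self-loop on the middle state is what allows it to be repeated.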

Slide 12

Formal Definition

A Hidden Markov Model (HMM) for the symbols 1, 2, …, K consists of:

– A number of states N

– An N × N state transition probability matrix A

– For each state j a set of probabilities $p_j(1), \ldots, p_j(K)$, where $p_j(k)$ is the probability that symbol k occurs for state j

Slide 13

Alignment paths for HMMs

For HMMs, alignment paths are called state sequences

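[Figure: alignment grid of the sequence Y A B B B X B C (horizontal) against the states A, B, C (vertical)]

$$p(YABBBXBC) = p_2(Y)\,a_{22}\,p_2(A)\,a_{23}\,p_3(B) \cdots a_{34}\,p_4(C)$$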
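The probability along one state sequence is a product of alternating emission and transition terms. The sketch below evaluates that product for a given state sequence; the parameter values and the function name `path_probability` are hypothetical, chosen only for illustration, and entry and exit transitions are left out to mirror the formula on the slide.

```python
def path_probability(symbols, states, a, p):
    """Product of emission and transition terms along one state sequence."""
    prob = p[states[0]][symbols[0]]
    for t in range(1, len(symbols)):
        prob *= a[(states[t - 1], states[t])] * p[states[t]][symbols[t]]
    return prob


# Hypothetical parameters (not taken from the lecture).
a = {(2, 2): 0.5, (2, 3): 0.5, (3, 3): 0.8, (3, 4): 0.2}
p = {
    2: {"Y": 0.5, "A": 0.5},
    3: {"B": 0.75, "X": 0.25},
    4: {"C": 1.0},
}

# State 2 emits Y and A, state 3 emits B B B X B, state 4 emits C.
prob = path_probability("YABBBXBC", [2, 2, 3, 3, 3, 3, 3, 4], a, p)
print(prob)
```

A different state sequence for the same symbols generally gives a different value, which is why the probability depends on both the sequence and the path.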

Slide 14

State-symbol trellis

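[Figure: state-symbol trellis, with the symbols Y A B B B X B C along the horizontal axis and the states A, B, C on the vertical axis]

Rule: connect state j at symbol m with state k at symbol m+1 if $a_{jk} > 0$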

Slide 15

More examples

Slide 16

Dynamic Programming

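[Figure: state-symbol trellis for the symbols Y A B B B X B C against the states A, B, C]

$$\alpha_k(4) = \max \left\{ \alpha_{k-1}(3)\,a_{34}\,p_4(B),\ \alpha_{k-1}(2)\,a_{24}\,p_4(B) \right\}$$

where $\alpha_k(j)$ is the probability of the best partial state sequence that ends in state j at symbol k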


Slide 18

Alignment paths for HMMs

For HMMs, alignment paths are called state sequences

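[Figure: alignment grid of the sequence Y A B B B X B C (horizontal) against the states A, B, C (vertical), with the path labelled ‘State sequence’]

$$p(YABBBXBC) = p_2(Y)\,a_{22}\,p_2(A)\,a_{23}\,p_3(B) \cdots a_{34}\,p_4(C)$$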

Slide 19

The optimal state sequence

Let M be an HMM and s a sequence

The probability on the previous slide depends on the state sequence x and the model, so we write:

$$p(s, x \mid M)$$

By analogy with dynamic programming, the optimal state sequence $\hat{x}$ is the sequence such that:

$$\hat{x} = \arg\max_{x} \, p(s, x \mid M)$$

or,

$$p(s, \hat{x} \mid M) = \max_{x} \, p(s, x \mid M)$$

Slide 20

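Computing the optimal state sequence: The ‘state-symbol’ trellis

[Figure: state-symbol trellis, with the symbols Y A B B B X B C along the horizontal axis and the states A, B, C on the vertical axis]

Rule: connect state j at symbol m with state k at symbol m+1 if $a_{jk} > 0$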


Slide 22

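Dynamic Programming, a.k.a. Viterbi Decoding

[Figure: state-symbol trellis for the symbols Y A B B B X B C against the states A, B, C]

$$\alpha_k(4) = \max \left\{ \alpha_{k-1}(3)\,a_{34}\,p_4(B),\ \alpha_{k-1}(2)\,a_{24}\,p_4(B) \right\}$$

where $\alpha_k(j)$ is the probability of the best partial state sequence that ends in state j at symbol k. At the final symbol K:

$$p(s, \hat{x} \mid M) = \alpha_K(4)$$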
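The recurrence above extends to every state and symbol, which gives Viterbi decoding. The following is a minimal sketch, not the lecture's own implementation. It assumes a non-emitting entry state (index 0 in the code) and a non-emitting exit state (last index), uses the example transition and emission probabilities from the earlier slides, and reports states with 1-based numbering; the function name `viterbi` is illustrative.

```python
def viterbi(symbols, A, emit):
    """Best-path probability and state sequence (entry = 0, exit = last state)."""
    n = len(A)
    entry, exit_ = 0, n - 1
    emitting = range(1, n - 1)
    # Initialisation: leave the entry state and emit the first symbol.
    alpha = [{j: A[entry][j] * emit[j].get(symbols[0], 0.0) for j in emitting}]
    back = [{}]
    # Recursion: keep only the best predecessor for each state.
    for sym in symbols[1:]:
        prev, col, ptr = alpha[-1], {}, {}
        for j in emitting:
            best_i = max(emitting, key=lambda i: prev[i] * A[i][j])
            col[j] = prev[best_i] * A[best_i][j] * emit[j].get(sym, 0.0)
            ptr[j] = best_i
        alpha.append(col)
        back.append(ptr)
    # Termination: the path must be able to reach the exit state.
    last = max(emitting, key=lambda j: alpha[-1][j] * A[j][exit_])
    prob = alpha[-1][last] * A[last][exit_]
    # Trace back the optimal state sequence.
    path = [last]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    path.reverse()
    return prob, [j + 1 for j in path]


A = [
    [0, 1, 0, 0, 0],
    [0, 0, 0.67, 0.33, 0],
    [0, 0, 0.5, 0.5, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
emit = {
    1: {"A": 2 / 3, "Y": 1 / 3},
    2: {"B": 0.75, "X": 0.25},
    3: {"C": 2 / 3, "Z": 1 / 3},
}

prob, states = viterbi("ABBC", A, emit)
print(prob, states)
```

For long sequences the products underflow, so a practical implementation would normally work with log probabilities; plain products are kept here to stay close to the slide.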

Slide 23

Sequence retrieval using HMMs

Corpus of pre-built HMMs

‘Query’ sequence Q:

…BBCCDDDGDGDGDCDTCDTTDCCC…

Viterbi decoding: calculate p(Q|M) for each HMM M in corpus, then retrieve

$$\hat{M} = \arg\max_{M} \, p(Q \mid M)$$
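Retrieval with HMMs replaces the distance minimisation by a maximisation over models. The sketch below scores the query against each model with a score-only Viterbi pass and returns the best model. It assumes non-emitting entry and exit states, and both the model names and the second model's parameters are hypothetical, invented for illustration.

```python
def viterbi_score(symbols, A, emit):
    """Probability of the best state sequence (entry = state 0, exit = last)."""
    n = len(A)
    emitting = range(1, n - 1)
    alpha = {j: A[0][j] * emit[j].get(symbols[0], 0.0) for j in emitting}
    for sym in symbols[1:]:
        alpha = {
            j: max(alpha[i] * A[i][j] for i in emitting) * emit[j].get(sym, 0.0)
            for j in emitting
        }
    return max(alpha[j] * A[j][n - 1] for j in emitting)


def retrieve(models, query):
    """Return the name of the model with the highest Viterbi score."""
    return max(models, key=lambda name: viterbi_score(query, *models[name]))


models = {
    # Example model built from ABC, YBBC, ABXC, AZ.
    "abc": (
        [[0, 1, 0, 0, 0], [0, 0, 0.67, 0.33, 0], [0, 0, 0.5, 0.5, 0],
         [0, 0, 0, 0, 1], [0, 0, 0, 0, 0]],
        {1: {"A": 2 / 3, "Y": 1 / 3}, 2: {"B": 0.75, "X": 0.25},
         3: {"C": 2 / 3, "Z": 1 / 3}},
    ),
    # Hypothetical model for alternating D and G.
    "dg": (
        [[0, 1, 0, 0], [0, 0, 0.9, 0.1], [0, 0.8, 0, 0.2], [0, 0, 0, 0]],
        {1: {"D": 1.0}, 2: {"G": 1.0}},
    ),
}

print(retrieve(models, "DGDGD"), retrieve(models, "ABXC"))
```

The score used here is the best-path probability from Viterbi decoding, which stands in for p(Q|M).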


Slide 28

HMM Construction

Suppose we have a set of HMMs, each representing a different class (e.g. protein sequence)

Given an unknown sequence s:

– Use Viterbi decoding to compare s with each HMM

– Compute $\hat{p}(s \mid M) = p(s, \hat{x} \mid M)$

But how do we obtain the HMM in the first place?

Slide 29

HMM training

Given a set of example sequences S, an HMM M can be built such that p(S|M) is locally maximised

Procedure is as follows:

– Obtain an initial estimate of a suitable model M0

– Apply an algorithm – the ‘Baum-Welch’ algorithm – to obtain a new model M1 such that p(S|M1) ≥ p(S|M0)

– Repeat to produce a sequence of HMMs M0, M1,…,Mn with:

p(S|M0) ≤ p(S|M1) ≤ p(S|M2) ≤… ≤ p(S|Mn)
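The iteration can be illustrated with a compact Baum-Welch sketch. Several assumptions apply: it uses the common textbook formulation with an initial state distribution `pi` instead of the entry and exit states used on the earlier slides, it omits the scaling needed for long sequences, and the initial model values are arbitrary. All function names are illustrative.

```python
def forward(seq, pi, A, B):
    n = len(pi)
    alpha = [[pi[i] * B[i][seq[0]] for i in range(n)]]
    for o in seq[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(seq, A, B):
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(seq[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def likelihood(seqs, pi, A, B):
    """p(S|M) for independent sequences: product of the forward probabilities."""
    p = 1.0
    for seq in seqs:
        p *= sum(forward(seq, pi, A, B)[-1])
    return p


def baum_welch_step(seqs, pi, A, B):
    """One re-estimation step: expected counts pooled over all sequences."""
    n, k = len(pi), len(B[0])
    pi_num, a_den, b_den = [0.0] * n, [0.0] * n, [0.0] * n
    a_num = [[0.0] * n for _ in range(n)]
    b_num = [[0.0] * k for _ in range(n)]
    for seq in seqs:
        alpha, beta = forward(seq, pi, A, B), backward(seq, A, B)
        p_seq = sum(alpha[-1])
        for t, o in enumerate(seq):
            for i in range(n):
                gamma = alpha[t][i] * beta[t][i] / p_seq  # state occupancy
                if t == 0:
                    pi_num[i] += gamma
                b_num[i][o] += gamma
                b_den[i] += gamma
                if t + 1 < len(seq):
                    a_den[i] += gamma
                    for j in range(n):
                        a_num[i][j] += (alpha[t][i] * A[i][j] * B[j][seq[t + 1]]
                                        * beta[t + 1][j] / p_seq)
    new_pi = [x / len(seqs) for x in pi_num]
    new_A = [[a_num[i][j] / a_den[i] for j in range(n)] for i in range(n)]
    new_B = [[b_num[i][s] / b_den[i] for s in range(k)] for i in range(n)]
    return new_pi, new_A, new_B


alphabet = "ABCXYZ"
seqs = [[alphabet.index(c) for c in s] for s in ["ABC", "YBBC", "ABXC", "AZ"]]

# Arbitrary initial model M0 (every probability strictly positive).
pi = [0.6, 0.3, 0.1]
A = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]
B = [[0.3, 0.1, 0.1, 0.1, 0.3, 0.1],
     [0.1, 0.4, 0.1, 0.2, 0.1, 0.1],
     [0.1, 0.1, 0.4, 0.1, 0.1, 0.2]]

history = [likelihood(seqs, pi, A, B)]
for _ in range(5):
    pi, A, B = baum_welch_step(seqs, pi, A, B)
    history.append(likelihood(seqs, pi, A, B))
print(history)
```

The list `history` holds p(S|M0), p(S|M1), … and should never decrease. The model it converges to depends on the initial estimate M0.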

Slide 30

Local optimality

[Figure: P(S|M) plotted against the model M, showing M0, M1, …, Mn, a local maximum and the global maximum]

Slide 31

Summary

Hidden Markov Models

Importance of HMMs for sequence matching

Viterbi decoding

HMM training

Slide 32

Summary

Review of template matching

Hidden Markov Models

Dynamic programming for HMMs