Part 6 HMM in Practice CSE717, SPRING 2008 CUBS, Univ at Buffalo

Part 6 HMM in Practice

CSE717, SPRING 2008

CUBS, Univ at Buffalo

Practical Problems in the HMM

Computation with Probabilities

Configuration of HMM

Robust Parameter Estimation (Feature Optimization, Tying)

Efficient Model Evaluation (Beam Search, Pruning)

Computation with Probabilities

Logarithmic Probability Representation

Lower Bounds for Probabilities

Codebook for Semi-Continuous HMMs

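Probability of State Sequence s for a Given Model λ

$$\Pr(s \mid \lambda) = \prod_{t=1}^{T} a_{s_{t-1}, s_t}$$

If all $a_{s_{t-1}, s_t} \approx 0.1$, then for a sequence of length $T > 100$,

$$\Pr(s \mid \lambda) \lesssim 10^{-100}$$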
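
To make the underflow concrete, the following minimal Python sketch compares direct multiplication of transition probabilities with accumulation of their negative logarithms. The function names and the toy sequence lengths (100 and 400 transitions, each with probability 0.1) are assumptions chosen for illustration and do not come from the slides.

```python
import math


def path_probability(transition_probs):
    """Multiply transition probabilities directly (prone to underflow)."""
    p = 1.0
    for a in transition_probs:
        p *= a
    return p


def path_neg_log_probability(transition_probs):
    """Accumulate -ln(a) instead; the running sum stays in a safe range."""
    return sum(-math.log(a) for a in transition_probs)


# Assumed toy setting: every transition has probability 0.1.
short_path = [0.1] * 100
long_path = [0.1] * 400

p_short = path_probability(short_path)            # tiny, but still representable
p_long = path_probability(long_path)              # underflows in double precision
nl_long = path_neg_log_probability(long_path)     # equals 400 * ln(10)
```

In double precision the product over the longer path collapses to zero, while the negative-log score remains an ordinary number of moderate size.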

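Logarithm Transformation

$$\tilde{p} = -\ln p$$

$$\delta_{t+1}(j) = \max_i \{\delta_t(i)\, a_{ij}\}\, b_j(O_{t+1})$$

$$\tilde{\delta}_{t+1}(j) = \min_i \{\tilde{\delta}_t(i) + \tilde{a}_{ij}\} + \tilde{b}_j(O_{t+1})$$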
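
The Viterbi recursion in the negative-log domain can be sketched as follows: products become sums and the maximization becomes a minimization. This is a minimal illustration for a discrete-output HMM. The function names `neg_log` and `viterbi_neg_log` and the list-of-lists layout for `pi`, `A`, and `B` are assumptions made for this example.

```python
import math

INF = float("inf")


def neg_log(p):
    """Map a probability to its negative logarithm; zero maps to +infinity."""
    return INF if p == 0.0 else -math.log(p)


def viterbi_neg_log(pi, A, B, obs):
    """Return (cost, path) where cost = -ln of the best path probability.

    pi[i]   : start probability of state i
    A[i][j] : transition probability from state i to state j
    B[j][o] : probability that state j emits the discrete symbol o
    """
    n = len(pi)
    cost = [neg_log(pi[j]) + neg_log(B[j][obs[0]]) for j in range(n)]
    backptrs = []
    for o in obs[1:]:
        new_cost, ptr = [], []
        for j in range(n):
            # min over predecessors replaces max; sums replace products
            best_i = min(range(n), key=lambda i: cost[i] + neg_log(A[i][j]))
            new_cost.append(cost[best_i] + neg_log(A[best_i][j]) + neg_log(B[j][o]))
            ptr.append(best_i)
        cost = new_cost
        backptrs.append(ptr)
    # Backtrack from the cheapest final state.
    last = min(range(n), key=lambda j: cost[j])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return cost[last], path
```

Calling `viterbi_neg_log` returns the accumulated negative-log score together with the state sequence, so the probability itself never has to be formed explicitly.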

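Kingsbury-Rayner Formula

$$p_3 = p_1 + p_2$$

$$
\begin{aligned}
\tilde{p}_3 &= -\ln(p_1 + p_2) = -\ln\bigl(p_1 (1 + p_2 / p_1)\bigr) \\
            &= -\{\ln p_1 + \ln(1 + e^{\ln p_2 - \ln p_1})\} \\
            &= \tilde{p}_1 - \ln\bigl(1 + e^{-(\tilde{p}_2 - \tilde{p}_1)}\bigr)
\end{aligned}
$$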
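
A minimal sketch of this log-domain addition in Python is given below. The function name `neg_log_add` is hypothetical. Swapping the arguments so that the larger probability comes first, and the special case for two zero probabilities, are implementation choices added here and are not stated on the slide.

```python
import math


def neg_log_add(p1_tilde, p2_tilde):
    """Return -ln(p1 + p2) given p1_tilde = -ln(p1) and p2_tilde = -ln(p2)."""
    # Put the larger probability (smaller negative log) first so that the
    # exponent below is never positive and exp() cannot overflow.
    if p2_tilde < p1_tilde:
        p1_tilde, p2_tilde = p2_tilde, p1_tilde
    if p1_tilde == math.inf:
        return math.inf  # both probabilities are zero
    return p1_tilde - math.log1p(math.exp(-(p2_tilde - p1_tilde)))
```

Each call costs one exponential and one logarithm, which is why the formula becomes expensive when it has to be applied to every term of a long sum.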

Mixture Density Model

$$b_j(x) = \sum_{k=1}^{M} c_{jk}\, g_{jk}(x)$$

Kingsbury-Rayner Formula is not advisable here (too many exps and logs)

Approximation

$$b_j(x) \approx \max_k \{c_{jk}\, g_{jk}(x)\}$$

$$\tilde{b}_j(x) \approx \min_k \{\tilde{c}_{jk} + \tilde{g}_{jk}(x)\}$$
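
The effect of the maximum approximation can be checked with a small Python sketch that evaluates a mixture both exactly and approximately in the negative-log domain. The function names and the argument layout (two parallel lists of negative-log weights and negative-log component densities) are assumptions for illustration.

```python
import math


def exact_neg_log_mixture(c_tilde, g_tilde):
    """-ln of sum_k c_k * g_k(x), leaving the log domain for the summation."""
    return -math.log(sum(math.exp(-(c + g)) for c, g in zip(c_tilde, g_tilde)))


def approx_neg_log_mixture(c_tilde, g_tilde):
    """Approximate the sum by its largest term: min_k (c_tilde_k + g_tilde_k)."""
    return min(c + g for c, g in zip(c_tilde, g_tilde))
```

Because the largest term can never exceed the full sum, the approximate negative-log score is never smaller than the exact one, and the two are close whenever one component dominates.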

Lower Bounds for Probabilities

Choose a minimal probability $p_{\min}$, so that $\tilde{p}_{\max} = -\ln p_{\min}$

For example: $b_j(x) \ge b_{\min}$

In training, this prevents certain states from being left out of parameter estimation

In decoding, this prevents paths through states with vanishing output probabilities from being discarded immediately
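
Flooring can be sketched in a few lines of Python. The value of `P_MIN` below is an arbitrary assumption, since the slides do not give a concrete bound, and the function names are hypothetical.

```python
import math

P_MIN = 1e-30                    # assumed floor; the slides give no value
NEG_LOG_MAX = -math.log(P_MIN)   # corresponding cap in the negative-log domain


def floored_neg_log(p, p_min=P_MIN):
    """Negative log of a probability after raising it to at least p_min."""
    return -math.log(max(p, p_min))


def cap_neg_log(p_tilde, cap=NEG_LOG_MAX):
    """Apply the same bound to a score that is already a negative log."""
    return min(p_tilde, cap)
```

With the floor in place, a vanishing output probability yields a large but finite score, so the corresponding path stays comparable to its competitors.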

Codebook Evaluation for Semi-Continuous HMMs

Semi-Continuous HMM

$$b_j(x) = \sum_{k=1}^{M} c_{jk}\, g_k(x) = \sum_{k=1}^{M} c_{jk}\, p(x \mid \omega_k)$$

$\omega_k$: class label

Codebook Evaluation for Semi-Continuous HMMs

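By Bayes’ Law

$$\Pr(\omega_k \mid x) = \frac{\Pr(\omega_k)\, p(x \mid \omega_k)}{p(x)} = \frac{\Pr(\omega_k)\, p(x \mid \omega_k)}{\sum_{m=1}^{M} \Pr(\omega_m)\, p(x \mid \omega_m)}$$

Assume $\Pr(\omega_k)$ can be approximated by a uniform distribution, then

$$\Pr(\omega_k \mid x) \approx \frac{p(x \mid \omega_k)}{\sum_{m=1}^{M} p(x \mid \omega_m)}$$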

Codebook Evaluation for Semi-Continuous HMMs

$$b'_j(x) = \sum_{k=1}^{M} c_{jk} \Pr(\omega_k \mid x) \approx \sum_{k=1}^{M} c_{jk}\, \frac{p(x \mid \omega_k)}{\sum_{m=1}^{M} p(x \mid \omega_m)}$$

This reduces the dynamic range of all quantities involved
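
The normalization of the codebook densities can be sketched in Python as follows. The function names are hypothetical, `densities[k]` is assumed to hold the shared codebook density p(x | ω_k) for the current feature vector, and a uniform class prior is assumed as on the slides.

```python
def codebook_posteriors(densities):
    """Turn p(x | omega_k) into Pr(omega_k | x), assuming a uniform prior."""
    total = sum(densities)
    if total == 0.0:
        raise ValueError("all codebook densities are zero for this vector")
    return [d / total for d in densities]


def semi_continuous_score(weights_j, densities):
    """Score of state j: sum_k c_jk * Pr(omega_k | x)."""
    posteriors = codebook_posteriors(densities)
    return sum(c * p for c, p in zip(weights_j, posteriors))
```

Since the posteriors sum to one and the mixture weights of a state sum to one, the resulting score lies between 0 and 1 regardless of how small the raw densities are.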

Configuration of HMM

Model Topology

Modularization

Compound Models

Modeling Emissions

Model Topology

Input data of speech and handwriting recognition exhibit a chronological or linear structure

An ergodic (fully connected) model is not necessary

Linear Model

The simplest model that describes chronological sequences

Transitions to the next state and to the current state are allowed

Bakis Model

Skipping of states is allowed

Larger flexibility in the modeling of duration

Widely used in speech and handwriting recognition

Left-to-right Model

An arbitrary number of states may be skipped in the forward direction

Jumping back to “past” states is not allowed

Can describe larger variations in the temporal structure; longer parts of the data may be missing
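
The three topologies differ only in which forward transitions are allowed, which can be sketched in Python as a mask over the transition matrix. The function names and the `max_skip` parameter are hypothetical. The sketch assumes that the Bakis model skips at most one state, and the uniform initialization is just one simple choice of starting values.

```python
def allowed_transitions(n_states, max_skip):
    """mask[i][j] is True if the transition from state i to state j is allowed.

    max_skip = 0            -> linear model (self-loop and next state)
    max_skip = 1            -> Bakis model (assumed: skip at most one state)
    max_skip >= n_states    -> left-to-right model (any forward jump)
    """
    return [[0 <= j - i <= max_skip + 1 for j in range(n_states)]
            for i in range(n_states)]


def uniform_transition_matrix(mask):
    """Spread each row's probability mass evenly over its allowed transitions."""
    matrix = []
    for row in mask:
        n_allowed = sum(row)  # at least 1, because the self-loop is always allowed
        matrix.append([1.0 / n_allowed if ok else 0.0 for ok in row])
    return matrix
```

Transitions that start at zero stay at zero during Baum-Welch re-estimation, so the mask fixes the topology for the whole of training.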

Modularization

English Word Recognition

Thousands of words: thousands of word models or more; requires a large amount of training data

26 letters: limited number of character models

Modularization: divides a complex model into smaller models of segmentation units

Word -> subword -> character
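
How modularization limits the number of parameters can be illustrated with a short Python sketch that builds word models by concatenating character models. The three-states-per-character setting, the small lexicon, and all names are assumptions made for this example.

```python
def word_state_sequence(word, states_per_char):
    """Linear word model as a list of (character, local state index) keys."""
    return [(ch, s) for ch in word for s in range(states_per_char[ch])]


# Assumed setup: every character model has three states.
states_per_char = {ch: 3 for ch in "abcdefghijklmnopqrstuvwxyz"}
lexicon = ["speech", "cheese", "achieve"]

word_models = {w: word_state_sequence(w, states_per_char) for w in lexicon}

# States that actually need their own parameters, because all words
# share the same character models.
shared_states = {key for seq in word_models.values() for key in seq}
total_word_states = sum(len(seq) for seq in word_models.values())
```

The number of distinct states grows with the alphabet rather than with the lexicon, so adding a word made of known characters requires no new parameters.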

Variation of Segmentation Units in Different Context

Phonetic transcription of word “speech”: /spitS/

The /i/ in “speech” cannot easily be distinguished from the /i/ in achieve (/@tSiv/), cheese (/tSiz/), or reality (/riEl@ti/)

Triphone [Schwartz, 1984]

Three immediately neighboring phone units taken as a segmentation unit, e.g., p/i/t

Eliminates the dependence of the variability of segmentation units on the context

Compound Models

Parallel connection of all individual word models

HMM structure for isolated word recognition

Circles: Model States

Squares: Non-emission States

HMM structure for connected word recognition

Grammar Coded into HMM

Modeling Emissions

Continuous feature vectors in the fields of speech and handwriting recognition are described by mixture models

Size of the codebook and number of component densities per mixture density need to be decided

No general way; a compromise between the precision of the model, its generalization capabilities, and the computation time

Semi-Continuous Model

Size of codebook: some hundred up to a few thousand densities

Mixture Model: 8 to 64 component densities

References

[1] Schwartz R, Chow Y, Roucos S, Krasner M, Makhoul J. Improved hidden Markov modeling of phonemes for continuous speech recognition. In: International Conference on Acoustics, Speech and Signal Processing, pp. 35.6.1-35.6.4, 1984.

Robust Parameter Estimation

Feature Optimization

Tying

Feature Optimization Techniques