HMM Presentation

Uploaded by niravuchat on 04-Apr-2018


TRANSCRIPT

  • Slide 1/31: Title

    Hidden Markov Model and Speech Recognition

    Nirav S. Uchat
    1 Dec, 2006

  • Slide 2/31: Outline

    1 Introduction
    2 Motivation - Why HMM?
    3 Understanding HMM
    4 HMM and Speech Recognition
    5 Isolated Word Recognizer

  • Slide 3/31: Introduction

    What is Speech Recognition?

    Understanding what is being said
    Mapping speech data to textual information

    Speech recognition is indeed challenging:

    due to the presence of noise in the input data
    variation in voice data due to the speaker's physical condition, mood, etc.
    difficulty in identifying boundary conditions

  • Slide 4/31: Different Types of Speech Recognition

    Type of Speaker

    Speaker Dependent (SD)
    - relatively easy to construct
    - requires less training data (only from the particular speaker)
    - also known as speaker recognition

    Speaker Independent (SI)
    - requires huge training data (from various speakers)
    - difficult to construct

    Type of Data

    Isolated Word Recognizer
    - recognizes a single word
    - easy to construct (a pointer toward more difficult speech recognition)
    - may be speaker dependent or speaker independent

    Continuous Speech Recognition
    - most difficult of all
    - problem of finding word boundaries

  • Slide 5/31: Outline (same as slide 2)

  • Slide 6/31: Use of Signal Model

    It helps us characterize the properties of the given signal
    It provides a theoretical basis for a signal processing system
    It is a way to understand how the system works
    We can simulate the source, which helps us understand as much as possible about the signal source

    Why Hidden Markov Model (HMM)?

    Very rich in mathematical structure
    When applied properly, works very well in practical applications

  • Slide 7/31: Outline (same as slide 2)

  • Slide 8/31: Components of HMM [2]

    1 Number of states = N
    2 Number of distinct observation symbols per state = M, V = {V_1, V_2, ..., V_M}
    3 State transition probabilities = a_ij
    4 Observation symbol probability distribution in state j, B_j(k) = P[V_k at t | q_t = S_j]
    5 The initial state distribution π_i = P[q_1 = S_i], 1 ≤ i ≤ N
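    As a concrete illustration, a minimal NumPy sketch of these components (the sizes N = 3 and M = 4 and the uniform values are made up for the example, not taken from the slides):

      import numpy as np

      N, M = 3, 4                    # number of states, number of symbols (illustrative)
      A  = np.full((N, N), 1.0 / N)  # A[i, j] = a_ij, state transition probabilities
      B  = np.full((N, M), 1.0 / M)  # B[j, k] = B_j(k), symbol distribution per state
      pi = np.full(N, 1.0 / N)       # pi[i] = P[q_1 = S_i], initial state distribution
      # Together these three arrays specify the model lambda = (A, B, pi).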

  • Slide 9/31: Problems for HMM: Problem 1 [2]

    Problem 1: Evaluation. Given the observation sequence O = O_1 O_2 ... O_T and the model λ = (A, B, π), how do we efficiently compute P(O|λ), the probability of the observation sequence given the model?

    Figure: Evaluation Problem

  • Slide 10/31: Problems 2 and 3 [2]

    Problem 2: Hidden State Determination (Decoding). Given the observation sequence O = O_1 O_2 ... O_T and the model λ = (A, B, π), how do we choose the BEST state sequence Q = q_1 q_2 ... q_T, one that is optimal in some meaningful sense? (In speech recognition this can be thought of as the states emitting the correct phonemes.)

    Problem 3: Learning. How do we adjust the model parameters λ = (A, B, π) to maximize P(O|λ)? In Problem 3 we try to optimize the model parameters so as to best describe how the given observation sequence comes about.

  • Slide 11/31: Solution for Problem 1: Forward Algorithm

    Direct evaluation is

    P(O|λ) = Σ_{q_1,...,q_T} π_{q_1} b_{q_1}(O_1) a_{q_1 q_2} b_{q_2}(O_2) ... a_{q_{T-1} q_T} b_{q_T}(O_T)

    which is an O(N^T) algorithm: at every time step we have N choices of state to make, and the total length is T.

    The forward algorithm uses dynamic programming to reduce this time complexity. It uses the forward variable α_t(i), defined as

    α_t(i) = P(O_1, O_2, ..., O_t, q_t = S_i | λ)

    i.e., the probability of the partial observation sequence O_1, O_2, ..., O_t and state S_i at time t, given the model λ.

  • Slide 12/31: Forward Variable

    Figure: Forward Variable

    The induction step is

    α_{t+1}(j) = [ Σ_{i=1}^{N} α_t(i) a_ij ] b_j(O_{t+1}),   1 ≤ t ≤ T-1, 1 ≤ j ≤ N
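    A minimal NumPy sketch of this recursion, with the same array conventions as the lambda = (A, B, pi) sketch above (illustrative, not from the original slides):

      import numpy as np

      def forward(A, B, pi, O):
          """P(O | lambda) for a discrete-observation HMM.

          A: (N, N) transitions, B: (N, M) emissions, pi: (N,) initial
          distribution, O: sequence of observation symbol indices.
          """
          T, N = len(O), A.shape[0]
          alpha = np.zeros((T, N))
          alpha[0] = pi * B[:, O[0]]      # initialization: alpha_1(i) = pi_i b_i(O_1)
          for t in range(T - 1):          # induction step from the slide
              alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
          return alpha[-1].sum()          # termination: sum over alpha_T(i)

    In practice the alpha values are rescaled at each step to avoid numerical underflow on long observation sequences.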

  • Slide 13/31: Solution for Problem 2: Decoding using the Viterbi Algorithm [1]

    Viterbi algorithm: to find the single best state sequence, we define the quantity

    δ_t(i) = max_{q_1, q_2, ..., q_{t-1}} P[q_1 q_2 ... q_{t-1}, q_t = i, O_1 O_2 ... O_t | λ]

    i.e., δ_t(i) is the best score along a single path, at time t, which accounts for the first t observations and ends in state S_i. By induction,

    δ_{t+1}(j) = [ max_i δ_t(i) a_ij ] b_j(O_{t+1})

    The key point is that the Viterbi algorithm is similar in implementation to the forward algorithm (except for the backtracking part). The major difference is maximization over the previous states in place of the summing procedure in the forward calculation.
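    A minimal sketch of Viterbi decoding under the same array conventions (illustrative; the backpointer array psi is the standard device for the backtracking part, not named on the slides):

      import numpy as np

      def viterbi(A, B, pi, O):
          """Best state sequence (as state indices) for observations O."""
          T, N = len(O), A.shape[0]
          delta = np.zeros((T, N))
          psi = np.zeros((T, N), dtype=int)       # backpointers for backtracking
          delta[0] = pi * B[:, O[0]]
          for t in range(1, T):
              scores = delta[t - 1][:, None] * A  # scores[i, j] = delta_{t-1}(i) a_ij
              psi[t] = scores.argmax(axis=0)      # max over previous state, not sum
              delta[t] = scores.max(axis=0) * B[:, O[t]]
          path = [int(delta[-1].argmax())]        # backtrack from best final state
          for t in range(T - 1, 0, -1):
              path.append(int(psi[t, path[-1]]))
          return path[::-1]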

  • Slide 14/31: Solution for Problem 3: Learning (Adjusting Model Parameters)

    Uses the Baum-Welch learning algorithm.

    The core quantities are

    ξ_t(i, j) = P(q_t = S_i, q_{t+1} = S_j | O, λ)

    i.e., the probability of being in state S_i at time t and state S_j at time t+1, given the model and the observation sequence, and

    γ_t(i) = the probability of being in state S_i at time t, given the observation sequence and the model.

    We can relate the two:

    γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j)

    The re-estimated parameters are:

    π̄_i = expected number of times in state S_i at time t = 1, which is γ_1(i)

  • Slide 15/31: Re-estimation (continued)

    ā_ij = expected number of transitions from state S_i to S_j / expected number of transitions from state S_i

         = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)

    b̄_j(k) = expected number of times in state j observing symbol v_k / expected number of times in state j

           = Σ_{t=1, s.t. O_t = V_k}^{T} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
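    A minimal sketch of one Baum-Welch re-estimation step for a single observation sequence. It reuses the forward recursion above and adds the standard backward variables beta, which the slides use implicitly but do not show (illustrative, without the scaling a real implementation needs):

      import numpy as np

      def backward(A, B, O):
          """beta_t(i) = P(O_{t+1} ... O_T | q_t = S_i, lambda)."""
          T, N = len(O), A.shape[0]
          beta = np.zeros((T, N))
          beta[-1] = 1.0
          for t in range(T - 2, -1, -1):
              beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
          return beta

      def baum_welch_step(A, B, pi, O):
          """One re-estimation of (A, B, pi) from observation sequence O."""
          O = np.asarray(O)
          T, N = len(O), A.shape[0]
          alpha = np.zeros((T, N))              # forward pass, as in forward() above
          alpha[0] = pi * B[:, O[0]]
          for t in range(T - 1):
              alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
          beta = backward(A, B, O)
          prob = alpha[-1].sum()                # P(O | lambda)
          # xi[t, i, j] = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / P(O | lambda)
          xi = (alpha[:-1, :, None] * A[None, :, :]
                * (B[:, O[1:]].T * beta[1:])[:, None, :]) / prob
          gamma = np.vstack([xi.sum(axis=2), alpha[-1] * beta[-1] / prob])
          new_pi = gamma[0]                     # pi_i = gamma_1(i)
          new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
          new_B = np.zeros_like(B)
          for k in range(B.shape[1]):           # b_j(k): restrict the sum to O_t = V_k
              new_B[:, k] = gamma[O == k].sum(axis=0) / gamma.sum(axis=0)
          return new_A, new_B, new_pi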

  • Slide 16/31: Outline (same as slide 2)

  • Slide 17/31: Block Diagram of ASR using HMM

    Figure: Block diagram of ASR using HMM

  • Slide 18/31: Basic Structure

    Phoneme

    The smallest unit of information in a speech signal (over roughly 10 msec) is the phoneme
    ONE : W AH N
    The English language has approximately 56 phonemes

    HMM structure for a phoneme

    This model is a first-order Markov model
    Transitions go from the previous state to the next state (no jumping)
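    What such a left-to-right ("no jumping") transition matrix might look like for one 3-state phoneme HMM (the probabilities here are made up for illustration):

      import numpy as np

      # States: start, mid, end. Each state either loops on itself or moves
      # one state forward -- no jumps, matching the first-order structure.
      A_phoneme = np.array([
          [0.6, 0.4, 0.0],   # start -> start or mid
          [0.0, 0.7, 0.3],   # mid   -> mid   or end
          [0.0, 0.0, 1.0],   # end   -> end (word-level wiring handles the exit)
      ])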

  • Slide 19/31: Questions to Ask

    What represents a state in HMM?

    There is an HMM for each phoneme
    3 states for each HMM; the states are: start, mid, and end
    ONE: has 3 HMMs, one for each of the phonemes W, AH, and N; each HMM has 3 states

    What is the output symbol?

    A symbol from Vector Quantization is used as the output symbol from a state
    Concatenation of symbols gives a phoneme

  • Slide 20/31: Front-End

    Its purpose is to parameterize an input signal (e.g., audio) into a sequence of feature vectors

    Methods for feature vector extraction are:

    MFCC - Mel Frequency Cepstral Coefficients
    LPC Analysis - Linear Predictive Coding
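    As a sketch of such a front-end, MFCC extraction using the librosa library (one possible choice; the file name "one.wav" and the 25 ms / 10 ms framing are assumptions, not from the slides):

      import librosa

      signal, sr = librosa.load("one.wav", sr=16000)   # placeholder input file
      # 13 coefficients per frame, ~25 ms windows with a 10 ms hop
      mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13,
                                  n_fft=int(0.025 * sr), hop_length=int(0.010 * sr))
      features = mfcc.T   # shape (num_frames, 13): one feature vector per frame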

  • Slide 21/31: Acoustic Modeling [1]

    Uses Vector Quantization to map feature vectors to symbols:

    create a training set of feature vectors
    cluster them into a small number of classes
    represent each class by a symbol
    for each class V_k, compute the probability that it is generated by a given HMM state
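    A minimal sketch of such a vector-quantization codebook using k-means (scikit-learn is an assumed choice, the codebook size of 256 is illustrative, and the random features stand in for real training data):

      import numpy as np
      from sklearn.cluster import KMeans

      M = 256                                      # codebook size (illustrative)
      train_features = np.random.randn(10000, 13)  # stand-in for real MFCC vectors
      codebook = KMeans(n_clusters=M, n_init=10).fit(train_features)

      def quantize(feature_vectors):
          """Map each feature vector to its nearest class index (the VQ symbol)."""
          return codebook.predict(feature_vectors)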

  • Slide 22/31: Creation of the Search Graph [3]

    The search graph represents the vocabulary under consideration
    The acoustic model, language model, and lexicon work together (via the decoder during recognition) to produce the search graph
    The language model represents how words are related to each other (which word follows which); it uses a bi-gram model
    The lexicon is a file containing WORD-PHONEME pairs
    So we have the whole vocabulary represented as a graph
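    A toy sketch of the ingredients that go into such a graph (the two-word vocabulary and the probabilities are invented for illustration):

      # Lexicon: WORD -> phoneme sequence
      lexicon = {
          "ONE": ["W", "AH", "N"],
          "TWO": ["T", "UW"],
      }

      # Bi-gram language model: P(word | previous word), with <s> as sentence start
      bigram = {
          ("<s>", "ONE"): 0.5, ("<s>", "TWO"): 0.5,
          ("ONE", "TWO"): 0.4, ("TWO", "ONE"): 0.4,
      }

      # Each word expands into the chain of its phoneme HMMs (linked through
      # non-emitting states), and bi-gram edges connect word ends to word starts,
      # yielding one big graph over which the decoder searches.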

  • Slide 23/31: Complete Example

    Figure: complete example

  • Slide 24/31: Training

    Training is used to adjust the model parameters to maximize the probability of recognition
    Audio data from various different sources are taken
    They are given to the prototype HMM
    The HMM adjusts its parameters using the Baum-Welch algorithm
    Once the model is trained, unknown data is given to it for recognition

  • Slide 25/31: Decoding

    It uses the Viterbi algorithm for finding the BEST state sequence

  • Slide 26/31: Decoding (continued)

    This is just for a single word; during decoding the whole graph is searched
    Each HMM has two non-emitting states for connecting it to other HMMs

  • Slide 27/31: Outline (same as slide 2)

  • Slide 28/31: Isolated Word Recognizer [4]

    Figure: isolated word recognizer

  • Slide 29/31: (figure continued)

  • Slide 30/31: Problems with Continuous Speech Recognition

    Boundary conditions
    Large vocabulary
    Training time
    Efficient search graph creation

  • Slide 31/31: References

    [1] Dan Jurafsky. CS 224S / LINGUIST 181: Speech Recognition and Synthesis. http://www.stanford.edu/class/cs224s/.

    [2] Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 1989.

    [3] Willie Walker, Paul Lamere, and Philip Kwok. Sphinx-4: A Flexible Open Source Framework for Speech Recognition. Sun Microsystems, 2004.

    [4] Steve Young and Gunnar Evermann. The HTK Book. Microsoft Corporation, 2005.