Automated Speech Recognition
By: Amichai Painsky
Automated Speech Recognition - setup
• Input – speech waveform:
• Preprocessing:
• Modeling:
• Output – transcription: The boy is in the red house
ASR - basics
• Observations $O = o_1, \dots, o_T$ representing a speech signal
• Vocabulary V of different words
• Our goal – find the most likely word sequence: $\hat{W} = \arg\max_W P(W \mid O)$
• Since $P(W \mid O) = \frac{P(O \mid W)\, P(W)}{P(O)}$, we have $\hat{W} = \arg\max_W P(O \mid W)\, P(W)$, where $P(W)$ is the language modeling term and $P(O \mid W)$ is the acoustic modeling term
Observations preprocessing
• A sampled waveform is converted into a sequence of parameter (feature) vectors at a fixed frame rate
• A frame shift of 10 ms is usually taken, because a speech signal can be assumed stationary over intervals of that order
• Many different ways to extract meaningful features have been developed, some based on acoustic concepts, knowledge of the human vocal tract, or psychophysical knowledge of human perception
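The framing step above can be sketched as follows. The 25 ms window length, 16 kHz sampling rate, and the `frame_signal` helper are illustrative assumptions, not values from the slides; a real front end (e.g. MFCC extraction) would then window and transform each frame.

```python
def frame_signal(samples, sample_rate=16000, win_ms=25, hop_ms=10):
    # Slice a sampled waveform into overlapping analysis frames:
    # a new frame starts every hop_ms (the 10 ms frame shift above).
    win = int(sample_rate * win_ms / 1000)   # samples per frame
    hop = int(sample_rate * hop_ms / 1000)   # samples between frame starts
    frames = []
    start = 0
    while start + win <= len(samples):
        frames.append(samples[start:start + win])
        start += hop
    return frames

# One second of audio at 16 kHz -> one 400-sample frame every 160 samples
frames = frame_signal([0.0] * 16000)
```

Each frame is then treated as approximately stationary and mapped to one parameter vector.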
Language modeling
• Most generally, the probability of a sequence of m words is $P(w_1, \dots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1})$
• Language is highly structured, and limited histories are capable of capturing quite a bit of this structure. Bigram models: $P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-1})$
• More powerful two-word-history models (trigrams): $P(w_i \mid w_{i-2}, w_{i-1})$
• Longer history -> exponentially increasing number of models -> more data required to train, more parameters, more overfitting
• Partial Matching modeling
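As a sketch, maximum-likelihood bigram estimates are just normalized counts. The toy corpus and the `bigram_probs` helper are illustrative, not from the slides:

```python
from collections import Counter

def bigram_probs(corpus):
    # Maximum-likelihood bigram estimates: P(w2 | w1) = count(w1 w2) / count(w1)
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])               # every word that starts a bigram
        bigrams.update(zip(words[:-1], words[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

probs = bigram_probs(["the boy is in the red house",
                      "the boy is in the house"])
```

Here "the" is followed by "boy" in 2 of its 4 occurrences, so P(boy | the) = 0.5; with longer histories the count tables grow exponentially, which is the data-sparsity problem noted above.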
Acoustic modeling
• Determines which sounds are produced when a given sentence is uttered
• The number of possibilities is infinite! (it depends on the speaker, the ambiance, microphone placement, etc.)
• Possible solution – a parametric model in the form of a Hidden Markov Model (HMM)
• Note that other solutions may also apply (for example, neural networks)
Hidden Markov Model
• A simple example of an HMM: a three-state model emitting binary symbols (used in the examples below)
Hidden Markov Model – Forward Algorithm
• Given an observation sequence (for example 10110), what is the probability that it was generated by a given HMM (for example, the HMM from the previous slides)?
• For a path $q = 12312$ and a given HMM $\lambda$: $P(O, q \mid \lambda) = \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)$
• Therefore, summing over all possible paths: $P(O \mid \lambda) = \sum_{q} P(O, q \mid \lambda)$
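This sum can be computed by brute-force enumeration of every state path. The slides' HMM parameters are in an image and not recoverable from this transcript, so the sketch below uses assumed toy parameters (two states and binary outputs instead of the slides' three states):

```python
from itertools import product

# Assumed toy HMM (illustrative parameters, not from the slides)
pi = [0.6, 0.4]                        # initial state probabilities
A  = [[0.7, 0.3], [0.4, 0.6]]          # A[i][j] = P(q_{t+1} = j | q_t = i)
B  = [[0.9, 0.1], [0.2, 0.8]]          # B[i][k] = P(o_t = k | q_t = i)

def prob_by_enumeration(obs):
    # P(O | lambda) = sum over all N^T state paths of P(O, q | lambda)
    total = 0.0
    for path in product(range(len(pi)), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total

p = prob_by_enumeration([1, 0, 1, 1, 0])
```

Enumerating all $N^T$ paths is exponential in the sequence length, which is exactly what motivates the forward algorithm on the next slides.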
• Complexity: for a sequence of T observations, each path requires about 2T multiplications. The total number of paths is $N^T$, therefore the overall cost is $O(2T \cdot N^T)$
• A more efficient approach – the forward algorithm
• Forward algorithm: calculate the probabilities of all partial sequences at each time step, using the results from the previous step (dynamic programming)
• Define $\alpha_t(i)$ – the probability of being in state i at time t and having observed the partial sequence $o_1, \dots, o_t$: $\alpha_t(i) = P(o_1, \dots, o_t,\, q_t = i \mid \lambda)$
• The recursion: $\alpha_1(i) = \pi_i\, b_i(o_1)$, $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o_{t+1})$, and finally $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$
• Complexity – at time t each calculation involves only the N previous values $\alpha_{t-1}(i)$. The length of the sequence is T; therefore each state requires $O(NT)$ operations, and for the total of N states we need $O(N^2 T)$
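A minimal sketch of the α-recursion (the toy two-state parameters are assumed for illustration, not taken from the slides):

```python
# Assumed toy parameters (illustrative, not from the slides)
pi = [0.6, 0.4]                        # initial state probabilities
A  = [[0.7, 0.3], [0.4, 0.6]]          # A[i][j] = P(q_{t+1} = j | q_t = i)
B  = [[0.9, 0.1], [0.2, 0.8]]          # B[i][k] = P(o_t = k | q_t = i)

def forward(obs, pi, A, B):
    # alpha[t][i] = P(o_1..o_t, q_t = i | lambda), built by dynamic programming
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]   # initialization alpha_1
    for t in range(1, len(obs)):
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])                # recursion step
    return sum(alpha[-1])                                # P(O | lambda)

p = forward([1, 0, 1], pi, A, B)
```

Each step reuses only the N values from the previous step, giving the $O(N^2 T)$ cost noted above instead of the exponential enumeration.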
Hidden Markov Model – Viterbi algorithm
• Previously: given an observation sequence (for example 10110), what is the probability that it was generated by a given HMM?
• We now ask: given an observation sequence, what is the sequence of states that is most likely to have generated it?
• Define $\delta_t(i)$ as the probability of the best path from the start state to state i at time t: $\delta_t(i) = \max_{q_1, \dots, q_{t-1}} P(q_1, \dots, q_{t-1},\, q_t = i,\, o_1, \dots, o_t \mid \lambda)$
• $\max_i \delta_T(i)$ is our objective
• We solve this with the same forward recursion as before, but with maximization instead of summation: $\delta_{t+1}(j) = \left[ \max_i \delta_t(i)\, a_{ij} \right] b_j(o_{t+1})$
Hidden Markov Model – Viterbi algorithm, example
• Observation sequence: 101
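The slide's worked example is an image and its HMM values are not recoverable from this transcript, so the sketch below runs the Viterbi recursion on the sequence 101 with assumed toy parameters (two states instead of the slides' three, and 0-based state indices):

```python
# Assumed toy parameters (illustrative, not from the slides)
pi = [0.6, 0.4]                        # initial state probabilities
A  = [[0.7, 0.3], [0.4, 0.6]]          # A[i][j] = P(q_{t+1} = j | q_t = i)
B  = [[0.9, 0.1], [0.2, 0.8]]          # B[i][k] = P(o_t = k | q_t = i)

def viterbi(obs, pi, A, B):
    # delta[i] = probability of the best path ending in state i at the current time
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]
    backptr = []                                  # backpointers for path recovery
    for t in range(1, len(obs)):
        step, ptr = [], []
        for j in range(N):
            best = max(range(N), key=lambda i: delta[i] * A[i][j])
            step.append(delta[best] * A[best][j] * B[j][obs[t]])
            ptr.append(best)
        delta = step
        backptr.append(ptr)
    # backtrack from the best final state
    state = max(range(N), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, max(delta)

path, p = viterbi([1, 0, 1], pi, A, B)
```

The recursion is the forward algorithm with max in place of sum, plus backpointers so the argmax state sequence can be read off at the end.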
Hidden Markov Model – model fitting
• In practice, the parameters of the HMM are unknown
• We are interested in $\hat{\lambda} = \arg\max_{\lambda} P(O \mid \lambda)$
• There is no analytical maximum-likelihood solution, so we turn to the Baum-Welch algorithm, also known as the forward-backward algorithm
• Basic idea – count the visits to each state and the number of transitions to derive probability estimators
• Define the backward variable: $\beta_t(i) = P(o_{t+1}, \dots, o_T \mid q_t = i, \lambda)$
This is the conditional probability that $o_{t+1}, \dots, o_T$ are observed, given that the system is in state i at time t, under the model $\lambda$. It can be calculated inductively: $\beta_T(i) = 1$, $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$
• Recall $\alpha_t(i)$ – the probability of being in state i at time t and having observed the partial sequence $o_1, \dots, o_t$
• Combining the forward and backward variables, the probability of being in state i at time t, given the entire observation sequence and the model, is simply: $\gamma_t(i) = \dfrac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)} = \dfrac{\alpha_t(i)\, \beta_t(i)}{P(O \mid \lambda)}$
• Define $\xi_t(i,j)$ – the probability of being in state i at time t and state j at time t+1, given the model and the observation sequence: $\xi_t(i,j) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$
• We are now ready to introduce the parameter estimators:
Transition probability estimator: $\hat{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$
This is the expected number of transitions from state i to state j, normalized by the expected number of visits to state i
Observation probability estimator: $\hat{b}_j(k) = \dfrac{\sum_{t \,:\, o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$
This is the expected number of times the symbol $v_k$ was observed in state j, normalized by the expected number of times the system visited state j
• Notice that the parameters we wish to estimate actually appear on both sides of the equations (the estimators depend on $\alpha$, $\beta$, $\gamma$ and $\xi$, which are computed from the current parameters)
• Therefore, we use an iterative procedure: starting from an initial guess of the parameters, we update them at each iteration and terminate once the changes fall below a chosen threshold
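The whole loop can be sketched compactly for a discrete-output HMM. This is one possible implementation under assumed toy starting parameters and a made-up observation sequence (none of these values come from the slides); it runs a fixed number of iterations rather than testing convergence, for brevity:

```python
def baum_welch(obs, pi, A, B, n_iter=10):
    # Baum-Welch (EM) for a discrete-output HMM.
    # E-step: compute alpha, beta, then gamma and xi.
    # M-step: re-estimate pi, A, B as expected-count ratios.
    N, T = len(pi), len(obs)
    for _ in range(n_iter):
        alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]        # forward pass
        for t in range(1, T):
            alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N))
                          * B[j][obs[t]] for j in range(N)])
        beta = [[1.0] * N]                                        # backward pass
        for t in range(T - 2, -1, -1):
            beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * beta[0][j]
                                for j in range(N)) for i in range(N)])
        p_obs = sum(alpha[-1])                                    # P(O | lambda)
        # gamma[t][i] = P(q_t = i | O); xi[t][i][j] = P(q_t = i, q_{t+1} = j | O)
        gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)]
                 for t in range(T)]
        xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
                for j in range(N)] for i in range(N)] for t in range(T - 1)]
        # M-step: the expected-count estimators from the slides
        pi = gamma[0][:]
        A = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
        K = len(B[0])
        B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k)
              / sum(gamma[t][j] for t in range(T))
              for k in range(K)] for j in range(N)]
    return pi, A, B

pi, A, B = baum_welch([0, 1, 1, 0, 0, 1, 0, 1, 1, 0],
                      [0.5, 0.5],
                      [[0.6, 0.4], [0.3, 0.7]],
                      [[0.7, 0.3], [0.2, 0.8]])
```

Each iteration preserves the stochastic constraints (rows of A and B, and the vector pi, all sum to 1), and production implementations additionally work in log space or rescale alpha and beta to avoid underflow on long sequences.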
• For continuous observations, each state emits according to a continuous density (for example, a Gaussian)
• We estimate the mean and variance for each state j: $\hat{\mu}_j = \dfrac{\sum_{t=1}^{T} \gamma_t(j)\, o_t}{\sum_{t=1}^{T} \gamma_t(j)}$, $\hat{\sigma}_j^2 = \dfrac{\sum_{t=1}^{T} \gamma_t(j)\, (o_t - \hat{\mu}_j)^2}{\sum_{t=1}^{T} \gamma_t(j)}$
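As a sketch, the update for one state j is a γ-weighted sample mean and variance; the observation values and occupation probabilities below are made up for illustration:

```python
def gaussian_update(obs, gamma_j):
    # Re-estimate the mean and variance of state j from its occupation
    # probabilities gamma_t(j): a gamma-weighted sample mean and variance.
    w = sum(gamma_j)
    mu = sum(g * o for g, o in zip(gamma_j, obs)) / w
    var = sum(g * (o - mu) ** 2 for g, o in zip(gamma_j, obs)) / w
    return mu, var

mu, var = gaussian_update([1.0, 2.0, 3.0], [0.2, 0.5, 0.3])
```

The γ values come from the same forward-backward computation as in the discrete case; only the M-step for the emission model changes.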
Conclusions and final remarks
• We learned how to:
I. Estimate HMM parameters from a sequence of observations
II. Determine the probability of observing a sequence given an HMM
III. Determine the most likely sequence of states, given an HMM and a sequence of observations
• Notice that the states may represent words, syllables, phonemes, etc. This is up to the system architect to decide
• For example, words are more informative than syllables, but they result in more states and less accurate probability estimation (curse of dimensionality)
Questions?
Thank you!