Automated Speech Recognition
By: Amichai Painsky
Automated Speech Recognition - setup
• Input – speech waveform:
• Preprocessing:
• Modeling:
• Output – transcription: The boy is in the red house
ASR - basics
• Observations $O = o_1, \dots, o_T$ representing a speech signal
• Vocabulary V of different words
• Our goal – find the most likely word sequence: $\hat{W} = \arg\max_W P(W \mid O)$
• Since $P(W \mid O) = \frac{P(O \mid W)\, P(W)}{P(O)}$, we have $\hat{W} = \arg\max_W P(O \mid W)\, P(W)$, where $P(W)$ is the language modeling term and $P(O \mid W)$ is the acoustic modeling term
Observations preprocessing
• A sampled waveform is converted into a sequence of parameter (feature) vectors at a fixed frame rate
• A frame shift of 10 ms is usually taken, because a speech signal can be assumed stationary over intervals of that order
• Many different ways to extract meaningful features have been developed, some based on acoustic concepts, knowledge of the human vocal tract, or psychophysical knowledge of human perception
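The framing step above can be sketched as follows. The 25 ms window length, 16 kHz sampling rate, and the `frame_signal` helper are illustrative assumptions, not values from the slides; a real front end (e.g. MFCC extraction) would then window and transform each frame.

```python
def frame_signal(samples, sample_rate=16000, win_ms=25, hop_ms=10):
    # Slice a sampled waveform into overlapping analysis frames:
    # a new frame starts every hop_ms (the 10 ms frame shift above).
    win = int(sample_rate * win_ms / 1000)   # samples per frame
    hop = int(sample_rate * hop_ms / 1000)   # samples between frame starts
    frames = []
    start = 0
    while start + win <= len(samples):
        frames.append(samples[start:start + win])
        start += hop
    return frames

# One second of audio at 16 kHz -> one 400-sample frame every 160 samples
frames = frame_signal([0.0] * 16000)
```

Each frame is then treated as approximately stationary and mapped to one parameter vector.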
Language modeling
• Most generally, the probability of a sequence of m words is $P(w_1, \dots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1})$
• Language is highly structured, and limited histories are capable of capturing quite a bit of this structure. Bigram models: $P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-1})$
• More powerful two-word-history models (trigrams): $P(w_i \mid w_{i-2}, w_{i-1})$
• Longer history -> exponentially increasing number of models -> more data required to train, more parameters, more overfitting
• Partial Matching modeling
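As a sketch, maximum-likelihood bigram estimates are just normalized counts. The toy corpus and the `bigram_probs` helper are illustrative, not from the slides:

```python
from collections import Counter

def bigram_probs(corpus):
    # Maximum-likelihood bigram estimates: P(w2 | w1) = count(w1 w2) / count(w1)
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])               # every word that starts a bigram
        bigrams.update(zip(words[:-1], words[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

probs = bigram_probs(["the boy is in the red house",
                      "the boy is in the house"])
```

Here "the" is followed by "boy" in 2 of its 4 occurrences, so P(boy | the) = 0.5; with longer histories the count tables grow exponentially, which is the data-sparsity problem noted above.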
Acoustic modeling
• Determines which sounds are produced when a given sentence is uttered
• The number of possibilities is infinite! (it depends on the speaker, the ambiance, microphone placement, etc.)
• Possible solution – a parametric model in the form of a Hidden Markov Model (HMM)
• Note that other solutions may also apply (for example, neural networks)
Hidden Markov Model
• A simple example of an HMM: a three-state model emitting binary symbols (used in the examples below)
Hidden Markov Model – Forward Algorithm
• Given an observation sequence (for example 10110), what is the probability that it was generated by a given HMM (for example, the HMM from the previous slides)?
• For a path $q = 12312$ and a given HMM $\lambda$: $P(O, q \mid \lambda) = \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)$
• Therefore, summing over all possible paths: $P(O \mid \lambda) = \sum_{q} P(O, q \mid \lambda)$
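This sum can be computed by brute-force enumeration of every state path. The slides' HMM parameters are in an image and not recoverable from this transcript, so the sketch below uses assumed toy parameters (two states and binary outputs instead of the slides' three states):

```python
from itertools import product

# Assumed toy HMM (illustrative parameters, not from the slides)
pi = [0.6, 0.4]                        # initial state probabilities
A  = [[0.7, 0.3], [0.4, 0.6]]          # A[i][j] = P(q_{t+1} = j | q_t = i)
B  = [[0.9, 0.1], [0.2, 0.8]]          # B[i][k] = P(o_t = k | q_t = i)

def prob_by_enumeration(obs):
    # P(O | lambda) = sum over all N^T state paths of P(O, q | lambda)
    total = 0.0
    for path in product(range(len(pi)), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total

p = prob_by_enumeration([1, 0, 1, 1, 0])
```

Enumerating all $N^T$ paths is exponential in the sequence length, which is exactly what motivates the forward algorithm on the next slides.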
• Complexity: for a sequence of T observations, each path requires about 2T multiplications. The total number of paths is $N^T$, therefore the overall cost is $O(2T \cdot N^T)$
• A more efficient approach – the forward algorithm
• Forward algorithm: calculate the probabilities of all partial sequences at each time step, using the results from the previous step (dynamic programming)
• Define $\alpha_t(i)$ – the probability of being in state i at time t and having observed the partial sequence $o_1, \dots, o_t$: $\alpha_t(i) = P(o_1, \dots, o_t,\, q_t = i \mid \lambda)$
• The recursion: $\alpha_1(i) = \pi_i\, b_i(o_1)$, $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o_{t+1})$, and finally $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$
• Complexity – at time t each calculation involves only the N previous values $\alpha_{t-1}(i)$. The length of the sequence is T; therefore each state requires $O(NT)$ operations, and for the total of N states we need $O(N^2 T)$
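A minimal sketch of the α-recursion (the toy two-state parameters are assumed for illustration, not taken from the slides):

```python
# Assumed toy parameters (illustrative, not from the slides)
pi = [0.6, 0.4]                        # initial state probabilities
A  = [[0.7, 0.3], [0.4, 0.6]]          # A[i][j] = P(q_{t+1} = j | q_t = i)
B  = [[0.9, 0.1], [0.2, 0.8]]          # B[i][k] = P(o_t = k | q_t = i)

def forward(obs, pi, A, B):
    # alpha[t][i] = P(o_1..o_t, q_t = i | lambda), built by dynamic programming
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]   # initialization alpha_1
    for t in range(1, len(obs)):
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])                # recursion step
    return sum(alpha[-1])                                # P(O | lambda)

p = forward([1, 0, 1], pi, A, B)
```

Each step reuses only the N values from the previous step, giving the $O(N^2 T)$ cost noted above instead of the exponential enumeration.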
Hidden Markov Model – Viterbi algorithm
• Previously: given an observation sequence (for example 10110), what is the probability that it was generated by a given HMM?
• We now ask: given an observation sequence, what is the sequence of states that is most likely to have generated it?
• Define $\delta_t(i)$ as the probability of the best path from the start state to state i at time t: $\delta_t(i) = \max_{q_1, \dots, q_{t-1}} P(q_1, \dots, q_{t-1},\, q_t = i,\, o_1, \dots, o_t \mid \lambda)$
• $\max_i \delta_T(i)$ is our objective
• We solve this with the same forward recursion as before, but with maximization instead of summation: $\delta_{t+1}(j) = \left[ \max_i \delta_t(i)\, a_{ij} \right] b_j(o_{t+1})$
Hidden Markov Model – Viterbi algorithm, example
• Observation sequence: 101
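The slide's worked example is an image and its HMM values are not recoverable from this transcript, so the sketch below runs the Viterbi recursion on the sequence 101 with assumed toy parameters (two states instead of the slides' three, and 0-based state indices):

```python
# Assumed toy parameters (illustrative, not from the slides)
pi = [0.6, 0.4]                        # initial state probabilities
A  = [[0.7, 0.3], [0.4, 0.6]]          # A[i][j] = P(q_{t+1} = j | q_t = i)
B  = [[0.9, 0.1], [0.2, 0.8]]          # B[i][k] = P(o_t = k | q_t = i)

def viterbi(obs, pi, A, B):
    # delta[i] = probability of the best path ending in state i at the current time
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]
    backptr = []                                  # backpointers for path recovery
    for t in range(1, len(obs)):
        step, ptr = [], []
        for j in range(N):
            best = max(range(N), key=lambda i: delta[i] * A[i][j])
            step.append(delta[best] * A[best][j] * B[j][obs[t]])
            ptr.append(best)
        delta = step
        backptr.append(ptr)
    # backtrack from the best final state
    state = max(range(N), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, max(delta)

path, p = viterbi([1, 0, 1], pi, A, B)
```

The recursion is the forward algorithm with max in place of sum, plus backpointers so the argmax state sequence can be read off at the end.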
Hidden Markov Model – model fitting
• In practice, the parameters of the HMM are unknown
• We are interested in $\hat{\lambda} = \arg\max_{\lambda} P(O \mid \lambda)$
• There is no analytical maximum-likelihood solution, so we turn to the Baum-Welch algorithm, also known as the forward-backward algorithm
• Basic idea – count the visits to each state and the number of transitions to derive probability estimators
• Define the backward variable: $\beta_t(i) = P(o_{t+1}, \dots, o_T \mid q_t = i, \lambda)$
This is the conditional probability that $o_{t+1}, \dots, o_T$ are observed, given that the system is in state i at time t, under the model $\lambda$. It can be calculated inductively: $\beta_T(i) = 1$, $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$
• Recall $\alpha_t(i)$ – the probability of being in state i at time t and having observed the partial sequence $o_1, \dots, o_t$
• Combining the forward and backward variables, the probability of being in state i at time t, given the entire observation sequence and the model, is simply: $\gamma_t(i) = \dfrac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)} = \dfrac{\alpha_t(i)\, \beta_t(i)}{P(O \mid \lambda)}$
• Define $\xi_t(i,j)$ – the probability of being in state i at time t and state j at time t+1, given the model and the observation sequence: $\xi_t(i,j) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$
• We are now ready to introduce the parameter estimators:
Transition probability estimator: $\hat{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$
This is the expected number of transitions from state i to state j, normalized by the expected number of visits to state i
Observation probability estimator: $\hat{b}_j(k) = \dfrac{\sum_{t \,:\, o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$
This is the expected number of times the symbol $v_k$ was observed in state j, normalized by the expected number of times the system visited state j
• Notice that the parameters we wish to estimate actually appear on both sides of the equations (the estimators depend on $\alpha$, $\beta$, $\gamma$ and $\xi$, which are computed from the current parameters)
• Therefore, we use an iterative procedure: starting from an initial guess of the parameters, we update them at each iteration and terminate once the changes fall below a chosen threshold
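The whole loop can be sketched compactly for a discrete-output HMM. This is one possible implementation under assumed toy starting parameters and a made-up observation sequence (none of these values come from the slides); it runs a fixed number of iterations rather than testing convergence, for brevity:

```python
def baum_welch(obs, pi, A, B, n_iter=10):
    # Baum-Welch (EM) for a discrete-output HMM.
    # E-step: compute alpha, beta, then gamma and xi.
    # M-step: re-estimate pi, A, B as expected-count ratios.
    N, T = len(pi), len(obs)
    for _ in range(n_iter):
        alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]        # forward pass
        for t in range(1, T):
            alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N))
                          * B[j][obs[t]] for j in range(N)])
        beta = [[1.0] * N]                                        # backward pass
        for t in range(T - 2, -1, -1):
            beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * beta[0][j]
                                for j in range(N)) for i in range(N)])
        p_obs = sum(alpha[-1])                                    # P(O | lambda)
        # gamma[t][i] = P(q_t = i | O); xi[t][i][j] = P(q_t = i, q_{t+1} = j | O)
        gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)]
                 for t in range(T)]
        xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
                for j in range(N)] for i in range(N)] for t in range(T - 1)]
        # M-step: the expected-count estimators from the slides
        pi = gamma[0][:]
        A = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
        K = len(B[0])
        B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k)
              / sum(gamma[t][j] for t in range(T))
              for k in range(K)] for j in range(N)]
    return pi, A, B

pi, A, B = baum_welch([0, 1, 1, 0, 0, 1, 0, 1, 1, 0],
                      [0.5, 0.5],
                      [[0.6, 0.4], [0.3, 0.7]],
                      [[0.7, 0.3], [0.2, 0.8]])
```

Each iteration preserves the stochastic constraints (rows of A and B, and the vector pi, all sum to 1), and production implementations additionally work in log space or rescale alpha and beta to avoid underflow on long sequences.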
• For continuous observations, each state emits according to a continuous density (for example, a Gaussian)
• We estimate the mean and variance for each state j: $\hat{\mu}_j = \dfrac{\sum_{t=1}^{T} \gamma_t(j)\, o_t}{\sum_{t=1}^{T} \gamma_t(j)}$, $\hat{\sigma}_j^2 = \dfrac{\sum_{t=1}^{T} \gamma_t(j)\, (o_t - \hat{\mu}_j)^2}{\sum_{t=1}^{T} \gamma_t(j)}$
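As a sketch, the update for one state j is a γ-weighted sample mean and variance; the observation values and occupation probabilities below are made up for illustration:

```python
def gaussian_update(obs, gamma_j):
    # Re-estimate the mean and variance of state j from its occupation
    # probabilities gamma_t(j): a gamma-weighted sample mean and variance.
    w = sum(gamma_j)
    mu = sum(g * o for g, o in zip(gamma_j, obs)) / w
    var = sum(g * (o - mu) ** 2 for g, o in zip(gamma_j, obs)) / w
    return mu, var

mu, var = gaussian_update([1.0, 2.0, 3.0], [0.2, 0.5, 0.3])
```

The γ values come from the same forward-backward computation as in the discrete case; only the M-step for the emission model changes.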
Conclusions and final remarks
• We learned how to:
I. Estimate HMM parameters from a sequence of observations
II. Determine the probability of observing a sequence given an HMM
III. Determine the most likely sequence of states, given an HMM and a sequence of observations
• Notice that the states may represent words, syllables, phonemes, etc. This is up to the system architect to decide
• For example, words are more informative than syllables, but they result in more states and less accurate probability estimation (curse of dimensionality)
Questions?
Thank you!