![Page 1: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/1.jpg)
![Page 2: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/2.jpg)
![Page 3: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/3.jpg)
Feature extractor
![Page 4: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/4.jpg)
Feature extractor
Mel-Frequency Cepstral Coefficients (MFCCs)
Feature vectors
![Page 5: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/5.jpg)
Acoustic Observations
![Page 6: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/6.jpg)
Acoustic Observations
Hidden States
![Page 7: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/7.jpg)
Acoustic Observations
Hidden States
Acoustic Observation likelihoods
![Page 8: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/8.jpg)
“Six”
![Page 9: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/9.jpg)
![Page 10: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/10.jpg)
Constructs the HMMs of phones
Produces observation likelihoods
![Page 11: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/11.jpg)
Constructs the HMMs for units of speech
Produces observation likelihoods
Sampling rate is critical! WSJ vs. WSJ_8k
![Page 12: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/12.jpg)
Constructs the HMMs for units of speech
Produces observation likelihoods
Sampling rate is critical! WSJ vs. WSJ_8k
TIDIGITS, RM1, AN4, HUB4
![Page 13: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/13.jpg)
Word likelihoods
![Page 14: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/14.jpg)
ARPA format example:
1-grams:
-3.7839 board -0.1552
-2.5998 bottom -0.3207
-3.7839 bunch -0.2174
2-grams:
-0.7782 as the -0.2717
-0.4771 at all 0.0000
-0.7782 at the -0.2915
3-grams:
-2.4450 in the lowest
-0.5211 in the middle
-2.4450 in the on
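Each entry in the example reads as a base-10 log probability, the n-gram's words, and an optional backoff weight. A minimal Python sketch of splitting one such entry follows; `parse_arpa_entry` is a hypothetical helper written for illustration, not part of Sphinx4, and it assumes whitespace-separated fields.

```python
def parse_arpa_entry(line, order):
    """Split one n-gram entry into (log10 prob, words, backoff weight or None)."""
    fields = line.split()
    logprob = float(fields[0])
    words = tuple(fields[1:1 + order])
    # The backoff weight is optional; the highest-order n-grams omit it.
    backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
    return logprob, words, backoff


# Entries copied from the ARPA example on this slide.
unigram = parse_arpa_entry("-3.7839 board -0.1552", order=1)
bigram = parse_arpa_entry("-0.7782 as the -0.2717", order=2)
trigram = parse_arpa_entry("-0.5211 in the middle", order=3)

# ARPA files store base-10 logs, so 10 ** logprob recovers the probability.
print(trigram, 10 ** trigram[0])
```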
![Page 15: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/15.jpg)
public <basicCmd> = <startPolite> <command> <endPolite>;
public <startPolite> = (please | kindly | could you ) *;
public <endPolite> = [ please | thanks | thank you ];
<command> = <action> <object>;
<action> = (open | close | delete | move);
<object> = [the | a] (window | file | menu);
![Page 16: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/16.jpg)
Maps words to phoneme sequences
![Page 17: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/17.jpg)
Example from cmudict.06d
POULTICE P OW L T AH S
POULTICES P OW L T AH S IH Z
POULTON P AW L T AH N
POULTRY P OW L T R IY
POUNCE P AW N S
POUNCED P AW N S T
POUNCEY P AW N S IY
POUNCING P AW N S IH NG
POUNCY P UW NG K IY
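The mapping from words to phoneme sequences can be sketched as a plain lookup table. This is only an illustration under the assumption that each entry is a word followed by space-separated phonemes; `load_pronunciations` is a hypothetical helper, and it ignores alternate pronunciations and comment lines that a full dictionary file may contain.

```python
def load_pronunciations(lines):
    """Map each word to its list of phonemes (the first field is the word)."""
    lexicon = {}
    for line in lines:
        fields = line.split()
        if fields:
            lexicon[fields[0]] = fields[1:]
    return lexicon


# Three entries copied from the cmudict excerpt on this slide.
entries = [
    "POULTRY P OW L T R IY",
    "POUNCE P AW N S",
    "POUNCING P AW N S IH NG",
]
lexicon = load_pronunciations(entries)
print(lexicon["POUNCE"])
```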
![Page 18: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/18.jpg)
Constructs the search graph of HMMs from:
Acoustic model
Statistical Language model ~or~ Grammar
Dictionary
![Page 19: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/19.jpg)
![Page 20: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/20.jpg)
![Page 21: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/21.jpg)
Can be statically or dynamically constructed
![Page 22: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/22.jpg)
FlatLinguist
![Page 23: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/23.jpg)
FlatLinguist
DynamicFlatLinguist
![Page 24: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/24.jpg)
FlatLinguist
DynamicFlatLinguist
LexTreeLinguist
![Page 25: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/25.jpg)
Maps feature vectors to search graph
![Page 26: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/26.jpg)
Searches the graph for the “best fit”
![Page 27: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/27.jpg)
Searches the graph for the “best fit”
P(sequence of feature vectors | word/phone)
a.k.a. P(O | W)
-> “how likely is the input to have been generated by the word?”
![Page 28: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/28.jpg)
f ay ay ay ay v v v v v
f f ay ay ay ay v v v v
f f f ay ay ay ay v v v
f f f f ay ay ay ay v v
f f f f ay ay ay ay ay v
f f f f f ay ay ay ay v
f f f f f f ay ay ay v
…
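The slide lists only a few of the possible alignments of the phones f, ay, v to ten frames. The Python sketch below enumerates all of them, under the simplifying assumptions that phones appear in order, each phone covers at least one frame, and there is no silence or skipping; `alignments` is a hypothetical helper name.

```python
from itertools import combinations


def alignments(phones, num_frames):
    """Yield every left-to-right alignment giving each phone >= 1 frame."""
    # Choose the frame indices where the alignment moves to the next phone.
    for cuts in combinations(range(1, num_frames), len(phones) - 1):
        bounds = (0, *cuts, num_frames)
        path = []
        for phone, start, end in zip(phones, bounds, bounds[1:]):
            path.extend([phone] * (end - start))
        yield path


paths = list(alignments(["f", "ay", "v"], 10))
print(len(paths))          # number of candidate alignments
print(" ".join(paths[0]))  # first alignment in enumeration order
```

Even this tiny case yields dozens of candidate paths, which is why the decoder scores paths incrementally rather than listing them one by one.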
![Page 29: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/29.jpg)
Time: O1 O2 O3
![Page 30: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/30.jpg)
Uses algorithms to weed out low scoring paths during decoding
![Page 31: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/31.jpg)
Words!
![Page 32: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/32.jpg)
Most common metric
Measures the number of modifications needed to transform the recognized sentence into the reference sentence
![Page 33: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/33.jpg)
Reference: “This is a reference sentence.”
Result: “This is neuroscience.”
![Page 34: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/34.jpg)
Reference: “This is a reference sentence.”
Result: “This is neuroscience.”
Requires 2 deletions, 1 substitution
![Page 35: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/35.jpg)
Reference: “This is a reference sentence.”
Result: “This is neuroscience.”
$$\mathrm{WER} = 100 \times \frac{\text{deletions} + \text{substitutions} + \text{insertions}}{\text{Length}}$$
![Page 36: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/36.jpg)
Reference: “This is a reference sentence.”
Result: “This is neuroscience.”
Errors: D S D
$$\mathrm{WER} = 100 \times \frac{2 + 1 + 0}{5} = 100 \times \frac{3}{5} = 60\%$$
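The count of deletions, substitutions, and insertions can be obtained from a word-level edit distance. The following is a minimal Python sketch, not the scoring tool used in the slides; `word_error_rate` is a hypothetical helper, and it assumes the sentences are already lowercased, stripped of punctuation, and that the reference is not empty.

```python
def word_error_rate(reference, hypothesis):
    """Return WER (%) from a word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] = minimum edits to turn the first j hypothesis words
    # into the first i reference words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return 100.0 * dist[len(ref)][len(hyp)] / len(ref)


wer = word_error_rate("this is a reference sentence", "this is neuroscience")
print(wer)
```

Because the denominator is the reference length, a hypothesis with many insertions can score above 100%.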
![Page 37: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/37.jpg)
![Page 38: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/38.jpg)
![Page 39: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/39.jpg)
![Page 40: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/40.jpg)
![Page 41: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/41.jpg)
![Page 42: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/42.jpg)
![Page 43: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/43.jpg)
![Page 44: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/44.jpg)
![Page 45: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/45.jpg)
![Page 46: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/46.jpg)
![Page 47: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/47.jpg)
![Page 48: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/48.jpg)
Limited Vocab Multi-Speaker
![Page 49: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/49.jpg)
Limited Vocab Multi-Speaker
Extensive Vocab Single Speaker
![Page 50: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/50.jpg)
*If the audio input is noisy, multiply the expected error rate by 2
![Page 51: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/51.jpg)
Other variables:
- Continuous vs. Isolated
- Conversational vs. Read
- Dialect
![Page 52: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/52.jpg)
Questions?
![Page 53: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/53.jpg)
Time: O1 O2 O3
![Page 54: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/54.jpg)
Time: O1 O2 O3
P(ay | f) * P(O2 | ay)
P(f | f) * P(O2 | f)
![Page 55: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/55.jpg)
Time: O1 O2 O3
P(O1) * P(ay | f) * P(O2 | ay)
![Page 56: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/56.jpg)
Time: O1 O2 O3
![Page 57: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/57.jpg)
Common Sphinx4 FAQs can be found online: http://cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4-faq.html
What follows are some less frequently asked questions.
![Page 58: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/58.jpg)
Q. Is a search graph created for every recognition result or one for the recognition app?
A. This depends on which Linguist is used. The FlatLinguist generates the entire search graph and holds it in memory, so it is only useful for small-vocabulary recognition tasks. The LexTreeLinguist generates search states dynamically, allowing it to handle very large vocabularies.
![Page 59: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/59.jpg)
Q. How does the Viterbi algorithm save computation over exhaustive search?
A. The Viterbi algorithm saves memory and computation by reusing subproblems already solved within the larger solution. In this way, probability calculations that repeat in different paths through the search graph are not computed multiple times.
Viterbi cost = $n^2$ to $n^3$
Exhaustive search cost = $2^n$ to $3^n$
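The reuse of subproblems can be sketched in a few lines of Python. This is an illustrative toy, not Sphinx4's decoder: the two-state model, the discrete observation symbols ("hiss", "vowel"), and every probability below are made-up assumptions. For N states and T frames the loop does roughly N² work per frame, whereas enumerating every state sequence would visit N^T paths.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, state path) of the best path through the trellis."""
    # best[s] = (probability, path) of the best path ending in state s.
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Reuse the best score of each predecessor instead of
            # re-scoring every complete path that leads to s.
            prob, path = max(
                ((best[p][0] * trans_p[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            step[s] = (prob * emit_p[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])


# Toy model (all numbers are invented for illustration).
states = ["f", "ay"]
start_p = {"f": 1.0, "ay": 0.0}
trans_p = {"f": {"f": 0.5, "ay": 0.5}, "ay": {"f": 0.0, "ay": 1.0}}
emit_p = {"f": {"hiss": 0.9, "vowel": 0.1}, "ay": {"hiss": 0.2, "vowel": 0.8}}

prob, path = viterbi(["hiss", "vowel", "vowel"], states, start_p, trans_p, emit_p)
print(path, prob)
```

A production decoder would work with log probabilities to avoid numeric underflow on long utterances.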
![Page 60: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/60.jpg)
Q. Does the linguist use a grammar to construct the search graph if it is available?
A. Yes, a grammar graph is created
![Page 61: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/61.jpg)
Q. What algorithm does the Pruner use?
A. Sphinx4 uses absolute and relative beam pruning
![Page 62: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/62.jpg)
Absolute Beam Width - # active search paths
<property name="absoluteBeamWidth" value="5000"/>
![Page 63: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/63.jpg)
Absolute Beam Width - # active search paths
<property name="absoluteBeamWidth" value="5000"/>
Relative Beam Width – probability threshold
<property name="relativeBeamWidth" value="1E-120"/>
![Page 64: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/64.jpg)
Absolute Beam Width - # active search paths
<property name="absoluteBeamWidth" value="5000"/>
Relative Beam Width – probability threshold
<property name="relativeBeamWidth" value="1E-120"/>
Word Insertion Probability – Word break likelihood
<property name="wordInsertionProbability" value="0.7"/>
![Page 65: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/65.jpg)
Absolute Beam Width - # active search paths
<property name="absoluteBeamWidth" value="5000"/>
Relative Beam Width - probability threshold
<property name="relativeBeamWidth" value="1E-120"/>
Word Insertion Probability - Word break likelihood
<property name="wordInsertionProbability" value="0.7"/>
Language Weight - Boosts language model scores
<property name="languageWeight" value="10.5"/>
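How the two beam settings interact can be sketched as follows. This illustrates the general idea of absolute and relative beam pruning only; it is not Sphinx4's pruner code, `prune` is a hypothetical function, and representing each path as a `(log_score, label)` pair with natural-log scores is an assumption made for the example.

```python
import math


def prune(active_paths, absolute_beam_width, relative_beam_width):
    """Keep paths inside both beams; paths are (log_score, label) pairs."""
    if not active_paths:
        return []
    ranked = sorted(active_paths, key=lambda path: path[0], reverse=True)
    # Absolute beam: cap the number of active paths.
    ranked = ranked[:absolute_beam_width]
    # Relative beam: drop paths whose probability falls below
    # best_probability * relative_beam_width (a sum in the log domain).
    threshold = ranked[0][0] + math.log(relative_beam_width)
    return [path for path in ranked if path[0] >= threshold]


paths = [(-10.0, "a"), (-12.0, "b"), (-400.0, "c"), (-11.0, "d")]
kept_wide = prune(paths, absolute_beam_width=5000, relative_beam_width=1e-120)
kept_narrow = prune(paths, absolute_beam_width=2, relative_beam_width=1e-120)
print([label for _, label in kept_wide])
print([label for _, label in kept_narrow])
```

Widening either beam keeps more hypotheses alive, which tends to trade decoding speed for accuracy.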
![Page 66: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/66.jpg)
Silence Insertion Probability – Likelihood of inserting silence
<property name="silenceInsertionProbability" value=".1"/>
![Page 67: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/67.jpg)
Silence Insertion Probability – Likelihood of inserting silence
<property name="silenceInsertionProbability" value=".1"/>
Filler Insertion Probability – Likelihood of inserting filler words
<property name="fillerInsertionProbability" value="1E-10"/>
![Page 68: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/68.jpg)
To call a Java example from Python:
import subprocess
subprocess.call(["java", "-mx1000m", "-jar", "/Users/Username/sphinx4/bin/Transcriber.jar"])
![Page 69: Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors](https://reader033.vdocuments.net/reader033/viewer/2022061522/56649e725503460f94b70e20/html5/thumbnails/69.jpg)
Speech and Language Processing, 2nd Ed. Daniel Jurafsky and James Martin. Pearson, 2009
Artificial Intelligence, 6th Ed. George Luger. Addison Wesley, 2009
Sphinx Whitepaper: http://cmusphinx.sourceforge.net/sphinx4/#whitepaper
Sphinx Forum: https://sourceforge.net/projects/cmusphinx/forums