© 2008 Giampiero Salvi
HTK Tutorial
Giampiero Salvi
KTH (Royal Institute of Technology), Dep. of Speech, Music and Hearing
Drottning Kristinas v. 31, SE-100 44 Stockholm, Sweden
giampi@kth.se
Nov. 2003
Outline
- Introduction
- Data formats and manipulation
- Data visualization
- Training
- Recognition
HTK: What is it?
- A toolkit for Hidden Markov Modeling
- General purpose, but optimized for Speech Recognition
- Very flexible and complete (active development)
- Very good documentation (HTKBook)
ASR Overview
[Figure: block diagrams of the two phases.
Training: audio → feature extraction → features; features, labels, dictionary and a prototype HMM → training → model set.
Recognition: audio → feature extraction → features; features, model set, dictionary and grammar → decoding → labels.]
Things that you should have before you start
- familiarity with a Unix-like shell
  - cd, ls, pwd, mkdir, cp, foreach
- text processing tools
  - perl, perl, perl, perl, perl
  - grep, gawk, tr, sed, find, cat, wc
- lots of patience
- the fabulous HTK Book
- a look at the RefRec scripts
The HTK tools
- data manipulation tools: HCopy, HQuant, HLEd, HHEd, HDMan, HBuild
- data visualization tools: HSLab, HList, HSGen
- training tools: HCompV, HInit, HRest, HERest, HEAdapt, HSmooth
- recognition tools: HLStats, HParse, HVite, HResults
The HTK data formats
data       formats
audio      many common formats plus HTK binary
features   HTK binary
labels     HTK (single or Master Label files), text
models     HTK (single or Master Macro files), text or binary
other      HTK text
Usage example (HList)
> HList

USAGE: HList [options] file

 Option                                     Default

 -d      Coerce observation to VQ symbols   off
 -e N    End at sample N                    0
 -h      Print source header info           off
 -i N    Set items per line to N            10
 -n N    Set num streams to N               1
 -o      Print observation structure        off
 -p      Playback audio                     off
 -r      Write raw output                   off
 -s N    Start at sample N                  0
 -t      Print target header info           off
 -z      Suppress printing data             on
 -A      Print command line arguments       off
 -C cf   Set config file to cf              default
 -D      Display configuration variables    off
Command line switches and options
> HList -e 1 -o -h feature_file

Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
      MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0
------------------------ Samples: 0->1 -------------------------
0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
       3.293   5.428   6.831   5.819   5.606  40.734
1:   -13.591  -4.756  -6.037  -3.362   3.541   3.510   2.867
       0.812   0.630   5.285   1.054   8.375  40.778
----------------------------- END ------------------------------
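The header fields printed by HList come from the 12-byte header of an HTK parameter file. As a minimal sketch (not part of the original slides), the header can be decoded in Python, assuming the layout documented in the HTK Book: big-endian nSamples (4 bytes), sampPeriod in 100 ns units (4 bytes), sampSize in bytes (2 bytes) and parmKind (2 bytes). The function name `parse_htk_header` and the returned dictionary keys are hypothetical, and the qualifiers are listed in bit order, which is not necessarily the order HTK prints them in.

```python
import struct

# Base parameter kinds (low 6 bits of parmKind) and qualifier bits,
# as listed in the HTK Book.
BASE_KINDS = ["WAVEFORM", "LPC", "LPREFC", "LPCEPSTRA", "LPDELCEP",
              "IREFC", "MFCC", "FBANK", "MELSPEC", "USER", "DISCRETE"]
QUALIFIERS = [(0o000100, "E"), (0o000200, "N"), (0o000400, "D"),
              (0o001000, "A"), (0o002000, "C"), (0o004000, "Z"),
              (0o010000, "K"), (0o020000, "0")]


def parse_htk_header(header: bytes) -> dict:
    """Decode the first 12 bytes of an HTK parameter file (big-endian)."""
    n_samples, samp_period, samp_size, parm_kind = struct.unpack(
        ">iihH", header[:12])
    return {
        "n_samples": n_samples,
        "frame_period_s": samp_period / 1e7,   # 100 ns units -> seconds
        "sample_bytes": samp_size,
        "base_kind": BASE_KINDS[parm_kind & 0o77],
        # Qualifiers in bit order (hypothetical presentation choice).
        "qualifiers": [name for bit, name in QUALIFIERS if parm_kind & bit],
    }


if __name__ == "__main__":
    # Synthetic header: 336 frames, 10 ms period, 26 bytes/frame, MFCC_0.
    demo = struct.pack(">iihH", 336, 100000, 26, 6 | 0o020000)
    print(parse_htk_header(demo))
```

Compressed files (qualifier C) store extra scaling information after the header, which this sketch does not handle.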
Configuration file
> cat config_file
SOURCEKIND = MFCC_0
TARGETKIND = MFCC_0_D_A

> HList -C config_file -e 0 -o -h feature_file

Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
      MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
       Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
       Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
       Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
      Acc-10  Acc-11  Acc-12   AccC0
------------------------ Samples: 0->1 -------------------------
0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
       3.293   5.428   6.831   5.819   5.606  40.734  -0.107
      -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
      -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
      -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
       0.097   0.225  -0.294   0.051
----------------------------- END ------------------------------
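The step from MFCC_0 (13 components) to MFCC_0_D_A (39 components) appends delta and acceleration coefficients on the fly. The following is a rough Python sketch of that computation, using the regression formula from the HTK Book, d_t = Σ θ (c_{t+θ} − c_{t−θ}) / (2 Σ θ²), with a window of 2. The function names are hypothetical, and the edge handling (repeating the first and last frame) is an assumption of this sketch rather than something stated in the slides.

```python
def deltas(frames, window=2):
    """Regression-based time derivatives of a list of feature vectors."""
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(len(frames[t])):
            num = 0.0
            for k in range(1, window + 1):
                # Assumption: clamp indices so edge frames are repeated.
                ahead = frames[min(t + k, n - 1)][d]
                behind = frames[max(t - k, 0)][d]
                num += k * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def add_deltas_and_accs(frames, window=2):
    """Append first (_D) and second (_A) derivatives to every frame."""
    d = deltas(frames, window)
    a = deltas(d, window)          # accelerations are deltas of deltas
    return [c + dd + aa for c, dd, aa in zip(frames, d, a)]


if __name__ == "__main__":
    # Ten synthetic 13-dimensional frames that grow linearly with time.
    static = [[float(t)] * 13 for t in range(10)]
    full = add_deltas_and_accs(static)
    print(len(full[0]))            # 13 static + 13 delta + 13 acceleration
```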
File manipulation tools
- HCopy: converts from/to various data formats (audio, features)
- HQuant: quantizes speech (audio)
- HLEd: edits label and master label files
- HDMan: edits dictionary files
- HHEd: edits model and master macro files
- HBuild: converts language models in different formats (more in the recognition section)
Computing feature files (HCopy)
> cat config_file
# Feature configuration
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
# input file format (headerless 8 kHz 16 bit linear PCM)
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NOHEAD
SOURCERATE = 1250

> HCopy -C config_file audio_file1 param_file1 audio_file2 ...
> HCopy -C config_file -S file_list
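All time values in the configuration above are expressed in units of 100 ns, which is easy to get wrong by a factor of ten. A small Python sketch (helper names are hypothetical) shows how the values 1250, 100000 and 250000 relate to an 8 kHz sampling rate, a 10 ms frame shift and a 25 ms window:

```python
HTK_UNITS_PER_SECOND = 10_000_000  # one HTK time unit is 100 ns


def seconds_to_htk(seconds: float) -> int:
    """Convert a duration in seconds to HTK 100 ns units."""
    return int(round(seconds * HTK_UNITS_PER_SECOND))


def source_rate(sample_freq_hz: float) -> float:
    """Sample period in 100 ns units for a given sampling frequency."""
    return HTK_UNITS_PER_SECOND / sample_freq_hz


if __name__ == "__main__":
    print(source_rate(8000))       # SOURCERATE for 8 kHz audio
    print(seconds_to_htk(0.010))   # TARGETRATE for a 10 ms frame shift
    print(seconds_to_htk(0.025))   # WINDOWSIZE for a 25 ms window
```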
Label files
#!MLF!#
"filename1"
[start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
[start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
...
[startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
.
"filename2"
...
- [ ] = optional (0 or 1)
- { } = possible repetition (0, 1, 2, ...)
- time stamps are in 100 ns units: 10 ms = 100000
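To make the line format concrete, here is a minimal Python sketch that splits one label line into its optional time stamps, the label and any remaining fields, converting the 100 ns time stamps to seconds. The names `Segment` and `parse_label_line` are hypothetical, and the sketch assumes that labels are never purely numeric (otherwise a label could be mistaken for a time stamp).

```python
from typing import List, NamedTuple, Optional

HTK_UNITS_PER_SECOND = 10_000_000  # time stamps are in 100 ns units


class Segment(NamedTuple):
    start_s: Optional[float]
    end_s: Optional[float]
    label: str
    extra: List[str]   # score, auxiliary labels, comment (left unparsed)


def parse_label_line(line: str) -> Segment:
    """Parse '[start [end]] label ...' into a Segment."""
    fields = line.split()
    times = []
    # Up to two leading integer fields are start and end times.
    while fields and len(times) < 2 and fields[0].isdigit():
        times.append(int(fields.pop(0)) / HTK_UNITS_PER_SECOND)
    if not fields:
        raise ValueError("label line without a label: %r" % line)
    label = fields.pop(0)
    start = times[0] if len(times) > 0 else None
    end = times[1] if len(times) > 1 else None
    return Segment(start, end, label, fields)


if __name__ == "__main__":
    print(parse_label_line("6400000 8600000 f forra"))
    print(parse_label_line("sp"))
```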
Label file example 1
> cat aligned.mlf
#!MLF!#
"*/a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
10400000 11700000 r
11700000 14100000 a
14100000 14100000 sp
14100000 29800001 sil <sil>
.
"*/a10001i1.rec"
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>
.
Label file example 2 (HLEd)
> HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf

> cat words.mlf
#!MLF!#
"*/a10001a1.rec"
forra
.
"*/a10001i1.rec"
sju
.

> cat words2phones.led
EX
IS sil sil

> cat phones.mlf
#!MLF!#
"*/a10001a1.rec"
sil
f
oe
r
a
sp
sil
.
"*/a10001i1.rec"
sil
S
uh
a
sp
sil
.
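The effect of the two edit commands in this example can be mimicked in a few lines of Python. This is only an illustrative sketch, not how HLEd is implemented: `expand_words` is a hypothetical helper that replaces each word with its first pronunciation (the EX step) and then adds a boundary label at both ends (the IS sil sil step). The lexicon is the one from lex.dic.

```python
LEXICON = {
    "forra": ["f", "oe", "r", "a", "sp"],
    "sju": ["S", "uh", "a", "sp"],
}


def expand_words(words, lexicon, start="sil", end="sil"):
    """Word-level to phone-level transcription, as in the example above."""
    phones = []
    for word in words:
        phones.extend(lexicon[word])   # EX: expand word into its phones
    return [start] + phones + [end]    # IS: insert start and end labels


if __name__ == "__main__":
    print(" ".join(expand_words(["forra"], LEXICON)))
    print(" ".join(expand_words(["sju"], LEXICON)))
```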
Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4 ...

> cat lex.dic
forra f oe r a sp
sju S uh a sp

> cat lex2.dic
<sil> [] sil
forra f oe r a sp
sju 0.3 S uh a sp
sju 0.7 S uh sp
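A dictionary line can carry an optional output symbol in square brackets and an optional pronunciation probability before the phone sequence. The sketch below parses such lines in Python; `parse_dict_line` is a hypothetical helper, and it assumes that output symbols contain no spaces and that phone names never look like plain decimal numbers.

```python
import re

NUMBER = re.compile(r"[0-9]*\.?[0-9]+")


def parse_dict_line(line: str):
    """Return (word, outsym, pronprob, phones) for one dictionary line."""
    fields = line.split()
    word = fields.pop(0)
    outsym = None                      # None means "same as the word"
    if fields and fields[0].startswith("[") and fields[0].endswith("]"):
        outsym = fields.pop(0)[1:-1]   # "[]" gives an empty output symbol
    prob = 1.0                         # default when no probability is given
    if fields and NUMBER.fullmatch(fields[0]):
        prob = float(fields.pop(0))
    return word, outsym, prob, fields


if __name__ == "__main__":
    for entry in ["<sil> [] sil",
                  "forra f oe r a sp",
                  "sju 0.3 S uh a sp",
                  "sju 0.7 S uh sp"]:
        print(parse_dict_line(entry))
```

In the lex2.dic example the two pronunciations of "sju" have probabilities 0.3 and 0.7, which sum to one.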
HMM definition files (HHEd)
~h hmm_name
<BEGINHMM>
  <NUMSTATES> 5
  <STATE> 2
    <NUMMIXES> 2
    <MIXTURE> 1 0.8
      <MEAN> 4
        0.1 0.0 0.7 0.3
      <VARIANCE> 4
        0.2 0.1 0.1 0.1
    <MIXTURE> 2 0.2
      <MEAN> 4
        0.2 0.3 0.4 0.0
      <VARIANCE> 4
        0.1 0.1 0.1 0.2
  <STATE> 3
    ~s state_name
  <STATE> 4
    <NUMMIXES> 2
    <MIXTURE> 1 0.7
      ~m mix_name
    <MIXTURE> 2 0.3
      <MEAN> 4
        ~u mean_name
      <VARIANCE> 4
        ~v variance_name
  <TRANSP>
    ~t transition_name
<ENDHMM>

[Figure: diagram of the five HMM states, numbered 1 to 5, with states 2, 3 and 4 in the middle row.]

Macro types:
- ~h: HMM definition
- ~s: State definition
- ~m: Gaussian mixture component definition
- ~u: Mean vector definition
- ~v: Diagonal variance vector definition
- ~t: Transition matrix definition
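To see what the numbers in a state definition mean, the following Python sketch evaluates the output log-likelihood of state 2 from the example: a mixture of two Gaussians with diagonal covariance, weights 0.8 and 0.2. It is a plain textbook computation written for illustration (the function names are hypothetical), not code taken from HTK.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def log_gmm(x, mixtures):
    """Log density of a Gaussian mixture given (weight, mean, var) triples."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in mixtures]
    top = max(logs)   # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))


# State 2 of the example HMM: (mixture weight, mean, diagonal variance).
STATE2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]

if __name__ == "__main__":
    observation = [0.1, 0.0, 0.7, 0.3]
    print(log_gmm(observation, STATE2))
```

The mixture weights of a state must sum to one, as they do here (0.8 + 0.2 and 0.7 + 0.3).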
- HSLab: graphical tool to label speech (use WaveSurfer instead)
- HList: gives information about audio and feature files
- HSGen: generates random sentences out of a regular grammar
Intermezzo: what do we know so far?
[Figure: the ASR overview block diagrams (training and recognition), shown again as a recap.]
Model initialization
The initialization procedure depends on the information available at that time.
- HCompV: computes the overall mean and variance
  Input: a prototype HMM
- HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means
  Input: a prototype HMM, time aligned transcriptions
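The idea behind the first option (often called a flat start) is simply to give every state the same global statistics. A minimal Python sketch of that computation, with a hypothetical function name and synthetic data, could look like this; it illustrates the statistics only and does not reproduce HCompV's file handling or options.

```python
def global_mean_var(frames):
    """Per-dimension mean and variance over all feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n
           for d in range(dim)]
    return mean, var


if __name__ == "__main__":
    # Synthetic 2-dimensional "features" pooled from all training files.
    data = [[1.0, 0.0], [3.0, 0.5], [2.0, 1.0]]
    mean, var = global_mean_var(data)
    # Every state of the prototype HMM would start from these values.
    print(mean, var)
```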
Training tools
- HRest: Baum-Welch re-estimation
  Input: an initialized model set, time aligned transcriptions
- HERest: performs embedded Baum-Welch training
  Input: an initialized model set, transcriptions without time stamps
- HEAdapt: performs adaptation on a limited set of data
- HSmooth: smooths a set of context-dependent models according to the context-independent counterpart
Training example: RefRec
first pass:
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase the number of mixture components (HHEd) → HERest → realignment (HVite)
second pass:
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase the number of mixture components (HHEd) → HERest
Recognition tools
grammar generation
- HLStats: creates a bigram from training data
- HParse: parses a user defined grammar to produce a lattice
decoding
- HVite: performs Viterbi decoding
evaluation
- HResults: evaluates recognition results
Grammar definition (HParse)
[Figure: word network for the grammar below: optional noise (sil, fil, spk), then a loop over the commands (move up/down/left/right, top, bottom, delete page/line/word/char, insert, end insert), each followed by optional noise, then quit and optional noise.]

> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
({$noise} < $cmd {$noise} > quit {$noise})

- [ ] optional
- { } zero or more
- ( ) block
- < > loop
- << >> context dep. loop
- | alternative
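One way to check that a grammar like this accepts what you expect is to generate random sentences from it, which is what HSGen does for a word network. The Python sketch below hard-codes the grammar above instead of reading grammar.bnf, so it is only an illustration of the notation: `{ }` is treated as zero or more repetitions and `< >` as one or more. The function names and the repetition probabilities are arbitrary choices made for this sketch.

```python
import random

DIR = ["up", "down", "left", "right"]
ITEM = ["char", "word", "line", "page"]
NOISE = ["sil", "fil", "spk"]


def noise(rng, p_more=0.3):
    """{$noise}: zero or more noise words."""
    out = []
    while rng.random() < p_more:
        out.append(rng.choice(NOISE))
    return out


def command(rng):
    """$cmd = $mcmd | $dcmd | $icmd | $ecmd"""
    kind = rng.choice(["mcmd", "dcmd", "icmd", "ecmd"])
    if kind == "mcmd":
        return rng.choice([["move", rng.choice(DIR)], ["top"], ["bottom"]])
    if kind == "dcmd":   # delete [$item]
        return ["delete"] + ([rng.choice(ITEM)] if rng.random() < 0.5 else [])
    if kind == "icmd":
        return ["insert"]
    return ["end"] + (["insert"] if rng.random() < 0.5 else [])  # end [insert]


def sentence(rng, p_loop=0.5):
    """({$noise} < $cmd {$noise} > quit {$noise})"""
    words = noise(rng)
    while True:          # < ... >: at least one pass through the loop
        words += command(rng) + noise(rng)
        if rng.random() >= p_loop:
            break
    return words + ["quit"] + noise(rng)


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(3):
        print(" ".join(sentence(rng)))
```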
Grammar parsing (HParse) and recognition (HVite)
Parse grammar:
> HParse grammar.bnf grammar.slf
Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf \
    -y lab dict.txt phones.lis audio_file.wav
Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf \
    -y lab dict.txt phones.lis
Evaluation (HResults)
> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------
N = total number, I = insertions, S = substitutions, D = deletions
correct: $H = N - S - D$
%correct: $\%\mathrm{Corr} = \frac{H}{N} \times 100\%$
accuracy: $\mathrm{Acc} = \frac{H - I}{N} \times 100\% = \frac{N - S - D - I}{N} \times 100\%$
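As a quick sanity check of the definitions, the scores in the listing can be recomputed from the raw counts. The Python helper below is a hypothetical sketch written for this purpose; it only applies the formulas above to the numbers printed by HResults.

```python
def hresults_scores(n, s, d, i=0):
    """Return H, %Corr and Acc from the HResults error counts."""
    h = n - s - d                        # correctly recognized labels
    return {
        "H": h,
        "Corr": 100.0 * h / n,           # ignores insertions
        "Acc": 100.0 * (h - i) / n,      # penalizes insertions as well
    }


if __name__ == "__main__":
    word = hresults_scores(n=9718, s=320, d=196, i=31)
    print(word["H"], round(word["Corr"], 2), round(word["Acc"], 2))
    # prints: 9202 94.69 94.37

    sent = hresults_scores(n=1342, s=348, d=0)
    print(sent["H"], round(sent["Corr"], 2))
    # prints: 994 74.07
```

Because insertions are subtracted, Acc can never exceed %Corr and may even become negative for a recognizer that inserts many words.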
- Outline
- Introduction
- Data formats and manipulation
- Data visualization
- Training
- Recognition
-
ccopy2008 Giampiero Salvi 247
Introduction
Data formats and manipulation
Data visualization
Training
Recognition
ccopy2008 Giampiero Salvi 347
HTK What is it
I A toolkit for Hidden Markov Modeling
I General purpose but
I optimized for Speech Recognition
I Very flexible and complete (active development)
I Very good documentation (HTKBook)
ccopy2008 Giampiero Salvi 447
HTK What is it
I A toolkit for Hidden Markov Modeling
I General purpose but
I optimized for Speech Recognition
I Very flexible and complete (active development)
I Very good documentation (HTKBook)
ccopy2008 Giampiero Salvi 547
HTK What is it
I A toolkit for Hidden Markov Modeling
I General purpose but
I optimized for Speech Recognition
I Very flexible and complete (active development)
I Very good documentation (HTKBook)
ccopy2008 Giampiero Salvi 647
HTK What is it
I A toolkit for Hidden Markov Modeling
I General purpose but
I optimized for Speech Recognition
I Very flexible and complete (active development)
I Very good documentation (HTKBook)
ccopy2008 Giampiero Salvi 747
HTK What is it
I A toolkit for Hidden Markov Modeling
I General purpose but
I optimized for Speech Recognition
I Very flexible and complete (active development)
I Very good documentation (HTKBook)
ccopy2008 Giampiero Salvi 847
ASR Overview
Training
audiofeatureextraction
training model set
prototype HMM
features
dictionary
labels
Recognition
audiofeatureextraction
features
dictionary
model set
labelsdecoding
grammar
ccopy2008 Giampiero Salvi 947
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1047
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1147
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1247
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1347
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1447
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1547
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1647
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1747
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1847
The HTK data formats
data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text
audiofeatureextraction
training model set
prototype HMM
features
dictionary
labels
audiofeatureextraction
features
dictionary
model set
labelsdecoding
grammar
ccopy2008 Giampiero Salvi 1947
Usage example (HList)
gt HList
USAGE HList [options] file
Option Default
-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off
ccopy2008 Giampiero Salvi 2047
Command line switches and options
gt HList -e 1 -o -h feature_file
Source feature_file
Sample Bytes 26 Sample Kind MFCC_0
Num Comps 13 Sample Period 100000 us
Num Samples 336 File Format HTK
-------------------- Observation Structure ---------------------
x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0
------------------------ Samples 0-gt1 -------------------------
0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734
1 -13591 -4756 -6037 -3362 3541 3510 2867
0812 0630 5285 1054 8375 40778
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2147
Configuration file
gt cat config_file
SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A
gt HList -C config_file -e 0 -o -h feature_file
Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK
-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0
------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2247
File manipulation tools
I HCopy converts fromto various data formats (audiofeatures)
I HQuant quantizes speech (audio)
I HLEd edits label and master label files
I HDMan edits dictionary files
I HHEd edits model and master macro files
I HBuild converts language models in different formats (morein recognition section)
ccopy2008 Giampiero Salvi 2347
Computing feature files (HCopy)
gt cat config_file
Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250
gt HCopy -C config_file audio_file1 param_file1 audio_file2
gt HCopy -C config_file -S file_list
ccopy2008 Giampiero Salvi 2447
Label files
MLF
filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]
filename2
I [] = optional (0 or 1)
I = possible repetition (0 1 2)
I time stamps are in 100ns units () 10ms = 100000
ccopy2008 Giampiero Salvi 2547
Label file example 1
gt cat alignedmlf
MLFa10001a1rec
0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe
10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec
0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt
ccopy2008 Giampiero Salvi 2647
Label file example 2 (HLEd)
gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf
gt cat wordsmlf
MLF
a10001a1rec
forra
a10001i1rec
sju
gt cat words2phonesled
EX
IS sil sil
gt cat phonesmlf
MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil
ccopy2008 Giampiero Salvi 2747
Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4
gt cat lexdic
forra f oe r a spsju S uh a sp
gt cat lex2dic
ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp
ccopy2008 Giampiero Salvi 2847
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
HMM definition (~h)
1
2 3 4
5
ccopy2008 Giampiero Salvi 2947
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
State definition (~s)
1
2 3 4
5
2
ccopy2008 Giampiero Salvi 3047
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Gaussian mixture component definition(~m)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3147
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Mean vector definition (~u)Diagonal variance vector definition (~v)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3247
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Transition matrix definition (~t)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3347
I HSLab graphical tool to label speech (use WaveSurferinstead)
I HList gives information about audio and feature files
I HSGen generates random sentences out of a regular grammar
ccopy2008 Giampiero Salvi 3447
Intermezzo what do we know so far
Training
audiofeatureextraction
prototype HMM
features
dictionary
labels
model settraining
Recognition
grammar
decodingaudiofeatureextraction
features
model set
labels
dictionary
ccopy2008 Giampiero Salvi 3547
model initialization
Initialization procedure depends on the information avaliable atthat time
I HCompV computes the overall mean and varianceInput a prototype HMM
I HInit Viterbi segmentation + parameter estimation Formixture distribution uses K-means
Input a prototype HMM time aligned transcriptions
ccopy2008 Giampiero Salvi 3647
Traning tools
I HRest Baum-Welch re-estimation
Input an initialized model set time aligned transcriptions
I HERest performs embedded Baum-Welch trainingInput an initialized model set timeless transcriptions
I HEAdapt performs adaptation on a limited set of data
I HSmooth smoots a set of context-dependent modelsaccording to the context-independent counterpart
ccopy2008 Giampiero Salvi 3747
Training example RefRec
first pass
prototype HMM rarr HCompV rarr cloning rarr HERest rarr
realignment (HVite) rarr HERest rarr increase MC (HHEd) rarr
HERest rarr realignment (HVite)
second pass
prototype HMM + aligned transcriptions rarr HInit rarr HRest
rarr HERest rarr increase MC (HHEd) rarr HERest
ccopy2008 Giampiero Salvi 3847
Recognition tools
grammar generation
I HLStats creates bigram from training data
I HParse parses a user defined grammar to produce a lattice
decoding
I HVite performs Viterbi decoding
evaluation
I HResults evaluates recognition results
ccopy2008 Giampiero Salvi 3947
Grammar definition (HParse)
delete
page
line
word
char
insert
end insert
bottom
top
movedown
left
up
right
sil
fil
spk
sil
fil
spk
quit
sil
fil
spk
ccopy2008 Giampiero Salvi 4047
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
Grammar parsing (HParse) and recognition (HVite)
Parse grammar
> HParse grammar.bnf grammar.slf
Run recognition on file(s)
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav
Run recognition live
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis
Evaluation (HResults)
> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
Date: Thu Jan 18 16:17:53 2001
Ref nworkdir_traintestsetmlf
Rec nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------
N = total number, I = insertions, S = substitutions, D = deletions
correct: $H = N - S - D$
%correct: $\mathrm{Corr} = H / N$, accuracy: $\mathrm{Acc} = (H - I) / N = (N - S - D - I) / N$
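The word-level figures in the HResults output can be checked by hand from these definitions. The short Python sketch below does so; the function name `htk_word_scores` is made up for this example and is not part of HTK.

```python
def htk_word_scores(n, s, d, i):
    """Return (H, %Corr, %Acc) from the reference length and error counts."""
    h = n - s - d                    # correctly recognized words
    corr = 100.0 * h / n             # ignores insertions
    acc = 100.0 * (h - i) / n        # penalizes insertions as well
    return h, corr, acc


if __name__ == "__main__":
    h, corr, acc = htk_word_scores(n=9718, s=320, d=196, i=31)
    print(f"H={h} Corr={corr:.2f} Acc={acc:.2f}")
```

Because insertions are subtracted, Acc can never exceed Corr, and it can even become negative when the recognizer inserts many spurious words.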
Things that you should have before you start
- familiarity with Unix-like shell
  - cd, ls, pwd, mkdir, cp, foreach
- text processing tools
  - perl, perl, perl, perl, perl
  - grep, gawk, tr, sed, find, cat, wc
- lots of patience
- the fabulous HTK Book
- a look at the RefRec scripts
The HTK tools
- data manipulation tools: HCopy, HQuant, HLEd, HHEd, HDMan, HBuild
- data visualization tools: HSLab, HList, HSGen
- training tools: HCompV, HInit, HRest, HERest, HEAdapt, HSmooth
- recognition tools: HLStats, HParse, HVite, HResults
The HTK data formats
data formats
- audio: many common formats plus HTK binary
- features: HTK binary
- labels: HTK (single or Master Label files) text
- models: HTK (single or Master Macro files) text or binary
- other: HTK text
[Figure: ASR overview diagram of the training and recognition pipelines]
Usage example (HList)
> HList
USAGE: HList [options] file ...
 Option                                       Default
 -d      Coerce observation to VQ symbols     off
 -e N    End at sample N                      0
 -h      Print source header info             off
 -i N    Set items per line to N              10
 -n N    Set num streams to N                 1
 -o      Print observation structure          off
 -p      Playback audio                       off
 -r      Write raw output                     off
 -s N    Start at sample N                    0
 -t      Print target header info             off
 -z      Suppress printing data               on
 -A      Print command line arguments         off
 -C cf   Set config file to cf                default
 -D      Display configuration variables      off
Command line switches and options
> HList -e 1 -o -h feature_file
Source: feature_file
  Sample Bytes:  26        Sample Kind:   MFCC_0
  Num Comps:     13        Sample Period: 10000.0 us
  Num Samples:   336       File Format:   HTK
-------------------- Observation Structure ---------------------
x:      MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
        MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0
------------------------ Samples: 0->1 -------------------------
0:     -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
         3.293   5.428   6.831   5.819   5.606  40.734
1:     -13.591  -4.756  -6.037  -3.362   3.541   3.510   2.867
         0.812   0.630   5.285   1.054   8.375  40.778
----------------------------- END ------------------------------
Configuration file
> cat config_file
SOURCEKIND = MFCC_0
TARGETKIND = MFCC_0_D_A
> HList -C config_file -e 0 -o -h feature_file
Source: feature_file
  Sample Bytes:  26        Sample Kind:   MFCC_0
  Num Comps:     13        Sample Period: 10000.0 us
  Num Samples:   336       File Format:   HTK
-------------------- Observation Structure ---------------------
x:      MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
        MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
         Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
         Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
         Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
        Acc-10  Acc-11  Acc-12   AccC0
------------------------ Samples: 0->1 -------------------------
0:     -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
         3.293   5.428   6.831   5.819   5.606  40.734  -0.107
        -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
        -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
        -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
         0.097   0.225  -0.294   0.051
----------------------------- END ------------------------------
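With TARGETKIND = MFCC_0_D_A, HTK appends delta (_D) and acceleration (_A) coefficients on the fly. As a rough illustration, the regression formula given in the HTK Book, $d_t = \sum_{\theta=1}^{\Theta} \theta (c_{t+\theta} - c_{t-\theta}) / (2 \sum_{\theta=1}^{\Theta} \theta^2)$, can be sketched in Python as below. The window of 2 frames and the edge handling (repeating the first and last frame) are assumptions here; check the DELTAWINDOW setting and the HTK Book for the exact behaviour.

```python
def deltas(seq, window=2):
    """Regression-based delta coefficients for one feature track.

    seq is a list of floats (one coefficient over time); frames beyond
    either end are replaced by the first or last frame (an assumption).
    """
    denom = 2 * sum(th * th for th in range(1, window + 1))
    last = len(seq) - 1
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for th in range(1, window + 1):
            ahead = seq[min(t + th, last)]
            behind = seq[max(t - th, 0)]
            acc += th * (ahead - behind)
        out.append(acc / denom)
    return out


if __name__ == "__main__":
    c0 = [40.734, 40.778, 41.0, 41.5, 41.2]   # made-up C0 track
    d = deltas(c0)
    a = deltas(d)                             # accelerations: deltas of deltas
    print(d)
    print(a)
```

Applying the same function to the deltas gives the acceleration coefficients, which is why the 13 static values grow to 39 components.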
File manipulation tools
- HCopy: converts from/to various data formats (audio/features)
- HQuant: quantizes speech (audio)
- HLEd: edits label and master label files
- HDMan: edits dictionary files
- HHEd: edits model and master macro files
- HBuild: converts language models in different formats (more in recognition section)
Computing feature files (HCopy)
> cat config_file
# Feature configuration
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
# input file format (headerless 8 kHz 16 bit linear PCM)
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NOHEAD
SOURCERATE = 1250
> HCopy -C config_file audio_file1 param_file1 audio_file2 ...
> HCopy -C config_file -S file_list
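All rates and durations in this configuration are expressed in HTK time units of 100 ns. The Python sketch below converts them to more familiar quantities, assuming the usual reading of the values (TARGETRATE as a 10 ms frame shift, WINDOWSIZE as a 25 ms window, SOURCERATE = 1250 for 8 kHz audio). The helper names are invented for this example, and the frame count assumes that only complete windows produce a frame.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # one HTK time unit is 100 ns


def htk_to_ms(units):
    """Convert a duration in 100 ns units to milliseconds."""
    return units * 1000.0 / HTK_UNITS_PER_SECOND


def sample_rate_hz(source_rate):
    """SOURCERATE is the sample period in 100 ns units."""
    return HTK_UNITS_PER_SECOND / source_rate


def num_frames(num_samples, source_rate, window_size, target_rate):
    """Number of complete analysis windows that fit in the waveform."""
    duration = num_samples * source_rate
    if duration < window_size:
        return 0
    return int((duration - window_size) // target_rate) + 1


if __name__ == "__main__":
    print(sample_rate_hz(1250), "Hz")
    print(htk_to_ms(100000), "ms frame shift")
    print(htk_to_ms(250000), "ms window")
    print(num_frames(8000, 1250, 250000, 100000), "frames in one second")
```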
Label files
#!MLF!#
"filename1"
[start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
[start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
...
[startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
.
"filename2"
- [.] = optional (0 or 1)
- {.} = possible repetition (0, 1, 2, ...)
- time stamps are in 100ns units (!): 10ms = 100000
Label file example 1
> cat aligned.mlf
#!MLF!#
"a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
10400000 11700000 r
11700000 14100000 a
14100000 14100000 sp
14100000 29800001 sil <sil>
.
"a10001i1.rec"
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>
.
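A Master Label File like the one above is easy to read with a few lines of script. The following Python sketch is a minimal reader, not an HTK tool: the name `parse_mlf` is made up, only the start time, end time and first label of each line are kept, and labels are assumed not to be purely numeric.

```python
def parse_mlf(text):
    """Parse MLF text into {filename: [(start, end, label), ...]}.

    start and end are in 100 ns units, or None when the line has no times.
    """
    entries = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue
        if current is None:
            current = line.strip('"')        # a new file entry starts here
            entries[current] = []
        elif line == ".":
            current = None                   # a lone dot closes the entry
        else:
            fields = line.split()
            times = []
            while len(times) < 2 and len(fields) > 1 and fields[0].isdigit():
                times.append(int(fields.pop(0)))
            start = times[0] if times else None
            end = times[1] if len(times) > 1 else None
            entries[current].append((start, end, fields[0]))
    return entries


SAMPLE = '''#!MLF!#
"a10001i1.rec"
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>
.
'''

if __name__ == "__main__":
    for start, end, label in parse_mlf(SAMPLE)["a10001i1.rec"]:
        print(label, (end - start) / 10000.0, "ms")
```

Dividing by 10000 turns the 100 ns units into milliseconds, so the segment labelled S lasts 230 ms.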
Label file example 2 (HLEd)
> HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf
> cat words.mlf
#!MLF!#
"a10001a1.rec"
forra
.
"a10001i1.rec"
sju
.
> cat words2phones.led
EX
IS sil sil
> cat phones.mlf
#!MLF!#
"a10001a1.rec"
sil
f
oe
r
a
sp
sil
.
"a10001i1.rec"
sil
S
uh
a
sp
sil
.
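The effect of the two edit commands in this example can be mimicked in a few lines of Python: EX expands each word into its phones using the dictionary, and IS adds a symbol at the start and at the end of the transcription. This is only a sketch of the behaviour seen in the example; `expand_to_phones` and the in-memory `LEXICON` are hypothetical, and the first pronunciation of each word is assumed to be the one used.

```python
LEXICON = {
    "forra": [["f", "oe", "r", "a", "sp"]],
    "sju": [["S", "uh", "a", "sp"]],
}


def expand_to_phones(words, lexicon, start="sil", end="sil"):
    """Turn a word-level transcription into a phone-level one."""
    phones = []
    for word in words:
        phones.extend(lexicon[word][0])   # EX: first pronunciation only
    return [start] + phones + [end]       # IS: add boundary symbols


if __name__ == "__main__":
    print(" ".join(expand_to_phones(["forra"], LEXICON)))
    print(" ".join(expand_to_phones(["sju"], LEXICON)))
```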
Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4 ...
> cat lex.dic
forra f oe r a sp
sju S uh a sp
> cat lex2.dic
<sil> [] sil
forra f oe r a sp
sju 0.3 S uh a sp
sju 0.7 S uh sp
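Each dictionary line can be split into its parts with a small parser. The Python sketch below is an illustration only: `parse_dict_line` is a made-up name, a missing probability is taken to be 1.0, and phone names are assumed not to look like numbers.

```python
def parse_dict_line(line):
    """Split one dictionary line into (word, outsym, prob, phones).

    outsym is None when no [OUTSYM] field is present, and an empty
    string for [] (nothing is output for that word).
    """
    fields = line.split()
    word, rest = fields[0], fields[1:]
    outsym = None
    prob = 1.0
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        outsym = rest.pop(0)[1:-1]
    if rest:
        try:
            prob = float(rest[0])
            rest.pop(0)
        except ValueError:
            pass                           # first field is already a phone
    return word, outsym, prob, rest


if __name__ == "__main__":
    for entry in ["<sil> [] sil", "forra f oe r a sp",
                  "sju 0.3 S uh a sp", "sju 0.7 S uh sp"]:
        print(parse_dict_line(entry))
```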
HMM definition files (HHEd)
~h "hmm_name"
<BEGINHMM>
<NUMSTATES> 5
<STATE> 2
<NUMMIXES> 2
<MIXTURE> 1 0.8
<MEAN> 4
0.1 0.0 0.7 0.3
<VARIANCE> 4
0.2 0.1 0.1 0.1
<MIXTURE> 2 0.2
<MEAN> 4
0.2 0.3 0.4 0.0
<VARIANCE> 4
0.1 0.1 0.1 0.2
<STATE> 3
~s "state_name"
<STATE> 4
<NUMMIXES> 2
<MIXTURE> 1 0.7
~m "mix_name"
<MIXTURE> 2 0.3
<MEAN> 4
~u "mean_name"
<VARIANCE> 4
~v "variance_name"
<TRANSP>
~t "transition_name"
<ENDHMM>
- ~h: HMM definition
- ~s: State definition
- ~m: Gaussian mixture component definition
- ~u: Mean vector definition
- ~v: Diagonal variance vector definition
- ~t: Transition matrix definition
[Figure: HMM state diagram with states 1 to 5]
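To make the numbers in state 2 concrete, the following Python sketch evaluates the output density they describe: a mixture of two Gaussians with diagonal covariance, weights 0.8 and 0.2, and four-dimensional mean and variance vectors. It is a plain textbook computation rather than HTK code, and the names `log_gauss_diag`, `log_gmm` and `STATE2` are chosen for this example.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi
    return -0.5 * total


def log_gmm(x, mixtures):
    """Log of the weighted sum of component densities (log-sum-exp)."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in mixtures]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# (weight, mean, variance) for the two mixture components of state 2
STATE2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]

if __name__ == "__main__":
    print(log_gmm([0.1, 0.0, 0.7, 0.3], STATE2))   # at the first mean
    print(log_gmm([3.0, 3.0, 3.0, 3.0], STATE2))   # far from both means
```

Working in the log domain avoids numerical underflow, which matters once feature vectors have 39 dimensions instead of 4.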
- HSLab: graphical tool to label speech (use WaveSurfer instead)
- HList: gives information about audio and feature files
- HSGen: generates random sentences out of a regular grammar
Intermezzo: what do we know so far?
[Figure: ASR overview diagram. Training: audio, feature extraction, features, prototype HMM, dictionary, labels, training, model set. Recognition: audio, feature extraction, features, model set, dictionary, grammar, decoding, labels.]
Model initialization
The initialization procedure depends on the information available at that time.
- HCompV: computes the overall mean and variance.
  Input: a prototype HMM
- HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time aligned transcriptions
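The idea behind the HCompV step can be sketched in a few lines of Python: pool all training frames, compute one mean and one variance per feature dimension, and use them as the starting point for the models. This is a simplified illustration, not the HCompV implementation; `global_mean_var` is a made-up name and features are plain lists of floats.

```python
def global_mean_var(frames):
    """Per-dimension mean and variance over all feature frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    var = [sum((f[k] - mean[k]) ** 2 for f in frames) / n for k in range(dim)]
    return mean, var


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 6.0], [2.0, 4.0]]   # toy 2-dimensional features
    mean, var = global_mean_var(frames)
    print("mean:", mean)
    print("variance:", var)
```

Since this needs no segmentation at all, it is the natural choice when only untimed transcriptions are available; HInit needs time aligned labels instead.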
Training tools
- HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time aligned transcriptions
- HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, timeless transcriptions
- HEAdapt: performs adaptation on a limited set of data
- HSmooth: smooths a set of context-dependent models according to the context-independent counterpart
Training example (RefRec)
first pass:
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)
second pass:
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest
Recognition tools
grammar generation
- HLStats: creates bigram from training data
- HParse: parses a user defined grammar to produce a lattice
decoding
- HVite: performs Viterbi decoding
evaluation
- HResults: evaluates recognition results
Grammar definition (HParse)
[Figure: word network for the command grammar, with branches for delete (page, line, word, char), insert, end insert, bottom, top and move (down, left, up, right), noise loops (sil, fil, spk) and a final quit]
- Outline
- Introduction
- Data formats and manipulation
- Data visualization
- Training
- Recognition
-
ccopy2008 Giampiero Salvi 447
HTK What is it
I A toolkit for Hidden Markov Modeling
I General purpose but
I optimized for Speech Recognition
I Very flexible and complete (active development)
I Very good documentation (HTKBook)
ccopy2008 Giampiero Salvi 547
HTK What is it
I A toolkit for Hidden Markov Modeling
I General purpose but
I optimized for Speech Recognition
I Very flexible and complete (active development)
I Very good documentation (HTKBook)
ccopy2008 Giampiero Salvi 647
HTK What is it
I A toolkit for Hidden Markov Modeling
I General purpose but
I optimized for Speech Recognition
I Very flexible and complete (active development)
I Very good documentation (HTKBook)
ccopy2008 Giampiero Salvi 747
HTK What is it
I A toolkit for Hidden Markov Modeling
I General purpose but
I optimized for Speech Recognition
I Very flexible and complete (active development)
I Very good documentation (HTKBook)
ccopy2008 Giampiero Salvi 847
ASR Overview
Training
audiofeatureextraction
training model set
prototype HMM
features
dictionary
labels
Recognition
audiofeatureextraction
features
dictionary
model set
labelsdecoding
grammar
ccopy2008 Giampiero Salvi 947
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1047
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1147
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1247
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1347
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1447
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1547
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1647
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1747
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1847
The HTK data formats
data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text
audiofeatureextraction
training model set
prototype HMM
features
dictionary
labels
audiofeatureextraction
features
dictionary
model set
labelsdecoding
grammar
ccopy2008 Giampiero Salvi 1947
Usage example (HList)
gt HList
USAGE HList [options] file
Option Default
-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off
ccopy2008 Giampiero Salvi 2047
Command line switches and options
gt HList -e 1 -o -h feature_file
Source feature_file
Sample Bytes 26 Sample Kind MFCC_0
Num Comps 13 Sample Period 100000 us
Num Samples 336 File Format HTK
-------------------- Observation Structure ---------------------
x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0
------------------------ Samples 0-gt1 -------------------------
0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734
1 -13591 -4756 -6037 -3362 3541 3510 2867
0812 0630 5285 1054 8375 40778
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2147
Configuration file
gt cat config_file
SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A
gt HList -C config_file -e 0 -o -h feature_file
Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK
-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0
------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2247
File manipulation tools
I HCopy converts fromto various data formats (audiofeatures)
I HQuant quantizes speech (audio)
I HLEd edits label and master label files
I HDMan edits dictionary files
I HHEd edits model and master macro files
I HBuild converts language models in different formats (morein recognition section)
ccopy2008 Giampiero Salvi 2347
Computing feature files (HCopy)
gt cat config_file
Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250
gt HCopy -C config_file audio_file1 param_file1 audio_file2
gt HCopy -C config_file -S file_list
ccopy2008 Giampiero Salvi 2447
Label files
MLF
filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]
filename2
I [] = optional (0 or 1)
I = possible repetition (0 1 2)
I time stamps are in 100ns units () 10ms = 100000
ccopy2008 Giampiero Salvi 2547
Label file example 1
gt cat alignedmlf
MLFa10001a1rec
0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe
10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec
0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt
ccopy2008 Giampiero Salvi 2647
Label file example 2 (HLEd)
gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf
gt cat wordsmlf
MLF
a10001a1rec
forra
a10001i1rec
sju
gt cat words2phonesled
EX
IS sil sil
gt cat phonesmlf
MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil
ccopy2008 Giampiero Salvi 2747
Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4
gt cat lexdic
forra f oe r a spsju S uh a sp
gt cat lex2dic
ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp
ccopy2008 Giampiero Salvi 2847
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
HMM definition (~h)
1
2 3 4
5
ccopy2008 Giampiero Salvi 2947
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
State definition (~s)
1
2 3 4
5
2
ccopy2008 Giampiero Salvi 3047
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Gaussian mixture component definition(~m)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3147
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Mean vector definition (~u)Diagonal variance vector definition (~v)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3247
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Transition matrix definition (~t)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3347
I HSLab graphical tool to label speech (use WaveSurferinstead)
I HList gives information about audio and feature files
I HSGen generates random sentences out of a regular grammar
ccopy2008 Giampiero Salvi 3447
Intermezzo what do we know so far
Training
audiofeatureextraction
prototype HMM
features
dictionary
labels
model settraining
Recognition
grammar
decodingaudiofeatureextraction
features
model set
labels
dictionary
ccopy2008 Giampiero Salvi 3547
model initialization
Initialization procedure depends on the information avaliable atthat time
I HCompV computes the overall mean and varianceInput a prototype HMM
I HInit Viterbi segmentation + parameter estimation Formixture distribution uses K-means
Input a prototype HMM time aligned transcriptions
ccopy2008 Giampiero Salvi 3647
Traning tools
I HRest Baum-Welch re-estimation
Input an initialized model set time aligned transcriptions
I HERest performs embedded Baum-Welch trainingInput an initialized model set timeless transcriptions
I HEAdapt performs adaptation on a limited set of data
I HSmooth smoots a set of context-dependent modelsaccording to the context-independent counterpart
ccopy2008 Giampiero Salvi 3747
Training example RefRec
first pass
prototype HMM rarr HCompV rarr cloning rarr HERest rarr
realignment (HVite) rarr HERest rarr increase MC (HHEd) rarr
HERest rarr realignment (HVite)
second pass
prototype HMM + aligned transcriptions rarr HInit rarr HRest
rarr HERest rarr increase MC (HHEd) rarr HERest
ccopy2008 Giampiero Salvi 3847
Recognition tools
grammar generation
I HLStats creates bigram from training data
I HParse parses a user defined grammar to produce a lattice
decoding
I HVite performs Viterbi decoding
evaluation
I HResults evaluates recognition results

Grammar definition (HParse)
[Figure: word network for the grammar below: optional noise (sil, fil, spk), then a loop over the commands (move up/down/left/right, top, bottom, delete [char|word|line|page], insert, end [insert]), each optionally followed by noise, then quit and optional trailing noise]
> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
({$noise} < $cmd {$noise} > quit {$noise})
- [] optional
- {} zero or more
- () block
- <> loop
- <<>> context-dependent loop
- | alternative
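
As a rough illustration of which word sequences this grammar accepts, it can be rendered as a regular expression. This is only a sketch: HParse compiles the grammar into a word lattice, not a regex, and the names `SENTENCE` and `accepts` are made up for this example. It assumes that `<>` means one or more repetitions and that words are separated by single spaces.

```python
import re

# Each grammar variable becomes a non-capturing regex group.
DIR = r"(?:up|down|left|right)"
MCMD = rf"(?:move {DIR}|top|bottom)"
ITEM = r"(?:char|word|line|page)"
DCMD = rf"(?:delete(?: {ITEM})?)"   # [$item] is optional
ICMD = r"insert"
ECMD = r"(?:end(?: insert)?)"       # [insert] is optional
CMD = rf"(?:{MCMD}|{DCMD}|{ICMD}|{ECMD})"
NOISE = r"(?:sil|fil|spk)"

# ({$noise} < $cmd {$noise} > quit {$noise})
SENTENCE = re.compile(
    rf"(?:{NOISE} )*(?:{CMD} (?:{NOISE} )*)+quit(?: {NOISE})*"
)


def accepts(sentence: str) -> bool:
    """Return True if the space-separated word sequence fits the grammar."""
    return SENTENCE.fullmatch(sentence) is not None
```

For example, `accepts("sil move up delete word sil quit")` holds, while `accepts("quit")` does not, because the loop requires at least one command.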

Grammar parsing (HParse) and recognition (HVite)
Parse grammar:
> HParse grammar.bnf grammar.slf
Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf
    -y lab dict.txt phones.lis audio_file.wav
Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf
    -y lab dict.txt phones.lis

Evaluation (HResults)
> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
Date: Thu Jan 18 16:17:53 2001
Ref: nworkdir_traintestsetmlf
Rec: nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------
N = total number, I = insertions, S = substitutions, D = deletions
correct: H = N - S - D
%correct: %Corr = H/N
accuracy: Acc = (H - I)/N = (N - S - D - I)/N
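
The definitions above can be checked against the counts in the sample output. The following is a minimal sketch of that arithmetic, not HResults itself; the function name `word_scores` is made up for this example.

```python
def word_scores(hits, deletions, substitutions, insertions):
    """Compute N, %Corr and Acc from the counts reported by HResults."""
    n = hits + deletions + substitutions       # N = H + D + S
    corr = 100.0 * hits / n                    # %Corr = H / N
    acc = 100.0 * (hits - insertions) / n      # Acc = (H - I) / N
    return n, corr, acc


# Counts taken from the WORD line of the sample output.
n, corr, acc = word_scores(hits=9202, deletions=196,
                           substitutions=320, insertions=31)
print(n, round(corr, 2), round(acc, 2))
```

Accuracy is never higher than %Corr, since insertions are subtracted from the hits.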

Outline
- Introduction
- Data formats and manipulation
- Data visualization
- Training
- Recognition

HTK: What is it?
- A toolkit for Hidden Markov Modeling
- General purpose, but optimized for Speech Recognition
- Very flexible and complete (active development)
- Very good documentation (HTKBook)

ASR Overview
[Figure: block diagrams of the two phases. Training: audio → feature extraction → features; features, labels, dictionary and prototype HMM → training → model set. Recognition: audio → feature extraction → features; features, model set, dictionary and grammar → decoding → labels.]

Things that you should have before you start
- familiarity with a Unix-like shell
  - cd, ls, pwd, mkdir, cp, foreach
- text processing tools
  - perl, perl, perl, perl, perl
  - grep, gawk, tr, sed, find, cat, wc
- lots of patience
- the fabulous HTK Book
- a look at the RefRec scripts

The HTK tools
- data manipulation tools: HCopy, HQuant, HLEd, HHEd, HDMan, HBuild
- data visualization tools: HSLab, HList, HSGen
- training tools: HCompV, HInit, HRest, HERest, HEAdapt, HSmooth
- recognition tools: HLStats, HParse, HVite, HResults

The HTK data formats
- audio: many common formats plus HTK binary
- features: HTK binary
- labels: HTK (single or Master Label files) text
- models: HTK (single or Master Macro files) text or binary
- other: HTK text
[Figure: training and recognition block diagrams (audio, feature extraction, features, labels, dictionary, prototype HMM, model set, grammar, decoding)]

Usage example (HList)
> HList
USAGE: HList [options] file
 Option                                    Default
 -d      Coerce observation to VQ symbols   off
 -e N    End at sample N                    0
 -h      Print source header info           off
 -i N    Set items per line to N            10
 -n N    Set num streams to N               1
 -o      Print observation structure        off
 -p      Playback audio                     off
 -r      Write raw output                   off
 -s N    Start at sample N                  0
 -t      Print target header info           off
 -z      Suppress printing data             on
 -A      Print command line arguments       off
 -C cf   Set config file to cf              default
 -D      Display configuration variables    off

Command line switches and options
> HList -e 1 -o -h feature_file
Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
      MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0
------------------------ Samples: 0->1 -------------------------
0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
       3.293   5.428   6.831   5.819   5.606  40.734
1:   -13.591  -4.756  -6.037  -3.362   3.541   3.510   2.867
       0.812   0.630   5.285   1.054   8.375  40.778
----------------------------- END ------------------------------

Configuration file
> cat config_file
SOURCEKIND = MFCC_0
TARGETKIND = MFCC_0_D_A
> HList -C config_file -e 0 -o -h feature_file
Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
      MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
       Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
       Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
       Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
      Acc-10  Acc-11  Acc-12   AccC0
------------------------ Samples: 0->1 -------------------------
0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
       3.293   5.428   6.831   5.819   5.606  40.734  -0.107
      -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
      -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
      -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
       0.097   0.225  -0.294   0.051
----------------------------- END ------------------------------
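
The effect of the `_D_A` qualifiers on the observation structure can be sketched as follows: each static coefficient gains a delta (Del) and an acceleration (Acc) counterpart, so 13 components become 39. The helper `component_names` is hypothetical and only mirrors the names printed by HList above; it does not compute any feature values.

```python
def component_names(num_ceps=12):
    """List component names for MFCC_0_D_A: statics, deltas, accelerations."""
    indices = range(1, num_ceps + 1)
    static = [f"MFCC-{k}" for k in indices] + ["C0"]
    delta = [f"Del-{k}" for k in indices] + ["DelC0"]
    accel = [f"Acc-{k}" for k in indices] + ["AccC0"]
    return static + delta + accel


names = component_names()
print(len(names))   # number of components per observation
```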

File manipulation tools
- HCopy: converts from/to various data formats (audio/features)
- HQuant: quantizes speech (audio)
- HLEd: edits label and master label files
- HDMan: edits dictionary files
- HHEd: edits model and master macro files
- HBuild: converts language models between different formats (more in the recognition section)

Computing feature files (HCopy)
> cat config_file
# Feature configuration
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
# input file format (headerless 8 kHz 16 bit linear PCM)
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NOHEAD
SOURCERATE = 1250
> HCopy -C config_file audio_file1 param_file1 audio_file2 ...
> HCopy -C config_file -S file_list
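
HTK expresses SOURCERATE, WINDOWSIZE and TARGETRATE in units of 100 ns. The sketch below converts them into more familiar quantities, assuming the values SOURCERATE = 1250, WINDOWSIZE = 250000 and TARGETRATE = 100000 (a 25 ms window every 10 ms on 8 kHz audio). The function `frame_params` is hypothetical and not part of HTK.

```python
HTK_UNIT_SECONDS = 100e-9   # one HTK time unit is 100 ns


def frame_params(source_rate, window_size, target_rate):
    """Convert HTK time settings to sample rate and frame sizes in samples."""
    sample_rate_hz = 1.0 / (source_rate * HTK_UNIT_SECONDS)
    window_samples = round(window_size / source_rate)
    shift_samples = round(target_rate / source_rate)
    return sample_rate_hz, window_samples, shift_samples


rate, window, shift = frame_params(source_rate=1250,
                                   window_size=250000,
                                   target_rate=100000)
print(round(rate), window, shift)
```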

Label files
#!MLF!#
"filename1"
[start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
[start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
...
[startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
.
"filename2"
...
- [] = optional (0 or 1)
- {} = possible repetition (0, 1, 2, ...)
- time stamps are in 100 ns units: 10 ms = 100000
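
A minimal sketch of reading one label line and converting its time stamps to seconds is shown below. It handles only the optional start and end times and the label, and it assumes labels are never purely numeric; `parse_label_line` is a made-up helper, not an HTK function.

```python
HTK_UNITS_PER_SECOND = 10_000_000   # 1 s = 10^7 units of 100 ns


def parse_label_line(line):
    """Split '[start [end]] label ...' into (start_s, end_s, label, rest)."""
    fields = line.split()
    times = []
    # Up to two leading integer fields are time stamps (assumption:
    # the label itself is not a bare integer).
    while len(times) < 2 and fields and fields[0].isdigit():
        times.append(int(fields.pop(0)) / HTK_UNITS_PER_SECOND)
    start = times[0] if times else None
    end = times[1] if len(times) > 1 else None
    return start, end, fields[0], fields[1:]


print(parse_label_line("6400000 8600000 f forra"))
```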

Label file example 1
> cat aligned.mlf
#!MLF!#
"a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
10400000 11700000 r
11700000 14100000 a
14100000 14100000 sp
14100000 29800001 sil <sil>
.
"a10001i1.rec"
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>
.

Label file example 2 (HLEd)
> HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf
> cat words.mlf
#!MLF!#
"a10001a1.rec"
forra
.
"a10001i1.rec"
sju
.
> cat words2phones.led
EX
IS sil sil
> cat phones.mlf
#!MLF!#
"a10001a1.rec"
sil
f
oe
r
a
sp
sil
.
"a10001i1.rec"
sil
S
uh
a
sp
sil
.
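
The two edit commands in this example can be mimicked in a few lines: EX expands each word into its dictionary pronunciation, and IS inserts a label at the start and at the end of the sequence. This is a sketch of the idea only, with a hand-written lexicon; `words_to_phones` and `LEXICON` are names invented for the example.

```python
# Pronunciations as in the example dictionary (one per word).
LEXICON = {
    "forra": ["f", "oe", "r", "a", "sp"],
    "sju": ["S", "uh", "a", "sp"],
}


def words_to_phones(words, lexicon, start="sil", end="sil"):
    """Expand words to phones (like EX) and add boundary labels (like IS)."""
    phones = [start]
    for word in words:
        phones.extend(lexicon[word])
    phones.append(end)
    return phones


print(words_to_phones(["forra"], LEXICON))
```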

Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4
> cat lex.dic
forra f oe r a sp
sju S uh a sp
> cat lex2.dic
<sil> [] sil
forra f oe r a sp
sju 0.3 S uh a sp
sju 0.7 S uh sp
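
A dictionary line can be split into its parts roughly as follows. The sketch assumes the output symbol, when present, is written in square brackets, that a missing pronunciation probability means 1.0, and that no phone name parses as a number; `parse_dict_line` is a hypothetical helper.

```python
def parse_dict_line(line):
    """Split a dictionary line into (word, outsym, prob, phones)."""
    fields = line.split()
    word = fields.pop(0)
    outsym = None
    if fields and fields[0].startswith("[") and fields[0].endswith("]"):
        outsym = fields.pop(0)[1:-1]      # "[]" gives an empty output symbol
    prob = 1.0                            # assumed default when omitted
    if fields:
        try:
            prob = float(fields[0])
            fields.pop(0)
        except ValueError:
            pass                          # first field is a phone, not a number
    return word, outsym, prob, fields


print(parse_dict_line("sju 0.3 S uh a sp"))
```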

HMM definition files (HHEd)
~h "hmm_name"
<BEGINHMM>
  <NUMSTATES> 5
  <STATE> 2
    <NUMMIXES> 2
    <MIXTURE> 1 0.8
      <MEAN> 4
        0.1 0.0 0.7 0.3
      <VARIANCE> 4
        0.2 0.1 0.1 0.1
    <MIXTURE> 2 0.2
      <MEAN> 4
        0.2 0.3 0.4 0.0
      <VARIANCE> 4
        0.1 0.1 0.1 0.2
  <STATE> 3
    ~s "state_name"
  <STATE> 4
    <NUMMIXES> 2
    <MIXTURE> 1 0.7
      ~m "mix_name"
    <MIXTURE> 2 0.3
      <MEAN> 4
        ~u "mean_name"
      <VARIANCE> 4
        ~v "variance_name"
  <TRANSP>
    ~t "transition_name"
<ENDHMM>
[Figure: HMM topology with states numbered 1 to 5; states 2, 3 and 4 carry the output distributions defined above]
Macro types:
- ~h: HMM definition
- ~s: state definition
- ~m: Gaussian mixture component definition
- ~u: mean vector definition
- ~v: diagonal variance vector definition
- ~t: transition matrix definition
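
To make the numbers in state 2 concrete, the sketch below evaluates the log-likelihood of an observation under a two-component Gaussian mixture with diagonal covariances, using the weights, means and variances from the definition (with decimal points as 0.8, 0.1, and so on). It is a plain-Python illustration, not how HTK stores or evaluates models; `log_gauss_diag`, `log_mixture` and `STATE2` are made-up names.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_mixture(x, mixtures):
    """Log of the weighted sum of component densities (log-sum-exp)."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in mixtures]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# (weight, mean, variance) for the two mixture components of state 2.
STATE2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]

print(log_mixture([0.1, 0.0, 0.7, 0.3], STATE2))
```

The mixture weights of a state sum to one, which is why 0.8 and 0.2 (or 0.7 and 0.3 in state 4) appear in pairs.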

- HSLab: graphical tool to label speech (use WaveSurfer instead)
- HList: gives information about audio and feature files
- HSGen: generates random sentences out of a regular grammar

Intermezzo: what do we know so far?
[Figure: the training and recognition block diagrams again. Training: audio → feature extraction → features; features, labels, dictionary and prototype HMM → training → model set. Recognition: audio → feature extraction → features; features, model set, dictionary and grammar → decoding → labels.]

Model initialization
The initialization procedure depends on the information available at that time.
- HCompV: computes the overall mean and variance
  Input: a prototype HMM
- HInit: Viterbi segmentation + parameter estimation; for mixture distributions it uses K-means
  Input: a prototype HMM, time-aligned transcriptions
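
The "overall mean and variance" used for a flat start is simply the per-dimension mean and variance over all training frames, which is then copied into every state. The sketch below shows that computation on a toy list of frames; dividing by N rather than N - 1 is an assumption here, and `global_mean_variance` is not an HTK function.

```python
def global_mean_variance(frames):
    """Per-dimension mean and variance over all feature frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    # Variance normalised by n (assumption for this sketch).
    var = [sum((f[k] - mean[k]) ** 2 for f in frames) / n for k in range(dim)]
    return mean, var


mean, var = global_mean_variance([[1.0, 2.0], [3.0, 6.0]])
print(mean, var)
```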

Training tools
- HRest: Baum-Welch re-estimation
  Input: an initialized model set, time-aligned transcriptions
- HERest: performs embedded Baum-Welch training
  Input: an initialized model set, timeless transcriptions
- HEAdapt: performs adaptation on a limited set of data
- HSmooth: smooths a set of context-dependent models according to their context-independent counterparts
- Outline
- Introduction
- Data formats and manipulation
- Data visualization
- Training
- Recognition
-
ccopy2008 Giampiero Salvi 647
HTK What is it
I A toolkit for Hidden Markov Modeling
I General purpose but
I optimized for Speech Recognition
I Very flexible and complete (active development)
I Very good documentation (HTKBook)
ccopy2008 Giampiero Salvi 747
HTK What is it
I A toolkit for Hidden Markov Modeling
I General purpose but
I optimized for Speech Recognition
I Very flexible and complete (active development)
I Very good documentation (HTKBook)
ccopy2008 Giampiero Salvi 847
ASR Overview
Training
audiofeatureextraction
training model set
prototype HMM
features
dictionary
labels
Recognition
audiofeatureextraction
features
dictionary
model set
labelsdecoding
grammar
ccopy2008 Giampiero Salvi 947
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1047
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1147
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1247
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1347
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1447
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1547
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1647
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1747
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1847
The HTK data formats
data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text
audiofeatureextraction
training model set
prototype HMM
features
dictionary
labels
audiofeatureextraction
features
dictionary
model set
labelsdecoding
grammar
ccopy2008 Giampiero Salvi 1947
Usage example (HList)
gt HList
USAGE HList [options] file
Option Default
-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off
ccopy2008 Giampiero Salvi 2047
Command line switches and options
gt HList -e 1 -o -h feature_file
Source feature_file
Sample Bytes 26 Sample Kind MFCC_0
Num Comps 13 Sample Period 100000 us
Num Samples 336 File Format HTK
-------------------- Observation Structure ---------------------
x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0
------------------------ Samples 0-gt1 -------------------------
0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734
1 -13591 -4756 -6037 -3362 3541 3510 2867
0812 0630 5285 1054 8375 40778
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2147
Configuration file
gt cat config_file
SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A
gt HList -C config_file -e 0 -o -h feature_file
Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK
-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0
------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2247
File manipulation tools
I HCopy converts fromto various data formats (audiofeatures)
I HQuant quantizes speech (audio)
I HLEd edits label and master label files
I HDMan edits dictionary files
I HHEd edits model and master macro files
I HBuild converts language models in different formats (morein recognition section)
ccopy2008 Giampiero Salvi 2347
Computing feature files (HCopy)
gt cat config_file
Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250
gt HCopy -C config_file audio_file1 param_file1 audio_file2
gt HCopy -C config_file -S file_list
ccopy2008 Giampiero Salvi 2447
Label files
MLF
filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]
filename2
I [] = optional (0 or 1)
I = possible repetition (0 1 2)
I time stamps are in 100ns units () 10ms = 100000
ccopy2008 Giampiero Salvi 2547
Label file example 1
gt cat alignedmlf
MLFa10001a1rec
0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe
10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec
0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt
ccopy2008 Giampiero Salvi 2647
Label file example 2 (HLEd)
gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf
gt cat wordsmlf
MLF
a10001a1rec
forra
a10001i1rec
sju
gt cat words2phonesled
EX
IS sil sil
gt cat phonesmlf
MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil
ccopy2008 Giampiero Salvi 2747
Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4
gt cat lexdic
forra f oe r a spsju S uh a sp
gt cat lex2dic
ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp
ccopy2008 Giampiero Salvi 2847
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
HMM definition (~h)
1
2 3 4
5
ccopy2008 Giampiero Salvi 2947
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
State definition (~s)
1
2 3 4
5
2
ccopy2008 Giampiero Salvi 3047
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Gaussian mixture component definition(~m)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3147
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Mean vector definition (~u)Diagonal variance vector definition (~v)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3247
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Transition matrix definition (~t)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3347
I HSLab graphical tool to label speech (use WaveSurferinstead)
I HList gives information about audio and feature files
I HSGen generates random sentences out of a regular grammar
ccopy2008 Giampiero Salvi 3447
Intermezzo what do we know so far
Training
audiofeatureextraction
prototype HMM
features
dictionary
labels
model settraining
Recognition
grammar
decodingaudiofeatureextraction
features
model set
labels
dictionary
ccopy2008 Giampiero Salvi 3547
model initialization
Initialization procedure depends on the information avaliable atthat time
I HCompV computes the overall mean and varianceInput a prototype HMM
I HInit Viterbi segmentation + parameter estimation Formixture distribution uses K-means
Input a prototype HMM time aligned transcriptions
ccopy2008 Giampiero Salvi 3647
Traning tools
I HRest Baum-Welch re-estimation
Input an initialized model set time aligned transcriptions
I HERest performs embedded Baum-Welch trainingInput an initialized model set timeless transcriptions
I HEAdapt performs adaptation on a limited set of data
I HSmooth smoots a set of context-dependent modelsaccording to the context-independent counterpart
ccopy2008 Giampiero Salvi 3747
Training example RefRec
first pass
prototype HMM rarr HCompV rarr cloning rarr HERest rarr
realignment (HVite) rarr HERest rarr increase MC (HHEd) rarr
HERest rarr realignment (HVite)
second pass
prototype HMM + aligned transcriptions rarr HInit rarr HRest
rarr HERest rarr increase MC (HHEd) rarr HERest
ccopy2008 Giampiero Salvi 3847
Recognition tools
grammar generation
I HLStats creates bigram from training data
I HParse parses a user defined grammar to produce a lattice
decoding
I HVite performs Viterbi decoding
evaluation
I HResults evaluates recognition results

Grammar definition (HParse)
[Figure: word network for the command grammar, with nodes delete (page, line, word, char), insert, end insert, bottom, top, move (down, left, up, right), quit, and the noise models sil, fil, spk before, between and after the commands]
> cat grammar.bnf
$dir   = up | down | left | right;
$mcmd  = move $dir | top | bottom;
$item  = char | word | line | page;
$dcmd  = delete [$item];
$icmd  = insert;
$ecmd  = end [insert];
$cmd   = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
($noise < $cmd $noise > quit $noise)
- [] optional
- {} zero or more
- () block
- <> loop
- <<>> context-dependent loop
- | alternative
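
To make the grammar concrete, the following Python sketch hand-translates the rules above into two functions and draws random sentences from them, roughly in the spirit of HSGen. It is only an illustration: it does not parse the HParse notation, the function names are hypothetical, and the loop probability `p_repeat` is an arbitrary assumption.

```python
import random

NOISE = ["sil", "fil", "spk"]
DIRS = ["up", "down", "left", "right"]
ITEMS = ["char", "word", "line", "page"]


def command(rng):
    """Expand $cmd = $mcmd | $dcmd | $icmd | $ecmd into a list of words."""
    kind = rng.choice(["mcmd", "dcmd", "icmd", "ecmd"])
    if kind == "mcmd":  # move $dir | top | bottom
        return rng.choice([["move", rng.choice(DIRS)], ["top"], ["bottom"]])
    if kind == "dcmd":  # delete [$item]
        return ["delete"] + rng.choice([[], [rng.choice(ITEMS)]])
    if kind == "icmd":  # insert
        return ["insert"]
    return ["end"] + rng.choice([[], ["insert"]])  # end [insert]


def sentence(rng, p_repeat=0.5):
    """Expand ($noise < $cmd $noise > quit $noise)."""
    words = [rng.choice(NOISE)]
    while True:  # the < > loop is traversed at least once
        words += command(rng) + [rng.choice(NOISE)]
        if rng.random() >= p_repeat:
            break
    return words + ["quit", rng.choice(NOISE)]


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(" ".join(sentence(rng)))
```

Every generated sentence starts and ends with a noise model and contains exactly one quit, which is what the top-level rule of the grammar requires.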

Grammar parsing (HParse) and recognition (HVite)
Parse the grammar:
> HParse grammar.bnf grammar.slf
Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav
Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis

Evaluation (HResults)
> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------
N = total number, I = insertions, S = substitutions, D = deletions
correct:  H = N − S − D
%correct: %Corr = H/N
accuracy: Acc = (H − I)/N = (N − S − D − I)/N
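
As a quick check of these formulas, the short Python sketch below recomputes the word-level figures from the counts printed in the WORD line. The function name is hypothetical and the code is not part of HTK; it only restates the definitions of H, %Corr and Acc.

```python
def word_scores(n, d, s, i):
    """Return (H, %Corr, Acc) from the total N and the D/S/I error counts."""
    h = n - s - d                    # correct words: H = N - S - D
    corr = 100.0 * h / n             # %Corr = H / N
    acc = 100.0 * (h - i) / n        # Acc = (H - I) / N
    return h, corr, acc


if __name__ == "__main__":
    h, corr, acc = word_scores(n=9718, d=196, s=320, i=31)
    print(f"H={h} Corr={corr:.2f} Acc={acc:.2f}")
```

With the counts above this gives H=9202, a correctness of 94.69 and an accuracy of 94.37. Accuracy is lower than correctness because insertions are penalized as well.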

Things that you should have before you start
- familiarity with a Unix-like shell
  - cd, ls, pwd, mkdir, cp, foreach
- text processing tools
  - perl, perl, perl, perl, perl
  - grep, gawk, tr, sed, find, cat, wc
- lots of patience
- the fabulous HTK Book
- a look at the RefRec scripts

The HTK tools
- data manipulation tools: HCopy, HQuant, HLEd, HHEd, HDMan, HBuild
- data visualization tools: HSLab, HList, HSGen
- training tools: HCompV, HInit, HRest, HERest, HEAdapt, HSmooth
- recognition tools: HLStats, HParse, HVite, HResults

The HTK data formats
data formats
  audio:    many common formats plus HTK binary
  features: HTK binary
  labels:   HTK (single or Master Label files), text
  models:   HTK (single or Master Macro files), text or binary
  other:    HTK text
[Figure: ASR overview diagram with the training and recognition data flows (audio, feature extraction, features, prototype HMM, dictionary, labels, model set, grammar, decoding)]

Usage example (HList)
> HList
USAGE: HList [options] file
 Option                                      Default
 -d      Coerce observation to VQ symbols    off
 -e N    End at sample N                     0
 -h      Print source header info            off
 -i N    Set items per line to N             10
 -n N    Set num streams to N                1
 -o      Print observation structure         off
 -p      Playback audio                      off
 -r      Write raw output                    off
 -s N    Start at sample N                   0
 -t      Print target header info            off
 -z      Suppress printing data              on
 -A      Print command line arguments        off
 -C cf   Set config file to cf               default
 -D      Display configuration variables     off

Command line switches and options
> HList -e 1 -o -h feature_file
Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
      MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0
------------------------ Samples: 0->1 -------------------------
0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
       3.293   5.428   6.831   5.819   5.606  40.734
1:   -13.591  -4.756  -6.037  -3.362   3.541   3.510   2.867
       0.812   0.630   5.285   1.054   8.375  40.778
----------------------------- END ------------------------------

Configuration file
> cat config_file
SOURCEKIND = MFCC_0
TARGETKIND = MFCC_0_D_A
> HList -C config_file -e 0 -o -h feature_file
Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
      MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
       Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
       Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
       Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
      Acc-10  Acc-11  Acc-12   AccC0
------------------------ Samples: 0->1 -------------------------
0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
       3.293   5.428   6.831   5.819   5.606  40.734  -0.107
      -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
      -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
      -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
       0.097   0.225  -0.294   0.051
----------------------------- END ------------------------------
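
The change from 13 stored components to 39 listed values follows from the _D and _A qualifiers in TARGETKIND, each of which appends one block the size of the static vector. A minimal Python sketch of that bookkeeping is given below. The function name is hypothetical, and the sketch deliberately ignores other qualifiers that can also change the vector size.

```python
def num_components(target_kind, num_static):
    """Count vector components for a parameter kind such as MFCC_0_D_A.

    num_static is the size of the static vector (here 12 cepstra + C0 = 13).
    Only the _D (delta) and _A (acceleration) qualifiers are considered.
    """
    qualifiers = target_kind.split("_")[1:]
    blocks = 1 + ("D" in qualifiers) + ("A" in qualifiers)
    return num_static * blocks


if __name__ == "__main__":
    for kind in ("MFCC_0", "MFCC_0_D", "MFCC_0_D_A"):
        print(kind, num_components(kind, 13))
```

For MFCC_0_D_A with 13 static coefficients this yields 39, matching the MFCC, Del and Acc names in the observation structure.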

File manipulation tools
- HCopy: converts from/to various data formats (audio, features)
- HQuant: quantizes speech (audio)
- HLEd: edits label and master label files
- HDMan: edits dictionary files
- HHEd: edits model and master macro files
- HBuild: converts language models between different formats (more in the recognition section)

Computing feature files (HCopy)
> cat config_file
# Feature configuration
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
# input file format (headerless 8 kHz 16 bit linear PCM)
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NOHEAD
SOURCERATE = 1250
> HCopy -C config_file audio_file1 param_file1 audio_file2
> HCopy -C config_file -S file_list
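
The rate and window settings are easier to read once converted out of HTK's 100 ns time units. The Python sketch below does that conversion, assuming a 10 ms frame shift (TARGETRATE = 100000.0) and a 25 ms analysis window (WINDOWSIZE = 250000.0) together with the SOURCERATE of 1250 shown above. The function name is hypothetical and nothing here calls HTK.

```python
HTK_UNIT_S = 100e-9  # HTK expresses durations in units of 100 ns


def frame_setup(source_rate, window_size, target_rate):
    """Return (sampling rate in Hz, samples per window, samples per frame shift)."""
    sample_rate_hz = 1.0 / (source_rate * HTK_UNIT_S)
    samples_per_window = round(window_size / source_rate)
    samples_per_shift = round(target_rate / source_rate)
    return sample_rate_hz, samples_per_window, samples_per_shift


if __name__ == "__main__":
    rate, window, shift = frame_setup(1250, 250000.0, 100000.0)
    print(f"{rate:.0f} Hz, {window} samples per window, {shift} samples per shift")
```

Under these assumptions a SOURCERATE of 1250 corresponds to 8 kHz audio, a window of 200 samples and a shift of 80 samples, which agrees with the "headerless 8 kHz" comment in the configuration.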

Label files
#!MLF!#
filename1
[start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
[start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
...
[startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
filename2
...
- [] = optional (0 or 1)
- {} = possible repetition (0, 1, 2, ...)
- time stamps are in 100 ns units: 10 ms = 100000
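
The 100 ns unit is a frequent source of mistakes, so here is a small Python sketch that reads one label line in the format above and converts the optional time stamps to seconds. The parser is hypothetical and simplified: it keeps only the first label, ignores scores and auxiliary labels, and assumes that labels are never purely numeric.

```python
HTK_UNIT_S = 100e-9  # one HTK time unit is 100 ns


def parse_label_line(line):
    """Parse '[start [end]] label ...' into (start_s, end_s, label)."""
    fields = line.split()
    times = []
    # Up to two leading integers are time stamps; one field must remain as label.
    while len(times) < 2 and len(fields) > 1 and fields[0].isdigit():
        times.append(int(fields.pop(0)) * HTK_UNIT_S)
    if not fields:
        raise ValueError("empty label line")
    start = times[0] if times else None
    end = times[1] if len(times) > 1 else None
    return start, end, fields[0]


if __name__ == "__main__":
    print(parse_label_line("6400000 8600000 f forra"))
    print(parse_label_line("sil"))
```

A line such as "6400000 8600000 f forra" is read as the phone f spanning roughly 0.64 s to 0.86 s, while a bare "sil" has no time information at all.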

Label file example 1
> cat aligned.mlf
#!MLF!#
a10001a1.rec
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
10400000 11700000 r
11700000 14100000 a
14100000 14100000 sp
14100000 29800001 sil <sil>
a10001i1.rec
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>

Label file example 2 (HLEd)
> HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf
> cat words.mlf
#!MLF!#
a10001a1.rec
forra
a10001i1.rec
sju
> cat words2phones.led
EX
IS sil sil
> cat phones.mlf
#!MLF!#
a10001a1.rec
sil
f
oe
r
a
sp
sil
a10001i1.rec
sil
S
uh
a
sp
sil
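
The effect of the two edit commands can be imitated in a few lines of Python: EX replaces each word by its pronunciation from the dictionary, and IS sil sil adds sil at the start and end of the utterance. This is a sketch of the idea only, with hypothetical names and a hand-written lexicon copied from lex.dic; it is not how HLEd is implemented.

```python
LEXICON = {
    "forra": ["f", "oe", "r", "a", "sp"],
    "sju": ["S", "uh", "a", "sp"],
}


def expand_to_phones(words, lexicon, start="sil", end="sil"):
    """Imitate 'EX' followed by 'IS sil sil' for a single utterance."""
    phones = []
    for word in words:
        phones.extend(lexicon[word])   # EX: word -> phone sequence
    return [start] + phones + [end]    # IS: insert labels at both ends


if __name__ == "__main__":
    for utterance in (["forra"], ["sju"]):
        print(" ".join(expand_to_phones(utterance, LEXICON)))
```

For the two utterances in words.mlf this reproduces the phone sequences listed in phones.mlf.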

Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4 ...
> cat lex.dic
forra f oe r a sp
sju S uh a sp
> cat lex2.dic
<sil> [] sil
forra f oe r a sp
sju 0.3 S uh a sp
sju 0.7 S uh sp

HMM definition files (HHEd)
~h hmm_name
<BEGINHMM>
  <NUMSTATES> 5
  <STATE> 2
    <NUMMIXES> 2
    <MIXTURE> 1 0.8
      <MEAN> 4
        0.1 0.0 0.7 0.3
      <VARIANCE> 4
        0.2 0.1 0.1 0.1
    <MIXTURE> 2 0.2
      <MEAN> 4
        0.2 0.3 0.4 0.0
      <VARIANCE> 4
        0.1 0.1 0.1 0.2
  <STATE> 3
    ~s state_name
  <STATE> 4
    <NUMMIXES> 2
    <MIXTURE> 1 0.7
      ~m mix_name
    <MIXTURE> 2 0.3
      <MEAN> 4
        ~u mean_name
      <VARIANCE> 4
        ~v variance_name
  <TRANSP>
    ~t transition_name
<ENDHMM>
Macro types used in the definition:
- ~h HMM definition
- ~s state definition
- ~m Gaussian mixture component definition
- ~u mean vector definition
- ~v diagonal variance vector definition
- ~t transition matrix definition
[Figure: HMM state diagram with states 1 to 5]
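
To show what the numbers in a state definition mean, the Python sketch below evaluates the log output probability of a diagonal-covariance Gaussian mixture, using the weights, means and variances given for state 2 above (read as 0.8 and 0.2 for the weights). The function names are hypothetical and the code is a textbook formula, not HTK's implementation.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, mixtures):
    """Log of sum_k w_k N(x; mean_k, var_k), computed with log-sum-exp."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in mixtures]
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))


# (weight, mean, variance) for the two mixture components of state 2
STATE2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]

if __name__ == "__main__":
    observation = [0.1, 0.0, 0.7, 0.3]
    print(log_gmm(observation, STATE2))
```

The log-sum-exp step keeps the computation stable when individual component likelihoods are very small, which is the usual situation with real feature vectors.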

- HSLab: graphical tool to label speech (use WaveSurfer instead)
- HList: gives information about audio and feature files
- HSGen: generates random sentences out of a regular grammar

Intermezzo: what do we know so far?
[Figure: ASR overview diagram. Training: audio, feature extraction, features, prototype HMM, dictionary, labels, training, model set. Recognition: audio, feature extraction, features, model set, dictionary, grammar, decoding, labels.]

Model initialization
The initialization procedure depends on the information available at that time.
- HCompV: computes the overall mean and variance.
  Input: a prototype HMM
- HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time-aligned transcriptions
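
The "overall mean and variance" computed for a flat start is simply the per-dimension mean and variance over all training frames, which are then copied into every state of the prototype. A minimal Python sketch of that computation follows. The function name is hypothetical, and details such as variance flooring or the exact normalization used by HCompV are left out.

```python
def global_mean_var(frames):
    """Per-dimension mean and (biased) variance over a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[k] for frame in frames) / n for k in range(dim)]
    var = [
        sum((frame[k] - mean[k]) ** 2 for frame in frames) / n
        for k in range(dim)
    ]
    return mean, var


if __name__ == "__main__":
    toy_frames = [[1.0, 2.0], [3.0, 6.0]]
    print(global_mean_var(toy_frames))
```

Because every state starts from the same statistics, no alignment is needed at this stage, which is why a prototype HMM is the only model input.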

Training tools
- HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions
- HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, timeless transcriptions
- HEAdapt: performs adaptation on a limited set of data
- HSmooth: smooths a set of context-dependent models according to the context-independent counterpart
- Outline
- Introduction
- Data formats and manipulation
- Data visualization
- Training
- Recognition
-
ccopy2008 Giampiero Salvi 847
ASR Overview
Training
audiofeatureextraction
training model set
prototype HMM
features
dictionary
labels
Recognition
audiofeatureextraction
features
dictionary
model set
labelsdecoding
grammar
ccopy2008 Giampiero Salvi 947
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1047
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1147
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1247
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1347
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1447
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1547
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1647
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1747
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1847
The HTK data formats
data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text
audiofeatureextraction
training model set
prototype HMM
features
dictionary
labels
audiofeatureextraction
features
dictionary
model set
labelsdecoding
grammar
ccopy2008 Giampiero Salvi 1947
Usage example (HList)
gt HList
USAGE HList [options] file
Option Default
-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off
ccopy2008 Giampiero Salvi 2047
Command line switches and options
gt HList -e 1 -o -h feature_file
Source feature_file
Sample Bytes 26 Sample Kind MFCC_0
Num Comps 13 Sample Period 100000 us
Num Samples 336 File Format HTK
-------------------- Observation Structure ---------------------
x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0
------------------------ Samples 0-gt1 -------------------------
0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734
1 -13591 -4756 -6037 -3362 3541 3510 2867
0812 0630 5285 1054 8375 40778
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2147
Configuration file
gt cat config_file
SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A
gt HList -C config_file -e 0 -o -h feature_file
Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK
-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0
------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2247
File manipulation tools
I HCopy converts fromto various data formats (audiofeatures)
I HQuant quantizes speech (audio)
I HLEd edits label and master label files
I HDMan edits dictionary files
I HHEd edits model and master macro files
I HBuild converts language models in different formats (morein recognition section)
ccopy2008 Giampiero Salvi 2347
Computing feature files (HCopy)
gt cat config_file
Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250
gt HCopy -C config_file audio_file1 param_file1 audio_file2
gt HCopy -C config_file -S file_list
ccopy2008 Giampiero Salvi 2447
Label files
MLF
filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]
filename2
I [] = optional (0 or 1)
I = possible repetition (0 1 2)
I time stamps are in 100ns units () 10ms = 100000
ccopy2008 Giampiero Salvi 2547
Label file example 1
gt cat alignedmlf
MLFa10001a1rec
0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe
10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec
0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt
ccopy2008 Giampiero Salvi 2647
Label file example 2 (HLEd)
gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf
gt cat wordsmlf
MLF
a10001a1rec
forra
a10001i1rec
sju
gt cat words2phonesled
EX
IS sil sil
gt cat phonesmlf
MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil
ccopy2008 Giampiero Salvi 2747
Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4
gt cat lexdic
forra f oe r a spsju S uh a sp
gt cat lex2dic
ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp
ccopy2008 Giampiero Salvi 2847
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
HMM definition (~h)
1
2 3 4
5
ccopy2008 Giampiero Salvi 2947
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
State definition (~s)
1
2 3 4
5
2
ccopy2008 Giampiero Salvi 3047
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Gaussian mixture component definition(~m)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3147
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Mean vector definition (~u)Diagonal variance vector definition (~v)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3247
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Transition matrix definition (~t)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3347
I HSLab graphical tool to label speech (use WaveSurferinstead)
I HList gives information about audio and feature files
I HSGen generates random sentences out of a regular grammar
ccopy2008 Giampiero Salvi 3447
Intermezzo what do we know so far
Training
audiofeatureextraction
prototype HMM
features
dictionary
labels
model settraining
Recognition
grammar
decodingaudiofeatureextraction
features
model set
labels
dictionary
ccopy2008 Giampiero Salvi 3547
model initialization
Initialization procedure depends on the information avaliable atthat time
I HCompV computes the overall mean and varianceInput a prototype HMM
I HInit Viterbi segmentation + parameter estimation Formixture distribution uses K-means
Input a prototype HMM time aligned transcriptions
ccopy2008 Giampiero Salvi 3647
Traning tools
I HRest Baum-Welch re-estimation
Input an initialized model set time aligned transcriptions
I HERest performs embedded Baum-Welch trainingInput an initialized model set timeless transcriptions
I HEAdapt performs adaptation on a limited set of data
I HSmooth smoots a set of context-dependent modelsaccording to the context-independent counterpart
ccopy2008 Giampiero Salvi 3747
Training example RefRec
first pass
prototype HMM rarr HCompV rarr cloning rarr HERest rarr
realignment (HVite) rarr HERest rarr increase MC (HHEd) rarr
HERest rarr realignment (HVite)
second pass
prototype HMM + aligned transcriptions rarr HInit rarr HRest
rarr HERest rarr increase MC (HHEd) rarr HERest
ccopy2008 Giampiero Salvi 3847
Recognition tools
grammar generation
I HLStats creates bigram from training data
I HParse parses a user defined grammar to produce a lattice
decoding
I HVite performs Viterbi decoding
evaluation
I HResults evaluates recognition results
ccopy2008 Giampiero Salvi 3947
Grammar definition (HParse)
delete
page
line
word
char
insert
end insert
bottom
top
movedown
left
up
right
sil
fil
spk
sil
fil
spk
quit
sil
fil
spk
ccopy2008 Giampiero Salvi 4047
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
Grammar parsing (HParse) and recognition (HVite)
Parse grammar:
    > HParse grammar.bnf grammar.slf
Run recognition on file(s):
    > HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav
Run recognition live:
    > HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis
Evaluation (HResults)
    > HResults -I reference.mlf wordlst recognized.mlf
    ====================== HTK Results Analysis =======================
      Date: Thu Jan 18 16:17:53 2001
      Ref : nworkdir_traintestsetmlf
      Rec : nresults_trainmono_32_2recmlf
    ------------------------ Overall Results --------------------------
    SENT: %Correct=74.07 [H=994, S=348, N=1342]
    WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
    -------------------------------------------------------------------
N = total number, I = insertions, S = substitutions, D = deletions
correct: H = N - S - D
%correct: %Corr = H/N
accuracy: Acc = (H - I)/N = (N - S - D - I)/N
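The word-level figures in the report follow directly from the bracketed counts. A minimal Python sketch, assuming the counts are copied by hand from the HResults output (the function name `word_scores` is illustrative and not part of HTK):

```python
def word_scores(n, d, s, i):
    """Return (percent correct, percent accuracy) from HResults-style counts.

    n = total reference words, d = deletions, s = substitutions, i = insertions.
    """
    h = n - d - s                  # correctly recognized words
    corr = 100.0 * h / n           # ignores insertions
    acc = 100.0 * (h - i) / n      # penalizes insertions as well
    return corr, acc

corr, acc = word_scores(n=9718, d=196, s=320, i=31)
print(f"Corr={corr:.2f} Acc={acc:.2f}")   # prints Corr=94.69 Acc=94.37
```

Accuracy can never exceed the correct rate, and it drops below it as soon as the recognizer inserts extra words.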
Things that you should have before you start
- familiarity with a Unix-like shell
  - cd, ls, pwd, mkdir, cp, foreach
- text processing tools
  - perl, perl, perl, perl, perl
  - grep, gawk, tr, sed, find, cat, wc
- lots of patience
- the fabulous HTK Book
- a look at the RefRec scripts
The HTK tools
- data manipulation tools: HCopy, HQuant, HLEd, HHEd, HDMan, HBuild
- data visualization tools: HSLab, HList, HSGen
- training tools: HCompV, HInit, HRest, HERest, HEAdapt, HSmooth
- recognition tools: HLStats, HParse, HVite, HResults
The HTK data formats
    data      formats
    audio     many common formats plus HTK binary
    features  HTK binary
    labels    HTK (single or Master Label files), text
    models    HTK (single or Master Macro files), text or binary
    other     HTK text
[Figure: ASR overview diagram with the training and recognition pipelines (audio, feature extraction, features, labels, dictionary, prototype HMM, model set, grammar, decoding)]
Usage example (HList)
    > HList
    USAGE: HList [options] file

     Option                                     Default
     -d     Coerce observation to VQ symbols    off
     -e N   End at sample N                     0
     -h     Print source header info            off
     -i N   Set items per line to N             10
     -n N   Set num streams to N                1
     -o     Print observation structure         off
     -p     Playback audio                      off
     -r     Write raw output                    off
     -s N   Start at sample N                   0
     -t     Print target header info            off
     -z     Suppress printing data              on
     -A     Print command line arguments        off
     -C cf  Set config file to cf               default
     -D     Display configuration variables     off
Command line switches and options
    > HList -e 1 -o -h feature_file
    Source: feature_file
      Sample Bytes:  26       Sample Kind:   MFCC_0
      Num Comps:     13       Sample Period: 10000.0 us
      Num Samples:   336      File Format:   HTK
    -------------------- Observation Structure ---------------------
    x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
          MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0
    ------------------------ Samples: 0->1 -------------------------
    0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
           3.293   5.428   6.831   5.819   5.606  40.734
    1:   -13.591  -4.756  -6.037  -3.362   3.541   3.510   2.867
           0.812   0.630   5.285   1.054   8.375  40.778
    ----------------------------- END ------------------------------
Configuration file
    > cat config_file
    SOURCEKIND = MFCC_0
    TARGETKIND = MFCC_0_D_A
    > HList -C config_file -e 0 -o -h feature_file
    Source: feature_file
      Sample Bytes:  26       Sample Kind:   MFCC_0
      Num Comps:     13       Sample Period: 10000.0 us
      Num Samples:   336      File Format:   HTK
    -------------------- Observation Structure ---------------------
    x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
          MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
           Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
           Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
           Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
          Acc-10  Acc-11  Acc-12   AccC0
    ------------------------ Samples: 0->1 -------------------------
    0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
           3.293   5.428   6.831   5.819   5.606  40.734  -0.107
          -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
          -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
          -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
           0.097   0.225  -0.294   0.051
    ----------------------------- END ------------------------------
File manipulation tools
- HCopy: converts from/to various data formats (audio, features)
- HQuant: quantizes speech (audio)
- HLEd: edits label and master label files
- HDMan: edits dictionary files
- HHEd: edits model and master macro files
- HBuild: converts language models in different formats (more in the recognition section)
Computing feature files (HCopy)
    > cat config_file
    # Feature configuration
    TARGETKIND = MFCC_0
    TARGETRATE = 100000.0
    SAVECOMPRESSED = T
    SAVEWITHCRC = T
    WINDOWSIZE = 250000.0
    USEHAMMING = T
    PREEMCOEF = 0.97
    NUMCHANS = 26
    CEPLIFTER = 22
    NUMCEPS = 12
    ENORMALISE = F
    # input file format (headerless 8 kHz 16 bit linear PCM)
    SOURCEKIND = WAVEFORM
    SOURCEFORMAT = NOHEAD
    SOURCERATE = 1250
    > HCopy -C config_file audio_file1 param_file1 audio_file2
    > HCopy -C config_file -S file_list
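HTK expresses rates and durations in units of 100 ns, which is why 8 kHz audio gets SOURCERATE = 1250. The conversion can be checked with a small Python sketch; the helper names are made up for illustration and are not part of HTK.

```python
def hz_to_htk_period(sample_rate_hz):
    """Sample period in HTK time units (100 ns) for a given sampling rate."""
    return 1e7 / sample_rate_hz

def ms_to_htk(duration_ms):
    """Duration in HTK time units (100 ns) for a value given in milliseconds."""
    return duration_ms * 1e4

print(hz_to_htk_period(8000))  # sample period of 8 kHz audio
print(ms_to_htk(10))           # a 10 ms frame shift
print(ms_to_htk(25))           # a 25 ms analysis window
```

The same unit applies to the time stamps in label files, so one helper covers both configuration values and label times.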
Label files
    #!MLF!#
    "filename1"
    [start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
    [start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
    ...
    [startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
    .
    "filename2"
    ...
- [.] = optional (0 or 1)
- {.} = possible repetition (0, 1, 2, ...)
- time stamps are in 100 ns units: 10 ms = 100000
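Reading one label line is mostly a matter of peeling off the optional time stamps. The following Python sketch handles only the simple shape `[start [end]] label ...` and converts times to seconds; it assumes non-empty lines and labels that are not purely numeric, which is narrower than the full HTK label syntax. The function name is illustrative.

```python
def parse_label_line(line):
    """Parse '[start [end]] label ...' into (start_s, end_s, label, rest).

    Times are converted from 100 ns units to seconds; missing times are None.
    """
    fields = line.split()
    times = []
    # Up to two leading integer fields are the start and end times.
    while len(fields) > 1 and len(times) < 2 and fields[0].isdigit():
        times.append(int(fields.pop(0)) / 1e7)
    start = times[0] if times else None
    end = times[1] if len(times) > 1 else None
    return start, end, fields[0], fields[1:]

print(parse_label_line("6400000 8600000 f forra"))
print(parse_label_line("sju"))
```

Everything after the label (scores, auxiliary labels) is returned unparsed in `rest`, since its interpretation depends on which optional fields the file actually uses.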
Label file example 1
    > cat aligned.mlf
    #!MLF!#
    "a10001a1.rec"
    0 6400000 sil <sil>
    6400000 8600000 f forra
    8600000 10400000 oe
    10400000 11700000 r
    11700000 14100000 a
    14100000 14100000 sp
    14100000 29800001 sil <sil>
    .
    "a10001i1.rec"
    0 2600000 sil <sil>
    2600000 4900000 S sju
    4900000 8300000 uh
    8300000 8600000 a
    8600000 8600000 sp
    8600000 21600000 sil <sil>
    .
Label file example 2 (HLEd)
    > HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf
    > cat words.mlf
    #!MLF!#
    "a10001a1.rec"
    forra
    .
    "a10001i1.rec"
    sju
    .
    > cat words2phones.led
    EX
    IS sil sil
    > cat phones.mlf
    #!MLF!#
    "a10001a1.rec"
    sil
    f
    oe
    r
    a
    sp
    sil
    .
    "a10001i1.rec"
    sil
    S
    uh
    a
    sp
    sil
    .
Dictionary (HDMan)
    WORD [OUTSYM] PRONPROB P1 P2 P3 P4

    > cat lex.dic
    forra f oe r a sp
    sju S uh a sp
    > cat lex2.dic
    <sil> [] sil
    forra f oe r a sp
    sju 0.3 S uh a sp
    sju 0.7 S uh sp
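The optional fields make dictionary lines slightly awkward to split by hand. Below is a minimal Python sketch of a parser for entries of the form WORD [OUTSYM] PRONPROB P1 P2 ...; it assumes whitespace-separated fields, an output symbol written as one bracketed token, and phone names that do not look like numbers. It is an illustration, not HTK's own dictionary reader.

```python
def parse_dict_line(line):
    """Split a dictionary entry into (word, outsym, prob, phones).

    outsym is None when no [..] field is given; prob defaults to 1.0.
    """
    fields = line.split()
    word, rest = fields[0], fields[1:]
    outsym = None
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        outsym = rest.pop(0)[1:-1]      # '[]' gives the empty string
    prob = 1.0
    if rest:
        try:
            prob = float(rest[0])       # a leading number is the pronunciation probability
            rest = rest[1:]
        except ValueError:
            pass                        # first field is already a phone
    return word, outsym, prob, rest

print(parse_dict_line("sju 0.3 S uh a sp"))
print(parse_dict_line("<sil> [] sil"))
```

Several lines with the same word then simply become several pronunciation variants, each with its own probability.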
HMM definition files (HHEd)
    ~h hmm_name
    <BEGINHMM>
      <NUMSTATES> 5
      <STATE> 2
        <NUMMIXES> 2
        <MIXTURE> 1 0.8
          <MEAN> 4
            0.1 0.0 0.7 0.3
          <VARIANCE> 4
            0.2 0.1 0.1 0.1
        <MIXTURE> 2 0.2
          <MEAN> 4
            0.2 0.3 0.4 0.0
          <VARIANCE> 4
            0.1 0.1 0.1 0.2
      <STATE> 3
        ~s state_name
      <STATE> 4
        <NUMMIXES> 2
        <MIXTURE> 1 0.7
          ~m mix_name
        <MIXTURE> 2 0.3
          <MEAN> 4
            ~u mean_name
          <VARIANCE> 4
            ~v variance_name
      <TRANSP>
        ~t transition_name
    <ENDHMM>

Macro types used in the definition:
- ~h HMM definition
- ~s state definition
- ~m Gaussian mixture component definition
- ~u mean vector definition
- ~v diagonal variance vector definition
- ~t transition matrix definition
[Figure: HMM topology with states 1 to 5]
- HSLab: graphical tool to label speech (use WaveSurfer instead)
- HList: gives information about audio and feature files
- HSGen: generates random sentences out of a regular grammar
Intermezzo: what do we know so far?
[Figure: ASR overview. Training: audio, feature extraction, features, labels, dictionary and prototype HMM feed the training of the model set. Recognition: audio, feature extraction, features, model set, dictionary and grammar feed the decoding, which produces labels.]
Model initialization
The initialization procedure depends on the information available at that time.
- HCompV: computes the overall mean and variance.
  Input: a prototype HMM
- HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time-aligned transcriptions
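The "overall mean and variance" is simply a per-dimension statistic pooled over all feature vectors, regardless of which phone they belong to. A minimal Python sketch of that statistic follows; it is not HTK's code, and dividing by n rather than n - 1 is an assumption made here for simplicity.

```python
def global_mean_variance(frames):
    """Per-dimension mean and variance over all feature vectors.

    frames: non-empty list of equal-length lists of floats.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    var = [sum((f[k] - mean[k]) ** 2 for f in frames) / n for k in range(dim)]
    return mean, var

# Three toy 2-dimensional feature vectors (made-up values).
frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 8.0]]
mean, var = global_mean_variance(frames)
print(mean, var)
```

Because no alignment is needed, this kind of initialization works when only untimed transcriptions exist; HInit, by contrast, needs time-aligned labels to estimate each model separately.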
Training tools
- HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions
- HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, timeless transcriptions
- HEAdapt: performs adaptation on a limited set of data
- HSmooth: smooths a set of context-dependent models according to the context-independent counterpart
Training example (RefRec)
first pass:
  prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)
second pass:
  prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest
Recognition tools
grammar generation
- HLStats: creates a bigram from training data
- HParse: parses a user-defined grammar to produce a lattice
decoding
- HVite: performs Viterbi decoding
evaluation
- HResults: evaluates recognition results
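Viterbi decoding finds the single most likely state sequence by dynamic programming. The Python sketch below shows the idea on a toy discrete HMM in the log domain; HVite itself works on a word network of continuous-density HMMs and is considerably more elaborate. All state names, symbols and probabilities here are made up for illustration.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely state sequence for obs under a discrete HMM (log domain)."""
    # best[s] = best log score of any path ending in state s at the current time
    best = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []                                   # back-pointers, one dict per time step
    for o in obs[1:]:
        prev, best, ptr = best, {}, {}
        for s in states:
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            best[s] = prev[p] + log_trans[p][s] + log_emit[s][o]
            ptr[s] = p
        back.append(ptr)
    last = max(states, key=lambda s: best[s])
    path = [last]
    for ptr in reversed(back):                  # trace the best path backwards
        path.append(ptr[path[-1]])
    return list(reversed(path)), best[last]

def logs(d):
    """Convert a dict of probabilities to log probabilities."""
    return {k: math.log(v) for k, v in d.items()}

states = ["sil", "speech"]
log_start = logs({"sil": 0.8, "speech": 0.2})
log_trans = {"sil": logs({"sil": 0.7, "speech": 0.3}),
             "speech": logs({"sil": 0.2, "speech": 0.8})}
log_emit = {"sil": logs({"lo": 0.9, "hi": 0.1}),
            "speech": logs({"lo": 0.2, "hi": 0.8})}

path, score = viterbi(["lo", "hi", "hi", "lo"], states, log_start, log_trans, log_emit)
print(path)   # prints ['sil', 'speech', 'speech', 'sil']
```

Working with log probabilities turns products into sums and avoids numerical underflow on long utterances.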
Grammar definition (HParse)
[Figure: word network for the command grammar, with nodes delete, page, line, word, char, insert, end insert, bottom, top, move, down, left, up, right and quit, and the noise models sil, fil, spk before, between and after the commands]
- Outline
- Introduction
- Data formats and manipulation
- Data visualization
- Training
- Recognition
-
ccopy2008 Giampiero Salvi 1047
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1147
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1247
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1347
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1447
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1547
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1647
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1747
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1847
The HTK data formats
data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text
audiofeatureextraction
training model set
prototype HMM
features
dictionary
labels
audiofeatureextraction
features
dictionary
model set
labelsdecoding
grammar
ccopy2008 Giampiero Salvi 1947
Usage example (HList)
gt HList
USAGE HList [options] file
Option Default
-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off
ccopy2008 Giampiero Salvi 2047
Command line switches and options
gt HList -e 1 -o -h feature_file
Source feature_file
Sample Bytes 26 Sample Kind MFCC_0
Num Comps 13 Sample Period 100000 us
Num Samples 336 File Format HTK
-------------------- Observation Structure ---------------------
x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0
------------------------ Samples 0-gt1 -------------------------
0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734
1 -13591 -4756 -6037 -3362 3541 3510 2867
0812 0630 5285 1054 8375 40778
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2147
Configuration file
gt cat config_file
SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A
gt HList -C config_file -e 0 -o -h feature_file
Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK
-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0
------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2247
File manipulation tools
I HCopy converts fromto various data formats (audiofeatures)
I HQuant quantizes speech (audio)
I HLEd edits label and master label files
I HDMan edits dictionary files
I HHEd edits model and master macro files
I HBuild converts language models in different formats (morein recognition section)
ccopy2008 Giampiero Salvi 2347
Computing feature files (HCopy)
gt cat config_file
Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250
gt HCopy -C config_file audio_file1 param_file1 audio_file2
gt HCopy -C config_file -S file_list
ccopy2008 Giampiero Salvi 2447
Label files
MLF
filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]
filename2
I [] = optional (0 or 1)
I = possible repetition (0 1 2)
I time stamps are in 100ns units () 10ms = 100000
ccopy2008 Giampiero Salvi 2547
Label file example 1
gt cat alignedmlf
MLFa10001a1rec
0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe
10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec
0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt
ccopy2008 Giampiero Salvi 2647
Label file example 2 (HLEd)
gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf
gt cat wordsmlf
MLF
a10001a1rec
forra
a10001i1rec
sju
gt cat words2phonesled
EX
IS sil sil
gt cat phonesmlf
MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil
ccopy2008 Giampiero Salvi 2747
Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4
gt cat lexdic
forra f oe r a spsju S uh a sp
gt cat lex2dic
ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp
ccopy2008 Giampiero Salvi 2847
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
HMM definition (~h)
1
2 3 4
5
ccopy2008 Giampiero Salvi 2947
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
State definition (~s)
1
2 3 4
5
2
ccopy2008 Giampiero Salvi 3047
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Gaussian mixture component definition(~m)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3147
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Mean vector definition (~u)Diagonal variance vector definition (~v)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3247
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Transition matrix definition (~t)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3347
I HSLab graphical tool to label speech (use WaveSurferinstead)
I HList gives information about audio and feature files
I HSGen generates random sentences out of a regular grammar
ccopy2008 Giampiero Salvi 3447
Intermezzo what do we know so far
Training
audiofeatureextraction
prototype HMM
features
dictionary
labels
model settraining
Recognition
grammar
decodingaudiofeatureextraction
features
model set
labels
dictionary
ccopy2008 Giampiero Salvi 3547
model initialization
Initialization procedure depends on the information avaliable atthat time
I HCompV computes the overall mean and varianceInput a prototype HMM
I HInit Viterbi segmentation + parameter estimation Formixture distribution uses K-means
Input a prototype HMM time aligned transcriptions
ccopy2008 Giampiero Salvi 3647
Traning tools
I HRest Baum-Welch re-estimation
Input an initialized model set time aligned transcriptions
I HERest performs embedded Baum-Welch trainingInput an initialized model set timeless transcriptions
I HEAdapt performs adaptation on a limited set of data
I HSmooth smoots a set of context-dependent modelsaccording to the context-independent counterpart
ccopy2008 Giampiero Salvi 3747
Training example RefRec
first pass
prototype HMM rarr HCompV rarr cloning rarr HERest rarr
realignment (HVite) rarr HERest rarr increase MC (HHEd) rarr
HERest rarr realignment (HVite)
second pass
prototype HMM + aligned transcriptions rarr HInit rarr HRest
rarr HERest rarr increase MC (HHEd) rarr HERest
ccopy2008 Giampiero Salvi 3847
Recognition tools
grammar generation
I HLStats creates bigram from training data
I HParse parses a user defined grammar to produce a lattice
decoding
I HVite performs Viterbi decoding
evaluation
I HResults evaluates recognition results
ccopy2008 Giampiero Salvi 3947
Grammar definition (HParse)
delete
page
line
word
char
insert
end insert
bottom
top
movedown
left
up
right
sil
fil
spk
sil
fil
spk
quit
sil
fil
spk
ccopy2008 Giampiero Salvi 4047
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
ccopy2008 Giampiero Salvi 4147
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
bottom
top
movedown
left
up
right
ccopy2008 Giampiero Salvi 4247
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
delete
page
line
word
char
ccopy2008 Giampiero Salvi 4347
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
insert
end insert
ccopy2008 Giampiero Salvi 4447
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
sil
fil
spk
ccopy2008 Giampiero Salvi 4547
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
sil
fil
spk
ccopy2008 Giampiero Salvi 4647
Grammar parsing (HParse) and recognition (HVite)
Parse grammargt HParse grammarbnf grammarslf
Run recognition on file(s)gt HVite -C offlinecfg -H mono_32_2mmf -w grammarslf
-y lab dicttxt phoneslis audio_filewav
Run recognition livegt HVite -C livecfg -H mono_32_2mmf -w grammarslf
-y lab dicttxt phoneslis
ccopy2008 Giampiero Salvi 4747
Evaluation (HResults)
gt HResults -I referencemlf wordlst recognizedmlf
====================== HTK Results Analysis =======================
Date Thu Jan 18 161753 2001
Ref nworkdir_traintestsetmlf
Rec nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT Correct=7407 [H=994 S=348 N=1342]
WORD Corr=9469 Acc=9437 [H=9202 D=196 S=320 I=31 N=9718]
-------------------------------------------------------------------
N = total number I = insertions S = substitutions D = deletions
correct H = N minus S minus D
correct Corr = HN accuracy Acc = HminusIN
= NminusSminusDminusIN
- Outline
- Introduction
- Data formats and manipulation
- Data visualization
- Training
- Recognition
-
ccopy2008 Giampiero Salvi 1147
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
The HTK tools
- data manipulation tools: HCopy, HQuant, HLEd, HHEd, HDMan, HBuild
- data visualization tools: HSLab, HList, HSGen
- training tools: HCompV, HInit, HRest, HERest, HEAdapt, HSmooth
- recognition tools: HLStats, HParse, HVite, HResults
The HTK data formats
- audio: many common formats plus HTK binary
- features: HTK binary
- labels: HTK (single or Master Label files), text
- models: HTK (single or Master Macro files), text or binary
- other: HTK text
[Figure: training and recognition block diagrams. Training: audio, feature extraction, features, prototype HMM, dictionary, labels, training, model set. Recognition: audio, feature extraction, features, model set, dictionary, grammar, decoding, labels.]
Usage example (HList)
> HList
USAGE: HList [options] file ...

 Option                                       Default

 -d      Coerce observation to VQ symbols     off
 -e N    End at sample N                      0
 -h      Print source header info             off
 -i N    Set items per line to N              10
 -n N    Set num streams to N                 1
 -o      Print observation structure          off
 -p      Playback audio                       off
 -r      Write raw output                     off
 -s N    Start at sample N                    0
 -t      Print target header info             off
 -z      Suppress printing data               on
 -A      Print command line arguments         off
 -C cf   Set config file to cf                default
 -D      Display configuration variables      off
Command line switches and options
> HList -e 1 -o -h feature_file
Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
      MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0
------------------------ Samples: 0->1 -------------------------
0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
       3.293   5.428   6.831   5.819   5.606  40.734
1:   -13.591  -4.756  -6.037  -3.362   3.541   3.510   2.867
       0.812   0.630   5.285   1.054   8.375  40.778
----------------------------- END ------------------------------
Configuration file
> cat config_file
SOURCEKIND = MFCC_0
TARGETKIND = MFCC_0_D_A
> HList -C config_file -e 0 -o -h feature_file
Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
      MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
       Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
       Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
       Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
      Acc-10  Acc-11  Acc-12   AccC0
------------------------ Samples: 0->1 -------------------------
0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
       3.293   5.428   6.831   5.819   5.606  40.734  -0.107
      -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
      -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
      -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
       0.097   0.225  -0.294   0.051
----------------------------- END ------------------------------
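How the 13 static coefficients grow to 39 with the _D and _A qualifiers can be sketched in a few lines of Python. This is only an illustration, assuming the regression formula described in the HTK Book with a window of 2 frames and with the first and last frames replicated at the edges; the function names and the toy input are made up for this example and are not part of HTK.

```python
def deltas(frames, window=2):
    """Regression-based delta coefficients for a list of feature vectors."""
    n = len(frames)
    denom = 2 * sum(theta * theta for theta in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for k in range(len(frames[t])):
            num = 0.0
            for theta in range(1, window + 1):
                ahead = frames[min(t + theta, n - 1)][k]   # replicate last frame
                behind = frames[max(t - theta, 0)][k]      # replicate first frame
                num += theta * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def add_deltas_and_accelerations(static):
    """Append first- and second-order differences to each static vector."""
    d = deltas(static)
    a = deltas(d)
    return [s + dv + av for s, dv, av in zip(static, d, a)]


# Toy input (not real speech): 6 frames of 13 static values forming a ramp
static = [[float(t)] * 13 for t in range(6)]
obs = add_deltas_and_accelerations(static)
print(len(obs[0]))  # number of components per frame after _D_A
```

The accelerations are simply the same regression applied to the deltas, which is why the observation structure lists Del-* and Acc-* blocks of the same size as the static block.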
File manipulation tools
- HCopy: converts from/to various data formats (audio/features)
- HQuant: quantizes speech (audio)
- HLEd: edits label and master label files
- HDMan: edits dictionary files
- HHEd: edits model and master macro files
- HBuild: converts language models in different formats (more in the recognition section)
Computing feature files (HCopy)
> cat config_file
# Feature configuration
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
# input file format (headerless 8 kHz 16 bit linear PCM)
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NOHEAD
SOURCERATE = 1250
> HCopy -C config_file audio_file1 param_file1 audio_file2 ...
> HCopy -C config_file -S file_list
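The rate and window values in the configuration are expressed in units of 100 ns, which is easy to get wrong. The following Python sketch (helper names are invented for this example) shows how the values for 8 kHz audio, a 10 ms frame shift and a 25 ms window can be derived and checked.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # one HTK time unit is 100 ns


def sample_period_units(sample_rate_hz):
    """Sampling period in 100 ns units, as used by SOURCERATE."""
    return HTK_UNITS_PER_SECOND / sample_rate_hz


def ms_to_units(milliseconds):
    """Convert a duration in milliseconds to 100 ns units."""
    return milliseconds * 10_000


def samples_per_span(span_units, source_rate_units):
    """Number of audio samples covered by a span given in 100 ns units."""
    return int(span_units / source_rate_units)


source_rate = sample_period_units(8000)   # SOURCERATE for 8 kHz audio
target_rate = ms_to_units(10)             # TARGETRATE for a 10 ms frame shift
window_size = ms_to_units(25)             # WINDOWSIZE for a 25 ms window

print(source_rate, target_rate, window_size)
print(samples_per_span(window_size, source_rate), "samples per window")
print(samples_per_span(target_rate, source_rate), "samples per frame shift")
```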
Label files
#!MLF!#
"filename1"
[start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
[start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
...
[startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
.
"filename2"
...
- [.] = optional (0 or 1)
- {.} = possible repetition (0, 1, 2, ...)
- time stamps are in 100 ns units: 10 ms = 100000
Label file example 1
> cat aligned.mlf
#!MLF!#
"a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
10400000 11700000 r
11700000 14100000 a
14100000 14100000 sp
14100000 29800001 sil <sil>
.
"a10001i1.rec"
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>
.
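A time-aligned master label file like this one is simple to read outside HTK. The Python sketch below is a minimal reader, assuming quoted file names, a single dot closing each entry, and start and end times present on every label line; `parse_mlf` and `to_seconds` are names chosen for this example.

```python
def parse_mlf(text):
    """Parse MLF text into {file name: [(start, end, label), ...]}."""
    entries = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"'):          # quoted file name opens an entry
            current = line.strip('"')
            entries[current] = []
        elif line == ".":                 # single dot closes the entry
            current = None
        elif current is not None:
            fields = line.split()
            start, end, label = int(fields[0]), int(fields[1]), fields[2]
            entries[current].append((start, end, label))
    return entries


def to_seconds(units):
    """Convert a time stamp in 100 ns units to seconds."""
    return units / 1e7


MLF_TEXT = """#!MLF!#
"a10001i1.rec"
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>
.
"""

segments = parse_mlf(MLF_TEXT)["a10001i1.rec"]
for start, end, label in segments:
    print(f"{label:4s} {to_seconds(start):.2f} s to {to_seconds(end):.2f} s")
```

The fourth field on some lines (the word label, e.g. sju) is ignored here; a fuller reader would keep it as the auxiliary label.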
Label file example 2 (HLEd)
> HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf
> cat words.mlf
#!MLF!#
"a10001a1.rec"
forra
.
"a10001i1.rec"
sju
.
> cat words2phones.led
EX
IS sil sil
> cat phones.mlf
#!MLF!#
"a10001a1.rec"
sil
f
oe
r
a
sp
sil
.
"a10001i1.rec"
sil
S
uh
a
sp
sil
.
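What the two edit commands do to each transcription can be mimicked in Python. This is only a sketch of the idea, not of HLEd itself: it assumes EX expands each word with the first pronunciation found in the dictionary and IS inserts the given labels at the start and end; the function names are invented for the example.

```python
def load_dictionary(lines):
    """Map each word to its first listed pronunciation (a list of phones)."""
    lexicon = {}
    for line in lines:
        word, *phones = line.split()
        lexicon.setdefault(word, phones)   # keep only the first entry per word
    return lexicon


def words_to_phones(words, lexicon, start="sil", end="sil"):
    """Mimic the two edit commands: EX (expand) then IS (insert start/end)."""
    phones = [p for w in words for p in lexicon[w]]
    return [start] + phones + [end]


lexicon = load_dictionary(["forra f oe r a sp", "sju S uh a sp"])
print(words_to_phones(["forra"], lexicon))
print(words_to_phones(["sju"], lexicon))
```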
Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4 ...
> cat lex.dic
forra f oe r a sp
sju S uh a sp
> cat lex2.dic
<sil> [] sil
forra f oe r a sp
sju 0.3 S uh a sp
sju 0.7 S uh sp
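The optional fields of a dictionary line can be told apart with a small parser. The Python sketch below assumes whitespace-separated fields, an output symbol written in square brackets, and a pronunciation probability that parses as a number; `parse_dict_line` is a hypothetical helper, and a phone whose name looks like a number would confuse it.

```python
def parse_dict_line(line):
    """Split a dictionary line into (word, outsym, prob, phones)."""
    fields = line.split()
    word, rest = fields[0], fields[1:]
    outsym = word                     # default: the word itself is output
    prob = 1.0                        # default pronunciation probability
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        outsym = rest[0][1:-1]        # "[]" gives an empty output symbol
        rest = rest[1:]
    if rest:
        try:
            prob = float(rest[0])
            rest = rest[1:]
        except ValueError:
            pass                      # first field is a phone, not a number
    return word, outsym, prob, rest


for entry in ["<sil> [] sil", "forra f oe r a sp", "sju 0.3 S uh a sp", "sju 0.7 S uh sp"]:
    print(parse_dict_line(entry))
```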
HMM definition files (HHEd)
~h "hmm_name"
<BEGINHMM>
  <NUMSTATES> 5
  <STATE> 2
    <NUMMIXES> 2
    <MIXTURE> 1 0.8
      <MEAN> 4
        0.1 0.0 0.7 0.3
      <VARIANCE> 4
        0.2 0.1 0.1 0.1
    <MIXTURE> 2 0.2
      <MEAN> 4
        0.2 0.3 0.4 0.0
      <VARIANCE> 4
        0.1 0.1 0.1 0.2
  <STATE> 3
    ~s "state_name"
  <STATE> 4
    <NUMMIXES> 2
    <MIXTURE> 1 0.7
      ~m "mix_name"
    <MIXTURE> 2 0.3
      <MEAN> 4
        ~u "mean_name"
      <VARIANCE> 4
        ~v "variance_name"
  <TRANSP>
    ~t "transition_name"
<ENDHMM>
Macro types:
- ~h: HMM definition
- ~s: state definition
- ~m: Gaussian mixture component definition
- ~u: mean vector definition
- ~v: diagonal variance vector definition
- ~t: transition matrix definition
[Figure: topology of the 5-state HMM, states 1 to 5, with states 2, 3 and 4 in the middle]
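To make the numbers in the state 2 definition concrete, the Python sketch below evaluates the log output probability of a two-component Gaussian mixture with diagonal covariances, using the weights, means and variances from the slide. The function names and the test vector x are chosen for this illustration; this is not HTK code.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def log_mixture(x, components):
    """Log of sum_m w_m N(x; mean_m, var_m), computed with log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in components]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# (weight, mean, variance) for the two mixture components of state 2
STATE_2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]

weights_sum = sum(w for w, _, _ in STATE_2)   # mixture weights must sum to 1
x = [0.1, 0.0, 0.7, 0.3]                      # made-up observation vector
print(weights_sum, log_mixture(x, STATE_2))
```

Working in the log domain avoids underflow when the feature dimension is 39 rather than 4.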
Data visualization tools
- HSLab: graphical tool to label speech (use WaveSurfer instead)
- HList: gives information about audio and feature files
- HSGen: generates random sentences out of a regular grammar
Intermezzo: what do we know so far?
[Figure: training and recognition block diagrams. Training: audio, feature extraction, features, prototype HMM, dictionary, labels, training, model set. Recognition: audio, feature extraction, features, model set, dictionary, grammar, decoding, labels.]
Model initialization
The initialization procedure depends on the information available at that time.
- HCompV: computes the overall mean and variance.
  Input: a prototype HMM
- HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time-aligned transcriptions
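The idea behind the HCompV initialization (often called a flat start) is that every state begins with the same global statistics. The Python sketch below illustrates this on a tiny made-up data set; the function names and the dictionary layout of a state are assumptions for the example and do not reflect HTK's internal structures.

```python
def global_mean_and_variance(frames):
    """Per-dimension mean and variance over all frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    var = [sum((f[k] - mean[k]) ** 2 for f in frames) / n for k in range(dim)]
    return mean, var


def flat_start(num_emitting_states, mean, var):
    """Give every emitting state its own copy of the global statistics."""
    return [{"mean": list(mean), "variance": list(var)}
            for _ in range(num_emitting_states)]


# Made-up 2-dimensional feature vectors standing in for the training data
frames = [[1.0, 10.0], [2.0, 10.0], [3.0, 13.0]]
mean, var = global_mean_and_variance(frames)
states = flat_start(3, mean, var)   # a 5-state HMM has 3 emitting states
print(mean, var)
```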
Training tools
- HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions
- HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, timeless transcriptions
- HEAdapt: performs adaptation on a limited set of data
- HSmooth: smooths a set of context-dependent models according to the context-independent counterpart
Training example (RefRec)
first pass:
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)
second pass:
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest
Recognition tools
grammar generation
- HLStats: creates a bigram from training data
- HParse: parses a user-defined grammar to produce a lattice
decoding
- HVite: performs Viterbi decoding
evaluation
- HResults: evaluates recognition results
Grammar definition (HParse)
[Figure: word network for the grammar, with the words delete (char, word, line, page), insert, end insert, move (up, down, left, right), top, bottom and quit, and the noise models sil, fil, spk]
> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
({$noise} < $cmd {$noise} > quit {$noise})
- [] optional
- {} zero or more
- () block
- <> loop
- <<>> context-dependent loop
- | alternative
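A quick way to check that a grammar accepts what you expect is to generate random sentences from it, which is the job HSGen does on the compiled network. The Python sketch below hand-codes the grammar above for illustration; the probabilities for optional parts and loop exits are arbitrary assumptions, and the code does not read grammar.bnf or any HTK file.

```python
import random

NOISE = ["sil", "fil", "spk"]
DIRS = ["up", "down", "left", "right"]
ITEMS = ["char", "word", "line", "page"]


def noise(rng):
    """{$noise}: zero or more noise tokens."""
    out = []
    while rng.random() < 0.3:
        out.append(rng.choice(NOISE))
    return out


def command(rng):
    """$cmd = $mcmd | $dcmd | $icmd | $ecmd"""
    kind = rng.choice(["mcmd", "dcmd", "icmd", "ecmd"])
    if kind == "mcmd":
        return rng.choice([["move", rng.choice(DIRS)], ["top"], ["bottom"]])
    if kind == "dcmd":
        return ["delete"] + ([rng.choice(ITEMS)] if rng.random() < 0.5 else [])
    if kind == "icmd":
        return ["insert"]
    return ["end"] + (["insert"] if rng.random() < 0.5 else [])


def sentence(rng):
    """({$noise} < $cmd {$noise} > quit {$noise})"""
    words = noise(rng)
    while True:                       # < ... >: one or more repetitions
        words += command(rng) + noise(rng)
        if rng.random() < 0.5:
            break
    return words + ["quit"] + noise(rng)


rng = random.Random(0)
for _ in range(3):
    print(" ".join(sentence(rng)))
```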
Grammar parsing (HParse) and recognition (HVite)
Parse grammar:
> HParse grammar.bnf grammar.slf
Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav
Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis
Evaluation (HResults)
> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------
N = total number, I = insertions, S = substitutions, D = deletions
correct: H = N − S − D
%correct: %Corr = H/N
accuracy: Acc = (H − I)/N = (N − S − D − I)/N
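The scores can be recomputed from the counts in the output above, which is a useful sanity check when comparing systems. A minimal Python sketch (the function name is made up for this example):

```python
def word_scores(n, substitutions, deletions, insertions):
    """Return (H, %Corr, Acc) from the word-level error counts."""
    hits = n - substitutions - deletions
    corr = 100.0 * hits / n
    acc = 100.0 * (hits - insertions) / n
    return hits, corr, acc


hits, corr, acc = word_scores(n=9718, substitutions=320, deletions=196, insertions=31)
sent_correct = 100.0 * 994 / 1342     # sentence level: H/N

print(f"WORD: H={hits} %Corr={corr:.2f} Acc={acc:.2f}")
print(f"SENT: %Correct={sent_correct:.2f}")
```

Accuracy is always lower than or equal to %Corr because insertions are penalized only in Acc.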
- Outline
- Introduction
- Data formats and manipulation
- Data visualization
- Training
- Recognition
-
ccopy2008 Giampiero Salvi 1247
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1347
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1447
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1547
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1647
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1747
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1847
The HTK data formats
data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text
audiofeatureextraction
training model set
prototype HMM
features
dictionary
labels
audiofeatureextraction
features
dictionary
model set
labelsdecoding
grammar
ccopy2008 Giampiero Salvi 1947
Usage example (HList)
gt HList
USAGE HList [options] file
Option Default
-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off
ccopy2008 Giampiero Salvi 2047
Command line switches and options
gt HList -e 1 -o -h feature_file
Source feature_file
Sample Bytes 26 Sample Kind MFCC_0
Num Comps 13 Sample Period 100000 us
Num Samples 336 File Format HTK
-------------------- Observation Structure ---------------------
x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0
------------------------ Samples 0-gt1 -------------------------
0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734
1 -13591 -4756 -6037 -3362 3541 3510 2867
0812 0630 5285 1054 8375 40778
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2147
Configuration file
gt cat config_file
SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A
gt HList -C config_file -e 0 -o -h feature_file
Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK
-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0
------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2247
File manipulation tools
I HCopy converts fromto various data formats (audiofeatures)
I HQuant quantizes speech (audio)
I HLEd edits label and master label files
I HDMan edits dictionary files
I HHEd edits model and master macro files
I HBuild converts language models in different formats (morein recognition section)
ccopy2008 Giampiero Salvi 2347
Computing feature files (HCopy)
gt cat config_file
Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250
gt HCopy -C config_file audio_file1 param_file1 audio_file2
gt HCopy -C config_file -S file_list
ccopy2008 Giampiero Salvi 2447
Label files
MLF
filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]
filename2
I [] = optional (0 or 1)
I = possible repetition (0 1 2)
I time stamps are in 100ns units () 10ms = 100000
ccopy2008 Giampiero Salvi 2547
Label file example 1
gt cat alignedmlf
MLFa10001a1rec
0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe
10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec
0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt
ccopy2008 Giampiero Salvi 2647
Label file example 2 (HLEd)
gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf
gt cat wordsmlf
MLF
a10001a1rec
forra
a10001i1rec
sju
gt cat words2phonesled
EX
IS sil sil
gt cat phonesmlf
MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil
ccopy2008 Giampiero Salvi 2747
Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4
gt cat lexdic
forra f oe r a spsju S uh a sp
gt cat lex2dic
ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp
ccopy2008 Giampiero Salvi 2847
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
HMM definition (~h)
1
2 3 4
5
ccopy2008 Giampiero Salvi 2947
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
State definition (~s)
1
2 3 4
5
2
ccopy2008 Giampiero Salvi 3047
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Gaussian mixture component definition(~m)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3147
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Mean vector definition (~u)Diagonal variance vector definition (~v)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3247
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Transition matrix definition (~t)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3347
I HSLab graphical tool to label speech (use WaveSurferinstead)
I HList gives information about audio and feature files
I HSGen generates random sentences out of a regular grammar
ccopy2008 Giampiero Salvi 3447
Intermezzo what do we know so far
Training
audiofeatureextraction
prototype HMM
features
dictionary
labels
model settraining
Recognition
grammar
decodingaudiofeatureextraction
features
model set
labels
dictionary
ccopy2008 Giampiero Salvi 3547
model initialization
Initialization procedure depends on the information avaliable atthat time
I HCompV computes the overall mean and varianceInput a prototype HMM
I HInit Viterbi segmentation + parameter estimation Formixture distribution uses K-means
Input a prototype HMM time aligned transcriptions
ccopy2008 Giampiero Salvi 3647
Traning tools
I HRest Baum-Welch re-estimation
Input an initialized model set time aligned transcriptions
I HERest performs embedded Baum-Welch trainingInput an initialized model set timeless transcriptions
I HEAdapt performs adaptation on a limited set of data
I HSmooth smoots a set of context-dependent modelsaccording to the context-independent counterpart
ccopy2008 Giampiero Salvi 3747
Training example RefRec
first pass
prototype HMM rarr HCompV rarr cloning rarr HERest rarr
realignment (HVite) rarr HERest rarr increase MC (HHEd) rarr
HERest rarr realignment (HVite)
second pass
prototype HMM + aligned transcriptions rarr HInit rarr HRest
rarr HERest rarr increase MC (HHEd) rarr HERest
ccopy2008 Giampiero Salvi 3847
Recognition tools
grammar generation
I HLStats creates bigram from training data
I HParse parses a user defined grammar to produce a lattice
decoding
I HVite performs Viterbi decoding
evaluation
I HResults evaluates recognition results
ccopy2008 Giampiero Salvi 3947
Grammar definition (HParse)
delete
page
line
word
char
insert
end insert
bottom
top
movedown
left
up
right
sil
fil
spk
sil
fil
spk
quit
sil
fil
spk
ccopy2008 Giampiero Salvi 4047
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
ccopy2008 Giampiero Salvi 4147
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
bottom
top
movedown
left
up
right
ccopy2008 Giampiero Salvi 4247
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
delete
page
line
word
char
ccopy2008 Giampiero Salvi 4347
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
insert
end insert
ccopy2008 Giampiero Salvi 4447
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
sil
fil
spk
ccopy2008 Giampiero Salvi 4547
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
sil
fil
spk
ccopy2008 Giampiero Salvi 4647
Grammar parsing (HParse) and recognition (HVite)
Parse grammargt HParse grammarbnf grammarslf
Run recognition on file(s)gt HVite -C offlinecfg -H mono_32_2mmf -w grammarslf
-y lab dicttxt phoneslis audio_filewav
Run recognition livegt HVite -C livecfg -H mono_32_2mmf -w grammarslf
-y lab dicttxt phoneslis
ccopy2008 Giampiero Salvi 4747
Evaluation (HResults)
gt HResults -I referencemlf wordlst recognizedmlf
====================== HTK Results Analysis =======================
Date Thu Jan 18 161753 2001
Ref nworkdir_traintestsetmlf
Rec nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT Correct=7407 [H=994 S=348 N=1342]
WORD Corr=9469 Acc=9437 [H=9202 D=196 S=320 I=31 N=9718]
-------------------------------------------------------------------
N = total number I = insertions S = substitutions D = deletions
correct H = N minus S minus D
correct Corr = HN accuracy Acc = HminusIN
= NminusSminusDminusIN
- Outline
- Introduction
- Data formats and manipulation
- Data visualization
- Training
- Recognition
-
ccopy2008 Giampiero Salvi 1347
Things that you should have before you start
I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach
I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc
I lots of patience
I the fabulous HTK Book
I a look at the RefRec scripts
ccopy2008 Giampiero Salvi 1447
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1547
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1647
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1747
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1847
The HTK data formats
data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text
audiofeatureextraction
training model set
prototype HMM
features
dictionary
labels
audiofeatureextraction
features
dictionary
model set
labelsdecoding
grammar
ccopy2008 Giampiero Salvi 1947
Usage example (HList)
gt HList
USAGE HList [options] file
Option Default
-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off
ccopy2008 Giampiero Salvi 2047
Command line switches and options
gt HList -e 1 -o -h feature_file
Source feature_file
Sample Bytes 26 Sample Kind MFCC_0
Num Comps 13 Sample Period 100000 us
Num Samples 336 File Format HTK
-------------------- Observation Structure ---------------------
x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0
------------------------ Samples 0-gt1 -------------------------
0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734
1 -13591 -4756 -6037 -3362 3541 3510 2867
0812 0630 5285 1054 8375 40778
----------------------------- END ------------------------------
Configuration file
> cat config_file
SOURCEKIND = MFCC_0
TARGETKIND = MFCC_0_D_A
> HList -C config_file -e 0 -o -h feature_file
Source: feature_file
  Sample Bytes:  26        Sample Kind:   MFCC_0
  Num Comps:     13        Sample Period: 10000.0 us
  Num Samples:   336       File Format:   HTK
-------------------- Observation Structure ---------------------
x:      MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
        MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
         Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
         Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
         Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
        Acc-10  Acc-11  Acc-12   AccC0
------------------------ Samples: 0->1 -------------------------
0:     -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
         3.293   5.428   6.831   5.819   5.606  40.734  -0.107
        -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
        -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
        -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
         0.097   0.225  -0.294   0.051
----------------------------- END ------------------------------
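The growth of the observation from 13 to 39 components follows from the qualifiers in TARGETKIND. The sketch below is an illustration only: `feature_dim` is a hypothetical helper, it assumes 12 cepstral coefficients, and it handles just the `_0`, `_E`, `_D`, `_A` and `_T` qualifiers (others are ignored).

```python
def feature_dim(target_kind: str, num_ceps: int = 12) -> int:
    """Vector size implied by a parameter kind such as MFCC_0_D_A (sketch)."""
    _base, *qualifiers = target_kind.split("_")
    # _0 appends C0 and _E appends log energy to the static coefficients.
    static = num_ceps + sum(1 for q in qualifiers if q in ("0", "E"))
    # _D, _A and _T each append one more block as large as the static block.
    blocks = 1 + sum(1 for q in qualifiers if q in ("D", "A", "T"))
    return static * blocks


print(feature_dim("MFCC_0"))      # 13
print(feature_dim("MFCC_0_D_A"))  # 39
```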
File manipulation tools
- HCopy: converts from/to various data formats (audio, features)
- HQuant: quantizes speech (audio)
- HLEd: edits label and master label files
- HDMan: edits dictionary files
- HHEd: edits model and master macro files
- HBuild: converts language models between different formats (more in the recognition section)
Computing feature files (HCopy)
> cat config_file
# Feature configuration
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
# input file format (headerless 8 kHz 16 bit linear PCM)
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NOHEAD
SOURCERATE = 1250
> HCopy -C config_file audio_file1 param_file1 audio_file2 ...
> HCopy -C config_file -S file_list
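The rate and window settings are easier to read once converted out of HTK's 100 ns time units. The following is a small sketch, assuming the intended values are SOURCERATE = 1250, TARGETRATE = 100000.0 and WINDOWSIZE = 250000.0; the variable names are illustrative and not part of HTK.

```python
HTK_UNIT_S = 100e-9  # HTK expresses periods and durations in units of 100 ns

# Assumed values, taken from the feature configuration shown above.
config = {"SOURCERATE": 1250, "TARGETRATE": 100000.0, "WINDOWSIZE": 250000.0}

sample_rate_hz = 1.0 / (config["SOURCERATE"] * HTK_UNIT_S)
frame_shift_ms = config["TARGETRATE"] * HTK_UNIT_S * 1e3
window_ms = config["WINDOWSIZE"] * HTK_UNIT_S * 1e3
samples_per_window = round(config["WINDOWSIZE"] / config["SOURCERATE"])
samples_per_shift = round(config["TARGETRATE"] / config["SOURCERATE"])

print(round(sample_rate_hz), "Hz sampling rate")
print(round(frame_shift_ms, 3), "ms frame shift,", samples_per_shift, "samples")
print(round(window_ms, 3), "ms window,", samples_per_window, "samples")
```

Under these assumptions the audio is sampled at 8 kHz, and a 25 ms window is advanced in 10 ms steps, which matches the 10000.0 us sample period reported by HList for the feature file.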
Label files
#!MLF!#
"filename1"
[start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
[start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
...
[startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
.
"filename2"
...
- [] = optional (0 or 1)
- {} = possible repetition (0, 1, 2, ...)
- time stamps are in 100 ns units (!): 10 ms = 100000
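Because the 100 ns unit is easy to get wrong, a conversion helper can be useful. This is a minimal sketch: `parse_label_line` is a hypothetical function, it assumes whitespace-separated fields, and it ignores scores, auxiliary labels and comments.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # one HTK time unit is 100 ns


def parse_label_line(line: str):
    """Parse '[start [end]] label ...' into (start_s, end_s, label).

    Missing time stamps are returned as None.
    """
    fields = line.split()
    times = []
    # Consume up to two leading integers, but always leave one field for the label.
    while len(fields) > 1 and len(times) < 2 and fields[0].isdigit():
        times.append(int(fields.pop(0)) / HTK_UNITS_PER_SECOND)
    start, end = (times + [None, None])[:2]
    return start, end, fields[0]


print(parse_label_line("6400000 8600000 f forra"))  # (0.64, 0.86, 'f')
print(parse_label_line("sil"))                      # (None, None, 'sil')
print(100000 / HTK_UNITS_PER_SECOND)                # 0.01, i.e. 10 ms
```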
Label file example 1
> cat aligned.mlf
#!MLF!#
"a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
10400000 11700000 r
11700000 14100000 a
14100000 14100000 sp
14100000 29800001 sil <sil>
.
"a10001i1.rec"
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>
.
Label file example 2 (HLEd)
> HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf
> cat words.mlf
#!MLF!#
"a10001a1.rec"
forra
.
"a10001i1.rec"
sju
.
> cat words2phones.led
EX
IS sil sil
> cat phones.mlf
#!MLF!#
"a10001a1.rec"
sil
f
oe
r
a
sp
sil
.
"a10001i1.rec"
sil
S
uh
a
sp
sil
.
Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4 ...
> cat lex.dic
forra f oe r a sp
sju S uh a sp
> cat lex2.dic
<sil> [] sil
forra f oe r a sp
sju 0.3 S uh a sp
sju 0.7 S uh sp
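To make the optional fields of a dictionary entry concrete, here is a small parsing sketch. It is not HTK code: `parse_dict_line` is a hypothetical helper, and it assumes the output symbol is written in square brackets without spaces and that pronunciation probabilities are plain decimals such as 0.3.

```python
def parse_dict_line(line: str):
    """Split a dictionary entry into (word, outsym, prob, phones)."""
    fields = line.split()
    word = fields.pop(0)
    outsym = word  # assumption: without [OUTSYM] the word itself is output
    prob = 1.0     # assumption: a missing probability means 1.0
    if fields and fields[0].startswith("[") and fields[0].endswith("]"):
        outsym = fields.pop(0)[1:-1]
    # Treat a leading plain decimal number as the pronunciation probability.
    if fields and fields[0].replace(".", "", 1).isdigit():
        prob = float(fields.pop(0))
    return word, outsym, prob, fields


print(parse_dict_line("forra f oe r a sp"))
print(parse_dict_line("sju 0.3 S uh a sp"))
print(parse_dict_line("<sil> [] sil"))
```

A phone whose name looks like a number would be misread as a probability by this sketch, so it is only suitable for dictionaries like the ones shown here.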
HMM definition files (HHEd)
~h "hmm_name"
<BEGINHMM>
  <NUMSTATES> 5
  <STATE> 2
    <NUMMIXES> 2
    <MIXTURE> 1 0.8
      <MEAN> 4
        0.1 0.0 0.7 0.3
      <VARIANCE> 4
        0.2 0.1 0.1 0.1
    <MIXTURE> 2 0.2
      <MEAN> 4
        0.2 0.3 0.4 0.0
      <VARIANCE> 4
        0.1 0.1 0.1 0.2
  <STATE> 3
    ~s "state_name"
  <STATE> 4
    <NUMMIXES> 2
    <MIXTURE> 1 0.7
      ~m "mix_name"
    <MIXTURE> 2 0.3
      <MEAN> 4
        ~u "mean_name"
      <VARIANCE> 4
        ~v "variance_name"
  <TRANSP>
    ~t "transition_name"
<ENDHMM>
Macro types used above:
- ~h: HMM definition
- ~s: state definition
- ~m: Gaussian mixture component definition
- ~u: mean vector definition
- ~v: diagonal variance vector definition
- ~t: transition matrix definition
[Figure: 5-state HMM topology (states 1 to 5; states 2, 3 and 4 in the middle row)]
- HSLab: graphical tool to label speech (use WaveSurfer instead)
- HList: gives information about audio and feature files
- HSGen: generates random sentences out of a regular grammar
Intermezzo: what do we know so far?
[Figure: ASR overview diagram. Training: audio, feature extraction, features, prototype HMM, dictionary, labels, training, model set. Recognition: audio, feature extraction, features, grammar, dictionary, model set, decoding, labels.]
Model initialization
The initialization procedure depends on the information available at that time.
- HCompV: computes the overall mean and variance.
  Input: a prototype HMM
- HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time-aligned transcriptions
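The idea behind the HCompV step, giving every state the same overall mean and variance, can be sketched in a few lines. This is an illustration of the concept only, not HCompV's implementation: the function names, the list-of-lists frame layout and the choice of the biased (divide by N) variance are assumptions.

```python
def global_mean_variance(frames):
    """Per-dimension mean and biased variance over all feature frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


def flat_start(frames, num_emitting_states=3):
    """Give every emitting state of a prototype the same global statistics."""
    mean, var = global_mean_variance(frames)
    return [{"mean": list(mean), "variance": list(var)}
            for _ in range(num_emitting_states)]


frames = [[0.0, 2.0], [2.0, 4.0]]  # two toy 2-dimensional frames
for state in flat_start(frames):
    print(state)
```

HInit, by contrast, needs time-aligned transcriptions because it estimates each model from the segments labeled with it.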
Training tools
- HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions
- HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, timeless transcriptions
- HEAdapt: performs adaptation on a limited set of data
- HSmooth: smooths a set of context-dependent models according to the context-independent counterpart
Training example: RefRec
first pass:
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)
second pass:
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest
Recognition tools
grammar generation
- HLStats: creates a bigram from training data
- HParse: parses a user-defined grammar to produce a lattice
decoding
- HVite: performs Viterbi decoding
evaluation
- HResults: evaluates recognition results
Grammar definition (HParse)
> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
({$noise} < $cmd {$noise} > quit {$noise})
- [] optional
- {} zero or more
- () block
- <> loop
- <<>> context dep. loop
- | alternative
[Figure: word network for the grammar above. Optional noise (sil, fil, spk), then a loop over the commands (move up/down/left/right, top, bottom; delete char/word/line/page; insert; end insert) each followed by optional noise, then quit and optional noise.]
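One way to check that a grammar says what you intend is to draw random sentences from it, which is what HSGen is described as doing for the compiled network. The sketch below hard-codes this particular grammar in Python rather than reading the BNF file; it assumes that braces mean zero or more repetitions and angle brackets one or more, and the repetition probabilities are arbitrary choices.

```python
import random

DIR = ["up", "down", "left", "right"]
ITEM = ["char", "word", "line", "page"]
NOISE = ["sil", "fil", "spk"]


def noise(rng):
    """{$noise}: zero or more noise tokens."""
    out = []
    while rng.random() < 0.3:
        out.append(rng.choice(NOISE))
    return out


def command(rng):
    """$cmd = $mcmd | $dcmd | $icmd | $ecmd."""
    kind = rng.choice(["mcmd", "dcmd", "icmd", "ecmd"])
    if kind == "mcmd":
        return rng.choice([["move", rng.choice(DIR)], ["top"], ["bottom"]])
    if kind == "dcmd":
        return ["delete"] + rng.choice([[], [rng.choice(ITEM)]])
    if kind == "icmd":
        return ["insert"]
    return ["end"] + rng.choice([[], ["insert"]])


def sentence(rng):
    """({$noise} < $cmd {$noise} > quit {$noise})."""
    words = noise(rng)
    while True:  # < ... >: at least one command
        words += command(rng) + noise(rng)
        if rng.random() < 0.5:
            break
    return words + ["quit"] + noise(rng)


rng = random.Random(0)
for _ in range(3):
    print(" ".join(sentence(rng)))
```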
Grammar parsing (HParse) and recognition (HVite)
Parse grammar:
> HParse grammar.bnf grammar.slf
Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf \
    -y lab dict.txt phones.lis audio_file.wav
Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf \
    -y lab dict.txt phones.lis
Evaluation (HResults)
> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestset.mlf
  Rec : nresults_trainmono_32_2rec.mlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------
N = total number, I = insertions, S = substitutions, D = deletions
correct: H = N - S - D
%correct: %Corr = H/N
accuracy: Acc = (H - I)/N = (N - S - D - I)/N
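The figures in the WORD and SENT lines can be reproduced from the counts with the definitions above. The sketch below is plain arithmetic; the function names are illustrative and not part of HTK.

```python
def word_scores(n, d, s, i):
    """Return (H, %Corr, Acc) from total, deletions, substitutions, insertions."""
    h = n - d - s                 # correct: H = N - S - D
    corr = 100.0 * h / n          # %Corr = H / N
    acc = 100.0 * (h - i) / n     # Acc = (H - I) / N
    return h, corr, acc


def sentence_correct(h, n):
    """Percentage of sentences recognized without any error."""
    return 100.0 * h / n


h, corr, acc = word_scores(n=9718, d=196, s=320, i=31)
print(h, round(corr, 2), round(acc, 2))      # 9202 94.69 94.37
print(round(sentence_correct(994, 1342), 2))  # 74.07
```

Accuracy is lower than %Corr because insertions are penalized only in the accuracy figure.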
The HTK tools
- data manipulation tools: HCopy, HQuant, HLEd, HHEd, HDMan, HBuild
- data visualization tools: HSLab, HList, HSGen
- training tools: HCompV, HInit, HRest, HERest, HEAdapt, HSmooth
- recognition tools: HLStats, HParse, HVite, HResults
- Outline
- Introduction
- Data formats and manipulation
- Data visualization
- Training
- Recognition
-
ccopy2008 Giampiero Salvi 1547
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1647
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1747
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1847
The HTK data formats
data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text
audiofeatureextraction
training model set
prototype HMM
features
dictionary
labels
audiofeatureextraction
features
dictionary
model set
labelsdecoding
grammar
ccopy2008 Giampiero Salvi 1947
Usage example (HList)
gt HList
USAGE HList [options] file
Option Default
-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off
ccopy2008 Giampiero Salvi 2047
Command line switches and options
gt HList -e 1 -o -h feature_file
Source feature_file
Sample Bytes 26 Sample Kind MFCC_0
Num Comps 13 Sample Period 100000 us
Num Samples 336 File Format HTK
-------------------- Observation Structure ---------------------
x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0
------------------------ Samples 0-gt1 -------------------------
0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734
1 -13591 -4756 -6037 -3362 3541 3510 2867
0812 0630 5285 1054 8375 40778
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2147
Configuration file
gt cat config_file
SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A
gt HList -C config_file -e 0 -o -h feature_file
Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK
-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0
------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2247
File manipulation tools
I HCopy converts fromto various data formats (audiofeatures)
I HQuant quantizes speech (audio)
I HLEd edits label and master label files
I HDMan edits dictionary files
I HHEd edits model and master macro files
I HBuild converts language models in different formats (morein recognition section)
ccopy2008 Giampiero Salvi 2347
Computing feature files (HCopy)
gt cat config_file
Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250
gt HCopy -C config_file audio_file1 param_file1 audio_file2
gt HCopy -C config_file -S file_list
ccopy2008 Giampiero Salvi 2447
Label files
MLF
filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]
filename2
I [] = optional (0 or 1)
I = possible repetition (0 1 2)
I time stamps are in 100ns units () 10ms = 100000
ccopy2008 Giampiero Salvi 2547
Label file example 1
gt cat alignedmlf
MLFa10001a1rec
0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe
10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec
0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt
ccopy2008 Giampiero Salvi 2647
Label file example 2 (HLEd)
gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf
gt cat wordsmlf
MLF
a10001a1rec
forra
a10001i1rec
sju
gt cat words2phonesled
EX
IS sil sil
gt cat phonesmlf
MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil
ccopy2008 Giampiero Salvi 2747
Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4
gt cat lexdic
forra f oe r a spsju S uh a sp
gt cat lex2dic
ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp
ccopy2008 Giampiero Salvi 2847
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
HMM definition (~h)
1
2 3 4
5
ccopy2008 Giampiero Salvi 2947
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
State definition (~s)
1
2 3 4
5
2
Gaussian mixture component definition (~m)
Mean vector definition (~u)
Diagonal variance vector definition (~v)
Transition matrix definition (~t)
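
As a rough illustration of what the parameters of state 2 in the HMM definition describe, the sketch below evaluates the output density of that state for one observation. It assumes the numbers in the listing are decimal fractions (mixture weights 0.8 and 0.2, and so on) and that the variances are diagonal, as the ~v caption suggests. The function names are hypothetical and this is not HTK code.

```python
import math

# Parameters of state 2, read from the HMM definition listing
# (assumption: the digits are decimal fractions, e.g. 0.8 and 0.2).
STATE2 = [
    # (mixture weight, mean vector, diagonal variance vector)
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]


def diag_gaussian(x, mean, var):
    """Density of a Gaussian with diagonal covariance, evaluated at x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def state_density(x, mixtures):
    """Weighted sum of the mixture component densities at x."""
    return sum(w * diag_gaussian(x, m, v) for w, m, v in mixtures)


if __name__ == "__main__":
    # Evaluate the state at the mean of its first mixture component.
    print(state_density([0.1, 0.0, 0.7, 0.3], STATE2))
```

The mixture weights of a state sum to one, which is why the two components of state 2 carry 0.8 and 0.2.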

Data visualization tools
- HSLab: graphical tool to label speech (use WaveSurfer instead)
- HList: gives information about audio and feature files
- HSGen: generates random sentences out of a regular grammar

Intermezzo: what do we know so far?
[Figure: ASR overview diagram. Training part: audio, feature extraction, features, prototype HMM, dictionary, labels, training, model set. Recognition part: audio, feature extraction, features, model set, dictionary, grammar, decoding, labels.]

Model initialization
The initialization procedure depends on the information available at that time.
- HCompV: computes the overall mean and variance.
  Input: a prototype HMM
- HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time-aligned transcriptions
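
The "overall mean and variance" used for a flat start can be pictured with the small sketch below. It is only an illustration of the statistic, not HCompV's implementation: the function name is hypothetical, features are plain lists of floats, and the variance is the biased estimate (divide by N), which is an assumption.

```python
def global_mean_and_variance(frames):
    """Per-dimension mean and variance over all feature frames.

    frames: non-empty list of equally long lists of floats.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


if __name__ == "__main__":
    # Two toy 2-dimensional frames.
    print(global_mean_and_variance([[1.0, 2.0], [3.0, 6.0]]))
```

In a flat start, every state of the prototype HMM would receive this same mean and variance before re-estimation.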

Training tools
- HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions
- HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, transcriptions without time stamps
- HEAdapt: performs adaptation on a limited set of data
- HSmooth: smooths a set of context-dependent models according to the context-independent counterpart

Training example (RefRec)
First pass:
prototype HMM → HCompV → cloning → HERest →
realignment (HVite) → HERest → increase mixture components (HHEd) →
HERest → realignment (HVite)
Second pass:
prototype HMM + aligned transcriptions → HInit → HRest
→ HERest → increase mixture components (HHEd) → HERest

Recognition tools
Grammar generation:
- HLStats: creates a bigram from training data
- HParse: parses a user-defined grammar to produce a lattice
Decoding:
- HVite: performs Viterbi decoding
Evaluation:
- HResults: evaluates recognition results

Grammar definition (HParse)
[Figure: word network with the nodes delete, page, line, word, char, insert, end insert, bottom, top, move, down, left, up, right and quit, and the noise nodes sil, fil, spk]
> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
( $noise < $cmd $noise > quit $noise )
- [] optional
- {} zero or more
- () block
- <> loop
- <<>> context dep. loop
- | alternative
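
To make the grammar notation concrete, here is a small sketch that draws random sentences from the same rules, in the spirit of what HSGen does. It is not HSGen: the function names are hypothetical, alternatives are picked uniformly, optional parts are included with probability 0.5, and the `< >` loop is read as "one or more" with an arbitrary upper bound. All of these are assumptions made for illustration.

```python
import random

NOISE = ["sil", "fil", "spk"]             # $noise
DIRS = ["up", "down", "left", "right"]    # $dir
ITEMS = ["char", "word", "line", "page"]  # $item


def command(rng):
    """Expand $cmd = $mcmd | $dcmd | $icmd | $ecmd into a list of words."""
    kind = rng.choice(["mcmd", "dcmd", "icmd", "ecmd"])
    if kind == "mcmd":
        # $mcmd = move $dir | top | bottom
        alt = rng.choice(["move", "top", "bottom"])
        return ["move", rng.choice(DIRS)] if alt == "move" else [alt]
    if kind == "dcmd":
        # $dcmd = delete [$item]
        return ["delete"] + ([rng.choice(ITEMS)] if rng.random() < 0.5 else [])
    if kind == "icmd":
        # $icmd = insert
        return ["insert"]
    # $ecmd = end [insert]
    return ["end"] + (["insert"] if rng.random() < 0.5 else [])


def sentence(rng, max_commands=3):
    """Expand ( $noise < $cmd $noise > quit $noise )."""
    words = [rng.choice(NOISE)]
    for _ in range(rng.randint(1, max_commands)):  # loop: one or more times
        words += command(rng)
        words.append(rng.choice(NOISE))
    words += ["quit", rng.choice(NOISE)]
    return words


if __name__ == "__main__":
    print(" ".join(sentence(random.Random(0))))
```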

Grammar parsing (HParse) and recognition (HVite)
Parse grammar:
> HParse grammar.bnf grammar.slf
Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf \
    -y lab dict.txt phones.lis audio_file.wav
Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf \
    -y lab dict.txt phones.lis

Evaluation (HResults)
> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------
N = total number, I = insertions, S = substitutions, D = deletions
correct: H = N - S - D
%correct: %Corr = H/N
accuracy: Acc = (H - I)/N = (N - S - D - I)/N
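
The definitions above can be checked against the WORD line of the report with a few lines of Python. This is a sketch of the arithmetic only, not of HResults itself; the function name is hypothetical, and it assumes the reported figures read 94.69 and 94.37 percent.

```python
def word_scores(n, d, s, i):
    """Return (H, %Corr, Acc) from total N, deletions D, substitutions S, insertions I."""
    h = n - s - d                  # correct: H = N - S - D
    corr = 100.0 * h / n           # %Corr = H / N
    acc = 100.0 * (h - i) / n      # Acc = (H - I) / N
    return h, corr, acc


if __name__ == "__main__":
    # Counts taken from the WORD line of the report.
    h, corr, acc = word_scores(n=9718, d=196, s=320, i=31)
    print(h, round(corr, 2), round(acc, 2))
```

Accuracy is always at most %Corr, because insertions are subtracted only in the accuracy figure.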
- Outline
- Introduction
- Data formats and manipulation
- Data visualization
- Training
- Recognition

The HTK tools
- data manipulation tools: HCopy, HQuant, HLEd, HHEd, HDMan, HBuild
- data visualization tools: HSLab, HList, HSGen
- training tools: HCompV, HInit, HRest, HERest, HEAdapt, HSmooth
- recognition tools: HLStats, HParse, HVite, HResults

The HTK data formats
- audio: many common formats plus HTK binary
- features: HTK binary
- labels: HTK (single or Master Label files), text
- models: HTK (single or Master Macro files), text or binary
- other: HTK text
[Figure: ASR overview diagram with the boxes audio, feature extraction, features, prototype HMM, dictionary, labels, training, model set, decoding and grammar]

Usage example (HList)
> HList
USAGE: HList [options] file
 Option                                    Default
 -d      Coerce observation to VQ symbols  off
 -e N    End at sample N                   0
 -h      Print source header info          off
 -i N    Set items per line to N           10
 -n N    Set num streams to N              1
 -o      Print observation structure       off
 -p      Playback audio                    off
 -r      Write raw output                  off
 -s N    Start at sample N                 0
 -t      Print target header info          off
 -z      Suppress printing data            on
 -A      Print command line arguments      off
 -C cf   Set config file to cf             default
 -D      Display configuration variables   off

Command line switches and options
> HList -e 1 -o -h feature_file
Source: feature_file
  Sample Bytes:  26        Sample Kind:   MFCC_0
  Num Comps:     13        Sample Period: 10000.0 us
  Num Samples:   336       File Format:   HTK
-------------------- Observation Structure ---------------------
x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
      MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0
------------------------ Samples: 0->1 -------------------------
0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
       3.293   5.428   6.831   5.819   5.606  40.734
1:   -13.591  -4.756  -6.037  -3.362   3.541   3.510   2.867
       0.812   0.630   5.285   1.054   8.375  40.778
----------------------------- END ------------------------------

Configuration file
> cat config_file
SOURCEKIND = MFCC_0
TARGETKIND = MFCC_0_D_A
> HList -C config_file -e 0 -o -h feature_file
Source: feature_file
  Sample Bytes:  26        Sample Kind:   MFCC_0
  Num Comps:     13        Sample Period: 10000.0 us
  Num Samples:   336       File Format:   HTK
-------------------- Observation Structure ---------------------
x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
      MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
       Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
       Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
       Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
      Acc-10  Acc-11  Acc-12   AccC0
------------------------ Samples: 0->1 -------------------------
0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
       3.293   5.428   6.831   5.819   5.606  40.734  -0.107
      -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
      -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
      -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
       0.097   0.225  -0.294   0.051
----------------------------- END ------------------------------
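
The jump from 13 stored components to 39 displayed components follows from the qualifiers in the target kind. The sketch below reproduces that count under simplifying assumptions: 12 cepstral coefficients, `_0` or `_E` each add one static component, `_D`, `_A` and `_T` each add a copy of the static vector, and all other qualifiers leave the size unchanged. The function name is hypothetical and the rule does not cover every HTK qualifier (for example `_N`).

```python
def num_components(target_kind, num_ceps=12):
    """Estimate the feature vector size implied by a kind such as MFCC_0_D_A."""
    qualifiers = target_kind.split("_")[1:]
    # Static part: cepstra plus C0 and/or energy if requested.
    static = num_ceps + sum(1 for q in qualifiers if q in ("0", "E"))
    # One block for the statics, one more per derivative order requested.
    blocks = 1 + sum(1 for q in qualifiers if q in ("D", "A", "T"))
    return static * blocks


if __name__ == "__main__":
    print(num_components("MFCC_0"), num_components("MFCC_0_D_A"))
```

With `SOURCEKIND = MFCC_0` and `TARGETKIND = MFCC_0_D_A` the file still holds 13 components per frame; the delta and acceleration blocks are computed when the file is read.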

File manipulation tools
- HCopy: converts from/to various data formats (audio, features)
- HQuant: quantizes speech (audio)
- HLEd: edits label and master label files
- HDMan: edits dictionary files
- HHEd: edits model and master macro files
- HBuild: converts language models in different formats (more in recognition section)

Computing feature files (HCopy)
> cat config_file
# Feature configuration
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
# input file format (headerless 8 kHz 16 bit linear PCM)
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NOHEAD
SOURCERATE = 1250
> HCopy -C config_file audio_file1 param_file1 audio_file2 ...
> HCopy -C config_file -S file_list
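
The rate and window settings in this configuration are expressed in units of 100 ns, which is easy to misread. The sketch below converts them to more familiar quantities, assuming the values are SOURCERATE = 1250, TARGETRATE = 100000.0 and WINDOWSIZE = 250000.0. The helper names are hypothetical.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # one HTK time unit is 100 ns


def sample_rate_hz(source_rate):
    """Sampling frequency implied by SOURCERATE (a sample period in 100 ns units)."""
    return HTK_UNITS_PER_SECOND / source_rate


def to_milliseconds(units):
    """Convert a duration in 100 ns units to milliseconds."""
    return units / 10_000.0


def to_samples(units, source_rate):
    """Number of audio samples covered by a duration in 100 ns units."""
    return units / source_rate


if __name__ == "__main__":
    print(sample_rate_hz(1250))                                   # sampling frequency in Hz
    print(to_milliseconds(100000.0), to_samples(100000.0, 1250))  # frame shift
    print(to_milliseconds(250000.0), to_samples(250000.0, 1250))  # analysis window
```

Under these assumptions the configuration describes 8 kHz audio analysed with a 25 ms window every 10 ms.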

Label files
#!MLF!#
"filename1"
[start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
[start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
...
[startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
.
"filename2"
...
- [] = optional (0 or 1)
- {} = possible repetition (0, 1, 2, ...)
- time stamps are in 100ns units: 10ms = 100000
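
Because label times are integers in 100 ns units, a small conversion helper is often handy when inspecting alignments. The sketch below assumes a label line that carries both a start and an end time followed by the label; the function names are hypothetical and optional fields beyond the label are ignored.

```python
def htk_time_to_seconds(t):
    """Convert an HTK time stamp (100 ns units) to seconds."""
    return t * 1e-7


def parse_aligned_label(line):
    """Split 'start end label [...]' into (start_s, end_s, label)."""
    start, end, label = line.split()[:3]
    return htk_time_to_seconds(int(start)), htk_time_to_seconds(int(end)), label


if __name__ == "__main__":
    print(htk_time_to_seconds(100000))   # 10 ms expressed in seconds
    print(parse_aligned_label("6400000 8600000 f forra"))
```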

Label file example 1
> cat aligned.mlf
#!MLF!#
"a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
10400000 11700000 r
11700000 14100000 a
14100000 14100000 sp
14100000 29800001 sil <sil>
.
"a10001i1.rec"
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>
.

Label file example 2 (HLEd)
> HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf
> cat words.mlf
#!MLF!#
"a10001a1.rec"
forra
.
"a10001i1.rec"
sju
.
> cat words2phones.led
EX
IS sil sil
> cat phones.mlf
#!MLF!#
"a10001a1.rec"
sil
f
oe
r
a
sp
sil
.
"a10001i1.rec"
sil
S
uh
a
sp
sil
.

Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4 ...
> cat lex.dic
forra f oe r a sp
sju S uh a sp
> cat lex2.dic
<sil> [] sil
forra f oe r a sp
sju 0.3 S uh a sp
sju 0.7 S uh sp
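
The dictionary line format can be made explicit with a small parser sketch. It assumes that the output symbol, when present, is a single bracketed token without spaces, that a pronunciation probability is recognised by being parseable as a number, and that a missing probability means 1.0. Phone names that look like numbers would confuse it. The function name is hypothetical.

```python
def parse_dict_line(line):
    """Parse 'WORD [OUTSYM] PRONPROB P1 P2 ...' into (word, outsym, prob, phones)."""
    tokens = line.split()
    word = tokens.pop(0)

    # Optional output symbol, written in square brackets.
    outsym = None
    if tokens and tokens[0].startswith("["):
        outsym = tokens.pop(0).strip("[]")

    # Optional pronunciation probability.
    prob = 1.0
    if tokens:
        try:
            prob = float(tokens[0])
            tokens.pop(0)
        except ValueError:
            pass  # first remaining token is a phone, not a number

    return word, outsym, prob, tokens


if __name__ == "__main__":
    for entry in ["<sil> [] sil", "forra f oe r a sp", "sju 0.3 S uh a sp"]:
        print(parse_dict_line(entry))
```

An empty output symbol, as in the `<sil>` entry, means that nothing is output when that word is recognised.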
- Outline
- Introduction
- Data formats and manipulation
- Data visualization
- Training
- Recognition
-
ccopy2008 Giampiero Salvi 1747
The HTK tools
I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild
I data visualization toolsHSLab HList HSGen
I training toolsHCompV HInit HRest HERest HEAdapt HSmooth
I recognition toolsHLStats HParse HVite HResults
ccopy2008 Giampiero Salvi 1847
The HTK data formats
data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text
audiofeatureextraction
training model set
prototype HMM
features
dictionary
labels
audiofeatureextraction
features
dictionary
model set
labelsdecoding
grammar
ccopy2008 Giampiero Salvi 1947
Usage example (HList)
gt HList
USAGE HList [options] file
Option Default
-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off
ccopy2008 Giampiero Salvi 2047
Command line switches and options
gt HList -e 1 -o -h feature_file
Source feature_file
Sample Bytes 26 Sample Kind MFCC_0
Num Comps 13 Sample Period 100000 us
Num Samples 336 File Format HTK
-------------------- Observation Structure ---------------------
x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0
------------------------ Samples 0-gt1 -------------------------
0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734
1 -13591 -4756 -6037 -3362 3541 3510 2867
0812 0630 5285 1054 8375 40778
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2147
Configuration file
gt cat config_file
SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A
gt HList -C config_file -e 0 -o -h feature_file
Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK
-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0
------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2247
File manipulation tools
I HCopy converts fromto various data formats (audiofeatures)
I HQuant quantizes speech (audio)
I HLEd edits label and master label files
I HDMan edits dictionary files
I HHEd edits model and master macro files
I HBuild converts language models in different formats (morein recognition section)
ccopy2008 Giampiero Salvi 2347
Computing feature files (HCopy)
gt cat config_file
Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250
gt HCopy -C config_file audio_file1 param_file1 audio_file2
gt HCopy -C config_file -S file_list
ccopy2008 Giampiero Salvi 2447
Label files
MLF
filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]
filename2
I [] = optional (0 or 1)
I = possible repetition (0 1 2)
I time stamps are in 100ns units () 10ms = 100000
ccopy2008 Giampiero Salvi 2547
Label file example 1
gt cat alignedmlf
MLFa10001a1rec
0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe
10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec
0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt
ccopy2008 Giampiero Salvi 2647
Label file example 2 (HLEd)
gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf
gt cat wordsmlf
MLF
a10001a1rec
forra
a10001i1rec
sju
gt cat words2phonesled
EX
IS sil sil
gt cat phonesmlf
MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil
ccopy2008 Giampiero Salvi 2747
Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4
gt cat lexdic
forra f oe r a spsju S uh a sp
gt cat lex2dic
ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp
ccopy2008 Giampiero Salvi 2847
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
HMM definition (~h)
1
2 3 4
5
ccopy2008 Giampiero Salvi 2947
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
State definition (~s)
1
2 3 4
5
2
ccopy2008 Giampiero Salvi 3047
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Gaussian mixture component definition(~m)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3147
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Mean vector definition (~u)Diagonal variance vector definition (~v)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3247
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Transition matrix definition (~t)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3347
I HSLab graphical tool to label speech (use WaveSurferinstead)
I HList gives information about audio and feature files
I HSGen generates random sentences out of a regular grammar
ccopy2008 Giampiero Salvi 3447
Intermezzo what do we know so far
Training
audiofeatureextraction
prototype HMM
features
dictionary
labels
model settraining
Recognition
grammar
decodingaudiofeatureextraction
features
model set
labels
dictionary
ccopy2008 Giampiero Salvi 3547
model initialization
Initialization procedure depends on the information avaliable atthat time
I HCompV computes the overall mean and varianceInput a prototype HMM
I HInit Viterbi segmentation + parameter estimation Formixture distribution uses K-means
Input a prototype HMM time aligned transcriptions
ccopy2008 Giampiero Salvi 3647
Traning tools
I HRest Baum-Welch re-estimation
Input an initialized model set time aligned transcriptions
I HERest performs embedded Baum-Welch trainingInput an initialized model set timeless transcriptions
I HEAdapt performs adaptation on a limited set of data
I HSmooth smoots a set of context-dependent modelsaccording to the context-independent counterpart
ccopy2008 Giampiero Salvi 3747
Training example RefRec
first pass
prototype HMM rarr HCompV rarr cloning rarr HERest rarr
realignment (HVite) rarr HERest rarr increase MC (HHEd) rarr
HERest rarr realignment (HVite)
second pass
prototype HMM + aligned transcriptions rarr HInit rarr HRest
rarr HERest rarr increase MC (HHEd) rarr HERest
ccopy2008 Giampiero Salvi 3847
Recognition tools
grammar generation
I HLStats creates bigram from training data
I HParse parses a user defined grammar to produce a lattice
decoding
I HVite performs Viterbi decoding
evaluation
I HResults evaluates recognition results
ccopy2008 Giampiero Salvi 3947
Grammar definition (HParse)
delete
page
line
word
char
insert
end insert
bottom
top
movedown
left
up
right
sil
fil
spk
sil
fil
spk
quit
sil
fil
spk
ccopy2008 Giampiero Salvi 4047
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
ccopy2008 Giampiero Salvi 4147
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
bottom
top
movedown
left
up
right
ccopy2008 Giampiero Salvi 4247
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
delete
page
line
word
char
ccopy2008 Giampiero Salvi 4347
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
insert
end insert
ccopy2008 Giampiero Salvi 4447
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
sil
fil
spk
ccopy2008 Giampiero Salvi 4547
Grammar definition (HParse)
gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)
I [] optional
I zero or moreI () blockI ltgt loop
I ltltgtgt context dep loop
I | alternative
sil
fil
spk
ccopy2008 Giampiero Salvi 4647
Grammar parsing (HParse) and recognition (HVite)
Parse grammargt HParse grammarbnf grammarslf
Run recognition on file(s)gt HVite -C offlinecfg -H mono_32_2mmf -w grammarslf
-y lab dicttxt phoneslis audio_filewav
Run recognition livegt HVite -C livecfg -H mono_32_2mmf -w grammarslf
-y lab dicttxt phoneslis
ccopy2008 Giampiero Salvi 4747
Evaluation (HResults)
gt HResults -I referencemlf wordlst recognizedmlf
====================== HTK Results Analysis =======================
Date Thu Jan 18 161753 2001
Ref nworkdir_traintestsetmlf
Rec nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT Correct=7407 [H=994 S=348 N=1342]
WORD Corr=9469 Acc=9437 [H=9202 D=196 S=320 I=31 N=9718]
-------------------------------------------------------------------
N = total number I = insertions S = substitutions D = deletions
correct H = N minus S minus D
correct Corr = HN accuracy Acc = HminusIN
= NminusSminusDminusIN
- Outline
- Introduction
- Data formats and manipulation
- Data visualization
- Training
- Recognition
-
The HTK data formats
data       formats
audio      many common formats plus HTK binary
features   HTK binary
labels     HTK (single or Master Label files), text
models     HTK (single or Master Macro files), text or binary
other      HTK text
[Figure: ASR overview diagram (training and recognition), as shown in the introduction]
Usage example (HList)
> HList
USAGE: HList [options] file ...

 Option                                     Default
 -d      Coerce observation to VQ symbols   off
 -e N    End at sample N                    0
 -h      Print source header info           off
 -i N    Set items per line to N            10
 -n N    Set num streams to N               1
 -o      Print observation structure        off
 -p      Playback audio                     off
 -r      Write raw output                   off
 -s N    Start at sample N                  0
 -t      Print target header info           off
 -z      Suppress printing data             on
 -A      Print command line arguments       off
 -C cf   Set config file to cf              default
 -D      Display configuration variables    off
Command line switches and options
> HList -e 1 -o -h feature_file
Source: feature_file
  Sample Bytes:  26      Sample Kind:   MFCC_0
  Num Comps:     13      Sample Period: 10000.0 us
  Num Samples:   336     File Format:   HTK
-------------------- Observation Structure ---------------------
x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
      MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0
------------------------ Samples: 0->1 -------------------------
0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
       3.293   5.428   6.831   5.819   5.606  40.734
1:   -13.591  -4.756  -6.037  -3.362   3.541   3.510   2.867
       0.812   0.630   5.285   1.054   8.375  40.778
----------------------------- END ------------------------------
Configuration file
> cat config_file
SOURCEKIND = MFCC_0
TARGETKIND = MFCC_0_D_A
> HList -C config_file -e 0 -o -h feature_file
Source: feature_file
  Sample Bytes:  26      Sample Kind:   MFCC_0
  Num Comps:     13      Sample Period: 10000.0 us
  Num Samples:   336     File Format:   HTK
-------------------- Observation Structure ---------------------
x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
      MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
       Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
       Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
       Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
      Acc-10  Acc-11  Acc-12   AccC0
------------------------ Samples: 0->1 -------------------------
0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
       3.293   5.428   6.831   5.819   5.606  40.734  -0.107
      -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
      -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
      -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
       0.097   0.225  -0.294   0.051
----------------------------- END ------------------------------
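The jump from 13 to 39 components in the two listings above follows from the parameter kind qualifiers. The sketch below is only an illustration, not part of HTK: the function name `num_components` is hypothetical, it assumes 12 cepstral coefficients (NUMCEPS = 12), and it handles only the `_0`, `_E`, `_D` and `_A` qualifiers.

```python
def num_components(kind: str, num_ceps: int = 12) -> int:
    """Vector size implied by a parameter kind such as MFCC_0_D_A (illustrative only)."""
    qualifiers = kind.split("_")[1:]   # e.g. ["0", "D", "A"]
    static = num_ceps                  # the cepstral coefficients themselves
    if "0" in qualifiers:
        static += 1                    # _0 appends the zeroth cepstral coefficient C0
    if "E" in qualifiers:
        static += 1                    # _E appends log energy
    blocks = 1                         # static block
    if "D" in qualifiers:
        blocks += 1                    # _D appends delta coefficients
    if "A" in qualifiers:
        blocks += 1                    # _A appends acceleration coefficients
    return static * blocks


print(num_components("MFCC_0"))       # 13
print(num_components("MFCC_0_D_A"))   # 39
```

With SOURCEKIND = MFCC_0 the file stores 13 values per frame, and TARGETKIND = MFCC_0_D_A asks the tool to compute the delta and acceleration blocks on the fly, which gives 3 x 13 = 39.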
File manipulation tools
- HCopy: converts from/to various data formats (audio, features)
- HQuant: quantizes speech (audio)
- HLEd: edits label and master label files
- HDMan: edits dictionary files
- HHEd: edits model and master macro files
- HBuild: converts language models in different formats (more in the recognition section)
Computing feature files (HCopy)
> cat config_file
# Feature configuration
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
# input file format (headerless 8 kHz 16 bit linear PCM)
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NOHEAD
SOURCERATE = 1250
> HCopy -C config_file audio_file1 param_file1 audio_file2 ...
> HCopy -C config_file -S file_list
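HTK expresses SOURCERATE, TARGETRATE and WINDOWSIZE in units of 100 ns, which is easy to misread. The sketch below converts the values of the configuration above into more familiar units; it assumes TARGETRATE = 100000.0 and WINDOWSIZE = 250000.0 (the decimal points are lost in this transcript), and the helper name `to_seconds` is hypothetical.

```python
HTK_UNITS_PER_SECOND = 10_000_000   # one HTK time unit is 100 ns


def to_seconds(htk_units: float) -> float:
    """Convert a duration in 100 ns units to seconds."""
    return htk_units / HTK_UNITS_PER_SECOND


source_rate = 1250        # SOURCERATE: sample period of the audio
target_rate = 100000.0    # TARGETRATE: frame shift (assumed value)
window_size = 250000.0    # WINDOWSIZE: analysis window length (assumed value)

sample_rate_hz = HTK_UNITS_PER_SECOND / source_rate      # sampling frequency in Hz
frame_shift_ms = to_seconds(target_rate) * 1000          # frame shift in milliseconds
window_ms = to_seconds(window_size) * 1000               # window length in milliseconds
samples_per_window = round(window_size / source_rate)    # audio samples per window
samples_per_shift = round(target_rate / source_rate)     # audio samples per frame shift

print(sample_rate_hz, frame_shift_ms, window_ms, samples_per_window, samples_per_shift)
```

Under these assumptions the configuration describes 8 kHz audio analysed with a 25 ms window every 10 ms, which agrees with the "headerless 8 kHz" comment in the config file.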
Label files
#!MLF!#
"filename1"
[start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
[start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
...
[startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
.
"filename2"
...
- [.] = optional (0 or 1)
- {.} = possible repetition (0, 1, 2, ...)
- time stamps are in 100 ns units (!): 10 ms = 100000
Label file example 1
> cat aligned.mlf
#!MLF!#
"a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
10400000 11700000 r
11700000 14100000 a
14100000 14100000 sp
14100000 29800001 sil <sil>
.
"a10001i1.rec"
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>
.
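A Master Label File like the one above is simple enough to read with a few lines of code. The following is a minimal sketch, not an HTK API: the function `parse_mlf` and the sample file name are hypothetical, and it only handles lines with both time stamps or with none (scores and auxiliary labels are ignored).

```python
def parse_mlf(text):
    """Parse MLF text into {filename: [(start_s, end_s, label), ...]}."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "#!MLF!#":
        raise ValueError("missing #!MLF!# header")
    entries = {}
    current = None
    for line in lines[1:]:
        if line.startswith('"'):          # quoted name opens a new label file
            current = line.strip('"')
            entries[current] = []
        elif line == ".":                 # a single dot closes the label file
            current = None
        elif current is not None:
            fields = line.split()
            if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
                # time stamps are in 100 ns units; convert to seconds
                start, end = int(fields[0]) / 1e7, int(fields[1]) / 1e7
                entries[current].append((start, end, fields[2]))
            else:
                entries[current].append((None, None, fields[0]))
    return entries


sample = '''#!MLF!#
"example.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
.
'''
labels = parse_mlf(sample)
print(labels["example.rec"])
```

The third field is the phone label here; the fourth field (`<sil>`, `forra`) is the auxiliary word-level label and is dropped by this sketch.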
Label file example 2 (HLEd)
> HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf
> cat words.mlf
#!MLF!#
"a10001a1.rec"
forra
.
"a10001i1.rec"
sju
.
> cat words2phones.led
EX
IS sil sil
> cat phones.mlf
#!MLF!#
"a10001a1.rec"
sil
f
oe
r
a
sp
sil
.
"a10001i1.rec"
sil
S
uh
a
sp
sil
.
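The effect of the two edit commands in this example (EX expands each word through the dictionary, IS inserts a label at the start and at the end) can be mimicked in a few lines. This is a sketch of the idea only, not HLEd itself; the function `words_to_phones` is hypothetical and the lexicon is copied from the lex.dic example in this tutorial.

```python
def words_to_phones(words, lexicon, boundary="sil"):
    """Expand words into phones and add a boundary label at both ends."""
    phones = [boundary]                   # IS sil sil: label inserted at the start
    for word in words:
        phones.extend(lexicon[word])      # EX: replace the word by its pronunciation
    phones.append(boundary)               # IS sil sil: label inserted at the end
    return phones


lexicon = {
    "forra": ["f", "oe", "r", "a", "sp"],
    "sju": ["S", "uh", "a", "sp"],
}

print(words_to_phones(["forra"], lexicon))
print(words_to_phones(["sju"], lexicon))
```

The two printed lists correspond to the two entries of phones.mlf above.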
Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4 ...
> cat lex.dic
forra f oe r a sp
sju S uh a sp
> cat lex2.dic
<sil> [] sil
forra f oe r a sp
sju 0.3 S uh a sp
sju 0.7 S uh sp
HMM definition files (HHEd)
~h "hmm_name"
<BEGINHMM>
  <NUMSTATES> 5
  <STATE> 2
    <NUMMIXES> 2
    <MIXTURE> 1 0.8
      <MEAN> 4
        0.1 0.0 0.7 0.3
      <VARIANCE> 4
        0.2 0.1 0.1 0.1
    <MIXTURE> 2 0.2
      <MEAN> 4
        0.2 0.3 0.4 0.0
      <VARIANCE> 4
        0.1 0.1 0.1 0.2
  <STATE> 3
    ~s "state_name"
  <STATE> 4
    <NUMMIXES> 2
    <MIXTURE> 1 0.7
      ~m "mix_name"
    <MIXTURE> 2 0.3
      <MEAN> 4
        ~u "mean_name"
      <VARIANCE> 4
        ~v "variance_name"
  <TRANSP>
    ~t "transition_name"
<ENDHMM>
Macro types used in the definition:
- ~h: HMM definition
- ~s: state definition
- ~m: Gaussian mixture component definition
- ~u: mean vector definition
- ~v: diagonal variance vector definition
- ~t: transition matrix definition
[Figure: five-state HMM topology, entry state 1, emitting states 2, 3, 4, exit state 5]
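To make the meaning of the numbers in state 2 concrete, the sketch below evaluates the output density of that state, a weighted sum of two diagonal-covariance Gaussians. It is an illustration of the standard mixture formula, not HTK code; the function names are hypothetical and the weights, means and variances are read from the definition above, assuming the decimal points are placed as 0.8, 0.1, and so on.

```python
import math


def log_gaussian_diag(obs, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (o - m) ** 2 / v)
        for o, m, v in zip(obs, mean, var)
    )


def state_likelihood(obs, mixtures):
    """Output density b(o) = sum over components of weight * N(o; mean, var)."""
    return sum(w * math.exp(log_gaussian_diag(obs, mean, var)) for w, mean, var in mixtures)


# (weight, mean, variance) for the two mixture components of state 2
state2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]

near = state_likelihood([0.1, 0.0, 0.7, 0.3], state2)   # at the mean of component 1
far = state_likelihood([5.0, 5.0, 5.0, 5.0], state2)    # far from both components
print(near > far)
```

An observation at the mean of the dominant component scores much higher than one far from both components, which is what training and decoding rely on.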
- HSLab: graphical tool to label speech (use WaveSurfer instead)
- HList: gives information about audio and feature files
- HSGen: generates random sentences out of a regular grammar
Intermezzo: what do we know so far?
[Figure: ASR overview diagram. Training: audio, feature extraction, features, labels, dictionary, prototype HMM, training, model set. Recognition: audio, feature extraction, features, model set, dictionary, grammar, decoding, labels.]
Model initialization
The initialization procedure depends on the information available at that time.
- HCompV: computes the overall mean and variance.
  Input: a prototype HMM
- HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time-aligned transcriptions
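The "overall mean and variance" used for a flat start can be sketched as follows. This is a plain-Python illustration of the statistic, not the HCompV implementation: the function name is hypothetical, frames are assumed to be lists of floats of equal length, and the variance is normalized by the number of frames, which is an assumption about the convention.

```python
def global_mean_variance(frames):
    """Per-dimension mean and variance over all feature frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    var = [sum((frame[d] - mean[d]) ** 2 for frame in frames) / n for d in range(dim)]
    return mean, var


frames = [[1.0, 2.0], [3.0, 6.0]]   # two toy 2-dimensional frames
mean, var = global_mean_variance(frames)
print(mean, var)                    # [2.0, 4.0] [1.0, 4.0]
```

Every state of the prototype HMM would start from these same values, which is why no time-aligned transcription is needed at this stage.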
Training tools
- HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions
- HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, transcriptions without time stamps
- HEAdapt: performs adaptation on a limited set of data
- HSmooth: smooths a set of context-dependent models according to the context-independent counterpart
Training example: RefRec
first pass:
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)
second pass:
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest
Recognition tools
grammar generation
- HLStats: creates a bigram from training data
- HParse: parses a user-defined grammar to produce a lattice
decoding
- HVite: performs Viterbi decoding
evaluation
- HResults: evaluates recognition results
Grammar definition (HParse)
> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
({$noise} < $cmd {$noise} > quit {$noise})
- [.] optional
- {.} zero or more
- (.) block
- <.> loop
- <<.>> context dep. loop
- .|. alternative
[Figure: word network for the grammar. Optional noise (sil, fil, spk), then a loop over the commands (move up/down/left/right, top, bottom, delete [char/word/line/page], insert, end [insert]) each followed by optional noise, then quit and optional noise.]
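One way to check one's reading of the grammar is to test word sequences against it. The sketch below translates the rules into a regular expression over space-terminated words; it is not HParse and does not build a lattice. All names are hypothetical, and it assumes the noise terms are wrapped in braces (zero or more) and the command block in angle brackets (one or more), as the notation list suggests.

```python
import re

# Each rule of the grammar becomes a regular expression fragment.
DIR = r"(?:up|down|left|right)"
MCMD = rf"(?:move {DIR}|top|bottom)"
ITEM = r"(?:char|word|line|page)"
DCMD = rf"(?:delete(?: {ITEM})?)"      # [$item] is optional
ICMD = r"insert"
ECMD = r"(?:end(?: insert)?)"          # [insert] is optional
CMD = rf"(?:{MCMD}|{DCMD}|{ICMD}|{ECMD})"
NOISE = r"(?:sil|fil|spk)"

# ({$noise} < $cmd {$noise} > quit {$noise}); every word is followed by one space
SENTENCE = re.compile(rf"(?:{NOISE} )*(?:{CMD} (?:{NOISE} )*)+quit (?:{NOISE} )*")


def accepts(words):
    """Return True if the word sequence is a sentence of the grammar."""
    text = "".join(word + " " for word in words)
    return SENTENCE.fullmatch(text) is not None


print(accepts("sil move up delete word quit".split()))   # True
print(accepts("move quit".split()))                      # False
```

The second sequence is rejected because `move` must be followed by a direction, and a sentence with no command before `quit` would be rejected as well.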
Grammar parsing (HParse) and recognition (HVite)
Parse grammar:
> HParse grammar.bnf grammar.slf
Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav
Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis
Evaluation (HResults)
> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------
N = total number, I = insertions, S = substitutions, D = deletions
correct: H = N - S - D
%correct: %Corr = H / N
accuracy: Acc = (H - I) / N = (N - S - D - I) / N
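The figures in the report can be reproduced from the raw counts with the formulas above. The short sketch below does this for the word-level line; the function name `word_scores` is hypothetical, and the percentages assume the decimal point sits two digits from the right (94.69 and 94.37).

```python
def word_scores(n, d, s, i):
    """Return (H, %Corr, Acc) from the total, deletions, substitutions and insertions."""
    h = n - s - d                      # correct: H = N - S - D
    corr = 100.0 * h / n               # %Corr = H / N
    acc = 100.0 * (h - i) / n          # Acc = (H - I) / N
    return h, corr, acc


h, corr, acc = word_scores(n=9718, d=196, s=320, i=31)
print(h, round(corr, 2), round(acc, 2))   # 9202 94.69 94.37

sent_correct = 100.0 * 994 / 1342         # sentence level: H / N
print(round(sent_correct, 2))             # 74.07
```

Accuracy is always lower than or equal to %Corr because insertions are penalized only in the former.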
ccopy2008 Giampiero Salvi 2047
Command line switches and options
gt HList -e 1 -o -h feature_file
Source feature_file
Sample Bytes 26 Sample Kind MFCC_0
Num Comps 13 Sample Period 100000 us
Num Samples 336 File Format HTK
-------------------- Observation Structure ---------------------
x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0
------------------------ Samples 0-gt1 -------------------------
0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734
1 -13591 -4756 -6037 -3362 3541 3510 2867
0812 0630 5285 1054 8375 40778
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2147
Configuration file
gt cat config_file
SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A
gt HList -C config_file -e 0 -o -h feature_file
Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK
-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7
MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0
------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830
3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051
----------------------------- END ------------------------------
ccopy2008 Giampiero Salvi 2247
File manipulation tools
I HCopy converts fromto various data formats (audiofeatures)
I HQuant quantizes speech (audio)
I HLEd edits label and master label files
I HDMan edits dictionary files
I HHEd edits model and master macro files
I HBuild converts language models in different formats (morein recognition section)
ccopy2008 Giampiero Salvi 2347
Computing feature files (HCopy)
gt cat config_file
Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250
gt HCopy -C config_file audio_file1 param_file1 audio_file2
gt HCopy -C config_file -S file_list
ccopy2008 Giampiero Salvi 2447
Label files
MLF
filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]
filename2
I [] = optional (0 or 1)
I = possible repetition (0 1 2)
I time stamps are in 100ns units () 10ms = 100000
ccopy2008 Giampiero Salvi 2547
Label file example 1
gt cat alignedmlf
MLFa10001a1rec
0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe
10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec
0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt
ccopy2008 Giampiero Salvi 2647
Label file example 2 (HLEd)
gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf
gt cat wordsmlf
MLF
a10001a1rec
forra
a10001i1rec
sju
gt cat words2phonesled
EX
IS sil sil
gt cat phonesmlf
MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil
ccopy2008 Giampiero Salvi 2747
Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4
gt cat lexdic
forra f oe r a spsju S uh a sp
gt cat lex2dic
ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp
ccopy2008 Giampiero Salvi 2847
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
HMM definition (~h)
1
2 3 4
5
ccopy2008 Giampiero Salvi 2947
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
State definition (~s)
1
2 3 4
5
2
ccopy2008 Giampiero Salvi 3047
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Gaussian mixture component definition(~m)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3147
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Mean vector definition (~u)Diagonal variance vector definition (~v)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3247
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Transition matrix definition (~t)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3347
I HSLab graphical tool to label speech (use WaveSurferinstead)
I HList gives information about audio and feature files
I HSGen generates random sentences out of a regular grammar
ccopy2008 Giampiero Salvi 3447
Intermezzo what do we know so far
Training
audiofeatureextraction
prototype HMM
features
dictionary
labels
model settraining
Recognition
grammar
decodingaudiofeatureextraction
features
model set
labels
dictionary
ccopy2008 Giampiero Salvi 3547
model initialization
Initialization procedure depends on the information avaliable atthat time
I HCompV computes the overall mean and varianceInput a prototype HMM
I HInit Viterbi segmentation + parameter estimation Formixture distribution uses K-means
Input a prototype HMM time aligned transcriptions

Training tools
- HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions
- HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, timeless transcriptions
- HEAdapt: performs adaptation on a limited set of data
- HSmooth: smooths a set of context-dependent models according to the context-independent counterpart

Training example (RefRec)
First pass:
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)
Second pass:
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest

Recognition tools
Grammar generation:
- HLStats: creates a bigram from training data
- HParse: parses a user-defined grammar to produce a lattice
Decoding:
- HVite: performs Viterbi decoding
Evaluation:
- HResults: evaluates recognition results

Grammar definition (HParse)
[Figure: word network of the grammar. Node labels: delete, page, line, word, char, insert, end insert, bottom, top, move, down, left, up, right, quit, and three groups of sil, fil, spk.]

Grammar definition (HParse)
> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
($noise < $cmd $noise > quit $noise)
- [] optional
- {} zero or more
- () block
- <> loop
- <<>> context dep. loop
- | alternative
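To see what the command part of this grammar accepts, the rules can be transcribed by hand into a Python dictionary and expanded exhaustively. This is only a sketch: the data layout and the expand function are hypothetical, the optional items are written out as two alternatives, and the final sentence rule with its loop and noise words is left out because a loop has no finite expansion.

```python
from itertools import product

# Each rule maps to a list of alternatives; each alternative is a list of
# symbols. Symbols that are keys of GRAMMAR are expanded recursively.
GRAMMAR = {
    "$dir":  [["up"], ["down"], ["left"], ["right"]],
    "$mcmd": [["move", "$dir"], ["top"], ["bottom"]],
    "$item": [["char"], ["word"], ["line"], ["page"]],
    "$dcmd": [["delete"], ["delete", "$item"]],   # delete [$item]
    "$icmd": [["insert"]],
    "$ecmd": [["end"], ["end", "insert"]],        # end [insert]
    "$cmd":  [["$mcmd"], ["$dcmd"], ["$icmd"], ["$ecmd"]],
}

def expand(symbol):
    """Return every word sequence the symbol can produce, as tuples."""
    if symbol not in GRAMMAR:          # terminal word
        return [(symbol,)]
    results = []
    for alternative in GRAMMAR[symbol]:
        parts = [expand(s) for s in alternative]
        for combo in product(*parts):
            results.append(tuple(w for part in combo for w in part))
    return results

commands = expand("$cmd")
for words in commands:
    print(" ".join(words))
```

Listing the expansions this way is a quick sanity check on a grammar before it is compiled into a lattice.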

Grammar parsing (HParse) and recognition (HVite)
Parse grammar:
> HParse grammar.bnf grammar.slf
Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav
Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis

Evaluation (HResults)
> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------
N = total number, I = insertions, S = substitutions, D = deletions
correct: H = N - S - D
%correct: %Corr = H/N
accuracy: Acc = (H - I)/N = (N - S - D - I)/N
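The word-level figures in the report can be re-derived from the counts in brackets. The short Python sketch below applies the definitions H = N - S - D, %Corr = H/N and Acc = (H - I)/N to the counts N=9718, S=320, D=196, I=31. The function name is hypothetical.

```python
def word_scores(n, s, d, i):
    """Return (H, %Corr, Acc) from total, substitution, deletion, insertion counts."""
    h = n - s - d                  # correctly recognized words
    corr = 100.0 * h / n           # ignores insertions
    acc = 100.0 * (h - i) / n      # also penalizes insertions
    return h, corr, acc

h, corr, acc = word_scores(n=9718, s=320, d=196, i=31)
print(f"H={h} Corr={corr:.2f} Acc={acc:.2f}")
```

Accuracy is never higher than %Corr, and the gap between the two grows with the number of insertions.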

Configuration file
> cat config_file
SOURCEKIND = MFCC_0
TARGETKIND = MFCC_0_D_A
> HList -C config_file -e 0 -o -h feature_file
Source: feature_file
Sample Bytes: 26     Sample Kind: MFCC_0
Num Comps: 13        Sample Period: 10000.0 us
Num Samples: 336     File Format: HTK
-------------------- Observation Structure ---------------------
x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
      MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
       Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
       Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
       Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
      Acc-10  Acc-11  Acc-12   AccC0
------------------------ Samples: 0->1 -------------------------
0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
       3.293   5.428   6.831   5.819   5.606  40.734  -0.107
      -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
      -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
      -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
       0.097   0.225  -0.294   0.051
----------------------------- END ------------------------------
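The jump from 13 stored components to 39 observed values follows from the target kind qualifiers. The Python sketch below computes the observation size for a parameter kind string, assuming 12 cepstral coefficients. It is a simplification that only handles the _0, _E, _D and _A qualifiers, and the function name is hypothetical.

```python
def observation_size(target_kind, num_ceps):
    """Vector size implied by a kind such as 'MFCC_0_D_A' (simplified)."""
    base, *qualifiers = target_kind.split("_")
    static = num_ceps
    if "0" in qualifiers:      # 0th cepstral coefficient C0
        static += 1
    if "E" in qualifiers:      # log energy
        static += 1
    # Deltas (_D) and accelerations (_A) each add one copy of the static part.
    streams = 1 + ("D" in qualifiers) + ("A" in qualifiers)
    return static * streams

print(observation_size("MFCC_0", 12))
print(observation_size("MFCC_0_D_A", 12))
```

Because the deltas and accelerations are computed when the file is read, the file on disk can stay at the smaller static size.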

File manipulation tools
- HCopy: converts from/to various data formats (audio, features)
- HQuant: quantizes speech (audio)
- HLEd: edits label and master label files
- HDMan: edits dictionary files
- HHEd: edits model and master macro files
- HBuild: converts language models in different formats (more in the recognition section)

Computing feature files (HCopy)
> cat config_file
# Feature configuration
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
# input file format (headerless 8 kHz 16 bit linear PCM)
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NOHEAD
SOURCERATE = 1250
> HCopy -C config_file audio_file1 param_file1 audio_file2 ...
> HCopy -C config_file -S file_list
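The rate and window settings are expressed in 100 ns units, which makes them hard to read at a glance. The Python sketch below converts them into more familiar quantities, assuming the configuration values read SOURCERATE = 1250, WINDOWSIZE = 250000.0 and TARGETRATE = 100000.0. The function and key names are hypothetical.

```python
UNITS_PER_SECOND = 10_000_000   # one HTK time unit is 100 ns
UNITS_PER_MS = 10_000

def framing(source_rate, window_size, target_rate):
    """Translate HTK period settings (100 ns units) into Hz, ms and samples."""
    return {
        "sample_rate_hz": UNITS_PER_SECOND / source_rate,
        "window_ms": window_size / UNITS_PER_MS,
        "shift_ms": target_rate / UNITS_PER_MS,
        "samples_per_window": window_size / source_rate,
        "samples_per_shift": target_rate / source_rate,
    }

info = framing(source_rate=1250, window_size=250000.0, target_rate=100000.0)
for name, value in info.items():
    print(name, value)
```

Under these assumptions the settings describe 8 kHz audio analyzed with a 25 ms window every 10 ms, which matches the comment about the input file format.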

Label files
#!MLF!#
filename1
[start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
[start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
...
[startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
.
filename2
...
- [] = optional (0 or 1)
- {} = possible repetition (0, 1, 2, ...)
- time stamps are in 100 ns units: 10 ms = 100000
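Since the 100 ns unit is easy to get wrong, a small Python sketch of the conversion and of splitting one label line may help. It assumes well-formed lines in which the label itself is not purely numeric, and it keeps everything after the label as an unparsed list. All names are hypothetical.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # one HTK time unit is 100 ns

def htk_to_seconds(t):
    return t / HTK_UNITS_PER_SECOND

def seconds_to_htk(sec):
    return round(sec * HTK_UNITS_PER_SECOND)

def parse_label_line(line):
    """Split '[start [end]] label ...' into (start, end, label, rest)."""
    fields = line.split()
    times = []
    # Up to two leading integers are the optional start and end times.
    while fields and len(times) < 2 and fields[0].isdigit():
        times.append(int(fields.pop(0)))
    start = times[0] if times else None
    end = times[1] if len(times) > 1 else None
    return start, end, fields[0], fields[1:]

start, end, label, rest = parse_label_line("6400000 8600000 f forra")
print(label, htk_to_seconds(start), htk_to_seconds(end), rest)
```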

Label file example 1
> cat aligned.mlf
#!MLF!#
a10001a1.rec
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
10400000 11700000 r
11700000 14100000 a
14100000 14100000 sp
14100000 29800001 sil <sil>
.
a10001i1.rec
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>
.

Label file example 2 (HLEd)
> HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf
> cat words.mlf
#!MLF!#
a10001a1.rec
forra
.
a10001i1.rec
sju
.
> cat words2phones.led
EX
IS sil sil
> cat phones.mlf
#!MLF!#
a10001a1.rec
sil
f
oe
r
a
sp
sil
.
a10001i1.rec
sil
S
uh
a
sp
sil
.
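The effect of the two edit commands can be imitated in a few lines of Python: EX expands each word into its dictionary pronunciation, and IS sil sil inserts sil at the start and end of the transcription. This is a sketch of the outcome only, assuming one pronunciation per word. The function name and the dictionary layout are hypothetical.

```python
def expand_words(words, lexicon, boundary="sil"):
    """Expand a word sequence into phones and add boundary silences."""
    phones = [boundary]                 # IS sil sil: silence at the start
    for word in words:
        phones.extend(lexicon[word])    # EX: replace word by its phones
    phones.append(boundary)             # IS sil sil: silence at the end
    return phones

# Pronunciations as listed in lex.dic.
lexicon = {
    "forra": ["f", "oe", "r", "a", "sp"],
    "sju": ["S", "uh", "a", "sp"],
}

print(" ".join(expand_words(["forra"], lexicon)))
print(" ".join(expand_words(["sju"], lexicon)))
```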

Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4
> cat lex.dic
forra f oe r a sp
sju S uh a sp
> cat lex2.dic
<sil> [] sil
forra f oe r a sp
sju 0.3 S uh a sp
sju 0.7 S uh sp

HMM definition files (HHEd)
- HMM definition (~h)
- State definition (~s)
- Gaussian mixture component definition (~m)
ccopy2008 Giampiero Salvi 2347
Computing feature files (HCopy)
gt cat config_file
Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250
gt HCopy -C config_file audio_file1 param_file1 audio_file2
gt HCopy -C config_file -S file_list
ccopy2008 Giampiero Salvi 2447
Label files
MLF
filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]
filename2
I [] = optional (0 or 1)
I = possible repetition (0 1 2)
I time stamps are in 100ns units () 10ms = 100000
ccopy2008 Giampiero Salvi 2547
Label file example 1
gt cat alignedmlf
MLFa10001a1rec
0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe
10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec
0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt
ccopy2008 Giampiero Salvi 2647
Label file example 2 (HLEd)
gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf
gt cat wordsmlf
MLF
a10001a1rec
forra
a10001i1rec
sju
gt cat words2phonesled
EX
IS sil sil
gt cat phonesmlf
MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil
ccopy2008 Giampiero Salvi 2747
Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4
gt cat lexdic
forra f oe r a spsju S uh a sp
gt cat lex2dic
ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp
ccopy2008 Giampiero Salvi 2847
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
HMM definition (~h)
1
2 3 4
5
ccopy2008 Giampiero Salvi 2947
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
State definition (~s)
1
2 3 4
5
2
ccopy2008 Giampiero Salvi 3047
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Gaussian mixture component definition(~m)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3147
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Mean vector definition (~u)Diagonal variance vector definition (~v)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3247
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Transition matrix definition (~t)
1
2 3 4
5
Label files
MLF (master label file) format:
#!MLF!#
"filename1"
[start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
[start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
...
[startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
.
"filename2"
...
- [] = optional (0 or 1)
- {} = possible repetition (0, 1, 2, ...)
- time stamps are in 100 ns units, so 10 ms = 100000
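The line format above can be illustrated with a small parser. This is only a sketch, not HTK code: the names `Segment` and `parse_label_line` are hypothetical, scores and auxiliary labels are left unparsed, and it assumes that a label never consists of digits only (otherwise it would be mistaken for a time stamp).

```python
from typing import NamedTuple, Optional

HTK_UNITS_PER_SECOND = 10_000_000  # one HTK time unit is 100 ns


class Segment(NamedTuple):
    start: Optional[float]  # seconds, None if the line has no time stamp
    end: Optional[float]    # seconds, None if the line has no end time
    label: str
    rest: list              # remaining fields (score, aux labels, ...), unparsed


def parse_label_line(line: str) -> Segment:
    """Parse one '[start [end]] label ...' line into a Segment."""
    fields = line.split()
    times = []
    # Leading all-digit fields are time stamps; there are at most two of them.
    while fields and len(times) < 2 and fields[0].isdigit():
        times.append(int(fields.pop(0)) / HTK_UNITS_PER_SECOND)
    if not fields:
        raise ValueError(f"no label in line: {line!r}")
    start = times[0] if times else None
    end = times[1] if len(times) > 1 else None
    return Segment(start, end, fields[0], fields[1:])


segment = parse_label_line("6400000 8600000 f forra")
print(segment.label, segment.start, segment.end)
```

Dividing by 10,000,000 turns the 100 ns units into seconds, so the segment above runs from 0.64 s to 0.86 s.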
Label file example 1
> cat aligned.mlf
#!MLF!#
"a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
10400000 11700000 r
11700000 14100000 a
14100000 14100000 sp
14100000 29800001 sil <sil>
.
"a10001i1.rec"
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>
.
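As a rough illustration of how such an aligned file can be used, the sketch below reads an MLF from a string and merges the phone segments into word spans. The function names are hypothetical, and the sketch assumes the layout of the example: quoted file names, a single "." closing each entry, and the word label given as a fourth field on the first phone of each word.

```python
def read_mlf(text):
    """Return {filename: [label lines]} for an MLF given as a string."""
    entries, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"') and line.endswith('"'):
            current = entries.setdefault(line.strip('"'), [])
        elif line == ".":
            current = None          # end of the current file entry
        elif current is not None:
            current.append(line)
    return entries


def word_spans(lines):
    """Merge phone-level lines into (word, start, end) spans in 100 ns units."""
    spans = []
    for line in lines:
        fields = line.split()
        start, end = int(fields[0]), int(fields[1])
        if len(fields) > 3:         # fourth field: first phone of a new word
            spans.append([fields[3], start, end])
        elif spans:                 # otherwise extend the current word
            spans[-1][2] = end
    return [tuple(span) for span in spans]


EXAMPLE = """#!MLF!#
"a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
10400000 11700000 r
11700000 14100000 a
14100000 14100000 sp
14100000 29800001 sil <sil>
.
"""

for word, start, end in word_spans(read_mlf(EXAMPLE)["a10001a1.rec"]):
    print(word, start, end)
```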
Label file example 2 (HLEd)
> HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf
> cat words.mlf
#!MLF!#
"a10001a1.rec"
forra
.
"a10001i1.rec"
sju
.
> cat words2phones.led
EX
IS sil sil
> cat phones.mlf
#!MLF!#
"a10001a1.rec"
sil
f
oe
r
a
sp
sil
.
"a10001i1.rec"
sil
S
uh
a
sp
sil
.
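The effect of the two edit commands in this example can be mimicked in a few lines: EX expands each word into the phones of its dictionary entry, and IS sil sil inserts sil at the start and end of the utterance. This is a sketch of the result only, not of HLEd itself; `expand_words` and the in-memory `LEXICON` are hypothetical names, with one pronunciation per word as in lex.dic.

```python
# Pronunciations as listed in the lex.dic example (one per word).
LEXICON = {
    "forra": ["f", "oe", "r", "a", "sp"],
    "sju": ["S", "uh", "a", "sp"],
}


def expand_words(words, lexicon, boundary="sil"):
    """Expand words to phones (like EX) and add boundary silence (like IS)."""
    phones = [boundary]
    for word in words:
        phones.extend(lexicon[word])   # raises KeyError for unknown words
    phones.append(boundary)
    return phones


print(" ".join(expand_words(["forra"], LEXICON)))
print(" ".join(expand_words(["sju"], LEXICON)))
```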
Dictionary (HDMan)
WORD [OUTSYM] [PRONPROB] P1 P2 P3 P4 ...
(OUTSYM is written in literal square brackets; PRONPROB is optional)
> cat lex.dic
forra f oe r a sp
sju S uh a sp
> cat lex2.dic
<sil> [] sil
forra f oe r a sp
sju 0.3 S uh a sp
sju 0.7 S uh sp
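A dictionary line in this format can be split into its parts as sketched below. The function `parse_dict_line` is a hypothetical helper, not part of HTK; it assumes whitespace-separated fields, an output symbol without spaces, and phone names that cannot be read as numbers.

```python
def parse_dict_line(line):
    """Split 'WORD [OUTSYM] PRONPROB P1 P2 ...' into (word, outsym, prob, phones)."""
    fields = line.split()
    word, rest = fields[0], fields[1:]
    outsym = None                       # None means: no output symbol given
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        outsym = rest.pop(0)[1:-1]      # "[]" gives the empty string
    prob = 1.0                          # assumed default when PRONPROB is absent
    if rest:
        try:
            prob = float(rest[0])
            rest.pop(0)
        except ValueError:
            pass                        # first field is a phone, not a number
    return word, outsym, prob, rest


for entry in ["<sil> [] sil", "sju 0.3 S uh a sp", "sju 0.7 S uh sp"]:
    print(parse_dict_line(entry))
```

In lex2.dic the two pronunciations of sju carry probabilities 0.3 and 0.7, which sum to 1.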
HMM definition files (HHEd)
~h hmm_name
<BEGINHMM>
  <NUMSTATES> 5
  <STATE> 2
    <NUMMIXES> 2
    <MIXTURE> 1 0.8
      <MEAN> 4
        0.1 0.0 0.7 0.3
      <VARIANCE> 4
        0.2 0.1 0.1 0.1
    <MIXTURE> 2 0.2
      <MEAN> 4
        0.2 0.3 0.4 0.0
      <VARIANCE> 4
        0.1 0.1 0.1 0.2
  <STATE> 3
    ~s state_name
  <STATE> 4
    <NUMMIXES> 2
    <MIXTURE> 1 0.7
      ~m mix_name
    <MIXTURE> 2 0.3
      <MEAN> 4
        ~u mean_name
      <VARIANCE> 4
        ~v variance_name
  <TRANSP>
    ~t transition_name
<ENDHMM>
Macro types used in the definition:
- ~h HMM definition
- ~s state definition
- ~m Gaussian mixture component definition
- ~u mean vector definition
- ~v diagonal variance vector definition
- ~t transition matrix definition
[Figure: HMM topology with states 1 to 5]
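To make the role of the weights, means and variances concrete, the sketch below evaluates the log output probability of a state modelled as a mixture of diagonal-covariance Gaussians. The numbers are those of state 2 in the definition above, read under the assumption that the decimal points were lost in extraction (0.8, 0.1 0.0 0.7 0.3, and so on). The function names are hypothetical and this is not HTK code.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, mixtures):
    """Log of sum_k w_k N(x; mean_k, var_k), computed with log-sum-exp."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in mixtures]
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))


# (weight, mean, variance) for the two mixture components of state 2.
STATE2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]

observation = [0.1, 0.0, 0.7, 0.3]  # hypothetical 4-dimensional feature vector
print(log_gmm(observation, STATE2))
```

The mixture weights of a state sum to 1 (0.8 + 0.2 here, 0.7 + 0.3 in state 4).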
- HSLab: graphical tool to label speech (use WaveSurfer instead)
- HList: gives information about audio and feature files
- HSGen: generates random sentences out of a regular grammar
Intermezzo: what do we know so far?
[Figure: ASR overview]
Training: audio → feature extraction → features; prototype HMM + features + dictionary + labels → training → model set
Recognition: audio → feature extraction → features; features + model set + dictionary + grammar → decoding → labels
Model initialization
The initialization procedure depends on the information available at that time.
- HCompV: computes the overall mean and variance.
  Input: a prototype HMM
- HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time-aligned transcriptions
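The statistic behind the HCompV step, the overall mean and variance of the training features, can be sketched as follows. This only illustrates the computation on a list of feature vectors; it is not HCompV itself, the function name is hypothetical, and dividing by the number of frames (rather than by that number minus one) is an assumption.

```python
def global_mean_variance(frames):
    """Per-dimension mean and variance over all feature frames."""
    count = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / count for d in range(dim)]
    var = [
        sum((frame[d] - mean[d]) ** 2 for frame in frames) / count
        for d in range(dim)
    ]
    return mean, var


# Hypothetical two-dimensional features, two frames.
mean, var = global_mean_variance([[1.0, 2.0], [3.0, 6.0]])
print(mean, var)
```

Every state of the prototype HMM would then start from the same mean and variance, which is why no time-aligned transcriptions are needed for this kind of initialization.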
Training tools
- HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions
- HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, transcriptions without time alignment
- HEAdapt: performs adaptation on a limited set of data
- HSmooth: smooths a set of context-dependent models according to the context-independent counterpart
Training example (RefRec)
First pass:
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)
Second pass:
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest
Recognition tools
Grammar generation:
- HLStats: creates a bigram from training data
- HParse: parses a user-defined grammar to produce a lattice
Decoding:
- HVite: performs Viterbi decoding
Evaluation:
- HResults: evaluates recognition results
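Viterbi decoding, the algorithm named above, finds the single most likely state sequence for a sequence of observations. The sketch below shows the idea on a toy two-state model in the log domain. It is not HVite, which decodes over a word network with token passing; all names and numbers are hypothetical.

```python
import math


def viterbi(log_init, log_trans, log_obs):
    """Return (best log probability, best state path).

    log_init[s]: log probability of starting in state s
    log_trans[r][s]: log probability of moving from state r to state s
    log_obs[t][s]: log likelihood of frame t in state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(log_obs)):
        prev, score, pointers = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_obs[t][s])
        back.append(pointers)
    state = max(range(n_states), key=lambda s: score[s])
    best_score = score[state]
    path = [state]
    for pointers in reversed(back):     # follow the back-pointers to the start
        state = pointers[state]
        path.append(state)
    return best_score, path[::-1]


NEG_INF = -math.inf
LOG_INIT = [0.0, NEG_INF]                       # always start in state 0
LOG_TRANS = [[math.log(0.5), math.log(0.5)],    # state 0: stay or move on
             [NEG_INF, 0.0]]                    # state 1: stay
LOG_OBS = [[math.log(0.9), math.log(0.1)],
           [math.log(0.2), math.log(0.8)],
           [math.log(0.1), math.log(0.9)]]

best_score, best_path = viterbi(LOG_INIT, LOG_TRANS, LOG_OBS)
print(best_path, math.exp(best_score))
```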
Grammar definition (HParse)
[Figure: word network for the grammar below: noise (sil | fil | spk), a loop over the commands (move, top, bottom, delete, insert, end insert) each followed by noise, then quit and a final noise]
> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
($noise < $cmd $noise > quit $noise)
Notation:
- [] optional
- {} zero or more
- () block
- <> loop
- <<>> context-dependent loop
- | alternative
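One way to check what a grammar like this accepts is to draw random sentences from it, which is what HSGen does on the compiled network. The sketch below hard-codes the rules of grammar.bnf in Python instead of parsing the file. The function names and the 0.5 probabilities are assumptions, and the loop in angle brackets is treated as one or more repetitions.

```python
import random

DIRS = ["up", "down", "left", "right"]
ITEMS = ["char", "word", "line", "page"]
NOISE = ["sil", "fil", "spk"]


def random_command(rng):
    """Draw one $cmd: a move, delete, insert or end command."""
    kind = rng.choice(["mcmd", "dcmd", "icmd", "ecmd"])
    if kind == "mcmd":
        alt = rng.choice(["move", "top", "bottom"])
        return ["move", rng.choice(DIRS)] if alt == "move" else [alt]
    if kind == "dcmd":                  # delete [$item]
        return ["delete"] + ([rng.choice(ITEMS)] if rng.random() < 0.5 else [])
    if kind == "icmd":
        return ["insert"]
    return ["end"] + (["insert"] if rng.random() < 0.5 else [])  # end [insert]


def random_sentence(rng, p_continue=0.5):
    """Draw one sentence from ($noise < $cmd $noise > quit $noise)."""
    words = [rng.choice(NOISE)]
    while True:                         # loop body runs at least once
        words += random_command(rng) + [rng.choice(NOISE)]
        if rng.random() >= p_continue:
            break
    return words + ["quit", rng.choice(NOISE)]


rng = random.Random(0)                  # fixed seed for repeatable output
for _ in range(3):
    print(" ".join(random_sentence(rng)))
```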
Grammar parsing (HParse) and recognition (HVite)
Parse grammar:
> HParse grammar.bnf grammar.slf
Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav
Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis
Evaluation (HResults)
> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
Date: Thu Jan 18 16:17:53 2001
Ref nworkdir_traintestsetmlf
Rec nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------
N = total number, I = insertions, S = substitutions, D = deletions
correct: H = N - S - D
%correct: %Corr = H / N
accuracy: Acc = (H - I) / N = (N - S - D - I) / N
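The formulas above can be checked against the counts printed in the example output. The sketch below is plain arithmetic, not HResults, and the function name is hypothetical.

```python
def word_scores(n, d, s, i):
    """Return (H, %Corr, Acc) from total, deletions, substitutions, insertions."""
    h = n - s - d                       # correctly recognized words
    corr = 100.0 * h / n                # percent correct
    acc = 100.0 * (h - i) / n           # accuracy also penalizes insertions
    return h, corr, acc


# Counts from the WORD line: D=196, S=320, I=31, N=9718.
h, corr, acc = word_scores(9718, 196, 320, 31)
print(h, round(corr, 2), round(acc, 2))

# SENT line: 994 correct sentences out of 1342.
print(round(100.0 * 994 / 1342, 2))
```

With these counts H = 9202, %Corr = 94.69 and Acc = 94.37, matching the WORD line, and 994 / 1342 gives the 74.07 of the SENT line.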
ccopy2008 Giampiero Salvi 2747
Dictionary (HDMan)
WORD [OUTSYM] PRONPROB P1 P2 P3 P4
gt cat lexdic
forra f oe r a spsju S uh a sp
gt cat lex2dic
ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp
ccopy2008 Giampiero Salvi 2847
HMM definition files (HHEd)
~h hmm_name
<BEGINHMM>
<NUMSTATES> 5
<STATE> 2
<NUMMIXES> 2
<MIXTURE> 1 0.8
<MEAN> 4
0.1 0.0 0.7 0.3
<VARIANCE> 4
0.2 0.1 0.1 0.1
<MIXTURE> 2 0.2
<MEAN> 4
0.2 0.3 0.4 0.0
<VARIANCE> 4
0.1 0.1 0.1 0.2
<STATE> 3
~s state_name
<STATE> 4
<NUMMIXES> 2
<MIXTURE> 1 0.7
~m mix_name
<MIXTURE> 2 0.3
<MEAN> 4
~u mean_name
<VARIANCE> 4
~v variance_name
<TRANSP>
~t transition_name
<ENDHMM>
Macros used in the definition:
- HMM definition (~h)
- State definition (~s)
- Gaussian mixture component definition (~m)
- Mean vector definition (~u)
- Diagonal variance vector definition (~v)
- Transition matrix definition (~t)
[Figure: HMM with states 1 to 5]
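State 2 in the definition is a mixture of two Gaussians with diagonal covariance (weights 0.8 and 0.2). As a rough illustration of what such a state computes for a feature vector, the sketch below evaluates the log output density in plain Python. The function names and the observation vector are hypothetical, and this is not how HTK is implemented internally.

```python
import math

def diag_gauss_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def mixture_logpdf(x, mixture):
    """Log of sum_k w_k * N(x; mean_k, var_k), computed with log-sum-exp."""
    logs = [math.log(w) + diag_gauss_logpdf(x, m, v) for w, m, v in mixture]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

# State 2 from the definition: one (weight, mean, variance) triple per mixture.
state2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]

observation = [0.1, 0.0, 0.7, 0.3]   # hypothetical 4-dimensional feature vector
log_b = mixture_logpdf(observation, state2)
```

Working in the log domain avoids numerical underflow when many frames are multiplied together.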

- HSLab: graphical tool to label speech (use WaveSurfer instead)
- HList: gives information about audio and feature files
- HSGen: generates random sentences out of a regular grammar
Intermezzo: what do we know so far?
Training: audio → feature extraction → features → training → model set (other inputs: prototype HMM, dictionary, labels)
Recognition: audio → feature extraction → features → decoding → labels (other inputs: grammar, dictionary, model set)
Model initialization
The initialization procedure depends on the information available at that time.
- HCompV: computes the overall mean and variance.
  Input: a prototype HMM
- HInit: Viterbi segmentation + parameter estimation. For mixture distributions, it uses K-means.
  Input: a prototype HMM, time-aligned transcriptions
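The idea behind initializing from the overall mean and variance (often called a flat start) can be sketched as follows: pool all training frames, compute one mean and one variance per feature dimension, and copy them into every emitting state of the prototype. The helper names and the toy frames are hypothetical, and dividing by N rather than N-1 is an assumption of this sketch.

```python
def global_mean_variance(frames):
    """Per-dimension mean and variance over all feature frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var

def flat_start(num_emitting_states, frames):
    """Give every emitting state the same global mean and variance."""
    mean, var = global_mean_variance(frames)
    return [{"mean": list(mean), "variance": list(var)}
            for _ in range(num_emitting_states)]

# Hypothetical 2-dimensional frames pooled from all training files.
frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 8.0]]
states = flat_start(3, frames)
```

No transcriptions are needed for this step, which is why it only takes a prototype HMM as input.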
Training tools
- HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions
- HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, transcriptions without time alignment
- HEAdapt: performs adaptation on a limited set of data
- HSmooth: smooths a set of context-dependent models according to their context-independent counterparts
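Baum-Welch re-estimation builds on the forward (and backward) probabilities of the model. As a small illustration, the sketch below computes only the forward pass, that is, the total likelihood of an observation sequence summed over all state paths. It is a simplification: the toy model has two states and hand-picked numbers (all hypothetical), it works with plain probabilities rather than logs, and it ignores HTK's non-emitting entry and exit states.

```python
def forward_likelihood(init, trans, emit):
    """Total likelihood P(observations | model), summed over all state paths.

    init[s]     : P(start in state s)
    trans[r][s] : P(next state s | current state r)
    emit[t][s]  : likelihood of frame t in state s
    """
    n_states = len(init)
    alpha = [init[s] * emit[0][s] for s in range(n_states)]
    for t in range(1, len(emit)):
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[t][s]
            for s in range(n_states)
        ]
    return sum(alpha)

# Hypothetical left-to-right model with two states and three frames.
init = [1.0, 0.0]
trans = [[0.6, 0.4],
         [0.0, 1.0]]
emit = [[0.9, 0.1],
        [0.8, 0.2],
        [0.1, 0.9]]
likelihood = forward_likelihood(init, trans, emit)
```

Re-estimation would combine these forward values with the matching backward values to obtain state occupation probabilities, which is not shown here.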
Training example: RefRec
First pass:
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)
Second pass:
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest
Recognition tools
Grammar generation:
- HLStats: creates a bigram from training data
- HParse: parses a user-defined grammar to produce a lattice
Decoding:
- HVite: performs Viterbi decoding
Evaluation:
- HResults: evaluates recognition results
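Viterbi decoding finds the single most likely state sequence for a sequence of frames. The sketch below shows the basic recursion in the log domain on a toy two-state model. All names and numbers are hypothetical, and it is far simpler than a real decoder, which searches a word network rather than a single small HMM.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Log of a probability, with log(0) mapped to minus infinity."""
    return math.log(p) if p > 0 else NEG_INF

def viterbi(log_init, log_trans, log_emit):
    """Return (most likely state path, its log score).

    log_init[s]     : log P(start in state s)
    log_trans[r][s] : log P(next state s | current state r)
    log_emit[t][s]  : log likelihood of frame t in state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(log_emit)):
        prev, score, pointers = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[t][s])
        back.append(pointers)
    state = max(range(n_states), key=lambda s: score[s])
    best_score = score[state]
    path = [state]
    for pointers in reversed(back):   # follow back-pointers to the first frame
        state = pointers[state]
        path.append(state)
    return path[::-1], best_score

# Hypothetical left-to-right model with two states and three frames.
log_init = [safe_log(1.0), safe_log(0.0)]
log_trans = [[safe_log(0.6), safe_log(0.4)],
             [safe_log(0.0), safe_log(1.0)]]
log_emit = [[safe_log(0.9), safe_log(0.1)],
            [safe_log(0.8), safe_log(0.2)],
            [safe_log(0.1), safe_log(0.9)]]
path, best = viterbi(log_init, log_trans, log_emit)
```

For this toy model the best path stays in the first state for two frames and then moves to the second.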
Grammar definition (HParse)
> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
({$noise} < $cmd {$noise} > quit {$noise})
Notation:
- [] optional
- {} zero or more
- () block
- <> loop
- <<>> context-dependent loop
- | alternative
[Figure: word network generated from this grammar, built up over several slides: the move, delete and insert commands, the noise words (sil, fil, spk), and quit]
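Because the grammar is regular, one way to check which word sequences it allows is to translate the rules by hand into a regular expression. The sketch below does this in Python. It assumes that the noise words are optional repetitions and that the command loop runs at least once; the variable names and the `accepts` helper are hypothetical and not part of HTK.

```python
import re

DIR = r"(?:up|down|left|right)"
MCMD = rf"(?:move {DIR}|top|bottom)"
ITEM = r"(?:char|word|line|page)"
DCMD = rf"(?:delete(?: {ITEM})?)"
ICMD = r"(?:insert)"
ECMD = r"(?:end(?: insert)?)"
CMD = rf"(?:{MCMD}|{DCMD}|{ICMD}|{ECMD})"
NOISE = r"(?:sil|fil|spk)"

# Optional noise, one or more commands each followed by optional noise,
# then "quit" and optional trailing noise.
SENTENCE = re.compile(
    rf"(?:{NOISE} )*(?:{CMD} (?:{NOISE} )*)+quit(?: {NOISE})*"
)

def accepts(sentence):
    """True if the space-separated word string is allowed by the grammar."""
    return SENTENCE.fullmatch(sentence) is not None
```

For example, `accepts("sil move up delete word quit sil")` holds, while `accepts("move quit")` does not, because `move` must be followed by a direction.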
Grammar parsing (HParse) and recognition (HVite)
Parse grammar:
> HParse grammar.bnf grammar.slf
Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav
Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis
Evaluation (HResults)
> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
Date: Thu Jan 18 16:17:53 2001
Ref : nworkdir_traintestsetmlf
Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------
N = total number, I = insertions, S = substitutions, D = deletions
correct: H = N - S - D
%correct: %Corr = H/N
accuracy: Acc = (H - I)/N = (N - S - D - I)/N
(%Corr and Acc are reported as percentages.)
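The word-level figures in the HResults output can be reproduced from the counts in brackets. The short sketch below recomputes them in Python; `word_scores` is a hypothetical helper, not part of HTK.

```python
def word_scores(n, d, s, i):
    """Return (H, %Corr, Acc) from the total N, deletions, substitutions, insertions."""
    h = n - s - d                    # correctly recognized words
    corr = 100.0 * h / n             # percent correct ignores insertions
    acc = 100.0 * (h - i) / n        # accuracy also penalizes insertions
    return h, corr, acc

h, corr, acc = word_scores(n=9718, d=196, s=320, i=31)
print(f"H={h} %Corr={corr:.2f} Acc={acc:.2f}")
# prints: H=9202 %Corr=94.69 Acc=94.37
```

The same ratio gives the sentence score: 994 correct sentences out of 1342 is 74.07%.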
- Outline
- Introduction
- Data formats and manipulation
- Data visualization
- Training
- Recognition
-
ccopy2008 Giampiero Salvi 3147
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Mean vector definition (~u)Diagonal variance vector definition (~v)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3247
HMM definition files (HHEd)
~h hmm_nameltBEGINHMMgt
ltNUMSTATESgt 5ltSTATEgt 2
ltNUMMIXESgt 2ltMIXTUREgt 1 08
ltMEANgt 401 00 07 03
ltVARIANCEgt 402 01 01 01
ltMIXTUREgt 2 02ltMEANgt 4
02 03 04 00ltVARIANCEgt 4
01 01 01 02ltSTATEgt 3
~s state_nameltSTATEgt 4
ltNUMMIXESgt 2ltMIXTUREgt 1 07
~m mix_nameltMIXTUREgt 2 03
ltMEANgt 4~u mean_name
ltVARIANCEgt 4~v variance_name
ltTRANSPgt~t transition_name
ltENDHMMgt
Transition matrix definition (~t)
1
2 3 4
5
ccopy2008 Giampiero Salvi 3347
I HSLab graphical tool to label speech (use WaveSurferinstead)
I HList gives information about audio and feature files
I HSGen generates random sentences out of a regular grammar
ccopy2008 Giampiero Salvi 3447
Intermezzo what do we know so far
Training
audiofeatureextraction
prototype HMM
features
dictionary
labels
model settraining
Recognition
grammar
decodingaudiofeatureextraction
features
model set
labels
dictionary

Model initialization

The initialization procedure depends on the information available at that time.
- HCompV: computes the overall mean and variance.
  Input: a prototype HMM
- HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time-aligned transcriptions
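
The "overall mean and variance" computed for initialization can be illustrated with a short Python sketch. It assumes feature frames are plain lists of floats and uses the population variance (dividing by the number of frames); the function name is hypothetical and the sketch only mirrors the idea, not HCompV itself.

```python
def global_mean_and_variance(frames):
    """Per-dimension mean and variance over all feature frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var

# Three toy 2-dimensional frames standing in for real features.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
mean, var = global_mean_and_variance(frames)

# In a flat start, every state of the prototype would receive the same values.
flat_start_states = [(list(mean), list(var)) for _ in range(3)]
print(mean, var)
```

Because no alignment is needed, this kind of initialization works when only the audio and a prototype are available.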

Training tools

- HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions
- HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, timeless (unaligned) transcriptions
- HEAdapt: performs adaptation on a limited set of data
- HSmooth: smooths a set of context-dependent models according to their context-independent counterpart

Training example (RefRec)

First pass:
  prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)

Second pass:
  prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest
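
The "increase MC" step can be pictured with the sketch below. It assumes that MC stands for the number of mixture components and that a component is split by halving its weight and shifting the two copies of the mean by plus and minus 0.2 standard deviations; both the reading of "MC" and the perturbation size are assumptions here, and the function name is hypothetical.

```python
import math

def split_heaviest(components, perturbation=0.2):
    """Split the component with the largest weight into two.

    components: list of (weight, mean, var) with diagonal variances.
    Returns a new list with one more component than the input.
    """
    idx = max(range(len(components)), key=lambda i: components[i][0])
    weight, mean, var = components[idx]
    offset = [perturbation * math.sqrt(v) for v in var]
    plus = (weight / 2.0, [m + o for m, o in zip(mean, offset)], list(var))
    minus = (weight / 2.0, [m - o for m, o in zip(mean, offset)], list(var))
    return components[:idx] + [plus, minus] + components[idx + 1:]

# A single Gaussian becomes a two-component mixture.
single = [(1.0, [0.0, 1.0], [1.0, 4.0])]
doubled = split_heaviest(single)
print(doubled)
```

The split models start almost identical, which is why each increase in the pipeline is followed by another HERest pass to let the components diverge.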

Recognition tools

Grammar generation:
- HLStats: creates a bigram from training data
- HParse: parses a user-defined grammar to produce a lattice

Decoding:
- HVite: performs Viterbi decoding

Evaluation:
- HResults: evaluates recognition results
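
As an illustration of what "creates a bigram from training data" means, here is a minimal Python sketch that counts word pairs and turns them into maximum-likelihood probabilities. The sentence boundary markers `<s>` and `</s>` and the function names are assumptions for this example, and no smoothing or back-off is applied.

```python
from collections import Counter

def bigram_counts(sentences, start="<s>", end="</s>"):
    """Count adjacent word pairs, padding each sentence with boundary markers."""
    counts = Counter()
    for words in sentences:
        padded = [start] + list(words) + [end]
        counts.update(zip(padded, padded[1:]))
    return counts

def bigram_probability(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    total = sum(c for (p, _), c in counts.items() if p == prev)
    return counts[(prev, word)] / total if total else 0.0

# Toy training transcriptions.
training = [["move", "up"], ["move", "down"], ["delete", "word"], ["move", "up"]]
counts = bigram_counts(training)
p_up_given_move = bigram_probability(counts, "move", "up")
print(counts[("move", "up")], p_up_given_move)
```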

Grammar definition (HParse)

[Figure: word network for the command grammar. Noise models (sil, fil, spk) may occur at the start, after each command, and after quit; the commands are delete (char | word | line | page), insert, end insert, move (up | down | left | right), top and bottom. The following slides highlight the move, delete, insert and noise branches in turn.]

    > cat grammar.bnf
    $dir = up | down | left | right;
    $mcmd = move $dir | top | bottom;
    $item = char | word | line | page;
    $dcmd = delete [$item];
    $icmd = insert;
    $ecmd = end [insert];
    $cmd = $mcmd | $dcmd | $icmd | $ecmd;
    $noise = sil | fil | spk;
    ($noise < $cmd $noise > quit $noise)

Notation:
- [ ]: optional
- { }: zero or more
- ( ): block
- < >: loop
- << >>: context-dependent loop
- |: alternative
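
To see which commands the `$cmd` rule covers, the sketch below expands the `|` alternatives and `[ ]` optional parts of the grammar in Python. It ignores the noise models and the outer loop, and the helper names are hypothetical; it is only meant to mirror the rules, not to reproduce what HParse builds.

```python
from itertools import product

DIR = ["up", "down", "left", "right"]    # $dir
ITEM = ["char", "word", "line", "page"]  # $item

def optional(words):
    """[x] in the grammar: either nothing or one of the alternatives."""
    return [""] + list(words)

def sequence(*parts):
    """Concatenate one alternative from each part, dropping empty slots."""
    return [" ".join(w for w in combo if w) for combo in product(*parts)]

MCMD = sequence(["move"], DIR) + ["top", "bottom"]  # $mcmd
DCMD = sequence(["delete"], optional(ITEM))         # $dcmd
ICMD = ["insert"]                                   # $icmd
ECMD = sequence(["end"], optional(["insert"]))      # $ecmd
CMD = MCMD + DCMD + ICMD + ECMD                     # $cmd

for command in CMD:
    print(command)
```

Each utterance accepted by the grammar is then a sequence of such commands terminated by `quit`.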

Grammar parsing (HParse) and recognition (HVite)

Parse grammar:
    > HParse grammar.bnf grammar.slf

Run recognition on file(s):
    > HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav

Run recognition live:
    > HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis

Evaluation (HResults)

    > HResults -I reference.mlf word.lst recognized.mlf
    ====================== HTK Results Analysis =======================
      Date: Thu Jan 18 16:17:53 2001
      Ref : nworkdir_traintestsetmlf
      Rec : nresults_trainmono_32_2recmlf
    ------------------------ Overall Results --------------------------
    SENT: %Correct=74.07 [H=994, S=348, N=1342]
    WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
    -------------------------------------------------------------------

N = total number, I = insertions, S = substitutions, D = deletions
- correct: $H = N - S - D$
- % correct: $\mathrm{Corr} = \frac{H}{N}$
- accuracy: $\mathrm{Acc} = \frac{H - I}{N} = \frac{N - S - D - I}{N}$
(HResults reports Corr and Acc as percentages.)
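
The scoring formulas can be checked against the WORD line of the example output with a few lines of Python. The function name is hypothetical, and the figures are read as H=9202, D=196, S=320, I=31, N=9718, on the assumption that decimal points in the percentages were lost in extraction.

```python
def word_scores(n, substitutions, deletions, insertions):
    """Return (H, Corr, Acc), with Corr and Acc expressed in percent."""
    hits = n - substitutions - deletions      # H = N - S - D
    corr = 100.0 * hits / n                   # Corr = H / N
    acc = 100.0 * (hits - insertions) / n     # Acc = (H - I) / N
    return hits, corr, acc

hits, corr, acc = word_scores(n=9718, substitutions=320, deletions=196, insertions=31)
print(hits, round(corr, 2), round(acc, 2))
```

Accuracy is always at most Corr because insertions are penalized only in Acc.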
- Outline
- Introduction
- Data formats and manipulation
- Data visualization
- Training
- Recognition
-
Training example RefRec
first pass
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)
second pass
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest
Recognition tools
grammar generation
- HLStats: creates a bigram from training data
- HParse: parses a user-defined grammar to produce a lattice
decoding
- HVite: performs Viterbi decoding
evaluation
- HResults: evaluates recognition results
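As an illustration of what Viterbi decoding means, here is a minimal Python sketch on a toy two-state model. It is not HVite's implementation: the function names `viterbi` and `logs`, the states, and all probabilities are hypothetical values chosen only for this example.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best state path, its log score) for the observation sequence."""
    # delta[s]: best log score of any state path ending in s at the current frame
    delta = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backpointers = []
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: prev[r] + log_trans[r][s])
            delta[s] = prev[best_prev] + log_trans[best_prev][s] + log_emit[s][o]
            ptr[s] = best_prev
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda s: delta[s])
    score = delta[state]
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, score

def logs(table):
    """Convert a (possibly nested) dict of probabilities to log probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Toy model (assumed values): silence vs. speech, observed as low/high energy.
STATES = ["sil", "speech"]
START = logs({"sil": 0.8, "speech": 0.2})
TRANS = logs({"sil": {"sil": 0.7, "speech": 0.3},
              "speech": {"sil": 0.2, "speech": 0.8}})
EMIT = logs({"sil": {"low": 0.9, "high": 0.1},
             "speech": {"low": 0.2, "high": 0.8}})

path, score = viterbi(["low", "high", "high", "low"], STATES, START, TRANS, EMIT)
print(path)  # ['sil', 'speech', 'speech', 'sil']
```

Working in the log domain avoids numerical underflow on long observation sequences, which is why the scores are summed rather than multiplied.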
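Grammar definition (HParse)
[Figure: word network for the command grammar. A noise word (sil, fil or spk) is followed by a loop over the commands (move up/down/left/right, top, bottom, delete char/word/line/page, insert, end insert), each followed by a noise word; the network ends with quit and a final noise word.]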
Grammar definition (HParse)
> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
($noise < $cmd $noise > quit $noise)
- [] optional
- {} zero or more
- () block
- <> loop
- <<>> context-dependent loop
- | alternative
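To check which word sequences the grammar above accepts, its rules can be transcribed into a regular expression. The following Python sketch is not part of HTK: the names `GRAMMAR` and `accepts` are hypothetical. It assumes that `< >` means one or more repetitions, as in HParse, and that each `$noise` stands for exactly one noise word, as the grammar is written here.

```python
import re

# Sub-expressions mirror the variables of grammar.bnf (Python names are illustrative).
DIR = r"(?:up|down|left|right)"
MCMD = rf"(?:move {DIR}|top|bottom)"
ITEM = r"(?:char|word|line|page)"
DCMD = rf"(?:delete(?: {ITEM})?)"   # [ ] = optional
ICMD = r"insert"
ECMD = r"(?:end(?: insert)?)"       # [ ] = optional
CMD = rf"(?:{MCMD}|{DCMD}|{ICMD}|{ECMD})"
NOISE = r"(?:sil|fil|spk)"

# ( $noise < $cmd $noise > quit $noise ), with < > read as "one or more"
GRAMMAR = re.compile(rf"{NOISE}(?: {CMD} {NOISE})+ quit {NOISE}")

def accepts(sentence: str) -> bool:
    """Return True if the space-separated word sequence matches the grammar."""
    return GRAMMAR.fullmatch(sentence) is not None

print(accepts("sil move up sil delete word fil quit sil"))  # True
print(accepts("sil move sil quit sil"))                     # False: move needs a direction
```

A regular expression is enough here because the grammar has no recursion; HParse compiles the same rules into a word network instead.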
Grammar parsing (HParse) and recognition (HVite)
Parse grammar
> HParse grammar.bnf grammar.slf
Run recognition on file(s)
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf \
    -y lab dict.txt phones.lis audio_file.wav
Run recognition live
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf \
    -y lab dict.txt phones.lis
Evaluation (HResults)
> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------
N = total number, I = insertions, S = substitutions, D = deletions
correct: H = N - S - D
% correct: Corr = H/N
accuracy: Acc = (H - I)/N = (N - S - D - I)/N
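The definitions above can be checked against the numbers in the sample output. This is a small Python sketch, not HResults itself; the function name `word_scores` is hypothetical, and the results are scaled to percentages to match the printed report.

```python
def word_scores(H, D, S, I, N):
    """Return (Corr, Acc) in percent from the word-level counts."""
    if H != N - S - D:
        raise ValueError("inconsistent counts: H must equal N - S - D")
    corr = 100.0 * H / N        # Corr = H/N
    acc = 100.0 * (H - I) / N   # Acc = (H - I)/N = (N - S - D - I)/N
    return corr, acc

# Word-level counts from the WORD line of the report.
corr, acc = word_scores(H=9202, D=196, S=320, I=31, N=9718)
print(f"WORD: %Corr={corr:.2f}, Acc={acc:.2f}")  # WORD: %Corr=94.69, Acc=94.37

# Sentence-level score from the SENT line: H/N.
sent_correct = 100.0 * 994 / 1342
print(f"SENT: %Correct={sent_correct:.2f}")  # SENT: %Correct=74.07
```

Accuracy is lower than Corr because insertions are penalised: a recogniser that inserts many spurious words can still have a high Corr but a low Acc.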