HTK
HTK steps for adaptation (very brief idea)
Introduction to the HMM Tool Kit (HTK)
Benefits of HTK
- Widely recognized, state-of-the-art speech recognition toolkit
- Supports a variety of input formats
- Supports different feature types
- Supports almost all common speech recognition techniques
Detailed features of HTK
- Input formats: HTK supports a variety of formats, e.g. pcm, wav, …, ALIEN (unknown), etc.
- Feature extraction: MFCC, filterbank, PLP, LPC, …, etc.
- Very flexible HMM definition
- Training
  - Viterbi (segmentation)
  - Forward/Backward (Baum-Welch)
  - Single model re-estimation (change feature)
Detailed features of HTK
- HMM system refinement
  - Context-dependent models
  - Parameter tying/clustering
  - Regression class tree (MLLR)
- Language
  - Word grammar and network
  - Bigram language model
- Decoding
  - Evaluate recognition results
  - Forced alignment
  - N-best lists/lattices
Detailed features of HTK
- HMM adaptation
  - MLLR/regression tree
  - MAP
  - Mean/variance
HTK procedures
- Data/setting preparation
  - Define acoustic units (phone table)
  - Define dictionary (word)
  - Define grammar/network
  - Collect speech database
  - Generate transcription
- Feature extraction
  - Set configuration file for MFCC feature extraction
  - Prepare script files (corpus file)
- Define HMM structure (prototype)
- Training HMM models
  - Prepare script files (corpus file)
  - Set configuration file for training, recognition, etc.
  - Flat start (uniform segmentation)
  - Viterbi search (forced alignment: segmentation)
- Recognition/performance evaluation
  - Viterbi search
HMM/Data Setting
- Phone table
- Dictionary
- Grammar rule
- Define HMMs
Define Acoustic Units (Phone table)
- Using our traditional 100 right-context-dependent (RCD) initials and 40 context-independent (CI) finals
……
Dictionary (411 Syllable table)
- Using our traditional 411 syllables (plus silence)
- Each entry has the form: word phones_list
……
Task Grammar Rule
- Task: phone dialing
  - Dial three two six five four
  - Dial nine zero four one oh nine
  - Phone Woodland
  - Call Steve Young
Generate Task Grammar Rule Network
- Task: free-syllable decoding
- Define the gram file:
  $syllable = zhi | chi | ri | a | ……;
  ( SENT-START < $syllable [sil] > SENT-END )
- Parse gram with HParse to produce wdnet
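The grammar text above could be generated from a syllable list rather than typed by hand. The sketch below is only an illustration: the function name `build_gram` and the four-syllable demo list are assumptions, and a real list would hold all 411 syllables.

```python
def build_gram(syllables, silence="sil"):
    """Return grammar text for free-syllable decoding, in the form shown on the slide."""
    if not syllables:
        raise ValueError("need at least one syllable")
    alternatives = " | ".join(syllables)
    return (
        f"$syllable = {alternatives};\n"
        f"( SENT-START < $syllable [{silence}] > SENT-END )\n"
    )


# Demo with a shortened, illustrative syllable list.
print(build_gram(["zhi", "chi", "ri", "a"]), end="")
```

The returned text would be saved as the gram file and then passed to HParse to build wdnet.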
Free-Syllable Decoding "wdnet"
……
Syntax:
# I – nodes
# J – arcs
N=? L=?
# define nodes
I=x W=www
…
# define arcs
J=x S=y E=z
…..
Database Preparation
- Collect speech data
- MLF (Master Label File)
  - The content is at the word level
  - Transcribe the collected speech database
- Corpus files (training/test set)
  - Script files
  - Change the MLF into phone-level labeling
- Feature extraction (MFCC)
Word/Phone-Level Transcriptions
Word Master Label File
#!MLF!#
"*/4_t0062_t0062331.lab"
tai
yin
.
"*/4_t0062_t0062340.lab"
.
.
Phone Master Label File
#!MLF!#
"*/4_t0062_t0062331.lab"
siltaiNULLyinsil
.
"*/4_t0062_t0062340.lab"
.
.
Use HLEd to transform the word-level MLF into the phone-level MLF.
EX
IS sil sil
DE sp
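The three edit commands above can be read as follows: EX expands each word into its dictionary pronunciation, IS inserts sil at the start and end of the utterance, and DE deletes every sp label. The Python sketch below mimics that behaviour on a plain list of labels. It is not HLEd itself, and the demo dictionary with its phone names is hypothetical (it is not the RCD/CI phone set used in these slides).

```python
def expand_labels(words, dictionary, start="sil", end="sil", delete=("sp",)):
    """Mimic the edit script: EX, then IS start end, then DE <labels>."""
    # EX: replace each word by the phones of its dictionary pronunciation.
    phones = []
    for word in words:
        phones.extend(dictionary[word])
    # IS sil sil: insert a label at the start and at the end of the utterance.
    phones = [start] + phones + [end]
    # DE sp: delete every occurrence of the listed labels.
    return [p for p in phones if p not in delete]


# Hypothetical dictionary: every pronunciation ends with a short pause "sp".
demo_dict = {
    "tai": ["t", "ai", "sp"],
    "yin": ["y", "in", "sp"],
}
print(expand_labels(["tai", "yin"], demo_dict))
```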
Feature Extraction
- HCopy: data copy (with format conversion)
Script files
- codetr.scp
- Each line has the form: source destination
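A script file in this "source destination" form is easy to generate. The following is a minimal sketch; the directory names, the `.mfc` extension, and the function name `make_scp_lines` are assumptions made for illustration, not values taken from the slides.

```python
from pathlib import PurePosixPath


def make_scp_lines(wav_paths, feature_dir, ext=".mfc"):
    """Return one 'source destination' line per waveform file."""
    lines = []
    for wav in wav_paths:
        stem = PurePosixPath(wav).stem              # file name without extension
        target = PurePosixPath(feature_dir) / (stem + ext)
        lines.append(f"{wav} {target}")
    return lines


# Hypothetical file names, for demonstration only.
for line in make_scp_lines(["wav/a01.wav", "wav/a02.wav"], "mfcc"):
    print(line)
```

The lines would be written to codetr.scp and passed to the feature extraction tool with -S.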
Feature Extraction Configuration
# byte order
NATURALREADORDER=TRUE
NATURALWRITEORDER=TRUE
# Waveform parameters
SOURCEFORMAT=ALIEN
HEADERSIZE=256
SOURCERATE=1250.0
# Coding parameters
TARGETKIND=MFCC_E
TARGETRATE=100000.0
SAVECOMPRESSED=F
SAVEWITHCRC=T
WINDOWSIZE=320000.0
# ZMEANSOURCE=T
USEHAMMING=T
PREEMCOEF=0.97
NUMCHANS=20
USEPOWER=F
# normalize the dynamic range of the MFCCs
CEPLIFTER=22
LOFREQ=0
HIFREQ=4000
NUMCEPS=12
ENORMALISE=T
DELTAWINDOW=2
ACCWINDOW=2
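HTK expresses SOURCERATE, TARGETRATE and WINDOWSIZE in units of 100 ns, which makes the values above hard to read at a glance. The sketch below converts them to more familiar quantities; the function name `frame_parameters` and the returned keys are illustrative choices.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # one HTK time unit is 100 ns


def frame_parameters(source_rate, target_rate, window_size):
    """Convert HTK 100 ns time values into Hz, milliseconds and samples."""
    return {
        "sample_rate_hz": HTK_UNITS_PER_SECOND / source_rate,
        "frame_shift_ms": target_rate / 10_000,
        "window_ms": window_size / 10_000,
        "frame_shift_samples": target_rate / source_rate,
        "window_samples": window_size / source_rate,
    }


# Values from the configuration above.
params = frame_parameters(source_rate=1250.0, target_rate=100000.0, window_size=320000.0)
for name, value in params.items():
    print(f"{name}: {value}")
```

With these settings the waveform is sampled at 8 kHz, so HIFREQ=4000 sits exactly at the Nyquist frequency, and each 32 ms window advances by 10 ms.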
HMM Configuration
- Config file (command level)
  - Command -C config_file
- User defaults
  - > export HCONFIG=my_HTK_config
- Built-in defaults
- See Chapter 18 of the HTK manual
Define HMM Structure
HMM Prototype Definition
~o <VecSize> 39 <MFCC_Z_E_D_A>
~h "proto"
<BeginHMM>
<NumStates> 5
<State> 2 <NumMixes> 4
<Mixture> 1 0.25
<Mean> 39
……
<Variance> 39
……
<TransP> 5
0.0 1.0 0.0 0.0 0.0
0.0 0.5 0.5 0.0 0.0
0.0 0.0 0.5 0.5 0.0
0.0 0.0 0.0 0.5 0.5
0.0 0.0 0.0 0.0 0.0
<EndHMM>
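Writing a prototype like this by hand is tedious for 39 dimensions and several mixtures, so it is often generated. The sketch below produces text in the same layout as the prototype above. The zero means and unit variances are placeholder values chosen here as an assumption (the slide elides the actual numbers), and the function names are illustrative.

```python
def transition_matrix(num_states):
    """Left-to-right matrix: entry state jumps to state 2, emitting states
    stay or advance with probability 0.5, exit state has no transitions."""
    rows = []
    for i in range(num_states):
        row = [0.0] * num_states
        if i == 0:
            row[1] = 1.0
        elif i < num_states - 1:
            row[i] = 0.5
            row[i + 1] = 0.5
        rows.append(row)
    return rows


def make_proto(vec_size=39, num_states=5, num_mixes=4,
               kind="MFCC_Z_E_D_A", name="proto"):
    """Return prototype HMM text with equal mixture weights."""
    zeros = " ".join(["0.0"] * vec_size)   # placeholder means (assumption)
    ones = " ".join(["1.0"] * vec_size)    # placeholder variances (assumption)
    lines = [
        f"~o <VecSize> {vec_size} <{kind}>",
        f'~h "{name}"',
        "<BeginHMM>",
        f"<NumStates> {num_states}",
    ]
    for state in range(2, num_states):     # emitting states only
        lines.append(f"<State> {state} <NumMixes> {num_mixes}")
        for mix in range(1, num_mixes + 1):
            lines.append(f"<Mixture> {mix} {1.0 / num_mixes}")
            lines.append(f"<Mean> {vec_size}")
            lines.append(zeros)
            lines.append(f"<Variance> {vec_size}")
            lines.append(ones)
    lines.append(f"<TransP> {num_states}")
    for row in transition_matrix(num_states):
        lines.append(" ".join(f"{p:.1f}" for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"


print(make_proto())
```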
Create HMM Prototypes
Training Procedure
- Model initialization
  - Flat start (unknown segmentation → uniform segmentation)
  - Viterbi search (given segmentation)
  - Forward/backward (word level only)
- Model refinement
  - Mixture splitting
Configuration file for Training/Test
# byte order
#BYTEORDER=VAX
NATURALREADORDER=TRUE
NATURALWRITEORDER=TRUE
# MFCC parameters
SOURCEFORMAT=HTK
SOURCERATE=100000.0
TARGETKIND=MFCC_E_D_A_Z
TARGETRATE=100000.0
DELTAWINDOW=2
ACCWINDOW=2
# new variable can replace the varFloor
VARFLOORPERCENTILE = 20
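The vector size of 39 used in the HMM prototype follows from this target kind: 12 cepstra plus energy gives 13 static coefficients, and the delta (_D) and acceleration (_A) qualifiers each add another 13. The sketch below works through that arithmetic. It only handles the _E, _0, _D and _A qualifiers, which is a simplification, and the function name is illustrative.

```python
def vector_size(num_ceps, target_kind):
    """Feature dimension implied by NUMCEPS and a TARGETKIND such as MFCC_E_D_A_Z."""
    qualifiers = set(target_kind.split("_")[1:])
    static = num_ceps
    if "E" in qualifiers:      # log energy appended
        static += 1
    if "0" in qualifiers:      # zeroth cepstral coefficient appended
        static += 1
    streams = 1                # static coefficients
    if "D" in qualifiers:      # delta coefficients
        streams += 1
    if "A" in qualifiers:      # acceleration coefficients
        streams += 1
    return static * streams


print(vector_size(12, "MFCC_E"))          # features as stored after extraction
print(vector_size(12, "MFCC_E_D_A_Z"))    # features as seen in training
```

The _Z qualifier (cepstral mean removal) changes the values but not the dimension, so it does not appear in the calculation.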
Training Corpus
Mat4500_train_phones.mlf
……
……
Mat4500_train.scp
Flat start
Viterbi search
Forward/Backward
Initialize HMMs from Flat Start
Initialize HMMs from Viterbi Search
Utterance Segmentation
- *.mlf mat4500_train.mlf (phone-level with segmentation information)
…
Silence and short pause model
- sp shares the middle state of the silence model
- sil.hed:
AT 2 4 0.2 {sil.transP}
AT 4 2 0.2 {sil.transP}
AT 1 3 0.3 {sp.transP}
TI silst {sil.state[3],sp.state[2]}
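As far as I understand the AT command, it adds a transition from state i to state j with the given probability and rescales the other transitions out of state i so the row still sums to 1. The sketch below illustrates that reading on a plain Python matrix. The starting sil matrix is hypothetical, and the function is an illustration, not HHEd's implementation.

```python
def add_transition(trans, src, dst, prob):
    """Return a copy of `trans` with a src->dst transition of probability `prob`.

    States are 1-based, as in the edit script. The remaining transitions out
    of `src` are rescaled so that the row still sums to 1.
    """
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must be strictly between 0 and 1")
    new = [list(row) for row in trans]
    row = new[src - 1]
    row[dst - 1] = 0.0                     # discard any old src->dst mass
    remaining = sum(row)
    if remaining == 0.0:
        raise ValueError("no other outgoing transitions to rescale")
    scale = (1.0 - prob) / remaining
    new[src - 1] = [p * scale for p in row]
    new[src - 1][dst - 1] = prob
    return new


# Hypothetical 5-state sil transition matrix before editing.
sil = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
sil = add_transition(sil, 2, 4, 0.2)   # AT 2 4 0.2 {sil.transP}
sil = add_transition(sil, 4, 2, 0.2)   # AT 4 2 0.2 {sil.transP}
for row in sil:
    print(" ".join(f"{p:.2f}" for p in row))
```

The two extra transitions let the silence model skip forward from state 2 to 4 and jump back from state 4 to 2.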
Mixture Splitting
Mixture Splitting Script
MU2.hed
……
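The content of MU2.hed is elided in the slides. As a rough illustration of what a single mixture split does, the sketch below clones the component with the largest weight, halves the weight, and moves the two means apart by 0.2 standard deviations. That is my understanding of the usual HTK behaviour and should be treated as an assumption; the data layout and function name are also illustrative.

```python
import math


def split_heaviest(mixtures, perturb=0.2):
    """Split the heaviest component of a diagonal-covariance mixture.

    `mixtures` is a list of (weight, mean, variance) tuples, where mean and
    variance are lists of floats of equal length.
    """
    idx = max(range(len(mixtures)), key=lambda i: mixtures[i][0])
    weight, mean, var = mixtures[idx]
    offset = [perturb * math.sqrt(v) for v in var]
    up = (weight / 2, [m + o for m, o in zip(mean, offset)], list(var))
    down = (weight / 2, [m - o for m, o in zip(mean, offset)], list(var))
    return mixtures[:idx] + [up, down] + mixtures[idx + 1:]


# One Gaussian becomes two; re-estimation would then follow.
single = [(1.0, [0.0, 1.0], [4.0, 1.0])]
for weight, mean, var in split_heaviest(single):
    print(weight, mean, var)
```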
Recognition/Evaluation Procedure
- Recognition
- Evaluation
Test Corpus
- Mat4500_test.mlf
……
- Mat4500_test.scp
Two Types of Result Formats
Confusion Matrix Analysis
Forced Alignment
- Viterbi decoding
- Run HVite with option -a
- You can get some statistics of the HMM segmentation
- Useful for determining the number of mixtures
Speaker Adaptation: MLLR, MAP
- MLLR
  - In the training phase, generate the state occupation statistics
    % HERest -s
  - HHEd
    RN "models"   // rename hmmid
    LS "stats"    // load state occupation statistics
    RC 32 "rtree" // regression classes = 32
    or RC 32 "rtree" {sil.state[2-4].mix}
- Forced alignment of the adaptation data
  % HVite … -a … -I adapWords.mlf -m ….
- Find the global MLLR transform
  % HEAdapt -C … -g … -K global.tmf … -I adapPhone.mlf ….
  *.tmf: transform model file
- Find the MLLR regression tree transforms
  % HEAdapt -C … -J global.tmf -K rc.tmf … -I adapPhone.mlf …
- Recognition
  % HVite … -J rc.tmf ….
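The transforms estimated in these steps are applied to the Gaussian means. In the standard MLLR formulation each mean is extended with a leading 1 and multiplied by an n x (n+1) matrix W, so that the first column acts as a bias and the rest as a linear transform. The sketch below shows only that application step with made-up numbers; estimating W from adaptation data is what HEAdapt does and is not attempted here.

```python
def apply_mllr_mean(W, mean):
    """Return the adapted mean W * [1, mean], with W of shape n x (n+1)."""
    if any(len(row) != len(mean) + 1 for row in W):
        raise ValueError("each row of W must have len(mean) + 1 entries")
    extended = [1.0] + list(mean)          # extended mean vector
    return [sum(w * x for w, x in zip(row, extended)) for row in W]


# Hypothetical 2-dimensional example: bias column first, then the 2x2 matrix.
W = [
    [1.0, 2.0, 0.0],
    [-1.0, 0.0, 1.0],
]
print(apply_mllr_mean(W, [3.0, 4.0]))
```

With a regression class tree, each Gaussian would use the W of the regression class it belongs to; with a global transform, all Gaussians share one W.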
MAP adaptation
% HEAdapt -C … -j 0.9 … -k … -I adapPhone.mlf …
-j : weight
-k : use MLLR before MAP
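For intuition, the textbook MAP update of a Gaussian mean interpolates between the prior (speaker-independent) mean and the mean of the adaptation data, weighted by how much data the Gaussian has seen. The sketch below implements that formula. The parameter `tau` is the usual prior weight in that formula; how it relates to the value passed with -j is not stated in the slides, so no mapping is assumed here.

```python
def map_mean(prior_mean, adapt_mean, occupancy, tau):
    """MAP estimate: (occupancy * adapt_mean + tau * prior_mean) / (occupancy + tau)."""
    total = occupancy + tau
    if total <= 0.0:
        raise ValueError("occupancy + tau must be positive")
    return [(occupancy * a + tau * p) / total
            for p, a in zip(prior_mean, adapt_mean)]


# With little data the estimate stays near the prior; with more it moves to the data.
print(map_mean([0.0], [10.0], occupancy=1.0, tau=9.0))
print(map_mean([0.0], [10.0], occupancy=90.0, tau=10.0))
```

When -k is used, the prior mean in this formula would be the MLLR-transformed mean rather than the original one.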
Further topics
- Model/state tying (HMM definition)
- Context-dependent models
- Fast training/search (beam search)
- Insertion/deletion problem
  - Duration constraint
  - Word transition penalty
- Word lattice output
Detailed options for the HTK commands
- HCompV
  - Typical arguments:
    HCompV -C xxx -f 0.01 -m -S *.scp -M output_dir hmm
  - -m : update the means
  - -f f : set varFloor to f * global variance
  - In the hmm macro file:
    ~o …
    ~v "varFloor1"
    <Variance> 38
    ………………..
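The -f option above sets the variance floor to a fraction of the global variance. The following sketch shows that computation on a small list of feature vectors; it is an illustration of the arithmetic only, and the function name and toy data are assumptions.

```python
def variance_floor(frames, f=0.01):
    """Return f times the per-dimension global variance of `frames`."""
    n = len(frames)
    if n == 0:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    means = [sum(x[d] for x in frames) / n for d in range(dim)]
    variances = [sum((x[d] - means[d]) ** 2 for x in frames) / n
                 for d in range(dim)]
    return [f * v for v in variances]


# Toy 2-dimensional data; real input would be the training feature vectors.
frames = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
print(variance_floor(frames, f=0.01))
```

During re-estimation, any variance that falls below its floor would be reset to the floor value.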
Detailed options for the HTK commands
- HERest
  - Typical arguments:
    HERest -C xxx -I *.mlf -t 250.0 150.0 1000.0 -S *.scp -H hmm_macros -H hmm_defs -M output_dir hmmlist
  - -t f [i l] : set the pruning threshold to f; if pruning fails, retry with f ← f+i until f = l
  - -T : tracing option, an octal number that is command dependent; 00020 shows occupation counts
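To make the -t f i l schedule concrete, the sketch below lists the thresholds that would be tried in turn for the typical arguments shown above. The function name is illustrative, and the sketch only enumerates the values; the retry on a pruning error is performed by the tool itself.

```python
def pruning_thresholds(start, increment=None, limit=None):
    """Return the sequence of pruning thresholds: start, start+i, ... up to limit."""
    if increment is None or limit is None:
        return [start]                     # fixed threshold, no retries
    if increment <= 0:
        raise ValueError("increment must be positive")
    thresholds = []
    value = start
    while value <= limit:
        thresholds.append(value)
        value += increment
    return thresholds


# -t 250.0 150.0 1000.0
print(pruning_thresholds(250.0, 150.0, 1000.0))
```

A tight starting beam keeps training fast, and the wider beams are only used for utterances that fail with the tighter ones.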
Detailed options for the HTK commands
- HVite
  - Typical arguments:
    HVite -H hmm_macros -H hmm_defs -S *.scp -i output_mlf -w wdnet -p 0.0 -s 5.0 -t 250 dict tiedlist
  - -t f [i l] : set the pruning threshold to f; if pruning fails, retry with f ← f+i until f = l
  - -m : show model boundaries
  - -a : forced alignment, with -I input.mlf
  - -p, -s : word insertion penalty, grammar scale factor
Detailed options for the HTK commands
- HResults
  - Typical arguments:
    HResults -I *.mlf hmmlist answer.mlf
  - -n : use NIST
  - -e s t : label t is made equivalent to s
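The evaluation step reports two headline figures, percent correct and accuracy, computed from the number of reference labels N and the deletion, substitution and insertion counts. The sketch below reproduces those two formulas; the function name and the example counts are made up for illustration.

```python
def correct_and_accuracy(n, deletions, substitutions, insertions):
    """Return (%Correct, %Accuracy) from label counts.

    %Correct  = (N - D - S) / N * 100
    %Accuracy = (N - D - S - I) / N * 100
    """
    if n <= 0:
        raise ValueError("n must be positive")
    hits = n - deletions - substitutions
    return 100.0 * hits / n, 100.0 * (hits - insertions) / n


# Hypothetical counts for a small test set.
corr, acc = correct_and_accuracy(n=100, deletions=5, substitutions=10, insertions=5)
print(f"Correct = {corr:.2f}%, Accuracy = {acc:.2f}%")
```

Accuracy penalizes insertions while percent correct does not, which is why the word insertion penalty mainly affects the accuracy figure.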
Detailed options for the HTK commands
- HInit
  - Typical arguments:
    HInit -S *.scp -M hmm_macro -H hmm_defs model
- HRest
  - Typical arguments:
    HRest -S *.scp -M hmm_macro -H hmm_defs model
- HSLab
  - Use WaveSurfer.