Introduction of HMM Tool Kit (HTK)

Upload: ashutosh-verma

Post on 26-Oct-2014


DESCRIPTION

HTK steps for adaptation (very brief idea)


Benefits of HTK

- World-recognized, state-of-the-art speech recognition system
- Supports a variety of input formats
- Supports different features
- Supports almost all common speech recognition technologies

Detail features of HTK

- Supports a variety of input formats, e.g. PCM, WAV, ..., ALIEN (unknown), etc.
- Feature extraction: MFCC, filterbank, PLP, LPC, ..., etc.
- Very flexible HMM definition
- Training
  - Viterbi (segmentation)
  - Forward/Backward (Baum-Welch)
  - Single-model re-estimation (change feature)

Detail features of HTK

- HMM system refinement
  - Context-dependent model
  - Parameter tying/clustering
  - Regression class tree (MLLR)
- Language
  - Word grammar and network
  - Bigram language model
- Decoding
  - Evaluate recognition results
  - Forced alignment
  - N-best lists/lattices

Detail features of HTK

- HMM adaptation
  - MLLR/Regression tree
  - MAP
  - Mean/variance

HTK procedures

- Data/Setting preparation
  - Define acoustic units (phone table)
  - Define dictionary (word)
  - Define grammar/network
  - Collect speech database
  - Generate transcription
- Feature extraction
  - Set configuration file for MFCC feature extraction
  - Prepare script files (corpus file)
- Define HMM structure (prototype)
- Training HMM models
  - Prepare script files (corpus file)
  - Set configuration file for training, recognition, etc.
  - Flat start (uniform segmentation)
  - Viterbi search (forced alignment: segmentation)
- Recognition/Performance evaluation
  - Viterbi search

HMM/Data Setting

- Phone table
- Dictionary
- Grammar rule
- Define HMMs

Define Acoustic Units (Phone table)

Using our traditional units:
- 100 RCD initials
- 40 CI finals

……

Dictionary (411 Syllable table)

Using our traditional 411 syllables (plus silence)

Entry format: word phones_list

……

Task Grammar Rule

Task: Phone Dialing

- Dial three two six five four
- Dial nine zero four one oh nine
- Phone Woodland
- Call Steve Young

Generate Task Grammar Rule Network

Task: free-syllable decoding

Define the gram file:

$syllable=zhi | chi | ri | a | ……;

( SENT-START < $syllable [sil] > SENT-END )

Parse gram with HParse to produce wdnet (HParse gram wdnet)
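Writing the gram file by hand for 411 syllables is tedious, so it can be generated. The following is a minimal sketch, assuming the syllable inventory is available as a Python list; the function name `build_free_syllable_grammar` is made up for illustration, and only the first four syllables come from the slide (the rest are elided there).

```python
def build_free_syllable_grammar(syllables):
    """Return grammar text for a free-syllable loop in HParse notation.

    The layout follows the gram file on the slide: one variable holding
    all syllable alternatives, then a loop with an optional sil.
    """
    if not syllables:
        raise ValueError("need at least one syllable")
    alternatives = " | ".join(syllables)
    lines = [
        f"$syllable = {alternatives};",
        "( SENT-START < $syllable [sil] > SENT-END )",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative subset only; the full syllable list is not given in the slides.
    print(build_free_syllable_grammar(["zhi", "chi", "ri", "a"]), end="")
```

The resulting text can be saved as gram and passed to HParse as shown on the slide.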

Free-Syllable Decoding "wdnet"

……

Syntax:

# I - nodes
# J - arcs
N=? L=?
# define nodes
I=x W=www
…
# define arcs
J=x S=y E=z
…

Database Preparation

- Collect speech data
- MLF (Master Label File)
  - The content is at word level
  - Transcribe the collected speech database
- Corpus files (training/test set)
  - Script files
- Change the MLF into phone-level labeling
- Feature extraction (MFCC)

Word/Phone-Level Transcriptions

Word Master Label File:

#!MLF!#
"*/4_t0062_t0062331.lab"
tai
yin
.
"*/4_t0062_t0062340.lab"
.
.

Phone Master Label File:

#!MLF!#
"*/4_t0062_t0062331.lab"
sil
tai
NULL
yin
sil
.
"*/4_t0062_t0062340.lab"
.
.

Use HLEd to transform the word-level MLF into the phone-level MLF.

HLEd edit script:

EX
IS sil sil
DE sp
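To make the effect of these three edit commands concrete, here is a small Python simulation. It is only a sketch of what the commands do to one utterance, not HLEd itself: EX expands each word through the dictionary, IS sil sil inserts sil at the start and end, and DE sp deletes every sp label. The dictionary entries below are hypothetical and not taken from the slides.

```python
def expand_to_phones(words, dictionary):
    """Mimic EX, IS sil sil and DE sp on a single word-level transcription."""
    phones = []
    for word in words:
        # EX: replace each word by its pronunciation from the dictionary
        phones.extend(dictionary[word])
    # IS sil sil: insert sil at the start and at the end of the utterance
    phones = ["sil"] + phones + ["sil"]
    # DE sp: delete all sp labels
    return [p for p in phones if p != "sp"]


if __name__ == "__main__":
    # Hypothetical pronunciations, for illustration only.
    toy_dict = {
        "tai": ["t", "ai", "sp"],
        "yin": ["NULL", "in", "sp"],
    }
    print(expand_to_phones(["tai", "yin"], toy_dict))
```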

Feature Extraction

HCopy: data copy (with format conversion)

Script files

codetr.scp

Each line: source destination
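A script file of this shape is easy to generate. The sketch below assumes one "source destination" pair per line, as on the slide; the directory names, the `.mfc` extension and the function name `make_scp_lines` are illustrative assumptions.

```python
from pathlib import PurePosixPath


def make_scp_lines(wav_paths, feature_dir, feature_ext=".mfc"):
    """Build 'source destination' lines for a coding script file."""
    lines = []
    for wav in wav_paths:
        src = PurePosixPath(wav)
        # Keep the base name, swap the directory and the extension.
        dst = PurePosixPath(feature_dir) / (src.stem + feature_ext)
        lines.append(f"{src} {dst}")
    return lines


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for line in make_scp_lines(["wav/utt001.wav", "wav/utt002.wav"], "mfcc"):
        print(line)
```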

Feature Extraction Configuration

# byte order
NATURALREADORDER=TRUE
NATURALWRITEORDER=TRUE

# Waveform parameters
SOURCEFORMAT=ALIEN
HEADERSIZE=256
SOURCERATE=1250.0

# Coding parameters
TARGETKIND=MFCC_E
TARGETRATE=100000.0
SAVECOMPRESSED=F
SAVEWITHCRC=T
WINDOWSIZE=320000.0
# ZMEANSOURCE=T
USEHAMMING=T
PREEMCOEF=0.97
NUMCHANS=20
USEPOWER=F
# normalize the dynamic range of MFCC
CEPLIFTER=22
LOFREQ=0
HIFREQ=4000
NUMCEPS=12
ENORMALISE=T
DELTAWINDOW=2
ACCWINDOW=2
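HTK expresses SOURCERATE, TARGETRATE and WINDOWSIZE in units of 100 ns, which makes the values above hard to read at a glance. The following sketch converts them to more familiar quantities; the function name and the dictionary keys are made up for illustration.

```python
HTK_TIME_UNIT_S = 100e-9  # HTK time values are given in units of 100 ns


def frame_parameters(source_rate, target_rate, window_size):
    """Convert HTK time settings to Hz, milliseconds and sample counts."""
    sample_period_s = source_rate * HTK_TIME_UNIT_S
    return {
        "sample_rate_hz": round(1.0 / sample_period_s),
        "frame_shift_ms": target_rate * HTK_TIME_UNIT_S * 1e3,
        "window_ms": window_size * HTK_TIME_UNIT_S * 1e3,
        "shift_samples": round(target_rate / source_rate),
        "window_samples": round(window_size / source_rate),
    }


if __name__ == "__main__":
    # Values taken from the configuration on the slide.
    for key, value in frame_parameters(1250.0, 100000.0, 320000.0).items():
        print(key, value)
```

For the slide's settings this corresponds to 8 kHz audio, a 10 ms frame shift (80 samples) and a 32 ms analysis window (256 samples), which is consistent with HIFREQ=4000 being the Nyquist frequency.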

HMM Configuration

- Config file (command level)
  Command -C config_file
- User defaults
  > export HCONFIG=my_HTK_config
- Built-in defaults

Ref: Chapter 18 of the HTK manual


Define HMM Structure


HMM Prototype Definition

~o <VecSize> 39 <MFCC_Z_E_D_A>

~h "proto"

<BeginHMM>

<NumStates> 5

<State> 2 <NumMixes> 4

<Mixture> 1 0.25

<Mean> 39

……

<Variance> 39

……

<TransP> 5

0.0 1.0 0.0 0.0 0.0

0.0 0.5 0.5 0.0 0.0

0.0 0.0 0.5 0.5 0.0

0.0 0.0 0.0 0.5 0.5

0.0 0.0 0.0 0.0 0.0

<EndHMM>
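The transition matrix in this prototype follows a simple left-to-right pattern: the non-emitting entry state always moves to the first emitting state, each emitting state either loops or advances with probability 0.5, and the exit state has an all-zero row. A minimal sketch that generates the same matrix for any number of states is shown below; the function names are hypothetical and the code only produces the <TransP> part, not a full prototype.

```python
def left_to_right_transp(num_states, self_loop=0.5):
    """Build a left-to-right transition matrix with no skips.

    State 1 (entry) and state num_states (exit) are non-emitting.
    """
    if num_states < 3:
        raise ValueError("need entry, exit and at least one emitting state")
    matrix = [[0.0] * num_states for _ in range(num_states)]
    matrix[0][1] = 1.0  # entry state goes straight to the first emitting state
    for i in range(1, num_states - 1):
        matrix[i][i] = self_loop            # stay in the same state
        matrix[i][i + 1] = 1.0 - self_loop  # advance to the next state
    return matrix  # last row stays all zero (exit state)


def format_transp(matrix):
    """Render the matrix in the <TransP> layout used in the prototype."""
    lines = [f"<TransP> {len(matrix)}"]
    for row in matrix:
        lines.append(" ".join(f"{p:.1f}" for p in row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_transp(left_to_right_transp(5)))
```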


Create HMM Prototypes

Training Procedure

- Model initialization
  - Flat start (unknown segmentation -> uniform segmentation)
  - Viterbi search (given segmentation)
  - Forward/backward only at word level
- Model refinement
  - Mixture splitting
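One way to read "uniform segmentation" is that, when no boundaries are known, the frames of an utterance are divided as evenly as possible among its labels. The sketch below illustrates only that idea; it is not how any particular HTK tool is implemented, and the function name is made up.

```python
def uniform_segmentation(num_frames, labels):
    """Split num_frames frames evenly across labels.

    Returns (label, start_frame, end_frame) triples with end exclusive.
    """
    n = len(labels)
    if n == 0 or num_frames < n:
        raise ValueError("need at least one frame per label")
    # Integer boundaries; every segment receives at least one frame.
    bounds = [k * num_frames // n for k in range(n + 1)]
    return [(labels[k], bounds[k], bounds[k + 1]) for k in range(n)]


if __name__ == "__main__":
    # Hypothetical utterance of 10 frames with three labels.
    for segment in uniform_segmentation(10, ["sil", "a", "sil"]):
        print(segment)
```

A later Viterbi pass can then replace these evenly spaced boundaries with aligned ones.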

Configuration file for Training/Test

# byte order
#BYTEORDER=VAX
NATURALREADORDER=TRUE
NATURALWRITEORDER=TRUE

# MFCC parameters
SOURCEFORMAT=HTK
SOURCERATE=100000.0
TARGETKIND=MFCC_E_D_A_Z
TARGETRATE=100000.0
DELTAWINDOW=2
ACCWINDOW=2

# new variable that can replace varFloor
VARFLOORPERCENTILE = 20


Training Corpus

Mat4500_train_phones.mlf

……

……

Mat4500_train.scp


Flat start

Viterbi search

Forward/Backward


Initialize HMMs from Flat Start


Initialize HMMs from Viterbi Search

Utterance Segmentation

*.mlf mat4500_train.mlf (phone-level with segmentation information)

Silence and short pause model

sp shares the middle state of sil

Sil.hed:

AT 2 4 0.2 {sil.transP}
AT 4 2 0.2 {sil.transP}
AT 1 3 0.3 {sp.transP}
TI silst {sil.state[3],sp.state[2]}
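The AT commands add a transition to a model's transition matrix and rescale the other transitions out of the same state so that the row still sums to one. The Python sketch below shows that arithmetic on a plain list-of-lists matrix. It is an illustration of the rescaling, not HHEd code, and the starting matrix is the untrained prototype rather than a trained sil model.

```python
def add_transition(transp, i, j, prob):
    """Return a copy of transp with a transition i -> j of probability prob.

    States are numbered from 1, as in the AT command. The remaining
    transitions out of state i are rescaled so the row sums to 1.
    """
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must lie strictly between 0 and 1")
    result = [list(row) for row in transp]
    row = result[i - 1]
    others = sum(row) - row[j - 1]
    if others <= 0.0:
        raise ValueError("state has no other outgoing transitions to rescale")
    scale = (1.0 - prob) / others
    result[i - 1] = [p * scale for p in row]
    result[i - 1][j - 1] = prob
    return result


if __name__ == "__main__":
    proto = [
        [0.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.5, 0.5, 0.0, 0.0],
        [0.0, 0.0, 0.5, 0.5, 0.0],
        [0.0, 0.0, 0.0, 0.5, 0.5],
        [0.0, 0.0, 0.0, 0.0, 0.0],
    ]
    sil = add_transition(proto, 2, 4, 0.2)  # AT 2 4 0.2
    sil = add_transition(sil, 4, 2, 0.2)    # AT 4 2 0.2
    for row in sil:
        print(row)
```

The two added arcs let the silence model skip forward from state 2 to state 4 and jump back from state 4 to state 2, so it can absorb noisy or variable-length silence.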


Mixture Splitting


Mixture Splitting Script

MU2.hed

……
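The content of MU2.hed is elided on the slide, so it is not reproduced here. As background, the idea behind mixture splitting can be sketched in Python: the component with the largest weight is cloned, its weight is halved between the two copies, and the two means are moved apart by a fraction of the standard deviation. This is a simplified illustration with diagonal variances; the 0.2 perturbation factor and the "always split the heaviest" rule are assumptions of this sketch, and all names are hypothetical.

```python
import math


def split_heaviest(mixtures, perturb=0.2):
    """Split the heaviest component of a diagonal-covariance mixture.

    mixtures is a list of (weight, mean, variance) tuples, where mean and
    variance are lists of floats of equal length.
    """
    idx = max(range(len(mixtures)), key=lambda k: mixtures[k][0])
    weight, mean, var = mixtures[idx]
    offset = [perturb * math.sqrt(v) for v in var]
    plus = (weight / 2, [m + o for m, o in zip(mean, offset)], list(var))
    minus = (weight / 2, [m - o for m, o in zip(mean, offset)], list(var))
    return mixtures[:idx] + [plus, minus] + mixtures[idx + 1:]


def mix_up(mixtures, target):
    """Repeat the split until the mixture has target components."""
    while len(mixtures) < target:
        mixtures = split_heaviest(mixtures)
    return mixtures


if __name__ == "__main__":
    single = [(1.0, [0.0, 1.0], [1.0, 4.0])]
    for component in mix_up(single, 2):
        print(component)
```

After each increase in the number of components the models are normally re-estimated before splitting again.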

Recognition/Evaluation Procedure

Recognition

Evaluation

Test Corpus

Mat4500_test.mlf

……

Mat4500_test.scp

Two Types of Result Formats


Confusion Matrix Analysis

Forced Alignment

- Viterbi decoding: HVite with option -a
- You can get some statistics of the HMM segmentation
- Useful for determining the number of mixtures

Speaker Adaptation - MLLR, MAP

MLLR

- In the training phase, generate the state occupation statistics
  % HERest -s
- HHEd
  RN "models" // rename hmmid
  LS "stats" // load state occupation statistics
  RC 32 "rtree" // regression classes = 32
  or
  RC 32 "rtree" {sil.state[2-4].mix}

- Forced alignment of adaptation data
  % HVite … -a … -I adapWords.mlf -m …
- Find global MLLR
  % HEAdapt -C … -g … -K global.tmf … -I adapPhone.mlf …
  *.tmf: transform model file
- Find MLLR regression tree
  % HEAdapt -C … -J global.tmf -K rc.tmf … -I adapPhone.mlf …
- Recognition
  % HVite … -J rc.tmf …

MAP adaptation

% HEAdapt -C … -j 0.9 … -k … -I adapPhone.mlf

-j : weight
-k : use MLLR before MAP

Further topics

- Model/state tying (HMM definition)
- Context-dependent model
- Fast training/search (beam search)
- Insertion/deletion problem
  - Duration constraint
  - Word transition penalty
- Word lattice output

Detail options for the HTK commands

HCompV

Typical arguments:
HCompV -C xxx -f 0.01 -m -S *.scp -M output_dir hmm

-m : update mean
-f f : set varFloor to f * global variance

In the hmm macro:
~o …
~v "varFloor1"
<Variance> 38
……

Detail options for the HTK commands

HERest

Typical arguments:
HERest -C xxx -I *.mlf -t 250.0 150.0 1000.0 -S *.scp -H hmm_macros -H hmm_defs -M output_dir hmmlist

-t f [i l] : set the pruning threshold to f; if an utterance fails, retry with f -> f+i until f = l
-T : tracing option (octal number, command dependent)
     00020 : show occupation counts
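The three numbers given to -t describe a retry schedule: start at f, and step up by i until the limit l is reached. A small sketch of the thresholds this produces for the typical arguments above follows; whether the limit itself is tried is an assumption of this sketch, and the function name is made up.

```python
def pruning_thresholds(f, i=None, l=None):
    """List the pruning thresholds implied by '-t f [i l]'.

    With only f given there is a single threshold. With i and l, the
    threshold grows by i on each retry up to and including l.
    """
    if i is None or l is None:
        return [f]
    if i <= 0:
        raise ValueError("increment must be positive")
    thresholds = []
    current = f
    while current <= l:
        thresholds.append(current)
        current += i
    return thresholds


if __name__ == "__main__":
    # Values from the typical HERest arguments on the slide.
    print(pruning_thresholds(250.0, 150.0, 1000.0))
```

A tight initial threshold keeps training fast, while the wider retries rescue utterances that would otherwise be skipped.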

Detail options for the HTK commands

HVite

Typical arguments:
HVite -H hmm_macros -H hmm_defs -S *.scp -i output_mlf -w wdnet -p 0.0 -s 5.0 -t 250 dict tiedlist

-t f [i l] : set the pruning threshold to f; on failure, retry with f -> f+i until f = l
-m : show model boundaries
-a : forced alignment (with -I input.mlf)
-p, -s : word insertion penalty, weight for grammar score

Detail options for the HTK commands

HResults

Typical arguments:
HResults -I *.mlf hmmlist answer.mlf

-n : use NIST
-e s t : label t is made equivalent to s
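The scoring tool reports two headline figures derived from the label counts: percent correct, which ignores insertions, and accuracy, which penalizes them. With N reference labels, D deletions, S substitutions and I insertions, percent correct is (N - D - S) / N and accuracy is (N - D - S - I) / N, both times 100. A short sketch of that arithmetic, with made-up counts and a hypothetical function name:

```python
def recognition_scores(n, deletions, substitutions, insertions):
    """Compute percent correct and accuracy from alignment counts."""
    if n <= 0:
        raise ValueError("need at least one reference label")
    hits = n - deletions - substitutions
    return {
        "percent_correct": 100.0 * hits / n,
        "accuracy": 100.0 * (hits - insertions) / n,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(recognition_scores(n=100, deletions=5, substitutions=10, insertions=3))
```

Accuracy can become negative when the recognizer inserts many spurious labels, which is one reason to tune the word insertion penalty.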

Detail options for the HTK commands

HInit

Typical arguments:
HInit -S *.scp -M hmm_macro -H hmm_defs model

HRest

Typical arguments:
HRest -S *.scp -M hmm_macro -H hmm_defs model

HSLab

Use WaveSurfer.