Large Vocabulary Continuous Speech Recognition: Subword Speech Units
Large Vocabulary Continuous Speech Recognition

Subword Speech Units

Given acoustic observations $Y$, the recognizer chooses the word sequence $\hat{W}$ with the maximum a posteriori probability:

$$\hat{P}(\hat{W} \mid Y) = \max_{W} P(W \mid Y).$$

By Bayes' rule,

$$P(W \mid Y) = \frac{P(Y \mid W)\,P(W)}{P(Y)},$$

and since $P(Y)$ does not depend on $W$,

$$\hat{W} = \arg\max_{W} P(Y \mid W)\,P(W).$$
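As a concrete illustration (not part of the original lecture), here is a minimal Python sketch of this decision rule, with toy probability tables standing in for a real acoustic model $P(Y|W)$ and language model $P(W)$:

```python
# Pick the word sequence W maximizing log P(Y|W) + log P(W);
# P(Y) is constant with respect to W, so it can be dropped.
import math

acoustic_prob = {("show", "all", "ships"): 0.02,   # toy values for P(Y|W)
                 ("show", "all", "chips"): 0.03}
lm_prob = {("show", "all", "ships"): 0.10,         # toy values for P(W)
           ("show", "all", "chips"): 0.01}

def decode(candidates):
    # argmax over W of log P(Y|W) + log P(W)
    return max(candidates,
               key=lambda W: math.log(acoustic_prob[W]) + math.log(lm_prob[W]))

print(decode(list(acoustic_prob)))  # -> ('show', 'all', 'ships')
```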
HMM-Based Subword Speech Units
A sentence is a sequence of words,

$$S_W : W_1\, W_2\, W_3 \cdots W_I,$$

and each word $W_i$ is represented as a concatenation of subword units, where $L(W_i)$ is the number of units in word $W_i$:

$$S_U : U_1(W_1)\, U_2(W_1) \cdots U_{L(W_1)}(W_1)\;\; U_1(W_2)\, U_2(W_2) \cdots U_{L(W_2)}(W_2)\; \cdots\; U_1(W_I) \cdots U_{L(W_I)}(W_I).$$
Training of Subword Units

Training Procedure
Errors and performance evaluation in PLU recognition

Substitution errors (s), deletion errors (d), and insertion errors (i).

Performance evaluation: if the total number of PLUs is N, we define:
Correctness rate: (N − s − d) / N
Accuracy rate: (N − s − d − i) / N
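A small sketch of these two measures (the counts s, d, i would come from aligning the recognized PLU string against the reference, e.g. by dynamic programming; the numbers below are made up):

```python
# Correctness ignores insertions; accuracy penalizes them.
def plu_scores(N, s, d, i):
    correctness = (N - s - d) / N        # insertions not penalized
    accuracy = (N - s - d - i) / N       # insertions penalized
    return correctness, accuracy

print(plu_scores(N=1000, s=80, d=30, i=40))  # -> (0.89, 0.85)
```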
Language Models for LVCSR

Word Pair Model: specify which word pairs are valid.

For a word sequence $W = w_1 w_2 \cdots w_Q$, the language model probability factors by the chain rule:

$$P(W) = P(w_1 w_2 \cdots w_Q) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_1, w_2) \cdots P(w_Q \mid w_1 w_2 \cdots w_{Q-1}).$$

The word pair model constrains the conditional probabilities to

$$P(w_j \mid w_k) = \begin{cases} 1 & \text{if } (w_k, w_j) \text{ is a valid word pair} \\ 0 & \text{otherwise.} \end{cases}$$
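A minimal sketch of this constraint, assuming a hypothetical set of valid pairs:

```python
# P(w_j | w_k) is 1 when (w_k, w_j) is a valid pair and 0 otherwise,
# so a sentence is allowed only if every adjacent pair is valid.
valid_pairs = {("show", "all"), ("all", "ships")}   # hypothetical pair list

def word_pair_prob(w_k, w_j):
    return 1.0 if (w_k, w_j) in valid_pairs else 0.0

def sentence_allowed(words):
    return all(word_pair_prob(a, b) > 0 for a, b in zip(words, words[1:]))

print(sentence_allowed(["show", "all", "ships"]))  # True
print(sentence_allowed(["ships", "show", "all"]))  # False
```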
Statistical Language Modeling
N-gram probabilities are estimated from relative frequencies. If $F(\cdot)$ denotes the count of a word sequence in the training corpus:

$$\hat{P}(w_2 \mid w_1) = \frac{F(w_1, w_2)}{F(w_1)}, \qquad \hat{P}(w_3 \mid w_1, w_2) = \frac{F(w_1, w_2, w_3)}{F(w_1, w_2)}.$$

In general, for an N-gram model:

$$\hat{P}(w_i \mid w_{i-N+1}, \ldots, w_{i-1}) = \frac{F(w_{i-N+1}, \ldots, w_i)}{F(w_{i-N+1}, \ldots, w_{i-1})},$$

so that

$$P(W) = P(w_1 w_2 \cdots w_Q) = \prod_{i} P(w_i \mid w_{i-N+1}, \ldots, w_{i-1}).$$
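A toy sketch of these relative-frequency estimates, using a made-up corpus:

```python
# Bigram estimates P^(w2|w1) = F(w1,w2) / F(w1) from raw counts.
from collections import Counter

corpus = "show all ships show all alerts show all ships".split()

F1 = Counter(zip(corpus))               # unigram counts F(w), as 1-tuples
F2 = Counter(zip(corpus, corpus[1:]))   # bigram counts F(w1, w2)

def bigram_prob(w1, w2):
    return F2[(w1, w2)] / F1[(w1,)]

print(bigram_prob("show", "all"))    # 3/3 = 1.0
print(bigram_prob("all", "ships"))   # 2/3 ≈ 0.667
```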
Perplexity of the Language Model

Entropy of the source:

$$H = -\lim_{Q \to \infty} \frac{1}{Q} \sum_{w_1, w_2, \ldots, w_Q} P(w_1, w_2, \ldots, w_Q)\, \log P(w_1, w_2, \ldots, w_Q).$$

First-order entropy of the source:

$$H = -\sum_{w \in V} P(w)\, \log P(w),$$

which equals the source entropy when successive words are independent, i.e. when $P(w_1 w_2 \cdots w_Q) = P(w_1)\, P(w_2) \cdots P(w_Q)$.

If the source is ergodic, meaning its statistical properties can be completely characterized in a sufficiently long sequence that the source puts out, then

$$H = -\lim_{Q \to \infty} \frac{1}{Q} \log P(w_1, w_2, \ldots, w_Q).$$

We often compute H based on a finite but sufficiently large Q:

$$\hat{H} = -\frac{1}{Q} \log P(w_1, w_2, \ldots, w_Q).$$

H is the average degree of difficulty that the recognizer encounters when it has to determine a word from the same source.

If an N-gram language model $P_N(W)$ is used, an estimate of H is:

$$\hat{H} = -\frac{1}{Q} \log P_N(w_1, w_2, \ldots, w_Q).$$

In general:

$$\hat{H} = -\frac{1}{Q} \sum_{i=1}^{Q} \log P(w_i \mid w_{i-N+1}, \ldots, w_{i-1}).$$

Perplexity is defined as:

$$B = 2^{\hat{H}} = \hat{P}(w_1, w_2, \ldots, w_Q)^{-1/Q}.$$
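A small sketch computing perplexity from bigram probabilities; the probability table and the `<s>` start symbol are illustrative assumptions:

```python
# B = 2**H^, with H^ = -(1/Q) * sum_i log2 P(w_i | w_{i-1}).
import math

P = {("<s>", "show"): 0.5, ("show", "all"): 1.0, ("all", "ships"): 0.5}

def perplexity(words):
    logprob = sum(math.log2(P[pair]) for pair in zip(["<s>"] + words, words))
    H = -logprob / len(words)
    return 2 ** H

print(perplexity(["show", "all", "ships"]))  # 2**(2/3) ≈ 1.59
```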
Overall recognition system based on subword units

Naval Resource (Battleship) Management Task: 991-word vocabulary. NG (no grammar): perplexity = 991.
)(},{)(})({})({)(},{)(:
322.|BE|sentence,aendorbegincannot
448|EB|sentence,aendcanbutsentenceabegincannot
64|EB|sentence,aendcannotbutsentenceabegincon
117|BE|sentence,aendorbegineithercon
that
that
that
that
words
word
words
words
of
of
of
of
set
set
set
set
}{
}{
}{
}{
silenceBEEBsilenceWWsilenceBEEBsilenceS
BE
EB
EB
BE
Word pair grammarWord pair grammar
We can partition the vocabulary into four nonoverlapping sets of words:
The overall FSN allows recognition of sentences of the form:
WP (word pair) grammar:Perplexity=60
FSN based on Partitioning Scheme:995 real arcs and18 null arcs
WB (word bigram)Grammar:Perplexity =20
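A sketch of the begin/end constraint that this partition imposes; the set memberships here are hypothetical placeholders, not the actual task vocabulary:

```python
# A sentence is accepted only if its first word can begin a sentence
# and its last word can end one, per the four-set partition.
BE  = {"show"}    # can begin or end
B_e = {"list"}    # can begin but not end
b_E = {"ships"}   # can end but not begin
b_e = {"all"}     # can neither begin nor end

def accepted(words):
    return words[0] in BE | B_e and words[-1] in BE | b_E

print(accepted(["show", "all", "ships"]))  # True
print(accepted(["all", "ships", "list"]))  # False
```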
Control of word insertion/word deletion rate

In the discussed structure, there is no control on the sentence length. We introduce a word insertion penalty into the Viterbi decoding: a fixed negative quantity is added to the likelihood score at the end of each word arc.
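A minimal sketch of the penalty mechanism, with a hypothetical penalty value; in practice the value is tuned to balance insertion against deletion errors:

```python
# A fixed negative constant is added to the accumulated Viterbi log score
# each time a word arc is exited, so longer hypotheses pay a per-word cost.
WORD_INSERTION_PENALTY = -10.0   # hypothetical, tuned on held-out data

def score_sentence(word_log_likelihoods):
    # Accumulated score: acoustic score of each word arc plus one penalty
    # per word end.
    return sum(ll + WORD_INSERTION_PENALTY for ll in word_log_likelihoods)

# A 3-word hypothesis must now beat a 4-word one by more than one penalty:
print(score_sentence([-50.0, -40.0, -60.0]))         # -180.0
print(score_sentence([-50.0, -40.0, -30.0, -28.0]))  # -188.0
```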
Context-dependent subword units

Creation of context-dependent diphones and triphones. Moving from context-independent to context-dependent inventories ($ denotes a word-boundary context):

(1) Context-independent (CI) units: $p$
(2) Left-context (LC) diphone: $p_L\text{–}p$
(3) Right-context (RC) diphone: $p\text{–}p_R$
(4) Left-right-context (LRC) triphone: $p_L\text{–}p\text{–}p_R$ (context-dependent triphones)

Other inventories include multiple-phone units and word-dependent units.

Example: the word "above" (PLUs ax, b, ah, v) expands into context-dependent units as:

above  ax   $–ax–b
above  b    ax–b–ah
above  ah   b–ah–v
above  v    ah–v–$
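A small sketch expanding a word's PLU string into LRC triphones with word-boundary contexts, reproducing the "above" example:

```python
# "$" marks the word-boundary context at either end of the word.
def to_triphones(plus):
    padded = ["$"] + plus + ["$"]
    return [f"{padded[i-1]}-{padded[i]}-{padded[i+1]}"
            for i in range(1, len(padded) - 1)]

print(to_triphones(["ax", "b", "ah", "v"]))
# ['$-ax-b', 'ax-b-ah', 'b-ah-v', 'ah-v-$']
```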
If c(·) is the occurrence count for a given unit, we can use a unit reduction rule such as:

If $c(p_L\text{–}p\text{–}p_R) \geq T$, then use the triphone $p_L\text{–}p\text{–}p_R$; otherwise:
1. use the left-context diphone $p_L\text{–}p$ if $c(p_L\text{–}p) \geq T$,
2. use the right-context diphone $p\text{–}p_R$ if $c(p\text{–}p_R) \geq T$,
3. use the context-independent unit $p$ otherwise.

CD units using only intraword units for "show all ships":

sh($,ow) ow(sh,$) aw($,$) sh($,ih) ih(sh,p) p(ih,s) s(p,$)

CD units using both intraword and interword units:

sh($,ow) ow(sh,aw) aw(ow,sh) sh(aw,ih) ih(sh,p) p(ih,s) s(p,$)
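A sketch of the reduction rule, assuming a dictionary of training counts and an illustrative threshold T:

```python
# Back off from triphone to diphone to CI unit when counts fall below T.
def choose_unit(pL, p, pR, c, T=50):
    """c is a dict of occurrence counts; T is the count threshold."""
    if c.get(f"{pL}-{p}-{pR}", 0) >= T:
        return f"{pL}-{p}-{pR}"          # enough data: keep the triphone
    if c.get(f"{pL}-{p}", 0) >= T:
        return f"{pL}-{p}"               # left-context diphone
    if c.get(f"{p}-{pR}", 0) >= T:
        return f"{p}-{pR}"               # right-context diphone
    return p                             # fall back to the CI unit

counts = {"ax-b-ah": 12, "ax-b": 80}
print(choose_unit("ax", "b", "ah", counts))  # 'ax-b'
```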
Smoothing and interpolation of CD PLU models

Sparse context-dependent estimates are smoothed by interpolating a unit's model with the models of its less specific variants, with weights $\lambda_i$ that sum to 1:

$$\hat{p}(p_L\text{–}p\text{–}p_R) = \lambda_1\, p(p_L\text{–}p\text{–}p_R) + \lambda_2\, p(p_L\text{–}p) + \lambda_3\, p(p\text{–}p_R) + \lambda_4\, p(p), \qquad \sum_i \lambda_i = 1.$$
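A sketch of the interpolation step with hypothetical lambda weights (in practice the weights are estimated, e.g. by deleted interpolation):

```python
# Mix a sparse triphone estimate with its diphone and CI backoffs.
def smoothed(p_tri, p_ldi, p_rdi, p_ci, lambdas=(0.6, 0.15, 0.15, 0.1)):
    assert abs(sum(lambdas) - 1.0) < 1e-9   # weights must sum to 1
    l1, l2, l3, l4 = lambdas
    return l1 * p_tri + l2 * p_ldi + l3 * p_rdi + l4 * p_ci

print(smoothed(p_tri=0.02, p_ldi=0.05, p_rdi=0.04, p_ci=0.10))  # 0.0355
```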
Implementation issues using CD units

Word junction effects

To handle known phonological changes, a set of phonological rules is superimposed on both the training and recognition networks; typical rules rewrite phone sequences at word junctions, as illustrated in the sketch below.
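The transcript does not preserve the rule list itself; the sketch below shows the mechanism with two illustrative (hypothetical) junction rules:

```python
# Hypothetical example rules, not the lecture's actual list:
RULES = [
    (("t", "t"), ("t",)),    # geminate stop reduction across a junction
    (("d", "y"), ("jh",)),   # palatalization, e.g. "did you" -> "d ih jh uw"
]

def apply_junction_rules(left_word_phones, right_word_phones):
    junction = (left_word_phones[-1], right_word_phones[0])
    for pattern, replacement in RULES:
        if junction == pattern:
            return left_word_phones[:-1] + list(replacement) + right_word_phones[1:]
    return left_word_phones + right_word_phones

print(apply_junction_rules(["d", "ih", "d"], ["y", "uw"]))
# ['d', 'ih', 'jh', 'uw']
```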
Recognition results using CD units

Position dependent units
Unit splitting and clustering

A unit $p$ can be split into context-dependent variants (e.g., $p \to p_q$ for a context $q$), guided by a likelihood-based distance of the form

$$D(p) = \min\, \lvert L_Y - L_{Y'} \rvert,$$

where $L_Y$ and $L_{Y'}$ are log-likelihood scores of the training tokens under the original and split units.
A key source of difficulty in continuous speech recognition is the so-called function words, which include words like a, and, for, in, is. The function words have the following properties:
Creation of vocabulary-independent units
Semantic Postprocessor for Recognition