Large Vocabulary Continuous Speech Recognition. Subword Speech Units


Upload: mildred-lawrence

Post on 13-Dec-2015


Page 1: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Large Vocabulary Continuous Speech Recognition

$\hat{W} = \arg\max_W P(W \mid Y)$

$P(W \mid Y) = \frac{P(Y \mid W)\, P(W)}{P(Y)}$

$\hat{W} = \arg\max_W P(Y \mid W)\, P(W)$

Since $P(Y)$ does not depend on $W$, maximizing the posterior $P(W \mid Y)$ is equivalent to maximizing $P(Y \mid W)\, P(W)$.
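The maximum a posteriori rule on this slide can be sketched as follows; the candidate sentences and their scores are hypothetical, and the log domain is used to avoid numeric underflow:

```python
import math

# Hypothetical acoustic likelihoods P(Y|W) and language-model priors P(W)
# for three candidate word strings.
acoustic = {"set course": 1e-40, "set coarse": 3e-40, "said course": 2e-41}
prior = {"set course": 1e-3, "set coarse": 1e-6, "said course": 1e-4}

def map_decode(acoustic, prior):
    """Pick the W maximizing P(Y|W)P(W); P(Y) drops out of the argmax."""
    return max(acoustic, key=lambda w: math.log(acoustic[w]) + math.log(prior[w]))

best = map_decode(acoustic, prior)
```

A strong language-model prior can outweigh a slightly better acoustic match, which is exactly the point of combining the two scores.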

Page 2: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Subword Speech Units

Page 3: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 4: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 5: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

HMM-Based Subword Speech Units

Page 6: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 7: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

$S: W_1\, W_2\, W_3 \cdots W_I$

$S_U: U_1(W_1)\, U_2(W_1) \cdots U_{L(W_1)}(W_1) \oplus U_1(W_2)\, U_2(W_2) \cdots U_{L(W_2)}(W_2) \oplus \cdots \oplus U_1(W_I) \cdots U_{L(W_I)}(W_I)$

where $U_j(W_i)$ is the $j$-th subword unit of word $W_i$ and $L(W_i)$ is the number of units in $W_i$.
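The concatenation of word models into a sentence model can be sketched with a toy lexicon; the phone transcriptions below are hypothetical:

```python
# Build the unit sequence S_U for a sentence by concatenating each word's
# subword-unit sequence U_1(W) ... U_L(W)(W). Lexicon entries are hypothetical.
lexicon = {
    "show": ["sh", "ow"],
    "all": ["aw", "l"],
    "ships": ["sh", "ih", "p", "s"],
}

def sentence_units(words, lexicon):
    units = []
    for w in words:
        units.extend(lexicon[w])  # U_1(w) ... U_{L(w)}(w)
    return units

su = sentence_units(["show", "all", "ships"], lexicon)
```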

Training of Subword Units

Page 8: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Training of Subword Units

Page 9: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 10: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Training Procedure

Page 11: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 12: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 13: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 14: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Errors and performance evaluation in PLU recognition

Substitution error (s)
Deletion error (d)
Insertion error (i)

Performance evaluation: if the total number of PLUs is N, we define:

Correctness rate: (N - s - d) / N
Accuracy rate: (N - s - d - i) / N
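The counts s, d, and i come from a minimum-edit-distance alignment of the reference and recognized PLU strings; a minimal sketch, with hypothetical phone strings:

```python
def align_counts(ref, hyp):
    """Levenshtein alignment of reference vs. recognized unit strings,
    returning (substitutions, deletions, insertions)."""
    R, H = len(ref), len(hyp)
    # dp[i][j] = (cost, subs, dels, ins) for ref[:i] vs hyp[:j]
    dp = [[None] * (H + 1) for _ in range(R + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, R + 1):
        c = dp[i - 1][0]
        dp[i][0] = (c[0] + 1, c[1], c[2] + 1, c[3])  # all deletions
    for j in range(1, H + 1):
        c = dp[0][j - 1]
        dp[0][j] = (c[0] + 1, c[1], c[2], c[3] + 1)  # all insertions
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            cands = [
                (dp[i - 1][j - 1][0] + sub, dp[i - 1][j - 1], (sub, 0, 0)),
                (dp[i - 1][j][0] + 1, dp[i - 1][j], (0, 1, 0)),  # deletion
                (dp[i][j - 1][0] + 1, dp[i][j - 1], (0, 0, 1)),  # insertion
            ]
            cost, prev, (ds, dd, di) = min(cands, key=lambda t: t[0])
            dp[i][j] = (cost, prev[1] + ds, prev[2] + dd, prev[3] + di)
    _, s, d, ins = dp[R][H]
    return s, d, ins

ref = "b ah v ax".split()
hyp = "b ah f f ax".split()
s, d, i = align_counts(ref, hyp)
N = len(ref)
correctness = (N - s - d) / N
accuracy = (N - s - d - i) / N
```

Note that insertions reduce the accuracy rate but not the correctness rate, which is why accuracy is the stricter of the two figures.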

Page 15: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Language Models for LVCSR

Let $W = w_1 w_2 \cdots w_Q$. Then

$P(W) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_1, w_2) \cdots P(w_Q \mid w_1, \ldots, w_{Q-1})$

An N-gram model approximates each factor as

$P(w_j \mid w_1, \ldots, w_{j-1}) \approx P(w_j \mid w_{j-N+1}, \ldots, w_{j-1})$

Word Pair Model: Specify which word pairs are valid

$P(w_j \mid w_k) = 1$ if $(w_k, w_j)$ is valid, $0$ otherwise
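A word pair grammar can be sketched as a table of allowed successors; a sentence is accepted only if every adjacent pair is valid. The vocabulary and pairs below are hypothetical:

```python
# Allowed successor words for each word (hypothetical word-pair grammar).
valid_pairs = {
    "show": {"all", "the"},
    "all": {"ships"},
    "the": {"ships"},
}

def sentence_valid(words, valid_pairs):
    """True iff every adjacent word pair (w_k, w_j) is listed as valid."""
    return all(b in valid_pairs.get(a, set()) for a, b in zip(words, words[1:]))

ok = sentence_valid(["show", "all", "ships"], valid_pairs)
bad = sentence_valid(["show", "ships"], valid_pairs)
```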

Page 16: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Statistical Language Modeling

The N-gram probabilities are estimated from occurrence counts $F(\cdot)$ in a training corpus:

$\hat{p}(w_i) = \frac{F(w_i)}{\sum_w F(w)}$

$\hat{p}(w_i \mid w_{i-1}) = \frac{F(w_{i-1}, w_i)}{F(w_{i-1})}$

$\hat{P}(w_3 \mid w_1, w_2) = \frac{F(w_1, w_2, w_3)}{F(w_1, w_2)}$

and in general

$\hat{P}(w_i \mid w_{i-N+1}, \ldots, w_{i-1}) = \frac{F(w_{i-N+1}, \ldots, w_i)}{F(w_{i-N+1}, \ldots, w_{i-1})}$

$P(W) = \prod_i \hat{P}(w_i \mid w_{i-N+1}, \ldots, w_{i-1})$
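The relative-frequency estimates above can be sketched directly with counters; the toy corpus is hypothetical:

```python
from collections import Counter

# Hypothetical toy corpus; the bigram estimate is the pair count
# F(w1, w2) divided by the unigram count F(w1).
corpus = "show all ships show the ships show all alerts".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(w2, w1):
    """Relative-frequency estimate of P(w2 | w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

p = p_bigram("all", "show")  # "show" occurs 3 times, "show all" twice
```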

Page 17: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Perplexity of the Language Model

Entropy of the Source:

$H = \lim_{Q \to \infty} -\frac{1}{Q} \sum P(w_1, w_2, \ldots, w_Q) \log P(w_1, w_2, \ldots, w_Q)$

If the words are independent, $P(w_1, w_2, \ldots, w_Q) = P(w_1)\, P(w_2) \cdots P(w_Q)$, and the first order entropy of the source is:

$H = -\sum_{w \in V} P(w) \log P(w)$

If the source is ergodic, meaning its statistical properties can be completely characterized by a sufficiently long sequence that the source puts out, then:

$H = \lim_{Q \to \infty} -\frac{1}{Q} \log P(w_1, w_2, \ldots, w_Q)$
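The first-order entropy can be computed directly from relative frequencies; a minimal sketch over a hypothetical toy corpus:

```python
import math
from collections import Counter

# H = -sum_w P(w) log2 P(w), with P(w) estimated by relative frequency
# from a hypothetical toy corpus.
corpus = "show all ships show the ships show all alerts".split()
counts = Counter(corpus)
Q = len(corpus)

H1 = -sum((c / Q) * math.log2(c / Q) for c in counts.values())
```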

Page 18: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

We often compute H based on a finite but sufficiently large Q:

$H = -\frac{1}{Q} \log P(w_1, w_2, \ldots, w_Q)$

H is the degree of difficulty that the recognizer encounters, on average, when it is to determine a word from the same source.

If an N-gram language model $\hat{P}_N(W)$ is used, an estimate of H is:

$\hat{H} = -\frac{1}{Q} \log \hat{P}_N(w_1, w_2, \ldots, w_Q) = -\frac{1}{Q} \sum_{i=1}^{Q} \log \hat{P}(w_i \mid w_{i-N+1}, \ldots, w_{i-1})$

In general:

$\hat{H} \ge H$

Perplexity is defined as:

$B = 2^{\hat{H}} = \hat{P}(w_1, w_2, \ldots, w_Q)^{-1/Q}$
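As a sketch of the definition, the perplexity of a bigram model over a hypothetical toy corpus; the first word is scored with its unigram probability and the rest with bigrams:

```python
import math
from collections import Counter

# B = 2^H with H = -(1/Q) log2 P_hat(w1 ... wQ), using a bigram model
# estimated from a hypothetical toy corpus.
corpus = "show all ships show the ships show all alerts".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def perplexity(words):
    logp = math.log2(unigrams[words[0]] / len(corpus))  # unigram start
    for w1, w2 in zip(words, words[1:]):
        logp += math.log2(bigrams[(w1, w2)] / unigrams[w1])
    H = -logp / len(words)
    return 2 ** H

b = perplexity("show all ships".split())
```

Equivalently, the result is the sequence probability raised to the power -1/Q, so a lower perplexity means the model finds the word string less surprising.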

Page 19: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Overall recognition system based on subword units

Page 20: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Naval Resource (Battleship) Management Task: 991-word vocabulary. NG (no grammar): perplexity = 991

Page 21: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Word pair grammar

We can partition the vocabulary into four nonoverlapping sets of words:

{BE}: words that can either begin or end a sentence, |BE| = 117
{BĒ}: words that can begin a sentence but cannot end a sentence, |BĒ| = 64
{B̄E}: words that cannot begin a sentence but can end a sentence, |B̄E| = 448
{B̄Ē}: words that cannot begin or end a sentence, |B̄Ē| = 322

The overall FSN allows recognition of sentences of the form:

S: silence → ({BE} ∪ {BĒ}) → {words} → ({BE} ∪ {B̄E}) → silence
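The four-way begin/end partition can be sketched directly; the small vocabulary and the begin/end word lists below are hypothetical:

```python
# Classify each vocabulary word by whether it may begin and/or end a sentence.
# "n" in a set name marks the negated property (nB = cannot begin).
can_begin = {"show", "list"}
can_end = {"ships", "alerts", "list"}
vocabulary = {"show", "list", "ships", "alerts", "the"}

def partition(vocabulary, can_begin, can_end):
    sets = {"BE": set(), "BnE": set(), "nBE": set(), "nBnE": set()}
    for w in vocabulary:
        key = ("B" if w in can_begin else "nB") + ("E" if w in can_end else "nE")
        sets[key].add(w)
    return sets

parts = partition(vocabulary, can_begin, can_end)
```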

Page 22: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

WP (word pair) grammar: perplexity = 60

FSN based on partitioning scheme: 995 real arcs and 18 null arcs

WB (word bigram) grammar: perplexity = 20

Page 23: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Control of word insertion/word deletion rate

In the discussed structure, there is no control on the sentence length.

We introduce a word insertion penalty into the Viterbi decoding.

For this, a fixed negative quantity is added to the likelihood score at the end of each word arc.
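The effect of the penalty can be sketched with hypothetical per-word log-likelihood scores: each completed word arc pays the fixed negative quantity once, so paths with many short words are discouraged:

```python
# A minimal sketch of a word insertion penalty: the path score is the sum of
# per-word log-likelihoods plus a fixed negative penalty per word arc.
# The scores and penalty value are hypothetical.
def path_score(word_log_likelihoods, insertion_penalty=-5.0):
    return sum(ll + insertion_penalty for ll in word_log_likelihoods)

short = path_score([-40.0, -42.0])                    # 2-word hypothesis
longer = path_score([-20.0, -21.0, -20.0, -21.0])     # 4-word hypothesis
```

Both hypotheses have the same total acoustic score; with the penalty the two-word path wins, and with a zero penalty they tie, which is the knob used to balance insertions against deletions.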

Page 24: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 25: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 26: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 27: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 28: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Context-dependent subword units

(1) Context Independent Units: p
(2) Triphones (Context Dependent): p_L(p)p_R
(3) Multiple Phone Units
(4) Word Dependent Units: p(W)

With $ denoting an unspecified context:

p_L(p)$ : left context (LC) diphone
$(p)p_R : right context (RC) diphone
p_L(p)p_R : left-right context (LRC) triphone

Example, the word "above" (phones ax b ah v):

context independent: ax, b, ah, v
context dependent: $(ax)b, ax(b)ah, b(ah)v, ah(v)$
word dependent: ax(above), b(above), ah(above), v(above)

Creation of context-dependent diphones and triphones
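The expansion of a phone string into triphone units can be sketched as follows, using "$" for the unspecified context at word edges and the slide's transcription of "above":

```python
# Expand a word's phone string into context-dependent units p_L(p)p_R,
# with "$" marking an unspecified context at the word boundaries.
def triphones(phones):
    padded = ["$"] + phones + ["$"]
    return [f"{padded[i - 1]}({padded[i]}){padded[i + 1]}"
            for i in range(1, len(padded) - 1)]

units = triphones(["ax", "b", "ah", "v"])  # the word "above"
```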

Page 29: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

If c(.) is the occurrence count for a given unit, we can use a unit reduction rule such as:

1. use the triphone p_L(p)p_R if c(p_L(p)p_R) ≥ T;
2. otherwise, use the diphone p_L(p)$ or $(p)p_R whose count is at least T;
3. otherwise, use the context-independent unit p.

CD units using only intraword units for "show all ships" (phones sh ow / aw l / sh ih p s):

$(sh)ow sh(ow)$ $(aw)l aw(l)$ $(sh)ih sh(ih)p ih(p)s p(s)$

CD units using both intraword and interword units:

$(sh)ow sh(ow)aw ow(aw)l aw(l)sh l(sh)ih sh(ih)p ih(p)s p(s)$
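The count-threshold reduction rule can be sketched directly; the training counts below are hypothetical:

```python
from collections import Counter

# Keep the triphone if it occurs at least T times in training; otherwise
# back off to a diphone with enough occurrences; otherwise to the
# context-independent phone. Counts are hypothetical.
counts = Counter({"ax(b)ah": 12, "ax(b)$": 40, "$(b)ah": 3, "b": 500})

def select_unit(pl, p, pr, counts, T=20):
    tri = f"{pl}({p}){pr}"
    if counts[tri] >= T:
        return tri
    left, right = f"{pl}({p})$", f"$({p}){pr}"
    if counts[left] >= T:
        return left
    if counts[right] >= T:
        return right
    return p  # context-independent fallback

chosen = select_unit("ax", "b", "ah", counts)
```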

Page 30: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 31: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Smoothing and interpolation of CD PLU models

L̂(p_L(p)p_R) = λ1 L(p_L(p)p_R) + λ2 L(p_L(p)$) + λ3 L($(p)p_R) + λ4 L($(p)$)

λ1 + λ2 + λ3 + λ4 = 1

where $(p)$ denotes the context-independent unit for phone p.
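The interpolation of context-dependent and context-independent estimates can be sketched as a weighted mixture; the weights and scores below are hypothetical:

```python
# Mix triphone, left/right diphone, and context-independent estimates with
# weights that sum to 1. All values here are hypothetical.
def interpolate(scores, lambdas):
    assert abs(sum(lambdas) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(l * s for l, s in zip(lambdas, scores))

# scores for p_L(p)p_R, p_L(p)$, $(p)p_R, $(p)$ in that order
smoothed = interpolate([0.9, 0.6, 0.5, 0.3], [0.5, 0.2, 0.2, 0.1])
```

In practice the weights would be trained (e.g., by deleted interpolation on held-out data) rather than fixed by hand.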

Page 32: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Implementation issues using CD units

Page 33: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 34: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 35: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Word junction effects

To handle known phonological changes, a set of phonological rules is superimposed on both the training and recognition networks. Some typical phonological rules include:

Page 36: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Recognition results using CD units

Page 37: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 38: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Position dependent units

D(p) = min | L(Y) − L(Y′) |

comparing likelihood scores of position-dependent variants of a unit p (e.g., p(q) versus q(p)).

Page 39: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Unit splitting and clustering

Page 40: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 41: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 42: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 43: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 44: Large Vocabulary Continuous Speech Recognition. Subword Speech Units
Page 45: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

A key source of difficulty in continuous speech recognition is the so-called function words, which include words like a, and, for, in, is. The function words have the following properties:

Page 46: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Creation of vocabulary-independent units

Page 47: Large Vocabulary Continuous Speech Recognition. Subword Speech Units

Semantic Postprocessor for Recognition