acoustic / lexical model

Acoustic / Lexical Model Derk Geene

Post on 19-Jan-2016

TRANSCRIPT

Page 1: Acoustic / Lexical Model

Acoustic / Lexical Model

Derk Geene

Page 2: Acoustic / Lexical Model

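Speech recognition

P(words|signal) = P(signal|words) P(words) / P(signal)

P(signal|words): Acoustic model
P(words): Language model

Idea: Maximize P(signal|words) P(words). P(signal) is the same for every candidate word sequence, so it can be left out of the maximization.

Today: Acoustic model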
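
The maximization on this slide can be sketched in a few lines. This is only an illustration, assuming three hypothetical candidate transcriptions with invented probabilities (none of the numbers come from the slides). Working in log space turns the product P(signal|words) P(words) into a sum.

```python
import math

# Hypothetical scores for one utterance: log P(signal|words) from an acoustic
# model and log P(words) from a language model. All values are invented.
candidates = {
    "recognize speech": {"acoustic": math.log(0.020), "language": math.log(0.010)},
    "wreck a nice beach": {"acoustic": math.log(0.025), "language": math.log(0.001)},
    "recognize peach": {"acoustic": math.log(0.010), "language": math.log(0.002)},
}

def decode(cands):
    """Return the word sequence that maximizes P(signal|words) * P(words)."""
    # P(signal) is identical for every candidate, so it is not needed here.
    return max(cands, key=lambda w: cands[w]["acoustic"] + cands[w]["language"])

best = decode(candidates)
print(best)
```

Note that the second candidate has the best acoustic score on its own; the language model term is what changes the outcome in this made-up example.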

Page 3: Acoustic / Lexical Model

Variability

Variation: speaker, pronunciation, environmental, context

A static acoustic model will not work in real applications.

Dynamically adapt P(signal|words) while using the system.

Page 4: Acoustic / Lexical Model

Measuring errors (1)

500 sentences of 6 to 10 words each, from 5 to 10 different speakers.

10% relative error reduction

Training set / Development set

First decide optimal parameter settings.
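
The slide does not define "relative error reduction". A common reading, assumed here, is the drop in error rate divided by the baseline error rate. The function name and the 20% / 18% figures below are hypothetical.

```python
def relative_error_reduction(baseline_error, new_error):
    """Reduction of new_error relative to baseline_error, in percent."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error

# Hypothetical example: the error rate drops from 20% to 18%.
reduction = relative_error_reduction(20.0, 18.0)
print(f"{reduction:.1f}% relative error reduction")
```

Under this definition a 2-point absolute improvement on a 20% baseline counts as a 10% relative reduction.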

Page 5: Acoustic / Lexical Model

Measuring errors (2)

Word recognition errors: substitution, deletion, insertion

Correct: Did mob mission area of the Copeland ever go to m4 in nineteen eighty one?

Recognized: Did mob mission area ** the copy land ever go to m4 in nineteen east one?

Page 6: Acoustic / Lexical Model

Measuring errors (3)

Correct: The effect is clear
Recognised: Effect is not clear

Error rate, one by one: 75%

Word error rate = 100% x (Subs + Dels + Ins) / (#words in correct sentence)

Word error rate: 100% x (0 + 1 + 1) / 4 = 50% (one deletion, "The"; one insertion, "not")
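
The two error measures on this slide can be compared with a small script. This is a sketch, assuming the usual word-level edit distance (dynamic programming) to count substitutions, deletions and insertions together. The function names are made up for this example.

```python
def word_errors(reference, hypothesis):
    """Return (minimum number of word edits, number of reference words)."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dist[i][j]: edits needed to turn the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[-1][-1], len(ref)

def word_error_rate(reference, hypothesis):
    """100% x (Subs + Dels + Ins) / (#words in correct sentence)."""
    errors, n_ref = word_errors(reference, hypothesis)
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return 100.0 * errors / n_ref

def one_by_one_error_rate(reference, hypothesis):
    """Compare words position by position, without any alignment."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    wrong = sum(r != h for r, h in zip(ref, hyp)) + abs(len(ref) - len(hyp))
    return 100.0 * wrong / len(ref)

correct = "The effect is clear"
recognised = "Effect is not clear"
print(one_by_one_error_rate(correct, recognised))
print(word_error_rate(correct, recognised))
```

The position-by-position comparison penalizes every word after the missing "The", while the aligned measure only counts the two actual edits.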

Page 7: Acoustic / Lexical Model

Units of speech (1)

Modeling is language dependent.

A modeling unit should be: accurate, trainable, generalizable.

Page 8: Acoustic / Lexical Model

Units of speech (2)

Whole-word models: only suitable for small-vocabulary recognition.

Phone models: suitable for large-vocabulary recognition. Problem: they over-generalize and are therefore less accurate.

Syllable models

Page 9: Acoustic / Lexical Model

Context dependency (1)

Recognition accuracy can be improved by using context-dependent parameters.

This is important in fast / spontaneous speech.

Example: the phoneme /ee/

Page 10: Acoustic / Lexical Model

Example words: peat, wheel

Page 11: Acoustic / Lexical Model

Context dependency (2)

Triphone model: a phonetic model that takes into consideration both the left and the right neighbouring phones.

If two phones have the same identity but different left or right contexts, they are considered different triphones.

Interword context-dependent phones.

Place in the word: beginning, middle, end
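
The triphone definition above can be illustrated with the words "peat" and "wheel". This is a minimal sketch: the phone symbols ("iy" for /ee/), the "sil" boundary marker and the left-center+right label format are assumptions chosen for the example, not taken from the slides.

```python
def to_triphones(phones, boundary="sil"):
    """Label each phone with its left and right neighbour as 'left-center+right'."""
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

# Assumed phone transcriptions, for illustration only.
peat = to_triphones(["p", "iy", "t"])
wheel = to_triphones(["w", "iy", "l"])

print(peat)
print(wheel)
# The same phone "iy" yields two different triphones because its contexts differ.
print(peat[1] != wheel[1])
```

Both words contain the same center phone, yet a triphone model would train separate parameters for each of the two contexts.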

Page 12: Acoustic / Lexical Model

Context dependency (3)

Stress: longer duration, higher pitch, more intensity

Word-level stress: IMport – imPORT, ITaly – iTALian

Sentence-level stress: I did have dinner. I did have dinner. (same words, a different word emphasized)

Page 13: Acoustic / Lexical Model

Example word: radio

Page 14: Acoustic / Lexical Model

Context dependency (4)

Very many triphones: 50^3 = 125,000

Many phonemes have the same effects:
/b/ & /p/: labial (pronounced using the lips)
/r/ & /w/: liquids

Clustered acoustic-phonetic units
Is the left-context phone a fricative?
Is the right-context phone a front vowel?
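
The count and the two questions on this slide can be turned into a toy clustering. This is a sketch under stated assumptions: the phone classes below are small hypothetical sets, and the questions are fixed by hand, whereas a real system would typically choose questions from training data (for example with a decision tree).

```python
from collections import defaultdict

NUM_PHONES = 50
print(NUM_PHONES ** 3)  # number of possible triphones with 50 phones

# Hypothetical phone classes, for illustration only.
FRICATIVES = {"f", "v", "s", "z", "sh", "th"}
FRONT_VOWELS = {"iy", "ih", "eh", "ae"}

# The two questions from the slide, asked about a (left, right) context.
QUESTIONS = [
    lambda left, right: left in FRICATIVES,     # left-context phone a fricative?
    lambda left, right: right in FRONT_VOWELS,  # right-context phone a front vowel?
]

def cluster(triphones):
    """Group (left, center, right) triphones that give the same answers."""
    groups = defaultdict(list)
    for left, center, right in triphones:
        answers = tuple(q(left, right) for q in QUESTIONS)
        groups[(center, answers)].append((left, center, right))
    return dict(groups)

triphones = [("s", "t", "iy"), ("f", "t", "ih"), ("m", "t", "iy"), ("s", "t", "ao")]
clusters = cluster(triphones)
for key, members in clusters.items():
    print(key, members)
```

Triphones that land in the same group would share one set of model parameters, which reduces the number of units that need training data.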

Page 15: Acoustic / Lexical Model

Acoustic model

After feature extraction, we have a sequence of feature vectors, such as the MFCC vector, as input data.

Feature stream -> Phonemes / units: segmentation and labeling

Phonemes / units -> Words: lexical access problem

Page 16: Acoustic / Lexical Model

Acoustic model

Signal -> Phonemes

Problem: phonemes can be pronounced differently:
Speaker differences
Speaking rate
Microphone

Page 17: Acoustic / Lexical Model

Acoustic model

Phonemes -> Words

The three major ways to do this: Vector Quantization, Hidden Markov Models, Neural Networks
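
Of the three approaches, vector quantization is the shortest to sketch: each feature vector is replaced by the index of its nearest entry in a codebook. The example below assumes a hypothetical 2-dimensional codebook with invented values; real feature vectors such as MFCCs have more dimensions, and the codebook would be learned from data.

```python
def nearest_codeword(vector, codebook):
    """Index of the codebook entry closest to vector (squared Euclidean distance)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: squared_distance(vector, codebook[k]))

# Hypothetical codebook and feature stream, for illustration only.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
features = [(0.1, -0.2), (0.9, 1.2), (3.5, 0.4), (1.1, 0.8)]

# The continuous feature stream becomes a sequence of discrete symbols.
symbols = [nearest_codeword(f, codebook) for f in features]
print(symbols)
```

The resulting symbol sequence is what a discrete model would then match against phoneme or word templates.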

Page 18: Acoustic / Lexical Model

Acoustic model

Problem: multiple pronunciations

Figure: pronunciation network over the phones t, ow / ax, m, ey / aa, t, ow. Branch labels: dialect variation, coarticulation. Branch probabilities shown: 0.5 / 0.5, 0.8 / 0.2, 0.5 / 0.5.
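
A pronunciation network of this kind can be represented as a list of alternatives per position, and the probability of each full pronunciation is the product of its branch probabilities. This is a sketch only: the assignment of probabilities to branches below is an illustrative assumption, since the transcript does not preserve which value belongs to which branch.

```python
from itertools import product

# Each position lists (phone, probability) alternatives.
# The probability values are assumptions for illustration.
NETWORK = [
    [("t", 1.0)],
    [("ow", 0.5), ("ax", 0.5)],
    [("m", 1.0)],
    [("ey", 0.8), ("aa", 0.2)],
    [("t", 1.0)],
    [("ow", 1.0)],
]

def pronunciations(network):
    """Map each path through the network to its probability."""
    result = {}
    for path in product(*network):
        phones = " ".join(phone for phone, _ in path)
        probability = 1.0
        for _, branch_probability in path:
            probability *= branch_probability
        result[phones] = probability
    return result

prons = pronunciations(NETWORK)
for phones, probability in sorted(prons.items(), key=lambda item: -item[1]):
    print(f"{probability:.2f}  {phones}")
```

With two binary branch points the network encodes four pronunciations, and their probabilities sum to one.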

Page 19: Acoustic / Lexical Model
Page 20: Acoustic / Lexical Model

The End