robust speech recognition

8/2/2019 Robust Speech Recognition


Robust Speech Recognition



Mismatch Between Training and Testing

Mismatch influences scores

Causes of mismatch: speech variation, inter-speaker variation



Robust Approaches

Three categories:

noise resistant features (Speech var.)

speech enhancement (Speech var. + Inter-speaker var.)

model adaptation for noise (Speech var. + Inter-speaker var.)

[Figure: block diagram of the recognition system. Labels: Spk. A, Spk. B, encoding, Features, Models, training, testing, Recognition system, Word sequence]



Contents

Overview

Noise resistant features

Speech enhancement

Model adaptation

Stochastic Matching

Our current work



Noise resistant features

Acoustic representation: emphasis on less affected evidence

Auditory system inspired models: filter banks, loudness curve, lateral inhibition

Slow variation removal: Cepstral Mean Normalization, time derivatives

Linear Discriminant Analysis: searches for the best parameterization
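
Slow variation removal can be illustrated with a minimal sketch, assuming cepstra stored as a (frames, coefficients) NumPy array. The function names are hypothetical, and the time derivative is approximated by a simple central difference rather than any particular regression window.

```python
import numpy as np

def cepstral_mean_normalization(cepstra):
    """Subtract the per-utterance mean of each cepstral coefficient.

    A constant channel distortion is additive in the cepstral domain,
    so removing the mean over time removes it.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def delta(cepstra):
    """First-order time derivative, approximated by a central difference."""
    cepstra = np.asarray(cepstra, dtype=float)
    # Repeat the first and last frames so the output keeps the same length.
    padded = np.pad(cepstra, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

# Toy check: a constant offset (slow variation) leaves both features unchanged.
clean = np.array([[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]])
shifted = clean + np.array([5.0, -1.0])
cmn_clean = cepstral_mean_normalization(clean)
cmn_shifted = cepstral_mean_normalization(shifted)
```

Both features are insensitive to a constant additive offset, which is why they are listed under slow variation removal.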



Speech enhancement

Parameter mapping: stereo data, observation subspace

Bayesian estimation: stochastic modeling of speech and noise

Template based estimation: restriction to a subspace, output is noise free, various templates and combination methods

Spectral subtraction: noise and speech uncorrelated, slowly varying noise
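
Spectral subtraction can be sketched as follows, assuming power spectra stored as a (frames, bins) array. The function name, the choice of the first frames as a speech-free noise estimate, and the spectral floor are illustrative assumptions, not details given on the slide.

```python
import numpy as np

def spectral_subtraction(noisy_power, noise_frames=5, floor=0.01):
    """Subtract an estimated noise power spectrum from every frame.

    With uncorrelated speech and noise the power spectra add, so the clean
    power is approximately noisy power minus noise power. The noise is
    assumed to vary slowly, so one average estimate is used for all frames.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    # Assumption: the first `noise_frames` frames contain noise only.
    noise_power = noisy_power[:noise_frames].mean(axis=0)
    cleaned = noisy_power - noise_power
    # Keep a small fraction of the noisy power to avoid negative values.
    return np.maximum(cleaned, floor * noisy_power)

# Toy example: stationary noise of power 2, speech of power 3 from frame 5 on.
noise = np.full((10, 4), 2.0)
speech = np.zeros((10, 4))
speech[5:] = 3.0
enhanced = spectral_subtraction(speech + noise)
```

The floor is one common way to limit the artifacts caused by over-subtraction; other choices are possible.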



Model Adaptation for Noise

Decomposition of HMM or PMC: Viterbi algorithm searches in an N x M state HMM, noise and speech simultaneously recognized, complex noises recognized

State-dependent Wiener filtering: Wiener filtering in the spectral domain faces non-stationarity, HMMs divide speech into quasi-stationary segments, Wiener filters specific to the state

Discriminative training: classical technique trains models independently, error corrective training, minimum classification error training

Training data contamination: training set corrupted with noisy speech, depends on the test environment, lower discriminative scores
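
As an illustration of model combination, here is a minimal sketch of the log-add approximation often used with PMC, assuming Gaussian means expressed in the log-spectral domain. The function name and the gain parameter are hypothetical, and the slide itself does not say which PMC variant was used.

```python
import numpy as np

def pmc_log_add(speech_log_mean, noise_log_mean, gain=1.0):
    """Combine a clean-speech mean and a noise mean in the log-spectral domain.

    Speech and noise are assumed additive in the linear spectral domain,
    so the means are exponentiated, summed, and mapped back with a log.
    Variances are left unchanged in this approximation.
    """
    speech_log_mean = np.asarray(speech_log_mean, dtype=float)
    noise_log_mean = np.asarray(noise_log_mean, dtype=float)
    return np.log(gain * np.exp(speech_log_mean) + np.exp(noise_log_mean))

# Toy example: linear speech means (3, 1) combined with a noise mean of 1.
noisy_mean = pmc_log_add(np.log([3.0, 1.0]), np.log([1.0, 1.0]))
```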




Stochastic Matching : Introduction

General framework

in feature space

in model space



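Stochastic Matching : General framework

HMM models $\Lambda_X$, with $X$ the training space

$Y = \{y_1, \ldots, y_T\}$: observation in testing space

$X = F_\nu(Y)$ and $\Lambda_Y = G_\eta(\Lambda_X)$

$(\nu', W') = \arg\max_{(\nu, W)} p(Y, W \mid \nu, \Lambda_X)$

$p(x \mid i, \Lambda_X) = \sum_{j=1}^{M} w_{i,j} \, N(x; \mu_{i,j}, C_{i,j})$

$\nu' = \arg\max_{\nu} \sum_{S} \sum_{C} p(Y, S, C \mid \nu, \Lambda_X)$

[Figure: iterative search alternating $\max_{W} p(Y, W \mid \nu, \Lambda_X)$ and $\max_{\nu} p(Y, W \mid \nu, \Lambda_X)$]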



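Stochastic Matching : In Feature Space

Estimation step : Auxiliary function

$Q(\nu' \mid \nu) = E\left[\log p(Y, S, C \mid \nu', \Lambda_X) \mid Y, \nu, \Lambda_X\right]$

$Q(\nu' \mid \nu) = \sum_{S} \sum_{C} p(Y, S, C \mid \nu, \Lambda_X) \, \log p(Y, S, C \mid \nu', \Lambda_X)$

Maximization step

$\nu' = \arg\max_{\nu'} Q(\nu' \mid \nu)$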



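Stochastic Matching : In Feature Space (2)

Simple distortion function

$x_{t,i} = F_\nu(y_{t,i}) = y_{t,i} - b_i$

Computation of the simple bias

$b_i' = \dfrac{\sum_{t=1}^{T} \sum_{n=1}^{N} \sum_{m=1}^{M} \gamma_t(n,m) \, \dfrac{y_{t,i} - \mu_{n,m,i}}{\sigma^2_{n,m,i}}}{\sum_{t=1}^{T} \sum_{n=1}^{N} \sum_{m=1}^{M} \dfrac{\gamma_t(n,m)}{\sigma^2_{n,m,i}}}$

where $\gamma_t(n,m)$ is the probability of being in state $n$ and Gaussian component $m$ at time $t$, and $\mu_{n,m,i}$, $\sigma^2_{n,m,i}$ are the mean and variance of that Gaussian in dimension $i$.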
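
The bias re-estimation can be sketched as an EM loop. This is a simplified illustration, assuming a single-state model (one diagonal-covariance Gaussian mixture) in place of a full HMM, so the state and component indices collapse into one component index. All names and the synthetic data are hypothetical.

```python
import numpy as np

def component_posteriors(x, weights, means, variances):
    """Posterior probability of each diagonal Gaussian for every frame of x."""
    diff = x[:, None, :] - means[None, :, :]              # (frames, comps, dims)
    log_p = (np.log(weights)[None, :]
             - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
             - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2))
    log_p -= log_p.max(axis=1, keepdims=True)             # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def estimate_bias(y, weights, means, variances, iterations=10):
    """EM estimate of a constant additive bias b such that x = y - b."""
    b = np.zeros(y.shape[1])
    for _ in range(iterations):
        # Estimation step: occupation probabilities given the current bias.
        gamma = component_posteriors(y - b, weights, means, variances)
        # Maximization step: variance-weighted average of (y - mean).
        inv_var = gamma[:, :, None] / variances[None, :, :]
        num = np.sum(inv_var * (y[:, None, :] - means[None, :, :]), axis=(0, 1))
        den = np.sum(inv_var, axis=(0, 1))
        b = num / den
    return b

# Synthetic check: frames drawn from the model, then shifted by a known bias.
rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])
means = np.array([[-3.0, 0.0], [3.0, 1.0]])
variances = np.full((2, 2), 0.25)
true_bias = np.array([0.5, -0.4])
labels = rng.integers(0, 2, size=500)
x = means[labels] + rng.normal(size=(500, 2)) * np.sqrt(variances[labels])
y = x + true_bias
b_hat = estimate_bias(y, weights, means, variances)
```

With well separated components the estimate should land close to the true bias; with a real HMM the weights would come from state and component occupation probabilities instead.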



Stochastic Matching : In Model Space

Random additive bias sequence $B = \{b_1, \ldots, b_T\}$, independent of speech

Stochastic process of mean $\mu_b$ and diagonal covariance $\Sigma_b$

$\mu_{Y,n,m} = \mu_{X,n,m} + \mu_b$

$\sigma^2_{Y,n,m} = \sigma^2_{X,n,m} + \sigma^2_b$
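
In model space, the additive bias shifts every Gaussian mean and widens every variance. A minimal sketch, assuming diagonal covariances stored as (components, dimensions) arrays; the function name and the numbers are hypothetical.

```python
import numpy as np

def adapt_model(means_x, variances_x, bias_mean, bias_variance):
    """Map training-space Gaussians to the testing space.

    The bias is independent of speech, so means add and variances add.
    """
    means_y = np.asarray(means_x, dtype=float) + np.asarray(bias_mean, dtype=float)
    variances_y = (np.asarray(variances_x, dtype=float)
                   + np.asarray(bias_variance, dtype=float))
    return means_y, variances_y

# Toy example: two Gaussians in two dimensions.
means_y, variances_y = adapt_model(
    [[0.0, 1.0], [2.0, 3.0]],
    [[1.0, 1.0], [0.5, 2.0]],
    bias_mean=[0.5, -0.5],
    bias_variance=[0.1, 0.2],
)
```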



On-Line Frame-Synchronous Noise Compensation

Relies on the stochastic matching method

Transformation parameter estimated along with the optimal path

Uses forward probabilities

[Figure: sequence of observations (y2, y3, y4), bias computation (b1, b2, b3, b4), transformed observations (z2, z3, z4, z5) passed to recognition (reco)]



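Theoretical framework and issue

On-line, frame-synchronous

Cascade of errors

$b_{t,i} = \dfrac{\sum_{\tau=1}^{t} \sum_{n=1}^{N} \sum_{m=1}^{M} \gamma_\tau(n,m) \, \dfrac{y_{\tau,i} - \mu_{n,m,i}}{\sigma^2_{n,m,i}}}{\sum_{\tau=1}^{t} \sum_{n=1}^{N} \sum_{m=1}^{M} \dfrac{\gamma_\tau(n,m)}{\sigma^2_{n,m,i}}}$

1. Initialize the bias of the first frame: $b_0 = 0$

2. Compute $\gamma_t(n,m)$ and then $b_t$

3. Transform the next frame with $b_t$

4. Go to the next frame

Classical Stochastic Matching

$b_i' = \dfrac{\sum_{t=1}^{T} \sum_{n=1}^{N} \sum_{m=1}^{M} \gamma_t(n,m) \, \dfrac{y_{t,i} - \mu_{n,m,i}}{\sigma^2_{n,m,i}}}{\sum_{t=1}^{T} \sum_{n=1}^{N} \sum_{m=1}^{M} \dfrac{\gamma_t(n,m)}{\sigma^2_{n,m,i}}}$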
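
The frame-synchronous procedure can be sketched as a running accumulation. This is an illustration only: it assumes a single-state Gaussian mixture, so the per-frame weights are plain component posteriors standing in for the forward probabilities of a real HMM decoder. All names and the toy data are hypothetical.

```python
import numpy as np

def online_bias_compensation(y, weights, means, variances):
    """Transform each frame with the bias estimated from the previous frames."""
    y = np.asarray(y, dtype=float)
    frames, dims = y.shape
    b = np.zeros(dims)                      # step 1: bias of first frame is 0
    num = np.zeros(dims)                    # running numerator of the bias
    den = np.zeros(dims)                    # running denominator of the bias
    z = np.empty_like(y)
    biases = np.empty_like(y)
    for t in range(frames):
        z[t] = y[t] - b                     # step 3: transform with current bias
        # Step 2a: component weights for the transformed frame (stand-in for
        # forward probabilities; terms common to all components are dropped).
        diff = z[t] - means
        log_p = np.log(weights) - 0.5 * np.sum(
            np.log(variances) + diff ** 2 / variances, axis=1)
        gamma = np.exp(log_p - log_p.max())
        gamma /= gamma.sum()
        # Step 2b: update the accumulators and the bias with frames 1..t.
        inv_var = gamma[:, None] / variances
        num += np.sum(inv_var * (y[t] - means), axis=0)
        den += np.sum(inv_var, axis=0)
        b = num / den
        biases[t] = b                       # step 4: go to the next frame
    return z, biases

# Toy example: frames sit on the component means, shifted by a bias of 0.5.
weights = np.array([0.5, 0.5])
means = np.array([[-3.0], [3.0]])
variances = np.ones((2, 1))
x = np.array([[-3.0], [3.0], [-3.0], [3.0], [-3.0], [3.0]])
y = x + 0.5
z, biases = online_bias_compensation(y, weights, means, variances)
```

Because each bias depends on weights computed from earlier transformed frames, an early misestimate propagates to later frames, which is the cascade-of-errors issue noted on the slide.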



Viterbi Hypothesis vs Linear Combination

Viterbi hypothesis: takes into account only the most probable state and Gaussian component

Linear combination

[Figure: states at frames t and t+1]
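
The two weighting schemes can be contrasted with a short sketch, assuming the scores for one frame are stored as a (states, components) array of non-negative values. The function name and the mode strings are hypothetical, and the linear combination is interpreted here as weighting every state and component by its normalized score.

```python
import numpy as np

def occupation_weights(scores, mode="linear"):
    """Weights given to each (state, component) pair in the bias update."""
    scores = np.asarray(scores, dtype=float)
    if mode == "viterbi":
        # Viterbi hypothesis: all the weight goes to the single best pair.
        hard = np.zeros_like(scores)
        best = np.unravel_index(np.argmax(scores), scores.shape)
        hard[best] = 1.0
        return hard
    # Linear combination: every pair contributes in proportion to its score.
    return scores / scores.sum()

# Toy example with two states and two components.
scores = np.array([[0.1, 0.6], [0.2, 0.1]])
hard_weights = occupation_weights(scores, mode="viterbi")
soft_weights = occupation_weights(scores, mode="linear")
```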



Experiments

Phone numbers in a running car

Forced Align

transcription + optimum path

Free Align

optimum path

Wild Align

no data

Viterbi LC Viterbi LC Viterbi LC

Word Accuracy 84,47 87,53 87,61 86,04 88,41 85,03 87,16 87,81 84,95

Forced Align Free Align Wild Align baseline MCR PMC



Perspectives

Error recovery problem

a forgetting process

a model of distortion function

environmental clues

More elaborate transform
