Robust Speech Recognition

Upload: ashish-katiyar

Post on 06-Apr-2018


TRANSCRIPT

  • Slide 1/17: Robust Speech Recognition

  • Slide 2/17: Mismatch Between Training and Testing
    The mismatch between training and testing conditions degrades the recognition scores.
    Causes of mismatch:
    speech variation
    inter-speaker variation

  • Slide 3/17: Robust Approaches
    Three categories:
    noise-resistant features (speech variation)
    speech enhancement (speech variation + inter-speaker variation)
    model adaptation for noise (speech variation + inter-speaker variation)

    [Diagram: recognition system — speech from Spk. A and Spk. B is encoded into features; training builds the models; testing outputs a word sequence]

  • Slide 4/17: Contents
    Overview
    Noise-resistant features
    Speech enhancement
    Model adaptation
    Stochastic matching
    Our current work

  • Slide 5/17: Noise-Resistant Features
    Acoustic representation: emphasis on the evidence least affected by noise
    Auditory-system-inspired models: filter banks, loudness curve, lateral inhibition
    Slow variation removal: cepstral mean normalization, time derivatives
    Linear Discriminant Analysis: searches for the best parameterization
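
    As an illustration of the slow-variation-removal techniques above, here is a minimal sketch of cepstral mean normalization and delta (time-derivative) features, assuming a NumPy array of cepstral frames; the array shapes and the regression window length are illustrative, not taken from the slides.

```python
import numpy as np

def cmn(cepstra):
    """Cepstral mean normalization: subtract the per-utterance mean of each
    cepstral coefficient to remove slowly varying channel effects."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def deltas(cepstra, window=2):
    """Simple time derivatives (delta features) by linear regression
    over +/- `window` frames."""
    T, _ = cepstra.shape
    padded = np.pad(cepstra, ((window, window), (0, 0)), mode="edge")
    num = sum(k * (padded[window + k:window + k + T] - padded[window - k:window - k + T])
              for k in range(1, window + 1))
    den = 2 * sum(k * k for k in range(1, window + 1))
    return num / den

# Example: 200 frames of 13 cepstral coefficients (random stand-in for real data)
c = np.random.randn(200, 13)
c_norm = cmn(c)
c_delta = deltas(c_norm)
```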

  • Slide 6/17: Speech Enhancement
    Parameter mapping: requires stereo data; works in the observation subspace
    Bayesian estimation: stochastic modelling of speech and noise
    Template-based estimation: restriction to a subspace; output is noise-free; various templates and combination methods
    Spectral subtraction: assumes noise and speech are uncorrelated and the noise varies slowly
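
    The spectral subtraction idea above can be sketched in a few lines: estimate the noise spectrum from the first frames (slowly varying noise) and subtract it from the noisy magnitudes (speech and noise uncorrelated). The function below is a generic textbook version with illustrative parameter values, not the specific enhancer used in this work.

```python
import numpy as np

def spectral_subtraction(noisy_stft_mag, noise_frames=10, floor=0.01, oversub=1.0):
    """Basic magnitude spectral subtraction on a (frames, bins) STFT magnitude array.

    The noise spectrum is estimated as the average of the first `noise_frames`
    frames, assumed to contain noise only."""
    noise_est = noisy_stft_mag[:noise_frames].mean(axis=0)   # average noise magnitude
    clean = noisy_stft_mag - oversub * noise_est             # subtract the noise estimate
    return np.maximum(clean, floor * noisy_stft_mag)         # spectral floor avoids negative magnitudes

# Example on a random magnitude spectrogram (stand-in for a real STFT)
mag = np.abs(np.random.randn(100, 257))
enhanced = spectral_subtraction(mag)
```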

  • Slide 7/17: Model Adaptation for Noise
    Decomposition of HMMs or PMC: the Viterbi algorithm searches a combined NxM-state HMM; noise and speech are recognized simultaneously; complex noises can be recognized
    State-dependent Wiener filtering: Wiener filtering in the spectral domain faces non-stationarity; HMMs divide speech into quasi-stationary segments; the Wiener filters are specific to each state
    Discriminative training: the classical technique trains models independently; error-corrective training, minimum classification error training
    Training-data contamination: the training set is corrupted with noisy speech; depends on the test environment; yields lower discriminative scores
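
    To make the PMC bullet concrete, below is a minimal sketch of the usual log-normal approximation: clean-speech and noise Gaussians in the log-spectral domain are mapped to the linear domain, added, and mapped back. It works on diagonal-covariance parameters and omits the cepstral DCT step of a full PMC implementation; all names and values are illustrative.

```python
import numpy as np

def pmc_lognormal(mu_x, var_x, mu_n, var_n, gain=1.0):
    """Combine a clean-speech Gaussian (mu_x, var_x) with a noise Gaussian
    (mu_n, var_n), both in the log-spectral domain with diagonal covariance,
    using the log-normal approximation of Parallel Model Combination."""
    # Log-normal moments in the linear spectral domain
    m_x = np.exp(mu_x + var_x / 2.0)
    v_x = m_x ** 2 * (np.exp(var_x) - 1.0)
    m_n = np.exp(mu_n + var_n / 2.0)
    v_n = m_n ** 2 * (np.exp(var_n) - 1.0)

    # Speech and noise are assumed additive and independent in the linear domain
    m_y = m_x + gain * m_n
    v_y = v_x + gain ** 2 * v_n

    # Back to the log-spectral domain, again assuming a log-normal shape
    var_y = np.log(v_y / m_y ** 2 + 1.0)
    mu_y = np.log(m_y) - var_y / 2.0
    return mu_y, var_y

# Example with arbitrary 8-bin log-spectral parameters
mu_y, var_y = pmc_lognormal(np.zeros(8), 0.5 * np.ones(8),
                            -1.0 * np.ones(8), 0.2 * np.ones(8))
```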

  • Slide 8/17: Stochastic Matching: Introduction
    General framework
    In feature space
    In model space

  • Slide 9/17: Stochastic Matching: General Framework
    $\Lambda_X$: HMM models trained in the training space $X$.
    $Y = \{y_1, \dots, y_T\}$: observations in the testing space, with $X = F_\nu(Y)$ in feature space and $\Lambda_Y = G_\eta(\Lambda_X)$ in model space.
    Joint maximum-likelihood estimation of the distortion parameters and the word sequence:
    $(\nu', W') = \arg\max_{\nu, W}\, p(Y \mid \nu, W, \Lambda_X)$
    State observation densities are Gaussian mixtures:
    $p(x \mid i, \Lambda_X) = \sum_{j=1}^{M} w_{i,j}\, N(x; \mu_{i,j}, C_{i,j})$
    Summing over state sequences $S$ and component sequences $C$:
    $\nu' = \arg\max_{\nu} \sum_{S,C} p(Y, S, C \mid \nu, \Lambda_X)$
    The estimation alternates between $\max_W p(Y \mid \nu, W, \Lambda_X)$ for a fixed $\nu$ and $\max_\nu p(Y \mid \nu, W, \Lambda_X)$ for a fixed $W$.

  • Slide 10/17: Stochastic Matching: In Feature Space
    Estimation step, auxiliary function (expected complete-data log-likelihood):
    $Q(\nu' \mid \nu) = \sum_{S,C} p(Y, S, C \mid \nu, \Lambda_X)\, \log p(Y, S, C \mid \nu', \Lambda_X)$
    Maximization step:
    $\nu' = \arg\max_{\nu'} Q(\nu' \mid \nu)$
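
    To make the estimation step concrete, the sketch below computes per-frame state/component posteriors for diagonal-covariance Gaussian mixtures, i.e. the occupation probabilities that weight the maximization step. For brevity it ignores HMM transition probabilities (a simplification of the full forward-backward computation), and all array names are ours.

```python
import numpy as np

def log_gauss_diag(y, mu, var):
    """Log density of y under diagonal-covariance Gaussians (broadcasting)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (y - mu) ** 2 / var, axis=-1)

def posteriors(Y, weights, mu, var):
    """gamma[t, n, m] ~ p(state n, component m | y_t), ignoring transitions.

    Y: (T, D) observations; weights: (N, M) mixture weights;
    mu, var: (N, M, D) Gaussian means and variances."""
    T = Y.shape[0]
    log_joint = np.log(weights)[None] + log_gauss_diag(Y[:, None, None, :], mu, var)  # (T, N, M)
    log_norm = np.logaddexp.reduce(log_joint.reshape(T, -1), axis=1)
    return np.exp(log_joint - log_norm[:, None, None])

# Toy example: 50 frames, 3 states, 2 components, 13-dimensional features
T, N, M, D = 50, 3, 2, 13
Y = np.random.randn(T, D)
w = np.full((N, M), 1.0 / M)
mu = np.random.randn(N, M, D)
var = np.ones((N, M, D))
gamma = posteriors(Y, w, mu, var)   # shape (T, N, M); each frame sums to 1 over (N, M)
```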

  • Slide 11/17: Stochastic Matching: In Feature Space (2)
    Simple distortion function:
    $x_{t,i} = F_\nu(y_{t,i}) = y_{t,i} - b_i$
    Computation of the simple bias:
    $b_i' = \dfrac{\sum_{t=1}^{T} \sum_{n=1}^{N} \sum_{m=1}^{M} \gamma_t(n,m)\, \frac{y_{t,i} - \mu_{n,m,i}}{\sigma_{n,m,i}^2}}{\sum_{t=1}^{T} \sum_{n=1}^{N} \sum_{m=1}^{M} \frac{\gamma_t(n,m)}{\sigma_{n,m,i}^2}}$
    where $\gamma_t(n,m)$ is the posterior probability of state $n$ and Gaussian component $m$ at frame $t$.
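
    A direct implementation of the bias re-estimation formula above, assuming the occupation probabilities $\gamma_t(n,m)$ have already been computed (for instance as in the sketch after slide 10/17); shapes and names are illustrative.

```python
import numpy as np

def reestimate_bias(Y, gamma, mu, var):
    """Closed-form update of the additive bias b in x_t = y_t - b.

    Y:       (T, D) noisy observations
    gamma:   (T, N, M) posteriors of state n, component m at frame t
    mu, var: (N, M, D) diagonal-Gaussian means and variances of the clean models"""
    # Numerator:   sum_{t,n,m} gamma_t(n,m) * (y_t - mu_nm) / var_nm   (per dimension)
    num = np.einsum('tnm,tnmd->d', gamma, (Y[:, None, None, :] - mu) / var)
    # Denominator: sum_{t,n,m} gamma_t(n,m) / var_nm
    den = np.einsum('tnm,nmd->d', gamma, 1.0 / var)
    return num / den

# Toy example consistent with the shapes above
T, N, M, D = 50, 3, 2, 13
Y = np.random.randn(T, D) + 0.5            # pretend a constant bias of 0.5
gamma = np.random.dirichlet(np.ones(N * M), size=T).reshape(T, N, M)
mu = np.zeros((N, M, D))
var = np.ones((N, M, D))
b = reestimate_bias(Y, gamma, mu, var)     # compensated features: X = Y - b
```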

  • Slide 12/17: Stochastic Matching: In Model Space
    Random additive bias sequence $B = \{b_1, \dots, b_T\}$, independent of speech, modeled as a stochastic process with mean $\mu_b$ and diagonal covariance $\Sigma_b$.
    $\mu_{Y,n,m} = \mu_{X,n,m} + \mu_b$
    $\sigma_{Y,n,m}^2 = \sigma_{X,n,m}^2 + \sigma_b^2$
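
    In model space the same compensation is applied to the models rather than to the features: every Gaussian mean is shifted by the bias mean and every variance is inflated by the bias variance, as in the relations above. A minimal sketch with illustrative names:

```python
import numpy as np

def adapt_models(mu_x, var_x, mu_b, var_b):
    """Model-space compensation for an additive bias B ~ N(mu_b, diag(var_b)),
    assumed independent of speech:
        mu_Y[n, m]  = mu_X[n, m]  + mu_b
        var_Y[n, m] = var_X[n, m] + var_b
    mu_x, var_x: (N, M, D) clean-model means and diagonal variances."""
    mu_y = mu_x + mu_b        # broadcast the (D,) bias mean over all Gaussians
    var_y = var_x + var_b     # broadcast the (D,) bias variance likewise
    return mu_y, var_y

# Example: shift 3x2 Gaussians of dimension 13 by an estimated bias
mu_y, var_y = adapt_models(np.zeros((3, 2, 13)), np.ones((3, 2, 13)),
                           0.5 * np.ones(13), 0.1 * np.ones(13))
```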

  • Slide 13/17: On-Line Frame-Synchronous Noise Compensation
    Relies on the stochastic matching method: the transformation parameter is estimated along the optimal path.
    Uses forward probabilities.
    [Diagram: frame-synchronous processing — each observation y_t is transformed into z_t with the current bias b_t, fed to the recognizer, and used to update the bias for the next frame]

  • Slide 14/17: Theoretical Framework and Issue
    On-line frame-synchronous estimation. Issue: a possible cascade of errors, since each bias estimate is built on previously compensated frames.
    $b_i^{(t)} = \dfrac{\sum_{\tau=1}^{t} \sum_{n=1}^{N} \sum_{m=1}^{M} \alpha_\tau(n,m)\, \frac{y_{\tau,i} - \mu_{n,m,i}}{\sigma_{n,m,i}^2}}{\sum_{\tau=1}^{t} \sum_{n=1}^{N} \sum_{m=1}^{M} \frac{\alpha_\tau(n,m)}{\sigma_{n,m,i}^2}}$
    where $\alpha_\tau(n,m)$ is the forward probability of state $n$ and Gaussian component $m$ at frame $\tau$.
    1. Initialize the bias of the first frame: $b^{(0)} = 0$
    2. Compute the forward probabilities, then $b$
    3. Transform the next frame with $b$
    4. Go to the next frame
    Classical stochastic matching:
    $b_i' = \dfrac{\sum_{t=1}^{T} \sum_{n=1}^{N} \sum_{m=1}^{M} \gamma_t(n,m)\, \frac{y_{t,i} - \mu_{n,m,i}}{\sigma_{n,m,i}^2}}{\sum_{t=1}^{T} \sum_{n=1}^{N} \sum_{m=1}^{M} \frac{\gamma_t(n,m)}{\sigma_{n,m,i}^2}}$
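
    The four steps above can be written as a small frame-synchronous loop: each incoming frame is compensated with the bias estimated from the frames already seen, and then used to update the running numerator and denominator of the bias estimate. The sketch below uses per-frame Gaussian posteriors in place of true forward probabilities, so it only approximates the method of the slides; all names are ours.

```python
import numpy as np

def log_gauss_diag(y, mu, var):
    """Log density of y under diagonal-covariance Gaussians (broadcasting)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (y - mu) ** 2 / var, axis=-1)

def frame_synchronous_bias(Y, weights, mu, var):
    """On-line bias compensation: transform frame t with the bias estimated
    from frames 1..t-1, then update the bias with the compensated frame."""
    T, D = Y.shape
    b = np.zeros(D)                      # step 1: bias of the first frame is 0
    num = np.zeros(D)
    den = np.zeros(D)
    X = np.empty_like(Y)
    for t in range(T):
        x = Y[t] - b                     # step 3: transform the current frame with the current bias
        X[t] = x
        # step 2: occupancy of each (state, component) for the compensated frame
        log_joint = np.log(weights) + log_gauss_diag(x, mu, var)        # (N, M)
        post = np.exp(log_joint - np.logaddexp.reduce(log_joint.ravel()))
        # accumulate the numerator and denominator of the bias estimate
        num += np.sum(post[..., None] * (Y[t] - mu) / var, axis=(0, 1))
        den += np.sum(post[..., None] / var, axis=(0, 1))
        b = num / den                    # updated bias, used for the next frame (step 4)
    return X, b

# Toy run with 3 states x 2 components of dimension 13
T, N, M, D = 50, 3, 2, 13
Y = np.random.randn(T, D) + 0.5
X, b = frame_synchronous_bias(Y, np.full((N, M), 0.5),
                              np.zeros((N, M, D)), np.ones((N, M, D)))
```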

  • Slide 15/17: Viterbi Hypothesis vs Linear Combination
    Viterbi hypothesis: take into account only the most probable state and Gaussian component.
    Linear combination: weight all states and Gaussian components by their probabilities.
    [Diagram: trellis of states between frames t and t+1]
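
    The two hypotheses differ only in how the per-frame statistics are accumulated: the Viterbi hypothesis keeps the single most probable (state, component) pair, while the linear combination weights every pair by its probability. A minimal sketch on precomputed posteriors, with illustrative names:

```python
import numpy as np

def bias_terms_viterbi(y, post, mu, var):
    """Viterbi hypothesis: use only the most probable state/component."""
    n, m = np.unravel_index(np.argmax(post), post.shape)
    return (y - mu[n, m]) / var[n, m], 1.0 / var[n, m]

def bias_terms_linear_combination(y, post, mu, var):
    """Linear combination: weight every state/component by its posterior."""
    num = np.sum(post[..., None] * (y - mu) / var, axis=(0, 1))
    den = np.sum(post[..., None] / var, axis=(0, 1))
    return num, den

# Toy comparison for one frame, 3 states x 2 components, 13 dimensions
N, M, D = 3, 2, 13
y = np.random.randn(D)
post = np.random.dirichlet(np.ones(N * M)).reshape(N, M)
mu, var = np.random.randn(N, M, D), np.ones((N, M, D))
num_v, den_v = bias_terms_viterbi(y, post, mu, var)
num_lc, den_lc = bias_terms_linear_combination(y, post, mu, var)
```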

  • Slide 16/17: Experiments
    Task: phone numbers recorded in a running car.
    Forced align: transcription + optimum path
    Free align: optimum path
    Wild align: no data

                         Forced Align          Free Align            Wild Align
                         Viterbi    LC         Viterbi    LC         Viterbi    LC         baseline   MCR        PMC
    Word accuracy (%)    84.47      87.53      87.61      86.04      88.41      85.03      87.16      87.81      84.95

  • Slide 17/17: Perspectives
    Error recovery problem:
    a forgetting process
    a model of the distortion function (environmental clues)
    More elaborate transforms