A Comparison of Front-End Configuration for Robust Speech Recognition

8/8/2019 http://slidepdf.com/reader/full/a-comparison-of-front-end-configuration-for-robust-speech

Author: Ben Milner
Professor: 陳嘉平
Reporter: 葉佳璋

Upload: chayan-ghosh

Post on 09-Apr-2018



  • Outline

    Introduction
    Static features
    Normalization Methods
    Temporal Information
    Experimental Results

  • Introduction

    Feature extraction is considered as comprising three different processing stages, namely static feature extraction, normalization and inclusion of temporal information.

    The aim of this work is to compare, both theoretically and experimentally, a number of the more popular techniques and identify which combinations work best.

  • Static feature extraction

    For speech recognition, the vocal tract component provides the best discrimination between speech sounds.

    Most feature extraction methods use cepstral analysis to extract this vocal tract component, in the form of cepstral features.

    The more successful of these methods also incorporate attributes of the psychophysical processes of hearing into the analysis.


  • Comparison of MFCC and PLP Analysis

    Spectral analysis:

    PLP and MFCC analysis both obtain a short-term power spectrum by applying a Fourier transform to a frame of Hamming-windowed speech, typically 20-32 ms in duration.
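The spectral analysis step can be sketched as follows. This is a minimal illustration rather than the configuration used in the paper: the 8 kHz sampling rate, the 1 kHz test tone and the function names are assumptions, and a naive DFT stands in for an FFT so that the example needs only the standard library.

```python
import cmath
import math


def hamming(n_samples):
    """Hamming window: 0.54 - 0.46 * cos(2*pi*i / (N - 1))."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n_samples - 1))
            for i in range(n_samples)]


def power_spectrum(frame):
    """Short-term power spectrum |X(k)|^2 of one Hamming-windowed frame.

    Returns bins 0 .. N/2 (the non-redundant half for a real signal).
    """
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


# Assumed setup: 8 kHz sampling and a 32 ms frame, i.e. 256 samples.
sample_rate = 8000
frame_len = round(0.032 * sample_rate)
frame = [math.sin(2.0 * math.pi * 1000.0 * i / sample_rate)
         for i in range(frame_len)]

spec = power_spectrum(frame)
peak_bin = max(range(len(spec)), key=spec.__getitem__)
peak_hz = peak_bin * sample_rate / frame_len
```

For the assumed 1 kHz tone the peak falls in the bin centred on 1 kHz, which is a quick sanity check that windowing and transform are wired up correctly.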

  • Comparison of MFCC and PLP Analysis (cont.)

    Critical band analysis:

    Both PLP and MFCC employ an auditory-based warping of the frequency axis derived from the frequency sensitivity of human hearing.

    MFCC analysis is based on uniform spacing along the Mel scale, while PLP uses the Bark scale.
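The slide does not state the warping formulas. The sketch below assumes two commonly quoted forms, mel = 2595 log10(1 + f/700) and Bark = 6 asinh(f/600), to show what "uniform spacing on a warped axis" means in practice. The function names, the 4 kHz upper limit and the band count are illustrative assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Assumed Mel warping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hz_to_bark(f_hz):
    """Assumed Bark warping: 6 * asinh(f / 600)."""
    return 6.0 * math.asinh(f_hz / 600.0)


def bark_to_hz(bark):
    return 600.0 * math.sinh(bark / 6.0)


def uniform_centres(f_max_hz, n_bands, to_warped, from_warped):
    """Centre frequencies (Hz) equally spaced on the warped axis."""
    step = to_warped(f_max_hz) / (n_bands + 1)
    return [from_warped(step * (i + 1)) for i in range(n_bands)]


mel_centres = uniform_centres(4000.0, 5, hz_to_mel, mel_to_hz)
bark_centres = uniform_centres(4000.0, 5, hz_to_bark, bark_to_hz)
```

On either scale the centres are equally spaced in warped units, so in Hz they sit close together at low frequencies and spread out at high frequencies.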


  • Both critical-band analysis and Mel filter analysis can be viewed as applying a set of basis functions to the power spectrum of the speech signal.
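Viewed this way, each filterbank channel is an inner product between one basis function and the power spectrum. The following sketch assumes triangular basis functions (the usual choice for Mel filterbanks; PLP critical-band curves have a different shape) and hypothetical bin positions chosen only for illustration.

```python
def triangular_filter(n_bins, left, centre, right):
    """Weights over spectrum bins: rise from left to centre, fall to right."""
    weights = [0.0] * n_bins
    for b in range(n_bins):
        if left < b <= centre:
            weights[b] = (b - left) / (centre - left)
        elif centre < b < right:
            weights[b] = (right - b) / (right - centre)
    return weights


def apply_filterbank(power_spec, filters):
    """One channel energy per basis function (inner product with the spectrum)."""
    return [sum(w * p for w, p in zip(f, power_spec)) for f in filters]


# Hypothetical 9-bin spectrum and three overlapping filters.
flat_spectrum = [1.0] * 9
filters = [triangular_filter(9, 0, 2, 4),
           triangular_filter(9, 2, 4, 6),
           triangular_filter(9, 4, 6, 8)]
energies = apply_filterbank(flat_spectrum, filters)
```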

  • Comparison of MFCC and PLP Analysis (cont.)

    Equal-loudness pre-emphasis:

    This stage compensates for the unequal sensitivity of human hearing across frequency.

    PLP analysis scales the critical-band amplitudes according to an equal-loudness pre-emphasis function, such as

    $$E(\omega) = \frac{(\omega^2 + 56.8 \times 10^6)\,\omega^4}{(\omega^2 + 6.3 \times 10^6)^2\,(\omega^2 + 0.38 \times 10^9)}$$

    In MFCC analysis, pre-emphasis is applied in the time domain; a typical implementation uses a first-order high-pass filter

    $$H(z) = 1 - a z^{-1}$$
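Both forms of pre-emphasis are short enough to sketch directly. The code below evaluates the equal-loudness function E(ω) with ω = 2πf and applies the first-order filter H(z) = 1 - a z^{-1} to a sample sequence. The coefficient value a = 0.97 is an assumption (the slide does not give it), as are the function names.

```python
import math


def equal_loudness(f_hz):
    """E(w) = (w^2 + 56.8e6) w^4 / ((w^2 + 6.3e6)^2 (w^2 + 0.38e9)), w = 2*pi*f."""
    w2 = (2.0 * math.pi * f_hz) ** 2
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))


def pre_emphasis(samples, a=0.97):
    """Time-domain pre-emphasis y[t] = x[t] - a * x[t-1].

    a = 0.97 is an assumed, commonly used value; x[-1] is taken as 0.
    """
    out = []
    previous = 0.0
    for x in samples:
        out.append(x - a * previous)
        previous = x
    return out


weights = [equal_loudness(f) for f in (100.0, 1000.0, 4000.0)]
flattened = pre_emphasis([1.0, 1.0, 1.0, 1.0])
```

Both operations attenuate low frequencies relative to high ones: E(ω) is applied as a weight on critical-band outputs, whereas the filter acts on the waveform before spectral analysis.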

  • Comparison of MFCC and PLP Analysis (cont.)

    Intensity-loudness power law:

    This processing stage models the non-linear relation between the intensity of a sound and its perceived loudness.

    In PLP analysis, cubic-root compression of the critical-band energies is used to implement this function.

    In MFCC analysis, logarithmic compression of the Mel filterbank channels is applied.
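The two compression rules differ only in the non-linearity applied to each channel energy, as this small sketch shows. The flooring constant in the logarithmic version is an assumption added to avoid log(0); it is not taken from the slides.

```python
import math


def cube_root_compress(energies):
    """PLP-style compression: loudness ~ intensity ** (1/3)."""
    return [e ** (1.0 / 3.0) for e in energies]


def log_compress(energies, floor=1e-10):
    """MFCC-style compression: natural log of each channel energy.

    `floor` is an assumed guard against taking the log of zero.
    """
    return [math.log(max(e, floor)) for e in energies]


channel_energies = [1.0, 8.0, 27.0, 1000.0]
plp_style = cube_root_compress(channel_energies)
mfcc_style = log_compress(channel_energies)
```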

  • Comparison of MFCC and PLP Analysis (cont.)

    AR modelling and cepstral analysis:

    MFCC analysis computes cepstral coefficients from the log Mel filterbank outputs using a discrete cosine transform.

    In PLP analysis, the critical-band spectrum is converted into a small number of LP coefficients through the application of an inverse DFT, which provides the autocorrelation coefficients.

    From the LP coefficients, cepstral coefficients are computed, and these form the final static feature vector.
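The two routes to cepstral coefficients can be sketched side by side. The first function is an unnormalized DCT-II over log filterbank outputs; the second is the standard recursion from LP coefficients to cepstra, assuming the all-pole convention H(z) = 1 / (1 - Σ a_k z^{-k}). Function names, the scaling of the DCT and the sign convention are assumptions; the autocorrelation-to-LP step (e.g. Levinson-Durbin) is omitted.

```python
import math


def dct_cepstrum(log_energies, n_ceps):
    """c(n) = sum_j logE_j * cos(pi * n * (j + 0.5) / J), for n = 0 .. n_ceps-1."""
    n_channels = len(log_energies)
    return [sum(e * math.cos(math.pi * n * (j + 0.5) / n_channels)
                for j, e in enumerate(log_energies))
            for n in range(n_ceps)]


def lpc_to_cepstrum(lpc, n_ceps):
    """Cepstra c_1 .. c_n_ceps of the all-pole model 1 / (1 - sum_k a_k z^-k).

    Recursion: c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k},
    with a_n taken as 0 for n greater than the model order.
    """
    order = len(lpc)
    ceps = []
    for n in range(1, n_ceps + 1):
        acc = lpc[n - 1] if n <= order else 0.0
        for k in range(max(1, n - order), n):
            acc += (k / n) * ceps[k - 1] * lpc[n - k - 1]
        ceps.append(acc)
    return ceps


mfcc_like = dct_cepstrum([1.0, 1.0, 1.0, 1.0], 3)
plp_like = lpc_to_cepstrum([0.5], 3)
```

A single-pole model with a_1 = 0.5 has the closed-form cepstrum c_n = 0.5^n / n, which gives a convenient check on the recursion.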

  • Normalization Methods

    Following computation of the static features, front-end processing techniques typically apply some form of normalization to the feature stream.

    In this comparison, two normalization processes are considered:

    RASTA filtering
    Cepstral mean normalization

  • RASTA Filtering

    The RASTA filter is used as a front-end operation to reduce both communication channel effects and noise distortion.

    Channel distortion is additive in the log-frequency and cepstral domains, so applying a sharp cut-off high-pass filter to each coefficient over time removes the offset and suppresses the channel distortion.
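The offset-removal argument can be illustrated with a short filter over one coefficient trajectory. The slides do not give the filter coefficients; the sketch below assumes the commonly cited RASTA transfer function 0.1 (2 + z^{-1} - z^{-3} - 2 z^{-4}) / (1 - 0.98 z^{-1}), implemented causally with zero initial conditions. The function name and the test signal are illustrative.

```python
def rasta_filter(trajectory, pole=0.98):
    """Filter one log-domain coefficient trajectory over time.

    Assumed difference equation:
      y[t] = pole * y[t-1] + 0.1 * (2 x[t] + x[t-1] - x[t-3] - 2 x[t-4])
    The numerator taps sum to zero, so a constant (channel) offset is removed.
    """
    taps = (2.0, 1.0, 0.0, -1.0, -2.0)
    out = []
    previous_y = 0.0
    for t in range(len(trajectory)):
        fir = 0.0
        for lag, tap in enumerate(taps):
            if t - lag >= 0:
                fir += tap * trajectory[t - lag]
        y = pole * previous_y + 0.1 * fir
        out.append(y)
        previous_y = y
    return out


# A constant trajectory models a pure channel offset in the log domain.
channel_offset = [5.0] * 600
filtered = rasta_filter(channel_offset)
```

After the start-up transient the response to a constant input decays geometrically towards zero, which is the behaviour the slide describes as removing the offset.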

  • Cepstral Mean Normalization (CMN)

    Calculating the mean of each coefficient across a reasonably large number of frames gives the cepstral mean.

    Subtracting this from the original cepstral vectors removes channel-induced offsets together with any other stationary speech component.
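A minimal sketch of CMN over a block of frames follows. It assumes the mean is taken over the whole block (for example one utterance); the slides only say "a reasonably large number of frames", so the averaging span and the function name are assumptions.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean, computed over all given frames.

    `frames` is a list of equal-length cepstral vectors.
    Returns (normalized_frames, cepstral_mean).
    """
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(frame[n] for frame in frames) / n_frames
             for n in range(n_coeffs)]
    normalized = [[frame[n] - means[n] for n in range(n_coeffs)]
                  for frame in frames]
    return normalized, means


clean = [[1.0, 2.0], [3.0, 6.0]]
# The same frames with a constant channel offset added to every vector.
shifted = [[c + o for c, o in zip(frame, (10.0, -5.0))] for frame in clean]

clean_norm, clean_mean = cepstral_mean_normalize(clean)
shifted_norm, shifted_mean = cepstral_mean_normalize(shifted)
```

Because the channel contributes the same additive vector to every frame, the normalized features of the clean and shifted inputs coincide.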


  • Temporal Information

    HMMs assume that the observation vectors are generated by an independent, identically distributed (IID) process.

    The temporal correlation that exists in the feature vector stream breaks this assumption.

    Methods for encoding temporal information:

    Temporal derivatives
    Cepstral-time matrices

  • Temporal Derivatives

    The first-order temporal derivative (velocity) is computed as

    $$\Delta c_t(n) = \frac{\sum_{k=-K}^{K} k\, c_{t+k}(n)}{\sum_{k=-K}^{K} k^2}$$

    where $\Delta c_t(n)$ is the first time derivative of the $n$-th cepstral coefficient at time frame $t$, and $c_{t+k}(n)$ is the $n$-th coefficient of the $(t+k)$-th static cepstral vector. The range $-K$ to $+K$ is the time span of cepstral vectors across which the derivative is calculated.

    In a similar way, the second-order temporal derivative (acceleration) is usually computed as a simple difference over the velocity vectors.
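The regression formula for the velocity translates directly into code. In the sketch below, repeating the first and last frames at the utterance edges and the specific form of the "simple difference" used for acceleration (velocity at t+1 minus velocity at t-1) are assumptions, as are the function names and K = 2.

```python
def velocity(frames, K=2):
    """Delta c_t(n) = sum_k k * c_{t+k}(n) / sum_k k^2, for k = -K .. K.

    Frames beyond the ends are replaced by the nearest edge frame (assumption).
    """
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    denom = sum(k * k for k in range(-K, K + 1))
    deltas = []
    for t in range(n_frames):
        row = []
        for n in range(n_coeffs):
            num = 0.0
            for k in range(-K, K + 1):
                idx = min(max(t + k, 0), n_frames - 1)
                num += k * frames[idx][n]
            row.append(num / denom)
        deltas.append(row)
    return deltas


def acceleration(velocities):
    """Assumed simple difference: a_t(n) = v_{t+1}(n) - v_{t-1}(n)."""
    last = len(velocities) - 1
    return [[nxt - prv for nxt, prv in
             zip(velocities[min(t + 1, last)], velocities[max(t - 1, 0)])]
            for t in range(len(velocities))]


# A coefficient that rises by 2 per frame should have velocity 2 away from the edges.
ramp = [[2.0 * t] for t in range(10)]
vel = velocity(ramp, K=2)
acc = acceleration(vel)
```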

  • Cepstral-Time Matrices (CTM)

    The cepstral-time matrix is an alternative framework for encoding the temporal variations of speech into the feature.

    The columns of the resulting matrix represent different temporal regions and can be truncated according to the amount of temporal information required in the final speech feature.


  • 2-D DCT:

    $$C_t(n, m) = \sum_{k=0}^{M-1} c_{t+k}(n) \cos\left(\frac{(2k+1)\, m\, \pi}{2M}\right)$$
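The cepstral-time matrix formula amounts to a DCT along the time axis of a stack of M consecutive cepstral vectors. The sketch below follows that formula with rows indexed by the cepstral coefficient n and columns by the temporal index m; the function name, the argument layout and the choice M = 8 in the example are assumptions.

```python
import math


def cepstral_time_matrix(frames, t, M, n_columns):
    """C_t(n, m) = sum_{k=0}^{M-1} c_{t+k}(n) * cos((2k + 1) * m * pi / (2M)).

    `frames` is a list of cepstral vectors; the stack starts at frame t.
    Only the first `n_columns` temporal columns (m = 0 .. n_columns-1) are kept,
    which is the truncation along the time axis.
    """
    n_coeffs = len(frames[0])
    matrix = []
    for n in range(n_coeffs):
        row = []
        for m in range(n_columns):
            row.append(sum(frames[t + k][n]
                           * math.cos((2 * k + 1) * m * math.pi / (2 * M))
                           for k in range(M)))
        matrix.append(row)
    return matrix


# A stationary segment: every frame holds the same two coefficients.
stationary = [[1.0, 2.0] for _ in range(8)]
ctm = cepstral_time_matrix(stationary, t=0, M=8, n_columns=3)
```

For a stationary segment all the energy lands in column m = 0 and the higher columns vanish, which matches the reading of column 0 as the static part and later columns as progressively faster temporal variation.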

  • Comparison of Temporal Derivatives and CTM

  • Experimental Results

    To constrain the experiments so that only the effect of the features is considered, tests were performed on an unconstrained monophone task.

    The BT Subscriber telephony database is used, which contains approximately 4330 sentences in the training set and 2560 in the test set.

    Each of the 44 phonemes is modelled by a 3-state, 12-mode, diagonal-covariance HMM.

    In the feature extraction schemes, a frame rate of 16 ms has been used, together with a Hamming window width of 32 ms.
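The framing parameters imply a 50% overlap between successive windows. The sketch below works through the arithmetic for an assumed 8 kHz sampling rate (typical for telephony, but not stated on the slide); the function name and the policy of dropping a trailing partial frame are also assumptions.

```python
def frame_signal(samples, sample_rate_hz, window_ms=32.0, shift_ms=16.0):
    """Split a signal into overlapping frames.

    window_ms and shift_ms follow the slide (32 ms window, 16 ms frame shift).
    A trailing partial frame is dropped (assumption).
    """
    window_len = round(sample_rate_hz * window_ms / 1000.0)
    shift = round(sample_rate_hz * shift_ms / 1000.0)
    frames = []
    start = 0
    while start + window_len <= len(samples):
        frames.append(samples[start:start + window_len])
        start += shift
    return frames


# One second of (silent) audio at the assumed 8 kHz rate.
one_second = [0.0] * 8000
frames = frame_signal(one_second, 8000)
```

At 8 kHz this gives 256-sample windows advanced by 128 samples, so each sample away from the edges is covered by two frames.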
