
Page 1: Histogram-based Quantization for Distributed / Robust Speech Recognition

Histogram-based Quantization for Distributed / Robust Speech Recognition

Chia-yu Wan, Lin-shan Lee

College of EECS, National Taiwan University, R. O. C.

2007/08/16

Page 2: Histogram-based Quantization for Distributed / Robust Speech Recognition

Outline

Introduction

Histogram-based Quantization (HQ)

Joint Uncertainty Decoding (JUD)

Three-stage Error Concealment (EC)

Conclusion

Page 3: Histogram-based Quantization for Distributed / Robust Speech Recognition

Problems of Distance-based VQ

Conventional distance-based VQ (e.g. SVQ) has been widely used in DSR.

Environmental noise and codebook mismatch jointly degrade the performance of SVQ:

Noise moves clean speech to another partition cell (X to Y).

Mismatch between the fixed VQ codebook and the test data increases distortion.

Quantization increases the difference between clean and noisy features.

Histogram-based Quantization (HQ) is proposed to solve these problems.

Page 4: Histogram-based Quantization for Distributed / Robust Speech Recognition

Histogram-based Quantization (HQ)

{D_i, z_i, b_i (vertical scale), i = 1, ..., N} are determined by Lloyd-Max and a standard Gaussian distribution.

Decision boundaries y_i (i = 1, ..., N) are dynamically defined by C(y). Representative values z_i (i = 1, ..., N) are fixed, transformed by a standard Gaussian.

Page 5: Histogram-based Quantization for Distributed / Robust Speech Recognition

Histogram-based Quantization (HQ)

$$x_t \rightarrow z_i \quad \text{if } b_{i-1} < C(x_t) \le b_i, \text{ or equivalently } y_{i-1} < x_t \le y_i, \qquad i = 1, 2, \ldots, N$$

The actual decision boundaries (horizontal scale) for x_t are dynamically defined by the inverse transformation of C(y).
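To make the rule concrete, here is a minimal Python sketch of HQ for a single feature dimension. It assumes a sliding window supplies the local histogram; the uniform partition of the CDF axis and the scipy Gaussian quantiles are illustrative stand-ins for the Lloyd-Max tables {D_i, z_i, b_i} described above.

```python
# Minimal HQ sketch (illustrative, not the paper's exact tables):
# fixed boundaries b_i on the CDF (vertical) axis, fixed codewords z_i
# from standard-Gaussian quantiles, dynamic horizontal boundaries via
# the empirical CDF C(.) of a local window.
import numpy as np
from scipy.stats import norm

N = 8                                    # codebook size (illustrative)
b = np.linspace(0.0, 1.0, N + 1)         # b_0 = 0 < b_1 < ... < b_N = 1
z = norm.ppf((b[:-1] + b[1:]) / 2.0)     # Gaussian quantiles as codewords

def hq_quantize(x_t, window):
    """Map x_t to (index i, codeword z_i) with b_{i-1} < C(x_t) <= b_i."""
    c = np.mean(np.asarray(window) < x_t)        # empirical CDF C(x_t)
    i = int(np.clip(np.searchsorted(b, c, side="left") - 1, 0, N - 1))
    return i, z[i]

# Example: noise shifts the window's statistics, and the horizontal
# boundaries y_i = C^{-1}(b_i) shift with them, absorbing the disturbance.
window = 2.0 * np.random.randn(256) + 1.0
idx, zq = hq_quantize(1.3, window)
```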

Page 6: Histogram-based Quantization for Distributed / Robust Speech Recognition

Histogram-based Quantization (HQ)

With a new histogram C'(y'), the decision boundaries automatically change to (y'_{i-1}, y'_i):

$$x_t \rightarrow z_i \quad \text{if } b_{i-1} < C'(x_t) \le b_i, \text{ or equivalently } y'_{i-1} < x_t \le y'_i, \qquad i = 1, 2, \ldots, N$$

Decision boundaries are adjusted according to local statistics, so there is no codebook mismatch problem.

Page 7: Histogram-based Quantization for Distributed / Robust Speech Recognition

Histogram-based Quantization (HQ)

Because HQ is based on the CDF and histogram on the vertical scale, it is less sensitive to noise on the horizontal scale: disturbances are automatically absorbed within an HQ partition cell.

Dynamic nature of HQ: the hidden codebook on the vertical scale, transformed by the dynamic C(y) into {y_i}, is dynamic on the horizontal scale.

Page 8: Histogram-based Quantization for Distributed / Robust Speech Recognition

Histogram-based Vector Quantization (HVQ)

Page 9: Histogram-based Quantization for Distributed / Robust Speech Recognition

Discussion of the robustness of Histogram-based Quantization (HQ)

Distributed speech recognition: SVQ vs. HQ

Robust speech recognition: HEQ vs. HQ

Page 10: Histogram-based Quantization for Distributed / Robust Speech Recognition

Comparison of Distance-based VQ and Histogram-based Quantization (HQ)

HQ solves the major problems of conventional distance-based VQ:

Distance-based VQ (SVQ): the fixed codebook cannot represent noisy speech well. HQ: boundaries are dynamically adjusted to local statistics, so there is no codebook mismatch.

Distance-based VQ (SVQ): quantization increases the difference between clean and noisy speech. HQ: inherently robust; noise disturbances are automatically absorbed by C(y).

Page 11: Histogram-based Quantization for Distributed / Robust Speech Recognition

HEQ (Histogram Equalization) vs. HQ (Histogram-based Quantization)

HEQ performs a point-to-point transformation: point-based order statistics are more easily disturbed.

HQ performs a block-based transformation: disturbances within a block are automatically absorbed, and with a proper choice of block size, the block uncertainty can be compensated by GMM and uncertainty decoding.

(Figure: averaged normalized distance between clean and corrupted speech features, based on the AURORA 2 database.)

Page 12: Histogram-based Quantization for Distributed / Robust Speech Recognition

HEQ (Histogram Equalization) vs. HQ (Histogram-based Quantization)

HQ gives a smaller distance d for all SNR conditions, i.e., it is less influenced by noise disturbances.
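The averaged normalized distance d used in this comparison is not fully specified in the transcript; the sketch below shows one plausible form, normalizing per-dimension differences by the clean features' standard deviation (an assumption, not necessarily the paper's definition).

```python
# One plausible averaged normalized distance between clean and corrupted
# feature streams (the paper's exact normalization is assumed here).
import numpy as np

def avg_normalized_distance(clean, corrupted):
    """clean, corrupted: float arrays of shape (frames, dims)."""
    sigma = clean.std(axis=0) + 1e-8      # per-dimension scale
    return float(np.mean(np.abs(clean - corrupted) / sigma))
```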

Page 13: Histogram-based Quantization for Distributed / Robust Speech Recognition

HQ as a feature transformation method

Page 14: Histogram-based Quantization for Distributed / Robust Speech Recognition

HQ as a feature quantization method

Pages 15-19: HQ as a feature quantization method (experimental-result figures not preserved in the transcript)

Page 20: Histogram-based Quantization for Distributed / Robust Speech Recognition

Further analysis: bit rates vs. SNR

Clean-condition training vs. multi-condition training

Page 21: Histogram-based Quantization for Distributed / Robust Speech Recognition

HQ-JUD: for both robust and/or distributed speech recognition

For robust speech recognition:

• HQ is used as the front-end feature transformation

• JUD is the enhancement approach at the back-end recognizer

For Distributed Speech Recognition (DSR):

• HQ is applied at the client for data compression

• JUD is applied at the server

Page 22: Histogram-based Quantization for Distributed / Robust Speech Recognition

Joint Uncertainty Decoding (1/4) - Uncertainty Observation Decoding

The HMM should be less discriminative on features with higher uncertainty: a larger variance is used for more uncertain features.

w: observation, o: uncorrupted features. Assume p(w | o) is Gaussian, i.e., the observation scatters around the uncorrupted feature with some uncertainty variance.
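Under this assumption the effective state likelihood follows from a Gaussian convolution; a sketch of the standard derivation, with σ_b² denoting the uncertainty variance and (μ_q, σ_q²) a diagonal HMM state Gaussian (notation assumed here, not taken from the slides):

```latex
p(w \mid q) = \int p(w \mid o)\, p(o \mid q)\, do
            = \int \mathcal{N}(w;\, o,\, \sigma_b^{2})\,
                   \mathcal{N}(o;\, \mu_q,\, \sigma_q^{2})\, do
            = \mathcal{N}(w;\, \mu_q,\, \sigma_q^{2} + \sigma_b^{2})
```

This is exactly the "larger variance for more uncertain features" behavior described above.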

Page 23: Histogram-based Quantization for Distributed / Robust Speech Recognition

Joint Uncertainty Decoding (2/4) - Uncertainty for quantization errors

The codeword is the observation w; the samples in the partition cell are the uncorrupted features o; p(o) is the pdf of the samples within the partition cell.

The uncertainty is the variance of the samples within the partition cell.
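A minimal sketch of estimating this quantization uncertainty offline: training samples are binned into the partition cells, and each cell's sample variance serves as the uncertainty variance for its codeword (function and variable names are illustrative).

```python
# Per-cell variance estimate: the codeword is the observation w, the
# samples falling in its partition cell approximate p(o), and the cell's
# sample variance is the uncertainty attached to that codeword.
import numpy as np

def cell_variances(samples, boundaries):
    """samples: 1-D float array; boundaries: y_0 < y_1 < ... < y_N."""
    cells = np.digitize(samples, boundaries[1:-1])   # cell index per sample
    return np.array([
        samples[cells == i].var() if np.any(cells == i) else 0.0
        for i in range(len(boundaries) - 1)
    ])
# Wide (loosely quantized) cells naturally get the larger variances.
```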

Page 24: Histogram-based Quantization for Distributed / Robust Speech Recognition

Joint Uncertainty Decoding (2/4) - Uncertainty for quantization errors

More uncertain regions correspond to loosely quantized cells: the variances are increased for the loosely quantized cells.

Page 25: Histogram-based Quantization for Distributed / Robust Speech Recognition

Joint Uncertainty Decoding (3/4) - Uncertainty for environmental noise

Increase the variances for HQ features with a larger histogram shift.

(Figure: histogram shift.)

Page 26: Histogram-based Quantization for Distributed / Robust Speech Recognition

Joint Uncertainty Decoding (4/4)

Jointly consider the uncertainty caused by both the environmental noise and the quantization errors. One of the two dominates:

Quantization errors dominate at high SNR: disturbances are absorbed into the HQ block.

Environmental noise dominates at low SNR: noisy features move to other partition cells.

Page 27: Histogram-based Quantization for Distributed / Robust Speech Recognition

HQ-JUD for robust speech recognition

Page 28: Histogram-based Quantization for Distributed / Robust Speech Recognition

HQ-JUD for distributed speech recognition

Different types of noise, averaged over all SNR values.

(Figure: comparison of client HEQ-SVQ, client HEQ-SVQ with server UD, client HQ, and client HQ with server JUD.)

Page 29: Histogram-based Quantization for Distributed / Robust Speech Recognition

HQ-JUD for distributed speech recognition

Different types of noise, averaged over all SNR values.

HEQ-SVQ-UD was slightly worse than HEQ-SVQ for set C.

Page 30: Histogram-based Quantization for Distributed / Robust Speech Recognition

HQ-JUD for distributed speech recognition

Different types of noise, averaged over all SNR values.

HEQ-SVQ-UD was slightly worse than HEQ-SVQ for set C; HQ-JUD consistently improved the performance of HQ.

Page 31: Histogram-based Quantization for Distributed / Robust Speech Recognition

HQ-JUD for distributed speech recognition

Different types of noise, averaged over all SNR values.

HQ performed better than HEQ-SVQ for all types of noise.

Page 32: Histogram-based Quantization for Distributed / Robust Speech Recognition

HQ-JUD for distributed speech recognition

Different types of noise, averaged over all SNR values.

HQ performed better than HEQ-SVQ for all types of noise; HQ-JUD consistently performed better than HEQ-SVQ-UD.

Page 33: Histogram-based Quantization for Distributed / Robust Speech Recognition

HQ-JUD for distributed speech recognition

Different SNR conditions, averaged over all noise types.

HQ-JUD significantly improved the performance of SVQ-UD, and consistently performed better than HEQ-SVQ-UD.

Page 34: Histogram-based Quantization for Distributed / Robust Speech Recognition

Three-stage error concealment (EC)

Page 35: Histogram-based Quantization for Distributed / Robust Speech Recognition

Stage 1: error detection

Frame-level error detection: the received frame-pairs are first checked with CRC.

Subvector-level error detection: the erroneous frame-pairs are then checked by the HQ consistency check.

The quantized codewords for HQ represent the order-statistics information of the original parameters, and the quantization process does not change the order statistics. Therefore, re-performing HQ on a received subvector codeword should place it in the same partition cell, as sketched below.
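A minimal sketch of this consistency check, reusing the hypothetical hq_quantize and codewords z from the earlier HQ sketch: the server re-quantizes the received codeword's representative value with its own local statistics and flags a mismatch as an error.

```python
# Subvector-level HQ consistency check (sketch). Since quantization
# preserves order statistics, the received codeword z[i] should fall
# back into cell i when HQ is re-performed; otherwise a transmission
# error is declared for this subvector.
def hq_consistent(received_index, window):
    idx, _ = hq_quantize(z[received_index], window)
    return idx == received_index
```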

Page 36: Histogram-based Quantization for Distributed / Robust Speech Recognition

Stage 1: error detection

Noise seriously affects SVQ with the data consistency check: precision degrades from 66% under clean conditions down to 12% at 0 dB.

The HQ-based consistency approach is much more stable at all SNR values: both recall and precision rates are higher.

Page 37: Histogram-based Quantization for Distributed / Robust Speech Recognition

Stage 2: reconstruction

Based on the maximum a posteriori (MAP) criterion, considering the probability of all possible codewords S_t(i) at time t, given the current and previous received subvector codewords R_t and R_{t-1}:

$$\hat{S}_t = \arg\max_{S_t(i)} P(S_t(i) \mid R_t, R_{t-1}) = \arg\max_{S_t(i)} P(S_t(i) \mid R_{t-1}) \, P(R_t \mid S_t(i))$$

where the first factor is the prior speech source term and the second the channel term.

Prior speech source statistics: an HQ codeword bigram model.

Channel transition probability: based on the estimated BER from stage 1.

Reliability of the received subvectors: considers the relative reliability between the prior speech source and the wireless channel.

Page 38: Histogram-based Quantization for Distributed / Robust Speech Recognition

Stage 2: reconstruction

Channel transition probability P(R_t | S_t(i)):

$$P(R_t \mid S_t(i)) = BER^{\,d(S_t(i),\, R_t)} \, (1 - BER)^{\,M - d(S_t(i),\, R_t)}$$

where d(S_t(i), R_t) is the Hamming distance between the codeword's bits and the received bits, and M is the number of bits.

This term significantly differentiates the codewords i (through different d) when R_t is more reliable (BER is smaller), and puts more emphasis on the prior speech source when R_t is less reliable.

The estimated BER is the number of inconsistent subvectors in the present frame divided by the total number of bits in the frame. A sketch of the resulting MAP reconstruction follows.
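A minimal sketch of the stage-2 MAP search under these two terms, assuming a trained codeword bigram table bigram[prev][i] and per-codeword bit patterns bits[i] (names are hypothetical; the BER channel model is the one on this slide):

```python
# Stage-2 MAP reconstruction sketch: argmax over codewords of
# prior P(S_t(i)|R_{t-1}) times channel P(R_t|S_t(i)).
import numpy as np

def hamming(a, b):
    """Hamming distance between two integer bit patterns."""
    return bin(a ^ b).count("1")

def map_reconstruct(r_t, prev_idx, bigram, bits, ber, M):
    scores = []
    for i in range(len(bits)):
        d = hamming(bits[i], r_t)                     # d(S_t(i), R_t)
        channel = (ber ** d) * ((1.0 - ber) ** (M - d))
        scores.append(bigram[prev_idx][i] * channel)  # prior * channel
    return int(np.argmax(scores))
```

When BER is small the channel term dominates and the received bits are trusted; as BER grows the term flattens across i and the bigram prior takes over, matching the reliability behavior described above.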

Page 39: Histogram-based Quantization for Distributed / Robust Speech Recognition

Stage 2: reconstruction

Prior source information P(S_t(i) | R_{t-1}):

based on the codeword bigram trained from clean training data in AURORA 2.

HQ can estimate the lost subvectors more precisely than SVQ, as measured by the conditional entropy:

$$H(S_t \mid S_{t-1}) = E\big[-\log P(s_t(i) \mid s_{t-1})\big]$$

A lower conditional entropy means the bigram prior predicts the next codeword better.

Page 40: Histogram-based Quantization for Distributed / Robust Speech Recognition

Stage 3: compensation in Viterbi decoding

The distribution P(S_t(i) | R_t, R_{t-1}) characterizes the uncertainty of the estimated features.

Assuming P(S_t(i) | R_t, R_{t-1}) is Gaussian, its variance is used in uncertainty decoding.

This makes the HMMs less discriminative for estimated subvectors with higher uncertainty.
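A minimal sketch of the compensated state likelihood, assuming diagonal Gaussians: the uncertainty variance from stage 2 is simply added to each HMM state variance before scoring, mirroring the uncertainty-decoding result shown earlier.

```python
# Stage-3 compensation sketch: score an estimated feature vector against
# a diagonal-Gaussian HMM state with the uncertainty variance added.
import numpy as np

def log_likelihood(o, mean, var, uncertainty_var):
    """log N(o; mean, var + uncertainty_var), elementwise diagonal."""
    v = var + uncertainty_var
    return -0.5 * float(np.sum(np.log(2.0 * np.pi * v) + (o - mean) ** 2 / v))
```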

Page 41: Histogram-based Quantization for Distributed / Robust Speech Recognition

HQ-based DSR system with transmission errors

Features corrupted by noise are more susceptible to transmission errors: for SVQ, accuracy drops from 98% to 87% under clean conditions, and from 60% to 36% at 10 dB SNR.

Page 42: Histogram-based Quantization for Distributed / Robust Speech Recognition

HQ-based DSR system with transmission errors

The improvements that HQ offers over HEQ-SVQ in the presence of transmission errors are consistent and significant at all SNR values.

HQ is robust against both environmental noise and transmission errors.

Page 43: Histogram-based Quantization for Distributed / Robust Speech Recognition

Analysis of the degradation in recognition accuracy caused by transmission errors:

Comparison of SVQ, HEQ-SVQ, and HQ on the percentage of words that were correctly recognized without transmission errors but incorrectly recognized after transmission.

Page 44: Histogram-based Quantization for Distributed / Robust Speech Recognition

HQ-Based DSR with Wireless Channels and Error Concealment

The ETSI repetition technique actually degraded the performance of HEQ-SVQ: whole feature vectors, including the correct subvectors, are replaced by inaccurate estimates.

(g: GPRS, r: ETSI repetition, c: three-stage EC)

Page 45: Histogram-based Quantization for Distributed / Robust Speech Recognition

HQ-Based DSR with Wireless Channels and Error Concealment

The three-stage EC improved the performance significantly in all cases: it is robust not only against transmission errors, but against environmental noise as well.

(g: GPRS, r: ETSI repetition, c: three-stage EC)

Page 46: Histogram-based Quantization for Distributed / Robust Speech Recognition

HQ-Based DSR with Wireless Channels and Error Concealment

Page 47: Histogram-based Quantization for Distributed / Robust Speech Recognition

Different client traveling speed (1/3)

Page 48: Histogram-based Quantization for Distributed / Robust Speech Recognition

Different client traveling speed (2/3)

Page 49: Histogram-based Quantization for Distributed / Robust Speech Recognition

Different client traveling speed (3/3)

Page 50: Histogram-based Quantization for Distributed / Robust Speech Recognition

Conclusions

Histogram-based Quantization (HQ) is proposed as a novel approach for robust and/or distributed speech recognition (DSR).

It is robust against environmental noise (for all types of noise and all SNR conditions) and against transmission errors.

For future personalized and context-aware DSR environments, HQ can be adapted to network and terminal capabilities, with recognition performance optimized based on environmental conditions.

Page 51: Histogram-based Quantization for Distributed / Robust Speech Recognition

Thank you for your attention