MAXENT 2007

R. F. Astudillo, D. Kolossa and R. Orglmeister


PROPAGATION OF STATISTICAL INFORMATION THROUGH NON-LINEAR FEATURE EXTRACTIONS FOR ROBUST SPEECH RECOGNITION

Overview:

1. Introduction: Automatic speech recognition.
2. Problem: Imperfect noise suppression.
3. Proposed solution: Uncertainty propagation.
4. Tests & results.
5. Conclusions.

R. F. Astudillo, D. Kolossa and R. Orglmeister - TU-Berlin


Automatic Speech Recognizer (ASR)

• Feature extraction transforms signal into a domain more suitable for recognition.

• Speech recognizer models abstract speech components like phonemes or triphones, generates transcription.

• Most speech recognition applications need noise suppression as a preprocessing step.


Feature Extraction

• Non-linear transformations that imitate the way humans process speech.

• Robust against inter-speaker and intra-speaker variability.

• Mel-cepstral and RASTA-PLP transformations.


Speech Recognition

• Statistical models are used to model speech.

• Hidden Markov models with mixtures of multivariate Gaussians for the emitting states.

Example: Mel-cepstral features


Noise Suppression

• MMSE-LSA Bayesian estimation [Ephraim1985] is one of the most widely used methods.

• Leaves residual noise.

• Introduces artifacts in speech.

• Most methods obtain an estimate of the short-time spectrum (STFT) of the clean signal.

Problem: Imperfect estimation.


Solution: Modeling Uncertainty of Estimation

We model each element of the STFT as a complex Gaussian random variable.

• The mean is set equal to the estimated clean value.

• The variance parameter controls the uncertainty.
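The model above can be illustrated with a small simulation. This is only a sketch under our own assumptions: the Gaussian is taken to be circularly symmetric (variance split evenly between real and imaginary parts), and the names `sample_stft_bin`, `estimate` and `variance` are hypothetical.

```python
import math
import random

def sample_stft_bin(estimate, variance, rng):
    """Draw one sample of a complex Gaussian STFT coefficient.

    Assumption: circular symmetry, so the real and imaginary parts
    each carry half of the total variance.
    """
    sigma = math.sqrt(variance / 2.0)
    return complex(rng.gauss(estimate.real, sigma),
                   rng.gauss(estimate.imag, sigma))

rng = random.Random(0)
estimate = complex(1.0, -2.0)   # estimated clean value (the mean)
variance = 0.5                  # uncertainty of the estimate

samples = [sample_stft_bin(estimate, variance, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum(abs(s - sample_mean) ** 2 for s in samples) / len(samples)
print(sample_mean, sample_var)  # close to the chosen mean and variance
```

A variance of zero reproduces the usual point estimate, so conventional noise suppression is the special case with no uncertainty.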


Propagation of Uncertainty

• We propagate first and second order moments of the distributions.

• Correlations between features appear (covariance).

• The resulting uncertainty is combined with the statistical model parameters for robust speech recognition.



Approaches to Uncertainty Propagation

• Analytic solutions: imply complex calculations and are specific to each transformation.

• Pseudo-Monte Carlo (unscented transform [Julier1996]): inefficient for a high number of dimensions (e.g. an STFT with 256 dimensions per frame).

► Piecewise propagation: an efficient combination of both methods, valid for many feature extractions (e.g. MELSPEC, MFCC, RASTA-PLP).


Piecewise Uncertainty Propagation

Exemplified with the Mel-cepstral transformation:

1. Modulus extraction (non-linear).
2. Filterbank (linear).
3. Logarithm (non-linear).
4. Discrete cosine transform (linear).
5. Delta and acceleration coefficients (linear).
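The piecewise idea can be summarized as a per-stage choice of propagation method. The sketch below only encodes the stage list above together with the methods named on the following slides (Rice moments for the modulus, the unscented transform for the logarithm); the function name and the string labels are ours.

```python
# Stage names and linearity follow the list above; labels are descriptive only.
MEL_CEPSTRAL_STAGES = [
    ("modulus", False),
    ("filterbank", True),
    ("logarithm", False),
    ("discrete cosine transform", True),
    ("delta and acceleration", True),
]

def choose_propagation(stage, is_linear):
    """Pick a propagation method for one stage of the feature extraction."""
    if is_linear:
        # Linear stages have an exact closed form for mean and covariance.
        return "linear: mean -> W mean, cov -> W cov W^T"
    if stage == "modulus":
        return "analytic: moments of the Rice distribution"
    return "unscented transform"

plan = [(stage, choose_propagation(stage, linear))
        for stage, linear in MEL_CEPSTRAL_STAGES]
for stage, method in plan:
    print(f"{stage:28s} {method}")
```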


Propagation through Modulus

• Integrating out the phase of a complex Gaussian distribution yields the Rice distribution for the modulus.

• Mean and variance can be calculated as:

where L is the Laguerre polynomial.
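A minimal numerical sketch of this step, assuming a circularly symmetric complex Gaussian with mean `mu` and variance `lam` (both names are ours). For the Rice distribution the mean modulus is 0.5·sqrt(π·lam)·L_{1/2}(−|mu|²/lam) and the second moment is |mu|² + lam. Here L_{1/2} is evaluated through the series of the confluent hypergeometric function, which is only suitable for moderate values of |mu|²/lam.

```python
import math

def laguerre_half(x, max_terms=5000):
    """L_{1/2}(x) for x <= 0, using L_{1/2}(x) = exp(x) * 1F1(3/2; 1; -x)."""
    if x > 0 or x < -600:
        raise ValueError("series sketch only covers -600 <= x <= 0")
    z = -x
    term = 1.0
    total = 1.0
    for k in range(max_terms):
        # Ratio of consecutive terms of 1F1(3/2; 1; z).
        term *= (1.5 + k) * z / ((1.0 + k) ** 2)
        total += term
        if term < 1e-17 * total:
            break
    return math.exp(x) * total

def rice_moments(mu, lam):
    """Mean and variance of |X| for X ~ complex Gaussian(mu, lam)."""
    mean = 0.5 * math.sqrt(math.pi * lam) * laguerre_half(-abs(mu) ** 2 / lam)
    var = abs(mu) ** 2 + lam - mean ** 2
    return mean, var

print(rice_moments(0.0, 2.0))         # pure noise: the Rayleigh case
print(rice_moments(complex(6, 8), 1)) # high SNR: mean close to |mu| = 10
```

At high signal-to-noise ratio the mean approaches |mu| and the variance approaches lam/2, which is a quick sanity check on the formulas.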


Propagation through filterbank

• Each filter output m is a weighted sum of frequency moduli.

• It can be expressed as a matrix multiplication.

• Mean and variance can be calculated as:
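As a rough illustration of this step, the sketch below propagates a mean vector and per-bin variances through a weight matrix: the mean becomes W·mean and the covariance becomes W·diag(var)·Wᵀ. It assumes the moduli at different frequency bins are uncorrelated before the filterbank; the toy weights and the function name are hypothetical.

```python
def propagate_filterbank(W, mean, var):
    """Propagate mean and diagonal covariance through y = W x.

    W:    list of M rows with K weights each (one row per filter)
    mean: K means of the frequency moduli
    var:  K variances of the frequency moduli (assumed uncorrelated)
    """
    out_mean = [sum(w * m for w, m in zip(row, mean)) for row in W]
    out_cov = [[sum(a * b * v for a, b, v in zip(row_i, row_j, var))
                for row_j in W]
               for row_i in W]
    return out_mean, out_cov

# Two overlapping triangular filters over four frequency bins (toy values).
W = [[0.5, 1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0, 0.5]]
mean = [1.0, 2.0, 3.0, 4.0]
var = [0.1, 0.2, 0.3, 0.4]

mel_mean, mel_cov = propagate_filterbank(W, mean, var)
print(mel_mean)
print(mel_cov)
```

Because the two filters overlap, the off-diagonal entries of `mel_cov` are non-zero even though the input covariance was diagonal.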


Full Covariance and other linear transformations

• DCT, delta and acceleration can be computed similarly.

• Covariance after filterbank is no longer diagonal.

• Additional computation costs.
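With a full covariance matrix, any further linear stage T maps the mean to T·mean and the covariance to T·Σ·Tᵀ. The sketch below shows this for a DCT; the unnormalized DCT-II matrix, the toy covariance values and the helper names are our assumptions.

```python
import math

def dct_matrix(num_coeffs, num_bands):
    """Unnormalized DCT-II matrix (normalization is an assumption)."""
    return [[math.cos(math.pi * k * (n + 0.5) / num_bands)
             for n in range(num_bands)]
            for k in range(num_coeffs)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def propagate_linear_full(T, mean, cov):
    """Propagate mean and full covariance through y = T x."""
    out_mean = [sum(t * m for t, m in zip(row, mean)) for row in T]
    out_cov = matmul(matmul(T, cov), transpose(T))
    return out_mean, out_cov

# Toy log-Mel statistics with correlated neighbouring bands.
log_mel_mean = [2.0, 2.0, 2.0, 2.0]
log_mel_cov = [[0.3, 0.1, 0.0, 0.0],
               [0.1, 0.3, 0.1, 0.0],
               [0.0, 0.1, 0.3, 0.1],
               [0.0, 0.0, 0.1, 0.3]]

T = dct_matrix(3, 4)
cep_mean, cep_cov = propagate_linear_full(T, log_mel_mean, log_mel_cov)
print(cep_mean)
print(cep_cov)
```

The extra cost mentioned above comes from the two matrix products per frame, compared with a single matrix-vector product when only the mean is propagated.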


Propagation through Logarithm

• Non-linear transformation.

• The distribution after the filterbank is difficult to model.

• The covariance is not diagonal.

• The dimensionality of the Mel features is much smaller than that of the STFT features.

► The unscented transform can be applied efficiently.


Unscented Transform

• Only 2n+1 points must be propagated.

• These are the mean and points on the √(n+κ)-th covariance contour.

• n = feature dimension.

• Example for n = 2.


Unscented Transform II

• Mean and covariances are calculated by using weighted averages:

• The scaling parameter (κ in [Julier1996]) allows higher-order moments of the distribution to be taken into account.
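A compact sketch of the unscented transform applied to an element-wise logarithm. It follows the standard formulation of [Julier1996]: 2n+1 sigma points built from a matrix square root of (n+κ)Σ, with weight κ/(n+κ) for the mean point and 1/(2(n+κ)) for the others. The Cholesky factor as square root, the value κ = 1 and the toy statistics are our assumptions.

```python
import math

def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def unscented_transform(f, mean, cov, kappa=1.0):
    """Propagate mean and covariance through a non-linear function f."""
    n = len(mean)
    L = cholesky([[(n + kappa) * c for c in row] for row in cov])
    points = [list(mean)]
    for i in range(n):
        column = [L[r][i] for r in range(n)]
        points.append([m + c for m, c in zip(mean, column)])
        points.append([m - c for m, c in zip(mean, column)])
    weights = [kappa / (n + kappa)] + [0.5 / (n + kappa)] * (2 * n)

    ys = [f(p) for p in points]
    d = len(ys[0])
    y_mean = [sum(w * y[k] for w, y in zip(weights, ys)) for k in range(d)]
    y_cov = [[sum(w * (y[a] - y_mean[a]) * (y[b] - y_mean[b])
                  for w, y in zip(weights, ys))
              for b in range(d)]
             for a in range(d)]
    return y_mean, y_cov

def elementwise_log(x):
    # Sigma points must stay positive for the logarithm to be defined.
    return [math.log(v) for v in x]

mel_mean = [10.0, 20.0]
mel_cov = [[1.0, 0.5], [0.5, 2.0]]
log_mean, log_cov = unscented_transform(elementwise_log, mel_mean, mel_cov)
print(log_mean)
print(log_cov)
```

For n = 2 only five points pass through the logarithm, which is why the transform is cheap after the filterbank but costly on a 256-dimensional STFT frame.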


Use of Uncertainty

• After propagation of uncertainty, missing feature techniques or uncertainty decoding may be applied.

• These techniques combine uncertainty and model information to ignore or reestimate noisy features.

[Figure labels: "Parameters of state", "f1"]


Use of Uncertainty II

• Modified imputation [Kolossa2005] showed the best performance.

• It reestimates features for state q by maximizing the probability:

• Assuming multivariate Gaussian distributions for the uncertainty and the model:
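One way to read this step, offered as a sketch rather than the exact formulation of [Kolossa2005]: if the maximized quantity is the product of the state's Gaussian and the Gaussian describing the feature uncertainty, the maximum is the precision-weighted average of the two means. The code below further assumes diagonal covariances; the function and argument names are hypothetical.

```python
def modified_imputation(feature, feature_var, state_mean, state_var):
    """Re-estimate a feature vector for one state (diagonal covariances).

    Per dimension, the product of two Gaussians peaks at
    (state_var * feature + feature_var * state_mean) / (feature_var + state_var).
    """
    return [(sq * x + sx * mq) / (sx + sq)
            for x, sx, mq, sq in zip(feature, feature_var, state_mean, state_var)]

state_mean = [3.0, 3.0, 3.0]
state_var = [1.0, 1.0, 1.0]
feature = [1.0, 1.0, 1.0]
feature_var = [0.0, 1.0, 1e9]   # certain, equally uncertain, almost unknown

print(modified_imputation(feature, feature_var, state_mean, state_var))
```

A feature with zero uncertainty is left unchanged, while a very uncertain one is pulled to the state mean, which matches the "ignore or reestimate" behaviour described on the previous slide.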


Recognition Tests: TI-DIGITS database

Correctly identified words (%)

                                        Wind noise         Street noise
Test type               Uncertainty   -15 dB     5 dB    -15 dB     5 dB
Clean speech            0              98.76
Noisy                   0              28.44    87.94     22.87    92.43
MMSE-LSA                0              34.78    75.27     36.63    92.43
+ Approx. uncertainty                  46.68    88.72     22.72    94.90
+ Ideal uncertainty                    51.93    94.28     48.53    96.45

• 200 files (20 different speakers).

• Best, second best results.


Conclusions

• Using uncertainty in the Mel-cepstral domain helps compensate for imperfect estimation during noise suppression.

• Piecewise uncertainty propagation is valid for multiple feature extractions.

• Better estimation of uncertainty should improve the results.


Thank You!

Some literature:

[Ephraim1985] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error log-spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech, and Signal Processing 33, 443–445 (1985).

[Julier1996] S. Julier and J. Uhlmann, “A general method for approximating nonlinear transformations of probability distributions,” Tech. rep., University of Oxford, UK (1996).

[Kolossa2005] D. Kolossa, A. Klimas, and R. Orglmeister, “Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques,” IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005, pp. 82–85.