HIWIRE MEETING
Nancy, July 6-7, 2006
José C. Segura, Ángel de la Torre



2 HIWIRE Meeting – Nancy, 6 -7 June, 2006

Schedule

- Non-linear feature normalization for mobile platform
    - Integration scheme
    - Results and discussion
- Rapid speaker adaptation
    - Combination of adaptation at signal level and acoustic model level
    - Results and discussion
- Assessment of two non-linear techniques for feature normalization
    - Non-linear parametric equalization
    - Model-based feature compensation (VTS)
- New improvements in robust VAD
    - Model-based VAD



Non-linear Parametric Equalization

Feature normalization

- Motivation of PEQ:
    - Limitations of linear methods: Cepstral Mean Normalization (CMN), Cepstral Mean and Variance Normalization (CMVN)
    - Limitations of non-linear methods (HEQ, OSEQ): sensitivity to the speech/non-speech ratio; estimation problems
- Parametric Equalization (PEQ):
    - Two-Gaussian model (speech / non-speech)
    - Training of the clean Gaussians; estimation of the noisy Gaussians
    - Non-linear transformation: a combination of two linear transformations (one for speech, one for non-speech)
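The two-Gaussian transformation can be sketched for a single cepstral coefficient. This minimal sketch rests on assumptions the slides do not spell out. Each class (speech / non-speech) gets the linear map that takes its noisy Gaussian onto its clean Gaussian, matching mean and standard deviation. The two mapped values are then mixed by the posterior probability of speech under the noisy two-Gaussian model. The names `Gauss`, `peq_transform` and `prior_speech` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Gauss:
    """1-D Gaussian for one cepstral coefficient (hypothetical helper)."""
    mean: float
    std: float

    def log_pdf(self, v: float) -> float:
        z = (v - self.mean) / self.std
        return -0.5 * z * z - math.log(self.std * math.sqrt(2.0 * math.pi))


def _sigmoid(d: float) -> float:
    # Numerically stable logistic function.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def peq_transform(y: float,
                  clean_sp: Gauss, clean_ns: Gauss,
                  noisy_sp: Gauss, noisy_ns: Gauss,
                  prior_speech: float = 0.5) -> float:
    """Equalize one noisy coefficient y (assumed PEQ-style mapping)."""
    # Posterior probability of speech under the noisy two-Gaussian model.
    log_sp = math.log(prior_speech) + noisy_sp.log_pdf(y)
    log_ns = math.log(1.0 - prior_speech) + noisy_ns.log_pdf(y)
    p_sp = _sigmoid(log_sp - log_ns)

    # One linear transformation per class: noisy Gaussian -> clean Gaussian.
    x_sp = clean_sp.mean + clean_sp.std / noisy_sp.std * (y - noisy_sp.mean)
    x_ns = clean_ns.mean + clean_ns.std / noisy_ns.std * (y - noisy_ns.mean)

    # Soft combination of the two linear maps: non-linear in y overall.
    return p_sp * x_sp + (1.0 - p_sp) * x_ns
```

If the noisy Gaussians equal the clean ones, the mapping reduces to the identity. For an observation far from the non-speech Gaussian, it reduces to the speech-class linear map.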


Non-linear Parametric Equalization

Aurora-2 results:

              Avg. WER   Relative improv.
  BASELINE    34.1 %      0.0 %
  OSEQ        17.5 %     48.6 %
  PEQ         18.6 %     45.3 %

Aurora-4 results:

              Avg. WER   Relative improv.
  BASELINE    45.6 %      0.0 %
  OSEQ        37.5 %     17.8 %
  PEQ         31.5 %     30.1 %


Non-linear Parametric Equalization

- Additional problem of non-linear transformations: once the transformation is estimated, it is an "instantaneous" (frame-by-frame) transformation, so temporal correlations are not exploited.
- Temporal Smoothing (TES): each equalized cepstral coefficient is time-filtered with an ARMA filter that restores the autocorrelation of the clean data.
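TES filtering can be sketched as a generic ARMA difference equation applied along time to one equalized cepstral coefficient. The slides say only that the filter is chosen to restore the autocorrelation of clean data, not how its coefficients are obtained. Here `b` and `a` are therefore hypothetical inputs. `lag1_autocorr` is a helper for checking the filter's effect.

```python
from typing import List, Sequence


def arma_filter(u: Sequence[float], b: Sequence[float], a: Sequence[float]) -> List[float]:
    """Direct-form ARMA filter along time:
    a[0]*y[t] = sum_j b[j]*u[t-j] - sum_{i>=1} a[i]*y[t-i]
    """
    if not a or a[0] == 0:
        raise ValueError("a[0] must be non-zero")
    y: List[float] = []
    for t in range(len(u)):
        # Moving-average (MA) part on the equalized input trajectory.
        acc = sum(b[j] * u[t - j] for j in range(len(b)) if t - j >= 0)
        # Autoregressive (AR) part on past outputs.
        acc -= sum(a[i] * y[t - i] for i in range(1, len(a)) if t - i >= 0)
        y.append(acc / a[0])
    return y


def lag1_autocorr(x: Sequence[float]) -> float:
    """Normalized lag-1 autocorrelation of a trajectory."""
    m = sum(x) / len(x)
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(len(x) - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den
```

For example, the impulse response of `b=[1.0]`, `a=[1.0, -0.5]` is `[1.0, 0.5, 0.25, 0.125]`. Smoothing a sign-alternating trajectory with a two-tap average raises its lag-1 autocorrelation.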


Non-linear Parametric Equalization

Aurora-2 results:

              Without TES               With TES
              Avg. WER   Improv.        Avg. WER   Improv.
  BASELINE    34.1 %      0.0 %         31.6 %      6.5 %
  OSEQ        17.5 %     48.6 %         15.5 %     54.3 %
  PEQ         18.6 %     45.3 %         ---        ---

Aurora-4 results:

              Without TES               With TES
              Avg. WER   Improv.        Avg. WER   Improv.
  BASELINE    45.6 %      0.0 %         43.4 %      4.9 %
  OSEQ        37.5 %     17.8 %         35.5 %     22.2 %
  PEQ         31.5 %     30.1 %         30.7 %     32.6 %


Model Based Feature Compensation (VTS)

VTS feature normalization:
- Performed in the log-FBE domain (before the DCT)
- Based on a Gaussian mixture model trained with clean speech
- Allows both feature compensation and uncertainty estimation

Summary of VTS (vector Taylor series approach):
1. Given the noise conditions, VTS provides a noisy Gaussian from each clean Gaussian.
2. The noisy Gaussian mixture model allows the computation of the probabilities P(k|y).
3. An estimate of the clean speech x is then possible.
4. An estimate of its uncertainty is also possible.


Model Based Feature Compensation (VTS)

Step 1: Estimation of a noisy Gaussian from a clean Gaussian:

where the functions g0, f0 and h0 are evaluated at the mean of the clean Gaussian and at the mean of the noise:
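The Step 1 equations are not preserved in this transcript. Purely as an illustration, the sketch below uses the common first-order VTS formulation for additive noise in the log-FBE domain, y = x + log(1 + exp(n - x)), with diagonal covariances. The expansion point is the clean mean and the noise mean, as the slide states. The sketch does not reproduce the slides' g0, f0 and h0 definitions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def vts_noisy_gaussian(mu_x: Sequence[float], var_x: Sequence[float],
                       mu_n: Sequence[float], var_n: Sequence[float]
                       ) -> Tuple[List[float], List[float]]:
    """First-order VTS: clean Gaussian + noise Gaussian -> noisy Gaussian.

    Assumed mismatch model per log-FBE channel: y = x + log(1 + exp(n - x)),
    linearized at (mu_x, mu_n); covariances are diagonal.
    """
    mu_y: List[float] = []
    var_y: List[float] = []
    for mx, vx, mn, vn in zip(mu_x, var_x, mu_n, var_n):
        # Mismatch term evaluated at the clean mean and the noise mean.
        g = softplus(mn - mx)
        # dy/dn = sigmoid(mn - mx), computed stably; dy/dx = 1 - dy/dn.
        dy_dn = math.exp(-softplus(mx - mn))
        dy_dx = 1.0 - dy_dn
        mu_y.append(mx + g)
        var_y.append(dy_dx ** 2 * vx + dy_dn ** 2 * vn)
    return mu_y, var_y
```

When the noise mean is far below the clean mean, the noisy Gaussian stays essentially equal to the clean one. When both means coincide, the mean shifts by log 2 and each variance enters with weight 1/4.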


Model Based Feature Compensation (VTS)

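Step 2: Estimation of P(k|y):

$$P(k|y) = \frac{P(k)\,\mathcal{N}_k(y)}{\sum_{k'} P(k')\,\mathcal{N}_{k'}(y)}$$

where $\mathcal{N}_k(y)$ is the k-th (noisy) Gaussian evaluated at the noisy speech y, and P(k) is the a-priori probability of that Gaussian.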
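Step 2 is Bayes' rule over the noisy GMM. A minimal sketch, assuming diagonal-covariance Gaussians, computes the posteriors in the log domain with the log-sum-exp trick so that very small likelihoods do not underflow. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_gauss_diag(y: Sequence[float], mean: Sequence[float],
                   var: Sequence[float]) -> float:
    """Log-density of a diagonal-covariance Gaussian evaluated at y."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v)
               for yi, m, v in zip(y, mean, var))


def posteriors(y: Sequence[float], priors: Sequence[float],
               means: Sequence[Sequence[float]],
               variances: Sequence[Sequence[float]]) -> List[float]:
    """P(k|y) = P(k) N_k(y) / sum_k' P(k') N_k'(y), via log-sum-exp."""
    logs = [math.log(p) + log_gauss_diag(y, m, v)
            for p, m, v in zip(priors, means, variances)]
    top = max(logs)
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]
```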

Step 3: Estimation of clean speech:
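The Step 3 equation is not preserved here either. The sketch below assumes the widely used first-order VTS MMSE estimate. Each Gaussian contributes its own mean shift mu_y,k - mu_x,k, weighted by its posterior P(k|y), giving x_hat = y - sum_k P(k|y) (mu_y,k - mu_x,k). The posteriors are taken as input, and the function name is hypothetical.

```python
from typing import List, Sequence


def mmse_clean_estimate(y: Sequence[float], post: Sequence[float],
                        mu_x: Sequence[Sequence[float]],
                        mu_y: Sequence[Sequence[float]]) -> List[float]:
    """x_hat = y - sum_k P(k|y) * (mu_y[k] - mu_x[k]), per log-FBE channel."""
    x_hat = list(y)
    for p, mx, my in zip(post, mu_x, mu_y):
        for d in range(len(x_hat)):
            # Remove the noise-induced mean shift of Gaussian k, weighted by P(k|y).
            x_hat[d] -= p * (my[d] - mx[d])
    return x_hat
```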


Model Based Feature Compensation (VTS)

Step 4: Estimation of uncertainty:

The uncertainty of the clean speech can be estimated as:

and, from the estimate of the clean speech:

assuming small values of the noise variance:
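The Step 4 equations are missing from the transcript. One simple way to obtain a per-channel uncertainty, offered as an assumption rather than the slides' derivation, uses the posterior-weighted spread of the per-Gaussian clean estimates x_k = y - (mu_y,k - mu_x,k) around their weighted mean x_hat. The variance within each Gaussian is neglected. The function name is hypothetical.

```python
from typing import List, Sequence


def clean_uncertainty(y: Sequence[float], post: Sequence[float],
                      mu_x: Sequence[Sequence[float]],
                      mu_y: Sequence[Sequence[float]]) -> List[float]:
    """Per-channel variance of the clean-speech estimate (assumed simplified form).

    Gaussian k gives x_k = y - (mu_y[k] - mu_x[k]); the result is
    sum_k P(k|y) * (x_k - x_hat)**2, ignoring within-Gaussian variance.
    """
    dims = len(y)
    per_k = [[y[d] - (my[d] - mx[d]) for d in range(dims)]
             for mx, my in zip(mu_x, mu_y)]
    x_hat = [sum(p * xk[d] for p, xk in zip(post, per_k)) for d in range(dims)]
    return [sum(p * (xk[d] - x_hat[d]) ** 2 for p, xk in zip(post, per_k))
            for d in range(dims)]
```

With a single dominant Gaussian the uncertainty is zero. It grows when Gaussians with different mean shifts share the posterior mass.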


Model Based Feature Compensation (VTS)

Aurora-2 results:

                        Avg. WER   Relative improv.
  BASELINE              34.1 %      0.0 %
  VTS + MVN             14.0 %     58.9 %
  VTS + MVN + UNCERT.   13.5 %     60.0 %

Some considerations about VTS:
- Computational load
- Better results than HEQ, PEQ, etc., but only valid for additive noise or channel distortion
- The noise estimation is critical
- There are some approximations in the formulation
- Uncertainty: small improvement (insertions, substitutions, deletions)

Alternative: model-based compensation based on numerical integration of the pdfs



Model-based VAD

Fundamentals of model-based VAD:
- Gaussian mixture model in the log-FBE domain, trained with clean speech
- VTS provides a noisy version of the GMM
- From the noisy GMM, P(k|y) can be estimated for each observation y and each Gaussian k
- The a-priori probability P(V|k) of the k-th Gaussian being speech can be estimated from the training data

Then, the probability P(V|y) of the noisy observation y being speech is given by:

$$P(V|y) = \sum_k P(V|k)\,P(k|y)$$
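The VAD rule combines the per-Gaussian speech probabilities with the noisy-GMM posteriors as P(V|y) = sum_k P(V|k) P(k|y). The sketch below also estimates P(V|k) from frame-labelled training data, assuming soft counts: posterior mass of Gaussian k on speech frames divided by its total posterior mass. The 0.5 decision threshold and all function names are hypothetical.

```python
from typing import List, Sequence


def speech_probability(post: Sequence[float],
                       p_speech_given_k: Sequence[float]) -> float:
    """P(V|y) = sum_k P(V|k) * P(k|y)."""
    return sum(pv * pk for pv, pk in zip(p_speech_given_k, post))


def vad_decision(post: Sequence[float], p_speech_given_k: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Label the frame as speech if P(V|y) exceeds a (hypothetical) threshold."""
    return speech_probability(post, p_speech_given_k) > threshold


def estimate_p_speech_given_k(frame_posts: Sequence[Sequence[float]],
                              labels: Sequence[int]) -> List[float]:
    """Soft-count estimate of P(V|k) from clean training frames.

    frame_posts[t][k] is P(k|x_t); labels[t] is 1 for speech, 0 for non-speech.
    """
    n_k = len(frame_posts[0])
    num = [0.0] * n_k
    den = [0.0] * n_k
    for post, lab in zip(frame_posts, labels):
        for k, p in enumerate(post):
            den[k] += p
            num[k] += p * lab
    return [n / d if d > 0 else 0.0 for n, d in zip(num, den)]
```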


Model-based VAD

Some considerations about model-based VAD:

- The VAD decision relies on a Gaussian mixture model trained with clean speech (based on the speech events observed in the training database)
- It is not based on energy, but on observations in the log-FBE domain
- VTS adapts the Gaussian mixture to noisy conditions, so the performance of the VAD is expected to be stable over a wide range of SNRs
- Computational load


Model-based VAD

Model-based VAD for different SNRs:


Model-based VAD

Comparison with other VADs: speech hit rate (HR1) and non-speech hit rate (HR0) evaluated on AURORA-2
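HR0 and HR1 are frame-level ratios. Assuming the usual definitions, HR0 is the fraction of reference non-speech frames that the VAD labels as non-speech, and HR1 is the fraction of reference speech frames it labels as speech. A minimal sketch (hypothetical function name; labels are 1 for speech, 0 for non-speech):

```python
from typing import Sequence, Tuple


def hit_rates(reference: Sequence[int], decision: Sequence[int]) -> Tuple[float, float]:
    """Return (HR0, HR1) for frame-level VAD decisions.

    Assumes both classes occur in the reference labels.
    """
    n0 = sum(1 for r in reference if r == 0)
    n1 = sum(1 for r in reference if r == 1)
    ok0 = sum(1 for r, d in zip(reference, decision) if r == 0 and d == 0)
    ok1 = sum(1 for r, d in zip(reference, decision) if r == 1 and d == 1)
    return ok0 / n0, ok1 / n1
```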



Model-based VAD

Aurora-2 recognition results (word accuracy, WAcc), with Wiener filtering (WF) and frame dropping (FD):

              WF        WF+FD
  G.729       57.1 %    57.8 %
  AMR.1       66.3 %    65.0 %
  AMR.2       78.3 %    78.5 %
  AFE         75.3 %    79.0 %
  VTS-VAD     78.4 %    80.2 %

Baseline: 60.5 % (no VAD, no WF, no FD)
