Covariation and weighting of harmonically decomposed streams for ASR


Upload: huong

Post on 05-Jan-2016


TRANSCRIPT

Page 1: Covariation and weighting of  harmonically decomposed streams for ASR

Covariation and weighting of harmonically decomposed streams for ASR

[Figure: production of /z/, with periodic and aperiodic components labelled]

• Introduction

• Pitch-scaled harmonic filter

• Recognition experiments

• Results

• Conclusion

Page 2: Covariation and weighting of  harmonically decomposed streams for ASR

Motivation and aims

• Most speech sounds are either voiced or unvoiced, which have very different properties:

– voiced: quasi-periodic signal from phonation

– unvoiced: aperiodic signal from turbulence noise

• Do these properties allow humans to recognize speech in noise?

– Perhaps we can use this information to help ASR, by computing separate features for the two parts.

• Are the two contributions complementary?

http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/ INTRODUCTION

Page 3: Covariation and weighting of  harmonically decomposed streams for ASR

Voiced and unvoiced parts of a speech signal

[Figure: production of /z/, showing the periodic contribution and the aperiodic contribution]

Page 4: Covariation and weighting of  harmonically decomposed streams for ASR

Pitch-scaled harmonic filter

[Block diagram. Labels: speech waveform s(n); pitch extraction (f0raw); pitch optimisation (optimised pitch f0opt, Nopt); PSHF; re-splicing; time shifting; periodic waveform v̂(n); aperiodic waveform û(n)]

http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/ METHOD
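The core idea behind a pitch-scaled harmonic filter can be sketched in a few lines. This is a simplified illustration, not the project's implementation: it assumes a rectangular analysis window spanning exactly four pitch periods (the function name `pshf_frame` and the parameter `periods` are hypothetical), and it omits the pitch optimisation, windowing, re-splicing and time-shifting stages named in the diagram. When the frame holds a whole number of periods, the harmonics of f0 fall exactly on every fourth DFT bin; those bins form the periodic estimate and the remainder forms the aperiodic estimate.

```python
import numpy as np

def pshf_frame(frame, periods=4):
    """Split one frame into periodic and aperiodic estimates.

    Assumes `frame` spans exactly `periods` pitch periods, so the
    harmonics of f0 sit on DFT bins periods, 2*periods, 3*periods, ...
    (simplified sketch: rectangular window, single frame).
    """
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame)
    harmonic = np.zeros_like(spectrum)
    # Keep only the bins that coincide with harmonics of the pitch.
    harmonic[periods::periods] = spectrum[periods::periods]
    periodic = np.fft.irfft(harmonic, n=len(frame))
    # Everything not captured by the harmonic bins is treated as aperiodic.
    aperiodic = frame - periodic
    return periodic, aperiodic

# Synthetic example: two harmonics of a 40-sample pitch period plus noise.
period = 40
n = np.arange(4 * period)
clean = np.sin(2 * np.pi * n / period) + 0.5 * np.sin(2 * np.pi * 3 * n / period)
rng = np.random.default_rng(0)
noisy = clean + 0.1 * rng.standard_normal(len(n))
periodic, aperiodic = pshf_frame(noisy)
```

Note that any noise energy lying on the harmonic bins stays in the periodic estimate, so the separation is approximate even in this idealised setting.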

Page 5: Covariation and weighting of  harmonically decomposed streams for ASR

Decomposition example (waveforms)

[Figure: waveforms of the original, periodic and aperiodic signals]

Page 6: Covariation and weighting of  harmonically decomposed streams for ASR

Decomposition example (spectrograms)

[Figure: spectrograms of the original, periodic and aperiodic signals]

Page 7: Covariation and weighting of  harmonically decomposed streams for ASR

Decomposition example (MFCC specs.)

[Figure: MFCC specs. of the original, periodic and aperiodic signals]

Page 8: Covariation and weighting of  harmonically decomposed streams for ASR

Speech database: Aurora 2.0

• Derived from the TIdigits database of connected English digit strings (male & female speakers), filtered with G.712 at 8 kHz.

        Data type                    Signal-to-Noise Ratio (dB)
TRAIN   clean-condition
TRAIN   multi-condition              20  15  10  5
TEST    set A (same noises)          20  15  10  5  0  -5
TEST    set B (different noises)     20  15  10  5  0  -5
TEST    set C (different channel)    20  15  10  5  0  -5

Page 9: Covariation and weighting of  harmonically decomposed streams for ASR

Description of the experiments

• Baseline experiment: [base]

– standard parameterisation of the original waveforms (i.e., MFCC, +Δ, +ΔΔ)

• PCA experiments: [pca26, pca78, pca13 and pca39]

– decorrelation of the feature vectors, and reduction of the number of coefficients

• Split experiments: [split, split1]

– adjustment of stream weights (periodic vs. aperiodic)

Caveat: pitch values were derived from the clean speech files, for the entire database!
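The decorrelation and dimension reduction described for the PCA experiments can be illustrated with a small sketch. It is only an illustration under stated assumptions: the feature values are random stand-ins, the sizes (13 coefficients per stream, 26 after concatenation, reduced to 13) are assumed from the experiment names, and `fit_pca` and `apply_pca` are hypothetical helper names rather than anything from the original system.

```python
import numpy as np

def fit_pca(features, n_components):
    """Estimate a PCA projection from an (n_frames, n_dims) feature matrix."""
    mean = features.mean(axis=0)
    cov = np.cov(features - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order], eigvals[order]

def apply_pca(features, mean, basis):
    """Project features onto the retained principal components."""
    return (features - mean) @ basis

# Stand-in data: two correlated 13-dimensional streams (assumed sizes).
rng = np.random.default_rng(0)
periodic_feats = rng.normal(size=(500, 13))
aperiodic_feats = 0.5 * periodic_feats + rng.normal(size=(500, 13))

# Concatenate the streams frame by frame, then decorrelate and reduce.
stacked = np.hstack([periodic_feats, aperiodic_feats])   # 26 dimensions
mean, basis, variances = fit_pca(stacked, n_components=13)
reduced = apply_pca(stacked, mean, basis)                # 13 dimensions
```

After projection the retained coefficients are mutually uncorrelated on the data used to fit the transform, and `variances` gives the variance carried by each principal component in decreasing order.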

Page 10: Covariation and weighting of  harmonically decomposed streams for ASR

Parameterisations

[Block diagrams of the feature pipelines]

BASE: waveform → MFCC → +Δ, +ΔΔ → features

PCA26, PCA78, PCA13, PCA39: blocks PSHF, MFCC, +Δ, +ΔΔ, cat, PCA

SPLIT, SPLIT1: blocks PSHF, MFCC, +Δ, +ΔΔ, cat

Page 11: Covariation and weighting of  harmonically decomposed streams for ASR

Full-sized PCA results

Word Error Rate (%)
         clean   multi   overall
base     47.4    21.7    34.6
pca26    33.8    11.4    22.6
pca78    42.7    12.8    27.7
pca13    28.3    13.0    20.7
pca39    30.3    14.5    22.4

http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/ RESULTS

Page 12: Covariation and weighting of  harmonically decomposed streams for ASR

Variance of Principal Components

[Figure: variance of the principal components for PCA26 and PCA39; legend: clean, multi]

Page 13: Covariation and weighting of  harmonically decomposed streams for ASR

PCA26 experiment's results

[Figure: two panels, CLEAN and MULTI]

Page 14: Covariation and weighting of  harmonically decomposed streams for ASR

Summary of best PCA results

Word Error Rate (%)
         clean   multi   overall
base     47.4    21.7    34.6
pca26    29.0    11.4    20.2
pca78    38.3    12.1    25.2
pca13    27.6    12.6    20.1
pca39    29.3    12.5    20.9

Page 15: Covariation and weighting of  harmonically decomposed streams for ASR

Split experiment’s results

Page 16: Covariation and weighting of  harmonically decomposed streams for ASR

Sample Split results

Word Error Rate (%)
              clean   multi   overall
base          47.4    21.7    34.6
split (=0)    62.9    44.3    53.6
split (=1)    28.5    11.7    20.1
split (=2)    22.7    11.5    17.1

Note: the same stream-weight values were used in training as in testing, for Split.
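One common way to weight two synchronous feature streams is to scale each stream's log-likelihood before summing them. The sketch below shows that combination for a single diagonal-covariance Gaussian per stream. It is an assumption about the general technique, not a description of the recogniser used here: the names `two_stream_loglik`, `w_per` and `w_aper` are hypothetical, and the sketch does not claim how the weight values in the table map onto them.

```python
import numpy as np

def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of x under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def two_stream_loglik(x_per, x_aper, model, w_per, w_aper):
    """Weighted sum of per-stream log-likelihoods (periodic, aperiodic)."""
    ll_per = diag_gauss_loglik(x_per, model["per_mean"], model["per_var"])
    ll_aper = diag_gauss_loglik(x_aper, model["aper_mean"], model["aper_var"])
    return w_per * ll_per + w_aper * ll_aper

# Toy state model and one observation per stream (made-up numbers).
model = {
    "per_mean": np.zeros(3), "per_var": np.ones(3),
    "aper_mean": np.ones(3), "aper_var": 2.0 * np.ones(3),
}
x_per = np.array([0.1, -0.2, 0.3])
x_aper = np.array([1.5, 0.5, 1.0])

equal = two_stream_loglik(x_per, x_aper, model, w_per=1.0, w_aper=1.0)
periodic_only = two_stream_loglik(x_per, x_aper, model, w_per=1.0, w_aper=0.0)
```

With both weights equal to 1 the score is the ordinary joint log-likelihood of independent streams; setting one weight to 0 makes the score depend on the other stream alone.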

Page 17: Covariation and weighting of  harmonically decomposed streams for ASR

Split1 experiment’s results

Page 18: Covariation and weighting of  harmonically decomposed streams for ASR

Summary of PCA & Split results

         Word Error Rate (%)         WER reduction (%)
         clean   multi   overall     abs.    rel.
base     47.4    21.7    34.6         0.0     0.0
pca26    29.0    11.4    20.2        14.4    41.6
pca78    38.3    12.1    25.2         9.4    27.2
pca13    27.6    12.6    20.1        14.5    41.9
pca39    29.3    12.5    20.9        13.7    39.6
split    22.6    11.0    16.8        17.8    51.4
split1   21.0    10.9    16.0        18.6    53.8
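The absolute and relative figures in the summary follow directly from the overall word error rates. A short check of that arithmetic, using the overall values reported on this slide (the function name `wer_reduction` is only illustrative):

```python
def wer_reduction(baseline_wer, system_wer):
    """Return (absolute, relative) WER reduction, both in percent."""
    absolute = baseline_wer - system_wer
    relative = 100.0 * absolute / baseline_wer
    return absolute, relative

# Overall WER (%) for each system, as reported in the summary.
overall = {
    "pca26": 20.2, "pca78": 25.2, "pca13": 20.1, "pca39": 20.9,
    "split": 16.8, "split1": 16.0,
}
baseline = 34.6

reductions = {name: wer_reduction(baseline, wer) for name, wer in overall.items()}
for name, (absolute, relative) in reductions.items():
    print(f"{name:7s} abs. {absolute:5.1f}  rel. {relative:5.1f}")
```

The overall column itself is consistent with the mean of the clean and multi columns, for example (47.4 + 21.7) / 2 ≈ 34.6 for the baseline.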

Page 19: Covariation and weighting of  harmonically decomposed streams for ASR

http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/ CONCLUSION

Conclusions

• PSHF module split Aurora's speech waveforms into two synchronous streams (periodic and aperiodic)

– large improvements over the single-stream Baseline

• Split was better than all PCA combinations:

– PCA26/13 better than PCA78/39, and PCA13 best

– Split1 marginally better than Split

• Periodic speech segments give robustness to noise.

Further work

– Modeling: how best to combine the streams?

– LVCSR: evaluate front end on TIMIT (phone recognition).

– Robust pitch tracking

Page 20: Covariation and weighting of  harmonically decomposed streams for ASR

COLUMBO PROJECT: Harmonic decomposition applied to ASR

Philip J.B. Jackson 1

David M. Moreno 2

Javier Hernando 2

Martin J. Russell 3

[Affiliation logos: 1, 2, 3]

http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/