rasta processing of speech

RASTA Processing of Speech Hynek Hermansky & Nelson Morgan

Upload: benjamin-bengfort

Posted on 13-May-2015


DESCRIPTION

A presentation of Hermansky & Morgan's 1994 paper, RASTA Processing of Speech. Learn about the dramatic effect of RASTA filtering on critical-band analysis when combined with PLP for speech recognition! Hermansky, Hynek, and Nelson Morgan. "RASTA processing of speech." IEEE Transactions on Speech and Audio Processing 2.4 (1994): 578-589.

TRANSCRIPT

Page 1: Rasta processing of speech

RASTA Processing of Speech
Hynek Hermansky & Nelson Morgan

Page 2

The Question

Relying on stochastic techniques alone to derive information from sound seems wasteful, especially since non-speech components (e.g., the frequency response of a telephone channel) have a predictable effect on the speech signal.

Can we suppress spectral components that change more quickly or more slowly than speech does?

Page 3

The Answer

RASTA, much like human listeners, responds not to the speech spectrum itself but to relative spectral changes, which reduces slowly changing or steady-state factors (noise!). This emphasizes changes, or “edges”.

Page 4

Quick disclaimer: we definitely know what we’re talking about

Page 5

Edge Detection

Page 6

Inspiration

Humans can perceive speech-like sounds based on the spectral difference between the current sound and the preceding sound.

Page 7

Sounds!

An analogous situation might occur in time-reversed speech:

Intelligibility of Time Reversed Speech

Page 9

Filters

Page 10

More Sounds!

What band-pass filtering sounds like, from Chris’ experiments.

Page 11

Speech Processing Review
(image: http://www.learnartificialneuralnetworks.com/images/srfig01.jpg)

Page 12


Page 13

Perceptual Linear Prediction
(image: http://svr-www.eng.cam.ac.uk/~ajr/SA95/img181.gif)

Page 14

Replace the conventional critical-band short-term spectrum in PLP analysis with a spectral estimate in which the time trajectory of each frequency channel is band-pass filtered by a filter with a sharp spectral zero at zero frequency.

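Because constant or slowly varying components in each frequency channel are suppressed, the new estimate is less sensitive to slow variations in the short-term spectrum, such as those introduced by the communication channel.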

The RASTA Method
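A small numerical check of that claim, as a sketch under stated assumptions: a fixed channel adds a constant to each log critical-band trajectory, and a filter with a zero at zero frequency removes it. The code uses the band-pass coefficients of the paper's RASTA filter (numerator 0.1·[2, 1, 0, -1, -2], pole 0.98) applied causally; the random trajectory, the 10 ms frame step, and the channel gain of 2.0 are made-up illustration values.

```python
import numpy as np

# RASTA filter coefficients from the 1994 paper, applied causally (4-frame lag).
B = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # FIR part; coefficients sum to 0
POLE = 0.98                                       # single real pole


def rasta_filter(traj):
    """Filter one log critical-band trajectory (one value per frame)."""
    x = np.asarray(traj, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):
        fir = sum(B[k] * x[n - k] for k in range(len(B)) if n >= k)
        y[n] = fir + (POLE * y[n - 1] if n > 0 else 0.0)
    return y


rng = np.random.default_rng(0)
frames = 600                     # 6 s, assuming a 10 ms frame step
clean = rng.normal(size=frames)  # stand-in log spectral trajectory of one band
channel = 2.0                    # constant log gain of a hypothetical channel

# Filtering is linear, so the difference is the filter's response to the constant.
diff = rasta_filter(clean + channel) - rasta_filter(clean)
print(np.abs(diff[:10]).max())   # large during the start-up transient
print(abs(diff[-1]))             # essentially zero once the transient decays
```

After a short start-up transient, the channel's contribution decays by a factor of 0.98 per frame, so within a few seconds the filtered trajectories with and without the channel are practically identical.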

Page 15

1. Compute the critical-band power spectrum (PLP)
2. Transform the spectral amplitude through a compressing static nonlinear transformation (RASTA)
3. Filter the time trajectory of each transformed spectral component (RASTA)
4. Transform the filtered speech representation through an expanding static nonlinear transformation (RASTA)
5. Multiply by the equal-loudness curve and raise to the power 0.33 to simulate the power law of hearing (PLP)
6. Compute an all-pole model of the result (PLP)

RASTA-PLP
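Steps 2 to 6 can be sketched in Python. This is a simplified, hypothetical implementation, not the authors' code: it assumes step 1 has already produced a (frames x bands) critical-band power spectrogram, uses the logarithmic domain for steps 2 and 4, takes the equal-loudness curve as a caller-supplied array of per-band weights (`loudness`, a hypothetical parameter), and fits the all-pole model with the standard autocorrelation method (Levinson-Durbin recursion). The filter coefficients and pole follow the paper's transfer function, applied causally.

```python
import numpy as np

# RASTA filter from the 1994 paper, applied causally (output lags 4 frames).
B = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
POLE = 0.98


def rasta_filter(traj):
    """Band-pass one trajectory: y[n] = sum_k B[k] x[n-k] + POLE * y[n-1]."""
    x = np.asarray(traj, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):
        fir = sum(B[k] * x[n - k] for k in range(len(B)) if n >= k)
        y[n] = fir + (POLE * y[n - 1] if n > 0 else 0.0)
    return y


def levinson(r, order):
    """Levinson-Durbin: autocorrelation r[0..order] -> A(z) coefficients, a[0] = 1."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -np.dot(a[:i], r[i:0:-1]) / err  # reflection coefficient
        a[: i + 1] = a[: i + 1] + k * a[i::-1]
        err *= 1.0 - k * k
    return a, err


def rasta_plp(crit_band_power, loudness, order=8):
    """Steps 2-6 on a (frames x bands) critical-band power spectrogram (step 1)."""
    log_spec = np.log(crit_band_power)                          # step 2: compress (log)
    filtered = np.apply_along_axis(rasta_filter, 0, log_spec)   # step 3: filter each band over time
    spec = np.exp(filtered)                                     # step 4: expand (exp)
    aud = (spec * loudness) ** 0.33                             # step 5: equal loudness, power law
    coeffs = []
    for frame in aud:
        r = np.fft.irfft(frame)[: order + 1]  # autocorrelation of the auditory spectrum
        a, _ = levinson(r, order)             # step 6: all-pole model
        coeffs.append(a)
    return np.array(coeffs)


rng = np.random.default_rng(0)
power = rng.uniform(0.5, 2.0, size=(200, 17))  # stand-in spectrogram: 200 frames, 17 bands
lpc = rasta_plp(power, loudness=np.ones(17))   # flat weights as a placeholder curve
print(lpc.shape)  # (200, 9)
```

Swapping `np.log`/`np.exp` for another compressing/expanding pair is the only change needed to try a different filtering domain.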

Page 16

The Key → suppress constant factors in each component of the auditory-like spectrum prior to estimating the all-pole model.

Research issues:
- In what domain should the filtering be done?
- What filter should be used?

Block diagram: Speech Signal → Spectral Analysis → Bank of Compressing Static Nonlinearities → Bank of Linear Bandpass Filters → Bank of Expanding Static Nonlinearities → Continued Processing

Page 17

For this paper: an IIR filter with this transfer function
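The transfer function itself was an image that did not survive in this transcript. As published in the 1994 paper, it is:

```latex
H(z) = 0.1\, z^{4}\, \frac{2 + z^{-1} - z^{-3} - 2 z^{-4}}{1 - 0.98\, z^{-1}}
```

The numerator coefficients sum to zero, so H(1) = 0: the filter has a zero at zero frequency, which removes constant components of each trajectory. The pole at z = 0.98 controls how narrow that notch around zero frequency is, and the z^4 factor makes the filter non-causal, so a real-time implementation delays its output by four frames.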

Page 18

Resulting Filter
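The frequency-response plot for this slide is missing. It can be recomputed from the transfer function; this sketch assumes a 10 ms frame step (100 frames per second), so modulation frequencies run from 0 to 50 Hz. `FRAME_RATE` and `rasta_magnitude` are illustrative names.

```python
import numpy as np

FRAME_RATE = 100.0  # assumed: 10 ms frame step -> 100 frames per second
B = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # numerator of H(z) without z^4
A = np.array([1.0, -0.98])                       # denominator 1 - 0.98 z^-1


def rasta_magnitude(freq_hz):
    """|H(e^{jw})| of the RASTA filter at the given modulation frequency (Hz)."""
    w = 2.0 * np.pi * np.asarray(freq_hz, dtype=float) / FRAME_RATE
    num = sum(b * np.exp(-1j * w * k) for k, b in enumerate(B))
    den = sum(a * np.exp(-1j * w * k) for k, a in enumerate(A))
    return np.abs(num / den)  # the z^4 advance changes only the phase


freqs = np.linspace(0.0, FRAME_RATE / 2, 501)  # 0 to 50 Hz in 0.1 Hz steps
mag = rasta_magnitude(freqs)
peak_hz = freqs[np.argmax(mag)]
print(f"peak gain {mag.max():.3f} at {peak_hz:.1f} Hz")
```

Under that frame-rate assumption the response is zero at 0 Hz, peaks at a few hertz, and is strongly attenuated at higher modulation frequencies, which is the band-pass shape the slide describes.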

Page 19

- The filtering domain determines the choice of compressing/expanding static nonlinear functions:

1. Logarithmic
2. Lin-log

Two Flavors of RASTA

Page 20

Logarithmic amplitude transformation (step 2)
Antilogarithmic (exponential) transformation (step 4)
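Why the log domain helps, assuming a linear time-invariant channel h(t) whose impulse response is short compared with the analysis window: in the short-term spectrum the channel multiplies, and the logarithm turns that product into a sum (n is the frame index).

```latex
x(t) = s(t) * h(t)
\;\Rightarrow\;
X(\omega, n) \approx S(\omega, n)\, H(\omega)
\;\Rightarrow\;
\log\lvert X(\omega, n)\rvert = \log\lvert S(\omega, n)\rvert + \log\lvert H(\omega)\rvert
```

Because log|H(ω)| does not depend on n, it is a constant (zero-frequency) component of each band's trajectory, which the filter's zero at zero frequency removes before the exponential in step 4 maps the result back.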

Page 21

Lin-log transformation y = ln(1 + J·x), where J is a signal-dependent positive constant: the function is nearly linear where J·x is much smaller than 1 and nearly logarithmic where J·x is much larger than 1

(Plots of the lin-log function for J = 0.1 and J = 1.0)
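A sketch of the lin-log pair, assuming the compressing function y = ln(1 + J·x) and, for the expanding step, its exact inverse x = (e^y - 1)/J. The transcript does not show which expanding function the authors used, so treat the inverse as an assumption. Note that after band-pass filtering y can be negative, in which case the exact inverse returns negative values; this sketch does not handle that.

```python
import numpy as np


def linlog(x, J):
    """Compressing lin-log nonlinearity: y = ln(1 + J*x)."""
    return np.log1p(J * np.asarray(x, dtype=float))


def linlog_inverse(y, J):
    """Expanding step, assumed here to be the exact inverse: x = (e^y - 1) / J."""
    return np.expm1(np.asarray(y, dtype=float)) / J


x = np.array([1e-4, 1e-2, 1.0, 1e2, 1e4])
for J in (0.1, 1.0):
    y = linlog(x, J)
    # Small J*x: y ~ J*x (linear). Large J*x: y ~ ln(J) + ln(x) (logarithmic).
    print(J, np.round(y, 4))
```

`log1p` and `expm1` are used instead of `log(1 + ...)` and `exp(...) - 1` to keep precision in the near-linear region where J·x is tiny.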

Page 22

Results

Page 23

Digits recorded over phone lines, with or without noise or changes in noise over time

Isolated Digit Recognition

Page 24

Large Vocab Continuous Speech

Four speakers each reading 2,652 sentences
Sentences were either kept as recorded or low-pass filtered

Page 25

Next Experiments

● Let’s train the model with no noise and then test it with noise in the background

● Analogous to a system set up in the factory and then used in the real world

Page 26

● RASTA > PLP when noise changes between training and test

● The success of RASTA depends on the transformation (domain) applied to the signal

Isolated Digit Recognition

Page 27

Large Vocab Continuous Speech

● Again, success depends on the filter used

Page 28

Optimizing J

● It therefore seems important to pick an appropriate value of the domain parameter J for each noise level

● A suitable J can be estimated by measuring the noise energy at the beginning of an utterance

● Performance improves even more!
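One way to turn that idea into code, as a hypothetical sketch: average the critical-band power over the first few frames (assumed to contain no speech) and set J inversely proportional to it, so that the linear-to-logarithmic crossover J·x ≈ 1 falls near the noise level. The function name `estimate_j`, the number of lead frames, and the `scale` constant are illustrative choices, not values from the paper.

```python
import numpy as np


def estimate_j(crit_band_power, n_lead_frames=10, scale=1.0):
    """Hypothetical J estimate from the first frames of an utterance.

    Assumes those frames are speech-free and uses J = scale / mean noise power,
    which places the crossover J * x ~ 1 near the estimated noise level.
    """
    lead = np.asarray(crit_band_power, dtype=float)[:n_lead_frames]
    noise_power = lead.mean()
    return scale / noise_power


quiet = np.full((100, 17), 0.01)  # low background noise power (made-up)
noisy = np.full((100, 17), 1.0)   # high background noise power (made-up)
print(estimate_j(quiet), estimate_j(noisy))  # larger J for the quiet recording
```

The design intent is that components near or below the noise level stay in the near-linear region, where additive noise remains approximately additive, while stronger speech components are compressed logarithmically.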

Page 29

Consequences of RASTA Processing

● The most important advance of RASTA: comparing current spectral information with previous information

● This highlights transitions and changes → edge detection!
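The “compare current to previous” behavior is visible in the filter's numerator: with the z^4 advance, 0.1·(2z^4 + z^3 - z - 2) is the least-squares slope of a line fitted over five consecutive frames (weights -2 to 2 divided by their sum of squares, 10). This sketch evaluates that slope on a flat-then-step trajectory and on a steady ramp; leaving the two frames at each boundary at zero is an arbitrary choice, and `slope5` is an illustrative name.

```python
import numpy as np


def slope5(traj):
    """0.1 * sum_{k=-2..2} k * x[n+k]: the RASTA numerator, re-centred on frame n.

    This equals the least-squares slope over frames n-2..n+2 (sum of k^2 is 10).
    The first and last two frames are left at zero (an arbitrary choice).
    """
    x = np.asarray(traj, dtype=float)
    out = np.zeros_like(x)
    for n in range(2, len(x) - 2):
        out[n] = 0.1 * sum(k * x[n + k] for k in range(-2, 3))
    return out


step = np.array([0.0] * 10 + [1.0] * 10)  # a spectral "edge" at frame 10
ramp = 0.5 * np.arange(20, dtype=float)   # steady change of 0.5 per frame
print(np.round(slope5(step), 2))  # nonzero only around the edge
print(np.round(slope5(ramp), 2))  # 0.5 away from the boundaries
```

The slope is zero on flat stretches, constant on a steady ramp, and peaks at the step, which is exactly the edge-detection behavior; the filter's pole then adds a slowly decaying memory of these changes.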

Page 30