ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

Preeti Rao and Pushkar Patwardhan
Department of Electrical Engineering, Indian Institute of Technology, Bombay, India

Page 1: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

Preeti Rao and Pushkar Patwardhan

Department of Electrical Engineering, Indian Institute of Technology, Bombay, India

Page 2: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

The MBE Speech Model (Griffin & Lim, 1988)

[Figure: MBE modeling; examples: original, modeled]

Page 3: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

Frame-based analysis

Within the window, assume: a constant-amplitude, constant-frequency sinusoidal model
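As a minimal sketch of this assumption, one analysis frame can be written as a windowed sum of pitch harmonics whose amplitudes and frequencies stay fixed over the frame. The function name, the Hamming window and the 8 kHz sampling rate are illustrative choices, not taken from the slides.

```python
import math

def harmonic_frame(f0_hz, amplitudes, phases, frame_len, fs_hz=8000.0):
    """Windowed frame of a constant-amplitude, constant-frequency harmonic model.

    Hypothetical helper: harmonic k has frequency k * f0_hz, amplitude
    amplitudes[k-1] and phase phases[k-1], all held fixed over the frame.
    """
    frame = []
    for n in range(frame_len):
        # Hamming window (an assumed choice of analysis window)
        w = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
        s = sum(a * math.cos(2.0 * math.pi * k * f0_hz * n / fs_hz + p)
                for k, (a, p) in enumerate(zip(amplitudes, phases), start=1))
        frame.append(w * s)
    return frame

# 20 ms frame at 8 kHz with three harmonics of a 100 Hz pitch
frame = harmonic_frame(f0_hz=100.0, amplitudes=[1.0, 0.5, 0.25],
                       phases=[0.0, 0.0, 0.0], frame_len=160)
```

Jitter or shimmer inside the frame violates exactly this fixed-parameter assumption, which is why several pitch cycles per frame make the mismatch more visible.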

Page 4: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

MBE Speech Model Parameters

Windowed speech => Parameter Estimation =>
• Pitch
• Harmonic amplitudes
• Band-wise voicing decisions

(Phase is predicted for smoothness)

Page 5: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

MBE Analysis: Parameter Estimation

Pitch and Spectral Amplitudes: Analysis-by-synthesis matching of a predicted harmonic spectrum with the actual signal spectrum.

Voicing decision per frequency band (3 harmonics): Based on the error between the actual and predicted spectra.
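The two steps can be sketched roughly as follows, assuming the pitch is already known and working on magnitude spectra only. Each harmonic amplitude is the least-squares scale of a window main-lobe template onto the measured spectrum, and each band of 3 harmonics is declared voiced when its normalised matching error is small. The function name, the data layout and the fixed threshold of 0.2 are assumptions for illustration; the pitch search itself is not shown.

```python
def band_voicing(measured, templates, harmonics_per_band=3, threshold=0.2):
    """Fit one amplitude per harmonic and make one voicing decision per band.

    measured[k]  : magnitude-spectrum samples around harmonic k
    templates[k] : window main-lobe magnitudes at the same bins
    threshold    : assumed fixed value, illustrative only
    """
    amplitudes, errors, energies = [], [], []
    for s, t in zip(measured, templates):
        # Least-squares amplitude that best scales the template onto the data
        a = sum(x * y for x, y in zip(s, t)) / sum(y * y for y in t)
        amplitudes.append(a)
        errors.append(sum((x - a * y) ** 2 for x, y in zip(s, t)))
        energies.append(sum(x * x for x in s))

    decisions = []
    for i in range(0, len(amplitudes), harmonics_per_band):
        err = sum(errors[i:i + harmonics_per_band])
        energy = sum(energies[i:i + harmonics_per_band])
        # Normalised matching error: small means the band looks harmonic
        decisions.append(energy > 0.0 and err / energy < threshold)
    return amplitudes, decisions

lobe = [0.2, 1.0, 0.2]            # toy main-lobe shape
harmonic_like = [0.4, 2.0, 0.4]   # scaled copy of the lobe
noise_like = [1.0, 1.0, 1.0]      # flat, no lobe structure
amps, voiced = band_voicing([harmonic_like] * 3 + [noise_like] * 3, [lobe] * 6)
```

In this toy input the first band matches the template and comes out voiced, while the flat second band exceeds the threshold and comes out unvoiced.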

Page 6: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL


MBE Analysis: Spectral Matching

Voicing thresholds are frame-adapted as determined by experimental tuning.

Page 7: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

MBE Synthesis

Voiced speech synthesis: pitch and voiced amplitudes => linear interpolation => bank of harmonic oscillators => voiced speech

Unvoiced speech synthesis: white noise => STFT => replace envelope (unvoiced amplitudes) => weighted overlap-add => unvoiced speech

Reconstructed speech = voiced speech + unvoiced speech
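The voiced branch can be sketched as a bank of harmonic oscillators whose amplitudes and pitch are linearly interpolated from the previous frame to the current one, with running phases carried across frames for continuity. This is a simplified illustration under assumed names and an assumed 8 kHz sampling rate; harmonics in unvoiced bands are expected to be passed in with zero amplitude, and the unvoiced (noise) branch is not shown.

```python
import math

def synth_voiced_frame(f0_prev, f0_cur, amps_prev, amps_cur, start_phases,
                       frame_len, fs_hz=8000.0):
    """Synthesise one frame of voiced speech with a harmonic oscillator bank.

    Hypothetical helper: returns the samples and the end phases, which the
    caller feeds back in as start_phases for the next frame.
    """
    phases = list(start_phases)
    out = []
    for n in range(frame_len):
        frac = n / frame_len
        # Linear interpolation of the pitch across the frame
        f0 = (1.0 - frac) * f0_prev + frac * f0_cur
        sample = 0.0
        for k in range(len(amps_cur)):
            # Linear interpolation of each harmonic amplitude
            amp = (1.0 - frac) * amps_prev[k] + frac * amps_cur[k]
            sample += amp * math.cos(phases[k])
            # Advance the oscillator phase of harmonic k+1 by one sample
            phases[k] += 2.0 * math.pi * (k + 1) * f0 / fs_hz
        out.append(sample)
    return out, phases

# Second harmonic has zero amplitude, as if its band were unvoiced
samples, end_phases = synth_voiced_frame(
    f0_prev=100.0, f0_cur=100.0, amps_prev=[1.0, 0.0], amps_cur=[0.5, 0.0],
    start_phases=[0.0, 0.0], frame_len=80)
```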

Page 8: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

Narrowband Speech Coding with MBE

The efficient quantisation of MBE parameters has led to:

• IMBE (Inmarsat) @ 4.15 kbps
• DVSI MBE codecs @ >2 kbps
• LR MBE (IITB) @ 1.5 kbps
• Research groups (Univ. Surrey, UCSB, Sony Corp.) @ 1.2 kbps to 3 kbps

[Examples: reference, modeled]

Page 9: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

Related Models: Speech Synthesis

• Harmonics+Noise Model (HNM): Stylianou
• Harmonic/Stochastic Model (H/S): Dutoit, 1996

Emphasis is on natural-sounding wideband speech and easy prosody modification.

Both use essentially the Griffin & Lim MBE analysis. Important differences:
• Analysis and synthesis are pitch synchronous
• Estimated harmonic phases are utilised in synthesis

Page 10: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

MBE Model: Limitations

The codec speech quality does not improve with increasing bit rate => the model has its limitations

Assumption of frame-level quasi-stationarity: enables the accurate representation only of
• vowels
• unvoiced and voiced fricatives
(not plosives, onsets, …)

Page 11: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

“Steady” Sounds: Voice Quality

• Glottal pulse shape variation (brightness, vocal effort); examples: dark, sharp
• Pitch cycle variations: jitter / shimmer (roughness / harshness)
• Frication and aspiration (friction, breathiness)

[Figure: glottal pulse (pitch periods T1, T2, ..., Tm), vocal tract response and speech signal; time axis in msec]

Page 12: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

Role of Model Excitation Parameters

The glottal spectral shape (glottal waveform shape) can be captured by the spectral envelope parameters. But the perceptual effects of
• vocal cord vibration aperiodicities
• aspiration / frication noise
must be reproduced (if at all) by the MB excitation.

Page 13: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL


Effect of Aperiodicities on MBE Parameters

Voice source aperiodicities distort the harmonic spectrum (esp. if the frame contains several pitch cycles).

• Modulation (jitter-shimmer) aperiodicities => smearing of harmonic lobe structure; noise and subharmonics may be introduced.

• Aspiration noise => additive noise in harmonic regions
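The slides do not state how jitter and shimmer are measured. A commonly used cycle-to-cycle definition, assumed here purely for illustration, is the mean absolute difference between consecutive values divided by the mean value, in percent. Applied to pitch periods it gives % jitter; applied to cycle peak amplitudes it gives % shimmer. The function name and the sample values are hypothetical.

```python
def percent_perturbation(values):
    """Mean absolute cycle-to-cycle change, as a percentage of the mean value.

    Assumed definition: use pitch periods for % jitter and cycle peak
    amplitudes for % shimmer. Needs at least two cycles.
    """
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value

periods_s = [0.0100, 0.0102, 0.0098, 0.0101]   # made-up pitch periods near 100 Hz
peak_amps = [1.00, 0.90, 1.05, 0.95]           # made-up cycle peak amplitudes

jitter_pct = percent_perturbation(periods_s)
shimmer_pct = percent_perturbation(peak_amps)
```

A perfectly periodic source gives 0 % for both measures, which is the case the frame-level harmonic model represents exactly.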

Page 14: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL


MBE Analysis: Aperiodic Vowel

Increase in the analysis spectrum matching error => MBE synthesis of UV (random noise) frequency bands

Page 15: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL


Previous: On Multi-band Excitation

• Fujimura, 1968: “A crude approximation of aperiodicity observed in natural speech can be made by distributing patches of random noise signals in the time-frequency space of the speech signal.”

• Makhoul, 1978: “Spectral devoicing due to vocal cord vibration irregularities is an artifact of the spectral estimation, and it may not be appropriate to use a noise source for the synthesis…”

• Griffin and Lim, 1988: Justify MBE model by quoting Fujimura, and also their own observations with speech in noise.

Page 16: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL


Synthetic Vowel: Modulation Aperiodicities

Page 17: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

Synthetic Vowel: Modulation Aperiodicities

[Figure, two panels: % jitter vs. pitch (Hz) for the high-jitter vowel, and % shimmer vs. pitch (Hz) for the high-shimmer vowel; each panel compares reference and modeled, pitch axis 50 to 300 Hz]

[Examples at 80 Hz, 160 Hz and 250 Hz: periodic reference, high jitter, high shimmer]

Page 18: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL


Fujimura-type Experiment

Highly jittered vowel /ɑ/

Reference

MBE (note “unfused” noise)

MBE-modeled with forced decisions

Page 19: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

Experiments with Natural Speech

Goal: to study the MBE representation of
• Unvoiced and voiced fricatives
• Breathy voice
• Rough and hoarse voices
• Speech in noisy background

To understand the implications of simplifying the excitation to single-band (SBE) or two-band excitation (TBE)
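One way to picture the simplification is as a mapping from the per-band MBE voicing decisions to a coarser pattern. The rules below are assumptions for illustration only, since the slides do not say how the SBE or TBE decisions were derived: TBE is taken as voiced below a single cutoff band and unvoiced above it, with the cutoff set to the number of voiced bands, and SBE as one majority decision applied to every band.

```python
def to_two_band(decisions):
    """Two-band excitation: voiced below a single cutoff, unvoiced above.

    Assumed rule: the cutoff index equals the number of voiced MBE bands.
    """
    cutoff = sum(1 for v in decisions if v)
    return [i < cutoff for i in range(len(decisions))]

def to_single_band(decisions):
    """Single-band excitation: one decision for the whole frame.

    Assumed rule: voiced if at least half of the MBE bands are voiced.
    """
    voiced = 2 * sum(1 for v in decisions if v) >= len(decisions)
    return [voiced] * len(decisions)

mbe = [True, False, True, True, False, False]   # made-up MBE band decisions
tbe = to_two_band(mbe)
sbe = to_single_band(mbe)
```

In this toy frame the devoiced second band is lost under both simplifications, which is the kind of detail the listening comparisons probe.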

Page 20: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

VCV: /ɑzɑ/

[Examples: reference, MBE-modeled, SBE-modeled]

Page 21: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

VCV: /ɑʒɑ/

[Examples: reference, MBE-modeled]

Page 22: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

Voice Quality: Breathy

[Examples: reference, MBE-modeled, TBE-modeled (buzzy)]

Page 23: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

Voice Quality: Harsh

[Examples: reference, MBE-modeled, TBE-modeled]

Page 24: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

Voice Quality: Rough

[Examples: reference, MBE-modeled]

Page 25: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

Noise Corrupted Speech (15 dB SNR)

[Examples: reference, MBE-modeled, TBE-modeled (buzzy)]

Page 26: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL


Conclusions

• MB excitation represents frication and aspiration accurately; esp. crucial for noisy speech.

• Modulation aperiodicities are not captured at high pitches except through devoiced bands. Depending on the setting of thresholds, the noise bands may not fuse perceptually.

• It is possible to simulate partially the perceptual effects of jitter/shimmer by the controlled devoicing of bands in the t-f space.

Page 27: ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

Thank you