1cm introducing covarep: a collaborative voice analysis...

Introducing COVAREP:

A collaborative voice analysis repositoryfor speech technologies

John Kane

Wednesday November 27th, 2013SIGMEDIA-group

TCD

COVAREP - Open-source speech processing repository 1

Introduction

(a) Gilles Degottex (b) Thomas Drugman

(c) Tuomo Raitio (d) Stefan Scherer


Motivation

“...open, well-documented, and well-tested scientific code isessential not only to reproducibility in modern scientific research,but to the very progression of research itself.”


Related toolkits

KALDI - Speech recognition toolkit

- Speech processing toolkit

VOICEBOX - Speech analysis toolkit


Solution?

Fast, effective results every time


COVAREP - Aims

Website: http://covarep.github.io/covarep/index.htmlGitHub: https://github.com/covarep/covarep


h

h

COVAREP - Aims

I More reproducible research

I Increase the availability and impact of speech processingalgorithms

I Participation and feedback


COVAREP - Scope

I Broad scope - any speech signal processing algorithms

I Speech analysis, synthesis, conversion, transformation, speechquality, enhancement, glottal source/voice quality analysis,etc.

I Use! Contribute!


Overview of COVAREP

Speech

Signal

PolarityDetection

GCIEstimation

PitchTracking

GlottaldFlowEstimation

GlottaldFlowParameterization

SpectraldEnvelopeEstimationd

FormantTracking

Phase-basedRepresentation

SinusoidalModeling


Overview of COVAREP

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking

Glottal FlowEstimation

Glottal FlowParameterization

Spectral EnvelopeEstimation

FormantTracking


SinusoidalModeling

1. Periodicity


Overview of COVAREP

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking




FormantTracking


SinusoidalModeling

1. Periodicity

2. Spectral envelope


Overview of COVAREP

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking




FormantTracking


SinusoidalModeling

1. Periodicity


3. Sine modelling


Overview of COVAREP

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking




FormantTracking


SinusoidalModeling

1. Periodicity


3. Sine modelling

4. Glottal analysis


Overview of COVAREP

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking




FormantTracking


SinusoidalModeling

1. Periodicity


3. Sine modelling

4. Glottal analysis

4. Phase analysis


COVAREP - Periodicity & synchronicity

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking




FormantTracking


SinusoidalModeling

1. Periodicity


COVAREP - Periodicity & synchronicity

I Polarity detection

I f0 and voicing decision extraction

I Detection of glottal closure instants


Periodicity & synchronicity - F0 extraction

0 1000 2000 3000 4000 5000−50

0

50A

mpl

itude

(dB

)

Frequency (Hz)

Speech spectrum

Speech amplitude spectrum



0 1000 2000 3000 4000 5000−50

0

50A

mpl

itude

(dB

)

Frequency (Hz)

Speech spectrum

0 1000 2000 3000 4000 5000−60

−40

−20

0

Frequency (Hz)

Am

plitu

de (

dB)

Residual spectrum

Envelope-removed speech amplitude spectrum



0 1000 2000 3000 4000 5000−50

0

50A

mpl

itude

(dB

)

Frequency (Hz)

Speech spectrum

0 1000 2000 3000 4000 5000−60

−40

−20

0

Frequency (Hz)

Am

plitu

de (

dB)

Residual spectrum

SRH(f) = E (f )+∑N

k=2[E (k ·f )−E ((k−0.5)·f )] for f ∈ [F 0min,F 0max ]

where E is the residual spectrum, f is frequency (Hz) and N is the number of harmonics considered


Periodicity & synchronicity - F0 extractionF

requ

ency

(H

z)

Time (seconds)

Residual harmonic summation

0.5 1 1.5 2 2.5 350

100

150

200

250

Residual harmonic summation over time


COVAREP - Periodicity & synchronicityFre

quen

cy [H

z]

Discrete All−Pole (DAP) envelope

0.56 0.58 0.6 0.62 0.64 0.66 0.68 0.70

1000

2000

3000

4000

5000

0.56 0.58 0.6 0.62 0.64 0.66 0.68 0.7

−0.1

−0.05

0

0.05

0.1

0.15Glottal Flow (GF) derivative with GCIs

Time [s]

Am

plitu

de

Detected glottal closure instants


COVAREP - Spectral envelope estimation

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking




FormantTracking


SinusoidalModeling




I Discrete all-pole (DAP) model

I “True envelope” (TE) - spectral envelope by iterative cepstralsmoothing

I Weighted linear prediction

I Conversion from envelope to Mel-Frequency CepstralCoefficients (MFCC)



0 1000 2000 3000 4000 5000 6000 7000 8000−50

−40

−30

−20

−10

0

10

20

30Speech spectrum

Am

plitu

de (

dB)

Frequency (Hz)

Speech amplitude spectrumCOVAREP - Open-source speech processing repository 24


−50

−30

−10

10

30A

mpl

itude

(dB

)Speech spectrum with mel−spaced filters

Frequency (Hz)

0 1000 2000 3000 4000 5000 6000 7000 80000

0.25

0.5

0.75

1

Speech spectrum with mel-spaced triangular filtersCOVAREP - Open-source speech processing repository 25


0 1000 2000 3000 4000 5000 6000 7000 8000−80

−60

−40

−20

0

20

40A

mpl

itude

(dB

)

Frequency (Hz)

Speech spectrum with "True Envelope"

Speech spectrum with TE spectral envelopeCOVAREP - Open-source speech processing repository 26


0 1000 2000 3000 4000 5000 6000 7000 8000−50

−30

−10

10

30A

mpl

itude

(dB

)"True Envelope" spectrum with mel−spaced filters

Frequency (Hz)

0

0.25

0.5

0.75

1

TE spectral envelope with mel-spaced triangular filtersCOVAREP - Open-source speech processing repository 27

COVAREP - Sinusoidal modelling

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking




FormantTracking


SinusoidalModeling

3. Sine modelling


COVAREP - Sinusoidal modelling

I Harmonic model

I Quasi-Harmonic Model (QHM)

I Adaptive Harmonic Model (aHM)

I Harmonic synthesis


COVAREP - Glottal analysis

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking




FormantTracking


SinusoidalModeling

4. Glottal analysis



I Deconvolution of glottal source and vocal tract components

I Algorithms for parameterising the glottal source

I Detection of changes in tone-of-voice and voice quality



Vocal effort



0 0.005 0.01 0.015 0.02

125

250

500

1000

2000

4000

8000

Time (seconds)

Fre

quen

cy (

Hz)

Wavelet decomposition of an impulse



0.3 0.31 0.32 0.33 0.34 0.350

0.2

0.4

0.6

0.8

1

Time (seconds)

Am

plitu

de

0.3 0.31 0.32 0.33 0.34 0.350

0.2

0.4

0.6

0.8

1

Time (seconds)

Am

plitu

de

125 Hz 250 Hz 500 Hz 1 kHz 2 kHz 4 kHz 8 kHz

All peaks across the different frequency bands

for breathy (top) and tense (bottom) speech samples


COVAREP - Phase processing

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking




FormantTracking


SinusoidalModeling

4. Phase analysis


COVAREP - Phase processing

I Relative phase shift - speaker verification

I Phase distortion - emotional valence detection

I Chirp group delay represenation - detection of voice disorders


Emotion classification experiment

I Speech data: Berlin emotion database (10 speakers, 7 actedemotions, 500+ utterances)

I Class labellng: Emotion vs non-emotion (binary),Passive-neutral-active (3-class)

I Feature extraction: Using COVAREP v1.1.0

I Classification: Support vector machines (RBF kernel)

I Validation: Speaker independent, leave-one-speaker-out


Emotion classification experiment

Feature sets

I MFCC: Standard Mel-frequency cepstral coefficients

I TE-MFCC MFCCs derived from True Enveloperepresentation

I Glottal/VQ: Glottal and voice quality related features

I ALL: TE-MFCC and Glottal/VQ combined

I SEL: 10 most discriminative features

Speaker independent - Leave-one-speaker-out classificationexperiments


Emotion classification experiment - Results

Neutral Anger Bored Disgust Fear Happy Sad

−0.4

−0.2

0

Neutral

peak

Slo

pe

Anger Bored Disgust Fear Happy Sad0.5

1

1.5

2

Rd



MFCCs TE_MFCCs Glottal/VQ ALL SEL0

10

20

30

40

Err

or (

%)

Emotion vs neutral

Activation (3−class)



Table: Confusion matrix (%)

MFCCs Glottal/VQNeutral Emotion Neutral Emotion

Neutral 48 52 82 18Emotion 18 82 27 73


Potential applications for COVAREP algorithms

I Speech synthesis

I Speech recognition

I Modelling variation in speaking styles and affective states

I Speaker verification

I Voice pathology detection

I Lots of others!!


COVAREP summary

I Repository of open-source speech processing algorithms

I Cross-unversity/country effort

I Fast access to newly developed state-of-the-art algorithms

I Improve visability and impact

I More reproducible research


... and finally!


Thank you!

Resources:Website: http://covarep.github.io/covarep/GitHub: https://github.com/covarep/covarepPaper: Degottex, G., Kane, J., Drugman, T., Raitio, T., “COVAREP - A

collaborative voice analysis repository for speech technologies”, Submitted to

ICASSP 2014


h

h

1cm introducing covarep: a collaborative voice analysis...

Documents