1cm introducing covarep: a collaborative voice analysis...

47
Introducing COVAREP: A collaborative voice analysis repository for speech technologies John Kane Wednesday November 27th, 2013 SIGMEDIA-group TCD COVAREP - Open-source speech processing repository 1

Upload: vutruc

Post on 10-Apr-2018

216 views

Category:

Documents


2 download

TRANSCRIPT

Introducing COVAREP:

A collaborative voice analysis repositoryfor speech technologies

John Kane

Wednesday November 27th, 2013SIGMEDIA-group

TCD

COVAREP - Open-source speech processing repository 1

Introduction

(a) Gilles Degottex (b) Thomas Drugman

(c) Tuomo Raitio (d) Stefan Scherer

COVAREP - Open-source speech processing repository 2

Motivation

“...open, well-documented, and well-tested scientific code isessential not only to reproducibility in modern scientific research,but to the very progression of research itself.”

COVAREP - Open-source speech processing repository 3

Related toolkits

KALDI - Speech recognition toolkit

- Speech processing toolkit

VOICEBOX - Speech analysis toolkit

COVAREP - Open-source speech processing repository 4

Solution?

Fast, effective results every time

COVAREP - Open-source speech processing repository 5

COVAREP - Aims

Website: http://covarep.github.io/covarep/index.htmlGitHub: https://github.com/covarep/covarep

COVAREP - Open-source speech processing repository 6

COVAREP - Aims

I More reproducible research

I Increase the availability and impact of speech processingalgorithms

I Participation and feedback

COVAREP - Open-source speech processing repository 7

COVAREP - Scope

I Broad scope - any speech signal processing algorithms

I Speech analysis, synthesis, conversion, transformation, speechquality, enhancement, glottal source/voice quality analysis,etc.

I Use! Contribute!

COVAREP - Open-source speech processing repository 8

Overview of COVAREP

Speech

Signal

PolarityDetection

GCIEstimation

PitchTracking

GlottaldFlowEstimation

GlottaldFlowParameterization

SpectraldEnvelopeEstimationd

FormantTracking

Phase-basedRepresentation

SinusoidalModeling

COVAREP - Open-source speech processing repository 9

Overview of COVAREP

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking

Glottal FlowEstimation

Glottal FlowParameterization

Spectral EnvelopeEstimation

FormantTracking

Phase-basedRepresentation

SinusoidalModeling

1. Periodicity

COVAREP - Open-source speech processing repository 10

Overview of COVAREP

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking

Glottal FlowEstimation

Glottal FlowParameterization

Spectral EnvelopeEstimation

FormantTracking

Phase-basedRepresentation

SinusoidalModeling

1. Periodicity

2. Spectral envelope

COVAREP - Open-source speech processing repository 11

Overview of COVAREP

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking

Glottal FlowEstimation

Glottal FlowParameterization

Spectral EnvelopeEstimation

FormantTracking

Phase-basedRepresentation

SinusoidalModeling

1. Periodicity

2. Spectral envelope

3. Sine modelling

COVAREP - Open-source speech processing repository 12

Overview of COVAREP

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking

Glottal FlowEstimation

Glottal FlowParameterization

Spectral EnvelopeEstimation

FormantTracking

Phase-basedRepresentation

SinusoidalModeling

1. Periodicity

2. Spectral envelope

3. Sine modelling

4. Glottal analysis

COVAREP - Open-source speech processing repository 13

Overview of COVAREP

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking

Glottal FlowEstimation

Glottal FlowParameterization

Spectral EnvelopeEstimation

FormantTracking

Phase-basedRepresentation

SinusoidalModeling

1. Periodicity

2. Spectral envelope

3. Sine modelling

4. Glottal analysis

4. Phase analysis

COVAREP - Open-source speech processing repository 14

COVAREP - Periodicity & synchronicity

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking

Glottal FlowEstimation

Glottal FlowParameterization

Spectral EnvelopeEstimation

FormantTracking

Phase-basedRepresentation

SinusoidalModeling

1. Periodicity

COVAREP - Open-source speech processing repository 15

COVAREP - Periodicity & synchronicity

I Polarity detection

I f0 and voicing decision extraction

I Detection of glottal closure instants

COVAREP - Open-source speech processing repository 16

Periodicity & synchronicity - F0 extraction

0 1000 2000 3000 4000 5000−50

0

50A

mpl

itude

(dB

)

Frequency (Hz)

Speech spectrum

Speech amplitude spectrum

COVAREP - Open-source speech processing repository 17

Periodicity & synchronicity - F0 extraction

0 1000 2000 3000 4000 5000−50

0

50A

mpl

itude

(dB

)

Frequency (Hz)

Speech spectrum

0 1000 2000 3000 4000 5000−60

−40

−20

0

Frequency (Hz)

Am

plitu

de (

dB)

Residual spectrum

Envelope-removed speech amplitude spectrum

COVAREP - Open-source speech processing repository 18

Periodicity & synchronicity - F0 extraction

0 1000 2000 3000 4000 5000−50

0

50A

mpl

itude

(dB

)

Frequency (Hz)

Speech spectrum

0 1000 2000 3000 4000 5000−60

−40

−20

0

Frequency (Hz)

Am

plitu

de (

dB)

Residual spectrum

SRH(f) = E (f )+∑N

k=2[E (k ·f )−E ((k−0.5)·f )] for f ∈ [F 0min,F 0max ]

where E is the residual spectrum, f is frequency (Hz) and N is the number of harmonics considered

COVAREP - Open-source speech processing repository 19

Periodicity & synchronicity - F0 extractionF

requ

ency

(H

z)

Time (seconds)

Residual harmonic summation

0.5 1 1.5 2 2.5 350

100

150

200

250

Residual harmonic summation over time

COVAREP - Open-source speech processing repository 20

COVAREP - Periodicity & synchronicityFre

quen

cy [H

z]

Discrete All−Pole (DAP) envelope

0.56 0.58 0.6 0.62 0.64 0.66 0.68 0.70

1000

2000

3000

4000

5000

0.56 0.58 0.6 0.62 0.64 0.66 0.68 0.7

−0.1

−0.05

0

0.05

0.1

0.15Glottal Flow (GF) derivative with GCIs

Time [s]

Am

plitu

de

Detected glottal closure instants

COVAREP - Open-source speech processing repository 21

COVAREP - Spectral envelope estimation

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking

Glottal FlowEstimation

Glottal FlowParameterization

Spectral EnvelopeEstimation

FormantTracking

Phase-basedRepresentation

SinusoidalModeling

2. Spectral envelope

COVAREP - Open-source speech processing repository 22

COVAREP - Spectral envelope estimation

I Discrete all-pole (DAP) model

I “True envelope” (TE) - spectral envelope by iterative cepstralsmoothing

I Weighted linear prediction

I Conversion from envelope to Mel-Frequency CepstralCoefficients (MFCC)

COVAREP - Open-source speech processing repository 23

COVAREP - Spectral envelope estimation

0 1000 2000 3000 4000 5000 6000 7000 8000−50

−40

−30

−20

−10

0

10

20

30Speech spectrum

Am

plitu

de (

dB)

Frequency (Hz)

Speech amplitude spectrumCOVAREP - Open-source speech processing repository 24

COVAREP - Spectral envelope estimation

−50

−30

−10

10

30A

mpl

itude

(dB

)Speech spectrum with mel−spaced filters

Frequency (Hz)

0 1000 2000 3000 4000 5000 6000 7000 80000

0.25

0.5

0.75

1

Speech spectrum with mel-spaced triangular filtersCOVAREP - Open-source speech processing repository 25

COVAREP - Spectral envelope estimation

0 1000 2000 3000 4000 5000 6000 7000 8000−80

−60

−40

−20

0

20

40A

mpl

itude

(dB

)

Frequency (Hz)

Speech spectrum with "True Envelope"

Speech spectrum with TE spectral envelopeCOVAREP - Open-source speech processing repository 26

COVAREP - Spectral envelope estimation

0 1000 2000 3000 4000 5000 6000 7000 8000−50

−30

−10

10

30A

mpl

itude

(dB

)"True Envelope" spectrum with mel−spaced filters

Frequency (Hz)

0

0.25

0.5

0.75

1

TE spectral envelope with mel-spaced triangular filtersCOVAREP - Open-source speech processing repository 27

COVAREP - Sinusoidal modelling

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking

Glottal FlowEstimation

Glottal FlowParameterization

Spectral EnvelopeEstimation

FormantTracking

Phase-basedRepresentation

SinusoidalModeling

3. Sine modelling

COVAREP - Open-source speech processing repository 28

COVAREP - Sinusoidal modelling

I Harmonic model

I Quasi-Harmonic Model (QHM)

I Adaptive Harmonic Model (aHM)

I Harmonic synthesis

COVAREP - Open-source speech processing repository 29

COVAREP - Glottal analysis

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking

Glottal FlowEstimation

Glottal FlowParameterization

Spectral EnvelopeEstimation

FormantTracking

Phase-basedRepresentation

SinusoidalModeling

4. Glottal analysis

COVAREP - Open-source speech processing repository 30

COVAREP - Glottal analysis

COVAREP - Open-source speech processing repository 31

COVAREP - Glottal analysis

I Deconvolution of glottal source and vocal tract components

I Algorithms for parameterising the glottal source

I Detection of changes in tone-of-voice and voice quality

COVAREP - Open-source speech processing repository 32

COVAREP - Glottal analysis

Vocal effort

COVAREP - Open-source speech processing repository 33

COVAREP - Glottal analysis

0 0.005 0.01 0.015 0.02

125

250

500

1000

2000

4000

8000

Time (seconds)

Fre

quen

cy (

Hz)

Wavelet decomposition of an impulse

COVAREP - Open-source speech processing repository 34

COVAREP - Glottal analysis

0.3 0.31 0.32 0.33 0.34 0.350

0.2

0.4

0.6

0.8

1

Time (seconds)

Am

plitu

de

0.3 0.31 0.32 0.33 0.34 0.350

0.2

0.4

0.6

0.8

1

Time (seconds)

Am

plitu

de

125 Hz 250 Hz 500 Hz 1 kHz 2 kHz 4 kHz 8 kHz

All peaks across the different frequency bands

for breathy (top) and tense (bottom) speech samples

COVAREP - Open-source speech processing repository 35

COVAREP - Phase processing

SpeechSignal

PolarityDetection

GCIEstimation

PitchTracking

Glottal FlowEstimation

Glottal FlowParameterization

Spectral EnvelopeEstimation

FormantTracking

Phase-basedRepresentation

SinusoidalModeling

4. Phase analysis

COVAREP - Open-source speech processing repository 36

COVAREP - Phase processing

I Relative phase shift - speaker verification

I Phase distortion - emotional valence detection

I Chirp group delay represenation - detection of voice disorders

COVAREP - Open-source speech processing repository 37

Emotion classification experiment

I Speech data: Berlin emotion database (10 speakers, 7 actedemotions, 500+ utterances)

I Class labellng: Emotion vs non-emotion (binary),Passive-neutral-active (3-class)

I Feature extraction: Using COVAREP v1.1.0

I Classification: Support vector machines (RBF kernel)

I Validation: Speaker independent, leave-one-speaker-out

COVAREP - Open-source speech processing repository 38

Emotion classification experiment

Feature sets

I MFCC: Standard Mel-frequency cepstral coefficients

I TE-MFCC MFCCs derived from True Enveloperepresentation

I Glottal/VQ: Glottal and voice quality related features

I ALL: TE-MFCC and Glottal/VQ combined

I SEL: 10 most discriminative features

Speaker independent - Leave-one-speaker-out classificationexperiments

COVAREP - Open-source speech processing repository 39

Emotion classification experiment - Results

Neutral Anger Bored Disgust Fear Happy Sad

−0.4

−0.2

0

Neutral

peak

Slo

pe

Anger Bored Disgust Fear Happy Sad0.5

1

1.5

2

Rd

COVAREP - Open-source speech processing repository 40

Emotion classification experiment - Results

MFCCs TE_MFCCs Glottal/VQ ALL SEL0

10

20

30

40

Err

or (

%)

Emotion vs neutral

Activation (3−class)

COVAREP - Open-source speech processing repository 41

Emotion classification experiment - Results

Table: Confusion matrix (%)

MFCCs Glottal/VQNeutral Emotion Neutral Emotion

Neutral 48 52 82 18Emotion 18 82 27 73

COVAREP - Open-source speech processing repository 42

Emotion classification experiment - Results

COVAREP - Open-source speech processing repository 43

Potential applications for COVAREP algorithms

I Speech synthesis

I Speech recognition

I Modelling variation in speaking styles and affective states

I Speaker verification

I Voice pathology detection

I Lots of others!!

COVAREP - Open-source speech processing repository 44

COVAREP summary

I Repository of open-source speech processing algorithms

I Cross-unversity/country effort

I Fast access to newly developed state-of-the-art algorithms

I Improve visability and impact

I More reproducible research

COVAREP - Open-source speech processing repository 45

... and finally!

COVAREP - Open-source speech processing repository 46

Thank you!

Resources:Website: http://covarep.github.io/covarep/GitHub: https://github.com/covarep/covarepPaper: Degottex, G., Kane, J., Drugman, T., Raitio, T., “COVAREP - A

collaborative voice analysis repository for speech technologies”, Submitted to

ICASSP 2014

COVAREP - Open-source speech processing repository 47