![Page 1: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/1.jpg)
Modeling the creaky excitation for parametric speech synthesis
Thomas Drugman (1), John Kane (2), Christer Gobl (2)
September 11th, 2012, Interspeech
Portland, Oregon, USA
(1) University of Mons, Belgium
(2) Trinity College Dublin, Ireland
1 / 27
![Page 2: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/2.jpg)
Creaky voice - examples
TTS corpora examples
American Male
Finnish female
Finnish Male
Conversational speech examples
Japanese female
American female
American Male
2 / 27
![Page 3: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/3.jpg)
Creaky voice in speech
Phonetic contrast
e.g., Jalapa Mazatec
Phrase/sentence/turn boundaries
Commonly in American English, Finnish etc.
Interactive speech
Turn-taking
Hesitations
Expression of affective states
Stylistic device
3 / 27
![Page 4: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/4.jpg)
Creaky voice - acoustic characteristics
4 / 27
![Page 5: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/5.jpg)
Creaky voice - acoustic characteristics
5 / 27
![Page 6: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/6.jpg)
Problem statement
The unique acoustic characteristics of creak are poorly modelled in standard vocoders
Silen et al. (2009) - improved robustness of f0 and voicing decision
Our aim: provide a method for modelling the creaky excitation to improve the timbre of creak in parametric synthesis.
6 / 27
![Page 7: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/7.jpg)
Speech data
American male (BDL) and Finnish male (MV)
100 sentences containing creak
7 / 27
![Page 8: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/8.jpg)
Manual annotation
A rough quality with the sensation of repeating impulses - Ishi et al. (2008)
8 / 27
![Page 9: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/9.jpg)
Glottal closure instants (GCIs)
Newly developed SE-VQ algorithm - Kane & Gobl, In Press
[Figure: three panels against time (seconds), each with amplitude on the vertical axis: the speech waveform with SEDREAMS GCIs, the resonator output, and the DEGG signal with DEGG-derived GCIs.]
9 / 27
![Page 10: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/10.jpg)
The deterministic plus stochastic model (DSM)
The Deterministic plus Stochastic Model of the Residual Signal and its Applications
- Drugman & Dutoit (2012), IEEE TASLP
10 / 27
![Page 11: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/11.jpg)
DSM - residual excitation
Université de Mons
Residual excitation
11 / 27
![Page 12: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/12.jpg)
DSM - Residual frames
The Deterministic plus Stochastic Model
[Block diagram. Processing blocks: Speech Database, MGC Analysis, Inverse Filtering, GCI Estimation, PS (pitch-synchronous) Windowing. Signals: Residual signals, GCI positions, Dataset of PS residual frames.]
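The pitch-synchronous (PS) windowing step can be illustrated with a minimal sketch. It assumes the residual signal and the GCI positions (as sample indices) are already available, and it assumes each frame spans from the previous GCI to the next one with a raised-cosine weight peaking at the centre GCI. The function name `ps_residual_frames` and the window shape are illustrative assumptions; the slides do not specify them.

```python
import math

def ps_residual_frames(residual, gcis):
    """Cut GCI-centred, windowed frames from a residual signal.

    residual: list of float samples (assumed already inverse filtered).
    gcis: strictly increasing sample indices of glottal closure instants.
    """
    frames = []
    for prev, centre, nxt in zip(gcis, gcis[1:], gcis[2:]):
        left, right = centre - prev, nxt - centre
        frame = []
        for i in range(prev, nxt + 1):
            # Raised-cosine weight (an assumption): 0 at the neighbouring
            # GCIs and 1 at the centre GCI, even when periods are unequal.
            span = left if i <= centre else right
            weight = 0.5 + 0.5 * math.cos(math.pi * (i - centre) / span)
            frame.append(residual[i] * weight)
        frames.append(frame)
    return frames
```

Splitting the window into a left and a right half keeps its peak on the centre GCI when consecutive periods differ in length, which is the usual situation in creaky regions.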
12 / 27
![Page 13: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/13.jpg)
DSM - Deterministic modelling
The Deterministic Component
[Block diagram: the Dataset of PS residual frames, together with F0*, goes through Pitch Normalization and then Energy Normalization, giving the Dataset for the Deterministic Modeling.]
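A rough sketch of the two normalization steps follows. It assumes pitch normalization means resampling every frame to one common length (the length that corresponds to the normalized pitch F0*) and that energy normalization means scaling each frame to unit Euclidean norm. Linear interpolation and the function names are illustrative choices, not taken from the slides.

```python
import math

def resample_linear(frame, target_len):
    """Resample a frame to target_len samples by linear interpolation."""
    if target_len == 1 or len(frame) == 1:
        return [frame[0]] * target_len
    scale = (len(frame) - 1) / (target_len - 1)
    out = []
    for k in range(target_len):
        pos = k * scale
        lo = min(int(pos), len(frame) - 2)  # keep lo + 1 inside the frame
        frac = pos - lo
        out.append((1 - frac) * frame[lo] + frac * frame[lo + 1])
    return out

def normalise_frame(frame, target_len):
    """Pitch-normalise (fixed length), then energy-normalise (unit norm)."""
    resampled = resample_linear(frame, target_len)
    norm = math.sqrt(sum(x * x for x in resampled))
    # An all-zero frame is returned unchanged to avoid dividing by zero.
    return [x / norm for x in resampled] if norm > 0 else resampled
```

After this step all frames have the same length, so they can be stacked into one matrix for the statistical modelling.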
13 / 27
![Page 14: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/14.jpg)
DSM - vocoder
The DSM vocoder
Deterministic component of the excitation
Stochastic component of the excitation
Filter
14 / 27
![Page 15: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/15.jpg)
Extended DSM for creaky voice
15 / 27
![Page 16: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/16.jpg)
DSM (creak) - Fundamental period/opening phase
[Figure: two scatter plots of opening period (samples) against fundamental period (samples).]
16 / 27
![Page 17: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/17.jpg)
DSM (creak) - Excitation modelling
Separate residual datasets for the opening phase (secondary peak => GCI) and the closed phase (GCI => secondary peak)
Principal component analysis of each dataset separately; the excitation model combines the first eigenvectors for the deterministic component.
Energy envelope also derived for the two datasets separately.
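The per-dataset analysis can be sketched as follows. The sketch assumes each dataset is a list of equal-length, normalised frames and obtains the first eigenvector of the mean-centred data by power iteration. Joining the two eigenvectors by simple concatenation is an assumption made here for illustration; the slides only say that the first eigenvectors are combined.

```python
import math

def first_eigenvector(frames, iterations=200):
    """Leading principal direction of equal-length frames (power iteration)."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(dim)]
    centred = [[f[j] - mean[j] for j in range(dim)] for f in frames]
    # Non-uniform deterministic start; a start orthogonal to the true
    # direction would fail, which this sketch does not guard against.
    v = [float(j + 1) for j in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        # Apply the covariance matrix implicitly: C v = X^T (X v) / n.
        proj = [sum(r[j] * v[j] for j in range(dim)) for r in centred]
        w = [sum(proj[i] * centred[i][j] for i in range(n)) / n
             for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def creaky_deterministic_component(opening_frames, closed_frames):
    """Assumed combination: concatenate the two first eigenvectors."""
    return first_eigenvector(opening_frames) + first_eigenvector(closed_frames)
```

The sign of an eigenvector is arbitrary, so a real implementation would also need a rule for fixing the polarity of each half before joining them.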
17 / 27
![Page 18: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/18.jpg)
DSM (creak) - Data-driven excitation signal
[Figure: two panels showing amplitude against time (samples, 0 to 400).]
18 / 27
![Page 19: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/19.jpg)
DSM (creak) - Vocoder
19 / 27
![Page 20: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/20.jpg)
Evaluation
20 / 27
![Page 21: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/21.jpg)
Experimental setup
Subjective evaluation with 22 participants.
Copy-synthesis of short utterances by the American and Finnish speakers using the standard DSM vocoder and the proposed method.
ABX test: original utterance (X) and the two copy-synthesis versions (A & B). Listeners select the version most like the original.
Comparative Mean Opinion Score (CMOS) test: copy synthesis by both vocoders, with preference rated on a gradual 7-point CMOS scale.
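Scores from the two tests could be summarised along the following lines. This sketch assumes ABX answers are recorded as the label of the chosen version and that CMOS ratings are coded as integers from -3 to +3, a common convention for a 7-point scale that the slides do not state. The function names and the normal-approximation confidence interval are illustrative choices.

```python
import math
from statistics import mean, stdev

def abx_preference(choices):
    """Fraction of ABX trials in which each version was judged closer to X."""
    total = len(choices)
    return {label: choices.count(label) / total
            for label in sorted(set(choices))}

def cmos_summary(scores):
    """Mean CMOS and the half-width of a 95% confidence interval.

    Uses the normal approximation (1.96 standard errors), which is an
    assumption of this sketch; it needs at least two scores.
    """
    half_width = 1.96 * stdev(scores) / math.sqrt(len(scores))
    return mean(scores), half_width
```

For example, `abx_preference(["A", "B", "A", "A"])` returns `{"A": 0.75, "B": 0.25}`; these four answers are made-up inputs for illustration, not results from the study.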
21 / 27
![Page 22: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/22.jpg)
Results - ABX
22 / 27
![Page 23: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/23.jpg)
Results - Comparative Mean Opinion Score (CMOS)
23 / 27
![Page 24: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/24.jpg)
Results - Samples
American Male
1: Original, standard HTS vocoder, DSM vocoder, DSM-creak
2: Original, standard HTS vocoder, DSM vocoder, DSM-creak
3: Original, standard HTS vocoder, DSM vocoder, DSM-creak
Finnish Male
1: Original, standard HTS vocoder, DSM vocoder, DSM-creak
2: Original, standard HTS vocoder, DSM vocoder, DSM-creak
3: Original, standard HTS vocoder, DSM vocoder, DSM-creak
24 / 27
![Page 25: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/25.jpg)
Ongoing/future research directions
Automate creak segmentation (see our poster at the special session on glottal source processing!)
Prediction of creaky regions from contextual features (e.g., phoneme, word stress, position in sentence, prosodic context, etc.)
Transformation of a speaker's voice characteristics.
25 / 27
![Page 26: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/26.jpg)
Acknowledgements
This work was supported by the Science Foundation Ireland, Grant 07/CE/I1142 (Centre for Next Generation Localisation, www.cngl.ie) and Grant 09/IN.1/I2631 (FASTNET).
26 / 27
![Page 27: Modeling the creaky excitation for parametric speech](https://reader031.vdocuments.net/reader031/viewer/2022021212/62065bc48c2f7b1730070079/html5/thumbnails/27.jpg)
Thank you!
27 / 27