emotional speech synthesis - expressive synthetic...
Post on 15-Apr-2018
Emotional Speech Synthesis
State of the art 2009
Felix Burkhardt
19.05.2009
outline
how to model and why simulate emotions?
emotions in speech
introduction to speech synthesis approaches
examples, examples, examples
conclusion and outlook
emotion models
…everyone except a psychologist knows what an emotion is (Young 1973)
categories, e.g. anger, joy, …
dimensions, e.g. activation, dominance, valence
appraisals, e.g. novelty, intrinsic pleasantness, relevance, coping potential,
emotion cube (figure): axes arousal, valence, dominance; emotions placed in the cube: anger, joy, sadness, content, neutral, despair, boredom
source: Burkhardt 2001
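The category and dimension models above can be linked by giving each category a position in the arousal / valence / dominance cube. A minimal Python sketch of that idea follows. The coordinates are hypothetical placeholders in [-1, 1] that only encode a rough direction; they are not read from the figure in Burkhardt 2001, and `nearest_category` is an illustrative helper name.

```python
import math

# Hypothetical positions (arousal, valence, dominance), each in [-1, 1].
# Placeholder values for illustration only, NOT taken from the slide's figure.
EMOTION_CUBE = {
    "neutral": (0.0, 0.0, 0.0),
    "anger": (1.0, -1.0, 1.0),
    "joy": (1.0, 1.0, 1.0),
    "despair": (1.0, -1.0, -1.0),
    "sadness": (-1.0, -1.0, -1.0),
    "boredom": (-1.0, -1.0, 1.0),
    "content": (-1.0, 1.0, 1.0),
}


def nearest_category(arousal, valence, dominance):
    """Return the category whose assumed position is closest (Euclidean)."""
    point = (arousal, valence, dominance)
    return min(EMOTION_CUBE, key=lambda name: math.dist(EMOTION_CUBE[name], point))


if __name__ == "__main__":
    print(nearest_category(0.9, -0.8, 0.7))
```

With positions taken from a real study, the same lookup would let a dimensional description be mapped to the closest category label.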
why model emotional behaviour?
aspects of emotion modeling in human-machine interaction:
source: Batliner et al 2006
applications of emotional tts
fun, e.g. emotional greetings
prosthesis
emotional chat avatars
gaming, believable characters
adapted dialog design
adapted persona design
target-group specific advertising
…believable agents
…artificial humans
(applications arranged along a time axis)
aspects of emotional tts
speech features
source: Reynolds et al 2003
descriptive layers of speech
emotion in speech
spectrograms from acted emotional speech (figure panels: frightened, sad, happy, bored, neutral, angry)
source: TUB emotional database
emotional data?
actors vs. reality
Berlin EmoDB: 10 actors x 7 emotions x 10 sentences
alternatives
induced data, e.g. Aibo
television, radio data
EmoDB: Burkhardt et al 2005
how to describe emotion?
EmotionML, incubator group at W3C
Example, embedded in SSML:
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice gender="female">
<prosody contour="(0%,+20Hz)(10%,+30%)(40%,+10Hz)">
Hi, I am sad now but start getting angry...
</prosody>
</voice>
<emotion>
<category name="sadness" set="basic" intensity="0.6"/>
<timing start="10%" end="50%"/>
</emotion>
<emotion>
<category name="anger" set="basic" intensity="0.4"/>
<timing start="50%" end="100%"/>
</emotion>
</speak>
http://www.w3.org/2005/Incubator/emotion/
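To show how a synthesizer front end might read such an annotation, here is a minimal Python sketch using only the standard library. The document string is a copy of the slide example typed with plain ASCII quotes so that it parses. It follows the draft syntax shown on the slide, which may differ from later EmotionML versions. `read_emotions` and `percent` are illustrative helper names, not part of any EmotionML tool.

```python
import xml.etree.ElementTree as ET

# The example declares the SSML namespace as default, so the emotion
# elements are looked up in that namespace as well.
SSML_NS = {"s": "http://www.w3.org/2001/10/synthesis"}

DOCUMENT = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice gender="female">
<prosody contour="(0%,+20Hz)(10%,+30%)(40%,+10Hz)">
Hi, I am sad now but start getting angry...
</prosody>
</voice>
<emotion>
<category name="sadness" set="basic" intensity="0.6"/>
<timing start="10%" end="50%"/>
</emotion>
<emotion>
<category name="anger" set="basic" intensity="0.4"/>
<timing start="50%" end="100%"/>
</emotion>
</speak>"""


def percent(value):
    """Convert a string such as '10%' to the fraction 0.1."""
    return float(value.rstrip("%")) / 100.0


def read_emotions(xml_text):
    """Collect category, intensity and time span of every emotion element."""
    root = ET.fromstring(xml_text)
    emotions = []
    for emotion in root.findall("s:emotion", SSML_NS):
        category = emotion.find("s:category", SSML_NS)
        timing = emotion.find("s:timing", SSML_NS)
        emotions.append({
            "name": category.get("name"),
            "intensity": float(category.get("intensity")),
            "start": percent(timing.get("start")),
            "end": percent(timing.get("end")),
        })
    return emotions


if __name__ == "__main__":
    for entry in read_emotions(DOCUMENT):
        print(entry)
```

The result is a list of time-stamped emotion categories that a prosody module could apply to the matching portion of the utterance.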
loquendo tts director
source: Loquendo
speech synthesis taxonomy
speech synthesis systems
voice response systems
arbitrary speech synthesizers
re (copy)-synthesis, voice transformation
text-to-speech (unknown input)
concept-to-speech (input from text-generation system)
voice conversion
tts process chain
NLP (natural language processing): preprocessing, morpho-syntactic analysis, transcription, prosody modeling
interface between NLP and DSP: phonetic transcription, prosody track
DSP (digital speech processing): unit concatenation / search, prosody fitting, edge smoothing
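The process chain can be pictured as two functions that exchange a phonetic transcription with a prosody track. The following Python sketch is a toy under stated assumptions: the lexicon, the constant duration and the linearly falling F0 are hypothetical, morpho-syntactic analysis is skipped, and the DSP stage only returns a textual plan instead of audio.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    phoneme: str      # symbol from the toy lexicon below
    duration_ms: int  # target duration from prosody modeling
    f0_hz: float      # target pitch from prosody modeling


# Hypothetical toy lexicon; a real system combines a pronunciation
# dictionary with letter-to-sound rules.
TOY_LEXICON = {
    "hello": ["h", "@", "l", "@U"],
    "world": ["w", "3:", "l", "d"],
}


def nlp_stage(text):
    """Preprocessing, transcription and prosody modeling, heavily simplified."""
    # preprocessing: lower-case and strip punctuation
    words = text.lower().replace(",", " ").replace(".", " ").split()
    # transcription: dictionary lookup, unknown words are dropped
    phonemes = [p for word in words for p in TOY_LEXICON.get(word, [])]
    # prosody modeling: constant duration, F0 falling from 120 Hz to 100 Hz
    last = max(len(phonemes) - 1, 1)
    return [Segment(p, 80, 120.0 - 20.0 * i / last) for i, p in enumerate(phonemes)]


def dsp_stage(segments):
    """Stand-in for unit search, prosody fitting and edge smoothing."""
    return [f"{s.phoneme}:{s.duration_ms}ms@{s.f0_hz:.0f}Hz" for s in segments]


if __name__ == "__main__":
    print(dsp_stage(nlp_stage("Hello, world.")))
```

The list of `Segment` objects plays the role of the interface between the two modules. An emotional prosody filter would modify exactly this intermediate representation.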
synthesis approaches
system modeling (rule based)
expert systems: formant synthesis
articulatory synthesis: vocal tract shape synthesis
pseudo articulatory
signal modeling (data based)
concatenative synthesis
type of units: syllables, diphones, allophones, subsegments
non-uniform unit selection
coding of units
parametric coded: LPC (linear predictive coding), MFCC (mel frequency cepstral coefficients), MBR (multi band resynthesis), formants
waveform coded: PCM, LDM (linear delta mod.)
hybrid approaches: MBR-PSOLA, RELP
statistical model generated: HMM (hidden Markov models), ANN (neural nets)
historic development
timeline (figure):
1780: articulatory, von Kempelen
….
1980: formant synthesis, e.g. DECtalk
1990: PSOLA based synthesis, e.g. Elan
2000: non-uniform unit selection, e.g. RealSpeak
historic: flexible, artificial sounding, domain independent
modern: not flexible, natural sounding, domain dependent
system modeling
source filter model
source: Klatt80 formant synthesizer (Klatt 1980)
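The source filter idea can be sketched in a few lines of Python: a periodic source (here a plain impulse train) is passed through a cascade of second-order resonators, one per formant. The resonator coefficients follow the difference equation published in Klatt (1980). Everything else is an assumption for illustration: the formant frequencies and bandwidths are rough placeholder values for an open vowel, and the sketch omits the glottal source shaping, noise source and parallel branch of the full Klatt80 synthesizer.

```python
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] (Klatt 1980)."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to one
    return a, b, c


def resonate(signal, freq_hz, bandwidth_hz, sample_rate):
    """Filter a list of samples with one formant resonator."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, sample_rate):
    """Very crude voice source: one unit impulse per pitch period."""
    period = int(round(sample_rate / f0_hz))
    count = int(round(duration_s * sample_rate))
    return [1.0 if i % period == 0 else 0.0 for i in range(count)]


def synthesize_vowel(f0_hz=120.0,
                     formants=((700.0, 130.0), (1220.0, 70.0), (2600.0, 160.0)),
                     duration_s=0.3, sample_rate=16000):
    """Cascade the formant resonators over the source signal.

    The (frequency, bandwidth) pairs are illustrative placeholders.
    """
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


if __name__ == "__main__":
    samples = synthesize_vowel()
    print(len(samples), max(abs(s) for s in samples))
```

Because source and filter are separate, emotional variation can be modeled on either side: `f0_hz` and the source shape for phonation, the formant values for articulation.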
examples: emofilt
open source Java program based on MBROLA synthesis engine.
NOT a complete text-to-speech system
prosody filter between natural language and digital speech signal processing modules
as multilingual as MBROLA which currently supports 35 languages.
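A prosody filter of this kind can be sketched as a function that rewrites the phonetic description passed to MBROLA. The sketch below assumes MBROLA's .pho input format (phoneme name, duration in ms, then optional pairs of position in percent and pitch in Hz; lines starting with `;` are comments). The global scaling factors and the sample input are hypothetical; the actual rule set of emofilt is not reproduced here, and the sketch is in Python rather than Java.

```python
def modify_pho(pho_text, pitch_factor=1.0, duration_factor=1.0):
    """Scale all durations and pitch targets of an MBROLA .pho description."""
    out_lines = []
    for line in pho_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            out_lines.append(line)  # keep comments and blank lines unchanged
            continue
        fields = stripped.split()
        phoneme, duration = fields[0], float(fields[1])
        targets = fields[2:]
        new_fields = [phoneme, str(round(duration * duration_factor))]
        # pitch targets come as (position %, pitch Hz) pairs
        for position, pitch in zip(targets[0::2], targets[1::2]):
            new_fields += [position, str(round(float(pitch) * pitch_factor))]
        out_lines.append(" ".join(new_fields))
    return "\n".join(out_lines)


# Hypothetical neutral input; "_" is the MBROLA silence symbol.
NEUTRAL = """; hypothetical input
_ 100
h 60 0 120
a 120 50 130 100 110
_ 100"""

if __name__ == "__main__":
    # Assumed setting: higher pitch and faster speech.
    print(modify_pho(NEUTRAL, pitch_factor=1.5, duration_factor=0.8))
```

Since only the phonetic description changes, the same filter works for every language for which an MBROLA voice exists.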
examples: emoSpeak
emoSpeak is integrated into the MARY text-to-speech framework by DFKI.
Marc Schröder investigated in his Ph.D. thesis how to assign rule-based modifications of speech to emotional dimensions.
the system can be freely downloaded
source: Schröder 2004
examples voice conversion
neutral, sad: phase vocoder (Greg Beller, IRCAM)
neutral, angry: PSOLA - LPC conversion (Murtaza Bulut et al, USC)
examples voice transformation
laughter synthesis by LPC synthesis and mass-spring model (Shiva Sundaram, USC 2007)
woman, as boy, as man; man, breathy, whispery, tense: mixed LF + harmonic model (Olivier Rosec, France Telecom 2009)
examples formant synthesis
neutral, sad, angry, crying, content: prosody rules + phonation model (EmoSyn, Burkhardt, 2000)
sad, angry: DECtalk prosody rules (Affect Editor, J. Cahn, MIT 1998)
examples diphone synthesis
neutral, joy: prosody rules (EmoFilt, Burkhardt, 1999)
joy, angry: prosody rules for dimensions, three inventories for soft, normal and tense speech (MARY, M. Schröder, DFKI)
examples statistical based
neutral, joy: HMM models spectral and prosodic features (Tokyo Institute, Kobayashi Lab)
examples unit selection
Katrin: extralinguistic units
product, research: CTTS with expressive units
Damian, Shouty: fun personality voices
examples non human
anger, fear: formant synthesis (MIT Kismet robot)
happy, sad: concatenative (Oudeyer: Sony pet robots)
examples singing
bicycle: 1961, articulatory, first song ever (Bell Labs, Gerstman & Mathews)
aria: 1993, articulatory (pavarobotti, Ingo Titze)
donna nobis: 2007, articulatory (vocal tract lab, Peter Birkholz)
more examples …
http://emosamples.syntheticspeech.de
conclusion
emotions are part of natural speech
simulation possible by either
modeling the process
including emotional data
text-to-speech still struggles with intelligible, neutral speech
first steps: speaking styles, extralinguistics
first apps: fun, gaming
outlook
discrepancy between natural but inflexible vs. artificial sounding but flexible
solutions short - middle term:
very large databases
hybrid parametric - non-uniform unit selection
voice transformation techniques
high quality source filter model based synthesis
solutions in the long run:
physical modeling
references