Page 1: SIPCom 8-4 Speech Processing, MM7 - Speech Synthesis - Speech Recognition (Part 1 of 3) Børge Lindberg lindberg@kom.aau.dk

SIPCom 8-4 Speech Processing, MM7

- Speech Synthesis
- Speech Recognition (Part 1 of 3)

Børge Lindberg
lindberg@kom.aau.dk

Page 2:

Text-to-speech Synthesis

Text → Text analysis → Prosody generation → Sound generation → Synthetic speech

• Text analysis: Lexicon & rules
• Prosody generation: Pitch & duration (stød)
• Sound generation: Diphone database

Page 3:

• Why is it so difficult?
  – Text normalisation
    • “kl 12-14”, “8-3=5”, “8-4-1997”, “mio”, “USA”
  – Morphological analysis
    • “periferien” vs. “skoleferien”, “hul”
  – Syntactic analysis
    • “en mand med hul røst dør bag en dør med hul i” (Danish: “a man with a hollow voice dies behind a door with a hole in it”)
  – Semantic analysis
    • “The man fed her dog biscuits”
  – Sound generation
    • Transitions, time and pitch scaling
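The text normalisation examples show why the same character needs different readings: the hyphen is a range in “12-14”, a minus sign in “8-3=5”, and a date separator in “8-4-1997”. A minimal sketch of a token classifier that separates these cases follows. The class names and regular expressions are illustrative assumptions, not the rules used in the lecture system; an abbreviation such as “mio” would additionally need a lexicon lookup.

```python
import re

# Hypothetical token classes, checked in order (most specific first).
# The patterns are assumptions for illustration only.
PATTERNS = [
    ("date", re.compile(r"\d{1,2}-\d{1,2}-\d{4}")),   # e.g. 8-4-1997
    ("equation", re.compile(r"\d+-\d+=\d+")),         # e.g. 8-3=5
    ("range", re.compile(r"\d+-\d+")),                # e.g. 12-14
    ("acronym", re.compile(r"[A-ZÆØÅ]{2,}")),         # e.g. USA
]


def classify_token(token: str) -> str:
    """Return the first class whose pattern matches the whole token."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return label
    return "word"  # fall through: ordinary word or lexicon abbreviation


for tok in ["12-14", "8-3=5", "8-4-1997", "mio", "USA"]:
    print(tok, "->", classify_token(tok))
```

Once a token has a class, each class can be expanded with its own reading rule, which is where the hyphen gets its three different pronunciations.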

Page 4:

Concatenative synthesis

test = /tEsd/ = /#t/ + /tE/ + /Es/ + /sd/ + /d#/

/#t/ /tE/ /Es/ /sd/ /d#/
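The decomposition of /tEsd/ above can be sketched in a few lines: pad the phone sequence with the boundary symbol # and take every adjacent pair. The function name `to_diphones` is a hypothetical helper, and the sketch covers only the unit selection, not the joining of the recorded segments.

```python
def to_diphones(phones, boundary="#"):
    """Split a phone sequence into diphone labels, padded with a boundary symbol."""
    padded = [boundary, *phones, boundary]
    # Each diphone spans from one phone to the next one.
    return [left + right for left, right in zip(padded, padded[1:])]


print(to_diphones(["t", "E", "s", "d"]))
# ['#t', 'tE', 'Es', 'sd', 'd#']
```

A sequence of n phones always yields n + 1 diphones, because the two boundary units are included.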

Page 5:

Di-(tri)phone Database

• Database of a male speaker

• Approx. 2600 subword units (di- & triphones)

• Requires pitch-, di- and triphone segmentation

Page 6:

Input to the sound generator

[Figure: F0 contour, F0 [Hz] (axis ticks 90 to 120) against time, for the sentence “Ja, jeg vil hellere ha' øl” (Danish: “Yes, I would rather have beer”), with phone labels j a j A v e h E l C h a O l]

Page 7:

Effect of scaling

• No scaling

• Time scaled

• + pitch scaled

• + energy + stød

Page 8:

More examples

• Normal (aalb.wav)
• High speaking rate, normal pitch (fast.wav)
• Low speaking rate, normal pitch (slow.wav)
• Normal speaking rate, high pitch (light.wav)
• Normal speaking rate, low pitch (dark.wav)
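The speaking-rate and pitch variations in these examples can be viewed as two independent scale factors applied to the prosody targets (duration and F0 per phone) before sound generation. The sketch below shows that idea only. The `Target` type, the `scale_prosody` function and all numeric values are hypothetical, and the waveform modification itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Hypothetical prosody target for one phone."""
    phone: str
    duration_ms: float
    f0_hz: float


def scale_prosody(targets, rate=1.0, pitch=1.0):
    """Scale durations and F0 independently.

    rate > 1 means faster speech (shorter durations);
    pitch > 1 means higher F0.
    """
    if rate <= 0 or pitch <= 0:
        raise ValueError("rate and pitch must be positive")
    return [Target(t.phone, t.duration_ms / rate, t.f0_hz * pitch) for t in targets]


# Illustrative values only, not taken from the lecture data.
normal = [Target("j", 60.0, 110.0), Target("a", 120.0, 100.0)]
fast = scale_prosody(normal, rate=2.0)    # high speaking rate, normal pitch
light = scale_prosody(normal, pitch=1.5)  # normal speaking rate, high pitch
print(fast)
print(light)
```

Keeping the two factors separate mirrors the examples: rate changes leave F0 untouched, and pitch changes leave durations untouched.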

Page 9:

Evaluation - intelligibility

• 32 test persons
• 156 stimuli in carrier sentence: “Det er <keyword>, de siger” (Danish: “It is <keyword>, they say”)

Intelligibility test    DST-demo    Natural speech
# answers               1600        1600
# errors                18          3
% error                 1.1         0.2
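The error percentages in the intelligibility table follow directly from the error and answer counts. A short check, using only the counts reported on the slide (the function name is a hypothetical helper):

```python
def error_rate_percent(errors: int, answers: int) -> float:
    """Percentage of answers that were wrong."""
    return 100.0 * errors / answers


# Counts as reported in the intelligibility table.
for system, errors in [("DST-demo", 18), ("Natural speech", 3)]:
    print(f"{system}: {error_rate_percent(errors, 1600):.1f} %")
# DST-demo: 1.1 %
# Natural speech: 0.2 %
```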

Page 10:

Evaluation - naturalness

• 32 test persons
• 155 stimuli

Naturalness test
Category          MOS
Natural speech    4.63
DST-demo          2.29
INFOVOX           1.11
GSM               3.99
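A MOS (mean opinion score) is the arithmetic mean of the listeners' ratings for a category. The sketch below assumes the common 1 to 5 rating scale, which the slide does not state, and uses made-up ratings purely for illustration.

```python
from statistics import mean


def mean_opinion_score(ratings, low=1, high=5):
    """Average listener ratings; the 1-5 scale is an assumption, not from the slide."""
    if not ratings:
        raise ValueError("no ratings given")
    if any(r < low or r > high for r in ratings):
        raise ValueError("rating outside the assumed scale")
    return mean(ratings)


# Hypothetical ratings, not the lecture's data.
print(mean_opinion_score([5, 4, 5, 4]))
# 4.5
```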