MAI Internship April-May 2002
MAI Internship 2002 Slide 2 of 14
What?
• The AST Project promotes development of speech technology for official languages of South Africa
• SA English, Afrikaans, Zulu, Xhosa, Sesotho
• Create reusable databases & software
• Prototype hotel booking dialogue system
• 2000-2003
MAI Internship 2002 Slide 3 of 14
AST dialogue system: basics
Telephone Network
Speech Recognition
Natural Language Understanding
Dialogue Manager
Speech Synthesis
DATABASE
MAI Internship 2002 Slide 4 of 14
AST Speech Database
• Use? Input ASR: acoustic training; output ASR: dictionary
• Start from scratch, even for SAE
• Telephone data based on SpeechDat
– Datasheet utterances
– Hierarchical recruiting method
• Labeling tool: PRAAT
MAI Internship 2002 Slide 5 of 14
     Language Spoken              Code   No. of Speakers
1    English (E)                         1500-2000
       Mother-tongue English      EE     300-400
       Black English              BE     300-400
       Coloured English           CE     300-400
       Asian English              ASE    300-400
       Afrikaans English          AE     300-400
2    isiXhosa (X)                 XX     300-400
3    Sesotho (S)                  SS     300-400
4    isiZulu (Z)                  ZZ     300-400
5    Afrikaans (A)                       900-1200
       Mother-tongue Afrikaans    AA     300-400
       Black Afrikaans            BA     300-400
       Coloured Afrikaans         CA     300-400
MAI Internship 2002 Slide 6 of 14
AST Speech Database
• Acoustic signal
• Orthographic annotation: manual labour
• Phonemic transcription: rules & dictionary (Patana)
• Phonetic alignment: forced alignment (HTK)
MAI Internship 2002 Slide 7 of 14
AST Speech Recognition
• Difficult:
– Speaker independent, noisy conditions
– Medium-size vocabulary (10,000 words)
– Training data sparse
• Not so difficult:
– Dialogue Manager helps
• Phoneme-based HMMs (future: diphones)
• Finite-state language model
• Pitch & clicks of African languages ignored
MAI Internship 2002 Slide 8 of 14
AST Natural Language Understanding
• Same finite-state network as the recogniser's language model
– +: all utterances ‘understood’
– -: FSGs are limited
• Makes no sense to recognise more than we can understand
• Semantic labels are activated
• Alternative: robust parsing (Phoenix, ATIS)
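A minimal sketch of the idea on this slide, assuming a toy hotel-booking grammar: one finite-state network decides which word sequences are accepted, and semantic labels attached to its arcs are activated as the utterance is traversed. The states, words, and labels below are hypothetical and are not taken from the AST system.

```python
# Toy finite-state grammar. Each arc is (word, semantic label or None, next state).
# All states, words, and labels are illustrative assumptions.
GRAMMAR = {
    "start": [("i", None, "want"), ("a", None, "room_type")],
    "want": [("want", None, "det")],
    "det": [("a", None, "room_type")],
    "room_type": [
        ("single", ("room", "single"), "room"),
        ("double", ("room", "double"), "room"),
    ],
    "room": [("room", None, "end")],
}
FINAL_STATES = {"end"}


def understand(utterance):
    """Walk the network; return the activated labels, or None if out of grammar."""
    state = "start"
    meaning = {}
    for word in utterance.lower().split():
        for arc_word, label, next_state in GRAMMAR.get(state, []):
            if arc_word == word:
                if label is not None:
                    slot, value = label
                    meaning[slot] = value  # semantic label activated on this arc
                state = next_state
                break
        else:
            return None  # no arc for this word: the FSG cannot cover the utterance
    return meaning if state in FINAL_STATES else None
```

In this sketch every utterance the network accepts comes with a meaning, and anything outside the network is rejected outright, which is the "+" and "-" listed on the slide.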
MAI Internship 2002 Slide 9 of 14
AST Natural Language Understanding
[Diagram. Components: Speech Recognition, NLU, Dialogue Manager, FSG. Arrow labels: Recognised utterance, Grammar ID (twice), Meaning.]
MAI Internship 2002 Slide 10 of 14
AST Natural Language Understanding
Embedded semantic tags: ‘drie honderd duisend agt en neëntig’ → 3 0 0 0 9 8
V6=3 V5=0 V4=0 V3=0 V2=9 V1=8
[Diagram: finite-state number network. Labels: biljoen, miljard, miljoen, duisend, NIL, $honderd, $honderd_en, @hundreds; t1=3 t2=0 t3=0.]
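The mapping from the Afrikaans number phrase to the digit slots V6..V1 can be illustrated with a short sketch. This is a plain accumulator over number words, not the tagged finite-state network of the slide, and it assumes a reduced vocabulary (units, tens, "honderd", "duisend", "en"); the function names are hypothetical.

```python
# Reduced Afrikaans number vocabulary (an assumption; teens and larger
# multipliers such as miljoen are left out to keep the sketch short).
UNITS = {"een": 1, "twee": 2, "drie": 3, "vier": 4, "vyf": 5,
         "ses": 6, "sewe": 7, "agt": 8, "nege": 9}
TENS = {"twintig": 20, "dertig": 30, "veertig": 40, "vyftig": 50,
        "sestig": 60, "sewentig": 70, "tagtig": 80, "neëntig": 90}


def number_value(words):
    """Accumulate the integer value of a sequence of number words."""
    total = 0    # completed thousands
    current = 0  # value of the group still being read
    for word in words:
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "en":
            continue  # "agt en neëntig" = 8 + 90
        elif word == "honderd":
            current = max(current, 1) * 100
        elif word == "duisend":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"unknown number word: {word}")
    return total + current


def digit_slots(text, width=6):
    """Return the digits as slots V<width>..V1, most significant first."""
    digits = str(number_value(text.lower().split())).zfill(width)
    if len(digits) > width:
        raise ValueError("number does not fit in the available slots")
    return {f"V{width - i}": int(d) for i, d in enumerate(digits)}
```

Calling `digit_slots("drie honderd duisend agt en neëntig")` fills the slots with the digits of 300098, matching the values shown on the slide.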
MAI Internship 2002 Slide 11 of 14
AST Dialogue Manager
• Trade-off: naturalness vs. response restriction
• System-directed: predictable user utterances, simple dialogues
• Mixed-initiative: shorter dialogues, more recognition errors
• User-initiative: unpopular
MAI Internship 2002 Slide 12 of 14
AST Dialogue Manager
Design:
• Early focus on users and task
• Wizard-of-Oz: pay no attention to the man behind the curtain
• System-in-the-loop
• Finite-state structure because of simplicity and functionality
• Possible frame-based approach in future
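A finite-state, system-directed dialogue manager can be sketched as a table of states. The sketch below assumes that each state has a fixed prompt and names the grammar expected to cover the reply; the state names, prompts, and grammar IDs are hypothetical, and `understand` is a stand-in for the recogniser and NLU component.

```python
# Hypothetical states for a system-directed hotel booking dialogue.
# Each state has a fixed prompt, the ID of the grammar expected to cover
# the reply, and the state that follows once the reply is understood.
DIALOGUE = {
    "arrival": {"prompt": "On which date will you arrive?",
                "grammar_id": "date", "next": "nights"},
    "nights": {"prompt": "How many nights will you stay?",
               "grammar_id": "number", "next": "room"},
    "room": {"prompt": "Would you like a single or a double room?",
             "grammar_id": "room_type", "next": "confirm"},
    "confirm": {"prompt": "Shall I make the booking?",
                "grammar_id": "yes_no", "next": None},
}


def run_dialogue(understand, start="arrival", max_turns=20):
    """Run the dialogue and return the meanings collected per state.

    `understand(prompt, grammar_id)` stands in for recogniser plus NLU:
    it returns a meaning, or None when the reply was not understood.
    """
    state = start
    collected = {}
    for _ in range(max_turns):
        if state is None:
            break  # reached the end of the network
        node = DIALOGUE[state]
        meaning = understand(node["prompt"], node["grammar_id"])
        if meaning is None:
            continue  # stay in the same state and repeat the prompt
        collected[state] = meaning
        state = node["next"]
    return collected
```

Because the next state depends only on the current one, the user's reply at each turn is predictable, which keeps the active grammar small. A frame-based design would instead let the user fill several slots in any order.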
MAI Internship 2002 Slide 13 of 14
AST Speech Synthesis
• Fixed machine utterances: pre-recorded speech
• Database queries: limited-domain synthesis (Festival platform)
MAI Internship 2002 Slide 14 of 14
Conclusion
• Finite-state approach in
– Recogniser
– NLU component
– Dialogue manager
• Workable prototype
• New funding 2003