The CUED Speech Group
Dr Mark Gales, Machine Intelligence Laboratory
Cambridge University Engineering Department
1. CUED Organisation
CUED: 6 Divisions (130 Academic Staff, 1100 Undergrads, 450 Postgrads)
A. ThermoFluids
B. Electrical Eng
C. Mechanics
D. Structures
E. Management
F. Information Engineering Division
Information Engineering Division: Control Lab, Signal Processing Lab, Computational and Biological Learning Lab, Machine Intelligence Lab
Machine Intelligence Lab: Speech Group, Vision Group, Medical Imaging Group
Speech Group: 4 Staff (Bill Byrne, Mark Gales, Phil Woodland, Steve Young), 9 RAs, 12 PhDs
2. Speech Group Overview
• Primary research interests in speech processing
– 4 members of Academic Staff
– 9 Research Assistants/Associates
– 12 PhD students
• PhD Projects in Fundamental Speech Technology Development (10-15 students)
• Funded Projects in Recognition/Translation/Synthesis (5-10 RAs)
• MPhil in Computer Speech, Text and Internet Technology
• Computer Laboratory NLIP Group
• HTK Software Tools Development
• Computer Speech and Language
• International Community
Principal Staff and Research Interests
• Dr Bill Byrne
– Statistical machine translation
– Automatic speech recognition
– Cross-lingual adaptation and synthesis
• Dr Mark Gales
– Large vocabulary speech recognition
– Speaker and environment adaptation
– Kernel methods for speech processing
• Professor Phil Woodland
– Large vocabulary speech recognition/meta-data extraction
– Information retrieval from audio
– ASR and SMT integration
• Professor Steve Young
– Statistical dialogue modelling
– Voice conversion
Research Interests
• Fundamental theory of statistical modelling and pattern processing
• Large vocabulary systems [English, Chinese, Arabic]: acoustic model training and adaptation, language model training and adaptation, rich text transcription & spoken document retrieval
• Statistical machine translation: finite state transducer framework
• Data-driven semantic processing, statistical modelling
• Data-driven techniques, voice transformation, HMM-based techniques
Example Current and Recent Projects
• Global Autonomous Language Exploitation
– DARPA GALE funded (collab with BBN, LIMSI, ISI …)
• HTK Rich Audio Transcription Project (finished 2004)
– DARPA EARS funded
• CLASSiC: Computational Learning in Adaptive Systems for Spoken Conversation
– EU funded (collab with Edinburgh, France Telecom, …)
• EMIME: Effective Multilingual Interaction in Mobile Environments
– EU funded (collab with Edinburgh, IDIAP, Nagoya Institute of Technology …)
• R2EAP: Rapid and Reliable Environment Aware Processing
– TREL funded
Also active collaborations with IBM, Google, Microsoft, …
3. Rich Audio Transcription Project
[Diagram: Natural Speech → New algorithms → Rich Transcript (English/Mandarin)]
• DARPA-funded project – Effective Affordable Reusable Speech-to-text (EARS) program
• Transform natural speech into human readable form
– Need to add meta-data to the ASR output
– For example mark speaker turns and handle disfluencies
See http://mi.eng.cam.ac.uk/research/projects/EARS/index.html
Rich Text Transcription
ASR Output:
okay carl uh do you exercise yeah actually um i belong to a gym down here gold’s gym and uh i try to exercise five days a week um and now and then i’ll i’ll get it interrupted by work or just full of crazy hours you know
Meta-Data Extraction (MDE) Markup:
Speaker1: / okay carl {F uh} do you exercise /
Speaker2: / {DM yeah actually} {F um} i belong to a gym down here / / gold’s gym / / and {F uh} i try to exercise five days a week {F um} / / and now and then [REP i’ll + i’ll] get it interrupted by work or just full of crazy hours {DM you know} /
Final Text:
Speaker1: Okay Carl do you exercise?
Speaker2: I belong to a gym down here, Gold’s Gym, and I try to exercise five days a week and now and then I’ll get it interrupted by work or just full of crazy hours.
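To make the meta-data markup above concrete, here is a minimal Python sketch of how such annotations could be stripped to recover readable text; the function name, regular expressions and markup coverage are assumptions for illustration, not the EARS project's actual post-processing.

import re

# Minimal illustrative sketch (not the project's actual post-processor):
# drop fillers {F ...} and discourse markers {DM ...}, keep only the
# corrected part of repetitions [REP old + new], and remove the "/"
# segment boundaries used in the MDE markup above.
def mde_to_readable(mde: str) -> str:
    text = re.sub(r"\{F [^}]*\}", "", mde)                      # fillers, e.g. {F uh}
    text = re.sub(r"\{DM [^}]*\}", "", text)                    # discourse markers
    text = re.sub(r"\[REP [^+\]]*\+ *([^\]]*)\]", r"\1", text)  # repetitions
    text = text.replace("/", " ")                               # segment boundaries
    return " ".join(text.split())

print(mde_to_readable("/ and {F uh} i try to exercise five days a week {F um} /"))
# -> "and i try to exercise five days a week"

Real meta-data post-processing also restores case and punctuation, which this sketch does not attempt.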
4. Statistical Machine Translation
• Aim is to translate from one language to another
– For example translate text from Chinese to English
• Process involves collecting parallel (bitext) corpora
– Align at document/sentence/word level
• Use statistical approaches to obtain the most probable translation, as sketched below
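In the standard source-channel formulation of statistical machine translation (textbook background rather than a description of CUED's specific systems), the most probable English translation of a foreign sentence f is

\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)

where the translation model P(f | e) is estimated from the aligned bitext and P(e) is an English language model.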
GALE: Integrated ASR and SMT
• Member of the AGILE team (led by BBN)
The DARPA Global Autonomous Language Exploitation (GALE) program has the aim of developing speech and language processing technologies to recognise, analyse, and translate speech and text into readable English.
• Primary languages for STT/SMT: Chinese and Arabic
See http://mi.eng.cam.ac.uk/research/projects/AGILE/index.html
5. Statistical Dialogue Modelling
• Use a statistical framework for all stages
[Diagram: system pipeline of Speech Understanding → Dialogue Manager → Speech Generation, operating over waveforms, words/concepts and dialogue acts; understanding is modelled by P(A_u | Y_u) (user dialogue act A_u given user speech Y_u) and generation by P(Y_s | A_s) (system speech Y_s given system dialogue act A_s)]
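Statistical dialogue managers of this kind are commonly framed as partially observable Markov decision processes (a generic formulation given here for background; the exact models used in the group's systems may differ). The manager maintains a belief b(s) over hidden dialogue states s and, after taking system act a and observing user act o, updates it as

b'(s') \propto P(o \mid s') \sum_{s} P(s' \mid s, a)\, b(s)

with the next system act chosen by a policy optimised over this belief rather than over a single best hypothesis.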
CLASSiC: Project Architecture
Legend:
ASR: Automatic Speech Recognition
NLU: Natural Language Understanding
DM: Dialogue Management
NLG: Natural Language Generation
TTS: Text To Speech
s_t: Input Sound Signal
u_t: Utterance Hypotheses
h_t: Conceptual Interpretation Hypotheses
a_t: Action Hypotheses
w_t: Word String Hypotheses
r_t: Speech Synthesis Hypotheses
X: possible elimination of hypotheses
[Diagram: speech input s_t passes through ASR → NLU → DM → NLG → TTS, conditioned on the context at t-1, producing hypothesis sets u_t, h_t, a_t, w_t and r_t in turn; hypotheses may be eliminated (X) between stages, with 1-best signal selection before the speech output]
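The following toy Python sketch illustrates the hypothesis-passing idea in the architecture above; the data types, stage functions and pruning rule are invented for illustration and are not the CLASSiC codebase.

from typing import Callable, List, Tuple

Hyp = Tuple[str, float]  # (hypothesis, score)

def prune(hyps: List[Hyp], beam: float = 0.1) -> List[Hyp]:
    """Eliminate hypotheses ('X' in the diagram) more than `beam` below the best score."""
    best = max(score for _, score in hyps)
    return [(h, s) for h, s in hyps if s >= best - beam]

def run_pipeline(u_t: List[Hyp],
                 stages: List[Callable[[List[Hyp]], List[Hyp]]]) -> Hyp:
    """Pass ASR utterance hypotheses u_t through NLU, DM, NLG and TTS stages,
    pruning between stages, then return the 1-best result for output."""
    hyps = u_t
    for stage in stages:          # produces h_t, a_t, w_t, r_t in turn
        hyps = prune(stage(hyps))
    return max(hyps, key=lambda hs: hs[1])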
See http://classic-project.org
6. EMIME: Speech-to-Speech Translation
• Personalised speech-to-speech translation
– Learn characteristics of a user's speech
– Reproduce the user's speech in synthesis
• Cross-lingual capability
– Map speaker characteristics across languages
• Unified approach for recognition and synthesis
– Common statistical model: hidden Markov models
– Simplifies adaptation (common to both synthesis and recognition); see the sketch below
• Improve understanding of recognition/synthesis
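As a sketch of why a common HMM parameterisation simplifies adaptation, consider linear (MLLR-style) mean adaptation, one widely used scheme; the function and arrays below are illustrative assumptions, not EMIME code.

import numpy as np

# Illustrative sketch (not EMIME code): a single affine transform (A, b),
# estimated from a user's speech, is applied to the Gaussian mean vectors
# of an HMM set. Because recognition and synthesis share the same HMM
# parameterisation, one transform can personalise both.
def adapt_means(means: np.ndarray, A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """means: (num_gaussians, dim); A: (dim, dim); b: (dim,)."""
    return means @ A.T + b

# Toy example: 3 Gaussians in a 2-dimensional feature space.
means = np.zeros((3, 2))
A = 1.1 * np.eye(2)              # in practice estimated from adaptation data
b = np.array([0.5, -0.2])
adapted_means = adapt_means(means, A, b)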
See http://emime.org
7. R2EAP: Robust Speech Recognition
• Current ASR performance degrades with changing noise
• Major limitation on deploying speech recognition systems
• Aims of the project
1. To develop techniques that allow ASR systems to rapidly respond to changing acoustic conditions;
2. While maintaining high levels of recognition accuracy over a wide range of conditions;
3. And be flexible so they are applicable to a wide range of tasks and computational requirements.
Project Overview
• Project started in January 2008 – 3 year duration
• Close collaboration with TREL Cambridge Lab.
– Common development code-base – extended HTK
– Common evaluation sets
– Builds on current (and previous) PhD studentships
– Monthly joint meetings
See http://mi.eng.cam.ac.uk/~mjfg/REAP/index.html
Approach – Model Compensation
• Model compensation schemes are highly effective BUT
– Slow compared to feature compensation schemes
• Need schemes to improve speed while maintaining performance (the compensation step is sketched below)
• Also automatically detect/track changing noise conditions
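For background on model compensation (the standard log-add mismatch function used in schemes such as PMC and VTS; stated here as textbook material rather than as the project's specific algorithm), the corrupted-speech mean of each Gaussian can be obtained from the clean-speech mean \mu_x and the noise mean \mu_n as

\mu_y \approx C \log\big( \exp(C^{-1}\mu_x) + \exp(C^{-1}\mu_n) \big)

where C is the DCT mapping log-spectral to cepstral parameters and exp/log act element-wise. Applying this to every Gaussian in a large acoustic model is what makes model compensation accurate but slow relative to feature compensation.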
8. Toshiba-CUED PhD Collaborations
• To date 5 research studentships (partly) funded by Toshiba
– Shared software – code transfer in both directions
– Shared data sets – both (emotional) synthesis and ASR
– 6-monthly reports and review meetings
• Students and topics
– Hank Liao (2003-2007): Uncertainty Decoding for Noise Robust ASR
– Catherine Breslin (2004-2008): Complementary System Generation and Combination
– Zeynep Inanoglu (2004-2008): Recognition and Synthesis of Emotion
– Rogier van Dalen (2007-2010): Noise Robust ASR
– Stuart Moore (2007-2010): Number Sense Disambiguation
• Very useful and successful collaboration
9. HTK Version 3.0 Development
HTK is a free software toolkit for developing HMM-based systems
• 1000s of users worldwide
• Widely used for research by universities and industry
1989 – 1992: V1.0 – 1.4 – Initial development at CUED
1993 – 1999: V1.5 – 2.3 – Commercial development by Entropic
2000 – date: V3.0 – V3.4 – Academic development at CUED
Development partly funded by Microsoft and DARPA EARS Project
Primary dissemination route for CU research output
See http://htk.eng.cam.ac.uk
2004 – date: the ATK real-time HTK-based recognition system
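As a pointer to what "HMM-based systems" involve computationally, here is a toy Python sketch of the forward algorithm, the core likelihood recursion that toolkits such as HTK implement (HTK itself is written in C; this sketch is illustrative only).

import numpy as np

# Toy log-domain forward algorithm for an N-state HMM (illustrative only).
def forward_log_likelihood(log_A: np.ndarray, log_b: np.ndarray,
                           log_pi: np.ndarray) -> float:
    """log_A: (N, N) log transition probs; log_b: (T, N) log output probs
    of each observation under each state; log_pi: (N,) log initial probs."""
    log_alpha = log_pi + log_b[0]
    for t in range(1, log_b.shape[0]):
        # log-sum-exp over previous states for each current state
        log_alpha = log_b[t] + np.logaddexp.reduce(log_alpha[:, None] + log_A, axis=0)
    return float(np.logaddexp.reduce(log_alpha))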
10. Summary
• Speech Group works on many aspects of speech processing
– Large vocabulary speech recognition
– Statistical machine translation
– Statistical dialogue systems
– Speech synthesis and voice conversion
• Statistical machine learning approach to all applications
• World-wide reputation for research
– CUED systems have defined the state of the art for the past decade
– Developed a number of techniques widely used by industry
• Hidden Markov Model Toolkit (HTK)
– Freely-available software, 1000s of users worldwide
– State-of-the-art features (discriminative training, adaptation …)
– HMM synthesis extension (HTS) from Nagoya Institute of Technology
See http://mi.eng.cam.ac.uk/research/speech