Multilingual HLT in Europe and the development of ASR
Louis C.W. Pols
Institute of Phonetic Sciences, University of Amsterdam, The Netherlands
PRASA2001 – Franschhoek, South Africa, 30 Nov. 2001, keynote

Some history
Liesbeth Botha spent half a year at our institute during the second half of 1996
ever since, the possible organization of a workshop or a major conference in South Africa has been considered
(cancelled) AST Workshop on ‘Human Language Technologies for E-Governance in a Multilingual Society’, Stellenbosch
PRASA2001 – Franschhoek, 29-30 Nov., incl. Speech Processing and AST project
I always wanted to visit South Africa!

Overview
Multilingual Europe (vs. Multilingual South Africa)
EU Framework Programs; Human Language Technology (HLT)
Other (European) programs and organizations
ISCA
Dutch speech database initiatives (vs. AST)
Speech science and technology; ASR development
Academia (knowledge) and industry (applications)
Conclusions

Multilingual Europe
Europe (West, Central, East)
  EU countries
  candidate EU countries
  Schengen countries (internally no boundary control)
  Euro countries (300 M people)
many nations and even more languages
multilingual community and (open) market
e-commerce, telebanking, infokiosk, etc.

EU Framework Program FP5
Human Language Technologies RTD (HLT): http://www.hltcentral.org/
part of Information Society Technologies (IST), Key Action III (Multimedia Contents and Tools)
part of the fifth Framework Program ’98-’02 (FP5)
IST 3600 M€ (26.5% of FP5); HLT 125 M€
HLT:
  Multilingual communication
  Natural Interactivity
  Cross-lingual information management
  Support & Accompanying Measures

6th Framework Program
FP6 (’02-’06): the way forward
proposal published Febr. 2001
one of 7 priority themes: Information Society Technologies
also networks of excellence
IST budget 3600 M€

Complaints from academia
too much application and user oriented
little room for research (reaction of the Commission: it is time for HLT to show its usefulness!), but .... the pendulum swings!
speech data not freely available (only with delay and at (high) cost via ELRA)
still: several very interesting projects
we participated before (SAM, EuroCocosda, somewhat in SpeechDat) but barely anymore; (KPN Research and) Nijmegen University still do

Some HLT ‘speech’ projects
C-ORAL-ROM: Integrated Reference Corpora for Spoken Romance Languages (1/01, 36 mo)
CORETEX: Improving Core Speech Recognition Technology (4/00, 36 mo)
I-EYE: Interacting with Eyes: Gaze Assisted Access to Information in Multiple Languages (1/00, 30 mo)
NESPOLE!: NEgotiating through SPOken Language in E-commerce (1/00, 30 mo)
SIRIDUS: Specification, Interaction and Reconfiguration In Dialogue Understanding Systems (1/00, 36 mo)
SMADA: Speech Driven Multimodal Automatic Directory Assistance (1/00, 36 mo) (finalizing ITRW ‘Advanced ASR for Telecom Applications’, Nov. 2002, Avignon)
SPEECON: Speech Driven Interfaces for Consumer Applications (2/00, 24 mo)

Some ‘past’ HLT projects
ARISE: Automatic Railway Systems for Europe (10/96, 24 mo)
CAVE: Caller Verification in Bank and Telecommunication (11/95, 24 mo)
EAGLES: Expert Advisory Group on Language Engineering Standards (11/97, 24 mo)
ELRA: European Language Resources Association (9/95, 50 mo)
ELSE: Evaluation in Language and Speech Engineering (1/98, 16 mo)
SPEECHDAT: Speech Databases for Creation of Voice Driven Teleservices (3/96, 34 mo)
SPEECHDAT-CAR (3/98, 30 mo) + variants
VODIS: Advanced Speech Technologies for Voice-operated Driver Information Systems (11/95, 43 mo)

Some HLT ‘support’ projects
CLASS: Collaboration in Language and Speech Science and technology (Int. Workshop on ‘Information Presentation and Natural Multimodal Dialogue’, Verona, Italy, Dec. 14-15, 2001)
ELSNET-HLT: The European Network of Excellence in Human Language Technologies
HOPE: HLT Opportunity Promotion in Europe, Euromap
ISLE-HLT: Int. Standards for Language Engineering (EAGLES follow-up), incl. ISLE Meta Data Initiative (IMDI), see also COREX

eContent
eContent is part of the eEurope initiative
European Digital Content on the Global Networks, ’01-’05, 100 M€, 1st call 3/2001
Action Line 2 (AL2) addresses the intersection of the content and language industries, more specifically the design, production and distribution of high-quality European digital content for the global networks in an increasingly multilingual and multicultural socio-economic environment
http://www.hltcentral.org/econtent/

MLIS
Multilingual Information Society Program
Supporting the creation of a framework of services for European language resources
Encouraging the use of language technologies, resources and standards
Promoting the use of advanced language tools in the Community and Member States public sector
one call in June ’99, 15 M€, some 30 projects
e.g. NL-TRANSLEX: Machine Translation for Dutch and English/French/German

INTAS
International Association for the promotion of co-operation with scientists from the New Independent States of the former Soviet Union (NIS)
established June 1993
Open + Thematic Call 2000 (budget 16 M€)
max. budget 150 k€/project (max. 30 k€/NIS partner)
INTAS 915 ‘Spontaneous Speech of Typologically Unrelated Languages (Russian, Finnish and Dutch): Comparison of Phonetic Properties’ (90 k€, 7/01, 36 mo)

Euromap
HLT Opportunity Promotion in Europe (HOPE) (2/00, 24 mo, 8 national focus points)
to raise awareness of the benefits of human language technologies (HLT) with companies, organizations and users
to accelerate technology transfer from the research base to the market
to stimulate community building in specific domains (tourism and e-commerce)
General: http://www.hltcentral.org/euromap/
Dutch site: http://www.taalunieversum.org/tst/en/

European Language Resources Association
A non-profit organization to promote the creation, verification, and distribution of language resources
US counterpart: LDC
173 resources sold in 2000
organizer of LREC conferences (third one in May 2002 in Las Palmas, Spain)
speech & related resources ~200 written resources ~145 terminological resources tools and software
http://www.icp.grenet.fr/ELRA/home.html

ELSNET
European Network of Excellence in Human Language Technologies
one of the ~20 networks within FP5
Transfer of knowledge and expertise; Shared goals; Evaluation; Shared language resources; Promotion of best practice; Interoperability by means of standardization
yearly ELSNET Summer Schools: July 15-26, 2002, Odense, Denmark, ‘Evaluation and Assessment of Text and Speech Systems’
Newsletter Elsnews; http://www.elsnet.org

COCOSDA
International organization for coordinating the globalized efforts in spoken language resources and speech technology evaluation
meets yearly, jointly with Eurospeech and ICSLP, since Chiavari, Italy, Sept. ’91 (Eurospeech ’91) and before; Oriental COCOSDA
topic domains
  Evaluation of Speech Understanding and Dialogue Systems (W. Minker)
  Multi-modal corpora (S. Nakamura)
  Corpus Annotation Tools (S. Bird)
  Local Languages (D. Gibbon)
regional programs (Europe; Asia; Oceania; Africa; Latin America)
data center representatives (LDC, S. Bird; ELRA, K. Choukri)
http://www.itl.atr.co.jp/cocosda
Justus Roux

COCOSDA matrix

COST
European Cooperation in the field of Scientific and Technical Research (~60 k€ per action, for additional costs only):
COST 249: Continuous Speech Recognition over the Telephone (19 countries; start 5/94; 6 yrs; final report)
COST 250: Speaker Recognition in Telephony
COST 258: The Naturalness of Synthetic Speech
COST 277: Nonlinear Speech Processing
COST 278: Spoken Language Interaction in Telecommunication
http://cost.cordis.lu/src/home.cfm

EURESCOM
the European Institute for Research and Strategic Studies in Telecommunications
20 shareholders from 19 European countries (major European network operators and service providers)
e.g. MUST: MUltimodal, multilingual information Services with small mobile Terminals (P1104)

ISCA
European Speech Communication Association (ESCA) founded in ’88
from ESCA to ISCA at Eurospeech’99 in Budapest
membership organization
organizer of Eurospeech/ICSLP - Interspeech
organizer of specialized workshops (ITRWs)
Special interest groups (SIGs)
Speech Communication Journal (http://www.elsevier.com/locate/specom)
http://www.isca-speech.org/

Eurospeech-ICSLP-Interspeech
no.  Eurospeech, odd years (in Europe)  ICSLP, even years (elsewhere)
1    Paris ’89                          Kobe ’90
2    Genoa ’91                          Banff ’92
3    Berlin ’93                         Yokohama ’94
4    Madrid ’95                         Philadelphia ’96
5    Rhodes ’97                         Sydney ’98
6    Budapest ’99                       Beijing ’00
7    Aalborg ’01                        Denver ’02
8    Geneva ’03                         Seoul ’04
9    Lisbon ’05                         ?? ’06
past: through Aalborg ’01; future: from Denver ’02 onward

ISCA SIGs
Speech Synthesis - SynSig
Audio Visual Speech - AVISA
Speech And Language Technology for MInority Languages - SALTMIL
Integration of Speech Technology in (Language) Learning - InSTIL
SPeaker and Language Characterization - SPLC
Education in the Field of Speech Communication - EduSIG
Speech Prosody - SProSIG
Dialogue Processing - SigDial (also within ACL)
Groupe Francophone de la Communication Parlée - GFCP

ISCA ITRWs (forthcoming)
Prosody in Speech Recognition and Understanding - Prosody 2001: Molly Pitcher Inn, Red Bank, NJ, October 22-24, 2001
TIPS - Temporal Integration in the Perception of Speech: Aix-en-Provence, France, 8-10 April 2002
Multi-Modal Dialogue in Mobile Environments: Kloster Irsee, Germany, June 17-21, 2002
Advanced ASR for Telecom Applications: Palais des Papes, Avignon, France, November 27-29, 2002
Supported but not organized by ISCA:
  2001 International Workshop on Automatic Speech Recognition and Understanding: Madonna di Campiglio (Trento), Italy, December 9-13, 2001
  Speech Prosody 2002: Aix-en-Provence, France, 11-13 April 2002

IEEE
IEEE Signal Processing Society
MMSP’01, Workshop on Multimedia Signal Processing, Cannes, France, October 3-5, 2001
ASRU’01, Automatic Speech Recognition and Understanding Workshop, Madonna di Campiglio (Trento), Italy, December 9-13, 2001
2002 International Workshop on Multimedia Signal Processing, US Virgin Islands, December 9-11, 2002
IEEE Trans. on Signal Processing / Speech and Audio Processing / Multimedia / Neural Networks
http://www.ieee.org/

DARPA NIST
DARPA projects and yearly evaluations:
CSR (Continuous Speech Recognition)
LVCSR (Large Vocabulary Conversational Speech Recognition)
ATIS (Air Travel Information System)
Language Recognition (Identification and Verification)
Speaker Recognition (Identification and Verification)

NATO-ASI
ASI = Advanced Study Institute
many different domains
certain restrictions on NATO vs. non-NATO participants, free registration, some funding
Dynamics of Speech Production and Perception, Il Ciocco, Italy, June 23 – July 6, 2002
send application before Jan. 15, 2002 to [email protected]
Organizing Committee: Pierre L. Divenyi & Klára Vicsi

European national programs
German Verbmobil; SmartKom (since 9/99)
Bavarian Archive for Speech Signals (BAS)
Spoken Dutch Corpus
French AUP
Swedish Centre for Speech Technology (CTT)
Swedish National Graduate School in Language Technology (GSLT)

Dutch speech database initiatives
Speech Processing Expertise Center SPEX
5,000 speakers Polyphone
1,000 speakers SpeechDat + variants
NWO Priority program TST-OVIS (public transportation information system over telephone)
1,000 hrs CGN (Dutch-Flemish)
5.5 hrs ‘open source’ IFA-corpus
TST Platform
ToDI (Transcription of Dutch Intonation)

Spoken Dutch Corpus
4.6 M€, 5 yrs, 10 M words, ~1000 hrs of speech
Corpus design and compilation
Recording and digitization
Orthographic transcription (all)
Lemmatization and POS tagging (all)
Lexicon link-up (all)
Broad phonetic transcription (1 M)
Word segmentation (1 M)
Syntactic annotation (1 M)
Prosodic annotation (250 k)
Development of exploitation software COREX
http://lands.let.kun.nl/cgn/home.htm

IFA corpus
5.5 hrs of high-quality recorded speech
4 male and 4 female speakers
more than 30 min. per speaker
various speaking styles per speaker, from conversational and read speech to isolated sentences, words and syllables
everything phonemically segmented & labeled
free access via SQL query language
http://www.fon.hum.uva.nl/IFAcorpus

Speech science and speech technology
we should try to bridge that gap
see my keynotes at ICPhS ’99 and Eurospeech’01:
  “Flexible, robust and efficient human speech processing versus present-day speech technology”
  “Acquiring and implementing phonetic knowledge”
we have to understand each other in order to be able to communicate and to contribute
probabilistic vs. knowledge driven
adding (multiple) knowledge (sources) to improve performance
much knowledge in speech databases

Phonetics and speech technology
AFFINITY (from row, to column)
from phonetics to phonetics: source / filter, individuality, context, prosody
from phonetics to speech technology: human performance, specific knowledge, regularities, multiple features
from speech technology to phonetics: more data, new models, probabilities, speech vs. NLP
from speech technology to speech technology: EU FPV, DARPA, applications, user orientation, evaluation

Do recognizers need intelligent ears?
intelligent ears: front-end pre-processor
only if it improves performance
humans are generally better speech processors than machines; perhaps system developers can learn from human behavior
robustness at stake (noise, reverberation, incompleteness, restoration, competing speakers, variable speaking rate, context, dialects, non-nativeness, style, emotion)

What is (phonetic) knowledge?
phonetic textbook knowledge
probabilistic knowledge from databases
fixed set of features vs. adaptable set
trading relations, selectivity
knowledge of the world, expectation
global vs. detailed

How good is human/machine speech recognition?
corpus                    description                          vocabulary size  recognition perplexity  % word error, machine  % word error, human
TI digits                 read digits                          10               10                      0.72                   0.009
alphabet                  read letters                         26               26                      5                      1.6
Resource Management       read sentences                       1,000            60-1,000                17                     2
NAB                       read sentences                       5,000-unlimited  45-160                  6.6                    0.4
Switchboard CSR           spontaneous telephone conversations  2,000-unlimited  80-150                  43                     4
Switchboard wordspotting  idem                                 20 keywords      -                       31.1                   7.4
Adapted from Lippmann (SpeCom, 1997)
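
The gap between the machine and human columns is easier to judge as a ratio. The following is a minimal sketch that uses only the error rates listed in the table; the names ERROR_RATES, error_ratio and ratios are illustrative and do not come from Lippmann (1997).

```python
# Word error rates (%) per corpus as (machine, human),
# copied from the table adapted from Lippmann (1997).
ERROR_RATES = {
    "TI digits": (0.72, 0.009),
    "alphabet": (5.0, 1.6),
    "Resource Management": (17.0, 2.0),
    "NAB": (6.6, 0.4),
    "Switchboard CSR": (43.0, 4.0),
    "Switchboard wordspotting": (31.1, 7.4),
}


def error_ratio(machine_pct, human_pct):
    """Return how many times higher the machine error rate is."""
    return machine_pct / human_pct


def ratios(table=ERROR_RATES):
    """Machine/human error ratio for every corpus in the table."""
    return {corpus: error_ratio(m, h) for corpus, (m, h) in table.items()}


if __name__ == "__main__":
    # List the corpora from the smallest to the largest gap.
    for corpus, ratio in sorted(ratios().items(), key=lambda kv: kv[1]):
        print(f"{corpus:26s} machine/human = {ratio:5.1f}x")
```

The ratio is only one way to read the table; absolute differences would rank the corpora differently.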

Human vs. machine (ASR)
machine surprisingly good for certain tasks
machine could be better for many others
  robustness, outliers
what are the limits of human performance?
  in noise
  for degraded speech
  missing information (trading)

Human word intelligibility vs. noise
[Figure: human word intelligibility vs. noise; annotations on the figure: “recognizers do have trouble!” and “humans start to have some trouble”]

Robustness to degraded speech
speech = time-modulated signal in frequency bands
relatively insensitive to (spectral) distortions
  prerequisite for digital hearing aid
  modulating spectral slope: -5 to +5 dB/oct, 0.25-2 Hz
temporal smearing of envelope modulation
  ca. 4 Hz max. in modulation spectrum (syllable rate)
  LP>4 Hz and HP<8 Hz: little effect on intelligibility
spectral envelope smearing
  for BW>1/3 oct masked SRT starts to degrade
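
The effect of temporal smearing on envelope modulations can be illustrated with a crude moving-average filter. This is an assumption made for illustration only, not the filtering used in the experiments behind this slide; the function names, the 100 Hz envelope frame rate and the 1 Hz and 16 Hz test modulations are all hypothetical.

```python
import math


def smear_envelope(envelope, frame_rate_hz, null_hz):
    """Moving-average smoothing of an amplitude envelope.

    The window spans one period of null_hz, so modulations at null_hz
    (and its integer multiples) are cancelled, while much slower
    modulations pass almost unchanged. Only fully covered windows are
    returned, so the output is shorter than the input.
    """
    window = max(1, int(round(frame_rate_hz / null_hz)))
    return [sum(envelope[i:i + window]) / window
            for i in range(len(envelope) - window + 1)]


def modulation_depth(envelope):
    """Depth m of an envelope of the form 1 + m*sin(...)."""
    hi, lo = max(envelope), min(envelope)
    return (hi - lo) / (hi + lo)


def sinusoidal_envelope(mod_hz, frame_rate_hz, seconds, depth=0.5):
    """Synthetic envelope 1 + depth*sin(2*pi*mod_hz*t)."""
    n = int(frame_rate_hz * seconds)
    return [1.0 + depth * math.sin(2 * math.pi * mod_hz * k / frame_rate_hz)
            for k in range(n)]


if __name__ == "__main__":
    # Compare a slow and a fast modulation after smearing.
    for mod_hz in (1, 16):
        env = sinusoidal_envelope(mod_hz, frame_rate_hz=100, seconds=2)
        smeared = smear_envelope(env, frame_rate_hz=100, null_hz=4)
        print(mod_hz, "Hz: depth after smearing =",
              round(modulation_depth(smeared), 3))
```

A moving average is a blunt low-pass filter; it is used here only because it needs no external library.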

Robustness to degraded speech and missing information
partly reversed speech (Saberi & Perrott, Nature, 4/99)
  fixed-duration segments time-reversed or shifted in time: perfect sentence intelligibility up to 50 ms (demo: every 50 ms reversed; original)
  low-frequency modulation envelope (3-8 Hz) vs. acoustic spectrum
  syllable as information unit? (S. Greenberg)
gap and click restoration (Warren)
gating experiments
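
The segment-wise time reversal mentioned for the partly reversed speech demo can be sketched as follows. This is a minimal sketch on a plain list of samples; the function name and the 16 kHz sampling rate in the usage lines are assumptions, and the stimulus preparation of Saberi & Perrott may differ in detail.

```python
def reverse_segments(samples, sample_rate_hz, segment_ms):
    """Time-reverse consecutive fixed-duration segments of a signal.

    samples: sequence of sample values
    sample_rate_hz: sampling rate in Hz
    segment_ms: segment duration in milliseconds (e.g. 50)
    """
    seg_len = int(round(sample_rate_hz * segment_ms / 1000.0))
    if seg_len < 1:
        raise ValueError("segment is shorter than one sample")
    out = []
    for start in range(0, len(samples), seg_len):
        segment = samples[start:start + seg_len]
        # A final, shorter segment is reversed as well.
        out.extend(reversed(segment))
    return out


if __name__ == "__main__":
    # One second of a hypothetical 16 kHz signal, reversed every 50 ms.
    signal = list(range(16000))
    locally_reversed = reverse_segments(signal, 16000, 50)
    print(locally_reversed[:3], len(locally_reversed))
```

Applying the function twice with the same settings restores the original signal, which is a convenient sanity check.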

Desired pre-processor characteristics in ASR
basic sensitivity for stationary and dynamic sounds
robustness to degraded speech
  rather insensitive to spectral and temporal smearing
robustness to noise and reverberation
filter characteristics
  is BP, PLP, MFCC, RASTA, TRAPS good enough?
  lateral inhibition (spectral sharpening); dynamics
what can be neglected?
  non-linearities, limited dynamic range, active elements, co-modulation, secondary pitch, etc.

Caricature of present-day speech recognizers
fixed pre-processor, fixed features
trained with a variety of speech input
  much global information, but ..... no interrelations
monaural, uni-modal input
pitch extractor generally not operational
performs well on average behavior
  but ..... does poorly on any type of outlier (OOV, non-native, fast or whispered speech, other communication channel, new topic, new speaker)
neglects lots of useful (phonetic) information
heavily relies on language model

Useful information: durational variability
[Figure: hierarchical breakdown of the duration (ms) of the vowel /iy/ by the factors R, S, Lw and Lu, with count, mean and s.d. per factor level; adapted from Wang (1998). Only the top two levels are recoverable from the extracted text.]
node               count  mean  s.d.
Root /iy/          4626   95    39
R=0                1544   83    31
R=1 (normal rate)  1588   95    36
R=2                1494   109   46
R=0, S=0           796    78    25
R=0, S=1           711    89    36
R=0, S=2           37     91    25
R=1, S=0           816    87    29
R=1, S=1           735    104   40
R=1, S=2           37     98    34
R=2, S=0           719    98    33
R=2, S=1           729    119   54
R=2, S=2           46     104   42
Annotated mean durations (ms): normal rate = 95; primary stress = 104; word final = 136; utterance final = 186; overall average = 95
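
Relative to the overall average, the annotated mean durations imply simple lengthening factors. The sketch below uses only the four annotated values and the overall average of 95 ms; the dictionary and function names are illustrative and not taken from Wang (1998).

```python
# Annotated mean durations of /iy/ in ms, as given on the slide.
OVERALL_MEAN_MS = 95.0
ANNOTATED_MEAN_MS = {
    "normal rate": 95.0,
    "primary stress": 104.0,
    "word final": 136.0,
    "utterance final": 186.0,
}


def lengthening_factors(means=ANNOTATED_MEAN_MS, overall=OVERALL_MEAN_MS):
    """Mean duration of each condition divided by the overall mean."""
    return {condition: mean / overall for condition, mean in means.items()}


if __name__ == "__main__":
    for condition, factor in lengthening_factors().items():
        print(f"{condition:16s} {factor:.2f} x overall average")
```

Such factors are the kind of durational knowledge that the slide suggests a recognizer could exploit.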

Academia (knowledge) and industry (applications)
what do industry and universities expect from each other? (panel discussion at Eurospeech’01)
proper education and training (E-masters)
good exchange between academia & industry
participation in joint projects (speech DB)
adapt to requirements (CAIP Symposium)
open source approach (Linux, Praat, HTK)
complaints: sometimes bad management and high risk (puts HLT in bad spotlight, e.g. L&H)

Information Technology for Homeland Security
Center for Advanced Information Processing, CAIP Symposium, Rutgers Univ., Nov. 29
“subsequent to events of Sept. 11, CAIP modified its traditional Annual Research Review”
“Symposium identifies issues in Homeland Security and encourages research, particularly with university-industry cooperation”
e.g., biometric and voice identification; fusing voice and face data; multimodal interfaces for asset deployment; face-tracking for identification; microphone array for speaker tracking

E-masters in Language and Speech
Course content:
  Theoretical Linguistics
  Natural Language Processing
  Phonetics and Phonology
  Cognitive models for speech and language processing
  Speech signal processing
  Pattern recognition
  Language engineering applications
http://www.cstr.ed.ac.uk/euromasters/

Conclusions
collecting speech corpora in national languages (like in SA) is an excellent basis, both for research and for applications
combine industrial and academic skills
make proper use of experiences elsewhere
that’s why we are all here at this workshop!
good luck and thank you for your attention