Page 1: Multilingual HLT in Europe and the development of ASR

Multilingual HLT in Europe and the development of ASR

Louis C.W. Pols, Institute of Phonetic Sciences

University of Amsterdam, The Netherlands

PRASA2001 – Franschhoek, South Africa, 30 Nov. 2001, keynote

30 Nov. 2001 PRASA2001 - Franschhoek 2

Some history

- Liesbeth Botha spent half a year at our institute during the second half of 1996
- ever since, the possible organization of a workshop or a major conference in South Africa was considered
- (cancelled) AST Workshop on ‘Human Language Technologies for E-Governance in a Multilingual Society’, Stellenbosch
- PRASA2001 – Franschhoek, 29-30 Nov., incl. Speech Processing and AST project
- I always wanted to visit South Africa!

Overview

- Multilingual Europe (vs. multilingual South Africa)
- EU Framework Programs; Human Language Technology (HLT)
- Other (European) programs and organizations
- ISCA
- Dutch speech database initiatives (vs. AST)
- Speech science and technology; ASR development
- Academia (knowledge) and industry (applications)
- Conclusions

Multilingual Europe

- Europe (West, Central, East)
  - EU countries
  - candidate EU countries
  - Schengen countries (no internal border control)
  - Euro countries (300 M people)
- many nations and even more languages
- multilingual community and (open) market
- e-commerce, telebanking, infokiosk, etc.

EU Framework Program FP5

- Human Language Technologies RTD (HLT), http://www.hltcentral.org/
- part of Information Society Technologies (IST), Key Action III (Multimedia Contents and Tools)
- part of the fifth Framework Program ’98-’02 (FP5)
  - IST 3600 M€ (26.5% of FP5); HLT 125 M€
- HLT:
  - Multilingual communication
  - Natural Interactivity
  - Cross-lingual information management
  - Support & Accompanying Measures

6th Framework Program

- FP6 (’02-’06), the way forward: proposal published Feb. 2001
- one of 7 priority themes: Information Society Technologies
- also networks of excellence
- IST budget 3600 M€

Complaints from academia

- too much application- and user-oriented, little room for research (reaction of the Commission: it is time for HLT to show its usefulness!), but ... the pendulum swings!
- speech data not freely available (only with delay and at (high) cost via ELRA)
- still: several very interesting projects; we participated before (SAM, EuroCocosda, somewhat in SpeechDat) but barely anymore, though (KPN Research and) Nijmegen University still do

Some HLT ‘speech’ projects

- C-ORAL-ROM: Integrated Reference Corpora for Spoken Romance Languages (1/01, 36 mo)
- CORETEX: Improving Core Speech Recognition Technology (4/00, 36 mo)
- I-EYE: Interacting with Eyes: Gaze Assisted Access to Information in Multiple Languages (1/00, 30 mo)
- NESPOLE!: NEgotiating through SPOken Language in E-commerce (1/00, 30 mo)
- SIRIDUS: Specification, Interaction and Reconfiguration In Dialogue Understanding Systems (1/00, 36 mo)
- SMADA: Speech Driven Multimodal Automatic Directory Assistance (1/00, 36 mo) (finalizing ITRW ‘Advanced ASR for Telecom Applications’, Nov. 2002, Avignon)
- SPEECON: Speech Driven Interfaces for Consumer Applications (2/00, 24 mo)

Some ‘past’ HLT projects

- ARISE: Automatic Railway Systems for Europe (10/96, 24 mo)
- CAVE: Caller Verification in Bank and Telecommunication (11/95, 24 mo)
- EAGLES: Expert Advisory Group on Language Engineering Standards (11/97, 24 mo)
- ELRA: European Language Resources Association (9/95, 50 mo)
- ELSE: Evaluation in Language and Speech Engineering (1/98, 16 mo)
- SPEECHDAT: Speech Databases for Creation of Voice Driven Teleservices (3/96, 34 mo)
- SPEECHDAT-CAR (3/98, 30 mo) + variants
- VODIS: Advanced Speech Technologies for Voice-operated Driver Information Systems (11/95, 43 mo)

Some HLT ‘support’ projects

- CLASS: Collaboration in Language and Speech Science and technology (Int. WS on ‘Information Presentation and Natural Multimodal Dialogue’, Verona, Italy, Dec. 14-15, 2001)
- ELSNET-HLT: The European Network of Excellence in Human Language Technologies
- HOPE: HLT Opportunity Promotion in Europe, Euromap
- ISLE-HLT: Int. Standards for Language Engineering (EAGLES follow-up), incl. I/O Meta Data Initiative (IMDI); see also COREX

eContent

- eContent: part of the eEurope initiative
- European Digital Content on the Global Networks, ’01-’05, 100 M€, 1st call 3/2001
- Action Line 2 (AL2) addresses the intersection of the content and language industries, more specifically the design, production and distribution of high-quality European digital content for the global networks in an increasingly multilingual and multicultural socio-economic environment
- http://www.hltcentral.org/econtent/

MLIS

- Multilingual Information Society Program
  - supporting the creation of a framework of services for European language resources
  - encouraging the use of language technologies, resources and standards
  - promoting the use of advanced language tools in the Community and Member States' public sector
- one call in June ’99, 15 M€, some 30 projects, e.g. NL-TRANSLEX: Machine Translation for Dutch and English/French/German

INTAS

- International Association for the promotion of co-operation with scientists from the New Independent States of the former Soviet Union (NIS)
- established June 1993
- Open + Thematic Call 2000 (budget 16 M€), max. budget 150 k€/project (max. 30 k€/NIS partner)
- INTAS 915: ‘Spontaneous Speech of Typologically Unrelated Languages (Russian, Finnish and Dutch): Comparison of Phonetic Properties’ (90 k€, 7/01, 36 mo)

Euromap

- HLT Opportunity Promotion in Europe (HOPE) (2/00, 24 mo, 8 national focus points)
- to raise awareness of the benefits of human language technologies (HLT) with companies, organizations and users
- to accelerate technology transfer from the research base to the market
- to stimulate community building in specific domains (tourism and e-commerce)
- General: http://www.hltcentral.org/euromap/; Dutch site: http://www.taalunieversum.org/tst/en/

European Language Resources Association

- a non-profit organization to promote the creation, verification, and distribution of language resources
- US counterpart: LDC
- 173 resources sold in 2000
- organizer of the LREC conferences (third one in May 2002 in Las Palmas, Spain)
- speech & related resources ~200; written resources ~145; terminological resources; tools and software
- http://www.icp.grenet.fr/ELRA/home.html

ELSNET: European Network of Excellence in Human Language Technologies

- one of the ~20 networks within FP5
- transfer of knowledge and expertise; shared goals; evaluation; shared language resources; promotion of best practice; interoperability by means of standardization
- yearly ELSNET Summer Schools: July 15-26, 2002, Odense, Denmark, ‘Evaluation and Assessment of Text and Speech Systems’
- newsletter Elsnews; http://www.elsnet.org

COCOSDA

- International organization for coordinating the globalized efforts in spoken language resources and speech technology evaluation
- meets yearly, jointly with Eurospeech and ICSLP, since Chiavari, Italy, Sept. ’91 (Eurospeech ’91) and before; Oriental COCOSDA
- topic domains: Evaluation of Speech Understanding and Dialogue Systems (W. Minker); Multi-modal corpora (S. Nakamura); Corpus Annotation Tools (S. Bird); Local Languages (D. Gibbon)
- regional programs (Europe; Asia; Oceania; Africa; Latin America)
- data center representatives (LDC, S. Bird; ELRA, K. Choukri)
- http://www.itl.atr.co.jp/cocosda

Justus Roux

COCOSDA matrix

COST

European Cooperation in the field of Scientific and Technical Research (~60 k€ per action, for additional costs only):

- COST 249: Continuous Speech Recognition over the Telephone (19 countries; start 5/94; 6 yrs; final report)
- COST 250: Speaker Recognition in Telephony
- COST 258: The Naturalness of Synthetic Speech
- COST 277: Nonlinear Speech Processing
- COST 278: Spoken Language Interaction in Telecommunication
- http://cost.cordis.lu/src/home.cfm

EURESCOM

- the European Institute for Research and Strategic Studies in Telecommunications
- 20 shareholders from 19 European countries (major European network operators and service providers)
- e.g. MUST: MUltimodal, multilingual information Services with small mobile Terminals (P1104)

ISCA

- European Speech Communication Association (ESCA) founded in ’88; from ESCA to ISCA at Eurospeech ’99 in Budapest
- membership organization
- organizer of Eurospeech/ICSLP (Interspeech)
- organizer of specialized workshops (ITRWs)
- special interest groups (SIGs)
- Speech Communication journal (http://www.elsevier.com/locate/specom)
- http://www.isca-speech.org/

Eurospeech-ICSLP-Interspeech

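Odd years: Eurospeech (in Europe); even years: ICSLP (elsewhere).

| # | Eurospeech (odd years, in Europe) | ICSLP (even years, elsewhere) |
|---|---|---|
| 1 | Paris ’89 | Kobe ’90 |
| 2 | Genoa ’91 | Banff ’92 |
| 3 | Berlin ’93 | Yokohama ’94 |
| 4 | Madrid ’95 | Philadelphia ’96 |
| 5 | Rhodes ’97 | Sydney ’98 |
| 6 | Budapest ’99 | Beijing ’00 |
| 7 | Aalborg ’01 | Denver ’02 |
| 8 | Geneva ’03 | Seoul ’04 |
| 9 | Lisbon ’05 | ?? ’06 |

(past: up to Aalborg ’01; future: from Denver ’02 on)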

ISCA SIGs

- Speech Synthesis - SynSig
- Audio Visual Speech - AVISA
- Speech And Language Technology for MInority Languages - SALTMIL
- Integration of Speech Technology in (Language) Learning - InSTIL
- SPeaker and Language Characterization - SPLC
- Education in the Field of Speech Communication - EduSIG
- Speech Prosody - SProSIG
- Dialogue Processing - SigDial (also within ACL)
- Groupe Francophone de la Communication Parlée - GFCP

ISCA ITRWs (forthcoming)

- Prosody in Speech Recognition and Understanding - Prosody 2001, Molly Pitcher Inn, Red Bank, NJ, October 22-24, 2001
- TIPS - Temporal Integration in the Perception of Speech, Aix-en-Provence, France, 8-10 April 2002
- Multi-Modal Dialogue in Mobile Environments, Kloster Irsee, Germany, June 17-21, 2002
- Advanced ASR for Telecom Applications, Palais des Papes, Avignon, France, November 27-29, 2002

Supported but not organized by ISCA:

- 2001 International Workshop on Automatic Speech Recognition and Understanding, Madonna di Campiglio (Trento), Italy, December 9-13, 2001
- Speech Prosody 2002, Aix-en-Provence, France, 11-13 April 2002

IEEE

- IEEE Signal Processing Society
- MMSP’01, Workshop on Multimedia Signal Processing, Cannes, France, October 3-5, 2001
- ASRU’01, Automatic Speech Recognition and Understanding Workshop, Madonna di Campiglio (Trento), Italy, December 9-13, 2001
- 2002 International Workshop on Multimedia Signal Processing, US Virgin Islands, December 9-11, 2002
- IEEE Trans. on Signal Processing / Speech and Audio Processing / Multimedia / Neural Networks
- http://www.ieee.org/

DARPA / NIST

DARPA projects and yearly evaluations:

- CSR (Continuous Speech Recognition)
- LVCSR (Large Vocabulary Conversational Speech Recognition)
- ATIS (Air Travel Information System)
- Language Recognition (Identification and Verification)
- Speaker Recognition (Identification and Verification)

NATO-ASI

- ASI = Advanced Study Institute
- many different domains
- certain restrictions on NATO vs. non-NATO participants; free registration, some funding
- Dynamics of Speech Production and Perception, Il Ciocco, Italy, June 23 – July 6, 2002
- send application before Jan. 15, 2002 to [email protected]
- Organizing Committee: Pierre L. Divenyi & Klára Vicsi

European national programs

- German: Verbmobil; SmartKom (since 9/99); Bavarian Archive for Speech Signals (BAS)
- Spoken Dutch Corpus
- French AUP
- Swedish: Centre for Speech Technology (CTT); National Graduate School in Language Technology (GSLT)

Dutch speech database initiatives

- Speech Processing Expertise Center (SPEX)
- Polyphone: 5,000 speakers
- SpeechDat + variants: 1,000 speakers
- NWO Priority program TST-OVIS (public transportation information system over telephone)
- CGN (Dutch-Flemish): 1,000 hrs
- ‘open source’ IFA corpus: 5.5 hrs
- TST Platform
- ToDI (Transcription of Dutch Intonation)

Spoken Dutch Corpus

- 4.6 M€, 5 yrs, 10 M words, ~1000 hrs of speech
- corpus design and compilation
- recording and digitization
- orthographic transcription (all)
- lemmatization and POS tagging (all)
- lexicon link-up (all)
- broad phonetic transcription (1 M)
- word segmentation (1 M)
- syntactic annotation (1 M)
- prosodic annotation (250 k)
- development of exploitation software COREX
- http://lands.let.kun.nl/cgn/home.htm

IFA corpus

- 5.5 hrs of high-quality recorded speech
- 4 male and 4 female speakers, more than 30 min. per speaker
- various speaking styles per speaker, from conversational and read speech to isolated sentences, words and syllables
- everything phonemically segmented & labeled
- free access via SQL query language: http://www.fon.hum.uva.nl/IFAcorpus

Speech science and speech technology

- we should try to bridge that gap; see my keynotes at ICPhS ’99 and Eurospeech ’01: “Flexible, robust and efficient human speech processing versus present-day speech technology” and “Acquiring and implementing phonetic knowledge”
- we have to understand each other in order to be able to communicate and to contribute
- probabilistic vs. knowledge-driven
- adding (multiple) knowledge (sources) to improve performance
- much knowledge in speech databases

Phonetics and speech technology

Affinity (rows: from; columns: to):

| from \ to | phonetics | speech technology |
|---|---|---|
| phonetics | source / filter, individuality, context, prosody | human performance, specific knowledge, regularities, multiple features |
| speech technology | more data, new models, probabilities, speech vs. NLP | EU FPV, DARPA, applications, user orientation, evaluation |

Do recognizers need intelligent ears?

- ‘intelligent ears’ front-end pre-processor only if it improves performance
- humans are generally better speech processors than machines; perhaps system developers can learn from human behavior
- robustness at stake (noise, reverberation, incompleteness, restoration, competing speakers, variable speaking rate, context, dialects, non-nativeness, style, emotion)

What is (phonetic) knowledge?

- phonetic textbook knowledge
- probabilistic knowledge from databases
- fixed set of features vs. adaptable set
- trading relations, selectivity
- knowledge of the world, expectation
- global vs. detailed

How good is human/machine speech recognition?

| corpus | description | vocabulary size | recognition perplexity | % word error, machine | % word error, human |
|---|---|---|---|---|---|
| TI digits | read digits | 10 | 10 | 0.72 | 0.009 |
| Alphabet | read letters | 26 | 26 | 5 | 1.6 |
| Resource Management | read sentences | 1,000 | 60-1,000 | 17 | 2 |
| NAB | read sentences | 5,000-unlimited | 45-160 | 6.6 | 0.4 |
| Switchboard CSR | spontaneous telephone conversations | 2,000-unlimited | 80-150 | 43 | 4 |
| Switchboard wordspotting | idem | 20 keywords | - | 31.1 | 7.4 |

Adapted from Lippmann (SpeCom, 1997)
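The perplexity column can be read as the average number of equally likely words a recognizer must choose between at each point. As a minimal Python sketch (our illustration; the function names are hypothetical), perplexity is 2 raised to the per-word entropy in bits. If every word is equally likely, which is the assumption made here for tasks without grammar constraints, perplexity equals the vocabulary size. That is consistent with the digit and letter rows listing the same number for both. A skewed, more constraining distribution gives a lower value.

```python
import math

def perplexity(probs):
    """Perplexity 2**H of a discrete word distribution, with H the entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy

def uniform(n):
    """Equiprobable distribution over n words (no grammar constraint assumed)."""
    return [1.0 / n] * n

digits_pp = perplexity(uniform(10))           # 10-word digit vocabulary
letters_pp = perplexity(uniform(26))          # 26-letter vocabulary
skewed_pp = perplexity([0.7, 0.1, 0.1, 0.1])  # a constraining grammar lowers perplexity

print(f"{digits_pp:.1f} {letters_pp:.1f} {skewed_pp:.2f}")
```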

Human vs. machine (ASR)

- machine surprisingly good for certain tasks
- machine could be better for many others: robustness, outliers
- what are the limits of human performance?
  - in noise
  - for degraded speech
  - missing information (trading)
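A little arithmetic on the word-error table makes the human/machine comparison concrete. The Python sketch below copies the table's machine and human error rates into a dictionary. The helper name is ours. It divides machine error by human error for each task. On every task the machine is worse, by a factor of roughly 3 for read letters and up to 80 for TI digits.

```python
# (machine %, human %) word error rates as listed in the table (Lippmann, 1997).
WORD_ERROR = {
    "TI digits": (0.72, 0.009),
    "Alphabet letters": (5.0, 1.6),
    "Resource Management": (17.0, 2.0),
    "NAB": (6.6, 0.4),
    "Switchboard CSR": (43.0, 4.0),
    "Switchboard wordspotting": (31.1, 7.4),
}

def machine_to_human_ratio(table):
    """Return {task: machine error / human error}, ordered from largest gap to smallest."""
    ratios = {task: machine / human for task, (machine, human) in table.items()}
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))

for task, ratio in machine_to_human_ratio(WORD_ERROR).items():
    print(f"{task:26s} {ratio:6.1f}x")
```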

Human word intelligibility vs. noise

recognizers do have trouble!

humans start to have some trouble

Robustness to degraded speech

- speech = time-modulated signal in frequency bands
- relatively insensitive to (spectral) distortions; prerequisite for digital hearing aids
  - modulating spectral slope: -5 to +5 dB/oct, 0.25-2 Hz
- temporal smearing of envelope modulation
  - ca. 4 Hz maximum in the modulation spectrum (syllable)
  - LP > 4 Hz and HP < 8 Hz: little effect on intelligibility
- spectral envelope smearing
  - for BW > 1/3 oct, masked SRT starts to degrade
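The modulation spectrum mentioned above can be illustrated with a small numpy sketch. It rests on simplifying assumptions. A synthetic 1 kHz carrier amplitude-modulated at 4 Hz stands in for one frequency band of speech. The envelope is taken with an FFT-based Hilbert transform. The function names are ours. This is not the procedure used in the experiments summarized on the slide. It only shows how a 4 Hz envelope modulation appears as a peak in the modulation spectrum.

```python
import numpy as np

def hilbert_envelope(x):
    """Amplitude envelope |analytic signal| computed via the FFT (one-sided spectrum doubled)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))

def modulation_spectrum(x, fs):
    """Magnitude spectrum of the mean-removed envelope: which modulation rates are present."""
    env = hilbert_envelope(x)
    env = env - env.mean()
    mags = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    return freqs, mags

# Toy band signal: 1 kHz carrier, amplitude-modulated at 4 Hz, 2 s at 16 kHz.
fs = 16000
t = np.arange(2 * fs) / fs
x = (1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)

freqs, mags = modulation_spectrum(x, fs)
peak_hz = freqs[np.argmax(mags)]
print(f"dominant modulation rate: {peak_hz:.1f} Hz")
```

Low-pass filtering this envelope above 4 Hz would leave the peak intact. That is the intuition behind the observation that such smearing has little effect on intelligibility.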

Robustness to degraded speech and missing information

- partly reversed speech (Saberi & Perrott, Nature, 4/99): fixed-duration segments time-reversed or shifted in time; perfect sentence intelligibility up to 50 ms (demo: every 50 ms reversed; original)
- low-frequency modulation envelope (3-8 Hz) vs. acoustic spectrum
- syllable as information unit? (S. Greenberg)
- gap and click restoration (Warren)
- gating experiments
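A minimal numpy sketch shows how a partly reversed stimulus like the 50 ms demo can be generated. It assumes a mono signal held in a 1-D array, and the function name is ours. Published stimuli may also smooth or window the segment boundaries, and this sketch omits that step.

```python
import numpy as np

def locally_reverse(signal, fs, segment_ms):
    """Time-reverse each consecutive fixed-duration segment of a 1-D signal.

    signal: samples as a 1-D numpy array; fs: sampling rate in Hz;
    segment_ms: segment duration in milliseconds (e.g. 50 for the demo).
    A final segment shorter than segment_ms is reversed as well.
    """
    seg_len = max(1, int(round(fs * segment_ms / 1000.0)))
    out = np.empty_like(signal)
    for start in range(0, len(signal), seg_len):
        stop = start + seg_len
        out[start:stop] = signal[start:stop][::-1]
    return out

# Placeholder input: 1 s of noise at 16 kHz; substitute a recorded sentence in practice.
fs = 16000
speech = np.random.default_rng(0).standard_normal(fs)
stimulus = locally_reverse(speech, fs, segment_ms=50)  # 800-sample segments
```

Varying `segment_ms` gives the range of segment durations over which intelligibility can then be measured.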

Desired pre-processor characteristics in ASR

- basic sensitivity for stationary and dynamic sounds
- robustness to degraded speech: rather insensitive to spectral and temporal smearing
- robustness to noise and reverberation
- filter characteristics: are BP, PLP, MFCC, RASTA, TRAPS good enough?
- lateral inhibition (spectral sharpening); dynamics
- what can be neglected? non-linearities, limited dynamic range, active elements, co-modulation, secondary pitch, etc.
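RASTA, one of the front ends listed above, targets robustness by band-pass filtering each log-spectral trajectory over time. The sketch below assumes the commonly cited coefficients from Hermansky & Morgan (1994): numerator 0.1·(2, 1, 0, -1, -2) and a pole at 0.98. Some implementations use 0.94 instead. The filter is written causally, with a 4-frame delay, and the names are ours. This is an illustration of the idea, not a reference implementation. Because the numerator taps sum to zero, a fixed channel gain, which is a constant offset in the log domain, decays away.

```python
import numpy as np

# Assumed RASTA filter: H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1),
# implemented causally (the published form includes a z^4 advance).
RASTA_NUM = np.array([0.2, 0.1, 0.0, -0.1, -0.2])  # taps sum to zero: zero gain at DC
RASTA_POLE = 0.98

def rasta_filter(trajectories):
    """Filter each column of a (frames x bands) log-spectral array along the time axis."""
    x = np.asarray(trajectories, dtype=float)
    y = np.zeros_like(x)
    for n in range(x.shape[0]):
        # FIR part: weighted difference over the current and four previous frames
        acc = np.zeros(x.shape[1])
        for k, b in enumerate(RASTA_NUM):
            if n - k >= 0:
                acc += b * x[n - k]
        # IIR part: leaky integration through the single pole
        if n > 0:
            acc += RASTA_POLE * y[n - 1]
        y[n] = acc
    return y

# A fixed channel (e.g. microphone gain) adds a constant in the log domain.
rng = np.random.default_rng(1)
clean = rng.standard_normal((500, 3))   # toy log-spectral trajectories, 3 bands
distorted = clean + 5.0                 # the same trajectories through a fixed channel
residual = rasta_filter(distorted) - rasta_filter(clean)
print(np.abs(residual[-1]).max())       # the channel offset has decayed away
```

The offset is not removed instantly. It shows up as a transient that decays with the pole, which is why RASTA is described as suppressing slowly varying components rather than subtracting a mean.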

Caricature of present-day speech recognizers

- fixed pre-processor, fixed features, trained with a variety of speech input
- much global information, but ... no interrelations
- monaural, uni-modal input
- pitch extractor generally not operational
- performs well on average behavior, but ... does poorly on any type of outlier (OOV, non-native, fast or whispered speech, other communication channel, new topic, new speaker)
- neglects lots of useful (phonetic) information
- heavily relies on language model

Useful information: durational variability

[Figure: tree of /iy/ duration statistics (count, mean and s.d. in ms per node), split successively by factors R, S, Lw and Lu at factor levels 0-3; root node /iy/: count 4626, mean 95 ms, s.d. 39 ms. Adapted from Wang (1998).]

Mean /iy/ duration:

- normal rate: 95 ms
- primary stress: 104 ms
- word final: 136 ms
- utterance final: 186 ms
- overall average: 95 ms
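These means amount to large, systematic lengthening factors. A short Python sketch expresses each condition relative to the 95 ms overall average. The dictionary keys and the helper name are ours, and the numbers are the slide's. Utterance-final /iy/ comes out nearly twice as long (x1.96) and word-final about 1.43 times as long. Feeding such factors into a recognizer, for example as context-dependent duration priors, is our illustration of the "useful information" point, not a method given on the slide.

```python
# Mean /iy/ durations in ms as listed on the slide (after Wang, 1998).
MEAN_DURATION_MS = {
    "normal rate": 95,
    "primary stress": 104,
    "word final": 136,
    "utterance final": 186,
}
OVERALL_MEAN_MS = 95

def lengthening_factors(means, reference):
    """Ratio of each condition's mean duration to the reference duration, rounded to 2 decimals."""
    return {cond: round(ms / reference, 2) for cond, ms in means.items()}

factors = lengthening_factors(MEAN_DURATION_MS, OVERALL_MEAN_MS)
for cond, f in factors.items():
    print(f"{cond:16s} x{f:.2f}")
```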

Academia (knowledge) and industry (applications)

- what do industry and universities expect from each other? (panel discussion at Eurospeech ’01)
- proper education and training: E-masters
- good exchange between academia & industry
- participation in joint projects
- speech DBs adapted to requirements
- CAIP Symposium
- open source approach: Linux, Praat, HTK
- complaints: sometimes bad management and high risk (puts HLT in a bad spotlight, e.g. L&H)

Information Technology for Homeland Security

- Center for Advanced Information Processing (CAIP) Symposium, Rutgers Univ., Nov. 29
- “subsequent to events of Sept. 11, CAIP modified its traditional Annual Research Review”
- “Symposium identifies issues in Homeland Security and encourages research, particularly with university-industry cooperation”
- e.g., biometric and voice identification; fusing voice and face data; multimodal interfaces for asset deployment; face-tracking for identification; microphone array for speaker tracking

E-masters in Language and Speech

Course content:

- Theoretical Linguistics
- Natural Language Processing
- Phonetics and Phonology
- Cognitive models for speech and language processing
- Speech signal processing
- Pattern recognition
- Language engineering applications

http://www.cstr.ed.ac.uk/euromasters/

Conclusions

- collecting speech corpora in national languages (as in SA) is an excellent basis, both for research and for applications
- combine industrial and academic skills
- make proper use of experiences elsewhere; that’s why we are all here at this workshop!
- good luck and thank you for your attention