rapid language portability of speech processing systems

35
Tanja Schultz Language Technologies Institute, InterACT, Carnegie Mellon University MULTILING, Stellenbosch, April 10, 2006 Rapid Language Portability of Speech Processing Systems

Upload: others

Post on 04-Feb-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Rapid Language Portability of Speech Processing Systems

Tanja Schultz

Language Technologies Institute, InterACT, Carnegie Mellon University

MULTILING, Stellenbosch, April 10, 2006

Rapid Language Portability of Speech Processing Systems

Page 2: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 2/33

Computerization: Speech is key technologyMobile Devices, Ubiquitous Information Access

Globalization: MultilingualityMore than 6900 Languages in the world

Multiple official languages

Europe has 20+ official languages

South Africa has 11 official languages

⇒ Speech Processing in multiple LanguagesCross-cultural Human-Human Interaction

Human-Machine Interface in mother tongue

Motivation

Page 3: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 3/33

ChallengesAlgorithms are language independent but require data

Dozens of hours audio recordings and corresponding transcriptionsPronunciation dictionaries for large vocabularies (>100.000 words)Millions of words written text corpora in various domains in questionBilingual aligned text corpora

BUT: Such data are only available in very few languagesAudio data ≤ 40 languages, Transcriptions take up to 40x real timeLarge vocabulary pronunciation dictionaries ≤ 20 languagesSmall text corpora ≤ 100 languages, large corpora ≤ 30 languagesBilingual corpora in very few language pairs, pivot mostly English

Additional complications:Combinatorical explosion (domain, speaking style, accent, dialect, ...)Few native speakers at hand for minority (endangered) languagesLanguages without writing systems

Page 4: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 4/33

Solution: Learning Systems⇒ Intelligent systems that learn a language from the user

Effizient learning algorithms for speech processingLearning:

Interactive learning with user in the loop

Statistical modeling approaches

Efficiency:Reduce amount of data (save time and costs): by a factor of 10

Speed up development cycles: days rather than months

⇒ Rapid Language Adaptation from universal models

Bridge the gap between language and technology expertsTechnology experts do not speak all languages in question

Native users are not in control of the technology

Page 5: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 5/33

SPICESpeech Processing: Interactive Creation and Evaluation toolkit• National Science Foundation, Grant 10/2004, 3 years• Principle Investigators Tanja Schultz and Alan Black

• Bridge the gap between technology experts → language experts• Automatic Speech Recognition (ASR), • Machine Translation (MT),• Text-to-Speech (TTS)

• Develop web-based intelligent systems• Interactive Learning with user in the loop• Rapid Adaptation of universal models to unseen languages

• SPICE webpage http://cmuspice.org

Page 6: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 6/33

Page 7: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 7/33

Input: Speech

Speech Processing Systems

Pronunciation rules

hi /h//ai/you /j/u/we /w//i/

hi youyou areI am

AM Lex LM Output: Speech & Text

Hello NLP /

MTTTS

Text data

Phone set & Speech data

Page 8: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 8/33

Input: Speech

hi /h//ai/you /j/u/we /w//i/

hi youyou areI am

AM Lex LM Output: Speech & Text

NLP /

MTTTS

Phone set & Speech data

+

Hello

Rapid Portability: Data

Page 9: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 9/33

GlobalPhoneMultilingual Database

Widespread languagesNative SpeakersUniform DataBroad DomainLarge Text Resources

Internet, Newspaper

Corpus19 Languages … counting≥ 1800 native speakers≥ 400 hrs Audio dataRead SpeechFilled pauses annotated

ArabicCh-MandarinCh-ShanghaiGermanFrenchJapaneseKorean

CroatianPortugueseRussianSpanishSwedishTamilCzech

Turkish+ Thai+ Creole+ Polish+ Bulgarian+ ... ???

Now available from ELRA !!

Page 10: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 10/33

Speech Recognition in 17 Languages

1011.8

14 14 14.514.516.9 18 19 20 20

29

33.5

2021.7

23.4

29

0

10

20

30

40

Japa

nese

German

English Tha

i

Korean

Ch-Man

darin

Turkish

French

Portug

uese

Croatia

n

Spanis

h

Bulgaria

n

Russian

Afrikaa

ns

Chines

eArab

icIra

qi

Wor

d Er

ror R

ate

[%]

Page 11: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 11/33

Input: Speech

hi /h//ai/you /j/u/we /w//i/

hi youyou areI am

AM Lex LM Output: Speech & Text

NLP /

MTTTS

Phone set & Speech data

+

Hello

Rapid Portability: Acoustic Models

Page 12: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 12/33

Speech Production is independent from Language ⇒ IPA1) IPA-based Universal Sound Inventory

2) Each sound class is trained by data sharing

Reduction from 485 to 162 sound classesm,n,s,l appear in all 12 languagesp,b,t,d,k,g,f and i,u,e,a,o in almost all

Problem: Context of sounds are language specific Context dependent models for new languages?Solution:1) Multilingual Decision Context Trees2) Specialize decision tree by Adaptation

Universal Sound Inventory

-1=Plosiv?N J

k (0)

klau k raut k leot k orin k ar

+2=Vokal?N J

k (1) k (2)lau k rain k ar

ut k leot k or

BlaukrautBrautkleidBrotkorbWeinkarte

Page 13: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 13/33

Rapid Portability: Acoustic Model

69,1

57,149,9

40,632,8 28,9

19,6 19

0

20

40

60

80

100

Wor

d E

rror

rat

e [%

]

0 0:15 0:15 0:25 0:25 0:25 1:30 16:30

Ø Tree ML-Tree Po-Tree PDTS

+

Page 14: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 14/33

Projekt: SPICE

Page 15: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 15/33

Input: Speech

Rapid Portability: Pronunciation Dictionary

Pronunciation rules

hi /h//ai/you /j/u/we /w//i/

hi youyou areI am

AM Lex LM Output: Speech & Text

NLP /

MTTTS

Textdaten„adios“ /a/ /d/ /i/ /o/ /s/„Hallo“ /h/ /a/ /l/ /o/ „Phydough“ ???

Hello

Page 16: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 16/33

11.5

19.218.424.526.8

15.614 12.7

3336.4

32.8

16

26.4

18.3

0.0

10.0

20.0

30.0

40.0

50.0

Wor

d E

rror

Rat

e [%

]Phoneme Grapheme (FTT)Grapheme

English Spanish German Russian Thai

Phoneme- vs Grapheme based ASR

Problem:• 1 Grapheme ≠ 1 Phoneme

Flexible Tree Tying (FTT):One decision tree• Improved parameter tying• Less over specification• Fewer inconsistencies

0=vowel?

0=obstruent? 0=begin-state?

-1=syllabic?0=mid-state?-1=obstruent?0=end-state?

AX-m

IX-m

AX-b

Page 17: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 17/33

Dictionary: Interactive Learning* Follow the work ofDavel&Barnard

* Word list: extract from text

User

Word list W

i:= best selectWord wi

Generate pronunciation P(wi)

TTS

P(wi) okay?Yes

Delete wi

No

Update G-2-P

Improve P(wi)

G-2-P

Delete wi

* Update after each wi→ more effective training

* G-2-P- explicit mapping rules - neural networks - decision trees- instance learning (grapheme context)

LexSkip

Page 18: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 18/33

Page 19: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 19/33

Page 20: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 20/33

Issues and ChallengesHow to make best use of the human?

Definition of successful completion Which words to present in what order How to be robust against mistakes Feedback that keeps users motivated to continue

How many words to be solicited? G2P complexity depends on language 80% coveragehundred (SP) to thousands (EN)G2P rule system perplexity

16.80Dutch16.70German11.48Afrikaans

1.21Spanish3.52Italian

50.11English

PerplexityLanguage

Page 21: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 21/33

Input: Speech

Rapid Portability: LM

hi /h//ai/you /j/u/we /w//i/

hi youyou areI am

AM Lex LM Output: Speech & Text

NLP /

MTTTS

Text data

Internet / TV

Hello

Inquiry

AutomaticExtraction

LM

Bridge Languages

+

Resource rich languages ↔ Resource low languages:

Page 22: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 22/33

Projekt: SPICE

Page 23: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 23/33

Input: Speech

hi /h//ai/you /j/u/we /w//i/

hi youyou areI am

AM Lex LM Output: Speech & Text

NLP /

MTTTS

Phone set & Speech data

Hello

Rapid Portability: TTS

Page 24: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 24/33

Parametric TTSText-to-speech for G2P Learning:

Technique: phoneme-by-phoneme concatenation, speech not natural but understandable (Marelie Davel)Units are based on IPA phoneme examples

PRO: covers languages through simple adaptationCONS: not good enough for speech applications

Text-to-speech for Applications: Common technologies

Diphone: too hard to record and label Unit selection: too much to record and label

New technology: clustergen trajectory synthesisClusters representing context-dependent allophones PRO: can work with little speech (10 minutes) CONS: speech sounds buzzy, lacks natural prosody

Page 25: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 25/33

SPICE: Afrikaans - EnglishGoal: Build Afrikaans – English Speech Translation System using SPICE

Cooperation with University Stellenbosch and ARMSCORBilingual PhD visited CMU for 3 month (thanks Herman Engelbrecht !!!)Afrikaans: Related to Dutch and English, g-2-p very close, regular grammar, simple morphology

SPICE, all components apply statistical modeling paradigmASR: HMMs, N-gram LM (JRTk-ISL)MT: Statistical MT (SMT-ISL)TTS: Unit-Selection (Festival)Dictionary: G-2-P rules using CART decision trees

Text: 39 hansards; 680k words; 43k bilingual aligned sentence pairs;Audio: 6 hours read speech; 10k utterances, telephone speech (AST)

Page 26: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 26/33

SPICE: Time effortGood results: ASR 20% WER; MT A-E (E-A) Bleu 34.1 (34.7), Nist 7.6 (7.9)Shared pronunciation dictionaries (for ASR+TTS) and LM (for ASR+MT)Most time consuming process: data preparation → reduce amount of data!Still too much expert knowledge required (e.g. ASR parameter tuning!)

5 8 7

311

5 505

1015

2025

Data Training Tuning Evaluation Prototype

daysAM (ASR) Lex LM (ASR, MT) TM (MT) TTS S-2-S

Page 27: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 27/33

Other Projects on MultilingualityConstantly growing interest in multilingualityMajor needs:

Information gathering from multiple sourcesTranslation requirements for multilingual communitiesTwo-way communication

Translation of BN, Lectures, and MeetingsUS: GALE (DARPA), STR-Dust (NSF) Europe: TC_Star (EU FP6)

Translation in mobile communication scenariosUS: TransTac (DARPA), Thai ST (Laser)

Page 28: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 28/33

Translation of Broadcast News, Lectures and Meetings

你们的评估准则是什么

Projects:• TC_STAR (EC FP6)• STR-DUST (NSF)• Gale (DARPA)

Demo

Page 29: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 29/33

Largest DARPA project in HLT (EARS+TIDES)

Automatically process huge volumes of speech and text data in multiple languages

Broadcast News, Talk Shows, Telephone Conversations

Chinese, Arabic (+ dialectal variations), surprise languages

Deliver pertinent information in easy-to-understand forms to monolingual analysts, 3 engines:

Transcription: Transform multilingual speech to text

Translation: transform any text to English

Distillation: extract & present information to English analyst

Gale: Global Autonomous Language Exploitation

Page 30: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 30/33

Demonstration

MandarinBroadcast NewsCCTVrecorded in the USover satellite

Transforming theMandarin speechInto Chinese textusing AutomaticSpeech RecognitionASR

Translating fromChinese text intoEnglish textusing StatisticalMachine Translation

SMT

Page 31: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 31/33

PDA Speech Translation in Mobile Scenarios

TourismNeeds in Foreign CountryInternational Events

ConferencesBusinessOlympics

Humanitarian NeedsHumanitarian, Government

Medical, Refugee Registration

Projects:Thai ST (Laser)TransTac (DARPA)

Page 32: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 32/33

TransTacTeam effort:

Speech Recognition (CMU / Mobile, LLC)Statistical MT (CMU / Mobile, LLC)Speech Synthesis Swift (Cepstral, LLC)Graphical User Interface (Mobile, LLC)

System runs on all platforms Off-the-shelf consumer PDAsLaptop/Desktop under Win/CE/LinuxPhraselator P2 (Voxtec)

InterfaceSimple and intuitive push-to-talk Back translation for confirmation

Language pairs: English-Thai + English-ArabicHandheld: Joint optimization of speed and accuracy

About 1.5 real-time on a 800MHz PXA270, 128Mb RAM

Page 33: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 33/33

ConclusionIntelligent systems to learn language

SPICE: Learning by interaction with the (naive) user

Rapid Portability to unseen languages

Multilingual Systems

Systems and data in multiple languages

Universal language independent models

Projects on MultilingualityExtract information from multilingual speech data

Speech translation in mobile scenarios

Page 34: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 34/33

Page 35: Rapid Language Portability of Speech Processing Systems

Rapid Language Portability, Tanja Schultz 35/33