Direct Translation Approaches: Statistical Machine Translation

Stephan Vogel, Alicia Tribble

Interactive Systems Lab
Carnegie Mellon University & University Karlsruhe

Speech-to-Speech Translation Workshop
ESSLLI 2002, Trento, Italy

16 July 2002 Speech-to-Speech Translation Workshop, ESSLLI, Trento, Italy 2

Overview

Translation Approaches
Statistical Machine Translation
Translating with Cascaded Transducers
Experiments on Nespole Data

Translation Approaches

Interlingua based
Transfer based
Direct
  Example based
  Statistical

Statistical Machine Translation

Based on Bayes' Decision Rule:

$$\hat{e} = \arg\max_{e} \{ p(e \mid f) \} = \arg\max_{e} \{ p(e)\, p(f \mid e) \}$$
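The decision rule can be illustrated with a minimal sketch. The candidate list, the probability values and the function name `best_translation` are all made up for illustration; a real system searches a much larger hypothesis space.

```python
import math

def best_translation(candidates, lm_logprob, tm_logprob, source):
    """Return the candidate e that maximizes log p(e) + log p(f | e)."""
    return max(candidates, key=lambda e: lm_logprob(e) + tm_logprob(source, e))

# Toy, made-up probabilities (assumptions, not taken from the slides).
LM = {"the house": 0.6, "house the": 0.1}           # language model p(e)
TM = {("das haus", "the house"): 0.5,                # translation model p(f | e)
      ("das haus", "house the"): 0.5}

choice = best_translation(
    candidates=list(LM),
    lm_logprob=lambda e: math.log(LM[e]),
    tm_logprob=lambda f, e: math.log(TM[(f, e)]),
    source="das haus",
)
print(choice)
```

Here the translation model cannot distinguish the two word orders, so the language model decides.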

Tasks in SMT

Modelling: build statistical models which capture characteristic features of translation equivalences and of the target language

Training: train translation model on bilingual corpus, train language model on monolingual corpus

Decoding: find best translation for new sentences according to the models

Alignment Example

Translation Models

IBM1 – lexical probabilities only
IBM2 – lexicon plus absolute position
HMM – lexicon plus relative position
IBM3 – plus fertilities
IBM4 – inverted relative position alignment
IBM5 – non-deficient version of Model 4

[Brown et al. 93, Vogel et al. 96]

HMM Alignment Model

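$$
\begin{aligned}
p(f \mid e) &= \sum_{a} p(f_1^J, a_1^J \mid e_1^I) \\
&= \sum_{a} \prod_{j} p(f_j, a_j \mid f_1^{j-1}, a_1^{j-1}, e_1^I) \\
&= \sum_{a} \prod_{j} p(a_j \mid a_{j-1})\, p(f_j \mid e_{a_j}) \\
&\approx \max_{a} \prod_{j} p(a_j \mid a_{j-1})\, p(f_j \mid e_{a_j})
\end{aligned}
$$

The alignment $a_j$ of the current word $f_j$ depends on the alignment $a_{j-1}$ of the previous word $f_{j-1}$.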
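The maximum approximation in the last line can be computed with the Viterbi algorithm. The following is a minimal sketch, not the authors' implementation: the names `viterbi_alignment` and `toy_jump`, the uniform start distribution, the probability floor for unseen word pairs and all numbers are assumptions made for illustration.

```python
import math

def viterbi_alignment(src, tgt, lex, jump):
    """Best alignment a_1..a_J under prod_j p(a_j | a_{j-1}) p(f_j | e_{a_j}).

    src: source words f_1..f_J, tgt: target words e_1..e_I.
    lex[(f, e)]: lexical probability p(f | e).
    jump(i, i_prev): transition probability p(a_j = i | a_{j-1} = i_prev).
    """
    I, J = len(tgt), len(src)
    floor = 1e-12  # assumed floor for word pairs missing from the lexicon

    def emit(j, i):
        return math.log(lex.get((src[j], tgt[i]), floor))

    score = [[-math.inf] * I for _ in range(J)]
    back = [[0] * I for _ in range(J)]
    for i in range(I):  # assumed uniform distribution over the first position
        score[0][i] = math.log(1.0 / I) + emit(0, i)
    for j in range(1, J):
        for i in range(I):
            prev = max(range(I), key=lambda k: score[j - 1][k] + math.log(jump(i, k)))
            score[j][i] = score[j - 1][prev] + math.log(jump(i, prev)) + emit(j, i)
            back[j][i] = prev
    # Trace back from the best final state.
    path = [max(range(I), key=lambda i: score[J - 1][i])]
    for j in range(J - 1, 0, -1):
        path.append(back[j][path[-1]])
    return path[::-1]

def toy_jump(i, i_prev):
    """Toy, unnormalized jump weights that favour moving one word forward."""
    return {0: 0.2, 1: 0.6}.get(i - i_prev, 0.05)

src = ["das", "haus", "ist", "klein"]
tgt = ["the", "house", "is", "small"]
lex = {(f, e): 0.9 for f, e in zip(src, tgt)}
alignment = viterbi_alignment(src, tgt, lex, toy_jump)
print(alignment)
```

The returned list holds, for each source position j, the index of the target word it is aligned to.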

Phrase Translation

Why?
  To capture context
  Local word reordering

How?
  Train alignment model
  Extract phrase-to-phrase translations from Viterbi path

Notes:
  Often better results when training target to source for extraction of phrase translations
  Phrases are not fully integrated into the alignment model; they are extracted only after training is completed
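The slides do not spell out the extraction procedure. One common way to read phrase pairs off a Viterbi path is to keep every source span whose aligned target range is not touched by words outside the span. The sketch below follows that idea; `extract_phrases`, the length limit and the example alignment are assumptions for illustration, not the authors' method.

```python
def extract_phrases(src, tgt, alignment, max_len=3):
    """Collect phrase pairs from an alignment with alignment[j] = target index of src[j]."""
    pairs = set()
    J = len(src)
    for start in range(J):
        for end in range(start, min(J, start + max_len)):
            span = alignment[start:end + 1]
            lo, hi = min(span), max(span)
            # Skip the span if a source word outside it aligns into [lo, hi].
            outside = [a for j, a in enumerate(alignment) if j < start or j > end]
            if any(lo <= a <= hi for a in outside):
                continue
            pairs.add((" ".join(src[start:end + 1]), " ".join(tgt[lo:hi + 1])))
    return pairs

src = ["ein", "zimmer", "reservieren"]
tgt = ["book", "a", "room"]
alignment = [1, 2, 0]  # made-up Viterbi path: ein->a, zimmer->room, reservieren->book
phrases = extract_phrases(src, tgt, alignment)
for pair in sorted(phrases):
    print(pair)
```

The pair for "ein zimmer" captures local context, and the full-sentence pair captures the reordering of the verb.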

Translation with Transducers

Transducer:
  Finite state machine
  Reads a sequence of words, writes a sequence of words
  Output vocabulary can be different from input vocabulary

Transducer used in current implementation:
  Tree transducer, i.e. prefix tree over input strings
  Output from final states
  Used to encode lexicon, phrase translations, bilingual word classes and grammars
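A prefix tree with outputs attached to final states can be sketched as follows. The class name `TreeTransducer`, its methods and the entries are hypothetical; the actual implementation is not described in the slides.

```python
class TreeTransducer:
    """Prefix tree over input word sequences; outputs are emitted from final states."""

    def __init__(self):
        self.root = {"next": {}, "out": []}

    def add(self, source_words, target_words):
        """Insert one entry, sharing common prefixes with existing entries."""
        node = self.root
        for word in source_words:
            node = node["next"].setdefault(word, {"next": {}, "out": []})
        node["out"].append(tuple(target_words))

    def matches(self, words, start):
        """Yield (end, output) for every stored entry equal to words[start:end]."""
        node = self.root
        for end in range(start, len(words)):
            node = node["next"].get(words[end])
            if node is None:
                return
            for out in node["out"]:
                yield end + 1, out

transducer = TreeTransducer()
transducer.add(["guten"], ["good"])
transducer.add(["guten", "tag"], ["hello"])
found = list(transducer.matches("guten tag zusammen".split(), 0))
print(found)
```

A single left-to-right pass over the input finds all entries starting at a given position, which is what makes the prefix tree convenient for filling a translation lattice.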

Cascaded Transducers

Generalization through cascaded transducers:
Replace words by category labels and have a transducer for each category

[Vogel, Ney 2000]

Language Model

Standard n-gram model:

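$$
\begin{aligned}
p(w_1 \dots w_n) &= \prod_{i} p(w_i \mid w_1 \dots w_{i-1}) \\
&= \prod_{i} p(w_i \mid w_{i-2}\, w_{i-1}) \quad \text{(trigram)} \\
&= \prod_{i} p(w_i \mid w_{i-1}) \quad \text{(bigram)}
\end{aligned}
$$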

Many events not seen -> smoothing required
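As a small illustration of why smoothing is needed, the sketch below estimates a bigram model with add-one smoothing. Add-one is chosen here only because it is short; the slides do not say which smoothing method was used, and the class name `BigramLM` and the training sentences are made up.

```python
from collections import Counter

class BigramLM:
    """Bigram language model with add-one smoothing (an illustrative choice)."""

    def __init__(self, sentences):
        self.history = Counter()   # how often each word occurs as a history
        self.bigrams = Counter()   # how often each word pair occurs
        self.vocab = {"</s>"}      # words that can be predicted
        for sentence in sentences:
            words = ["<s>"] + sentence.split() + ["</s>"]
            self.history.update(words[:-1])
            self.bigrams.update(zip(words, words[1:]))
            self.vocab.update(words[1:])

    def prob(self, prev, word):
        """p(word | prev); unseen pairs get a small non-zero probability."""
        return (self.bigrams[(prev, word)] + 1) / (self.history[prev] + len(self.vocab))

lm = BigramLM(["i want a room", "i want a flight"])
print(lm.prob("a", "room"))   # seen bigram
print(lm.prob("a", "want"))   # unseen bigram, still non-zero
```

Without the added counts the unseen pair would receive probability zero and rule out any hypothesis containing it.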

Decoding Strategies

Sequential construction of target sentence
  Extend partial translation by words which are translations of words in the source sentence
  Language model can be applied immediately
  Mechanism to ensure proper coverage of source sentence required

Left-to-right over source sentence
  Find translations for sequences of words
  Construct translation lattice
  Apply language model and select best path
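The second strategy can be sketched as a dynamic program over source positions, where each lattice edge is a phrase translation and a bigram language model scores the path. This is a simplified illustration under stated assumptions (monotone search, bigram state, made-up phrase table and probabilities, hypothetical function name `decode`), not the decoder used in the experiments.

```python
import math

def decode(source, phrase_table, lm_logprob, max_len=3):
    """Monotone search over a translation lattice.

    phrase_table: maps a tuple of source words to a list of (target tuple, probability).
    lm_logprob(prev, word): bigram log probability.
    """
    n = len(source)
    # best[pos][last_word] = (score, target words so far)
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, ())
    for start in range(n):
        for last, (score, words) in list(best[start].items()):
            for end in range(start + 1, min(n, start + max_len) + 1):
                for target, prob in phrase_table.get(tuple(source[start:end]), []):
                    new_score = score + math.log(prob)
                    prev = last
                    for word in target:
                        new_score += lm_logprob(prev, word)
                        prev = word
                    if prev not in best[end] or new_score > best[end][prev][0]:
                        best[end][prev] = (new_score, words + target)
    if not best[n]:
        return None  # the lattice does not cover the source sentence
    return max(best[n].values(), key=lambda item: item[0])[1]

# Toy phrase table and bigram probabilities (all values are made up).
phrase_table = {
    ("guten",): [(("good",), 0.9)],
    ("tag",): [(("day",), 0.5), (("tag",), 0.5)],
    ("guten", "tag"): [(("hello",), 0.8)],
}
bigram = {("<s>", "hello"): 0.5, ("<s>", "good"): 0.2, ("good", "day"): 0.3}
result = decode(["guten", "tag"], phrase_table,
                lambda prev, word: math.log(bigram.get((prev, word), 0.01)))
print(result)
```

Because the search moves left to right over the source, coverage is guaranteed by construction, at the cost of allowing reordering only inside a phrase.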

Translation Graph

Speech Recognition and Translation

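Search for the best string in the target language given the acoustic signal x in the source language:

$$
\begin{aligned}
\hat{e} &= \arg\max_{e} \{ p(e)\, p(x \mid e) \} \\
&= \arg\max_{e} \{ p(e) \sum_{f} p(f, x \mid e) \} \\
&= \arg\max_{e} \{ p(e) \sum_{f} p(f \mid e)\, p(x \mid f, e) \} \\
&= \arg\max_{e} \{ p(e) \sum_{f} p(f \mid e)\, p(x \mid f) \}
\end{aligned}
$$

The last step assumes that the acoustic signal x depends only on the source words f,

i.e. the recognizer language model is not needed!? [Ney, 2001]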

Coupling Recognition and Translation

Sequential – first recognition, then translation
  First-best recognition hypothesis
  N-best list – translate n times
  Word lattice – translate all paths in lattice, reuse results from partial paths

Integrated – recognition and translation in combined search
  Subsequential transducer approach uses this

Note: In the Eutrans project, the best results were obtained when translating the first-best hypothesis

Example-Based Machine Translation

Re-use translations to create new translations:
  Store bilingual corpus with (partial) alignment
  Find partial matches, i.e. sequences of words in stored corpus to cover a new sentence
  Extract translation(s) and build translation lattice
  Apply language model to find best path, i.e. best translation

Nespole Experiments

Application of direct translation techniques to dialogue data collected in Nespole!
Testing the effect of phrase translation
Experiments with additional knowledge sources
  Preexisting: monolingual data for the LM and publicly available lexica
  Engineered: handwritten rules for fixed expressions and knowledge extracted from semantic grammars

Nespole Project Data

CMU database of dialogues in the travel domain
German, English (Italian, French)
Speech recognizer hypotheses and human transcriptions both available
Segmented into SDUs (Speech Dialogue Units)

Nespole Corpus: Training

Language      English   German
Tokens          15572    14992
Vocabulary       1032     1338
Singletons        404      620

3182 parallel SDUs

Nespole Corpus: Testing

              German         Reference A   Reference B
Tokens        437            610           607
Vocabulary    183 (45 OOV)   165           160

70 parallel SDUs

Corpus Challenges: Sentence Length

[Figure: two charts comparing English and German, one for the testing data and one for the training data]

Evaluation

Human Scoring
  Good, Okay, Bad (cf. Nespole evaluation)
  Collapsed into a "human score" on [0,1]

Bleu Score
  Average of n-gram precisions for n = 1..N, typically N=3 or 4
  Penalty for short translations to substitute for a recall measure

[Papineni et al. 2001]
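The two ingredients of the Bleu score can be sketched for a single sentence as follows. As published, Bleu combines the clipped n-gram precisions with a geometric mean and is normally computed over a whole test corpus; this sentence-level version, the function names and the example strings are simplifications for illustration.

```python
import math
from collections import Counter

def ngrams(words, n):
    """Count all n-grams of length n in a list of words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(hypothesis, references, max_n=4):
    """Sentence-level Bleu: clipped n-gram precisions times a brevity penalty."""
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        # Each hypothesis n-gram is credited at most as often as it occurs in a reference.
        clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference closest in length to the hypothesis.
    ref_len = min((len(r) for r in refs), key=lambda length: (abs(length - len(hyp)), length))
    penalty = 1.0 if len(hyp) > ref_len else math.exp(1 - ref_len / len(hyp))
    return penalty * math.exp(sum(log_precisions) / max_n)

perfect = bleu("the room is free", ["the room is free", "the room is available"])
short = bleu("the room", ["the room is free"], max_n=2)
print(perfect, short)
```

The second call shows the role of the penalty: every n-gram of the short hypothesis is correct, yet the score is reduced because half of the reference is missing.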

Phrase Translation

Unequal sentence lengths mean that training can be improved by choosing the direction: S -> T or T -> S
German compounds are better suited to 1-to-many alignments with English multiword phrases, so direction is important

Configuration                                        Score
Statistical lexicon alone                            0.1903
Statistical lexicon, phrases from S -> T training    0.2350
Statistical lexicon, phrases from bidir. training    0.2654

Language Model

Monolingual text available from Verbmobil
  500,000 words (32x the size of the original English corpus)
Helps to choose among translation hypotheses but will not generate new ones

Configuration                                                                 Score
Stat. lexicon, phrases, fixed expression rules, gen. lexicon, and small LM    0.2613
Stat. lexicon, phrases, fixed expression rules, gen. lexicon, and large LM    0.3172

General-Purpose Lexicon

Configuration                                                              Score
Statistical lexicon, phrases, and fixed expressions with small LM          0.2654
Adding general-purpose lexicon as a transducer                             0.2522
Using large instead of small LM                                            0.3141
General-purpose lexicon as training data instead of separate transducer    0.3275

Fixed Expression Rules

Transducer rules are human readable and can be added by hand
Fixed expressions for times and dates are re-usable, require less time to build than domain-specific rules, and improve coverage of some semi-idiomatic constructions

Configuration                                                        Score
Statistical lexicon with small LM                                    0.1893
Statistical lexicon and fixed-expression transducer with small LM    0.1903

Knowledge from Existing Grammars

Could help with domain portability but not with language portability
Benefit mostly in additional vocabulary

Configuration                                                                                      Score
Statistical lexicon, fixed expressions, phrases, and general lexicon with large LM                 0.3141
Statistical lexicon, fixed expressions, phrases, general lexicon and I-transducer with large LM    0.3172

Comparative Evaluation Results

                Good   Okay   Bad   Score   Bleu
Text     IF       77    104   227    0.32   0.068
         SMT     127     80   205    0.40   0.333
Speech   IF       64    101   243    0.28   0.059
         SMT      95     83   227    0.34   0.262

Selected References

Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, Robert L. Mercer. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2), pp. 263-311, 1993.

Stephan Vogel, Hermann Ney, Christoph Tillmann. HMM-Based Word Alignment in Statistical Translation. Int. Conf. on Computational Linguistics, Copenhagen, Denmark, pp. 836-841, August 1996.

Stephan Vogel, Hermann Ney. Translation with Cascaded Finite State Transducers. 36th Annual Conference of the Association for Computational Linguistics, pp. 23-30, Hong Kong, China, October 2000.

Stephan Vogel, Alicia Tribble. Improving Statistical Machine Translation for a Speech-to-Speech Translation Task. To appear in ICSLP 2002.

H. Ney. The Statistical Approach to Spoken Language Translation. Proc. IEEE Automatic Speech Recognition and Understanding Workshop, Madonna di Campiglio, Trento, Italy, 8 pages, CD ROM, IEEE Catalog No. 01EX544, December 2001.

Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu. Bleu: a Method for Automatic Evaluation of Machine Translation. IBM Research Report RC22176 (W0109-022), September 17, 2001.
