Introduction to Machine Translation - NJU NLP (nlp.nju.edu.cn/huangsj/mtcourse/mt1-intro.pdf)
TRANSCRIPT
Machine Translation
- Automatically translate text from one language to another.
  - e.g. translate a text from English to Chinese
14-12-31 Introduction to Machine Translation 2
Interests
- Research perspective:
  - one of the first applications envisioned for computers
  - one of the most challenging problems in AI
  - requiring knowledge from many NLP sub-areas
- Commercial perspective:
  - The US launched a series of projects: TIDES (Translingual Information Detection, Extraction and Summarization), GALE (Global Autonomous Language Exploitation), BOLT (Broad Operational Language Translation)
  - The EU spends more than $1 billion on translation
  - Practical usage
Machine Translation Roadmap
- 1949: The field of "machine translation" appeared in Warren Weaver's Memorandum on Translation
- 1950s: Research groups established in the US, Japan, and Russia
- 1966: The Automatic Language Processing Advisory Committee (ALPAC) report gave a negative judgment
- 1968: SYSTRAN founded
- 1970s: MT is used to translate text abstracts and technical manuals
- 1984: Trados founded (translation memory and CAT; acquired by SDL in 2005)
- 1990s: Statistical methods were employed in MT research
- 1992: SDL plc founded
- 2002: Language Weaver founded (acquired by SDL in 2010 for $42.5m)
- 2007: Google announced its language service / open-source "Moses" released
- 2012: Google translates text equivalent to 1 million books in one day
[http://en.wikipedia.org/wiki/Machine_translation]
SMT Research Roadmap
- 1990: IBM's statistical approach to MT (Brown et al.)
- 1993: IBM word-based translation models 1-5 (Brown et al.)
- 1999: Phrase-based translation
- 2001: Syntax-based translation
- 2002: BLEU (evaluation)
- 2003: MERT (Minimum Error Rate Training)
- 2005: Hierarchical phrase-based translation
People
- Peter Brown (IBM Watson)
- Franz J. Och (RWTH, USC, Google)
- Philipp Koehn (USC, MIT, Edinburgh)
- Kevin Knight, Daniel Marcu (USC, Language Weaver)
- Hermann Ney (RWTH)
- Stephan Vogel (RWTH, CMU, Qatar)
- …
Paradigms
- Rule-based Translation
- Statistical Machine Translation
- Example-based Translation (Translation Memory)
- Translation levels:
  - direct
  - transfer
  - interlingua
Rule-based Translation
- A dictionary that maps each English word to an appropriate German word.
- Rules representing regular English sentence structure.
- Rules representing regular German sentence structure.
- Rules to relate the two structures to each other.
RBMT Example:
- A girl eats an apple.
  - Source language = English; demanded target language = German
- 1st: get part-of-speech information for the source words:
  - a = indef. article; girl = noun; eats = verb; an = indef. article; apple = noun
- 2nd: get syntactic information about the verb "to eat":
  - NP-eat-NP; here: eat is present simple, 3rd person singular
- 3rd: parse the source sentence:
  - (NP an apple) = the object of eat
RBMT Example (cont.):
- 4th: translate English words into German:
  - a (category = indef. article) => ein (category = indef. article)
  - girl (category = noun) => Mädchen (category = noun)
  - eat (category = verb) => essen (category = verb)
  - an (category = indef. article) => ein (category = indef. article)
  - apple (category = noun) => Apfel (category = noun)
- 5th: map dictionary entries into appropriate inflected forms (final generation):
  - A girl eats an apple. => Ein Mädchen isst einen Apfel.
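The five steps above can be sketched as a toy pipeline. Everything here (the lexicon entries, the category tags, and the two hard-coded inflection rules) is illustrative and covers only this one sentence:

```python
# Toy rule-based English->German pipeline for "A girl eats an apple".
# Lexicon, categories, and inflection rules are hard-coded illustrations.

LEXICON = {"a": "ein", "an": "ein", "girl": "Mädchen", "eats": "essen", "apple": "Apfel"}
POS = {"a": "ART", "an": "ART", "girl": "N", "eats": "V", "apple": "N"}

def translate(sentence):
    words = sentence.lower().rstrip(".").split()
    stems = [LEXICON[w] for w in words]           # step 4: dictionary lookup
    out = []
    for i, (w, stem) in enumerate(zip(words, stems)):
        if POS[w] == "V":                         # step 5: inflect verb, 3rd person singular
            out.append("isst" if stem == "essen" else stem)
        elif POS[w] == "ART" and i > 0:           # step 5: accusative article before the object noun
            out.append("einen" if stems[i + 1] == "Apfel" else stem)
        else:
            out.append(stem)
    out[0] = out[0].capitalize()
    return " ".join(out) + "."

print(translate("A girl eats an apple."))  # -> Ein Mädchen isst einen Apfel.
```

A real RBMT system replaces these lookups with a full morphological analyzer, parser, and transfer rules; the sketch only shows where each of the five steps plugs in.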
Fundamentals of Statistical MT Systems
[Diagram: the noisy-channel view of SMT. A French/English bilingual text and an English text each undergo statistical analysis; French is treated as "broken English" to be decoded back into English. Example: "Que hambre tengo yo" → candidate translations "What hunger have I", "Hungry I am so", "I am so hungry", "Have I that hunger", …]
Statistical MT Systems
[Diagram: the same noisy-channel picture, now labeled with the three components: the translation model P(f|e), the language model P(e), and the decoding algorithm argmax_e P(e) * P(f|e). Example: "Que hambre tengo yo" → I am so hungry]
Bayes Rule
Given a source sentence f, the decoder should consider many possible translations … and return the target string e that maximizes P(e | f). By Bayes' rule, we can also write this as P(e) P(f | e) / P(f) and maximize that instead. P(f) never changes while we compare different e's, so we can equivalently maximize P(e) P(f | e).
The Noisy Channel Model
- Goal: a translation system from French to English
- We want a model P(e|f) which estimates the conditional probability of any English sentence e given the French sentence f. Use the training corpus to set the parameters.
- A noisy channel model has two components:
  - P(e), the language model
  - P(f|e), the translation model
- Giving:

  P(e | f) = P(e, f) / P(f) = P(e) P(f | e) / P(f)

  and

  argmax_e P(e | f) = argmax_e P(e) P(f | e)
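A quick numeric sanity check of the last identity, with illustrative made-up probabilities: dividing by P(f) rescales every candidate's score by the same constant, so the argmax is unchanged.

```python
# Illustrative check that dividing by the constant P(f) does not change the argmax.
# All probabilities below are made up for demonstration.
candidates = {
    "I am so hungry":     (1e-4, 1e-6),   # (P(e), P(f|e))
    "Hungry I am so":     (1.4e-6, 1e-6),
    "Have I that hunger": (9.8e-7, 2e-5),
}
p_f = 1e-9  # any positive constant; identical for every candidate e

best_posterior = max(candidates, key=lambda e: candidates[e][0] * candidates[e][1] / p_f)
best_joint = max(candidates, key=lambda e: candidates[e][0] * candidates[e][1])
assert best_posterior == best_joint
print(best_posterior)  # -> I am so hungry
```

This is why real decoders never bother to estimate P(f): it cancels out of the comparison.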
A division of labor
- Use of Bayes' rule ("the noisy channel model") allows a division of labor:
- The job of the translation model P(f|e) is just to model how English words typically get rendered as French words (perhaps in a certain context)
  - P(f|e) doesn't have to worry about language-particular facts about English word order: that's the job of P(e)
- The job of the language model is to choose felicitous bags of words and to correctly order them for English
  - P(e) can do bag generation: putting a bag of words in order
    - e.g., hungry I am so → I am so hungry
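Bag generation can be sketched by brute force: enumerate the permutations of the bag and keep the one that a bigram model scores highest. The tiny corpus and the crude count-based "model" here are hand-built illustrations, not a real language model:

```python
from itertools import permutations

# Toy bigram "language model": raw counts from a tiny hand-made corpus (illustrative).
corpus = "i am so hungry . i am so tired . so i am happy .".split()
bigrams = {}
for a, b in zip(corpus, corpus[1:]):
    bigrams[(a, b)] = bigrams.get((a, b), 0) + 1

def score(words):
    # Product of bigram counts, with 0.1 for unseen pairs (crude smoothing).
    s = 1.0
    for a, b in zip(words, words[1:]):
        s *= bigrams.get((a, b), 0.1)
    return s

bag = ["hungry", "i", "am", "so"]
best = max(permutations(bag), key=score)
print(" ".join(best))  # -> i am so hungry
```

Exhaustive permutation search only works for tiny bags (n! orderings); real decoders use dynamic programming and pruning instead.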
Three Problems for Statistical MT
- Language model
  - Given an English string e, assigns P(e) by formula
  - good English string -> high P(e)
  - random word sequence -> low P(e)
- Translation model
  - Given a pair of strings <f,e>, assigns P(f | e) by formula
  - <f,e> look like translations -> high P(f | e)
  - <f,e> don't look like translations -> low P(f | e)
- Decoding algorithm
  - Given a language model, a translation model, and a new sentence f, find the translation e maximizing P(e) * P(f | e)
Three Problems for Statistical MT (cont.)
- The language model P(e) could be an n-gram model, estimated from any data (a parallel corpus is not needed to estimate the parameters)
- The translation model P(f|e) is trained from a parallel corpus of French/English pairs
- Note:
  - The translation model is backwards!
  - The language model can make up for deficiencies of the translation model.
  - Later we'll talk about how to build P(f|e)
  - Decoding, i.e., finding argmax_e P(e) P(f|e), is also a challenging problem.
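A minimal sketch of the first point: a maximum-likelihood bigram model estimated from monolingual text only. The two-sentence corpus is illustrative, and no smoothing is applied, so unseen bigrams get probability zero:

```python
from collections import Counter

# Minimal sketch: maximum-likelihood bigram model P(w_i | w_{i-1}) estimated
# from monolingual text only -- no parallel corpus is required for P(e).
text = "<s> i am so hungry </s> <s> i am happy </s>".split()
histories = Counter(text[:-1])          # counts of each word used as a history
bigrams = Counter(zip(text, text[1:]))  # counts of adjacent word pairs

def p_bigram(prev, word):
    return bigrams[(prev, word)] / histories[prev] if histories[prev] else 0.0

def p_sentence(words):
    p, prev = 1.0, "<s>"
    for w in words + ["</s>"]:
        p *= p_bigram(prev, w)
        prev = w
    return p

print(p_sentence(["i", "am", "so", "hungry"]))  # -> 0.5
print(p_sentence(["hungry", "i", "am", "so"]))  # word salad -> 0.0
```

This is exactly the behavior the previous slide asks for: a fluent string gets a high P(e), a random word order gets a low (here zero) P(e).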
Example from the Koehn and Knight tutorial
- Translation from French to English, candidate translations ranked by P(French|English) alone:
  - "Que hambre tengo yo" →
  - What Hunger have: P(F|E) = 0.000014
  - Hungry I am so: P(F|E) = 0.000001
  - I am so hungry: P(F|E) = 0.000001
  - Have I that hunger: P(F|E) = 0.000020
Example from the Koehn and Knight tutorial (cont.)
- With a language model: P(French|English) * P(English)
  - "Que hambre tengo yo" →
  - What Hunger have: P = 0.000014 * 0.000001
  - Hungry I am so: P = 0.000001 * 0.0000014
  - I am so hungry: P = 0.000001 * 0.0001
  - Have I that hunger: P = 0.000020 * 0.00000098
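Multiplying the two factors and taking the argmax (scores copied from the example above) shows why the language model matters: the translation model alone prefers a disfluent string, while the product picks the fluent one.

```python
# Scores from the Koehn and Knight example: (P(f|e), P(e)) per candidate.
candidates = {
    "What Hunger have":   (0.000014, 0.000001),
    "Hungry I am so":     (0.000001, 0.0000014),
    "I am so hungry":     (0.000001, 0.0001),
    "Have I that hunger": (0.000020, 0.00000098),
}

# Translation model alone prefers a disfluent word-for-word rendering ...
best_tm = max(candidates, key=lambda e: candidates[e][0])
# ... but the product P(e) * P(f|e) picks the fluent English sentence.
best_combined = max(candidates, key=lambda e: candidates[e][0] * candidates[e][1])
print(best_tm)        # -> Have I that hunger
print(best_combined)  # -> I am so hungry
```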