
Example-Based Machine Translation

Josef van Genabith, CNGL, Dublin City University

Khalil Sima’an, University of Amsterdam

Notes are based on:

• Carl, M. and Way, A., editors (2003). Recent Advances in Example-Based Machine Translation. Kluwer Academic Publishers, Dordrecht, The Netherlands.

• Dandapat, S. (2012). Mitigating the Problems of SMT using EBMT. PhD thesis, Dublin City University.

• Dandapat, S., Morrissey, S., Way, A., and van Genabith, J. (2012). Combining EBMT, SMT, TM and IR Technologies for Quality and Scale. In Proceedings of the EACL 2012 Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra), pages 48–58, Avignon, France.

• Gough, N. and Way, A. (2004). Robust Large-Scale EBMT with Marker-Based Segmentation. In Proceedings of the 10th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI 2004), pages 95–104, Baltimore, MD.

• Green, T. (1979). The Necessity of Syntax Markers: Two Experiments with Artificial Languages. Journal of Verbal Learning and Verbal Behavior, 18:481–496.

• Groves, D. and Way, A. (2006). Hybridity in MT: Experiments on the Europarl Corpus. In Proceedings of the 11th Annual Conference of the European Association for Machine Translation (EAMT 2006), pages 115–124, Oslo, Norway.

• Hutchins, J. (2005). Example-Based Machine Translation: a Review and Commentary. Machine Translation, 19(3–4):197–211.

• Lepage, Y. and Denoual, E. (2005c). The ‘Purest’ EBMT System Ever Built: No Variables, No Templates, No Training, Examples, Just Examples, Only Examples. In Proceedings of the 2nd Workshop on Example-Based Machine Translation, MT Summit X, pages 81–90, Phuket, Thailand.

• Nagao, M. (1984). A Framework of a Mechanical Translation between Japanese and English by Analogy Principle. In Elithorn, A. and Banerji, R., editors, Artificial and Human Intelligence, pages 173–180. North-Holland, Amsterdam.

• Somers, H., Dandapat, S., and Naskar, S. K. (2009). A Review of EBMT Using Proportional Analogy. In Proceedings of the 3rd Workshop on Example-Based Machine Translation (EBMT 2009), pages 53–60, Dublin, Ireland.

• Wu, D. (2005). MT Model Space: Statistical versus Compositional versus Example-Based Machine Translation. Machine Translation, 19(3–4):213–227.

Acknowledgements


Input: He buys a book on international politics.

Matches + Alignment:
• He buys a notebook. ⇔ Kare wa nōto o kau.
• I read a book on international politics. ⇔ Watashi wa kokusai seiji nitsuite kakareta hon o yomu.

Recombination Result: Kare wa [kokusai seiji nitsuite kakareta hon] o kau.

Example (Sato & Nagao 1990)

From: Sandipan Dandapat, PhD thesis, 2012

• “... translation is a fine and exciting art, but there is much about it that is mechanical and routine.” (Martin Kay, 1997)

• SMT and EBMT systems are corpus-based approaches to MT
• SMT: phrase translation probabilities, word reordering probabilities, lexical weighting
• EBMT: usually lacks a well-defined probability model

EBMT

• EBMT generally uses a sentence-aligned parallel text (TM) as the primary source of data.
• EBMT systems search the source side of the example base for close matches to the input sentences and obtain the corresponding target segments at runtime.
• Target segments are reused during recombination.
• EBMT is often linked with the related concept of “Translation Memory” (TM).
• TM is an interactive tool for human translators.
• EBMT is fully automatic translation.

EBMT and TM

• EBMT: supposed to be good on limited amounts of data and on homogeneous data (lots of repetition)
• EBMT systems sometimes produce a good translation where SMT systems fail, and vice versa

EBMT

• The phrase-based SMT approach has proven to be the most successful MT approach in MT competitions, e.g. NIST, WMT, IWSLT.
• SMT systems discard the actual training data once the translation model and language model have been estimated.
• => SMT cannot always guarantee good-quality translations, even for sentences which closely match those in the training corpora.

EBMT and SMT

• Two main approaches to EBMT:
  • runtime, using proportional analogy
  • compile time, using a generalised translation-template-based EBMT model

EBMT

• Rule-based or data-driven MT
• Data-driven MT: EBMT and SMT

• Corpus-based, data-driven approaches derive knowledge from parallel corpora to translate new input

• Mostly SMT today

• A few EBMT (hybrid) systems, including CMU-EBMT (Brown, 2011) and Cunei (Phillips, 2011)

• No commercial EBMT?

Background

• Nagao (1984): “MT by analogy principle”

“Man does not translate a simple sentence by doing deep linguistic analysis, rather, man does translation, first, by properly decomposing an input sentence into certain fragmental phrases, ... then by translating these phrases into other language phrases, and finally by properly composing these fragmental translations into one long sentence. The translation of each fragmental phrase will be done by the analogy translation principle with proper examples as its reference.” (Nagao, 1984, p. 178)

Where did it start?

• Translation in three steps: matching, alignment and recombination (Somers, 2003):
  • Matching: finds the example or set of examples from the bitext which most closely match the source-language string to be translated.
  • Alignment: extracts the source–target translation equivalents from the retrieved examples of the matching step.
  • Recombination: produces the final translation by combining the target translations of the relevant subsentential fragments.

EBMT core steps

EBMT core steps (figure from Sandipan Dandapat, PhD thesis, 2012)

• Input: Where can I find tourist information?

• Matches:
  • Where can I find ladies dresses ⇔ bayan kıyafetlerini nereden bulabilirim
  • just in front of the tourist information ⇔ turist bilgilerini hemen önünde

• Alignment:
  • Where can I find ⇔ nereden bulabilirim
  • tourist information ⇔ turist bilgilerini

• Recombination:
  • Where can I find tourist information ⇔ turist bilgilerini nereden bulabilirim

• Update the example base …

Informal Example

• EBMT systems differ widely in their matching stages
• Matching involves a distance or similarity measure of some kind (e.g. edit distance)

Character-Based Matching:
• dynamic programming technique, e.g. Levenshtein distance

a. The President agrees with the decision.
b. The President disagrees with the decision.
c. The President concurs with the decision.

• Problem: given (a), the system will choose (b), the closest match in edit distance, even though it says the opposite (see the sketch below)

Varieties of EBMT
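A minimal sketch of such a character-based matcher; the levenshtein() helper and the toy example base are illustrative, not from any particular EBMT system. It reproduces the problem above: (b) beats (c) on raw edit distance.

```python
# Character-based matching with Levenshtein (edit) distance, computed by
# dynamic programming. Toy example base and names are illustrative only.

def levenshtein(s, t):
    """Classic DP edit distance between strings s and t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

examples = [
    "The President disagrees with the decision.",  # (b)
    "The President concurs with the decision.",    # (c)
]
query = "The President agrees with the decision."  # (a)
print(min(examples, key=lambda e: levenshtein(query, e)))
# -> (b): only 3 character edits away, despite the opposite meaning
```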

Word-Based Matching:
• Nagao (1984)
• uses dictionaries and thesauri to determine the relative word distance in terms of meaning (Sumita et al., 1990)

a. The President agrees with the decision.
b. The President disagrees with the decision.
c. The President concurs with the decision.

• Now the system will choose (c) given (a) (see the sketch below)
• But linguistics/knowledge heavy: WordNet, thesaurus, ontology, …

Varieties of EBMT
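A sketch of the word-based variant: word distance is discounted for thesaurus-listed near-synonyms, so (c) now wins. The tiny synonym table, the 0.2 weight and the position-wise alignment are illustrative assumptions standing in for a real thesaurus (e.g. WordNet) and a proper aligner.

```python
# Word-based matching in the spirit of Sumita et al. (1990): near-synonyms
# from a (toy) thesaurus count as a small distance instead of a full mismatch.

SYNONYMS = {frozenset({"agrees", "concurs"})}  # stand-in for a thesaurus

def word_distance(w1, w2):
    if w1 == w2:
        return 0.0
    if frozenset({w1, w2}) in SYNONYMS:
        return 0.2   # near-synonyms: small semantic distance
    return 1.0       # unrelated words

def sentence_distance(s, t):
    """Sum of word distances over aligned positions (toy: equal length only)."""
    ws, wt = s.split(), t.split()
    if len(ws) != len(wt):
        return float("inf")
    return sum(word_distance(a, b) for a, b in zip(ws, wt))

query = "the president agrees with the decision"
b = "the president disagrees with the decision"
c = "the president concurs with the decision"
print(min([b, c], key=lambda e: sentence_distance(query, e)))
# -> (c): "concurs" is a near-synonym of "agrees", so (c) is now closest
```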

Pattern-Based Matching:
• similar examples can be used to abstract “generalised” translation templates (a small sketch follows the example below)
• Brown (1999):
  • NE equivalence classes, such as person, date and city
  • some linguistic information, such as gender and number

a. John Miller flew to Frankfurt on December 3rd.
b. ⟨FIRSTNAME-M⟩ ⟨LASTNAME⟩ flew to ⟨CITY⟩ on ⟨MONTH⟩ ⟨ORDINAL⟩.
c. ⟨PERSON-M⟩ flew to ⟨CITY⟩ on ⟨DATE⟩.

Varieties of EBMT
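A small sketch of this kind of generalisation, with a toy gazetteer standing in for a real named-entity recogniser; all names here are illustrative assumptions.

```python
# Pattern-based generalisation à la Brown (1999): named entities are replaced
# by equivalence-class tags, turning one example into a reusable template.

GAZETTEER = {  # toy NE lexicon; a real system would use an NE recogniser
    "John": "FIRSTNAME-M", "Miller": "LASTNAME",
    "Frankfurt": "CITY", "December": "MONTH", "3rd": "ORDINAL",
}

def generalise(sentence):
    return " ".join(f"<{GAZETTEER[tok]}>" if tok in GAZETTEER else tok
                    for tok in sentence.split())

print(generalise("John Miller flew to Frankfurt on December 3rd"))
# -> <FIRSTNAME-M> <LASTNAME> flew to <CITY> on <MONTH> <ORDINAL>
```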

Syntax-Based Matching:
• Kaji et al. (1992):
  • source- and target-side parsers
  • alignment using bilingual dictionaries
  • generate translation templates

Varieties of EBMT

• X1[NP] no nagasa wa saidai 512 baito de aru ⇔ The maximum length of X1[NP] is 512 bytes
• X1[NP] no nagasa wa saidai X2[N] baito de aru ⇔ The maximum length of X1[NP] is X2[N] bytes

Looks almost like hierarchical phrase-based SMT (HPB-SMT) …

Varieties of EBMT

Marker-Based Matching:
• Green (1979):
  • closed-class marker words/morphs
  • used to chunk the input (a segmentation sketch follows the examples below)
• Veale and Way (1997), Gough and Way (2004)

Varieties of EBMT

that is almost a personal record for me this autumn ⇔ c’est pratiquement un record personnel pour moi cet automne

[<DET> that is almost] [<DET> a personal record] [<PREP> for <PRON> me <DET> this autumn] ⇔ [<DET> c’est pratiquement] [<DET> un record personnel] [<PREP> pour <PRON> moi <DET> cet automne]

Varieties of EBMT

[<DET> that is almost] [<DET> a personal record] [<PREP> for <PRON> me <DET> this autumn] ⇔ [<DET> c’est pratiquement] [<DET> un record personnel] [<PREP> pour <PRON> moi <DET> cet automne]

a. <DET> that is almost ⇔ <DET> c’est pratiquement
b. <DET> a personal record ⇔ <DET> un record personnel
c. <PREP> for me this autumn ⇔ <PREP> pour moi cet automne

a. <DET> is almost ⇔ <DET> est pratiquement
b. <DET> personal record ⇔ <DET> record personnel
c. <PREP> me this autumn ⇔ <PREP> moi cet automne

Varieties of EBMT

a. <PREP> for ⇔ <PREP> pour
b. autumn ⇔ automne

Varieties of EBMT
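A sketch of the segmentation step behind the examples above: a new chunk opens at each closed-class marker word, with the usual proviso that a chunk must contain at least one non-marker word (so “for me this autumn” stays together). The marker lexicon is a tiny illustrative subset.

```python
# Marker-based segmentation (after Green 1979; Gough and Way 2004): chunk
# boundaries are placed before closed-class marker words.

MARKERS = {  # tiny illustrative subset of the closed-class marker lexicon
    "the": "DET", "a": "DET", "this": "DET", "that": "DET",
    "for": "PREP", "on": "PREP", "in": "PREP",
    "me": "PRON", "you": "PRON",
}

def marker_chunks(sentence):
    chunks, current, has_content = [], [], False
    for tok in sentence.lower().split():
        # start a new chunk at a marker, but only if the current chunk
        # already contains at least one non-marker (content) word
        if tok in MARKERS and has_content:
            chunks.append(" ".join(current))
            current, has_content = [], False
        current.append(tok)
        if tok not in MARKERS:
            has_content = True
    if current:
        chunks.append(" ".join(current))
    return chunks

print(marker_chunks("that is almost a personal record for me this autumn"))
# -> ['that is almost', 'a personal record', 'for me this autumn']
```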


The monkey ate a peach. ⇔ saru wa momo o tabeta.
The man ate a peach. ⇔ hito wa momo o tabeta.

monkey ⇔ saru
man ⇔ hito

The … ate a peach. ⇔ … wa momo o tabeta.

The dog ate a rabbit. ⇔ inu wa usagi o tabeta.

dog ⇔ inu
rabbit ⇔ usagi

The … ate a … . ⇔ … wa … o tabeta.

Varieties of EBMT

• first introduced as an analogy-based approach to MT
• also called “case-based”, “memory-based” and “experience-guided” MT

• Many, many varieties …

• Two main approaches:
  • with or without a preprocessing/training stage
  • pure/runtime EBMT vs. compiled EBMT

Approaches to EBMT

Pure/runtime EBMT:
• e.g. Lepage and Denoual (2005b)
• no time consumed for training/preprocessing
• but: runtime/translation complexity very considerable …

Compiled approaches:
• e.g. Al-Adhaileh and Tang (1999); Cicekli and Güvenir (2001)
• pre-compute units below sentence level before prediction/translation time

Approaches to EBMT

• Everything happens at the translation stage
• Lepage and Denoual (2005c)

• Based on proportional analogy (PA), a type of analogical learning

• A : B :: C : D
• “A is to B as C is to D”

Pure/runtime EBMT

• A : B :: C : D
• “A is to B as C is to D”

• A global relationship between 4 objects
• “::” ~ “=”
• “Analogical equation”: A : B :: C : D?
• Can have one or more solutions
• Plato, Aristotle, …
• Artificial Intelligence

Analogical Reasoning

a. lungs are to humans as gills are to fish
b. cat : kitten :: dog : puppy
c. speak : spoken :: break : broken

Analogies

a. lungs are to humans as gills are to X? X = fish
b. cat : kitten :: dog : X? X = puppy
c. speak : spoken :: break : X? X = broken

Analogies

a. lungs are to humans as gills are to fish
b. cat : kitten :: dog : puppy
c. speak : spoken :: break : broken

Note: only (c) is a formal analogy, computable using string operations. That is the kind we will be concerned with most (but not exclusively …).

Analogies

• Lepage (1998):
  • an algorithm that solves analogical equations over strings of characters (a simplified sketch follows the examples below)
  • based on longest common subsequences and edit distance
  • can handle:
    • insertion/deletion of prefixes and suffixes (a)
    • exchange of prefixes/suffixes (b)
    • infixing (c)
    • parallel infixing (d)

Solving Analogical Equations

a. (French) répression : répressionnaire :: réaction : x ⇒ x = réactionnaire
b. wolf : wolves :: leaf : x ⇒ x = leaves
c. (German) fliehen : floh :: schließen : x ⇒ x = schloß
d. (Proto-Semitic) yasriqu : sariq :: yanqimu : x ⇒ x = naqim

Solving Analogical Equations
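A deliberately simplified sketch of solving A : B :: C : x over strings. It covers only the case where A and B share a prefix and differ in their suffixes, i.e. cases (a) and (b) above; Lepage’s full LCS-based algorithm, which also handles infixing (c) and parallel infixing (d), is considerably more involved.

```python
# Solve the analogical equation A : B :: C : x for the prefix/suffix case:
# whatever suffix change turns A into B is applied to C. Returns None when
# this simple strategy does not apply (e.g. infixing).

def solve_analogy(a, b, c):
    i = 0                                  # length of the shared prefix of a, b
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    suff_a, suff_b = a[i:], b[i:]          # a = prefix+suff_a, b = prefix+suff_b
    if suff_a and not c.endswith(suff_a):
        return None                        # c does not carry the same suffix
    return c[:len(c) - len(suff_a)] + suff_b

print(solve_analogy("répression", "répressionnaire", "réaction"))  # réactionnaire
print(solve_analogy("wolf", "wolves", "leaf"))                     # leaves
print(solve_analogy("fliehen", "floh", "schließen"))               # None: infixing
```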


Analogy-Based EBMT:
• “The ‘purest’ EBMT system ever built: no variables, no templates, no training, examples, just examples, only examples”
• (Lepage and Denoual 2005c)

Pure EBMT


Nadaron en el mar. : Atravesaron el río nadando. :: Flotó en el mar. : ????
⇒ Atravesó el río flotando.

1. Find a pair ⟨A, B⟩ of sentences in the example set that satisfies the PA equation A : B :: C(?) : “It floated across the river”. Solving this results in C = “It floated in the sea”.
2. Take the translations corresponding to A, B and C: A′, B′ and C′.
3. Solve the equation A′ : B′ :: C′ : D′(?). D′ is the desired translation (see the sketch below).

Pure EBMT
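Putting the steps together: a sketch of the runtime loop, reusing the simplified solve_analogy() above. That solver cannot do the parallel infixing the Spanish example needs, so the toy example base below uses a plain suffix-exchange case, with illustrative French translations invented for the sketch.

```python
# Pure/runtime EBMT via proportional analogy: find <A, B> in the example
# base such that A : B :: C : D holds with C also in the base (step 1),
# then solve the target-side equation A' : B' :: C' : x (steps 2-3).

EXAMPLES = {  # source -> target; toy data invented for this sketch
    "He buys a book .": "Il achète un livre .",
    "He buys a notebook .": "Il achète un cahier .",
    "She reads a book .": "Elle lit un livre .",
}

def translate(d):
    for a in EXAMPLES:
        for b in EXAMPLES:
            # step 1: A : B :: C : D is equivalent to B : A :: D : C,
            # so solve for C directly and check it is in the example base
            c = solve_analogy(b, a, d)
            if c in EXAMPLES:
                # steps 2-3: solve A' : B' :: C' : x on the target side
                return solve_analogy(EXAMPLES[a], EXAMPLES[b], EXAMPLES[c])
    return None

print(translate("She reads a notebook ."))
# -> Elle lit un cahier .
```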

This is complex:

• O(n²) possibilities for ⟨A, B⟩

• quadratic time for longest common subsequences and edit distances

• time-bounded solutions
• heuristics … (!)

Pure EBMT


• Analogical equation: given the three entities (A, B and C) of a PA, Lepage (1998) gives an algorithm to solve the equation and construct the fourth entity (D)
• Simple, with some good results
• The ALEPH EBMT system did very well on data from the IWSLT 2004 competition, coming a close second to the competition winner on all measures (Lepage and Denoual, 2005b, p. 273)

• GREYC system

Pure EBMT

• But:
  • low recall
  • processing time when the example base gets large …

• Yea : Yep :: At five a.m. : At five p.m.
  • the analogy holds formally (a → p), but the “solution” changes the meaning from morning to evening

Pure EBMT: Challenges


• learns translation templates from parallel sentences (a sketch follows the example below)
• (Cicekli and Güvenir, 2001)

Compiled/off-line EBMT

a. I will drink orange juice : portakal suyu içeceğim
b. I will drink coffee : kahve içeceğim

=>

a. I will drink : içeceğim
b. coffee : kahve
c. orange juice : portakal suyu

=>

a. I will drink XS : XT içeceğim
b. XS coffee : kahve XT
c. XS orange juice : portakal suyu XT

Compiled/off-line EBMT
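A sketch of the learning step behind the example above: compare two translation pairs token by token, keep the shared material, and turn the single differing span on each side into a variable. The helper names are mine, and only the one-variable case shown on the slide is handled.

```python
# Compile-time template learning in the spirit of Cicekli and Güvenir (2001).

def split_on_difference(x, y):
    """Return (common_prefix, diff_x, diff_y, common_suffix) over tokens."""
    xs, ys = x.split(), y.split()
    i = 0
    while i < min(len(xs), len(ys)) and xs[i] == ys[i]:
        i += 1                                     # shared prefix length
    j = 0
    while (j < min(len(xs), len(ys)) - i
           and xs[len(xs) - 1 - j] == ys[len(ys) - 1 - j]):
        j += 1                                     # shared suffix length
    return xs[:i], xs[i:len(xs) - j], ys[i:len(ys) - j], xs[len(xs) - j:]

def learn_template(pair1, pair2):
    (s1, t1), (s2, t2) = pair1, pair2
    sp, sd1, sd2, ss = split_on_difference(s1, s2)  # source side
    tp, td1, td2, ts = split_on_difference(t1, t2)  # target side
    template = (" ".join(sp + ["XS"] + ss), " ".join(tp + ["XT"] + ts))
    lexicon = [(" ".join(sd1), " ".join(td1)), (" ".join(sd2), " ".join(td2))]
    return template, lexicon

template, lexicon = learn_template(
    ("I will drink orange juice", "portakal suyu içeceğim"),
    ("I will drink coffee", "kahve içeceğim"))
print(template)  # ('I will drink XS', 'XT içeceğim')
print(lexicon)   # [('orange juice', 'portakal suyu'), ('coffee', 'kahve')]
```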

• Translation templates essentially reduce the data-sparsity problem by generalising some of the word sequences

• Another example: the work on the marker-based approach we saw earlier on

Compiled/off-line EBMT

• Boundary friction

Input: The handsome boy entered the room.
Matches:
• The handsome boy ate his breakfast. ⇔ Der schöne Junge aß sein Frühstück.
• I saw the handsome boy. ⇔ Ich sah den schönen Jungen.
• A woman entered the room. ⇔ Eine Frau betrat den Raum.
Output: den schönen Jungen betrat den Raum

The reused fragment was memorised in an accusative context (“den schönen Jungen”), but as the subject of the new sentence it needs the nominative “der schöne Junge”.

• Solutions?
  • labelled fragments (remember where you got the fragment from and use its context)
  • target-language grammar
  • target language model, as in SMT (see the sketch below)

Compiled/off-line EBMT
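A sketch of the language-model solution just mentioned: a toy bigram model trained on the target sides of the example base prefers the correctly inflected recombination, because “Der schöne” has been seen sentence-initially while “den schönen” has not. Raw bigram hit counts with no smoothing are a deliberate simplification; the candidate list is illustrative.

```python
# Target-side bigram LM rescoring to fight boundary friction.

from collections import Counter

TARGETS = [  # target sides of the example base above
    "Der schöne Junge aß sein Frühstück .",
    "Ich sah den schönen Jungen .",
    "Eine Frau betrat den Raum .",
]

bigrams = Counter()
for sent in TARGETS:
    toks = ["<s>"] + sent.split() + ["</s>"]
    bigrams.update(zip(toks, toks[1:]))

def score(sentence):
    """Higher is better: how many of the sentence's bigrams were seen in training."""
    toks = ["<s>"] + sentence.split() + ["</s>"]
    return sum(1 for bg in zip(toks, toks[1:]) if bigrams[bg] > 0)

candidates = [
    "den schönen Jungen betrat den Raum .",  # boundary friction: accusative subject
    "Der schöne Junge betrat den Raum .",    # correct nominative subject
]
print(max(candidates, key=score))
# -> Der schöne Junge betrat den Raum .  (7 known bigrams vs. 6)
```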

• Dekai Wu: “MT model space: statistical versus compositional versus example-based machine translation”
• a perspective on EBMT from a statistical MT standpoint

• What is the definition of EBMT? Do we even know what EBMT is? Is there a strict definition of EBMT, or are there simply a large number of different models all using corpora, rules, and statistics to varying degrees? Is X a kind of EBMT model? Does X’s model qualify as EBMT but not SMT? Are all SMT models (perhaps excluding the IBM models) actually EBMT models as well? Are EBMT models actually SMT models? Can a rule-based model be example-based? Statistical?

• a three-dimensional “MT model space”

EBMT, SMT, RBMT

EBMT, SMT, RBMT (figure: Dekai Wu’s three-dimensional MT model space)

• Nagao (1984) first proposed “translation by analogy” (cf. Lepage and Denoual 2005)

• analogical models arose in the mid-1980s under various similar names, including “case-based reasoning” (CBR) as in Kolodner (1983a,b), “exemplar-based reasoning” as in Porter and Bareiss (1986) or Kibler and Aha (1997), “instance-based reasoning” as in Aha et al. (1991), “memory-based reasoning” as in Stanfill and Waltz (1988), or “analogy-based reasoning” as in Hall (1989) or Veloso and Carbonell (1993)

• Collins and Somers (2003)
• CBR: specific implications about how and when learning and adaptation take place

EBMT, SMT, RBMT

• nontrivial use of a large library of examples/cases/exemplars/instances at runtime

• that is, during the task performance/testing phase rather than the learning/training phase

• new problems are solved at runtime via analogy to similar examples retrieved from the library, which are broken down, adapted, and recombined as needed to form a solution

• this stands in contrast to most other machine learning approaches, which focus on heavy offline learning/training phases so as to compile or generalize large example sets into abstracted performance models consisting of various forms of abstracted schemata (normally much smaller than the entire set of training examples)

EBMT, SMT, RBMT

• Leaning toward memorization rather than abstraction of the training set involves significant tradeoffs. On one hand, given sufficiently large example libraries, memorization avoids the loss of coverage often caused by incorrect generalization or overgeneralization. In the extreme case, memorization approaches are guaranteed to reproduce exactly all unique sentence translations from the training corpus, something abstracted schematic approaches may not necessarily do. On the other hand, memorization approaches tend to undergeneralize, and runtime space and time complexity are vastly increased.

• EBMT: SMT with fairly ad hoc numerical measures …

EBMT, SMT, RBMT

• EBMT and SMT

• Modern EBMT systems incorporate both; for example, Aramaki et al. (2005), Langlais and Gotti (2006), Liu et al. (2006), and Quirk and Menezes (2006) aim for probabilistic formulations of EBMT in terms of statistical inference

• Lots of references in Dekai Wu’s paper

EBMT, SMT, RBMT

EBMT, SMT, RBMT (figures: Dekai Wu)

• Using EBMT chunks (marker hypothesis) in SMT
• Using SMT chunks in EBMT

• Making EBMT more efficient: using IR technology for retrieval (see the sketch below)
• Combining EBMT with TM

A few other approaches
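A sketch of the IR idea in the list above: an inverted index narrows the example base to a few candidates by token overlap, so the expensive fuzzy matching (e.g. edit distance) runs only on those. The index layout and overlap scoring are illustrative assumptions, not the actual setup of Dandapat et al. (2012).

```python
# IR-style retrieval for scalable EBMT matching: inverted index + token
# overlap as a cheap filter before exact (e.g. edit-distance) matching.

from collections import Counter, defaultdict

def build_index(sources):
    index = defaultdict(set)
    for i, src in enumerate(sources):
        for tok in src.lower().split():
            index[tok].add(i)        # token -> ids of examples containing it
    return index

def retrieve(index, query, k=5):
    """Rank examples by token overlap with the query; return the top-k ids."""
    hits = Counter()
    for tok in set(query.lower().split()):
        for i in index.get(tok, ()):
            hits[i] += 1
    return [i for i, _ in hits.most_common(k)]

sources = ["where can i find ladies dresses",
           "just in front of the tourist information",
           "he buys a notebook"]
index = build_index(sources)
print(retrieve(index, "where can i find tourist information"))
# -> [0, 1]: only these candidates go on to full fuzzy matching
```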

• A taste of EBMT
• Hard to define what exactly EBMT is and how it relates to SMT, RBMT and TM
• Some core EBMT approaches:
  • runtime, pure
  • compile time/preprocessing
• Many EBMT approaches seem to be essentially hybrid “in nature/in practice”

• EBMT: quo vadis (where are you going)?

Conclusion

