Towards Interactive and Automatic Refinement of Translation Rules PhD Thesis Proposal Ariadna Font Llitjós 5 November 2004


Towards Interactive and Automatic Refinement of Translation Rules

PhD Thesis Proposal

Ariadna Font Llitjós

5 November 2004


Interactive and Automatic Rule Refinement 2

Outline

• Introduction
• Related Work
• Technical Approach
– Interactive elicitation of error information
– A framework for automatic rule adaptation
• Preliminary Research
• Proposed Research
• Contributions and Thesis Timeline


How to recycle corrections of MT output back into the system by adjusting and adapting the grammar and lexical rules


The Problem

General
- MT output still requires post-editing.
- Current systems do not recycle post-editing efforts back into the system, beyond adding them as new training data.

Avenue-specific
- Resource-poor scenarios: lack of a manual grammar, or only a very small initial grammar.
- Need to validate the elicitation corpus and the automatically learned translation rules.


Motivation

General
- Refining and extending translation rule sets manually is very costly and time-consuming, since it requires trained computational linguists with knowledge of both languages.

Resource-poor scenarios
- Indigenous communities have difficulty accessing crucial information that directly affects their lives (such as land laws, plagues, health warnings, etc.).
- Preservation of language and culture.


MT Output

SL: Mary and Anna are falling
TL: María y Ana están cayendo
TL’: María y Ana se están cayendo

SL: Gaudi was a great artist
TL: Gaudi estaba un artista grande
TL: Gaudi era un artista grande
TL’: Gaudi era un gran artista

SL: You saw the woman
TL: Viste la mujer
TL’: Viste a la mujer
TL: Vió la mujer

SL: I used my elbow to push the button
TL: Usé mi codo que apretar el botón
TL’: Usé mi codo para apretar el botón

SL: We are building new bridges in the city
TL: Nosotros estamos construyendo nuevo puentes dentro la ciudad
TL’: Nosotros estamos construyendo nuevo puentes dentro de la ciudad


Resource-poor scenarios
• No e-data available (often an oral tradition), so SMT and EBMT are not an option
• No computational linguists to write a grammar

So how can we even start to think about MT?
– That’s what AVENUE is all about: Elicitation Corpus + Automatic Rule Learning

What do we usually have available in resource-poor scenarios? Bilingual users.


Avenue overview

[Figure: AVENUE overview diagram. Stages: Elicitation, Rule Learning, Run-Time System, Rule Refinement. Components: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Morphology (Morphological analyzer), Learning Module, Transfer Rules, Handcrafted rules, Lexical Resources, Run-Time Transfer System, Lattice, Translation Correction Tool, Rule Refinement Module.]


Avenue overview: my thesis

[Figure: the same AVENUE overview diagram, marking the components addressed by this thesis.]


Thesis Statement

- Given a rule-based transfer MT system, we can extract useful information from non-expert bilingual speakers about the corrections required to make MT output acceptable.
- We can automatically refine translation rules, given corrected and aligned translation pairs and some error information, so as to improve coverage and overall MT quality.

Outline: Related Work


Related Work

• Post-editing to improve MT systems
– minimal post-editing [Allen, 2003]
– include user feedback in the MT loop: [Callison-Burch, 2004], [Allen & Hogan, 2000], [Su et al. 1995], [Menezes & Richardson, 2001] and [Imamura et al. 2003]
• MT error information and classification
– [Flanagan, 1994], [White et al., 1994], [Allen 2003], [Niessen et al. 2000]


Related Work++

• Rule Adaptation
– POS tagging: [Lin et al., 1994]
– parsing: [Lehman, 1989], [Brill, 2003]
– NLU: [Gavaldà, 2000]
– MT:
[Corston-Oliver & Gammon, 2003]: DTs to correct binary features of the LF in order to reduce noise.
[Yamada, 1995]: structural comparison between machine translations and manual translations to adapt an MT system to a new domain.
[Naruedomkul, 2001]: modify an HPSG-like semantic representation of the TL until it is acceptably similar to the SL.

Outline: Technical Approach (interactive elicitation of error information)


Interactive elicitation of MT errors

Assumptions:
• non-expert bilingual users can reliably detect and minimally correct MT errors, given:
– SL sentence (I saw you)
– TL sentence (Yo vi tú)
– word-to-word alignments (I-yo, saw-vi, you-tú)
– (context)
• using an online GUI: the Translation Correction Tool (TCTool)

Goal:
• simplify the MT correction task as much as possible


MT error typology for RR (simplified)

• missing word
• extra word
• word order (local vs long-distance, word vs phrase, word change)
• incorrect word (sense, form, selectional restrictions, idiom, ...)
• agreement (missing constraint, extra agreement constraint)

Outline: Technical Approach (a framework for automatic rule adaptation)


Automatic Rule Refinement Framework

• Find the best RR operations given:
– a grammar (G),
– a lexicon (L),
– a (set of) source language sentence(s) (SL),
– a (set of) target language sentence(s) (TL),
– its parse tree (P), and
– a minimal correction of TL (TL’)

such that TQ2 > TQ1, where TQ1 and TQ2 are the translation quality before and after applying the RR operations.
• This can also be expressed as:

max TQ(TL | TL’, P, SL, RR(G, L))
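The objective above can be read as a search over candidate RR operations: keep only the operations whose refined output scores higher than the original (TQ2 > TQ1), and take the best of those. The Python sketch below is a minimal illustration under stated assumptions and is not the thesis implementation. The grammar and lexicon are bundled into a single hypothetical `model` value, and `translate` (running the transfer system) and `quality` (a TQ estimate against TL’) are caller-supplied placeholders. The toy model at the end only records adjective position and form.

```python
from typing import Any, Callable, Iterable, Optional, Tuple

# Hypothetical types: a model bundles the grammar G and lexicon L;
# an RR operation maps a model to a refined model.
Model = Any
RROp = Callable[[Model], Model]


def best_refinement(
    model: Model,
    ops: Iterable[RROp],
    translate: Callable[[Model], str],     # placeholder: runs the transfer system on SL
    quality: Callable[[str, str], float],  # placeholder: TQ(hypothesis | TL')
    corrected: str,                        # TL', the user's minimal correction
) -> Tuple[Optional[RROp], float]:
    """Return the operation with the highest TQ2 > TQ1, or (None, TQ1) if none improves."""
    tq1 = quality(translate(model), corrected)
    best_op, best_tq = None, tq1
    for op in ops:
        tq2 = quality(translate(op(model)), corrected)
        if tq2 > best_tq:  # strictly better than TQ1 and than any earlier candidate
            best_op, best_tq = op, tq2
    return best_op, best_tq


# Toy usage: a stand-in "transfer system" with one adjective slot.
def toy_translate(m: dict) -> str:
    adj = m["adj"]
    return f"un {adj} artista" if m["pos"] == "pre" else f"un artista {adj}"


def toy_quality(hyp: str, ref: str) -> float:
    ref_tokens = ref.split()
    return sum(h == r for h, r in zip(hyp.split(), ref_tokens)) / len(ref_tokens)


def move_adj(m: dict) -> dict:
    return {**m, "pos": "pre"}


def move_and_shorten(m: dict) -> dict:
    return {**m, "pos": "pre", "adj": "gran"}


op, tq = best_refinement({"pos": "post", "adj": "grande"},
                         [move_adj, move_and_shorten],
                         toy_translate, toy_quality, "un gran artista")
print(op.__name__, tq)  # move_and_shorten 1.0
```

An exhaustive loop is only feasible for a small set of candidate operations. The framework's contribution lies in narrowing that set using the correction and error information.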


Types of RR operations

• Grammar:
– R0 → R0 + R1 [= R0’ + constr]   Cov[R0] → Cov[R0, R1]   (bifurcate)
– R0 → R1 [= R0 + constr]   Cov[R0] → Cov[R1]   (refine)
– R0 → R1 [= R0 + constr = -] + R2 [= R0’ + constr =c +]   Cov[R0] → Cov[R1, R2]

• Lexicon:
– Lex0 → Lex0 + Lex1 [= Lex0 + constr]   (bifurcate)
– Lex0 → Lex1 [= Lex0 + constr]   (refine)
– Lex0 → Lex0 + Lex1 [Lex0 + TLword]
– Lex1 (adding a new lexical item)
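The two grammar operations can be sketched in Python as follows. This is a minimal illustration that assumes rules are immutable records with a hypothetical `constraints` tuple of constraint strings. `refine` replaces R0 by R1 (Cov[R0] → Cov[R1]). `bifurcate` keeps R0 and adds a modified copy (Cov[R0] → Cov[R0, R1]). The usage combines both to reproduce the third grammar operation. Real transfer rules also need their x/y index references renamed when the TL order changes, and the sketch does not model that.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    sl: Tuple[str, ...]                # source-side constituent sequence
    tl: Tuple[str, ...]                # target-side constituent sequence
    constraints: Tuple[str, ...] = ()  # constraint strings, kept verbatim


RuleSet = Dict[str, Rule]


def refine(rules: RuleSet, name: str, constr: str) -> RuleSet:
    """R0 -> R1 [= R0 + constr]: the original rule is replaced."""
    r0 = rules[name]
    out = dict(rules)  # leave the input rule set untouched
    out[name] = replace(r0, constraints=r0.constraints + (constr,))
    return out


def bifurcate(rules: RuleSet, name: str, new_name: str, constr: str,
              tl: Optional[Tuple[str, ...]] = None) -> RuleSet:
    """R0 -> R0 + R1 [= R0' + constr]: R0 is kept and a modified copy is added."""
    r0 = rules[name]
    out = dict(rules)
    out[new_name] = replace(r0, name=new_name,
                            tl=r0.tl if tl is None else tl,
                            constraints=r0.constraints + (constr,))
    return out


# R0 -> R1 [= R0 + constr = -] + R2 [= R0' + constr =c +], as for NP,8 and NP,1008
np8 = Rule("NP,8", ("DET", "ADJ", "N"), ("DET", "N", "ADJ"))
g0 = {np8.name: np8}
g1 = bifurcate(g0, "NP,8", "NP,1008", "((y2 feat1) =c +)", tl=("DET", "ADJ", "N"))
g2 = refine(g1, "NP,8", "((y3 feat1) = -)")
for rule in g2.values():
    print(rule.name, rule.tl, rule.constraints)
```

Returning new rule sets rather than mutating them in place makes it easy to keep the pre-refinement grammar around. That is useful when a candidate refinement later has to be rejected.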


Formalizing Error Information

Wi = error
Wi’ = correction
Wc = clue word

Example:
SL: the red car; TL: *el auto roja; TL’: el auto rojo
Wi = roja, Wi’ = rojo, Wc = auto (Wi and Wc need to agree)


Finding Triggering Features

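Once we have the user’s correction (Wi’), we can compare it with Wi at the feature level and find which feature is the triggering one.

If the set of differing features is empty, we need to postulate a new binary feature.

Delta function: δ(Wi, Wi’) = the set of features whose values differ between Wi and Wi’.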
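The delta function and its fallback can be sketched in Python as follows. This is a minimal sketch that assumes each word is a flat dictionary from feature path to value, which simplifies the transfer engine's feature structures. The helper `triggering_features` and the `feat<N>` naming scheme are illustrative assumptions. They were chosen to match the `feat1` feature used in the "great artist" example.

```python
from typing import Dict, Set

Features = Dict[str, str]  # e.g. {"agr gen": "fem", "agr num": "sg"}


def delta(wi: Features, wi_corr: Features) -> Set[str]:
    """Features whose values differ between Wi and Wi' (a missing feature counts as a difference)."""
    return {f for f in wi.keys() | wi_corr.keys() if wi.get(f) != wi_corr.get(f)}


def triggering_features(wi: Features, wi_corr: Features, next_id: int = 1) -> Set[str]:
    """Return the differing features or, if there are none, a newly postulated binary feature."""
    diff = delta(wi, wi_corr)
    return diff if diff else {f"feat{next_id}"}


# "el auto roja" -> "el auto rojo": gender is the triggering feature.
roja = {"agr gen": "fem", "agr num": "sg"}
rojo = {"agr gen": "masc", "agr num": "sg"}
print(triggering_features(roja, rojo))    # {'agr gen'}

# "grande" -> "gran": identical features, so a new feature has to be postulated.
grande = {"agr gen": "masc", "agr num": "sg"}
gran = {"agr gen": "masc", "agr num": "sg"}
print(triggering_features(grande, gran))  # {'feat1'}
```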

Outline: Preliminary Research


TCTool v0.1

Actions:
• Add a word
• Delete a word
• Modify a word
• Change word order

Interactive elicitation of error information


TCTool v0.1 specs

• Shows the first five translations from the lattice produced by the transfer engine.
• Asks users to pick the correct translation or, if there is none, the best incorrect translation (i.e. the one requiring the fewest corrections).
• Provides help with translation correction and error classification (static tutorial + error example page).
• CGI scripts in Perl
• Correction interface in JavaScript (Kenneth Sim and Patrick Milholland)


1st Eng2Spa user study

[LREC 2004]
• Manual grammar: 12 rules + 442 lexical entries
• MT error classification (v0.0): 9 linguistically motivated classes: word order, sense, agreement error (number, person, gender, tense), form, incorrect word and no translation
• Test set: 32 sentences from the AVENUE Elicitation Corpus (4 correct / 28 incorrect)


Data Analysis

• Interested in high precision, even at the expense of lower recall
• Users did not always fix a translation in the same way
• Most of the time, when the final translation differed from the gold standard, it was still correct or even better (stylistically)


Rule Refinement Operations

• Organized according to the type of actions users can perform to correct a sentence with the TCTool
• And according to what error information is available (Wc, alignments, …)

Automatic Rule Adaptation


Rule Refinement Simulation I

Change word order
1. Run the SL sentence through the transfer engine: Gaudí was a great artist
2. Input the SL sentence and up to 5 alternative translations with alignments to the Translation Correction Tool.
3. Input the user correction log file, together with the transfer engine output, to the RR module → variable instantiation.
4. Determine the appropriate RR operations that need to apply.
5. Modify the grammar and lexicon by applying the RR operations.
6. Run the MT system again with the refined grammar and lexicon.


[Figure: SL + best TL picked by the user]

[Figure: changing “grande” into “gran”]


Input to RR module

- User correction log file
- Transfer engine output (+ parse tree):

sl: Gaudi was a great artist
tl: GAUDI ERA UN ARTISTA GRANDE
tree: <((S,1 (NP,2 (N,5:1 "GAUDI") )
    (VP,3 (VB,2 (AUX,17:2 "ERA") )
        (NP,8 (DET,0:3 "UN")
            (N,4:5 "ARTISTA")
            (ADJ,5:4 "GRANDE") ) ) ) )>
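In the tree strings above, every node reads (LABEL,ID ...) and every leaf also carries its TL word in quotes. The Python sketch below parses that format into nodes and finds the constituent that immediately dominates a given word, which is the information blame assignment needs. It assumes the format is exactly as printed here: one outer unlabeled pair of parentheses and no quote characters inside words. The number after the colon in leaf IDs appears to be a source-word position, but that is an interpretation, so the parser keeps the ID text verbatim. All names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

# A token is "(", ")", a double-quoted TL word, or a LABEL,ID run such as "ADJ,5:4".
TOKEN_RE = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


@dataclass
class Node:
    label: str                   # constituent or POS label, e.g. "NP", "ADJ"
    ident: str                   # text after the comma, kept verbatim, e.g. "8" or "5:4"
    word: Optional[str] = None   # TL surface word (leaves only)
    children: List["Node"] = field(default_factory=list)


def parse_tree(text: str) -> Node:
    tokens = TOKEN_RE.findall(text.strip().lstrip("<").rstrip(">"))
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        pos += 1                                  # consume "("
        label, ident = tokens[pos].split(",", 1)
        pos += 1
        node = Node(label, ident)
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse_node())
            else:                                 # quoted TL word of a leaf
                node.word = tokens[pos].strip('"')
                pos += 1
        pos += 1                                  # consume ")"
        return node

    pos = 1  # skip the outer, unlabeled "(" that wraps the sentence tree
    return parse_node()


def leaves(node: Node) -> Iterator[Node]:
    """Yield leaf nodes left to right, i.e. in TL word order."""
    if node.word is not None:
        yield node
    for child in node.children:
        yield from leaves(child)


def parent_of(node: Node, word: str) -> Optional[Node]:
    """Return the constituent immediately dominating the leaf with this TL word."""
    for child in node.children:
        if child.word == word:
            return node
        found = parent_of(child, word)
        if found is not None:
            return found
    return None


tree = parse_tree('<((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) '
                  '(NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>')
print([leaf.word for leaf in leaves(tree)])  # ['GAUDI', 'ERA', 'UN', 'ARTISTA', 'GRANDE']
owner = parent_of(tree, "GRANDE")
print(owner.label, owner.ident)              # NP 8
```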


Variable instantiation from log file

Correction actions:
1. Word order change (artista grande → grande artista): Wi = grande
2. Edited grande into gran: Wi’ = gran; the user identified artist as the clue word: Wc = artist

In this case, even if the user had not identified Wc, the refinement process would have been the same.


Retrieve relevant lexical entries

ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc))

N::N |: [artist] -> [artista]
((X1::Y1)
 ((x0 agr pers) = 3)
 ((x0 agr num) = sg)
 ((x0 form) = artist)
 ((x0 semtype) = human))


Add lexical entry for “gran”

Duplicate the lexical entry great-grande and change the TL side:

ADJ::ADJ |: [great] -> [gran]
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc))

Even if we had a morphological analyzer available, it would not distinguish between them:

grande   grande AQ0CS0   grande NCCS000
gran     gran AQ0CS0

Lex0 → Lex1[Lex0 + TLword]
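The lexical operation can be sketched in Python as follows. This is a minimal sketch that assumes lexical entries are immutable records whose constraints are (feature path, value) pairs. `LexEntry`, `copy_with_tl` and `with_feature` are hypothetical names. The last step anticipates the feat1 marking that the refinement process later adds to both entries.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Constraint = Tuple[str, str]  # (feature path, value), e.g. ("y0 agr gen", "masc")


@dataclass(frozen=True)
class LexEntry:
    pos: str
    sl: str
    tl: str
    constraints: Tuple[Constraint, ...]


def copy_with_tl(entry: LexEntry, new_tl: str) -> LexEntry:
    """Duplicate an entry, changing only its TL side (the caller keeps the original too)."""
    return replace(entry, tl=new_tl)


def with_feature(entry: LexEntry, path: str, value: str) -> LexEntry:
    """Add or overwrite one constraint, e.g. ("y0 feat1", "+")."""
    kept = tuple(c for c in entry.constraints if c[0] != path)
    return replace(entry, constraints=kept + ((path, value),))


grande = LexEntry("ADJ", "great", "grande",
                  (("x0 form", "great"), ("y0 agr num", "sg"), ("y0 agr gen", "masc")))
gran = copy_with_tl(grande, "gran")

# Later refinement step: give the two forms opposite values of the new feature.
lexicon = [with_feature(grande, "y0 feat1", "-"), with_feature(gran, "y0 feat1", "+")]
for e in lexicon:
    print(f"{e.pos}::{e.pos} |: [{e.sl}] -> [{e.tl}]", e.constraints)
```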


Finding triggering feature(s)

Feature function: δ(Wi, Wi’) = ∅ → need to postulate a new binary feature: feat1

Blame assignment:

tree: <((S,1 (NP,2 (N,5:1 "GAUDI") )

(VP,3 (VB,2 (AUX,17:2 "ERA") )

(NP,8 (DET,0:3 "UN")

(N,4:5 "ARTISTA")

(ADJ,5:4 "GRANDE") ) ) ) )>


Refining the rules

Wi = grande, POSi = ADJ = Y3, y3
Wc = artist, POSc = N = Y2, y2

{NP,8}
NP::NP : [DET ADJ N] -> [DET N ADJ]
(
 (X1::Y1) (X2::Y3) (X3::Y2)
 ((x0 def) = (x1 def))
 (x0 = x3)
 ((y1 agr) = (y2 agr)) ; det-noun agreement
 ((y3 agr) = (y2 agr)) ; adj-noun agreement
 (y2 = x3)
)


Refining the rules

{NP,1008}
NP::NP : [DET ADJ N] -> [DET ADJ N]
(
 (X1::Y1) (X2::Y2) (X3::Y3)
 ((x0 def) = (x1 def))
 (x0 = x3)
 ((y1 agr) = (y3 agr)) ; det-noun agreement
 ((y2 agr) = (y3 agr)) ; adj-noun agreement
 (y3 = x3)
 ((y2 feat1) =c +)
)


Refining the lexical entries

ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc)
 ((y0 feat1) = -))

ADJ::ADJ |: [great] -> [gran]
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc)
 ((y0 feat1) = +))


Done? Not yet

• Right now we’ve just increased ambiguity in the grammar: the translation candidate list has more than doubled, since both “grande” and “gran” can unify with {NP,8}, and “gran” now also unifies with {NP,1008}.

• We need to restrict the application of the general rule to post-nominal ADJs only:

R0 → R1[= R0 + constr = -] = NP,8 (general rule)
   + R2[= R0’ + constr =c +] = NP,1008 (specific rule)

Cov[R0] → Cov[R1, R2]


Add blocking constraint

{NP,8}
NP::NP : [DET ADJ N] -> [DET N ADJ]
(
 (X1::Y1) (X2::Y3) (X3::Y2)
 ((x0 def) = (x1 def))
 (x0 = x3)
 ((y1 agr) = (y2 agr)) ; det-noun agreement
 ((y3 agr) = (y2 agr)) ; adj-noun agreement
 (y2 = x3)
 ((y3 feat1) = - )
)
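A simplified model of the two kinds of constraint shows why this split removes the spurious candidates. The model assumes the LFG-style reading of the notation. `=` unifies: it succeeds when the feature is unset (assigning the value) or already has that value. `=c` only checks a value that must already be present. The Python sketch below applies the feat1 constraints of NP,8 and NP,1008 to flat feature dictionaries. It is an illustration, not the transfer engine's unifier, and its function names are hypothetical.

```python
from typing import Dict, List, Optional

FeatureStruct = Dict[str, str]


def unify_eq(fs: FeatureStruct, feat: str, value: str) -> Optional[FeatureStruct]:
    """'=' constraint: assign the value if feat is unset; fail (None) on a clash."""
    current = fs.get(feat)
    if current is None:
        return {**fs, feat: value}
    return fs if current == value else None


def check_c(fs: FeatureStruct, feat: str, value: str) -> bool:
    """'=c' constraint: succeed only if feat is already present with this value."""
    return fs.get(feat) == value


def np_rules_for(adj: FeatureStruct) -> List[str]:
    """Which of the two NP rules an adjective can go through."""
    rules = []
    if unify_eq(adj, "feat1", "-") is not None:  # {NP,8}: ((y3 feat1) = -), post-nominal
        rules.append("NP,8")
    if check_c(adj, "feat1", "+"):              # {NP,1008}: ((y2 feat1) =c +), pre-nominal
        rules.append("NP,1008")
    return rules


print(np_rules_for({"feat1": "-"}))  # grande -> ['NP,8']
print(np_rules_for({"feat1": "+"}))  # gran -> ['NP,1008']
print(np_rules_for({}))              # adjective not yet tagged -> ['NP,8']
```

Because NP,8 uses `=` rather than `=c`, adjectives that users have not yet tagged still go through the general rule. The blocking constraint therefore does not reduce coverage for unseen adjectives.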


Refined MT output

sl: Gaudi was a great artist

tl: GAUDI ERA UN ARTISTA GRANDE

tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>

tl: GAUDI ERA UNA ARTISTA GRANDE

tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,2:3 "UNA") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>

tl: GAUDI ERA UN GRAN ARTISTA

tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,1008 (DET,0:3 "UN") (ADJ,6:4 "GRAN") (N,4:5 "ARTISTA") ) ) ) )>

tl: GAUDI ERA UNA GRAN ARTISTA

tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,1008 (DET,2:3 "UNA") (ADJ,6:4 "GRAN") (N,4:5 "ARTISTA") ) ) ) )>

… [same for estaba]


Can we get rid of incorrect translations?

• Since “great” translates both as “grande” and “gran”, both rules will be applied, one with each type of adjective.

• If the user identified Wc, the RR module can also tag “artist” with ((feat1) = +) and add an agreement constraint between ADJ and N, ((y2 feat1) = (y3 feat1)), to {NP,8}.

• For each new adjective that appears in this context, users will pick one of the two NP rules, and the RR module will be able to tag the adjective in the lexicon as pre-nominal (feat1 = +) or post-nominal (feat1 = -).


Manual vs Learned Grammars [AMTA 2004]

• Manual inspection:

• Automatic MT Evaluation:

                   NIST   BLEU   METEOR
Manual grammar     4.3    0.16   0.6
Learned grammar    3.7    0.14   0.55


Evaluation of refined MT output

Assumption:
• user corrections = gold standard reference translations

Idea:
• compare raw MT output with the MT output produced by the refined grammar and lexicon, using automatic evaluation metrics such as BLEU and METEOR


Human Oracle experiment

• As a feasibility experiment, we compared raw MT output with manually corrected MT output: the difference is statistically significant (confidence interval test).

• This is an upper bound on how much difference we should expect any refinement approach to make.
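The slide does not say which confidence-interval test was used. One common option is a paired bootstrap over per-sentence scores, swapped in here and not necessarily the test used in the experiment. It gives a percentile interval for the mean score difference between raw and corrected output. If that interval excludes zero, the difference is significant at the chosen level. The sketch assumes sentence-level scores, such as sentence-level METEOR, are already available. `paired_bootstrap_ci` is a hypothetical helper, and the scores in the usage line are made up.

```python
import random
from statistics import fmean
from typing import Sequence, Tuple


def paired_bootstrap_ci(raw: Sequence[float], corrected: Sequence[float],
                        n_samples: int = 1000, alpha: float = 0.05,
                        seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean of (corrected - raw) per-sentence scores."""
    if len(raw) != len(corrected) or not raw:
        raise ValueError("need two equally long, non-empty score lists")
    rng = random.Random(seed)
    diffs = [c - r for r, c in zip(raw, corrected)]
    # Resample sentences with replacement; each raw/corrected pair stays together.
    means = sorted(fmean(rng.choices(diffs, k=len(diffs))) for _ in range(n_samples))
    lower = means[int(alpha / 2 * n_samples)]
    upper = means[int((1 - alpha / 2) * n_samples) - 1]
    return lower, upper


lo, hi = paired_bootstrap_ci([0.20, 0.35, 0.40, 0.10], [0.55, 0.60, 0.70, 0.45])
print(f"95% CI for mean improvement: [{lo:.3f}, {hi:.3f}]")
```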

Outline: Proposed Research


Data Set

• Development set: ~400 sentences that can be fully parsed by the original manual grammar (50-100 rules)
– from the Typological and Structural Elicitation Corpus
– categorized by error

• Split the dev set into two:
– dev set → run user studies + develop the RR module (+ validation set)
– test set → evaluate the effect of RR operations

• Wild test set (from naturally occurring text)
– requirement: sentences need to be fully parsed by the grammar.


TCTool v0.2

• New way of eliciting MT error information
– simple statements/questions about the error
– possibly ordered from most informative to least informative

• Time permitting:
– Dynamic tutorial
– User profiles will allow us to experiment with user-idiosyncratic grammars (consistent refinements) vs. a general grammar for all users


User studies

• Eng2Spa II: test the new version of the TCTool (v0.2) and compare error correction and classification accuracy results.

• Resource-poor language to Spanish
– likely candidates: Mapudungun, Quechua

• Batch vs interactive mode

• Amount of error information elicited


Interactive mode

• Extra error information is required
– more sentences need to be evaluated (and corrected by the user) → relevant minimal pairs

• Elicit extra MT error information either by evaluating (and correcting) relevant minimal pairs or by asking users to check which simple statements are true.

• Typically requires more effort from users and takes longer → smaller test set


Active Learning

• A method to minimize the number of examples a human annotator must label [Cohn et al. 94], usually by processing examples in order of usefulness. [Lewis and Catlett 94] used uncertainty as a measure of usefulness.

• [Callison-Burch 2003] proposed AL to reduce the cost of creating a corpus of labeled training examples for statistical MT.
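The following Python sketch illustrates uncertainty-based selection in the spirit of [Lewis and Catlett 94]. It ranks a pool of unlabeled items by the entropy of a model's probability distribution over candidate outputs and returns the k most uncertain items for a user to label. The slide does not specify how such probabilities would be obtained for rule refinement; one possibility would be a distribution over competing candidate refinements. The pool format, its probabilities and `select_most_uncertain` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k items whose output distributions have the highest entropy."""
    return sorted(pool, key=lambda item: entropy(pool[item]), reverse=True)[:k]


pool = {  # toy probabilities, not real system output
    "I see them": [0.5, 0.5],   # two candidate refinements equally plausible
    "I see cars": [0.9, 0.1],
    "the red car": [1.0],       # no uncertainty left
}
print(select_most_uncertain(pool, 2))  # ['I see them', 'I see cars']
```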


Scope

• Focus 1: types of errors that can be refined fully automatically using only correction information.

• Focus 2: types of errors that can be refined fully automatically using correction and error information.

• Focus 3: types of errors that require a reasonable amount of further user interaction and can be solved with the available correction and error information.


Outside scope

• Types of errors for which
– users cannot give the required information, or
– further user interaction might take too long and might not provide the required information


Rule Refinement Simulation II

Change word order + further user interaction
1. Run the SL sentence through the transfer engine: I see them
2. Input the SL sentence and up to 5 alternative translations with alignments to the Translation Correction Tool.
3. Input the user correction log file, together with the transfer engine output, to the RR module → variable instantiation.
4. Determine the appropriate RR operations that need to apply.
5. Modify the grammar and lexicon by applying the RR operations.
6. Run the MT system again with the refined grammar and lexicon.


Input to RR module

- User correction log file
- Transfer engine output (+ parse tree):

sl: I see them
tl: VEO LOS
tree: <((S,0 (VP,3 (VP,1 (V,1:2 "VEO") )
    (NP,0 (PRON,2:3 "LOS") ) ) ) )>

Work in progress…


Variable instantiation from log file

Correction actions:
1. Word order change (veo los → los veo): Wi = los
2. Wc? Not a specific word, but rather a combination of features in Wi (pron + acc)

We need a minimal pair to determine the appropriate refinement.


Minimal Pair

sl: I see cars

tl: VEO AUTOS

tree: <((S,0 (VP,3 (VP,1 (V,1:2 "VEO") )

    (NP,2 (N,1:3 "AUTOS") ) ) ) )>


Evaluation of Rule Refinements

Hypothesis file (translations to be evaluated automatically)

Raw MT output:
• The best sentence (picked by the user as correct or as requiring the least amount of correction)

Refined MT output:
• Use the sentence-level METEOR score to pick the best candidate from the list

Run all automatic metrics on the new hypothesis file, using user corrections as reference translations.
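The following Python sketch picks the best refined candidate at sentence level. It is a minimal sketch that uses a simplified METEOR-style score: exact unigram matches combined as Fmean = 10PR / (R + 9P), with no stemming, synonym matching or fragmentation penalty. It stands in for METEOR but is not that metric. `unigram_fmean` and `pick_best` are hypothetical names. The candidates are the four refined outputs for the Gaudí example, with the user correction as the reference.

```python
from collections import Counter
from typing import List


def unigram_fmean(hyp: str, ref: str) -> float:
    """Simplified METEOR-style score: exact, case-insensitive unigram matches only."""
    h, r = hyp.lower().split(), ref.lower().split()
    matches = sum((Counter(h) & Counter(r)).values())  # clipped match count
    if matches == 0:
        return 0.0
    precision, recall = matches / len(h), matches / len(r)
    return 10 * precision * recall / (recall + 9 * precision)  # recall-weighted harmonic mean


def pick_best(candidates: List[str], reference: str) -> str:
    """Choose the candidate with the highest sentence-level score (first one wins ties)."""
    return max(candidates, key=lambda c: unigram_fmean(c, reference))


reference = "Gaudi era un gran artista"  # user correction, used as the reference
candidates = [
    "GAUDI ERA UN ARTISTA GRANDE",
    "GAUDI ERA UNA ARTISTA GRANDE",
    "GAUDI ERA UN GRAN ARTISTA",
    "GAUDI ERA UNA GRAN ARTISTA",
]
print(pick_best(candidates, reference))  # GAUDI ERA UN GRAN ARTISTA
```

Without a fragmentation penalty, this score ignores word order among matched words. The real metric's chunk penalty is what would separate otherwise equal candidates that differ only in order.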


Measuring improvement

• Recall: an increase in TQ+C as indicated by automatic metrics (i.e. generating the correct translation).

• Parsimony: not increasing the size of the final candidate list more than strictly necessary.

• Precision: decreasing the number of incorrect hypotheses in the final translation candidate list when possible.
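Below is one possible way to operationalize the three criteria on a candidate list. It assumes that a candidate counts as correct when it matches the user-corrected translation exactly (ignoring case), whereas the slide itself measures recall with automatic metrics. The candidate lists are a small illustrative subset of the Gaudí example, and `ListStats` and `list_stats` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ListStats:
    has_correct: bool  # recall: is a correct translation generated at all?
    size: int          # parsimony: how long is the candidate list?
    n_incorrect: int   # precision: how many candidates are wrong?


def list_stats(candidates: List[str], correct: Iterable[str]) -> ListStats:
    gold = {c.lower() for c in correct}
    n_ok = sum(c.lower() in gold for c in candidates)
    return ListStats(n_ok > 0, len(candidates), len(candidates) - n_ok)


correct = ["Gaudi era un gran artista"]
before = ["GAUDI ERA UN ARTISTA GRANDE", "GAUDI ERA UNA ARTISTA GRANDE"]
after = before + ["GAUDI ERA UN GRAN ARTISTA", "GAUDI ERA UNA GRAN ARTISTA"]
print(list_stats(before, correct))  # ListStats(has_correct=False, size=2, n_incorrect=2)
print(list_stats(after, correct))   # ListStats(has_correct=True, size=4, n_incorrect=3)
```

In this toy case, recall improves while the list doubles and the number of incorrect candidates grows. The Wc-based agreement constraint on {NP,8} is aimed at exactly this kind of precision loss.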

Outline: Contributions and Thesis Timeline


Expected Contributions

• An efficient online GUI to display translations and alignments and solicit pinpoint fixes from non-expert bilingual users.

• An MT error typology and mapping between user corrections and rule repair operations.

• An expandable set of rule-refinement operators, triggered by user corrections.


Expected contributions++

• A repair hypotheses management system, able to keep different candidate rule repairs for later confirmation or rejection.

• An analysis of the effects of automatic rule refinements on different types of grammars.

• A mechanism to automatically evaluate automatic RR with respect to user corrections.


Thesis Timeline

Research component                                    Duration (months)
TCTool implementation and testing (+ user studies)    3
Data set + manual grammar expansion                   1
RR module implementation (batch mode)                 5
RR module + AL (interactive mode)                     3
Active Learning methods                               1
Adapt RR module to new language pair                  1
Evaluation                                            1
Write and defend thesis                               3
Total                                                 18


References

Add references:

• Related work
• Probst et al. 2002
• AL


Desired contributions

• All the post-editing work done is not lost.

• For resource-poor languages, the AVENUE MT system with the TCTool and the RR module can effectively substitute for a computational linguist.