Towards Interactive and Automatic Refinement of Translation Rules

PhD Thesis Proposal

Ariadna Font Llitjós

5 November 2004


Outline

• Introduction
• Related Work
• Technical Approach
  – Interactive elicitation of error information
  – A framework for automatic rule adaptation
• Preliminary Research
• Proposed Research
• Contributions and Thesis Timeline

How to recycle corrections of MT output back into the system by adjusting and adapting the grammar and lexical rules


The Problem

General
- MT output still requires post-editing.
- Current systems do not recycle post-editing efforts back into the system, beyond adding them as new training data.

Avenue specific
- Resource-poor scenarios: lack of a manual grammar, or a very small initial grammar.
- Need to validate the elicitation corpus and the automatically learned translation rules.


Motivation

General
- It is very costly and time consuming for trained computational linguists with knowledge of both languages to refine and extend translation rule sets manually.

Resource-poor scenarios
- Indigenous communities have difficulty accessing crucial information that directly affects their lives (such as land laws, plagues, health warnings, etc.).
- Preservation of language and culture.


MT Output

SL: Mary and Anna are falling
TL: María y Ana están cayendo
TL’: María y Ana se están cayendo

SL: Gaudi was a great artist
TL: Gaudi estaba un artista grande
TL: Gaudi era un artista grande
TL’: Gaudi era un gran artista

SL: You saw the woman
TL: Viste la mujer
TL: Vió la mujer
TL’: Viste a la mujer

SL: I used my elbow to push the button
TL: Usé mi codo que apretar el botón
TL’: Usé mi codo para apretar el botón

SL: We are building new bridges in the city
TL: Nosotros estamos construyendo nuevo puentes dentro la ciudad
TL’: Nosotros estamos construyendo nuevo puentes dentro de la ciudad


Resource-poor scenarios

• No e-data available (often spoken tradition), which rules out SMT or EBMT

• No computational linguists to write a grammar





So how can we even start to think about MT?
– That’s what AVENUE is all about

Elicitation Corpus + Automatic Rule Learning


What do we usually have available in resource-poor scenarios?

Bilingual users


Avenue overview

[Figure: AVENUE architecture diagram. Panels: Elicitation, Rule Learning, Run-Time System, Rule Refinement, Morphology. Components: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Learning Module, Handcrafted rules, Transfer Rules, Lexical Resources, Morphological analyzer, Run Time Transfer System, Lattice, Translation Correction Tool, Rule Refinement Module.]

Avenue overview: my thesis

[Figure: the same AVENUE architecture diagram, repeated to mark the part covered by this thesis.]


Thesis Statement

- Given a rule-based Transfer MT system, we can extract useful information from non-expert bilingual speakers about the corrections required to make MT output acceptable.
- We can automatically refine translation rules, given corrected and aligned translation pairs and some error information, so as to improve coverage and overall MT quality.







Related Work

• Post-editing to improve MT systems
  – minimal post-editing [Allen, 2003]
  – include user feedback in the MT loop [Callison-Burch, 2004], [Allen & Hogan, 2000], [Su et al. 1995], [Menezes & Richardson, 2001] and [Imamura et al. 2003]
• MT error information and classification
  – [Flanagan, 1994], [White et al., 1994], [Allen 2003], [Niessen et al. 2000]


Related Work++

• Rule Adaptation
  – POS tagging: [Lin et al., 1994]
  – parsing: [Lehman, 1989], [Brill, 2003]
  – NLU: [Gavaldà, 2000]
  – MT:
    [Corston-Oliver & Gammon, 2003]: DTs to correct binary features of LF to reduce noise
    [Yamada, 1995]: structural comparison between machine translations and manual translations to adapt the MT system to a new domain.
    [Naruedomkul, 2001]: modify an HPSG-like semantic representation of the TL until it is acceptably similar to the SL.







Interactive elicitation of MT errors

Assumptions:
• non-expert bilingual users can reliably detect and minimally correct MT errors, given:
  – SL sentence (I saw you)
  – TL sentence (Yo vi tú)
  – word-to-word alignments (I-yo, saw-vi, you-tú)
  – (context)
• using an online GUI: the Translation Correction Tool (TCTool)

Goal:
• simplify the MT correction task as much as possible


MT error typology for RR (simplified)

• missing word
• extra word
• word order (local vs long-distance, word vs phrase, word change)
• incorrect word (sense, form, selectional restrictions, idiom, ...)
• agreement (missing constraint, extra agreement constraint)


Outline

• Motivation and Goals
• Related Work
• Technical Approach
• Work to Date
• Proposed Research
• Contributions and Open Questions


Automatic Rule Refinement Framework

• Find the best RR operations given:
  – a grammar (G),
  – a lexicon (L),
  – a (set of) source language sentence(s) (SL),
  – a (set of) target language sentence(s) (TL),
  – its parse tree (P), and
  – a minimal correction of TL (TL’)
  such that TQ2 > TQ1, where TQ1 and TQ2 are the translation quality before and after refinement
• This can also be expressed as:

max TQ(TL|TL’,P,SL,RR(G,L))
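The search stated above can be pictured as a loop over candidate RR operations that keeps the one whose refined system scores highest against the user's correction. The following is only a minimal sketch under stated assumptions: `select_refinement`, `toy_translate`, `toy_apply` and `toy_tq` are hypothetical names, the "engine" is a word-for-word lookup, and TQ is an exact-match stand-in, none of which come from the AVENUE system.

```python
def select_refinement(candidates, apply_op, translate, tq,
                      grammar, lexicon, sl, tl_corrected):
    """Return (best_op, best_score); best_op is None if nothing beats the baseline."""
    # TQ1: quality of the unrefined system, measured against the correction TL'.
    best_op, best_score = None, tq(translate(grammar, lexicon, sl), tl_corrected)
    for op in candidates:
        new_grammar, new_lexicon = apply_op(op, grammar, lexicon)
        # TQ2: quality after applying this candidate refinement.
        score = tq(translate(new_grammar, new_lexicon, sl), tl_corrected)
        if score > best_score:  # keep only strict improvements (TQ2 > TQ1)
            best_op, best_score = op, score
    return best_op, best_score


# Toy stand-ins (hypothetical): a word-for-word engine and an exact-match TQ.
def toy_translate(grammar, lexicon, sl):
    return " ".join(lexicon.get(word, word) for word in sl.split())


def toy_apply(op, grammar, lexicon):
    sl_word, tl_word = op
    return grammar, {**lexicon, sl_word: tl_word}


def toy_tq(tl, tl_corrected):
    return 1.0 if tl == tl_corrected else 0.0


best_op, best_score = select_refinement(
    candidates=[("great", "enorme"), ("great", "gran")],
    apply_op=toy_apply, translate=toy_translate, tq=toy_tq,
    grammar=None, lexicon={"great": "grande"},
    sl="great", tl_corrected="gran",
)
print(best_op, best_score)
```

In a real setting the exact-match score would be replaced by an automatic metric, and the candidate set would come from the error information elicited from the user.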


Types of RR operations

• Grammar:
  – R0 → R0 + R1 [= R0’ + constr]; Cov[R0] → Cov[R0, R1] (bifurcate)
  – R0 → R1 [= R0 + constr]; Cov[R0] → Cov[R1] (refine)
  – R0 → R1 [= R0 + constr = -] + R2 [= R0’ + constr =c +]; Cov[R0] → Cov[R1, R2]

• Lexicon:
  – Lex0 → Lex0 + Lex1 [= Lex0 + constr] (bifurcate)
  – Lex0 → Lex1 [= Lex0 + constr] (refine)
  – Lex0 → Lex0 + Lex1 [= Lex0 + TLword]
  – ∅ → Lex1 (adding a lexical item)
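To make the two grammar operations concrete, here is a small sketch that treats a rule as a record with a target-side order and a set of constraints. The `Rule` class and the `refine` and `bifurcate` helpers are hypothetical illustrations, not the RR module's data structures; the rule names and constraints are borrowed from the NP example used later in these slides.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rule:
    rule_id: str
    tl_order: tuple          # target-side constituent order
    constraints: frozenset   # unification constraints, kept as plain strings


def refine(grammar, rule_id, constraint):
    """R0 -> R1 [= R0 + constr]: replace the rule by a more constrained version."""
    return [
        replace(r, constraints=r.constraints | {constraint})
        if r.rule_id == rule_id else r
        for r in grammar
    ]


def bifurcate(grammar, rule_id, new_id, new_tl_order, constraint):
    """R0 -> R0 + R1 [= R0' + constr]: keep R0 and add a modified, constrained copy."""
    r0 = next(r for r in grammar if r.rule_id == rule_id)
    # Note: a real copy would also re-index constraints to the new order;
    # this sketch only changes the order and appends the new constraint.
    r1 = replace(r0, rule_id=new_id, tl_order=new_tl_order,
                 constraints=r0.constraints | {constraint})
    return grammar + [r1]


np8 = Rule("NP,8", ("DET", "N", "ADJ"), frozenset({"((y3 agr) = (y2 agr))"}))
grammar = bifurcate([np8], "NP,8", "NP,1008", ("DET", "ADJ", "N"),
                    "((y2 feat1) =c +)")
grammar = refine(grammar, "NP,8", "((y3 feat1) = -)")
for rule in grammar:
    print(rule.rule_id, rule.tl_order, sorted(rule.constraints))
```

Applying `bifurcate` and then `refine` in this way corresponds to the third grammar operation: the general rule receives a blocking constraint and the specific copy receives the matching one.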


Formalizing Error Information

Wi = error
Wi’ = correction
Wc = clue word

Example:

SL: the red car
TL: *el auto roja
TL’: el auto rojo

Wi = roja, Wi’ = rojo, Wc = auto (Wi and Wc need to agree)


Finding Triggering Features

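Once we have the user’s correction (Wi’), we can compare it with Wi at the feature level and find the triggering feature, i.e. the feature whose value differs.

If the set of differing features is empty, we need to postulate a new binary feature.

Delta function: δ(Wi, Wi’) = the set of features on which Wi and Wi’ differ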
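The feature-level comparison can be sketched as a set difference over feature-value maps. This is a minimal illustration, assuming each word's features are available as a flat dictionary; the function names, the feature names and the values used for roja, rojo, grande and gran are hypothetical stand-ins for what a lexicon or morphological analyzer would supply.

```python
def delta(wi_feats, wi_corr_feats):
    """Return the set of feature names whose values differ between Wi and Wi'."""
    names = set(wi_feats) | set(wi_corr_feats)
    return {n for n in names if wi_feats.get(n) != wi_corr_feats.get(n)}


def triggering_features(wi_feats, wi_corr_feats, new_feature="feat1"):
    """Return (features, postulated): fall back to a new binary feature if delta is empty."""
    diff = delta(wi_feats, wi_corr_feats)
    if diff:
        return diff, False
    return {new_feature}, True


# roja vs rojo: the gender value differs, so gender is the triggering feature.
roja = {"num": "sg", "gen": "fem"}
rojo = {"num": "sg", "gen": "masc"}
print(triggering_features(roja, rojo))

# grande vs gran: identical features, so a new binary feature is postulated.
grande = {"num": "sg", "gen": "masc"}
gran = {"num": "sg", "gen": "masc"}
print(triggering_features(grande, gran))
```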







TCTool v0.1

Interactive elicitation of error information

Actions:
• Add a word
• Delete a word
• Modify a word
• Change word order


TCTool v0.1 specs

• First five translations from lattice produced by transfer engine.

• Asks users to pick correct translation, or else, best incorrect translation (i.e. the one requiring the least amount of corrections).

• Provides translation correction and error classification help (static tutorial + error example page).

• CGI scripts in Perl
• Correction interface in JavaScript (Kenneth Sim and Patrick Milholland)



1st Eng2Spa user study

[LREC 2004]
• Manual grammar: 12 rules + 442 lexical entries
• MT error classification (v0.0): 9 linguistically-motivated classes: word order, sense, agreement error (number, person, gender, tense), form, incorrect word and no translation
• Test set: 32 sentences from the AVENUE Elicitation Corpus (4 correct / 28 incorrect)



Data Analysis

• Interested in high precision, even at the expense of lower recall

• Users did not always fix a translation in the same way

• Most of the time, when the final translation differed from the gold standard, it was still correct or even better (stylistically)



Rule Refinement Operations

• Organized according to type of actions users can perform to correct a sentence with TCTool

• And according to what error information is available (Wc, alignments, …)

Automatic Rule Adaptation


Rule Refinement Simulation I

Automatic Rule Adaptation

Change word order

1. Run the SL sentence through the transfer engine:

Gaudí was a great artist

2. Input the SL sentence and up to 5 alternative translations with alignments to the Translation Correction Tool.

3. Input the user correction log file with the transfer engine output to the RR module (variable instantiation).

4. Determine the appropriate RR operations that need to apply.

5. Modify the grammar and lexicon by applying the RR ops.

6. Run the MT system again with the refined grammar and lexicon.


[Figure: SL + best TL picked by user]

[Figure: changing “grande” into “gran”]


Input to RR module

- User correction log file
- Transfer engine output (+ parse tree):

sl: Gaudi was a great artist

tl: GAUDI ERA UN ARTISTA GRANDE

tree: <((S,1 (NP,2 (N,5:1 "GAUDI") )

(VP,3 (VB,2 (AUX,17:2 "ERA") )

(NP,8 (DET,0:3 "UN")

(N,4:5 "ARTISTA")

(ADJ,5:4 "GRANDE") ) ) ) )>


Variable instantiation from log file

Correction Actions:

1. Word order change (artista grande → grande artista):
   Wi = grande

2. Edited grande into gran:
   Wi’ = gran
   The user identified artist as the clue word: Wc = artist

In this case, even if the user had not identified Wc, the refinement process would have been the same.


Retrieve relevant lexical entries

ADJ::ADJ |: [great] -> [grande]
(
(X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc)
)

N::N |: [artist] -> [artista]
(
(X1::Y1)
((x0 agr pers) = 3)
((x0 agr num) = sg)
((x0 form) = artist)
((x0 semtype) = human)
)


Add lexical entry for “gran”

Duplicate the lexical entry great-grande and change the TL side:

ADJ::ADJ |: [great] -> [gran]
(
(X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc)
)

Even if we had a morphological analyzer available, it would show no difference between them:

grande grande AQ0CS0 grande NCCS000
gran gran AQ0CS0

Lex0 → Lex0 + Lex1 [= Lex0 + TLword]
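The duplication step can be sketched as copying an entry and swapping its target side. This is an illustrative sketch only: representing a lexical entry as a dictionary, and the `add_tl_variant` helper itself, are assumptions made for the example and not the transfer engine's actual lexicon format.

```python
import copy


def add_tl_variant(lexicon, sl_form, tl_form, new_tl_form):
    """Keep the matching entry (Lex0) and append a copy with a new TL word (Lex1)."""
    for entry in lexicon:
        if entry["sl"] == sl_form and entry["tl"] == tl_form:
            variant = copy.deepcopy(entry)   # constraints are copied, not shared
            variant["tl"] = new_tl_form
            return lexicon + [variant]
    raise KeyError(f"no lexical entry for {sl_form} -> {tl_form}")


lexicon = [{
    "pos": "ADJ",
    "sl": "great",
    "tl": "grande",
    "constraints": {"x0 form": "great", "y0 agr num": "sg", "y0 agr gen": "masc"},
}]

refined = add_tl_variant(lexicon, "great", "grande", "gran")
for entry in refined:
    print(entry["sl"], "->", entry["tl"], entry["constraints"])
```

The deep copy matters once the two entries are later given different feature values, since a shared constraint dictionary would change both entries at once.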


Finding triggering feature(s)

Feature function: δ(Wi, Wi’) = ∅ → need to postulate a new binary feature: feat1

Blame assignment:

tree: <((S,1 (NP,2 (N,5:1 "GAUDI") )

(VP,3 (VB,2 (AUX,17:2 "ERA") )

(NP,8 (DET,0:3 "UN")

(N,4:5 "ARTISTA")

(ADJ,5:4 "GRANDE") ) ) ) )>


Refining the rules

Wi = grande, POSi = ADJ = Y3, y3
Wc = artist, POSc = N = Y2, y2

{NP,8}
NP::NP : [DET ADJ N] -> [DET N ADJ]
(
(X1::Y1) (X2::Y3) (X3::Y2)
((x0 def) = (x1 def))
(x0 = x3)
((y1 agr) = (y2 agr)) ; det-noun agreement
((y3 agr) = (y2 agr)) ; adj-noun agreement
(y2 = x3)
)


Refining the rules

{NP,1008}
NP::NP : [DET ADJ N] -> [DET ADJ N]
(
(X1::Y1) (X2::Y2) (X3::Y3)
((x0 def) = (x1 def))
(x0 = x3)
((y1 agr) = (y3 agr)) ; det-noun agreement
((y2 agr) = (y3 agr)) ; adj-noun agreement
(y2 = x3)
((y2 feat1) =c + )
)


Refining the lexical entries

ADJ::ADJ |: [great] -> [grande]
(
(X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc)
((y0 feat1) = -)
)

ADJ::ADJ |: [great] -> [gran]
(
(X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc)
((y0 feat1) = +)
)


Done? Not yet

• Right now we’ve just increased ambiguity in the grammar: the translation candidate list has more than doubled in size, since both “grande” and “gran” can be unified with {NP,8} and “gran” now also unifies with {NP,1008}.

• Need to restrict application of general rule to just post-nominal ADJ:

R0 → R1 [= R0 + constr = -] = NP,8 (general rule)
   + R2 [= R0’ + constr =c +] = NP,1008 (specific rule)

Cov[R0] → Cov[R1, R2]


Add blocking constraint

{NP,8}
NP::NP : [DET ADJ N] -> [DET N ADJ]
(
(X1::Y1) (X2::Y3) (X3::Y2)
((x0 def) = (x1 def))
(x0 = x3)
((y1 agr) = (y2 agr)) ; det-noun agreement
((y3 agr) = (y2 agr)) ; adj-noun agreement
(y2 = x3)
((y3 feat1) = - )
)


Refined MT output

sl: Gaudi was a great artist

tl: GAUDI ERA UN ARTISTA GRANDE

tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>

tl: GAUDI ERA UNA ARTISTA GRANDE

tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,2:3 "UNA") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>

tl: GAUDI ERA UN GRAN ARTISTA

tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,1008 (DET,0:3 "UN") (ADJ,6:4 "GRAN") (N,4:5 "ARTISTA") ) ) ) )>

tl: GAUDI ERA UNA GRAN ARTISTA

tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,1008 (DET,2:3 "UNA") (ADJ,6:4 "GRAN") (N,4:5 "ARTISTA") ) ) ) )>

… [same for estaba]


Can we get rid of the incorrect translation?

• Since “great” translates both as “grande” and “gran”, both rules will be applied, one with each type of adjective.

• If user identified Wc, the RR module can also tag “artist” with ((feat1)= +) and add an agreement constraint between ADJ and N ((y2 feat1) = (y3 feat1)) to {NP,8}

• For each new adjective that appears in this context, one of the two NP rules will be picked by users, and the RR module will be able to tag them as being pre-nominal (feat1 = +) or post-nominal (feat1 = -) in the lexicon.
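The tagging step in the last bullet can be sketched as a lookup from the rule the user picked to a feat1 value. This is a hedged illustration: the `tag_adjective` helper and the dictionary-based lexicon are hypothetical, and only the rule names {NP,8} and {NP,1008} and the feat1 convention come from the slides.

```python
PRE_NOMINAL_RULE = "NP,1008"   # [DET ADJ N] -> [DET ADJ N]
POST_NOMINAL_RULE = "NP,8"     # [DET ADJ N] -> [DET N ADJ]


def tag_adjective(lexicon, adjective, picked_rule):
    """Return a new lexicon where the adjective carries feat1 = + or -."""
    if picked_rule == PRE_NOMINAL_RULE:
        value = "+"    # pre-nominal adjective
    elif picked_rule == POST_NOMINAL_RULE:
        value = "-"    # post-nominal adjective
    else:
        raise ValueError(f"unexpected rule: {picked_rule}")
    updated = dict(lexicon)
    updated[adjective] = {**lexicon.get(adjective, {}), "feat1": value}
    return updated


lexicon = {"gran": {"gen": "masc"}}
lexicon = tag_adjective(lexicon, "gran", "NP,1008")
lexicon = tag_adjective(lexicon, "rojo", "NP,8")
print(lexicon)
```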


Manual vs Learned Grammars [AMTA 2004]

• Manual inspection:

• Automatic MT Evaluation:

                  NIST   BLEU   METEOR
Manual grammar    4.3    0.16   0.6
Learned grammar   3.7    0.14   0.55


Evaluation of refined MT output

Assumption:
• user corrections = gold standard reference translations

Idea:
• compare raw MT output with the MT output produced by the refined grammar and lexicon, using automatic evaluation metrics such as BLEU and METEOR



Human Oracle experiment

• As a feasibility experiment, we compared raw MT output with manually corrected MT output: the difference is statistically significant (confidence interval test).

• This is an upper bound on how much difference we should expect any refinement approach to make.








Data Set

• Development set: ~400 sentences which can be fully parsed by the original manual grammar (50-100 rules)
  – from the Typological and Structural Elicitation Corpus
  – categorized by error
• Split the dev set into two:
  – dev set → run user studies + develop RR module + validation set
  – test set → evaluate effect of RR operations
• Wild test set (from naturally occurring text)
  – requirement: sentences need to be fully parsed by the grammar.


TCTool v0.2

• New way of eliciting MT error information
  – simple statements/questions about the error
  – possibly ordered from most informative to least informative
• Time permitting:
  – Dynamic tutorial
  – User profiles, which will allow us to experiment with user-idiosyncratic grammars (consistent refinements) vs a general grammar for all users


User studies

• Eng2Spa II: test new version of TCTool (v0.2) and compare error correction + classification accuracy results.

• Resource-poor language to Spanish
  – likely candidates: Mapudungun, Quechua

• Batch vs Interactive mode

• Amount of error information elicited


Interactive mode

• Extra error information is required
  – more sentences need to be evaluated (and corrected by the user) → relevant minimal pairs
• Elicit extra MT error information by either evaluating (and correcting) relevant minimal pairs or by asking users to check which simple statements are true.
• Typically requires more effort from users and takes longer → smaller test set


Active Learning

• Method to minimize the number of examples a human annotator must label [Cohn et al. 94] usually by processing examples in order of usefulness. [Lewis and Catlett 94] used uncertainty as a measure of usefulness.

• [Callison-Burch 2003] has proposed AL to reduce the cost of creating a corpus of labeled training examples for Statistical MT.
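Uncertainty-based ordering, as mentioned in the first bullet, can be sketched by ranking unlabeled examples by the entropy of a model's predicted distribution. This is a generic illustration of uncertainty sampling; the pool of sentence identifiers and the probability values are made up, and how such scores would be obtained inside the RR module is not specified in the slides.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def rank_by_uncertainty(pool):
    """Order (example, predicted_distribution) pairs from most to least uncertain."""
    return sorted(pool, key=lambda item: entropy(item[1]), reverse=True)


# Hypothetical pool: each sentence id comes with a predicted class distribution.
pool = [
    ("s1", [0.9, 0.1]),
    ("s2", [0.5, 0.5]),
    ("s3", [1.0, 0.0]),
]
ranked = [example for example, _ in rank_by_uncertainty(pool)]
print(ranked)
```

The annotator would then be shown examples in this order, so that the most informative ones are labeled first.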


Scope

• Focus 1: types of errors that can be refined fully automatically just by using correction information.

• Focus 2: types of errors that can be refined fully automatically using correction and error information.

• Focus 3: types of errors that require a reasonable amount of further user interaction and can be solved by available correction and error information.


Outside scope

• Types of errors for which
  – users cannot give the required information, or
  – further user interaction might take too long and might not provide the required information


Rule Refinement Simulation II

Automatic Rule Adaptation

Change word order + further user interaction

1. Run the SL sentence through the transfer engine:

I see them

2. Input the SL sentence and up to 5 alternative translations with alignments to the Translation Correction Tool.

3. Input the user correction log file with the transfer engine output to the RR module (variable instantiation).

4. Determine the appropriate RR operations that need to apply.

5. Modify the grammar and lexicon by applying the RR ops.

6. Run the MT system again with the refined grammar and lexicon.




Input to RR module

- User correction log file
- Transfer engine output (+ parse tree):

sl: I see them

tl: VEO LOS

tree: <((S,0 (VP,3 (VP,1 (V,1:2 "VEO") )

(NP,0 (PRON,2:3 "LOS") ) ) ) )>


In progress…


Variable instantiation from log file

Correction Actions:

1. Word order change (veo los → los veo):
   Wi = los

2. Wc? Not a specific word, but rather a combination of features in Wi (pron + acc)

Need a minimal pair to determine the appropriate refinement


Minimal Pair

sl: I see cars

tl: VEO AUTOS

tree: <((S,0 (VP,3 (VP,1 (V,1:2 "VEO") )

(NP,2 (N,1:3 "AUTOS") ) ) ) )>


Evaluation of Rule Refinements

Hypothesis file (translations to be evaluated automatically)

Raw MT output:
• Best sentence (picked by the user as correct or as requiring the least amount of correction)

Refined MT output:
• Use the METEOR score at sentence level to pick the best candidate from the list

Run all automatic metrics on the new hypothesis file using user corrections as reference translations.
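Picking the best candidate with a sentence-level score can be sketched as an argmax over the candidate list. Note the substitution: the score below is a simple unigram F1, used as a stand-in and not an implementation of METEOR, and the function names are hypothetical. The candidates are the refined outputs for the Gaudi example, scored against the user's correction.

```python
from collections import Counter


def unigram_f1(hypothesis, reference):
    """Unigram F1 between two sentences (a crude stand-in for a real MT metric)."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((hyp & ref).values())   # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def pick_best(candidates, reference, score=unigram_f1):
    """Return the candidate with the highest sentence-level score."""
    return max(candidates, key=lambda cand: score(cand, reference))


candidates = [
    "GAUDI ERA UN ARTISTA GRANDE",
    "GAUDI ERA UNA ARTISTA GRANDE",
    "GAUDI ERA UN GRAN ARTISTA",
    "GAUDI ERA UNA GRAN ARTISTA",
]
reference = "Gaudi era un gran artista"   # user correction used as reference
best = pick_best(candidates, reference)
print(best)
```

A unigram score ignores word order, so it could not separate two candidates that differ only in ordering; a metric that rewards matching word order would be needed for that.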


Measuring improvement

• Recall: increase on TQ+C as indicated by automatic metrics (generating the correct translation).

• Parsimony: not to increase the size of final candidate list more than strictly necessary.

• Precision: decreasing the number of incorrect hypotheses in the final translation candidate list when possible.
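One possible way to read these three criteria is as simple counts over the candidate list before and after refinement. The sketch below is an interpretation, not the evaluation procedure of the thesis: the function names and the exact definitions (presence of the correct translation, list growth, change in the number of incorrect candidates) are assumptions.

```python
def list_stats(candidates, reference):
    """Count correct and incorrect candidates, ignoring case and extra spaces."""
    def norm(sentence):
        return " ".join(sentence.lower().split())
    n_correct = sum(norm(c) == norm(reference) for c in candidates)
    return {
        "has_correct": n_correct > 0,
        "size": len(candidates),
        "n_incorrect": len(candidates) - n_correct,
    }


def compare(before, after, reference):
    """Summarize the effect of a refinement on the candidate list."""
    b, a = list_stats(before, reference), list_stats(after, reference)
    return {
        "recall_gain": a["has_correct"] and not b["has_correct"],  # recall
        "size_growth": a["size"] - b["size"],                      # parsimony
        "incorrect_change": a["n_incorrect"] - b["n_incorrect"],   # precision
    }


before = ["GAUDI ERA UN ARTISTA GRANDE", "GAUDI ERA UNA ARTISTA GRANDE"]
after = before + ["GAUDI ERA UN GRAN ARTISTA", "GAUDI ERA UNA GRAN ARTISTA"]
summary = compare(before, after, "Gaudi era un gran artista")
print(summary)
```

On this example the refinement gains the correct translation but also grows the list and adds one incorrect candidate, which is the trade-off the parsimony and precision criteria are meant to expose.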







Expected Contributions

• An efficient online GUI to display translations and alignments and solicit pinpoint fixes from non-expert bilingual users.

• An MT error typology and mapping between user corrections and rule repair operations.

• An expandable set of rule-refinement operators, triggered by user corrections.


Expected contributions++

• A repair hypotheses management system, able to keep different candidate rule repairs for later confirmation or rejection.

• An analysis of the effects of automatic rule refinements on different types of grammars.

• A mechanism to automatically evaluate automatic RR wrt. user corrections.


Thesis Timeline

Research components                                   Duration (months)
TCTool implementation and testing (+ user studies)    3
Data set + manual grammar expansion                   1
RR module implementation (batch mode)                 5
RR module + AL (interactive mode)                     3
Active Learning methods                               1
Adapt RR module to new language pair                  1
Evaluation                                            1
Write and defend thesis                               3

Total                                                 18


References

Add references:

• Related work
• Probst et al. 2002
• AL


Desired contributions

• All the post-editing work done is not lost.

• For resource-poor languages, the AVENUE MT system with the TCTool and the RR module can effectively substitute for a computational linguist.
