Error Analysis of Two Types of Grammar for the Purpose of Automatic Rule Refinement
Ariadna Font Llitjós, Katharina Probst, Jaime Carbonell
Language Technologies Institute
Carnegie Mellon University
AMTA 2004
Outline
• Automatic Rule Refinement
• AVENUE and resource-poor scenarios
• Experiment
– Data (eng2spa)
– Two types of grammar
– Evaluation results
– Error analysis
– RR required for each type
• Conclusions and Future Work
Motivation for Automatic RR

General:
– MT output still requires post-editing
– Current systems do not recycle post-editing efforts back into the system, beyond adding them as new training data

Within AVENUE:
– Resource-poor scenarios: no manual grammar, or only a very small initial grammar
– Need to validate the elicitation corpus and the automatically learned translation rules
AVENUE and resource-poor scenarios

• No electronic data available (these are often languages with only a spoken tradition), so no SMT or EBMT
• Lack of computational linguists to write a grammar

So how can we even start to think about MT? That's what AVENUE is all about.

What do we usually have available in resource-poor scenarios? Bilingual users.

Elicitation Corpus + Automatic Rule Learning + Rule Refinement
AVENUE overview

[Architecture diagram: the Elicitation Tool presents the Elicitation Corpus to bilingual users, producing a word-aligned parallel corpus; the Rule Learning Module learns transfer rules from it (handcrafted rules and a morphological analyzer can also feed in); the Run-Time Transfer System uses the transfer rules and lexical resources to produce a lattice of translations; the Translation Correction Tool collects user corrections, which drive the Rule Refinement Module.]
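A toy sketch of how these modules could fit together in code; SentencePair, learn_transfer_rules and transfer are hypothetical stand-ins, not AVENUE's actual interfaces:

```python
# Toy sketch of the AVENUE pipeline; all names here are hypothetical.
from dataclasses import dataclass

@dataclass
class SentencePair:
    source: str        # SL sentence from the elicitation corpus
    target: str        # TL translation provided by a bilingual user
    alignments: list   # word-to-word alignments, e.g. [(0, 0), (1, 2), (2, 1)]

def learn_transfer_rules(corpus):
    """Stand-in for the Rule Learning Module."""
    return [f"rule learned from: {pair.source!r}" for pair in corpus]

def transfer(sentence, lexicon):
    """Stand-in for the Run-Time Transfer System: returns a (one-path) lattice."""
    return [" ".join(lexicon.get(w, w) for w in sentence.split())]

# Toy run: one elicited pair, a tiny lexicon, one test sentence.
corpus = [SentencePair("I saw you", "yo te vi", [(0, 0), (1, 2), (2, 1)])]
lexicon = {"I": "yo", "saw": "vi", "you": "te"}
rules = learn_transfer_rules(corpus)
lattice = transfer("I saw you", lexicon)
print(rules, lattice)   # corrections on this lattice would feed the RR module
```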
Automatic and Interactive RLR

[Flow diagram: 1st step: a rule R, automatically learned from training pairs (SLSentence1–TLSentence1, SLSentence2–TLSentence2), translates a new source sentence SLS3 into TLS3, which the user minimally corrects into TLS3'. 2nd step: the RR module uses the correction to produce R' (R refined), so that SLS3 now translates to TLS3'.]
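As a toy illustration of the two steps; the rule format and the refinement logic are hypothetical stand-ins for the learned rule R and the RR module:

```python
# Toy two-step loop; rule representation and refinement are illustrative only.

def apply_rule(rule, sls):
    """Step 1: translate the new source sentence SLS3 with the learned rule R."""
    return " ".join(rule["lexicon"].get(w, w) for w in sls.split())

def refine(rule, sl_word, tl_word):
    """Step 2: minimal stand-in for the RR module; here it only repairs the
    lexical entry implicated by the user's correction."""
    lexicon = dict(rule["lexicon"])
    lexicon[sl_word] = tl_word
    return {"lexicon": lexicon}

R = {"lexicon": {"I": "yo", "saw": "vi", "you": "tú"}}
tls3 = apply_rule(R, "I saw you")              # 'yo vi tú': the user corrects this
R_prime = refine(R, "you", "te")               # R' (R refined)
tls3_prime = apply_rule(R_prime, "I saw you")  # 'yo vi te'; fixing the clitic's
                                               # position needs a grammar refinement
```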
Interactive Elicitation of MT Errors

Assumptions:
• non-expert bilingual users can reliably detect and minimally correct MT errors, given (see the sketch below):
– the SL sentence (I saw you)
– up to 5 TL sentences (Yo vi tú, ...)
– word-to-word alignments (I-yo, saw-vi, you-tú)
– (context)
• using an online GUI: the Translation Correction Tool (TCTool)

Goal: simplify the MT correction task maximally.

User studies: 90% error detection accuracy and 73% error classification accuracy [LREC 2004]
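A hypothetical record of what the tool presents for one sentence; the field names are illustrative, not the TCTool's data model:

```python
# Hypothetical record of one correction instance shown to a bilingual user.
from dataclasses import dataclass

@dataclass
class CorrectionInstance:
    sl_sentence: str     # e.g. "I saw you"
    tl_candidates: list  # up to 5 TL sentences, e.g. ["Yo vi tú", ...]
    alignments: list     # word-to-word links, here I-yo, saw-vi, you-tú
    context: str = ""    # optional context shown to the user

instance = CorrectionInstance(
    sl_sentence="I saw you",
    tl_candidates=["Yo vi tú"],
    alignments=[(0, 0), (1, 1), (2, 2)],
)
```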
TCTool v0.1

Actions (illustrated below):
• Add a word
• Delete a word
• Modify a word
• Change word order
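A hypothetical encoding of these four actions as edits on a tokenized TL sentence; this is illustrative, not the TCTool's implementation:

```python
# Hypothetical encoding of the four TCTool correction actions.
from enum import Enum

class Action(Enum):
    ADD = "add"          # insert a missing word
    DELETE = "delete"    # remove a spurious word
    MODIFY = "modify"    # replace a wrong word (form)
    REORDER = "reorder"  # move a word to a new position

def apply_action(tokens, action, i, word=None, j=None):
    """Apply one user correction at position i (target position j for REORDER)."""
    tokens = list(tokens)
    if action is Action.ADD:
        tokens.insert(i, word)
    elif action is Action.DELETE:
        del tokens[i]
    elif action is Action.MODIFY:
        tokens[i] = word
    elif action is Action.REORDER:
        tokens.insert(j, tokens.pop(i))
    return tokens

# "Yo vi tú" -> "Yo te vi": modify the pronoun, then move it before the verb.
step1 = apply_action(["Yo", "vi", "tú"], Action.MODIFY, 2, word="te")
step2 = apply_action(step1, Action.REORDER, 2, j=1)   # ['Yo', 'te', 'vi']
```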
RR Framework

• Find the best RR operations given:
– a grammar (G),
– a lexicon (L),
– a (set of) source language sentence(s) (SL),
– a (set of) target language sentence(s) (TL),
– its parse tree (P), and
– a minimal correction of TL (TL′)
such that TQ2 > TQ1, i.e. translation quality after refinement exceeds the quality before.
• This can also be expressed as:
max TQ(TL | TL′, P, SL, RR(G, L))
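A schematic of this maximization; generate_refinements, translate and tq are assumed stand-ins for the corresponding AVENUE components, not their real interfaces:

```python
# Schematic sketch of the RR search; all passed-in functions are hypothetical.

def refine_best(grammar, lexicon, sl, tl, parse, tl_corrected,
                generate_refinements, translate, tq):
    """Keep the refinement RR(G, L) that maximizes TQ(TL | TL', P, SL, RR(G, L))."""
    best = (grammar, lexicon)
    best_score = tq(tl, tl_corrected)        # TQ1: quality before refinement
    for g, l in generate_refinements(grammar, lexicon, sl, tl, parse, tl_corrected):
        score = tq(translate(sl, g, l), tl_corrected)   # TQ2 for this candidate
        if score > best_score:               # accept only if TQ2 > TQ1
            best, best_score = (g, l), score
    return best
```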
Types of RR operations

• Grammar (sketched in code below):
– bifurcate: R0 → R0 + R1 [= R0′ + constr]; Cov[R0] → Cov[R0, R1]
– refine: R0 → R1 [= R0 + constr]; Cov[R0] → Cov[R1]
– R0 → R1 [= R0 + constr = c−] + R2 [= R0′ + constr = c+]; Cov[R0] → Cov[R1, R2]
• Lexicon:
– Lex0 → Lex0 + Lex1 [= Lex0 + constr]
– Lex0 → Lex1 [= Lex0 + constr]
– Lex0 → Lex1 [= Lex0 + TL word]
– adding a new lexical item: ∅ → Lex1
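Hypothetical rule objects illustrating the grammar operations above; the dict representation is illustrative, not AVENUE's rule format:

```python
# Illustrative grammar-rule objects for the bifurcate and blocking-split ops.
from copy import deepcopy

def bifurcate(r0, constr):
    """R0 -> R0 + R1 [= R0' + constr]: keep R0 for the general case, add a
    constrained copy R1 for the exception (Cov[R0] -> Cov[R0, R1])."""
    r1 = deepcopy(r0)
    r1["constraints"].append(constr)
    return r0, r1

def split_on(r0, feature):
    """R0 -> R1 [= R0 + constr = c-] + R2 [= R0' + constr = c+]: replace R0 by
    two rules partitioned on a blocking feature (Cov[R0] -> Cov[R1, R2])."""
    r1, r2 = deepcopy(r0), deepcopy(r0)
    r1["constraints"].append((feature, "-"))
    r2["constraints"].append((feature, "+"))
    return r1, r2

vp = {"lhs": "VP", "rhs": ["V", "NP"], "constraints": []}
general, exception = bifurcate(vp, ("NP pron", "+"))  # e.g. clitic-object exception
```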
Data: English - Spanish

Training:
• First 200 sentences from the AVENUE Elicitation Corpus
• Lexicon: extracted semi-automatically from the first 400 sentences (442 entries)

Test:
• 32 sentences manually selected from the next 200 sentences in the EC to showcase a variety of MT errors
Manual grammar
• 12 rules (2 S, 7 NP, 3 VP)
• Produces 1.6 different translations per sentence on average
Learned Grammar + feature constraints
• 316 rules (194 S, 43 NP, 78 VP, 1 PP)
• Decoder emulated by reordering 3 rules
• Produces 18.6 different translations per sentence on average
Comparing Grammar Output: Results

• Manually: [comparison figure not recoverable]
• Automatic MT evaluation:

                  NIST   BLEU   METEOR
Manual grammar     4.3   0.16   0.6
Learned grammar    3.7   0.14   0.55
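For readers who want to compute these metrics today, a sketch using NLTK's implementations; these differ from the 2004 scoring tools, so the numbers will not reproduce the table above:

```python
# Sentence-level NIST, BLEU and METEOR with NLTK (modern implementations).
# Requires: pip install nltk, plus nltk.download('wordnet') for METEOR.
from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.nist_score import sentence_nist
from nltk.translate.meteor_score import meteor_score

reference = "yo te vi".split()    # human reference translation
hypothesis = "yo vi tú".split()   # MT output to score

print("BLEU:  ", sentence_bleu([reference], hypothesis, weights=(0.5, 0.5)))
print("NIST:  ", sentence_nist([reference], hypothesis, n=2))
print("METEOR:", meteor_score([reference], hypothesis))
```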
Error Analysis

• Most of the errors produced by the manual grammar can be classified into:
– lack of subject-predicate agreement
– wrong word order of object pronouns (clitics)
– wrong preposition
– wrong word form (case)
– out-of-vocabulary (OOV) words
• On top of these, the learned grammar output exhibited errors of the following types:
– lack of agreement constraints
– missing preposition
– over-generalization
Examples

• Same (both good)
• Manual Grammar better
• Learned Grammar better
• Different (both bad)
Types of RR required

Manual Grammar:
• Bifurcate a rule to encode an exception:
– R0 → R0 + R1 [= R0′ + constr]; Cov[R0] → Cov[R0, R1]
– R0 → R1 [= R0 + constr = c−] + R2 [= R0′ + constr = c+]; Cov[R0] → Cov[R1, R2]

Learned Grammar:
• Adjust feature constraints, such as agreement (sketched below):
– R0 → R1 [= R0 +/− constr]; Cov[R0] → Cov[R1]
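A hypothetical illustration of the learned-grammar case, adjusting constraints to reach the right level of generalization; the representation is illustrative only:

```python
# Illustrative constraint adjustment: adding (+) a constraint makes a rule
# more specific, removing (-) one makes it more general.

def add_constraint(rule, constr):
    """R0 -> R1 [= R0 + constr]: R1 covers fewer sentences than R0."""
    return {**rule, "constraints": rule["constraints"] + [constr]}

def drop_constraint(rule, constr):
    """R0 -> R1 [= R0 - constr]: R1 covers more sentences than R0."""
    return {**rule, "constraints": [c for c in rule["constraints"] if c != constr]}

# An over-general learned NP rule lacking number agreement:
r0 = {"lhs": "NP", "rhs": ["DET", "N"], "constraints": []}
r1 = add_constraint(r0, ("DET num", "=", "N num"))   # now blocks *'la casas'
```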
Conclusions
• TCTool + RR can improve both hand-crafted and automatically learned grammars.
• In the current experiment, MT errors differ almost 50% of the time, depending on the type of grammar.
• The manual grammar will need to be refined to encode exceptions, whereas the learned grammar will need to be refined to achieve the right level of generalization.
• We expect RR to give the most leverage when combined with the learned grammar.
Future Work
• Experiment where user corrections are used both as new training examples for RL and to refine the existing grammar with the RR module.
• Investigate using reference translations to refine MT grammars automatically; this is much harder, since reference translations are not minimal post-editions.
Questions???
Thank you!
RR Framework

• Types of operations: bifurcate, make more specific/general, add blocking constraints, etc.
• Formalizing error information (clue words)
• Finding triggering features