MIXED SYNTAX TRANSLATION
Hieu Hoang, MT Marathon 2011, Trento
TRANSCRIPT
Contents
• What is a syntactic model?
• What's Wrong with Syntax?
• Which syntax model to use?
• Why use syntactic models?
• Mixed-Syntax Model
  – Extraction
  – Decoding
  – Results
• Future Work
What is a syntactic model?
• Hierarchical Phrase-Based Model
  – String-to-string
  – Non-terminals are unlabelled
• Tree-to-string Model
  – Source non-terminals are labelled
    • must match the input parse tree
Example rules:
  X habe X1 gegessen # have eaten X1
  S habe NP1 gegessen # have eaten NP1
What's Wrong with Syntax?
Evaluation of a French-English MT system (Ambati and Lavie, 2009):

  Model                  BLEU   METEOR
  Tree-to-string         27.02  57.68
  Tree-to-tree           22.23  54.05
  Moses (phrase-based)   30.18  58.13
Hierarchical Model
according to János Veres , this would be in the first quarter of 2008 possible .
Tree-to-String Model
Input (with parse):
  laut János Veres wäre dies im ersten Quartal 2008 möglich .
  POS: APPR NE NE VAFIN PDS APPRART ADJA NN CARD ADJA PUNC
  [figure: parse tree over the input, with constituents PN, PP, PP, S, TOP]
Output: according to János Veres would be this in the first quarter of 2008 possible .
Other Syntactic Models
• Syntax-Augmented MT (SAMT)
  – Not constrained only to the parse tree (Zollmann and Venugopal, 2006)
• Binarization
  – Restructure and relabel the parse tree (Wang et al., 2010)
• Forest-based translation
  – Recover from parse errors (Mi et al., 2008)
• Soft constraint
  – Reward/penalize derivations that follow the parse structure (Chiang, 2010)
Why Use Syntactic Models?
• Decrease decoding time
  – Derivation constrained by the source parse tree
• Long-range reordering during decoding
  – Rules covering more words than the max-span limit
• Other rule forms
  – 3+ non-terminals
  – Consecutive non-terminals
  – Non-lexicalized rules
Example rules:
  X S1 O2 V3 # S1 V3 O2
  X PRO1 PRO2 aime bien # PRO1 like PRO2
Other Syntactic Models
• Ignore Syntax (occasionally)
Mixed-Syntax Model
• Tree-to-string model
  – Input is a parse tree
• Roles of non-terminals
  – Constrain the derivation to parse constituents
    • Can sometimes have no constraints
  – State information
    • Consistent node label on the target derivation
    • Hypotheses with different head non-terminals cannot be recombined
    • Target label is always X
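The recombination constraint above can be sketched as keyed deduplication: two hypotheses merge only when their full state matches, and in the mixed-syntax model that state includes the head non-terminal label. A minimal sketch; the hypothesis fields and names are illustrative, not from any particular decoder:

```python
from collections import namedtuple

# Toy hypothesis: source span, head non-terminal label of its derivation,
# language-model state, and model score (higher is better here).
Hyp = namedtuple("Hyp", "span head_label lm_state score")

def recombine(hyps):
    """Keep the best-scoring hypothesis per recombination state.
    The head label is part of the key, so hypotheses with different
    head non-terminals are never merged."""
    best = {}
    for h in hyps:
        key = (h.span, h.head_label, h.lm_state)
        if key not in best or h.score > best[key].score:
            best[key] = h
    return list(best.values())

hyps = [
    Hyp((0, 5), "VP", ("see", "him"), -4.2),
    Hyp((0, 5), "VP", ("see", "him"), -5.0),  # same state: recombined away
    Hyp((0, 5), "X",  ("see", "him"), -4.5),  # different head label: kept
]
print(len(recombine(hyps)))  # 2
```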
Mixed-Syntax Model: Example Translation Rules
• Naïve syntax model:
  VP VVFIN1 zu VVINF2 # to VVFIN2 VVINF1
• Mixed-syntax model:
  VP X1 zu VVINF2 # X to X2 X1
Extraction
• Allowed rules:
  – Max 3 non-terminals
  – Adjacent non-terminals
    • At least 1 non-terminal must be syntactic
  – Non-lexicalized rules
Example Rules Extracted

  Rule                            Fractional Count   p(t|s)
  Syntactic rules:
    VP NP1 VVINF2 # X X2 X1       167.3              68%
  Mixed rules:
    VP X1 VZ2 # X X2 X1           63.3               64%
    VP X1 zu VVINF2 # X to X2 X1  39.9               56%
    TOP NP1 X2 # X X1 X2          43.1               92%
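The extraction constraints listed above can be written as a small filter over a rule's source side. A sketch, assuming an illustrative token convention where upper-case tokens are non-terminal labels (slot indices dropped) and "X" is the generic label:

```python
def rule_allowed(src_rhs, max_nts=3):
    """Check the extraction constraints from the slide:
    - at most max_nts non-terminals on the source side;
    - adjacent non-terminals only if at least one is syntactic
      (i.e. not the generic X);
    - fully non-lexicalized rules are allowed.
    Convention (an assumption, not from the talk): upper-case tokens
    are non-terminals, everything else is a terminal word."""
    nts = [t for t in src_rhs if t.isupper()]
    if len(nts) > max_nts:
        return False
    for a, b in zip(src_rhs, src_rhs[1:]):
        if a == "X" and b == "X":   # adjacent, and neither is syntactic
            return False
    return True

print(rule_allowed(["NP", "VVINF"]))        # True: NP is syntactic
print(rule_allowed(["X", "X"]))             # False: adjacent generic NTs
print(rule_allowed(["X", "zu", "VVINF"]))   # True: NTs not adjacent
```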
Synchronous CFG
Input: [figure: source sentence]
Rules:
  S NP1 VP2 # NP1 VP2
  NP je # I
  PRO lui # him
  VB vu # see
  VP ne PRO1 VB2 pas # did not VB2 PRO1
Derivation: [figure: step-by-step derivation]
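The rules above are enough to run the derivation end to end. A minimal sketch (the input-sentence and derivation figures are not in the transcript; the sentence below is the one these rules generate; the "L:n" slot notation is this sketch's own convention):

```python
# Toy synchronous-CFG derivation using the rules from the slide.
# Each rule is (LHS, source RHS, target RHS); "L:n" marks
# non-terminal L in linked slot n.
RULES = [
    ("S",   ["NP:1", "VP:2"],               ["NP:1", "VP:2"]),
    ("NP",  ["je"],                          ["I"]),
    ("PRO", ["lui"],                         ["him"]),
    ("VB",  ["vu"],                          ["see"]),
    ("VP",  ["ne", "PRO:1", "VB:2", "pas"],  ["did", "not", "VB:2", "PRO:1"]),
]

def derive(symbol):
    """Expand `symbol` top-down, returning (source words, target words);
    a linked slot shares one sub-derivation on both sides, which is how
    the VP rule reorders PRO and VB on the target side."""
    _, src_rhs, tgt_rhs = next(r for r in RULES if r[0] == symbol)
    subs, src = {}, []
    for tok in src_rhs:
        if ":" in tok:                       # non-terminal slot
            nt, slot = tok.split(":")
            subs[slot] = derive(nt)
            src += subs[slot][0]
        else:                                # terminal word
            src.append(tok)
    tgt = []
    for tok in tgt_rhs:
        tgt += subs[tok.split(":")[1]][1] if ":" in tok else [tok]
    return src, tgt

src, tgt = derive("S")
print(" ".join(src))  # je ne lui vu pas
print(" ".join(tgt))  # I did not see him
```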
Mixed-Syntax Model
Input: [figure: parsed source sentence]
Rules:
  S NP1 VP2 # X NP1 VP2
  PRO je # X I
  PRO lui # X him
  VP ne X1 pas # did not X1
  VB vu # X see
  X PRO1 VB2 # X X2 X1
Derivation: [figure: step-by-step derivation]
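The key difference from the plain SCFG is visible in how rules are matched: a labelled non-terminal must align with a constituent of the input parse, while the generic X may cover any span. A toy bottom-up decoder for the rules above; the sentence and the parse constituents (including an NP over the PRO for "je") are reconstructed for illustration, since the slide's figure is not in the transcript:

```python
SENT = "je ne lui vu pas".split()

# Reconstructed input parse: span [i, j) -> labels covering it.
PARSE = {
    (0, 1): {"NP", "PRO"},
    (2, 3): {"PRO"},
    (3, 4): {"VB"},
    (1, 5): {"VP"},
    (0, 5): {"S"},
}

# (source LHS, source RHS, target RHS); "L:n" is non-terminal L in slot n.
# Target LHS labels (always X on the slide) are omitted in this sketch.
RULES = [
    ("S",   ["NP:1", "VP:2"],      ["NP:1", "VP:2"]),
    ("PRO", ["je"],                ["I"]),
    ("PRO", ["lui"],               ["him"]),
    ("VP",  ["ne", "X:1", "pas"],  ["did", "not", "X:1"]),
    ("VB",  ["vu"],                ["see"]),
    ("X",   ["PRO:1", "VB:2"],     ["X:2", "X:1"]),
]

def matches(rhs, i, j, chart):
    """Yield slot->translation bindings matching rhs against SENT[i:j]."""
    if not rhs:
        if i == j:
            yield {}
        return
    tok, rest = rhs[0], rhs[1:]
    if ":" in tok:                           # non-terminal slot
        label, slot = tok.split(":")
        for k in range(i + 1, j + 1):
            # a labelled NT must be a parse constituent; X matches any span
            if label != "X" and label not in PARSE.get((i, k), set()):
                continue
            if (i, k) in chart:
                for sub in matches(rest, k, j, chart):
                    yield {slot: chart[(i, k)], **sub}
    elif i < j and SENT[i] == tok:           # terminal word
        yield from matches(rest, i + 1, j, chart)

def decode():
    """Naive fixpoint over spans; keeps one derivation per span (toy)."""
    chart, changed = {}, True
    while changed:
        changed = False
        for lhs, src, tgt in RULES:
            for i in range(len(SENT)):
                for j in range(i + 1, len(SENT) + 1):
                    if (i, j) in chart:
                        continue
                    # rule's own LHS must also be licensed by the parse
                    if lhs != "X" and lhs not in PARSE.get((i, j), set()):
                        continue
                    for subs in matches(src, i, j, chart):
                        chart[(i, j)] = " ".join(
                            subs[t.split(":")[1]] if ":" in t else t
                            for t in tgt
                        )
                        changed = True
                        break
    return chart[(0, len(SENT))]

print(decode())  # I did not see him
```

Note how the X rule stitches "lui" and "vu" into one reordered unit ("see him") over a span that is not a parse constituent, which the VP rule then consumes through its unconstrained X slot.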
Experiment: German-English

  Corpus                German      English
  Train  Sentences      82,306
         Words          2,034,373   1,965,325
  Tune   Sentences      2,000
  Test   Sentences      1,026

Trained on: News Commentary 2009
Tuned on: held-out set
Tested on: nc-test2007
Results (using constituent parses)

German-English
  Model            # rules   %BLEU
  Hierarchical     61.2m     15.9
  Tree-to-String   4.7m      14.9
  Mixed Syntax     128.7m    16.7

English-German
  Model            # rules   %BLEU
  Hierarchical     84.6m     10.2
  Mixed Syntax     175.0m    10.6
Example: Hierarchical Model
Output: according to János Veres , this would be in the first quarter of 2008 possible .
Example: Mixed-Syntax Model
Output: according to János Veres this would be possible in the first quarter of 2008 .
Input: laut János Veres wäre dies im ersten Quartal 2008 möglich .
  POS: APPR NE NE VAFIN PDS APPRART ADJA NN CARD ADJA PUNC
  [figure: derivation over the input parse (TOP, S, PP, PN), using labelled rules such as [NE,1] [NE,2], [PDS,2] [VAFIN,1], [ADJA,1] [NN,2], [APPRART,1] [X,2] [CARD,3], [X,2] [PP,1] [PUNC,3], and generic [X,1] [X,2] rules]
Chunk Tags
• Advantages of shallow tags
  – Don't need a treebank
  – More reliable
• Disadvantages
  – Not a tree structure
    • but we don't rely on tree structure
Results: Shallow Tags (German-English)

  Model          # rules   %BLEU
  Hierarchical   64.3m     16.3
  Mixed Syntax   254.5m    16.8
Larger Training Corpus: German-English

                          German       English      Corpus
  Train  Sentences        1,446,224                 Europarl v5
         Words            37,420,876   39,464,626
  Tune   Sentences        1,910                     dev2006
  Test   (in-domain)      1,920                     nc-test2007 v2
         (out-of-domain)  1,042                     devtest2006
Larger Training Corpus: German-English

  Model                          # rules   In-domain (BLEU)   Out-of-domain (BLEU)
  Hierarchical                   500m      22.1               16.5
  Mixed Syntax (original)        2664m     21.6               16.3
  Mixed Syntax (new extraction)  1435m     22.7               17.8
Create Your Own Labels: Dumb Labels
Example: ich bitte Sie , sich zu einer Schweigeminute zu erheben .
[figure: arbitrary span labels (e.g. "Sz", "e") assigned over the sentence]
  Model         In-domain (BLEU)   Out-of-domain (BLEU)
  Hierarchical  22.1               16.5
  Dumb Labels   22.0               16.3
Create Your Own Labels: Labels Motivated by Reordering

Labelling patterns:
  1. VMFIN ... VVINF EOS
  2. VVINF und ... VVINF
  3. VAFIN ... (VVPP or VVINF) EOS
  4. , PRELS ... VVINF EOS
  5. EOS ... zu VVINF

Examples:
  ich bitte Sie , sich zu einer Schweigeminute zu erheben .  → label 5
  … werde ich dem Vorschlag von Herrn Evans folgen .  → label 3
Create Your Own Labels: Labels Motivated by Reordering

  Model              In-domain (BLEU)   Out-of-domain (BLEU)
  Hierarchical       22.1               16.5
  Dumb Labels        22.0               16.3
  Reordering Labels  22.1               16.9
Conclusion
• Mixed-Syntax Model
  – SCFG-based decoding
  – Hierarchical phrase-based vs. tree-to-string
  – Generality vs. specificity
• Syntax models
  – Many variations
  – Won't automatically make MT better
  – Questions:
    • Which syntactic information?
    • How do we use it?
    • Why use it?

END