mixed syntax translation hieu hoang mt marathon 2011 trento

75
MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Upload: jacob-byrne

Post on 28-Mar-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

MIXED SYNTAX TRANSLATION

Hieu HoangMT Marathon 2011 Trento

Page 2: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Contents• What is a syntactic model?

• What’s wrong with Syntax?

• Which syntax model to use?

• Why use syntactic models?

• Mixed-Syntax Model– Extraction– Decoding– Results

• Future Work

Page 3: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

What is a syntactic model?

• Hierarchical Phrase-Based Model– String-to-string– Non-terminals are unlabelled

• Tree-to-string Model– Source non-terminals are labelled• match input parse tree

X habe X1 gegessen # have eaten X1

S habe NP1 gegessen # have eaten NP1

Page 4: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

What is a syntactic model?

• Hierarchical Phrase-Based Model– String-to-string– Non-terminals are unlabelled

• Tree-to-string Model– Source non-terminals are labelled• match input parse tree

X habe X1 gegessen # have eaten X1

S habe NP1 gegessen # have eaten NP1

Page 5: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

What is a syntactic model?

• Hierarchical Phrase-Based Model– String-to-string– Non-terminals are unlabelled

• Tree-to-string Model– Source non-terminals are labelled• match input parse tree

X habe X1 gegessen # have eaten X1

S habe NP1 gegessen # have eaten NP1

Page 6: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

What is a syntactic model?

• Hierarchical Phrase-Based Model– String-to-string– Non-terminals are unlabelled

• Tree-to-string Model– Source non-terminals are labelled• match input parse tree

X habe X1 gegessen # have eaten X1

S habe NP1 gegessen # have eaten NP1

Page 7: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

What is a syntactic model?

• Hierarchical Phrase-Based Model– String-to-string– Non-terminals are unlabelled

• Tree-to-string Model– Source non-terminals are labelled• match input parse tree

X habe X1 gegessen # have eaten X1

S habe NP1 gegessen # have eaten NP1

Page 8: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Contents• What is a syntactic model?

• What’s Wrong with Syntax?

• Which syntax model to use?

• Why use syntactic models?

• Mixed-Syntax Model– Extraction– Decoding– Results

• Future Work

Page 9: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

What’s Wrong with Syntax?

(Ambati and Lavie, 2009)

Evaluation of French-English MT System

BLEU METEOR

Tree-to-string 27.02 57.68

Tree-to-tree 22.23 54.05

Moses (phrase-based) 30.18 58.13

Page 10: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Hierarchical Model

according to János Veres , this would be in the first quarter of 2008 possible .

Page 11: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Hierarchical Model

according to János Veres , this would be in the first quarter of 2008 possible .

Page 12: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Hierarchical Model

according to János Veres , this would be in the first quarter of 2008 possible .

Page 13: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Tree-to-String Model

1 2 3 4 5 6 7 8 9 10 11laut János Veres wäre dies im ersten Quartal 2008 möglich .

PN

PP PP

S

TOP

APPR NE NE VAFIN PDS APPRART ADJA NN CARD ADJA PUNC

Page 14: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Tree-to-String Model

PN

PP PP

S

TOP

APPR NE NE VAFIN PDS APPRART ADJA NN CARD ADJA PUNC

according to János Veres would be this in the first quarter of2008 possible .

Page 15: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Tree-to-String Model

PN

PP PP

S

TOP

APPR NE NE VAFIN PDS APPRART ADJA NN CARD ADJA PUNC

according to János Veres would be this in the first quarter of2008 possible .

[X,1] [X,2]

[X,1] [X,2]

[X,1] [X,2]

[X,1] [X,2]

[X,1] [X,2]

[X,1] [X,2]

[X,1] [X,2]

[X,1] [X,2]

[X,1] [X,2]

Page 16: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Contents

• What’s Wrong with Syntax?

• Which syntax model to use?

• Why use syntactic models?

• Mixed-Syntax Model– Extraction– Decoding– Results

• Future Work

Page 17: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Other Syntactic Models• Syntax-Augmented MT (SAMT)

– Not constrained only to parse tree– (Zollmann and Venugopal, 2006)

• Binarization– Restructure and relabel parse parse tree– (Wang et al, 2010)

• Forest-based translation– Recover from parse errors– (Mi et al, 2008)

• Soft constraint– Reward/Penalize derivations which follows parse structure– (Chiang 2010)

Page 18: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Other Syntactic Models• Syntax-Augmented MT (SAMT)

– Not constrained only to parse tree– (Zollmann and Venugopal, 2006)

• Binarization– Restructure and relabel parse parse tree– (Wang et al, 2010)

• Forest-based translation– Recover from parse errors– (Mi et al, 2008)

• Soft constraint– Reward/Penalize derivations which follows parse structure– (Chiang 2010)

Page 19: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Other Syntactic Models• Syntax-Augmented MT (SAMT)

– Not constrained only to parse tree– (Zollmann and Venugopal, 2006)

• Binarization– Restructure and relabel parse parse tree– (Wang et al, 2010)

• Forest-based translation– Recover from parse errors– (Mi et al, 2008)

• Soft constraint– Reward/Penalize derivations which follows parse structure– (Chiang 2010)

Page 20: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Other Syntactic Models• Syntax-Augmented MT (SAMT)

– Not constrained only to parse tree– (Zollmann and Venugopal, 2006)

• Binarization– Restructure and relabel parse parse tree– (Wang et al, 2010)

• Forest-based translation– Recover from parse errors– (Mi et al, 2008)

• Soft constraint– Reward/Penalize derivations which follows parse structure– (Chiang 2010)

Page 21: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Other Syntactic Models• Syntax-Augmented MT (SAMT)

– Not constrained only to parse tree– (Zollmann and Venugopal, 2006)

• Binarization– Restructure and relabel parse parse tree– (Wang et al, 2010)

• Forest-based translation– Recover from parse errors– (Mi et al, 2008)

• Soft constraint– Reward/Penalize derivations which follows parse structure– (Chiang 2010)

Page 22: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Contents

• What’s Wrong with Syntax?

• Which syntax model to use?

• Why use syntactic models?

• Mixed-Syntax Model– Extraction– Decoding– Results

• Future Work

Page 23: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Why Use Syntactic Models?

• Decrease decoding time– Derivation constrained by source parse tree

Page 24: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Why Use Syntactic Models?

• Decrease decoding time– Derivation constrained by source parse tree

• Long-range reordering during decoding– rules covering more words than max-span limit

Page 25: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Why Use Syntactic Models?

• Decrease decoding time– Derivation constrained by source parse tree

• Long-range reordering during decoding– rules covering more words than max-span limit

• Other rule-forms– 3+ non-terminals– consecutive non-terminals– non-lexicalized rules

Page 26: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Why Use Syntactic Models?

• Decrease decoding time– Derivation constrained by source parse tree

• Long-range reordering during decoding– rules covering more words than max-span limit

• Other rule-forms– 3+ non-terminals– consecutive non-terminals– non-lexicalized rules

X S1O2V3 # S1V3O2

X PRO1 PRO2 aime bien # PRO1 like PRO2

Page 27: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Contents

• What’s Wrong with Syntax?

• Which syntax model to use?

• Why use syntactic models?

• Mixed-Syntax Model– Extraction– Decoding– Results

• Future Work

Page 28: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Other Syntactic Models• Syntax-Augmented MT (SAMT)

– Not constrained only to parse tree– (Zollmann and Venugopal, 2006)

• Binarization– Restructure and relabel parse parse tree– (Wang et al, 2010)

• Forest-based translation– Recover from parse errors– (Mi et al, 2008)

• Soft constraint– Reward/Penalize derivations which follows parse structure– (Chiang 2010)

Page 29: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Other Syntactic Models• Syntax-Augmented MT (SAMT)

– Not constrained only to parse tree– (Zollmann and Venugopal, 2006)

• Binarization– Restructure and relabel parse parse tree– (Wang et al, 2010)

• Forest-based translation– Recover from parse errors– (Mi et al, 2008)

• Soft constraint– Reward/Penalize derivations which follows parse structure– (Chiang 2010)

• Ignore Syntax (occasionally)

Page 30: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Mixed-Syntax Model

• Tree-to-string model– input is a parse tree

• Roles of non-terminals– Constrain derivation to parse constituents– State information• Consistent node label on target derivation• hypotheses with different head NT cannot be

recombined

Page 31: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Mixed-Syntax Model

• Tree-to-string model– input is a parse tree

• Roles of non-terminals– Constrain derivation to parse constituents

• Can sometime have no constraints

– State information• Consistent node label on target derivation• hypotheses with different head NT cannot be

recombined• always X

Page 32: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Mixed-Syntax Model

• Naïve syntax model

• Mixed-Syntax Model

Example Translation Rules

VP VVFIN1 zu VVINF2 # to VVFIN2 VVINF1

VP X1 zu VVINF2 # X to X2 X1

Page 33: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Mixed-Syntax Model

• Naïve syntax model

• Mixed-Syntax Model

Example Translation Rules

VP VVFIN1 zu VVINF2 # to VVFIN2 VVINF1

VP X1 zu VVINF2 # X to X2 X1

Page 34: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Mixed-Syntax Model

• Naïve syntax model

• Mixed-Syntax Model

Example Translation Rules

VP VVFIN1 zu VVINF2 # to VVFIN2 VVINF1

VP X1 zu VVINF2 # X to X2 X1

Page 35: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Contents

• What’s Wrong with Syntax?

• Which syntax model to use?

• Why use syntactic models?

• Mixed-Syntax Model– Extraction– Decoding– Results

• Future Work

Page 36: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Extraction

• Allow rules– Max 3 non-terminals– Adjacent non-terminals• At least 1 NT must be syntactic

– Non-lexicalized rules

Page 37: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Example Rules ExtractedRule Factional Count p( t | s)

Syntactic Rules

VP NP1 VVINF2 # XX2 X1167.3 68%

Mixed Rules

VP X1 VZ2 # X X2 X163.3 64%

VP X1 zu VVINF2 # X to X2 X139.9 56%

TOP NP1 X2 # X X1 X243.1 92%

Page 38: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Contents

• What’s Wrong with Syntax?

• Which syntax model to use?

• Why use syntactic models?

• Mixed-Syntax Model– Extraction– Decoding– Results

• Future Work

Page 39: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Synchronous CFGInput:

Page 40: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Synchronous CFGInput:

Rules:S NP1 VP2 # NP1 VP2

NP je # I

PRO lui # him

VB vu # see

VP ne PRO1 VB2 pas # did not VB2 PRO1

Page 41: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Synchronous CFGInput:

Rules:S NP1 VP2 # NP1 VP2

NP je # I

PRO lui # him

VP ne PRO1 VB2 pas # did not VB2 PRO1

Derivation:

VB vu # see

Page 42: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Synchronous CFGInput:

Rules:S NP1 VP2 # NP1 VP2

NP je # I

PRO lui # him

VP ne PRO1 VB2 pas # did not VB2 PRO1

Derivation:

VB vu # see

Page 43: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Synchronous CFGInput:

Rules:S NP1 VP2 # NP1 VP2

NP je # I

PRO lui # him

VP ne PRO1 VB2 pas # did not VB2 PRO1

Derivation:

VB vu # see

Page 44: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Synchronous CFGInput:

Rules:S NP1 VP2 # NP1 VP2

NP je # I

PRO lui # him

VP ne PRO1 VB2 pas # did not VB2 PRO1

Derivation:

VB vu # see

Page 45: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Synchronous CFGInput:

Rules:S NP1 VP2 # NP1 VP2

NP je # I

PRO lui # him

VP ne PRO1 VB2 pas # did not VB2 PRO1

Derivation:

VB vu # see

Page 46: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Synchronous CFGInput:

Rules:S NP1 VP2 # NP1 VP2

NP je # I

PRO lui # him

VP ne PRO1 VB2 pas # did not VB2 PRO1

Derivation:

VB vu # see

Page 47: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Mixed-Syntax ModelInput:

Page 48: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Mixed-Syntax ModelInput:

Rules:S NP1 VP2 # X NP1 VP2

PRO je # X I

PRO lui # X him

VP ne X1 pas # did not X1

VB vu # X see

X PRO1 VB2 # X X2 X1

Page 49: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Mixed-Syntax ModelInput:

Rules:S NP1 VP2 # X NP1 VP2

PRO je # X I

PRO lui # X him

VP ne X1 pas # did not X1

VB vu # X see

X PRO1 VB2 # X X2 X1

Derivation:

Page 50: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Mixed-Syntax ModelInput:

Rules:S NP1 VP2 # X NP1 VP2

PRO je # X I

PRO lui # X him

VP ne X1 pas # did not X1

VB vu # X see

X PRO1 VB2 # X X2 X1

Derivation:

Page 51: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Mixed-Syntax ModelInput:

Rules:S NP1 VP2 # X NP1 VP2

PRO je # X I

PRO lui # X him

VP ne X1 pas # did not X1

VB vu # X see

X PRO1 VB2 # X X2 X1

Derivation:

Page 52: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Mixed-Syntax ModelInput:

Rules:S NP1 VP2 # X NP1 VP2

PRO je # X I

PRO lui # X him

VP ne X1 pas # did not X1

VB vu # X see

X PRO1 VB2 # X X2 X1

Derivation:

Page 53: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Mixed-Syntax ModelInput:

Rules:S NP1 VP2 # X NP1 VP2

PRO je # X I

PRO lui # X him

VP ne X1 pas # did not X1

VB vu # X see

X PRO1 VB2 # X X2 X1

Derivation:

Page 54: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Mixed-Syntax ModelInput:

Rules:S NP1 VP2 # X NP1 VP2

PRO je # X I

PRO lui # X him

VP ne X1 pas # did not X1

VB vu # X see

X PRO1 VB2 # X X2 X1

Derivation:

Page 55: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Contents

• What’s Wrong with Syntax?

• Which syntax model to use?

• Why use syntactic models?

• Mixed-Syntax Model– Extraction– Decoding– Results

• Future Work

Page 56: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

ExperimentGerman-English

Corpus

German English

Train Sentences 82,306

Words 2,034,373 1,965,325

Tune Sentences 2000

Test Sentences 1026

Trained: News Commentary 2009Tuned: held out setTested: nc test2007

Page 57: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Results

Model # rules %BLEU

Hierarchical 61.2m 15.9

Tree-to-String 4.7m 14.9

Mixed Syntax 128.7m 16.7

Using constituent parseGerman-English

Model # rules %BLEU

Hierarchical 84.6m 10.2

Mixed Syntax 175.0m 10.6

English-German

Page 58: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Example

according to János Veres , this would be in the first quarter of 2008 possible .

Hierarchical Model

Page 59: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

ExampleMixed-Syntax Model

according to János Veres this would be possible in the first quarter of 2008.

.

possible

2008quarter of

[APPRART,1] [X,2] [CARD,3]

thiswould beVeresJános

according to

[X,1] [X,2]

[X,1] [X,2]

[X,1] [X,2]

first

[ADJA,1] [NN,2]

in the

[X,2] [PP,1] [PUNC,3]

[PDS,2] [VAFIN,1]

[NE,1] [NE,2]

1 2 3 4 5 6 7 8 9 10 11

laut János Veres wäre dies im ersten Quartal 2008 möglich .

APPR NE NE VAFIN PDS APPRART ADJA NN CARD ADJA PUNC

PN PP

PP

S

TOP

Page 60: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Example

according to János Veres this would be possible in the first quarter of 2008.

.

possible

2008quarter of

[APPRART,1] [X,2] [CARD,3]

thiswould beVeresJános

according to

[X,1] [X,2]

[X,1] [X,2]

[X,1] [X,2]

first

[ADJA,1] [NN,2]

in the

[X,2] [PP,1] [PUNC,3]

[PDS,2] [VAFIN,1]

[NE,1] [NE,2]

1 2 3 4 5 6 7 8 9 10 11

laut János Veres wäre dies im ersten Quartal 2008 möglich .

APPR NE NE VAFIN PDS APPRART ADJA NN CARD ADJA PUNC

PN PP

PP

S

TOP

Mixed Syntax

Page 61: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Example

according to János Veres this would be possible in the first quarter of 2008.

.

possible

2008quarter of

[APPRART,1] [X,2] [CARD,3]

thiswould beVeresJános

according to

[X,1] [X,2]

[X,1] [X,2]

[X,1] [X,2]

first

[ADJA,1] [NN,2]

in the

[X,2] [PP,1] [PUNC,3]

[PDS,2] [VAFIN,1]

[NE,1] [NE,2]

1 2 3 4 5 6 7 8 9 10 11

laut János Veres wäre dies im ersten Quartal 2008 möglich .

APPR NE NE VAFIN PDS APPRART ADJA NN CARD ADJA PUNC

PN PP

PP

S

TOP

Mixed Syntax

Page 62: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Example

according to János Veres this would be possible in the first quarter of 2008.

.

possible

2008quarter of

[APPRART,1] [X,2] [CARD,3]

thiswould beVeresJános

according to

[X,1] [X,2]

[X,1] [X,2]

[X,1] [X,2]

first

[ADJA,1] [NN,2]

in the

[X,2] [PP,1] [PUNC,3]

[PDS,2] [VAFIN,1]

[NE,1] [NE,2]

1 2 3 4 5 6 7 8 9 10 11

laut János Veres wäre dies im ersten Quartal 2008 möglich .

APPR NE NE VAFIN PDS APPRART ADJA NN CARD ADJA PUNC

PN PP

PP

S

TOP

Mixed Syntax

Page 63: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Example

according to János Veres this would be possible in the first quarter of 2008.

.

possible

2008quarter of

[APPRART,1] [X,2] [CARD,3]

thiswould beVeresJános

according to

[X,1] [X,2]

[X,1] [X,2]

[X,1] [X,2]

first

[ADJA,1] [NN,2]

in the

[X,2] [PP,1] [PUNC,3]

[PDS,2] [VAFIN,1]

[NE,1] [NE,2]

1 2 3 4 5 6 7 8 9 10 11

laut János Veres wäre dies im ersten Quartal 2008 möglich .

APPR NE NE VAFIN PDS APPRART ADJA NN CARD ADJA PUNC

PN PP

PP

S

TOP

Mixed Syntax

Page 64: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Chunk Tags

• Advantages of Shallow Tags– Don’t need Treebank– More reliable

• Disadvantages– Not a tree structure• We don’t rely on tree structure

Page 65: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

ResultsShallow Tags

German-EnglishModel # rules %BLEU

Hierarchical 64.3m 16.3

Mixed Syntax 254.5m 16.8

Page 66: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Larger Training CorpusGerman-English

Corpus

German English Corpus

Train Sentences 1,446,224 Europarl v5

Words 37,420,876 39,464,626

Tune Sentences 1910 dev2006

Test (in-domain)(out-of-domain)

Sentences19201042

nc test2007 v2 devtest2006

Page 67: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Larger Training CorpusGerman-English

Model # rules In-domain (BLEU)

Out-of-domain(BLEU)

Hierarchical 500m 22.1 16.5

Mixed Syntax (original) 2664m 21.6 16.3

Mixed Syntax (new extraction)

1435m 22.7 17.8

Page 68: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Contents

• What’s Wrong with Syntax?

• Which syntax model to use?

• Why use syntactic models?

• Mixed-Syntax Model– Extraction– Decoding– Results

• Future Work

Page 69: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Create your own labelDumb labels

ich bitte Sie , sich zu einer Schweigeminute zu erheben .

Page 70: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Create your own labelDumb labels

ich bitte Sie , sich zu einer Schweigeminute zu erheben .

Sz

e

Page 71: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Create your own labelsDumb labels

ich bitte Sie , sich zu einer Schweigeminute zu erheben .

Sz

e

Model In-domain (BLEU)

Out-of-domain(BLEU)

Hierarchical 22.1 16.5

Dumb Labels 22.0 16.3

Page 72: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Create your own labelsLabels motivated by reordering

Labelling patterns:

1. VMFIN...VVINF EOS2. VVINF und ... VVINF3. VAFIN ... (VVPP or VVINF) EOS4. , PRELS ... VVINF EOS5. EOS ... zu VVINF

Example:ich bitte Sie , sich zu einer Schweigeminute zu erheben .

label 5… werde ich dem Vorschlag von Herrn Evans folgen .

label 3

Page 73: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Create your own labels

Labels motivated by reordering

Model In-domain (BLEU)

Out-of-domain(BLEU)

Hierarchical 22.1 16.5

Dumb Labels 22.0 16.3

Reordering Labels 22.1 16.9

Page 74: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

Conclusion• Mixed-Syntax Model– SCFG-based decoding– Hierarchical phrase-based v. tree-to-string– Generality v. specificity

• Syntax Models– Many variations– Won’t automatically make MT better– Question

• which syntactic information?• how do we use it?• why use it?

Page 75: MIXED SYNTAX TRANSLATION Hieu Hoang MT Marathon 2011 Trento

END