
Syntactic Reordering of Source Sentences for Statistical Machine Translation

Mohammad Sadegh Rasooli

Columbia University

[email protected]

April 9, 2013

M. S. Rasooli (Columbia University) Syntactic Reordering for SMT April 9, 2013 1 / 27

Overview

1 First Paper: Collins et al. (2005)
    The Role of Syntax in SMT
    Syntactic Preprocessing Approaches
    Clause Restructuring
    Experiments
    Discussion

2 Second Paper: P. Xu et al. (2009)
    Approaches to Syntactic Reordering
    Translation Between SVO and SOV Languages
    Precedence Reordering Based on a Dependency Parser
    Experiments
    Discussion

First Paper

M. Collins, et al.: Clause Restructuring for Statistical Machine Translation. ACL 2005.

The Role of Syntax in SMT

In the original phrase-based SMT, syntax is not taken into account.

Phrase-based systems have limited potential to model word-order differences between languages.

The word-order differences between languages are treated as distortion.
    Each reordering adds distortion penalties to the overall score of the translation model.
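
To make the distortion penalty concrete, here is a minimal sketch of the widely used distance-based distortion cost for phrase-based decoding. The slides do not spell out the formula, so the function name `distortion_cost` and the span representation are assumptions made for illustration only.

```python
def distortion_cost(spans):
    """Sum of jump distances for source phrases listed in target order.

    Each span is a (start, end) pair of 1-indexed, inclusive source positions.
    The cost of one phrase is |start - previous_end - 1|, so a monotone
    (left-to-right) translation costs nothing.
    """
    cost = 0
    prev_end = 0  # position just before the first source word
    for start, end in spans:
        cost += abs(start - prev_end - 1)
        prev_end = end
    return cost


# Monotone order: no jumps.
monotone = distortion_cost([(1, 2), (3, 5)])

# Word 2 is translated last, so the decoder jumps forward and then back.
reordered = distortion_cost([(1, 1), (3, 5), (2, 2)])

print(monotone, reordered)
```

Decoders typically turn this distance into a score penalty, which is why long-range movement such as German verb placement is expensive for a plain phrase-based system.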

Example: German vs. English Word Order

English

I will pass on to you the corresponding comments, so that you can adopt them perhaps in the vote.

German

I will to you the corresponding comments pass on, so that you them perhaps in the vote adopt can.

Research on Syntax in MT

Changing the word order of one of the languages or both, to make their word order more similar to each other.

Syntax-Based MT Approaches

    Make use of bitext grammars to parse both parts.
    Change the syntax of target language alone.
    Transform the translation problem into a parsing problem.

Reranking methods

    Select between N-best results of the phrase-based system, using syntactic information.

Preprocessing Approaches

    The source language sentences are modified before translation.
    This approach is used in this paper.

Syntactic Preprocessing Approaches

English

I will pass on to you the corresponding comments, so that you can adopt them perhaps in the vote.

German

I will to you the corresponding comments pass on, so that you them perhaps in the vote adopt can.

German (Preprocessed)

I will pass on to you the corresponding comments, so that you can adopt them perhaps in the vote.

Clause Restructuring

Steps (both in training and decoding)
    1 Parse the source sentence.
    2 Apply reordering rules on the source sentence.
    3 Use phrase-based models.
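
The three steps can be sketched on a toy example. Everything below is a hypothetical illustration: the tuple-based tree format, the tag names, and the single `verb_initial` rule (move the verb to the front of its verb phrase) are assumptions, not the paper's actual rule set or parser output. Step 3 is only indicated in a comment.

```python
def is_verb_leaf(node):
    """A leaf is (tag, word); treat tags starting with 'V' as verbs."""
    tag, content = node
    return isinstance(content, str) and tag.startswith("V")


def verb_initial(tree):
    """Toy reordering rule: inside every VP, move verb leaves to the front."""
    label, children = tree
    if isinstance(children, str):  # leaf node, nothing to reorder
        return tree
    children = [verb_initial(child) for child in children]
    if label == "VP":
        verbs = [c for c in children if is_verb_leaf(c)]
        others = [c for c in children if not is_verb_leaf(c)]
        children = verbs + others
    return (label, children)


def leaves(tree):
    """Read the words off the tree from left to right."""
    _, children = tree
    if isinstance(children, str):
        return [children]
    return [word for child in children for word in leaves(child)]


# Step 1 (assumed done by a parser): an English gloss of German word order.
tree = ("S", [
    ("PRP", "I"),
    ("MD", "will"),
    ("VP", [
        ("PP", [("TO", "to"), ("PRP", "you")]),
        ("NP", [("DT", "the"), ("NN", "comments")]),
        ("VB", "pass"),
    ]),
])

# Step 2: apply the reordering rule to the source tree.
reordered_words = leaves(verb_initial(tree))
print(" ".join(reordered_words))

# Step 3 (not shown): train and decode a phrase-based system on the
# reordered source sentences instead of the original ones.
```

Because the same transformation is applied to both the training data and the test input, the phrase-based model only ever sees source text in the new order.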

Example Parse Tree

Six Reordering Rules in German

Experiments

Experimental setup

    Data: Europarl corpus.
    751,088 parallel sentences.
    Evaluation on 2,000 sentences.
    Average sentence length: 28 words.
    Baseline: phrase-based system with no reordering.

Results (BLEU score)

    Baseline: 25.2%
    Reordering: 26.8%

Human Translation Judgments

Two annotators judged 100 sentences (10 to 20 words in length; chosen at random).

Three versions: human, baseline, reordered.

Judgments: better, equal, or worse.

             Better  Equal  Worse
Annotator 1  40%     40%    20%
Annotator 2  44%     37%    19%

Example Output

Human

i think it is wrong in principle to have such measures in the european union.

Reordered

i believe that it is wrong in principle to take such measures in the european union.

Baseline

i believe that it is wrong in principle such measures in the european union to take.

BLEU Statistical Significance

The authors use the sign test for statistical significance.

    f(x) is + if the output is better than the baseline, - if it is worse, and = if it is equal.
    p+: probability that f(x) is +; p−: probability that f(x) is -.

BLEU does not have per-sentence evaluation.

The authors create an artificial comparison:

    s: BLEU score of the baseline.
    s_i: BLEU score of the baseline output, except that sentence i is translated by the reordered model.
    f(x) is + if s_i > s; f(x) is - if s_i < s.

52.85% improved, 36.4% worse than baseline and 10.75% equal.

With 95% confidence, this method improves on the baseline.
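
As a rough check of the significance claim, the sketch below runs an exact two-sided sign test. Two assumptions are made here that the slides do not state: the percentages are taken over the 2,000 evaluation sentences (giving 1,057 improved and 728 worse), and ties are dropped before testing, which is a common convention. The function name `sign_test_p_value` is hypothetical.

```python
from math import comb


def sign_test_p_value(n_plus, n_minus):
    """Exact two-sided sign test under H0: + and - are equally likely.

    Ties are assumed to have been removed already.
    """
    n = n_plus + n_minus
    k = max(n_plus, n_minus)
    # Probability mass of outcomes at least as extreme as k, in one tail.
    tail = sum(comb(n, j) for j in range(k, n + 1))
    # Double the tail for a two-sided test; cap at 1.
    return min(1.0, 2 * tail / 2 ** n)


# Assumed counts: 52.85% and 36.4% of 2,000 sentences (10.75% ties dropped).
n_improved = 1057
n_worse = 728

p_value = sign_test_p_value(n_improved, n_worse)
print(p_value < 0.05)
```

Under these assumptions the p-value falls well below 0.05, in line with the 95% confidence statement above.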

Discussion

The method clearly improves the baseline.

The rules are language-specific (they cannot even be used for English-to-German translation).

The authors did not try to learn reordering rules automatically.

Second Paper

P. Xu, et al., Using a dependency parser to improve SMT for subject-object-verb languages. NAACL 2009.

Approaches to Syntactic Reordering

Explicitly model phrase reordering distances, e.g. distance-based distortion models.

Incorporate syntactic analysis of the target language into both modeling and decoding.

Reorder source sentences based on syntactic analysis.

    This paper uses this approach.

Translation Between SVO and SOV Languages

Subject-Verb-Object (SVO) and Subject-Object-Verb (SOV) are two common word orders among the world's languages.

English is SVO and Korean is SOV.

“John hit the ball.” vs. “John the ball hit.”

When sentences get longer, the cost of moving structures during decoding (in phrase-based models) can be quite high.

“English is used as the first or second language in many countries around the world.”
    → “is used” has to skip 13 words to reach the end of the sentence.

Precedence Reordering Based on a Dependency Parser

The children of each word have some relative ordering.

A precedence reordering rule is a mapping from T to a set of tuples {(L, W, O)}.

    T: POS tag
    L: Dependency label
    W: Weight indicating the order (highest to lowest)
        Children with the same weight are ordered according to the order defined in the rule.
        Why not explicitly pre-define unequal weights?
    O: Order type
        NORMAL: preserve the original order
        REVERSE: flip the order

If a node is not listed in the rules, W = 0 and O = NORMAL.

Use self to refer to the head node itself.

Punctuation and conjunctions disallow movement across them.
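
The rule format can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the `Node` class, the `reorder` function, and the weights in `RULES` are hypothetical values chosen so that the toy sentence comes out in SOV order, and the restriction on moving across punctuation and conjunctions is not modeled.

```python
from dataclasses import dataclass, field

NORMAL, REVERSE = "NORMAL", "REVERSE"


@dataclass
class Node:
    word: str
    pos: str      # POS tag (T in the rule definition)
    label: str    # dependency label to the parent (L)
    index: int    # position in the original sentence
    children: list = field(default_factory=list)


# Hypothetical rule table: POS tag -> {dependency label: (weight, order type)}.
RULES = {
    "VB": {
        "nsubj": (0, NORMAL),
        "dobj": (-1, NORMAL),
        "aux": (-2, REVERSE),
        "self": (-2, REVERSE),
    }
}


def reorder(node, rules):
    """Return the words of the subtree after precedence reordering."""
    rule = rules.get(node.pos, {})
    default = (0, NORMAL)  # unlisted labels: W = 0, O = NORMAL
    items = [(rule.get(c.label, default), c.index, c) for c in node.children]
    items.append((rule.get("self", default), node.index, None))  # head itself

    def sort_key(item):
        (weight, order), index, _ = item
        # Higher weight first; equal weights keep or flip the original order.
        return (-weight, index if order == NORMAL else -index)

    words = []
    for _, _, child in sorted(items, key=sort_key):
        words.extend([node.word] if child is None else reorder(child, rules))
    return words


# "John can hit the ball" as a dependency tree headed by "hit".
ball = Node("ball", "NN", "dobj", 4, [Node("the", "DT", "det", 3)])
hit = Node("hit", "VB", "root", 2, [
    Node("John", "NNP", "nsubj", 0),
    Node("can", "MD", "aux", 1),
    ball,
])

sov_order = " ".join(reorder(hit, RULES))
print(sov_order)
```

Giving `aux` and `self` the same weight with the REVERSE order type is what places the main verb before the auxiliary at the end of the clause.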

Precedence Reordering Based on a Dependency Parser

After applying the precedence rule, the sentence becomes: John the ball hit can.

Novelties in This Work

1 This model is more efficient than its counterpart.

2 Outperforms the state-of-the-art (stronger baseline).

3 It is not restricted to one language pair.

4 It is possible to automatically learn precedence rules.

5 They use dependency parse trees rather than constituency trees.

Experiments

English to 5 SOV languages.

Baseline: Maximum entropy based lexicalized phrase reordering model.

    Maximum allowed reordering: 10.

Parser: Deterministic transition-based dependency parser.

    Parses in linear time.

Another baseline: Hierarchical phrase-based system.

    Can capture long distance reordering by using a PCFG model.
    Uses chart parsing during decoding: slower than deterministic dependency parser.

9.5K English sentences (from web) as evaluation data.

    3,500 sentences for dev (to perform MERT).
    1,000 sentences for test.
    5,000 sentences for blind test.

Experiments

Results

Results

Discussion

Reordering of languages with different word orders is essential.

The method seems to work fine for 5 languages.

Although the authors claim that the rules can be extracted automatically, they did not try.

The improvement of the basic system over the hierarchical phrase-based system is not significant.

Thanks!
