Decoding in SMT and more
Sara Stymne, September 24, 2018
Partly based on slides by Christian Hardmeier and Jörg Tiedemann


Page 1

Decoding in SMT and more

Sara Stymne, September 24, 2018

Partly based on slides by Christian Hardmeier and Jörg Tiedemann

Page 2

Road Map

Decoding Phrase-Based SMT Models
• iterative hypothesis expansion
• pruning
• beam search using stack decoding

Factored translation
Document-level translation

Page 3

Translating with SMT Models

SMT Model
• statistical model of the translation process
• various components
• millions and billions of parameters and options

SMT Decoder
• implements the search for a good translation candidate
• tries to find the translation hypothesis with the highest model score (but may return sub-optimal solutions)
• important features: accuracy and efficiency

Page 4

Decoding Phrase-Based SMT

Search problem:
• Find the best translation among all possible translation candidates

The decoder is the part of the SMT system that creates the translations. Given a set of models, how can we translate efficiently and accurately?

t* = argmax_t f(s, t) = argmax_t Σ_i λ_i h_i(s, t)

f(s, t)    scoring function
h_i(s, t)  feature functions
λ_i        feature weights

(This slide incorporates material from "The Log-Linear Model: Decoding and Tuning in Statistical Machine Translation", Christian Hardmeier, 2015-05-20.)
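The argmax above can be sketched concretely. A minimal Python sketch of the log-linear score and an exhaustive argmax over an explicit candidate list; the two feature functions and their weights are made up for illustration, where a real system would use the translation model, language model, etc., with weights set by tuning:

```python
# A stand-in translation model score: penalize length mismatch (hypothetical).
def tm_score(source, target):
    return -0.5 * abs(len(source.split()) - len(target.split()))

# A stand-in language model score: a small per-word penalty (hypothetical).
def lm_score(source, target):
    return -0.1 * len(target.split())

FEATURES = [tm_score, lm_score]   # the feature functions h_i
WEIGHTS = [1.0, 0.5]              # the feature weights lambda_i (set by tuning)

def f(source, target):
    """Log-linear scoring function: f(s, t) = sum_i lambda_i * h_i(s, t)."""
    return sum(w * h(source, target) for w, h in zip(WEIGHTS, FEATURES))

def decode(source, candidates):
    """t* = argmax_t f(s, t), here over an explicit candidate list."""
    return max(candidates, key=lambda t: f(source, t))

src = "er geht ja nicht nach hause"
cands = ["he does not go home", "it goes yes not to house"]
best = decode(src, cands)
```

A real decoder cannot enumerate all candidates like this; the rest of the deck is about searching this space efficiently.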

Page 5

Model Error vs. Search Error

Model Error
• the solution with the highest score according to our model is not an acceptable translation
• a model deficiency

Search Error
• the decoder fails to find the solution with the highest model score
• a shortcoming of the search algorithm

Page 6

The Generative Model of PB-SMT

Bakom huset hittade polisen en stor mängd narkotika .
Behind the house found police a large quantity of narcotics .
Behind the house police found a large quantity of narcotics .

1. Phrase segmentation
2. Phrase translation
3. Output ordering

(Figure: overlapping windows over the output, e.g. "Behind the house", "the house police", "house police found", "police found a", "found a large".)

Page 7

The Generative Model of PB-SMT

(Same example as the previous slide; the overlapping output trigrams "Behind the house", "the house police", "house police found", ... illustrate the context window of the n-gram language model.)

Page 8

Translation Options

(Illustrations by Philipp Koehn)

Input: er geht ja nicht nach hause

(Figure: the phrase table offers many translation options per input span, e.g. er → "he", "it"; geht → "goes", "go"; ja → "yes", ", of course"; nicht → "not", "do not", "does not", "is not"; nach → "after", "to", "according to", "in"; hause → "house", "home", "chamber"; nach hause → "home", "at home", "return home"; plus options for longer spans.)

• Many translation options to choose from
  – in the Europarl phrase table: 2727 matching phrase pairs for this sentence
  – by pruning to the top 20 per phrase, 202 translation options remain

Decoding: Hypothesis Expansion

(Figure: partial hypotheses are expanded left to right, e.g. "he" → "he does not" → "he does not go" → "he does not go home"; new hypotheses are also created from already created partial hypotheses.)

Get (all) translation options for all possible segmentations of the input sentence to be translated!

Page 9

Using the available translation options, create translation hypotheses from left to right:

(Figure repeated from the previous slide: translation options and hypothesis expansion for "er geht ja nicht nach hause".)

Page 10

• Is it always possible to translate any sentence in this way?
• What would cause the process to break down, so that the decoder can't find a translation that covers the whole input sentence?
• How could you make sure that this never happens?

(Figure repeated from the previous slides: translation options and hypothesis expansion for "er geht ja nicht nach hause".)

Page 11

Decoding Complexity

Naively, in a sentence of N words with T translation options for each phrase, we can have
• O(2^N) phrase segmentations,
• O(T^N) sets of phrase translations,
• O(N!) word reordering permutations

Page 12

Exploiting Model Locality

What we need to score a new hypothesis:
• the score of the previous hypothesis
• the translation model score
• the new language model scores

Example:
Bakom huset hittade polisen en stor mängd narkotika .
Behind the house police found a big ...

The translation model only looks at the current phrase, and the n-gram language model only looks at a limited window of n words. The choices the decoder makes are independent of everything beyond this window: the decoder never reconsiders its choices once they have moved out of the n-gram history.

Page 13

Hypothesis Recombination

Context-Independent Translation Model
• looks only at the current phrase

N-gram Language Model
• looks only at a window of n words

Decoder choices are independent of everything beyond this window!
• no need to reconsider alternative hypotheses once they have moved out of the n-gram history

Page 14

Hypothesis Recombination

Example: three hypotheses with the same coverage, scored with a trigram language model:

  After the house police     Score = -12.5
  Behind the house police    Score = -11.2
  , the house police         Score = -22.0

We already know the winner: the competing hypotheses can be discarded, because they can never beat the winning one later on!

Hypothesis recombination combines branches in the search graph. It is a form of dynamic programming: recombination reduces the search space substantially and preserves search optimality, but decoding is still exponential!

Pruning: to make decoding really efficient, we expand only hypotheses that look promising; bad hypotheses should be pruned early to avoid wasting time on them. Pruning compromises search optimality!
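The example above can be made concrete. A minimal Python sketch of recombination with a trigram LM: two hypotheses are interchangeable if they have the same source coverage and the same last n-1 = 2 output words, so only the best-scoring one needs to be kept (the Hyp record and the scores are taken from the example, not from a real decoder):

```python
from collections import namedtuple

Hyp = namedtuple("Hyp", ["coverage", "words", "score"])

def recomb_key(hyp, n=3):
    # Identical coverage and identical last n-1 output words means identical
    # future costs, so hypotheses sharing this key are interchangeable.
    return (hyp.coverage, tuple(hyp.words[-(n - 1):]))

def recombine(hyps, n=3):
    best = {}
    for h in hyps:
        k = recomb_key(h, n)
        if k not in best or h.score > best[k].score:
            best[k] = h
    return list(best.values())

hyps = [
    Hyp(frozenset({0, 1, 2, 3}), ("After", "the", "house", "police"), -12.5),
    Hyp(frozenset({0, 1, 2, 3}), ("Behind", "the", "house", "police"), -11.2),
    Hyp(frozenset({0, 1, 2, 3}), (",", "the", "house", "police"), -22.0),
]
survivors = recombine(hyps)  # only the -11.2 hypothesis survives
```

All three hypotheses end in "house police" and cover the same source words, so a single hypothesis remains after recombination.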

Page 15

Hypothesis Recombination

In the search graph: combine branches

It is a form of dynamic programming
• substantial reduction of the search space
• search is still optimal
• but decoding is still exponential

(Excerpt from Chapter 6, "Decoding":)

(Figure 6.3: Recombination example. Two hypothesis paths lead to two matching hypotheses: they have the same number of foreign words translated and the same English words in the output, but different scores. In the heuristic search they can be recombined, and the worse hypothesis is dropped.)

(Figure 6.4: More complex recombination example. In heuristic search with a trigram language model, these two hypotheses can also be recombined. Both hypotheses are the same with respect to subsequent costs: the language model still considers the last two words produced ("does not"), but not the third-last word. All other later costs are independent of the words produced so far.)

The worse hypothesis can never be part of the best overall path: for any path that includes the worse hypothesis, we can substitute the better one and get a better score. Thus, we can drop the worse hypothesis.

Identical English output is not required for hypothesis recombination, as the example in Figure 6.4 illustrates. Here we are considering two paths that start with different translations for the first German word er, namely "it" and "he". But then we continue with the same translation option, skipping one German word, and translating ja nicht as "does not".

Here we have two hypotheses that look similar: they are identical in their German coverage, but differ slightly in the English output. However, recall the argument for why we can recombine hypotheses: if any path starting from the worse hypothesis can also be used to extend the better hypothesis with the same costs, then we can safely drop the worse hypothesis, since it can never be part of the best path.

It does not matter in these two examples that the hypothesis paths differ in their first English word. Subsequent phrase translation costs do not depend on already generated English words; only the language model does, and only within its limited history.

Page 16

Pruning

Pruning = cutting away things that are most likely not useful at all

Pruning in SMT decoding
• expand only promising hypotheses
• discard bad hypotheses early to avoid wasting time

Pruning greatly improves efficiency
• but compromises search optimality

Page 17

Stack Decoding

(Illustrations by Philipp Koehn)

(Figure: hypothesis stacks ordered by the number of input words translated: no word translated, one word, two words, three words, ...; e.g. "he", "it", "are" in the one-word stack, "yes", "goes", "does not" further down.)

• Hypothesis expansion in a stack decoder
  – a translation option is applied to a hypothesis
  – the new hypothesis is dropped into a stack further down

Stack decoding algorithm:

 1: AddToStack(s_0, h_0)
 2: for i = 0 ... N - 1 do
 3:   for all h ∈ s_i do
 4:     for all t ∈ T do
 5:       if Applicable(h, t) then
 6:         h′ ← Expand(h, t)
 7:         j ← WordsCovered(h) + WordsCovered(t)
 8:         AddToStack(s_j, h′)
 9:       end if
10:      end for
11:   end for
12: end for
13: return best hypothesis on stack s_N

AddToStack(s, h):

 1: for all h′ ∈ s do
 2:   if Recombinable(h, h′) then
 3:     add the higher-scoring of h, h′ to stack s, discard the other
 4:     return
 5:   end if
 6: end for
 7: add h to stack s
 8: if the stack is too large then
 9:   prune the stack
10: end if

Stacks are sorted by the number of input words covered by the hypotheses they hold, and each stack has a limited size.
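The pseudocode above can be sketched end-to-end in Python. This is a toy, not a real decoder: it assumes single-word "phrases", one made-up option table with log-probability scores, no language model or reordering cost, and a simple recombination key (coverage plus last output word):

```python
from collections import namedtuple

Hyp = namedtuple("Hyp", ["coverage", "output", "score"])

OPTIONS = {  # hypothetical translation options: position -> [(word, log-prob)]
    0: [("he", -0.2), ("it", -0.9)],
    1: [("goes", -0.4), ("go", -1.1)],
    2: [("home", -0.3)],
}

STACK_SIZE = 5  # histogram pruning parameter

def add_to_stack(stack, hyp):
    # Recombination: same coverage and same last word -> keep the better one.
    for i, h in enumerate(stack):
        if h.coverage == hyp.coverage and h.output[-1:] == hyp.output[-1:]:
            if hyp.score > h.score:
                stack[i] = hyp
            return
    stack.append(hyp)
    stack.sort(key=lambda h: h.score, reverse=True)
    del stack[STACK_SIZE:]  # prune if the stack grew too large

def decode(n_words):
    # One stack per number of covered input words, as in the pseudocode.
    stacks = [[] for _ in range(n_words + 1)]
    stacks[0].append(Hyp(frozenset(), (), 0.0))
    for i in range(n_words):
        for h in stacks[i]:
            for pos in range(n_words):
                if pos in h.coverage:
                    continue  # option not applicable: word already covered
                for trans, logp in OPTIONS[pos]:
                    new = Hyp(h.coverage | {pos}, h.output + (trans,),
                              h.score + logp)
                    add_to_stack(stacks[i + 1], new)
    return max(stacks[n_words], key=lambda h: h.score)

best = decode(3)
```

With these made-up scores the best complete hypothesis picks the cheapest option for each source word, for a total score of -0.9.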

Page 18

Stack Decoding Algorithm

(Same slide content as before, with the stack decoding algorithm highlighted; AddToStack is detailed on the next slide.)

Page 19

AddToStack(s, h)

(Same slide content as before, with the AddToStack procedure highlighted; the recombination and pruning inside AddToStack are important for efficiency.)

Page 20

Pruning

Histogram Pruning
• keep no more than n hypotheses per stack
• parameter: stack size n

Threshold Pruning
• discard hypotheses with low scores compared to the score of the best hypothesis h* on the same stack
• Score(h) < a · Score(h*)
• parameter: threshold factor a
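Both schemes can be sketched in a few lines of Python, assuming (as the slide's formula suggests) that scores are log-probabilities, i.e. negative numbers, and that the threshold factor a is greater than 1, so that a · Score(h*) is a worse score than Score(h*):

```python
def histogram_prune(stack, n):
    """Keep no more than the n best-scoring hypotheses (stack size n)."""
    return sorted(stack, key=lambda h: h["score"], reverse=True)[:n]

def threshold_prune(stack, a):
    """Discard h when score(h) < a * score(h*), h* being the stack's best."""
    best = max(h["score"] for h in stack)
    return [h for h in stack if h["score"] >= a * best]

stack = [{"id": 1, "score": -11.2}, {"id": 2, "score": -12.5},
         {"id": 3, "score": -22.0}, {"id": 4, "score": -35.0}]
top2 = histogram_prune(stack, 2)      # keeps ids 1 and 2
within = threshold_prune(stack, 2.0)  # keeps scores >= -22.4: ids 1, 2, 3
```

Note that real decoders often parameterize the threshold differently (e.g. as an additive margin in log space); the sketch follows the slide's multiplicative formulation.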

Page 21

Beam Search Complexity

Setup:
• for each of the N words in the input sentence,
• expand n hypotheses
• by considering T translation options each

Complexity: O(n · N · T)

The number of translation options is linear in the sentence length:
• complexity: O(n · N²)

Page 22

Distortion Limits

Limiting reordering reduces the search space dramatically
• most partial hypotheses cover the same input
• the number of translation options at each step is bounded by a constant

Is it OK to do that?
• for closely related languages, most reordering is local
• we do not have good models for long-range reordering anyway
• could do pre-ordering if necessary

(Figure: Bakom huset hittade polisen en stor mängd narkotika . translated as "Behind the house police ...", with a limited distortion window over the source.)

Decoder complexity with a distortion limit: O(n · N)
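A hypothetical sketch of the check a decoder could apply when deciding whether a translation option is within the distortion limit; the exact definition of the distortion distance varies between decoders, so this is one common convention, not the only one:

```python
# Hypothetical distortion-limit check: a new phrase may start at most d source
# positions away from the end of the previously translated phrase.
def within_distortion_limit(prev_end, new_start, d):
    return abs(new_start - prev_end) <= d

ok = within_distortion_limit(prev_end=3, new_start=4, d=2)       # local jump
too_far = within_distortion_limit(prev_end=3, new_start=8, d=2)  # long jump
```

With such a check, only a constant number of source positions qualify at each expansion step, which is what makes the complexity linear in N.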

Page 23

Incremental Scoring and Cherry Picking

The number of hypotheses expanded by a beam search decoder with limited reordering is linear in the stack size and the input size: O(S · N)

(Figure: two partial hypotheses for Bakom huset hittade polisen en stor mängd narkotika ., "Behind the house police found" vs. "Behind the house police a big", with numbered costs of the remaining decisions.)

The path that looks cheapest here necessarily incurs a much higher cost later, and pruning may discard better options before this is recognised. To make scores more comparable, we should take into account unavoidable future costs: compare hypotheses based on current score + estimated future score.

Problem: delaying expensive decisions can lose good hypotheses to early pruning.

Page 24

Incremental Scoring and Cherry Picking

Cherry picking problems
• easy decisions are made first
• promising hypotheses may become very expensive later
• pruning discards hypotheses that would have been cheaper in the end

Solution
• estimate an approximate future cost
• add the future cost estimate to each hypothesis

Page 25

Future Cost Estimation

Computing the exact future cost would amount to full decoding!

Cheaper approximations can be computed by making additional independence assumptions:
• assume independence between models
• ignore the LM history across phrase boundaries

Cost estimates from translation options (illustration by Philipp Koehn):

(Figure: for the input "the tourism initiative addresses this for the first time", the cost of the cheapest translation option for each input span, as log-probabilities: -1.0, -2.0, -1.5, -2.4, -1.0, -1.0, -1.9, -1.6, -1.4 for the single words, and costs such as -1.3, -2.2, -2.3, -2.4, -2.5, -2.7, -4.0 for multi-word spans.)
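The span costs above can be combined by dynamic programming. A sketch under the slide's assumptions, with made-up span costs (log-probabilities, so larger is cheaper): fc[i][j] ends up holding the cheapest way to cover source words i..j-1 by any segmentation into translation options:

```python
N = 4  # toy sentence length
INF = float("inf")

# Cheapest direct translation option per span (i, j), exclusive end;
# the numbers are made up for the example.
span_cost = {(0, 1): -1.0, (1, 2): -2.0, (2, 3): -1.5, (3, 4): -2.4,
             (1, 3): -2.2, (0, 2): -4.0}

fc = [[-INF] * (N + 1) for _ in range(N + 1)]
for length in range(1, N + 1):       # fill the table bottom-up by span length
    for i in range(N - length + 1):
        j = i + length
        best = span_cost.get((i, j), -INF)
        for k in range(i + 1, j):    # or split the span into two sub-spans
            best = max(best, fc[i][k] + fc[k][j])
        fc[i][j] = best

# The future cost of a hypothesis is then the sum of fc over its
# uncovered source spans.
```

Here fc[0][4] comes out to -5.6, via the segmentation (0,1) + (1,3) + (3,4).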

DP Beam Search Decoding: Evaluation

DP beam search is by far the most popular search algorithm for phrase-based SMT. It combines high speed with reasonable accuracy by exploiting the constraints of the standard models, and it works well with very local models.

Sentence-internal long-range dependencies increase search errors by inhibiting recombination, and there can be no cross-sentence dependencies on the target side.

Current state of the art: almost perfect local fluency, but serious problems with long-range reordering and discourse-level phenomena.

Feature Weight Tuning

f(s, t) = Σ_i λ_i h_i(s, t)

The SMT model is a weighted sum of feature scores (translation model, language model, distortion model, ...). The feature models are trained on corpus data. But where do we get the weights λ_i from?

Page 26

Summary on Decoding

Dynamic Programming Beam Search
• most popular search algorithm (but not the only one)
• high speed and reasonable accuracy
• fits well with the feature functions of standard PB-SMT

Limitations
• not efficient with non-local models (modeling long-distance dependencies)
• not possible to include cross-sentence dependencies on the target language side

Page 27

Factored translation

Page 28

Problems with PBSMT

No use of morphology:
• inflectional variants ("look", "looks", "looked") are treated as completely different words!
• in learning translation models, knowing how to translate "look" doesn't help to translate "looks"

Works fine for English (and reasonable amounts of data)

Problems:
• morphologically rich languages
• sparse data sets
• flexible word order

Page 29

Factored models (1)

Represent words by factors.

(Figure: factored representation of words; both input and output are represented as word, lemma, part-of-speech, morphology, word class, ..., with mappings between input and output factors.)

• Goals
  – generalization, e.g. by translating lemmas, not surface forms
  – richer models, e.g. using syntax for reordering or language modeling

(Koehn, U Edinburgh, ESSLLI Summer School, Day 5)

Related work
• Back-off to representations with richer statistics (lemma, etc.) [Nießen and Ney 2001, Yang and Kirchhoff 2006, Talbot and Osborne 2006]
• Use of additional annotation in pre-processing (POS, syntax trees, etc.) [Collins et al. 2005, Crego et al. 2006]
• Use of additional annotation in re-ranking (morphological features, POS, syntax trees, etc.) [Och et al. 2004, Koehn and Knight 2005]
→ we pursue an integrated approach
• Use of syntactic tree structure [Wu 1997, Alshawi et al. 1998, Yamada and Knight 2001, Melamed 2004, Menezes and Quirk 2005, Chiang 2005, Galley et al. 2006]
→ may be combined with our approach

Page 30

Factored models (2)

Morphology
• is productive
• is well understood
• has generalizable patterns

Factored models
• learn translations of base forms
• learn to map morphology
• learn to generate the target surface form

Page 31

Factored models (3)

Represent words by factors? Why?
• combine scores for translating various factors
• back off to other factors (lemma)
• use various factors for reordering
• better word alignment (?)

Better generalization
• can translate words that we haven't seen in training
• better statistics for translation options

Richer model (more (linguistic) information)
• PoS, syntactic function, semantic role, ...

Page 32

Factored model example (1)

(Figure, Jörg Tiedemann: the input word is analyzed into lemma and part-of-speech (analysis step); lemma and POS are translated separately (translation steps); the output surface word form is generated from the translated factors (generation step).)

• translate lemma and POS separately
• generate surface word forms from translated factors

Page 33

Factored model example (2)

Use the benefits of general phrase-based SMT!
• factored models as alternative paths (or back-off)

Basic phrase-based SMT is very powerful; why generalize if we already know a specific translation?

(Figure, Jörg Tiedemann: two alternative decoding paths, a direct word-to-word path or a lemma + part-of-speech path with surface form generation.)

• prefer the surface model for known words
• use the morphgen model as back-off

Page 34

Factored model results

Factored models do not always lead to improvements, and complicated models are slow compared to standard PBSMT.

Some results (German/English, Koehn):

System            In-domain   Out-of-domain
Baseline          18.19       15.01
With POS LM       19.05       15.03
Morphgen model    14.38       11.65
Both model paths  19.47       15.23

• factors on the token level
• flexible SMT framework
• many possible factors & translation/generation steps
• not much success yet ...


Factored model example (3)

Simpler models are often more useful!

[Diagram: translate word to word directly, with an additional target-side part-of-speech factor generated from the word.]

Useful with a POS LM


Simple factored models

• often useful with POS/morphology LMs
• not much slower than standard models
• tend to give some improvement in agreement

Example: improve the word order of split compounds.
Number of compound modifiers without a head:
• system without a POS model: 136
• system with a POS model: 6


Factored model in Moses

Full support in Moses:
• http://www.statmt.org/moses/?n=Moses.FactoredTutorial

Data format (example):
• 4 source language factors (word|lemma|pos|morph)
• 3 target language factors (word|lemma|pos)

==> factored-corpus/proj-syndicate.de <==
korruption|korruption|nn|nn.fem.cas.sg floriert|florieren|vvfin|vvfin .|.|per|per

==> factored-corpus/proj-syndicate.en <==
corruption|corruption|nn flourishes|flourish|nns .|.|.
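A factored corpus line like the ones above can be split into per-token factor dictionaries with a few lines of Python. This is only a sketch of reading the pipe-separated format; `parse_factored_line` and the default factor names are illustrative helpers, not part of Moses itself.

```python
def parse_factored_line(line, factor_names=("word", "lemma", "pos", "morph")):
    """Split a Moses-style factored line into one factor dict per token.

    Each token is a pipe-separated factor bundle, e.g. word|lemma|pos|morph.
    """
    tokens = []
    for tok in line.split():
        values = tok.split("|")
        tokens.append(dict(zip(factor_names, values)))
    return tokens
```

For the German example line above, this yields three tokens, each with its word, lemma, POS, and morphology factor.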



SMT Approaches

• Word-based SMT
• Phrase-based SMT
• Factored SMT

Still not happy? What's the problem?
• very simple reordering models
• only consecutive phrases (word sequences)
• long-distance dependencies / agreement
• problems with ungrammatical output

Tree-based SMT
• addresses these issues to some extent
• but computationally more complex, with an even larger search space


Tree-based SMT

A family of approaches (with confusing terminology):
• hierarchical phrase-based models (Hiero)
• tree-to-string, string-to-tree, tree-to-tree
• syntax-directed MT
• syntax-augmented MT
• syntactified phrase-based MT
• string-to-dependency
• dependency treelet-based


Document-level MT


Discourse-Level Phenomena

Translate the following to German:

• The girl took my key from the door lock.
• But it wasn't her key!
• I said: “Put it back!”

Candidate translations:

• Das Mädchen nahm meinen Schlüssel aus dem Schloss.
• Aber es war nicht ihr Schlüssel. [sein (?)]
• Ich sagte: “Stell es zurück!” / “Leg ihn zurück!” / “Steck ihn wieder rein!”

The pronoun translating “it” must agree with the gender of its German antecedent: Schlüssel is masculine, so “ihn”, not “es”, is correct.


Connectedness of Natural Language

Textual cohesion
• discourse markers
• referential devices (e.g. pronominal anaphora)
• ellipses (word omissions) and substitutions
• lexical cohesion (word repetition, collocation)

Textual coherence
• semantically meaningful relations
• logical tense structure
• presuppositions and implications connected to general world knowledge


Discourse and Machine Translation

Long-distance relations are lost in local MT models
• sentence-by-sentence translation
• limited context window

Discourse-level devices do not easily map between languages
• explicit vs. implicit discourse markers
• grammatical agreement in anaphoric relations

Terminological consistency
• domain-specific requirements


Pronominal Anaphora

Translating pronouns requires target-language dependencies

• The funeral of the Queen Mother will take place on Friday. It will be broadcast live.

• Les funérailles de la reine-mère auront lieu vendredi. Elles seront retransmises en direct.

Alternative translation:

• L’enterrement de la reine-mère aura lieu vendredi. Il sera retransmis en direct.

The pronoun (Elles vs. Il) must agree in gender and number with the chosen translation of “funeral”: funérailles is feminine plural, enterrement is masculine singular.


Beam Search Decoding

Advantages:
• very good search results in a huge search space
• manageable complexity
• best efficiency / accuracy trade-off with current models

Disadvantages:
• Markov assumption (independence outside of a limited history)
• sentence-internal long-distance dependencies dramatically increase the search space and cause more search errors
• difficult to integrate cross-sentence dependencies


Greedy Hill Climbing

Initialize
From a given state, generate a successor state
• apply a randomly chosen operation at a random location
Compute the score for the new state
• may look at any place in the document
• efficient score updates are necessary
Hill climbing
• accept if the new state has a higher model score
• continue until satisfied (or tired of waiting)

Implemented in the Docent decoder
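The loop above can be sketched in a few lines of Python. This is a minimal first-choice hill climbing skeleton, not Docent's actual implementation: `state`, the `operations` (functions mapping a state to a changed copy), and `score` are hypothetical stand-ins for the decoder's internals.

```python
import random

def hill_climb(state, operations, score, max_steps=10**8, max_rejections=10**5):
    """First-choice hill climbing: accept a random change only if it improves the score."""
    best = score(state)
    rejections = 0
    for _ in range(max_steps):
        op = random.choice(operations)   # randomly chosen operation
        candidate = op(state)            # applied at a random location (inside op)
        s = score(candidate)
        if s > best:                     # accept only strict improvements
            state, best = candidate, s
            rejections = 0
        else:
            rejections += 1
            if rejections >= max_rejections:
                break                    # too many rejections in a row: stop
    return state
```

Because changes are only accepted when the score improves, the model score is monotonically non-decreasing over the run; the search is still non-deterministic, since the operations and locations are drawn at random.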


Learning Curves

English-French Newswire (WMT task)

First-choice hill climbing

For a given state, generate a successor state by applying a randomly chosen operation at a random location in the document.
Compute the score of the new state.
Accept the new state if it is better, else keep the current state.
Iterate until
• a maximum number of steps is reached, or
• there is a large number of rejections in a row.

Learning curves

English-French Newswire (WMT system)

[Plots: for standard models, normalised model score (left) and BLEU (right) as a function of the number of search steps (1e+02 to 1e+08), comparing runs with and without beam-search initialisation.]



Selected Discourse-Level Features

Lexical cohesion models
• motivation: one sense per discourse / translation
• document language model using distributional semantics

Lexical consistency
• penalise inconsistent translation options

Stylistic features, target group adaptation
• improved readability
• simplified translation
• format / formatting constraints
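A lexical consistency feature of the kind mentioned above can be sketched as a simple count over a document translation. This is an illustrative toy version, not the actual Docent feature: `phrase_pairs` is a hypothetical list of (source phrase, target phrase) pairs extracted from one document, and each source phrase that receives more than one distinct translation adds one penalty point.

```python
from collections import defaultdict

def lexical_consistency_penalty(phrase_pairs):
    """Count source phrases translated inconsistently within one document."""
    translations = defaultdict(set)
    for src, tgt in phrase_pairs:
        translations[src].add(tgt)   # collect distinct translations per source phrase
    return sum(1 for tgts in translations.values() if len(tgts) > 1)
```

In a log-linear model, such a count would enter as one feature whose (negative) weight penalises inconsistent documents during search.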


Document-Level Decoding in Docent

[Diagram: a document translation is initialised by Moses or randomly; Docent then repeatedly applies a change operation with a score update and accepts or rejects the change, until a step limit (10^8) or a rejection limit (10^5) is reached, yielding the final document translation.]

Local search with stochastic search operations
• start with a complete (suboptimal) solution
• apply small changes anywhere to improve it
• hill-climbing or annealing


Document-Level Decoding

Initial state
• randomly selected segmentation and translation
• or initialised by beam search with local features only

Search operations
• simple operations that open the entire search space
• randomly selecting the next operation

Search and Termination
• accept useful changes
• until the step limit is reached
• or the rejection limit is reached

Search is non-deterministic


Example: Initial Step

• that prevention this disease

• unfortunately , there are is not a miracle cure contribute to preventing get cancer but

• in spite to the made progress in ’ scientific research there remains the a healthy ways of life the best solution , the if of the risks decrease , in with him disease this .


Example: Step 8,192

• that prevention of the disease

• unfortunately , there are no miracle cure for preventing cancer .

• in spite of the progress in research remains the adoption of a healthy lifestyle is the best way , about the risk to reduce , in him on developing them .


Example: Step 134,217,728

• prevention of the disease

• unfortunately , there is no miracle cure for preventing cancer .

• despite the progress in research , the adoption of a healthy lifestyle , the best way to minimise the risk of him on ill .


Search Operations

Local search operations for phrase-based SMT, illustrated on the source sentence “er geht ja nicht nach hause” with initial target “it is yes not after house”:

Change translation option:
• change the translation of one randomly selected phrase (e.g. “it” → “he”)

Resegment:
• find a new segmentation into phrases for a randomly selected sequence of adjacent source words (e.g. resegmenting so that the spurious “yes” disappears)

Swap phrases:
• exchange the positions of two target phrases
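Two of these operations can be sketched over a toy hypothesis representation: a list of (source span, target phrase) pairs in target order, plus a phrase table mapping each source span to its candidate translations. This representation and the function names are illustrative, not Docent's actual data structures; the resegment operation is omitted for brevity.

```python
import random

def change_phrase_translation(hyp, phrase_table):
    """Replace the translation of one randomly selected phrase."""
    i = random.randrange(len(hyp))
    src, _ = hyp[i]
    new = list(hyp)
    new[i] = (src, random.choice(phrase_table[src]))  # pick another candidate
    return new

def swap_phrases(hyp):
    """Exchange the positions of two randomly selected target phrases."""
    i, j = random.sample(range(len(hyp)), 2)
    new = list(hyp)
    new[i], new[j] = new[j], new[i]
    return new
```

Both functions return a modified copy, so the hill-climbing loop can score the candidate and discard it if the change is rejected.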


Summary

Document-Level Decoding
• local search with stochastic change operations
• flexible framework that allows long-distance dependencies
• possible to initialize with beam search

Discourse-Level SMT
• respect connectedness of sentences
• treat phenomena that differ between languages
• difficult to achieve visible (positive) effects


Coming Up

This week
• Choose your preferred project topics (today!)
• Assignment on PBSMT and Moses
• Second lab 1 session
• Lecture on Thursday cancelled
• Start discussing/working on projects

Next week
• Assignment on document decoding with Docent
• NMT lecture

Later
• More NMT
• Guest lecture
• Project work and student seminars


Course organization

I will be on parental leave from October 2nd

Who to ask questions:
• General formal questions about course organization, grading etc. → Mats
• Labs and assignments → Zhengxian
• NMT part of course → Gongbo
• Project-related questions → Gongbo, Zhengxian
