Institut für Anthropomatik, 14.02.2014

Introduction to SMT – Word-based Translation Models

Jan Niehues - Lehrstuhl Prof. Alex Waibel


Overview

Introduction

Lexica

Alignment

IBM Model 1

EM Algorithm

Higher IBM Models

Word Alignment


Introduction

Notation

• Source:
– $f_i$: source (foreign) word
– $I$: length of the foreign sentence
– $i$: position in the source (foreign) sentence
– $f = f_1^I = f_1 \ldots f_i \ldots f_I$: foreign sentence

• Target:
– $e_j$: target (English) word
– $J$: length of the English sentence
– $j$: position in the English sentence
– $e = e_1^J = e_1 \ldots e_j \ldots e_J$: English sentence


Introduction

Statistical Machine Translation: find the most probable translation e for a given source sentence f

$\hat{e} = \arg\max_e \, p(e \mid f)$

Use Bayes' rule:

$\hat{e} = \arg\max_e \, p(e \mid f) = \arg\max_e \frac{p(f \mid e)\, p(e)}{p(f)} = \arg\max_e \, p(f \mid e)\, p(e)$


System overview


Word-based Translation Model

Word-based models were introduced by Brown et al. in the early 1990s

Directly translate source words to target words

Model word-by-word translation probabilities

First statistical approach to machine translation

No longer state of the art

Still used to generate word alignments for phrase extraction in phrase-based models


Lexica

Store translation of the source words

One word can have several translations

Example: Haus – house, building, home, household, shell

Some are more likely, others are only used in certain circumstances

How to decide which one to use in the translation?

Use statistics


Lexica

Translations   Counts   Probability
House          8000     0.8
Building       1600     0.16
Home           200      0.02
Household      150      0.015
Shell          50       0.005

• Collect counts of the different translations

• Approximate the probability distribution $p_f : e \to p_f(e)$
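The step from counts to probabilities can be sketched as a relative-frequency estimate. This is only an illustration: the counts are the ones from the Haus table, while the function name `relative_frequencies` and the dictionary layout are assumptions made for this sketch.

```python
# Counts for the translations of "Haus", taken from the lexicon table.
counts = {"house": 8000, "building": 1600, "home": 200, "household": 150, "shell": 50}


def relative_frequencies(counts):
    """Turn raw translation counts into a probability distribution p_f(e)."""
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


p_haus = relative_frequencies(counts)
for word, prob in p_haus.items():
    print(f"p({word} | Haus) = {prob}")
```

Dividing each count by the total (10000 here) is what produces the probability column of the table.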


Alignment

Mapping between source and target words that are translations of each other

Example: Input:

“das Haus ist klein”

Probabilistic Lexicon

Possible word-by-word translation: The house is small

Implicit alignment between source and target sentence:


Alignment

Formalized as a function:

$a : j \to i$

Maps target word position to source word position

Example:

$a : \{1 \to 1,\ 2 \to 2,\ 3 \to 3,\ 4 \to 4\}$
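An alignment function of this kind can be represented as a plain mapping from target positions to source positions. The sketch below is illustrative only: it uses the "das Haus ist klein" example, 1-based positions, and position 0 for a NULL word (which the later slides introduce); the helper name `aligned_pairs` is an assumption.

```python
source = ["das", "Haus", "ist", "klein"]
target = ["the", "house", "is", "small"]

# a: target position j -> source position i (both 1-based, 0 would mean NULL)
a = {1: 1, 2: 2, 3: 3, 4: 4}


def aligned_pairs(source, target, a):
    """List (target word, source word) pairs implied by the alignment function."""
    pairs = []
    for j, e_word in enumerate(target, start=1):
        i = a[j]
        f_word = "NULL" if i == 0 else source[i - 1]
        pairs.append((e_word, f_word))
    return pairs


print(aligned_pairs(source, target, a))
```

Because `a` is a function of the target position, every target word gets exactly one source word, while a source word may be used several times or not at all.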


Alignment Difficulties

Word reordering:

Leads to non-monotonic alignments


Alignment Difficulties

Many-to-one alignments:

One word of the input language is translated into several words

$a : \{1 \to 1,\ 2 \to 2,\ 3 \to 3,\ 4 \to 4,\ 5 \to 4\}$


Alignment Difficulties

Deletion:

For some source words there is no equivalent in the translation

$a : \{1 \to 1,\ 2 \to 2,\ 3 \to 3,\ 4 \to 5\}$


Alignment Difficulties

Insertion:

Some words of the target sentence have no equivalent in the source sentence

Add a NULL word (position 0) so that the alignment function is still fully defined

$a : \{1 \to 1,\ 2 \to 0,\ 3 \to 4,\ 4 \to 2,\ 5 \to 5,\ 6 \to 5,\ 7 \to 6\}$


Alignment Remarks

Many-to-one alignments are possible, but one-to-many alignments are not

In these models alignments are represented by a function

Leads to problems with language pairs like Chinese-English

In phrase-based systems this is solved by looking at the translation process from both directions


IBM Model 1

Model that generates different translations for a sentence, each with an associated probability

Generative model: break the modeling of sentence translation into smaller steps of word-to-word translation with a coherent story

Probability of the English sentence e and alignment a given the foreign sentence f:

$p(e, a \mid f) = \frac{\epsilon}{(l_f + 1)^{l_e}} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})$

Number of possible alignments: $(l_f + 1)^{l_e}$

Normalization constant: $\epsilon$
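The IBM Model 1 probability of a translation together with an alignment is a product of lexical probabilities scaled by the number of possible alignments. The following is a minimal sketch under stated assumptions: the lexical probabilities in `t` are made-up illustration values (not from the slides), the constant `epsilon` defaults to 1, and the function name is hypothetical.

```python
def ibm1_joint_probability(e, f, a, t, epsilon=1.0):
    """p(e, a | f) = epsilon / (l_f + 1)^l_e * prod_j t(e_j | f_a(j)).

    e, f: lists of target and source words
    a:    dict mapping target position j (1-based) to source position i (0 = NULL)
    t:    dict mapping (target word, source word) to t(e | f)
    """
    l_e, l_f = len(e), len(f)
    prob = epsilon / (l_f + 1) ** l_e
    for j, e_word in enumerate(e, start=1):
        f_word = "NULL" if a[j] == 0 else f[a[j] - 1]
        prob *= t.get((e_word, f_word), 0.0)
    return prob


# Hypothetical lexical translation probabilities, for illustration only.
t = {("the", "das"): 0.7, ("house", "Haus"): 0.8, ("is", "ist"): 0.8, ("small", "klein"): 0.4}
f = ["das", "Haus", "ist", "klein"]
e = ["the", "house", "is", "small"]
a = {1: 1, 2: 2, 3: 3, 4: 4}

p = ibm1_joint_probability(e, f, a, t)
print(p)
```

With four source words plus NULL and four target words there are 5^4 = 625 possible alignments, which is where the denominator comes from.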


IBM 1 Example


IBM 1 Training

Learn translation probability distributions

Problem: incomplete data

Only large amounts of sentence-aligned parallel text are available

The alignment information is missing

Consider alignment as a hidden variable

Approach: Expectation maximization (EM) algorithm


EM Algorithm

1. Initialize the model
– Use a uniform distribution

2. Apply the model to the data (expectation step)
– Compute alignment probabilities
– At first all are equal, but later “Haus” will most likely be aligned to “house”

3. Learn the model from the data (maximization step)
– Learn translation probabilities from the guessed alignments
– Use the best alignment, or all alignments weighted by their probability

4. Iterate steps 2 and 3 until convergence
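The four steps can be sketched as a small IBM Model 1 trainer. This is an illustrative implementation, not the lecture's pseudo-code: it omits the NULL word for brevity, uses fractional (weighted) counts over all alignments, and the three-sentence toy corpus and the name `train_ibm1` are assumptions.

```python
from collections import defaultdict


def train_ibm1(corpus, iterations=10):
    """EM training of t(e | f) on a list of (foreign words, English words) pairs."""
    e_vocab = {e for _, es in corpus for e in es}
    # Step 1: uniform initialization of t(e | f).
    t = defaultdict(lambda: 1.0 / len(e_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # Step 2 (expectation): distribute each English word over the foreign words.
        for fs, es in corpus:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    c = t[(e, f)] / norm
                    count[(e, f)] += c
                    total[f] += c
        # Step 3 (maximization): re-estimate t(e | f) from the fractional counts.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t


corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm1(corpus, iterations=20)
print(round(t[("house", "Haus")], 3), round(t[("the", "Haus")], 3))
```

Because "das" co-occurs with "the" in two sentence pairs, the model gradually shifts probability mass so that "Haus" is explained by "house" rather than "the".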


Step 2

Calculate probability of an alignment

Using dynamic programming we can reduce the complexity from exponential to quadratic in sentence length


Step 2

Put together both equations:


Step 3

Collect counts from every sentence pair (e, f):

Calculate translation probabilities:


Pseudo-code


Example


Convergence

Goal: find the model that best fits the data

Measure: how well does it translate unseen sentences?

At this point there is no test data

Instead: how well does it model the training data?


Convergence

Initial Model:

First iteration:

Final:

Probability of training sentences increases


Convergence

• Perplexity of the model:

• Perplexity is guaranteed to decrease or stay the same at each iteration

• EM converges to a local minimum

• IBM1: global minimum


Higher IBM Models

• IBM1 is very simple

• No treatment of reordering and adding or dropping words

• Five models of increasing complexity were proposed by Brown et al.

Model          Description
IBM Model 1    Lexical translations
IBM Model 2    Adds absolute position
IBM Model 3    Adds fertility model
IBM Model 4    Relative alignment positions
IBM Model 5    Fixes deficiency
HMM            Lexicon plus relative positions


Higher IBM Models

• Complexity of training grows, but the general principle stays the same

• During training:

– First train IBM Model 1

– Use IBM Model 1 to initialize IBM Model 2

– …

• All models are implemented in the GIZA++ toolkit

• Used by many groups

• Parallel version developed at CMU


IBM Model 2

• Problem of IBM Model 1: both of these sentence pairs receive the same probability

Model for the alignment based on positions of input and output words


IBM Model 2

• Two step procedure:

• Mathematical formulation:


IBM Model 2

• Lexical translation step:

• Mathematical formulation:

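$t(\text{of} \mid \text{natürlich}) \cdot t(\text{course} \mid \text{natürlich}) \cdot t(\text{is} \mid \text{ist}) \cdot t(\text{the} \mid \text{das}) \cdot t(\text{house} \mid \text{haus}) \cdot t(\text{small} \mid \text{klein})$

$= 0.5 \cdot 0.6 \cdot 0.7 \cdot 0.8 \cdot 0.8 \cdot 0.5$

$= 0.0672$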


IBM Model 2

• Alignment step:

• Mathematical formulation in IBM1:

$a(1 \mid 1,6,5) \cdot a(1 \mid 2,6,5) \cdot a(3 \mid 3,6,5) \cdot a(4 \mid 4,6,5) \cdot a(2 \mid 5,6,5) \cdot a(5 \mid 6,6,5)$

$= \frac{1}{6} \cdot \frac{1}{6} \cdot \frac{1}{6} \cdot \frac{1}{6} \cdot \frac{1}{6} \cdot \frac{1}{6}$

$\approx 2.14 \cdot 10^{-5}$


IBM Model 2

• Alignment step:

• Mathematical formulation in IBM2:

$a(1 \mid 1,6,5) \cdot a(1 \mid 2,6,5) \cdot a(3 \mid 3,6,5) \cdot a(4 \mid 4,6,5) \cdot a(2 \mid 5,6,5) \cdot a(5 \mid 6,6,5)$

$= \frac{1}{3} \cdot \frac{1}{4} \cdot \frac{1}{3} \cdot \frac{1}{3} \cdot \frac{1}{10} \cdot \frac{1}{2}$

$\approx 4.6296 \cdot 10^{-4}$
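The two-step IBM Model 2 computation (lexical translation step times alignment step) can be checked numerically. The sketch below only reuses the numbers of the worked example; the variable and function names are assumptions, and any length normalization constant is left out.

```python
import math

# Lexical translation probabilities t(e_j | f_a(j)) from the worked example.
lexical = [0.5, 0.6, 0.7, 0.8, 0.8, 0.5]
# Alignment probabilities a(i | j, l_e, l_f): uniform in IBM1, learned in IBM2.
alignment_ibm1 = [1 / 6] * 6
alignment_ibm2 = [1 / 3, 1 / 4, 1 / 3, 1 / 3, 1 / 10, 1 / 2]


def model2_probability(lexical_probs, alignment_probs):
    """Product of the lexical translation step and the alignment step."""
    return math.prod(lexical_probs) * math.prod(alignment_probs)


p_lex = math.prod(lexical)
p_align_ibm1 = math.prod(alignment_ibm1)
p_align_ibm2 = math.prod(alignment_ibm2)
p_total = model2_probability(lexical, alignment_ibm2)
print(p_lex, p_align_ibm1, p_align_ibm2, p_total)
```

The lexical part is the same in both models; only the alignment factor changes, which is what lets IBM Model 2 prefer one word order over another.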


IBM Model 2

• Training:
– Similar to IBM Model 1 training

• Initialization:
– Initialize with the values from IBM Model 1 training
– Alignment probability: $a(i \mid j, l_e, l_f) = \frac{1}{l_f + 1}$


IBM Model 3

• Earlier models did not model how many words are generated by an input word

• Model fertility by a probability distribution: $n(\Phi \mid f)$

• Examples:

• Add an additional step to the model


IBM Model 3

• Word deletion:
– Modelled by fertility 0

• Word insertions:
– Could be modelled by the fertility of the NULL word: $n(\Phi \mid \text{NULL})$
– But this fertility should depend on the sentence length
– Instead add a NULL insertion step

• NULL insertion step:
– Add a NULL token after every word with probability $p_1$, or not with probability $p_0 = 1 - p_1$


IBM Model 3

• Distortion model instead of alignment model:
– Different distortions in both productions by the same alignment
– Different direction of both models: $a(i \mid j, l_e, l_f)$ versus $d(j \mid i, l_e, l_f)$


IBM Model 3 Mathematical Formulation

• Fertility step
– Fertility greater than one:
• Different tableaus for the same alignment
• All tableaus generating the same alignment have the same probability
• Number of different tableaus generating the same alignment: $\Phi_i!$
– Probability:

$\prod_{i=1}^{l_f} \Phi_i! \, n(\Phi_i \mid f_i)$


IBM Model 3 Mathematical Formulation

• Fertility step

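$1! \cdot n(1 \mid \text{ich}) \cdot 1! \cdot n(1 \mid \text{gehe}) \cdot 0! \cdot n(0 \mid \text{ja}) \cdot 1! \cdot n(1 \mid \text{nicht}) \cdot 2! \cdot n(2 \mid \text{zum}) \cdot 1! \cdot n(1 \mid \text{haus})$

$= 0.9 \cdot 0.9 \cdot 0.4 \cdot 0.8 \cdot 2 \cdot 0.7 \cdot 0.8$

$= 0.290304$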
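The fertility term of this example can be recomputed directly from the product of Φ_i! · n(Φ_i | f_i). The sketch uses the fertility probabilities from the worked example; the data layout and the name `fertility_probability` are assumptions made for illustration.

```python
from math import factorial


def fertility_probability(fertilities, n):
    """prod_i phi_i! * n(phi_i | f_i) over all source words."""
    prob = 1.0
    for f_word, phi in fertilities:
        prob *= factorial(phi) * n[(phi, f_word)]
    return prob


# (source word, fertility) pairs and n(phi | f) values from the worked example.
fertilities = [("ich", 1), ("gehe", 1), ("ja", 0), ("nicht", 1), ("zum", 2), ("haus", 1)]
n = {
    (1, "ich"): 0.9, (1, "gehe"): 0.9, (0, "ja"): 0.4,
    (1, "nicht"): 0.8, (2, "zum"): 0.7, (1, "haus"): 0.8,
}

p_fertility = fertility_probability(fertilities, n)
print(p_fertility)
```

Only "zum" has fertility 2, so it is the only word whose factorial contributes a factor other than 1.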


IBM Model 3 Mathematical Formulation

• NULL word insertion
– Number of generated NULL words: $\Phi_0$
• Depends on the number of output words generated from input words
• After each generated word, a NULL word may be inserted
• $l_e - \Phi_0$ words are generated from foreign input words
• Maximal number of generated NULL words
• Probability: $\binom{l_e - \Phi_0}{\Phi_0} \, p_1^{\Phi_0} \, p_0^{\,l_e - 2\Phi_0}$


IBM Model 3 Mathematical Formulation

• NULL Word insertion

$\binom{7-1}{1} \cdot 0.1 \cdot 0.9 \cdot 0.9 \cdot 0.9 \cdot 0.9 \cdot 0.9$

$= 6 \cdot 0.1 \cdot 0.9^5$

$= 0.354294$
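The NULL insertion term can be sketched as a binomial expression: choose which of the l_e − Φ_0 words generated from input words are followed by a NULL-generated word. The function name and argument names below are assumptions; the formula is the one implied by the binomial coefficient and the factors p_1 and p_0 = 1 − p_1 of the example.

```python
from math import comb


def null_insertion_probability(l_e, phi0, p1):
    """C(l_e - phi0, phi0) * p1^phi0 * p0^(l_e - 2*phi0), with p0 = 1 - p1."""
    p0 = 1.0 - p1
    return comb(l_e - phi0, phi0) * p1 ** phi0 * p0 ** (l_e - 2 * phi0)


# Example setting: 7 output words, one of them generated by NULL, p1 = 0.1.
p_null = null_insertion_probability(l_e=7, phi0=1, p1=0.1)
print(p_null)
```

With seven output words and one NULL-generated word, six words come from input words, so the binomial coefficient is C(6, 1) = 6 and five of the six insertion decisions are "no insertion".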


IBM Model 3 Mathematical Formulation

• Combine Fertility, lexical translation and distortion probabilities

$p(e \mid f) = \sum_a p(e, a \mid f)$

$= \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} \binom{l_e - \Phi_0}{\Phi_0} \, p_1^{\Phi_0} \, p_0^{\,l_e - 2\Phi_0} \prod_{i=1}^{l_f} \Phi_i! \, n(\Phi_i \mid f_i) \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)}) \, d(j \mid a(j), l_e, l_f)$


IBM Model 3 Training

• Problem: exponential number of alignments
– IBM1/2: dynamic programming
– IBM3: no longer possible

• Sampling from the space of possible alignments
– Find the most probable alignments
– Add additional similar alignments
– Use only these alignments for normalization


IBM Model 3 Training

• Finding the most probable alignment
– Exponential number -> testing all possible alignments is too complex
– Use a hill climbing algorithm
• Evaluate all points in the neighbourhood
• Go to the highest point
• Iterate

• Problem: may end in a local maximum
– Start at various locations


– Pegging


Pegging

• For all indices i
– For all indices j
• Set alignment a(j) = i
• Find the most probable alignment under this condition
• Add it to the set of starting points


Hill climbing

• Find the most probable alignment in the neighborhood

• Neighborhood:
– Alignments that differ by a move
• Two alignments $a_1$ and $a_2$ differ by a move if they differ only in the alignment for one word $j$
– Alignments that differ by a swap
• Two alignments $a_1$ and $a_2$ differ by a swap if they agree in the alignments for all words except two, $j_1$ and $j_2$, for which the alignment points are switched: $a_1(j_1) = a_2(j_2)$ and $a_1(j_2) = a_2(j_1)$
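The move and swap neighborhood and the hill climbing loop can be sketched as follows. This is an illustration under assumptions: alignments are Python lists indexed by target position (0-based) holding source positions (0 = NULL), and `score` stands in for the model probability of an alignment, which is not implemented here.

```python
def neighbours(a, l_f):
    """All alignments that differ from a by exactly one move or one swap."""
    result = []
    # Moves: change the alignment of a single target word.
    for j in range(len(a)):
        for i in range(l_f + 1):
            if i != a[j]:
                b = list(a)
                b[j] = i
                result.append(b)
    # Swaps: exchange the alignment points of two target words.
    for j1 in range(len(a)):
        for j2 in range(j1 + 1, len(a)):
            if a[j1] != a[j2]:
                b = list(a)
                b[j1], b[j2] = b[j2], b[j1]
                result.append(b)
    return result


def hill_climb(a, l_f, score):
    """Repeatedly go to the best-scoring neighbour until no neighbour is better."""
    best, best_score = list(a), score(a)
    while True:
        candidate = max(neighbours(best, l_f), key=score, default=None)
        if candidate is None or score(candidate) <= best_score:
            return best
        best, best_score = candidate, score(candidate)


# Toy score (an assumption): closeness to a fixed reference alignment.
reference = [2, 1]
toy_score = lambda a: -sum(abs(x - y) for x, y in zip(a, reference))
print(hill_climb([1, 2], l_f=2, score=toy_score))
```

In real training the score would be the Model 3 probability of the alignment, and the search would be restarted from several starting points (for example those produced by pegging) to reduce the risk of ending in a local maximum.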


IBM3 Training

• Summary for IBM3 training
– Sampling the alignments
• Pegging
– Collecting counts
– Estimating probabilities


IBM Model 4

• Distortion model:
– Absolute position in IBM Model 3: $d(j \mid i, l_e, l_f)$
– Long sentences are relatively rare
– The distortion probability cannot be estimated well
– Use relative positions instead
– Problems:
• Added words
• Dropped words
• One-to-many alignments


IBM Model 4

• Cept:
– Each input word $f_i$ that is aligned to at least one output word forms a cept
– Center:
• Ceiling of the average of the output word positions
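The cept centers can be computed directly from an alignment function. A minimal sketch, assuming the alignment is given as a dictionary from target positions to source positions (0 = NULL); the function name `cept_centers` is hypothetical, and the example alignment is the one from the NULL insertion slide.

```python
from math import ceil


def cept_centers(a):
    """Map each source position that forms a cept to the center of its target positions."""
    positions = {}
    for j, i in a.items():
        if i != 0:  # words generated by NULL do not belong to a cept
            positions.setdefault(i, []).append(j)
    return {i: ceil(sum(js) / len(js)) for i, js in positions.items()}


# Alignment a: target position j -> source position i (0 = NULL).
a = {1: 1, 2: 0, 3: 4, 4: 2, 5: 5, 6: 5, 7: 6}
print(cept_centers(a))
```

Source word 5 is aligned to target positions 5 and 6, so its center is the ceiling of 5.5, i.e. 6; source word 3 is aligned to nothing and therefore forms no cept.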


IBM Model 4

• Relative distortion:
– Define the relative distortion for each output word

1. Target words generated by the NULL token:
• Uniform distribution

2. First word of a cept:
• Word position j relative to the center of the preceding cept i-1

3. Subsequent words in a cept:
• Word position j relative to the position of the previous word in the cept


IBM Model 4

• Word classes:
– Richer conditioning on the distortion:
• Some words are reordered more often
• E.g. adjectives when translating from English to French
• Not sufficient statistics to estimate probabilities
• Group words into word classes
• Possible classes: POS
• Originally: automatically clustered words


IBM Model 5

• Deficiency: according to IBM Models 3 and 4, multiple output words can be placed at the same position

• Positive probability for impossible alignments

• IBM Model 5 prevents this
– No longer multiple tableaus with the same alignment
– Place words only into vacant word positions
– For all word positions:
• How many untranslated words until this word

• No improvement in alignment quality

• Not used in most state-of-the-art systems


HMM Alignment Model

• HMMs are used successfully in speech recognition

• Introduced by Vogel et al.

• Idea: use relative positions instead of absolute ones
– Entire word groups (phrases) are moved with respect to the source position

• GIZA++ toolkit:
– Replaces IBM2 by the HMM model


HMM Alignment Model

• First-order model: the target position depends on the previous target position (captures movement of entire phrases)

• Alignment probability:

$\Pr(a_j \mid a_0^{j-1}, J, e_1^I) = p(a_j \mid a_{j-1}, I, J)$

$\Pr(f_1^J \mid e_1^I) = p(J \mid I) \sum_{a_1^J} \prod_{j=1}^{J} p(a_j \mid a_{j-1}, I) \, p(f_j \mid e_{a_j})$

• Maximum approximation:

$\Pr(f_1^J \mid e_1^I) \approx p(J \mid I) \max_{a_1^J} \prod_{j=1}^{J} p(a_j \mid a_{j-1}, I) \, p(f_j \mid e_{a_j})$


Viterbi Training

# Accumulation (over corpus)
# Find Viterbi path
For each sentence pair
    For each source position j
        For each target position i
            Pbest = 0; t = p(fj | ei)
            For each target position i’
                Pprev = P(j-1, i’)
                a = p(i | i’, I, J)
                Pnew = Pprev * a * t
                if (Pnew > Pbest)
                    Pbest = Pnew
                    BackPointer(j, i) = i’
            P(j, i) = Pbest

    # Update counts
    i = argmax over i’ of P(J, i’)
    For each j from J downto 1
        Count(fj, ei)++
        iprev = BackPointer(j, i)
        Count(i, iprev, I, J)++
        i = iprev

# Renormalize
…
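The search for the Viterbi path in the pseudo-code can be sketched in Python. This is an illustrative version with simplifying assumptions: positions are 0-based, the transition probabilities are given as a matrix `trans[i_prev][i]` that does not depend on I and J, `init[i]` plays the role of p(i | 0, I), and all numbers in the toy example are made up.

```python
def viterbi_alignment(f, e, t, trans, init):
    """Return the most probable target position for every source position."""
    J, I = len(f), len(e)
    P = [[0.0] * I for _ in range(J)]      # P[j][i]: best path probability ending in (j, i)
    back = [[0] * I for _ in range(J)]     # back[j][i]: best predecessor state i'
    for i in range(I):
        P[0][i] = init[i] * t.get((f[0], e[i]), 0.0)
    for j in range(1, J):
        for i in range(I):
            emit = t.get((f[j], e[i]), 0.0)
            for i_prev in range(I):
                p_new = P[j - 1][i_prev] * trans[i_prev][i] * emit
                if p_new > P[j][i]:
                    P[j][i] = p_new
                    back[j][i] = i_prev
    # Backtrace from the best final state.
    i = max(range(I), key=lambda k: P[J - 1][k])
    path = [i]
    for j in range(J - 1, 0, -1):
        i = back[j][i]
        path.append(i)
    return path[::-1]


# Toy parameters (assumptions, for illustration only).
f = ["das", "Haus"]
e = ["the", "house"]
t = {("das", "the"): 0.9, ("das", "house"): 0.1, ("Haus", "the"): 0.1, ("Haus", "house"): 0.9}
trans = [[0.3, 0.7], [0.7, 0.3]]
init = [0.5, 0.5]
print(viterbi_alignment(f, e, t, trans, init))
```

In Viterbi training, the counts for the lexicon and the transition model would then be incremented along this single best path and renormalized after the pass over the corpus.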


HMM Forward-Backward Training

• Gamma: probability of emitting $f_j$ in state $i$ in sentence $s$

• Sum over all paths through $(j, i)$

$\gamma_j^s(i) = \sum_{a_1^J :\, a_j = i} \prod_{j'=1}^{J} p(a_{j'} \mid a_{j'-1}, I) \, p(f_{j'} \mid e_{a_{j'}})$


HMM Forward-Backward Training

• Epsilon: probability of the transition from state $i'$ into state $i$

• Sum over all paths through $(j-1, i')$ and $(j, i)$, emitting $f_j$

$\varepsilon_j(i', i) = \sum_{a_1^J :\, a_{j-1} = i',\, a_j = i} \prod_{j'=1}^{J} p(a_{j'} \mid a_{j'-1}, I) \, p(f_{j'} \mid e_{a_{j'}})$

11-731 Machine Translation (2009)


Forward Probabilities

• Defined as:

$\alpha_j(i) = \sum_{a_1^j :\, a_j = i} \prod_{j'=1}^{j} p(a_{j'} \mid a_{j'-1}, I) \, p(f_{j'} \mid e_{a_{j'}})$

• Recursion:

$\alpha_j(i) = \sum_{i'=1}^{I} \alpha_{j-1}(i') \, p(i \mid i', I) \, p(f_j \mid e_i)$

• Initial condition:

$\alpha_1(i) = p(i \mid 0, I) \, p(f_1 \mid e_i)$


Backward Probabilities

• Defined as:

$\beta_j(i) = \sum_{a_j^J :\, a_j = i} \prod_{j'=j+1}^{J} p(a_{j'} \mid a_{j'-1}, I) \, p(f_{j'} \mid e_{a_{j'}})$

• Recursion:

$\beta_j(i) = \sum_{i'=1}^{I} \beta_{j+1}(i') \, p(i' \mid i, I) \, p(f_{j+1} \mid e_{i'})$

• Initial condition:

$\beta_J(i) = 1$


Forward-Backward

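• Calculate Gamma and Epsilon with Alpha and Beta

– Gamma:

$\gamma_j(i) = \frac{\alpha_j(i) \, \beta_j(i)}{\sum_{i'=1}^{I} \alpha_j(i') \, \beta_j(i')}$

– Epsilon:

$\varepsilon_j(i', i) = \frac{\alpha_{j-1}(i') \, p(i \mid i', I) \, p(f_j \mid e_i) \, \beta_j(i)}{\sum_{\tilde{i}', \tilde{i}} \alpha_{j-1}(\tilde{i}') \, p(\tilde{i} \mid \tilde{i}', I) \, p(f_j \mid e_{\tilde{i}}) \, \beta_j(\tilde{i})}$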
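The forward and backward recursions and the resulting Gamma and Epsilon values can be sketched as below. Assumptions of this sketch: 0-based positions, a transition matrix `trans[i_prev][i]` independent of I, `init[i]` in the role of p(i | 0, I), and made-up toy parameters; both normalizers equal the total probability mass over all paths.

```python
def forward_backward(f, e, t, trans, init):
    """Return (gamma, eps): state posteriors and transition posteriors."""
    J, I = len(f), len(e)
    emit = [[t.get((f[j], e[i]), 0.0) for i in range(I)] for j in range(J)]
    alpha = [[0.0] * I for _ in range(J)]
    beta = [[1.0] * I for _ in range(J)]   # beta at the last position is 1
    for i in range(I):
        alpha[0][i] = init[i] * emit[0][i]
    for j in range(1, J):
        for i in range(I):
            alpha[j][i] = sum(alpha[j - 1][ip] * trans[ip][i] for ip in range(I)) * emit[j][i]
    for j in range(J - 2, -1, -1):
        for i in range(I):
            beta[j][i] = sum(beta[j + 1][ip] * trans[i][ip] * emit[j + 1][ip] for ip in range(I))
    z = sum(alpha[J - 1])                  # total probability of all paths
    gamma = [[alpha[j][i] * beta[j][i] / z for i in range(I)] for j in range(J)]
    # eps[j - 1][i_prev][i] is the posterior of the transition i_prev -> i at position j.
    eps = [[[alpha[j - 1][ip] * trans[ip][i] * emit[j][i] * beta[j][i] / z
             for i in range(I)] for ip in range(I)] for j in range(1, J)]
    return gamma, eps


# Toy parameters (assumptions, for illustration only).
f = ["das", "Haus"]
e = ["the", "house"]
t = {("das", "the"): 0.9, ("das", "house"): 0.1, ("Haus", "the"): 0.1, ("Haus", "house"): 0.9}
trans = [[0.3, 0.7], [0.7, 0.3]]
init = [0.5, 0.5]
gamma, eps = forward_backward(f, e, t, trans, init)
print(gamma)
```

Summing the Gamma values per word pair and the Epsilon values per transition over the corpus gives the fractional counts used in the re-estimation step.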


Parameter Re-Estimation

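• Lexicon probabilities:

$p(f \mid e) = \frac{\sum_{s=1}^{S} \sum_{j=1,\, f_j^s = f}^{J_s} \sum_{i :\, e_i^s = e} \gamma_j^s(i)}{\sum_{s=1}^{S} \sum_{j=1}^{J_s} \sum_{i :\, e_i^s = e} \gamma_j^s(i)}$

• Alignment probabilities:

$p(i \mid i') = \frac{\sum_{s=1}^{S} \sum_{j=2}^{J_s} \varepsilon_j^s(i', i)}{\sum_{s=1}^{S} \sum_{j=2}^{J_s} \gamma_{j-1}^s(i')}$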


Forward-Backward Training Pseudo Code

# Accumulation
For each sentence pair {
    Forward (calculate Alphas)
    Backward (calculate Betas)
    Calculate Epsilons and Gammas
    For each source word f_j {
        Increase LexiconCount(f_j | e_i) by Gamma(j, i)
        Increase AlignCount(i | i’) by Epsilon(j, i’, i)
    }
}

# Update
Normalize LexiconCount to get P(f_j | e_i)
Normalize AlignCount to get P(i | i’)


Example HMM Training


IBM Models

• Phrase-based systems outperform these word-based translation models

• IBM Models can be used to generate a word alignment by using the Viterbi path

• Problem: 1-to-many

• But we can generate many-to-1 alignments

• Use alignments from both directions and combine them with a heuristic


Word alignment

• Evaluation:
– Given some manually aligned data (ref) and automatically aligned data (hyp), links can be
• Correct, i.e. link in hyp matches link in ref: true positive (tp)
• Wrong, i.e. link in hyp but not in ref: false positive (fp)
• Missing, i.e. link in ref but not in hyp: false negative (fn)


Word alignment Measures

• Precision:
– Number of correct links / number of links in hyp
– Problem:
• Fewer links -> improved precision

$\text{Precision} = \frac{tp}{tp + fp} = \frac{|A \cap R|}{|A|}$

• Recall:
– Number of correct links / number of links in reference
– Problem:
• All links in the alignment -> recall = 1

$\text{Recall} = \frac{tp}{tp + fn} = \frac{|A \cap R|}{|R|}$


Word alignment Measures

• F-Score:

$\text{F-Score} = \frac{2 \cdot tp}{2 \cdot tp + fp + fn} = \frac{2 \cdot |A \cap R|}{|A| + |R|}$

• Alignment error rate (AER):

$\text{AER} = 1 - \text{F-Score}$
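These measures are easy to compute when alignments are stored as sets of links. A small sketch, assuming links are (target position, source position) tuples; the function name `alignment_scores` and the example link sets are made up for illustration.

```python
def alignment_scores(hyp, ref):
    """Return (precision, recall, f_score, aer) for two sets of alignment links."""
    tp = len(hyp & ref)                       # links present in both sets
    precision = tp / len(hyp) if hyp else 0.0
    recall = tp / len(ref) if ref else 0.0
    f_score = 2 * tp / (len(hyp) + len(ref)) if (hyp or ref) else 0.0
    return precision, recall, f_score, 1.0 - f_score


# Hypothetical example: three proposed links, four reference links, two in common.
hyp = {(1, 1), (2, 2), (3, 4)}
ref = {(1, 1), (2, 2), (3, 3), (4, 4)}
precision, recall, f_score, aer = alignment_scores(hyp, ref)
print(precision, recall, f_score, aer)
```

The F-score balances the two failure modes noted above: proposing very few links inflates precision, proposing all links inflates recall.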


Reference

• Sometimes it is difficult for human annotators to decide

• Differentiate between sure and possible links

• Sets:
– A: generated links
– S: sure links (not finding a sure link is an error)
– P: possible links (putting a link which is not possible is an error)
– Alignment error rate
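With sure and possible links, the alignment error rate is commonly computed with the definition of Och and Ney, AER = 1 − (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The slide's own formula is not preserved in this transcript, so the sketch below assumes that standard definition; it also assumes that every sure link counts as possible, and the example link sets are made up.

```python
def alignment_error_rate(a, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), with S treated as a subset of P."""
    possible = possible | sure                # every sure link is also a possible link
    return 1.0 - (len(a & sure) + len(a & possible)) / (len(a) + len(sure))


# Hypothetical link sets for illustration.
sure = {(1, 1), (2, 2)}
possible = {(3, 4), (3, 3)}
aer_good = alignment_error_rate({(1, 1), (2, 2), (3, 4)}, sure, possible)
aer_bad = alignment_error_rate({(1, 1), (4, 4)}, sure, possible)
print(aer_good, aer_bad)
```

A hypothesis that finds all sure links and otherwise only proposes possible links gets an AER of 0; missing a sure link or proposing a link outside P increases the error.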


Conclusion

• Word-based translation models

• Word alignment as hidden variable

• Only 1-n alignments possible
