CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 – Alignment in SMT and Tutorial on Giza++ and Moses), Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 15th Feb 2011


Page 1

CS460/626 : Natural Language Processing/Speech, NLP and the Web

(Lecture 18 – Alignment in SMT and Tutorial on Giza++ and Moses)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

15th Feb, 2011

Page 2

Going forward from word alignment

Word alignment → Phrase alignment (going to bigger units of correspondence) → Decoding (best possible translation)

Page 3

Abstract Problem

Given: $e_0 e_1 e_2 e_3 \ldots e_n e_{n+1}$ (Entities)

Goal: $l_0 l_1 l_2 l_3 \ldots l_n l_{n+1}$ (Labels)

The goal is to find the best possible label sequence:

$L^* = \arg\max_L P(L \mid E)$

Generative Model:

$\arg\max_L P(L \mid E) = \arg\max_L P(L) \cdot P(E \mid L)$

Page 4

Simplification

Using the Markov assumption, the language model can be represented using bigrams:

$P(L) = \prod_{i=0}^{n} P(l_{i+1} \mid l_i)$

Similarly, the translation model can be represented in the following way:

$P(E \mid L) = \prod_{i=0}^{n} P(e_i \mid l_i)$

Page 5

Statistical Machine Translation

Finding the best possible English sentence given the foreign sentence:

$E^* = \arg\max_E P(E \mid F) = \arg\max_E P(E) \cdot P(F \mid E)$

P(E) = Language Model
P(F|E) = Translation Model
E: English, F: Foreign Language

Page 6

Problems in the framework

Labels are words of the target language, and are very large in number.

Who do you want to_go with ?  (preposition stranding)
With whom do you want to go ?
आप किस के_साथ जाना चाहते_हो  (Aap kis ke_sath jaana chahate_ho)

Candidate labels for each source position: who, do, you, want, to_go, with, and so on.

Page 7

Column of words of the target language on the source language words

Source sentence: ^ Aap kis ke_sath jaana chahate_ho .
Above each source word stands a column of candidate target words: who, do, you, want, to_go, with, … and so on.

Find the best possible path from '^' to '.' using transition and observation probabilities.

Viterbi can be used.
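
To make the labelling view concrete, here is a minimal Viterbi sketch over such a trellis. The transition (label bigram) and observation (translation) probabilities below are hand-made illustrative values, not the actual model from the lecture:

# Minimal Viterbi sketch for the labelling view of translation.
# All probability values are hypothetical, for illustration only.
source = ["Aap", "kis", "ke_sath", "jaana", "chahate_ho"]
labels = ["who", "do", "you", "want", "to_go", "with"]

# Observation probabilities P(source word | label).
obs = {
    ("Aap", "you"): 0.9, ("kis", "who"): 0.8, ("ke_sath", "with"): 0.9,
    ("jaana", "to_go"): 0.9, ("chahate_ho", "want"): 0.8, ("chahate_ho", "do"): 0.2,
}

# Transition probabilities P(label_i | label_{i-1}), with '^' and '.' as
# start and end markers.
trans = {
    ("^", "you"): 0.3, ("^", "who"): 0.3, ("you", "who"): 0.2,
    ("who", "with"): 0.3, ("with", "to_go"): 0.4, ("to_go", "want"): 0.5,
    ("want", "."): 0.4,
}

def viterbi(source, labels):
    # best[i][l] = (score, backpointer): best path labelling source[:i+1]
    # that ends in label l.
    best = [{} for _ in source]
    for l in labels:
        best[0][l] = (trans.get(("^", l), 1e-6) * obs.get((source[0], l), 1e-6), None)
    for i in range(1, len(source)):
        for l in labels:
            score, prev = max(
                (best[i - 1][p][0] * trans.get((p, l), 1e-6), p) for p in labels
            )
            best[i][l] = (score * obs.get((source[i], l), 1e-6), prev)
    # Close the path at '.' and follow the backpointers.
    last = max(labels, key=lambda l: best[-1][l][0] * trans.get((l, "."), 1e-6))
    path = [last]
    for i in range(len(source) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))

print(viterbi(source, labels))  # ['you', 'who', 'with', 'to_go', 'want']

A real system would use the bigram language model as the transition table and the translation model P(e|l) as the observation table.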

Page 8

TUTORIAL ON Giza++ and Moses tools (delivered by Kushal Ladha)

Page 9

Word-based alignment

For each word in the source language, align the words from the target language that this word possibly produces.
Based on IBM Models 1-5.
Model 1 is the simplest.
As we go from Models 1 to 5, the models get more complex but more realistic.
This is all that Giza++ does.

Page 10

Alignment

A function from target position to source position:

The alignment sequence is: 2, 3, 4, 5, 6, 6, 6
Alignment function A: A(1) = 2, A(2) = 3, ...
A different alignment function will give the sequence 1, 2, 1, 2, 3, 4, 3, 4 for A(1), A(2), ...

To allow spurious insertion, allow alignment with word 0 (NULL).
Number of possible alignments: $(I+1)^J$ (I source words, J target words)
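
A minimal sketch (with hypothetical toy lengths) that enumerates every alignment function from target positions to source positions, including the NULL word at position 0, and confirms the $(I+1)^J$ count:

from itertools import product

# Hypothetical toy lengths: I source words, J target words.
I, J = 3, 2

# An alignment maps each target position 1..J to a source position 0..I,
# where position 0 is the NULL word that absorbs spurious insertions.
alignments = list(product(range(I + 1), repeat=J))

print(len(alignments))                   # 16
print(len(alignments) == (I + 1) ** J)   # True

# For example, the tuple (2, 3) encodes A(1) = 2, A(2) = 3.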

Page 11

IBM Model 1: Generative Process

Page 12

Training Alignment Models

Given a parallel corpus, for each (F, E) learn the best alignment A and the component probabilities:

t(f|e) for Model 1
lexicon probability P(f|e) and alignment probability $P(a_i \mid a_{i-1}, I)$

How to compute these probabilities if all you have is a parallel corpus?

Page 13

Intuition: Interdependence of Probabilities

If you knew which words are probable translations of each other, then you could guess which alignment is probable and which one is improbable.
If you were given alignments with probabilities, then you could compute the translation probabilities.
Looks like a chicken-and-egg problem.
The EM algorithm comes to the rescue.
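
A minimal sketch of the EM loop for IBM Model 1, using a tiny hand-made parallel corpus; the sentence pairs below are illustrative and not from the lecture:

from collections import defaultdict

# Toy parallel corpus (illustrative): pairs of (foreign sentence, English sentence).
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

# Initialise t(f|e) uniformly.
f_vocab = {f for fs, _ in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):                      # EM iterations
    count = defaultdict(float)           # expected counts c(f, e)
    total = defaultdict(float)           # expected counts c(e)
    for fs, es in corpus:
        for f in fs:
            # E-step: distribute the count of f over the English words
            # in proportion to the current t(f|e).
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                frac = t[(f, e)] / norm
                count[(f, e)] += frac
                total[e] += frac
    # M-step: re-estimate t(f|e) from the expected counts.
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(round(t[("haus", "house")], 3))    # rises towards 1.0 over the iterations

The two quantities reinforce each other exactly as described above: each E-step guesses the alignments from the current t(f|e), and each M-step re-estimates t(f|e) from those guesses.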

Page 14

Limitation: Only 1->Many Alignments Allowed

Page 15

Phrase-based alignment

More natural

Many-to-one mappings allowed

Page 16

Giza++ and Moses Package

http://cl.naist.jp/~eric-n/ubuntu-nlp/
Select your Ubuntu version.
Browse the nlp folder.
Download the Debian packages of giza++, moses, mkcls, srilm.
Resolve all the dependencies and they get installed.
For alternate installation, refer to http://www.statmt.org/moses_steps.html

Page 17

Steps

Input: sentence-aligned parallel corpus
Output: target-side tagged data

Training
Tuning
Generate output on the test corpus (decoding)

Page 18

Training

Create a folder named corpus containing the test, train and tuning files.
Giza++ is used to generate the alignment.
The phrase table is generated after training.
Before training, a language model needs to be built on the target side:

mkdir lm ; /usr/bin/ngram-count -order 3 -interpolate -kndiscount -text $PWD/corpus/train_surface.hi -lm lm/train.lm;
/usr/share/moses/scripts/training/train-factored-phrase-model.perl -scripts-root-dir /usr/share/moses/scripts -root-dir . -corpus train.clean -e hi -f en -lm 0:3:$PWD/lm/train.lm:0;

Page 19

Example

train.en                    | train.pr
h e l l o                   | hh eh l ow
h e l l o                   | hh ah l ow
w o r l d                   | w er l d
c o m p o u n d w o r d     | k aa m p aw n d w er d
h y p h e n a t e d         | hh ay f ah n ey t ih d
o n e                       | ow eh n iy
b o o m                     | b uw m
k w e e z l e b o t t e r   | k w iy z l ah b aa t ah r

Page 20

Sample from Phrase-table

b o ||| b aa ||| (0) (1) ||| (0) (1) ||| 1 0.666667 1 0.181818 2.718
b ||| b ||| (0) ||| (0) ||| 1 1 1 1 2.718
c o m p o ||| aa m p ||| (2) (0,1) (1) (0) (1) ||| (1,3) (1,2,4) (0) ||| 1 0.0486111 1 0.154959 2.718
c ||| p ||| (0) ||| (0) ||| 1 1 1 1 2.718
d w ||| d w ||| (0) (1) ||| (0) (1) ||| 1 0.75 1 1 2.718
d ||| d ||| (0) ||| (0) ||| 1 1 1 1 2.718
l l o ||| l ow ||| (0) (0) (1) ||| (0,1) (2) ||| 0.5 1 1 0.227273 2.718
l l ||| l ||| (0) (0) ||| (0,1) ||| 0.25 1 1 0.833333 2.718
l o ||| l ow ||| (0) (1) ||| (0) (1) ||| 0.5 1 1 0.227273 2.718
l ||| l ||| (0) ||| (0) ||| 0.75 1 1 0.833333 2.718
e b ||| ah b ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.6 2.718
e l l ||| ah l ||| (0) (1) (1) ||| (0) (1,2) ||| 1 1 0.5 0.5 2.718
e l l ||| eh l ||| (0) (0) (1) ||| (0,1) (2) ||| 1 0.111111 0.5 0.111111 2.718
e l ||| eh ||| (0) (0) ||| (0,1) ||| 1 0.111111 1 0.133333 2.718
e ||| ah ||| (0) ||| (0) ||| 1 1 0.666667 0.6 2.718
m ||| m ||| (0) ||| (0) ||| 1 0.5 1 1 2.718
n d ||| n d ||| (0) (1) ||| (0) (1) ||| 1 1 1 1 2.718
n e ||| eh n iy ||| (1) (2) ||| () (0) (1) ||| 1 1 0.5 0.3 2.718
n e ||| n iy ||| (0) (1) ||| (0) (1) ||| 1 1 0.5 0.3 2.718
n ||| eh n ||| (1) ||| () (0) ||| 1 1 0.25 1 2.718
h e ||| hh ah ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.6 2.718
h ||| hh ||| (0) ||| (0) ||| 1 1 1 1 2.718
l e b ||| l ah b ||| (0) (1) (2) ||| (0) (1) (2) ||| 1 1 1 0.5 2.718
l e ||| l ah ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.5 2.718
o o m ||| uw m ||| (0) (0) (1) ||| (0,1) (2) ||| 1 0.5 1 0.181818 2.718
o o ||| uw ||| (0) (0) ||| (0,1) ||| 1 1 1 0.181818 2.718
o ||| aa ||| (0) ||| (0) ||| 1 0.666667 0.2 0.181818 2.718
o ||| ow eh ||| (0) ||| (0) () ||| 1 1 0.2 0.272727 2.718
o ||| ow ||| (0) ||| (0) ||| 1 1 0.6 0.272727 2.718
w o r ||| w er ||| (0) (1) (1) ||| (0) (1,2) ||| 1 0.1875 1 0.424242 2.718
w ||| w ||| (0) ||| (0) ||| 1 0.75 1 1 2.718
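
A minimal sketch of how one line of the phrase table above can be split into its '|||'-separated fields. The exact interpretation of the alignment and score columns depends on the Moses version, so treat the comments below as an assumption rather than a specification:

# One line from the sample phrase table above.
line = "l o ||| l ow ||| (0) (1) ||| (0) (1) ||| 0.5 1 1 0.227273 2.718"

# Fields: source phrase, target phrase, two word-alignment columns, scores.
fields = [f.strip() for f in line.split("|||")]
source, target, align_a, align_b, score_str = fields
scores = [float(s) for s in score_str.split()]

print(source)   # l o
print(target)   # l ow
print(scores)   # [0.5, 1.0, 1.0, 0.227273, 2.718]

# The final 2.718 (= e) is the constant phrase penalty that Moses stores
# with every entry; the other four numbers are phrase translation and
# lexical weights in the two directions.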

Page 21

Tuning

Not a compulsory step, but it will improve the decoding by a small percentage.

mkdir tuning; cp $WDIR/corpus/tun.en tuning/input; cp $WDIR/corpus/tun.hi tuning/reference;
/usr/share/moses/scripts/training/mert-moses.pl $PWD/tuning/input $PWD/tuning/reference /usr/bin/moses $PWD/model/moses.ini --working-dir $PWD/tuning --rootdir /usr/share/moses/scripts

It will take around 1 hour on a server with 32 GB RAM.

Page 22

Testing

mkdir evaluation; /usr/bin/moses -config $WDIR/tuning/moses.ini -input-file $WDIR/corpus/test.en >evaluation/test.output;

The output will be in the evaluation/test.output file.

Sample output:

test.en       | test.output
h o t         | hh aa t
h i           | hh i|UNK
p h o n e     | p|UNK hh ow eh n iy
b o o k       | b uw k