TRANSCRIPT
CS388: Natural Language Processing
Greg Durrett
Lecture 17: Machine Translation 1
Some slides adapted from Dan Klein, UC Berkeley
Star Wars The Third Gathers: The Backstroke of the West (subtitles machine translated from Chinese)
Recall: Attention
(Figure: encoder states h1 h2 h3 h4 over input "the movie was great"; decoder state h̄1 following <s> produces context c1 and output "le")
‣ For each decoder state, compute weighted sum of input states:
e_ij = f(h̄_i, h_j)
α_ij = exp(e_ij) / Σ_j' exp(e_ij')
c_i = Σ_j α_ij h_j
‣ Some function f (TBD)
‣ Weighted sum of input hidden states (vector)
‣ No attn: P(y_i | x, y_1, …, y_{i-1}) = softmax(W h̄_i)
‣ With attn: P(y_i | x, y_1, …, y_{i-1}) = softmax(W [c_i; h̄_i])
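The equations above can be sketched in a few lines of plain Python; the dot product used for the scoring function f and the toy encoder states are assumptions for illustration, not the lecture's specific choice:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_step(h_bar, H):
    """One attention step: decoder state h_bar over encoder states H.
    Uses a dot product as the scoring function f (one simple choice)."""
    e = [dot(h_bar, h_j) for h_j in H]           # e_ij = f(h_bar_i, h_j)
    alpha = softmax(e)                            # alpha_ij, sums to 1
    d = len(h_bar)
    c = [sum(a * h_j[k] for a, h_j in zip(alpha, H)) for k in range(d)]
    return alpha, c                               # weights and context c_i

# toy encoder states for "the movie was great"
H = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
alpha, c = attention_step([1.0, 0.0], H)
print(round(sum(alpha), 6))   # 1.0 -- weights form a distribution
```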
Recall: Pointer Networks
(Figure: input "the movie was great"; distributions over target vocabulary vs. over source words)
‣ Standard decoder (P_vocab): softmax over vocabulary, all words get >0 prob
P(y_i | x, y_1, …, y_{i-1}) = softmax(W [c_i; h̄_i])
‣ Pointer network: predict from source words instead of target vocab
P_pointer(y_i | x, y_1, …, y_{i-1}) ∝ exp(h_j^T V h̄_i) if y_i = w_j, 0 otherwise
This Lecture
‣ MT basics, evaluation
‣ Word alignment
‣ Language models
‣ Phrase-based decoders
‣ Syntax-based decoders (probably next time)
MT Basics

MT
"Trump Pope family watch a hundred years a year in the White House balcony"
People's Daily, August 30, 2017
MT Ideally
‣ I have a friend => ∃x friend(x,self) => J'ai un ami
‣ May need information you didn't think about in your representation: J'ai une amie (friend is female)
‣ Everyone has a friend => ∀x∃y friend(x,y)? ∃x∀y friend(x,y)? => Tous a un ami
‣ Can often get away without doing all disambiguation; same ambiguities may exist in both languages
‣ Hard for semantic representations to cover everything
MT in Practice
Je fais un bureau => I make a desk
Je fais un bureau => I'm making a desk
Je fais une soupe => I'm making soup
Qu'est-ce que tu fais ? => What are you making?
‣ What are some translation pairs you can identify? How do you know?
‣ What makes this hard? Not word-to-word translation; multiple translations of a single source (ambiguous)
‣ Bitext: this is what we learn translation systems from
Levels of Transfer: Vauquois Triangle
Slide credit: Dan Klein
‣ Today: mostly phrase-based, some syntax
Phrase-Based MT
‣ Key idea: translation works better the bigger chunks you use
‣ Remember phrases from training data, translate piece-by-piece and stitch those pieces together to translate
‣ How to identify phrases? Word alignment over source-target bitext
‣ How to stitch together? Language model over target language
‣ Decoder takes phrases and a language model and searches over possible translations
‣ NOT like standard discriminative models (take a bunch of translation pairs, learn a ton of parameters in an end-to-end way)
Phrase-Based MT
Unlabeled English data => Language model P(e)
Phrase table P(f|e):
cat ||| chat ||| 0.9
the cat ||| le chat ||| 0.8
dog ||| chien ||| 0.8
house ||| maison ||| 0.6
my house ||| ma maison ||| 0.9
language ||| langue ||| 0.9
…
Noisy channel model: P(e|f) ∝ P(f|e) P(e); combine scores from translation model + language model to translate foreign to English
"Translate faithfully but make fluent English"
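The noisy-channel combination above amounts to ranking candidate English outputs by log P(f|e) + log P(e). A minimal sketch; the phrase-table entries and unigram LM probabilities below are made-up numbers, not from the lecture:

```python
import math

# Toy noisy-channel scoring: pick the English e maximizing P(f|e) * P(e).
# All probabilities here are illustrative assumptions.
phrase_table = {("chat", "cat"): 0.9, ("chat", "chatting"): 0.1}   # P(f|e)
lm = {"cat": 0.01, "chatting": 0.001}                              # P(e)

def score(f, e):
    # log P(f|e) + log P(e): faithfulness from the translation model,
    # fluency from the language model
    return math.log(phrase_table[(f, e)]) + math.log(lm[e])

best = max(["cat", "chatting"], key=lambda e: score("chat", e))
print(best)   # "cat" wins: both faithful and fluent
```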
Evaluating MT
‣ Fluency: does it sound good in the target language?
‣ Fidelity/adequacy: does it capture the meaning of the original?
‣ BLEU score: geometric mean of 1-, 2-, 3-, and 4-gram precision vs. a reference, multiplied by brevity penalty (penalizes short translations)
BLEU = BP · exp(Σ_n w_n log p_n), BP = 1 if c > r, else e^(1 − r/c)
‣ Typically n = 4, w_i = 1/4
r = length of reference, c = length of prediction
‣ Does this capture fluency and adequacy?
BLEU Score
‣ At a corpus level, BLEU correlates pretty well with human judgments
‣ Better methods with human-in-the-loop
‣ If you're building real MT systems, you do user studies. In academia, you mostly use BLEU
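A sentence-level sketch of the BLEU computation described above. Real BLEU is computed at the corpus level and usually smoothed, so treat this as illustrative only:

```python
import math
from collections import Counter

def bleu(cand, ref, n_max=4):
    """Geometric mean of clipped 1..4-gram precisions times a brevity
    penalty. Simplified: single reference, sentence level, no smoothing."""
    precisions = []
    for n in range(1, n_max + 1):
        c_ngrams = Counter(tuple(cand[i:i+n]) for i in range(len(cand) - n + 1))
        r_ngrams = Counter(tuple(ref[i:i+n]) for i in range(len(ref) - n + 1))
        # clipped counts: a candidate n-gram only matches as often as it
        # appears in the reference
        overlap = sum(min(c, r_ngrams[g]) for g, c in c_ngrams.items())
        total = max(sum(c_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / n_max   # w_i = 1/4
    c, r = len(cand), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)               # brevity penalty
    return bp * math.exp(log_avg)

ref = "the cat sat on the mat".split()
print(round(bleu(ref, ref), 2))   # 1.0 for an identical translation
```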
Word Alignment
‣ Input: a bitext, pairs of translated sentences
nous acceptons votre opinion . ||| we accept your view .
nous allons changer d'avis ||| we are going to change our minds
‣ Output: alignments, pairs of translated words in each sentence
"accept and acceptons are aligned"
‣ Not always one-to-one!
‣ We will see how to turn these into phrases
1-to-Many Alignments

Word Alignment
‣ Model P(f|e): probability of "French" sentence being generated from "English" sentence according to a model
‣ Correct alignments should lead to higher-likelihood generations, so by optimizing this objective we will learn correct alignments
‣ Latent variable model: P(f|e) = Σ_a P(f, a|e) = Σ_a P(f|a, e) P(a)
IBM Model 1
Brown et al. (1993)
e: Thank you , I shall do so gladly .
f: Gracias , lo haré de muy buen grado .
a: 0 2 6 5 7 7 7 7 8 (alignment indices from the figure)
‣ Each French word is aligned to at most one English word
‣ Set P(a) uniformly (no prior over good alignments)
‣ P(f_i | e_{a_i}): word translation probability table
P(f, a|e) = Π_{i=1}^n P(f_i | e_{a_i}) P(a_i)
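Model 1 is typically trained with EM. A minimal sketch; the toy bitext is an assumption, and because P(a) is uniform, the E-step posteriors reduce to normalizing the current t(f|e) values:

```python
from collections import defaultdict

def model1_em(bitext, iters=10):
    """EM for IBM Model 1: learn t(f|e) from (english, french) pairs."""
    e_vocab = {e for es, _ in bitext for e in es}
    t = defaultdict(lambda: 1.0 / len(e_vocab))   # uniform init for t(f|e)
    for _ in range(iters):
        count = defaultdict(float)    # expected counts c(f, e)
        total = defaultdict(float)    # expected counts c(e)
        for es, fs in bitext:
            for f in fs:
                z = sum(t[(f, e)] for e in es)    # normalizer over alignments
                for e in es:
                    p = t[(f, e)] / z             # posterior P(a_i = j | f, e)
                    count[(f, e)] += p
                    total[e] += p
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]              # M-step re-estimate
    return t

bitext = [(["the", "cat"], ["le", "chat"]),
          (["the", "dog"], ["le", "chien"])]
t = model1_em(bitext)
print(t[("le", "the")] > t[("le", "cat")])   # True: "le" aligns to "the"
```

The co-occurrence of "le" with "the" in both pairs is what pulls the probability mass the right way, even with no supervision.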
HMM for Alignment
Vogel et al. (1996)
e: Thank you , I shall do so gladly .
f: Gracias , lo haré de muy buen grado .
a: 0 2 6 5 7 7 7 7 8 (alignment indices from the figure)
‣ Sequential dependence between a's to capture monotonicity
‣ Want local monotonicity: most jumps are small
‣ Alignment dist parameterized by jump size (figure: distribution over jump sizes -2 -1 0 1 2 3)
‣ P(f_i | e_{a_i}): same as before
‣ Re-estimate using the forward-backward algorithm
P(f, a|e) = Π_{i=1}^n P(f_i | e_{a_i}) P(a_i | a_{i-1})
HMM Model
(Figure: alignment matrix for a sentence pair)
‣ Which direction is this?
‣ Some mistakes, especially when you have rare words (garbage collection)
‣ Alignments are generally monotonic (along diagonal)
Evaluating Word Alignment
‣ "Alignment error rate": use labeled alignments on small corpus
‣ Run Model 1 in both directions and intersect "intelligently"
‣ Run HMM model in both directions and intersect "intelligently"

Model        AER
Model 1 INT  19.5
HMM E→F      11.4
HMM F→E      10.8
HMM AND       7.1
HMM INT       4.7
GIZA M4 AND   6.9
Phrase Extraction
‣ Find contiguous sets of aligned words in the two languages that don't have alignments to other words
d'assister à la réunion et ||| to attend the meeting and
‣ Lots of phrases possible; count across all sentences and score by frequency
assister à la réunion ||| attend the meeting
la réunion et ||| the meeting and
nous ||| we
…
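The consistency criterion above (no alignment links leaving the phrase pair) can be sketched as follows. This simplified version limits only the source span length and ignores expansion into unaligned boundary words:

```python
def extract_phrases(f_len, e_len, alignment, max_len=4):
    """Extract consistent phrase pairs from one word-aligned sentence pair.
    alignment: set of (f_index, e_index) links. A pair of spans is kept if
    all links touching either span fall inside the box."""
    phrases = []
    for f1 in range(f_len):
        for f2 in range(f1, min(f_len, f1 + max_len)):
            # target positions linked to source span [f1, f2]
            es = [e for (f, e) in alignment if f1 <= f <= f2]
            if not es:
                continue                      # need at least one link
            e1, e2 = min(es), max(es)
            # consistency check: no link from the target span escapes the box
            if all(f1 <= f <= f2 for (f, e) in alignment if e1 <= e <= e2):
                phrases.append(((f1, f2), (e1, e2)))
    return phrases

# "nous acceptons votre opinion ." ||| "we accept your view ." (monotone 1:1)
alignment = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
pairs = extract_phrases(5, 5, alignment)
print(((0, 1), (0, 1)) in pairs)   # True: "nous acceptons" ||| "we accept"
```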
Decoding

Recall: n-gram Language Models
‣ n-gram models: distribution of next word is a multinomial conditioned on previous n-1 words
P(w) = P(w_1) P(w_2 | w_1) P(w_3 | w_1, w_2) …
P(w_i | w_1, …, w_{i-1}) = P(w_i | w_{i-n+1}, …, w_{i-1})
I visited San _____ (put a distribution over the next word)
P(w | visited San) = count(visited San, w) / count(visited San)
(maximum likelihood estimate of this 3-gram probability from a corpus)
‣ Typically use ~5-gram language models for translation
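The maximum likelihood estimate above is just a ratio of counts. A minimal trigram sketch with a toy corpus (no smoothing, which real MT language models would need):

```python
from collections import Counter

def train_trigram(corpus):
    """MLE trigram LM: P(w | w1 w2) = count(w1 w2 w) / count(w1 w2)."""
    tri, bi = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>", "<s>"] + sent + ["</s>"]
        for i in range(len(toks) - 2):
            bi[(toks[i], toks[i + 1])] += 1
            tri[(toks[i], toks[i + 1], toks[i + 2])] += 1
    def p(w, w1, w2):
        # returns 0.0 for unseen histories; a real LM would smooth/back off
        return tri[(w1, w2, w)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    return p

corpus = [["I", "visited", "San", "Francisco"],
          ["I", "visited", "San", "Diego"],
          ["I", "visited", "San", "Francisco"]]
p = train_trigram(corpus)
print(p("Francisco", "visited", "San"))   # 2/3 from the counts above
```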
Phrase-Based Decoding
‣ Inputs:
‣ Phrase table: set of phrase pairs (e, f) with probabilities P(f|e)
‣ n-gram language model: P(e_i | e_1, …, e_{i-1}) ≈ P(e_i | e_{i-n+1}, …, e_{i-1})
‣ What we want to find: e produced by a series of phrase-by-phrase translations from an input f, possibly with reordering
‣ Phrase lattices are big!
Slide credit: Dan Klein
Phrase-Based Decoding
‣ Input
‣ Translations
‣ Decoding objective (for 3-gram LM)
Slide credit: Dan Klein
Monotonic Translation
‣ If we translate with beam search, what state do we need to keep in the beam?
‣ What have we translated so far?
‣ What words have we produced so far?
‣ When using a 3-gram LM, only need to remember the last 2 words! Koehn (2004)

Monotonic Translation
(Figure: beam hypotheses at idx=2: "…did not", "Mary not", "Mary no", with their scores)
score = log [ P(Mary) P(not|Mary) P(Maria|Mary) P(no|not) ] (LM terms, then TM terms)
In reality: score = α log P(LM) + β log P(TM) …and TM is broken down into several features
Koehn (2004)
Monotonic Translation
(Figure: beam hypotheses at idx=5: "…not slap", "…a slap", "…no slap", with their scores, reached from "…not give" at idx=3 and "…give a" at idx=4)
una bofetada ||| a slap
bofetada ||| slap
‣ Several paths can get us to this state, max over them (like Viterbi)
‣ Variable-length translation pieces = semi-HMM
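The monotonic beam search described above can be sketched as follows, assuming a bigram LM (so the state keeps one previous word rather than two) and toy phrase-table and LM scores; real decoders like Pharaoh add many more features:

```python
import math

def decode_monotonic(src, phrase_table, lm, beam_size=5):
    """Monotonic phrase-based beam search. State = (source index covered,
    last produced word); score = sum of TM and LM log probs.
    phrase_table: source tuple -> [(target tuple, log P(f|e))]
    lm: (prev word, word) -> log prob."""
    beams = {i: [] for i in range(len(src) + 1)}
    beams[0] = [(0.0, "<s>", [])]
    for idx in range(len(src)):
        # prune to the beam before expanding (Viterbi-style max over paths)
        beams[idx] = sorted(beams[idx], reverse=True)[:beam_size]
        for score, last, out in beams[idx]:
            for plen in range(1, len(src) - idx + 1):   # variable-length pieces
                f = tuple(src[idx:idx + plen])
                for e, tm in phrase_table.get(f, []):
                    s, prev = score + tm, last
                    for w in e:
                        s += lm.get((prev, w), math.log(1e-4))  # unseen penalty
                        prev = w
                    beams[idx + plen].append((s, prev, out + list(e)))
    return max(beams[len(src)])[2]   # best full translation

# toy scores, loosely following the Maria/no example
pt = {("Maria",): [(("Mary",), math.log(0.9))],
      ("no",): [(("not",), math.log(0.7)), (("no",), math.log(0.3))]}
lm = {("<s>", "Mary"): math.log(0.5), ("Mary", "not"): math.log(0.4),
      ("Mary", "no"): math.log(0.05)}
print(decode_monotonic(["Maria", "no"], pt, lm))   # ['Mary', 'not']
```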
Non-Monotonic Translation
‣ Non-monotonic translation: can visit source sentence "out of order"
‣ State needs to describe which words have been translated and which haven't
translated: Maria, dio, una, bofetada
‣ Big enough phrases already capture lots of reorderings, so this isn't as important as you think
Training Decoders
score = α log P(LM) + β log P(TM) …and TM is broken down into several features
‣ Usually 5-20 feature weights to set; want to optimize for BLEU score, which is not differentiable
‣ MERT (Och 2003): decode to get 1000-best translations for each sentence in a small training set (<1000 sentences), do line search on parameters to directly optimize for BLEU
Moses
‣ Toolkit for machine translation due to Philipp Koehn + Hieu Hoang
‣ Pharaoh (Koehn, 2004) is the decoder from Koehn's thesis
‣ Moses implements word alignment, language models, and this decoder, plus *a ton* more stuff
‣ Highly optimized and heavily engineered; could more or less build SOTA translation systems with this from 2007-2015
‣ Next time: results on these and comparisons to neural methods
Syntax

Levels of Transfer: Vauquois Triangle
Slide credit: Dan Klein
‣ Is syntax a "better" abstraction than phrases?
Syntactic MT
‣ Rather than use phrases, use a synchronous context-free grammar
NP → [DT1 JJ2 NN3; DT1 NN3 JJ2]
DT → [the, la]
DT → [the, le]
NN → [car, voiture]
JJ → [yellow, jaune]
the yellow car => la voiture jaune
(NP covers DT1 JJ2 NN3 on the source side, DT1 NN3 JJ2 on the target side)
‣ Assumes parallel syntax up to reordering
‣ Translation = parse the input with "half" of the grammar, read off the other half
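Applying the single reordering rule above can be sketched in a few lines; `translate_np` and the one-rule grammar are illustrative assumptions, not a real synchronous parser:

```python
# Toy synchronous CFG: a rule pairs a source RHS with a target RHS whose
# indexed nonterminals are reordered. Real syntactic MT parses the input
# with the source side of the grammar and reads off the target side.
rules = {
    "NP": (["DT1", "JJ2", "NN3"], ["DT1", "NN3", "JJ2"]),  # adjective moves
}
lexicon = {"DT": {"the": "la"}, "JJ": {"yellow": "jaune"},
           "NN": {"car": "voiture"}}

def translate_np(words):
    """Apply the NP rule to a 3-word source span and emit the reordered
    target side (illustrative helper, assumes the rule matches)."""
    src_rhs, tgt_rhs = rules["NP"]
    # map each indexed nonterminal (DT1, JJ2, NN3) to its translated word;
    # sym[:-1] strips the reordering index to get the category
    filled = {sym: lexicon[sym[:-1]][w] for sym, w in zip(src_rhs, words)}
    return [filled[sym] for sym in tgt_rhs]

print(translate_np(["the", "yellow", "car"]))   # ['la', 'voiture', 'jaune']
```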
Syntactic MT
Slide credit: Dan Klein
‣ Use lexicalized rules, look like "syntactic phrases"
‣ Leads to HUGE grammars, parsing is slow
Takeaways
‣ Phrase-based systems consist of 3 pieces: aligner, language model, decoder
‣ HMMs work well for alignment
‣ N-gram language models are scalable and historically worked well
‣ Decoder requires searching through a complex state space
‣ Lots of system variants incorporating syntax
‣ Next time: neural MT