neural machine transla#on
TRANSCRIPT
NeuralMachineTransla/on
ThangLuongLecture@CS224D
Spring2016
(SpecialthankstoChrisManningforfeedback!)
7billionpeople,7000languages
5/19/16 ThangLuong-NeuralMachineTranslaHon 2
Auniversaltranslator
(TheBabelFishfrom“theHitchhiker'sGuidetotheGalaxy”)
5/19/16 ThangLuong-NeuralMachineTranslaHon 3
Machinevs.HumanTransla/on
“Nevertheless,withinthedisciplineofmachinelearning,aspecializa8oncalleddeeplearninghasbeendeveloped.Itsdis8nguishingfeatureisthatitisinspiredbyneurobiology.Deeplearningdealswithcomputa8onalelementswhichallowanetworkofar8ficialneuronstolearnamodelofthehumanbrain.”
Faithfultransla/on
5/19/16 ThangLuong-NeuralMachineTranslaHon 4
• GrammaHcallyincorrect.
Machinevs.HumanTransla/on
“Nevertheless,withinthedisciplineofmachinelearning,aspecializa8oncalleddeeplearninghasbeendeveloped.Itsdis8nguishingfeatureisthatitisinspiredbyneurobiology.Deeplearningdealswithcomputa8onalelementswhichallowanetworkofar8ficialneuronstolearnamodelofthehumanbrain.”
Faithfultransla/on
5/19/16 ThangLuong-NeuralMachineTranslaHon 5
• Badwordchoices.
Machinevs.HumanTransla/on
“Nevertheless,withinthedisciplineofmachinelearning,aspecializa8oncalleddeeplearninghasbeendeveloped.Itsdis8nguishingfeatureisthatitisinspiredbyneurobiology.Deeplearningdealswithcomputa8onalelementswhichallowanetworkofar8ficialneuronstolearnamodelofthehumanbrain.”
Faithfultransla/on
5/19/16 ThangLuong-NeuralMachineTranslaHon 6
• Badsentencestructures.
Machinevs.HumanTransla/on
“However,inmachinelearningaspecializa8oncalleddeeplearninghasemerged.Itcanberecognizedbyitsdis8nc8veneurobiologicalinfluence.Deeplearningiscenteredaroundnetworksofar8ficialneuronswhichcanlearnmodelsofthehumanbrain.”
Fluenttransla/on
Abiggap!5/19/16 ThangLuong-NeuralMachineTranslaHon 7
HowhasMTevolved?
5/19/16 ThangLuong-NeuralMachineTranslaHon 8
Phrase-basedMT
Sheloves
Elleaime
cute
leschats
cats
mignons(Brownetal.,1993;Koehnetal.,2003;Och&Ney,2004)
• Breaksentencesintochunks.• Transla8onmodel:lookupphrasetranslaHons.• Languagemodel:Hephrasestogether.
Translatelocally LMusesonlytargetwords5/19/16 ThangLuong-NeuralMachineTranslaHon 9
JointNeuralLanguageModel
• CondiHonedonsourcewords(Devlinetal.,2014)• SHlltranslatelocally.
…alléspourunepromenadelelongdela
WewentforastrollalongtheSouthBank
rive
walk along bank(river)
MTsystemsbecomemorecomplex!5/19/16 ThangLuong-NeuralMachineTranslaHon 10
NeuralMachineTransla/ontotherescue!
Let’sfindout!
• Sequence-to-sequence:translateglobally.• End-to-end:simple&generalizable.
(Sutskeveretal.,2014;Choetal.,2014)
NMT
Iamastudent Jesuisétudiant
5/19/16 ThangLuong-NeuralMachineTranslaHon 11
Outline
• BasicNMT– RNNRecap.– Encoder-Decoder.– Training.– TesHng.
• AdvancedNMT
5/19/16 ThangLuong-NeuralMachineTranslaHon 12
RecurrentNeuralNetworks(RNNs)
I am a studentinput:
5/19/16 ThangLuong-NeuralMachineTranslaHon 13
(PictureadaptedfromAndrejKarparthy)
RecurrentNeuralNetworks(RNNs)
I am a studentinput:
RNNstorepresentsequences!
ht-1 ht
xt
(PictureadaptedfromAndrejKarparthy)5/19/16 ThangLuong-NeuralMachineTranslaHon 14
NeuralMachineTransla/on(NMT)
am a student _ Je suis étudiant
Je suis étudiant _
I
5/19/16 ThangLuong-NeuralMachineTranslaHon 15
• RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.
NeuralMachineTransla/on(NMT)
am a student _ Je suis étudiant
Je suis étudiant _
I
5/19/16 ThangLuong-NeuralMachineTranslaHon 16
• RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.
NeuralMachineTransla/on(NMT)
am a student _ Je suis étudiant
Je suis étudiant _
I
5/19/16 ThangLuong-NeuralMachineTranslaHon 17
• RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.
NeuralMachineTransla/on(NMT)
am a student _ Je suis étudiant
Je suis étudiant _
I
5/19/16 ThangLuong-NeuralMachineTranslaHon 18
• RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.
NeuralMachineTransla/on(NMT)
am a student _ Je suis étudiant
Je suis étudiant _
I
5/19/16 ThangLuong-NeuralMachineTranslaHon 19
• RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.
NeuralMachineTransla/on(NMT)
am a student _ Je suis étudiant
Je suis étudiant _
I
Encoder
5/19/16 ThangLuong-NeuralMachineTranslaHon 20
• RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.
NeuralMachineTransla/on(NMT)
am a student _ Je suis étudiant
Je suis étudiant _
I
Encoder
5/19/16 ThangLuong-NeuralMachineTranslaHon 21
• RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.
Boundarymarker
NeuralMachineTransla/on(NMT)
am a student _ Je suis étudiant
Je suis étudiant _
I
Encoder
5/19/16 ThangLuong-NeuralMachineTranslaHon 22
• RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.
NeuralMachineTransla/on(NMT)
am a student _ Je suis étudiant
Je suis étudiant _
I
Encoder
5/19/16 ThangLuong-NeuralMachineTranslaHon 23
• RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.
NeuralMachineTransla/on(NMT)
am a student _ Je suis étudiant
Je suis étudiant _
I
Encoder Decoder
• RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.
Boundarymarker
5/19/16 ThangLuong-NeuralMachineTranslaHon 24
WordEmbeddings
am a student _ Je suis étudiant
Je suis étudiant _
I
• Oneforeachlanguage:canlearnfromscratch.
Sourceembeddings
Targetembeddings
5/19/16 ThangLuong-NeuralMachineTranslaHon 25
am a student _ Je suis étudiant
Je suis étudiant _
I
RecurrentConnec/onsIniHalstates
• Olensetto0.5/19/16 ThangLuong-NeuralMachineTranslaHon 26
am a student _ Je suis étudiant
Je suis étudiant _
I
RecurrentConnec/ons
Encoder1stlayer
5/19/16 ThangLuong-NeuralMachineTranslaHon 27
• Different:{1stlayer,2ndlayer}x{encoder,decoder}.
am a student _ Je suis étudiant
Je suis étudiant _
I
RecurrentConnec/ons
Encoder2ndlayer
5/19/16 ThangLuong-NeuralMachineTranslaHon 28
• Different:{1stlayer,2ndlayer}x{encoder,decoder}.
am a student _ Je suis étudiant
Je suis étudiant _
I
RecurrentConnec/ons
• Different:{1stlayer,2ndlayer}x{encoder,decoder}.
Decoder1stlayer
5/19/16 ThangLuong-NeuralMachineTranslaHon 29
am a student _ Je suis étudiant
Je suis étudiant _
I
RecurrentConnec/ons
Decoder2ndlayer
5/19/16 ThangLuong-NeuralMachineTranslaHon 30
• Different:{1stlayer,2ndlayer}x{encoder,decoder}.
RecurrentUnits
• Vanilla:
• LSTM:
5/19/16 ThangLuong-NeuralMachineTranslaHon 31
RNN
LSTM
C’mon,it’sbeenaroundfor20years!
Vanishinggradientproblem!
SoTmax:vectors↦categories
am a student _ Je suis étudiant
Je suis étudiant _
I
am a student _ Je suis étudiant
Je suis étudiant _
I
Solmaxparameters
5/19/16 ThangLuong-NeuralMachineTranslaHon 32
am a student _ Je suis étudiant
Je suis étudiant _
I
Targethiddenstate
|V|
SoTmax:vectors↦categories
am a student _ Je suis étudiant
Je suis étudiant _
I
am a student _ Je suis étudiant
Je suis étudiant _
I
5/19/16 ThangLuong-NeuralMachineTranslaHon 33
am a student _ Je suis étudiant
Je suis étudiant _
I
|V|
• Hiddenstates↦scores↦probabiliHes.
Scores
=
Probs
P(Je|…)exp&normalize
TrainingLoss
• MaximizeP(target|source)
am a student _ Je suis étudiantI
Je suis étudiant _
5/19/16 ThangLuong-NeuralMachineTranslaHon 34
TrainingLoss
am a student _ Je suis étudiant
-log P(Je)
I
-log P(suis)
• Sumofallindividuallosses5/19/16 ThangLuong-NeuralMachineTranslaHon 35
TrainingLoss
am a student _ Je suis étudiant
-log P(Je)
I
-log P(suis)
• Sumofallindividuallosses5/19/16 ThangLuong-NeuralMachineTranslaHon 36
TrainingLoss
• Sumofallindividuallosses
am a student _ Je suis étudiantI
-log P(étudiant)
5/19/16 ThangLuong-NeuralMachineTranslaHon 37
TrainingLoss
• Sumofallindividuallosses
am a student _ Je suis étudiantI
-log P(_)
5/19/16 ThangLuong-NeuralMachineTranslaHon 38
Backpropaga/onThroughTime
am a student _ Je suis étudiantI
-log P(_)
Initto0
5/19/16 ThangLuong-NeuralMachineTranslaHon 39
am a student _ Je suis étudiantI
-log P(étudiant)
Backpropaga/onThroughTime
5/19/16 ThangLuong-NeuralMachineTranslaHon 40
am a student _ Je suis étudiantI
-log P(étudiant)
Backpropaga/onThroughTime
5/19/16 ThangLuong-NeuralMachineTranslaHon 41
am a student _ Je suis étudiantI
-log P(suis)
Backpropaga/onThroughTime
5/19/16 ThangLuong-NeuralMachineTranslaHon 42
am a student _ Je suis étudiantI
-log P(suis)
Backpropaga/onThroughTime
5/19/16 ThangLuong-NeuralMachineTranslaHon 43
am a student _ Je suis étudiantI
Backpropaga/onThroughTime
RNNgradientsareaccumulated.5/19/16 ThangLuong-NeuralMachineTranslaHon 44
Trainingvs.Tes/ng
• Training– CorrecttranslaHonsareavailable.
• Tes8ng– Onlysourcesentencesaregiven.
am a student _ Je suis étudiant
Je suis étudiant _
I
am a student _ Je suis étudiant
Je suis étudiant _
I5/19/16 ThangLuong-NeuralMachineTranslaHon 45
Tes/ng
• Feedthemostlikelyword5/19/16 ThangLuong-NeuralMachineTranslaHon 46
Tes/ng
5/19/16 ThangLuong-NeuralMachineTranslaHon 47
• Feedthemostlikelyword
Tes/ng
5/19/16 ThangLuong-NeuralMachineTranslaHon 48
• Feedthemostlikelyword
Tes/ng
5/19/16 ThangLuong-NeuralMachineTranslaHon 49
• Feedthemostlikelyword
Tes/ng
Simplebeam-searchdecoders!5/19/16 ThangLuong-NeuralMachineTranslaHon 50
37
33.3
34.8
36.5
31
32
33
34
35
36
37
38
BLEU
SOTASMT(Durrani+,2014)
AvgSMT(Schwenk,2014)
NMT(Sutskever+,2014)
SMT+NMTrescore(Sutskever+,2014)
English-FrenchWMT’14results
5/19/16 ThangLuong-NeuralMachineTranslaHon 51
English-FrenchWMT’14results
37
33.3
34.8
36.5
31
32
33
34
35
36
37
38
BLEU
SOTASMT(Durrani+,2014)
AvgSMT(Schwenk,2014)
NMT(Sutskever+,2014)
SMT+NMTrescore(Sutskever+,2014)
5/19/16 ThangLuong-NeuralMachineTranslaHon 52
2decadesofresearch
1-2yearsofresearch
Encoder-decoderVariants
Encoder Decoder
(Sutskeveretal.,2014)MyNMTmodels DeepLSTM DeepLSTM
(Choetal.,2014)(Bahdanauetal.,2015)
(Jeanetal.,2015)
(BidirecHonal)GRU GRU
(Kalchbrenner&Blunsom,2013) CNN (InverseCNN)
RNN
Next,advancedNMT!5/19/16 ThangLuong-NeuralMachineTranslaHon 53
Deepfriedbaby Meatmusclestupidbeansprouts
Break/me:whenMTfails…
Saleofchickenmurder Gobacktowardyourbehind
5/19/16 ThangLuong-NeuralMachineTranslaHon 54
Limita/ons
• #1:thevocabularysizeproblem– Goal:extendthevocabularycoverage.
• #2:thesentencelengthproblem– Goal:translatelongsentencesbever.
• #3:thelanguagecomplexityproblem– Goal:handlemorelanguagevariaHons.
5/19/16 ThangLuong-NeuralMachineTranslaHon 55
AdvancingNMT
• #1:thevocabularysizeproblem– Sol:“copy”mechanism.
• #2:thesentencelengthproblem– Sol:avenHonmechanism.
• #3:thelanguagecomplexityproblem– Sol:character-leveltranslaHon.
5/19/16 ThangLuong-NeuralMachineTranslaHon 56
The<unk>porHcoin<unk>
Le<unk><unk>de<unk>
ecotax Pont-de-Buis
por8que écotaxe Pont-de-Buis
am a student _ Je suis étudiant
Je suis étudiant _
I
CVvs.NLP
ComputerVision
cat
1Kcategories 1Mcategories
NLP
mat
Thecatsatona
5/19/16 ThangLuong-NeuralMachineTranslaHon 57
#1TheVocabularySizeProblem
• WordgeneraHonproblem– Vocabsaremodest:50K.– Simplesolmax:GPUfriendliness.
am a student _ Je suis étudiant
Je suis étudiant _
I
The<unk>porHcoin<unk>Le<unk><unk>de<unk>
TheecotaxporHcoinPont-de-BuisLeporHqueécotaxedePont-de-Buis
5/19/16 ThangLuong-NeuralMachineTranslaHon 58
• Propose“copy”mechanismsfor<unk>.
• Simple&effecHve– TreatanyNMTasablackbox.– Annotatetrainingdata.– Post-processtranslaHons.
ThangLuong*,IlyaSutskever*,QuocLe*,OriolVinyals,andWojciechZaremba.AddressingtheRareWordProbleminNeuralMachineTransla>on.ACL2015.
SOTAforEnglish-FrenchtranslaHon.
5/19/16 ThangLuong-NeuralMachineTranslaHon 59
Ourapproach–trainingannota>on
• AddrelaHveposiHons.
“AvenHon”forrarewords
TheecotaxporHcoinPont-de-Buis
LeporHqueécotaxedePont-de-Buis
• Learnalignments.
The<unk>porHcoin<unk>
Leunk1unk-1deunk0
5/19/16 ThangLuong-NeuralMachineTranslaHon 60
Ourapproach–post-process
Testsentence
Transla8on
The<unk>porHcoin<unk>
LeporHqueunk-1deunk0
ecotax Pont-de-Buis
5/19/16 ThangLuong-NeuralMachineTranslaHon 61
Ourapproach–post-process
Testsentence
Transla8on
The<unk>porHcoin<unk>
LeporHqueunk-1deunk0
ecotax Pont-de-Buis
Post-editTransla8on
Dic/onarytranslaHon
LeporHqueécotaxedePont-de-Buis
5/19/16 ThangLuong-NeuralMachineTranslaHon 62
Ourapproach–post-process
Testsentence
Transla8on
The<unk>porHcoin<unk>
LeporHqueunk-1deunk0
ecotax Pont-de-Buis
LeporHqueécotaxedePont-de-Buis
Iden/tycopy
Post-editTransla8on
Orthogonaltolarge-vocabtechniques5/19/16 ThangLuong-NeuralMachineTranslaHon 63
EffectsofTransla/ngRareWords
25
30
35
40
BLEU
Sentencesorderedbyaveragefrequencyrank
Durranietal.(37.0)
Sutskeveretal.(34.8)
Thiswork(37.5)
5/19/16 ThangLuong-NeuralMachineTranslaHon 64
FirstSOTANMTsystem!
Sampletransla/ons
• Predictwelllong-distancealignments.– Correct:cataractvs.cataracte.
source AnaddiHonal2600operaHonsincludingorthopedicandcataractsurgerywillhelpclearabacklog.
human 2600opéraHonssupplémentaires,notammentdansledomainedelachirurgieorthopédiqueetdelacataracte,aiderontàravraperleretard.
trans Enoutre,unk1opéraHonssupplémentaires,dontlachirurgieunk5etlaunk6,permevrontderésorberl'arriéré.
trans+unk
Enoutre,2600opéraHonssupplémentaires,dontlachirurgieorthopédiquesetlacataracte,permevrontderésorberl'arriéré.
5/19/16 ThangLuong-NeuralMachineTranslaHon 65
Sampletransla/ons
• Translatewelllongsentences.– Correct:JPMorganvs.JPMorgan.
sourceThistrader,RichardUsher,lelRBSin2010andisunderstandtohavebegivenleavefromhiscurrentposiHonasEuropeanheadofforexspottradingatJPMorgan.
humanCetrader,RichardUsher,aquivéRBSen2010etauraitétémissuspendudesonpostederesponsableeuropéendutradingaucomptantpourlesdeviseschezJPMorgan.
transCeunk0,Richardunk0,aquivéunk1en2010etacomprisqu'ilestautoriséàquiversonposteactuelentantqueleadereuropéendumarchédespointsdeventeauunk5.
trans+unk
Cenégociateur,RichardUsher,aquivéRBSen2010etacomprisqu'ilestautoriséàquiversonposteactuelentantqueleadereuropéendumarchédespointsdeventeauJPMorgan.
5/19/16 ThangLuong-NeuralMachineTranslaHon 66
Sampletransla/ons
• IncorrectalignmentpredicHon:was–étaitvs.abandonnait.
source ButconcernshavegrownalerMrMazangawasquotedassayingRenamowasabandoningthe1992peaceaccord.
human Maisl'inquiétudeagrandiaprèsqueM.MazangaadéclaréquelaRenamoabandonnaitl'accorddepaixde1992.
trans MaislesinquiétudessesontaccruesaprèsqueM.unkpos3adéclaréquelaunk3unk3l'accorddepaixde1992.
trans+unk
MaislesinquiétudessesontaccruesaprèsqueM.MazangaadéclaréquelaRenamoétaitl'accorddepaixde1992.
5/19/16 ThangLuong-NeuralMachineTranslaHon 67
AdvancingNMT
• #1:thevocabularysizeproblem– Sol:“copy”mechanism.
• #2:thesentencelengthproblem– Sol:avenHonmechanism.
• #3:thelanguagecomplexityproblem– Sol:character-leveltranslaHon.
5/19/16 ThangLuong-NeuralMachineTranslaHon 68
The<unk>porHcoin<unk>
Le<unk><unk>de<unk>
ecotax Pont-de-Buis
por8que écotaxe Pont-de-Buis
am a student _ Je suis étudiant
Je suis étudiant _
I
#2TheSentenceLengthProblem
Problem:sentencemeaningisrepresentedbyafixed-dimensionalvector.
am a student _ Je suis étudiant
Je suis étudiant _
I
• TranslaHonqualitydegradeswithlongsentences.
5/19/16 ThangLuong-NeuralMachineTranslaHon 69
Youcan'tcramthemeaningofawhole%&!$#sentenceintoasingle$&!#*vector!
5/19/16 ThangLuong-NeuralMachineTranslaHon 70
(AdaptedfromKyungHuynCho’talk)
Amen/onMechanism
• SoluHon:randomaccessmemory– Retrieveasneeded.
am a student _ Je suis étudiant
Je suis étudiant _
I
Poolofsourcestates
5/19/16 ThangLuong-NeuralMachineTranslaHon 71
5/19/16 ThangLuong-NeuralMachineTranslaHon 72
DzmitryBahdanau,KyungHuynCho,andYoshuaBengio.NeuralMachineTransla>onbyJointlyLearningtoTranslateandAlign.ICLR2015.
WithavenHonWithoutavenHon
Amen/onMechanism
am a student _ Je
suis
I
Attention Layer
Context vector
?
Asimplifiedversionof(Bahdanauetal.,2015)5/19/16 ThangLuong-NeuralMachineTranslaHon 73
• Comparetargetandsourcehiddenstates.
Amen/onMechanism–Scoring
am a student _ Je
suis
I
Attention Layer
Context vector
?
3
5/19/16 ThangLuong-NeuralMachineTranslaHon 74
• Comparetargetandsourcehiddenstates.
Amen/onMechanism–Scoring
am a student _ Je
suis
I
Attention Layer
Context vector
?
53
5/19/16 ThangLuong-NeuralMachineTranslaHon 75
• Comparetargetandsourcehiddenstates.
Amen/onMechanism–Scoring
am a student _ Je
suis
I
Attention Layer
Context vector
?
13 5
5/19/16 ThangLuong-NeuralMachineTranslaHon 76
• Comparetargetandsourcehiddenstates.
Amen/onMechanism–Scoring
am a student _ Je
suis
I
Attention Layer
Context vector
?
13 5 1
5/19/16 ThangLuong-NeuralMachineTranslaHon 77
• Convertintoalignmentweights.
Amen/onMechanism–Normaliza>on
am a student _ Je
suis
I
Attention Layer
Context vector
?
0.10.3 0.5 0.1
5/19/16 ThangLuong-NeuralMachineTranslaHon 78
am a student _ Je
suis
I
Context vector
• Buildcontextvector:weightedaverage.
Amen/onMechanism–Contextvector
?
5/19/16 ThangLuong-NeuralMachineTranslaHon 79
am a student _ Je
suis
I
Context vector
• Computethenexthiddenstate.
Amen/onMechanism–Hiddenstate
5/19/16 ThangLuong-NeuralMachineTranslaHon 80
am a student _ Je
suis
I
Context vector
• Predictthenextword.
Amen/onMechanism–Predict
5/19/16 ThangLuong-NeuralMachineTranslaHon 81
• ExaminevariousavenHonmechanisms:
ThangLuong,HieuPham,andChrisManning.Effec>veApproachestoAGen>on-basedNeuralMachineTransla>on.EMNLP2015.
Global:allsourcestates. Local:subsetofsourcestates.
SOTAforEnglish-GermantranslaHon.
5/19/16 ThangLuong-NeuralMachineTranslaHon 82
TranslateLongSentences
10 20 30 40 50 60 7010
15
20
25
Sent Lengths
BLEU
����
�
ours, no attn (BLEU 13.9)ours, local−p attn (BLEU 20.9)ours, best system (BLEU 23.0)WMT’14 best (BLEU 20.7)Jeans et al., 2015 (BLEU 21.6)
NoAvenHon
AvenHon
5/19/16 ThangLuong-NeuralMachineTranslaHon 83
SampleEnglish-Germantransla/ons
• Translatenamescorrectly.
source OrlandoBloomandMirandaKerrsHllloveeachother
human OrlandoBloomundMirandaKerrliebensichnochimmer
bestOrlandoBloomundMirandaKerrliebeneinandernochimmer.
baseOrlandoBloomundLucasMirandaliebeneinandernochimmer.
5/19/16 ThangLuong-NeuralMachineTranslaHon 84
SampleEnglish-Germantransla/ons
• Translateadoubly-negatedphrasecorrectly• Failtotranslate“passengerexperience”.
sourceWeʹrepleasedtheFAArecognizesthatanenjoyablepassengerexperienceisnotincompa>blewithsafetyandsecurity,saidRogerDow,CEOoftheU.S.TravelAssociaHon.
humanWirfreuenuns,dassdieFAAerkennt,dasseinangenehmesPassagiererlebnisnichtimWider-spruchzurSicherheitsteht,sagteRogerDow,CEOderU.S.TravelAssociaHon.
best Wirfreuenuns,dassdieFAAanerkennt,dasseinangenehmesistnichtmitSicherheitundSicherheitunvereinbarist,sagteRogerDow,CEOderUS-die.
baseWirfreuenunsuberdie<unk>,dassein<unk><unk>mitSicherheitnichtvereinbaristmitSicherheitundSicherheit,sagteRogerCameron,CEOderUS-<unk>.
5/19/16 ThangLuong-NeuralMachineTranslaHon 85
sourceWeʹrepleasedtheFAArecognizesthatanenjoyablepassengerexperienceisnotincompa>blewithsafetyandsecurity,saidRogerDow,CEOoftheU.S.TravelAssociaHon.
humanWirfreuenuns,dassdieFAAerkennt,dasseinangenehmesPassagiererlebnisnichtimWider-spruchzurSicherheitsteht,sagteRogerDow,CEOderU.S.TravelAssociaHon.
best Wirfreuenuns,dassdieFAAanerkennt,dasseinangenehmesistnichtmitSicherheitundSicherheitunvereinbarist,sagteRogerDow,CEOderUS-die.
baseWirfreuenunsuberdie<unk>,dassein<unk><unk>mitSicherheitnichtvereinbaristmitSicherheitundSicherheit,sagteRogerCameron,CEOderUS-<unk>.
SampleEnglish-Germantransla/ons
• Translateadoubly-negatedphrasecorrectly• Failtotranslate“passengerexperience”.5/19/16 ThangLuong-NeuralMachineTranslaHon 86
TEDtalk,English-German
30.85
26.18 26.02 24.9622.51
20.08
0
5
10
15
20
25
30
35
BLEU
16.16
21.84 22.67 23.42
28.18
0
5
10
15
20
25
30
Stanford EdinburghKarlsruheHeidelberg PJAIT
HUMANTER
26%
ThangLuongandChrisManning.StanfordNeuralMachineTransla>onSystemsforSpokenLanguageDomain.IWSLT2015.
Winning
5/19/16 ThangLuong-NeuralMachineTranslaHon 87
AdvancingNMT
• #1:thevocabularysizeproblem– Sol:“copy”mechanism.
• #2:thesentencelengthproblem– Sol:avenHonmechanism.
• #3:thelanguagecomplexityproblem– Sol:character-leveltranslaHon.
5/19/16 ThangLuong-NeuralMachineTranslaHon 88
The<unk>porHcoin<unk>
Le<unk><unk>de<unk>
ecotax Pont-de-Buis
por8que écotaxe Pont-de-Buis
am a student _ Je suis étudiant
Je suis étudiant _
I
#3Therarewordproblem
• “Copying”mechanismsarenotsufficient.– Differentalphabets:Christopher↦Kryštof– MulH-wordalignment:Solarsystem↦Sonnensystem
• Needtohandlelarge,openvocabulary– Richmorphology:nejneobhospodařovávatelnějšímu
(“totheworstfarmableone”)– Informalspelling:gooooooodmorning!!!!!
Beabletogenerateatthecharacterlevel.5/19/16 ThangLuong-NeuralMachineTranslaHon 89
Recentcharacter-levelNMT
• UnsaHsfactoryperformance– (WangLing,IsabelTrancoso,ChrisDyer,AlanBlack,arXiv2015)
• IncompletesoluHon– Decoderonly(JunyoungChung,KyunghyunCho,YoshuaBengio.arXiv2016).
5/19/16 ThangLuong-NeuralMachineTranslaHon 90
• Abest-of-both-worldsarchitecture:– Translatemostlyatthewordlevel– Onlygothecharacterlevelwhenneeded.
• AddiHonal+2.1↦+11.4BLEUimprovement.
ThangLuongandChrisManning.AchievingOpenVocabularyNeuralMachineTransla>onwithHybridWord-CharacterModels.Insubmission,ACL2016.
SOTAforEnglish-CzechtranslaHon.
5/19/16 ThangLuong-NeuralMachineTranslaHon 91
HybridNMT
Word-level(4layers)
End-to-endtraining8-stackingLSTMlayers.
5/19/16 ThangLuong-NeuralMachineTranslaHon 92
SourceRepresenta/ons
• On-the-flyembeddings.– Zero-init,batchcomputaHon.
5/19/16 ThangLuong-NeuralMachineTranslaHon 93
TargetGenera/on Initwithword
hiddenstates.
Purposely,use<unk>tomakedecodingeasier.
5/19/16 ThangLuong-NeuralMachineTranslaHon 94
English-CzechWMT’15Results
Systems BLEUWinningentry(Bojar&Tamchyna,2015) 18.8
Exis8ngword-levelNMT(Jeanetal.,2015)
Singlemodel 15.7
Ensemble4models 18.3Largevocab+unkreplace
30xdata3systems
5/19/16 ThangLuong-NeuralMachineTranslaHon 95
English-CzechWMT’15Results
Systems BLEUWinningentry(Bojar&Tamchyna,2015) 18.8
Exis8ngword-levelNMT(Jeanetal.,2015)
Singlemodel 15.7
Ensemble4models 18.3
Ourcharacter-basedNMT
Singlemodel(600-stepbackprop) 15.9
Largevocab+unkreplace
30xdata3systems
• Purelycharacter-based:slowbutpromising!
5/19/16 ThangLuong-NeuralMachineTranslaHon 96
English-CzechWMT’15Results
Systems BLEUWinningentry(Bojar&Tamchyna,2015) 18.8
Exis8ngword-levelNMT(Jeanetal.,2015)
Singlemodel 15.7
Ensemble4models 18.3
Ourcharacter-basedNMT
Singlemodel(600-stepbackprop) 15.9
OurhybridNMT
Singlemodel 19.6
Largevocab+unkreplace
30xdata3systems
NewSOTA!
5/19/16 ThangLuong-NeuralMachineTranslaHon 97
English-CzechWMT’15Results
Systems BLEUWinningentry(Bojar&Tamchyna,2015) 18.8
Exis8ngword-levelNMT(Jeanetal.,2015)
Singlemodel 15.7
Ensemble4models 18.3
Ourcharacter-basedNMT
Singlemodel(600-stepbackprop) 15.9
OurhybridNMT
Singlemodel 19.6
Ensemble4models 20.7
Largevocab+unkreplace
30xdata3systems
BemerSOTA!
5/19/16 ThangLuong-NeuralMachineTranslaHon 98
EffectsofVocabularySizes
02468101214161820
1K 10K 20K 50K
BLEU
VocabularySize
Word Word+unkreplace Hybrid
5/19/16 ThangLuong-NeuralMachineTranslaHon 99
EffectsofVocabularySizes
02468101214161820
1K 10K 20K 50K
BLEU
VocabularySize
Word Word+unkreplace Hybrid
AddiHonalgainsof+2.1↦+11.4BLEU
+11.4
+4.5+3.5
+2.1
5/19/16 ThangLuong-NeuralMachineTranslaHon 100
EffectsofVocabularySizes
02468101214161820
1K 10K 20K 50K
BLEU
VocabularySize
Word Word+unkreplace Hybrid
Small-vocabhybrid=Large-vocabword5/19/16 ThangLuong-NeuralMachineTranslaHon 101
RareWordEmbeddings
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
acceptable
acknowledgement
admissionadmitadmittanceadmitting
advance
antagonistchoose
chooses
connect
decide
developdevelopments
evidentlyexplicit
founder
governance
immobileimmoveable
impossible
insensitiveinsufficiency
linkmanagement
necessary
nominated
noticeable
obvious
perceptible
possible
practice
satisfactory
sponsor
unacceptable
unaffected
uncomfortableunsatisfactory
unsuitable
antagonize
cofounderscompanionships
disrespectful
heartlesslyheartlessness
illiberal
impossibilitiesinabilities
loveless
narrowïmindednarrowïmindedness nonconscious
regretful
spiritless
unattainableness
unconcern
uncontroversial
unfeatheredunfledged
ungraceful
unrealizable
unsighted
untrustworthy
wholeheartedness
• Word&character-basedembeddings.5/19/16 ThangLuong-NeuralMachineTranslaHon 102
RareWordEmbeddings
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
acceptable
acknowledgement
admissionadmitadmittanceadmitting
advance
antagonistchoose
chooses
connect
decide
developdevelopments
evidentlyexplicit
founder
governance
immobileimmoveable
impossible
insensitiveinsufficiency
linkmanagement
necessary
nominated
noticeable
obvious
perceptible
possible
practice
satisfactory
sponsor
unacceptable
unaffected
uncomfortableunsatisfactory
unsuitable
antagonize
cofounderscompanionships
disrespectful
heartlesslyheartlessness
illiberal
impossibilitiesinabilities
loveless
narrowïmindednarrowïmindedness nonconscious
regretful
spiritless
unattainableness
unconcern
uncontroversial
unfeatheredunfledged
ungraceful
unrealizable
unsighted
untrustworthy
wholeheartedness
• Word&character-basedembeddings.5/19/16 ThangLuong-NeuralMachineTranslaHon 103
RareWordEmbeddings
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
acceptable
acknowledgement
admissionadmitadmittanceadmitting
advance
antagonistchoose
chooses
connect
decide
developdevelopments
evidentlyexplicit
founder
governance
immobileimmoveable
impossible
insensitiveinsufficiency
linkmanagement
necessary
nominated
noticeable
obvious
perceptible
possible
practice
satisfactory
sponsor
unacceptable
unaffected
uncomfortableunsatisfactory
unsuitable
antagonize
cofounderscompanionships
disrespectful
heartlesslyheartlessness
illiberal
impossibilitiesinabilities
loveless
narrowïmindednarrowïmindedness nonconscious
regretful
spiritless
unattainableness
unconcern
uncontroversial
unfeatheredunfledged
ungraceful
unrealizable
unsighted
untrustworthy
wholeheartedness
• Word&character-basedembeddings.5/19/16 ThangLuong-NeuralMachineTranslaHon 104
SampleEnglish-Czechtransla/ons
source TheauthorStephenJayGoulddied20yearsalerdiagnosis.human AutorStephenJayGouldzemřel20letpodiagnóze.
char AutorStepherStepherzemřel20letpodiagnóze.
wordAutorStephenJay<unk>zemřel20letpo<unk>.
AutorStephenJayGouldzemřel20letpopo.
hybridAutorStephenJay<unk>zemřel20letpo<unk>.
AutorStephenJayGouldzemřel20letpodiagnóze.
PerfecttranslaHon!
5/19/16 ThangLuong-NeuralMachineTranslaHon 105
SampleEnglish-Czechtransla/ons
• Char-based:wrongnametranslaHon.• Char-based:wrongnametranslaHon.
source TheauthorStephenJayGoulddied20yearsalerdiagnosis.human AutorStephenJayGouldzemřel20letpodiagnóze.
char AutorStepherStepherzemřel20letpodiagnóze.
wordAutorStephenJay<unk>zemřel20letpo<unk>.
AutorStephenJayGouldzemřel20letpopo.
hybridAutorStephenJay<unk>zemřel20letpo<unk>.
AutorStephenJayGouldzemřel20letpodiagnóze.
5/19/16 ThangLuong-NeuralMachineTranslaHon 106
SampleEnglish-Czechtransla/ons
• Word-based:incorrectalignment• Char-based:wrongnametranslaHon.
source TheauthorStephenJayGoulddied20yearsalerdiagnosis.human AutorStephenJayGouldzemřel20letpodiagnóze.
char AutorStepherStepherzemřel20letpodiagnóze.
wordAutorStephenJay<unk>zemřel20letpo<unk>.
AutorStephenJayGouldzemřel20letpopo.
hybridAutorStephenJay<unk>zemřel20letpo<unk>.
AutorStephenJayGouldzemřel20letpodiagnóze.
5/19/16 ThangLuong-NeuralMachineTranslaHon 107
SampleEnglish-Czechtransla/ons
• Char-based&hybrid:correcttranslaHonofdiagnóze.• Char-based:wrongnametranslaHon.
source TheauthorStephenJayGoulddied20yearsalerdiagnosis.human AutorStephenJayGouldzemřel20letpodiagnóze.
char AutorStepherStepherzemřel20letpodiagnóze.
wordAutorStephenJay<unk>zemřel20letpo<unk>.
AutorStephenJayGouldzemřel20letpopo.
hybridAutorStephenJay<unk>zemřel20letpo<unk>.
AutorStephenJayGouldzemřel20letpodiagnóze.
5/19/16 ThangLuong-NeuralMachineTranslaHon 108
SampleEnglish-Czechtransla/ons
source AstheReverendMar>nLutherKingJr.saidfiTyyearsago:human Jakpředpadesá/letyřeklreverendMar/nLutherKingJr.:
char JakoreverendMar/nLutherkrálříkalpředpadesá>lety:
wordJakřeklreverendMarHn<unk>King<unk>předpadesáHlety:
JakřeklreverendMar/nLutherKingřeklpředpadesá>lety:
hybridJakřeklreverendMarHn<unk>King<unk>předpadesáHlety:
Jakpředpadesá/letyřeklreverendMar/nLutherKingJr.:
Correctreordering!
5/19/16 ThangLuong-NeuralMachineTranslaHon 109
source AstheReverendMar>nLutherKingJr.saidfiTyyearsago:human Jakpředpadesá/letyřeklreverendMar/nLutherKingJr.:
char JakoreverendMar/nLutherkrálříkalpředpadesá>lety:
wordJakřeklreverendMarHn<unk>King<unk>předpadesáHlety:
JakřeklreverendMar/nLutherKingřeklpředpadesá>lety:
hybridJakřeklreverendMarHn<unk>King<unk>předpadesáHlety:
Jakpředpadesá/letyřeklreverendMar/nLutherKingJr.:
SampleEnglish-Czechtransla/ons
• Char-based:“král”means“king”.• Char-based:wrongnametranslaHon.5/19/16 ThangLuong-NeuralMachineTranslaHon 110
SampleEnglish-Czechtransla/ons
source Her11-year-olddaughter,ShaniBart,saiditfeltalivlebitweird
human Jejíjedenác/letádceraShaniBartováprozradila,žejetotrochuzvláštní
char Jejíjedenác/letádcera,ShaniBartová,říkala,žecí�trochudivně
wordJejí<unk>dcera<unk><unk>řekla,žejetotrochudivné
Její11-year-olddceraShani,řekla,žejetotrochudivné
hybridJejí<unk>dcera,<unk><unk>,řekla,žejeto<unk><unk>
Jejíjedenác/letádcera,GrahamBart,řekla,žecí�trochudivný
Generatecomplexwords!
5/19/16 ThangLuong-NeuralMachineTranslaHon 111
SampleEnglish-Czechtransla/ons
source Her11-year-olddaughter,ShaniBart,saiditfeltalivlebitweird
human Jejíjedenác/letádceraShaniBartováprozradila,žejetotrochuzvláštní
char Jejíjedenác/letádcera,ShaniBartová,říkala,žecí�trochudivně
wordJejí<unk>dcera<unk><unk>řekla,žejetotrochudivné
Její11-year-olddceraShani,řekla,žejetotrochudivné
hybridJejí<unk>dcera,<unk><unk>,řekla,žejeto<unk><unk>
Jejíjedenác/letádcera,GrahamBart,řekla,žecí�trochudivný
• Word-based:idenHtycopyfails.• Char-based:wrongnametranslaHon.5/19/16 ThangLuong-NeuralMachineTranslaHon 112
SampleEnglish-Czechtransla/ons
source Her11-year-olddaughter,ShaniBart,saiditfeltalivlebitweird
human Jejíjedenác/letádceraShaniBartováprozradila,žejetotrochuzvláštní
char Jejíjedenác/letádcera,ShaniBartová,říkala,žecí�trochudivně
wordJejí<unk>dcera<unk><unk>řekla,žejetotrochudivné
Její11-year-olddceraShani,řekla,žejetotrochudivné
hybridJejí<unk>dcera,<unk><unk>,řekla,žejeto<unk><unk>
Jejíjedenác/letádcera,GrahamBart,řekla,žecí�trochudivný
• Hybrid:translatenamesincorrectly.• Char-based:wrongnametranslaHon.5/19/16 ThangLuong-NeuralMachineTranslaHon 113
WehaveadvancedNMT
• #1:thevocabularysizeproblem– Sol:“copy”mechanism.– SOTAEnglish-French
• #2:thesentencelengthproblem– Sol:avenHonmechanism.– SOTAEnglish-German
• #3:thelanguagecomplexityproblem– Sol:character-leveltranslaHon.– SOTAEnglish-Czech
5/19/16 ThangLuong-NeuralMachineTranslaHon 114
The<unk>porHcoin<unk>
Le<unk><unk>de<unk>
ecotax Pont-de-Buis
por8que écotaxe Pont-de-Buis
am a student _ Je suis étudiant
Je suis étudiant _
I
NMT&beyond
• UnsupervisedlearningforNMT– UHlizemonolingualdata.
• Long-contextNMT– TranslaHnganarHcle/abook.– SmarteravenHon,longersequences.
• MulH-modalLanguageUnderstandingSystem– MulH-lingualtranslaHon+speechrecogniHon+more– MulH-tasklearning
Thankyou!
5/19/16 ThangLuong-NeuralMachineTranslaHon 115
I’mdone.Butifyouarecurious,readon!
5/19/16 ThangLuong-NeuralMachineTranslaHon 116
• CanweuHlizeallsequence-to-sequencedata?
• CanwecompressNMTformobiledevices?
#4ForthefutureofNMT
MachinetranslaHon ConsHtuentparsing
5/19/16 ThangLuong-NeuralMachineTranslaHon 117
Ourwork
• MulH-tasklearning:– MachinetranslaHon –ConsHtuentparsing– ImagecapHongeneraHon–Unsupervisedlearning
• TranslaHonimprovement:upto+1.5BLEU.
• State-of-the-artinconsHtuentparsing.
ThangLuong,QuocLe,IlyaSutskever,OriolVinyals,andLukaszKaiser.Mul>-tasksequencetosequencelearning.ICLR2016.
5/19/16 ThangLuong-NeuralMachineTranslaHon 118
Many-to-one:shareddecoder
English (unsupervised)
Image (captioning) English
German (translation)
1616.517
17.518
18.519
German-EnglishTransla/on(BLEU)
Big(transla>on)+Medium(cap>on)
(Luongetal.,2015)
Single
+capHon(0.1x)
+capHon(0.05x)
+capHon(0.01x)
+0.7BLEU
5/19/16 ThangLuong-NeuralMachineTranslaHon 119
1313.514
14.515
15.516
English-GermanTransla/on(BLEU)
Big(transla>on)+Small(PTBparsing)
(Luongetal.,2015)
Single
+parsing(1x)
+parsing(0.1x)
+parsing(0.01x)
+1.5BLEU
English (unsupervised)
German (translation)
Tags (parsing)English
One-to-many:sharedencoder
MixingraHo5/19/16 ThangLuong-NeuralMachineTranslaHon 120
Ourwork
AbigailSee*,ThangLuong*,andChrisManning.CompressionofNeuralMachineTransla>onModelsviaPruning.Insubmission.
• CompressNMTviapruning&retraining:
0 10 20 30 40 50 60 70 80 90
0
5
10
15
20
25
percentage pruned
BLEU
score
pruned
pruned and retrained
Figure 1: Performance of pruned models, immediately after pruning and after
retraining.
1
Originalmodel
Prunesmallestweights
Prune+retrain
5/19/16 ThangLuong-NeuralMachineTranslaHon 121
Ourwork• CompressNMTviapruning&retraining:
0 10 20 30 40 50 60 70 80 90
0
5
10
15
20
25
percentage pruned
BLEU
score
pruned
pruned and retrained
Figure 1: Performance of pruned models, immediately after pruning and after
retraining.
1
Originalmodel
Prunesmallestweights
Prune+retrain
Prune80%withoutlossofperformance.5/19/16 ThangLuong-NeuralMachineTranslaHon 122
NMTRedundancy–Embeddings
• Frequentwordshavelargerweights– white:large.– black:small.
target embedding weights
source embedding weights
least common wordmost common word
source layer 1 weights
recurrentfeed-forward
input gate
forget gate
output gate
input
source layer 2 weights source layer 3 weights source layer 4 weights
target layer 1 weights target layer 2 weights target layer 3 weights target layer 4 weights
5/19/16 ThangLuong-NeuralMachineTranslaHon 123
target embedding weights
source embedding weights
least common wordmost common word
source layer 1 weights
recurrentfeed-forward
input gate
forget gate
output gate
input
source layer 2 weights source layer 3 weights source layer 4 weights
target layer 1 weights target layer 2 weights target layer 3 weights target layer 4 weights
NMTRedundancy–LSTMLayer1 Layer2 Layer3 Layer4
Encoder
Decoder
Inputgate
Forgetgate
Outputgate
Inputsignal
Feed-forward Recurrent
5/19/16 ThangLuong-NeuralMachineTranslaHon 124
target embedding weights
source embedding weights
least common wordmost common word
source layer 1 weights
recurrentfeed-forward
input gate
forget gate
output gate
input
source layer 2 weights source layer 3 weights source layer 4 weights
target layer 1 weights target layer 2 weights target layer 3 weights target layer 4 weights
NMTRedundancy–LSTMLayer1 Layer2 Layer3 Layer4
Encoder
Decoder
Inputgate
Forgetgate
Outputgate
Inputsignal
Feed-forward Recurrent
Inputsignalisimportant!
5/19/16 ThangLuong-NeuralMachineTranslaHon 125
target embedding weights
source embedding weights
least common wordmost common word
source layer 1 weights
recurrentfeed-forward
input gate
forget gate
output gate
input
source layer 2 weights source layer 3 weights source layer 4 weights
target layer 1 weights target layer 2 weights target layer 3 weights target layer 4 weights
NMTRedundancy–LSTMLayer1 Layer2 Layer3 Layer4
Encoder
Decoder
Inputgate
Forgetgate
Outputgate
Inputsignal
Feed-forward Recurrent
Forgetgate:small–layer1large–layer4
5/19/16 ThangLuong-NeuralMachineTranslaHon 126
FutureChallenges
Shesawanelephantinherdress.Theelephantmusthaveagood
senseoffashion!Needstounderstand
commonsense&largercontext.
Shesawanelephantinherdress.
5/19/16 ThangLuong-NeuralMachineTranslaHon 127
Thankyou!