neural machine transla#on

127
Neural Machine Transla/on Thang Luong Lecture @ CS224D Spring 2016 (Special thanks to Chris Manning for feedback!)

Upload: lylien

Post on 02-Jan-2017

228 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Neural Machine Transla#on

NeuralMachineTransla/on

ThangLuongLecture@CS224D

Spring2016

(SpecialthankstoChrisManningforfeedback!)

Page 2: Neural Machine Transla#on

7billionpeople,7000languages

5/19/16 ThangLuong-NeuralMachineTranslaHon 2

Page 3: Neural Machine Transla#on

Auniversaltranslator

(TheBabelFishfrom“theHitchhiker'sGuidetotheGalaxy”)

5/19/16 ThangLuong-NeuralMachineTranslaHon 3

Page 4: Neural Machine Transla#on

Machinevs.HumanTransla/on

“Nevertheless,withinthedisciplineofmachinelearning,aspecializa8oncalleddeeplearninghasbeendeveloped.Itsdis8nguishingfeatureisthatitisinspiredbyneurobiology.Deeplearningdealswithcomputa8onalelementswhichallowanetworkofar8ficialneuronstolearnamodelofthehumanbrain.”

Faithfultransla/on

5/19/16 ThangLuong-NeuralMachineTranslaHon 4

•  GrammaHcallyincorrect.

Page 5: Neural Machine Transla#on

Machinevs.HumanTransla/on

“Nevertheless,withinthedisciplineofmachinelearning,aspecializa8oncalleddeeplearninghasbeendeveloped.Itsdis8nguishingfeatureisthatitisinspiredbyneurobiology.Deeplearningdealswithcomputa8onalelementswhichallowanetworkofar8ficialneuronstolearnamodelofthehumanbrain.”

Faithfultransla/on

5/19/16 ThangLuong-NeuralMachineTranslaHon 5

•  Badwordchoices.

Page 6: Neural Machine Transla#on

Machinevs.HumanTransla/on

“Nevertheless,withinthedisciplineofmachinelearning,aspecializa8oncalleddeeplearninghasbeendeveloped.Itsdis8nguishingfeatureisthatitisinspiredbyneurobiology.Deeplearningdealswithcomputa8onalelementswhichallowanetworkofar8ficialneuronstolearnamodelofthehumanbrain.”

Faithfultransla/on

5/19/16 ThangLuong-NeuralMachineTranslaHon 6

•  Badsentencestructures.

Page 7: Neural Machine Transla#on

Machinevs.HumanTransla/on

“However,inmachinelearningaspecializa8oncalleddeeplearninghasemerged.Itcanberecognizedbyitsdis8nc8veneurobiologicalinfluence.Deeplearningiscenteredaroundnetworksofar8ficialneuronswhichcanlearnmodelsofthehumanbrain.”

Fluenttransla/on

Abiggap!5/19/16 ThangLuong-NeuralMachineTranslaHon 7

Page 8: Neural Machine Transla#on

HowhasMTevolved?

5/19/16 ThangLuong-NeuralMachineTranslaHon 8

Page 9: Neural Machine Transla#on

Phrase-basedMT

Sheloves

Elleaime

cute

leschats

cats

mignons(Brownetal.,1993;Koehnetal.,2003;Och&Ney,2004)

•  Breaksentencesintochunks.•  Transla8onmodel:lookupphrasetranslaHons.•  Languagemodel:Hephrasestogether.

Translatelocally LMusesonlytargetwords5/19/16 ThangLuong-NeuralMachineTranslaHon 9

Page 10: Neural Machine Transla#on

JointNeuralLanguageModel

•  CondiHonedonsourcewords(Devlinetal.,2014)•  SHlltranslatelocally.

…alléspourunepromenadelelongdela

WewentforastrollalongtheSouthBank

rive

walk along bank(river)

MTsystemsbecomemorecomplex!5/19/16 ThangLuong-NeuralMachineTranslaHon 10

Page 11: Neural Machine Transla#on

NeuralMachineTransla/ontotherescue!

Let’sfindout!

•  Sequence-to-sequence:translateglobally.•  End-to-end:simple&generalizable.

(Sutskeveretal.,2014;Choetal.,2014)

NMT

Iamastudent Jesuisétudiant

5/19/16 ThangLuong-NeuralMachineTranslaHon 11

Page 12: Neural Machine Transla#on

Outline

•  BasicNMT– RNNRecap.– Encoder-Decoder.– Training.– TesHng.

•  AdvancedNMT

5/19/16 ThangLuong-NeuralMachineTranslaHon 12

Page 13: Neural Machine Transla#on

RecurrentNeuralNetworks(RNNs)

I am a studentinput:

5/19/16 ThangLuong-NeuralMachineTranslaHon 13

(PictureadaptedfromAndrejKarparthy)

Page 14: Neural Machine Transla#on

RecurrentNeuralNetworks(RNNs)

I am a studentinput:

RNNstorepresentsequences!

ht-1 ht

xt

(PictureadaptedfromAndrejKarparthy)5/19/16 ThangLuong-NeuralMachineTranslaHon 14

Page 15: Neural Machine Transla#on

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

5/19/16 ThangLuong-NeuralMachineTranslaHon 15

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

Page 16: Neural Machine Transla#on

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

5/19/16 ThangLuong-NeuralMachineTranslaHon 16

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

Page 17: Neural Machine Transla#on

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

5/19/16 ThangLuong-NeuralMachineTranslaHon 17

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

Page 18: Neural Machine Transla#on

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

5/19/16 ThangLuong-NeuralMachineTranslaHon 18

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

Page 19: Neural Machine Transla#on

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

5/19/16 ThangLuong-NeuralMachineTranslaHon 19

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

Page 20: Neural Machine Transla#on

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

Encoder

5/19/16 ThangLuong-NeuralMachineTranslaHon 20

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

Page 21: Neural Machine Transla#on

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

Encoder

5/19/16 ThangLuong-NeuralMachineTranslaHon 21

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

Boundarymarker

Page 22: Neural Machine Transla#on

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

Encoder

5/19/16 ThangLuong-NeuralMachineTranslaHon 22

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

Page 23: Neural Machine Transla#on

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

Encoder

5/19/16 ThangLuong-NeuralMachineTranslaHon 23

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

Page 24: Neural Machine Transla#on

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

Encoder Decoder

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

Boundarymarker

5/19/16 ThangLuong-NeuralMachineTranslaHon 24

Page 25: Neural Machine Transla#on

WordEmbeddings

am a student _ Je suis étudiant

Je suis étudiant _

I

•  Oneforeachlanguage:canlearnfromscratch.

Sourceembeddings

Targetembeddings

5/19/16 ThangLuong-NeuralMachineTranslaHon 25

Page 26: Neural Machine Transla#on

am a student _ Je suis étudiant

Je suis étudiant _

I

RecurrentConnec/onsIniHalstates

•  Olensetto0.5/19/16 ThangLuong-NeuralMachineTranslaHon 26

Page 27: Neural Machine Transla#on

am a student _ Je suis étudiant

Je suis étudiant _

I

RecurrentConnec/ons

Encoder1stlayer

5/19/16 ThangLuong-NeuralMachineTranslaHon 27

•  Different:{1stlayer,2ndlayer}x{encoder,decoder}.

Page 28: Neural Machine Transla#on

am a student _ Je suis étudiant

Je suis étudiant _

I

RecurrentConnec/ons

Encoder2ndlayer

5/19/16 ThangLuong-NeuralMachineTranslaHon 28

•  Different:{1stlayer,2ndlayer}x{encoder,decoder}.

Page 29: Neural Machine Transla#on

am a student _ Je suis étudiant

Je suis étudiant _

I

RecurrentConnec/ons

•  Different:{1stlayer,2ndlayer}x{encoder,decoder}.

Decoder1stlayer

5/19/16 ThangLuong-NeuralMachineTranslaHon 29

Page 30: Neural Machine Transla#on

am a student _ Je suis étudiant

Je suis étudiant _

I

RecurrentConnec/ons

Decoder2ndlayer

5/19/16 ThangLuong-NeuralMachineTranslaHon 30

•  Different:{1stlayer,2ndlayer}x{encoder,decoder}.

Page 31: Neural Machine Transla#on

RecurrentUnits

•  Vanilla:

•  LSTM:

5/19/16 ThangLuong-NeuralMachineTranslaHon 31

RNN

LSTM

C’mon,it’sbeenaroundfor20years!

Vanishinggradientproblem!

Page 32: Neural Machine Transla#on

SoTmax:vectors↦categories

am a student _ Je suis étudiant

Je suis étudiant _

I

am a student _ Je suis étudiant

Je suis étudiant _

I

Solmaxparameters

5/19/16 ThangLuong-NeuralMachineTranslaHon 32

am a student _ Je suis étudiant

Je suis étudiant _

I

Targethiddenstate

|V|

Page 33: Neural Machine Transla#on

SoTmax:vectors↦categories

am a student _ Je suis étudiant

Je suis étudiant _

I

am a student _ Je suis étudiant

Je suis étudiant _

I

5/19/16 ThangLuong-NeuralMachineTranslaHon 33

am a student _ Je suis étudiant

Je suis étudiant _

I

|V|

•  Hiddenstates↦scores↦probabiliHes.

Scores

=

Probs

P(Je|…)exp&normalize

Page 34: Neural Machine Transla#on

TrainingLoss

•  MaximizeP(target|source)

am a student _ Je suis étudiantI

Je suis étudiant _

5/19/16 ThangLuong-NeuralMachineTranslaHon 34

Page 35: Neural Machine Transla#on

TrainingLoss

am a student _ Je suis étudiant

-log P(Je)

I

-log P(suis)

•  Sumofallindividuallosses5/19/16 ThangLuong-NeuralMachineTranslaHon 35

Page 36: Neural Machine Transla#on

TrainingLoss

am a student _ Je suis étudiant

-log P(Je)

I

-log P(suis)

•  Sumofallindividuallosses5/19/16 ThangLuong-NeuralMachineTranslaHon 36

Page 37: Neural Machine Transla#on

TrainingLoss

•  Sumofallindividuallosses

am a student _ Je suis étudiantI

-log P(étudiant)

5/19/16 ThangLuong-NeuralMachineTranslaHon 37

Page 38: Neural Machine Transla#on

TrainingLoss

•  Sumofallindividuallosses

am a student _ Je suis étudiantI

-log P(_)

5/19/16 ThangLuong-NeuralMachineTranslaHon 38

Page 39: Neural Machine Transla#on

Backpropaga/onThroughTime

am a student _ Je suis étudiantI

-log P(_)

Initto0

5/19/16 ThangLuong-NeuralMachineTranslaHon 39

Page 40: Neural Machine Transla#on

am a student _ Je suis étudiantI

-log P(étudiant)

Backpropaga/onThroughTime

5/19/16 ThangLuong-NeuralMachineTranslaHon 40

Page 41: Neural Machine Transla#on

am a student _ Je suis étudiantI

-log P(étudiant)

Backpropaga/onThroughTime

5/19/16 ThangLuong-NeuralMachineTranslaHon 41

Page 42: Neural Machine Transla#on

am a student _ Je suis étudiantI

-log P(suis)

Backpropaga/onThroughTime

5/19/16 ThangLuong-NeuralMachineTranslaHon 42

Page 43: Neural Machine Transla#on

am a student _ Je suis étudiantI

-log P(suis)

Backpropaga/onThroughTime

5/19/16 ThangLuong-NeuralMachineTranslaHon 43

Page 44: Neural Machine Transla#on

am a student _ Je suis étudiantI

Backpropaga/onThroughTime

RNNgradientsareaccumulated.5/19/16 ThangLuong-NeuralMachineTranslaHon 44

Page 45: Neural Machine Transla#on

Trainingvs.Tes/ng

•  Training– CorrecttranslaHonsareavailable.

•  Tes8ng– Onlysourcesentencesaregiven.

am a student _ Je suis étudiant

Je suis étudiant _

I

am a student _ Je suis étudiant

Je suis étudiant _

I5/19/16 ThangLuong-NeuralMachineTranslaHon 45

Page 46: Neural Machine Transla#on

Tes/ng

•  Feedthemostlikelyword5/19/16 ThangLuong-NeuralMachineTranslaHon 46

Page 47: Neural Machine Transla#on

Tes/ng

5/19/16 ThangLuong-NeuralMachineTranslaHon 47

•  Feedthemostlikelyword

Page 48: Neural Machine Transla#on

Tes/ng

5/19/16 ThangLuong-NeuralMachineTranslaHon 48

•  Feedthemostlikelyword

Page 49: Neural Machine Transla#on

Tes/ng

5/19/16 ThangLuong-NeuralMachineTranslaHon 49

•  Feedthemostlikelyword

Page 50: Neural Machine Transla#on

Tes/ng

Simplebeam-searchdecoders!5/19/16 ThangLuong-NeuralMachineTranslaHon 50

Page 51: Neural Machine Transla#on

37

33.3

34.8

36.5

31

32

33

34

35

36

37

38

BLEU

SOTASMT(Durrani+,2014)

AvgSMT(Schwenk,2014)

NMT(Sutskever+,2014)

SMT+NMTrescore(Sutskever+,2014)

English-FrenchWMT’14results

5/19/16 ThangLuong-NeuralMachineTranslaHon 51

Page 52: Neural Machine Transla#on

English-FrenchWMT’14results

37

33.3

34.8

36.5

31

32

33

34

35

36

37

38

BLEU

SOTASMT(Durrani+,2014)

AvgSMT(Schwenk,2014)

NMT(Sutskever+,2014)

SMT+NMTrescore(Sutskever+,2014)

5/19/16 ThangLuong-NeuralMachineTranslaHon 52

2decadesofresearch

1-2yearsofresearch

Page 53: Neural Machine Transla#on

Encoder-decoderVariants

Encoder Decoder

(Sutskeveretal.,2014)MyNMTmodels DeepLSTM DeepLSTM

(Choetal.,2014)(Bahdanauetal.,2015)

(Jeanetal.,2015)

(BidirecHonal)GRU GRU

(Kalchbrenner&Blunsom,2013) CNN (InverseCNN)

RNN

Next,advancedNMT!5/19/16 ThangLuong-NeuralMachineTranslaHon 53

Page 54: Neural Machine Transla#on

Deepfriedbaby Meatmusclestupidbeansprouts

Break/me:whenMTfails…

Saleofchickenmurder Gobacktowardyourbehind

5/19/16 ThangLuong-NeuralMachineTranslaHon 54

Page 55: Neural Machine Transla#on

Limita/ons

•  #1:thevocabularysizeproblem– Goal:extendthevocabularycoverage.

•  #2:thesentencelengthproblem– Goal:translatelongsentencesbever.

•  #3:thelanguagecomplexityproblem– Goal:handlemorelanguagevariaHons.

5/19/16 ThangLuong-NeuralMachineTranslaHon 55

Page 56: Neural Machine Transla#on

AdvancingNMT

•  #1:thevocabularysizeproblem– Sol:“copy”mechanism.

•  #2:thesentencelengthproblem– Sol:avenHonmechanism.

•  #3:thelanguagecomplexityproblem– Sol:character-leveltranslaHon.

5/19/16 ThangLuong-NeuralMachineTranslaHon 56

The<unk>porHcoin<unk>

Le<unk><unk>de<unk>

ecotax Pont-de-Buis

por8que écotaxe Pont-de-Buis

am a student _ Je suis étudiant

Je suis étudiant _

I

Page 57: Neural Machine Transla#on

CVvs.NLP

ComputerVision

cat

1Kcategories 1Mcategories

NLP

mat

Thecatsatona

5/19/16 ThangLuong-NeuralMachineTranslaHon 57

Page 58: Neural Machine Transla#on

#1TheVocabularySizeProblem

•  WordgeneraHonproblem– Vocabsaremodest:50K.– Simplesolmax:GPUfriendliness.

am a student _ Je suis étudiant

Je suis étudiant _

I

The<unk>porHcoin<unk>Le<unk><unk>de<unk>

TheecotaxporHcoinPont-de-BuisLeporHqueécotaxedePont-de-Buis

5/19/16 ThangLuong-NeuralMachineTranslaHon 58

Page 59: Neural Machine Transla#on

•  Propose“copy”mechanismsfor<unk>.

•  Simple&effecHve– TreatanyNMTasablackbox.– Annotatetrainingdata.– Post-processtranslaHons.

ThangLuong*,IlyaSutskever*,QuocLe*,OriolVinyals,andWojciechZaremba.AddressingtheRareWordProbleminNeuralMachineTransla>on.ACL2015.

SOTAforEnglish-FrenchtranslaHon.

5/19/16 ThangLuong-NeuralMachineTranslaHon 59

Page 60: Neural Machine Transla#on

Ourapproach–trainingannota>on

•  AddrelaHveposiHons.

“AvenHon”forrarewords

TheecotaxporHcoinPont-de-Buis

LeporHqueécotaxedePont-de-Buis

•  Learnalignments.

The<unk>porHcoin<unk>

Leunk1unk-1deunk0

5/19/16 ThangLuong-NeuralMachineTranslaHon 60

Page 61: Neural Machine Transla#on

Ourapproach–post-process

Testsentence

Transla8on

The<unk>porHcoin<unk>

LeporHqueunk-1deunk0

ecotax Pont-de-Buis

5/19/16 ThangLuong-NeuralMachineTranslaHon 61

Page 62: Neural Machine Transla#on

Ourapproach–post-process

Testsentence

Transla8on

The<unk>porHcoin<unk>

LeporHqueunk-1deunk0

ecotax Pont-de-Buis

Post-editTransla8on

Dic/onarytranslaHon

LeporHqueécotaxedePont-de-Buis

5/19/16 ThangLuong-NeuralMachineTranslaHon 62

Page 63: Neural Machine Transla#on

Ourapproach–post-process

Testsentence

Transla8on

The<unk>porHcoin<unk>

LeporHqueunk-1deunk0

ecotax Pont-de-Buis

LeporHqueécotaxedePont-de-Buis

Iden/tycopy

Post-editTransla8on

Orthogonaltolarge-vocabtechniques5/19/16 ThangLuong-NeuralMachineTranslaHon 63

Page 64: Neural Machine Transla#on

EffectsofTransla/ngRareWords

25

30

35

40

BLEU

Sentencesorderedbyaveragefrequencyrank

Durranietal.(37.0)

Sutskeveretal.(34.8)

Thiswork(37.5)

5/19/16 ThangLuong-NeuralMachineTranslaHon 64

FirstSOTANMTsystem!

Page 65: Neural Machine Transla#on

Sampletransla/ons

•  Predictwelllong-distancealignments.– Correct:cataractvs.cataracte.

source AnaddiHonal2600operaHonsincludingorthopedicandcataractsurgerywillhelpclearabacklog.

human 2600opéraHonssupplémentaires,notammentdansledomainedelachirurgieorthopédiqueetdelacataracte,aiderontàravraperleretard.

trans Enoutre,unk1opéraHonssupplémentaires,dontlachirurgieunk5etlaunk6,permevrontderésorberl'arriéré.

trans+unk

Enoutre,2600opéraHonssupplémentaires,dontlachirurgieorthopédiquesetlacataracte,permevrontderésorberl'arriéré.

5/19/16 ThangLuong-NeuralMachineTranslaHon 65

Page 66: Neural Machine Transla#on

Sampletransla/ons

•  Translatewelllongsentences.– Correct:JPMorganvs.JPMorgan.

sourceThistrader,RichardUsher,lelRBSin2010andisunderstandtohavebegivenleavefromhiscurrentposiHonasEuropeanheadofforexspottradingatJPMorgan.

humanCetrader,RichardUsher,aquivéRBSen2010etauraitétémissuspendudesonpostederesponsableeuropéendutradingaucomptantpourlesdeviseschezJPMorgan.

transCeunk0,Richardunk0,aquivéunk1en2010etacomprisqu'ilestautoriséàquiversonposteactuelentantqueleadereuropéendumarchédespointsdeventeauunk5.

trans+unk

Cenégociateur,RichardUsher,aquivéRBSen2010etacomprisqu'ilestautoriséàquiversonposteactuelentantqueleadereuropéendumarchédespointsdeventeauJPMorgan.

5/19/16 ThangLuong-NeuralMachineTranslaHon 66

Page 67: Neural Machine Transla#on

Sampletransla/ons

•  IncorrectalignmentpredicHon:was–étaitvs.abandonnait.

source ButconcernshavegrownalerMrMazangawasquotedassayingRenamowasabandoningthe1992peaceaccord.

human Maisl'inquiétudeagrandiaprèsqueM.MazangaadéclaréquelaRenamoabandonnaitl'accorddepaixde1992.

trans MaislesinquiétudessesontaccruesaprèsqueM.unkpos3adéclaréquelaunk3unk3l'accorddepaixde1992.

trans+unk

MaislesinquiétudessesontaccruesaprèsqueM.MazangaadéclaréquelaRenamoétaitl'accorddepaixde1992.

5/19/16 ThangLuong-NeuralMachineTranslaHon 67

Page 68: Neural Machine Transla#on

AdvancingNMT

•  #1:thevocabularysizeproblem– Sol:“copy”mechanism.

•  #2:thesentencelengthproblem– Sol:avenHonmechanism.

•  #3:thelanguagecomplexityproblem– Sol:character-leveltranslaHon.

5/19/16 ThangLuong-NeuralMachineTranslaHon 68

The<unk>porHcoin<unk>

Le<unk><unk>de<unk>

ecotax Pont-de-Buis

por8que écotaxe Pont-de-Buis

am a student _ Je suis étudiant

Je suis étudiant _

I

Page 69: Neural Machine Transla#on

#2TheSentenceLengthProblem

Problem:sentencemeaningisrepresentedbyafixed-dimensionalvector.

am a student _ Je suis étudiant

Je suis étudiant _

I

•  TranslaHonqualitydegradeswithlongsentences.

5/19/16 ThangLuong-NeuralMachineTranslaHon 69

Page 70: Neural Machine Transla#on

Youcan'tcramthemeaningofawhole%&!$#sentenceintoasingle$&!#*vector!

5/19/16 ThangLuong-NeuralMachineTranslaHon 70

(AdaptedfromKyungHuynCho’talk)

Page 71: Neural Machine Transla#on

Amen/onMechanism

•  SoluHon:randomaccessmemory– Retrieveasneeded.

am a student _ Je suis étudiant

Je suis étudiant _

I

Poolofsourcestates

5/19/16 ThangLuong-NeuralMachineTranslaHon 71

Page 72: Neural Machine Transla#on

5/19/16 ThangLuong-NeuralMachineTranslaHon 72

DzmitryBahdanau,KyungHuynCho,andYoshuaBengio.NeuralMachineTransla>onbyJointlyLearningtoTranslateandAlign.ICLR2015.

WithavenHonWithoutavenHon

Page 73: Neural Machine Transla#on

Amen/onMechanism

am a student _ Je

suis

I

Attention Layer

Context vector

?

Asimplifiedversionof(Bahdanauetal.,2015)5/19/16 ThangLuong-NeuralMachineTranslaHon 73

Page 74: Neural Machine Transla#on

•  Comparetargetandsourcehiddenstates.

Amen/onMechanism–Scoring

am a student _ Je

suis

I

Attention Layer

Context vector

?

3

5/19/16 ThangLuong-NeuralMachineTranslaHon 74

Page 75: Neural Machine Transla#on

•  Comparetargetandsourcehiddenstates.

Amen/onMechanism–Scoring

am a student _ Je

suis

I

Attention Layer

Context vector

?

53

5/19/16 ThangLuong-NeuralMachineTranslaHon 75

Page 76: Neural Machine Transla#on

•  Comparetargetandsourcehiddenstates.

Amen/onMechanism–Scoring

am a student _ Je

suis

I

Attention Layer

Context vector

?

13 5

5/19/16 ThangLuong-NeuralMachineTranslaHon 76

Page 77: Neural Machine Transla#on

•  Comparetargetandsourcehiddenstates.

Amen/onMechanism–Scoring

am a student _ Je

suis

I

Attention Layer

Context vector

?

13 5 1

5/19/16 ThangLuong-NeuralMachineTranslaHon 77

Page 78: Neural Machine Transla#on

•  Convertintoalignmentweights.

Amen/onMechanism–Normaliza>on

am a student _ Je

suis

I

Attention Layer

Context vector

?

0.10.3 0.5 0.1

5/19/16 ThangLuong-NeuralMachineTranslaHon 78

Page 79: Neural Machine Transla#on

am a student _ Je

suis

I

Context vector

•  Buildcontextvector:weightedaverage.

Amen/onMechanism–Contextvector

?

5/19/16 ThangLuong-NeuralMachineTranslaHon 79

Page 80: Neural Machine Transla#on

am a student _ Je

suis

I

Context vector

•  Computethenexthiddenstate.

Amen/onMechanism–Hiddenstate

5/19/16 ThangLuong-NeuralMachineTranslaHon 80

Page 81: Neural Machine Transla#on

am a student _ Je

suis

I

Context vector

•  Predictthenextword.

Amen/onMechanism–Predict

5/19/16 ThangLuong-NeuralMachineTranslaHon 81

Page 82: Neural Machine Transla#on

•  ExaminevariousavenHonmechanisms:

ThangLuong,HieuPham,andChrisManning.Effec>veApproachestoAGen>on-basedNeuralMachineTransla>on.EMNLP2015.

Global:allsourcestates. Local:subsetofsourcestates.

SOTAforEnglish-GermantranslaHon.

5/19/16 ThangLuong-NeuralMachineTranslaHon 82

Page 83: Neural Machine Transla#on

TranslateLongSentences

10 20 30 40 50 60 7010

15

20

25

Sent Lengths

BLEU

����

ours, no attn (BLEU 13.9)ours, local−p attn (BLEU 20.9)ours, best system (BLEU 23.0)WMT’14 best (BLEU 20.7)Jeans et al., 2015 (BLEU 21.6)

NoAvenHon

AvenHon

5/19/16 ThangLuong-NeuralMachineTranslaHon 83

Page 84: Neural Machine Transla#on

SampleEnglish-Germantransla/ons

•  Translatenamescorrectly.

source OrlandoBloomandMirandaKerrsHllloveeachother

human OrlandoBloomundMirandaKerrliebensichnochimmer

bestOrlandoBloomundMirandaKerrliebeneinandernochimmer.

baseOrlandoBloomundLucasMirandaliebeneinandernochimmer.

5/19/16 ThangLuong-NeuralMachineTranslaHon 84

Page 85: Neural Machine Transla#on

SampleEnglish-Germantransla/ons

•  Translateadoubly-negatedphrasecorrectly•  Failtotranslate“passengerexperience”.

sourceWeʹrepleasedtheFAArecognizesthatanenjoyablepassengerexperienceisnotincompa>blewithsafetyandsecurity,saidRogerDow,CEOoftheU.S.TravelAssociaHon.

humanWirfreuenuns,dassdieFAAerkennt,dasseinangenehmesPassagiererlebnisnichtimWider-spruchzurSicherheitsteht,sagteRogerDow,CEOderU.S.TravelAssociaHon.

best Wirfreuenuns,dassdieFAAanerkennt,dasseinangenehmesistnichtmitSicherheitundSicherheitunvereinbarist,sagteRogerDow,CEOderUS-die.

baseWirfreuenunsuberdie<unk>,dassein<unk><unk>mitSicherheitnichtvereinbaristmitSicherheitundSicherheit,sagteRogerCameron,CEOderUS-<unk>.

5/19/16 ThangLuong-NeuralMachineTranslaHon 85

Page 86: Neural Machine Transla#on

sourceWeʹrepleasedtheFAArecognizesthatanenjoyablepassengerexperienceisnotincompa>blewithsafetyandsecurity,saidRogerDow,CEOoftheU.S.TravelAssociaHon.

humanWirfreuenuns,dassdieFAAerkennt,dasseinangenehmesPassagiererlebnisnichtimWider-spruchzurSicherheitsteht,sagteRogerDow,CEOderU.S.TravelAssociaHon.

best Wirfreuenuns,dassdieFAAanerkennt,dasseinangenehmesistnichtmitSicherheitundSicherheitunvereinbarist,sagteRogerDow,CEOderUS-die.

baseWirfreuenunsuberdie<unk>,dassein<unk><unk>mitSicherheitnichtvereinbaristmitSicherheitundSicherheit,sagteRogerCameron,CEOderUS-<unk>.

SampleEnglish-Germantransla/ons

•  Translateadoubly-negatedphrasecorrectly•  Failtotranslate“passengerexperience”.5/19/16 ThangLuong-NeuralMachineTranslaHon 86

Page 87: Neural Machine Transla#on

TEDtalk,English-German

30.85

26.18 26.02 24.9622.51

20.08

0

5

10

15

20

25

30

35

BLEU

16.16

21.84 22.67 23.42

28.18

0

5

10

15

20

25

30

Stanford EdinburghKarlsruheHeidelberg PJAIT

HUMANTER

26%

ThangLuongandChrisManning.StanfordNeuralMachineTransla>onSystemsforSpokenLanguageDomain.IWSLT2015.

Winning

5/19/16 ThangLuong-NeuralMachineTranslaHon 87

Page 88: Neural Machine Transla#on

AdvancingNMT

•  #1:thevocabularysizeproblem– Sol:“copy”mechanism.

•  #2:thesentencelengthproblem– Sol:avenHonmechanism.

•  #3:thelanguagecomplexityproblem– Sol:character-leveltranslaHon.

5/19/16 ThangLuong-NeuralMachineTranslaHon 88

The<unk>porHcoin<unk>

Le<unk><unk>de<unk>

ecotax Pont-de-Buis

por8que écotaxe Pont-de-Buis

am a student _ Je suis étudiant

Je suis étudiant _

I

Page 89: Neural Machine Transla#on

#3Therarewordproblem

•  “Copying”mechanismsarenotsufficient.– Differentalphabets:Christopher↦Kryštof– MulH-wordalignment:Solarsystem↦Sonnensystem

•  Needtohandlelarge,openvocabulary– Richmorphology:nejneobhospodařovávatelnějšímu

(“totheworstfarmableone”)–  Informalspelling:gooooooodmorning!!!!!

Beabletogenerateatthecharacterlevel.5/19/16 ThangLuong-NeuralMachineTranslaHon 89

Page 90: Neural Machine Transla#on

Recentcharacter-levelNMT

•  UnsaHsfactoryperformance–  (WangLing,IsabelTrancoso,ChrisDyer,AlanBlack,arXiv2015)

•  IncompletesoluHon– Decoderonly(JunyoungChung,KyunghyunCho,YoshuaBengio.arXiv2016).

5/19/16 ThangLuong-NeuralMachineTranslaHon 90

Page 91: Neural Machine Transla#on

•  Abest-of-both-worldsarchitecture:– Translatemostlyatthewordlevel– Onlygothecharacterlevelwhenneeded.

•  AddiHonal+2.1↦+11.4BLEUimprovement.

ThangLuongandChrisManning.AchievingOpenVocabularyNeuralMachineTransla>onwithHybridWord-CharacterModels.Insubmission,ACL2016.

SOTAforEnglish-CzechtranslaHon.

5/19/16 ThangLuong-NeuralMachineTranslaHon 91

Page 92: Neural Machine Transla#on

HybridNMT

Word-level(4layers)

End-to-endtraining8-stackingLSTMlayers.

5/19/16 ThangLuong-NeuralMachineTranslaHon 92

Page 93: Neural Machine Transla#on

SourceRepresenta/ons

•  On-the-flyembeddings.– Zero-init,batchcomputaHon.

5/19/16 ThangLuong-NeuralMachineTranslaHon 93

Page 94: Neural Machine Transla#on

TargetGenera/on Initwithword

hiddenstates.

Purposely,use<unk>tomakedecodingeasier.

5/19/16 ThangLuong-NeuralMachineTranslaHon 94

Page 95: Neural Machine Transla#on

English-CzechWMT’15Results

Systems BLEUWinningentry(Bojar&Tamchyna,2015) 18.8

Exis8ngword-levelNMT(Jeanetal.,2015)

Singlemodel 15.7

Ensemble4models 18.3Largevocab+unkreplace

30xdata3systems

5/19/16 ThangLuong-NeuralMachineTranslaHon 95

Page 96: Neural Machine Transla#on

English-CzechWMT’15Results

Systems BLEUWinningentry(Bojar&Tamchyna,2015) 18.8

Exis8ngword-levelNMT(Jeanetal.,2015)

Singlemodel 15.7

Ensemble4models 18.3

Ourcharacter-basedNMT

Singlemodel(600-stepbackprop) 15.9

Largevocab+unkreplace

30xdata3systems

•  Purelycharacter-based:slowbutpromising!

5/19/16 ThangLuong-NeuralMachineTranslaHon 96

Page 97: Neural Machine Transla#on

English-CzechWMT’15Results

Systems BLEUWinningentry(Bojar&Tamchyna,2015) 18.8

Exis8ngword-levelNMT(Jeanetal.,2015)

Singlemodel 15.7

Ensemble4models 18.3

Ourcharacter-basedNMT

Singlemodel(600-stepbackprop) 15.9

OurhybridNMT

Singlemodel 19.6

Largevocab+unkreplace

30xdata3systems

NewSOTA!

5/19/16 ThangLuong-NeuralMachineTranslaHon 97

Page 98: Neural Machine Transla#on

English-CzechWMT’15Results

Systems BLEUWinningentry(Bojar&Tamchyna,2015) 18.8

Exis8ngword-levelNMT(Jeanetal.,2015)

Singlemodel 15.7

Ensemble4models 18.3

Ourcharacter-basedNMT

Singlemodel(600-stepbackprop) 15.9

OurhybridNMT

Singlemodel 19.6

Ensemble4models 20.7

Largevocab+unkreplace

30xdata3systems

BemerSOTA!

5/19/16 ThangLuong-NeuralMachineTranslaHon 98

Page 99: Neural Machine Transla#on

EffectsofVocabularySizes

02468101214161820

1K 10K 20K 50K

BLEU

VocabularySize

Word Word+unkreplace Hybrid

5/19/16 ThangLuong-NeuralMachineTranslaHon 99

Page 100: Neural Machine Transla#on

EffectsofVocabularySizes

02468101214161820

1K 10K 20K 50K

BLEU

VocabularySize

Word Word+unkreplace Hybrid

AddiHonalgainsof+2.1↦+11.4BLEU

+11.4

+4.5+3.5

+2.1

5/19/16 ThangLuong-NeuralMachineTranslaHon 100

Page 101: Neural Machine Transla#on

EffectsofVocabularySizes

02468101214161820

1K 10K 20K 50K

BLEU

VocabularySize

Word Word+unkreplace Hybrid

Small-vocabhybrid=Large-vocabword5/19/16 ThangLuong-NeuralMachineTranslaHon 101

Page 102: Neural Machine Transla#on

RareWordEmbeddings

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

acceptable

acknowledgement

admissionadmitadmittanceadmitting

advance

antagonistchoose

chooses

connect

decide

developdevelopments

evidentlyexplicit

founder

governance

immobileimmoveable

impossible

insensitiveinsufficiency

linkmanagement

necessary

nominated

noticeable

obvious

perceptible

possible

practice

satisfactory

sponsor

unacceptable

unaffected

uncomfortableunsatisfactory

unsuitable

antagonize

cofounderscompanionships

disrespectful

heartlesslyheartlessness

illiberal

impossibilitiesinabilities

loveless

narrowïmindednarrowïmindedness nonconscious

regretful

spiritless

unattainableness

unconcern

uncontroversial

unfeatheredunfledged

ungraceful

unrealizable

unsighted

untrustworthy

wholeheartedness

•  Word&character-basedembeddings.5/19/16 ThangLuong-NeuralMachineTranslaHon 102

Page 103: Neural Machine Transla#on

RareWordEmbeddings

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

acceptable

acknowledgement

admissionadmitadmittanceadmitting

advance

antagonistchoose

chooses

connect

decide

developdevelopments

evidentlyexplicit

founder

governance

immobileimmoveable

impossible

insensitiveinsufficiency

linkmanagement

necessary

nominated

noticeable

obvious

perceptible

possible

practice

satisfactory

sponsor

unacceptable

unaffected

uncomfortableunsatisfactory

unsuitable

antagonize

cofounderscompanionships

disrespectful

heartlesslyheartlessness

illiberal

impossibilitiesinabilities

loveless

narrowïmindednarrowïmindedness nonconscious

regretful

spiritless

unattainableness

unconcern

uncontroversial

unfeatheredunfledged

ungraceful

unrealizable

unsighted

untrustworthy

wholeheartedness

•  Word&character-basedembeddings.5/19/16 ThangLuong-NeuralMachineTranslaHon 103

Page 104: Neural Machine Transla#on

RareWordEmbeddings

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

acceptable

acknowledgement

admissionadmitadmittanceadmitting

advance

antagonistchoose

chooses

connect

decide

developdevelopments

evidentlyexplicit

founder

governance

immobileimmoveable

impossible

insensitiveinsufficiency

linkmanagement

necessary

nominated

noticeable

obvious

perceptible

possible

practice

satisfactory

sponsor

unacceptable

unaffected

uncomfortableunsatisfactory

unsuitable

antagonize

cofounderscompanionships

disrespectful

heartlesslyheartlessness

illiberal

impossibilitiesinabilities

loveless

narrowïmindednarrowïmindedness nonconscious

regretful

spiritless

unattainableness

unconcern

uncontroversial

unfeatheredunfledged

ungraceful

unrealizable

unsighted

untrustworthy

wholeheartedness

•  Word&character-basedembeddings.5/19/16 ThangLuong-NeuralMachineTranslaHon 104

Page 105: Neural Machine Transla#on

SampleEnglish-Czechtransla/ons

source TheauthorStephenJayGoulddied20yearsalerdiagnosis.human AutorStephenJayGouldzemřel20letpodiagnóze.

char AutorStepherStepherzemřel20letpodiagnóze.

wordAutorStephenJay<unk>zemřel20letpo<unk>.

AutorStephenJayGouldzemřel20letpopo.

hybridAutorStephenJay<unk>zemřel20letpo<unk>.

AutorStephenJayGouldzemřel20letpodiagnóze.

PerfecttranslaHon!

5/19/16 ThangLuong-NeuralMachineTranslaHon 105

Page 106: Neural Machine Transla#on

SampleEnglish-Czechtransla/ons

•  Char-based:wrongnametranslaHon.•  Char-based:wrongnametranslaHon.

source TheauthorStephenJayGoulddied20yearsalerdiagnosis.human AutorStephenJayGouldzemřel20letpodiagnóze.

char AutorStepherStepherzemřel20letpodiagnóze.

wordAutorStephenJay<unk>zemřel20letpo<unk>.

AutorStephenJayGouldzemřel20letpopo.

hybridAutorStephenJay<unk>zemřel20letpo<unk>.

AutorStephenJayGouldzemřel20letpodiagnóze.

5/19/16 ThangLuong-NeuralMachineTranslaHon 106

Page 107: Neural Machine Transla#on

SampleEnglish-Czechtransla/ons

•  Word-based:incorrectalignment•  Char-based:wrongnametranslaHon.

source TheauthorStephenJayGoulddied20yearsalerdiagnosis.human AutorStephenJayGouldzemřel20letpodiagnóze.

char AutorStepherStepherzemřel20letpodiagnóze.

wordAutorStephenJay<unk>zemřel20letpo<unk>.

AutorStephenJayGouldzemřel20letpopo.

hybridAutorStephenJay<unk>zemřel20letpo<unk>.

AutorStephenJayGouldzemřel20letpodiagnóze.

5/19/16 ThangLuong-NeuralMachineTranslaHon 107

Page 108: Neural Machine Transla#on

SampleEnglish-Czechtransla/ons

•  Char-based&hybrid:correcttranslaHonofdiagnóze.•  Char-based:wrongnametranslaHon.

source TheauthorStephenJayGoulddied20yearsalerdiagnosis.human AutorStephenJayGouldzemřel20letpodiagnóze.

char AutorStepherStepherzemřel20letpodiagnóze.

wordAutorStephenJay<unk>zemřel20letpo<unk>.

AutorStephenJayGouldzemřel20letpopo.

hybridAutorStephenJay<unk>zemřel20letpo<unk>.

AutorStephenJayGouldzemřel20letpodiagnóze.

5/19/16 ThangLuong-NeuralMachineTranslaHon 108

Page 109: Neural Machine Transla#on

SampleEnglish-Czechtransla/ons

source AstheReverendMar>nLutherKingJr.saidfiTyyearsago:human Jakpředpadesá/letyřeklreverendMar/nLutherKingJr.:

char JakoreverendMar/nLutherkrálříkalpředpadesá>lety:

wordJakřeklreverendMarHn<unk>King<unk>předpadesáHlety:

JakřeklreverendMar/nLutherKingřeklpředpadesá>lety:

hybridJakřeklreverendMarHn<unk>King<unk>předpadesáHlety:

Jakpředpadesá/letyřeklreverendMar/nLutherKingJr.:

Correctreordering!

5/19/16 ThangLuong-NeuralMachineTranslaHon 109

Page 110: Neural Machine Transla#on

source AstheReverendMar>nLutherKingJr.saidfiTyyearsago:human Jakpředpadesá/letyřeklreverendMar/nLutherKingJr.:

char JakoreverendMar/nLutherkrálříkalpředpadesá>lety:

wordJakřeklreverendMarHn<unk>King<unk>předpadesáHlety:

JakřeklreverendMar/nLutherKingřeklpředpadesá>lety:

hybridJakřeklreverendMarHn<unk>King<unk>předpadesáHlety:

Jakpředpadesá/letyřeklreverendMar/nLutherKingJr.:

SampleEnglish-Czechtransla/ons

•  Char-based:“král”means“king”.•  Char-based:wrongnametranslaHon.5/19/16 ThangLuong-NeuralMachineTranslaHon 110

Page 111: Neural Machine Transla#on

SampleEnglish-Czechtransla/ons

source Her11-year-olddaughter,ShaniBart,saiditfeltalivlebitweird

human Jejíjedenác/letádceraShaniBartováprozradila,žejetotrochuzvláštní

char Jejíjedenác/letádcera,ShaniBartová,říkala,žecí�trochudivně

wordJejí<unk>dcera<unk><unk>řekla,žejetotrochudivné

Její11-year-olddceraShani,řekla,žejetotrochudivné

hybridJejí<unk>dcera,<unk><unk>,řekla,žejeto<unk><unk>

Jejíjedenác/letádcera,GrahamBart,řekla,žecí�trochudivný

Generatecomplexwords!

5/19/16 ThangLuong-NeuralMachineTranslaHon 111

Page 112: Neural Machine Transla#on

SampleEnglish-Czechtransla/ons

source Her11-year-olddaughter,ShaniBart,saiditfeltalivlebitweird

human Jejíjedenác/letádceraShaniBartováprozradila,žejetotrochuzvláštní

char Jejíjedenác/letádcera,ShaniBartová,říkala,žecí�trochudivně

wordJejí<unk>dcera<unk><unk>řekla,žejetotrochudivné

Její11-year-olddceraShani,řekla,žejetotrochudivné

hybridJejí<unk>dcera,<unk><unk>,řekla,žejeto<unk><unk>

Jejíjedenác/letádcera,GrahamBart,řekla,žecí�trochudivný

•  Word-based:idenHtycopyfails.•  Char-based:wrongnametranslaHon.5/19/16 ThangLuong-NeuralMachineTranslaHon 112

Page 113: Neural Machine Transla#on

SampleEnglish-Czechtransla/ons

source Her11-year-olddaughter,ShaniBart,saiditfeltalivlebitweird

human Jejíjedenác/letádceraShaniBartováprozradila,žejetotrochuzvláštní

char Jejíjedenác/letádcera,ShaniBartová,říkala,žecí�trochudivně

wordJejí<unk>dcera<unk><unk>řekla,žejetotrochudivné

Její11-year-olddceraShani,řekla,žejetotrochudivné

hybridJejí<unk>dcera,<unk><unk>,řekla,žejeto<unk><unk>

Jejíjedenác/letádcera,GrahamBart,řekla,žecí�trochudivný

•  Hybrid:translatenamesincorrectly.•  Char-based:wrongnametranslaHon.5/19/16 ThangLuong-NeuralMachineTranslaHon 113

Page 114: Neural Machine Transla#on

WehaveadvancedNMT

•  #1:thevocabularysizeproblem–  Sol:“copy”mechanism.–  SOTAEnglish-French

•  #2:thesentencelengthproblem–  Sol:avenHonmechanism.–  SOTAEnglish-German

•  #3:thelanguagecomplexityproblem–  Sol:character-leveltranslaHon.–  SOTAEnglish-Czech

5/19/16 ThangLuong-NeuralMachineTranslaHon 114

The<unk>porHcoin<unk>

Le<unk><unk>de<unk>

ecotax Pont-de-Buis

por8que écotaxe Pont-de-Buis

am a student _ Je suis étudiant

Je suis étudiant _

I

Page 115: Neural Machine Transla#on

NMT&beyond

•  UnsupervisedlearningforNMT– UHlizemonolingualdata.

•  Long-contextNMT–  TranslaHnganarHcle/abook.–  SmarteravenHon,longersequences.

•  MulH-modalLanguageUnderstandingSystem– MulH-lingualtranslaHon+speechrecogniHon+more– MulH-tasklearning

Thankyou!

5/19/16 ThangLuong-NeuralMachineTranslaHon 115

Page 116: Neural Machine Transla#on

I’mdone.Butifyouarecurious,readon!

5/19/16 ThangLuong-NeuralMachineTranslaHon 116

Page 117: Neural Machine Transla#on

•  CanweuHlizeallsequence-to-sequencedata?

•  CanwecompressNMTformobiledevices?

#4ForthefutureofNMT

MachinetranslaHon ConsHtuentparsing

5/19/16 ThangLuong-NeuralMachineTranslaHon 117

Page 118: Neural Machine Transla#on

Ourwork

•  MulH-tasklearning:– MachinetranslaHon –ConsHtuentparsing–  ImagecapHongeneraHon–Unsupervisedlearning

•  TranslaHonimprovement:upto+1.5BLEU.

•  State-of-the-artinconsHtuentparsing.

ThangLuong,QuocLe,IlyaSutskever,OriolVinyals,andLukaszKaiser.Mul>-tasksequencetosequencelearning.ICLR2016.

5/19/16 ThangLuong-NeuralMachineTranslaHon 118

Page 119: Neural Machine Transla#on

Many-to-one:shareddecoder

English (unsupervised)

Image (captioning) English

German (translation)

1616.517

17.518

18.519

German-EnglishTransla/on(BLEU)

Big(transla>on)+Medium(cap>on)

(Luongetal.,2015)

Single

+capHon(0.1x)

+capHon(0.05x)

+capHon(0.01x)

+0.7BLEU

5/19/16 ThangLuong-NeuralMachineTranslaHon 119

Page 120: Neural Machine Transla#on

1313.514

14.515

15.516

English-GermanTransla/on(BLEU)

Big(transla>on)+Small(PTBparsing)

(Luongetal.,2015)

Single

+parsing(1x)

+parsing(0.1x)

+parsing(0.01x)

+1.5BLEU

English (unsupervised)

German (translation)

Tags (parsing)English

One-to-many:sharedencoder

MixingraHo5/19/16 ThangLuong-NeuralMachineTranslaHon 120

Page 121: Neural Machine Transla#on

Ourwork

AbigailSee*,ThangLuong*,andChrisManning.CompressionofNeuralMachineTransla>onModelsviaPruning.Insubmission.

•  CompressNMTviapruning&retraining:

0 10 20 30 40 50 60 70 80 90

0

5

10

15

20

25

percentage pruned

BLEU

score

pruned

pruned and retrained

Figure 1: Performance of pruned models, immediately after pruning and after

retraining.

1

Originalmodel

Prunesmallestweights

Prune+retrain

5/19/16 ThangLuong-NeuralMachineTranslaHon 121

Page 122: Neural Machine Transla#on

Ourwork•  CompressNMTviapruning&retraining:

0 10 20 30 40 50 60 70 80 90

0

5

10

15

20

25

percentage pruned

BLEU

score

pruned

pruned and retrained

Figure 1: Performance of pruned models, immediately after pruning and after

retraining.

1

Originalmodel

Prunesmallestweights

Prune+retrain

Prune80%withoutlossofperformance.5/19/16 ThangLuong-NeuralMachineTranslaHon 122

Page 123: Neural Machine Transla#on

NMTRedundancy–Embeddings

•  Frequentwordshavelargerweights– white:large.– black:small.

target embedding weights

source embedding weights

least common wordmost common word

source layer 1 weights

recurrentfeed-forward

input gate

forget gate

output gate

input

source layer 2 weights source layer 3 weights source layer 4 weights

target layer 1 weights target layer 2 weights target layer 3 weights target layer 4 weights

5/19/16 ThangLuong-NeuralMachineTranslaHon 123

Page 124: Neural Machine Transla#on

target embedding weights

source embedding weights

least common wordmost common word

source layer 1 weights

recurrentfeed-forward

input gate

forget gate

output gate

input

source layer 2 weights source layer 3 weights source layer 4 weights

target layer 1 weights target layer 2 weights target layer 3 weights target layer 4 weights

NMTRedundancy–LSTMLayer1 Layer2 Layer3 Layer4

Encoder

Decoder

Inputgate

Forgetgate

Outputgate

Inputsignal

Feed-forward Recurrent

5/19/16 ThangLuong-NeuralMachineTranslaHon 124

Page 125: Neural Machine Transla#on

target embedding weights

source embedding weights

least common wordmost common word

source layer 1 weights

recurrentfeed-forward

input gate

forget gate

output gate

input

source layer 2 weights source layer 3 weights source layer 4 weights

target layer 1 weights target layer 2 weights target layer 3 weights target layer 4 weights

NMTRedundancy–LSTMLayer1 Layer2 Layer3 Layer4

Encoder

Decoder

Inputgate

Forgetgate

Outputgate

Inputsignal

Feed-forward Recurrent

Inputsignalisimportant!

5/19/16 ThangLuong-NeuralMachineTranslaHon 125

Page 126: Neural Machine Transla#on

target embedding weights

source embedding weights

least common wordmost common word

source layer 1 weights

recurrentfeed-forward

input gate

forget gate

output gate

input

source layer 2 weights source layer 3 weights source layer 4 weights

target layer 1 weights target layer 2 weights target layer 3 weights target layer 4 weights

NMTRedundancy–LSTMLayer1 Layer2 Layer3 Layer4

Encoder

Decoder

Inputgate

Forgetgate

Outputgate

Inputsignal

Feed-forward Recurrent

Forgetgate:small–layer1large–layer4

5/19/16 ThangLuong-NeuralMachineTranslaHon 126

Page 127: Neural Machine Transla#on

FutureChallenges

Shesawanelephantinherdress.Theelephantmusthaveagood

senseoffashion!Needstounderstand

commonsense&largercontext.

Shesawanelephantinherdress.

5/19/16 ThangLuong-NeuralMachineTranslaHon 127

Thankyou!