
Blockwise Parallel Decoding for Deep Autoregressive Models

Mitchell Stern, UC Berkeley
Noam Shazeer, Google Brain
Jakob Uszkoreit, Google Brain

Overview

Some recent sequence-to-sequence models like the Transformer (Vaswani et al., 2017) can score all output positions in parallel. We propose a simple algorithmic technique that exploits this property to generate multiple tokens in parallel at decoding time with little to no loss in quality. Our fastest models exhibit wall-clock speedups of up to 4x over standard greedy decoding on the tasks of machine translation and image super-resolution.

Basic Approach

Combined Approach

Implementation and Training

•  Augment the decoder architecture to predict the next k tokens in parallel with sub-models p1, …, pk

•  Either use a frozen base model to ensure comparable quality, or employ fine-tuning to improve internal consistency and achieve better future prediction

•  Optionally use sequence-level knowledge distillation to construct a training set with greater predictability arising from consistent mode breaking from the teacher model
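The distillation option in the last bullet can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `distill`, `toy_teacher`, and the three-pair corpus are hypothetical, and a real teacher would be a trained translation model decoding each source sentence.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def distill(pairs: List[Pair], teacher_decode: Callable[[str], str]) -> List[Pair]:
    """Replace each reference target with the teacher model's own output."""
    return [(src, teacher_decode(src)) for src, _ in pairs]


# Toy corpus: the same source appears with three different valid references.
original = [
    ("thank you", "danke"),
    ("thank you", "vielen dank"),
    ("thank you", "danke schön"),
]


def toy_teacher(src: str) -> str:
    """Hypothetical teacher that always commits to one translation (one mode)."""
    return {"thank you": "vielen dank"}[src]


distilled = distill(original, toy_teacher)
original_targets = {tgt for _, tgt in original}
distilled_targets = {tgt for _, tgt in distilled}
print(len(original_targets), len(distilled_targets))  # 3 1
```

In the toy corpus the distilled targets collapse to a single choice per source, which is the sense in which the resulting training set is more predictable for the future-token sub-models.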

Examples

English-German machine translation using a model trained with k=10:

Input: The James Webb Space Telescope (JWST) will be launched into space on board an Ariane 5 rocket by 2018 at the earliest.

Output: Das James Webb Space Teleskop (JWST) wird bis spätestens 2018 an Bord einer Ariane 5-Rakete in den Weltraum gestartet.

•  Step 1, 10 tokens: [Das_, James_, Web, b_, Space_, Tele, sko, p_, (_, J]
•  Step 2, 5 tokens: [W, ST_, )_, wird_, bis_]
•  Step 3, 4 tokens: [späte, stens_, 2018_, an_]
•  Step 4, 10 tokens: [Bord_, einer_, Ari, ane, 5_, -_, Rak, ete_, in_, den_]
•  Step 5, 2 tokens: [Weltraum, _]
•  Step 6, 3 tokens: [gestartet_, ._, <EOS>]
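As a worked check on the example above, the accepted blocks can be tallied to get the mean accepted block size for this one sentence. The token lists are copied from the six steps; the variable names are illustrative only.

```python
# Accepted blocks from the six decoding steps listed above.
blocks = [
    ["Das_", "James_", "Web", "b_", "Space_", "Tele", "sko", "p_", "(_", "J"],
    ["W", "ST_", ")_", "wird_", "bis_"],
    ["späte", "stens_", "2018_", "an_"],
    ["Bord_", "einer_", "Ari", "ane", "5_", "-_", "Rak", "ete_", "in_", "den_"],
    ["Weltraum", "_"],
    ["gestartet_", "._", "<EOS>"],
]

sizes = [len(b) for b in blocks]     # tokens accepted per step
total_tokens = sum(sizes)            # steps that one-token-at-a-time greedy decoding would need
num_steps = len(blocks)              # blockwise decoding steps actually taken
mean_block_size = total_tokens / num_steps

print(sizes)                         # [10, 5, 4, 10, 2, 3]
print(total_tokens, num_steps)       # 34 6
print(round(mean_block_size, 2))     # 5.67
```

The ratio of 34 to 6 counts decoding iterations only. It is not a wall-clock speedup, since each blockwise step does more work than a single greedy step.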

Image super-resolution using a model trained with k=10 and allowing for approximate pixel matches (left: input, middle: greedy decode, right: parallel decode):
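One way to read "approximate pixel matches" is as a relaxed acceptance test: a proposed pixel value is kept when it lies within some tolerance of the base model's greedy value. The sketch below assumes an absolute-difference test on integer intensities; the function name, the tolerance `eps`, and the sample values are hypothetical, and the poster does not state the exact criterion used.

```python
from typing import Sequence


def accepted_prefix_length(proposed: Sequence[int], greedy: Sequence[int], eps: int) -> int:
    """Count leading positions where the proposal is within eps of the greedy value.

    `greedy[i]` stands for the base model's greedy prediction at position i,
    computed with the proposed values before i as context.
    """
    n = 0
    for p, g in zip(proposed, greedy):
        if abs(p - g) > eps:
            break
        n += 1
    return n


# Made-up pixel intensities for a block of k = 4 proposals.
proposed = [120, 65, 200, 31]
greedy = [120, 64, 190, 31]

exact = accepted_prefix_length(proposed, greedy, eps=0)
approx = accepted_prefix_length(proposed, greedy, eps=2)
print(exact, approx)  # 1 2
```

With `eps=0` this reduces to exact matching. A larger tolerance accepts longer blocks, at the cost of output that can differ from the greedy decode.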

Results

EN-DE machine translation: dev BLEU score and mean accepted block size

EN-DE machine translation: test BLEU score and wall-clock speedup

Image super-resolution: mean accepted block size

Image super-resolution: human evaluation

Wall-clock speedup vs. mean accepted block size

Basic approach: Predict the next k tokens using the base scoring model and k-1 auxiliary models; verify the predictions in parallel using the base model; accept the prefix that agrees with the greedy predictions.
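The predict, verify, and accept substeps can be sketched with a toy stand-in for the models. Everything named here is an assumption for illustration: `greedy_next` plays the base model's greedy prediction, `propose` plays the k sub-models (its first output always equals the base model's), and `TARGET` and `WRONG` are made-up data. In a real model the k verification checks would run as one parallel scoring call rather than a Python loop.

```python
from typing import Callable, List, Tuple

EOS = "<EOS>"
TARGET = list("abcdefg") + [EOS]  # toy: the sequence greedy decoding would produce
WRONG = {3, 6}                    # toy: positions the auxiliary models get wrong


def greedy_next(prefix: List[str]) -> str:
    """Toy base model: the greedy token that follows `prefix`."""
    return TARGET[len(prefix)]


def propose(prefix: List[str], k: int) -> List[str]:
    """Toy proposals for the next k positions; index 0 is the base model itself."""
    block = []
    for i in range(k):
        pos = len(prefix) + i
        if pos >= len(TARGET):
            break
        block.append("?" if i > 0 and pos in WRONG else TARGET[pos])
    return block


def blockwise_decode(k: int,
                     next_fn: Callable[[List[str]], str] = greedy_next,
                     propose_fn: Callable[[List[str], int], List[str]] = propose
                     ) -> Tuple[List[str], int]:
    out: List[str] = []
    steps = 0
    while not out or out[-1] != EOS:
        block = propose_fn(out, k)                    # predict
        accepted = 0
        for i, tok in enumerate(block):               # verify (parallel in practice)
            if next_fn(out + block[:i]) != tok:
                break
            accepted += 1
            if tok == EOS:
                break
        # accept the agreeing prefix; fall back to one greedy token if none agree
        out.extend(block[:accepted] or [next_fn(out)])
        steps += 1
    return out, steps


tokens, steps = blockwise_decode(k=4)
print(tokens == TARGET, steps)  # True 3
```

In this toy run the output is identical to the 8-token greedy sequence but is produced in 3 blockwise steps, because each step keeps every proposal up to the first disagreement.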

Combined approach: Combining the scoring and proposal models allows us to merge the current verify substep with the next predict substep, reducing the number of parallel model invocations during inference by a factor of 2.
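The saving in model invocations can be illustrated with a toy in which one call returns the k-head outputs at every position of the proposed block. The first head at each position serves as the verification signal, and the outputs at the last accepted position serve as the next block of proposals. The names `heads`, `combined_call`, `decode_combined`, `TARGET`, and `WRONG` are hypothetical, and the toy heads depend only on position, whereas a real model conditions on the token content.

```python
from typing import List, Tuple

EOS = "<EOS>"
TARGET = list("abcdefg") + [EOS]  # toy: the sequence greedy decoding would produce
WRONG = {3, 6}                    # toy: positions the auxiliary heads get wrong


def heads(prefix_len: int, k: int) -> List[str]:
    """Toy k-head prediction for the k positions after a prefix of given length.
    Head 1 (i == 0) plays the base model and always gives the greedy token."""
    block = []
    for i in range(k):
        pos = prefix_len + i
        if pos >= len(TARGET):
            break
        block.append("?" if i > 0 and pos in WRONG else TARGET[pos])
    return block


def combined_call(prefix: List[str], block: List[str], k: int) -> List[List[str]]:
    """One model invocation: k-head outputs after prefix + block[:i], for every i."""
    return [heads(len(prefix) + i, k) for i in range(len(block) + 1)]


def decode_combined(k: int) -> Tuple[List[str], int]:
    out: List[str] = []
    block = combined_call(out, [], k)[0]          # initial predict
    calls = 1
    while block:
        outputs = combined_call(out, block, k)    # verify and predict in one call
        calls += 1
        n = 0
        while n < len(block) and block[n] == outputs[n][0]:
            n += 1
            if block[n - 1] == EOS:
                break
        out += block[:n]                          # n >= 1 because head 1 is the base model
        if out[-1] == EOS:
            break
        block = outputs[n]                        # next proposals, already computed
    return out, calls


tokens, calls = decode_combined(k=4)
steps = calls - 1
print(tokens == TARGET, calls, 2 * steps)  # True 4 6
```

Here the toy sequence is decoded with 4 invocations, against 6 if each of the 3 steps used separate predict and verify calls. The one extra call at the start means the ratio approaches 2 only as the number of steps grows.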
