Blockwise Parallel Decoding for Deep Autoregressive Models · mitchell/files/nips-2018_poster...
TRANSCRIPT
Blockwise Parallel Decoding for Deep Autoregressive Models
Mitchell Stern, UC Berkeley
Noam Shazeer, Google Brain
Jakob Uszkoreit, Google Brain
Overview
Some recent sequence-to-sequence models like the Transformer (Vaswani et al., 2017) can score all output positions in parallel. We propose a simple algorithmic technique that exploits this property to generate multiple tokens in parallel at decoding time with little to no loss in quality. Our fastest models exhibit wall-clock speedups of up to 4x over standard greedy decoding on the tasks of machine translation and image super-resolution.
Basic Approach
Combined Approach
Implementation and Training
• Augment the decoder architecture to predict the next k tokens in parallel with sub-models p1, …, pk
• Either use a frozen base model to ensure comparable quality, or employ fine-tuning to improve internal consistency and achieve better future prediction
• Optionally use sequence-level knowledge distillation to construct a training set with greater predictability arising from consistent mode breaking from the teacher model
Examples
English-German machine translation using a model trained with k=10:
Input: The James Webb Space Telescope (JWST) will be launched into space on board an Ariane5 rocket by 2018 at the earliest.
Output: Das James Webb Space Teleskop (JWST) wird bis spätestens 2018 an Bord einer Ariane5-Rakete in den Weltraum gestartet.
• Step 1: 10 tokens [Das_, James_, Web, b_, Space_, Tele, sko, p_, (_, J]
• Step 2: 5 tokens [W, ST_, )_, wird_, bis_]
• Step 3: 4 tokens [späte, stens_, 2018_, an_]
• Step 4: 10 tokens [Bord_, einer_, Ari, ane, 5_, -_, Rak, ete_, in_, den_]
• Step 5: 2 tokens [Weltraum, _]
• Step 6: 3 tokens [gestartet_, ._, <EOS>]
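The block sizes in this example can be tallied to see how many decoding iterations the parallel scheme saves. The short sketch below does that arithmetic. It assumes that "mean accepted block size" means the average number of tokens accepted per step, and that greedy decoding would need one step per token; the variable names are illustrative only.

```python
# Accepted block sizes per step, copied from the translation example above.
accepted_block_sizes = [10, 5, 4, 10, 2, 3]

total_tokens = sum(accepted_block_sizes)      # tokens in the full output, including <EOS>
parallel_steps = len(accepted_block_sizes)    # decoding steps used by blockwise decoding
greedy_steps = total_tokens                   # assumption: greedy decoding emits one token per step

# Assumed definition: average number of tokens accepted per step.
mean_accepted_block_size = total_tokens / parallel_steps

print(f"{total_tokens} tokens in {parallel_steps} steps "
      f"(mean accepted block size {mean_accepted_block_size:.2f}, "
      f"versus {greedy_steps} greedy steps)")
```

This counts decoding iterations, not wall-clock time. The wall-clock speedup is generally smaller, since each parallel step does more work than a single greedy step.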
Image super-resolution using a model trained with k=10 and allowing for approximate pixel matches (left: input, middle: greedy decode, right: parallel decode):
Results
EN-DE machine translation: dev BLEU score and mean accepted block size
EN-DE machine translation: test BLEU score and wall-clock speedup
Image super-resolution: mean accepted block size
Image super-resolution: human evaluation
Wall-clock speedup vs. mean accepted block size
Basic approach: predict the next k tokens using the base scoring model and k-1 auxiliary models; verify the predictions in parallel using the base model; accept the prefix that agrees with the greedy predictions.
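The predict, verify, and accept substeps described above can be sketched with toy stand-ins for the models. This is a minimal illustration, not the authors' implementation. `propose`, `toy_propose`, and `TARGET` are hypothetical names, the "models" are lookup functions rather than neural networks, and the verify substep is written as a sequential loop for readability, whereas a real model would score all k positions in a single parallel call.

```python
from typing import Callable, List, Sequence, Tuple

EOS = "<EOS>"

def blockwise_parallel_decode(
    propose: Callable[[Sequence[str], int], str],
    k: int,
    max_len: int = 100,
) -> Tuple[List[str], List[int]]:
    """Toy blockwise decoding loop.

    propose(prefix, i) is a hypothetical callable returning the greedy guess of
    head p_i for the token i positions past the prefix; propose(prefix, 1)
    plays the role of the base scoring model.
    """
    output: List[str] = []
    block_sizes: List[int] = []
    while len(output) < max_len and (not output or output[-1] != EOS):
        # Predict: every head guesses its token from the same prefix.
        block = [propose(output, i) for i in range(1, k + 1)]
        # Verify: keep the longest prefix of the block that matches what the
        # base model would have produced greedily. The first token always
        # matches because it comes from the base model itself.
        accepted = 1
        while (accepted < k
               and block[accepted - 1] != EOS
               and propose(output + block[:accepted], 1) == block[accepted]):
            accepted += 1
        # Accept: extend the output by the verified prefix.
        output.extend(block[:accepted])
        block_sizes.append(accepted)
    return output, block_sizes

# Toy setup (an assumption for illustration): the base model reproduces TARGET
# exactly, and the auxiliary heads are wrong at every third position.
TARGET = ["Das_", "James_", "Web", "b_", "Space_", "Tele", "sko", "p_", EOS]

def toy_propose(prefix: Sequence[str], i: int) -> str:
    pos = len(prefix) + i - 1
    if pos >= len(TARGET):
        return EOS
    if i > 1 and pos % 3 == 0:
        return "?"  # deliberate auxiliary-head error
    return TARGET[pos]

parallel_out, parallel_sizes = blockwise_parallel_decode(toy_propose, k=4)
greedy_out, greedy_sizes = blockwise_parallel_decode(toy_propose, k=1)
print(parallel_out == greedy_out, parallel_sizes, len(greedy_sizes))
```

With k=1 the loop reduces to ordinary greedy decoding. With exact-match verification as sketched here, larger k yields the same output in fewer steps, because a proposed token is kept only when the base model agrees with it.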
Combined approach: combining the scoring and proposal models allows us to merge the current verify substep with the next predict substep, reducing the number of parallel model invocations during inference by a factor of 2.
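The factor of 2 can be checked with a small counting sketch. It assumes that the basic approach spends one model invocation on predict and one on verify in every step, and that the combined approach needs one initial predict call followed by one merged verify-and-predict call per step. The function name `model_invocations` is hypothetical.

```python
def model_invocations(num_steps: int, combined: bool) -> int:
    """Count parallel model invocations for a decode that takes num_steps steps.

    Assumptions (not taken verbatim from the poster):
      - basic approach: one predict call plus one verify call per step;
      - combined approach: one initial predict call, then one merged
        verify-and-predict call per step.
    """
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    if num_steps == 0:
        return 0
    return num_steps + 1 if combined else 2 * num_steps

steps = 6  # number of steps in the translation example
basic_calls = model_invocations(steps, combined=False)
combined_calls = model_invocations(steps, combined=True)
print(basic_calls, combined_calls, basic_calls / combined_calls)
```

Under these assumptions the ratio is below 2 for short outputs because of the extra initial call, and it approaches 2 as the number of steps grows.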