TRANSCRIPT
Domain Adaptation for Neural Machine Translation
Kevin Duh, Johns Hopkins University
May 2019
Outline
1. Problem definition
2. Survey of adaptation methods
3. Error Analysis
4. Promising Research Directions
Domain Adaptation Problem: Machine Learning Perspective
• Training data:
– (x1, y1), (x2, y2), (x3, y3), …, i.i.d. samples from distribution D
– Build model p(y|x)
• If test data is not from D, p(y|x) may be operating in a space it wasn't built for.
– Two cases for what we mean by "not from D"
Visualization: Fitting p(y|x)
[figure: training points (x, y) with a curve fitted for p(y|x)]
Case 1: Test is not in input domain (Covariate Shift)
[figure: the test sample xt falls outside the range of training inputs; where do you predict yt?]
Case 2: Input-output relation changes
[figure: the test sample xt lies within the training inputs; yt is expected on the fitted curve, but unexpectedly yt is elsewhere]
Examples in Machine Translation (MT)
• Domain mismatch example:
– Training data consists of Patent sentences
– Test sample is Social Media
• Case 1: Test is not in input domain
– can translate technical words like "NMT"
– no idea how to translate "OMG"
• Case 2: Input-output relation changes
– "CAT" translates to a word that means "Computer Aided Translation" rather than "cute furry animal"
"Domain" is a fuzzy concept in practice
• Corpora differ by:
– Topics: patents, politics, news, medicine
– Style: formal, informal
– Modality: written, spoken
• Often use "domain" to refer to data source:
– e.g. Europarl, OpenSubtitles, TED, Paracrawl
• Both case 1 and case 2 mismatches occur at multiple levels: lexical, syntactic, etc.
Example sentences (case 1): which is Patent, TED, Subtitles, Europarl?
1. We live in a digital world, but we're fairly analog creatures.
2. The tablets exhibit improved bioavailability of the active ingredient.
3. So, um… she's kidding.
4. Resumption of the session
Example bitext (case 2)
Why is Domain Adaptation an important problem in MT?
• It may be expensive to obtain training bitexts that are both large & relevant to the test domain
• Often we have to work with whatever we can get

                Small   Large
Irrelevant              ✔
Relevant        ✔       ✔✔
(columns: data size; rows: relevance to test domain)
Terminology (1 of 2)
• Example: Test domain is Social Media
• In-domain data
– Data that is relevant to the test domain: SNS corpus
• Out-of-domain data
– Data that is less relevant to the test domain: Europarl
• General-domain data
– May be used interchangeably with out-of-domain
– May mean a mixed corpus:
• Europarl + Patent + TED
• Europarl + Patent + TED + SNS
Terminology (2 of 2)
• Supervised adaptation methods:
– Assume OOD bitext & ID bitext
• Unsupervised adaptation methods:
– Assume OOD bitext & ID monotext

                Small             Large
Irrelevant                        Out-of-Domain (OOD)
Relevant        In-Domain (ID)
(columns: data size; rows: relevance to test domain)
Outline
1. Problem definition
2. Survey of adaptation methods
3. Error Analysis
4. Promising Research Directions
A taxonomy of domain adaptation methods for NMT
From: Chenhui Chu and Rui Wang. A Survey of Domain Adaptation for Neural Machine Translation. COLING 2018.
Data-centric adaptation methods
Note: I'll present an assortment of methods, picked mainly to demonstrate the variety but not necessarily representative of the literature.
Synthetic Data Augmentation (forward or back translation)
[diagram: an Out-of-Domain NMT Model translates In-domain Monotext (src or trg side) into synthetic src/trg pairs, which are added to the Large Out-of-Domain Bitext]
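The back-translation direction of this pipeline can be sketched in a few lines; `toy_reverse_translate` below is a hypothetical stand-in (a word-by-word lookup, purely for illustration) for a real target-to-source NMT model trained on the out-of-domain bitext:

```python
def backtranslate_augment(in_domain_trg_sents, reverse_translate):
    """Pair each in-domain target sentence with a synthetic source produced by
    a reverse (target-to-source) model; the synthetic bitext is then added to
    the out-of-domain training data."""
    return [(reverse_translate(trg), trg) for trg in in_domain_trg_sents]

# Toy stand-in for a trained reverse NMT model (illustration only)
_lookup = {"hallo": "hello", "welt": "world"}
def toy_reverse_translate(sent):
    return " ".join(_lookup.get(tok, tok) for tok in sent.split())

synthetic_bitext = backtranslate_augment(["hallo welt"], toy_reverse_translate)
```

The target side stays genuine in-domain text, which is why back-translation tends to help fluency in the new domain even when the synthetic source is noisy.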
Filtering Out-of-Domain Bitext for relevant data subsets (esp. for case 2)
[diagram: a relevance model, e.g. an n-gram language model built from the Small In-domain Bitext, selects relevant subsets of the Large Out-of-Domain Bitext]
Robert C. Moore and William Lewis. Intelligent selection of language model training data. ACL 2010.
Amittai Axelrod, Xiaodong He, and Jianfeng Gao. Domain adaptation via pseudo in-domain data selection. EMNLP 2011.
Kevin Duh, Graham Neubig, Katsuhito Sudoh, and Hajime Tsukada. Adaptation data selection using neural language models: Experiments in machine translation. ACL 2013.
Marcin Junczys-Dowmunt. Dual conditional cross-entropy filtering of noisy parallel corpora. WMT 2018.
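A minimal sketch of the Moore-Lewis cross-entropy-difference idea, using toy add-one-smoothed unigram LMs (real systems use proper n-gram or neural LMs; the smoothing and the tiny corpora here are assumptions for illustration):

```python
import math
from collections import Counter

def train_unigram_lm(sentences):
    """Add-one-smoothed unigram LM; returns a per-token average log-prob function."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # reserve one slot for unseen tokens
    def avg_logprob(sentence):
        toks = sentence.split()
        return sum(math.log((counts[t] + 1) / (total + vocab))
                   for t in toks) / max(len(toks), 1)
    return avg_logprob

def cross_entropy_difference(sentence, in_lm, out_lm):
    """Moore-Lewis score: higher means more in-domain-like."""
    return in_lm(sentence) - out_lm(sentence)

in_lm = train_unigram_lm(["the patient received the drug",
                          "dosage of the active ingredient"])
out_lm = train_unigram_lm(["resumption of the session",
                           "the debate is closed"])

# Rank a candidate pool by the score and keep the top fraction
pool = ["the session is closed", "the drug dosage was increased"]
ranked = sorted(pool, key=lambda s: cross_entropy_difference(s, in_lm, out_lm),
                reverse=True)
```

Sentences that the in-domain LM likes better than the out-of-domain LM float to the top; selection then keeps the highest-scoring subset of the OOD bitext.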
Training-objective-centric methods
Continued Training (a.k.a. fine-tuning)
[diagram: a Randomly Initialized NMT Model is trained on the Large Out-of-Domain Bitext, yielding an Out-of-Domain NMT Model; training then continues on the Small In-domain Bitext, yielding the Continued Training NMT Model]
This seems to be the 1st citation on NMT continued training: Minh-Thang Luong and Chris Manning. Stanford Neural Machine Translation Systems for Spoken Language Domain. IWSLT 2016.
Continued Training: General Domain → Patent Domain
[bar chart: BLEU scores of General Domain, In-Domain, and Continued Training systems for German, Korean, Russian, and Chinese; continued-training gains range from +0.4 to +10.1 BLEU]
Results from JHU SCALE Workshop 2018: Resilient Machine Translation in New Domains
General algorithm:
1. Train model to convergence on dataset A (A = OOD bitext)
2. Continue training on dataset B (B = in-domain bitext)
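The two-step recipe can be illustrated on a toy 1-D regression where the input-output relation shifts between domains (y = 2x out of domain, y = 3x in domain; the model, learning rates, and epoch counts are assumptions chosen for illustration, not anything from the talk):

```python
def sgd_train(w, data, lr, epochs):
    """Fit y ~ w*x by minimising squared error with plain SGD."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return w

ood_bitext = [(0.1 * i, 2 * 0.1 * i) for i in range(1, 11)]  # "domain A": y = 2x
ind_bitext = [(0.1 * i, 3 * 0.1 * i) for i in range(1, 4)]   # small "domain B": y = 3x

# Step 1: train to convergence on dataset A (OOD)
w_ood = sgd_train(0.0, ood_bitext, lr=0.5, epochs=50)
# Step 2: continue training on dataset B (in-domain), typically with a lower lr
w_adapted = sgd_train(w_ood, ind_bitext, lr=0.3, epochs=100)
```

Starting step 2 from the OOD solution rather than from scratch is the whole trick: the small in-domain set only has to move the parameters, not build them.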
Continued Training Variants:
• Details of the learning rate, etc. in step 2 matter
• Adding a regularization term or fixing subnetworks in step 2:
– Antonio Valerio Miceli Barone, Barry Haddow, Ulrich Germann, and Rico Sennrich. Regularization techniques for fine-tuning in neural machine translation. EMNLP 2017.
– Huda Khayrallah, Brian Thompson, Kevin Duh, and Philipp Koehn. Regularized training objective for continued training for domain adaptation in neural machine translation. WNMT 2018.
– Brian Thompson, Huda Khayrallah, Antonios Anastasopoulos, Arya D. McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, and Philipp Koehn. Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation. WMT 2018.
• Different ways to mix data (e.g. A+B in step 2) or order data:
– Chenhui Chu, Raj Dabre, and Sadao Kurohashi. An empirical comparison of domain adaptation methods for neural machine translation. ACL 2017.
– Marlies van der Wees, Arianna Bisazza, and Christof Monz. Dynamic data selection for neural machine translation. EMNLP 2017.
– Wei Wang, Taro Watanabe, Macduff Hughes, Tetsuji Nakagawa, and Ciprian Chelba. Denoising neural machine translation training with trusted data and online data selection. WMT 2018.
– Xuan Zhang, Pamela Shapiro, Gaurav Kumar, Paul McNamee, Marine Carpuat, and Kevin Duh. Curriculum Learning for Domain Adaptation in Neural Machine Translation. NAACL 2019.
• Ensembling the out-of-domain model and the continued-trained model:
– Markus Freitag and Yaser Al-Onaizan. 2016. Fast Domain Adaptation for Neural Machine Translation. arXiv abs/1612.06897.
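The regularization variant can be sketched by adding an L2 penalty that anchors the parameters to the out-of-domain solution during step 2 (this is the general spirit of the regularized fine-tuning papers, not any one paper's exact objective; the toy 1-D model, in-domain data, and hyperparameters are illustrative assumptions):

```python
def finetune_regularized(w_anchor, data, lam, lr=0.3, epochs=200):
    """Continue training on in-domain data while an L2 penalty
    lam * (w - w_anchor)**2 discourages drifting from the OOD solution."""
    w = w_anchor
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x + 2 * lam * (w - w_anchor)
            w -= lr * grad
    return w

in_domain = [(0.5, 1.5), (1.0, 3.0)]                      # in-domain: y = 3x
w_free = finetune_regularized(2.0, in_domain, lam=0.0)    # plain fine-tuning
w_reg = finetune_regularized(2.0, in_domain, lam=1.0)     # anchored fine-tuning
```

With lam = 0 this reduces to ordinary continued training; larger lam trades in-domain fit for retention of the out-of-domain behavior, which is one way to fight catastrophic forgetting.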
Instance Weighting
Boxing Chen, Colin Cherry, George Foster, and Samuel Larkin. Cost Weighting for Neural Machine Translation Domain Adaptation. WNMT 2017.
Rui Wang, Masao Utiyama, Lemao Liu, Kehai Chen, and Eiichiro Sumita. Instance Weighting for Neural Machine Translation Domain Adaptation. EMNLP 2017.
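The common thread in these papers is a per-example weight on the training loss. A minimal sketch; the fixed model scores and the hand-written weight function below are placeholders for the learned or LM-based domain-relevance weights of the actual methods:

```python
def weighted_nll(examples, logprob, weight):
    """Instance-weighted training objective: each sentence pair's negative
    log-likelihood is scaled by its domain-relevance weight."""
    return sum(weight(src, trg) * -logprob(src, trg) for src, trg in examples)

# Toy illustration: fixed model scores, with in-domain-like pairs up-weighted
scores = {("in-domain pair", "t1"): -1.0, ("ood pair", "t2"): -2.0}
loss = weighted_nll(scores.keys(),
                    lambda s, t: scores[(s, t)],
                    lambda s, t: 2.0 if s.startswith("in-domain") else 1.0)
# loss = 2.0 * 1.0 + 1.0 * 2.0 = 4.0
```

The gradient of this objective pulls the model harder toward pairs judged relevant to the test domain, without discarding the rest of the data outright as filtering does.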
Architecture/Decoder-Centric methods
• (I prefer to consider the two types of methods together, since it is sometimes arbitrary to differentiate what's part of the whole architecture and what's only part of the decoding process.)
Fusion of two models
Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2015. On using monolingual corpora in neural machine translation. CoRR, abs/1503.03535.
Translation model vs. Language model
• Problem: The language model can be too strong
• One solution:
Toan Nguyen and David Chiang. Improving Lexical Choice in Neural Machine Translation. NAACL 2018 (note: this paper addresses the general problem of improper lexical choice, but this is a frequent problem in domain adaptation)
[slide fragment: averaged source word representation at decode time t]
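The simplest form of model fusion, shallow fusion in the terminology of Gulcehre et al., interpolates translation-model and language-model log-probabilities at decode time, with a weight beta controlling how strong the LM is allowed to be. A toy sketch (the vocabularies and probabilities are invented to mirror the "CAT" example earlier in the talk):

```python
import math

def shallow_fusion_pick(tm_logp, lm_logp, beta):
    """Choose the next token by score(w) = log p_TM(w) + beta * log p_LM(w)."""
    return max(tm_logp, key=lambda w: tm_logp[w] + beta * lm_logp[w])

# Toy next-token distributions
tm = {"cat": math.log(0.6), "CAT (tool)": math.log(0.4)}   # generic translation model
lm = {"cat": math.log(0.1), "CAT (tool)": math.log(0.9)}   # in-domain language model

generic = shallow_fusion_pick(tm, lm, beta=0.0)   # TM alone picks "cat"
adapted = shallow_fusion_pick(tm, lm, beta=1.0)   # in-domain LM flips the choice
```

The "LM can be too strong" problem is visible in the same knob: with beta too large, the decoder starts following the LM regardless of the source, which is one route to fluently inadequate output.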
Other adaptation methods
Adaptation at the token level: Subword Regularization
Taku Kudo. Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates. ACL 2018.
Train over different subword segmentations, randomly sampled.
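The core idea, exposing the model to many segmentations of the same word instead of one deterministic split, can be sketched with a toy left-to-right sampler. Kudo's method samples from a unigram LM over subwords (as implemented in SentencePiece); the uniform sampling and the tiny vocabulary here are simplifying assumptions:

```python
import random

def sample_segmentation(word, subword_vocab, rng):
    """Sample one left-to-right segmentation of `word` into known subwords,
    falling back to single characters when nothing in the vocabulary matches."""
    pieces, i = [], 0
    while i < len(word):
        matches = sorted(s for s in subword_vocab if word.startswith(s, i))
        piece = rng.choice(matches) if matches else word[i]
        pieces.append(piece)
        i += len(piece)
    return pieces

rng = random.Random(0)
vocab = {"trans", "lation", "translation", "la", "tion"}
# Each training epoch sees a freshly sampled segmentation of the same word
samples = [tuple(sample_segmentation("translation", vocab, rng))
           for _ in range(20)]
```

Seeing the same word split several ways makes the model more robust to segmentations of unfamiliar, e.g. out-of-domain, words at test time.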
Outline
1. Problem definition
2. Survey of adaptation methods
3. Error Analysis
4. Promising Research Directions
The truth is, it's hard to analyze MT errors
• It was hard to pinpoint why a translation was incorrect in the SMT days
• It's perhaps even harder for NMT
• But we try anyway. At least it gives us a way to think about the problem.
S4 Analysis (originally developed for SMT)
• SEEN error: Never seen this source word before in the training data (case 1 in the 1st part of this talk)
• SENSE error: The source word appears in the training data, but is not used in this sense (e.g. case 2)
• SCORE error: The source word and its translation appear in the training data, but the correct translation is scored lower
• SEARCH error: The correct translation is scored higher, but somehow got lost in the search process
Ann Irvine, John Morgan, Marine Carpuat, Hal Daume III, and Dragos Munteanu. 2013. Measuring machine translation errors in new domains. Transactions of the Association for Computational Linguistics (TACL).
S4 Analysis (requires reference and alignment)
S4 Analysis (for NMT?)
• First, run an external word aligner to determine correct/incorrect words (but how much can we trust this?)
• SEEN error:
– Check out-of-vocabulary words on the source side
• SENSE error:
– For a given source word f, check if the desired translation never appears on the target side of the training bitext where f appears
• SCORE error:
– When none of the above is true?
• SEARCH error:
– Not sure how to check besides increasing the beam, but…
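The SEEN and SENSE checks above are easy to approximate from the training bitext alone. A sketch, with the caveat that plain co-occurrence stands in for word alignment here (a simplifying assumption; the toy bitext is invented):

```python
def seen_errors(test_src_tokens, train_src_vocab):
    """SEEN: source tokens that never occur in the training data (case 1)."""
    return [t for t in test_src_tokens if t not in train_src_vocab]

def sense_error(f, desired_e, bitext):
    """SENSE: f occurs in training, but never in a pair whose target side
    contains the desired translation (case 2)."""
    seen_f = any(f in src.split() for src, trg in bitext)
    cooccur = any(f in src.split() and desired_e in trg.split()
                  for src, trg in bitext)
    return seen_f and not cooccur

bitext = [("the CAT tool helps", "das CAT Programm hilft"),
          ("the patent claims", "die Patentansprüche")]
train_vocab = {tok for src, _ in bitext for tok in src.split()}

oov = seen_errors("OMG the patent".split(), train_vocab)
cat_sense = sense_error("CAT", "Katze", bitext)  # "CAT" never means the animal here
```

SCORE and SEARCH errors need the model's actual scores (and larger beams), which is exactly why they are harder to diagnose post hoc.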
Fluently Inadequate Translations
• Source: (TED talk)
• Un-adapted system output: I'm afraid I'm not going to have to go to bed.
• Gloss: Crow parents seem to be teaching their young these skills.
• Adapted system output: And their parents also taught their children how to do it.
• Translations that are fluent but have nothing to do with the source are very dangerous!
Marianna J. Martindale, Marine Carpuat, Kevin Duh, and Paul McNamee. Identifying Fluently Inadequate Output in Neural and Statistical Machine Translation. MT Summit 2019.
Outline
1. Problem definition
2. Survey of adaptation methods
3. Error Analysis
4. Promising Research Directions (my opinion)
Personalized Adaptation, e.g. for Computer Assisted Translation (CAT)
• CAT presents many interesting opportunities for research (with real user impact!)
• Example interface at Lilt.com
Paul Michel and Graham Neubig. Extreme Adaptation for Personalized Neural Machine Translation. ACL 2018.
Sachith Sri Ram Kothur, Rebecca Knowles, and Philipp Koehn. Document-Level Adaptation for Neural Machine Translation. WNMT 2018.
Adaptation to New Languages
• Given bitext in language pairs A→B and C→D:
– Build a translator for A→D
– Build a translator for E→B, where E is a language related to A
– Assumes some shared representation; can use continued training, etc.
• Crazy idea but potentially large impact
Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight. Transfer learning for low-resource neural machine translation. EMNLP 2016; Graham Neubig and Junjie Hu. Rapid Adaptation of Neural Machine Translation to New Languages. EMNLP 2018.
Understanding adaptation errors as a way to understand NMT behavior
• Domain Adaptation provides a good testbed for understanding overfitting, etc.
• What triggers a fluently inadequate translation?
• Why does catastrophic forgetting happen?
Thompson et al. NAACL 2019; Saunders et al. ACL 2019; Kirkpatrick et al. PNAS 2017.
Better adaptation algorithms
• Many opportunities for new ideas in NMT, e.g.:
• Batch vs. online setup
– Online setup makes curriculum learning easier
• Easy to design new architectures
– Multitask learning for sharing parameters & data
• What ideas to borrow from SMT?
– Modularization of lexical choice, reordering, syntax, etc.
Re-cap
1. Problem definition
2. Survey of adaptation methods
3. Error Analysis
4. Promising Research Directions
Case 1: Test is not in input domain (Covariate Shift)
[figure: the test sample xt falls outside the range of training inputs; where do you predict yt?]
Case 2: Input-output relation changes
[figure: the test sample xt lies within the training inputs; yt is expected on the fitted curve, but unexpectedly yt is elsewhere]
Why is Domain Adaptation an important problem in MT?
• Expensive to obtain training bitexts that are both large & relevant to the test domain

                Small   Large
Irrelevant              ✔
Relevant        ✔       ✔✔
(columns: data size; rows: relevance to test domain)
A taxonomy of domain adaptation methods for NMT
From: Chenhui Chu and Rui Wang. A Survey of Domain Adaptation for Neural Machine Translation. COLING 2018.
Continued Training
[diagram: a Randomly Initialized NMT Model is trained on the Large Out-of-Domain Bitext, yielding an Out-of-Domain NMT Model; training then continues on the Small In-domain Bitext, yielding the Continued Training NMT Model]
Outline
1. Problem definition
2. Survey of adaptation methods
3. Error Analysis
– S4 & Fluently Inadequate Translations
4. Promising Research Directions
– CAT, new-language adaptation, adaptation as a window to understanding NMT, etc.