TRANSCRIPT
How Deep Learning is making MT and other areas converge?
MARTA R. COSTA-JUSSÀ
UNIVERSITAT POLITÈCNICA DE CATALUNYA, BARCELONA
About me
2
• ASR, SMT+NN (LIMSI-CNRS, Paris)
• SMT, S2S Translation (UPC, Barcelona)
• SMT, CLIR (USP, São Paulo)
• HMT (I2R, Singapore)
• CLIR, OM (BM, Barcelona)
• HMT (IPN, Mexico)
• NMT, NLI, SLT (UPC, Barcelona)
Timeline: 2004, 2008, 2012, 2014, 2015
Outline

Machine Translation and Deep Learning
Neural Machine Translation
Neural MT architecture applied to other areas
◦ NLP (Chatbot)
◦ Speech (End-to-end speech recognition, End-to-end speech translation)
◦ Image (Image captioning)
Neural MT inspired by other areas
◦ Image/NLP (Character-aware modelling)
◦ Machine Learning (Adversarial networks)
Discussion
3
Machine Translation

SOURCE LANGUAGE → MODEL → TARGET LANGUAGE

• Rules, Dictionaries: from 1950 till now. Eurotra, Apertium… (Forcada, 2005)
• Co-occurrences, Frequency Counts: from 1990 till now. TC-Star, Moses… (Koehn, 2010)
• Neural Networks: starting in 2014… NEMATUS… (Cho, 2014)
4
Neural nets are…

Neural networks, a branch of machine learning, are a biologically-inspired programming paradigm which enables a computer to learn from observational data (http://neuralnetworksanddeeplearning.com/).
5
Deep learning is…

A branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures, with complex structures or otherwise, composed of multiple non-linear transformations (Wikipedia).

A set of machine learning algorithms which attempt to learn multiple-layered models of inputs, commonly neural networks (Du et al, 2013).
6
Neural Machine Translation
7
Motivation: End-to-end system

PHRASE-BASED
Source Language Text → Preprocessing → Decoding (Translation model + Language model) → Postprocessing → Target Language Text
• Translation model: finding the right target words given the source words; trained from a parallel corpus via word alignment and phrase extraction.
• Language model: ensures that translated words come in the right order; trained from a monolingual corpus.
(TRAINING / TEST)

NEURAL
Source Language Text → preprocessing → encoder → decoder → Target Language Text
8
Related work: language modeling
Find a function that takes as input n-1 words and returns a conditional probability of the next one.
Recurrent neural networks make it possible to capture dependencies beyond a fixed context window (via recursion).

p(I'm)  p(fine | I'm)  p(. | fine)  EOS
I'm     fine           .
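The recurrence can be sketched in a few lines: a toy Elman-style RNN language model (the random, untrained weights and the 4-word vocabulary are assumptions purely for illustration) that carries a hidden state through the sentence and emits a distribution over the next word at each step.

```python
import math
import random

random.seed(0)

VOCAB = ["<eos>", "i'm", "fine", "."]
V, H = len(VOCAB), 8  # toy vocabulary and hidden sizes

# Random toy parameters; a real model would learn these by backpropagation.
Wxh = [[random.uniform(-0.1, 0.1) for _ in range(V)] for _ in range(H)]
Whh = [[random.uniform(-0.1, 0.1) for _ in range(H)] for _ in range(H)]
Who = [[random.uniform(-0.1, 0.1) for _ in range(H)] for _ in range(V)]

def step(h, word_id):
    """One recurrence: h_t = tanh(Wxh·x_t + Whh·h_{t-1}); softmax(Who·h_t)."""
    x = [1.0 if i == word_id else 0.0 for i in range(V)]  # one-hot input
    h_new = [math.tanh(sum(Wxh[j][i] * x[i] for i in range(V)) +
                       sum(Whh[j][k] * h[k] for k in range(H)))
             for j in range(H)]
    logits = [sum(Who[o][j] * h_new[j] for j in range(H)) for o in range(V)]
    exps = [math.exp(l) for l in logits]
    probs = [e / sum(exps) for e in exps]
    return h_new, probs

h = [0.0] * H
for w in ["i'm", "fine", "."]:
    h, probs = step(h, VOCAB.index(w))
    # probs is p(next word | all words so far), carried via the hidden state

print(len(probs), abs(sum(probs) - 1.0) < 1e-9)
```

Because the hidden state is fed back at every step, the prediction conditions on the whole history, not on a fixed n-1 word window.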
9
Architecture: encoder-decoder

encoder (input):          how are you ?
decoder (output):         Cómo estás ? EOS
decoder (input, shifted): eos Cómo estás ?
10
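A minimal sketch of this encode-then-decode loop, with toy untrained weights and vocabularies (every name and number here is illustrative, not the real system): the encoder folds the whole source into its final hidden state, and the decoder generates greedily from that state, feeding each prediction back in until an end-of-sequence symbol.

```python
import math
import random

random.seed(1)

SRC = {"how": 0, "are": 1, "you": 2, "?": 3}
TGT = ["<eos>", "cómo", "estás", "?"]
H = 6  # toy hidden size

def rand_mat(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

We = rand_mat(H, H + len(SRC))   # encoder recurrence weights
Wd = rand_mat(H, H + len(TGT))   # decoder recurrence weights
Wo = rand_mat(len(TGT), H)       # decoder output projection

def cell(W, h, x):
    """Simple recurrence: h' = tanh(W · [h; x])."""
    v = h + x
    return [math.tanh(sum(W[j][i] * v[i] for i in range(len(v))))
            for j in range(len(W))]

def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]

# Encoder: read the source left to right; the final h summarizes the sentence.
h = [0.0] * H
for w in ["how", "are", "you", "?"]:
    h = cell(We, h, one_hot(SRC[w], len(SRC)))

# Decoder: start from the source summary, emit greedily, feed predictions back.
out, prev = [], 0  # prev = <eos> acts as the start symbol
for _ in range(5):  # length cap for the toy example
    h = cell(Wd, h, one_hot(prev, len(TGT)))
    logits = [sum(Wo[o][j] * h[j] for j in range(H)) for o in range(len(TGT))]
    prev = max(range(len(TGT)), key=lambda o: logits[o])
    if prev == 0:  # <eos> ends generation
        break
    out.append(TGT[prev])

print(out)  # with untrained weights the words are arbitrary; the loop is the point
```

Real systems replace the toy cell with LSTM/GRU layers and greedy search with beam search, but the control flow is the same.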
Attention-based mechanism
(diagram: encoder states combined by an attention module and fed to the decoder)
11
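The mechanism can be shown with plain numbers: the current decoder state is scored against every encoder state (dot-product scoring is one common choice among several), the scores are softmax-normalized into attention weights, and the weighted sum of encoder states becomes the context vector the decoder consumes at that step. The vectors below are made up for illustration.

```python
import math

# Toy encoder states, one per source word of "how are you ?".
enc_states = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.4], [0.1, 0.9]]
query = [0.1, 1.0]  # decoder state while producing some target word

# Alignment scores (dot product), then softmax into attention weights.
scores = [sum(q * e for q, e in zip(query, s)) for s in enc_states]
exps = [math.exp(s) for s in scores]
weights = [e / sum(exps) for e in exps]

# Context vector: attention-weighted sum of the encoder states. Unlike the
# plain encoder-decoder, the decoder sees a fresh summary at every step.
context = [sum(w * s[d] for w, s in zip(weights, enc_states)) for d in range(2)]

print(weights, context)
```

Note that the weights sum to 1 and concentrate on the encoder states most similar to the query, which is exactly what the alignment plots in later slides visualize.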
Neural MT architecture applied to other areas:
NATURAL LANGUAGE PROCESSING
SPEECH
IMAGE

Natural Language Processing
13
Chatbot: a computer program that conducts a conversation
14
Chatbot models

RULE-BASED
ELIZA: decomposition rules and keyword matching (Weizenbaum, 1966). AIML (Artificial Intelligence Markup Language) (Wallace, 1995).

RETRIEVAL-BASED
IRIS: a chat-oriented dialogue system based on the vector space model framework (Banchs and Li, 2012)

AIML example:
<category><pattern>WHAT IS YOUR NAME</pattern><template>My name is Michael N.S. Evanious.</template></category>

https://hackernoon.com/chatbot-architecture-496f5bf820ed
15
Generative-based chatbots

Human: what is your job?
Enc-dec: i'm a lawyer.
Human: what do you do?
Enc-dec: i'm a doctor.

This model can be trained end-to-end.
It cannot solve the problem of modeling the dialogue.
(encoder-decoder diagram)

✓ The model can generate simple and basic conversations.
✗ The model cannot currently deliver realistic conversations.
16
Chatbot evaluation

Outcome                                                   Count
Enc-dec preferred over Cleverbot                          97
Cleverbot preferred over Enc-dec                          60
Tie                                                       20
Disagreement among judges (no consensus of 3 out of 4)    23
17
Chatbot examples

Human: what is the purpose of being intelligent?
CleverBot: The purpose of life is to love God and love people.
Enc-dec: to find out what it is

Human: What do you think about england during the reign of elizabeth?
CleverBot: I like him. Do you?
Enc-dec: it was a great place.

Human: What is the color of a yellow car?
CleverBot: Brown.
Enc-dec: yellow.

Human: what is your job?
CleverBot: I work in mental health.
Enc-dec: i'm a lawyer.

Human: what do you do?
CleverBot: I have no job.
Enc-dec: i'm a doctor.
18
Other NLP Tasks with Seq2seq

Text summarization: shortening a text document with software to create a summary with the major points of the original document.
Question Answering: automatically producing an answer to a question given a corresponding document.
Semantic Parsing: mapping natural language into a logical form that can be executed on a knowledge base and return an answer.
Syntactic Parsing: analysing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar.
19
Speech Recognition
20
Speech Recognition system

microphone → FEATURES → RECOGNIZER → DECISION → RECOGNIZED SENTENCE
(feature vector x = x1 … x|x|; N-best hypotheses; output w = w1 … w|w|)
Knowledge sources: Lexicon, Acoustic Models, Language models, Task info
21
RNN/CNN-HMM + RNN LM

Language Model: (n-gram +) RNN
Acoustic Model: RNN/CNN + HMM
Phonetic inventory, Pronunciation Lexicon
22
Speech recognition with encoder-decoder with attention
(diagram: the separate Language Model and Acoustic Model are replaced by a single encoder + attention + decoder)
23
Listener
Challenge: speech signals can be hundreds to thousands of frames long.
Solution: using a pyramid BLSTM.
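The pyramid idea can be sketched directly: each layer concatenates consecutive pairs of frames before passing them to the next BLSTM, halving the time resolution, so three layers shrink a 1000-frame input by 8x. The frame values below are dummies, and the real Listener interleaves actual BLSTM layers between the reductions; only the length reduction is shown.

```python
def pyramid_reduce(frames):
    """Concatenate consecutive frame pairs (dropping a trailing odd frame)."""
    return [frames[i] + frames[i + 1] for i in range(0, len(frames) - 1, 2)]

frames = [[float(t)] for t in range(1000)]  # 1000 toy 1-dimensional frames
for _ in range(3):                           # three pyramid layers
    frames = pyramid_reduce(frames)

print(len(frames), len(frames[0]))  # -> 125 8
```

Attention over 125 summary states is tractable; attention over the raw 1000 frames is what makes the naive encoder-decoder struggle on speech.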
24
Attend & Spell
25
End-to-end Speech-to-text

Model                WER
CLDNN-HMM*           8.0
LAS + LM Rescoring   10.3

*Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Network
26
End-to-end Speech-to-text Translation

Multi-task learning aims at improving the generalization performance of a task by using other related tasks.

What is new here compared to previous work? Multi-task training:
• One-to-many: one encoder (Spanish speech), multiple decoders (speech recognition, speech translation).
• Many-to-one: multiple encoders (speech translation, text translation), one decoder (English text).
27
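The one-to-many setup can be sketched as a training schedule (the alternating-batch sampling below is a common choice for illustration, not necessarily the paper's exact schedule): every sampled batch contributes gradients to the shared encoder, while only the sampled task's decoder is updated, which is how the related task regularizes the main one.

```python
import random

random.seed(0)

# One shared encoder, one decoder per task; counters stand in for updates.
shared_encoder_updates = 0
decoder_updates = {"speech_recognition": 0, "speech_translation": 0}

for step in range(100):
    task = random.choice(sorted(decoder_updates))  # sample a task per batch
    shared_encoder_updates += 1  # the encoder learns from every task
    decoder_updates[task] += 1   # only the sampled task's decoder moves

print(shared_encoder_updates, decoder_updates)
```

The encoder thus sees roughly twice as many updates as either decoder, which is the mechanism behind the BLEU gains of the multi-task row in the results table.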
Spanish->English FISHER/CALLHOME BLEU results

Model                   Test1   Test2
End-to-End ST           47.3    16.6
Multi-task              48.7    17.4
ASR/NMT concatenation   45.4    16.6
28
Example of attention probabilities
29
Image
30
Image Captioning
"A cat on the mat"
31
Encoder-decoder with attention
(diagram: CNN encoder, attention, decoder)
32
Captioning: Show, Attend & Tell
33
Results on the MS COCO database

Method                             BLEU
Log-Bilinear (Kiros et al 2014a)   24.3
Enc-Dec (Vinyals et al 2014a)      24.6
+Attention (Xu et al, 2015)        25.0
34
Other Computer Vision Tasks with Attention

Visual Question Answering: given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
Video Caption Generation: attempts to generate a complete and natural sentence, enriching the single label as in video classification, to capture the most informative dynamics in videos.
35
Neural MT architecture inspired by other areas

Convolutional Neural Networks for character-aware Neural MT
37
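A character-aware word representation can be sketched as follows (the random, untrained embeddings, the filter count, and the sizes are all toy assumptions): each character is embedded, width-3 convolution filters slide over the character sequence, and max-over-time pooling keeps one feature per filter, so rare or unseen words still get a meaningful vector built from their spelling.

```python
import random

random.seed(2)

CHARS = "abcdefghijklmnopqrstuvwxyz"
DIM, FILTERS, WIDTH = 4, 5, 3  # toy embedding dim, filter count, filter width

# Random toy parameters; a real model learns these jointly with the NMT system.
emb = {c: [random.uniform(-1, 1) for _ in range(DIM)] for c in CHARS}
filters = [[random.uniform(-1, 1) for _ in range(WIDTH * DIM)]
           for _ in range(FILTERS)]

def word_vector(word):
    """Char-CNN word embedding: convolve over characters, max-pool over time."""
    chars = [emb[c] for c in word]  # only lowercase a-z in this toy example
    pooled = []
    for f in filters:
        acts = []
        for i in range(len(chars) - WIDTH + 1):
            window = chars[i] + chars[i + 1] + chars[i + 2]  # width-3 window
            acts.append(sum(w * x for w, x in zip(f, window)))
        pooled.append(max(acts))  # strongest character n-gram response
    return pooled

v = word_vector("translation")
print(len(v))  # one feature per filter -> 5
```

Because the vector depends only on spelling, morphologically related or out-of-vocabulary words share structure, which is what drives the +Char gains in the table below.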
German-English BLEU Results
38

Method   DE->EN   EN->DE
Phrase   20.99    17.04
NMT      20.64    17.15
+Char    22.10    20.22
Examples
39
Generative Adversarial Networks
40
German-to-English BLEU Results
41

Method                       DE->EN
Baseline (Shen et al 2016)   25.84
+Adversarial                 27.94
German-to-English Example
42

Source: wir mussen verhindern , dass die menschen kenntnis erlangen von dingen , vor allem dann , wenn sie wahr sind .
Baseline: we need to prevent people who are able to know that people have to do , especially if they are true .
+Adversarial: we need to prevent people who are able to know about things , especially if they are true .
REF: we have to prevent people from finding about things , especially when they are true .
Discussion
43
Implementations of Encoder-Decoder
LSTM / CNN
44
Attention-based mechanisms

Soft vs Hard: soft attention weights all pixels; hard attention crops the image and forces attention only on the kept part.
Global vs Local: a global approach always attends to all source words; a local one only looks at a subset of source words at a time.
Intra vs External: intra attention is within the encoder's input sentence; external attention is across sentences.
45
One large encoder-decoder

• Are text, speech, image… all converging to a single paradigm?
• If you know how to build a neural MT system, you may easily learn how to build a speech-to-text recognition system...
• Or you may train them together to achieve zero-shot AI.
* And other references on this research direction…
46
Thanks
WWW.COSTA-JUSSA.COM
Acknowledgements:
• Noé Casas and Carlos Escolano for their valuable feedback on the slides.
• MT-Marathon Organizers for inviting me to this exciting event.