cs224n-2018-lecture8-rnn...2018/02/01 · • midterm logistics: fill out form on piazza if you...
TRANSCRIPT
![Page 1: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/1.jpg)
Natural Language Processingwith Deep Learning
CS224N/Ling284
Lecture 8:Recurrent Neural Networks
and Language Models
Abigail See
Natural Language Processingwith Deep Learning
CS224N/Ling284
Christopher Manning and Richard Socher
Lecture 2: Word Vectors
![Page 2: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/2.jpg)
Announcements
• Assignment1:Gradeswillbereleasedafterclass
• Assignment2:CodingsessionnextweekonMonday;detailsonPiazza
• Midtermlogistics:FilloutformonPiazzaifyoucan’tdomainmidterm,havespecialrequirements,orotherspecialcase
2/1/182
![Page 3: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/3.jpg)
Announcements
• DefaultFinalProject (PA4)releaselatetonight• Readthehandout,lookatthecode,decidewhichprojectyouwanttodo• Youmaynotunderstandallthetechnicalparts,butyou’llgetanoverview• Youdon’tyethavetheAzureresourcesyouneedtorunthecode
• Projectproposal duenextweek(ThursFeb8)• Detailsreleasedlatertoday• Everyonesubmitstheirteams• Customfinalprojectteamsalsodescribetheirproject
2/1/183
![Page 4: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/4.jpg)
2/1/184
Callforparticipation
![Page 5: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/5.jpg)
Overview
Todaywewill:
• IntroduceanewNLPtask• LanguageModeling
• Introduceanewfamilyofneuralnetworks• RecurrentNeuralNetworks(RNNs)
THEmostimportantideafortherestoftheclass!
motivates
2/1/185
![Page 6: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/6.jpg)
• LanguageModeling isthetaskofpredictingwhatwordcomesnext.
thestudentsopenedtheir______
• Moreformally:givenasequenceofwords,computetheprobabilitydistributionofthenextword:
whereisawordinthevocabulary
• AsystemthatdoesthisiscalledaLanguageModel.
LanguageModeling
examsminds
laptopsbooks
2/1/186
![Page 7: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/7.jpg)
YouuseLanguageModelseveryday!
2/1/187
![Page 8: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/8.jpg)
YouuseLanguageModelseveryday!
2/1/188
![Page 9: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/9.jpg)
n-gramLanguageModels
thestudentsopenedtheir______
• Question:HowtolearnaLanguageModel?• Answer (pre- DeepLearning):learnan-gramLanguageModel!
• Definition: An-gram isachunkofn consecutivewords.• unigrams:“the”,“students”,“opened”,”their”• bigrams:“thestudents”,“studentsopened”,“openedtheir”• trigrams:“thestudentsopened”,“studentsopenedtheir”• 4-grams:“thestudentsopenedtheir”
• Idea: Collectstatisticsabouthowfrequentdifferentn-gramsare,andusethesetopredictnextword.
2/1/189
![Page 10: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/10.jpg)
n-gramLanguageModels
• Firstwemakeasimplifyingassumption:dependsonlyonthepreceding(n-1)words
(statisticalapproximation)
(definitionofconditionalprob)
(assumption)
n-1words
prob ofan-gram
prob ofa(n-1)-gram
• Question: Howdowegetthesen-gramand(n-1)-gramprobabilities?• Answer: Bycounting theminsomelargecorpusoftext!
2/1/1810
![Page 11: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/11.jpg)
n-gramLanguageModels:Example
Supposewearelearninga4-gram LanguageModel.
astheproctorstartedtheclock,thestudentsopenedtheir_____discard
conditiononthis
Inthecorpus:• “studentsopenedtheir”occurred1000times• “studentsopenedtheirbooks”occurred400 times
• à P(books |studentsopenedtheir)=0.4• “studentsopenedtheirexams”occurred100 times
• à P(exams |studentsopenedtheir)=0.1
Shouldwehavediscardedthe“proctor”context?
2/1/1811
![Page 12: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/12.jpg)
Problems withn-gramLanguageModels
Note: Increasingnmakessparsityproblemsworse.Typicallywecan’thaven biggerthan5.
Problem:Whatif“studentsopenedtheir” neveroccurredindata?Thenwecan’tcalculateprobabilityforany !
SparsityProblem2
Problem:Whatif“studentsopenedtheir” neveroccurredindata?Thenhasprobability0!
SparsityProblem1
(Partial)Solution: Addsmall𝛿tocountforevery.Thisiscalledsmoothing.
(Partial)Solution: Justconditionon“openedtheir” instead.Thisiscalledbackoff.
2/1/1812
![Page 13: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/13.jpg)
Problems withn-gramLanguageModels
2/1/1813
Storage:Needtostorecountforallpossiblen-grams.SomodelsizeisO(exp(n)).
Increasingnmakesmodelsizehuge!
![Page 14: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/14.jpg)
n-gramLanguageModelsinpractice
• YoucanbuildasimpletrigramLanguageModelovera1.7millionwordcorpus(Reuters)inafewsecondsonyourlaptop*
todaythe _______
*Tryforyourself: https://nlpforhackers.io/language-models/
Otherwise,seemsreasonable!
company 0.153bank 0.153price 0.077italian 0.039emirate 0.039
…
getprobabilitydistribution
Sparsityproblem:notmuchgranularityintheprobability
distribution
Businessandfinancialnews
2/1/1814
![Page 15: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/15.jpg)
Generatingtextwithan-gramLanguageModel
• YoucanalsouseaLanguageModeltogeneratetext.
todaythe_______
conditiononthis
company 0.153bank 0.153price 0.077italian 0.039emirate 0.039
…
getprobabilitydistribution
sample
2/1/1815
![Page 16: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/16.jpg)
Generatingtextwithan-gramLanguageModel
• YoucanalsouseaLanguageModeltogeneratetext.
todaytheprice_______
conditiononthis
of 0.308for 0.050it 0.046to 0.046is 0.031
…
getprobabilitydistribution
sample
2/1/1816
![Page 17: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/17.jpg)
Generatingtextwithan-gramLanguageModel
• YoucanalsouseaLanguageModeltogeneratetext.
todaythepriceof_______
conditiononthis
the 0.07218 0.043oil 0.043its 0.036gold 0.018
…
getprobabilitydistribution
sample
2/1/1817
![Page 18: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/18.jpg)
Generatingtextwithan-gramLanguageModel
• YoucanalsouseaLanguageModeltogeneratetext.
todaythepriceofgold_______
2/1/1818
![Page 19: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/19.jpg)
Generatingtextwithan-gramLanguageModel
• YoucanalsouseaLanguageModeltogeneratetext.
todaythepriceofgoldperton,whileproductionofshoelastsandshoeindustry,thebankintervenedjustafteritconsideredandrejectedanimf demandtorebuilddepletedeuropean stocks,sept30endprimary76cts ashare.
Incoherent!Weneedtoconsidermorethan3wordsatatimeifwewantto
generategoodtext.
Butincreasingn worsenssparsityproblem,andexponentiallyincreasesmodelsize…
2/1/1819
![Page 20: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/20.jpg)
Howtobuildaneural LanguageModel?
• RecalltheLanguageModelingtask:• Input:sequenceofwords• Output:prob dist ofthenextword
• Howaboutawindow-basedneuralmodel?• WesawthisappliedtoNamedEntityRecognitioninLecture4
2/1/1820
![Page 21: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/21.jpg)
Afixed-windowneuralLanguageModel
the students opened theiras the proctor started the clock ______
discard fixedwindow2/1/1821
![Page 22: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/22.jpg)
Afixed-windowneuralLanguageModel
the students opened their
bookslaptops
concatenatedwordembeddings
words/one-hotvectors
hiddenlayer
a zoo
outputdistribution
2/1/1822
![Page 23: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/23.jpg)
Afixed-windowneuralLanguageModel
the students opened their
bookslaptops
a zoo
Improvements overn-gramLM:• Nosparsityproblem• ModelsizeisO(n)notO(exp(n))
Remainingproblems:• Fixedwindowistoosmall• Enlargingwindowenlarges• Windowcanneverbelarge
enough!• Eachusesdifferentrows
of.Wedon’tshareweightsacrossthewindow.
Weneedaneuralarchitecturethatcan
processanylengthinput
2/1/1823
![Page 24: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/24.jpg)
RecurrentNeuralNetworks(RNN)
hiddenstates
inputsequence(anylength)
…
…
…
Coreidea: ApplythesameweightsrepeatedlyAfamilyofneuralarchitectures
2/1/1824
outputs(optional)
![Page 25: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/25.jpg)
ARNNLanguageModel
the students opened theirwords/one-hotvectors
bookslaptops
wordembeddings
a zoo
outputdistribution
Note:thisinputsequencecouldbemuchlonger,butthisslidedoesn’thavespace!
hiddenstates
istheinitialhiddenstate
2/1/1825
![Page 26: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/26.jpg)
ARNNLanguageModel
the students opened their
bookslaptops
a zoo
RNN Advantages:• Canprocessanylength
input• Modelsizedoesn’t
increase forlongerinput• Computationforstept
can(intheory)useinformationfrom manystepsback
• Weightsaresharedacrosstimestepsàrepresentationsareshared
RNNDisadvantages:• Recurrentcomputation
isslow• Inpractice,difficultto
accessinformationfrommanystepsback
Moreonthesenextweek
2/1/1826
![Page 27: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/27.jpg)
TrainingaRNNLanguageModel
• Getabigcorpusoftextwhichisasequenceofwords• FeedintoRNN-LM;computeoutputdistributionforeverystept.
• i.e.predictprobabilitydist ofeveryword,givenwordssofar
• Lossfunctiononsteptisusualcross-entropybetweenourpredictedprobabilitydistribution,andthetruenextword:
• Averagethistogetoveralllossforentiretrainingset:
2/1/1827
![Page 28: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/28.jpg)
TrainingaRNNLanguageModel=negativelogprob
of“students”
the students opened their …examsCorpus
Loss
…
2/1/1828
![Page 29: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/29.jpg)
TrainingaRNNLanguageModel=negativelogprob
of“opened”
Corpus the students opened their …exams
Loss
…
2/1/1829
![Page 30: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/30.jpg)
TrainingaRNNLanguageModel=negativelogprob
of“their”
Corpus the students opened their …exams
Loss
…
2/1/1830
![Page 31: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/31.jpg)
TrainingaRNNLanguageModel=negativelogprob
of“exams”
Corpus the students opened their …exams
Loss
…
2/1/1831
![Page 32: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/32.jpg)
TrainingaRNNLanguageModel
++++…=
Corpus the students opened their …exams
Loss
…
2/1/1832
![Page 33: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/33.jpg)
TrainingaRNNLanguageModel
• However:Computinglossandgradientsacrossentirecorpus istooexpensive!
• Recall: StochasticGradientDescentallowsustocomputelossandgradientsforsmallchunkofdata,andupdate.
• à Inpractice,considerasasentence
• Computelossforasentence(actuallyusuallyabatchofsentences),computegradientsandupdateweights.Repeat.
2/1/1833
![Page 34: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/34.jpg)
BackpropagationforRNNs
……
Question:What’sthederivativeofw.r.t.therepeatedweightmatrix?
Answer:“Thegradientw.r.t.arepeatedweight
isthesumofthegradientw.r.t.eachtimeitappears”
2/1/1834
Why?
![Page 35: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/35.jpg)
MultivariableChainRule
2/1/1835
Source:https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/differentiating-vector-valued-functions/a/multivariable-chain-rule-simple-version
![Page 36: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/36.jpg)
BackpropagationforRNNs:Proofsketch
2/1/1836
Source:https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/differentiating-vector-valued-functions/a/multivariable-chain-rule-simple-version
…
Inourexample: Applythemultivariablechainrule:=1
![Page 37: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/37.jpg)
BackpropagationforRNNs
……
Question: Howdowecalculatethis?
Answer: Backpropagateovertimestepsi=t,…,0,summinggradientsasyougo.Thisalgorithmiscalled“backpropagationthroughtime”
2/1/1837
![Page 38: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/38.jpg)
GeneratingtextwithaRNNLanguageModelJustlikean-gramLanguageModel,youcanuseaRNNLanguageModeltogeneratetextbyrepeatedsampling.Sampledoutputisnextstep’sinput.
my favorite season is
…
sample
favorite
sample
season
sample
is
sample
spring
spring 2/1/1838
![Page 39: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/39.jpg)
GeneratingtextwithaRNNLanguageModel
• Let’shavesomefun!• YoucantrainaRNN-LMonanykindoftext,thengeneratetext
inthatstyle.• RNN-LMtrainedonObamaspeeches:
Source: https://medium.com/@samim/obama-rnn-machine-generated-political-speeches-c8abd18a2ea0
2/1/1839
![Page 40: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/40.jpg)
GeneratingtextwithaRNNLanguageModel
• Let’shavesomefun!• YoucantrainaRNN-LMonanykindoftext,thengeneratetext
inthatstyle.• RNN-LMtrainedonHarryPotter:
Source: https://medium.com/deep-writing/harry-potter-written-by-artificial-intelligence-8a9431803da6
2/1/1840
![Page 41: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/41.jpg)
GeneratingtextwithaRNNLanguageModel
• Let’shavesomefun!• YoucantrainaRNN-LMonanykindoftext,thengeneratetext
inthatstyle.• RNN-LMtrainedonSeinfeld scripts:
Source: https://www.avclub.com/a-bunch-of-comedy-writers-teamed-up-with-a-computer-to-1818633242
2/1/1841
![Page 42: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/42.jpg)
GeneratingtextwithaRNNLanguageModel
• Let’shavesomefun!• YoucantrainaRNN-LMonanykindoftext,thengeneratetext
inthatstyle.• (character-level)RNN-LMtrainedonpaintcolors:
Source: http://aiweirdness.com/post/160776374467/new-paint-colors-invented-by-neural-network
2/1/1842
![Page 43: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/43.jpg)
EvaluatingLanguageModels
• ThetraditionalevaluationmetricforLanguageModelsisperplexity.
• Lower isbetter!• InAssignment2youwillshowthatminimizingperplexity and
minimizingthelossfunctionareequivalent.
Inverseprobabilityofdataset
Normalizedbynumberofwords
2/1/1843
![Page 44: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/44.jpg)
RNNshavegreatlyimprovedperplexity
n-grammodel
IncreasinglycomplexRNNs
Perplexityimproves(lowerisbetter)
Source: https://research.fb.com/building-an-efficient-neural-language-model-over-a-billion-words/
2/1/1844
![Page 45: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/45.jpg)
WhyshouldwecareaboutLanguageModeling?
• LanguageModelingisasubcomponent ofotherNLPsystems:• Speechrecognition
• UseaLMtogeneratetranscription,conditioned onaudio
• MachineTranslation• UseaLMtogeneratetranslation,conditioned onoriginaltext
• Summarization• UseaLMtogeneratesummary,conditioned onoriginaltext
• LanguageModelingisabenchmarktask thathelpsusmeasureourprogress onunderstandinglanguage
ThesesystemsarecalledconditionalLanguageModels
2/1/1845
![Page 46: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/46.jpg)
Recap
• LanguageModel:Asystemthatpredictsthenextword
• RecurrentNeuralNetwork:Afamilyofneuralnetworksthat:• Takesequentialinputofanylength• Applythesameweightsoneachstep• Canoptionallyproduceoutputoneachstep
• RecurrentNeuralNetwork≠ LanguageModel
• We’veshownthatRNNsareagreatwaytobuildaLM.
• ButRNNsareusefulformuchmore!2/1/1846
![Page 47: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/47.jpg)
RNNscanbeusedfortagginge.g.part-of-speechtagging,namedentityrecognition
knocked over the vasethe startled cat
VBN IN DT NNDT VBN NN
2/1/1847
![Page 48: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/48.jpg)
RNNscanbeusedforsentenceclassification
the movie a lotoverall I enjoyed
positive
Sentenceencoding
Howtocomputesentenceencoding?
e.g.sentimentclassification
2/1/1848
![Page 49: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/49.jpg)
RNNscanbeusedforsentenceclassification
the movie a lotoverall I enjoyed
positive
Sentenceencoding
Howtocomputesentenceencoding?
Basicway:Usefinalhiddenstate
e.g.sentimentclassification
2/1/1849
![Page 50: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/50.jpg)
RNNscanbeusedforsentenceclassification
the movie a lotoverall I enjoyed
positive
Sentenceencoding
Howtocomputesentenceencoding?
Usuallybetter:Takeelement-wisemaxormeanofallhiddenstates
e.g.sentimentclassification
2/1/1850
![Page 51: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/51.jpg)
RNNscanbeusedtogeneratetexte.g.speechrecognition,machinetranslation,summarization
what’s the
weatherthewhat’s
<START>
Remember:thesearecalled“conditionallanguagemodels”.We’llseeMachineTranslationinmuchmoredetaillater.
2/1/1851
![Page 52: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/52.jpg)
RNNscanbeusedasanencodermodulee.g.questionanswering,machinetranslation
Questionencoding=element-wisemaxofhiddenstates Context: Ludwigvan
BeethovenwasaGermancomposerandpianist.Acrucialfigure…
Beethoven ?what nationality wasQuestion:
HeretheRNNactsasanencoderfortheQuestion.Theencoderispartofalargerneuralsystem.
Answer: German
2/1/1852
![Page 53: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/53.jpg)
Anoteonterminology
Bytheendofthecourse: Youwillunderstandphraseslike“stackedbidirectionalLSTMwithresidualconnectionsandself-attention”
RNNdescribedinthislecture=“vanillaRNN”
Nextlecture: YouwilllearnaboutotherRNNflavors
likeGRU andLSTM
2/1/1853
andmulti-layerRNNs
![Page 54: cs224n-2018-lecture8-rnn...2018/02/01 · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18](https://reader035.vdocuments.net/reader035/viewer/2022071414/610dd45063b759394d5e59e2/html5/thumbnails/54.jpg)
Nexttime
• Problems withRNNs!• Vanishinggradients
• FancyRNN variants!• LSTM• GRU• multi-layer• bidirectional
motivates
2/1/1854