cs224n-2018-lecture8-rnn...2018/02/01  · • midterm logistics: fill out form on piazza if you...

54
Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 8: Recurrent Neural Networks and Language Models Abigail See

Upload: others

Post on 08-Mar-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Natural Language Processingwith Deep Learning

CS224N/Ling284

Lecture 8:Recurrent Neural Networks

and Language Models

Abigail See

Natural Language Processingwith Deep Learning

CS224N/Ling284

Christopher Manning and Richard Socher

Lecture 2: Word Vectors

Page 2: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Announcements

• Assignment1:Gradeswillbereleasedafterclass

• Assignment2:CodingsessionnextweekonMonday;detailsonPiazza

• Midtermlogistics:FilloutformonPiazzaifyoucan’tdomainmidterm,havespecialrequirements,orotherspecialcase

2/1/182

Page 3: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Announcements

• DefaultFinalProject (PA4)releaselatetonight• Readthehandout,lookatthecode,decidewhichprojectyouwanttodo• Youmaynotunderstandallthetechnicalparts,butyou’llgetanoverview• Youdon’tyethavetheAzureresourcesyouneedtorunthecode

• Projectproposal duenextweek(ThursFeb8)• Detailsreleasedlatertoday• Everyonesubmitstheirteams• Customfinalprojectteamsalsodescribetheirproject

2/1/183

Page 4: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

2/1/184

Callforparticipation

Page 5: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Overview

Todaywewill:

• IntroduceanewNLPtask• LanguageModeling

• Introduceanewfamilyofneuralnetworks• RecurrentNeuralNetworks(RNNs)

THEmostimportantideafortherestoftheclass!

motivates

2/1/185

Page 6: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

• LanguageModeling isthetaskofpredictingwhatwordcomesnext.

thestudentsopenedtheir______

• Moreformally:givenasequenceofwords,computetheprobabilitydistributionofthenextword:

whereisawordinthevocabulary

• AsystemthatdoesthisiscalledaLanguageModel.

LanguageModeling

examsminds

laptopsbooks

2/1/186

Page 7: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

YouuseLanguageModelseveryday!

2/1/187

Page 8: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

YouuseLanguageModelseveryday!

2/1/188

Page 9: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

n-gramLanguageModels

thestudentsopenedtheir______

• Question:HowtolearnaLanguageModel?• Answer (pre- DeepLearning):learnan-gramLanguageModel!

• Definition: An-gram isachunkofn consecutivewords.• unigrams:“the”,“students”,“opened”,”their”• bigrams:“thestudents”,“studentsopened”,“openedtheir”• trigrams:“thestudentsopened”,“studentsopenedtheir”• 4-grams:“thestudentsopenedtheir”

• Idea: Collectstatisticsabouthowfrequentdifferentn-gramsare,andusethesetopredictnextword.

2/1/189

Page 10: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

n-gramLanguageModels

• Firstwemakeasimplifyingassumption:dependsonlyonthepreceding(n-1)words

(statisticalapproximation)

(definitionofconditionalprob)

(assumption)

n-1words

prob ofan-gram

prob ofa(n-1)-gram

• Question: Howdowegetthesen-gramand(n-1)-gramprobabilities?• Answer: Bycounting theminsomelargecorpusoftext!

2/1/1810

Page 11: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

n-gramLanguageModels:Example

Supposewearelearninga4-gram LanguageModel.

astheproctorstartedtheclock,thestudentsopenedtheir_____discard

conditiononthis

Inthecorpus:• “studentsopenedtheir”occurred1000times• “studentsopenedtheirbooks”occurred400 times

• à P(books |studentsopenedtheir)=0.4• “studentsopenedtheirexams”occurred100 times

• à P(exams |studentsopenedtheir)=0.1

Shouldwehavediscardedthe“proctor”context?

2/1/1811

Page 12: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Problems withn-gramLanguageModels

Note: Increasingnmakessparsityproblemsworse.Typicallywecan’thaven biggerthan5.

Problem:Whatif“studentsopenedtheir” neveroccurredindata?Thenwecan’tcalculateprobabilityforany !

SparsityProblem2

Problem:Whatif“studentsopenedtheir” neveroccurredindata?Thenhasprobability0!

SparsityProblem1

(Partial)Solution: Addsmall𝛿tocountforevery.Thisiscalledsmoothing.

(Partial)Solution: Justconditionon“openedtheir” instead.Thisiscalledbackoff.

2/1/1812

Page 13: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Problems withn-gramLanguageModels

2/1/1813

Storage:Needtostorecountforallpossiblen-grams.SomodelsizeisO(exp(n)).

Increasingnmakesmodelsizehuge!

Page 14: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

n-gramLanguageModelsinpractice

• YoucanbuildasimpletrigramLanguageModelovera1.7millionwordcorpus(Reuters)inafewsecondsonyourlaptop*

todaythe _______

*Tryforyourself: https://nlpforhackers.io/language-models/

Otherwise,seemsreasonable!

company 0.153bank 0.153price 0.077italian 0.039emirate 0.039

getprobabilitydistribution

Sparsityproblem:notmuchgranularityintheprobability

distribution

Businessandfinancialnews

2/1/1814

Page 15: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Generatingtextwithan-gramLanguageModel

• YoucanalsouseaLanguageModeltogeneratetext.

todaythe_______

conditiononthis

company 0.153bank 0.153price 0.077italian 0.039emirate 0.039

getprobabilitydistribution

sample

2/1/1815

Page 16: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Generatingtextwithan-gramLanguageModel

• YoucanalsouseaLanguageModeltogeneratetext.

todaytheprice_______

conditiononthis

of 0.308for 0.050it 0.046to 0.046is 0.031

getprobabilitydistribution

sample

2/1/1816

Page 17: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Generatingtextwithan-gramLanguageModel

• YoucanalsouseaLanguageModeltogeneratetext.

todaythepriceof_______

conditiononthis

the 0.07218 0.043oil 0.043its 0.036gold 0.018

getprobabilitydistribution

sample

2/1/1817

Page 18: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Generatingtextwithan-gramLanguageModel

• YoucanalsouseaLanguageModeltogeneratetext.

todaythepriceofgold_______

2/1/1818

Page 19: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Generatingtextwithan-gramLanguageModel

• YoucanalsouseaLanguageModeltogeneratetext.

todaythepriceofgoldperton,whileproductionofshoelastsandshoeindustry,thebankintervenedjustafteritconsideredandrejectedanimf demandtorebuilddepletedeuropean stocks,sept30endprimary76cts ashare.

Incoherent!Weneedtoconsidermorethan3wordsatatimeifwewantto

generategoodtext.

Butincreasingn worsenssparsityproblem,andexponentiallyincreasesmodelsize…

2/1/1819

Page 20: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Howtobuildaneural LanguageModel?

• RecalltheLanguageModelingtask:• Input:sequenceofwords• Output:prob dist ofthenextword

• Howaboutawindow-basedneuralmodel?• WesawthisappliedtoNamedEntityRecognitioninLecture4

2/1/1820

Page 21: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Afixed-windowneuralLanguageModel

the students opened theiras the proctor started the clock ______

discard fixedwindow2/1/1821

Page 22: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Afixed-windowneuralLanguageModel

the students opened their

bookslaptops

concatenatedwordembeddings

words/one-hotvectors

hiddenlayer

a zoo

outputdistribution

2/1/1822

Page 23: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Afixed-windowneuralLanguageModel

the students opened their

bookslaptops

a zoo

Improvements overn-gramLM:• Nosparsityproblem• ModelsizeisO(n)notO(exp(n))

Remainingproblems:• Fixedwindowistoosmall• Enlargingwindowenlarges• Windowcanneverbelarge

enough!• Eachusesdifferentrows

of.Wedon’tshareweightsacrossthewindow.

Weneedaneuralarchitecturethatcan

processanylengthinput

2/1/1823

Page 24: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

RecurrentNeuralNetworks(RNN)

hiddenstates

inputsequence(anylength)

Coreidea: ApplythesameweightsrepeatedlyAfamilyofneuralarchitectures

2/1/1824

outputs(optional)

Page 25: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

ARNNLanguageModel

the students opened theirwords/one-hotvectors

bookslaptops

wordembeddings

a zoo

outputdistribution

Note:thisinputsequencecouldbemuchlonger,butthisslidedoesn’thavespace!

hiddenstates

istheinitialhiddenstate

2/1/1825

Page 26: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

ARNNLanguageModel

the students opened their

bookslaptops

a zoo

RNN Advantages:• Canprocessanylength

input• Modelsizedoesn’t

increase forlongerinput• Computationforstept

can(intheory)useinformationfrom manystepsback

• Weightsaresharedacrosstimestepsàrepresentationsareshared

RNNDisadvantages:• Recurrentcomputation

isslow• Inpractice,difficultto

accessinformationfrommanystepsback

Moreonthesenextweek

2/1/1826

Page 27: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

TrainingaRNNLanguageModel

• Getabigcorpusoftextwhichisasequenceofwords• FeedintoRNN-LM;computeoutputdistributionforeverystept.

• i.e.predictprobabilitydist ofeveryword,givenwordssofar

• Lossfunctiononsteptisusualcross-entropybetweenourpredictedprobabilitydistribution,andthetruenextword:

• Averagethistogetoveralllossforentiretrainingset:

2/1/1827

Page 28: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

TrainingaRNNLanguageModel=negativelogprob

of“students”

the students opened their …examsCorpus

Loss

2/1/1828

Page 29: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

TrainingaRNNLanguageModel=negativelogprob

of“opened”

Corpus the students opened their …exams

Loss

2/1/1829

Page 30: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

TrainingaRNNLanguageModel=negativelogprob

of“their”

Corpus the students opened their …exams

Loss

2/1/1830

Page 31: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

TrainingaRNNLanguageModel=negativelogprob

of“exams”

Corpus the students opened their …exams

Loss

2/1/1831

Page 32: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

TrainingaRNNLanguageModel

++++…=

Corpus the students opened their …exams

Loss

2/1/1832

Page 33: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

TrainingaRNNLanguageModel

• However:Computinglossandgradientsacrossentirecorpus istooexpensive!

• Recall: StochasticGradientDescentallowsustocomputelossandgradientsforsmallchunkofdata,andupdate.

• à Inpractice,considerasasentence

• Computelossforasentence(actuallyusuallyabatchofsentences),computegradientsandupdateweights.Repeat.

2/1/1833

Page 34: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

BackpropagationforRNNs

……

Question:What’sthederivativeofw.r.t.therepeatedweightmatrix?

Answer:“Thegradientw.r.t.arepeatedweight

isthesumofthegradientw.r.t.eachtimeitappears”

2/1/1834

Why?

Page 35: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

MultivariableChainRule

2/1/1835

Source:https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/differentiating-vector-valued-functions/a/multivariable-chain-rule-simple-version

Page 36: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

BackpropagationforRNNs:Proofsketch

2/1/1836

Source:https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/differentiating-vector-valued-functions/a/multivariable-chain-rule-simple-version

Inourexample: Applythemultivariablechainrule:=1

Page 37: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

BackpropagationforRNNs

……

Question: Howdowecalculatethis?

Answer: Backpropagateovertimestepsi=t,…,0,summinggradientsasyougo.Thisalgorithmiscalled“backpropagationthroughtime”

2/1/1837

Page 38: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

GeneratingtextwithaRNNLanguageModelJustlikean-gramLanguageModel,youcanuseaRNNLanguageModeltogeneratetextbyrepeatedsampling.Sampledoutputisnextstep’sinput.

my favorite season is

sample

favorite

sample

season

sample

is

sample

spring

spring 2/1/1838

Page 39: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

GeneratingtextwithaRNNLanguageModel

• Let’shavesomefun!• YoucantrainaRNN-LMonanykindoftext,thengeneratetext

inthatstyle.• RNN-LMtrainedonObamaspeeches:

Source: https://medium.com/@samim/obama-rnn-machine-generated-political-speeches-c8abd18a2ea0

2/1/1839

Page 40: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

GeneratingtextwithaRNNLanguageModel

• Let’shavesomefun!• YoucantrainaRNN-LMonanykindoftext,thengeneratetext

inthatstyle.• RNN-LMtrainedonHarryPotter:

Source: https://medium.com/deep-writing/harry-potter-written-by-artificial-intelligence-8a9431803da6

2/1/1840

Page 41: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

GeneratingtextwithaRNNLanguageModel

• Let’shavesomefun!• YoucantrainaRNN-LMonanykindoftext,thengeneratetext

inthatstyle.• RNN-LMtrainedonSeinfeld scripts:

Source: https://www.avclub.com/a-bunch-of-comedy-writers-teamed-up-with-a-computer-to-1818633242

2/1/1841

Page 42: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

GeneratingtextwithaRNNLanguageModel

• Let’shavesomefun!• YoucantrainaRNN-LMonanykindoftext,thengeneratetext

inthatstyle.• (character-level)RNN-LMtrainedonpaintcolors:

Source: http://aiweirdness.com/post/160776374467/new-paint-colors-invented-by-neural-network

2/1/1842

Page 43: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

EvaluatingLanguageModels

• ThetraditionalevaluationmetricforLanguageModelsisperplexity.

• Lower isbetter!• InAssignment2youwillshowthatminimizingperplexity and

minimizingthelossfunctionareequivalent.

Inverseprobabilityofdataset

Normalizedbynumberofwords

2/1/1843

Page 44: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

RNNshavegreatlyimprovedperplexity

n-grammodel

IncreasinglycomplexRNNs

Perplexityimproves(lowerisbetter)

Source: https://research.fb.com/building-an-efficient-neural-language-model-over-a-billion-words/

2/1/1844

Page 45: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

WhyshouldwecareaboutLanguageModeling?

• LanguageModelingisasubcomponent ofotherNLPsystems:• Speechrecognition

• UseaLMtogeneratetranscription,conditioned onaudio

• MachineTranslation• UseaLMtogeneratetranslation,conditioned onoriginaltext

• Summarization• UseaLMtogeneratesummary,conditioned onoriginaltext

• LanguageModelingisabenchmarktask thathelpsusmeasureourprogress onunderstandinglanguage

ThesesystemsarecalledconditionalLanguageModels

2/1/1845

Page 46: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Recap

• LanguageModel:Asystemthatpredictsthenextword

• RecurrentNeuralNetwork:Afamilyofneuralnetworksthat:• Takesequentialinputofanylength• Applythesameweightsoneachstep• Canoptionallyproduceoutputoneachstep

• RecurrentNeuralNetwork≠ LanguageModel

• We’veshownthatRNNsareagreatwaytobuildaLM.

• ButRNNsareusefulformuchmore!2/1/1846

Page 47: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

RNNscanbeusedfortagginge.g.part-of-speechtagging,namedentityrecognition

knocked over the vasethe startled cat

VBN IN DT NNDT VBN NN

2/1/1847

Page 48: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

RNNscanbeusedforsentenceclassification

the movie a lotoverall I enjoyed

positive

Sentenceencoding

Howtocomputesentenceencoding?

e.g.sentimentclassification

2/1/1848

Page 49: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

RNNscanbeusedforsentenceclassification

the movie a lotoverall I enjoyed

positive

Sentenceencoding

Howtocomputesentenceencoding?

Basicway:Usefinalhiddenstate

e.g.sentimentclassification

2/1/1849

Page 50: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

RNNscanbeusedforsentenceclassification

the movie a lotoverall I enjoyed

positive

Sentenceencoding

Howtocomputesentenceencoding?

Usuallybetter:Takeelement-wisemaxormeanofallhiddenstates

e.g.sentimentclassification

2/1/1850

Page 51: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

RNNscanbeusedtogeneratetexte.g.speechrecognition,machinetranslation,summarization

what’s the

weatherthewhat’s

<START>

Remember:thesearecalled“conditionallanguagemodels”.We’llseeMachineTranslationinmuchmoredetaillater.

2/1/1851

Page 52: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

RNNscanbeusedasanencodermodulee.g.questionanswering,machinetranslation

Questionencoding=element-wisemaxofhiddenstates Context: Ludwigvan

BeethovenwasaGermancomposerandpianist.Acrucialfigure…

Beethoven ?what nationality wasQuestion:

HeretheRNNactsasanencoderfortheQuestion.Theencoderispartofalargerneuralsystem.

Answer: German

2/1/1852

Page 53: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Anoteonterminology

Bytheendofthecourse: Youwillunderstandphraseslike“stackedbidirectionalLSTMwithresidualconnectionsandself-attention”

RNNdescribedinthislecture=“vanillaRNN”

Nextlecture: YouwilllearnaboutotherRNNflavors

likeGRU andLSTM

2/1/1853

andmulti-layerRNNs

Page 54: cs224n-2018-lecture8-rnn...2018/02/01  · • Midterm logistics: Fill out form on Piazza if you can’t do main midterm, have special requirements, or other special case 2 2/1/18

Nexttime

• Problems withRNNs!• Vanishinggradients

• FancyRNN variants!• LSTM• GRU• multi-layer• bidirectional

motivates

2/1/1854