![Page 1: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/1.jpg)
Fei-Fei Li & Andrej Karpathy & Justin Johnson, Lecture 10 - 8 Feb 2016
Lecture 10:
Recurrent Neural Networks
![Page 2: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/2.jpg)
Administrative
- Midterm this Wednesday! woohoo!
- A3 will be out ~Wednesday
![Page 3: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/3.jpg)
![Page 4: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/4.jpg)
http://mtyka.github.io/deepdream/2016/02/05/bilateral-class-vis.html
![Page 5: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/5.jpg)
http://mtyka.github.io/deepdream/2016/02/05/bilateral-class-vis.html
![Page 6: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/6.jpg)
Recurrent Networks offer a lot of flexibility:
Vanilla Neural Networks
![Page 7: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/7.jpg)
Recurrent Networks offer a lot of flexibility:
e.g. Image Captioning: image -> sequence of words
![Page 8: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/8.jpg)
Recurrent Networks offer a lot of flexibility:
e.g. Sentiment Classification: sequence of words -> sentiment
![Page 9: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/9.jpg)
Recurrent Networks offer a lot of flexibility:
e.g. Machine Translation: sequence of words -> sequence of words
![Page 10: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/10.jpg)
Recurrent Networks offer a lot of flexibility:
e.g. Video classification on frame level
![Page 11: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/11.jpg)
Multiple Object Recognition with Visual Attention, Ba et al.
Sequential processing of fixed inputs
![Page 12: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/12.jpg)
DRAW: A Recurrent Neural Network For Image Generation, Gregor et al.
Sequential processing of fixed outputs
![Page 13: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/13.jpg)
Recurrent Neural Network
[Diagram: input x -> RNN]
![Page 14: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/14.jpg)
Recurrent Neural Network
[Diagram: input x -> RNN -> output y]
We usually want to predict a vector at some time steps.
![Page 15: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/15.jpg)
Recurrent Neural Network
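[Diagram: input x -> RNN -> output y]
We can process a sequence of vectors x by applying a recurrence formula at every time step:

$$h_t = f_W(h_{t-1}, x_t)$$

where $h_t$ is the new state, $h_{t-1}$ is the old state, $x_t$ is the input vector at some time step, and $f_W$ is some function with parameters W.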
![Page 16: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/16.jpg)
Recurrent Neural Network
[Diagram: input x -> RNN -> output y]
We can process a sequence of vectors x by applying a recurrence formula at every time step: $h_t = f_W(h_{t-1}, x_t)$.
Notice: the same function and the same set of parameters are used at every time step.
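To make the parameter sharing concrete, here is a minimal sketch in plain Python. The names `run_recurrence` and `add_step` are hypothetical, and the toy step function (a running elementwise sum) is only a stand-in for a learned function; the point is that one `step` is reused at every time step.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def run_recurrence(step: Callable[[Vector, Vector], Vector],
                   h0: Vector,
                   xs: Sequence[Vector]) -> List[Vector]:
    """Apply h_t = step(h_{t-1}, x_t) to each input in order."""
    h = h0
    states = []
    for x in xs:
        # The same function (and hence the same parameters) at every step.
        h = step(h, x)
        states.append(h)
    return states


def add_step(h: Vector, x: Vector) -> Vector:
    """Toy step function (an assumption for illustration): elementwise sum."""
    return [hi + xi for hi, xi in zip(h, x)]


states = run_recurrence(add_step, [0.0, 0.0],
                        [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(states)
```

Because `step` is passed in once and never changes inside the loop, the sequence can have any length without adding parameters.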
![Page 17: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/17.jpg)
(Vanilla) Recurrent Neural Network
[Diagram: input x -> RNN -> output y]
The state consists of a single “hidden” vector h:
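The vanilla update can be sketched in plain Python. This is a minimal sketch, not the lecture's code: it assumes the update h = tanh(Whh * h + Wxh * x) that appears later in these slides, plus a linear readout y = Why * h. The readout matrix `Why` and the helper names `matvec` and `rnn_step` are assumptions made for illustration, and biases are omitted.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for row-major nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_step(Whh: Matrix, Wxh: Matrix, Why: Matrix,
             h_prev: Vector, x: Vector) -> Tuple[Vector, Vector]:
    """One vanilla RNN step: new hidden vector h and output y."""
    pre = [a + b for a, b in zip(matvec(Whh, h_prev), matvec(Wxh, x))]
    h = [math.tanh(p) for p in pre]  # squash into (-1, 1)
    y = matvec(Why, h)               # linear readout (assumed form)
    return h, y


# Hidden size 2, input size 3, output size 1 (toy values).
Whh = [[0.0, 0.0], [0.0, 0.0]]
Wxh = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
Why = [[1.0, 1.0]]
h, y = rnn_step(Whh, Wxh, Why, [0.0, 0.0], [1.0, 0.0, 0.0])
print(h, y)
```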
![Page 18: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/18.jpg)
Character-level language model example
Vocabulary: [h,e,l,o]
Example training sequence: “hello”
[Diagram: input x -> RNN -> output y]
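For the "hello" example, the data preparation can be sketched as follows. This is an illustrative sketch under the usual character-level setup, where each input character is one-hot encoded and the target at each step is the next character; the names `char_to_ix` and `one_hot` are hypothetical.

```python
vocab = ["h", "e", "l", "o"]
char_to_ix = {ch: i for i, ch in enumerate(vocab)}


def one_hot(ch):
    """Return a one-hot list over the 4-character vocabulary."""
    vec = [0] * len(vocab)
    vec[char_to_ix[ch]] = 1
    return vec


text = "hello"
# Inputs are "h", "e", "l", "l"; targets are the following characters.
inputs = [one_hot(ch) for ch in text[:-1]]
targets = [char_to_ix[ch] for ch in text[1:]]

for x, t in zip(inputs, targets):
    print(x, "->", vocab[t])
```

Note that the input "l" appears twice with different targets ("l", then "o"), so the model has to rely on its hidden state rather than on the current character alone.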
![Page 19: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/19.jpg)
![Page 20: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/20.jpg)
![Page 21: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/21.jpg)
![Page 22: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/22.jpg)
min-char-rnn.py gist: 112 lines of Python
(https://gist.github.com/karpathy/d4dee566867f8291f086)
![Page 23: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/23.jpg)
min-char-rnn.py gist: Data I/O
![Page 24: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/24.jpg)
min-char-rnn.py gist: Initializations
recall:
![Page 25: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/25.jpg)
min-char-rnn.py gist: Main loop
![Page 26: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/26.jpg)
![Page 27: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/27.jpg)
![Page 28: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/28.jpg)
![Page 29: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/29.jpg)
![Page 30: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/30.jpg)
min-char-rnn.py gist: Loss function
- forward pass (compute loss)
- backward pass (compute param gradient)
![Page 31: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/31.jpg)
min-char-rnn.py gist
Softmax classifier
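As a reminder of what a softmax classifier computes at each time step, here is a small sketch. It is not taken from the gist: the function names `softmax` and `cross_entropy` are assumptions, and the max-subtraction is a standard numerical-stability trick rather than something shown on the slide.

```python
import math


def softmax(scores):
    """Turn unnormalized scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, target_ix):
    """Negative log probability assigned to the correct class."""
    return -math.log(softmax(scores)[target_ix])


# With equal scores over 4 characters the loss is log(4).
probs = softmax([0.0, 0.0, 0.0, 0.0])
loss = cross_entropy([0.0, 0.0, 0.0, 0.0], 1)
print(probs, loss)
```

In a character-level model the per-step losses are summed over the sequence, one term per predicted next character.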
![Page 32: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/32.jpg)
min-char-rnn.py gist
recall:
![Page 33: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/33.jpg)
min-char-rnn.py gist
![Page 34: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/34.jpg)
[Diagram: input x -> RNN -> output y]
![Page 35: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/35.jpg)
![Page 36: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/36.jpg)
[Figure: model output at first, then after each of three rounds of "train more"]
![Page 37: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/37.jpg)
![Page 38: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/38.jpg)
open source textbook on algebraic geometry
LaTeX source
![Page 39: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/39.jpg)
![Page 40: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/40.jpg)
![Page 41: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/41.jpg)
![Page 42: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/42.jpg)
Generated C code
![Page 43: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/43.jpg)
![Page 44: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/44.jpg)
![Page 45: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/45.jpg)
Searching for interpretable cells
[Visualizing and Understanding Recurrent Networks, Andrej Karpathy*, Justin Johnson*, Li Fei-Fei]
![Page 46: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/46.jpg)
Searching for interpretable cells: quote detection cell
![Page 47: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/47.jpg)
Searching for interpretable cells: line length tracking cell
![Page 48: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/48.jpg)
Searching for interpretable cells: if statement cell
![Page 49: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/49.jpg)
Searching for interpretable cells: quote/comment cell
![Page 50: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/50.jpg)
Searching for interpretable cells: code depth cell
![Page 51: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/51.jpg)
Image Captioning
- Explain Images with Multimodal Recurrent Neural Networks, Mao et al.
- Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-Fei
- Show and Tell: A Neural Image Caption Generator, Vinyals et al.
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.
- Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick
![Page 52: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/52.jpg)
Convolutional Neural Network
Recurrent Neural Network
![Page 53: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/53.jpg)
test image
![Page 54: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/54.jpg)
test image
![Page 55: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/55.jpg)
test image
X
![Page 56: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/56.jpg)
test image
[Diagram: the first RNN input x0 is the <START> token]
![Page 57: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/57.jpg)
[Diagram: test image -> v -> (Wih) -> h0; x0 = <START> -> h0 -> y0]
before: h = tanh(Wxh * x + Whh * h)
now: h = tanh(Wxh * x + Whh * h + Wih * v)
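The "before" and "now" updates differ only by the extra image term. A minimal sketch in plain Python, assuming v is a fixed feature vector for the test image and using hypothetical helper names (`matvec`, `rnn_step_with_image`) and toy weights:

```python
import math


def matvec(W, v):
    """Matrix-vector product for row-major nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_step_with_image(Wxh, Whh, Wih, x, h, v):
    """h = tanh(Wxh * x + Whh * h + Wih * v), with v the image vector."""
    a = matvec(Wxh, x)
    b = matvec(Whh, h)
    c = matvec(Wih, v)  # the additional image term
    return [math.tanh(p + q + r) for p, q, r in zip(a, b, c)]


# Toy sizes: hidden 2, input 1, image vector 2.
Wxh = [[1.0], [0.0]]
Whh = [[0.0, 0.0], [0.0, 0.0]]
Wih = [[0.0, 0.0], [0.5, 0.0]]
h_new = rnn_step_with_image(Wxh, Whh, Wih, [0.0], [0.0, 0.0], [2.0, 1.0])
print(h_new)
```

Setting `Wih` to all zeros recovers the "before" update, which is one way to check that the two formulas agree apart from the image term.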
![Page 58: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/58.jpg)
[Diagram: test image; x0 = <START> -> h0 -> y0; sample "straw" from y0]
![Page 59: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/59.jpg)
[Diagram: test image; x0 = <START> -> h0 -> y0 -> "straw" -> h1 -> y1]
![Page 60: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/60.jpg)
[Diagram: test image; x0 = <START> -> h0 -> y0 -> "straw" -> h1 -> y1; sample "hat" from y1]
![Page 61: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/61.jpg)
[Diagram: test image; x0 = <START> -> h0 -> y0 -> "straw" -> h1 -> y1 -> "hat" -> h2 -> y2]
![Page 62: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/62.jpg)
[Diagram: test image; x0 = <START> -> h0 -> y0 -> "straw" -> h1 -> y1 -> "hat" -> h2 -> y2]
Sample <END> token => finish.
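The sampling procedure in these slides (start from <START>, sample a word, feed it back, stop at <END>) can be sketched as a loop. This is a simplified sketch: `next_word_probs` is a hypothetical stand-in for the trained network, and the toy table below conditions only on the previous word, whereas the real RNN also carries its hidden state and the image.

```python
import random


def sample_caption(next_word_probs, max_len=20, seed=0):
    """Sample words one at a time until <END> or max_len is reached."""
    rng = random.Random(seed)
    words = []
    prev = "<START>"
    for _ in range(max_len):
        probs = next_word_probs(prev)  # dict: word -> probability
        choices = list(probs.keys())
        weights = list(probs.values())
        word = rng.choices(choices, weights=weights, k=1)[0]
        if word == "<END>":
            break                      # <END> token => finish
        words.append(word)
        prev = word                    # feed the sample back in as input
    return words


# Toy stand-in for the network (an assumption, deterministic on purpose).
TOY = {
    "<START>": {"straw": 1.0},
    "straw": {"hat": 1.0},
    "hat": {"<END>": 1.0},
}
caption = sample_caption(lambda w: TOY[w])
print(caption)
```

The `max_len` cap is a practical safeguard in case <END> is never sampled.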
![Page 63: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/63.jpg)
Image Sentence Datasets
Microsoft COCO [Tsung-Yi Lin et al. 2014], mscoco.org
currently: ~120K images, ~5 sentences each
![Page 64: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/64.jpg)
![Page 65: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/65.jpg)
![Page 66: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/66.jpg)
Show Attend and Tell, Xu et al., 2015
Preview of fancier architectures
RNN attends spatially to different parts of images while generating each word of the sentence:
![Page 67: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/67.jpg)
[Diagram axes: time, depth]
RNN:
![Page 68: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/68.jpg)
[Diagram axes: time, depth]
RNN:
LSTM:
![Page 69: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/69.jpg)
LSTM
![Page 70: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/70.jpg)
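Long Short Term Memory (LSTM) [Hochreiter et al., 1997]

$$
\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix}
=
\begin{pmatrix} \mathrm{sigmoid} \\ \mathrm{sigmoid} \\ \mathrm{sigmoid} \\ \tanh \end{pmatrix}
W \begin{pmatrix} x \\ h \end{pmatrix}
$$

where x is the vector from below, h is the vector from before, W is 4n x 2n, and the product has 4n entries (4*n), split into the n-dimensional vectors i, f, o, g.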
![Page 71: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/71.jpg)
Long Short Term Memory (LSTM) [Hochreiter et al., 1997]
[Figure: the cell state c is multiplied elementwise (x) by the forget gate f.]
![Page 72: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/72.jpg)
Long Short Term Memory (LSTM) [Hochreiter et al., 1997]
[Figure: the cell state c is multiplied elementwise (x) by f, and the elementwise product of i and g is added (+) to it.]
![Page 73: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/73.jpg)
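Long Short Term Memory (LSTM) [Hochreiter et al., 1997]
[Figure: the cell state c is multiplied elementwise (x) by f, and the elementwise product of i and g is added (+) to give the new c. The new c passes through a tanh and is multiplied elementwise (x) by o to give the hidden state h.]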
![Page 74: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/74.jpg)
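Long Short Term Memory (LSTM) [Hochreiter et al., 1997]
[Figure: the cell state c is multiplied elementwise (x) by f, and the elementwise product of i and g is added (+) to give the new c. The new c passes through a tanh and is multiplied elementwise (x) by o to give the hidden state h, which also goes to the higher layer, or to the prediction.]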
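The data flow in the diagram (multiply c by f, add i times g, then gate tanh(c) with o) can be sketched as follows. The function name `lstm_state_update` and the example values are hypothetical; the gate vectors are assumed to have been computed already.

```python
import numpy as np

def lstm_state_update(c_prev, i, f, o, g):
    """Apply one LSTM cell update given the four gate vectors.

    All arguments are arrays of the same shape.
    Returns the new cell state c and the new hidden state h.
    """
    c = f * c_prev + i * g   # forget part of the old state, add new content
    h = o * np.tanh(c)       # expose a gated, squashed view of the cell state
    return c, h

# Hand-picked gate values to make the behaviour easy to read.
c_prev = np.array([1.0, -2.0])
i = np.array([1.0, 0.0])   # unit 0 writes, unit 1 does not
f = np.array([0.0, 1.0])   # unit 0 forgets, unit 1 remembers
o = np.array([1.0, 1.0])
g = np.array([0.5, 0.9])
c, h = lstm_state_update(c_prev, i, f, o, g)
```

In this example the first unit discards its old value and stores g, while the second unit carries its old value forward unchanged. The hidden state h is what would be passed to the higher layer or used for prediction.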
![Page 75: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/75.jpg)
LSTM
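[Figure: two LSTM cells chained over two timesteps. The cell state c runs through both cells; in each cell it is multiplied elementwise (x) by f, the elementwise product of i and g is added (+), and the result passes through a tanh and is multiplied by o to give h. Each cell takes the previous h and the current x as input.]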
![Page 76: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/76.jpg)
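[Figure: state flow over three timesteps. RNN: the state is transformed by f at every step. LSTM (ignoring forget gates): the output of f is added (+) to the state at every step.]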
![Page 77: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/77.jpg)
Recall: “PlainNets” vs. ResNets
ResNet is to PlainNet what LSTM is to RNN, kind of.
![Page 78: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/78.jpg)
Understanding gradient flow dynamics
Cute backprop signal video: http://imgur.com/gallery/vaNahKE
![Page 79: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/79.jpg)
Understanding gradient flow dynamics
if the largest eigenvalue is > 1, gradient will explode
if the largest eigenvalue is < 1, gradient will vanish
[On the difficulty of training Recurrent Neural Networks, Pascanu et al., 2013]
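The eigenvalue condition can be illustrated with a small numerical sketch. It assumes a purely linear recurrence (the nonlinearity is ignored), so the backward pass just multiplies the gradient by the transposed recurrent matrix at every step. The name `backprop_norms` and the construction of `Whh` as a scaled orthogonal matrix (so that every eigenvalue has magnitude `scale`) are choices made for this example only.

```python
import numpy as np

def backprop_norms(scale, steps=50, n=8, seed=0):
    """Track the gradient norm while backpropagating through `steps` timesteps.

    Whh is built as scale * Q with Q orthogonal, so all of its eigenvalues
    have magnitude exactly `scale`. The recurrence is assumed linear.
    """
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal matrix
    Whh = scale * Q
    grad = np.ones(n) / np.sqrt(n)                    # unit-norm starting gradient
    norms = []
    for _ in range(steps):
        grad = Whh.T @ grad                           # one step of backprop through time
        norms.append(np.linalg.norm(grad))
    return norms

exploding = backprop_norms(1.1)  # eigenvalue magnitude above 1
vanishing = backprop_norms(0.9)  # eigenvalue magnitude below 1
print("final norm, scale 1.1:", exploding[-1])
print("final norm, scale 0.9:", vanishing[-1])
```

With this construction the gradient norm after k steps is `scale ** k`, so it grows or shrinks geometrically with the number of timesteps.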
![Page 80: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/80.jpg)
Understanding gradient flow dynamics
can control exploding with gradient clipping
can control vanishing with LSTM
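Gradient clipping can be sketched as below. This shows one common form, rescaling by the global norm, and is an assumption of this example; the slide does not say which form is meant. The function name `clip_by_global_norm` and the threshold of 5.0 are hypothetical.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    The direction of the overall gradient is preserved.
    """
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

# Two parameter gradients whose joint norm is sqrt(9 + 16 + 144) = 13.
grads = [np.array([3.0, 4.0]), np.array([[12.0]])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=5.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
```

Clipping only bounds the size of an update, so it addresses exploding gradients and does nothing for vanishing ones.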
![Page 81: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/81.jpg)
LSTM variants and friends
[LSTM: A Search Space Odyssey, Greff et al., 2015]
[An Empirical Exploration of Recurrent Network Architectures, Jozefowicz et al., 2015]
GRU [Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Cho et al., 2014]
![Page 82: Recurrent Neural Networks - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2016/... · Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 10 - 38 8 Feb 2016](https://reader033.vdocuments.net/reader033/viewer/2022042806/5f6986328c82d44053490c70/html5/thumbnails/82.jpg)
Summary
- RNNs allow a lot of flexibility in architecture design
- Vanilla RNNs are simple but don’t work very well
- Common to use LSTM or GRU: their additive interactions improve gradient flow
- Backward flow of gradients in RNN can explode or vanish. Exploding is controlled with gradient clipping. Vanishing is controlled with additive interactions (LSTM)
- Better/simpler architectures are a hot topic of current research
- Better understanding (both theoretical and empirical) is needed.