Lecture 10 Recap
I2DL: Prof. Niessner, Prof. Leal-Taixé
LeNet
• Digit recognition: 10 classes
• Conv -> Pool -> Conv -> Pool -> Conv -> FC
• As we go deeper: Width and Height decrease, the Number of Filters increases
• 60k parameters
AlexNet
• Softmax for 1000 classes
[Krizhevsky et al., NIPS'12] AlexNet
VGGNet
• Striving for simplicity
– Conv -> Pool -> Conv -> Pool -> Conv -> FC
– Conv = 3x3, s = 1, same padding; Maxpool = 2x2, s = 2
• As we go deeper: Width and Height decrease, the Number of Filters increases
• Called VGG-16: 16 layers that have weights
• Large (138M parameters), but its simplicity makes it appealing
[Simonyan et al., ICLR'15] VGGNet
Residual Block
• Two layers
(Figure: input $x_{l-1}$ passes through two linear layers; a skip connection adds $x_{l-1}$ to the output of the second layer.)
With skip connection: $x_{l+1} = f(W_{l+1} x_l + b_{l+1} + x_{l-1})$
Without: $x_{l+1} = f(W_{l+1} x_l + b_{l+1})$
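As a minimal NumPy sketch (all names, dimensions, and the choice of ReLU here are illustrative, not from the lecture), the difference between the two formulas is just the added skip term:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def plain_layer(x_l, W, b):
    # x_{l+1} = f(W_{l+1} x_l + b_{l+1})
    return relu(W @ x_l + b)

def residual_layer(x_l, x_skip, W, b):
    # x_{l+1} = f(W_{l+1} x_l + b_{l+1} + x_{l-1})
    # The skip connection adds an earlier activation before the non-linearity.
    return relu(W @ x_l + b + x_skip)

rng = np.random.default_rng(0)
d = 4
x_prev = rng.standard_normal(d)                      # x_{l-1}
W1, b1 = rng.standard_normal((d, d)), np.zeros(d)
W2, b2 = rng.standard_normal((d, d)), np.zeros(d)

x_l = plain_layer(x_prev, W1, b1)                    # first layer of the block
x_next = residual_layer(x_l, x_prev, W2, b2)         # second layer + skip
print(x_next.shape)
```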
Inception Layer
[Szegedy et al., CVPR’15] GoogleNet
Lecture 11
Transfer Learning
Transfer Learning
• Training your own model can be difficult with limited data and other resources
• e.g., it is a laborious task to manually annotate your own training dataset
→ Why not reuse already pre-trained models?
Transfer Learning
(Figure: a model trained on a large dataset with distribution P1 is transferred to a small dataset with distribution P2.)
Use what has been learned for another setting
[Zeiler et al., ECCV'14] Visualizing and Understanding Convolutional Networks
Transfer Learning for Images
Transfer Learning
Trained on ImageNet
Feature extraction
[Donahue et al., ICML’14] DeCAF, [Razavian et al., CVPRW’14] CNN Features off-the-shelf
Transfer Learning
Trained on ImageNet:
• Edges
• Simple geometrical shapes (circles, etc.)
• Parts of an object (wheel, window)
• Decision layers
Transfer Learning
Trained on ImageNet, applied to a new dataset with C classes:
FROZEN: the pre-trained layers; TRAIN: the new classification layer.
Transfer Learning
If the dataset is big enough, train more layers with a low learning rate: keep the earliest layers FROZEN and TRAIN the later ones.
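The FROZEN/TRAIN split can be sketched in plain NumPy. This is a toy stand-in, not the lecture's setup: a fixed random projection plays the role of the pre-trained, frozen feature extractor, and only a new linear softmax head is trained on a small synthetic dataset; all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# FROZEN: stands in for the pre-trained layers; never updated below.
W_frozen = rng.standard_normal((8, 16))

def extract_features(x):
    return np.tanh(x @ W_frozen)

# Toy "new dataset" with C = 2 classes.
X = rng.standard_normal((200, 8))
y = (X[:, 0] > 0).astype(int)

# TRAIN: only the new classification head is learned.
W_head = np.zeros((16, 2))
for _ in range(300):
    F = extract_features(X)                       # frozen features, no grad step
    logits = F @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)             # softmax
    grad = F.T @ (p - np.eye(2)[y]) / len(X)      # gradient w.r.t. the head only
    W_head -= 0.5 * grad

accuracy = ((extract_features(X) @ W_head).argmax(axis=1) == y).mean()
print(accuracy)
```

In a real framework the same idea amounts to disabling gradient updates for the pre-trained layers and optimizing only the newly added head.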
When Transfer Learning makes Sense
• When tasks T1 and T2 have the same input (e.g. an RGB image)
• When you have more data for task T1 than for task T2
• When the low-level features for T1 could be useful to learn T2
Now you are:
• Ready to perform image classification on any dataset
• Ready to design your own architecture
• Ready to deal with other problems such as semantic segmentation (Fully Convolutional Network)
Recurrent Neural Networks
Processing Sequences
• Recurrent neural networks process sequence data
• Input/output can be sequences
RNNs are Flexible
Classic Neural Networks for Image Classification
Source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
RNNs are Flexible
Image captioning
RNNs are Flexible
Language recognition
RNNs are Flexible
Machine translation
RNNs are Flexible
Event classification
Basic Structure of an RNN
• Multi-layer RNN
(Figure: inputs feed into hidden states, which produce outputs.)
Basic Structure of an RNN
• Multi-layer RNN
The hidden state will have its own internal dynamics: a more expressive model!
Basic Structure of an RNN
• We want to have notion of “time” or “sequence”
$A_t = \theta_c A_{t-1} + \theta_x x_t$
(hidden state $A_t$; previous hidden state $A_{t-1}$; input $x_t$)
[Olah, https://colah.github.io '15] Understanding LSTMs
Basic Structure of an RNN
• We want to have notion of “time” or “sequence”
$A_t = \theta_c A_{t-1} + \theta_x x_t$
($\theta_c$, $\theta_x$: parameters to be learned)
Basic Structure of an RNN
• We want to have notion of “time” or “sequence”
$A_t = \theta_c A_{t-1} + \theta_x x_t$
$h_t = \theta_h A_t$ (output)
Note: non-linearities ignored for now
Basic Structure of an RNN
• We want to have notion of “time” or “sequence”
$A_t = \theta_c A_{t-1} + \theta_x x_t$
$h_t = \theta_h A_t$
Same parameters for each time step = generalization!
Basic Structure of an RNN
• Unrolling RNNs
Same function for the hidden layers
Basic Structure of an RNN
• Unrolling RNNs as feedforward nets
(Figure: the unrolled network applies the same weights $w_1, w_2, w_3, w_4$ at time steps $t$, $t+1$, $t+2$ to the inputs $x_t^1, x_t^2, \dots$)
Weights are the same!
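A short NumPy sketch of the unrolled forward pass (parameter names follow the slides' $\theta_c$, $\theta_x$; the sizes are arbitrary): the loop applies the same two matrices at every time step.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_c = 0.1 * rng.standard_normal((3, 3))   # hidden-to-hidden, shared over time
theta_x = 0.1 * rng.standard_normal((3, 2))   # input-to-hidden, shared over time

def rnn_forward(xs, A0):
    # Unrolled RNN: A_t = theta_c A_{t-1} + theta_x x_t, same weights every step.
    A, states = A0, []
    for x_t in xs:
        A = theta_c @ A + theta_x @ x_t
        states.append(A)
    return states

xs = [rng.standard_normal(2) for _ in range(4)]
states = rnn_forward(xs, np.zeros(3))
print(len(states), states[-1].shape)
```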
![Page 35: Lecture 10 Recap - GitHub Pages](https://reader031.vdocuments.net/reader031/viewer/2022012412/616beb159bce322da6074491/html5/thumbnails/35.jpg)
Backprop through an RNN
• Unrolling RNNs as feedforward nets
(Figure: the same unrolled network.)
Chain rule, all the way to $t = 0$: add the derivatives at different time steps for each weight.
Long-term Dependencies
I moved to Germany … so I speak German fluently.
Long-term Dependencies
• Simple recurrence
• Let us forget the input
$A_t = \theta_c A_{t-1} + \theta_x x_t$
$A_t = \theta^t A_0$ (writing $\theta = \theta_c$)
The same weights are multiplied over and over again.
Long-term Dependencies
• Simple recurrence
$A_t = \theta^t A_0$
What happens to small weights? Vanishing gradient.
What happens to large weights? Exploding gradient.
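In the scalar case the effect of $A_t = \theta^t A_0$ is easy to see (the numbers below are illustrative):

```python
# Repeatedly multiplying the state by the same weight theta over t = 100 steps:
A0, t = 1.0, 100
print(0.9 ** t * A0)   # a weight slightly below 1 -> the state vanishes
print(1.1 ** t * A0)   # a weight slightly above 1 -> the state explodes
```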
Long-term Dependencies
• Simple recurrence
• If $\theta$ admits an eigendecomposition
$A_t = \theta^t A_0$
$\theta = Q \Lambda Q^T$
($Q$: matrix of eigenvectors; the diagonal of $\Lambda$ holds the eigenvalues)
Long-term Dependencies
• Simple recurrence
• If $\theta$ admits an eigendecomposition $\theta = Q \Lambda Q^T$
• Orthogonal $Q$ allows us to simplify the recurrence:
$A_t = \theta^t A_0 = Q \Lambda^t Q^T A_0$
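This identity is easy to verify numerically for a symmetric $\theta$ (which always has an orthogonal eigendecomposition); the sketch below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
theta = (M + M.T) / 2                       # symmetric -> orthogonal eigenvectors
lam, Q = np.linalg.eigh(theta)              # theta = Q diag(lam) Q^T

t = 5
direct = np.linalg.matrix_power(theta, t)   # theta^t
via_eig = Q @ np.diag(lam ** t) @ Q.T       # Q Lambda^t Q^T
print(np.allclose(direct, via_eig))         # only the eigenvalues are powered
```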
Long-term Dependencies
• Simple recurrence
$A_t = Q \Lambda^t Q^T A_0$
What happens to eigenvalues with magnitude less than one? Vanishing gradient.
What happens to eigenvalues with magnitude larger than one? Exploding gradient (remedy: gradient clipping).
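Gradient clipping, the standard remedy for exploding gradients, simply rescales the gradient when its norm is too large. A minimal sketch (`max_norm` is a hyperparameter you choose):

```python
import numpy as np

def clip_gradient(grad, max_norm=1.0):
    # If the gradient norm exceeds max_norm, rescale it to exactly max_norm;
    # smaller gradients pass through unchanged.
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([3.0, 4.0])                   # norm 5 -> rescaled to norm 1
print(np.linalg.norm(clip_gradient(g)))
```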
Long-term Dependencies
• Simple recurrence
$A_t = \theta^t A_0$
Let us just make a matrix with eigenvalues = 1, allowing the cell to maintain its "state".
Vanishing Gradient
• 1. From the weights
• 2. From the activation functions ($\tanh$)
$A_t = \theta^t A_0$
Long Short Term Memory
[Hochreiter et al., Neural Computation’97] Long Short-Term Memory
Long Short-Term Memory Units
Simple RNN has tanh as non-linearity
Long Short-Term Memory Units
LSTM
Long Short-Term Memory Units
• Key ingredients
– Cell: transports the information through the unit
Long Short-Term Memory Units
• Key ingredients
– Cell: transports the information through the unit
– Gate: removes or adds information to the cell state (via a sigmoid)
LSTM: Step by Step
• Forget gate: $f_t = \mathrm{sigm}(\theta_{xf} x_t + \theta_{hf} h_{t-1} + b_f)$
Decides when to erase the cell state.
Sigmoid = output between 0 (forget) and 1 (keep)
LSTM: Step by Step
• Input gate: $i_t = \mathrm{sigm}(\theta_{xi} x_t + \theta_{hi} h_{t-1} + b_i)$
Decides which values will be updated.
The new cell candidate is the output of a $\tanh$, in $(-1, 1)$.
LSTM: Step by Step
• Element-wise operations
$C_t = f_t \odot C_{t-1} + i_t \odot g_t$
(previous state $C_{t-1}$, current state $C_t$; $\odot$ is element-wise multiplication)
LSTM: Step by Step
• Output gate: $h_t = o_t \odot \tanh(C_t)$
Decides which values will be output; the $\tanh$ output lies in $(-1, 1)$.
LSTM: Step by Step
• Forget gate: $f_t = \mathrm{sigm}(\theta_{xf} x_t + \theta_{hf} h_{t-1} + b_f)$
• Input gate: $i_t = \mathrm{sigm}(\theta_{xi} x_t + \theta_{hi} h_{t-1} + b_i)$
• Output gate: $o_t = \mathrm{sigm}(\theta_{xo} x_t + \theta_{ho} h_{t-1} + b_o)$
• Cell update: $g_t = \tanh(\theta_{xg} x_t + \theta_{hg} h_{t-1} + b_g)$
• Cell: $C_t = f_t \odot C_{t-1} + i_t \odot g_t$
• Output: $h_t = o_t \odot \tanh(C_t)$
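The six equations map directly to code. Below is a single LSTM step in NumPy; the parameter names mirror the slide notation, but the dictionary layout, sizes, and initialization are illustrative assumptions, not the lecture's implementation.

```python
import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, p):
    f = sigm(p["theta_xf"] @ x_t + p["theta_hf"] @ h_prev + p["b_f"])     # forget gate
    i = sigm(p["theta_xi"] @ x_t + p["theta_hi"] @ h_prev + p["b_i"])     # input gate
    o = sigm(p["theta_xo"] @ x_t + p["theta_ho"] @ h_prev + p["b_o"])     # output gate
    g = np.tanh(p["theta_xg"] @ x_t + p["theta_hg"] @ h_prev + p["b_g"])  # cell update
    C = f * C_prev + i * g          # cell: element-wise gating
    h = o * np.tanh(C)              # output
    return h, C

rng = np.random.default_rng(0)
n_in, n_hid = 4, 128                # the hidden size must be chosen when coding an LSTM
p = {}
for k in "fiog":
    p[f"theta_x{k}"] = 0.1 * rng.standard_normal((n_hid, n_in))
    p[f"theta_h{k}"] = 0.1 * rng.standard_normal((n_hid, n_hid))
    p[f"b_{k}"] = np.zeros(n_hid)

h, C = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), p)
print(h.shape, C.shape)
```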
LSTM: Step by Step
All the gate parameters (the $\theta$'s and biases $b$'s) are learned through backpropagation.
LSTM: Vanishing Gradients?
• Cell: $C_t = f_t \odot C_{t-1} + i_t \odot g_t$
• 1. From the weights: the forget gate can output 1 for important information
• 2. From the activation functions: the cell state flows through an identity function, not a squashing non-linearity
LSTM
• Highway for the gradient to flow
LSTM: Dimensions
• Cell update: $g_t = \tanh(\theta_{xg} x_t + \theta_{hg} h_{t-1} + b_g)$
When coding an LSTM, we have to define the size of the hidden state (e.g. 128). Dimensions need to match: what operation do I need to apply to my input to get a 128-dimensional vector representation?
General LSTM Units
• Input, states, and gates not limited to 1st-order tensors
• Gate functions can consist of FC and CNN layers
ConvLSTM for Video Sequences
• Input, hidden, and cell states are higher-order tensors (i.e. images)
• Gates have CNN instead of FC layers
RNNs in Computer Vision
• Caption generation
[Xu et al., PMLR’15] Neural Image Caption Generation
RNNs in Computer Vision
• Instance segmentation
[Romera-Paredes et al., ECCV’16] Recurrent Instance Segmentation
See you next time!