![Page 1: Neural Networks and Lecture 4: Backpropagationvision.stanford.edu/teaching/cs231n/slides/2020/lecture_4.pdf · Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - April 16, 2020 24](https://reader033.vdocuments.net/reader033/viewer/2022050107/5f454a0807015e1fd430b9f1/html5/thumbnails/1.jpg)
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - April 16, 2020
Lecture 4: Neural Networks and Backpropagation
Administrative: Assignment 1
Assignment 1 due Wednesday April 22, 11:59pm
If using Google Cloud, you don’t need GPUs for this assignment!
Administrative: Project Proposal
Project proposal due 4/27
Administrative: Discussion Section
Discussion section tomorrow:
Backpropagation
Administrative: Midterm Updates
The University has updated its guidance on administering exams in the spring quarter. To comply with the current policies, we have changed the exam format as follows, to be consistent with exams in previous offerings of CS 231n:
Date: released on Tuesday 5/12 (open for 24 hours; choose a 1 hr 40 min time frame)
Format: Timestamped with Gradescope
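Where we are...
Linear score function: $s = f(x; W) = Wx$
SVM loss (or softmax): $L_i = \sum_{j \ne y_i} \max(0, s_j - s_{y_i} + 1)$
data loss + regularization: $L = \frac{1}{N} \sum_{i=1}^{N} L_i + \lambda R(W)$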
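The three pieces above fit in a few lines of NumPy. This is a sketch under stated assumptions: the margin of 1 and the L2 regularizer $R(W) = \sum W^2$ are the usual choices rather than something this transcript spells out, and the function name `svm_loss` is hypothetical.

```python
import numpy as np

def svm_loss(W, X, y, reg):
    """Multiclass SVM loss plus L2 regularization for the linear score function s = Wx.

    W: (C, D) weights, X: (N, D) one example per row, y: (N,) integer labels,
    reg: regularization strength lambda. The margin of 1 and R(W) = sum(W**2)
    are assumed (common choices), not taken from the slide.
    """
    N = X.shape[0]
    scores = X @ W.T                                   # (N, C): row i holds s = W x_i
    correct = scores[np.arange(N), y][:, None]         # (N, 1): s_{y_i}
    margins = np.maximum(0.0, scores - correct + 1.0)  # hinge term for every class j
    margins[np.arange(N), y] = 0.0                     # the sum skips j == y_i
    data_loss = margins.sum(axis=1).mean()             # (1/N) * sum_i L_i
    reg_loss = reg * np.sum(W * W)                     # lambda * R(W)
    return data_loss + reg_loss

W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])     # 3 classes, 2 features
X = np.array([[1.0, 2.0]])                             # a single example
y = np.array([0])                                      # its correct class
print(svm_loss(W, X, y, reg=0.0))  # scores are [1, 2, 3], so L = 0 + 2 + 3 = 5.0
print(svm_loss(W, X, y, reg=0.1))  # adds 0.1 * sum(W**2) = 0.1 * 4
```

Swapping in softmax only changes the `data_loss` line; the regularization term is unchanged.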
Finding the best W: Optimize with Gradient Descent
Landscape image is CC0 1.0 public domain
Walking man image is CC0 1.0 public domain
Numerical gradient: slow :(, approximate :(, easy to write :)
Analytic gradient: fast :), exact :), error-prone :(
In practice: Derive analytic gradient, check your implementation with numerical gradient
Gradient descent
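A gradient check as recommended above can be sketched as follows: estimate each partial derivative with a centered difference, then compare it to the analytic gradient with a relative error. The step size `h=1e-5`, the helper names and the toy function $f(W) = \sum W^2$ (whose analytic gradient is $2W$) are assumptions chosen for illustration.

```python
import numpy as np

def numerical_gradient(f, W, h=1e-5):
    """Centered-difference estimate of df/dW, one entry at a time (slow, approximate)."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        original = W[idx]
        W[idx] = original + h
        f_plus = f(W)
        W[idx] = original - h
        f_minus = f(W)
        W[idx] = original                      # restore the entry before moving on
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

def relative_error(a, b):
    """Largest elementwise relative difference between two gradients."""
    return np.max(np.abs(a - b) / np.maximum(1e-8, np.abs(a) + np.abs(b)))

# Toy check: f(W) = sum(W**2) has the analytic gradient 2W.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
analytic = 2 * W
numeric = numerical_gradient(lambda V: np.sum(V ** 2), W)
err = relative_error(analytic, numeric)
print(err)  # a small number means the analytic gradient agrees
```

The loop calls `f` twice per parameter, which is why the numerical gradient is only used as a check and not for training.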
How to find the best W?
Problem: Linear Classifiers are not very powerful
Visual Viewpoint
Linear classifiers learn one template per class
Geometric Viewpoint
Linear classifiers can only draw linear decision boundaries
Pixel Features
f(x) = Wx
Class scores
Image Features
f(x) = Wx
Class scores
Feature Representation
Image Features: Motivation
[Figure: red and blue points plotted on x and y axes]
Cannot separate red and blue points with a linear classifier
Image Features: Motivation
[Figure: red and blue points in (x, y) coordinates, and the same points in (r, θ) coordinates]
f(x, y) = (r(x, y), θ(x, y))
Cannot separate red and blue points with a linear classifier
After applying the feature transform, the points can be separated by a linear classifier
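The polar-coordinate transform $f(x, y) = (r(x, y), \theta(x, y))$ can be illustrated on hypothetical data, assumed here to be two concentric circles (blue at radius 1, red at radius 3). No line separates them in $(x, y)$, but in $(r, \theta)$ the linear rule "$r > 2$" does. The names `to_polar`, `w` and `b` are illustrative.

```python
import numpy as np

def to_polar(points):
    """Feature transform f(x, y) = (r(x, y), theta(x, y)) applied row by row."""
    x, y = points[:, 0], points[:, 1]
    return np.stack([np.hypot(x, y), np.arctan2(y, x)], axis=1)

# Hypothetical data: blue points on a circle of radius 1, red points on radius 3.
angles = np.linspace(0, 2 * np.pi, 50, endpoint=False)
unit_circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)
blue, red = unit_circle, 3 * unit_circle

# A linear classifier in (r, theta) space: predict red when 1*r + 0*theta - 2 > 0.
w, b = np.array([1.0, 0.0]), -2.0
pred_red = to_polar(red) @ w + b > 0
pred_blue = to_polar(blue) @ w + b > 0
print(pred_red.all(), (~pred_blue).all())  # True True
```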
Example: Color Histogram
Each pixel adds +1 to the histogram bin for its color
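A color histogram feature can be sketched by quantizing each pixel's color and counting. This sketch assumes a joint RGB quantization with 4 levels per channel (64 bins); the slide's figure may bin by hue instead, and `color_histogram` is a hypothetical name.

```python
import numpy as np

def color_histogram(image, bins_per_channel=4):
    """Count pixels per quantized RGB color; each pixel adds +1 to exactly one bin.

    image: (H, W, 3) uint8 array. Returns a vector of length bins_per_channel**3
    normalized to sum to 1, so images of different sizes are comparable.
    """
    q = (image.astype(np.int64) * bins_per_channel) // 256   # per-channel level in [0, bins)
    idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins_per_channel ** 3).astype(float)
    return hist / hist.sum()

red_image = np.zeros((8, 8, 3), dtype=np.uint8)
red_image[..., 0] = 255                        # every pixel is pure red
h = color_histogram(red_image)
print(h.shape, h.argmax())  # (64,) 48: all pixels land in the same bin
```

The feature ignores where colors appear in the image, which is both its robustness and its main limitation.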
Example: Histogram of Oriented Gradients (HoG)
Divide image into 8x8 pixel regions
Within each region, quantize edge direction into 9 bins
Example: a 320x240 image is divided into 40x30 regions; each region has 9 numbers, so the feature vector has 40*30*9 = 10,800 numbers
Lowe, “Object recognition from local scale-invariant features”, ICCV 1999
Dalal and Triggs, “Histograms of oriented gradients for human detection”, CVPR 2005
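The two HoG steps above can be sketched in NumPy. This is a simplified version under stated assumptions: unsigned orientations in $[0, \pi)$, votes weighted by gradient magnitude, and no block normalization or bin interpolation (both of which the full Dalal and Triggs descriptor uses). `simple_hog` is a hypothetical name.

```python
import numpy as np

def simple_hog(gray, cell=8, nbins=9):
    """Simplified HoG: one 9-bin edge-orientation histogram per 8x8 pixel region."""
    gy, gx = np.gradient(gray.astype(float))                  # derivatives along rows, columns
    mag = np.hypot(gx, gy)                                    # edge strength
    ang = np.mod(np.arctan2(gy, gx), np.pi)                   # unsigned direction in [0, pi)
    bins = np.minimum((ang / np.pi * nbins).astype(int), nbins - 1)
    rows, cols = gray.shape[0] // cell, gray.shape[1] // cell
    feats = np.zeros((rows, cols, nbins))
    for i in range(rows):
        for j in range(cols):
            region = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            # Each pixel votes for its direction bin, weighted by its gradient magnitude.
            feats[i, j] = np.bincount(bins[region].ravel(),
                                      weights=mag[region].ravel(), minlength=nbins)
    return feats.ravel()

image = np.random.default_rng(0).random((240, 320))   # stand-in for a 320x240 grayscale image
features = simple_hog(image)
print(features.shape)  # (10800,) = 40 * 30 * 9
```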
Example: Bag of Words
Extract random patches
Cluster patches to form “codebook” of “visual words”
Step 1: Build codebook
Step 2: Encode images
Fei-Fei and Perona, “A bayesian hierarchical model for learning natural scene categories”, CVPR 2005
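The two bag-of-words steps can be sketched with raw pixel patches and plain k-means. Both are assumptions made to keep the example self-contained: the cited work uses local descriptors and a different model, and the helper names, patch size and codebook size here are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_patches(image, n, size=8):
    """Extract n random size x size patches, each flattened to a vector."""
    H, W = image.shape
    ys = rng.integers(0, H - size + 1, n)
    xs = rng.integers(0, W - size + 1, n)
    return np.stack([image[y:y + size, x:x + size].ravel() for y, x in zip(ys, xs)])

def kmeans(data, k, iters=10):
    """Step 1: cluster patches; the k cluster centers form the codebook of visual words."""
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):                  # leave empty clusters where they are
                centers[c] = data[assign == c].mean(axis=0)
    return centers

def encode(image, codebook, n=100):
    """Step 2: histogram of the nearest visual word for each sampled patch."""
    p = random_patches(image, n)
    words = ((p[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return np.bincount(words, minlength=len(codebook)) / n

images = [rng.random((64, 64)) for _ in range(5)]    # stand-in grayscale images
all_patches = np.concatenate([random_patches(im, 50) for im in images])
codebook = kmeans(all_patches, k=16)
h = encode(images[0], codebook)
print(h.shape)  # (16,)
```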
Image Features
Image features vs ConvNets
[Figure: Feature Extraction → f → 10 numbers giving scores for classes, versus a ConvNet → 10 numbers giving scores for classes; each pipeline is marked with its training stage]
Krizhevsky, Sutskever, and Hinton, “Imagenet classification with deep convolutional neural networks”, NIPS 2012.
Figure copyright Krizhevsky, Sutskever, and Hinton, 2012. Reproduced with permission.
One Solution: Feature Transformation
f(x, y) = (r(x, y), θ(x, y))
Transform data with a cleverly chosen feature transform f, then apply a linear classifier
Examples: Color Histogram, Histogram of Oriented Gradients (HoG)
Today: Neural Networks
Neural networks: without the brain stuff
(Before) Linear score function: $f = Wx$
(Now) 2-layer Neural Network: $f = W_2 \max(0, W_1 x)$
(In practice we will usually add a learnable bias at each layer as well)
“Neural Network” is a very broad term; these are more accurately called “fully-connected networks” or sometimes “multi-layer perceptrons” (MLP)
3-layer Neural Network: $f = W_3 \max(0, W_2 \max(0, W_1 x))$
[Figure: x (3072) → W1 → h (100) → W2 → s (10)]
Learn 100 templates instead of 10. Share templates between classes
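The shapes in the diagram can be checked with a short forward pass: a 3072-dimensional input (for example a flattened 32x32x3 image), 100 hidden units and 10 class scores. Each row of `W1` acts as one of the 100 templates, and `W2` mixes them into class scores. Biases are omitted and the small random initialization is an assumption; `two_layer_scores` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, C = 3072, 100, 10                      # input size, hidden units, classes
W1 = 0.01 * rng.standard_normal((H, D))      # 100 templates over the input
W2 = 0.01 * rng.standard_normal((C, H))      # each class score mixes the 100 hidden units

def two_layer_scores(x):
    h = np.maximum(0, W1 @ x)                # hidden layer with ReLU activation
    return W2 @ h                            # class scores s

x = rng.standard_normal(D)
s = two_layer_scores(x)
print(W1.shape, s.shape)  # (100, 3072) (10,)
```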
The function $\max(0, z)$ is called the activation function.
Q: What if we try to build a neural network without one?
A: We end up with a linear classifier again! Without the activation, $f = W_2 W_1 x = Wx$ with $W = W_2 W_1$.
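The collapse to a linear classifier can be verified numerically: two stacked weight matrices with no activation in between give the same scores as the single matrix $W = W_2 W_1$. The random weights below are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((100, 3072))
W2 = rng.standard_normal((10, 100))
x = rng.standard_normal(3072)

two_layers_no_activation = W2 @ (W1 @ x)
W = W2 @ W1                                  # one (10, 3072) matrix: a plain linear classifier
print(np.allclose(two_layers_no_activation, W @ x))  # True
```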
Activation functions
Sigmoid
tanh
ReLU
Leaky ReLU
Maxout
ELU
ReLU is a good default choice for most problems
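The listed activation functions can be written with their common definitions. The defaults are assumptions: slope 0.1 for the negative side of Leaky ReLU and α = 1 for ELU; Maxout is shown with two linear pieces.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.1):
    return np.maximum(alpha * x, x)          # small slope instead of zero for x < 0

def elu(x, alpha=1.0):
    # np.minimum keeps exp from overflowing on the branch np.where discards.
    return np.where(x >= 0, x, alpha * (np.exp(np.minimum(x, 0)) - 1))

def maxout(x, w1, b1, w2, b2):
    # Takes the max of two learned linear functions instead of a fixed nonlinearity.
    return np.maximum(w1 @ x + b1, w2 @ x + b2)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z), leaky_relu(z))
```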
Neural networks: Architectures
“Fully-connected” layers
“2-layer Neural Net”, or “1-hidden-layer Neural Net”
“3-layer Neural Net”, or “2-hidden-layer Neural Net”
Example feed-forward computation of a neural network
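The slide's code is not captured in this transcript. The following is a sketch of a feed-forward pass through a 3-layer network with sigmoid activations, assuming hypothetical sizes (3 inputs, two hidden layers of 4 units, 1 output) and zero-initialized biases.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Hypothetical sizes: 3 inputs -> 4 hidden -> 4 hidden -> 1 output.
W1, b1 = rng.standard_normal((4, 3)), np.zeros((4, 1))
W2, b2 = rng.standard_normal((4, 4)), np.zeros((4, 1))
W3, b3 = rng.standard_normal((1, 4)), np.zeros((1, 1))

x = rng.standard_normal((3, 1))   # random input column vector
h1 = sigmoid(W1 @ x + b1)         # first hidden layer activations, shape (4, 1)
h2 = sigmoid(W2 @ h1 + b2)        # second hidden layer activations, shape (4, 1)
out = W3 @ h2 + b3                # output neuron, shape (1, 1)
print(out.shape)  # (1, 1)
```

Each layer is a matrix multiply followed by an elementwise activation, so a whole layer of neurons is computed in one line.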
Full implementation of training a 2-layer Neural Network needs ~20 lines:
Define the network
Forward pass
Calculate the analytical gradients
Gradient descent
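The slide's code itself is not in this transcript. The following sketch follows the same four steps under stated assumptions: random inputs and targets stand in for a dataset, the hidden layer uses a sigmoid, the loss is squared error, and the sizes, learning rate and function name `train_two_layer` are hypothetical.

```python
import numpy as np

def train_two_layer(num_steps=2000, learning_rate=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # Define the network: sizes are hypothetical, random data stands in for a dataset.
    N, D_in, H, D_out = 64, 1000, 100, 10
    x, y = rng.standard_normal((N, D_in)), rng.standard_normal((N, D_out))
    w1, w2 = rng.standard_normal((D_in, H)), rng.standard_normal((H, D_out))
    losses = []
    for _ in range(num_steps):
        # Forward pass: sigmoid hidden layer, linear output, squared-error loss.
        h = 1.0 / (1.0 + np.exp(-x @ w1))
        y_pred = h @ w2
        loss = np.square(y_pred - y).sum()
        losses.append(loss)
        # Calculate the analytical gradients with the chain rule, layer by layer.
        grad_y_pred = 2.0 * (y_pred - y)
        grad_w2 = h.T @ grad_y_pred
        grad_h = grad_y_pred @ w2.T
        grad_w1 = x.T @ (grad_h * h * (1 - h))   # sigmoid'(z) = h * (1 - h)
        # Gradient descent: step against the gradient.
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2
    return losses

losses = train_two_layer(num_steps=500)
print(losses[0], losses[-1])  # the loss should go down over training
```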
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 4 - 13 Jan 2016
Setting the number of layers and their sizes
more neurons = more capacity
(Web demo with ConvNetJS: http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html)
Do not use the size of the neural network as a regularizer. Use stronger regularization instead:
This image by Fotis Bobolas is licensed under CC-BY 2.0
[Figure: biological neuron labeled dendrite, cell body, axon and presynaptic terminal; impulses are carried toward the cell body and away from the cell body]
This image by Felipe Perucho is licensed under CC-BY 3.0
sigmoid activation function
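The artificial neuron in this analogy can be sketched as a tiny class: inputs are weighted and summed with a bias (dendrites into the cell body), then passed through the sigmoid activation to give a "firing rate". The class and method names are hypothetical.

```python
import numpy as np

class Neuron:
    """One artificial neuron: weighted sum of inputs plus bias, then sigmoid."""

    def __init__(self, weights, bias):
        self.weights = np.asarray(weights, dtype=float)
        self.bias = float(bias)

    def forward(self, inputs):
        cell_body_sum = np.dot(self.weights, inputs) + self.bias   # weighted sum of inputs
        firing_rate = 1.0 / (1.0 + np.exp(-cell_body_sum))         # sigmoid activation function
        return firing_rate

n = Neuron(weights=[0.5, -1.0, 2.0], bias=0.0)
print(n.forward([2.0, 1.0, 0.0]))  # weighted sum is 0.0, so the sigmoid gives 0.5
```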
This image is CC0 Public Domain
Biological Neurons: Complex connectivity patterns
Neurons in a neural network: Organized into regular layers for computational efficiency
But neural networks with random connections can work too!
Xie et al, “Exploring Randomly Wired Neural Networks for Image Recognition”, arXiv 2019
Biological Neurons:
● Many different types
● Dendrites can perform complex non-linear computations
● Synapses are not a single weight but a complex non-linear dynamical system
[Dendritic Computation. London and Hausser]
Be very careful with your brain analogies!
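Problem: How to compute gradients?
Nonlinear score function: $s = f(x; W_1, W_2) = W_2 \max(0, W_1 x)$
SVM Loss on predictions: $L_i = \sum_{j \ne y_i} \max(0, s_j - s_{y_i} + 1)$
Regularization: $R(W) = \sum_k W_k^2$
Total loss: data loss + regularization: $L = \frac{1}{N} \sum_{i=1}^{N} L_i + \lambda R(W_1) + \lambda R(W_2)$
If we can compute $\frac{\partial L}{\partial W_1}$ and $\frac{\partial L}{\partial W_2}$, then we can learn $W_1$ and $W_2$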
(Bad) Idea: Derive on paper
Problem: What if we want to change loss? E.g. use softmax instead of SVM? Need to re-derive from scratch =(
Problem: Very tedious: Lots of matrix calculus, need lots of paper
Problem: Not feasible for very complex models!
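Better Idea: Computational graphs + Backpropagation
[Figure: computational graph in which x and W feed a * node producing s (scores), s feeds the hinge loss, W also feeds R, and a + node combines the hinge loss and R into L]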
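The computational graph on this slide (x and W into a * node giving scores s, then the hinge loss, plus a regularizer R, summed by a + node into L) can be evaluated node by node, and its gradient obtained by walking the same nodes in reverse. This is a sketch for a single example, assuming a margin of 1 and $R(W) = \sum W^2$; the function name is hypothetical.

```python
import numpy as np

def svm_graph_forward_backward(W, x, y, reg):
    """Forward and backward pass through x, W -> (*) -> s -> hinge loss -> (+ R) -> L."""
    # Forward pass: evaluate each node in order, keeping intermediate values.
    s = W @ x                                        # * node: scores
    margins = np.maximum(0.0, s - s[y] + 1.0)
    margins[y] = 0.0
    data_loss = margins.sum()                        # hinge loss node
    reg_loss = reg * np.sum(W * W)                   # R node
    L = data_loss + reg_loss                         # + node
    # Backward pass: visit nodes in reverse, multiplying local by upstream gradients.
    d_data, d_reg = 1.0, 1.0                         # + node passes dL/dL = 1 to both inputs
    ds = (margins > 0).astype(float) * d_data        # hinge: +1 for each class with a positive margin
    ds[y] = -ds.sum()                                # correct class: -1 per positive margin
    dW = np.outer(ds, x)                             # * node: dL/dW = outer(dL/ds, x)
    dW += 2.0 * reg * W * d_reg                      # R node: d(reg * sum W^2)/dW = 2 * reg * W
    return L, dW

L, dW = svm_graph_forward_backward(np.zeros((3, 2)), np.array([1.0, 2.0]), 0, 0.0)
print(L)   # 2.0: both wrong classes violate the margin by 1
print(dW)
```

Changing the loss (for example to softmax) only changes the hinge node's forward and backward lines; the other nodes are untouched.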
Convolutional network (AlexNet)
[Figure: computational graph from input image and weights to loss]
Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.
Neural Turing Machine
[Figure: computational graph from input image to loss]
Figure reproduced with permission from a Twitter post by Andrej Karpathy.
Solution: Backpropagation
Backpropagation: a simple example
Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 2017
e.g. x = -2, y = 5, z = -4
Want:
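Assuming the example function is $f(x, y, z) = (x + y)z$ with an intermediate node $q = x + y$ (the graph itself is a slide image not captured in this transcript), the forward and backward passes for x = -2, y = 5, z = -4 work out as follows. Every step is plain arithmetic, so no imports are needed.

```python
# Forward pass: evaluate the graph node by node.
x, y, z = -2.0, 5.0, -4.0
q = x + y          # add node: q = 3
f = q * z          # multiply node: f = -12

# Backward pass: start from df/df = 1 and apply the chain rule in reverse order.
df_df = 1.0
df_dq = z * df_df  # multiply node: local gradient w.r.t. q is z, giving -4
df_dz = q * df_df  # multiply node: local gradient w.r.t. z is q, giving 3
df_dx = 1.0 * df_dq  # add node: dq/dx = 1, so the upstream gradient passes through (-4)
df_dy = 1.0 * df_dq  # add node: dq/dy = 1 (-4)
print(f, df_dx, df_dy, df_dz)  # -12.0 -4.0 -4.0 3.0
```

The add node distributes its upstream gradient unchanged to both inputs, while the multiply node swaps its input values into the gradients.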
Chain rule: [downstream gradient] = [upstream gradient] x [local gradient]
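A minimal Python sketch of this example. It assumes the function on the slide is f(x, y, z) = (x + y) z with an intermediate node q = x + y; that equation was lost in extraction, so treat it as an assumption. Each backward line multiplies an upstream gradient by a local gradient.

```python
# Assumed function (the slide's equation was lost): f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0

# Forward pass
q = x + y          # intermediate node: q = 3
f = q * z          # output: f = -12

# Backward pass, from the output back to the inputs
df_df = 1.0                # base case
df_dz = df_df * q          # multiply gate: local gradient w.r.t. z is q
df_dq = df_df * z          # multiply gate: local gradient w.r.t. q is z
df_dx = df_dq * 1.0        # upstream df/dq times local dq/dx = 1
df_dy = df_dq * 1.0        # upstream df/dq times local dq/dy = 1

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```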
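A node f with inputs x, y and output z:
“Upstream gradient” dL/dz flows in from the output side.
“local gradient” dz/dx and dz/dy depend only on the node itself.
“Downstream gradients” dL/dx = dL/dz · dz/dx and dL/dy = dL/dz · dz/dy are passed back to the inputs.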
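To see the three quantities side by side, here is a small sketch with a hypothetical node z = x * y feeding an assumed loss L = z². The node only knows its local gradients, receives dL/dz from above, and multiplies; a central finite difference on the whole loss serves as an independent check. The function `loss_from_inputs` is illustrative only.

```python
def loss_from_inputs(x, y):
    """Hypothetical loss L(z) = z**2 applied to the node output z = x * y."""
    z = x * y
    return z ** 2

x, y = 3.0, -2.0
z = x * y                      # forward through the node: z = -6

upstream = 2.0 * z             # dL/dz handed down from the loss: -12
local_x, local_y = y, x        # dz/dx and dz/dy, known by the node itself
down_x = upstream * local_x    # dL/dx = 24
down_y = upstream * local_y    # dL/dy = -36

# Central finite differences on the whole loss, as an independent check
h = 1e-5
num_x = (loss_from_inputs(x + h, y) - loss_from_inputs(x - h, y)) / (2 * h)
num_y = (loss_from_inputs(x, y + h) - loss_from_inputs(x, y - h)) / (2 * h)
print(down_x, round(num_x, 4))
print(down_y, round(num_y, 4))
```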
Another example:
At each gate: [upstream gradient] x [local gradient]
[upstream gradient] x [local gradient]:
[0.2] x [1] = 0.2
[0.2] x [1] = 0.2 (both inputs!)
[upstream gradient] x [local gradient]:
w0: [0.2] x [-1] = -0.2
x0: [0.2] x [2] = 0.4
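The whole neuron can be backpropagated gate by gate in a few lines. This sketch assumes, as the later sigmoid slides suggest, that the graph computes f(w, x) = 1/(1 + e^-(w0 x0 + w1 x1 + w2)) from primitive gates (multiply, add, multiply by -1, exp, add 1, reciprocal). The values w0 = 2 and x0 = -1 are consistent with the products above; w1 = -3, x1 = -2, w2 = -3 are assumed values chosen so the sigmoid input equals 1.

```python
import math

# Hypothetical inputs: w0, x0 match the slide's products; w1, x1, w2 are
# assumed so that w0*x0 + w1*x1 + w2 = 1.
w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0

# Forward pass through primitive gates
p0 = w0 * x0          # multiply gate
p1 = w1 * x1          # multiply gate
s = p0 + p1 + w2      # add gates
a = -s                # multiply-by-(-1) gate
b = math.exp(a)       # exp gate
c = 1.0 + b           # add-constant gate
f = 1.0 / c           # reciprocal gate

# Backward pass: downstream = upstream * local, in reverse order
df = 1.0                        # base case
dc = df * (-1.0 / c ** 2)       # d(1/c)/dc = -1/c^2
db = dc * 1.0                   # d(1 + b)/db = 1
da = db * math.exp(a)           # d(exp a)/da = exp(a)
ds = da * -1.0                  # d(-s)/ds = -1
dp0 = dp1 = dw2 = ds            # add gate distributes the same gradient
dw0, dx0 = dp0 * x0, dp0 * w0   # multiply gate swaps its inputs
dw1, dx1 = dp1 * x1, dp1 * w1

print(f"f = {f:.2f}")
print(f"ds = {ds:.2f}, dw0 = {dw0:.2f}, dx0 = {dx0:.2f}")
```

Unrounded, dx0 comes out near 0.393; the slide's 0.4 comes from rounding the upstream gradient 0.197 to 0.2 before multiplying by w0 = 2.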
The gates marked “Sigmoid” together compute the sigmoid function σ(x) = 1/(1 + e^(-x)).
Computational graph representation may not be unique. Choose one where local gradients at each node can be easily expressed!
Sigmoid local gradient: dσ(x)/dx = (1 - σ(x)) σ(x)
[upstream gradient] x [local gradient]: [1.00] x [(1 - 1/(1+e^(-1))) (1/(1+e^(-1)))] ≈ 0.2
[upstream gradient] x [local gradient]: [1.00] x [(1 - 0.73) (0.73)] ≈ 0.2
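The sigmoid local gradient used above follows from differentiating σ directly. The derivation below is the standard one, and plugging in σ(1) ≈ 0.731 reproduces the ≈ 0.2 on the slide.

$$
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\frac{d\sigma(x)}{dx} = \frac{e^{-x}}{(1 + e^{-x})^2}
= \left(\frac{(1 + e^{-x}) - 1}{1 + e^{-x}}\right)\left(\frac{1}{1 + e^{-x}}\right)
= \big(1 - \sigma(x)\big)\,\sigma(x)
$$

$$
\left.\frac{d\sigma}{dx}\right|_{x=1} = (1 - 0.731)(0.731) \approx 0.197
$$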
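Patterns in gradient flow
add gate: gradient distributor. Inputs 3 and 4 give output 7; the upstream gradient 2 passes unchanged to both inputs (2 and 2).
mul gate: “swap multiplier”. Inputs 2 and 3 give output 6; with upstream gradient 5, the input 2 receives 5*3=15 and the input 3 receives 2*5=10.
copy gate: gradient adder. Input 7 is copied to two outputs (7 and 7); their upstream gradients 4 and 2 are summed: 4+2=6.
max gate: gradient router. Inputs 4 and 5 give output 5; the upstream gradient 9 goes to the larger input (5), and the other input (4) gets 0.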
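The four patterns fit in four tiny backward functions. This is a sketch: the function names are hypothetical, the numbers are the ones recovered from the slide's figure, and the tie-breaking rule in `max_backward` (first input wins on a tie) is an assumption.

```python
def add_backward(upstream):
    """Add gate: gradient distributor, every input gets the upstream gradient."""
    return upstream, upstream

def mul_backward(x, y, upstream):
    """Multiply gate: "swap multiplier", each input gets upstream times the other input."""
    return upstream * y, upstream * x

def copy_backward(upstream_a, upstream_b):
    """Copy gate (one value used twice): gradient adder, upstream gradients are summed."""
    return upstream_a + upstream_b

def max_backward(x, y, upstream):
    """Max gate: gradient router, only the larger input receives the gradient.

    On a tie the first input is chosen (an assumption of this sketch).
    """
    return (upstream, 0.0) if x >= y else (0.0, upstream)

# The numbers from the slide
print(add_backward(2.0))            # inputs 3, 4      -> (2.0, 2.0)
print(mul_backward(2.0, 3.0, 5.0))  # inputs 2, 3      -> (15.0, 10.0)
print(copy_backward(4.0, 2.0))      # 7 used twice     -> 6.0
print(max_backward(4.0, 5.0, 9.0))  # inputs 4, 5      -> (0.0, 9.0)
```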
Backprop Implementation: “Flat” code
Forward pass: Compute output
Backward pass: Compute grads
Backward pass, one gate at a time in reverse order: base case, sigmoid, add gate, add gate, multiply gate, multiply gate
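The slide's own code did not survive extraction. A minimal sketch of what flat code for the sigmoid neuron could look like, assuming f = σ(w0 x0 + w1 x1 + w2) with hypothetical inputs w0 = 2, x0 = -1, w1 = -3, x1 = -2, w2 = -3 (w0 and x0 are consistent with the slide's products): every forward line gets one backward counterpart, visited in reverse, and the backward steps follow the order listed above.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical inputs
w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0

# Forward pass: compute output
s0 = w0 * x0
s1 = w1 * x1
s2 = s0 + s1
s3 = s2 + w2
L = sigmoid(s3)

# Backward pass: compute grads, one line per forward line, in reverse
grad_L = 1.0                     # base case
grad_s3 = grad_L * (1 - L) * L   # sigmoid
grad_w2 = grad_s3                # add gate
grad_s2 = grad_s3
grad_s0 = grad_s2                # add gate
grad_s1 = grad_s2
grad_w1 = grad_s1 * x1           # multiply gate
grad_x1 = grad_s1 * w1
grad_w0 = grad_s0 * x0           # multiply gate
grad_x0 = grad_s0 * w0

print(round(grad_w0, 3), round(grad_x0, 3), round(grad_w1, 3),
      round(grad_x1, 3), round(grad_w2, 3))
```

Naming every intermediate (s0 to s3) is what makes the backward pass mechanical: each line reads only values already computed.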
“Flat” Backprop: Do this for assignment 1!
Stage your forward/backward computation! E.g. for the SVM, keep the margins as an explicit intermediate stage.
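The slide's SVM code did not survive extraction. As a hedged NumPy sketch, assuming a single example, scores = W @ x, a margin of 1 and no regularization (the function name `svm_loss_flat` is hypothetical), a staged forward/backward pass could look like this. The intermediate `margins` array is reused by the backward pass, and a finite-difference check confirms the gradient.

```python
import numpy as np

def svm_loss_flat(W, x, y):
    """Multiclass SVM (hinge) loss for one example, with a staged backward pass.

    W: (C, D) weights, x: (D,) input, y: int correct class. Returns (loss, dW).
    """
    # Forward pass, one stage per line
    scores = W @ x                                    # (C,)
    margins = np.maximum(0.0, scores - scores[y] + 1.0)
    margins[y] = 0.0                                  # correct class adds no loss
    loss = margins.sum()

    # Backward pass, stages in reverse
    dmargins = np.ones_like(margins)                  # d(sum)/d(margins) = 1
    dmargins[y] = 0.0                                 # margins[y] was set to a constant
    dscores = dmargins * (margins > 0)                # max(0, .) passes gradient only if positive
    dscores[y] -= dscores.sum()                       # scores[y] enters every margin with a minus sign
    dW = np.outer(dscores, x)                         # through scores = W @ x
    return loss, dW

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)
y = 1
loss, dW = svm_loss_flat(W, x, y)

# Numerical gradient check with central differences
h = 1e-6
num_dW = np.zeros_like(W)
for idx in np.ndindex(*W.shape):
    Wp, Wm = W.copy(), W.copy()
    Wp[idx] += h
    Wm[idx] -= h
    num_dW[idx] = (svm_loss_flat(Wp, x, y)[0] - svm_loss_flat(Wm, x, y)[0]) / (2 * h)
print(np.max(np.abs(dW - num_dW)))
```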
“Flat” Backprop: Do this for assignment 1! E.g. for a two-layer neural net:
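Likewise for a two-layer network. This NumPy sketch assumes a ReLU hidden layer and a squared-error loss (both assumptions, since the slide's own code did not survive extraction); `two_layer_flat` is a hypothetical name. One weight entry is checked against a central finite difference.

```python
import numpy as np

def two_layer_flat(x, y, W1, b1, W2, b2):
    """Forward and staged backward pass for a hypothetical 2-layer net.

    x: (N, D) inputs, y: (N, C) targets. Returns (loss, grads dict).
    """
    # Forward pass
    h_pre = x @ W1 + b1                 # (N, H)
    h = np.maximum(0.0, h_pre)          # ReLU
    y_pred = h @ W2 + b2                # (N, C)
    diff = y_pred - y
    loss = np.square(diff).sum()

    # Backward pass, in reverse order
    dy_pred = 2.0 * diff                # d(sum of squares)/d(y_pred)
    dW2 = h.T @ dy_pred
    db2 = dy_pred.sum(axis=0)
    dh = dy_pred @ W2.T
    dh_pre = dh * (h_pre > 0)           # ReLU passes gradient only where active
    dW1 = x.T @ dh_pre
    db1 = dh_pre.sum(axis=0)
    return loss, {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

rng = np.random.default_rng(0)
N, D, H, C = 5, 4, 6, 3
x, y = rng.normal(size=(N, D)), rng.normal(size=(N, C))
params = {"W1": rng.normal(size=(D, H)), "b1": rng.normal(size=H),
          "W2": rng.normal(size=(H, C)), "b2": rng.normal(size=C)}
loss, grads = two_layer_flat(x, y, **params)

# Check one weight entry against a central finite difference
h_step = 1e-6
W1p, W1m = params["W1"].copy(), params["W1"].copy()
W1p[0, 0] += h_step
W1m[0, 0] -= h_step
num = (two_layer_flat(x, y, W1p, params["b1"], params["W2"], params["b2"])[0]
       - two_layer_flat(x, y, W1m, params["b1"], params["W2"], params["b2"])[0]) / (2 * h_step)
print(grads["W1"][0, 0], num)
```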
Backprop Implementation: Modularized API
Graph (or Net) object (rough pseudo code)
Modularized implementation: forward / backward API
Gate / Node / Function object: Actual PyTorch code
Example: a multiply gate with inputs x, y and output z (x, y, z are scalars)
Forward: need to stash some values for use in backward
Backward: takes the upstream gradient and multiplies upstream and local gradients
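A rough, framework-free sketch of the modular API: each gate object implements `forward` (stashing what it needs) and `backward` (upstream times local). The class names are illustrative, not PyTorch's; in PyTorch the corresponding pattern is a `torch.autograd.Function` subclass whose `forward` stashes tensors with `ctx.save_for_backward`. Here the graph is wired by hand, whereas a Graph/Net object would loop over its gates in topological order for forward and in reverse for backward.

```python
class MultiplyGate:
    """z = x * y for scalars, with the forward/backward API."""

    def forward(self, x, y):
        self.x, self.y = x, y        # stash inputs for use in backward
        return x * y

    def backward(self, grad_out):
        # grad_out is the upstream gradient dL/d(output); multiply by local gradients
        grad_x = grad_out * self.y   # d(x*y)/dx = y
        grad_y = grad_out * self.x   # d(x*y)/dy = x
        return grad_x, grad_y


class AddGate:
    """z = x + y for scalars."""

    def forward(self, x, y):
        return x + y

    def backward(self, grad_out):
        return grad_out, grad_out    # local gradients are both 1


# Hand-wired graph for f = (x + y) * z: forward in order, backward in reverse
add, mul = AddGate(), MultiplyGate()
x, y, z = -2.0, 5.0, -4.0
q = add.forward(x, y)
f = mul.forward(q, z)

grad_q, grad_z = mul.backward(1.0)   # base case df/df = 1
grad_x, grad_y = add.backward(grad_q)
print(f, grad_x, grad_y, grad_z)     # -12.0 -4.0 -4.0 3.0
```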
Example: PyTorch operators
PyTorch sigmoid layer (Source)
Forward: actually defined elsewhere...
Backward
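The same forward/backward split for a vectorized sigmoid layer, as a NumPy sketch (not PyTorch's source): because dσ/dx = (1 - σ)σ, stashing the output in `forward` is enough for `backward`, and the input does not need to be kept.

```python
import numpy as np

class Sigmoid:
    """Elementwise sigmoid layer that caches its output for the backward pass."""

    def forward(self, x):
        self.out = 1.0 / (1.0 + np.exp(-x))
        return self.out

    def backward(self, grad_out):
        # Local gradient (1 - sigma) * sigma, computed from the stashed output
        return grad_out * (1.0 - self.out) * self.out

layer = Sigmoid()
x = np.array([-1.0, 0.0, 1.0])
out = layer.forward(x)
grad_in = layer.backward(np.ones_like(x))
print(out.round(3), grad_in.round(3))
```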
So far: backprop with scalars
What about vector-valued functions?
Recap: Vector derivatives
Scalar to Scalar: x ∈ ℝ, y ∈ ℝ
Regular derivative: dy/dx ∈ ℝ
If x changes by a small amount, how much will y change?
Vector to Scalar: x ∈ ℝ^N, y ∈ ℝ
Derivative is Gradient: dy/dx ∈ ℝ^N, with (dy/dx)_n = ∂y/∂x_n
For each element of x, if it changes by a small amount then how much will y change?
Vector to Vector: x ∈ ℝ^N, y ∈ ℝ^M
Derivative is Jacobian: dy/dx ∈ ℝ^(N×M), with (dy/dx)_(n,m) = ∂y_m/∂x_n
For each element of x, if it changes by a small amount then how much will each element of y change?
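A quick numerical check of the Jacobian layout used here, with entry (n, m) = ∂y_m/∂x_n: for a linear map y = A x with A of shape (M, N), the Jacobian in this layout is A transposed. The helper `numerical_jacobian` is hypothetical, written for illustration with central differences.

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-6):
    """Jacobian in the layout J[n, m] = d f(x)[m] / d x[n], via central differences."""
    m = f(x).size
    J = np.zeros((x.size, m))
    for n in range(x.size):
        e = np.zeros_like(x)
        e[n] = h
        J[n] = (f(x + e) - f(x - e)) / (2 * h)
    return J

rng = np.random.default_rng(0)
N, M = 4, 3
A = rng.normal(size=(M, N))
x = rng.normal(size=N)
J = numerical_jacobian(lambda v: A @ v, x)
print(J.shape)   # (4, 3): one row per input element, one column per output element
```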
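Backprop with Vectors
A node f takes inputs x (size Dx) and y (size Dy) and produces output z (size Dz). Loss L still a scalar!
“Upstream gradient” dL/dz (size Dz): for each element of z, how much does it influence L?
“local gradients”: Jacobian matrices dz/dx [Dx x Dz] and dz/dy [Dy x Dz]
“Downstream gradients” dL/dx (size Dx) and dL/dy (size Dy): matrix-vector multiply of each Jacobian with the upstream gradient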
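For a concrete node, take z = A x (hypothetical, with A of shape (Dz, Dx)), whose Jacobian in the [Dx x Dz] layout is A transposed. The loss L = g · z is assumed so that the upstream gradient dL/dz is exactly g; the downstream gradient is then a matrix-vector multiply, checked here against finite differences.

```python
import numpy as np

rng = np.random.default_rng(1)
Dx, Dz = 4, 3
A = rng.normal(size=(Dz, Dx))
x = rng.normal(size=Dx)
g = rng.normal(size=Dz)          # upstream gradient dL/dz

def loss(v):
    """Assumed loss L = g . (A v), so that dL/dz = g exactly."""
    return g @ (A @ v)

jac = A.T                        # dz/dx in the [Dx x Dz] layout
grad_x = jac @ g                 # downstream gradient: matrix-vector multiply

# Finite-difference check of dL/dx, one input element at a time
h = 1e-6
num = np.array([(loss(x + h * e) - loss(x - h * e)) / (2 * h) for e in np.eye(Dx)])
print(grad_x.shape)              # (4,), same size as x
```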
Loss L still a scalar! The upstream gradient dL/dz has size Dz: for each element of z, how much does it influence L?
Gradients of variables wrt loss have same dims as the original variable: dL/dx has size Dx and dL/dy has size Dy.
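The same-shape rule also holds for matrix-valued variables. In this NumPy sketch, assuming z = W x and a given upstream gradient dL/dz, the gradient with respect to W is the outer product of dL/dz and x, which has exactly the shape of W; likewise dL/dx has the shape of x.

```python
import numpy as np

rng = np.random.default_rng(2)
Dx, Dz = 4, 3
W = rng.normal(size=(Dz, Dx))
x = rng.normal(size=Dx)
grad_z = rng.normal(size=Dz)      # upstream gradient dL/dz, same shape as z

grad_W = np.outer(grad_z, x)      # dL/dW[i, j] = dL/dz[i] * x[j]
grad_x = W.T @ grad_z             # dL/dx

print(W.shape, grad_W.shape)      # (3, 4) (3, 4)
print(x.shape, grad_x.shape)      # (4,) (4,)
```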
f(x) = max(0,x)(elementwise)
4D input x:[ 1 ][ -2 ][ 3 ][ -1 ]
Backprop with Vectors4D output z:
[ 1 ][ 0 ][ 3 ][ 0 ]
4D dL/dz: [ 4 ][ -1 ][ 5 ][ 9 ]
[dz/dx] [dL/dz][ 1 0 0 0 ] [ 4 ][ 0 0 0 0 ] [ -1 ][ 0 0 1 0 ] [ 5 ][ 0 0 0 0 ] [ 9 ]
Upstreamgradient
Jacobian is sparse: off-diagonal entries always zero! Never explicitly form Jacobian -- instead use implicit multiplication
4D dL/dx: [ 4 ][ 0 ][ 5 ][ 0 ]
![Page 135: Neural Networks and Lecture 4: Backpropagationvision.stanford.edu/teaching/cs231n/slides/2020/lecture_4.pdf · Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - April 16, 2020 24](https://reader033.vdocuments.net/reader033/viewer/2022050107/5f454a0807015e1fd430b9f1/html5/thumbnails/135.jpg)
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - April 16, 2020135
f(x) = max(0,x)(elementwise)
4D input x:[ 1 ][ -2 ][ 3 ][ -1 ]
Backprop with Vectors4D output z:
[ 1 ][ 0 ][ 3 ][ 0 ]
4D dL/dz: [ 4 ][ -1 ][ 5 ][ 9 ]
[dz/dx] [dL/dz]4D dL/dx: [ 4 ][ 0 ][ 5 ][ 0 ]
Upstreamgradient
Jacobian is sparse: off-diagonal entries always zero! Never explicitly form Jacobian -- instead use implicit multiplication
z
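The slide's numbers can be reproduced with a minimal NumPy sketch. It computes dL/dx twice: once by building the 4×4 Jacobian explicitly, and once with the implicit elementwise mask. The variable names are illustrative only.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0, -1.0])      # 4D input
dL_dz = np.array([4.0, -1.0, 5.0, 9.0])   # 4D upstream gradient

z = np.maximum(0, x)                       # forward: elementwise ReLU

# Explicit route: build the diagonal Jacobian dz/dx (1 where x > 0) and multiply.
J = np.diag((x > 0).astype(x.dtype))       # 4x4, mostly zeros
dL_dx_explicit = J @ dL_dz

# Implicit route: never build J; just pass the upstream gradient where x > 0.
dL_dx_implicit = np.where(x > 0, dL_dz, 0.0)

print(z)               # [1. 0. 3. 0.]
print(dL_dx_implicit)  # [4. 0. 5. 0.]
```

Both routes give the same answer. The explicit Jacobian needs O(n²) memory for an n-dimensional input, whereas the mask needs only O(n).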
Backprop with Matrices (or Tensors)
Node f takes inputs x [Dx×Mx] and y [Dy×My] and produces output z [Dz×Mz]. The loss L is still a scalar!
● "Upstream gradient" dL/dz [Dz×Mz]: for each element of z, how much does it influence L?
● "Local gradients" are Jacobian matrices, dz/dx [(Dx×Mx)×(Dz×Mz)] and dz/dy [(Dy×My)×(Dz×Mz)]: for each element of x (or y), how much does it influence each element of z?
● "Downstream gradients" dL/dx [Dx×Mx] and dL/dy [Dy×My]: a matrix-vector multiply of each Jacobian with the upstream gradient (treating each tensor as a flattened vector), e.g. dL/dx = [dz/dx] [dL/dz]
dL/dx always has the same shape as x!
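The shape bookkeeping above can be checked with a small sketch. It assumes, for illustration only, a hypothetical node f that multiplies x elementwise by a fixed matrix y. The sketch builds the dense Jacobian dz/dx by finite differences in the [(Dx×Mx)×(Dz×Mz)] layout, then multiplies it by the flattened upstream gradient and reshapes the result back to x's shape.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Dense Jacobian dz/dx with rows indexed by x (flattened) and columns
    by z = f(x) (flattened), estimated with central differences."""
    z = f(x)
    J = np.zeros((x.size, z.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx.flat[i] = eps
        J[i] = ((f(x + dx) - f(x - dx)) / (2 * eps)).ravel()
    return J

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3))       # [Dx×Mx]
y = rng.standard_normal((2, 3))       # [Dy×My]
dL_dz = rng.standard_normal((2, 3))   # upstream gradient, same shape as z

def f(x_):
    return x_ * y                     # hypothetical node: elementwise product with y

J = numerical_jacobian(f, x)                    # [(Dx*Mx) × (Dz*Mz)] = 6 × 6
dL_dx = (J @ dL_dz.ravel()).reshape(x.shape)    # matrix-vector multiply, then reshape
print(J.shape, dL_dx.shape)                     # (6, 6) (2, 3)
```

For this particular node the result equals `y * dL_dz`. That is the implicit form you would write in practice instead of building J.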
Backprop with Matrices
Matrix multiply y = xw, with x: [N×D], w: [D×M], y: [N×M] (here N = 2, D = 3, M = 4):
$$x = \begin{bmatrix} 2 & 1 & -3 \\ -3 & 4 & 2 \end{bmatrix}, \qquad w = \begin{bmatrix} 3 & 2 & 1 & -1 \\ 2 & 1 & 3 & 2 \\ 3 & 2 & 1 & -2 \end{bmatrix}$$
$$y = xw = \begin{bmatrix} -1 & -1 & 2 & 6 \\ 5 & 2 & 11 & 7 \end{bmatrix}, \qquad \frac{\partial L}{\partial y} = \begin{bmatrix} 2 & 3 & -3 & 9 \\ -8 & 1 & 4 & 6 \end{bmatrix}$$
Jacobians: dy/dx: [(N×D)×(N×M)], dy/dw: [(D×M)×(N×M)]
For a neural net we may have N = 64, D = M = 4096. Stored densely as 32-bit floats, dy/dx alone has (64·4096)·(64·4096) = 2^36 entries, which is 256 GB of memory, and dy/dw is M/N = 64 times larger! We must work with them implicitly!
Also see the derivation in the course notes: http://cs231n.stanford.edu/handouts/linear-backprop.pdf
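The memory figures above follow from simple arithmetic. The sketch below assumes 4-byte (float32) entries and dense storage, and compares each Jacobian with the size of x itself.

```python
# Dense Jacobian sizes for y = xw, assuming float32 (4-byte) entries.
N, D, M = 64, 4096, 4096
BYTES = 4          # assumption: 32-bit floats
GiB = 2 ** 30

dy_dx_entries = (N * D) * (N * M)   # [(N×D)×(N×M)]
dy_dw_entries = (D * M) * (N * M)   # [(D×M)×(N×M)]

print(dy_dx_entries * BYTES / GiB)  # 256.0    (GiB)
print(dy_dw_entries * BYTES / GiB)  # 16384.0  (GiB, i.e. 16 TiB)
print(N * D * BYTES / 2 ** 20)      # 1.0      (MiB): x itself, and hence dL/dx
```

The gradient dL/dx has the same shape as x, so it needs only about 1 MiB. The intermediate Jacobian is what explodes.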
Q: What parts of y are affected by one element of x?
A: $x_{n,d}$ affects the whole row $y_{n,\cdot}$.
Q: How much does $x_{n,d}$ affect $y_{n,m}$?
A: $\dfrac{\partial y_{n,m}}{\partial x_{n,d}} = w_{d,m}$
Summing over the row of y that $x_{n,d}$ influences:
$$\frac{\partial L}{\partial x_{n,d}} = \sum_{m} \frac{\partial L}{\partial y_{n,m}} \frac{\partial y_{n,m}}{\partial x_{n,d}} = \sum_{m} \frac{\partial L}{\partial y_{n,m}} w_{d,m} \quad\Longrightarrow\quad \underbrace{\frac{\partial L}{\partial x}}_{[N\times D]} = \underbrace{\frac{\partial L}{\partial y}}_{[N\times M]} \underbrace{w^{T}}_{[M\times D]}$$
By similar logic:
$$\underbrace{\frac{\partial L}{\partial w}}_{[D\times M]} = \underbrace{x^{T}}_{[D\times N]} \underbrace{\frac{\partial L}{\partial y}}_{[N\times M]}$$
These formulas are easy to remember: they are the only way to make the shapes match up!
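The two formulas can be applied to the slide's x, w, and upstream gradient in a short NumPy sketch. On this tiny example it also cross-checks dL/dx against the explicit [(N·D)×(N·M)] Jacobian, built from the rule ∂y[n,m]/∂x[i,d] = w[d,m] when i = n and 0 otherwise.

```python
import numpy as np

x = np.array([[2., 1., -3.],
              [-3., 4., 2.]])              # [N×D], N=2, D=3
w = np.array([[3., 2., 1., -1.],
              [2., 1., 3., 2.],
              [3., 2., 1., -2.]])          # [D×M], M=4
dL_dy = np.array([[2., 3., -3., 9.],
                  [-8., 1., 4., 6.]])      # upstream gradient, [N×M]

y = x @ w                                  # forward pass

# Backward pass in implicit form: no Jacobian is ever built.
dL_dx = dL_dy @ w.T                        # [N×M][M×D] -> [N×D]
dL_dw = x.T @ dL_dy                        # [D×N][N×M] -> [D×M]

# Cross-check dL/dx with the explicit Jacobian (feasible only because it is tiny).
N, D = x.shape
M = w.shape[1]
J = np.einsum('in,dm->idnm', np.eye(N), w).reshape(N * D, N * M)
assert np.allclose((J @ dL_dy.ravel()).reshape(N, D), dL_dx)

# Each gradient has the same shape as its variable.
print(dL_dx.shape, dL_dw.shape)            # (2, 3) (3, 4)
```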
Summary for today:
● (Fully-connected) neural networks are stacks of linear functions and nonlinear activation functions; they have much more representational power than linear classifiers
● backpropagation = recursive application of the chain rule along a computational graph to compute the gradients of all inputs/parameters/intermediates
● implementations maintain a graph structure, where the nodes implement the forward() / backward() API
● forward: compute the result of an operation and save any intermediates needed for gradient computation in memory
● backward: apply the chain rule to compute the gradient of the loss function with respect to the inputs
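The forward()/backward() API summarized above can be sketched with two hypothetical node classes, `MultiplyGate` and `ReLUGate`. They are illustrative only and do not come from any library. Each forward() caches what its backward() needs, and backward() multiplies the upstream gradient by the node's local gradient.

```python
import numpy as np

class MultiplyGate:
    """Hypothetical node: z = x * y (elementwise)."""
    def forward(self, x, y):
        self.x, self.y = x, y          # save intermediates for the backward pass
        return x * y

    def backward(self, dL_dz):
        # local gradients dz/dx = y and dz/dy = x, times the upstream gradient
        return dL_dz * self.y, dL_dz * self.x

class ReLUGate:
    """Hypothetical node: z = max(0, x) (elementwise)."""
    def forward(self, x):
        self.mask = x > 0              # the only intermediate backward() needs
        return np.maximum(0, x)

    def backward(self, dL_dz):
        return np.where(self.mask, dL_dz, 0.0)

# Graph: L = sum(relu(x * y))
x = np.array([1.0, -2.0, 3.0])
y = np.array([2.0, 5.0, -1.0])
mul, relu = MultiplyGate(), ReLUGate()

a = mul.forward(x, y)                  # [2, -10, -3]
z = relu.forward(a)                    # [2, 0, 0]
L = z.sum()                            # scalar loss

# Backward: visit nodes in reverse order, starting from dL/dz = 1 for L = sum(z).
dL_da = relu.backward(np.ones_like(z))
dL_dx, dL_dy = mul.backward(dL_da)
print(L)                               # 2.0
```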
Next Time: Convolutional Networks!
A vectorized example:
Always check: The gradient with respect to a variable should have the same shape as the variable
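The shape rule lends itself to cheap assertions inside a backward pass. The sketch below uses a hypothetical function chosen only for illustration, f(x, W) = ||Wx||², with W: [n×n] and x: [n]. It asserts that every gradient matches its variable's shape and confirms df/dx with a finite-difference check.

```python
import numpy as np

def forward_backward(W, x):
    """Hypothetical example: f(x, W) = ||W @ x||^2."""
    q = W @ x                       # [n]
    f = np.sum(q ** 2)              # scalar
    df_dq = 2 * q                   # [n]
    df_dW = np.outer(df_dq, x)      # [n×n]: df/dW[i, j] = df/dq[i] * x[j]
    df_dx = W.T @ df_dq             # [n]
    # Always check: each gradient has the same shape as its variable.
    assert df_dW.shape == W.shape
    assert df_dx.shape == x.shape
    return f, df_dW, df_dx

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))
x = rng.standard_normal(3)
f, df_dW, df_dx = forward_backward(W, x)

# Central-difference check of df/dx, one basis direction at a time.
eps = 1e-6
num_dx = np.array([
    (np.sum((W @ (x + eps * e)) ** 2) - np.sum((W @ (x - eps * e)) ** 2)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(num_dx, df_dx, atol=1e-5))  # True
```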
Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 2017
In discussion section: A matrix example...