TRANSCRIPT
CS 4803 / 7643: Deep Learning
Zsolt Kira Georgia Tech
Topics:
– Neural Networks
– Computing Gradients
– (Finish) manual differentiation
Administrivia
• PS1/HW1 out
• Instructor Office hours: 10-11am Mondays, CODA 11th floor, conference room C1106 Briarcliff
  – This week only: Thursday 11am (same room)
• Start thinking about project topics/teams
(C) Dhruv Batra & Zsolt Kira 2
Do the Readings!
(C) Dhruv Batra & Zsolt Kira 3
Recap from last time
(C) Dhruv Batra & Zsolt Kira 4
(Before) Linear score function: f = Wx
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Neural networks: without the brain stuff
(Before) Linear score function: f = Wx
(Now) 2-layer Neural Network: f = W2 max(0, W1 x)
or 3-layer Neural Network: f = W3 max(0, W2 max(0, W1 x))
Neural networks: without the brain stuff
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
● Not really how neurons work (they are more complex)
● This is the same computation as our linear score function
● Started with a binary output (0/1) depending on whether the sum is larger than 0 (or a threshold if no bias is used)
● Still want to add non-linearity to increase representational power
7
[Figure: biological neuron. Dendrites carry impulses toward the cell body; the axon carries impulses away, ending at the presynaptic terminal]
This image by Felipe Perucho is licensed under CC BY 3.0
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
sigmoid activation function
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Activation functions: Sigmoid, tanh, ReLU, Leaky ReLU, Maxout, ELU
With ReLU it's what we showed before
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
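The activation functions listed above can be sketched in NumPy as follows. The Leaky ReLU slope (0.01) and ELU alpha (1.0) are common defaults assumed here, not values from the slide; Maxout is omitted since it takes the max over extra learned linear functions rather than a fixed elementwise formula.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):      # slope 0.01 is an assumed default
    return np.where(x > 0, x, slope * x)

def elu(x, alpha=1.0):              # alpha 1.0 is an assumed default
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```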
“Fully-connected” layers
“2-layer Neural Net”, or “1-hidden-layer Neural Net”
“3-layer Neural Net”, or “2-hidden-layer Neural Net”
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Neural networks: Architectures
Representational Power:
• Any continuous function with 1 hidden layer
• Any function with 2 hidden layers
• Doesn’t say how many nodes we’ll need!
(Before) Linear score function: f = Wx
(Now) 2-layer Neural Network: f = W2 max(0, W1 x)
or 3-layer Neural Network: f = W3 max(0, W2 max(0, W1 x))
Equivalent to the math viewpoint!
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
x (3072) → W1 → h (100) → W2 → s (10)
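The diagram's shapes can be checked with a minimal NumPy forward pass; the 3072-dimensional input (a flattened 32x32x3 CIFAR image) and the 100/10 layer widths follow the slide, while the random weights are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3072)             # input: flattened 32x32x3 image
W1 = 0.01 * rng.standard_normal((100, 3072))
W2 = 0.01 * rng.standard_normal((10, 100))

h = np.maximum(0.0, W1 @ x)               # hidden layer h = max(0, W1 x), 100-dim
s = W2 @ h                                # class scores s = W2 h, 10-dim
print(h.shape, s.shape)                   # (100,) (10,)
```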
Strategy: Follow the slope
In 1 dimension, the derivative of a function:
df(x)/dx = lim(h→0) [f(x + h) − f(x)] / h
In multiple dimensions, the gradient is the vector of partial derivatives along each dimension.
The slope in any direction is the dot product of the direction with the gradient.
The direction of steepest descent is the negative gradient.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Gradient Descent Pseudocode
for i in {0, …, num_epochs}:
    for x, y in data:
        ŷ = SM(Wx)            # softmax over class scores
        L = CE(y, ŷ)          # cross-entropy loss
        compute ∂L/∂W
        W := W − α ∂L/∂W

Some design decisions:
• How many examples to use to calculate the gradient per iteration?
• What should alpha (the learning rate) be?
  • Should it be constant throughout?
• How many epochs to run?
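A runnable version of the pseudocode above, with softmax (SM) and cross-entropy (CE) spelled out; the synthetic data, learning rate, and epoch count are arbitrary choices for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                        # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
# 20 synthetic (x, y) pairs: 5-dim inputs, 3 classes
data = [(rng.standard_normal(5), int(rng.integers(0, 3))) for _ in range(20)]

W = np.zeros((3, 5))
alpha, num_epochs = 0.1, 10

for i in range(num_epochs):
    for x, y in data:
        y_hat = softmax(W @ x)             # y_hat = SM(Wx)
        L = -np.log(y_hat[y])              # L = CE(y, y_hat)
        dscores = y_hat.copy()
        dscores[y] -= 1.0                  # dL/d(Wx) for softmax + cross-entropy
        dW = np.outer(dscores, x)          # dL/dW
        W -= alpha * dW                    # W := W - alpha * dL/dW
```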
Full sum expensive when N is large!
Approximate the sum using a minibatch of examples; 32 / 64 / 128 common
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Stochastic Gradient Descent (SGD)
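A minimal SGD sketch on synthetic linear-regression data: instead of summing the gradient over all N examples, sample a minibatch each iteration. The batch size of 64 is one of the common choices mentioned above; the data, loss, learning rate, and step count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 1000, 5
X = rng.standard_normal((N, D))
true_w = np.ones(D)
y = X @ true_w                                    # noiseless targets for the demo

w = np.zeros(D)
for step in range(200):
    idx = rng.choice(N, size=64, replace=False)   # sample a minibatch
    xb, yb = X[idx], y[idx]
    grad = 2.0 * xb.T @ (xb @ w - yb) / len(idx)  # gradient of mean squared error
    w -= 0.05 * grad
```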
How do we compute gradients?
• Analytic or “Manual” Differentiation
• Symbolic Differentiation
• Numerical Differentiation
• Automatic Differentiation
  – Forward mode AD
  – Reverse mode AD (aka “backprop”)
(C) Dhruv Batra & Zsolt Kira 15
current W:
[0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …] → loss 1.25347
W + h (first dim):
[0.34 + 0.0001, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …] → loss 1.25322
gradient dW:
[-2.5, ?, ?, ?, ?, ?, ?, ?, ?, …]
(1.25322 - 1.25347)/0.0001 = -2.5
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
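The finite-difference computation above, as a generic routine: nudge one dimension of W by h, re-evaluate the loss, and divide the change by h. The quadratic loss here is a stand-in for the real classifier loss; h = 0.0001 matches the slide.

```python
import numpy as np

def numerical_gradient(loss, W, h=1e-4):
    """Nudge each dimension of W by h and measure the change in loss."""
    grad = np.zeros_like(W)
    for i in range(W.size):
        W_plus = W.copy()
        W_plus.flat[i] += h
        grad.flat[i] = (loss(W_plus) - loss(W)) / h   # (loss(W+h) - loss(W)) / h
    return grad

loss = lambda W: float(np.sum(W ** 2))    # analytic gradient is 2W
W = np.array([0.34, -1.11, 0.78])
print(numerical_gradient(loss, W))        # approximately [0.68, -2.22, 1.56]
```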
Manually Computing Gradients
• Our function: linear with sigmoid activation: f(x) = σ(w·x)
• Definition of sigmoid: σ(z) = 1 / (1 + e^(−z))
• Sigmoid has a nice derivative: σ′(z) = σ(z)(1 − σ(z))
• Start with our loss function: L = (y − σ(w·x))²
• Calculate the derivative: 2 × (inside) × derivative of (inside)
• Also get rid of y (a constant!), leaving a negative sign for the rest
• Move the negative out, include the derivative of g (g′), and multiply by the derivative of the inside:
  ∂L/∂w = −2 (y − σ(w·x)) σ′(w·x) x
• Define some terms, write with the new terms, and substitute the derivative of the sigmoid in:
  ∂L/∂w = −2 (y − σ(w·x)) σ(w·x)(1 − σ(w·x)) x
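The final expression can be sanity-checked against a numerical derivative; the scalar values of w, x, and y below are arbitrary.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

w, x, y = 0.5, 2.0, 1.0
s = sigmoid(w * x)
# the derived gradient: -2 (y - sigmoid(wx)) * sigmoid(wx) (1 - sigmoid(wx)) * x
analytic = -2.0 * (y - s) * s * (1.0 - s) * x

# central-difference check of dL/dw for L(w) = (y - sigmoid(w x))^2
L = lambda w: (y - sigmoid(w * x)) ** 2
h = 1e-6
numeric = (L(w + h) - L(w - h)) / (2 * h)
print(analytic, numeric)                  # the two should agree closely
```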
Plan for Today
• Decomposing a function
• Backpropagation algorithm (w/ example)
• Math
  – Function composition
  – Vectors
  – Tensors
(C) Dhruv Batra & Zsolt Kira 18
How to Simplify?
• Calculating gradients for large functions is complicated
• Idea: Decompose the function and compute local gradients for each part!
(C) Dhruv Batra & Zsolt Kira 19
Computational Graphs
• Notation
(C) Dhruv Batra & Zsolt Kira 20
Example
(C) Dhruv Batra & Zsolt Kira 21
[Computational graph with inputs x1, x2 and nodes +, sin( ), and *]
Computational Graph
[Graph: x and W multiply (*) to give s (scores); s feeds the hinge loss, W feeds the regularizer R, and the two are summed (+) into the loss L]
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Logistic Regression as a Cascade
(C) Dhruv Batra and Zsolt Kira 23
Given a library of simple functions
Compose into a complicated function
Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Convolutional network (AlexNet)
[Figure: the AlexNet computational graph from input image to loss, with weights at each layer]
Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Neural Turing Machine
[Figure: the Neural Turing Machine computational graph from input image to loss]
Figure reproduced with permission from a Twitter post by Andrej Karpathy.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
How do we compute gradients?
• Analytic or “Manual” Differentiation
• Symbolic Differentiation
• Numerical Differentiation
• Automatic Differentiation
  – Forward mode AD
  – Reverse mode AD (aka “backprop”)
(C) Dhruv Batra & Zsolt Kira 26
Any DAG of differentiable modules is allowed!
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra & Zsolt Kira 27
Computational Graph
Directed Acyclic Graphs (DAGs)
• Exactly what the name suggests
  – Directed edges
  – No (directed) cycles
  – Underlying undirected cycles okay
(C) Dhruv Batra 28
Directed Acyclic Graphs (DAGs)
• Concept: Topological Ordering
(C) Dhruv Batra 29
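A topological ordering visits each node of a DAG only after all of its predecessors, which is exactly the order forward-prop needs. A small sketch of Kahn's algorithm; the example graph is made up for illustration.

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm: repeatedly emit a node with no remaining incoming edges."""
    indegree = {n: 0 for n in nodes}
    adjacent = {n: [] for n in nodes}
    for u, v in edges:
        adjacent[u].append(v)
        indegree[v] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in adjacent[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return order   # contains every node only if the graph is acyclic

# x1 and x2 feed an add node, whose output feeds a sin node
print(topological_order(["x1", "x2", "add", "sin"],
                        [("x1", "add"), ("x2", "add"), ("add", "sin")]))
```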
Directed Acyclic Graphs (DAGs)
(C) Dhruv Batra 30
Key Computation: Forward-Prop
(C) Dhruv Batra & Zsolt Kira 31Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Key Computation: Back-Prop
(C) Dhruv Batra & Zsolt Kira 32Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
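One way to realize the forward/back-prop computations is a module interface: forward caches its inputs, and backward turns the upstream gradient into gradients for each input via the chain rule. The class below is our own illustrative sketch, shown for a multiply module.

```python
class Multiply:
    def forward(self, a, b):
        self.a, self.b = a, b         # cache inputs for the backward pass
        return a * b

    def backward(self, dout):
        # chain rule: d(ab)/da = b and d(ab)/db = a, each times the upstream dout
        return dout * self.b, dout * self.a

m = Multiply()
out = m.forward(3.0, -4.0)            # forward: 3 * -4 = -12
da, db = m.backward(1.0)              # backward with upstream gradient 1
print(out, da, db)                    # -12.0 -4.0 3.0
```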
How to Simplify?
• Calculating gradients for large functions is complicated
• Idea: Decompose the function and compute local gradients for each part!
• We will use the chain rule to calculate the gradients
  – We will receive an upstream gradient from the layer after us
  – We will then use that to compute local gradients using the chain rule
(C) Dhruv Batra & Zsolt Kira 33
Neural Network Training
• Step 1: Compute Loss on mini-batch [F-Pass]
(C) Dhruv Batra & Zsolt Kira 34Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Neural Network Training
• Step 1: Compute Loss on mini-batch [F-Pass]
• Step 2: Compute gradients wrt parameters [B-Pass]
(C) Dhruv Batra & Zsolt Kira 37Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Neural Network Training
• Step 1: Compute Loss on mini-batch [F-Pass]
• Step 2: Compute gradients wrt parameters [B-Pass]
• Step 3: Use gradient to update parameters
(C) Dhruv Batra & Zsolt Kira 40Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
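The three steps as one NumPy training iteration for a tiny two-layer regression net; the architecture, data, loss, and learning rate are illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 4))                  # a mini-batch of 32 examples
y = rng.standard_normal((32, 1))
W1 = 0.1 * rng.standard_normal((4, 8))
W2 = 0.1 * rng.standard_normal((8, 1))
alpha = 0.01

# Step 1: compute loss on the mini-batch [F-Pass]
h = np.maximum(0.0, X @ W1)
pred = h @ W2
loss = np.mean((pred - y) ** 2)

# Step 2: compute gradients wrt parameters [B-Pass]
dpred = 2.0 * (pred - y) / len(X)
dW2 = h.T @ dpred
dh = dpred @ W2.T
dh[h <= 0] = 0.0                                  # gradient gated by the ReLU
dW1 = X.T @ dh

# Step 3: use gradients to update parameters
W1 -= alpha * dW1
W2 -= alpha * dW2
```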
Backpropagation: a simple example
f(x, y, z) = (x + y) z, e.g. x = -2, y = 5, z = -4
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Backpropagation: a simple example
e.g. x = -2, y = 5, z = -4
Want: ∂f/∂x, ∂f/∂y, ∂f/∂z
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Backpropagation: a simple example
e.g. x = -2, y = 5, z = -4
Want: ∂f/∂x, ∂f/∂y, ∂f/∂z
Chain rule (with q = x + y): ∂f/∂y = (∂f/∂q)(∂q/∂y) = Upstream gradient × Local gradient
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
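The example can be worked out end to end in code, using the standard CS231n function f(x, y, z) = (x + y) z with the slide's values x = -2, y = 5, z = -4.

```python
x, y, z = -2.0, 5.0, -4.0

# forward pass
q = x + y            # q = 3
f = q * z            # f = -12

# backward pass: the upstream gradient of f with respect to itself is 1
df_dz = q            # local gradient of * wrt z is q
df_dq = z            # local gradient of * wrt q is z
df_dx = df_dq * 1.0  # + gate has local gradient 1: upstream * local = -4
df_dy = df_dq * 1.0  # same for y
print(f, df_dx, df_dy, df_dz)   # -12.0 -4.0 -4.0 3.0
```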
Patterns in backward flow
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
add gate: gradient distributor
Patterns in backward flow
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
add gate: gradient distributor
Q: What is a max gate?
Patterns in backward flow
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
add gate: gradient distributor
max gate: gradient router
Patterns in backward flow
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
add gate: gradient distributor
max gate: gradient router
Q: What is a mul gate?
Patterns in backward flow
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
add gate: gradient distributor
max gate: gradient router
mul gate: gradient switcher
Patterns in backward flow
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
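The three patterns can be written out for scalar inputs a and b with upstream gradient g; the function names below are our own.

```python
def add_backward(g):
    # add gate: gradient distributor -- both inputs receive the upstream gradient
    return g, g

def max_backward(a, b, g):
    # max gate: gradient router -- only the larger input receives the gradient
    return (g, 0.0) if a > b else (0.0, g)

def mul_backward(a, b, g):
    # mul gate: gradient switcher -- each input's gradient is scaled by the other input
    return g * b, g * a

print(add_backward(2.0))             # (2.0, 2.0)
print(max_backward(4.0, 1.0, 2.0))   # (2.0, 0.0)
print(mul_backward(3.0, -4.0, 2.0))  # (-8.0, 6.0)
```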
(C) Dhruv Batra and Zsolt Kira 64
Summary
• We will have a composed non-linear function as our model
  – Several portions will have parameters
• We will use (stochastic/mini-batch) gradient descent with a loss function to define our objective
• Rather than analytically derive gradients for complex functions, we will modularize computation
  – Back propagation = Gradient Descent + Chain Rule
• Now:
  – Work through mathematical view
  – Vectors, matrices, and tensors
  – Next time: Can the computer do this for us automatically?
• Read:
  – https://explained.ai/matrix-calculus/index.html
  – https://www.cc.gatech.edu/classes/AY2020/cs7643_fall/slides/L5_gradients_notes.pdf
Matrix/Vector Derivatives Notation
• Read:
  – https://explained.ai/matrix-calculus/index.html
  – https://www.cc.gatech.edu/classes/AY2020/cs7643_fall/slides/L5_gradients_notes.pdf
• Matrix/Vector Derivatives Notation
• Vector Derivative Example
• Extension to Tensors
• Chain Rule: Composite Functions
  – Scalar Case
  – Vector Case
  – Jacobian view
  – Graphical view
  – Tensors
• Logistic Regression Derivatives
(C) Dhruv Batra & Zsolt Kira 65