Lecture 30: Neural Nets and CNNs CS4670/5670: Computer Vision Kavita Bala and Sean Bell


Page 1: CS4670/5670: Computer Vision - Cornell University. Convolutional Neural Networks, CS 4670, Sean Bell. [Zeiler and Fergus, “Visualizing and Understanding Convolutional Networks”, 2013]

Lecture 30: Neural Nets and CNNs

CS4670/5670: Computer Vision. Kavita Bala and Sean Bell

Page 2:

Today

• PA 4 due this week
• PA 5 out this week; due on Monday, May 4

• HW 2 due next week

• Class on Monday (Charter Day)

Page 3:

Where are we?

Page 4:

Where are we?

• Have score function and loss function
– Will generalize the score function

• Find W and b to minimize loss

Page 5:

Where are we?

• Gradient descent to optimize loss functions
– Batch gradient descent, stochastic gradient descent
– Momentum

• Where do we get gradients of loss?
– Compute them
– Backpropagation and chain rule (more today)
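The update rules themselves are not in the transcript; a minimal NumPy sketch of SGD with momentum on a toy quadratic loss (the helper name `sgd_momentum`, the toy loss, and all hyperparameters are illustrative, not from the slides):

```python
import numpy as np

def sgd_momentum(w, grad, v, lr=0.1, mu=0.9):
    """One SGD-with-momentum step; v is the running velocity."""
    v = mu * v - lr * grad
    return w + v, v

# Toy loss L(w) = 0.5 * ||w||^2, whose gradient is just w itself.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = sgd_momentum(w, w, v)  # grad = w for this loss
# w is now very close to the minimizer at the origin
```

In practice the gradient would come from backpropagation over a mini-batch rather than a closed-form expression.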

Page 6:
Page 7:

Image classification: training

Training Labels

Training Images

Page 8:

Image classification: training

Training Labels

Training Images

Page 9:

Image classification: training

Training Labels

Training Images

Classifier Training

Page 10:

Image classification: training

Training Labels

Training Images

Classifier Training → Trained Classifier

Page 11:

Image classification: training

Training Labels

Training Images

Classifier Training → Trained Classifier

Page 12:

Test Image

Image classification: testing

Page 13:

Test Image

Image classification: testing

Page 14:

Test Image

Trained Classifier

Image classification: testing

Page 15:

Test Image

Trained Classifier → Prediction: Outdoor

Image classification: testing

Page 16:

Slide: R. Fergus

Page 17:

Slide: R. Fergus

Page 18:

Slide: R. Fergus

Page 19:

Slide: R. Fergus

Page 20:

Slide: R. Fergus

Page 21:

Feature hierarchy with CNNs: end-to-end models

[Zeiler and Fergus, “Visualizing and Understanding Convolutional Networks”, 2013]

Page 22:
Page 23:
Page 24:

Convolutional Neural Networks. CS 4670, Sean Bell

(Sep 2014)

Page 25:

Slide: Flickr

Page 26:

Slide: Flickr

Page 27:

Slide: Flickr

Page 28:

Slide: Flickr

Page 29:

Slide: Flickr

Page 30:

Slide: Flickr

This week, we’ll learn what this is, how to compute it, and how to learn it

Page 31:

What is a Convolutional Neural Network (CNN)?

Page 32:

What is a Convolutional Neural Network (CNN)?

function

Page 33:

What is a Convolutional Neural Network (CNN)?

function → function → ... → P(class)

Page 34:

What is a Convolutional Neural Network (CNN)?

function → function → ... → P(class) → “horse”

Page 35:

What is a Convolutional Neural Network (CNN)?

function → function → ... → P(class) → “horse”

Key questions:

Page 36:

What is a Convolutional Neural Network (CNN)?

function → function → ... → P(class) → “horse”

Key questions:

- What kinds of functions should we use?

Page 37:

What is a Convolutional Neural Network (CNN)?

function → function → ... → P(class) → “horse”

Key questions:

- What kinds of functions should we use?

- How do we learn the parameters for those functions?

Page 38:

Example CNN

[Andrej Karpathy]

Page 39:

Example CNN

[Andrej Karpathy]

Conv → ReLU → Conv → ReLU → Pool → Fully Connected → Softmax

Page 40:

LeNet: a classifier for handwritten digits. [LeCun 1989]

CNNs in 1989: “LeNet”

Page 41:

[Krizhevsky, Sutskever, Hinton. NIPS 2012]

“AlexNet” — Won the ILSVRC2012 Challenge

CNNs in 2012: “SuperVision” (aka “AlexNet”)

Page 42:

[Krizhevsky, Sutskever, Hinton. NIPS 2012]

“AlexNet” — Won the ILSVRC2012 Challenge

CNNs in 2012: “SuperVision” (aka “AlexNet”)

Major breakthrough: 15.3% Top-5 error on ILSVRC2012 (Next best: 25.7%)

Page 43:

CNNs in 2014: “GoogLeNet”

[Szegedy et al, arXiv 2014]

“GoogLeNet” — Won the ILSVRC2014 Challenge

Page 44:

CNNs in 2014: “GoogLeNet”

[Szegedy et al, arXiv 2014]

“GoogLeNet” — Won the ILSVRC2014 Challenge

6.67% top-5 error rate! (1000 classes!)

Page 45:

CNNs in 2014: “VGGNet”

[Simonyan et al, arXiv 2014]

“VGGNet” — Second Place in the ILSVRC2014 Challenge

No fancy picture, sorry

Page 46:

CNNs in 2014: “VGGNet”

[Simonyan et al, arXiv 2014]

“VGGNet” — Second Place in the ILSVRC2014 Challenge

No fancy picture, sorry

7.3% top-5 error rate

Page 47:

CNNs in 2014: “VGGNet”

[Simonyan et al, arXiv 2014]

“VGGNet” — Second Place in the ILSVRC2014 Challenge

No fancy picture, sorry

7.3% top-5 error rate

(and 1st place in the detection challenge)

Page 48:

Neural Networks

(First we’ll cover Neural Nets, then build up to Convolutional Neural Nets)

Page 49:

Inspiration from Biology

Figure: Andrej Karpathy

Page 50:

Inspiration from Biology

Figure: Andrej Karpathy

Page 51:

Inspiration from Biology

Neural nets are loosely inspired by biology

Figure: Andrej Karpathy

Page 52:

Inspiration from Biology

Neural nets are loosely inspired by biology

But they certainly are not a model of how the brain works, or even how neurons work

Figure: Andrej Karpathy

Page 53:

Simple Neural Net: 1 Layer

x → Wx + b → f

Let’s consider a simple 1-layer network:

Page 54:

Simple Neural Net: 1 Layer

x → Wx + b → f

Let’s consider a simple 1-layer network:

This is the same as what you saw last class:

Page 55:

1 Layer Neural Net

Block Diagram: x (Input) → Wx + b → f (class scores)

Page 56:

1 Layer Neural Net

Block Diagram: x (Input) → Wx + b → f (class scores)

Expanded Block Diagram: W (M×D) · x (D×1) + b (M×1) = f (M×1)

M classes, D features, 1 example

Page 57:

1 Layer Neural Net

Block Diagram: x (Input) → Wx + b → f (class scores)

NumPy:

Expanded Block Diagram: W (M×D) · x (D×1) + b (M×1) = f (M×1)

M classes, D features, 1 example
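The NumPy snippet shown on this slide did not survive the transcript; a minimal sketch of the one-example score computation, with illustrative sizes for D and M:

```python
import numpy as np

D, M = 4, 3                       # D features, M classes (illustrative sizes)
rng = np.random.default_rng(0)
W = rng.standard_normal((M, D))   # one row of weights per class
b = rng.standard_normal(M)
x = rng.standard_normal(D)        # a single input example

f = W.dot(x) + b                  # class scores, shape (M,)
```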

Page 58:

1 Layer Neural Net

Page 59:

1 Layer Neural Net

• How do we process N inputs at once?

Page 60:

1 Layer Neural Net

• How do we process N inputs at once?

• It’s most convenient to have the first dimension (row) represent which example we are looking at, so we need to transpose everything

Page 61:

1 Layer Neural Net

• How do we process N inputs at once?

• It’s most convenient to have the first dimension (row) represent which example we are looking at, so we need to transpose everything

X (N×D)

Page 62:

1 Layer Neural Net

• How do we process N inputs at once?

• It’s most convenient to have the first dimension (row) represent which example we are looking at, so we need to transpose everything

X (N×D) · W (D×M)

Page 63:

1 Layer Neural Net

• How do we process N inputs at once?

• It’s most convenient to have the first dimension (row) represent which example we are looking at, so we need to transpose everything

X (N×D) · W (D×M) + b (1×M, broadcast down the N rows)

Page 64:

1 Layer Neural Net

• How do we process N inputs at once?

• It’s most convenient to have the first dimension (row) represent which example we are looking at, so we need to transpose everything

X (N×D) · W (D×M) + b (1×M, broadcast) = F (N×M)

M classes, D features, N examples

Page 65:

1 Layer Neural Net

• How do we process N inputs at once?

• It’s most convenient to have the first dimension (row) represent which example we are looking at, so we need to transpose everything

X (N×D) · W (D×M) + b (1×M, broadcast) = F (N×M)

M classes, D features, N examples

Note: often, even when the weights are transposed, they are still called “W”

Page 66:

1 Layer Neural Net

Page 67:

1 Layer Neural Net

X (N×D): each row is one input example

Page 68:

1 Layer Neural Net

W (D×M): each column holds the weights for one class

X (N×D): each row is one input example

Page 69:

1 Layer Neural Net

W (D×M): each column holds the weights for one class

X (N×D): each row is one input example

F (N×M): each row holds the predicted scores for one example

Page 70:

1 Layer Neural Net

First attempt — let’s try this:

Implementing this with NumPy:

Page 71:

1 Layer Neural Net

First attempt — let’s try this:

Implementing this with NumPy:

Doesn’t work — why?

Page 72:

1 Layer Neural Net

First attempt — let’s try this:

Implementing this with NumPy:

Doesn’t work — why?

X: N × D,  W: D × M,  b: M

Page 73:

1 Layer Neural Net

First attempt — let’s try this:

- NumPy needs to know how to expand “b” from 1D to 2D

Implementing this with NumPy:

Doesn’t work — why?

X: N × D,  W: D × M,  b: M

Page 74:

1 Layer Neural Net

First attempt — let’s try this:

- NumPy needs to know how to expand “b” from 1D to 2D

Implementing this with NumPy:

Doesn’t work — why?

- This is called “broadcasting”

X: N × D,  W: D × M,  b: M
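The first-attempt code itself is missing from the transcript; a sketch of the batched forward pass, where broadcasting expands b across the N rows (all dimensions are illustrative):

```python
import numpy as np

N, D, M = 5, 4, 3                 # N examples, D features, M classes (illustrative)
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))   # each row is one example
W = rng.standard_normal((D, M))   # each column holds one class's weights
b = rng.standard_normal(M)

F = X.dot(W) + b[np.newaxis, :]   # b is broadcast down the N rows; F has shape (N, M)
```

Here `X.dot(W) + b` also works, since NumPy broadcasts a 1D array against the trailing dimension automatically; the explicit `np.newaxis` just makes the intent visible.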

Page 75:

1 Layer Neural Net

Implementing this with NumPy:

What does “np.newaxis” do?

Page 76:

1 Layer Neural Net

Implementing this with NumPy:

b = [0, 1, 2]

What does “np.newaxis” do?

Page 77:

1 Layer Neural Net

Implementing this with NumPy:

b = [0, 1, 2]

Make “b” a row vector

What does “np.newaxis” do?

Page 78:

1 Layer Neural Net

Implementing this with NumPy:

b = [0, 1, 2]

Make “b” a row vector

Make “b” a column vector

What does “np.newaxis” do?

Page 79:

1 Layer Neural Net

Implementing this with NumPy:

What does “np.newaxis” do?

Page 80:

1 Layer Neural Net

Implementing this with NumPy:

What does “np.newaxis” do?

Row vector (repeats along rows)

Page 81:

1 Layer Neural Net

Implementing this with NumPy:

What does “np.newaxis” do?

Row vector (repeats along rows)

Column vector (repeats along columns)
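The np.newaxis snippets themselves are missing from the transcript; a small sketch of both reshapes and how each one broadcasts:

```python
import numpy as np

b = np.array([0, 1, 2])
row = b[np.newaxis, :]        # shape (1, 3): a row vector
col = b[:, np.newaxis]        # shape (3, 1): a column vector

# Broadcasting repeats the row vector down the rows...
A = np.zeros((2, 3)) + row    # every row of A is [0, 1, 2]
# ...and the column vector across the columns.
B = np.zeros((3, 2)) + col    # every column of B is [0, 1, 2]
```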

Page 82:

2 Layer Neural Net

What if we just added another layer?

x → W^(1)x + b^(1) → h

Page 83:

2 Layer Neural Net

What if we just added another layer?

x → W^(1)x + b^(1) → h → W^(2)h + b^(2) → f

Page 84:

2 Layer Neural Net

What if we just added another layer?

x → W^(1)x + b^(1) → h → W^(2)h + b^(2) → f

Let’s expand out the equation:

Page 85:

2 Layer Neural Net

What if we just added another layer?

x → W^(1)x + b^(1) → h → W^(2)h + b^(2) → f

Let’s expand out the equation:

f = W^(2)(W^(1)x + b^(1)) + b^(2)
  = (W^(2)W^(1))x + (W^(2)b^(1) + b^(2))

Page 86:

2 Layer Neural Net

What if we just added another layer?

x → W^(1)x + b^(1) → h → W^(2)h + b^(2) → f

Let’s expand out the equation:

f = W^(2)(W^(1)x + b^(1)) + b^(2)
  = (W^(2)W^(1))x + (W^(2)b^(1) + b^(2))

But this is just the same as a 1 layer net with:

Page 87:

2 Layer Neural Net

What if we just added another layer?

x → W^(1)x + b^(1) → h → W^(2)h + b^(2) → f

Let’s expand out the equation:

f = W^(2)(W^(1)x + b^(1)) + b^(2)
  = (W^(2)W^(1))x + (W^(2)b^(1) + b^(2))

But this is just the same as a 1 layer net with:

W = W^(2)W^(1),  b = W^(2)b^(1) + b^(2)

Page 88:

2 Layer Neural Net

What if we just added another layer?

x → W^(1)x + b^(1) → h → W^(2)h + b^(2) → f

Let’s expand out the equation:

f = W^(2)(W^(1)x + b^(1)) + b^(2)
  = (W^(2)W^(1))x + (W^(2)b^(1) + b^(2))

But this is just the same as a 1 layer net with:

W = W^(2)W^(1),  b = W^(2)b^(1) + b^(2)

We need a non-linear operation between the layers
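The collapse can be checked numerically; a minimal sketch with illustrative dimensions, verifying that two stacked linear layers equal one linear layer:

```python
import numpy as np

D, H, M = 4, 5, 3                         # illustrative sizes
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((H, D)), rng.standard_normal(H)
W2, b2 = rng.standard_normal((M, H)), rng.standard_normal(M)
x = rng.standard_normal(D)

f_two_layer = W2.dot(W1.dot(x) + b1) + b2   # two linear layers
W, b = W2.dot(W1), W2.dot(b1) + b2          # collapsed single layer
f_one_layer = W.dot(x) + b
# The two outputs agree, so stacking linear layers adds no expressive power.
```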

Page 89:

Nonlinearities

Page 90:

Nonlinearities

Sigmoid

Page 91:

Nonlinearities

Sigmoid Tanh

Page 92:

Nonlinearities

Sigmoid Tanh ReLU

Page 93:

Nonlinearities

Sigmoid Tanh ReLU

Sigmoid: historically popular. Two big problems:
- Not zero-centered
- It saturates

Page 94:

Nonlinearities

Sigmoid Tanh ReLU

Sigmoid: historically popular. Two big problems:
- Not zero-centered
- It saturates

Tanh:
- Zero-centered
- But also saturates

Page 95:

Nonlinearities

Sigmoid Tanh ReLU

Sigmoid: historically popular. Two big problems:
- Not zero-centered
- It saturates

Tanh:
- Zero-centered
- But also saturates

ReLU:
- No saturation
- Very efficient

Page 96:

Nonlinearities

Sigmoid Tanh ReLU

Sigmoid: historically popular. Two big problems:
- Not zero-centered
- It saturates

Tanh:
- Zero-centered
- But also saturates

ReLU:
- No saturation
- Very efficient
- Best in practice for classification

Page 97:

Nonlinearities — Saturation

What happens if we reach this part?

Sigmoid

Page 98:

Nonlinearities — Saturation

What happens if we reach this part?

Sigmoid

Page 99:

Nonlinearities — Saturation

What happens if we reach this part?

Sigmoid

Page 100:

Nonlinearities — Saturation

What happens if we reach this part?

Sigmoid

Page 101:

Nonlinearities — Saturation

What happens if we reach this part?

σ(x) = 1 / (1 + e^(−x))

Page 102:

Nonlinearities — Saturation

What happens if we reach this part?

σ(x) = 1 / (1 + e^(−x)),  ∂σ/∂x = σ(x)(1 − σ(x))

Page 103:

Nonlinearities — Saturation

What happens if we reach this part?

σ(x) = 1 / (1 + e^(−x)),  ∂σ/∂x = σ(x)(1 − σ(x))

Page 104:

Nonlinearities — Saturation

What happens if we reach this part?

σ(x) = 1 / (1 + e^(−x)),  ∂σ/∂x = σ(x)(1 − σ(x))

Page 105:

Nonlinearities — Saturation

What happens if we reach this part?

σ(x) = 1 / (1 + e^(−x)),  ∂σ/∂x = σ(x)(1 − σ(x))

Page 106:

Nonlinearities — Saturation

What happens if we reach this part?

σ(x) = 1 / (1 + e^(−x)),  ∂σ/∂x = σ(x)(1 − σ(x))

Saturation: the gradient is zero!
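To see the vanishing gradient concretely, a small sketch evaluating the sigmoid derivative at the center and in the flat tail:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsigmoid(x):
    # derivative of the sigmoid: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

grad_center = dsigmoid(0.0)    # 0.25, the maximum possible gradient
grad_tail = dsigmoid(10.0)     # ~4.5e-05: effectively zero, learning stalls
```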

Page 107:

Nonlinearities

(Figure: training curves for tanh vs. ReLU) [Krizhevsky 2012] (AlexNet)

Page 108:

Nonlinearities

(Figure: training curves for tanh vs. ReLU)

In practice, ReLU converges ~6x faster than Tanh for classification problems

[Krizhevsky 2012] (AlexNet)

Page 112:

ReLU in NumPy

Many ways to write ReLU — these are all equivalent:

(a) Elementwise max, “0” gets broadcasted to match the size of h1:

(b) Make a boolean mask where negative values are True, and then set those entries in h1 to 0:

(c) Make a boolean mask where positive values are True, and then do an elementwise multiplication (since int(True) = 1):
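The code on these slides did not survive extraction; a minimal NumPy reconstruction of the three variants (h1 is a made-up example array):

```python
import numpy as np

h1 = np.array([[-2.0, 3.0], [0.5, -1.0]])  # example pre-activations

# (a) Elementwise max; the scalar 0 broadcasts to h1's shape
a = np.maximum(0, h1)

# (b) Boolean mask of negative entries; set those entries to 0
b = h1.copy()
b[b < 0] = 0

# (c) Boolean mask of positive entries; elementwise multiply
#     (works because int(True) = 1 and int(False) = 0)
c = h1 * (h1 > 0)
```

All three produce the same array: negative entries become 0, the rest pass through unchanged.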

Page 120:

2 Layer Neural Net

x → [ W^(1)x + b^(1) ] (Layer 1) → [ max(0, ·) ] (Nonlinearity) → h^(1) (“hidden activations”) → [ W^(2)h^(1) + b^(2) ] (Layer 2) → f

Let’s expand out the equation:

f = W^(2) max(0, W^(1)x + b^(1)) + b^(2)

Now it no longer simplifies — yay

Note: any nonlinear function will prevent this collapse, but not all nonlinear functions actually work in practice
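A sketch of this forward pass in NumPy (the dimensions and random weights are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

D, H, C = 4, 5, 3                 # input dim, hidden dim, number of classes
x = rng.standard_normal(D)

W1 = rng.standard_normal((H, D)); b1 = np.zeros(H)   # Layer 1
W2 = rng.standard_normal((C, H)); b2 = np.zeros(C)   # Layer 2

h1 = np.maximum(0, W1 @ x + b1)   # nonlinearity: hidden activations
f  = W2 @ h1 + b2                 # scores

# Identical to the expanded one-line form:
f2 = W2 @ np.maximum(0, W1 @ x + b1) + b2
```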

Page 121:

2 Layer Neural Net

Note: Traditionally, the nonlinearity was considered part of the layer and is called an “activation function”

In this class, we will consider them separate layers, but be aware that many others consider them part of the layer

Page 125:

Neural Net: Graphical Representation

2 layers

3 layers

- Called “fully connected” because every output depends on every input.

- Also called “affine” or “inner product”

Page 126:

Questions?

Page 128:

Neural Networks, More generally

x → [ Function ] → h^(1) → [ Function ] → h^(2) → ... → f

This can be:
- Fully connected layer
- Nonlinearity (ReLU, Tanh, Sigmoid)
- Convolution
- Pooling (Max, Avg)
- Vector normalization (L1, L2)
- Invent new ones

Page 135:

Neural Networks, More generally

x (Input) → [ Function, parameters θ^(1) ] → h^(1) (Hidden Activation) → [ Function, parameters θ^(2) ] → h^(2) → ... → f (Scores); f together with y (Ground Truth Labels) → L (Loss)

Here, θ represents whatever parameters that layer is using (e.g. for a fully connected layer, θ^(1) = { W^(1), b^(1) }).

Recall: the loss “L” measures how far the predictions “f” are from the labels “y”. The most common loss is Softmax.
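A sketch of that softmax (cross-entropy) loss on made-up scores: exponentiate the scores, normalize them into probabilities, and penalize the negative log-probability of the correct class.

```python
import numpy as np

f = np.array([2.0, 1.0, -1.0])   # example scores for 3 classes
y = 0                            # index of the ground-truth class

# Softmax: exponentiate (shifted by the max for numerical stability),
# then normalize so the entries form a probability distribution
p = np.exp(f - f.max())
p /= p.sum()

# Loss: negative log-probability of the correct class
L = -np.log(p[y])
```

The loss is small when p[y] is close to 1 and grows without bound as p[y] approaches 0.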

Page 141:

Neural Networks, More generally

x → [ Function, θ^(1) ] → h^(1) → [ Function, θ^(2) ] → h^(2) → ... → L

Goal: Find a value for the parameters (θ^(1), θ^(2), …) so that the loss (L) is small

- To keep things clean, we will sometimes hide “y”, but remember that it is always there

- From last class, we learned that to improve the weights, we can take a step in the negative gradient direction:

θ ← θ − α ∂L/∂θ

- How do we compute this?
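The update rule as a one-line NumPy sketch (the learning rate and gradient values here are made up; in practice the gradient comes from backpropagation):

```python
import numpy as np

alpha = 0.1                          # learning rate (illustrative value)
theta = np.array([1.0, -2.0])        # current parameters
grad_L = np.array([0.5, -0.5])       # pretend dL/dtheta from backprop

theta = theta - alpha * grad_L       # step in the negative gradient direction
```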

Page 142:

Backpropagation

[Rumelhart, Hinton, Williams. Nature 1986]

Page 151:

Backpropagation

Forward Propagation:

x → [ Function, θ^(1) ] (First layer) → h^(1) → ... → [ Function, θ^(n) ] (Last layer) → f → L

Backward Propagation:

L → ∂L/∂f → [ Last layer: produces ∂L/∂θ^(n) ] → ... → ∂L/∂h^(1) → [ First layer: produces ∂L/∂θ^(1) ] → ∂L/∂x

Page 155:

Backpropagation

Backward Propagation:

L → ∂L/∂h^(2) → [ Function: produces ∂L/∂θ^(2) ] → ∂L/∂h^(1) → [ Function: produces ∂L/∂θ^(1) ] → ∂L/∂x

First, compute ∂L/∂h^(2); then compute ∂L/∂θ^(2) and ∂L/∂h^(1).

Key observation: You can compute ∂L/∂θ^(2) and ∂L/∂h^(1) given only ∂L/∂h^(2), and you can do it one layer at a time.
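As a concrete instance of the key observation, here is a sketch of the local backward step for a fully connected layer h2 = W2 h1 + b2 (shapes and values are illustrative): given only ∂L/∂h2, it produces ∂L/∂W2, ∂L/∂b2, and ∂L/∂h1 to pass to the previous layer.

```python
import numpy as np

rng = np.random.default_rng(1)
H1, H2 = 4, 3
h1 = rng.standard_normal(H1)
W2 = rng.standard_normal((H2, H1)); b2 = np.zeros(H2)

h2 = W2 @ h1 + b2                 # forward through this layer

dL_dh2 = rng.standard_normal(H2)  # upstream gradient (given)

# Local backward step: gradients w.r.t. this layer's parameters and input
dL_dW2 = np.outer(dL_dh2, h1)     # shape (H2, H1), matches W2
dL_db2 = dL_dh2                   # shape (H2,), matches b2
dL_dh1 = W2.T @ dL_dh2            # passed back to the previous layer
```

Nothing beyond this layer's own inputs and the upstream gradient is needed, which is what lets backpropagation run one layer at a time.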

Page 158:

Chain rule recap

I hope everyone remembers the chain rule:

∂L/∂x = (∂L/∂h) (∂h/∂x)

Forward propagation: ... → x → [ Function ] → h → ...

Backward propagation: ... ← ∂L/∂x ← [ Function ] ← ∂L/∂h ← ...

… but what if x and h are multi-dimensional?
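For scalar x, the chain rule is easy to sanity-check numerically with finite differences (the layer and loss functions here are made up):

```python
import numpy as np

def h_of(x):  return np.tanh(x)   # some layer
def L_of(h):  return h ** 2       # some loss

x = 0.7
h = h_of(x)

# Analytic chain rule: dL/dx = (dL/dh) * (dh/dx)
dL_dh = 2 * h
dh_dx = 1 - np.tanh(x) ** 2
dL_dx = dL_dh * dh_dx

# Numerical check via central differences
eps = 1e-6
dL_dx_num = (L_of(h_of(x + eps)) - L_of(h_of(x - eps))) / (2 * eps)
```

When x and h are vectors, ∂h/∂x becomes a Jacobian matrix and the product above becomes a matrix-vector product.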