all you want to know about cnnsfidler/teaching/2015/slides/csc2523/... · 2016-01-21 · deep...

All You Want To Know About CNNs Yukun Zhu



Deep Learning


Image from http://imgur.com/


Deep Learning in Vision

Object detection performance (mAP, %), PASCAL VOC 2010

● DPM (2010): 33.4
● segDPM (2014): 40.4
● RCNN (2014): 53.7
● RCNN* (Oct 2014): 62.9
● segRCNN (Jan 2015): 67.2
● Fast RCNN (Jun 2015): 70.8

A Neuron

Image from http://cs231n.github.io/neural-networks-1/


A Neuron in Neural Network

Image from http://cs231n.github.io/neural-networks-1/

Activation Functions

● Sigmoid: $f(x) = 1 / (1 + e^{-x})$
● ReLU: $f(x) = \max(0, x)$
● Leaky ReLU: $f(x) = \max(ax, x)$
● Maxout: $f(x) = \max(w_0 x + b_0, w_1 x + b_1)$
● and many others…
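
The four activation functions listed above can be sketched in a few lines of plain Python. This is only an illustration for scalar inputs: the default slope `a=0.01` for Leaky ReLU and the Maxout weights used in the loop are assumed values, not taken from the slides.

```python
import math

def sigmoid(x):
    """Sigmoid: squashes x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """ReLU: passes positive inputs, zeroes out negative ones."""
    return max(0.0, x)

def leaky_relu(x, a=0.01):
    """Leaky ReLU: a is the slope for negative inputs (assumed default)."""
    return max(a * x, x)

def maxout(x, w0, b0, w1, b1):
    """Maxout with two linear pieces: the larger of two affine functions."""
    return max(w0 * x + b0, w1 * x + b1)

for x in (-2.0, 0.0, 2.0):
    # With w0=1, b0=0, w1=-1, b1=0 the maxout unit computes |x|.
    print(x, sigmoid(x), relu(x), leaky_relu(x), maxout(x, 1.0, 0.0, -1.0, 0.0))
```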


Neural Network (MLP)

Image modified from http://cs231n.github.io/neural-networks-1/

The network simulates a function y = f(x; w)

Forward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

$f(x_0, x_1) = 1 / (1 + \exp(-(w_0 x_0 + w_1 x_1 + w_2)))$

(In the figure, the inputs are $x_0$, $x_1$, and a constant 1; the last four nodes together form the sigmoid.)

Inputs: $w_0 = 2.00$, $x_0 = -1.00$, $w_1 = -3.00$, $x_1 = -2.00$, $w_2 = -3.00$

Values computed step by step through the graph:

● $w_0 x_0 = -2.00$
● $w_1 x_1 = 6.00$
● $w_0 x_0 + w_1 x_1 = 4.00$
● $w_0 x_0 + w_1 x_1 + w_2 = 1.00$
● Negate: $-1.00$
● Exponentiate: $0.37$
● Add 1: $1.37$
● Take the reciprocal: $0.73$
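
The forward pass on this slide can be traced node by node in Python. The sketch below uses the input values from the figure; the intermediate variable names (`a`, `b`, `c`, and so on) are hypothetical labels for the graph nodes.

```python
import math

# Input values from the slide's computation graph.
w0, x0 = 2.0, -1.0
w1, x1 = -3.0, -2.0
w2 = -3.0

a = w0 * x0        # first product
b = w1 * x1        # second product
c = a + b          # sum of the two products
d = c + w2         # add the bias weight
e = -d             # negate
g = math.exp(e)    # exponentiate
h = g + 1.0        # add one
f = 1.0 / h        # reciprocal: the sigmoid output

print([round(v, 2) for v in (a, b, c, d, e, g, h, f)])
```

Rounded to two decimals, the printed values match the numbers shown on the slide.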

Loss Function

The loss function measures how well a prediction matches the true value.

Commonly used loss functions (here $y$ is the prediction and $y'$ the true value):

● Squared loss: $(y - y')^2$
● Cross-entropy loss: $-\sum_i y_i' \log(y_i)$
● and many others
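
Both losses are short enough to write out directly. The sketch below assumes that `y` holds predictions and `y_true` holds targets; the small `eps` inside the logarithm is an added safeguard against `log(0)` and is not part of the slide's formula.

```python
import math

def squared_loss(y, y_true):
    """Squared difference between a scalar prediction and its target."""
    return (y - y_true) ** 2

def cross_entropy_loss(y, y_true, eps=1e-12):
    """Cross-entropy between predicted probabilities y and target distribution y_true."""
    return -sum(t * math.log(p + eps) for p, t in zip(y, y_true))

print(squared_loss(0.8, 1.0))
# A one-hot target picks out the log-probability of the correct class.
print(cross_entropy_loss([0.7, 0.2, 0.1], [1.0, 0.0, 0.0]))
```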

Loss Function

During training, we would like to minimize the total loss on a set of training data.

● We want to find $w^* = \arg\min_w \sum_i \mathrm{loss}(f(x_i; w), y_i)$
● Usually we use a gradient-based approach, where $a$ is the learning rate:
○ $w_{t+1} = w_t - a \nabla_w \sum_i \mathrm{loss}(f(x_i; w_t), y_i)$
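
As a minimal sketch of this update rule, the code below fits a linear model $f(x; w) = w_0 x + w_1$ with squared loss by plain gradient descent. The toy data, the learning rate of 0.02, and the number of steps are all assumptions chosen for illustration.

```python
# Hypothetical toy data generated by y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def total_loss(w):
    """Sum of squared losses over the training set."""
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in zip(xs, ys))

def gradient(w):
    """Gradient of the total loss with respect to w[0] and w[1]."""
    g0 = sum(2 * (w[0] * x + w[1] - y) * x for x, y in zip(xs, ys))
    g1 = sum(2 * (w[0] * x + w[1] - y) for x, y in zip(xs, ys))
    return [g0, g1]

w = [0.0, 0.0]
a = 0.02  # learning rate (assumed value)
for t in range(2000):
    g = gradient(w)
    # w_{t+1} = w_t - a * gradient
    w = [w[0] - a * g[0], w[1] - a * g[1]]

print(w, total_loss(w))
```

The weights should approach the generating values 2 and 1; a learning rate that is too large for the data would make the same loop diverge.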

Backward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

$f(x_0, x_1) = 1 / (1 + \exp(-(w_0 x_0 + w_1 x_1 + w_2)))$

Forward values: $w_0 = 2.00$, $x_0 = -1.00$, $w_1 = -3.00$, $x_1 = -2.00$, $w_2 = -3.00$; products $-2.00$ and $6.00$; their sum $4.00$; after adding $w_2$, $1.00$; negated, $-1.00$; exponentiated, $0.37$; plus one, $1.37$; output $0.73$.

Gradients, propagated from the output back to the inputs, with the local derivative used at each node:

● Output: $1.00$
● $f = 1/x$, $df/dx = -1/x^2$: $-0.53$
● $f = x + 1$, $df/dx = 1$: $-0.53$
● $f = e^x$, $df/dx = e^x$: $-0.20$
● $f = -x$, $df/dx = -1$: $0.20$
● $f = x + a$, $df/dx = 1$: $0.20$ passes unchanged to every input of the two additions, including $w_2$
● $f = ax$, $df/dx = a$: $w_0$: $-0.20$, $x_0$: $0.40$, $w_1$: $-0.40$, $x_1$: $-0.60$
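
The backward pass can be reproduced by applying the chain rule node by node, as in the sketch below (variable names are hypothetical labels for the graph nodes). One caveat: the slide multiplies the rounded value 0.20 by the inputs, giving 0.40, -0.40, and -0.60, whereas at full precision the same gradients round to 0.39, -0.39, and -0.59.

```python
import math

w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0

# Forward pass, keeping the intermediates needed for the backward pass.
s = w0 * x0 + w1 * x1 + w2   # weighted sum
e = -s                       # negate
g = math.exp(e)              # exponentiate
h = g + 1.0                  # add one
f = 1.0 / h                  # reciprocal (sigmoid output)

# Backward pass: multiply the incoming gradient by each local derivative.
df = 1.0
dh = df * (-1.0 / h ** 2)    # f = 1/x   -> df/dx = -1/x^2
dg = dh * 1.0                # f = x + 1 -> df/dx = 1
de = dg * math.exp(e)        # f = e^x   -> df/dx = e^x
ds = de * -1.0               # f = -x    -> df/dx = -1
dw2 = ds                     # f = x + a -> df/dx = 1
dw0, dx0 = ds * x0, ds * w0  # f = ax    -> df/dx = a
dw1, dx1 = ds * x1, ds * w1

print([round(v, 2) for v in (dh, dg, de, ds, dw0, dx0, dw1, dx1, dw2)])
```

A useful sanity check is that `ds` equals `f * (1 - f)`, the closed-form derivative of the sigmoid.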


Why NNs?

Universal Approximation Theorem

A feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of $\mathbb{R}^n$, under mild assumptions on the activation function.

https://en.wikipedia.org/wiki/Universal_approximation_theorem

Stone’s Theorem

● Suppose X is a compact Hausdorff space and B is a subalgebra of C(X, R) such that:
○ B separates points.
○ B contains the constant function 1.
○ If f ∈ B, then af ∈ B for all constants a ∈ R.
○ If f, g ∈ B, then f + g, max{f, g} ∈ B.
● Then every continuous function in C(X, R) can be approximated as closely as desired by functions in B.


Why CNNs?

Problems of MLP in Vision

For a 10 × 10 input image:

● A 3-layer MLP with 200 hidden units per layer contains ~100k parameters

For a 100 × 100 input image:

● A 1-layer MLP with 20k hidden units contains ~200m parameters
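
These counts can be checked with a few lines of Python. The sketch assumes that the 3-layer MLP has three fully connected layers of 200 units each on top of the flattened image, and that biases and any output layer are ignored; the helper name `dense_weights` is hypothetical.

```python
def dense_weights(layer_sizes):
    """Count weights (biases ignored) in a fully connected net with the given layer widths."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 10 x 10 image, three layers of 200 units each.
small = dense_weights([10 * 10, 200, 200, 200])
# 100 x 100 image, one layer of 20k units.
large = dense_weights([100 * 100, 20_000])

print(small, large)
```

Under these assumptions the counts come out to exactly 100,000 and 200,000,000, which agrees with the approximate figures on the slide.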

Can We Do Better?

Based on these observations, the MLP can be improved in two ways:

● Locally connected instead of fully connected
● Sharing weights between neurons

We achieve both by using convolutional neurons.
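
To make the two ideas concrete, here is a minimal sketch of a single-channel convolution without padding or stride (as is usual in CNNs, the kernel is not flipped). Each output value depends only on a small window of the input (local connectivity), and the same `kernel` is reused at every position (weight sharing). The function name and the example values are hypothetical.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide one shared kernel over the image; no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = bias
            # Only the kh x kw window at (r, c) contributes to this output.
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [1, 0, 1, 3]]
kernel = [[1, 0],
          [0, -1]]   # 4 shared weights, regardless of image size

result = conv2d_valid(image, kernel)
print(result)
```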

Convolutional Layers

Image from http://cs231n.github.io/convolutional-networks/. See this page for an excellent example of convolution.

Figure labels: depth, height, width


Pooling Layers

Image from http://cs231n.github.io/convolutional-networks/


Pooling Layers Example: Max Pooling

Image from http://cs231n.github.io/convolutional-networks/

Pooling Layers

Commonly used pooling layers:

● Max pooling
● Average pooling

Why pooling layers?

● Reduce activation dimensionality
● Robust against tiny shifts
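
A minimal sketch of max pooling on a single 2D activation map follows. The 2 × 2 window with stride 2 is an assumed (though common) setting, and the function name is hypothetical. The 4 × 4 input shrinks to 2 × 2, which illustrates the dimensionality reduction.

```python
def max_pool2d(x, size=2, stride=2):
    """Take the maximum over each size x size window, moving by stride."""
    out = []
    for r in range(0, len(x) - size + 1, stride):
        row = []
        for c in range(0, len(x[0]) - size + 1, stride):
            row.append(max(x[r + i][c + j]
                           for i in range(size) for j in range(size)))
        out.append(row)
    return out

x = [[1, 1, 2, 4],
     [5, 6, 7, 8],
     [3, 2, 1, 0],
     [1, 2, 3, 4]]

pooled = max_pool2d(x)
print(pooled)  # [[6, 8], [3, 4]]
```

Replacing `max` with an average over the window would give average pooling.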


CNN Architecture: An Example

Image from http://cs231n.github.io/convolutional-networks/

Layer Activations for CNNs

Image modified from http://cs231n.github.io/convolutional-networks/

Conv:1 ReLU:1 Conv:2 ReLU:2 MaxPool:1 Conv:3

MaxPool:2 Conv:5 ReLU:5 Conv:6 ReLU:6 MaxPool:3


Learnt Weights for CNNs: First Conv Layer of AlexNet

Image from http://cs231n.github.io/convolutional-networks/

Why Do CNNs Work Now?

Convolutional Neural Networks

● Faster heterogeneous parallel computing
○ CPU clusters, GPUs, etc.
● Large datasets
○ ImageNet: 1.2m images of 1,000 object classes
○ COCO: 300k images of 2m object instances
● Improvements in model architecture
○ ReLU, dropout, inception, etc.


AlexNet

Krizhevsky, Alex, et al. "Imagenet classification with deep convolutional neural networks." NIPS 2012


GoogLeNet

Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).

Quiz

Number of parameters for the first conv layer of AlexNet?

Quiz

Number of parameters if the first layer is fully connected?
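
One way to work through both quiz questions is sketched below. It relies on details from the AlexNet paper rather than from these slides: 96 filters of size 11 × 11 × 3 with stride 4. The 227 × 227 input size is an assumption (the paper states 224, but 227 is what makes the arithmetic give a 55 × 55 output), and the fully connected case assumes the layer must produce the same number of outputs as the conv layer.

```python
def conv_output_size(in_size, kernel, stride, pad=0):
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * pad - kernel) // stride + 1

in_size, in_channels = 227, 3        # assumed input: 227 x 227 x 3
kernel, stride, n_filters = 11, 4, 96

out_size = conv_output_size(in_size, kernel, stride)

# Convolution: each filter has kernel*kernel*in_channels weights plus one bias.
conv_params = n_filters * (kernel * kernel * in_channels) + n_filters

# Fully connected: every input value connects to every output value.
n_inputs = in_size * in_size * in_channels
n_outputs = out_size * out_size * n_filters
fc_params = n_inputs * n_outputs + n_outputs

print(out_size, conv_params, fc_params)
```

Under these assumptions the conv layer needs about 35 thousand parameters, while the fully connected version would need tens of billions.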

Quiz

Given a convolution operation written as

$f(x_{3 \times 3}; w_{3 \times 3}, b) = \sum_{i,j} x_{i,j} w_{i,j} + b$

Can you derive its gradients ($df/dx$, $df/dw$, $df/db$)?
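
For readers who want to check their answer: since $f$ is linear in each argument, $df/dx_{i,j} = w_{i,j}$, $df/dw_{i,j} = x_{i,j}$, and $df/db = 1$. The sketch below verifies one entry with a finite-difference estimate; the example values and the step size `eps` are arbitrary choices.

```python
def f(x, w, b):
    """The quiz's 3x3 convolution: elementwise product, summed, plus bias."""
    return sum(x[i][j] * w[i][j] for i in range(3) for j in range(3)) + b

def grads(x, w, b):
    """Analytic gradients: df/dx = w, df/dw = x, df/db = 1."""
    df_dx = [row[:] for row in w]
    df_dw = [row[:] for row in x]
    df_db = 1.0
    return df_dx, df_dw, df_db

x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
w = [[0.5, -1.0, 0.0], [2.0, 1.0, -0.5], [0.0, 1.5, -2.0]]
b = 0.1

df_dx, df_dw, df_db = grads(x, w, b)

# Finite-difference check of df/dw[0][1], which should equal x[0][1].
eps = 1e-6
w_plus = [row[:] for row in w]
w_plus[0][1] += eps
numeric = (f(x, w_plus, b) - f(x, w, b)) / eps

print(df_dw[0][1], numeric)
```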


Ready to Build Your Own Networks?

Tips and Tricks for CNNs

● Know your data, clean your data, and normalize your data
○ A common trick: subtract the mean and divide by the standard deviation.

Image from http://cs231n.github.io/neural-networks-2/
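
The normalization trick can be sketched with the standard library alone. The pixel values are hypothetical, and the small `eps` in the denominator is an added guard against division by zero for constant inputs.

```python
import statistics

def normalize(values, eps=1e-8):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / (std + eps) for v in values]

pixels = [0.0, 64.0, 128.0, 255.0]   # hypothetical pixel intensities
normed = normalize(pixels)

print(normed)
print(statistics.mean(normed), statistics.pstdev(normed))
```

In practice the mean and standard deviation are computed on the training set only and then reused for validation and test data.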

Tips and Tricks for CNNs

● Augment your data

Tips and Tricks for CNNs

● Organize your data:
○ Keep training data balanced
○ Shuffle data before batching
● Feed your data in the correct way
○ Image channel order
○ Tensor storage order


Tips and Tricks for CNNs

First Order, in order. First Order, out of order.

Tips and Tricks for CNNs

Common tensor storage orders (B = batch, D = depth, R = row, C = column):

● BDRC
○ Used in Caffe, Torch, Theano, supported by cuDNN
○ Pros: faster for convolution (FFT, memory access)
● BRCD
○ Used in TensorFlow, limited support by cuDNN
○ Pros: fast batch normalization, easier batching
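
The difference between the two orders is simply which index varies fastest in memory. The sketch below computes flat offsets for both layouts and converts a flat BDRC buffer to BRCD; all function names are hypothetical, and real frameworks do this with their own transpose routines.

```python
def offset_bdrc(b, d, r, c, D, R, C):
    """Flat index in batch-depth-row-column order (column varies fastest)."""
    return ((b * D + d) * R + r) * C + c

def offset_brcd(b, d, r, c, D, R, C):
    """Flat index in batch-row-column-depth order (depth varies fastest)."""
    return ((b * R + r) * C + c) * D + d

def bdrc_to_brcd(flat, B, D, R, C):
    """Copy a flat BDRC buffer into BRCD order."""
    out = [None] * len(flat)
    for b in range(B):
        for d in range(D):
            for r in range(R):
                for c in range(C):
                    out[offset_brcd(b, d, r, c, D, R, C)] = \
                        flat[offset_bdrc(b, d, r, c, D, R, C)]
    return out

# One image, 2 channels, 1 row, 2 columns: channel 0 = [0, 1], channel 1 = [2, 3].
converted = bdrc_to_brcd([0, 1, 2, 3], B=1, D=2, R=1, C=2)
print(converted)  # [0, 2, 1, 3]: the channels of each pixel are now adjacent
```

Feeding a buffer in one order to code that expects the other does not raise an error; it silently scrambles the image, which is why the storage order is worth checking.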

Tips and Tricks for CNNs

Designing model architecture

● Convolution, max pooling, then fully connected layers
● Nonlinearity
○ Stay away from sigmoid (except for output)
○ ReLU preferred
○ Leaky ReLU as the next choice
○ Use Maxout if most ReLU units die (have zero activation)

Tips and Tricks for CNNs

Setting parameters

● Weights
○ Random initialization with proper variance
● Biases
○ For ReLU we prefer a small positive bias to activate ReLU
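
The slide does not say which variance is "proper". One widely used choice for ReLU layers, assumed here, is He initialization: zero-mean Gaussian weights with variance 2 / fan_in. The bias value 0.01 is likewise an assumed example of a "small positive bias", and the function name is hypothetical.

```python
import math
import random

def init_layer(fan_in, fan_out, bias=0.01, seed=0):
    """Gaussian weights with variance 2/fan_in (He init, an assumed choice) and constant biases."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 / fan_in)
    weights = [[rng.gauss(0.0, std) for _ in range(fan_in)]
               for _ in range(fan_out)]
    biases = [bias] * fan_out
    return weights, biases

weights, biases = init_layer(fan_in=500, fan_out=100)

# The empirical standard deviation should be close to sqrt(2 / 500).
flat = [v for row in weights for v in row]
mean = sum(flat) / len(flat)
emp_std = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat))
print(emp_std, math.sqrt(2.0 / 500))
```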

Tips and Tricks for CNNs

Setting hyperparameters

● Learning Rate / Momentum ($\Delta w_t^* = \Delta w_t + m \Delta w_{t-1}$)
○ Decrease the learning rate while training
○ Set momentum to 0.8 - 0.9
● Batch Size
○ For large datasets: set to whatever fits in memory
○ For smaller datasets: find a tradeoff between instance randomness and gradient smoothness
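
A minimal sketch of the momentum update combined with a decaying learning rate follows. It assumes that the previous update in the formula means the previously applied (momentum-adjusted) step, as in standard SGD with momentum. The step-decay schedule (halving every 50 steps), the starting learning rate, and the toy objective $(w - 3)^2$ are all assumptions for illustration.

```python
def sgd_momentum(grad_fn, w, lr=0.1, m=0.9, decay=0.5, decay_every=50, steps=200):
    """Scalar SGD with momentum and step-wise learning rate decay."""
    prev_update = 0.0
    for t in range(steps):
        if t > 0 and t % decay_every == 0:
            lr *= decay                      # decrease learning rate while training
        # New step = plain gradient step + m * previous step.
        update = -lr * grad_fn(w) + m * prev_update
        w += update
        prev_update = update
    return w

# Toy objective (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3.
w_final = sgd_momentum(lambda w: 2.0 * (w - 3.0), w=0.0)
print(w_final)
```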

Tips and Tricks for CNNs

Monitoring your training:

● Split your dataset into training, validation, and test sets
○ Tune your hyperparameters on the validation set and evaluate on the test set
○ Keep track of training and validation loss during training
○ Do early stopping if training and validation loss diverge
○ Loss doesn’t tell you everything. Try precision, class-wise precision, and more
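
Early stopping is often implemented with a patience counter: training stops once the validation loss has not improved for a fixed number of epochs, and the best checkpoint is kept. The sketch below shows only that decision logic on a hypothetical list of validation losses; the patience of 3 is an assumed setting.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch with the best validation loss seen before stopping."""
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break   # no improvement for `patience` epochs: stop training
    return best_epoch

# Hypothetical validation losses: improving, then rising as the model overfits.
val_losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.8, 0.6]
best_epoch = early_stopping_epoch(val_losses)
print(best_epoch)  # 2
```

Note that training stops at epoch 5 here, so the later value 0.6 is never seen; a larger patience trades extra training time for a lower risk of stopping too soon.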

Tips and Tricks for CNNs

Borrow knowledge from another dataset

● Pre-train your CNN on a large dataset (e.g. ImageNet)
● Remove / reshape the last few layers
● Fix the parameters of the first few layers, or make the learning rate small for them
● Fine-tune the parameters on your own dataset

Tips and Tricks for CNNs

Debugging

● import unittest, not import pdb
● Check your gradient [**deprecated**]
● Make your model large enough, and try overfitting the training set
● Check gradient norms, weight norms, and activation norms


Talk is Cheap, Show Me Some Code


Image from http://www.linkresearchtools.com/


Fully Convolutional Networks

Long, Jonathan, et al. "Fully convolutional networks for semantic segmentation." arXiv preprint arXiv:1411.4038 (2014).