"Deep Learning" Chap.6 Convolutional Neural Net

Chapter 6 Convolutional Neural Network 2015.7.15 wed. @kenmatsu4

Upload: kenichi-matsui

Post on 08-Jan-2017


TRANSCRIPT

Page 1: "Deep Learning" Chap.6 Convolutional Neural Net

Chapter 6 Convolutional Neural Network

2015.7.15 wed.@kenmatsu4

Page 2: "Deep Learning" Chap.6 Convolutional Neural Net

Self-introduction

・Twitter account: @kenmatsu4 (Please follow me!)

・Blog

I write my blog posts on Qiita (Japanese only), on statistics, machine learning, Python, etc.

   http://qiita.com/kenmatsu4 (over 2,000 contributions!)

・My hobbies
   - Playing the bass guitar with my band.
   - Traveling to foreign countries, especially Southeast Asia (Cambodia, Myanmar, Bangladesh, the Uyghur region, etc.). Pictures from my travels: http://matsu-ken.jimdo.com

Page 3: "Deep Learning" Chap.6 Convolutional Neural Net

Information

・Japanese version of this slide: http://www.slideshare.net/matsukenbook/ss-50545587

Page 4: "Deep Learning" Chap.6 Convolutional Neural Net

“Deep Learning”, Chapter 6: Convolutional Neural Net

Author: Takayuki Okatani, Machine Learning Professional Series, ISBN: 978-4-06-152902-1

This slide deck is for a study group. The book is a very good introduction to deep learning. Let’s buy it!

Unfortunately, it is available in Japanese only…

Page 5: "Deep Learning" Chap.6 Convolutional Neural Net

MASAKARI (harsh feedback), come on!!! Let’s study together.

https://twitter.com/_inundata/status/616658949761302528

Page 6: "Deep Learning" Chap.6 Convolutional Neural Net

To process images with a neural network, let’s use knowledge from neuroscience!

Page 7: "Deep Learning" Chap.6 Convolutional Neural Net

• Receptive field • Simple cells • Complex cells

Using analogy of neuroscience

Page 8: "Deep Learning" Chap.6 Convolutional Neural Net

Receptive field ≒ Retina cells

http://bsd.neuroinf.jp/wiki/%e5%8f%97%e5%ae%b9%e9%87%8e

ON-centered, OFF-surround

OFF-centered, ON-surround

(ON region / OFF region)

Page 9: "Deep Learning" Chap.6 Convolutional Neural Net

On-center cell / Off-center cell

https://en.wikipedia.org/wiki/Hypercomplex_cell

Receptive field ≒ Retina cells

Page 10: "Deep Learning" Chap.6 Convolutional Neural Net

Simple Cells and Complex Cells

https://en.wikipedia.org/wiki/Hypercomplex_cell

A simple cell is formed by aligning receptive fields in a line. When light falls on the + area and not on the − area, an excitatory response occurs.

When light falls on the + and − areas simultaneously, no excitatory response occurs.

Simple Cells

Page 11: "Deep Learning" Chap.6 Convolutional Neural Net

http://www.cns.nyu.edu/~david/courses/perception/lecturenotes/V1/lgn-V1.html

Complex cells keep responding as the stimulus is shifted in parallel; however, they do not respond when it is rotated.

Simple Cells and Complex Cells

Complex Cells

Page 12: "Deep Learning" Chap.6 Convolutional Neural Net

The main topic starts here.

We treat this knowledge from neuroscience mathematically, and apply it to “object category recognition”.

Page 13: "Deep Learning" Chap.6 Convolutional Neural Net

Model of Simple Cells and Complex Cells

Receptive-field

Simple Cell

Complex Cell

The pink part is the filter.

The blue cells indicate the input signal.


Page 20: "Deep Learning" Chap.6 Convolutional Neural Net

Model of Simple Cells and Complex Cells

The input pattern has been shifted in parallel.

Receptive-field

Simple Cell

The simple cell on the upper left no longer responds, due to the position change.

Complex Cell

Page 21: "Deep Learning" Chap.6 Convolutional Neural Net

If the input is rotated…

the cell does not respond.

Model of Simple Cells and Complex Cells

Receptive-field

Simple Cell

Complex Cell

Page 22: "Deep Learning" Chap.6 Convolutional Neural Net

Related methods

• Neocognitron: the first engineering application of the two-layer structure (simple cells, complex cells) to pattern recognition

• LeNet: considered to be the origin of the convolutional neural net ( http://yann.lecun.com/exdb/lenet/ )

Page 23: "Deep Learning" Chap.6 Convolutional Neural Net

Whole Structure

Page 24: "Deep Learning" Chap.6 Convolutional Neural Net

Types of layers used in a CNN

• fully-connected layer
• convolution layer
• pooling layer
• Local Contrast Normalization (LCN) layer

→ The layers discussed in the previous chapter are fully-connected: the output of layer l−1 is input to every unit of layer l.

Page 25: "Deep Learning" Chap.6 Convolutional Neural Net

Structure of typical CNN

input (image)

convolution

convolution

pooling

LCN

convolution

pooling

fully-connected

fully-connected

softmax

output (category label)

In many cases, a pooling layer is placed after a couple of convolution layers; sometimes an LCN layer is placed after that. If the purpose is classification, the softmax function, a multivariate generalization of the sigmoid function, is usually used at the output.

Softmax function: f_i(\mathbf{x}) = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)}

example
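Before the Chainer example, here is a minimal sketch of the softmax function itself in NumPy (illustrative only, not from the slides):

```python
import numpy as np

def softmax(x):
    # Subtracting the max is a standard numerical-stability trick;
    # it does not change the result because the common factor cancels.
    e = np.exp(x - np.max(x))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
# probs is non-negative, sums to 1, and the largest input gets the largest probability
```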

Page 26: "Deep Learning" Chap.6 Convolutional Neural Net

def forward(self, x_data, y_data, train=True):
    x = Variable(x_data, volatile=not train)
    t = Variable(y_data, volatile=not train)

    h = F.relu(self.conv1(x))
    h = F.relu(self.conv1a(h))
    h = F.relu(self.conv1b(h))
    h = F.max_pooling_2d(h, 3, stride=2)
    h = F.relu(self.conv2(h))
    h = F.relu(self.conv2a(h))
    h = F.relu(self.conv2b(h))
    h = F.max_pooling_2d(h, 3, stride=2)
    h = F.relu(self.conv3(h))
    h = F.relu(self.conv3a(h))
    h = F.relu(self.conv3b(h))
    h = F.dropout(F.max_pooling_2d(h, 3, stride=2), train=train)
    h = F.relu(self.conv4(h))
    h = F.relu(self.conv4a(h))
    h = F.relu(self.conv4b(h))
    h = F.reshape(F.average_pooling_2d(h, 6), (x_data.shape[0], 1000))
    return F.softmax_cross_entropy(h, t), F.accuracy(h, t)

Example from Chainer (a deep learning framework)

https://github.com/pfnet/chainer/tree/master/examples/imagenet

Page 27: "Deep Learning" Chap.6 Convolutional Neural Net

Definition of Convolution

Page 28: "Deep Learning" Chap.6 Convolutional Neural Net

Definition of Convolution

Address map of a W × W pixel image (W pixels × W pixels), with x_ij denoting the value at address (i, j):

(0,0)   (0,1)   ・・・ (0, W−2)   (0, W−1)
(1,0)   (1,1)   ・・・ (1, W−2)   (1, W−1)
・・・
(W−2,0) (W−2,1) ・・・ (W−2, W−2) (W−2, W−1)
(W−1,0) (W−1,1) ・・・ (W−1, W−2) (W−1, W−1)

Example of W × W pixel data: a mostly-zero image with a short anti-diagonal of 1s near the top-left corner.

Filter of H × H pixels, e.g.:

0.01 0.02 0.05 0.15
0.02 0.05 0.15 0.05
0.05 0.15 0.05 0.02
0.15 0.05 0.02 0.01

Definition of the convolution of pixels:

u_{ij} = \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} x_{i+p,\,j+q} \, h_{pq}

※ Strictly speaking, the sign of x’s indices just before p and q should be −, but there is no substantial difference with this notation, so + is also fine.
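The definition above can be written out directly as a naive NumPy loop (an illustrative reference implementation, not from the slides; real frameworks use far faster algorithms):

```python
import numpy as np

def convolve2d(x, h):
    # Naive "valid" convolution: u[i, j] = sum_p sum_q x[i+p, j+q] * h[p, q]
    W, H = x.shape[0], h.shape[0]
    out = W - H + 1          # the output shrinks when no padding is used
    u = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            u[i, j] = np.sum(x[i:i + H, j:j + H] * h)
    return u

x = np.arange(16, dtype=float).reshape(4, 4)
h = np.ones((2, 2)) / 4.0    # a 2x2 averaging filter
u = convolve2d(x, h)         # shape (3, 3); u[0, 0] = mean(0, 1, 4, 5) = 2.5
```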


Page 30: "Deep Learning" Chap.6 Convolutional Neural Net

Role of Convolution

Convolving Lenna’s image with a cos filter:

https://gist.github.com/matsuken92/5b78c792f2ab98576c5c

u_{ij} = \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} x_{i+p,\,j+q} \, h_{pq}

This extracts light/dark (contrast) features from the image.


Page 32: "Deep Learning" Chap.6 Convolutional Neural Net

By the way…

Page 33: "Deep Learning" Chap.6 Convolutional Neural Net

The filter size is・・・

Role of Convolution

Page 34: "Deep Learning" Chap.6 Convolutional Neural Net

The filter size is・・・ like this.

Role of Convolution

Page 35: "Deep Learning" Chap.6 Convolutional Neural Net

Padding

A way to prepare the image so that filtering works properly at the edges without reducing the image size.

Without padding, the output shrinks by ⌊H/2⌋ pixels on each side, to (W − 2⌊H/2⌋) × (W − 2⌊H/2⌋). (⌊·⌋ means round down to an integer.) Padding is used in order to avoid this reduction.

Page 36: "Deep Learning" Chap.6 Convolutional Neural Net

Padding

Question: if we interpret the equation

u_{ij} = \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} x_{i+p,\,j+q} \, h_{pq}

straightforwardly, with x_{00} at the top-left, isn’t the reduced area H − 1 pixels wide (on the bottom and right), as in the figure on the left?

Page 37: "Deep Learning" Chap.6 Convolutional Neural Net

Zero-padding

0 0 0 0 0 0 0 0 0 0

0 77 80 82 78 70 82 82 140 0

0 83 78 80 83 82 77 94 151 0

0 87 82 81 80 74 75 112 152 0

0 87 87 85 77 66 99 151 167 0

0 84 79 77 78 76 107 162 160 0

0 86 72 70 72 81 151 166 151 0

0 78 72 73 73 107 166 170 148 0

0 76 76 77 84 147 180 168 142 0

0 0 0 0 0 0 0 0 0 0

The method in which the padding area is filled with 0s.

→ This is widely used in convolutional neural nets.

Demerit: as a consequence of convolving with zero-padding, the area around the edges becomes darker.

Other methods: filling with the outermost pixels (replication), or filling with the pixels folded back at the four sides (reflection).
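As a quick illustration (not from the slides), NumPy’s np.pad supports all three strategies mentioned above:

```python
import numpy as np

img = np.array([[1, 2],
                [3, 4]])

zero    = np.pad(img, 1, mode="constant")  # zero-padding: border filled with 0s
edge    = np.pad(img, 1, mode="edge")      # replicate the outermost pixels
reflect = np.pad(img, 1, mode="reflect")   # fold pixels back at the borders
```

With 1 pixel of padding, a 2×2 image becomes 4×4, so a 3×3 filter could then be applied without shrinking the result.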

Page 38: "Deep Learning" Chap.6 Convolutional Neural Net

Stride

77 80 82 78 70 82 82 140

83 78 80 83 82 77 94 151

87 82 81 80 74 75 112 152

87 87 85 77 66 99 151 167

84 79 77 78 76 107 162 160

86 72 70 72 81 151 166 151

78 72 73 73 107 166 170 148

76 76 77 84 147 180 168 142

When the filter is slid several pixels at a time, rather than one pixel at a time, while computing the sums of products, the step size is called the “stride”. If you handle very large images, a stride keeps the number of output units from becoming too large (a trade-off against performance degradation).

u_{ij} = \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} x_{si+p,\,sj+q} \, h_{pq}

s : stride

Output image size when a stride is applied:

(⌊(W − 1)/s⌋ + 1) × (⌊(W − 1)/s⌋ + 1)

It is common for the stride to be 2 or more in a pooling layer.
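A one-line check of the output-size formula (illustrative; the helper name is ours):

```python
def strided_output_size(W, s):
    # output size per side: floor((W - 1) / s) + 1
    return (W - 1) // s + 1

size = strided_output_size(8, 2)   # an 8x8 input with stride 2 gives a 4x4 output
```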


Page 42: "Deep Learning" Chap.6 Convolutional Neural Net

Convolution Layer

Page 43: "Deep Learning" Chap.6 Convolutional Neural Net

Convolution Layer

The convolution layer corresponds to the simple cells, as shown in the following figure.

The pink part is the filter; the blue cells indicate the input signal.

Receptive-field

Simple Cell

Complex Cell

Page 44: "Deep Learning" Chap.6 Convolutional Neural Net

In a practical convolutional neural net, the convolution is computed with parallel filters over a multi-channel image (e.g. RGB), not over just a single grayscale image.

W : the number of pixels (per side)

K : the number of channels, e.g. K = 3 for an RGB image

Image size: W × W × K

Convolution Layer

Page 45: "Deep Learning" Chap.6 Convolutional Neural Net

In some contexts, an image of this size, W × W × K, is called a “map”.

Much larger channel sizes (e.g. K = 16, K = 256, etc.) are commonly used in hidden layers (convolution or pooling layers).

Convolution Layer

Page 46: "Deep Learning" Chap.6 Convolutional Neural Net

The equation to obtain u_{ijm} is the following:

u_{ijm} = \sum_{k=0}^{K-1} \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} z^{(l-1)}_{i+p,\,j+q,\,k} \, h_{pqkm} + b_{ijm}

(Figure: the W × W × K input is convolved with filter m = 0 of size H × H × K, weights h_{pqk0} and bias b_0, then passed through the activation f(·) to produce u_{ij0} and z_{ij0}.)

The bias is commonly set as b_{ijm} = b_m, which does not depend on the pixel position (i, j); it acts like an overall density offset for the whole map u_{ijm}.

Convolution Layer
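The triple-sum formula can be sketched naively in NumPy (an illustrative “valid” convolution with no padding, not the book’s implementation; all variable names are ours):

```python
import numpy as np

def conv_layer(z, h, b):
    # u[i, j, m] = sum_k sum_p sum_q z[i+p, j+q, k] * h[p, q, k, m] + b[m]
    W, _, K = z.shape
    H, _, _, M = h.shape
    out = W - H + 1
    u = np.zeros((out, out, M))
    for m in range(M):
        for i in range(out):
            for j in range(out):
                u[i, j, m] = np.sum(z[i:i + H, j:j + H, :] * h[:, :, :, m]) + b[m]
    return u

z = np.random.rand(8, 8, 3)      # W = 8, K = 3 (e.g. an RGB patch)
h = np.random.rand(3, 3, 3, 16)  # H = 3, K = 3, M = 16 filters
b = np.zeros(16)
u = conv_layer(z, h, b)          # shape (6, 6, 16)
```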

Page 47: "Deep Learning" Chap.6 Convolutional Neural Net

Convolution Layer

(Figure: the W × W × K input convolved with Filter 1 (m = 0), weights h_{pqk0} and bias b_0, activation f(·), producing u_{ij0} and z_{ij0}.)

Identical weight values h_{pqk0} are used for every output pixel z_{ij0}; this is called “weight sharing” or “weight tying”.


Page 50: "Deep Learning" Chap.6 Convolutional Neural Net

(Figure: the input z^{(l−1)}_{ijk} of size W × W × K is convolved with M = 3 filters (Filter 1, 2, 3), each of size H × H × K with weights h_{pqk0}, h_{pqk1}, h_{pqk2} and biases b_0, b_1, b_2; each result u_{ij0}, u_{ij1}, u_{ij2} passes through f(·) to give the outputs z^{(l)}_{ijm}: z_{ij0}, z_{ij1}, z_{ij2}.)

Page 51: "Deep Learning" Chap.6 Convolutional Neural Net

The output of a convolution layer can be regarded as a multi-channel image of size W × W × M, interpreting the number of filters as the channel size.

Convolution Layer

Page 52: "Deep Learning" Chap.6 Convolutional Neural Net

(Figure: when M = 3, there are three H × H × K filters.)

The parameter size does not depend on the size of the image (W × W). The parameter size is

H × H × K × M

that is, filter size × filter size × channel size × the number of filters.

Convolution Layer
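A quick arithmetic check of this count (illustrative; the helper name is ours, and biases would add M more parameters on top):

```python
def conv_param_count(H, K, M):
    # filter size x filter size x channel size x number of filters
    return H * H * K * M

n = conv_param_count(3, 3, 16)   # 3x3 filters, K = 3 channels, M = 16 filters -> 432
```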

Page 53: "Deep Learning" Chap.6 Convolutional Neural Net

Gradient descent is applied for parameter optimization of the convolutional neural net, too.

The targets of optimization are the weights h_{pqkm} and the biases b_{ijm} in

u_{ijm} = \sum_{k=0}^{K-1} \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} z^{(l-1)}_{i+p,\,j+q,\,k} \, h_{pqkm} + b_{ijm}

For the calculation of the gradients, back propagation is also used (explained in detail later).

Convolution Layer

Page 54: "Deep Learning" Chap.6 Convolutional Neural Net

Pooling Layer

Page 55: "Deep Learning" Chap.6 Convolutional Neural Net

Pooling Layer

Generally, a pooling layer is located just after a convolution layer.

input (image)

convolution

convolution

pooling

LCN

convolution

pooling

fully-connected

fully-connected

softmax

output (category label)

example

Page 56: "Deep Learning" Chap.6 Convolutional Neural Net

The pooling layer is the final layer in the following figure (the complex-cell part). It is designed so that the output of the pooling layer stays unchanged even if the target feature changes slightly (e.g. a small parallel shift).

Pooling Layer

The pink part is the filter.

The blue cells indicate the input signal.

Receptive-field

Pooling layer

Simple Cell

Complex Cell


Page 59: "Deep Learning" Chap.6 Convolutional Neural Net

Pooling Layer

P_{ij} denotes the set of pixels included in the H × H area at position (i, j) of the W × W input z_{ij}.

One output value u_{ijk} is obtained from the H² pixel values in that area, for every channel k. (Padding is applied as needed.)

Page 60: "Deep Learning" Chap.6 Convolutional Neural Net

1. Max pooling

2. Average pooling

3. Lp pooling

3 Types of pooling layer

Page 61: "Deep Learning" Chap.6 Convolutional Neural Net

Use the maximum value of the pixels in the area.

77 80 82 78 70 82 82 140

83 78 80 83 82 77 94 151

87 82 81 80 74 75 112 152

87 87 85 77 66 99 151 167

84 79 77 78 76 107 162 160

86 72 70 72 81 151 166 151

78 72 73 73 107 166 170 148

76 76 77 84 147 180 168 142

1.Max Pooling

87 87 87 83 112 152 152 152

87 87 87 99 151 167 167 167

87 87 87 107 162 167 167 167

87 87 87 151 166 167 167 167

87 87 107 166 170 170 170 170

87 87 147 180 180 180 180 180

86 86 147 180 180 180 180 180

86 86 147 180 180 180 180 180

u_{ijk} = \max_{(p,q) \in P_{ij}} z_{pqk}

(left: input z_{pqk}, right: output u_{ijk}, over H² areas)

This is the standard choice for image recognition.
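A minimal sketch of max pooling (illustrative, not from the slides; here the window is not padded, unlike the slide’s 8×8 example whose output keeps the input size):

```python
import numpy as np

def max_pool(z, H, s):
    # Max pooling with H x H windows and stride s on a 2-D array
    out = (z.shape[0] - H) // s + 1
    u = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            u[i, j] = z[i * s:i * s + H, j * s:j * s + H].max()
    return u

z = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [13, 14, 15, 16]], dtype=float)
u = max_pool(z, H=2, s=2)   # [[6, 8], [14, 16]]
```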


Page 69: "Deep Learning" Chap.6 Convolutional Neural Net

77 80 82 78 70 82 82 140

83 78 80 83 82 77 94 151

87 82 81 80 74 75 112 152

87 87 85 77 66 99 151 167

84 79 77 78 76 107 162 160

86 72 70 72 81 151 166 151

78 72 73 73 107 166 170 148

76 76 77 84 147 180 168 142

81.1 80.9 79.8 78.9 82.1 95.5 99.3 107.2

82.4 81.7 80.0 79.9 85.5 99.6 104.6 115.2

81.9 81.3 79.7 80.6 88.4 103.0 109.0 120.7

81.2 80.4 79.5 82.8 94.2 109.8 117.7 131.7

80.0 79.0 79.4 86.4 101.2 116.8 127.1 142.5

78.6 78.2 81.6 93.3 110.5 126.0 138.3 152.5

76.7 76.7 81.9 95.9 114.3 129.5 142.6 155.9

75.6 75.8 82.9 100.1 119.0 133.7 148.1 160.2

(left: input z_{pqk}, right: output u_{ijk})

2. Average pooling: use the average value of the H² pixels in the area:

u_{ijk} = \frac{1}{H^2} \sum_{(p,q) \in P_{ij}} z_{pqk}


Page 77: "Deep Learning" Chap.6 Convolutional Neural Net

3. Lp pooling

https://gist.github.com/matsuken92/5b78c792f2ab98576c5c#file-03_anim_lp_pooling-py

u_{ijk} = \left( \frac{1}{H^2} \sum_{(p,q) \in P_{ij}} z_{pqk}^{P} \right)^{1/P}

Lp pooling is a generalization that includes max pooling and average pooling.

When P = 1, it works as average pooling. When P → ∞, it works as max pooling.

e.g. Uniform distribution
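A small numeric illustration (not from the slides) of how the Lp formula interpolates between the two:

```python
import numpy as np

def lp_pool(window, P):
    # u = ( (1 / H^2) * sum of z^P ) ** (1 / P), over one pooling window
    return np.mean(window ** P) ** (1.0 / P)

z = np.array([1.0, 2.0, 3.0, 4.0])   # one 2x2 window, flattened (H^2 = 4)
avg = lp_pool(z, 1)                  # P = 1 reproduces average pooling: 2.5
near_max = lp_pool(z, 100)           # large P approaches max pooling (close to 4.0)
```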


Page 79: "Deep Learning" Chap.6 Convolutional Neural Net

Pooling Layer

Generally, the computation is applied to each input channel independently in the pooling layer, so the number of output channels is the same as the number of input channels, K.

(Figure: a W × W × K input maps to a W × W × K output; the number of channels K is not changed.)

※ Normally, no activation function is applied in the pooling layer.

There are no adjustable parameters, since the weights of the pooling layer are fixed.

Page 80: "Deep Learning" Chap.6 Convolutional Neural Net

Stride of the pooling layer — Pooling size: 5 × 5, Stride: s = 2

77 80 82 78 70 82 82 140

83 78 80 83 82 77 94 151

87 82 81 80 74 75 112 152

87 87 85 77 66 99 151 167

84 79 77 78 76 107 162 160

86 72 70 72 81 151 166 151

78 72 73 73 107 166 170 148

76 76 77 84 147 180 168 142

81.1 79.8 82.1 99.3

81.9 79.7 88.4 109.0

80.0 79.4 101.2 127.1

76.7 81.9 114.3 142.6

zpqkuijk

Stride of the pooling layers = 25⇥ 5

b(W � 1)/sc+ 1The size of output layer

So, in this example…b(8� 1)/2c+ 1 = 4
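The output-size formula can be verified directly in code (pooled_size is a hypothetical name, not from the slides):

```python
def pooled_size(W, s):
    """Output side length of a pooling layer: floor((W - 1) / s) + 1.
    W: input side length, s: stride (the slide's example: W = 8, s = 2)."""
    return (W - 1) // s + 1

print(pooled_size(8, 2))  # 4, as in the slide's example
```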


Page 84: "Deep Learning" Chap.6 Convolutional Neural Net

Local Contrast Normalization (LCN)

1. Normalization for a single channel
   1-1. Subtractive Normalization
   1-2. Divisive Normalization

2. Normalization for multiple channels
   2-1. Subtractive Normalization
   2-2. Divisive Normalization

Page 85: "Deep Learning" Chap.6 Convolutional Neural Net

Contrast

http://homepage2.nifty.com/tsugu/sotuken/ronbun/sec3-2.html#0005

High contrast / Low contrast / Original

Contrast adjustment is an operation that controls the difference in color intensity in an image. With higher contrast, bright and dark regions become more distinguishable.

(Plot: output pixel value vs. input pixel value.)

Page 86: "Deep Learning" Chap.6 Convolutional Neural Net

Brightness

http://www.mis.med.akita-u.ac.jp/~kata/image/monogamma.html

High Brightness / Low Brightness / Original

Brightness adjustment transforms pixel values with a power function (gamma correction) with parameter γ, e.g. γ = 1.5, γ = 2.0.
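Gamma correction is commonly implemented as a power function on normalized pixel values; a sketch (whether γ divides or multiplies the exponent is a convention choice, assumed here):

```python
import numpy as np

def adjust_gamma(x, gamma):
    """Gamma correction: map normalized pixel values x in [0, 1]
    through the power function x ** (1 / gamma).
    With this convention, gamma > 1 brightens the image."""
    x = np.asarray(x, dtype=float)
    return x ** (1.0 / gamma)

pixels = np.array([0.0, 0.25, 0.5, 1.0])
print(adjust_gamma(pixels, gamma=2.0))  # midtones are brightened
```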

Page 87: "Deep Learning" Chap.6 Convolutional Neural Net

Normalization

Calculate the average over the N training images, for every channel and every pixel:

\tilde{x}_{ijk} = \frac{1}{N} \sum_{n=1}^{N} x^{(n)}_{ijk}

x^{(n)}_{ijk}: pixel value at position (i, j) of channel k in the n-th training image.

The converted input is the value obtained by subtracting this average from the target pixel:

x_{ijk} \leftarrow x_{ijk} - \tilde{x}_{ijk}

Local Contrast Normalization, in contrast, is processing applied to every image individually.

Page 88: "Deep Learning" Chap.6 Convolutional Neural Net

LCN: Subtractive Normalization — single-channel image (gray-scale image etc.)

Subtract the average pixel value of the local area P_ij from each pixel of the input image.

Average:

\bar{x}_{ij} = \frac{1}{H^2} \sum_{(p,q) \in P_{ij}} x_{i+p,j+q}

Weighted average:

\bar{x}_{ij} = \sum_{(p,q) \in P_{ij}} w_{pq} x_{i+p,j+q}

z_{ij} = x_{ij} - \bar{x}_{ij}
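A minimal sketch of subtractive normalization with a uniform (unweighted) window; the reflection padding at the image edges is my assumption, not specified in the slides:

```python
import numpy as np

def subtractive_normalize(x, H):
    """LCN subtractive normalization with a uniform H x H window:
    z_ij = x_ij - mean of x over the neighborhood P_ij."""
    pad = H // 2
    xp = np.pad(x, pad, mode="reflect")  # assumed edge handling
    z = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            z[i, j] = x[i, j] - xp[i:i+H, j:j+H].mean()
    return z

x = np.full((5, 5), 7.0)              # constant (flat) image
print(subtractive_normalize(x, H=3))  # all zeros: local mean equals pixel
```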

Page 89: "Deep Learning" Chap.6 Convolutional Neural Net

LCN: Subtractive Normalization — single-channel image (gray-scale image etc.)

Weighted average:

\bar{x}_{ij} = \sum_{(p,q) \in P_{ij}} w_{pq} x_{i+p,j+q}

The weights are chosen so that they sum to 1:

\sum_{(p,q) \in P_{ij}} w_{pq} = \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} w_{pq} = 1

Example of the weights (5×5):

0.01 0.01 0.01 0.01 0.01
0.01 0.05 0.05 0.05 0.01
0.01 0.05 0.44 0.05 0.01
0.01 0.05 0.05 0.05 0.01
0.01 0.01 0.01 0.01 0.01

- The maximum value is located at the center.
- Values closer to the edge are smaller.

Page 90: "Deep Learning" Chap.6 Convolutional Neural Net

LCN: Subtractive Normalization

(Animation: results for window sizes H = 3, 5, 9, 17.)

https://gist.github.com/matsuken92/5b78c792f2ab98576c5c

Page 91: "Deep Learning" Chap.6 Convolutional Neural Net

LCN: Divisive Normalization — single-channel image (gray-scale image etc.)

Weighted average, with \sum_{(p,q) \in P_{ij}} w_{pq} = 1:

\bar{x}_{ij} = \sum_{(p,q) \in P_{ij}} w_{pq} x_{i+p,j+q}

Calculate the variance of the pixels in the area P_ij and normalize with it. The variance is:

\sigma^2_{ij} = \sum_{(p,q) \in P_{ij}} w_{pq} (x_{i+p,j+q} - \bar{x}_{ij})^2

The normalized value is:

z_{ij} = \frac{x_{ij} - \bar{x}_{ij}}{\sigma_{ij}}

Page 92: "Deep Learning" Chap.6 Convolutional Neural Net

LCN: Divisive Normalization

However, if the normalized value is used as it is, it has the drawback of amplifying noise in regions of low contrast (where \sigma_{ij} is small):

z_{ij} = \frac{x_{ij} - \bar{x}_{ij}}{\sigma_{ij}}

To avoid this, define a constant c: if the standard deviation of the pixels is lower than c, divide by c instead. That is,

z_{ij} = \frac{x_{ij} - \bar{x}_{ij}}{\max(c, \sigma_{ij})}

There is also a similar variant whose denominator changes continuously with the value of \sigma_{ij}:

z_{ij} = \frac{x_{ij} - \bar{x}_{ij}}{\sqrt{c + \sigma^2_{ij}}}
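The max(c, σ) variant can be sketched as follows (uniform window and reflection padding are my assumptions, as before):

```python
import numpy as np

def divisive_normalize(x, H, c=1.0):
    """LCN divisive normalization with a uniform H x H window:
    z_ij = (x_ij - mean) / max(c, std), so flat (low-contrast)
    regions are not blown up by a tiny standard deviation."""
    pad = H // 2
    xp = np.pad(x, pad, mode="reflect")  # assumed edge handling
    z = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            win = xp[i:i+H, j:j+H]
            z[i, j] = (x[i, j] - win.mean()) / max(c, win.std())
    return z

x = np.full((5, 5), 7.0)
print(divisive_normalize(x, H=3))  # zeros: numerator 0, denominator c
```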

Page 93: "Deep Learning" Chap.6 Convolutional Neural Net

LCN: Subtractive Normalization — multi-channel image (RGB etc.)

Considering the interaction between channels, use the average over the same area P_ij of every channel:

\bar{x}_{ij} = \frac{1}{K} \sum_{k=0}^{K-1} \sum_{(p,q) \in P_{ij}} w_{pq} x_{i+p,j+q,k}

Subtract \bar{x}_{ij}, which is shared between the channels, from every pixel (i, j):

z_{ijk} = x_{ijk} - \bar{x}_{ij}

Page 94: "Deep Learning" Chap.6 Convolutional Neural Net

LCN: Divisive Normalization — multi-channel image (RGB etc.)

The variance of the local area P_ij:

\sigma^2_{ij} = \frac{1}{K} \sum_{k=0}^{K-1} \sum_{(p,q) \in P_{ij}} w_{pqk} (x_{i+p,j+q,k} - \bar{x}_{ij})^2

Calculation of divisive normalization:

z_{ijk} = \frac{x_{ijk} - \bar{x}_{ij}}{\max(c, \sigma_{ij})}

In the case where the denominator changes continuously with the variance:

z_{ijk} = \frac{x_{ijk} - \bar{x}_{ij}}{\sqrt{c + \sigma^2_{ij}}}

Page 95: "Deep Learning" Chap.6 Convolutional Neural Net

Local Contrast Normalization

The interaction between channels is taken into account in the normalization layer for multi-channel images.

→ This introduces into the model the following properties of biological vision:

- Sensitive to differences in content
- Insensitive to absolute differences such as brightness or contrast

Page 96: "Deep Learning" Chap.6 Convolutional Neural Net

Calculation of gradient

Page 97: "Deep Learning" Chap.6 Convolutional Neural Net

Review

z^{(l)} = f(u^{(l)}) = f(W^{(l)} z^{(l-1)} + b^{(l)})

(Diagram: the l-th layer with units 1…m connected to the (l−1)-th layer with units 1…n, with bias b^{(l)}_j.)

Page 98: "Deep Learning" Chap.6 Convolutional Neural Net

However, in a convolutional layer the weight matrix W^{(l)} is not fully connected.

Page 99: "Deep Learning" Chap.6 Convolutional Neural Net

Convolution layer

(Figure: a W×W×K input convolved with an H×H×K filter (filter 1, m = 0), producing the output u_{ij0}.)

u_{ijm} = \sum_{k=0}^{K-1} \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} z^{(l-1)}_{i+p,j+q,k} h_{pqkm} + b_{ijm}

Weight sharing (weight tying) is applied.
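The quadruple sum can be transcribed literally into a (deliberately naive) NumPy loop; the function name, the "valid" output size W − H + 1, and the scalar bias are assumptions on my part:

```python
import numpy as np

def conv_forward(z, h, b=0.0):
    """Naive convolution layer, directly following
    u_ijm = sum_k sum_p sum_q z[i+p, j+q, k] * h[p, q, k, m] + b.
    z: (W, W, K) input; h: (H, H, K, M) filters."""
    W, _, K = z.shape
    H, _, _, M = h.shape
    out = W - H + 1  # "valid" convolution, no padding
    u = np.full((out, out, M), b, dtype=float)
    for m in range(M):
        for i in range(out):
            for j in range(out):
                u[i, j, m] += np.sum(z[i:i+H, j:j+H, :] * h[:, :, :, m])
    return u

z = np.ones((4, 4, 2))
h = np.ones((3, 3, 2, 1))
u = conv_forward(z, h)
print(u.shape)  # (2, 2, 1); every value is 3*3*2 = 18
```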

Page 100: "Deep Learning" Chap.6 Convolutional Neural Net

Gradient calculation of convolution

How can W^{(l)} be handled? The weight matrix W^{(l)} can be constructed from a vector h of length H × H × K × M in which the filter coefficients h_{pqkm} are lined up (the figure shows M = 3 filters, each of size H × H × K).

Page 101: "Deep Learning" Chap.6 Convolutional Neural Net

Gradient calculation of convolution

Example: when H = 3, K = 2, M = 2, the length of h is 3 × 3 × 2 × 2 = 36.

m = 0, k = 0:          m = 0, k = 1:
h0000 h0100 h0200      h0010 h0110 h0210
h1000 h1100 h1200      h1010 h1110 h1210
h2000 h2100 h2200      h2010 h2110 h2210

m = 1, k = 0:          m = 1, k = 1:
h0001 h0101 h0201      h0011 h0111 h0211
h1001 h1101 h1201      h1011 h1111 h1211
h2001 h2101 h2201      h2011 h2111 h2211

The (H × H × K × M)-vector h lines these coefficients up: h = (h0000, h0100, h0200, h1000, …, h2211)ᵀ.

Page 102: "Deep Learning" Chap.6 Convolutional Neural Net

(Figure: the 3×3 filter h_{0000} … h_{2200} (m = 0, k = 0) sliding over the input map Z^{(l-1)}_0 with pixels z_{00,0} … z_{22,0}, producing the output map U^{(l)}_0; j indexes output units and i input units.)

Page 103: "Deep Learning" Chap.6 Convolutional Neural Net

(Figure: the same filter and maps as the previous slide; a single connection is highlighted, whose weight is one filter coefficient, e.g. w_ji = h_{0100}.)


Page 106: "Deep Learning" Chap.6 Convolutional Neural Net

Gradient calculation of convolution

Because of weight sharing, each weight w_ji equals one element h_r of the filter vector h (r = 1, …, 36 in the example). Write this with a one-hot vector t_ji of length 36:

w_{ji} = t_{ji}^T h

e.g. when (i = 3, j = 2), r = 2:

t_ji = (0 1 0 0 0 0 0 0 0 | 0 … 0)ᵀ   (1 in position r = 2, 0 elsewhere)


Page 108: "Deep Learning" Chap.6 Convolutional Neural Net

Gradient calculation of convolution

Collecting the vectors t_ji for a fixed r gives a matrix T_r (one row per output unit j, one column per input unit i) whose (j, i) element is 1 when w_ji = h_r and 0 otherwise.

Example: T_r for r = 2, covering the case (i = 3, j = 2) — a 9×9 matrix of zeros with a single 1:

0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
(all remaining rows zero)

(rows: j = 0, 1, …, W−1; columns: i = 0, 1, …, W−1)

Page 109: "Deep Learning" Chap.6 Convolutional Neural Net

Gradient calculation of convolution

The partial derivative of E with respect to W on layer l:

\frac{\partial E}{\partial W^{(l)}} = \partial W = \delta^{(l)} z^{(l-1)T}

Using the matrices T_r calculated on the previous page, the gradient of the filter vector h (length H × H × K × M) is collected from the entries of \partial W (a (W × W)-sized matrix, indexed by i and j):

(\partial h)_r = \sum_{i,j} (T_r \circ \partial W)_{ji}
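Because of weight tying, summing (T_r ∘ ∂W) over all positions is equivalent to letting each filter coefficient collect the products of delta and input over every output position; a sketch under the same layout assumptions as the earlier forward-pass code (function name is mine):

```python
import numpy as np

def conv_filter_grad(z, delta, H):
    """Gradient of the shared filter: each coefficient h_pqkm collects
    contributions from every output position, i.e.
    grad[p, q, k, m] = sum_ij delta[i, j, m] * z[i+p, j+q, k]."""
    W, _, K = z.shape
    out, _, M = delta.shape
    grad = np.zeros((H, H, K, M))
    for p in range(H):
        for q in range(H):
            for k in range(K):
                for m in range(M):
                    grad[p, q, k, m] = np.sum(
                        delta[:, :, m] * z[p:p+out, q:q+out, k])
    return grad

z = np.ones((4, 4, 1))
delta = np.ones((2, 2, 1))
g = conv_filter_grad(z, delta, H=3)
print(g.shape)  # (3, 3, 1, 1); every entry is 2*2 = 4
```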

Page 110: "Deep Learning" Chap.6 Convolutional Neural Net

Review

(Diagram: unit j of the l-th layer receives z^{(l-1)}_i from the (l−1)-th layer through weight w^{(l)}_ji; deltas \delta^{(l+1)}_1, …, \delta^{(l+1)}_M come back from the (l+1)-th layer through weights w^{(l+1)}_1j, …, w^{(l+1)}_Mj, scaled by f'^{(l)}. Differentiate w.r.t. w^{(l)}_ji.)

\frac{\partial E_n}{\partial w^{(l)}_{ji}}
  = \frac{\partial E_n}{\partial u^{(l)}_j} \frac{\partial u^{(l)}_j}{\partial w^{(l)}_{ji}}
  = f'(u^{(l)}_j) \Big( \sum_k w^{(l+1)}_{kj} \delta^{(l+1)}_k \Big) z^{(l-1)}_i

Page 111: "Deep Learning" Chap.6 Convolutional Neural Net

Gradient calculation of convolution

\frac{\partial E_n}{\partial w^{(l)}_{ji}}
  = \frac{\partial E_n}{\partial u^{(l)}_j} \frac{\partial u^{(l)}_j}{\partial w^{(l)}_{ji}}
  = f'(u^{(l)}_j) \Big( \sum_k w^{(l+1)}_{kj} \delta^{(l+1)}_k \Big) z^{(l-1)}_i

Matrix expression, with \delta^{(l)}_j = f'(u^{(l)}_j) \sum_k w^{(l+1)}_{kj} \delta^{(l+1)}_k:

\delta^{(l)} = f'^{(l)}(u^{(l)}) \odot (W^{(l+1)T} \delta^{(l+1)})    (6.5)

\frac{\partial E}{\partial W^{(l)}} = \partial W = \delta^{(l)} z^{(l-1)T}

\odot denotes the element-wise product of matrices.
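Equation (6.5) in code, for one layer with a ReLU-style derivative (the function name and the example numbers are mine, not from the slides):

```python
import numpy as np

def backprop_delta(u_l, W_next, delta_next, f_prime):
    """Eq. (6.5): delta_l = f'(u_l) * (W_{l+1}^T delta_{l+1}),
    where * is the element-wise product."""
    return f_prime(u_l) * (W_next.T @ delta_next)

relu_prime = lambda u: (u > 0).astype(float)
u_l = np.array([1.0, -2.0, 3.0])
W_next = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, 1.0]])
delta_next = np.array([1.0, 2.0])
d = backprop_delta(u_l, W_next, delta_next, relu_prime)
print(d)  # [1. 0. 4.] -- unit 2 is zeroed because u_l[1] < 0
```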

Page 112: "Deep Learning" Chap.6 Convolutional Neural Net

Handling the Pooling Layer

Gradient calculation is not necessary, since there are no parameters to learn; only the back-propagation of the deltas is computed.

Perform calculation (6.5) from the previous page for each type of pooling, choosing W^{(l+1)} as follows.

Average Pooling:

w^{(l+1)}_{ji} = \begin{cases} 1/H^2 & \text{if } i \in P_{ji} \\ 0 & \text{otherwise} \end{cases}

Max Pooling:

w^{(l+1)}_{ji} = \begin{cases} 1 & \text{if } i \text{ is the position of the max value} \\ 0 & \text{otherwise} \end{cases}
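A sketch of the resulting delta routing for max pooling (non-overlapping windows assumed; the average-pooling case would instead spread delta/H² over each window):

```python
import numpy as np

def max_pool_backprop(z, delta, size, stride):
    """Route each delta back to the position of the max inside its
    window (w_ji = 1 for the max location, 0 otherwise)."""
    grad = np.zeros_like(z, dtype=float)
    for i in range(delta.shape[0]):
        for j in range(delta.shape[1]):
            win = z[i*stride:i*stride+size, j*stride:j*stride+size]
            p, q = np.unravel_index(np.argmax(win), win.shape)
            grad[i*stride + p, j*stride + q] += delta[i, j]
    return grad

z = np.array([[1.0, 2.0],
              [4.0, 3.0]])
delta = np.array([[5.0]])
g = max_pool_backprop(z, delta, size=2, stride=2)
print(g)  # all of the 5.0 goes to the max position (1, 0)
```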

Page 113: "Deep Learning" Chap.6 Convolutional Neural Net

Thanks
• Azusa Colors (Keynote template) http://sanographix.github.io/azusa-colors/