"deep learning" chap.6 convolutional neural net
TRANSCRIPT
Chapter 6 Convolutional Neural Network
2015.7.15 wed.@kenmatsu4
Self-introduction ・Twitter account: @kenmatsu4 (Please follow me!)
・Blog
I write my blog posts on Qiita (Japanese only). Categories: statistics, machine learning, Python, etc.
http://qiita.com/kenmatsu4 (over 2,000 contributions!) ・My hobbies: - Playing the bass guitar with my band. - Traveling to foreign countries, especially Southeast Asia (Cambodia, Myanmar, Bangladesh, Uyghur, etc.). Pictures from my travels: http://matsu-ken.jimdo.com
・Japanese version of this slide http://www.slideshare.net/matsukenbook/ss-50545587
Information
Author: Takayuki Okatani, Machine Learning Professional Series, ISBN: 978-4-06-152902-1
"Deep Learning", Chapter 6: Convolutional Neural Net
This slide deck is for a study group. The book is a very good introduction to deep learning. Let's buy it!
Unfortunately, it is Japanese only…
MASAKARI come on!!! Let's study together.
https://twitter.com/_inundata/status/616658949761302528
To process images with a neural network, let's use knowledge from neuroscience!
• Receptive fields • Simple cells • Complex cells
Using an analogy from neuroscience
Receptive field ≒ Retina cells
http://bsd.neuroinf.jp/wiki/%e5%8f%97%e5%ae%b9%e9%87%8e
(Figure: on-center cells have an ON center with an OFF surround; off-center cells have an OFF center with an ON surround.)
https://en.wikipedia.org/wiki/Hypercomplex_cell
Receptive field ≒ Retina cells
Simple Cells and Complex Cells
https://en.wikipedia.org/wiki/Hypercomplex_cell
A simple cell is formed by arranging receptive fields in a line. When light falls on the + area and not on the − area, an excitatory response occurs. When light falls on the + and − areas simultaneously, no excitatory response occurs.
Simple Cells
http://www.cns.nyu.edu/~david/courses/perception/lecturenotes/V1/lgn-V1.html
Complex cells keep responding when the stimulus is shifted in parallel, but do not respond when it is rotated.
Simple Cells and Complex Cells
Complex Cells
The main topic starts here.
We treat this neuroscience knowledge mathematically and apply it to object category recognition.
Model of Simple Cells and Complex Cells
Blue cells indicate the input signal; the pink part is a filter.
Receptive field → Simple Cell → Complex Cell
Model of Simple Cells and Complex Cells
When the input pattern is shifted in parallel, the upper-left simple cell no longer responds because of the position change. If the input is rotated, the cell does not respond.
Receptive field → Simple Cell → Complex Cell
• Neocognitron: the first engineering application of the two-layer structure (simple cells, complex cells) to pattern recognition
• LeNet: considered the root of the convolutional neural network (http://yann.lecun.com/exdb/lenet/)
Similar methods
Whole Structure
• fully-connected layer • convolution layer • pooling layer • local contrast normalization (LCN) layer
Types of layers used in a CNN
→ The previous chapter discussed the fully-connected layer: the output of layer l−1 is input to all units of layer l.
Structure of typical CNN
input (image) → convolution → convolution → pooling → LCN → convolution → pooling → fully-connected → fully-connected → softmax → output (category label)
In many cases a pooling layer is placed after a couple of convolution layers, and sometimes an LCN layer is allocated after that. If the purpose is classification, the softmax function, a multivariate generalization of the sigmoid function, is usually used at the output.
Softmax function: f_i(x) = exp(x_i) / Σ_{j=1}^{n} exp(x_j)
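As a quick illustration (not from the book), the softmax can be sketched in NumPy; subtracting the maximum is a standard numerical-stability trick:

```python
import numpy as np

def softmax(x):
    # Subtracting the max does not change the result (it cancels in the ratio)
    # but prevents overflow in exp for large inputs.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# The outputs are positive and sum to 1, so they can be read as class probabilities.
p = softmax(np.array([1.0, 2.0, 3.0]))
```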
example
def forward(self, x_data, y_data, train=True):
    x = Variable(x_data, volatile=not train)
    t = Variable(y_data, volatile=not train)
    h = F.relu(self.conv1(x))
    h = F.relu(self.conv1a(h))
    h = F.relu(self.conv1b(h))
    h = F.max_pooling_2d(h, 3, stride=2)
    h = F.relu(self.conv2(h))
    h = F.relu(self.conv2a(h))
    h = F.relu(self.conv2b(h))
    h = F.max_pooling_2d(h, 3, stride=2)
    h = F.relu(self.conv3(h))
    h = F.relu(self.conv3a(h))
    h = F.relu(self.conv3b(h))
    h = F.dropout(F.max_pooling_2d(h, 3, stride=2), train=train)
    h = F.relu(self.conv4(h))
    h = F.relu(self.conv4a(h))
    h = F.relu(self.conv4b(h))
    h = F.reshape(F.average_pooling_2d(h, 6), (x_data.shape[0], 1000))
    return F.softmax_cross_entropy(h, t), F.accuracy(h, t)
Example of Chainer (Deep Learning Framework)
https://github.com/pfnet/chainer/tree/master/examples/imagenet
Definition of Convolution
Address map of a W × W pixel image: pixels are addressed (i, j) for i, j = 0, 1, …, W−1.
Example of W × W pixel data: a binary image that is all 0 except for a diagonal line of 1s in the top-left corner.
Filter of H × H pixels, e.g.:
0.01 0.02 0.05 0.15
0.02 0.05 0.15 0.05
0.05 0.15 0.05 0.02
0.15 0.05 0.02 0.01
Definition of the convolution of the pixels x_ij with the filter h:
u_ij = Σ_{p=0}^{H−1} Σ_{q=0}^{H−1} x_{i+p, j+q} h_{pq}
※ Strictly speaking, the sign in x's index just before p and q should be −, but there is no substantial difference, so + is also fine.
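A direct NumPy transcription of this definition (a sketch, not the book's code; as remarked above, with the + sign this is technically cross-correlation rather than sign-flipped convolution):

```python
import numpy as np

def convolve2d(x, h):
    """u_ij = sum_{p=0}^{H-1} sum_{q=0}^{H-1} x[i+p, j+q] * h[p, q],
    computed only where the filter fits entirely inside the image."""
    W, H = x.shape[0], h.shape[0]
    out = W - H + 1
    u = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            u[i, j] = np.sum(x[i:i+H, j:j+H] * h)
    return u
```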
Role of Convolution
Convolving Lenna's image with a cos filter, u_ij = Σ_{p=0}^{H−1} Σ_{q=0}^{H−1} x_{i+p, j+q} h_{pq}, extracts light-and-dark (contrast) features from the image.
https://gist.github.com/matsuken92/5b78c792f2ab98576c5c
Role of Convolution
By the way, the filter size relative to the image is like this.
Padding
Padding is a way of preparing the image so that filtering works properly at the edges of the image without reducing the image size.
⌊·⌋ means rounding down to an integer. Applying an H × H filter to a W × W image reduces the output to (W − 2⌊H/2⌋) × (W − 2⌊H/2⌋); padding is used to avoid this reduction.
Question: if we interpret the equation u_ij = Σ_{p=0}^{H−1} Σ_{q=0}^{H−1} x_{i+p, j+q} h_{pq} straightforwardly (starting from x_00), isn't the reduction H − 1 per side in total, as in the figure on the left?
Zero-padding
0 0 0 0 0 0 0 0 0 0
0 77 80 82 78 70 82 82 140 0
0 83 78 80 83 82 77 94 151 0
0 87 82 81 80 74 75 112 152 0
0 87 87 85 77 66 99 151 167 0
0 84 79 77 78 76 107 162 160 0
0 86 72 70 72 81 151 166 151 0
0 78 72 73 73 107 166 170 148 0
0 76 76 77 84 147 180 168 142 0
0 0 0 0 0 0 0 0 0 0
Zero-padding fills the padding area with 0.
→ This is broadly used in convolutional neural nets.
Demerit: as a consequence of convolution with zero-padding, the area around the edge becomes dark.
Other methods: fill with the outermost pixels, or with the pixels folded back at the four sides.
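The three padding schemes just described can be compared with NumPy's `np.pad` (a minimal sketch; the mode names are NumPy's, not the book's):

```python
import numpy as np

x = np.array([[77., 80.],
              [83., 78.]])

zero_pad    = np.pad(x, 1, mode="constant")  # fill with 0 (standard in CNNs)
edge_pad    = np.pad(x, 1, mode="edge")      # repeat the outermost pixels
reflect_pad = np.pad(x, 1, mode="reflect")   # fold back at the four sides
```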
Stride
77 80 82 78 70 82 82 140
83 78 80 83 82 77 94 151
87 82 81 80 74 75 112 152
87 87 85 77 66 99 151 167
84 79 77 78 76 107 162 160
86 72 70 72 81 151 166 151
78 72 73 73 107 166 170 148
76 76 77 84 147 180 168 142
"Stride" is the interval at which the filter is slid when it is moved several pixels at a time, not one by one, while computing the sum of products. For very large images, a stride keeps the number of output units from becoming too large (a trade-off with performance degradation).
u_ij = Σ_{p=0}^{H−1} Σ_{q=0}^{H−1} x_{si+p, sj+q} h_{pq},  s: stride
Output image size when a stride is applied: (⌊(W − 1)/s⌋ + 1) × (⌊(W − 1)/s⌋ + 1)
A stride of 2 or more is common in pooling layers.
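A sketch of the strided version (assuming the input is zero-extended so that every window fits, which makes the output side length ⌊(W − 1)/s⌋ + 1):

```python
import numpy as np

def strided_conv(x, h, s):
    """u_ij = sum_pq x[s*i + p, s*j + q] * h[p, q] with stride s.
    x is zero-extended on the bottom/right so every window fits."""
    W, H = x.shape[0], h.shape[0]
    out = (W - 1) // s + 1          # floor((W - 1) / s) + 1
    xp = np.zeros((s * (out - 1) + H,) * 2)
    xp[:W, :W] = x
    u = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            u[i, j] = np.sum(xp[s*i:s*i+H, s*j:s*j+H] * h)
    return u
```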
Convolution Layer
The convolution layer corresponds to the simple cell, as in the following figure (blue cells indicate the input signal; the pink part is a filter).
Receptive field → Simple Cell → Complex Cell
In a practical convolutional neural net, convolution is computed with parallel filters over a multi-channel image (e.g., RGB), not over a single grayscale image.
W: the number of pixels per side, K: the number of channels (e.g., K = 3 for an RGB image). Image size: W × W × K.
Convolution Layer
In some contexts an image of this size (W × W × K) is called a "map". Much larger channel sizes (e.g., K = 16, K = 256) are commonly used in hidden layers (convolution or pooling layers).
Convolution Layer
The equation to obtain u_ijm (for filter m with weights h_pqkm; e.g., filter 1 with m = 0 gives u_ij0 and the activation z_ij0 = f(u_ij0)) is:
u_ijm = Σ_{k=0}^{K−1} Σ_{p=0}^{H−1} Σ_{q=0}^{H−1} z^{(l−1)}_{i+p, j+q, k} h_{pqkm} + b_ijm
The bias is commonly set as b_ijm = b_m, independent of the pixel position (i, j); it acts like an overall density offset of u_ijm.
Convolution Layer
Identical weight values h_pqk0 are used for every output pixel z_ij0; this is called "weight sharing" or "weight tying".
Convolution Layer
With M filters (filters 1, 2, 3, … with weights h_pqk0, h_pqk1, h_pqk2, …), each filter m produces u_ijm from the input z^{(l−1)}_ijk, and z^{(l)}_ijm = f(u_ijm) with its own bias b_m.
The output of the convolution layer can be regarded as a multi-channel image of size W × W × M, interpreting the number of filters M as the channel size.
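The whole layer can be sketched as follows (an illustration under simplifying assumptions: "valid" range only, ReLU chosen as f, and shared biases b_m):

```python
import numpy as np

def conv_layer(z, h, b):
    """z: (W, W, K) input, h: (H, H, K, M) filters, b: (M,) shared biases.
    Computes u_ijm = sum_kpq z[i+p, j+q, k] * h[p, q, k, m] + b[m],
    then applies f (here ReLU). Output channels = number of filters M."""
    W, K = z.shape[0], z.shape[2]
    H, M = h.shape[0], h.shape[3]
    out = W - H + 1
    u = np.zeros((out, out, M))
    for m in range(M):
        for i in range(out):
            for j in range(out):
                u[i, j, m] = np.sum(z[i:i+H, j:j+H, :] * h[:, :, :, m]) + b[m]
    return np.maximum(u, 0.0)       # f(.) = ReLU

# Parameter count is H*H*K*M (plus M biases), independent of the image size W.
```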
Convolution Layer
The parameter size does not depend on the image size (W × W). With M filters of size H × H × K (e.g., M = 3), the parameter size is H × H × K × M, that is, filter size × filter size × channel size × number of filters.
Convolution Layer
Gradient descent is applied to parameter optimization of convolutional neural nets, too. In
u_ijm = Σ_{k=0}^{K−1} Σ_{p=0}^{H−1} Σ_{q=0}^{H−1} z^{(l−1)}_{i+p, j+q, k} h_{pqkm} + b_ijm
the targets of optimization are the weights h_pqkm and the biases b_ijm. Backpropagation is also used to compute the gradient (explained in detail later).
Pooling Layer
Generally, a pooling layer is located just after a convolution layer, as in this example:
input (image) → convolution → convolution → pooling → LCN → convolution → pooling → fully-connected → fully-connected → softmax → output (category label)
The pooling layer corresponds to the final part (the complex cell) of the figure below. It is designed so that the output of the pooling layer stays unchanged even if the target feature changes slightly (or shifts in parallel).
Blue cells indicate the input signal; the pink part is a filter.
Receptive field → Simple Cell → Complex Cell (pooling layer)
Pooling Layer
P_ij denotes the set of pixels in an H × H area of the W × W input map. One output pixel value u_ijk is obtained from the H² pixel values z in that area, for every channel k. (Padding is applied as needed.)
Pooling Layer
Three types of pooling layer:
1. Max pooling
2. Average pooling
3. Lp pooling
1. Max Pooling
Use the maximum value of the pixels in the H² area. Input z_pqk:
77 80 82 78 70 82 82 140
83 78 80 83 82 77 94 151
87 82 81 80 74 75 112 152
87 87 85 77 66 99 151 167
84 79 77 78 76 107 162 160
86 72 70 72 81 151 166 151
78 72 73 73 107 166 170 148
76 76 77 84 147 180 168 142
Output u_ijk:
87 87 87 83 112 152 152 152
87 87 87 99 151 167 167 167
87 87 87 107 162 167 167 167
87 87 87 151 166 167 167 167
87 87 107 166 170 170 170 170
87 87 147 180 180 180 180 180
86 86 147 180 180 180 180 180
86 86 147 180 180 180 180 180
u_ijk = max_{(p,q) ∈ P_ij} z_pqk
This is the standard way in image recognition.
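A sketch of max pooling for a single channel (window anchored at the top-left corner, stride s; windows at the border are simply truncated, which is one of several possible conventions):

```python
import numpy as np

def max_pool(z, H, s=1):
    """u_ij = max over the H x H area P_ij of z, sampled with stride s."""
    W = z.shape[0]
    out = (W - 1) // s + 1
    u = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            u[i, j] = z[s*i:s*i+H, s*j:s*j+H].max()  # slices clip at the edge
    return u
```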
77 80 82 78 70 82 82 140
83 78 80 83 82 77 94 151
87 82 81 80 74 75 112 152
87 87 85 77 66 99 151 167
84 79 77 78 76 107 162 160
86 72 70 72 81 151 166 151
78 72 73 73 107 166 170 148
76 76 77 84 147 180 168 142
81.1 80.9 79.8 78.9 82.1 95.5 99.3 107.2
82.4 81.7 80.0 79.9 85.5 99.6 104.6 115.2
81.9 81.3 79.7 80.6 88.4 103.0 109.0 120.7
81.2 80.4 79.5 82.8 94.2 109.8 117.7 131.7
80.0 79.0 79.4 86.4 101.2 116.8 127.1 142.5
78.6 78.2 81.6 93.3 110.5 126.0 138.3 152.5
76.7 76.7 81.9 95.9 114.3 129.5 142.6 155.9
75.6 75.8 82.9 100.1 119.0 133.7 148.1 160.2
2. Average Pooling
Use the average value of the pixels in the H² area (above: input z_pqk and output u_ijk):
u_ijk = (1/H²) Σ_{(p,q) ∈ P_ij} z_pqk
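Average pooling differs only in the reduction; a sketch (note that `.mean()` divides by the actual window size, so truncated border windows are averaged over fewer than H² pixels):

```python
import numpy as np

def average_pool(z, H, s=1):
    """u_ij = (1/H^2) * sum over the H x H area P_ij of z (stride s)."""
    W = z.shape[0]
    out = (W - 1) // s + 1
    u = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            u[i, j] = z[s*i:s*i+H, s*j:s*j+H].mean()
    return u
```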
3. Lp Pooling
u_ijk = ( (1/H²) Σ_{(p,q) ∈ P_ij} z_pqk^P )^{1/P}
Lp pooling is a generalization that includes max pooling and average pooling: when P = 1 it works as average pooling, and as P → ∞ it works as max pooling.
e.g. uniform distribution
https://gist.github.com/matsuken92/5b78c792f2ab98576c5c#file-03_anim_lp_pooling-py
The same animation with a beta-distribution example is at the gist above.
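A sketch of Lp pooling (for non-negative inputs); raising P moves the result from the mean toward the max:

```python
import numpy as np

def lp_pool(z, H, P, s=1):
    """u_ij = ((1/H^2) * sum_{(p,q) in P_ij} z_pq^P)^(1/P).
    P = 1 gives average pooling; P -> infinity approaches max pooling."""
    W = z.shape[0]
    out = (W - 1) // s + 1
    u = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = z[s*i:s*i+H, s*j:s*j+H]
            u[i, j] = np.mean(patch ** P) ** (1.0 / P)
    return u
```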
Pooling Layer
Generally, the calculation is conducted on every input channel independently in the pooling layer, so the number of output channels is the same as the number of input channels K.
※ Normally, no activation function is applied in a pooling layer.
There are no adjustable parameters, since the weights of the pooling layer are fixed.
Pooling Layer
Stride of the pooling layer. Pooling size: 5 × 5, stride: s = 2. Input z_pqk:
77 80 82 78 70 82 82 140
83 78 80 83 82 77 94 151
87 82 81 80 74 75 112 152
87 87 85 77 66 99 151 167
84 79 77 78 76 107 162 160
86 72 70 72 81 151 166 151
78 72 73 73 107 166 170 148
76 76 77 84 147 180 168 142
81.1 79.8 82.1 99.3
81.9 79.7 88.4 109.0
80.0 79.4 101.2 127.1
76.7 81.9 114.3 142.6
Output u_ijk above. The size of the output layer is ⌊(W − 1)/s⌋ + 1; in this example, ⌊(8 − 1)/2⌋ + 1 = 4.
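The output-size formula is easy to check directly (reproducing the 8 × 8 → 4 × 4 example above):

```python
def pooled_size(W, s):
    """Side length of the pooling output: floor((W - 1) / s) + 1."""
    return (W - 1) // s + 1
```

`pooled_size(8, 2)` gives the slide's ⌊(8 − 1)/2⌋ + 1 = 4.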
1. Normalization for a single channel: 1-1. Subtractive Normalization, 1-2. Divisive Normalization
2. Normalization for multiple channels: 2-1. Subtractive Normalization, 2-2. Divisive Normalization
Local Contrast Normalization (LCN)
Contrast
http://homepage2.nifty.com/tsugu/sotuken/ronbun/sec3-2.html#0005
High contrast
Low contrast
Original
Contrast adjustment is an operation controlling the difference of color strength in an image. With high contrast, bright and dark areas are more distinguishable.
(Figure: output pixel value vs. input pixel value)
Brightness
http://www.mis.med.akita-u.ac.jp/~kata/image/monogamma.html
High Brightness
Low Brightness
Original
Brightness adjustment uses an exponential (gamma) transformation with parameter γ, e.g. γ = 1.5, γ = 2.0.
Normalization
Calculate the average over the N training images, for every channel and every pixel:
x̃_ijk = (1/N) Σ_{n=1}^{N} x^{(n)}_ijk
x^{(n)}_ijk: the pixel value at address (i, j) of channel k in the n-th image.
The converted input is the value obtained by subtracting this average from the target pixel: x_ijk − x̃_ijk
(This is an average over all training images; local contrast normalization, below, is instead applied to every image individually.)
Local Contrast Normalization
LCN: Subtractive Normalization (single-channel image, e.g. grayscale)
Subtract the average pixel value of the area P_ij (an H × H region of the W × W image) from each pixel of the input image:
Average: x̄_ij = (1/H²) Σ_{(p,q) ∈ P_ij} x_{i+p, j+q}
Weighted average: x̄_ij = Σ_{(p,q) ∈ P_ij} w_pq x_{i+p, j+q}
z_ij = x_ij − x̄_ij
LCN: Subtractive Normalization (single-channel image)
Weighted average: x̄_ij = Σ_{(p,q) ∈ P_ij} w_pq x_{i+p, j+q}
The weights are arranged so that they sum to 1: Σ_{(p,q) ∈ P_ij} w_pq = Σ_{p=0}^{H−1} Σ_{q=0}^{H−1} w_pq = 1
Example of the weights (H = 5): the maximum value at the center, with lower values closer to the edge.
0.01 0.01 0.01 0.01 0.01
0.01 0.05 0.05 0.05 0.01
0.01 0.05 0.44 0.05 0.01
0.01 0.05 0.05 0.05 0.01
0.01 0.01 0.01 0.01 0.01
(Animations for H = 3, 5, 9, 17: https://gist.github.com/matsuken92/5b78c792f2ab98576c5c)
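A sketch of single-channel subtractive normalization with a weighted local average (border handling by edge replication is an assumption here, not from the book):

```python
import numpy as np

def subtractive_normalize(x, w):
    """z_ij = x_ij - xbar_ij, where xbar_ij = sum_pq w_pq * x_{i+p, j+q}
    and the H x H weights w sum to 1."""
    H = w.shape[0]
    xp = np.pad(x, H // 2, mode="edge")  # assumed border handling
    W = x.shape[0]
    xbar = np.zeros((W, W))
    for i in range(W):
        for j in range(W):
            xbar[i, j] = np.sum(w * xp[i:i+H, j:j+H])
    return x - xbar
```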
LCN: Divisive Normalization (single-channel image, e.g. grayscale)
With the weighted average x̄_ij = Σ_{(p,q) ∈ P_ij} w_pq x_{i+p, j+q} (weights summing to 1), calculate the variance of the pixels in the area P_ij and normalize with it:
σ²_ij = Σ_{(p,q) ∈ P_ij} w_pq (x_{i+p, j+q} − x̄_ij)²
The normalized value is: z_ij = (x_ij − x̄_ij) / σ_ij
LCN: Divisive Normalization
However, if the normalized value z_ij = (x_ij − x̄_ij) / σ_ij is used as it is, there is a demerit: noise is emphasized where the contrast is low. To avoid this, define a constant c; if the standard deviation of the pixels is lower than c, divide by c instead. That is:
z_ij = (x_ij − x̄_ij) / max(c, σ_ij)
There is also a similar way in which the denominator changes continuously with σ_ij:
z_ij = (x_ij − x̄_ij) / √(c + σ²_ij)
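Both variants of single-channel divisive normalization can be sketched together (again with edge-replication border handling as an assumption):

```python
import numpy as np

def divisive_normalize(x, w, c=1.0, continuous=False):
    """z_ij = (x_ij - xbar_ij) / max(c, sigma_ij), or, with continuous=True,
    z_ij = (x_ij - xbar_ij) / sqrt(c + sigma_ij^2)."""
    H = w.shape[0]
    xp = np.pad(x, H // 2, mode="edge")  # assumed border handling
    W = x.shape[0]
    z = np.zeros((W, W))
    for i in range(W):
        for j in range(W):
            patch = xp[i:i+H, j:j+H]
            xbar = np.sum(w * patch)
            var = np.sum(w * (patch - xbar) ** 2)
            denom = np.sqrt(c + var) if continuous else max(c, np.sqrt(var))
            z[i, j] = (x[i, j] - xbar) / denom
    return z
```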
LCN: Subtractive Normalization (multi-channel image, e.g. RGB)
Considering the interaction between channels, use the average of the same area P_ij over every channel:
x̄_ij = (1/K) Σ_{k=0}^{K−1} Σ_{(p,q) ∈ P_ij} w_pq x_{i+p, j+q, k}
Subtract x̄_ij, which is shared between channels, from every pixel (i, j):
z_ijk = x_ijk − x̄_ij
LCN: Divisive Normalization (multi-channel image, e.g. RGB)
The variance of the local area P_ij:
σ²_ij = (1/K) Σ_{k=0}^{K−1} Σ_{(p,q) ∈ P_ij} w_pqk (x_{i+p, j+q, k} − x̄_ij)²
Calculation of divisive normalization:
z_ijk = (x_ijk − x̄_ij) / max(c, σ_ij)
In the case where the denominator changes continuously with the variance:
z_ijk = (x_ijk − x̄_ij) / √(c + σ²_ij)
In the normalization layer for a multi-channel image, the interaction between channels is taken into account.
→ The following biological visual property is introduced into the model:
- Sensitive to differences in content
- Insensitive to absolute differences such as brightness or contrast
Local Contrast Normalization
Calculation of gradient
Review: z^(l) = f(u^(l)) = f(W^(l) z^(l−1) + b^(l))
where W^(l) is the n × m weight matrix between the (l−1)-th layer (m units) and the l-th layer (n units), with bias b^(l)_j.
However, in a convolution layer W^(l) is not fully connected.
Convolution layer (Filter 1, m = 0):
u_ijm = Σ_{k=0}^{K−1} Σ_{p=0}^{H−1} Σ_{q=0}^{H−1} z^{(l−1)}_{i+p, j+q, k} h_{pqkm} + b_ijm
Weight sharing (weight tying) is applied.
Gradient calculation of convolution: how can the weight matrix W^(l) be constructed? It can be built from a vector h that lines up the filter weights h_pqkm; its length is H × H × K × M.
Example: with H = 3, K = 2, M = 2, the length of h is 3 × 3 × 2 × 2 = 36, lining up h_0000, h_0100, h_0200, h_1000, …, h_2211 over (p, q), k = 0, 1 and m = 0, 1.
Gradient calculation of convolution
Each element w_ji of W^(l) connecting the input map Z^(l−1)_0 (pixels z_00,0, z_01,0, …, z_22,0) to the output map U^(l)_0 is one of the filter weights; for example, w_ji = h_0100 for the corresponding (i, j).
Gradient calculation of convolution
Each w_ji can be written as w_ji = t_ji^T h, where t_ji is an indicator vector of length H × H × K × M (= 36 here) with a single 1 that selects the filter element used at (i, j); e.g., when (i = 3, j = 2), the 1 is at position r = 2.
Equivalently, for each r define the W × W indicator matrix T_r whose (j, i) element is 1 exactly where w_ji = h_r; e.g., for r = 2, T_r has a single 1 at the position corresponding to (i = 3, j = 2).
Gradient calculation of convolution
The partial derivative of the error E with respect to W on layer l is ∂E/∂W^(l) = ∂W = δ^(l) z^(l−1)T.
Using the T_r calculated on the previous page, the gradient of the filter vector h (length H × H × K × M) is found as:
(∂h)_r = Σ_{i,j} (T_r ∘ ∂W)_ji
where ∘ is the element-wise product and ∂W is (W × W).
Review
Differentiating with respect to w^(l)_ji (with z^(l−1)_i entering the l-th layer and δ^(l+1)_k coming back from the (l+1)-th layer):
∂E_n/∂w^(l)_ji = (∂E_n/∂u^(l)_j)(∂u^(l)_j/∂w^(l)_ji) = f′(u^(l)_j) (Σ_k w^(l+1)_kj δ^(l+1)_k) z^(l−1)_i
so δ^(l)_j = f′(u^(l)_j) Σ_k w^(l+1)_kj δ^(l+1)_k
Matrix expression (⊙ denotes the element-wise product of matrices):
δ^(l) = f′^(l)(u^(l)) ⊙ (W^(l+1)T δ^(l+1))   (6.5)
∂E/∂W^(l) = ∂W = δ^(l) z^(l−1)T
Handling the Pooling Layer
No gradient calculation is needed, since a pooling layer has no parameters to learn; only the backpropagation of delta is calculated. Perform calculation (6.5) from the previous page, deciding W^(l+1) for each type of pooling:
Max pooling: w^(l+1)_ji = 1 if (i, j) gives the max value, 0 otherwise
Average pooling: w^(l+1)_ji = 1/H² if i ∈ P_ji, 0 otherwise
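The max-pooling case above can be sketched as follows (single channel, non-overlapping windows assumed for simplicity): each upstream delta is routed to the input pixel that attained the max, whereas average pooling would instead spread delta/H² over the whole window.

```python
import numpy as np

def max_pool_backward(z, delta_out, H, s):
    """Backward pass of max pooling: w_ji = 1 at the argmax of each window,
    0 otherwise, so delta flows only to the max position."""
    dz = np.zeros_like(z)
    out = delta_out.shape[0]
    for i in range(out):
        for j in range(out):
            patch = z[s*i:s*i+H, s*j:s*j+H]
            p, q = np.unravel_index(np.argmax(patch), patch.shape)
            dz[s*i + p, s*j + q] += delta_out[i, j]
    return dz
```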
Thanks• Azusa Colors (Keynote template) http://sanographix.github.io/azusa-colors/