Page 1: 10715-f18/lectures/cnns_2018.pdf

Convolutional Neural Networks

Maria Florina Balcan

10/17/2018

Page 2

Convolutional neural networks

• A specialized kind of neural network for processing data that has a known grid-like topology.

• E.g., time-series data, which can be thought of as a 1-D grid of samples taken at regular time intervals, and image data, which can be thought of as a 2-D grid of pixels.

• The name “convolutional neural network” indicates that the network employs a mathematical operation called convolution. Convolution is a specialized kind of linear operation.

• Convolutional networks are neural networks that use convolution in place of general matrix multiplication in at least one of their layers.

Page 3

Convolutional neural networks

• Strong empirical performance in applications

• Convolutional networks: neural networks that use convolution in place of general matrix multiplication in at least one of their layers, i.e., a layer

$$h = \sigma(W^T x + b)$$

for a specific kind of weight matrix $W$

Page 4

Convolution

Page 5

Convolution: discrete version

• Given arrays $u_t$ and $w_t$, their convolution is a function $s_t$:

$$s_t = \sum_{a=-\infty}^{+\infty} u_a w_{t-a}$$

• Written as $s = u * w$ or $s_t = (u * w)_t$

• When $u_t$ or $w_t$ is not defined, it is assumed to be 0
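A minimal sketch of the discrete definition, assuming NumPy and 0-based arrays (the helper name `convolve` is illustrative, not from the slides). It evaluates the sum term by term, treating out-of-range entries as 0, and compares the result with NumPy's `np.convolve`, whose default `mode="full"` returns every $s_t$ that can be nonzero.

```python
import numpy as np

def convolve(u, w):
    """Discrete convolution s_t = sum_a u_a * w_{t-a}; entries outside u or w count as 0."""
    n, k = len(u), len(w)
    s = np.zeros(n + k - 1)            # s_t can be nonzero only for t = 0, ..., n + k - 2
    for t in range(n + k - 1):
        for a in range(n):
            if 0 <= t - a < k:         # w_{t-a} is defined; otherwise the term is 0
                s[t] += u[a] * w[t - a]
    return s

u = np.array([1.0, 2.0, 3.0])
w = np.array([0.0, 1.0, 0.5])
print(convolve(u, w).tolist())                         # [0.0, 1.0, 2.5, 4.0, 1.5]
print(np.allclose(convolve(u, w), np.convolve(u, w)))  # True
```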

Page 6

Convolution, Motivation

• Suppose we track the location of a spaceship with a laser sensor. The laser sensor provides a single output $u(t)$, the position of the spaceship at second $t$.

• Suppose the sensor is noisy. To obtain a less noisy estimate of the spaceship’s position, we average several measurements. More recent measurements are more relevant, so we use a weighted average that gives more weight to recent measurements.

• Use a weighting function $w(a)$, where $a$ is the age of a measurement. If we apply such a weighted average operation at every moment, we obtain a new function $s$ providing a smoothed estimate of the position of the spaceship:

$$s_t = \sum_{a=-\infty}^{+\infty} u_a w_{t-a}$$
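A sketch of the spaceship example, assuming NumPy. The trajectory, the noise level, and the exponentially decaying weights over the last five readings are all made up for illustration. Because $w$ is zero for negative ages, each smoothed value $s_t$ uses only readings from time $t$ and earlier.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(50)
position = 0.5 * t                                   # hypothetical true trajectory
u = position + rng.normal(scale=2.0, size=t.size)    # noisy sensor readings u(t)

# w(age): exponentially decaying weights over the last 5 readings, normalized to sum to 1
ages = np.arange(5)
w = 0.5 ** ages
w /= w.sum()

# s_t = sum_a u_a w_{t-a}; w is zero for negative ages, so only past readings contribute
s = np.convolve(u, w)[: t.size]

# the same value written out as a weighted average of the five most recent readings
t0 = 20
manual = sum(w[age] * u[t0 - age] for age in ages)
print(np.isclose(s[t0], manual))  # True
```

For the first four time steps some of the five readings do not exist yet; following the convention that undefined entries are 0, those early estimates are pulled toward 0.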

Page 7

Illustration 1

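Input $u = [a, b, c, d, e, f]$ and kernel $w = [z, y, x]$, i.e., $u_0 = a, \ldots, u_5 = f$ and $w_0 = z$, $w_1 = y$, $w_2 = x$. Placing the flipped kernel (x y z) over b c d gives

$$s_3 = w_2 u_1 + w_1 u_2 + w_0 u_3 = xb + yc + zd$$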

Page 8

Illustration 1

Input a b c d e f; placing the flipped kernel (x y z) over c d e gives

$$s_4 = w_2 u_2 + w_1 u_3 + w_0 u_4 = xc + yd + ze$$

Page 9

Illustration 1

Input a b c d e f; placing the flipped kernel (x y z) over d e f gives

$$s_5 = w_2 u_3 + w_1 u_4 + w_0 u_5 = xd + ye + zf$$

Page 10

Illustration 1: boundary case

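Input a b c d e f; at the right boundary only x y of the flipped kernel overlap the input (over e f). Since $u_6$ is not defined, it is taken to be 0:

$$s_6 = w_2 u_4 + w_1 u_5 = xe + yf$$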

Page 11

Illustration 1 as matrix multiplication

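$$\begin{pmatrix} s_1 \\ s_2 \\ s_3 \\ s_4 \\ s_5 \\ s_6 \end{pmatrix} = \begin{pmatrix} y & z & 0 & 0 & 0 & 0 \\ x & y & z & 0 & 0 & 0 \\ 0 & x & y & z & 0 & 0 \\ 0 & 0 & x & y & z & 0 \\ 0 & 0 & 0 & x & y & z \\ 0 & 0 & 0 & 0 & x & y \end{pmatrix} \begin{pmatrix} a \\ b \\ c \\ d \\ e \\ f \end{pmatrix}$$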
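The matrix form can be checked numerically. A sketch assuming NumPy: `conv_matrix` is a hypothetical helper that builds the banded (Toeplitz) matrix of Illustration 1, where each row holds the kernel shifted one column to the right, and the product is compared with `np.convolve`. The numbers used for a to f and x, y, z are illustrative.

```python
import numpy as np

def conv_matrix(w, n):
    """n x n banded matrix W with W @ u = (s_1, ..., s_n), where s_t = sum_a u_a w_{t-a}."""
    k = len(w)
    W = np.zeros((n, n))
    for t in range(1, n + 1):         # output index t = 1, ..., n (row t - 1)
        for a in range(n):            # input index a = 0, ..., n - 1 (column a)
            if 0 <= t - a < k:        # w_{t-a} is defined
                W[t - 1, a] = w[t - a]
    return W

x, y, z = 1.0, 2.0, 3.0                         # illustrative kernel values
w = np.array([z, y, x])                         # w_0 = z, w_1 = y, w_2 = x
u = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # a, ..., f

W = conv_matrix(w, len(u))
print(W[0].tolist())                               # [2.0, 3.0, 0.0, 0.0, 0.0, 0.0], the row "y z"
print(np.allclose(W @ u, np.convolve(u, w)[1:7]))  # True: the same s_1, ..., s_6
```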

Page 12

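Illustration 2: two-dimensional case

Input:

$$\begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \end{pmatrix}$$

Kernel:

$$\begin{pmatrix} w & x \\ y & z \end{pmatrix}$$

Output (the kernel is slid over every 2 × 2 window of the input):

$$\begin{pmatrix} aw + bx + ey + fz & bw + cx + fy + gz & cw + dx + gy + hz \\ ew + fx + iy + jz & fw + gx + jy + kz & gw + hx + ky + lz \end{pmatrix}$$

Unlike Illustration 1, the kernel is not flipped here; strictly speaking this is cross-correlation, which is what most deep learning libraries implement under the name “convolution”.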

Page 13

Illustration 2: two-dimensional case

Input and kernel as above. Placing the kernel over the top-left 2 × 2 window of the input (a b / e f) gives the first output:

$$aw + bx + ey + fz$$

Page 14

Illustration 2

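Input and kernel as above. Moving the kernel one column to the right (window b c / f g) gives the next output, $bw + cx + fy + gz$, placed next to the first output $aw + bx + ey + fz$.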

Page 15

Illustration 2

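Terminology, for the same example:

• Input: the 3 × 4 grid a b c d / e f g h / i j k l

• Kernel (or filter): the 2 × 2 grid w x / y z

• Feature map: the grid of outputs, starting with $aw + bx + ey + fz$ and $bw + cx + fy + gz$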
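A sketch of Illustration 2 in NumPy, using the illustrative numbers 1 to 12 for a to l and 1 to 4 for w, x, y, z. Like the slides (and like most deep learning libraries), it does not flip the kernel and uses no padding, so a 3 × 4 input and a 2 × 2 kernel give a 2 × 3 feature map. The function name `conv2d_valid` is hypothetical.

```python
import numpy as np

def conv2d_valid(inp, kernel):
    """Slide `kernel` over every fully overlapping window of `inp` (no flip, no padding)."""
    H, W = inp.shape
    kh, kw = kernel.shape
    fmap = np.zeros((H - kh + 1, W - kw + 1))   # the feature map
    for i in range(fmap.shape[0]):
        for j in range(fmap.shape[1]):
            # elementwise product of kernel and window, summed
            fmap[i, j] = np.sum(inp[i:i + kh, j:j + kw] * kernel)
    return fmap

inp = np.arange(1, 13, dtype=float).reshape(3, 4)   # a, b, ..., l = 1, 2, ..., 12
kernel = np.array([[1.0, 2.0],                      # w, x
                   [3.0, 4.0]])                     # y, z
print(conv2d_valid(inp, kernel).tolist())
# [[44.0, 54.0, 64.0], [84.0, 94.0, 104.0]]
```

The top-left entry is $aw + bx + ey + fz = 1 \cdot 1 + 2 \cdot 2 + 5 \cdot 3 + 6 \cdot 4 = 44$.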

Page 17

Advantage: sparse interaction

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Fully connected layer with $n$ input nodes and $m$ output nodes: $m \times n$ edges

Page 18

Advantage: sparse interaction

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Convolutional layer with $n$ input nodes, $m$ output nodes, and kernel size $k$: each output node is connected to at most $k$ input nodes, hence at most $m \times k$ edges

Store fewer parameters:

• reduces memory requirements

• improves statistical efficiency
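To make the edge counts concrete, a small sketch (NumPy assumed) that builds the connectivity mask of a stride-1 convolution with zero padding (“same”, so $m = n$) and an odd kernel width $k$. The sizes $n = 5$, $k = 3$ are illustrative. Boundary outputs see fewer than $k$ inputs, which is why the bound is $\le m \times k$ rather than an equality.

```python
import numpy as np

def dense_edges(n, m):
    """Fully connected: every one of the m outputs sees all n inputs."""
    return n * m

def conv_edges(n, k):
    """Edges of a stride-1, zero-padded ('same') 1-D convolution with odd kernel width k (m = n)."""
    half = k // 2
    connected = np.zeros((n, n), dtype=bool)   # connected[i, j]: output i depends on input j
    for i in range(n):
        connected[i, max(0, i - half):min(n, i + half + 1)] = True
    return int(connected.sum())

print(dense_edges(5, 5), conv_edges(5, 3))  # 25 13
```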

Page 19

Advantage: sparse interaction

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Multiple convolutional layers: larger receptive field

• The receptive field of units in deeper layers is larger than the receptive field of units in shallow layers.

• Even though direct connections are sparse, units in the deeper layers are indirectly connected to most of the input image.

• The first layers capture more local features; deeper layers capture more global features.
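The growth of the receptive field can be computed layer by layer. A sketch in plain Python, assuming each layer (convolution or pooling) is described by a (kernel size, stride) pair with no dilation: each layer adds $(k - 1)$ times the product of all earlier strides.

```python
def receptive_field(layers):
    """Receptive field (in input positions, per dimension) of one unit after a stack of layers.

    layers: list of (kernel_size, stride) pairs, first layer first.
    Uses r_l = r_{l-1} + (k_l - 1) * j_{l-1}, where j is the product of the earlier strides.
    """
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
    return r

for depth in range(1, 4):
    print(depth, receptive_field([(3, 1)] * depth))   # prints 1 3, then 2 5, then 3 7
```

For example, `receptive_field([(5, 1), (2, 2), (5, 1)])` returns 14: after a 5x5 convolution, 2x2 pooling with stride 2, and another 5x5 convolution (the first three LeNet-5 stages), a unit sees a 14 × 14 patch of the input.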

Page 20

Advantage: parameter sharing

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

The same kernel is used repeatedly at every position. E.g., the black edges all correspond to the same weight in the kernel.

This reduces the storage requirements of the model.
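Parameter sharing is what makes the storage saving so large. A back-of-the-envelope sketch in plain Python comparing the number of weights (biases ignored) for a fully connected layer, a locally connected layer with sparse but unshared weights, and a convolutional layer with one shared kernel; the sizes are illustrative.

```python
def weight_counts(n_in, n_out, k):
    """Number of weights (biases ignored) needed to map n_in inputs to n_out outputs."""
    return {
        "fully connected": n_in * n_out,   # one weight per edge
        "locally connected": n_out * k,    # sparse, but each output has its own k weights (upper bound)
        "convolutional": k,                # one k-element kernel shared by all outputs
    }

print(weight_counts(n_in=1000, n_out=1000, k=3))
# {'fully connected': 1000000, 'locally connected': 3000, 'convolutional': 3}
```

Note that the number of shared weights does not depend on the input size at all, only on the kernel size.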

Page 21

Advantage: equivariant representations

• Equivariant: transforming the input and then applying the function gives the same result as applying the function and then transforming the output

• Example: the input is an image and the transformation is a shift

• Convolution(shift(input)) = shift(Convolution(input))

• Useful when we care only about whether a pattern is present, rather than where it is located
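Equivariance to shifts can be checked numerically. A sketch assuming NumPy: the shift delays the signal by padding zeros at the front, and the kernel `[1, -1]` (a simple difference filter) is an arbitrary choice. Near the borders of a finite signal equivariance generally holds only up to boundary effects; zero-padding the shift and keeping `np.convolve`'s full output avoids them here, so the two sides agree exactly.

```python
import numpy as np

def shift_right(x, s):
    """Delay a 1-D signal by s samples, filling the front with zeros."""
    return np.concatenate([np.zeros(s), x])

rng = np.random.default_rng(0)
x = rng.normal(size=10)
w = np.array([1.0, -1.0])   # a simple difference kernel (arbitrary choice)
s = 3

lhs = np.convolve(shift_right(x, s), w)   # Convolution(shift(input))
rhs = shift_right(np.convolve(x, w), s)   # shift(Convolution(input))
print(np.allclose(lhs, rhs))  # True
```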

Page 22

Pooling

Page 23

Terminology

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Page 24

Pooling

• Summarizes the input (e.g., outputs the max of the input)

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

A pooling function replaces the output of the net at a certain location with a summary statistic of the nearby outputs. For example, max pooling takes the maximum output within a rectangular neighborhood.
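A minimal sketch of max pooling on a 1-D sequence, assuming NumPy; `max_pool1d` is an illustrative name, windows are taken without padding, and `stride` controls how far the window moves between outputs.

```python
import numpy as np

def max_pool1d(x, width, stride=1):
    """Replace each window of `width` neighboring values by its maximum (no padding)."""
    return np.array([x[i:i + width].max() for i in range(0, len(x) - width + 1, stride)])

x = np.array([0.1, 1.0, 0.2, 0.1, 0.0, 0.3])
print(max_pool1d(x, width=3).tolist())            # [1.0, 1.0, 0.2, 0.3]
print(max_pool1d(x, width=2, stride=2).tolist())  # [1.0, 0.2, 0.3]
```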

Page 25

Advantage

• Induces invariance: shifting the input by a small amount leaves most of the pooled outputs unchanged

Figure from Deep Learning, by Goodfellow, Bengio, and Courville
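The invariance can be seen in a small example (NumPy assumed, numbers made up): shifting the input right by one position changes every input value, but only some of the max-pooled outputs (window width 3, stride 1).

```python
import numpy as np

def max_pool1d(x, width):
    """Stride-1 max pooling without padding."""
    return np.array([x[i:i + width].max() for i in range(len(x) - width + 1)])

x = np.array([0.1, 1.0, 0.2, 0.1, 0.0, 0.3])
x_shifted = np.concatenate([[0.0], x[:-1]])     # shift the input right by one position

n_in_changed = int(np.sum(x != x_shifted))
n_out_changed = int(np.sum(max_pool1d(x, 3) != max_pool1d(x_shifted, 3)))
print(n_in_changed, "of", x.size, "inputs changed")   # 6 of 6 inputs changed
print(n_out_changed, "of 4 pooled outputs changed")   # 2 of 4 pooled outputs changed
```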

Page 26

Variants of pooling

• Max pooling: $y = \max\{x_1, x_2, \ldots, x_k\}$

• Average pooling: $y = \operatorname{mean}\{x_1, x_2, \ldots, x_k\}$

• Others, such as maxout
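A sketch comparing the two variants on a 2-D feature map, assuming NumPy, 2 × 2 windows, and stride 2 (the pooling configuration used in LeNet-5 later in these slides); the feature-map values are invented and `pool2d` is a hypothetical helper.

```python
import numpy as np

def pool2d(fmap, size=2, stride=2, reduce=np.max):
    """Summarize each size x size window of a 2-D feature map with `reduce` (np.max or np.mean)."""
    out_h = (fmap.shape[0] - size) // stride + 1
    out_w = (fmap.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = reduce(window)
    return out

fmap = np.array([[1.0, 3.0, 2.0, 0.0],
                 [5.0, 7.0, 1.0, 1.0],
                 [0.0, 2.0, 4.0, 8.0],
                 [2.0, 0.0, 6.0, 2.0]])
print(pool2d(fmap, reduce=np.max).tolist())    # [[7.0, 2.0], [2.0, 8.0]]
print(pool2d(fmap, reduce=np.mean).tolist())   # [[4.0, 1.0], [1.0, 5.0]]
```

With stride equal to the window size the windows do not overlap, so each spatial dimension is halved.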

Page 27

Motivation from neuroscience

• David Hubel and Torsten Wiesel studied the early visual system of the mammalian brain (V1, the primary visual cortex) by recording from neurons in cats and monkeys, and won the Nobel Prize (1981) for this work

• V1 properties:

  • 2D spatial arrangement

  • Simple cells: inspire convolution layers

  • Complex cells: inspire pooling layers

Page 28

Variants of convolution and pooling

Page 29

Variants of convolutional layers

• Multidimensional convolution

  • Input and kernel can be 3D, e.g., images have (width, height, RGB channels)

• Multiple kernels lead to multiple feature maps (also called channels)
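A sketch of a multi-channel convolutional layer, assuming NumPy, a channels-last layout (height, width, channels), no padding, stride 1, and no kernel flip. Each kernel spans all input channels and produces one feature map, so K kernels give K output channels. The function name and the sizes (a 32 × 32 × 3 input and six 5 × 5 × 3 kernels) are illustrative.

```python
import numpy as np

def conv2d_multichannel(x, kernels):
    """Valid, stride-1, unflipped convolution of an (H, W, C) input with K kernels.

    kernels has shape (K, kh, kw, C); the result has shape (H - kh + 1, W - kw + 1, K).
    """
    H, W, C = x.shape
    K, kh, kw, kc = kernels.shape
    if kc != C:
        raise ValueError("each kernel must span all input channels")
    out = np.zeros((H - kh + 1, W - kw + 1, K))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw, :]            # (kh, kw, C)
            # one number per kernel: sum over the window and all channels
            out[i, j, :] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 32, 3))          # e.g., a small RGB image
kernels = rng.normal(size=(6, 5, 5, 3))   # six 5x5x3 kernels
out = conv2d_multichannel(x, kernels)
print(out.shape)  # (28, 28, 6)
```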

Page 30

Variants of convolutional layers

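• Padding: valid (no zero padding; the kernel is placed only where it lies entirely inside the input)

For input a b c d e f and kernel x y z, the last output is $xd + ye + zf$, so the output (4 values) is shorter than the input (6 values).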

Page 31

Variants of convolutional layers

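• Padding: same (the input is padded with zeros so that the output has the same size as the input)

For input a b c d e f, at the right boundary only x y of the kernel overlap the input, giving $xe + yf$ (the padded zero contributes nothing).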
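The two padding modes can be compared with NumPy's `np.convolve`, whose `mode="valid"` and `mode="same"` options produce outputs of length $n - k + 1$ and $n$ for an input of length $n$ and a kernel of width $k$. The numbers substituted for a to f and x, y, z are illustrative, chosen so each term is easy to read off.

```python
import numpy as np

a, b, c, d, e, f = 1.0, 2.0, 3.0, 4.0, 5.0, 6.0
x, y, z = 1.0, 10.0, 100.0
u = np.array([a, b, c, d, e, f])
w = np.array([z, y, x])                  # w_0 = z, w_1 = y, w_2 = x, as in Illustration 1

valid = np.convolve(u, w, mode="valid")  # kernel must fit entirely inside the input
same = np.convolve(u, w, mode="same")    # zero-padded so the output length equals the input length

print(len(valid), valid[-1])  # 4 654.0, i.e. x*d + y*e + z*f
print(len(same), same[-1])    # 6 65.0, i.e. x*e + y*f
```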

Page 32

Variants of convolutional layers

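• Stride: the kernel is moved $s$ positions at a time, so the output is computed only at every $s$-th position (stride $s = 1$ is ordinary convolution)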

Figure from Deep Learning, by Goodfellow, Bengio, and Courville
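A stride-$s$ convolution gives the same result as a unit-stride convolution followed by keeping every $s$-th output, with less computation. A sketch assuming NumPy (`conv1d_strided` is a hypothetical helper that, like most libraries, does not flip the kernel) checks this against `np.correlate`, which computes the unflipped sliding dot product.

```python
import numpy as np

def conv1d_strided(u, kernel, stride):
    """Valid 1-D sliding dot product (no kernel flip), evaluated at every `stride`-th position."""
    k = len(kernel)
    return np.array([np.dot(u[i:i + k], kernel) for i in range(0, len(u) - k + 1, stride)])

rng = np.random.default_rng(0)
u = rng.normal(size=10)
kernel = np.array([1.0, 0.0, -1.0])

direct = conv1d_strided(u, kernel, stride=2)              # 4 outputs, computed directly
downsampled = np.correlate(u, kernel, mode="valid")[::2]  # 8 outputs at stride 1, keep every 2nd
print(np.allclose(direct, downsampled))  # True
```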

Page 33

Variants of pooling

• Stride and padding

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Page 34

Case study: LeNet-5

Page 37

LeNet-5

• Proposed in “Gradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998

• Apply convolution on 2D images (MNIST) and use backpropagation

• Structure: 2 convolutional layers (with pooling) + 3 fully connected layers

  • Input size: 32x32x1

  • Convolution kernel size: 5x5

  • Pooling: 2x2

Page 38

LeNet-5

Figure from “Gradient-based learning applied to document recognition”, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

Page 40

LeNet-5

Figure from “Gradient-based learning applied to document recognition”, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

Filter: 5x5, stride: 1x1, #filters: 6 (output: 28x28x6)

Page 41

LeNet-5

Figure from “Gradient-based learning applied to document recognition”, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

Pooling: 2x2, stride: 2 (output: 14x14x6)

Page 42

LeNet-5

Figure from “Gradient-based learning applied to document recognition”, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

Filter: 5x5x6, stride: 1x1, #filters: 16 (output: 10x10x16)

Page 43

LeNet-5

Figure from “Gradient-based learning applied to document recognition”, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

Pooling: 2x2, stride: 2 (output: 5x5x16)

Page 44

LeNet-5

Figure from “Gradient-based learning applied to document recognition”, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

Weight matrix: 400x120 (the 5x5x16 = 400 values from the last pooling layer are flattened into a vector)

Page 45

LeNet-5

Figure from “Gradient-based learning applied to document recognition”, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

Weight matrix: 120x84

Weight matrix: 84x10 (10 outputs, one per digit class)
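The layer sizes above can be traced in a few lines of plain Python. This is a sketch under stated assumptions: every filter spans all input channels (5x5x6 in the second convolutional layer, as on the slides), each filter and each fully connected unit has one bias, and pooling has no trainable weights. The original paper differs in details (for example, its second convolutional layer connects each feature map to only a subset of the previous maps), so the total below describes the simplified network on these slides, not the published one.

```python
def out_size(n, window, stride=1):
    """Width of a valid (unpadded) convolution or pooling output."""
    return (n - window) // stride + 1

def lenet5_trace():
    """Return [(layer, output shape, parameter count)] and the total, for the slides' LeNet-5."""
    trace, total = [], 0
    n, c = 32, 1                                   # input: 32x32x1
    for name, window, stride, filters in [
        ("conv1", 5, 1, 6),                        # 6 filters of 5x5x1
        ("pool1", 2, 2, None),                     # 2x2 pooling, stride 2
        ("conv2", 5, 1, 16),                       # 16 filters of 5x5x6
        ("pool2", 2, 2, None),
    ]:
        n = out_size(n, window, stride)
        if filters is None:
            params = 0                             # assumed: no trainable pooling weights
        else:
            params = filters * (window * window * c + 1)   # weights + one bias per filter
            c = filters
        total += params
        trace.append((name, (n, n, c), params))
    fan_in = n * n * c                             # flatten 5x5x16 -> 400
    for name, fan_out in [("fc1", 120), ("fc2", 84), ("fc3", 10)]:
        params = fan_in * fan_out + fan_out        # weight matrix + biases
        total += params
        trace.append((name, (fan_out,), params))
        fan_in = fan_out
    return trace, total

trace, total = lenet5_trace()
for name, shape, params in trace:
    print(name, shape, params)
print("total", total)   # total 61706
```

Most of the parameters sit in the first fully connected layer (the 400x120 weight matrix), while the two convolutional layers together need fewer than 3,000, which illustrates the parameter-sharing advantage discussed earlier.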