TRANSCRIPT
Justin Johnson, September 24, 2019

Lecture 7: Convolutional Networks
Reminder: A2
Due Monday, September 30, 11:59pm (even if you enrolled late!)
Your submission must pass the validation script.
Slight schedule change
Content originally planned for today got split into two lectures. This pushes the schedule back a bit:
- A4 due date: Friday 11/1 -> Friday 11/8
- A5 due date: Friday 11/15 -> Friday 11/22
- A6 due date: still Friday 12/6
Last Time: Backpropagation
[Figure: computational graph for a linear classifier: input x and weight matrix W produce scores s, which feed a hinge loss; a regularization term R(W) is added to give the total loss L]
Represent complex expressions as computational graphs.
Forward pass computes outputs.
Backward pass computes gradients.
[Figure: a node f receives an upstream gradient and has local gradients for each input]
During the backward pass, each node in the graph receives upstream gradients and multiplies them by local gradients to compute downstream gradients.
[Figure: a 2x2 input image with pixel values 56, 231, 24, 2 is stretched into a column vector of shape (4,). For a 32x32x3 image: input 3072, hidden layer 100, output 10, with f(x, W) = Wx]
Problem: So far our classifiers don't respect the spatial structure of images!
Solution: Define new computational nodes that operate on images!
Components of a Fully-Connected Network
Fully-Connected Layers; Activation Function
Components of a Convolutional Network
Convolution Layers; Pooling Layers; Fully-Connected Layers; Activation Function; Normalization
Fully-Connected Layer
32x32x3 image -> stretch to 3072x1
Input: 3072x1; weights: 10x3072; output: 10x1
Each output is 1 number: the result of taking a dot product between a row of W and the input (a 3072-dimensional dot product).
Convolution Layer
3x32x32 image: preserve spatial structure (3 channels/depth, height 32, width 32)
3x5x5 filter
Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products".
Filters always extend the full depth of the input volume.
At each position we get 1 number: the result of taking a dot product between the filter and a small 3x5x5 chunk of the image (i.e. a 3*5*5 = 75-dimensional dot product + bias).
Convolve (slide) over all spatial locations to get a 1x28x28 activation map.
Consider repeating with a second (green) filter: now we get two 1x28x28 activation maps.
Consider 6 filters, each 3x5x5: we get 6 activation maps, each 1x28x28. Stack activations to get a 6x28x28 output image!
There is also a 6-dim bias vector: one bias per filter.
Equivalently, the output is a 28x28 grid, with a 6-dim vector at each point.
For a batch of images: a 2x3x32x32 batch of inputs gives a 2x6x28x28 batch of outputs.
In general: an N x Cin x H x W batch of images, convolved with Cout x Cin x Kh x Kw filters plus a Cout-dim bias vector, gives an N x Cout x H' x W' batch of outputs.
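To make the shapes concrete, here is a minimal PyTorch sketch (assuming PyTorch is available; the sizes mirror the 6-filter example above):

```python
import torch
import torch.nn as nn

# Batch of 2 images, 3 channels, 32x32 -- as in the example above
x = torch.randn(2, 3, 32, 32)

# 6 filters, each 3x5x5 (Cout=6, Cin=3, K=5), stride 1, no padding
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)

y = conv(x)
print(y.shape)            # torch.Size([2, 6, 28, 28])
print(conv.weight.shape)  # torch.Size([6, 3, 5, 5])
print(conv.bias.shape)    # torch.Size([6])
```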
Stacking Convolutions
Input: N x 3 x 32 x 32
Conv W1: 6x3x5x5, b1: 6 -> first hidden layer: N x 6 x 28 x 28
Conv W2: 10x6x3x3, b2: 10 -> second hidden layer: N x 10 x 26 x 26
Conv W3: 12x10x3x3, b3: 12 -> ...
Q: What happens if we stack two convolution layers? A: We get another convolution! (Recall y = W2 W1 x is a linear classifier.)
So we insert an activation function between convolutions: Conv, ReLU, Conv, ReLU, Conv, ReLU.
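As a sketch, that three-layer stack might look like this in PyTorch (layer sizes taken from the slide above):

```python
import torch
import torch.nn as nn

# Conv -> ReLU -> Conv -> ReLU -> Conv -> ReLU
model = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5),    # W1: 6x3x5x5, b1: 6
    nn.ReLU(),
    nn.Conv2d(6, 10, kernel_size=3),   # W2: 10x6x3x3, b2: 10
    nn.ReLU(),
    nn.Conv2d(10, 12, kernel_size=3),  # W3: 12x10x3x3, b3: 12
    nn.ReLU(),
)

x = torch.randn(1, 3, 32, 32)
print(model(x).shape)  # torch.Size([1, 12, 24, 24])
```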
What do convolutional filters learn?
Linear classifier: one template per class.
MLP: bank of whole-image templates.
First-layer conv filters: local image templates (they often learn oriented edges and opposing colors). Example: AlexNet's first layer has 64 filters, each 3x11x11.
A closer look at spatial dimensions
Input: 7x7; Filter: 3x3; Output: 5x5
In general: Input: W; Filter: K; Output: W - K + 1
Problem: Feature maps "shrink" with each layer!
Solution: padding. Add zeros around the input.
[Figure: the 7x7 input surrounded by a border of zeros, giving a 9x9 padded input]
In general: Input: W; Filter: K; Padding: P; Output: W - K + 1 + 2P
Very common: set P = (K - 1) / 2 to make the output have the same size as the input!
Receptive Fields
For convolution with kernel size K, each element in the output depends on a K x K receptive field in the input.
Each successive convolution adds K - 1 to the receptive field size: with L layers the receptive field size is 1 + L * (K - 1).
Be careful: "receptive field in the input" vs "receptive field in the previous layer". Hopefully clear from context!
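A quick sanity check of the formula in plain Python (the helper name is illustrative, not from the slides):

```python
import math

# Receptive field size after stacking num_layers convs with kernel size k
def receptive_field(num_layers, k):
    return 1 + num_layers * (k - 1)

print(receptive_field(6, 3))  # 13

# How many 3x3 conv layers until the receptive field covers a 224-pixel image?
print(math.ceil((224 - 1) / (3 - 1)))  # 112 layers!
```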
Problem: For large images we need many layers for each output to "see" the whole image.
Solution: Downsample inside the network.
Strided Convolution
Input: 7x7; Filter: 3x3; Stride: 2; Output: 3x3
In general: Input: W; Filter: K; Padding: P; Stride: S; Output: (W - K + 2P) / S + 1
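The output-size formula translates directly into a one-line helper (a sketch; the function name is illustrative, and integer division assumes the sizes divide evenly):

```python
def conv_out_size(w, k, p=0, s=1):
    """Spatial output size for input W, kernel K, padding P, stride S."""
    return (w - k + 2 * p) // s + 1

print(conv_out_size(7, 3))       # 5  (no padding, stride 1)
print(conv_out_size(7, 3, p=1))  # 7  ("same" padding)
print(conv_out_size(7, 3, s=2))  # 3  (strided convolution, as above)
```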
Convolution Example
Input volume: 3 x 32 x 32; 10 5x5 filters with stride 1, pad 2.
Output volume size: (32 + 2*2 - 5) / 1 + 1 = 32 spatially, so 10 x 32 x 32.
Number of learnable parameters: 760. Parameters per filter: 3*5*5 + 1 (for bias) = 76; 10 filters, so the total is 10 * 76 = 760.
Number of multiply-add operations: 768,000. There are 10*32*32 = 10,240 outputs, and each output is the inner product of two 3x5x5 tensors (75 elements), so the total is 75 * 10,240 = 768K.
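These counts are easy to verify in PyTorch (a sketch; `numel` sums the elements of the weight and bias tensors):

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5,
                 stride=1, padding=2)

# Learnable parameters: 10*3*5*5 weights + 10 biases = 760
num_params = sum(p.numel() for p in conv.parameters())
print(num_params)  # 760

# Multiply-adds: a 75-dim inner product per output element
num_outputs = 10 * 32 * 32
print(num_outputs * 3 * 5 * 5)  # 768000
```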
Example: 1x1 Convolution
A 64 x 56 x 56 input passed through a 1x1 conv with 32 filters gives a 32 x 56 x 56 output (each filter has size 1x1x64, and performs a 64-dimensional dot product).
Stacking 1x1 conv layers gives an MLP operating on each input position.
Lin et al, "Network in Network", ICLR 2014
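A minimal sketch of the example above; note that a 1x1 conv acts like a per-position linear layer across channels:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)              # 64 x 56 x 56 input
conv1x1 = nn.Conv2d(64, 32, kernel_size=1)  # 32 filters, each 1x1x64
print(conv1x1(x).shape)  # torch.Size([1, 32, 56, 56])

# Stacking 1x1 convs (with ReLU) = an MLP applied at every spatial position
mlp_per_position = nn.Sequential(
    nn.Conv2d(64, 32, kernel_size=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=1),
)
```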
Convolution Summary
Input: Cin x H x W
Hyperparameters:
- Kernel size: KH x KW
- Number of filters: Cout
- Padding: P
- Stride: S
Weight matrix: Cout x Cin x KH x KW, giving Cout filters of size Cin x KH x KW
Bias vector: Cout
Output size: Cout x H' x W', where:
- H' = (H - K + 2P) / S + 1
- W' = (W - K + 2P) / S + 1
Common settings:
- KH = KW (small square filters)
- P = (K - 1) / 2 ("same" padding)
- Cin, Cout = 32, 64, 128, 256 (powers of 2)
- K = 3, P = 1, S = 1 (3x3 conv)
- K = 5, P = 2, S = 1 (5x5 conv)
- K = 1, P = 0, S = 1 (1x1 conv)
- K = 3, P = 1, S = 2 (downsample by 2)
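These common settings map directly onto `nn.Conv2d` arguments (a sketch; the channel counts here are arbitrary examples):

```python
import torch.nn as nn

conv3x3 = nn.Conv2d(64, 64, kernel_size=3, padding=1, stride=1)  # 3x3 conv
conv5x5 = nn.Conv2d(64, 64, kernel_size=5, padding=2, stride=1)  # 5x5 conv
conv1x1 = nn.Conv2d(64, 64, kernel_size=1, padding=0, stride=1)  # 1x1 conv
downsample = nn.Conv2d(64, 128, kernel_size=3, padding=1, stride=2)  # /2
```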
Other Types of Convolution
So far: 2D convolution. Input: Cin x H x W; weights: Cout x Cin x K x K.
1D convolution: Input: Cin x W; weights: Cout x Cin x K.
3D convolution: Input: Cin x H x W x D (a Cin-dim vector at each point in the volume); weights: Cout x Cin x K x K x K.
PyTorch Convolution Layers
[Slides show the torch.nn documentation for PyTorch's convolution layers]
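A sketch of the corresponding PyTorch layers (Conv1d / Conv2d / Conv3d), matching the input shapes from the previous slide:

```python
import torch
import torch.nn as nn

conv1d = nn.Conv1d(in_channels=3, out_channels=8, kernel_size=3)
conv2d = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
conv3d = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3)

print(conv1d(torch.randn(1, 3, 32)).shape)          # [1, 8, 30]
print(conv2d(torch.randn(1, 3, 32, 32)).shape)      # [1, 8, 30, 30]
print(conv3d(torch.randn(1, 3, 32, 32, 32)).shape)  # [1, 8, 30, 30, 30]
```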
Pooling Layers: Another way to downsample
Hyperparameters: kernel size, stride, pooling function
Max Pooling
Single depth slice of the input (4x4):
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4
Max pooling with 2x2 kernel size and stride 2 gives (2x2):
6 8
3 4
Introduces invariance to small spatial shifts. No learnable parameters!
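The worked example above can be reproduced directly (a sketch using `nn.MaxPool2d`):

```python
import torch
import torch.nn as nn

x = torch.tensor([[1., 1., 2., 4.],
                  [5., 6., 7., 8.],
                  [3., 2., 1., 0.],
                  [1., 2., 3., 4.]]).reshape(1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).reshape(2, 2))
# tensor([[6., 8.],
#         [3., 4.]])
```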
Pooling Summary
Input: C x H x W
Hyperparameters:
- Kernel size: K
- Stride: S
- Pooling function (max, avg)
Output: C x H' x W', where:
- H' = (H - K) / S + 1
- W' = (W - K) / S + 1
Learnable parameters: none!
Common settings: max, K = 2, S = 2; max, K = 3, S = 2 (AlexNet)
Convolutional Networks
Classic architecture: [Conv, ReLU, Pool] x N, flatten, [FC, ReLU] x N, FC
Example: LeNet-5 (LeCun et al, "Gradient-based learning applied to document recognition", 1998)
Layer                           | Output Size  | Weight Size
Input                           | 1 x 28 x 28  |
Conv (Cout=20, K=5, P=2, S=1)   | 20 x 28 x 28 | 20 x 1 x 5 x 5
ReLU                            | 20 x 28 x 28 |
MaxPool (K=2, S=2)              | 20 x 14 x 14 |
Conv (Cout=50, K=5, P=2, S=1)   | 50 x 14 x 14 | 50 x 20 x 5 x 5
ReLU                            | 50 x 14 x 14 |
MaxPool (K=2, S=2)              | 50 x 7 x 7   |
Flatten                         | 2450         |
Linear (2450 -> 500)            | 500          | 2450 x 500
ReLU                            | 500          |
Linear (500 -> 10)              | 10           | 500 x 10
As we go through the network:
- Spatial size decreases (using pooling or strided conv)
- Number of channels increases (total "volume" is preserved!)
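A sketch of this LeNet-5 variant in PyTorch, with layer sizes taken from the table above (this mirrors the structure shown here, not the original 1998 implementation):

```python
import torch
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 20, kernel_size=5, padding=2), nn.ReLU(),   # 20 x 28 x 28
    nn.MaxPool2d(kernel_size=2, stride=2),                   # 20 x 14 x 14
    nn.Conv2d(20, 50, kernel_size=5, padding=2), nn.ReLU(),  # 50 x 14 x 14
    nn.MaxPool2d(kernel_size=2, stride=2),                   # 50 x 7 x 7
    nn.Flatten(),                                            # 2450
    nn.Linear(2450, 500), nn.ReLU(),
    nn.Linear(500, 10),
)

x = torch.randn(1, 1, 28, 28)  # one MNIST-sized grayscale image
print(lenet5(x).shape)  # torch.Size([1, 10])
```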
Problem: Deep networks are very hard to train!
Batch Normalization
Ioffe and Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift", ICML 2015
Idea: "normalize" the outputs of a layer so they have zero mean and unit variance.
Why? It helps reduce "internal covariate shift" and improves optimization.
We can normalize a batch of activations like this: x_hat = (x - mean) / sqrt(var + eps).
This is a differentiable function, so we can use it as an operator in our networks and backprop through it!
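A sketch of that computation on an N x D batch of activations (eps is the small constant from the paper, added for numerical stability):

```python
import torch

x = torch.randn(32, 100)  # batch of N=32 vectors, D=100
eps = 1e-5

mean = x.mean(dim=0)                # per-channel mean, shape (D,)
var = x.var(dim=0, unbiased=False)  # per-channel variance, shape (D,)
x_hat = (x - mean) / torch.sqrt(var + eps)  # zero mean, unit variance

print(x_hat.mean(dim=0).abs().max())  # ~0
print(x_hat.std(dim=0).mean())        # ~1
```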
Input x: shape N x D.
Compute the per-channel mean (shape D) and per-channel std (shape D), and use them to produce the normalized x (shape N x D).
Problem: What if zero-mean, unit variance is too hard of a constraint?
Add learnable scale and shift parameters γ and β (each of shape D); the output is y = γ * x_hat + β, with shape N x D.
Learning γ = σ, β = μ will recover the identity function!
Batch Normalization: Test-Time
Problem: The estimates of μ and σ depend on the minibatch, so we can't compute them the same way at test-time!
Solution: at test time, replace the minibatch μ and σ with (running) averages of the values seen during training.
During testing, batchnorm becomes a linear operator! It can be fused with the previous fully-connected or conv layer.
Batch Normalization for ConvNets
Batch normalization for fully-connected networks: x: N × D; μ, σ: 1 × D; γ, β: 1 × D; y = γ(x - μ)/σ + β (normalize over the batch dimension N)
Batch normalization for convolutional networks (spatial batchnorm, BatchNorm2d): x: N × C × H × W; μ, σ: 1 × C × 1 × 1; γ, β: 1 × C × 1 × 1; y = γ(x - μ)/σ + β (normalize over N, H, W)
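In PyTorch these correspond to `nn.BatchNorm1d` (N x D inputs) and `nn.BatchNorm2d` (N x C x H x W inputs); a sketch:

```python
import torch
import torch.nn as nn

bn_fc = nn.BatchNorm1d(100)   # gamma, beta, running stats all have shape (100,)
bn_conv = nn.BatchNorm2d(64)  # one mean/std/gamma/beta per channel

print(bn_fc(torch.randn(32, 100)).shape)          # [32, 100]
print(bn_conv(torch.randn(8, 64, 56, 56)).shape)  # [8, 64, 56, 56]

# Train vs test behavior differs: eval() switches to the running averages
bn_conv.eval()
```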
Batch Normalization
Usually inserted after fully-connected or convolutional layers, and before the nonlinearity: FC -> BN -> tanh -> FC -> BN -> tanh -> ...
Ioffe and Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift", ICML 2015
- Makes deep networks much easier to train!
- Allows higher learning rates, faster convergence
- Networks become more robust to initialization
- Acts as regularization during training
- Zero overhead at test-time: can be fused with conv!
[Figure: ImageNet accuracy vs training iterations, showing faster convergence with batch normalization]
However:
- Not well understood theoretically (yet)
- Behaves differently during training and testing: this is a very common source of bugs!
Layer Normalization
Batch normalization for fully-connected networks: x: N × D; μ, σ: 1 × D; γ, β: 1 × D; y = γ(x - μ)/σ + β
Layer normalization for fully-connected networks: x: N × D; μ, σ: N × 1; γ, β: 1 × D; y = γ(x - μ)/σ + β
Same behavior at train and test! Used in RNNs, Transformers.
Ba, Kiros, and Hinton, "Layer Normalization", arXiv 2016
Instance Normalization
Batch normalization for convolutional networks: x: N × C × H × W; μ, σ: 1 × C × 1 × 1; γ, β: 1 × C × 1 × 1; y = γ(x - μ)/σ + β
Instance normalization for convolutional networks: x: N × C × H × W; μ, σ: N × C × 1 × 1; γ, β: 1 × C × 1 × 1; y = γ(x - μ)/σ + β
Same behavior at train/test!
Ulyanov et al, "Improved Texture Networks: Maximizing Quality and Diversity in Feed-forward Stylization and Texture Synthesis", CVPR 2017
Comparison of Normalization Layers
[Figure from Wu and He, "Group Normalization", ECCV 2018: batch norm, layer norm, instance norm, and group norm each normalize over a different slice of the N × C × (H, W) activation tensor]
Group Normalization
Like layer normalization, but normalizes over groups of channels rather than all channels.
Wu and He, "Group Normalization", ECCV 2018
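A sketch of the four normalization layers in PyTorch, applied to the same N x C x H x W tensor (the group count of 8 is an arbitrary example):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 64, 56, 56)  # N x C x H x W

batch_norm = nn.BatchNorm2d(64)           # normalize over N, H, W per channel
layer_norm = nn.LayerNorm([64, 56, 56])   # normalize over C, H, W per sample
instance_norm = nn.InstanceNorm2d(64)     # normalize over H, W per sample+channel
group_norm = nn.GroupNorm(num_groups=8, num_channels=64)  # channel groups

for norm in (batch_norm, layer_norm, instance_norm, group_norm):
    print(norm(x).shape)  # all keep [8, 64, 56, 56]
```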
Summary: Components of a Convolutional Network
Convolution Layers; Pooling Layers; Fully-Connected Layers; Activation Function; Normalization
The convolution layers are the most computationally expensive!
Problem: What is the right way to combine all these components?
Next time: CNN Architectures