TRANSCRIPT
Justin Johnson, September 30, 2019
Lecture 8: CNN Architectures
Reminder: A2 is due today at 11:59pm. Remember to run the validation script!
Soon: Assignment 3!
Modular API for backpropagation, fully-connected networks, Dropout, update rules (SGD+Momentum, RMSprop, Adam), convolutional networks, batch normalization.
It will be released today or tomorrow, and will be due two weeks from the day it is released.
Last Time: Components of Convolutional Networks
Convolution layers, pooling layers, fully-connected layers, activation functions, normalization.
ImageNet Classification Challenge
[Figure: top-5 error rate (%) by year]
2010   28.2   Lin et al (shallow)
2011   25.8   Sanchez & Perronnin (shallow)
2012   16.4   Krizhevsky et al, AlexNet (8 layers)
2013   11.7   Zeiler & Fergus (8 layers)
2014    7.3   Simonyan & Zisserman, VGG (19 layers)
2014    6.7   Szegedy et al, GoogLeNet (22 layers)
2015    3.6   He et al, ResNet (152 layers)
2016    3.0   Shao et al (152 layers)
2017    2.3   Hu et al, SENet (152 layers)
Human   5.1   (Russakovsky et al)
AlexNet
Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.
227x227 inputs, 5 convolutional layers, max pooling, 3 fully-connected layers, ReLU nonlinearities.
Used "local response normalization", which is not used anymore. Trained on two GTX 580 GPUs with only 3GB of memory each, so the model was split over the two GPUs.
AlexNet citations per year (as of 9/30/2019): 2013: 284; 2014: 942; 2015: 2,672; 2016: 5,955; 2017: 10,173; 2018: 14,951; 2019: 11,533. Total citations: 46,510.
Citation counts for comparison:
Darwin, "On the Origin of Species", 1859: 50,007
Shannon, "A Mathematical Theory of Communication", 1948: 69,351
Watson and Crick, "Molecular Structure of Nucleic Acids", 1953: 13,111
ATLAS Collaboration, "Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC", 2012: 14,424
AlexNet, layer by layer:

Layer   | In C x H/W | filters | kernel | stride | pad | Out C x H/W | memory (KB) | params (k) | flop (M)
conv1   | 3 x 227    | 64      | 11     | 4      | 2   | 64 x 56     | 784         | 23         | 73
pool1   | 64 x 56    |         | 3      | 2      | 0   | 64 x 27     | 182         | 0          | 0
conv2   | 64 x 27    | 192     | 5      | 1      | 2   | 192 x 27    | 547         | 307        | 224
pool2   | 192 x 27   |         | 3      | 2      | 0   | 192 x 13    | 127         | 0          | 0
conv3   | 192 x 13   | 384     | 3      | 1      | 1   | 384 x 13    | 254         | 664        | 112
conv4   | 384 x 13   | 256     | 3      | 1      | 1   | 256 x 13    | 169         | 885        | 145
conv5   | 256 x 13   | 256     | 3      | 1      | 1   | 256 x 13    | 169         | 590        | 100
pool5   | 256 x 13   |         | 3      | 2      | 0   | 256 x 6     | 36          | 0          | 0
flatten | 256 x 6    |         |        |        |     | 9216        | 36          | 0          | 0
fc6     | 9216       | 4096    |        |        |     | 4096        | 16          | 37,749     | 38
fc7     | 4096       | 4096    |        |        |     | 4096        | 16          | 16,777     | 17
fc8     | 4096       | 1000    |        |        |     | 1000        | 4           | 4,096      | 4

Where do all of these numbers come from?
Recall: the number of output channels equals the number of filters.
Recall: W' = (W - K + 2P) / S + 1 = (227 - 11 + 2*2) / 4 + 1 = 220/4 + 1 = 56
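As a sanity check, the output-size rule can be wrapped in a small helper (conv_out_size is a hypothetical name, not from the lecture):

```python
def conv_out_size(w, k, s, p):
    """Spatial output size of a convolution: W' = (W - K + 2P) / S + 1."""
    return (w - k + 2 * p) // s + 1

# AlexNet conv1: 227x227 input, 11x11 kernel, stride 4, pad 2
print(conv_out_size(227, 11, 4, 2))  # 56
```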
Number of output elements = C * H' * W' = 64 * 56 * 56 = 200,704
Bytes per element = 4 (for 32-bit floating point)
KB = (number of elements) * (bytes per element) / 1024 = 200,704 * 4 / 1024 = 784
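A minimal sketch of the memory calculation, assuming 32-bit floats as above (activation_kb is a hypothetical helper name):

```python
def activation_kb(c, h, w, bytes_per_elem=4):
    """Memory for one activation map, in KB, assuming 32-bit floats."""
    return c * h * w * bytes_per_elem / 1024

print(activation_kb(64, 56, 56))  # 784.0
```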
Weight shape = C_out x C_in x K x K = 64 x 3 x 11 x 11
Bias shape = C_out = 64
Number of weights = 64 * 3 * 11 * 11 + 64 = 23,296
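The parameter count can be sketched the same way (conv_params is a hypothetical helper):

```python
def conv_params(c_in, c_out, k):
    """Learnable parameters of a conv layer: weights plus one bias per filter."""
    return c_out * c_in * k * k + c_out

print(conv_params(3, 64, 11))  # 23296
```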
Number of floating point operations (multiply+add) = (number of output elements) * (ops per output element) = (C_out * H' * W') * (C_in * K * K) = (64 * 56 * 56) * (3 * 11 * 11) = 200,704 * 363 = 72,855,552
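The FLOP count follows the same pattern, counting one multiply-add per weight per output element (conv_flops is a hypothetical helper):

```python
def conv_flops(c_in, c_out, h_out, w_out, k):
    """Multiply-adds: one (C_in*K*K)-dim dot product per output element."""
    return (c_out * h_out * w_out) * (c_in * k * k)

print(conv_flops(3, 64, 56, 56, 11))  # 72855552
```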
For the pooling layer:
Number of output channels = number of input channels = 64
W' = floor((W - K) / S + 1) = floor(53/2 + 1) = floor(27.5) = 27
Number of output elements = C_out * H' * W'; bytes per element = 4
KB = C_out * H' * W' * 4 / 1024 = 64 * 27 * 27 * 4 / 1024 = 182.25
Pooling layers have no learnable parameters!
Floating-point ops for the pooling layer = (number of output positions) * (flops per output position) = (C_out * H' * W') * (K * K) = (64 * 27 * 27) * (3 * 3) = 419,904, about 0.4 MFLOP
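Both pooling facts fit in a couple of small helpers (hypothetical names, no padding assumed as in the slide):

```python
def pool_out_size(w, k, s):
    """Pooling output size: floor((W - K)/S + 1), no padding."""
    return (w - k) // s + 1

def pool_flops(c, h_out, w_out, k):
    """One K*K comparison (or average) per output position."""
    return c * h_out * w_out * k * k

print(pool_out_size(56, 3, 2))    # 27
print(pool_flops(64, 27, 27, 3))  # 419904
```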
Flatten output size = C_in * H * W = 256 * 6 * 6 = 9216
FC params = C_in * C_out + C_out = 9216 * 4096 + 4096 = 37,752,832
FC flops = C_in * C_out = 9216 * 4096 = 37,748,736
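And the fully-connected case (hypothetical helper names; the table's params column rounds to the nearest thousand):

```python
def fc_params(c_in, c_out):
    """Weight matrix plus bias vector."""
    return c_in * c_out + c_out

def fc_flops(c_in, c_out):
    """One multiply-add per weight."""
    return c_in * c_out

print(fc_params(9216, 4096))  # 37752832
print(fc_flops(9216, 4096))   # 37748736
```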
How do we choose these hyperparameters? Trial and error =(
There are interesting trends here!
[Figure: per-layer bar charts of memory (KB), params (K), and MFLOP for AlexNet]
Most of the memory usage is in the early convolution layers.
Nearly all parameters are in the fully-connected layers.
Most floating-point ops occur in the convolution layers.
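These trends can be reproduced by recomputing every layer of the table with the formulas above (a sketch; layer specs transcribed from the table, biases included, so totals differ slightly from the rounded table entries):

```python
# Per-layer specs (c_in, w_in, c_out, kernel, stride, pad) from the AlexNet table.
convs = [
    (3, 227, 64, 11, 4, 2), (64, 27, 192, 5, 1, 2),
    (192, 13, 384, 3, 1, 1), (384, 13, 256, 3, 1, 1), (256, 13, 256, 3, 1, 1),
]
fcs = [(9216, 4096), (4096, 4096), (4096, 1000)]

conv_param_total, conv_flop_total = 0, 0
for c_in, w_in, c_out, k, s, p in convs:
    w_out = (w_in - k + 2 * p) // s + 1          # output spatial size
    conv_param_total += c_out * c_in * k * k + c_out
    conv_flop_total += (c_out * w_out * w_out) * (c_in * k * k)

fc_param_total = sum(ci * co + co for ci, co in fcs)
fc_flop_total = sum(ci * co for ci, co in fcs)

print(f"params: conv {conv_param_total/1e6:.1f}M vs fc {fc_param_total/1e6:.1f}M")
print(f"flops:  conv {conv_flop_total/1e6:.0f}M vs fc {fc_flop_total/1e6:.0f}M")
```

Running this shows that the fully-connected layers dominate the parameter count while the convolution layers dominate the FLOPs, matching the bar charts.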
ZFNet: A Bigger AlexNet
Zeiler and Fergus, "Visualizing and Understanding Convolutional Networks", ECCV 2014
AlexNet, but: CONV1 changed from (11x11, stride 4) to (7x7, stride 2); CONV3, 4, 5 use 512, 1024, 512 filters instead of 384, 384, 256. More trial and error =(
ImageNet top-5 error: 16.4% -> 11.7%
VGG: Deeper Networks, Regular Design
Simonyan and Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition", ICLR 2015
[Figure: layer-by-layer diagrams of AlexNet, VGG16, and VGG19]
VGG design rules:
All conv are 3x3, stride 1, pad 1
All max pool are 2x2, stride 2
After pool, double the number of channels
The network has five convolutional stages:
Stage 1: conv-conv-pool
Stage 2: conv-conv-pool
Stage 3: conv-conv-conv-pool
Stage 4: conv-conv-conv-pool
Stage 5: conv-conv-conv-pool
(VGG-19 adds a fourth conv in stages 3, 4, and 5)
Why 3x3 convolutions? Compare two options with the same input and output shapes:
Option 1: Conv(5x5, C -> C). Params: 25C². FLOPs: 25C²HW
Option 2: Conv(3x3, C -> C) followed by Conv(3x3, C -> C). Params: 18C². FLOPs: 18C²HW
Two 3x3 convs have the same receptive field as a single 5x5 conv, but have fewer parameters and take less computation!
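A quick numeric check of the 25C² vs 18C² claim, using an arbitrary C=64 and a 56x56 output (both assumptions for illustration; biases ignored, as in the slide's counts):

```python
def conv_cost(k, c, h, w):
    """Weight count and multiply-adds for a KxK conv with C channels in and out."""
    params = k * k * c * c      # K*K*C_in*C_out weights, bias ignored
    flops = params * h * w      # one multiply-add per weight per output position
    return params, flops

C, H, W = 64, 56, 56
p5, f5 = conv_cost(5, C, H, W)      # one 5x5 conv
p3, f3 = conv_cost(3, C, H, W)      # one 3x3 conv; stack two of them
print(p5, 2 * p3)   # 25*C^2 vs 18*C^2 weights
print(f5, 2 * f3)   # 25*C^2*H*W vs 18*C^2*H*W multiply-adds
```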
Now compare conv layers across spatial resolutions:
Input: C x 2H x 2W, Layer: Conv(3x3, C -> C)
Memory: 4HWC. Params: 9C². FLOPs: 36HWC²
Input: 2C x H x W, Layer: Conv(3x3, 2C -> 2C)
Memory: 2HWC. Params: 36C². FLOPs: 36HWC²
Conv layers at each spatial resolution take the same amount of computation! Halving the spatial resolution and doubling the channels leaves the FLOPs unchanged.
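The constant-compute property can be verified numerically; the starting shape here (64 channels at a 224x224 resolution) is just an illustrative assumption:

```python
# FLOPs of one 3x3 conv at successive VGG-style stages: each pool halves H and W
# while the channel count doubles, so 9 * C^2 * H * W stays constant.
c, h, w = 64, 224, 224
flops_per_stage = []
for stage in range(1, 6):
    flops = 9 * c * c * h * w       # 3x3 conv, C -> C channels, bias ignored
    flops_per_stage.append(flops)
    print(f"stage {stage}: C={c:4d}, H=W={h:3d}, FLOPs={flops}")
    c, h, w = c * 2, h // 2, w // 2  # pool halves resolution, channels double

print(len(set(flops_per_stage)) == 1)  # True: identical at every stage
```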
AlexNet vs VGG-16: a much bigger network!
[Figure: bar charts comparing AlexNet and VGG-16 on memory, parameters, and FLOPs]
Memory: AlexNet 1.9 MB total, VGG-16 48.6 MB total (25x)
Params: AlexNet 61M total, VGG-16 138M total (2.3x)
Compute: AlexNet 0.7 GFLOP total, VGG-16 13.6 GFLOP total (19.4x)
GoogLeNet: Focus on Efficiency
Szegedy et al, "Going Deeper with Convolutions", CVPR 2015
Many innovations for efficiency: reduce parameter count, memory usage, and computation.
GoogLeNet: Aggressive Stem
A stem network at the start aggressively downsamples the input. (Recall that in VGG-16, most of the compute was at the start.)
Layer    | In C x H/W | filters | kernel | stride | pad | Out C x H/W | memory (KB) | params (k) | flop (M)
conv     | 3 x 224    | 64      | 7      | 2      | 3   | 64 x 112    | 3136        | 9          | 118
max-pool | 64 x 112   |         | 3      | 2      | 1   | 64 x 56     | 784         | 0          | 2
conv     | 64 x 56    | 64      | 1      | 1      | 0   | 64 x 56     | 784         | 4          | 13
conv     | 64 x 56    | 192     | 3      | 1      | 1   | 192 x 56    | 2352        | 111        | 347
max-pool | 192 x 56   |         | 3      | 2      | 1   | 192 x 28    | 588         | 0          | 1

Total from 224 to 28 spatial resolution: Memory: 7.5 MB. Params: 124K. MFLOP: 418.
Compare with VGG-16 over the same resolution range: Memory: 42.9 MB (5.7x more). Params: 1.1M (8.9x more). MFLOP: 7,485 (17.8x more).
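The first stem conv can be checked against the table with the same generic formulas used for AlexNet (a sketch, independent of any framework):

```python
# GoogLeNet stem conv1: 7x7 kernel, stride 2, pad 3, on a 3 x 224 x 224 input.
c_in, w_in, c_out, k, s, p = 3, 224, 64, 7, 2, 3

w_out = (w_in - k + 2 * p) // s + 1                # spatial output size
params = c_out * c_in * k * k + c_out              # weights + biases
flops = (c_out * w_out * w_out) * (c_in * k * k)   # multiply-adds

print(w_out, params, flops)  # 112 9472 118013952
```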
GoogLeNet: Inception Module
The Inception module is a local unit with parallel branches; this local structure is repeated many times throughout the network.
It uses 1x1 "bottleneck" layers to reduce the channel dimension before expensive convolutions (we will revisit this idea with ResNet!).
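A rough illustration of why bottlenecks help, using hypothetical channel counts (256 reduced to 64) and a hypothetical 28x28 feature map; biases ignored:

```python
# FLOPs of a 3x3 conv applied directly to 256 channels, vs. first reducing to
# 64 channels with a 1x1 conv and then running the 3x3 conv.
H = W = 28
direct = (256 * H * W) * (256 * 3 * 3)        # 3x3 conv, 256 -> 256
reduce_1x1 = (64 * H * W) * (256 * 1 * 1)     # 1x1 conv, 256 -> 64
conv3x3 = (256 * H * W) * (64 * 3 * 3)        # 3x3 conv, 64 -> 256

print(direct, reduce_1x1 + conv3x3)  # the bottleneck path is several times cheaper
```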
GoogLeNet: Global Average Pooling
No large FC layers at the end! Instead, global average pooling collapses the spatial dimensions, and a single linear layer produces the class scores. (Recall that in VGG-16, most parameters were in the FC layers!)

GoogLeNet classifier head:
Layer    | In C x H/W | filters | kernel | stride | pad | Out C x H/W | memory (KB) | params (k) | flop (M)
avg-pool | 1024 x 7   |         | 7      | 1      | 0   | 1024 x 1    | 4           | 0          | 0
fc       | 1024       | 1000    |        |        |     | 1000        | 4           | 1,025      | 1

Compare with VGG-16:
Layer    | In C x H/W | filters | Out size | memory (KB) | params (k) | flop (M)
flatten  | 512 x 7    |         | 25088    | 98          | 0          | 0
fc6      | 25088      | 4096    | 4096     | 16          | 102,760    | 103
fc7      | 4096       | 4096    | 4096     | 16          | 16,777     | 17
fc8      | 4096       | 1000    | 1000     | 4           | 4,096      | 4
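Comparing the two classifier heads directly (numbers taken from the tables above; bias terms included, so totals differ slightly from the rounded table entries):

```python
# GoogLeNet head: global average pool (no params) + one FC layer.
googlenet_head = 1024 * 1000 + 1000

# VGG-16 head: three FC layers (fc6, fc7, fc8).
vgg_head = (25088 * 4096 + 4096) + (4096 * 4096 + 4096) + (4096 * 1000 + 1000)

print(googlenet_head, vgg_head)  # 1025000 123642856
```

The VGG-16 head alone holds over 100x more parameters than GoogLeNet's entire classifier head.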
GoogLeNet: Auxiliary Classifiers
Training with a loss only at the end of the network didn't work well: the network is too deep, and gradients don't propagate cleanly. As a hack, attach "auxiliary classifiers" at several intermediate points in the network that also try to classify the image and receive their own loss.
GoogLeNet came before batch normalization! With BatchNorm, this trick is no longer needed.
ImageNetClassificationChallenge
Lecture8- 57
[Bar chart: ILSVRC winner error rates (%) by year]
2010: Lin et al (shallow): 28.2
2011: Sanchez & Perronnin (shallow): 25.8
2012: Krizhevsky et al, AlexNet (8 layers): 16.4
2013: Zeiler & Fergus (8 layers): 11.7
2014: Simonyan & Zisserman, VGG (19 layers): 7.3
2014: Szegedy et al, GoogLeNet (22 layers): 6.7
2015: He et al, ResNet (152 layers): 3.6
2016: Shao et al (152 layers): 3.0
2017: Hu et al, SENet (152 layers): 2.3
Human (Russakovsky et al): 5.1
Residual Networks

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016

Once we have Batch Normalization, we can train networks with 10+ layers. What happens as we go deeper?
Deeper model does worse than shallow model!

Initial guess: the deep model is overfitting, since it is much bigger than the other model.

[Plot: test error vs. iterations; the 56-layer network has higher test error than the 20-layer network]
[Plots: training error and test error vs. iterations; the 56-layer network is worse than the 20-layer network on both]

In fact the deep model performs worse than the shallow model on the training set as well: it is actually underfitting!
A deeper model can emulate a shallower model: copy layers from the shallower model, and set the extra layers to identity. Thus deeper models should do at least as well as shallow models.

Hypothesis: This is an optimization problem. Deeper models are harder to optimize, and in particular don't learn identity functions to emulate shallow models.
Solution: Change the network so that learning identity functions with extra layers is easy!
"Plain" block: computes H(x) = conv(relu(conv(x))).

Residual block: computes relu(F(x) + x), where F(x) = conv(relu(conv(x))) and the input x is carried around the convs by an additive "shortcut".
If you set the conv weights to zero, the whole block will compute the identity function!
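The identity argument can be demonstrated in a few lines. This is a simplified sketch: the two convs are modeled as 1x1 convolutions (per-pixel linear maps), but the argument is identical for 3x3 convs.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def residual_block(x, w1, w2):
    # x: (C, H, W); w1, w2: (C, C) weight matrices for the two "conv" layers
    f = np.einsum('dc,chw->dhw', w2, relu(np.einsum('dc,chw->dhw', w1, x)))
    return relu(f + x)    # additive shortcut, then ReLU

C, H, W = 4, 3, 3
x = np.abs(np.random.randn(C, H, W))   # nonnegative input (e.g. after a ReLU)

# With both conv weights set to zero, F(x) = 0 and the block is the identity
zeros = np.zeros((C, C))
assert np.allclose(residual_block(x, zeros, zeros), x)
```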
[Diagram: full ResNet; stem (7x7 conv, 64, /2, then pool), stages of residual blocks built from 3x3 convs with 64, 128, ..., 512 channels (the first block of each stage uses stride 2), then global pool, FC 1000, softmax]
A residual network is a stack of many residual blocks.

Regular design, like VGG: each residual block has two 3x3 convs.

The network is divided into stages: the first block of each stage halves the resolution (with stride-2 conv) and doubles the number of channels.
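A consequence of this stage design, sketched numerically below: halving the spatial resolution while doubling the channel count keeps the FLOP count of each 3x3 conv constant across stages (the stage sizes used here are the standard 224x224 ResNet ones).

```python
def conv3x3_flops(c, h, w):
    # multiply-adds for one 3x3 conv with c input and c output channels
    return 9 * c * c * h * w

stage1 = conv3x3_flops(64, 56, 56)
stage2 = conv3x3_flops(128, 28, 28)   # transition: C doubles, H and W halve
stage3 = conv3x3_flops(256, 14, 14)

# Per-conv cost is identical in every stage
assert stage1 == stage2 == stage3
```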
Uses the same aggressive stem as GoogLeNet to downsample the input 4x before applying residual blocks:

Layer     C_in  H/W  filters  kernel  stride  pad  C_out  H/W  mem(KB)  params(K)  FLOPs(M)
conv      3     224  64       7       2       3    64     112  3136     9          118
max-pool  64    112  -        3       2       1    64     56   784      0          2
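The stem numbers above can be recomputed directly from the layer hyperparameters:

```python
# ResNet stem conv: 7x7, 64 filters, stride 2, pad 3, on a 3x224x224 input
C_in, H_in = 3, 224
C_out, K, stride, pad = 64, 7, 2, 3

H_out = (H_in + 2 * pad - K) // stride + 1     # = 112
params = C_out * C_in * K * K                  # ~9K weights (biases omitted)
flops = params * H_out * H_out                 # multiply-adds: ~118M
memory_kb = C_out * H_out * H_out * 4 / 1024   # fp32 output activations: 3136 KB

print(H_out, params, round(flops / 1e6), memory_kb)
```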
Like GoogLeNet, no big fully-connected layers: instead use global average pooling and a single linear layer at the end.
Error rates are 224x224 single-crop testing, reported by torchvision.
ResNet-18:
- Stem: 1 conv layer
- Stage 1 (C=64): 2 res. blocks = 4 conv
- Stage 2 (C=128): 2 res. blocks = 4 conv
- Stage 3 (C=256): 2 res. blocks = 4 conv
- Stage 4 (C=512): 2 res. blocks = 4 conv
- Linear

ImageNet top-5 error: 10.92; GFLOP: 1.8
ResNet-34:
- Stem: 1 conv layer
- Stage 1: 3 res. blocks = 6 conv
- Stage 2: 4 res. blocks = 8 conv
- Stage 3: 6 res. blocks = 12 conv
- Stage 4: 3 res. blocks = 6 conv
- Linear

ImageNet top-5 error: 8.58; GFLOP: 3.6
Compare with VGG-16: ImageNet top-5 error: 9.62; GFLOP: 13.6
Residual Networks: Basic Block

"Basic" residual block:
- Conv(3x3, C->C): FLOPs 9HWC^2
- Conv(3x3, C->C): FLOPs 9HWC^2
Total FLOPs: 18HWC^2
Residual Networks: Bottleneck Block

"Bottleneck" residual block (input and output have 4C channels):
- Conv(1x1, 4C->C): FLOPs 4HWC^2
- Conv(3x3, C->C): FLOPs 9HWC^2
- Conv(1x1, C->4C): FLOPs 4HWC^2
Total FLOPs: 17HWC^2

More layers, less computational cost than the basic block's 18HWC^2!
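The basic-vs-bottleneck FLOP comparison can be checked numerically. A small sketch, writing each block's cost as a function of the spatial size H, W and the bottleneck's inner channel count C (the bottleneck block's input/output width is 4C):

```python
def basic_block_flops(c, h, w):
    # two 3x3 convs, C -> C
    return 2 * 9 * h * w * c * c

def bottleneck_flops(c, h, w):
    return (4 * h * w * c * c      # 1x1 conv, 4C -> C
            + 9 * h * w * c * c    # 3x3 conv, C -> C
            + 4 * h * w * c * c)   # 1x1 conv, C -> 4C

C, H, W = 64, 56, 56
assert basic_block_flops(C, H, W) == 18 * H * W * C * C
assert bottleneck_flops(C, H, W) == 17 * H * W * C * C   # 3 layers, fewer FLOPs
```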
Model       Block type  Stem layers  Stage 1 (blocks/layers)  Stage 2   Stage 3    Stage 4  FC layers  GFLOP  ImageNet top-5 error
ResNet-18   Basic       1            2 / 4                    2 / 4     2 / 4      2 / 4    1          1.8    10.92
ResNet-34   Basic       1            3 / 6                    4 / 8     6 / 12     3 / 6    1          3.6    8.58
ResNet-50   Bottle      1            3 / 9                    4 / 12    6 / 18     3 / 9    1          3.8    7.13
ResNet-101  Bottle      1            3 / 9                    4 / 12    23 / 69    3 / 9    1          7.6    6.44
ResNet-152  Bottle      1            3 / 9                    8 / 24    36 / 108   3 / 9    1          11.3   5.94
ResNet-50 is the same as ResNet-34, but replaces basic blocks with bottleneck blocks. This is a great baseline architecture for many tasks even today!
Deeper ResNet-101 and ResNet-152 models are more accurate, but also more computationally heavy.
Residual Networks:
- Able to train very deep networks
- Deeper networks do better than shallow networks (as expected)
- Swept 1st place in all ILSVRC and COCO 2015 competitions
- Still widely used today!
Improving Residual Networks: Block Design

He et al, "Identity Mappings in Deep Residual Networks", ECCV 2016

Original ResNet block: Conv -> BatchNorm -> ReLU -> Conv -> BatchNorm, add the shortcut, then ReLU.
"Pre-activation" ResNet block: BatchNorm -> ReLU -> Conv -> BatchNorm -> ReLU -> Conv, then add the shortcut (no ReLU after the addition).

Note the ReLU after the residual addition in the original block: it cannot actually learn the identity function, since its outputs are nonnegative!

With the ReLU inside the residual branch, the pre-activation block can learn a true identity function by setting the conv weights to zero!
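The difference can be seen in a stripped-down sketch (BatchNorm omitted and convs modeled as plain matrices, purely for illustration): with zero conv weights, the original block still applies a final ReLU to the shortcut, while the pre-activation block returns the input unchanged.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def original_block(x, w1, w2):
    f = w2 @ relu(w1 @ x)
    return relu(x + f)          # ReLU after the addition

def preact_block(x, w1, w2):
    f = w2 @ relu(w1 @ x)       # ReLU only inside the residual branch
    return x + f

x = np.array([-1.0, 2.0, -3.0])    # input with negative entries
zeros = np.zeros((3, 3))

assert np.allclose(preact_block(x, zeros, zeros), x)        # true identity
assert not np.allclose(original_block(x, zeros, zeros), x)  # clamps negatives
```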
Slight improvement in accuracy (ImageNet top-1 error):
ResNet-152: 21.3 vs 21.1
ResNet-200: 21.8 vs 20.7

Not actually used that much in practice.
Comparing Complexity

Canziani et al, "An Analysis of Deep Neural Network Models for Practical Applications", 2017

[Chart comparing accuracy, operation counts, and parameter counts of architectures]
- Inception-v4: ResNet + Inception!
- VGG: highest memory, most operations
- GoogLeNet: very efficient!
- AlexNet: low compute, lots of parameters
- ResNet: simple design, moderate efficiency, high accuracy
ImageNet 2016 winner: Model Ensembles

Multi-scale ensemble of Inception, Inception-ResNet, ResNet, and Wide ResNet models.

Shao et al, 2016
Improving ResNets: ResNeXt

Xie et al, "Aggregated Residual Transformations for Deep Neural Networks", CVPR 2017

"Bottleneck" residual block (recap):
- Conv(1x1, 4C->C): FLOPs 4HWC^2
- Conv(3x3, C->C): FLOPs 9HWC^2
- Conv(1x1, C->4C): FLOPs 4HWC^2
Total FLOPs: 17HWC^2

ResNeXt block: G parallel pathways, each a small bottleneck:
- Conv(1x1, 4C->c): FLOPs 4HWCc
- Conv(3x3, c->c): FLOPs 9HWc^2
- Conv(1x1, c->4C): FLOPs 4HWCc
Total FLOPs: (8Cc + 9c^2) * HWG
Equal cost when 9Gc^2 + 8GCc - 17C^2 = 0.
Example: C=64, G=4 gives c=24; C=64, G=32 gives c=4.
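The equal-cost condition above is a quadratic in the per-pathway width c, so the examples can be reproduced by solving for the positive root:

```python
import math

# Solve 9*G*c^2 + 8*G*C*c - 17*C^2 = 0 for c: the pathway width at which
# G ResNeXt pathways cost the same FLOPs as one bottleneck block.
def equal_cost_width(C, G):
    a, b, k = 9 * G, 8 * G * C, -17 * C * C
    return (-b + math.sqrt(b * b - 4 * a * k)) / (2 * a)   # positive root

print(equal_cost_width(64, 4))    # ~23.9, i.e. c = 24
print(equal_cost_width(64, 32))   # ~4.0,  i.e. c = 4
```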
Grouped Convolution

Convolution with groups=1: normal convolution.
Input: Cin x H x W
Weight: Cout x Cin x K x K
Output: Cout x H' x W'
FLOPs: Cout * Cin * K^2 * H * W

All convolutional kernels touch all Cin channels of the input.
Convolution with groups=2: two parallel convolution layers that each work on half the channels.
Input: Cin x H x W, split into Group 1: (Cin/2) x H x W and Group 2: (Cin/2) x H x W
Conv(KxK, Cin/2 -> Cout/2) applied to each group
Out 1: (Cout/2) x H' x W', Out 2: (Cout/2) x H' x W'
Concat: Output Cout x H' x W'
Convolution with groups=G: G parallel conv layers; each "sees" Cin/G input channels and produces Cout/G output channels.
Input: Cin x H x W, split to G x [(Cin/G) x H x W]
Weight: G x (Cout/G) x (Cin/G) x K x K, used by G parallel convolutions
Output: G x [(Cout/G) x H' x W'], concat to Cout x H' x W'
FLOPs: Cout * Cin * K^2 * H * W / G
Depthwise convolution, a special case: G = Cin, Cout = n*Cin. Each input channel is convolved with n different KxK filters to produce n output channels.
Grouped Convolution in PyTorch

PyTorch convolution gives an option for groups!
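A minimal sketch of the `groups` argument of `nn.Conv2d` (the tensor sizes here are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)

conv_normal = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=1)
conv_grouped = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=8)
conv_depthwise = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64)

# All three produce the same output shape...
for conv in (conv_normal, conv_grouped, conv_depthwise):
    assert conv(x).shape == (1, 64, 56, 56)

# ...but the weight tensor (and FLOP count) shrinks by a factor of G:
print(conv_normal.weight.shape)     # (64, 64, 3, 3)
print(conv_grouped.weight.shape)    # (64, 8, 3, 3): Cin/G = 8 per filter
print(conv_depthwise.weight.shape)  # (64, 1, 3, 3): depthwise
```

Note that `in_channels` and `out_channels` must both be divisible by `groups`.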
Improving ResNets: ResNeXt

Equivalent formulation of the ResNeXt block with grouped convolution:
- Conv(1x1, 4C->Gc)
- Conv(3x3, Gc->Gc, groups=G)
- Conv(1x1, Gc->4C)
ResNeXt: maintain computation by adding groups!

Model        Groups  Group width  Top-1 error
ResNet-50    1       64           23.9
ResNeXt-50   2       40           23.0
ResNeXt-50   4       24           22.6
ResNeXt-50   8       14           22.3
ResNeXt-50   32      4            22.2

Model        Groups  Group width  Top-1 error
ResNet-101   1       64           22.0
ResNeXt-101  2       40           21.7
ResNeXt-101  4       24           21.4
ResNeXt-101  8       14           21.3
ResNeXt-101  32      4            21.2

Adding groups improves performance at the same computational complexity!
Squeeze-and-Excitation Networks

Hu et al, "Squeeze-and-Excitation Networks", CVPR 2018

Adds a "squeeze-and-excite" branch to each residual block that performs global pooling and fully-connected layers, then multiplies the result back onto the feature map. Adds global context to each residual block!

Won ILSVRC 2017 with ResNeXt-152-SE.
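The squeeze-and-excite branch can be sketched in a few lines of numpy. This is a simplified illustration (the channel count, reduction ratio r, and weight names are assumptions, not the paper's exact configuration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_branch(x, w1, w2):
    # x: (C, H, W); w1: (C/r, C); w2: (C, C/r) for reduction ratio r
    s = x.mean(axis=(1, 2))                  # squeeze: global average pool -> (C,)
    g = sigmoid(w2 @ np.maximum(w1 @ s, 0))  # excite: FC -> ReLU -> FC -> sigmoid
    return x * g[:, None, None]              # rescale each channel of the features

C, r, H, W = 8, 4, 5, 5
x = np.random.randn(C, H, W)
w1 = np.random.randn(C // r, C)
w2 = np.random.randn(C, C // r)

out = se_branch(x, w1, w2)
assert out.shape == x.shape   # same shape, channel-wise rescaled by global context
```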
ImageNet Classification Challenge
Completion of the challenge: the annual ImageNet competition is no longer held after 2017; it has moved to Kaggle.
Densely Connected Neural Networks

Huang et al, "Densely Connected Convolutional Networks", CVPR 2017

[Diagram: dense blocks in which each layer's input is the concatenation of all previous layers' outputs; the network alternates dense blocks with conv + pooling transition layers, ending in global pool, FC, softmax]

Dense blocks: each layer is connected to every other layer in feedforward fashion.

Alleviates vanishing gradients, strengthens feature propagation, and encourages feature reuse.
MobileNets: Tiny Networks (For Mobile Devices)

Howard et al, "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications", 2017

Standard convolution block: Conv(3x3, C->C) -> BatchNorm -> ReLU.
Total cost: 9C^2HW

Depthwise separable convolution:
- "Depthwise convolution": Conv(3x3, C->C, groups=C) -> BatchNorm -> ReLU. Cost: 9CHW
- "Pointwise convolution": Conv(1x1, C->C) -> BatchNorm -> ReLU. Cost: C^2HW
Total cost: (9C + C^2)HW

Speedup = 9C^2 / (9C + C^2) = 9C / (9 + C) -> 9 (as C -> infinity)
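The speedup formula above, evaluated for a couple of channel counts:

```python
def speedup(c):
    standard = 9 * c * c           # 3x3 conv, C -> C
    separable = 9 * c + c * c      # 3x3 depthwise + 1x1 pointwise
    return standard / separable    # simplifies to 9C / (9 + C)

print(speedup(32))    # ~7.0
print(speedup(512))   # ~8.8; the ratio approaches 9 as C grows
```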
Also related:
- ShuffleNet: Zhang et al, CVPR 2018
- MobileNetV2: Sandler et al, CVPR 2018
- ShuffleNetV2: Ma et al, ECCV 2018
Neural Architecture Search

Zoph and Le, "Neural Architecture Search with Reinforcement Learning", ICLR 2017

Designing neural network architectures is hard: let's automate it!
- One network (the controller) outputs network architectures
- Sample child networks from the controller and train them
- After training a batch of child networks, make a gradient step on the controller network (using policy gradient)
- Over time, the controller learns to output good architectures!
- VERY EXPENSIVE!! Each gradient step on the controller requires training a batch of child models!
- The original paper trained on 800 GPUs for 28 days!
- Follow-up work has focused on efficient search
Zoph et al, "Learning Transferable Architectures for Scalable Image Recognition", CVPR 2018

Neural architecture search can be used to find efficient CNN architectures!
CNN Architectures Summary

- Early work (AlexNet -> ZFNet -> VGG) shows that bigger networks work better
- GoogLeNet was one of the first to focus on efficiency (aggressive stem, 1x1 bottleneck convolutions, global average pooling instead of FC layers)
- ResNet showed us how to train extremely deep networks, limited only by GPU memory! Started to show diminishing returns as networks got bigger
- After ResNet: efficient networks became central: how can we improve accuracy without increasing complexity?
- Lots of tiny networks aimed at mobile devices: MobileNet, ShuffleNet, etc.
- Neural architecture search promises to automate architecture design
Which Architecture Should I Use?

Don't be a hero. For most problems you should use an off-the-shelf architecture; don't try to design your own!

If you just care about accuracy, ResNet-50 or ResNet-101 are great choices.

If you want an efficient network (real-time, runs on mobile, etc.), try MobileNets and ShuffleNets.
Next Time: Deep Learning Hardware and Software