TRANSCRIPT
Justin Johnson, September 30, 2019
Lecture 8: CNN Architectures
Reminder: A2 is due today at 11:59pm. Remember to run the validation script!
Soon: Assignment 3!
Modular API for backpropagation, fully-connected networks, Dropout, update rules (SGD+Momentum, RMSprop, Adam), convolutional networks, batch normalization.
It will be released today or tomorrow, and will be due two weeks from the day it is released.
Last Time: Components of Convolutional Networks
Convolution layers, pooling layers, fully-connected layers, activation functions, normalization.
ImageNet Classification Challenge
[Figure: top-5 error rate (%) by year]
2010   28.2   Lin et al (shallow)
2011   25.8   Sanchez & Perronnin (shallow)
2012   16.4   Krizhevsky et al, AlexNet (8 layers)
2013   11.7   Zeiler & Fergus (8 layers)
2014    7.3   Simonyan & Zisserman, VGG (19 layers)
2014    6.7   Szegedy et al, GoogLeNet (22 layers)
2015    3.6   He et al, ResNet (152 layers)
2016    3.0   Shao et al (152 layers)
2017    2.3   Hu et al, SENet (152 layers)
Human   5.1   (Russakovsky et al)
AlexNet
Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.
227x227 inputs, 5 convolutional layers, max pooling, 3 fully-connected layers, ReLU nonlinearities.
Used "local response normalization", which is not used anymore. Trained on two GTX 580 GPUs with only 3GB of memory each, so the model was split over the two GPUs.
AlexNet citations per year (as of 9/30/2019): 2013: 284; 2014: 942; 2015: 2,672; 2016: 5,955; 2017: 10,173; 2018: 14,951; 2019: 11,533. Total citations: 46,510.
Citation counts for comparison:
Darwin, "On the Origin of Species", 1859: 50,007
Shannon, "A Mathematical Theory of Communication", 1948: 69,351
Watson and Crick, "Molecular Structure of Nucleic Acids", 1953: 13,111
ATLAS Collaboration, "Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC", 2012: 14,424
AlexNet, layer by layer:

Layer   | In C x H/W | filters | kernel | stride | pad | Out C x H/W | memory (KB) | params (k) | flop (M)
conv1   | 3 x 227    | 64      | 11     | 4      | 2   | 64 x 56     | 784         | 23         | 73
pool1   | 64 x 56    |         | 3      | 2      | 0   | 64 x 27     | 182         | 0          | 0
conv2   | 64 x 27    | 192     | 5      | 1      | 2   | 192 x 27    | 547         | 307        | 224
pool2   | 192 x 27   |         | 3      | 2      | 0   | 192 x 13    | 127         | 0          | 0
conv3   | 192 x 13   | 384     | 3      | 1      | 1   | 384 x 13    | 254         | 664        | 112
conv4   | 384 x 13   | 256     | 3      | 1      | 1   | 256 x 13    | 169         | 885        | 145
conv5   | 256 x 13   | 256     | 3      | 1      | 1   | 256 x 13    | 169         | 590        | 100
pool5   | 256 x 13   |         | 3      | 2      | 0   | 256 x 6     | 36          | 0          | 0
flatten | 256 x 6    |         |        |        |     | 9216        | 36          | 0          | 0
fc6     | 9216       | 4096    |        |        |     | 4096        | 16          | 37,749     | 38
fc7     | 4096       | 4096    |        |        |     | 4096        | 16          | 16,777     | 17
fc8     | 4096       | 1000    |        |        |     | 1000        | 4           | 4,096      | 4

Where do all of these numbers come from?
Recall: the number of output channels equals the number of filters.
Recall: W' = (W - K + 2P) / S + 1 = (227 - 11 + 2*2) / 4 + 1 = 220/4 + 1 = 56
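As a sanity check, the output-size rule can be wrapped in a small helper (conv_out_size is a hypothetical name, not from the lecture):

```python
def conv_out_size(w, k, s, p):
    """Spatial output size of a convolution: W' = (W - K + 2P) / S + 1."""
    return (w - k + 2 * p) // s + 1

# AlexNet conv1: 227x227 input, 11x11 kernel, stride 4, pad 2
print(conv_out_size(227, 11, 4, 2))  # 56
```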
Number of output elements = C * H' * W' = 64 * 56 * 56 = 200,704
Bytes per element = 4 (for 32-bit floating point)
KB = (number of elements) * (bytes per element) / 1024 = 200,704 * 4 / 1024 = 784
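A minimal sketch of the memory calculation, assuming 32-bit floats as above (activation_kb is a hypothetical helper name):

```python
def activation_kb(c, h, w, bytes_per_elem=4):
    """Memory for one activation map, in KB, assuming 32-bit floats."""
    return c * h * w * bytes_per_elem / 1024

print(activation_kb(64, 56, 56))  # 784.0
```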
Weight shape = C_out x C_in x K x K = 64 x 3 x 11 x 11
Bias shape = C_out = 64
Number of weights = 64 * 3 * 11 * 11 + 64 = 23,296
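The parameter count can be sketched the same way (conv_params is a hypothetical helper):

```python
def conv_params(c_in, c_out, k):
    """Learnable parameters of a conv layer: weights plus one bias per filter."""
    return c_out * c_in * k * k + c_out

print(conv_params(3, 64, 11))  # 23296
```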
Number of floating point operations (multiply+add) = (number of output elements) * (ops per output element) = (C_out * H' * W') * (C_in * K * K) = (64 * 56 * 56) * (3 * 11 * 11) = 200,704 * 363 = 72,855,552
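The FLOP count follows the same pattern, counting one multiply-add per weight per output element (conv_flops is a hypothetical helper):

```python
def conv_flops(c_in, c_out, h_out, w_out, k):
    """Multiply-adds: one (C_in*K*K)-dim dot product per output element."""
    return (c_out * h_out * w_out) * (c_in * k * k)

print(conv_flops(3, 64, 56, 56, 11))  # 72855552
```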
For the pooling layer:
Number of output channels = number of input channels = 64
W' = floor((W - K) / S + 1) = floor(53/2 + 1) = floor(27.5) = 27
Number of output elements = C_out * H' * W'; bytes per element = 4
KB = C_out * H' * W' * 4 / 1024 = 64 * 27 * 27 * 4 / 1024 = 182.25
Pooling layers have no learnable parameters!
Floating-point ops for the pooling layer = (number of output positions) * (flops per output position) = (C_out * H' * W') * (K * K) = (64 * 27 * 27) * (3 * 3) = 419,904, about 0.4 MFLOP
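Both pooling facts fit in a couple of small helpers (hypothetical names, no padding assumed as in the slide):

```python
def pool_out_size(w, k, s):
    """Pooling output size: floor((W - K)/S + 1), no padding."""
    return (w - k) // s + 1

def pool_flops(c, h_out, w_out, k):
    """One K*K comparison (or average) per output position."""
    return c * h_out * w_out * k * k

print(pool_out_size(56, 3, 2))    # 27
print(pool_flops(64, 27, 27, 3))  # 419904
```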
Flatten output size = C_in * H * W = 256 * 6 * 6 = 9216
FC params = C_in * C_out + C_out = 9216 * 4096 + 4096 = 37,752,832
FC flops = C_in * C_out = 9216 * 4096 = 37,748,736
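And the fully-connected case (hypothetical helper names; the table's params column rounds to the nearest thousand):

```python
def fc_params(c_in, c_out):
    """Weight matrix plus bias vector."""
    return c_in * c_out + c_out

def fc_flops(c_in, c_out):
    """One multiply-add per weight."""
    return c_in * c_out

print(fc_params(9216, 4096))  # 37752832
print(fc_flops(9216, 4096))   # 37748736
```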
How do we choose these hyperparameters? Trial and error =(
There are interesting trends here!
[Figure: per-layer bar charts of memory (KB), params (K), and MFLOP for AlexNet]
Most of the memory usage is in the early convolution layers.
Nearly all parameters are in the fully-connected layers.
Most floating-point ops occur in the convolution layers.
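These trends can be reproduced by recomputing every layer of the table with the formulas above (a sketch; layer specs transcribed from the table, biases included, so totals differ slightly from the rounded table entries):

```python
# Per-layer specs (c_in, w_in, c_out, kernel, stride, pad) from the AlexNet table.
convs = [
    (3, 227, 64, 11, 4, 2), (64, 27, 192, 5, 1, 2),
    (192, 13, 384, 3, 1, 1), (384, 13, 256, 3, 1, 1), (256, 13, 256, 3, 1, 1),
]
fcs = [(9216, 4096), (4096, 4096), (4096, 1000)]

conv_param_total, conv_flop_total = 0, 0
for c_in, w_in, c_out, k, s, p in convs:
    w_out = (w_in - k + 2 * p) // s + 1          # output spatial size
    conv_param_total += c_out * c_in * k * k + c_out
    conv_flop_total += (c_out * w_out * w_out) * (c_in * k * k)

fc_param_total = sum(ci * co + co for ci, co in fcs)
fc_flop_total = sum(ci * co for ci, co in fcs)

print(f"params: conv {conv_param_total/1e6:.1f}M vs fc {fc_param_total/1e6:.1f}M")
print(f"flops:  conv {conv_flop_total/1e6:.0f}M vs fc {fc_flop_total/1e6:.0f}M")
```

Running this shows that the fully-connected layers dominate the parameter count while the convolution layers dominate the FLOPs, matching the bar charts.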
ZFNet: A Bigger AlexNet
Zeiler and Fergus, "Visualizing and Understanding Convolutional Networks", ECCV 2014
AlexNet, but: CONV1 changed from (11x11, stride 4) to (7x7, stride 2); CONV3, 4, 5 use 512, 1024, 512 filters instead of 384, 384, 256. More trial and error =(
ImageNet top-5 error: 16.4% -> 11.7%
VGG: Deeper Networks, Regular Design
Simonyan and Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition", ICLR 2015
[Figure: layer-by-layer diagrams of AlexNet, VGG16, and VGG19]
VGG design rules:
All conv are 3x3, stride 1, pad 1
All max pool are 2x2, stride 2
After pool, double the number of channels
The network has five convolutional stages:
Stage 1: conv-conv-pool
Stage 2: conv-conv-pool
Stage 3: conv-conv-conv-pool
Stage 4: conv-conv-conv-pool
Stage 5: conv-conv-conv-pool
(VGG-19 adds a fourth conv in stages 3, 4, and 5)
Why 3x3 convolutions? Compare two options with the same input and output shapes:
Option 1: Conv(5x5, C -> C). Params: 25C². FLOPs: 25C²HW
Option 2: Conv(3x3, C -> C) followed by Conv(3x3, C -> C). Params: 18C². FLOPs: 18C²HW
Two 3x3 convs have the same receptive field as a single 5x5 conv, but have fewer parameters and take less computation!
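A quick numeric check of the 25C² vs 18C² claim, using an arbitrary C=64 and a 56x56 output (both assumptions for illustration; biases ignored, as in the slide's counts):

```python
def conv_cost(k, c, h, w):
    """Weight count and multiply-adds for a KxK conv with C channels in and out."""
    params = k * k * c * c      # K*K*C_in*C_out weights, bias ignored
    flops = params * h * w      # one multiply-add per weight per output position
    return params, flops

C, H, W = 64, 56, 56
p5, f5 = conv_cost(5, C, H, W)      # one 5x5 conv
p3, f3 = conv_cost(3, C, H, W)      # one 3x3 conv; stack two of them
print(p5, 2 * p3)   # 25*C^2 vs 18*C^2 weights
print(f5, 2 * f3)   # 25*C^2*H*W vs 18*C^2*H*W multiply-adds
```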
Now compare conv layers across spatial resolutions:
Input: C x 2H x 2W, Layer: Conv(3x3, C -> C)
Memory: 4HWC. Params: 9C². FLOPs: 36HWC²
Input: 2C x H x W, Layer: Conv(3x3, 2C -> 2C)
Memory: 2HWC. Params: 36C². FLOPs: 36HWC²
Conv layers at each spatial resolution take the same amount of computation! Halving the spatial resolution and doubling the channels leaves the FLOPs unchanged.
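The constant-compute property can be verified numerically; the starting shape here (64 channels at a 224x224 resolution) is just an illustrative assumption:

```python
# FLOPs of one 3x3 conv at successive VGG-style stages: each pool halves H and W
# while the channel count doubles, so 9 * C^2 * H * W stays constant.
c, h, w = 64, 224, 224
flops_per_stage = []
for stage in range(1, 6):
    flops = 9 * c * c * h * w       # 3x3 conv, C -> C channels, bias ignored
    flops_per_stage.append(flops)
    print(f"stage {stage}: C={c:4d}, H=W={h:3d}, FLOPs={flops}")
    c, h, w = c * 2, h // 2, w // 2  # pool halves resolution, channels double

print(len(set(flops_per_stage)) == 1)  # True: identical at every stage
```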
AlexNet vs VGG-16: a much bigger network!
[Figure: bar charts comparing AlexNet and VGG-16 on memory, parameters, and FLOPs]
Memory: AlexNet 1.9 MB total, VGG-16 48.6 MB total (25x)
Params: AlexNet 61M total, VGG-16 138M total (2.3x)
Compute: AlexNet 0.7 GFLOP total, VGG-16 13.6 GFLOP total (19.4x)
GoogLeNet: Focus on Efficiency
Szegedy et al, "Going Deeper with Convolutions", CVPR 2015
Many innovations for efficiency: reduce parameter count, memory usage, and computation.
GoogLeNet: Aggressive Stem
A stem network at the start aggressively downsamples the input. (Recall that in VGG-16, most of the compute was at the start.)
Layer    | In C x H/W | filters | kernel | stride | pad | Out C x H/W | memory (KB) | params (k) | flop (M)
conv     | 3 x 224    | 64      | 7      | 2      | 3   | 64 x 112    | 3136        | 9          | 118
max-pool | 64 x 112   |         | 3      | 2      | 1   | 64 x 56     | 784         | 0          | 2
conv     | 64 x 56    | 64      | 1      | 1      | 0   | 64 x 56     | 784         | 4          | 13
conv     | 64 x 56    | 192     | 3      | 1      | 1   | 192 x 56    | 2352        | 111        | 347
max-pool | 192 x 56   |         | 3      | 2      | 1   | 192 x 28    | 588         | 0          | 1

Total from 224 to 28 spatial resolution: Memory: 7.5 MB. Params: 124K. MFLOP: 418.
Compare with VGG-16 over the same resolution range: Memory: 42.9 MB (5.7x more). Params: 1.1M (8.9x more). MFLOP: 7,485 (17.8x more).
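The first stem conv can be checked against the table with the same generic formulas used for AlexNet (a sketch, independent of any framework):

```python
# GoogLeNet stem conv1: 7x7 kernel, stride 2, pad 3, on a 3 x 224 x 224 input.
c_in, w_in, c_out, k, s, p = 3, 224, 64, 7, 2, 3

w_out = (w_in - k + 2 * p) // s + 1                # spatial output size
params = c_out * c_in * k * k + c_out              # weights + biases
flops = (c_out * w_out * w_out) * (c_in * k * k)   # multiply-adds

print(w_out, params, flops)  # 112 9472 118013952
```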
GoogLeNet: Inception Module
The Inception module is a local unit with parallel branches; this local structure is repeated many times throughout the network.
It uses 1x1 "bottleneck" layers to reduce the channel dimension before expensive convolutions (we will revisit this idea with ResNet!).
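A rough illustration of why bottlenecks help, using hypothetical channel counts (256 reduced to 64) and a hypothetical 28x28 feature map; biases ignored:

```python
# FLOPs of a 3x3 conv applied directly to 256 channels, vs. first reducing to
# 64 channels with a 1x1 conv and then running the 3x3 conv.
H = W = 28
direct = (256 * H * W) * (256 * 3 * 3)        # 3x3 conv, 256 -> 256
reduce_1x1 = (64 * H * W) * (256 * 1 * 1)     # 1x1 conv, 256 -> 64
conv3x3 = (256 * H * W) * (64 * 3 * 3)        # 3x3 conv, 64 -> 256

print(direct, reduce_1x1 + conv3x3)  # the bottleneck path is several times cheaper
```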
GoogLeNet: Global Average Pooling
No large FC layers at the end! Instead, global average pooling collapses the spatial dimensions, and a single linear layer produces the class scores. (Recall that in VGG-16, most parameters were in the FC layers!)

GoogLeNet classifier head:
Layer    | In C x H/W | filters | kernel | stride | pad | Out C x H/W | memory (KB) | params (k) | flop (M)
avg-pool | 1024 x 7   |         | 7      | 1      | 0   | 1024 x 1    | 4           | 0          | 0
fc       | 1024       | 1000    |        |        |     | 1000        | 4           | 1,025      | 1

Compare with VGG-16:
Layer    | In C x H/W | filters | Out size | memory (KB) | params (k) | flop (M)
flatten  | 512 x 7    |         | 25088    | 98          | 0          | 0
fc6      | 25088      | 4096    | 4096     | 16          | 102,760    | 103
fc7      | 4096       | 4096    | 4096     | 16          | 16,777     | 17
fc8      | 4096       | 1000    | 1000     | 4           | 4,096      | 4
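Comparing the two classifier heads directly (numbers taken from the tables above; bias terms included, so totals differ slightly from the rounded table entries):

```python
# GoogLeNet head: global average pool (no params) + one FC layer.
googlenet_head = 1024 * 1000 + 1000

# VGG-16 head: three FC layers (fc6, fc7, fc8).
vgg_head = (25088 * 4096 + 4096) + (4096 * 4096 + 4096) + (4096 * 1000 + 1000)

print(googlenet_head, vgg_head)  # 1025000 123642856
```

The VGG-16 head alone holds over 100x more parameters than GoogLeNet's entire classifier head.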
GoogLeNet: Auxiliary Classifiers
Training with a loss only at the end of the network didn't work well: the network is too deep, and gradients don't propagate cleanly. As a hack, attach "auxiliary classifiers" at several intermediate points in the network that also try to classify the image and receive their own loss.
GoogLeNet came before batch normalization! With BatchNorm, this trick is no longer needed.
ImageNetClassificationChallenge
Lecture8- 57
[Bar chart: ILSVRC winner error rates (%) by year]
2010: Lin et al (shallow): 28.2
2011: Sanchez & Perronnin (shallow): 25.8
2012: Krizhevsky et al, AlexNet (8 layers): 16.4
2013: Zeiler & Fergus (8 layers): 11.7
2014: Simonyan & Zisserman, VGG (19 layers): 7.3
2014: Szegedy et al, GoogLeNet (22 layers): 6.7
2015: He et al, ResNet (152 layers): 3.6
2016: Shao et al (152 layers): 3.0
2017: Hu et al, SENet (152 layers): 2.3
Human (Russakovsky et al): 5.1
Residual Networks

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016

Once we have Batch Normalization, we can train networks with 10+ layers. What happens as we go deeper?
Deeper model does worse than shallow model!

Initial guess: the deep model is overfitting, since it is much bigger than the other model.

[Plot: test error vs. iterations; the 56-layer network has higher test error than the 20-layer network]
[Plots: training error and test error vs. iterations; the 56-layer network is worse than the 20-layer network on both]

In fact the deep model performs worse than the shallow model on the training set as well: it is actually underfitting!
A deeper model can emulate a shallower model: copy layers from the shallower model, and set the extra layers to identity. Thus deeper models should do at least as well as shallow models.

Hypothesis: This is an optimization problem. Deeper models are harder to optimize, and in particular don't learn identity functions to emulate shallow models.
Solution: Change the network so that learning identity functions with extra layers is easy!
"Plain" block: computes H(x) = conv(relu(conv(x))).

Residual block: computes relu(F(x) + x), where F(x) = conv(relu(conv(x))) and the input x is carried around the convs by an additive "shortcut".
If you set the conv weights to zero, the whole block will compute the identity function!
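The identity argument can be demonstrated in a few lines. This is a simplified sketch: the two convs are modeled as 1x1 convolutions (per-pixel linear maps), but the argument is identical for 3x3 convs.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def residual_block(x, w1, w2):
    # x: (C, H, W); w1, w2: (C, C) weight matrices for the two "conv" layers
    f = np.einsum('dc,chw->dhw', w2, relu(np.einsum('dc,chw->dhw', w1, x)))
    return relu(f + x)    # additive shortcut, then ReLU

C, H, W = 4, 3, 3
x = np.abs(np.random.randn(C, H, W))   # nonnegative input (e.g. after a ReLU)

# With both conv weights set to zero, F(x) = 0 and the block is the identity
zeros = np.zeros((C, C))
assert np.allclose(residual_block(x, zeros, zeros), x)
```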
[Diagram: full ResNet; stem (7x7 conv, 64, /2, then pool), stages of residual blocks built from 3x3 convs with 64, 128, ..., 512 channels (the first block of each stage uses stride 2), then global pool, FC 1000, softmax]
A residual network is a stack of many residual blocks.

Regular design, like VGG: each residual block has two 3x3 convs.

The network is divided into stages: the first block of each stage halves the resolution (with stride-2 conv) and doubles the number of channels.
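A consequence of this stage design, sketched numerically below: halving the spatial resolution while doubling the channel count keeps the FLOP count of each 3x3 conv constant across stages (the stage sizes used here are the standard 224x224 ResNet ones).

```python
def conv3x3_flops(c, h, w):
    # multiply-adds for one 3x3 conv with c input and c output channels
    return 9 * c * c * h * w

stage1 = conv3x3_flops(64, 56, 56)
stage2 = conv3x3_flops(128, 28, 28)   # transition: C doubles, H and W halve
stage3 = conv3x3_flops(256, 14, 14)

# Per-conv cost is identical in every stage
assert stage1 == stage2 == stage3
```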
Uses the same aggressive stem as GoogLeNet to downsample the input 4x before applying residual blocks:

Layer     C_in  H/W  filters  kernel  stride  pad  C_out  H/W  mem(KB)  params(K)  FLOPs(M)
conv      3     224  64       7       2       3    64     112  3136     9          118
max-pool  64    112  -        3       2       1    64     56   784      0          2
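The stem numbers above can be recomputed directly from the layer hyperparameters:

```python
# ResNet stem conv: 7x7, 64 filters, stride 2, pad 3, on a 3x224x224 input
C_in, H_in = 3, 224
C_out, K, stride, pad = 64, 7, 2, 3

H_out = (H_in + 2 * pad - K) // stride + 1     # = 112
params = C_out * C_in * K * K                  # ~9K weights (biases omitted)
flops = params * H_out * H_out                 # multiply-adds: ~118M
memory_kb = C_out * H_out * H_out * 4 / 1024   # fp32 output activations: 3136 KB

print(H_out, params, round(flops / 1e6), memory_kb)
```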
Like GoogLeNet, no big fully-connected layers: instead use global average pooling and a single linear layer at the end.
Error rates are 224x224 single-crop testing, reported by torchvision.
ResNet-18:
- Stem: 1 conv layer
- Stage 1 (C=64): 2 res. blocks = 4 conv
- Stage 2 (C=128): 2 res. blocks = 4 conv
- Stage 3 (C=256): 2 res. blocks = 4 conv
- Stage 4 (C=512): 2 res. blocks = 4 conv
- Linear

ImageNet top-5 error: 10.92; GFLOP: 1.8
ResNet-34:
- Stem: 1 conv layer
- Stage 1: 3 res. blocks = 6 conv
- Stage 2: 4 res. blocks = 8 conv
- Stage 3: 6 res. blocks = 12 conv
- Stage 4: 3 res. blocks = 6 conv
- Linear

ImageNet top-5 error: 8.58; GFLOP: 3.6
Compare with VGG-16: ImageNet top-5 error: 9.62; GFLOP: 13.6
Residual Networks: Basic Block

"Basic" residual block:
- Conv(3x3, C->C): FLOPs 9HWC^2
- Conv(3x3, C->C): FLOPs 9HWC^2
Total FLOPs: 18HWC^2
Residual Networks: Bottleneck Block

"Bottleneck" residual block (input and output have 4C channels):
- Conv(1x1, 4C->C): FLOPs 4HWC^2
- Conv(3x3, C->C): FLOPs 9HWC^2
- Conv(1x1, C->4C): FLOPs 4HWC^2
Total FLOPs: 17HWC^2

More layers, less computational cost than the basic block's 18HWC^2!
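The basic-vs-bottleneck FLOP comparison can be checked numerically. A small sketch, writing each block's cost as a function of the spatial size H, W and the bottleneck's inner channel count C (the bottleneck block's input/output width is 4C):

```python
def basic_block_flops(c, h, w):
    # two 3x3 convs, C -> C
    return 2 * 9 * h * w * c * c

def bottleneck_flops(c, h, w):
    return (4 * h * w * c * c      # 1x1 conv, 4C -> C
            + 9 * h * w * c * c    # 3x3 conv, C -> C
            + 4 * h * w * c * c)   # 1x1 conv, C -> 4C

C, H, W = 64, 56, 56
assert basic_block_flops(C, H, W) == 18 * H * W * C * C
assert bottleneck_flops(C, H, W) == 17 * H * W * C * C   # 3 layers, fewer FLOPs
```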
Model       Block type  Stem layers  Stage 1 (blocks/layers)  Stage 2   Stage 3    Stage 4  FC layers  GFLOP  ImageNet top-5 error
ResNet-18   Basic       1            2 / 4                    2 / 4     2 / 4      2 / 4    1          1.8    10.92
ResNet-34   Basic       1            3 / 6                    4 / 8     6 / 12     3 / 6    1          3.6    8.58
ResNet-50   Bottle      1            3 / 9                    4 / 12    6 / 18     3 / 9    1          3.8    7.13
ResNet-101  Bottle      1            3 / 9                    4 / 12    23 / 69    3 / 9    1          7.6    6.44
ResNet-152  Bottle      1            3 / 9                    8 / 24    36 / 108   3 / 9    1          11.3   5.94
ResNet-50 is the same as ResNet-34, but replaces basic blocks with bottleneck blocks. This is a great baseline architecture for many tasks even today!
Deeper ResNet-101 and ResNet-152 models are more accurate, but also more computationally heavy.
Residual Networks:
- Able to train very deep networks
- Deeper networks do better than shallow networks (as expected)
- Swept 1st place in all ILSVRC and COCO 2015 competitions
- Still widely used today!
Improving Residual Networks: Block Design

He et al, "Identity Mappings in Deep Residual Networks", ECCV 2016

Original ResNet block: Conv -> BatchNorm -> ReLU -> Conv -> BatchNorm, add the shortcut, then ReLU.
"Pre-activation" ResNet block: BatchNorm -> ReLU -> Conv -> BatchNorm -> ReLU -> Conv, then add the shortcut (no ReLU after the addition).

Note the ReLU after the residual addition in the original block: it cannot actually learn the identity function, since its outputs are nonnegative!

With the ReLU inside the residual branch, the pre-activation block can learn a true identity function by setting the conv weights to zero!
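The difference can be seen in a stripped-down sketch (BatchNorm omitted and convs modeled as plain matrices, purely for illustration): with zero conv weights, the original block still applies a final ReLU to the shortcut, while the pre-activation block returns the input unchanged.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def original_block(x, w1, w2):
    f = w2 @ relu(w1 @ x)
    return relu(x + f)          # ReLU after the addition

def preact_block(x, w1, w2):
    f = w2 @ relu(w1 @ x)       # ReLU only inside the residual branch
    return x + f

x = np.array([-1.0, 2.0, -3.0])    # input with negative entries
zeros = np.zeros((3, 3))

assert np.allclose(preact_block(x, zeros, zeros), x)        # true identity
assert not np.allclose(original_block(x, zeros, zeros), x)  # clamps negatives
```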
Slight improvement in accuracy (ImageNet top-1 error):
ResNet-152: 21.3 vs 21.1
ResNet-200: 21.8 vs 20.7

Not actually used that much in practice.
Comparing Complexity

Canziani et al, "An Analysis of Deep Neural Network Models for Practical Applications", 2017

[Chart comparing accuracy, operation counts, and parameter counts of architectures]
- Inception-v4: ResNet + Inception!
- VGG: highest memory, most operations
- GoogLeNet: very efficient!
- AlexNet: low compute, lots of parameters
- ResNet: simple design, moderate efficiency, high accuracy
ImageNet 2016 winner: Model Ensembles

Multi-scale ensemble of Inception, Inception-ResNet, ResNet, and Wide ResNet models.

Shao et al, 2016
Improving ResNets: ResNeXt

Xie et al, "Aggregated Residual Transformations for Deep Neural Networks", CVPR 2017

"Bottleneck" residual block (recap):
- Conv(1x1, 4C->C): FLOPs 4HWC^2
- Conv(3x3, C->C): FLOPs 9HWC^2
- Conv(1x1, C->4C): FLOPs 4HWC^2
Total FLOPs: 17HWC^2

ResNeXt block: G parallel pathways, each a small bottleneck:
- Conv(1x1, 4C->c): FLOPs 4HWCc
- Conv(3x3, c->c): FLOPs 9HWc^2
- Conv(1x1, c->4C): FLOPs 4HWCc
Total FLOPs: (8Cc + 9c^2) * HWG
Equal cost when 9Gc^2 + 8GCc - 17C^2 = 0.
Example: C=64, G=4 gives c=24; C=64, G=32 gives c=4.
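The equal-cost condition above is a quadratic in the per-pathway width c, so the examples can be reproduced by solving for the positive root:

```python
import math

# Solve 9*G*c^2 + 8*G*C*c - 17*C^2 = 0 for c: the pathway width at which
# G ResNeXt pathways cost the same FLOPs as one bottleneck block.
def equal_cost_width(C, G):
    a, b, k = 9 * G, 8 * G * C, -17 * C * C
    return (-b + math.sqrt(b * b - 4 * a * k)) / (2 * a)   # positive root

print(equal_cost_width(64, 4))    # ~23.9, i.e. c = 24
print(equal_cost_width(64, 32))   # ~4.0,  i.e. c = 4
```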
Grouped Convolution

Convolution with groups=1: normal convolution.
Input: Cin x H x W
Weight: Cout x Cin x K x K
Output: Cout x H' x W'
FLOPs: Cout * Cin * K^2 * H * W

All convolutional kernels touch all Cin channels of the input.
Convolution with groups=2: two parallel convolution layers that each work on half the channels.
Input: Cin x H x W, split into Group 1: (Cin/2) x H x W and Group 2: (Cin/2) x H x W
Conv(KxK, Cin/2 -> Cout/2) applied to each group
Out 1: (Cout/2) x H' x W', Out 2: (Cout/2) x H' x W'
Concat: Output Cout x H' x W'
Convolution with groups=G: G parallel conv layers; each "sees" Cin/G input channels and produces Cout/G output channels.
Input: Cin x H x W, split to G x [(Cin/G) x H x W]
Weight: G x (Cout/G) x (Cin/G) x K x K, used by G parallel convolutions
Output: G x [(Cout/G) x H' x W'], concat to Cout x H' x W'
FLOPs: Cout * Cin * K^2 * H * W / G
Depthwise convolution, a special case: G = Cin, Cout = n*Cin. Each input channel is convolved with n different KxK filters to produce n output channels.
Grouped Convolution in PyTorch

PyTorch convolution gives an option for groups!
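A minimal sketch of the `groups` argument of `nn.Conv2d` (the tensor sizes here are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)

conv_normal = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=1)
conv_grouped = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=8)
conv_depthwise = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64)

# All three produce the same output shape...
for conv in (conv_normal, conv_grouped, conv_depthwise):
    assert conv(x).shape == (1, 64, 56, 56)

# ...but the weight tensor (and FLOP count) shrinks by a factor of G:
print(conv_normal.weight.shape)     # (64, 64, 3, 3)
print(conv_grouped.weight.shape)    # (64, 8, 3, 3): Cin/G = 8 per filter
print(conv_depthwise.weight.shape)  # (64, 1, 3, 3): depthwise
```

Note that `in_channels` and `out_channels` must both be divisible by `groups`.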
Improving ResNets: ResNeXt

Equivalent formulation of the ResNeXt block with grouped convolution:
- Conv(1x1, 4C->Gc)
- Conv(3x3, Gc->Gc, groups=G)
- Conv(1x1, Gc->4C)
ResNeXt: maintain computation by adding groups!

Model        Groups  Group width  Top-1 error
ResNet-50    1       64           23.9
ResNeXt-50   2       40           23.0
ResNeXt-50   4       24           22.6
ResNeXt-50   8       14           22.3
ResNeXt-50   32      4            22.2

Model        Groups  Group width  Top-1 error
ResNet-101   1       64           22.0
ResNeXt-101  2       40           21.7
ResNeXt-101  4       24           21.4
ResNeXt-101  8       14           21.3
ResNeXt-101  32      4            21.2

Adding groups improves performance at the same computational complexity!
Squeeze-and-Excitation Networks

Hu et al, "Squeeze-and-Excitation Networks", CVPR 2018

Adds a "squeeze-and-excite" branch to each residual block that performs global pooling and fully-connected layers, then multiplies the result back onto the feature map. Adds global context to each residual block!

Won ILSVRC 2017 with ResNeXt-152-SE.
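The squeeze-and-excite branch can be sketched in a few lines of numpy. This is a simplified illustration (the channel count, reduction ratio r, and weight names are assumptions, not the paper's exact configuration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_branch(x, w1, w2):
    # x: (C, H, W); w1: (C/r, C); w2: (C, C/r) for reduction ratio r
    s = x.mean(axis=(1, 2))                  # squeeze: global average pool -> (C,)
    g = sigmoid(w2 @ np.maximum(w1 @ s, 0))  # excite: FC -> ReLU -> FC -> sigmoid
    return x * g[:, None, None]              # rescale each channel of the features

C, r, H, W = 8, 4, 5, 5
x = np.random.randn(C, H, W)
w1 = np.random.randn(C // r, C)
w2 = np.random.randn(C, C // r)

out = se_branch(x, w1, w2)
assert out.shape == x.shape   # same shape, channel-wise rescaled by global context
```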
ImageNet Classification Challenge
Completion of the challenge: the annual ImageNet competition is no longer held after 2017; it has moved to Kaggle.
Densely Connected Neural Networks

Huang et al, "Densely Connected Convolutional Networks", CVPR 2017

[Diagram: dense blocks in which each layer's input is the concatenation of all previous layers' outputs; the network alternates dense blocks with conv + pooling transition layers, ending in global pool, FC, softmax]

Dense blocks: each layer is connected to every other layer in feedforward fashion.

Alleviates vanishing gradients, strengthens feature propagation, and encourages feature reuse.
MobileNets: Tiny Networks (For Mobile Devices)

Howard et al, "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications", 2017

Standard convolution block: Conv(3x3, C->C) -> BatchNorm -> ReLU.
Total cost: 9C^2HW

Depthwise separable convolution:
- "Depthwise convolution": Conv(3x3, C->C, groups=C) -> BatchNorm -> ReLU. Cost: 9CHW
- "Pointwise convolution": Conv(1x1, C->C) -> BatchNorm -> ReLU. Cost: C^2HW
Total cost: (9C + C^2)HW

Speedup = 9C^2 / (9C + C^2) = 9C / (9 + C) -> 9 (as C -> infinity)
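The speedup formula above, evaluated for a couple of channel counts:

```python
def speedup(c):
    standard = 9 * c * c           # 3x3 conv, C -> C
    separable = 9 * c + c * c      # 3x3 depthwise + 1x1 pointwise
    return standard / separable    # simplifies to 9C / (9 + C)

print(speedup(32))    # ~7.0
print(speedup(512))   # ~8.8; the ratio approaches 9 as C grows
```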
Also related:
- ShuffleNet: Zhang et al, CVPR 2018
- MobileNetV2: Sandler et al, CVPR 2018
- ShuffleNetV2: Ma et al, ECCV 2018
Neural Architecture Search

Zoph and Le, "Neural Architecture Search with Reinforcement Learning", ICLR 2017

Designing neural network architectures is hard: let's automate it!
- One network (the controller) outputs network architectures
- Sample child networks from the controller and train them
- After training a batch of child networks, make a gradient step on the controller network (using policy gradient)
- Over time, the controller learns to output good architectures!
- VERY EXPENSIVE!! Each gradient step on the controller requires training a batch of child models!
- The original paper trained on 800 GPUs for 28 days!
- Follow-up work has focused on efficient search
Zoph et al, "Learning Transferable Architectures for Scalable Image Recognition", CVPR 2018

Neural architecture search can be used to find efficient CNN architectures!
CNN Architectures Summary

- Early work (AlexNet -> ZFNet -> VGG) shows that bigger networks work better
- GoogLeNet was one of the first to focus on efficiency (aggressive stem, 1x1 bottleneck convolutions, global average pooling instead of FC layers)
- ResNet showed us how to train extremely deep networks, limited only by GPU memory! Started to show diminishing returns as networks got bigger
- After ResNet: efficient networks became central: how can we improve accuracy without increasing complexity?
- Lots of tiny networks aimed at mobile devices: MobileNet, ShuffleNet, etc.
- Neural architecture search promises to automate architecture design
Which Architecture Should I Use?

Don't be a hero. For most problems you should use an off-the-shelf architecture; don't try to design your own!

If you just care about accuracy, ResNet-50 or ResNet-101 are great choices.

If you want an efficient network (real-time, runs on mobile, etc.), try MobileNets and ShuffleNets.
Next Time: Deep Learning Hardware and Software