GoogLeNet Insights
TRANSCRIPT
Five Insights from GoogLeNet You Could Use in Your Own Deep Learning Nets
Auro Tripathy
Year 1989 Kicked Off Convolutional Neural Nets
A ten-digit classifier using a modest neural network with three hidden layers.
Backpropagation Applied to Handwritten Zip Code Recognition, LeCun et al. http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf
Layer             Hidden Units                     Connections                   Params
Out–H3 (FC)       10 (visible)                     10 x (30 W + 1 B) = 310       10 x (30 W + 1 B) = 310
H3–H2 (FC)        30                               30 x (192 W + 1 B) = 5790     30 x (192 W + 1 B) = 5790
H2–H1 (Conv)      12 x 4 x 4 = 192                 192 x (5x5x8 + 1) = 38592     5x5x8x12 + 192 biases = 2592
H1–Input (Conv)   12 x 8 x 8 = 768                 768 x (5x5x1 + 1) = 19968     5x5x1x12 + 768 biases = 1068
Totals            16x16 in + 990 hidden + 10 out   64660 connections             9760 params
Each of the units in H2 combines local information coming from 8 of the 12 different feature maps in H1.
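The gap between a conv layer's connection count and its parameter count is exactly the weight sharing. As a sanity check on the H2–H1 row of the table, the bookkeeping can be reproduced in a few lines of plain Python (a sketch; all numbers are taken from the table above):

```python
# Connections count every multiply; shared weights keep the parameter
# count far smaller. H2-H1 layer of LeCun et al., 1989.
kernel = 5 * 5          # 5x5 receptive field
in_maps = 8             # each H2 unit sees 8 of the 12 H1 feature maps
out_maps = 12           # H2 has 12 feature maps
out_units = 12 * 4 * 4  # 192 units in H2

connections = out_units * (kernel * in_maps + 1)   # +1 for the bias
params = kernel * in_maps * out_maps + out_units   # shared weights + per-unit biases

print(connections)  # 38592
print(params)       # 2592
```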
Year 2012 Marked the Inflection Point
Reintroducing CNNs led to a big drop in error for image classification. Since then, networks have continued to reduce error.
[Figure: Top-5 error (%) and layer count by ILSVRC entry — ILSVRC'10: 28.2; ILSVRC'11: 25.8; ILSVRC'12 (AlexNet): 16.4; ILSVRC'13: 11.7; ILSVRC'14: 7.3; ILSVRC'14 (GoogLeNet): 6.7; ILSVRC'15 (ResNet): 3.57.]
The Trend Has Been to Increase the Number of Layers (& Layer Size)
• The typical 'design pattern' for convolutional neural nets (a sketch follows this list):
  – Stacked convolutional layers (a linear filter followed by a non-linear activation),
  – followed by contrast normalization and max pooling,
  – penultimate layers (one or more) are fully connected,
  – the ultimate layer is a loss layer, possibly more than one in a weighted mix.
• Use of dropout to address the problem of over-fitting due to many layers.
• In addition to classification, the architecture is good for localization and object detection, despite concerns that max-pooling dilutes spatial information.
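For concreteness, here is a minimal sketch of that classic design pattern, written in PyTorch (my choice of framework; the layer sizes are illustrative, not from any particular paper):

```python
import torch.nn as nn

# Stacked conv layers (linear filter + non-linear activation),
# normalization and max pooling, a fully connected penultimate layer
# with dropout, and a loss layer paired on top. Assumes 3x224x224 input.
classic_cnn = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=5, padding=2),   # linear filter ...
    nn.ReLU(),                                    # ... plus non-linear activation
    nn.LocalResponseNorm(5),                      # contrast normalization
    nn.MaxPool2d(kernel_size=2, stride=2),        # 224 -> 112
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),        # 112 -> 56
    nn.Flatten(),
    nn.Dropout(p=0.5),                            # dropout against over-fitting
    nn.Linear(128 * 56 * 56, 1000),               # fully connected penultimate layer
)
# Training pairs this with a loss layer, e.g. nn.CrossEntropyLoss().
```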
The Challenge of Deep Networks
1. Adding layers increases the number of parameters and makes the network prone to over-fitting.
   – Exacerbated by a paucity of data.
   – More data means more expense in its annotation.
2. More computation (a back-of-the-envelope check follows this list).
   – A linear increase in filters results in a quadratic increase in compute.
   – If weights are close to zero, we've wasted compute resources.
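The quadratic claim is easy to verify: a conv layer's multiply-adds scale with the product of input and output channels, so uniformly doubling the filter counts of two chained layers quadruples the second layer's compute (plain Python; the channel counts and feature-map size are made up for illustration):

```python
# Multiply-adds of a KxK conv layer on an HxW output grid.
def conv_madds(c_in, c_out, k, h, w):
    return c_in * c_out * k * k * h * w

base = conv_madds(64, 64, 3, 56, 56)       # 64 filters in, 64 out
doubled = conv_madds(128, 128, 3, 56, 56)  # double the filters on both sides
print(doubled / base)  # 4.0 -- linear increase in filters, quadratic in compute
```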
Year 2014: GoogLeNet Took Aim at Efficiency and Practicality
Resultant benefits of the new architecture:
• 12 times fewer parameters than AlexNet.
  – Significantly more accurate than AlexNet.
  – Lower memory use and lower power use, acutely important for mobile devices.
• Stays within the targeted 1.5 billion multiply-add budget.
  – Computational cost "less than 2X compared to AlexNet".
http://www.youtube.com/watch?v=ySrj_G5gHWI&t=12m42s
Introducing the Inception Module
[Figure: The Inception module — 1x1, 3x3, and 5x5 convolution branches plus a 3x3 max-pooling branch off the previous layer, concatenated into a single output.]
Intuition behind the Inception Module
• Cluster neurons according to the correlation statistics in the dataset.
  – An optimal layered network topology can be constructed by analyzing the correlation statistics of the preceding layer's activations and clustering neurons with highly correlated outputs.
• We already know that, in the lower layers, there exist high correlations in image patches that are local and near-local.
  – These can be covered by 1x1 convolutions.
  – Additionally, a smaller number of spatially spread-out clusters can be covered by convolutions over larger patches, i.e., 3x3 and 5x5.
  – And there will be a decreasing number of patches over larger and larger regions.
• It also suggests that the architecture is a combination of all the convolutions, the 1x1, 3x3, and 5x5, as input to the next stage.
• Since max-pooling has been successful, it suggests adding a pooling layer in parallel.
In images, correlation tends to be local; exploit it. A heterogeneous set of convolutions covers the spread-out clusters.
• Cover very local clusters with 1x1 convolutions.
• Cover more spread-out clusters with 3x3 convolutions.
• Cover even more spread-out clusters with 5x5 convolutions.
[Figure: 1x1, 3x3, and 5x5 convolutions applied in parallel to the previous layer.]
Conceiving the Inception Module
[Figure: The naive Inception module — 1x1, 3x3, and 5x5 convolutions plus 3x3 max pooling branch off the previous layer and are concatenated.]
Inception Module Put Into Practice: Judicious Dimension Reduction
[Figure: The Inception module with dimension reduction — 1x1 convolutions precede the 3x3 and 5x5 convolutions, and a 1x1 convolution follows the 3x3 max pooling, before concatenation.]
GoogLeNet Insight #1 (Summary from Previous Slides)
Leads to the following architecture choices (sketched in code after this list):
• Choosing filter sizes of 1x1, 3x3, 5x5.
• Applying all three filters on the same "patch" of the image (no need to choose).
• Concatenating all filters as a single output vector for the next stage.
• Concatenating an additional pooling path, since pooling is essential to the success of CNNs.
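A minimal sketch of these choices in PyTorch (the per-branch channel counts are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    """Insight #1 as code: apply 1x1, 3x3, and 5x5 filters (plus a
    pooling path) to the same input patch and concatenate the results
    along the channel axis."""
    def __init__(self, c_in):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, 64, kernel_size=1)
        self.conv3 = nn.Conv2d(c_in, 128, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(c_in, 32, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # Padding keeps the spatial size identical on every branch,
        # so the outputs concatenate cleanly.
        return torch.cat(
            [self.conv1(x), self.conv3(x), self.conv5(x), self.pool(x)], dim=1)
```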
GoogLeNet Insight #2: Decrease Dimensions Wherever Computation Requirements Increase
via a 1x1 Dimension Reduction Layer
• Use inexpensive 1x1 convolutions to compute reductions before the expensive 3x3 and 5x5 convolutions.
• 1x1 convolutions include a ReLU activation, making them dual-purpose (a sketch follows the figure below).
[Figure: A 1x1 convolution with ReLU applied to the previous layer.]
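A sketch of the reduction idea in PyTorch (channel counts are again illustrative):

```python
import torch.nn as nn

# A cheap 1x1 convolution shrinks the channel count before the
# expensive 3x3 (or 5x5) convolution; its ReLU doubles as the
# non-linearity, making the 1x1 layer dual-purpose.
reduce_then_conv = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),             # 1x1 reduction: 256 -> 64 channels
    nn.ReLU(),                                     # the dual-purpose activation
    nn.Conv2d(64, 128, kernel_size=3, padding=1),  # 3x3 now runs on 64, not 256, channels
    nn.ReLU(),
)
# The 3x3's multiply-adds drop by 4x (64 vs 256 input channels).
```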
GoogLeNet Insight #3: Stack Inception Modules Upon Each Other
• Occasionally insert max-pooling layers with stride 2 to decimate (halve) the resolution of the grid.
• Stacking Inception layers benefits the results when used at higher layers (not strictly necessary).
  – Lower layers are kept in traditional convolutional fashion (for memory-efficiency reasons).
• This stacking allows for tweaking each module without an uncontrolled blowup in computational complexity at later stages (see the sketch after this list).
  – For example, a tweak could be to increase the width at any stage.
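A stacking sketch in PyTorch, reusing the NaiveInception class from the Insight #1 sketch (depths and widths are illustrative, far shallower than the real net):

```python
import torch.nn as nn

# Traditional convolutions at the bottom, then stacked Inception
# modules, with an occasional stride-2 max pool to halve the grid.
stacked = nn.Sequential(
    nn.Conv2d(3, 192, kernel_size=7, stride=2, padding=3),  # traditional stem
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    NaiveInception(192),   # -> 64+128+32+192 = 416 channels
    NaiveInception(416),   # -> 64+128+32+416 = 640 channels
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),       # stride 2: halve resolution
    NaiveInception(640),   # width can be tweaked per stage
)
```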
GoogLeNet Components: Stacking Inception Modules
[Figure: Input → traditional convolutions (Conv + MaxPool + Conv + MaxPool) → nine Inception modules (3a, 3b, 4a, 4b, 4c, 4d, 4e, 5a, 5b) with interspersed MaxPool → Average Pooling → Linear → SoftMax w/Loss → Label.]
GoogLeNet Insight #4: Counter-Balancing Back-Propagation Downsides in Deep Networks
• A potential problem:
  – Back-propagating through deep networks could result in "vanishing gradients" (possibly meaning dead ReLUs).
• A solution:
  – Intermediate layers do have discriminative power.
  – Auxiliary classifiers were appended to the intermediate layers.
  – During training, the intermediate loss was added to the total loss with a discount factor of 0.3 (see the sketch below).
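A sketch of the training-time loss combination (the head names and the helper function are hypothetical; only the 0.3 discount factor comes from the source):

```python
# `main_logits`, `aux0_logits`, `aux1_logits` are the outputs of the
# three softmax heads; `criterion` is e.g. nn.CrossEntropyLoss().
def total_loss(criterion, main_logits, aux0_logits, aux1_logits, labels):
    main = criterion(main_logits, labels)
    aux = criterion(aux0_logits, labels) + criterion(aux1_logits, labels)
    return main + 0.3 * aux  # auxiliary losses enter with a 0.3 discount

# At inference time the auxiliary heads are simply discarded.
```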
Two Additional Loss Layers for Training to Depth
[Figure: The same pipeline — Input → traditional convolutions (Conv + MaxPool + Conv + MaxPool) → nine Inception modules (3a, 3b, 4a, 4b, 4c, 4d, 4e, 5a, 5b) → Average Pooling → Linear → SoftMax w/Loss 2 → Label — with two auxiliary branches off intermediate modules, each Average Pooling → 1x1 Conv → Fully Connected → Dropout → Linear → SoftMax w/Loss 0 and SoftMax w/Loss 1.]
GoogLeNet Insight #5: End with a Global Average Pooling Layer Instead of a Fully Connected Layer
• Fully connected layers are prone to over-fitting.
  – This hampers generalization.
• Average pooling has no parameters to optimize, thus no over-fitting.
• Averaging is more native to the convolutional structure.
  – There is a natural correspondence between feature maps and categories, leading to easier interpretation.
• Average pooling does not exclude the use of dropout, a proven regularization method to avoid over-fitting (a sketch of this head follows this list).
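A sketch of such a head in PyTorch (1024 is GoogLeNet's final feature width; the dropout rate and class count are illustrative):

```python
import torch.nn as nn

# Replace the parameter-heavy fully connected head with global average
# pooling (one value per feature map), keeping only a small linear
# layer to adapt to the label set.
gap_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),  # global average pooling: HxW -> 1x1 per feature map
    nn.Flatten(),             # (N, 1024, 1, 1) -> (N, 1024)
    nn.Dropout(p=0.4),        # dropout still applies, as noted above
    nn.Linear(1024, 1000),    # linear layer for adapting to the label set
)
```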
[Figure: The nine Inception modules (3a–5b) ending in Global Average Pooling → a linear layer for adapting to other label sets → SoftMax w/Loss → Label.]
Summarizing the Insights
1. Fully exploit the fact that, in images, correlation tends to be local.
   • Concatenate 1x1, 3x3, and 5x5 convolutions along with pooling.
2. Decrease dimensions wherever computation requirements increase, via a 1x1 dimension reduction layer.
3. Stack Inception modules upon each other.
4. Counter-balance back-propagation downsides in deep networks.
   • Use intermediate losses in the final loss.
5. End with a global average pooling layer instead of a fully connected layer.