GoogLeNet Insights
TRANSCRIPT
Five Insights from GoogLeNet You Could Use in Your Own Deep Learning Nets
Auro Tripathy
Year 1989 Kicked Off Convolutional Neural Nets
A ten-digit classifier using a modest neural network with three hidden layers.
Backpropagation Applied to Handwritten Zip Code Recognition, LeCun et al. http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf
Layer             Hidden Units                     Connections                   Params
Out–H3 (FC)       10 (visible)                     10 x (30 W + 1 B) = 310       10 x (30 W + 1 B) = 310
H3–H2 (FC)        30                               30 x (192 W + 1 B) = 5790     30 x (192 W + 1 B) = 5790
H2–H1 (Conv)      12 x 4 x 4 = 192                 192 x (5x5x8 + 1) = 38592     5x5x8x12 + 192 biases = 2592
H1–Input (Conv)   12 x 8 x 8 = 768                 768 x (5x5x1 + 1) = 19968     5x5x1x12 + 768 biases = 1068
Totals            16x16 in + 990 hidden + 10 out   64660 connections             9760 params
Each of the units in H2 combines local information coming from 8 of the 12 different feature maps in H1.
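The gap between a conv layer's connection count and its parameter count is exactly the weight sharing. As a sanity check on the H2–H1 row of the table, the bookkeeping can be reproduced in a few lines of plain Python (a sketch; all numbers are taken from the table above):

```python
# Connections count every multiply; shared weights keep the parameter
# count far smaller. H2-H1 layer of LeCun et al., 1989.
kernel = 5 * 5          # 5x5 receptive field
in_maps = 8             # each H2 unit sees 8 of the 12 H1 feature maps
out_maps = 12           # H2 has 12 feature maps
out_units = 12 * 4 * 4  # 192 units in H2

connections = out_units * (kernel * in_maps + 1)   # +1 for the bias
params = kernel * in_maps * out_maps + out_units   # shared weights + per-unit biases

print(connections)  # 38592
print(params)       # 2592
```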
Year 2012 Marked the Inflection Point
Reintroducing CNNs led to a big drop in error for image classification. Since then, networks have continued to reduce error.
[Figure: Top-5 error (%) and layer count by ILSVRC entry — ILSVRC'10: 28.2; ILSVRC'11: 25.8; ILSVRC'12 (AlexNet): 16.4; ILSVRC'13: 11.7; ILSVRC'14: 7.3; ILSVRC'14 (GoogLeNet): 6.7; ILSVRC'15 (ResNet): 3.57.]
The Trend Has Been to Increase the Number of Layers (& Layer Size)
• The typical 'design pattern' for convolutional neural nets (a sketch follows this list):
  – Stacked convolutional layers (a linear filter followed by a non-linear activation),
  – followed by contrast normalization and max pooling,
  – penultimate layers (one or more) are fully connected,
  – the ultimate layer is a loss layer, possibly more than one in a weighted mix.
• Use of dropout to address the problem of over-fitting due to many layers.
• In addition to classification, the architecture is good for localization and object detection, despite concerns that max-pooling dilutes spatial information.
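For concreteness, here is a minimal sketch of that classic design pattern, written in PyTorch (my choice of framework; the layer sizes are illustrative, not from any particular paper):

```python
import torch.nn as nn

# Stacked conv layers (linear filter + non-linear activation),
# normalization and max pooling, a fully connected penultimate layer
# with dropout, and a loss layer paired on top. Assumes 3x224x224 input.
classic_cnn = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=5, padding=2),   # linear filter ...
    nn.ReLU(),                                    # ... plus non-linear activation
    nn.LocalResponseNorm(5),                      # contrast normalization
    nn.MaxPool2d(kernel_size=2, stride=2),        # 224 -> 112
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),        # 112 -> 56
    nn.Flatten(),
    nn.Dropout(p=0.5),                            # dropout against over-fitting
    nn.Linear(128 * 56 * 56, 1000),               # fully connected penultimate layer
)
# Training pairs this with a loss layer, e.g. nn.CrossEntropyLoss().
```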
The Challenge of Deep Networks
1. Adding layers increases the number of parameters and makes the network prone to over-fitting.
   – Exacerbated by a paucity of data.
   – More data means more expense in its annotation.
2. More computation (a back-of-the-envelope check follows this list).
   – A linear increase in filters results in a quadratic increase in compute.
   – If weights are close to zero, we've wasted compute resources.
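The quadratic claim is easy to verify: a conv layer's multiply-adds scale with the product of input and output channels, so uniformly doubling the filter counts of two chained layers quadruples the second layer's compute (plain Python; the channel counts and feature-map size are made up for illustration):

```python
# Multiply-adds of a KxK conv layer on an HxW output grid.
def conv_madds(c_in, c_out, k, h, w):
    return c_in * c_out * k * k * h * w

base = conv_madds(64, 64, 3, 56, 56)       # 64 filters in, 64 out
doubled = conv_madds(128, 128, 3, 56, 56)  # double the filters on both sides
print(doubled / base)  # 4.0 -- linear increase in filters, quadratic in compute
```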
Year 2014: GoogLeNet Took Aim at Efficiency and Practicality
Resultant benefits of the new architecture:
• 12 times fewer parameters than AlexNet.
  – Significantly more accurate than AlexNet.
  – Lower memory use and lower power use, acutely important for mobile devices.
• Stays within the targeted 1.5 billion multiply-add budget.
  – Computational cost "less than 2X compared to AlexNet".
http://www.youtube.com/watch?v=ySrj_G5gHWI&t=12m42s
Introducing the Inception Module
[Figure: The Inception module — 1x1, 3x3, and 5x5 convolution branches plus a 3x3 max-pooling branch off the previous layer, concatenated into a single output.]
Intuition behind the Inception Module
• Cluster neurons according to the correlation statistics in the dataset.
  – An optimal layered network topology can be constructed by analyzing the correlation statistics of the preceding layer's activations and clustering neurons with highly correlated outputs.
• We already know that, in the lower layers, there exist high correlations in image patches that are local and near-local.
  – These can be covered by 1x1 convolutions.
  – Additionally, a smaller number of spatially spread-out clusters can be covered by convolutions over larger patches, i.e., 3x3 and 5x5.
  – And there will be a decreasing number of patches over larger and larger regions.
• It also suggests that the architecture is a combination of all the convolutions, the 1x1, 3x3, and 5x5, as input to the next stage.
• Since max-pooling has been successful, it suggests adding a pooling layer in parallel.
In images, correlation tends to be local; exploit it. A heterogeneous set of convolutions covers the spread-out clusters.
• Cover very local clusters with 1x1 convolutions.
• Cover more spread-out clusters with 3x3 convolutions.
• Cover even more spread-out clusters with 5x5 convolutions.
[Figure: 1x1, 3x3, and 5x5 convolutions applied in parallel to the previous layer.]
Conceiving the Inception Module
[Figure: The naive Inception module — 1x1, 3x3, and 5x5 convolutions plus 3x3 max pooling branch off the previous layer and are concatenated.]
Inception Module Put Into Practice: Judicious Dimension Reduction
[Figure: The Inception module with dimension reduction — 1x1 convolutions precede the 3x3 and 5x5 convolutions, and a 1x1 convolution follows the 3x3 max pooling, before concatenation.]
GoogLeNet Insight #1 (Summary from Previous Slides)
Leads to the following architecture choices (sketched in code after this list):
• Choosing filter sizes of 1x1, 3x3, 5x5.
• Applying all three filters on the same "patch" of the image (no need to choose).
• Concatenating all filters as a single output vector for the next stage.
• Concatenating an additional pooling path, since pooling is essential to the success of CNNs.
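A minimal sketch of these choices in PyTorch (the per-branch channel counts are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    """Insight #1 as code: apply 1x1, 3x3, and 5x5 filters (plus a
    pooling path) to the same input patch and concatenate the results
    along the channel axis."""
    def __init__(self, c_in):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, 64, kernel_size=1)
        self.conv3 = nn.Conv2d(c_in, 128, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(c_in, 32, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # Padding keeps the spatial size identical on every branch,
        # so the outputs concatenate cleanly.
        return torch.cat(
            [self.conv1(x), self.conv3(x), self.conv5(x), self.pool(x)], dim=1)
```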
GoogLeNet Insight #2: Decrease Dimensions Wherever Computation Requirements Increase
via a 1x1 Dimension Reduction Layer
• Use inexpensive 1x1 convolutions to compute reductions before the expensive 3x3 and 5x5 convolutions.
• 1x1 convolutions include a ReLU activation, making them dual-purpose (a sketch follows the figure below).
[Figure: A 1x1 convolution with ReLU applied to the previous layer.]
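A sketch of the reduction idea in PyTorch (channel counts are again illustrative):

```python
import torch.nn as nn

# A cheap 1x1 convolution shrinks the channel count before the
# expensive 3x3 (or 5x5) convolution; its ReLU doubles as the
# non-linearity, making the 1x1 layer dual-purpose.
reduce_then_conv = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),             # 1x1 reduction: 256 -> 64 channels
    nn.ReLU(),                                     # the dual-purpose activation
    nn.Conv2d(64, 128, kernel_size=3, padding=1),  # 3x3 now runs on 64, not 256, channels
    nn.ReLU(),
)
# The 3x3's multiply-adds drop by 4x (64 vs 256 input channels).
```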
GoogLeNet Insight #3: Stack Inception Modules Upon Each Other
• Occasionally insert max-pooling layers with stride 2 to decimate (halve) the resolution of the grid.
• Stacking Inception layers benefits the results when used at higher layers (not strictly necessary).
  – Lower layers are kept in traditional convolutional fashion (for memory-efficiency reasons).
• This stacking allows for tweaking each module without an uncontrolled blowup in computational complexity at later stages (see the sketch after this list).
  – For example, a tweak could be to increase the width at any stage.
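A stacking sketch in PyTorch, reusing the NaiveInception class from the Insight #1 sketch (depths and widths are illustrative, far shallower than the real net):

```python
import torch.nn as nn

# Traditional convolutions at the bottom, then stacked Inception
# modules, with an occasional stride-2 max pool to halve the grid.
stacked = nn.Sequential(
    nn.Conv2d(3, 192, kernel_size=7, stride=2, padding=3),  # traditional stem
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    NaiveInception(192),   # -> 64+128+32+192 = 416 channels
    NaiveInception(416),   # -> 64+128+32+416 = 640 channels
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),       # stride 2: halve resolution
    NaiveInception(640),   # width can be tweaked per stage
)
```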
GoogLeNet Components: Stacking Inception Modules
[Figure: Input → traditional convolutions (Conv + MaxPool + Conv + MaxPool) → nine Inception modules (3a, 3b, 4a, 4b, 4c, 4d, 4e, 5a, 5b) with interspersed MaxPool → Average Pooling → Linear → SoftMax w/Loss → Label.]
GoogLeNet Insight #4: Counter-Balancing Back-Propagation Downsides in Deep Networks
• A potential problem:
  – Back-propagating through deep networks could result in "vanishing gradients" (possibly meaning dead ReLUs).
• A solution:
  – Intermediate layers do have discriminative power.
  – Auxiliary classifiers were appended to the intermediate layers.
  – During training, the intermediate loss was added to the total loss with a discount factor of 0.3 (see the sketch below).
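A sketch of the training-time loss combination (the head names and the helper function are hypothetical; only the 0.3 discount factor comes from the source):

```python
# `main_logits`, `aux0_logits`, `aux1_logits` are the outputs of the
# three softmax heads; `criterion` is e.g. nn.CrossEntropyLoss().
def total_loss(criterion, main_logits, aux0_logits, aux1_logits, labels):
    main = criterion(main_logits, labels)
    aux = criterion(aux0_logits, labels) + criterion(aux1_logits, labels)
    return main + 0.3 * aux  # auxiliary losses enter with a 0.3 discount

# At inference time the auxiliary heads are simply discarded.
```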
Two Additional Loss Layers for Training to Depth
[Figure: The same pipeline — Input → traditional convolutions (Conv + MaxPool + Conv + MaxPool) → nine Inception modules (3a, 3b, 4a, 4b, 4c, 4d, 4e, 5a, 5b) → Average Pooling → Linear → SoftMax w/Loss 2 → Label — with two auxiliary branches off intermediate modules, each Average Pooling → 1x1 Conv → Fully Connected → Dropout → Linear → SoftMax w/Loss 0 and SoftMax w/Loss 1.]
GoogLeNet Insight #5: End with a Global Average Pooling Layer Instead of a Fully Connected Layer
• Fully connected layers are prone to over-fitting.
  – This hampers generalization.
• Average pooling has no parameters to optimize, thus no over-fitting.
• Averaging is more native to the convolutional structure.
  – There is a natural correspondence between feature maps and categories, leading to easier interpretation.
• Average pooling does not exclude the use of dropout, a proven regularization method to avoid over-fitting (a sketch of this head follows this list).
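A sketch of such a head in PyTorch (1024 is GoogLeNet's final feature width; the dropout rate and class count are illustrative):

```python
import torch.nn as nn

# Replace the parameter-heavy fully connected head with global average
# pooling (one value per feature map), keeping only a small linear
# layer to adapt to the label set.
gap_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),  # global average pooling: HxW -> 1x1 per feature map
    nn.Flatten(),             # (N, 1024, 1, 1) -> (N, 1024)
    nn.Dropout(p=0.4),        # dropout still applies, as noted above
    nn.Linear(1024, 1000),    # linear layer for adapting to the label set
)
```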
[Figure: The nine Inception modules (3a–5b) ending in Global Average Pooling → a linear layer for adapting to other label sets → SoftMax w/Loss → Label.]
Summarizing the Insights
1. Fully exploit the fact that, in images, correlation tends to be local.
   • Concatenate 1x1, 3x3, and 5x5 convolutions along with pooling.
2. Decrease dimensions wherever computation requirements increase, via a 1x1 dimension reduction layer.
3. Stack Inception modules upon each other.
4. Counter-balance back-propagation downsides in deep networks.
   • Use intermediate losses in the final loss.
5. End with a global average pooling layer instead of a fully connected layer.