
Machine Learning

Neural Networks: Introduction

Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others

Where are we?

General learning principles
• Overfitting
• Mistake-bound learning
• PAC learning, sample complexity
• Hypothesis choice & VC dimensions
• Training and generalization errors
• Regularized empirical loss minimization
• Bayesian learning

Learning algorithms
• Decision Trees
• Perceptron
• AdaBoost
• Support Vector Machines
• Naïve Bayes
• Logistic Regression

Several of these produce linear classifiers.

Neural Networks

This lecture
• What is a neural network?
  – The hypothesis class
  – Structure, expressiveness
• Predicting with a neural network
• Training neural networks
• Practical concerns

We have seen linear threshold units

[Figure: input features feed into a dot product, followed by a threshold]

Prediction: sgn(wᵀx + b) = sgn(∑ᵢ wᵢxᵢ + b)

Learning: various algorithms (perceptron, SVM, logistic regression, …); in general, minimize a loss.
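As a concrete illustration (not from the original slides), here is a minimal sketch of a linear threshold unit in Python; the weights, bias, and input are arbitrary made-up values:

```python
import numpy as np

def sgn(z):
    # Threshold activation: +1 if z >= 0, else -1
    return 1 if z >= 0 else -1

def predict(w, x, b):
    # Linear threshold unit: sgn(w^T x + b)
    return sgn(np.dot(w, x) + b)

# Hypothetical weights, bias, and feature vector
w = np.array([0.5, -1.0, 2.0])
b = 0.1
x = np.array([1.0, 0.0, 1.0])
print(predict(w, x, b))  # -> 1, since 0.5 + 2.0 + 0.1 >= 0
```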

But where do these input features come from? What if the features were the outputs of another classifier?

Features from classifiers

[Figure, built up over several slides: the outputs of four classifiers over the input features are fed into one final classifier]

Each of these connections has its own weight as well.

This is a two-layer feed-forward neural network, with an output layer, a hidden layer, and an input layer. Think of the hidden layer as learning a good representation of the inputs.

The dot product followed by the threshold constitutes a neuron. There are five neurons in this picture: four in the hidden layer and one output.

But where do the inputs come from? What if the inputs were the outputs of a classifier? Then we can make a three-layer network… and so on.

Let us try to formalize this.

Neural networks

A robust approach for approximating real-valued, discrete-valued, or vector-valued functions. Neural networks are among the most effective general-purpose supervised learning methods currently known, especially for complex and hard-to-interpret data such as real-world sensory data.

The backpropagation algorithm for neural networks has been shown to be successful in many practical problems, across various application domains.

Artificial neurons

Functions that very loosely mimic a biological neuron.

A neuron accepts a collection of inputs (a vector x) and produces an output by:
1. Applying a dot product with weights w and adding a bias b
2. Applying a (possibly non-linear) transformation called an activation

output = activation(wᵀx + b)

[Figure: the dot product wᵀx + b, followed by a threshold activation; other activations are possible]
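To make the two steps concrete, here is a minimal sketch of an artificial neuron in Python; the weights, bias, and the choice of a sigmoid activation are illustrative assumptions:

```python
import numpy as np

def neuron(w, x, b, activation):
    # Step 1: dot product with weights, plus bias
    z = np.dot(w, x) + b
    # Step 2: (possibly non-linear) activation
    return activation(z)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

w = np.array([0.4, -0.6])
x = np.array([1.0, 2.0])
b = 0.05
print(neuron(w, x, b, sigmoid))  # sigmoid(0.4 - 1.2 + 0.05) = sigmoid(-0.75)
```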

Activation functions

Name of the neuron → its activation function activation(z):
• Linear unit: z
• Threshold/sign unit: sgn(z)
• Sigmoid unit: 1 / (1 + exp(−z))
• Rectified linear unit (ReLU): max(0, z)
• Tanh unit: tanh(z)

output = activation(wᵀx + b)

Many more activation functions exist (sinusoid, sinc, Gaussian, polynomial, …). Activation functions are also called transfer functions.
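The activations in the table above are one-liners; a sketch in Python using numpy:

```python
import numpy as np

# Activation functions from the table, applied elementwise.
# Note: np.sign returns 0 at z = 0, whereas the sgn in the slides
# is usually taken to be +1 (or -1) there.
linear    = lambda z: z
sign_unit = lambda z: np.sign(z)            # threshold/sign unit
sigmoid   = lambda z: 1.0 / (1.0 + np.exp(-z))
relu      = lambda z: np.maximum(0.0, z)    # rectified linear unit
tanh_unit = np.tanh

z = np.array([-2.0, 0.0, 3.0])
for name, f in [("linear", linear), ("sign", sign_unit),
                ("sigmoid", sigmoid), ("relu", relu), ("tanh", tanh_unit)]:
    print(name, f(z))
```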

A neural network

A function that converts inputs to outputs, defined by a directed acyclic graph:
– Nodes, organized in layers, correspond to neurons
– Edges carry the output of one neuron to another, and are associated with weights

To define a neural network, we need to specify:
– The structure of the graph: how many nodes, and the connectivity
– The activation function on each node
– The edge weights

The structure of the graph and the activation functions are called the architecture of the network. The architecture is typically predefined, part of the design of the classifier. The edge weights, by contrast, are learned from data.

[Figure: a network with an input layer, a hidden layer, and an output layer; the two sets of edges carry weight matrices w¹ and w²]
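Putting the pieces together, here is a minimal sketch of a forward pass through a two-layer feed-forward network like the one in the figure; the layer sizes, sigmoid activations, and random weights are illustrative assumptions, with w1 and w2 playing the role of the edge-weight matrices:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(x, w1, b1, w2, b2):
    # Hidden layer: one neuron per row of w1
    h = sigmoid(w1 @ x + b1)
    # Output layer reads the hidden representation h
    return sigmoid(w2 @ h + b2)

rng = np.random.default_rng(0)
x  = rng.normal(size=3)        # 3 input features
w1 = rng.normal(size=(4, 3))   # 4 hidden neurons
b1 = np.zeros(4)
w2 = rng.normal(size=(1, 4))   # 1 output neuron
b2 = np.zeros(1)
print(forward(x, w1, b1, w2, b2))
```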

A brief history of neural networks

• 1943: McCulloch and Pitts showed how linear threshold units can compute logical functions
• 1949: Hebb suggested a learning rule that has some physiological plausibility
• 1950s: Rosenblatt, the Perceptron algorithm for a single threshold neuron
• 1969: Minsky and Papert studied the neuron from a geometrical perspective
• 1980s: Convolutional neural networks (Fukushima, LeCun), the backpropagation algorithm (various)
• Early 2000s–today: More compute, more data, deeper networks

See also: http://people.idsia.ch/~juergen/deep-learning-overview.html


What functions do neural networks express?

A single neuron with threshold activation

Prediction = sgn(b + w₁x₁ + w₂x₂)

[Figure: positive and negative examples in the plane, separated by the line b + w₁x₁ + w₂x₂ = 0]

Two layers, with threshold activations

In general, these express convex polygons.

Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014

Three layers, with threshold activations

In general, these express unions of convex polygons.

Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014
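To make these two constructions concrete, here is a small sketch (not from the slides) using 0/1 threshold units: a first layer of half-plane tests, an AND unit that intersects them into a convex polygon, and an OR unit that a third layer could use to take unions of several such polygons. The specific triangle and test points are illustrative assumptions:

```python
import numpy as np

step = lambda z: (z >= 0).astype(float)  # 0/1 threshold unit

def halfplanes(x, W, b):
    # First layer: one threshold unit per half-plane test w_i . x + b_i >= 0
    return step(W @ x + b)

def AND(h):
    # Threshold unit firing only if all k inputs fire: sum(h) - k + 0.5 >= 0
    return step(np.sum(h) - len(h) + 0.5)

def OR(h):
    # Threshold unit firing if any input fires: sum(h) - 0.5 >= 0
    return step(np.sum(h) - 0.5)

# Triangle with vertices (0,0), (1,0), (0,1): x >= 0, y >= 0, x + y <= 1
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.0, 1.0])

inside  = np.array([0.2, 0.2])
outside = np.array([0.8, 0.8])
print(AND(halfplanes(inside,  W, b)))   # 1.0: inside the convex polygon
print(AND(halfplanes(outside, W, b)))   # 0.0: outside
# A third layer could feed several such AND units into OR,
# yielding a union of convex polygons.
```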

Neural networks are universal function approximators

• Any continuous function can be approximated to arbitrary accuracy using one hidden layer of sigmoid units [Cybenko 1989]
• Approximation error is insensitive to the choice of activation functions [DasGupta et al 1993]
• Two-layer threshold networks can express any Boolean function
  – Exercise: Prove this
• VC dimension of a threshold network with edges E: VC = O(|E| log |E|)
• VC dimension of sigmoid networks with nodes V and edges E:
  – Upper bound: O(|V|² |E|²)
  – Lower bound: Ω(|E|²)

Exercise: Show that if we have only linear units, then multiple layers do not change the expressiveness (see the sketch below).
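A quick numeric sketch of the identity behind this exercise: stacking linear layers collapses into a single linear map, since W₂(W₁x + b₁) + b₂ = (W₂W₁)x + (W₂b₁ + b₂). The shapes and random values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
x  = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

# Two linear layers with no non-linear activation...
two_layer = W2 @ (W1 @ x + b1) + b2
# ...equal one linear layer with collapsed weights and bias
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)

print(np.allclose(two_layer, one_layer))  # True
```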
