NEURAL NETWORKS AND DEEP LEARNING
ASIM JALIS, GALVANIZE

Upload: asim-jalis

Posted on 15-Apr-2017

Page 1: Neural Networks and Deep Learning

NEURAL NETWORKS AND DEEP LEARNING
ASIM JALIS

GALVANIZE

INTRO

ASIM JALIS
Galvanize/Zipfian, Data Engineering
Cloudera, Microsoft, Salesforce
MS in Computer Science from University of Virginia

GALVANIZE PROGRAMS

Program                       Duration
Data Science Immersive        12 weeks
Data Engineering Immersive    12 weeks
Web Developer Immersive       6 months
Galvanize U                   1 year

TALK OVERVIEW

WHAT IS THIS TALK ABOUT?
Using neural networks and deep learning to recognize images.
By the end of the class you will be able to create your own deep learning systems.

HOW MANY PEOPLE HERE HAVE USED NEURAL NETWORKS?

HOW MANY PEOPLE HERE HAVE USED MACHINE LEARNING?

HOW MANY PEOPLE HERE HAVE USED PYTHON?

DEEP LEARNING

WHAT IS MACHINE LEARNING?
Self-driving cars
Voice recognition
Facial recognition

HISTORY OF DEEP LEARNING

HISTORY OF MACHINE LEARNING

Input     Features   Algorithm   Output
Machine   Human      Human       Machine
Machine   Human      Machine     Machine
Machine   Machine    Machine     Machine

FEATURE EXTRACTION
Traditionally, data scientists had to define the features.
Deep learning systems are able to extract features themselves.

DEEP LEARNING MILESTONES

Years   Theme
1980s   Backpropagation popularized, allowing multi-layer neural networks to be trained
2000s   SVMs, random forests, and other classifiers overtook NNs
2010s   Deep learning reignited interest in NNs

IMAGENET
AlexNet, submitted to the ImageNet ILSVRC challenge in 2012, is partly responsible for the renaissance.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton used deep learning techniques.
They combined this with GPU training and other techniques, such as ReLU activations and dropout.
The result was a neural network that could classify images into 1,000 object categories.
It had a top-5 error rate of about 16% (15.3%), compared to about 26% for the runner-up.

Ilya Sutskever, Alex Krizhevsky, Geoffrey Hinton

INDEED.COM/SALARY

MACHINE LEARNING

MACHINE LEARNING AND DEEP LEARNING

Deep learning fits inside machine learning.
Deep learning is a machine learning technique.
The two share techniques for evaluating and optimizing models.

WHAT IS MACHINE LEARNING?
Inputs: vectors, or points in a high-dimensional space
Outputs: either binary vectors or continuous vectors
Machine learning finds the relationship between them.
It uses statistical techniques.

SUPERVISED VS UNSUPERVISED
Supervised: data needs to be labeled
Unsupervised: data does not need to be labeled

TECHNIQUES
Classification
Regression
Clustering
Recommendations
Anomaly detection

CLASSIFICATION EXAMPLE: EMAIL SPAM DETECTION

Start with a large collection of emails, labeled spam/not-spam.
Convert the email text into vectors of 0s and 1s: 1 if a word occurs, 0 if it does not.
These vectors are called inputs or features.
Split the data set into a training set (70%) and a test set (30%).
Use an algorithm like Random Forest to build a model.
Evaluate the model by running it on the test set and capturing its success rate.
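The featurization and split steps above can be sketched in plain Python. This is a minimal sketch that assumes whitespace tokenization and a tiny hypothetical labeled data set. The helper names (`build_vocabulary`, `to_binary_vector`, `split_train_test`) are illustrative and do not come from the talk. Fitting the Random Forest itself would typically be done with a library, for example scikit-learn's `RandomForestClassifier`, and is not shown.

```python
import random

def build_vocabulary(emails):
    """Collect every distinct lowercase word across all emails, in sorted order."""
    return sorted({word for text, _ in emails for word in text.lower().split()})

def to_binary_vector(text, vocabulary):
    """Feature vector: 1 if the vocabulary word occurs in the email, 0 if it does not."""
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocabulary]

def split_train_test(rows, train_percent=70, seed=0):
    """Shuffle a copy of the rows and split them into a training set and a test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = len(rows) * train_percent // 100
    return rows[:cut], rows[cut:]

# Hypothetical toy data: 1 = spam, 0 = not spam.
emails = [
    ("win cash now", 1),
    ("cheap pills win big", 1),
    ("claim your free cash prize", 1),
    ("win a free prize now", 1),
    ("your cash prize awaits", 1),
    ("meeting moved to friday", 0),
    ("lunch on friday", 0),
    ("notes from the meeting", 0),
    ("free lunch friday", 0),
    ("project update attached", 0),
]

vocabulary = build_vocabulary(emails)
dataset = [(to_binary_vector(text, vocabulary), label) for text, label in emails]
train, test = split_train_test(dataset)
print(len(vocabulary), len(train), len(test))
```

The fixed seed makes the shuffle reproducible, so repeated runs evaluate the model on the same 30% test set.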

CLASSIFICATION ALGORITHMS
Neural Networks
Random Forest
Support Vector Machines (SVM)
Decision Trees
Logistic Regression
Naive Bayes

CHOOSING AN ALGORITHM
Evaluate different models on the data.
Look at their relative success rates.
Use rules of thumb: some algorithms work better on some kinds of data.

CLASSIFICATION EXAMPLES
Is this tumor benign or cancerous?
Is this lead profitable or not?
Who will win the presidential elections?

CLASSIFICATION: POP QUIZ
Is classification supervised or unsupervised learning?

Supervised because you have to label the data.

CLUSTERING EXAMPLE: LOCATE CELL PHONE TOWERS

Start with the GPS coordinates of all cell phone users.
Represent the data as vectors.
Locate towers in the biggest clusters.
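Placing towers at cluster centers is what k-means clustering does. The talk does not name a clustering algorithm, so k-means is an assumption here. This minimal sketch treats coordinates as points on a flat plane (real GPS data would need a geographic distance) and uses hypothetical user positions.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign points to the nearest center, then move centers to their cluster means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:                      # keep the old center if a cluster is empty
                centers[i] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers

# Hypothetical user positions forming two obvious groups.
users = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
towers = kmeans(users, k=2)
print(sorted(towers))   # one tower near (0.1, 0.1), one near (5.03, 5.0)
```

Choosing `k` (how many towers) is up to the modeler. Running with several values and comparing cluster tightness is a common approach.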

CLUSTERING EXAMPLE: T-SHIRTS
What size should a t-shirt be?
Everyone’s real t-shirt size is different.
Lay out all the sizes and cluster them.
Target the large clusters with XS, S, M, L, XL.

CLUSTERING: POP QUIZ
Is clustering supervised or unsupervised?

Unsupervised because no labeling is required

RECOMMENDATIONS EXAMPLE: AMAZON

The model looks at user ratings of books.
Viewing a book triggers an implicit rating.
Recommend new books to the user.

RECOMMENDATION: POP QUIZ
Are recommendation systems supervised or unsupervised?

Unsupervised

REGRESSION
Like classification, but the output is continuous instead of one of k choices.
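Here is a concrete instance of a continuous output: ordinary least squares with a single input. It is one of many regression techniques, chosen only because it fits in a few lines. The data (units sold per month) is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: month number -> units sold.
months = [1, 2, 3, 4]
units = [3, 5, 7, 9]
slope, intercept = fit_line(months, units)
forecast = slope * 5 + intercept    # a continuous prediction for month 5
print(slope, intercept, forecast)   # 2.0 1.0 11.0
```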

REGRESSION EXAMPLES
How many units of a product will sell next month?
What will a student score on the SAT?
What is the market price of this house?
How long before this engine needs repair?

REGRESSION EXAMPLE: AIRCRAFT PART FAILURE

Cessna collects data from airplane sensors.
Predict when a part needs to be replaced.
Ship the part to the customer’s service airport.

REGRESSION: QUIZ
Is regression supervised or unsupervised?

Supervised

ANOMALY DETECTION EXAMPLE: CREDIT CARD FRAUD

Train the model on good transactions.
Anomalous activity indicates fraud.
Can pass the transaction to a human for investigation.
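This is a minimal sketch of training only on good transactions. It learns the mean and standard deviation of the amounts, then flags anything more than three standard deviations away. The single feature, the threshold of 3, and the amounts are all assumptions for illustration; real fraud models use many more signals.

```python
import statistics

def fit_normal_profile(amounts):
    """'Train' on good transactions only: summarize them by mean and standard deviation."""
    return statistics.mean(amounts), statistics.stdev(amounts)

def is_anomalous(amount, profile, threshold=3.0):
    """Flag amounts whose z-score (distance from the mean in standard deviations) is too large."""
    mean, stdev = profile
    return abs(amount - mean) / stdev > threshold

good_transactions = [20, 25, 22, 30, 27, 24, 26, 23]   # hypothetical amounts in dollars
profile = fit_normal_profile(good_transactions)

for amount in (28, 500):
    if is_anomalous(amount, profile):
        print(amount, "-> pass to a human for investigation")
    else:
        print(amount, "-> looks normal")
```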

ANOMALY DETECTION EXAMPLE: NETWORK INTRUSION

Train the model on network login activity.
Anomalous activity indicates a threat.
Can initiate alerts and lockdown procedures.

ANOMALY DETECTION: QUIZ
Is anomaly detection supervised or unsupervised?

Unsupervised because we only train on normal data

FEATURE EXTRACTION
Converting data to feature vectors
Natural Language Processing
Principal Component Analysis
Auto-Encoders

FEATURE EXTRACTION: QUIZ
Is feature extraction supervised or unsupervised?

Unsupervised

MACHINE LEARNING WORKFLOW

DEEP LEARNING USED FOR
Feature Extraction
Classification
Regression

DEEP LEARNING FRAMEWORKS

DEEP LEARNING FRAMEWORKS
TensorFlow: NN library from Google
Theano: low-level GPU-enabled tensor library
Torch7: NN library, uses Lua for binding, used by Facebook and Google
Caffe: NN library from the Berkeley Vision and Learning Center (BVLC)
Nervana: fast GPU-based machines optimized for deep learning

DEEP LEARNING FRAMEWORKS
Keras, Lasagne, Blocks: NN libraries that make Theano easier to use
CUDA: programming model for using GPUs in general-purpose programming
cuDNN: NN library by Nvidia based on CUDA, can be used with Torch7 and Caffe
Chainer: NN library that uses CUDA

DEEP LEARNING PROGRAMMING LANGUAGES

All the frameworks support Python, except Torch7, which uses Lua as its binding language.

TENSORFLOW
TensorFlow was originally developed by the Google Brain team.
Allows using GPUs for deep learning algorithms.
Single-processor version released in 2015.
Multiple-processor version released in March 2016.

KERAS
Supports Theano and TensorFlow as back-ends.
Provides a deep learning API on top of TensorFlow.
TensorFlow provides the low-level matrix operations.

TENSORFLOW: GEOFFREY HINTON, JEFF DEAN

KERAS: FRANCOIS CHOLLET

NEURAL NETWORKS

WHAT IS A NEURON?

Receives signals on its synapses.
When triggered, sends a signal along its axon.

MATHEMATICAL NEURON

A mathematical abstraction, inspired by the biological neuron.
Either on or off, based on the sum of its inputs.

MATHEMATICAL FUNCTION

A neuron is a mathematical function.
It adds up its (weighted) inputs and applies a sigmoid (or another function).
This determines whether it fires or not.
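The neuron described above is a weighted sum plus a bias, passed through a sigmoid. This is a minimal sketch; the weights and bias in the example are made-up values.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical neuron with two inputs.
output = neuron([1.0, 0.0], weights=[2.0, -3.0], bias=-1.0)   # z = 2*1 + (-3)*0 - 1 = 1
print(round(output, 3))  # 0.731, above 0.5, so this neuron is "on"
```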

WHAT ARE NEURAL NETWORKS?
A biologically inspired machine learning algorithm
Mathematical neurons arranged in layers
Neurons accumulate signals from the previous layer
They fire when the signal reaches a threshold

NEURAL NETWORKS

NEURON INCOMING
Each neuron receives signals from neurons in the previous layer.
Each signal is affected by a weight: some are more important than others.
The bias is the base signal that the neuron receives.

NEURON OUTGOING
Each neuron sends its signal to the neurons in the next layer.
These signals are also affected by weights.

LAYERED NETWORK

Each layer looks at features identified by the previous layer.

US ELECTIONS

ELECTIONS
Consider the elections.
This is a gated system, a way to aggregate different views.

HIGHEST LEVEL: STATES

NEXT LEVEL: COUNTIES

ELECTIONS
Is this a neural network?
How many layers does it have?

NEURON LAYERS
The nomination is the last layer, layer N.
States are layer N-1.
Counties are layer N-2.
Districts are layer N-3.
Individuals are layer N-4.
Individual brains have even more layers.

GRADIENT DESCENT

TRAINING: HOW DO WE IMPROVE?

Calculate the error relative to the desired goal.
Increase the weights of neurons that voted right.
Decrease the weights of neurons that voted wrong.
This will reduce the error.

GRADIENT DESCENT
This algorithm is called gradient descent.
Think of the error as a function of the weights, and move the weights downhill on that function.

FEED FORWARD
Also called forward propagation or forward prop.
Initialize the inputs.
Calculate the activation of each layer.
Calculate the activation of the output layer.
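Feed forward can be sketched as repeatedly applying one fully connected sigmoid layer, feeding each layer's activations to the next. The layer sizes (2 inputs, 3 hidden neurons, 1 output) and all weights are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One fully connected layer: weights[j] holds neuron j's incoming weights."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def feed_forward(inputs, layers):
    """Start from the inputs, then compute each layer's activation from the previous one."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation   # activation of the output layer

# Hypothetical 2-3-1 network given as (weights, biases) per layer.
layers = [
    ([[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),   # hidden layer: 3 neurons
    ([[1.0, -1.0, 0.5]], [0.2]),                                  # output layer: 1 neuron
]
output = feed_forward([1.0, 2.0], layers)
print(output)
```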

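BACK PROPAGATION
Use forward prop to calculate the error.
The error is a function of all the network weights.
Use the chain rule to compute the error's gradient with respect to each weight, from the output layer back to the input layer.
Adjust the weights using gradient descent.
Repeat with the next record.
Keep going over the training set until convergence.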
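Here is a compact sketch of this loop: a sigmoid network with 2 inputs, 3 hidden neurons, and 1 output, trained on the XOR problem one record at a time. XOR, the squared-error loss, the learning rate, and the epoch count are assumptions chosen for illustration. The backward pass applies the chain rule to get each weight's gradient.

```python
import math
import random

def sigmoid(z):
    # Split by sign so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]  # XOR

def train(hidden=3, epochs=5000, lr=0.5, seed=1):
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]  # input -> hidden
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]                      # hidden -> output
    b2 = 0.0

    def forward(x):
        h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(hidden)]
        y = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
        return h, y

    def mean_squared_error():
        return sum((forward(x)[1] - t) ** 2 for x, t in DATA) / len(DATA)

    initial_error = mean_squared_error()
    for _ in range(epochs):                  # keep going over the training set
        for x, t in DATA:                    # one record at a time
            h, y = forward(x)                # forward prop
            # Backward pass: chain rule on E = 0.5 * (y - t)**2.
            delta_out = (y - t) * y * (1 - y)
            delta_hidden = [delta_out * w2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            # Gradient descent step on every weight and bias.
            for j in range(hidden):
                w2[j] -= lr * delta_out * h[j]
                w1[j][0] -= lr * delta_hidden[j] * x[0]
                w1[j][1] -= lr * delta_hidden[j] * x[1]
                b1[j] -= lr * delta_hidden[j]
            b2 -= lr * delta_out
    return initial_error, mean_squared_error(), forward

initial_error, final_error, predict = train()
print(f"error before: {initial_error:.4f}, after: {final_error:.4f}")
```

Small networks like this can get stuck in a poor local minimum, depending on the random initialization. If that happens, a different seed or more hidden neurons often helps.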

HOW DO YOU FIND THE MINIMUM IN AN N-DIMENSIONAL SPACE?

Take a step in the direction of steepest descent.
That direction is the negative of the gradient: the vector of partial derivatives of the error with respect to each dimension.
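In symbols, each step is w ← w − η·∇E(w), where η is the learning rate (step size). This minimal sketch uses a hypothetical two-weight error surface E(w) = (w₀ − 3)² + (w₁ + 1)², whose minimum is at (3, −1). The learning rate and step count are arbitrary choices.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad E(w)."""
    w = list(start)
    for _ in range(steps):
        g = gradient(w)
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w

def error(w):
    # Hypothetical error surface with its minimum at w = (3, -1).
    return (w[0] - 3) ** 2 + (w[1] + 1) ** 2

def error_gradient(w):
    # Partial derivatives of `error` with respect to w[0] and w[1].
    return [2 * (w[0] - 3), 2 * (w[1] + 1)]

w = gradient_descent(error_gradient, start=[0.0, 0.0])
print([round(wi, 6) for wi in w])   # [3.0, -1.0]
```

Too large a learning rate can overshoot the minimum and diverge. Too small a rate makes convergence slow.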

PUTTING ALL THIS TOGETHER
Use forward prop to activate.
Use back prop to train.
Then use forward prop to test.

TYPES OF NEURONS

SIGMOID

TANH

RELU

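BENEFITS OF RELU
Popular
Accelerates convergence: Krizhevsky et al. report a ReLU network reaching 25% training error on CIFAR-10 about 6x faster than the same network with tanh
The operation is faster since it is piecewise linear, not exponential
Can die by getting stuck at zero

Pro: sparse matrix
Con: network can die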

LEAKY RELU
Pro: does not die
Con: matrix is not sparse
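The four activation functions above are sketched below. The leaky ReLU slope of 0.01 is a common choice, but it is an assumption here. ReLU outputs exactly 0, with zero gradient, for every negative input. That is where the sparsity comes from, and also why a neuron whose inputs stay negative can "die": it stops receiving any learning signal.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))   # range (0, 1)

def tanh(z):
    return math.tanh(z)                 # range (-1, 1)

def relu(z):
    return max(0.0, z)                  # 0 for negatives, identity for positives

def relu_gradient(z):
    return 1.0 if z > 0 else 0.0        # zero gradient for negatives: no learning signal

def leaky_relu(z, alpha=0.01):
    return z if z > 0 else alpha * z    # small negative slope keeps the gradient alive

inputs = [-2.0, -0.5, 0.0, 1.5]
print([relu(z) for z in inputs])        # [0.0, 0.0, 0.0, 1.5]: sparse
print([leaky_relu(z) for z in inputs])  # small negatives instead of zeros: not sparse
```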

SOFTMAX
Final layer of a network used for classification.
Turns the output into a probability distribution.
Normalizes the outputs of the neurons to sum to 1.
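Softmax exponentiates each output score and divides by the total, so the results are positive and sum to 1. In this minimal sketch, the largest score is subtracted first. That is a standard trick to avoid overflow and does not change the result. The scores are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw output scores into a probability distribution."""
    largest = max(scores)                           # shift for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])     # hypothetical scores for three classes
print([round(p, 3) for p in probs])  # largest score -> largest probability
print(sum(probs))                    # 1.0, up to floating-point rounding
```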

HYPERPARAMETER TUNING

PROBLEM: OIL EXPLORATION
Drilling holes is expensive.
We want to find the biggest oilfield without wasting money on duds.
Where should we plant our next oilfield derrick?

PROBLEM: NEURAL NETWORKS
Testing hyperparameters is expensive.
We have an N-dimensional grid of parameters.
How can we quickly zero in on the best combination of hyperparameters?

HYPERPARAMETER EXAMPLE
How many layers should we have?
How many neurons should we have in the hidden layers?
Should we use sigmoid, tanh, or ReLU?
Should we initialize

ALGORITHMS
Grid
Random
Bayesian Optimization

GRID
Systematically search the entire grid.
Remember the best found so far.
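Grid search fits in a few lines. The `objective` function below is a hypothetical stand-in for "train the network with these hyperparameters and return its validation score", which in practice is the expensive part.

```python
import itertools

def grid_search(objective, grid):
    """Evaluate every combination in the grid; remember the best found so far."""
    names = list(grid)
    best_params, best_score, evaluations = None, float("-inf"), 0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)
        evaluations += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, evaluations

grid = {
    "layers": [1, 2, 3],
    "hidden_units": [16, 32, 64],
    "activation": ["sigmoid", "tanh", "relu"],
}

def objective(params):
    # Hypothetical score; a real one would train and validate a network.
    score = -abs(params["layers"] - 2) - abs(params["hidden_units"] - 32) / 32
    return score + (0.5 if params["activation"] == "relu" else 0.0)

best, score, evaluations = grid_search(objective, grid)
print(best, score, evaluations)  # {'layers': 2, 'hidden_units': 32, 'activation': 'relu'} 0.5 27
```

The cost grows multiplicatively: each added hyperparameter multiplies the number of evaluations by the number of values it can take.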

RANDOM
Randomly search the grid.
Remember the best found so far.
Bergstra and Bengio’s result and Alice Zheng’s explanation (see References):
60 random samples give you at least one result within the top 5% of the grid with 95% probability.
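Random search replaces the exhaustive loop with a fixed budget of random draws. Here is the arithmetic behind the 60-sample rule. If each draw independently has a 5% chance of landing in the top 5%, then the chance that at least one of n draws does is 1 − 0.95ⁿ. That chance first exceeds 95% at n = 59, and 60 is the usual rounded figure. The search space and objective are hypothetical stand-ins, and sampling with replacement is assumed.

```python
import random

def random_search(objective, grid, samples=60, seed=0):
    """Evaluate `samples` random combinations; remember the best found so far."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(samples):
        params = {name: rng.choice(values) for name, values in grid.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def chance_of_top_hit(samples, top_fraction=0.05):
    """Probability that at least one independent random sample lands in the top fraction."""
    return 1 - (1 - top_fraction) ** samples

# Hypothetical search space and score (a real objective would train a network).
grid = {"learning_rate": [0.001, 0.003, 0.01, 0.03, 0.1], "hidden_units": [16, 32, 64, 128]}

def objective(params):
    return -abs(params["learning_rate"] - 0.01) - abs(params["hidden_units"] - 64) / 1000

best, score = random_search(objective, grid)
print(best, round(score, 4))
print(round(chance_of_top_hit(60), 3))  # 0.954
```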

BAYESIAN OPTIMIZATION
Balance between explore and exploit.
Exploit: test spots within the explored perimeter.
Explore: test new spots in unexplored, high-uncertainty locations.
Balance the trade-off.

SIGOPT
YC-backed SF startup
Founded by Scott Clark
Raised $2M
Sells a cloud-based proprietary variant of Bayesian Optimization

BAYESIAN OPTIMIZATION PRIMER
Bayesian Optimization Primer by Ian Dewancker, Michael McCourt, Scott Clark
See References

OPEN SOURCE VARIANTS
Open source alternatives:

Spearmint
Hyperopt
SMAC
MOE

PRODUCTION

DEPLOYING
Phases: training, deployment
The training phase runs on back-end servers.
Optimize hyperparameters on the back-end.
Deploy the model to front-end servers, browsers, and devices.
The front-end only uses forward prop and is fast.

SERIALIZING/DESERIALIZING MODEL

Back-end: serialize model + weights
Front-end: deserialize model + weights

HDF5
Keras serializes the model architecture to JSON.
Keras serializes the weights to HDF5.
HDF5 is a file format and data model for hierarchical data.
APIs for C++, Python, Java, etc.
https://www.hdfgroup.org

DEPLOYMENT EXAMPLE: CANCER DETECTION

Rhobota.com’s cancer-detecting iPhone app
Developed by Bryan Shaw after his son’s illness
Model built on the back-end, deployed on the iPhone
The iPhone detects retinal cancer

DEEP LEARNING

WHAT IS DEEP LEARNING?
Deep learning is a learning method that can train a system with more than 2 or 3 non-linear hidden layers.

WHAT IS DEEP LEARNING?
Machine learning techniques that enable unsupervised feature learning and pattern analysis/classification.
The essence of deep learning is to compute representations of the data.
Higher-level features are defined from lower-level ones.

HOW IS DEEP LEARNING DIFFERENT FROM REGULAR NEURAL NETWORKS?
Training neural networks requires applying gradient descent on millions of dimensions.
This is intractable for large networks.
Deep learning places constraints on neural networks.
This allows them to be solved iteratively.
The constraints are generic.

AUTO-ENCODERS

WHAT ARE AUTO-ENCODERS?
An auto-encoder is a learning algorithm.
It applies backpropagation and sets the target values to be equal to its inputs.
In other words, it trains itself to do the identity transformation.

WHY DOES IT DO THIS?
The auto-encoder places constraints on itself.
E.g., it restricts the number of hidden neurons.
This allows it to find a good representation of the data.
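Here is a minimal sketch of an auto-encoder under a hidden-layer constraint. It is a linear network with 2 inputs, 1 hidden unit, and 2 outputs, trained by gradient descent so that its output matches its own input. Linearity, the single hidden unit, the learning rate, and the data are assumptions for illustration. The hypothetical points all lie on the line y = 2x, so one hidden number is enough to describe each of them.

```python
import random

POINTS = [(0.1, 0.2), (0.3, 0.6), (-0.2, -0.4), (0.5, 1.0), (-0.4, -0.8)]  # all on y = 2x

def train_autoencoder(points, epochs=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    enc = [rng.uniform(-1, 1), rng.uniform(-1, 1)]   # 2 inputs -> 1 hidden unit
    dec = [rng.uniform(-1, 1), rng.uniform(-1, 1)]   # 1 hidden unit -> 2 outputs

    def reconstruct(p):
        code = enc[0] * p[0] + enc[1] * p[1]         # the one-number bottleneck
        return code, (dec[0] * code, dec[1] * code)

    def reconstruction_error():
        return sum((out - x) ** 2
                   for p in points
                   for out, x in zip(reconstruct(p)[1], p)) / len(points)

    initial = reconstruction_error()
    for _ in range(epochs):
        for p in points:
            code, out = reconstruct(p)
            diff = [out[0] - p[0], out[1] - p[1]]    # the target is the input itself
            grad_code = diff[0] * dec[0] + diff[1] * dec[1]
            for i in range(2):                       # gradient descent on 0.5 * |out - p|^2
                dec[i] -= lr * diff[i] * code
                enc[i] -= lr * grad_code * p[i]
    return initial, reconstruction_error(), reconstruct

initial, final, reconstruct = train_autoencoder(POINTS)
print(f"reconstruction error before: {initial:.4f}, after: {final:.6f}")
```

With only one hidden unit, the network cannot simply copy its two inputs through. It has to discover the single direction the data varies along, which is the "good representation" this slide refers to.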

IS THE AUTO-ENCODER SUPERVISED OR UNSUPERVISED?

It is unsupervised.
The data is unlabeled.

WHAT ARE CONVOLUTIONAL NEURAL NETWORKS?

Feedforward neural networks
Connection pattern inspired by the visual cortex

CONVOLUTIONAL NEURAL NETWORKS

CNNS
The convolutional layer’s parameters are a set of learnable filters.
Every filter is small along width and height.
During the forward pass, each filter slides across the width and height of the input, producing a 2-dimensional activation map.
As we slide across the input, we compute the dot product between the filter and the input.

CNNS
Intuitively, the network learns filters that activate when they see a specific type of feature anywhere in the input.
In this way it creates translation invariance.

CONVNET EXAMPLE

Zero-padding: the boundaries are padded with 0s
Stride: how far the filter moves at each step of the convolution
Parameter sharing: each filter uses the same parameters at every position in the input
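This minimal sketch slides one filter over a single-channel image with zero-padding and a stride, computing a dot product at each position. It assumes square inputs and filters. Like most deep learning libraries, it computes cross-correlation, with no kernel flip. The output width follows the standard formula (W − F + 2P) / S + 1, for input width W, filter width F, padding P, and stride S. The image and filter values are hypothetical.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide one square filter over a square image; each output cell is a dot product."""
    f = len(kernel)
    size = len(image) + 2 * padding
    # Zero-padding: copy the image into the middle of a larger grid of zeros.
    padded = [[0] * size for _ in range(size)]
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            padded[r + padding][c + padding] = value
    out_size = (size - f) // stride + 1      # (W - F + 2P) / S + 1
    # Parameter sharing: the same kernel weights are used at every position.
    return [
        [
            sum(kernel[i][j] * padded[r * stride + i][c * stride + j]
                for i in range(f) for j in range(f))
            for c in range(out_size)
        ]
        for r in range(out_size)
    ]

image = [[1] * 5 for _ in range(5)]    # hypothetical 5x5 image of ones
kernel = [[1] * 3 for _ in range(3)]   # hypothetical 3x3 filter of ones
print(conv2d(image, kernel, stride=1, padding=1))  # 5x5: corners 4, edges 6, interior 9
print(conv2d(image, kernel, stride=2, padding=1))  # 3x3: [[4, 6, 4], [6, 9, 6], [4, 6, 4]]
```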

CONVNET EXAMPLE
From http://cs231n.github.io/convolutional-networks/

WHAT IS A POOLING LAYER?
The pooling layer further reduces the resolution of the image.
Max pooling tiles the output area with a 2x2 mask and takes the maximum activation value of each area.
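Here is 2x2 max pooling as a minimal sketch over a plain list-of-lists feature map. It assumes non-overlapping windows (stride 2) and simply drops an odd trailing row or column. The feature map values are hypothetical.

```python
def max_pool_2x2(feature_map):
    """Tile the map with non-overlapping 2x2 windows and keep each window's maximum."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
print(max_pool_2x2(feature_map))  # [[4, 5], [6, 7]]: half the width and height
```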

REVIEW
keras/examples/mnist_cnn.py

Recognizes hand-written digits by combining different layers

RECURRENT NEURAL NETWORKS

RNNS
RNNs capture patterns in time series data.
They are constrained by weights shared across time steps.
Each copy of the neuron observes a different time step.
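This minimal sketch shows the weight sharing. A recurrent neuron with a scalar hidden state applies the same three parameters (input weight, recurrent weight, bias) at every time step, carrying its state forward. The tanh activation and the parameter values are assumptions. In the example, the effect of a single early input fades over the later steps.

```python
import math

def rnn_forward(sequence, w_input, w_hidden, bias):
    """Run a one-unit RNN over a sequence, reusing the same weights at every time step."""
    h = 0.0                      # hidden state before the first step
    states = []
    for x in sequence:
        h = math.tanh(w_input * x + w_hidden * h + bias)
        states.append(h)
    return states

# Hypothetical parameters: a single pulse at t = 0, then silence.
states = rnn_forward([1.0, 0.0, 0.0, 0.0], w_input=1.0, w_hidden=0.5, bias=0.0)
print([round(h, 3) for h in states])  # the pulse's influence fades step by step
```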

LSTMS
Long Short-Term Memory networks
Plain RNNs have trouble with long time lags between events.
LSTMs can pick up patterns separated by big lags.
Used for speech recognition.

RNN EFFECTIVENESS
Andrej Karpathy uses LSTMs to generate text.
Generates Shakespeare, Linux kernel code, mathematical proofs.
See http://karpathy.github.io/

RNN INTERNALS

LSTM INTERNALS

CONCLUSION

REFERENCES
Bayesian Optimization Primer by Dewancker et al. (http://sigopt.com)
Random Search for Hyper-Parameter Optimization by Bergstra and Bengio (http://jmlr.org)
Evaluating Machine Learning Models by Alice Zheng (http://www.oreilly.com)

REFERENCES
Dropout by Hinton et al. (http://cs.utoronto.edu)
Understanding LSTM Networks by Chris Olah (http://github.io)
Multi-scale Deep Learning for Gesture Detection and Localization by Neverova et al. (http://uoguelph.ca)
Unreasonable Effectiveness of RNNs by Karpathy (http://karpathy.github.io)

QUESTIONS