representation learning and deep neural networks -...

IntroductionDeep Learning

Applications

Representation Learning and Deep NeuralNetworks

Davide Bacciu

Dipartimento di InformaticaUniversità di [email protected]

Applied Brain Science - Computational Neuroscience (CNS)


Applications

Representation LearningDeep RepresentationsBio-Inspired Foundations

Learning Representations in the Brain

Sensory information is represented by neural activityResponse selectivity of individual neuronsDistribution of activation in neural populationSomething we have seen so far

Neural representation is hierarchicalBrain cortex processes information incrementallyIncreasing levels of abstraction

Representation Learning

How can we obtain articulated hierarchical representations ofinformation in computational models?


Applications


Representation Learning - A Computational View

Learning the unknown causes underlying dataCauses explain the data we observeAre also the basic building blocks which, when combined,generate the data we observe

Something we have already seen with RBM

Try explaining input data v usingthe unknown causes hLearning is

Generative as causes canreconstruct the dataUnsupervised as interest is inrepresenting information ratherthan, e.g., predicting a class


Applications


Representation Learning - A Classical View

Representation learning as density estimation: learn aprobability distribution for the data v that uses latent variables h

Learning of a Gaussian Mixture ModelData likelihood P(v|h)

Posterior P(h|v)

Several models in this classical view: PCA, ICA, factor analysis,clustering, ...


Applications


Representation Learning - A Modern View

Learn representations (a.k.a. causes or features) that areHierarchical: representations at increasing levels ofabstractionDistributed: information is encoded by a multiplicity ofcausesShared among tasksSparse: enforcing neural selectivityCharacterized by simple dependencies: a simple (linear)combination of the representations should be sufficient togenerate data

Deep models follow this modern view of representation learning


Applications


What is Deep Learning?

Machine learning algorithms inspired by brain organization,based on learning multiple levels of representation andabstraction

Learning models with many layers trained layer-wiseBuild a hierarchical feature space through layeringReduce the need of supervised information

Unsupervised discovery of features in the internal layersFinal layer performs supervised step


Applications


Learning Features

The traditional shallow way

The deep way


Applications


Why Deep Learning?


Applications


No, Seriously.. Why Deep Learning?

Deep learning is THE hot topic nowRevolutionized performance in

Speech recognitionMachine vision

Now expanding to other topicsNatural languageReinforcement learningRobotics

Its origins and inspiration can be traced back to brain/cortexorganization


Applications


Hierarchical RepresentationHMAX Model

Explaining the structure of the visual cortex in mammals

M. Riesenhuber, T. Poggio, Hierarchical Models of Object Recognition in Cortex. Nature Neuroscience 2:1019-1025, 1999


Applications


Sparse RepresentationNeuroscience

Sparse coding theory in brain: sensory information in the brainis represented by a relatively small number of simultaneouslyactive neurons out of a large population (B.A. Olshausen, D.J.Field, 1996)


Applications


Sparse RepresentationMachine Learning

The machine learning perspective: a single layer networklearns better to generate a target output if the input has asparse representation (Willshaw and Dayan, 1990)


Applications

IntroductionDeep Learning ArchitecturesPractical Things to Know

Deep Networks

Deep architecture with multiple layers focused on learningsparse encoding of input dataUnsupervised training between layers to decompose theproblem into distributed sub problems with increasinglevels of abstractionDeep networks type

Deep Generative ModelsStacked Neural AutoencodersDeep convolutional neural networksRecurrent neural networksHybrids of the above


Applications


Training Deep Networks

Problem with conventional backpropagation trainingStrongly relies on labeled training dataLearning does not scale well to multiple hidden layers

Greedy layer-wise trainingUnsupervised (internal) layer by layer representationlearning (pre-training)Supervised training of last layer (read-out) plus optionalfine-tuning

Key advantagesGive full learning focus to each layerExploit unlabeled data using supervised training only forfine tuning


Applications


Deep Generative Models

Intuition - Train a network where hidden units h organized into ahierarchy extract a good representation data from visible units x

Architecture - A networkof stacked RestrictedBoltzmann Machines(RBM) plus a supervisedread-out layer for makingpredictions


Applications


Deep Belief Networks (DBN)

Pretraining of the independent RBM byContrastive-Divergence, which are then unrolled to form thedeep architecture that is fine-tuned by error backpropagation


Applications


Deep (Restricted) Boltzmann Machines

DBM are directed generative models⇒ To train a Deep RBMwe need a pre-training trick

Sampling h1 by averagingbottom-up and top-downcontribution

h1j ∼ σ(∑

i

Mijxi +∑

m

Mjmh2m)

For multiple layers

Apply modified training to firstand last layer

Halve the weights of innerlayers


Applications


Neural Autoencoder Networks

Non-accretive associative memories for feature discovery anddata compression

E.g. linear hidden units with MSE perform PCA (guess why?)


Applications


Sparse Autoencoders

Autoencoders using more features with a significant numberbeing 0 when encoding an input

Using regularization approaches to enforce sparsity


Applications


Stacked Neural Autoencoders

Stack autoencoders with level-wise training as with DBM

Train each autoencoder in isolation and drop the decoding layerwhen training is completed


Applications


Convolutional Neural Networks (CNNs) - LeNet

Very much inspired by the visual processing pipeline in thebrain

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition,Proceedings of the IEEE 86(11): 2278-2324, 1998


Applications


Strategies to Control Model Complexity(Regularization)

Weight SharingExploit symmetry orstationarity assumption toreduce the number ofparametersConvolutional NN,recurrent and recursive NN

Dropout

Randomly disconnecthidden neurons duringtrainingNeed to scale hiddenweights at test time


Applications


Stochastic Gradient Descent

Mini-batchesCompute stochastic gradient onsmall batches of data rather thanon single sampleStabler and quicker learning andfacilitates GPU computing as abyproduct

Momentum Method

∆w(t+1) = α∆w(t)−ε∂E∂w

(t+1)Changes the velocity of theweight particle instead ofthe positionMay quicken convergencedue to avoiding oscillations


Applications

Deep Learning ApplicationsConclusions

Document Encoding - Deep Belief Network

Finding sparse features for > 800K newswire stories

DBN document encoding Latent semantic analysisHinton, Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, 2006


Applications


DeepFace: a.k.a. beating humans at recognizingfaces

DeepFace recognition accuracy is ≈ 97.35%

23% better than previous resultsCNN with more than 120 million parameters

Taigman et al, Deepface: Closing the gap to human-level performance in faceverification, CVPR 2014


Applications


Learning to Play Atari (I)

Integrating CNN with Reinforcement Learning

Mnih et al, Human-level control through deep reinforcement learning, Nature 2015


Applications


Learning to Play Atari (II)


Applications


Deep Learning and Robotics

Learning where to grasp objects with robotic manipulatorsDeep auto-encoder networkPredicts which grasping box works better

Lenz et al, Deep Learning for Detecting Robotic Grasps, IJRR 2014


Applications


Cheap DL Robots


Applications


Deep Learning API

Caffe - DL framework for computer visionhttp://demo.caffe.berkeleyvision.org/

Theano - Python library with GPU supporthttp://deeplearning.net/software/theano/

Torch - Scientific computing environmenthttp://torch.ch/

MatConvNet - CNN in Matlab with GPU supporthttp://www.vlfeat.org/matconvnet/

CNTK - Microsoft NN and DL toolkithttp://www.cntk.ai/

Tensorflow - Google NN and DL libraryhttps://www.tensorflow.org/

Make sure you have a good GPU!

http://demo.caffe.berkeleyvision.org/

http://deeplearning.net/software/theano/

http://torch.ch/

http://www.vlfeat.org/matconvnet/

https://www.tensorflow.org/


Applications


Want to try something easy out?

Google Tensorflow Playgroundhttp://playground.tensorflow.org

Language learning and generationhttp://karpathy.github.io/2015/05/21/rnn-effectiveness/

Caffe image classification demohttp://demo.caffe.berkeleyvision.org/

Nvidia Digits deep learning GUIhttps://developer.nvidia.com/digits

http://playground.tensorflow.org

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

http://demo.caffe.berkeleyvision.org/

https://developer.nvidia.com/digits


Applications


Where is DL heading?

Image and video understanding by integration with naturallanguage descriptions

Bringing in biological models of attentionNatural language understanding and generation

Emotion understandingMoving from sequential representation of text to parse trees

Will soon impact heavily in roboticsSensorimotor learningPolicy learning: deep reinforcement learningCloud robotics: deep learned knowledge available toconnected devicesOnboard GPU computingSelf-driving cars, drones, connected devices


Applications


Take Home Messages

Deep learning is about learning features of complex dataFeatures ≡ hidden stochastic neuronsStrongly rooted in hierarchical information processing in thebrain

Stacking and level-wise trainingLet the deep network discover the features and then placeyour preferred learning model to perform your task

Breakthrough performance in several learningtasks/application areas

Complex spatio/temporal structured data

Do we really know what is going on with the encoding?How many layers?

representation learning and deep neural networks -...

Documents