deep learning presentation


Deep Learning

Baptiste [email protected]

September 12, 2014


Table of Contents

1 Deep Learning
2 Restricted Boltzmann Machine
3 Deep Belief Network
4 Convolutional RBM
5 Convolutional DBN
6 Conclusion




1 Deep Learning






Definition

Deep Learning (Wikipedia)
Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using model architectures composed of multiple non-linear transformations.

Deep Learning (deeplearning.net)
Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence.






Definition (cont'd)

Goal: imitate nature
Set of algorithms
Generally structures with multiple layers
Often unsupervised feature learning
Time-consuming training
Sometimes large amounts of data
Generally complex data
New name for an old thing
Hot topic






History

1960: Neural networks
1985: Multilayer Perceptrons
1986: Restricted Boltzmann Machine
1995: Support Vector Machine
2006: Hinton presents the Deep Belief Network (DBN)
  New interest in deep learning and RBMs
  State of the art on MNIST
2009: Deep Recurrent Neural Network
2010: Convolutional DBN
2011: Max-Pooling CDBN
  Many competitions won and state-of-the-art results






Names

Geoffrey Hinton
Andrew Y. Ng
Yoshua Bengio
Honglak Lee
Yann LeCun
...






Algorithms

Deep Neural Networks
Deep Belief Networks
Convolutional Deep Belief Networks
Deep SVM






Usages

Text recognition
Facial expression recognition
Object recognition
Audio classification






Difficulties

Large number of free variables
  Few insights on how to set them
Complex to implement
  Large variations between papers
  Many refinements have been proposed




2 Restricted Boltzmann Machine






Definition

Restricted Boltzmann Machine
  Function: learn a probability distribution over the input
  Generative stochastic neural network
  Visible and hidden neurons
  Neurons form a bipartite graph
  V visible units and visible biases
  H hidden units and hidden biases
  V x H weights






Definition (cont'd)

Binary units (Bernoulli RBM)

\[ p(h_j = 1 \mid v) = \sigma\Big(c_j + \sum_{i=1}^{m} v_i w_{i,j}\Big) \]

\[ p(v_i = 1 \mid h) = \sigma\Big(b_i + \sum_{j=1}^{n} h_j w_{i,j}\Big) \]
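
The two conditional probabilities of a Bernoulli RBM can be sketched in a few lines of plain Python. This is only an illustration: the function names, the nested-list weight layout w[i][j] and the toy values are assumptions, not taken from the slides or from any library.

```python
import math

def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, w, c):
    """p(h_j = 1 | v) for every hidden unit j; w[i][j] links v_i and h_j."""
    return [sigmoid(c[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, w, b):
    """p(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(b[i] + sum(h[j] * w[i][j] for j in range(len(h))))
            for i in range(len(b))]

# Hypothetical toy RBM: 3 visible units, 2 hidden units.
w = [[0.5, -0.2],
     [0.1, 0.3],
     [-0.4, 0.2]]
b = [0.0, 0.0, 0.0]  # visible biases
c = [0.0, 0.0]       # hidden biases

p_h = hidden_probs([1, 0, 1], w, c)   # one probability per hidden unit
p_v = visible_probs([1, 0], w, b)     # one probability per visible unit
print(p_h, p_v)
```

Because the graph is bipartite, each hidden unit depends only on the visible layer (and vice versa), so every probability in a layer can be computed independently.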






Usages

Unsupervised feature learning
Classification with other techniques (linear classifier, SVM, ...)
Limited to one layer of abstraction
Stacking for higher-level models and classification
  Deep Belief Network
  Deep Boltzmann Machines






Training

Objective: maximizing the log-likelihood
  Intractable
Other methods have been developed:
  Markov Chain Monte Carlo (MCMC) (too slow)
  Contrastive Divergence (CD) (Hinton)
  Persistent CD
  Mean-Field CD (mf-CD)
  Parallel Tempering
  Annealed Importance Sampling






Contrastive Divergence

For each data point
  1 Compute the gradient g from the difference between the statistics at t = 0 (data) and t = k (after k steps of Gibbs sampling)
  2 Add α * g to the weights and the biases (α: learning rate)
Repeat for several epochs

Experiments have shown that CD1 (k = 1) works well
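
A minimal sketch of CD1 on a single data point, assuming binary units and plain Python lists. For k = 1 the two time steps compared are t = 0 (the data) and t = 1 (one Gibbs step). The helper names, the toy data, the learning rate alpha = 0.1 and the seed are all illustrative assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, w, c):
    return [sigmoid(c[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, w, b):
    return [sigmoid(b[i] + sum(h[j] * w[i][j] for j in range(len(h))))
            for i in range(len(b))]

def sample(probs, rng):
    """Draw binary states from independent Bernoulli probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]

def cd1_step(v0, w, b, c, alpha, rng):
    """One CD1 update for a single data point; modifies w, b, c in place."""
    # Positive phase (t = 0): hidden probabilities given the data.
    ph0 = hidden_probs(v0, w, c)
    h0 = sample(ph0, rng)
    # Negative phase (t = 1): reconstruct the visibles, recompute the hiddens.
    v1 = sample(visible_probs(h0, w, b), rng)
    ph1 = hidden_probs(v1, w, c)
    # Gradient estimate: statistics at t = 0 minus statistics at t = 1.
    for i in range(len(b)):
        for j in range(len(c)):
            w[i][j] += alpha * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += alpha * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += alpha * (ph0[j] - ph1[j])

# Hypothetical toy problem: 4 visible units, 2 hidden units, 2 patterns.
rng = random.Random(42)
n_visible, n_hidden = 4, 2
w = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden
data = [[1, 1, 0, 0], [0, 0, 1, 1]]

for epoch in range(100):          # repeat for several epochs
    for v in data:                # for each data point
        cd1_step(v, w, b, c, alpha=0.1, rng=rng)
```

Using the hidden probabilities rather than sampled states in the gradient terms is a common variance-reduction choice, not something the slides prescribe.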






Contrastive Divergence

When to stop training?
  1 Proxies to log-likelihood:
    Reconstruction error
    Pseudo-likelihood (PCD)
  2 Visual inspection of the filters
Training is relatively fast
  Can be trained on GPU
Hard to compare two RBMs
Hard to test an implementation correctly
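
As an illustration of the first proxy, the reconstruction error can be computed by passing each data point up to the hidden layer and back down. The sketch below assumes binary units, a mean-field (probability-only) reconstruction and a mean squared error; all three are illustrative choices, and the function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reconstruct(v, w, b, c):
    """Mean-field reconstruction: v -> p(h | v) -> p(v | h)."""
    ph = [sigmoid(c[j] + sum(v[i] * w[i][j] for i in range(len(v))))
          for j in range(len(c))]
    return [sigmoid(b[i] + sum(ph[j] * w[i][j] for j in range(len(c))))
            for i in range(len(b))]

def reconstruction_error(data, w, b, c):
    """Mean squared difference between the data and its reconstruction."""
    total = 0.0
    for v in data:
        r = reconstruct(v, w, b, c)
        total += sum((vi - ri) ** 2 for vi, ri in zip(v, r)) / len(v)
    return total / len(data)

# An untrained RBM with zero weights reconstructs every unit as 0.5,
# so the error on binary data is (0.5)^2 = 0.25.
w0 = [[0.0, 0.0] for _ in range(4)]
err = reconstruction_error([[1, 1, 0, 0], [0, 0, 1, 1]], w0, [0.0] * 4, [0.0] * 2)
print(err)
```

A falling reconstruction error suggests that training is progressing, but it remains a proxy: it is not the log-likelihood itself.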






Contrastive Divergence Options

Mini-batch training
Momentum
Weight decay
Sparsity target
...
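
Momentum and weight decay only change how a gradient estimate is applied to the weights. The sketch below shows one common formulation (L2 weight decay, classical momentum). The function name and the default values are assumptions chosen for illustration.

```python
def apply_update(w, grad, velocity, lr=0.1, momentum=0.9, weight_decay=0.0002):
    """Apply one gradient-ascent step to w in place.

    w, grad and velocity are nested lists of the same shape;
    velocity carries the state of the momentum term between calls.
    """
    for i in range(len(w)):
        for j in range(len(w[i])):
            # Weight decay (L2 penalty) pulls each weight toward zero.
            g = grad[i][j] - weight_decay * w[i][j]
            # Momentum keeps an exponentially decaying sum of past steps.
            velocity[i][j] = momentum * velocity[i][j] + lr * g
            w[i][j] += velocity[i][j]

# Hypothetical 1 x 1 example with weight decay switched off.
w = [[1.0]]
velocity = [[0.0]]
apply_update(w, [[0.5]], velocity, lr=0.1, momentum=0.9, weight_decay=0.0)
apply_update(w, [[0.5]], velocity, lr=0.1, momentum=0.9, weight_decay=0.0)
print(w, velocity)
```

With mini-batch training, grad would be the average of the per-sample gradient estimates over the batch before this function is called.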






Units

The RBM was initially developed with binary units
Different types of units can be used:
  Gaussian visible units for real-valued inputs
  Softmax hidden units for classification (last layer)
  Rectified Linear Units (ReLU) for hidden/visible units
    Can be capped
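
For reference, the deterministic activation functions behind these unit types can be written as follows. This is a sketch only: the cap value of 6 is an arbitrary assumption, and sampling (for example, adding Gaussian noise for Gaussian visible units) is left out.

```python
import math

def relu(x):
    """Rectified linear activation: max(0, x)."""
    return max(0.0, x)

def capped_relu(x, cap=6.0):
    """ReLU whose output is clipped to the range [0, cap]."""
    return min(max(0.0, x), cap)

def softmax(xs):
    """Softmax over a group of units; the outputs sum to 1."""
    m = max(xs)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), capped_relu(10.0), softmax([1.0, 2.0, 3.0]))
```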






Variants

Convolutional RBM (see later)
Mean-covariance RBM (mcRBM)
Sparse RBM (SRBM)
Third-Order RBM
Spike-and-Slab RBM
Nonnegative RBM
...




3 Deep Belief Network





Definition

Deep Belief Network
  Generative graphical model
  Type of Deep Neural Network
  Multiple layers of hidden units
  Stack of RBMs
    Can also be implemented with autoencoders





Definition (cont'd)

Each RBM takes its input from the output of the previous layer
Each layer forms a higher-level representation of the data
The number of hidden units in each layer can be tuned





Training

1 Train each layer, from bottom to top, with Contrastive Divergence (unsupervised)
2 Then treat the DBN as an MLP
3 If necessary, fine-tune the last layer for classification (supervised)
  Backpropagation
  Nonlinear Conjugate Gradient method
  Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS)
  Hessian-Free CG (Martens)
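
Step 1, the greedy layer-wise pretraining, can be sketched as follows. This is a toy illustration in plain Python, not the DLL implementation: the RBM class, pretrain_dbn, the layer sizes, the learning rate and the number of epochs are all hypothetical, and the supervised fine-tuning of steps 2 and 3 is not shown.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class RBM:
    """Tiny Bernoulli RBM trained with CD1 (illustrative only)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.w = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible   # visible biases
        self.c = [0.0] * n_hidden    # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def train(self, data, epochs=10, lr=0.1):
        for _ in range(epochs):
            for v0 in data:
                ph0 = self.hidden_probs(v0)
                h0 = [1 if self.rng.random() < p else 0 for p in ph0]
                v1 = self.visible_probs(h0)   # probabilities as reconstruction
                ph1 = self.hidden_probs(v1)
                for i in range(len(self.b)):
                    for j in range(len(self.c)):
                        self.w[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
                    self.b[i] += lr * (v0[i] - v1[i])
                for j in range(len(self.c)):
                    self.c[j] += lr * (ph0[j] - ph1[j])

def pretrain_dbn(data, layer_sizes, rng):
    """Train each RBM bottom-up; its output becomes the next layer's input."""
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        rbm.train(inputs)
        layers.append(rbm)
        inputs = [rbm.hidden_probs(v) for v in inputs]
    return layers

rng = random.Random(0)
data = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
dbn = pretrain_dbn(data, [3, 2], rng)

# Forward pass through the stack: the top-level features of one sample.
features = data[0]
for rbm in dbn:
    features = rbm.hidden_probs(features)
print(features)
```

Passing hidden probabilities (rather than sampled states) to the next layer is one common convention; it is assumed here for simplicity.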




4 Convolutional RBM






Definition

Convolutional RBM
  Motivation: translation invariance
    Scaling to full-size images
  Variant of RBM, concepts remain the same
  N_V x N_V binary visible units
  K groups of hidden units
  N_H x N_H binary hidden units per group
  Each group has an N_W x N_W filter (N_W ≜ N_V − N_H + 1)
  A bias b_k for each hidden group
  A single bias c for all visible units






Definition (cont'd)

Binary units:

\[ p(h^k_j = 1 \mid v) = \sigma\big(b_k + (\tilde{W}^k *_v v)_j\big) \]

\[ p(v_i = 1 \mid h) = \sigma\Big(c + \sum_{k=1}^{K} (W^k *_f h^k)_i\Big) \]
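
The hidden-unit probabilities of one group can be sketched with an explicit "valid" sliding window. The sketch assumes that the tilde denotes the filter flipped in both directions, so that the valid convolution with the flipped filter amounts to sliding W^k over the image without flipping. That reading, the function name and the toy values are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def crbm_hidden_probs(v, w_k, b_k):
    """p(h^k = 1 | v) for one hidden group, as an N_H x N_H grid.

    v   : N_V x N_V visible image (nested lists)
    w_k : N_W x N_W filter of group k
    b_k : bias shared by all hidden units of group k
    """
    n_v, n_w = len(v), len(w_k)
    n_h = n_v - n_w + 1  # size of a 'valid' output, i.e. N_W = N_V - N_H + 1
    return [[sigmoid(b_k + sum(w_k[dr][dc] * v[r + dr][s + dc]
                               for dr in range(n_w) for dc in range(n_w)))
             for s in range(n_h)]
            for r in range(n_h)]

# Hypothetical 4 x 4 image with a diagonal, and a 2 x 2 diagonal filter.
v = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
w_k = [[1.0, 0.0],
       [0.0, 1.0]]
p_h = crbm_hidden_probs(v, w_k, b_k=0.0)   # 3 x 3 grid of probabilities
print(p_h)
```

The same filter and bias are reused at every position, which is what keeps the number of parameters independent of the image size.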






Training

Contrastive Divergence
  Gradient computations are done with convolutions
  Same refinements can be used (weight decay, momentum, ...)
CRBM is highly overcomplete
  Sparse learning is very important






Probabilistic Max Pooling

Shrink the representation by a constant factor C
  Allows higher-level layers to be invariant to small translations
  Reduces computational effort
Generative version of standard Max Pooling
Pooling layer with K groups of pooling units
Each group has N_P x N_P units
  N_P ≜ N_H / C
Each hidden block α (C x C) is connected to exactly one pooling unit






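Definition (cont'd)

Binary units:

\[ p(v_i = 1 \mid h) = \sigma\Big(c + \sum_{k=1}^{K} (W^k *_f h^k)_i\Big) \]

\[ I(h^k_j) \triangleq b_k + (\tilde{W}^k *_v v)_j \]

\[ p(h^k_j = 1 \mid v) = \frac{\exp(I(h^k_j))}{1 + \sum_{j' \in \beta_\alpha} \exp(I(h^k_{j'}))} \]

\[ p(p^k_\alpha = 0 \mid v) = \frac{1}{1 + \sum_{j' \in \beta_\alpha} \exp(I(h^k_{j'}))} \]

Here \beta_\alpha denotes the set of hidden units in block \alpha, and j \in \beta_\alpha.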
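
For a single C x C block, these probabilities form a softmax over the C*C hidden units plus one extra "all off" state. A minimal sketch follows, with a hypothetical function name and toy inputs; the max-subtraction is a numerical-stability detail added here, not part of the formulas.

```python
import math

def pool_block(energies):
    """Probabilistic max pooling for one block.

    energies : the values I(h_j) of the C*C hidden units in the block.
    Returns (hidden_probs, p_pool_off), where hidden_probs[j] = p(h_j = 1 | v)
    and p_pool_off = p(p_alpha = 0 | v).
    """
    # The 'off' state has energy 0; shift by the max to avoid overflow.
    m = max(max(energies), 0.0)
    exps = [math.exp(e - m) for e in energies]
    off = math.exp(-m)
    z = off + sum(exps)            # equals (1 + sum_j exp(I_j)) * exp(-m)
    return [e / z for e in exps], off / z

# Hypothetical 2 x 2 block (C = 2) with all inputs at zero:
# the five states (four units plus 'off') are equally likely.
probs, p_off = pool_block([0.0, 0.0, 0.0, 0.0])
print(probs, p_off)
```

At most one hidden unit per block can be on, so the hidden probabilities and the "off" probability always sum to 1, and the pooling unit is on with probability 1 - p_off.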




5 Convolutional DBN

Definition

Stack of Convolutional RBMs
  With or without Probabilistic Max Pooling
Each RBM takes its input from the output of the previous layer
Each layer forms a higher-level representation of the data
The number of hidden units in each layer can be tuned





Feature Learning

Source: Honglak Lee
Each layer learns a different abstraction of features
  1 Strokes
  2 Parts of faces
  3 Faces




6 Conclusion


Implementation

Deep Learning Library (DLL)
https://github.com/wichtounet/dll

RBM
  Binary, Gaussian, Softmax, ReLU units
  CD and PCD
  Momentum, Weight Decay, Sparsity Target
Convolutional RBM
  Standard version
  Probabilistic Max Pooling
  Various units
  CD and PCD
  Momentum, Weight Decay, Sparsity Target





Implementation

DBN
  Pretraining with RBM
  Fine-tuning with Conjugate Gradient
  Fine-tuning with Stochastic Gradient Descent






Future Work

Use CDBN for text detection
Convolutional DBN
SVM classification layer for DBN
Refinements
  New training methods for RBM/DBN
  Reduce compute time
  Maxout, Dropout






Conclusion

Deep Learning solutions are very powerful
  State of the art in several problems
  Still room for improvement
  Still young solutions (hype)
However
  They are complex to implement
  Free variables need to be configured with care
  Results from papers are hard to reproduce
  Heavy to train

