Georgia Tech CSE6242 - Intro to Deep Learning and DL4J
DESCRIPTION
Introduction to deep learning and DL4J (http://deeplearning4j.org/), a guest lecture by Josh Patterson at Georgia Tech for the CSE6242 graduate class.
TRANSCRIPT
Deep Learning with DL4J
Scaleout Deep Learning
Josh Patterson
Email:[email protected]
Twitter: @jpatanooga
Github: https://github.com/jpatanooga
Past
Published in IAAI-09:
“TinyTermite: A Secure Routing Algorithm”
Grad work in Meta-heuristics, Ant-algorithms
Tennessee Valley Authority (TVA)
Hadoop and the Smartgrid
Cloudera
Principal Solution Architect
Today: Patterson Consulting
Overview
• What is Deep Learning?
• Deep Belief Networks
• DL4J
What is Deep Learning?
Algorithm that tries to learn simple features in lower layers
And more complex features in higher layers
Interesting Properties of Deep Learning
Reduces the problem of overfitting in neural networks.
Introduces new techniques for “unsupervised feature learning”
Introduces new, more automatic ways to figure out which parts of your data you should feed into your learning algorithm.
Chasing Nature
Learning sparse representations of auditory signals
leads to filters that closely correspond to neurons in early audio processing in mammals
When applied to speech
Learned representations showed a striking resemblance to the cochlear filters in the auditory cortex
Yann LeCun on Deep Learning
Has become the dominant method for acoustic modeling in speech recognition
Quickly becoming the dominant method for several vision tasks such as
object recognition
object detection
semantic segmentation.
Deep Belief Networks
What is a Deep Belief Network?
Generative probabilistic model
Composed of one visible layer
Many hidden layers
Layers are Restricted Boltzmann Machines (RBMs)
Each hidden layer learns relationship between units in lower layer
Higher layer representations tend to become more complex
Restricted Boltzmann Machines
• Unsupervised model
• Does feature learning by repeated sampling of the input data.
• Learns how to reconstruct data for good feature detection.
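As a rough illustration of "feature learning by repeated sampling", here is a minimal sketch of one common RBM training rule, single-step contrastive divergence (CD-1). The slides do not name the training rule, so CD-1 is an assumption, as are the toy data, the layer sizes, the learning rate, and all function names. This is plain Python and is not DL4J code.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample(probs, rng):
    """Draw one binary unit per activation probability."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def hidden_probs(v, W, b_hid):
    # W[i][j] connects visible unit i to hidden unit j.
    return [sigmoid(b_hid[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b_hid))]

def visible_probs(h, W, b_vis):
    return [sigmoid(b_vis[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b_vis))]

def cd1_step(v0, W, b_vis, b_hid, rng, lr=0.1):
    """One CD-1 update in place; returns the reconstruction of v0."""
    ph0 = hidden_probs(v0, W, b_hid)     # positive phase: driven by the data
    h0 = sample(ph0, rng)                # sample hidden states
    v1 = visible_probs(h0, W, b_vis)     # reconstruct the input
    ph1 = hidden_probs(v1, W, b_hid)     # negative phase: driven by the reconstruction
    for i in range(len(v0)):
        for j in range(len(ph0)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
    for i in range(len(b_vis)):
        b_vis[i] += lr * (v0[i] - v1[i])
    for j in range(len(b_hid)):
        b_hid[j] += lr * (ph0[j] - ph1[j])
    return v1

# Toy setup (assumed): 6 visible units, 3 hidden units, two unlabeled patterns.
rng = random.Random(0)
n_vis, n_hid = 6, 3
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hid)] for _ in range(n_vis)]
b_vis = [0.0] * n_vis
b_hid = [0.0] * n_hid
data = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]]

for epoch in range(200):
    for v in data:
        recon = cd1_step(v, W, b_vis, b_hid, rng)
```

No labels appear anywhere in the loop: the only training signal is how well the RBM reconstructs its own input.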
Deep Belief Network Training
Pre-Train
We show each RBM layer unlabeled vectors
“unsupervised learning”
For each layer we want to minimize the Cross Entropy
Fine-Tune
We move the learned weights and hidden bias units from the RBMs to a traditional feed-forward neural network
We run gentle back-propagation with some labeled data
Pre-Train Reconstructions
[Figure: RBM reconstructions at high cross entropy and at low cross entropy]
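To make the "high" versus "low" cross entropy comparison concrete, the sketch below computes the cross entropy between a binary input vector and a reconstruction given as per-unit probabilities. The function name, the clipping constant, and the example vectors are illustrative assumptions, not taken from the slides or from DL4J.

```python
import math

def reconstruction_cross_entropy(v, v_recon, eps=1e-12):
    """Cross entropy between binary input v and reconstruction probabilities v_recon."""
    total = 0.0
    for x, p in zip(v, v_recon):
        p = min(max(p, eps), 1.0 - eps)   # clip so log() never sees 0
        total -= x * math.log(p) + (1.0 - x) * math.log(1.0 - p)
    return total

v = [1.0, 0.0, 1.0, 0.0]
poor_recon = [0.5, 0.5, 0.5, 0.5]   # uninformative reconstruction
good_recon = [0.9, 0.1, 0.9, 0.1]   # close to the input

poor = reconstruction_cross_entropy(v, poor_recon)
good = reconstruction_cross_entropy(v, good_recon)
```

The closer the reconstruction is to the input, the lower the value, so `good` comes out smaller than `poor`.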
Deep Belief Network Diagram
• DBNs are classifiers
• Layers of RBMs
• Capped with a Logistic Layer
• RBMs are feature extractors
• RBMs learn features via sampling
• Creates “simpler problem” for later layers in stack
Rendering RBM Hidden Neuron Filters
DeepLearning4J
Implementation in Java
Self-contained & built on Akka, Hazelcast, Jblas
Runs on desktop
Runs on Hadoop via YARN natively to scale out
Distributed to run faster and with more features than current Theano-based implementations
Vectorized Implementation
Handles lots of data concurrently.
Any number of examples at once, but the code does not change.
Faster: Allows for native/GPU execution.
One format: Everything is a matrix.
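A small sketch of the "everything is a matrix" idea: the same forward-pass function handles one example or a whole batch, because the input is always a matrix with one row per example. The helper names, weights, and inputs are made up for illustration. Plain Python lists are used only to keep the example self-contained; the speedups mentioned above would come from handing the same matrix operations to a native or GPU backend.

```python
import math

def matmul(A, B):
    """Multiply matrix A (n x k) by matrix B (k x m), both as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def forward(X, W, bias):
    """Sigmoid layer. X is (n_examples x n_in), W is (n_in x n_out)."""
    Z = matmul(X, W)
    return [[1.0 / (1.0 + math.exp(-(z + b))) for z, b in zip(row, bias)]
            for row in Z]

W = [[0.5, -0.5],
     [0.25, 0.75],
     [-1.0, 1.0]]
bias = [0.0, 0.1]

# One example and a batch of three go through the same code path.
one = forward([[1.0, 0.0, 1.0]], W, bias)
batch = forward([[1.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0]], W, bias)
```

Only the number of rows in `X` changes between the two calls, and the first row of `batch` matches the single-example result.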
What are Good Applications for Deep Learning?
Image Processing
High MNIST Scores
Audio Processing
Current Champ on TIMIT dataset
Text / NLP Processing
Word2vec, etc
Parameter Averaging
McDonald, 2010
Distributed Training Strategies for the Structured Perceptron
Langford, 2007
Vowpal Wabbit
Jeff Dean’s Work on Parallel SGD
Downpour SGD
Parallelizing Deep Belief Networks
Two phase training
Pre Train
Fine tune
Each phase can do multiple passes over dataset
Entire network is averaged at master
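The "averaged at master" step can be sketched as follows: each worker trains a copy of the parameters on its own shard, then the master replaces the parameters with the element-wise mean of the workers' results. The tiny linear model, the shard data, and the names `local_sgd` and `average_parameters` are hypothetical stand-ins. A real DBN would average every layer's weights and biases in the same way, and this is not how DL4J's own code is written.

```python
def average_parameters(worker_params):
    """Element-wise mean of the parameter vectors returned by the workers."""
    n = len(worker_params)
    return [sum(values) / n for values in zip(*worker_params)]

def local_sgd(params, shard, lr=0.1):
    """Hypothetical worker: one SGD pass fitting y = w*x + b on its shard."""
    w, b = params
    for x, y in shard:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return [w, b]

# Two workers, each holding a different shard of the data (toy values).
shards = [[(0.0, 1.0), (1.0, 3.0)],
          [(2.0, 5.0), (3.0, 7.0)]]

params = [0.0, 0.0]
for round_number in range(50):
    # Each worker starts from the current master copy of the parameters.
    worker_results = [local_sgd(list(params), shard) for shard in shards]
    # The master averages the workers' parameters and redistributes them.
    params = average_parameters(worker_results)
```

Each pass of the outer loop corresponds to one round of local training followed by averaging; the pre-train and fine-tune phases would each run their own sequence of such rounds.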
PreTrain and Lots of Data
We’re exploring how to better leverage the unsupervised aspects of the PreTrain phase of Deep Belief Networks
Allows for the use of far less labeled data
Allows us to more easily model the massive amounts of structured data in HDFS
References
Visualizing RBMs
https://jpatanooga.github.io/Metronome/rbm20140306.html
DL4J
http://deeplearning4j.org/