Quoc Le, Stanford & Google - Tera-Scale Deep Learning
Tera-scale deep learning
Quoc V. Le
Stanford University and Google
Joint work with
Greg Corrado, Jeff Dean, Matthieu Devin, Kai Chen, Rajat Monga, Andrew Ng, Marc'Aurelio Ranzato, Paul Tucker, Ke Yang
Machine Learning successes
Face recognition, OCR, autonomous cars,
recommendation systems, web page ranking,
email classification
The role of Feature Extraction in Pattern Recognition
[Figure: a pattern-recognition pipeline in which feature extraction (mostly hand-crafted features) feeds a classifier.]
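To make that pipeline concrete, here is a toy Python sketch of the two stages: a fixed, hand-crafted feature extractor feeding a separately trained linear classifier. The feature function, data, and constants below are all hypothetical illustrations, not anything from the talk.

```python
# Toy pipeline: hand-crafted features -> separately trained classifier.
import numpy as np

def handcrafted_features(image):
    """Hypothetical stand-in for SIFT/HOG-style features: mean intensity
    and crude edge energy per image row."""
    gray = image.mean(axis=2)                              # collapse color channels
    row_means = gray.mean(axis=1)                          # coarse intensity profile
    row_edges = np.abs(np.diff(gray, axis=1)).mean(axis=1) # crude edge strength
    return np.concatenate([row_means, row_edges])

rng = np.random.default_rng(0)
images = rng.random((100, 16, 16, 3))   # 100 toy RGB images
labels = rng.integers(0, 2, size=100)   # toy binary labels

# The classifier is trained on the fixed features, not on raw pixels.
X = np.stack([handcrafted_features(im) for im in images])
w = np.zeros(X.shape[1])
for _ in range(200):                                # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-X @ w))                # logistic regression
    w -= 0.1 * X.T @ (p - labels) / len(labels)     # logistic-loss gradient
```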
Hand-Crafted Features
Computer vision: SIFT/HOG, SURF, …
Speech recognition: MFCC, spectrogram, ZCR, …
New feature-designing paradigm
Unsupervised feature learning / deep learning shows promise on small datasets, but it is expensive and typically applied only to small problems.
The Trend of Big Data
Brain Simulation
Watching 10 million YouTube video frames.
Trained on 2,000 machines (16,000 cores) for 1 week.
1.15 billion parameters:
- 100x larger than previously reported
- Small compared to the visual cortex
[Figure: one layer of the network. The input is a 200x200 image (W x H) with 3 channels; local filtering with receptive field size 18 produces 8 maps, followed by pooling (size 5) and local contrast normalization (LCN size 5), giving an 8-channel output image that is the input to the layer above.]
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012.
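A minimal NumPy sketch of that layer ordering (local filtering → pooling → LCN) is below. The stride, random filters, and the simplified per-map LCN are assumptions for illustration; the actual model uses untied local weights and is vastly larger.

```python
# One layer in the spirit of the figure: 18x18 receptive-field filtering
# into 8 maps, 5x5 max pooling, then contrast normalization.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((200, 200, 3))               # H x W x input channels
filters = rng.standard_normal((8, 18, 18, 3))   # 8 maps, 18x18 RF, 3 channels

def filter_layer(x, f, stride=6):
    """Local filtering: dot product of each 18x18x3 patch with each filter.
    The stride is an assumption to keep this sketch fast."""
    rf = f.shape[1]
    hs = range(0, x.shape[0] - rf + 1, stride)
    ws = range(0, x.shape[1] - rf + 1, stride)
    out = np.empty((len(hs), len(ws), f.shape[0]))
    for i, h in enumerate(hs):
        for j, w in enumerate(ws):
            patch = x[h:h+rf, w:w+rf, :]
            out[i, j] = np.tensordot(f, patch, axes=3)  # one value per map
    return out

def pool(x, size=5):
    """Non-overlapping max pooling over size x size blocks, per map."""
    h, w, c = x.shape
    h, w = h - h % size, w - w % size
    x = x[:h, :w].reshape(h // size, size, w // size, size, c)
    return x.max(axis=(1, 3))

def lcn(x, eps=1e-5):
    """Contrast normalization. The slide's LCN uses a 5x5 local
    neighborhood; per-map global statistics are used here for brevity."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    std = x.std(axis=(0, 1), keepdims=True)
    return (x - mean) / (std + eps)

maps = filter_layer(image, filters)   # -> 31 x 31 x 8 feature maps
pooled = pool(maps, size=5)           # -> 6 x 6 x 8
output = lcn(pooled)                  # -> 8-channel input to the next layer
```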
[Figure: Image → Autoencoder → Autoencoder → Autoencoder → Face detector / Human body detector / Cat detector]
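The greedy stacking in the diagram can be sketched in a few lines of NumPy: each autoencoder learns to reconstruct its input through a narrower code, and its codes become the next autoencoder's input. The tied weights, sigmoid units, layer sizes, and toy data below are illustrative assumptions; the actual model is a far larger sparse autoencoder with local receptive fields.

```python
# Greedy layer-wise training of stacked autoencoders (toy sketch).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, steps=500):
    """One autoencoder: encode h = s(XW), decode r = s(hW^T), minimize
    mean squared reconstruction error by gradient descent."""
    W = 0.01 * rng.standard_normal((X.shape[1], n_hidden))
    for _ in range(steps):
        h = sigmoid(X @ W)            # encoder
        r = sigmoid(h @ W.T)          # tied-weight decoder
        err = r - X                   # reconstruction error
        dr = err * r * (1 - r)        # backprop through decoder sigmoid
        dh = (dr @ W) * h * (1 - h)   # backprop through encoder sigmoid
        # Gradient of the shared W has an encoder term and a decoder term.
        W -= lr * (X.T @ dh + dr.T @ h) / len(X)
    return W

X = rng.random((256, 64))             # toy "image" data, 64 pixels each
W1 = train_autoencoder(X, 32)         # layer 1 reconstructs the pixels
H1 = sigmoid(X @ W1)                  # layer 1 codes
W2 = train_autoencoder(H1, 16)        # layer 2 stacked on layer 1's codes
```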
Key results
Totally unsupervised!
~85% correct in classifying face vs. no face
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012.
ImageNet classification
ImageNet 2009 (10k categories): best published result 17% (Sanchez & Perronnin '11); our method: 20%. Using only 1,000 categories, our method: > 50%.
[Chart: 0.005% random guess · 9.5% state of the art (Weston & Bengio '11) · 15.8% feature learning from raw pixels.]
Scaling up Deep Learning
                  Prior art                 Our work
# Examples        100,000                   10,000,000
# Dimensions      1,000                     10,000
# Parameters      10,000,000                1,000,000,000
Learned features  Edge filters from images  High-level features (face, cat detectors)
Data set size     GBytes                    TBytes
Summary of Scaling up
- Local connectivity (model parallelism)
- Asynchronous SGD (clever optimization / data parallelism)
- RPCs
- Prefetching
- Single
- Removing slow machines
- Lots of optimization
Locally connected networks
[Figure: the feature layer above the image is partitioned across Machine #1 to Machine #4; each machine holds the features connected to its part of the image.]
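A toy single-process sketch of this partitioning follows, with the four machines simulated as loop iterations. Because each feature depends only on a small local patch, a machine needs just a narrow "halo" of columns from its neighbor, which is what keeps cross-machine traffic small. The patch "feature", stride, and sizes are assumptions for illustration, not the system's actual code.

```python
# Model parallelism via local connectivity: split the feature layer
# column-wise across 4 "machines" (simulated sequentially here).
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((200, 200))
rf, n_machines = 18, 4
cols_per_machine = image.shape[1] // n_machines   # 50 columns each

features = []
for m in range(n_machines):
    lo = m * cols_per_machine
    hi = lo + cols_per_machine
    # Each machine holds its own slice plus an (rf - 1)-column halo from
    # its right neighbor; in a real system this halo is the only pixel
    # data that crosses machine boundaries.
    halo_hi = min(hi + rf - 1, image.shape[1])
    local = image[:, lo:halo_hi]
    # Toy "feature": mean of each rf x rf patch anchored in this slice.
    f = np.array([[local[i:i+rf, j:j+rf].mean()
                   for j in range(0, local.shape[1] - rf + 1, rf)]
                  for i in range(0, local.shape[0] - rf + 1, rf)])
    features.append(f)

layer_output = np.concatenate(features, axis=1)   # stitched feature map
```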
Asynchronous Parallel SGDs (Alex Smola's talk)
[Figure: parameter server architecture.]
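Below is a toy, thread-based analogue of asynchronous parallel SGD with a parameter server: a shared vector stands in for the server, and worker threads fetch (possibly stale) parameters, compute a gradient on their own data shard, and push updates back without any global barrier. The least-squares problem and all constants are hypothetical; this sketches the idea, not the actual RPC-based distributed system.

```python
# Asynchronous SGD against a shared "parameter server" (toy sketch).
import threading
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])
X = rng.standard_normal((4000, 2))
y = X @ true_w + 0.01 * rng.standard_normal(4000)

params = np.zeros(2)      # the "parameter server" state
lock = threading.Lock()   # guards the in-place parameter update

def worker(shard_X, shard_y, seed, steps=200, lr=0.05, batch=32):
    local_rng = np.random.default_rng(seed)  # per-worker RNG (thread safety)
    for _ in range(steps):
        w = params.copy()                    # fetch possibly stale parameters
        idx = local_rng.integers(0, len(shard_X), size=batch)
        xb, yb = shard_X[idx], shard_y[idx]
        grad = 2 * xb.T @ (xb @ w - yb) / batch   # least-squares gradient
        with lock:                 # push the update; there is no global
            params -= lr * grad    # barrier, so workers never wait on
                                   # each other's steps

shards = np.array_split(np.arange(len(X)), 4)
threads = [threading.Thread(target=worker, args=(X[s], y[s], i))
           for i, s in enumerate(shards)]
for t in threads: t.start()
for t in threads: t.join()
# params now approximates true_w despite gradients computed on stale values.
```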
Conclusions
• Scaled deep learning 100x larger using distributed training on 1,000 machines
• Brain simulation -> cat neuron
• State-of-the-art performance on:
– Object recognition (ImageNet)
– Action recognition
– Cancer image classification
• Other applications:
– Speech recognition
– Machine translation
[Figures: face neuron; ImageNet results (0.005% random guess, 9.5% best published result, 15.8% our method); cat neuron; parameter server with model parallelism and data parallelism.]
References
• Q.V. Le, M.A. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, A.Y. Ng. Building high-level features using large-scale unsupervised learning. ICML, 2012.
• Q.V. Le, J. Ngiam, Z. Chen, D. Chia, P. Koh, A.Y. Ng. Tiled Convolutional Neural Networks. NIPS, 2010.
• Q.V. Le, W.Y. Zou, S.Y. Yeung, A.Y. Ng. Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR, 2011.
• Q.V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, A.Y. Ng. On optimization methods for deep learning. ICML, 2011.
• Q.V. Le, A. Karpenko, J. Ngiam, A.Y. Ng. ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS, 2011.
• Q.V. Le, J. Han, J. Gray, P. Spellman, A. Borowsky, B. Parvin. Learning Invariant Features for Tumor Signatures. ISBI, 2012.
• I.J. Goodfellow, Q.V. Le, A.M. Saxe, H. Lee, A.Y. Ng. Measuring invariances in deep networks. NIPS, 2009.
http://ai.stanford.edu/~quocle