Quoc Le, Stanford & Google - Tera-Scale Deep Learning
Tera-scale deep learning
Quoc V. Le
Stanford University and Google
Joint work with
Greg Corrado, Jeff Dean, Matthieu Devin, Kai Chen, Rajat Monga, Andrew Ng, Marc'Aurelio Ranzato, Paul Tucker, Ke Yang
Machine Learning successes
Face recognition, OCR, autonomous cars,
recommendation systems, web page ranking,
email classification
The role of Feature Extraction in Pattern Recognition
[Figure: a pattern-recognition pipeline in which feature extraction (mostly hand-crafted features) feeds a classifier.]
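To make that pipeline concrete, here is a toy Python sketch of the two stages: a fixed, hand-crafted feature extractor feeding a separately trained linear classifier. The feature function, data, and constants below are all hypothetical illustrations, not anything from the talk.

```python
# Toy pipeline: hand-crafted features -> separately trained classifier.
import numpy as np

def handcrafted_features(image):
    """Hypothetical stand-in for SIFT/HOG-style features: mean intensity
    and crude edge energy per image row."""
    gray = image.mean(axis=2)                              # collapse color channels
    row_means = gray.mean(axis=1)                          # coarse intensity profile
    row_edges = np.abs(np.diff(gray, axis=1)).mean(axis=1) # crude edge strength
    return np.concatenate([row_means, row_edges])

rng = np.random.default_rng(0)
images = rng.random((100, 16, 16, 3))   # 100 toy RGB images
labels = rng.integers(0, 2, size=100)   # toy binary labels

# The classifier is trained on the fixed features, not on raw pixels.
X = np.stack([handcrafted_features(im) for im in images])
w = np.zeros(X.shape[1])
for _ in range(200):                                # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-X @ w))                # logistic regression
    w -= 0.1 * X.T @ (p - labels) / len(labels)     # logistic-loss gradient
```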
Hand-Crafted Features
Computer vision: SIFT/HOG, SURF, …
Speech recognition: MFCC, spectrogram, ZCR, …
New feature-designing paradigm
Unsupervised feature learning / deep learning shows promise on small datasets, but it is expensive and typically applied only to small problems.
The Trend of Big Data
Brain Simulation
Watching 10 million YouTube video frames.
Trained on 2,000 machines (16,000 cores) for 1 week.
1.15 billion parameters:
- 100x larger than previously reported
- Small compared to the visual cortex
[Figure: one layer of the network. The input is a 200x200 image (W x H) with 3 channels; local filtering with receptive field size 18 produces 8 maps, followed by pooling (size 5) and local contrast normalization (LCN size 5), giving an 8-channel output image that is the input to the layer above.]
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012.
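A minimal NumPy sketch of that layer ordering (local filtering → pooling → LCN) is below. The stride, random filters, and the simplified per-map LCN are assumptions for illustration; the actual model uses untied local weights and is vastly larger.

```python
# One layer in the spirit of the figure: 18x18 receptive-field filtering
# into 8 maps, 5x5 max pooling, then contrast normalization.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((200, 200, 3))               # H x W x input channels
filters = rng.standard_normal((8, 18, 18, 3))   # 8 maps, 18x18 RF, 3 channels

def filter_layer(x, f, stride=6):
    """Local filtering: dot product of each 18x18x3 patch with each filter.
    The stride is an assumption to keep this sketch fast."""
    rf = f.shape[1]
    hs = range(0, x.shape[0] - rf + 1, stride)
    ws = range(0, x.shape[1] - rf + 1, stride)
    out = np.empty((len(hs), len(ws), f.shape[0]))
    for i, h in enumerate(hs):
        for j, w in enumerate(ws):
            patch = x[h:h+rf, w:w+rf, :]
            out[i, j] = np.tensordot(f, patch, axes=3)  # one value per map
    return out

def pool(x, size=5):
    """Non-overlapping max pooling over size x size blocks, per map."""
    h, w, c = x.shape
    h, w = h - h % size, w - w % size
    x = x[:h, :w].reshape(h // size, size, w // size, size, c)
    return x.max(axis=(1, 3))

def lcn(x, eps=1e-5):
    """Contrast normalization. The slide's LCN uses a 5x5 local
    neighborhood; per-map global statistics are used here for brevity."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    std = x.std(axis=(0, 1), keepdims=True)
    return (x - mean) / (std + eps)

maps = filter_layer(image, filters)   # -> 31 x 31 x 8 feature maps
pooled = pool(maps, size=5)           # -> 6 x 6 x 8
output = lcn(pooled)                  # -> 8-channel input to the next layer
```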
[Figure: Image → Autoencoder → Autoencoder → Autoencoder → Face detector / Human body detector / Cat detector]
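The greedy stacking in the diagram can be sketched in a few lines of NumPy: each autoencoder learns to reconstruct its input through a narrower code, and its codes become the next autoencoder's input. The tied weights, sigmoid units, layer sizes, and toy data below are illustrative assumptions; the actual model is a far larger sparse autoencoder with local receptive fields.

```python
# Greedy layer-wise training of stacked autoencoders (toy sketch).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, steps=500):
    """One autoencoder: encode h = s(XW), decode r = s(hW^T), minimize
    mean squared reconstruction error by gradient descent."""
    W = 0.01 * rng.standard_normal((X.shape[1], n_hidden))
    for _ in range(steps):
        h = sigmoid(X @ W)            # encoder
        r = sigmoid(h @ W.T)          # tied-weight decoder
        err = r - X                   # reconstruction error
        dr = err * r * (1 - r)        # backprop through decoder sigmoid
        dh = (dr @ W) * h * (1 - h)   # backprop through encoder sigmoid
        # Gradient of the shared W has an encoder term and a decoder term.
        W -= lr * (X.T @ dh + dr.T @ h) / len(X)
    return W

X = rng.random((256, 64))             # toy "image" data, 64 pixels each
W1 = train_autoencoder(X, 32)         # layer 1 reconstructs the pixels
H1 = sigmoid(X @ W1)                  # layer 1 codes
W2 = train_autoencoder(H1, 16)        # layer 2 stacked on layer 1's codes
```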
Key results
Totally unsupervised!
~85% correct in classifying face vs. no face
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012.
ImageNet classification
ImageNet 2009 (10k categories): best published result 17% (Sanchez & Perronnin '11); our method: 20%. Using only 1,000 categories, our method: > 50%.
[Chart: 0.005% random guess · 9.5% state of the art (Weston & Bengio '11) · 15.8% feature learning from raw pixels.]
Scaling up Deep Learning
                  Prior art                 Our work
# Examples        100,000                   10,000,000
# Dimensions      1,000                     10,000
# Parameters      10,000,000                1,000,000,000
Learned features  Edge filters from images  High-level features (face, cat detectors)
Data set size     GBytes                    TBytes
Summary of Scaling up
- Local connectivity (model parallelism)
- Asynchronous SGD (clever optimization / data parallelism)
- RPCs
- Prefetching
- Single
- Removing slow machines
- Lots of optimization
Locally connected networks
[Figure: the feature layer above the image is partitioned across Machine #1 to Machine #4; each machine holds the features connected to its part of the image.]
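A toy single-process sketch of this partitioning follows, with the four machines simulated as loop iterations. Because each feature depends only on a small local patch, a machine needs just a narrow "halo" of columns from its neighbor, which is what keeps cross-machine traffic small. The patch "feature", stride, and sizes are assumptions for illustration, not the system's actual code.

```python
# Model parallelism via local connectivity: split the feature layer
# column-wise across 4 "machines" (simulated sequentially here).
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((200, 200))
rf, n_machines = 18, 4
cols_per_machine = image.shape[1] // n_machines   # 50 columns each

features = []
for m in range(n_machines):
    lo = m * cols_per_machine
    hi = lo + cols_per_machine
    # Each machine holds its own slice plus an (rf - 1)-column halo from
    # its right neighbor; in a real system this halo is the only pixel
    # data that crosses machine boundaries.
    halo_hi = min(hi + rf - 1, image.shape[1])
    local = image[:, lo:halo_hi]
    # Toy "feature": mean of each rf x rf patch anchored in this slice.
    f = np.array([[local[i:i+rf, j:j+rf].mean()
                   for j in range(0, local.shape[1] - rf + 1, rf)]
                  for i in range(0, local.shape[0] - rf + 1, rf)])
    features.append(f)

layer_output = np.concatenate(features, axis=1)   # stitched feature map
```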
Asynchronous Parallel SGDs (Alex Smola's talk)
[Figure: parameter server architecture.]
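Below is a toy, thread-based analogue of asynchronous parallel SGD with a parameter server: a shared vector stands in for the server, and worker threads fetch (possibly stale) parameters, compute a gradient on their own data shard, and push updates back without any global barrier. The least-squares problem and all constants are hypothetical; this sketches the idea, not the actual RPC-based distributed system.

```python
# Asynchronous SGD against a shared "parameter server" (toy sketch).
import threading
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])
X = rng.standard_normal((4000, 2))
y = X @ true_w + 0.01 * rng.standard_normal(4000)

params = np.zeros(2)      # the "parameter server" state
lock = threading.Lock()   # guards the in-place parameter update

def worker(shard_X, shard_y, seed, steps=200, lr=0.05, batch=32):
    local_rng = np.random.default_rng(seed)  # per-worker RNG (thread safety)
    for _ in range(steps):
        w = params.copy()                    # fetch possibly stale parameters
        idx = local_rng.integers(0, len(shard_X), size=batch)
        xb, yb = shard_X[idx], shard_y[idx]
        grad = 2 * xb.T @ (xb @ w - yb) / batch   # least-squares gradient
        with lock:                 # push the update; there is no global
            params -= lr * grad    # barrier, so workers never wait on
                                   # each other's steps

shards = np.array_split(np.arange(len(X)), 4)
threads = [threading.Thread(target=worker, args=(X[s], y[s], i))
           for i, s in enumerate(shards)]
for t in threads: t.start()
for t in threads: t.join()
# params now approximates true_w despite gradients computed on stale values.
```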
Conclusions
• Scaled deep learning 100x larger using distributed training on 1,000 machines
• Brain simulation -> cat neuron
• State-of-the-art performance on:
– Object recognition (ImageNet)
– Action recognition
– Cancer image classification
• Other applications:
– Speech recognition
– Machine translation
[Figures: face neuron; ImageNet results (0.005% random guess, 9.5% best published result, 15.8% our method); cat neuron; parameter server with model parallelism and data parallelism.]
References
• Q.V. Le, M.A. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, A.Y. Ng. Building high-level features using large-scale unsupervised learning. ICML, 2012.
• Q.V. Le, J. Ngiam, Z. Chen, D. Chia, P. Koh, A.Y. Ng. Tiled Convolutional Neural Networks. NIPS, 2010.
• Q.V. Le, W.Y. Zou, S.Y. Yeung, A.Y. Ng. Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR, 2011.
• Q.V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, A.Y. Ng. On optimization methods for deep learning. ICML, 2011.
• Q.V. Le, A. Karpenko, J. Ngiam, A.Y. Ng. ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS, 2011.
• Q.V. Le, J. Han, J. Gray, P. Spellman, A. Borowsky, B. Parvin. Learning Invariant Features for Tumor Signatures. ISBI, 2012.
• I.J. Goodfellow, Q.V. Le, A.M. Saxe, H. Lee, A.Y. Ng. Measuring invariances in deep networks. NIPS, 2009.
http://ai.stanford.edu/~quocle