Quoc Le, Stanford & Google - Tera Scale Deep Learning


Upload: kun-le

Post on 01-Nov-2014


TRANSCRIPT

Page 1: Quoc Le, Stanford & Google - Tera Scale Deep Learning

Tera-scale deep learning

Quoc V. Le
Stanford University and Google

Joint work with
Greg Corrado, Jeff Dean, Mathieu Devin, Kai Chen, Rajat Monga, Andrew Ng, Marc'Aurelio Ranzato, Paul Tucker, Ke Yang

Page 2:

Machine Learning successes

Face recognition, OCR, autonomous cars, recommendation systems, web page ranking, email classification

Quoc  Le  

Page 3:

The role of Feature Extraction in Pattern Recognition

Classifier

Feature extraction (mostly hand-crafted features)

Page 4:

Hand-Crafted Features

Computer vision: SIFT/HOG, SURF, ...

Speech recognition: MFCC, spectrogram, ZCR, ...

Page 5:

New feature-designing paradigm

Unsupervised Feature Learning / Deep Learning
- Shows promise for small datasets
- Expensive and typically applied to small problems

Page 6:

The Trend of Big Data

Page 7:

Brain Simulation

Watching 10 million YouTube video frames
Trained on 2,000 machines (16,000 cores) for 1 week
1.15 billion parameters - 100x larger than previously reported, yet small compared to the visual cortex

[Architecture figure: input image of size 200x200 with 3 input channels; one layer with RF size = 18, number of maps = 8, pooling size = 5, LCN size = 5; the W x H output, an image with 8 channels, is the input to another layer above.]

Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012.

Image -> Autoencoder -> Autoencoder -> Autoencoder
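The stack above can be sketched as a chain of small autoencoders, each trained to reconstruct its input and then feeding its hidden code to the next layer. This is a toy numpy sketch with made-up layer sizes, not the paper's 1.15-billion-parameter network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.5, epochs=200, seed=0):
    """Train a one-hidden-layer autoencoder by full-batch gradient descent
    on mean squared reconstruction error. Returns (encoder, final error)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)        # encode
        Xhat = sigmoid(H @ W2 + b2)     # decode (reconstruction)
        # Backprop of the squared-error loss through both sigmoid layers.
        dXhat = (Xhat - X) * Xhat * (1 - Xhat) / n
        dH = (dXhat @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ dXhat; b2 -= lr * dXhat.sum(axis=0)
        W1 -= lr * X.T @ dH;    b1 -= lr * dH.sum(axis=0)
    encode = lambda Z: sigmoid(Z @ W1 + b1)
    return encode, np.mean((Xhat - X) ** 2)

# Stack three autoencoders: each layer learns features of the layer below.
rng = np.random.default_rng(1)
X = rng.random((100, 20))               # stand-in for image data
reps, codes = [], X
for n_hidden in (16, 12, 8):
    encode, err = train_autoencoder(codes, n_hidden)
    codes = encode(codes)               # this layer's code is the next layer's input
    reps.append(codes)
```

Greedy layer-wise training like this keeps each optimization problem small; the real system additionally interleaves pooling and local contrast normalization (LCN) between the autoencoder layers.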

Page 8:

Key results

Face detector, human body detector, cat detector - totally unsupervised!

~85% correct in classifying face vs. no face

Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012.

Page 9:

ImageNet classification

ImageNet 2009 (10k categories): best published result: 17% (Sanchez & Perronnin '11); our method: 20%. Using only 1,000 categories, our method > 50%.

0.005% - random guess
9.5% - state-of-the-art (Weston, Bengio '11)
15.8% - feature learning from raw pixels

Page 10:

Scaling up Deep Learning

                  Prior art       Our work
# Examples        100,000         10,000,000
# Dimensions      1,000           10,000
# Parameters      10,000,000      1,000,000,000
Learned features  Edge filters    High-level features
                  from images     (face, cat detectors)
Data set size     GBytes          TBytes

Page 11:

Summary of Scaling up

- Local connectivity (model parallelism)
- Asynchronous SGDs (clever optimization / data parallelism)
- RPCs
- Prefetching
- Single
- Removing slow machines
- Lots of optimization

Page 12:

Locally connected networks

Machine #1 | Machine #2 | Machine #3 | Machine #4

Features

Image
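The model-parallel idea on this slide is that each unit looks only at a small receptive field, so each machine can own one strip of the image plus the units (and their private, unshared weights) whose fields fall in that strip, with little cross-machine traffic. A minimal single-process simulation of that partitioning, with made-up image, receptive-field, and grid sizes (machine boundaries are just array shards here):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((24, 24))
rf, stride, n_machines = 6, 6, 4

# Locally connected layer: every unit has its OWN weights (no sharing),
# laid out on a 4x4 grid of non-overlapping receptive fields.
rows = cols = (24 - rf) // stride + 1            # 4x4 grid of units
weights = rng.normal(0, 0.1, (rows, cols, rf, rf))

def machine_forward(m):
    """Machine m computes only the units whose receptive fields lie
    entirely inside its vertical strip of the image."""
    out = np.zeros(rows)
    c = m  # with stride == strip width, machine m owns unit column m
    for r in range(rows):
        patch = image[r * stride:r * stride + rf, c * stride:c * stride + rf]
        out[r] = np.tanh(np.sum(weights[r, c] * patch))
    return out

# "Cluster" forward pass: each machine works on its shard independently;
# the feature map is just the concatenation of the per-machine columns.
features = np.stack([machine_forward(m) for m in range(n_machines)], axis=1)
```

Because the fields here tile the image exactly, no machine ever needs a neighbor's pixels; in the real system, units whose receptive fields straddle a partition boundary are the only ones that require communication, which is why local (rather than full) connectivity makes model parallelism cheap.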

Page 13:

Asynchronous Parallel SGDs (Alex Smola's talk)

Parameter server
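The asynchronous scheme can be sketched in a few lines: each model replica repeatedly pulls the current parameters, computes a gradient on its own data shard, and pushes the update back without waiting for the other replicas. Below, worker threads stand in for replicas and a lock-protected vector stands in for the sharded parameter server; the linear-regression task and all names (`ParameterServer`, `fetch`, `push`) are illustrative, not the actual system:

```python
import threading
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(400, 2))
y = X @ true_w                                   # noiseless targets

class ParameterServer:
    """Central store for the model parameters (a toy stand-in)."""
    def __init__(self, dim):
        self.w = np.zeros(dim)
        self.lock = threading.Lock()
    def fetch(self):
        with self.lock:
            return self.w.copy()
    def push(self, grad, lr=0.05):
        with self.lock:
            self.w -= lr * grad

def worker(ps, Xs, ys, steps=200):
    for i in range(steps):
        w = ps.fetch()                           # pull (possibly stale) parameters
        j = i % len(Xs)
        g = (Xs[j] @ w - ys[j]) * Xs[j]          # gradient on one local example
        ps.push(g)                               # push update; no sync with other workers

ps = ParameterServer(2)
shards = np.array_split(np.arange(400), 4)       # data parallelism: one shard per replica
threads = [threading.Thread(target=worker, args=(ps, X[s], y[s])) for s in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The key property illustrated: gradients may be computed against stale parameters, yet with a small step size the shared model still converges, and no replica ever blocks on a slow peer.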

Page 14:

Conclusions

• Scale deep learning 100x larger using distributed training on 1,000 machines: model parallelism plus data parallelism with a parameter server
• Brain simulation -> cat neuron, face neuron
• State-of-the-art performances on:
  - Object recognition (ImageNet: 0.005% random guess, 9.5% best published result, 15.8% our method)
  - Action recognition
  - Cancer image classification
• Other applications:
  - Speech recognition
  - Machine translation

Page 15:

References

• Q.V. Le, M.A. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, A.Y. Ng. Building high-level features using large-scale unsupervised learning. ICML, 2012.
• Q.V. Le, J. Ngiam, Z. Chen, D. Chia, P. Koh, A.Y. Ng. Tiled Convolutional Neural Networks. NIPS, 2010.
• Q.V. Le, W.Y. Zou, S.Y. Yeung, A.Y. Ng. Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR, 2011.
• Q.V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, A.Y. Ng. On optimization methods for deep learning. ICML, 2011.
• Q.V. Le, A. Karpenko, J. Ngiam, A.Y. Ng. ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS, 2011.
• Q.V. Le, J. Han, J. Gray, P. Spellman, A. Borowsky, B. Parvin. Learning Invariant Features for Tumor Signatures. ISBI, 2012.
• I.J. Goodfellow, Q.V. Le, A.M. Saxe, H. Lee, A.Y. Ng. Measuring invariances in deep networks. NIPS, 2009.

http://ai.stanford.edu/~quocle