Urs Köster presenting at the RE-WORK Deep Learning Summit in Boston
TRANSCRIPT
Proprietary and confidential. Do not distribute.
Deep Learning at Scale
May 2016 Urs Köster, PhD
Nervana
MAKING MACHINES SMARTER.
About nervana
• A platform for machine intelligence
• enable deep learning at scale
• optimized from algorithms to silicon
The Nervana Platform - a full-stack solution
neon deep learning framework
nervana cloud
Solutions
Images
Text
Tabular
Speech
Time series
Video
neon: nervana python deep learning library
• User-friendly, extensible, fast
• Support for many deep learning models
• Interface to nervana cloud
• Multiple backends
• nervana engine
• GPU (optimized assembler kernels)
• CPU cluster
Open source (Apache 2.0) on github.com/nervanaSystems/neon
Nervana Cloud
web interface
command line
Deep learning as a core technology
‘Google Brain’ model, with DL at the core: Photos, Maps, Voice Search, Self-driving car, Ad Targeting, Machine Translation
Nervana Platform, with DL at the core: Image Classification, Object Localization, Video Indexing, Speech Recognition, Natural Language
Video recognition with 3D convolution
[Chart: Training Speed in epochs / hour (axis 0 to 1), neon vs. caffe]
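To illustrate what 3D convolution means for video, here is a minimal pure-Python sketch in which the kernel slides over time as well as over height and width. The function name `conv3d_valid` and the toy data are assumptions made for illustration; this is not how neon or caffe implement it, since both rely on optimized kernels.

```python
def conv3d_valid(video, kernel):
    """Naive 3D cross-correlation ('valid' padding, stride 1).

    video:  nested list indexed [t][y][x]  (frames, rows, columns)
    kernel: nested list indexed [dt][dy][dx]
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kT, kH, kW = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kT + 1):
        plane = []
        for y in range(H - kH + 1):
            row = []
            for x in range(W - kW + 1):
                acc = 0.0
                # Sum over the temporal and both spatial kernel axes.
                for dt in range(kT):
                    for dy in range(kH):
                        for dx in range(kW):
                            acc += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy input: 4 frames of 3x3 pixels, all ones, with a 2x2x2 averaging kernel.
video = [[[1.0] * 3 for _ in range(3)] for _ in range(4)]
kernel = [[[0.125] * 2 for _ in range(2)] for _ in range(2)]
result = conv3d_valid(video, kernel)
```

With these toy sizes the output has 3 time steps of 2x2 values, because each axis shrinks by the kernel size minus one.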
Object Localization / Segmentation
CamVid Dataset: SegNet model
KITTI Dataset: Fast R-CNN model
Model                            neon (ms)   caffe (ms)   Speedup
Fast-RCNN (batch size=4)         360         670          1.8x
SegNet (batch size=4)            267         1455         5.4x
SegNet (4 GPUs, batch size=16)   348         --           *5.9x
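The single-GPU speedup figures can be reproduced by dividing the caffe time by the neon time. The sketch below does that for the two batch-size-4 rows of the table above; the helper name `speedup` is an assumption for illustration. The 4-GPU row is left out because no caffe timing is listed for it and the basis of the starred figure is not stated.

```python
def speedup(baseline_ms, new_ms):
    """How many times faster the new timing is than the baseline."""
    return baseline_ms / new_ms


# (neon ms, caffe ms) as listed in the table, batch size 4.
timings = {
    "Fast-RCNN": (360, 670),
    "SegNet": (267, 1455),
}

ratios = {
    name: speedup(caffe_ms, neon_ms)
    for name, (neon_ms, caffe_ms) in timings.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The computed ratios are roughly 1.86 and 5.45, which the table lists as 1.8x and 5.4x.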
Image Classification (Residual Network)
Speech to text
Imagenet ILSVRC Challenge
[Chart: Top-5 error rate (axis 0% to 30%) by year, 2010 to 2015; annotations: Deep learning, human performance; labeled entries: AlexNet, Clarifai, GoogleNet, ResNet]
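For reference, top-5 error counts a prediction as correct when the true label is among the five highest-scoring classes. A minimal sketch, with the function name `top5_error` and made-up toy scores as assumptions (this is not the official ILSVRC evaluation code):

```python
def top5_error(scores, labels):
    """Fraction of examples whose true label is not among the 5 highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best five.
        top5 = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:5]
        if label not in top5:
            misses += 1
    return misses / len(labels)


# Toy data: 2 examples, 8 classes (made-up scores).
scores = [
    [0.05, 0.30, 0.10, 0.20, 0.15, 0.08, 0.07, 0.05],
    [0.40, 0.20, 0.15, 0.10, 0.08, 0.04, 0.02, 0.01],
]
labels = [3, 6]
error = top5_error(scores, labels)
```

Here the first example counts as a hit (class 3 has the second-highest score) and the second as a miss, so the toy error is one half.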
Speeding up Deep Learning
• Same model, better performance:
• Hardware improvements
• Algorithmic improvements
[Chart: Alexnet ms / iteration (axis 0 to 600, with one value labeled 15,000 ...) for CPU, GTX580, TitanX, neon]
[Chart: Soumith's AlexNet Benchmark, ms (axis 0 to 500), at 4/2015, 8/2015, 3/2016, neon vs. CuDNN]
[Chart: Soumith's GoogleNet Benchmark, ms (axis 0 to 500), at 4/2015, 8/2015, 3/2016, neon vs. CuDNN]
Dennard scaling has ended
[Chart legend: Transistors, Clock speed, Power, Perf / clock]
[Chart: learning speed vs. # of processors]
• Industry standard: communication overhead = performance ceiling
• Nervana: better communication fabric, near-linear scaling
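The performance-ceiling argument can be sketched with a toy model in which each training step costs the compute time divided by the number of processors, plus a fixed communication cost. The model, the name `scaling_speedup`, and all numbers below are assumptions for illustration, not Nervana measurements.

```python
def scaling_speedup(n_procs, compute_s=1.0, comm_s=0.0):
    """Speedup over one processor when a step costs compute_s / n + comm_s.

    comm_s is a fixed per-step communication cost, paid only when n_procs > 1.
    """
    overhead = comm_s if n_procs > 1 else 0.0
    return compute_s / (compute_s / n_procs + overhead)


for n in (1, 2, 4, 8, 16, 32):
    high = scaling_speedup(n, comm_s=0.10)   # hypothetical slow interconnect
    low = scaling_speedup(n, comm_s=0.001)   # hypothetical fast interconnect
    print(f"{n:>2} processors: {high:5.2f}x vs {low:5.2f}x")
```

In this model the speedup can never exceed compute_s / comm_s, so the slow interconnect flattens out below 10x, while the fast one stays close to linear over the range shown.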
Nervana Engine (coming in 2017)
• Unprecedented computing power
• 10x speedup over current GPUs
• More memory on-chip
• High-Bandwidth Memory off-chip
• Six bi-directional high-bandwidth links for 3D torus interconnect
• 8 chips in a box, seamlessly scale to multiple chassis
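Six links per chip match the six neighbors each node has in a 3D torus: one in each direction along three axes, with wraparound at the edges. A minimal sketch of that addressing follows; the function name `torus_neighbors` and the 4x4x4 dimensions are assumptions for illustration, since the slides do not state how chips are arranged.

```python
def torus_neighbors(coord, dims):
    """Return the six neighbors of a node in a 3D torus with wraparound."""
    x, y, z = coord
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


# Hypothetical 4x4x4 torus; the corner node wraps around to the far faces.
neighbors = torus_neighbors((0, 0, 0), dims=(4, 4, 4))
```

Note that along any axis of size 2 the two directions lead to the same node, so the six links reach fewer than six distinct neighbors in that case.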
Summary
• Deep learning is a new computational paradigm
• Learning and Inference on data
• neon with state-of-the-art GPU kernels
• Nervana Cloud with multi-GPU training
• Watch for Nervana Engine deep learning processor