![Page 1: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/1.jpg)
Industrial Level Deep Learning Training Infrastructure—the Practice and Experience from SenseTime
Shengen Yan
SenseTime Group Limited.
![Page 2: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/2.jpg)
The Success of Deep Learning
[Figure: Google Search trend, 2006-01 to 2016-01, annotated with "AlexNet won ImageNet"]
![Page 3: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/3.jpg)
What Led to the Success?
![Page 4: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/4.jpg)
Model Capacity: The Key to High Performance

| Network | Number of layers |
|---|---|
| LeNet | 5 |
| AlexNet (2012) | 8 |
| GoogLeNet (2014) | 22 |
| ResNet (2016) | 169 |
| Ours | 1207 |
![Page 5: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/5.jpg)
Computation power
Years → months → weeks → days
Training time is reduced from several years to several days!
![Page 6: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/6.jpg)
01 Deep Learning Package: A deep learning framework that is efficient, scalable, and flexible.
02 DeepLink: A large-scale cluster platform designed for deep learning.
03 Applications: Delivers many application models.
![Page 7: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/7.jpg)
Deep Learning is Complicated
The deep learning community developed frameworks to make life easier.
[Figure: GoogLeNet (2014)]
![Page 8: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/8.jpg)
Deep Learning Training Frameworks
‣ SenseTime Deep Learning Training Package
• Memory efficient
• Computation efficient
• Both model parallel & data parallel
• Supports huge models
• Scalability
![Page 9: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/9.jpg)
Memory Footprint Optimization
High-level compiler backend optimization algorithms applied to the intermediate representation.
Optimizations: liveness analysis, computation graph
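
The liveness analysis listed on this slide can be illustrated with a minimal sketch. It is not the package's implementation: the toy graph format, `last_use`, and `peak_memory` are hypothetical names, and the graph is forward-only (a training graph would also keep activations needed by the backward pass). The idea is that a tensor's memory can be released right after the last op that reads it.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph: each op is (output_tensor, input_tensors, output_size_in_bytes).
Op = Tuple[str, List[str], int]


def last_use(ops: List[Op]) -> Dict[str, int]:
    """Liveness analysis: index of the last op that produces or reads each tensor."""
    last: Dict[str, int] = {}
    for i, (out, inputs, _size) in enumerate(ops):
        last.setdefault(out, i)  # live at least until the op that produces it
        for name in inputs:
            last[name] = i       # later readers extend the live range
    return last


def peak_memory(ops: List[Op], reuse: bool) -> int:
    """Peak bytes held, with or without freeing tensors after their last use."""
    last = last_use(ops)
    sizes = {out: size for out, _inputs, size in ops}
    live = 0
    peak = 0
    for i, (out, inputs, size) in enumerate(ops):
        live += size
        peak = max(peak, live)
        if reuse:
            # Free every tensor produced inside the graph whose live range ends here.
            for name in set(inputs) | {out}:
                if name in sizes and last[name] == i:
                    live -= sizes[name]
    return peak


if __name__ == "__main__":
    chain = [("a", ["x"], 100), ("b", ["a"], 100), ("c", ["b"], 100), ("d", ["c"], 100)]
    print("no reuse:", peak_memory(chain, reuse=False))
    print("with reuse:", peak_memory(chain, reuse=True))
```

On a simple chain, freeing dead tensors keeps at most two buffers alive at once, regardless of depth.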
![Page 10: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/10.jpg)
Memory Footprint Optimization
[Figure: Generated graph with mirror (re-compute) nodes]
Chen T, Xu B, Zhang C, et al. Training deep nets with sublinear memory cost[J]. arXiv preprint arXiv:1604.06174, 2016.
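
The mirror (re-compute) idea from the cited paper can be sketched on a toy chain of scalar layers. This is an illustration under stated assumptions, not the package's code: `layer`, `grad_full`, and `grad_recompute` are hypothetical names, and each layer is simply `tanh(w * x)`. Only one input per segment is kept during the forward pass; the remaining inputs are recomputed segment by segment during the backward pass.

```python
import math
from typing import Dict, List


def layer(w: float, x: float) -> float:
    return math.tanh(w * x)


def layer_grad_input(w: float, x: float, grad_out: float) -> float:
    """Upstream gradient times d tanh(w * x) / dx."""
    y = math.tanh(w * x)
    return grad_out * (1.0 - y * y) * w


def grad_full(ws: List[float], x0: float) -> float:
    """Baseline: store every layer input, then backpropagate."""
    inputs = [x0]
    for w in ws:
        inputs.append(layer(w, inputs[-1]))
    grad = 1.0
    for w, x in zip(reversed(ws), reversed(inputs[:-1])):
        grad = layer_grad_input(w, x, grad)
    return grad


def grad_recompute(ws: List[float], x0: float, segment: int) -> float:
    """Store one input per segment; recompute the others during backward."""
    checkpoints: Dict[int, float] = {}  # layer index -> input of that layer
    x = x0
    for i, w in enumerate(ws):
        if i % segment == 0:
            checkpoints[i] = x
        x = layer(w, x)
    grad = 1.0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, len(ws))
        # Re-run the forward pass of this segment from its checkpoint.
        seg_inputs = [checkpoints[start]]
        for w in ws[start:end - 1]:
            seg_inputs.append(layer(w, seg_inputs[-1]))
        for w, x_in in zip(reversed(ws[start:end]), reversed(seg_inputs)):
            grad = layer_grad_input(w, x_in, grad)
    return grad


if __name__ == "__main__":
    weights = [0.5 + 0.1 * i for i in range(10)]
    print(grad_full(weights, 0.3), grad_recompute(weights, 0.3, segment=3))
```

With a segment length near the square root of the depth, the stored activations grow sublinearly with depth, at the cost of roughly one extra forward pass.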
![Page 11: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/11.jpg)
Model Capacity
[Figure: Memory usage efficiency (higher is better) for VGG, ResNet50, ResNet152, Inception V4, ResNet269, and Inception ResNet, comparing Ours, MXNet, TensorFlow, Chainer, Caffe, and Torch]
![Page 12: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/12.jpg)
Single-GPU Performance
Time per iteration in milliseconds:

| Framework | Batch-32 | Batch-64 | Batch-128 |
|---|---|---|---|
| Caffe | 497.5 | 1045 | 1965 |
| Chainer | 200 | 290 | 543 |
| TensorFlow | 178.6 | 315.7 | 587.2 |
| Parrots | 122.7 | 225.6 | 471 |
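
The timings on this slide can be turned into relative speedups with a few lines of arithmetic. The numbers below are copied from the slide; the dictionary layout and the `speedup_over` helper are illustrative names, not part of any framework.

```python
# Milliseconds per iteration on a single GPU, copied from the slide.
MS_PER_ITER = {
    "Caffe": {32: 497.5, 64: 1045.0, 128: 1965.0},
    "Chainer": {32: 200.0, 64: 290.0, 128: 543.0},
    "TensorFlow": {32: 178.6, 64: 315.7, 128: 587.2},
    "Parrots": {32: 122.7, 64: 225.6, 128: 471.0},
}


def speedup_over(baseline: str, batch: int, reference: str = "Parrots") -> float:
    """How many times faster `reference` is than `baseline` at this batch size."""
    return MS_PER_ITER[baseline][batch] / MS_PER_ITER[reference][batch]


if __name__ == "__main__":
    for name in ("Caffe", "Chainer", "TensorFlow"):
        ratios = ", ".join(f"batch {b}: {speedup_over(name, b):.2f}x" for b in (32, 64, 128))
        print(f"Parrots vs {name}: {ratios}")
```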
![Page 13: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/13.jpg)
Communication Optimization
Supports multiple GPUs and multiple nodes
Three steps: copy, allreduce, copy
Optimizations:
• Master-slave threads to overlap communication with computation
• GPU direct communication
• Ring allreduce message passing
[Figure: GPU0 to GPU3 copy to CPU memory, allreduce with other nodes, then copy back]
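
The ring allreduce named in the optimization list can be simulated in plain Python. This is a sketch of the generic algorithm, not the package's communication library: workers are modelled as lists in one process, and `ring_allreduce` is a hypothetical name. Each worker sends one chunk to its right-hand neighbour per step, first to accumulate partial sums (reduce-scatter), then to circulate the finished chunks (allgather).

```python
from typing import List, Tuple


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Return per-worker buffers after a simulated sum-allreduce over a ring."""
    n = len(buffers)
    length = len(buffers[0])
    assert all(len(b) == length for b in buffers), "all buffers must have equal length"
    bounds = [(c * length) // n for c in range(n + 1)]  # chunk c covers bounds[c]:bounds[c+1]
    data = [list(b) for b in buffers]

    def exchange(step_chunk, accumulate: bool) -> None:
        # Snapshot what every worker sends first, so all sends in a step are simultaneous.
        sends: List[Tuple[int, List[float]]] = []
        for r in range(n):
            c = step_chunk(r)
            sends.append((c, data[r][bounds[c]:bounds[c + 1]]))
        for r, (c, payload) in enumerate(sends):
            dst = (r + 1) % n
            for offset, value in enumerate(payload):
                i = bounds[c] + offset
                data[dst][i] = data[dst][i] + value if accumulate else value

    # Reduce-scatter: after n - 1 steps, worker r holds the full sum of chunk (r + 1) % n.
    for s in range(n - 1):
        exchange(lambda r, s=s: (r - s) % n, accumulate=True)
    # Allgather: pass the completed chunks around the ring.
    for s in range(n - 1):
        exchange(lambda r, s=s: (r + 1 - s) % n, accumulate=False)
    return data


if __name__ == "__main__":
    grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for worker, result in enumerate(ring_allreduce(grads)):
        print(f"worker {worker}: {result}")
```

Each worker sends and receives about 2(n - 1)/n of the buffer in total, so per-worker traffic stays nearly constant as the ring grows.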
![Page 14: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/14.jpg)
Scalability
[Figure: Time per iteration (millisec/iter) and scaling efficiency versus number of GPUs (1, 2, 3, 4, 8, 16, 24, 32), on a single node and across multiple nodes]
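
The slide does not state how scaling efficiency is computed. One common definition, assumed here, is achieved throughput divided by ideal linear throughput. The function name and the example numbers below are hypothetical and are not taken from the chart.

```python
def scaling_efficiency(
    n_gpus: int,
    ms_per_iter: float,
    samples_per_iter: int,
    base_ms_per_iter: float,
    base_samples_per_iter: int,
) -> float:
    """Throughput on n_gpus divided by n_gpus times the single-GPU throughput."""
    throughput = samples_per_iter / ms_per_iter
    base_throughput = base_samples_per_iter / base_ms_per_iter
    return throughput / (n_gpus * base_throughput)


if __name__ == "__main__":
    # Hypothetical numbers: 32 samples per GPU, one GPU takes 100 ms per iteration,
    # eight GPUs process 256 samples in 110 ms per iteration.
    eff = scaling_efficiency(8, 110.0, 256, 100.0, 32)
    print(f"scaling efficiency: {eff:.3f}")
```

An efficiency of 1.0 means perfectly linear scaling; communication overhead pushes it below 1.0.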
![Page 15: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/15.jpg)
01 Deep Learning Package: A deep learning framework that is efficient, scalable, and flexible.
02 DeepLink: A large-scale cluster platform designed for deep learning.
03 Applications: Delivers many application models.
![Page 16: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/16.jpg)
The Role of the Supercomputer
It is just like a highway in a city
It is key infrastructure for AI
![Page 17: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/17.jpg)
Supercomputing Centers for AI: The key infrastructure for AI research.
[Figure: DATA, COMPUTATION, MODEL, DeepLink]
![Page 18: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/18.jpg)
Challenges
‣ Interconnects at multiple levels
• GPUs, nodes, sub-networks
‣ Distributed data
• Random access becomes particularly difficult
‣ Scale vs. stability
• Failures of individual nodes/links
‣ Human resources
• Engineers who understand both deep learning & HPC are difficult to come by
![Page 19: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/19.jpg)
DeepLink Clusters: Designed for Deep Learning
Software-Hardware Co-design
Maximize respective strengths while ensuring optimal cooperation.
High-performance Hardware
• High speed interconnects
• High performance GPU computing
• Efficient distributed storage
Customized Middleware
• Distributed storage & cache system (optimized for small files)
• Distributed deep learning framework
• Task scheduling & monitoring
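
The slide gives no details of the cache system. As a generic illustration of why a cache helps with many small files, here is a minimal in-memory LRU cache placed in front of a slow fetch function. `SmallFileCache`, its parameters, and the in-memory `store` are all hypothetical and do not describe DeepLink's design.

```python
from collections import OrderedDict
from typing import Callable


class SmallFileCache:
    """LRU cache bounded by total bytes, in front of a slower `fetch(path)` call."""

    def __init__(self, fetch: Callable[[str], bytes], capacity_bytes: int) -> None:
        self._fetch = fetch
        self._capacity = capacity_bytes
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        if path in self._items:
            self._items.move_to_end(path)  # mark as most recently used
            self.hits += 1
            return self._items[path]
        self.misses += 1
        data = self._fetch(path)
        if len(data) <= self._capacity:    # files larger than the cache are not stored
            self._items[path] = data
            self._used += len(data)
            while self._used > self._capacity:
                _old_path, evicted = self._items.popitem(last=False)  # least recently used
                self._used -= len(evicted)
        return data


if __name__ == "__main__":
    store = {"a": b"1234", "b": b"5678", "c": b"9"}  # stand-in for remote storage
    cache = SmallFileCache(store.__getitem__, capacity_bytes=8)
    for name in ("a", "b", "a", "c", "b"):
        cache.get(name)
    print("hits:", cache.hits, "misses:", cache.misses)
```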
![Page 20: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/20.jpg)
Platform overview
Heterogeneous deep learning supercomputer
High speed storage system
Operation/Maintenance/Monitoring System
Lightweight virtualization
Task scheduling system
Distributed training software
Deep Learning Training Visualization System
Customized communication library for deep learning
Computation library
Distributed cache system
Software Platform
![Page 21: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/21.jpg)
Training Visualization
![Page 22: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/22.jpg)
DeepLink in SenseTime
>3000 GPUs
![Page 23: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/23.jpg)
01 Deep Learning Package: A deep learning framework that is efficient, scalable, and flexible.
02 DeepLink: A large-scale cluster platform designed for deep learning.
03 Applications: Delivers many application models.
![Page 24: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/24.jpg)
![Page 25: Industrial Level Deep Learning Training Infrastructure](https://reader034.vdocuments.net/reader034/viewer/2022050306/626f46ea0ed5d72a6d2c1d0d/html5/thumbnails/25.jpg)
THANK YOU