TRANSCRIPT
Scaling Deep Learning to 100s of GPUs on Hops Hadoop
Fabio Buso, Software Engineer, Logical Clocks AB
HopsFS: Next-generation HDFS
37x the number of files
16x the throughput
Scale Challenge Winner (2017)
* https://www.usenix.org/conference/fast17/technical-sessions/presentation/niazi
** https://eurosys2017.github.io/assets/data/posters/poster09-Niazi.pdf
Hops platform
Projects, Datasets, Users
HopsFS, HopsYARN, MySQL NDB Cluster
Spark, TensorFlow, Hive, Kafka, Flink
Jupyter, Zeppelin
Jobs, Grafana, ELK
REST API
Version 0.3.0 just released!
Python first
Conda Repo
Project Conda env
Search
Install/Remove
Python-3.6, pandas-1.4, NumPy-0.9
Environment usable by Spark/TensorFlow
Hops Python library: makes development easy
● Hyperparameter searching
● Manage TensorBoard lifecycle
Find big datasets - Dela*
● Discover, share, and experiment with interesting datasets
● P2P network of Hops clusters
● ImageNet, YouTube8M, Reddit comments...
● Exploits unused bandwidth
*http://ieeexplore.ieee.org/document/7980225/ (ICDCS 2017)
Scale-out level 1: Parallel hyperparameter search
Parallel hyperparameter search
from hops import util, tflauncher

def model(lr, dropout):
    …

args_dict = {'learning_rate': [0.001, 0.0005, 0.0001], 'dropout': [0.45, 0.7]}
args_dict_grid = util.grid_params(args_dict)
tflauncher.launch(spark, model, args_dict_grid)
Starts 6 parallel experiments
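To make the six experiments concrete, here is a hypothetical re-implementation of what a grid expansion such as `util.grid_params` presumably does (the helper name `expand_grid` and the exact output shape are assumptions, not the Hops API): the cartesian product of three learning rates and two dropout values yields six argument dictionaries, one per parallel experiment.

```python
# Hypothetical sketch of grid expansion: the cartesian product of the
# parameter lists gives one dict per experiment (3 x 2 = 6 here).
from itertools import product

def expand_grid(args_dict):
    """Expand {param: [values]} into a list of single-valued dicts."""
    keys = sorted(args_dict)
    return [dict(zip(keys, combo))
            for combo in product(*(args_dict[k] for k in keys))]

grid = expand_grid({'learning_rate': [0.001, 0.0005, 0.0001],
                    'dropout': [0.45, 0.7]})
print(len(grid))  # 6 experiments to launch in parallel
```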
Scale-out level 2: Distributed training
TensorFlowOnSpark (TFoS) by Yahoo!
● Distributed TensorFlow over Spark
● Runs on top of a Hadoop cluster
● PS/workers executed inside Spark executors
● Uses Spark for resource allocation
  – Our version: exclusive GPU allocations
  – Parameter server(s) do not get GPU(s)
● Manages TensorBoard
Run TFoS
from tensorflowonspark import TFCluster, TFNode

def training_fun(argv, ctx):
    …..
    TFNode.start_cluster_server()
    …..

TFCluster.run(spark, training_fun, num_exec, num_ps, …)
Full conversion guide: https://github.com/yahoo/TensorFlowOnSpark/wiki/Conversion-Guide
Scale-out level: Master of the dark arts
Horovod
The parameter server (PS) architecture doesn't scale
From: https://github.com/uber/horovod
Horovod by Uber
● Based on previous work done by Baidu
● Organize workers in a ring
● Gradient updates distributed using all-reduce
● Synchronous protocol
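Some back-of-the-envelope arithmetic (mine, not from the talk) shows why the ring helps: each ring all-reduce worker sends roughly 2(N-1)/N of the gradient size no matter how many workers there are, while a lone parameter server must move a full gradient to and from every worker over one network link.

```python
# Illustrative traffic estimates, assuming N workers and a gradient of
# D bytes. These formulas are the standard ring all-reduce analysis,
# not numbers quoted in the talk.

def ring_bytes_per_worker(n_workers, grad_bytes):
    # (N-1) chunk sends of size D/N in reduce-scatter, plus (N-1) more
    # in all-gather: 2 * (N-1)/N * D, bounded by 2*D for any N.
    return 2 * (n_workers - 1) * grad_bytes / n_workers

def ps_bytes_at_server(n_workers, grad_bytes):
    # A single PS receives a gradient from, and returns a model to,
    # each of the N workers: 2 * N * D through one link.
    return 2 * n_workers * grad_bytes

D = 100 * 2**20  # a hypothetical 100 MB model
print(ring_bytes_per_worker(100, D) / 2**20)  # stays under 2x the model size
print(ps_bytes_at_server(100, D) / 2**20)     # grows linearly with workers
```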
All-Reduce: initial state (each GPU holds its own gradient chunks)
GPU1: a0 | b0 | c0
GPU2: a1 | b1 | c1
GPU3: a2 | b2 | c2
All-Reduce: reduce-scatter, step 1 (each GPU passes one chunk to its neighbor, which adds it in)
GPU1: a0 | b0 | c0 + c2
GPU2: a0 + a1 | b1 | c1
GPU3: a2 | b1 + b2 | c2
All-Reduce: reduce-scatter, step 2 (each GPU now owns one fully reduced chunk)
GPU1: a0 | b0 + b1 + b2 | c0 + c2
GPU2: a0 + a1 | b1 | c0 + c1 + c2
GPU3: a0 + a1 + a2 | b1 + b2 | c2
All-Reduce: all-gather, step 1 (fully reduced chunks circulate around the ring)
GPU1: a0 + a1 + a2 | b0 + b1 + b2 | c0 + c2
GPU2: a0 + a1 | b0 + b1 + b2 | c0 + c1 + c2
GPU3: a0 + a1 + a2 | b1 + b2 | c0 + c1 + c2
All-Reduce: all-gather, step 2 (every GPU now holds the full sum)
GPU1: a0 + a1 + a2 | b0 + b1 + b2 | c0 + c1 + c2
GPU2: a0 + a1 + a2 | b0 + b1 + b2 | c0 + c1 + c2
GPU3: a0 + a1 + a2 | b0 + b1 + b2 | c0 + c1 + c2
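The walkthrough above can be simulated in plain Python. This is a framework-free sketch of ring all-reduce, not Horovod's implementation: after n-1 reduce-scatter steps and n-1 all-gather steps, every worker holds the element-wise sum, and each worker only ever talks to its ring neighbor.

```python
# Minimal synchronous ring all-reduce simulation mirroring the
# three-GPU slides: reduce-scatter, then all-gather.

def ring_allreduce(workers):
    """workers: n equal-length lists of numbers; returns n reduced copies."""
    n = len(workers)
    chunks = [list(w) for w in workers]  # chunk j of worker i = chunks[i][j]

    # Reduce-scatter: in step s, worker i sends chunk (i - s) mod n to
    # worker i+1, which adds it into its own copy of that chunk.
    for step in range(n - 1):
        sends = [((i + 1) % n, (i - step) % n, chunks[i][(i - step) % n])
                 for i in range(n)]          # read everything first...
        for dst, c, val in sends:            # ...then apply (synchronous step)
            chunks[dst][c] += val

    # All-gather: worker i now owns fully reduced chunk (i+1) mod n;
    # circulate the finished chunks so every worker has all of them.
    for step in range(n - 1):
        sends = [((i + 1) % n, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n])
                 for i in range(n)]
        for dst, c, val in sends:
            chunks[dst][c] = val
    return chunks

sums = ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
# every worker ends with the element-wise sum [111, 222, 333]
```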
Hops AllReduce
import horovod.tensorflow as hvd

def conv_model(feature, target, mode):
    …..

def main(_):
    hvd.init()
    opt = hvd.DistributedOptimizer(opt)
    if hvd.local_rank() == 0:
        hooks = [hvd.BroadcastGlobalVariablesHook(0), ..]
        …..
    else:
        hooks = [hvd.BroadcastGlobalVariablesHook(0), ..]
        …..

from hops import allreduce
allreduce.launch(spark, 'hdfs:///Projects/…/all_reduce.ipynb')
Demo time!
Play with it → hops.io/?q=content/hopsworks-vagrant
Doc → hops.io
Star us! → github.com/hopshadoop
Follow us! → @hopshadoop