
Page 1: Recent progress on distributing deep learning

Recent progress on distributing deep learning

Viet-Trung Tran
KDE Lab
Department of Information Systems
School of Information and Communication Technology

1

Page 2: Recent progress on distributing deep learning

Outline

•  State of the art
•  Overview of neural networks and deep learning
•  Deep learning: driving factors
•  Scaling deep learning

2

Page 3: Recent progress on distributing deep learning

3

Page 4: Recent progress on distributing deep learning

4

Page 5: Recent progress on distributing deep learning

5

Page 6: Recent progress on distributing deep learning

6

Page 7: Recent progress on distributing deep learning

Perceptron

7

Page 8: Recent progress on distributing deep learning

Feed forward neural network

8

Page 9: Recent progress on distributing deep learning

Training algorithm

•  While not done yet:
   – pick a random training case (x, y)
   – run the neural network on input x
   – modify the connections to make the prediction closer to y, following the gradient of the error w.r.t. the connections
   (a minimal runnable sketch of this loop follows below)
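A minimal runnable sketch of the loop above, using plain numpy and a single linear model as a stand-in for the neural network; the toy data, dimensions, and learning rate are illustrative only, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = rng.normal(size=3)                         # hidden "target" weights for the toy data
xs = rng.normal(size=(100, 3))
cases = [(x, float(true_w @ x)) for x in xs]        # toy (x, y) training cases

w = np.zeros(3)                                     # the "connections"
lr = 0.05                                           # learning rate

for _ in range(1000):                               # while not done yet
    x, y = cases[rng.integers(len(cases))]          # pick a random training case (x, y)
    prediction = w @ x                              # run the model on input x
    grad = (prediction - y) * x                     # gradient of the squared error w.r.t. w
    w -= lr * grad                                  # modify the connections along the negative gradient

print(np.round(w - true_w, 3))                      # close to zero after training
```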

9

Page 10: Recent progress on distributing deep learning

Parameter learning: back propagation of error

•  Calculate total error at the top
•  Calculate contributions to error at each step going backwards
   (a small numpy sketch follows below)
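To illustrate the two steps above, here is a small numpy sketch of backpropagation for a tiny 2-layer network with a squared-error loss; the architecture, shapes, and values are arbitrary, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=(1, 3)), np.array([[1.0]])   # one training case
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 1))

# Forward pass
h = np.tanh(x @ W1)                  # hidden activations
y_hat = h @ W2                       # prediction
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward pass: total error at the top, then contributions going backwards
d_yhat = y_hat - y                   # dL/dy_hat
dW2 = h.T @ d_yhat                   # dL/dW2
d_h = d_yhat @ W2.T                  # error reaching the hidden layer
dW1 = x.T @ (d_h * (1 - h ** 2))     # chain rule through tanh

# One step along the negative gradient of the error
lr = 0.1
W1 -= lr * dW1
W2 -= lr * dW2
print(float(loss))
```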

10

Page 11: Recent progress on distributing deep learning

Stochastic gradient descent (SGD)
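For reference, the mini-batch SGD update underlying the previous slides, written in standard notation (θ: parameters, η: learning rate, B: a mini-batch of training cases, ℓ: the per-example loss of the model f_θ); the notation is generic rather than taken from the slide.

```latex
\theta \;\leftarrow\; \theta \;-\; \eta \,\frac{1}{|B|} \sum_{(x,\,y) \in B} \nabla_{\theta}\, \ell\big(f_{\theta}(x),\, y\big)
```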

11

Page 12: Recent progress on distributing deep learning

12

Page 13: Recent progress on distributing deep learning

Fact

Anything humans can do in 0.1 sec, the right big 10-layer network can do too

13

Page 14: Recent progress on distributing deep learning

DEEP LEARNING: DRIVING FACTORS

14

Page 15: Recent progress on distributing deep learning

Big Data

15
Source: Eric P. Xing

Page 16: Recent progress on distributing deep learning

Computing resources

16

Page 17: Recent progress on distributing deep learning

"Modern" neural networks

•  Deeper but faster-to-train models
   – Deep belief networks
   – ConvNets
   – RNNs (LSTM, GRU)

17

Page 18: Recent progress on distributing deep learning

SCALING DISTRIBUTED DEEP LEARNING

18

Page 19: Recent progress on distributing deep learning

Growing Model Complexity

19
Source: Eric P. Xing

Page 20: Recent progress on distributing deep learning

Objective: minimizing time to results

•  Reduce experiment turnaround time
•  Focus on being fast rather than on optimizing resource usage

20

Page 21: Recent progress on distributing deep learning

Objective: improving results

•  Fact: increasing training examples, model parameters, or both can drastically improve ultimate classification accuracy
   – D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber. Deep big simple neural nets excel on handwritten digit recognition. CoRR, 2010.
   – R. Raina, A. Madhavan, and A. Y. Ng. Large-scale deep unsupervised learning using graphics processors. In ICML, 2009.

21

Page 22: Recent progress on distributing deep learning

Scaling deep learning

•  Leverage GPUs
•  Exploit multiple kinds of parallelism
   – Model parallelism
   – Data parallelism

22

Page 23: Recent progress on distributing deep learning

Why scale out?

•  We can use a cluster of machines to train a modestly sized speech model to the same classification accuracy in less than 1/10th the time required on a GPU

23

Page 24: Recent progress on distributing deep learning

Model parallelism

•  Parallelism in DistBelief

24

Page 25: Recent progress on distributing deep learning

Model parallelism [cont'd]

•  Message passing during the upward and downward phases
•  Distributed computation
•  Performance gains are limited by communication costs
   (an illustrative two-device sketch follows below)
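An illustrative (not DistBelief-specific) sketch of the idea: the layers of one network are split across two "devices" (here just numpy arrays), so the upward pass has to send activations across the partition boundary and the downward pass has to send gradients back; that boundary traffic is the communication cost mentioned above. All names, shapes, and values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=(1, 8)), np.zeros((1, 2))

W_dev0 = rng.normal(size=(8, 16))     # layer held by device 0
W_dev1 = rng.normal(size=(16, 2))     # layer held by device 1

# Upward phase: device 0 computes its part, then "sends" activations to device 1.
h = np.maximum(0, x @ W_dev0)         # message: h crosses the device boundary
y_hat = h @ W_dev1

# Downward phase: device 1 computes its gradient, then "sends" the error back.
d_yhat = y_hat - y
dW_dev1 = h.T @ d_yhat
d_h = d_yhat @ W_dev1.T               # message: d_h crosses the boundary again
dW_dev0 = x.T @ (d_h * (h > 0))       # chain rule through the ReLU on device 0

print(dW_dev0.shape, dW_dev1.shape)
```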

25

Page 26: Recent progress on distributing deep learning

26

Source: Jeff Dean

Page 27: Recent progress on distributing deep learning

Data parallelism: Downpour SGD

•  Divide the training data into a number of subsets
•  Run a copy of the model on each of these subsets
•  Before processing each mini-batch, a model replica
   – asks for up-to-date parameters
   – processes the mini-batch
   – sends back the gradients
•  To reduce communication overhead, request parameters from the parameter servers only every n_fetch steps and push updates only every n_push steps
•  A model replica is almost certainly working on a slightly out-of-date set of parameters
   (a runnable toy sketch of the replica loop follows below)
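A runnable toy version of the replica loop described above, for a linear model; the "parameter server" is just an in-process dict and the n_fetch / n_push values are illustrative, so this only mimics the communication pattern, not DistBelief itself.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = rng.normal(size=5)

def make_case():
    x = rng.normal(size=5)
    return x, true_w @ x + 0.01 * rng.normal()

data = [make_case() for _ in range(200)]        # this replica's subset of the training data
server = {"w": np.zeros(5)}                     # stand-in for the parameter server
n_fetch, n_push, lr = 5, 5, 0.05

w = server["w"].copy()                          # the replica's local copy of the parameters
acc = np.zeros(5)                               # gradients accumulated between pushes

for step, (x, y) in enumerate(data):
    if step % n_fetch == 0:
        w = server["w"].copy()                  # ask for up-to-date parameters
    grad = (w @ x - y) * x                      # gradient for this (size-1) mini-batch
    w -= lr * grad                              # local update with possibly stale parameters
    acc += grad
    if step % n_push == 0 and step > 0:
        server["w"] -= lr * acc                 # send the accumulated gradients back
        acc = np.zeros(5)

print(float(np.linalg.norm(server["w"] - true_w)))   # distance to the true weights shrinks
```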

27

Page 28: Recent progress on distributing deep learning

Sandblaster

•  The coordinator assigns each of the N model replicas a small portion of work, much smaller than 1/Nth of the total size of a batch
•  It assigns replicas new portions whenever they are free
•  It schedules multiple copies of the outstanding portions and uses the result from whichever model replica finishes first
   (a small scheduling sketch follows below)
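A hedged sketch of the scheduling policy above, simulated with a thread pool: portions much smaller than 1/N of a batch are handed out dynamically, and once the queue is empty the coordinator issues speculative backup copies of the outstanding portions and keeps whichever copy finishes first. The function names, timings, and portion counts are illustrative only.

```python
import concurrent.futures as cf
import random
import time

def process_portion(portion):
    # Stand-in for one model replica computing gradients on a small portion of the batch.
    time.sleep(random.uniform(0.01, 0.05))
    return portion, f"gradients(portion {portion})"

portions = list(range(20))   # many small portions, each much smaller than 1/N of a batch
N = 4                        # number of model replicas
results = {}

with cf.ThreadPoolExecutor(max_workers=N) as pool:
    pending = {}             # future -> portion currently being processed
    next_portion = 0
    while len(results) < len(portions):
        # Keep every replica busy; when no fresh portions remain, re-issue
        # outstanding ones as backup copies.
        while len(pending) < N:
            if next_portion < len(portions):
                p = portions[next_portion]
                next_portion += 1
            else:
                outstanding = [q for q in portions if q not in results]
                if not outstanding:
                    break
                p = random.choice(outstanding)      # speculative backup copy
            pending[pool.submit(process_portion, p)] = p
        done, _ = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
        for fut in done:
            p, grads = fut.result()
            results.setdefault(p, grads)            # first finisher wins
            del pending[fut]

print(f"collected gradients for {len(results)} portions")
```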

28

Page 29: Recent progress on distributing deep learning

AllReduce – Baidu DeepImage 2015

•  Each worker computes gradients and maintains a subset of the parameters
•  Every node fetches up-to-date parameters from all other nodes
•  Optimization: butterfly synchronization
   – requires log(N) steps
   – the last step performs the broadcast
   (a minimal simulation of the butterfly pattern follows below)
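A minimal simulation of the butterfly (recursive-doubling) exchange pattern behind this kind of all-reduce, assuming the node count is a power of two; this is the generic pattern, not necessarily Baidu's exact implementation. Each node holds one scalar "gradient" and ends up with the global sum after log2(N) steps.

```python
import math

N = 8
values = [float(rank + 1) for rank in range(N)]      # each node's local gradient (a scalar here)

for step in range(int(math.log2(N))):
    partner_of = [rank ^ (1 << step) for rank in range(N)]
    # In a real cluster each pair exchanges messages; here we just add the partner's value.
    values = [values[rank] + values[partner_of[rank]] for rank in range(N)]

print(values)        # every node now holds sum(1..8) == 36.0
```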

29

Page 30: Recent progress on distributing deep learning

Butterfly barrier

30

Page 31: Recent progress on distributing deep learning

Distributed Hogwild

•  Used by Caffe
•  Each node maintains a local replica of all parameters
•  In each iteration, a node computes gradients and applies updates locally
•  Updates are exchanged periodically
   (a toy numpy simulation follows below)
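A toy numpy simulation of the pattern above: each node keeps a full local replica, applies its own updates without coordination, and the replicas are periodically reconciled (here by simple averaging, which is one possible exchange rule, not necessarily Caffe's). The objective and all constants are made up.

```python
import numpy as np

NODES, DIM, STEPS, EXCHANGE_EVERY, LR = 4, 10, 100, 10, 0.1
rng = np.random.default_rng(0)
target = rng.normal(size=DIM)                       # toy "true" parameters
replicas = [np.zeros(DIM) for _ in range(NODES)]    # one full local replica per node

for step in range(1, STEPS + 1):
    for params in replicas:
        grad = params - target + 0.01 * rng.normal(size=DIM)   # noisy toy gradient
        params -= LR * grad                                    # local, unsynchronized update
    if step % EXCHANGE_EVERY == 0:
        mean = np.mean(replicas, axis=0)            # periodic exchange of updates
        for params in replicas:
            params[:] = mean

print(float(np.linalg.norm(replicas[0] - target)))  # close to 0 after training
```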

31

Page 32: Recent progress on distributing deep learning

DISTRIBUTED DEEP LEARNING FRAMEWORKS

32

Page 33: Recent progress on distributing deep learning

Parameter server [OSDI 2014]
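To make the slide title concrete, a minimal sketch of the core parameter-server abstraction: parameters live in a key-value store partitioned across server shards, and workers pull values and push updates by key. The classes, the hash partitioning, and the fixed vector size below are illustrative only, not the OSDI 2014 system's actual API.

```python
import numpy as np

class ServerShard:
    def __init__(self):
        self.table = {}                              # key -> parameter vector
    def pull(self, key):
        return self.table.setdefault(key, np.zeros(4))   # fixed size only for this demo
    def push(self, key, grad, lr=0.1):
        self.table[key] = self.pull(key) - lr * grad     # apply the update server-side

class ParameterServer:
    """Routes each key to one shard (hash partitioning)."""
    def __init__(self, num_shards=4):
        self.shards = [ServerShard() for _ in range(num_shards)]
    def _shard(self, key):
        return self.shards[hash(key) % len(self.shards)]
    def pull(self, key):
        return self._shard(key).pull(key)
    def push(self, key, grad):
        self._shard(key).push(key, grad)

ps = ParameterServer()
w = ps.pull("layer1/W")                              # a worker pulls current values
ps.push("layer1/W", np.ones(4))                      # and pushes a gradient
print(ps.pull("layer1/W"))                           # -> [-0.1, -0.1, -0.1, -0.1]
```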

33

Page 34: Recent progress on distributing deep learning

Apache Singa [2015]

•  National University of Singapore

34

Page 35: Recent progress on distributing deep learning

Petuum CMU [ACML 2015]

35

Page 36: Recent progress on distributing deep learning

Stale Synchronous Parallel (SSP)
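A small sketch of the SSP rule, assuming a staleness bound s: a worker may advance its clock only while it is at most s clocks ahead of the slowest worker, so fast workers eventually block and wait for stragglers. The threading setup and names are illustrative, not Petuum's API.

```python
import threading
import time

STALENESS = 2                          # the staleness bound s
clocks = [0, 0, 0, 0]                  # one logical clock per worker
lock = threading.Lock()

def run_worker(worker, iterations):
    for _ in range(iterations):
        while True:
            with lock:
                # Advance only if that keeps this worker within s clocks of the slowest.
                if clocks[worker] - min(clocks) < STALENESS:
                    clocks[worker] += 1
                    break
            time.sleep(0.001)          # a fast worker waits here for stragglers
        time.sleep(0.001 * (worker + 1))   # simulate unequal per-iteration compute time

threads = [threading.Thread(target=run_worker, args=(w, 20)) for w in range(len(clocks))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(clocks)                          # all workers reach 20; fast ones had to wait
```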

36

Page 37: Recent progress on distributing deep learning

Structure-Aware Parallelization (Strads engine)

37

Page 38: Recent progress on distributing deep learning

38

Page 39: Recent progress on distributing deep learning

TensorFlow

•  Data flow graph
•  The distributed version has just been released (based on gRPC)

39

Page 40: Recent progress on distributing deep learning

Deep learning on Spark

•  Deeplearning4j
•  Adatao/Arimo: scaling TensorFlow on Spark
•  Yahoo's lab released CaffeOnSpark
•  Data parallelism

40

Page 41: Recent progress on distributing deep learning

DEMO APPLICATIONS

41

Page 42: Recent progress on distributing deep learning

Vietnamese OCR

•  Recognizes whole text lines rather than individual words or characters
•  Very good results with just a ~20 MB model, ~30 pages

42

Page 43: Recent progress on distributing deep learning

Vietnamese predictive text model

•  ~20 MB plain text corpus
•  Sample outputs (in Vietnamese):
   – Chú hoài linh đẹp trai. Chú hoài linh
   – Chào buổi sáng
   – chị hát hay wa!! nghe thick a.
   – chị khởi my ơi e rất la hâm mộ
   – làm gì bây giờ khi
   – chú hoài linh thật đẹp zai và chú Trấn thành đẹp qá
   – chú hoài linh thật đẹp zai và chú Phánh

43

Page 44: Recent progress on distributing deep learning

•  ~14 MB plain text corpus
•  Sample outputs (in Vietnamese):
   – lịch sử ghi nhớ năm 1979
   – tại hội nghị, đồng chí Phạm Ngọc Thủy Võ Văn Kiệt
   – tại hội nghị, đồng chí Hồ Chí Minh nói
   – tại hội nghị, đồng chí Võ Nguyên Giáp và đồng chí Hồ Chí Minh đã ngồi ở
   – tại đại hội Đảng lần thứ nhất vào năm 1945,
   – Ngay từ những ngày đầu, Đúng như nhận xét của Giáo sư Nguyễn Văn Linh

44

Page 45: Recent progress on distributing deep learning

CONCLUSION

45

Page 46: Recent progress on distributing deep learning

Principles of ML System Design

•  ACML 2015: How to Go Really Big in AI: Strategies & Principles for Distributed Machine Learning
   – How to distribute?
   – How to bridge computation and communication?
   – How to communicate?
   – What to communicate?

46

Page 47: Recent progress on distributing deep learning

Thank you!

47

Page 48: Recent progress on distributing deep learning

48
How to Go Really Big in AI: Strategies & Principles for Distributed Machine Learning