
Exploiting Bounded Staleness to Speed up Big Data Analytics

Henggang Cui, James Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Abhimanu Kumar, Jinliang Wei, Wei Dai, Greg Ganger, Phil Gibbons (Intel), Garth Gibson, and Eric Xing

Carnegie Mellon University

Big Data Analytics Overview

[Figure: an iterative program fits a model, reading huge input data and updating the model parameters (solution); the loop is labeled "One iteration"]

Henggang Cui © July 14, http://www.pdl.cmu.edu/

Big Data Analytics Overview

[Figure: partitioned input data feeds a parallel iterative program, which updates the model parameters (solution) held in a parameter server]

Goal: Less sync overhead

Outline

•  Two novel synchronization approaches
    •  Arbitrarily-sized Bulk Synchronous Parallel (A-BSP)
    •  Stale Synchronous Parallel (SSP)
•  LazyTable architecture overview
•  Taste of experimental results

Bulk Synchronous Parallel (BSP)

•  A barrier every clock (a.k.a. epoch)
•  In ML apps, often one iteration over input data

[Figure: thread progress illustration for Threads 1-3 over clocks 0-4, one iteration (0-4) per clock. Annotations: "Iterations complete, updates visible"; "Updates not necessarily visible"; "Thread 1 blocked by barrier"]
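The barrier-per-clock rule can be tried out with Python's standard threading.Barrier. This is a minimal sketch, not LazyTable code: the names run_bsp, NUM_THREADS, and NUM_CLOCKS are made up for illustration, and a shared counter stands in for model-parameter updates.

```python
import threading

NUM_THREADS = 3  # hypothetical sizes, chosen to match the illustration
NUM_CLOCKS = 5


def run_bsp(num_threads=NUM_THREADS, num_clocks=NUM_CLOCKS):
    """Run a toy BSP program; return the update counts each thread observed."""
    barrier = threading.Barrier(num_threads)
    lock = threading.Lock()
    updates = [0] * num_clocks  # stand-in for shared model parameters
    seen = [[0] * num_clocks for _ in range(num_threads)]

    def worker(tid):
        for clock in range(num_clocks):
            with lock:
                updates[clock] += 1  # this thread's update for the clock
            barrier.wait()  # block until every thread finishes the clock
            seen[tid][clock] = updates[clock]  # all updates are now visible

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

After each barrier, every thread observes all num_threads updates for that clock, which corresponds to the "iterations complete, updates visible" point in the illustration.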

Data Staleness

•  In BSP, threads can see "out-of-date" values
    •  May not see others' updates right away
    •  Convergent apps usually tolerate that
•  Allowing more staleness for speed
    •  Less synchronizing among threads
    •  More using cached values
    •  More delaying and batching of updates
•  But, too much staleness hurts convergence
    •  Important to have staleness bound
    •  Staleness should be tunable

Arbitrarily-sized BSP (A-BSP)

•  Work in each clock can be more than one iteration
    – Less synchronization overhead

[Figure: Threads 1-3 over clocks 0-2 with two iterations per clock (iterations 0-4). Annotation: "Thread 1 blocked by barrier"]
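A rough timing model shows why fewer barriers can reduce overhead. It is a sketch under stated assumptions, not a measurement: every barrier is assumed to cost a fixed barrier_cost (a made-up parameter), and a clock ends when the slowest thread finishes its iterations. The function name absp_total_time is illustrative.

```python
from typing import List


def absp_total_time(iter_times: List[List[float]], iters_per_clock: int,
                    barrier_cost: float = 0.0) -> float:
    """Total wall time when all threads hit a barrier after every clock.

    iter_times[t][i] is the time thread t needs for iteration i.
    """
    num_iters = len(iter_times[0])
    total = 0.0
    for start in range(0, num_iters, iters_per_clock):
        # Each thread runs its iterations for this clock back to back.
        per_thread = [sum(t[start:start + iters_per_clock]) for t in iter_times]
        # The barrier releases only when the slowest thread arrives.
        total += max(per_thread) + barrier_cost
    return total
```

With three threads that each need 1.0 sec per iteration and a hypothetical barrier cost of 0.5 sec, four iterations take 6.0 sec at one iteration per clock and 5.0 sec at two iterations per clock.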

Problem of (A-)BSP: Stragglers

•  A-BSP still has the straggler problem
    •  A slow thread will slow down all
    •  Stragglers are common in large systems
•  Many reasons for stragglers
    •  Hardware: lost packets, SSD cleaning, disk resets
    •  Software: garbage collection, virtualization
    •  Algorithmic: calculating objectives and stopping conditions

Stale Synchronous Parallel (SSP) [HotOS’13, NIPS’13]

•  Threads are allowed to be up to "slack" clocks ahead of the slowest thread

[Figure: Threads 1-3 over clocks 0-4 (iterations 0-4) with a slack of 1 clock: Thread 1 at clock 3, Thread 2 at clock 2]
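The slack rule can be written as a small bookkeeping class. This is a single-process sketch of the condition only, with hypothetical names (SSPClocks, may_work, tick); a real system would block the thread instead of returning False.

```python
class SSPClocks:
    """Track per-thread clocks and apply the SSP slack rule."""

    def __init__(self, num_threads, slack):
        self.clocks = [0] * num_threads  # clock each thread is working on
        self.slack = slack

    def may_work(self, tid):
        """True if thread tid is at most `slack` clocks ahead of the slowest."""
        return self.clocks[tid] - min(self.clocks) <= self.slack

    def tick(self, tid):
        """Thread tid finished its current clock and moves to the next one."""
        self.clocks[tid] += 1


# Slack of 1: a thread may run one clock ahead of the slowest, not two.
ssp = SSPClocks(num_threads=3, slack=1)
ssp.tick(0)
one_ahead_ok = ssp.may_work(0)
ssp.tick(0)
two_ahead_ok = ssp.may_work(0)
```

Setting slack=0 makes may_work return False as soon as a thread is ahead of any other, which is the barrier behavior of (A-)BSP.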

Two Dimensional Config. Space

•  Iters-per-clock and slack are both tunable
    •  A-BSP is SSP with a slack of zero
    •  Every SSP config. has an A-BSP counterpart with the same data staleness bound

[Figure: two timelines over iterations 0-4. A-BSP (iters-per-clock=2, slack=0): clocks 0-2. SSP (iters-per-clock=1, slack=1): clocks 0-4]

LazyTable Architecture

[Figure: partitioned input data feeds a parallel iterative program on LazyTable; the program reaches the sharded model parameters through Client-lib instances (three shown), which talk to the Servers (four shown)]

See the paper for more details

Primary Experimental Setup

•  Hardware information
    •  8 machines, each with 64 cores & 128GB RAM
•  Basic configuration
    •  One client & tablet server per machine
    •  One computation thread per core

Application Benchmark #1

•  Topic Modeling
    •  Algorithm: Gibbs Sampling on LDA
    •  Input: NY Times dataset
        – 300k docs, 100m words, 100k vocabulary
    •  Solution quality criterion: Loglikelihood
        – How likely the model is to generate the observed data
        – Becomes higher as the algorithm converges
        – A larger value indicates better quality

More apps described and used in paper

Controlling Data Staleness

•  SSP
    •  Larger slack -> more staleness
•  A-BSP
    •  Larger iterations-per-clock -> more staleness
•  The tradeoffs with increased staleness

Staleness Increases Iters/sec

[Figure: "Iterations per sec" panel, iterations done (0-60) vs. time (0-300 sec), iters-per-clock is 1. Curves: slack=0 (BSP), slack=1 (SSP), slack=3 (SSP)]

•  Larger iters per sec with more staleness

Staleness Reduces Converge/iter

[Figure: "Convergence per iter" panel, convergence (higher is better) vs. iterations done (0-60), iters-per-clock is 1. Curves: slack=0 (BSP), slack=1 (SSP), slack=3 (SSP)]

•  More iters to converge with more staleness

Sweet Spot Balances the Two

[Figure: three panels, each with curves for slack=0 (BSP), slack=1 (SSP), slack=3 (SSP). "Iterations per sec": iterations done (0-60) vs. time (0-300 sec). "Convergence per iter": convergence (higher is better) vs. iterations done (0-60). "Convergence per sec": convergence (higher is better) vs. time (0-300 sec)]

•  Speed up with a good slack

Key Takeaway Insight #1

[Figure: horizontal axis from fresher data to staler data. Curves: iterations per second, convergence per iteration, convergence per second. "The sweet spot" is marked on the figure]

SSP vs A-BSP

•  Similar performance
    •  In the absence of stragglers

What about an environment with stragglers?

Straggler Experiment #1

•  Stragglers caused by background disruption
    •  Fairly common in large, shared clusters
•  Experiment setup
    •  One disrupter process per machine
        – Uses 50% of CPU cycles
    •  Work (disrupt) or sleep randomly for t seconds
        – 10% work, 90% sleep

More straggler experiments in the paper

Straggler Results #1

[Figure: iteration time increase (%), 0-60, vs. disruption duration (1, 2, 4, 8 sec). Curves: ideal; wpc=2, slack=0 (A-BSP); wpc=1, slack=1 (SSP)]

•  Without disruption, each iter takes 4.2 sec
•  Ideal increase is 5%: a 50% slowdown with 10% probability
•  SSP tolerates transient stragglers
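The 5% ideal figure is the expected value of the slowdown. The snippet below restates that arithmetic; the function name ideal_increase_pct is made up, and treating the disrupter as a flat 50% slowdown while it is active is the slide's own simplification.

```python
def ideal_increase_pct(slowdown_pct, active_fraction):
    """Expected iteration-time increase (%) when work is slowed by
    slowdown_pct during the fraction of time the disrupter is active."""
    return slowdown_pct * active_fraction


BASE_ITER_SEC = 4.2  # from the slide: iteration time without disruption

increase = ideal_increase_pct(50.0, 0.10)  # 50% slowdown, active 10% of the time
ideal_iter_sec = BASE_ITER_SEC * (1 + increase / 100.0)
print(f"ideal increase: {increase:.1f}%, ideal iteration time: {ideal_iter_sec:.2f} sec")
```

Under these assumptions the ideal iteration time is about 4.41 sec, so any increase beyond 5% in the plot is time lost to waiting for stragglers.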

Conclusion

•  Staleness should be tuned
    •  By iters-per-clock and/or slack
•  LazyTable implements SSP and A-BSP
    •  See paper for details
•  Key results from experiments
    •  Both SSP and A-BSP are able to exploit the staleness sweet-spot for faster convergence
    •  SSP is tolerant of small transient stragglers
    •  But SSP incurs more communication traffic

References

•  J. Cipar, G. Ganger, K. Keeton, C. B. Morrey III, C. A. Soules, and A. Veitch. LazyBase: trading freshness for performance in a scalable database. EuroSys’12.
•  J. Cipar, Q. Ho, J. K. Kim, S. Lee, G. R. Ganger, G. Gibson, K. Keeton, and E. Xing. Solving the straggler problem with bounded staleness. HotOS’13.
•  NYTimes: http://archive.ics.uci.edu/ml/datasets/Bag+of+Words
•  Q. Ho, J. Cipar, H. Cui, S. Lee, J. Kim, P. Gibbons, G. Gibson, G. Ganger, and E. Xing. More effective distributed ML via a stale synchronous parallel parameter server. NIPS’13.


BACK-UP

Example: Topic Modeling

[Figure: a corpus of documents feeds the topic modeler, which uses a word-topic table and a doc-topic table. Example doc-topic row: Doc i, Topic 1: 0.8, Topic 2: 0.1, ...]

BSP Progress and Staleness

•  (i, j) represents iteration i, work unit j

[Figure: thread progress under BSP; barriers mark every clock boundary]

Clock      2                 3             4
Thread 1   ... (2,a) (2,b)   (3,a) (3,b)   (4,a) (4,b)
Thread 2   ... (2,c) (2,d)   (3,c) (3,d)   (4,c) (4,d)
Thread 3   ... (2,e) (2,f)   (3,e) (3,f)   (4,e) (4,f)

A-BSP Progress and Staleness

•  A-BSP, wpc = 2 iterations

[Figure: thread progress under A-BSP; two barriers are shown, one at the end of each clock]

Clock      1                 2
Thread 1   ... (2,a) (2,b)   (3,a) (3,b) (4,a) (4,b)
Thread 2   ... (2,c) (2,d)   (3,c) (3,d) (4,c) (4,d)
Thread 3   ... (2,e) (2,f)   (3,e) (3,f) (4,e) (4,f)

SSP Progress and Staleness

•  SSP, wpc = 1 iteration, slack = 1 clock
    •  Same staleness bound as the A-BSP one
        – But more flexible
•  Data staleness for SSP with wpc and slack:
    – wpc x (slack + 1) - 1

[Figure: thread progress under SSP with a slack of 1 clock]

Clock      2                 3             4
Thread 1   ... (2,a) (2,b)   (3,a) (3,b)   (4,a) (4,b)
Thread 2   ... (2,c) (2,d)   (3,c) (3,d)   (4,c) (4,d)
Thread 3   ... (2,e) (2,f)   (3,e) (3,f)   (4,e) (4,f)

SSP vs A-BSP

•  A-BSP is SSP with a slack of zero
•  Data staleness bound
    •  SSP {wpc, slack} == A-BSP {wpc x (slack + 1), 0}
•  SSP is a “pipelined” version of A-BSP
    •  Tolerates transient stragglers
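The staleness bound and the SSP-to-A-BSP mapping stated on these slides can be checked with two one-line functions. The function names are illustrative; the formulas are the ones given in the slides.

```python
def staleness_bound(wpc, slack):
    """Data staleness bound from the slides: wpc x (slack + 1) - 1."""
    return wpc * (slack + 1) - 1


def absp_counterpart(wpc, slack):
    """A-BSP config {wpc x (slack + 1), 0} with the same staleness bound."""
    return (wpc * (slack + 1), 0)


# The two example configs from the slides share one bound.
ssp_bound = staleness_bound(wpc=1, slack=1)
absp_bound = staleness_bound(*absp_counterpart(wpc=1, slack=1))
print(ssp_bound, absp_bound)
```

The three configurations compared later in the deck, {wpc=4, slack=0}, {wpc=2, slack=1}, and {wpc=1, slack=3}, also evaluate to the same bound under this formula.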

LazyTable Architecture

[Figure: two tablet server processes (process-0, process-1) and two client processes (process-0, process-1). Each client process contains two app. threads and the client library, with a thread cache/oplog per app. thread and one process cache/oplog]
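The slide names the components but not how a read flows through them. One plausible reading, sketched below, is that a read tries the thread cache, then the process cache, then the tablet server, and accepts a cached row only if it is fresh enough. All class and method names are hypothetical, the freshness test is an assumption, and the oplogs (batched writes) are left out.

```python
class TabletServer:
    """Toy stand-in for a tablet server: stores rows and a clock."""

    def __init__(self):
        self.rows = {}
        self.clock = 0
        self.reads = 0  # counts reads that reached the server

    def read(self, key, min_clock):
        self.reads += 1
        return self.rows.get(key, 0.0), self.clock


class Cache:
    """Toy cache layer (thread or process level) in front of another layer."""

    def __init__(self, backing):
        self.backing = backing
        self.entries = {}  # key -> (value, clock the value is fresh as of)

    def read(self, key, min_clock):
        hit = self.entries.get(key)
        if hit is not None and hit[1] >= min_clock:
            return hit  # cached copy is fresh enough (assumed rule)
        fresh = self.backing.read(key, min_clock)  # fall through one level
        self.entries[key] = fresh
        return fresh


# Wire the layers in the order the figure lists them.
server = TabletServer()
process_cache = Cache(server)
thread_cache = Cache(process_cache)
```

In this sketch a looser freshness requirement (a smaller min_clock) means more reads are served from the caches and fewer reach the server.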

Stragglers: Delay

•  Delaying some threads
    •  Artificially introduce stragglers to the system
    •  Have some threads sleep() for a time
•  Experiment setup
    •  Threads sleep “d” seconds in turn
        – Threads of machine “i” sleep at iteration “i”
    •  Compare influence of different “d”

Stragglers: Delay (Results)

[Figure: time per iteration (sec), 0-8, vs. delay d (0-10 sec). Curves: ideal; wpc=2, slack=0 (A-BSP); wpc=1, slack=1 (SSP)]

•  Ideal slow down: d/8 per iter on 8 machines
•  A-BSP slow down: d/2 per iter
•  SSP tolerates transient stragglers
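The d/8 and d/2 figures can be reproduced with a small barrier model. This is one reading of the setup, not the authors' code: machine i sleeps d seconds during iteration i, a clock ends when the slowest machine arrives at the barrier, and SSP is not modeled. The function names and the 1.0 sec base iteration time are hypothetical.

```python
def time_per_iter(num_machines, iter_sec, delay_sec, wpc):
    """Average seconds per iteration under A-BSP when machine i sleeps
    delay_sec during iteration i (one pass over all machines)."""
    total = 0.0
    for start in range(0, num_machines, wpc):
        clock_iters = range(start, min(start + wpc, num_machines))
        # Time each machine needs for this clock: its iterations plus any sleep.
        per_machine = [
            sum(iter_sec + (delay_sec if it == m else 0.0) for it in clock_iters)
            for m in range(num_machines)
        ]
        total += max(per_machine)  # the barrier waits for the slowest machine
    return total / num_machines


def ideal_time_per_iter(num_machines, iter_sec, delay_sec):
    """Ideal case: each machine loses delay_sec once per num_machines iterations."""
    return iter_sec + delay_sec / num_machines


absp = time_per_iter(num_machines=8, iter_sec=1.0, delay_sec=4.0, wpc=2)
ideal = ideal_time_per_iter(num_machines=8, iter_sec=1.0, delay_sec=4.0)
```

With d = 4 sec the model gives 1.0 + d/2 = 3.0 sec per iteration for A-BSP with wpc=2 and 1.0 + d/8 = 1.5 sec in the ideal case.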

The Cost of Increased Flexibility

•  Comparing {wpc=X, ...} with {wpc=2X, ...}
    •  Bytes sent doubled (send update twice as often)
    •  Bytes received almost doubled

[Figure: megabytes per iteration (MB/iter), 0-150, for Send and Receive. Bars: wpc=4, slack=0; wpc=2, slack=1; wpc=1, slack=3. Annotation: "smaller wpc, larger slack"]
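The "sent twice as often" argument can be put as a one-line model. It assumes, as the slide implies but does not state, that updates are flushed once per clock and that each flush carries about the same number of bytes; the function name and the 100 MB flush size are made up.

```python
def sent_mb_per_iter(flush_mb, wpc):
    """MB sent per iteration if one flush of flush_mb goes out every clock
    and a clock spans wpc iterations."""
    return flush_mb / wpc


# The three configurations in the figure, with a hypothetical 100 MB flush.
for wpc, slack in [(4, 0), (2, 1), (1, 3)]:
    print(f"wpc={wpc}, slack={slack}: {sent_mb_per_iter(100.0, wpc):.0f} MB/iter")
```

Halving wpc doubles the modeled bytes sent per iteration, which is the trend the figure annotates as "smaller wpc, larger slack".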

Key Takeaway Insight #3

•  SSP incurs more traffic
    •  Finer grained division of clocks
•  Avoiding barriers is still a win, if communication is not the bottleneck

Prefetching

•  Conservative prefetch
    •  Refresh row only when it’s too stale
•  Aggressive prefetch
    •  Refresh row every clock
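The two prefetch policies differ only in the refresh condition. The sketch below is an interpretation: "too stale" is assumed to mean that the cached copy is more than slack clocks old, and the function and policy names are hypothetical.

```python
def should_refresh(policy, row_clock, current_clock, slack):
    """Decide whether to refetch a cached row.

    row_clock: clock the cached copy reflects.
    current_clock: clock of the reading thread.
    """
    age = current_clock - row_clock
    if policy == "aggressive":
        return age >= 1  # refresh every clock
    if policy == "conservative":
        return age > slack  # refresh only when too stale to be used (assumed)
    raise ValueError(f"unknown policy: {policy}")


# A row cached at clock 0, read at clock 3, with a slack of 7 clocks.
print(should_refresh("conservative", 0, 3, 7), should_refresh("aggressive", 0, 3, 7))
```

Under this reading, aggressive prefetch keeps cached data fresher at the cost of more fetches, while conservative prefetch fetches less and lets rows age up to the staleness bound.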

Prefetching (results)

•  wpc = 1 iter, slack = 7 clocks

[Figure: loglikelihood (-1.08x10^9 to -9.60x10^8) vs. work done (0-60 iterations). Curves: conservative pf, aggressive pf]