solving the straggler problem with bounded staleness jim cipar, qirong ho, jin kyu kim, seunghak...

26
Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*, Eric Xing PARALLEL DATA LABORATORY Carnegie Mellon University * HP Labs

Upload: dwight-anthony

Post on 18-Jan-2018

219 views

Category:

Documents


0 download

DESCRIPTION

A typical ML algorithm Input data

TRANSCRIPT

Page 1: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Solving the straggler problem with bounded staleness

Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson,

Kimberly Keeton*, Eric Xing

PARALLEL DATA LABORATORYCarnegie Mellon University

* HP Labs

Page 2: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Overview

It’s time for all applications (and systems) to worry about data freshness

• Current focus: parallel machine learning

• Often limited by synchronization overhead

• What if we explicitly allow stale data?

Page 3: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

A typical ML algorithm

Input data

Page 4: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

A typical ML algorithm

Input data(1) initialization

IntermediateProgram state

Page 5: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

A typical ML algorithm

Input data(1) initialization

IntermediateProgram state

(2) Iterate, manysmall updates

Page 6: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

A typical ML algorithm

Input data

Output results

(1) initialization

IntermediateProgram state

(2) Iterate, manysmall updates

(3) Output resultsafter many iterations

Page 7: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

A typical ML algorithm

Input data

Output results

(1) initialization

IntermediateProgram state

(2) Iterate, manysmall updates

(3) Output resultsafter many iterations

Page 8: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Parallel ML

• Generally follows bulk synchronous parallel model

• Many iterations of1. Computation: compute new values2. Synchronization: wait for all other threads3. Communication: send new values to other threads4. Synchronization: wait for all other threads… again

Page 9: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

BSP Visualized

All threads must be on the same iteration to continue

Thre

ad

Page 10: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Stragglers in BSP

Slow thread(s) will hold up entire application

• Predictable stragglers– Slow/old machine– Bad network card– More data assigned to some threds

Page 11: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Stragglers in BSP

Slow thread(s) will hold up entire application

• Predictable stragglers Easy case– Slow/old machine– Bad network card– More data assigned to some threads

Page 12: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Stragglers in BSP

Slow thread(s) will hold up entire application

• Predictable stragglers Easy case• Unpredictable stragglers ???

Page 13: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Stragglers in BSP

Slow thread(s) will hold up entire application

• Predictable stragglers Easy case• Unpredictable stragglers ???

– Hardware: disk seeks, network, CPU interrupts– Software: garbage collection, virtualization– Algorithmic: Calculating objectives and stopping

conditions

Page 14: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Stragglers in BSP

Slow thread(s) will hold up entire application

• Predictable stragglers Easy case• Unpredictable stragglers ???

– Hardware: disk seeks, network latency, CPU interrupts

– Software: garbage collection, virtualization– Algorithmic: Calculating objectives, stopping

conditions

Don’t synchronize

Page 15: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Don’t synchronize?

• Well, don’t synchronize much– Read old (stale) results from other threads– Application controls how stale the data can be

• Machine learning can get away with that

• Algorithms are convergent– Given (almost) any state, will find correct solution– Errors introduced by staleness are usually ok

Page 16: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Trajectories of points in 2d

Points are initialized randomly,Always settle to correct locations

Page 17: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Freshness and convergence: the sweet spot

Fresh reads/writes

Stalereads/writes

Page 18: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Freshness and convergence: the sweet spot

Fresh reads/writes

Stalereads/writes

Iterations per second

Page 19: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Freshness and convergence: the sweet spot

Fresh reads/writes

Stalereads/writes

Iterations per second

Improvement per iteration

Page 20: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Freshness and convergence: the sweet spot

Fresh reads/writes

Stalereads/writes

Iterations per second

Improvement per iteration

Improvement per second

Page 21: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Freshness and convergence: the sweet spot

Fresh reads/writes

Stalereads/writes

Iterations per second

Improvement per second

The sweet spot

Page 22: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Stale synchronous parallel• Allow threads to continue ahead of others

– Avoids temporary straggler effects

• Application can limit allowed staleness– Ensure convergence– E.g. “threads may not be more than 3 iters ahead”

Page 23: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

SSP Visualized

Threads proceed, possibly using stale data

Thre

ad

Page 24: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Total convergence time

I Increased staleness can mask the effects of occasional delays

Page 25: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Ongoing work

• Characterizing “staleness-tolerant” algorithms– Properties of algorithms, rules of thumb– Convergence proof

• Automatically tune freshness requirement

• Specify freshness by error bounds– “Read X with no more than 5% error”

Page 26: Solving the straggler problem with bounded staleness Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton*,

Summary

Introducing staleness, but not too much staleness, can improve performance of machine

learning algorithms.