solving the straggler problem with bounded staleness jim cipar, qirong ho, jin kyu kim, seunghak...

Solving the straggler problem with bounded staleness

Jim Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson,

Kimberly Keeton*, Eric Xing

PARALLEL DATA LABORATORYCarnegie Mellon University

* HP Labs

Overview

It’s time for all applications (and systems) to worry about data freshness

• Current focus: parallel machine learning

• Often limited by synchronization overhead

• What if we explicitly allow stale data?

A typical ML algorithm

Input data

Input data(1) initialization

IntermediateProgram state

Input data(1) initialization

(2) Iterate, manysmall updates

Input data

Output results

(1) initialization

(3) Output resultsafter many iterations

Input data

Output results

(1) initialization

(3) Output resultsafter many iterations

Parallel ML

• Generally follows bulk synchronous parallel model

• Many iterations of1. Computation: compute new values2. Synchronization: wait for all other threads3. Communication: send new values to other threads4. Synchronization: wait for all other threads… again

BSP Visualized

All threads must be on the same iteration to continue

Stragglers in BSP

Slow thread(s) will hold up entire application

• Predictable stragglers– Slow/old machine– Bad network card– More data assigned to some threds

Stragglers in BSP

• Predictable stragglers Easy case– Slow/old machine– Bad network card– More data assigned to some threads

Stragglers in BSP

• Predictable stragglers Easy case• Unpredictable stragglers ???

Stragglers in BSP

– Hardware: disk seeks, network, CPU interrupts– Software: garbage collection, virtualization– Algorithmic: Calculating objectives and stopping

conditions

Stragglers in BSP

– Hardware: disk seeks, network latency, CPU interrupts

– Software: garbage collection, virtualization– Algorithmic: Calculating objectives, stopping

conditions

Don’t synchronize

Don’t synchronize?

• Well, don’t synchronize much– Read old (stale) results from other threads– Application controls how stale the data can be

• Machine learning can get away with that

• Algorithms are convergent– Given (almost) any state, will find correct solution– Errors introduced by staleness are usually ok

Trajectories of points in 2d

Points are initialized randomly,Always settle to correct locations

Freshness and convergence: the sweet spot

Fresh reads/writes

Stalereads/writes

Fresh reads/writes

Stalereads/writes

Iterations per second

Fresh reads/writes

Stalereads/writes

Improvement per iteration

Fresh reads/writes

Stalereads/writes

Improvement per iteration

Improvement per second

Fresh reads/writes

Stalereads/writes

Improvement per second

The sweet spot

Stale synchronous parallel• Allow threads to continue ahead of others

– Avoids temporary straggler effects

• Application can limit allowed staleness– Ensure convergence– E.g. “threads may not be more than 3 iters ahead”

SSP Visualized

Threads proceed, possibly using stale data

Total convergence time

I Increased staleness can mask the effects of occasional delays

Ongoing work

• Characterizing “staleness-tolerant” algorithms– Properties of algorithms, rules of thumb– Convergence proof

• Automatically tune freshness requirement

• Specify freshness by error bounds– “Read X with no more than 5% error”

Summary

Introducing staleness, but not too much staleness, can improve performance of machine

learning algorithms.

solving the straggler problem with bounded staleness jim cipar, qirong ho, jin kyu kim, seunghak...

Documents

event-driven packet processing · 2020-07-14 · packet...

probabilistically bounded staleness for practical partial

sgd on hadoop for big data & huge models alex beutel based...

implementation of software-based 2x2 mimo lte base...

vision system and depth processing for drc-hubo+ · vision...

2018 sec filing deadlines and financial statement staleness...

petuum a new platform for distributed machine learning on...

lean portfolio management - net objectivesproduct-portfolio...

petuum: a new platform for distributed machine learning on...

+ writing tips choosing the right words. + avoiding...

single-inductor, dual-input ccm boost converter for multi...

more effective distributed ml via a stale synchronous...

1 staleness vs.waiting time in universal discrete broadcast...

· (iii) meaning of staleness, symptoms, measures to avoid...

lightlda: big topic models on modest computer clusters ·...

scaling sgd to big data & huge models alex beutel based on...

fleet: online federated learning via staleness awareness

deluxe multi gym 10 - intersport winninger · avoid muscle...

distributed cache management · web caching simplest model:...

toward understanding the impact of staleness in ...€¦ ·...