tarcil: reconciling scheduling speed and quality in large...

29
Christina Delimitrou 1 , Daniel Sanchez 2 and Christos Kozyrakis 1 1 Stanford University, 2 MIT SOCC – August 27 th 2015 Tarcil: Reconciling Scheduling Speed and Quality in Large Shared Clusters

Upload: others

Post on 03-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

Christina Delimitrou1, Daniel Sanchez2 and Christos Kozyrakis1

1Stanford University, 2MIT

SOCC  –  August  27th  2015  

Tarcil: Reconciling Scheduling Speed and Quality in Large Shared Clusters

Page 2: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

2

¨  Goals of cluster scheduling

¤  High decision quality ¤  High scheduling speed

¨  Problem: Disparity in scheduling designs ¤  Centralized schedulers à High quality, low speed ¤  Sampling-based schedulers à High speed, low quality

¨  Tarcil: Key scheduling techniques to bridge the gap ¤  Account for resource preferences à High decision quality

¤  Analytical framework for sampling à Predictable performance ¤  Admission control àHigh quality & speed

¤  Distributed design à High scheduling speed

Executive Summary

High performance High cluster utilization

Page 3: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

3

Motivation

¨  Optimize scheduling speed (sampling-based, distributed)

¨  Optimize scheduling quality (centralized, greedy)

Good: Short jobs

Good: Long jobs

Bad: Long jobs

Bad: Short jobs

Short: 100msec, Medium: 1-10sec, Long: 10sec-10min

Page 4: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

4

Motivation

¨  Optimize scheduling speed (sampling-based, distributed)

¨  Optimize scheduling quality (centralized, greedy)

Good: Short jobs

Good: Long jobs

Bad: Long jobs

Bad: Short jobs

Short: 100msec, Medium: 1-10sec, Long: 10sec-10min

Page 5: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

5

Key Scheduling Techniques at Scale

Page 6: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

6

1. Determine Resource Preferences

¨  Scheduling quality depends on: interference, heterogeneity, scale up/out, … ¤  Exhaustive exploration à infeasible ¤  Practical data mining framework1

¤ Measure impact of a couple of allocations à estimate for large space

1C. Delimitrou and C. Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management. In ASPLOS 2014.

Page 7: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

7

Example: Quantifying Interference

¨  Interference: set of microbenchmarks of tunable intensity (iBench)

1C. Delimitrou and C. Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management. In ASPLOS 2014.

Measure tolerated & generated interference

QoS

68%

Resource Quality Q

QoS

7% …

Data mining: Recover missing resources

Page 8: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

8

2. Analytical Sampling Framework

¨  Sample w.r.t. required resource quality

Page 9: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

9

2. Analytical Sampling Framework

¨  Fine-grain allocations: partition servers in Resource Units (RU) à minimum allocation unit

RU

Single-threaded apps Reclaim unused resources

Page 10: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

10

2. Analytical Sampling Framework

¨  Match a new job with required quality Q to appropriate RUs

QR1

QR2 QR3 QR4 QR5 QR6 QR7 QR8 QR9

QR10

QR20

QR30

QR42

QR54

QR11

QR21

QR31

QR43

QR55

QR61

QR67

QR60

QR66

QR74

Page 11: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

11

2. Analytical Sampling Framework

¨  Rank resources by quality

Page 12: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

12

2. Analytical Sampling Framework

¨  Break ties with a fair coin à uniform distribution

CD

F

Resource Quality Q

Page 13: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

13

2. Analytical Sampling Framework

¨  Break ties with a fair coin à uniform distribution

CD

F

Resource Quality Q

Better resources

Worse resources

Page 14: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

14

2. Analytical Sampling Framework

¨  Sample on uniform distribution à guarantees on resource allocation quality

Pr[Q≤x] = xR

Pr[Q<0.8]=10-3

Page 15: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

15

Validation

¨  100 server EC2 cluster ¨  Short Spark tasks ¨  Deviation between analytical and empirical is minimal

Page 16: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

16

Sampling at High Load

¨  Performance degrades (with small sample size) ¨  Or sample size needs to increase

Page 17: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

17

3. Admission Control

¨  Queue jobs based on required resource quality ¨  Resource quality vs. waiting time à set max waiting time limit

Q

Page 18: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

18

Tarcil Implementation

¨  4,000 loc in C/C++ and Python

¨  Supports apps in various frameworks (Hadoop, Spark, key-value stores)

¨  Distributed design: Concurrent scheduling agents (sim. Omega2) ¤  Each agent has local copy of state, one resilient master copy

¤  Lock-free optimistic concurrency for conflict resolution (rare) à Abort and retry

¤  30:1 worker to scheduling agent ratio

2M. Schwarzkopf, A. Konwinski, et al. Omega: flexible, scalable schedulers for large compute clusters. In EuroSys 2013.

Page 19: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

19

Evaluation Methodology

1.  TPC-H Workload ¤  ~40k queries of different types ¤  Compare with a centralized scheduler (Quasar) and a distributed

scheduler based on random sampling (Sparrow) ¤  110-server EC2 cluster (100 workers, 10 scheduling agents)

n  Homogeneous cluster, no interference n  Homogeneous cluster, with interference n  Heterogeneous cluster, with interference

¨  Metrics: ¤  Task performance ¤  Performance predictability ¤  Scheduling latency

Page 20: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

20

Evaluation

Centralized: high overheads

Sparrow and Tarcil: similar

Page 21: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

21

Evaluation

Centralized: high overheads

Sparrow and Tarcil: similar

Centralized and Sparrow: comparable performance Tarcil: 24% lower completion time

Page 22: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

22

Evaluation

Centralized: high overheads

Sparrow and Tarcil: similar

Centralized and Sparrow: comparable performance Tarcil: 24% lower completion time

Centralized outperforms Sparrow Tarcil: 41% lower completion time & less jitter

Page 23: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

23

Scheduling Overheads

Heterogeneous, with interference

¨  Centralized: Two orders of magnitude slower than the distributed, sampling-based schedulers

¨  Sparrow and Tarcil: Comparable scheduling overheads

Page 24: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

24

Resident Load

¨  Tarcil and Centralized account for cross-job interference à preserve memcached’s QoS

¨  Sparrow causes QoS violations for memcached

memcached

Page 25: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

25

Motivation Revisited

Distributed, sampling-based

Centralized, greedy

Tarcil

Short: 100msec Medium: 1-10sec Long:10sec-10min

Page 26: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

26

More details in the paper…

¨  Sensitivity on parameters such as: ¤  Cluster load ¤  Number of scheduling agents ¤  Sample size ¤  Task duration, etc.

¨  Job priorities

¨  Large allocations

¨  Generic application scenario (batch and latency-critical) on 200 EC2 servers

Page 27: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

27

Conclusions

¨  Tarcil: Reconciles high quality and high speed scheduling ¤  Account for resource preferences ¤  Analytical sampling framework to improve predictability ¤  Admission control to maintain high scheduling quality at high load ¤  Distributed design to improve scheduling speed

¨  Results: ¤  41% better performance than random sampling-based schedulers ¤  100x better scheduling latency than centralized schedulers ¤  Predictable allocation quality & performance

Page 28: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

28

Questions?

¨  Tarcil: Reconciles high quality and high speed schedulers ¤  Account for resource preferences ¤  Analytical sampling framework to improve predictability ¤  Admission control to maintain high scheduling quality at high load ¤  Distributed design to improve scheduling speed

¨  Results: ¤  41% better performance than random sampling-based schedulers ¤  100x better scheduling latency than centralized schedulers ¤  Predictable allocation quality & performance

Page 29: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and

29

   

 Thank you

Questions??