towards general-purpose resource management in shared ...jcmace/presentations/mace14...•5 design...

33
Towards General-Purpose Resource Management in Shared Cloud Services Jonathan Mace, Brown University Peter Bodik, MSR Redmond Rodrigo Fonseca, Brown University Madanlal Musuvathi, MSR Redmond

Upload: others

Post on 01-Dec-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

Towards General-Purpose Resource Management

in Shared Cloud Services

Jonathan Mace, Brown University

Peter Bodik, MSR Redmond

Rodrigo Fonseca, Brown University

Madanlal Musuvathi, MSR Redmond

Page 2: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

Shared-tenant cloud services

Processes service requests from multiple clients

✓ Great for cost and efficiency

✘ Performance is a challenge

Aggressive tenants and system maintenance tasks

Resource starvation and bottlenecks

Degraded performance, Violated SLOs, system outages

2

Page 3: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

Ideally

manage resources to provide end-to-end guarantees and isolation

Challenge

OS/hypervisor mechanisms insufficient✘ Shared threads & processes✘ Application-level resource bottlenecks (locks, queues)✘ Resources across multiple processes and machines

Today

lack of guarantees, isolation

some ad-hoc solutions

3

Shared-tenant cloud services

Page 4: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

This paper

• 5 design principles for resource policies in shared-tenant systems

• Retro – prototype for principled resource management

• Preliminary demonstration of Retro in HDFS

4

Page 5: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

Hadoop Distributed File System (HDFS)

5

HDFS NameNode HDFS DataNodeHDFS DataNode

HDFS DataNode

Filesystem metadata Replicated block storage

Page 6: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

6

HDFS NameNode HDFS DataNodeHDFS DataNode

HDFS DataNode

Hadoop Distributed File System (HDFS)

Filesystem metadata Replicated block storage

Page 7: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

7

Page 8: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

8

Page 9: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

HDFS NameNode HDFS DataNodeHDFS DataNode

HDFS DataNode

9

Page 10: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

HDFS NameNode HDFS DataNodeHDFS DataNode

HDFS DataNode

10

Page 11: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

HDFS NameNode HDFS DataNodeHDFS DataNode

HDFS DataNode

11

Page 12: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

Principle 1: Consider all resources and request types

• Fine-grained resources within processes

• Resources shared between processes (disk, network)

• Many different API calls

• Bottlenecks can crop up in many placeshardware resources: disk, network, cpu, …software resources: locks, queues, …data structures: transaction logs, shared batches, …

12

Page 13: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

HDFS NameNode HDFS DataNodeHDFS DataNode

HDFS DataNode

13

Page 14: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

HDFS NameNode HDFS DataNodeHDFS DataNode

HDFS DataNode

14

Page 15: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

HDFS NameNode HDFS DataNodeHDFS DataNode

HDFS DataNode

15

Page 16: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

HDFS NameNode HDFS DataNodeHDFS DataNode

HDFS DataNode

16

Page 17: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

Principle 2: Distinguish between tenants

• Tenants might send different types of requests

• Tenants might be utilizing different machines

• If a policy is efficient, it should be able to target the cause of contention

e.g.,

if a tenant is causing contention, throttle

otherwise leave the tenant alone

17

Page 18: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

HDFS NameNode HDFS DataNodeHDFS DataNode

HDFS DataNode

18

Page 19: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

HDFS NameNode HDFS DataNodeHDFS DataNode

HDFS DataNode

Admission Control

19

Page 20: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

HDFS NameNode HDFS DataNodeHDFS DataNode

while (!Thread.isInterrupted()){

sendPacket();

}

HDFS DataNode

Admission Control

20

Page 21: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

HDFS NameNode HDFS DataNodeHDFS DataNode

while (!Thread.isInterrupted()){

rate_limit();

sendPacket();

}

Principle 5:

Schedule early, schedule often

21

HDFS DataNode

Admission Control

Page 22: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

Resource Management Design Principles

1. Consider all request types and all resources

2. Distinguish between tenants

3. Treat foreground and background tasks uniformly

4. Estimate resource usage at runtime

5. Schedule early, schedule often

Retro – prototype for principled resource management in shared-tenant systems

22

Page 23: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

Retro: end-to-end tracing

23

Tenants

Page 24: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

Retro: end-to-end tracing

24

Tenants

Page 25: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

Retro: application-level resource interception

25

Tenants

Page 26: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

Retro: aggregation and centralized reporting

26

Tenants

Page 27: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

Retro: application-level enforcement

27

Tenants

Page 28: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

Retro: distributed scheduling

28

Tenants

Page 29: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

Tenants

29

Retro: distributed scheduling

Page 30: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

Early Results

30

Op

en

Rea

d

Cre

ate

Ren

ame

Del

ete

No

rmal

ized

Th

rou

ghp

ut

HDFS

HDFS w/ Retro

1.1

1

0.9

Op

en

Rea

d

Cre

ate

Ren

ame

Del

ete

No

rmal

ized

Lat

ency

1.2

1

0.8

HDFS NNBenchbenchmark

0.01% to 2% average overhead on end-to-end latency, throughput

Page 31: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

HDFS NameNode HDFS DataNodeHDFS DataNode

HDFS DataNode

31

Page 32: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

HDFS NameNode HDFS DataNodeHDFS DataNode

HDFS DataNode

32

Page 33: Towards General-Purpose Resource Management in Shared ...jcmace/presentations/mace14...•5 design principles for resource policies in shared-tenant systems •Retro –prototype for

Retrospective

Thus far:• Per-tenant identification

• Resource measurements

• Schedule enforcement

Next steps:• Abstractions for writing simplified high-level policies

• Low-level enforcement mechanisms

• Policies to monitor system, find bottlenecks, provide guarantees

33