Page 1: Bridging the Tenant-Provider Gap in Cloud Services

Bridging the Tenant-Provider Gap in Cloud Services

Virajith Jalaparti, Hitesh Ballani, Paolo Costa, Thomas Karagiannis, Ant Rowstron

Page 2: Bridging the Tenant-Provider Gap in Cloud Services

04/22/2023 Bridging the Tenant-Provider Gap in Cloud Services 2

Today’s Interface to the Cloud

• Resource-centric interface
  – “I want 100 small VMs”

• Per-VM per-hour pricing
  – E.g.: $0.08 per hour in Amazon EC2

Simple but problematic!

Page 3: Bridging the Tenant-Provider Gap in Cloud Services

Using the Resource-centric Interface

A user's job finishes in T hrs on a private cluster of 40 machines.

On a cloud provider with 40 VMs, the same job may finish in T hrs and cost $40T, or it may take 2T hrs and cost $80T!

Unpredictable/variable performance and costs

Page 4: Bridging the Tenant-Provider Gap in Cloud Services

Proposal: Job-centric Interface

The user asks the cloud provider to “Finish in T hrs”; the provider runs the job on dedicated resources, and it is done in T hrs.

• Tenant specifies the high-level goals they care about
  – Completion time, cost to run a job, etc.

• Provider determines the resources to use to meet the tenant's goals
  – VMs, network bandwidth, etc.

Page 5: Bridging the Tenant-Provider Gap in Cloud Services

Proposal: Job-centric Interface

Guaranteed performance for tenants

Page 6: Bridging the Tenant-Provider Gap in Cloud Services

Proposal: Job-centric Interface

Incentive for provider?

Exploit multi-resource tradeoff

[Figure: Resource trade-off curve for LinkGraph completing in 300 sec. X-axis: number of compute instances (N); y-axis: bandwidth (B) in Mbps. Marked points: <N,B> = <10, 150> and <N,B> = <20, 100>.]

Exploiting this trade-off increases goodput/revenue.

Page 7: Bridging the Tenant-Provider Gap in Cloud Services

Outline
• Motivation
• Job-centric Interface
  – Multi-resource tradeoff
• Bazaar: A Job-centric Cloud Framework
  – Performance Prediction
  – Resource Selection
• Evaluation
• Bazaar: Extensions and Opportunities
• Conclusion

Page 8: Bridging the Tenant-Provider Gap in Cloud Services

Bazaar: A Job-centric Cloud Framework

The user submits a job specification and a completion time to Bazaar. Performance prediction produces candidate resource tuples <N1,B1>, <N2,B2>, … <Nk,Bk>, and resource selection uses the datacenter state to choose one tuple <N,B>.

Focus on MapReduce applications

Focus on two resources: compute (N) and bandwidth (B)

Notation: <N,B> denotes the resources allocated (a resource tuple)

Page 9: Bridging the Tenant-Provider Gap in Cloud Services

Performance Prediction
• Well-studied area
  – Run-time profiling, static analysis, simulations
• Bazaar requirements
  – Fast prediction (trades off with accuracy)
  – Account for the network along with compute
    • Not addressed by Jockey, MRPerf, Aria

Dedicated N and B make the problem tractable

Page 10: Bridging the Tenant-Provider Gap in Cloud Services

MRCuTE: Performance Prediction in Bazaar

MRCuTE takes the job specification (program P, input data I, sample data Is) and the resource parameters <N, B>, and outputs the completion time.

Analytical modeling + profiling-based approach (an analytical model fed by a profiler)

[Figure: a MapReduce job whose map tasks and reduce tasks proceed through a map phase, a shuffle phase, and a reduce phase.]

Completion time is determined by (a) the input data size and (b) the rate of progress, which is job-specific

Program P is profiled using sample data Is on one machine
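To make the idea concrete, here is a deliberately simplified sketch of a phase-based completion-time model in the spirit described above. It is not MRCuTE's actual model. It assumes that input is split evenly over the N VMs, that the map, shuffle, and reduce phases run back to back without overlap, and that each VM's shuffle traffic is limited by its bandwidth B; the `Profile` rates are hypothetical stand-ins for what profiling P on Is would measure.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical per-VM rates, as if measured by profiling program P on sample data Is."""
    map_rate_mb_s: float     # input processed per second by one VM in the map phase
    reduce_rate_mb_s: float  # shuffled data processed per second by one VM in the reduce phase
    shuffle_ratio: float     # map output size as a fraction of map input size


def completion_time(input_mb: float, n_vms: int, bw_mbps: float, prof: Profile) -> float:
    """Estimate job completion time in seconds for resource tuple <N, B>.

    Simplifying assumptions: data is split evenly across the N VMs, and the
    map, shuffle, and reduce phases run strictly one after another.
    """
    per_vm_input_mb = input_mb / n_vms
    map_s = per_vm_input_mb / prof.map_rate_mb_s

    # Each VM moves its share of the map output over a link of bw_mbps.
    per_vm_shuffle_mb = per_vm_input_mb * prof.shuffle_ratio
    shuffle_s = per_vm_shuffle_mb / (bw_mbps / 8.0)  # Mbps -> MB/s

    reduce_s = per_vm_shuffle_mb / prof.reduce_rate_mb_s
    return map_s + shuffle_s + reduce_s


if __name__ == "__main__":
    prof = Profile(map_rate_mb_s=10.0, reduce_rate_mb_s=10.0, shuffle_ratio=1.0)
    # 1000 MB of input on <N, B> = <10, 80 Mbps>
    print(completion_time(1000.0, 10, 80.0, prof))  # 30.0
```

Under this toy model, adding VMs shortens every phase while adding bandwidth shortens only the shuffle, so different <N, B> combinations can reach the same completion time.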

Page 11: Bridging the Tenant-Provider Gap in Cloud Services

Resource Prediction

MRCuTE(P, I, Is, N, B) → Completion Time

N = MRCuTE⁻¹(P, I, Is, Completion Time, B)

The completion time is user-specified; by varying B, the provider can determine multiple <N, B> resource tuples: <N1, B1>, <N2, B2>, <N3, B3>.

Which <N,B> to use?
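Given a forward predictor, one simple way to realize N = MRCuTE⁻¹(…, Completion Time, B) is to search over N for each candidate bandwidth. The sketch below assumes predicted completion time never increases as N grows, so the first N that meets the deadline is the smallest one. `toy_predict` is a hypothetical stand-in for MRCuTE, and the list of bandwidth levels is an illustrative provider choice.

```python
from typing import Callable, List, Optional, Tuple

Predictor = Callable[[int, float], float]  # (N, B in Mbps) -> predicted seconds


def toy_predict(n_vms: int, bw_mbps: float, input_mb: float = 1000.0) -> float:
    """Hypothetical stand-in for MRCuTE: map and reduce at 10 MB/s per VM,
    all map output shuffled over bw_mbps, phases run back to back."""
    per_vm_mb = input_mb / n_vms
    return per_vm_mb / 10.0 + per_vm_mb / (bw_mbps / 8.0) + per_vm_mb / 10.0


def min_vms(predict: Predictor, deadline_s: float, bw_mbps: float, max_vms: int) -> Optional[int]:
    """Smallest N in [1, max_vms] whose predicted time meets the deadline, else None.
    Assumes predicted time is non-increasing in N."""
    for n in range(1, max_vms + 1):
        if predict(n, bw_mbps) <= deadline_s:
            return n
    return None


def candidate_tuples(predict: Predictor, deadline_s: float,
                     bandwidths_mbps: List[float], max_vms: int = 100) -> List[Tuple[int, float]]:
    """One <N, B> tuple per bandwidth level that can meet the deadline."""
    tuples = []
    for b in bandwidths_mbps:
        n = min_vms(predict, deadline_s, b, max_vms)
        if n is not None:
            tuples.append((n, b))
    return tuples


if __name__ == "__main__":
    print(candidate_tuples(toy_predict, 30.0, [40.0, 80.0, 160.0]))
    # [(14, 40.0), (10, 80.0), (9, 160.0)]
```

Each returned tuple meets the same deadline, which is exactly the set of choices the resource selection step must pick from.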

Page 12: Bridging the Tenant-Provider Gap in Cloud Services

Resource Selection

Which <N,B> tuple maximizes the provider’s ability to accept future requests?

Increases Goodput/Revenue

Page 13: Bridging the Tenant-Provider Gap in Cloud Services

Resource Selection: Example

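Example datacenter: 4 physical machines with 2 VM slots each, connected through two ToR switches.

A request can be satisfied by either R1 = <3 VMs, 500Mbps> or R2 = <6 VMs, 200Mbps>. Select the resource tuple leading to better goodput.

Allocation uses the greedy packing algorithm of Oktopus [SIGCOMM’11].

[Figure: two replicas of the datacenter, Replica 1 and Replica 2, after allocation; a subsequent request <4 VMs, 400Mbps> is shown.]

Replica 1 will accept more requests than Replica 2.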

Page 14: Bridging the Tenant-Provider Gap in Cloud Services

Resource Selection

• Similar to multi-dimensional bin packing

• Heuristic: minimize a resource imbalance metric
  – Select the <N,B> that balances the remaining capacity across resources
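A minimal sketch of the imbalance heuristic, under loudly stated assumptions: the datacenter is collapsed into one aggregate pool of VM slots and bandwidth (Bazaar itself places VMs on the network topology using Oktopus-style greedy allocation, which is ignored here), each of the N VMs is charged B of bandwidth, and "imbalance" is taken to be the absolute difference between the remaining fractions of the two resources. `DatacenterState`, `imbalance`, and `select_tuple` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DatacenterState:
    """Hypothetical aggregate view of the datacenter (topology is ignored)."""
    total_slots: int
    free_slots: int
    total_bw_mbps: float
    free_bw_mbps: float


def imbalance(state: DatacenterState, n_vms: int, bw_mbps: float) -> Optional[float]:
    """Gap between the remaining fractions of compute and bandwidth after
    allocating <N, B>, or None if the tuple does not fit."""
    slots_left = state.free_slots - n_vms
    bw_left = state.free_bw_mbps - n_vms * bw_mbps  # each of the N VMs gets B
    if slots_left < 0 or bw_left < 0:
        return None
    return abs(slots_left / state.total_slots - bw_left / state.total_bw_mbps)


def select_tuple(state: DatacenterState,
                 candidates: List[Tuple[int, float]]) -> Optional[Tuple[int, float]]:
    """Pick the feasible candidate that leaves the resources most balanced."""
    best, best_score = None, None
    for n, b in candidates:
        score = imbalance(state, n, b)
        if score is not None and (best_score is None or score < best_score):
            best, best_score = (n, b), score
    return best


if __name__ == "__main__":
    # Bandwidth is scarcer than VM slots: 50% of slots but only 40% of bandwidth free.
    state = DatacenterState(total_slots=100, free_slots=50,
                            total_bw_mbps=10_000.0, free_bw_mbps=4_000.0)
    print(select_tuple(state, [(10, 300.0), (20, 100.0), (60, 10.0)]))  # (20, 100.0)
```

Because bandwidth is the scarcer resource in this example, the heuristic prefers the tuple with more VMs and less bandwidth per VM; the <60, 10> tuple is skipped because it needs more slots than are free.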

Page 15: Bridging the Tenant-Provider Gap in Cloud Services

Outline
• Motivation
• Job-centric Interface
  – Resource Malleability
• Bazaar: A Job-centric Cloud Framework
  – Performance Prediction
  – Resource Selection
• Evaluation
• Bazaar: Extensions and Opportunities
• Conclusion

Page 16: Bridging the Tenant-Provider Gap in Cloud Services

Evaluation
• MRCuTE: prediction accuracy
• Benefits of Bazaar
  – Testbed deployment
  – Simulations

Page 17: Bridging the Tenant-Provider Gap in Cloud Services

MRCuTE: Prediction Accuracy

• Setup: Hadoop on a 35-node Emulab cluster, running Sort over 200GB of random data

Average prediction error = 8.9%

Page 18: Bridging the Tenant-Provider Gap in Cloud Services

MRCuTE: Prediction Accuracy

[Figure: % average prediction error for 5 MapReduce jobs: Sort, WordCount, GridMix, TF-IDF, LinkGraph.]

Average Error < 12%

Overcome prediction inaccuracies using slack
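One way to apply slack, sketched under the assumption that slack is a fractional safety margin on the predicted completion time (the slide does not spell out the exact mechanism): a resource tuple is accepted only if its prediction, padded by the margin, still meets the tenant's deadline. The function name and the 10% value below are illustrative.

```python
def meets_deadline_with_slack(predicted_s: float, deadline_s: float, slack: float) -> bool:
    """True if the prediction, padded by a fractional slack (e.g. 0.1 = 10%), fits the deadline."""
    if slack < 0:
        raise ValueError("slack must be non-negative")
    return predicted_s * (1.0 + slack) <= deadline_s


if __name__ == "__main__":
    print(meets_deadline_with_slack(27.0, 30.0, 0.1))  # True  (27 * 1.1 = 29.7)
    print(meets_deadline_with_slack(28.0, 30.0, 0.1))  # False (28 * 1.1 = 30.8)
```

In this sketch, a larger slack means only tuples with more resources qualify, which is why both late jobs and rejected requests are worth measuring as a function of slack.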

Page 19: Bridging the Tenant-Provider Gap in Cloud Services

Evaluation: Benefits of Bazaar
• Metrics
  – Fraction of rejected/accepted requests
  – Datacenter goodput
• Strategies
  – Bazaar: select <N,B> using the resource imbalance metric
  – Baseline: select <N,B> randomly
• Workload
  – Poisson job arrival process with a target arrival rate
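A Poisson arrival process can be generated by drawing independent, exponentially distributed inter-arrival times whose mean is 1/rate. The sketch below uses Python's `random.expovariate`; the rate, horizon, and seed are illustrative parameters, not values from the evaluation.

```python
import random
from typing import List


def poisson_arrivals(rate_per_s: float, horizon_s: float, seed: int = 0) -> List[float]:
    """Arrival times (seconds) of a Poisson process with the given rate, up to horizon_s."""
    rng = random.Random(seed)  # seeded for reproducible workloads
    t = 0.0
    arrivals: List[float] = []
    while True:
        t += rng.expovariate(rate_per_s)  # exponential gap with mean 1 / rate_per_s
        if t > horizon_s:
            return arrivals
        arrivals.append(t)


if __name__ == "__main__":
    times = poisson_arrivals(rate_per_s=2.0, horizon_s=1000.0)
    print(len(times))  # expected value: rate * horizon = 2000
```

Raising `rate_per_s` makes requests arrive faster, which is the knob used to push a simulated datacenter toward higher occupancy.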

Page 20: Bridging the Tenant-Provider Gap in Cloud Services

Bazaar: Testbed Deployment

• Working prototype on a 26-node Emulab cluster
  – Workload: 100 Sort jobs

[Figure: gain of Bazaar (%) relative to Baseline for accepted jobs and goodput; the bars are annotated “11.4% more” and “15.5% more”.]

Page 21: Bridging the Tenant-Provider Gap in Cloud Services

Bazaar: Simulations
• Datacenter scale: 16,000 machines
• Cross-validated using the testbed

[Figure: results plotted as requests arrive faster.]

Operational occupancy range for services like Amazon EC2 is 70-80%

Bazaar is ~50% better than the baseline

Page 22: Bridging the Tenant-Provider Gap in Cloud Services

Outline
• Motivation
• Job-centric Interface
  – Resource Malleability
• Bazaar: A Job-centric Cloud Framework
  – Performance Prediction
  – Resource Selection
• Evaluation
• Bazaar: Extensions and Opportunities
• Conclusion

Page 23: Bridging the Tenant-Provider Gap in Cloud Services

04/22/2023 SOCC 2012 - Bazaar 23

Bazaar-T: An extension of Bazaar
• Bazaar trades off N and B
  – Finish jobs “on time”
• Bazaar-T exploits flexibility in time
  – Finish jobs “before time”
  – More resources are available in the future
• Extend the resource imbalance metric to the time domain

Page 24: Bridging the Tenant-Provider Gap in Cloud Services

Bazaar-T: More Flexibility, More Gains

[Figure: Bazaar vs. Bazaar-T]

Bazaar-T has 10-20% more goodput than Bazaar

Page 25: Bridging the Tenant-Provider Gap in Cloud Services

Bazaar: Pricing implications
• Today: resource-based pricing
  – E.g.: using 20 VMs for 4 hrs costs $80
  – Extendable to multiple resources
  – No incentive for the provider to finish in time
• Bazaar enables job-based pricing
  – E.g.: finishing Sort over 200GB in 4 hrs costs $100
  – Tenants pay based on job characteristics
  – Aligns tenant and provider interests
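The two pricing examples can be compared with a little arithmetic. The slide's figures imply $1 per VM-hour (20 VMs × 4 hrs = 80 VM-hours for $80); the sketch assumes that rate and asks what each scheme charges if the same Sort job ends up taking twice as long. The function names are illustrative.

```python
def resource_based_cost(n_vms: int, hours: float, price_per_vm_hour: float) -> float:
    """Per-VM per-hour pricing: the bill grows with however long the job actually runs."""
    return n_vms * hours * price_per_vm_hour


def job_based_cost(quoted_price: float) -> float:
    """Job-based pricing: the tenant pays the price quoted for the job and its deadline."""
    return quoted_price


if __name__ == "__main__":
    rate = 1.0  # $/VM-hour, implied by "20 VMs for 4 hrs costs $80"
    for hours in (4.0, 8.0):
        print(hours, resource_based_cost(20, hours, rate), job_based_cost(100.0))
    # 4.0 80.0 100.0
    # 8.0 160.0 100.0
```

Under resource-based pricing a slower run raises the tenant's bill and the provider's revenue, so the provider has no incentive to finish in time; under job-based pricing the price is fixed, which aligns the two parties' interests.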

Page 26: Bridging the Tenant-Provider Gap in Cloud Services

Conclusion
• Bazaar: a job-centric framework for MapReduce
• Win-win situation for provider and tenant
  – Tenants get predictable performance
  – Providers get increased revenue
• Provides new avenues for pricing

Page 27: Bridging the Tenant-Provider Gap in Cloud Services

Thank You!

Page 28: Bridging the Tenant-Provider Gap in Cloud Services

Back-up Slides

Page 29: Bridging the Tenant-Provider Gap in Cloud Services

Related Work
• Performance prediction
  – MRPerf [MASCOTS 2009], Mumak
    • Detailed simulations
  – Elastisizer [SOCC 2011]
    • Detailed modeling of MapReduce
• SLOs
  – Jockey [EuroSys 2012]
    • Simulations; runtime monitoring to meet deadlines
  – Conductor [NSDI 2012]
    • Solves an optimization problem to meet goals
  – Proteus [SIGCOMM 2012]
    • Time-varying network reservations
  – Aria [ICAC 2011]
    • Profiling and modeling

Page 30: Bridging the Tenant-Provider Gap in Cloud Services

Hadoop Jobs Details

Page 31: Bridging the Tenant-Provider Gap in Cloud Services

MRCuTE: Profiling Time

Page 32: Bridging the Tenant-Provider Gap in Cloud Services

MRCuTE: Accounting for heterogeneity

Page 33: Bridging the Tenant-Provider Gap in Cloud Services

Addressing Skew: Slack

% of Late Jobs vs. Slack

% of Rejected Requests vs. Slack

Page 34: Bridging the Tenant-Provider Gap in Cloud Services

Goodput vs. Oversubscription

Page 35: Bridging the Tenant-Provider Gap in Cloud Services

Rejected Requests vs. Mean BW

Page 36: Bridging the Tenant-Provider Gap in Cloud Services

Rejected requests vs. Occupancy

Page 37: Bridging the Tenant-Provider Gap in Cloud Services

Bazaar vs. Fair Sharing