Autonomic SLA-driven Provisioning for Cloud Applications

Autonomic SLA-driven Provisioning for Cloud Applications. Nicolas Bonvin, Thanasis Papaioannou, Karl Aberer. CCGRID 2011, May 23-26 2011, Newport Beach, CA, USA. LSIR - EPFL


DESCRIPTION

Significant achievements have been made for automated allocation of cloud resources. However, the performance of applications may be poor in peak load periods, unless their cloud resources are dynamically adjusted. Moreover, although cloud resources dedicated to different applications are virtually isolated, performance fluctuations do occur because of resource sharing, and software or hardware failures (e.g. unstable virtual machines, power outages, etc.). We propose a decentralized economic approach for dynamically adapting the cloud resources of various applications, so as to statistically meet their SLA performance and availability goals in the presence of varying loads or failures. According to our approach, the dynamic economic fitness of a Web service determines whether it is replicated or migrated to another server, or deleted. The economic fitness of a Web service depends on its individual performance constraints, its load, and the utilization of the resources where it resides. Cascading performance objectives are dynamically calculated for individual tasks in the application workflow according to the user requirements.

TRANSCRIPT

Page 1: Autonomic SLA-driven Provisioning for Cloud Applications

Autonomic SLA-driven Provisioning for Cloud Applications

Nicolas Bonvin, Thanasis Papaioannou, Karl Aberer

CCGRID 2011, May 23-26 2011, Newport Beach, CA, USA

LSIR - EPFL

Page 2: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #1 : Placement

● A distributed, component-based application running on an elastic infrastructure

2 EPFL – LSIR - Nicolas Bonvin

[Diagram: components C1, C2, C3, C4]

Page 3: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #1 : Placement

● A distributed, component-based application running on an elastic infrastructure

[Diagram: components C1, C2, C3, C4 running on VM1, VM2, VM3]

Page 4: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #1 : Placement

● A distributed, component-based application running on an elastic infrastructure

● Performance of C1, C2 and C3 is probably lower than that of C4

● No information on other VMs colocated on the same server!

[Diagram: C1 and C2 on VM1, C3 on VM2, C4 on VM3; the VMs are placed on Server 1 and Server 2]

Page 5: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #1 : Placement

● A distributed, component-based application running on an elastic infrastructure

● Performance of C1, C2 and C3 is probably lower than that of C4

● No information on other VMs colocated on the same server!

No control over placement

[Diagram: C1 and C2 on VM1, C3 on VM2, C4 on VM3; the VMs are placed on Server 1 and Server 2]

Page 6: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #2 : Instability

● Load-balanced traffic to 4 identical components on 4 identical VMs

[Diagram: one replica of C1 on each of VM1, VM2, VM3 and VM4]

Response times: VM1 100 ms, VM2 100 ms, VM3 100 ms, VM4 100 ms

Page 7: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #2 : Instability

● Load-balanced traffic to 4 identical components on 4 identical VMs

– VM performance can vary by up to a factor of 4! [Dej2009]

● Physical server, hypervisor, storage, ...

[Diagram: one replica of C1 on each of VM1, VM2, VM3 and VM4]

Response times: VM1 100 ms, VM2 140 ms, VM3 100 ms, VM4 100 ms

Page 8: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #2 : Instability

● Load-balanced traffic to 4 identical components on 4 identical VMs

– VM performance can vary by up to a factor of 4! [Dej2009]

● Physical server, hypervisor, storage, ...

● Component overloaded

[Diagram: one replica of C1 on each of VM1, VM2, VM3 and VM4]

Response times: VM1 130 ms, VM2 140 ms, VM3 100 ms, VM4 100 ms

Page 9: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #2 : Instability

● Load-balanced traffic to 4 identical components on 4 identical VMs

– VM performance can vary by up to a factor of 4! [Dej2009]

● Physical server, hypervisor, storage, ...

● Component overloaded

● Component bug, crash, deadlock, ...

[Diagram: one replica of C1 on each of VM1, VM2, VM3 and VM4]

Response times: VM1 130 ms, VM2 140 ms, VM3 100 ms, VM4 infinity

Page 10: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #2 : Instability

● Load-balanced traffic to 4 identical components on 4 identical VMs

– VM performance can vary by up to a factor of 4! [Dej2009]

● Physical server, hypervisor, storage, ...

● Component overloaded

● Component bug, crash, deadlock, ...

● Failure of C1 on VM4 -> load is rebalanced

[Diagram: one replica of C1 on each of VM1, VM2, VM3 and VM4]

Response times: VM1 140 ms, VM2 150 ms, VM3 130 ms, VM4 infinity

Page 11: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #2 : Instability

● Load-balanced traffic to 4 identical components on 4 identical VMs

– VM performance can vary by up to a factor of 4! [Dej2009]

● Physical server, hypervisor, storage, ...

● Component overloaded

● Component bug, crash, deadlock, ...

● Failure of C1 on VM4 -> load is rebalanced

[Diagram: one replica of C1 on each of VM1, VM2, VM3 and VM4]

Response times: VM1 140 ms, VM2 150 ms, VM3 130 ms, VM4 infinity

Application should react early!

Page 12: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Overview

● Build for failures

– Do not trust the underlying infrastructure

– Do not trust your components either!

● Components should adapt to the changing conditions

– Quickly

– Automatically

– e.g. by replacing a wonky VM with a new one

Page 13: Autonomic SLA-driven Provisioning for Cloud Applications

Scarce: a framework to build scalable cloud applications

Page 14: Autonomic SLA-driven Provisioning for Cloud Applications

Architecture Overview

● An agent on each server / VM

– Starts, stops and monitors the components

– Makes decisions on behalf of the components

● An agent communicates with other agents

– Routing table

– Status of the server (resource usage)

[Diagram: servers, each running an agent next to its components (labels A, B, E visible); agents are linked by gossiping + broadcast]

Page 15: Autonomic SLA-driven Provisioning for Cloud Applications

An economic approach

● Time is split into epochs (no synchronization between servers)

● Servers charge a virtual rent for hosting a component according to

– Current resource usage (I/O, CPU, ...) of the server

– Technical factors (HW, connectivity, ...)

– Non-technical factors (country stability, ...)

Page 16: Autonomic SLA-driven Provisioning for Cloud Applications

An economic approach

● Time is split into epochs (no synchronization between servers)

● Servers charge a virtual rent for hosting a component according to

– Current resource usage (I/O, CPU, ...) of the server

– Technical factors (HW, connectivity, ...)

– Non-technical factors (country stability, ...)

● Components

– Pay virtual rent at each epoch

– Gain virtual money by processing requests

– Make decisions based on their balance (= gain – rent)

● Replicate, migrate, suicide, stay

● Virtual rents are updated by gossiping (no centralized board)
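The per-epoch decision of a component can be sketched as follows. This is a minimal illustration, not the framework's actual rule: only the balance definition (gain minus rent) and the four actions come from the slide. The `Component` class, the thresholds `replicate_threshold` and `patience`, and the order of the checks are assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    STAY = "stay"
    REPLICATE = "replicate"
    MIGRATE = "migrate"
    SUICIDE = "suicide"


@dataclass
class Component:
    name: str
    balance: float = 0.0     # accumulated virtual money
    losing_epochs: int = 0   # consecutive epochs where rent exceeded gain


def end_of_epoch(comp, gain, rent, cheapest_rent_elsewhere, replicas,
                 replicate_threshold=100.0, patience=3):
    """Update the balance for one epoch and pick an action.

    The thresholds and the ordering of the checks are hypothetical.
    """
    net = gain - rent                  # balance change: gain - rent
    comp.balance += net
    comp.losing_epochs = comp.losing_epochs + 1 if net < 0 else 0

    if comp.losing_epochs >= patience:
        # Persistently unprofitable: move to a cheaper server if one exists,
        # otherwise remove this replica unless it is the last one.
        if cheapest_rent_elsewhere < rent:
            return Action.MIGRATE
        if replicas > 1:
            return Action.SUICIDE
        return Action.STAY

    if comp.balance >= replicate_threshold:
        # Assumption: creating a replica consumes the saved-up balance.
        comp.balance -= replicate_threshold
        return Action.REPLICATE

    return Action.STAY
```

A busy component (high gain) accumulates balance and replicates; an idle one on an expensive server first tries to migrate and only removes itself when other replicas remain.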

Page 17: Autonomic SLA-driven Provisioning for Cloud Applications

Economic model (i)

● The rent of a server is different for each component!

Page 18: Autonomic SLA-driven Provisioning for Cloud Applications

Economic model (ii)

● VM1 and VM2 have an "identical" resource usage: 45% (average of CPU and I/O)

● Server rent = server's resource usage weighted by the component's weights

– Rent for C1 @ VM1 > rent for C1 @ VM2

Multiplexing of server resources

Resource  C1   VM1  VM2
CPU       30%  70%  25%
I/O       5%   20%  65%

[Diagram: where should C1 be placed, VM1 or VM2?]
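The numbers on this slide can be worked through in a few lines. The sketch below assumes that a component's weights are its own resource usage normalized to sum to 1, and that the rent is the weighted sum of the server's usage; the slide only states that the rent is the server's usage "with component's weights", so the exact formula is an assumption.

```python
def component_weights(usage):
    """Normalize a component's resource usage into weights that sum to 1."""
    total = sum(usage.values())
    return {res: value / total for res, value in usage.items()}


def rent(server_usage, weights):
    """Server usage (percent) averaged with the component's weights."""
    return sum(weights[res] * server_usage[res] for res in weights)


# Values from the slide (percent).
c1 = {"cpu": 30.0, "io": 5.0}
vm1 = {"cpu": 70.0, "io": 20.0}
vm2 = {"cpu": 25.0, "io": 65.0}

w = component_weights(c1)      # C1 is CPU-heavy: cpu weight is 30/35
rent_vm1 = rent(vm1, w)
rent_vm2 = rent(vm2, w)
print(f"C1 @ VM1: {rent_vm1:.1f}  C1 @ VM2: {rent_vm2:.1f}")
# prints "C1 @ VM1: 62.9  C1 @ VM2: 30.7"
```

Both VMs have the same unweighted average usage (45%), but the CPU-bound C1 sees VM1 as roughly twice as expensive because VM1's CPU is the busy resource.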

Page 19: Autonomic SLA-driven Provisioning for Cloud Applications

Economic model (iii)

● Choosing a candidate server j during replication/migration of a component i

– net benefit maximization

● 2 optimization goals:

– high availability by geographical diversity of replicas

– low latency by grouping related components

● g_j: weight related to the proximity of the server location to the geographical distribution of the client requests to the component

● S_i is the set of servers hosting a replica of component i

Page 20: Autonomic SLA-driven Provisioning for Cloud Applications

SLA Performance Guarantees (i)

● Each component has its own SLA constraints

● SLA derived directly from entry components

● Resp. Time = Service Time + max (Resp. Time of Dependencies)

[Diagram: dependency graph of components C1 to C5; the entry component C1 has an SLA of 500 ms]
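The response-time rule above is recursive, which a short sketch makes concrete. The dependency graph and the service times below are hypothetical (the diagram's edges are not preserved in the transcript); the sketch also assumes the graph is acyclic and that a component calls its dependencies in parallel, which is what taking the maximum implies.

```python
def response_time(component, service_time, dependencies):
    """Resp. Time = Service Time + max(Resp. Time of Dependencies)."""
    children = dependencies.get(component, [])
    slowest_child = max(
        (response_time(c, service_time, dependencies) for c in children),
        default=0.0,   # a leaf component has no dependencies to wait for
    )
    return service_time[component] + slowest_child


# Hypothetical topology and service times in milliseconds.
dependencies = {"C1": ["C2", "C3"], "C2": ["C4"], "C3": ["C5"]}
service_time = {"C1": 100.0, "C2": 80.0, "C3": 120.0, "C4": 150.0, "C5": 90.0}

total = response_time("C1", service_time, dependencies)
print(total)  # prints 330.0, below the 500 ms SLA of the entry component
```

Here the C2 branch (80 + 150 = 230 ms) dominates the C3 branch (120 + 90 = 210 ms), so C1 answers in 100 + 230 = 330 ms.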

Page 21: Autonomic SLA-driven Provisioning for Cloud Applications

SLA Performance Guarantees (ii)

● SLA propagation from parents to children

● Parent j sends its performance constraints (e.g. response time upper bound) to its dependencies D(j): [equation not preserved in the transcript]

● Child i computes its own performance constraints: [equation not preserved in the transcript]

● [symbol not preserved]: group of constraints sent by the replicas of the parent g
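The equations of this slide did not survive extraction, so the following is only one plausible reading, not the authors' formulas. It assumes that a parent forwards whatever remains of its response-time bound after its own service time, and that a child adopts the strictest bound among those received from the parent's replicas. Both function names are hypothetical.

```python
def constraint_for_dependencies(bound_ms, service_ms):
    """Time budget a parent replica leaves to its dependencies (assumption)."""
    return max(bound_ms - service_ms, 0.0)


def child_constraint(received_ms):
    """Child keeps the strictest bound sent by the parent's replicas (assumption)."""
    return min(received_ms)


# Entry component with a 500 ms SLA and two replicas whose measured
# service times are 100 ms and 140 ms (illustrative values).
received = [constraint_for_dependencies(500.0, s) for s in (100.0, 140.0)]
child_bound = child_constraint(received)
print(received, child_bound)  # prints [400.0, 360.0] 360.0
```

Taking the minimum is the conservative choice: the child then satisfies every replica of its parent, including the slowest one.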

Page 22: Autonomic SLA-driven Provisioning for Cloud Applications

SLA Performance Guarantees (iii)


● SLA propagation from parents to children

Page 23: Autonomic SLA-driven Provisioning for Cloud Applications

Automatic Provisioning


● Usage of allocated resources is maximized :

– autonomic migration / replication / suicide of components

– not enough to ensure end-to-end response time

● Cloud resources managed by framework via cloud API

● Each individual component has to satisfy its own SLA

– SLA easily met -> decrease resources (scale down)

– SLA not met -> increase resources (scale up, scale out)
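The scale-down / scale-up / scale-out rule can be sketched as a small decision function. The slide gives only the two conditions; the 95th-percentile metric, the `low_margin` definition of "easily met" and the preference for scaling up before scaling out are assumptions made for illustration.

```python
def provisioning_action(p95_ms, sla_ms, cores, max_cores, low_margin=0.5):
    """Pick a provisioning action for one component (illustrative thresholds)."""
    if p95_ms > sla_ms:
        # SLA not met: add cores while the server allows it, else add a server.
        return "scale_up" if cores < max_cores else "scale_out"
    if p95_ms < low_margin * sla_ms and cores > 1:
        # SLA easily met: release resources.
        return "scale_down"
    return "keep"


print(provisioning_action(p95_ms=620.0, sla_ms=500.0, cores=1, max_cores=2))
# prints "scale_up"
print(provisioning_action(p95_ms=620.0, sla_ms=500.0, cores=2, max_cores=2))
# prints "scale_out"
print(provisioning_action(p95_ms=120.0, sla_ms=500.0, cores=2, max_cores=2))
# prints "scale_down"
```

The gap between `low_margin * sla_ms` and `sla_ms` acts as a dead band, so a component hovering near its bound does not flip between scaling up and down every epoch.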

Page 24: Autonomic SLA-driven Provisioning for Cloud Applications

Adaptivity to slow servers

● Each component keeps statistics about its children

– e.g. 95th percentile response time

● A routing coefficient is computed for each child at each epoch

– Send more requests to better-performing children
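One simple way to turn per-child statistics into routing coefficients is to weight each child by the inverse of its 95th-percentile response time. The slide does not give the formula, so inverse-latency weighting is an assumption here; the latencies reuse the values from the instability example, with a crashed replica reported as infinity.

```python
def routing_coefficients(p95_ms):
    """Share of requests per child, proportional to 1 / p95 response time.

    Assumes every latency is positive and at least one child is reachable.
    """
    inverse = {child: 1.0 / t for child, t in p95_ms.items()}  # 1/inf == 0.0
    total = sum(inverse.values())
    return {child: value / total for child, value in inverse.items()}


p95 = {"VM1": 130.0, "VM2": 140.0, "VM3": 100.0, "VM4": float("inf")}
coeffs = routing_coefficients(p95)
for child, share in coeffs.items():
    print(f"{child}: {share:.2f}")
```

The fastest replica (VM3) receives the largest share and the unreachable one (VM4) receives none; recomputing the coefficients at every epoch lets the parent shift traffic away from a server that has become slow.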

Page 25: Autonomic SLA-driven Provisioning for Cloud Applications

Evaluation

Page 26: Autonomic SLA-driven Provisioning for Cloud Applications

Evaluation: Setup

● 5 components, mostly CPU-intensive (wc >> wm, wn, wd)

● 8 8-core servers (Intel Core i7 920, 2.67 GHz, 8 GB, Linux 2.6.32-trunk-amd64)

● d = 0, C = 110, k = 10000, xs* = 25%

[Diagram: dependency graph of components C1 to C5; the entry component C1 has an SLA of 500 ms]

Page 27: Autonomic SLA-driven Provisioning for Cloud Applications

Adaptation to Varying Load (i)

● 5 rps to 60 rps at minute 8, step 5 rps/min

● Static setup: 2 servers with 2 cores

Page 28: Autonomic SLA-driven Provisioning for Cloud Applications

Adaptation to Varying Load (ii)

● 5 rps to 60 rps at minute 8, step 5 rps/min

● Static setup: 2 servers with 2 cores

Page 29: Autonomic SLA-driven Provisioning for Cloud Applications

Adaptation to Slow Server

● Max 2 cores/server, 25 rps

● At minute 4, a server gets slower (200 ms delay)

Page 30: Autonomic SLA-driven Provisioning for Cloud Applications

Scalability

● Add 5 rps per minute until 150 rps

● Max 6 cores/server

Page 31: Autonomic SLA-driven Provisioning for Cloud Applications

Conclusion

Page 32: Autonomic SLA-driven Provisioning for Cloud Applications

Conclusion

● Framework for building cloud applications

● Elasticity: add/remove resources

● High Availability: software, hardware, network failures

● Scalability: growing load, peaks, scaling down, ...

– Quick replication of busy components

● Load Balancing : load has to be shared by all available servers

– Replication of busy components

– Migration of less busy components

– Reach equilibrium when load is stable

● SLA performance guarantees

– Automatic provisioning

● No synchronization, fully decentralized

Page 33: Autonomic SLA-driven Provisioning for Cloud Applications

Thank you !