Autonomic SLA-driven Provisioning for Cloud Applications

DESCRIPTION

Significant achievements have been made for automated allocation of cloud resources. However, the performance of applications may be poor in peak load periods, unless their cloud resources are dynamically adjusted. Moreover, although cloud resources dedicated to different applications are virtually isolated, performance fluctuations do occur because of resource sharing, and software or hardware failures (e.g. unstable virtual machines, power outages, etc.). We propose a decentralized economic approach for dynamically adapting the cloud resources of various applications, so as to statistically meet their SLA performance and availability goals in the presence of varying loads or failures. According to our approach, the dynamic economic fitness of a Web service determines whether it is replicated or migrated to another server, or deleted. The economic fitness of a Web service depends on its individual performance constraints, its load, and the utilization of the resources where it resides. Cascading performance objectives are dynamically calculated for individual tasks in the application workflow according to the user requirements.

TRANSCRIPT

Autonomic SLA-driven Provisioning for Cloud Applications

Nicolas Bonvin, Thanasis Papaioannou, Karl Aberer

CCGRID 2011, May 23-26 2011, Newport Beach, CA, USA

nicolas.bonvin@epfl.ch

LSIR - EPFL

Cloud Apps – Issue #1 : Placement

2 EPFL – LSIR - Nicolas Bonvin

● A distributed, component-based application running on an elastic infrastructure

● Performance of C1, C2 and C3 is probably less than C4

● No info on other VMs colocated on same server !

● No control on placement

[Figure: components C1 and C2 on VM1, C3 on VM2, C4 on VM3; the VMs are spread over Server 1 and Server 2]

Cloud Apps – Issue #2 : Instability

● Load-balanced traffic to 4 identical components (C1) on 4 identical VMs

– VM performance can vary by up to a factor of 4 ! [Dej2009]

● Physical server, Hypervisor, Storage, ...

● Component overloaded

● Component bug, crash, deadlock, ...

● Failure of C1 on VM4 -> load is rebalanced

Response time of C1 on each VM as the problems accumulate:

Situation                               VM1      VM2      VM3      VM4
All VMs healthy                         100 ms   100 ms   100 ms   100 ms
VM performance varies (VM2)             100 ms   140 ms   100 ms   100 ms
Component overloaded (VM1)              130 ms   140 ms   100 ms   100 ms
Component bug, crash, deadlock (VM4)    130 ms   140 ms   100 ms   infinity
Load rebalanced after failure on VM4    140 ms   150 ms   130 ms   infinity

Application should react early !

Cloud Apps – Overview

● Build for failures

– Do not trust the underlying infrastructure

– Do not trust your components either !

● Components should adapt to the changing conditions

– Quickly

– Automatically

– e.g. by replacing a wonky VM with a new one
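
The idea of spotting a wonky VM so it can be replaced might be sketched as below. This is only an illustration, not the framework's code: the function `find_wonky_replicas`, the `slowdown_factor` threshold and the choice of the median as a baseline are all assumptions. The latencies reuse the figures shown for Issue #2.

```python
import math
from statistics import median

def find_wonky_replicas(latencies_ms, slowdown_factor=1.3):
    """Return replicas that are unreachable or much slower than their peers.

    latencies_ms maps a replica name to its observed response time in ms;
    math.inf stands for a replica that no longer answers.
    slowdown_factor is a hypothetical threshold, not a value from the talk.
    """
    healthy = [v for v in latencies_ms.values() if math.isfinite(v)]
    if not healthy:
        return sorted(latencies_ms)  # every replica is down
    baseline = median(healthy)
    return sorted(
        name for name, value in latencies_ms.items()
        if not math.isfinite(value) or value > slowdown_factor * baseline
    )

# Latencies observed once C1 on VM4 has failed (figures from the slides).
observed = {"VM1": 130, "VM2": 140, "VM3": 100, "VM4": math.inf}
to_replace = find_wonky_replicas(observed)
print(to_replace)
```

Comparing each replica with the median of its peers, rather than with a fixed latency, keeps the check meaningful when the whole group slows down under load.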

Scarce: a framework to build scalable cloud applications

Architecture Overview

● An agent on each server / VM

– Starts/stops/monitors the components

– Takes decisions on behalf of the components

● An agent communicates with other agents

– Routing table

– Status of the server (resource usage)

[Figure: several servers, each running an agent; agents are linked by gossiping + broadcast; labels A, B and E appear in the diagram]
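
How agents could spread server status without a central board can be sketched with a simple push-pull gossip. Everything here is an assumption made for illustration (the `Agent` class, the version counter used to keep the freshest status, the random peer selection); the slides only state that agents exchange routing tables and server status by gossiping.

```python
import random

class Agent:
    """Hypothetical agent keeping a view of the status of every known server."""

    def __init__(self, name, cpu_usage):
        self.name = name
        # view: server name -> (version, status); a higher version is fresher
        self.view = {name: (1, {"cpu": cpu_usage})}

    def update_local(self, cpu_usage):
        """Publish a new local status under a higher version number."""
        version, _ = self.view[self.name]
        self.view[self.name] = (version + 1, {"cpu": cpu_usage})

    def merge(self, other_view):
        """Keep, for each server, the entry with the highest version."""
        for server, (version, status) in other_view.items():
            if server not in self.view or self.view[server][0] < version:
                self.view[server] = (version, status)

    def gossip_with(self, peer):
        """Push-pull exchange: both sides end up with the fresher entries."""
        snapshot = dict(self.view)
        self.merge(peer.view)
        peer.merge(snapshot)

def run_round(agents, rng):
    """Each agent gossips with one randomly chosen peer."""
    for agent in agents:
        peer = rng.choice([a for a in agents if a is not agent])
        agent.gossip_with(peer)

rng = random.Random(0)
agents = [Agent(f"server{i}", cpu_usage=10 * i) for i in range(1, 6)]
MAX_ROUNDS = 100
rounds = 0
while rounds < MAX_ROUNDS and any(len(a.view) < len(agents) for a in agents):
    run_round(agents, rng)
    rounds += 1
print(rounds, sorted(agents[0].view))
```

No agent needs a global clock or a coordinator: each one only compares version numbers per server, which fits the "no synchronization" constraint stated in the talk.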

An economic approach

● Time is split into epochs (no synchronization between servers)

● Servers charge a virtual rent for hosting a component according to

– Current resource usage (I/O, CPU, ...) of the server

– Technical factors (HW, connectivity, ...)

– Non-technical factors (country stability, ...)

● Components

– Pay virtual rent at each epoch

– Gain virtual money by processing requests

– Take decisions based on balance ( = gain – rent )

● Replicate, migrate, suicide, stay

● Virtual rents are updated by gossiping (no centralized board)
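
The per-epoch bookkeeping of a component can be sketched as follows. The slides give the balance (gain minus rent) and the four possible actions, but not the decision rule, so the policy below is an assumption: the `window` of consecutive epochs, the conditions for each action, and all names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """Hypothetical component tracking its virtual balance per epoch."""
    name: str
    gain_per_request: float
    window: int = 3  # assumed number of consecutive epochs before acting
    balances: list = field(default_factory=list)

    def end_epoch(self, requests_served, rent):
        """Record balance = gain - rent for the epoch that just ended."""
        balance = requests_served * self.gain_per_request - rent
        self.balances.append(balance)
        return balance

    def decide(self, replica_count, cheaper_server_known):
        """Pick one of: replicate, migrate, suicide, stay (assumed policy)."""
        recent = self.balances[-self.window:]
        if len(recent) < self.window:
            return "stay"  # not enough history yet
        if all(b > 0 for b in recent):
            return "replicate"  # consistently profitable, hence busy
        if all(b < 0 for b in recent):
            if replica_count > 1:
                return "suicide"  # other replicas can take over
            if cheaper_server_known:
                return "migrate"  # last replica: move instead of dying
        return "stay"

busy = Component("C1", gain_per_request=0.1)
idle = Component("C2", gain_per_request=0.1)
for _ in range(3):
    busy.end_epoch(requests_served=500, rent=20)
    idle.end_epoch(requests_served=50, rent=20)

print(busy.decide(replica_count=1, cheaper_server_known=False))
print(idle.decide(replica_count=1, cheaper_server_known=True))
print(idle.decide(replica_count=3, cheaper_server_known=True))
```

Requiring several consecutive epochs with the same sign is one way to avoid reacting to a single noisy epoch; the actual framework may use a different criterion.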

Economic model (i)

● The rent of a server is different for each component !

Economic model (ii)

● VM1 and VM2 have an « identical » resource usage : 45%

● Server rent = server's resource usage weighted by the component's weights

– Rent for C1 @ VM1 > rent for C1 @ VM2

[Figure: multiplexing of server resources; where should C1 be placed ?]

          CPU     I/O
C1        30%     5%
VM1       70%     20%
VM2       25%     65%
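
The example above can be worked through numerically. The slides state that the rent is the server's resource usage weighted by the component's weights, but the exact weighting is not given in this transcript; the sketch below assumes the weights are the component's own usage normalised to sum to 1. The function name `rent` is hypothetical.

```python
def rent(server_usage, component_usage):
    """Server usage weighted by the component's resource profile.

    Assumption: the weight of each resource is the component's usage of it
    divided by the component's total usage, so the weights sum to 1.
    """
    total = sum(component_usage.values())
    return sum(
        component_usage[resource] / total * server_usage[resource]
        for resource in component_usage
    )

# Figures from the slide (percent of each resource in use).
c1 = {"cpu": 30, "io": 5}
vm1 = {"cpu": 70, "io": 20}
vm2 = {"cpu": 25, "io": 65}

rent_vm1 = rent(vm1, c1)
rent_vm2 = rent(vm2, c1)
print(round(rent_vm1, 2), round(rent_vm2, 2))
```

Both VMs average 45% usage, yet under this weighting the CPU-heavy C1 sees a rent of about 62.9 on VM1 against about 30.7 on VM2, because VM1's busy resource is the one C1 needs most.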

Economic model (iii)

● Choosing a candidate server j during replication/migration of a component i

– net benefit maximization

● 2 optimization goals :

– high availability by geographical diversity of replicas

– low latency by grouping related components

● g_j : weight related to the proximity of the server location to the geographical distribution of the client requests to the component

● S_i is the set of servers hosting a replica of component i

SLA Performance Guarantees (i)

● Each component has its own SLA constraints

● SLA derived directly from entry components

● Resp. Time = Service Time + max (Resp. Time of Dependencies)

[Figure: dependency graph of components C1 to C5; C1 has an SLA of 500 ms]
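
The response-time rule above (service time plus the slowest dependency) is naturally recursive. The sketch below applies it to a made-up graph: the edges between C1 to C5 and the service times are hypothetical, since the transcript does not preserve the figure, and only the 500 ms SLA on C1 comes from the slides.

```python
def response_time(component, service_time, dependencies):
    """Resp. time = service time + max(resp. time of dependencies)."""
    children = dependencies.get(component, [])
    slowest_child = max(
        (response_time(c, service_time, dependencies) for c in children),
        default=0,  # a leaf component has no dependency to wait for
    )
    return service_time[component] + slowest_child

# Hypothetical dependency graph and service times in ms.
dependencies = {"C1": ["C2", "C3"], "C2": ["C4"], "C3": ["C5"]}
service_time = {"C1": 100, "C2": 80, "C3": 120, "C4": 150, "C5": 60}

SLA_MS = 500  # SLA of the entry component C1, as on the slide
total = response_time("C1", service_time, dependencies)
print(total, total <= SLA_MS)
```

Using the maximum rather than the sum models dependencies that are called in parallel: the parent waits only for the slowest one.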

SLA Performance Guarantees (ii)

● SLA propagation from parents to children

● Parent j sends its performance constraints (e.g. response time upper bound) to its dependencies D(j) :

● Child i computes its own performance constraints :

● : group of constraints sent by the replicas of the parent g

SLA Performance Guarantees (iii)

● SLA propagation from parents to children

Automatic Provisioning

● Usage of allocated resources is maximized :

– autonomic migration / replication / suicide of components

– not enough to ensure end-to-end response time

● Cloud resources managed by framework via cloud API

● Each individual component has to satisfy its own SLA

– SLA easily met -> decrease resources (scale down)

– SLA not met -> increase resources (scale up, scale out)
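
The scale-up / scale-down rule can be sketched as a small decision function. The slides do not say what "easily met" means, so the `low_watermark` fraction below is an assumed parameter, and the function name and return values are hypothetical.

```python
def provisioning_action(observed_ms, sla_ms, low_watermark=0.5):
    """Decide how to adjust resources for one component.

    observed_ms: measured response time of the component
    sla_ms: response time upper bound assigned to the component
    low_watermark: assumed fraction of the SLA below which it is "easily met"
    """
    if observed_ms > sla_ms:
        return "increase"  # SLA not met: scale up or scale out
    if observed_ms < low_watermark * sla_ms:
        return "decrease"  # SLA easily met: scale down
    return "keep"

for observed in (620, 300, 120):
    print(observed, provisioning_action(observed, sla_ms=500))
```

The band between the watermark and the SLA acts as hysteresis, so a component hovering near its target is not resized at every epoch.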

Adaptivity to slow servers

● Each component keeps statistics about its children

– e.g. 95th perc. response time

● A routing coefficient is computed for each child at each epoch

– Send more requests to more performant children
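
One possible way to turn these statistics into routing coefficients is shown below. The slides only say that better-performing children receive more requests; making the coefficient inversely proportional to the 95th percentile response time is an assumption, as are the function and replica names.

```python
def routing_coefficients(p95_ms):
    """Share of requests per child, inversely proportional to its p95 latency.

    p95_ms maps a child replica to its 95th percentile response time in ms.
    The inverse-latency weighting is an illustrative choice.
    """
    inverse = {child: 1.0 / latency for child, latency in p95_ms.items()}
    total = sum(inverse.values())
    return {child: value / total for child, value in inverse.items()}

# Hypothetical statistics gathered during one epoch.
coeffs = routing_coefficients(
    {"replica_a": 100, "replica_b": 100, "replica_c": 300}
)
for child, share in sorted(coeffs.items()):
    print(child, round(share, 3))
```

A parent could then pick the target of each request with `random.choices(list(coeffs), weights=list(coeffs.values()))`, so that a replica three times slower receives a third of the traffic of its peers.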

Evaluation

Evaluation: Setup

● 5 components, mostly CPU-intensive (wc >> wm, wn, wd)

● 8 servers with 8 cores each (Intel Core i7 920, 2.67 GHz, 8GB, Linux 2.6.32-trunk-amd64)

● d=0, C=110, k=10000, xs* = 25%

[Figure: dependency graph of components C1 to C5; C1 has an SLA of 500 ms]

Adaptation to Varying Load (i)

● 5 rps to 60 rps at minute 8, step 5 rps/min

● Static setup : 2 servers with 2 cores

Adaptation to Varying Load (ii)

● 5 rps to 60 rps at minute 8, step 5 rps/min

● Static setup : 2 servers with 2 cores

Adaptation to Slow Server

● Max 2 cores/server, 25 rps

● At minute 4, a server gets slower (200 ms delay)

Scalability

● Add 5 rps per minute until 150 rps

● Max 6 cores/server

Conclusion

● Framework for building cloud applications

● Elasticity : add/remove resources

● High Availability : software, hardware, network failures

● Scalability : growing load, peaks, scaling down, ...

– Quick replication of busy components

● Load Balancing : load has to be shared by all available servers

– Replication of busy components

– Migration of less busy components

– Reach equilibrium when load is stable

● SLA performance guarantees

– Automatic provisioning

● No synchronization, fully decentralized

Thank you !
