Autonomic SLA-driven Provisioning for Cloud Applications
Nicolas Bonvin, Thanasis Papaioannou, Karl Aberer
CCGRID 2011, May 23-26 2011, Newport Beach, CA, USA
nicolas.bonvin@epfl.ch, LSIR - EPFL
Cloud Apps – Issue #1 : Placement
● A distributed, component-based application running on an elastic infrastructure
[Figure: C1 and C2 share VM1, C3 runs in VM2 and C4 in VM3; VM1 and VM2 are colocated on Server 1 while VM3 runs alone on Server 2]
● Performance of C1, C2 and C3 is probably lower than that of C4
● No info on the other VMs colocated on the same server!
● No control over placement
Cloud Apps – Issue #2 : Instability
● Load-balanced traffic to 4 identical components on 4 identical VMs
– VM performance can vary by up to a factor of 4! [Dej2009]
● Causes: physical server, hypervisor, storage, ...
● Component overloaded
● Component bug, crash, deadlock, ...
● Failure of C1 on VM4 -> load is rebalanced over the remaining VMs
[Figure: response times of the C1 replicas on VM1–VM4 drift from 100 ms each to 140 ms, 150 ms, 130 ms and infinity as VMs slow down and one replica fails]
The application should react early!
● Build for failures
– Do not trust the underlying infrastructure
– Do not trust your components either !
● Components should adapt to the changing conditions
– Quickly
– Automatically
– e.g. by replacing a wonky VM with a new one
Cloud Apps – Overview
Scarce: a framework to build scalable cloud applications
Architecture Overview
[Figure: one agent per server, hosting components (A, B, E) and exchanging information via gossiping and broadcast]
● An agent runs on each server / VM
– Starts/stops/monitors the components
– Makes decisions on behalf of the components
● An agent communicates with the other agents
– Routing table
– Status of the server (resource usage)
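The agent-to-agent exchange described above can be sketched as follows. This is a minimal illustration of status gossiping, not the framework's actual code; class and field names are assumptions.

```python
class Agent:
    """Per-server agent sketch (hypothetical API): each agent tracks the
    resource status of the servers it has heard about and merges fresher
    information received via gossip."""

    def __init__(self, server_id):
        self.server_id = server_id
        # server_id -> (epoch, cpu_usage); our own entry starts at epoch 0
        self.status = {server_id: (0, 0.0)}

    def local_update(self, epoch, cpu_usage):
        # refresh our own server's status for the current epoch
        self.status[self.server_id] = (epoch, cpu_usage)

    def receive_gossip(self, remote_status):
        # keep the freshest entry per server (higher epoch wins)
        for sid, (epoch, cpu) in remote_status.items():
            if sid not in self.status or self.status[sid][0] < epoch:
                self.status[sid] = (epoch, cpu)

    def gossip_to(self, other):
        # push our whole view to a peer agent
        other.receive_gossip(self.status)

# Two agents exchange status: b learns a's newer view of server "a".
a, b = Agent("a"), Agent("b")
a.local_update(epoch=3, cpu_usage=0.7)
a.gossip_to(b)
```

In a real deployment each agent would pick random gossip partners every epoch; here a single push suffices to show the merge rule.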
An economic approach
● Time is split into epochs (no synchronization between servers)
● Servers charge a virtual rent for hosting a component according to
– Current resource usage (I/O, CPU, ...) of the server
– Technical factors (HW, connectivity, ...)
– Non-technical factors (country stability, ...)
● Components
– Pay the virtual rent at each epoch
– Gain virtual money by processing requests
– Take decisions based on their balance ( = gain – rent )
● Replicate, migrate, suicide, or stay
● Virtual rents are updated by gossiping (no centralized board)
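The per-epoch decision rule can be sketched as below. The thresholds are illustrative assumptions; the talk only states that decisions (replicate, migrate, suicide, stay) are taken from the balance.

```python
def end_of_epoch_decision(gain, rent, replicate_margin=2.0):
    """Toy decision rule for one component at the end of an epoch.

    balance = gain - rent, as on the slide. A component earning far more
    than its rent replicates; one that cannot pay its rent migrates or
    'suicides'; otherwise it stays. `replicate_margin` is an assumption.
    """
    balance = gain - rent
    if balance > replicate_margin * rent:
        return "replicate"            # very profitable: add a replica
    if balance < 0:
        return "migrate_or_suicide"   # cannot afford this server: move away
    return "stay"
```

A busy component on a cheap server replicates; an idle component on an expensive server leaves, which is how the system drains load away from overloaded machines.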
Economic model (i)
● The rent of a server is different for each component!
Economic model (ii)
● VM1 and VM2 have an « identical » overall resource usage: 45%
● Server rent = server's resource usage weighted by the component's resource weights
– Rent for C1 @ VM1 > rent for C1 @ VM2
[Figure: C1 (CPU: 30%, I/O: 5%) choosing between VM1 (CPU: 70%, I/O: 20%) and VM2 (CPU: 25%, I/O: 65%); server resources are multiplexed]
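The slide's example can be reproduced with a simple weighted sum. The exact rent formula is not shown in the transcript, so this linear weighting is a sketch of the idea, using the numbers from the figure.

```python
def weighted_rent(server_usage, component_weights):
    """Rent of a server as seen by one component: the server's per-resource
    usage weighted by how much the component relies on each resource
    (illustrative; the paper's exact formula may differ)."""
    return sum(component_weights[r] * server_usage[r] for r in component_weights)

c1 = {"cpu": 0.30, "io": 0.05}    # C1 is CPU-bound
vm1 = {"cpu": 0.70, "io": 0.20}   # both VMs average ~45% usage overall
vm2 = {"cpu": 0.25, "io": 0.65}

# C1's dominant resource (CPU) is less loaded on VM2, so VM2 is cheaper for C1.
rent_vm1 = weighted_rent(vm1, c1)
rent_vm2 = weighted_rent(vm2, c1)
```

This is why two servers with the same average utilization can charge very different rents to the same component.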
Economic model (iii)
● Choosing a candidate server j during replication/migration of a component i
– Net benefit maximization
● 2 optimization goals:
– High availability through geographical diversity of replicas
– Low latency by grouping related components
● gj : weight related to the proximity of server j to the geographical distribution of the client requests to the component
● Si : the set of servers hosting a replica of component i
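A candidate-selection loop in the spirit of the slide might look like the sketch below. The net-benefit formula (geographic weight times expected gain, minus rent) and all names are assumptions; the original equation was an image lost in the transcript.

```python
def choose_candidate(component, servers, hosting):
    """Pick a candidate server j for replicating/migrating component i by
    maximizing a toy net benefit: g_j * expected gain - rent_j. Servers in
    S_i (`hosting`) are skipped so replicas stay geographically diverse.
    Formula and field names are illustrative assumptions."""
    best, best_benefit = None, float("-inf")
    for sid, info in servers.items():
        if sid in hosting:   # S_i: server already hosts a replica of i
            continue
        benefit = info["g"] * component["expected_gain"] - info["rent"]
        if benefit > best_benefit:
            best, best_benefit = sid, benefit
    return best

servers = {
    "s1": {"g": 0.9, "rent": 5.0},  # close to the clients but busy
    "s2": {"g": 0.5, "rent": 2.0},  # cheap but far away
    "s3": {"g": 1.0, "rent": 1.0},  # best, but already hosts a replica
}
best = choose_candidate({"expected_gain": 10.0}, servers, hosting={"s3"})
```

Excluding S_i implements the high-availability goal; the g_j factor implements the low-latency goal.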
SLA Performance Guarantees (i)
● Each component has its own SLA constraints
● SLAs are derived directly from the entry components
● Resp. Time = Service Time + max (Resp. Time of Dependencies)
[Figure: dependency graph of components C1–C5; the entry component C1 carries the end-to-end SLA of 500 ms]
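The response-time recursion above can be computed directly over the dependency graph. The per-component service times below are hypothetical numbers, since the slide only shows the graph and the 500 ms entry SLA.

```python
def response_time(component, service_time, deps):
    """Resp. Time = Service Time + max(Resp. Time of Dependencies),
    exactly the slide's recursion. `deps` maps a component to the
    components it calls; leaves contribute only their service time."""
    children = deps.get(component, [])
    if not children:
        return service_time[component]
    return service_time[component] + max(
        response_time(c, service_time, deps) for c in children
    )

# Hypothetical service times (ms) for the 5-component graph on the slide,
# assuming C1 calls C2 and C3, and C2 calls C4 and C5:
service_time = {"C1": 50, "C2": 100, "C3": 80, "C4": 120, "C5": 60}
deps = {"C1": ["C2", "C3"], "C2": ["C4", "C5"]}
end_to_end = response_time("C1", service_time, deps)
```

With these numbers the end-to-end response time is 50 + (100 + 120) = 270 ms, comfortably inside the 500 ms SLA.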
SLA Performance Guarantees (ii)
● SLA propagation from parents to children
● A parent j sends its performance constraints (e.g. a response-time upper bound) to its dependencies D(j)
● A child i computes its own performance constraints from the groups of constraints sent by the replicas of each parent g
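The propagation step can be sketched as follows. The equations on the original slides were images and are not in the transcript, so both the budget split and the min-aggregation are plausible readings, not the paper's exact formulas.

```python
def propagate(parent_bound_ms, parent_service_time_ms):
    """Bound a parent j sends to its dependencies D(j): the response time
    left over after the parent's own service time (an assumed split)."""
    return parent_bound_ms - parent_service_time_ms

def child_constraint(received_bounds_ms):
    """A child i may receive constraints from several parents (and from
    several replicas of each parent g); taking the minimum, i.e. the
    strictest bound, satisfies all of them."""
    return min(received_bounds_ms)

# C1 (SLA 500 ms, service time 50 ms) propagates 450 ms to its children;
# a child that also receives 300 ms from another parent must meet 300 ms.
budget_from_c1 = propagate(500, 50)
child_budget = child_constraint([budget_from_c1, 300])
```

Aggregating with min is the conservative choice: a child that meets the tightest parent bound automatically meets the looser ones.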
SLA Performance Guarantees (iii)
● SLA propagation from parents to children
Automatic Provisioning
● Usage of allocated resources is maximized:
– Autonomic migration / replication / suicide of components
– Not enough on its own to ensure the end-to-end response time
● Cloud resources are managed by the framework via the cloud API
● Each individual component has to satisfy its own SLA
– SLA easily met -> decrease resources (scale down)
– SLA not met -> increase resources (scale up, scale out)
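The scale-down/scale-up rule can be sketched as a per-component check. The `slack` threshold that defines "easily met" is an illustrative assumption; the slide only gives the two directions.

```python
def provisioning_action(measured_rt_ms, sla_rt_ms, slack=0.5):
    """Toy per-component provisioning decision:
    SLA violated -> acquire resources through the cloud API;
    SLA met with a large margin -> release resources; otherwise keep."""
    if measured_rt_ms > sla_rt_ms:
        return "scale_up_or_out"   # SLA not met: add cores or servers
    if measured_rt_ms < slack * sla_rt_ms:
        return "scale_down"        # SLA easily met: shrink allocation
    return "keep"
```

Keeping a "keep" band between the two thresholds avoids oscillating between scale-up and scale-down on noisy measurements.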
Adaptivity to slow servers
● Each component keeps statistics about its children
– e.g. 95th percentile response time
● A routing coefficient is computed for each child at each epoch
– Send more requests to the better-performing children
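One way to turn the per-child statistics into routing coefficients is shown below. Inverse proportionality to the 95th-percentile response time is an assumption; the talk only says faster children should receive more requests.

```python
def routing_coefficients(p95_rt_ms):
    """Per-child routing weights recomputed each epoch: inversely
    proportional to each child's 95th-percentile response time, then
    normalized to sum to 1, so faster children get more traffic."""
    inv = {child: 1.0 / rt for child, rt in p95_rt_ms.items()}
    total = sum(inv.values())
    return {child: w / total for child, w in inv.items()}

# A child twice as slow receives half the weight:
coeffs = routing_coefficients({"VM1": 100.0, "VM2": 200.0})
```

This softly drains traffic away from a slow VM instead of cutting it off, which matches the gradual degradation seen in Issue #2.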
Evaluation
Evaluation: Setup
● 5 components, mostly CPU-intensive (wc >> wm,wn,wd)
● 8 8-core servers (Intel Core i7 920, 2.67 GHz, 8 GB, Linux 2.6.32-trunk-amd64)
● d=0, C=110, k =10000, xs* = 25%
[Figure: the evaluated 5-component dependency graph (C1–C5); the entry component C1 has an SLA of 500 ms]
Adaptation to Varying Load (i)
● 5 rps to 60 rps at minute 8, step 5 rps/min
● Static setup: 2 servers with 2 cores
Adaptation to Varying Load (ii)
● 5 rps to 60 rps at minute 8, step 5 rps/min
● Static setup: 2 servers with 2 cores
Adaptation to Slow Server
● Max 2 cores/server, 25 rps
● At minute 4, a server gets slower (200 ms delay)
Scalability
● Add 5 rps per minute until 150 rps
● Max 6 cores/server
Conclusion
● Framework for building cloud applications
● Elasticity: add/remove resources
● High Availability: software, hardware, network failures
● Scalability: growing load, peaks, scaling down, ...
– Quick replication of busy components
● Load Balancing : load has to be shared by all available servers
– Replication of busy components
– Migration of less busy components
– Reach equilibrium when load is stable
● SLA performance guarantees
– Automatic provisioning
● No synchronization, fully decentralized
Thank you !