Autonomic Decentralised Elasticity Management of Cloud Applications Srikumar Venugopal, Reza Nouri and Han Li School of Computer Science and Engineering University of New South Wales, Sydney, Australia E: [email protected] W: http://www.cse.unsw.edu.au/~srikumarv

Post on 14-Jul-2015

Page 1: Autonomic Decentralised Elasticity Management of Cloud Applications

Autonomic Decentralised Elasticity Management of Cloud Applications

Srikumar Venugopal, Reza Nouri and Han Li

School of Computer Science and Engineering

University of New South Wales, Sydney, Australia

E: [email protected]

W: http://www.cse.unsw.edu.au/~srikumarv

Page 2: Autonomic Decentralised Elasticity Management of Cloud Applications

Agenda

Background & Motivation

Problem Statement

Solution Overview

Evaluation: Methodology and Results

Conclusion

Page 3: Autonomic Decentralised Elasticity Management of Cloud Applications

The Promise of Cloud Computing

Page 4: Autonomic Decentralised Elasticity Management of Cloud Applications

Background & Motivation

Page 5: Autonomic Decentralised Elasticity Management of Cloud Applications

State-of-the-art in Auto-scaling

Product/Project       | Trigger                      | Controller                | Actions
Amazon Autoscaling    | CloudWatch metrics/Threshold | Rule-based/Schedule-based | Add/Remove Capacity
WASABi                | Azure Diagnostics/Threshold  | Rule-based                | Add/Remove Capacity, Custom
RightScale/Scalr      | Load monitoring              | Rule-based/Schedule-based | Add/Remove Capacity, Custom
Google Compute Engine | CPU Load, etc.               | Rule-based                | Add/Remove Capacity
Academic
CloudScale            | Demand Prediction            | Control theory            | Voltage-scaling
Cataclysm             | Threshold-based              | Queueing-model            | Admission Control
IBM Unity             | Application Utility          | Utility functions/RL      | Add/Remove Capacity

Page 6: Autonomic Decentralised Elasticity Management of Cloud Applications

Cons of Rule-based Auto-scaling

• Currently, the most popular mechanisms for auto-scaling are rule-based

• The effectiveness of rule-based auto-scaling is determined by the trigger conditions

• Setting up the triggers is a trial-and-error process

Page 7: Autonomic Decentralised Elasticity Management of Cloud Applications

Cons of Rule-based Autoscaling

• Commercial products are rule-based

– Gives “illusion of control” to users

– Leads to the problem of defining the “right” thresholds

• Centralised controllers

– Communication overhead increases with size

– Processing overhead also increases (Big Data!)

• Limited to one application per VM

Page 8: Autonomic Decentralised Elasticity Management of Cloud Applications

Challenges of large-scale elasticity

• Large numbers of instances and apps

– Deriving solutions takes time

• Dynamic conditions

– Apps are going critical all the time

• Shifting bottlenecks

– Greedy solutions may create bottlenecks in other places

• Network partitions, fault tolerance…

H. Li, S. Venugopal, Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform, Proceedings of the 8th ICAC '11.

Page 9: Autonomic Decentralised Elasticity Management of Cloud Applications

Problem Statement

Page 10: Autonomic Decentralised Elasticity Management of Cloud Applications

Initial Conditions

IaaS Provider

Instance1 (App Server1): app1, app2

Instance2 (App Server2): app3, app4

Page 11: Autonomic Decentralised Elasticity Management of Cloud Applications

A Critical Event

IaaS Provider

Instance1 (App Server1): app1, app2

Instance2 (App Server2): app3, app4

Page 12: Autonomic Decentralised Elasticity Management of Cloud Applications

Placement 1

IaaS Provider

Instance1 (App Server1): app2

Instance2 (App Server2): app3, app4, app1

Page 13: Autonomic Decentralised Elasticity Management of Cloud Applications

Placement 2

IaaS Provider

Instance1 (App Server1): app1

Instance2 (App Server2): app3, app4, app2

Page 14: Autonomic Decentralised Elasticity Management of Cloud Applications

Placement 3

IaaS Provider

Instance1 (App Server1): app2

Instance2 (App Server2): app3, app4

Instance3 (App Server3): app1

Page 15: Autonomic Decentralised Elasticity Management of Cloud Applications

Placements 4 & 5

IaaS Provider

Instance1 (App Server1): app2

Instance2 (App Server2): app3, app4

IaaS Provider

Instance1 (App Server1): app2

Instance2 (App Server2): app3, app4

Instance3 (App Server3)

app1 app1

app1 app1

Page 16: Autonomic Decentralised Elasticity Management of Cloud Applications

Challenges of App Placement

• Load shifts are dynamic

• Multiple applications may go critical simultaneously

• Instance provisioning should be controlled

• Service QoS must be maintained

Page 17: Autonomic Decentralised Elasticity Management of Cloud Applications

Twin Objectives

• Provisioning Problem

– To determine the smallest number of servers required to satisfy the resource requirements of all the applications

• Dynamic Placement Problem

– To distribute the applications so as to maximise utilisation while meeting each app’s response time and availability requirements
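
The provisioning problem has the shape of bin packing. As a minimal sketch, assume each application has a single scalar demand (in percent of one server) and all servers have equal capacity; neither assumption comes from the slides. The first-fit decreasing heuristic below is a textbook stand-in used for illustration, not the method ADEC uses, and all names are hypothetical.

```python
def first_fit_decreasing(demands, capacity):
    """Pack app demands onto equal-capacity servers with first-fit decreasing.

    demands: dict of app name -> demand (hypothetical units, e.g. percent)
    capacity: per-server capacity in the same units
    Returns a list of servers, each a list of app names.
    """
    servers = []   # app names placed on each server
    free = []      # remaining capacity of each server
    # Place the largest demands first.
    for app, need in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if need > capacity:
            raise ValueError(f"{app} does not fit on any server")
        for i, room in enumerate(free):
            if need <= room:
                servers[i].append(app)
                free[i] -= need
                break
        else:
            # No existing server has room, so provision a new one.
            servers.append([app])
            free.append(capacity - need)
    return servers


placement = first_fit_decreasing(
    {"app1": 60, "app2": 30, "app3": 40, "app4": 20}, capacity=100
)
```

The heuristic gives an upper bound on the number of servers rather than a guaranteed minimum, and it ignores response time and availability, which the placement problem must also respect.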

Page 18: Autonomic Decentralised Elasticity Management of Cloud Applications

Solution Overview

Page 19: Autonomic Decentralised Elasticity Management of Cloud Applications

Decentralised Elastic Control

• Instances control their own utilisation

– Monitoring, management and feedback

• Local controllers are learning agents

– Reinforcement Learning

• Servers are linked by ZooKeeper

– Agility, Flexibility, Co-ordination

• We call our system ADEC (Autonomic Decentralised Elasticity Control)

Page 20: Autonomic Decentralised Elasticity Management of Cloud Applications

Software Architecture of ADEC

Page 21: Autonomic Decentralised Elasticity Management of Cloud Applications

Reinforcement Learning

• Learn optimal management policies over time

– vs. Model-based policies

• Learn long-term effects of short-term actions

– If the state-action pairs are chosen correctly

• We have applied Q-Learning to this problem

– Initial actions are drawn using a Boltzmann distribution
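
A minimal sketch of tabular Q-learning with Boltzmann (softmax) action selection may help make these bullets concrete. The state names, learning rate, discount factor and temperature are illustrative assumptions, not values from the slides; only the server action names and the -3.5 reward for create appear in the deck.

```python
import math
import random
from collections import defaultdict


def boltzmann_probabilities(q_values, temperature):
    """Softmax over Q-values: higher-valued actions are chosen more often."""
    top = max(q_values.values())  # subtract the max for numerical stability
    weights = {a: math.exp((q - top) / temperature) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def choose_action(q_values, temperature, rng=random):
    """Sample one action according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


ACTIONS = ["create", "terminate", "find"]  # server actions named on the slides
q = defaultdict(float)                     # unseen state-action pairs start at 0
# "overloaded" and "normal" are hypothetical state labels.
new_value = q_update(q, "overloaded", "create", -3.5, "normal", ACTIONS)
probs = boltzmann_probabilities(
    {a: q[("overloaded", a)] for a in ACTIONS}, temperature=1.0
)
```

A high temperature makes the choice close to uniform (more exploration); lowering it over time shifts the controller towards the actions with the highest learned value.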

Page 22: Autonomic Decentralised Elasticity Management of Cloud Applications

Abstract View of the Control Scheme

Page 23: Autonomic Decentralised Elasticity Management of Cloud Applications

States

Page 24: Autonomic Decentralised Elasticity Management of Cloud Applications

Basic Actions

Server actions:      create (-3.5)   terminate (3.5)   find (3.5)

Application actions: move (0.5)   duplicate (0.5)   merge (0.5)

Page 25: Autonomic Decentralised Elasticity Management of Cloud Applications

Actions and Rewards

• Actual actions are a combination of a server action and an application action

– E.g. find and move, merge and terminate

• 11 pre-defined actions

– Reducing complexity

• Each action is associated with a reward

– Negative rewards for actions incurring costs (e.g. start server)

– Positive rewards for actions that save costs (e.g. terminate)
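
As a rough illustration of combined actions, the sketch below assumes the reward of a combined action is the sum of its server and application parts. That rule is an assumption, as is the positional pairing of reward values with action names taken from the Basic Actions slide; the slides do not list the 11 pre-defined actions, so only a few combinations are shown.

```python
# Rewards read positionally from the "Basic Actions" slide; the pairing of
# values with actions is an assumption because the transcript flattened the figure.
SERVER_REWARDS = {"create": -3.5, "terminate": 3.5, "find": 3.5}
APP_REWARDS = {"move": 0.5, "duplicate": 0.5, "merge": 0.5}


def composite_reward(server_action, app_action):
    """Reward of a combined action, assumed here to be the sum of its parts."""
    return SERVER_REWARDS[server_action] + APP_REWARDS[app_action]


# The two combinations given as examples on the slide.
find_and_move = composite_reward("find", "move")
merge_and_terminate = composite_reward("terminate", "merge")
# A hypothetical cost-incurring combination, for contrast.
create_and_move = composite_reward("create", "move")
```

Under these assumptions, combinations that start a server come out negative and combinations that release one come out positive, which matches the sign convention on the slide.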

Page 26: Autonomic Decentralised Elasticity Management of Cloud Applications

Co-ordination using find

• Server looks up other servers with the least load

– ZooKeeper lookup

• Sends a move message to the selected server

• The selected server replies with accept or reject

– accept has a positive reward
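
The exchange above can be sketched in memory as follows. The list `registry` stands in for the ZooKeeper lookup, loads are integer percentages, and the acceptance rule (stay at or below an 85% upper threshold) is an assumption; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    load: int                                  # utilisation in percent
    apps: list = field(default_factory=list)
    upper: int = 85                            # assumed upper server threshold

    def handle_move(self, app, extra_load):
        """Accept the app only if this server stays at or below its threshold."""
        if self.load + extra_load > self.upper:
            return "reject"
        self.apps.append(app)
        self.load += extra_load
        return "accept"


def find_and_move(source, registry, app, app_load):
    """Look up the least-loaded other server and ask it to take `app`."""
    candidates = [s for s in registry if s is not source]
    if not candidates:
        return None, "reject"
    target = min(candidates, key=lambda s: s.load)
    reply = target.handle_move(app, app_load)
    if reply == "accept":
        # Only release the app locally once the target has taken it.
        source.apps.remove(app)
        source.load -= app_load
    return target, reply


s1 = Server("Instance1", 90, ["app1", "app2"])
s2 = Server("Instance2", 40, ["app3", "app4"])
target, reply = find_and_move(s1, [s1, s2], "app1", 30)
```

With these numbers the move is accepted and the result is the layout shown as Placement 1: app2 alone on Instance1, and app3, app4 and app1 on Instance2.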

Page 27: Autonomic Decentralised Elasticity Management of Cloud Applications

Shrinking

• The controller is always reward-maximising

– Highest reward is for merge+terminate

• A controller initiates its own shutdown

– Low load on its applications

• Gets exclusive lock on termination

– Only one instance can terminate at a time

• Transfers state before shutdown
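
A minimal single-process sketch of this shutdown sequence follows. A `threading.Lock` stands in for the shared termination lock (a real deployment would need a distributed lock, for example one built on ZooKeeper), and the 60% low-load cut-off and the dictionary layout are assumptions made for illustration.

```python
import threading

# Stand-in for the shared termination lock; process-local only.
termination_lock = threading.Lock()


def try_shutdown(server, peers, lower=60):
    """Shut `server` down if it is lightly loaded and can take the lock.

    server / peers: dicts with "load" (percent), "apps" and "running" keys.
    Returns True if the server handed over its apps and terminated.
    """
    if server["load"] >= lower or not peers:
        return False
    if not termination_lock.acquire(blocking=False):
        return False                      # another instance is terminating
    try:
        target = min(peers, key=lambda p: p["load"])
        # Transfer state (here: the app list and its load) before shutdown.
        target["apps"].extend(server["apps"])
        target["load"] += server["load"]
        server["apps"], server["load"], server["running"] = [], 0, False
        return True
    finally:
        termination_lock.release()


a = {"name": "Instance1", "load": 20, "apps": ["app2"], "running": True}
b = {"name": "Instance2", "load": 50, "apps": ["app3", "app4"], "running": True}
done = try_shutdown(a, [b])
```

The sketch does not check whether the receiving server would exceed its own upper threshold; a fuller version would reuse the accept/reject exchange before merging.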

Page 28: Autonomic Decentralised Elasticity Management of Cloud Applications

Information on the DHT

• Server event notification

• List of applications on each server

• Server status updates (load information)

• Q-value updates

Page 29: Autonomic Decentralised Elasticity Management of Cloud Applications

Evaluation

Page 30: Autonomic Decentralised Elasticity Management of Cloud Applications

Experiment 1: Testing ADEC

• IaaS provider: Amazon EC2

– Small and High-CPU instances

• Load tester: Apache JMeter

• Application server: Tomcat 6.0

– JVM with 1 GB RAM

• Server thresholds: 60% and 85%

Page 31: Autonomic Decentralised Elasticity Management of Cloud Applications

Experiment 1: Testing

• Six web applications

– Test Application: Hotel Management

– Search, Book, Confirm

• Five were subjected to a background load

– Uniform Random

• One was subjected to the test load

• Application thresholds: 200 ms and 500 ms

• Metrics

– Average Response Time, Drop Rate, Servers
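
The first two metrics can be computed from a window of request samples roughly as sketched below. The sample format (response time in ms, `None` for a dropped request) and the reading of 200 ms and 500 ms as the lower and upper bounds of an acceptable band are assumptions, and the function name is hypothetical.

```python
def summarise(samples, low_ms=200, high_ms=500):
    """Summarise one measurement window.

    samples: list of response times in ms, with None marking a dropped request.
    low_ms / high_ms: the two application thresholds, treated here (as an
    assumption) as the bounds of an acceptable band.
    """
    served = [t for t in samples if t is not None]
    dropped = len(samples) - len(served)
    avg = sum(served) / len(served) if served else None
    drop_rate = dropped / len(samples) if samples else 0.0
    if avg is None:
        band = "unknown"
    elif avg < low_ms:
        band = "below"
    elif avg <= high_ms:
        band = "within"
    else:
        band = "above"
    return {"avg_ms": avg, "drop_rate": drop_rate, "band": band}


window = summarise([100, 300, None, 500, 300])
```

The third metric, the number of servers, is simply the count of running instances over time and needs no per-request data.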

Page 32: Autonomic Decentralised Elasticity Management of Cloud Applications

Peaking Workload

Page 33: Autonomic Decentralised Elasticity Management of Cloud Applications

Poisson Workload

Page 34: Autonomic Decentralised Elasticity Management of Cloud Applications

Conclusion

Page 35: Autonomic Decentralised Elasticity Management of Cloud Applications

Conclusion

• Demonstrated a co-ordination architecture for provisioning web applications

• Each server is independent and the system is managed by a set of simple states and actions

• Instances start and shut down on their own to meet application objectives

Page 36: Autonomic Decentralised Elasticity Management of Cloud Applications

Ongoing Work

• Improved performance modeling for quick detection of slowdowns

• Using utility functions for defining application priorities

• Extension to SOA and BPM

– Collaboration with Technical University of Vienna, Austria

• Scaling the database

– ElasCass project

Page 37: Autonomic Decentralised Elasticity Management of Cloud Applications

Questions?

[email protected]

Thank you!