self learning cloud controllers

Self-Learning Cloud Controllers

Pooyan Jamshidi, Amir Sharifloo, Claus Pahl, Andreas Metzger, Giovani Estrada IC4, Dublin City University, Ireland University of Duisburg-Essen, Germany Intel, Ireland

pooyan.jamshidi@computing.dcu.ie

Invited presentation at UFC, Brazil, Fortaleza

~50% = wasted hardware

Actual traffic

Typical weekly traffic to Web-based applications (e.g., Amazon.com)

Problem 1: ~75% wasted capacity Actual

demand

Problem 2: customer lost

Traffic in an unexpected burst in requests (e.g. end of year traffic to Amazon.com)

Really like this??

Auto-scaling enables you to realize this ideal on-demand provisioning

Demand

? Enacting change in the Cloud resources are not real-time

Capacity we can provision with Auto-Scaling

A realistic figure of dynamic provisioning

0 50 1000

0 50 100100

0 50 1000

These quantitative values are required to be determined by the user ⇒  requires deep

knowledge of application (CPU, memory, thresholds)

⇒  requires performance modeling expertise (when and how to scale)

⇒  A unified opinion of user(s) is required

Amazon auto scaling Microsoft Azure Watch

Microsoft Azure Auto-scaling Application Block

Naeem Esfahani and Sam Malek, “Uncertainty in Self-Adaptive Software Systems”

Pooyan Jamshidi, Aakash Ahmad, Claus Pahl, Muhammad Ali Babar , “Sources of Uncertainty in Dynamic Management of Elastic Systems”, Under Review

Uncertainty related to enactment latency: The same scaling action (adding/removing a VM with precisely the same size) took different time to be enacted on the cloud platform (here is Microsoft Azure) at different points and this difference were significant (up to couple of minutes). The enactment latency would be also different on different cloud platforms.

Ø  Offline benchmarking

Ø  Trial-and-error

Ø  Expert knowledge

Costly and not systematic

A. Gandhi, P. Dube, A. Karve, A. Kochut, L. Zhang, Adaptive, “Model-driven Autoscaling for Cloud Applications”, ICAC’14

arrival rate (req/s)

95% Resp. :me (m

400 ms

60 req/s

RobusT2Scale Initial setting + elasticity rules + response-time SLA

environment monitoring

application monitoring

scaling actions

Fuzzy Reasoning Users

Prediction/Smoothing

0 0.5 1 1.5 2 2.5 30

Region of definite satisfaction

Region of definite dissatisfaction Region of

uncertain satisfaction

Performance Index

Performance Index P

words can mean different things to different people

Different users often recommend different elasticity policies

0 0.5 1 1.5 2 2.5 30

Type-2 MF

Type-1 MF

Workload

Response time

0 10 20 30 40 50 60 70 80 90 1000

Embedded

Rule (𝒍)

Antecedents Consequent

𝒄𝒂𝒗𝒈𝒍 Workload Response-‐

time Normal (-‐2)

Effort (-‐1)

Medium Effort (0)

High Effort (+1)

Maximum Effort (+2)

1 Very low Instantaneous 7 2 1 0 0 -‐1.6 2 Very low Fast 5 4 1 0 0 -‐1.4 3 Very low Medium 0 2 6 2 0 0 4 Very low Slow 0 0 4 6 0 0.6 5 Very low Very slow 0 0 0 6 4 1.4 6 Low Instantaneous 5 3 2 0 0 -‐1.3 7 Low Fast 2 7 1 0 0 -‐1.1 8 Low Medium 0 1 5 3 1 0.4 9 Low Slow 0 0 1 8 1 1 10 Low Very slow 0 0 0 4 6 1.6 11 Medium Instantaneous 6 4 0 0 0 -‐1.6 12 Medium Fast 2 5 3 0 0 -‐0.9 13 Medium Medium 0 0 5 4 1 0.6 14 Medium Slow 0 0 1 7 2 1.1 15 Medium Very slow 0 0 1 3 6 1.5 16 High Instantaneous 8 2 0 0 0 -‐1.8 17 High Fast 4 6 0 0 0 -‐1.4 18 High Medium 0 1 5 3 1 0.4 19 High Slow 0 0 1 7 2 1.1 20 High Very slow 0 0 0 6 4 1.4 21 Very high Instantaneous 9 1 0 0 0 -‐1.9 22 Very high Fast 3 6 1 0 0 -‐1.2 23 Very high Medium 0 1 4 4 1 0.5 24 Very high Slow 0 0 1 8 1 1 25 Very high Very slow 0 0 0 4 6 1.6

Rule ()

Antecedents Consequent

Work load

Response

-time -2 -1 0 +1 +2

12 Medium Fast 2 5 3 0 0 -0.9

10 experts’ responses

𝑅↑𝑙 : IF (the workload (𝑥↓1 ) is 𝐹 ↓𝑖↓1  , AND the response-time (𝑥↓2 ) is 𝐺 ↓𝑖↓2  ), THEN (add/remove 𝑐↓𝑎𝑣𝑔↑𝑙  instances).

𝑐↓𝑎𝑣𝑔↑𝑙 = ∑𝑢=1↑𝑁↓𝑙 ▒𝑤↓𝑢↑𝑙 ×𝐶 /∑𝑢=1↑𝑁↓𝑙 ▒𝑤↓𝑢↑𝑙   

Goal: pre-computations of costly calculations to make a runtime efficient elasticity reasoning based on fuzzy inference

Liang, Q., Mendel, J. M. (2000). Interval type-2 fuzzy logic systems: theory and design. Fuzzy Systems, IEEE Transactions on, 8(5), 535-550.

Scaling Actions

Monitoring Data

0 10 20 30 40 50 60 70 80 90 1000

0.5954

0.3797

0 10 20 30 40 50 60 70 80 90 1000

0.2212

0.0000

0 10 20 30 40 50 60 70 80 90 1000

uMonitoring data

Workload

Response time

0 10 20 30 40 50 60 70 80 90 1000

0.5954

0.3797

0 10 20 30 40 50 60 70 80 90 1000

10.95680.9377

Performance index

𝑦↓𝑙 , 𝑦↓𝑟 

0 50 1000

0 50 100100

0 50 1000

0 10 20 30 40 50 60 70 80 90 100-500

Time (seconds)

f hits

Original databetta=0.10, gamma=0.94, rmse=308.1565, rrse=0.79703betta=0.27, gamma=0.94, rmse=209.7852, rrse=0.54504betta=0.80, gamma=0.94, rmse=272.6285, rrse=0.70858

Big spike Dual phase Large variations Quickly varying Slowly varying Steep tri phase

0 50 1000

0 50 100100

0 50 1000

6000 50 1000

0 50 1000

SUT Criteria Big spike

Dual phase

Large variations

Quickly varying

Slowly varying

Steep tri phase

RobusT2Scale 973ms 537ms 509ms 451ms 423ms 498ms

3.2 3.8 5.1 5.3 3.7 3.9

Overprovisioning

354ms 411ms 395ms 446ms 371ms 491ms

6 6 6 6 6 6

Under provisioning

1465ms 1832ms 1789ms 1594ms 1898ms 2194ms

2 2 2 2 2 2

SLA: 𝒓𝒕↓𝟗𝟓 ≤𝟔𝟎𝟎𝒎𝒔 For every 10s control interval

• RobusT2Scale is superior to under-provisioning in terms of guaranteeing the SLA and does not require excessive resources • RobusT2Scale is superior to over-provisioning in terms of guaranteeing required resources while guaranteeing the SLA

alpha=0.1 alpha=0.5 alpha=0.9 alpha=1.0

Noise level: 10%

Self-Learning Controller

Current Solution (my PhD Thesis + IC4 work on auto-scaling)

Future Plan Updating K in MAPE-K @ Runtime Design-time

Assistance

Multi-cloud

Fuzzifier

Inference Engine

Defuzzifier Rule base

Fuzzy Q-learning

Cloud Application Monitoring Actuator

Cloud Platform

Fuzzy Logic Controller

Knowledge Learning

𝑟𝑡 𝑤

𝑤, 𝑟𝑡, 𝑡ℎ, 𝑣𝑚

𝑠𝑎

system state system goal

RobusT2Scale

Learned rules

Monitoring Actuator

Cloud Platform

ElasticBench

𝑤, 𝑟𝑡

𝑠𝑎

Load Generator

system state

𝛾,𝜂,𝜀,𝑟

Cloud Platform (PaaS) On-Premise

P: Worker

L: Web Role

P: Worker

M: Worker

Results:

Storage

Blackboard: Storage

LG: Console

Auto-scaling Logic (controller)

Policy Enforcer

LB: Load

Balancer

5 Queue

Actuator

Monitoring

0 4 12

S1 S2 S3 S4 S5

0 50 100 150 200 250 300 350

q(9,5)

0 0.02 0.04 0.06 0.08

0.1 0.12 0.14 0.16 0.18

0 50 100 150 200 250 300 350

q(7,5)

0 50 100 150 200 250 300 350

q(9,3)

0 50 100 150 200 250 300 350

q(3,3)

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000

Challenge 1: ~ 75% wasted capacityA c t u a l

d e m a n d

Challenge 2: customer lost

Fuzzifier

Inference Eng ine

DefuzzifierRule base

FuzzyQ-‐learning

Cloud ApplicationMonitoring Actuator

Cloud Platform

Fuzzy Log ic Controller

Knowledge Learning

omic Con

troller

𝑟𝑡𝑤

𝑤,𝑟𝑡,𝑡ℎ,𝑣𝑚

𝑠𝑎

system state system goal

RobusT2Scale

Learned rules

Monitoring Actuator

Cloud Platform

ElasticBench

𝑤, 𝑟𝑡

𝑠𝑎

Load Generator

system state

𝛾, 𝜂, 𝜀, 𝑟

http://computing.dcu.ie/~pjamshidi/PDF/SEAMS2014.pdf

More Details? => http://www.slideshare.net/pooyanjamshidi/

Slides?

Thank you!

Aakash Ahmad Claus Pahl Soodeh Farokhi Amir Sharifloo Armin Balalaie

https://github.com/pooyanjamshidi

Code? =>

Hamed Jamshidi

Nabor Mendonca

Brian Carroll Reza Teimourzadegan

self learning cloud controllers

different time

different cloud platforms

different things

different points

different elasticity

cloud applications

medium effort

mf workload response

Software

self servicing in epam private cloud 0.3 (1)

self-service in epam private cloud

differentiate cloud services with self-service application...

self servicing in epam private cloud 4.0

fan-tastic vent fans & controllers - amazon web...

make your openstack cloud self-defending with vespa!

self-service cloud...

controllers, cloud, & cooperative control - · pdf...

cloud computing and self-service

the community cloud and the quantifiable self

self-tuning service provisioning for decentralized cloud...

self-service cloud computing - rutgers...

secure self destruction scheme in cloud computing

self-adaptation of online recommender systems via...

self-adaptive cloud infrastructures with bidirectional...

self learning material cloud computing

personal cloud self‐protecting self‐encrypting storage...

cloud geospasial self-managed services untuk katalog data

self-optimizing memory controllers: a reinforcement...

the future of self-service is cloud analytics