self learning cloud controllers

Post on 18-Jul-2015

285 Views

Category:

Software

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Self-Learning Cloud Controllers

Pooyan Jamshidi, Amir Sharifloo, Claus Pahl, Andreas Metzger, Giovani Estrada IC4, Dublin City University, Ireland University of Duisburg-Essen, Germany Intel, Ireland

pooyan.jamshidi@computing.dcu.ie

Invited presentation at UFC, Brazil, Fortaleza

~50% = wasted hardware

Actual traffic

Typical weekly traffic to Web-based applications (e.g., Amazon.com)

Problem 1: ~75% wasted capacity Actual

demand

Problem 2: customer lost

Traffic in an unexpected burst in requests (e.g. end of year traffic to Amazon.com)

Really like this??

Auto-scaling enables you to realize this ideal on-demand provisioning

Time  

Demand  

?  Enacting change in the Cloud resources are not real-time

Capacity we can provision with Auto-Scaling

A realistic figure of dynamic provisioning

0 50 1000

500

1000

1500

0 50 100100

200

300

400

500

0 50 1000

1000

2000

0 50 1000

200

400

600

These quantitative values are required to be determined by the user ⇒  requires deep

knowledge of application (CPU, memory, thresholds)

⇒  requires performance modeling expertise (when and how to scale)

⇒  A unified opinion of user(s) is required

Amazon auto scaling Microsoft Azure Watch

9  

Microsoft Azure Auto-scaling Application Block

Naeem Esfahani and Sam Malek, “Uncertainty in Self-Adaptive Software Systems”

Pooyan Jamshidi, Aakash Ahmad, Claus Pahl, Muhammad Ali Babar , “Sources of Uncertainty in Dynamic Management of Elastic Systems”, Under Review

Uncertainty related to enactment latency: The same scaling action (adding/removing a VM with precisely the same size) took different time to be enacted on the cloud platform (here is Microsoft Azure) at different points and this difference were significant (up to couple of minutes). The enactment latency would be also different on different cloud platforms.

Ø  Offline benchmarking

Ø  Trial-and-error

Ø  Expert knowledge

Costly and not systematic

A. Gandhi, P. Dube, A. Karve, A. Kochut, L. Zhang, Adaptive, “Model-driven Autoscaling for Cloud Applications”, ICAC’14

arrival  rate  (req/s)  

95%  Resp.  :me  (m

s)  

400  ms    

60  req/s  

RobusT2Scale Initial setting + elasticity rules + response-time SLA

environment monitoring

application monitoring

scaling actions

Fuzzy Reasoning Users

Prediction/Smoothing

       0 0.5 1 1.5 2 2.5 30

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

Region  of  definite  satisfaction  

Region  of  definite  dissatisfaction  Region  of  

uncertain  satisfaction  

Performance Index

Pos

sibi

lity

Performance Index P

ossi

bilit

y

words can mean different things to different people

Different users often recommend different elasticity policies

0 0.5 1 1.5 2 2.5 30

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

Type-2 MF

Type-1 MF

Workload

Response time

0 10 20 30 40 50 60 70 80 90 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

x2

uM

embe

rshi

p gr

ade

0 10 20 30 40 50 60 70 80 90 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

x2u

Mem

bers

hip

grad

e

=>

=>

UMF

LMF

Embedded

FOU

mean

sd

Rule  (𝒍)  

Antecedents   Consequent  

𝒄𝒂𝒗𝒈𝒍  Workload   Response-­‐

time  Normal  (-­‐2)  

Effort  (-­‐1)  

Medium  Effort  (0)  

High  Effort  (+1)  

Maximum  Effort  (+2)  

1   Very  low   Instantaneous   7   2   1   0   0   -­‐1.6  2   Very  low   Fast   5   4   1   0   0   -­‐1.4  3   Very  low   Medium   0   2   6   2   0   0  4   Very  low   Slow   0   0   4   6   0   0.6  5   Very  low   Very  slow   0   0   0   6   4   1.4  6   Low   Instantaneous   5   3   2   0   0   -­‐1.3  7   Low   Fast   2   7   1   0   0   -­‐1.1  8   Low   Medium   0   1   5   3   1   0.4  9   Low   Slow   0   0   1   8   1   1  10   Low   Very  slow   0   0   0   4   6   1.6  11   Medium   Instantaneous   6   4   0   0   0   -­‐1.6  12   Medium   Fast   2   5   3   0   0   -­‐0.9  13   Medium   Medium   0   0   5   4   1   0.6  14   Medium   Slow   0   0   1   7   2   1.1  15   Medium   Very  slow   0   0   1   3   6   1.5  16   High   Instantaneous   8   2   0   0   0   -­‐1.8  17   High   Fast   4   6   0   0   0   -­‐1.4  18   High   Medium   0   1   5   3   1   0.4  19   High   Slow   0   0   1   7   2   1.1  20   High   Very  slow   0   0   0   6   4   1.4  21   Very  high   Instantaneous   9   1   0   0   0   -­‐1.9  22   Very  high   Fast   3   6   1   0   0   -­‐1.2  23   Very  high   Medium   0   1   4   4   1   0.5  24   Very  high   Slow   0   0   1   8   1   1  25   Very  high   Very  slow   0   0   0   4   6   1.6  

 

Rule ()  

Antecedents   Consequent  

Work load  

Response

-time  -2   -1   0 +1   +2  

12   Medium   Fast   2   5   3 0   0   -0.9  

10 experts’ responses

𝑅↑𝑙 : IF (the workload (𝑥↓1 ) is 𝐹 ↓𝑖↓1  , AND the response-time (𝑥↓2 ) is 𝐺 ↓𝑖↓2  ), THEN (add/remove 𝑐↓𝑎𝑣𝑔↑𝑙  instances).

𝑐↓𝑎𝑣𝑔↑𝑙 = ∑𝑢=1↑𝑁↓𝑙 ▒𝑤↓𝑢↑𝑙 ×𝐶 /∑𝑢=1↑𝑁↓𝑙 ▒𝑤↓𝑢↑𝑙   

Goal: pre-computations of costly calculations to make a runtime efficient elasticity reasoning based on fuzzy inference

Liang, Q., Mendel, J. M. (2000). Interval type-2 fuzzy logic systems: theory and design. Fuzzy Systems, IEEE Transactions on, 8(5), 535-550.

Scaling Actions

Monitoring Data

0 10 20 30 40 50 60 70 80 90 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0.5954

0.3797

𝑀

0 10 20 30 40 50 60 70 80 90 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0.2212

0.0000

0 10 20 30 40 50 60 70 80 90 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

x2

u

0 10 20 30 40 50 60 70 80 90 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

x2

uMonitoring data

Workload

Response time

0 10 20 30 40 50 60 70 80 90 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0.5954

0.3797

0 10 20 30 40 50 60 70 80 90 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

10.95680.9377

Performance index

𝑦↓𝑙 , 𝑦↓𝑟 

0 50 1000

500

1000

1500

0 50 100100

200

300

400

500

0 50 1000

1000

2000

0 50 1000

200

400

600

0 50 1000

500

1000

0 50 1000

500

1000

0 10 20 30 40 50 60 70 80 90 100-500

0

500

1000

1500

2000

Time (seconds)

Num

ber o

f hits

Original databetta=0.10, gamma=0.94, rmse=308.1565, rrse=0.79703betta=0.27, gamma=0.94, rmse=209.7852, rrse=0.54504betta=0.80, gamma=0.94, rmse=272.6285, rrse=0.70858

0

0.2

0.4

0.6

0.8

1

1.2

1.4

Big spike Dual phase Large variations Quickly varying Slowly varying Steep tri phase

0 50 1000

500

1000

1500

0 50 100100

200

300

400

500

0 50 1000

1000

2000

0 50 1000

200

400

6000 50 1000

500

1000

0 50 1000

500

1000

Roo

t Rel

ativ

e S

quar

ed E

rror

SUT Criteria Big spike

Dual phase

Large variations

Quickly varying

Slowly varying

Steep tri phase

RobusT2Scale 973ms 537ms 509ms 451ms 423ms 498ms

3.2 3.8 5.1 5.3 3.7 3.9

Overprovisioning

354ms 411ms 395ms 446ms 371ms 491ms

6 6 6 6 6 6

Under provisioning

1465ms 1832ms 1789ms 1594ms 1898ms 2194ms

2 2 2 2 2 2

SLA: 𝒓𝒕↓𝟗𝟓 ≤𝟔𝟎𝟎𝒎𝒔 For every 10s control interval

• RobusT2Scale is superior to under-provisioning in terms of guaranteeing the SLA and does not require excessive resources • RobusT2Scale is superior to over-provisioning in terms of guaranteeing required resources while guaranteeing the SLA

0

0.02

0.04

0.06

0.08

0.1

alpha=0.1 alpha=0.5 alpha=0.9 alpha=1.0

Roo

t Mea

n S

quar

e E

rror

Noise level: 10%

Self-Learning Controller

Current Solution (my PhD Thesis + IC4 work on auto-scaling)

Future Plan Updating K in MAPE-K @ Runtime Design-time

Assistance

Multi-cloud

Fuzzifier

Inference Engine

Defuzzifier Rule base

Fuzzy Q-learning

Cloud Application Monitoring Actuator

Cloud Platform

Fuzzy Logic Controller

Knowledge Learning

Aut

onom

ic C

ontr

olle

r

𝑟𝑡 𝑤

𝑤,  𝑟𝑡,  𝑡ℎ,  𝑣𝑚

𝑠𝑎

system state system goal

RobusT2Scale

Learned rules

FQL

Monitoring Actuator

Cloud Platform

.fis

LW

W

ElasticBench

𝑤,  𝑟𝑡

𝑤,  𝑟𝑡,    𝑡ℎ,  𝑣𝑚

𝑠𝑎

Load Generator

C

system state

WCF

REST

𝛾,𝜂,𝜀,𝑟

Cloud Platform (PaaS) On-Premise

P: Worker

Role

L: Web Role

P: Worker

Role

P: Worker

Role

Cache

M: Worker

Role

Results:

Storage

Blackboard: Storage

LG: Console

Auto-scaling Logic (controller)

Policy Enforcer

1 2 3

7 8

11 12

4

10 9

LB: Load

Balancer

6

5 Queue

Actuator

Monitoring

0

0.2

0.4

0.6

0.8

1

1.2

0 4 12

15

21

26

33

39

47

53

60

68

76

83

90

95

103

110

118

124

130

136

145

152

159

166

171

179

185

193

200

206

214

218

219

228

236

242

249

255

263

267

271

275

283

290

295

303

307

314

320

S1 S2 S3 S4 S5

0

0.5

1

1.5

2

2.5

0 50 100 150 200 250 300 350

q(9,5)

0 0.02 0.04 0.06 0.08

0.1 0.12 0.14 0.16 0.18

0.2

0 50 100 150 200 250 300 350

q(7,5)

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

1.2

0 50 100 150 200 250 300 350

q(9,3)

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

0 50 100 150 200 250 300 350

q(3,3)

0

1

2

3

4

5

6

7

8

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000

1

2

3

4

Challenge  1:  ~ 75%  wasted  capacityA c t u a l  

d e m a n d

Challenge  2:  customer  lost

Fuzzifier

Inference  Eng ine

DefuzzifierRule  base

FuzzyQ-­‐learning

Cloud  ApplicationMonitoring Actuator

Cloud  Platform

Fuzzy  Log ic  Controller

Knowledge  Learning

Auton

omic  Con

troller

𝑟𝑡𝑤

𝑤,𝑟𝑡,𝑡ℎ,𝑣𝑚

𝑠𝑎

system  state system  goal

RobusT2Scale

Learned  rules

FQL

Monitoring Actuator

Cloud  Platform

.fis

LW

W

ElasticBench

𝑤, 𝑟𝑡

𝑤, 𝑟𝑡,    𝑡ℎ, 𝑣𝑚

𝑠𝑎

Load  Generator

C

system  state

WCF

REST

𝛾, 𝜂, 𝜀, 𝑟

http://computing.dcu.ie/~pjamshidi/PDF/SEAMS2014.pdf

More Details? => http://www.slideshare.net/pooyanjamshidi/

Slides?

=>

Thank you!

Aakash Ahmad Claus Pahl Soodeh Farokhi Amir Sharifloo Armin Balalaie

https://github.com/pooyanjamshidi

Code? =>

Hamed Jamshidi

Nabor Mendonca

Brian Carroll Reza Teimourzadegan

top related