self learning cloud controllers

Self-Learning Cloud Controllers
Pooyan Jamshidi, Amir Sharifloo, Claus Pahl, Andreas Metzger, Giovani Estrada
IC4, Dublin City University, Ireland; University of Duisburg-Essen, Germany; Intel, Ireland
[email protected]
Invited presentation at UFC, Fortaleza, Brazil

Upload: pooyan-jamshidi

Post on 18-Jul-2015


TRANSCRIPT

Page 1: Self learning cloud controllers


Page 2: Self learning cloud controllers

~50% = wasted hardware

Actual traffic

Typical weekly traffic to Web-based applications (e.g., Amazon.com)

Page 3: Self learning cloud controllers

Problem 1: ~75% wasted capacity

Actual demand

Problem 2: customer lost

Traffic during an unexpected burst of requests (e.g., end-of-year traffic to Amazon.com)

Page 4: Self learning cloud controllers

Really like this??

Auto-scaling enables you to realize this ideal on-demand provisioning

[Figure: demand versus time]

Enacting changes to cloud resources is not real-time

Page 5: Self learning cloud controllers

Capacity we can provision with Auto-Scaling

A realistic figure of dynamic provisioning

Page 6: Self learning cloud controllers
Page 7: Self learning cloud controllers


Page 8: Self learning cloud controllers
Page 9: Self learning cloud controllers

These quantitative values must be determined by the user:

⇒  requires deep knowledge of the application (CPU, memory, thresholds)

⇒  requires performance modeling expertise (when and how to scale)

⇒  requires a unified opinion among the user(s)

Examples: Amazon Auto Scaling, Microsoft Azure Watch, Microsoft Azure Auto-scaling Application Block
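To make the burden on the user concrete, here is a minimal sketch of a threshold-based scaling rule. The class `ThresholdRule`, the function `decide`, and every numeric default are hypothetical illustrations, not the API or defaults of any of the products named above.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    """Hypothetical rule: every number here has to be chosen by the user."""
    cpu_upper: float = 70.0   # scale out above this average CPU (%), assumed value
    cpu_lower: float = 30.0   # scale in below this average CPU (%), assumed value
    step: int = 1             # instances added or removed per action
    min_instances: int = 1
    max_instances: int = 10


def decide(rule: ThresholdRule, avg_cpu: float, instances: int) -> int:
    """Return the new instance count for one control interval."""
    if avg_cpu > rule.cpu_upper:
        return min(instances + rule.step, rule.max_instances)
    if avg_cpu < rule.cpu_lower:
        return max(instances - rule.step, rule.min_instances)
    return instances


if __name__ == "__main__":
    rule = ThresholdRule()
    print(decide(rule, avg_cpu=85.0, instances=3))  # above the upper threshold
    print(decide(rule, avg_cpu=20.0, instances=3))  # below the lower threshold
    print(decide(rule, avg_cpu=50.0, instances=3))  # between thresholds
```

Five separate numbers have to be picked before such a rule does anything useful, which is the difficulty this slide points at.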

Page 10: Self learning cloud controllers
Page 11: Self learning cloud controllers

Naeem Esfahani and Sam Malek, “Uncertainty in Self-Adaptive Software Systems”

Pooyan Jamshidi, Aakash Ahmad, Claus Pahl, Muhammad Ali Babar, “Sources of Uncertainty in Dynamic Management of Elastic Systems”, under review

Page 12: Self learning cloud controllers

Uncertainty related to enactment latency: the same scaling action (adding or removing a VM of precisely the same size) took different amounts of time to be enacted on the cloud platform (here, Microsoft Azure) at different points in time, and the difference was significant (up to a couple of minutes). The enactment latency also differs across cloud platforms.

Page 13: Self learning cloud controllers

•  Offline benchmarking

•  Trial-and-error

•  Expert knowledge

Costly and not systematic

A. Gandhi, P. Dube, A. Karve, A. Kochut, L. Zhang, “Adaptive, Model-driven Autoscaling for Cloud Applications”, ICAC’14

[Figure: 95th-percentile response time (ms) versus arrival rate (req/s); annotated values: 400 ms and 60 req/s]

Page 14: Self learning cloud controllers

RobusT2Scale

Initial setting + elasticity rules + response-time SLA

environment monitoring

application monitoring

scaling actions

Fuzzy Reasoning

Users

Prediction/Smoothing

Page 15: Self learning cloud controllers
Page 16: Self learning cloud controllers

[Figure: possibility versus performance index, showing a region of definite satisfaction, a region of uncertain satisfaction, and a region of definite dissatisfaction]

Words can mean different things to different people

Different users often recommend different elasticity policies

[Figure: possibility versus performance index for a Type-1 MF and a Type-2 MF]

Page 17: Self learning cloud controllers
Page 18: Self learning cloud controllers

Workload

Response time

[Figure: two plots of membership grade (u) versus x2 over the range 0 to 100, labelled with UMF, LMF, Embedded, FOU, mean, and sd]
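As an illustration of the UMF and LMF labels above, the following is a minimal sketch of an interval type-2 membership function. It assumes a Gaussian shape with a fixed mean and an uncertain standard deviation; the slide does not state the shape, and the names `gaussian`, `it2_membership`, `sd_low`, and `sd_high` are hypothetical.

```python
import math


def gaussian(x: float, mean: float, sd: float) -> float:
    """Type-1 Gaussian membership grade in [0, 1]."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2)


def it2_membership(x: float, mean: float, sd_low: float, sd_high: float):
    """Return (lower, upper) membership grades for an interval type-2 set.

    Assumption: the footprint of uncertainty comes from an uncertain
    standard deviation in [sd_low, sd_high]. With a shared mean, the wider
    Gaussian bounds the narrower one from above everywhere.
    """
    if not 0 < sd_low <= sd_high:
        raise ValueError("expected 0 < sd_low <= sd_high")
    lower = gaussian(x, mean, sd_low)   # lower membership function (LMF)
    upper = gaussian(x, mean, sd_high)  # upper membership function (UMF)
    return lower, upper


if __name__ == "__main__":
    # Illustrative values only, not taken from the slides.
    for x in (40.0, 50.0, 60.0):
        print(x, it2_membership(x, mean=50.0, sd_low=5.0, sd_high=10.0))
```

Every type-1 curve that lies between the two returned grades for all x is an embedded set, and the area between them is the footprint of uncertainty.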

Page 19: Self learning cloud controllers

Antecedents: Workload, Response-time. Consequent columns: Normal (-2), Effort (-1), Medium Effort (0), High Effort (+1), Maximum Effort (+2); each cell is the number of experts who chose that label. The last column is $c_{avg}^{l}$.

Rule (l)  Workload   Response-time   -2  -1   0  +1  +2   c_avg^l
1         Very low   Instantaneous    7   2   1   0   0   -1.6
2         Very low   Fast             5   4   1   0   0   -1.4
3         Very low   Medium           0   2   6   2   0    0
4         Very low   Slow             0   0   4   6   0    0.6
5         Very low   Very slow        0   0   0   6   4    1.4
6         Low        Instantaneous    5   3   2   0   0   -1.3
7         Low        Fast             2   7   1   0   0   -1.1
8         Low        Medium           0   1   5   3   1    0.4
9         Low        Slow             0   0   1   8   1    1
10        Low        Very slow        0   0   0   4   6    1.6
11        Medium     Instantaneous    6   4   0   0   0   -1.6
12        Medium     Fast             2   5   3   0   0   -0.9
13        Medium     Medium           0   0   5   4   1    0.6
14        Medium     Slow             0   0   1   7   2    1.1
15        Medium     Very slow        0   0   1   3   6    1.5
16        High       Instantaneous    8   2   0   0   0   -1.8
17        High       Fast             4   6   0   0   0   -1.4
18        High       Medium           0   1   5   3   1    0.4
19        High       Slow             0   0   1   7   2    1.1
20        High       Very slow        0   0   0   6   4    1.4
21        Very high  Instantaneous    9   1   0   0   0   -1.9
22        Very high  Fast             3   6   1   0   0   -1.2
23        Very high  Medium           0   1   4   4   1    0.5
24        Very high  Slow             0   0   1   8   1    1
25        Very high  Very slow        0   0   0   4   6    1.6

Highlighted example:

Rule (l)  Workload   Response-time   -2  -1   0  +1  +2   c_avg^l
12        Medium     Fast             2   5   3   0   0   -0.9

10 experts’ responses

$R^{l}$: IF (the workload ($x_1$) is $F_{i_1}$, AND the response-time ($x_2$) is $G_{i_2}$), THEN (add/remove $c_{avg}^{l}$ instances).

$c_{avg}^{l} = \frac{\sum_{u=1}^{N_l} w_u^l \times C}{\sum_{u=1}^{N_l} w_u^l}$

Goal: pre-compute the costly calculations so that elasticity reasoning based on fuzzy inference is efficient at runtime
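The weighted average for the rule consequent can be checked against the table with a few lines of arithmetic. This is a minimal sketch that reads the weights as the number of experts choosing each label and C as that label's numeric value (-2 to +2); that reading reproduces the tabulated values, but the names `LABEL_VALUES` and `weighted_consequent` are hypothetical.

```python
# Numeric values of the five consequent labels, from Normal (-2) to Maximum Effort (+2).
LABEL_VALUES = (-2, -1, 0, 1, 2)


def weighted_consequent(votes, values=LABEL_VALUES):
    """Weighted average of label values, weighted by the experts' vote counts."""
    total = sum(votes)
    if total == 0:
        raise ValueError("at least one expert response is required")
    return sum(w * c for w, c in zip(votes, values)) / total


if __name__ == "__main__":
    rule_12 = (2, 5, 3, 0, 0)  # Medium workload, Fast response time
    print(weighted_consequent(rule_12))
```

For Rule 12 this gives (2 × -2 + 5 × -1 + 3 × 0) / 10 = -0.9, matching the table. Because the consequents depend only on the survey data, they can be computed once, before the controller runs.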

Page 20: Self learning cloud controllers

Liang, Q., Mendel, J. M. (2000). Interval type-2 fuzzy logic systems: theory and design. Fuzzy Systems, IEEE Transactions on, 8(5), 535-550.

Scaling Actions

Monitoring Data

Page 21: Self learning cloud controllers

[Figure: membership grade (u) versus x2 over the range 0 to 100 for the monitoring data (workload and response time); annotated membership grades: 0.5954 and 0.3797 (label M), 0.2212 and 0.0000]

Monitoring data

Workload

Response time

Page 22: Self learning cloud controllers

[Figure: two membership-function plots over the range 0 to 100; annotated membership grades: 0.5954 and 0.3797 in the first, 0.9568 and 0.9377 in the second]

Page 23: Self learning cloud controllers
Page 24: Self learning cloud controllers

Performance index

$y_l$, $y_r$
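In interval type-2 fuzzy logic systems, $y_l$ and $y_r$ conventionally denote the left and right endpoints of the type-reduced output interval. The sketch below computes them by enumerating every switch point, which gives the same result as the iterative Karnik-Mendel procedure for small rule bases. That the slides use this procedure is an assumption, and the function name `type_reduce` and the sample numbers are hypothetical.

```python
def type_reduce(centroids, lower, upper):
    """Return (y_l, y_r) for crisp rule consequents with interval firing strengths.

    centroids: one consequent value per rule (for example c_avg).
    lower, upper: lower and upper firing strength of each rule.
    """
    order = sorted(range(len(centroids)), key=lambda i: centroids[i])
    c = [centroids[i] for i in order]
    lo = [lower[i] for i in order]
    hi = [upper[i] for i in order]

    def weighted(weights):
        total = sum(weights)
        if total <= 0:
            return None  # no rule fires for this weight assignment
        return sum(w * ci for w, ci in zip(weights, c)) / total

    left, right = [], []
    for k in range(len(c) + 1):
        # y_l: upper strengths on the k smallest consequents, lower on the rest.
        yl = weighted(hi[:k] + lo[k:])
        # y_r: lower strengths on the k smallest consequents, upper on the rest.
        yr = weighted(lo[:k] + hi[k:])
        if yl is not None:
            left.append(yl)
        if yr is not None:
            right.append(yr)
    if not left or not right:
        raise ValueError("no rule has a positive firing strength")
    return min(left), max(right)


if __name__ == "__main__":
    # Illustrative values only, not taken from the slides.
    y_l, y_r = type_reduce([-1.0, 1.0], lower=[0.2, 0.2], upper=[0.8, 0.8])
    print(y_l, y_r, (y_l + y_r) / 2)  # the midpoint is the usual crisp output
```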

Page 25: Self learning cloud controllers
Page 26: Self learning cloud controllers


Page 27: Self learning cloud controllers

[Figure: number of hits versus time (seconds, 0 to 100). Series: original data; beta=0.10, gamma=0.94 (rmse=308.1565, rrse=0.79703); beta=0.27, gamma=0.94 (rmse=209.7852, rrse=0.54504); beta=0.80, gamma=0.94 (rmse=272.6285, rrse=0.70858)]

[Figure: root relative squared error for six workload patterns: Big spike, Dual phase, Large variations, Quickly varying, Slowly varying, Steep tri phase]
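The two error measures in these plots have standard definitions, sketched below together with a two-parameter smoother. The smoother is Holt's double exponential smoothing, chosen here as an assumption because the transcript does not name the method; mapping `beta` to the level and `gamma` to the trend is likewise an assumption, and all function names are hypothetical.

```python
import math


def double_exponential_smoothing(series, beta, gamma):
    """One-step-ahead forecasts for series[1:] using Holt's linear method.

    Assumption: beta smooths the level and gamma smooths the trend.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]
    forecasts = []
    for value in series[1:]:
        forecasts.append(level + trend)  # forecast made before seeing value
        new_level = beta * value + (1 - beta) * (level + trend)
        trend = gamma * (new_level - level) + (1 - gamma) * trend
        level = new_level
    return forecasts


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rrse(actual, predicted):
    """Root relative squared error: error relative to always predicting the mean."""
    mean = sum(actual) / len(actual)
    num = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    den = sum((a - mean) ** 2 for a in actual)
    return math.sqrt(num / den)


if __name__ == "__main__":
    hits = [120, 135, 160, 150, 210, 400, 380, 290]  # made-up workload trace
    predicted = double_exponential_smoothing(hits, beta=0.27, gamma=0.94)
    print(rmse(hits[1:], predicted), rrse(hits[1:], predicted))
```

An rrse below 1 means the forecast beats the trivial predictor that always outputs the mean of the observed data.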

Page 28: Self learning cloud controllers

SUT                 Criteria       Big spike  Dual phase  Large variations  Quickly varying  Slowly varying  Steep tri phase
RobusT2Scale        Response time  973ms      537ms       509ms             451ms            423ms           498ms
                    Resources      3.2        3.8         5.1               5.3              3.7             3.9
Overprovisioning    Response time  354ms      411ms       395ms             446ms            371ms           491ms
                    Resources      6          6           6                 6                6               6
Under provisioning  Response time  1465ms     1832ms      1789ms            1594ms           1898ms          2194ms
                    Resources      2          2           2                 2                2               2

SLA: $rt_{95} \le 600\,\text{ms}$ for every 10 s control interval

• RobusT2Scale is superior to under-provisioning in terms of guaranteeing the SLA, and it does not require excessive resources

• RobusT2Scale is superior to over-provisioning in terms of the resources required, while still guaranteeing the SLA

Page 29: Self learning cloud controllers

[Figure: root mean square error for alpha=0.1, alpha=0.5, alpha=0.9, and alpha=1.0; noise level: 10%]

Page 30: Self learning cloud controllers

Self-Learning Controller

Page 31: Self learning cloud controllers

Current Solution (my PhD Thesis + IC4 work on auto-scaling)

Future Plan

Updating K in MAPE-K @ Runtime

Design-time Assistance

Multi-cloud

Page 32: Self learning cloud controllers

Fuzzifier

Inference Engine

Defuzzifier Rule base

Fuzzy Q-learning

Cloud Application Monitoring Actuator

Cloud Platform

Fuzzy Logic Controller

Knowledge Learning

Autonomic Controller

𝑟𝑡 𝑤

𝑤,  𝑟𝑡,  𝑡ℎ,  𝑣𝑚

𝑠𝑎

system state system goal

Page 33: Self learning cloud controllers

RobusT2Scale

Learned rules

FQL

Monitoring Actuator

Cloud Platform

.fis

LW

W

ElasticBench

𝑤,  𝑟𝑡

𝑤,  𝑟𝑡,    𝑡ℎ,  𝑣𝑚

𝑠𝑎

Load Generator

C

system state

WCF

REST

𝛾,𝜂,𝜀,𝑟
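The FQL box and the symbols 𝛾, 𝜂, 𝜀, 𝑟 suggest the usual fuzzy Q-learning update, sketched here under stated assumptions: 𝛾 is read as the discount factor, 𝜂 as the learning rate, 𝜀 as the exploration probability, and 𝑟 as the reward. The transcript lists these symbols without defining them, and the functions `select_actions` and `fql_update` and the sample values are hypothetical.

```python
import random


def select_actions(q, epsilon, rng):
    """Epsilon-greedy choice of one action index per fuzzy rule."""
    chosen = []
    for row in q:
        if rng.random() < epsilon:
            chosen.append(rng.randrange(len(row)))  # explore
        else:
            chosen.append(max(range(len(row)), key=row.__getitem__))  # exploit
    return chosen


def fql_update(q, firing, chosen, reward, next_firing, gamma, eta):
    """Update the rule-level q-table in place after one transition.

    q[i][a]: value of action a in rule i.
    firing, next_firing: normalized firing strengths of the rules in the
    current and the next state.
    """
    q_sa = sum(a * q[i][chosen[i]] for i, a in enumerate(firing))
    v_next = sum(a * max(q[i]) for i, a in enumerate(next_firing))
    delta = reward + gamma * v_next - q_sa  # temporal-difference error
    for i, a in enumerate(firing):
        q[i][chosen[i]] += eta * delta * a  # credit each rule by its firing strength
    return delta


if __name__ == "__main__":
    rng = random.Random(0)
    q = [[0.0, 0.0, 0.0] for _ in range(2)]  # 2 rules, 3 scaling actions
    chosen = select_actions(q, epsilon=0.1, rng=rng)
    fql_update(q, firing=[0.7, 0.3], chosen=chosen, reward=1.0,
               next_firing=[0.4, 0.6], gamma=0.8, eta=0.1)
    print(q)
```

Each rule keeps its own row of q-values, so the learned table can be read back as a rule base: for every rule, the action with the highest q-value becomes its consequent.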

Page 34: Self learning cloud controllers

[Diagram: ElasticBench deployment, with numbered interaction steps 1 to 12]

Cloud Platform (PaaS); On-Premise

L: Web Role

P: Worker Role (three instances)

M: Worker Role

Cache

Queue

Results: Storage

Blackboard: Storage

LG: Console

LB: Load Balancer

Auto-scaling Logic (controller)

Policy Enforcer

Actuator

Monitoring

Page 35: Self learning cloud controllers

[Figure: five series (S1, S2, S3, S4, S5); y-axis ticks from 0 to 1.2, x-axis ticks from 0 to 320]

Page 36: Self learning cloud controllers

[Figure: four panels showing q(9,5), q(7,5), q(9,3), and q(3,3), each plotted over an x-axis range of 0 to 350]

Page 37: Self learning cloud controllers


Page 38: Self learning cloud controllers


Page 39: Self learning cloud controllers

Challenge 1: ~75% wasted capacity

Actual demand

Challenge 2: customer lost

[Recap of the diagrams from Pages 32 and 33: the autonomic controller (Fuzzifier, Inference Engine, Defuzzifier, Rule base, Fuzzy Q-learning, Monitoring, Actuator, Cloud Application, Cloud Platform) and the RobusT2Scale, FQL, and ElasticBench setup]

Page 40: Self learning cloud controllers

More Details? => http://computing.dcu.ie/~pjamshidi/PDF/SEAMS2014.pdf

Slides? => http://www.slideshare.net/pooyanjamshidi/

Code? => https://github.com/pooyanjamshidi

Thank you!

Aakash Ahmad, Claus Pahl, Soodeh Farokhi, Amir Sharifloo, Armin Balalaie, Hamed Jamshidi, Nabor Mendonca, Brian Carroll, Reza Teimourzadegan