self learning cloud controllers
TRANSCRIPT
Self-Learning Cloud Controllers
Pooyan Jamshidi, Amir Sharifloo, Claus Pahl, Andreas Metzger, Giovani Estrada IC4, Dublin City University, Ireland University of Duisburg-Essen, Germany Intel, Ireland
Invited presentation at UFC, Brazil, Fortaleza
~50% = wasted hardware
Actual traffic
Typical weekly traffic to Web-based applications (e.g., Amazon.com)
Problem 1: ~75% wasted capacity Actual
demand
Problem 2: customer lost
Traffic in an unexpected burst in requests (e.g. end of year traffic to Amazon.com)
Really like this??
Auto-scaling enables you to realize this ideal on-demand provisioning
Time
Demand
? Enacting change in the Cloud resources are not real-time
Capacity we can provision with Auto-Scaling
A realistic figure of dynamic provisioning
0 50 1000
500
1000
1500
0 50 100100
200
300
400
500
0 50 1000
1000
2000
0 50 1000
200
400
600
These quantitative values are required to be determined by the user ⇒ requires deep
knowledge of application (CPU, memory, thresholds)
⇒ requires performance modeling expertise (when and how to scale)
⇒ A unified opinion of user(s) is required
Amazon auto scaling Microsoft Azure Watch
9
Microsoft Azure Auto-scaling Application Block
Naeem Esfahani and Sam Malek, “Uncertainty in Self-Adaptive Software Systems”
Pooyan Jamshidi, Aakash Ahmad, Claus Pahl, Muhammad Ali Babar , “Sources of Uncertainty in Dynamic Management of Elastic Systems”, Under Review
Uncertainty related to enactment latency: The same scaling action (adding/removing a VM with precisely the same size) took different time to be enacted on the cloud platform (here is Microsoft Azure) at different points and this difference were significant (up to couple of minutes). The enactment latency would be also different on different cloud platforms.
Ø Offline benchmarking
Ø Trial-and-error
Ø Expert knowledge
Costly and not systematic
A. Gandhi, P. Dube, A. Karve, A. Kochut, L. Zhang, Adaptive, “Model-driven Autoscaling for Cloud Applications”, ICAC’14
arrival rate (req/s)
95% Resp. :me (m
s)
400 ms
60 req/s
RobusT2Scale Initial setting + elasticity rules + response-time SLA
environment monitoring
application monitoring
scaling actions
Fuzzy Reasoning Users
Prediction/Smoothing
0 0.5 1 1.5 2 2.5 30
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
Region of definite satisfaction
Region of definite dissatisfaction Region of
uncertain satisfaction
Performance Index
Pos
sibi
lity
Performance Index P
ossi
bilit
y
words can mean different things to different people
Different users often recommend different elasticity policies
0 0.5 1 1.5 2 2.5 30
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
Type-2 MF
Type-1 MF
Workload
Response time
0 10 20 30 40 50 60 70 80 90 1000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
x2
uM
embe
rshi
p gr
ade
0 10 20 30 40 50 60 70 80 90 1000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
x2u
Mem
bers
hip
grad
e
=>
=>
UMF
LMF
Embedded
FOU
mean
sd
Rule (𝒍)
Antecedents Consequent
𝒄𝒂𝒗𝒈𝒍 Workload Response-‐
time Normal (-‐2)
Effort (-‐1)
Medium Effort (0)
High Effort (+1)
Maximum Effort (+2)
1 Very low Instantaneous 7 2 1 0 0 -‐1.6 2 Very low Fast 5 4 1 0 0 -‐1.4 3 Very low Medium 0 2 6 2 0 0 4 Very low Slow 0 0 4 6 0 0.6 5 Very low Very slow 0 0 0 6 4 1.4 6 Low Instantaneous 5 3 2 0 0 -‐1.3 7 Low Fast 2 7 1 0 0 -‐1.1 8 Low Medium 0 1 5 3 1 0.4 9 Low Slow 0 0 1 8 1 1 10 Low Very slow 0 0 0 4 6 1.6 11 Medium Instantaneous 6 4 0 0 0 -‐1.6 12 Medium Fast 2 5 3 0 0 -‐0.9 13 Medium Medium 0 0 5 4 1 0.6 14 Medium Slow 0 0 1 7 2 1.1 15 Medium Very slow 0 0 1 3 6 1.5 16 High Instantaneous 8 2 0 0 0 -‐1.8 17 High Fast 4 6 0 0 0 -‐1.4 18 High Medium 0 1 5 3 1 0.4 19 High Slow 0 0 1 7 2 1.1 20 High Very slow 0 0 0 6 4 1.4 21 Very high Instantaneous 9 1 0 0 0 -‐1.9 22 Very high Fast 3 6 1 0 0 -‐1.2 23 Very high Medium 0 1 4 4 1 0.5 24 Very high Slow 0 0 1 8 1 1 25 Very high Very slow 0 0 0 4 6 1.6
Rule ()
Antecedents Consequent
Work load
Response
-time -2 -1 0 +1 +2
12 Medium Fast 2 5 3 0 0 -0.9
10 experts’ responses
𝑅↑𝑙 : IF (the workload (𝑥↓1 ) is 𝐹 ↓𝑖↓1 , AND the response-time (𝑥↓2 ) is 𝐺 ↓𝑖↓2 ), THEN (add/remove 𝑐↓𝑎𝑣𝑔↑𝑙 instances).
𝑐↓𝑎𝑣𝑔↑𝑙 = ∑𝑢=1↑𝑁↓𝑙 ▒𝑤↓𝑢↑𝑙 ×𝐶 /∑𝑢=1↑𝑁↓𝑙 ▒𝑤↓𝑢↑𝑙
Goal: pre-computations of costly calculations to make a runtime efficient elasticity reasoning based on fuzzy inference
Liang, Q., Mendel, J. M. (2000). Interval type-2 fuzzy logic systems: theory and design. Fuzzy Systems, IEEE Transactions on, 8(5), 535-550.
Scaling Actions
Monitoring Data
0 10 20 30 40 50 60 70 80 90 1000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0.5954
0.3797
𝑀
0 10 20 30 40 50 60 70 80 90 1000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0.2212
0.0000
0 10 20 30 40 50 60 70 80 90 1000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
x2
u
0 10 20 30 40 50 60 70 80 90 1000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
x2
uMonitoring data
Workload
Response time
0 10 20 30 40 50 60 70 80 90 1000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0.5954
0.3797
0 10 20 30 40 50 60 70 80 90 1000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
10.95680.9377
Performance index
𝑦↓𝑙 , 𝑦↓𝑟
0 50 1000
500
1000
1500
0 50 100100
200
300
400
500
0 50 1000
1000
2000
0 50 1000
200
400
600
0 50 1000
500
1000
0 50 1000
500
1000
0 10 20 30 40 50 60 70 80 90 100-500
0
500
1000
1500
2000
Time (seconds)
Num
ber o
f hits
Original databetta=0.10, gamma=0.94, rmse=308.1565, rrse=0.79703betta=0.27, gamma=0.94, rmse=209.7852, rrse=0.54504betta=0.80, gamma=0.94, rmse=272.6285, rrse=0.70858
0
0.2
0.4
0.6
0.8
1
1.2
1.4
Big spike Dual phase Large variations Quickly varying Slowly varying Steep tri phase
0 50 1000
500
1000
1500
0 50 100100
200
300
400
500
0 50 1000
1000
2000
0 50 1000
200
400
6000 50 1000
500
1000
0 50 1000
500
1000
Roo
t Rel
ativ
e S
quar
ed E
rror
SUT Criteria Big spike
Dual phase
Large variations
Quickly varying
Slowly varying
Steep tri phase
RobusT2Scale 973ms 537ms 509ms 451ms 423ms 498ms
3.2 3.8 5.1 5.3 3.7 3.9
Overprovisioning
354ms 411ms 395ms 446ms 371ms 491ms
6 6 6 6 6 6
Under provisioning
1465ms 1832ms 1789ms 1594ms 1898ms 2194ms
2 2 2 2 2 2
SLA: 𝒓𝒕↓𝟗𝟓 ≤𝟔𝟎𝟎𝒎𝒔 For every 10s control interval
• RobusT2Scale is superior to under-provisioning in terms of guaranteeing the SLA and does not require excessive resources • RobusT2Scale is superior to over-provisioning in terms of guaranteeing required resources while guaranteeing the SLA
0
0.02
0.04
0.06
0.08
0.1
alpha=0.1 alpha=0.5 alpha=0.9 alpha=1.0
Roo
t Mea
n S
quar
e E
rror
Noise level: 10%
Self-Learning Controller
Current Solution (my PhD Thesis + IC4 work on auto-scaling)
Future Plan Updating K in MAPE-K @ Runtime Design-time
Assistance
Multi-cloud
Fuzzifier
Inference Engine
Defuzzifier Rule base
Fuzzy Q-learning
Cloud Application Monitoring Actuator
Cloud Platform
Fuzzy Logic Controller
Knowledge Learning
Aut
onom
ic C
ontr
olle
r
𝑟𝑡 𝑤
𝑤, 𝑟𝑡, 𝑡ℎ, 𝑣𝑚
𝑠𝑎
system state system goal
RobusT2Scale
Learned rules
FQL
Monitoring Actuator
Cloud Platform
.fis
LW
W
ElasticBench
𝑤, 𝑟𝑡
𝑤, 𝑟𝑡, 𝑡ℎ, 𝑣𝑚
𝑠𝑎
Load Generator
C
system state
WCF
REST
𝛾,𝜂,𝜀,𝑟
Cloud Platform (PaaS) On-Premise
P: Worker
Role
L: Web Role
P: Worker
Role
P: Worker
Role
Cache
M: Worker
Role
Results:
Storage
Blackboard: Storage
LG: Console
Auto-scaling Logic (controller)
Policy Enforcer
1 2 3
7 8
11 12
4
10 9
LB: Load
Balancer
6
5 Queue
Actuator
Monitoring
0
0.2
0.4
0.6
0.8
1
1.2
0 4 12
15
21
26
33
39
47
53
60
68
76
83
90
95
103
110
118
124
130
136
145
152
159
166
171
179
185
193
200
206
214
218
219
228
236
242
249
255
263
267
271
275
283
290
295
303
307
314
320
S1 S2 S3 S4 S5
0
0.5
1
1.5
2
2.5
0 50 100 150 200 250 300 350
q(9,5)
0 0.02 0.04 0.06 0.08
0.1 0.12 0.14 0.16 0.18
0.2
0 50 100 150 200 250 300 350
q(7,5)
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
1.2
0 50 100 150 200 250 300 350
q(9,3)
-0.2
-0.1
0
0.1
0.2
0.3
0.4
0.5
0 50 100 150 200 250 300 350
q(3,3)
0
1
2
3
4
5
6
7
8
0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000
1
2
3
4
Challenge 1: ~ 75% wasted capacityA c t u a l
d e m a n d
Challenge 2: customer lost
Fuzzifier
Inference Eng ine
DefuzzifierRule base
FuzzyQ-‐learning
Cloud ApplicationMonitoring Actuator
Cloud Platform
Fuzzy Log ic Controller
Knowledge Learning
Auton
omic Con
troller
𝑟𝑡𝑤
𝑤,𝑟𝑡,𝑡ℎ,𝑣𝑚
𝑠𝑎
system state system goal
RobusT2Scale
Learned rules
FQL
Monitoring Actuator
Cloud Platform
.fis
LW
W
ElasticBench
𝑤, 𝑟𝑡
𝑤, 𝑟𝑡, 𝑡ℎ, 𝑣𝑚
𝑠𝑎
Load Generator
C
system state
WCF
REST
𝛾, 𝜂, 𝜀, 𝑟
http://computing.dcu.ie/~pjamshidi/PDF/SEAMS2014.pdf
More Details? => http://www.slideshare.net/pooyanjamshidi/
Slides?
=>
Thank you!
Aakash Ahmad Claus Pahl Soodeh Farokhi Amir Sharifloo Armin Balalaie
https://github.com/pooyanjamshidi
Code? =>
Hamed Jamshidi
Nabor Mendonca
Brian Carroll Reza Teimourzadegan