
Post on 12-Apr-2017


An Industrial Case Study of Automatically Identifying Performance Regression-Causes

Thanh H. D. Nguyen, Meiyappan Nagappan, Ahmed E. Hassan

Mohamed Nasser, Parminder Flora

1

2

Performance is key in day-to-day cloud-based software

3

Performance regressions are caused by changes to the software

Version 1 Version 2

4

Performance is measured using resource counters

Counter value over time: CPU %, Memory, Disk IO, Network IO

5

Performance regressions are found during load testing

Apply the same load to Version 1 and Version 2

Counters from each version: CPU %, Memory usage, Disk IO, Network IO

Performance Engineer: compare the counters

Probable causes

6

Probable Causes derived from Industrial Case Study

Probable Causes | %

Added frequently executed DB query or mismatched DB indices | 30.54

Added frequently accessed fields and objects | 30.18

Added frequently executed logic | 16.67

Symptom of regression is detected (e.g., response time increased) but no regression-cause can be determined | 16.67

Added blocking I/O access | 5.55

7

Leveraging a repository for Regression-cause analysis

Baseline Counters | Target Counters | Probable Cause

Version 1 | Version 2 | Frequently executed logic

Version 2 | Version 3 | Mismatched DB indices

… | … | …

Version n-1 | Version n | Frequently executed logic

Performance Data for many different Probable Causes!

8

Mining performance regression repositories

Performance Regression Repository

Train Model

Model

Counters from Version n

Counters from Version n+1

Predicted Probable Cause

Evaluate Prediction
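The train-and-predict flow on this slide can be sketched roughly as follows. This is only an illustration, not the authors' implementation: it assumes each repository entry has already been reduced to one summary number per counter, it swaps in a plain 1-nearest-neighbour lookup for the trained model, and the names `REPOSITORY` and `predict_cause` as well as every number are hypothetical.

```python
import math

# Hypothetical repository: each entry pairs a feature vector with its known cause.
# Features are assumed to be one summary value per counter, in the order
# (CPU %, memory, disk IO, network IO). All numbers are made up for illustration.
REPOSITORY = [
    ((0.60, 0.05, 0.10, 0.05), "Frequently executed logic"),
    ((0.10, 0.05, 0.70, 0.10), "Mismatched DB indices"),
    ((0.55, 0.10, 0.05, 0.00), "Frequently executed logic"),
]


def predict_cause(features, repository=REPOSITORY):
    """Return the cause of the closest stored regression (1-nearest neighbour)."""
    _, cause = min(repository, key=lambda entry: math.dist(entry[0], features))
    return cause


# Counters from Version n vs. Version n+1, summarized the same way (made-up values).
print(predict_cause((0.15, 0.00, 0.65, 0.05)))
```

Evaluating the prediction then amounts to comparing the returned label with the cause that was actually diagnosed for that pair of versions.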

9

But input to the model cannot be raw counter data

Counter value over time: load test on Machine 1 vs. load test on Machine 2

Same pattern, but different values

Use control charts (Upper Control Line, Control Line, Lower Control Line)

Violations = 3, Total = 7, Violation Ratio = 3/7

Same Violation Ratio = 3/7
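A minimal sketch of the violation-ratio idea, under stated assumptions: the slides do not say how the control lines are derived, so this example assumes the center line is the baseline mean and the upper and lower lines sit `k` standard deviations away. The function names and the sample counter values are hypothetical.

```python
from statistics import mean, pstdev


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) control lines computed from a baseline run.

    Assumption: center = mean, limits = mean +/- k * population std deviation.
    """
    center = mean(baseline)
    spread = pstdev(baseline)
    return center - k * spread, center, center + k * spread


def violation_ratio(baseline, target, k=3.0):
    """Fraction of target samples that fall outside the baseline's control lines."""
    lower, _, upper = control_limits(baseline, k)
    violations = sum(1 for value in target if value < lower or value > upper)
    return violations / len(target)


# Made-up CPU % samples: Machine 2 shows the same pattern at twice the magnitude.
m1_base, m1_target = [10, 12, 11, 13, 12, 11, 10], [11, 20, 12, 21, 10, 22, 12]
m2_base, m2_target = [v * 2 for v in m1_base], [v * 2 for v in m1_target]

print(violation_ratio(m1_base, m1_target))  # 3 of 7 samples violate
print(violation_ratio(m2_base, m2_target))  # same ratio despite different raw values
```

Because the limits are derived from each machine's own baseline, the ratio depends on the shape of the counter trace and not on its absolute level, which is what makes it usable as model input across machines.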

10

Case Study Subjects

Open-source: small set of users, Web app

Not open-source: large set of users, Communication

11

Case Study Methodology

Apply the same load to Version 1 and Version 2

Counters from each version: CPU %, Memory usage, Disk IO, Network IO

Probable causes

Inject Fault: Version 1 + Injected Fault

Can we find the probable cause of the injected fault?

12

All machine learners perform 3-7 times better than a random predictor

[Figure: bar chart of accuracy (0% to 80%) for Random, J48, RandomTree, LMT, RandomForest, BayesNet, N.Bayes, N.Bayes Multinomial, DecisionTable, PART, JRip, LWL, IBk, KStar, SimpleLogistic, Logistic, SMO, MultilayerPerceptron; grouped as Decision tree, Bayes, Rule, Lazy, Logistic, Neural net]

Results hold for both case studies, but a different machine learner performs best in each

13
