Revisiting the Impact of Classification Techniques on the Performance of Defect Prediction Models

Baljinder Ghotra, Ahmed E. Hassan, Shane McIntosh

Upload: sailqu

Post on 17-Aug-2015

Page 1: Ghotra icse

Revisiting the Impact of Classification Techniques on the Performance of Defect Prediction Models

Baljinder Ghotra

Ahmed E. Hassan

Shane McIntosh

Page 2: Ghotra icse

Quality assurance teams have limited resources

Personnel, schedules

Page 3: Ghotra icse

Executing all test suites takes too long

Often release several times in one day!

Page 4: Ghotra icse

Defect models can help QA teams to allocate limited resources effectively

Defect prediction model

Page 5: Ghotra icse

Defect models are trained using historical data to predict the defect-prone modules

[Figure: change history touching modules a, b, c, c, a, followed by a new change to c. Labels: reason for change, changed modules, developer responsible]

Page 6: Ghotra icse

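Defect models are trained using historical data to predict the defect-prone modules

[Figure: the defect prediction model takes the change history (modules a, b, c, c, a and the new change to c) and labels modules a and b as low risk and module c as high risk]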
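As a rough illustration of the idea on this slide, the sketch below fits a logistic regression model (one of the simple techniques named in this talk) to a handful of historical changes and then labels a new change as low or high risk. The two metrics (changed lines in hundreds, prior defects), the values, and the 0.5 cut-off are hypothetical and chosen only for the example; they are not taken from the study.

```python
import math

def predict_proba(weights, bias, x):
    """Probability that a module with metrics x is defective."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, epochs=2000, lr=0.1):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        # Prediction error of the current model on every historical change.
        errors = [predict_proba(weights, bias, x) - y for x, y in zip(rows, labels)]
        for j in range(len(weights)):
            weights[j] -= lr * sum(e * x[j] for e, x in zip(errors, rows)) / n
        bias -= lr * sum(errors) / n
    return weights, bias

def risk(weights, bias, x, threshold=0.5):
    """Map the predicted probability to the two labels used on the slide."""
    return "High risk" if predict_proba(weights, bias, x) >= threshold else "Low risk"

# Hypothetical history: [changed lines / 100, prior defects] -> was it defective?
history = [
    ([0.2, 0], 0),  # module a
    ([0.5, 0], 0),  # module b
    ([3.0, 2], 1),  # module c
    ([2.5, 1], 1),  # module c
    ([0.3, 0], 0),  # module a
]
rows = [x for x, _ in history]
labels = [y for _, y in history]
weights, bias = train_logistic(rows, labels)

new_change_to_c = [2.8, 2]
small_change_to_a = [0.1, 0]
print(risk(weights, bias, new_change_to_c))
print(risk(weights, bias, small_change_to_a))
```

A QA team would then review the modules flagged as high risk first.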

Page 7: Ghotra icse

Defect models are trained using various techniques

Simple techniques: Decision Trees, Logistic Regression

Advanced techniques: Decision Trees + Logistic Regression = Logistic Model Trees (LMT)

Page 8: Ghotra icse

Most classification techniques produce models that achieve similar performance?

[Figure: Decision Trees and Logistic Model Trees (LMT)]

The performance of 17 of the 22 studied techniques is indistinguishable

Benchmarking classification models for software defect prediction. S. Lessmann, B. Baesens, C. Mues, S. Pietsch [TSE 2008]

Page 9: Ghotra icse

Limitations of the prior work

Overlapping statistical ranks

Noisy data

Limited scope

Page 10: Ghotra icse

Do most techniques produce models with similar performance, when we use:

Non-overlapping statistical ranks (instead of overlapping statistical ranks)

Clean data (instead of noisy data)

Expanded scope (instead of limited scope)

Page 11: Ghotra icse

Do most techniques produce models with similar performance, when we use:

Non-overlapping statistical ranks

Expanded scope

Clean data

Page 13: Ghotra icse

Our approach to study the impact of classification techniques on defect models

Train and test models using different techniques

Repeat 100 times

Performance scores for each technique

Rank techniques using statistical clustering

[Figure: a table of performance scores per technique (a, b, ..., z) is turned into a table that maps ranks 1, 2, 3 to techniques]
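The train-and-test loop above can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's setup: the data is synthetic, the two "techniques" (`train_size_only`, `train_centroid`) are hypothetical toy stand-ins for real classifiers, and the 100 repetitions are drawn as stratified random hold-out splits, which may differ from the resampling used in the study. Performance is measured with AUC, the measure that appears later in the talk.

```python
import random
from statistics import mean

def auc(scores, labels):
    """Area under the ROC curve: the chance that a defective module
    outscores a clean one (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def train_size_only(rows, labels):
    # Hypothetical naive baseline: ignore the labels, score by module size.
    return lambda x: x[0]

def train_centroid(rows, labels):
    # Hypothetical toy learner: a module scores higher the closer it is
    # to the centroid of the defective training modules.
    def centroid(target):
        members = [r for r, y in zip(rows, labels) if y == target]
        return [mean(col) for col in zip(*members)]
    clean, defective = centroid(0), centroid(1)
    def score(x):
        d_clean = sum((a - b) ** 2 for a, b in zip(x, clean))
        d_defective = sum((a - b) ** 2 for a, b in zip(x, defective))
        return d_clean - d_defective
    return score

def stratified_split(labels, rng, test_fraction=1 / 3):
    """Hold out the same fraction of clean and defective modules."""
    test = set()
    for target in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == target]
        rng.shuffle(idx)
        test.update(idx[: max(1, int(len(idx) * test_fraction))])
    train = [i for i in range(len(labels)) if i not in test]
    return train, sorted(test)

def evaluate(rows, labels, techniques, repetitions=100, seed=1):
    """Collect one AUC score per technique per repetition."""
    rng = random.Random(seed)
    results = {name: [] for name in techniques}
    for _ in range(repetitions):
        train, test = stratified_split(labels, rng)
        for name, fit in techniques.items():
            scorer = fit([rows[i] for i in train], [labels[i] for i in train])
            scores = [scorer(rows[i]) for i in test]
            results[name].append(auc(scores, [labels[i] for i in test]))
    return results

# Synthetic modules: [size, churn]; only churn is related to defects.
data_rng = random.Random(0)
labels = [1] * 60 + [0] * 140
rows = [[data_rng.gauss(100, 10), data_rng.gauss(30 if y else 10, 8)] for y in labels]

techniques = {"size_only": train_size_only, "centroid": train_centroid}
results = evaluate(rows, labels, techniques)
for name, scores in results.items():
    print(name, round(mean(scores), 3))
```

The resulting lists of scores, one list per technique, are the input that a ranking step would consume.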

Page 14: Ghotra icse

Unfortunately, some projects yield poorer results than others

[Figure: AUC (axis from 0.5 to 0.9) for the projects CM1, JM1, KC1, KC3, KC4, MW1, PC1, PC2, PC3, PC4]

Performance values rarely overlap!

Page 15: Ghotra icse

Non-overlapping ranks using a double Scott-Knott test

[Figure: for each project (Project 1, Project 2, ..., Project M), the mean AUC values of Technique 1 to Technique N (10x each) go into a first-run Scott-Knott test, which produces a per-project table of ranks and techniques (for Project 1: rank 1: T3, T7, T8; rank 2: T2, T10; rank 3: T1, T4, T6; rank 4: T5, T9). The per-project rank tables then go into a second-run Scott-Knott test, which produces the final table of ranks and techniques]
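The two-level ranking can be sketched as below. This is a simplified stand-in, not the Scott-Knott test itself: like Scott-Knott, it sorts the techniques by mean and splits them where the between-group sum of squares is largest, but it replaces the statistical significance test with a plain minimum-gap threshold (`min_gap`, a hypothetical parameter), and it feeds a single mean AUC per technique instead of ten. The AUC values and thresholds are made up for illustration.

```python
from statistics import mean

def partition(items, min_gap):
    """Recursively split (name, value) pairs into groups, best group first.

    Simplification: a split is kept when the group means differ by at
    least min_gap, in place of a significance test.
    """
    items = sorted(items, key=lambda kv: kv[1], reverse=True)
    if len(items) < 2:
        return [items]
    values = [v for _, v in items]
    grand = mean(values)
    best_i, best_b = None, -1.0
    for i in range(1, len(values)):
        left, right = values[:i], values[i:]
        # Between-group sum of squares for this candidate split.
        b = (len(left) * (mean(left) - grand) ** 2
             + len(right) * (mean(right) - grand) ** 2)
        if b > best_b:
            best_i, best_b = i, b
    left, right = items[:best_i], items[best_i:]
    gap = mean([v for _, v in left]) - mean([v for _, v in right])
    if gap < min_gap:
        return [items]
    return partition(left, min_gap) + partition(right, min_gap)

def ranks(groups):
    """Rank 1 is the best group."""
    return {name: r for r, group in enumerate(groups, start=1) for name, _ in group}

def double_ranking(auc_by_project, auc_gap=0.05, rank_gap=0.5):
    # First run: rank the techniques inside each project.
    per_project = {
        project: ranks(partition(list(aucs.items()), auc_gap))
        for project, aucs in auc_by_project.items()
    }
    # Second run: cluster the techniques by their mean rank across projects.
    techniques = next(iter(auc_by_project.values())).keys()
    mean_rank = {t: mean(r[t] for r in per_project.values()) for t in techniques}
    # Lower rank is better, so negate before reusing the same partition step.
    final = ranks(partition([(t, -v) for t, v in mean_rank.items()], rank_gap))
    return per_project, final

# Hypothetical mean AUC values for four techniques on three projects.
auc_by_project = {
    "Project 1": {"T1": 0.82, "T2": 0.80, "T3": 0.70, "T4": 0.60},
    "Project 2": {"T1": 0.75, "T2": 0.77, "T3": 0.66, "T4": 0.64},
    "Project 3": {"T1": 0.90, "T2": 0.88, "T3": 0.79, "T4": 0.78},
}
per_project, final = double_ranking(auc_by_project)
print(per_project["Project 1"])
print(final)
```

Ranking inside each project first means that a project with uniformly low AUC values does not drag every technique into the same bottom group.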

Page 17: Ghotra icse

Non-overlapping test: Most techniques have similar performance

Rank 1: Ad+NB, EM, RBFs, …

Rank 2: Rsub+SMO, J48, …

Similar to the prior work, techniques are grouped into 2 distinct ranks

Page 18: Ghotra icse

Do most techniques produce models with similar performance, when we use:

Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks

Expanded scope

Clean data

Page 20: Ghotra icse

Clean NASA dataset: Cleaning criteria of prior work

Data Quality: Some Comments on the NASA Software Defect Datasets. M. Shepperd, Q. Song, Z. Sun, C. Mair [TSE 2013]

Identical cases

Missing values

Constraint violations
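The three cleaning criteria listed above can be sketched as a simple row filter. The column names (`loc_total`, `loc_comment`, `defective`) and the single constraint are hypothetical and only illustrate the kind of check involved; the sketch also assumes that the first copy of an identical case is kept, which may differ from the cited cleaning procedure.

```python
def clean(rows, constraints):
    """Drop rows with missing values, identical cases, and constraint violations."""
    seen = set()
    kept = []
    for row in rows:
        # Missing values: any metric that was not recorded.
        if any(value is None for value in row.values()):
            continue
        # Identical cases: same metrics and same label as an earlier row.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        # Constraint violations: metric combinations that cannot be true.
        if not all(check(row) for check in constraints):
            continue
        seen.add(key)
        kept.append(row)
    return kept

# Hypothetical constraint: a module cannot have more comment lines than total lines.
constraints = [lambda r: r["loc_total"] >= r["loc_comment"]]

raw = [
    {"loc_total": 120, "loc_comment": 10, "defective": 0},
    {"loc_total": 120, "loc_comment": 10, "defective": 0},   # identical case
    {"loc_total": None, "loc_comment": 5, "defective": 1},   # missing value
    {"loc_total": 30, "loc_comment": 45, "defective": 1},    # constraint violation
    {"loc_total": 300, "loc_comment": 40, "defective": 1},
]
cleaned = clean(raw, constraints)
print(len(raw), "rows before cleaning,", len(cleaned), "rows after")
```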

Page 21: Ghotra icse

Clean NASA dataset: Many distinct ranks of techniques

Rank 1: LMT, SL, …

Rank 2: KNN, RBFs, …

Rank 3: J48, K-means, …

Rank 4: SMO, Ridor, …

Unlike the prior work, techniques are grouped into 4 distinct ranks

Top performers are LMT and logistic regression

Page 22: Ghotra icse

Do most techniques produce models with similar performance, when we use:

Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks

Expanded scope

Clean data: No, unlike the prior work, techniques are grouped into 4 distinct ranks

Page 24: Ghotra icse

Another dataset: The PROMISE corpus

Page 25: Ghotra icse

Another dataset: Four significant ranks of techniques

Rank 1: LMT, SL, …

Rank 2: KNN, RBFs, …

Rank 3: J48, K-means, …

Rank 4: SMO, Ridor, …

Unlike the prior work, techniques are grouped into 4 distinct ranks

Top performers are LMT and logistic regression

Page 26: Ghotra icse

Do most techniques produce models with similar performance, when we use:

Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks

Expanded scope: No, similar to the clean data study, techniques are grouped into 4 distinct ranks

Clean data: No, unlike the prior work, techniques are grouped into 4 distinct ranks

Page 27: Ghotra icse

Classification technique matters!

[Figure: Decision Trees and Logistic Model Trees (LMT)]

Page 28: Ghotra icse

Low-cost suggestion: Experiment with the available techniques

6,618 packages are available on CRAN

148 packages are available in package explorer