Think Locally, Act Globally - Improving Defect and Effort Prediction Models



DESCRIPTION

Talk given at the 2012 Working Conference on Mining Software Repositories (MSR'12) in Zürich, Switzerland.

TRANSCRIPT

Page 1: Think Locally, Act Globally - Improving Defect and Effort Prediction Models


Think Locally, Act Globally: Improving Defect and Effort Prediction Models

Nicolas Bettenburg • Meiyappan Nagappan • Ahmed E. Hassan
Queen's University • Kingston, ON, Canada

Software Analysis & Intelligence Lab

Page 4: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Data Modelling in Empirical SE

- Observations: measured from project data
- Model: describes the observations mathematically
- Understanding: guides process optimizations and future research
- Prediction: guides decision making

Page 9: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Model Building Today

Whole Dataset → split into Training Data and Testing Data → learn a model M from the training data → produce predictions Y for the testing data → compare the predictions against the known outcomes.
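This pipeline translates almost directly into R, the language behind the model calls (lm, earth) shown later in the deck. A minimal sketch, assuming a hypothetical data frame `data` whose response column is `bug`:

    # Global model pipeline from the slide (hypothetical data frame `data`
    # with response column `bug`).
    set.seed(42)
    idx      <- sample(nrow(data), size = round(0.9 * nrow(data)))
    training <- data[idx, ]    # training data
    testing  <- data[-idx, ]   # testing data

    global_model <- lm(bug ~ ., data = training)    # learned model M
    pred_global  <- predict(global_model, testing)  # predictions Y

    # compare the predictions against the known outcomes
    cor(pred_global, testing$bug, method = "spearman")

Spearman rank correlation is used for the comparison here because the later results slides report rank correlation as their goodness measure.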

Page 10: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Much research effort goes into new metrics and new models!

Page 11: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Maybe we need to look more at the data part.

Page 17: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

In the Field

Tom Zimmermann: "We ran 622 cross-project predictions and found that only 3.4% actually worked."

Tim Menzies: "Rather than focus on generalities, empirical SE should focus more on context-specific principles."

Taking local properties of the data into consideration leads to better models!

Page 21: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Using Locality in Statistical Models

1. Does this principle work for statistical models?
2. Does it work for prediction?
3. Can we do better?

Page 26: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Building Local Models

Whole Dataset → split into Training Data and Testing Data → cluster the training data → learn multiple models, one per cluster (M1, M2, M3) → predict individually with each cluster's model (one Y per cluster) → compare the predictions against the known outcomes.
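The local-model variant of the earlier sketch, reusing the same hypothetical `training`/`testing` frames. The paper names MCLUST as its clustering algorithm, which is the mclust package in R; everything beyond the package name (the per-cluster lm fits and the routing of test points) is illustrative:

    library(mclust)  # model-based clustering (MCLUST), as named in the paper

    predictors <- setdiff(names(training), "bug")

    # Cluster the training data into regions of similar observations.
    cl <- Mclust(training[, predictors])

    # Learn one model per cluster (M1, M2, M3, ...). Illustrative only:
    # very small clusters may need a reduced set of predictors.
    local_models <- lapply(seq_len(cl$G), function(k) {
      lm(bug ~ ., data = training[cl$classification == k, ])
    })

    # Predict individually: route each test point to its cluster's model.
    test_cluster <- predict(cl, testing[, predictors])$classification
    pred_local   <- vapply(seq_len(nrow(testing)), function(i) {
      predict(local_models[[test_cluster[i]]], testing[i, ])
    }, numeric(1))

    cor(pred_local, testing$bug, method = "spearman")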

Page 30: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Global Statistical Model

[Figure: one regression model fit to the whole dataset. The slide background reuses Figure 2.1 of "General Aspects of Fitting Regression Models": a linear spline function $f(X)$ over $X \in [0, 6]$ with knots at $a = 1$, $b = 3$, $c = 5$.]

$C(Y|X) = f(X) = X\beta$, where $X\beta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4$, and $X_1 = X$, $X_2 = (X - a)_+$, $X_3 = (X - b)_+$, $X_4 = (X - c)_+$. Overall linearity in $X$ can be tested by testing $H_0: \beta_2 = \beta_3 = \beta_4 = 0$.

Model fit leaves much room for improvement!

Page 34: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Local Statistical Model

[Same spline figure and equation as before, now with two separate fits over different regions of the data: Model 1 and Model 2.]

Improved fit!

Page 35: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

How can we use this approach to get an even better fit?

Page 40: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Be Even More Local!

[Same spline figure and equation as before, now with many small local fits.]

Great fit! BUT: risk of overfitting the data!!

Page 42: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Problem: clustering is carried out independently of the model fit.

Page 48: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

[Spline figure and equation repeated over several panels, illustrating local fits that are evaluated against the dataset as a whole.]

Optimize local fit with respect to minimizing global overfit.

Multivariate Adaptive Regression Splines (MARS): create local knowledge that optimizes the process globally.
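MARS models are available in R through the earth package; later slides in this deck show the call earth(formula=f, data=training1) verbatim. A minimal sketch on the same hypothetical frames as before:

    library(earth)  # MARS implementation; the deck itself shows earth() calls

    # The forward phase adds hinge-function pairs; the backward phase prunes
    # the least effective terms to limit overfitting. Both phases are built
    # into earth(), so no prior clustering of the data is needed.
    mars_model <- earth(bug ~ ., data = training)
    pred_mars  <- predict(mars_model, testing)

    summary(mars_model)  # inspect the selected hinge functions and knots
    cor(as.vector(pred_mars), testing$bug, method = "spearman")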

Page 49: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Case Study

Page 52: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Case Study

- Xalan 2.6 and Lucene 2.4: post-release defects per class, 20 CK metrics
- CHINA: total development effort in hours, 14 FP metrics
- NasaCoc: development length in months, 24 COCOMO-II metrics

Page 53: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Results: Goodness of Fit

Rank correlation (0 = worst fit, 1 = optimal fit)

Page 57: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Results: Goodness of Fit (rank correlation, 0 = worst fit, 1 = optimal fit)

    Dataset      Global   Local (Clustered)   MARS
    Xalan 2.6    0.33     0.52                0.69
    Lucene 2.4   0.32     0.60                0.83
    CHINA        0.83     0.89                0.89
    NasaCOC      0.93     0.97                0.99

Page 58: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Results: Goodness of Fit

[Same rank-correlation table as above, alongside Figure 3 from the paper: the number of clusters (0 to 8) generated by MCLUST in each of the 10 folds of the cross validation, for the datasets CHINA, Lucene 2.4, NasaCoc, and Xalan 2.6.]

Excerpt from the paper:

... a penalty term for each additional prediction variable entering the regression model [23].

For practical purposes, we use a publicly available implementation of BIC-based model selection, contained in the R package BMA. The input to the BMA implementation is the dataset itself, as well as a list of all dependent and independent variables that should be considered. In our case study, we always supply a list of all independent variables that were left after VIF analysis. The output of the BMA implementation is a selection of independent variables, such that the linear regression model built with these variables will produce the best fit to the data while avoiding overfitting.

D. Multivariate Adaptive Regression Splines

We use Multivariate Adaptive Regression Splines [11], or MARS, models as an example of a global prediction model that takes local considerations of the dataset into account. MARS models have become increasingly popular in the medical, social, and economic sciences, where they are used with great success [3], [21], [28]. A MARS model has the form $Y = \beta_0 + c_1 H(X_1) + \cdots + c_n H(X_n)$, with $Y$ the dependent variable (that is to be predicted), $c_i$ the $i$-th hinge coefficient, and $H(X_i)$ the $i$-th "hinge function". Hinge functions are an integral part of MARS models, as they make it possible to describe non-linear relationships in the data. In particular, they partition the data into disjoint regions that can then be described separately (our notion of local considerations). In general, hinge functions used in MARS models take the form of either $H(X_i) = \max(0, X_i - c)$ or $H(X_i) = \max(0, c - X_i)$, with $c$ being some constant real value and $X_i$ an independent (predictor) variable.

A MARS model is built in two separate phases. In the forward phase, MARS starts with a model consisting of just the intercept term (the mean of the response). It then repeatedly adds hinge functions in pairs to the model. At each step it finds the pair of functions that gives the maximum reduction in residual error. This process of adding terms continues until the change in residual error is too small to continue, or until a maximum number of terms is reached. In our case study, the maximum number of terms is automatically determined by the implementation, based on the number of independent variables given as input. For MARS models, we use all independent variables in a dataset after VIF analysis.

The first phase often builds a model that suffers from overfitting. As a result, the second phase, called the backward phase, prunes the model to increase its generalization ability. The backward phase removes individual terms, deleting the least effective term at each step, until it finds the best submodel. Model subsets are compared using a performance criterion specific to MARS models, and the best model is selected and returned as the final prediction model.

MARS models have the advantage that a model selection phase is already built in by design, so we do not need to carry out a model selection step similar to BIC, as we do with global and local models. Second, the hinge functions in MARS models already model disjoint regions of the dataset separately, so there is no need for prior partitioning of the dataset with a clustering algorithm. Both advantages make this type of prediction model very easy to use in practice.

E. Cross Validation

For better generalizability of our results, and to counter random observation bias, all experiments described in our case study are repeated 10 times on random stratified subsamples of the data into training (90% of the data) and testing (10% of the data) sets. Stratification is carried out on the measure that is to be predicted. We evaluate all our findings based on the average over the 10 repetitions. This practice is a common approach in machine learning, and is often referred to as "10-fold cross validation" [27].

IV. RESULTS

In this section, we present the results of our case study in three steps that follow our initial three research questions. For each part, we first ...
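To make the hinge-function mechanics described in the excerpt concrete, here is a small illustrative example (not from the paper) of a mirrored hinge pair with a knot at c = 3:

    # A mirrored hinge pair with knot c = 3. Each hinge is zero on one side
    # of the knot, so its coefficient only shapes the fit in "its" region of
    # the data -- the paper's notion of local considerations.
    h_right <- function(x, c) pmax(0, x - c)  # active for x > c
    h_left  <- function(x, c) pmax(0, c - x)  # active for x < c

    x <- 0:6
    data.frame(x, right = h_right(x, 3), left = h_left(x, 3))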

Page 59: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Results: Goodness of Fit

[Same rank-correlation table as above.]

UP TO 2.5x BETTER FIT WHEN USING DATA LOCALITY!

Page 61: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Results: Prediction Error

[Bar charts of prediction error per dataset for the Global, Local, and MARS models. Recoverable chart values: Xalan 2.6: 0.40, 0.52, 0.64; Lucene 2.4: 0.94, 1.15, 1.15; CHINA: 234.43, 552.85, 765; NasaCoC: 1.63, 2.14, 3.26.]

Up to 4x lower prediction error with Local Models!
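The deck does not name the exact error measure behind these bars; as an illustrative stand-in, the three sketched models from the earlier code can be compared with a median absolute error:

    # Compare the three sketched models. Median absolute error is an
    # assumption here -- the slides do not specify the error measure.
    mae <- function(pred) median(abs(as.vector(pred) - testing$bug))

    c(global = mae(pred_global),
      local  = mae(pred_local),
      mars   = mae(pred_mars))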

Page 62: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Model Interpretation?

Page 65: Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Model Interpretation

[Figure 6 from the paper: (a) part of a global model learned on the Xalan 2.6 dataset, lm(formula=f, data=training1); (b) part of a global model with local considerations learned on the same dataset, earth(formula=f, data=training1). One response curve per metric (avg_cc, ca, cam, cbm, ce, dam, dit, ic, lcom, lcom3, loc, max_cc, mfa, moa, noc, npm). Caption: "Global models report general trends, while global models with local considerations give insights into different regions of the data. The y-axis describes the response (in this case bugs) while keeping all other prediction variables at their median values."]

[Figure 7 from the paper: "Example of contradicting trends in local models (Xalan 2.6, Cluster 1 and Cluster 6 in Fold 9)", shown for the metrics ic, npm, and mfa.]

Excerpt from the paper:

... the hinge functions of the MARS model already partition the data into regions with individual properties. For example, we observe that an increase of ic (measuring the inheritance coupling through parent classes) is predicted to have a negative effect on bug-proneness only when it attains values larger than 1. Thus a practitioner might decide on a different course of action than he or she would have based on the trends outlined by a global model.

V. CONCLUSIONS

In this study, we investigated the differences between three types of prediction models. Global models are built on software engineering datasets as-is, while for local models we first subdivide the datasets into subsets of data with similar observations before building individual models on each subset. In addition, we also studied a third approach: multivariate adaptive regression splines as a global model with local considerations. MARS by design takes local considerations of individual regions of the data into account, and can thus be considered a hybrid between global and local models.

A. Think Locally

We evaluated each of the three modelling strategies in a case study on four different datasets, which have been used in prior research on the WHICH machine-learning algorithm [14]. The results of our case study demonstrate that clustering a dataset into regions with similar properties, and using the individual regions for building prediction models, leads to an improved fit of these models. Our findings thus confirm the results of Menzies et al., who observed a similar effect of data localization on their WHICH machine-learning algorithm. These increased fits have practical implications for researchers who use regression models for understanding: local models are more insightful than global models, which report only general trends across the whole dataset, and we have demonstrated that such general trends may not hold true for particular parts of the dataset. For example, we have seen in the Xalan 2.6 defect prediction dataset that particular sets of classes are influenced differently by attributes such as inheritance, cohesion, and complexity. Our findings reinforce the recommendations of Menzies et al. against the use of a "one-size-fits-all" approach, such as a global model, when trying to account for such localized effects.

B. Act Globally

When the goal is carrying out actual predictions, rather than understanding, local models show only small improvements over global models with respect to prediction error and ranking. In particular, building local models involves a significant overhead due to clustering of the data. Even though clustering algorithms such as the one presented in the work by Menzies et al. [14] might run in linear time, we still have to learn a multitude of models, one for each cluster. One particular point that we have not addressed in our study is whether the choice of clustering algorithm influences the final performance of the local models. While our choice was to use a state-of-the-art model-based clustering algorithm that partitions data along the dimensions of highest variability, future research may want to look deeper into the effect that different clustering approaches have on the performance of local models.

Surprisingly, we found that the relatively small increase in prediction performance of local models is offset by an increased error variance. While predictions from local models are close to the actual values most of the time, we observed occasional very high errors. In other words, while global models are not as accurate as local models, their worst-case scenarios are not as bad as what we observe with local models. We want to note, however, that this finding stands in conflict with the findings of Menzies et al., who observed the opposite: their clustering algorithm decreases ...

Traditional Global Model: general trends. One curve per metric; the overall trend is read off that curve.

Page 66: Think Locally, Act Gobally - Improving Defect and Effort Prediction Models

20

0 5 10 15 20−2.5

−1.5

−0.5

0.5

1 avg_cc

0 50 100 150

0.50

0.60

0.70

0.80

2 ca

0.0 0.2 0.4 0.6 0.8 1.0

0.44

0.48

0.52

3 cam

0 5 10 15 20 25 30

0.5

0.7

0.9

1.1

4 cbm

0 10 20 30 40 50

0.50

0.54

0.58

0.62

5 ce

0.0 0.2 0.4 0.6 0.8 1.0

0.35

0.45

6 dam

1 2 3 4 5 6 7 8

0.3

0.4

0.5

0.6

7 dit

0 1 2 3 4 5

0.50

0.55

0.60

0.65

8 ic

0 1000 3000 50000.6

1.0

1.4

1.8 9 lcom

0.0 0.5 1.0 1.5 2.0

0.3

0.4

0.5

0.6

0.7

10 lcom3

0 1000 2000 3000 4000

0.5

1.0

1.5

2.0

11 loc

0 20 40 60 80 120

12

34

12 max_cc

0.0 0.2 0.4 0.6 0.8 1.0

0.45

0.47

0.49

0.51

13 mfa

0 5 10 15

0.50

0.54

0.58

14 moa

0 5 10 15 20 25 30

0.42

0.46

0.50

15 noc

0 20 40 60 80 100 120

0.50

0.60

0.70

16 npm

lm(formula=f,data=training1)

(a) Part of a global Model learned on the Xalan 2.6 dataset

0.0 0.2 0.4 0.6 0.8 1.0

0.8

1.2

1.6

1 cam

0 5 10 15 20 25 30

1.0

1.2

1.4

1.6

1.8

2 cbm

0 10 20 30 40 50

0.2

0.4

0.6

0.8

1.0

3 ce

0.0 0.2 0.4 0.6 0.8 1.0

0.65

0.75

0.85

4 dam

1 2 3 4 5 6 7 8

0.2

0.4

0.6

0.8

5 dit

0 1 2 3 4 5

0.9

1.1

1.3

1.5

6 ic

0 1000 3000 5000

1.0

1.5

2.0

2.5

3.0 7 lcom

0.0 0.5 1.0 1.5 2.00.55

0.65

0.75

0.85

8 lcom3

0 1000 2000 3000 4000

12

34

56

9 loc

0.0 0.2 0.4 0.6 0.8 1.0

0.8

1.0

1.2

1.4

10 mfa

0 5 10 15

0.0

0.2

0.4

0.6

0.8

11 moa

0 5 10 15 20 25 30

0.0

0.5

1.0

1.5

12 noc

0 20 40 60 80 100 120

−1.0

0.0

0.5

1.0

13 npm

bug earth(formula=f,data=training1)

(b) Part of a Global model with local considerations learned on the Xalan2.6 dataset

Figure 6: Global models report general trends, while global models with local considerations give insights into different regions of the data. The Y-Axisdescribes the response (in this case bugs) while keeping all other prediction variables at their median values.

0 2 4 6 8 10

��

���

���

��� ��� ��� ��� ��� ���

��

��

��� ��� ��� ���

��

���

���

0 10 20 30 40 �� 60

��

���

���

���

��� ��� ��� ��� ��� ���0

12

30 1 2 3 4

��

��

���

ic

npm

npm

ic mfa

mfa

Fold 9, Cluster 1

Fold 9, Cluster 6

Figure 7: Example of contradicting trends in local models (Xalan 2.6,Cluster 1 and Cluster 6 in Fold 9).

model already partition the data into regions with individualproperties. For example, we observe that an increase of ic(measuring the inheritance coupling through parent classes)is predicted to only have a negative effect on bug-pronenesswhen it attains values larger than 1. Thus a practitioner mightdecide a different course of action than he or she would havedone based on the trends outlined by a global model.

V. CONCLUSIONS

In this study, we investigated the difference of threedifferent types of prediction models. Global models arebuilt on software engineering datasets as-is, while forlocal models we first subdivide the datasets into subsets ofdata with similar observations, before building individualmodels on each subset. In addition, we also studied athird approach: multivariate adaptive regression splines asa global model with local considerations. MARS by designtakes local considerations of individual regions of the datainto account, and can thus be considered a hybrid betweenglobal and local models.

A. Think LocallyWe evaluated each of the three modelling strategies in acase study on four different datasets, which have beenused in prior research on the WHICH machine-learningalgorithm [14]. The results of our case study demonstratethat clustering of a dataset into regions with similarproperties and using the individual regions for building of

prediction models leads to an improved fit of these models.Our findings thus confirm the results of Menzies et al.,who observed a similar effect of data localization on theirWHICH machine-learning algorithm. These increased fitshave practical implications for researchers concerned inusing regression models for understanding: local modelsare more insightful than global models, which report onlygeneral trends across the whole dataset, whereas we havedemonstrated that such general trends may not hold true forparticular parts of the dataset. For example, we have seenin the Xalan 2.6 defect prediction dataset that particularsets of classes are influenced differently by attributes suchas inheritance, cohesion and complexity. Our findingsreinforce the recommendations of Menzies et al. againstthe use of a “one-size-fits-all” approach, such as a globalmodel, when trying to account for such localized effects.

B. Act GloballyWhen the goal is carrying out actual predictions, rather thanunderstanding, local models show only small improvementsover global models, with respect to prediction error andranking. In particular, building local models involves asignificant overhead due to clustering of the data. Eventhough clustering algorithms such as the one presented inthe work by Menzies et al. [14] might run in linear time, westill have to learn a multitude of models, one for each cluster.One particular point that we have not addressed in our studyis whether the choice of clustering algorithm influences thefinal performance of the local models. While our choice wasto use a state-of-the-art model-based clustering algorithmthat partitions data along dimensions of highest variability,future research may want to look deeper into the effect thatdifferent clustering approaches have on the performance oflocal models.

Surprisingly, we found that the relatively small increasein prediction performance of local models is offset byan increased error variance. While predictions from localmodels are close to the actual values most of the time, weobserved the occasional very high errors. In other words,while global models are not as accurate as local models,their worst case scenarios are not as bad as we observe withlocal models. We want to note however that this findingstand in conflict with the findings of Menzies et al., whoobserved the opposite: their clustering algorithm decreases

0 5 10 15 20−2.5

−1.5

−0.5

0.5

1 avg_cc

0 50 100 150

0.50

0.60

0.70

0.80

2 ca

0.0 0.2 0.4 0.6 0.8 1.0

0.44

0.48

0.52

3 cam

0 5 10 15 20 25 30

0.5

0.7

0.9

1.1

4 cbm

0 10 20 30 40 50

0.50

0.54

0.58

0.62

5 ce

0.0 0.2 0.4 0.6 0.8 1.0

0.35

0.45

6 dam

1 2 3 4 5 6 7 8

0.3

0.4

0.5

0.6

7 dit

0 1 2 3 4 5

0.50

0.55

0.60

0.65

8 ic

0 1000 3000 50000.6

1.0

1.4

1.8 9 lcom

0.0 0.5 1.0 1.5 2.0

0.3

0.4

0.5

0.6

0.7

10 lcom3

0 1000 2000 3000 4000

0.5

1.0

1.5

2.0

11 loc

0 20 40 60 80 120

12

34

12 max_cc

0.0 0.2 0.4 0.6 0.8 1.0

0.45

0.47

0.49

0.51

13 mfa

0 5 10 150.50

0.54

0.58

14 moa

0 5 10 15 20 25 30

0.42

0.46

0.50

15 noc

0 20 40 60 80 100 120

0.50

0.60

0.70

16 npm

lm(formula=f,data=training1)

(a) Part of a global Model learned on the Xalan 2.6 dataset

0.0 0.2 0.4 0.6 0.8 1.0

0.8

1.2

1.6

1 cam

0 5 10 15 20 25 30

1.0

1.2

1.4

1.6

1.8

2 cbm

0 10 20 30 40 50

0.2

0.4

0.6

0.8

1.0

3 ce

0.0 0.2 0.4 0.6 0.8 1.0

0.65

0.75

0.85

4 dam

1 2 3 4 5 6 7 8

0.2

0.4

0.6

0.8

5 dit

0 1 2 3 4 5

0.9

1.1

1.3

1.5

6 ic

0 1000 3000 5000

1.0

1.5

2.0

2.5

3.0 7 lcom

0.0 0.5 1.0 1.5 2.00.55

0.65

0.75

0.85

8 lcom3

0 1000 2000 3000 4000

12

34

56

9 loc

0.0 0.2 0.4 0.6 0.8 1.0

0.8

1.0

1.2

1.4

10 mfa

0 5 10 15

0.0

0.2

0.4

0.6

0.8

11 moa

0 5 10 15 20 25 30

0.0

0.5

1.0

1.5

12 noc

0 20 40 60 80 100 120

−1.0

0.0

0.5

1.0

13 npm

bug earth(formula=f,data=training1)

(b) Part of a Global model with local considerations learned on the Xalan2.6 dataset

Figure 6: Global models report general trends, while global models with local considerations give insights into different regions of the data. The Y-Axisdescribes the response (in this case bugs) while keeping all other prediction variables at their median values.

0 2 4 6 8 10��

���

���

��� ��� ��� ��� ��� ���

��

��

��� ��� ��� ���

��

���

���

0 10 20 30 40 �� 60

��

���

���

���

��� ��� ��� ��� ��� ���

01

23

0 1 2 3 4

��

��

���

ic

npm

npm

ic mfa

mfa

Fold 9, Cluster 1

Fold 9, Cluster 6

Figure 7: Example of contradicting trends in local models (Xalan 2.6,Cluster 1 and Cluster 6 in Fold 9).

model already partition the data into regions with individualproperties. For example, we observe that an increase of ic(measuring the inheritance coupling through parent classes)is predicted to only have a negative effect on bug-pronenesswhen it attains values larger than 1. Thus a practitioner mightdecide a different course of action than he or she would havedone based on the trends outlined by a global model.

V. CONCLUSIONS

In this study, we investigated the differences between three types of prediction models. Global models are built on software engineering datasets as-is, while for local models we first subdivide the datasets into subsets of similar observations before building an individual model on each subset. In addition, we studied a third approach: multivariate adaptive regression splines (MARS) as a global model with local considerations. MARS by design takes local properties of individual regions of the data into account, and can thus be considered a hybrid between global and local models.
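The three strategies can be contrasted in a short, hedged R sketch; d is an assumed data frame of class-level metrics with a bug response, and Mclust from the mclust package stands in for the model-based clustering mentioned below:

library(mclust)   # model-based clustering
library(earth)    # multivariate adaptive regression splines

f <- bug ~ .

global <- lm(f, data = d)                        # 1. one global model

cl    <- Mclust(d[, names(d) != "bug"])          # 2. cluster the data, then
local <- lapply(split(d, cl$classification),     #    build one model per region
                function(region) lm(f, data = region))

hybrid <- earth(f, data = d)                     # 3. global model with local
                                                 #    considerations (MARS)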

A. Think Locally
We evaluated each of the three modelling strategies in a case study on four different datasets, which have been used in prior research on the WHICH machine-learning algorithm [14]. The results of our case study demonstrate that clustering a dataset into regions with similar properties and building an individual prediction model on each region leads to an improved fit of these models. Our findings thus confirm the results of Menzies et al., who observed a similar effect of data localization on their WHICH machine-learning algorithm. These increased fits have practical implications for researchers concerned with using regression models for understanding: local models are more insightful than global models, which report only general trends across the whole dataset; we have demonstrated that such general trends may not hold true for particular parts of the dataset. For example, we have seen in the Xalan 2.6 defect prediction dataset that particular sets of classes are influenced differently by attributes such as inheritance, cohesion, and complexity. Our findings reinforce the recommendation of Menzies et al. against the use of a "one-size-fits-all" approach, such as a global model, when trying to account for such localized effects.

B. Act Globally
When the goal is carrying out actual predictions rather than understanding, local models show only small improvements over global models with respect to prediction error and ranking. In particular, building local models involves a significant overhead due to clustering of the data. Even though clustering algorithms such as the one presented in the work by Menzies et al. [14] might run in linear time, we still have to learn a multitude of models, one for each cluster. One point that we have not addressed in our study is whether the choice of clustering algorithm influences the final performance of the local models. While our choice was a state-of-the-art model-based clustering algorithm that partitions data along the dimensions of highest variability, future research may want to look deeper into the effect that different clustering approaches have on the performance of local models.

Surprisingly, we found that the relatively small increase in prediction performance of local models is offset by an increased error variance. While predictions from local models are close to the actual values most of the time, we observed occasional very high errors. In other words, while global models are not as accurate as local models, their worst-case scenarios are not as bad as those we observe with local models. We want to note, however, that this finding stands in conflict with the findings of Menzies et al., who observed the opposite: their clustering algorithm decreases the variance of prediction errors.
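A sketch of how this trade-off can be made visible in an evaluation, reusing the hypothetical global, cl, and local objects from the sketch above plus an assumed held-out frame test: the median absolute error alone hides the difference, while the error variance exposes it.

# Route each test row to the cluster whose local model should predict it.
route <- predict(cl, newdata = test[, names(test) != "bug"])$classification

pred_local <- sapply(seq_len(nrow(test)), function(i)
  predict(local[[route[i]]], newdata = test[i, ]))

err_g <- abs(predict(global, newdata = test) - test$bug)
err_l <- abs(pred_local - test$bug)

rbind(global = c(median = median(err_g), variance = var(err_g)),
      local  = c(median = median(err_l), variance = var(err_l)))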

Cluster 1

Cluster 6

Model Interpretation

...


Page 67: Think Locally, Act Gobally - Improving Defect and Effort Prediction Models


Local (Clustered) Model: Many, many, many Trends!


Cluster 1

Cluster 6

Model Interpretation

...


Page 68: Think Locally, Act Gobally - Improving Defect and Effort Prediction Models


Local (Clustered) Model: Many, many, many Trends!


Cluster 1

Cluster 6

Model Interpretation

...

Sometimes even contradict


Page 69: Think Locally, Act Gobally - Improving Defect and Effort Prediction Models



Model Interpretation


Page 70: Think Locally, Act Gobally - Improving Defect and Effort Prediction Models


Regression Splines: Local Trends in a Single Curve


Model Interpretation


Page 71: Think Locally, Act Gobally - Improving Defect and Effort Prediction Models


Regression Splines: Local Trends in a Single Curve


Model Interpretation

Combines the best of both worlds!


Page 72: Think Locally, Act Gobally - Improving Defect and Effort Prediction Models


Page 73: Think Locally, Act Gobally - Improving Defect and Effort Prediction Models

Using Locality in Data to build better Statistical Models.


Page 74: Think Locally, Act Gobally - Improving Defect and Effort Prediction Models

Using Locality in Data to build better Statistical Models.

vs = Two Extremes


Page 75: Think Locally, Act Gobally - Improving Defect and Effort Prediction Models

Using Locality in Data to build better Statistical Models.

vs = Two Extremes

Build Local Model, globally Optimized


Page 76: Think Locally, Act Gobally - Improving Defect and Effort Prediction Models

Using Locality in Data to build better Statistical Models.

vs = Two Extremes

Build Local Model, globally Optimized
• combines best of both worlds


Page 77: Think Locally, Act Gobally - Improving Defect and Effort Prediction Models

Using Locality in Data to build better Statistical Models.

vs = Two Extremes

Build Local Model, globally Optimized
• combines best of both worlds
• outperforms global and clustered local


Page 78: Think Locally, Act Gobally - Improving Defect and Effort Prediction Models

Using Locality in Data to build better Statistical Models.

vs = Two Extremes

Build Local Model, globally Optimized
• combines best of both worlds
• outperforms global and clustered local
• summarizes local trends in single curve (illustrated below)
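As an illustration of "summarizes local trends in a single curve": a fitted MARS model is a sum of hinge functions, so region-specific trends live inside one closed-form equation. The coefficients below are made up for illustration only:

h <- function(x) pmax(0, x)   # the MARS hinge function h(x) = max(0, x)

# Hypothetical fitted form, not a result from the talk:
predicted_bugs <- function(ic, loc) {
  0.9 +
    0.4 * h(ic - 1) +         # ic contributes only once it exceeds 1
    0.003 * h(loc - 250)      # loc contributes only beyond a knot at 250
}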
