Predicting Software Metrics at Design Time

Rahul Premraj (Saarland University), Thomas Zimmermann (University of Calgary), Wolfgang Holz (Saarland University), Andreas Zeller (Saarland University)

DESCRIPTION

Talk at PROFES 08, Rome

TRANSCRIPT

Page 1: Predicting Software Metrics at Design Time

Predicting Software Metrics at Design Time

Rahul Premraj, Saarland University

Thomas Zimmermann, University of Calgary

Wolfgang Holz, Saarland University

Andreas Zeller, Saarland University

Page 6: Predicting Software Metrics at Design Time

Early Resource Estimation

• Essential for successful project management.

• Most commonly estimated resources: Time or Cost.

• Many estimation models use software metrics as inputs.

• Many of these metrics are not available at design time.


Page 8: Predicting Software Metrics at Design Time

Research Question

Software’s domain

Software metrics

Any relationship?

Page 12: Predicting Software Metrics at Design Time

Predicting Vulnerable Software Components
Tom Zimmermann (University of Calgary), Stephan Neuhaus (Saarland University), Andreas Zeller (Saarland University)
Conference on Computer and Communications Security, Alexandria, 2007

Predicting Component Failures at Design Time
Tom Zimmermann (University of Calgary), Adrian Schröter (University of Victoria), Andreas Zeller (Saarland University)
International Symposium on Empirical Software Engineering, Rio de Janeiro, 2006

Both papers used import statements as representatives of software domain.

Page 13: Predicting Software Metrics at Design Time

[Figure: a Learner takes existing code (as imports with software metrics) and produces a Predictor; the Predictor takes a new design (as imports) and outputs the predicted metric.]

Fig. 1: Approach overview. By learning from the relationship between imports and metrics in existing code, we can predict metrics based on imports alone.

3. Using the ECLIPSE code base, we show how to predict software complexity, as defined by the widely used object-oriented ckjm software metrics [6].

We expect that advance reliable knowledge of such product-specific metrics can be a boon to solving several management issues that constantly loom over all types of development projects at an early stage.

This paper is organized as follows. In Section 2, we discuss features and shortcomings of contemporary cost estimation models. The data used for our experimentation is elaborated upon in Section 3. Thereafter, we present our experimental setup in Section 4, which is followed by results and discussions in Section 5. Threats to validity are addressed in Section 6 and lastly, we conclude our work in Section 7.

2 Background

As discussed above, cost estimation is vital to a successful outcome of a software project. But most contemporary estimation models depend upon characteristics of the software that are typically unknown at start. For example, many models take into account the relationship between software size and cost. Examples include algorithmic models such as COCOMO [7] and Putnam [8], regression models [9] and analogy-based models [10–13]. To use these models, first an estimate of the size of the project is needed. Again, size is unknown at start of the project and can only be estimated based on other characteristics of the software. Hence, basing cost estimates on an estimate of size adds to uncertainty of the estimates and fate of the project. This challenges the value of such models.

We propose a novel approach that, in contrast to others, focusses on estimating the size of a component with as little knowledge as its design. This places managers at a unique position from where they can choose between several alternatives to optimise not only size, but also other metrics of the software that serve as its quality indicators. We present these metrics in more detail in the following section.

Overview of Approach

Page 14: Predicting Software Metrics at Design Time

Research Question

Software’s domain

Software metrics

Any relationship?

To investigate, we used 89 core plugins from the Eclipse project.

Page 15: Predicting Software Metrics at Design Time

Data Collection

Scan through a .java file and identify lines that take the form

import a.b.c;
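The scanning step can be sketched as follows. This is a minimal illustration, not the tooling used for the talk: the class name `ImportScanner` and the regular expression are assumptions, and the pattern would also match an import line that sits inside a comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects the names found in "import a.b.c;" lines. */
final class ImportScanner {
    // Matches single-type, on-demand (".*") and static imports at the start of a line.
    private static final Pattern IMPORT_LINE = Pattern.compile(
            "^\\s*import\\s+(?:static\\s+)?([\\w.]+(?:\\.\\*)?)\\s*;",
            Pattern.MULTILINE);

    /** Returns the imported names in the order they appear in the source text. */
    static List<String> extractImports(String source) {
        List<String> imports = new ArrayList<>();
        Matcher m = IMPORT_LINE.matcher(source);
        while (m.find()) {
            imports.add(m.group(1));
        }
        return imports;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "package demo;",
                "import java.sql.*;",
                "import java.util.List;",
                "class Demo { }");
        System.out.println(extractImports(source)); // [java.sql.*, java.util.List]
    }
}
```

A regular expression is enough for lines of the form shown on the slide; on-demand imports such as `java.sql.*` still need a further resolution step to find out which types are actually used.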


Page 19: Predicting Software Metrics at Design Time

import java.sql.*;

Connection conn = null;
Statement stmt = null;

Java file, parsed by the Eclipse Abstract Syntax Tree Parser (ASTParser), yields the AST.

Page 22: Predicting Software Metrics at Design Time

import java.sql.*;

Connection conn = null;
Statement stmt = null;

ASTView:

java.sql.Connection
java.sql.Statement
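The idea of turning an on-demand import plus the simple type names used in the code into fully qualified names can be illustrated without Eclipse. The sketch below is a stand-in, not the ASTParser: it uses plain reflection to check which wildcard package contains each simple name. The class `OnDemandResolver` and its method are hypothetical, and the lookup only sees classes on the running JVM's classpath.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for AST binding resolution, using reflection. */
final class OnDemandResolver {
    /** Maps each simple type name to the first wildcard package that contains it. */
    static Map<String, String> resolve(List<String> wildcardPackages, List<String> simpleNames) {
        Map<String, String> resolved = new LinkedHashMap<>();
        ClassLoader loader = OnDemandResolver.class.getClassLoader();
        for (String name : simpleNames) {
            for (String pkg : wildcardPackages) {
                String candidate = pkg + "." + name;
                try {
                    // initialize=false: only check that the class exists.
                    Class.forName(candidate, false, loader);
                    resolved.put(name, candidate);
                    break;
                } catch (ClassNotFoundException | LinkageError e) {
                    // Not in this package; try the next one.
                }
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // "import java.sql.*;" plus the types used in the slide's snippet.
        System.out.println(resolve(List.of("java.sql"), List.of("Connection", "Statement")));
    }
}
```

A real AST-based resolver also handles project-local types, nested classes and name shadowing, none of which this sketch attempts.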

Page 23: Predicting Software Metrics at Design Time

Lines of Code
David A. Wheeler
http://www.dwheeler.com/sloccount/sloccount.html

A physical SLOC is a line ending in a newline or end-of-file marker, and which contains at least one non-whitespace non-comment character.
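The definition above can be turned into a small counter. This is a simplified sketch, not SLOCCount itself: the class `SlocCounter` is hypothetical, it only understands Java-style `//` and `/* ... */` comments, and it assumes comment markers never appear inside string literals.

```java
/** Hypothetical sketch of physical SLOC counting for Java-style comments. */
final class SlocCounter {
    /** Counts lines that hold at least one non-whitespace, non-comment character. */
    static int countSloc(String source) {
        int sloc = 0;
        boolean inBlockComment = false;
        for (String line : source.split("\n", -1)) {
            boolean hasCode = false;
            int i = 0;
            while (i < line.length()) {
                if (inBlockComment) {
                    int end = line.indexOf("*/", i);
                    if (end < 0) {
                        break; // the block comment continues on the next line
                    }
                    inBlockComment = false;
                    i = end + 2;
                } else if (line.startsWith("//", i)) {
                    break; // the rest of the line is a comment
                } else if (line.startsWith("/*", i)) {
                    inBlockComment = true;
                    i += 2;
                } else {
                    if (!Character.isWhitespace(line.charAt(i))) {
                        hasCode = true;
                    }
                    i++;
                }
            }
            if (hasCode) {
                sloc++;
            }
        }
        return sloc;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "// header comment",
                "import java.sql.*;",
                "",
                "/* block",
                "   comment */ int x = 1;",
                "int y; // trailing comment");
        System.out.println(countSloc(source)); // 3
    }
}
```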

Page 24: Predicting Software Metrics at Design Time

CKJM Metrics
Diomidis D. Spinellis
http://www.spinellis.gr/sw/ckjm/

Abbreviation   Metric
CA             Afferent Couplings
CBO            Coupling between Class Objects
CBOJDK         Java specific CBO
DIT            Depth of Inheritance Tree
NOC            Number of Children
NPM            Number of Public Methods
LCOM           Lack of Cohesion in Methods
RFC            Response for a Class
WMC            Weighted Methods per Class

Page 29: Predicting Software Metrics at Design Time

Data

ClassName

Input features (14,185 import statements): 1 1 0 0

Output features (SLOC and CKJM metrics): SLOC WMC ... NPM

Training data: two-thirds of the data set

Testing data: remaining one-third
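The 1/0 input features can be pictured as one column per distinct import statement. The sketch below shows one plausible encoding; the class `ImportFeatures`, the sorted column order and the toy import sets are assumptions made for illustration, not details taken from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical encoder: one binary column per distinct import statement. */
final class ImportFeatures {
    /** Sorted list of every import seen in any class; defines the column order. */
    static List<String> vocabulary(List<Set<String>> importsPerClass) {
        Set<String> all = new TreeSet<>();
        for (Set<String> imports : importsPerClass) {
            all.addAll(imports);
        }
        return new ArrayList<>(all);
    }

    /** One row: 1 where the class uses the column's import, 0 otherwise. */
    static int[] encode(Set<String> imports, List<String> vocabulary) {
        int[] row = new int[vocabulary.size()];
        for (int i = 0; i < row.length; i++) {
            row[i] = imports.contains(vocabulary.get(i)) ? 1 : 0;
        }
        return row;
    }

    public static void main(String[] args) {
        List<Set<String>> classes = List.of(
                Set.of("java.sql.Connection", "java.sql.Statement"),
                Set.of("java.util.List"));
        List<String> vocab = vocabulary(classes);
        System.out.println(vocab);
        System.out.println(Arrays.toString(encode(classes.get(0), vocab))); // [1, 1, 0]
    }
}
```

With 14,185 distinct imports each row is long and mostly zeros, so a sparse representation would likely be preferable in practice.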

Page 31: Predicting Software Metrics at Design Time

Reduce Sample Bias

30x
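One way to read this slide is that the two-thirds / one-third split is redrawn thirty times, so that no single lucky or unlucky partition dominates the results. The sketch below assumes the splits are random; the class `RepeatedSplit` and the fixed seed are illustrative choices, not details from the talk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: repeated random two-thirds / one-third splits. */
final class RepeatedSplit {
    /** Returns [training, testing] after shuffling a copy of the rows. */
    static <T> List<List<T>> split(List<T> rows, Random rng) {
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, rng);
        int cut = (2 * shuffled.size()) / 3;
        List<T> training = new ArrayList<>(shuffled.subList(0, cut));
        List<T> testing = new ArrayList<>(shuffled.subList(cut, shuffled.size()));
        return List.of(training, testing);
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 9; i++) {
            rows.add(i);
        }
        Random rng = new Random(42); // fixed seed (assumption) so runs are repeatable
        for (int run = 1; run <= 30; run++) {
            List<List<Integer>> parts = split(rows, rng);
            // A learner would be trained on parts.get(0) and evaluated on parts.get(1).
            System.out.printf("run %d: train=%d test=%d%n",
                    run, parts.get(0).size(), parts.get(1).size());
        }
    }
}
```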

Page 37: Predicting Software Metrics at Design Time

Training phase: Import Statements from Code and Code Metrics are given to the Learner, which produces the Predictor.

Testing phase: Import Statements from Planned Code are given to the Predictor, which outputs the Predicted Code Metrics.
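The two phases can be made concrete with a deliberately simple learner. The slides do not say which learning algorithm was used, so the sketch below swaps in a nearest-neighbour rule over import sets purely for illustration; the class `NearestImportPredictor`, the Jaccard similarity and the toy metric values are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in learner: nearest neighbour over import sets. */
final class NearestImportPredictor {
    private final List<Set<String>> trainImports = new ArrayList<>();
    private final List<Double> trainMetric = new ArrayList<>();

    /** Training phase: remember a class's imports and its measured metric. */
    void learn(Set<String> imports, double metric) {
        trainImports.add(new HashSet<>(imports));
        trainMetric.add(metric);
    }

    /** Jaccard similarity: shared imports divided by all imports in either set. */
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 1.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return (double) common.size() / union.size();
    }

    /** Testing phase: return the metric of the most similar training class. */
    double predict(Set<String> plannedImports) {
        if (trainImports.isEmpty()) {
            throw new IllegalStateException("no training data");
        }
        int best = 0;
        double bestSim = -1.0;
        for (int i = 0; i < trainImports.size(); i++) {
            double sim = jaccard(plannedImports, trainImports.get(i));
            if (sim > bestSim) {
                bestSim = sim;
                best = i;
            }
        }
        return trainMetric.get(best);
    }

    public static void main(String[] args) {
        NearestImportPredictor p = new NearestImportPredictor();
        p.learn(Set.of("java.sql.Connection", "java.sql.Statement"), 120.0); // made-up SLOC
        p.learn(Set.of("java.util.List"), 35.0);                             // made-up SLOC
        System.out.println(p.predict(Set.of("java.sql.Connection"))); // 120.0
    }
}
```

Any regression method that accepts binary feature vectors could take the place of this rule; the training and testing interface stays the same.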

Page 41: Predicting Software Metrics at Design Time

Evaluation

Pred(x): % of predictions that lie within x% of Actual Value

[Figure: five actual values (LOC 1 to LOC 5), each with a band from -x% to +x%, shown with their predicted values.]

Pred(x) = 3/5 = .60 ⇒ 60%
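The Pred(x) measure can be computed directly from paired actual and predicted values. The sketch below is one straightforward reading of the definition; the class `PredX`, the inclusive boundary and the sample numbers are assumptions chosen so that three of five predictions fall inside the band.

```java
/** Hypothetical sketch of the Pred(x) accuracy measure. */
final class PredX {
    /** Fraction of predictions that lie within xPercent of the actual value. */
    static double pred(double[] actual, double[] predicted, double xPercent) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("need equally sized, non-empty arrays");
        }
        int hits = 0;
        for (int i = 0; i < actual.length; i++) {
            double tolerance = Math.abs(actual[i]) * xPercent / 100.0;
            if (Math.abs(predicted[i] - actual[i]) <= tolerance) {
                hits++;
            }
        }
        return (double) hits / actual.length;
    }

    public static void main(String[] args) {
        double[] actual = {100, 200, 300, 400, 500};    // made-up LOC values
        double[] predicted = {110, 260, 290, 430, 900}; // made-up predictions
        System.out.println(pred(actual, predicted, 25)); // 0.6
    }
}
```

Pred25 and Pred50 are the same computation with x set to 25 and 50.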

Page 42: Predicting Software Metrics at Design Time

Results


Page 43: Predicting Software Metrics at Design Time

[Figure: jittered plot of Pred25 and Pred50 values (x-axis: PredX value, 0 to 100) for each metric (y-axis: CA, CBO, CBOJDK, DIT, LCOM, NOC, NPM, RFC, SLOC, WMC).]

Fig. 4: Prediction accuracy for output metrics

5 Results and Discussion

Figure 4 presents the results from our experiments. All metrics are presented in alphabetical order on the y-axis, while the PredX values are plotted on the x-axis. For each metric, we have plotted both Pred25 (as circles) and Pred50 values (as triangles) from each of the thirty experimental runs. The plots are jittered [24], i.e., a small random variation has been introduced to ease observation of overlapping values on the x-axis.

We observe from the figure that SLOC is predicted with reasonable accuracy. Pred25 values hover around 42% while Pred50 values hover around 71%. By contrast, prediction results for CBO and CBOJDK are outstandingly good. The Pred25 values for CBO hover around 72%, and are even higher for CBOJDK at 86%. Their Pred50 values hover around 88% and 97% respectively. The model also predicts RFC and DIT values with reasonable

[Figure: Pred 25 and Pred 50 values (x-axis: Pred (x), 0 to 100) per metric (y-axis: WMC, SLOC, RFC, NPM, NOC, LCOM, DIT, CBOJDK, CBO, CA).]

Page 47: Predicting Software Metrics at Design Time

Threats to Validity

• Issues with generalisation.

• Import statements at design time may differ from those at release time.

• No filtering of outliers, since they make interesting cases to study.


Page 48: Predicting Software Metrics at Design Time

Data Free to Download!

ClassName

Input features (14,185 import statements): 1 1 0 0

Output features (SLOC and CKJM metrics): SLOC WMC ... NPM

This data will be made available in the PROMISE repository by end-July ’08.

Page 53: Predicting Software Metrics at Design Time

Conclusions

• Reliable estimation of resources at design time is possible.

• Prediction accuracy of up to 95% (Pred 25) for some metrics.

• Predictions can be further used as inputs for other resource estimation models.

• Imported components can determine many software metrics.