seeing the forest for the trees, umons 2011
DESCRIPTION
Slides used during the talk at UMons in November 2011.
TRANSCRIPT
Seeing the forest for the trees
Bogdan Vasilescu
[email protected]
http://www.win.tue.nl/~bvasiles/
Software Engineering and Technology group
Eindhoven University of Technology
November 23, 2011
2/21
/ department of mathematics and computer science
Eindhoven
3/21
Computer Science @TU/e
- Section Model Driven Software Engineering (MDSE)
- Group Software Engineering and Technology (SET)
Mark van den Brand Alexander Serebrenik
4/21
Interested in ...
- Software evolution
  - Aggregation of code metrics
  - Activity in open-source projects
- Computational geometry
5/21
Aggregation of software metrics
Maintaining a software system is like renovating a house.
Maintainability assessment precedes changing the software.
Metrics are often applied to measure maintainability.
But metrics are defined at a low level (method, class).
We need aggregation techniques.
6/21
Aggregation of software metrics
7/21
Traditional aggregation techniques
Standard summary statistics: mean, median, ...
Red line – mean; blue line – median
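A minimal sketch of this kind of aggregation, assuming a hypothetical list of SLOC values for the classes of one package (the numbers are made up for illustration and are not taken from the talk). It shows how a single very large class pulls the mean far away from the median:

```python
from statistics import mean, median

# Hypothetical SLOC values for the classes of one package (illustrative only).
sloc_per_class = [12, 15, 20, 22, 30, 35, 40, 60, 150, 2600]


def aggregate(values):
    """Aggregate class-level metric values into package-level summary statistics."""
    return {"mean": mean(values), "median": median(values)}


summary = aggregate(sloc_per_class)
print(summary)
```

With a skewed distribution like this one, the mean describes almost none of the classes, which is the motivation for looking beyond standard summary statistics.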
8/21
Recent trend: Inequality indices
Econometrics: measure/explain the inequality of income or wealth.
Software metrics and econometric variables have distributions with similar shapes.
[Figure: histogram, Source Lines of Code: freecol-0.9.4; x-axis: SLOC per class (0 to 3000), y-axis: Frequency]
[Figure: histogram, Household income in Ilocos, Philippines (1998); x-axis: Income (0 to 2500000), y-axis: Frequency]
9/21
Degree of concentration of functionality
Lorenz curve for SLOC in Hibernate 3.6.0-beta4.
[Figure: Lorenz curve; x-axis: % Classes (0.0 to 1.0), y-axis: % SLOC (0.0 to 1.0)]
Measure inequality between:
- individuals (e.g., classes)
- groups (e.g., components)
Often desirable to assess the contribution of the inequality between the groups.
- Decomposable indices
- Root-cause analysis
Two indices can be read off the Lorenz curve: I_Hoover and I_Gini. With A the area between the diagonal (line of perfect equality) and the Lorenz curve, and B the area under the Lorenz curve:
$I_{Gini} = \frac{A}{A+B} = 2A$ (since $A + B = \frac{1}{2}$)
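The Lorenz curve and the Gini index can be sketched in a few lines. This is an illustration under simple assumptions (non-negative values with a positive total, a piecewise-linear curve, the trapezoidal rule for the area); the function names are hypothetical and this is not the tooling used for the Hibernate figure:

```python
def lorenz_curve(values):
    """Return Lorenz curve points: (cumulative share of classes, cumulative share of SLOC)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    points = [(0.0, 0.0)]
    running = 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini_from_lorenz(values):
    """Gini = A / (A + B) = 2A, where B is the area under the Lorenz curve and A + B = 1/2."""
    pts = lorenz_curve(values)
    # Trapezoidal rule for B, the area under the piecewise-linear Lorenz curve.
    area_b = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    area_a = 0.5 - area_b
    return 2 * area_a


# Hypothetical SLOC values: all code concentrated in one of four classes.
print(gini_from_lorenz([0, 0, 0, 10]))
# Perfectly equal classes give an index of (approximately) zero.
print(gini_from_lorenz([5, 5, 5, 5]))
```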
10/21
Traceability via decomposability
Which individuals (classes in package) contribute to 80% of the inequality (of SLOC)?
Which class contributes the most to the inequality?
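As an illustration of what decomposability buys, the sketch below splits the Theil index of a system into a within-group part per package and a between-group part, using the standard additive decomposition. The package names and SLOC values are hypothetical, and the code assumes strictly positive values; it is not the procedure from the talk, only an example of the idea:

```python
from math import log


def theil(values):
    """Theil index; assumes all values are strictly positive."""
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * log(x / mu) for x in values) / n


def theil_decomposition(groups):
    """Split the overall Theil index into per-group within parts and one between part.

    `groups` maps a group name (e.g. a package) to its values (e.g. SLOC per class).
    """
    total = sum(sum(vals) for vals in groups.values())
    n = sum(len(vals) for vals in groups.values())
    mu = total / n
    within = {}
    between = 0.0
    for name, vals in groups.items():
        share = sum(vals) / total          # group's share of the total SLOC
        mu_g = sum(vals) / len(vals)       # group mean
        within[name] = share * theil(vals)
        between += share * log(mu_g / mu)
    return within, between


# Hypothetical packages with SLOC per class.
packages = {"core": [10, 20, 400], "util": [15, 25, 30], "ui": [50, 60, 900, 40]}
within, between = theil_decomposition(packages)
overall = theil([x for vals in packages.values() for x in vals])
print(within, between, overall)
```

The within parts and the between part add up to the overall index, so each package's contribution to the inequality can be ranked and traced.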
11/21
Other properties of inequality indices
Symmetry
Inequality stays the same for any permutation of the population.
12/21
Other properties of inequality indices
Population principle
Inequality does not change if the population is replicated any number of times.
13/21
Other properties of inequality indices
Transfers principle
A transfer from a rich man to a poor man (without reversing their position) should decrease inequality.
[Figure: transfer example; surviving values: 20, 36, 45 and 30, 36]
14/21
Other properties of inequality indices
Scale invariance: Gini, Theil, Atkinson, Hoover
Inequality does not change if all values are multiplied by the same constant.
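The four properties (symmetry, population principle, transfers principle, scale invariance) can be checked numerically for the Gini index. The sketch below uses the mean-absolute-difference form of the index; the starting values and the transfer of 5 units are assumptions chosen for illustration:

```python
from itertools import permutations
from math import isclose


def gini(values):
    """Gini index as the mean absolute difference divided by twice the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum(abs(a - b) for a in values for b in values) / (2 * n * n * mu)


values = [20, 36, 45]
base = gini(values)

# Symmetry: any permutation of the population gives the same index.
symmetric = all(isclose(gini(list(p)), base) for p in permutations(values))

# Population principle: replicating the population does not change the index.
replicated = isclose(gini(values * 3), base)

# Transfers principle: moving 5 units from the richest (45) to the poorest (20),
# without reversing their order, lowers the index.
after_transfer = gini([25, 36, 40])

# Scale invariance: multiplying all values by a constant does not change the index.
scaled = isclose(gini([10 * v for v in values]), base)

print(symmetric, replicated, after_transfer < base, scaled)
```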
15/21
Summary
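Ineq. index | Sym. | Inv. | Dec. | Pop. | Tra.
I_Gini | X | × | - | X | X
I_Theil | X | × | X | X | X
I_MLD | X | × | X | X | X
I_Hoover | X | × | - | X | -
I^α_Atkinson | X | × | X | X | X
I^β_Kolm | X | + | X | X | X
(Sym.: symmetry; Inv.: invariance, × under multiplication by a constant, + under addition of a constant; Dec.: decomposability; Pop.: population principle; Tra.: transfers principle. X: property holds; -: not marked on the slide.)
Problems include:
- Domain not always R^n.
- No distinction between all values equal but low, and all values equal but high.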
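Both problems can be reproduced with the mean log deviation (MLD), written here in its standard form, the average of ln(mean / x). The sketch is illustrative only: the value lists are hypothetical, and the `is_defined` helper is an assumption introduced for the example:

```python
from math import log


def mean_log_deviation(values):
    """MLD = (1/n) * sum(ln(mean / x)); only defined for strictly positive values."""
    n = len(values)
    mu = sum(values) / n
    return sum(log(mu / x) for x in values) / n


def is_defined(values):
    """Return True if the MLD can be computed for these values."""
    try:
        mean_log_deviation(values)
        return True
    except (ValueError, ZeroDivisionError):
        return False


# Problem 1: a metric such as DIT can be 0, which is outside the index's domain.
hypothetical_dit = [0, 1, 1, 2]
print(is_defined(hypothetical_dit))

# Problem 2: "all equal but low" and "all equal but high" are indistinguishable.
mld_low = mean_log_deviation([1, 1, 1, 1])
mld_high = mean_log_deviation([1000, 1000, 1000, 1000])
print(mld_low, mld_high)
```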
16/21
Our research
17/21
Which are redundant?
I_Gini, I_Theil, I_MLD, I_Atkinson, and I_Hoover always convey the same information.
[Figure: SLOC, one panel per pair of indices, y-axis from -1.0 to 1.0: MLD-Hoo (91%), Gin-MLD (89%), The-MLD (91%), Gin-Hoo (90%), Atk-Hoo (92%), The-Hoo (92%), Gin-Atk (90%), MLD-Atk (91%), Gin-The (91%), The-Atk (92%)]
[Figure: DIT, one panel per pair of indices, y-axis from -1.0 to 1.0: MLD-Hoo (85%), Atk-Hoo (87%), Gin-MLD (87%), The-Hoo (88%), Gin-Atk (88%), Gin-Hoo (89%), Gin-The (88%), The-MLD (88%), The-Atk (88%), MLD-Atk (89%)]
18/21
Is the correlation meaningful?
Superlinear (e.g., I_Theil–I_Gini) and chaotic (e.g., I_Theil–I_Kolm) patterns can be observed in the scatter plots.
[Figure: scatter plot, compiere: Theil-Gini. Kendall: 0.94, p-val: 0.00; x-axis: Gini (SLOC), y-axis: Theil (SLOC)]
[Figure: scatter plot, compiere: Theil-Kolm. Kendall: 0.25, p-val: 0.01; x-axis: Kolm (SLOC), y-axis: Theil (SLOC)]
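Kendall's rank correlation depends only on the ordering of the values, which is why a superlinear relation can still score close to 1. A minimal sketch of the tau-a variant follows, with hypothetical index values; the reported figures were presumably produced with a statistics package, and this hand-written version is only meant to show the idea:

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; ties count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-package values of one index, and a superlinear function of them.
index_a = [0.10, 0.25, 0.30, 0.45, 0.60]
index_b = [a ** 2 for a in index_a]
print(kendall_tau(index_a, index_b))

# A scrambled relation yields a much weaker rank correlation.
index_c = [0.30, 0.10, 0.60, 0.25, 0.45]
print(kendall_tau(index_a, index_c))
```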
19/21
Does the aggregation level matter?
Changing the aggregation level to class level does not affect the correlation between various aggregation techniques as measured at package level.
[Figure: six panels, y-axis: Kendall correlation coefficient (-1.0 to 1.0); two panels each for Kendall: Gini - Theil (SLOC) (100%), Kendall: Theil - Atkinson (SLOC) (100%), Kendall: Theil - MLD (SLOC) (100%)]
20/21
Does system size matter?
System size does influence the correlation between aggregation techniques, e.g., I_Theil–I_Kolm increases with system size.
[Figure: hibernate - Kendall(Theil(SLOC), Kolm(SLOC)) (86 releases); y-axis: Cor. coeff. Theil(SLOC) - Kolm(SLOC) (0.0 to 1.0); x-axis: releases from 0.8.1 to 3.6.0-beta4]
21/21
References
A. Serebrenik and M. G. J. van den Brand.
Theil index for aggregation of software metrics values.
In Int. Conf. on Software Maintenance, pages 1–9. IEEE, 2010.
B. Vasilescu.
Analysis of advanced aggregation techniques for software metrics.
Master’s thesis, Eindhoven, The Netherlands, July 2011.
B. Vasilescu, A. Serebrenik, and M. G. J. van den Brand.
By no means: A study on aggregating software metrics.
In 2nd International Workshop on Emerging Trends in Software Metrics, Honolulu, Hawaii, USA, 2011.
B. Vasilescu, A. Serebrenik, and M. G. J. van den Brand.
You can’t control the unfamiliar: A study on the relations between aggregation techniques for software metrics.
In Int. Conf. on Software Maintenance. IEEE, 2011.
22/21
Correlation
Linear correlation can be misleading.
[Figure: four scatter plots (x-axis 5 to 15, y-axis 4 to 12) with the same Pearson coefficient but different rank correlations:
Pea: 0.816; Ken: 0.963; Spe: 0.990
Pea: 0.816; Ken: 0.636; Spe: 0.818
Pea: 0.816; Ken: 0.563; Spe: 0.690
Pea: 0.816; Ken: 0.426; Spe: 0.5]
[Vas11, VSvdB11a, SvdB10, VSvdB11b]
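One way to see how linear and rank correlation can disagree is a perfectly monotone but strongly nonlinear relation. The sketch below uses synthetic data (y = exp(x), an assumption chosen for illustration, not the data behind the four plots) and hand-written Pearson and Spearman coefficients:

```python
from math import exp, isclose, sqrt


def pearson(xs, ys):
    """Pearson's linear correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """Rank positions (1 = smallest); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson's coefficient computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


xs = list(range(1, 11))
ys = [exp(x) for x in xs]  # monotone, but far from linear
print(pearson(xs, ys), spearman(xs, ys))
```

The rank-based coefficient recognises the perfect monotone relation, while the linear coefficient understates it.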