Information and Software Technology 39 (1997) 879-887 ELSEVIER
An empirical study of coupling and control flow metrics
Elaine Ferneley
Manchester Metropolitan University, Department of Computing, John Dalton Extension, Chester Street, Manchester, M1 5GD, UK
Received 23 January 1997; revised 2 July 1997; accepted 3 July 1997
Abstract
This paper describes the development of a methodology and associated measures for assessing control flow and both the implicit and explicit coupling of hierarchical and network models of system designs. The methodology concentrates on the assessment of the intra-module attribute of embeddedness of control flow structures and the inter-module attribute of implicit complexity of module coupling. The associated
measures were developed and validated in collaboration with two commercial organizations whose interests were in reducing development time and corrective maintenance activity; data from these projects are presented here. The application of the measures has been supported by the development of a meta-CASE tool. The measures have shown considerable success in identifying areas of system designs requiring redefinition. However, subjective evaluation by experienced system designers has also provided some promising results, suggesting that
qualitative analysis still has a role to play in the ‘quality’ debate. © 1997 Elsevier Science B.V.
Keywords: System design; Design measurement; Design quality
1. Introduction
System design is the earliest stage of software development at which the system architecture is clearly defined [1].
It has been suggested that quantifiable mechanisms for assessing ‘module coupling’ and ‘internal module strength’ at the high and low level design stages of software development are required [2-5]. Recent theoretical and empirical research has highlighted a relationship between the application of measurement at the design stage of software development and an enhancement in the perceived ‘quality’ of the delivered product [6-10].
At the design stage of system development the ‘essence’ of the system has been established and this paper will show that clear, quantifiable, coupling and control flow features can be identified. The design phase commonly produces a number of directed graphs as a model of the proposed system. In high level models, nodes represent entities in the system, such as processes, functions and types; the directed links represent relationships (or coupling) between these design entities. In lower level models, nodes represent processing activity within an entity, and the directed links represent the flow of control between processing activities. Measures have been developed that indirectly assess either intra- or inter-module ‘complexity’. The ‘complexity’ of software is an abstract property and usually a function
of several factors. For example, Fenton suggests that complexity can be viewed from a number of different perspectives including problem, algorithmic, structural and cognitive complexity contexts [11]. Therefore, it cannot be directly measured. What is required is a mechanism to link characteristics of the product to particular factors that are measurable. The classic examples of measures of intra- and inter-module ‘complexity’ are, respectively, McCabe’s ‘cyclomatic complexity measure’ and Henry and Kafura’s ‘information flow measure’ [12,13]. However, later work, specifically by Shepperd, has shown that the theoretical foundations and statistical validation of these measures are flawed [14,15]. Furthermore, such intra- and inter-module measures are in conflict: intra-module measures reward decomposition into a series of simple communicating modules whilst inter-module measures can be perceived to reward large module structures with minimal communication between them [16].
This paper presents a measurement-set based on control
flow and information flow theory that has been validated against quantitative commercial data. The goals of the empirical study were
1. to investigate the extent to which the proposed measurement-set is a good predictor of development time,
2. to investigate the extent to which the proposed measurement-set is a good predictor of error rate,
3. to investigate the extent to which the measurement-set outperforms human judgement.
The first data-set (known as project A) was obtained from an international commercial software house; the implementation was the graphical user interface (GUI) for their in-house CASE tool. The second data-set (known as project B) was obtained from an academic publishing company; the implementation was a sub-set of their order processing system. Projects A and B consisted of 93 and 121 modules respectively, ranging from 4 to 685 lines of code. The two data-sets were also qualitatively assessed for perceived ‘quality’ by relatively independent software designers. To support the application of the measurement-set a meta-CASE tool has been developed that, because the measures are generic, may be reconfigured to support a wide range of software design methods.
2. The measurement-set
The measurement-set considers both the intra- and inter-module features of control flow and coupling respectively.
2.1. Control flow structure philosophy
Control flow examines the range of processing paths within a module. The control flow structure, as illustrated in the system design, is the earliest indication of the algorithmic complexity of the future implementation. The complexity of the code’s structure has been shown to have a bearing on its understandability and thereby the ease with which future maintenance activity can be performed [17,18]. In order to assess the control flow structure it is proposed that the following factors are considered.
1. The depth of nesting of the processing activity (DEC(i)); a specific sub-set of processing activity is collectively referred to as an elementary component i. An elementary component is defined as the set of sequential processing activity embedded within a specific control flow structure. It has been shown that as the degree of embeddedness increases so does the probability of error, and thereby the algorithmic complexity [19].
2. As a refinement to the DEC(i) factor, the particular logical constructs that the specified elementary component i is embedded within. This refinement of the DEC(i) factor will be known as the SC(i) factor.
3. The individual SC(i) weightings are combined to provide a composite assessment of the control flow complexity (CF(i)) of the module i [20]. The CF(i) factor therefore aims to assess the complexity of the control flow structure of the specified module by considering the depth of nesting and the relationships of the constructs sequence, selection and iteration within the module.
The control flow structure measurement philosophy can be summarized as follows:
DEC(i): depth of elementary component i (1)
SC(i): control flow history of elementary component i (2)
CF(i) = \sum_{i=1}^{p} SC(i): control flow structure weighting for a specific module i (3)
where p is the number of individual processing paths.
2.2. Control flow structure measurement
A weighting mechanism for control flow constructs within hierarchical structures has been defined. The hierarchical notation used when conducting this research was that of Jackson [21]. However the underlying philosophy remains unchanged regardless of the notation used. This weighting mechanism is derived from the cyclomatic complexity equation that may be applied to strongly connected control graphs to derive the number of linearly independent circuits through the graph. A strongly connected control graph is one where, for any given pair of nodes (x, y), there is a path from x to y and a path from y to x. The linearly independent circuits through a graph, when combined, generate all possible circuits through the control flow graph. A linearly independent circuit is defined as one which cannot be derived by a combination of other circuits through the control flow graph. The cyclomatic complexity equation v for a control flow graph G is
v(G) = e - n + 1 (4)
where e is the number of edges, and n is the number of nodes in the control flow graph.
Fig. 1. Selection weightings.
Fig. 2. Iteration weightings.
To illustrate how selectable components are weighted
consider Fig. 1. Note that the extra edge has been added to the control flow graphs to make them strongly connected. The cyclomatic complexity figures for control flow graphs X and Y are 3 and 4 respectively. Therefore, when a selection is encountered the corresponding weighting assigned to the start (or parent) node equates to the number of selectable alternatives.
To illustrate how iterative components are weighted consider Fig. 2; again an extra edge has been added to make the graph strongly connected. The cyclomatic complexity figure for the iterative control flow graph is 2, this being the number of linearly independent paths. At the pure design stage, the iteration may or may not be entered.
Having weighted individual control flow constructs within an intra-module design a mechanism is required for assessing the interrelationships between control flow constructs. To aid in this discussion consider Fig. 3 and the associated control flow graph presented in Fig. 4.
Fig. 3 has been developed using Jackson’s notation [21]; note that the type of the component is represented in its subordinate components. In order to evaluate overall intra-module designs each individual processing path in the model is considered (SC(i)). This is distinct from linearly independent paths as these fail to consider sequential processing activity. For example, the cyclomatic complexity figure for the derived control flow graph in Fig. 4 is 8 which is the same as the number of simple predicates illustrated in Fig. 3 plus 1 [12]. However, the representation in Fig. 3 shows 9 individual processing paths; the anomaly is due to the explicit modelling of the sequential processing activity N. In the control flow graph such sequential processing activity could either be incorporated within the IF node, or explicitly modelled by a node and associated edge; in either case the sequential processing activity would have no effect on the cyclomatic complexity figure.
Fig. 3. Overall control flow weightings. Legend: X = component identifier name; n = component design complexity; o = selection; * = multiple occurrences; circled figure = weighting illustrating control flow history of elementary component (SC).
Therefore, the process by which an overall weighting for a given intra-module design is achieved is by firstly deriving a weighting for each individual processing path through the model. Consideration is given to the inherited structure of the elementary component at the end of each individual processing path. The rating is based upon the interrelationship between the elementary components’ predecessors in terms of sequence, selection and iteration. The value therefore takes into account ‘embeddedness’ and is the weighting for the SC(i) factor discussed earlier. The basic constructs of elementary sequential, selection and iteration are each assigned a distinct weighting.
2.2.1. Elementary sequential components
Given that elementary components are the simplest component type they are given an initial design complexity rating of 1. These components have an implicit complexity as they represent sequential processing activity; this complexity is considered. If a series of elementary sequential items are modelled, it may be argued that they have only been modelled distinctly for clarity of design and could have been modelled as one elementary sequential component. However, if the designer feels that distinct modelling is necessary then such elementary sequential components are graded individually. It should be recognized that the design process is subjective. There may be many realistic designs for a specified problem, so the measurement process presented here aims to support the designer in choosing the ‘best’ of the potential designs [22]. Such distinct weighting also eliminates the problem of sequential segments of code all being assigned the same weighting regardless of length. Sequential items such as ‘M’ within the body of a hierarchy diagram are not rated as they are a Jackson-specific notation facilitating diagrammatic sub-division of the model into logical processing tasks (this also explains why ‘N’ is not explicitly modelled in Fig. 4).
Fig. 4. Control flow graph representation of Fig. 3.
2.2.2. Selection
The philosophy of linearly independent paths is the foundation for the complexity grading as assigned to selectable components. The grading for selections is determined by the number of branches emanating from the selectable component. By using such a weighting mechanism for selections, the span of the selection in which an elementary component is embedded is reflected in the final SC(i) figure. An illustration is provided in Fig. 3 with reference to components ‘D’ and ‘H’. ‘D’, being embedded within an iteration and a two-way selection, receives an SC(D) figure of 5. ‘H’, being embedded within an iteration and a three-way selection, receives an SC(H) figure of 6.
2.2.3. Iteration
Iterations are automatically assigned a grading of 2; this weighting is determined by considering the number of linearly independent paths emanating from the iteration node. The iteration, when consideration is only given to design, may or may not be invoked; hence the weighting of 2.
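The selection and iteration gradings can be checked against Eq. (4) with a short sketch. The two edge lists below are hypothetical graphs built for illustration (they are not transcriptions of Figs. 1 and 2), and the function name is an assumption; each graph includes the extra edge that makes it strongly connected.

```python
def cyclomatic_complexity(edges):
    """Return v(G) = e - n + 1 for a strongly connected control flow graph.

    `edges` is a list of (from_node, to_node) pairs; nodes are inferred
    from the edge list.
    """
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 1


# Hypothetical three-way selection: start S, alternatives A, B, C, end E.
three_way_selection = [
    ("S", "A"), ("S", "B"), ("S", "C"),
    ("A", "E"), ("B", "E"), ("C", "E"),
    ("E", "S"),  # extra edge added to make the graph strongly connected
]

# Hypothetical iteration: loop head H, loop body B, exit X.
iteration = [
    ("H", "B"), ("B", "H"),  # enter the body and return to the head
    ("H", "X"),              # the iteration may not be entered at all
    ("X", "H"),              # extra edge added to make the graph strongly connected
]

selection_weight = cyclomatic_complexity(three_way_selection)
iteration_weight = cyclomatic_complexity(iteration)
print(selection_weight, iteration_weight)
```

Under these assumptions the three-way selection evaluates to 3 (the number of selectable alternatives) and the iteration to 2, in line with the gradings described above.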
2.2.4. Individual processing paths
A grade for each individual processing path in the hierarchical model is derived by examining the path from the root component to the specified elementary component. Such a traversal through the hierarchical model considers the embeddedness or nesting of the individual elements of the structure (the elementary components). For example, in the case of ‘L’ in Fig. 3, the nesting weighting figure (SC(L)) is 8. This is derived from adding the SC(i) weightings of ‘F’ (2), ‘G’ (3), ‘J’ (2) and ‘L’ (1).
Table 1
Coupling equations
Acronym      Equation                                                                 Eq.
FI-CF(i)     \sum_{1}^{n} CF(i), complexity totals terminating at the module i        (5)
FO-CF(i)     \sum_{1}^{n} CF(i), complexity totals emanating from the module i        (6)
FI-MCF(i)    FI-CF(i) * \sum fan-ins                                                  (7)
FO-MCF(i)    FO-CF(i) * \sum fan-outs                                                 (8)
FI-CF(i): fan-in complexity total of the module i
FO-CF(i): fan-out complexity total of the module i
FI-MCF(i): fan-in multiplicity complexity total of the module i
FO-MCF(i): fan-out multiplicity complexity total of the module i
CF(i): control flow complexity of each unique information flow terminating at (or emanating from) the module i
n: number of unique information flows terminating at (or emanating from) the module i
2.2.5. Control flow complexity
The composite weighting for assessing the complexity of a control flow or data structure is derived by combining the individual SC(i) figures for the specified control flow or data structure (CF(i)).
These rules provide an unambiguous mechanism for assigning numbers to the various attributes of a control flow structure allowing the relation > to be assigned to pairs of control flow structures. Hence, control flow structures can be assessed on an ordinal scale.
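The path weighting of Sections 2.2.1 to 2.2.4 and the composite figure of Eq. (3) can be sketched as follows. This is a minimal illustration, not the meta-CASE tool's implementation: the tuple encoding of a path and the function names are assumptions, only the three paths whose weights are stated in the text are encoded, and treating ‘J’ (weight 2) as an iteration is a guess, since a two-way selection would give the same weight.

```python
def construct_weight(kind, branches=0):
    """Weight of one construct on the path from the root to an elementary component."""
    if kind == "elementary":
        return 1                      # elementary sequential component
    if kind == "iteration":
        return 2                      # may or may not be entered
    if kind == "selection":
        if branches < 2:
            raise ValueError("a selection needs at least two branches")
        return branches               # one per selectable alternative
    raise ValueError(f"unknown construct kind: {kind!r}")


def sc(path):
    """SC(i): sum of construct weights along one individual processing path."""
    return sum(construct_weight(*step) for step in path)


def cf(paths):
    """CF(i): sum of SC(i) over the individual processing paths of a module."""
    return sum(sc(path) for path in paths)


# Paths described in the text for Fig. 3 (root first, elementary component last).
path_to_d = [("iteration",), ("selection", 2), ("elementary",)]
path_to_h = [("iteration",), ("selection", 3), ("elementary",)]
# 'F' (2), 'G' (3), 'J' (2), 'L' (1); 'J' is ASSUMED to be an iteration.
path_to_l = [("iteration",), ("selection", 3), ("iteration",), ("elementary",)]

sc_d, sc_h, sc_l = sc(path_to_d), sc(path_to_h), sc(path_to_l)

# Hypothetical module made up of only these three paths (Fig. 3 itself has nine).
cf_example = cf([path_to_d, path_to_h, path_to_l])
print(sc_d, sc_h, sc_l, cf_example)
```

With these encodings the sketch reproduces SC(D) = 5, SC(H) = 6 and SC(L) = 8 as stated in the text; the CF figure of 19 belongs to the hypothetical three-path module only. Because CF values are plain integers, two designs can be ordered with the ordinary > relation, which is all an ordinal scale requires.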
2.3. Coupling philosophy
Coupling can be sub-divided into two broad categories:
‘import’ coupling (fan-in) and ‘export’ coupling (fan-out).
Import coupling is the extent to which a module depends on importing external data declarations from global data structures or other modules. It is argued that the more dependent a module is on external declarations the more difficult it is to understand in isolation because, in order to understand the local description of the module some global understanding is required. In addition, the wider the spread of external declarations the more difficult the understanding process [23]. Export coupling is the extent to which a module’s internal data declarations affect the data declarations of other modules in the system and are responsible for updating global data structures. Export coupling examines how a particular module is used and how global data structures are updated in the system. Therefore, it is argued that any change to the module under consideration has a direct effect on all modules and global data structures it is export coupled to. Research on Fortran modules by Card et al. revealed that modules with high export coupling tend to cost more and have a higher fault rate [24]. Also, as export coupling increases so does the likelihood of a potential ripple effect when a change is made to the specified module [25].
The ‘classical’ coupling measure is that defined by Henry and Kafura [13]. However, as the work of Kitchenham has shown, there is ambiguity in their definitions which may result in repeated counting of information flows [26]. Also, in order for Henry and Kafura’s measure to be applicable at the design stage of software development, the factor that considers length of code should not be included [27]. The problems of ambiguity arise if the ancestry of an information flow is considered. However, as the work of Card et al. and Lohse and Zweben has shown, there was no significant difference between global and parameter coupling with reference to modifiability [24,28]. Card et al. do however show a weak negative correlation between the use of global coupling and cost [24]. Henry and Kafura’s measure also penalized reuse; if only ‘unique’ fan-ins or fan-outs are considered this oversight is eliminated [16].
The implicit structure of the fan-ins and fan-outs should also be considered [29]. For example, the passing of a single parameter is inherently less substantial than the passing of a variant record; the information flow should therefore be weighted accordingly. It is proposed that such a weighting can be achieved by applying the CF(i) equation to the structure of the data being passed.
In addition to Henry and Kafura, a number of other authors have developed composite measures for assessing module coupling and the empirical evidence is beginning to suggest that fan-out is more important than fan-in [8,30]. This research aims to contribute to this discussion; therefore a composite coupling measure is not proposed, rather the distinct fan-in and fan-out coupling features are considered independently.
2.4. Coupling measurement
A mechanism is provided for assessing the import and export coupling features of a specific module. By the application of the control flow measure (CF(i)) to the structure of the information being passed, a weighting mechanism for module connectivity can be defined. The process by which the individual coupling measures were derived is as follows.
1. Each unique fan-in and fan-out of the specified module is given a weighting based on its implicit data structure by application of the control flow measure (CF(i)). To calculate an overall weighting for the fan-ins the individual unique fan-in CF(i) figures are accumulated; an overall weighting for the unique fan-outs is calculated in the same way (see Table 1). The decision to add the hierarchy figures at this stage was based on maintaining the simplest equation; relative ranking will be maintained.
2. The calculation for a composite figure to reflect a single module’s import or export coupling is weighted to reflect the difference between a single input (or output) source and several input (or output) sources. In order to account for the number of inputs, the FI-CF(i) figure is multiplied by the total number of unique fan-ins (known as the multiplicity value) giving a fan-in multiplicity complexity total (FI-MCF(i)). The fan-out multiplicity complexity total (FO-MCF(i)) is derived in the same manner. Multiplication was applied to reflect the total number of unique possible paths into or out of a given module.
These rules provide an unambiguous mechanism for assigning numbers to the various coupling attributes allowing the relation > to be assigned to pairs of import or export coupling features. Hence, import and export coupling features can be assessed on an ordinal scale.
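The two derivation steps above, together with the equations of Table 1, can be sketched as follows. The `Flow` record, the module names and the CF weightings are all hypothetical; in particular, treating a flow as ‘unique’ when its name, source, target and weighting all match is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One information flow between two modules (hypothetical record layout)."""
    name: str     # identifier of the data being passed
    source: str   # module the flow emanates from
    target: str   # module the flow terminates at
    cf: int       # CF weighting of the structure of the data being passed


def coupling(module, flows):
    """Return the four coupling figures of Table 1 for `module`."""
    unique = set(flows)  # ASSUMPTION: identical records are one unique flow
    fan_in = [f for f in unique if f.target == module]
    fan_out = [f for f in unique if f.source == module]
    fi_cf = sum(f.cf for f in fan_in)   # Eq. (5)
    fo_cf = sum(f.cf for f in fan_out)  # Eq. (6)
    return {
        "FI-CF": fi_cf,
        "FO-CF": fo_cf,
        "FI-MCF": fi_cf * len(fan_in),    # Eq. (7)
        "FO-MCF": fo_cf * len(fan_out),   # Eq. (8)
    }


# Hypothetical design fragment around module "M".
flows = [
    Flow("order_id", "A", "M", 1),
    Flow("order_id", "A", "M", 1),      # repeated flow, counted once
    Flow("customer_rec", "B", "M", 4),
    Flow("invoice_rec", "M", "C", 3),
    Flow("status", "M", "D", 2),
    Flow("audit_rec", "M", "E", 5),
]

result = coupling("M", flows)
print(result)
```

For this fragment the two unique fan-ins give FI-CF = 5 and FI-MCF = 10, and the three unique fan-outs give FO-CF = 10 and FO-MCF = 30. Keeping fan-in and fan-out as separate figures mirrors the decision not to propose a composite coupling measure.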
3. Data collection
A reconfigurable CASE tool (or meta-CASE tool) has been developed to support the application of the developed measures to a range of graphical representations of system designs. The CASE tool also incorporates a C parser allowing the measures to be reverse engineered from C source code. The meta-CASE tool has been developed on a SUN MicroSystem BLC workstation with colour display. The graphical user interface (GUI) has been implemented in C within the X Window environment (OpenWindow version 3 with SUNOS UNIX version 4.13). The Xlib (MIT X Window System version 11, Release 4), Xt Intrinsics, Motif functions (OSF/Motif version 1.14) and standard Motif widgets were also extensively used by the source programs of the GUI. The Prolog language (Prolog by BIM version 3.0) was used to create the validation rules and the diagram complexity evaluation mechanisms. The logical diagram structures for diagram validation and evaluation were kept in the repository using Prolog predicates. Prolog was also used to implement the diagram repositories and knowledge base. The C parser was implemented using the Practical Extraction and Report Language (Perl Version 4.0).
The research was undertaken in collaboration with two commercial organizations (Projects A and B). The dependent variables examined were, for Project A, development time per design as a relative percentage and, for Project B, error rate. Project B had kept extensive information on the maintenance history of their order processing system, together with all versions of the software. The original and amended versions of the order processing system were analysed to derive cumulative figures of each module’s error rate over time. The error rate was defined as ‘the total number of errors in a module/LOC’. Qualitative data were also collected. The developed designs were categorized against a subjective ranking of design ‘quality’; five weightings were used. These ranged from 1 to 5, 1 being the lowest and 5 the highest quality.
4. Application of the measures and interpretation of the results
Spearman’s rank correlation coefficient r_s, the coefficient of determination r² and multi-variate outlier analysis were used to analyse the derived data. Spearman’s rank correlation coefficient was chosen because the application of the various intra- and inter-module measures resulted in a non-normal distribution. The coefficient of determination was used as this provides an indication of how much variation in the dependent variable can be ‘explained’ by variation in the independent variable. It is also referred to as the ‘degree of association’ coefficient. Outlier analysis was chosen as it was felt that more detailed analysis of anomalous modules may provide interesting insights into the software development process. Scatter plot graphs were used to identify outliers which were removed from the data-sets before Spearman’s rank correlation coefficient was applied.
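The two statistics can be sketched in a few lines. This is an illustrative calculation only: the function names are assumptions, the data are invented rather than taken from Projects A or B, ties are given averaged ranks, and the coefficient of determination is taken to be the square of r_s, a reading that agrees with the reported pairs (for example 0.9103² ≈ 0.83).

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's r_s: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented figures for five modules (NOT data from the study).
cf_values = [3, 8, 5, 12, 7]
dev_time = [10, 30, 25, 40, 20]

r_s = spearman(cf_values, dev_time)
r_squared = r_s ** 2
print(round(r_s, 4), round(r_squared, 4))
```

Working on ranks rather than raw values is what makes the coefficient suitable for non-normally distributed measures, since only the ordering of the modules matters.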
4.1. Development time (DT)
Detailed records of the amount of time taken to develop the module designs had been kept for Project A. The module designs were developed using a CASE tool, therefore detailed records of development time were available. However, development time that did not involve the use of the CASE tool such as formal and informal discussions and the manual drafting of work could not be captured. The company discouraged manual designs, preferring all work to be undertaken using the CASE tool, hence providing the developers with more immediate feedback regarding the global implications of their developing modules. However, to be pedantic, this research is strictly concerned with the correlation between the application of the developed measures and development time involving the use of the CASE tool.
4.1.1. Control flow results
The CF/DT result shows an excellent correlation, r_s being 0.9103. The corresponding r² value is 83%. Therefore, with respect to this data-set in this application domain there is a reasonable degree of association between CF and development time. This result is not surprising as it may be expected that as a control flow structure employs more control flow constructs so the associated development time increases.
4.1.2. Coupling results
The correlation coefficient figures for development time and the various fan-in measures show little correlation. Indeed, two out of the three fan-in results have less than a 20% degree of association with development time (FI-CF/DT and FI-MCF/DT). Therefore, it is concluded that there is no significant relationship between fan-in and development time. This may be attributed to the philosophy behind inheritance and information hiding in that limited detailed understanding is required in order to use a previously defined entity. The r_s figures for the fan-out measures are more promising, showing a significant correlation in the order FO/DT < FO-CF/DT < FO-MCF/DT over the data-set. The FO-MCF value, which considers both the number of information flows emanating from the module and the structure of those information flows, shows the most significant correlation with an r_s value of 0.9223. The corresponding r² value is 85%. It is therefore concluded that by considering the number of output destinations and the implicit fan-out structure an accurate assessment of development time can be derived.
4.1.3. Outlier analysis results
Assessment of the outliers with respect to CF/DT highlighted a number of interesting results. Firstly, two modules with unusually high CF results and correspondingly high development times consisted of a series of large CASE statements. Whilst it is recognized that the use of the CASE statement should not be penalized, detailed examination of both these modules showed that they exhibited some multiple functionality and should have been divided into a series of smaller modules. Secondly, several modules had high fan-in results relative to low development time. Of specific interest were references to library modules which did not appear to affect development. Five modules had high fan-in figures as a result of referencing library modules yet did not have correspondingly high development time. A similar situation existed with three modules that had high fan-in results relative to a low development time as a result of references to non-library modules. This would concur with the conclusions of Card and Glass that fan-in is of little relevance as it is generally confined to the reuse of mathematical functions [30]. Thirdly, it was shown that modules that have both high fan-out and high development time values are potential indicators of a module exhibiting multiple functionality.
4.1.4. Inter-measure results
The results of application of the measures were tested to see whether they correlated with each other; no significant correlations were found. The strongest correlation was between CF/DT and FO-CF/DT with an r_s value of 0.725; this was followed by CF/DT and FO-MCF/DT at 0.587. The results of the various correlations between CF/DT and the fan-in measure ranged from 0.321 to 0.338, whilst those correlations between the fan-in and fan-out measurement results ranged from 0.357 to 0.369.
4.2. Error rate (ER)
Project B was a four year old order processing system in a publishing company; detailed records of the system’s maintenance history had been kept. The error rate calculation divided the total number of errors by the number of lines of code in the current version of the module. Obviously, as amendments had been made to the modules their length had changed, either increasing or decreasing, therefore the current length was chosen as a normalizing factor.
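The normalization described above amounts to a single division per module. The sketch below uses invented module names and counts (they are not Project B data), and the function name is an assumption.

```python
def error_rate(total_errors, current_loc):
    """Cumulative errors in a module divided by its current length in lines of code."""
    if current_loc <= 0:
        raise ValueError("current_loc must be positive")
    return total_errors / current_loc


# Invented maintenance records: module -> (cumulative errors, current LOC).
modules = {
    "order_entry": (12, 400),
    "invoice_print": (3, 150),
    "stock_lookup": (1, 20),
}

rates = {name: error_rate(errors, loc) for name, (errors, loc) in modules.items()}

# Rank modules from highest to lowest error rate, as needed for a rank correlation.
ranking = sorted(rates, key=rates.get, reverse=True)
print(rates, ranking)
```

Dividing by the current length means a short module with a single error can outrank a long module with many, which is the intended effect of the normalizing factor.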
4.2.1. Control flow results
The results show a significant relationship between the CF measure and the dependent variable error rate. Specifically, the r_s and r² figures for Project B’s data-set are 0.9249 and 0.8554 (86%) respectively. This result is perhaps not surprising as it is reasonable to assume that as an algorithm becomes increasingly elaborate so the propensity for error increases.
4.2.2. Coupling results
The r_s figures for the fan-in measures show little correlation between error rate and fan-in. Specifically, Project B’s FI/ER, FI-CF/ER and FI-MCF/ER results show r_s values of 0.5790, 0.5338 and 0.5602 respectively. Therefore, it is concluded that the fan-in measures are poor indicators of error rate. However, the r_s figures for the fan-out measures show more significant strengths of association in the order FO/ER < FO-MCF/ER < FO-CF/ER (0.7309 < 0.8608 < 0.8894). It is interesting to note that the weighted fan-out measure (FO-MCF/ER) shows over a 20% improvement on the raw fan-out measure FO/ER. Therefore, it is concluded that the weighted fan-out measures do have a significant correlation with error rate. As with development time, the fan-out measures appear to have appreciably better relationships with the dependent variable than the fan-in measures. This suggests that either the fan-in measures require redefinition or fan-in has no relationship with the dependent variables.
4.2.3. Outlier analysis results
Assessment of the outliers with respect to CF/ER highlighted a number of interesting results. Firstly, three modules had anomalously high CF values relative to low
error rates. In all three cases the modules were complex sorting routines implemented using mature algorithms; the code had not been internally developed. Secondly, the modules with the two highest FO-CF and FO-MCF figures also had the highest error rates and were regarded by their developers as being ‘maintenance time bombs’.
4.2.4. Inter-measure results
The results of application of the measures were tested to see whether they correlated with each other; no significant correlations were found. The strongest correlation was between CF/DT and FO-MCF/DT with an r_s value of 0.229, whilst the remaining correlations ranged from 0.034 to 0.134.
4.3. Design quality (DQ)
Over the years much work has been undertaken to develop software measures that can be objectively applied to determine various ‘quality’ factors [1]. It is assumed that objective measures will be more accurate than any subjective evaluation undertaken by software developers. This research decided to test this hypothesis. For project A the designs were subjectively evaluated for perceived ‘quality’ by the software development manager responsible for the CASE tool repository implementation. This manager had no direct responsibility for the GUI development but was obviously extremely familiar with the aims and objectives of the project as it was ultimately to interface with his own. For project B no such independent evaluator was available. Therefore, the team leader responsible for the system undertook to assess qualitatively the data-set. Obviously, the team leader’s past knowledge of the problems encountered during the development of the project will have had an influence on the evaluation process. Future research aims to find a suitable evaluator with no knowledge of either the order processing system or the measurement philosophy.
4.3.1. Control flow results
Excellent correlations were found between the subjective evaluation of intra-module design quality and the dependent variables, the r_s figures being 0.9359 and 0.9461 for projects A and B respectively. In both cases these results are marginally better than those gained from application of the CF measure. However, it should be reiterated that in both instances the evaluators had some prior knowledge of either the project or the measurement process.
4.3.2. Coupling results
When assessing inter-module design quality the evaluators were only asked to look at fan-in and fan-out. It was felt that detailed assessment of the number of input sources or output destinations and the implicit structure of the inter-module relationships was too fine a level of granularity for the evaluators to be able to discern. However, even at this relatively high level, no significant correlations were found between inter-module design quality and the dependent variables. The highest r_s figure was 0.4284 for project B’s fan-out/design quality versus error rate.
4.3.3. Inter-measure results
The results of application of the measures were tested to see whether they correlated with each other; as with development time and error rate, no significant correlations were found.
A precursor to this work is that of Card and Agresti, in which eight projects were subjectively ranked from best to worst in terms of design quality [1]. The subjective ranking was correlated with the application of their design complexity measure using the Wilcoxon rank sum statistic. This showed that there was a probability of less than 0.02 that the observed good/poor grouping correlations with their design measure could occur by chance. Their design measure is a composite measure which considers both inter- and intra-module complexity. This work may be considered as a refinement of Card and Agresti’s work as it considers inter- and intra-module complexity separately. Furthermore, a much larger data-set is considered here. The results presented here would suggest that, with reference to these data-sets, the designer is better able to perceive intra-module than inter-module features.
Table 2
Spearman’s rank correlation coefficient results
Response variable   CF      FI      FO      FI-CF   FO-CF   FI-MCF  FO-MCF
Development time    0.9103  0.5621  0.8372  0.3875  0.8990  0.4408  0.9223
Error rate          0.9249  0.5790  0.7309  0.5338  0.8894  0.5602  0.8608
Design quality (A)  0.9359  0.2819  0.3174  NA      NA      NA      NA
Design quality (B)  0.9461  0.3002  0.4284  NA      NA      NA      NA
Design quality (A) refers to Project A. Design quality (B) refers to Project B. NA: not assessed.
5. Conclusion
This paper discusses a methodology and associated measures for assessing both the implicit and explicit coupling and control flow of network and hierarchical models of system designs. Table 2 provides a detailed breakdown of the correlations between the response variables (development time, error rate and design quality) and each of the independent variables. The paper builds on the work of Card et al., which examined the association between coupling, fault rate and cost, and the work of Card and Agresti, which examined the association between inter- and intra-module complexity and perceived complexity as determined by senior management [1,24].
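The breakdown in Table 2 can be explored with a short script. The sketch below only transcribes the development time and error rate rows of Table 2 and compares them; it introduces no new data, and the helper names (`ranked`, `fan_out_minus_fan_in`) are hypothetical conveniences rather than part of the methodology.

```python
# Coefficients transcribed from Table 2 (development time and error rate rows).
MEASURES = ["CF", "FI", "FO", "FI-CF", "FO-CF", "FI-MCF", "FO-MCF"]
TABLE_2 = {
    "Development time": [0.9103, 0.5621, 0.8372, 0.3875, 0.8990, 0.4408, 0.9223],
    "Error rate": [0.9249, 0.5790, 0.7309, 0.5338, 0.8894, 0.5602, 0.8608],
}


def ranked(response):
    """Return (measure, r_s) pairs for one response variable, strongest first."""
    pairs = zip(MEASURES, TABLE_2[response])
    return sorted(pairs, key=lambda p: p[1], reverse=True)


def fan_out_minus_fan_in(response):
    """Difference r_s(fan-out variant) - r_s(fan-in variant) for each pairing."""
    row = dict(zip(MEASURES, TABLE_2[response]))
    pairings = [("FO", "FI"), ("FO-CF", "FI-CF"), ("FO-MCF", "FI-MCF")]
    return {f"{fo} vs {fi}": round(row[fo] - row[fi], 4) for fo, fi in pairings}


for response in TABLE_2:
    # Print the strongest measure and the fan-out/fan-in gaps for this row.
    print(response, ranked(response)[0], fan_out_minus_fan_in(response))
```

Every fan-out variant has a higher coefficient than its fan-in counterpart in both rows, and the weighted variants (FO-CF, FO-MCF) exceed plain FO, which is the pattern the tabulated figures show directly.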
The major contributions of this paper are as follows: (i) the development of intra- and inter-module measures which have their foundation in linearly independent path theory; (ii) the empirical validation of the developed measures; (iii) the realization that fan-out is more significant than fan-in with reference to development time and error rate, and that the weighting of the fan-out provides a more accurate assessment of future development time and error rate; (iv) the recognition that designer judgement still has a role to play in the assessment of system designs for perceived ‘quality’. The methodology is general enough for use with many structured design techniques that consider algorithmic and inter-module structure.
With reference to the declared goals of this empirical study, it can be concluded that the developed measurement-set is a good predictor of development time and error rate (goals (i) and (ii)); however, the measurement-set cannot be said to outperform human judgement, specifically with reference to ‘quality’, as opposed to ‘complexity’.
This paper also provides the foundation for further research. The limited relationships between various fan-in measures and the dependent variables suggest that a refinement of the measures may be suitable for application to the object-oriented paradigm. The measures should also be integrated into structured design methodologies and considered with other data quality issues.
References
[1] D.N. Card, W.W. Agresti, Measuring software design complexity, Journal of Systems and Software 8 (1988) 185-197.
[2] K.H. Moller, D.J. Paulish, Software Metrics: a Practitioner’s Guide to Improved Product Development, Chapman and Hall, 1993.
[3] P. Goodman, Practical Implementation of Software Metrics, International Software Quality Assurance Series, McGraw-Hill, 1993.
[4] B.A. Kitchenham, J.G. Walker, A quantitative approach to monitoring software development, Software Engineering Journal January (1989) 2-13.
[5] T. DeMarco, Controlling Software Projects: Management, Measurement and Evaluation, Yourdon Press, NJ, 1982.
[6] R.B. Grady, Hewlett-Packard: successfully applying software metrics, IEEE Computer September (1994) 8-25.
[7] L. O’Connell, Testing times. In M. Pelru (ed.). Computing, 15th September, 1994.
[8] A. Al-Janabi, B. Aspinwall, An evaluation of software design using the DEMETER tool, Software Engineering Journal November (1993) 319-324.
[9] W.M. Zage, D.M. Zage, Evaluating design metrics on large-scale software, IEEE Software May (1993).
[10] R.B. Grady, Practical Software Metrics for Project Management and Process Improvement, Prentice Hall, 1992.
[11] N.B. Fenton, S.L. Pfleeger, Software Metrics: a Rigorous and Practical Approach, International Thomson Computer Press, 1997, 2nd edn.
[12] T.J. McCabe, A complexity measure, IEEE Transactions on Software Engineering 2 (4) (1976).
[13] S. Henry, D. Kafura, Software structure metrics based on information flow, IEEE Transactions on Software Engineering 7 (5) (1981) 510-518.
[14] M. Shepperd, D. Ince, Metrics, outlier analysis and the software design process, Information and Software Technology 31 (2, March) (1989).
[15] M. Shepperd, A critique of cyclomatic complexity as a software metric, Software Engineering Journal March (1988).
[16] M. Shepperd, Design metrics: an empirical analysis, Software Engineering Journal 5 (1) (1990) 3-10.
[17] R. Bathe, R. Tinker, A rigorous approach to metrication: a field trial using KINDRA, Proc. IEE/BCS Software Engineering Conf., 1988, pp. 28-32.
[18] D. Kafura, R.R. Reddy, The use of software complexity metrics in software maintenance, IEEE Transactions on Software Engineering 13 (3, March) (1987) 335-343.
[19] D.A. Troy, S.H. Zweben, Measuring the quality of structured design, Journal of Systems and Software 2 (1981) 113-120.
[20] E.H. Femeley, D.A. Howcroft, C.G. Davies, Complexity measures for system development models. In M. Lee, B.-Z. Barta, P. Juliff (eds.). Software Quality and Productivity: Theory, Practice, Education and Training, Chapman and Hall, 1995.
[21] M.A. Jackson, Principles of Program Design, Academic Press, 1975.
[22] O.I. Lindland, G. Sindre, A. Solvberg, Understanding quality in conceptual modelling, IEEE Software March (1994).
[23] L.C. Briand, S. Morasca, V.R. Basili, Measuring and assessing maintainability at the end of high level design, IEEE Software Maintenance Conf., Quebec Montreal Canada, September 1993, pp. 88-97.
[24] D.N. Card, V.E. Church, W.W. Agresti, An empirical study of software design practices, IEEE Transactions on Software Engineering 12 (2, February) (1986) 264-270.
[25] K. Kronlot, Method Integration, Wiley, 1993.
[26] B.A. Kitchenham, An evaluation of software structure metrics, Proc. 12th Int. Computer Software and Applications Conf. COMPSAC. IEEE, October 1988.
[27] B.A. Kitchenham, L.M. Pickard, S.J. Linkman, An evaluation of some design metrics, Software Engineering Journal 5 (1) (1990) 50-58.
[28] J.B. Lohse, S.H. Zweben, Experimental evaluation of software design practices: An investigation into the effect of module coupling on system modifiability, Journal of Systems and Software 4 (1984) 301-308.
[29] J. Karimi, B.R. Konsynski, An automated software design assistant, IEEE Transactions on Software Engineering 14 (2, February) (1988) 194-210.
[30] D.N. Card, R. Glass, Measuring Software Design Quality, Addison-Wesley, 1990.