An empirical study of coupling and control flow metrics



Information and Software Technology 39 (1997) 879-887 ELSEVIER

An empirical study of coupling and control flow metrics

Elaine Ferneley

Manchester Metropolitan University, Department of Computing, John Dalton Extension, Chester Street, Manchester, M1 5GD, UK

Received 23 January 1997; revised 2 July 1997; accepted 3 July 1997

Abstract

This paper describes the development of a methodology and associated measures for assessing control flow and both the implicit and explicit coupling of hierarchical and network models of system designs. The methodology concentrates on the assessment of the intra-module attribute of embeddedness of control flow structures and the inter-module attribute of implicit complexity of module coupling. The associated measures were developed and validated in collaboration with two commercial organizations whose interests were in reducing development time and corrective maintenance activity; data from these projects are presented here. The application of the measures has been supported by the development of a meta-CASE tool. The measures have shown considerable success in identifying areas of system designs requiring redefinition. However, subjective evaluation by experienced system designers has also provided some promising results, suggesting that qualitative analysis still has a role to play in the ‘quality’ debate. © 1997 Elsevier Science B.V.

Keywords: System design; Design measurement; Design quality

1. Introduction

System design is the earliest stage of software development at which the system architecture is clearly defined [1]. It has been suggested that quantifiable mechanisms for assessing ‘module coupling’ and ‘internal module strength’ at the high and low level design stages of software development are required [2-5]. Recent theoretical and empirical research has highlighted a relationship between the application of measurement at the design stage of software development and an enhancement in the perceived ‘quality’ of the delivered product [6-10].

At the design stage of system development the ‘essence’ of the system has been established and this paper will show that clear, quantifiable, coupling and control flow features can be identified. The design phase commonly produces a number of directed graphs as a model of the proposed system. In high level models, nodes represent entities in the system, such as processes, functions and types; the directed links represent relationships (or coupling) between these design entities. In lower level models, nodes represent processing activity within an entity, and the directed links represent the flow of control between processing activities.

Measures have been developed that indirectly assess either intra- or inter-module ‘complexity’. The ‘complexity’ of software is an abstract property and usually a function of several factors. For example, Fenton suggests that complexity can be viewed from a number of different perspectives including problem, algorithmic, structural and cognitive complexity contexts [11]. Therefore, it cannot be directly measured. What is required is a mechanism to link characteristics of the product to particular factors that are measurable. The classic examples of measures of intra- and inter-module ‘complexity’ are, respectively, McCabe’s ‘cyclomatic complexity measure’ and Henry and Kafura’s ‘information flow measure’ [12,13]. However, later work, specifically by Shepperd, has shown that the theoretical foundations and statistical validation of these measures are flawed [14,15]. Furthermore, such intra- and inter-module measures are in conflict: intra-module measures reward decomposition into a series of simple communicating modules whilst inter-module measures can be perceived to reward large module structures with minimal communication between them [16].

0950-5849/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved
PII S0950-5849(97)00034-7

This paper presents a measurement-set based on control flow and information flow theory that has been validated against quantitative commercial data. The goals of the empirical study were

1. to investigate the extent to which the proposed measurement-set is a good predictor of development time,

2. to investigate the extent to which the proposed measurement-set is a good predictor of error rate,

3. to investigate the extent to which the measurement-set outperforms human judgement.

The first data-set (known as project A) was obtained from an international commercial software house; the implementation was the graphical user interface (GUI) for their in-house CASE tool. The second data-set (known as project B) was obtained from an academic publishing company; the implementation was a sub-set of their order processing system. Projects A and B consisted of 93 and 121 modules respectively, ranging from 4 to 685 lines of code. The two data-sets were also qualitatively assessed for perceived ‘quality’ by relatively independent software designers. To support the application of the measurement-set a meta-CASE tool has been developed that, because the measures are generic, may be reconfigured to support a wide range of software design methods.

2. The measurement-set

The measurement-set considers both the intra- and inter-module features of control flow and coupling respectively.

2.1. Control flow structure philosophy

Control flow examines the range of processing paths within a module. The control flow structure, as illustrated in the system design, is the earliest indication of the algorithmic complexity of the future implementation. The complexity of the code’s structure has been shown to have a bearing on its understandability and thereby the ease with which future maintenance activity can be performed [17,18]. In order to assess the control flow structure it is proposed that the following factors are considered.

1. The depth of nesting of the processing activity (DEC(i)); a specific sub-set of processing activity is collectively referred to as an elementary component i. An elementary component is defined as the set of sequential processing activity embedded within a specific control flow structure. It has been shown that as the degree of embeddedness increases so does the probability of error, and thereby the algorithmic complexity [19].

2. As a refinement to the DEC(i) factor, the particular logical constructs that the specified elementary component i is embedded within. This refinement of the DEC(i) factor will be known as the SC(i) factor.

3. The individual SC(i) weightings are combined to provide a composite assessment of the control flow complexity (CF(i)) of the module i [20]. The CF(i) factor therefore aims to assess the complexity of the control flow structure of the specified module by considering the depth of nesting and the relationships of the constructs sequence, selection and iteration within the module.

The control flow structure measurement philosophy can be summarized as follows:

DEC(i): depth of elementary component i    (1)

SC(i): control flow history of elementary component i    (2)

CF(i) = \sum_{i=1}^{p} SC(i): control flow structure weighting for a specific module i    (3)

where p is the number of individual processing paths.

2.2. Control flow structure measurement

A weighting mechanism for control flow constructs within hierarchical structures has been defined. The hierarchical notation used when conducting this research was that of Jackson [21]. However, the underlying philosophy remains unchanged regardless of the notation used. This weighting mechanism is derived from the cyclomatic complexity equation that may be applied to strongly connected control graphs to derive the number of linearly independent circuits through the graph. A strongly connected control graph is one where, for any given pair of nodes (x, y), there is a path from x to y and a path from y to x. The linearly independent circuits through a graph, when combined, generate all possible circuits through the control flow graph. A linearly independent circuit is defined as one which cannot be derived by a combination of other circuits through the control flow graph. The cyclomatic complexity equation v for a control flow graph G is

v(G) = e - n + 1    (4)

where e is the number of edges, and n is the number of nodes in the control flow graph.

Fig. 1. Selection weightings.

To illustrate how selectable components are weighted consider Fig. 1. Note that the extra edge has been added to the control flow graphs to make them strongly connected. The cyclomatic complexity figures for control flow graphs X and Y are 3 and 4 respectively. Therefore, when a selection is encountered the corresponding weighting assigned to the start (or parent) node equates to the number of selectable alternatives.

Fig. 2. Iteration weightings.

To illustrate how iterative components are weighted consider Fig. 2; again an extra edge has been added to make the graph strongly connected. The cyclomatic complexity figure for the iterative control flow graph is 2, this being the number of linearly independent paths. At the pure design stage, the iteration may or may not be entered.

Having weighted individual control flow constructs within an intra-module design a mechanism is required for assessing the interrelationships between control flow constructs. To aid in this discussion consider Fig. 3 and the associated control flow graph presented in Fig. 4.

Fig. 3 has been developed using Jackson’s notation [21]; note that the type of the component is represented in its subordinate components. In order to evaluate overall intra-module designs each individual processing path in the model is considered (SC(i)). This is distinct from linearly independent paths as these fail to consider sequential processing activity. For example, the cyclomatic complexity figure for the derived control flow graph in Fig. 4 is 8 which is the same as the number of simple predicates illustrated in Fig. 3 plus 1 [12]. However, the representation in Fig. 3 shows 9 individual processing paths; the anomaly is due to the explicit modelling of the sequential processing activity N. In the control flow graph such sequential processing activity could either be incorporated within the IF node, or explicitly modelled by a node and associated edge; in either case the sequential processing activity would have no effect on the cyclomatic complexity figure.

Fig. 3. Overall control flow weightings. Key: X = component identifier name; n = component design complexity; o = selection; * = multiple occurrences; the remaining symbol marks the weighting illustrating the control flow history of an elementary component (SC).

Therefore, the process by which an overall weighting for a given intra-module design is achieved is by firstly deriving a weighting for each individual processing path through the model. Consideration is given to the inherited structure of the elementary component at the end of each individual processing path. The rating is based upon the interrelationship between the elementary components’ predecessors in terms of sequence, selection and iteration. The value therefore takes into account ‘embeddedness’ and is the weighting for the SC(i) factor discussed earlier. The basic constructs of elementary sequential, selection and iteration are each assigned a distinct weighting.
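The selection and iteration weightings derived from the cyclomatic complexity equation (4) can be checked with a short sketch. The graph shapes built below are assumptions made for illustration (they are not copied from Fig. 1 or Fig. 2), and the function names are hypothetical; only the equation v(G) = e - n + 1 and the extra edge that makes each graph strongly connected come from the text.

```python
def cyclomatic_complexity(edges, nodes):
    """v(G) = e - n + 1 for a strongly connected control flow graph."""
    return len(edges) - len(nodes) + 1


def selection_graph(k):
    """Assumed shape of a k-way selection: start -> branch -> end."""
    nodes = ["start", "end"] + [f"branch{j}" for j in range(k)]
    edges = []
    for j in range(k):
        edges.append(("start", f"branch{j}"))
        edges.append((f"branch{j}", "end"))
    # Extra edge added so that the graph is strongly connected.
    edges.append(("end", "start"))
    return edges, nodes


def iteration_graph():
    """Assumed shape of an iteration whose body may or may not be entered."""
    nodes = ["test", "body", "exit"]
    edges = [
        ("test", "body"),   # iteration entered
        ("body", "test"),   # loop back to the test
        ("test", "exit"),   # iteration skipped or finished
        ("exit", "test"),   # extra edge for strong connectivity
    ]
    return edges, nodes


three_way = cyclomatic_complexity(*selection_graph(3))
four_way = cyclomatic_complexity(*selection_graph(4))
loop = cyclomatic_complexity(*iteration_graph())
print(three_way, four_way, loop)
```

Under these assumed shapes a k-way selection has 2k + 1 edges and k + 2 nodes, so v(G) = k, which agrees with the rule that a selection is weighted by its number of selectable alternatives; the iteration graph has 4 edges and 3 nodes, giving the weighting of 2.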

2.2.1. Elementary sequential components

Given that elementary components are the simplest component type they are given an initial design complexity rating of 1. These components have an implicit complexity as they represent sequential processing activity; this complexity is considered. If a series of elementary sequential items are modelled, it may be argued that they have only been modelled distinctly for clarity of design and could have been modelled as one elementary sequential component. However, if the designer feels that distinct modelling is necessary then such elementary sequential components are graded individually. It should be recognized that the design process is subjective. There may be many realistic designs for a specified problem, so the measurement process presented here aims to support the designer in choosing the ‘best’ of the potential designs [22]. Such distinct weighting also eliminates the problem of sequential segments of code all being assigned the same weighting regardless of length. Sequential items such as ‘M’ within the body of a hierarchy diagram are not rated as they are a Jackson specific notation facilitating diagrammatic sub-division of the model into logical processing tasks (this also explains why ‘N’ is not explicitly modelled in Fig. 4).

Fig. 4. Control flow graph representation of Fig. 3.

2.2.2. Selection

The philosophy of linearly independent paths is the foundation for the complexity grading as assigned to selectable components. The grading for selections is determined by the number of branches emanating from the selectable component. By using such a weighting mechanism for selections, the span of the selection in which an elementary component is embedded is reflected in the final SC(i) figure. An illustration is provided in Fig. 3 with reference to components ‘D’ and ‘H’. ‘D’, being embedded within an iteration and a two-way selection, receives an SC(D) figure of 5. ‘H’, being embedded within an iteration and a three-way selection, receives an SC(H) figure of 6.

2.2.3. Iteration

Iterations are automatically assigned a grading of 2; this weighting is determined by considering the number of linearly independent paths emanating from the iteration node. The iteration, when consideration is only given to design, may or may not be invoked; hence the weighting of 2.

Table 1
Coupling equations

Acronym      Equation
FI-CF(i)     \sum_{1}^{n} CF(i), complexity totals terminating at the module i    Eq. (5)
FO-CF(i)     \sum_{1}^{n} CF(i), complexity totals emanating from the module i    Eq. (6)
FI-MCF(i)    FI-CF(i) * \sum fan-ins    Eq. (7)
FO-MCF(i)    FO-CF(i) * \sum fan-outs    Eq. (8)

FI-CF(i): fan-in complexity total of the module i
FO-CF(i): fan-out complexity total of the module i
FI-MCF(i): fan-in multiplicity complexity total of the module i
FO-MCF(i): fan-out multiplicity complexity total of the module i
CF(i): control flow complexity of each unique information flow terminating at (or emanating from) the module i
n: number of unique information flows terminating at (or emanating from) the module i

2.2.4. Individual processing paths

A grade for each individual processing path in the hierarchical model is derived by examining the path from the root component to the specified elementary component. Such a traversal through the hierarchical model considers the embeddedness or nesting of the individual elements of the structure (the elementary components). For example, in the case of ‘L’ in Fig. 3, the nesting weighting figure (SC(L)) is 8. This is derived from adding the SC(i) weightings of ‘F’ (2), ‘G’ (3), ‘J’ (2) and ‘L’ (1).
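The path grading described in Sections 2.2.1 to 2.2.4 can be sketched as a sum of construct weightings along the path from the root to an elementary component. This is a minimal illustration, not the meta-CASE tool's implementation: the function names and the tuple representation of a path are hypothetical, and the construct types assumed for ‘F’, ‘G’ and ‘J’ are guesses, since the text gives only their weightings (2, 3 and 2).

```python
def construct_weight(kind, branches=0):
    """Weighting of one construct on the path to an elementary component."""
    if kind == "elementary":
        return 1            # initial design complexity rating (Section 2.2.1)
    if kind == "iteration":
        return 2            # may or may not be invoked (Section 2.2.3)
    if kind == "selection":
        if branches < 2:
            raise ValueError("a selection needs at least two branches")
        return branches     # number of selectable alternatives (Section 2.2.2)
    if kind == "sequence":
        return 0            # sequence items such as 'M' are not rated
    raise ValueError(f"unknown construct: {kind}")


def sc(path):
    """SC(i): sum of the weightings from the root down to component i."""
    return sum(construct_weight(kind, branches) for kind, branches in path)


# 'D': embedded within an iteration and a two-way selection.
sc_d = sc([("iteration", 0), ("selection", 2), ("elementary", 0)])

# 'H': embedded within an iteration and a three-way selection.
sc_h = sc([("iteration", 0), ("selection", 3), ("elementary", 0)])

# 'L': ancestors weighted 2, 3 and 2; the construct types are assumed here.
sc_l = sc([("iteration", 0), ("selection", 3), ("selection", 2),
           ("elementary", 0)])

print(sc_d, sc_h, sc_l)
```

The three results reproduce the figures quoted in the text: SC(D) = 5, SC(H) = 6 and SC(L) = 8.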

2.2.5. Control flow complexity

The composite weighting for assessing the complexity of a control flow or data structure is derived by combining the individual SC(i) figures for the specified control flow or data structure (CF(i)).

These rules provide an unambiguous mechanism for assigning numbers to the various attributes of a control flow structure allowing the relation > to be assigned to pairs of control flow structures. Hence, control flow structures can be assessed on an ordinal scale.
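The composite figure can be illustrated on a small hierarchy. The sketch below assumes that the SC(i) figures are combined by addition, as the later remark about adding the hierarchy figures suggests. The `Node` class, the example tree and the choice to record the construct type on the parent node (Jackson's notation marks it on the subordinate components) are all simplifications introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str  # "sequence", "selection", "iteration" or "elementary"
    children: list = field(default_factory=list)


def own_weight(node):
    """Weighting contributed by a single node of the hierarchy."""
    if node.kind == "elementary":
        return 1
    if node.kind == "iteration":
        return 2
    if node.kind == "selection":
        return len(node.children)  # number of selectable alternatives
    return 0  # sequence nodes are not rated


def sc_values(node, inherited=0):
    """Map each elementary component to its SC figure."""
    total = inherited + own_weight(node)
    if node.kind == "elementary":
        return {node.name: total}
    result = {}
    for child in node.children:
        result.update(sc_values(child, total))
    return result


def cf(root):
    """CF: sum of SC over every individual processing path in the module."""
    return sum(sc_values(root).values())


# Hypothetical module: a sequence of one elementary step and an iteration
# containing a two-way selection.
module = Node("P", "sequence", [
    Node("A", "elementary"),
    Node("B", "iteration", [
        Node("C", "selection", [
            Node("D", "elementary"),
            Node("E", "elementary"),
        ]),
    ]),
])

print(sc_values(module), cf(module))
```

Here ‘A’ has SC = 1, while ‘D’ and ‘E’ each have SC = 2 + 2 + 1 = 5, so the module has three processing paths and CF = 11. Because the result is used only on an ordinal scale, it supports comparisons such as CF(X) > CF(Y) rather than statements about how many times more complex one structure is than another.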

2.3. Coupling philosophy

Coupling can be sub-divided into two broad categories: ‘import’ coupling (fan-in) and ‘export’ coupling (fan-out). Import coupling is the extent to which a module depends on importing external data declarations from global data structures or other modules. It is argued that the more dependent a module is on external declarations the more difficult it is to understand in isolation because, in order to understand the local description of the module, some global understanding is required. In addition, the wider the spread of external declarations the more difficult the understanding process [23]. Export coupling is the extent to which a module’s internal data declarations affect the data declarations of other modules in the system and are responsible for updating global data structures. Export coupling examines how a particular module is used and how global data structures are updated in the system. Therefore, it is argued that any change to the module under consideration has a direct effect on all modules and global data structures it is export coupled to. Research on Fortran modules by Card et al. revealed that modules with high export coupling tend to cost more and have a higher fault rate [24]. Also, as export coupling increases so does the likelihood of a potential ripple effect when a change is made to the specified module [25].

The ‘classical’ coupling measure is that as defined by Henry and Kafura [13]. However, as the work of Kitchenham has shown, there is ambiguity in their definitions which may result in repeated counting of information flows [26]. Also, in order for Henry and Kafura’s measure to be applicable at the design stage of software development, the factor that considers length of code should not be included [27]. The problems of ambiguity arise if the ancestry of an information flow is considered. However, as the work of Card et al. and Lohse and Zweben has shown, there was no significant difference between global and parameter coupling with reference to modifiability [24,28]. Card et al. do however show a weak negative correlation between the use of global coupling and cost [24]. Henry and Kafura’s measure also penalized reuse; if only ‘unique’ fan-in or fan-outs are considered this oversight is eliminated [16].

The implicit structure of the fan-in and fan-outs should also be considered [29]. For example, the passing of a single parameter is inherently less substantial than the passing of a variant record; the information flow should therefore be weighted accordingly. It is proposed that such a weighting can be achieved by applying the CF(i) equation to the structure of the data being passed.

In addition to Henry and Kafura, a number of other authors have developed composite measures for assessing module coupling and the empirical evidence is beginning to suggest that fan-out is more important than fan-in [8,30]. This research aims to contribute to this discussion; therefore a composite coupling measure is not proposed, rather the distinct fan-in and fan-out coupling features are considered independently.

2.4. Coupling measurement

A mechanism is provided for assessing the import and export coupling features of a specific module. By the application of the control flow measure (CF(i)) to the structure of the information being passed, a weighting mechanism for module connectivity can be defined. The process by which the individual coupling measures were derived is as follows.

1. Each unique fan-in and fan-out of the specified module is given a weighting based on its implicit data structure by application of the control flow measure (CF(i)). To calculate an overall weighting for the fan-ins the individual unique fan-in CF(i) figures are accumulated; an overall weighting for the unique fan-outs is calculated in the same way (see Table 1). The decision to add the hierarchy figures at this stage was based on maintaining the simplest equation; relative ranking will be maintained.

2. The calculation for a composite figure to reflect a single module’s import or export coupling is weighted to reflect the difference between a single input (or output) source and several input (or output) sources. In order to account for the number of inputs, the FI-CF(i) figure is multiplied by the total number of unique fan-ins (known as the multiplicity value) giving a fan-in multiplicity complexity total (FI-MCF(i)). The fan-out multiplicity complexity total (FO-MCF(i)) is derived in the same manner. Multiplication was applied to reflect the total number of unique possible paths into or out of a given module.

These rules provide an unambiguous mechanism for assigning numbers to the various coupling attributes allowing the relation > to be assigned to pairs of import or export coupling features. Hence, import and export coupling features can be assessed on an ordinal scale.
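The four coupling figures of Table 1 can be sketched as follows. The function name, the list representation of the information flows and the example weightings are hypothetical; each list entry stands for the CF weighting of one unique information flow, obtained by applying the control flow measure to the structure of the data passed.

```python
def coupling_measures(fan_in_cf, fan_out_cf):
    """Compute the Table 1 coupling figures for one module.

    fan_in_cf:  CF weightings of the unique flows terminating at the module.
    fan_out_cf: CF weightings of the unique flows emanating from the module.
    """
    fi_cf = sum(fan_in_cf)    # Eq. (5)
    fo_cf = sum(fan_out_cf)   # Eq. (6)
    return {
        "FI-CF": fi_cf,
        "FO-CF": fo_cf,
        # Multiply by the multiplicity value (number of unique flows).
        "FI-MCF": fi_cf * len(fan_in_cf),    # Eq. (7)
        "FO-MCF": fo_cf * len(fan_out_cf),   # Eq. (8)
    }


# Hypothetical module: two unique fan-ins (a single parameter weighted 1 and
# a structured record weighted 3) and three unique fan-outs.
measures = coupling_measures(fan_in_cf=[1, 3], fan_out_cf=[1, 1, 5])
print(measures)
```

For this invented module FI-CF = 4, FO-CF = 7, FI-MCF = 4 * 2 = 8 and FO-MCF = 7 * 3 = 21. As with CF(i), the figures are intended for ranking modules against each other rather than for interpretation as absolute quantities.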

3. Data collection

A reconfigurable CASE tool (or meta-CASE tool) has been developed to support the application of the developed measures to a range of graphical representations of system designs. The CASE tool also incorporates a C parser allowing the measures to be reverse engineered from C source code. The meta-CASE tool has been developed on a SUN MicroSystem BLC workstation with colour display. The graphical user interface (GUI) has been implemented in C within the X Window environment (OpenWindow version 3 with SUNOS UNIX version 4.13). The Xlib (MIT X Window System version 11, Release 4), Xt Intrinsics, Motif functions (OSF/Motif version 1.14) and standard Motif widgets were also extensively used by the source programs of the GUI. The Prolog language (Prolog by BIM version 3.0) was used to create the validation rules and the diagram complexity evaluation mechanisms. The logical diagram structures for diagram validation and evaluation were kept in the repository using Prolog predicates. Prolog was also used to implement the diagram repositories and knowledge base. The C parser was implemented using the Practical Extraction and Report Language (Perl Version 4.0).

The research was undertaken in collaboration with two commercial organizations (Projects A and B). The dependent variables examined were, for Project A, development time per design as a relative percentage and, for Project B, error rate. Project B had kept extensive information on the maintenance history of their order processing system, together with all versions of the software. The original and amended versions of the order processing system were analysed to derive cumulative figures of each module’s error rate over time. The error rate was defined as ‘the total number of errors in a module/LOC’. Qualitative data were also collected. The developed designs were categorized against a subjective ranking of design ‘quality’; five weightings were used. These ranged from 1 to 5, 1 being the lowest and 5 the highest quality.

4. Application of the measures and interpretation of the results

Spearman’s rank correlation coefficient r_s, the coefficient of determination r² and multi-variate outlier analysis were used to analyse the derived data. Spearman’s rank correlation coefficient was chosen because the application of the various intra- and inter-module measures resulted in a non-normal distribution. The coefficient of determination was used as this provides an indication of how much variation in the dependent variable can be ‘explained’ by variation in the independent variable. It is also referred to as the ‘degree of association’ coefficient. Outlier analysis was chosen as it was felt that more detailed analysis of anomalous modules may provide interesting insights into the software development process. Scatter plot graphs were used to identify outliers which were removed from the data-sets before Spearman’s rank correlation coefficient was applied.
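The two statistics can be illustrated with a small standard-library sketch. It computes Spearman's coefficient as the Pearson correlation of average ranks (a common formulation that also handles ties) and takes the degree of association as the square of that coefficient, which is consistent with the paired figures reported in Section 4.1. The function names and the sample data are invented for illustration and are not the study's data.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's r_s: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented example: a measure value and a dependent variable for 5 modules.
measure = [1, 2, 3, 4, 5]
outcome = [1, 3, 2, 5, 4]
r_s = spearman(measure, outcome)
print(round(r_s, 4), round(r_s ** 2, 4))
```

For the invented data r_s = 0.8 and r² = 0.64. Applying the same squaring to the reported coefficients gives 0.9103² ≈ 0.83 and 0.9223² ≈ 0.85, matching the 83% and 85% quoted for CF/DT and FO-MCF/DT.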

4.1. Development time (DT)

Detailed records of the amount of time taken to develop the module designs had been kept for Project A, because the designs were developed using a CASE tool. However, development time that did not involve the use of the CASE tool, such as formal and informal discussions and the manual drafting of work, could not be captured. The company discouraged manual designs, preferring all work to be undertaken using the CASE tool, hence providing the developers with more immediate feedback regarding the global implications of their developing modules. Strictly, therefore, this research is concerned with the correlation between the application of the developed measures and development time involving the use of the CASE tool.


4.1.1. Control flow results

The CF/DT result shows an excellent correlation, r_s being 0.9103. The corresponding r² value is 83%. Therefore, with respect to this data-set in this application domain there is a reasonable degree of association between CF and development time. This result is not surprising as it may be expected that as a control flow structure employs more control flow constructs so the associated development time increases.

4.1.2. Coupling results

The correlation coefficient figures for development time and the various fan-in measures show little correlation. Indeed, two out of the three fan-in results have less than a 20% degree of association with development time (FI-CF/DT and FI-MCF/DT). Therefore, it is concluded that there is no significant relationship between fan-in and development time. This may be attributed to the philosophy behind inheritance and information hiding in that limited detailed understanding is required in order to use a previously defined entity. The r_s figures for the fan-out measures are more promising, showing a significant correlation in the order FO/DT < FO-CF/DT < FO-MCF/DT over the data-set. The FO-MCF value, which considers both the number of information flows emanating from the module and the structure of those information flows, shows the most significant correlation with an r_s value of 0.9223. The corresponding r² value is 85%. It is therefore concluded that by considering the number of output destinations and the implicit fan-out structure an accurate assessment of development time can be derived.

4.1.3. Outlier analysis results

Assessment of the outliers with respect to CF/DT highlighted a number of interesting results. Firstly, two modules with unusually high CF results and correspondingly high development times consisted of a series of large CASE statements. Whilst it is recognized that the use of the CASE statement should not be penalized, detailed examination of both these modules showed that they exhibited some multiple functionality and should have been divided into a series of smaller modules. Secondly, several modules had high fan-in results relative to low development time. Of specific interest were references to library modules which did not appear to affect development. Five modules had high fan-in figures as a result of referencing library modules yet did not have correspondingly high development time. A similar situation existed with three modules that had high fan-in results relative to a low development time as a result of references to non-library modules. This would concur with the conclusions of Card and Glass that fan-in is of little relevance as it is generally confined to the reuse of mathematical functions [30]. Thirdly, it was shown that modules that have both high fan-out and high development time values are potential indicators of a module exhibiting multiple functionality.

4.1.4. Inter-measure results

The results of application of the measures were tested to see whether they correlated with each other; no significant correlations were found. The strongest correlation was between CF/DT and FO-CF/DT, with an r_s value of 0.725; this was followed by CF/DT and FO-MCF/DT at 0.587. The correlations between CF/DT and the fan-in measures ranged from 0.321 to 0.338, whilst those between the fan-in and fan-out measurement results ranged from 0.357 to 0.369.

4.2. Error rate (ER)

Project B was a four-year-old order processing system in a publishing company; detailed records of the system’s maintenance history had been kept. The error rate calculation divided the total number of errors by the number of lines of code in the current version of the module. As amendments had been made to the modules, their length had changed, either increasing or decreasing; the current length was therefore chosen as the normalizing factor.
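The normalization described above can be sketched as follows. The record structure, module names and figures are hypothetical; only the calculation itself (total recorded errors divided by the current length in lines of code) follows the text.

```python
from dataclasses import dataclass


@dataclass
class ModuleHistory:
    name: str
    total_errors: int  # all errors recorded over the maintenance history
    current_loc: int   # lines of code in the current version of the module


def error_rate(module):
    """Errors per line of code, normalized by the module's current length."""
    if module.current_loc <= 0:
        raise ValueError(f"{module.name}: current length must be positive")
    return module.total_errors / module.current_loc


# Hypothetical maintenance records (assumption, for illustration only).
modules = [
    ModuleHistory("order_entry", total_errors=18, current_loc=600),
    ModuleHistory("invoice_print", total_errors=3, current_loc=450),
]

rates = {m.name: error_rate(m) for m in modules}
for name, rate in rates.items():
    print(f"{name}: {rate:.4f} errors per line")
```

Because the divisor is the current length, a module that has grown or shrunk through amendment is normalized by its present size rather than by its size when each error was recorded.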

4.2.1. Control flow results

The results show a significant relationship between the CF measure and the dependent variable error rate. Specifically, the r_s and r_s² figures for Project B’s data-set are 0.9249 and 0.8554 (86%) respectively. This result is perhaps not surprising, as it is reasonable to assume that as an algorithm becomes increasingly elaborate so the propensity for error increases.

4.2.2. Coupling results

The r_s figures for the fan-in measures show little correlation between error rate and fan-in. Specifically, Project B’s FI/ER, FI-CF/ER and FI-MCF/ER results show r_s values of 0.5790, 0.5338 and 0.5602 respectively. Therefore, it is concluded that the fan-in measures are poor indicators of error rate. However, the r_s figures for the fan-out measures show more significant strengths of association in the order FO/ER < FO-MCF/ER < FO-CF/ER (0.7309 < 0.8608 < 0.8894). It is interesting to note that the weighted fan-out measure (FO-MCF/ER) shows over a 20% improvement on the raw fan-out measure FO/ER. Therefore, it is concluded that the weighted fan-out measures do have a significant correlation with error rate. As with development time, the fan-out measures appear to have appreciably better relationships with the dependent variable than the fan-in measures. This suggests that either the fan-in measures require redefinition or fan-in has no relationship with the dependent variables.

4.2.3. Outlier analysis results

Assessment of the outliers with respect to CF/ER highlighted a number of interesting results. Firstly, three modules had anomalously high CF values relative to low error rates. In all three cases the modules were complex sorting routines implemented using mature algorithms; the code had not been internally developed. Secondly, the modules with the two highest FO-CF and FO-MCF figures also had the highest error rates and were regarded by their developers as being ‘maintenance time bombs’.

4.2.4. Inter-measure results

The results of application of the measures were tested to see whether they correlated with each other; no significant correlations were found. The strongest correlation was between CF/DT and FO-MCF/DT with an r_s value of 0.229, whilst the remaining correlations ranged from 0.034 to 0.134.

4.3. Design quality (DQ)

Over the years much work has been undertaken to develop software measures that can be objectively applied to determine various ‘quality’ factors [1]. It is assumed that objective measures will be more accurate than any subjective evaluation undertaken by software developers. This research set out to test this hypothesis. For project A the designs were subjectively evaluated for perceived ‘quality’ by the software development manager responsible for the CASE tool repository implementation. This manager had no direct responsibility for the GUI development but was extremely familiar with the aims and objectives of the project, as it was ultimately to interface with his own. For project B no such independent evaluator was available. Therefore, the team leader responsible for the system undertook to assess the data-set qualitatively. Inevitably, the team leader’s past knowledge of the problems encountered during the development of the project will have had an influence on the evaluation process. Future research aims to find a suitable evaluator with no knowledge of either the order processing system or the measurement philosophy.

4.3.1. Control flow results

Excellent correlations were found between the subjective evaluation of intra-module design quality and the dependent variables, the r_s figures being 0.9359 and 0.9461 for projects A and B respectively. In both cases these results are marginally better than those gained from application of the CF measure. However, it should be reiterated that in both instances the evaluators had some prior knowledge of either the project or the measurement process.

4.3.2. Coupling results

When assessing inter-module design quality the evaluators were only asked to look at fan-in and fan-out. It was felt that detailed assessment of the number of input sources or output destinations and the implicit structure of the inter-module relationships was too fine a level of granularity for the evaluators to be able to discern. However, even at this relatively high level, no significant correlations were found between inter-module design quality and the dependent variables. The highest r_s figure was 0.4284, for project B’s fan-out/design quality versus error rate.

4.3.3. Inter-measure results

The results of application of the measures were tested to see whether they correlated with each other; as with development time and error rate, no significant correlations were found.

A precursor to this work is that of Card and Agresti, where eight projects were subjectively ranked in order of best to worst in terms of design quality [1]. The subjective ranking was correlated with the application of their design complexity measure using the Wilcoxon rank sum statistic. This showed that there was a probability of less than 0.02 that the observed good/poor grouping correlations with their design measure could occur by chance. Their design measure is a composite measure which considers both inter- and intra-module complexity. This work may be considered as a refinement of Card and Agresti’s work as it considers inter- and intra-module complexity separately. Furthermore, a much larger data-set is considered here. The results presented here would suggest that, with reference to these data-sets, the designer is more able to perceive intra-module rather than inter-module features.

Table 2
Spearman’s rank correlation coefficient results

                     CF      FI      FO      FI-CF   FO-CF   FI-MCF  FO-MCF
Development time     0.9103  0.5621  0.8372  0.3875  0.8990  0.4408  0.9223
Error rate           0.9249  0.5790  0.7309  0.5338  0.8894  0.5602  0.8608
Design quality (A)   0.9359  0.2819  0.3174  NA      NA      NA      NA
Design quality (B)   0.9461  0.3002  0.4284  NA      NA      NA      NA

Design quality (A) refers to Project A. Design quality (B) refers to Project B. NA: not assessed.

5. Conclusion

This paper discusses a methodology and associated measures for assessing both the implicit and explicit coupling and control flow of network and hierarchical models of system designs. Table 2 provides a detailed breakdown of the correlations between the response variables (development time, error rate and design quality) and each of the independent variables. The paper builds on the work of Card et al., which examined the association between coupling, fault rate and cost, and the work of Card and Agresti, which examined the association between inter- and intra-module complexity and perceived complexity as determined by senior management [1,24].
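The comparisons drawn from Table 2 can be checked mechanically. The sketch below transcribes the reported coefficients into a dictionary (the variable and function names are illustrative assumptions), orders the measures by strength of association for each response variable, and tests whether every fan-out measure outranks every fan-in measure.

```python
# Spearman coefficients as reported in Table 2; NA entries are omitted.
TABLE_2 = {
    "Development time": {
        "CF": 0.9103, "FI": 0.5621, "FO": 0.8372, "FI-CF": 0.3875,
        "FO-CF": 0.8990, "FI-MCF": 0.4408, "FO-MCF": 0.9223,
    },
    "Error rate": {
        "CF": 0.9249, "FI": 0.5790, "FO": 0.7309, "FI-CF": 0.5338,
        "FO-CF": 0.8894, "FI-MCF": 0.5602, "FO-MCF": 0.8608,
    },
    "Design quality (A)": {"CF": 0.9359, "FI": 0.2819, "FO": 0.3174},
    "Design quality (B)": {"CF": 0.9461, "FI": 0.3002, "FO": 0.4284},
}

FAN_IN = ("FI", "FI-CF", "FI-MCF")
FAN_OUT = ("FO", "FO-CF", "FO-MCF")


def ranked(row):
    """Measure names ordered from strongest to weakest correlation."""
    return sorted(row, key=row.get, reverse=True)


def fan_out_dominates(row):
    """True if the weakest fan-out measure still beats the strongest fan-in."""
    return min(row[m] for m in FAN_OUT) > max(row[m] for m in FAN_IN)


for response in ("Development time", "Error rate"):
    row = TABLE_2[response]
    print(response, ranked(row), fan_out_dominates(row))
```

Only the development time and error rate rows are passed to `fan_out_dominates`, since the weighted measures were not assessed for design quality.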

The major contributions of this paper are as follows: (i) the development of intra- and inter-module measures which have their foundation in linearly independent path theory; (ii) the empirical validation of the developed measures; (iii) the realization that fan-out is more significant than fan-in with reference to development time and error rate, and that the weighting of the fan-out provides a more accurate assessment of future development time and error rate; (iv) the recognition that designer judgement still has a role to play in the assessment of system designs for perceived ‘quality’. The methodology is general enough for use with many structured design techniques that consider algorithmic and inter-module structure.

With reference to the declared goals of this empirical study, it can be concluded that the developed measurement-set is a good predictor of development time and error rate (goals (i) and (ii)); however, the measurement-set cannot be said to outperform human judgement, specifically with reference to ‘quality’, as opposed to ‘complexity’.

This paper also provides the foundation for further research. The limited relationships between the various fan-in measures and the dependent variables suggest that a refinement of the measures may be suitable for application to the object-oriented paradigm. The measures should also be integrated into structured design methodologies and considered with other data quality issues.

References

[1] D.N. Card, W.W. Agresti, Measuring software design complexity, Journal of Systems and Software 8 (1988) 185-197.
[2] K.H. Moller, D.J. Paulish, Software Metrics - a Practitioner’s Guide to Improved Product Development, Chapman and Hall, 1993.
[3] P. Goodman, Practical Implementation of Software Metrics, International Software Quality Assurance Series, McGraw-Hill, 1993.
[4] B.A. Kitchenham, J.G. Walker, A quantitative approach to monitoring software development, Software Engineering Journal January (1989) 2-13.
[5] T. DeMarco, Controlling Software Projects: Management, Measurement and Evaluation, Yourdon Press, NJ, 1982.
[6] R.B. Grady, Hewlett-Packard - successfully applying software metrics, IEEE Computer September (1994) 8-25.
[7] L. O’Connell, Testing times. In M. Pelru (ed.), Computing, 15th September, 1994.
[8] A. Al-Janabi, B. Aspinwall, An evaluation of software design using the DEMETER tool, Software Engineering Journal November (1993) 319-324.
[9] W.M. Zage, D.M. Zage, Evaluating design metrics on large-scale software, IEEE Software May (1993).
[10] R.B. Grady, Practical Software Metrics for Project Management and Process Improvement, Prentice Hall, 1992.
[11] N.E. Fenton, S.L. Pfleeger, Software Metrics: a Rigorous and Practical Approach, 2nd edn., International Thomson Computer Press, 1997.
[12] T.J. McCabe, A complexity measure, IEEE Transactions on Software Engineering 2 (4) (1976).
[13] S. Henry, D. Kafura, Software structure metrics based on information flow, IEEE Transactions on Software Engineering 7 (5) (1981) 510-518.
[14] M. Shepperd, D. Ince, Metrics, outlier analysis and the software design process, Information and Software Technology 31 (2, March) (1989).
[15] M. Shepperd, A critique of cyclomatic complexity as a software metric, Software Engineering Journal March (1988).
[16] M. Shepperd, Design metrics: an empirical analysis, Software Engineering Journal 5 (1) (1990) 3-10.
[17] R. Bache, R. Tinker, A rigorous approach to metrication: a field trial using KINDRA, Proc. IEE/BCS Software Engineering Conf., 1988, pp. 28-32.
[18] D. Kafura, R.R. Reddy, The use of software complexity metrics in software maintenance, IEEE Transactions on Software Engineering 13 (3, March) (1987) 335-343.
[19] D.A. Troy, S.H. Zweben, Measuring the quality of structured design, Journal of Systems and Software 2 (1981) 113-120.
[20] E.H. Ferneley, D.A. Howcroft, C.G. Davies, Complexity measures for system development models. In M. Lee, B.-Z. Barta, P. Juliff (eds.), Software Quality and Productivity: Theory, Practice, Education and Training, Chapman and Hall, 1995.
[21] M.A. Jackson, Principles of Program Design, Academic Press, 1975.
[22] O.I. Lindland, G. Sindre, A. Solvberg, Understanding quality in conceptual modelling, IEEE Software March (1994).
[23] L.C. Briand, S. Morasca, V.R. Basili, Measuring and assessing maintainability at the end of high level design, IEEE Software Maintenance Conf., Montreal, Quebec, Canada, September 1993, pp. 88-97.
[24] D.N. Card, V.E. Church, W.W. Agresti, An empirical study of software design practices, IEEE Transactions on Software Engineering 12 (2, February) (1986) 264-270.
[25] K. Kronlöf, Method Integration, Wiley, 1993.
[26] B.A. Kitchenham, An evaluation of software structure metrics, Proc. 12th Int. Computer Software and Applications Conf. (COMPSAC), IEEE, October 1988.
[27] B.A. Kitchenham, L.M. Pickard, S.J. Linkman, An evaluation of some design metrics, Software Engineering Journal 5 (1) (1990) 50-58.
[28] J.B. Lohse, S.H. Zweben, Experimental evaluation of software design practices: an investigation into the effect of module coupling on system modifiability, Journal of Systems and Software 4 (1984) 301-308.
[29] J. Karimi, B.R. Konsynski, An automated software design assistant, IEEE Transactions on Software Engineering 14 (2, February) (1988) 194-210.
[30] D.N. Card, R. Glass, Measuring Software Design Quality, Addison-Wesley, 1990.