


Generating Test Data from OCL Constraints with Search Techniques

Shaukat Ali, Member, IEEE Computer Society,

Muhammad Zohaib Iqbal, Member, IEEE Computer Society,

Andrea Arcuri, Member, IEEE Computer Society, and Lionel C. Briand, Fellow, IEEE

Abstract—Model-based testing (MBT) aims at automated, scalable, and systematic testing solutions for complex industrial software systems. To increase chances of adoption in industrial contexts, software systems can be modeled using well-established standards such as the Unified Modeling Language (UML) and the Object Constraint Language (OCL). Given that test data generation is one of the major challenges in automating MBT, we focus on test data generation from OCL constraints in this paper. This endeavor is all the more challenging given the numerous OCL constructs and operations that are designed to facilitate the definition of constraints. Though search-based software testing has been applied to test data generation for white-box testing (e.g., branch coverage), its application to the MBT of industrial software systems has been limited. In this paper, we propose a set of search heuristics targeted at OCL constraints to guide test data generation and automate MBT in industrial applications. We evaluate these heuristics for three search algorithms: Genetic Algorithm, (1+1) Evolutionary Algorithm, and Alternating Variable Method. We empirically evaluate our heuristics using complex artificial problems, followed by empirical analyses of the feasibility of our approach on one industrial system in the context of robustness testing. Our approach is also compared with the most widely referenced OCL solver (UMLtoCSP) in the literature and is shown to be significantly more efficient.

Index Terms—OCL, search-based testing, test data generation, empirical evaluation, search-based software engineering, model-based testing


1 INTRODUCTION

MODEL-BASED testing (MBT) has recently received increasing attention in both industry and academia [1]. MBT leads to systematic, automated, and thorough system testing, which would often not be possible without models. However, the full automation of MBT, which is a requirement for scaling up to real-world systems, requires supporting many tasks, including preparing models for testing (e.g., flattening state machines), defining appropriate test strategies and coverage criteria, and generating test data to execute test cases. Furthermore, to increase chances of adoption, using MBT for industrial applications should rely on well-established modeling standards, such as the Unified Modeling Language (UML) and its associated language to write constraints: the Object Constraint Language (OCL) [2].

OCL is a standard language that is widely accepted for writing constraints on UML models. OCL is based on first-order logic (FOL) and set theory, and provides various constructs (e.g., collection operations) to define constraints in a concise form. The language allows modelers to write constraints at various levels of abstraction and for various types of models. For example, it can be used to write class and state invariants, guards in state machines, constraints in sequence diagrams, and pre and postconditions of operations. A basic subset of the language has been defined that can be used with metamodels defined in the Meta Object Facility (MOF) [3] (which is a standard defined by the Object Management Group (OMG) for defining metamodels). This subset of OCL has been used in the definition of UML for constraining various elements of the language. Moreover, the language is also used for writing constraints while defining UML profiles, which is a standard way of extending UML for various domains using predefined extension mechanisms. OCL has been used in many industrial projects for various purposes, such as configuration management [4] and test case generation [5], [6], [7]. OCL is also being used as the language for writing constraints on models in many commercial MBT tools, such as CertifyIt [8] and QTronic [9].

Due to the ability of OCL to specify constraints for various purposes during modeling, for example when defining guard conditions or state invariants in state machines, such constraints play a significant role when testing is driven by models. For example, in state-based testing, if the aim of a test case is to execute a guarded transition (where the guard is written in OCL over input values of the trigger and/or state variables) to achieve full transition coverage, then it is essential to provide input values to the event that triggers the transition such that the values satisfy the guard. Another example is to generate valid parameter values based on the precondition of an operation.


. S. Ali and A. Arcuri are with the Certus Software V&V Center, Simula Research Laboratory, PO Box 134, Lysaker 1325, Norway. E-mail: {shaukat, arcuri}@simula.no.

. M.Z. Iqbal is with the Department of Computer Science, National University of Computer and Emerging Sciences (FAST), A.K. Brohi Road, H-11/4, Islamabad 44000, Pakistan, and the SnT Centre, University of Luxembourg, Luxembourg. E-mail: [email protected].

. L.C. Briand is with the Faculté des Sciences, de la Technologie et de la Communication, University of Luxembourg, 4, rue Alphonse Weicker, Luxembourg L-2721, Luxembourg. E-mail: [email protected].

Manuscript received 23 June 2012; revised 11 Dec. 2012; accepted 23 Mar. 2013; published online 29 Mar. 2013. Recommended for acceptance by T. Menzies. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TSE-2012-06-0165. Digital Object Identifier no. 10.1109/TSE.2013.17.




Test data generation is an important component of MBT automation. For UML models with constraints in OCL, test data generation is a nontrivial problem, as OCL contains many sophisticated constructs facilitating the definition of constraints. A few approaches in the literature address this issue, but most of them, as we will explain in more detail in the paper, either do not handle important features of OCL such as collections or their operations [10], [11], are not scalable, or lack proper tool support [12]. This is a major limitation when it comes to the industrial application of MBT approaches that use OCL to specify constraints on models.

This paper provides and assesses novel heuristics for the application of search-based techniques, such as Genetic Algorithm (GA), (1+1) Evolutionary Algorithm (EA), and Alternating Variable Method (AVM), to generate test data from OCL constraints [2]. We implemented our test data generator in Java and evaluated its application to the robustness testing of industrial Video Conferencing Systems (VCSs) developed by Cisco Systems, Inc., Norway.

This paper is an extended version of the conference paper presented in [13]. The differences from the conference version include:

1. the handling of the three-valued Boolean logic supported by OCL 2.2, in contrast to the two-valued logic reported earlier;

2. new heuristics for operations on collections, special operations (e.g., oclInState), and user-defined operations;

3. an augmented empirical evaluation based on an industrial case study, which is extended with new constraints and an additional search algorithm (Alternating Variable Method);

4. the empirical evaluation of the individual heuristics on several artificial problems;

5. a comprehensive comparison with the existing work in the literature on test data generation from OCL constraints;

6. a detailed empirical evaluation comparing our work with the most widely used and referenced OCL constraint solver in the literature (UMLtoCSP [14]).

The rest of the paper is organized as follows: Section 2 discusses the background of our work and Section 3 surveys related work. In Section 4, we first present the representation of our test data generation problem in the context of OCL, followed by the definition of distance functions for various OCL constructs. Section 5 describes our tool support and presents a running example demonstrating test data generation in the context of MBT. Section 6 reports on our industrial case study, a Video Conferencing System, and reports the results of the application of our approach, whereas Section 7 provides an empirical evaluation of the heuristics on a set of artificial problems. Section 8 provides an overall discussion of both empirical evaluations presented in Sections 6 and 7. Section 9 addresses the threats to validity of our empirical study, and finally, Section 10 concludes the paper.

2 BACKGROUND

In this section, we provide background information on topics that help in understanding the remainder of this paper.

2.1 Search-Based Software Testing (SBST)

The main aim of software testing is to detect as many faults as possible, especially the most critical ones, in the system under test (SUT). To gain sufficient confidence that most faults are detected, testing should ideally be exhaustive. Since in practice this is not possible, testers resort to test models and coverage/adequacy criteria to define systematic and effective test strategies that are fault revealing. A test case typically consists of test data and assertions on the expected output [15]. Test data can take various forms, such as values for input parameters of a function or a sequence of method calls.

To perform test case generation systematically and efficiently, automated test case generation strategies should be employed. Bertolino [16] highlights the need for 100 percent automatic testing as a means to improve the quality of the complex software systems that are becoming the norm of modern society. A comprehensive testing strategy must address a number of activities that should ideally be automated: the generation of test requirements, test case generation, test oracle generation, test case selection, and test case prioritization. A promising strategy for tackling this challenge comes from the field of search-based software engineering [17], [18], [19].

Search-based software engineering attempts to solve a variety of software engineering problems by reformulating them as search problems [18]. A major research area in this domain is the application of search algorithms to test case generation [19]. Search algorithms are generic algorithms that are used to find optimal or near-optimal solutions to problems that have large, complex search spaces [18]. There is a clear match between search algorithms and software test case generation. The process of generating test cases can be seen as a search or an optimization process: There are possibly hundreds of thousands of test cases that could be generated for a particular SUT. From this pool, we need to select, systematically and at a reasonable cost, those that comply with certain coverage criteria and are expected to be fault revealing, at least for certain types of faults. Hence, we can reformulate the generation of test cases as a search that aims at finding the required or optimal set of test cases from the space of all possible test cases. When software testing problems are reformulated into search problems, the resulting search spaces are usually very complex, especially for realistic and real-world SUTs. For example, in the case of white-box testing, this is due to the nonlinear nature of software resulting from control structures such as if-statements and loops [20]. In such cases, simple search strategies may not be sufficient and global search algorithms may, as a result, become a necessity, as they implement global search and are less likely to be trapped in local optima [21]. The use of search algorithms for test case generation is referred to as search-based software testing [22]. Mantere and Alander [23] discuss the use of search algorithms for software testing in general, and McMinn [24] provides a survey of some of the search algorithms that have been used for test data generation.



SBST has been shown to produce results that are comparable to human competence [25], and the field has reached a state mature enough to obtain real-world results of interest for practitioners. For example, there exist SBST tools (e.g., EvoSuite [26]) that have been successfully used to automatically generate test suites for open source projects randomly selected from open source repositories (e.g., SourceForge), and not just small, artificial, hand-picked case studies [27].

2.2 Description of a Selection of Search Algorithms

The most common search algorithms that have been employed for SBST are evolutionary algorithms (e.g., GA), simulated annealing, hill climbing (HC), ant colony optimization, and particle swarm optimization [28]. Among these algorithms, HC is a simpler, local search algorithm. The SBST techniques using more complex, global search algorithms are often compared with test case generation based on HC and random search [29] to determine whether their complexity is warranted to address a specific test case generation problem. The use of a more complex search algorithm may only be justified if it performs significantly better than, for instance, random search. To use a search algorithm, a fitness function needs to be defined. The fitness function should be able to evaluate the quality of a candidate solution (i.e., an element in the search space). The fitness function is problem dependent, and proper care needs to be taken to develop adequate fitness functions. The fitness function is used to guide the search toward fitter solutions. Below, we provide a brief description of the search algorithms that we used in this paper.

2.2.1 Genetic Algorithm

Genetic Algorithms (GAs) are the most well-known search algorithms [30] and are inspired by the Darwinian theory of evolution. A population of individuals (i.e., candidate solutions) is evolved through a series of generations, where reproducing individuals evolve through crossover and mutation operators. GA is the most commonly used type of algorithm and, hence, we do not provide further details; the interested reader may consult the following reference for more details [31].

2.2.2 (1+1) Evolutionary Algorithm

The (1+1) Evolutionary Algorithm [32] is simpler than GA. In (1+1) EA, the population size is one, i.e., we have only one individual in the population, and the individual is represented as a bit string. As opposed to GA, we do not use the crossover operator but only rely on a bitwise mutation operator for exploring the search space. To produce an offspring, this operator independently flips each bit in the bit string with a probability (p) based on the length of the string. If the fitness of the child is not worse than that of the parent (the bit string of the child before mutation), the child is retained for the next generation.
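For concreteness, the following is a minimal sketch of this loop in Java (the language of our tool), assuming a problem-specific fitness function to be minimized, with 0 meaning the constraint is solved; the class and method names are illustrative, not those of our actual implementation:

    import java.util.Random;

    // Minimal (1+1) EA sketch: one bit-string individual, bitwise mutation,
    // and the child replaces the parent if its fitness is not worse.
    public class OnePlusOneEA {
        private static final Random RND = new Random();

        // Problem-specific; lower is better, 0 means solved (e.g., a branch distance).
        static double fitness(boolean[] bits) { throw new UnsupportedOperationException(); }

        static boolean[] search(int length, int maxIterations) {
            boolean[] parent = new boolean[length];
            for (int i = 0; i < length; i++) parent[i] = RND.nextBoolean();
            double parentFit = fitness(parent);
            double p = 1.0 / length; // mutation probability based on string length
            for (int it = 0; it < maxIterations && parentFit > 0; it++) {
                boolean[] child = parent.clone();
                for (int i = 0; i < length; i++)
                    if (RND.nextDouble() < p) child[i] = !child[i]; // flip each bit with probability p
                double childFit = fitness(child);
                if (childFit <= parentFit) { // keep the child if not worse than the parent
                    parent = child;
                    parentFit = childFit;
                }
            }
            return parent;
        }
    }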

2.2.3 Alternating Variable Method

Alternating Variable Method is a local search algorithm first introduced by Korel [33]. The algorithm works in the following way: Suppose we have a set of variables {v1, v2, ..., vn}; we then try to minimize the fitness with respect to v1 while keeping the rest of the variables constant, which are generated randomly. The search is stopped if a solution is found; otherwise, if a solution is not found but we have reached a minimum with respect to v1, we switch to the second variable. Now, we fix v1 at the minimum value found and try to minimize v2 while keeping the rest of the variables constant. The search continues in this way until we find a solution or we have explored all the variables.
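A minimal sketch of this procedure for integer variables, again with illustrative names and exploratory +1/−1 moves only (the accelerated "pattern" moves of the full AVM are omitted for brevity):

    import java.util.Random;
    import java.util.function.ToDoubleFunction;

    // AVM sketch: optimize one variable at a time with exploratory moves,
    // restarting from a random point when all variables are at a local optimum.
    public class AvmSketch {
        static int[] search(int n, ToDoubleFunction<int[]> distance, int budget) {
            Random rnd = new Random();
            int[] v = randomPoint(n, rnd);
            double fit = distance.applyAsDouble(v);
            int evals = 1;
            while (fit > 0 && evals < budget) {
                boolean improvedAny = false;
                for (int i = 0; i < n && fit > 0; i++) { // alternate over the variables
                    boolean improved = true;
                    while (improved && evals++ < budget) { // +/-1 exploratory moves on v[i]
                        improved = false;
                        for (int delta : new int[]{1, -1}) {
                            v[i] += delta;
                            double f = distance.applyAsDouble(v);
                            if (f < fit) { fit = f; improved = true; improvedAny = true; }
                            else v[i] -= delta; // revert a worsening move
                        }
                    }
                }
                if (!improvedAny) { // local optimum for all variables: random restart
                    v = randomPoint(n, rnd);
                    fit = distance.applyAsDouble(v);
                }
            }
            return v;
        }

        static int[] randomPoint(int n, Random rnd) {
            int[] v = new int[n];
            for (int i = 0; i < n; i++) v[i] = rnd.nextInt(2001) - 1000; // arbitrary initial range
            return v;
        }
    }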

3 RELATED WORK

The Object Constraint Language is based on first-order logic but has some distinct features that differentiate it from FOL, such as the internal object representation and support for three-valued logic (i.e., undefined, true, and false for constraint evaluation) [2], [10]. A number of approaches have been proposed in the literature that deal with evaluating and solving OCL constraints for various purposes, such as test data generation, model checking, and theorem proving. In this section, we analyze how these approaches relate to our approach, even though their objectives may differ. In Section 3.1, we relate our test data generator to OCL evaluators. In Section 3.2, we discuss related OCL constraint solving approaches that may be used for test data generation. In Section 3.3, we consider the approaches that specifically support test data generation from OCL constraints, whereas Section 3.4 discusses the use of search-based heuristics for testing.

3.1 Comparison with OCL Constraint Evaluation

An OCL evaluator tells whether an instantiation of a class diagram provided to it satisfies a constraint on the class diagram. Several OCL evaluators are currently available that can be used to evaluate OCL constraints, such as OCLE 2.0 [34], Eclipse OCL [35], Dresden OCL Toolkit [36], USE [37], EyeOCL [38], and the OCL evaluation in CertifyIt by Smartesting [8]. Our work requires an OCL evaluator for two reasons: 1) an evaluator tells if a constraint is solved, and 2) an evaluator helps in calculating the fitness (e.g., using a branch distance [31]) of an OCL expression to guide a search algorithm toward a solution. Any of the above-mentioned OCL evaluators can be extended with our proposed heuristics (Section 4) for test data generation because these heuristics are defined on standard OCL constructs.

3.2 OCL-Based Constraint Solvers

Tables 1 and 2 present various approaches in the literature that may be used for test data generation from OCL constraints. In Table 1, the second column reports whether an approach translates OCL into another formalism before solving, followed by the name of the formalism (Intermediate Representation) in the third column (e.g., Alloy [39]). The fourth column presents the OCL subset supported by an approach, followed by the support for three-valued logic in the last column. In Table 2, we present the following information about an approach: 1) tool support (second column), 2) type of approach for solving, for example, theorem proving (third column), and 3) support for test data generation.

Many approaches have been proposed in the literature (Tables 1 and 2) to solve OCL constraints for various purposes, such as checking unsatisfiability [41], model checking [40], and reasoning about models [43]. These approaches usually translate constraints and associated models into another formalism (e.g., Alloy [39], temporal logic BOTL [40], FOL [41], Prototype Verification System (PVS) [45], graph constraints [44], Higher-Order Logic (HOL) [49]), which can then be analyzed by a constraint analyzer (e.g., the Alloy constraint analyzer [51], a model checker [40], a Satisfiability Modulo Theories (SMT) solver [41], a theorem prover [41], [45], [49]). Satisfiability (SAT) solvers have also been used for evaluating OCL specifications, for example, OCL operation contracts (e.g., [52], [46]). The approaches listed in Table 2 that do not support test data generation (Test Data Generation column) may normally be extended for test data generation. In Section 6.2, we compare our approach with one such approach (UMLtoCSP), which is one of the most referenced works in the literature and for which a tool can be downloaded. We do not compare our approach with the rest of the OCL constraint solving approaches because their main aim is not test data generation.

3.3 OCL Test Data Generation Approaches

From all of the approaches listed in Tables 1 and 2, only five ([10], [11], [12], [47], and [49]) were specifically developed for test data generation from OCL constraints and are thus directly related to our approach. Below, we compare our approach with them based on the different criteria listed in Tables 1 and 2.


TABLE 1. Summary of OCL Constraint Solving Approaches

TABLE 2. Summary of OCL Constraint Solving Approaches


3.3.1 Translation to Formalism

All five approaches considered solve OCL constraints to generate data by translating OCL into another formalism, such as HOL [49] or SAT formulas [10]. Such translation is an additional overhead, and our approach aims at avoiding it by directly solving OCL constraints to generate test data. Moreover, in many cases it is not possible to translate all features of OCL and associated models into a given target formalism. In addition, such translation may result in combinatorial explosion. For example, the authors in [14] note that the conversion of OCL to a SAT formula or a CSP instance can easily lead to combinatorial explosion as the complexity of the model and constraints increases. One factor that can easily lead to a combinatorial explosion when converting an OCL constraint into an instance of a SAT formula is a large number of variables with large ranges. Conversion to a SAT formula requires that a constraint be encoded into Boolean formulas at the bit level and, as the number of variables in the constraint increases, the chances of a combinatorial explosion therefore increase. For industrial-scale systems, as in our case study, this is a major limitation because the models and constraints are generally quite complex.

3.3.2 OCL Subset Supported

From Table 1, we can see that the five approaches considered do not handle important OCL features (e.g., Collections [10], [47], Associations [11], Enumerations [10], [49], three-valued logic [10], [11], [12], [47]). Most of the unsupported OCL features are either used in our industrial case study or are not efficiently solved by the existing approaches (we discuss them in Section 5.2). In comparison, our approach supports a much more comprehensive subset of OCL.

3.3.3 Approach Type

The five approaches we consider here use various types of techniques for test data generation after translating OCL constraints into other formalisms, including SAT solving [10], partition testing [11], [12], [47], and theorem proving [49]. In contrast, we propose a new approach that does not require translation and uses search algorithms based on novel heuristics to generate test data. As discussed in Section 3.3.1, such translation is an extra overhead and a major limitation for large-scale industrial systems.

3.3.4 Tool Support

Automation is one of the key features for test data generation. Among the five considered approaches, no tool support is mentioned in [11] and [12]. Our approach is automated, as detailed in Section 5.

3.3.5 Three-Valued Logic

OCL supports three-valued logic [2], i.e., each constraint, in addition to being true or false, may evaluate to undefined. The undefined value is commonly used to indicate an error in the evaluation of an expression, such as division by zero. Among the five considered approaches, only one ([49]) handles the OCL three-valued logic, as we do.

3.3.6 Summary

To summarize, none of the five test data generation approaches entirely meets the requirements of large-scale, real-world industrial applications, such as the video conferencing case study discussed in this paper (Section 6). Automation is a fundamental requirement for such adoption, which only three of these approaches support. None of these approaches supports all OCL constructs, many of which (e.g., operations on collections) are important for successful industrial adoption. All five approaches translate OCL constraints into other formalisms to generate test data. Such translation is either not entirely possible or not scalable for industrial systems, as discussed in Section 3.3.1.

3.4 Search-Based Heuristics for Model-Based Testing

The application of search-based heuristics for MBT has received significant attention recently (e.g., [53], [54]). Existing techniques apply heuristics to guide the search for test data that should satisfy different types of coverage criteria on state machines, such as state coverage. Achieving such coverage criteria is far from trivial because guards on transitions can be arbitrarily complex, and finding the right inputs to trigger these transitions is not simple. Heuristics have been defined based on common practices in white-box, search-based testing, such as the use of branch distance and approach level [31]. Our goal is to tailor these heuristics for test data generation from OCL constraints.

There are several advances in the field of SBST (especially for test data generation at the unit level) that can be adapted to solving OCL constraints and integrated into our tool. For example, Alshraideh and Bottaci [55] provided fitness functions to handle constraints involving string comparisons. McMinn et al. [56] used web queries to further help the solving of constraints involving strings. Fraser and Arcuri [57] evaluated different seeding strategies to boost the search by starting from fitter individuals. Issues with variable-length representations can be addressed with bloat control techniques [58]. When generating test data to obtain full coverage on a state machine, instead of trying to solve each constraint individually, the whole test suite approach [59] could be used to further improve performance. The probability of applying the different search operators can be updated at runtime based on fitness feedback, as done, for example, by Poulding et al. [60]. Furthermore, search algorithms can easily be run in parallel, even on inexpensive Graphical Processing Units (GPUs) [61].

4 HEURISTICS FOR TEST DATA GENERATION

In this section, we provide the novel heuristics that are used by search algorithms to generate test data from OCL constraints. In Section 4.1, we define the representation of the problem, followed by the definition of fitness functions in Section 4.2.

4.1 Representation of Problem

In the context of test data generation from OCL constraints, a problem is equivalent to an OCL constraint. An OCL constraint C (the problem) is composed of a set of Boolean clauses {B1, B2, ..., Bn} joined using Boolean operations, i.e., {and, or, implies, xor, not}. Each Boolean clause Bi is defined over a set of variables {Bi1, Bi2, ..., Bim}. The problem (C) to be solved by a search algorithm can be represented as a set of variables: C = ⋃_{x=1}^{n} ⋃_{y=1}^{m_x} {B_xy}, where n is the number of clauses in a constraint and m_x is the number of variables involved in the xth clause. Note that m_x may differ across clauses.

In OCL, all data types are subtypes of OclAny, which is categorized into two kinds of subtypes: primitive types and collection types. Primitive types are Real, Integer, String, and Boolean, whereas collection types include Collection as the supertype, with subtypes Set, OrderedSet, Bag, and Sequence. Therefore, a clause in an OCL constraint may use primitive data types or collection-related types. A constraint can be defined on variables of different types, such as equalities of integers and comparisons of strings. As an example, consider the UML class diagram in Fig. 1, consisting of two classes: University and Student. Constraints on University and Student are shown in Fig. 2.

The first constraint states that the age of every Student should be greater than 15. Based on the type of the attribute age of the class Student, which is Integer, the comparison in the clause is determined to involve integers. The second constraint states that the number of students in the university should be greater than 0. In this case, the size() operation, which is defined on collections in OCL and returns an Integer denoting the number of elements in a collection, is called on the collection student (containing elements of the class Student). Even though an operation is called on a collection, the comparison is between two integers (the return value of the operation size() and 0).
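Fig. 2 itself is not reproduced in this text; based on the description above, the two invariants would read in OCL concrete syntax roughly as follows (a reconstruction, so the exact formatting of the original figure may differ):

    context Student
    inv: self.age > 15

    context University
    inv: self.student->size() > 0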

4.2 Definition of the Fitness Function for OCL

To guide the search for test data satisfying OCL constraints, it is necessary to define a set of heuristics. A heuristic indicates "how far" input data are from satisfying the constraint. For example, let us say we want to satisfy the constraint x = 0, and suppose we have two data inputs: x1 := 5 and x2 := 1,000. Both inputs x1 and x2 fail to satisfy x = 0, but x1 is heuristically closer to satisfying x = 0 than x2. A search algorithm would use such a heuristic as a fitness function, to reward input data that are closer to satisfying the target constraint.

In this paper, to generate test data that solve OCL constraints, we use a fitness function that is adapted from work targeting code coverage (e.g., branch coverage in C code [31]). In particular, we use the so-called branch distance (a function d()), as defined in [31]. The function d() returns the value 0 if the constraint is solved, and otherwise a positive value that heuristically estimates how far the constraint is from being evaluated to true. As for any heuristic, there is no guarantee that an optimal solution (in our case, input data satisfying the constraints) will be found in reasonable time. Nevertheless, many successful results based on such heuristics are reported in the literature for various software engineering problems [62]. In cases where we want a constraint to evaluate to false, we can simply negate the constraint and find data for which the negated constraint evaluates to true. For example, if we want to prevent the firing of a guarded transition in a state machine, we can simply negate the guard and find data for the negated guard.

In the following, we discuss branch distance functions for the different types of clauses in OCL.

4.2.1 Primitive Types

The primitive types supported by OCL include Integer, Real, String, and Boolean. The Integer type represents a set of integer values, the Real type holds a set of real numbers, Boolean is either true or false, and String contains strings over an alphabet. In OCL, the domain of each of these data types also contains a special value called undefined (⊥). According to the OCL specification [2], this value is used for the following purposes: 1) an undefined value can be assigned to an attribute for which the value is currently unknown and may be assigned later, when it is known; 2) an undefined value may be assigned to an attribute of a particular instance for which this attribute cannot have any value; 3) an undefined value may represent an error in the evaluation of an expression, such as division by zero. In our context of test data generation, only the third purpose is relevant, where an undefined evaluation indicates an error. To resolve such situations without stopping the search, we define specialized heuristics, which are discussed next.

When a Boolean variable b is true, the branch distance is 0, i.e., d(b) = 0. When b is false, then d(b) > 0 and d(b) < k, where k is an arbitrary positive constant, for example k = 1, and when b is ⊥, then d(b) is k. If a Boolean variable is obtained from an operation call, then in general the branch distance takes one of these three possible values. For example, when the operation isEmpty() is called on a collection, the branch distance takes 0, a value between 0 and k, or k, unless a more fine-grained, specialized distance calculation is specified (e.g., returning the number of elements in the collection). For some types of OCL operations (e.g., forAll()), we can provide more fine-grained heuristics. We provide more details on these operations and their corresponding branch distance calculations in Section 4.2.2. Note that OCL supports two other special values, null and invalid. These two values have truth tables identical to undefined ([2, Section 7.4]), and so they are treated exactly the same way as undefined in our approach. In the rest of the paper, we only provide branch distance calculations corresponding to a Boolean clause evaluating to true, false, or undefined.
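As a Java sketch of this base case, with k = 1 as in the paper's examples (the concrete value returned for false when no finer-grained heuristic is available is our own illustrative choice; it merely has to lie strictly between 0 and k):

    // Three-valued result of evaluating a Boolean OCL clause.
    enum TriBool { TRUE, FALSE, UNDEFINED }

    static final double K = 1.0; // the arbitrary positive constant k

    // Coarse branch distance for a Boolean clause with no finer heuristic.
    static double d(TriBool b) {
        switch (b) {
            case TRUE:  return 0.0;     // clause satisfied
            case FALSE: return 0.5 * K; // some value in (0, k); illustrative choice
            default:    return K;       // undefined gets the maximum distance k
        }
    }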

The operations defined in OCL to combine Boolean clauses are or, xor, and, not, if-then-else, and implies. The three-valued truth tables for these operations are shown in Table 3. For these operations, the branch distances are adopted from [31] but are extended to handle undefined (⊥). For example, for A and B, the branch distance is calculated as follows:

d(A and B) := numberOfUndefinedClauses + nor(d(A) + d(B)).


Fig. 1. Example class diagram.

Fig. 2. Example constraints.


Notice that the standard definition of d(A and B) is d(A) + d(B), as specified in [31], but this calculation only accounts for two truth values, i.e., true and false. Following the same definition to account for undefined (⊥), if B is undefined, then the branch distance would be d(A) + k. However, in this case, there is no gradient toward avoiding the undefined value, i.e., turning ⊥ into either false or true. To provide a gradient for turning ⊥ into either true or false, we extended the standard branch distance calculation by adding numberOfUndefinedClauses. This variable holds the number of undefined clauses in an expression, and we normalize d(A) + d(B) between 0 and 1 so that the search gives priority to minimizing numberOfUndefinedClauses, i.e., turning ⊥ into false/true. If the search manages to do that, the distance decreases by 1. In the same way, we extend the standard branch distance calculation of or, which is specified in [31].

The operations implies and xor are syntactic sugar that usually does not appear in programming languages such as C and Java, and they can be reexpressed using combinations of the and and or operators, as shown in Table 3. A side effect of our extended branch distance for some Boolean operations such as implies and xor is that an expression P may evaluate to true even if d(P) != 0. For example, A implies B evaluates to true when A is false and B is ⊥. In this case, d(A implies B) will be 1, but the expression is true. To deal with this, our implementation stops the search when an expression is true, even if its distance is still not zero. The evaluation of d() on an expression composed of two clauses is specified in Table 4 and can simply be computed recursively for more than two clauses.
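A sketch of this combination rule, reusing the TriBool type and the constant K from the sketch above; the and case follows the definition given earlier, while the or variant (minimum instead of sum) is our assumption based on the standard definition in [31], since Table 4 is not reproduced here:

    // nor() normalizes a distance into [0, 1), as used throughout the paper.
    static double nor(double x) { return x / (x + 1.0); }

    static int undefinedCount(TriBool a, TriBool b) {
        return (a == TriBool.UNDEFINED ? 1 : 0) + (b == TriBool.UNDEFINED ? 1 : 0);
    }

    // d(A and B) := numberOfUndefinedClauses + nor(d(A) + d(B))
    static double dAnd(TriBool a, double dA, TriBool b, double dB) {
        return undefinedCount(a, b) + nor(dA + dB);
    }

    // Assumed or variant: prioritize removing undefined, then the closest clause.
    static double dOr(TriBool a, double dA, TriBool b, double dB) {
        return undefinedCount(a, b) + nor(Math.min(dA, dB));
    }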

When an expression or one of its parts is negated, the expression is transformed by moving the negation inward to the basic clauses; for example, not (A and B) would be transformed into not A or not B.

For the numeric data types, i.e., Integer and Real, the relational operations that return Booleans (and so can be used in clauses) are =, <, >, <=, >=, and <>. For these operations, we extended the branch distance calculation from [31] to three-valued logic, as shown in Table 5.
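Since Table 5 is not reproduced in this text, the following Java sketch reconstructs plausible entries from the standard definitions in [31] and the worked examples later in this section (e.g., d(1 = 4) = 3 and d(2 <> 2) = k when k = 1), reusing the constant K from above; undefined operands are modeled here with Double.NaN and yield the maximum distance k:

    static boolean undef(double x, double y) { return Double.isNaN(x) || Double.isNaN(y); }

    static double dEq(double x, double y)  { return undef(x, y) ? K : Math.abs(x - y); }
    static double dNeq(double x, double y) { return undef(x, y) ? K : (x != y ? 0.0 : K); }
    static double dLt(double x, double y)  { return undef(x, y) ? K : (x < y  ? 0.0 : x - y + K); }
    static double dLeq(double x, double y) { return undef(x, y) ? K : (x <= y ? 0.0 : x - y + K); }
    // d(x > y) and d(x >= y) are symmetric: swap the operands of dLt/dLeq.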

In OCL, several other operations are defined on Real and Integer, such as +, -, *, /, abs(), div(), mod(), max(), and min(). Since these operations are not used to compare two numerical values in clauses, there is no need to define a branch distance for them. For example, considering a and b of type Integer and a constraint a + b * 3 < 4, the operations + and * are used only to define the constraint. The overall result of the expression a + b * 3 will be an Integer, and the clause will be considered a comparison of two values of Integer type. For the String type, OCL defines several operations such as =, +, size(), concat(), substring(), and toInteger(). There are only three operations that return a Boolean: the equality operator =, the inequality operator <>, and equalsIgnoreCase(). In these cases, instead of using k if the comparison is false, we can return the value of any string matching distance function to evaluate how close two strings are. In our approach, we implemented the edit distance function [55], but any other string matching distance function can easily be incorporated.
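One standard formulation of the edit distance is the classic Levenshtein dynamic program, sketched below; a larger value means the two strings are further apart, and 0 means they are equal:

    // Levenshtein edit distance between two strings (insert/delete/substitute).
    static int editDistance(String a, String b) {
        int[][] t = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) t[i][0] = i;
        for (int j = 0; j <= b.length(); j++) t[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int subst = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                t[i][j] = Math.min(Math.min(t[i - 1][j] + 1,      // deletion
                                            t[i][j - 1] + 1),     // insertion
                                   t[i - 1][j - 1] + subst);      // substitution
            }
        return t[a.length()][b.length()];
    }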

4.2.2 Collection-Related Types

The collection types defined in OCL are Set, OrderedSet, Bag, and Sequence. Details of these types can be found in the standard OCL specification [2]. OCL defines several operations on collections. An important point to note is that if the return type of an operation on a collection is Real or Integer and that value is used in an expression, then the distance is calculated in the same way as for primitive types, as defined in Section 4.2.1. An example is the size() operation, which returns an Integer. In this section, we discuss branch distances for operations in OCL that are specific to collections and that are usually not common in programming languages for expressing constraints/expressions and, hence, are not discussed in the literature.

Equality of collections (=). In OCL constraints, we may need to compare two collections for equality. We defined a branch distance for comparing collections as shown in Fig. 3. The main goal is to improve the search process by providing a heuristic that is finer grained than simply yielding 0 if the result of an evaluation is true, a value between 0 and k if the result is false, and k otherwise. In Fig. 3, the branch distance for equality (=) of collections is calculated in one of the following four ways.

First, if either of the collections C1 and C2 is undefined, then the distance is simply 1. Second, if the collections C1 and C2 are not of the same kind, i.e., not(C1.oclIsKindOf(C2)) evaluates to true, then the distance is a value starting from 0.75 and less than 1. Note that any other constant could have been used to represent the maximum distance (1 in this case). Whenever the distance is less than 1 and greater than or equal to 0.75, it means that the collections are of different kinds, and the search algorithms must be guided to make the two collections of the same kind.

TABLE 3. Truth Values for Boolean Operations [2]

TABLE 4. Branch Distance Calculations for OCL's Boolean Operations

Once the second condition is satisfied, the search algorithms must be guided such that the collections have an equal number of elements. The third condition in the formula checks whether collections of the same kind have different sizes. In that case, the search is guided to generate collections of equal size, i.e., C1->size() = C2->size(). We compute d(C1->size() = C2->size()), and since size() returns an integer, this distance calculation is simply performed using the equality operation on numerical data, as shown in Table 5. The maximum distance value that can be taken by d(C1 = C2) in this case can be derived as follows:

d(C1 = C2) = 0.5 + 0.25 · nor(d(C1->size() = C2->size()))
= 0.5 + 0.25 · nor(k · nor(|C1->size() − C2->size()|))   [using the formula for equality from Table 5]
= 0.5 + 0.25 · (Y/(Y + 1))   [using the definition of nor(X), and assuming Y = k · nor(|C1->size() − C2->size()|)].

In the above equation, Y/(Y + 1) always computes a value less than 1. The expression 0.5 + 0.25 · (Y/(Y + 1)) therefore always takes a value between 0.5 and 0.75. Whenever d(C1 = C2) is greater than 0.5 and less than 0.75, this means that the collections do not have the same number of elements.

For the fourth condition, i.e., if the collections are of the same kind and have equal numbers of elements, the distance is calculated by comparing the elements in both collections. If the collections are of kind Set or Bag, we first sort both collections based on their elements. Sorting the two collections facilitates their equality evaluation by reducing the required number of comparisons of elements. After sorting, we only need to compare each pair of elements from the two collections at the same index (C1->size() comparisons) instead of comparing all elements from one collection with all elements in the other collection (C1->size() · C1->size() comparisons). For sorting, a natural order among the elements must be defined. For instance, if the collections consist of integers, then we simply sort based on the integer values. However, for other types of elements (e.g., enumerations), there is no predefined natural order and, in these cases, we sort using the names of the element identifiers (e.g., a sequence of enumeration literals B, A, D, C would be sorted into {A, B, C, D}). If a collection consists of collections, we recursively traverse the collection and sort the elements of primitive types in each contained collection. Notice that how the sorting is done is not important. The important property that needs to be satisfied is that if two collections are equal (regardless of the kind of collection), the sorting algorithm should produce the same paired alignment. For example, the set {B, A, C} is equal to {C, B, A} (the order in sets has no importance), and their alignment using the names of the enumeration elements produces the same sorted sequence {A, B, C}. When a collection consists of objects, a similar process is followed, except that we sort based on the values of the objects' instance variables.

Fig. 3. Branch distance for equality of collections.

TABLE 5. Branch Distance Calculations of OCL Relational Operations for Numeric Data

Once the elements of both collections are sorted, we sum the distances between each pair of elements at the same position in the collections (i.e., the distance between the ith element of C1 and the ith element of C2) and finally take the average by dividing the sum by the number of elements in C1. When all elements of C1 are equal to those of C2, then d(pair) yields 0 and, as a result, d(C1 = C2) = 0. The maximum value d(C1 = C2) can take in this case can be derived as follows:

d(C1 = C2) = 0.5 · Σ_{i=1}^{C1->size()} nor(d(pair_i)) / C1->size()
= 0.5 · Σ_{i=1}^{C1->size()} (d(pair_i)/(d(pair_i) + 1)) / C1->size()   [using the definition of nor].

d(pair_i)/(d(pair_i) + 1) always computes a value less than 1. Considering a simple example in which the collections consist of Boolean values, using the formula from Table 4, d(pair_i) can take j, where j < k, as its maximum value. So the formula reduces to

d(C1 = C2) = 0.5 · Σ_{i=1}^{C1->size()} (j/(j + 1)) / C1->size() = 0.5 · (j/(j + 1)).

Since j/(j + 1) computes a value below 1, the above formula always computes a value below 0.5. To further explain the computation of the branch distance when the condition C1.oclIsKindOf(C2) and C1->size() = C2->size() is true, we provide an example below:

Example 1. Suppose C1 = {2, 1, 3} and C2 = {5, 4, 9}; then the distance is calculated as follows:

d(C1 = C2) = 0.5 · Σ_{i=1}^{C1->size()} nor(d(pair_i)) / C1->size()
= 0.5 · Σ_{i=1}^{3} nor(d(pair_i)) / 3
= 0.5 · (nor(d(1 = 4)) + nor(d(2 = 5)) + nor(d(3 = 9))) / 3   [the sorted C1 and C2 are {1, 2, 3} and {4, 5, 9}]
= 0.5 · (nor(3) + nor(3) + nor(6)) / 3   [using k = 1 and the formula for equality of two numeric values from Table 5]
≈ 0.39.

As illustrated in Table 6, the three conditions in Fig. 3 map to four distinct value ranges, thus ensuring that the distance is always highest in the first case and lowest in the fourth case, thereby properly guiding the search.
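Putting the four cases together for the Set/Bag case over integer elements, and reusing nor() and dEq() from the sketches above (the different-kind case is elided here, since both arguments have the same Java type):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Distance ranges follow Table 6: 1 for undefined, [0.5, 0.75) while the
    // sizes differ, and [0, 0.5) once sizes match and pairs are compared.
    static double dCollectionEq(List<Integer> c1, List<Integer> c2) {
        if (c1 == null || c2 == null) return 1.0;              // undefined collection
        if (c1.size() != c2.size())                            // guide the sizes together first
            return 0.5 + 0.25 * nor(dEq(c1.size(), c2.size()));
        if (c1.isEmpty()) return 0.0;                          // two empty collections are equal
        List<Integer> s1 = new ArrayList<>(c1), s2 = new ArrayList<>(c2);
        Collections.sort(s1);                                  // natural order aligns the pairs
        Collections.sort(s2);
        double sum = 0;
        for (int i = 0; i < s1.size(); i++)                    // pairwise element distances
            sum += nor(dEq(s1.get(i), s2.get(i)));
        return 0.5 * (sum / s1.size());
    }

For C1 = {2, 1, 3} and C2 = {5, 4, 9}, this returns 0.5 · (nor(3) + nor(3) + nor(6)) / 3 ≈ 0.39, matching Example 1.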

Operations checking the existence of one or more objects in a collection. OCL defines several operations to check the existence or absence of one or more elements in a collection, such as includes() and excludes(). Whether a collection is empty is checked with isEmpty() and notEmpty(). Such operations can be further processed for a more refined calculation of the branch distance, as described in Table 7.

For includes(object:T), the branch distance is the minimum distance in the set of all distances (calculated using the heuristic for equality as listed in Table 5) between object and each element of the collection (self) on which includes is invoked. Taking the minimum distance effectively guides a search algorithm to generate data that make some element of self equal to object. When any element of self is equal to object, that distance is 0, and the overall distance is therefore 0. When none of the collection elements is equal to object, then we select the element in the collection with the minimum distance. The example below illustrates how the branch distance is calculated:

Example 2. Suppose C = {1, 2, 3} and we have the expression C->includes(4); then the branch distance is calculated as

d(C->includes(4)) = min_{i=1 to self->size()} d(object = self.at(i))
= min_{i=1 to 3} d(object = C.at(i))
= min(d(1 = 4), d(2 = 4), d(3 = 4))
= min(3, 2, 1)   [using k = 1 and the formula for the equality of two integers from Table 5]
= 1.

TABLE 6. Minimum and Maximum Distance Values for the Distance Calculation for Equality of Collections

For excludes(object:T), the branch distance is calculated in a similar way as for includes, except that: 1) we use the distance heuristic for inequality (<>); and 2) we sum up the distances of all elements in the collection that are equal to object, to ensure that object is not present even once in self. The example below illustrates how the branch distance is computed using the formula.

Example 3. Suppose C = {1, 2, 3} and we have the expression C->excludes(2); then

d(C->excludes(2)) = Σ_{i=1}^{self->size()} d(object <> self.at(i))
= Σ_{i=1}^{3} d(object <> C.at(i))
= d(1 <> 2) + d(2 <> 2) + d(3 <> 2)
= 0 + 1 + 0   [using k = 1 and the formula from Table 5]
= 1.

In a similar fashion, we calculate the branch distances of includesAll and excludesAll (Table 7), where we check whether all elements of one collection are present/absent in another collection. For includesAll, we sum, over all elements of one collection, their minimum distance to the elements of the other collection, as shown in the formula for includesAll in Table 7. For excludesAll, we sum the distances between all possible pairs of elements across the two collections, as shown in the formula for excludesAll in Table 7. The branch distance calculations for isEmpty and notEmpty are also defined in Table 7.

Branch distance for iterators. OCL defines several operations to iterate over collections. Below, we discuss the branch distances for these iterators.

The forAll() iterator operation is applied to an OCL collection and takes as input a Boolean expression to determine whether the expression holds for all elements in the collection. To obtain a fine-grained branch distance, we calculate the distance of the Boolean expression on all elements in the collection and sum the results. The distances are summed because the Boolean expression must be true for all elements of the collection on which forAll is invoked. This means that the more elements for which the Boolean expression is false, the higher the branch distance. The function for forAll presented in Table 8 is generic for any number of iterators. For the sake of clarity, we assume that the function expr(v1, v2, ..., vm) in Table 8 evaluates an expression expr on a set of elements v1, v2, ..., vm. To explain expr, suppose we have a collection C = {1, 2, 3} and an expression C->forAll(x, y | x * y > 0); then expr(C.at(1), C.at(2)) entails calculating d(x * y > 0), where x = C.at(1), i.e., 1, and y = C.at(2), that is, 2. The keyword self in the table refers to the collection on which an operation is applied, at(i) is a standard OCL operation that returns the ith element of a sequence or an ordered set, and size() is another OCL operation that returns the number of elements in a collection. The denominator (self->size())^m is used to compute the average distance over all element combinations of size m, because we have (self->size())^m distance computations. Notice that calculating the average distance is important to avoid bias toward decreasing the size of the collection. For example, because this is a minimization problem (i.e., we want to minimize the branch distance), there would otherwise be a bias against larger collections, since they would tend to have a higher branch distance (the number of branch distance additions is polynomial in the number of iterators and the collection size). A search operator that removes one element from the collection would then always produce a better fitness, so it would have a clear gradient toward the empty collection. An empty collection would make the constraint true, but it can have at least two kinds of side effects: First, if a clause is conjoined with other clauses that depend on the size (e.g., C->forAll(x | x > 5) and C->size() = 10), then there would likely be plateaus in the search landscape (e.g., the gradient to increase the size toward 10 would be masked by the gradient toward the empty collection); second, because in our context we solve constraints to generate test data, we want to have fault-revealing test data and not always empty collections. In general, to avoid side effects such as unnecessary fitness plateaus, our branch distance functions are designed in such a way that if there is no need to change the size of a collection to solve a constraint on it, then the branch distances have no bias toward changing its size in one direction or the other.

Below, we further illustrate the branch distance for forAll with the help of examples:

Example 4. Suppose we have a collection C = {1, 2, 3} and the expression C->forAll(x | x = 0). In this example, we have just one iterator x, and therefore m = 1. In this case, the formula is

d(C->forAll(x | x = 0)) = Σ_{i1=1}^{C->size()} d(expr(C.at(i1))) / C->size().

The branch distance is calculated as

d(C->forAll(x | x = 0)) = (d(expr(C.at(1))) + d(expr(C.at(2))) + d(expr(C.at(3)))) / 3
= (d(expr(1)) + d(expr(2)) + d(expr(3))) / 3
= (1 + 2 + 3) / 3   [using k = 1, the definition of expr, and the formulas from Tables 4 and 5]
= 6/3 = 2.

TABLE 7. Branch Distance Calculation for Operations Checking Objects in Collections

Example 5. Suppose we have a collection C ¼ f1; 2g and the

expression is C ! forAllðx; y j x � y > 0Þ. In this case, we

have two iterators x and y and thus the formula will

become

d C! forAll x; y j x � y > 0ð Þð Þ

$PC!sizeðÞ

i1¼1

PC!sizeðÞi2¼1 dðexprðC:at i1ð Þ;C:atði2ÞÞðC! sizeðÞÞ2

:

The branch distance in this case will be calculated as

= (d(expr(C.at(1), C.at(1))) + d(expr(C.at(1), C.at(2))) + d(expr(C.at(2), C.at(1))) + d(expr(C.at(2), C.at(2)))) / 4
= (d(expr(1, 1)) + d(expr(1, 2)) + d(expr(2, 1)) + d(expr(2, 2))) / 4
= (d(1 * 1 > 0) + d(1 * 2 > 0) + d(2 * 1 > 0) + d(2 * 2 > 0)) / 4
= (0 + 0 + 0 + 0) / 4   [considering k = 1 and using formulas from Tables 4 and 5]
= 0.

In a similar fashion, the formula can be used for any number of iterators (m).
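To make the computation concrete, the following minimal Java sketch (our illustration, not EsOCL code) implements the single-iterator forAll distance; the element-wise distance d(expr(v)) is passed in as a function, and the numbers mirror Example 4 under the assumption d(x = 0) = |x| with k = 1.

import java.util.List;
import java.util.function.ToDoubleFunction;

public final class ForAllDistance {

    // d(C->forAll(x | expr(x))): sum the element-wise branch distances and
    // average them, so the distance is not biased toward shrinking C.
    static double forAll(List<Integer> c, ToDoubleFunction<Integer> exprDistance) {
        if (c.isEmpty()) {
            return 0.0; // forAll holds vacuously on an empty collection
        }
        double sum = 0.0;
        for (int v : c) {
            sum += exprDistance.applyAsDouble(v);
        }
        return sum / c.size();
    }

    public static void main(String[] args) {
        // Example 4: C = {1, 2, 3}, expression x = 0, assuming d(x = 0) = |x|
        System.out.println(forAll(List.of(1, 2, 3), x -> Math.abs(x))); // 2.0
    }
}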

The exists() iterator operation determines whether a Boolean expression holds for at least one element of the collection on which this operation is applied. The generic distance form for exists() is shown in Table 8. The definition of exists() is very similar to that of forAll(), except for two differences. First, instead of summing distances across all element combinations of size m, we compute the minimum of these distances, because any element satisfying exp makes exists() true. Second, we do not have a denominator, because no average needs to be computed. The expr() function works in the same way as for forAll(). Below, we further illustrate the branch distance calculation using two examples.

Example 6. Suppose we have a collection C = {1, 2, 3} and the expression is C->exists(x | x = 0). In this example, we have just one iterator, i.e., x. The formula will be

d(C->exists(x | x = 0)) = \min_{i_1 = 1 \ldots 3} d(expr(C.at(i_1))).

The branch distance in this case will be calculated as

= min(d(expr(C.at(1))), d(expr(C.at(2))), d(expr(C.at(3))))
= min(d(expr(1)), d(expr(2)), d(expr(3)))
= min(1, 2, 3)   [using k = 1, the definition of expr, and formulas from Tables 4 and 5]
= 1.

Example 7. Suppose we have a collection C = {1, 2} and the expression is C->exists(x, y | x * y > 1). In this case, we have two iterators x and y, and thus the formula will become

d(C->exists(x, y | x * y > 1)) = \min_{i_1, i_2 = 1 \ldots C->size()} d(expr(C.at(i_1), C.at(i_2))).

The branch distance in this case will be calculated as

= min(d(expr(C.at(1), C.at(1))), d(expr(C.at(1), C.at(2))), d(expr(C.at(2), C.at(1))), d(expr(C.at(2), C.at(2))))
= min(d(expr(1, 1)), d(expr(1, 2)), d(expr(2, 1)), d(expr(2, 2)))
= min(d(1 * 1 > 1), d(1 * 2 > 1), d(2 * 1 > 1), d(2 * 2 > 1))
= min(1, 0, 0, 0)   [considering k = 1, the definition of expr, and using formulas from Tables 4 and 5]
= 0.

In a similar fashion, as explained with Examples 6 and 7, the formula can be used for any number of iterators (m).
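The corresponding sketch for exists() only changes the aggregation from an averaged sum to a minimum (same assumptions as the forAll sketch above):

import java.util.List;
import java.util.function.ToDoubleFunction;

public final class ExistsDistance {

    // d(C->exists(x | expr(x))): minimum element-wise branch distance,
    // because a single satisfying element makes exists() true.
    static double exists(List<Integer> c, ToDoubleFunction<Integer> exprDistance) {
        double min = Double.MAX_VALUE; // worst case: empty C, exists() is false
        for (int v : c) {
            min = Math.min(min, exprDistance.applyAsDouble(v));
        }
        return min;
    }

    public static void main(String[] args) {
        // Example 6: C = {1, 2, 3}, expression x = 0, assuming d(x = 0) = |x|
        System.out.println(exists(List.of(1, 2, 3), x -> Math.abs(x))); // 1.0
    }
}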

In addition, we also provide branch distances for the one() and isUnique() operations in Table 8.


TABLE 8
Branch Distance for forAll and exists


The one() operation returns true only if exp evaluates to true for exactly one element of the collection. The isUnique() operation returns true if exp evaluates to a different value on each element of the source collection. In this case, the distance is calculated by computing and summing the distances between each element of the collection and every other element in the collection. Since this formula involves (self->size() * (self->size() - 1)) / 2 distance computations, we compute the average distance by using this quantity as the denominator. Again, calculating the average distance is important to avoid a bias in the search toward decreasing the size of the collection, as we discussed for forAll. Below we provide an example of how we calculate the branch distance for isUnique().

Example 8. Suppose we have a collection C = {1, 1, 3} and the expression is C->isUnique(x | x). In this example, we have just one iterator, i.e., x. Using the formula of branch distance for isUnique():

d(C->isUnique(x | x)) = \frac{\sum_{i = 1}^{C->size() - 1} \sum_{j = i + 1}^{C->size()} d(expr(C.at(i)) <> expr(C.at(j)))}{(C->size() * (C->size() - 1)) / 2}

= (d(expr(C.at(1)) <> expr(C.at(2))) + d(expr(C.at(1)) <> expr(C.at(3))) + d(expr(C.at(2)) <> expr(C.at(3)))) / ((3 * 2) / 2)
= (d(1 <> 1) + d(1 <> 3) + d(1 <> 3)) / 3
= (1 + 0 + 0) / 3
= 1/3.
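A matching sketch for isUnique() over an identity body (x | x), assuming d(a <> b) is 0 when the values differ and k = 1 when they are equal, as in Example 8:

import java.util.List;

public final class IsUniqueDistance {

    static final double K = 1.0;

    // d(C->isUnique(x | x)): average the distance d(C.at(i) <> C.at(j))
    // over all (size * (size - 1)) / 2 unordered pairs of elements.
    static double isUnique(List<Integer> c) {
        int n = c.size();
        if (n < 2) {
            return 0.0; // zero or one element is trivially unique
        }
        double sum = 0.0;
        for (int i = 0; i < n - 1; i++) {
            for (int j = i + 1; j < n; j++) {
                sum += c.get(i).equals(c.get(j)) ? K : 0.0; // d(a <> b)
            }
        }
        return sum / (n * (n - 1) / 2.0);
    }

    public static void main(String[] args) {
        System.out.println(isUnique(List.of(1, 1, 3))); // 0.333... as in Example 8
    }
}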

The select, reject, and collect operations derive a new collection from an existing one. The select() operation selects all elements of a collection for which a Boolean expression is true, whereas reject() selects all elements of a collection for which a Boolean expression is false. In contrast, the collect() iterator may return a collection of elements that do not belong to the collection on which it is applied. Since all these iterators (like the generic iterator operation) return a collection and not a Boolean value, we do not need to define a branch distance for them, as discussed in Section 4.2.1. However, an iterator operation (such as select()) followed by another OCL operation, for instance size(), can be combined to make a Boolean expression of the following form:

exp = C->selectionOp(P)->size() RelOp z(),

where C is a collection, selectionOp is either select, reject, or collect, P is a Boolean expression, RelOp is a relational operation from the set {<, <=, =, <>, >, >=}, and z() is a function that returns a constant value. A simple way of calculating the branch distance for the above example, when RelOp is = and selectionOp is select, would be as follows:

exp := C->select(P)->size() = z()
if exp = true then
    d(exp) = 0
else
    d(exp) = |C->select(P)->size() - z()| + k

An obvious problem of calculating the branch distance in this way is that it does not give any gradient to help search algorithms solve P, which can be arbitrarily complex. To optimize the branch distance calculation in this particular case, we need special rules that are defined specifically for each RelOp.

For > and >=, when exp is false, the resultant collection of the expression C->select(P) is smaller than the size that would make the branch distance 0. In this case, we first need a collection with size greater than z(), and then we need to obtain those elements of C that increase the value returned by C->select(P)->size(). This can be achieved by the rule shown in the first row of Table 9. The normalization function nor() is necessary because the branch distance should first reward any increase in C->size() until it is greater than z(), regardless of the evaluation of P on its elements. Then, once the collection C has enough elements, we need to account for the number of elements for which P is true by using ((z() - C->select(P)->size()) + k). The function d(P) returns the sum of the branch distance evaluations of an expression P over all the elements in C and provides additional gradient by quantifying how close the collection elements are to satisfying P. Below, we further illustrate this case with an example.

Example 9. Suppose we have a collection C = {1, 1, 3} and the expression is C->select(x | x > 1)->size() >= 3. We use the formula of branch distance for the case when RelOp is > or >=. In this case, C->size() is 3, which is equal to z(), i.e., 3, so the branch distance calculation is

d(C->select(x | x > 1)->size() >= 3)
= nor((z() - C->select(P)->size()) + k + nor(d(P)))
= nor((3 - 1) + 1 + nor(d(x > 1)))   [assuming k = 1]
= nor(3 + nor(d(x > 1)))
= nor(3 + nor(d(1 > 1) + d(1 > 1) + d(3 > 1)))
= nor(3 + nor(1 + 1 + 0))   [using k = 1 and the formula from Table 5]
= nor(3 + 0.667)
= 0.78.
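The following sketch reproduces Example 9 for the >= rule of Table 9. Two assumptions are ours: the normalization nor(v) = v / (v + 1), which matches the values 0.667 and 0.78 computed above, and the exact staging of the two rewards (grow C first, then satisfy P), which follows our reading of the rule.

import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.ToDoubleFunction;

public final class SelectSizeDistance {

    static final double K = 1.0;

    static double nor(double v) { // assumed normalization into [0, 1)
        return v / (v + 1.0);
    }

    // d(C->select(P)->size() >= z()) when the expression is false.
    static double selectSizeGeq(List<Integer> c, IntPredicate p,
                                ToDoubleFunction<Integer> dP, int z) {
        long selected = c.stream().filter(v -> p.test(v)).count();
        if (selected >= z) {
            return 0.0; // the constraint already holds
        }
        if (c.size() < z) {
            // Phase 1 (our staging): reward growing C until it can hold
            // z elements, regardless of P; kept above all phase-2 values.
            return 1.0 + nor(z - c.size());
        }
        // Phase 2: reward elements of C getting closer to satisfying P.
        double sumDP = c.stream().mapToDouble(dP).sum(); // d(P) over all of C
        return nor((z - selected) + K + nor(sumDP));
    }

    public static void main(String[] args) {
        // Example 9: C = {1, 1, 3}, P = (x > 1), z() = 3,
        // with d(x > 1) = (1 - x) + k when x <= 1 (Table 5 style).
        double d = selectSizeGeq(List.of(1, 1, 3), x -> x > 1,
                x -> x > 1 ? 0.0 : (1 - x) + K, 3);
        System.out.println(d); // 0.7857..., reported as 0.78 in Example 9
    }
}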

For < and <=, when exp is false, C->select(P)->size() is greater than the size that would make the branch distance 0. Similar to the previous case, the distance computation accounts for those elements of C that decrease the value returned by C->select(P)->size() and uses nor(d(not P)) to provide additional gradient to the search, as shown in the second row of Table 9.

For the case when the value of RelOp is inequality (<>), the rule is shown in the third row of Table 9. Recall that our expression is in the following format: exp = C->selectionOp(P)->size() RelOp z(). For this rule, there are three cases based on the value of C->select(P)->size(). Recall that d(P) is simply the sum of all d() over all elements of C. The first case is when C->select(P)->size() = 0, where P does not hold for any element in C. To guide the search toward increasing the size of the collection, d(exp) will be d(P), so as to minimize the sum of distances of all elements with respect to P. The second case is when P is true for all elements of C, which means that C->select(P)->size() = C->size().



In this case, to guide the search in decreasing the size of the collection, for reasons similar to the first case, we define d(exp) as d(not P). When 0 < C->select(P)->size() < C->size(), we can guide the search to either increase or decrease the size of the collection and thus define d(exp) as min(d(P), d(not P)).

For the case when the value of RelOp is equality (=), the rule is shown in the fourth row of Table 9. There are two important cases, which work in a similar way to the first and second cases reported in Table 9. The first case is when C->select(P)->size() > z(), where we need to decrease C->select(P)->size(), which can be achieved by minimizing (C->select(P)->size() - z()) + k + nor(d(not P)). The second case is when C->select(P)->size() < z(). For this case, we need to increase the number of elements in C for which P holds and must minimize (z() - C->select(P)->size()) + k + nor(d(P)).

Note that we only presented the formulas in Table 9 for the cases when the iterator operation considered, selectionOp, is select; however, the formulas can easily be extended to the other iterator operations. The collect operation works in the same way as select, and hence the formulas in Table 9 can simply be adapted by replacing select with collect. For instance, for the case when RelOp is > or >=, the formula for collect would be

d(exp) = (z() - C->collect(P)->size()) + k + nor(d(P)).

The reject operation works in a different way than select because it rejects all those elements for which a Boolean expression is true, but reject(P) can simply be transformed into select(not P).

In addition to the rules for an iterator followed by size(), we defined two new rules for when a select() is followed by forAll() or exists(), shown in Table 10. For example, C->select(P1)->forAll(P2) (the first row in Table 10) implies that for all elements of C for which P1 holds, P2 should also hold. In other words, P1 implies P2. Therefore, C->select(P1)->forAll(P2) can simply be transformed into C->forAll(P1 implies P2). Similarly, a select(P1) followed by an exists(P2) can simply be transformed into exists(P1 and P2). This means that there should be at least one element in C for which both P1 and P2 hold. Notice that a sequence of selects can simply be combined; for example, C->select(P1)->select(P2) is equivalent to C->select(P1 and P2).

The effectiveness of all these rules for calculating the branch distance is empirically evaluated in Section 7.

4.2.3 Tuples in OCL

In OCL, several different values can be grouped together using tuples. A tuple consists of different parts separated by commas, and each part specifies a value. Each value has an associated name and type. For example, consider the following example of a tuple in OCL:

Tuple{firstName = "John", age = 29}.

This tuple defines a String firstName of value "John" and an Integer age of value 29. Each value is accessed via its name. For example, Tuple{firstName = "John", age = 29}.age returns 29. No operations are allowed on tuples in OCL because they are not subtypes of OclAny. However, when a value in a tuple is accessed and compared, a branch distance is calculated based on the type of the value and the comparison operation used. For example, consider the following constraint:

Tuple{firstName: String = "John", age: Integer = 29}.age > 20.

In this case, since age is an Integer and the comparison operation is >, we use the branch distance calculation for numerical data for the case of > as defined in Table 5.

4.2.4 Special Cases

In this section, we will discuss branch distance calculations for some special cases, including enumerations and other special operations provided by OCL, such as oclInState.

4.2.5 Enumerations

Enumerations are data types in OCL that have a name and a set of enumeration literals. An enumeration can take any one of its enumeration literals as its value. Enumerations in OCL are treated in the same way as enumerations in programming languages such as Java. Because enumerations are objects with no specific order relation, equality comparisons are treated as basic Boolean expressions, whose branch distance is 0, a value between 0 and k, or k.
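As a minimal illustration (ours; the in-between values mentioned above arise only for undefined operands), the enumeration equality distance reduces to:

public final class EnumDistance {

    enum Color { RED, GREEN, BLUE }

    static final double K = 1.0;

    // Enumeration literals have no order relation, so the equality
    // distance is simply 0 on a match and k otherwise.
    static double equalsDistance(Color a, Color b) {
        return a == b ? 0.0 : K;
    }

    public static void main(String[] args) {
        System.out.println(equalsDistance(Color.RED, Color.RED));  // 0.0
        System.out.println(equalsDistance(Color.RED, Color.BLUE)); // 1.0
    }
}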


TABLE 10
Special Rules for select() Followed by forAll and exists

TABLE 9
Special Rules for select() Followed by size() When exp Is False


4.2.6 oclInState

The oclInState(s: OclState) operation returns true if an object is in the state represented by s; otherwise it returns false. This operation is valid in the context of UML state machines to determine whether an object is in a particular state of the state machine. OclState is a data type similar to an enumeration. This data type is only valid in the context of oclInState and is used to hold the names of all possible states of an object as enumeration literals. In this particular case, the states of an object are not precisely defined, i.e., each state of the object is uniquely identified based on the names of the states. For example, a class Light having two states, On and Off, is modeled as an enumeration with the two literals On and Off. In this example, s: OclState takes either the On or Off value, and the branch distance calculation is the same as for enumerations. However, if the states are defined as state invariants, which is a common way of defining states of a UML state machine as OCL constraints [2], then the branch distance is calculated based on two special cases, depending on whether or not we can directly set the state of an object by manipulating the state variables. Below, we discuss each case separately.

The first case is when the state of an object can be manipulated by directly setting its state-defining attributes (or properties) to satisfy a state invariant. In this case, state invariants, which are OCL constraints, can be satisfied by solving the constraints based on the heuristics defined in the previous sections. Note that each state in a state machine is uniquely identified by a state invariant and there is no overlap between the state invariants of any two states (strong state invariants [64]). For instance, in our industrial case study, we needed to emulate faulty situations in the environment for the purpose of robustness testing; these were modeled as OCL constraints defined on the properties of the environment. In this case, it was possible to directly manipulate the properties of the environment emulator on which the states of the environment are defined, and each state was uniquely identified by its state invariant. A simple example of such a state invariant for the environment is given below:

self.packetLoss.value > 5 and self.packetLoss.value <= 10.

The above state invariant defines a faulty situation in the environment where the value of packet loss in the environment is greater than 5 percent and less than or equal to 10 percent. This constraint can easily be solved using the heuristics defined in the previous sections, and the generated value of packetLoss can be set directly on the environment.

In the second case, when it is not possible to directly set the state of an object, the approach level heuristic [31] can be used in conjunction with the branch distance to make the object reach the desired state. We explain this case using a dummy example of a UML state machine shown in Fig. 4. The approach level calculates the minimum number of transitions in the state machine needed to reach the desired state from the closest executed state. For instance, in Fig. 4, if the desired state is S3 and currently we are in S1, then the approach level is 1. By calculating the approach level for the states that the object has reached, we can obtain the state that is closest to the desired state (i.e., the one with the minimum approach level). In our example, the closest state based on the approach level is S1. Now, the goal is to transition in the direction of the desired state to reduce the approach level to 0. This goal is achieved with the help of the branch distance. The branch distance is used to heuristically score the evaluation of the OCL constraints on the path from the current state to the desired state (e.g., guards on transitions leading to the desired state). The distance is calculated based on the heuristics defined in this paper. The branch distance is used to guide the search to find test data that satisfy these OCL constraints. An event corresponding to a transition can occur several times, but the transition is only triggered when the guard is true. The branch distance is calculated every time the guard is evaluated, to capture how close the values used are to solving the guard. In the example, we need to solve the guard "a > 0" so that whenever e4() is triggered we can reach S3. Since the guards are written in OCL, they can be solved using the heuristics defined in the previous sections. In the case of MBT, it is not always possible to calculate the branch distance when the related transition has never been triggered. In these cases, we assign to the branch distance its highest possible value. More details on this case can be found, for example, in [6].
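The two heuristics are typically combined into a single fitness value to be minimized, with the approach level dominating and the normalized branch distance breaking ties within a level; a sketch of this standard combination (our illustration, with nor(v) = v / (v + 1) assumed):

public final class StateFitness {

    static double nor(double v) { // assumed normalization into [0, 1)
        return v / (v + 1.0);
    }

    // Combined fitness to minimize when driving an object to a target state:
    // the approach level (transitions still to take) dominates, and the
    // normalized branch distance of the next unsolved guard breaks ties.
    static double fitness(int approachLevel, double branchDistance) {
        return approachLevel + nor(branchDistance);
    }

    public static void main(String[] args) {
        // Fig. 4: currently in S1, target S3, guard "a > 0" evaluated with a
        // hypothetical a = -4: approach level 1, d(a > 0) = 0 - (-4) + 1 = 5.
        System.out.println(fitness(1, 5.0)); // 1.833...
        System.out.println(fitness(0, 5.0)); // closer to S3, only the guard remains
    }
}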

4.2.7 Miscellaneous Operations

OCL defines several special operations that can be used with all types of objects: oclIsTypeOf(), oclIsKindOf(), oclIsNew(), oclIsUndefined(), and oclIsInvalid(). The oclIsTypeOf(t: Classifier) operation returns true if t and the object on which this operation is called have the same type. The oclIsKindOf(t: Classifier) operation returns true if t is either the direct type or one of the supertypes of the object on which the operation is called. The operation oclIsNew() returns true if the object on which the operation is called has just been created. These three operations are defined to check the properties of objects and, hence, are not used for test data generation; therefore, we do not explicitly define branch distance calculations for them. However, whenever these operations are used in constraints, the branch distance is calculated as follows: If the invocation of an operation evaluates to true, then the branch distance is 0; if the operation evaluates to false, then the branch distance is a value between 0 and k; and it is k when the operation evaluates to undefined. We deal with the oclIsUndefined() and oclIsInvalid() operations in the same way.

4.2.8 User-Defined Operations

Apart from the operations defined in the standard OCL library, OCL also provides a facility for users to define new operations. The pre and postconditions of these operations are written using OCL expressions and may call the standard OCL library operations. As we discussed in Section 4, we provide specialized branch distance calculations only for the operations defined in the standard OCL library.


Fig. 4. A dummy example to explain oclInState().


For user-defined operations, we calculate a branch distance according to the return types of these operations. If a user-defined operation returns a Boolean, then, to provide a more fine-grained fitness function, it is possible to use testability transformations on that operation as, for example, in search-based software testing of Java software [65]. In our tool, we have not implemented and evaluated this type of testability transformation, and further research would be needed to study its application to OCL. For any return type other than Boolean, we define a branch distance using the rules defined in Section 4.2.4. For instance, consider a user-defined OCL operation named operation1(), which is defined on a collection, returns a collection, and is used in the following constraint:

c1->operation1()->isEmpty().

In this case, the branch distance is calculated based on the heuristic for isEmpty() as defined in Section 4.2.2.

5 TOOL SUPPORT AND RUNNING EXAMPLE

In this section, we present our implementation of search-based test data generation and a running example to demonstrate how the generated data are used by our model-based testing tool TRUST [66] to generate executable test cases.

5.1 Tool Support

To efficiently generate test data from OCL constraints, we implemented the search-based approach described above. Fig. 5 shows the architecture diagram for our search-based test data generator tool, named Evolutionary Solver for OCL (EsOCL). EsOCL is developed in Java and interacts with an existing OCL evaluator library called EyeOCL Software (EOS) [38]. EOS is a Java component that provides APIs to parse and evaluate an OCL expression based on an object model. Our tool only requires interaction with EOS for the evaluation of constraints. We use EOS as it is one of the most efficient evaluators currently available; however, any other OCL evaluator could be used. EsOCL implements the calculation of the branch distance (DistanceCalculator) for the various expressions in OCL discussed in Section 4, which aims at determining how far the test data values are from satisfying the constraints. For a constraint, the search space is defined by those attributes that are used in the constraint. This is determined by statically parsing a constraint before solving it and improves the search efficiency in a similar fashion to the concept of input size reduction [67]. The search algorithms employed are implemented in Java as well and include a Genetic Algorithm, (1+1) Evolutionary Algorithm, and the Alternating Variable Method. Note that our implementation of the branch distance calculation corresponds to the OCL semantics as specified in [2]. As a result, any other OCL evaluator can be plugged into our implementation without affecting the branch distance calculation.
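To illustrate how the pieces fit together, the following is a highly simplified (1+1) EA loop in the spirit of EsOCL (our sketch only: the constraint evaluation that EsOCL delegates to EOS on an object model is abstracted here behind a distance function, and the mutation operator and variable domain are illustrative):

import java.util.Arrays;
import java.util.Random;
import java.util.function.ToDoubleFunction;

public final class OnePlusOneEA {

    // Minimize a branch distance over a vector of n integer variables.
    // Stops at a solution with distance 0 or after maxEvals evaluations.
    static int[] solve(int n, ToDoubleFunction<int[]> distance,
                       int maxEvals, Random rnd) {
        int[] parent = rnd.ints(n, -100, 101).toArray(); // illustrative start range
        double parentD = distance.applyAsDouble(parent);
        for (int eval = 1; eval < maxEvals && parentD > 0.0; eval++) {
            int[] child = parent.clone();
            for (int i = 0; i < n; i++) {
                if (rnd.nextDouble() < 1.0 / n) { // mutate each variable with prob. 1/n
                    child[i] += (int) Math.round(rnd.nextGaussian() * 100.0);
                }
            }
            double childD = distance.applyAsDouble(child);
            if (childD <= parentD) { // keep the child if it is not worse
                parent = child;
                parentD = childD;
            }
        }
        return parent;
    }

    public static void main(String[] args) {
        // Toy constraint from the running example's guard:
        // Number >= 4000 and Number <= 5000, with d = distance to the interval.
        ToDoubleFunction<int[]> d = v ->
                v[0] < 4000 ? 4000 - v[0] : (v[0] > 5000 ? v[0] - 5000 : 0.0);
        System.out.println(Arrays.toString(solve(1, d, 2000, new Random(42))));
    }
}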

5.2 Running Example of Test Data Generation

In this section, we present a running example to demonstrate how the generated test data are used in the test cases generated by our model-based testing tool, called TRUST [66]. Fig. 6 shows a class diagram for a very simplified version of the Video Conferencing System, with one class VideoConferencingSystem. This class has one attribute, NumberOfActiveCalls, which holds the number of current participants in a videoconference. The class has two operations: dial(Number: Integer), which is used to dial a particular VCS using an Integer Number (valid values range from 4,000 to 5,000), and disconnect(). The state machine in Fig. 6 models the behavior of the VCS, which dials another VCS using dial(), subject to the guard (Number >= 4,000 and Number <= 5,000) evaluating to true, before transiting to the Connected state. From Connected, when disconnect() is called, the VCS transits to Idle. Notice that in this example we need to solve the guard and generate data for Number. Each state in Fig. 6 has a state invariant written in OCL, which serves as a test oracle.

TRUST works in a series of steps to generate executable test cases, including test data. For test case generation, TRUST manipulates models in three sequential steps, as shown in Table 11 and Fig. 7. In the first step (implemented as a set of Kermeta [68] model-to-model transformations), TRUST flattens UML state machines with hierarchy and concurrency. Details and algorithms for this step can be found in [66]. In our running example, the state machine has neither hierarchy nor concurrency, so the input and output of this step are the same. In Step 2, the flattened state machine produced by Step 1 is converted into an instance of TestTree [66], representing a set of abstract test paths based on coverage criteria such as All Transitions and All States coverage. In our running example, using All States coverage, we obtain the abstract test path shown in Row 3 of the Output column in Table 11. Notice that the actual representation of an abstract test path is in XML, but to demonstrate the test case generation process and for better readability we show it as text. Step 2 is also implemented as a set of Kermeta transformations. In the third step (test case generation), each abstract test path is transformed into an executable test case using a set of model-to-text transformations implemented in MOFScript [69].


Fig. 5. Architecture diagram for the search-based test data generator.


For this step, as can be seen in Fig. 7, TRUST uses our test data generator by passing it the guard written in OCL (Number >= 4,000 and Number <= 5,000) and obtains a valid value for Number, which in our running example is 4,005, as shown in Line 3 of the test script in Row 3 of the Output column in Table 11. This value is used in dial() to dial another VCS in Line 4. Lines 2 and 6 of the test script check the current state of the VCS against the state invariant (an OCL constraint) passed as a String to checkState(). These state invariants are evaluated at runtime using EyeOCL [38], and the result of the evaluation (true or false) indicates whether the VCS is in the correct state.

6 CASE STUDY: ROBUSTNESS TESTING OF VIDEO CONFERENCE SYSTEM

This case study is part of a project aiming to support automated, model-based robustness testing of a core subsystem of a video conference system called Saturn [66], developed by Cisco Systems, Inc., Norway. Saturn is modeled as a UML class diagram meant to capture information about APIs and system (state) variables, which are required to generate executable test cases in our application context. The standard behavior of the system is modeled as a UML 2.0 state machine. In addition, we used Aspect-Oriented Modeling (AOM) and, more specifically, the AspectSM profile [70] to model robustness behavior separately as aspect state machines. The robustness behavior is modeled based on functional and nonfunctional properties which, if violated, may lead to erroneous states. Such properties can be related to the SUT or its environment, such as the network and other systems interacting with the SUT. A weaver subsequently weaves the robustness behavior into the standard behavior and generates a standard UML 2.0 state machine. Some details and models of the case study, including a partial woven state machine, are provided in [70]; however, for confidentiality reasons, we cannot provide further details. Interested readers may also visit the official website of the system we tested for more information [71]. The woven state machine produced by the weaver is used for test case generation. In the current, simplified case study, the woven state machine has 12 states and 103 transitions. Of these 103 transitions, only 83 model robustness behavior as change events, and 57 of these 83 have identical change conditions, including 42 constraints using the select() and size() operations. A change event is defined with a "when" keyword and an associated condition, and is triggered when this condition is met during the execution of the system. An example of such a change event is shown in Fig. 8. This change event is fired during a videoconference when the synchronization mismatch between audio and video passes the allowed threshold. The nonfunctional property synchronizationMismatch is defined using the MARTE profile [72] and measures the synchronization between audio and video over time. To traverse these transitions, appropriate test data are required that satisfy the constraints specified as guards and when-conditions (in the case of change events). The characteristics of these constraints, in terms of the number of conjuncted clauses and their frequency of occurrence, are reported in Table 12. Most constraints contain between six and eight clauses.


Fig. 6. A small running example for Video Conferencing System.

TABLE 11
Steps for Test Case Generation Using TRUST

Fig. 7. Integration of TRUST with EsOCL.


The different OCL data types used in these constraints are shown in Table 13, and we can see that most of the primitive types are used in our case study. Notice that in our industrial case study we needed to generate test data only for the guards and change events of UML state machines. Due to the nature of the case study, the number of objects of each class was fixed (i.e., a fixed upper limit for multiplicity) because objects correspond to actual hardware features of a VCS, such as audio and video channels.

In our case study, we target test data generation for model-based robustness testing of the VCS. Testing is performed at the system level, and we specifically target robustness faults, for example, those related to faulty situations in the network and other systems that comprise the environment of the SUT. Test cases are generated from the system state machines using the TRUST tool [66]. To execute test cases, we need appropriate data for the state variables of the system, the state variables of the environment (network properties and, in certain cases, state variables of other VCSs), and input parameters that may be used in the following UML state machine elements: 1) guard conditions on transitions, 2) change events as triggers on transitions, and 3) inputs to time events. We have successfully used the TRUST tool to generate test cases using different coverage criteria on UML state machines, such as all transitions, all round trip, and the modified round trip strategy [5].

6.1 Empirical Evaluation

This section discusses the experiment design, execution, and analysis of the evaluation of the proposed OCL test data generator on the VCS case.

6.1.1 Experiment Design

We designed our experiment using the guidelines proposed in [30], [73]. The objective of our experiment is to assess the efficiency of the selected search algorithms in generating test data by solving OCL constraints. In our experiments, we compared four search techniques: AVM, GA, (1+1) EA, and RS (Section 4). AVM was selected as a representative of local search algorithms. GA was selected because it is the most commonly used global search algorithm in search-based software engineering [30]. (1+1) EA is simpler than GA, but previous software testing work showed it can be more effective in some cases (e.g., see [63]). We used RS as the comparison baseline to assess the difficulty of the addressed problem [30].

From this experiment, we want to answer the following research questions.

RQ1. Are search-based techniques effective and efficient at solving OCL constraints?

RQ2. Among the considered search algorithms (AVM, GA, (1+1) EA), which one fares best in solving OCL constraints, and how do they compare to RS?

6.1.2 Experiment Execution

We ran experiments for 57 OCL expressions from the VCS industrial case study presented earlier. The number of clauses in each expression varies from one to eight, and the median value is 6. The characteristics of the constraints are summarized in Table 12, where we provide details on the distribution of the numbers of clauses. In Table 13, we summarize the data types used in the constraints.

Fitness evaluations are computationally expensive, as they require the instantiation of the models on which the constraints are evaluated. Each algorithm was run 100 times to account for the random variation inherent to randomized algorithms [74], which for our case study was sufficient to gain statistical confidence in the validity of our results. We ran each algorithm for up to 2,000 fitness evaluations per constraint and collected data on whether the algorithm found a solution. On our machine (Intel Core Duo CPU 2.20 GHz with 4 GB of RAM, running the Microsoft Windows 7 operating system), running 2,000 fitness evaluations takes on average 3.8 minutes for all algorithms. The number of fitness evaluations should not be so high as to prevent enough runs on all constraints within feasible time, but it should still represent a reasonable "budget" in an industrial setting (i.e., the time software testers are willing to wait when solving constraints to generate system-level test cases).

Though putting a limit on the number of fitness evaluations is necessary for experimental purposes, in practice one would instead put a limit on time, depending on practical constraints. This means we can run a search algorithm for as many iterations as possible and stop once a predefined time threshold is reached if the constraint has not yet been solved. For example, the choice of this threshold could be driven by the testing budget. However, though adequate in practice, in an experimental context using a time threshold would make it significantly more difficult and less reliable to compare different search algorithms (e.g., accurately monitoring the passing of time, side effects of other processes running at the same time, inefficiencies in implementation details).


Fig. 8. A constraint checking synchronization of audio and video in a videoconference.

TABLE 12
Characteristics of Constraints

TABLE 13
OCL Data Types Used in Constraints


A solution is represented as an array of variables, the same variables that appear in the OCL constraint we want to solve. We selected a steady-state GA with a population size of 100 and a crossover rate of 0.75, with a 1.5 bias for rank selection. We used a standard one-point crossover, and mutation of a variable is done with the standard probability 1/n, where n is the number of variables. Different settings would lead to different performance of a search algorithm, but standard settings usually perform well [74]. As we will show, because our search-based test data generator is already very effective in solving OCL constraints, we did not feel the need for tuning to improve the performance even further.

To compare the algorithms, we calculated their success rates. The success rate of an algorithm is defined as the number of times it was successful in finding a solution out of the total number of runs. In our context, it is the success rate in solving constraints.

6.1.3 Results and Analysis

Fig. 9 shows a box plot representing the success rates on the 57 problems for AVM, (1+1) EA, GA, and RS. For each search technique, the box plot is based on 57 success rates, one for each constraint. The results show that AVM not only outperformed the other three algorithms, i.e., (1+1) EA, RS, and GA, but in addition achieved a consistent success rate of 100 percent. (1+1) EA outperformed GA and RS and achieved an average success rate of 98 percent. Finally, GA outperformed RS, where GA and RS achieved average success rates of 65 and 49 percent, respectively. We can observe that, with an upper limit of 2,000 iterations, (1+1) EA achieves a median success rate of 98 percent and GA exceeds a median of roughly 80 percent, whereas RS could not exceed a median of roughly 45 percent. We can also see that all success rates for (1+1) EA are above 90 percent and most of them are close to 100 percent.

Table 14 shows the success rates for the individual problems to further analyze the results. We observe that problems 42 to 56 were solved by all the algorithms. The reason is that these problems are composed of just one or two clauses, as can be seen from the number-of-conjuncted-clauses column in Table 15. The problems with a higher number of clauses are the most difficult to solve for GA and RS, as shown in Table 15. As the number of conjuncted clauses increases, the success rates of GA and RS decrease. However, in the case of AVM and (1+1) EA, we do not see a similar pattern. AVM managed to maintain an average success rate of 100 percent even for problems with higher numbers of conjuncted clauses. In the case of (1+1) EA, the minimum average success rate (95 percent) is for the problems with seven clauses. Based on these results, we can see that our approach is effective and efficient, and therefore practical, even for the difficult constraints (RQ1) that are likely to be encountered in industrial systems.

To check the statistical significance of the results, we carried out a paired Mann-Whitney U-test (paired per constraint) at the significance level (α) of 0.05 on the distributions of the success rates for the four algorithms. In all four distribution comparisons, the p-values were very close to 0, as shown in Table 16. This shows a strong statistical significance in the differences among the four algorithms when applied to all 57 constraints in our case study. In addition, we performed a Fisher's exact test with α = 0.05 between each pair of algorithms based on their success rates for the 57 constraints. The results for the Fisher's exact test are shown in Tables 17 and 18. In addition to statistical significance, we also assessed the magnitude of the improvement by calculating the effect size in a standardized way. We used the odds ratio [73] for this purpose, as the results of our experiments are dichotomous. Tables 17 and 18 also show the odds ratios for the various pairs of approaches on all 57 problems. For AVM versus (1+1) EA, we did not observe practically significant differences for most of the problems, except for Problem 2 and Problem 33, where AVM performed significantly better than (1+1) EA. In addition, the odds ratios between AVM and (1+1) EA for 23 problems are greater than 1, implying that AVM has a higher chance of success than (1+1) EA. For 35 problems out of 57, the odds ratio is 1, suggesting that there is no difference between these two algorithms. For AVM versus GA, on 38 problems AVM performed significantly better than GA, as the p-values are below 0.05. The odds ratios for most of the problems, except for the problems with one or two clauses, are greater than one, thus suggesting that AVM has a higher chance of success than GA. Similar results were observed for (1+1) EA, which significantly outperformed GA on 38 problems. For AVM versus RS, on almost all of the problems except the ones with one or two clauses, AVM performed significantly better than RS. Similar results were observed for (1+1) EA versus RS and GA versus RS.

Based on the above results, we recommend using AVM for as many iterations as possible (RQ2). We can see from the results that, even when we set the number of iterations to 2,000, AVM managed to achieve a 100 percent success rate with 26 iterations on average. Note that in case studies with more complex problems, a larger number of iterations may be required to eventually solve the problems.

6.2 Comparison with UMLtoCSP

UMLtoCSP [14] is the most widely used and referenced OCL constraint solver in the literature. To assess the ability of UMLtoCSP to solve complex constraints such as the ones in our current industrial case study, we conducted an experiment.


Fig. 9. Success rates for various algorithms.


We repeated the experiment for the 57 constraints from our industrial application, whose characteristics are summarized in Table 12. An example of such a constraint, modeling a change event on a transition of Saturn's state machine, is shown in Fig. 10. This change event is fired when Saturn is successful in recovering the synchronization between audio and video. Since UMLtoCSP does not support enumerations, we converted each enumeration into an Integer and limited its bounds to the number of literals in the enumeration. We also used the MARTE profile to model different nonfunctional properties, and because UMLtoCSP does not support UML profiles, we explicitly modeled the used subset of MARTE as part of our models.


TABLE 14
Success Rates for Individual Problems


In addition, UMLtoCSP does not allow writing constraints on inherited attributes of a class, so we modified our models and modeled the inherited attributes directly in the classes. We set the range of Integer attributes from 1 to 100. Since the UMLtoCSP tool does not support UML 2.x diagrams, we also needed to recreate our models in a UML 1.x modeling tool.

We ran the experiment on the same machine as used in the experiments reported in the previous section. Though we let UMLtoCSP attempt to solve each of the selected constraints for one hour, it was not successful in finding any valid solution for the 42 problems comprising five to eight clauses. A plausible explanation is that UMLtoCSP is hampered by a combinatorial explosion due to the complexity of the constraints in the model. However, such constraints must be expected in real-world industrial applications, as our case study is not in any way particularly complex by industrial standards. In contrast, our test data generator managed to solve each constraint within at most 2.96 seconds using AVM and 99 seconds using (1+1) EA, as shown in Table 20. For the remaining 15 problems, comprising either one or two clauses, UMLtoCSP managed to find solutions. Each of these constraints has one variable of either Integer or Boolean type. The results of the comparison of UMLtoCSP with our tool for these simple clauses (problems 42-56) are shown in Table 19. We provide the time taken by UMLtoCSP to solve each problem in seconds, which is reported by the tool itself and is 0.01 second (maximum precision) for all 15 constraints. For these same 15 problems, we ran our tool 100 times and report in Table 19 the average time it took to solve each problem over the 100 runs.


TABLE 15
Average Success Rates for Problems of Varying Characteristics

TABLE 16
Results for the Paired Mann-Whitney U-Test at Significance Level of 0.05

TABLE 17
Results for Fisher's Exact Test


Since we used the same machine to run the experiments for both tools, it is clear that for all fifteen problems comprising one or two clauses, UMLtoCSP took less time than our tool (which on average takes less than one second and in the worst case less than two seconds). But considering that UMLtoCSP fails to solve the more complex problems, and given its previously discussed issues regarding limited support of OCL constructs, we conclude that it is not practical to apply UMLtoCSP to large systems having complex constraints.

7 EMPIRICAL EVALUATION OF OPTIMIZED FITNESS FUNCTIONS

In this section, we empirically evaluate the fine-grained fitness functions that we defined in Section 4 for various OCL operations. Our goal is to determine whether they really improve the performance of search algorithms as compared to using simple branch distance functions, yielding 0 if an expression is true and k otherwise.

7.1 Experiment Design

To empirically evaluate whether the functions defined in Section 4 really improve the branch distance, we carefully defined artificial problems to evaluate each heuristic, because not all of the OCL constructs were used in the industrial case study. The model we used for the experiment consists of a very simple class diagram with one class X. X has one attribute y of type Integer. The range of y was set to -100 to 100. We populated 10 objects of class X. The use of a single class with 10 objects was sufficient to create complex constraints. For each heuristic, we created an artificial problem that was sufficiently complex (with a small solution space) to remain unsolved by random search. We checked this by running all the artificial problems (100 times per problem) using random search for 20,000 iterations per problem.


TABLE 18
Results for Fisher's Exact Test

Fig. 10. Condition for a change event which is fired when synchronization between audio and video is within threshold.


Random search could not solve most of the problems most of the time, except for problems A9 and A10. Table 21 lists the artificial problems and the corresponding heuristics that we used in the experiments. We prefixed each problem with A to denote that it is an artificial problem. For the evaluation, we used the best search algorithms among the ones used in the industrial case study (Section 5 and other works [63]): (1+1) EA and AVM. In this experiment, we address the following research question:

RQ3. Do optimized branch distance calculations improve the effectiveness of search over simple branch distance calculations?

To answer this research question, we compared branch distance calculations based on the heuristics defined in Section 4 with calculations without these heuristics (i.e., branch distance calculations that simply return 0 when a constraint is solved and k = 1 otherwise). We refer to them here as Optimized (Op) and Nonoptimized (NOp) branch distance calculations, respectively.

7.2 Experiment Execution

We ran the experiments 100 times for (1+1) EA and AVM, with Op and NOp, for each problem listed in Table 21. We let (1+1) EA and AVM run up to 2,000 fitness evaluations on each problem and collected data on whether the algorithms found solutions with Op and NOp. We used a PC with an Intel Core Duo CPU 2.20 GHz and 4 GB of RAM, running the Microsoft Windows 7 operating system, for the execution of the experiment. To compare the algorithms for Op and NOp, we calculated the success rate, which is defined as the number of times a solution was found out of the total number of runs (100 in this case).

7.3 Results and Analysis

Table 22 shows the success rates of Op and NOp for each problem and both algorithms ((1+1) EA and AVM). To assess whether the differences in success rates between Op and NOp are statistically significant, we performed Fisher's exact test [75] with α = 0.05. We chose this test because for each run the result is binary, i.e., either "found" or "not found", and this is exactly the situation for which Fisher's exact test is defined. We performed the test only for the problems having success rates greater than 0 and not equal to each other for both Op and NOp (i.e., for problems A2, A3, and A4 in the case of (1+1) EA, and problem A9 for AVM). For (1+1) EA, the p-values for all three problems (A2, A3, and A4) are 0.0001, thus indicating that the success rate of Op is significantly higher than that of NOp. For problems A1, A5, A6, A7, A8, A12, and A13, the results are even more extreme, as Op shows a 100 percent success rate whereas NOp has a 0 percent success rate. For problems A9 and A10, the success rates are 100 percent for both Op and NOp. For these problems, we further compared the number of iterations taken by (1+1) EA with Op and NOp to solve the problems. We used the Mann-Whitney U-test [75], with α = 0.05, to determine whether significant differences exist between Op and NOp. We chose this test based on recent guidelines for performing statistical tests for randomized algorithms [73]. Table 23 shows the results of the test, and p-values are bold-faced when the results are significant. In Table 23, we also show the mean differences in the number of iterations and execution time between Op and NOp. In addition, we report effect size measurements using Vargha and Delaney's A12 statistic, which is a nonparametric effect size measure suggested in [73]. In our context, the value of A12 gives the probability that Op finds a solution in more iterations than NOp. This means that the higher the value of A12, the higher the chance that Op will take more iterations to find a solution than NOp. If Op and NOp are equal, then the value of A12 is 0.5. With (1+1) EA, for A9 and A10, Op took significantly fewer iterations to solve the problems (Table 23), as both p-values are below 0.05. In addition, for A9 and A10, the values of A12 are 0.19 and 0.46, respectively, thus showing that Op took more iterations to solve the problem than NOp only 19 and 46 percent of the time.
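For reference, the A12 statistic used above can be computed directly from two samples: it estimates the probability that a value drawn from the first sample exceeds one drawn from the second, counting ties as one half (a small sketch of ours with hypothetical data):

public final class VarghaDelaneyA12 {

    // A12 = (#{(a, b) : a > b} + 0.5 * #{(a, b) : a == b}) / (m * n):
    // the probability that a value from the first sample exceeds one
    // from the second, counting ties as one half.
    static double a12(double[] first, double[] second) {
        double wins = 0.0;
        for (double a : first) {
            for (double b : second) {
                if (a > b) {
                    wins += 1.0;
                } else if (a == b) {
                    wins += 0.5;
                }
            }
        }
        return wins / (first.length * (double) second.length);
    }

    public static void main(String[] args) {
        double[] op  = {10, 12, 11}; // hypothetical iteration counts for Op
        double[] nop = {15, 14, 16}; // hypothetical iteration counts for NOp
        System.out.println(a12(op, nop)); // 0.0: Op never takes more iterations
    }
}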


TABLE 20
Average Time Taken by Algorithms to Solve the Problems

TABLE 19
Results of Comparison with UMLtoCSP


For AVM, the success rates for A10 were tied between Op and NOp (Table 22). Therefore, we further compared Op and NOp for this problem based on the number of iterations AVM took to solve it. As discussed before, we applied the Mann-Whitney U-test [75] with α = 0.05 to determine whether significant differences exist between Op and NOp. Table 24 shows the mean differences, p-values, and A12 values. We observed that Op took fewer iterations, and the difference was significant for A10.

Based on the above results, we can clearly see that (1+1) EA and AVM with optimized branch distance calculations significantly improve the success rates of their nonoptimized counterparts. In the worst cases, when there is no difference in success rates between Op and NOp, (1+1) EA and AVM took significantly fewer iterations to solve the problems.

8 OVERALL DISCUSSION

In this section, we reflect on the results obtained with the industrial case study and the artificial problems. In the industrial case study, we observe that AVM and (1+1) EA perform better than GA and RS.


TABLE 23
Results of t-Test ((1+1) EA)

TABLE 22
Results of Fisher Exact Test Comparing Success Rates of Optimized (Op) and Nonoptimized (NOp)

TABLE 21
Artificial Problems for Heuristics

TABLE 24
Results of t-Test (AVM)


These two algorithms achieve average success rates of 100 and 98 percent, respectively, across all 57 constraints (Section 6.1). For the experiments based on artificial problems (Section 7.3), we observe that AVM and (1+1) EA with optimized branch distance calculations outperform their counterparts with nonoptimized branch distance calculations. However, we notice that for certain artificial problems the performance of (1+1) EA is significantly better than that of AVM. For instance, in Table 22 for A2, (1+1) EA manages to find solutions in all 100 runs, whereas AVM could only do so in 59 runs. We performed a Fisher's exact test to determine whether the differences between these two algorithms are statistically significant (α = 0.05). We obtained a p-value of 0.001, suggesting that (1+1) EA is significantly better than AVM for A2. Since AVM is a local search algorithm and A2 is a complex problem, AVM can be expected to be less efficient than (1+1) EA. Similar results are obtained for A12. Conversely, for other problems (A3 and A11), AVM seems more successful than (1+1) EA. For A3, AVM manages to find solutions 99 times, whereas (1+1) EA manages to find solutions 95 times (Table 22). In this case, we obtain a p-value of 0.21 when we apply Fisher's exact test, suggesting that the differences between the two algorithms are not statistically significant. Similarly, for A11, AVM found solutions 99 times, whereas (1+1) EA found solutions 94 times (Table 22). In this case, we obtain a p-value of 0.11, again suggesting that the differences between these two algorithms are not significant.

Based on the results of our empirical analysis, we provide the following recommendations for using AVM and (1+1) EA. In the presence of severe time constraints, we recommend using AVM because it is quicker in finding solutions, as we discussed in Section 7, even though its performance was worse than (1+1) EA on two artificial problems. If the time budget is flexible (e.g., the constraints need to be solved only once, and the cost of doing so is negligible compared to the other costs in the testing phase), we rather recommend running (1+1) EA for as many iterations as possible. The success rate of (1+1) EA was 98 percent on average for the industrial case study, whereas on the artificial problems it either fares better than or equal to AVM.

The difference in performance between AVM and (1+1) EA has a clear explanation. AVM works as a sort of greedy local search. If the fitness function provides a clear gradient toward the global optima, then AVM will quickly converge to one of them. On the other hand, (1+1) EA puts more focus on the exploration of the search landscape. When there is a clear gradient toward global optima, (1+1) EA is still able to reach those optima in reasonable time, but it will spend some time exploring other areas of the search space. This latter property becomes essential in complex landscapes, where there are many local optima. In these cases, AVM gets stuck and has to restart from other points in the search landscape. On the other hand, (1+1) EA, thanks to its mutation operator, always has a nonzero probability of escaping from local optima.

9 THREATS TO VALIDITY

To reduce construct validity threats, we chose as an effectiveness measure the search success rate, which is comparable across all four search algorithms (AVM, (1+1) EA, GA, and RS). Furthermore, we used the same stopping criterion for all algorithms, i.e., the number of fitness evaluations. This criterion is a comparable measure of efficiency across all the algorithms because each iteration requires updating the object diagram in EyeOCL and evaluating a query on it, as discussed in Sections 6.1 and 7.1.

The most probable conclusion validity threat in experiments involving randomized algorithms is due to random variation. To address it, we repeated the experiments 100 times to reduce the possibility that the results were obtained by chance. Furthermore, we performed Fisher's exact tests to compare proportions and determine the statistical significance of the results. We chose this test since it is appropriate for dichotomous data where proportions must be compared [73], thus matching our situation. To determine the practical significance of the results obtained, we measured the effect size using the odds ratio of success rates across search techniques.

A possible threat to internal validity is that we experimented with only one configuration setting for the GA parameters. However, these settings are in accordance with common guidelines in the literature and our previous experience with testing problems. Parameter tuning can improve the performance of GA, although default parameters often provide reasonable results [74].

We ran our experiments on an industrial case study to generate test data for 57 different OCL constraints, ranging from constraints having just one clause to complex constraints having eight clauses. Although the empirical analysis is based on a real industrial system, our results might not generalize to other case studies. However, such a threat to external validity is common to all empirical studies, and such industrial case studies are nevertheless very rarely reported in the research literature. In addition to the industrial case study, we also conducted an empirical evaluation of each proposed branch distance calculation using small yet complex artificial problems to demonstrate that the effectiveness of our heuristics holds even for more complex problems. Empirically evaluating all the proposed branch distance calculations on artificial problems was also necessary because it was not possible to evaluate them for all OCL features in the industrial case study, due to its inherent characteristics.

In the empirical comparison with UMLtoCSP, we might also have wrongly configured the tool. To reduce the probability of such an event, we contacted the authors of UMLtoCSP, who were very helpful in ensuring its proper use. From our analysis of UMLtoCSP, we cannot generalize our results to traditional constraint solvers in general when applied to solving OCL constraints. However, empirical comparisons with other constraint solvers were not possible because UMLtoCSP is not only the most referenced OCL solver but also the only one that is publicly available. In addition, because the problems encountered with UMLtoCSP are due to the translation to a lower-level constraint language, we expect to face similar issues with other constraint solvers.

10 CONCLUSION

In this paper, we apply search techniques and dedicated heuristics to solve constraints written in the Object Constraint Language, the standard constraint language that is part of the Unified Modeling Language, in order to generate test data.



The goal is to achieve a practical, scalable solution to support test data generation for model-based testing when it relies on the UML modeling standard or its extensions, as is often the case in industrial contexts. Existing approaches in the literature have one or more of the following limitations that make them difficult to use in industrial applications: 1) they miss key features of OCL, such as collections, which are common in industrial problems; 2) they translate OCL into formalisms such as first-order logic, temporal logic, or Alloy, and thus often face combinatorial explosion problems that limit their practical adoption in industrial settings.

To overcome the above-mentioned problems, we defined a set of heuristics based on OCL constraints to guide search-based algorithms (Genetic Algorithm, (1+1) Evolutionary Algorithm, Alternating Variable Method) and implemented them in our search-based test data generator. More specifically, we defined branch distance functions for various types of expressions in OCL to effectively guide search algorithms. We demonstrated the effectiveness and efficiency of our search-based test data generator in the context of model-based robustness testing of an industrial case study: a video conferencing system. Even for the most difficult constraints, with research prototypes and no parallel computation, we obtain test data in about three seconds on average.
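To give a flavor of such functions, the sketch below shows classic Korel/Tracey-style branch distances for numeric predicates, with the standard conjunction, disjunction, and normalization rules (cf. [63]); this is a minimal illustration, with operator names and the constant K chosen by us, not the paper's actual implementation:

    K = 1.0  # penalty added when a relational predicate is not yet satisfied

    def bd(op, a, b):
        # Branch distance for a numeric predicate "a <op> b": 0 when the
        # predicate holds, otherwise a value that shrinks as the inputs
        # approach satisfaction, giving the search a gradient to follow.
        if op == "=":
            return 0.0 if a == b else abs(a - b) + K
        if op == "<>":
            return 0.0 if a != b else K  # no gradient: satisfied or not
        if op == "<":
            return 0.0 if a < b else (a - b) + K
        if op == "<=":
            return 0.0 if a <= b else (a - b) + K
        if op == ">":
            return bd("<", b, a)
        if op == ">=":
            return bd("<=", b, a)
        raise ValueError("unknown operator: " + op)

    def bd_and(d1, d2):
        return d1 + d2       # both clauses must be driven to 0

    def bd_or(d1, d2):
        return min(d1, d2)   # satisfying the closer clause suffices

    def normalize(d):
        return d / (d + 1.0)  # map distances into [0, 1)

    # Example: an OCL-like guard "x > 10 and y <= 5" evaluated at x=3, y=9.
    fitness = normalize(bd_and(bd(">", 3, 10), bd("<=", 9, 5)))

A search algorithm then minimizes this fitness: any move of x toward values above 10, or of y toward values at most 5, strictly reduces the distance.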

As a comparison, we ran 57 constraints from the industrial case study on one well-known, downloadable OCL solver (UMLtoCSP); the results showed that, even after running it for one hour, no solutions could be found for most of the constraints. As with all existing OCL solvers, it could not handle all OCL constructs and UML features, so we had to transform our constraints to satisfy UMLtoCSP's requirements.

We also conducted an empirical evaluation in which we compared four search algorithms, using two statistical tests to assess differences between algorithms for each of the constraints: 1) Fisher's exact test to compare success rates and 2) a paired Mann-Whitney U-test to compare the distributions of success rates (paired per constraint). Results showed that AVM was significantly better than the other three search algorithms, followed by (1+1) EA, GA, and random search, respectively. We also empirically evaluated each proposed branch distance calculation using small yet complex artificial problems. These results showed that the proposed branch distance calculations significantly improve the performance of solving OCL constraints for the purpose of test data generation, when compared to standard, simpler branch distance calculations. Based on the results of our empirical analyses, we recommend using AVM under stringent time constraints because it finds solutions more quickly, despite performing worse than (1+1) EA on two complex artificial problems with difficult search landscapes. Otherwise, when the time budget is flexible (e.g., the constraints need to be solved only once), we recommend running (1+1) EA for as many iterations as possible, since (1+1) EA achieved a 98 percent success rate on average on the industrial case study and fared better than or equal to AVM on the artificial problems.
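As an illustration of the paired analysis, the sketch below (with invented per-constraint success rates, not the paper's data) realizes the per-constraint pairing with a Wilcoxon signed-rank test, the usual paired counterpart of the Mann-Whitney U-test; that this matches the paper's exact procedure is our assumption:

    from scipy.stats import wilcoxon

    # Success rates per constraint for two algorithms, paired by constraint
    # (illustrative values only; one entry per constraint).
    avm    = [1.00, 0.98, 0.95, 1.00, 0.90, 0.99, 0.97, 0.92]
    one_ea = [0.97, 0.99, 0.88, 1.00, 0.85, 0.98, 0.96, 0.94]

    # Paired, non-parametric test over the per-constraint differences.
    statistic, p_value = wilcoxon(avm, one_ea)
    print(f"W = {statistic}, p = {p_value:.3f}")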

ACKNOWLEDGMENTS

The work described in this paper was supported by the Norwegian Research Council. This paper was produced as part of the ITEA-2 project called VERDE. The authors thank Marius Christian Liaaen (Cisco Systems, Inc., Norway) for providing them with the case study. They are also grateful to Jordi Cabot, an author of UMLtoCSP, for helping them run UMLtoCSP on the authors' industrial case study. L.C. Briand was supported in part by a PEARL grant from the Fonds National de la Recherche, Luxembourg.

REFERENCES

[1] M. Utting and B. Legeard, Practical Model-Based Testing: A Tools Approach. Morgan Kaufmann, 2007.

[2] OCL, Object Constraint Language Specification, Version 2.2, http://www.omg.org/spec/OCL/2.2/, 2011.

[3] MOF, Meta Object Facility (MOF), http://www.omg.org/spec/MOF/2.0/, 2006.

[4] T. Yue, L. Briand, B. Selic, and Q. Gan, "Experiences with Model-Based Product Line Engineering for Developing a Family of Integrated Control Systems: An Industrial Case Study," Technical Report 2012-06, Simula Research Laboratory, 2012.

[5] S. Ali, L. Briand, A. Arcuri, and S. Walawege, "An Industrial Application of Robustness Testing Using Aspect-Oriented Modeling, UML/MARTE, and Search Algorithms," Proc. 14th ACM/IEEE Int'l Conf. Model Driven Eng. Languages and Systems, 2011.

[6] A. Arcuri, M.Z. Iqbal, and L. Briand, "Black-Box System Testing of Real-Time Embedded Systems Using Random and Search-Based Testing," Proc. IFIP Int'l Conf. Testing Software and Systems, 2010.

[7] M.Z. Iqbal, A. Arcuri, and L. Briand, "Environment Modeling with UML/MARTE to Support Black-Box System Testing for Real-Time Embedded Systems: Methodology and Industrial Case Studies," Proc. 13th Int'l Conf. Model Driven Eng. Languages and Systems, pp. 286-300, 2010.

[8] CertifyIt, http://www.smartesting.com/, 2011.

[9] QTronic, http://www.conformiq.com/qtronic.php, 2010.

[10] L. van Aertryck and T. Jensen, "UML-Casting: Test Synthesis from UML Models Using Constraint Resolution," Proc. Approches Formelles dans l'Assistance au Développement de Logiciels, 2003.

[11] M. Benattou, J. Bruel, and N. Hameurlain, "Generating Test Data from OCL Specification," Proc. ECOOP Workshop Integration and Transformation of UML Models, 2002.

[12] L. Bao-Lin, L. Zhi-shu, L. Qing, and C.Y. Hong, "Test Case Automate Generation from UML Sequence Diagram and OCL Expression," Proc. Int'l Conf. Computational Intelligence and Security, 2007.

[13] S. Ali, M.Z. Iqbal, A. Arcuri, and L. Briand, "A Search-Based OCL Constraint Solver for Model-Based Test Data Generation," Proc. 11th Int'l Conf. Quality Software, pp. 41-50, 2011.

[14] J. Cabot, R. Clarisó, and D. Riera, "Verification of UML/OCL Class Diagrams Using Constraint Programming," Proc. IEEE Int'l Conf. Software Testing Verification and Validation Workshop, 2008.

[15] A.P. Mathur, Foundations of Software Testing. Pearson Education, 2008.

[16] A. Bertolino, "Software Testing Research: Achievements, Challenges, Dreams," Proc. Future of Software Eng., 2007.

[17] M. Harman and B.F. Jones, "Search-Based Software Engineering," Information and Software Technology, vol. 43, pp. 833-839, 2001.

[18] J. Clarke, J.J. Dolado, M. Harman, R. Hierons, B. Jones, M. Lumkin, B. Mitchell, S. Mancoridis, K. Rees, M. Roper, and M. Shepperd, "Reformulating Software Engineering as a Search Problem," IEE Proc. Software, vol. 150, no. 3, pp. 161-175, May/June 2003.

[19] M. Harman, A. Mansouri, and Y. Zhang, "Search Based Software Engineering: Trends, Techniques and Applications," ACM Computing Surveys, vol. 45, article 11, 2012.

[20] R. Drechsler and N. Drechsler, Evolutionary Algorithms for Embedded System Design. Kluwer Academic Publishers, 2002.

[21] D.A. Coley, An Introduction to Genetic Algorithms for Scientists and Engineers. World Scientific Publishing Company, 1997.

[22] W. Afzal, R. Torkar, and R. Feldt, "A Systematic Review of Search-Based Testing for Non-Functional System Properties," Information and Software Technology, vol. 51, pp. 957-976, 2009.


[23] T. Mantere and J.T. Alander, "Evolutionary Software Engineering, a Review," Applied Soft Computing, vol. 5, pp. 315-331, 2005.

[24] P. McMinn, "Search-Based Software Test Data Generation: A Survey," Software Testing, Verification and Reliability, vol. 14, p. 52, 2004.

[25] J.T.d. Souza, C.L. Maia, F.G.d. Freitas, and D.P. Coutinho, "The Human Competitiveness of Search Based Software Engineering," Proc. Second Int'l Symp. Search Based Software Eng., 2010.

[26] G. Fraser and A. Arcuri, "EvoSuite: Automatic Test Suite Generation for Object-Oriented Software," Proc. ACM Symp. Foundations of Software Eng., pp. 416-419, 2011.

[27] G. Fraser and A. Arcuri, "Sound Empirical Evidence in Software Testing," Proc. ACM/IEEE Int'l Conf. Software Eng., 2012.

[28] E.K. Burke and G. Kendall, Search Methodologies: Introductory Tutorials in Optimization and Decision Support Techniques. Springer, 2006.

[29] A. Arcuri, M.Z. Iqbal, and L. Briand, "Random Testing: Theoretical Results and Practical Implications," IEEE Trans. Software Eng., vol. 38, no. 2, pp. 258-277, Mar./Apr. 2012.

[30] S. Ali, L.C. Briand, H. Hemmati, and R.K. Panesar-Walawege, "A Systematic Review of the Application and Empirical Investigation of Search-Based Test Case Generation," IEEE Trans. Software Eng., vol. 36, no. 6, pp. 742-762, Nov./Dec. 2010.

[31] P. McMinn, "Search-Based Software Test Data Generation: A Survey: Research Articles," Software Testing, Verification and Reliability, vol. 14, pp. 105-156, 2004.

[32] S. Droste, T. Jansen, and I. Wegener, "On the Analysis of the (1+1) Evolutionary Algorithm," Theoretical Computer Science, vol. 276, pp. 51-81, 2002.

[33] B. Korel, "Automated Software Test Data Generation," IEEE Trans. Software Eng., vol. 16, no. 8, pp. 870-879, Aug. 1990.

[34] D. Chiorean, M. Bortes, D. Corutiu, C. Botiza, and A. Carcu, "OCLE," v2.0 ed., 2010.

[35] Model Development Tools, http://www.eclipse.org/modeling/mdt/?project=ocl, 2012.

[36] Dresden OCL, http://www.dresden-ocl.org/index.php/DresdenOCL, 2012.

[37] UML-Based Specification Environment (USE), http://sourceforge.net/apps/mediawiki/useocl/index.php?title=The_UML-based_Specification_Environment, 2012.

[38] M. Egea, "EyeOCL Software," 2010.

[39] B. Bordbar and K. Anastasakis, "UML2Alloy: A Tool for Lightweight Modelling of Discrete Event Systems," Proc. IADIS Int'l Conf. Applied Computing, 2005.

[40] D. Distefano, J.-P. Katoen, and A. Rensink, "Towards Model Checking OCL," Proc. ECOOP Workshop Defining Precise Semantics for UML, 2000.

[41] M. Clavel and M.A.G.d. Dios, "Checking Unsatisfiability for OCL Constraints," Proc. Ninth OCL Workshop at the UML/MoDELS Conf., 2009.

[42] B.K. Aichernig and P.A.P. Salas, "Test Case Generation by OCL Mutation and Constraint Solving," Proc. Fifth Int'l Conf. Quality Software, 2005.

[43] D. Berardi, D. Calvanese, and G.D. Giacomo, "Reasoning on UML Class Diagrams," Artificial Intelligence, vol. 168, pp. 70-118, 2005.

[44] J. Winkelmann, G. Taentzer, K. Ehrig, and J.M. Küster, "Translation of Restricted OCL Constraints into Graph Constraints for Generating Meta Model Instances by Graph Grammars," Electronic Notes in Theoretical Computer Science, vol. 211, pp. 159-170, 2008.

[45] M. Kyas, H. Fecher, F.S.d. Boer, J. Jacob, J. Hooman, M.v.d. Zwaag, T. Arons, and H. Kugler, "Formalizing UML Models and OCL Constraints in PVS," Electronic Notes in Theoretical Computer Science, vol. 115, pp. 39-47, 2005.

[46] M.P. Krieger, A. Knapp, and B. Wolff, "Automatic and Efficient Simulation of Operation Contracts," Proc. Ninth Int'l Conf. Generative Programming and Component Eng., 2010.

[47] S. Weißleder and B.-H. Schlingloff, "Deriving Input Partitions from UML Models for Automatic Test Generation," Models in Software Eng., pp. 151-163, Springer-Verlag, 2008.

[48] M. Gogolla, F. Büttner, and M. Richters, "USE: A UML-Based Specification Environment for Validating UML and OCL," Science of Computer Programming, vol. 69, pp. 27-34, 2007.

[49] A.D. Brucker, M.P. Krieger, D. Longuet, and B. Wolff, "A Specification-Based Test Case Generation Method for UML/OCL," Proc. Int'l Conf. Models in Software Eng., 2011.

[50] W. Ahrendt, T. Baar, B. Beckert, M. Giese, E. Habermalz, R. Hähnle, W. Menzel, and P.H. Schmitt, "The KeY Approach: Integrating Object Oriented Design and Formal Verification," Proc. European Workshop Logics in Artificial Intelligence, 2000.

[51] D. Jackson, I. Schechter, and H. Shlyahter, "Alcoa: The Alloy Constraint Analyzer," Proc. 22nd Int'l Conf. Software Eng., 2000.

[52] M. Krieger and A. Knapp, "Executing Underspecified OCL Operation Contracts with a SAT Solver," Proc. Eighth Int'l Workshop OCL Concepts and Tools, 2008.

[53] C. Doungsa-ard, K. Dahal, A. Hossain, and T. Suwannasart, "GA-Based Automatic Test Data Generation for UML State Diagrams with Parallel Paths," Advanced Design and Manufacture to Gain a Competitive Edge, pp. 147-156, Springer, 2008.

[54] R. Lefticaru and F. Ipate, "Functional Search-Based Testing from State Machines," Proc. Int'l Conf. Software Testing, Verification, and Validation, 2008.

[55] M. Alshraideh and L. Bottaci, "Search-Based Software Test Data Generation for String Data Using Program-Specific Search Operators: Research Articles," Software Testing, Verification and Reliability, vol. 16, pp. 175-203, 2006.

[56] P. McMinn, M. Shahbaz, and M. Stevenson, "Search-Based Test Input Generation for String Data Types Using the Results of Web Queries," Proc. Fifth IEEE Int'l Conf. Software Testing, Verification and Validation, 2012.

[57] G. Fraser and A. Arcuri, "The Seed Is Strong: Seeding Strategies in Search-Based Software Testing," Proc. Fifth IEEE Int'l Conf. Software Testing, Verification and Validation, 2012.

[58] G. Fraser and A. Arcuri, "It Is Not the Length That Matters, It Is How You Control It," Proc. Fourth IEEE Int'l Conf. Software Testing, Verification and Validation, 2011.

[59] G. Fraser and A. Arcuri, "Whole Test Suite Generation," IEEE Trans. Software Eng., vol. 39, no. 2, pp. 276-291, Feb. 2013.

[60] S. Poulding, J.A. Clark, and H. Waeselynck, "A Principled Evaluation of the Effect of Directed Mutation on Search-Based Statistical Testing," Proc. Fourth IEEE Int'l Conf. Software Testing, Verification and Validation Workshops, 2011.

[61] S. Yoo, M. Harman, and S. Ur, "Highly Scalable Multi Objective Test Suite Minimisation Using Graphics Cards," Proc. Third Int'l Conf. Search Based Software Eng., 2011.

[62] M. Harman, S.A. Mansouri, and Y. Zhang, "Search Based Software Engineering: A Comprehensive Analysis and Review of Trends, Techniques and Applications," Technical Report TR-09-03, King's College, 2009.

[63] A. Arcuri, "It Really Does Matter How You Normalize the Branch Distance in Search-Based Software Testing," Proc. Third Int'l Conf. Software Testing, Verification and Reliability, 2011.

[64] R.V. Binder, Testing Object-Oriented Systems: Models, Patterns, and Tools. Addison-Wesley Longman Publishing Co., Inc., 1999.

[65] H. Li and G. Fraser, "Bytecode Testability Transformation," Proc. Symp. Search Based Software Eng. Co-Located with ESEC/FSE, 2011.

[66] S. Ali, H. Hemmati, N.E. Holt, E. Arisholm, and L.C. Briand, "Model Transformations as a Strategy to Automate Model-Based Testing—A Tool and Industrial Case Studies," technical report, Simula Research Laboratory, 2010.

[67] P. McMinn, "Input Domain Reduction through Irrelevant Variable Removal and Its Effect on Local, Global, and Hybrid Search-Based Structural Test Data Generation," IEEE Trans. Software Eng., vol. 38, no. 2, pp. 453-477, Mar./Apr. 2012.

[68] Kermeta, Kermeta—Breathe Life into Your Metamodels, http://www.kermeta.org/, 2011.

[69] F. Lindlar and A. Windisch, "A Search-Based Approach to Functional Hardware-in-the-Loop Testing," Proc. Second Int'l Symp. Search Based Software Eng., 2010.

[70] S. Ali, L.C. Briand, and H. Hemmati, "Modeling Robustness Behavior Using Aspect-Oriented Modeling to Support Robustness Testing of Industrial Systems," Software and Systems Modeling, vol. 11, pp. 633-670, 2012.

[71] Cisco C90, http://www.cisco.com/en/US/products/ps11330/index.html, 2012.

[72] MARTE, Modeling and Analysis of Real-Time and Embedded Systems (MARTE), http://www.omgmarte.org/, 2010.

[73] A. Arcuri and L. Briand, "A Practical Guide for Using Statistical Tests to Assess Randomized Algorithms in Software Engineering," Proc. Int'l Conf. Software Eng., 2011.

[74] A. Arcuri and G. Fraser, "On Parameter Tuning in Search Based Software Engineering," Proc. Int'l Symp. Search Based Software Eng., 2011.


[75] D.J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures. Chapman and Hall/CRC, 2007.

Shaukat Ali is currently a research scientist in the Certus Software Verification and Validation Center, Simula Research Laboratory, Norway. He has been affiliated with the Simula Research Laboratory since 2007 and has been involved in many industrial and research projects related to model-based testing (MBT) and empirical software engineering since 2003. He has experience working in several industries and academic research groups in many countries, including the United Kingdom, Canada, Norway, and Pakistan. He is a member of the IEEE Computer Society.

Muhammad Zohaib Iqbal received the PhD degree in software engineering from the University of Oslo, Norway, in 2012. He is currently an assistant professor in the Department of Computer Science, National University of Computer & Emerging Sciences (FAST-NU), Islamabad, Pakistan, and a research fellow at the Software Verification & Validation Lab, Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg. Before joining FAST-NU, he was a research fellow at the Simula Research Laboratory, Norway. He was a lecturer in the Department of Computer Science, International Islamic University, Islamabad, Pakistan, and the Department of Computer Science, Mohammad Ali Jinnah University, Pakistan. His research interests include model-driven engineering, software testing, and empirical software engineering. He has been involved in research projects in these areas since 2004. He is a member of the IEEE Computer Society.

Andrea Arcuri received the BSc and MSc degrees in computer science from the University of Pisa, Italy, in 2004 and 2006, respectively, and the PhD degree in computer science from the University of Birmingham, England, in 2009. He is a senior software engineer in the oil and gas industry, while holding a part-time position as an adjunct research scientist at the Simula Research Laboratory, Norway. His research interests include search-based software testing and randomized algorithms. He is a member of the IEEE Computer Society.

Lionel C. Briand is a professor and FNR PEARL Chair in Software Verification and Validation at the SnT Centre for Security, Reliability, and Trust, University of Luxembourg. He started his career as a software engineer in France (CS Communications & Systems) and has conducted applied research in collaboration with industry for more than 20 years. Until moving to Luxembourg in January 2012, he headed the Certus Center for Software Verification and Validation at the Simula Research Laboratory, where he led applied research projects in collaboration with industrial partners. Before that, he was on the faculty of the Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada, where he was a full professor and held the Canada Research Chair (Tier I) in Software Quality Engineering. He has also been the Software Quality Engineering Department head at the Fraunhofer Institute for Experimental Software Engineering, Germany, and was a research scientist for the Software Engineering Laboratory, a consortium of the NASA Goddard Space Flight Center, CSC, and the University of Maryland. He has been on the program, steering, or organization committees of many international IEEE and ACM conferences. He is the coeditor-in-chief of Empirical Software Engineering (Springer) and is a member of the editorial boards of Systems and Software Modeling (Springer) and Software Testing, Verification, and Reliability (Wiley). He was on the editorial board of the IEEE Transactions on Software Engineering from 2000 to 2004. He was recently given the IEEE Computer Society Harlan Mills Award for his work on model-based verification and testing. His research interests include model-driven development, testing and verification, search-based software engineering, and empirical software engineering. He was elevated to the grade of IEEE Fellow for his work on the testing of object-oriented systems.
