[ieee 2012 ninth international conference on information technology: new generations (itng) - las...

Using Reverse-Engineered Test-Based Models to Generate More Tests: Where is the sense in that?

Teemu Kanstrén¹, Eric Piel², Hans-Gerhard Gross²

¹VTT, Finland; ²TU Delft, The Netherlands

Abstract

Model-based testing is a process of generating tests for a test target based on a behavioral model. In many cases, suitable models do not exist and building these from scratch is a costly effort. A potential approach to bootstrap this process is to generate models with reverse-engineering methods. Potential sources of information for such models include user sessions, existing test cases, and field data. All of these can be seen as different forms of test data. However, using existing test cases as a basis to create a model for test generation raises the question of what is the benefit over the previously existing tests. This paper aims to answer this question by evaluating the benefits in terms of a practical case study.

Keywords: model-based testing, test generation, reverse-engineering, specification mining.

1. Introduction

Generating models from observations made about software behavior, which is typically referred to as specification mining, has become a popular research area in software engineering. Models are increasingly used in various domains of software engineering, pushed by the attempts to increase abstraction and the adoption of practices such as model-driven design and testing. Providing means to generate models therefore has a large potential impact. This can be useful in cases where suitable models are not available such as legacy systems, or adoption of new techniques in later development phases. One specific area of application is that of model-based testing (MBT), where high-level models are used as a basis to automatically generate test cases with the help of MBT tools.

MBT can have several definitions. This paper follows Utting and Legeard [1] who describe MBT as “Generation of test cases with oracles from a behavioural model”. The model describes the expected behaviour of the system under test (SUT), and is used by an MBT tool in order to generate test cases in a form suitable for the test target, such as method invocation sequences and input data. The SUT output is checked by the test oracles also encoded into the model.

Specification mining techniques as discussed in this paper are practically reverse-engineering techniques that generate models of the system behavior based on a set of input data. The focus in this paper is on input data acquired through the means of dynamic analysis, meaning data observed during the execution of a program. Different sources of information can be used as a basis for specification mining in this context, such as user sessions, recorded field data, and existing test cases. These can all be seen as different forms of test cases, with user sessions as a form of exploratory testing, and field data as a form of record-replay regression tests.

This paper focuses especially on specification mining of models suitable for MBT. As MBT is a test generation technique, using such reverse-engineered models as a basis for further test generation typically raises the question whether it makes any sense to generate tests based on existing tests. The rationale for this question is that there are already tests that cover the model since the model is generated based on such tests. The added value of more similar tests in this context is seen as questionable.

This paper aims to answer this question and to show how it can still be useful to bootstrap the modeling process for MBT based on existing tests. We discuss this in terms of case studies in applying specification mining on components of sensor network systems and using the resulting models as a means to bootstrap the MBT process.

The rest of the paper is structured as follows. Section 2 describes various approaches to specification mining that have been applied in the context of test automation. Section 3 presents the case study. Section 4 discusses the concept in general. Finally, conclusions end the paper.

2012 Ninth International Conference on Information Technology - New Generations

978-0-7695-4654-4/12 $26.00 © 2012 IEEE

DOI 10.1109/ITNG.2012.42

2. Related Works

As noted before, various approaches have been proposed for observation-based specification-mining. Various approaches have also been proposed for using these as a basis for testing and verification. This section presents a brief overview of such works.

As an example of applying observations to manually produce test cases, Ducasse [2] and De Roover [3] have created regression tests in the form of manually crafted queries over behavioral traces. This type of approach thus does not generate tests automatically but uses the observations about system behavior as a basis for manual test creation.

Lo et al. [4] have mined temporal rules (patterns over control-flow) from observations. The rules express a model in terms of a specific premise to be followed by a specific consequence over time, basically in terms of events in the control-flow. These mined patterns can be used to help understand the behavior of the system and as an input to form test cases that describe the expected sequence of the events.

Lorenzoli et al. [5] generate a state-machine based on the observed behavior of the SUT. This is based on merging the observed control-flow patterns to form a state-machine describing the possible orderings of the observed events. This is further augmented with data-flow invariants describing the overall state of the SUT when an event is observed. These invariants provide guard statements that define when one of the events in the state-machine can happen. Lorenzoli et al. use the generated model to optimize the existing test suite in terms of coverage of this model [5], and to observe changes in different SUT versions [6]. Although they do not apply this approach in MBT, this type of a model is very similar to that commonly used in MBT for test generation. This is also the type of a model used in the case study discussed in this work, slightly modified to optimize it for MBT use.

One of the main pieces required to make a generated model suitable for test execution, and missing in the previous works discussed above, is the link from the generated model back to the SUT. This is needed to execute test cases against the SUT, and is commonly referred to as the test harness. Related work in this domain is the generation of mock objects by Tillmann and Schulte [7], who decompose the observed system behaviour of a large test case execution into expectations for a mock object, in order to isolate the tested component from its environment in the context of specific, more focused tests. In our case study, this has been extended to generalize over all the generated tests as a part of the test model, instead of a single unit test (going in the opposite direction, from a few small ones to one large one). However, the principles are the same.

While this paper discusses specification mining in general and its application to provide suitable input for generating new test cases, a step further would be to do this in a domain specific way. This would be in line with our previous work on using domain-specific models as a basis for test modeling and generation [8]. However, here the focus is on the generic applicability of the specification mining as input for MBT.

Some examples of domain specific specification mining approaches can be found in the user interface (UI) domain. For example, Mesbah and van Deursen [9] generate a state-machine representation of the user-interface based on observed interactions between the UI elements and the UI state changes as expressed by UI properties. Similarly, Memon [10] discusses this in the general UI specification context.

In these works, Mesbah and van Deursen [9] apply test oracles as manually defined checks over the UI elements and their expected interactions with other elements as invariants. The test oracles can be seen as the final element needed to produce executable test cases from generated models, and in this paper the observed control-flow and data-flow patterns are taken as a basis for the initial test oracles encoded into the models.

3. Specification Mining for MBT

The previous section presented a number of techniques suitable for specification mining as applied in the context of software testing. As none of these targets MBT, a novel adaptation of them is needed to answer our question. This section shows one way to combine these techniques to produce a suitable model for test generation using an MBT tool. A case study is used to demonstrate what is needed to produce a useful result. This includes generating a model based on test cases and showing its use as a basis for further test generation and verification.

Examples from the sensor-networks domain are used to illustrate this concept. The main features of different components in this example are listed in Table 1. In addition to these, there are features related to node configurations and other support functionality. These details are not presented, as the ones shown in the table are the main functionality needed to present our case study.

Table 1. Components and their functionality.

Component   Functionality
Sensor      Register with the server
            Provide measurement data to the server
            Unregister from the server
Client      Register with the server
            Subscribe to sensor data on the server
            Unsubscribe from subscribed data
Server      Submit measurement requests to sensors
            Forward measurement values to clients
            Provide sensor availability information to clients


3.1 Model generation for MBT

As noted, this adaptation applies concepts from many of the previous works to produce a suitable specification-mining approach. Similar to Lorenzoli et al. [5], a set of control-flow patterns is merged to form the basics of a generic state machine describing the overall behavior of the SUT. This is similarly augmented with data-flow invariants describing the overall state of the SUT when an event is observed. Together these form the basics of a state machine for MBT, where the test steps are represented by the transitions in the mined state-machine (i.e. the control-flow).
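As a rough illustration of merging observed control-flow into a state machine, the following Python sketch records, for each event, which events were seen directly after it in any trace. It is a deliberate simplification and not the algorithm of Lorenzoli et al. [5]; the event names, the traces, and the function name merge_traces are hypothetical.

```python
from collections import defaultdict

START = "<start>"  # hypothetical marker for the initial state


def merge_traces(traces):
    """Merge event traces into a map: event -> set of observed successor events."""
    transitions = defaultdict(set)
    for trace in traces:
        previous = START
        for event in trace:
            transitions[previous].add(event)
            previous = event
    return dict(transitions)


# Hypothetical traces recorded while running two existing test cases.
traces = [
    ["register_sensor", "register_client", "subscribe"],
    ["register_client", "register_sensor", "subscribe", "unsubscribe"],
]
model = merge_traces(traces)
```

Each key of the resulting map can be read as a state and each successor as a transition, that is, a candidate test step. Orderings that were never observed are simply absent from the model.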

Considering this pure control-flow aspect, each of the nine functional features shown in Table 1 can be represented as its own test step (as transitions in the test model). They are also modeled as application programming interface (API) calls in each of the related components, making them suitable candidates for specification mining through these APIs. Mining these is as simple as parsing the API definition or recording the types of messages passed between components. We do not need complex tools for that.

To make this more interesting, we have to add input data to the potential events. To do this, we capture the parameter values for the messages passed between the components. From all the values passed between the different components, we mine the data-flow invariants for the parameter values observed. Examples of such invariants describe the parameters always being in a given set (e.g., always having a value of 1, 2, or 5) or always being within a given range (e.g. 1-100). From this, we get a set of potential messages passed between the components and their parameter value variation.
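The two kinds of invariants mentioned above (set membership and value range) can be sketched in a few lines of Python. This is only an illustration under assumed names and data; it is not how Daikon or any other mining tool is implemented, and the threshold max_set_size is an arbitrary assumption.

```python
def mine_invariants(values, max_set_size=3):
    """Return simple likely invariants for a list of observed numeric values."""
    distinct = sorted(set(values))
    invariants = {"range": (distinct[0], distinct[-1])}
    # Report a "one of" invariant only when few distinct values were seen.
    if len(distinct) <= max_set_size:
        invariants["one_of"] = distinct
    return invariants


# Hypothetical parameter values captured from messages between components.
observed = {
    "sensor_id": [1, 2, 5, 2, 1],
    "interval": [10, 55, 100, 1, 73],
}
mined = {name: mine_invariants(vals) for name, vals in observed.items()}
```

Here sensor_id yields both a range and a small value set, while interval yields only a range because too many distinct values were observed. A test generator could then draw input data from these sets and ranges.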

What is still needed is a definition of the valid test step sequences that can be generated by an MBT tool. This requires evaluating the points in time and SUT state-space when each test step can be taken. In order to do this, we observe the global state of the SUT at the point when each test step (i.e. API message) is observed. As the global state is composed of data, we apply the same data-flow invariant mining to analyse this state and its relation to the different test steps. However, in this case the set of invariants is different, as the relevant ones describe the observed relations between different elements and not the set or range of values observed.

For example, to make a successful subscription from a client to a server, the server must have both a registered client and a registered sensor available, and the identifiers of the client and the sensor must match one of those in the list of registered clients and sensors. Thus the relevant invariants for us describe the availability of a given type of global state when a message is observed and the relation of the message parameters to the content of this specific state data.

These elements practically define our simple specification mining approach as adapted from the previous works. They can be summarized as:

• API messages form test steps
• Presence of elements in global SUT state define when a test step can be taken
• Data-flow invariants over message parameters form input data for state-independent test steps
• Message parameters relations to global state define parameters for state-dependent steps

Here an independent test step refers to a test step (or its parameter) that has no dependency on the global SUT state. Dependent, on the other hand, refers to cases where such a dependency exists. The approach to evaluating whether one is state-dependent is simply to check if the relevant invariants hold against the global state during the observations. If so, the miner considers it to be a state-dependent variable and produces suitable model elements. If a checked invariant does not hold, nothing is generated.
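The state-dependency check described above can be sketched as follows, assuming each observation pairs the message parameters with a snapshot of the global state taken when the message was seen. The data layout, the identifiers, and the function names are hypothetical and only serve to illustrate the membership invariant.

```python
def is_state_dependent(observations, param, state_var):
    """True if `param` was a member of `state_var` in every observation."""
    return all(msg[param] in state[state_var] for msg, state in observations)


def mine_relations(observations):
    """List (parameter, state variable) pairs whose membership invariant held."""
    params = observations[0][0].keys()
    state_vars = observations[0][1].keys()
    return sorted(
        (p, v)
        for p in params
        for v in state_vars
        if is_state_dependent(observations, p, v)
    )


# Hypothetical observations of "subscribe" messages and the global state.
observations = [
    ({"client_id": "c1", "sensor_id": "s1", "interval": 10},
     {"clients": {"c1"}, "sensors": {"s1", "s2"}}),
    ({"client_id": "c2", "sensor_id": "s2", "interval": 5},
     {"clients": {"c1", "c2"}, "sensors": {"s1", "s2"}}),
]
relations = mine_relations(observations)
```

Parameters for which no relation holds (interval in this example) produce no model element here and would be treated as state-independent, with their input data coming from the set and range invariants instead.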

In order to provide the test harness that makes it possible to evaluate the test model against the SUT, the same procedures as above were applied but with a reverse view. In this case, the environment of the tested component (the server) is modeled from the viewpoint of its collaborators (the sensor API and the client API) with a similar approach. From the observations, mock objects are generated similar to [7], and used to enable the testing of the tested component. Figure 1 illustrates this, where the arrows show how the test model produces input for the SUT, and evaluates the output based on a model reverse-engineered from observations.

Figure 1. Test harness.

Figure 2 shows an example of a step executed from a model exhibiting the types of properties described above. Here, the circle contains the elements of global state-space at a point in time. The arrow between the two states describes a transition, or more precisely a test step which invokes the message to subscribe to a sensor. The part above the arrow defines the conditions when the test step can be taken.

[Figure 2: state {Clients = 1, Sensors = 4, Messages = 7, Subs. = 3}, transition "Subscribe to sensor" with guard [Clients > 0 & Sensors > 0], resulting state {Clients = 1, Sensors = 4, Messages = 7, Subs. = 4}]

Figure 2. Control-flow example.


This example part of a model has been generated based on the observation that every time a successful subscription is made by a client to a sensor, there is always a registered client and a registered sensor in the SUT global state. This forms the guard above the arrow. The parameters for the test step are then based on the observation that the client id is always one that is present in the “Clients” state variable, and the sensor id is always one that is present in the “Sensors” state variable. We can further create a test oracle noting that the number of subscriptions must increase after subscription. The MBT tool will then generate test cases with steps matching these definitions, where at least one client and sensor must be registered, the subscribing client id and the subscribed sensor id must be one of the registered ones, and the number of subscriptions must increase. The number of messages is in this case not relevant for this test step but could be relevant for others, for example, when the existence of a message (possibly with specific properties) must prompt a specific reply.
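To make the roles of the guard, the state-dependent parameters, and the oracle concrete, the sketch below encodes the subscription step by hand and runs a trivial generator over it. FakeServer is a hypothetical stand-in for the SUT, and none of the names correspond to the API of a real MBT tool or to the system used in the case study.

```python
import random


class FakeServer:
    """Hypothetical stand-in for the SUT; not the server from the case study."""

    def __init__(self):
        self.clients, self.sensors, self.subscriptions = ["c1"], ["s1", "s2"], []

    def subscribe(self, client_id, sensor_id):
        self.subscriptions.append((client_id, sensor_id))


def subscribe_guard(sut):
    # Guard mined from observations: Clients > 0 and Sensors > 0.
    return len(sut.clients) > 0 and len(sut.sensors) > 0


def subscribe_step(sut, rng):
    # Parameters are drawn from the global state, as the mined relations require.
    client_id, sensor_id = rng.choice(sut.clients), rng.choice(sut.sensors)
    before = len(sut.subscriptions)
    sut.subscribe(client_id, sensor_id)
    # Oracle: the number of subscriptions must increase.
    assert len(sut.subscriptions) == before + 1, "subscription count did not increase"
    return ("subscribe", client_id, sensor_id)


def generate_test(sut, steps=3, seed=0):
    """Take the step whenever its guard holds and record what was executed."""
    rng = random.Random(seed)
    executed = []
    for _ in range(steps):
        if subscribe_guard(sut):
            executed.append(subscribe_step(sut, rng))
    return executed


test_case = generate_test(FakeServer())
```

With a single step the generator is trivial; with several steps, each with its own guard, the same loop would pick among the currently enabled steps and thereby produce interaction sequences that do not appear in the original tests.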

3.2 Does it make any sense?

Simply having a test model available does not really take us far in testing, nor does it answer the question asked originally: does this approach make any sense for MBT? In order to be able to answer this question, this model was applied to testing of the same sensor network system, with the help of an MBT tool. This subsection presents the experiences in this process.

The generated models will not be complete, nor do we think one should aim for them to be. Complete here means the ability to immediately execute the models with the help of an MBT tool and receive no errors. The model generation should form a basis to bootstrap the modeling and testing process, not be an end in itself. If the models were made complete, we expect they would foster in people the desire to claim they apply MBT and all their tests pass.

In our opinion and experience, if the models are not evaluated by a human expert, any mismatches between the SUT and the expectations of different stakeholders will still be missed. Discovering such issues is one of the key benefits of testing, and a higher-level model such as this provides good means to perform such evaluations. This is even further emphasized by the ability to execute the model in terms of generating test cases from it with the MBT tool and running them against the SUT. Luckily, any non-trivial system is unlikely to have a complete model produced by a specification-mining tool, for the reason that automatically figuring out domain meta-data for data-types and similar advanced concepts without expert input is very difficult.

A specific process is needed to apply such models. In our experience this process takes the following form:

• Generate an initial model
• Set all guards to false, defining that no steps are allowed in the model
• Pick a test step that can be executed
• Enable that step in the test model. This requires evaluating when the model should allow it
• Evaluate the enabled steps against the specification
• Fix any errors observed
• Generate and execute tests from the model
• Compare the results against the specification
• Fix any errors and failures observed
• Iterate from the second point with another step

This process starts from an initial point where all steps are disabled, and proceeds iteratively in small pieces. This makes it possible to handle even complex models as the expert is not required to take it all in at one time. It also illustrates why a non-complete model is good from our perspective. It practically forces one to go through this process of evaluation against the specification to find any missing details and mismatches. The process also demonstrates why an MBT tool is good for this type of evaluation, as the expectations practically end up being encoded in the test model and are quickly verified against the SUT by generating and executing test cases from the model.
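The mechanics of starting with all guards set to false and enabling one reviewed step at a time can be sketched as below. The class BootstrapModel and the guards are hypothetical; the expert review itself is not modeled, only the switching of guards.

```python
class BootstrapModel:
    """Hypothetical container for mined steps and their guards."""

    def __init__(self, mined_guards):
        self.mined_guards = dict(mined_guards)
        # Start with every guard forced to False: no step is allowed.
        self.active_guards = {name: (lambda state: False) for name in mined_guards}

    def enable(self, step, guard=None):
        # Enable one reviewed step, optionally with an expert-corrected guard.
        self.active_guards[step] = guard or self.mined_guards[step]

    def allowed_steps(self, state):
        return sorted(s for s, g in self.active_guards.items() if g(state))


# Hypothetical guards as they might come out of the mining phase.
mined = {
    "register_client": lambda st: True,
    "register_sensor": lambda st: True,
    "subscribe": lambda st: st["clients"] > 0 and st["sensors"] > 0,
}
model = BootstrapModel(mined)
state = {"clients": 1, "sensors": 4}

initially_allowed = model.allowed_steps(state)
model.enable("register_client")   # first iteration of the process
after_first = model.allowed_steps(state)
model.enable("subscribe")         # a later iteration
after_second = model.allowed_steps(state)
```

Between the calls to enable, the expert would compare the enabled step against the specification, generate and run tests, and fix any errors before moving to the next step.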

As we can see from this process, the test model is not used as is to generate test cases but as a means to bootstrap the modeling and verification process and as a basis for further test generation.

The process has the potential to reveal various types of errors and failures. From our experience, the causes of errors and failures discovered with this process can be classified into missing functionality, mismatches between the implementation and the specification, and design problems that cause failures under specific conditions.

In our case study, problems in the design were related to assumptions the SUT made about its environment. It would expect to find only one connection per client, whereas it was possible to make several. This was made visible by the new interaction sequences generated by the MBT tool based on the test model.

Mismatches between the implementation and specification were revealed as the MBT tool again generated various interaction sequences with different parameter values, revealing wrong assumptions made by the SUT implementation, such as always returning “ok” for client sensor data subscriptions regardless of the given parameters. This type of error is due to ambiguity in the specifications and the resulting different interpretations of them.

Missing implementation details were discovered by comparing the generated and visualized control-flow of the state-machine with the natural language specification. Since a model generated from observed executions cannot contain data not observed (since it is not implemented) this requires careful comparison to the written specification and expert knowledge.

It is not always cost-effective to implement this process. Our implementation required instrumentation of the relevant APIs, adding access to SUT global state, and workarounds for the specification-mining tool limitations. Adding all this customized support can be a costly effort.

In summary, looking at these different types of issues, we can draw several conclusions about the question of whether it makes sense to use reverse-engineered test models to generate more tests. First of all, just running a test generator tool on a model based on an existing test set has little if any power to help detect new faults. The MBT tool does generate new control-flow and data-flow combinations by varying the combinations found in the previous test cases, and these can reveal new failures. However, without expert review and iterative evaluation of the generated model, we can say that practically none of the issues we discovered would have been discovered.

On the other hand, in our experience the approach provides a good basis for supporting the expert in bootstrapping an MBT process. We have created numerous models for various real industrial systems in different MBT notations, and in our experience creating this type of initial model from scratch takes longer, especially in terms of finding the relevant information. Having an initial model as a tool to aid in communication can be a big asset in such a situation.

We find that generally the most useful aspects are the automated provisioning of model elements, including the test steps, the global state variables, and the parameters and their relations. In many cases, just seeing how little of the state-space is covered by existing tests can be a revelation, providing additional motivation for going forward with MBT. Finally, we also note that in our case a significant customization effort was required, and more generic solutions would help make this process more cost-effective.

4. Discussion

From our viewpoint, specification mining for MBT can take place from two different viewpoints. One is to directly reverse-engineer the test cases by analyzing their structure. This requires that the test cases are defined in a formal way, where test steps and data values can be uniquely identified. Where this is possible, it can provide a direct way to mine a specification of the overall test logic. However, this can only use formally defined test cases as input, and input such as exploratory testing and field data is ruled out, leading also to more limited model outcomes.

The broader approach applied in this paper is to observe the external API of the SUT and use it to define the possible test steps. In this case, each API method defines a potential test step. This also allows use of information such as field data or manual user test sessions as input for specification mining.

As we described our experiences on application of the approach in Section 3, we also briefly discussed the observed limitations in terms of tools and interfaces. In relation to our initial question of whether this type of an approach can make any sense, these limitations are not necessarily inherent to the approach itself but to the environment where it is executed.

The specific limitations we observed were related to the available observations over the execution, the required access interfaces for global state, and the available specification mining tools. Most systems are not instrumented by default to provide a systematic set of observations. When such observations are found useful in general (e.g. to support also other features), the instrumentation can be built in. In other cases, external single-point approaches can be investigated. For example, in the case of networked systems, the observations can be made from a communication bus. However, these still require integration with the input formats of the specification-mining tools, which we found lacking.

Similar to the observations over messages passed (the control-flow events and their parameters), access to the global state can also be problematic for the same reasons. Commonly there is no other need to build such support into the system, and in some cases it could even be considered a vulnerability to allow access to internal state. In practical situations this requires careful consideration of when and how such things are implemented, contributing to the cost-factor.

For specification mining, while some quite advanced tools exist, our experience was that they were also quite limited in their support for complex specifications. The main tool we used was the Daikon data-flow invariant mining tool [11]. However, it is designed to address a need to mine invariants over single program points (e.g. invocation of an API message). In cases where the interest is in the interaction between several elements, the tool required customization and the results were sub-optimal. What we found necessary is the ability to analyse invariants over several program points (e.g. client API vs. server API vs. sensor API). The problem with providing such support is the huge number of possibilities an automated tool would generate. Thus, again expert assistance is required.

One property we have not discussed is the type of behavior observed and mined. For example, some input may trigger error handling functionality and some will proceed along the nominal execution path. These can be seen as specific cases of control-flow and as such should be supported as is by specification-mining tools such as those described in this paper. However, we only applied the approach on nominal flow.

Overall, it should be noted that our approach was perhaps specific to the domain in question. In this case, the adoption of the techniques was performed by iteratively trying the specification mining tools, analyzing the output for the most relevant results, and fine tuning the generator to those results. In the case of the observed limitations, proper tool support would also need to be developed through application in various domains, and generalization of the experiences of the usefulness of the different approaches in different contexts. A likely result would then be also domain-specific rules and patterns that could make the approach more cost-effective to apply.

5. Conclusions

In this paper, we have demonstrated how reverse-engineered test models may not be useful in generating more test cases as such but can be useful in bootstrapping the modeling process as an aid for a domain expert. While we have demonstrated that the approach in general can be useful, much work remains to produce really useful specification mining approaches for generating suitable models for MBT. The main point we demonstrated is that, while the approach can make sense in theory, making it make sense in practice still requires improvements in cost-effectiveness. As one of the main points, this includes addressing issues in specification-mining of more complex scenarios.

In our future work, we will seek cost-effective ways to make use of mined information. This is likely to start with providing generic partial inputs for chosen aspects, such as component properties, and extending from there as more advanced and suitable specification mining approaches are available. Our initial work in this direction has been in the development of the Open Source Modelling Objects (OSMO) toolset, available as open source [12]. This includes both a basic MBT tool and a simple specification mining tool. How far that progresses depends on the different domains and case studies where we apply this, and the experiences we observe in these cases.

6. References

[1] M. Utting and B. Legeard, Practical Model-Based Testing: A Tools Approach. Morgan Kaufmann, 2007.

[2] S. Ducasse, T. Gîrba, and R. Wuyts, "Object-Oriented Legacy System Trace-Based Logic Testing," in European Conf. on Software Maintenance and Reeng. (CSMR), 2006.

[3] C. D. Roover, I. Michiels, K. Gybels, K. Gybels, and T. D'Hondt, "An Approach to High-Level Behavioral Program Documentation Allowing Lightweight Verification," in Proc. 14th Int'l. Conf. on Program Comprehension (ICPC'06), 2006.

[4] D. Lo, S-C. Khoo, and C. Liu, "Mining Temporal Rules for Software Maintenance," Journal of Software Maintenance and Evolution: Research and Practice, vol. 20, no. 4, pp. 227-247, 2008.

[5] D. Lorenzoli, L. Mariani, and M. Pezzè, "Automatic Generation of Software Behavioral Models," in Int'l. Conf. on Software Eng. (ICSE), 2008, pp. 501-510.

[6] L. Mariani and M. Pezzè, "Dynamic Detection of COTS Component Incompatibility," IEEE Software, vol. 24, no. 5, pp. 77-85, September/October 2007.

[7] N. Tillmann and W. Schulte, "Mock-Object Generation with Behaviour," in Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering, Tokyo, Japan, 2006, pp. 365-368.

[8] O-P. Puolitaival, T. Kanstrén, V-M. Rytky, and A. Saarela, "Utilizing Domain-Specific Modelling for Software Testing," in 3rd International Conference on Advances in System Testing and Validation Lifecycle (VALID2011), 2011.

[9] A. Mesbah and A. van Deursen, "Invariant-Based Testing of Ajax User Interfaces," in Int'l. Conf. on Software Eng. (ICSE), 2009.

[10] A. M. Memon, "An Event-Flow Model of GUI-based Applications for Testing," Journal of Software Testing, Verification and Reliability, vol. 17, pp. 137-157, 2007.

[11] M. D. Ernst, J. Cockrell, W. G. Griswold, and D. Notkin, "Dynamically Discovering Likely Program Invariants to Support Program Evolution," IEEE Transactions on Software Eng., vol. 27, no. 2, pp. 99-123, Feb. 2001.

[12] T. Kanstrén. (2011, October) Open Source Modelling Objects (OSMO). [Online]. http://code.google.com/p/osmo/

