

JOURNAL OF SOFTWARE MAINTENANCE: RESEARCH AND PRACTICE
J. Softw. Maint: Res. Pract. 2000; 12:197–228

Research

Abstraction: a key notion for reverse engineering in a system reengineering approach

Hongji Yang∗,†, Xiaodong Liu and Hussein Zedan

Software Evolution and Reengineering Group, Department of Computer Science, De Montfort University, Leicester LE1 9BH, U.K.

SUMMARY

This paper advocates that extracting a formal specification that is semantically consistent with the original legacy system will greatly facilitate further redesign and forward engineering. The three parts of reengineering could be integrated on the basis of a wide spectrum language. The key approach to comprehension and the production of a formal specification is a notion of abstraction. Transformation can help to change the original source code into alternative forms, but with the same semantics. Abstraction is often interpreted as the act of hiding irrelevant details. What constitutes relevant details is often left open to different interpretations. A unified approach for reverse engineering is described within which the notion of abstraction is classified and precisely defined. Abstraction rules are given and applied to various case studies. Copyright 2000 John Wiley & Sons, Ltd.

KEY WORDS: reverse engineering; reengineering; wide spectrum language; abstraction; object oriented; interval temporal logic

1. INTRODUCTION

The importance and popularity of software reengineering increase as more and more successful computing systems become legacy systems. It has become evident that old architectures severely constrain new designs, which leads to demands for changes to existing software, for instance, fixing errors, adding enhancements and making optimizations. The implementation of the changes themselves creates problems over and above those that are being rectified. Early systems tended to be unstructured and ad hoc, which makes it hard to understand their behaviour. System documentation is often incomplete or out of date. With current methods, it is often difficult to retest or verify a system after a change has been made. By its nature, the process of designing software is an evolutionary one and this will ultimately lead to degraded structure and greater complexity.

∗Correspondence to: Hongji Yang, Software Evolution and Reengineering Group, Department of Computer Science, De Montfort University, Leicester LE1 9BH, U.K.
†E-mail: [email protected]

Received 18 October 1999; Revised 7 April 2000

Reengineering consists of three parts: reverse engineering, redesign and forward engineering. It is natural to assume that forward engineering can be carried out by borrowing a suitable, well-developed software development method.

In this paper, we mainly explore approaches for the reverse engineering part of reengineering. It is commonly recognized that the process of reverse engineering comprises three main activities: restructuring, comprehension and the production of a high-level specification. What is required is an integrated approach that deals with these activities in a unified manner. To increase correctness in reverse engineering, such an approach must be based on formal techniques [1].

In addition, to facilitate the subsequent process of forward engineering, our integrated approach must also be based on modern design/programming paradigms, namely object-oriented techniques. Furthermore, our approach can also handle real-time applications, whose functional correctness depends on time constraints and temporal behaviour.

The approach is based on the construction of a wide spectrum language, known as RWSL, which has a sound formal semantics. This paper concentrates on the architectural design framework and the formalization of the notion of ‘abstraction’.

This paper is organized as follows. We first introduce our architectural design framework for reengineering. A taxonomy of abstraction is then defined formally and abstraction rules are given, aimed at practical specification extraction. Finally, the approach and abstraction rules are applied to various case studies. A conclusion is drawn based on our analysis.

2. RELATED WORK

The existing research closely related to our work (formal and informal) on software abstraction for reverse engineering was studied when our approach was developed [2–6]. Here only the few most closely related projects are briefly discussed.

The Transformation-based Maintenance Model (TMM) is a method proposed in [7] for recovering abstractions and design decisions that were made during implementation. The abstractions and design decisions of the software must be recovered first before the software is re-implemented. The authors of [8] proposed ‘A Concept Recognition-Based Program Transformation System’, whose characteristic is its use of concept recognition, the understanding and abstraction of high-level programming and domain entities in programs, as the basis for transformations. Four understanding levels are defined: the text level, the syntactic level, the semantic level and the concept level. The program transformation system depends on its program understanding capabilities up to the concept level. The REFORM project developed a tool named the Maintainer’s Assistant to assist the human maintainer, handling assembler and Z in an easy to use way [9–11]. One of the most important successes of Maintainer’s Assistant is that it is based on a wide spectrum language whose syntax and semantics are formally defined. However, Maintainer’s Assistant focused on transformation rather than abstraction. How to use multi-levelled abstractions and relevant abstraction rules to achieve good system reengineering, especially reverse engineering, was considered only briefly. Researchers at the University of California at San Diego [12] based their approach to reverse engineering on abstraction, and identified three kinds of abstractions: problem domain, structural and logical. However, this work was not formalized and did not have multiple abstraction levels with an integrated formal semantics. This limits the accuracy and power of their approach. PRISME is a reverse engineering tool based on functional abstraction [13], but the abstraction in PRISME is function-based instead of semantics-based. It does not engage a mature formal method to specify the target system, and PRISME can only extract simple ‘signatures’ as pieces of an outline description of the system, but not a complete specification.

The AUTOSPEC project [14–17] involves an approach to abstracting formal specifications from program code with the weakest precondition (wp) and strongest postcondition (sp) predicate transformers. The difference between the wp and sp approaches is that one has the ability to directly apply a predicate transformer to a program (i.e., sp) whilst the other uses a predicate transformer as a guideline for constructing formal specifications (i.e., wp). In [17], the approach is extended to combine the use of informal methods, such as structured analysis, with formal techniques (sp) to reverse engineer imperative programs in C. However, this project only deals with the first abstraction step of reverse engineering, i.e. it only extracts an abstraction at a very low level of specification, in the form of predicate logic as a notation of the source code. A system called AUTOSPEC is being developed. The authors of [18] presented work on assembler to C migration using the FermaT transformation system, where IBM 370 Assembler code was migrated to equivalent, maintainable C code. In this work, the FermaT transformation engine only tried to simplify the original code, remove redundancies and track dispatch codes, and did not try to abstract the object code or raise the levels of abstraction. The Bylands project [19,20] attempted the maintenance of safety-critical software, where issues on reverse engineering existing real-time code were discussed; an initial design framework was proposed, and formal transformation was suggested as a unifying technology for the problem of safety-critical system maintenance. However, the semantics of the language proposed for representing concurrency was not defined, and no concrete abstraction rules were presented for dealing with extracting designs from real-time code.

To summarize, although many aspects of reverse engineering have been researched, using formal abstraction rules to extract formal specifications from source code is rarely addressed. The above-listed studies solved some closely related problems, such as transformation and part of informal abstraction. However, none of them engages in extracting semantics-consistent formal specifications from source code through abstraction. Formal abstraction rules for reverse engineering seem not to have been developed. Most of these approaches have been advocated for reverse engineering, but few have been evaluated in practice on large-scale code. Abstraction levels are not clearly defined. Abstraction to be used in a ‘real-time’ system has rarely been addressed. Techniques for crossing levels of abstraction, covering all abstraction levels, need close research. Where genuine crossing of levels of abstraction occurs, this is done manually.

3. AN INTEGRATED FRAMEWORK FOR REENGINEERING

Our study shows that using a wide spectrum language is one of the most suitable and efficient approaches to the reengineering of computing systems because of its various abstraction levels and the integrity of these levels [1]. In this section, the proposed architecture of RWSL (reengineering wide spectrum language) is described and the main steps of the working process of using RWSL are explained.

RWSL is a multi-layered wide spectrum language with sound formal semantics. Owing to the distinct advantages of interval temporal logic (ITL) [21–23], we use it to provide a semantic foundation for RWSL.


Figure 1. RWSL: general architecture (layers shown: Interval Temporal Logic (ITL), Object Temporal Agent Model (ObTAM), Timed Guarded Command Language (TGCL), Common Structural Language (CSL) and Common Object-Oriented Language (COOL); universal translators link RWSL to legacy and new structural systems and to legacy and new OO systems).

Figure 1 shows the architecture of RWSL. The top part is the object-oriented section, which includes three layers, namely ITL specification, object-oriented temporal agent model (ObTAM) and common object-oriented language (COOL). ObTAM is an extension of the temporal agent model (TAM) language [24] with object-oriented features. The most concrete layer of the object-oriented section is the common object-oriented language, which provides structures such as those in an ordinary OO language.

The bottom part is the structural (procedural) section, which also includes three layers: ITL specification, timed guarded command language (TGCL) and common structural language (CSL). TGCL is an extension of Dijkstra’s guarded command language [25] with time and concurrency features. Both TGCL and CSL are at the code level, while in CSL operators and concepts are implemented in common programming elements, such as shunts.

The approach may be used as follows: the source code of a legacy system is first translated into CSL through a translator.† Such a translation ensures standardization, since legacy systems may have come in various languages, such as C, Pascal or COBOL. This is followed by transformation to TGCL through successive application of correctness-preserving transformation rules.

†The ‘universal translator’, as shown in Figure 1, translates between a source/target language and RWSL (e.g., a COBOL-to-RWSL Translator [26]). This translator must be written for each source/target language and is simply a one-to-one mapping, to ensure semantic equivalence.

There are three possible paths for reengineering (Figure 2).

• TGCL code can be improved/extended by adding the required extra functionalities. The TGCL code can then be transformed into an equivalent programming language (either through transformation or straightforward translation). In this path, the procedural nature of the legacy system is kept.
• If the object-oriented paradigm is sought, object extraction is performed to obtain an equivalent ObTAM code. Then the ObTAM code is extended/improved. Subsequently, this is transformed to an object-oriented language, such as Java or C++.
• If a high level of abstract specification is needed, then following the construction of TGCL code and/or ObTAM code, the semantics calculation is performed to produce an ITL specification. This will be subsequently used as a basis for forward engineering through refinement.

Figure 2. RWSL: working process in reengineering (levels: Code Level and Specification Level; stages: legacy systems in procedural code, e.g., C, Pascal, COBOL; Common Structural Language; TGCL (Timed GCL); ObTAM (Object-oriented TAM); Common OO Language; ITL Specification; new systems in procedural code, e.g., C, Pascal, COBOL; new systems in object-oriented code, e.g., Ada, Java, C++; edge labels: translate, transform, abstract, extract, improve, re-specify, refine).

In this paper, we focus on abstraction.

4. REALIZATION OF OUR REENGINEERING APPROACH

For the sake of completeness, we give an outline of the components of the wide spectrum language used in our reengineering approach.

4.1. Interval temporal logic

ITL forms the most abstract and logical layer in our language. It is used to give a specification-oriented semantics for TGCL and ObTAM. Furthermore, all transformation, object extraction, abstraction and refinement relations and rules are precisely interpreted and proved within ITL. The choice of ITL is based on a number of reasons. It is a flexible notation for both propositional and first-order reasoning about periods of time found in descriptions of hardware and software systems. Unlike most temporal logics, ITL can handle both sequential and parallel composition and offers powerful and extensible specification and proof techniques for reasoning about properties involving safety, liveness and projected time. Timing constraints are expressible and furthermore most imperative programming constructs can be viewed as formulae in a slightly modified version of ITL [22].

In addition, a refinement calculus for ITL was provided in [22], which takes ITL and calculates a refined concrete portion in Tempura. The syntax of ITL is given in the next subsection. Interested readers may refer to [21] for ITL semantics and more exposition of the formalism and the executable subset.

4.2. Syntax of ITL

An interval $\sigma$ is considered to be an (in)finite sequence of states $\sigma_0 \sigma_1 \ldots$, where a state $\sigma_i$ is a mapping from the set of variables $Var$ to the set of values $Val$. The length $|\sigma|$ of an interval $\sigma_0 \ldots \sigma_n$ is equal to $n$ (one less than the number of states in the interval, i.e. a one-state interval has length 0).

ITL syntax is defined as follows, where $i$ is a constant; $a$ is a static variable (which does not change within an interval); $A$ is a state variable (which can change within an interval); $v$ is a static or state variable; $g$ is a function symbol and $p$ is a predicate symbol.

Expressions:

$$exp ::= i \mid a \mid A \mid g(exp_1, \ldots, exp_n) \mid \iota a : f$$

Formulae:

$$f ::= p(exp_1, \ldots, exp_n) \mid exp_1 = exp_2 \mid exp_1 < exp_2 \mid \neg f \mid f_1 \land f_2 \mid \forall v \bullet f \mid skip \mid f_1 ; f_2 \mid f^*$$

The informal semantics of the most interesting constructs is as follows:

• $\iota a : f$: the value of $a$ such that $f$ holds;
• $\forall v \bullet f$: $f$ holds for all values of $v$;
• $skip$: unit interval (length 1);
• $f_1 ; f_2$: holds if the interval can be decomposed (‘chopped’) into a prefix and suffix interval, such that $f_1$ holds over the prefix and $f_2$ over the suffix;
• $f^*$: holds if the interval is decomposable into a finite number of intervals such that for each of them $f$ holds.
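As an illustration only, the informal semantics of skip, chop and chop-star can be sketched in Python for finite intervals. The names `length`, `skip`, `chop`, `chop_star` and `step` are assumptions of this sketch and are not part of ITL or Tempura. The sketch also assumes that the prefix and suffix of a chop share the state at the split point, and that $f^*$ holds on a one-state interval (zero iterations).

```python
from typing import Callable, Dict, List

State = Dict[str, int]                 # a state maps variable names to values
Interval = List[State]                 # a finite, non-empty sequence of states
Formula = Callable[[Interval], bool]   # a formula is evaluated over an interval


def length(sigma: Interval) -> int:
    """Interval length: one less than the number of states."""
    return len(sigma) - 1


def skip(sigma: Interval) -> bool:
    """skip holds exactly on unit intervals (length 1, i.e. two states)."""
    return length(sigma) == 1


def chop(f1: Formula, f2: Formula) -> Formula:
    """f1 ; f2: f1 holds on a prefix and f2 on the suffix (sharing one state)."""
    def holds(sigma: Interval) -> bool:
        return any(f1(sigma[:k + 1]) and f2(sigma[k:]) for k in range(len(sigma)))
    return holds


def chop_star(f: Formula) -> Formula:
    """f*: the interval splits into finitely many pieces, each satisfying f."""
    def holds(sigma: Interval) -> bool:
        if length(sigma) == 0:
            return True  # assumption of this sketch: zero iterations are allowed
        # each piece has length >= 1, so the recursion works on a shorter suffix
        return any(f(sigma[:k + 1]) and holds(sigma[k:]) for k in range(1, len(sigma)))
    return holds


def step(sigma: Interval) -> bool:
    """Hypothetical example formula: a unit interval in which x grows by one."""
    return skip(sigma) and sigma[1]["x"] == sigma[0]["x"] + 1


sigma = [{"x": 0}, {"x": 1}, {"x": 2}]
two_steps = chop(step, step)(sigma)      # two consecutive unit steps
many_steps = chop_star(step)(sigma)      # any finite number of unit steps
```

The quantifier and the $\iota$ operator are omitted here because they need a treatment of variable binding that goes beyond a short sketch.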

4.3. Timed guarded command language

TGCL is an extension of Dijkstra’s guarded command language with facilities for time, concurrency and communication. Let $x$ denote a variable and $e$ denote an expression; then the syntax of TGCL is as follows:

• Primitives

    $x := e$
    $(x, y) \leftarrow s$
    $x \rightarrow s$
    $\mathbf{delay}\ n$

• Composed structures

    $A ; A'$
    $\mathbf{if}\ \square_{i \in I}\ g_i\ \mathbf{then}\ A_i\ \mathbf{fi}$
    $\mathbf{while}\ g\ \mathbf{do}\ A'\ \mathbf{od}$
    $P(\mathbf{In}\ e_i, \mathbf{Out}\ x_j)$
    $\mathbf{parbegin}\ A_1 \parallel A_2 \parallel \ldots \parallel A_n\ \mathbf{parend}$
    $[t]\,A'$
    $A_1 \rhd_t^s A_2$

• Declaration structures. Let $T$ be a type, then

    $x : T$
    $T = \{x_i : T_i\}$
    $\mathbf{proc}\ P(\mathbf{In}\ pin_i : T_i, \mathbf{Out}\ pout_j : T'_j)\{A'\}$

The informal semantics of the unusual structures among the above components is described as follows:

• $A ; A'$ means the sequential composition of $A$ and $A'$.
• $\mathbf{parbegin}\ A_1 \parallel A_2 \parallel \ldots \parallel A_n\ \mathbf{parend}$. Here ‘$\parallel$’ is introduced as the parallel operator. This statement means that $A_1, \ldots, A_n$ execute concurrently, and it terminates only when all $A_i$ have terminated.
• $[t]\,A'$ means that the execution of $A'$ should be completed within $t$ time units (deadline).
• $A_1 \rhd_t^s A_2$. The given shunt $s$ is treated as a signal, and is monitored from the release time for $t$ time units. If $s$ is written to in that interval then the agent $A_2$ is released with a release time equal to the end of the interval, otherwise the agent $A_1$ is released at the end of the interval.
• $(x, y) \leftarrow s$ is the input statement with time feature, which reads the timestamp and value from a shunt $s$ at the same time. The timestamp is read into $x$, and the value into $y$.
• $x \rightarrow s$ is the output statement with time feature, which writes the value given into shunt $s$.

A TGCL variable can be the following:

$$v ::= v_{sig} \mid v_{str} \mid x.f$$

where $v_{sig}$ is a single variable, $v_{str}$ a structure variable and $x.f$ a data field of a structure variable.

TGCL also adopts the concept of ‘shunt’ found in TAM. Shunts are shared variables via which communications between agents are performed. A shunt contains two values: the first one is a stamp which records the time of the most recent write, and the second one is the value which was most recently written. The formal semantics of TGCL can be seen in [27].
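The shunt and the timeout construct described above can be illustrated by a minimal Python sketch. The class `Shunt` and the function `timeout_choice` are hypothetical names introduced for this illustration and are not taken from TGCL or TAM; time is modelled as an integer clock passed in by the caller, and the monitored interval is assumed to include both of its end points.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class Shunt:
    """A shared variable holding the time and value of the most recent write."""
    stamp: Optional[int] = None   # time of the most recent write, None if never written
    value: Any = None             # value most recently written

    def write(self, value: Any, now: int) -> None:
        # models  x -> s : store the value together with the current time
        self.stamp, self.value = now, value

    def read(self) -> Tuple[Optional[int], Any]:
        # models  (x, y) <- s : read timestamp and value at the same time
        return self.stamp, self.value


def timeout_choice(shunt: Shunt, release: int, t: int) -> str:
    """Decide which agent the timeout construct releases.

    The shunt is monitored from `release` for `t` time units (end points
    included, an assumption of this sketch). A write in that interval
    releases A2; otherwise A1 is released.
    """
    written = shunt.stamp is not None and release <= shunt.stamp <= release + t
    return "A2" if written else "A1"


s = Shunt()
s.write(42, now=3)
stamp, value = s.read()
```

With the write at time 3, `timeout_choice(s, release=0, t=5)` selects `A2`, whereas monitoring from time 4 onwards selects `A1`.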

4.4. Object-oriented temporal agent model

The syntax of ObTAM is the same as the syntax of TGCL less the procedural part, but with the following additional object-oriented portion:

$$A ::= x : T \mid T <_{sub} T' \mid T = \{x_i : T_i,\ m_j(\mathbf{In}\ pin_{jk} : T_k,\ \mathbf{Out}\ pout_{jl} : T'_l)[A_j]\} \mid x.f \mid x.m(\mathbf{In}\ e_k : T_k,\ \mathbf{Out}\ pout_l : T'_l)$$

A variable of ObTAM can be the following:

$$v ::= v_{sig} \mid v_{obj} \mid x.f$$

where $v_{sig}$ is a single variable, $v_{obj}$ an object variable, and $x.f$ a data field of an object variable.

The informal semantics of ObTAM is described as follows.


• $x : T$ means defining $x$ as a variable of type $T$. $T$ can be a simple data type or a class.
• $T <_{sub} T'$ can be used to build the object hierarchy. It declares that class $T$ is a subclass of class $T'$. As a consequence, $T$ will inherit all the data fields and methods in $T'$ if they are not redefined in $T$. On the other hand, all the data fields and methods in $T'$ will be overridden by their counterparts in $T$ if they are redefined in $T$.
• $T = \{x_i : T_i,\ m_j(\mathbf{In}\ pin_{jk} : T_k,\ \mathbf{Out}\ pout_{jl} : T'_l)[A_j]\}$ is the class building declaration. It defines a class named $T$, which has data fields $x_i$ of type $T_i$, $i \in 1 \ldots n$, and methods $m_j$, $j \in 1 \ldots r$. The behaviour of a class is a sequence of method invocations. $pin_{jk}$ stands for the input parameters of method $m_j$, and $pout_{jl}$ stands for the output parameters of method $m_j$. The input parameter passing convention is call by value, and the output parameter passing convention is call by reference. $A_j$ is the method body of method $m_j$.
• $x.f$ is an object field reference. $x$ is an object and $f$ is a field of $x$.
• $x.m(\mathbf{In}\ e_k : T_k,\ \mathbf{Out}\ pout_l : T'_l)$ is method invocation. It invokes the method $m$ in object $x$.
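The inheritance and overriding behaviour described for the subclass declaration can be sketched as a simple member lookup. This is only an illustration under stated assumptions: the function `resolve_members`, the dictionaries `classes` and `parent`, and the class names `Sensor` and `TimedSensor` are all hypothetical and do not come from ObTAM.

```python
from typing import Dict, List, Optional

ClassDef = Dict[str, str]   # member name -> description of where it is defined


def resolve_members(name: str,
                    classes: Dict[str, ClassDef],
                    parent: Dict[str, Optional[str]]) -> ClassDef:
    """Collect the data fields and methods visible in class `name`.

    Members of a superclass are inherited unless the subclass redefines
    them, in which case the subclass definition overrides the inherited one.
    """
    chain: List[str] = []
    current: Optional[str] = name
    while current is not None:          # walk up the hierarchy to the root
        chain.append(current)
        current = parent.get(current)
    members: ClassDef = {}
    for cls in reversed(chain):         # start at the root so subclasses override
        members.update(classes[cls])
    return members


# Hypothetical hierarchy: TimedSensor is declared a subclass of Sensor.
classes = {
    "Sensor": {"value": "field in Sensor", "read": "method in Sensor"},
    "TimedSensor": {"read": "method in TimedSensor", "stamp": "field in TimedSensor"},
}
parent = {"TimedSensor": "Sensor", "Sensor": None}

visible = resolve_members("TimedSensor", classes, parent)
```

Here `visible` contains the inherited field `value`, the new field `stamp`, and the redefined method `read` taken from the subclass.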

5. TAXONOMY OF ABSTRACTION

5.1. Definitions

The simplest interpretation of the notion of abstraction is to hide irrelevant details. Although simple, it leaves the definition of what constitutes ‘irrelevant’ open to wide interpretation. For this reason, we categorize abstraction in a way that makes this explicit. We classify abstraction as follows: weakening abstraction (WA), hiding abstraction (HA), temporal abstraction (TA), structural abstraction (SA) and data abstraction (DA). These five kinds of abstraction form a rather complete taxonomy of abstractions.

An abstraction relation $\preceq$ is defined as a function relating two agents. In RWSL, an agent is defined as an entity with independent functionality, which can be either a group of statements or a formula. That an agent $B$ is an abstraction of $A$, written $A \preceq_f B$ (read as ‘$B$ is an abstraction of $A$ with respect to $f$’), is defined as

$$A \preceq_f B \mathrel{\widehat{=}} f(A, B)$$

where $f$ depends on the type of abstraction, namely WA, HA, TA, SA and DA.

• For WA:

$$A \preceq_{WA} B \mathrel{\widehat{=}} [\![A]\!] \Rightarrow [\![B]\!]$$

means that $B$ is a weakening abstraction of $A$ on the condition that the semantics of $A$ implies that of $B$. It is the inverse of functional refinement.

• For HA:

$$A \preceq_{HA} B \mathrel{\widehat{=}} (\exists x \bullet [\![A]\!]) \Rightarrow [\![B]\!]$$

means that $B$ is a hiding abstraction of $A$ on the condition that a part of $A$ is hidden. This is a special case of WA, where part of the agent’s data space is considered as irrelevant. Hiding abstraction is often used to get rid of local variables and hide communication channels.

• For TA:

$$A \preceq_{TA} B \mathrel{\widehat{=}} ([\![A]\!] \Rightarrow [\![B]\!]) \land (T(A) \gtrless T(B) \lor T(A) \mathbin{@op} T(B))$$

$T(A)$ is the duration of $A$, i.e. $T(A) = t_\omega - t_\alpha$, where $t_\alpha$ represents the start time of the execution of $A$ and $t_\omega$ represents the termination time of the execution of $A$. This definition means that $B$ is a temporal abstraction of $A$ on the condition that the semantics of $B$ is a weakening of that of $A$ and $T(A)$ has one of the following relations with $T(B)$:

– fixed time lines: $T(A)$ has a quantitative relation to $T(B)$, i.e. larger or smaller; this relation is denoted $T(A) \gtrless T(B)$;
– variable time lines: $T(A)$ can be projected to $T(B)$ through certain function(s); this relation is denoted $T(A) \mathbin{@op} T(B)$.

In this abstraction, the execution time of $B$ could be either speeded up or slowed down.

• For SA:

$$A ; B \preceq_{SA} C \mathrel{\widehat{=}} ([\![A ; B]\!] \Rightarrow [\![C]\!]) \text{ and } \forall A', B' \bullet A \preceq A' \land B \preceq B' \Rightarrow A' ; B' \notin C$$

$$A \parallel B \preceq_{SA} C \mathrel{\widehat{=}} ([\![A \parallel B]\!] \Rightarrow [\![C]\!]) \text{ and } \forall A', B' \bullet A \preceq A' \land B \preceq B' \Rightarrow A' \parallel B' \notin C$$

mean that agent $C$ is a structural abstraction of two agents $A$ and $B$ on the condition that $A$ and $B$ are composed sequentially or in parallel and this composition does not occur in $C$, and the semantics of the composition of the two agents $A$ and $B$ implies the semantics of agent $C$. Obviously, the composition between $A$ and $B$ is merged in $C$.

• For DA: assuming $A$ and $B$ are two agents, $r$ is a data abstraction relation:

$$r = (\text{states of } A \longrightarrow \text{states of } B)$$

or in a more formal format:

$$r = \{(x, y) : x \in X,\ y \in Y,\ X = \{\text{states of } A\},\ Y = \{\text{states of } B\}\}$$

where a state of an agent consists of the values of all the variables in the frame of the agent. Therefore, that $A$ is data-abstracted to $B$ on relation $r$, denoted $A \preceq_{DA-r} B$, is defined as:

$$A \preceq_{DA-r} B \mathrel{\widehat{=}} r([\![A]\!]) \Rightarrow [\![B]\!]$$

which means that $B$ is a data abstraction of $A$ on relation $r$ on the condition that if the states of $A$ are mapped to those of $B$, then the semantics of $B$ is a weakening of the converted semantics of $A$.
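To make the weakening, hiding and data abstraction definitions more concrete, the following Python sketch models the semantics of an agent as a finite set of states and reads the implication between semantics as set inclusion. This finite-set reading is an assumption made purely for illustration; in the approach itself the semantics are ITL formulae. All function names and the example agents are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

State = Tuple[Tuple[str, int], ...]   # sorted (variable, value) pairs
Semantics = FrozenSet[State]          # assumed model: the set of admitted states


def state(**values: int) -> State:
    """Build a hashable state from keyword arguments."""
    return tuple(sorted(values.items()))


def sem(states: Iterable[State]) -> Semantics:
    return frozenset(states)


def weakening(a: Semantics, b: Semantics) -> bool:
    """WA: the semantics of A implies that of B (here: set inclusion)."""
    return a <= b


def hide(a: Semantics, var: str) -> Semantics:
    """Existentially quantify over `var` by dropping it from every state."""
    return frozenset(tuple(p for p in s if p[0] != var) for s in a)


def hiding(a: Semantics, b: Semantics, var: str) -> bool:
    """HA: after hiding `var`, the semantics of A implies that of B."""
    return hide(a, var) <= b


def data_abstraction(a: Semantics, b: Semantics,
                     r: Callable[[State], State]) -> bool:
    """DA: the image of A's states under r implies the semantics of B."""
    return frozenset(r(s) for s in a) <= b


# Hypothetical agents: A uses a local variable tmp, B only constrains x.
A = sem([state(x=1, tmp=0), state(x=2, tmp=1)])
B = sem([state(x=1), state(x=2), state(x=3)])


def parity(s: State) -> State:
    """Hypothetical data abstraction relation r: keep only the parity of x."""
    return state(odd=dict(s)["x"] % 2)


B_parity = sem([state(odd=0), state(odd=1)])
```

In this model `hiding(A, B, "tmp")` holds, while `weakening(A, B)` does not, because the states of A still mention the local variable `tmp`; `data_abstraction(A, B_parity, parity)` holds as well.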

5.2. Healthiness conditions

Healthiness conditions are conditions that must hold true to validate the above abstraction definitions. Different abstractions have different healthiness conditions. These are similar to Dijkstra’s healthiness conditions for his guarded command language. One can think of them as axioms or invariants.

Weakening abstraction: No agent should be abstracted to TRUE or FALSE (a trivial specification or starting from scratch).

For all $A \preceq_{WA} B$ there must be

$$B \neq TRUE \land B \neq FALSE$$

Hiding abstraction: Shared variables should not be hidden.

For all $A \preceq_{HA} B$ there must be

$$\forall (C_1, C_2) \bullet (C_1 \not\subset C_2 \land C_2 \not\subset C_1) \Rightarrow (x \notin W_{C_1} \lor x \notin W_{C_2})$$

Temporal abstraction:

• An infinite action cannot be performed in a finite interval. For all $A \preceq_{TA} B$ there must be: if $T(A) = \infty$ then $T(B) = \infty$.
• No agent can be abstracted to an agent with a negative time interval: $T(B) \geq 0$.
• An abstracted agent can be performed in a longer interval than its original as long as no deadlock is caused: $T(B) > T(A) \Rightarrow T(B) < T_{max}$, where $T_{max}$ is the maximum interval that allows the execution of $B$ without any deadlock.

Structural abstraction: Two finite agents cannot be structurally abstracted to an infinite agent.

For all $A ; B \preceq_{SA} C$ and $A \parallel B \preceq_{SA} C$ there must be

$$finite(A) \land finite(B) \Rightarrow finite(C)$$

This condition assures that two contradicting agents cannot be structurally abstracted into a single agent, either in parallel composition or in sequential composition. For example, in parallel composition, if there is a communication deadlock or a resource deadlock between $A$ and $B$, then $A$ and $B$ contradict each other and therefore cannot be abstracted to any single agent.

Data abstraction: Recursion on the data abstraction relation is forbidden. This means that the variable set of $A$ should not be the same as that of $B$, i.e. the data abstraction relation should not map to itself. In that case, data abstraction turns into a kind of weakening abstraction.

For all $A \preceq_{DA-r} B$ there must be

$$W_A \neq W_B$$
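Some of these conditions are simple enough to be checked mechanically. The following Python sketch shows one possible encoding of the weakening, temporal and data abstraction conditions. It assumes that durations are given as numbers (with `math.inf` for an infinite duration), that semantics are modelled as finite sets of behaviours so that TRUE is the full universe and FALSE the empty set, and that variable sets are Python sets; the function names are hypothetical.

```python
import math
from typing import FrozenSet, Set


def wa_healthy(b: FrozenSet[int], universe: FrozenSet[int]) -> bool:
    """B must be neither TRUE (every behaviour) nor FALSE (no behaviour)."""
    return b != universe and len(b) > 0


def ta_healthy(t_a: float, t_b: float, t_max: float) -> bool:
    """Check the three temporal healthiness conditions for durations T(A), T(B)."""
    if math.isinf(t_a) and not math.isinf(t_b):
        return False        # an infinite action cannot fit a finite interval
    if t_b < 0:
        return False        # no negative time interval
    if t_b > t_a and not t_b < t_max:
        return False        # a longer interval must stay below T_max
    return True


def da_healthy(vars_a: Set[str], vars_b: Set[str]) -> bool:
    """The variable sets of A and B must differ."""
    return vars_a != vars_b


universe = frozenset({1, 2, 3})
checks = {
    "wa": wa_healthy(frozenset({1, 2}), universe),
    "ta": ta_healthy(t_a=3, t_b=5, t_max=10),
    "da": da_healthy({"x", "tmp"}, {"x"}),
}
```

The hiding and structural conditions are left out because they quantify over contexts and agents, which would require a richer model than this sketch provides.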


Figure 3. Partial ordering relations of abstractions (nodes: HA, TA, SA, DA and WA).

5.3. Monotonicity of the abstraction relations

Assuming that $A \preceq_f B$ and $C$ is a context, monotonicity of $\preceq_f$ on $C$ is defined as $C(A) \preceq_f C(B)$ holding true. Whether $\preceq_f$ is monotonic on $C$ depends upon the contents of $C$.

The conclusion we have reached is that, for $C = \land \mid \lor \mid ; \mid \parallel \mid \Rightarrow$, if $A$ and $B$ occur as subcomponents of $C$, then:

• for $f = WA$, $C(A) \preceq_{WA} C(B)$ is true;
• for $f = HA$, $C(A) \preceq_{HA} C(B)$ is true;
• for $f = TA$, $C(A) \preceq_{TA} C(B)$ is true;
• for $f = SA$, $C(A) \preceq_{SA} C(B)$ is false under most circumstances;
• for $f = DA$ on data abstraction relation $r$, $C(A) \preceq_{DA-r} C(B)$ is false. Because $A$ and $B$ occur as subcomponents of $C$, it is not certain whether the other components in $C$ can also be mapped over $r$ and whether there is still a semantics weakening after this mapping.
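The monotonicity claim for weakening abstraction can be explored on a small finite model. The sketch below is illustrative only: it again assumes that semantics are finite sets of behaviours with implication read as set inclusion, it covers only the conjunction and disjunction contexts and an implication context in which the abstracted agent is the consequent, and all names are hypothetical. An exhaustive check over a three-element universe is a sanity test, not a proof.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

Sem = FrozenSet[int]
UNIVERSE: Sem = frozenset({0, 1, 2})


def subsets(universe: Sem) -> List[Sem]:
    """Enumerate every subset of the universe."""
    items = sorted(universe)
    return [frozenset(c) for n in range(len(items) + 1)
            for c in combinations(items, n)]


def conj(x: Sem, c: Sem) -> Sem:
    """Context C(X) = X and c, modelled as intersection."""
    return x & c


def disj(x: Sem, c: Sem) -> Sem:
    """Context C(X) = X or c, modelled as union."""
    return x | c


def impl(x: Sem, c: Sem) -> Sem:
    """Context C(X) = c implies X, with X as the consequent (an assumption)."""
    return (UNIVERSE - c) | x


def is_monotonic(context: Callable[[Sem, Sem], Sem], universe: Sem) -> bool:
    """Check that A <= B implies context(A, c) <= context(B, c) for all A, B, c."""
    all_sets = subsets(universe)
    return all(
        context(a, c) <= context(b, c)
        for a in all_sets
        for b in all_sets if a <= b
        for c in all_sets
    )


results = {ctx.__name__: is_monotonic(ctx, UNIVERSE) for ctx in (conj, disj, impl)}
```

For these three contexts the check succeeds on the chosen universe, which agrees with the statement for $f = WA$ above.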

5.4. Relations between abstractions

The partial ordering relations between the five categories of abstractions discussed in this paper are shown in Figure 3.

(1) TA, SA and HA are also WA. This means that WA is the basis of TA, SA and HA, or in other words, TA, SA and HA are stronger than WA. This is because WA is part of the semantic basis of the other abstractions. Abstraction is different from both transformation and restructuring, and there should be a consistency between the original semantics and the abstracted semantics.

(2) TA, SA and HA are independent of each other. There is no partial ordering or overlap between them.

(3) For DA, if the variable set of $A$ remains the same as that of $B$, i.e. the data abstraction relation $r$ maps to itself, then data abstraction turns into weakening abstraction.


6. ABSTRACTION RULES

An implementation (code), a design and a specification of a software system are usually at different levels of abstraction. To move from code to design and then to specification involves a process of crossing levels of abstraction. Usually a specification is more abstract than its implementation, and therefore the above process can also be represented as:

concrete → less abstract → more abstract

In this section we propose a group of abstraction rules for conducting abstraction in the above process.

6.1. Abstraction: weakening in semantics

A specification is different from source code in the following aspects:

(1) source code has more implementation details which need not exist in specification;
(2) implementation is focused on how to do, while specification is focused on what to do;
(3) there is much more non-determinism in specification than in implementation.

Abstraction is weakening in semantics and this weakening is used to cause the following changes:

(1) inessential design/implementation details are omitted;
(2) non-determinism is increased;
(3) how to do is substituted by what to do.

We have classified the abstraction rules obtained through our study into two categories: elementary abstraction rules, which abstract source statements into logic formulae that may be very redundant and specific; and further abstraction rules, which extract a more concise and abstract specification from the formulae through compositions and semantics weakening. Abstraction rules also fall into different sections according to the domain that the rules deal with. For example, when dealing with an object-oriented (time-critical) system, the abstraction rules consist of general abstraction rules, object-oriented abstraction rules and time critical rules.

6.2. Elementary abstraction rules

Elementary abstraction rules [28] aim to abstract the statements in TGCL and ObTAM to formulae in ITL, i.e. formal specification. The resultant formulae may be redundant, or even 'too specific'; they are subsequently composed and simplified with further abstraction rules.

The statements in TGCL and ObTAM consist of two sets: simple statements such as assignment, input and output, and composite statements, which are a composition of simple statements and composite statements through composition structures such as a condition, a loop and a procedure. Therefore, elementary abstraction rules fall into two corresponding sets: the first set, named Primitive Abstraction Rules, converts simple statements to ITL formulae, and the second set, named Compound Abstraction Rules, deals with composite statements.


6.2.1. Primitive abstraction rules

The formal definition of primitive abstraction rules is as follows:

$$S \;\preceq\; LOGIC(S)$$

where S denotes a non-composite (simple) statement in source code and LOGIC gives the semantics definition of S in logical form.

For example, the following abstraction rule:

$$x := e \;\preceq\; \bigcirc x = e$$

extracts a logic formula of the assignment statement which assigns the value of expression e to variable x. ○ is the ITL operator 'next', so the formula constrains the value of x in the next state.

6.2.2. Compound abstraction rules

The formal definition of compound abstraction rules is as follows:

$$\frac{S_i \;\preceq\; \Phi_i}{C(S_i) \;\preceq\; f_C(\Phi_i)}$$

where $f_C$ denotes the logical construction corresponding to composition operator C, and $S_i$ denotes agents, which can be simple statements or composite statements.

For example, the rule for 'sequential composition' is:

$$\frac{A \;\preceq\; \Phi \qquad B \;\preceq\; \Psi}{A ; B \;\preceq\; \Phi ; \Psi}$$

i.e. two sequential agents can be abstracted separately, and the results are composed through the sequential operator ';'.
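The way primitive and compound rules cooperate can be sketched as a recursive translation over a statement tree. The code below is only an illustration under stated assumptions: the tuple-based mini-AST and the function name `abstract` are hypothetical, formulae are produced as plain strings, and only three rules are covered (assignment, sequential composition, and the single-guard case of the conditional rule listed in the appendix).

```python
# Hypothetical mini-AST (not the paper's representation):
#   ("assign", x, e)    x := e
#   ("seq", s1, s2)     s1 ; s2
#   ("if", g, s)        if g then s fi   (single guard only)


def abstract(stmt):
    """Return an ITL-style formula string for a statement tree."""
    kind = stmt[0]
    if kind == "assign":
        # Primitive rule: x := e is abstracted to "next x = e".
        _, var, expr = stmt
        return f"○{var} = {expr}"
    if kind == "seq":
        # Compound rule: abstract both parts, then join them with chop ';'.
        _, first, second = stmt
        return f"({abstract(first)}) ; ({abstract(second)})"
    if kind == "if":
        # Compound rule: (g and body formula) or (not g).
        _, guard, body = stmt
        return f"({guard} ∧ ({abstract(body)})) ∨ ¬({guard})"
    raise ValueError(f"no elementary rule for statement kind {kind!r}")


program = ("seq",
           ("assign", "motor-status", "Off"),
           ("assign", "motor-condition", "Disabled"))
print(abstract(program))
# prints: (○motor-status = Off) ; (○motor-condition = Disabled)
```

The output is deliberately literal and redundant, which is the kind of formula that further abstraction rules are then meant to simplify.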

6.3. Further abstraction rules

Further abstraction rules [28] aim to extract more concise and abstract specifications from the formulae obtained through applying elementary abstraction rules. Logic composition and semantics weakening are the basis of further abstraction, and during the abstraction domain knowledge may be applied by software engineers to give the software system a more concise and 'professional' description.

There is no 'object combination' during further abstraction, i.e. objects will be abstracted but not combined.

Abstraction is a process of generalization, removing restrictions, eliminating detail and removing inessential information. Unlike transformation, which keeps the semantics unchanged, abstraction weakens the original semantics of the system implementation. In the general case, the parts to be abstracted away cannot be identified automatically within the system, and therefore user guidance is needed. A set of abstraction patterns is developed as a means of acquiring observations identified by software engineers. The computer system can then perform abstraction with the aid of these observations and the developed abstraction rules.

For example, user interface format is one of the further abstraction rules, because almost all computing systems contain some code concerning the format of their interface with users. However, this format is not directly function-related, and can therefore be viewed as an implementation detail to be abstracted away in the specification. There are three sorts of user interface format: input format, output format and graphic user interface (GUI).

Other further abstraction rules deal with sequence folding, specification combination, state test and exception handling, semantic core, trivial elements, domain function, efficiency-improving details, etc. A full description is given in the appendix and in [28].
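As an illustration of how a user observation can drive a further abstraction rule, the following sketch removes calls that an engineer has marked as inessential (in the spirit of the semantic core rule). It is a simplified assumption-laden model, not the rule as defined in [28]: formulae are hypothetical nested tuples, `strip_calls` is an invented name, and the example identifiers are borrowed from the mine drainage case study in Section 7.1.

```python
# Hypothetical formula tree:
#   ("atom", text)              an assignment or test kept as opaque text
#   ("call", name, argument)    a procedure call
#   ("chop", left, right)       sequential composition ';'
#   ("and", left, right)        conjunction


def strip_calls(formula, inessential):
    """Drop calls named in `inessential`; return None if nothing remains."""
    kind = formula[0]
    if kind == "atom":
        return formula
    if kind == "call":
        return None if formula[1] in inessential else formula
    if kind in ("chop", "and"):
        left = strip_calls(formula[1], inessential)
        right = strip_calls(formula[2], inessential)
        if left is None:
            return right
        if right is None:
            return left
        return (kind, left, right)
    raise ValueError(f"unknown formula kind: {kind!r}")


spec = ("chop",
        ("chop", ("atom", "sw := Off"), ("atom", "motor-status := Off")),
        ("call", "motor-log", "'motor-stopped'"))

# The set of inessential names plays the role of the engineer's observation.
core = strip_calls(spec, inessential={"motor-log"})
print(core)
```

The set passed as `inessential` stands in for the abstraction pattern supplied by the engineer; the traversal itself is mechanical, which mirrors the division of labour described above.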

7. CASE STUDIES

A prototype system, named the Reengineering Assistant (RA), has been developed and case studies have been carried out with the prototype. RA is a semi-automatic tool which aims at helping software engineers in the process of reengineering legacy systems. It is a rule-based intelligent system. Automation is the goal of RA; however, human intervention is crucial in reverse engineering, i.e. full automation is impossible, and RA adopts semi-automation to facilitate the process of reengineering. All the elementary abstractions can be done automatically, and all further abstractions can be done automatically provided that correct user observations of the current system situation have been identified with abstraction patterns.

Here we use the mine drainage system, a typical case study in the real-time domain, and a robot control system to demonstrate the proposed approach.

7.1. Mine drainage system

7.1.1. Background

The mine drainage system concerns the software necessary to manage a simplified pump control system for a mining environment. The case study is a good demonstration of the real-time aspect of the proposed approach. The original program is written in Ada (20 000 lines). The system is used to pump mine water, which collects in a sump at the bottom of the shaft, to the surface. The main safety requirement is that the pump should not be operated when the level of methane gas in the mine reaches a high value, due to the risk of explosion. A simple schematic diagram of the system is given in Figure 4.

The functional specification of the system is divided into four components: the pump operation, the environment monitoring, the operator interaction and system monitoring.

The required behaviour of the pump is that it monitors the water levels in the sump. When the water reaches a high level, the pump is turned on and the sump is drained until the water reaches the low level. At this point, the pump is turned off. A flow of water in the pipe can be detected if required. The pump should be allowed to operate only if the methane level in the mine is below a critical level.
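The required pump behaviour amounts to a two-threshold controller with a safety override. The sketch below is one possible reading of that paragraph, not the Ada implementation: the function name `next_pump_state` and its boolean inputs are assumptions introduced for illustration, and operator commands are ignored.

```python
def next_pump_state(pump_on, water_high, water_low, methane_safe):
    """Return whether the pump should run after one control step.

    pump_on      -- current pump state
    water_high   -- True when the high water level detector is triggered
    water_low    -- True when the water has dropped to the low level detector
    methane_safe -- True when the methane level is below the critical level
    """
    if not methane_safe:
        return False      # safety requirement overrides everything else
    if water_high:
        return True       # sump is full: start draining
    if water_low:
        return False      # sump is drained: stop
    return pump_on        # between the two detectors: keep the current state


# Draining continues between the detectors, but never when methane is unsafe.
print(next_pump_state(True, False, False, True))
print(next_pump_state(True, True, False, False))
```

Keeping the current state between the two detectors is what makes the pump drain all the way from the high level down to the low level instead of switching at a single threshold.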

The environment must be monitored to detect the level of methane in the air; there is a level beyond which it is not safe to cut coal or operate the pump. The monitoring also measures the level of carbon monoxide in the mine and detects whether there is an adequate flow of air. Alarms must be signalled if gas levels or air-flow become critical.

Figure 4. A mine drainage control system. [Schematic; labelled components: pump, sump, high water level detector, low water level detector, water flow sensor, carbon monoxide sensor, methane sensor, airflow sensor, environment monitoring station, pump control station, link to surface control room.]

The system is controlled from the surface via an operator's console. The operator is informed of all critical events. All the system events are to be stored in an archival database, and may be retrieved and displayed upon request.

The non-functional requirements include three components: timing, dependability and security. This case study is mainly concerned with the timing requirements, which appear as monitoring periods, pump shut-down deadline and operator information deadline.

The complete case study is given in [28]. Here we use the pump module as an example. Please refer to the appendix for the details of the abstraction rules used in this and the next example.

7.1.2. Extracting the specification

The system is implemented in Ada. As a preliminary process, we translated this implementation into CSL.

    proc motor-unsafe()
    {
      if motor-status=On
      then
        duration in 5ms sw:=Off end;
        motor-status:=Off;
        motor-log(In "motor-stopped")
      fi;
      motor-condition:=Disabled;
      motor-log(In "motor-unsafe")
    };

    proc motor-safe()
    {
      if motor-status=Off
      then
        duration in 5ms sw:=On end;
        motor-status:=On;
        motor-log(In "motor-started")
      fi;
      motor-condition:=Enabled;
      motor-log(In "motor-safe")
    };

    proc set-pump(In pump-status: Boolean;)
    {
      if pump-status=On
      then if motor-status=Off
           then if motor-condition=Disabled
                then err-msg(In "pump-not-safe")
                fi;
                if ch4-status=Motor-safe
                then motor-status:=On;
                     duration in 5ms sw:=On end;
                     motor-log(In "motor-started")
                else err-msg(In "pump-not-safe")
                fi
           fi
      else if motor-status=On
           then motor-status:=Off;
                if motor-condition=Enabled
                then duration in 5ms sw:=Off end;
                     motor-log(In "motor-stopped")
                fi
           fi
      fi
    };

We first extract an ITL specification of the three procedures by applying elementary abstraction rules:

    motor-unsafe() ≙
        motor-status = On ∧ (sw := Off ∧ len ≤ 5ms ; motor-status := Off ; motor-log('motor-stopped')) ;
        motor-condition := Disabled ; motor-log('motor-unsafe')

    motor-safe() ≙
        motor-status = Off ∧ (sw := On ∧ len ≤ 5ms ; motor-status := On ; motor-log('motor-started')) ;
        motor-condition := Enabled ; motor-log('motor-safe')

    set-pump(pump-status) ≙
        (pump-status = On ∧
            (motor-status = Off ∧
                (motor-condition = Disabled ∧ err-msg('pump-not-safe')) ;
                (ch4-status = Motor-safe ∧ (motor-status := On ; sw := On ∧ len ≤ 5ms ; motor-log('motor-started')))
                ∨ (ch4-status = Motor-unsafe ∧ err-msg('pump-not-safe'))))
        ∨
        (pump-status = Off ∧ motor-status = On ∧ (motor-status := Off ;
            motor-condition = Enabled ∧ (sw := Off ∧ len ≤ 5ms ; motor-log('motor-stopped'))))

In the above specification, there are several things we need to abstract. Firstly, some 'chop' operators can be replaced by logical conjunction (with the sequence folding rule), which allows further logical composition. Secondly, there are quite a lot of exception test and handling details in the specification. For example, when the methane level is not safe or the motor condition is disabled, the system gives out an error message and takes no action. In a high-level specification, these kinds of descriptions can be considered as implementation-related details and therefore be abstracted away (with the state test and exception handling rule). A more abstract specification is given as follows:

    motor-unsafe() ≙
        motor-status = On ∧ (sw := Off ∧ len ≤ 5ms ; motor-status := Off ∧ motor-log('motor-stopped')) ;
        motor-condition := Disabled ∧ motor-log('motor-unsafe')

    motor-safe() ≙
        motor-status = Off ∧ (sw := On ∧ len ≤ 5ms ; motor-status := On ∧ motor-log('motor-started')) ;
        motor-condition := Enabled ∧ motor-log('motor-safe')

    set-pump(pump-status) ≙
        (pump-status = On ∧
            (motor-status = Off ∧
                (ch4-status = Motor-safe ∧ (motor-status := On ; sw := On ∧ len ≤ 5ms ; motor-log('motor-started')))))
        ∨
        (pump-status = Off ∧
            motor-status = On ∧ (motor-status := Off ; motor-condition = Enabled ∧ (sw := Off ∧ len ≤ 5ms ; motor-log('motor-stopped'))))

More concisely, the specification is written as follows (with ITL axioms):

    motor-unsafe() ≙
        motor-status = On ∧ (sw := Off ∧ len ≤ 5ms ; motor-status := Off ∧ motor-log('motor-stopped')) ;
        motor-condition := Disabled ∧ motor-log('motor-unsafe')

    motor-safe() ≙
        motor-status = Off ∧ (sw := On ∧ len ≤ 5ms ; motor-status := On ∧ motor-log('motor-started')) ;
        motor-condition := Enabled ∧ motor-log('motor-safe')

    set-pump(pump-status) ≙
        (pump-status = On ∧ motor-status = Off ∧ ch4-status = Motor-safe ∧
            (motor-status := On ; sw := On ∧ len ≤ 5ms ; motor-log('motor-started')))
        ∨
        (pump-status = Off ∧ motor-status = On ∧
            (motor-status := Off ; motor-condition = Enabled ∧ (sw := Off ∧ len ≤ 5ms ; motor-log('motor-stopped'))))


The log function is not directly related to the system function, and can therefore be abstracted away (with the semantic core rule):

    motor-unsafe() ≙
        motor-status = On ∧ (sw := Off ∧ len ≤ 5ms ; motor-status := Off) ;
        motor-condition := Disabled

    motor-safe() ≙
        motor-status = Off ∧ (sw := On ∧ len ≤ 5ms ; motor-status := On) ;
        motor-condition := Enabled

    set-pump(pump-status) ≙
        (pump-status = On ∧ motor-status = Off ∧ ch4-status = Motor-safe ∧
            (motor-status := On ; sw := On ∧ len ≤ 5ms))
        ∨
        (pump-status = Off ∧ motor-status = On ∧
            (motor-status := Off ; motor-condition = Enabled ∧ sw := Off ∧ len ≤ 5ms))

Through the final specification it is easy to understand the functional behaviour of the program. The time constraint involved is that switching the pump on or off should be accomplished within 5 ms.
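A timing constraint of the form len ≤ 5ms can also be checked against recorded executions. The following is a minimal sketch under assumptions that are not part of the paper: a run is represented as a chronologically ordered list of `(time_ms, kind)` pairs, where the event kinds `"request"` and `"switched"` and the function name `meets_deadline` are hypothetical.

```python
def meets_deadline(events, bound_ms=5.0):
    """Check that every switch request is completed within bound_ms.

    events -- chronologically ordered (time_ms, kind) pairs, where kind is
              "request" (switching starts) or "switched" (switching is done)
    """
    pending_since = None
    for time_ms, kind in events:
        if kind == "request":
            if pending_since is None:
                pending_since = time_ms
        elif kind == "switched":
            if pending_since is not None and time_ms - pending_since > bound_ms:
                return False
            pending_since = None
    # A request that is never completed also violates the constraint.
    return pending_since is None


print(meets_deadline([(0.0, "request"), (3.0, "switched")]))
print(meets_deadline([(0.0, "request"), (6.0, "switched")]))
```

Such a check only tests individual runs; it complements, and does not replace, the formal reading of the bound in the ITL specification.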

7.2. Robot control system

7.2.1. Background

This case study is a concurrent multiple-process application, originally implemented in C. The tele-operated robot is a tracked device which was originally developed for military use. It is driven by two motors, left and right. Both of these motors can move forwards and backwards. The robot is steered by moving one motor faster than the other.

From a control point of view, commands are issued to the motors via an operator joystick which issues integer values in the range 0…127 for forward motion (127 being maximum speed) and 0…−128 for reverse motion. It is possible to drive only one motor at a time; in such a case the robot will turn. The speed of the motors is directly proportional to the value written to them.

The robot is equipped with 8 infrared sensors. These return an integer value in the range 0…255 depending on whether an obstacle is present or not: 0 indicates no obstacle, 255 indicates an obstacle very near. The robot is operated normally with a threshold of around 100, above which the robot takes notice of the sensor readings, i.e. there is an obstacle of interest. At this point reactive control takes over from the manual control by moving the robot away from the obstacle until no reading exceeds the threshold of 100. The sensor positions are as follows: N, NE, E, SE, S, SW, W and NW, covering the body of the robot and shown in Figure 5.

Figure 5. The robot control system.

7.2.2. Extracting the specification

The translated CSL code is as follows:

    proc move(In left-op:int, right-op:int)
    {
      send-left-motor(left-op);
      send-right-motor(right-op);
    }

    proc motor-control()
    {
      while true do
        if (ir-active=1)
        then move(left-ir-cmd, right-ir-cmd)
        else if (operator-active=1)
             then move(left-op-cmd, right-op-cmd)
             fi
        fi
      od
    }

    proc operator()
    {
      while true do
        if ((left-op-cmd<>lleft-op-cmd) & (right-op-cmd<>lright-op-cmd))
        then operator-active := 1
        else operator-active := 0
        fi;
        lleft-op-cmd := left-op-cmd;
        lright-op-cmd := right-op-cmd;
      od
    }

    proc ir()
    {
      int: i, count;
      while true do
        count := 0;
        left-ir-cmd := 0; right-ir-cmd := 0;
        ir-active := 0; i:=0;
        while (i<8) do
          if (ir-counts(i) > 100)
          then
            left-ir-cmd := left-ir-cmd+motor-values[i][0];
            right-ir-cmd := right-ir-cmd+motor-values[i][1];
            count++
          fi;
          i:=i+1
        od;
        if (count>0) then ir-active := 1 fi;
      od
    }

    proc main()
    {
      int: left-ir-cmd, right-ir-cmd;
      int: left-op-cmd, right-op-cmd;
      int: lleft-op-cmd, lright-op-cmd;
      int: ir-active, operator-active;
      int: motor-values[8][2];

      left-ir-cmd:=0; right-ir-cmd:=0;
      left-op-cmd:=0; right-op-cmd:=0;
      lleft-op-cmd:=0; lright-op-cmd:=0;
      ir-active:=0; operator-active:=0;
      motor-values[8][2]={{-20,-20}, {-20, 0}, {-20, 20},
                          {0, 20}, {20,20}, {20, 0}, {20, -20}, {0, -20} };

      parbegin
        motor-control() ‖ ir() ‖ operator()
      parend;
    }

At the first step, elementary abstraction rules are applied to each procedure in the program. Because the procedures are small enough, no section decomposition is conducted. The result is as follows:

    move(left-op, right-op) ≙ send-left-motor(left-op) ∧ send-right-motor(right-op)

    motor-control() ≙
        ((ir-active = 1 ∧ move(left-ir-cmd, right-ir-cmd))
         ∨ (ir-active ≠ 1 ∧ operator-active = 1 ∧ move(left-op-cmd, right-op-cmd)))∗

    operator() ≙ {left-op-cmd, lleft-op-cmd, right-op-cmd, lright-op-cmd, operator-active} :
        ((left-op-cmd ≠ lleft-op-cmd ∧ right-op-cmd ≠ lright-op-cmd ∧ operator-active := 1)
         ∨ ((left-op-cmd = lleft-op-cmd ∨ right-op-cmd = lright-op-cmd) ∧ operator-active := 0) ;
         (lleft-op-cmd := left-op-cmd) ; (lright-op-cmd := right-op-cmd))∗

    ir() ≙ {ir-active, left-ir-cmd, right-ir-cmd, i, count} :
        (count := 0 ; left-ir-cmd := 0 ; right-ir-cmd := 0 ; ir-active := 0 ; i := 0 ;
         (i < 8 ∧ (ir-counts(i) > 100 ∧ left-ir-cmd := left-ir-cmd + motor-values[i][0] ;
                   right-ir-cmd := right-ir-cmd + motor-values[i][1] ; count := count + 1) ; i := i + 1)∗ ;
         (count > 0 ∧ ir-active := 1))∗

    main() ≙ {left-ir-cmd, right-ir-cmd, left-op-cmd, right-op-cmd, lleft-op-cmd,
              lright-op-cmd, ir-active, operator-active, motor-values[8][2]} :
        left-ir-cmd := 0 ; right-ir-cmd := 0 ; left-op-cmd := 0 ; right-op-cmd := 0 ;
        lleft-op-cmd := 0 ; lright-op-cmd := 0 ; ir-active := 0 ; operator-active := 0 ;
        motor-values[8][2] := {{−20,−20}, {−20, 0}, {−20, 20}, {0, 20}, {20, 20}, {20, 0},
                               {20,−20}, {0,−20}} ;
        motor-control() ‖ ir() ‖ operator()

Then we apply further abstraction to each procedure. For procedure operator, the sequence folding rule is applied to change chop into logical conjunction:

    operator() ≙ {left-op-cmd, lleft-op-cmd, right-op-cmd, lright-op-cmd, operator-active} :
        ((left-op-cmd ≠ lleft-op-cmd ∧ right-op-cmd ≠ lright-op-cmd ∧ operator-active := 1)
         ∨ ((left-op-cmd = lleft-op-cmd ∨ right-op-cmd = lright-op-cmd) ∧ operator-active := 0)
         ∧ (lleft-op-cmd := left-op-cmd) ∧ (lright-op-cmd := right-op-cmd))∗

Then we use logic combination to make it more concise (with ITL axioms):

    operator() ≙ {left-op-cmd, lleft-op-cmd, right-op-cmd, lright-op-cmd, operator-active} :
        (operator-active := (left-op-cmd ≠ lleft-op-cmd ∧ right-op-cmd ≠ lright-op-cmd)
         ∧ (lleft-op-cmd := left-op-cmd) ∧ (lright-op-cmd := right-op-cmd))∗
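The logic combination step replaces a two-branch case analysis by a single boolean-valued assignment. That the two forms agree can be checked exhaustively on a small range of command values. The sketch below is illustrative only; the function names are assumptions, and Python integers 0 and 1 stand in for the values of operator-active.

```python
from itertools import product


def operator_active_branching(left, last_left, right, last_right):
    """Case analysis as in the translated code: set 1 or 0 in two branches."""
    if left != last_left and right != last_right:
        return 1
    return 0


def operator_active_combined(left, last_left, right, last_right):
    """Combined form: assign the truth value of the test directly."""
    return int(left != last_left and right != last_right)


def equivalent_on(values):
    """Compare both forms on every combination of the given command values."""
    return all(
        operator_active_branching(*args) == operator_active_combined(*args)
        for args in product(values, repeat=4)
    )


print(equivalent_on(range(-2, 3)))
```

The comparison covers only a finite sample of values, so it is a sanity check of the rewriting rather than a proof; the paper relies on ITL axioms for the general justification.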

Procedure ir() has two local variables, i and count. Since the number of loop iterations is fixed at 8, we change the chop-star into a concrete number 8. count can be left out by rewriting the specification in a more compact style, since it only serves a boolean test (with ITL axioms and the semantic core rule). Sequence folding is done whenever possible (with the sequence folding rule).

    ir() ≙ {ir-active, left-ir-cmd, right-ir-cmd, i, count} :
        (count := 0 ∧ left-ir-cmd := 0 ∧ right-ir-cmd := 0 ∧ ir-active := 0 ∧ i := 0 ;
         ((ir-counts(i) > 100 ∧ left-ir-cmd := left-ir-cmd + motor-values[i][0] ∧
           right-ir-cmd := right-ir-cmd + motor-values[i][1] ∧ count := count + 1) ∧ i := i + 1)^8 ;
         (count > 0 ∧ ir-active := 1))∗

    ir() ≙ {ir-active, left-ir-cmd, right-ir-cmd, i, count} :
        i ∈ [0, 7] ∧
        (ir-active = ⋁_i (ir-counts(i) > 100)
         ∧ left-ir-cmd = ∑_i ((ir-counts(i) > 100) ∗ motor-values[i][0])
         ∧ right-ir-cmd = ∑_i ((ir-counts(i) > 100) ∗ motor-values[i][1]))∗
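The step from the loop form to the summation form of ir() can be cross-checked by running both readings on concrete sensor values. The sketch below assumes that one pass of the outer loop is modelled as a function from eight sensor readings to the triple (ir-active, left-ir-cmd, right-ir-cmd); the names `ir_loop`, `ir_spec`, `MOTOR_VALUES` and `THRESHOLD` are illustrative, with the table and threshold copied from the CSL code.

```python
MOTOR_VALUES = [(-20, -20), (-20, 0), (-20, 20), (0, 20),
                (20, 20), (20, 0), (20, -20), (0, -20)]
THRESHOLD = 100


def ir_loop(ir_counts):
    """One pass of the outer loop, following the translated code literally."""
    count = 0
    left = 0
    right = 0
    active = 0
    for i in range(8):
        if ir_counts[i] > THRESHOLD:
            left += MOTOR_VALUES[i][0]
            right += MOTOR_VALUES[i][1]
            count += 1
    if count > 0:
        active = 1
    return active, left, right


def ir_spec(ir_counts):
    """The abstracted form: a disjunction and two sums over the sensors."""
    hits = [reading > THRESHOLD for reading in ir_counts]
    active = int(any(hits))
    left = sum(hit * values[0] for hit, values in zip(hits, MOTOR_VALUES))
    right = sum(hit * values[1] for hit, values in zip(hits, MOTOR_VALUES))
    return active, left, right


# Compare both forms on every pattern of "near obstacle" / "no obstacle".
patterns = [[255 if (mask >> i) & 1 else 0 for i in range(8)] for mask in range(256)]
print(all(ir_loop(p) == ir_spec(p) for p in patterns))
```

In `ir_spec` the boolean test is multiplied by the table entry, exactly as in the abstracted formula, which is why the local counter is no longer needed.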

For procedure main(), the initialization part can be left out as trivial details (with the semantic core rule); we therefore get a quite concise specification:

    main() ≙ motor-control() ‖ ir() ‖ operator()

Putting it all together, the final specification is as follows:

    move(left-op, right-op) ≙ send-left-motor(left-op) ∧ send-right-motor(right-op)

    motor-control() ≙
        ((ir-active = 1 ∧ move(left-ir-cmd, right-ir-cmd))
         ∨ (ir-active ≠ 1 ∧ operator-active = 1 ∧ move(left-op-cmd, right-op-cmd)))∗

    operator() ≙ {left-op-cmd, lleft-op-cmd, right-op-cmd, lright-op-cmd, operator-active} :
        (operator-active := (left-op-cmd ≠ lleft-op-cmd ∧ right-op-cmd ≠ lright-op-cmd)
         ∧ (lleft-op-cmd := left-op-cmd) ∧ (lright-op-cmd := right-op-cmd))∗

    ir() ≙ {ir-active, left-ir-cmd, right-ir-cmd, i, count} :
        i ∈ [0, 7] ∧
        (ir-active = ⋁_i (ir-counts(i) > 100)
         ∧ left-ir-cmd = ∑_i ((ir-counts(i) > 100) ∗ motor-values[i][0])
         ∧ right-ir-cmd = ∑_i ((ir-counts(i) > 100) ∗ motor-values[i][1]))∗

    main() ≙ motor-control() ‖ ir() ‖ operator()

The purpose of this case study is to test whether the proposed approach is capable of handling multiple concurrent processes without communication.

From main, it is clear that the robot system is composed of three concurrent processes, namely motor-control, ir and operator. More details of the three processes are given in their own specifications. Since the final specification is quite concise, it is easy to see the following points.

• From motor-control: if the control mode is infrared, then move the robot through parameters left-ir-cmd and right-ir-cmd; if the control mode is operator, then move the robot through parameters left-op-cmd and right-op-cmd.
• From operator: if there is a new operator command, then set operator control mode to active, and change the former command to the current value.
• From ir: check the eight sensors; if any of them detects a nearby obstacle, then move the robot away from it.

The final specification will greatly help software engineers understand the robot system.


8. CONCLUDING REMARKS

8.1. Lessons learnt

Through this study, we learnt the following lessons:

Definition of abstraction levels: RWSL provides a spectrum of abstractions of the reengineered system, from concrete code to formal specification. These abstractions are integrated and cooperate in a uniform manner. All the layers in RWSL have formal syntax and semantics, which give the target system unambiguous descriptions at various abstraction levels.

Development of formal abstraction taxonomy and rules: In the proposed approach, reverse engineering is carried out by extracting more concise system descriptions from a less abstract level, e.g. source code. This involves crossing levels of abstraction. To achieve this, a taxonomy of abstraction is developed to answer 'what abstraction is', including definitions and relations of diverse abstractions. Then, abstraction rules are developed to address how to conduct abstractions. All the abstraction rules are defined formally, which assures precise and rigorous semantics. With these rules a satisfactory specification can be extracted from source code.

Application in real-time domain: At present, most existing reverse engineering techniques are limited to merely sequential and non-timed systems, no matter whether formal or ad hoc techniques are adopted [19]. Our approach is based on a wide spectrum language, which is designed to be able to describe time critical features of the target system in a wide span.

Object orientation: The proposed approach relates to object orientation in two aspects. Firstly, the approach aims to transform procedural legacy systems into object-oriented systems at the code level. A set of object extraction rules is developed. Secondly, our approach supports reverse engineering of object-oriented systems, i.e. using abstraction rules an object-oriented program can be abstracted into logic specification.

Research tool maturity and scaling of the method: The prototype tool is still being improved with respect to scaling up the proposed method for dealing with industrial-scale systems. Based on how the approach was designed, it is suitable for industrial-scale systems and efficient enough for real practice. The approach adopts systematic stepwise abstraction, which slices a large system into manageable sub-systems, then deals with these sub-systems separately and finally integrates the results into one full view of the system. It is generally believed that a fully automated tool may not be available in the near future, but the proposed approach supports automation, since a semi-automatic tool has been built which considerably improves the efficiency of the reengineering process and the size of problem that can be handled.

8.2. Conclusion

The features of our abstraction approach (including the tool developed in the project, the Reengineering Assistant) are as follows:

• use of ITL to define RWSL, allowing both non-real-time and real-time programs to be represented and manipulated;
• a small, traceable kernel language, i.e. ITL plus TGCL and ObTAM, allowing very precise and thorough formal semantics to be given to RWSL;
• developing transformation for all kinds of programs;
• developing object-extraction rules to enable transferring legacy procedural programs to object-oriented programs;
• developing abstraction taxonomy and abstraction rules for crossing levels of abstraction;
• dealing with various languages via simple translation followed by automatic restructuring and simplification;
• developing an interactive, semi-automatic tool, rather than attempting complete automation, thereby making good use of human expert knowledge about the software and its domain;
• mechanical checking of the correctness conditions on transformation, object extraction and abstraction, appearing in the tool menus;
• using the prototype and manual case studies to demonstrate how the experienced user solves a problem, and then implementing these methods and heuristics.

To conclude this study, a reengineering approach with an emphasis on reverse engineering using program abstraction is proposed. A formal framework based on ITL semantics was developed and it is implemented in a wide spectrum language, RWSL. We have formalized program abstraction within a reengineering environment. The abstraction problem has been addressed by software engineering researchers for some years, but dedicated approaches used in a reengineering environment with both concurrency and real-time features have been non-existent.

The specification produced is then understood and used as a basis for enhanced specification for forward engineering the system. Before proceeding, the specification may be changed and/or extended with extra non-functional requirement(s) (e.g. reliability, dependability, limited resources, etc.).

Through the discussion in this paper, it can be concluded that program abstraction is a powerful means for reverse engineering and a systematic approach such as the one proposed in this paper will help reengineering.

8.3. Future research

This approach has been applied to small and medium-sized examples, such as those presented in this paper. The specifications obtained after abstraction are much more understandable than the original programs. However, to fully test the approach, medium-to-large case studies need to be used. In addition, we are currently working on the development of an integrated tool that will support the whole process of reengineering of time-critical systems.

Reverse engineering 'real-time' programs is within the ability of our approach. There are many embedded systems, in particular those in safety critical or safety related applications in the business world, that need reengineering. We will endeavour to deal with these embedded systems as one of the major future tasks of our reengineering approach.

The main objectives of this paper were to explore the notion of abstraction and to give a sound basis to support it. The realization of the developed abstraction calculus within an industry-strength toolset is beyond the scope of the present paper. However, initial results in integrating our calculus within the RA Workbench were promising. It is worth noting that our RA is a prototyping tool which is based on our previous work on MA [11]. We envisage that the migration of RA to the industry-strength FermaT will be straightforward. These results will be reported elsewhere.

An important issue in any approach such as ours is scalability. Although the abstraction relation is monotonic, which guarantees compositionality, the present framing within ObTAM makes scalability cumbersome. However, a set-theoretic approach for framing is readily available.

APPENDIX. ABSTRACTION RULES

In this appendix, abstraction rules are listed, where $A$, $B$, $A_i$, $B_i$ are agents, and $\Phi$, $\Psi$, $\Phi_i$, $\Psi_i$ are formulae. All rules are proven sound in ITL.

A.1. Elementary abstraction rules

A.1.1. Primitive abstraction rules

(1) Assignment

$$x := e \;\preceq\; \bigcirc x = e$$

This rule extracts a logic formula of the assignment statement which assigns the value of expression e to variable x.

(2) Input statement

$$(x, y) \leftarrow s \;\preceq\; x = \surd s \,\wedge\, y = read(s)$$

This rule extracts a logic formula of the input statement which reads the value in shunt s to variable y and stores the timestamp in x.

(3) Output statement

$$x \rightarrow s \;\preceq\; skip \,\wedge\, \bigcirc s = (\surd s + 1,\ x)$$

This rule extracts a logic formula of the output statement which writes the value of x and a timestamp to shunt s.

(4) Type definition

$$x : T \;\preceq\; \exists x \bullet f_T(x)$$

The feature of a variable x of type T is described with function $f_T(x)$.

(5) Delay

$$\mathbf{delay}\ n \;\preceq\; len = n$$

Delay means doing nothing during the specified period. This rule extracts a logic formula of the delay statement.


A.1.2. Compound abstraction rules

(1) Sequential composition

A � 8

B � 9

A ;B � 8 ;9Two sequential agents can be abstracted separately, and the results are composed through thesequential operator ‘;’.

(2) Conditional statement

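$$\frac{A_i \preceq \Phi_i \quad (\text{for all } i \in I)}{\mathbf{if}\ \square_{i \in I}\ g_i\ \mathbf{then}\ A_i\ \mathbf{fi} \;\preceq\; \Bigl(\bigvee_{i \in I} (g_i \wedge \Phi_i)\Bigr) \vee \Bigl(\bigwedge_{i \in I} \neg g_i\Bigr)}$$

This rule extracts a logic formula from a conditional statement.

(3) Iteration statement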

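$$\frac{A \preceq \Phi}{\mathbf{while}\ g\ \mathbf{do}\ A\ \mathbf{od} \;\preceq\; (g \wedge \Phi ; \mathbf{while}\ g\ \mathbf{do}\ A\ \mathbf{od}) \vee (\neg g)}$$

This rule extracts a logic formula from an iteration statement.

(4) Procedure definition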

$$\frac{A' \preceq \Phi}{\mathbf{proc}\ P(\mathrm{In}\ pin_i : T_i,\ \mathrm{Out}\ pout_j : T'_j)\{A'\} \;\preceq\; \Phi \wedge \mathit{stable}(pin_i)}$$

A procedure definition is abstracted into a separate agent with its input parameters stable and its output parameters possibly unstable.

(5) Procedure invocation

$$\frac{A' \preceq \Phi}{P(\mathrm{In}\ e_i,\ \mathrm{Out}\ x_j) \;\preceq\; \Phi(pin_i/e_i,\ pout_j/x_j)}$$

where $\mathbf{proc}\ P(\mathrm{In}\ pin_i : T_i,\ \mathrm{Out}\ pout_j : T'_j)\{A'\}$.

The invocation of a procedure equals the execution of the procedure's abstracted agent with the input parameters' values passed in and the output parameters passed out.

(6) Parallel

$$\frac{A \preceq \Phi \qquad B \preceq \Psi}{\mathbf{parbegin}\ A \parallel B\ \mathbf{parend} \;\preceq\; (\Phi \wedge \Psi)}$$


Two concurrent or parallel agents can be abstracted separately, and the results are combined with the conjunction operator.
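The sequential, conditional and parallel rules are compositional: each sub-agent is abstracted on its own and the results are joined by one operator. A minimal Python sketch of that recursion follows. The class names and the function `abstract` are hypothetical, formulae are plain strings, and `and`, `or`, `not` stand in for the logical connectives; it is an illustration of the rule shapes, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:
    """An agent whose abstraction (a formula, as text) is already known."""
    formula: str


@dataclass(frozen=True)
class Seq:
    """A ; B"""
    first: "Agent"
    second: "Agent"


@dataclass(frozen=True)
class Par:
    """parbegin A || B parend"""
    left: "Agent"
    right: "Agent"


@dataclass(frozen=True)
class If:
    """Guarded conditional: a tuple of (guard, agent) branches."""
    branches: Tuple[Tuple[str, "Agent"], ...]


Agent = Union[Atom, Seq, Par, If]


def abstract(agent: Agent) -> str:
    """Abstract each sub-agent separately, then combine the results."""
    if isinstance(agent, Atom):
        return agent.formula
    if isinstance(agent, Seq):
        # Sequential composition: results are joined with ';'.
        return f"({abstract(agent.first)} ; {abstract(agent.second)})"
    if isinstance(agent, Par):
        # Parallel composition: results are joined by conjunction.
        return f"({abstract(agent.left)} and {abstract(agent.right)})"
    if isinstance(agent, If):
        # Conditional: some guard holds with its branch, or no guard holds.
        taken = " or ".join(f"({g} and {abstract(a)})" for g, a in agent.branches)
        none = " and ".join(f"not ({g})" for g, _ in agent.branches)
        return f"(({taken}) or ({none}))"
    raise TypeError(f"unknown agent: {agent!r}")
```

Because each case only calls `abstract` on its immediate sub-agents, a change to one sub-agent's abstraction does not require revisiting its siblings.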

(7) Duration

$$\frac{A \preceq \Phi}{[t]A \;\preceq\; \Delta t \wedge (\Phi ; \mathit{true}) \wedge (\Phi \supset \mathit{len} \le t)}$$

Duration means that the execution of the specified agent should be finished within the indicated time duration. This rule extracts a logic formula from a duration statement.

(8) Signal

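$$\frac{A_1 \preceq \Phi_1 \qquad A_2 \preceq \Phi_2}{A_1 \rhd^{t}_{s} A_2 \;\preceq\; (\Delta t \wedge \mathit{stable}(\surd s) ; \Phi_1) \vee (\Delta t \wedge \neg \mathit{stable}(\surd s) ; \Phi_2)}$$

The two agents in a signal statement can be abstracted separately. This rule extracts a logic formula from the signal statement.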

(9) Object definition

As type specifications, classes disappear in the ITL specification. Only objects exist, as formulae with frames in ITL.

Let $T = \{x_i : T_i,\ m_j(\mathrm{In}\ pin_{jk} : T_k,\ \mathrm{Out}\ pout_{jl} : T'_l)[A_j]\}$, then

$$\frac{A_j \preceq \Psi_j}{x : T \;\preceq\; W_x : f}$$

where

$$W_x = \bigcup_{i \in I} x_i$$

$$f = \bigwedge_{i \in I} f_T(x_i) \wedge \bigwedge_{j \in J} (\Psi_j \wedge \mathit{stable}(pin_{jk}))$$

This rule transforms the definition of an object in source code into a logic description. $W_x$ is the data field of the object, and $f$ is the behaviour description of the object, where $\Psi_j \wedge \mathit{stable}(pin_{jk})$ is the description of method $m_j$.

(10) Object hierarchy

Let $T = \{x_i : T_i,\ m_j(\mathrm{In}\ pin_{jk} : T_k,\ \mathrm{Out}\ pout_{jl} : T_l)[A_j]\}$ and $T' = \{y_{i'} : T'_{i'},\ m'_{j'}(\mathrm{In}\ pin_{j'k'} : T'_{k'},\ \mathrm{Out}\ pout_{j'l'} : T'_{l'})[A'_{j'}]\}$, then

$$\frac{T <_{sub} T' \qquad A_j \preceq \Psi_j \qquad A'_{j'} \preceq \Psi'_{j'}}{x : T \;\preceq\; W_x : f}$$

where

$$W_x = \bigcup_{i \in I} x_i \cup \bigcup_{i' \in I'} y_{i'} \quad \text{iff for all } x_i,\ i \in I,\ y_{i'} \neq x_i$$

$$f = \bigwedge_{i \in I} f_T(x_i) \wedge \bigwedge_{i' \in I'} f_T(y_{i'}) \wedge \bigwedge_{j \in J} \Phi_j \wedge \bigwedge_{j' \in J'} \Phi'_{j'}$$

$$\text{iff for all } x_i,\ i \in I,\ y_{i'} \neq x_i, \text{ and iff for all } \Phi_j,\ j \in J,\ \Phi'_{j'} \neq \Phi_j$$

$$\forall j \in J \bullet \Phi_j = \Psi_j \wedge \mathit{stable}(pin_{jk})$$

$$\forall j' \in J' \bullet \Phi'_{j'} = \Psi'_{j'} \wedge \mathit{stable}(pin_{j'k'})$$

The subclass relation $<_{sub}$ is transitive. This rule transforms the object hierarchy definition, including inheritance, into a logic formula. Assume that $T$ is a subclass of $T'$. Any object $x$ of class $T$ inherits all the data fields and methods of $T'$ that are not redefined in $T$. Conversely, all the data fields and methods of $T'$ that are redefined in $T$ are overridden by their counterparts in $T$.
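The inherit-unless-redefined reading of the object hierarchy rule can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: a class is reduced to a set of field names plus a dictionary from method names to already-abstracted formulae (as text), type features of fields are ignored, and the names `ClassDef`, `flatten` and `behaviour` are hypothetical.

```python
from typing import Dict, Set, Tuple

# (data field names, method name -> abstracted formula as text)
ClassDef = Tuple[Set[str], Dict[str, str]]


def flatten(sub: ClassDef, sup: ClassDef) -> ClassDef:
    """Build the frame and method table of an object of the subclass."""
    sub_fields, sub_methods = sub
    sup_fields, sup_methods = sup
    # Frame: union of both field sets; a redefined field appears only once.
    frame = sub_fields | sup_fields
    # Methods: start from the superclass, then let the subclass override.
    methods = {**sup_methods, **sub_methods}
    return frame, methods


def behaviour(methods: Dict[str, str]) -> str:
    """Conjoin the method descriptions into one behaviour formula."""
    return " and ".join(f"({methods[name]})" for name in sorted(methods))


# Hypothetical example classes.
account: ClassDef = (
    {"balance"},
    {"deposit": "next(balance) = balance + amt", "show": "stable(balance)"},
)
savings: ClassDef = (
    {"balance", "rate"},
    {"deposit": "next(balance) = balance + amt * (1 + rate)"},
)
frame, methods = flatten(savings, account)
```

Here `frame` contains `balance` and `rate`, the `deposit` description comes from the subclass, and `show` is inherited unchanged.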

(11) Method invocation

$$\frac{A \preceq \Phi}{x.m(e_i, y_j) \;\preceq\; \Phi(pin_i/e_i,\ pout_j/y_j)}$$

where $m(\mathrm{In}\ pin_i : T_i,\ \mathrm{Out}\ pout_j : T_j)[A]$.

A method invocation equals the execution of the method's abstracted agent with the input parameters passed in and the output parameters passed out.

(12) Field reference

$$x.f \;\preceq\; f \in W_x$$

A data field of an object is a variable belonging to the frame of the agent corresponding to the object.

A.2. Further abstraction

(1) Transitive

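$$\frac{A \preceq B \qquad B \preceq C}{A \preceq C}$$

(2) Reflexive

$$A \preceq A$$

(3) Monotonic

$$\frac{A \preceq B \qquad C = \wedge \mid \vee \mid ; \mid \parallel \mid \Rightarrow}{C(A) \preceq C(B)}$$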

(4) Sequence folding


$$\frac{[\![A ; B]\!] \Rightarrow [\![A \wedge B]\!]}{A ; B \;\preceq\; A \wedge B}$$

If replacing the sequential relation between two agents by the conjunction relation causes no contradiction, then the sequence can be folded through conjunction.

(5) Agent combination

5.1 $(W_1 : \Phi_1) \wedge (W_2 : \Phi_2) = (W_1 \cup W_2) : \Phi_1 \wedge \Phi_2$

5.2 $(W_1 : \Phi_1) \vee (W_2 : \Phi_2) \preceq (W_1 \cup W_2) : \Phi_1 \vee \Phi_2$

Two agents related by conjunction or disjunction can be combined into one single agent.

(6) Weakening

$$\frac{A \preceq \Phi \qquad \Phi \Rightarrow \Psi}{A \preceq \Psi}$$

Any semantic weakening that does not contradict the healthiness condition and yields a better specification of the system can be applied. Semantic weakening is used to eliminate inessential information, such as design details. Typical rules include the following.

• State test and exception handling
State tests and exception handling are often used in programs to assure smooth execution. Although they may be important in system implementation, these details do not involve the crucial functionality of the system. Therefore, in a high-level specification, these details are unnecessary and should be abstracted away.

• User interface format
Almost all computing systems have to pay some attention to the format of their interface with the user. There are three sorts of so-called user interface format: input format, output format and graphic user interface (GUI). For some systems, a rather large part is devoted to making a better user interface format. However, these format-related parts are not involved in the core function of the system and can be left out of a high-level specification.

• Semantic core
The semantic core of a specification is the part which covers the specification's key contents. In this abstraction pattern, once the semantic core of a specification is identified, further abstraction keeps the core but omits the other parts of the specification.

• Comment revision
Comments in source code often give great help in understanding the system. During reverse engineering, comments should be kept and revised to fit specifications at different abstraction levels. For a higher level specification, comments should be revised to be more concise and general.

• Domain function
Domain functions give more scientific and concise descriptions of the functionality of target systems. If a component is identified as an implementation of a certain domain function, then it should be abstracted back to the domain function in further abstraction. This makes the specification more abstract and concise. For example, the agent $\{x, y\} : (x > 0 \wedge y = y * x ; x = x - 1)^{x-1}$ implements $y = x!$. So, it can be abstracted as $\{x, y\} : y = x!$.

• Efficiency-improving details
The implementation is often cluttered with details that improve the efficiency of the system. These details are normally not function related, and can be abstracted away in a high-level specification. For example, when a register variable is used to improve the system's efficiency, all the related parts are classified as efficiency-improving details.
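The domain function pattern can be made concrete with the factorial example. The Python sketch below runs the loop body "multiply y by x, then decrement x" exactly x − 1 times and compares the result with the library factorial. The initial value y = 1 is an assumption (the text does not state it), the function names are hypothetical, and agreement on a finite range of inputs is only evidence for the abstraction, not a proof of it.

```python
import math


def concrete_agent(x: int) -> int:
    """Execute 'y = y * x ; x = x - 1' exactly x - 1 times, for x >= 1."""
    y = 1  # assumed initial value; not given in the text
    for _ in range(x - 1):  # iteration count fixed from the initial x
        y = y * x
        x = x - 1
    return y


def domain_function(x: int) -> int:
    """The domain-level description: y = x!"""
    return math.factorial(x)


# Compare the implementation with the domain function on small inputs.
agrees = all(concrete_agent(n) == domain_function(n) for n in range(1, 10))
```

In the approach described above, recognising such a match would be the trigger for replacing the loop's formula by the domain function.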

(7) Conjunction

$$\frac{A \preceq \Phi \qquad A \preceq \Psi}{A \preceq \Phi \wedge \Psi}$$

(8) Specification

$$(W : \Phi) \wedge s \in W \wedge \mathit{stable}(s) \;=\; W - s : \Phi \qquad (\text{if } s \text{ not in } \Phi)$$

This rule eliminates the redundant variables in an agent.

(9) Sequential

9.1 $\mathit{empty} ; A = A = A ; \mathit{empty}$

9.2 $A ; (B ; C) = (A ; B) ; C$

9.3 $A_1 ; (A_2 \vee A_3) ; A_4 = (A_1 ; A_2 ; A_4) \vee (A_1 ; A_3 ; A_4)$

These rules state that the sequential composition operator has empty as a unit, is associative, and distributes over non-deterministic choice.
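One way to see the three sequential laws at work is to rewrite every term into a normal form: a set of alternative flat sequences of atomic agents. Two terms that the laws equate then receive the same normal form. The following Python sketch does this under assumed conventions: terms are strings (atomic agents, with `"empty"` as the unit) or tuples tagged `"seq"` or `"or"`, and the function name `alternatives` is hypothetical.

```python
from itertools import product
from typing import FrozenSet, Tuple, Union

# "A" | ("seq", t1, t2, ...) | ("or", t1, t2, ...)
Term = Union[str, tuple]
Trace = Tuple[str, ...]


def alternatives(term: Term) -> FrozenSet[Trace]:
    """Normal form: the set of alternative flat sequences of atomic agents."""
    if isinstance(term, str):
        # 'empty' is the unit of ';' and contributes nothing to a sequence.
        return frozenset({()}) if term == "empty" else frozenset({(term,)})
    op, *args = term
    parts = [alternatives(arg) for arg in args]
    if op == "or":
        # Choice: collect the alternatives of every operand.
        return frozenset().union(*parts)
    if op == "seq":
        # Sequence: pick one alternative per operand and concatenate them,
        # which flattens nesting and distributes ';' over choice.
        return frozenset(sum(combo, ()) for combo in product(*parts))
    raise ValueError(f"unknown operator: {op!r}")
```

For instance, `("seq", "A1", ("or", "A2", "A3"), "A4")` and `("or", ("seq", "A1", "A2", "A4"), ("seq", "A1", "A3", "A4"))` both normalise to the same two sequences. Using a set for the alternatives also ignores the order and repetition of choices, which goes slightly beyond the three laws listed here.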

(10) Delay

$$\mathbf{delay}\ d_1 ; \mathbf{delay}\ d_2 = \mathbf{delay}\ (d_1 + d_2)$$

$$\mathit{skip} = \mathbf{delay}\ 1$$
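Read from left to right, the two delay laws give a simple clean-up pass over a sequence of agents: treat skip as a delay of one time unit and add up adjacent delays. A minimal Python sketch follows, assuming a sequence is a list whose items are either agent names (strings) or pairs `("delay", d)`; this representation and the function names are illustrative only.

```python
from typing import List, Tuple, Union

Step = Union[str, Tuple[str, int]]  # an agent name, or ("delay", d)


def is_delay(step: Step) -> bool:
    """True for items of the form ("delay", d)."""
    return isinstance(step, tuple) and step[0] == "delay"


def merge_delays(steps: List[Step]) -> List[Step]:
    """Rewrite skip as delay 1 and fuse neighbouring delays by addition."""
    merged: List[Step] = []
    for step in steps:
        if step == "skip":
            step = ("delay", 1)  # skip = delay 1
        if is_delay(step) and merged and is_delay(merged[-1]):
            # delay d1 ; delay d2 = delay (d1 + d2)
            merged[-1] = ("delay", merged[-1][1] + step[1])
        else:
            merged.append(step)
    return merged
```

For example, `merge_delays([("delay", 2), "skip", "A", ("delay", 3), ("delay", 4)])` returns `[("delay", 3), "A", ("delay", 7)]`.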

(11) Parallel

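11.1 $A \parallel B = B \parallel A$

11.2 $A \parallel (B \parallel C) = (A \parallel B) \parallel C$

11.3 $A \parallel \mathit{true} = A$

11.4 $A \parallel (B \vee C) = (A \parallel B) \vee (A \parallel C)$

11.5 $A \parallel B \preceq A' \parallel B$, for any $B$, if $A \preceq A'$

11.6 $(G \rightarrow \Phi_1) \parallel (G' \rightarrow \Phi_2) = (G \wedge G') \rightarrow \Phi_1 \wedge \Phi_2$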

(12) Signal

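12.1 $(A \rhd^{n}_{s} B) \parallel (C \rhd^{n}_{s} D) = (A \parallel C) \rhd^{n}_{s} (B \parallel D)$

12.2 $A \rhd^{n+1}_{s} B = (A \rhd^{n}_{s} B) \rhd^{1}_{s} B$

12.3 $A \rhd^{n}_{s} (C \rhd^{0}_{s} B) = A \rhd^{n}_{s} B$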


(13) Non-deterministic choice

13.1 $P \vee P = P$

13.2 $P \vee Q = Q \vee P$

13.3 $P \vee (Q \vee R) = (P \vee Q) \vee R$

13.4 $\mathit{true} \vee P = \mathit{true}$

(14) Iteration

$$\mu_{n+1} A = A ; \mu_{n} A = \mu_{n} A ; A$$
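The iteration law says that one more round of an agent can be peeled off either at the front or at the back of the iteration. The Python sketch below unfolds an n-fold iteration both ways and lets the two results be compared. The base case (zero rounds taken as the empty sequence) is an assumption not stated in the rule, and the function names are hypothetical.

```python
from typing import Tuple


def unfold_left(agent: str, n: int) -> Tuple[str, ...]:
    """Unfold n rounds by peeling one copy off the front: A ; (n - 1 rounds)."""
    if n < 0:
        raise ValueError("iteration count must be non-negative")
    # Assumed base case: zero rounds is the empty sequence.
    return () if n == 0 else (agent,) + unfold_left(agent, n - 1)


def unfold_right(agent: str, n: int) -> Tuple[str, ...]:
    """Unfold n rounds by peeling one copy off the back: (n - 1 rounds) ; A."""
    if n < 0:
        raise ValueError("iteration count must be non-negative")
    return () if n == 0 else unfold_right(agent, n - 1) + (agent,)
```

Both unfoldings of three rounds of `"A"` give `("A", "A", "A")`, which is the content of the equation for this small case.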

ACKNOWLEDGEMENTS

The authors would like to thank Dr Antonio Cau for his comments on an earlier version of the paper and other members of the Software Evolution and Reengineering Group for their valuable discussions at our weekly internal seminars.

REFERENCES

1. Liu X, Yang H, Zedan H. Formal methods for the re-engineering of computing systems. Proceedings of the 21st IEEE International Conference on Computer Software and Application (COMPSAC'97), Washington DC, August 1997, IEEE Computer Society Press; 409–414.

2. Sere K, Waldén M. Reverse engineering distributed algorithms. Software Maintenance: Research and Practice 1996;8:117–144.

3. Holtzblatt LJ, Piazza RL, Reubenstein HB, Roberts SN, Harris DR. Design recovery for distributed systems. IEEE Transactions on Software Engineering 1997;23.

4. Abd-El-Hafiz SK, Basili VR. A knowledge-based approach to the analysis of loops. IEEE Transactions on Software Engineering 1996;22(5).

5. Layzell PJ, Freeman MJ, Benedusi P. Improving reverse engineering through the use of multiple knowledge sources. Software Maintenance: Research and Practice 1995;7:279–299.

6. Sitaraman M, Weide BW, Ogden WF. On the practical need for abstraction relations to verify abstract data type representations. IEEE Transactions on Software Engineering 1997;23(3).

7. Arango G, Baxter I, Freeman P, Pidgeon C. TMM: Software maintenance by transformation. IEEE Software 1986;3(3):27–39.

8. Engberts A, Kozaczynski W, Ning J. Concept recognition-based program transformation. IEEE Conference on Software Maintenance, Sorrento, Italy, 1991; 73–82.

9. Bennett KH, Bull T, Yang H. A transformation system for maintenance—turning theory into practice. IEEE Conference on Software Maintenance, Orlando, FL, November 1992.

10. Yang H. Formal methods and software maintenance—some experience with the REFORM project. Workshop on Formal Methods, Position Paper, Monterey CA, September 1994.

11. Yang H, Bennett KH. Acquiring entity-relationship attribute diagrams from code and data through program transformation. IEEE International Conference on Software Maintenance (ICSM'95), Nice, France, October 1995.

12. Howden WE, Pak S. Problem domain, structural and logical abstractions in reverse engineering. Proceedings of the International Conference on Software Maintenance, November 1992, IEEE Computer Society Press; 214–224.

13. Balmas F. Prisme: Formalizing programming strategies as a way to understand programs. Eighth International Conference on Software Engineering and Knowledge Engineering, Lake Tahoe, Nevada, June 1996. IEEE Computer Society.

14. Cheng BHC. Applying formal methods in automated software development. Journal of Computer and Software Engineering 1994;2:137–164.

15. Cheng BHC, Jeng JJ. Reusing analogous components. IEEE Transactions on Knowledge and Data Engineering 1997;9(2):341–349.

16. Gannod C, Cheng BHC. Strongest postcondition semantics as a basis for reverse engineering. Journal of Automated Software Engineering 1996;3(1/2).


17. Gannod C, Cheng BHC. Using informal and formal methods for the reverse engineering of C programs. Proceedings of IEEE International Conference on Software Maintenance, Monterey, CA, November 1996.

18. Ward M. Assembler to C migration using the FermaT transformation system. IEEE Conference on Software Maintenance, Oxford, England, 1999; 67–75.

19. Bull T, Younger E, Bennett KH, Luo Z. Bylands: reverse engineering safety-critical systems. IEEE Conference on Software Maintenance, Nice, France, 1995.

20. Younger E, Luo Z, Bennett KH, Bull T. Reverse engineering concurrent programs using formal modelling and analysis 1. IEEE International Conference on Software Maintenance, Washington DC, November 1996.

21. Moszkowski B. Executing Temporal Logic Programs. Cambridge University Press: Cambridge, UK, 1986.

22. Cau A, Zedan H. Refining interval temporal logic specifications. The 4th AMAST Workshop on Real-Time Systems, Concurrent, and Distributed Software (ARTS'97), Mallorca, Spain, May 1997.

23. Zedan H, Heping H. An executable specification language for fast prototyping parallel responsive systems. Computer Language 1996;22:1–13.

24. Scholefield D, Zedan H. TAM: A formal framework for the development of distributed real-time systems. Symposium on Formal Techniques in Real-Time and Fault Tolerant Systems, Nijmegen, The Netherlands, January 1992.

25. Dijkstra EW. A Discipline of Programming. Prentice-Hall: Englewood Cliffs, NJ, 1976.

26. Kwiatkowski J, Puchalski I, Yang H. Pre-processing COBOL programs for reverse-engineering in a software maintenance tool. 1st US Colloquium on Object Technology and System Re-engineering, Oxford, England, April 1998.

27. Liu X. A design framework for system re-engineering. Technical Report, STRL, Department of Computer Science, De Montfort University, England, July 1997.

28. Liu X. Abstraction: A notion for reverse engineering. PhD Thesis, De Montfort University, 1999.
