

ELSEVIER Information and Software Technology 38 (1996) 379-390

Communications software reverse engineering: a semi-automatic approach

Kassem Saleh*, Abdulazeez Boujarwah

Kuwait University, Department of Electrical and Computer Engineering, PO Box 5969, Safat 13060, Kuwait

Received 22 August 1994; revised 5 July 1995; accepted 11 July 1995

Abstract

A large amount of existing data communications software was developed using informal and ad hoc techniques, before recent advances in software technology. As a result, developers struggle to maintain this software, since the quality of both the software and its documentation is unacceptable. Moreover, adding features to this software often leads to side-effects and unexpected interactions. Much of this software also lacks a clear and formal service definition, or even a formal statement of its mission. Design documents are either informal or incomplete and do not reflect the existing software, and test plans are either incomplete or undocumented. Maintaining and expanding such software becomes unmanageable, very time-consuming and sometimes impossible. In this paper, we propose a reverse engineering method that can be applied to such informally developed communications software to facilitate the extraction of design choices and documentation, in addition to the formal definition of the intended communication service. This method obtains a high-level abstraction of the communications software based on Estelle, an International Organization for Standardization (ISO) standard specification language for protocols and for distributed systems in general. We expect the application of this reverse engineering process to increase the productivity of the protocol/software engineer. Moreover, it will allow the revalidation and redesign of the extracted design and the derivation of more comprehensive test plans. An example is also provided to illustrate the application of the method.

Keywords: Communications software engineering; Estelle; reverse engineering; tools

1. Introduction

Reverse software engineering is the process of analysing a software product to: (1) identify its components and their interrelationships; and (2) create system representations at similar or higher levels of abstraction. Starting from a software system implementation, a subset of the reverse engineering process consists of the recovery of the software design, and the synthesis of other implementation-independent abstractions. This process does not include any modifications to the system as such: it is only an examination of the system which leads to a redocumentation, and the recovery of the design and high level architecture of the software. The goal of redocumentation is to recover documentation about the system that exists or should exist. Design recovery, by contrast, involves the recreation of design abstractions from the given implementation, its existing design documents and the knowledge of the software engineer. If the recovered design is not satisfactory, other processes such as restructuring, redesign and re-engineering may be applied. For more information on the terminology and issues related to reverse engineering, design recovery, redocumentation, restructuring and re-engineering, the reader can refer to Biggerstaff [1] and Chikofsky and Cross [2].

* email: [email protected]

0950-5849/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved SSDI 0950-5849(95)01061-O

In this paper, we are concerned with reverse engineering communications software, and in general, real-time distributed systems software. Because distributed software involves the interplay of complex features such as communication, concurrency, synchronization and non-determinism, reverse engineering of this type of software is more challenging and complicated than its non-distributed or batch-oriented counterpart. One of the common features of communications software is that it is based on event-driven processes that react to external stimuli. Therefore, it is always possible to extract an underlying software design structure based on communicating finite state machines (CFSM) that can intuitively model the interactions between event-driven processes [3]. In this work, we consider Estelle, a standard formal description technique for distributed systems developed by the International Organization for Standardization (ISO) [4], as the logical choice for representing the software design recovered by the reverse engineering process. Estelle is based on a model for packaging and structuring communicating finite state machines extended with constructs for the manipulation of data and the evaluation of predicates.

The proposed reverse engineering method is applied to informally developed communications software. It involves the extraction of a very high level of abstraction corresponding to the definition of the service provided by the software (protocol) to some distributed users, in addition to the extraction of the protocol specification that provides the service. Both abstraction levels are formally described in Estelle. The tool provides a redocumentation for an existing system and recovers its implemented design. We believe that this process will increase the productivity of any development group maintaining communications software. In addition, it will allow the formal revalidation of the synthesized design, and the systematic and formal derivation of test plans.

The rest of the paper is organized as follows. The next section briefly introduces the area of protocol engineering and presents the case for protocol reverse engineering. The following section introduces Estelle. We then present a reverse engineering method for communications protocols and an example illustrating the application of our reverse engineering approach. Moreover, we briefly describe what can be done with the extracted abstractions described in Estelle. The final section contains some concluding remarks.

2. Communications software (or protocol) engineering

2.1. Protocol engineering

A protocol constitutes the backbone of any communication system. Therefore the production of correct protocols is critical in the process of engineering such systems. Protocol Engineering, first coined as a term by Piatkowski in 1981 [5], is defined as the formal communications software development process which starts from a communication service definition and produces a protocol implementation. Recent advances in this area [6-8] concentrated on the following phases of the protocol engineering life-cycle (PELC): (i) development of formal techniques to support the protocol engineering process; (ii) the introduction and use of standard formal description techniques for communication protocols; (iii) the design and development of automated and interactive protocol synthesis techniques and tools for the transformation of service definitions into protocol specifications; (iv) the development of design validation techniques and supporting tools for proving correctness properties of formally specified protocols; (v) the development of semi-automatic techniques for obtaining protocol implementations from specifications; and finally (vi) the development of conformance testing languages, tools and techniques for checking the conformity of a protocol implementation to the formal definition of the protocol.

Research and development in protocol engineering encompasses three key components: (i) the formal methods and techniques to be used in each phase of the PELC; (ii) the support tools to assist in the automation of such methods and techniques; and finally (iii) the procedures and standards to follow when performing the tasks involved in each phase. For a survey of techniques, methods and tools for protocol engineering, the reader can refer to Bochmann [9] and Liu [10].

Unfortunately, because formal design techniques have not been as popular in industry as they should be, much existing communications software has been developed using traditional and mostly informal approaches to the protocol engineering phases. Most of the existing and ageing software is hard to manage, and new designers find it very difficult to grasp existing design ideas, mainly because of the lack of documentation of higher level abstractions and representations of the software. Furthermore, integration and regression testing is not performed as well and as thoroughly as it should be, because test cases are incomplete, not representative and not well documented. We believe that, because of the criticality of such software and the ever-increasing cost of maintaining it, there is a need to formalize and formulate a reverse engineering method.

2.2. Protocol reverse engineering and its benefits

In the following, we describe the main symptoms that are manifested at each of the phases of software maintenance and that provide the raison d'être for the application of a reverse engineering process. After all, it is mainly during software maintenance that these symptoms become apparent to designers, and it is obvious that no maintenance would be required if the software works with no reported problems, and when no enhancements or features are planned. These symptoms are:

(a) Service definitions: No formal statement exists on the service provided by the software. This is equivalent to the unavailability of a specification document for traditional software. Formal service definitions are important for maintenance, since any addition of new service requirements or features must be checked first for consistency with existing services (i.e. to eliminate undesirable effects and feature interactions).

(b) Protocol design: Design documents are not kept up-to-date and the design specification is not formal. The original design was not formally validated, therefore residual design errors remain dormant. Reported implementation errors are related to trivial design errors that would have been detected by automated protocol validation tools.

(c) Protocol implementation: Code is not well documented and does not reflect design choices described in existing design documents. More seriously, the implementation might not provide what is expected from the protocol (i.e. does not conform to the service specification of the protocol).

(d) Test plans: Acceptance, integration and unit test plans are not kept up-to-date. Using available test documents to test the integration of existing implementation with additional code is not sufficient since such testing would be incomplete.

Many commercial tools exist to reverse-engineer traditional software products written in high level languages like C and Ada [11,12], and Fortran [13]. However, in this research, we are dealing with sophisticated communications software, and we are interested in obtaining a unifying formal and standard model for representing communications software design.

3. An overview of Estelle

Estelle [14] (extended state transition language) is a formal description technique developed by ISO (International Organization for Standardization) for the specification of distributed systems (e.g. telecommunication systems). In particular, it has been proposed for specifying Open Systems Interconnection (OSI) protocols and services. Estelle is a state-transition-oriented language based on (i) extended finite state machines to describe the system behaviour [3]; and (ii) extensions to the Pascal programming language. So far, Estelle has been successfully used to specify various OSI services and protocols [15-17]. For a complete introduction to Estelle, the reader may refer to Budkowski and Dembinski [18].

Estelle provides features for the description of two important aspects of distributed systems: the architectural model and the behavioural model. The architectural model describes a hierarchically structured system of non-deterministic and communicating modules exchanging messages across bidirectional channels. The model assumes that unbounded FIFO buffers exist at the receiving end of each channel where messages are kept until the receiving module is ready to process them. The hierarchical structure of modules, created during a module initialization phase, is dynamic and may change during the progress of communication between modules. The behavioural model allows the description of the state-transition-oriented behaviour of individual modules using an extended communicating finite state machine (ECFSM). An ECFSM representation of a module consists of a set of transitions which are executed once the preconditions for executability are satisfied.

In Estelle, a distributed system may be specified in terms of externally observable behaviour of a collection of modules which are exchanging messages (called interactions) with each other and with their environment through bidirectional links (called channels) between their ports (called interaction points). An interaction exchanged at an interaction point of a module transfers information either from the module (i.e. an output interaction of the module) or to the module (i.e. an input interaction of the module). Each interaction is defined as consisting of an interaction identifier followed by a (possibly empty) list of interaction parameters.
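To make the channel model concrete, the following is a minimal sketch, not Estelle itself, of a bidirectional channel with an unbounded FIFO queue at each receiving end. The class names, the role labels "user" and "provider" as plain strings, and the sample interaction identifiers are assumptions chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interaction:
    """An interaction: an identifier plus a (possibly empty) parameter list."""
    identifier: str
    params: tuple = ()


@dataclass
class Channel:
    """A bidirectional channel with one unbounded FIFO queue per receiving end."""
    to_user: deque = field(default_factory=deque)
    to_provider: deque = field(default_factory=deque)

    def send(self, sender_role, interaction):
        # An output interaction of the sender is queued at the receiver's end.
        queue = self.to_provider if sender_role == "user" else self.to_user
        queue.append(interaction)

    def receive(self, receiver_role):
        # The receiver consumes interactions in arrival order; None if empty.
        queue = self.to_user if receiver_role == "user" else self.to_provider
        return queue.popleft() if queue else None


def demo():
    ch = Channel()
    ch.send("user", Interaction("TCONreq", ("addr", "opts")))
    ch.send("user", Interaction("TDISreq"))
    first = ch.receive("provider")
    second = ch.receive("provider")
    return first.identifier, second.identifier, ch.receive("provider")
```

Calling `demo()` queues two interactions from the user side and reads them back on the provider side in the order they were sent.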

A system specification in Estelle consists of the definitions of the channels and the modules comprising the structure of the specification. A channel definition consists of a channel heading definition and a channel block definition. In the channel heading, the identifier for the channel and the two roles (i.e. 'user' and 'provider') that are used to indicate the initiator of interactions exchanged through the channel are given. These roles will be assumed by a pair of modules interacting through the channel. The channel block first groups interactions that are exchanged through the channel as those that are initiated by 'user' or 'provider' and then enumerates each group of interactions by giving identifiers of each interaction and its parameter fields.

A module definition consists of a module header definition and a module body definition. In the module header, the identifier for the module, the identifier of each interaction point of the module, the identifier of the channel associated with each interaction point, and the role the module plays on the channel associated with each interaction point, are given. The module body defines the externally observable behaviour of the module (i.e. the possible orderings of interactions exchanged at the interaction points of the module) by an ECFSM. Possible orderings of interactions that a module exchanges are given in terms of the state space and the possible state transitions of the module where extensions (e.g. operations on interaction parameters) are expressed in a version of ISO Pascal [4].

Fig. 1. High-level description of the automated tool. (Figure not reproduced; legible labels: "Communications software to reverse engineer", "Intermediate Behaviour Data Structure Representation", "Extracted".)

The state space of a module is specified by a set of variables (i.e. the control state variable, called state, and some additional state variables). A possible state of the module is characterized by the values of all of these variables. Each state transition is associated with an enabling condition and an action. The enabling condition of a transition defines the conditions that must be satisfied for the transition to be enabled for execution. At any given time there may be more than one transition enabled. However, only one enabled transition may be non-deterministically selected and executed at any given time. Execution of a transition is considered to be atomic (i.e. non-interruptible) and corresponds to performing the action associated with the transition. The action of a transition may: change the state of the module; modify the values of additional state variables; and initiate output interactions.
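The execution rule described above (collect the enabled transitions, pick one non-deterministically, run its action as a single step) can be sketched as follows. This is an illustrative model under our own naming, not part of Estelle or of the tool: `Transition`, `Module`, `guard` and `action` are hypothetical names, and the random choice merely stands in for non-determinism.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transition:
    source: str                      # control state in which the transition may fire
    target: str                      # control state reached after execution
    guard: Callable[[dict], bool]    # enabling condition over the additional variables
    action: Callable[[dict], list]   # updates variables, returns output interactions


class Module:
    def __init__(self, state, variables, transitions, rng=None):
        self.state = state
        self.variables = variables
        self.transitions = transitions
        self.rng = rng or random.Random()

    def enabled(self):
        # A transition is enabled when its source state and guard both match.
        return [t for t in self.transitions
                if t.source == self.state and t.guard(self.variables)]

    def step(self):
        candidates = self.enabled()
        if not candidates:
            return None
        chosen = self.rng.choice(candidates)     # non-deterministic selection
        outputs = chosen.action(self.variables)  # runs to completion before the next step
        self.state = chosen.target
        return outputs


def demo():
    def bump(v):
        v["count"] += 1
        return ["ACK"]

    transitions = [
        Transition("IDLE", "BUSY", lambda v: v["count"] == 0, bump),
        Transition("IDLE", "IDLE", lambda v: v["count"] > 5, lambda v: []),
        Transition("BUSY", "IDLE", lambda v: True, lambda v: ["DONE"]),
    ]
    m = Module("IDLE", {"count": 0}, transitions, random.Random(0))
    first = m.step()
    second = m.step()
    return first, second, m.state, m.variables["count"]
```

In `demo()` only one transition is enabled at each step, so the result does not depend on the random choice; with several enabled transitions, any of them could legitimately be selected.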

Each state transition of a module is characterized by a clause group and a transition block. The clause group of a transition consists of at most one occurrence of each of the following clauses:

(a) FROM clause: specifies the state(s) from which the transition may originate. If this clause is not present, then the transition is understood to originate from any state.

(b) TO clause: specifies the state reached after the execution of the transition. If this clause is not present, then it is understood that the state does not change.

(c) WHEN clause: specifies which input interaction must be received to partially enable the transition. If this clause is not present, then the transition is said to be a spontaneous transition.

(d) PROVIDED clause: specifies a conjunct of the enabling condition that must be satisfied for the transition to be enabled. If this clause is not present, then it is understood to be 'TRUE'. Note that the other conjuncts of the enabling condition are identified by FROM and WHEN clauses.

(e) DELAY clause: which may be used only for a spontaneous transition, specifies the minimum time units that the transition remains enabled before it is considered for execution and an optional upper bound (i.e. the maximum) on time units after which an enabled transition must be considered for execution.

(f) PRIORITY clause: specifies the relative priority of the transition with respect to other transitions in terms of a priority number (the lowest non-negative integer is the highest priority). If this clause is not present, then the lowest priority is assumed.

(g) ANY clause: specifies the transition as a shorthand for a set of transitions in which no transition contains an ANY clause.
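The defaults listed for absent clauses can be summarized in a small sketch. It covers only FROM, TO, WHEN, PROVIDED and PRIORITY (DELAY and ANY are left out), and it assumes that selection is restricted to the enabled transitions with the smallest priority number. All names (`Clauses`, `is_enabled`, `highest_priority`, `next_state`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Clauses:
    from_states: Optional[frozenset] = None            # FROM absent: any state
    to_state: Optional[str] = None                     # TO absent: state unchanged
    when: Optional[str] = None                         # WHEN absent: spontaneous
    provided: Optional[Callable[[dict], bool]] = None  # PROVIDED absent: TRUE
    priority: Optional[int] = None                     # PRIORITY absent: lowest


def is_enabled(c, state, head_input, variables):
    """Conjunction of the FROM, WHEN and PROVIDED parts of the enabling condition."""
    if c.from_states is not None and state not in c.from_states:
        return False
    if c.when is not None and c.when != head_input:
        return False
    return c.provided is None or c.provided(variables)


def highest_priority(enabled):
    """Keep the enabled transitions whose priority number is smallest."""
    def rank(c):
        return float("inf") if c.priority is None else c.priority

    if not enabled:
        return []
    best = min(rank(c) for c in enabled)
    return [c for c in enabled if rank(c) == best]


def next_state(c, state):
    return state if c.to_state is None else c.to_state


def demo():
    a = Clauses(from_states=frozenset({"OPEN"}), when="DT", to_state="OPEN", priority=1)
    b = Clauses(when="DT")  # any state, no state change, lowest priority
    c = Clauses(from_states=frozenset({"CLOSED"}), when="CR", to_state="OPEN")
    enabled = [t for t in (a, b, c) if is_enabled(t, "OPEN", "DT", {})]
    chosen = highest_priority(enabled)
    return len(enabled), chosen == [a], next_state(b, "OPEN")
```

Here transitions `a` and `b` are both enabled in state OPEN on input DT, and `a` wins because an explicit priority number ranks above the default.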

The transition block consists of declarations and definitions that are local to that block, in addition to Pascal-like statements (i.e. IF, FOR, ...), including possibly the output statement. The layout of an Estelle specification of a protocol is listed in Appendix A.

4. Reverse engineering method

In this section, we describe the main components of a protocol reverse engineering tool. Since the output of the reverse engineering process is an Estelle specification of the protocol, the overall architecture of the tool is based on the identification of the two main sets of components needed in an Estelle specification: (i) the architectural and static components; and (ii) the dynamic and behavioural components. The tool must recognize these components and present them to the designer. The recognition of the architectural components is straightforward and relies on some knowledge provided by the protocol user and on a simple walkthrough. For example, the names of input and output functions used in the implementation have to be identified by the designer. In general, language and implementation-dependent details are best understood by the designer and cannot be easily recognized by a tool. Intelligent parsers can be built to recognize the behavioural components. However, if the protocol code is badly structured, the designer may have to make it easily recognizable (e.g. by replacing multiple if-else statements with a switch in C). In addition, a composition of the recovered protocol design would recover the formal definition of the service provided by the protocol.

A high-level description of a semi-automated tool implementing our reverse engineering method is shown in Fig. 1.

In the following, we describe both the non-automated architecture identification part and the automated behaviour identification part and their representative data structures.


4.1. Recognizing architectural components

The high-level and architectural information that should be extracted from the communications software includes:

(a) The event-driven communicating finite state machine-based modules involved in the protocol implementation.

(b) The types of messages sent or received by the mod- ules recognized in (a).

(c) The directed interaction diagram (DID) describing the inter-module communication and the direction of the messages exchanged.

(d) The input and output functions that are used to get input messages and to deliver output messages, from or to other modules, respectively.

The DID can be formalized after a further classification of each of the messages obtained in (b): by its source module if it is received, or by its destination module if it is sent. Furthermore, after finding the DID, we can obtain the logical channels involved in the protocol.

The information collected in the above four items can then be refined to obtain the Estelle architecture specification of the protocol: the module header definitions, the interaction points, the interactions (or messages) and the channel definitions.
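As an illustration of how the DID and the logical channels might be derived from the classified messages, the sketch below takes designer-supplied observations of the form (module, message, direction, peer) and returns the directed arcs and the undirected module pairs. The function name, the tuple layout and the peer names "user" and "network" are assumptions, not details taken from the tool.

```python
def build_did(observations):
    """Build a directed interaction diagram.

    observations: iterable of (module, message, direction, peer), where
    direction is 'sent' or 'received' from the point of view of `module`.
    Returns (arcs, channels): arcs maps (source, destination) to the set of
    messages on that arc; channels lists each undirected module pair once.
    """
    arcs = {}
    for module, message, direction, peer in observations:
        src, dst = (module, peer) if direction == "sent" else (peer, module)
        arcs.setdefault((src, dst), set()).add(message)
    # Two opposite directed arcs collapse into a single undirected channel.
    channels = sorted({tuple(sorted(pair)) for pair in arcs})
    return arcs, channels


def demo():
    observations = [
        ("protocol", "TCONreq", "received", "user"),
        ("protocol", "TCONind", "sent", "user"),
        ("protocol", "CR", "sent", "network"),
        ("protocol", "CC", "received", "network"),
    ]
    arcs, channels = build_did(observations)
    return sorted(arcs[("user", "protocol")]), channels
```

In this example four directed arcs collapse into two channels, one between the protocol module and its user and one between the protocol module and the network.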

A summary of the activities that have to be performed is given in the following:

(a) For each event-driven ECFSM-based module: identify the utility functions or the utilities file name used by the module. For each message type (or interaction): if the message is received, identify its sender or source; if the message is sent, identify its recipient or destination.

(b) Construct the directed interaction diagram (DID); the number of undirected arcs determines the channels in the Estelle specification.

(c) Generate the following partial Estelle specification:

    (* Definition of the channels and the associated interactions *)
    channel channel_name_1 (role1, role2);
      by user:
        (* interactions and their parameters *)
        ...
      by provider:
        (* interactions and their parameters *)
        ...

    channel channel_name_2 (role1, role2);
      ...

    (* Definition of the module headers *)
    module module_1;
      ip interaction_point_id_1 : channel_name_1 (role);
         interaction_point_id_2 : channel_name_2 (role);
    end;

    module module_2;
      ip ...
    end;
    ...

    module module_1_body for module_1;
      (* constants, variables and types declarations from utility files *)
      (* utility functions *)

      (* Initializations and Transitions of the module body - to be filled later *)

    end (* module_1_body *)

It is clear that this part of the reverse engineering process has to be performed by a walkthrough and by getting feedback from the maintainer of the communications software.
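Once the designer has supplied the channels, interactions and interaction points, producing the skeleton above is mechanical. The following sketch shows one way this could be done; the dictionary layout and the function name `emit_architecture` are assumptions, and interaction parameters are omitted for brevity.

```python
def emit_architecture(channels, modules):
    """Emit channel definitions and module headers as Estelle-like text.

    channels: {name: {"user": [interaction ids], "provider": [interaction ids]}}
    modules:  {name: [(interaction_point, channel, role), ...]}
    """
    lines = []
    for cname, by_role in channels.items():
        lines.append(f"channel {cname} (user, provider);")
        for role in ("user", "provider"):
            lines.append(f"  by {role}:")
            lines.extend(f"    {ident};" for ident in by_role.get(role, []))
    for mname, points in modules.items():
        lines.append(f"module {mname};")
        for i, (ip, channel, role) in enumerate(points):
            # Only the first interaction point carries the 'ip' keyword.
            prefix = "  ip " if i == 0 else "     "
            lines.append(f"{prefix}{ip} : {channel} ({role});")
        lines.append("end;")
    return "\n".join(lines)


def demo():
    channels = {"AP_TS": {"user": ["TDISreq"], "provider": ["TDISconf"]}}
    modules = {"protocol": [("TS", "AP_TS", "provider")]}
    return emit_architecture(channels, modules)
```

The output of `demo()` is a channel definition for AP_TS followed by a module header for `protocol` with a single interaction point.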

4.2. Recognizing behavioural components

The recognition of the behavioural components from the implementation of the protocol can be almost fully automated. A scanner is built to obtain a middle-level data structure-based representation of the behaviour of each event-driven module included in the implementation.

For each event-driven communication module:

• Recognize the state variable name.
• Recognize the driving loop.
• Recognize the initialization part of the module: initial state value and initialization statements.
• Construct the behaviour representation data structure according to the following:
  - For the statements outside the state matching case, create an Estelle transition that might be executed at any state (i.e. an entry with an empty source state is added to the data structure).
  - For each state matching case in the driving loop:
    - add a node (i.e. transition) to the list of transitions;
    - determine the stimulus or interaction (if it exists) needed and its source module;
    - determine whether an internal stimulus or timeout implementation exists;
    - determine the additional conditions needed to accept the stimulus;
    - determine the next state to which the module goes after accepting the stimulus;
    - determine the actions performed when accepting the stimulus, including the output action(s) (i.e. using an output function), should any exist;
    - check whether the actions contain multiple state variable assignments; if n such assignments exist, add n nodes to the list of transitions and fill in the related information for each transition accordingly. Those n transitions share the same source state and message type, but differ in the additional conditions, next state and action statements.
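A heavily simplified sketch of such a scanner is shown below. It is not the authors' tool: it handles only a well-structured switch whose cases contain single-level `if (msg == X) { ... }` blocks, and it ignores additional conditions, timeouts and multiple state assignments. The embedded C fragment and the names `state`, `msg`, `get_msg` and `send_msg` are hypothetical; in practice the designer supplies the real names.

```python
import re

# Hypothetical C fragment in the style of a switch-driven protocol module.
C_SOURCE = """
state = CLOSED;
while (1) {
    msg = get_msg();
    switch (state) {
    case CLOSED:
        if (msg == TCONreq) { send_msg(CR); state = WAIT_FOR_CC; }
        break;
    case WAIT_FOR_CC:
        if (msg == CC) { send_msg(TCONconf); state = OPEN; }
        if (msg == DR) { send_msg(TDISind); state = CLOSED; }
        break;
    }
}
"""


def scan(source, state_var="state", input_var="msg", output_fn="send_msg"):
    """Return (initial_state, transitions) for one switch-driven module."""
    assign = rf"\b{state_var}\s*=\s*(\w+)\s*;"
    loop_at = source.index("while")              # start of the driving loop
    init = re.search(assign, source[:loop_at])   # initialization part
    # Splitting on the case labels yields: [preamble, STATE1, body1, STATE2, body2, ...]
    parts = re.split(r"\bcase\s+(\w+)\s*:", source[loop_at:])
    stimulus = rf"if\s*\(\s*{input_var}\s*==\s*(\w+)\s*\)\s*\{{(.*?)\}}"
    transitions = []
    for src_state, body in zip(parts[1::2], parts[2::2]):
        for m in re.finditer(stimulus, body, re.S):
            actions = m.group(2)
            nxt = re.search(assign, actions)
            transitions.append({
                "source": src_state,
                "message": m.group(1),
                "outputs": re.findall(rf"\b{output_fn}\s*\(\s*(\w+)\s*\)", actions),
                # No assignment to the state variable means the state is unchanged.
                "next": nxt.group(1) if nxt else src_state,
            })
    return (init.group(1) if init else None), transitions
```

Calling `scan(C_SOURCE)` yields the initial state and one record per recognized stimulus. Regular expressions are used here only to keep the sketch short; badly structured code would need a real C parser and designer intervention, as discussed in the text.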

While scanning each module, a representative data structure is constructed, a graphical representation of which is shown in Fig. 2. This data structure is dynamic: the list of transitions is represented by a linked list.

Fig. 2. A general format of an internal data structure representation of the code. (Figure not reproduced; legible labels: "List of transitions", with node fields "Source", "Type of message", "Conditions", "Output", "Actions", "Next state", and a separate "Conditions text" block.)
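The linked list of transitions might look like the following sketch. The field names follow the legible labels of Fig. 2, but their types and the class names `TransitionNode` and `TransitionList` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransitionNode:
    source: Optional[str]       # None stands for "any state"
    message: Optional[str]      # None stands for a spontaneous transition
    conditions: str             # text of the additional conditions
    outputs: list               # output interactions
    actions: str                # text of the action statements
    next_state: Optional[str]   # None stands for "state unchanged"
    link: Optional["TransitionNode"] = None


class TransitionList:
    """Singly linked list that grows as the scanner discovers transitions."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, node):
        if self.tail is None:
            self.head = node
        else:
            self.tail.link = node
        self.tail = node

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node
            node = node.link


def demo():
    transitions = TransitionList()
    # Entry with an empty source state: may be executed in any state.
    transitions.append(TransitionNode(None, None, "", [], "log_event()", None))
    transitions.append(TransitionNode("CLOSED", "TCONreq", "", ["CR"], "", "WAIT_FOR_CC"))
    return [(n.source, n.message, n.next_state) for n in transitions]
```

A linked list suits the scanning phase because the number of transitions is not known in advance and nodes are only ever appended.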

A translator can then take the structure and generate the complete module body definition for each of the modules in the protocol. The parts of the Estelle specification generated from this phase of the reverse engineering process are described in the following:

    module module_1_body for module_1;
      (* constants, variables and types declarations from utility files *)
      (* utility functions *)

      (* state set - list of all the state variable values *)
      (** obtain from the data structures **)

      (* include all declarations that are local to the module *)

      (* Initializations and Transitions of the module body *)
      initialize to initial_state
      begin
        (* all the statements prior to the execution of the driver loop *)
      end;

      trans (* transition part *)

      when ip.interaction
      provided condition_is_true
      from source_state
      to destination_state
      begin
        (* statements to be executed *)
        output ip.interaction;
      end;
      ...

    end (* module_1_body *)
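The translation step can be sketched as a function that turns each transition record into a clause group and a transition block following the template above. This is an illustrative simplification: the record keys, the function names, and the use of a single interaction point named `ip` for all inputs and outputs are assumptions.

```python
def emit_transition(t, ip="ip"):
    """Render one transition record as Estelle-like text."""
    header = []
    if t.get("message"):
        header.append(f"when {ip}.{t['message']}")
    if t.get("condition"):
        header.append(f"provided {t['condition']}")
    if t.get("source"):
        header.append(f"from {t['source']}")   # omitted: any state
    if t.get("next"):
        header.append(f"to {t['next']}")       # omitted: state unchanged
    body = [f"    {stmt};" for stmt in t.get("actions", [])]
    body += [f"    output {ip}.{out};" for out in t.get("outputs", [])]
    return "\n".join(["  " + " ".join(header), "  begin", *body, "  end;"])


def emit_body(module, initial_state, transitions):
    """Render the module body skeleton with its initialization and transitions."""
    parts = [
        f"module {module}_body for {module};",
        f"initialize to {initial_state}",
        "begin",
        "end;",
        "trans",
        *(emit_transition(t) for t in transitions),
        f"end (* {module}_body *)",
    ]
    return "\n".join(parts)


def demo():
    t = {"source": "CLOSED", "message": "TCONreq",
         "next": "WAIT_FOR_CC", "outputs": ["CR"]}
    return emit_body("protocol", "CLOSED", [t])
```

Clauses whose record fields are missing are simply left out, which matches the defaults Estelle assigns to absent FROM, TO, WHEN and PROVIDED clauses.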

5. Application and follow-up

In this section, we provide an example showing the application of the reverse engineering method to a simple transport protocol written in the C programming language. Parts of the input program are shown in Appendix B, and the Estelle definition extracted from the program is shown below. Then, we briefly describe the post-reverse engineering activities that can be performed on the Estelle specification in order to assess whether re-engineering or redesign is required.

5.1. Case study

In the following, we show the Estelle specification obtained both manually and automatically by applying the reverse engineering method to the C code of a simple transport protocol listed in Appendix B. In this example, the designer has to identify the files in which the main driver of the protocol exists. The names of the input and output functions and the names of the state variables are also supplied by the designer. For the recovery of the behavioural part of the protocol (mainly contained in the file protocol.c), the designer has to identify the protocol driver switch. The provided code did not contain nested switches. However, if nested switches existed inside the driving switch, they would be treated as conditional statements related to the incoming messages or other internal decisions. We feel that the tool needs to be modified to accommodate the various complex coding situations that may exist in badly structured software. However, since there are infinitely many ways to write bad code, it would be misleading to say that a reverse engineering process can be completely automated. In our opinion, this process will always rely on the engineer's knowledge and can never be fully automated.
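To make the role of the driver switch concrete, the following minimal C sketch shows how one pass through a state-driven switch lines up with the clauses of an Estelle transition. It is an illustration only, not the authors' tool or the code of Appendix B: the type names state_id, source_id and message_id, the function driver_step and the flag options_acceptable are hypothetical, and only two states of the protocol are handled.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; they only loosely follow Appendix B. */
typedef enum { CLOSED, WAIT_FOR_CC, WAIT_FOR_DC, OPEN } state_id;
typedef enum { FROM_TS, FROM_MAP } source_id;
typedef enum { TCONREQ_MSG, CC_MSG, OTHER_MSG } message_id;

/*
 * One pass through the driver switch. Each (case label, condition) pair
 * corresponds to one Estelle transition:
 *   case label          -> "from" clause
 *   test on the message -> "when" clause
 *   remaining guard     -> "provided" clause
 *   returned state      -> "to" clause
 */
state_id driver_step(state_id state, source_id from, message_id msg,
                     bool options_acceptable)
{
    switch (state) {
    case CLOSED:
        if (from == FROM_TS && msg == TCONREQ_MSG)
            return WAIT_FOR_CC;
        break;
    case WAIT_FOR_CC:
        if (from == FROM_MAP && msg == CC_MSG) {
            /* An inner decision is treated as a guard: two transitions result. */
            if (options_acceptable)
                return OPEN;
            return WAIT_FOR_DC;
        }
        break;
    default:
        break;
    }
    return state; /* no branch matched: the state is unchanged */
}
```

For instance, driver_step(WAIT_FOR_CC, FROM_MAP, CC_MSG, false) selects the branch that leads to WAIT_FOR_DC. Returning the next state instead of assigning a global is a choice made here only to keep the sketch easy to check.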

specification TP;
default individual queue;
timescale seconds;

(* constant declarations *)
const
  . . .

(* type declarations *)
type
  credit_type . . .
  T_address_type . . .
  options_type . . .
  reason_type . . .
  order_type . . .
  reference_type . . .
  seq_number_type . . .
  TPDU_code_type = (CR, CC, DR, DC, ACK, . . .);
  TPDU = record
    full : boolean;
    order : order_type;
    peer_address : T_address_type;
    credit_value : credit_type;
    dest_ref : reference_type;
    user_data : data_type;
    case kind : TPDU_code_type of
      CR, CC : (options_ind : options_type;
                TSAP_id_calling, TSAP_id_called : T_suffix_type);
      DR : (is_last_PDU : boolean;
            disconnect_reason : reason_type);
      DC : ( );
      DT : (send_sequence : seq_number_type;
            end_of_TDSU : boolean);
      ACK : (expected_seq_number : seq_number_type);
  end;

(* channel declarations *)
channel AP_TS (user, provider);
  by user :
    TCONreq (dest_address : T_address_type; proposed_options : options_type);
    TCONresp (accepted_options : options_type);
    TDISreq;
    U_Ready (credits : credit_type);
  by provider :
    TCONind (source_address : T_address_type; proposed_options : options_type);
    TCONconf (accepted_options : options_type);
    TDISind (DIS_reason : reason_type);
    TDISconf;
    Ready ( );

module protocol systemactivity;
  ip TS : AP_TS (provider);
     MAP : AP_MAP (protocol);
end;

body protocol_body for protocol;

(* variable declarations *)
var option : options_type;

(* states of the protocol *)
state CLOSED, OPEN, WAIT_FOR_CC, WAIT_FOR_DC, WAIT_FOR_TCONRESP
. . .

(* message construction functions *)
function get_reason_for_error : reason_type;
function construct_CR (dest_address : T_address_type; option : options_type; r_credit : credit_type) : TPDU;
procedure construct_TCONind (peer_address : T_address_type; option : options_type);
procedure construct_TCONconf (option : options_type);
procedure construct_TDISind (reason : reason_type);
function construct_DR (procedure_error : reason_type; last_DR_PDU : boolean) : TPDU;
function construct_DT (trseq : seq_number_type; r_credit : credit_type) : TPDU;
procedure construct_TDISind (reason : reason_type);
function construct_DC : TPDU;
procedure Construct_READY;
procedure Construct_U_READY (credit : credit_type);

(* protocol state initialization *)
initialize to CLOSED
begin
end;

trans

when TS.TCONREQ
from CLOSED to WAIT_FOR_CC
begin
  option := proposed_options;
  message := construct_CR(dest_address, option, r_credit);
  output MAP.CR;
end;

when MAP.PDU
provided (PDU.kind = CC) and (message.options_ind <= option)
from WAIT_FOR_CC to OPEN
begin
  option := CC.options_ind;
  TRseq := 0;
  TSseq := 0;
  s_credit := CC.credit_value;
  construct_TCONconf(option);
  output TS.TCONCONF;
end;

when MAP.PDU
provided not ((PDU.kind = CC) and (message.options_ind <= option))
from WAIT_FOR_CC to WAIT_FOR_DC
begin
  reason := get_reason_for_error;
  construct_TDISind(reason);
  message := construct_DR(procedure_error, false);
  output TS.TDISIND;
  output MAP.DR;
end;

when MAP.PDU
provided (PDU.kind = CR)
from CLOSED to WAIT_FOR_TCONRESP
begin
  option := CR.options_ind;
  s_credit := CR.credit_value;
  construct_TCONind(CR.peer_address, option);
  output TS.TCONIND;
end;

when MAP.PDU
provided (PDU.kind = ACK)
from OPEN to same
begin
  if tsseq < ACK.expected_send_sequence
    then new_credit := ACK.expected_send_sequence - (tsseq + 128) + ACK.credit_value
    else new_credit := ACK.expected_send_sequence + ACK.credit_value - tsseq;
  if (new_credit >= 0) and (new_credit <= 15)
    then s_credit := new_credit;
end;

from OPEN to WAIT_FOR_DC
provided reason <> ts_user_initiated
begin
  reason := get_reason_for_error;
  construct_TDISind(reason);
  output TS.TDISind;
  message := construct_DR;
  output MAP.DR;
end;

when TS.U_READY
begin
  r_credit := r_credit + credits;
end;

from OPEN to same
begin
  message := construct_DT(trseq, r_credit);
  output MAP.DT;
end;

provided map_ready and (s_credit > 0)
begin
  Construct_READY ( );
  output TS.READY;
end;

(* more transitions here *)
. . .

modvar (* has to be filled manually *)

initialize
begin
  (* has to be filled manually *)
end;
end. (* specification TP *)
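One way to picture what the behavioural part of such a recovery has to produce is a small record per transition that is then printed clause by clause. The C sketch below is an assumption made for illustration, not the data structure used by the authors' tool: the type transition and the functions append_clause and format_transition are hypothetical names, and the action block of the transition is left out.

```c
#include <stdio.h>

/* Hypothetical record for one recovered transition (illustration only). */
typedef struct {
    const char *when;     /* input interaction, or NULL for a spontaneous transition */
    const char *provided; /* enabling predicate, or NULL if there is none */
    const char *from;     /* source state, or NULL if the transition applies in any state */
    const char *to;       /* destination state, or NULL if unspecified */
} transition;

/* Appends "keyword value\n" to buf when value is present; returns the new length. */
static size_t append_clause(char *buf, size_t size, size_t used,
                            const char *keyword, const char *value)
{
    int n;

    if (value == NULL || used >= size)
        return used;
    n = snprintf(buf + used, size - used, "%s %s\n", keyword, value);
    return n < 0 ? used : used + (size_t)n;
}

/* Writes the clauses of t in the order when, provided, from, to. */
size_t format_transition(const transition *t, char *buf, size_t size)
{
    size_t used = 0;

    if (size > 0)
        buf[0] = '\0';
    used = append_clause(buf, size, used, "when", t->when);
    used = append_clause(buf, size, used, "provided", t->provided);
    used = append_clause(buf, size, used, "from", t->from);
    used = append_clause(buf, size, used, "to", t->to);
    return used;
}
```

A record holding "TS.TCONreq", no predicate, "CLOSED" and "WAIT_FOR_CC" is printed as three lines beginning with when, from and to. Absent clauses are simply skipped, which mirrors the transitions in the specification that have no when or no from clause.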

5.2. Follow-up: protocol re-engineering

Once the protocol design is recovered and specified in Estelle, Estelle tools [9] can be used to enhance our understanding of the protocol and the services it provides, and to enhance the protocol conformance test plans. These Estelle-based tools can perform the following: protocol validation [19,20], protocol performance evaluation [21], protocol visualization [22] and protocol conformance test sequence generation [23,24]. Such post-reverse engineering activities also fall within the realm of protocol re-engineering.

The application of a protocol reverse engineering method would affect the processes and documentation involved in the protocol engineering cycle in several ways. The communications software will be better documented, and both the high-level and the detailed design will be described formally. In general, protocol reverse engineering provides easier access to design documents, easier maintainability, extendibility, testability and tractability, and a better understanding of the protocol.

A formally described design can in turn be revalidated. Moreover, protocol conformance test suites can be generated using existing tools. As a result, documented test plans would reduce the time spent on integration testing and other testing activities. The formal definition of the service will also be extracted from the existing design.

6. Conclusions

In this paper, we proposed a semi-automatic method for protocol reverse engineering based on the recovery of the service definition and the protocol specification, both described in Estelle. Using our tool, the reverse engineering process takes a communications protocol implementation and some information from the protocol designer as its input, and extracts a formal specification of the protocol design and its corresponding service definition. Two important kinds of components of communications software are identified using our redocumentation and recovery tool: (a) architectural and static components, which include the interaction points, channels, modules and their input and output parameters, predicates, and incoming and outgoing messages and their parameters; and (b) behavioural and dynamic components, which include all transitions and state variables. The extraction of architectural components is semi-automatic and is done with some assistance from the designer. However, the identification of behavioural components can be fully automated, and requires an intelligent tool since sophisticated operations such as the recognition of states and the normalization of transitions must be performed. The use of Estelle allows us to use Estelle-based tools to revalidate the recovered design and to generate conformance test cases, and in general allows us to better understand the implemented protocol and its provided service. Such tools will facilitate protocol re-engineering if serious problems are identified in the recovered design.

Acknowledgments

The authors would like to thank the anonymous referees for their constructive comments, which helped to improve an earlier version of this paper. The authors would also like to acknowledge the support of this work by Kuwait University Research Grant No. EE059.

Appendix A

The layout of an Estelle specification of a communications protocol follows.

(* Protocol specification *)
specification (protocol_name)

(* constant definitions *)
const . . .

(* type definitions *)
type . . .

(* variable definitions *)
var . . .

(* channel definitions *)
channel channel_1 . . .
  (* list of interactions for each role *)
channel channel_2 . . .
  (* list of interactions for each role *)

(* module definition_1 *)
module . . .
  (* list of interaction points *)
  ip . . .
end;

(* module definition_2 *)
module . . .
  (* list of interaction points *)
  ip . . .
end;
. . .

(* module body definition_1 *)
body module_1_body for module_1;
  (* constant definitions *)
  const . . .
  (* type definitions *)
  type . . .
  (* variable definitions *)
  var . . .
  (* function and procedure declarations *)
  function . . . begin . . . end;
  procedure . . . begin . . . end;
  (* module definitions *)
  module . . . ip . . . end;
  . . .
  (* state and stateset definitions *)
  state . . .
  stateset . . .
  (* module body initialization *)
  initialize to . . .
  (* transition part *)
  trans
  tname:
    When (input interaction) or Delay (time_period);
    (* variable, constant, type, functions and procedure decls *)
    . . .
    Provided (enabling predicate)
    From (from_state) To (to_state)
    (declarations of constants, types, variables, functions and procedures)
    Begin (action) End
  (* other transitions *)
end; (* of module body *)

(* module body definition 2 *)
body module_2_body for module_2;
  . . .
end; (* of module body *)
. . .

(* module variable definitions *)
modvar . . .
(* instantiation of module variables with modules names *)
init . . .
(* channel configuration *)
connect . . .
end. (* of protocol specification *)

Appendix B

The following is part of the C code implementing a simple Transport Protocol. The code is composed of three files: (1) a globals file including all literals and global type definitions and variables; (2) a utilities header file, including function definitions; and (3) the protocol file.

File protocol.g

#define MAP_LIT 1
#define TS_LIT 2
#define FREQUENCY 100

typedef struct { int length; data_type d; }

/* definitions of other types */
credit_type . . .
T_address_type . . .
options_type . . .
reason_type . . .
sequence_number_type . . .
T_suffix_type . . .
. . .

typedef enum { CR, CC, DR, DC, ACK, /* . . . */ } TPDU_code_type;

struct cr_cc {
    options_type options_ind;
    T_suffix_type TSAP_id_calling;
    T_suffix_type TSAP_id_called;
};
struct dr {
    int is_last_PDU;
    reason_type disconnect_reason;
};
struct dc { };
struct dt {
    seq_number_type send_sequence;
    int end_of_TDSU;
};
struct ack {
    seq_number_type expected_seq_number;
};

typedef struct TPDU {
    int full;
    order_type order;
    T_address_type peer_address;
    credit_type credit_value;
    reference_type dest_ref;
    data_type user_data;
    TPDU_code_type kind;
    union {
        struct cr_cc connect;
        struct dr disconnect;
        struct dc disconf;
        struct dt transfer;
        struct ack acknowledge;
    };
} TPDU;

typedef struct { T_address_type dest_address; options_type proposed_options; } TCONreq;
typedef struct { options_type accepted_options; } TCONresp;
typedef struct { } TDISreq;
typedef struct { credit_type credits; } U_Ready;
typedef struct { T_address_type source_address; options_type proposed_options; } TCONind;
typedef struct { options_type accepted_options; } TCONconf;
typedef struct { reason_type DIS_reason; } TDISind;
typedef struct { data_type TS_user_data; int End_of_SDU; } TDISconf;

File prot_util.h

/* input functions */
void receive_message(int from, char *message[]);
int get_type(char *message[]);
int get_kind(char *message[]);
reason_type get_reason_for_error;

/* output functions */
void send_message(int destination, int message_length, char *message);

/* message construction functions */
U_Ready *construct_READY(char *dest_address, int option, int r_credit);
TCONind *construct_TCONIND(char *peer_address, int option);
TCONconf *construct_TCONCONF(int option);
TDISreq *construct_TDISREQ();
dr *construct_DR(int error_reason, int flag);
cr *construct_CR(int trseq, int r_credit);
TDISind *construct_TDISIND(reason_type disconnect_reason);
ack *construct_ACK(seq_number_type sequence_number);

File protocol.c

#include <protocol.h>
#include <protocol.g>

int tp_protocol_driver()
{
    typedef enum { CLOSED, OPEN, WAIT_FOR_CC, WAIT_FOR_DC, WAIT_FOR_TCONRESP } stateid;
    typedef enum { READYid, TCONREQid, TRANSFERid, /* . . . */ } message_type;
    typedef enum { DRkind, CCkind, ACKkind, CRkind, /* . . . */ } message_kinds;

    /* variable declarations */
    int r_credit;
    int reason;
    int map_ready = 0;
    int s_credit = 0;
    int option;
    int trseq;
    int tsseq;
    int new_credit;
    stateid state, previous_state;
    message_type message_id;
    message_kinds message_kind;

    /* set the initial state */
    state = CLOSED;

    while (1) {
        receive_message(from, message); /* get message and source */
        message_id = get_type(&message);
        message_kind = get_kind(&message);

        switch (state) {
        case CLOSED:
            /* transition t1 */
            if ((from == TS_LIT) && (message_id == TCONREQid)) {
                previous_state = state;
                state = WAIT_FOR_CC;
                option = TCONREQ.option_ind;
                CR = construct_CR(dest_address, option, r_credit);
                send_message(MAP_LIT, &CR);
                break;
            }
            /* transition t2 */
            else if ((from == MAP_LIT) && (message_id == TRANSFERid)
                     && (message_kind == CRkind)) {
                previous_state = state;
                state = WAIT_FOR_TCONRESP;
                option = CR.option_ind;
                s_credit = CR.credit_value;
                send_message(TS_LIT, construct_TCONind(CR.peer_address, option));
                break;
            }

        case WAIT_FOR_CC:
            if ((from == MAP_LIT) && (message_id == TRANSFERid)
                && (message_kind == CCkind))
                if (message_id.options_ind <= option) { /* transition t3 */
                    previous_state = state;
                    state = OPEN;
                    option = CC.options_ind;
                    trseq = tsseq = 0;
                    s_credit = CC.credit_value;
                    send_message(TS_LIT, construct_TCONCONF(option));
                    break;
                }
                else { /* transition t4 */
                    previous_state = state;
                    state = WAIT_FOR_DC;
                    reason = get_reason_for_error;
                    send_message(TS_LIT, construct_TDISIND(reason));
                    send_message(MAP_LIT, construct_DR(reason, false));
                    break;
                }

        case OPEN:
            if ((from == MAP_LIT) && (message_id == TRANSFERid)
                && (message_kind == ACKkind)) { /* transition t5 */
                previous_state = state;
                if (tsseq < ACK.expected_send_sequence)
                    new_credit = ACK.expected_send_sequence - (tsseq + 128) + ACK.credit_value;
                else
                    new_credit = ACK.expected_send_sequence + ACK.credit_value - tsseq;
                if ((new_credit >= 0) && (new_credit <= 15))
                    s_credit = new_credit;
                break;
            }
            /* transition t6 */
            if (reason != ts_user_initiated) {
                previous_state = state;
                state = WAIT_FOR_DC;
                reason = get_reason_for_error;
                send_message(TS_LIT, construct_TDISIND(reason));
                send_message(MAP_LIT, construct_DR(reason, false));
                break;
            }
            /* transition tz -- self-loop */
            /* execute the following anytime at this state */
            counter++;
            if (!MOD(counter, FREQUENCY))
                send_message(MAP_LIT, construct_ACK(trseq, r_credit));

        case WAIT_FOR_DC:
            . . .
            break;

        case WAIT_FOR_TCONRESP:
            . . .
            break;
        } /* switch */

        /* transition tx */
        /* execute the following regardless of the current state */
        if ((from == TS_LIT) && (message_id == READYid))
            r_credit += credits;

        /* transition ty */
        /* execute the following regardless of the current state and incoming message */
        if ((map_ready) && (s_credit > 0)) {
            send_message(TS_LIT, construct_READY());
        }
    } /* The big while */
}

References

[1] T. Biggerstaff, Design recovery for maintenance and reuse, IEEE Computer (July 1989) 36-49.

[2] E. Chikofsky and J. Cross, Reverse engineering and design recovery: a taxonomy, IEEE Software (January 1990) 13-17.

[3] G.v. Bochmann, Finite state description of communication protocols, Computer Networks, 2, no. 4/5 (September 1978) 361-372.

[4] International Organization for Standardization - Programming Language - Pascal IS 7185.

[5] T.F. Piatkowski, An engineering discipline for distributed protocol systems, in Proc. IFIP workshop on Protocol Testing, 1981, pp. 177-215.

[6] FORTE, Proc. Int. Conf. on Formal Description Techniques, 1988-1996.

[7] IWPTS, Proc. Int. Workshop on Protocol Test Systems, 1988-1994.

[8] PSTV, Proc. Int. Symp. on Protocol Specification, Testing and Verification, 1980-1996.

[9] G.v. Bochmann, Usage of protocol development tools: the results of a survey, in Proc. IFIP Int. Conf. on Protocol Specification, Testing and Verification VII, Zurich, North-Holland, Amsterdam, 1988.

[10] M.T. Liu, Protocol engineering, Advances in Computers, Vol 29 (1989) 79-195.

[11] S. Paul, A. Prakash, E. Buss and J. Henshaw, Theories and techniques of program understanding, Proc. of IBM Center for Advanced Studies Conference CAS conf '91 (October 1991).

[12] E. Buss and J. Henshaw, A software reverse engineering experience, in Proc. IBM Centre for Advanced Studies Conf. CAS conf '91, October 1991, pp. 55-72.

[13] K. Gillis and D. Wright, Improving software maintenance using system-level reverse engineering, Proc. IEEE Conf. on Software Maintenance, 1990, pp. 84-90.

[14] ISO - Information Processing Systems - Open Systems Interconnection: Estelle - a formal description technique based on an extended state transition model, IS 9074.

[15] G.v. Bochmann, Specifications of a simplified transport protocol using different formal description techniques, Computer Networks and ISDN Systems, Vol 18, no. 5 (June 1990) 335-378.

[16] A. Lombardo, On Estelle specification of OSI protocols, in Proc. Computer Networking Symp., Washington, DC (November 1986) pp. 110-119.

[17] K. Saleh and H. Ural, Formal specification of an information gateway service interface in Estelle, Computer Standards and Interfaces, 16 (1994) 341-368.

[18] S. Budkowski and P. Dembinski, An introduction to Estelle: a specification language for distributed systems, Computer Networks and ISDN Systems, 14 (1987) 13-23.

[19] P. de Saqui-Sannes and J.P. Courtiat, From the simulation to the verification of Estelle specifications, in Proc. 2nd Int. Conf. on Formal Description Techniques, 1989, pp 524-541.

[20] B. Pehrson, Protocol verification for OSI, Computer Networks and ISDN Systems (1989/90) 185-202.

[21] P. Kritzinger and G. Wheeler, A protocol engineering workbench applied to protocol performance analysis, in Proc. 2nd Int. Conf. on Formal Description Techniques, 1989, pp. 63-76.

[22] D. New and P. Amer, Adding graphics and animation to Estelle, in Proc. 9th IFIP Int. Symp. on Protocol Specification, Testing and Verification, 1989.

[23] H. Ural, Test sequence selection based on static data flow analysis, Computer Comm., 10 (1987) 234-242.

[24] C. Wang and M.T. Liu, Automatic test case generation for Estelle, in Proc. 1993 Int. Conf. on Network Protocols, October 1993, pp. 225-233.