Page 1: [IEEE 2006 13th Working Conference on Reverse Engineering - Benevento, Italy (2006.10.23-2006.10.27)] 2006 13th Working Conference on Reverse Engineering - Reverse Engineering of System

Reverse Engineering of System Interfaces: A Report from the Field

Harry M. Sneed and Stephan H. Sneed
ANECON GmbH, Vienna

E-mail: [email protected]

Abstract: This paper is a report on three real industrial projects conducted to redocument the system interfaces of existing application systems: a stock trading system, a health insurance system and a bank credit system. The first reverse engineering project was for the purpose of accessing the backend C server software from Web clients. The second project was for the purpose of testing the interfaces between COBOL subsystems running in parallel in two interconnected environments, a Unix computer and a Bull mainframe. The third project was to extract procedures from existing PL/I programs for reuse as web services. The lessons learned from the three projects are assessed to draw conclusions on how to improve the development process.

Keywords: Reverse Engineering, Interface comprehension, data structures in C/C++, COBOL and PL/I, XML, XMI, WSDL.

System comprehension entails much more than just understanding the individual programs. A system is more than the sum of its parts. It is the sum of the parts plus the sum of all relationships between those parts and the sum of all effects caused by the interaction between those parts. [1] An IT system today consists of many different types of elements. Besides the programs or classes, there are also the data stores and the data interfaces. Much has been written on the comprehension of programs and classes, as well as on the reverse engineering of databases and files, but little research has been done on the understanding of data interfaces, though this is very important for system integration. This contribution describes three reverse engineering projects intended to redocument existing system interfaces. The problems encountered in these projects and the results obtained will hopefully encourage more research on this important topic.

The prerequisite to reusing existing software components in another context is to be able to wrap them, and the prerequisite to wrapping is to be able to comprehend and re-specify the interfaces. Thus, interface comprehension is an essential activity in achieving Enterprise Application Integration. The problem lies in separating the interface from the rest of the code in which the interface is embedded. First the interface must be recognized and extracted. After that, it can be analyzed and documented.

1. Overview of the Interface Reverse Engineering Projects

Three case studies from the last five years are described in this paper to point out the objectives and difficulties of industrial reverse engineering. In all three cases, the goal was to recapture the structure and content of existing system interfaces in order to be able to reuse the legacy software in another context. The three projects involved are

a Unix based client/server system for stock exchange with a frontend in C++ and a backend in C,

a Bull mainframe system for medical insurance written in COBOL and

a bank credit system implemented in PL/I on the IBM mainframe.

The fact that two of the three projects failed to meet their objectives demonstrates how difficult this problem is. There are not only technical barriers to overcome, but also organizational and human barriers to the reuse of legacy systems. The greatest barrier is the complete lack of documentation. All of the information required had to be taken from the source code. It is a sad fact that not even the internal interfaces of mission critical application systems are documented.

2. Redocumenting C interfaces in a client/server environment

The first project to be described here took place in the summer of 2001 within the context of a large scale evolution project for an existing financial application. The application, named GEOS for Global Entity Ordering System, is a stock trading system for banks implemented in a client/server environment. The system is divided into a frontend running on client workstations under MS-Windows and a backend running on an IBM mainframe under OS-MVS. [2]

Proceedings of the 13th Working Conference on Reverse Engineering (WCRE'06)0-7695-2719-1/06 $20.00 © 2006

The frontend software was developed in C++ using the GUI technologies of the 1990s. There were at this time some 4738 classes with 1,150,000 statements grouped into 1100 components on the frontend side. The backend was implemented in macro C using a preprocessor to generate the final code for execution under the teleprocessing monitor CICS. The backend was much larger, with some 1,450,000 statements in 2833 modules grouped into 22 subsystems. Each module made up an executable load module which was linked together with the other modules of that subsystem to create a single program running within its own virtual partition.

The backend included a database access shell comprising special SQL macro modules, one for each of the 632 DB2 tables. It was the task of these modules to provide the processing modules with data objects created by selecting and joining data from various tables and, after processing, to store the modified and newly created data objects back into the appropriate database tables.

The interaction between the frontend and the backend was realized by using a common intermediate database, in this case an IBM UDB-2 relational database. The client components inserted their messages into a database table, one for each backend subsystem, and the server scheduler, which was constantly searching for new messages, picked them up, checked their destination and routed them to the appropriate backend module for processing. When a backend process was finished, it would store its response in another intermediate database table. The client, after storing a request, would periodically query the message database to see whether its response was there. If, after a given time, there was still no response, the client would assume a timeout error. There were more than 500 interfaces implemented in this way. (See Figure 1.)
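The exchange just described can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the UDB-2 message database, and the table names (requests, responses), column names and function names are hypothetical, not taken from GEOS.

```python
import sqlite3
import time

def open_message_db():
    """Create an in-memory stand-in for the intermediate message database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, target TEXT, "
               "body TEXT, taken INTEGER DEFAULT 0)")
    db.execute("CREATE TABLE responses (request_id INTEGER PRIMARY KEY, body TEXT)")
    return db

def client_send(db, target, body):
    """Client side: insert a request message and return its id."""
    cur = db.execute("INSERT INTO requests (target, body) VALUES (?, ?)",
                     (target, body))
    return cur.lastrowid

def scheduler_step(db, handlers):
    """Server side: pick up new requests, route them, store the responses."""
    rows = db.execute(
        "SELECT id, target, body FROM requests WHERE taken = 0").fetchall()
    for req_id, target, body in rows:
        db.execute("UPDATE requests SET taken = 1 WHERE id = ?", (req_id,))
        reply = handlers[target](body)  # route to the backend module
        db.execute("INSERT INTO responses (request_id, body) VALUES (?, ?)",
                   (req_id, reply))
    return len(rows)

def client_poll(db, req_id, timeout=0.2, interval=0.05):
    """Client side: query periodically for the response, give up after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        row = db.execute("SELECT body FROM responses WHERE request_id = ?",
                         (req_id,)).fetchone()
        if row is not None:
            return row[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no response for request {req_id}")
        time.sleep(interval)
```

The sketch shows why the physical exchange says nothing about the logical format: the body column is an opaque value whose layout only the two communicating modules know.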

This describes the physical exchange of data between the frontend and the backend. The logical format of the exchanged data was only partially standardized. There could be singular and multiple objects, an object being a set of attributes with a unique identifier. The structure of the objects was left to the individual developers. Thus, the frontend developer would meet with the backend developer and they would agree upon a certain ordering of the data, which each would implement as a C struct statement. Sometimes the structure declaration was put in a header file, other times it was coded directly into the C module or the C++ class. The only convention was in the naming of the scheduled modules which read in the requests and wrote out the responses. They had distinct names identifying them as communication components.

As requirements changed, so did the structures of the interfaces. More and more data was added, so that in the end the structures became very complex, including length fields and pointers. At the beginning, the interfaces were documented in the form of tables, but as time went on the documentation was neglected. Only the C code remained to describe the interface structures.

The greatest obstacle to comprehending the interfaces was the frequent use of alternate data structures based on pointers set at run time. Which alternate structure was used, i.e. which pointer was set, was based on the value of some field in the header of the interface that was checked when the data came in. Any static description of the interface had to include all potential data layouts and the value indicating which one would be used.
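A static description of this kind amounts to a table that pairs each indicator value with one data layout. The following sketch illustrates the idea with Python's struct module; the type codes, field names and layouts are invented for the example and do not come from the GEOS interfaces.

```python
import struct

# Hypothetical static description: a 2-byte type code in the header
# selects one of several body layouts (format string, field names).
LAYOUTS = {
    1: ("<h10s", ("quantity", "symbol")),  # short followed by a fixed string
    2: ("<d", ("price",)),                 # a single double
}

def decode(message: bytes) -> dict:
    """Decode a message whose header field selects the body layout."""
    (type_code,) = struct.unpack_from("<h", message, 0)
    if type_code not in LAYOUTS:
        raise ValueError(f"unknown layout {type_code}")
    fmt, names = LAYOUTS[type_code]
    values = struct.unpack_from(fmt, message, 2)  # body starts after the header
    return {"type": type_code, **dict(zip(names, values))}
```

In the C code the same selection happens implicitly when a pointer is set, which is why the table has to be recovered by reading the program logic rather than the declarations alone.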

When the internet became available, the owner of the stock trading system was pressured by the customers to provide web access to the backend. For this purpose, a new Java team was set up to develop a web based frontend using HTML masks and an HTTP connection to a web server linked to the mainframe. The physical interaction was implemented and confirmed. The web server could insert the requests in the message database and pick up the responses. The problem came up with the logical content of the messages. Since there was no longer any valid documentation, the only source of information was the C source code.

Figure 1: GEOS Client/Server Data Exchange (diagram; labels: Frontend MS-Windows Clients, Routing Computer, UNIX Processor, Input Message Queue, Output Message Queue, Msg. DB, UDB-2 Database, Interface Component, Communication Access Shell (SQL), Backend Processing (CBE Components), IBM Mainframe OS/MVS, DB2 Data Base)

The format of the new web interfaces was to be in XML with an XML schema to describe the structure and contents of the interface. For this reason, it was decided to redocument the interfaces in XML. A reverse engineering project was started in the summer of 2001 to produce an XML schema for each server interface. The task was assigned to the second author, who went to work writing a Java parser to parse the C code and to extract the C structure declarations. The difficulty lay in recognizing the C data types and in handling the undefined strings and the repeating groups as well as the alternate structures. The interfaces were normally defined as a C struct statement. (see Figure 2)

typedef struct Tupel {
    ObjType objtype;
    DbType dbtype;
    short iSize;
    short sTableId;
    short iDB;
    short sVars;
    char * pszLogKey;
    char * pszSurrogat;
    ADATI * pdtTimeStamp;
    short * psSysstatus;
    ADATI * pdtTstTo;
    char * pszMAND_Id;
    short * psIndicators;
    void ** ppParents;
    REC_I aVars[4];
    short fsDbtab_Nr;
    short fsAttr_Nr;
    char fkAttribut_Unter[5];
    struct {
        unsigned short usDomain;
        union {
            char fsz[255];
            AKEY fk;
            char fc;
            short fs;
            long fi;
            double fr;
            ADATA fda;
            ADATI fdt;
            ATIME fti;
            AIMAGE fimg;
        } data;
    } fszValue;
} Tuple_1;

Figure 2: CBE Client/Server interface

C data types are defined somewhere outside of the structure in which they are used, mainly in a typedef declaration. A type can be derived from other types, so that resolving it entails a recursive search back to the original type definition. Here, the data types were defined in header files which referred to type definitions in other header files. So a chain of header files had to be processed in order to get to the final data type and length, which was then translated into a corresponding XML type using the standard DOM procedures. [3]
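The search back through the header files can be illustrated as follows. This is a deliberately small sketch, not the Java parser used in the project: it assumes simple one-line typedefs of the form "typedef base name;" and ignores struct bodies, arrays and function pointers. The function names are hypothetical.

```python
import re

# Matches simple declarations such as "typedef unsigned short WORD;".
TYPEDEF = re.compile(r"typedef\s+(.+?)\s+(\w+)\s*;")

def collect_typedefs(header_texts):
    """Map each typedef name to the type it is derived from, across all headers."""
    table = {}
    for text in header_texts:
        for base, name in TYPEDEF.findall(text):
            table[name] = " ".join(base.split())
    return table

def resolve(type_name, table):
    """Follow the typedef chain back to a type that is not itself a typedef."""
    seen = set()
    while type_name in table:
        if type_name in seen:
            raise ValueError(f"cyclic typedef: {type_name}")
        seen.add(type_name)
        type_name = table[type_name]
    return type_name
```

Because the table is built from all headers before any lookup, the order in which the header files are read does not matter.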

Undefined strings were converted to CDATA elements so that they could contain any type of character. The lengths were specified in a generated length variable. Multiple groups were assigned to the occurs attribute with a variable to contain the maximum number. For the alternate overlay structures a choice selection based on the indicator field was used. This resulted in some complex schema structures, but in the end the interface structures were captured and it was done fully automatically. The input was the C source libraries with the interface modules and all header files. The output was a general XML schema with a subschema for each interface. (see Figure 3)
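The mapping of repeating groups and overlay structures can be pictured with standard W3C XML Schema vocabulary. Note that this is an assumption made for illustration: the tool described here emitted its own element names, and the helper functions below are hypothetical. Fixed fields go into a sequence, union members into a choice, and an array bound becomes maxOccurs.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

def element(parent, name, xsd_type, count=1):
    """Add an xs:element; a repeating group gets a maxOccurs bound."""
    attrs = {"name": name, "type": xsd_type}
    if count > 1:
        attrs["maxOccurs"] = str(count)
    return ET.SubElement(parent, f"{{{XS}}}element", attrs)

def struct_with_union(name, fields, alternatives):
    """Build a complexType: fixed fields in sequence, union members as a choice."""
    ctype = ET.Element(f"{{{XS}}}complexType", {"name": name})
    seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
    for fname, ftype, count in fields:
        element(seq, fname, ftype, count)
    choice = ET.SubElement(seq, f"{{{XS}}}choice")
    for fname, ftype, count in alternatives:
        element(choice, fname, ftype, count)
    return ctype
```

For example, struct_with_union("fszValue", [("usDomain", "xs:unsignedShort", 1)], [("fs", "xs:short", 1), ("fr", "xs:double", 1)]) yields a type in which exactly one of fs or fr may follow usDomain. The link between the indicator value and the chosen alternative is not expressed by the choice itself and would have to be documented separately.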

<parameters>
  <param type="struct" name="Tupel">
    <param type="ObjType" name="objtype"/>
    <param type="DbType" name="dbtype"/>
    <param type="short" name="iSize"/>
    <param type="short" name="sTableId"/>
    <param type="short" name="iDB"/>
    <param type="short" name="sVars"/>
    <param type="char*" name="pszLogKey"/>
    <param type="char*" name="pszSurrogat"/>
    <param type="ADATI*" name="pdtTimeStamp"/>
    <param type="short*" name="psSysstatus"/>
    <param type="ADATI*" name="pdtTstTo"/>
    <param type="char*" name="pszMAND_Id"/>
    <param type="void**" name="ppParents"/>
    <param type="REC_I" name="aVars[4]"/>
    <param type="short" name="fsDbtab_Nr"/>
    <param type="short" name="fsAttr_Nr"/>
    <param type="char" name="fkAttribut_Unter[5]"/>
    <param type="struct" name="Tupel_1">
      <param type="struct" name="fszValue">
        <group grouporder="OR">
          <param type="short" name="usDomain"/>
          <param type="struct" name="data">
            <param type="char" name="fsz[255]"/>
            <param type="AKEY" name="fk"/>
            <param type="char" name="fc"/>
            <param type="short" name="fs"/>
            <param type="long" name="fl"/>
            <param type="double" name="fr"/>
            <param type="ADATE" name="fda"/>
            <param type="ADATI" name="fdt"/>
            <param type="ATIME" name="fti"/>
            <param type="AIMAGE" name="fimg"/>
          </param>
        </group>
      </param>
    </param>
  </param>
</parameters>

Figure 3: Generated XML Client/Server interface

This interface reengineering project was completed within two months. The XML schemas for the 92 interfaces analyzed could be used by the Java team to define the structure of their XML messages to the backend. This interaction was tested and proven to be correct in all but a few exceptions, which were caused by conflicting data type definitions. This problem has also been noted in previous studies. [4] Unfortunately, not all of the interfaces could be analyzed. In the case of some 50 interface components, it was not possible to extract the interface clearly from the code. Because of this and the fact that the performance was never acceptable, the web frontend project was later abandoned.

3. Documenting COBOL interfaces in a Bull mainframe environment

The second project to be described took place in the fall of 2004 as part of a mainframe to Unix migration project in an insurance company. Since the migration took place in an incremental mode, one subsystem after the other, part of the application was already in operation on the Unix server whereas the remainder was still in operation on the mainframe. Current data was passed between the two machines by means of sequential files, as the two databases, Bull IDS and Oracle RDBMS, were incompatible with one another. This incremental migration approach using an intermediate data exchange has been described by Stonebraker and Brodie and is referred to as the Chicken-Little strategy. [5] (see Figure 4)

One of the main problems with this approach came up with the testing of the migrated components on the new Unix platform. The input files from the mainframe had to be simulated. For that it was necessary to edit the files, and for that it was necessary to document and display their structure and content. Testing is the major cost driver in many software projects, particularly in migration projects, where it accounts for more than 50% of the total effort, and the most cost intensive task in testing is creating the proper test data. [6]

The interface files in this project were produced and consumed by COBOL programs on both sides. The Bull-COBOL programs which had been running on the mainframe were converted to MF-COBOL programs on the Unix machine with some minor adaptations, the goal being to retain COBOL as the primary application language. The exchange files were created from complex COBOL data structures using variable length fields, mixed data types and multidimensional repeating groups. These file structures reflected the complexity of the networked type database system on the mainframe.

To document the interfaces it was necessary, as in the GEOS project, to go into the program code to find the interface description, and as in the GEOS project this task was complicated by the fact that the interface descriptions were embedded in the program source. Again as in the GEOS project, the author chose to convert the files into an XML structure so that the testers could edit them using an XML editor.

To identify the interface files the author began by analyzing the Job Control procedures. Bull has its own special JCL language, so a special scanner had to be developed for that. The scanner recognized which physical data files were allocated as data exchange files and assigned to logical names contained within the programs which wrote them. This was the first step. The job control analyzer produced a list of physical and logical file names for the interface files.

The second step was to analyze the COBOL source code using a standard LR COBOL parser. [7] The logical file names from the JCL procedures are declared in the SELECT statements of the FILE-CONTROL paragraph. These statements refer to the FD file description entries in the FILE SECTION of the DATA DIVISION, where the record descriptions are contained. In some cases the record structures are defined in full; in other cases, the record is only a string and the contents of the string are defined elsewhere in the working storage. (see Figure 5)

In the GEOS project the greatest obstacle to locating the actual interface structure in C lay in identifying which pointers were set to the address of the interface descriptions. Here in COBOL the problem lay in resolving the redefinitions. For one physical record area there could be several logical record descriptions. The currently valid record description was determined dynamically by the value of a record type variable. To statically document the structure of the record it was necessary to select all potential record descriptions as overlay structures. Furthermore, it was necessary to detect which data structures were moved into the record area. So the contents of a record could be derived from one of three sources:

by setting the fields in the original record description,

by redefining the record structure with alternate data structures or

by moving different data structures into the record buffer.

Figure 4: UNIX Migration Architecture (diagram; labels: UNIX Computer, BULL Mainframe, Part of the Application has been integrated, Rest of the Application Remains on the Mainframe, COBOL Application Component, COBOL Service Component, Data Access Shell, Access Request, Interface File, IDS DB Corporate Database)

01 WSY016.
   05 WSY016-REC.
      10 WSY016-SATZART PIC 9(003).
      10 FILLER PIC X(797).
   05 WSY016-DATEI-ANFANG REDEFINES WSY016-REC.
      10 FILLER PIC 9(003) VALUE ZERO.
      10 FILLER PIC X(001) VALUE "^".
   05 WSY016-DATEI-ENDE REDEFINES WSY016-REC.
      10 FILLER PIC 9(003) VALUE 999.
      10 FILLER PIC X(001) VALUE "^".
   05 WSY016-DOK-TRENNBLATT REDEFINES WSY016-REC.
      10 FILLER PIC 9(003) VALUE 800.
      10 FILLER PIC X(001) VALUE "^".
      10 WSY016-TRENNBL-FARBE PIC 9(001).
         88 WSY016-TRENNBL-ROT VALUE 1.
         88 WSY016-TRENNBL-BLAU VALUE 2.
         88 WSY016-TRENNBL-GRUEN VALUE 3.
      10 FILLER PIC X(001) VALUE "^".
      10 WSY016-TRENNBL-TEXT PIC X(040).
      10 FILLER PIC X(001) VALUE "^".
      10 WSY016-TRENNBL-DATUM PIC X(010).
      10 FILLER PIC X(001) VALUE "^".
   05 WSY016-DOK-HEADER REDEFINES WSY016-REC.
      10 FILLER PIC 9(003) VALUE 100.
      10 FILLER PIC X(001) VALUE "^".
      10 WSY016-HEADER-SPARTE PIC X(003).
      10 FILLER PIC X(001) VALUE "^".
      10 WSY016-HEADER-GPNR PIC 9(003).
      10 FILLER PIC X(001) VALUE "^".
      10 WSY016-HEADER-FORMULAR PIC X(006).
      10 FILLER PIC X(001) VALUE "^".
      10 WSY016-HEADER-SBNR PIC 9(006).
      10 FILLER PIC X(001) VALUE "^".
      10 WSY016-HEADER-BEILAGE OCCURS 6 INDEXED BY WSY016-I.
         15 WSY016-T-BEILAGE PIC 9(003).

Figure 5: COBOL Interface Description

To document a record structure it was necessary to combine all three possible sources of data. The result was an intermediate COBOL record description in a separate copy member. This record description could contain several structures redefined upon the base one. In fact, each structure moved to the record or a part thereof became a redefinition. In this way the dynamic data structure overlay was converted to a set of potential data structures redefined upon one another.
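Collecting the alternative views of one record area can be sketched as below. This is an illustration only, not the LR parser used in the project: it recognizes nothing but the level number, the data name and an optional REDEFINES clause at the start of a line, and the function name is hypothetical.

```python
import re

# Level number, data name and an optional REDEFINES target.
ENTRY = re.compile(r"^\s*(\d{2})\s+([\w-]+)(?:\s+REDEFINES\s+([\w-]+))?",
                   re.IGNORECASE)

def overlay_views(copybook: str) -> dict:
    """Group record descriptions by the area they occupy.

    Returns {base_name: [base_name, redefinition, ...]} so that every
    potential layout of a record area is listed together.
    """
    views = {}
    for line in copybook.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue
        _level, name, base = m.groups()
        if base:
            views.setdefault(base, [base]).append(name)
    return views
```

Applied to the record of Figure 5, the result would list WSY016-REC together with its redefinitions such as WSY016-DATEI-ANFANG and WSY016-DOK-HEADER. Structures that reach the record area through MOVE statements are not found this way and need a separate data flow analysis.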

Once all possible views of the record in question had been joined into a common unified record description, it was possible in a third step to convert that COBOL data structure into a corresponding XML schema structure. The single dimensional arrays were converted to a vector. The multiple dimensional arrays were converted to nested vectors. The repeating groups were converted to embedded objects with a variable number of occurrences. The redefined structures were converted to alternate structures using a choice selection depending on the record type variable. The COBOL data types were converted to the corresponding XSD data types, e.g. character to string, decimal to decimal, binary to integer, etc. (see Figure 6 below)
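The type conversion can be illustrated for simple PICTURE clauses. The sketch below follows the mapping named in the text (character to string, decimal to decimal, binary to integer); the function names and the use of the xs: prefix are assumptions, and edited pictures, signs stored separately and packed fields are not handled.

```python
import re

def expand(pic: str) -> str:
    """Expand repetition factors, e.g. 9(003) becomes 999."""
    return re.sub(r"(.)\((\d+)\)",
                  lambda m: m.group(1) * int(m.group(2)), pic.upper())

def pic_to_xsd(pic: str, usage: str = "DISPLAY"):
    """Return (xsd_type, number of digit or character positions)."""
    chars = expand(pic)
    # S (sign) and V (implied decimal point) are not counted as positions.
    length = sum(chars.count(c) for c in "9XA")
    if "X" in chars or "A" in chars:
        return "xs:string", length
    if usage.upper() in ("COMP", "BINARY"):
        return "xs:integer", length
    return "xs:decimal", length
```

For the first two fields of Figure 5 this gives ("xs:decimal", 3) for PIC 9(003) and ("xs:string", 797) for PIC X(797), which matches the lng values shown in Figure 6. For binary fields the returned number is the digit count, not the storage length in bytes.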

<?xml version = "1.0" encoding = "UTF-8" ?>
<!--This schema was generated from prog: by the Sneed Tool GENSCHEMA on date:041026 -->
<schema name = "rundlauf" xmlns = "XSDCOB">
<XSDCOB:complexType type = "#file" name = "rundlauf" content = "eltOnly" model = "closed">
 <XSDCOB:complexType type = "#record" name = "WSY016" content = "eltOnly" model = "closed"
     level = "01" occurs = "ONEORMORE" minOccurs = "0001" maxOccurs = "unbounded">
  <XSDCOB:complexType type = "#group" name = "WSY016-REC" content = "eltOnly" model = "closed"
      level = "05" occurs = "ONEORMORE" minOccurs = "0001" maxOccurs = "0001">
   <XSDCOB:element type = "#dec" name = "WSY016-SATZART" content = "TextOnly" model = "closed"
       level = "10" occurs = "ONEORMORE" minOccurs = "0001" maxOccurs = "0001"
       pos = "0001" lng = "0003" val = "" pic = "9(003)" usage = "DISPLAY"/>
   <XSDCOB:element type = "#char" name = "FILLER" content = "TextOnly" model = "closed"
       level = "10" occurs = "ONEORMORE" minOccurs = "0001" maxOccurs = "0001"
       pos = "0004" lng = "0797" val = "" pic = "X(797)" usage = "DISPLAY"/>
  </XSDCOB:complexType>
  <XSDCOB:complexType type = "#group" name = "WSY016-DATEI-ANFANG" content = "eltOnly" model = "closed"
      level = "05" redef = "WSY016-REC" occurs = "OPTIONAL" minOccurs = "0001" maxOccurs = "0001">
   <XSDCOB:element type = "#dec" name = "FILLER-0001" content = "TextOnly" model = "closed"
       level = "10" occurs = "ONEORMORE" minOccurs = "0001" maxOccurs = "0001"
       pos = "0001" lng = "0003" val = "000000000000" pic = "9(003)" usage = "DISPLAY"/>
   <XSDCOB:element type = "#char" name = "FILLER-0002" content = "TextOnly" model = "closed"
       level = "10" occurs = "ONEORMORE" minOccurs = "0001" maxOccurs = "0001"
       pos = "0004" lng = "0001" val = "^" pic = "X(001)" usage = "DISPLAY"/>
  </XSDCOB:complexType>

Figure 6: Generated XML Schema

After the third step an XSD schema existed to be used as a basis for converting the contents of the sequential flat files into an XML format, which could then be edited using an XML editor. In this manner some 124 COBOL interface files were converted to XML and edited with selected test data to test the migrated COBOL programs in the UNIX environment. The goals of the project were met by the author within a month by reusing existing COBOL parsers and XML generators. As a result of this effort, the testers were able to edit and enrich the data import files to test the UNIX conversion.
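How a schema with position and length attributes drives the file conversion can be sketched as follows. The field list mirrors the pos and lng values of the first two elements in Figure 6; the function names and the use of the data names as XML tags are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# (name, pos, lng) with 1-based positions, as in the pos/lng attributes.
FIELDS = [("WSY016-SATZART", 1, 3), ("FILLER", 4, 797)]

def record_to_xml(record: str, fields, tag="record"):
    """Cut a fixed-width record into one XML element per field."""
    node = ET.Element(tag)
    for name, pos, lng in fields:
        child = ET.SubElement(node, name)
        child.text = record[pos - 1 : pos - 1 + lng].rstrip()
    return node

def xml_to_record(node, fields):
    """Rebuild the fixed-width record after editing (pads or truncates).

    Assumes the fields are listed in order and cover the record without gaps.
    """
    return "".join((node.findtext(name) or "").ljust(lng)[:lng]
                   for name, _pos, lng in fields)
```

A tester would edit the XML form and then write the record back with xml_to_record, so that the migrated COBOL program reads a file of the layout it expects.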

4. Redocumenting PL/I Interfaces for reuse as Web Services

The third and final project in this series of interface reverse engineering projects was carried out in the summer of 2005 as an experiment for a Swiss bank in the reuse of mainframe PL/I programs as web services. The goal was to extract the procedural interfaces from the source code and to convert them into both WSDL interfaces and BPEL4WS procedures. The WSDL interfaces were to describe the structure of the requests and responses, whereas the BPEL procedures were to describe how to invoke the WSDL requests and to receive the WSDL responses. Thus, from the one input, the PL/I program source, two outputs were to be produced: the WSDL interface description and the BPEL invocation procedure. The BPEL procedure was intended to be made available in a library of BPEL script fragments which could be assembled into complete business processes when required.

The first step, as in the preceding projects, was to parse the source code, extract the procedure interfaces and store them as separate include members. In PL/I there are both external and internal procedures. External procedures correspond to modules and can be separately compiled. Internal procedures are embedded in an external procedure. They can be invoked from outside if they are declared to be external. However, they are bound to the external procedure by the common data declared in that procedure. In contrast to COBOL, they have their own parameters and their own data stack. An external procedure may contain any number of internal procedures. As a rule, the internal procedures correspond to basic business functions. The role of the external procedure is to set up the common data and to control the execution sequence of the internal procedures. Therefore, it is the internal procedures which have to be targeted as potential web services. The task of controlling their execution sequence should be taken over by the client business process.

The same problem came up here as with all wrapping of embedded procedures, and that is the data state. When an internal procedure in PL/I is invoked, it inherits the data state of the procedure in which it is embedded. It can access and alter that global data. Thus, if the embedded procedure is to be used on its own, something must be done to initialize the state of the global data before starting the procedure and to preserve that state when the procedure has ended. Global data usage has always presented the greatest challenge to the reuse of embedded code units. [8]

To solve that problem here, the data usage of the original procedures was analyzed to identify the global variables. These were then included as additional parameters in the interface definition. This meant that the declaration of the original procedure had to be altered. In the declaration a new parameter was added to the parameter list, namely a separate structure which included all global variables referenced by that procedure. The same structure had to be included as a complex data type in the WSDL. At this point, the former internal procedure became an external procedure and could be compiled independently of the external procedure in which it had been embedded.
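The identification of global variables can be pictured as a set difference: names declared in the enclosing procedure that the internal procedure uses without declaring them itself. The following sketch is a rough approximation based on regular expressions, not a PL/I parser; it ignores comments, string literals and nested procedures, and the function name is hypothetical.

```python
import re

def global_references(proc_source: str, outer_names: set) -> list:
    """Names of the enclosing procedure used but not declared locally.

    Parameters count as local because PL/I declares them with DCL as well.
    """
    # Names introduced by DCL, with an optional structure level number.
    local = set(re.findall(r"\bDCL\s+(?:\d+\s+)?([\w#]+)", proc_source,
                           re.IGNORECASE))
    # Every identifier-like token appearing in the procedure text.
    used = set(re.findall(r"[A-Za-z#_][\w#]*", proc_source))
    return sorted((used & outer_names) - local)
```

The resulting list is what would be gathered into the additional parameter structure. A local declaration that hides an outer name of the same spelling is correctly left out, since the internal procedure then no longer touches the outer variable.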

KAS_TABLE: PROC (GLOBAL_DATA) RETURNS(bit(1));
/* Global Data inserted as Parameter */
DCL 1 KAS_TABLE_PARAMS,
      2 #T9999RC CHAR(2),
      2 #T9999LV CHAR(2),
      2 #K9999CC CHAR(2),
      2 #T9999FU CHAR(4),
      2 #T9999TA CHAR(4),
      2 #T9999SG CHAR(4),
      2 #K9999RE CHAR(2),
      2 #K999904 CHAR(8),
      2 #K9999RS CHAR(8),
      2 #K9999DP PTR,
      2 #K501265(20) BIT,
      2 P50,
        3 FILL1 CHAR(4),
        3 LANDKREDIT(8) BIT,
        3 FILL2 CHAR(15),
      2 TABP50(20) CHAR DEFINES P50,
      2 KTO,
        3 FILL3 CHAR(6),
        3 C_KAS CHAR(3),
        3 FILL4 CHAR(14),
        3 I_ID_ACNT CHAR(10),
        3 FILL5 CHAR(22),
      2 CIF,
        3 FILL6 CHAR(12),
        3 C_BUID CHAR(4),
        3 FILL7 CHAR(4)
/*---------------------------------*/

Figure 7: PL/I Procedure Interface

Once the internal procedures had been extracted from the source code and stored as separate source members, it was possible to process them individually. Each procedure declaration was converted into a WSDL schema. For this, the data declarations of the parameters in the original source had to be analyzed. Often the parameters were data structures. In this case, the entire data structure with all of its subordinate elements had to be included in the schema description. Structures and substructures were converted into complex data types. Elementary data items were converted to equivalent WSDL data types, e.g. CHAR to string, BIT to boolean and BINARY to integer. The result was a WSDL schema as depicted below.

<definitions xmlns="http://schemas.xmlsoap.org/wsdl">
<types>
<schema xmlns="http://www/XMLSchema">
<complexType name="#P50">
  <sequence>
    <element name="FILL1" type="string" lng="4"/>
    <element name="LANDKREDIT" type="string" lng="1"/>
  </sequence>
</complexType>
<complexType name="#KTO">
  <sequence>
    <element name="FILL3" type="string" lng="6"/>
    <element name="C_KAS" type="string" lng="3"/>
    <element name="FILL4" type="string" lng="14"/>
    <element name="I_ID_ACNT" type="string" lng="10"/>
  </sequence>
</complexType>
<complexType name="#CIF">
  <sequence>
    <element name="FILL6" type="string" lng="12"/>
    <element name="C_BUID" type="string" lng="4"/>
  </sequence>
</complexType>


<complexType name="#GLOBAL_DATA">
  <sequence>
    <element name="#T9999RC" type="string"/>
    ........................................
    <element name="#K9999DP" type="double"/>
    <element name="#K501265" type="boolean" MinOccurs="20" MaxOccurs="20"/>
    <element name="P50" type="tns:#P50"/>
    <element name="KTO" type="tns:#KTO"/>
    <element name="CIF" type="tns:#CIF"/>
  </sequence>
</complexType>
<element name="KAS_TABLE_PARAMS" type="tns:#KAS_TABLE_PARAMS"/>
</schema>
</types>
<message name="CreditCheck">
  <part name="parameters" element="tns2:KAS_TABLE_PARAMS"/>
</message>
<portType name="CreditCheck">
  <operation name="KAS_TABLE">
    <input message="tns:CreditCheck"/>
    <output message="tns_ReturnCode"/>
    <fault name="KAS_Exception" message="tns:ReturnCode"/>
  </operation>
</portType>
</definitions>

Figure 8: Generated WSDL Interface
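The conversion rule stated before Figure 8 (structures become complex types, elementary items map to schema types) could be sketched roughly as follows. This is a minimal illustration and not the generator used in the project: the nested-list representation of a PL/I declaration and the function name to_complex_types are assumptions, and lengths and array bounds are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Elementary type mapping named in the text: CHAR -> string, BIT -> boolean,
# BINARY -> integer.
TYPE_MAP = {"CHAR": "string", "BIT": "boolean", "BINARY": "integer"}


def to_complex_types(name, members, out):
    """Append a complexType for `name` to `out`, recursing into substructures.

    `members` is a list of (member_name, pli_type_or_sublist) pairs; a list in
    the second position marks a substructure (hypothetical representation).
    """
    ctype = ET.Element("complexType", {"name": "#" + name})
    seq = ET.SubElement(ctype, "sequence")
    for member, kind in members:
        if isinstance(kind, list):
            # Substructure: emit its own complexType first, then reference it.
            to_complex_types(member, kind, out)
            ET.SubElement(seq, "element",
                          {"name": member, "type": "tns:#" + member})
        else:
            ET.SubElement(seq, "element",
                          {"name": member, "type": TYPE_MAP[kind]})
    out.append(ctype)
    return out


# A cut-down version of the structure in Figure 7.
declaration = [
    ("#T9999RC", "CHAR"),
    ("KTO", [("FILL3", "CHAR"), ("C_KAS", "CHAR")]),
]
types = to_complex_types("GLOBAL_DATA", declaration, [])
for t in types:
    print(ET.tostring(t, encoding="unicode"))
```

Substructures are emitted before the structure that refers to them, which mirrors the order of the complex types in Figure 8.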

In this way, the responsibility for initializing the global data state was left to the client. It was up to the client to preserve the state of the individual service procedures and to pass it over each time the service was invoked. The service itself became stateless. [9] This appears to be the only way to reuse embedded code fragments. It applies equally well to sections in COBOL and procedures in C, both of which use global data contained in the embedding program. The alternative would be to reuse the whole program as a web service, but then control is left in the hands of the server, which runs counter to the idea of web services.
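The division of responsibility between client and stateless service can be illustrated with a small Python sketch. It is only an analogy for the PL/I wrapping: the function kas_table, the call counter and the placeholder check are invented for the example and do not reflect the real business logic.

```python
def kas_table(global_data):
    """Stateless stand-in for a wrapped procedure.

    All state arrives in `global_data`, and the (possibly changed) state is
    returned together with the result; nothing is kept between calls.
    """
    state = dict(global_data)       # work on a copy, never on shared data
    state["calls"] = state.get("calls", 0) + 1
    ok = bool(state.get("C_KAS"))   # placeholder check, not the real rule
    return ok, state


# The client owns the state and hands it over on every invocation.
client_state = {"C_KAS": "123"}
ok1, client_state = kas_table(client_state)
ok2, client_state = kas_table(client_state)
print(ok1, ok2, client_state["calls"])
```

Because the service keeps nothing between invocations, any instance of it can serve any call, at the price of shipping the complete global data structure with every request.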

The final step in this wrapping process was to generate the BPEL instructions for invoking each internal procedure as a web service. The parameters in the WSDL had to be defined as inputs and outputs to the service and initialized using the Copy instruction. Then followed the invoke statement for calling the web service, after which the response is picked up and the results secured. A fragment of the generated BPEL code is depicted below.

<process name = "CheckCustomerCredit"
         xmlns:calender = "http://anecon.com/sneed/sample/" >
  <partnerLinks>
    <PartnerLink name = "CreditCheck"
                 partnerLinkType = "Bank:User"
                 myRole = "Provider"
                 partnerRole = "User" />
  </partnerLinks>
  <variables>
    <!-- inputs for Credit Checking -->
    <variable name = "#T9999RC" messageType = "CreditCheck:KAS_TABLE_PARAMS"/>
    <variable name = "#K501265" messageType = "CreditCheck:KAS_TABLE_PARAMS"/>
    <variable name = "P50" messageType = "CreditCheck:KAS_TABLE_PARAMS"/>
    <variable name = "KTO" messageType = "CreditCheck:KAS_TABLE_PARAMS"/>
    <!-- outputs for Calender Functions -->
    <variable name = "ResponseCode" messageType = "CreditCheck:Response"/>
  </variables>
  <assign>
    <copy>
      <from variable = "Customer_Id" part = "Customer" />
      <to variable = "P50" part = "KAS_TABLE_DATA" />
    </copy>
  </assign>
  <!-- call CreditCheck -->
  <invoke partnerLink = "BankCustomer"
          portType = "CreditStatusPT"
          operation = "KAS_TABLE"
          inputVariable = "KAS_TABLE_PARAMS"
          outputVariable = "ResponseCode" />
  <assign>
    <copy>
      <from variable = "ResponseCode" part = "CreditResponse" />
      <to variable = "CreditRating" part = "CustomerData" />
    </copy>
  </assign>
</process>

Figure 9: Generated BPEL Procedure
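How such a fragment might be produced can be sketched in Python. The sketch below only assembles the assign/copy and invoke elements seen in Figure 9; the function name build_invocation and the enclosing sequence element are assumptions made for the example and are not taken from the generator used in the project.

```python
import xml.etree.ElementTree as ET


def build_invocation(operation, partner_link, port_type,
                     input_var, output_var, copies):
    """Return a <sequence> holding the copy assignments and the invoke step.

    `copies` is a list of ((from_variable, from_part), (to_variable, to_part)).
    """
    seq = ET.Element("sequence")
    assign = ET.SubElement(seq, "assign")
    for (src_var, src_part), (dst_var, dst_part) in copies:
        copy = ET.SubElement(assign, "copy")
        ET.SubElement(copy, "from", {"variable": src_var, "part": src_part})
        ET.SubElement(copy, "to", {"variable": dst_var, "part": dst_part})
    ET.SubElement(seq, "invoke", {
        "partnerLink": partner_link,
        "portType": port_type,
        "operation": operation,
        "inputVariable": input_var,
        "outputVariable": output_var,
    })
    return seq


# Values taken from Figure 9 where available.
fragment = build_invocation(
    "KAS_TABLE", "CreditCheck", "CreditStatusPT",
    "KAS_TABLE_PARAMS", "ResponseCode",
    [(("Customer_Id", "Customer"), ("P50", "KAS_TABLE_DATA"))],
)
print(ET.tostring(fragment, encoding="unicode"))
```

Driving the generation from the operation and parameter names already present in the WSDL keeps the BPEL invocation consistent with the interface description.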

As was the case in the first project, this project was never carried out to the end. The work was viewed by the customer as a feasibility study on the reuse of the legacy software. Once it was demonstrated that a link could be established from the client process on the PC workstation to the server programs on the host via MQ-Series, the project was abandoned. The customer had in the meantime decided to take another approach. The only lasting benefit to the customer was documentation of its PL/I program interfaces in an XML format.

5. Common problems in all three projects

It is an interesting observation that in all three projects the problems of reverse engineering the interfaces were very similar. The main problem lies in the lack of an interface architecture. In all three cases there was no higher order framework to govern the implementation of the interfaces. It was left to the individual developers to sort it out among themselves. Every developer was free to structure his or her interfaces in any way he or she chose. This resulted in the use of overlays, pointers, alternate record types and incompatible data types, in other words a big undocumented mess, since none of the developers ever took the time to really specify their interfaces. They simply emerged according to the influx of diverse, often conflicting requirements.


Thus it did not matter whether the interfaces were implemented in COBOL, PL/I, C or C++. In all cases, the developers, being left to their own devices, managed to create highly complicated data structures which no one could readily understand.

This is all the more astounding since in all of the organizations involved great pains were taken to unify and control the database structures. In all of the organizations affected there was a rigid database administration which developed and maintained centralized database schemas. The developers were not allowed to either create or alter database structures on their own. They had to submit requests to the database administrator, who kept tight control over the evolution of the databases.

In contrast, the data interfaces between systems or subsystems were not controlled at all, but were allowed to emerge in any possible manner. Only when it came time to consolidate and to specify the interfaces did the managers realize what a problem they had on their hands. In the case of the GEOS project, it prohibited the full use of the new web frontend, since many backend C interfaces could never be properly serviced by the Java clients. In the case of the GCOS project, it delayed the test of the migrated COBOL programs by two months while the interfaces were being post-documented. In the case of the Swiss bank, it caused the wrapping effort to be abandoned. In all cases the financial loss was significant. Had it not been possible to develop an adequate reverse engineering tool, none of the interfaces could have been documented.

6. Lessons learned from the three projects

The main lesson learned is that system interfaces have to be managed in the same way that the databases are managed. There should be an interface administrator just as there is a database administrator. This interface administrator should be responsible for designing a common metaschema for all data interfaces, preferably with XMI. [10] There he or she should establish a common ontology and specify the basic data types as well as the standard data structures. If developers are required to establish an import or export interface, they should have to request it from the interface administrator, who will then generate an XSD subschema for them. In no case should individual developers be allowed to develop their own interfaces, since this would only lead to the problems that came up in the three projects discussed in this paper.

In principle, programmers must be kept under tight control as concerns the system architecture, the databases and the system interfaces. In fact, the less freedom individual programmers have, the better it is for the system as a whole. They must be forced to follow strict coding conventions, to avoid using complicated programming constructs and to document everything they do. Unfortunately, generations of programmers have proven that they are not mature enough to do what is best for their employers. So employers have no other choice but to impose rigid standards upon them.

Once the situation is out of hand, as it is with most legacy systems, the system owners must be willing to invest in reengineering projects to restore order in their systems. This should begin with the system interfaces. They need first to be redocumented and then to be reengineered into XML type schemas. Eventually the various schemas, one for each system interface, should be merged together into a common metaschema with a common ontology and a set of standard data types and structure templates. This may take months to achieve, but once it is done, the legacy system will be much easier to maintain, migrate and test than the systems presented in this paper. [11]

7. Related Work

There is no lack of related work in this field. Work on the reverse engineering of legacy applications for inclusion as web services has been in progress since the year 2000, much of which has taken place at the RCOST institute in Benevento. One of the first research projects was carried out by Aversano, Canfora, Cimitile and De Lucia [12]. That was followed by several others including, in particular, the work of Bodhuin, Guardabascio and Tortorella [13] and that of Bovenzi, Canfora and Fasolino [14]. The researchers at RCOST succeeded in restructuring the user interfaces of existing e-government applications for reuse as web services.

Other research in this field has taken place in Canada at the University of Victoria [15] and in Edmonton, where Eleni Stroulia has managed to reverse engineer GUI user interfaces to web pages [16]. In England, C. Boldyreff has been working for many years on the conversion of legacy applications in the health administration to the web. [17] However, almost all of this research has been directed toward reengineering the user interfaces and not toward reusing the system interfaces.


8. Conclusions

In this paper three projects for interface reverse engineering were described. In the first project, the goal was to redocument the interfaces of a backend application server implemented in C in order to be able to access those interfaces from a web client. This was an example of interface reverse engineering for the purpose of system migration.

In the second project, the goal was to redocument the interfaces between already migrated COBOL programs on a UNIX machine and not yet migrated COBOL programs on a Bull mainframe in order to be able to simulate the interfaces. This was an example of interface reverse engineering for the purpose of system testing. The first project was only partially successful; the second was fully successful.

In the third project, the objective was to make PL/I procedures on the mainframe available as web services to business processes running on the internet. This was an example of interface reverse engineering for the purpose of wrapping existing business functions embedded in legacy code. Unfortunately this project was never carried through to the end, because of the organizational difficulties involved in implementing web services.

The conclusion of the paper is that the problems occurring here could have been easily avoided had there been a central interface administration and a common interface schema. System interfaces, like system databases, need to be designed and administered by a single point of control. This is the main prerequisite to a Service Oriented Architecture [18].

References

[1] Malhotra, Y./Galletta, D.: "Building Systems that Users want to use", Comm. of ACM, Vol. 47, No. 12, Dec. 2004, p. 89
[2] Sneed, H./Dombovari, T.: "Comprehending a complex, distributed Client/Server System - A Report from the Field", in Proc. of 7th IWPC, IEEE Computer Society Press, Pittsburgh, May 1999, p. 218
[3] Fitzgerald, M.: Building B2B Applications with XML, John Wiley & Sons, New York, 2001, p. 51
[4] Grado-Caffaro, M.-A./Grado-Caffaro, M.: "The Challenges that XML faces", in IEEE Computer, October 2001, p. 15
[5] Brodie, M./Stonebraker, M.: Migrating Legacy Systems, Morgan-Kaufmann Pub., San Francisco, 1995, p. 13
[6] Woodward, M.: "Insights in Software Testing", Software Focus, Vol. 2, No. 3, June 2001, p. 93
[7] Sellink, A./Verhoef, C.: "Native Parsing Patterns", Proc. of 8th WCRE, IEEE Computer Society Press, Honolulu, Oct. 1998, p. 89
[8] Sneed, H.: "Encapsulating Legacy Software for use in Client/Server Systems", in Proc. of Working Conf. on Reverse Engineering, Monterey, Cal., IEEE Press, Nov. 1996, p. 104
[9] Sneed, H.: "Generation of Stateless Components from Procedural Programs for Reuse in a Distributed System", in Proc. of 4th European CSMR, IEEE Press, Zürich, March 2000, p. 183
[10] Daum, D./Merten, U.: System Architecture with XML, Morgan Kaufmann Pub., San Francisco, 2003, p. 68
[11] Hasselbring, W.: "Information System Integration", Comm. of ACM, Vol. 43, No. 6, June 2000, p. 35
[12] Aversano, L./Canfora, G./Cimitile, A./De Lucia, A.: "Migrating Legacy Systems to the Web", Proc. of 5th CSMR, IEEE Computer Society Press, Lisbon, March 2001, p. 148
[13] Bodhuin, T./Guardabascio, E./Tortorella, M.: "Migrating COBOL Systems to the WEB", Proc. of 9th WCRE-2002, IEEE Computer Society, Richmond, Va., Nov. 2002, p. 329
[14] Bovenzi, D./Canfora, G./Fasolino, A.: "Enabling Legacy System Accessibility by Web Heterogeneous Clients", Proc. of 7th CSMR-2003, IEEE Computer Society Press, Benevento, March 2003, p. 73
[15] Tilley, S./Gerdes, J./Hamilton, T./Huang, S./Müller, H./Smith, D./Wong, K.: "On the business value and technical challenges of adapting Web services", Journal of Software Maintenance and Evolution, Vol. 16, No. 1, 2004, p. 31
[16] Stroulia, E./El-Ramly, M./Sorenson, P.: "From Legacy to Web through Interaction Modeling", in Proc. of ICSM-2002, IEEE Computer Society Press, Montreal, Oct. 2002, p. 320
[17] Boldyreff, C./Kewish, R.: "Reverse Engineering to achieve maintainable Websites", Proc. of WCRE-2001, IEEE Computer Society Press, Stuttgart, 2001, p. 249
[18] Krafzig, D./Banke, K./Slama, D.: Enterprise SOA, Prentice-Hall Pub., Upper Saddle River, N.J., 2004, p. 6
