2009 16th Working Conference on Reverse Engineering, Lille, France (2009.10.13-2009.10.16)




JavaCompExt: Extracting Architectural Elements from Java Source Code

Nicolas Anquetil and Jean-Claude Royer
ASCOLA, EMN-INRIA, Nantes, France
Email: (nicolas.anquetil,jean-claude.royer)@emn.fr

Pascal André and Gilles Ardourel
LINA, UMR 6241, Nantes, France
Email: (pascal.andre,gilles.ardourel)@univ-nantes.fr

Petr Hnětynka and Tomáš Poch
DSRG - Charles University, Prague, Czech Republic
Email: (hnetynka,poch)@dsrg.mff.cuni.cz

Dragoş Petraşcu and Vladiela Petraşcu
LCI - Babeş-Bolyai University, Cluj-Napoca, Romania
Email: (vladi,petrascu)@cs.ubbcluj.ro

Abstract: Software architecture erosion is a general problem in legacy software. To fight this trend, component models and languages are designed to make architectural decisions explicit and automatically enforceable, in terms of components, interfaces, and allowed communication channels between component interfaces. To help maintainers work on existing object-oriented systems, we explore the possibility of extracting architectural elements (components, communications, services, etc.) from the source code. We designed a tool based on heuristics for extracting component information from Java source code.

Keywords: component recovery, communication channels, reverse engineering, architecture recovery

I. INTRODUCTION

Architectural erosion is the process by which a system's architecture gradually degrades as maintainers make changes to the system that violate the original architectural intent. As a result, the system becomes progressively more difficult to maintain as communication channels (e.g. method calls) are established among all parts of the system. To fight this trend, new languages and development methods have been proposed that make some architectural decisions explicit in the source code (for the benefit of the programmers) and allow automatic verification and enforcement of these decisions, either at compile time or at execution time. For example, initiatives like ArchJava [1] extend the Java language with architectural component types and with in and out ports on the components, which allow explicit connections to be established between the components. One interesting property is the ability to statically check communication integrity [2], that is to say, to ensure that implemented components do not communicate with each other in ways that would violate the intended control flow rules of the architecture.

We explore the possibility of automatically reverse-engineering a formal model from a conventional Java legacy application by extracting the component types it contains and making explicit the communication channels between them. This paper describes the main rules and the associated tool support.

II. ABSTRACTION FROM SOURCE CODE

A full component-based architectural model can include: components, required and provided services of the components, composition structure of the components (composite components and sub-components), communication channels between components, and data circulating on the communication channels.

We aim to extract this information from a traditional object-oriented application. We are currently working on Java code, but our hypotheses could apply to any object-oriented language. Classes and Java interfaces are called types of interest. A type of interest may be either a component type or a data type. Component types are created from these types of interest; we do not group variables, routines or objects to create components as traditional component recovery would do (e.g. see [3]). We pay more attention to the components because they are semantically richer from an architectural point of view: they have provided and required interfaces, they communicate, etc. We decided to favour precision over recall for components: we prefer to recover fewer components, with a lower probability of false positives, rather than trying to extract "all" components at the cost of many false positives. Therefore, we use strict rules for components.

A. Main Heuristics for Recovering

The first rule concerns the extraction of component types: if a type of interest is passed as a parameter of a method or returned by a method, it is considered a data type; otherwise it is considered a component type. The idea is that static checking of communication integrity is possible only when all uses of a component are explicit (as opposed to accessing the component through a pointer). This is a standard assumption in component-based development (e.g. [1], [4]). The main exception to the first heuristic is that a component may be passed to or returned by a constructor. As a consequence of this first rule, a sub-type of a data type is also considered a data type, since instances of the sub-type could be used as parameters according to sub-typing rules.
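To illustrate the first rule, the following is a minimal sketch, not the tool's actual implementation. It assumes a hypothetical, simplified model in which types are plain names, each signature lists its parameter and return types, and direct supertypes are given as a map. The names ComponentTypeClassifier, Signature and dataTypes are ours and do not come from the tool.

```java
import java.util.*;

/** Hypothetical, simplified model of rule 1; not the tool's actual code. */
final class ComponentTypeClassifier {

    /** A method or constructor signature reduced to what rule 1 needs. */
    static final class Signature {
        final boolean isConstructor;
        final List<String> signatureTypes; // parameter types plus return type

        Signature(boolean isConstructor, String... signatureTypes) {
            this.isConstructor = isConstructor;
            this.signatureTypes = Arrays.asList(signatureTypes);
        }
    }

    /**
     * Returns the types of interest classified as data types.
     * Every other type of interest is considered a component type.
     */
    static Set<String> dataTypes(Set<String> typesOfInterest,
                                 List<Signature> signatures,
                                 Map<String, Set<String>> superTypes) {
        Set<String> data = new TreeSet<>();
        // Seed: types passed to or returned by an ordinary method.
        for (Signature s : signatures) {
            if (s.isConstructor) continue; // exception: constructors do not count
            for (String t : s.signatureTypes) {
                if (typesOfInterest.contains(t)) data.add(t);
            }
        }
        // Propagate until stable: a sub-type of a data type is a data type too.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String t : typesOfInterest) {
                if (data.contains(t)) continue;
                for (String sup : superTypes.getOrDefault(t, Collections.emptySet())) {
                    if (data.contains(sup)) {
                        data.add(t);
                        changed = true;
                        break;
                    }
                }
            }
        }
        return data;
    }
}
```

In this sketch, a type that appears only in a constructor signature stays a component type, while a type that extends a data type becomes a data type during the propagation step.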

2009 16th Working Conference on Reverse Engineering

1095-1350/09 $26.00 © 2009 IEEE

DOI 10.1109/WCRE.2009.53


We chose to ignore all types of interest not defined in the Java project under consideration. That is to say, we ignore external libraries such as java.io.*. One reason is that we want to extract the provided services of the components and their structure, and we need to analyze the source code for this. Another reason is that we cannot hope to restructure the entire world and want to limit ourselves to the application at hand.

After extracting the component types, the composite structure can be retrieved. This is our second rule: the composition structure of components is extracted from the fields (a component that is part of another component). We choose to consider the maximal structure, that is, we collect both the attributes defined in the type and those it inherits.
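The second rule can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard Java reflection API on loaded classes as a stand-in for the source code analysis performed by the tool, and the class names (CompositionExtractor, CashDesk, ExpressCashDesk, Printer, BarcodeReader) are hypothetical.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch of rule 2 using reflection instead of source analysis. */
final class CompositionExtractor {

    /**
     * Returns the component types that appear as field types of the given type,
     * looking at fields declared in the class and in all its superclasses
     * (the "maximal structure").
     */
    static Set<Class<?>> subComponents(Class<?> type, Set<Class<?>> componentTypes) {
        Set<Class<?>> parts = new LinkedHashSet<>();
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (componentTypes.contains(f.getType())) {
                    parts.add(f.getType());
                }
            }
        }
        return parts;
    }

    // Toy classes for illustration only.
    static class Printer {}
    static class BarcodeReader {}
    static class CashDesk { Printer printer; int id; }
    static class ExpressCashDesk extends CashDesk { BarcodeReader reader; }
}
```

With Printer and BarcodeReader as component types, ExpressCashDesk is found to contain both, because the printer field is inherited from CashDesk. The int field is ignored since it is not a component type.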

Since components should communicate, we have a third rule for extracting communications: there is a communication between two component types if a method of one component type makes a call to a method of the other. If the method "returns" void, it is a one-way communication; otherwise, it is a two-way communication. Other means of communication are possible, for example using the reflective API in Java. We did not consider these cases, as they require more advanced knowledge of an application to know how communications are implemented in it.
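A minimal sketch of the third rule is given below. It assumes that method calls have already been collected into a hypothetical Call structure holding type and method names as strings. The exclusion of calls within a single type is our assumption, based on the rule speaking of "the other" component type.

```java
import java.util.*;

/** Hypothetical sketch of rule 3; the Call model is an assumption. */
final class CommunicationExtractor {

    enum Direction { ONE_WAY, TWO_WAY }

    /** One observed method call, reduced to what rule 3 needs. */
    static final class Call {
        final String callerType, calleeType, method, returnType;

        Call(String callerType, String calleeType, String method, String returnType) {
            this.callerType = callerType;
            this.calleeType = calleeType;
            this.method = method;
            this.returnType = returnType;
        }
    }

    /** A call to a void method is one-way; any other return type makes it two-way. */
    static Direction direction(Call call) {
        return "void".equals(call.returnType) ? Direction.ONE_WAY : Direction.TWO_WAY;
    }

    /**
     * Keeps only calls between two distinct component types and groups
     * the observed directions by "caller -> callee" channel.
     */
    static Map<String, Set<Direction>> channels(List<Call> calls, Set<String> componentTypes) {
        Map<String, Set<Direction>> result = new TreeMap<>();
        for (Call c : calls) {
            boolean bothComponents = componentTypes.contains(c.callerType)
                    && componentTypes.contains(c.calleeType);
            // Assumption: calls inside one type do not form a channel.
            if (!bothComponents || c.callerType.equals(c.calleeType)) continue;
            result.computeIfAbsent(c.callerType + " -> " + c.calleeType,
                            k -> EnumSet.noneOf(Direction.class))
                  .add(direction(c));
        }
        return result;
    }
}
```

Calls whose target is a data type are dropped here, since the rule only concerns communications between component types.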

We also consider that sub-typing could exist between component types, which defines the fourth rule: sub-typing relationships are computed from the language inheritance relationships. In Java, there are two such relationships: extends and implements. Component types may inherit from component types but not from data types. Data types may inherit from data types or component types.
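The constraint stated in the fourth rule can be expressed as a small consistency check. The sketch below is an illustration only: it reads extends and implements relationships through the standard reflection API rather than from source code, and the names SubtypingRule, Sellable, Product and Shelf are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch of the rule 4 constraint, using reflection. */
final class SubtypingRule {

    /** Direct supertypes via extends and implements, restricted to the types of interest. */
    static Set<Class<?>> directSuperTypes(Class<?> type, Set<Class<?>> typesOfInterest) {
        Set<Class<?>> supers = new LinkedHashSet<>();
        if (type.getSuperclass() != null) supers.add(type.getSuperclass());
        supers.addAll(Arrays.asList(type.getInterfaces()));
        supers.retainAll(typesOfInterest); // drops Object and external library types
        return supers;
    }

    /** True if a component type inherits directly from a data type, which rule 4 forbids. */
    static boolean violates(Class<?> type, Set<Class<?>> typesOfInterest, Set<Class<?>> dataTypes) {
        if (dataTypes.contains(type)) return false; // data types may inherit from either kind
        for (Class<?> sup : directSuperTypes(type, typesOfInterest)) {
            if (dataTypes.contains(sup)) return true;
        }
        return false;
    }

    // Toy types for illustration only.
    interface Sellable {}
    static class Product implements Sellable {}
    static class Shelf extends Product {}
}
```

If Product is a data type and Shelf is still marked as a component type, the check reports a violation for Shelf. Once Shelf is also treated as a data type, as the first rule requires for sub-types of data types, the violation disappears.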

The fifth rule extracts provided and required services: the required services of a component type are those methods that are called in it. Provided services are all the publicly available methods defined in the component type. In Java, these methods are the public and default (package) ones.
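The provided-services half of the fifth rule can be sketched with the standard reflection API, again as a stand-in for source code analysis. Required services are not shown because they need the call information of the third rule. The names ProvidedServices and Inventory are hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch of the provided-services part of rule 5. */
final class ProvidedServices {

    /** Names of the methods declared in the type that are public or package-private. */
    static SortedSet<String> of(Class<?> type) {
        SortedSet<String> services = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            int mod = m.getModifiers();
            boolean hidden = Modifier.isPrivate(mod) || Modifier.isProtected(mod);
            // Compiler-generated (synthetic) methods are not services.
            if (!hidden && !m.isSynthetic()) {
                services.add(m.getName());
            }
        }
        return services;
    }

    // Toy class for illustration only.
    static class Inventory {
        public int stockOf(String item) { return 0; }
        void restock(String item) {}
        protected void audit() {}
        private void log() {}
    }
}
```

For the toy Inventory class, only stockOf (public) and restock (default package visibility) are reported as provided services.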

B. Tool support

Our tool support is an Eclipse plugin based on the JDT parser to analyse Java projects. The program analysed is source code (binary Java code could be analysed in more or less the same way, either directly or after decompiling). The plugin processes the rules described previously and extracts the information. The information recovered is available in two modes. First, several textual views provide the results of the extraction and some metrics. The second mode proposes two interactive graphical views to analyse the results. In this case the views represent the information as a graph. The graph may be dynamically configured to show component types and/or data types, structure relationships, and/or communication relationships, and/or inheritance relationships. The various nodes and links are differentiated by colours and shapes. Furthermore, the width of a communication arrow depends on the number of services called on this communication channel. A context window on a component shows the services it provides and requires. A context window on a communication shows what services are involved in this communication. There are also some search facilities, which colour the query results.
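As an illustration of how an arrow width could be derived, the sketch below counts the services called on each channel. It is not the plugin's code: the input format (triples of caller type, callee type and service name) is hypothetical, and counting distinct services rather than call sites is our assumption.

```java
import java.util.*;

/** Hypothetical sketch: one weight per communication channel. */
final class ChannelWeights {

    /**
     * Each call is an array {caller type, callee type, called service}.
     * Returns, for each "caller -> callee" channel, the number of
     * distinct services called on it.
     */
    static Map<String, Integer> weights(List<String[]> calls) {
        Map<String, Set<String>> services = new TreeMap<>();
        for (String[] c : calls) {
            services.computeIfAbsent(c[0] + " -> " + c[1], k -> new HashSet<>()).add(c[2]);
        }
        Map<String, Integer> result = new TreeMap<>();
        services.forEach((channel, called) -> result.put(channel, called.size()));
        return result;
    }
}
```

A graphical view could then map each weight to a line width, so that channels carrying many services stand out.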

III. CONCLUSION

One possibility to fight against the erosion of a system's architecture is to make it explicit in the source code. Component Based Software Engineering proposes tools and approaches that allow this. We propose a component recovery tool that extracts component types, data types, provided and required services, the structure of composite component types, and communication channels between components. This tool is intended to help its user compare (and map) a concrete implementation with an abstract model. The prototype was tested on various implementations of the same system (CoCoME implementations); in one case, it showed a lack of mapping between the implementation and the abstract model. It could also be used to check the good state of the architecture of a system by indicating when a component is used improperly (e.g. passed as a parameter, or communicating with the wrong component), something that is typically not possible with existing industrial approaches. The tool may also be used to hint at possible problems in the implementation: components passed as parameters, cycles in the structure of (composite) components, boundary analysis checking, etc. We believe the tool could be used to help restructure an application into a componentized one. It could help users identify components or check that what they think are components really respect the typical rules of the kind.

REFERENCES

[1] J. Aldrich, C. Chambers, and D. Notkin, "ArchJava: connecting software architecture to implementation," in ICSE '02: Proceedings of the 24th International Conference on Software Engineering. New York, NY, USA: ACM, 2002, pp. 187–197.

[2] D. C. Luckham, J. L. Kenney, L. M. Augustin, J. Vera, D. Bryan, and W. Mann, "Specification and analysis of system architecture using Rapide," IEEE Transactions on Software Engineering, vol. 21, pp. 336–55, 1995.

[3] R. Koschke, G. Canfora, and J. Czeranski, "Revisiting the ∆IC approach to component recovery," Sci. Comput. Program., vol. 60, no. 2, pp. 171–188, 2006.

[4] L. Chouambe, B. Klatt, and K. Krogmann, "Reverse Engineering Software-Models of Component-Based Systems," in 12th European Conference on Software Maintenance and Reengineering, K. Kontogiannis, C. Tjortjis, and A. Winter, Eds. Athens, Greece: IEEE Computer Society, April 1–4 2008, pp. 93–102. [Online]. Available: http://sdqweb.ipd.uka.de/publications/pdfs/chouambe2008a.pdf
