
Towards a Benchmark for Evaluating Reverse Engineering Tools

Lajos Jenő Fülöp, Péter Hegedűs, Rudolf Ferenc and Tibor Gyimóthy

University of Szeged, Department of Software Engineering

{flajos|ferenc|gyimi}@inf.u-szeged.hu, hegedus.peter.3@stud.u-szeged.hu

2008 15th Working Conference on Reverse Engineering (WCRE 2008), Antwerp, Belgium. DOI: 10.1109/WCRE.2008.18

Abstract

In this paper we present work in progress towards implementing a benchmark called BEFRIEND (BEnchmark For Reverse engInEering tools workiNg on source coDe), with which the outputs of reverse engineering tools can be evaluated and compared easily and efficiently. Such tools are, e.g., design pattern miners, duplicated code detectors and coding rule violation checkers. BEFRIEND supports different kinds of tool families, programming languages and software systems, and it enables the users to define their own evaluation criteria.

Keywords

Benchmark, reverse engineering tools, tool evaluation

1 Introduction

Several design pattern recognition tools have been introduced in the literature, and so far they have proven to be rather efficient. Despite this, it would be difficult to state that the performance of design pattern recognition tools is well-defined and well-known as far as the accuracy and rate of the recognized patterns are concerned. So far, this has been quite difficult to achieve, since comparing different tools requires a common measuring tool and a common set of testing data. To solve this problem, we developed the DEEBEE (DEsign pattern Evaluation BEnchmark Environment) benchmark system in our previous work [2].

The current work introduces the further development of the DEEBEE system, which has become more widely applicable by generalizing the evaluation aspects and the data to be presented. The new system is called BEFRIEND (BEnchmark For Reverse engInEering tools workiNg on source coDe). With BEFRIEND, the results of reverse engineering tools from different domains recognizing arbitrary characteristics of source code can be subjectively evaluated and compared with each other. Such tools are, e.g., bad code smell miners, duplicated code detectors, and coding rule violation checkers.

BEFRIEND largely differs from its predecessor in five areas: (1) it enables uploading and evaluating results related to different domains, (2) it enables adding and deleting the evaluation aspects of the results arbitrarily, (3) it introduces a new user interface, (4) it generalizes the definition of sibling relationships [2], and (5) it enables uploading files in different formats by adding the appropriate uploading plug-in. BEFRIEND is a freely accessible online system available at http://www.inf.u-szeged.hu/befriend/.

2 Motivation

Nowadays, more and more papers introducing the evaluation of reverse engineering tools are published. These are needed because the number of reverse engineering tools is increasing and it is difficult to decide which of these tools is the most suitable for a given task.

Pettersson et al. [3] summarized problems during the evaluation of accuracy in pattern detection. Their goal was to make accuracy measurements more comparable. They stated that community effort is highly required to create control sets for a set of applications.

Bellon et al. [1] presented an experiment to evaluate and compare clone detectors. The experiment involved several researchers who applied their tools to carefully selected large C and Java programs. Their benchmark gives a standard procedure for every new clone detector.

Wagner et al. [5] compared three Java bug finding tools on one university and five industrial projects. A five-level severity scale, which can be integrated into BEFRIEND, served as the basis for comparison.

Moreover, several articles dealt with comparing and evaluating particular kinds of reverse engineering tools, but they lacked the support of an automated framework like BEFRIEND.

3 Architecture

We use the well-known issue and bug tracking system called Trac [4] (version 0.10.3) as the basis of the benchmark. Issue tracking is based on tickets, where a ticket stores all information about an issue or a bug. Trac is written in Python and it is an easily extensible and customizable plug-in oriented system.


Although the Trac system provides many useful services, we had to do a lot of customization and extension work to create a benchmark from it. The two major extensions were the customization of the graphical user interface and the customization of the system's tickets. In the case of the graphical user interface we had to inherit from and implement some core classes of the Trac system. In the case of the tickets we had to extend them to be able to describe design pattern, duplicated code and rule violation instances (name of the pattern or rule violation, information about its position in the source code, information about its evaluation, etc.). Furthermore, we extended the database schema to support different kinds of reverse engineering tools.
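As a rough illustration, an extended ticket essentially has to carry data along the following lines. This is only a sketch: the class and field names below are hypothetical and do not reflect BEFRIEND's actual ticket or database schema.

```python
# Illustrative only: hypothetical classes, not BEFRIEND's actual schema.

class SourcePosition:
    """A location of a result instance in the analyzed source code."""
    def __init__(self, path, start_line, start_col, end_line, end_col):
        self.path = path
        self.start_line = start_line
        self.start_col = start_col
        self.end_line = end_line
        self.end_col = end_col


class ResultInstance:
    """Data an extended ticket describes for one tool result, e.g. a design
    pattern occurrence, a code clone, or a coding rule violation."""
    def __init__(self, domain, name, tool, positions):
        self.domain = domain        # e.g. "Design Patterns", "Duplicated Code"
        self.name = name            # pattern or rule name, e.g. "Singleton"
        self.tool = tool            # the tool that reported the instance
        self.positions = positions  # list of SourcePosition objects
        self.evaluations = []       # user ratings, one per evaluation criterion
```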

4 BEFRIEND

BEFRIEND serves the evaluation of tools working on source code, which hereafter will simply be called tools. The tools can be classified into domains. The tools in a given domain produce different results which refer to one or more positions in the analyzed source code. We refer to these positions as result instances. It may often happen that several instances can be grouped, which can largely speed up their evaluation. Furthermore, without grouping, the interpretation of tool results may lead to false conclusions. In order to group instances, their relations need to be defined. If two instances are related to each other, they are called siblings. BEFRIEND supports different kinds of sibling mechanisms, which cannot be detailed here because of space limitations.
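One conceivable sibling mechanism (the paper does not specify BEFRIEND's actual ones) is to treat instances whose source positions overlap in the same file as siblings. A minimal sketch, reusing the hypothetical classes from the previous listing:

```python
def positions_overlap(a, b):
    # Two positions overlap if they are in the same file and their
    # line ranges intersect.
    return (a.path == b.path
            and a.start_line <= b.end_line
            and b.start_line <= a.end_line)


def group_siblings(instances):
    """Greedily group instances: an instance joins the first group that
    already contains an overlapping instance, otherwise it starts a new group."""
    groups = []
    for inst in instances:
        for group in groups:
            if any(positions_overlap(p, q)
                   for member in group
                   for p in member.positions
                   for q in inst.positions):
                group.append(inst)
                break
        else:
            groups.append([inst])
    return groups
```

Instances placed in the same group can then be presented and evaluated together.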

During the development of BEFRIEND, we were striving for full generalization: an arbitrary number of domains can be created (design patterns, bad smells, rule violations, duplicated code, etc.), and the domain evaluation aspects and the setting of instance siblings can be customized. Furthermore, for uploading the results of different tools, the benchmark provides an import filter plug-in mechanism.

In the following, we show the steps needed to perform a tool evaluation and comparison task in a concrete domain (e.g. duplicated code) with the help of BEFRIEND:

1. Add the new domain to the benchmark (e.g. Duplicated Code).

2. Add one or more evaluation criteria with evaluation questions for the new domain (e.g. Procedure abstraction: Is it worth substituting the duplicated code fragments with a new function and function calls?); a sketch of such a setup follows this list.

3. Upload the results of the tools and set the appropriate sibling relations.

4. Evaluate the uploaded results with the evaluation criteria defined in step 2.

5. Evaluate and compare the tools easily by using the statistics and comparison functionality of the benchmark.
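To make steps 1 and 2 concrete, the setup for the duplicated code domain could conceptually look as follows. In BEFRIEND, domains and criteria are entered through the web interface and stored in the database; the dictionary below is only a hypothetical illustration of the idea.

```python
# Hypothetical illustration of steps 1 and 2, not BEFRIEND's actual storage.
DOMAINS = {
    "Duplicated Code": {
        "evaluation_criteria": {
            "Procedure abstraction":
                "Is it worth substituting the duplicated code fragments "
                "with a new function and function calls?",
        },
    },
}
```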

Benefits. Previously, if one wanted to evaluate and compare reverse engineering tools, one had to examine and search the appropriate source code fragments by hand. For example, one had to traverse several directories and files and find the exact lines and columns in a source file. Consequently, one could make mistakes and examine the wrong files. Furthermore, one had to store the evaluation results somewhere for later use, which is now automatically supported by BEFRIEND.

To evaluate and compare reverse engineering tools without BEFRIEND, one also has to define evaluation criteria, find test cases, and evaluate the results by hand. BEFRIEND only has the cost of implementing the appropriate plug-in for uploading the results of a tool. However, this cost is small: the plug-ins that have been developed so far contain less than 100 lines of code.

The evaluation of the tools with BEFRIEND is thus clearly faster and cheaper than without it.
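For illustration, an import filter plug-in does little more than translate a tool's output format into result instances. A minimal sketch, reusing the hypothetical SourcePosition and ResultInstance classes from the architecture sketch and assuming an invented report format in which each line describes one clone group as semicolon-separated path:start-end ranges; BEFRIEND's actual plug-in interface is not described here.

```python
# Hypothetical input format, e.g. one clone group per line:
#   src/a.c:10-42;src/b.c:100-131
# The function below is illustrative, not BEFRIEND's actual plug-in API.

def parse_clone_report(report_path, tool_name):
    instances = []
    with open(report_path) as report:
        for line in report:
            line = line.strip()
            if not line:
                continue
            positions = []
            for part in line.split(";"):
                path, line_range = part.rsplit(":", 1)
                start, end = line_range.split("-")
                # Column information is not present in this format, so 0 is used.
                positions.append(SourcePosition(path, int(start), 0, int(end), 0))
            instances.append(ResultInstance("Duplicated Code", "clone group",
                                            tool_name, positions))
    return instances
```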

5 Conclusion and future work

This work is the first step towards creating a generally applicable benchmark that can help to evaluate and compare many kinds of reverse engineering tools. In the future, we will need the opinion and advice of reverse engineering tool developers in order for the benchmark to achieve this aim and satisfy all needs.

We would also like to examine further reverse engineering domains, prepare the benchmark for these domains and deal with the possible shortcomings.

References

[1] S. Bellon, R. Koschke, G. Antoniol, J. Krinke, and E. Merlo. Comparison and Evaluation of Clone Detection Tools. IEEE Transactions on Software Engineering, volume 33, pages 577-591, 2007.

[2] L. J. Fülöp, R. Ferenc, and T. Gyimóthy. Towards a Benchmark for Evaluating Design Pattern Miner Tools. In Proceedings of the 12th European Conference on Software Maintenance and Reengineering (CSMR 2008). IEEE Computer Society, Apr. 2008.

[3] N. Pettersson, W. Löwe, and J. Nivre. On evaluation of accuracy in pattern detection. In First International Workshop on Design Pattern Detection for Reverse Engineering (DPD4RE'06), October 2006.

[4] The Trac Homepage. http://trac.edgewall.org/.

[5] S. Wagner, J. Jürjens, C. Koller, and P. Trischberger. Comparing bug finding tools with reviews and tests. In Proceedings of the 17th International Conference on Testing of Communicating Systems (TestCom'05), pages 40-55. Springer, 2005.

