reverse engineering web applications

16
Ph.D. Dissertation Forum – ICSM 2005 Ph.D. Dissertation Forum – ICSM 2005 Ph.D. Dissertation Ph.D. Dissertation Reverse Engineering Reverse Engineering Web Applications Web Applications Porfirio Tramontana Porfirio Tramontana University of Naples “Federico II” University of Naples “Federico II”

Upload: porfirio-tramontana

Post on 24-May-2015

2.659 views

Category:

Technology


9 download

DESCRIPTION

The heterogeneous and dynamic nature of components making up a Web Application, the lack of effective programming mechanisms for implementing basic software engineering principles in it, and undisciplined development processes induced by the high pressure of a very short time-to-market, make Web Application maintenance a challenging problem. A relevant issue consists of reusing the methodological and technological experience in the sector of traditional software maintenance, and exploring the opportunity of using Reverse Engineering to support effective Web Application maintenance. The Ph.D. Thesis presents an approach for Reverse Engineering Web Applications. The approach include the definition of Reverse Engineering methods and supporting software tools, that help to understand existing undocumented Web Applications to be maintained or evolved, through the reconstruction of UML diagrams. Some validation experiments have been carried out and they showed the usefulness of the proposed approach and highlighted possible areas for improvement of its effectiveness.

TRANSCRIPT

Page 1: Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005Ph.D. Dissertation Forum – ICSM 2005

Ph.D. DissertationPh.D. Dissertation

Reverse Engineering Reverse Engineering

Web ApplicationsWeb Applications

Porfirio TramontanaPorfirio Tramontana

University of Naples “Federico II”University of Naples “Federico II”

Page 2: Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005Ph.D. Dissertation Forum – ICSM 2005

Web Applications: open problemsWeb Applications: open problems

In the past years, a great request for Web Applications takes place, due to the World Wide Web diffusion making available many services all over the world

Web Applications have been developed with immature design methodologies and technologies

Nowadays, there is a number of legacy Web Applications needing for maintenance and re-engineering

Page 3: Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005Ph.D. Dissertation Forum – ICSM 2005

Ph. D. Thesis Goals

• To propose models, methods and tools supporting Reverse Engineering and Comprehension of Web Applications

• Reverse Engineering and comprehension are fundamental tasks needed to efficiently support maintenance, testing and quality assessment of Web Applications

Doctoral Thesis Goals

Page 4: Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005Ph.D. Dissertation Forum – ICSM 2005

Peculiarities of script-based Web Applications

Page based Client-Server Architecture Interpreted languages Client pages may be generated “on the

fly” Client pages are executed in a browser

(and the designer doesn’t know what kind of browser will be used)

HTML interpreters are fault tolerant

... and so on ...

Page 5: Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005Ph.D. Dissertation Forum – ICSM 2005

A process for the A process for the Reverse Engineering of Web ApplicationsReverse Engineering of Web Applications

Abstraction

Extraction

WASourceCode

StaticAnalysis

Dynamic Analysis

Business Level UML Diagram Abstractions

WA Execution

Identification of cloned components

Identification of Interaction Design

Patterns

Assignment of Concepts

Functional Clustering

Cloned components

Interaction Design Patterns

Concepts describing Reverse Engineering artifacts

Groups of pages realizing Web Application use cases

Structural and Business Level UML diagrams

Maintanability assessment

Abstraction

Extraction

WASourceCode

StaticAnalysis

Dynamic Analysis

Business Level UML Diagram Abstractions

WA Execution

Identification of cloned components

Identification of Interaction Design

Patterns

Assignment of Concepts

Functional Clustering

Cloned components

Interaction Design Patterns

Concepts describing Reverse Engineering artifacts

Groups of pages realizing Web Application use cases

Structural and Business Level UML diagrams

Maintanability assessment

G.A. Di Lucca, A.R. Fasolino, P. Tramontana, “Reverse Engineering Web Application: the WARE approach”, Journal of Software Maintenance and Evolution: Research and Practice, Volume 16, Issue 1-2, Date: January - April 2004, Pages: 71-101

Page 6: Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005Ph.D. Dissertation Forum – ICSM 2005

Analysis of Web ApplicationsAnalysis of Web Applications

1) Static analysis of the source codeA multi-language parser analysing the source code of Server pages, Client pages and Script modules has been realized.

During the analysis of server pages, facts related to the client pages that are built by server pages are also recorded.

Static analysis results are stored in a intermediate form and are used to fill a relational database

2) Dynamic AnalysisAnalysis of Built Client pages in order to add to the database some facts that have been observed by executing the application

The reference model adopted is an extension of the one proposed by Conallen for the forward engineering of Web Applications

Page 7: Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005Ph.D. Dissertation Forum – ICSM 2005

Model of Web ApplicationsModel of Web Applications

Static Page

DB Interface

Java Applet

TextareaSelect Button

Media Flash Object Mail Address

Mail Interface Server File Interface

Other Object

Generic File

Download

Parameter

Other Interface

Hyperlink

Frame

Web Object

Frameset

Anchor

Field

Server Function Server Class

Interface Object

Built Page

Form

Server Script

Session Variable

Server CookieServer Page

Submits

include

HTML Tag

Web Page

source

redirect

Client Page

Client Script

event

Modify Tag

redirect

Client Function

Client Module

Page 8: Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005Ph.D. Dissertation Forum – ICSM 2005

WARE (Web Application Reverse WARE (Web Application Reverse Engineering) toolEngineering) tool

Extractor Abstractor

Interface layer

IRF

DBR

Diagrams

Repository

HTML

Parsers

Service Layer

WARE-Tool

WA Source Files

WARE GUI

Graphical Visualizer

Dotty

VCG RIGI

ASP

VBS

PHP

JS

….

IRF Translator

Query Executor

UML Diagrams Abstractor /areadocente.html

/check.asp

Redirect

/check.aspBuilds

/autenticazionedocente.html

Submit

/check.asp /check.asp/check.asp

Submit

/areadocente.html

/check.asp

Redirect

/check.aspBuilds

/autenticazionedocente.html

Submit

/check.asp /check.asp/check.asp

Submit

WARE Architecture

Detail Class Diagram abstracted by WAREG. A. Di Lucca, A.R. Fasolino, U. De Carlini, F. Pace, P. Tramontana, “WARE: a tool for the Reverse Engineering of web Applications”, Proc. of 6th IEEE European Conference on Software Maintenance and Reengineering, CSMR 2002, IEEE CS Press, Los Alamitos, CA, Pages:241 - 250

Page 9: Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005Ph.D. Dissertation Forum – ICSM 2005

Functional Clustering of Web Functional Clustering of Web PagesPages

• Goal:

To cluster together subsets of components realizing Web Application functionalities

• Proposed Technique:

Hierarchical clustering algorithm, grouping Web Application pages in subsets, maximizing the cohesion and minimizing the coupling between them

G. A. Di Lucca, A.R. Fasolino, U. De Carlini, F. Pace, P. Tramontana, “Comprehending Web Applications by a Clustering Based Approach”, Proc. of 10th IEEE Workshop on Program Comprehension, IWPC 2002, Pages:261 - 270

Page 10: Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005Ph.D. Dissertation Forum – ICSM 2005

Concept AssignmentConcept Assignment Goal:Goal:

To identify the more To identify the more relevant concepts in client relevant concepts in client pages with the purpose to pages with the purpose to suggest a semantic suggest a semantic description of client pages description of client pages and of functional clusters and of functional clusters of pagesof pages

Proposed Technique:Proposed Technique: Heuristic Algorithms based Heuristic Algorithms based

on Information Retrievalon Information Retrieval Candidate concepts are Candidate concepts are

searched in textual content searched in textual content of client pagesof client pages

Single common words and Single common words and short word sequences are short word sequences are candidated to be conceptscandidated to be concepts

Built Client Page

Server Page

0..*

1

0..*

1<<builds>>

Data Component

StopWord

Word

has synonym

has stem

Web Page

Static Client Page

AttributeName

TagNameWeight

nested in

0..*0..*

Control Component

0..*0..*

Client PageFile name

1111

TextWeight

0..*0..*

0..1

0..1

0..1

0..1

0..*0..1 0..*0..1

Concept1

1

1

1

1

1

1

1

G.A. Di Lucca, A.R.Fasolino, P.Tramontana, U.De Carlini, “Supporting Concept Assignment in the Comprehension of Web Applications”, Proceedings of the 28th IEEE Annual International Computer Software and Applications Conference, COMPSAC 2004

Page 11: Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005Ph.D. Dissertation Forum – ICSM 2005

Interaction Design PatternInteraction Design Patterns s IdentificationIdentification

Goal:Goal: To identify repetitive structures in Web To identify repetitive structures in Web

Client pagesClient pages These structures can be related to known These structures can be related to known

Programming PatternsProgramming Patterns Proposed Technique:Proposed Technique:

Statistical methodology based on Statistical methodology based on features extracted in the source code features extracted in the source code of client pages.of client pages.

Presence, quantity and dimension of forms, Presence, quantity and dimension of forms, tables, input fields, frames, common tables, input fields, frames, common keywords and so on. keywords and so on.

 

 

G.A. Di Lucca, A.R.Fasolino, P.Tramontana, “Recovering Interaction Design Patterns in Web Applications”, submitted to 9th IEEE European Conference on Software Maintenace and Reengineering, CSMR 2005

Page 12: Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005Ph.D. Dissertation Forum – ICSM 2005

Identification of cloned Identification of cloned componentscomponents

Goals:Goals: Re-Engineering of cloned components via code Re-Engineering of cloned components via code

transformationstransformations Classification of Built Client Pages Classification of Built Client Pages Identification of reusable Programming PatternsIdentification of reusable Programming Patterns

Proposed Techniques:Proposed Techniques: Extraction of features in the structure of Client Extraction of features in the structure of Client

pages and in the source code of server pagespages and in the source code of server pages Computation of distance measures between Computation of distance measures between

pages (Euclidean dstance, Levenshtein edit pages (Euclidean dstance, Levenshtein edit distance)distance)

G.A. Di Lucca, A.R. Fasolino, P. Tramontana, U. De Carlini, “Identifying Reusable Components in Web Applications”, IASTED International Conference on Software Engineering, SE 2004, pp.526-531

Page 13: Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005Ph.D. Dissertation Forum – ICSM 2005

Abstraction of Business Level Abstraction of Business Level ModelsModels

Goals:Goals: To abstract object oriented To abstract object oriented

business level models of business level models of Web Applications Web Applications

Proposed Techniques:Proposed Techniques: Classes and attributes are Classes and attributes are

identified by analysing the identified by analysing the data that are exchanged data that are exchanged between user, Web pages between user, Web pages and databases. and databases.

Class methods are Class methods are identified by analysing the identified by analysing the functions implemented by functions implemented by cluster of pages cluster of pages

Relationships between Relationships between classes are identified classes are identified analysing data structures analysing data structures and data flow among and data flow among pagespages

Tutoring requestDate

TeacherNameSurnameE-mailPhone numberPasswordCode

TutoringDateStart timeEnd time

NewsNumberDateText

StudentNameSurnameE-mailPasswordCodePhone number

ExamDateTimeClassroom

CourseAcademic yearCodeName

Exam ReservationDate

G.A. Di Lucca, A.R.Fasolino, U.De Carlini, P.Tramontana, “Recovering a Business Object Model from Web Applications”, Proceedings of the 27th IEEE Annual International Computer Software and Applications Conference, COMPSAC 2003, Pages: 348 - 353

Page 14: Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005Ph.D. Dissertation Forum – ICSM 2005

Maintainability ModelMaintainability Model

Goals:Goals: To propose models and methods for the To propose models and methods for the

assessment of the maintainability of Web assessment of the maintainability of Web ApplicationsApplications

Proposed Models and Techniques:Proposed Models and Techniques: Adapting to Web Applications the Oman model Adapting to Web Applications the Oman model

(thought for traditional applications)(thought for traditional applications) Selection of a set of product metrics and Selection of a set of product metrics and

proposal of a maintainability index that can be proposal of a maintainability index that can be calculated with negligible effort and timecalculated with negligible effort and time

G.A. Di Lucca, A.R.Fasolino, P.Tramontana, C.A.Visaggio, “Towards the definition of a maintainability model for web applications”, Proceedings of the Eighth IEEE European Conference on Software Maintenance and Reengineering, CSMR 2004, pages:279 - 287

Page 15: Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005Ph.D. Dissertation Forum – ICSM 2005

Current and future worksCurrent and future works

Techniques for the dynamic Techniques for the dynamic analysis of Web Applicationsanalysis of Web Applications

Accessibility assessment of Accessibility assessment of Client pagesClient pages

Migration from Web Migration from Web Applications to Web ServicesApplications to Web Services

Testing of Web ApplicationsTesting of Web Applications Mutation Testing techniquesMutation Testing techniques

Maintainability assessmentMaintainability assessment Definition of ageing measures for Definition of ageing measures for

Web ApplicationsWeb Applications

G.A. Di Lucca, M. Di Penta, A.R. Fasolino, P. Tramontana, “Supporting Web Application Evolution by Dynamic Analysis”, IWPSE 2005

G.A. Di Lucca, A.R. Fasolino, P. Tramontana, “Web Site Accessibility: Identifying and Fixing of Accessibility Problems in Client Page Code”, WSE 2005

Page 16: Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005Ph.D. Dissertation Forum – ICSM 2005