SoftWindows 4/30/07 · spiros/teaching/CS675/slides/sca.pdf

Distributed Objects

Reverse Engineering (Source Code Analysis) © SERG

Source Code Analysis Using BAT

What is Static Analysis?

• Mining source code for information.

• Using that information to present abstractions of, and answer questions about, software structure.

What can we get from source code analysis?

• Type of information is model dependent
– In almost any language, we can find out information about variable usage: Who? Where? etc.
– In an OO environment, we can find out which classes use other classes, which are a base of an inheritance structure, etc.
– We can also find potential blocks of code that can never be executed in running the program (dead code).

BAT

• Is a tool that lets us perform static analysis on Java programs (class files).
– Builds an XML database of entities and relationships in a system.
– Can use several tools for querying and visualizing the data.

Entities

• ‘Entities’ are individuals that live in the system, and attributes associated with them. Some examples:
– Classes, along with information about their superclass, their scope, and ‘where’ in the code they exist.
– Methods/functions and what their return type or parameter list is, etc.
– Variables and what their types are, and whether or not they are static, etc.

Relationships

• ‘Relationships’ are interactions between the entities in the system. Relationships include:
– Classes inheriting from one another.
– Methods in one class calling the methods of another class, and methods within the same class calling one another.
– One variable referencing another variable.

Creating BAT Databases

• BAT is really a library that can process JAR files
• BATAnalyzer is a small app wrapped around BAT to return a full XML database from BAT for later processing
– Found at: BATROOT/analyzer/src
• To run:

export PATH=/usr/remote/serg/jdk1.5.0_11/bin/:$PATH
java -Xmx2G -cp /usr/remote/serg/binbat2toxml.jar:/usr/remote/serg/bin/batanalyzer.jar batanalyzer.Main <JAR> <OUTPUT>

– -Xmx2G: need to give Java a lot of memory to process large projects
– the -cp classpath: BAT API
– batanalyzer.Main: call to analyzer
– <JAR>: project to analyze
– <OUTPUT>: XML output file

Provided Tools to deal with BAT

• bdef – A BASH wrapper around XSLT queries to get entity information
• bref – A BASH wrapper around XSLT queries to get relationship information
• dot – A visualization tool. Takes information from a query and displays it as a graph.
• On TUX, to get the scripts do:

export PATH=$PATH:/usr/remote/serg/bin/

bdef Syntax

• bdef takes information from the entities database based on a query, and returns the results in an ASCII table.

bdef xml_file entity_kind entity_name [attr=val]

– xml_file is the XML file containing the extracted database.
– entity_kind is the ‘type’ of entity to retrieve.
– entity_name is a pattern to match for names of entities.
– attr=val are bindings to match for attributes of the entity.

Entity Kinds

• BAT recognizes several entity ‘kinds’ for use in the bdef/bref commands.
• m is for Method
• c is for Class
• f is for Field
• - is a match for any entity_kind

Entity Names

• An entity name can assume many forms, following regex patterns
– Explicit name (e.g., ‘myTempStringVar’)
– Wild-card pattern (e.g., ‘myTemp.*’)
– A complete wild-card, denoted with ‘.*’

Attribute=Value

• Attribute=Value settings are used to further restrict a query based on some condition specified as a regex.
• Any field is searchable
• The most common restriction is to restrict to a specific file, or to filter out a file. E.g.,

bdef file.xml - - filename=FileIDoLike.java

bdef file.xml - - filename=[^(FileIDoNOTLike.java)]

Fields

• Class
– name, filename, scope, deprecated, final, abstract
• Method
– name, class, filename, scope, static, deprecated, final, abstract, varargs, bridge, native, synchronized, return, parameters
• Field
– name, class, filename, type, scope, static, deprecated, final, transient, volatile, enum

Example Query

• Assume that we want to find all the methods in a specific file (in this case, World.java) that start with ‘get’. Our query would look like the following:

bdef sim.xml m "get.*" filename="World\.java"

World.java is a part of a Discrete Event Simulator that contains information about the simulation environment.

Example Results (bdef)

bdef sim.xml m "get.*" filename="World\.java"

getWorldArray:World:World.java:public:false:false:false:false:false:false:false:false:
getWorldString:World:World.java:public:false:false:false:false:false:false:false:false:
getWorldString:World:World.java:public:false:false:false:false:false:false:false:false:
getWorldMaskString:World:World.java:public:false:false:false:false:false:false:false:false:
getEmpty:World:World.java:public:false:false:false:false:false:false:false:false:
getWidth:World:World.java:public:false:false:false:false:false:false:false:false:
getHeight:World:World.java:public:false:false:false:false:false:false:false:false:

Results Explained

• The bdef query resulted in a collection of colon-separated lists. The data in the columns mean the following:
– name is the name of the method
– class is the class the method belongs to
– filename is the file containing this method
– scope is the scope of the method
– static if the method is static
– deprecated if the method is deprecated
– final if the method is final
– abstract if the method is abstract
– varargs if the method uses variable arguments
– bridge if the method is a bridge
– native if the method is native
– synchronized if the method is synchronized
– return is the method’s return type
– parameters are the types of parameters accepted
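The column positions can be used to filter results after the query has run. The following is a minimal sketch, assuming the 14-column, colon-separated method layout listed above. `sample_bdef` is a hypothetical stand-in that prints made-up rows so the example runs without BAT installed.

```shell
#!/bin/sh
# Hypothetical stand-in for a bdef method query: prints canned rows in the
# 14-column layout (name, class, filename, scope, 8 flags, return, parameters).
# The rows are illustrative, not real query results.
sample_bdef() {
cat <<'EOF'
getWidth:World:World.java:public:false:false:false:false:false:false:false:false:int:()
checkBounds:World:World.java:private:false:false:false:false:false:false:false:false:boolean:(Location,)
main:Sim:Sim.java:public:true:false:false:false:false:false:false:false::(java.lang.String[],)
EOF
}

# Column 3 is filename, column 4 is scope, column 5 is static.
# Print the names of public, non-static methods defined in World.java.
public_instance_methods() {
    sample_bdef | awk -F: '$3 == "World.java" && $4 == "public" && $5 == "false" { print $1 }'
}

public_instance_methods
```

With BAT available, the call to `sample_bdef` would be replaced by a real query such as `bdef sim.xml m ".*"`.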

Exercise

• This exercise uses some Unix utilities along with our use of bdef. The exercise involves two things:
– Counting the number of methods of class World (in World.java).
– Printing out a list of methods in the form of their name, return type, and parameter list.

Using Unix (Part One)

• In order to count the number of lines of a document, one can use the command-line tool wc.
– The -l option makes it count lines.
– Piping to it makes it count the lines of output from a program.

{bdef query} | wc -l

counts the number of lines in the output of a bdef query.

The solution is …

• The solution to the first problem is:

bdef sim.xml m ".*" filename="World\.java" | wc -l
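The counting pattern can be tried on canned output first. This is a sketch only: `sample_bdef` is a hypothetical stand-in for the bdef query, and its rows are illustrative rather than real results.

```shell
#!/bin/sh
# Hypothetical stand-in for: bdef sim.xml m ".*" filename="World\.java"
sample_bdef() {
cat <<'EOF'
getWidth:World:World.java:public:false:false:false:false:false:false:false:false:int:()
getHeight:World:World.java:public:false:false:false:false:false:false:false:false:int:()
addEntity:World:World.java:public:false:false:false:false:false:false:false:false::(Location,)
EOF
}

# wc -l reports one line per entity returned; tr strips the padding
# that some wc implementations put around the number.
count_methods() {
    sample_bdef | wc -l | tr -d ' '
}

# grep -c counts only the lines that match a pattern,
# here method names that start with "get".
count_getters() {
    sample_bdef | grep -c '^get'
}

echo "methods: $(count_methods)"
echo "getters: $(count_getters)"
```

On the three sample rows this reports 3 methods and 2 getters.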

Using Unix (Part Two)

• For the second question, we will again use the unformatted output of bdef.
– This time, we’ll take note of the format of the unformatted output! We’ll keep this limited to the case of unformatted output for methods.
– Each field of the unformatted output is delimited by a colon. The fields we care about are the name, return-type, and parameter-list fields. These are fields 1, 13, and 14, respectively.

Using Unix (Part Two, Cont’d)

• The final piece in the puzzle of displaying the specific fields is getting the fields themselves out of the output.
– The cut utility will do nicely. We can send it a delimiter and a list of field numbers, and it will return those fields for each line.
– The delimiter flag for cut is -d. The field-list flag is -f, followed by a series of comma-separated numbers.

The solution is …

• Our target query is thus:

bdef sim.xml m ".*" class="World" | cut -d ":" -f 1,13,14
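The field selection can be checked in isolation. Below is a small sketch, assuming the 14-field method layout described earlier. `sample_bdef` is a hypothetical stand-in for the bdef query and prints illustrative rows.

```shell
#!/bin/sh
# Hypothetical stand-in for: bdef sim.xml m ".*" class="World"
sample_bdef() {
cat <<'EOF'
getWidth:World:World.java:public:false:false:false:false:false:false:false:false:int:()
checkBounds:World:World.java:public:false:false:false:false:false:false:false:false:boolean:(Location,)
EOF
}

# Keep field 1 (name), field 13 (return type) and field 14 (parameters).
# cut re-joins the selected fields with the same delimiter.
name_return_params() {
    sample_bdef | cut -d ":" -f 1,13,14
}

name_return_params
```

The first sample row is reduced to `getWidth:int:()`, the same shape as the exercise output.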

Output for Exercise

• Question One: 13

• Question Two:

<init>::(int,int,)
removeEntity::(Location,)
moveEntity::(Location,Location,)
addEntity::(Location,)
checkBounds:boolean:(Location,)
checkLocation:boolean:(Location,)
getWorldArray:char[][]:()
getWorldString:java.lang.String:(char[][],)
getWorldString:java.lang.String:()
getWorldMaskString:java.lang.String:(java.util.Vector,java.util.Vector,)
setBox::(char[][],int,int,int,int,char,)
getEmpty:char:()
getWidth:int:()
getHeight:int:()
<clinit>::()

• Not very pretty, but useful (we hope…).

bref

• bref is a tool that displays relationship information by linking one entity to another.

bref Syntax

bref xml kind1 name1 kind2 name2

– kind1 and kind2 are entity kinds
– name1 and name2 are entity names
– xml is the XML file containing the database

Example Query

• Here’s a query to find all class-class relationships in the database.

bref sim.xml c ".*" c ".*"

Example Results (bref)

• bref sim.xml c ".*" c ".*"

"AutoCar" -> "Car"
"AutoControl" -> "java.lang.Object"
"Car" -> "Entity"
"CarControlException" -> "java.lang.Exception"
"CarCrashException" -> "java.lang.Exception"
"CarMoveController" -> "Entity"
"CarOutOfBounds" -> "java.lang.Exception"
"CarParkTrafficGenerator" -> "Entity"
………………………

Results Explained

• bref returned a list of classes.
• Each line represents a relationship between the entities
• The entity on the left is the first entity asked for
• The entity on the right is the second entity asked for

Exercise – bref

• In these exercises, we’ll examine various relations between the entities of a system.
• We’ll go over:
– Inheritance relationships.
– Method-Method relationships.
– How to write a shell script using BAT tools

Exercise #1

• We’ve already seen how to find the entire inheritance tree from our example, so this exercise should be easy:
– Find all the classes that Entity inherits from, and all the classes that subclass it.

Inheritance Relation

• The relation between classes that we are interested in is subclassing.
• But which entity in the relation subclasses the other?
– The answer is that the first entity subclasses the second.

Inheritance Relation (Cont’d)

– The answer to the question “which class is Entity a subclass of” is:

bref sim.xml c "Entity" c ".*"

– We can analogously find which classes subclass Entity:

bref sim.xml c ".*" c "Entity"

Exercise #2

• This exercise concentrates on method-to-method relations.
• Our task is to find what the fan-in and fan-out of a function are.
• We’ll use the World.addEntity function in the example.

Definition: Fan-In/Fan-Out

• Fan-In
– The fan-in of a function/method is the number of functions/methods that invoke that method.
• Fan-Out
– The fan-out of a function/method is the number of functions/methods that it invokes.

Finding Fan-In, Fan-Out

• The fan-in of a method can be calculated as follows:

bref sim.xml m ".*" m "World.addEntity" | wc -l

• The fan-out of a method can be calculated analogously:

bref sim.xml m "World.addEntity" m ".*" | wc -l
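Both counts can also be derived from a single saved edge list. The sketch below assumes that method-to-method results use the same `"caller" -> "callee"` line format as the class results shown earlier, with one line per caller/callee pair. `sample_bref` and its edges are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for: bref sim.xml m ".*" m ".*"
# The edges are made up for illustration.
sample_bref() {
cat <<'EOF'
"Sim.main" -> "World.addEntity"
"Car.move" -> "World.addEntity"
"World.addEntity" -> "World.checkBounds"
"World.addEntity" -> "World.checkLocation"
"World.addEntity" -> "World.setBox"
EOF
}

# Fan-in: number of edges whose right-hand side (field 3) is the method.
# The first awk rule strips the double quotes so names compare as plain strings.
fan_in() {
    sample_bref | awk -v m="$1" '{ gsub(/"/, "") } $3 == m' | wc -l | tr -d ' '
}

# Fan-out: number of edges whose left-hand side (field 1) is the method.
fan_out() {
    sample_bref | awk -v m="$1" '{ gsub(/"/, "") } $1 == m' | wc -l | tr -d ' '
}

echo "fan-in:  $(fan_in World.addEntity)"
echo "fan-out: $(fan_out World.addEntity)"
```

For the sample edges, `World.addEntity` has a fan-in of 2 and a fan-out of 3.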

Exercise #3

• In this exercise, we’ll write a shell script to determine if one class is an ancestor or a descendant of another.

Descendant Relation

• A class X is a descendant of class Y if X subclasses Y, or X’s superclass is a descendant of Y.
• This sets up a nice recursion, which will make our job easy.

Shell Scripting

• Our first step is to come up with an exact specification of what we want:
– Given two classes, D and A, our script should report a 1 if D is a descendant of A, and 0 otherwise.

Shell Scripting…

• Our first coding step is to determine what shell to use. For this exercise, we’ll be using the C shell.
• This makes our shebang line:

#!/bin/csh

Shell Scripting

• To make this a little nicer to look at, we’ll make a few small helper scripts…
– One to return whether one class subclasses another.
– One to return the ‘name’ field from unformatted BAT output.
– One to return the names of all the classes that inherit from a given class.

Helper Script (does_subclass)

• Our first script is pretty simple:

#!/bin/csh
@ z = `bref $1 c $2 c $3 | wc -l` != 0
echo ${z}

Helper Script (get_name)

• Our get_name script only has to return the value of one field. We’ll just make a small script to do it.

cut -d " " -f1

Helper Script (subclasses)

• A script to get all the subclasses is also relatively trivial:

bref $1 c ".*" c $2 | get_name

The Actual Script (ancestor)

• Since our relation is a recursive one, we have to start our code by taking care of the base case (which is that D is a subclass of A: a parent-child relationship).

#!/bin/csh
if (`bref $1 c $2 c $3 | wc -l` != 0) then
    echo 1
    exit
endif

The Rest of the Script

• The rest of the script deals with the recursion. We have to check every subclass of A to see if it is an ancestor of D.

foreach child (`subclasses $1 $3`)
    if (`ancestor $1 $2 $child`) then
        echo 1
        exit
    endif
end
echo 0

However…

• There’s a better way to do this, which would be to traverse up from the descendant.
– There can be multiple subclasses to any class.
– In Java, there is only one superclass to a class.
• We’ll call this the ancestor relation, defined as:
– X is an ancestor of Y if X is Y’s superclass,
– or X is an ancestor of Y’s superclass.
• We’ll write two little helper scripts to do the rewrite.

Helper Scripts, II (other_name)

• A script to get the name of the second entity of a relation could be useful.

cut -d " " -f3

Helper Scripts, II (parent)

• A second script, to return the parent of a class, if it exists, would be:

#!/bin/csh
bref $1 c $2 c ".*" | other_name

Making the Finished Product

• First take care of the base case of the recursion:

#!/bin/csh
if (`other_name $1 $2 $3`) then
    echo 1
    exit
endif

Last Bit o’ Code

• The rest of the code deals with recursing up the inheritance tree…

if (`parent $1 $2 | wc -l` != 0) then
    ancestor $1 `parent $1 $2` $3
else
    echo 0
endif
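The upward traversal can also be written as a loop instead of a recursive script. The following is a sketch in POSIX sh rather than csh, run against a canned edge list. `sample_bref`, `parent_of` and `is_descendant` are hypothetical names, and the `"Entity" -> "java.lang.Object"` edge is an assumption added to complete the chain.

```shell
#!/bin/sh
# Hypothetical stand-in for: bref sim.xml c ".*" c ".*"
# Each line reads "subclass" -> "superclass".
sample_bref() {
cat <<'EOF'
"AutoCar" -> "Car"
"Car" -> "Entity"
"Entity" -> "java.lang.Object"
"CarCrashException" -> "java.lang.Exception"
EOF
}

# Print the direct superclass of class $1, or nothing if none is recorded.
parent_of() {
    sample_bref | awk -v c="$1" '{ gsub(/"/, "") } $1 == c { print $3; exit }'
}

# Print 1 if class $1 is a descendant of class $2, and 0 otherwise.
# Java has single inheritance, so following one parent at a time is enough.
is_descendant() {
    current=$(parent_of "$1")
    while [ -n "$current" ]; do
        if [ "$current" = "$2" ]; then
            echo 1
            return 0
        fi
        current=$(parent_of "$current")
    done
    echo 0
}

is_descendant AutoCar Entity
is_descendant Entity AutoCar
```

The loop stops as soon as a class has no recorded parent, which plays the role of the `else` branch in the recursive version.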

Visualizing Relationships

• We will be using DOT and Graphviz to visualize BAT relationships
– dot: Used to draw a ‘directed graph.’
– Graphviz: Visualizes DOT format

Graphs (Definition)

• A graph G(V, E) is a set of vertices, V, and a set of edges, E.
• For each edge e in E, there are two vertices, x and y, in V such that e is an edge between x and y.

Graph Details

• Edge Crossings
• Directed Graphs
• Parallel Edges

Graph Examples

• A road map of a large area is a graph. Cities are vertices, and roads are edges.
• An inheritance tree is a directed graph.
• A call tree is a graph.

DOT Format

digraph mdg {
    "First" -> "java.lang.Object"
    "First" -> "Second"
    "Second" -> "java.lang.Object"
    "Second" -> "java.lang.System"
    "Second" -> "java.io.PrintStream"
}

Relationship to DOT

• The relationship queries already return their results in DOT format, minus the header.
• All we need to do is prepend the following line at the head:
– digraph mdg {
• And append the following at the tail:
– }
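Wrapping the edges takes only a few lines of shell. This is a minimal sketch: `sample_bref` is a hypothetical stand-in for a bref query and `to_dot` is an illustrative helper name, not one of the provided tools.

```shell
#!/bin/sh
# Hypothetical stand-in for: bref sim.xml c ".*" c ".*"
sample_bref() {
cat <<'EOF'
"AutoCar" -> "Car"
"Car" -> "Entity"
"CarCrashException" -> "java.lang.Exception"
EOF
}

# Read edges on stdin and emit a complete DOT digraph on stdout:
# header line, the edges indented by four spaces, closing brace.
to_dot() {
    echo 'digraph mdg {'
    sed 's/^/    /'
    echo '}'
}

sample_bref | to_dot
```

If Graphviz is installed, the result can be saved to a file and rendered with, for example, `dot -Tpng graph.dot -o graph.png`.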

XSLT

• Both bdef and bref are wrappers around XSLT queries
• XSLT/XPATH – Used to query the database.
– Firefox can render XSLT stylesheets over XML datasets

XSLT/XPATH Tutorials/Tools

• References
– http://www.w3schools.com/
– http://www.zvon.org/xxl/XSLTreference/Output/index.html
– http://www.xml.com/pub/a/2000/08/holman/index.html
• Tools
– xsltproc on *nix systems
– Windows: http://www.microsoft.com/downloads/details.aspx?familyid=2fb55371-c94e-4373-b0e9-db4816552e41&displaylang=en
– Firefox can apply XSLT stylesheets

Source Code Analysis Using Chava


Chava

• Is a tool that lets us perform static analysis on Java programs (source or class files).
– Builds a database of entities in a system.
– Builds a database of relationships in a system.
– Includes several tools for querying the databases for data, and some tools for visualizing results.


Creating Chava Databases

• Chava takes java/class files, and turns them into data files (.A ext) that can be integrated into a database
– Create a .A file for a given Java file:

chava -c filename.java

– Create .A files for all class files in a directory:

chava -c *.class

Pulling it all together…

• Chava then takes .A files and creates the databases.
– Create databases out of two .A files:

chava -l f1.A f2.A

– Create databases for all .A files in a directory:

chava -l *.A

Chava Tools

• cdef/vdef – Used to query the entities database.
• cref/vref – Used to query the relationship database.
• dagger/dot – A visualization tool. Takes information from chava databases and displays it as a graph.

cdef/vdef

• cdef takes information from the entities database based on a query, and returns the results in an ASCII table.
• vdef actually shows the code of the entities from a query.

Syntax

• cdef and vdef share the same syntax:

{vdef|cdef} entity_kind entity_name [attr=val]..

– entity_kind is the ‘type’ of entity to retrieve.
– entity_name is a pattern to match for names of entities.
– attr=val are bindings to match for attributes of the entity.

Entity Kinds

• Chava recognizes several types of entity ‘kinds’ for use in the cdef/vdef/cref/vref commands.
• p is for Package
• f is for File
• m is for Method
• c is for Class
• l is for Field
• s is for String
• i is for Interface
• - is a match for any entity_kind

Entity Names

• An entity name can assume many forms
– Explicit name (e.g., ‘myTempStringVar’)
– Wild-card pattern (e.g., ‘myTemp*’)
– A complete wild-card, denoted with ‘-’

Attribute=Value

• Attribute=Value settings are used to further restrict a query based on some condition.
• The most common restriction is to restrict to a specific file, or to filter out a file. E.g.,

cdef - - file=FileIDoLike.java

cdef - - file!=FileIDoNOTLike.java

Page 25: SoftWindows 4/30/07spiros/teaching/CS675/slides/sca.pdf · SoftWindows 4/30/07 Distributed Objects 2 Reverse Engineering (Source Code Analysis) © SERG BAT •Is a tool that lets

SoftWindows 4/30/07

Distributed Objects 25

Reverse Engineering (Source Code Analysis)Reverse Engineering (Source Code Analysis) © SERG

Example Query• Assume that we want to find all the methods in a specific file (in

this case, ANSIDisplay.java) that start with ‘get’. Our query wouldlook like the following:

cdef m ‘get*’ file=./ANSIDisplay.java

• Or, to see the code…vdef m ‘get*’ file=./ANSIDisplay.java

Reverse Engineering (Source Code Analysis)Reverse Engineering (Source Code Analysis) © SERG

Example Results (cdef)• cdef m ‘get*’ file=ANSIDisplay.java

name scope file bline eline====================== ======= ================= ===== =====String getEscapeSequen public ANSIDisplay.java 76 82String getEscapeSequen public ANSIDisplay.java 38 42String getEscapeSequen public ANSIDisplay.java 93 108String getEscapeSequen public ANSIDisplay.java 118 128String getEscapeSequen public ANSIDisplay.java 139 153


Results Explained

• The cdef query resulted in a table with several columns. The data in the columns mean the following:

– name: The name of the entity.
– scope: The scope of the entity within its ‘parent’ entity (the entity it resides in).
– file: The name of the file that the entity is in.
– bline: The line that the entity begins on.
– eline: The line that the entity ends on.
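As a small illustration of what the bline and eline columns allow, the sketch below computes the length in lines of each entity. It assumes the rows have already been reduced to whitespace-separated `name bline eline` triples; that layout and the function name `entity_lengths` are our own and not part of Chava.

```shell
#!/bin/sh
# entity_lengths: read "name bline eline" rows on stdin and print
# "name length", where length = eline - bline + 1 (both lines inclusive).
entity_lengths() {
    awk '{ print $1, $3 - $2 + 1 }'
}

# Sample rows using bline/eline values from the cdef results table.
printf '%s\n' \
    'getEscapeSequence 76 82' \
    'getEscapeSequence 38 42' \
    'getEscapeSequence 93 108' |
    entity_lengths
```

For the three sample rows this reports lengths of 7, 5, and 16 lines.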


Example Results (vdef)

• vdef m 'get*' file=ANSIDisplay.java (partial results)

public static String getEscapeSequence(int colour, boolean foreground)
{
    colour = setColour(colour, foreground);
    return (ESCAPE + Integer.toString(colour) + "m");
}
public static String getEscapeSequence(int value)
{
    if (!ANSIDisplaySwitchCheck.validSwitch(value))
        throw new IllegalArgumentException("Bad Switch");

    return (ESCAPE + Integer.toString(value) + "m");
}


Results Explained

• vdef printed out the entities we asked about, exactly as they appear in the source code.


Finding all File Names

• Knowing all the file names could be important, so let’s see how to do that with chava.

– We want to use cdef for this, and just have chava output a list of file names.
– We also want to restrict the entity_kind to that of file. If you remember, ‘f’ is the type for file.
– We also want any file in the database to be listed, so we want to match against any entity_name. ‘-’ will do.

cdef f -


Exercise

• It would be nice to know how a class interacts with its superclass.

• We’ll take a peek at this with the classes ANSIColourPrinter and ANSIPrinter.


Class-Superclass

• This problem takes more than a single cdef/vdef command. First step…

– We need to see how ANSIColourPrinter calls its super-constructor.
– We want to see the calls, so we’ll use vdef.
– Constructors are methods in chava.


Class-Superclass (Step One)

• The query we need to see the constructors of ANSIColourPrinter is:

vdef m ANSIColourPrinter

This results in…


Class-Superclass Interaction (Step One – Results)

public ANSIColourPrinter(OutputStream out)
{
    this(out, m_defaultColour);
}
public ANSIColourPrinter(OutputStream out, boolean doReset)
{
    this(out, m_defaultColour, doReset);
}
public ANSIColourPrinter(OutputStream out, ANSICharacterColour colour)
{
    this(out, colour, m_defaultReset);
}
public ANSIColourPrinter(OutputStream out, ANSICharacterColour colour, boolean doReset)
{
    super(colour, out, doReset);
}


Class-Superclass Interaction (Step One – Analysis)

• We now know that ANSIColourPrinter accepts:

– An OutputStream
– An ANSICharacterColour
– A boolean

• When not supplied with either of the last two parameters, the constructor uses some defaults.


Class-Superclass Interaction (Step Two)

• The next step is to examine what ANSIPrinter does in its constructor.

• This is basically the same thing as peeking at the ANSIColourPrinter constructors.

vdef m ANSIPrinter


Class-Superclass Interaction (Step Two – Results)

public ANSIPrinter(OutputStream out, ANSIEscapeSequenceType sequence)
{
    this(out, sequence, m_defaultReset);
}
public ANSIPrinter(OutputStream out, ANSIEscapeSequenceType sequence, boolean doReset)
{
    this(sequence, out, doReset, true);
}
public ANSIPrinter(OutputStream out, ANSIEscapeSequenceType sequence, boolean doReset, boolean resetOnLeave)
{
    super(out);

    m_escape = sequence;
    m_reset = doReset;
    m_resetOnFinalize = resetOnLeave;

    m_showEscape = false;
}


Class-Superclass Interaction (Step Two – Analysis)

• Apparently, the constructor for ANSIPrinter accepts values for:

– an OutputStream
– an ANSIEscapeSequenceType
– two booleans

• From what we see the constructor of ANSIPrinter doing, we know that it does nothing more than pass the stream to its own superclass and set some variables to what we pass to it. Nothing special.


Another Exercise

• This exercise uses some Unix utilities along with our use of cdef/vdef. The exercise involves two things:

– Counting the number of methods of ANSICharacterColour (in ANSICharacterColour.java).
– Printing out a list of methods in the form of their name, return type, and parameter list.


Using Unix (Part One)

• In order to count the number of lines of a document, one can use the command line tool wc.

– The -l option makes it count lines.
– Piping to it makes it count the lines of output from a program.

{cdef query} | wc -l

counts the number of lines in the output of a cdef query.


Using Unix (Part One)

• Problem with using wc

– wc counts all lines, including the ones for our formatted output table.
– Passing the -u option to cdef gives unformatted output, which is very useful for integrating chava with Unix tools. The syntax is:

cdef [-u] kind name [attr=val]
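The effect of the table header on a line count can be tried without a Chava database. The sketch below uses two stand-in functions, `formatted` and `unformatted`, that print made-up output; the record layout is an assumption for illustration only and is not the real cdef format.

```shell
#!/bin/sh
# Stand-in for formatted output: two header lines plus two data rows.
formatted() {
    printf '%s\n' \
        'name   scope  file' \
        '====== ====== ======' \
        'getA   public A.java' \
        'getB   public A.java'
}

# Stand-in for unformatted output: one semicolon-delimited line per record
# (hypothetical field layout).
unformatted() {
    printf '%s\n' \
        '1;getA;public;A.java' \
        '2;getB;public;A.java'
}

formatted   | wc -l   # 4 lines: the 2 header lines are counted too
unformatted | wc -l   # 2 lines: exactly one per record
```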


The solution is …

• The solution to the first problem is:

cdef -u m - file=./ANSICharacterColour.java | wc -l


Using Unix (Part Two)

• For the second question, we will again use the unformatted output of cdef.

– This time, we’ll take note of the format of the unformatted output! We’ll keep this limited to the case of unformatted output for methods.
– Each field of the unformatted output is delimited by a semicolon. The fields we care about are the name, return-type, and parameter-list fields. These are fields 2, 5, and 9, respectively.


Using Unix (Part Two)

• The final piece in the puzzle of displaying the specific fields is getting the fields themselves out of the output.

– The cut utility will do nicely. We can give it a delimiter and a list of field numbers, and it will return those fields for each line of its input.
– The delimiter flag for cut is -d. The field-list flag is -f, followed by a series of comma-separated numbers.
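A quick way to see what cut does with these flags is to feed it one hand-made record. In the sketch below only the positions of fields 2, 5, and 9 follow the slides; the other field values (`f1`, `f3`, …) are placeholders, and `pick_fields` is a name chosen for this example.

```shell
#!/bin/sh
# pick_fields: keep fields 2, 5 and 9 of each semicolon-delimited input line.
# cut joins the selected fields with the same delimiter on output.
pick_fields() {
    cut -d';' -f2,5,9
}

# Hypothetical 9-field record with placeholder values in the unused fields.
echo 'f1;getANSIString;f3;f4;java.lang.String;f6;f7;f8;()' | pick_fields
# prints: getANSIString;java.lang.String;()
```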


The solution is …

• Our target query is thus:

cdef -u m - file=./ANSICharacterColour.java | cut -d';' -f2,5,9


Output for Exercise

• Question One: 13

• Question Two:

ANSICharacterColour;void;(acin.common.ansi.ANSIColour,acin.common.ansi.ANSIColour)
create;acin.common.ansi.ANSICharacterColour;(acin.common.ansi.ANSIColour,acin.common.ansi.ANSIColour)
create;acin.common.ansi.ANSICharacterColour;(acin.common.ansi.ANSIColour,int)
create;acin.common.ansi.ANSICharacterColour;(acin.common.ansi.ANSIColour,java.lang.String)
create;acin.common.ansi.ANSICharacterColour;(int,acin.common.ansi.ANSIColour)
create;acin.common.ansi.ANSICharacterColour;(int,int)
create;acin.common.ansi.ANSICharacterColour;(int,java.lang.String)
create;acin.common.ansi.ANSICharacterColour;(java.lang.String,acin.common.ansi.ANSIColour)
create;acin.common.ansi.ANSICharacterColour;(java.lang.String,int)
create;acin.common.ansi.ANSICharacterColour;(java.lang.String,java.lang.String)
getANSIString;java.lang.String;()
getBackground;acin.common.ansi.ANSIColour;()
getForeground;acin.common.ansi.ANSIColour;()

• Not very pretty, but useful (we hope…).
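Output in this name;return-type;parameter-list shape can be post-processed further with standard tools. As one possible follow-up, the sketch below counts how many overloads each method name has. The function name `overload_counts` is ours, and the sample lines are a shortened subset of the listing.

```shell
#!/bin/sh
# overload_counts: read "name;return-type;(params)" lines on stdin and print
# "name count" for each distinct method name.
overload_counts() {
    cut -d';' -f1 | sort | uniq -c | awk '{ print $2, $1 }'
}

printf '%s\n' \
    'create;acin.common.ansi.ANSICharacterColour;(int,int)' \
    'create;acin.common.ansi.ANSICharacterColour;(int,java.lang.String)' \
    'getANSIString;java.lang.String;()' |
    overload_counts
```

On the three sample lines this reports two overloads of create and one of getANSIString.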


cref/vref

• cref is a tool that displays information from the Chava relationship database, returning the results in a table.

• vref displays the actual entities involved in a relationship.


Syntax

• cref and vref share the same syntax:

{cref|vref} kind1 name1 kind2 name2 [attr=val]

– kind1 and kind2 are entity kinds
– name1 and name2 are entity names
– Attributes are a bit different…


cref/vref Attributes

• Attr=val pairs in cref/vref are different because they have to deal with two different entities. This is solved by appending a ‘1’ or a ‘2’ to the attribute. E.g.,

– file1=myFile.java
– file2!=yourFile.java


Example Query

• Here’s a query to find all class-class relationships in the database.

cref c - c -

Or, to see the entities involved:

vref c - c -


Example Results (cref)

• cref c - c -

kind1 name1    file1            kind2 name2    file2            rk
===== ======== ================ ===== ======== ================ ==
class ANSIChar ANSICharacterCol class ANSIEsca ANSIEscapeSequen su
class ANSIColo ANSIColour.java  class ANSIEsca ANSIEscapeSequen su
class ANSIColo ANSIColourPrinte class ANSIPrin ANSIPrinter.java su
class ANSIColo ANSIColourPrinte class ANSIPrin ANSIPrinterMap.j su
class ANSICurs ANSICursorMove.j class Object                    su
class ANSICurs ANSICursorMoveSe class Object                    su
class ANSIDisp ANSIDisplay.java class Object                    su
class ANSIDisp ANSIDisplaySwitc class Object                    su
class ANSIEsca ANSIEscapeSequen class ANSIEsca ANSIEscapeSequen su
class ANSIEsca ANSIEscapeSequen class Object                    su
class ANSIPrin ANSIPrinter.java class PrintStream               su
class ANSIPrin ANSIPrinterMap.j class Object                    su
class ANSIPrin ANSIPrinterOptio class Object                    su


Results Explained

• cref returned a table. The columns are just like cdef columns, except some have a ‘1’ and some have a ‘2’ appended.

• Columns with a ‘1’ appended refer to the first entity.

• Columns with a ‘2’ appended refer to the second entity.


That last column…

• The last column, rk, denotes the kind of relationship. Its values can be:

• Reference

• Fieldread

• Fieldwrite

• Implements

• Subclass


Example Results (vref)

• vref c - c - (partial results)

RECORD NUMBER 0
### ANSICharacterColour.java ###
public class ANSICharacterColour extends ANSIEscapeSequenceType
{
    private ANSIColour m_foreground;
    private ANSIColour m_background;

    /**
     * Method to create an <code>ANSICharacterColour</code> from two integers representing the foreground and background colour, as defined in <code>ANSIColourConstants</code>.
     *
     * @param foreground The value representing the colour to be the foreground.
     * @param background The value representing the colour to be the background.
     * @exception java.lang.IllegalArgumentException Thrown if the foreground and background values aren't valid ANSI colours.
     */
    public static ANSICharacterColour create(int foreground, int background)
    {
        return new ANSICharacterColour(new ANSIColour(foreground, true), new ANSIColour(background, false));
    }


Exercise – cref/vref

• In these exercises, we’ll examine various relations between the entities of a system.

• We’ll go over:

– Inheritance relationships.
– Method-Method relationships.
– How to write a shell script using Chava tools.


Exercise #1

• We’ve already seen how to find the entire inheritance tree from our example, so this exercise should be easy:

– Find all the classes that ANSIEscapeSequenceType inherits from, and all the classes that subclass it.


Inheritance Relation

• The relation between classes that we are interested in is subclassing.

• But which entity in the relation subclasses the other?

– The answer is that the first entity subclasses the second.


Inheritance Relation (Cont’d)

– The answer to the question “which class is ANSIEscapeSequenceType a subclass of” is:

cref c ANSIEscapeSequenceType c -

– We can analogously find which classes subclass ANSIEscapeSequenceType:

cref c - c ANSIEscapeSequenceType


Exercise #2

• This exercise concentrates on method-to-method relations.

• Our task is to find what the fan-in and fan-out of a function are.


Definition: Fan-In/Fan-Out

• Fan-In

– The fan-in of a function/method is the number of functions/methods that invoke that method.

• Fan-Out

– The fan-out of a function/method is the number of functions/methods that it invokes.
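The two definitions can be checked on a tiny hand-written call list. The sketch below is independent of Chava: it assumes the call relation is given as whitespace-separated `caller callee` pairs, one per line, and the function name `fan_counts` is chosen for this example.

```shell
#!/bin/sh
# fan_counts: read "caller callee" pairs on stdin and print, for every method
# seen, "method fan_in fan_out", sorted by method name.
fan_counts() {
    awk '
        !seen[$1 SUBSEP $2]++ {      # count each distinct caller/callee pair once
            out[$1]++                # caller invokes one more method
            in_[$2]++                # callee is invoked by one more method
            names[$1]; names[$2]     # remember every method name
        }
        END {
            for (m in names) print m, in_[m] + 0, out[m] + 0
        }' | sort
}

printf '%s\n' 'main parse' 'main print' 'parse print' | fan_counts
```

Here main has fan-in 0 and fan-out 2, parse has 1 and 1, and print has fan-in 2 and fan-out 0.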


Finding Fan-In, Fan-Out

• A key piece of information to know here is that the -u option from cdef also works in cref.

• The fan-in of a method can be calculated thus:

cref -u m - m my_method | wc -l

• The fan-out of a method can be calculated analogously:

cref -u m my_method m - | wc -l


Exercise #3

• In this exercise, we’ll write a shell script to determine if one class is an ancestor or a descendent of another.


Descendent Relation

• A class X is a descendent of class Y if X subclasses Y, or X’s superclass is a descendent of Y.

• This sets up a nice recursion, which will make our job easy.


Shell Scripting

• Our first step is to come up with an exact specification of what we want:

– Given two classes, D and A, our script should report a 1 if D is a descendent of A, and 0 otherwise.


Shell Scripting…

• Our first coding step is to determine what shell to use. For this exercise, we’ll be using the C shell.

• This makes our shebang line:

#!/bin/csh


Shell Scripting

• To make this a little nicer to look at, we’ll make a few small helper scripts…

– One to return whether one class subclasses another.
– One to return the ‘name’ field from unformatted chava output.
– One to return the names of all the classes that inherit from a given class.


Helper Script (does_subclass)

• Our first script is pretty simple:

#!/bin/csh
@ z = `cref -u c $1 c $2 | wc -l` != 0
echo ${z}


Helper Script (get_name)

• Our get_name script only has to return the value of one field. We’ll just make a small awk script to do it.

awk -F ';' '{print $3}'


Helper Script (subclasses)

• A script to get all the subclasses is also relatively trivial:

cref -u c - c $1 | get_name


The Actual Script (ancestor)

• Since our relation is a recursive one, we have to start our code by taking care of the base case, which is that D is a subclass of A (a parent-child relationship).

if (`cref -u c $1 c $2 | wc -l` != 0) then
    echo 1
    exit
endif


The Rest of the Script

• The rest of the script deals with the recursion. We have to check every subclass to see if it is an ancestor of the target class.

foreach child (`subclasses $2`)
    if (`ancestor $1 $child`) then
        echo 1
        exit
    endif
end


However…

• There’s a better way to do this, which would be to traverse up from the descendent.

– There can be multiple subclasses to any class.
– In Java, there is only one superclass to a class.

• We’ll call this the ancestor relation, defined as:

– X is an ancestor of Y if X is Y’s superclass,
– or X is an ancestor of Y’s superclass.

• We’ll write two little helper scripts to do the rewrite.


Helper Scripts, II (other_name)

• A script to get the name of the second entity of a relation could be useful.

awk -F ';' '{print $17}'


Helper Scripts, II (parent)

• A second script, to return the parent of a class, if it exists, would be:

#!/bin/csh
cref -u c $1 c - | other_name


Making the Finished Product

• First take care of the base case of the recursion:

if (`does_subclass $1 $2`) then
    echo 1
    exit
endif


Last Bit o’ Code

• The rest of the code deals with recursing up the inheritance tree…

if (`parent $1 | wc -l` != 0) then
    ancestor `parent $1` $2
else
    echo 0
endif
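The upward walk can also be tried without a Chava database. The sketch below is a POSIX sh analogue, not the csh script itself: the `HIERARCHY` table is a hand-written stand-in for what the parent helper would look up, and `parent_of` and `is_ancestor` are names invented for this example. Note that `is_ancestor X Y` follows the argument order of the definition (ancestor first), whereas the csh script takes the descendent first.

```shell
#!/bin/sh
# Hypothetical "class;superclass" table standing in for a Chava lookup.
HIERARCHY='ANSIColourPrinter;ANSIPrinter
ANSIPrinter;PrintStream
ANSIDisplay;Object'

# parent_of CLASS: print the superclass of CLASS, or nothing if none is recorded.
parent_of() {
    printf '%s\n' "$HIERARCHY" |
        awk -F ';' -v c="$1" '$1 == c { print $2; exit }'
}

# is_ancestor X Y: print 1 if X is an ancestor of Y, 0 otherwise.
is_ancestor() {
    p=$(parent_of "$2")
    if [ -z "$p" ]; then
        echo 0                   # reached the top of the tree without meeting X
    elif [ "$p" = "$1" ]; then
        echo 1                   # base case: X is the superclass of Y
    else
        is_ancestor "$1" "$p"    # recursive case: climb one level
    fi
}

is_ancestor PrintStream ANSIColourPrinter   # prints 1
is_ancestor ANSIDisplay ANSIColourPrinter   # prints 0
```

Because each class has at most one superclass, every call makes at most one recursive call, which is the advantage of walking up rather than down.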


Visualizing Chava

• There are two tools we’ll be using to visualize chava queries.

– dagger: Lets us use a cref-esque query to create a ‘directed graph.’
– dot: Used to draw a ‘directed graph.’


Graphs (Definition)

• A graph G(V, E) is a set of vertices, V, and a set of edges, E.

• For each edge e in E, there are two vertices, x and y, in V such that e is an edge between x and y.


Graph Details

• Edge Crossings

• Directed Graphs

• Parallel Edges


Graph Examples

• A road map of a large area is a graph. Cities are vertices, and roads are edges.

• An inheritance tree is a directed graph.

• A call tree is a graph.


The dagger Tool

• The dagger tool takes a cref-style query, and returns the results as a graph of the relationships.


Syntax

• dagger syntax is exactly like cref syntax (except for lack of options).

dagger kind1 name1 kind2 name2


dagger to dot

• dagger only creates a representation describing a graph.

• dot takes that representation and outputs something that can be visualized.

– Can make dotty files.
– Can also make PostScript files.

dagger kind1 name1 kind2 name2 | dot -Tps
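To give a feel for the kind of graph description dot accepts, the sketch below builds a small DOT digraph by hand from `subclass superclass` pairs. This is not dagger’s actual output, which the slides do not show; the function name `pairs_to_dot` and the input layout are assumptions made for this example.

```shell
#!/bin/sh
# pairs_to_dot: read "subclass superclass" pairs on stdin and emit a DOT
# digraph with one edge per pair, pointing from subclass to superclass.
pairs_to_dot() {
    echo 'digraph classes {'
    awk '{ printf "    \"%s\" -> \"%s\";\n", $1, $2 }'
    echo '}'
}

printf '%s\n' \
    'ANSIColourPrinter ANSIPrinter' \
    'ANSIPrinter PrintStream' |
    pairs_to_dot
```

If Graphviz is installed, the result can be piped into dot -Tps in the same way as dagger output.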


Example Query

• A sample query will show just how the output of dagger -> dot looks.

• A good thing to check is the class inheritance hierarchy.

– We already know the cref query for this.
– The dagger query is

dagger c - c - | dot -Tps > classes.ps


Viewing PostScript

• A good PostScript viewer is ghostview.

– The command to use ghostview is

ggv <file>

• Use ghostview to look at the class hierarchy graph that you just created.


Does Chava have siblings?

• Chava is really a tool that uses the CIA system, from AT&T Labs - Research.

• The CIA system can be extended to any type of structured language.

• Other implementations exist for:

– C/C++, HTML, ksh, etc.