Investigating JAVA Classes with Formal Concept Analysis

Uri Dekel ([email protected]). Based on M.Sc. work at the Israel Institute of Technology. To appear: 10th Working Conference on Reverse Engineering (WCRE’03), and as a poster at OOPSLA’03. 17-791 Software Research Seminar (SSSG)


Page 1: Investigating JAVA Classes with Formal Concept Analysis

Investigating JAVA Classes with Formal Concept Analysis

Uri Dekel ([email protected])
Based on M.Sc. work at the Israel Institute of Technology.

To appear: 10th Working Conference on Reverse Engineering (WCRE’03), and as a poster in OOPSLA’03

17-791 Software Research Seminar (SSSG)


9/25/2003 Investigating Classes with FCA, Uri Dekel, 17-791 Software Research Seminar 2

Outline

- Research goals and hypotheses
- A crash course in formal concept analysis
- Interface visualization
- Reasoning about class implementation
- Applications to code inspection
- Additional research

Goals

Research question: "Can we exploit the data-member-based cohesion between function-methods in a class to reason about the class and discover errors?"

Specifically:
1. Provide a faster learning curve for new class users by improving interface presentation.
2. Assist reverse engineering by visualizing structure.
3. Assist code inspection by suggesting a reading order.

Important principle: keep it simple to use and learn.

Hypothesis #1

Data-member use is fundamental to understanding a class.
- All possible implementations of an operation will use the same fields.
- Representation changes are rare.
- Basis for cohesion-based metrics (e.g., LCOM).
- Analogous to global-variable-based modularization of procedural code.
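To make the link to cohesion metrics concrete, here is a minimal sketch of the original Chidamber-Kemerer LCOM computed from a map of each method to the fields it uses. LCOM is |P| - |Q| when |P| > |Q|, and 0 otherwise. P is the set of method pairs that share no field, and Q is the set of pairs that share at least one. The class name `Lcom` is hypothetical. The field-use data is read off the Pnt3D context table, and the constructor is counted as an ordinary member, which is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch (hypothetical helper): Chidamber-Kemerer LCOM from a method -> fields-used map. */
final class Lcom {
    private Lcom() {}

    /** Returns {|P|, |Q|}: method pairs sharing no field, and pairs sharing at least one. */
    static int[] disjointAndSharedPairs(Map<String, Set<String>> fieldUse) {
        List<Set<String>> uses = new ArrayList<>(fieldUse.values());
        int disjoint = 0;
        int shared = 0;
        for (int i = 0; i < uses.size(); i++) {
            for (int j = i + 1; j < uses.size(); j++) {
                if (Collections.disjoint(uses.get(i), uses.get(j))) {
                    disjoint++;
                } else {
                    shared++;
                }
            }
        }
        return new int[] {disjoint, shared};
    }

    /** LCOM = |P| - |Q| when |P| > |Q|, and 0 otherwise. */
    static int lcom(Map<String, Set<String>> fieldUse) {
        int[] pq = disjointAndSharedPairs(fieldUse);
        return Math.max(pq[0] - pq[1], 0);
    }

    /** Field use of Pnt3D as in its context table (the constructor counts as a member). */
    static Map<String, Set<String>> pnt3d() {
        Map<String, Set<String>> use = new LinkedHashMap<>();
        use.put("Pnt3D", Set.of("x", "y", "z", "color"));
        use.put("getX", Set.of("x"));
        use.put("setX", Set.of("x"));
        use.put("getY", Set.of("y"));
        use.put("setY", Set.of("y"));
        use.put("setXY", Set.of("x", "y"));
        use.put("getZ", Set.of("z"));
        use.put("setZ", Set.of("z"));
        use.put("setXYZ", Set.of("x", "y", "z"));
        use.put("getColor", Set.of("color"));
        use.put("setColor", Set.of("color"));
        use.put("draw", Set.of("x", "y", "z", "color"));
        return use;
    }

    public static void main(String[] args) {
        int[] pq = disjointAndSharedPairs(pnt3d());
        System.out.println("P=" + pq[0] + " Q=" + pq[1] + " LCOM=" + lcom(pnt3d()));
    }
}
```

For Pnt3D this prints P=30 Q=36 LCOM=0. The 12 members form 66 pairs, 30 of which share no field.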

Hypothesis #2

Methods that use the same combination of fields are likely to be related.
- e.g., get/set, add/remove, etc.
- Even more so due to the "shopping list approach", which promotes complete interfaces using composite methods.

Means: Formal Concept Analysis

- A mathematical classification technique.
- Uses a binary relation (the context) between objects and attributes (not to be confused with the OO terms).
- Produces a concept lattice (next slide).
- Much literature on applications in various fields.

Example: Context of the Pnt3D class (rows are the objects, here fields; columns are the attributes, here methods)

| Object \ Attribute | Pnt3D | getX | setX | getY | setY | setXY | getZ | setZ | setXYZ | getColor | setColor | draw |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| x | √ | √ | √ | | | √ | | | √ | | | √ |
| y | √ | | | √ | √ | √ | | | √ | | | √ |
| z | √ | | | | | | √ | √ | √ | | | √ |
| color | √ | | | | | | | | | √ | √ | √ |

Formal Concept Analysis

Input: a context $\langle O, A, R \rangle$
- $O$ is a set of objects
- $A$ is a set of attributes
- $R \subseteq O \times A$ is a binary relation between $O$ and $A$

Mapping: a Galois connection
- Common attributes of a set of objects: $C_A(O') = \{a \in A \mid \forall o \in O' : (o, a) \in R\}$
- Common objects of a set of attributes: $C_O(A') = \{o \in O \mid \forall a \in A' : (o, a) \in R\}$

Output: concepts $\langle O', A' \rangle$ such that $C_A(O') = A'$ and $C_O(A') = O'$
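The two derivation operators can be sketched directly in Java. This example stores R as a map from each object to its related attributes and enumerates the concepts naively. For every subset X of O it computes the closure $C_O(C_A(X))$, which is always an extent. This costs 2^|O| closures, so it is only a toy for small contexts and not an efficient FCA algorithm. The class and method names are hypothetical. The data is the Pnt3D context, with fields as objects and methods as attributes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch (hypothetical helper) of a formal context <O, A, R>, stored as object -> attributes. */
final class FormalContext {
    /** A concept <O', A'> with C_A(O') = A' and C_O(A') = O'. */
    static final class Concept {
        final Set<String> objects;
        final Set<String> attributes;

        Concept(Set<String> objects, Set<String> attributes) {
            this.objects = objects;
            this.attributes = attributes;
        }

        @Override
        public String toString() {
            return objects + " " + attributes;
        }
    }

    private final Map<String, Set<String>> relation;
    private final Set<String> allAttributes = new TreeSet<>();

    FormalContext(Map<String, Set<String>> relation) {
        this.relation = relation;
        relation.values().forEach(allAttributes::addAll);
    }

    /** C_A(O'): the attributes shared by every object in O'. */
    Set<String> commonAttributes(Set<String> objects) {
        Set<String> result = new TreeSet<>(allAttributes);
        for (String o : objects) {
            result.retainAll(relation.get(o));
        }
        return result;
    }

    /** C_O(A'): the objects related to every attribute in A'. */
    Set<String> commonObjects(Set<String> attributes) {
        Set<String> result = new TreeSet<>();
        relation.forEach((o, attrs) -> {
            if (attrs.containsAll(attributes)) {
                result.add(o);
            }
        });
        return result;
    }

    /** Naive enumeration: close every subset of O (2^|O| subsets, so small contexts only). */
    List<Concept> concepts() {
        List<String> objs = new ArrayList<>(relation.keySet());
        Set<Set<String>> extents = new LinkedHashSet<>();
        for (int mask = 0; mask < (1 << objs.size()); mask++) {
            Set<String> subset = new TreeSet<>();
            for (int i = 0; i < objs.size(); i++) {
                if ((mask & (1 << i)) != 0) {
                    subset.add(objs.get(i));
                }
            }
            // C_O(C_A(X)) is the smallest extent containing X
            extents.add(commonObjects(commonAttributes(subset)));
        }
        List<Concept> result = new ArrayList<>();
        for (Set<String> extent : extents) {
            result.add(new Concept(extent, commonAttributes(extent)));
        }
        return result;
    }

    /** The Pnt3D context: objects are fields, attributes are the methods using them. */
    static FormalContext pnt3d() {
        Map<String, Set<String>> r = new LinkedHashMap<>();
        r.put("x", Set.of("Pnt3D", "getX", "setX", "setXY", "setXYZ", "draw"));
        r.put("y", Set.of("Pnt3D", "getY", "setY", "setXY", "setXYZ", "draw"));
        r.put("z", Set.of("Pnt3D", "getZ", "setZ", "setXYZ", "draw"));
        r.put("color", Set.of("Pnt3D", "getColor", "setColor", "draw"));
        return new FormalContext(r);
    }

    public static void main(String[] args) {
        for (Concept c : pnt3d().concepts()) {
            System.out.println(c);
        }
    }
}
```

On Pnt3D the 16 subsets of fields close to 8 distinct extents. For example, {x, z} closes to {x, y, z}, because the only methods using both x and z (Pnt3D, setXYZ, draw) also use y.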

Formal Concept Analysis

A concept lattice is based upon a partial order between concepts:

$\langle O_1, A_1 \rangle \le \langle O_2, A_2 \rangle \iff O_1 \subseteq O_2 \iff A_2 \subseteq A_1$

Example: Concepts of the Pnt3D class

| Concept | Objects (fields) | Attributes (methods) |
|---|---|---|
| C1 | {} | {Pnt3D, getX, setX, getY, setY, setXY, getZ, setZ, setXYZ, getColor, setColor, draw} |
| C2 | {color} | {Pnt3D, getColor, setColor, draw} |
| C3 | {x} | {Pnt3D, getX, setX, setXY, setXYZ, draw} |
| C4 | {y} | {Pnt3D, getY, setY, setXY, setXYZ, draw} |
| C5 | {x, y} | {Pnt3D, setXY, setXYZ, draw} |
| C6 | {z} | {Pnt3D, getZ, setZ, setXYZ, draw} |
| C7 | {x, y, z} | {Pnt3D, setXYZ, draw} |
| C8 | {x, y, color, z} | {Pnt3D, draw} |

[Figure: the full concept lattice of Pnt3D, one node per concept C1 through C8, each labeled with its fields and methods as in the table]
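The lattice drawing shows only the covering pairs of this order: $C_i < C_j$ with no concept strictly between them. Here is a minimal sketch that computes those edges from the extents in the table, comparing concepts by object-set inclusion as in the formula. The class name is hypothetical, and the cubic loop is meant only for small lattices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Sketch (hypothetical helper): covering pairs (Hasse diagram edges) of concepts. */
final class ConceptOrder {
    private ConceptOrder() {}

    /** <O1,A1> <= <O2,A2> iff O1 is a subset of O2. */
    static boolean leq(Set<String> o1, Set<String> o2) {
        return o2.containsAll(o1);
    }

    private static boolean lt(Set<String> a, Set<String> b) {
        return leq(a, b) && !a.equals(b);
    }

    /** Returns index pairs {i, j} with extent i < extent j and nothing strictly between. */
    static List<int[]> coveringPairs(List<Set<String>> extents) {
        List<int[]> edges = new ArrayList<>();
        for (int i = 0; i < extents.size(); i++) {
            for (int j = 0; j < extents.size(); j++) {
                if (!lt(extents.get(i), extents.get(j))) {
                    continue;
                }
                boolean direct = true;
                for (int k = 0; k < extents.size() && direct; k++) {
                    if (lt(extents.get(i), extents.get(k)) && lt(extents.get(k), extents.get(j))) {
                        direct = false;
                    }
                }
                if (direct) {
                    edges.add(new int[] {i, j});
                }
            }
        }
        return edges;
    }

    /** Extents of C1..C8 of Pnt3D, in table order. */
    static List<Set<String>> pnt3dExtents() {
        return List.of(
            Set.of(), Set.of("color"), Set.of("x"), Set.of("y"),
            Set.of("x", "y"), Set.of("z"), Set.of("x", "y", "z"),
            Set.of("x", "y", "z", "color"));
    }

    public static void main(String[] args) {
        for (int[] e : coveringPairs(pnt3dExtents())) {
            System.out.println("C" + (e[0] + 1) + " < C" + (e[1] + 1));
        }
    }
}
```

For Pnt3D this yields 10 edges. C1 < C5 is not among them, because C3 and C4 lie between the two.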

Concept Lattices

- A sparse concept lattice provides an alternate view of the tabular context and the full concept lattice.
- Each concept is a group of objects which have the same attributes.
- The attributes are the union of the attributes in that concept and in all the concepts that it dominates.
- In our case, methods that use the same fields are clustered together.
- Reveals structure and asymmetries.

[Figure: sparse concept lattice of Pnt3D. Node labels: color: getColor(), setColor(); x: getX(), setX(); y: getY(), setY(); z: getZ(), setZ(); setXY(); setXYZ(); Pnt3D(), draw()]

Interface Visualization

- The lattice partitions the methods in the interface into equivalence classes.
- Similar methods are heuristically clustered together: an automatic "feature categorization".
- The lattice provides multidimensional connections.
- Compare with simple lexical lists of methods.

(Note: the class is "flattened" to remove inheritance details.)
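With fields as objects and methods as attributes, as in the Pnt3D context, each method appears in the sparse lattice at the concept whose objects are exactly the fields it uses. Under that reading, the equivalence classes can be sketched by grouping methods on their field set. The helper below is hypothetical and assumes the method-to-fields map has already been extracted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch (hypothetical helper): group methods that use exactly the same set of fields. */
final class MethodPartition {
    private MethodPartition() {}

    static Map<Set<String>, List<String>> partition(Map<String, Set<String>> fieldUse) {
        Map<Set<String>, List<String>> classes = new LinkedHashMap<>();
        fieldUse.forEach((method, fields) -> {
            // a sorted copy makes equal field sets print identically
            Set<String> key = new TreeSet<>(fields);
            classes.computeIfAbsent(key, k -> new ArrayList<>()).add(method);
        });
        return classes;
    }

    /** Field use of Pnt3D, as in its context table. */
    static Map<String, Set<String>> pnt3d() {
        Map<String, Set<String>> use = new LinkedHashMap<>();
        use.put("Pnt3D", Set.of("x", "y", "z", "color"));
        use.put("getX", Set.of("x"));
        use.put("setX", Set.of("x"));
        use.put("getY", Set.of("y"));
        use.put("setY", Set.of("y"));
        use.put("setXY", Set.of("x", "y"));
        use.put("getZ", Set.of("z"));
        use.put("setZ", Set.of("z"));
        use.put("setXYZ", Set.of("x", "y", "z"));
        use.put("getColor", Set.of("color"));
        use.put("setColor", Set.of("color"));
        use.put("draw", Set.of("x", "y", "z", "color"));
        return use;
    }

    public static void main(String[] args) {
        partition(pnt3d()).forEach((fields, methods) -> System.out.println(fields + " " + methods));
    }
}
```

For Pnt3D this gives the seven groups labeled in the sparse lattice, for example [x] with getX and setX, and [color, x, y, z] with Pnt3D and draw.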

Interface Visualization

- To be effective, multiple methods should appear in each concept, on average.
- A lattice can have up to $n = 2^{\min(|M|,|F|)}$ concepts ($M$: methods, $F$: fields).
- In a data set of circa 6000 classes:
  - In 99.5%, $n < |M| + |F|$
  - In 77.4%, $n < |M|$

[Figure] Example: Concepts vs. methods in Eclipse.
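Percentages like these come from per-class counts of concepts, methods, and fields. The sketch below computes the bound and both fractions over such measurements. The measurement type and the two non-Pnt3D sample classes are hypothetical and illustrative only. They are not the study's data.

```java
import java.util.List;

/** Sketch (hypothetical helper): concept-count statistics over per-class measurements. */
final class LatticeStats {
    private LatticeStats() {}

    /** One class: n concepts, m methods, f fields. */
    static final class ClassSize {
        final int n;
        final int m;
        final int f;

        ClassSize(int n, int m, int f) {
            this.n = n;
            this.m = m;
            this.f = f;
        }
    }

    /** The bound 2^min(|M|,|F|) on the number of concepts, saturating at Long.MAX_VALUE. */
    static long upperBound(int m, int f) {
        int k = Math.min(m, f);
        return k >= 63 ? Long.MAX_VALUE : 1L << k;
    }

    /** Fraction of classes with n < |M| + |F| (NaN for an empty list). */
    static double fractionBelowMethodsPlusFields(List<ClassSize> classes) {
        return classes.stream().filter(c -> c.n < c.m + c.f).count() / (double) classes.size();
    }

    /** Fraction of classes with n < |M| (NaN for an empty list). */
    static double fractionBelowMethods(List<ClassSize> classes) {
        return classes.stream().filter(c -> c.n < c.m).count() / (double) classes.size();
    }

    public static void main(String[] args) {
        // Pnt3D (8 concepts, 12 methods, 4 fields) plus two made-up classes
        List<ClassSize> sample = List.of(
            new ClassSize(8, 12, 4), new ClassSize(20, 10, 15), new ClassSize(5, 3, 1));
        System.out.println(upperBound(12, 4));
        System.out.println(fractionBelowMethodsPlusFields(sample) + " " + fractionBelowMethods(sample));
    }
}
```

Pnt3D stays well under its bound of 16. It has 8 concepts, which is fewer than both |M| + |F| = 16 and |M| = 12.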

Case Study

The Molecule class from CDK
- CDK: Chemistry Development Kit
  - Open-source library of chemistry-related classes
  - Developed at the Max Planck Institute in Germany
  - Used in chemistry visualization applications
- Why the Molecule class?
  - Has a large interface (nearly 75 public members)
  - The represented entity is familiar to most people

Our technique revealed new errors in this class.

Case Study

Lattice structure hints at class structure:
- A lot of independent operations on the left, similar to a C struct.
- A cohesive component on the right.

[Figure: concept lattice of the Molecule class, concepts C1 through C26. Node contents as recovered from the figure; the mapping of nodes to concept labels was lost:]
- contains(Bond):boolean, getBond(Atom,Atom):Bond, getBonds():Bond[], getBondCount(Atom):int, getBondOrderSum(Atom):int, getConnectedAtoms(Atom):Atom[], getConnectedAtomsVector(Atom):Vector, getConnectedBonds(Atom):Bond[], getDegree(Atom):int, getHighestCurrentBondOrder(Atom):double, getMinimumBondOrder(Atom):double, removeBond(Bond):Bond, removeBond(int):Bond, removeBond(Atom,Atom):boolean
- atoms():Enumeration, shallowCopy():Object
- addListener(ChemObjectListener):void, removeListener(ChemObjectListener):void
- getRemark(Object):Object, setRemark(Object,Object):void
- add(AtomContainer):void, addBonds(double):void, clone():Object, getIntersection(AtomContainer):AtomContainer, removeAllElements():void
- Molecule(), Molecule(AtomContainer), Molecule(int,int), Molecule(Molecule)
- field flags:boolean[]
- field pointers:Vector[]
- contains(Atom):boolean, get2DCenter():Point2D, get3DCenter():Point3D, getAtoms():Atom[], getAtomNumber(Atom):int, getLastAtom():Atom, removeAtom(Atom):void, removeAtom(int):void
- addAtom(Atom):void
- getDegree(int):int
- getConnectionMatrix():double[][], remove(AtomContainer):void, removeAtomAndConnectedBonds(Atom):void, toString():String
- addBond(int,int,int):void, addBond(int,int,int,int):void
- addBonds(AtomContainer):void, addBond(Bond):void, removeAllBonds():void
- setProperty(Object,Object):void, getProperty(Object):Object
- setPhysicalProperty(Object,Object):void, getPhysicalProperty(Object):Object
- addChemName(String):void, getChemName(int):String, getChemNames():Vector, getChemNamesCount():int, setChemNames(Vector):void
- getBeilsteinRN():String, setBeilsteinRN(String):void
- getCasRN():String, setCasRN(String):void
- getAutonomName():String, setAutonomName(String):void
- getFirstAtom():Atom, setAtomAt(int,Atom):void, getAtomAt(int):Atom, setAtoms(Atom[]):void
- getAtomCount():int, setAtomCount(int):void
- getBondAt(int):Bond, setBondAt(int,Bond):void, setBonds(Bond[]):void
- getBondCount():int
- field title:String, getTitle():String, setTitle(String):void

Interface Visualization

- Multiple methods with similar signatures indicate possible repetition.
- Inconsistency in naming.
- Inconsistencies in return types.

[Figure: concept C17 of the Molecule class: contains(Bond):boolean, getBond(Atom,Atom):Bond, getBonds():Bond[], getBondCount(Atom):int, getBondOrderSum(Atom):int, getConnectedAtoms(Atom):Atom[], getConnectedAtomsVector(Atom):Vector, getConnectedBonds(Atom):Bond[], getDegree(Atom):int, getHighestCurrentBondOrder(Atom):double, getMinimumBondOrder(Atom):double, removeBond(Bond):Bond, removeBond(int):Bond, removeBond(Atom,Atom):boolean]

Because related methods are grouped in concepts, we can notice inconsistencies or repetitions.
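Once methods are grouped by concept, a simple check can point at one kind of inconsistency: methods with the same name that return different types. The sketch below parses signatures written in the slide's name(params):ReturnType notation. The helper name is hypothetical, and the check deliberately ignores naming-style issues.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch (hypothetical helper): flag same-named methods in one concept with differing return types. */
final class SignatureCheck {
    private SignatureCheck() {}

    /** Signatures use the slide notation name(params):ReturnType. */
    static Map<String, Set<String>> inconsistentReturnTypes(List<String> signatures) {
        Map<String, Set<String>> returnTypesByName = new LinkedHashMap<>();
        for (String sig : signatures) {
            String name = sig.substring(0, sig.indexOf('('));
            String returnType = sig.substring(sig.lastIndexOf("):") + 2);
            returnTypesByName.computeIfAbsent(name, k -> new TreeSet<>()).add(returnType);
        }
        // keep only names that occur with more than one return type
        returnTypesByName.values().removeIf(types -> types.size() < 2);
        return returnTypesByName;
    }

    /** Concept C17 of Molecule, as listed on this slide. */
    static List<String> conceptC17() {
        return List.of(
            "contains(Bond):boolean", "getBond(Atom,Atom):Bond", "getBonds():Bond[]",
            "getBondCount(Atom):int", "getBondOrderSum(Atom):int", "getConnectedAtoms(Atom):Atom[]",
            "getConnectedAtomsVector(Atom):Vector", "getConnectedBonds(Atom):Bond[]",
            "getDegree(Atom):int", "getHighestCurrentBondOrder(Atom):double",
            "getMinimumBondOrder(Atom):double", "removeBond(Bond):Bond", "removeBond(int):Bond",
            "removeBond(Atom,Atom):boolean");
    }

    public static void main(String[] args) {
        System.out.println(inconsistentReturnTypes(conceptC17()));
    }
}
```

On C17 as listed here, this prints {removeBond=[Bond, boolean]}. Two overloads of removeBond return the removed Bond, and the third returns a boolean.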

Investigate Implementation

We examine fields and dependencies between concepts to understand the cohesive component:
- Collections of atoms and bonds.
- Micro-management of arrays (a count field tracks the available items).
- Inconsistencies and broken invariants.

[Figure: the cohesive component of the Molecule lattice (concepts C10, C11, C12, C13, C14, C16, C17, C18, C19, C20, C21, C22, C23, C26). Node contents as recovered; the concept labels could not be matched to nodes:]
- addBonds(AtomContainer):void, addBond(Bond):void, removeAllBonds():void
- contains(Atom):boolean, get2DCenter():Point2D, get3DCenter():Point3D, getAtoms():Atom[], getAtomNumber(Atom):int, getLastAtom():Atom, removeAtom(Atom):void, removeAtom(int):void
- contains(Bond):boolean, getBond(Atom,Atom):Bond, getBonds():Bond[], getBondCount(Atom):int, getBondOrderSum(Atom):int, getConnectedAtoms(Atom):Atom[], getConnectedAtomsVector(Atom):Vector, getConnectedBonds(Atom):Bond[], getDegree(Atom):int, getHighestCurrentBondOrder(Atom):double, getMinimumBondOrder(Atom):double, removeBond(Bond):Bond, removeBond(int):Bond, removeBond(Atom,Atom):Bond
- getDegree(int):int
- addBond(int,int,int):void, addBond(int,int,int,int):void
- getConnectionMatrix():double[][], remove(AtomContainer):void, removeAtomAndConnectedBonds(Atom):void, toString():String
- addAtom(Atom):void
- clone():Object, addBonds(double):void, removeAllElements():void, add(AtomContainer):void, getIntersection(AtomContainer):AtomContainer
- getBondCount():int, field bondCount:int
- field growArraySize:int
- getFirstAtom():Atom, setAtomAt(int,Atom):void, getAtomAt(int):Atom, setAtoms(Atom[]):void
- field atoms:Atom[]
- getBondAt(int):Bond, setBondAt(int,Bond):void, setBonds(Bond[]):void
- field bonds:Bond[]
- getAtomCount():int, setAtomCount(int):void
- field atomCount:int

Investigate Implementation

Asymmetries are revealed by examining pairs of related concepts.

[Figure: the atom/bond part of the Molecule lattice (concepts C10, C11, C13, C14, C16, C17, C18, C20, C22), with regions annotated "Methods for atoms", "Methods for bonds", "Arrays", "Counts", "Arrays and counts", and "Additions". Node contents as recovered:]
- addBonds(AtomContainer):void, addBond(Bond):void, removeAllBonds():void
- contains(Atom):boolean, get2DCenter():Point2D, get3DCenter():Point3D, getAtoms():Atom[], getAtomNumber(Atom):int, getLastAtom():Atom, removeAtom(Atom):void, removeAtom(int):void
- contains(Bond):boolean, getBond(Atom,Atom):Bond, getBonds():Bond[], getBondCount(Atom):int, getBondOrderSum(Atom):int, getConnectedAtoms(Atom):Atom[], getConnectedAtomsVector(Atom):Vector, getConnectedBonds(Atom):Bond[], getDegree(Atom):int, getHighestCurrentBondOrder(Atom):double, getMinimumBondOrder(Atom):double, removeBond(Bond):Bond, removeBond(int):Bond, removeBond(Atom,Atom):Bond
- addBond(int,int,int):void, addBond(int,int,int,int):void
- addAtom(Atom):void
- getBondCount():int, field bondCount:int
- getFirstAtom():Atom, setAtomAt(int,Atom):void, getAtomAt(int):Atom, setAtoms(Atom[]):void
- field atoms:Atom[]
- getBondAt(int):Bond, setBondAt(int,Bond):void, setBonds(Bond[]):void
- field bonds:Bond[]
- getAtomCount():int, setAtomCount(int):void
- field atomCount:int
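One way to mechanize "examining pairs of related concepts" is to swap the domain word between two parallel concepts and look for methods with no counterpart. Atoms and bonds are a natural pair. This sketch assumes signatures in the slide's notation and uses plain string substitution, which is a heuristic and not part of the described tool. The sample lists come from the array and count groups recovered on this slide.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch (hypothetical helper): methods with no counterpart in a related concept after a word swap. */
final class AsymmetryCheck {
    private AsymmetryCheck() {}

    static List<String> missingCounterparts(List<String> methods, List<String> related,
                                            String from, String to) {
        Set<String> relatedSet = new HashSet<>(related);
        List<String> missing = new ArrayList<>();
        for (String m : methods) {
            // e.g. "getAtomAt(int):Atom" becomes "getBondAt(int):Bond"
            if (!relatedSet.contains(m.replace(from, to))) {
                missing.add(m);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> atomArray = List.of("getFirstAtom():Atom", "setAtomAt(int,Atom):void",
            "getAtomAt(int):Atom", "setAtoms(Atom[]):void");
        List<String> bondArray = List.of("getBondAt(int):Bond", "setBondAt(int,Bond):void",
            "setBonds(Bond[]):void");
        System.out.println(missingCounterparts(atomArray, bondArray, "Atom", "Bond"));

        List<String> atomCount = List.of("getAtomCount():int", "setAtomCount(int):void");
        List<String> bondCount = List.of("getBondCount():int");
        System.out.println(missingCounterparts(atomCount, bondCount, "Atom", "Bond"));
    }
}
```

On these groups, the array concepts differ only by getFirstAtom():Atom, and the count concepts differ only by setAtomCount(int):void. Both are candidate asymmetries to inspect, not confirmed defects.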

Embedded Call Graph

- A concept lattice clusters methods but does not portray interactions.
- Call graphs show interactions between methods, but their layout does not depend on semantics.
- An embedded call graph combines the two.

[Figure: embedded call graph of Pnt3D, drawing the calls among Pnt3D, draw, getX, setX, getY, setY, setXY, getZ, setZ, setXYZ, getColor, and setColor on top of the concept lattice C1 through C8 (fields x, y, z, color)]

Code Inspection

The lattice can help us select a reading order:
- Minimize focus shifts.
- Similar methods are read consecutively.

We define a global order between concepts,
- e.g., each component separately, topological ordering, read by order of layers.

We define a local order between methods in each concept,
- e.g., topological ordering, read by order of simplicity, etc.
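One possible instance of these two orders is sketched below. Globally, concepts are sorted by layer, meaning the number of fields they use. A proper subset always has fewer fields, so this ordering is a topological order of the lattice. Locally, methods are sorted alphabetically. That alphabetical sort is a stand-in assumption for a real "simplicity" ranking. The helper name and the input shape (field set mapped to the methods in that concept) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch (hypothetical helper): a reading order derived from the concept lattice. */
final class ReadingOrder {
    private ReadingOrder() {}

    static List<String> order(Map<Set<String>, List<String>> methodsByFields) {
        List<Map.Entry<Set<String>, List<String>>> concepts = new ArrayList<>(methodsByFields.entrySet());
        // Global order: by layer (number of fields). A proper subset always has fewer
        // fields, so this is a topological order of the lattice. Ties: by field names.
        concepts.sort(Comparator
            .comparingInt((Map.Entry<Set<String>, List<String>> e) -> e.getKey().size())
            .thenComparing(e -> new TreeSet<>(e.getKey()).toString()));
        List<String> result = new ArrayList<>();
        for (Map.Entry<Set<String>, List<String>> concept : concepts) {
            // Local order: alphabetical, a placeholder for a "simplicity" ranking
            List<String> local = new ArrayList<>(concept.getValue());
            Collections.sort(local);
            result.addAll(local);
        }
        return result;
    }

    /** Pnt3D: each field set mapped to the methods labeled with it in the sparse lattice. */
    static Map<Set<String>, List<String>> pnt3d() {
        Map<Set<String>, List<String>> m = new LinkedHashMap<>();
        m.put(Set.of("x", "y", "z", "color"), List.of("Pnt3D", "draw"));
        m.put(Set.of("x", "y", "z"), List.of("setXYZ"));
        m.put(Set.of("x", "y"), List.of("setXY"));
        m.put(Set.of("x"), List.of("getX", "setX"));
        m.put(Set.of("y"), List.of("getY", "setY"));
        m.put(Set.of("z"), List.of("getZ", "setZ"));
        m.put(Set.of("color"), List.of("getColor", "setColor"));
        return m;
    }

    public static void main(String[] args) {
        System.out.println(order(pnt3d()));
    }
}
```

For Pnt3D this prints [getColor, setColor, getX, setX, getY, setY, getZ, setZ, setXY, setXYZ, Pnt3D, draw]. The simple accessors come first and the methods touching every field come last.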

Tooling Support

- Batch-mode prototype
  - Produces lattices and metrics
  - Database support for metrics and statistics research
- Interactive Eclipse plug-in prototype
  - Adds an additional view for .java files
  - Uses a simplistic external static analyzer
  - Limited by the current 2D capabilities of Eclipse

Research Directions

- Conduct user studies to validate the methodology
  - Preliminary user studies provided good feedback
- Lattice-based metrics suite
- Application to class design in CASE tools
  - Interactive class diagram editor based on the concept lattice
  - Semantics assigned by connecting methods to fields; compare with simply adding methods to a list, as in current tools

Research Directions

- Class-wide "diffing": provide a bird's-eye view of changed areas.

[Figure: lattice diff of the Graph class, concepts #1 through #19. Node annotations as recovered (their placement on specific concepts was mostly lost):]
- Labels: packing, "directed", "idHash", "nodeList", "edges"
- Bottom concept. Single utility method.
- Edge insertion and removal by node indices. Conversion to GML format moved to #18.
- "lastTopId" field. Node insertions.
- "pathList" field: many methods.
- "MaxViewedPaths": inspector and mutator.
- "pathDist": inspector and mutator.
- "lightPathRequest": inspector and mutator.
- Path creation. Conversion to GML format.
- New top concept. Contains the "copy" method.
- Mutator for "directed"; edge object insertion.
- Empty concept.
- Path manipulation.
- Old top concept. Node removal. Group manipulation. "copy" method moved to concept #19.

Example: Differences between the original version of the “Graph” class of VGJ (Visualizing Graphs with Java) and the Technion adaptation of that class.

Originals appear in bold font; modifications appear in plain font.
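A very small version of such a class-wide diff can be sketched by comparing, per method, the set of fields it uses in each version. A method whose field set changes has moved to a different concept, as "copy" does in the example. This sketch approximates concept identity by a method's field set, which is an assumption. Matching whole concepts across versions would need more care. The helper name and the two sample versions are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch (hypothetical helper): compare two class versions by each method's field set. */
final class LatticeDiff {
    private LatticeDiff() {}

    static List<String> diff(Map<String, Set<String>> before, Map<String, Set<String>> after) {
        List<String> changes = new ArrayList<>();
        before.forEach((method, oldFields) -> {
            Set<String> newFields = after.get(method);
            if (newFields == null) {
                changes.add("removed " + method);
            } else if (!newFields.equals(oldFields)) {
                changes.add("moved " + method + ": " + new TreeSet<>(oldFields)
                    + " -> " + new TreeSet<>(newFields));
            }
        });
        after.keySet().stream()
            .filter(method -> !before.containsKey(method))
            .forEach(method -> changes.add("added " + method));
        return changes;
    }

    public static void main(String[] args) {
        // hypothetical versions of a small class
        Map<String, Set<String>> v1 = new LinkedHashMap<>();
        v1.put("getX", Set.of("x"));
        v1.put("draw", Set.of("x", "y", "color"));
        v1.put("setColor", Set.of("color"));
        Map<String, Set<String>> v2 = new LinkedHashMap<>();
        v2.put("getX", Set.of("x"));
        v2.put("draw", Set.of("x", "y"));
        v2.put("reset", Set.of("x", "y", "color"));
        diff(v1, v2).forEach(System.out::println);
    }
}
```

On the sample versions this reports that draw moved from [color, x, y] to [x, y], that setColor was removed, and that reset was added.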

Backup Material

Graph Class

(PUB = public, PRV = private)

Concept #1
  Methods: (PUB) getIndexFromNode(Node):int

Concept #2
  Field: (PRV) nodeList_ : NodeList
  Methods: (PRV) adjustGroupChildren_(...):void, (PUB) children(int):Set, (PUB) firstAvailable():int, (PUB) firstNode():Node, (PUB) firstNodeIndex():int, (PRV) getGroupCoordinates_(...):int, (PUB) getNodeFromIndex(int):Node, (PUB) group(Node, boolean):void, (PUB) highestIndex():int, (PRV) markGroupChildren(...):void, (PUB) nextNode(Node):Node, (PUB) nextNodeIndex(int):int, (PUB) nodeFromIndex(int):Node, (PUB) numberOfNodes():int, (PUB) parents(int):Set

Concept #3
  Field: (PRV) idHash_ : HashTable
  Methods: (PUB) getNodeFromId(int):Node

Concept #4
  Field: (PRV) directed_ : bool
  Methods: (PUB) isDirected():bool

Concept #5
  Field: (PRV) Edges : HashTable
  Methods: (PUB) getEdges():Enumeration, (PUB) getEdge(int,int):Edge, (PRV) removeFalseEdges_():void, (PUB) getEdgePathPoints_(int,int):DPoint3, (PRV) fillBackEdges_():void, (PUB) removeEdgePaths():void

Concept #6
  Methods: (PUB) pack():void

Concept #7
  Methods: (PUB) setDirected(bool):void, (PUB) insertEdge(Edge):void

Concept #8
  Methods: (PUB) insertEdge(int,int):void, (PUB) insertEdge(int,int,DPoint3[]):void, (PUB) insertEdge(int,int,DPoint3[],String):void, (PUB) removeEdge(int,int):void, (PUB) removeEdge(Edge):void, (PUB) setGMLvalues(GMLobject):void

Concept #9
  Field: (PRV) lastTopId_ : int
  Methods: (PUB) insertNodeAt(int):void, (PRV) validateIds():void, (PUB) insertNode():int, (PUB) insertNode(bool):int

Concept #10
  Methods: (PUB) copy(Graph):void, (PUB) dummysToEdgePaths():void, (PUB) killGroup(Node):void, (PUB) removeGroups():void, (PUB) removeNode(int):void, (PUB) removeNode(Node):void, (PUB) setNodeGroup(Node, Node):void