Investigating Java Classes with Formal Concept Analysis
Uri Dekel ([email protected])
Based on M.Sc. work at the Technion, Israel Institute of Technology.
To appear: 10th Working Conference on Reverse Engineering (WCRE’03), and as a poster at OOPSLA’03.
17-791 Software Research Seminar (SSSG)
9/25/2003 Investigating Classes with FCA, Uri Dekel, 17-791 Software Research Seminar 2

Outline
- Research goals and hypotheses
- A crash course in formal concept analysis
- Interface visualization
- Reasoning about class implementation
- Applications to code inspection
- Additional research

Goals
- Research question: "Can we exploit the data-member based cohesion between function-methods in a class to reason about the class and discover errors?"
- Specifically:
  1. Provide a faster learning curve for new class users by improving interface presentation
  2. Assist reverse engineering by visualizing structure
  3. Assist code inspection by suggesting a reading order
- Important principle: keep it simple to use and learn.

Hypothesis #1
- Data-member use is fundamental to understanding a class.
  - All possible implementations of an operation will use the same fields
  - Representation changes are rare
- Basis for cohesion-based metrics (e.g., LCOM)
- Analogous to global-variable-based modularization of procedural code.

Hypothesis #2
- Methods that use the same combination of fields are likely to be related.
  - e.g., get/set, add/remove, etc.
- Even more so due to the "shopping list approach"
  - Promotes complete interfaces using composite methods

Means: Formal Concept Analysis
- Mathematical classification technique
- Uses a binary relation (context) between objects and attributes
  - not to be confused with the OO terms
- Produces a concept lattice (next slide)
- Much literature on applications in various fields

Example: Context of the Pnt3D class (rows: objects, here the fields; columns: attributes, here the methods)

       Pnt3D  getX  setX  getY  setY  setXY  getZ  setZ  setXYZ  getColor  setColor  draw
x      √      √     √     .     .     √      .     .     √       .         .         √
y      √      .     .     √     √     √      .     .     √       .         .         √
z      √      .     .     .     .     .      √     √     √       .         .         √
color  √      .     .     .     .     .      .     .     .       √         √         √
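
As an illustration of how such a context could be assembled, the sketch below starts from a method-to-fields map (the kind of data a static analyzer might report) and inverts it into the rows of the Pnt3D table. The names `USES`, `build_context`, and `render` are hypothetical and are not taken from the presented tool; the field sets are read off the Pnt3D context.

```python
# Fields used by each Pnt3D method, read off the Pnt3D context table.
# USES, build_context and render are illustrative names, not the tool's API.
USES = {
    "Pnt3D": {"x", "y", "z", "color"},
    "getX": {"x"}, "setX": {"x"},
    "getY": {"y"}, "setY": {"y"},
    "setXY": {"x", "y"},
    "getZ": {"z"}, "setZ": {"z"},
    "setXYZ": {"x", "y", "z"},
    "getColor": {"color"}, "setColor": {"color"},
    "draw": {"x", "y", "z", "color"},
}


def build_context(uses):
    """Invert method -> fields into the context rows: field -> methods."""
    context = {}
    for method, fields in uses.items():
        for field in fields:
            context.setdefault(field, set()).add(method)
    return context


def render(context, methods):
    """Render the cross table as text, one row per field."""
    lines = []
    for field in sorted(context):
        marks = " ".join("x" if m in context[field] else "." for m in methods)
        lines.append(f"{field:<6}{marks}")
    return "\n".join(lines)


context = build_context(USES)
print(render(context, list(USES)))
```

Each printed row corresponds to one field, with one mark per method in the order the methods appear in `USES`.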

Formal Concept Analysis
- Input: a context <O,A,R>
  - O is a set of objects
  - A is a set of attributes
  - R is a binary relation between O and A
- Mapping: Galois connection
  - Common attributes of a set of objects $O' \subseteq O$: $CA(O') = \{a \in A : \forall o \in O',\ (o,a) \in R\}$
  - Common objects of a set of attributes $A' \subseteq A$: $CO(A') = \{o \in O : \forall a \in A',\ (o,a) \in R\}$
- Output: concepts <O',A'> s.t. $CA(O') = A'$ and $CO(A') = O'$
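
The two mappings can be sketched directly over the Pnt3D context. This is a minimal illustration, assuming the context is stored as a dictionary from each object (field) to its attributes (methods); the function names `common_attributes`, `common_objects`, and `is_concept` are made up for the example.

```python
# Pnt3D context: object (field) -> attributes (methods) related to it.
CONTEXT = {
    "x": {"Pnt3D", "getX", "setX", "setXY", "setXYZ", "draw"},
    "y": {"Pnt3D", "getY", "setY", "setXY", "setXYZ", "draw"},
    "z": {"Pnt3D", "getZ", "setZ", "setXYZ", "draw"},
    "color": {"Pnt3D", "getColor", "setColor", "draw"},
}
ALL_OBJECTS = set(CONTEXT)
ALL_ATTRIBUTES = set().union(*CONTEXT.values())


def common_attributes(objs):
    """CA(O'): attributes related to every object in objs."""
    result = set(ALL_ATTRIBUTES)
    for o in objs:
        result &= CONTEXT[o]
    return result


def common_objects(attrs):
    """CO(A'): objects related to every attribute in attrs."""
    return {o for o in ALL_OBJECTS if set(attrs) <= CONTEXT[o]}


def is_concept(objs, attrs):
    """<O',A'> is a concept when CA(O') = A' and CO(A') = O'."""
    return (common_attributes(objs) == set(attrs)
            and common_objects(attrs) == set(objs))


print(sorted(common_attributes({"x", "y"})))
print(sorted(common_objects({"setXYZ"})))
```

Note that `{x, z}` with `{Pnt3D, setXYZ, draw}` is not a concept: the attributes are common to both fields, but they are also shared by `y`, so the object set is not closed.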

Formal Concept Analysis

A concept lattice is based upon a partial order between concepts:
$\langle O_1,A_1\rangle \le \langle O_2,A_2\rangle \iff O_1 \subseteq O_2 \iff A_1 \supseteq A_2$

Example: Concepts of the Pnt3D class

Concept  Objects          Attributes
C1       {}               {Pnt3D, getX, setX, getY, setY, setXY, getZ, setZ, setXYZ, getColor, setColor, draw}
C2       {color}          {Pnt3D, getColor, setColor, draw}
C3       {x}              {Pnt3D, getX, setX, setXY, setXYZ, draw}
C4       {y}              {Pnt3D, getY, setY, setXY, setXYZ, draw}
C5       {x,y}            {Pnt3D, setXY, setXYZ, draw}
C6       {z}              {Pnt3D, getZ, setZ, setXYZ, draw}
C7       {x,y,z}          {Pnt3D, setXYZ, draw}
C8       {x,y,color,z}    {Pnt3D, draw}

[Figure: full concept lattice of the Pnt3D class; nodes C1 to C8 carry the objects and attributes listed in the table.]
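
For a context as small as Pnt3D, the concepts can be enumerated by brute force: close every subset of objects and collect the distinct results. The sketch below does that and checks the order relation; it is a naive illustration (exponential in the number of fields), not the algorithm used by the presented tool, and `leq` assumes the standard FCA ordering by object-set inclusion.

```python
from itertools import combinations

# Pnt3D context: object (field) -> attributes (methods).
CONTEXT = {
    "x": {"Pnt3D", "getX", "setX", "setXY", "setXYZ", "draw"},
    "y": {"Pnt3D", "getY", "setY", "setXY", "setXYZ", "draw"},
    "z": {"Pnt3D", "getZ", "setZ", "setXYZ", "draw"},
    "color": {"Pnt3D", "getColor", "setColor", "draw"},
}
METHODS = frozenset().union(*CONTEXT.values())


def intent(fields):
    """Methods shared by all the given fields."""
    result = set(METHODS)
    for f in fields:
        result &= CONTEXT[f]
    return frozenset(result)


def extent(methods):
    """Fields that are used by all the given methods."""
    return frozenset(f for f in CONTEXT if methods <= CONTEXT[f])


def all_concepts():
    """Close every subset of fields; distinct closures are the concepts."""
    concepts = set()
    for k in range(len(CONTEXT) + 1):
        for subset in combinations(CONTEXT, k):
            a = intent(subset)
            concepts.add((extent(a), a))
    return concepts


def leq(c1, c2):
    """Assumed order: <O1,A1> <= <O2,A2> iff O1 is a subset of O2."""
    return c1[0] <= c2[0]


concepts = all_concepts()
for objs, attrs in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(objs), sorted(attrs))
```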

Concept Lattices
- A sparse concept lattice provides an alternate view of the tabular context and the full concept lattice
  - Each concept is a group of objects which have the same attributes
  - The attributes are the union of attributes in that concept and all the concepts that it dominates
- In our case, methods that use the same fields are clustered together
  - Reveals structure and asymmetries

[Figure: sparse concept lattice of the Pnt3D class. Node labels: color: getColor(), setColor(); x: getX(), setX(); y: getY(), setY(); z: getZ(), setZ(); setXY(); setXYZ(); Pnt3D(), draw()]
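
The sparse labeling can be sketched as follows: each method is attached only to the node whose fields are exactly the fields that method uses, and the full method set of a node is recovered by taking the union over every node whose field set contains its own. The helper names `sparse_labels` and `full_methods` are assumptions made for this example.

```python
# Pnt3D context: field -> methods that use it.
CONTEXT = {
    "x": {"Pnt3D", "getX", "setX", "setXY", "setXYZ", "draw"},
    "y": {"Pnt3D", "getY", "setY", "setXY", "setXYZ", "draw"},
    "z": {"Pnt3D", "getZ", "setZ", "setXYZ", "draw"},
    "color": {"Pnt3D", "getColor", "setColor", "draw"},
}


def sparse_labels(context):
    """Map each set of fields to the methods using exactly those fields."""
    methods = set().union(*context.values())
    labels = {}
    for m in methods:
        fields = frozenset(f for f, ms in context.items() if m in ms)
        labels.setdefault(fields, set()).add(m)
    return labels


def full_methods(fields, labels):
    """Union of the labels of all nodes whose field set contains `fields`."""
    out = set()
    for node_fields, node_methods in labels.items():
        if fields <= node_fields:
            out |= node_methods
    return out


labels = sparse_labels(CONTEXT)
for fields, methods in sorted(labels.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(fields), sorted(methods))
```

Only seven nodes receive a method label here; the concept with no fields carries no method of its own and gets its full method set entirely from the nodes it dominates.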

Interface Visualization
- The lattice partitions the methods in the interface into equivalence classes
  - Similar methods are heuristically clustered together.
  - An automatic "feature categorization"
- Lattice provides multidimensional connections
  - Compare with simple lexical lists of methods
- (Note: class is “flattened” to remove inheritance details)

Interface Visualization
- To be effective, multiple methods should appear in each concept, on average
- A lattice can have up to $n = 2^{\min(|M|,|F|)}$ concepts (M: methods, F: fields)
- In a data set of circa 6000 classes:
  - In 99.5%, n < |M| + |F|
  - In 77.4%, n < |M|

[Figure: Example: Concepts vs. methods in Eclipse.]
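
To make the bound concrete, the sketch below counts the concepts of the Pnt3D context and compares the count with the worst case. Counting is done by closing the rows under intersection, one common way to enumerate concept intents; `count_concepts` is a hypothetical helper name.

```python
# Pnt3D context: field -> methods that use it.
CONTEXT = {
    "x": {"Pnt3D", "getX", "setX", "setXY", "setXYZ", "draw"},
    "y": {"Pnt3D", "getY", "setY", "setXY", "setXYZ", "draw"},
    "z": {"Pnt3D", "getZ", "setZ", "setXYZ", "draw"},
    "color": {"Pnt3D", "getColor", "setColor", "draw"},
}


def count_concepts(context):
    """Count concepts by closing the rows (method sets) under intersection."""
    all_methods = frozenset().union(*context.values())
    intents = {all_methods}
    for row in context.values():
        # Intersect the new row with every intent found so far.
        intents |= {i & row for i in intents}
    return len(intents)


fields = len(CONTEXT)
methods = len(set().union(*CONTEXT.values()))
n = count_concepts(CONTEXT)
bound = 2 ** min(methods, fields)
print(n, bound, n < methods + fields, n < methods)
```

For Pnt3D the lattice stays well below the exponential bound, which is the situation the data-set figures above describe for most classes.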

Case Study
- The Molecule class from CDK
  - CDK: Chemistry Development Kit
    - Open source library of chemistry-related classes
    - Developed at the Max Planck Institute in Germany
    - Used in chemistry visualization applications
- Why the Molecule class?
  - Has a large interface (nearly 75 public members)
  - The represented entity is familiar to most people
- Our technique revealed new errors in this class.

Case Study
- Lattice structure hints at class structure
  - A lot of independent operations on the left.
    - Similar to a C struct.
  - Cohesive component on the right.

[Figure: concept lattice of the Molecule class, concepts C1 to C26. Node contents, one node per line:]
contains(Bond):boolean; getBond(Atom,Atom):Bond; getBonds():Bond[]; getBondCount(Atom):int; getBondOrderSum(Atom):int; getConnectedAtoms(Atom):Atom[]; getConnectedAtomsVector(Atom):Vector; getConnectedBonds(Atom):Bond[]; getDegree(Atom):int; getHighestCurrentBondOrder(Atom):double; getMinimumBondOrder(Atom):double; removeBond(Bond):Bond; removeBond(int):Bond; removeBond(Atom,Atom):boolean
atoms():Enumeration; shallowCopy():Object
addListener(ChemObjectListener):void; removeListener(ChemObjectListener):void
getRemark(Object):Object; setRemark(Object,Object):void
add(AtomContainer):void; addBonds(double):void; clone():Object; getIntersection(AtomContainer):AtomContainer; removeAllElements():void
Molecule(); Molecule(AtomContainer); Molecule(int,int); Molecule(Molecule)
flags:boolean[] (field)
pointers:Vector[] (field)
contains(Atom):boolean; get2DCenter():Point2D; get3DCenter():Point3D; getAtoms():Atom[]; getAtomNumber(Atom):int; getLastAtom():Atom; removeAtom(Atom):void; removeAtom(int):void
addAtom(Atom):void
getDegree(int):int
getConnectionMatrix():double[][]; remove(AtomContainer):void; removeAtomAndConnectedBonds(Atom):void; toString():String
addBond(int,int,int):void; addBond(int,int,int,int):void
addBonds(AtomContainer):void; addBond(Bond):void; removeAllBonds():void
setProperty(Object,Object):void; getProperty(Object):Object
setPhysicalProperty(Object,Object):void; getPhysicalProperty(Object):Object
addChemName(String):void; getChemName(int):String; getChemNames():Vector; getChemNamesCount():int; setChemNames(Vector):void
getBeilsteinRN():String; setBeilsteinRN(String):void
getCasRN():String; setCasRN(String):void
getAutonomName():String; setAutonomName(String):void
getFirstAtom():Atom; setAtomAt(int,Atom):void; getAtomAt(int):Atom; setAtoms(Atom[]):void
getAtomCount():int; setAtomCount(int):void
getBondAt(int):Bond; setBondAt(int,Bond):void; setBonds(Bond[]):void
getBondCount():int
title:String (field); getTitle():String; setTitle(String):void

Interface Visualization
- Multiple methods with similar signatures indicate possible repetition.
- Inconsistency in naming.
- Inconsistencies in return types.

[Figure: concept C17 of the Molecule lattice:]
contains(Bond):boolean; getBond(Atom,Atom):Bond; getBonds():Bond[]; getBondCount(Atom):int; getBondOrderSum(Atom):int; getConnectedAtoms(Atom):Atom[]; getConnectedAtomsVector(Atom):Vector; getConnectedBonds(Atom):Bond[]; getDegree(Atom):int; getHighestCurrentBondOrder(Atom):double; getMinimumBondOrder(Atom):double; removeBond(Bond):Bond; removeBond(int):Bond; removeBond(Atom,Atom):boolean

Because related methods are grouped in concepts, we can notice inconsistencies or repetitions.
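
One such check is easy to mechanize. The sketch below takes the signatures listed for concept C17 and reports overloaded method names whose overloads disagree on the return type. The parsing is a deliberately simple assumption (name before the first parenthesis, return type after the last colon), and `return_type_conflicts` is a hypothetical helper, not part of the presented tool.

```python
from collections import defaultdict

# Signatures as listed for concept C17 of the Molecule lattice.
C17 = [
    "contains(Bond):boolean",
    "getBond(Atom,Atom):Bond",
    "getBonds():Bond[]",
    "getBondCount(Atom):int",
    "getBondOrderSum(Atom):int",
    "getConnectedAtoms(Atom):Atom[]",
    "getConnectedAtomsVector(Atom):Vector",
    "getConnectedBonds(Atom):Bond[]",
    "getDegree(Atom):int",
    "getHighestCurrentBondOrder(Atom):double",
    "getMinimumBondOrder(Atom):double",
    "removeBond(Bond):Bond",
    "removeBond(int):Bond",
    "removeBond(Atom,Atom):boolean",
]


def return_type_conflicts(signatures):
    """Return {method name: return types} for overloads that disagree."""
    by_name = defaultdict(set)
    for sig in signatures:
        name = sig.split("(", 1)[0]      # text before the parameter list
        ret = sig.rsplit(":", 1)[1]      # text after the last colon
        by_name[name].add(ret)
    return {n: r for n, r in by_name.items() if len(r) > 1}


conflicts = return_type_conflicts(C17)
print(conflicts)
```

On this list the only name flagged is `removeBond`, whose overloads return `Bond` in two cases and `boolean` in the third.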

Investigate Implementation
- We examine fields and dependencies between concepts to understand the cohesive component
  - Collections of atoms and bonds
  - Micro-management of arrays (count field tracks available items)
  - Inconsistencies and broken invariants.

[Figure: cohesive component of the Molecule lattice, concepts C10 to C14, C16 to C23, and C26. Node contents, one node per line:]
addBonds(AtomContainer):void; addBond(Bond):void; removeAllBonds():void
contains(Atom):boolean; get2DCenter():Point2D; get3DCenter():Point3D; getAtoms():Atom[]; getAtomNumber(Atom):int; getLastAtom():Atom; removeAtom(Atom):void; removeAtom(int):void
contains(Bond):boolean; getBond(Atom,Atom):Bond; getBonds():Bond[]; getBondCount(Atom):int; getBondOrderSum(Atom):int; getConnectedAtoms(Atom):Atom[]; getConnectedAtomsVector(Atom):Vector; getConnectedBonds(Atom):Bond[]; getDegree(Atom):int; getHighestCurrentBondOrder(Atom):double; getMinimumBondOrder(Atom):double; removeBond(Bond):Bond; removeBond(int):Bond; removeBond(Atom,Atom):Bond
getDegree(int):int
addBond(int,int,int):void; addBond(int,int,int,int):void
getConnectionMatrix():double[][]; remove(AtomContainer):void; removeAtomAndConnectedBonds(Atom):void; toString():String
addAtom(Atom):void
clone():Object; addBonds(double):void; removeAllElements():void; add(AtomContainer):void; getIntersection(AtomContainer):AtomContainer
bondCount:int (field); getBondCount():int
growArraySize:int (field)
atoms:Atom[] (field); getFirstAtom():Atom; setAtomAt(int,Atom):void; getAtomAt(int):Atom; setAtoms(Atom[]):void
bonds:Bond[] (field); getBondAt(int):Bond; setBondAt(int,Bond):void; setBonds(Bond[]):void
atomCount:int (field); getAtomCount():int; setAtomCount(int):void

Investigate Implementation
- Asymmetries are revealed by examining pairs of related concepts.

[Figure: concepts C10, C11, C13, C14, C16, C17, C18, C20, and C22 of the Molecule lattice, arranged as methods for atoms versus methods for bonds, under the groups Arrays, Counts, Arrays and Counts, and Additions. Node contents, one node per line:]
addBonds(AtomContainer):void; addBond(Bond):void; removeAllBonds():void
contains(Atom):boolean; get2DCenter():Point2D; get3DCenter():Point3D; getAtoms():Atom[]; getAtomNumber(Atom):int; getLastAtom():Atom; removeAtom(Atom):void; removeAtom(int):void
contains(Bond):boolean; getBond(Atom,Atom):Bond; getBonds():Bond[]; getBondCount(Atom):int; getBondOrderSum(Atom):int; getConnectedAtoms(Atom):Atom[]; getConnectedAtomsVector(Atom):Vector; getConnectedBonds(Atom):Bond[]; getDegree(Atom):int; getHighestCurrentBondOrder(Atom):double; getMinimumBondOrder(Atom):double; removeBond(Bond):Bond; removeBond(int):Bond; removeBond(Atom,Atom):Bond
addBond(int,int,int):void; addBond(int,int,int,int):void
addAtom(Atom):void
bondCount:int (field); getBondCount():int
atoms:Atom[] (field); getFirstAtom():Atom; setAtomAt(int,Atom):void; getAtomAt(int):Atom; setAtoms(Atom[]):void
bonds:Bond[] (field); getBondAt(int):Bond; setBondAt(int,Bond):void; setBonds(Bond[]):void
atomCount:int (field); getAtomCount():int; setAtomCount(int):void

Embedded Call Graph
- A concept lattice clusters methods but does not portray interactions
- Call graphs show interaction between methods, but the layout does not depend on semantics
- The embedded call graph combines the two

[Figure: embedded call graph of the Pnt3D class, drawn over concepts C1 to C8 with fields x, y, z, and color and methods Pnt3D, draw, getX, setX, getY, setY, setXY, getZ, setZ, setXYZ, getColor, setColor.]

Code Inspection
- The lattice can help us select a reading order
  - Minimize focus shifts.
  - Similar methods are read consecutively.
- We define a global order between concepts.
  - e.g., each component separately, topological ordering, read by order of layers.
- We define a local order between methods in each concept.
  - e.g., topological ordering, read by order of simplicity, etc.
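
A reading order of this kind could be sketched as follows, using the sparse Pnt3D labels. The global order here is a topological order in which a concept is read only after every concept whose fields it strictly contains; the local order is plain alphabetical sorting, used purely as a stand-in for "order of simplicity". Both choices, and the name `reading_order`, are assumptions for illustration; `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Sparse labels of the Pnt3D lattice: fields of a concept -> its own methods.
LABELS = {
    frozenset({"x"}): ["getX", "setX"],
    frozenset({"y"}): ["getY", "setY"],
    frozenset({"z"}): ["getZ", "setZ"],
    frozenset({"color"}): ["getColor", "setColor"],
    frozenset({"x", "y"}): ["setXY"],
    frozenset({"x", "y", "z"}): ["setXYZ"],
    frozenset({"x", "y", "z", "color"}): ["Pnt3D", "draw"],
}


def reading_order(labels):
    """Return methods in an order that follows the lattice structure."""
    # Global order: predecessors of a concept are the concepts whose
    # field set is a proper subset of its own.
    graph = {c: {d for d in labels if d < c} for c in labels}
    order = []
    for concept in TopologicalSorter(graph).static_order():
        # Local order: alphabetical, a placeholder for "simplicity".
        order.extend(sorted(labels[concept]))
    return order


order = reading_order(LABELS)
print(order)
```

Several valid orders exist, so the exact output may vary; what is guaranteed is that methods of one concept are adjacent and that `getX` precedes `setXY`, which precedes `setXYZ`, which precedes `draw`.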

Tooling Support
- Batch-mode prototype
  - Produces lattices and metrics
  - Database support for metrics and statistics research
- Interactive Eclipse plug-in prototype
  - Adds an additional view for .java files
  - Uses a simplistic external static analyzer.
  - Limited by the current 2D capabilities of Eclipse.

Research Directions
- Conduct user studies to validate the methodology
  - Preliminary user studies provided good feedback
- Lattice-based metrics suite
- Application to class design in CASE tools
  - Interactive class diagram editor based on the concept lattice
  - Semantics assigned by connecting methods to fields. Compare with simply adding methods to a list as in current tools.

Research Directions
- Class-wide “diffing”
  - Provide a bird's-eye view of changed areas.

[Figure: annotated lattice with Concepts #1 to #19. Annotations shown in the figure:]
- packing
- "directed"
- Bottom concept. Single utility method.
- "idHash", "nodeList", "edges"
- Edge insertion and removal by node indices. Conversion to GML format moved to #18.
- "lastTopId" field. Node insertions.
- "pathList" field, many methods.
- "MaxViewedPaths" inspector & mutator
- "pathDist" inspector & mutator
- "lightPathRequest" inspector & mutator
- Path creation. Conversion to GML format.
- New top concept. Contains the "copy" method.
- Mutator for "directed"; edge object insertion.
- Empty concept
- Path manipulation.
- Old top concept. Node removal. Group manipulation. "copy" method moved to concept #19.

Example: Differences between the original version of the “Graph” class of VGJ (Visualizing Graphs with Java) and the Technion adaptation of that class.
Original elements appear in bold font; modifications appear in plain font.
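
The idea of class-wide diffing can be sketched by comparing the field usage of each method across two versions of a class: a method whose field set changes lands in a different concept. The two versions below are invented toy data, not the VGJ or Technion code, and `diff_classes` is a hypothetical helper name.

```python
def diff_classes(old_uses, new_uses):
    """Compare two method -> fields maps and classify each method."""
    report = {"added": set(), "removed": set(), "moved": set()}
    for method in old_uses.keys() | new_uses.keys():
        if method not in old_uses:
            report["added"].add(method)
        elif method not in new_uses:
            report["removed"].add(method)
        elif old_uses[method] != new_uses[method]:
            # Different fields used, so the method sits in another concept.
            report["moved"].add(method)
    return report


# Hypothetical before/after versions of a small class.
OLD = {"getX": {"x"}, "setXY": {"x", "y"}, "draw": {"x", "y", "color"}}
NEW = {"getX": {"x"}, "draw": {"x", "y"}, "norm": {"x", "y"}}

report = diff_classes(OLD, NEW)
print(report)
```

A visual diff such as the one in the figure would then highlight the concepts touched by the added, removed, and moved methods.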

Backup Material

Graph Class

Concept #1
  (PUB) getIndexFromNode(Node):int
Concept #5
  (PRV) Edges : HashTable
  (PUB) getEdges():Enumeration
  (PUB) getEdge(int,int):Edge
  (PRV) removeFalseEdges_():void
  (PUB) getEdgePathPoints_(int,int):DPoint3
  (PRV) fillBackEdges_():void
  (PUB) removeEdgePaths():void
Concept #4
  (PRV) directed_ : bool
  (PUB) isDirected():bool
Concept #7
  (PUB) setDirected(bool):void
  (PUB) insertEdge(Edge):void
Concept #6
  (PUB) pack():void
Concept #9
  (PRV) lastTopId_ : int
  (PUB) insertNodeAt(int):void
  (PRV) validateIds():void
  (PUB) insertNode():int
  (PUB) insertNode(bool):int
Concept #8
  (PUB) insertEdge(int,int):void
  (PUB) insertEdge(int,int,DPoint3[]):void
  (PUB) insertEdge(int,int,DPoint3[],String):void
  (PUB) removeEdge(int,int):void
  (PUB) removeEdge(Edge):void
  (PUB) setGMLvalues(GMLobject):void
Concept #10
  (PUB) copy(Graph):void
  (PUB) dummysToEdgePaths():void
  (PUB) killGroup(Node):void
  (PUB) removeGroups():void
  (PUB) removeNode(int):void
  (PUB) removeNode(Node):void
  (PUB) setNodeGroup(Node,Node):void
Concept #3
  (PRV) idHash_ : HashTable
  (PUB) getNodeFromId(int):Node
Concept #2
  (PRV) nodeList_ : NodeList
  (PRV) adjustGroupChildren_(...):void
  (PUB) children(int):Set
  (PUB) firstAvailable():int
  (PUB) firstNode():Node
  (PUB) firstNodeIndex():int
  (PRV) getGroupCoordinates_(...):int
  (PUB) getNodeFromIndex(int):Node
  (PUB) group(Node,boolean):void
  (PUB) highestIndex():int
  (PRV) markGroupChildren(...):void
  (PUB) nextNode(Node):Node
  (PUB) nextNodeIndex(int):int
  (PUB) nodeFromIndex(int):Node
  (PUB) numberOfNodes():int
  (PUB) parents(int):Set