unsupervised machine learning for clone detection
DESCRIPTION
"Unsupervised Machine Learning for clone detection" highlights the main topics of using Unsupervised Machine Learning techniques (Kernel methods and data clustering) for the code clones detection task.TRANSCRIPT
![Page 2: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/2.jpg)
General Disclaimer: All the Maths appearing in the next slides is only intended to better introduce the considered case studies. Speakers are not
responsible for any possible disease or “brain consumption” caused by too much formulas.
So BEWARE; use this information at your own risk!
It's intention is solely educational. We would strongly encourage you to use this information in cooperation with a medical or
health professional.
Aw
ful M
aths
![Page 3: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/3.jpg)
Number one in the stink parade is duplicated code.
If you see the same code structure in more than one
place, you can be sure that your program will be better
if you find a way to unify them.
![Page 4: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/4.jpg)
ImageMapOutputFormat.java SVGOutputFormat.java
JHOTDRAW
![Page 5: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/5.jpg)
CPY
THO
N 2.
5.1
![Page 6: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/6.jpg)
PYTHON (NLTK)
![Page 7: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/7.jpg)
PROBLEM
S T A T EM E N T
CLONE DETECTIONSoftware clones are fragments of code that are similar according
to some predefined measure of similarity
I.D. Baxter, 1998
![Page 8: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/8.jpg)
PROBLEM
S T A T EM E N T
CLONE DETECTION
![Page 9: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/9.jpg)
PROBLEM
S T A T EM E N T
CLONE DETECTION
Clones Textual Similarity
![Page 10: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/10.jpg)
PROBLEM
S T A T EM E N T
CLONE DETECTION
Clones Functional Similarity
![Page 11: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/11.jpg)
PROBLEM
S T A T EM E N T
CLONE DETECTION
Clones affect the reliability of the system!
Sneaky Bug!
![Page 12: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/12.jpg)
DIFFERENT TYPES OF CLONES
![Page 13: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/13.jpg)
THE ORIGINAL ONE
# Original Fragmentdef do_something_cool_in_Python(filepath, marker='---end---'):! lines = list()! with open(filepath) as report:! ! for l in report:! ! ! if l.endswith(marker):! ! ! ! lines.append(l) # Stores only lines that ends with "marker"! return lines #Return the list of different lines
![Page 14: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/14.jpg)
TYPE 1: Exact Copy• Identical code segments except for differences in
layout, whitespace, and comments
![Page 15: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/15.jpg)
def do_something_cool_in_Python (filepath, marker='---end---'):! lines = list() # This list is initially empty
! with open(filepath) as report:! ! for l in report: # It goes through the lines of the file! ! ! if l.endswith(marker):! ! ! ! lines.append(l)! return lines
TYPE 1: Exact Copy• Identical code segments except for differences in
layout, whitespace, and comments
# Original Fragmentdef do_something_cool_in_Python(filepath, marker='---end---'):! lines = list()! with open(filepath) as report:! ! for l in report:! ! ! if l.endswith(marker):! ! ! ! lines.append(l) # Stores only lines that ends with "marker"! return lines #Return the list of different lines
![Page 16: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/16.jpg)
TYPE 2: Parameter Substituted• Structurally identical segments except for differences in identifiers, literals,
layout, whitespace, and comments
![Page 17: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/17.jpg)
# Type 2 Clonedef do_something_cool_in_Python(path, end='---end---'):! targets = list()! with open(path) as data_file:! ! for t in datae:! ! ! if l.endswith(end):! ! ! ! targets.append(t) # Stores only lines that ends with "marker"! #Return the list of different lines! return targets
# Original Fragmentdef do_something_cool_in_Python(filepath, marker='---end---'):! lines = list()! with open(filepath) as report:! ! for l in report:! ! ! if l.endswith(marker):! ! ! ! lines.append(l) # Stores only lines that ends with "marker"! return lines #Return the list of different lines
TYPE 2: Parameter Substituted• Structurally identical segments except for differences in identifiers, literals,
layout, whitespace, and comments
![Page 18: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/18.jpg)
TYPE 3: Structure Substituted• Similar segments with further modifications such as changed, added (or deleted)
statements, in additions to variations in identifiers, literals, layout and comments
![Page 19: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/19.jpg)
import osdef do_something_with(path, marker='---end---'):! # Check if the input path corresponds to a file! if not os.path.isfile(path):! ! return None! bad_ones = list()! good_ones = list()! with open(path) as report:! ! for line in report:! ! ! line = line.strip()! ! ! if line.endswith(marker):! ! ! ! good_ones.append(line)! ! ! else:! ! ! ! bad_ones.append(line)! #Return the lists of different lines! return good_ones, bad_ones
TYPE 3: Structure Substituted• Similar segments with further modifications such as changed, added (or deleted)
statements, in additions to variations in identifiers, literals, layout and comments
![Page 20: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/20.jpg)
import osdef do_something_with(path, marker='---end---'):! # Check if the input path corresponds to a file! if not os.path.isfile(path):! ! return None! bad_ones = list()! good_ones = list()! with open(path) as report:! ! for line in report:! ! ! line = line.strip()! ! ! if line.endswith(marker):! ! ! ! good_ones.append(line)! ! ! else:! ! ! ! bad_ones.append(line)! #Return the lists of different lines! return good_ones, bad_ones
TYPE 3: Structure Substituted• Similar segments with further modifications such as changed, added (or deleted)
statements, in additions to variations in identifiers, literals, layout and comments
![Page 21: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/21.jpg)
import osdef do_something_with(path, marker='---end---'):! # Check if the input path corresponds to a file! if not os.path.isfile(path):! ! return None! bad_ones = list()! good_ones = list()! with open(path) as report:! ! for line in report:! ! ! line = line.strip()! ! ! if line.endswith(marker):! ! ! ! good_ones.append(line)! ! ! else:! ! ! ! bad_ones.append(line)! #Return the lists of different lines! return good_ones, bad_ones
TYPE 3: Structure Substituted• Similar segments with further modifications such as changed, added (or deleted)
statements, in additions to variations in identifiers, literals, layout and comments
![Page 22: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/22.jpg)
import osdef do_something_with(path, marker='---end---'):! # Check if the input path corresponds to a file! if not os.path.isfile(path):! ! return None! bad_ones = list()! good_ones = list()! with open(path) as report:! ! for line in report:! ! ! line = line.strip()! ! ! if line.endswith(marker):! ! ! ! good_ones.append(line)! ! ! else:! ! ! ! bad_ones.append(line)! #Return the lists of different lines! return good_ones, bad_ones
TYPE 3: Structure Substituted• Similar segments with further modifications such as changed, added (or deleted)
statements, in additions to variations in identifiers, literals, layout and comments
![Page 23: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/23.jpg)
TYPE 4: “Functional” Copies• Semantically equivalent segments that perform the same
computation but are implemented by different syntactic variants
![Page 24: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/24.jpg)
# Original Fragmentdef do_something_cool_in_Python(filepath, marker='---end---'):! lines = list()! with open(filepath) as report:! ! for l in report:! ! ! if l.endswith(marker):! ! ! ! lines.append(l) # Stores only lines that ends with "marker"! return lines #Return the list of different lines
def do_always_the_same_stuff(filepath, marker='---end---'):! report = open(filepath)! file_lines = report.readlines()! report.close()! #Filters only the lines ending with marker! return filter(lambda l: len(l) and l.endswith(marker), file_lines)
TYPE 4: “Functional” Copies• Semantically equivalent segments that perform the same
computation but are implemented by different syntactic variants
![Page 25: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/25.jpg)
![Page 26: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/26.jpg)
HTTP
D 2.
2.14
: TYP
E 1
![Page 27: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/27.jpg)
HTTP
D 2.
2.14
: TYP
E 2
![Page 28: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/28.jpg)
HTTP
D 2.
2.14
: TYP
E 3
![Page 29: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/29.jpg)
SOUR
CE
CO
DE IN
FOR
MAT
ION
![Page 30: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/30.jpg)
SOUR
CE
CO
DE IN
FOR
MAT
ION
![Page 31: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/31.jpg)
SOUR
CE
CO
DE IN
FOR
MAT
ION FUNCTION
parser_compare PARAMS
PARAMPARAM
node *left node *right
IF-STMT IF-STMT RETURN-STMT
BODY
CALL-STMT
parser_compare_node
PARAMS
STRUCT-OP
right st_nodeleft st_node
BODY BODYCOND COND
OR
====
left right0 0
==
rightleft
RETURN-STMTRETURN-STMT
00
![Page 32: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/32.jpg)
SOUR
CE
CO
DE IN
FOR
MAT
ION
ENTRY EXIT
FORMAL-IN
ACTUAL-IN
ACTUAL-IN
FORMAL-IN
BODY
CONTROL-POINTEXPR
CONTROL-POINT CONTROL-POINT CALL-SITE
RETURN
ACTUAL-OUT
RETURN
EXPR
EXPR
FORMAL-OUT
![Page 33: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/33.jpg)
Duplix
Scorpio
PMD
CCFinder
Dup CPD
Duplix
Shinobi
Clone Detective
Gemini
iClones
KClone
ConQAT
Deckard
Clone Digger
JCCD
CloneDr SimScan
CLICS
NiCAD
Simian
Duploc
Dude
SDD
STAT
E O
F TH
E A
RT
TOO
LS
![Page 34: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/34.jpg)
Duplix
Scorpio
PMD
CCFinder
Dup CPD
Duplix
Shinobi
Clone Detective
Gemini
iClones
KClone
ConQAT
Deckard
Clone Digger
JCCD
CloneDr SimScan
CLICS
NiCAD
Simian
Duploc
Dude
SDD
Text Based Tools:
Text is compared line by line
STAT
E O
F TH
E A
RT
TOO
LS
![Page 35: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/35.jpg)
Duplix
Scorpio
PMD
CCFinder
Dup CPD
Duplix
Shinobi
Clone Detective
Gemini
iClones
KClone
ConQAT
Deckard
Clone Digger
JCCD
CloneDr SimScan
CLICS
NiCAD
Simian
Duploc
Dude
SDD
Token Based Tools:Token sequences are
compared to sequences
STAT
E O
F TH
E A
RT
TOO
LS
![Page 36: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/36.jpg)
Duplix
Scorpio
PMD
CCFinder
Dup CPD
Duplix
Shinobi
Clone Detective
Gemini
iClones
KClone
ConQAT
Deckard
Clone Digger
JCCD
CloneDr SimScan
CLICS
NiCAD
Simian
Duploc
Dude
SDDSyntax Based Tools:
Syntax subtrees are compared
to each other
STAT
E O
F TH
E A
RT
TOO
LS
![Page 37: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/37.jpg)
Duplix
Scorpio
PMD
CCFinder
Dup CPD
Duplix
Shinobi
Clone Detective
Gemini
iClones
KClone
ConQAT
Deckard
Clone Digger
JCCD
CloneDr SimScan
CLICS
NiCAD
Simian
Duploc
Dude
SDD
Graph Based Tools:
(sub) graphs are compared to
each other
STAT
E O
F TH
E A
RT
TOO
LS
![Page 38: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/38.jpg)
• String/Token based Techniques:
• Pros: Run very fast
• Cons: Too many false clones
STAT
E O
F TH
E A
RT
TEC
HNIQ
UES
![Page 39: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/39.jpg)
• String/Token based Techniques:
• Pros: Run very fast
• Cons: Too many false clones
• Syntax based (AST) Techniques:
• Pros: Well suited to detect structural similarities
• Cons: Not Properly suited to detect Type 3 Clones
STAT
E O
F TH
E A
RT
TEC
HNIQ
UES
![Page 40: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/40.jpg)
• String/Token based Techniques:
• Pros: Run very fast
• Cons: Too many false clones
• Syntax based (AST) Techniques:
• Pros: Well suited to detect structural similarities
• Cons: Not Properly suited to detect Type 3 Clones
• Graph based Techniques:
• Pros: The only one able to deal with Type 4 Clones
• Cons: Performance Issues
STAT
E O
F TH
E A
RT
TEC
HNIQ
UES
![Page 41: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/41.jpg)
USE MACHINE LEARNING
L U K E
![Page 42: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/42.jpg)
USE MACHINE LEARNING
L U K E
• Provides computational effective solutions to analyze large data sets
![Page 43: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/43.jpg)
USE MACHINE LEARNING
L U K E
• Provides computational effective solutions to analyze large data sets
• Provides solutions that can be tailored to different tasks/domains
![Page 44: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/44.jpg)
USE MACHINE LEARNING
L U K E
• Provides computational effective solutions to analyze large data sets
• Provides solutions that can be tailored to different tasks/domains
• Requires many efforts in:
![Page 45: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/45.jpg)
USE MACHINE LEARNING
L U K E
• Provides computational effective solutions to analyze large data sets
• Provides solutions that can be tailored to different tasks/domains
• Requires many efforts in:
• the definition of the relevant information best suited for the specific task/domain
![Page 46: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/46.jpg)
USE MACHINE LEARNING
L U K E
• Provides computational effective solutions to analyze large data sets
• Provides solutions that can be tailored to different tasks/domains
• Requires many efforts in:
• the definition of the relevant information best suited for the specific task/domain
• the application of the learning algorithms to the considered data
![Page 47: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/47.jpg)
UN
SUPE
RVIS
ED L
EARN
ING
• Supervised Learning: • Learn from labelled samples
• Unsupervised Learning:• Learn (directly) from the data
Learn by examples
![Page 48: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/48.jpg)
UN
SUPE
RVIS
ED L
EARN
ING
• Supervised Learning: • Learn from labelled samples
• Unsupervised Learning:• Learn (directly) from the data
Learn by examples
(+) No cost of labeling samples
(-) Trade-off imposed on the quality of the data
![Page 49: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/49.jpg)
CODESTRUCTURES
KER
NEL
S FO
R ST
RUC
TURE
SComputation of the dot product between (Graph) Structures
K( ),
![Page 50: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/50.jpg)
CODESTRUCTURES
KER
NEL
S FO
R ST
RUC
TURE
S
Abstract Syntax Tree (AST) Tree structure representing the syntactic structure of the different instructions of a program (function)
Program Dependencies Graph (PDG)
(Directed) Graph structure representing the relationship among the different statement of a program
Computation of the dot product between (Graph) Structures
K( ),
![Page 51: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/51.jpg)
CODEKE
RNEL
FO
R C
LONE
S
![Page 52: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/52.jpg)
<
x y = =
x +
x 1
y -
y 1
while
block
while
block
block
if
>
b a = =
a +
a 1
b -
b 1
>
b 0 =
c 3
CODE ASTKE
RNEL
FO
R C
LONE
S
![Page 53: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/53.jpg)
<
x y = =
x +
x 1
y -
y 1
while
block
while
block
block
if
>
b a = =
a +
a 1
b -
b 1
>
b 0 =
c 3
CODE AST AST KERNELKE
RNEL
FO
R C
LONE
S
<
block
while
= =
block
=
y -
=
x +
+
x 1
-
y 1
<
x y
>
b 0 =
c 3
if
block
>
b a
-
b 1
<block
while
+
a 1
=
b -
=
a +
![Page 54: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/54.jpg)
while
block<
x y
KERNELSFOR CODE
STRUCTURES:AST
KERN
EL F
EATU
RES
![Page 55: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/55.jpg)
while
block<
x y
KERNELSFOR CODE
STRUCTURES:AST
KERN
EL F
EATU
RES Instruction Class (IC)
i.e., LOOP, CALL,CONDITIONAL_STATEMENT
![Page 56: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/56.jpg)
while
block<
x y
KERNELSFOR CODE
STRUCTURES:AST
KERN
EL F
EATU
RES Instruction Class (IC)
i.e., LOOP, CALL,CONDITIONAL_STATEMENT
Instruction (I)i.e., FOR, IF, WHILE, RETURN
![Page 57: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/57.jpg)
while
block<
x y
KERNELSFOR CODE
STRUCTURES:AST
KERN
EL F
EATU
RES Instruction Class (IC)
i.e., LOOP, CALL,CONDITIONAL_STATEMENT
Instruction (I)i.e., FOR, IF, WHILE, RETURN
Context (C)i.e., Instruction Class of the closer statement node
![Page 58: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/58.jpg)
while
block<
x y
KERNELSFOR CODE
STRUCTURES:AST
KERN
EL F
EATU
RES Instruction Class (IC)
i.e., LOOP, CALL,CONDITIONAL_STATEMENT
Instruction (I)i.e., FOR, IF, WHILE, RETURN
Context (C)i.e., Instruction Class of the closer statement node
Lexemes (Ls)Lexical information gathered (recursively) from leaves
![Page 59: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/59.jpg)
while
block<
x y
KERNELSFOR CODE
STRUCTURES:AST
KERN
EL F
EATU
RES
IC = Conditional-ExprI = Less-operatorC = LoopLs= [x,y]
IC = LoopI = while-loopC = Function-BodyLs= [x, y]
Instruction Class (IC)i.e., LOOP, CALL,CONDITIONAL_STATEMENT
Instruction (I)i.e., FOR, IF, WHILE, RETURN
Context (C)i.e., Instruction Class of the closer statement node
Lexemes (Ls)Lexical information gathered (recursively) from leaves
IC = BlockI = while-bodyC = LoopLs= [ x ]
![Page 60: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/60.jpg)
CLONE DETECTION• Comparison with another (pure) AST-based clone detector
• Comparison on a system with randomly seeded clones
0
0.25
0.5
0.75
1
Precision Recall F-measure
CloneDigger Tree Kernel Tool
RESULTS
Results refer to clones where code
fragments have been modified by adding/
removing or changing code statements
![Page 61: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/61.jpg)
0
0.25
0.50
0.75
1.00
0.6 0.62 0.64 0.66 0.68 0.7 0.72 0.74 0.76 0.78 0.8 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98
Precision, Recall and F-Measure
Precision Recall F1
Precision: How accurate are the obtained results? (Altern.) How many errors do they contain?
Recall: How complete are the obtained results? (Altern.) How many clones have been retrieved w.r.t. Total Clones?
![Page 62: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/62.jpg)
CODESTRUCTURES
PDG NODES AND EDGES
while call-site
argexpr
![Page 63: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/63.jpg)
CODESTRUCTURES
PDG
• Two Types of Nodes
• Control Nodes (Dashed ones)
• e.g., if - for - while - function calls...
• Data Nodes
• e.g., expressions - parameters...
NODES AND EDGES
while call-site
argexpr
![Page 64: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/64.jpg)
CODESTRUCTURES
PDG
• Two Types of Nodes
• Control Nodes (Dashed ones)
• e.g., if - for - while - function calls...
• Data Nodes
• e.g., expressions - parameters...
• Two Types of Edges (i.e., dependencies)
• Control edges (Dashed ones)
• Data edges
NODES AND EDGES
while call-site
argexpr
![Page 65: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/65.jpg)
• Features of nodes:
• Node Label
• i.e., , WHILE, CALL-SITE, EXPR, ...
• Node Type
• i.e., Data Node or Control Node
• Features of edges:
• Edge Type
• i.e., Data Edge or Control Edge
KERNELSFOR CODE
STRUCTURES:PDG
GRAPH KERNELS FOR PDG
while
call-site
arg
expr expr
![Page 66: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/66.jpg)
• Features of nodes:
• Node Label
• i.e., , WHILE, CALL-SITE, EXPR, ...
• Node Type
• i.e., Data Node or Control Node
• Features of edges:
• Edge Type
• i.e., Data Edge or Control Edge
KERNELSFOR CODE
STRUCTURES:PDG
Node Label = WHILENode Type = Control Node
GRAPH KERNELS FOR PDG
while
call-site
arg
expr expr
Control Edge
Data Edge
![Page 67: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/67.jpg)
while
call-site
arg
expr expr
while
call-site
arg
expr call-site
GRAPH KERNELS FOR PDG• Goal: Identify common subgraphs
• Selectors: Compare nodes to each others and explore the subgraphs of only “compatible”
nodes (i.e., Nodes of the same type)
• Context: The subgraph of a node (with paths whose lengths are at most L to avoid loops)
![Page 68: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/68.jpg)
while
call-site
arg
expr expr
while
call-site
arg
expr call-site
GRAPH KERNELS FOR PDG• Goal: Identify common subgraphs
• Selectors: Compare nodes to each others and explore the subgraphs of only “compatible”
nodes (i.e., Nodes of the same type)
• Context: The subgraph of a node (with paths whose lengths are at most L to avoid loops)
![Page 69: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/69.jpg)
while
call-site
arg
expr expr
while
call-site
arg
expr call-site
GRAPH KERNELS FOR PDG• Goal: Identify common subgraphs
• Selectors: Compare nodes to each others and explore the subgraphs of only “compatible”
nodes (i.e., Nodes of the same type)
• Context: The subgraph of a node (with paths whose lengths are at most L to avoid loops)
![Page 70: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/70.jpg)
while
call-site
arg
expr expr
while
call-site
arg
expr call-site
GRAPH KERNELS FOR PDG• Goal: Identify common subgraphs
• Selectors: Compare nodes to each others and explore the subgraphs of only “compatible”
nodes (i.e., Nodes of the same type)
• Context: The subgraph of a node (with paths whose lengths are at most L to avoid loops)
![Page 71: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/71.jpg)
SCEN
ARIO
-BAS
ED
EVAL
UATIO
N
![Page 72: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/72.jpg)
![Page 73: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/73.jpg)
FUTURE RESEARCHDIRECTIONS
![Page 74: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/74.jpg)
PROBLEM
S T A T EM E N T
(MODEL) CLONE DETECTION
Models: models are typically represented visually, as box-and-arrow diagrams,
and the clones we are searching for are similar subgraphs of these diagrams.
Model Granularity: models could be represented at different levels of granularity
(such as the source code) corresponding to different syntactic (and semantic)
units.
Models Clones are categorized in (three) different Types
![Page 75: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/75.jpg)
REFE
RENC
E EX
AMPL
E
![Page 76: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/76.jpg)
TYPE 1C L O N E S
(MODEL) CLONE DETECTION
• Type 1 (exact) model clones: Identical model fragments except for
variations in visual presentation, layout and formatting.
![Page 77: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/77.jpg)
TYPE 2C L O N E S
(MODEL) CLONE DETECTION
Type 2 (renamed) model clones: Structurally identical model fragments except
for variations in labels, values, types, visual presentation, layout and formatting.
model@Friction Mode Logic/Break Apart Detection
model@Friction Mode Logic/Lockup Detection/Required Friction for Lockup
![Page 78: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/78.jpg)
TYPE 3C L O N E S
(MODEL) CLONE DETECTION
Type 3 (near-miss) model clones: Model fragments with further modifications,
such as changes in position or connection with respect to other model fragments and small additions or removals of blocks or lines in addition to variations in labels,
values, types, visual presentation, layout and formatting.
[email protected]_estimation
[email protected]_estimation
![Page 79: Unsupervised Machine Learning for clone detection](https://reader034.vdocuments.net/reader034/viewer/2022052522/54b74b254a7959e7498b4577/html5/thumbnails/79.jpg)
MO
DELS
AS
SOU
RC
E C
OD
E