Clone Detection by Exploiting Assembler
Ian Davis, Mike Godfrey
University of Waterloo
Ontario, Canada
IWSC May 2010 Clone Detection by Exploiting Assembler
The Original Assembler

    .LC107:
    .string "merge "
    …
    pushl $.LC107
    pushl command_buf+8
    .LCFI378:
    call prefixcmp
    addl $16,%esp
    testl %eax,%eax
    jne .L485
    subl $8,%esp
    pushl $32
    pushl command_buf+8
    call strchr
    addl $16,%esp
    incl %eax
    movl %eax,-16(%ebp)
    subl $12,%esp
    pushl $24
    call xmalloc
    addl $16,%esp
    movl %eax,-8(%ebp)
    subl $12,%esp
    pushl -16(%ebp)
    call lookup_branch
    …
    .L485:

• Identify function boundaries
• Relate assembler back to source
• Remove comments, white space, etc.
• Normalize instruction set if needed
• Convert to relative addressing
• Inline string constants
• Reconstruct parameter names
• Reconstruct local variable names
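
The first step in the list, identifying function boundaries, might be sketched as follows. This is a minimal illustration, not the authors' tool: it assumes GCC-style ELF assembler in which each function is announced by a `.type name, @function` directive and closed by a matching `.size name, ...` directive. The helper name `split_functions` and the sample listing are hypothetical.

```python
import re

# Assumed GCC/ELF conventions: ".type f, @function" opens a function,
# ".size f, .-f" closes it.
TYPE_RE = re.compile(r'^\s*\.type\s+(\w+)\s*,\s*@function')
SIZE_RE = re.compile(r'^\s*\.size\s+(\w+)\s*,')

def split_functions(asm_text):
    """Return {function_name: [lines]} for each function in the listing."""
    functions = {}
    current = None
    for raw in asm_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        opened = TYPE_RE.match(line)
        if opened:
            current = opened.group(1)
            functions[current] = []
            continue
        closed = SIZE_RE.match(line)
        if closed and closed.group(1) == current:
            current = None
            continue
        if current is not None:
            functions[current].append(line)
    return functions

# Hypothetical two-function listing, for illustration only.
SAMPLE = """
 .type f, @function
f:
 pushl %ebp
 ret
 .size f, .-f
 .type g, @function
g:
 ret
 .size g, .-g
"""

funcs = split_functions(SAMPLE)
```

Keeping each function in its own list makes it straightforward for later stages to avoid matching across function boundaries.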
The Annotated Assembler

    pushl $"merge "
    pushl command_buf+8
    call prefixcmp
    addl $16,%esp
    testl %eax,%eax
    jne +124
    subl $8,%esp
    pushl $32
    pushl command_buf+8
    call strchr
    addl $16,%esp
    incl %eax
    movl %eax,from(%ebp)
    subl $12,%esp
    pushl $24
    call xmalloc
    addl $16,%esp
    movl %eax,n(%ebp)
    subl $12,%esp
    pushl from(%ebp)
    call lookup_branch

• Identify function boundaries
• Relate assembler to source
• Remove comments, white space, etc.
• Normalize instruction set if needed
• Convert to relative addressing
• Inline string constants
• Reconstruct parameter names
• Reconstruct local variable names
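
Three of the steps above (inlining string constants, converting to relative addressing, and reconstructing local variable names) can be sketched on an abridged listing. This is an illustration under stated assumptions rather than the authors' implementation: branch offsets are counted here in instructions (the unit behind the `+124` on the slide is not stated), the mapping from frame slots to names is supplied by hand instead of being recovered from debug information, and the function name `normalize`, the shortened input, and its closing `ret` are hypothetical.

```python
import re

LABEL_RE = re.compile(r'^([.\w]+):$')
STRING_RE = re.compile(r'^\.string\s+(".*")$')

def normalize(lines, var_names):
    """Annotate one function's assembler.

    lines:     raw assembler lines
    var_names: assumed map from frame slot to name, e.g. '-16(%ebp)' -> 'from(%ebp)'
    """
    strings, labels, code = {}, {}, []
    last_label = None
    # Pass 1: record labels and string constants, keep only instructions.
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = LABEL_RE.match(line)
        if m:
            last_label = m.group(1)
            labels[last_label] = len(code)  # index of the next instruction
            continue
        m = STRING_RE.match(line)
        if m:
            if last_label is not None:
                strings[last_label] = m.group(1)
            continue
        if line.startswith('.'):
            continue  # drop any other directive
        code.append(line)
    # Pass 2: rewrite operands.
    out = []
    for i, line in enumerate(code):
        mnemonic, _, operands = line.partition(' ')
        operands = operands.strip()
        if mnemonic.startswith('j') and operands in labels:
            operands = '%+d' % (labels[operands] - i)  # relative branch
        elif operands.startswith('$') and operands[1:] in strings:
            operands = '$' + strings[operands[1:]]     # inline the string
        for slot, name in var_names.items():
            operands = operands.replace(slot, name)    # name the frame slot
        out.append((mnemonic + ' ' + operands).strip())
    return out

# Abridged, hypothetical excerpt modeled on the slide's listing.
ORIGINAL = [
    '.LC107:', '.string "merge "',
    'pushl $.LC107', 'pushl command_buf+8', 'call prefixcmp',
    'addl $16,%esp', 'testl %eax,%eax', 'jne .L485',
    'incl %eax', 'movl %eax,-16(%ebp)', 'pushl -16(%ebp)',
    'call lookup_branch', '.L485:', 'ret',
]

annotated = normalize(ORIGINAL, {'-16(%ebp)': 'from(%ebp)'})
```

After this pass, two copies of the same code that sit at different addresses or refer to different string labels produce identical lines, which is what lets a simple equality test drive the matching.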
The Matching Algorithm
• Scan entire source once
• Use hashing to find first pairing
• Ignore pairings in identified clones
• Don’t cross function boundaries
• Terminate the earlier clone before the later one begins when both lie in the same function
• Weight matches (+) and mismatches (-)
• Special logic for matching branches
• Advance greedily while weight ≥ 0
• Then employ hill climbing
• Continue while improvement possible
• Accept if clones satisfy minimum length
• Alternative minimum for matching functions
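
A simplified sketch of the seeding and greedy-extension steps follows. It is not the authors' algorithm: the weights (+1 for a match, -1 for a mismatch), the minimum length of 4, and the names `extend` and `find_clones` are assumptions chosen for illustration, and hill climbing, the special branch logic, and the alternative minimum for whole functions are omitted. Each function is passed as its own list of normalized instructions, so a clone can never cross a function boundary.

```python
from collections import defaultdict

MATCH, MISMATCH = 1, -1  # assumed weights
MIN_LENGTH = 4           # assumed minimum clone length, in instructions

def extend(a, b, i, j):
    """Advance in lock step from a[i], b[j] while the running weight
    stays >= 0; return the length of the best-scoring prefix."""
    weight = best = best_len = 0
    k = 0
    while i + k < len(a) and j + k < len(b):
        weight += MATCH if a[i + k] == b[j + k] else MISMATCH
        if weight < 0:
            break
        k += 1
        if weight >= best:  # only true after a match, so clones end on a match
            best, best_len = weight, k
    return best_len

def find_clones(func_a, func_b):
    """Return (start_a, start_b, length) triples for clones between two functions."""
    # Hash every instruction of func_b so first pairings are found by lookup.
    index = defaultdict(list)
    for j, ins in enumerate(func_b):
        index[ins].append(j)
    clones, covered = [], set()
    for i, ins in enumerate(func_a):
        for j in index.get(ins, ()):
            if (i, j) in covered:
                continue  # pairing already lies inside an identified clone
            n = extend(func_a, func_b, i, j)
            if n >= MIN_LENGTH:
                clones.append((i, j, n))
                covered.update((i + k, j + k) for k in range(n))
    return clones

# Hypothetical normalized instruction sequences for two functions.
A = ['pushl $32', 'call strchr', 'addl $16,%esp', 'incl %eax',
     'movl %eax,from(%ebp)', 'call lookup_branch']
B = ['subl $8,%esp', 'pushl $32', 'call strchr', 'addl $16,%esp',
     'incl %eax', 'movl %eax,p(%ebp)', 'call lookup_branch', 'ret']

result = find_clones(A, B)
```

In this example the single differing `movl` lowers the running weight without ending the clone, so the match continues through `call lookup_branch`.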
Source Clone 1

    from = strchr(command_buf.buf, ' ') + 1;
    n = xmalloc(sizeof(*n));
    s = lookup_branch(from);
    if (s)
        hashcpy(n->sha1, s->sha1);
    else if (*from == ':') {
        uintmax_t idnum = strtoumax(from + 1, NULL, 10);
        struct object_entry *oe = find_mark(idnum);
        if (oe->type != OBJ_COMMIT)
            die("Mark :%" PRIuMAX " not a commit", idnum);
        hashcpy(n->sha1, oe->sha1);
    } else if (!get_sha1(from, n->sha1)) {
        unsigned long size;
        char *buf = read_object_with_reference(n->sha1,
            commit_type, &size, n->sha1);
        if (!buf || size < 46)
            die("Not a valid commit: %s", from);
        free(buf);
    } else
        die("Invalid ref name or SHA1 expression: %s", from);

Source Clone 2

    from = strchr(command_buf.buf, ' ') + 1;
    s = lookup_branch(from);
    if (s)
        hashcpy(sha1, s->sha1);
    else if (*from == ':') {
        struct object_entry *oe;
        from_mark = strtoumax(from + 1, NULL, 10);
        oe = find_mark(from_mark);
        if (oe->type != OBJ_COMMIT)
            die("Mark :%" PRIuMAX " not a commit", from_mark);
        hashcpy(sha1, oe->sha1);
    } else if (!get_sha1(from, sha1)) {
        unsigned long size;
        char *buf;
        buf = read_object_with_reference(sha1,
            commit_type, &size, sha1);
        if (!buf || size < 46)
            die("Not a valid commit: %s", from);
        free(buf);
    } else
        die("Invalid ref name or SHA1 expression: %s", from);

Benefits and Conclusions
• Assembler is easy to derive from source, object, or executable
• Complements other clone detection approaches
• Compiler performs useful normalization of source for free
• The analysis is semantic, not syntactic
• Works by function (forbidding overlapping clone pairs)
• Can handle branching sensibly
• Case statements are easier to handle
• Can weight different assembler instructions differently
• Can reason about the assembler when performing detection
Thank You