An Algorithm for Keyword Search on an Execution Path Toshihiro Kamiya Future University Hakodate [email protected]

Upload: kamiya-toshihiro

Post on 15-Jan-2015


DESCRIPTION

Toshihiro Kamiya, "An Algorithm for Keyword Search on an Execution Path", In Proc. CSMR-WCRE 2014, pp. 328-332, 2014-02-06.

TRANSCRIPT

Page 1: An Algorithm for Keyword Search on an Execution Path

An Algorithm for Keyword Search on an Execution Path

Toshihiro Kamiya
Future University Hakodate
[email protected]

Page 2: An Algorithm for Keyword Search on an Execution Path

CSMR-WCRE-2014 Era Track 2

Background #1: Code searching

Developers do search!
➤ To find reusable components for a function of a product
➤ To find similar code fragments before modifying code
➤ To find code samples showing the usage of a given class or component

Page 3: An Algorithm for Keyword Search on an Execution Path


Background #2: Emerging fine-grained module technologies

More and more fine-grained modules are used.
● Object/Closure
  – extracts data and its manipulation
● Aspect
  – extracts concerns, i.e., a set of code invoked by a specific condition or event
● Dependency Injection
  – splits code at each dependency

Page 4: An Algorithm for Keyword Search on an Execution Path


Problem: Searching on fine-grained modules

Fine-grained modules make code search difficult.
● (Old days) A search result was contained in a single file
● (Now) A search result is a set of several parts of several files

This affects code-search methods in both
● Algorithm
  – “how to find”
● Displaying/Visualizing
  – “how to show search results”

[Figure: a search result in the “Old days” vs. “Now”]

Page 5: An Algorithm for Keyword Search on an Execution Path


Solution: Keyword Search on an Execution Path

● Static analysis
● Find the execution paths that include given keywords
  ● From all possible execution paths of a target program
● Idea: a compact data structure (And/Or/Call graph) of execution paths + a search algorithm on it
● A prototype implementation
  ● applied to up to 183k lines of Java source code

Related work
● Prospector [8]
● PARSEWeb [9]

Page 6: An Algorithm for Keyword Search on an Execution Path


And/Or/Call Graph

● A DAG that contains all execution paths in a compact form
● Generated by the following translation rules
  – Sequence structure ➡ And node
  – Selection structure ➡ Or node
  – Repetitive structure ➡ Selection among sequences of 0-time repetition, 1-time repetition, 2-times repetition, ... ➡ Or node having And nodes as children
  – Method call ➡ Call node
    ● Dynamic dispatching

Examples (Source code / Graphical form / Textual expression)

Sequence
  Source code: s1; s2; s3;
  Graphical form: [diagram: nodes s1, s2, s3]
  Textual expression: (s1 ∧ s2 ∧ s3) or (s1 · s2 · s3)

Selection
  Source code: if (...) { st; } else { se; }
  Graphical form: [diagram: nodes st, se]
  Textual expression: (st ∨ se)

Method call with dynamic dispatching
  Source code:
    interface I { m(); }
    class B implements I { m() {...} }
    class C implements I { m() {...} }
    I i; ... i.m();
  Graphical form: [diagram: nodes B//m, C//m]
  Textual expression: (B//m { } ∨ C//m { })
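The translation rules above can be sketched in a few lines of Python. This is a minimal illustration and not the prototype's code: the class names `And`, `Or`, `Call`, the helpers `repeat` and `paths`, and the bound `max_times` are assumptions introduced here, and plain statements are modeled as `Call` leaves with an empty body for brevity.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class And:
    """Sequence structure: every child executes, in order."""
    children: tuple = ()


@dataclass(frozen=True)
class Or:
    """Selection structure: exactly one child executes."""
    children: tuple = ()


@dataclass(frozen=True)
class Call:
    """Method call: a name plus the graph of the callee's body."""
    name: str
    body: object = And()


def repeat(body, max_times):
    """Repetition as an Or over 0, 1, ..., max_times copies of body.

    The finite bound max_times is an assumption of this sketch.
    """
    return Or(tuple(And((body,) * k) for k in range(max_times + 1)))


def paths(node):
    """Yield every execution path of the graph as a tuple of names."""
    if isinstance(node, Call):
        for p in paths(node.body):
            yield (node.name,) + p
    elif isinstance(node, Or):
        for child in node.children:
            yield from paths(child)
    else:  # And: concatenate one path from each child, in order
        for combo in product(*(list(paths(c)) for c in node.children)):
            yield tuple(name for p in combo for name in p)


sequence = And((Call("s1"), Call("s2"), Call("s3")))   # s1; s2; s3;
selection = Or((Call("st"), Call("se")))               # if (...) st else se
dispatch = Or((Call("B//m"), Call("C//m")))            # i.m() with I i
loop = repeat(Call("s"), 2)                            # 0, 1 or 2 iterations

print(list(paths(sequence)))
print(list(paths(Call("main", dispatch))))
print(list(paths(loop)))
```

Each `Or` node multiplies the number of paths while the graph itself stays small, which is the sense in which the structure is compact.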

Page 7: An Algorithm for Keyword Search on an Execution Path


Example

[Figure: And/Or/Call graph of an example program. Node labels: main, getToday, getDay, getDayOfWeek, split, parseInt (×3), Calendar//getInstance (×2), Calendar//set, Calendar//get, printf]



Page 10: An Algorithm for Keyword Search on an Execution Path


Search Algorithm

● Input: Keywords to identify nodes
● Output: Connected sub-graphs including the nodes identified with the keywords
  “connected sub-graph” → continuous execution path
● Heuristics
  – Find deepest nodes
    ← Assumption: small operation is easy to understand
  – Extract shallowest sub-graph (treecut)
    ← Assumption: deep method-invocation chain is difficult to understand

Page 11: An Algorithm for Keyword Search on an Execution Path


Label and Summary

Labels and summaries are the “index” data of the search algorithm.
● Label
  – A set of names put on a node
  – Keywords in a query
● Summary
  – A node n’s summary S(n) is the set of names of the (child and) descendant nodes of n.

Properties
  – For any node n and any child node c of n, S(n) ⊇ S(c).
  – A root node has a locally maximal summary.

[Figure: the example graph, with nodes main, getToday, getDay, getDayOfWeek, split, parseInt, Calendar//getInstance, Calendar//set, Calendar//get, printf]
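The summary definition and its superset property can be illustrated with a short Python sketch. The edges in `GRAPH` are an assumption made for this example (the figure's edges are not recoverable from the transcript), each node's label is taken to be just its name, and the function name `summary` is hypothetical.

```python
# Hypothetical graph (parent -> children); the edges are assumed, not
# taken from the figure.
GRAPH = {
    "main": ["getDay", "getDayOfWeek", "printf"],
    "getDay": ["Calendar//getInstance", "split", "parseInt", "Calendar//set"],
    "getDayOfWeek": ["Calendar//get"],
}


def summary(graph, node, memo):
    """Return S(node): the names of all child and descendant nodes.

    Results are cached in memo so that a node shared by several parents
    in the DAG is processed only once.
    """
    if node not in memo:
        names = set()
        for child in graph.get(node, ()):
            names.add(child)                       # the child's own name
            names |= summary(graph, child, memo)   # plus its descendants
        memo[node] = frozenset(names)
    return memo[node]


memo = {}
for node in GRAPH:
    s_n = summary(GRAPH, node, memo)
    # Property: S(n) is a superset of S(c) for every child c of n.
    assert all(s_n >= summary(GRAPH, c, memo) for c in GRAPH[node])
    print(node, sorted(s_n))
```

Because S(n) ⊇ S(c), a search can skip a whole sub-graph as soon as its root's summary lacks a query keyword.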


Page 13: An Algorithm for Keyword Search on an Execution Path


Label and Summary (continued)

[Figure: the example graph, with the summary of one node shown]

summary: { “Calendar//getInstance”, “Calendar//set”, “split”, “parseInt” }

Page 14: An Algorithm for Keyword Search on an Execution Path


Label and Summary (continued)

[Figure: the example graph, with the summary of one node shown]

summary: { “Calendar//getInstance”, “Calendar//get”, “Calendar//set”, “getDay”, “getDayOfWeek”, “split”, “parseInt”, “printf” }

Page 15: An Algorithm for Keyword Search on an Execution Path


Steps of search algorithm

(S1) finds query-fulfilling sub-trees of the (local) maximum depths
  – by comparing the summary of each node with the query
(S2) makes the shallowest treecut
  – by removing deeper leaf nodes for as long as the treecut still fulfills the query
(S3) removes uncontributing leaf nodes
  – Uncontributing = its label does not match any of the query keywords
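The three steps can be sketched on a small tree as follows. This is one possible reading of the slide and not the prototype's implementation: S2 is interpreted here as finding the smallest depth at which the truncated sub-tree still covers the query, keyword matching is simplified to exact equality with a node name, and the edges of `TREE` as well as all function names are hypothetical.

```python
# Hypothetical call tree (parent -> children); the edges are assumed.
TREE = {
    "main": ["getDay", "getDayOfWeek", "printf"],
    "getDay": ["Calendar//getInstance", "split", "parseInt", "Calendar//set"],
    "getDayOfWeek": ["Calendar//get"],
}


def names_within(tree, node, depth=None):
    """Names of descendants at most `depth` edges below node (None = all)."""
    if depth == 0:
        return set()
    names = set()
    for child in tree.get(node, ()):
        names.add(child)
        names |= names_within(tree, child, None if depth is None else depth - 1)
    return names


def deepest_fulfilling(tree, node, query):
    """S1: nodes whose summary covers the query while no child's does."""
    if not query <= names_within(tree, node):
        return []
    deeper = [r for c in tree.get(node, ())
              for r in deepest_fulfilling(tree, c, query)]
    return deeper or [node]


def shallowest_cut_depth(tree, root, query):
    """S2: smallest depth at which the truncated sub-tree covers the query.

    Terminates because root's full summary is known to cover the query.
    """
    depth = 0
    while not query <= names_within(tree, root, depth):
        depth += 1
    return depth


def prune(tree, node, query, depth):
    """S2 + S3: cut below `depth`, then drop leaves matching no keyword."""
    kept = []
    if depth > 0:
        for child in tree.get(node, ()):
            sub = prune(tree, child, query, depth - 1)
            if sub is not None:
                kept.append(sub)
    if kept or node in query:
        return (node, kept)
    return None  # uncontributing leaf


def search(tree, root, query):
    """Run S1, S2, S3 and return one pruned sub-tree per S1 result."""
    return [prune(tree, r, query, shallowest_cut_depth(tree, r, query))
            for r in deepest_fulfilling(tree, root, query)]


print(search(TREE, "main", {"Calendar//get", "Calendar//set"}))
```

Each result is a nested `(name, children)` tuple. A query that a deeper node can satisfy alone, such as `{"split"}`, yields a sub-tree rooted at that deeper node rather than at `main`.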

Page 16: An Algorithm for Keyword Search on an Execution Path


Example

Query { “Calendar//get”, “Calendar//set” }

(S1) finds query-fulfilling sub-trees of the (local) maximum depths
(S2) makes the shallowest treecut
(S3) removes uncontributing leaf nodes

[Figure: the example graph, with nodes main, getToday, getDay, getDayOfWeek, split, parseInt, Calendar//getInstance, Calendar//set, Calendar//get, printf]

Page 17: An Algorithm for Keyword Search on an Execution Path


Example (continued)

Query { “Calendar//get”, “Calendar//set” }

[Figure: the example graph, annotated with the following set of names]

{ “Calendar//getInstance”, “Calendar//get”, “Calendar//set”, “getDay”, “getDayOfWeek”, “split”, “parseInt”, “printf” }


Page 19: An Algorithm for Keyword Search on an Execution Path


Example (continued)

Query { “Calendar//get”, “Calendar//set” }

[Figure: the resulting sub-graph, with nodes main, getDay, Calendar//set, getDayOfWeek, Calendar//get]

Search result

main {
  getDay { Calendar//set }
  getDayOfWeek { Calendar//get }
}

(S1) finds query-fulfilling sub-trees of the (local) maximum depths
(S2) makes the shallowest treecut in each of the sub-trees
(S3) removes uncontributing leaf nodes
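As a small illustration of the textual form of a search result, the sketch below renders a nested tree in the brace notation used on this slide. The `(name, children)` tuple representation and the function name `format_result` are assumptions made for this example, and the spacing may differ from the prototype's actual output.

```python
def format_result(node):
    """Render a (name, children) tuple as `name { child ... }`."""
    name, children = node
    if not children:
        return name
    inner = " ".join(format_result(child) for child in children)
    return f"{name} {{ {inner} }}"


# Hypothetical in-memory form of the search result shown on the slide.
result = ("main", [
    ("getDay", [("Calendar//set", [])]),
    ("getDayOfWeek", [("Calendar//get", [])]),
])

print(format_result(result))
# main { getDay { Calendar//set } getDayOfWeek { Calendar//get } }
```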

Page 20: An Algorithm for Keyword Search on an Execution Path


Prototype tool

Implementation
● Target: Java source code
  – Analysis of Java's dynamic dispatch
● Written in 8k lines of Python
● Applied to a product of up to 183 kLOC (jEdit)

Limitations
● Keywords
  – Names of class or method
  – Text in string literal
● Exception handling
  – Does not search in the execution paths that throw
● Entry points
  – main() and static initializers
  – Does not search for entry points such as @Test

Page 21: An Algorithm for Keyword Search on an Execution Path


[Figure: architecture of the prototype tool, divided into an Indexing phase and a Searching phase. Labels in the figure: Java class files (bytecode); Line number table; Type hierarchy; Method signature; Method calls; Control flow; Method-body analysis; And/Or/Call graph of method body; Dynamic-dispatch analysis; Dynamic-dispatch resolver; Whole-program graph building; And/Or/Call graph; Node label; Node summary building; Node summary; Query; Keyword-query search; Sub-graph / Execution path; Formatting; Search result]

Page 22: An Algorithm for Keyword Search on an Execution Path


Applied to jEdit

● H/W
  – CPU Xeon E5520 2.27GHz
  – 32GiB mem.
● Indexing
  – 48.8 sec. in elapsed time
  – 644 MiB peak mem.
● Searching
  – 3.09 ∼ 72.2 (ave. 5.71) sec. in elapsed time
  – up to 1412 MiB peak mem.

Page 23: An Algorithm for Keyword Search on an Execution Path


Summary

● Background
  – #1: Code searching
  – #2: Emerging fine-grained module technologies
● Problem: Searching on fine-grained modules
● Solution: Keyword search on an execution path
  – And/Or/Call graph, Label/summary
  – Search algorithm
● Prototype implementation
  – Applied to jEdit
● GitHub
  – https://github.com/tos-kamiya/agoat/