Spotting Working Code Examples (ICSE 2014)
SPOTTING
WORKING CODE EXAMPLES
Iman Keivanloo, Juergen Rilling, Ying Zou
1
Code Completion
_File_>
public static void test() {
    FileInputStream fStream = new FileInputStrea…
    try {
        String everything = IOUtils.toString(fStream);
    } finally {
        fStream.close();
    }
}
2
Code Recommendation
_FileInputStream_>
3
• Limited query
• Usage pattern
4
Spotting Working Code Examples
_Read file line by line FileInputStream_> Real-time search
100 ms < response time < 400 ms
Challenges in Spotting Working Code Examples
Correctness
while ((content = fis.read()) != -1) {
    System.out.print((char) content);
}
Correct + Complete + Concise
FileInputStream fis = null;
File file = new File("foo.txt");
fis = new FileInputStream(file);
int content;
Send SMS …
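The fragments on this slide can be put together into one self-contained example, as a minimal sketch. The class name ReadFileExample and the method readAll are illustrative and do not appear in the slides; the try-with-resources block is an addition that closes the stream.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

class ReadFileExample {
    // Reads the file one byte at a time and returns its content as text.
    // Like the (char) cast on the slide, this assumes single-byte characters.
    static String readAll(File file) throws IOException {
        StringBuilder out = new StringBuilder();
        try (FileInputStream fis = new FileInputStream(file)) {
            int content;
            while ((content = fis.read()) != -1) {
                out.append((char) content);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // "foo.txt" is the file name used on the slide.
        System.out.print(readAll(new File("foo.txt")));
    }
}
```

The loop alone is correct, the declarations make it complete, and leaving out unrelated statements keeps it concise.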
5
Challenges in Spotting Working Code Examples
Query: {read, file}
549,750
6
7
8
Why NOT Vector Space Model?
• e.g.,
test(readFile("f1.txt"));
test(readFile("f2.txt"));
test(readFile("f3.txt"));
VSM: bag-of-words + cosine similarity
VSM does not search for patterns
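The point of this slide can be illustrated with a small bag-of-words and cosine similarity sketch. The tokenizer (camelCase splitting, letters only), the class name VsmSketch and the comparison snippet are assumptions made for this example; the slides do not describe a concrete VSM setup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class VsmSketch {
    // Splits code into lower-case words, breaking camelCase ("readFile" -> read, file).
    static List<String> tokenize(String code) {
        List<String> words = new ArrayList<>();
        String spaced = code.replaceAll("([a-z])([A-Z])", "$1 $2");
        for (String w : spaced.split("[^A-Za-z]+")) {
            if (!w.isEmpty()) words.add(w.toLowerCase());
        }
        return words;
    }

    // Bag-of-words: term -> frequency, word order is discarded.
    static Map<String, Integer> bag(String code) {
        Map<String, Integer> tf = new HashMap<>();
        for (String w : tokenize(code)) tf.merge(w, 1, Integer::sum);
        return tf;
    }

    // Cosine similarity between two term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            na += e.getValue() * e.getValue();
        }
        for (int v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    public static void main(String[] args) {
        Map<String, Integer> query = bag("read file");
        String repeated = "test(readFile(\"f1.txt\"));\n"
                        + "test(readFile(\"f2.txt\"));\n"
                        + "test(readFile(\"f3.txt\"));";
        String working = "FileInputStream fis = new FileInputStream(file);\n"
                       + "int content;\n"
                       + "while ((content = fis.read()) != -1) { System.out.print((char) content); }";
        System.out.println("repeated calls: " + cosine(query, bag(repeated)));
        System.out.println("working code:   " + cosine(query, bag(working)));
    }
}
```

With this tokenizer the three repeated test(readFile(...)) calls score higher for the query {read, file} than the snippet that actually reads a file, because only term overlap is measured and not the shape of the code.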
9
Search Space
Search Algorithm
Similarity
Search Space
Content
Similarity
int temp = 1;
int temp = 0;
float var = 3;
{int, temp}
{int, temp}
{float, var}
*Bag-of-words model
10
Our Approach
Search Space
Content
Similarity
int temp = 1;
int temp = 0;
float var = 3;
{int, temp}
{int, temp}
{float, var}
*Bag-of-words model
*p-strings [Baker, B. S. 1993]
Pattern
Similarity
𝜌 𝜌 = 𝜌 ;
𝜌 𝜌 = 𝜌 ;
𝜌 𝜌 = 𝜌 ;
𝜌 𝜌 = 𝜌 ; + {int, temp, float, var}
𝜌 𝜌 = 𝜌 ; + {int, temp, float, var}
𝜌 𝜌 = 𝜌 ; + {int, temp, float, var}
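The split into a pattern plus a bag of words can be sketched as follows. This is a simplification in the spirit of p-strings that mirrors the slide, not an implementation of Baker's parameterized matching; the class PatternAbstraction, the method toPattern and the token regex are assumptions for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternAbstraction {
    // A token is a word, a number, or any other single non-space symbol.
    private static final Pattern TOKEN =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|\\d+(?:\\.\\d+)?|\\S");

    // Placeholder used for every word and number (Greek small letter rho).
    static final String RHO = "\u03C1";

    // Returns the abstract pattern of one line and adds its words to 'bag'.
    // Words go into the bag; numbers are abstracted but not collected,
    // matching the slide, where {int, temp} has no literal.
    static String toPattern(String line, Set<String> bag) {
        StringBuilder sb = new StringBuilder();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            String t = m.group();
            char c = t.charAt(0);
            if (sb.length() > 0) sb.append(' ');
            if (Character.isLetter(c) || c == '_') {
                bag.add(t);
                sb.append(RHO);
            } else if (Character.isDigit(c)) {
                sb.append(RHO);
            } else {
                sb.append(t); // operators and punctuation are kept as-is
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Set<String> bag = new LinkedHashSet<>();
        for (String line : new String[] {"int temp = 1;", "int temp = 0;", "float var = 3;"}) {
            System.out.println(toPattern(line, bag));
        }
        System.out.println(bag);
    }
}
```

All three lines of the slide map to the same pattern, while the shared bag keeps the words that a free-form query can match.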
11
12
Offline Code Snippet Processing
13
Discarding Unnecessary Details …
14
{int, temp, float, var}
Representation without Ordering Data
Mining Abstract Solutions
15
abstract programming solution (clone)
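One way to picture the offline mining step is to group code fragments by their abstract form, so that each group is one abstract clone. The sketch below assumes that reading; the abstraction rule (every word or number becomes "#"), the class CloneMiningSketch and its method names are hypothetical and not taken from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CloneMiningSketch {
    // Hypothetical abstraction: words and numbers become "#", symbols are kept,
    // whitespace is dropped so formatting differences do not matter.
    static String abstractLine(String line) {
        return line.trim()
                   .replaceAll("[A-Za-z_][A-Za-z0-9_]*|\\d+", "#")
                   .replaceAll("\\s+", "");
    }

    // The abstract form of a fragment is the sequence of its abstract lines.
    static String abstractFragment(List<String> lines) {
        StringBuilder key = new StringBuilder();
        for (String l : lines) key.append(abstractLine(l)).append('\n');
        return key.toString();
    }

    // Groups fragments that share the same abstract form.
    static Map<String, List<List<String>>> mine(List<List<String>> fragments) {
        Map<String, List<List<String>>> clones = new LinkedHashMap<>();
        for (List<String> f : fragments) {
            clones.computeIfAbsent(abstractFragment(f), k -> new ArrayList<>()).add(f);
        }
        return clones;
    }

    public static void main(String[] args) {
        List<List<String>> fragments = Arrays.asList(
            Arrays.asList("int temp = 1;"),
            Arrays.asList("float var = 3;"),
            Arrays.asList("return x;"));
        mine(fragments).forEach((pattern, group) ->
            System.out.println(group.size() + " fragment(s) share " + pattern.trim()));
    }
}
```

The size of each group could serve as a popularity signal for ranking, though how popularity is actually computed is not stated on the slides.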
16
Search Space
Search Algorithm
The Proposed Greedy Algorithm
Figure: query {read, file} → top-k lines (imaginary snippet) → top-k abstract clones → 1st abstract clone → top snippet
Notation in the figure: l_{q,1}, l_{q,2}, …, l_{q,n}; p_{c,1}, p_{c,2}, …, p_{c,n}; c_{p,1}, c_{p,2}, …, c_{p,n}
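The slide names the stages of the greedy search but not its scoring details, so the following is only a sketch under stated assumptions: lines are scored by how many query terms they contain, the top-k lines form the "imaginary snippet", and the abstract clone that covers most of those lines wins. The classes Line and AbstractClone and the method search are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class GreedySearchSketch {
    // An indexed line: its id and the words it contains (assumed index layout).
    static class Line {
        final int id;
        final Set<String> words;
        Line(int id, String... words) {
            this.id = id;
            this.words = new HashSet<>(Arrays.asList(words));
        }
    }

    // An abstract clone: the lines it is made of and its top-ranked snippet.
    static class AbstractClone {
        final String topSnippet;
        final Set<Integer> lineIds;
        AbstractClone(String topSnippet, Integer... lineIds) {
            this.topSnippet = topSnippet;
            this.lineIds = new HashSet<>(Arrays.asList(lineIds));
        }
    }

    static long overlap(Set<String> words, Set<String> query) {
        return query.stream().filter(words::contains).count();
    }

    static String search(Set<String> query, List<Line> lines, List<AbstractClone> clones, int k) {
        // Step 1: commit to the k lines that share the most terms with the query.
        Set<Integer> topLines = lines.stream()
            .filter(l -> overlap(l.words, query) > 0)
            .sorted(Comparator.comparingLong((Line l) -> overlap(l.words, query)).reversed())
            .limit(k)
            .map(l -> l.id)
            .collect(Collectors.toSet());

        // Step 2: pick the clone that contains the most of those lines.
        AbstractClone best = null;
        long bestScore = 0;
        for (AbstractClone c : clones) {
            long score = c.lineIds.stream().filter(topLines::contains).count();
            if (score > bestScore) {
                best = c;
                bestScore = score;
            }
        }

        // Step 3: return that clone's top snippet, or null if nothing matched.
        return best == null ? null : best.topSnippet;
    }

    public static void main(String[] args) {
        List<Line> lines = Arrays.asList(
            new Line(0, "read", "file", "stream"),
            new Line(1, "file", "close"),
            new Line(2, "send", "sms"));
        List<AbstractClone> clones = Arrays.asList(
            new AbstractClone("read-file snippet", 0, 1),
            new AbstractClone("sms snippet", 2));
        Set<String> query = new HashSet<>(Arrays.asList("read", "file"));
        System.out.println(search(query, lines, clones, 2));
    }
}
```

The sketch is greedy in that it never revisits the top-k lines once they are chosen, which is what keeps the search cheap enough for real-time use in this reading.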
17
Spotting Working Code Examples
1. Free-form querying
2. Self-contained code examples
query = {JFreeChart, JPEG}
18
Spotting Working Code Examples
3. Less dependency on term matching
4. No limitation on query terms
query = {bubblesort}
19
Case Study
1. Feasibility (e.g., no data/control flow data!)
2. Scalability
3. Performance:
• RQ1: Ranking schema?
• RQ2: Our approach vs. code search engines?
20
Corpus
• ~12 million Java classes
• ~25,000
• ~3 million unique Java classes
• ~300 million LOC
-----------------
• 5.5 million fragments
• ~15.5 million abstract clones
21
RQ1 – What is the best ranking schema
for spotting working code examples?
• Features for ranking:
1. Similarity (S)
2. Popularity (P)
3. Size (A)
4. Combination of P and S
5. Combination of A and S
• feature X → Top-K → Re-ranking
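A combined schema can be read as two passes: select the top-K candidates by one feature, then re-rank those K by another. The slides do not say which feature comes first in "P and S", so the sketch below picks the top-K by similarity and re-ranks by popularity purely for illustration; the Candidate class and its field values are invented for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class ReRankSketch {
    // A candidate answer with the three ranking features named on the slide.
    static class Candidate {
        final String name;
        final double similarity;
        final int popularity;
        final int size;
        Candidate(String name, double similarity, int popularity, int size) {
            this.name = name;
            this.similarity = similarity;
            this.popularity = popularity;
            this.size = size;
        }
    }

    static final Comparator<Candidate> BY_SIMILARITY =
        Comparator.comparingDouble((Candidate c) -> c.similarity).reversed();
    static final Comparator<Candidate> BY_POPULARITY =
        Comparator.comparingInt((Candidate c) -> c.popularity).reversed();

    // Keeps the k best candidates under 'first', then orders them by 'second'.
    static List<Candidate> rank(List<Candidate> all, Comparator<Candidate> first,
                                Comparator<Candidate> second, int k) {
        return all.stream()
                  .sorted(first)
                  .limit(k)
                  .sorted(second)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Candidate> all = Arrays.asList(
            new Candidate("a", 0.9, 1, 10),
            new Candidate("b", 0.8, 50, 12),
            new Candidate("c", 0.1, 100, 8));
        for (Candidate c : rank(all, BY_SIMILARITY, BY_POPULARITY, 2)) {
            System.out.println(c.name);
        }
    }
}
```

Here the very popular but dissimilar candidate never reaches the re-ranking pass, while popularity decides the order among the similar ones.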
22
• Recall is misleading
• The first answer matters
• WTA (Winner Takes All)
RQ1 – What is the best ranking schema?
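Read as a metric, Winner Takes All only looks at the first answer of each query. A minimal sketch, assuming WTA is the share of queries whose top-ranked answer is correct; the class and method names are illustrative:

```java
class WtaSketch {
    // topAnswerCorrect[i] is true when the first-ranked answer of query i is correct.
    static double winnerTakesAll(boolean[] topAnswerCorrect) {
        if (topAnswerCorrect.length == 0) return 0.0;
        int hits = 0;
        for (boolean correct : topAnswerCorrect) {
            if (correct) hits++;
        }
        return (double) hits / topAnswerCorrect.length;
    }

    public static void main(String[] args) {
        // Hypothetical judgments for four queries.
        System.out.println(winnerTakesAll(new boolean[] {true, false, true, true}));
    }
}
```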
23
Whether the top ranked answer is correct?
RQ1 – What is the best ranking schema?
[Chart: Coverage and Precision for the ranking schemas S, P, P+S, A, A+S; S = Similarity, P = Popularity, A = Size]
24
Whether the top ranked answer is a good code example?
RQ1 – What is the best ranking schema?
[Chart: Completeness and Conciseness of the top-ranked answer for the ranking schemas S, P, A, P+S, A+S; S = Similarity, P = Popularity, A = Size]
Popularity + Similarity is the best ranking schema
for spotting working code examples
RQ2 – Can our approach outperform
Internet-scale code search engines?
Our approach
~25,000
26
27
RQ2 – Our approach vs. Ohloh Code?
[Charts: Best Hit's Rank and NDCG, our approach vs. Ohloh Code; values shown on the slide: 40, 20, 2, 1, 0.7, 0.5]
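For reference, NDCG can be computed as in the sketch below. It uses the common form with gain rel and discount log2(rank + 1), and it takes the ideal ordering from the same result list; which variant the study used is not stated on the slides, and the relevance grades in main are made up.

```java
import java.util.Arrays;

class NdcgSketch {
    // Discounted cumulative gain: sum of rel_i / log2(i + 1) over ranks i = 1..n.
    static double dcg(double[] rel) {
        double sum = 0;
        for (int i = 0; i < rel.length; i++) {
            sum += rel[i] / (Math.log(i + 2) / Math.log(2));
        }
        return sum;
    }

    // Normalizes DCG by the DCG of the same grades sorted from best to worst.
    static double ndcg(double[] rel) {
        double[] ideal = rel.clone();
        Arrays.sort(ideal); // ascending order
        for (int i = 0, j = ideal.length - 1; i < j; i++, j--) {
            double tmp = ideal[i];
            ideal[i] = ideal[j];
            ideal[j] = tmp; // reverse in place to get descending order
        }
        double idealDcg = dcg(ideal);
        return idealDcg == 0 ? 0 : dcg(rel) / idealDcg;
    }

    public static void main(String[] args) {
        // Hypothetical relevance grades of a ranked result list.
        System.out.println(ndcg(new double[] {1, 3, 2}));
    }
}
```

A list that already places the best answers first scores 1, and the score drops as good answers move down the ranking.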
The proposed real-time search is
feasible and outperforms Ohloh Code
28
Summary