Spotting Working Code Examples (ICSE 2014)

Iman Keivanloo, Juergen Rilling, Ying Zou


Post on 02-Jul-2015


Page 1: Spotting Working Code Examples (ICSE 2014)

SPOTTING WORKING CODE EXAMPLES

Iman Keivanloo, Juergen Rilling, Ying Zou

Page 2: Spotting Working Code Examples (ICSE 2014)

Code Completion

_File_>

    public static void test() {
        FileInputStream fStream = new FileInputStrea…
        try {
            String everything = IOUtils.toString(fStream);
        } finally {
            fStream.close();
        }
    }

Page 3: Spotting Working Code Examples (ICSE 2014)

Code Recommendation

_FileInputStream_>


• Limited query

• Usage pattern

Page 4: Spotting Working Code Examples (ICSE 2014)


Spotting Working Code Examples

_Read file line by line FileInputStream_> Real-time search

100 ms < response time < 400 ms

Page 5: Spotting Working Code Examples (ICSE 2014)

Challenges in Spotting Working Code Example

Correctness

Correct + Complete + Concise

    FileInputStream fis = null;
    File file = new File("foo.txt");
    fis = new FileInputStream(file);
    int content;
    while ((content = fis.read()) != -1) {
        System.out.print((char) content);
    }

Send SMS …

Page 6: Spotting Working Code Examples (ICSE 2014)

Challenges in Spotting Working Code Example

Query: {read, file}

549,750

Page 7: Spotting Working Code Examples (ICSE 2014)


Page 8: Spotting Working Code Examples (ICSE 2014)


Why NOT Vector Space Model?

• e.g.,

    test(readFile("f1.txt"));
    test(readFile("f2.txt"));
    test(readFile("f3.txt"));

VSM = Bag-of-words + Cosine similarity

VSM does not search for patterns
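To make the limitation concrete, here is a minimal sketch of a bag-of-words vector space model with cosine similarity. The class name `VsmCosine` and the tokenization rule (split on non-word characters) are assumptions for illustration and are not taken from the slides. The example shows that two statements with different nesting get identical vectors, which is one way of reading "VSM does not search for patterns".

```java
import java.util.HashMap;
import java.util.Map;

final class VsmCosine {
    /** Counts how often each word-like token occurs (a bag-of-words vector). */
    static Map<String, Integer> termFrequencies(String code) {
        Map<String, Integer> tf = new HashMap<>();
        // Assumed tokenization: anything that is not a letter, digit, or '_' separates tokens.
        for (String token : code.split("[^A-Za-z0-9_]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Cosine of the angle between two term-frequency vectors; 0.0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Same tokens, different structure: the bag-of-words vectors are identical.
        Map<String, Integer> a = termFrequencies("test(readFile(\"f1.txt\"));");
        Map<String, Integer> b = termFrequencies("readFile(test(\"f1.txt\"));");
        System.out.println(cosine(a, b));
    }
}
```

Because token order and nesting are discarded before the comparison, the score cannot distinguish the two statements.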

Page 9: Spotting Working Code Examples (ICSE 2014)


Search Space

Search Algorithm

Page 10: Spotting Working Code Examples (ICSE 2014)

Similarity

Search Space

Content Similarity*

    int temp = 1;     {int, temp}
    int temp = 0;     {int, temp}
    float var = 3;    {float, var}

*Bag-of-words model
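The mapping from a line of code to its bag of words can be sketched as follows. This assumes that only identifier-like tokens (including keywords and type names) count as terms and that numeric literals are dropped, which matches the slide's `{int, temp}` for `int temp = 1;`. The class name `BagOfWords` is hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BagOfWords {
    // Identifier-like tokens: a letter or '_' followed by letters, digits, or '_'.
    private static final Pattern WORD = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Returns the set of terms in a line; order and repetition are discarded. */
    static Set<String> terms(String line) {
        Set<String> bag = new TreeSet<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            bag.add(m.group());
        }
        return bag;
    }

    public static void main(String[] args) {
        System.out.println(terms("int temp = 1;"));   // prints [int, temp]
        System.out.println(terms("int temp = 0;"));   // prints [int, temp]
        System.out.println(terms("float var = 3;"));  // prints [float, var]
    }
}
```

Under this model the first two lines are indistinguishable, while the third shares no terms with them even though it has the same shape.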

Page 11: Spotting Working Code Examples (ICSE 2014)

Our Approach

Search Space

Content Similarity* and Pattern Similarity**

    int temp = 1;     {int, temp}      ρ ρ = ρ ;
    int temp = 0;     {int, temp}      ρ ρ = ρ ;
    float var = 3;    {float, var}     ρ ρ = ρ ;

    ρ ρ = ρ ; + {int, temp, float, var}    (the same for all three lines)

*Bag-of-words model

**p-strings [Baker, B. S. 1993]
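The pattern side of the representation can be illustrated with a simplified abstraction step. This is only a sketch in the spirit of p-strings: it replaces every identifier, keyword, and numeric literal with the placeholder ρ and keeps punctuation, whereas Baker's full encoding also tracks which parameter positions refer to the same name. The class name `PatternAbstraction` and the token regex are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatternAbstraction {
    // Word tokens, numeric literals, or any single non-space character (assumed lexer).
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|\\d+(?:\\.\\d+)?|\\S");

    // Greek small letter rho, written as an escape so the source stays ASCII.
    private static final String RHO = "\u03C1";

    /** Replaces every word or number token with rho and keeps punctuation as-is. */
    static String toPattern(String line) {
        StringBuilder sb = new StringBuilder();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            String t = m.group();
            char first = t.charAt(0);
            boolean parameter = Character.isLetterOrDigit(first) || first == '_';
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(parameter ? RHO : t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // All three lines from the slide collapse to the same pattern.
        System.out.println(toPattern("int temp = 1;"));
        System.out.println(toPattern("int temp = 0;"));
        System.out.println(toPattern("float var = 3;"));
    }
}
```

With this abstraction, `float var = 3;` matches the two `int temp` lines by pattern even though it shares no terms with them.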

Page 12: Spotting Working Code Examples (ICSE 2014)

Offline Code Snippet Processing

Page 13: Spotting Working Code Examples (ICSE 2014)

Discarding Unnecessary Details …

Page 14: Spotting Working Code Examples (ICSE 2014)

Representation without Ordering Data

{int, temp, float, var}

Page 15: Spotting Working Code Examples (ICSE 2014)

Mining Abstract Solutions


abstract programming solution (clone)
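One way to picture the mining step is to group fragments that share a pattern and to merge their terms into a single unordered set. The sketch below assumes each fragment arrives as a precomputed pattern string plus its terms. The class `AbstractCloneMiner`, the nested `AbstractClone` type, and the `occurrences` counter are hypothetical names, and the slides do not confirm that the authors' mining works this way.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class AbstractCloneMiner {
    /** A pattern together with every term seen in fragments that share it. */
    static final class AbstractClone {
        final String pattern;
        final Set<String> terms = new TreeSet<>();
        int occurrences = 0;  // how many fragments matched this pattern

        AbstractClone(String pattern) {
            this.pattern = pattern;
        }
    }

    /** Each input row is {pattern, term1, term2, ...}; rows are grouped by pattern. */
    static Map<String, AbstractClone> mine(List<String[]> fragments) {
        Map<String, AbstractClone> clones = new LinkedHashMap<>();
        for (String[] f : fragments) {
            AbstractClone c = clones.computeIfAbsent(f[0], AbstractClone::new);
            c.occurrences++;
            c.terms.addAll(Arrays.asList(f).subList(1, f.length));
        }
        return clones;
    }

    public static void main(String[] args) {
        String p = "\u03C1 \u03C1 = \u03C1 ;";
        Map<String, AbstractClone> clones = mine(Arrays.asList(
                new String[] {p, "int", "temp"},
                new String[] {p, "int", "temp"},
                new String[] {p, "float", "var"}));
        AbstractClone c = clones.get(p);
        System.out.println(c.occurrences + " " + c.terms);  // prints 3 [float, int, temp, var]
    }
}
```

The merged term set corresponds to the `{int, temp, float, var}` shown on the earlier slides, and the occurrence count is one plausible basis for a popularity feature.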

Page 16: Spotting Working Code Examples (ICSE 2014)


Search Space

Search Algorithm

Page 17: Spotting Working Code Examples (ICSE 2014)

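The Proposed Greedy Algorithm

[Figure: the greedy search pipeline. Labels: query {read, file}; top-k lines (imaginary snippet); top-k abstract clones; top-k lines; 1st abstract clone; top snippet. Symbols: $l_{q,1}, l_{q,2}, \dots, l_{q,n}$; $p_{c,1}, p_{c,2}, \dots, p_{c,n}$; $c_{p,1}, c_{p,2}, \dots, c_{p,n}$]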
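The figure's labels suggest a two-stage greedy search: first pick the lines that best match the query, then pick the candidate that covers most of those lines. The sketch below is only an illustrative reading of those labels and not the authors' algorithm. The class `GreedySpotter`, the overlap score, and the representation of a candidate as a set of line indices are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class GreedySpotter {
    static Set<String> set(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    /** Number of terms the two sets have in common. */
    static int overlap(Set<String> a, Set<String> b) {
        int n = 0;
        for (String t : a) {
            if (b.contains(t)) {
                n++;
            }
        }
        return n;
    }

    /** Stage 1: indices of the k lines sharing the most terms with the query. */
    static List<Integer> topKLines(List<Set<String>> lineTerms, Set<String> query, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < lineTerms.size(); i++) {
            ids.add(i);
        }
        // Stable sort by descending overlap; ties keep their original order.
        ids.sort(Comparator.comparingInt((Integer i) -> -overlap(lineTerms.get(i), query)));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    /** Stage 2: index of the candidate containing the most top-k lines, or -1 if none exist. */
    static int bestCandidate(List<Set<Integer>> candidates, List<Integer> topLines) {
        int best = -1;
        int bestScore = -1;
        for (int c = 0; c < candidates.size(); c++) {
            int score = 0;
            for (int id : topLines) {
                if (candidates.get(c).contains(id)) {
                    score++;
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Set<String>> lines = Arrays.asList(
                set("read", "file", "FileInputStream"), set("send", "sms"),
                set("file", "File"), set("print"));
        List<Integer> top = topKLines(lines, set("read", "file"), 2);
        Set<Integer> c0 = new HashSet<>(Arrays.asList(1, 3));
        Set<Integer> c1 = new HashSet<>(Arrays.asList(0, 2, 3));
        System.out.println(top + " -> candidate " + bestCandidate(Arrays.asList(c0, c1), top));
    }
}
```

Each stage commits to its best local choice without backtracking, which is what makes the search greedy and keeps the per-query work small.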

Page 18: Spotting Working Code Examples (ICSE 2014)

Spotting Working Code Examples

1. Free-form querying

2. Self-contained code examples

query = {JFreeChart, JPEG}

Page 19: Spotting Working Code Examples (ICSE 2014)

Spotting Working Code Examples

3. Less dependency on term matching

4. No limitation on query terms

query = {bubblesort}

Page 20: Spotting Working Code Examples (ICSE 2014)

Case Study

1. Feasibility (e.g., without data-flow or control-flow information)

2. Scalability

3. Performance:

• RQ1: What is the best ranking schema?

• RQ2: Our approach vs. code search engines?

Page 21: Spotting Working Code Examples (ICSE 2014)

Corpus

~12 million Java classes

~25,000

~3 million unique Java classes

~300 million LOC

-----------------

5.5 million fragments

~15.5 million abstract clones

Page 22: Spotting Working Code Examples (ICSE 2014)

RQ1 – What is the best ranking schema for spotting working code examples?

• Features for ranking:

1. Similarity (S)

2. Popularity (P)

3. Size (A)

4. Combination of P and S

5. Combination of A and S

[Diagram labels: feature X; Top-K; Re-ranking]
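A combined schema such as "P and S" can be sketched as a two-pass ranking: order candidates by one feature, keep the top K, then re-rank that short list by the other feature. The order of the two passes (popularity first, similarity second) is an assumption made here, as are the class `ReRanker` and the `Candidate` fields. The slides only name the features and the re-ranking step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ReRanker {
    static final class Candidate {
        final String id;
        final double similarity;  // feature S
        final int popularity;     // feature P

        Candidate(String id, double similarity, int popularity) {
            this.id = id;
            this.similarity = similarity;
            this.popularity = popularity;
        }
    }

    /** Keeps the k most popular candidates, then orders them by descending similarity. */
    static List<Candidate> popularityThenSimilarity(List<Candidate> all, int k) {
        List<Candidate> ranked = new ArrayList<>(all);
        ranked.sort(Comparator.comparingInt((Candidate c) -> c.popularity).reversed());
        List<Candidate> top = new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
        top.sort(Comparator.comparingDouble((Candidate c) -> c.similarity).reversed());
        return top;
    }

    public static void main(String[] args) {
        List<Candidate> out = popularityThenSimilarity(Arrays.asList(
                new Candidate("a", 0.9, 1),
                new Candidate("b", 0.5, 10),
                new Candidate("c", 0.7, 8)), 2);
        for (Candidate c : out) {
            System.out.println(c.id);  // prints c, then b
        }
    }
}
```

In this example the most similar candidate "a" is filtered out in the first pass because it is rare, which illustrates how a combined schema can differ from ranking by similarity alone.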

Page 23: Spotting Working Code Examples (ICSE 2014)

RQ1 – What is the best ranking schema?

• Recall is misleading

• The first answer matters

• WTA (Winner Takes All)
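A winner-takes-all measure looks only at the top-ranked answer of each query. The slides do not give a formula, so the sketch below assumes the common reading: the fraction of queries whose first answer was judged correct. The class and method names are hypothetical.

```java
final class WinnerTakesAll {
    /**
     * topAnswerCorrect[i] is true when the top-ranked answer for query i was judged correct.
     * Returns the fraction of queries with a correct top answer, or 0.0 for no queries.
     */
    static double precisionAtOne(boolean[] topAnswerCorrect) {
        if (topAnswerCorrect.length == 0) {
            return 0.0;
        }
        int hits = 0;
        for (boolean ok : topAnswerCorrect) {
            if (ok) {
                hits++;
            }
        }
        return (double) hits / topAnswerCorrect.length;
    }

    public static void main(String[] args) {
        // Three of four queries have a correct first answer.
        System.out.println(precisionAtOne(new boolean[] {true, false, true, true}));  // prints 0.75
    }
}
```

Answers below rank 1 do not affect the score at all, which fits the slide's point that the first answer is what matters.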

Page 24: Spotting Working Code Examples (ICSE 2014)

RQ1 – What is the best ranking schema?

Is the top-ranked answer correct?

[Chart: Coverage and Precision for the ranking schemas S, P, P+S, A, and A+S; axis ticks at 60, 70, 80, 90. S = Similarity, P = Popularity, A = Size.]

Page 25: Spotting Working Code Examples (ICSE 2014)

RQ1 – What is the best ranking schema?

Is the top-ranked answer a good code example?

[Charts: Completeness and Conciseness for the ranking schemas S, P, A, P+S, and A+S; axis ticks at 100, 60, 30 and at 100, 60, 20. S = Similarity, P = Popularity, A = Size.]

Popularity + Similarity leads to the best ranking schema for spotting working code examples.

Page 26: Spotting Working Code Examples (ICSE 2014)

RQ2 – Can our approach outperform Internet-scale code search engines?

Our approach

~25,000


Page 27: Spotting Working Code Examples (ICSE 2014)

RQ2 – Our approach vs. Ohloh Code?

[Charts: Best Hit’s Rank (axis ticks at 40, 20, 2, 1) and NDCG (axis ticks at 0.7, 0.5), with our approach marked on each.]

The proposed real-time search is feasible and outperforms Ohloh Code.
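For readers unfamiliar with NDCG, the following sketch computes it for one ranked result list using the standard definition: each result's graded relevance is discounted by log2(rank + 1), and the sum is divided by the score of the ideal ordering. The slides do not say which NDCG variant or relevance scale was used. Here the ideal ordering is derived from the same list, and the class name `Ndcg` is hypothetical.

```java
import java.util.Arrays;

final class Ndcg {
    /** Discounted cumulative gain: relevance at rank r (1-based) divided by log2(r + 1). */
    static double dcg(double[] relevance) {
        double sum = 0;
        for (int i = 0; i < relevance.length; i++) {
            sum += relevance[i] / (Math.log(i + 2) / Math.log(2));
        }
        return sum;
    }

    /** DCG of the given ranking divided by the DCG of the same items sorted best-first. */
    static double ndcg(double[] relevance) {
        double[] ideal = relevance.clone();
        Arrays.sort(ideal);
        // Reverse in place so the highest relevance comes first.
        for (int i = 0, j = ideal.length - 1; i < j; i++, j--) {
            double tmp = ideal[i];
            ideal[i] = ideal[j];
            ideal[j] = tmp;
        }
        double idcg = dcg(ideal);
        return idcg == 0 ? 0.0 : dcg(relevance) / idcg;
    }

    public static void main(String[] args) {
        System.out.println(ndcg(new double[] {3, 2, 1}));  // already ideal: 1.0
        System.out.println(ndcg(new double[] {0, 1}));     // best hit at rank 2: below 1.0
    }
}
```

A score of 1.0 means the results are already in the best possible order, and lower values indicate that relevant results appear further down the list.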

Page 28: Spotting Working Code Examples (ICSE 2014)


Summary