Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers Raphael Hoffmann, James Fogarty, Daniel S. Weld University of Washington, Seattle UIST 2007


Page 1:

Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Raphael Hoffmann, James Fogarty, Daniel S. Weld

University of Washington, Seattle

UIST 2007

Page 2:

Programmers Use Search

• To identify an API
• To seek information about an API
• To find examples on how to use an API

Example Task:

“Programmatically output an Acrobat PDF file in Java.”

Page 3:

Example: General Web Search Interface

Page 4:

Example: Code-Specific Web Search Interface

Page 5:

Problems

• Information is dispersed: tutorials, API itself, documentation, pages with samples
• Difficult and time-consuming to …
  – locate required pieces,
  – get an overview of alternatives,
  – judge relevance and quality of results,
  – understand dependencies.
• Many page visits required

Page 6:

With Assieme we …

• Designed a new Web search interface
• Developed needed inference

Page 7:

Outline

• Motivation
• What Programmers Search For
• The Assieme Search Engine
  – Inferring Implicit References
  – Using Implicit References for Scoring
• Evaluation of Inference & User Study
• Discussion & Conclusion

Page 8:

Six Learning Barriers Faced by Programmers (Ko et al. 04)

• Design barriers: What to do?
• Selection barriers: What to use?
• Coordination barriers: How to combine?
• Use barriers: How to use?
• Understanding barriers: What is wrong?
• Information barriers: How to check?

Page 9:

Examining Programmer Web Queries

Objective
• See what programmers search for

Dataset
• 15 million queries and click-through data
• Random sample of MSN queries in 05/06

Procedure
• Extract query sessions containing ‘java’ (2,529 sessions)
• Manually inspect queries and define regex filters
• Build an informal taxonomy of query sessions

Page 10:

Examining Programmer Web Queries

Page 11:

Examining Programmer Web Queries

• Descriptive (64.1 %): “java JSP current date” (selection barrier)
• Contain package, type or member name (35.9 %): “java SimpleDateFormat” (use barrier)
• Contain terms like “example”, “using”, “sample code” (17.9 %): “using currentdate in jsp” (coordination barrier)
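The regex filters themselves are not shown on the slides. As a rough sketch only, the three query categories above could be approximated with filters like the following. The patterns, the class name QueryFilters and the label strings are assumptions made for illustration, not the authors' filters; in particular, recognizing real package, type or member names would need an index of actual APIs rather than a naming heuristic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical regex filters for tagging programmer queries (illustrative only). */
final class QueryFilters {
    // Session filter: the word "java", but not "javascript".
    private static final Pattern JAVA_WORD =
            Pattern.compile("\\bjava\\b", Pattern.CASE_INSENSITIVE);

    // Assumption: these terms signal that the user is asking for usage examples.
    private static final Pattern EXAMPLE_TERMS =
            Pattern.compile("\\b(example|examples|using|sample code)\\b", Pattern.CASE_INSENSITIVE);

    // Assumption: a dotted name (java.util.HashMap) or a mixed-case word
    // (SimpleDateFormat) is treated as a package, type or member name.
    private static final Pattern API_NAME =
            Pattern.compile("\\b(?:[a-z]+(?:\\.[A-Za-z_]\\w*)+|[A-Za-z]+[a-z][A-Z]\\w*)\\b");

    static boolean isJavaQuery(String query) {
        return JAVA_WORD.matcher(query).find();
    }

    /** Returns "api-name" or "descriptive", plus "example-terms" when such terms occur. */
    static List<String> labels(String query) {
        List<String> out = new ArrayList<>();
        out.add(API_NAME.matcher(query).find() ? "api-name" : "descriptive");
        if (EXAMPLE_TERMS.matcher(query).find()) {
            out.add("example-terms");
        }
        return out;
    }
}
```

The first two labels are mutually exclusive, matching the 64.1 % / 35.9 % split, while "example-terms" is an additional tag that can accompany either one.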

Page 12:

Assieme

[Screenshot of the Assieme interface, annotated with callouts:]
• example code
• documentation
• required libraries
• relevance indicated by # uses
• summaries show referenced types
• links to related info

Page 13:

Challenges

How to put the right information on the interface?

• Get all programming-related data
• Interpret data and infer relationships


Page 15:

Assieme’s Data

… is crawled using existing search engines

• Pages with code examples (~2,360,000): queried Google on “java ±import ±class …”
• JAR files (~79,000): downloaded library files for all projects on Sun.com, Apache.org, Java.net, SourceForge.net
• JavaDoc pages (~480,000): queried Google on “overview-tree.html …”

Page 16:

The Assieme Search Engine

… infers 2 kinds of implicit references

[Diagram relating JAR files, JavaDoc pages, and pages with code examples]
• Uses of packages, types and members
• Matches of packages, types and members

Page 17:

Extracting Code Samples

[Screenshot of a Web page containing code, annotated with extraction challenges:]
• unclear segmentation
• code in a different language (C++)
• distracting terms ‘…’ in code
• line numbers

Page 18:

Extracting Code Samples

• Remove HTML commands, but preserve line breaks
• Remove some distracters by heuristics
• Launch (error-tolerant) Java parser at every line break (separately parse for types, methods, and sequences of statements)

Original page:

<html><head><title></title></head><body>A simple example:<br><br> 1: import java.util.*; <br>2: class c {<br>3: HashMap m = new HashMap();<br>4: void f() { m.clear(); }<br>5: }<br><br><a href="index.html">back</a></body></html>

After removing HTML commands (line breaks preserved):

A simple example:

1: import java.util.*;
2: class c {
3: HashMap m = new HashMap();
4: void f() { m.clear(); }
5: }

back

After removing line numbers:

A simple example:

import java.util.*;
class c {
HashMap m = new HashMap();
void f() { m.clear(); }
}

back
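The first two clean-up steps (stripping HTML while preserving line breaks, then removing line numbers) can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name SampleCleaner, the set of tags treated as line breaks, and the line-number pattern are all hypothetical, and the error-tolerant parsing step is not covered here.

```java
import java.util.regex.Pattern;

/** Sketch of the text clean-up steps; all regexes are illustrative assumptions. */
final class SampleCleaner {
    // Tags assumed to end a visual line; each becomes a newline.
    private static final Pattern BREAK_TAGS =
            Pattern.compile("(?i)<\\s*(br|/p|/div|/pre|/li|/tr)\\b[^>]*>");
    // Any remaining tag is dropped.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");
    // Assumed heuristic: a leading integer, optionally followed by ':' or '.',
    // and then whitespace, is a line number.
    private static final Pattern LINE_NUMBER =
            Pattern.compile("(?m)^[ \\t]*\\d+[:.]?[ \\t]+");

    /** Removes HTML markup but keeps line breaks. */
    static String stripHtml(String html) {
        String text = BREAK_TAGS.matcher(html).replaceAll("\n");
        text = ANY_TAG.matcher(text).replaceAll("");
        // Decode the few entities that commonly appear in code; "&amp;" goes last.
        return text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    /** Removes leading line numbers such as "3: " from every line. */
    static String stripLineNumbers(String text) {
        return LINE_NUMBER.matcher(text).replaceAll("");
    }
}
```

A heuristic like LINE_NUMBER would also strip a code line that happens to start with a numeric literal, which is one reason a tolerant parser is still needed afterwards.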

Page 19:

Resolving External Code References

Naïve approach of finding term matches does not work:

1 import java.util.*;
2 class c {
3 HashMap m = new HashMap();
4 void f() { m.clear(); }
5 }

Reference java.util.HashMap.clear() on line 4 only detectable by considering several lines

Use compiler to identify unresolved names
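One way to let a compiler report unresolved names is the standard javax.tools API that ships with the JDK. The sketch below is not Assieme's implementation: the class name UnresolvedNames is hypothetical, and filtering on the diagnostic keys "compiler.err.cant.resolve…" and "compiler.err.doesnt.exist" is an assumption about javac's diagnostic codes that may need adjusting for other compilers.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

/** Sketch: compile a code sample and collect the names javac cannot resolve. */
final class UnresolvedNames {
    /** Keeps the source in memory so no source file has to be written. */
    private static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String fileName, String code) {
            super(URI.create("string:///" + fileName), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    static List<String> find(String fileName, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system Java compiler; a JDK is required");
        }
        String outDir;
        try {
            // Class files of samples that do compile go to a scratch directory.
            outDir = Files.createTempDirectory("sample-classes").toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        compiler.getTask(null, null, diagnostics, List.of("-d", outDir, "-proc:none"), null,
                List.of(new StringSource(fileName, code))).call();

        List<String> names = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            String key = String.valueOf(d.getCode());
            boolean unresolved = key.startsWith("compiler.err.cant.resolve")
                    || key.equals("compiler.err.doesnt.exist");
            int start = (int) d.getStartPosition();
            if (d.getKind() != Diagnostic.Kind.ERROR || !unresolved
                    || start < 0 || start >= code.length()) {
                continue;
            }
            // Read the (possibly dotted) name that starts at the error position.
            int end = start;
            while (end < code.length() && (code.charAt(end) == '.'
                    || Character.isJavaIdentifierPart(code.charAt(end)))) {
                end++;
            }
            names.add(code.substring(start, end));
        }
        return names;
    }
}
```

The sample on this slide compiles against the JDK alone, so it yields no unresolved names; a sample that uses a class from a third-party library would yield that class name, which can then be looked up in an index of JAR contents.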

Page 20:

Resolving External Code References

• Index packages/types/members in JAR files (java.util.HashMap.clear(), java.util.HashMap, …)
• Compile & lookup
  – compile the code sample to obtain unresolved names
  – look up the unresolved names in the index
  – greedily pick best JARs (utility function: # covered references and JAR popularity)
  – put the picked JARs on the classpath
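The "greedily pick best JARs" step resembles greedy set cover. The following Java sketch shows one possible reading of it. How the utility function combines covered references with JAR popularity is not stated on the slide, so popularity is used here only as a tie-breaker, and the names JarPicker and Jar are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of greedy JAR selection; the tie-breaking rule is an assumption. */
final class JarPicker {
    /** A JAR, the fully qualified names it defines, and an assumed popularity score. */
    static final class Jar {
        final String name;
        final Set<String> provides;
        final double popularity;
        Jar(String name, Set<String> provides, double popularity) {
            this.name = name;
            this.provides = provides;
            this.popularity = popularity;
        }
    }

    /** Repeatedly picks the JAR that covers the most still-unresolved names. */
    static List<Jar> pick(Set<String> unresolved, List<Jar> candidates) {
        Set<String> remaining = new HashSet<>(unresolved);
        List<Jar> chosen = new ArrayList<>();
        while (!remaining.isEmpty()) {
            Jar best = null;
            long bestCovered = 0;
            for (Jar jar : candidates) {
                long covered = jar.provides.stream().filter(remaining::contains).count();
                // Prefer more covered references; break ties by popularity.
                boolean better = covered > bestCovered
                        || (covered > 0 && covered == bestCovered
                            && jar.popularity > best.popularity);
                if (better) {
                    best = jar;
                    bestCovered = covered;
                }
            }
            if (best == null) {
                break; // no candidate covers any remaining name
            }
            chosen.add(best);
            remaining.removeAll(best.provides);
        }
        return chosen;
    }
}
```

The chosen JARs would then be put on the classpath for the next compile. Greedy selection does not guarantee the smallest possible set of JARs, but it is simple and cheap to run per code sample.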

Page 21:

Scoring

• Existing techniques …
  – Docs modeled as weighted term frequencies
  – Hypertext link analysis (PageRank)
• … do not work well for code, because:
  – JAR files (binary code) provide no context
  – Source code contains few relevant keywords
  – Structure in code important for relevance

Page 22:

Using Implicit References to Improve Scoring

• Assieme exploits structure on Web pages (HTML hyperlinks) and structure in code (code references)

Page 23:

Scoring

APIs (packages/types/members)

Web pages

Page 24:

Scoring

APIs
• Use text on doc pages and on pages with code samples that reference API (~ anchor text)
• Weight APIs by # incoming refs (~ PageRank)

Web Pages
• Use fully qualified references (java.util.HashMap) and adjust term weights
• Filter pages by references
• Favor pages with accompanying text
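To make "weight APIs by # incoming refs" concrete, here is a small Java sketch. It only counts how many pages reference each API and folds that count into a text score. The slides do not give the scoring formula, so the logarithmic boost in score() and the names ApiScoring, incomingRefs and score are purely illustrative assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of reference-based API weighting; the formula is an assumption. */
final class ApiScoring {
    /** Counts, for each fully qualified API name, how many pages reference it. */
    static Map<String, Integer> incomingRefs(List<Set<String>> referencesPerPage) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> refs : referencesPerPage) {
            for (String api : refs) {
                counts.merge(api, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Hypothetical combination: the text score is boosted by log(1 + #refs). */
    static double score(double textScore, int incomingRefs) {
        return textScore * (1.0 + Math.log1p(incomingRefs));
    }
}
```

The logarithm keeps a very widely used API from drowning out a better textual match, which is a common design choice for popularity signals but not something the slides confirm.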


Page 26:

Evaluating Code Extraction and Reference Resolution

… on 350 hand-labeled pages from Assieme’s data

Reference Resolution
• Recall 89.6%, Precision 86.5%
• False positives: FishEye and diff pages
• False negatives: incomplete code samples

Code Extraction
• Recall 96.9%, Precision 50.1% (76.7%)
• False positives: C, C#, JavaScript, PHP, FishEye/diff
• (After filtering pages without refs: precision 76.7%)
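For reference, the standard definitions behind these figures, written with the usual symbols TP (true positives), FP (false positives) and FN (false negatives). The symbols are not on the slide and are introduced here only for illustration.

```latex
\[
\text{Precision} = \frac{TP}{TP + FP},
\qquad
\text{Recall} = \frac{TP}{TP + FN}
\]
```

Under these definitions, false positives such as code in other languages lower precision, while false negatives such as incomplete code samples lower recall.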

Page 27:

User Study

Assieme vs. Google vs. Google Code Search

Design
• 40 search tasks based on queries in logs, e.g. the query “socket java” became the task “Write a basic server that communicates using Sockets”
• Find code samples (and required libraries)
• 4 blocks of 10 tasks: 1 for training + 1 per interface

Participants
• 9 (under-)graduate students in Computer Science

Page 28:

User Study – Task Time

[Bar chart: task time in seconds (SEM) for Assieme, Google, and GCS; y-axis from 0 to 150]

F(1,258)=5.74, p ≈ .017
F(1,258)=1.91, p ≈ .17

* significant

Page 29:

User Study – Solution Quality

Rating scale:
• 0: seriously flawed
• .5: generally good but fell short in critical regard
• 1: fairly complete

[Bar chart: solution quality (SEM) for Assieme, Google, and GCS; y-axis from 0.0 to 1.0; two differences marked *]

F(1,258)=55.5, p < .0001
F(1,258)=6.29, p ≈ .013

Page 30:

User Study – # Queries Issued

[Bar chart: # queries (SEM) for Assieme, Google, and GCS; y-axis from 0.0 to 2.5; two differences marked *]

F(1,259)=9.77, p ≈ .002
F(1,259)=6.85, p ≈ .001


Page 32:

Discussion & Conclusion

• Assieme: a novel web search interface
• Programmers obtain better solutions, using fewer queries, in the same amount of time
• Using Google, subjects visited 3.3 pages/task; using Assieme, only 0.27 pages, but 4.3 previews
• Ability to quickly view code samples changed participants’ strategies

Page 33:

Thank You

Raphael Hoffmann
Computer Science & Engineering
University of Washington

James Fogarty
Computer Science & Engineering
University of Washington

Daniel S. Weld
Computer Science & Engineering
University of Washington

This material is based upon work supported by the National Science Foundation under grant IIS-0307906, by the Office of Naval Research under grant N00014-06-1-0147, SRI International under CALO grant 03-000225 and the Washington Research Foundation / TJ Cable Professorship.