
Predicting Reuse of End-User Web Macro Scripts

Chris Scaffidi (1, 2), Chris Bogart (2), Margaret Burnett (2), Allen Cypher (3), Brad Myers (1), Mary Shaw (1)

(1) Carnegie Mellon University
(2) Oregon State University
(3) IBM-Almaden

Repositories of end-user code: The good, the great, and the “other”

C. Bogart, et al. End-User Programming in the Wild: A Field Study of CoScripter Scripts. VL/HCC 2008.

Previous study:

Of 1445 CoScripter macros…
• ~10% had many runs
• ~10% had many users
• ~80% were “other”

This is the largest web macro repository: > 6000 users, > 3000 “public” scripts

Problem Traits Predictions Conclusion

What if our repositories could…

• … omit pieces of code from search results if they are unlikely to be reused, anyway?

• ... provide a UI for administrators to review (and remove?) old code that’s unlikely to be used?

• … advise programmers, when they upload code, about how to improve the reusability of their code?


So how do we separate the wheat from the chaff?

• Providing such features requires predicting whether code will ever be reused
  – Without relying on information that’s available after code is reused (“chicken and egg”)
    • Ratings, reviews, etc…
    • (For some features, of course, we can always add this information in later.)
  – With a fairly simple model for making predictions
    • So that predictions can be explained to users
    • Especially when we’re advising users about how to improve reusability of their code!


Needed: a model for predicting reuse

• Key questions for discovering such a model…
  – What information about the code indicates reusability?
  – How do we combine this information to predict reuse?

• Similar models have been successful on OO code
  – Predicting reuse based on coupling & cohesion
  – Predicting bugginess based on code complexity metrics, information about code authors, code churn, …

Web macros are much simpler (don’t call each other, don’t have loops, etc)… we need different information here.


Approach

• Approach:
  – Consider the steps required for reusing code
  – Identify macro traits that might support reusing code
  – Empirically test whether code with these traits is more likely to be reused
  – Empirically test whether these traits together can accurately predict reuse (using machine learning)


What are the traits of reusable web macros?

• Four fundamental steps of reuse in general:
  – Finding code
  – Understanding it
  – Modifying it
  – Composing it

• We expect that code is more reusable if it does not need modification to be reused.

• Users rarely combine CoScripter web macros.

• Traits should support finding, understanding, and not needing to modify.


We identified 35 candidate traits in 8 categories

• Mass appeal – e.g. popular keywords (F)
• Language – e.g. data values are in English (U)
• Annotations – e.g. comments (U)
• Flexibility – e.g. parameterization (variables) (M)
• Length – e.g. small # distinct lines of code (U, M)
• Author information – e.g. at IBM IP address (M)
• Advanced syntax – e.g. “control-click” keyword (U, M)
• No preconditions – e.g. no cookies needed (M)

F = findability, U = understandability, M = not modifying

All candidate trait values are computed fully automatically.


Getting some data to work with

• Extracted 6 months of IBM wiki data
  – Source code & usage logs for 937 public scripts
  – Four (binary) measures of reuse
    • Execution by author > 24 hours after initial creation
    • Execution by any other user
    • Editing by any other user
    • Clone/copy-paste by any other user
  – Why not use non-binary, absolute # of reuse counts?
    • Macros that call themselves (infinite loops)
    • Macros called periodically by other (non-macro) programs
    • Information cascades: popularity leads to popularity (purely an artifact of the wiki’s UI)

(But we come back to absolute numbers later on…)
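The four binary measures can be illustrated with a small sketch. The log format here is an assumption made for illustration: the `Event` record, the action names `"run"`, `"edit"` and `"clone"`, and the function `reuse_measures` are hypothetical and do not reflect the actual CoScripter wiki logs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """One usage-log entry for a script (hypothetical schema)."""
    user: str
    action: str      # assumed action names: "run", "edit", "clone"
    when: datetime


def reuse_measures(author, created, events):
    """Return the four binary reuse measures for a single script."""
    day_later = created + timedelta(hours=24)

    def by_other(action):
        # True if any user other than the author performed this action.
        return any(e.action == action and e.user != author for e in events)

    return {
        "self_run_after_24h": any(
            e.action == "run" and e.user == author and e.when > day_later
            for e in events
        ),
        "other_run": by_other("run"),
        "other_edit": by_other("edit"),
        "other_clone": by_other("clone"),
    }


# Made-up log for one script written by "alice".
created = datetime(2008, 1, 1, 9, 0)
events = [
    Event("alice", "run", created + timedelta(hours=1)),
    Event("alice", "run", created + timedelta(hours=30)),
    Event("bob", "clone", created + timedelta(days=3)),
]
measures = reuse_measures("alice", created, events)
print(measures)
```

Because each measure is a yes/no flag, a macro that ran once and a macro that ran ten thousand times in a loop both count simply as "run", which is the reason given above for avoiding absolute counts.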


Testing for correspondence

• For each candidate trait, divide scripts into two groups
  – For boolean traits, based on true/false
  – For numerical traits, based on above/below mean

• Performed z-test of proportions:
  – Does the trait correspond as expected to higher likelihood of reuse?
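A minimal sketch of such a test follows, assuming the standard pooled two-proportion z-test with a one-sided alternative; the slides do not say which variant was used. The function names and the counts in the example are made up. As a side note, with 35 traits and 4 reuse measures there are 140 tests, and 0.05 / 140 ≈ 0.00036, which would be consistent with a Bonferroni-style correction, though the slides do not state this.

```python
import math


def counts_by_mean_split(values, reused):
    """Split scripts at the trait's mean; return (reused, total) per side."""
    mean = sum(values) / len(values)
    above = [r for v, r in zip(values, reused) if v > mean]
    below = [r for v, r in zip(values, reused) if v <= mean]
    return (sum(above), len(above)), (sum(below), len(below))


def z_test_proportions(reused_a, n_a, reused_b, n_b):
    """One-sided pooled z-test: is the reuse rate in group A higher than in B?"""
    p_a = reused_a / n_a
    p_b = reused_b / n_b
    pooled = (reused_a + reused_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Made-up counts: 60 of 100 scripts with the trait were reused,
# versus 40 of 100 scripts without it.
z, p = z_test_proportions(60, 100, 40, 100)
print(f"z = {z:.3f}, one-sided p = {p:.5f}")

# Made-up numerical trait, split at its mean (3.5 here).
(ra, na), (rb, nb) = counts_by_mean_split(
    [0, 0, 1, 1, 5, 6, 7, 8], [0, 0, 0, 1, 1, 1, 1, 0]
)
```

The sketch does not guard against a zero standard error, which occurs when every script or no script in both groups was reused.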


We found many traits that empirically corresponded to reuse.

• Traits significant at p<0.00036 wrt at least one reuse measure
  – If websites hit by the macro contain certain keywords
  – If the macro was intended by IBM as a “tutorial” script
  – Number of comments in the macro’s code
  – If the macro has a title
  – Number of parameters in the macro
  – Number of literals hard-coded in the macro
  – Number of distinct lines of code in the macro
  – ID number of the macro author (indicates early adopter)
  – ID number of the script (generally lower for early adopters)
  – If the author was at an IBM IP address
  – Number of author’s previous scripts that had been reused
  – If the macro used ordinal advanced syntax
  – If the macro used “control-click”/“control-select” syntax
  – If the macro required user to be at a certain URL prior to run
  – If the macro hits a lot of different websites


Mass appeal traits

Annotation traits

Length traits

Traits hinting higher author expertise

Use of advanced syntax


These traits are “raw materials” for a predictive model.

• A model of the form

reuse-measure = F(trait1, trait2, …, traitN)

– For starters, continue to use binary reuse measures.

– Approachable with supervised machine learning.

– F should be pretty simple, so that we can generate those explanations.


Model that we developed (in words & pictures)

• For each trait
  – Find the threshold that optimally divides the reused macros from the un-reused macros
  – Retain trait only if its optimal divide does a good job of dividing reused macros from un-reused macros

We call each trait-based constraint a “reuse predictor”.

[Figure: two plots, each labeled “Trait level” and “Threshold”]


Predicting if a macro will be reused

• Count how many predictors are satisfied

• Predict that the macro will be reused if this count exceeds some minimum
  – Also a tunable parameter
  – A higher minimum implies a higher bar that a macro must clear to be predicted as reused
    • Fewer false positives, more false negatives
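The counting rule and the effect of the minimum can be sketched as below. The predictors are the illustrative ones from the deck's Example slide; the six macros and their reuse labels are invented purely for the demonstration, and "exceeds" is read literally as strictly greater than the minimum, which is an assumption.

```python
import operator

# Illustrative predictors from the Example slide: (trait, comparison, threshold).
PREDICTORS = [
    ("comments", operator.ge, 3),
    ("inet_urls", operator.le, 1),
    ("prev_created", operator.ge, 10),
    ("literals", operator.le, 4),
]


def satisfied_count(traits):
    """Count how many reuse predictors a macro satisfies."""
    return sum(1 for name, op, thr in PREDICTORS if op(traits[name], thr))


def predict_reused(traits, minimum):
    """Predict reuse when the count exceeds the minimum (strict, by assumption)."""
    return satisfied_count(traits) > minimum


def rates(macros, minimum):
    """Return (true positive rate, false positive rate) on labeled macros."""
    tp = sum(1 for t, reused in macros if reused and predict_reused(t, minimum))
    fp = sum(1 for t, reused in macros if not reused and predict_reused(t, minimum))
    positives = sum(1 for _, reused in macros if reused)
    negatives = len(macros) - positives
    return tp / positives, fp / negatives


def macro(comments, inet_urls, prev_created, literals):
    return {"comments": comments, "inet_urls": inet_urls,
            "prev_created": prev_created, "literals": literals}


# Invented data: (trait values, whether the macro was actually reused).
MACROS = [
    (macro(5, 0, 12, 2), True),
    (macro(3, 1, 2, 9), True),
    (macro(4, 0, 15, 8), True),
    (macro(0, 0, 0, 3), False),
    (macro(0, 4, 1, 10), False),
    (macro(1, 2, 0, 2), False),
]

for minimum in range(4):
    tpr, fpr = rates(MACROS, minimum)
    print(f"minimum={minimum}: TPR={tpr:.2f} FPR={fpr:.2f}")
```

On this toy data, raising the minimum drives the false positive rate down first and then starts to cost true positives, which is the trade-off the slide describes.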


Example

• E.g.: Suppose the predictors were…
  – comments ≥ 3
  – inet_urls ≤ 1
  – prev_created ≥ 10
  – literals ≤ 4

• Explanations might someday be formed like,
  – “You might be able to raise the reusability of this macro by providing a few more comments and by replacing some literals with variables.” Show me how.
  – “Macros in the search results have many comments, experienced authors, few inaccessible URLs, and few hardcoded literals.” Tell me more.
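One way such an explanation could be assembled is to list the predictors a macro fails and attach a suggestion to each. This is only a sketch: the `advice` function and the suggestion phrases are hypothetical, and the choice to give no suggestion for `prev_created` (an author cannot change their history by editing one macro) is an assumption.

```python
import operator

# (trait, comparison, threshold, hypothetical suggestion text or None)
PREDICTORS = [
    ("comments", operator.ge, 3, "providing a few more comments"),
    ("inet_urls", operator.le, 1, "avoiding URLs that other users cannot reach"),
    ("prev_created", operator.ge, 10, None),  # not actionable for a single macro
    ("literals", operator.le, 4, "replacing some literals with variables"),
]


def advice(traits):
    """Build an advice sentence from the unsatisfied, actionable predictors."""
    tips = [
        tip
        for name, op, threshold, tip in PREDICTORS
        if tip is not None and not op(traits[name], threshold)
    ]
    if not tips:
        return None
    return (
        "You might be able to raise the reusability of this macro by "
        + " and by ".join(tips)
        + "."
    )


# A made-up macro with too few comments and too many hard-coded literals.
message = advice({"comments": 1, "inet_urls": 0, "prev_created": 2, "literals": 7})
print(message)
```

Keeping each predictor as a single threshold on a single trait is what makes this kind of direct mapping from failed predictor to suggestion possible.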


Algorithm accuracy with varying values for tunable parameters

[Figure: True Positive Rate plotted against False Positive Rate, with a callout for “Alternate machine learning algorithms”]


Absolute level of reuse rose sharply with the number of matches




Conclusions and future work

• Traits contain enough information to predict reuse
  – Can we improve accuracy by tweaking how predictors are built and selected?
  – Can we improve accuracy by using more traits and/or information available after reuse is attempted?
  – Can we generalize to other kinds of programs?
  – Can we also predict reusability?

• Predictions combine trait data fairly simply
  – Work with IBM to enhance the CoScripter wiki
    • Improving the search results
    • Providing UI for administrators to review macros
    • Giving programmers advice automatically


Thank You

• To the VL/HCC for this opportunity

• To the EUSES Consortium for feedback

• To NSF for funding


On closer examination…

Many rarely-reused macros…

• Reference non-public (intranet) URLs
• Assume the browser is at a certain URL
• Require user to be logged into a site
• Hardcode values for form fields

Typical activities toward achieving end-user programming goals

• Create a new end-user program from scratch

• Clone or copy-paste from existing end-user program

• Tweak code

• Programmatically call existing end-user code (rare)

• Manually run a series of existing end-user programs

Or any combination of the above.


Note: 4 activities reuse or operate on existing code.

Details of picking thresholds and choosing whether to retain traits

Pick trait’s threshold by maximizing difference between
(fraction of above-threshold macros that are reused) & (fraction of below-threshold macros that are reused)

Retain trait if its difference exceeds a minimal distance

• Raising the minimum means that
  – Only the best traits are retained
  – Information can be lost

Similar to a z-test, but simpler to explain

“95% of macros with >3 comments were reused, versus only 24% of macros with fewer comments”
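The threshold search described on this slide can be sketched as follows. Several details are assumptions rather than the authors' stated procedure: candidate thresholds are the observed trait values, "above-threshold" is taken to mean value ≥ threshold, and a negative difference is interpreted as a predictor pointing the other way (such as "literals ≤ 4"). The function names and the toy data are made up.

```python
def reuse_fraction(flags):
    """Fraction of macros in a group that were reused."""
    return sum(flags) / len(flags)


def best_threshold(values, reused):
    """Return (threshold, difference) with the largest gap in reuse fractions.

    difference = fraction reused at or above the threshold
                 minus fraction reused below it.
    """
    best = None
    # Skip the lowest observed value so that both groups are non-empty.
    for threshold in sorted(set(values))[1:]:
        above = [r for v, r in zip(values, reused) if v >= threshold]
        below = [r for v, r in zip(values, reused) if v < threshold]
        diff = reuse_fraction(above) - reuse_fraction(below)
        if best is None or abs(diff) > abs(best[1]):
            best = (threshold, diff)
    return best


def reuse_predictor(name, values, reused, min_distance):
    """Return (trait, direction, threshold), or None if the trait is dropped."""
    found = best_threshold(values, reused)
    if found is None or abs(found[1]) <= min_distance:
        return None
    threshold, diff = found
    direction = ">=" if diff > 0 else "<"
    return (name, direction, threshold)


# Made-up trait values and reuse flags for eight and six macros.
comments = [0, 1, 1, 2, 4, 5, 6, 7]
comments_reused = [0, 0, 1, 0, 1, 1, 1, 1]
literals = [1, 2, 3, 8, 9, 10]
literals_reused = [1, 1, 1, 0, 0, 1]

print(best_threshold(comments, comments_reused))
print(reuse_predictor("comments", comments, comments_reused, 0.5))
print(reuse_predictor("literals", literals, literals_reused, 0.5))
```

With the toy comment counts, all macros with at least 4 comments were reused versus 25% of the rest, so the trait is retained at a minimal distance of 0.5 and dropped at 0.8, illustrating how raising the minimum keeps only the strongest traits.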

