Predicting Reuse of End-User Web Macro Scripts
Chris Scaffidi 1,2, Chris Bogart 2, Margaret Burnett 2, Allen Cypher 3, Brad Myers 1, Mary Shaw 1
1 Carnegie Mellon University
2 Oregon State University
3 IBM-Almaden
Repositories of end-user code: The good, the great, and the “other”
C. Bogart, et al. End-User Programming in the Wild: A Field Study of CoScripter Scripts. VL/HCC 2008.
Previous study:
Of 1445 CoScripter macros…
• ~10% had many runs
• ~10% had many users
• ~80% were “other”
This is the largest web macro repository: > 6000 users, > 3000 “public” scripts
Problem Traits Predictions Conclusion
What if our repositories could…
• … omit pieces of code from search results if they are unlikely to be reused, anyway?
• ... provide a UI for administrators to review (and remove?) old code that’s unlikely to be used?
• … advise programmers, when they upload code, about how to improve the reusability of their code?
So how do we separate the wheat from the chaff?
• Providing such features requires predicting whether code will ever be reused
  – Without relying on information that’s available after code is reused (“chicken and egg”)
    • Ratings, reviews, etc…
    • (For some features, of course, we can always add this information in later.)
  – With a fairly simple model for making predictions
    • So that predictions can be explained to users
    • Especially when we’re advising users about how to improve reusability of their code!
Needed: a model for predicting reuse
• Key questions for discovering such a model…
  – What information about the code indicates reusability?
  – How do we combine this information to predict reuse?
• Similar models have been successful on OO code
  – Predicting reuse based on coupling & cohesion
  – Predicting bugginess based on code complexity metrics, information about code authors, code churn, …
Web macros are much simpler (don’t call each other, don’t have loops, etc)… we need different information here.
Approach
• Approach:
  – Consider the steps required for reusing code
  – Identify macro traits that might support reusing code
  – Empirically test whether code with these traits is more likely to be reused
  – Empirically test whether these traits together can accurately predict reuse (using machine learning)
What are the traits of reusable web macros?
• Four fundamental steps of reuse in general:
  – Finding code
  – Understanding it
  – Modifying it
  – Composing it
• We expect that code is more reusable if it does not need modification to be reused.
• Users rarely combine CoScripter web macros.
• Traits should support finding, understanding, and not needing to modify.
We identified 35 candidate traits in 8 categories
• Mass appeal – e.g., popular keywords (F)
• Language – e.g., data values are in English (U)
• Annotations – e.g., comments (U)
• Flexibility – e.g., parameterization (variables) (M)
• Length – e.g., small # distinct lines of code (U, M)
• Author information – e.g., at IBM IP address (M)
• Advanced syntax – e.g., “control-click” keyword (U, M)
• No preconditions – e.g., no cookies needed (M)
F = findability, U = understandability, M = not modifying
All candidate trait values are computed fully automatically.
Getting some data to work with
• Extracted 6 months of IBM wiki data
  – Source code & usage logs for 937 public scripts
  – Four (binary) measures of reuse
    • Execution by author > 24 hours after initial creation
    • Execution by any other user
    • Editing by any other user
    • Clone/copy-paste by any other user
  – Why not use non-binary, absolute # of reuse counts?
    • Macros that call themselves (infinite loops)
    • Macros called periodically by other (non-macro) programs
    • Information cascades: popularity leads to popularity (purely an artifact of the wiki’s UI)
(But we come back to absolute numbers later on…)
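As a rough illustration of the four binary reuse measures listed on this slide, here is a minimal Python sketch. The log schema is an assumption made for the example: the `Event` record, the action labels "run", "edit", and "clone", and the function name `reuse_measures` are hypothetical and do not come from the CoScripter wiki.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    """One usage-log entry for a script (hypothetical schema)."""
    user: str
    action: str  # assumed labels: "run", "edit", or "clone"
    when: datetime


def reuse_measures(author, created, events):
    """Return the four binary reuse measures for a single script."""
    day_later = created + timedelta(hours=24)
    return {
        # Execution by the author more than 24 hours after creation
        "self_run_after_24h": any(
            e.action == "run" and e.user == author and e.when > day_later
            for e in events
        ),
        # Execution by any other user
        "run_by_other": any(
            e.action == "run" and e.user != author for e in events
        ),
        # Editing by any other user
        "edit_by_other": any(
            e.action == "edit" and e.user != author for e in events
        ),
        # Clone/copy-paste by any other user
        "clone_by_other": any(
            e.action == "clone" and e.user != author for e in events
        ),
    }
```

Because each measure is a yes/no flag, a macro that calls itself thousands of times counts the same as one that was run once, which is the point of avoiding absolute counts here.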
Testing for correspondence
• For each candidate trait, divide scripts into two groups
  – For boolean traits, based on true/false
  – For numerical traits, based on above/below mean
• Performed z-test of proportions:
  – Does the trait correspond as expected to higher likelihood of reuse?
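The grouping and test described on this slide might look like the following sketch. The slides do not say which variant of the z-test was used, so the pooled standard error and the one-sided p-value are assumptions, and the function names are hypothetical.

```python
import math


def split_by_mean(trait_values, reused_flags):
    """Split scripts at the trait's mean; return reuse counts per group."""
    mean = sum(trait_values) / len(trait_values)
    above = [r for v, r in zip(trait_values, reused_flags) if v > mean]
    below = [r for v, r in zip(trait_values, reused_flags) if v <= mean]
    return sum(above), len(above), sum(below), len(below)


def z_test_proportions(reused_a, total_a, reused_b, total_b):
    """Two-proportion z-test that group A is reused more often than group B.

    Assumes a pooled standard error and a one-sided alternative.
    Raises ZeroDivisionError if nobody or everybody was reused.
    """
    p_a = reused_a / total_a
    p_b = reused_b / total_b
    pooled = (reused_a + reused_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return z, p_value
```

For a trait where a lower value is expected to help (for example, fewer hard-coded literals), the two groups would be passed in the opposite order.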
We found many traits that empirically corresponded to reuse.
• Traits significant at p<0.00036 wrt at least one reuse measure
  – If websites hit by the macro contain certain keywords
  – If the macro was intended by IBM as a “tutorial” script
  – Number of comments in the macro’s code
  – If the macro has a title
  – Number of parameters in the macro
  – Number of literals hard-coded in the macro
  – Number of distinct lines of code in the macro
  – ID number of the macro author (indicates early adopter)
  – ID number of the script (generally lower for early adopters)
  – If the author was at an IBM IP address
  – Number of author’s previous scripts that had been reused
  – If the macro used ordinal advanced syntax
  – If the macro used “control-click”/“control-select” syntax
  – If the macro required user to be at a certain URL prior to run
  – If the macro hits a lot of different websites
Slide callouts grouping these traits: mass appeal traits, annotation traits, length traits, traits hinting at higher author expertise, use of advanced syntax.
These traits are “raw materials” for a predictive model.
• A model of the form
reuse-measure = F(trait1, trait2, …, traitN)
– For starters, continue to use binary reuse measures.
– Approachable with supervised machine learning.
– F should be pretty simple, so that we can generate those explanations.
Model that we developed (in words & pictures)
• For each trait
  – Find the threshold that optimally divides the reused macros from the un-reused macros
  – Retain trait only if its optimal divide does a good job of dividing reused macros from un-reused macros
We call each trait-based constraint a “reuse predictor”.
[Figure: two plots, each with a “Trait level” axis and a marked “Threshold”.]
Predicting if a macro will be reused
• Count how many predictors are satisfied
• Predict that the macro will be reused if this count exceeds some minimum
  – Also a tunable parameter
  – A higher minimum implies a higher bar that a macro must overcome to be predicted as reused
    • Fewer false positives, more false negatives
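The counting rule on this slide can be sketched in a few lines of Python. The predictor encoding (trait name, comparison, threshold) and the function names are hypothetical, and reading “exceeds some minimum” as “at least the minimum” is an assumption of this sketch.

```python
import operator

# Comparison symbols allowed in a predictor (assumed encoding).
OPS = {">=": operator.ge, "<=": operator.le}


def count_satisfied(traits, predictors):
    """Count how many (name, op, threshold) predictors the macro satisfies."""
    return sum(
        1 for name, op, threshold in predictors
        if OPS[op](traits[name], threshold)
    )


def predict_reuse(traits, predictors, minimum):
    """Predict reuse when enough predictors are satisfied.

    Assumption: the count must be at least `minimum`.
    """
    return count_satisfied(traits, predictors) >= minimum
```

Raising `minimum` makes a positive prediction harder to earn, which trades false positives for false negatives.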
Example
• E.g.: Suppose the predictors were…
  comments ≥ 3, inet_urls ≤ 1, prev_created ≥ 10, literals ≤ 4
• Explanations might someday be formed like,
  – “You might be able to raise the reusability of this macro by providing a few more comments and by replacing some literals with variables.” Show me how.
– “Macros in the search results have many comments, experienced authors, few inaccessible URLs, and few hardcoded literals.” Tell me more.
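To make the example concrete, the sketch below checks a macro against the four predictors named on this slide and assembles advice from the ones it fails. The advice phrases, the `ADVICE` table, and the function names are hypothetical; the reading of `inet_urls` as a count of URLs other users cannot reach is an assumption based on the second explanation.

```python
import operator

# The four example predictors from the slide: (trait, comparison, threshold).
PREDICTORS = [
    ("comments", operator.ge, 3),
    ("inet_urls", operator.le, 1),
    ("prev_created", operator.ge, 10),
    ("literals", operator.le, 4),
]

# Hypothetical advice phrases, only for traits an author can change.
ADVICE = {
    "comments": "providing a few more comments",
    "inet_urls": "avoiding URLs that other users cannot reach",
    "literals": "replacing some literals with variables",
}


def unmet(traits):
    """Names of the predictors this macro does not satisfy."""
    return [
        name for name, op, threshold in PREDICTORS
        if not op(traits[name], threshold)
    ]


def explain(traits):
    """Build an advice sentence from the unmet, actionable predictors."""
    tips = [ADVICE[name] for name in unmet(traits) if name in ADVICE]
    if not tips:
        return ""
    return (
        "You might be able to raise the reusability of this macro by "
        + " and by ".join(tips)
        + "."
    )
```

A macro with 1 comment and 9 literals from an author with 12 earlier scripts fails only the `comments` and `literals` predictors, so `explain` produces the first sentence quoted on the slide. `prev_created` has no advice entry because an author cannot change it by editing the macro.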
Algorithm accuracy with varying values for tunable parameters
[Figure: plot of True Positive Rate against False Positive Rate; one label reads “Alternate machine learning algorithms”.]
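One way to produce points for a plot like this is to sweep the minimum number of satisfied predictors and record the resulting rates. This is a sketch under assumptions: the slides do not say how the curve was generated, the function name `roc_points` is hypothetical, and a prediction is taken to be positive when the match count is at least the minimum.

```python
def roc_points(match_counts, reused, max_minimum):
    """Return (minimum, false positive rate, true positive rate) tuples.

    match_counts: number of predictors each macro satisfies
    reused:       whether each macro was actually reused (bools)
    Requires at least one reused and one un-reused macro.
    """
    positives = sum(reused)
    negatives = len(reused) - positives
    points = []
    for minimum in range(max_minimum + 1):
        predicted = [count >= minimum for count in match_counts]
        tp = sum(p and r for p, r in zip(predicted, reused))
        fp = sum(p and not r for p, r in zip(predicted, reused))
        points.append((minimum, fp / negatives, tp / positives))
    return points
```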
Absolute level of reuse rose sharply with the number of matches
Conclusions and future work
• Traits contain enough information to predict reuse
  – Can we improve accuracy by tweaking how predictors are built and selected?
  – Can we improve accuracy by using more traits and/or information available after reuse is attempted?
  – Can we generalize to other kinds of programs?
  – Can we also predict reusability?
• Predictions combine trait data fairly simply
  – Work with IBM to enhance the CoScripter wiki
    • Improving the search results
    • Providing UI for administrators to review macros
    • Giving programmers advice automatically
Thank You
• To the VL/HCC for this opportunity
• To the EUSES Consortium for feedback
• To NSF for funding
On closer examination…
Many rarely-reused macros…
• Reference non-public (intranet) URLs
• Assume the browser is at a certain URL
• Require user to be logged into a site
• Hardcode values for form fields
Typical activities toward achieving end-user programming goals
• Create a new end-user program from scratch
• Clone or copy-paste from existing end-user program
• Tweak code
• Programmatically call existing end-user code (rare)
• Manually run a series of existing end-user programs
Or any combination of the above.
Note: 4 activities reuse or operate on existing code.
Details of picking thresholds and choosing whether to retain traits
Pick trait’s threshold by maximizing difference between
(fraction of above-threshold macros that are reused) & (fraction of below-threshold macros that are reused)
Retain trait if its difference exceeds a minimal distance
• Raising the minimum means that
  – Only the best traits are retained
  – Information can be lost
Similar to a z-test, but simpler to explain
“95% of macros with >3 comments were reused, versus only 24% of macros with fewer comments”
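The threshold search described on this slide can be sketched as a scan over candidate thresholds. This is a minimal illustration, not the authors' code: the function names are hypothetical, candidate thresholds are taken from the observed trait values, and the sketch assumes higher trait values are the ones expected to favor reuse.

```python
def best_threshold(values, reused):
    """Return (threshold, difference) maximizing
    (reused fraction above threshold) - (reused fraction at or below it).

    Returns None if no threshold leaves macros on both sides.
    """
    best = None
    for t in sorted(set(values)):
        above = [r for v, r in zip(values, reused) if v > t]
        below = [r for v, r in zip(values, reused) if v <= t]
        if not above or not below:
            continue  # a usable threshold needs macros on both sides
        diff = sum(above) / len(above) - sum(below) / len(below)
        if best is None or diff > best[1]:
            best = (t, diff)
    return best


def retain(values, reused, min_difference):
    """Keep the trait only if its best difference exceeds the minimum."""
    best = best_threshold(values, reused)
    return best is not None and best[1] > min_difference
```

For a trait where lower values are expected to favor reuse (such as the number of hard-coded literals), one option is to negate the values before calling `best_threshold`.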