advancing foundation and practice of software analytics
DESCRIPTION
Vision Statement Presentation on "Advancing Foundation & Practice of Software Analytics" at the 2nd International NSF sponsored Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE 2013) http://promisedata.org/raise/2013/
TRANSCRIPT
Advancing Foundation & Practice of Software Analytics
Tao Xie
North Carolina State University
with Dongmei Zhang (Microsoft Research Asia)
Xusheng Xiao (North Carolina State University)
Chunhua Weng (Columbia University)
RAISE 2013
Software Analytics
Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services.
Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie. Software Analytics as a Learning Case in Practice: Approaches and Experiences. In Proc. MALETS 2011.
MSRA Software Analytics group founded in May 2009
Term coined/defined, expanding scope of previous work [Buse and Zimmermann, FoSER 10][Hassan and Xie, FoSER 10]
http://research.microsoft.com/en-us/groups/sa/
http://research.microsoft.com/en-us/news/features/softwareanalytics-052013.aspx
ICSE 2013
Five Dimensions
Research Topics
Technology Pillars
Target Audience
Connection to Practice
Output
Research Topics – the Trinity View
Software Users
Software Development Process
Software System
• Covering different areas of software domain
• Throughout entire development cycle
• Enabling practitioners to obtain insights
Data Sources
Runtime traces
Program logs
System events
Perf counters
…
Usage log
User surveys
Online forum posts
Blog & Twitter
…
Source code
Bug history
Check-in history
Test cases
…
Target Audience – Software Practitioners
Developer
Tester
Program Manager
Usability engineer
Designer
Support engineer
Management personnel
Operation engineer
Output – Insightful Information
Conveys meaningful and useful understanding or knowledge towards completing the target task
Not easily attainable via directly investigating raw data without aid of analytics technologies
Going from correlation to causality
Examples
It is easy to count the number of re-opened bugs, but how to find out the primary reasons for these re-opened bugs?
When the availability of an online service drops below a threshold, how to localize the problem?
Output – Actionable Information
Enables software practitioners to come up with concrete solutions towards completing the target task
Examples
Why bugs were re-opened?
▪ A list of bug groups each with the same reason of re-opening
Why availability of online services dropped?
▪ A list of problematic areas with associated confidence values
Which part of my code should be refactored?
▪ A list of cloned code snippets easily explored from different perspectives
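The first example can be illustrated with a minimal sketch. It assumes each re-opened bug already carries a reason label; the records, field names, and reason labels below are hypothetical and not from the talk. Grouping by reason turns a raw count into a list a practitioner could act on:

```python
from collections import defaultdict

# Hypothetical re-opened bug records; IDs and reason labels are invented for illustration.
reopened_bugs = [
    {"id": 101, "reason": "incomplete fix"},
    {"id": 102, "reason": "could not reproduce"},
    {"id": 103, "reason": "incomplete fix"},
    {"id": 104, "reason": "regression"},
]

def group_by_reason(bugs):
    """Return (reason, [bug ids]) pairs, largest group first, ties broken by reason name."""
    groups = defaultdict(list)
    for bug in bugs:
        groups[bug["reason"]].append(bug["id"])
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))

bug_groups = group_by_reason(reopened_bugs)
for reason, ids in bug_groups:
    print(f"{reason}: {ids}")
```

In practice the hard part is deriving the reason labels from raw bug reports, which this sketch takes as given.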
Research Topics & Technology Pillars
Vertical
Horizontal
Information Visualization
Data Analysis Algorithms
Large-scale Computing
Software Users
Software Development Process
Software System
Connection to Practice
Software Analytics is naturally tied with software development practice
Getting real
Real Data
Real Problems
Real Users
Real Tools
Human/Tool Cooperation: Performance Debugging in the Large
Pattern Matching
Bug update
Problematic Pattern
Repository
Bug Database
Trace analysis
Bug filing
StackMine [Han et al. ICSE 12]
Trace Storage
Trace collection
Internet
Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012
How many issues are still unknown?
Which trace file should I investigate first?
Key to issue discovery
Bottleneck of scalability
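The question "Which trace file should I investigate first?" can be illustrated with a toy sketch. This is not the StackMine algorithm from [Han et al. ICSE 12]; it is a simplified stand-in that groups traces by their innermost frames and ranks the groups by total wait time. The trace records, file names, frame names, and the `depth` parameter are all hypothetical.

```python
from collections import defaultdict

# Hypothetical traces: (trace file, call stack from outermost to innermost frame, wait time in ms).
traces = [
    ("t1.etl", ("main", "OpenFile", "AcquireLock"), 900),
    ("t2.etl", ("main", "Render", "LoadFont"), 200),
    ("t3.etl", ("worker", "OpenFile", "AcquireLock"), 700),
    ("t4.etl", ("main", "Render", "LoadFont"), 150),
]

def rank_by_signature(traces, depth=2):
    """Group traces by their innermost `depth` frames; rank groups by total wait time."""
    groups = defaultdict(lambda: {"files": [], "total_ms": 0})
    for name, stack, wait_ms in traces:
        signature = stack[-depth:]  # shared innermost frames act as a crude pattern
        groups[signature]["files"].append(name)
        groups[signature]["total_ms"] += wait_ms
    return sorted(groups.items(), key=lambda kv: kv[1]["total_ms"], reverse=True)

ranked = rank_by_signature(traces)
for signature, info in ranked:
    print(" -> ".join(signature), info["total_ms"], info["files"])
```

The top-ranked group suggests where an analyst might look first; judging whether a pattern is truly problematic remains the human side of the cooperation.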
StackMine: Industry Impact
“We believe that the MSRA tool is highly valuable and much more efficient for mass trace (100+ traces) analysis. For 1000 traces, we believe the tool saves us 4-6 weeks of time to create new signatures, which is quite a significant productivity boost.”
- from Development Manager in Windows
Highly effective new issue discovery on Windows mini-hang
Continuous impact on future Windows versions
Dual Ends of the Road
Foundation: Science of Software Analytics?
From correlation to causality
Practice: Software Analytics
From pieces to a whole
Bring human in the loop
Make real impact in practice
Caricature: Standard Security Research
Choose random system component
Find vulnerability
Suggest defense
Analyze security or test performance
Are we making progress?
Positive aspect: most security research addresses real problems
@J. Mitchell
Meaning of “Science”
Systematization of Knowledge: An organized body of knowledge gained through research
Ad hoc point solutions vs. general understanding
Repeating failures of the past with each new platform, type of vulnerability
Scientific Method: System of acquiring knowledge based on the scientific method
Process of hypothesis testing and experiments
Building abstractions and models, theorems
Universal Laws: Laws or theories that are predictive
Widely applicable
Make strong, quantitative predictions
@D. Evans, J. Mitchell
Percentage of bug-introducing changes for Eclipse
Don’t program on Fridays ;-)
[Zimmermann et al. 05]
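A per-weekday percentage of this kind can be computed as in the minimal sketch below. The change records are hypothetical and are not the Eclipse data from [Zimmermann et al. 05]; the result is a correlation only and says nothing about why a given day looks worse.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical changes: (commit date, whether the change was later found to introduce a bug).
changes = [
    (date(2005, 3, 4), True),   # Friday
    (date(2005, 3, 4), False),  # Friday
    (date(2005, 3, 7), False),  # Monday
    (date(2005, 3, 8), True),   # Tuesday
    (date(2005, 3, 8), False),  # Tuesday
    (date(2005, 3, 8), False),  # Tuesday
    (date(2005, 3, 8), False),  # Tuesday
]

def bug_introducing_percentage(changes):
    """Return {weekday name: percentage of that day's changes that introduced a bug}."""
    totals, buggy = Counter(), Counter()
    for day, introduced_bug in changes:
        name = WEEKDAYS[day.weekday()]
        totals[name] += 1
        buggy[name] += int(introduced_bug)
    return {name: 100.0 * buggy[name] / totals[name] for name in totals}

percentages = bug_introducing_percentage(changes)
for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}%")
```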
Failure is a 4-letter Word
[PROMISE’11 Zeller et al.]
From Correlation to Causality
Analytic techniques are often used for applications that emphasize results over causation of the findings
Users may choose to act on the behavior without focus on understanding it (or its causation) provided that the pattern has a high empirical probability of correctly identifying an issue
E.g., smuggling, traveling with false documents, or predicting winning stock
@L. Williams, M. Rappa
From Correlation to Causality cont.
Analytic techniques are often not used to support the identification and advancement of fundamental scientific principles based upon an analysis of causation
Emphasize the use of analytics to advance science (e.g., producing insights) besides the use of analytics in providing just observations
@L. Williams, M. Rappa
Open Questions
How much science of a field (e.g., software analytics)?
A field may be a means/solution, in contrast to a problem domain like “security” or “design”
How can analytics/AI be used to help build science of “X”?
How to move a field to a foundational level?
How to balance foundation and practice?
Dual Ends of the Road
Foundation: Science of Software Analytics?
From correlation to causality
Practice: Software Analytics
From pieces to a whole
Bring human in the loop
Make real impact in practice
Fitnex Path-Exploration Strategy for Pex in Pex
Released since 2008
Download counts, initial 20 months of release:
Academic: 17,366
Industrial: 13,022
Total: 30,388
Analytics/AI is the Means to the End
Interesting results vs. Actionable results
Problem hunting vs. Problem driven
Open Questions
Who should bring software analytics research results to the hands of practitioners?
How to do so?
Dual Ends of the Road
Foundation: Science of Software Analytics?
From correlation to causality
Practice: Software Analytics
From pieces to a whole
Bring human in the loop
Make real impact in practice
Thank you!
Questions?
https://sites.google.com/site/asergrp
NSF grants CCF-0845272, CCF-0915400, CNS-0958235; ARO grant W911NF-08-1-0443; an NSA Science of Security Lablet grant; a NIST grant; a 2011 Microsoft Research SEIF Award