Practitioners’ Expectations on Automated Fault Localization
Pavneet Singh Kochhar*, Xin Xia+, David Lo*, Shanping Li+
*Singapore Management University
+Zhejiang University
The International Symposium on Software Testing and Analysis (ISSTA)
Too many bugs!
• Many projects receive large numbers of bug reports.
• Large number of bug reports can overwhelm developers.
 - Mozilla developer: “Everyday, almost 300 bugs appear that need triaging. This is far too much for only the Mozilla programmers to handle” *
What have researchers proposed to overcome this issue?
*J. Anvik, L. Hiew, and G. C. Murphy, “Coping with an open bug repository,” in ETX, pp. 35–39, 2005
Fault Localization
Thousands of Source Code Files
------> GOAL: Find the buggy files/methods/statements/blocks
How Fault Localization Works
Input: Bug Reports, Test Cases
Fault Localization Techniques: Information Retrieval-Based, Slicing, Spectrum-Based, etc.
Output: Statements, Methods, Classes
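As a rough illustration of the spectrum-based family named above, the sketch below ranks statements by a suspiciousness score computed from test coverage. The slides do not specify any formula: the Ochiai metric, the `ochiai_ranking` function, and the toy coverage data are all assumptions chosen for illustration.

```python
import math


def ochiai_ranking(coverage, failed):
    """Rank statements by Ochiai suspiciousness (illustrative helper).

    coverage: dict mapping test name -> set of statement ids the test executes
    failed:   set of test names that failed
    """
    total_failed = len(failed)
    statements = set().union(*coverage.values())
    scores = {}
    for stmt in statements:
        # ef / ep: number of failing / passing tests that execute this statement
        ef = sum(1 for t, cov in coverage.items() if stmt in cov and t in failed)
        ep = sum(1 for t, cov in coverage.items() if stmt in cov and t not in failed)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[stmt] = ef / denom if denom else 0.0
    # Most suspicious first; ties are broken by statement id for determinism
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data (hypothetical): three tests, four statements, one failing test
coverage = {
    "t1": {1, 2, 3},
    "t2": {1, 2, 4},
    "t3": {1, 3, 4},
}
failed = {"t3"}
ranking = ochiai_ranking(coverage, failed)
for stmt, score in ranking:
    print(f"statement {stmt}: {score:.3f}")
```

The ranked list is what a developer would inspect from the top, which is why the position of the buggy element in that list matters for the success criteria discussed later in the talk.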
Fault Localization
What are the expectations of practitioners on fault localization?
What factors affect adoption of fault localization tools?
What are the thresholds for adoption?
Our Study
Practitioners’ Expectations
Survey, Literature Review
Practitioner Survey
• Multi-pronged strategy:
 - Our contacts in IT industry
 - Email 3300 practitioners on
• We receive 403 responses.
Survey Demographics
• 386 responses
• 33 countries
• Job profile
 - Software Dev – 80.83%
 - Software Testing – 30.05%
 - Project Management – 17.10%
• Professional – 78.13%, Open-source – 44.24%
RQ1: Importance of Fault Localization
[Figure: stacked bar chart of ratings (Essential, Worthwhile, Unimportant, Unwise) by demographic group (All, Dev, Test, PM, ExpLow, ExpMed, ExpHigh, OS, Prof).]
Fisher’s Exact Test: p-values < 0.05
RQ1: Importance of Fault Localization
Why “Unimportant” or “Unwise”
• Can’t deal with difficult bugs
 - “I’m well aware of what static analysis can do and very few hard bugs would be solved with it.”
• No rationale
 - “I doubt any automated software can explain the reason for things such as broken backwards compatibility, unclear documentation etc.”
• Status quo
 - “I don’t think personally I would pay for it, because for my cases usual stack trace is over than enough”
RQ2: Availability of Debugging Data
[Figure: stacked bar chart of ratings (All the time, Sometimes, Rarely, Never) by debugging hint available (Math-Spec, Text-Spec, One-Test, Multi-Tests, Suc-Tests, Text-Desc).]
• >70% of respondents mention availability of test cases
• >80% of respondents mention availability of bug reports
RQ3: Preferred Granularity Level
Preferred Granularity Level (percentage of respondents)
• Component – 20.21%
• Class – 26.42%
• Method – 51.81%
• Block – 44.30%
• Statement – 50.00%
RQ4: Minimum Success Criterion
Position of the buggy element in returned list
Minimum Success Criterion (percentage of respondents)
• Top 1 – 9.43%
• Top 5 – 73.58%
• Top 10 – 15.09%
• Top 20 – 1.35%
• Top 50 – 0.54%
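One way to read this distribution: a respondent whose minimum criterion is Top N would presumably also be satisfied by any better position. The sketch below accumulates the slide's percentages under that assumption. The `satisfied_share` helper is hypothetical and is not part of the study's analysis.

```python
# Share of respondents (in %) naming each position as their minimum
# success criterion, copied from the slide.
MIN_CRITERION = {1: 9.43, 5: 73.58, 10: 15.09, 20: 1.35, 50: 0.54}


def satisfied_share(position):
    """Percentage of respondents satisfied when the buggy element is at `position`.

    Assumes a respondent who accepts Top N also accepts any position <= N.
    """
    total = sum(pct for top_n, pct in MIN_CRITERION.items() if position <= top_n)
    return round(total, 2)


for top_n in MIN_CRITERION:
    print(f"Buggy element at position {top_n}: {satisfied_share(top_n)}% satisfied")
```

Under this reading, a result at position 5 satisfies everyone except the respondents who insist on Top 1, while a result at position 10 satisfies only those who named Top 10 or looser.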
RQ5: Trustworthiness
Proportion of times a technique works.
[Figure: satisfaction rate (0% to 100%) against minimum success rate (5%, 20%, 50%, 75%, 90%, 100%).]
RQ6: Scalability
Program sizes a technique can work on.
[Figure: satisfaction rate (0% to 100%) against minimum program size (1-100, 1-1,000, 1-10,000, 1-100,000, 1-1,000,000).]
RQ7: Efficiency
Time taken to produce the results.
[Figure: satisfaction rate (0% to 100%) against maximum runtime (< 1 second, < 1 minute, < 30 minutes, < 1 hour, < 1 day).]
RQ8: Willingness to Adopt
• > 98% willing to adopt a trustworthy, scalable and efficient fault localization technique.
• Unwilling
 - Resistance to change: “Since I already have one and to use another would require training time and time to get used to it”
 - More information needed: “Would it be open source? Would it work with my main programming language? Would it work with distributed environments?”
 - Disbelief of possibility of success: “I don’t think you can do it.”
RQ9: Other Factors (Hypotheses)
• Rationale
 - An automated debugging tool must provide a rationale why some program locations are marked as suspicious.
 - I will *still adopt* an efficient, scalable, and trustworthy automated debugging tool, even if it cannot provide rationales.
• IDE Integration
 - An automated debugging tool must be integrated well to my favourite IDE.
 - I will *still adopt* an efficient, scalable, and trustworthy automated debugging tool, even if it is not integrated well to my favourite IDE.
RQ9: Other Factors
[Figure: stacked bar chart of ratings (Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree) by statement (Rationale, Adoption w/o Rationale, IDE, Adoption w/o IDE).]
RQ9: Other Factors
• Rationale
 - False positives: “False positives are worst than false negatives in my opinion”
 - Rationale for buggy code: “Because to make a decisions about bug fixing I want to *exactly* know why the automated tool “thinks” that the code have a bug.”
• IDE Integration
 - Extra steps needed: “No integration means extra steps which means testing will be more cumbersome and hence less used.”
 - Strong reliance on IDE: “IDE is our environment. If I can’t add something into my environment, it’s useless.”
Literature Review
• Papers published in last 5 years (2011-2015)
 - ICSE (417) ---> 2
 - FSE/ESEC-FSE (255) ---> 5
 - ISSTA (169) ---> 3
 - TSE (350) ---> 2
 - TOSEM (137) ---> 4
• Included papers
 - Spectrum-Based fault localization, Information-Retrieval-Based etc.
• Excluded papers
 - Automatic repair, empirical study on debugging, bug prediction, bug detection etc.
16 papers
Literature Review
Factor          Type           Papers
Debugging Data  Specification  -
                Test Cases     [4], [5], [24], [29], [35], [40], [44], [55], [57], [59]
                Bug Reports    [16], [19], [24], [52], [56], [60]
Granularity     Method         [24], [52]
                Statement      [4], [5], [29], [35], [44], [55], [57], [59]
                Basic Block    [16]
                Other          [19], [40], [56], [60]
Literature Review
Factor        Satisfaction Rate    Papers
Success Rate  90% (90%)            -
              75% (75%)            -
              50% (50%)            [16], [19], [35], [40], [52], [56], [59], [60]
              ?                    [4], [5], [29], [55], [57]
Scalability   90% (≥1M LOC)        [29], [52]
              75% (≥100,000 LOC)   [16], [24], [56], [59], [60]
              50% (≥10,000 LOC)    [4], [5], [35], [40], [44], [55], [57]
              ?                    [19]
Efficiency    90% (<1 minute)      [4], [24], [40], [44], [56]
              ?                    [16], [19], [29], [52], [57], [60]
Literature Review
Factor           Support?  Papers
Rationale        Yes       [29], [44]
IDE Integration  Yes       -
Key Takeaways
• Large demand for fault localization: >97% mention “Essential” or “Worthwhile”.
• High adoption barrier: to satisfy 75% of practitioners, a technique must return successful results in the Top 5, work 75% of the time, handle ≥100,000 LOC, and take <1 minute.
• Current techniques can’t satisfy 75% of respondents.
• Techniques that satisfy 50% of respondents work on coarse granularity (class or file).
• Rationale and IDE Integration are important.
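The adoption thresholds above can be read as a simple checklist. The sketch below encodes them as a hypothetical `meets_75_percent_bar` check. The `ToolProfile` fields and the example values are illustrative assumptions, not measurements from the study.

```python
from dataclasses import dataclass


@dataclass
class ToolProfile:
    """Hypothetical summary of a fault localization tool's behaviour."""
    worst_rank: int          # position of the buggy element in the returned list
    success_rate: float      # fraction of sessions in which that rank is achieved
    max_loc: int             # largest program size handled, in lines of code
    runtime_seconds: float   # time taken to produce the results


def meets_75_percent_bar(tool):
    """Check the thresholds the slides associate with satisfying 75% of practitioners."""
    return (
        tool.worst_rank <= 5              # successful results in Top 5
        and tool.success_rate >= 0.75     # works 75% of the time
        and tool.max_loc >= 100_000       # handles >= 100,000 LOC
        and tool.runtime_seconds < 60     # takes < 1 minute
    )


# Example values are made up for illustration
candidate = ToolProfile(worst_rank=5, success_rate=0.80, max_loc=250_000, runtime_seconds=12.0)
print(meets_75_percent_bar(candidate))
```

Rationale and IDE integration are left out of the check because the slides treat them as additional factors rather than numeric thresholds.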
Future Work
Develop fault localization techniques to bring the current state of research closer to practitioners’ expectations.
Systematic Literature Review (SLR)
Conclusion
386 practitioners surveyed from 33 countries.
Test cases and bug reports are often available.
Preferred granularity - Method & Statement
Preferred Success Criterion – Top 5.
Different satisfaction rates for trustworthiness, scalability and efficiency.
Rationale and IDE Integration are important.