Practitioners’ Expectations on Automated Fault Localization

Pavneet Singh Kochhar*, Xin Xia+, David Lo*, Shanping Li+

*Singapore Management University

+Zhejiang University

The International Symposium on Software Testing and Analysis (ISSTA)

Too many bugs!

• Many projects receive large numbers of bug reports.

• A large number of bug reports can overwhelm developers.

 - Mozilla developer: “Everyday, almost 300 bugs appear that need triaging. This is far too much for only the Mozilla programmers to handle” *

What have researchers proposed to overcome this issue?

*J. Anvik, L. Hiew, and G. C. Murphy, “Coping with an open bug repository,” in ETX, pp. 35–39, 2005

Fault Localization

Thousands of Source Code Files ------> GOAL: Find the buggy files/methods/statements/blocks

How Fault Localization Works

Input: Bug Reports, Test Cases

Fault Localization Techniques: Information Retrieval-Based, Slicing, Spectrum-Based, etc.

Output: Statements, Methods, Classes
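As one illustration of how a spectrum-based technique can turn test cases into a ranked list of statements, here is a minimal sketch that scores each statement with the Ochiai suspiciousness formula. The choice of Ochiai, the function names, and the tiny coverage data are assumptions for illustration; the slides do not prescribe a specific formula.

```python
import math

def ochiai(failed_cover, passed_cover, total_failed):
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep))."""
    denom = math.sqrt(total_failed * (failed_cover + passed_cover))
    return failed_cover / denom if denom else 0.0

def rank_statements(coverage, outcomes):
    """coverage: test name -> set of executed statement ids.
    outcomes: test name -> True if the test passed.
    Returns (statement ids from most to least suspicious, score per statement)."""
    total_failed = sum(1 for ok in outcomes.values() if not ok)
    statements = set().union(*coverage.values())
    scores = {}
    for s in statements:
        # ef / ep: number of failing / passing tests that execute statement s
        ef = sum(1 for t, cov in coverage.items() if s in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if s in cov and outcomes[t])
        scores[s] = ochiai(ef, ep, total_failed)
    ranking = sorted(scores, key=lambda s: (-scores[s], s))
    return ranking, scores

# Hypothetical spectrum: statement 3 is executed only by the failing test t2.
coverage = {"t1": {1, 2}, "t2": {1, 2, 3}, "t3": {1}}
outcomes = {"t1": True, "t2": False, "t3": True}
ranking, scores = rank_statements(coverage, outcomes)
print(ranking)
```

In this made-up example the statement covered only by the failing test ends up first in the list, which is the kind of ranked output a developer would then inspect.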

Fault Localization

What are the expectations of practitioners on fault localization?

What factors affect adoption of fault localization tools?

What are the thresholds for adoption?

Our Study

Practitioners’ Expectations

Survey, Literature Review

Practitioner Survey

• Multi-pronged strategy:

 - Our contacts in IT industry

 - Email 3300 practitioners on

• We receive 403 responses.

Survey Demographics

• 386 responses

• 33 countries

• Job profile

 - Software Dev – 80.83%

 - Software Testing – 30.05%

 - Project Management – 17.10%

• Professional – 78.13%, Open-source – 44.24%

RQ1: Importance of Fault Localization

[Figure: stacked bar chart. x-axis: Demographics (All, Dev, Test, PM, ExpLow, ExpMed, ExpHigh, OS, Prof); y-axis: Ratings, 0% to 100% (Essential, Worthwhile, Unimportant, Unwise)]


Fisher’s Exact Test: p-values < 0.05


Why “Unimportant” or “Unwise”

• Can’t deal with difficult bugs

 - “I’m well aware of what static analysis can do and very few hard bugs would be solved with it.”

• No rationale

 - “I doubt any automated software can explain the reason for things such as broken backwards compatibility, unclear documentation etc.”

• Status quo

 - “I don’t think personally I would pay for it, because for my cases usual stack trace is over than enough”

RQ2: Availability of Debugging Data

[Figure: stacked bar chart. x-axis: Debugging Hints Available (Math-Spec, Text-Spec, One-Test, Multi-Tests, Suc-Tests, Text-Desc); y-axis: Ratings, 0% to 100% (All the time, Sometimes, Rarely, Never)]


>70% of respondents mention availability of test cases


>80% of respondents mention availability of bug reports

RQ3: Preferred Granularity Level

[Figure: bar chart of Preferred Granularity Level, by percentage of respondents: Component 20.21%, Class 26.42%, Method 51.81%, Block 44.30%, Statement 50.00%]

RQ4: Minimum Success Criterion

Position of the buggy element in returned list

[Figure: bar chart of Minimum Success Criterion, by percentage of respondents: Top 1 9.43%, Top 5 73.58%, Top 10 15.09%, Top 20 1.35%, Top 50 0.54%]
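The chart reports the share of respondents who name each position as their minimum success criterion. A technique that places the buggy element within the Top k satisfies everyone whose stated cutoff is k or looser, so the satisfied share is a cumulative sum. The sketch below works this through with the percentages from the chart; the function name and the reading of each answer as a cutoff are assumptions for illustration.

```python
# Share of respondents (%) naming each position as their minimum
# success criterion, as read from the chart above.
MIN_CRITERION = {1: 9.43, 5: 73.58, 10: 15.09, 20: 1.35, 50: 0.54}

def satisfied_share(top_k):
    """Percentage of respondents satisfied when the buggy element is in the Top k.

    Assumption: a respondent whose cutoff is n is satisfied whenever k <= n.
    """
    return round(sum(pct for n, pct in MIN_CRITERION.items() if n >= top_k), 2)

for k in sorted(MIN_CRITERION):
    print(f"Top {k}: {satisfied_share(k)}% satisfied")
```

Under this reading, a result in the Top 5 satisfies about 90.56% of respondents, while a result in the Top 10 satisfies only about 16.98%.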

RQ5: Trustworthiness

Proportion of times a technique works.

[Figure: Satisfaction Rate (0% to 100%) against Minimum Success Rate (5%, 20%, 50%, 75%, 90%, 100%)]
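To make "proportion of times a technique works" concrete, the following sketch computes a success rate over a set of bugs, counting a bug as a success when its buggy element is ranked within the Top n. The function name, the default of n = 5, and the ranks are hypothetical and are not data from the study.

```python
def success_rate(buggy_ranks, top_n=5):
    """Fraction of bugs whose buggy element is ranked within the Top n.

    buggy_ranks: 1-based rank of the buggy element for each bug,
    or None when the technique did not return it at all.
    """
    if not buggy_ranks:
        return 0.0
    hits = sum(1 for r in buggy_ranks if r is not None and r <= top_n)
    return hits / len(buggy_ranks)

# Hypothetical ranks for eight bugs (not data from the study).
ranks = [1, 3, 7, 2, None, 5, 12, 4]
rate = success_rate(ranks)
print(f"Top-5 success rate: {rate:.1%}")
```

A value computed this way is what a respondent would compare against the minimum success rate on the x-axis.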

RQ6: Scalability

Program sizes a technique can work on.

[Figure: Satisfaction Rate (0% to 100%) against Minimum Program Size (1-100, 1-1,000, 1-10,000, 1-100,000, 1-1,000,000)]

RQ7: Efficiency

Time taken to produce the results.

[Figure: Satisfaction Rate (0% to 100%) against Maximum Runtime (< 1 second, < 1 minute, < 30 minutes, < 1 hour, < 1 day)]

RQ8: Willingness to Adopt

• > 98% willing to adopt a trustworthy, scalable and efficient fault localization technique.

• Unwilling

 - Resistance to change: “Since I already have one and to use another would require training time and time to get used to it”

 - More information needed: “Would it be open source? Would it work with my main programming language? Would it work with distributed environments?”

 - Disbelief in the possibility of success: “I don’t think you can do it.”

RQ9: Other Factors (Hypotheses)

• Rationale

 - An automated debugging tool must provide a rationale why some program locations are marked as suspicious.

 - I will *still adopt* an efficient, scalable, and trustworthy automated debugging tool, even if it cannot provide rationales.

• IDE Integration

 - An automated debugging tool must be integrated well to my favourite IDE.

 - I will *still adopt* an efficient, scalable, and trustworthy automated debugging tool, even if it is not integrated well to my favourite IDE.

RQ9: Other Factors

[Figure: stacked bar chart. x-axis: Statements (Rationale, Adoption w/o Rationale, IDE, Adoption w/o IDE); y-axis: Ratings, 0% to 100% (Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree)]

RQ9: Other Factors

• Rationale

 - False positives: “False positives are worst than false negatives in my opinion”

 - Rationale for buggy code: “Because to make a decisions about bug fixing I want to *exactly* know why the automated tool “thinks” that the code have a bug.”

• IDE Integration

 - Extra steps needed: “No integration means extra steps which means testing will be more cumbersome and hence less used.”

 - Strong reliance on IDE: “IDE is our environment. If I can’t add something into my environment, it’s useless.”

Literature Review

• Papers published in last 5 years (2011-2015)

 - ICSE (417) ---> 2

 - FSE/ESEC-FSE (255) ---> 5

 - ISSTA (169) ---> 3

 - TSE (350) ---> 2

 - TOSEM (137) ---> 4

• Included papers - Spectrum-based fault localization, information-retrieval-based fault localization, etc.

• Excluded papers - Automatic repair, empirical study on debugging, bug prediction, bug detection, etc.

16 papers

Literature Review

| Factor | Type | Papers |
|---|---|---|
| Debugging Data | Specification | - |
| Debugging Data | Test Cases | [4], [5], [24], [29], [35], [40], [44], [55], [57], [59] |
| Debugging Data | Bug Reports | [16], [19], [24], [52], [56], [60] |
| Granularity | Method | [24], [52] |
| Granularity | Statement | [4], [5], [29], [35], [44], [55], [57], [59] |
| Granularity | Basic Block | [16] |
| Granularity | Other | [19], [40], [56], [60] |

Literature Review

| Factor | Satisfaction Rate | Papers |
|---|---|---|
| Success Rate | 90% (90%) | - |
| Success Rate | 75% (75%) | - |
| Success Rate | 50% (50%) | [16], [19], [35], [40], [52], [56], [59], [60] |
| Success Rate | ? | [4], [5], [29], [55], [57] |
| Scalability | 90% (≥1M LOC) | [29], [52] |
| Scalability | 75% (≥100,000 LOC) | [16], [24], [56], [59], [60] |
| Scalability | 50% (≥10,000 LOC) | [4], [5], [35], [40], [44], [55], [57] |
| Scalability | ? | [19] |
| Efficiency | 90% (<1 minute) | [4], [24], [40], [44], [56] |
| Efficiency | ? | [16], [19], [29], [52], [57], [60] |

Literature Review

| Factor | Support? | Papers |
|---|---|---|
| Rationale | Yes | [29], [44] |
| IDE Integration | Yes | - |

Key Takeaways

Large demand for fault localization: >97% mention “Essential” or “Worthwhile”.

High adoption barrier: to satisfy 75% of practitioners, a technique must return successful results in the Top 5, work 75% of the time, handle ≥100,000 LOC, and take <1 minute.

Current techniques can’t satisfy 75% of respondents.

Techniques that satisfy 50% of respondents work on coarse granularity (class or file).

Rationale and IDE Integration are important.
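The adoption thresholds above can be read as a simple checklist. The sketch below encodes them and reports which ones a given technique misses. The `TechniqueProfile` class, its field names, and the example values are hypothetical and do not describe any of the surveyed papers.

```python
from dataclasses import dataclass

@dataclass
class TechniqueProfile:
    top_n: int              # buggy element is returned within this position
    success_rate: float     # fraction of bugs for which that holds
    max_loc: int            # largest program size handled, in lines of code
    runtime_seconds: float  # time taken to produce results

def unmet_thresholds(p):
    """Return the names of the 75%-satisfaction thresholds that p misses."""
    checks = {
        "Top 5": p.top_n <= 5,
        "works 75% of time": p.success_rate >= 0.75,
        ">=100,000 LOC": p.max_loc >= 100_000,
        "<1 minute": p.runtime_seconds < 60,
    }
    return [name for name, ok in checks.items() if not ok]

# Hypothetical technique: meets every threshold except the success rate.
candidate = TechniqueProfile(top_n=5, success_rate=0.5,
                             max_loc=250_000, runtime_seconds=40)
missing = unmet_thresholds(candidate)
print(missing)
```

An empty list would mean the technique clears every threshold named in the takeaway.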

Future Work

Develop fault localization techniques that bring the current state of research closer to practitioners’ expectations.

Systematic Literature Review (SLR)

Thank You!

Pavneet Singh Kochhar

kochharps.wix.com/pavneet

Email: [email protected]

Conclusion

386 practitioners surveyed from 33 countries.

Test cases and bug reports are often available.

Preferred granularity - Method & Statement

Preferred Success Criterion – Top 5.

Different satisfaction rates for trustworthiness, scalability and efficiency.

Rationale and IDE Integration are important.
