a bug report analysis and search tool (presentation for m.sc. degree)
TRANSCRIPT
A Bug Report Analysis and Search Tool
M.Sc. Presentation
Yguaratã Cerqueira [email protected]
Advisor: Silvio Romero de Lemos Meira
Co-Advisor: Eduardo Santana de Almeida
Center for Informatics – Federal University of Pernambuco (UFPE) http://www.cin.ufpe.br
Reuse in Software Engineering (RiSE) http://www.rise.com.br
07/03/2009, Recife – Brazil
Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 1 / 57
Summary
1 Introduction: M.Sc. Context, Motivation, Proposed solution
2 The Bug Report Duplication Problem: A Characterization Study: Definition, Planning and Operation, Results
3 BAST: Requirements, Architecture, Overview
4 Case Study: Definition, Planning, Analysis and interpretation
5 Experiment: Definition, Planning, Analysis and interpretation
6 Related Work
7 Conclusion
8 References
1 Introduction
M.Sc. Context
Change management handles requests for:
new features
correction of errors
improvements
It drives software maintenance and evolution
Motivation
Software maintenance and evolution are characterised by their huge cost and slow speed of implementation
According to Sommerville, they can account for almost 90% of total software costs
Year  Maintenance share of total costs  Reference
2000  >90%                              Erlikh (2000)
1993  75%                               Eastwood (1993)
1990  >90%                              Moad (1990)
1990  60–70%                            Huff (1990)
1988  60–70%                            Port (1988)
1984  65–75%                            McKee (1984)
1981  >50%                              Lientz and Swanson (1981)
1979  67%                               Zelkowitz et al. (1979)
Table: Studies of software maintenance costs (Koskinen, 2004).
Bug tracking activity
Bug reports management
Verify bug report validity
Analyze the impact of a bug report
Assign a developer
Help with development process in general
Bug reports: software artifacts that describe a defect or an enhancement. Submitters are generally developers, users, or testers
Bug trackers: systems used to manage, store, and handle change requests (also known as bug reports)
Bug trackers advantages
Traceability (developers, releases)
Fast identification of problems
Metrics (errors per developer, identification of critical components, etc.)
Comments
Project history
Examples: Mantis, Bugzilla, Trac, Jira
A bug report example
[Figure: bug report example, shown across four slides]
Issues coming from bug trackers
Dynamic assignment of bug reports (Anvik et al., 2006);
Change impact analysis and effort estimation of new bug reports (Song et al., 2006);
Quality of bug report descriptions (Ko et al., 2006);
Software evolution traceability (Sandusky et al., 2004); and
Duplicate bug report detection, which consists of avoiding the submission of bug reports that describe an already submitted issue (Hiew, 2006).
The bug report duplication problem
Characterized by the submission of two or more bug reports that describe the same software issue
Overhead of rework to search and analyze bug reports
People take roughly 5-15 minutes to perform search and analysis (Anvik et al., 2005; Cavalcanti et al., 2008)
Duplicates make up 10% to 30% of a bug report repository (Anvik et al., 2005; Runeson et al., 2007; Cavalcanti et al., 2008)
So, there are costs with:
opening bug reports (5-15 minutes)
CCB analysis (5-15 minutes)
developer analysis (5-15 minutes)
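To give a feel for how these figures add up, here is a minimal sketch of a back-of-the-envelope estimate. It assumes that every duplicate passes through all three stages listed above and that each stage costs 5-15 minutes; the function name and the example repository (10,000 reports, 20% duplicates) are hypothetical and not taken from the study.

```python
def duplicate_overhead_minutes(total_reports, duplicate_rate,
                               step_minutes=(5, 15), steps=3):
    """Return (low, high) minutes spent handling duplicate bug reports.

    Assumption: each duplicate goes through `steps` stages (opening,
    CCB analysis, developer analysis), each taking 5-15 minutes.
    """
    duplicates = total_reports * duplicate_rate
    low, high = step_minutes
    return duplicates * steps * low, duplicates * steps * high


# Hypothetical repository: 10,000 reports, 20% of them duplicates.
low, high = duplicate_overhead_minutes(10_000, 0.20)
print(f"{low / 60:.0f} to {high / 60:.0f} hours spent on duplicates")
```

Under these assumptions the estimate scales linearly with both repository size and duplicate rate, so even the lower bound grows quickly for large projects.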
Proposed solution
The proposed solution consists of a Web-based application that enables people involved with bug report search and analysis to perform such tasks more effectively.
2 The Bug Report Duplication Problem: A Characterization Study
Definition
The goal of this study was to analyze bug repositories and the activities for searching and analyzing bug reports
with the purpose of understanding them with respect to the possible factors that could impact on the duplication problem and their consequences on software development
from the point of view of the researchers
in the context of software development projects
Questions
Q1: Do the projects have a considerable amount of duplicate bug reports?
Q2: Is the productivity being affected by the bug report duplication problem?
Q3: Is there a common vocabulary for bug report descriptions?
Q4: How are the relationships between master bug reports and duplicate bug reports characterized?
Q5: Does the type of bug report influence the amount of duplicates?
Planning and operation
Projects and data selection
All bug reports until June 2008
Project          LOC   Staff size  Bugs    Life-time
Bugzilla         55K   340         12829   14
Eclipse          6.5M  352         130095  7
Epiphany         100K  19          10683   6
Evolution        1M    156         72646   11
Firefox          80K   514         60233   9
GCC              4.2M  285         35797   9
Thunderbird      310K  192         19204   8
Tomcat           200K  57          8293    8
Private Project  2M    21          7955    2
Performed at C.E.S.A.R. between June 2008 and August 2008
Results
Question 1: Do the analyzed projects have a considerable amount of duplicate bug reports?
Metric  Bugz.  Eclip.  Epiph.  Evol.  Firef.  GCC    Thund.  Tomc.  Private Proj.  Mean  SD
M1 %    23.32  19.44   31.52   43.24  38.39   17.68  49.10   8.24   21.59          28.1  13.4
Question 2: Is the submitters' productivity being affected by the bug report duplication problem?
Metric           Bugz.  Eclip.  Epiph.  Evol.  Firef.  GCC    Thund.  Tomc.  Private Proj.  Mean   SD
M2 (min)         05-15  –       05-15   05-15  05-10   05-15  05-15   –      20-30          12.5   1.88
M4 bugs per day  71     722     59      403    334     198    106     46     145            231.5  222.1
Question 3: Is there a common vocabulary for bug report descriptions?
Metric  Bugz.  Eclip.  Epiph.  Evol.  Firef.  GCC  Thund.  Tomc.  Private Proj.  Mean  SD
M5 %    –      25      –       –      22      –    –       –      35             31.2  9.5
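As a quick check on how the Mean and SD columns relate to the per-project values, the sketch below recomputes them for metric M1 with Python's standard library. It assumes the SD is the sample standard deviation (n − 1 denominator); that choice is an assumption, although it does reproduce the figures reported for M1.

```python
from statistics import mean, stdev

# M1 (% of duplicate bug reports) per project, copied from the results above.
m1 = {
    "Bugzilla": 23.32, "Eclipse": 19.44, "Epiphany": 31.52,
    "Evolution": 43.24, "Firefox": 38.39, "GCC": 17.68,
    "Thunderbird": 49.10, "Tomcat": 8.24, "Private Project": 21.59,
}

values = list(m1.values())
avg = mean(values)   # arithmetic mean across the nine projects
sd = stdev(values)   # sample standard deviation (assumed, n - 1)

print(round(avg, 1), round(sd, 1))
```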
Results [2]
Question 4: How are the relationships between master bug reports and duplicate bug reports characterized?
One to one relation
bug123: bug3453
One to many relation
bug345: bug45345, bug465, bug654
Figure: Bug reports grouping.
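The grouping shown above can be sketched as a small data structure: a mapping from each master bug report to the list of reports marked as its duplicates. This is only an illustration; the function names and the (duplicate, master) pair format are assumptions, not the format used in the study.

```python
from collections import defaultdict


def group_duplicates(pairs):
    """Group (duplicate_id, master_id) pairs by master bug report."""
    groups = defaultdict(list)
    for duplicate, master in pairs:
        groups[master].append(duplicate)
    return dict(groups)


def relation_type(duplicates):
    """Classify a group by how many duplicates point at the master."""
    return "one-to-one" if len(duplicates) == 1 else "one-to-many"


# The two example groups from the slide, as (duplicate, master) pairs.
pairs = [
    ("bug3453", "bug123"),
    ("bug45345", "bug345"),
    ("bug465", "bug345"),
    ("bug654", "bug345"),
]

groups = group_duplicates(pairs)
for master, duplicates in groups.items():
    print(master, relation_type(duplicates), duplicates)
```

Counting the groups of each type over a whole repository gives the proportion of one-to-one groupings.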
Results [3]
Question 5: Does the type of bug report influence the amount of duplicates?
Study summary
All the projects are affected by the bug report duplication problem;
Productivity is affected by the bug report duplication problem;
No common vocabulary is used to describe bug reports;
More than 80% of the groups are of the one-to-one type;
Bug report duplication occurs independently of the type of bug report;
The number of LOC is not a factor for the duplication problem;
The size of the repository is not a factor for duplication;
The project's life-time is not a factor for duplication;
The staff size (developers) is not a factor for the duplication problem; and
The profile of the submitter is a determining factor for the submission of duplicates: sporadic ≥ average ≥ frequent
3 BAST
Requirements
Functional requirements
FR1 - Keyword-based search
FR2 - Rank search results based on bug report similarity rate
FR3 - Index bug reports from XML files
FR4 - Index bug reports from original database
FR5 - Extract useful information from bug reports
Non-Functional requirements
NFR1 - Simple and intuitive filters interface
NFR2 - Reports about bug repository status
NFR3 - Integration with most popular bug report tracking systems
NFR4 - Log search queries and user actions
NFR5 - Reasonable similarity rate
NFR6 - Web-based interface with AJAX
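To make FR1 and FR2 concrete, here is a minimal sketch of keyword search with similarity-based ranking. The slides do not state which ranking technique BAST uses, so TF-IDF weighting with cosine similarity is an assumed stand-in; the function names and the three sample reports are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_reports(query, reports):
    """Return (report_id, score) pairs sorted by descending cosine similarity."""
    docs = {rid: Counter(tokenize(text)) for rid, text in reports.items()}
    n = len(docs)
    # Document frequency: in how many reports each term appears.
    df = Counter(term for counts in docs.values() for term in counts)
    # Smoothed inverse document frequency.
    idf = {term: math.log((1 + n) / (1 + freq)) + 1 for term, freq in df.items()}

    def vector(counts):
        # Terms unknown to the index are dropped from the vector.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = (math.sqrt(sum(w * w for w in a.values()))
                * math.sqrt(sum(w * w for w in b.values())))
        return dot / norm if norm else 0.0

    q = vector(Counter(tokenize(query)))
    scored = [(rid, cosine(q, vector(counts))) for rid, counts in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical bug report summaries.
reports = {
    "bug1": "Browser crashes when opening a new tab",
    "bug2": "Crash on new tab with many bookmarks",
    "bug3": "Printing a page produces blank output",
}
ranking = rank_reports("crash when opening new tab", reports)
for report_id, score in ranking:
    print(report_id, round(score, 3))
```

A real index would also apply stemming and stop-word removal, so that "crashes" and "crash" match; this sketch leaves them out to stay short.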
Architecture
Overview
4 Case Study
Definition
Context. Performed in a real test cycle at a C.E.S.A.R. partner between July and August 2008
Systematic process to test and open bug reports
Objectives. 1 To determine which tool can prevent more duplicate bug reports
2 To assess whether our tool decreases the time spent on analysis of bug reports
Baseline tool. Internal tool where testers can search for bug reports using SQL filters.
Null hypotheses
H0: µ time with BAST > µ time with baseline
µ duplicates avoided with BAST < µ duplicates avoided with baseline
Alternative hypotheses
H1: µ time with BAST < µ time with baseline
µ duplicates avoided with BAST > µ duplicates avoided with baseline
Planning
The tool was tested by the Bug Report Master
Responsible for the test cycle
Most experienced tester
Doubts should be resolved with him
Case study design: Search and analysis performed in two steps:
Step 1. Internal tool ⟹ BAST
Step 2. BAST ⟹ Internal tool
Metrics (manual annotations):
Type of bug reports analyzed
Number of duplicate bug reports avoided
Time spent to analyze similar bug reports
Quantitative analysis: Descriptive statistics
144 bug reports were analyzed
Analysis and interpretation
Repository status
Analysis and interpretation [2]
Duplicates found
Analysis and interpretation [3]
Time spent
Case study summary
Bug tracker status. More than 50% of the bug reports are duplicates
Duplicates found. Our tool can prevent more duplicates than the baseline tool
Time spent. The bug report master saved time using our tool
Drawbacks
Case study design. Accommodation of the subject, who may come to prefer one tool over the other.
Amount of bug reports in treatments. The numbers of bug reports analyzed in each treatment were very different.
Lack of subjects. The number of subjects was not sufficient to generalize the case study results.
5 Experiment
Definition
The goal of this experiment was to analyze a tool to improve search and analysis of bug reports
with the purpose of evaluating it with respect to its effectiveness and efficiency on detection of duplicate bug reports and time saving
from the point of view of the researchers
in the context of software development projects
Questions
Q1 Is there a reduction in the number of duplicate bug reports with the adoption of the new tool?
Q2 Is there a reduction in the time that submitters spend on the search and analysis of bug reports with the adoption of the tool?
Q3 Did the submitters have difficulties using the tool?
Definition [2]
Objects of study: BAST and Bugzilla.
Quality focus: Effectiveness and efficiency of the tool developed.
Context: The adoption of a tool developed to aid the bug report tracking process, focusing on search and analysis of bug reports to avoid duplicates.
Experiment type: Off-line experiment (Wohlin et al., 2000)
Subjects: 18 Ph.D. and M.Sc. students from the Computer Science department at the Federal University of Pernambuco, Brazil
Performed in a distributed manner (no location restrictions)
Bug reports from the Firefox open-source project
Planning
Subject selection. Selected by convenience sampling (Wohlin et al., 2000; Kitchenham and Pfleeger, 2002)
Instrumentation: 32 error descriptions concerning the Firefox project
50% with defects that already have bug reports describing them in the repository
50% with unique/not-reported defects
Guidelines to guide the experiment execution (FAQ)
Time-sheets to collect the time spent on search and analysis
Quantitative analysis: Descriptive statistics and hypothesis testing [t-test (Wohlin et al., 2000)]
Qualitative analysis: Questionnaire
Planning [2]
Null hypothesis
H0: µ time with BAST > µ time with baseline
µ duplicates avoided with BAST < µ duplicates avoided with baseline
Alternative hypothesis
H1: µ time with BAST < µ time with baseline
µ duplicates avoided with BAST > µ duplicates avoided with baseline
Independent variables. The tool used (BAST or Bugzilla)
Dependent variables. (a) amount of duplicate bug reports and (b) the time spent with search and analysis
Planning [3]
Experiment design
Analysis and interpretation
Descriptive statistics
Statistic  Time spent on analysis (BAST)  Time spent on analysis (Bugzilla)  Bug reports avoided (BAST)  Bug reports avoided (Bugzilla)
Mean       4.54                           4.32                               7.56                        8.33
Maximum    6.84                           9.56                               13                          12
Minimum    1.78                           2.47                               0                           0
SD         1.49                           1.91                               3.5                         3.2
Table: Descriptive statistics.
Analysis and interpretation [2]
Descriptive statistics [2]
Figure: Box plot for time spent.
Analysis and interpretation [3]
Descriptive statistics [3]
Figure: Box plot for duplicates avoided.
Analysis and interpretation [4]
Hypothesis test
Statistic           Time spent on analysis  Duplicates avoided
t0                  0.6292                  -1.2466
Degrees of freedom  17                      17
p-value             0.5376                  0.2294
T distribution      2.11                    2.11
Result (t0 > T)     H0: not rejected        H0: not rejected
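The decision rule in the table above can be sketched in a few lines. Seventeen degrees of freedom with 18 subjects is consistent with a paired t-test, so that is what this sketch assumes; the slides do not say so explicitly. The helper names and the four-subject sample are hypothetical, while the t0 and T values are the ones reported in the table.

```python
import math
from statistics import mean, stdev


def paired_t(sample_a, sample_b):
    """Return (t0, degrees_of_freedom) for two paired samples."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    t0 = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t0, n - 1


def reject_h0(t0, critical_value):
    """Decision rule from the table: reject H0 only when t0 > T."""
    return t0 > critical_value


# Reported values: T = 2.11 at 17 degrees of freedom.
print(reject_h0(0.6292, 2.11))    # time spent on analysis
print(reject_h0(-1.2466, 2.11))   # duplicates avoided

# Hypothetical per-subject times (minutes) with each tool.
bast = [4.0, 5.0, 3.0, 6.0]
bugzilla = [5.0, 5.5, 4.5, 6.0]
t0, df = paired_t(bast, bugzilla)
print(round(t0, 4), df)
```

Neither reported statistic exceeds the critical value, which matches the "H0: not rejected" result for both dependent variables.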
Analysis of dependency
Factor               BAST time  Bugzilla time  BAST duplicates  Bugzilla duplicates
Years of experience  -0.13      -0.02          -0.19            0.18
Number of projects   -0.11      0.37           -0.28            -0.025
Bug trackers used    -0.16      0.35           -0.26            0.05
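The dependency values above read like correlation coefficients between subject attributes and the measured outcomes. The slides do not name the coefficient, so the sketch below assumes Pearson's r; the data is hypothetical and only illustrates how one such cell could be computed.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical subjects: years of experience vs. minutes spent on analysis.
years = [1, 2, 3, 4, 5]
minutes = [6.0, 5.0, 5.5, 4.0, 4.5]
r = pearson(years, minutes)
print(round(r, 2))
```

Values close to zero, like most of those in the table, indicate little linear relationship between the attribute and the outcome.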
Qualitative analysis
BAST features. Seven (7) subjects used the filter features provided by the tool.
BAST usability. Only one subject mentioned some difficulty using the filters, and only one subject had a problem with the ordering features.
BAST usefulness. Fifteen (15) subjects believe that the way bug report details are presented in BAST is more useful for the analysis than in Bugzilla.
Testimonials
“in fact, the way details are presented saves time to check them, since it is not necessary to open extra tabs or windows to see the details”, and another wrote “it became easier to identify the duplicate bug reports and navigate among the details of the them”.
Validity Threats
Boredom
Lack of Historical Data
Environment
Subjects' knowledge of bug reports
Error re-descriptions and fictitious errors
Halo Effect
Internet Connection Constraints
6 Related Work
Related work
Automated Support for Classifying Software Failure Reports (Podgurski et al., 2003)
Bug reports: Software failures automatically submitted
Technique: Supervised and unsupervised pattern classification and multivariate visualization
Testing: Batch runs
Dataset: GCC, Jikes, and JavaC
Assisted Detection of Duplicate Bug Reports (Hiew, 2006)
Bug reports: Natural language bug reports
Technique: Organize similar bug reports into centroids using TF-IDF
Testing: Batch runs
Dataset: Firefox, Eclipse, Apache, and Fedora Core
Results: Precision of 29% and recall of 50%
Related work [2]
Detection of Duplicate Defect Reports Using Natural Language Processing (Runeson et al., 2007)
Bug reports: Natural language bug reports
Technique: Natural Language Processing (NLP)
Testing: Batch runs and a tool
Dataset: Sony Ericsson Mobile Communications
Results: Recall of 40%
An Approach to Detecting Duplicate Bug Reports Using Natural Language and Execution Information (Wang et al., 2008)
Bug reports: Natural language bug reports
Technique: NLP and execution information
Testing: Batch runs
Dataset: Firefox and Eclipse
Results: Recall of 67%-93% at its best
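The precision and recall figures quoted for the related work can be read with the standard information-retrieval definitions, sketched below. The cited studies may define these measures slightly differently (for example, recall over a top-N suggestion list), so this is an illustration under that assumption; the function name and the bug identifiers are hypothetical.

```python
def precision_recall(suggested, actual_duplicates):
    """Precision and recall of a duplicate-detection result.

    precision = correct suggestions / all suggestions
    recall    = correct suggestions / all true duplicates
    """
    suggested, actual = set(suggested), set(actual_duplicates)
    hits = len(suggested & actual)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical run: four reports suggested, four true duplicates, two in common.
p, r = precision_recall(
    ["bug1", "bug2", "bug3", "bug4"],
    ["bug2", "bug4", "bug9", "bug10"],
)
print(p, r)
```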
7 Conclusion
Research contribution
A taxonomy for the bug repositories mining area
The state-of-the-art on mining bug repositories
A characterization of the bug report duplication problem
A tool to reduce the time spent with search and analysis of bug reports
A case study to evaluate the tool proposed
An experiment with 18 subjects to evaluate the tool
Papers
Cavalcanti, Y. C., Martins, A. C., de Almeida, E. S., and de Lemos Meira, S. R. (2008a). Avoiding Duplicate CR reports in Open Source Software Projects. In The 9th International Free Software Forum (IFSF’08), Porto Alegre, Brazil.
Cavalcanti, Y. C., de Almeida, E. S., da Cunha, C. E. A., Pinto, E. R., and Meira, S. R. L. (2008b). The Bug Report Duplication Problem: A Characterization Study. Technical report, C.E.S.A.R and Federal University of Pernambuco.
Papers for the Case Study and for the Experiment
Two more journal papers are being written (characterization and thesis)
Future Work
Evolve from prototype
Information visualization
Alternative integration methods
Provide integration with other tools
Search and ranking techniques
Comments of a bug report
Number of informal references
Experiment replications
8 References
References I
Anvik, J., Hiew, L., and Murphy, G. C. (2005). Coping with an open bugrepository. In Proceedings of the 2005 OOPSLA workshop on Eclipsetechnology eXchange, pages 35–39, New York, NY, USA. ACM Press.
Anvik, J., Hiew, L., and Murphy, G. C. (2006). Who should fix this bug? In Proceedings of the 28th International Conference on Software Engineering (ICSE’06), pages 361–370, New York, NY, USA. ACM Press.
Cavalcanti, Y. C., Almeida, E. S., da Cunha, C. E. A., Pinto, E. R., and Meira, S. R. L. (2008). The bug-report duplication problem: a characterization study. Technical report, C.E.S.A.R and Federal University of Pernambuco.
Eastwood, A. (1993). Firm fires shots at legacy systems. Computing Canada, 19(2), 17.
Erlikh, L. (2000). Leveraging legacy system dollars for e-business. IT Professional, 2(3), 17–23.
Hiew, L. (2006). Assisted Detection of Duplicate Bug Reports. Master’s thesis, The University of British Columbia.
References II
Huff, F. (1990). Information systems maintenance. The Business Quarterly, (55), 30–32.
Kitchenham, B. and Pfleeger, S. L. (2002). Principles of survey research: part 5: populations and samples. SIGSOFT Software Engineering Notes, 27(5), 17–20.
Ko, A. J., Myers, B. A., and Chau, D. H. (2006). A linguistic analysis of how people describe software problems. In Proceedings of the Visual Languages and Human-Centric Computing (VLHCC’06), pages 127–134, Washington, DC, USA. IEEE Computer Science.
Koskinen, J. (2004). Software maintenance costs. http://www.cs.jyu.fi/~koskinen/smcosts.htm.
Lientz, B. P. and Swanson, E. B. (1981). Problems in application software maintenance. Communications of the ACM, 24(11), 763–769.
McKee, J. R. (1984). Maintenance as a function of design. In AFIPS National Conference Proceeding, volume 53, pages 187–193.
References III
Moad, J. (1990). Maintaining the competitive edge. Datamation, 4(36), 61–62.
Podgurski, A., Leon, D., Francis, P., Masri, W., Minch, M., Sun, J., and Wang, B. (2003). Automated support for classifying software failure reports. In Proceedings of the 25th International Conference on Software Engineering (ICSE’03), pages 465–475, Washington, DC, USA. IEEE Computer Society.
Port, O. (1988). The software trap – automate or else. Business Week, 9(3051), 142–154.
Runeson, P., Alexandersson, M., and Nyholm, O. (2007). Detection of duplicate defect reports using natural language processing. In Proceedings of the 29th International Conference on Software Engineering (ICSE’07), pages 499–510. IEEE Computer Science Press.
Sandusky, R. J., Gasser, L., and Ripoche, G. (2004). Bug report networks: Varieties, strategies, and impacts in a f/oss development community. In Proceedings of the 1st International Workshop on Mining Software Repositories (MSR’04), pages 80–84, University of Waterloo, Waterloo.
References IV
Sommerville, I. (2007). Software Engineering. Addison Wesley, 8th edition.
Song, Q., Shepperd, M. J., Cartwright, M., and Mair, C. (2006). Software defect association mining and defect correction effort prediction. IEEE Transactions on Software Engineering, 32(2), 69–82.
Wang, X., Zhang, L., Xie, T., Anvik, J., and Sun, J. (2008). An approach to detecting duplicate bug reports using natural language and execution information. In Proceedings of the 30th International Conference on Software Engineering (ICSE’08), pages 461–470. ACM Press.
Wohlin, C., Runeson, P., Höst, M., Ohlsson, M. C., Regnell, B., and Wesslén, A. (2000). Experimentation in Software Engineering: An Introduction. The Kluwer International Series in Software Engineering. Kluwer Academic Publishers, Norwell, Massachusetts, USA.
Zelkowitz, M. V., Shaw, A. C., and Gannon, J. D. (1979). Principles of Software Engineering and Design. Prentice Hall Professional Technical Reference.