A Bug Report Analysis and Search Tool (Presentation for M.Sc. Degree)

A Bug Report Analysis and Search Tool
M.Sc. Presentation

Yguaratã Cerqueira Cavalcanti
[email protected]

Advisor: Silvio Romero de Lemos Meira
Co-Advisor: Eduardo Santana de Almeida

Center for Informatics – Federal University of Pernambuco (UFPE)
http://www.cin.ufpe.br

Reuse in Software Engineering (RiSE)
http://www.rise.com.br

07/03/2009, Recife – Brazil

Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 1 / 57

Summary

1 Introduction
M.Sc. Context, Motivation, Proposed solution

2 The Bug Report Duplication Problem: A Characterization Study
Definition, Planning and Operation, Results

3 BAST
Requirements, Architecture, Overview

4 Case Study
Definition, Planning, Analysis and interpretation

5 Experiment
Definition, Planning, Analysis and interpretation

6 Related Work

7 Conclusion

8 References

1 Introduction

M.Sc. Context

Change management handles requests for:

new features

correction of errors

improvements

It drives software maintenance and evolution


Motivation

Software maintenance and evolution are characterised by their huge cost and slow speed of implementation

According to Sommerville, they account for almost 90% of software costs

Year   Total costs   Reference
2000   >90%          Erlikh (2000)
1993   75%           Eastwood (1993)
1990   >90%          Moad (1990)
1990   60–70%        Huff (1990)
1988   60–70%        Port (1988)
1984   65–75%        McKee (1984)
1981   >50%          Lientz and Swanson (1981)
1979   67%           Zelkowitz et al. (1979)

Table: Studies on software maintenance costs (Koskinen, 2004).

Bug tracking activity

Bug report management

Verify bug report validity

Analyze the impact of a bug report

Assign a developer

Help with development process in general

Bug reports
Software artifacts that describe some defect or enhancement
Generally, bug report submitters are developers, users, or testers

Bug trackers
Tools used to manage, store and handle change requests (also known as bug reports)


Bug tracker advantages

Traceability (developers, releases)

Fast identification of problems

Metrics (errors per developer, identification of critical components, etc.)

Comments

Project history

Examples: Mantis, Bugzilla, Trac, Jira

A bug report example

[Four slides showing an example bug report; the images are not reproduced in this transcript.]

Issues coming from bug trackers

Dynamic assignment of bug reports (Anvik et al., 2006);

Change impact analysis and effort estimation of new bug reports (Song et al., 2006);

Quality of bug report descriptions (Ko et al., 2006);

Software evolution traceability (Sandusky et al., 2004); and

Duplicate bug report detection, which consists of avoiding the submission of bug reports that describe an already submitted issue (Hiew, 2006).

The bug report duplication problem

Characterized by the submission of two or more bug reports that describe the same software issue

Overhead of rework to search and analyze bug reports

People take about 5-15 minutes to perform search and analysis (Anvik et al., 2005; Cavalcanti et al., 2008)

10% to 30% of the bug reports in a repository are duplicates (Anvik et al., 2005; Runeson et al., 2007; Cavalcanti et al., 2008)

So, each duplicate incurs costs with:
opening bug reports (5-15 minutes)
CCB analysis (5-15 minutes)
developer analysis (5-15 minutes)
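The per-report figures on this slide can be turned into a rough overhead estimate. The sketch below is illustrative only: the function name, the assumption that every duplicate passes through all three steps (opening, CCB analysis, developer analysis), and the repository size of 10,000 reports are hypothetical. Only the 5-15 minute range and the 10%-30% duplicate rate come from the slide.

```python
def duplicate_overhead_hours(total_reports, duplicate_rate, minutes_per_step, steps=3):
    """Estimate the hours spent handling duplicates (hypothetical model).

    Assumes each duplicate costs `minutes_per_step` in each of `steps`
    activities: opening, CCB analysis, and developer analysis.
    """
    duplicates = total_reports * duplicate_rate
    return duplicates * steps * minutes_per_step / 60


# Best case: 10% duplicates, 5 minutes per step.
low = duplicate_overhead_hours(10_000, 0.10, 5)
# Worst case: 30% duplicates, 15 minutes per step.
high = duplicate_overhead_hours(10_000, 0.30, 15)

print(f"Estimated overhead: {low:.0f} to {high:.0f} hours")
```

Under these assumptions the overhead for the hypothetical repository falls between 250 and 2250 hours, which is why even a modest reduction in duplicates is worth pursuing.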

Proposed solution

The proposed solution consists of a Web-based application that enables people involved with bug report search and analysis to perform such tasks more effectively.

2 The Bug Report Duplication Problem: A Characterization Study

Definition

The goal of this study was to analyze bug repositories and the activities for searching and analyzing bug reports

with the purpose of understanding them with respect to the possible factors that could impact on the duplication problem and their consequences on software development

from the point of view of the researchers

in the context of software development projects

Questions
Q1: Do the projects have a considerable amount of duplicate bug reports?
Q2: Is the productivity being affected by the bug report duplication problem?
Q3: Is there a common vocabulary for bug report descriptions?
Q4: How are the relationships between master bug reports and duplicate bug reports characterized?
Q5: Does the type of bug report influence the amount of duplicates?


Planning and operation

Projects and data selection
All bug reports until June 2008

Project          LOC    Staff size  Bugs    Life-time
Bugzilla         55K    340         12829   14
Eclipse          6.5M   352         130095  7
Epiphany         100K   19          10683   6
Evolution        1M     156         72646   11
Firefox          80K    514         60233   9
GCC              4.2M   285         35797   9
Thunderbird      310K   192         19204   8
Tomcat           200K   57          8293    8
Private Project  2M     21          7955    2

Performed at C.E.S.A.R. between June 2008 and August 2008

Results

Question 1: Do the analyzed projects have a considerable amount of duplicate bug reports?

Metric  Bugz.  Eclip.  Epiph.  Evol.  Firef.  GCC    Thund.  Tomc.  Private Proj.  Mean  SD
M1 %    23.32  19.44   31.52   43.24  38.39   17.68  49.10   8.24   21.59          28.1  13.4

Question 2: Is the submitters' productivity being affected by the bug report duplication problem?

Metric           Bugz.  Eclip.  Epiph.  Evol.  Firef.  GCC    Thund.  Tomc.  Private Proj.  Mean   SD
M2 (min)         05-15  –       05-15   05-15  05-10   05-15  05-15   –      20-30          12.5   1.88
M4 bugs per day  71     722     59      403    334     198    106     46     145            231.5  222.1

Question 3: Is there a common vocabulary for bug report descriptions?

Metric  Bugz.  Eclip.  Epiph.  Evol.  Firef.  GCC  Thund.  Tomc.  Private Proj.  Mean  SD
M5 %    –      25      –       –      22      –    –       –      35             31.2  9.5
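As a quick check on the M1 row, the summary columns can be recomputed from the per-project percentages. This is a minimal sketch using only the values shown in the table; the variable names are illustrative, and the use of the sample standard deviation (n - 1 denominator) is an assumption that happens to agree with the reported figure.

```python
from statistics import mean, stdev

# M1: percentage of duplicate bug reports per project (values from the table).
m1_duplicates_pct = {
    "Bugzilla": 23.32,
    "Eclipse": 19.44,
    "Epiphany": 31.52,
    "Evolution": 43.24,
    "Firefox": 38.39,
    "GCC": 17.68,
    "Thunderbird": 49.10,
    "Tomcat": 8.24,
    "Private Project": 21.59,
}

values = list(m1_duplicates_pct.values())
m1_mean = mean(values)
m1_sd = stdev(values)  # sample standard deviation (n - 1 denominator)

print(f"Mean = {m1_mean:.1f}%, SD = {m1_sd:.1f}")
```

Rounded to one decimal place, the result agrees with the Mean (28.1) and SD (13.4) columns for M1.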


Results [2]

Question 4: How are the relationships between master bug reports and duplicate bug reports characterized?

One to one relation

bug123: bug3453

One to many relation

bug345: bug45345, bug465, bug654

Figure: Bug reports grouping.
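The grouping shown in the figure maps naturally onto a dictionary from a master bug report to its duplicates. The sketch below is only an illustration of that data structure, reusing the bug identifiers from the slide; the function and variable names are hypothetical and not taken from the study's tooling.

```python
from collections import Counter

# Master bug report -> list of bug reports marked as its duplicates.
groups = {
    "bug123": ["bug3453"],
    "bug345": ["bug45345", "bug465", "bug654"],
}


def relation_type(duplicates):
    """Classify a group by how many duplicates point at the master."""
    return "one-to-one" if len(duplicates) == 1 else "one-to-many"


counts = Counter(relation_type(dups) for dups in groups.values())
share_one_to_one = counts["one-to-one"] / len(groups)

print(counts, f"one-to-one share: {share_one_to_one:.0%}")
```

Computing the one-to-one share over a whole repository is the kind of measurement that would answer Question 4.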

Results [3]

Question 5: Does the type of bug report influence the amount of duplicates?

[Figure not reproduced in this transcript.]

Study summary

All the projects are being affected by the bug report duplication problem;

Productivity is being affected by the bug report duplication problem;

No common vocabulary is used to describe the bug reports;

More than 80% of the groups are of the one-to-one type;

Bug report duplication occurs independently of the type of bug report;

The number of LOC is not a factor for the duplication problem;

The size of the repository is not a factor for duplication;

Projects’ life-time is not a factor for duplication;

The staff size (developers) is not a factor for the duplication problem; and

The profile of the submitter is a determining factor for the submission of duplicates: sporadic ≥ average ≥ frequent


3 BAST

Requirements

Functional requirements

FR1 - Keyword-based search

FR2 - Rank search results based on bug report similarity rate

FR3 - Index bug reports from XML files

FR4 - Index bug reports from original database

FR5 - Extract useful information from bug reports

Non-Functional requirements

NFR1 - Simple and intuitive filter interface

NFR2 - Reports about bug repository status

NFR3 - Integration with most popular bug report tracking systems

NFR4 - Log search queries and user actions

NFR5 - Reasonable similarity rate

NFR6 - Web-based interface with AJAX
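To make FR1 and FR2 concrete, here is a minimal sketch of keyword search with similarity-based ranking. The slides do not say which similarity measure BAST uses, so TF-IDF weighting with cosine similarity is assumed here as a common stand-in. The function names, the tokenizer, and the three sample reports are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(reports):
    """Build TF-IDF vectors for a dict of {report_id: text}."""
    docs = {rid: Counter(tokenize(text)) for rid, text in reports.items()}
    n = len(docs)
    # Document frequency: in how many reports each term appears.
    df = Counter(term for counts in docs.values() for term in counts)
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = {
        rid: {term: tf * idf[term] for term, tf in counts.items()}
        for rid, counts in docs.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, vectors, idf):
    """Return (report_id, score) pairs, most similar first, dropping zero scores."""
    counts = Counter(tokenize(query))
    q = {term: tf * idf[term] for term, tf in counts.items() if term in idf}
    ranked = sorted(((cosine(q, vec), rid) for rid, vec in vectors.items()), reverse=True)
    return [(rid, score) for score, rid in ranked if score > 0]


# Hypothetical bug reports, for illustration only.
reports = {
    "bug1": "Browser crashes when opening a PDF attachment",
    "bug2": "Crash on opening PDF file from email",
    "bug3": "Bookmarks toolbar does not show icons",
}
vectors, idf = build_index(reports)
results = search("crash opening pdf", vectors, idf)
for rid, score in results:
    print(rid, round(score, 3))
```

Here `bug2` ranks above `bug1` because it shares all three query terms, and `bug3` is dropped because it shares none. Since the tokenizer does no stemming, "crashes" and "crash" count as different terms; a real tool would likely normalize such variants.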


Architecture

[Figure not reproduced in this transcript.]

Overview

[Figure not reproduced in this transcript.]

4 Case Study

Definition

Context. Performed in a real test cycle at a C.E.S.A.R. partner between July and August 2008
Systematic process to test and open bug reports

Objectives.
1 To determine which tool can prevent more duplicate bug reports
2 To consider whether our tool decreases the time spent on analysis of bug reports

Baseline tool. Internal tool where testers can search for bug reports using SQL filters.

Null hypotheses

H0: $\mu_{\text{time with BAST}} > \mu_{\text{time with baseline}}$

$\mu_{\text{duplicates avoided with BAST}} < \mu_{\text{duplicates avoided with baseline}}$

Alternative hypotheses

H1: $\mu_{\text{time with BAST}} < \mu_{\text{time with baseline}}$

$\mu_{\text{duplicates avoided with BAST}} > \mu_{\text{duplicates avoided with baseline}}$


Planning

The tool was tested by the Bug Report Master
Responsible for the test cycle
Most experienced tester
Doubts should be resolved with him

Case study design: Search and analysis being performed in:

Step 1. Internal tool ⟹ BAST
Step 2. BAST ⟹ Internal tool

Metrics (manual annotations):
Type of bug reports analyzed
Number of duplicate bug reports avoided
Time spent to analyze similar bug reports

Quantitative analysis: Descriptive statistics

144 bug reports were analyzed

Analysis and interpretation

Repository status

[Figure not reproduced in this transcript.]

Analysis and interpretation [2]

Duplicates found

[Figure not reproduced in this transcript.]

Analysis and interpretation [3]

Time spent

[Figure not reproduced in this transcript.]

Case study summary

Bug tracker status. More than 50% of duplicates

Duplicates found. Our tool can prevent more duplicates than the baseline tool

Time spent. The bug report master saved time using our tool

Drawbacks

Case study design. Accommodation of the subject, who may come to prefer one tool over the other.

Amount of bug reports in treatments. The amounts of bug reports that were analyzed in each treatment were very different.

Lack of subjects. The number of subjects was not sufficient to generalize the case study results.


5 Experiment

Definition

The goal of this experiment was to analyze a tool to improve search and analysis of bug reports

with the purpose of evaluating it with respect to its effectiveness and efficiency on detection of duplicate bug reports and time saving

from the point of view of the researchers

in the context of software development projects

Questions

Q1 Is there a reduction in the number of duplicate bug reports with the adoption of the new tool?

Q2 Is there a reduction in the time that submitters spend on the search and analysis of bug reports with the adoption of the tool?

Q3 Did the submitters have difficulty using the tool?


Definition [2]

Objects of study: BAST and Bugzilla.

Quality focus: Effectiveness and efficiency of the tool developed.

Context: The adoption of a tool developed to aid the bug report tracking process, focusing on search and analysis of bug reports to avoid duplicates.

Experiment type: Off-line experiment (Wohlin et al., 2000)

Subjects: 18 Ph.D. and M.Sc. students from the Computer Science department at Federal University of Pernambuco/Brazil

Performed in a distributed setting (no location restrictions)

Bug reports from the Firefox open-source project

Planning

Subjects selection. Selected by convenience sampling (Wohlin et al., 2000; Kitchenham and Pfleeger, 2002)

Instrumentation: 32 error descriptions concerning the Firefox project
50% with defects that already have bug reports describing them in the repository
50% with unique/not-reported defects

Guidelines for the experiment execution (FAQ)

Time-sheets to record the time spent on search and analysis

Quantitative analysis: Descriptive statistics and hypothesis testing [t-test (Wohlin et al., 2000)]

Qualitative analysis: Questionnaire

Planning [2]

Null hypothesis

H0: $\mu_{\text{time with BAST}} > \mu_{\text{time with baseline}}$

$\mu_{\text{duplicates avoided with BAST}} < \mu_{\text{duplicates avoided with baseline}}$

Alternative hypothesis

H1: $\mu_{\text{time with BAST}} < \mu_{\text{time with baseline}}$

$\mu_{\text{duplicates avoided with BAST}} > \mu_{\text{duplicates avoided with baseline}}$

Independent variables. The tool used (BAST or Bugzilla)

Dependent variables. (a) amount of duplicate bug reports and (b) the time spent with search and analysis

Planning [3]

Experiment design

[Figure not reproduced in this transcript.]

Analysis and interpretation

Descriptive statistics

         Time spent on analysis   Bug-reports avoided
         BAST      Bugzilla       BAST      Bugzilla
Mean     4.54      4.32           7.56      8.33
Maximum  6.84      9.56           13        12
Minimum  1.78      2.47           0         0
SD       1.49      1.91           3.5       3.2

Table: Descriptive statistics.

Analysis and interpretation [2]

Descriptive statistics [2]

Figure: Box plot for time spent.

Analysis and interpretation [3]

Descriptive statistics [3]

Figure: Box plot for duplicates avoided.

Analysis and interpretation [4]

Hypothesis test

                    Time spent on analysis   Duplicates avoided
t0                  0.6292                   -1.2466
Degrees of freedom  17                       17
p-value             0.5376                   0.2294
T distribution      2.11                     2.11
Result (t0 > T)     H0: not rejected         H0: not rejected

Analysis of dependency

                     BAST time  Bugzilla time  BAST duplicates  Bugzilla duplicates
Years of experience  -0.13      -0.02          -0.19            0.18
Number of projects   -0.11      0.37           -0.28            -0.025
Bug trackers used    -0.16      0.35           -0.26            0.05
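The hypothesis test can be sketched as follows. The slides name only a t-test; 17 degrees of freedom with 18 subjects is consistent with a paired design (n - 1), so a paired t-test is assumed here. The function names and sample measurements are hypothetical. The critical value 2.11 and the decision rule `t0 > T` are taken from the table.

```python
import math
from statistics import mean, stdev


def paired_t(sample_a, sample_b):
    """Return (t0, degrees of freedom) for a paired t-test on two samples."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    t0 = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t0, n - 1


def reject_h0(t0, critical_value):
    """Decision rule as written in the table: reject H0 only if t0 > T."""
    return t0 > critical_value


T_CRITICAL_DF17 = 2.11  # critical value reported on the slide for 17 degrees of freedom

# Hypothetical per-subject measurements, only to exercise the function.
t0, df = paired_t([5, 6, 7, 8], [4, 6, 5, 7])
print(f"t0 = {t0:.4f}, df = {df}")

# Applying the rule to the statistics reported in the table.
print(reject_h0(0.6292, T_CRITICAL_DF17))   # time spent on analysis
print(reject_h0(-1.2466, T_CRITICAL_DF17))  # duplicates avoided
```

Neither reported statistic exceeds 2.11, which matches the "H0: not rejected" result in both columns.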


Qualitative analysis

BAST features. Seven (7) used the filter features provided by the tool.

BAST usability. Only one subject mentioned some difficulty using the filters, and only one subject had problems with the ordering features.

BAST usefulness. Fifteen (15) subjects believe that the way bug report details are presented in BAST is more useful for the analysis than in Bugzilla.

Testimonials
One subject wrote “in fact, the way details are presented saves time to check them, since it is not necessary to open extra tabs or windows to see the details”, and another wrote “it became easier to identify the duplicate bug reports and navigate among the details of the them”.


Validity Threats

Boredom

Lack of Historical Data

Environment

Subjects' Knowledge of Bug Reports

Error Re-descriptions and Fictitious Errors

Halo Effect

Internet Connection Constraints

6 Related Work

Related work

Automated Support for Classifying Software Failure Reports (Podgurski et al., 2003)
Bug reports: Software failures automatically submitted
Technique: Supervised and unsupervised pattern classification and multivariate visualization
Testing: Batch runs
Dataset: GCC, Jikes, and JavaC

Assisted Detection of Duplicate Bug Reports (Hiew, 2006)
Bug reports: Natural language bug reports
Technique: Organize similar bug reports into centroids using TF-IDF
Testing: Batch runs
Dataset: Firefox, Eclipse, Apache, and Fedora Core
Results: Precision of 29% and recall of 50%

Related work [2]

Detection of Duplicate Defect Reports Using Natural Language Processing (Runeson et al., 2007)
Bug reports: Natural language bug reports
Technique: Natural Language Processing (NLP)
Testing: Batch runs and a tool
Dataset: Sony Ericsson Mobile Communications
Results: Recall of 40%

An Approach to Detecting Duplicate Bug Reports Using Natural Language and Execution Information (Wang et al., 2008)
Bug reports: Natural language bug reports
Technique: NLP and execution information
Testing: Batch runs
Dataset: Firefox and Eclipse
Results: Recall of 67%-93% at its best
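The related work reports its results as precision and recall. As a reminder of what those figures measure in duplicate detection, here is a minimal sketch. The function name and the bug identifiers are hypothetical and do not reproduce any of the cited evaluations.

```python
def precision_recall(suggested, true_duplicates):
    """Compute precision and recall for a list of suggested duplicates.

    precision = correct suggestions / all suggestions
    recall    = correct suggestions / all real duplicates
    """
    suggested, true_duplicates = set(suggested), set(true_duplicates)
    hits = suggested & true_duplicates
    precision = len(hits) / len(suggested) if suggested else 0.0
    recall = len(hits) / len(true_duplicates) if true_duplicates else 0.0
    return precision, recall


# Hypothetical case: the tool suggests four candidates, one of which is a real
# duplicate, while the repository actually holds two duplicates of the new report.
p, r = precision_recall(["bug2", "bug5", "bug9", "bug11"], ["bug2", "bug7"])
print(f"precision = {p:.0%}, recall = {r:.0%}")
```

In this example precision is 25% and recall is 50%. A recall-only result, as reported for some of the approaches above, says how many real duplicates were found but not how many irrelevant suggestions the user had to read.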

7 Conclusion

Research contribution

A taxonomy for the bug repositories mining area

The state-of-the-art on mining bug repositories

A characterization of the bug report duplication problem

A tool to reduce the time spent on search and analysis of bug reports

A case study to evaluate the proposed tool

An experiment with 18 subjects to evaluate the tool

Papers

Cavalcanti, Y. C., Martins, A. C., de Almeida, E. S., and de Lemos Meira, S. R. (2008a). Avoiding Duplicate CR reports in Open Source Software Projects. In The 9th International Free Software Forum (IFSF’08), Porto Alegre, Brazil.

Cavalcanti, Y. C., de Almeida, E. S., da Cunha, C. E. A., Pinto, E. R., and Meira, S. R. L. (2008b). The Bug Report Duplication Problem: A Characterization Study. Technical report, C.E.S.A.R and Federal University of Pernambuco.

Papers for the Case Study and for the Experiment

Two more journal papers are being written (characterization and thesis)

Future Work

Evolve from prototype

Information visualizationAlternative integration methods

Provide integration with othertools

Search and raking techniquesComments of a bug reportNumber of informal references

Experiment replications


References I

Anvik, J., Hiew, L., and Murphy, G. C. (2005). Coping with an open bug repository. In Proceedings of the 2005 OOPSLA workshop on Eclipse technology eXchange, pages 35–39, New York, NY, USA. ACM Press.

Anvik, J., Hiew, L., and Murphy, G. C. (2006). Who should fix this bug? In Proceedings of the 28th International Conference on Software Engineering (ICSE’06), pages 361–370, New York, NY, USA. ACM Press.

Cavalcanti, Y. C., Almeida, E. S., da Cunha, C. E. A., Pinto, E. R., and Meira, S. R. L. (2008). The bug-report duplication problem: a characterization study. Technical report, C.E.S.A.R and Federal University of Pernambuco.

Eastwood, A. (1993). Firm fires shots at legacy systems. Computing Canada, 19(2), 17.

Erlikh, L. (2000). Leveraging legacy system dollars for e-business. IT Professional, 2(3), 17–23.

Hiew, L. (2006). Assisted Detection of Duplicate Bug Reports. Master’s thesis, The University of British Columbia.


References II

Huff, F. (1990). Information systems maintenance. The Business Quarterly, (55), 30–32.

Kitchenham, B. and Pfleeger, S. L. (2002). Principles of survey research: part 5: populations and samples. SIGSOFT Software Engineering Notes, 27(5), 17–20.

Ko, A. J., Myers, B. A., and Chau, D. H. (2006). A linguistic analysis of how people describe software problems. In Proceedings of the Visual Languages and Human-Centric Computing (VLHCC’06), pages 127–134, Washington, DC, USA. IEEE Computer Society.

Koskinen, J. (2004). Software maintenance costs. http://www.cs.jyu.fi/~koskinen/smcosts.htm.

Lientz, B. P. and Swanson, E. B. (1981). Problems in application software maintenance. Communications of the ACM, 24(11), 763–769.

McKee, J. R. (1984). Maintenance as a function of design. In AFIPS National Conference Proceeding, volume 53, pages 187–193.


References III

Moad, J. (1990). Maintaining the competitive edge. Datamation, 4(36), 61–62.

Podgurski, A., Leon, D., Francis, P., Masri, W., Minch, M., Sun, J., and Wang, B. (2003). Automated support for classifying software failure reports. In Proceedings of the 25th International Conference on Software Engineering (ICSE’03), pages 465–475, Washington, DC, USA. IEEE Computer Society.

Port, O. (1988). The software trap – automate or else. Business Week, 9(3051), 142–154.

Runeson, P., Alexandersson, M., and Nyholm, O. (2007). Detection of duplicate defect reports using natural language processing. In Proceedings of the 29th International Conference on Software Engineering (ICSE’07), pages 499–510. IEEE Computer Society Press.

Sandusky, R. J., Gasser, L., and Ripoche, G. (2004). Bug report networks: Varieties, strategies, and impacts in a f/oss development community. In Proceedings of the 1st International Workshop on Mining Software Repositories (MSR’04), pages 80–84, University of Waterloo, Waterloo.


References IV

Sommerville, I. (2007). Software Engineering. Addison Wesley, 8th edition.

Song, Q., Shepperd, M. J., Cartwright, M., and Mair, C. (2006). Software defect association mining and defect correction effort prediction. IEEE Transactions on Software Engineering, 32(2), 69–82.

Wang, X., Zhang, L., Xie, T., Anvik, J., and Sun, J. (2008). An approach to detecting duplicate bug reports using natural language and execution information. In Proceedings of the 30th International Conference on Software Engineering (ICSE’08), pages 461–470. ACM Press.

Wohlin, C., Runeson, P., Höst, M., Ohlsson, M. C., Regnell, B., and Wesslén, A. (2000). Experimentation in Software Engineering: An Introduction. The Kluwer International Series in Software Engineering. Kluwer Academic Publishers, Norwell, Massachusetts, USA.

Zelkowitz, M. V., Shaw, A. C., and Gannon, J. D. (1979). Principles of Software Engineering and Design. Prentice Hall Professional Technical Reference.

