Good Hunting: Locating, Prioritizing, and Fixing Bugs Automatically (Keynote, IWESEP 2013)
DESCRIPTION
Keynote speech, IWESEP 2013 (https://sites.google.com/site/iwesep2013/)
TRANSCRIPT
Good Hunting: Locating, Prioritizing, and Fixing Bugs Automatically
Dongsun Kim, The University of Luxembourg
Interdisciplinary Centre for Security, Reliability and Trust
2 Dec 2013
Serval Team
Hunting
Hunting 101
1. Seeking
2. Selecting
3. Shooting
Debugging 101
1. Localizing
2. Prioritizing
3. Fixing
About This Talk
Three debugging techniques based on SW repository mining
Quick Tips on mining
Future Directions
Three techniques
Two-phase recommendation model for bug localization
Early prediction model for bug prioritization
Pattern-based program repair for bug fixing
Where Should We Fix This Bug? A Two-phase Recommendation Model
Dongsun Kim, Yida Tao, Sunghun Kim (The Hong Kong University of Science and Technology, China)
Andreas Zeller (Saarland University)
IEEE Transactions on Software Engineering, Vol. 39, No. 11
Fault Localization vs. Bug (File) Localization
Fault Localization: input = test cases (passing/failing) + program (class/module); output = faulty statement/predicate
Bug (File) Localization: input = bug report + program; output = buggy file
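To make the fault-localization half of this contrast concrete, here is a minimal sketch of spectrum-based fault localization using the classic Tarantula suspiciousness score. The coverage data and test names are hypothetical; the paper's own technique is bug (file) localization, not this.

```python
# Spectrum-based fault localization sketch (Tarantula suspiciousness).
# Hypothetical coverage data: which statements each test executes.

def tarantula(coverage, failing, passing):
    """Rank statements by suspiciousness: a statement executed mostly
    by failing tests scores close to 1.0."""
    scores = {}
    all_stmts = {s for tests in coverage.values() for s in tests}
    for stmt in all_stmts:
        ef = sum(1 for t in failing if stmt in coverage[t])  # failing tests hitting stmt
        ep = sum(1 for t in passing if stmt in coverage[t])  # passing tests hitting stmt
        fail_ratio = ef / len(failing) if failing else 0.0
        pass_ratio = ep / len(passing) if passing else 0.0
        denom = fail_ratio + pass_ratio
        scores[stmt] = fail_ratio / denom if denom else 0.0
    return sorted(scores.items(), key=lambda kv: -kv[1])

coverage = {
    "t1": {"s1", "s2", "s3"},   # passing
    "t2": {"s1", "s3"},         # passing
    "t3": {"s2", "s3"},         # failing
}
ranking = tarantula(coverage, failing=["t3"], passing=["t1", "t2"])
# "s2" ranks first: it is hit by the failing test but only half the passing ones.
```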
Bug Report
Bug Description
Comments
ML Classification 101
Training: feature vectors with known class labels train a classifier (model)
(1, 90, 21, A, text, ..., 58) → [Type 4]
(2, 12, 100, E, aaa, ..., 76) → [Type 2]
...
Classification: the trained model assigns a class to a new feature vector
(4, 39, 5, K, text, ..., 32) → [Type 3]
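The train-then-classify workflow on this slide can be sketched with a deliberately tiny pure-Python classifier (1-nearest-neighbour), using numeric stand-ins for the slide's feature vectors; any real model would follow the same two-step shape.

```python
# Minimal classification sketch: labelled feature vectors "train" a model,
# which then assigns a class to an unseen vector. 1-NN keeps it tiny.

def train(examples):
    """'Training' for 1-NN is simply memorising the labelled vectors."""
    return list(examples)

def classify(model, vector):
    """Assign the label of the closest training vector (squared Euclidean)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(model, key=lambda ex: dist(ex[0], vector))
    return nearest[1]

# Training data: (feature vector, class label), echoing the slide's examples.
model = train([
    ((1, 90, 21), "Type 4"),
    ((2, 12, 100), "Type 2"),
])
label = classify(model, (4, 39, 5))  # unseen vector -> predicted class
```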
Bug Localization using ML
Feature vector: meta + textual information in bug reports
Class: changed files in bug reports
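A rough sketch of how a bug report becomes such a (feature vector, class) pair: bag-of-words over the textual fields plus a couple of meta fields, with the changed files as the label. The field names and feature choices here are illustrative, not the paper's exact feature set.

```python
# Sketch: turn one bug report into training data for bug (file) localization.
# Features = textual bag-of-words + meta fields; class = changed files.
import re
from collections import Counter

def report_features(report):
    """Bag-of-words over summary + description, plus simple meta features."""
    text = (report["summary"] + " " + report["description"]).lower()
    words = Counter(re.findall(r"[a-z_][a-z0-9_]*", text))
    meta = {
        "META:component": report["component"],
        "META:severity": report["severity"],
    }
    return {**words, **meta}

report = {
    "summary": "Crash when expiring history",
    "description": "nsNavHistoryExpire deletes too many visits",
    "component": "Places",
    "severity": "critical",
    "changed_files": ["nsNavHistoryExpire.cpp"],  # class label for training
}
features = report_features(report)
label = report["changed_files"]
```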
Intuitive Approach
[Slide shows a page from the TSE paper: Fig. 1, the one-phase prediction model, in which bug reports are fed directly into a prediction model that outputs the predicted files to fix; and Fig. 2, an uninformative bug report (an excerpt from Mozilla Bug #403040) whose vague description forced the reviewer to ask the submitter for more detail on history and bookmark settings.]
→ Low precision and recall
Quality Matters
Good Report:
"When I did B and C after A, it crashes with this stack trace"
"My bookmark item is deleted if I try this link"
"no response after clicking button A and B"
Bad Report:
"not working"
"error message"
"there is a glitch at the toolbar"
Hooimeijer and Weimer, "Modeling bug report quality," ASE 2007; Zimmermann et al., "What makes a good bug report?" TSE 2010
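The good/bad distinction on this slide can be caricatured as a keyword heuristic: reports that name concrete steps, UI elements, or a stack trace are more useful than vague complaints. The signal words below are illustrative only; they are not taken from the cited studies.

```python
# Toy heuristic for the slide's good vs. bad report distinction.
# Signal-word lists are made up for illustration.

GOOD_SIGNALS = ("stack trace", "steps", "after", "when i", "click")
BAD_SIGNALS = ("not working", "glitch", "error message")

def looks_informative(text):
    """True when the report contains more 'concrete' than 'vague' signals."""
    t = text.lower()
    good = sum(s in t for s in GOOD_SIGNALS)
    bad = sum(s in t for s in BAD_SIGNALS)
    return good > bad

verdict = looks_informative(
    "When I did B and C after A, it crashes with this stack trace"
)
```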
Bad Report
[Slide repeats the paper page, highlighting Fig. 2: the uninformative Mozilla Bug #403040 report as an example of a bad report.]
23
Two-phase Recommendation ModelJOURNAL OF LATEX CLASS FILES, VOL. 6, NO. 1, JANUARY 2007 5
PredictionModel
a.cppa.cppa.cppb.cppb.cppb.cpp
Bug ReportsPredicted
m.cm.cm.ch.ch.ch.c
PredictedFiles to Fix
Fig. 1: One-phase Prediction Model.
Bug 403040 - Places killed all my history >2 days ago Last Comment
Status: VERIFIED FIXED Whiteboard:
Keywords: dataloss, regression
Product: FirefoxComponent: Bookmarks & History
Version: Trunk Platform: All All
Importance: P1 critical (vote) Target Milestone: Firefox 3 beta2
Assigned To: Dietrich Ayala (:dietrich)QA Contact: bookmarks
URL:
Depends on: Blocks:
Show dependency tree / graph
Reported: 2007-11-08 08:47 PST by Reed Loden [:reed] (very busy)
Modified: 2010-12-17 06:29 PST (History)
CC List: 14 users (show)
Flags: mconnor: blocking-firefox3+ (more flags)
See Also:
Crash Signature:
Summon comment box
Attachments
fix v1.1 (1.31 KB, patch)2007-11-09 11:51 PST, Dietrich Ayala (:dietrich)
set mExpireVisits to default if not set in prefs (3.37 KB, patch)2007-11-15 02:31 PST, Marco Bonardo [:mak]
Add an attachment (proposed patch, testcase, etc.)
2007-11-08 08:47:34 PST
So, I noticed last night that I was missing three days or so in my history sidebar (I think days 3, 4, and 5), and now when I check today, the only thing in my history sidebar is today and yesterday. What happened to my history? :(
I have no idea how this happened, so I can't give good STR, sorry. I have had to kill Firefox several times lately, so it may be related to some type of shutdown sqlite save or expiration or something?
Description
Dietrich Ayala (:dietrich) 2007-11-08 09:04:47 PST
What's your browser.history_expire_days value?
Comment 1
Dietrich Ayala (:dietrich) 2007-11-08 09:20:58 PST
Hrm, no other bugs reported about this. I also searched the build forums for the last few days, no comments about anything like this.
Killing Firefox shouldn't matter: the shutdown work would not have occurred if your force-killed it, and SQLite is (theoretically) immune to data corruption from unexpected shutdown given that we run it in the safest possible mode (synchronous = full).
Are all your bookmarks still there?
Comment 2
Reed Loden [:reed] (very busy) 2007-11-08 09:48:13 PST
(In reply to comment #1) > What's your browser.history_expire_days value?
Both browser.history_expire_days and browser.history_expire_days.mirror are 180 days.
(In reply to comment #2) > Are all your bookmarks still there?
Yes, all my bookmarks are still there.
Comment 3
Robert Kaiser (:[email protected]) 2007-11-09 08:13:20 PST
I'm using places history with my self-built SeaMonkey builds, and lose my places history about daily in the last few days, though it worked perfectly before in older builds. I first thought it would be lost on shutdown/restart, but I noticed that I had a few visited links left from a last session a few times, so it at least isn't at every restart.
Comment 4
Marco Bonardo [:mak] 2007-11-09 09:31:10 PST
i'm not sure if that could be the problem, but nsNavHistoryExpire::FindVisits it's looking strange:
Comment 5
Page 1 of 5Bug 403040 – Places killed all my history >2 days ago
2012-01-23https://bugzilla.mozilla.org/show_bug.cgi?id=403040
Fig. 2: An uninformative bug report. This is an excerptfrom Mozilla Bug #403040, written by the bug submit-ter. This description is not informative and the bugreviewer indeed had to ask the submitter for furtherelaboration on his browser’s history and bookmarksettings.
further computes each file’s probability of being afile to fix. Based on this probability, top-k files arerecommended to developers as the prediction result.
3.3 Two-phase Prediction Model
As Hooimeijer et al. [49] and Bettenburg et al. [50]noticed, the quality of bug reports can vary consider-ably. Some bug reports may not have enough infor-mation to predict files to fix. Our evaluation of one-phase prediction (Section 5) confirms this conjecture:bug reports whose files are not successfully predictedusually have insufficient information (e.g., no initialdescription). In other words, including uninformativebug reports might yield poor prediction performance.
Figure 2 shows an example of an uninformativebug report. In this report, the submitter describesa problem faced when using Firefox. However, thisdescription is very general and contains few informa-tive keywords that indicate the problematic modules.Therefore, it is not helpful for developers to locate thefiles to fix. Similarly, our one-phase prediction modeldoes not perform well with such uninformative bugreports.
Hence, it is desirable to filter out uninformative bugreports before the actual prediction process. Based onthis observation, we propose the two-phase predictionmodel that has two classification phases: binary andmulti-class classification (Figure 3). The model firstfilters out uninformative reports (Section 3.3.1) andthen predicts files to fix (Section 3.3.2).
Phase 2Phase 1
PredictionModel
PredictionModelP
a.cppb.cpp
Bug Reports PredictableReports Predicted
Pm.ch.c
Reports PredictedFiles to Fix
Deficient
DDeficientReports
Fig. 3: Two-phase prediction model. This model rec-ommends files to fix only when the given bug reportis determined to have sufficient information.
3.3.1 Phase 1
Phase 1 filters out uninformative bug reports beforepredicting files to fix. Its prediction model classifiesa given bug report as “predictable” or “deficient”(binary classification) as shown in Figure 3. Only bugreports classified as “predictable” are taken up for thePhase 2 prediction.
The prediction model in Phase 1 leveragesprediction history. The training dataset of this modeluses a set of bug reports that have already beenresolved. Let B = {b1, b2, . . . , bn} be a set of nresolved bug reports chronologically sorted by theirfiling date. V (bi) is the i-th bug report’s featurevector, which is extracted as described in Section 3.1.P (bi) is the set of actual files changed to fix thebug (i.e., the files in the bug’s patch), which can beobtained as well from report bi. For each report, itslabel (“predictable” or “deficient”) is determined bythe following process: for an arbitrary report bj 2 B,a one-phase prediction model Mj is trained on{(V (b1), P (b1)), (V (b2), P (b2)) . . . (V (bj�1), P (bj�1))}to predict files to fix for bj . If the predictionresult hits any file in P (bj), bj is labeled as“predictable”; otherwise, it is labeled as “deficient”.Now, let L(b) be the label of report b. Byapplying the above process to all reports inB � {b1}, we can obtain the training dataset{(V (b2), L(b2)), (V (b3), L(b3)), . . . , (V (bn), L(bn))} forthe prediction model of Phase 1. Note that no trainingdataset is built for b1 since there is no bug reportbefore b1 to create (V (b1), L(b1)).
When a new bug report is submitted, the prediction model classifies it as either “predictable” or “deficient”. If the report is classified as “predictable”, it is passed on to Phase 2 prediction; otherwise, no further prediction is conducted. In the latter case, developers may ask the report submitter to give more information about the bug.
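At prediction time, the two phases compose as a simple gate in front of the recommender. The sketch below is a minimal stand-in for that flow (Figure 3); the classifier functions are hypothetical placeholders, not the paper’s trained models.

```python
def two_phase_predict(report, is_predictable, recommend_files):
    """Route a new report: return ('predictable', files) when the Phase 1
    gate accepts it, and ('deficient', []) when it is filtered out."""
    if is_predictable(report):
        return "predictable", recommend_files(report)
    # Deficient report: stop here; developers may ask the submitter for
    # more information rather than receive a likely-wrong recommendation.
    return "deficient", []
```

Here `is_predictable` plays the role of the Phase 1 binary classifier and `recommend_files` the Phase 2 multi-class model.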
3.3.2 Phase 2
The Phase 2 model accepts “predictable” bug reports obtained from Phase 1 as the input. It extracts features
Noise Filtering / File Recommendation
23
24
JOURNAL OF LATEX CLASS FILES, VOL. 6, NO. 1, JANUARY 2007 5
[Figure 1 diagram: bug reports → prediction model → predicted files to fix (e.g., a.cpp, b.cpp, m.c, h.c)]
Fig. 1: One-phase Prediction Model.
Bug 403040 - Places killed all my history >2 days ago
Status: VERIFIED FIXED
Keywords: dataloss, regression
Product: Firefox    Component: Bookmarks & History
Version: Trunk    Platform: All
Importance: P1 critical    Target Milestone: Firefox 3 beta2
Assigned To: Dietrich Ayala (:dietrich)    QA Contact: bookmarks
Reported: 2007-11-08 08:47 PST by Reed Loden [:reed]
Modified: 2010-12-17 06:29 PST
CC List: 14 users
Flags: mconnor: blocking-firefox3+
Attachments:
- fix v1.1 (1.31 KB, patch), 2007-11-09 11:51 PST, Dietrich Ayala (:dietrich)
- set mExpireVisits to default if not set in prefs (3.37 KB, patch), 2007-11-15 02:31 PST, Marco Bonardo [:mak]
Description (Reed Loden [:reed], 2007-11-08 08:47:34 PST):
So, I noticed last night that I was missing three days or so in my history sidebar (I think days 3, 4, and 5), and now when I check today, the only thing in my history sidebar is today and yesterday. What happened to my history? :(
I have no idea how this happened, so I can't give good STR, sorry. I have had to kill Firefox several times lately, so it may be related to some type of shutdown sqlite save or expiration or something?
Comment 1 (Dietrich Ayala (:dietrich), 2007-11-08 09:04:47 PST):
What's your browser.history_expire_days value?
Comment 2 (Dietrich Ayala (:dietrich), 2007-11-08 09:20:58 PST):
Hrm, no other bugs reported about this. I also searched the build forums for the last few days, no comments about anything like this.
Killing Firefox shouldn't matter: the shutdown work would not have occurred if your force-killed it, and SQLite is (theoretically) immune to data corruption from unexpected shutdown given that we run it in the safest possible mode (synchronous = full).
Are all your bookmarks still there?
Comment 3 (Reed Loden [:reed], 2007-11-08 09:48:13 PST):
(In reply to comment #1) > What's your browser.history_expire_days value?
Both browser.history_expire_days and browser.history_expire_days.mirror are 180 days.
(In reply to comment #2) > Are all your bookmarks still there?
Yes, all my bookmarks are still there.
Comment 4 (Robert Kaiser (:[email protected]), 2007-11-09 08:13:20 PST):
I'm using places history with my self-built SeaMonkey builds, and lose my places history about daily in the last few days, though it worked perfectly before in older builds. I first thought it would be lost on shutdown/restart, but I noticed that I had a few visited links left from a last session a few times, so it at least isn't at every restart.
Comment 5 (Marco Bonardo [:mak], 2007-11-09 09:31:10 PST):
i'm not sure if that could be the problem, but nsNavHistoryExpire::FindVisits it's looking strange:
https://bugzilla.mozilla.org/show_bug.cgi?id=403040
Fig. 2: An uninformative bug report. This is an excerpt from Mozilla Bug #403040, written by the bug submitter. This description is not informative, and the bug reviewer indeed had to ask the submitter for further elaboration on his browser's history and bookmark settings.
further computes each file's probability of being a file to fix. Based on this probability, the top-k files are recommended to developers as the prediction result.
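The top-k step above can be illustrated directly. This is a hypothetical sketch: the function name and the example probabilities are ours, standing in for whatever per-file probabilities the trained classifier produces.

```python
def top_k_files(file_probs, k=10):
    """file_probs: dict mapping file name -> predicted probability of being
    a file to fix. Returns the k file names with the highest probability,
    in descending order, as the recommendation list."""
    ranked = sorted(file_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]
```

With made-up probabilities {"a.cpp": 0.7, "b.cpp": 0.1, "m.c": 0.15, "h.c": 0.05} and k = 2, the recommendation would be a.cpp followed by m.c.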
3.3 Two-phase Prediction Model
As Hooimeijer et al. [49] and Bettenburg et al. [50] noticed, the quality of bug reports can vary considerably. Some bug reports may not have enough information to predict files to fix. Our evaluation of one-phase prediction (Section 5) confirms this conjecture: bug reports whose files are not successfully predicted usually have insufficient information (e.g., no initial description). In other words, including uninformative bug reports might yield poor prediction performance.
Figure 2 shows an example of an uninformative bug report. In this report, the submitter describes a problem faced when using Firefox. However, this description is very general and contains few informative keywords that indicate the problematic modules. Therefore, it is not helpful for developers to locate the files to fix. Similarly, our one-phase prediction model does not perform well with such uninformative bug reports.
Noise Filtering / File Recommendation
Noise Filtering
24
25
Noise Filtering - Classifying existing reports
bug report #1 ... N-1
Model
[Training]
N
[Testing]
Match any file → N is [predictable]
No match → N is [deficient]
25
26
[predictable] Phase 1 Model
[deficient] N
[Training]
Noise Filtering - Training Phase 1 model
26
27
Phase 1 Model  new
[Classifying]
[predictable] [deficient]
Noise Filtering - Using Phase 1 model
27
28
Noise Filtering / File Recommendation
File Recommendation
28
29
Evaluation
Two-phase Model
- Comparative Study
One-phase Model
29
30
[Figure: per-module prediction likelihood plots (likelihood % on the Y-axis vs. top-k on the X-axis, k = 1..10), comparing four approaches: Usual Suspects, One-phase, BugScout, Two-phase. Per-panel statistics:]
ff-bookmark: 431 total test cases, 196 predictable, feedback 45.5%
ff-general: 216 total test cases, 158 predictable, feedback 73.2%
core-js: 517 total test cases, 446 predictable, feedback 78.1%
core-dom: 251 total test cases, 98 predictable, feedback 39.0%
core-layout: 471 total test cases, 208 predictable, feedback 44.2%
core-style: 171 total test cases, 127 predictable, feedback 74.3%
core-xpcom: 202 total test cases, 41 predictable, feedback 20.3%
core-xul: 202 total test cases, 37 predictable, feedback 18.3%
Fig. 4: Prediction likelihood for each module shown in Table 1. The Y-axis represents the likelihood values computed by Equation (1). The X-axis represents the k values described in Section 3. In the upper-left corner of each plot, the total number of bug reports in the test set, the number of predictable bug reports, and the feedback value computed by Equation (5) are shown.
Results - Likelihood
30
30
differences are statistically significant with 95% confidence [57]. We chose this non-parametric test method instead of a parametric test method such as the t-test because the distribution of our evaluation results may not be normal.
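The specific test method is cited rather than shown in this excerpt. As an illustration of why a non-parametric paired test fits this setting, here is a minimal exact sign test; the model names and likelihood values are hypothetical stand-ins, not the paper's data:

```python
from math import comb

def sign_test_p(xs, ys):
    """Two-sided exact sign test for paired samples (ties dropped).
    Under the null hypothesis each paired difference is equally
    likely to be positive or negative, so no normality is assumed."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)
    pos = sum(d > 0 for d in diffs)
    k = max(pos, n - pos)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical paired likelihood values (%) at k = 1..10 for two models
two_phase = [57, 62, 66, 70, 73, 76, 79, 82, 85, 88]
one_phase = [30, 34, 36, 38, 40, 41, 42, 43, 44, 46]
p = sign_test_p(two_phase, one_phase)
print(p < 0.05)  # all ten differences are positive, so p is about 0.002
```

The sign test is one of the simplest non-parametric paired tests; the paper's own choice of test is the one cited as [57].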
In addition, we used Feedback [25] to compute the ratio of bug reports classified as predictable after Phase 1 prediction. Let NP denote the number of predictable bug reports and ND denote the number of deficient ones. Feedback is computed as follows:
Feedback = NP / (NP + ND)    (5)
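In code, Equation (5) is a one-liner (the variable names here are mine):

```python
def feedback(num_predictable: int, num_deficient: int) -> float:
    """Feedback = NP / (NP + ND): the fraction of bug reports
    that Phase 1 classifies as predictable."""
    return num_predictable / (num_predictable + num_deficient)

# ff-general in Fig. 4: 158 predictable reports out of 216 in total
print(round(feedback(158, 216 - 158), 3))
```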
5 RESULTS
This section reports the evaluation results. Sections 5.1 and 5.2 report the prediction performance and compare the results of four different models with their statistical significance (RQ1). We discuss the feedback (RQ2) in Section 5.3, and present the sensitivity analysis in Section 5.4 to compare the prediction power of individual features (RQ3). Section 5.5 shows examples of usage to demonstrate how our approach can improve developers' bug-fixing practice (RQ4).
5.1 Performance
We first address RQ1: What is the predictive power of the two-phase model in recommending files to fix? We present the likelihood, precision, and recall values in Figures 4, 5, and 6, respectively. Since the model recommends the top-k files, the performance depends on the value of k. The X-axis of the figures represents the k value, which ranges from 1 to 10.
When recommending only the top one file (i.e., k = 1), the two-phase model's likelihood ranges from 19% to 57%. The likelihood value grows as k increases. When k = 10, the two-phase model yields likelihood between 52% and 88%. Suppose there are 10 bug reports. In the best scenario, our two-phase prediction model is able to successfully recommend at least one file to fix for 6 to 9 out of 10 reports, which is very promising.
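Equation (1) itself is not reproduced in this excerpt; the wording "successfully recommend at least one file to fix" suggests a top-k hit rate, which a sketch under that assumption would compute as:

```python
def likelihood_at_k(recommendations, fixed_files, k):
    """Fraction of bug reports for which at least one actually fixed
    file appears among the top-k recommended files (a hit@k reading
    of the likelihood metric; illustrative, not the paper's code)."""
    hits = sum(
        bool(set(recs[:k]) & set(fixed))
        for recs, fixed in zip(recommendations, fixed_files)
    )
    return hits / len(recommendations)

# Two hypothetical bug reports with their ranked recommendations
recs = [["a.c", "b.c", "c.c"], ["x.c", "y.c", "z.c"]]
fixed = [["b.c"], ["q.c"]]
print(likelihood_at_k(recs, fixed, 2))  # -> 0.5 (first report is a hit)
```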
When k = 1, the two-phase model's precision ranges from 6% to 47%, with an average of 23%. The precision ranges from 7% to 11% when k = 10. These values indicate that the two-phase model can make correct predictions even with a small k.
The average recall of the two-phase model increases from 9% to 33% as k grows from 1 to 10. This indicates that when recommending the top ten files, our model can correctly suggest on average one third of the files which need to be fixed for a given bug. In addition, the two-phase model achieves a 60% recall value for ff-bookmark when k = 10.
Our two-phase model successfully predicts files to fix for 52% to 88% of all bug reports, with an average of 70%.
5.2 Comparison
As shown in Figure 4, the two-phase model outperforms the one-phase model in prediction likelihood. For example, when recommending the top 10 files, the likelihood of the two-phase model for eight modules ranges from 52% to 88%, with an average value of 70%. The one-phase model, on the other hand, has an average likelihood of only 44% when k = 10, which is even less than the lowest prediction likelihood of the two-phase model.
To counteract the problem that rare events are likely to be observed in multiple comparisons, we used Bonferroni correction [58], so that a p-value less than 0.05/4 = 0.0125 indicates a significant difference between the corresponding pair of models. As shown in Table 2, the two-phase model significantly outperforms the one-phase model for half of the modules.
The two-phase model also manifests higher precision and recall than the one-phase model, as shown in Figures 5 and 6.

The two-phase model outperforms the one-phase model in prediction likelihood, precision, and recall.
The one-phase model, on the other hand, manifests prediction performance comparable to the Usual Suspects model: the last column of Table 2 shows that the p-values between these two models are greater than 0.0125 for all eight modules. BugScout also shows performance similar to the Usual Suspects, as shown in Figures 4, 5, and 6. One possible reason is that BugScout leverages the defect-proneness information to recommend files to fix, an idea similar to the Usual Suspects model.

Only the two-phase model outperforms the Usual Suspects model, while the one-phase model and BugScout are both on par with the Usual Suspects model.
We also compared the average rank of correctly predicted files for each model (Equation 4). As shown in Table 3, the two-phase model has the highest average rank among the four prediction models for 6 out of 8 modules (except for core-js and core-xul). This implies that, compared to the other three models, developers might have more confidence in using the two-phase model since it ranks correctly predicted files at a higher position, which could potentially save their inspection time.
30
Which Crashes Should I Fix First?
Crashing Bug Prioritization
Dongsun Kim, Xinming Wang, Sunghun Kim, S. C. Cheung
The Hong Kong University of Science and Technology, China
Andreas Zeller
Saarland University
IEEE Transactions on Software Engineering, May/June 2011
Selected as the featured article of the issue
Sooyong Park
Sogang University
31
32
Crashes
32
33
Crash Reporting System
Apple Crash Report
Dr. Watson
Breakpad + Socorro
33
34
Bucketing
34
35
Top Crashes
of crash reports, we sorted crashes by their frequency of being reported, and then counted the percentage of crash reports accounted for in each interval of 10 crashes. The bar chart in Fig. 5 shows the results. For example, the leftmost bar indicates that the top-10 crashes accounted for more than 50 percent of the Firefox crash reports and more than 35 percent of the Thunderbird crash reports. Fig. 5 provides the initial validation of our hypothesis: for example, the top-20 crashes account for 72 and 55 percent of the crash reports for Firefox and Thunderbird, respectively.
Note that such a trend has also been observed in commercial software. For example, by analyzing crash reporting data, Microsoft has found that a small set of defects is responsible for the vast majority of its code-related problems: "fixing 20 percent of code defects can eliminate 80 percent or more of the problems users encounter" [1]. This indicates that identifying top crashes is important for commercial products as well as open source projects.
Moreover, such a phenomenon is not restricted to crash-related failures. For example, Adams [2] observed that most operational system failures are caused by a small proportion of latent faults. Goseva and Hamill [23], [25] observed that a few small regions in a program could account for the reliability of the whole program. Our finding here is consistent with these studies.
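The heavy-tail observation behind Fig. 5 can be checked directly from a table of per-crash report counts; a small sketch with made-up frequencies (not Mozilla's data):

```python
def top_k_share(crash_counts, k):
    """Share of all crash reports accounted for by the k most
    frequently reported crashes."""
    counts = sorted(crash_counts, reverse=True)
    return sum(counts[:k]) / sum(counts)

# Hypothetical heavy-tailed frequencies: a few crashes dominate
counts = [500, 300, 120, 40, 20, 10, 5, 3, 1, 1]
print(round(top_k_share(counts, 2), 2))  # -> 0.8
```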
3.2 Limitation of Current Practice
Top crashes need to be fixed as soon as possible. Given a top crash, how long does it take for developers to start working on it? Ideally, a top crash should be handled immediately once it is reported. In other words, the date of a first crash report should be close to the date when developers begin to work on the crash. To verify whether this is the case in the real world, we investigated the crashes and bug-fixing activities of Firefox 3.5.
One issue here is how to determine the time when developers begin to work on a crash. In Mozilla projects such as Firefox and Thunderbird, management policy mandates that any bug-fixing activity for a crash in the crash repository must begin with the creation of a bug report using Bugzilla [10] by the developer. Thus, when the developer creates a bug report for a crash, we assume that he or she is ready to work on this crash. Therefore, we regard the time when its corresponding bug report is created as the time when developers begin to work on this crash. With this information, we calculated the number of days it took for a
KIM ET AL.: WHICH CRASHES SHOULD I FIX FIRST?: PREDICTING TOP CRASHES AT AN EARLY STAGE TO PRIORITIZE DEBUGGING... 433
Fig. 4. Number of crash reports for Firefox 3.5 per day since its release (30 June 2009). More than 14,000-24,000 crash reports have been reported per day. The number of crash reports indicates that users experienced at least the same number of failures (abrupt program termination). Note that 750 crashes (crash points) are reported for Firefox 3.5.
Fig. 5. Number of crash reports ranked in groups of 10 for Firefox and Thunderbird. Firefox 3.0 and Thunderbird 3.0 crash reports were collected for July 2008-December 2008 and January 2009-May 2009, respectively. The top-10 crashes accounted for more than 35 percent (Thunderbird) and 50 percent (Firefox) of the total number of crash reports.
35
The top-20 crashes account for 50-70% of all crash reports
35
36
developer to start working on a top crash. Fig. 6 shows the results for the top-100 crashes of Firefox 3.5.
From Fig. 6, we can observe that the real situation is far from ideal: on average, developers waited 40 days until they started to work on a top-10 crash. This is unfortunate because, given the frequency of these top crashes, such a delay would mean hundreds of thousands of crash occurrences.
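The delay measured in Fig. 6 is simply the gap between a crash's first report and the creation of its Bugzilla bug report; a sketch with hypothetical dates around the Firefox 3.5 release:

```python
from datetime import date

def days_to_action(first_crash: date, bug_created: date) -> int:
    """Days between the first crash report for a crash and the
    creation of its bug report, the proxy the paper uses for the
    moment developers start working on the crash."""
    return (bug_created - first_crash).days

# Hypothetical dates; Firefox 3.5 shipped 30 June 2009
print(days_to_action(date(2009, 7, 1), date(2009, 8, 10)))  # -> 40
```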
So why did Mozilla developers allow such a long delay in handling top crashes? One might blame this delay on insufficient motivation for maintenance. However, our personal communication with Mozilla development team members Gary Kong and Channy Yun suggests otherwise: Mozilla developers are generally eager to work on top crashes. However, they are conservative in acknowledging a crash as a top crash, even if it appears at the top of the list for the moment. This conservativeness is driven by the concern that, at the early stage when crashes are first reported (e.g., in the alpha and beta-testing phases), the frequency of a crash might be substantially different from its frequency at the later stage. Therefore, developers prefer to "wait and see" until there are sufficient crash reports to support a crash being a top crash.
What if Mozilla developers were less conservative? Let us assume that they had used the data at an early stage, the alpha-testing phase, to determine top crashes. Using the 5,199 crash reports submitted during the alpha-testing phase of Firefox 3.5, they would select those crashes that occurred most frequently in this stage. However, are these crashes really the top crashes? Fig. 7 illustrates the ranking of these crashes in terms of their actual occurrence frequencies, which are derived from all 415,351 crash reports submitted during the main life span of Firefox 3.5 (from the start of alpha testing to the day when the next version was released). In this figure, each bar represents a k-most-frequent crash in the alpha-testing phase. For example, the leftmost bar indicates that the most-frequent crash in the alpha-testing phase is ranked 162nd in terms of actual occurrence frequency.
From Fig. 7, we can observe that the k-most-frequent crashes in the alpha-testing phase are poor indicators of actual top crashes: only two of them (k = 3 and k = 10) are top-20 crashes, while most of the others are actually infrequent crashes. In fact, the 20 most-frequent crashes in the alpha-testing phase account for only 13.35 percent of all the crash reports of Firefox 3.5, whereas the actual top-20 crashes account for 78.26 percent. The key reason, as pointed out by Fenton and Neil [19], is that the failure rate of a fault at the early stage (prerelease) can be significantly different from its failure rate after release. In practice, the goal of internal and volunteer alpha testers is to expose the largest number of bugs with the smallest number of test cases. Therefore, they usually tend not to repeat already-exercised crashing test cases even though these test cases might trigger top crashes.
The above discussion highlights the dilemma of the current practice: by being more conservative in determining top crashes, developers delay bug fixing, but by being less conservative in determining top crashes, developers miss the actual top crashes. The core of the problem is that current practice relies on hindsight to identify top crashes; that is, we can accurately identify top crashes only after they have already caused significant trouble for the users.
It should be noted that most of the top crashes do occur in the early phase, although they are not frequent. For example, 16 of the top-20 crashes of Firefox 3.5 occurred at least once during alpha testing (shown in the bottom-right Gantt chart of Fig. 8). This indicates an opportunity for improving current practice (see Section 6.7 for more discussion on this topic).
3.3 How Can Prediction Improve the Current Practice?
To address this problem of current practice, we advocate a prediction-based approach that does not rely on hindsight to identify top crashes. With our approach, it becomes feasible to identify top crashes during prerelease testing (i.e., alpha or beta testing), and also to react as soon as the first crash reports are received. Rather than waiting for a number of crashes to occur, developers can identify and address the most pressing problems without delay.
To see the benefit of our approach, let us assume that wehave an “ideal top-crashes predictor” that can accurately
434 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 37, NO. 3, MAY/JUNE 2011
Fig. 6. Number of days for crashes to be reported as bugs (Firefox 3.5). We measured the number of days between the first crash report for each crash and its bug report. There was a correlation between the crash's ranking and the time taken for bug reporting.
Fig. 7. The ranking of most-frequent crashes in the alpha-testing phase.
Time to Action
36
Faster but not so fast!
36
37
To address this challenge, we adopt a learning-based approach, summarized in Fig. 2. From an earlier release, we know which crash reports are "top" (frequent) and which ones are "bottom" (infrequent). We extract the top and bottom stack traces as well as their method signatures. The features of these signatures are then passed to a machine learner. The learner can then immediately classify a crash summarized by a new incoming crash report as frequent (a top crash) or not. As shown in Section 3, the deployment of an accurate top-crash predictor may reduce the number of crash reports in Firefox 3.5 by at least 36 percent if developers fix top crashes first.
We employ features from crash reports and source code to train a machine learner. Our preliminary observations and insights led us to focus on three types of features that form the core of our approach:
• First, we observed that statistical characteristics can indicate whether a crash is a top or bottom crash: in particular, methods in stack traces of top crashes appear again in other top crashes. This motivated us to extract historical features from crash reports.
• Second, intramethod characteristics can also indicate whether a method belongs to frequent crashes; complex methods may crash more often. This motivated us to employ complexity metrics (CM) features such as lines of code and the number of paths for top-crash prediction.
• Third, intermethod characteristics can describe crash frequency; well-connected methods in call graphs may crash often. To measure connectedness, we employ social network analysis (SNA) features such as centrality.
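The three feature groups above are aggregated per stack trace; a simplified sketch of that assembly (the feature names and values are illustrative stand-ins, not the paper's actual 10 history, 28 CM, and 5 SNA features):

```python
# Sketch: aggregate per-method feature values over a stack trace
# by sum and average, mirroring the trace-level accumulation the
# paper describes. Each table maps a method name to one feature.
def trace_features(methods, history, complexity, sna):
    vector = []
    for table in (history, complexity, sna):
        values = [table.get(m, 0.0) for m in methods]
        vector.append(sum(values))                # sum over the trace
        vector.append(sum(values) / len(values))  # average over the trace
    return vector

history = {"f": 12.0, "g": 1.0}      # e.g. appearances in past top crashes
complexity = {"f": 80.0, "g": 15.0}  # e.g. lines of code
sna = {"f": 0.9, "g": 0.1}           # e.g. call-graph centrality
print(trace_features(["f", "g"], history, complexity, sna))
# -> [13.0, 6.5, 95.0, 47.5, 1.0, 0.5]
```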
To validate our approach, we investigate the crash report repositories of the Firefox Web browser as well as the Thunderbird e-mail client. We use a very small training set of only 150-250 crash reports from a prior release (that is, the crash reports received within 10-15 minutes after release). Given the small size of the set, the machine learner can then classify crash reports for the new release immediately, that is, with the very first crash report. This classification method has a high accuracy: in Firefox, 75 percent of all incoming reports are correctly classified; in Thunderbird, the accuracy rises to 90 percent. These accurate prediction results can provide valuable information for developers to prioritize their defect-fixing efforts, improve quality at an early stage, and improve the overall user experience.
From a technical standpoint, this paper makes the following contributions:
1. We present a novel technique to predict whether a crash will be frequent (a "top crash") or not.
2. We evaluate our approach on the crash report repositories of Thunderbird and Mozilla, demonstrating that it scales to real-life software.
3. We show that our approach is efficient, as it requires only a small training set from the previous release. This implies that it can be applied at an early stage of development, e.g., during alpha or beta testing.
4. We show that our approach is effective, as it predicts top crashes with high accuracy. This means that effort on addressing the predicted problems is well spent.
5. We discuss and investigate under which circumstances our approach works best; in particular, we investigate which features of crash reports are most sensitive for successful prediction.
Fig. 2. Approach overview. Our approach has three steps: extracting traces from top and bottom crash reports, creating training data from the traces, and predicting unknown crashes. The first step classifies top and bottom crashes and extracts stack traces from their reports. The second step extracts methods from the stack traces and characterizes these methods using feature data, which are extracted from source code repositories. Feature values are then accumulated per trace. These are used for training a machine learner. In the prediction step, the machine learner takes an unknown crash stack trace and classifies it as a top or bottom trace. (a) Extracting crash traces. (b) Creating corpus. (c) Prediction.
Fig. 1. A Firefox crash message from a user’s perspective.
Approach
ML Classification
Using three feature groups
• History
• Complexity
• Social Network Analysis (SNA) Measures
37
38
History Features
f( )f( )
f( )
f( )
g( )
“f()” appears in crash traces more frequently
and is more vulnerable to crashing
38
39
Complexity Features
f( ) g( )
“f()” is more complex
and more vulnerable to crashing
39
40
SNA Features
f( )
k( )
x( )
y( )
h( )
z( )
g( )
r( )
“f()” is well-connected
and more vulnerable to crashing
40
41
Evaluation - Preprocessing
Top Crashes Bottom Crashes
Our Approach
CrashReports
41
42
crashes, they motivate us to investigate three feature groups.
5 EVALUATION
We present the experimental evaluation of our approach in this section. Five research questions will be evaluated:
• RQ1: Is history information indicative of top crashes?
• RQ2: Is the complexity of a method indicative of its chance of triggering top crashes?
• RQ3: Does the connectedness of a method correlate with its chance of occurring in top crashes?
• RQ4: Is the size of the training data relevant to the accuracy of top-crash prediction?
• RQ5: Which features are more indicative than the others?
This section describes the experiment setup used to evaluate our research questions and reports the experimental results.
5.1 Experiment Setup
For our experiments, we used real crash reports from two open source systems: Firefox and Thunderbird. To demonstrate the effectiveness of our approach toward unknown stack traces, we explicitly separated the training set and the testing set. For example, we collected a training set from Firefox 3.0.9 and a testing set from Firefox 3.0.10. Sometimes, crashes may not be fixed in the following versions. For example, the crash "_PR_MD_SEND" in Firefox 3.0.9 was not fixed in Firefox 3.0.10. As a result, we find that some crashes are reported across different software versions. For fair experiments, we ensured that the reports of the same crash did not appear in both the training and the testing sets by removing these reports from our experiments.
Table 4 describes the data sets (corpus) used in our experiments. We collected crash reports for four programs (two versions of Firefox and two versions of Thunderbird). The two Firefox projects had more than 1,000 data instances (i.e., trace-based feature vectors) extracted from the stack trace database, while the two Thunderbird projects had around 590 data instances. Each project had the same number of top and bottom crashes. Each instance was characterized by 10 history, 28 CM, and five SNA features, as described in Section 4.2, and had 86 elements (sum and average of features), as described in Section 4.3.
Specifically, we created training sets as follows:
1. Sort crashes and choose the top-20 crashes.
2. Randomly select n (e.g., 40 in the case of Firefox 3.0.9) stack traces for each crash.
3. Choose the bottom-20 crashes and select all their traces, as these crashes had fewer than 10 crash reports (sometimes only one).
4. Select the additional bottom 20+k crashes and select all traces until the number of traces is equal to the number of top traces. The testing sets were also created in the same manner.
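The four sampling steps above can be sketched as follows; the data structure (crash id mapped to a list of trace ids, with frequency taken as the number of traces) is my own simplification, not the paper's exact pipeline:

```python
import random

def build_training_set(crashes, n_per_top=40, seed=0):
    """Sketch of the sampling steps: take the top-20 crashes, sample
    n traces from each, then take traces from the least frequent
    crashes upward until the two classes are balanced.
    `crashes` maps crash id -> list of stack-trace ids."""
    rng = random.Random(seed)
    ranked = sorted(crashes, key=lambda c: len(crashes[c]), reverse=True)
    top_traces = [t for c in ranked[:20]
                  for t in rng.sample(crashes[c],
                                      min(n_per_top, len(crashes[c])))]
    bottom_traces = []
    for c in reversed(ranked):  # start from the least frequent crash
        bottom_traces.extend(crashes[c])
        if len(bottom_traces) >= len(top_traces):
            break
    return top_traces, bottom_traces[:len(top_traces)]

# Synthetic repository: 45 crashes with 50 down to 6 traces each
crashes = {f"c{i}": [f"t{i}_{j}" for j in range(50 - i)] for i in range(45)}
top, bottom = build_training_set(crashes, n_per_top=5)
print(len(top), len(bottom))  # -> 100 100 (balanced classes)
```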
We only used history information from the training set to create our testing set, as we assumed that we did not know the history information of the testing set. For example, we counted how many times a method appeared in top crashes in the training set. It is possible that some methods in the testing set did not appear in the training set. In this case, we set the corresponding history features as missing values [37].
For a machine learner, we used two machine learning algorithms: Naive Bayes (NB) [45] and the multilayer perceptron (MLP) [52]. Naive Bayes is a simple probabilistic classification algorithm based on Bayes' theorem [6] with strong naive independence assumptions. It takes training data and calculates probabilities from them. When a new instance is presented, it predicts the target value of the new instance. It is adopted for our evaluation because of its simple structure and fast learning.
MLP is a feedforward artificial neural network [27]. It has several layers of perceptrons, which are simple binary classifiers. Learning in MLP occurs by changing connection weights between perceptrons after the training data are processed. MLP was chosen for our evaluation because MLP can efficiently classify nonlinear problems [52] (we assumed that it is difficult to learn features in trace-based feature vectors using linear functions).
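To make the Naive Bayes idea concrete: the paper used Weka's implementation; the following is only a pedagogical sketch of a Gaussian NB classifier on toy trace vectors (feature meanings and values invented for illustration):

```python
from math import exp, pi, sqrt

class TinyGaussianNB:
    """Minimal Gaussian Naive Bayes: per class, fit a mean and
    variance for each feature; classify by the product of the
    per-feature Gaussian densities (equal class priors assumed)."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            stats = []
            for col in zip(*rows):
                mu = sum(col) / len(col)
                var = max(sum((v - mu) ** 2 for v in col) / len(col), 1e-9)
                stats.append((mu, var))
            self.stats[label] = stats
        return self

    def predict(self, x):
        def score(label):
            s = 1.0
            for v, (mu, var) in zip(x, self.stats[label]):
                s *= exp(-(v - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)
            return s
        return max(self.stats, key=score)

# Toy trace vectors: [history-sum, complexity-sum]; label 1 = top crash
X = [[10, 80], [12, 90], [1, 10], [0, 15]]
y = [1, 1, 0, 0]
clf = TinyGaussianNB().fit(X, y)
print(clf.predict([11, 85]))  # -> 1 (close to the top-crash class)
```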
In addition, we applied the feature selection algorithm proposed by Shivaji et al. [53], which is based on a backward wrapper feature selection technique [47]. First, we ranked the features by their predictive power as measured by the information gain ratio [34], a well-known measure of the amount by which a given feature contributes information to a classification decision. We then removed the least significant feature from the feature set and measured the top/bottom crash prediction accuracy. Next, we continued removing the next weakest feature and measuring the accuracy until only one feature remained in the feature set. After this iteration, we could identify the best prediction accuracy and the feature set that yielded it.
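The backward elimination loop above can be sketched as follows. The gain-ratio scores and the accuracy function are hypothetical placeholders standing in for the real learner and feature set:

```java
import java.util.*;
import java.util.function.Function;

// Sketch of backward feature elimination: rank features by a predictive-power
// score (standing in for information gain ratio), then repeatedly drop the
// weakest remaining feature, recording the accuracy of each subset.
public class FeatureElimination {

    static List<String> bestSubset(Map<String, Double> gainRatio,
                                   Function<List<String>, Double> accuracyOf) {
        // order features from weakest to strongest predictive power
        List<String> current = new ArrayList<>(gainRatio.keySet());
        current.sort(Comparator.comparingDouble(gainRatio::get));

        List<String> best = new ArrayList<>(current);
        double bestAcc = accuracyOf.apply(best);
        while (current.size() > 1) {
            current.remove(0); // drop the weakest remaining feature
            double acc = accuracyOf.apply(current);
            if (acc > bestAcc) { bestAcc = acc; best = new ArrayList<>(current); }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> gain = Map.of("history", 0.4, "cm", 0.6, "sna", 0.1);
        // placeholder accuracy: pretend "cm" helps and "sna" hurts
        Function<List<String>, Double> acc =
            fs -> (fs.contains("cm") ? 0.7 : 0.5) - (fs.contains("sna") ? 0.1 : 0.0);
        System.out.println(bestSubset(gain, acc)); // [history, cm]
    }
}
```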
Although our application scenarios consider prediction at an early stage (e.g., alpha or beta-testing phases), our evaluation concerns two subsequent official release versions (Firefox) because we focused on a performance comparison between our approach and the wait-and-see approach. In
438 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 37, NO. 3, MAY/JUNE 2011
TABLE 4: Data Set Used in Our Experiments
Experiment Subjects
42
other words, we cannot compare the performance if we predict the alpha-version crash stack traces, as stated in the background (Section 3); the wait-and-see approach does not work for the alpha version. Note that stack traces of alpha versions are the same as those of official versions. Therefore, our evaluation deals with correct subjects.
In the case of Thunderbird, we adopted two subsequent alpha versions for our evaluation because these versions are quasi-official versions with sufficient crash reports. In addition, crash reports of the latest official version (Thunderbird 2.0) are currently not available; therefore, no crash reports of that version could be collected.
To implement all the machine learning algorithms mentioned above, we used the Weka [56] library.
5.2 Evaluation Measures
Applying a machine learner to a top-crash prediction problem can result in four possible outcomes:
1. predicting a top stack trace as a top stack trace (T → T),
2. predicting a top stack trace as a bottom stack trace (T → B),
3. predicting a bottom stack trace as a top stack trace (B → T), and
4. predicting a bottom stack trace as a bottom stack trace (B → B).
Items 1 and 4 are correct predictions, while the others are incorrect.
We used the above outcomes to evaluate the classification with the following four measures [3], [31], [48]:
- Accuracy: the number of correctly classified stack traces divided by the total number of traces. This is a good overall measure of classification performance.

  Accuracy = (N_{T→T} + N_{B→B}) / (N_{T→T} + N_{T→B} + N_{B→T} + N_{B→B})    (1)

- Precision: the number of stack traces correctly classified as the expected class (N_{T→T} or N_{B→B}) over the number of all traces classified as top or bottom stack traces (N_{T→T} + N_{B→T} or N_{B→B} + N_{T→B}).

  Precision of top crash traces: P(T) = N_{T→T} / (N_{T→T} + N_{B→T})    (2)

  Precision of bottom crash traces: P(B) = N_{B→B} / (N_{B→B} + N_{T→B})    (3)

- Recall: the number of traces correctly classified as top or bottom traces (N_{T→T} or N_{B→B}) over the number of actual top or bottom stack traces.

  Top trace recall: R(T) = N_{T→T} / (N_{T→T} + N_{T→B})    (4)

  Bottom trace recall: R(B) = N_{B→B} / (N_{B→B} + N_{B→T})    (5)

- F-score: a composite measure of precision P(·) and recall R(·) for each class (top and bottom).

  F(·) = 2 · P(·) · R(·) / (P(·) + R(·))    (6)
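The four measures can be computed directly from the outcome counts. A minimal sketch; the class name and the example counts are illustrative, not taken from the paper:

```java
// Computing accuracy, precision, recall, and F-score from the four
// outcome counts N(T->T), N(T->B), N(B->T), N(B->B).
public class CrashMetrics {
    static double accuracy(int tt, int tb, int bt, int bb) {
        return (double) (tt + bb) / (tt + tb + bt + bb);   // Eq. (1)
    }
    static double precisionTop(int tt, int bt) {
        return (double) tt / (tt + bt);                    // Eq. (2)
    }
    static double recallTop(int tt, int tb) {
        return (double) tt / (tt + tb);                    // Eq. (4)
    }
    static double fScore(double p, double r) {
        return 2 * p * r / (p + r);                        // Eq. (6)
    }

    public static void main(String[] args) {
        int tt = 80, tb = 20, bt = 10, bb = 90; // hypothetical counts
        double p = precisionTop(tt, bt), r = recallTop(tt, tb);
        System.out.printf("acc=%.2f P(T)=%.2f R(T)=%.2f F(T)=%.2f%n",
                accuracy(tt, tb, bt, bb), p, r, fScore(p, r));
    }
}
```

The bottom-class variants (Eqs. 3 and 5) are symmetric, with the roles of the two classes swapped.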
5.3 Prediction Results
This section reports our prediction results. First, we applied our approach to two subsequent versions; for example, we trained a model with Firefox and then applied the model to a subsequent version of Firefox. Second, we applied our approach across projects: we trained a model on Firefox and applied it to Thunderbird, and vice versa. Table 5 shows
KIM ET AL.: WHICH CRASHES SHOULD I FIX FIRST?: PREDICTING TOP CRASHES AT AN EARLY STAGE TO PRIORITIZE DEBUGGING... 439
TABLE 5: Prediction Results
Experiments were conducted for four subjects: two same-project subjects and two cross-project subjects. For each subject, Naive Bayes, NB with feature selection, multilayer perceptron, and MLP with FS were used to classify top and bottom crashes. Four criteria were measured: accuracy, precision, recall, and F-score. In terms of accuracy, MLP outperformed Naive Bayes except for the fourth subject, and MLP with FS outperformed MLP and Naive Bayes for all subjects.
Results
43
the overall results. These results may answer RQ1, 2, and 3. For more details (i.e., the predictive power of individual feature groups), see Section 5.5.
For the subsequent-version prediction, our approach predicted top or bottom crashes with more than 75 percent accuracy, which is sufficiently high to be useful in practice. Note that the accuracy of a random guess would be around 50 percent, since our testing sets were evenly distributed, as shown in Table 5. In terms of top-crash precision, the accuracy of our model was around 90 percent for Thunderbird and 75 percent for Firefox. Overall, we believe our approach is effective and accurate at identifying top crashes as soon as a new crash report arrives.
For the cross-project prediction, the accuracy was around 70 percent, which is slightly lower than that of the subsequent-version prediction. However, an accuracy of 70 percent is still considerably better than that of a random prediction. These results suggest that our trained prediction model can be applied to new projects. For example, suppose that the Mozilla group releases a new product. It is possible to predict the new product's crashes as top or bottom using our prediction model trained on Firefox crashes.
MLP mostly outperformed Naive Bayes. We obtained the best results when we used MLP with feature selection. This implies that using appropriate combinations of features increased the prediction accuracy. We discuss the predictive power of various training data sizes in Section 5.4, and of individual features and feature groups in Section 5.5.
5.4 Size of Training Data
In this experiment, we evaluate the impact of training set size to determine the amount of training data (i.e., the number of crash instances represented as the feature vectors described in Section 4.3) necessary for a reasonable prediction accuracy (around 70 percent) [43] (RQ4). We trained our prediction model using various training set sizes and measured the accuracy. Figs. 10 and 11 show the prediction accuracy with various sizes of training data. We also used different feature groups (history, SNA, CM, and all) to measure the accuracy.
In the case of Firefox (Fig. 10), the accuracy jittered when our model was trained with fewer than 200 training instances. However, after 250 training instances, the results stabilized and reached a reasonable accuracy. Similarly, the accuracy for Thunderbird (Fig. 11) settled after 150 training instances.
5.5 Feature Sensitivity Analysis
In this section, we measure and discuss the sensitivity (predictive power) of feature groups and individual features (RQ1, 2, 3, and 5).
To measure the predictive power of each feature group, we trained our prediction model with three different feature groups: history, CM, and SNA (as described in Section 4.2, these feature groups had 10, 28, and five features, respectively). The results are shown in Figs. 10 and 11.
In the case of Firefox, CM features outperformed the other feature groups. They were more than 70 percent accurate and close to the accuracy of all features (for some training data sizes, they even outperformed all features together). The history feature group showed around 65 percent accuracy after 200 training instances. However, the SNA feature group performed worse than a random guess.
In the case of Thunderbird, all three feature groups showed more than 60 percent accuracy, and the history and SNA feature groups showed more than 70 percent accuracy after 600 training instances. The history feature group even outperformed the case in which all features were used.
Fig. 10. Prediction accuracy using various training data sizes (Firefox 3.0.10 training on Firefox 3.0.9). This graph shows the accuracy on the basis of different feature groups: social network analysis, complexity metrics, history, and all. At the beginning, the accuracy jitters, but it stabilizes after 250 training instances.
Fig. 11. Prediction accuracy using various training data sizes (Thunderbird 3.0a2 training on Thunderbird 3.0a1). This graph shows accuracy on the basis of the same feature groups as Fig. 10. It also has some jitter, but the accuracy stabilized after 150 training instances. Compared to Fig. 10, the accuracy for all four feature groups increased gradually.
Automatic Patch Generation Learned from Human-Written Patches
Dongsun Kim, Jaechang Nam, Jaewoo Song, and Sunghun Kim
The Hong Kong University of Science and Technology, China
The 35th International Conference on Software Engineering (ICSE 2013)
ACM SIGSOFT Distinguished Paper Award
44
45
GenProg
State-of-the-art
Genetic Programming
Random Mutation
Systematically Evaluated
C. Le Goues, M. Dewey-Vogt, S. Forrest, and W. Weimer, "A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each," in ICSE '12.
46
Buggy Code
1500 num = state.parenCount;
1501 int kidMatch = matchRENodes(state, (RENode)ren.kid,
1502                 stop, index);
1503 if (kidMatch != -1) return kidMatch;
1504 for (int i = num; i < state.parenCount; i++)
1505   state.parens[i].length = 0;
1506 state.parenCount = num;
in Interpreter.java, reported as Mozilla Bug #76683
Null Pointer Exception
47
GenProg repairs bugs
Buggy Code
1500 num = state.parenCount;
1501 int kidMatch = matchRENodes(state, (RENode)ren.kid,
1502                 stop, index);
1503 if (kidMatch != -1) return kidMatch;
1504 for (int i = num; i < state.parenCount; i++)
1505   state.parens[i].length = 0;
1506 state.parenCount = num;
GenProg
1500 num = state.parenCount;
1501 int kidMatch = matchRENodes(state, (RENode)ren.kid,
1502                 stop, index);
1503 if (kidMatch != -1) return kidMatch;
1504 for (int i = num; i < state.parenCount; i++)
1505 {
1506   // deleted.
1507 }
1508 state.parenCount = num;
This patch passes ALL test cases.
49
1500 num = state.parenCount;
1501 int kidMatch = matchRENodes(state, (RENode)ren.kid,
1502                 stop, index);
1503 if (kidMatch != -1) return kidMatch;
1504 for (int i = num; i < state.parenCount; i++)
1505   state.parens[i].length = 0;
1506 state.parenCount = num;
1500 num = state.parenCount;
1501 int kidMatch = matchRENodes(state, (RENode)ren.kid,
1502                 stop, index);
1503 if (kidMatch != -1) return kidMatch;
1504 for (int i = num; i < state.parenCount; i++)
1505 {
1506   // do nothing.
1507 }
1508 state.parenCount = num;
Would you accept?
17 Students, 68 Developers
9.4%
90.6%
50
Human-written Patches
Readable
Natural
Easy to understand
We can learn how to generate patches from human knowledge.
51
JDT
>60,000 Patches
Manual Classification
(chart: number of patches per pattern)
Top frequent patterns account for >20~30%
52
Common Fix Patterns
Altering method parameters
obj.method(v1, v2) → obj.method(v1, v3)
Adding a checker
obj.m1() → if (obj != null) { obj.m1() }
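The "adding a checker" pattern can be sketched as a tiny source-level transformation. This string-based version is only illustrative; PAR's actual templates operate on the AST, and the class and method names below are hypothetical:

```java
// Illustrates the "Adding a checker" fix pattern: wrap a dereferencing
// statement in a null guard on the object it dereferences.
public class NullCheckerTemplate {

    static String addNullCheck(String objExpr, String statement) {
        return "if (" + objExpr + " != null) { " + statement + " }";
    }

    public static void main(String[] args) {
        // obj.m1();  ->  if (obj != null) { obj.m1(); }
        System.out.println(addNullCheck("obj", "obj.m1();"));
    }
}
```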
53
PAR
Pattern-based Automatic Program Repair
53
54
Using Human Knowledge for patch generation
Fix Templates
Program Edit Scripts
Manually created from fix patterns (10 templates, from JDT patches)
Highly reusable
if (lhs == DBL_MRK) lhs = ...;
if (lhs == undefined) { lhs = strings[pc + 1]; }
Scriptable calleeScope = ...;
(figure: PAR overview. A buggy program goes through (a) Fault Localization to find fault locations; (b) Template-based Patch Candidate Generation applies a fix template at a fault location to produce a patch candidate; (c) Patch Evaluation runs the tests: Fail discards the candidate, Pass yields the repaired program.)
55
Template-based Patch Candidate Generation
Using a Fix Template: An Example
56
1500� num�=�state.parenCount;�1501� int�kidMatch�=�matchRENodes(state,�(RENode)ren.kid,�1502������ � � � � stop,�index);�1503� if�(kidMatch�!=�Ş1)�return�kidMatch;�1504� for�(int�i�=�num;�i�<�state.parenCount;�i++)�1505� � state.parens[i].length�=�0;�1506� state.parenCount�=�num;�
56
Using a Fix Template: An Example
56
1500� num�=�state.parenCount;�1501� int�kidMatch�=�matchRENodes(state,�(RENode)ren.kid,�1502������ � � � � stop,�index);�1503� if�(kidMatch�!=�Ş1)�return�kidMatch;�1504� for�(int�i�=�num;�i�<�state.parenCount;�i++)�1505� � state.parens[i].length�=�0;�1506� state.parenCount�=�num;�
56
$VJDGVIDVGIDVOGMJONDMDVJDGI��O+6GOMNIONDVMGI�OMDVGOIMVGO-GVIVDGIVGDIVGDVGONIMGDVOM+Null Pointer Checker
Using a Fix Template: An Example
56
1500� num�=�state.parenCount;�1501� int�kidMatch�=�matchRENodes(state,�(RENode)ren.kid,�1502������ � � � � stop,�index);�1503� if�(kidMatch�!=�Ş1)�return�kidMatch;�1504� for�(int�i�=�num;�i�<�state.parenCount;�i++)�1505� � state.parens[i].length�=�0;�1506� state.parenCount�=�num;�
56
$VJDGVIDVGIDVOGMJONDMDVJDGI��O+6GOMNIONDVMGI�OMDVGOIMVGO-GVIVDGIVGDIVGDVGONIMGDVOM+Null Pointer Checker
Using a Fix Template: An Example
56
obj ref.: state, parens[i], ...
1500� num�=�state.parenCount;�1501� int�kidMatch�=�matchRENodes(state,�(RENode)ren.kid,�1502������ � � � � stop,�index);�1503� if�(kidMatch�!=�Ş1)�return�kidMatch;�1504� for�(int�i�=�num;�i�<�state.parenCount;�i++)�1505� � state.parens[i].length�=�0;�1506� state.parenCount�=�num;�
56
$VJDGVIDVGIDVOGMJONDMDVJDGI��O+6GOMNIONDVMGI�OMDVGOIMVGO-GVIVDGIVGDIVGDVGONIMGDVOM+Null Pointer Checker
Using a Fix Template: An Example
56
obj ref.: state, parens[i], ...
Check obj ref.: PASS
1500� num�=�state.parenCount;�1501� int�kidMatch�=�matchRENodes(state,�(RENode)ren.kid,�1502������ � � � � stop,�index);�1503� if�(kidMatch�!=�Ş1)�return�kidMatch;�1504� for�(int�i�=�num;�i�<�state.parenCount;�i++)�1505� � state.parens[i].length�=�0;�1506� state.parenCount�=�num;�
56
$VJDGVIDVGIDVOGMJONDMDVJDGI��O+6GOMNIONDVMGI�OMDVGOIMVGO-GVIVDGIVGDIVGDVGONIMGDVOM+Null Pointer Checker
Using a Fix Template: An Example
56
obj ref.: state, parens[i], ...
Check obj ref.: PASS
Edit: Insert ... ... + if( ) { state.parens[i].length = 0; + } ... ...
state != null && state.parens[i] != null
1500� num�=�state.parenCount;�1501� int�kidMatch�=�matchRENodes(state,�(RENode)ren.kid,�1502������ � � � � stop,�index);�1503� if�(kidMatch�!=�Ş1)�return�kidMatch;�1504� for�(int�i�=�num;�i�<�state.parenCount;�i++)�1505� � state.parens[i].length�=�0;�1506� state.parenCount�=�num;�
56
$VJDGVIDVGIDVOGMJONDMDVJDGI��O+6GOMNIONDVMGI�OMDVGOIMVGO-GVIVDGIVGDIVGDVGONIMGDVOM+Null Pointer Checker
Using a Fix Template: An Example
56
obj ref.: state, parens[i], ...
Check obj ref.: PASS
Edit: Insert ... ... + if( ) { state.parens[i].length = 0; + } ... ...
state != null && state.parens[i] != null
1500� num�=�state.parenCount;�1501� int�kidMatch�=�matchRENodes(state,�(RENode)ren.kid,�1502������ � � � � stop,�index);�1503� if�(kidMatch�!=�Ş1)�return�kidMatch;�1504� for�(int�i�=�num;�i�<�state.parenCount;�i++)�1505� � state.parens[i].length�=�0;�1506� state.parenCount�=�num;�
56
$VJDGVIDVGIDVOGMJONDMDVJDGI��O+6GOMNIONDVMGI�OMDVGOIMVGO-GVIVDGIVGDIVGDVGONIMGDVOM+Null Pointer Checker
Using a Fix Template: An Example
56
obj ref.: state, parens[i], ...
Check obj ref.: PASS
Edit: Insert ... ... + if( ) { state.parens[i].length = 0; + } ... ...
state != null && state.parens[i] != null
1500 num = state.parenCount; 1501 int kidMatch = matchRENodes(state, (RENode)ren.kid, 1502 stop, index); 1503 if (kidMatch != -1) return kidMatch; 1504 for (int i = num; i < state.parenCount; i++) 1505 state.parens[i].length = 0; 1506 state.parenCount = num;
1500 num = state.parenCount; 1501 int kidMatch = matchRENodes(state, (RENode)ren.kid, 1502 stop, index); 1503 if (kidMatch != -1) return kidMatch; 1504 for (int i = num; i < state.parenCount; i++) 1505 { 1506 // deleted. 1507 } 1508 state.parenCount = num;
1500 num = state.parenCount; 1501 int kidMatch = matchRENodes(state, (RENode)ren.kid, 1502 stop, index); 1503 if (kidMatch != -1) return kidMatch; 1504 for (int i = num; i < state.parenCount; i++) 1505 state.parens[i].length = 0; 1506 state.parenCount = num;
1500 num = state.parenCount; 1501 int kidMatch = matchRENodes(state, (RENode)ren.kid, 1502 stop, index); 1503 if (kidMatch != -1) return kidMatch; 1504 for (int i = num; i < state.parenCount; i++) 1505 { 1506 if( state != null && state.parens[i] != null) 1507 state.parens[i].length = 0; 1508 } 1509 state.parenCount = num;
1500� num�=�state.parenCount;�1501� int�kidMatch�=�matchRENodes(state,�(RENode)ren.kid,�1502������ � � � � stop,�index);�1503� if�(kidMatch�!=�Ş1)�return�kidMatch;�1504� for�(int�i�=�num;�i�<�state.parenCount;�i++)�1505� � state.parens[i].length�=�0;�1506� state.parenCount�=�num;�
56
$VJDGVIDVGIDVOGMJONDMDVJDGI��O+6GOMNIONDVMGI�OMDVGOIMVGO-GVIVDGIVGDIVGDVGONIMGDVOM+Null Pointer Checker
Using a Fix Template: An Example
56
obj ref.: state, parens[i], ...
Check obj ref.: PASS
Edit: Insert ... ... + if( ) { state.parens[i].length = 0; + } ... ...
state != null && state.parens[i] != null
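The null-check insertion above can be sketched as plain Java. The class and method names here (NullCheckerDemo, resetBuggy, resetPatched) are hypothetical, chosen only to mirror the parens[i] pattern from the slide; they are not PAR's actual implementation.

```java
// Minimal sketch of what the Null Pointer Checker template does:
// guard a dereference with an inserted null check before executing it.
class NullCheckerDemo {

    // Buggy version: throws NullPointerException when parens[i] is null,
    // mirroring "state.parens[i].length = 0" from the slide.
    static void resetBuggy(StringBuilder[] parens) {
        for (int i = 0; i < parens.length; i++)
            parens[i].setLength(0);
    }

    // Patched version: the template wraps the dereference in a null check.
    static void resetPatched(StringBuilder[] parens) {
        for (int i = 0; i < parens.length; i++) {
            if (parens != null && parens[i] != null)
                parens[i].setLength(0);
        }
    }

    public static void main(String[] args) {
        StringBuilder[] parens = { new StringBuilder("ab"), null };
        boolean threw = false;
        try {
            resetBuggy(parens);
        } catch (NullPointerException e) {
            threw = true;
        }
        System.out.println("buggy threw NPE: " + threw);       // true
        resetPatched(parens);                                  // no exception
        System.out.println("parens[0].length() = " + parens[0].length()); // 0
    }
}
```

The template only adds a guard; it never changes the guarded statement itself, which is why such patches tend to look like what a developer would write.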
57
List of Templates
Parameter Replacer
Method Replacer
Parameter Adder and Remover
Expression Replacer
Expression Adder and Remover
Object Initializer
Range Checker
Collection Size Checker
Null Pointer Checker
Class Cast Checker
57
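Most of these templates follow the same guard-insertion idea. As one more illustration, a Range Checker can wrap an array access in a bounds check; the names (RangeCheckerDemo, getBuggy, getPatched) and the zero fallback value are assumptions made for this sketch, not PAR's actual implementation.

```java
// Hypothetical sketch of the Range Checker template: before an array
// access a[i], insert a bounds check "0 <= i && i < a.length".
class RangeCheckerDemo {

    // Buggy: may throw ArrayIndexOutOfBoundsException for a bad index.
    static int getBuggy(int[] a, int i) {
        return a[i];
    }

    // Patched: the template guards the access; returning 0 when the
    // index is out of range is an assumption made for this sketch.
    static int getPatched(int[] a, int i) {
        if (i >= 0 && i < a.length)
            return a[i];
        return 0;
    }

    public static void main(String[] args) {
        int[] a = { 10, 20, 30 };
        System.out.println(getPatched(a, 1)); // in range: 20
        System.out.println(getPatched(a, 5)); // out of range: 0
    }
}
```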
58
Evaluation: Experiment Design
[Diagram: PAR and GenProg are each run on the same set of bugs, and the number (#) of fixes each tool generates is compared]
58
59
Evaluation: Research Questions
RQ1 (Fixability): How many bugs are fixed successfully?
RQ2 (Acceptability): Which approach can generate more acceptable bug patches?
59
60
Experiment Subjects

Subject       # bugs   LOC       # test cases
Rhino         17       51,001    5,578
AspectJ       18       180,394   1,602
log4j         15       27,855    705
Math          29       121,168   3,538
Lang          20       54,537    2,051
Collections   20       48,049    11,577
Total         119      351,406   25,051

60
61
RQ1: Fixability
[Bar chart (scale 0–30): number of bugs fixed — PAR: 27, GenProg: 16]
PAR fixed more bugs than GenProg (27 > 16).
61
62
RQ2: Acceptability
[Bar chart — responses (%), PAR patches vs. human-written patches: 21 / 28 / 37 / 14 (labels: PAR, Human, Both, Not Sure)]
[Bar chart — responses (%), GenProg patches vs. human-written patches: 20 / 12 / 51 / 17 (labels: GenProg, Human, Both, Not Sure)]
49% vs. 32%
PAR generates more acceptable patches than GenProg.
62
63
Quick Tips on Mining
Repositories hate massive crawlers.
Data formats can change frequently.
Noise filtering [ICSE2011, ICSE2013] is very important.
63
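The first tip can be made concrete with a simple request throttle. The class name PoliteCrawler and the one-request-per-interval policy are assumptions for illustration only, not a prescription for any particular repository host.

```java
// Minimal rate-limiting sketch for repository mining: space out requests
// so the host is not hit by a massive burst. The interval is caller-chosen.
class PoliteCrawler {
    private final long minIntervalMs;
    private long lastRequestMs = 0; // epoch: effectively "never requested"

    PoliteCrawler(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    // Block until at least minIntervalMs has passed since the last request.
    synchronized void throttle() throws InterruptedException {
        long wait = lastRequestMs + minIntervalMs - System.currentTimeMillis();
        if (wait > 0)
            Thread.sleep(wait);
        lastRequestMs = System.currentTimeMillis();
    }

    public static void main(String[] args) throws InterruptedException {
        PoliteCrawler crawler = new PoliteCrawler(100); // >= 100 ms apart
        long start = System.currentTimeMillis();
        for (int i = 0; i < 3; i++) {
            crawler.throttle();
            // fetch one page / commit / bug report here
        }
        System.out.println("done after "
                + (System.currentTimeMillis() - start) + " ms");
    }
}
```

A throttle like this also gives a natural place to handle format changes and retries, since every request funnels through one method.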
64
Future Directions
Automatic Fix Template Identification
Tangled Changes [MSR2013]
Build Scripts (e.g., Ant, Maven, and Gradle)
64