
Predicting Fault-Prone Files using Machine Learning: A Java Open Source Case Study

Priya Krishnan Sundararajan, Ole J. Mengshoel, SureshBabu Rajasekaran, Guido A. Ciollaro and Xinyao Hu

Carnegie Mellon University, Silicon Valley

{priya.sundararajan, ole.mengshoel, xinyao.hu}@west.cmu.edu, [email protected], guido [email protected]

Abstract

In this paper, we study 9 open source Java projects, 10 classical software metrics, 6 machine learning algorithms, and the FindBugs software. These Java projects contain a total of 18,985 files and 3 million lines of code. We use the machine learning approaches to estimate the posterior probability of each file being buggy. Our main finding is that classification using decision trees, Naive Bayes, and Bayesian networks performs best.

1 Introduction

Bugs account for 40% of system failures (Marcus and Stern 2000). The correctness and dependability of software systems are particularly important in mission-critical systems. In the past, the way to find bugs in software was by manual inspection or by extensive testing. Nowadays, there are many tool-based approaches (Rutar, Almazan, and Foster 2004), which can be broadly classified into dynamic analysis and static analysis techniques.

Dynamic analysis techniques discover properties by monitoring program executions for particular inputs; standard testing is the most commonly used form of dynamic analysis (Dillig, Dillig, and Aiken 2010). Static analysis tools (Balakrishnan and Reps 2010) do not require code execution; the code is instead analyzed over all possible inputs. FindBugs (Hovemeyer and Pugh 2004) is a specific example of such an Oracle. All these tools are based on a detailed analysis of source code files and often incur substantial computational cost or setup time, especially for complex software, models, and algorithms. Some techniques, including most model checkers (Corbett et al. 2000) and theorem provers (Flanagan et al. 2002), work on mathematical abstractions of source code, which are difficult or time-consuming to develop.

The purpose of this work is different from and complementary to much previous work on software quality assurance. Instead of attempting to locate the exact position of a known or potential bug, our goal is to establish, by means of machine learning, the "smell" of each file that makes up a software project. Stated more formally, we use the output of a machine learning classifier, which takes a feature vector for each source code file as input, to rank the files by their probability of being buggy. The lift curves induced from this ranking are drawn based on the predictions of the Oracle, which gives the correct class, so incorrect predictions can easily be identified. We explore methods to predict the number of buggy files, so that the machine learning classifiers can be stopped as soon as the expected number of buggy files has been identified. With this analysis we wish not only to investigate the files that are most likely to be fault-prone first, but also to stop the more detailed analysis as soon as the expected number of buggy files has been processed. We call this methodology Strategic Bug Finding (SBF).

Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

In our experiments, we use six machine learning algorithms: Naive Bayes, Bayesian networks, decision trees (C4.5), radial basis function (RBF) networks, simple logistic, and ZeroR. We analyze nine open source Java projects using these ML algorithms, as implemented in the Weka tool (Hall et al. 2009). Our main experimental result is that Bayesian networks and decision trees perform best for most of the projects.

The rest of this article is organized as follows. In Section 2 we discuss previous work on the prediction of buggy files. The machine learning algorithms used in our approach are discussed in Section 3. In Section 4 we present our framework and algorithms. In Section 5 the experimental results and lift charts are shown. Section 6 concludes the paper and presents future work.

2 Background

Software metrics are used, for example, to provide measurements of the source code's architecture and data flow. Taking software metrics as features, machine learning classifiers can be used to predict the probability of a file being buggy. This fundamental connection between software quality and machine learning has been explored by several researchers in the past. There are studies that focus on predicting the fault-proneness of software based on structural properties of object-oriented code (Arisholm, Briand, and Johannessen 2010). Code change history, along with dependency metrics, has been used to predict fault-prone modules (Nagappan and Ball 2007; Kim, Whitehead, and Zhang 2008). A number of process measures, in addition to structural measures, have also been used (Weyuker, Ostrand, and Bell 2007). It has been argued that performance measures should always take into account the percentage of source code predicted as defective, at least for unit testing and code reviews (Mende and Koschke 2009). The cost of predicting a source file as defective during manual testing is proportional to the size of the source code; this is often neglected (Arisholm, Briand, and Fuglerud 2007).

The quality and quantity of training data (software metrics) have a direct impact on the learning process, as different machine learning methods have different requirements regarding training data. Some methods require large amounts of data, other methods are very sensitive to the quality of the data, and still others need both training data and a domain theory (Witten and Frank 2005). An interesting comparison of ant colony optimization versus well-known techniques like C4.5, support vector machines (SVM), logistic regression, K-nearest neighbour, RIPPER, and majority vote has been performed (Vandecruys et al. 2008); in terms of accuracy, C4.5 was the best technique. An experimental evaluation of neural networks versus SVMs on a simple binary classification problem (i.e., buggy vs. correct) gave these results (Gondra 2008): the accuracy was 87.4% when using SVM compared to 72.61% when using neural networks. The cost-effectiveness and classification accuracy (precision, recall, ROC) of eight data mining techniques have also been evaluated (Arisholm, Briand, and Fuglerud 2007); the C4.5 method was found to outperform the others, including SVM and neural networks. Our results also confirm that C4.5 performs better than other machine learning classifiers. One way to graphically evaluate the performance of the models is lift charts, also known as Alberg diagrams (Ohlsson and Alberg 1996).

3 Machine Learning

Intuitively, predicting fault-prone files is a simple classification problem: label each file instance as buggy or correct. We can solve this classification problem by learning a classifier that discriminates between the files in a project based on the metrics of each file. A set of labeled training examples is given to the learner, and the learned classifier is then evaluated on a set of test instances.

The application of machine learning techniques to software bug finding problems has become increasingly popular in recent years. However, selecting the best machine learning classifier for a given software project ahead of time is not possible except in trivial cases. For large software projects, we do not understand the software metrics that affect the performance of a classifier well enough to make such predictions with confidence. Several machine learning techniques can be used; we use the Naive Bayes, decision tree, Bayesian network, radial basis function, ZeroR, and simple logistic classifiers, and evaluate their performance using lift charts.

We use the C4.5 decision tree classifier (Quinlan 1993), which works by recursively partitioning the dataset. In our context, each leaf of a decision tree corresponds to a subset of the available data (based on the software metrics for that subset of files), and its probability distribution can be used for prediction when all the conditions leading to that leaf are met. The C4.5 decision tree classifier has been shown to perform better than other modeling techniques such as SVM and neural networks (Arisholm, Briand, and Fuglerud 2007).
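To make the leaf-probability idea concrete, here is a minimal sketch in Python using scikit-learn's DecisionTreeClassifier as a rough stand-in for Weka's C4.5 (J48); the metric values and labels are invented purely for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy training data: one row per file, columns are software metrics
# (e.g., LOC, MVG, CLOC); labels are 1 = buggy, 0 = correct.
X_train = np.array([[1200, 45, 300],
                    [  90,  3,  20],
                    [2500, 80, 100],
                    [ 300, 10, 150],
                    [4000, 95,  60]])
y_train = np.array([1, 0, 1, 0, 1])

# An entropy-based tree is only an approximation of C4.5; J48 also prunes,
# which is omitted here.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X_train, y_train)

# Each test file falls into one leaf; the leaf's class distribution gives
# Pr(buggy | metrics), which is what we use for ranking. Column 1 is the
# probability of class 1 (buggy) because both classes occur in y_train.
X_test = np.array([[1800, 60, 90]])
print(tree.predict_proba(X_test)[:, 1])
```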

In terms of predicting defects in software, Naive Bayes outperforms a wide range of other methods (Turhan and Bener 2009; Tao and Wei-hua 2010). Naive Bayes assumes conditional independence of the attributes given the class. Although the probability estimates it produces can be inaccurate, it often assigns maximum probability to the correct class (Frank et al. 2000). It is therefore interesting to see how it performs on the software bug prediction task under this independence assumption. We use the Naive Bayes classifier of John and Langley (1995).
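To make the independence assumption explicit: Naive Bayes scores a class by multiplying the class prior with one likelihood term per metric. The sketch below is only illustrative; the per-metric likelihoods are invented numbers, not values from our data.

```python
# Naive Bayes posterior under the conditional-independence assumption:
# Pr(c | x1..xm) is proportional to Pr(c) * product_i Pr(x_i | c)

def naive_bayes_posterior(priors, likelihoods):
    """priors: {class: Pr(class)}; likelihoods: {class: [Pr(x_i | class), ...]}."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for lik in likelihoods[c]:
            score *= lik          # one factor per metric, assumed independent
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}  # normalize

# Invented likelihoods for two discretized metrics (e.g., LOC and MVG bins).
priors = {"buggy": 0.2, "correct": 0.8}
likelihoods = {"buggy": [0.7, 0.6], "correct": [0.2, 0.3]}
print(naive_bayes_posterior(priors, likelihoods))
# Even when the absolute probabilities are off, the argmax is often right.
```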

In reality, since these metrics measure various aspects of the same file in a software project, individual metric attributes tend to be highly correlated with each other (known as multicollinearity) (Dick et al. 2004). The Bayesian network classifier facilitates the study of the inter-relationships between software metrics and allows one to learn about causal relationships. We use the Bayesian network classifier (Bouckaert 2004): the training set is used to build the Bayesian network structure using structure learning algorithms, and the predictions on the test set, separating buggy files from non-buggy ones, are made using inference algorithms.

We use the simple logistic classifier (Landwehr, Hall, and Frank 2005) for building linear logistic regression models. The ZeroR classifier determines the most common class and tests how well the class can be predicted without considering the other attributes; it is used as a lower bound on performance. We also use a normalized Gaussian radial basis function network (Buhmann 2003).
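ZeroR simply predicts the majority class, so its accuracy equals the proportion of the majority class in the data. A minimal sketch, using scikit-learn's DummyClassifier as a stand-in for Weka's ZeroR, with made-up labels:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((10, 3))                          # metric values are ignored by ZeroR
y = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])   # 70% correct, 30% buggy

zero_r = DummyClassifier(strategy="most_frequent")
zero_r.fit(X, y)
print(zero_r.score(X, y))  # 0.7: the lower bound a useful classifier should beat
```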

4 Framework and Algorithms

We consider a set of software projects Π = {π1, π2, . . .}. Consider one particular project π. It contains a set of (source code) files Φ = {φ1, φ2, . . .}, where each φi is expressed in a programming language. For each (source code) file, multiple metrics are computed. In other words, we have a set of metric functions G = {g1, g2, . . . , gm}. Specifically, a metric function g ∈ G is a function from the set of all possible programs into the natural numbers N. The machine learning algorithms are represented as L = {l1, l2, . . .}.

Using the above set-up, we take each φi ∈ Φ and form a case (E, c), where E = (g1(φi), g2(φi), . . . , gm(φi)) and c ∈ C = {c1, c2, . . .} is a discrete bug classification. In the simplest case, C = {0, 1}, where 0 means non-buggy and 1 means buggy. Here, c is computed using a bug-finding algorithm, or Oracle, such as FindBugs. We generate the training set from a subset of the projects, excluding the test project. This process is shown in Algorithm 1.

Algorithm 2 shows the test set generation and the learning phase. Each probabilistic classifier computes a posterior Pr(C | E), where E is a case (the set of metrics for a file) without the classification. Consider a project having n cases E, of which m are buggy and n − m are correct. Using a classifier, we can compute Pr(C | E) for each case. The probabilities are sorted in decreasing order, placing the case E with the highest probability of being buggy first. Algorithms 1 and 2 are run for each project to identify the buggy files in each of them.


Algorithm 1 Training Set
Π ← {π1, π2, . . . , πM}
Φ ← {φ1, φ2, . . . , φN}
G ← {g1, g2, . . . , gm}
E ← (g1(φi), g2(φi), . . . , gm(φi), c)
for i = 1 to M in Π and πi ≠ πk do
    for j = 1 to N in Φ do
        Eij ← (g1(φj), g2(φj), . . . , gm(φj))
        cij ← Oracle(Eij)
        Ttraining ← Ttraining ∪ {(Eij, cij)}
    end for
end for
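A minimal Python sketch of the training-set construction in Algorithm 1. The helpers compute_metrics (the metric functions g1..gm, e.g., produced from a LOCMetrics export) and run_oracle (the Oracle label, e.g., derived from FindBugs output) are hypothetical names introduced only for this sketch.

```python
from typing import Callable, Dict, List, Tuple

Case = Tuple[List[float], int]  # (metric vector E, class c); c: 1 = buggy, 0 = correct

def build_training_set(projects: Dict[str, List[str]],
                       test_project: str,
                       compute_metrics: Callable[[str], List[float]],
                       run_oracle: Callable[[str], int]) -> List[Case]:
    """Collect (metrics, label) cases from every project except the test project."""
    training: List[Case] = []
    for name, files in projects.items():
        if name == test_project:       # corresponds to the condition πi ≠ πk
            continue
        for path in files:
            e = compute_metrics(path)  # E = (g1(φ), ..., gm(φ))
            c = run_oracle(path)       # c = Oracle(E), e.g., FindBugs says buggy or not
            training.append((e, c))
    return training
```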

Algorithm 2 Test Set – Project k
L ← {l1, l2, . . . , lO}
for l = 1 to O in L do
    for j = 1 to N in Φ do
        Ekj ← (g1(φj), g2(φj), . . . , gm(φj))
        (c, Pr(c | Ekj)) ← l(Ttraining, Ekj)
        Predictedtest ← Predictedtest ∪ {(Ekj, c, Pr(c | Ekj))}
    end for
end for
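A corresponding sketch of the prediction phase in Algorithm 2: a learner trained on Ttraining assigns each test-project file a posterior probability of being buggy, and the files are ranked by that probability. scikit-learn is used here only as a stand-in for the Weka learners, and compute_metrics is the same hypothetical helper as above.

```python
from typing import Callable, List, Tuple

import numpy as np
from sklearn.naive_bayes import GaussianNB

def rank_test_files(training: List[Tuple[List[float], int]],
                    test_files: List[str],
                    compute_metrics: Callable[[str], List[float]]) -> List[Tuple[str, float]]:
    """Return the test-project files sorted by decreasing Pr(buggy | metrics)."""
    X = np.array([e for e, _ in training])
    y = np.array([c for _, c in training])

    learner = GaussianNB()              # any probabilistic classifier l in L works here
    learner.fit(X, y)

    scored = []
    for path in test_files:
        e = np.array(compute_metrics(path)).reshape(1, -1)
        # Column 1 is Pr(c = 1), assuming both classes occur in the training data.
        p_buggy = learner.predict_proba(e)[0, 1]
        scored.append((path, p_buggy))

    scored.sort(key=lambda item: item[1], reverse=True)  # most suspicious files first
    return scored
```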

Lift Curve Analysis

Lift curves plot the number of source code files on the x-axis versus the number of buggy files found on the y-axis. These curves are used as in previous work, where prediction is based on ranking classes by the highest fault prediction probability (Briand and Wüst 2002). We expect the curve to increase sharply in the beginning and then reach a plateau. The fault-prone files are caught while the slope of the curve increases. Testing can be performed up to this stage and stopped when the slope of the curve flattens, or the expected number of buggy files can be calculated using maximum likelihood estimation.

Consider a set of software projects1 Π = {π1, π2, . . .}, and suppose we wish to analyze one particular project πk. We create a training set2 Ttraining using the metrics from the projects {π1, π2, . . . , πM}, excluding those from πk. We use the metrics from project πk as our test set, Ttest. Both the test set and the training set have their prediction class labeled for each file; this is done using the Oracle.

The test set labels are used here to evaluate the predictions of the machine learning classifier. The classifier gives the probability of bugginess given the software metrics, Pr(c | Eik), for each file i in the test project πk. The files are sorted in decreasing order of this probability, so that the files with the highest probability of being buggy come first. While drawing the lift charts, we consider the cumulative count of the actual labels given by the Oracle for each file instance, counted in the order induced by the classifier: for each buggy file we add one, and for each non-buggy file we add zero. The formal definition of the lift curve function g(i) is shown below:

    g(0) = 0
    g(i + 1) = g(i) + δ(i + 1)        (1)

where

    δ(i) = 1 if Pr(c | Eik) ≥ 0.5 and Oracle(Eik) = buggy,
           0 if Pr(c | Eik) < 0.5 and Oracle(Eik) = correct.

1 Note that this could be a very large number of projects.
2 The training set could be a subset of the projects.

Thus only the ranking is taken from the classifier, and the lift curves are generated based on the predictions of the Oracle. In the early stage, we expect both the classifier and the Oracle to indicate only buggy files, so the slope of the lift curve is expected to increase initially. If there is a mismatch at this stage, the lift curve flattens out, exposing a weakness of the classifier. We expect the lift curve to flatten out eventually in any case, once there are no more buggy files to predict.
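A small sketch of the lift-curve counting described above: files are taken in the order given by the classifier, and the curve gains one unit for every file the Oracle labels as buggy. This is a simplification of Equation (1) that keeps only the ranking, in line with the statement that only the ranking is taken from the classifier; the file names and probabilities are invented.

```python
from typing import List, Tuple

def lift_curve(ranked_files: List[Tuple[str, float]],
               oracle_buggy: set) -> List[int]:
    """g(0) = 0; g(i+1) = g(i) + 1 if the i-th ranked file is buggy per the Oracle."""
    g = [0]
    for path, _p_buggy in ranked_files:      # ranked by decreasing Pr(buggy | E)
        g.append(g[-1] + (1 if path in oracle_buggy else 0))
    return g

# Toy example: 5 files ranked by the classifier, 2 of them actually buggy.
ranked = [("A.java", 0.9), ("B.java", 0.8), ("C.java", 0.4),
          ("D.java", 0.3), ("E.java", 0.1)]
print(lift_curve(ranked, oracle_buggy={"A.java", "C.java"}))  # [0, 1, 1, 2, 2, 2]
```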

Figure 1: Lift Chart

Consider a project containing 1000 (one thousand) files, of which 200 (two hundred) are buggy. As shown in Figure 1, the ideal situation would be that all bugs are found within the first 200 files, as shown by the best case curve. On the other hand, the worst situation would be that all bugs are found within the last 200 files (worst case curve). By sorting files by their probability of being buggy, a result similar to the average case curve would be obtained, depending on the accuracy of the ML algorithm. The better a classifier is trained, the closer the result of using it will be to the best-case curve.

One major drawback of these lift curves is that the Oracle must be run to label the buggy files; for large projects, this makes the performance assessment of the machine learning classifiers a burden. It would be of great help if we could predict the number of buggy files prior to running the Oracle. In the next section, we propose two different methods to estimate the number of buggy files.

How many files are buggy?

The machine learning classifier assigns a probability of being buggy given the metrics, Pr(c | Eik), to each file in the test project. One method is to use a threshold probability p: only the files with a probability of being buggy of at least p (Pr(c | Eik) ≥ p) are analyzed using the Oracle. Another method is to define a percentage c% and analyze only the top c% of the files using the Oracle. For example, if there are 200 files in a project and we define c = 20, we analyze 40 files using the Oracle. To determine this value, consider the following approach: out of 1000 files, we randomly pick 50 files and check them for faults; if we find 5 of them to be buggy using the Oracle, then the maximum likelihood estimate (MLE) of the prior probability of being buggy is 0.1. The expected number of buggy files is then the top 100 (1000 × 0.1) files when the files are sorted in descending order of their probability of being buggy.

Table 2: Metrics Description

Abbreviation   Description
LOC            Lines of Code
BLOC           Blank Lines of Code
SLOC-P         Physical Executable Lines of Code
SLOC-L         Logical Executable Lines of Code
MVG            McCabe VG Complexity
C&SLOC         Code and Comment Lines of Code
CLOC           Comment Only Lines of Code
CWORD          Commentary Words
HCLOC          Header Comment Lines of Code
HCWORD         Header Commentary Words

Algorithm 3 Maximum Likelihood Estimation Using TBF
{φa, φb, . . . , φn} ← sample from πk(φ1, φ2, . . . , φN)
for j = 1 to n do
    Ej ← (g1(φj), g2(φj), . . . , gm(φj))
    cj ← Oracle(Ej)
    if cj = 1 then
        TBFbuggy ← TBFbuggy + 1
    end if
end for
pbuggy ← TBFbuggy / n
Ex(k) ← pbuggy × N
return Ex(k)
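The estimator in Algorithm 3 reduces to a sample proportion scaled by the project size. A minimal Python sketch, assuming the same hypothetical run_oracle helper as before (returning 1 when the Oracle flags the file as buggy):

```python
import random
from typing import Callable, List

def expected_buggy_files(all_files: List[str],
                         sample_size: int,
                         run_oracle: Callable[[str], int],
                         seed: int = 0) -> float:
    """Estimate the number of buggy files from an Oracle-labeled random sample."""
    random.seed(seed)
    sample = random.sample(all_files, sample_size)
    buggy_in_sample = sum(run_oracle(path) for path in sample)   # TBF_buggy
    p_buggy = buggy_in_sample / sample_size                      # MLE of the prior
    return p_buggy * len(all_files)                              # Ex(k) = p_buggy * N

# With 1000 files, a sample of 50, and 5 buggy files in the sample,
# the estimate is 0.1 * 1000 = 100 buggy files, as in the example above.
```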

5 Experiments

The dataset for our experiments consists of nine Java open source projects.3 The standard software metrics used are shown in Table 2. These metrics are generated using the LOCMetrics tool.4

3 Since the purpose of the paper is not to develop novel software metrics, we use standard software metrics from the literature.
4 http://www.locmetrics.com

Machine Learning Results

For each file in the projects, we compute the set of metrics shown in Table 1 and the class value, denoting whether the file is buggy or not. The class value is predicted using the FindBugs tool, as in TBF. The metrics along with the class labels constitute the training set. The metrics for the test project are generated and used as the test set. Using the training and test sets, each machine learning classifier is used to predict the buggy files of the test project. We use the Weka tool to run the machine learning algorithms. The last six columns of Table 3 show the percentage of correctly classified files for the six classifiers: Naive Bayes (NB), decision tree (DT), Bayesian network (BN), radial basis function (RBF), ZeroR, and simple logistic (SL). Our finding is that classification using decision trees and Naive Bayes performs best. Note, however, that the percentage of correctly classified files covers correct predictions of both the buggy and the correct files. We are more interested in the performance of the classifiers in predicting the actual buggy files, and we use the lift charts for this purpose.
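As a rough sketch of how the per-classifier accuracy columns of Table 3 could be produced, the loop below trains each classifier on all remaining projects and scores it on the held-out one. scikit-learn models stand in for the Weka implementations (there is no direct scikit-learn counterpart of Weka's BayesNet or RBF network, so those are omitted), and the data is randomly generated, so the printed accuracies are meaningless; only the leave-one-project-out structure mirrors our setup.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Toy per-project data: X rows are file metric vectors, y entries are 1 = buggy.
rng = np.random.default_rng(0)
projects = {name: (rng.normal(size=(30, 3)), rng.integers(0, 2, size=30))
            for name in ["Resin", "Vuze", "Hibernate"]}

classifiers = {
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "ZeroR": DummyClassifier(strategy="most_frequent"),
    "SL": LogisticRegression(max_iter=1000),
}

for test_name, (X_test, y_test) in projects.items():
    # Train on all other projects (leave-one-project-out), test on the held-out one.
    X_train = np.vstack([X for n, (X, _) in projects.items() if n != test_name])
    y_train = np.concatenate([y for n, (_, y) in projects.items() if n != test_name])
    scores = {cname: clf.fit(X_train, y_train).score(X_test, y_test)
              for cname, clf in classifiers.items()}
    print(test_name, {c: round(100 * s, 1) for c, s in scores.items()})
```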

Lift Charts Assessment

Figure 2 shows the lift charts depicting the performance of the classifiers for each of the nine projects. For each project, the classifiers are compared to the best case model shown in Figure 1. The classifiers below the average line perform much worse than those above it. The performance of the ZeroR classifier is around the average, except for the project Hibernate, where its curve reaches the best case; ZeroR performs poorly on the project Vuze. The simple logistic and RBF classifiers also perform badly, falling below the average line for projects such as Weka, Resin, Vuze, and OpenSwing. The decision tree classifier stays above the average line for every project except Resin. Similarly, the Bayesian network classifier performs well for all the projects except Hibernate, where its curve overlaps with the average line. The Naive Bayes classifier also performs well for most of the projects, with the exception of JSpider. From our results, we find that the decision tree, Bayesian network, and Naive Bayes classifiers perform best for most of the projects. Generating the lift curves may be difficult if we need to run the Oracle on a large project; to overcome such situations, we propose the maximum likelihood estimate approach.

Predicting the expected number of buggy files

In this method we use the maximum likelihood estimator to predict the expected number of buggy files. We use a sample size of 250 for all the projects. Table 3 shows the experimental results obtained from analyzing all the projects. The first two columns give the project and a brief description of it. The next three columns show the number of buggy files as predicted by the Oracle (FindBugs), by the maximum likelihood estimate, and by the best machine learning algorithm. From Figure 2, the machine learning algorithm whose lift curve is closest to the best case model (as in Figure 1) is selected for each project, and this chosen classifier provides the predicted number of buggy files for that project. While the numbers of buggy files predicted by the machine learning approach and by the Oracle are reasonably close, the maximum likelihood estimate varies based on the sample size.

Table 1: Metric sets for each project

              Resin    Vuze    Hibernate  Weka    Hadoop  OpenSwing  Jedit   Heritrix  Jspider
Source Files  6662     3261    4544       1194    951     982        531     562       298
Directories   1771     531     1665       87      468     579        131     122       415
LOC           1282473  776970  530664     457724  207056  201862     177199  127112    19272
BLOC          207406   172928  75222      237314  24259   30560      19800   12502     4149
SLOC-P        715851   482312  322519     168329  128859  121935     109515  64916     12622
SLOC-L        433632   304377  237506     36361   92456   84748      64650   44473     10121
MVG           112433   59211   34223      57370   15292   13714      16268   8125      1113
C&SLOC        1120     3376    1015       2327    1434    432        4652    829       18
CLOC          359216   121730  132923     163040  53938   49367      47884   49694     2501
CWORD         1768708  749607  796577     848954  330457  281936     263498  285514    10992
HCLOC         181083   54295   64320      22995   16198   60         10939   12591     52
HCWORD        1189290  425380  479177     162505  122533  340        80178   83697     187
Bugs          2,700    2,126   410        1278    656     1,421      551     457       174

Table 3: Number of buggy files predicted by the maximum likelihood estimate and by the best machine learning algorithm, and the percentage (%) of correctly classified files for each machine learning classifier and project

Project     Description        Actual by Oracle  Predicted by MLE  Predicted by best ML   NB     DT     BN     RBF    ZeroR  SL
Resin       Server             1,308             1119              2278                   82.3   82.6   76.9   80.4   80.4   80.9
Vuze        Bittorrent         347               548               364                    85.2   89.2   75.6   81.9   89.4   83.2
Hibernate   Framework          502               763               502                    93.3   95.7   95.6   93.3   95.7   93.3
Hadoop      Framework          444               200               393                    75.8   75.7   71.0   74.1   74.1   75.6
Weka        Machine Learning   246               159               143                    77.6   77.3   72.5   76.3   62.8   72.7
OpenSwing   Graphics Suite     324               165               279                    70.0   81.5   77.1   68.9   67.0   68.3
Jedit       Editor             159               89                302                    73.5   73.5   70.4   70.0   70.0   72.8
Heritrix    Web crawler        199               94                212                    71.2   79.0   73.8   64.6   64.6   66.7
Jspider     Search Engine      63                50                18                     79.9   82.5   73.5   78.8   78.8   78.8
Average                                                                                   78.7   81.8   76.2   76.4   75.8   76.9

Figure 2: Lift charts for the nine open source projects using the six machine learning algorithms: Naive Bayes, Bayesian network, decision tree (C4.5), RBF, simple logistic, and ZeroR

6 Discussion and Future Work

Using software metrics and machine learning has proven to be a fast and accurate way of finding bugs in software. Developers are relieved from the laborious task of checking all source code files for bugs; only the predicted set of files needs to be checked to fix most of the bugs. There are two important points in the whole process. First, selecting software metrics that reflect the software as well as possible. Second, using the right machine learning classifier, which will give the most accurate results. We used the lift charts to determine the behavior of each machine learning classifier. The decision tree, Bayesian network, and Naive Bayes classifiers performed best in our experiments. We used the maximum likelihood estimate to predict the expected number of buggy files in a project; the results vary with the sample size. Future research on such statistical estimation methods will help developers estimate the number of probable buggy files in their projects.

References

Arisholm, E.; Briand, L. C.; and Fuglerud, M. 2007. Data mining techniques for building fault-proneness models in telecom Java software. In International Symposium on Software Reliability Engineering, 215–224.
Arisholm, E.; Briand, L. C.; and Johannessen, E. B. 2010. A systematic and comprehensive investigation of methods to build and evaluate fault prediction models. Journal of Systems and Software 83(1):2–17.
Balakrishnan, G., and Reps, T. 2010. WYSINWYX: What you see is not what you execute. ACM Trans. Program. Lang. Syst. 32:23:1–23:84.
Bouckaert, R. R. 2004. Bayesian network classifiers in Weka. Internal Notes 11(3):1–23.
Briand, L. C., and Wüst, J. 2002. Empirical studies of quality models in object-oriented systems. In Advances in Computers, 97–166. Academic Press.
Buhmann, M. D. 2003. Radial Basis Functions: Theory and Implementations. Cambridge University Press.
Corbett, J. C.; Dwyer, M. B.; Hatcliff, J.; Laubach, S.; Pasareanu, C. S.; and Zheng, H. 2000. Bandera: Extracting finite-state models from Java source code. In Proceedings of the 22nd International Conference on Software Engineering, 439–448. ACM Press.
Dick, S.; Meeks, A.; Last, M.; Bunke, H.; and Kandel, A. 2004. Data mining in software metrics databases. Fuzzy Sets and Systems 145(1):81–110.
Dillig, I.; Dillig, T.; and Aiken, A. 2010. Reasoning about the unknown in static analysis. Commun. ACM 53:115–123.
Flanagan, C.; Leino, K. R. M.; Lillibridge, M.; Nelson, G.; Saxe, J. B.; and Stata, R. 2002. Extended static checking for Java. SIGPLAN Not. 37:234–245.
Frank, E.; Trigg, L.; Holmes, G.; Witten, I. H.; and Aha, W. 2000. Naive Bayes for regression. Machine Learning, 5–26.
Gondra, I. 2008. Applying machine learning to software fault-proneness prediction. J. Syst. Softw. 81:186–195.
Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; and Witten, I. H. 2009. The WEKA data mining software: An update. SIGKDD Explor. Newsl. 11:10–18.
Hovemeyer, D., and Pugh, W. 2004. Finding bugs is easy. SIGPLAN Not. 39:92–106.
John, G., and Langley, P. 1995. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, 338–345. Morgan Kaufmann.
Kim, S.; Whitehead, J.; and Zhang, Y. 2008. Classifying software changes: Clean or buggy? IEEE Transactions on Software Engineering 34(2):181–196.
Landwehr, N.; Hall, M.; and Frank, E. 2005. Logistic model trees. Mach. Learn. 59:161–205.
Marcus, E., and Stern, H. 2000. Blueprints for High Availability: Designing Resilient Distributed Systems. New York, NY, USA: John Wiley & Sons, Inc.
Mende, T., and Koschke, R. 2009. Revisiting the evaluation of defect prediction models. In Proceedings of the 5th International Conference on Predictor Models in Software Engineering, PROMISE '09, 7:1–7:10. New York, NY, USA: ACM.
Nagappan, N., and Ball, T. 2007. Using software dependencies and churn metrics to predict field failures: An empirical case study. In First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), 364–373. IEEE.
Ohlsson, N., and Alberg, H. 1996. Predicting fault-prone software modules in telephone switches. IEEE Transactions on Software Engineering 22:886–894.
Quinlan, R. 1993. C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann Publishers.
Rutar, N.; Almazan, C. B.; and Foster, J. S. 2004. A comparison of bug finding tools for Java. In Proceedings of the 15th International Symposium on Software Reliability Engineering, 245–256. Washington, DC, USA: IEEE Computer Society.
Tao, W., and Wei-hua, L. 2010. Naive Bayes software defect prediction model. In Computational Intelligence and Software Engineering (CiSE), 2010 International Conference on, 1–4.
Turhan, B., and Bener, A. 2009. Analysis of Naive Bayes' assumptions on software fault data: An empirical study. Data Knowl. Eng. 68:278–290.
Vandecruys, O.; Martens, D.; Baesens, B.; Mues, C.; Backer, M. D.; and Haesen, R. 2008. Mining software repositories for comprehensible software fault prediction models. Journal of Systems and Software 81(5):823–839.
Weyuker, E. J.; Ostrand, T. J.; and Bell, R. M. 2007. Using developer information as a factor for fault prediction. In Proceedings of the Third International Workshop on Predictor Models in Software Engineering, PROMISE '07, 8–. Washington, DC, USA: IEEE Computer Society.
Witten, I. H., and Frank, E. 2005. Data Mining: Practical Machine Learning Tools and Techniques, 2nd edition. Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann.

