Commentary—Software: Metrics Mentality versus Statistical Mentality

John D. Healy, Member, IEEE

Abstract—The software reliability community must move away from a metrics mentality to a statistical mentality. In a metrics mentality, the goal is to find quantities that can be calculated easily. This metrics mentality leads to the calculation of meaningless quantities. The user thinks he has useful information, but is mistaken. A statistical mentality defines parameters, collects data, and applies statistical procedures using the data to estimate parameters. The results are meaningful, useful estimates.

Index Terms—Software metrics, software reliability, statistics.

I. INTRODUCTION

ALL THROUGH the software reliability literature there are pleas for simple, usable metrics. Unfortunately, they rarely distinguish between “what is to be estimated” and “how that quantity is to be estimated.” In statistics, there is a fundamental distinction between parameters and statistics. Parameters are population characteristics. Data are collected and put together into statistics. Statistics are used to estimate parameters and to make inferences about parameters.

In software reliability, there are proposals for metrics. Whether these metrics are parameters (quantities to be estimated) or statistics (quantities which show how data should be used to estimate parameters) is unclear. When one company collects its metrics, no one else knows exactly what that company is really collecting. Also, it is unclear what someone is supposed to do with these metrics.

This paper asks the software reliability community to:

1) Decide what types of decisions they should make about software. The software reliability community then begins discussing what parameters they want estimated and how these parameters relate to the decisions that need to be made. Usually this will result in hypotheses about the parameters.

2) Begin to describe the type of data that should be collected so that the parameter can be estimated. Sometimes more than one type of data is needed.

3) Develop recommended statistical procedures to estimate the parameters using the data that can be collected and are usually collected. These procedures do not have to be trivial 1-line formulas. But what they estimate must be both understandable and relevant.

This paper presents two examples of metrics that were recommended by a software organization which was proud of its metrics, although it could not interpret the results of these metrics.

Notation

    n       number of problems
    k       number of distinct times for “time to close problems”
    θ       s-expected time to close a problem
    θ̂       estimate of θ
    t_j     time at which problem j is closed
    d_j     number of problems closed at t_j
    S(t)    survivor function; Pr{problem is still open at time t}
    Ŝ(t)    Kaplan–Meier estimate of S(t)

II. EXAMPLE SOFTWARE METRIC: AVERAGE TIME TO CLOSE PROBLEMS

Intuitively we are interested in determining a typical time to fix problems. This metric seems simple and usable. Since everyone knows what an average is, it seems as if we could talk to virtually any engineer and each one would come up with the same thing. Table I has some data on one product.

TABLE I
DATASET 1

    Problem    Status          Time (days)
    1          Closed after         1
    2          Closed after         2
    3          Closed after         6
    4          Closed after         6
    5          Still open           7
    6          Still open           7
    7          Still open           7

The software organization wanted a simple, intuitive metric. What choices was it considering?

A. Option A

Ignore the “still open” reports and take the average times for the “closed after” reports:

(1 + 2 + 6 + 6)/4 = 15/4 = 3.75 days

Discussion: This procedure is certainly simple, repeatable, and seems to meet the definition of the metric. It is an average of the closed times, and is a simple, easy-to-use metric. Unfortunately, it is also ridiculous. The data on open problems are entirely ignored.

Assume that a new problem arrived. If we were trying to predict an average time to close this new problem, 3.75 days


would be a serious underestimate. The organization with which we were consulting was actually using this metric. They wondered why the metric always seemed to get worse the longer the product was out in the field. They liked this metric because in the long run it had to converge to the mean close time. We pointed out that after 1.5 days, the estimate that the software organization was using was 1 day, which they said was correct: the average of the closed times!

Option A provides a very optimistic metric. It is not obvious what this metric estimates, particularly if some of the open times are smaller than some of the closed times. It is certainly not estimating the s-expected time to fix a random problem. This means that s-confidence intervals for any parameter cannot be calculated using this metric. How can this metric be compared across products?

B. Option B

Take the average of all the times:

(1 + 2 + 6 + 6 + 7 + 7 + 7)/7 = 36/7 ≈ 5.14 days

Discussion: This procedure is simple and repeatable. It is an average so that it meets the definition of the metric. It is also ridiculous. It is an underestimate—the problems that are still open would have to take at least 7 days to close. But it is clearly an average of repair times.

Option B tends to be optimistic. Nobody is quite sure of what it really estimates. Any comparison across products and any trend analysis are impossible. How is an s-confidence interval generated for the parameter when the estimator is not s-consistent for the parameter?
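To see how little machinery the two naive options need, here is a minimal sketch (Python is our choice; the paper prescribes no software) that computes both averages from Dataset 1:

    # Dataset 1: times in days. Four problems were closed after these times;
    # three were still open after these times (censored observations).
    closed = [1, 2, 6, 6]
    still_open = [7, 7, 7]

    # Option A: ignore the open problems entirely.
    option_a = sum(closed) / len(closed)        # (1+2+6+6)/4 = 3.75 days

    # Option B: treat the open times as if they were close times.
    all_times = closed + still_open
    option_b = sum(all_times) / len(all_times)  # 36/7 ≈ 5.14 days

    print(f"Option A: {option_a:.2f} days")     # optimistic: drops open problems
    print(f"Option B: {option_b:.2f} days")     # optimistic: open problems need >= 7 days

Both numbers are trivial to produce, which is the point: ease of calculation says nothing about what, if anything, is being estimated.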

C. Option C

Treat this problem statistically:

1) Define the parameter: Assume that the population is all the software problems for a particular product. For each problem there is a “time to fix the problem.” Thus there is a population parameter: the s-expected time to fix a problem. This is equivalent to the average time it takes to close a randomly selected problem. This parameter needs to be estimated.

2) Collect data to estimate the parameter: The data in Dataset 1 can be used.

3) Define and apply a statistical procedure to estimate the parameter: The data collected are censored data. Basically, there are some problems where the “time to close” is known. There are other problems that are “still open”: we know only that the “time to close” will be greater than the “time spent so far” on these problems.

For censored data, there are statistical procedures. To estimate the mean, assume that the “time to close problems” follows a particular distribution. (More than one model can be fitted.) Assume that the time to close a problem follows an exponential distribution and consider Dataset 1. Here, the usual estimator is the sum of the times divided by the number of closed problems:

θ̂ = (1 + 2 + 6 + 6 + 7 + 7 + 7)/4 = 36/4 = 9 days.   (1)


This procedure looks a little different from an average.

Discussion: The procedure is relatively simple. Defining the population parameter is easy. Collecting data to estimate the parameter is straightforward, with one caveat: keep track of whether each problem was closed or is still open. Developing the statistical estimate is sometimes easy and sometimes difficult. For example, if the time to fix a problem has an exponential distribution, there is a simple closed-form solution for the estimate of the mean time to close a problem. If the time to fix a problem has a Weibull distribution, the estimator does not have a closed form.
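The paper does not work the Weibull case; as a hedged illustration of why it requires numerical methods, a censored-Weibull maximum-likelihood fit could be set up roughly as follows (the use of numpy/scipy and the starting values are our assumptions, not the author's):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import gamma

    closed = np.array([1.0, 2.0, 6.0, 6.0])  # observed close times (Dataset 1)
    open_ = np.array([7.0, 7.0, 7.0])        # still-open times (right-censored)

    def neg_log_lik(params):
        """Negative log-likelihood for right-censored Weibull data."""
        shape, scale = params
        # Closed problems contribute the log density log f(t).
        ll = np.sum(np.log(shape / scale)
                    + (shape - 1) * np.log(closed / scale)
                    - (closed / scale) ** shape)
        # Open problems contribute the log survivor function log S(t).
        ll += np.sum(-(open_ / scale) ** shape)
        return -ll

    # No closed form exists, so maximize the likelihood numerically.
    fit = minimize(neg_log_lik, x0=[1.0, 9.0],
                   bounds=[(1e-6, None), (1e-6, None)])
    shape_hat, scale_hat = fit.x
    # Mean of a Weibull(shape, scale): scale * Gamma(1 + 1/shape).
    print("Estimated mean close time:", scale_hat * gamma(1 + 1 / shape_hat))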

The estimator in (1) has the usual statistical properties: it is s-consistent and s-efficient. The population quantity that it estimates is known. s-Confidence intervals can be calculated. In this example, a 95% s-confidence interval is [3.8, 26.7] days.

The estimator is quite different from the estimators in Options A and B.

Other parameters can easily be estimated. Consider “median time to close problems.” The data can be ordered: 1, 2, 6, 6, 7, 7, 7. The sample median is 6. (For an exponential distribution, the median is about 70% of the mean.)
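A minimal sketch of the Option C arithmetic for Dataset 1, in the same vein. The point estimate and median follow directly from (1); the interval uses a chi-square convention that is common for censored exponential data but is our assumption, so it need not reproduce the paper's [3.8, 26.7] exactly:

    from math import log
    from scipy.stats import chi2

    closed = [1, 2, 6, 6]       # observed close times (days)
    still_open = [7, 7, 7]      # censoring times (days)

    total_time = sum(closed) + sum(still_open)  # 36 days of total exposure
    r = len(closed)                             # 4 closed problems

    # Eq. (1): censored-exponential estimate of the mean time to close.
    mean_hat = total_time / r                   # 36/4 = 9 days

    # The exponential median is ln(2), about 69.3%, of the mean.
    median_hat = log(2) * mean_hat              # about 6.2 days; sample median is 6

    # A common 95% chi-square interval for the exponential mean,
    # using 2r degrees of freedom; conventions for censored data vary.
    lower = 2 * total_time / chi2.ppf(0.975, 2 * r)
    upper = 2 * total_time / chi2.ppf(0.025, 2 * r)
    print(f"mean {mean_hat} days, 95% interval about [{lower:.1f}, {upper:.1f}] days")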

III. EXAMPLE SOFTWARE METRIC: FRACTION OF PROBLEMS CLOSED ON TIME

Intuitively, we are interested in determining whether problems are closed fast enough. Thus what are the chances of closing a typical problem “on time”?

This metric seems simple and usable. As long as we define “on time,” it seems that we could talk to virtually any engineer and each one would come up with the same thing. Table II has some data on one product.

TABLE II
DATASET 2

For this product, “on time” is 10 days. So what is the metric “Fraction of Software Problems Closed in 10 Days” for this product?

A. Option A

There have been only 6 “closed after” problems. Since 4 problems were closed in less than 10 days, the metric for these data is: 4/6 ≈ 66.7%.

Discussion: This procedure is simple and repeatable; it seems to meet the definition of the metric. It is an easy-to-use metric. Unfortunately, it is a ridiculous procedure. The data on open problems are entirely ignored. At least 2 of the problems could not possibly be closed in less than 10 days. The same


metric would have been obtained if there were 5000 problems that were open for 2 months.

This procedure provides a very optimistic metric. Nobody is quite sure what it is really estimating. It is certainly not estimating the chances that a typical problem is closed in 10 days. This means that nobody can set up s-confidence intervals for any parameter using this metric. How can this metric be compared across products?

B. Option B

This procedure does not ignore all the open problems. At least 2 of the open problems have already been open for more than 10 days, so they can be closed only after 10 days. Four problems were closed on time so that a reasonable numerator is 4. A reasonable denominator is 8; it includes the 6 closed problems and the 2 problems that were open longer than 10 days. So a reasonable metric is 4/8 = 50%.

Discussion: This procedure certainly is simple, repeatable, and usable. Unfortunately, it is not a very good metric. The information on problems open for less than 10 days is entirely ignored.

This procedure also tends to be optimistic. Again nobody is quite sure of what it is really estimating. Any “comparison across products” and “trend analysis” are impossible.
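For concreteness, a minimal sketch of the two naive fractions; the counts are taken from the paper's description of Dataset 2 (whose individual times are not reproduced in this copy):

    closed_on_time = 4   # closed in less than 10 days
    closed_late = 2      # closed, but after 10 days
    open_over_10 = 2     # still open and already past the 10-day deadline
    # Dataset 2 also contains problems open for < 10 days; both options ignore them.

    option_a = closed_on_time / (closed_on_time + closed_late)   # 4/6 ≈ 66.7%
    option_b = closed_on_time / (closed_on_time + closed_late
                                 + open_over_10)                 # 4/8 = 50%

    print(f"Option A: {option_a:.1%}")  # ignores every open problem
    print(f"Option B: {option_b:.1%}")  # still ignores problems open < 10 days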

C. Option C

Treat this problem statistically.

1) Define the parameter: Assume that there is a population of all software problems for a particular product. For each problem there is a “time to fix the problem.” Thus there is a population parameter: Fraction of Problems Closed On Time. This is equivalent to the probability that a randomly selected problem is closed on time.

2) Collect data to estimate the parameter: The data in Dataset 2 can be used.

3) Define and apply a statistical procedure to estimate the parameter: Dataset 2 contains censored data. Basically, there are some problems for which the “time to close” is known. Other problems are still open; for these problems we know only that the “time to close” is greater than the time spent so far on those problems.

For these types of problems, there are common statistical procedures. The Kaplan–Meier estimate can be applied to this type of data; it provides a direct estimate of the probability of not closing a problem by time t (for every t). It can provide an estimate of the probability of not closing a problem in 10 days. “1 − this estimate” estimates the probability of closing the problem in 10 days.

Estimate the probability that a problem is closed in 10 days. Assume there are n software problems, and k distinct times at which problems are closed. The possibility of there being more than one problem closed at t_j is allowed. The Kaplan–Meier estimate is:

    Ŝ(t) = ∏ over {j: t_j ≤ t} of (1 − d_j/n_j)

where
    d_j    number of problems closed at time t_j;
    n_j    number of problems not closed prior to t_j that have been open for at least t_j.

Apply this to Dataset 2 in Table III.

TABLE III
RESULTS

The Ŝ(10) = 58.3% is the estimate that a problem is not closed in 10 days. This is equivalent to an estimate of 41.7% that a problem is closed in 10 days. The standard deviation is 0.161. Many statistical packages provide these outputs.
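Because this copy does not reproduce Dataset 2's individual times, here is a minimal hand-rolled sketch of the Kaplan–Meier formula above, illustrated on Dataset 1 from Section II; fed the actual Dataset 2 values it should reproduce the 58.3% in Table III:

    def kaplan_meier(closed_times, open_times, t):
        """Kaplan-Meier estimate of S(t) = Pr{problem still open at time t}."""
        surv = 1.0
        for t_j in sorted(set(ct for ct in closed_times if ct <= t)):
            d_j = sum(1 for ct in closed_times if ct == t_j)  # closed at t_j
            # At risk: not closed before t_j and observed at least to t_j.
            n_j = (sum(1 for ct in closed_times if ct >= t_j)
                   + sum(1 for ot in open_times if ot >= t_j))
            surv *= 1 - d_j / n_j
        return surv

    # Dataset 1: closed at 1, 2, 6, 6 days; still open at 7, 7, 7 days.
    s6 = kaplan_meier([1, 2, 6, 6], [7, 7, 7], t=6)
    print(f"Pr{{still open at 6 days}} = {s6:.3f}")  # (6/7)(5/6)(3/5) = 3/7 = 0.429
    print(f"Pr{{closed within 6 days}} = {1 - s6:.3f}")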

Discussion: This procedure is fairly simple. The Kaplan–Meier estimate does not look like the usual estimates of proportions. It is not the simple quotient of two quantities. Many reasonable statistics cannot be written in closed form. The procedure, however, is repeatable. The data needed to estimate the parameter are very simple and are available. This estimate has the usual statistical properties: s-consistent and s-efficient. The population quantity that it estimates is known; s-confidence intervals can be calculated, and the standard deviation can be estimated. For this example, the 95% s-confidence interval for the “fraction closed in 10 days” is [0.10, 0.73].

The estimate (41.7%) is appreciably different from the quantities provided in Options A and B.

John D. Healy is the Chief Scientist of Network Reliability at Telcordia Technologies. His expertise is in reliability modeling, statistical analysis, quality control, and data analysis. John led the data collection and analysis efforts for both Network Reliability Councils chartered by the FCC to improve telecommunications network reliability. He chairs the Facilities Solution Team aimed at reducing damage to underground fiber facilities. He served as an Advisor to the President’s Commission on Critical Infrastructure Protection on network reliability and security. John wrote Telcordia’s Reliability Prediction Procedure. He is a Vice-Chairman of the Annual Reliability and Maintainability Symposium Committee (RAMS) and an Associate Editor of this IEEE TRANSACTIONS ON RELIABILITY. John has a Ph.D. in mathematical statistics from Purdue University.