
June 2003 Vol. 16, No. 6

ACCESS TO THE EXPERTS

The Journal of Information Technology Management

IT Metrics and Benchmarking

Use Benchmark Data to Substantiate IT Resource Requirements: Benchmarking enables you to assess IT productivity and demonstrate potential resource savings to executive-level management. The data will make your assertions more compelling and defendable.

Don't Waste Valuable IT Resources on Benchmark Data: Benchmark data looks at where others have been in years past and has no relevance to what you are doing today. The data can easily be manipulated to serve anyone's particular interests.

"Many organizations today are attempting to realize the benefits of IT metrics and benchmarking — some successfully and some not so successfully"

— David Garmus, Guest Editor

In this issue:

Opening Statement, David Garmus (page 2)
The Big Picture: Software Measurements in Large Corporations, Capers Jones (page 6)
Extracting Real Value from Process Improvement, Thomas M. Cagley, Jr. (page 13)
Hitting the Sweet Spot: Metrics Success at AT&T, John Cirone, Patricia Hinerman, and Patrick Rhodes (page 20)
From Important to Vital: The Evolution of a Metrics Program from Internal to Outsourced Applications, Barbara Beech (page 28)
Benchmarking for the Rest of Us, Jim Brosseau (page 33)
The Practical Collection, Acquisition, and Application of Software Metrics, Peter R. Hill (page 38)

Opening Statement

by David Garmus

Many organizations today are attempting to realize the benefits of IT metrics and benchmarking — some successfully and some not so successfully. IT has become a direct contributor to bottom-line business value. The need to build higher-quality systems faster and more cheaply is increasingly significant to each organization. Toward that end, IT is constantly seeking ways to improve as it comes under greater scrutiny from senior executives with regard to cost and return on their investment.

Organizations require accurate, detailed data to manage their software business. They require cost information that permits informed decisions with regard to technology strategies, effective implementation of architectures, and cost-efficient resource management. They need to understand their capacity to deliver with regard to their utilization of methods and tools, effective deployment of training programs, and potential outsourcing opportunities. Simply stated, they require business measures based on facts.

Evidently, many people consider this to be a critical topic. Seventeen individuals submitted ideas for this issue. Since we could not print them all this month, we will have another issue on this topic later in the year, with additional information in the areas of practical software measurement and the Capability Maturity Model (CMM)/CMM-Integration (CMMI).

Capers Jones starts us out with the first of six articles in this issue. Who can dispute Capers? He was my first mentor in this area and is well known to all. His article is an excellent introduction to some of the standard IT metrics. While the article focuses primarily on measurement in very large companies that employ more than 10,000 software personnel, Capers briefly characterizes the measurement practices of a broad range of organizations:

* Companies with less than 100 software personnel may have fairly simple measurement programs that capture software defect data during testing, customer-reported defects, and possibly basic productivity measures such as cost per function point or work-hours per function point.

* Companies with more than 100 software personnel begin to pay serious attention to software issues. Such companies may commission software process assessments and benchmark studies to ascertain their performance against industry norms. They very likely have software productivity and quality measures, business measures such as market share, and customer satisfaction measures.

* Companies that employ more than 10,000 software personnel are often very sophisticated in terms of software measurement and tend to measure everything of importance to software: productivity, costs, quality, schedules, sizes of applications, process assessments, benchmarks, baselines, staff demographics, staff morale, staff turnover rates, customer satisfaction, and market share information. Many sources of data are used and need to be reviewed and validated.

My own experience with very large companies does not validate Capers' opinion of their sophistication, but I do concur with his categorization of software measurement into quality, productivity and schedule, process assessment, and business and corporate measures. Capers gives excellent definitions of the types of data that can be collected, but he doesn't say much about the CMM.

The next article by Tom Cagley, however, discusses how real value can be extracted from process improvement and the CMM. Tom states that serious metrics-based process improvement programs begin with a quantitative baseline of current organizational data that enables a comparison of changes within an organization. He goes on to claim that data analysis requires combining the strengths, weaknesses, and recommendations generated in a CMM appraisal with quantitative findings from the metrics-based productivity assessments. He found a client organization willing to sponsor a joint assessment, and the results of this assessment provided the client with integrated recommendations to support effective allocation of scarce resources.

Tom relates that the individual CMM and productivity results, as well as the joint recommendations, proved useful to the organization by focusing future process improvement efforts. Can you be convinced that a CMM or CMMI assessment would help your IT organization? Tom lays out an interesting scenario. The software industry has become increasingly aware of the need to measure itself. As organizations pursue better software management, they recognize the urgency for process improvement strategies and the quantification of business value. Each organization must pick and choose its own road — not easy, is it?

Back to Capers for a moment. In his article, he recommends that we visit companies such as AT&T and others to find out firsthand what kinds of measurements occur — and that's just what we have done in our next two articles. We went to two separate IT organizations within AT&T to learn from their experiences with software measurement programs. Most IT shops have common goals to deliver software projects on time, within budget, and with high quality; however, the execution and implementation of measurement programs vary greatly. In my opinion, any organization that has established an effective measurement process — a process that enables the quantitative and qualitative assessment of the value and quality of the products and services it produces — is exceptional. This is especially true when that organization uses metrics to identify opportunities for improving development and supporting productivity and quality. To use Capers' term, AT&T is sophisticated in terms of software measurement.

John Cirone leads the IT organization responsible for all of the finance and human resource systems at AT&T, and Patrick Rhodes and Patricia Hinerman are members of his staff. They collaborated to write our third article, and they are the principal individuals responsible for the management, development, implementation, and maintenance of the metrics program they discuss.

I believe that this article presents one of the best examples of a measurement program that was clearly defined and planned beforehand and well managed in the implementation and maintenance phases. True, they used function points, but their program would work with any sizing metric. They argue that what is key to the value of a metrics program for management and financial sponsors is a direct relationship in the form of quantitative information to manage the business. The data they have collected and analyzed has become very useful not just for application-specific decisions but for decisions that impact the entire development shop.

Much has been written about the benefits of software measurement and the failure of software measurement programs. If software measurement is beneficial and relatively easy, why don't more companies incorporate measurement into their development and maintenance practices? How does an organization start?

Barbara Beech, district manager in the Consumer CIO Vendor Management Division, contributes our fourth article, in which she discusses how she initiated a measurement program elsewhere within AT&T. Barbara also had the support of top-level management, one of the key factors in ensuring a successful startup. As she points out, though, "... top-level support will only go so far. You need to get down to the details to make metrics collection and reporting a reality." She identifies the critical steps her group took and the quarterly targets and scorecard they developed, which were used by management to monitor progress. Barbara addresses the challenges faced in collecting metrics data and the changes necessary when a decision is made to outsource development work.

The level of interest and the need for industry data within IT have increased dramatically over the past several years. Two main forces are whetting this increased appetite for information on IT performance: competitive positioning and outsourcing. IT organizations need to benchmark their progress and compare their rate of improvement to an industry standard.

The focus on improved productivity and cost reduction has driven many companies to outsource their IT activities when faced with the realization that their performance levels were below par. For IT groups that have chosen to outsource their applications development and maintenance functions, industry benchmark data is invaluable. As an outsourcing deal is being developed, benchmark data can be used to properly set service levels and define improvement goals. As the outsourcing deal matures, periodic checks on industry trends can be of great value.

When an organization researches the available sources of industry data, an overriding question has to be whether the data obtained is valid. Our fifth author, Jim Brosseau, identifies the shortcomings of some of the published data and enumerates the approaches organizations can use to effectively generate meaningful benchmarking information. While Jim asserts that "using data from reputable sources will help you to back up your assertions and can make your arguments much more compelling and defendable," he continues with this show-stopper: "The allure of benchmarking data comes from its external sterility. The data provided is based on other people's performance, and it may provide a sanitized look at what the industry is doing. For some organizations, it can become a game to blithely quote industry performance figures while avoiding internal measurement, knowing that the truth can be a bitter pill to swallow."


Jim also hits the SEI and the CMM for their extremely small sample size, considering the number of software development organizations worldwide. He concludes that "industry benchmark data definitely has its place in your arsenal of information for making strategic business decisions. Still, it has limitations that must be overcome with a deep understanding of why you are measuring and balanced with data gathered internally with reasonable approaches."

In our final article, Peter Hill, executive director of the International Software Benchmarking Standards Group (ISBSG), informs us that real benchmark delivery rates are available for your industry, technology, platform, and software type. The ISBSG is a not-for-profit organization that maintains an extensive database of metrics on development projects and maintenance support applications that it sells worldwide. Peter observes that "the commercial consulting companies that offer benchmarking services tend not to let you look at the data used in their benchmark reports." In contrast, ISBSG's project-level data is available to anyone who wishes to purchase a copy. (Company names are changed to protect the innocent, of course.) Peter suggests that you ask a number of questions before buying benchmark data:

* Is the collection instrument well thought out and proven?

* Has the data been rated?

* How old is the data?

* Can I use the data to compare "apples with apples"?

* What is the possibility of data manipulation?

Remember, though, Jim Brosseau's caution about using benchmark data as a driver for direction in your organization: "If you are looking at how much your organization should be spending, historical benchmarking data will tell you where the industry has been, but it will not help you resolve how to best address your organizational needs in the future."

So where are we on that road map? As you read these articles, keep in mind that IT metrics do not appear to be well defined, nor do they follow a standardized process. That may be the next step in the maturation process of software measurement.


Next issue: EA Governance: From Platitudes to Progress. Guest Editor: George Westerman

So you want to build an enterprise architecture? Congratulations on having the vision to improve the IT organization's efficiency and flexibility!

Building an enterprise architecture? There's no way this IT organization is going to spend zillions on something that provides no benefit to the company!

Which reaction will you get from your boss? In next month's Cutter IT Journal, Guest Editor and Cutter Consortium Senior Consultant George Westerman will examine a key aspect of enterprise architecture performance or failure — namely, governance. In the issue, you'll read insightful analyses of successes and failures that help move the discussion of EA governance beyond simple platitudes. The sage advice of our EA veterans will help prepare you for whatever reaction you may get.

The Big Picture: Software Measurements in Large Corporations

by Capers Jones

INTRODUCTION

As company size grows, the kinds of software measurement programs encountered also increase. There are two main reasons for this. First, in large corporations, software costs are among the largest identifiable expense elements. Second, large corporations need large software applications. Large software applications are very likely to fail or run out of control. Therefore, the top executives in large corporations have strong business reasons for wanting corporate software activities to be under top management regulation. Measurement programs are very effective in bringing software projects under executive control.

Very small companies, with less than 10 software personnel, usually have no formal software measurement programs. About the only kind of measurement data they collect is the number of customer-reported defects.

Small companies, with less than 100 software personnel, may have fairly simple measurement programs that capture software defect data during testing, as well as defects reported by users in deployed software applications. A few small companies have basic productivity measures such as cost per function point or work-hours per function point.

Midsized companies, with between 100 and 1,000 software personnel, begin to pay serious attention to software issues. Such companies may commission software process assessments and benchmark studies to ascertain their performance against industry norms. Companies in this size range are very likely to have both software productivity and quality measures in place. Of course, business measures such as market share and customer satisfaction are also common in this size range. Business measures often use a combination of data collected by inhouse personnel plus data acquired from consulting groups.

Large companies, with between 1,000 and 10,000 software personnel, usually have fairly good software measurement programs that include productivity, quality, and customer satisfaction measures. Such companies also tend to have business measures such as market shares, and they may have personnel and demographic measures.

At the top of the spectrum are very large companies that employ more than 10,000 software personnel, such as IBM, Microsoft, Electronic Data Systems, Siemens Nixdorf, and the like. Some of these may top 50,000 software personnel, and they are often very sophisticated in terms of software measurements. These very large companies tend to measure everything of importance to software: productivity, costs, quality, schedules, sizes of applications, process assessments, benchmarks, baselines, staff demographics, staff morale, staff turnover rates, customer satisfaction, market shares, and competitive information. Many sources of data are used and need to be reviewed and validated. Therefore, measurement programs in large companies are usually carried out by full-time measurement personnel.

These observations are generally true, but there are exceptions. There are some small companies with excellent measurement programs. Conversely, there are some large companies with minimal measurement programs. However, company size and measurement sophistication do correlate very strongly, for solid business reasons.

Measurement is not the only factor that leads to software excellence. Measurement is only one part of a whole spectrum of issues, including:

* Good management
* Good technical staff
* Good development processes
* Effective and complete tool suites
* Good organization structures
* Specialized staff skills
* Continuing on-the-job training
* Good personnel policies
* Good working environments
* Good communications

However, measurement is the technology that allows companies to make visible progress in improving the other factors. Without measurement, progress is slow and sometimes negative. Companies that don't measure tend to waste scarce investment dollars in "silver bullet" approaches that consume time and energy but generate little forward progress. In fact, good quality and productivity measurement programs provide one of the best returns on investment of any known software technology.

WHAT CAN BE MEASURED?

The best way for a company to decide what to measure is to find out what major companies measure and do the same things. In the next section, I discuss the kinds of measurements used by large companies that are at the top of their markets and are generally succeeding in global competition. If possible, try to visit companies such as Microsoft, IBM, AT&T, or HP and find out firsthand what kinds of measurements tend to occur.

SOFTWARE QUALITY MEASURES

Major companies measure software quality. Quality is the most important topic of software measurement, and here are the most important quality measures (a useful summary of almost all known software quality measures can be found in Stephen Kan's Metrics and Models in Software Quality Engineering [6]):

Customer Satisfaction

Large companies perform annual or semiannual customer satisfaction surveys to find out what their clients think about their products. There is also sophisticated defect reporting and customer support information available via the Web. Many large companies have active user groups and forums. These groups often produce independent surveys on quality and satisfaction topics that are quite helpful.

Defect Quantities and Origins

Large companies usually keep accurate records of the bugs or defects found in all major deliverables, and they tend to start early during requirements or design. At least five categories of defects are measured: requirements defects, design defects, code defects, documentation defects, and bad fixes (i.e., secondary bugs introduced accidentally while fixing another bug). This form of measurement is one of the oldest software measures on record, and companies such as IBM began defect measurements as early as the late 1950s. Some leading companies perform root cause analysis on software defects in order to find and eliminate common sources of error.
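As a concrete illustration of how such records can be structured (this sketch is mine, not IBM's or the author's; the field names and sample defects are hypothetical), the five origin categories above lend themselves to a very simple tally:

```python
from collections import Counter
from dataclasses import dataclass

# The five origin categories named in the text.
ORIGINS = ("requirements", "design", "code", "documentation", "bad fix")

@dataclass
class Defect:
    defect_id: str
    origin: str      # one of ORIGINS
    severity: int    # 1 = total system failure; seriousness decreases as the number rises
    found_in: str    # e.g., "design review", "system test", "field"

def defects_by_origin(defects):
    """Tally defect counts for each origin category."""
    counts = Counter(d.origin for d in defects)
    return {origin: counts.get(origin, 0) for origin in ORIGINS}

# Hypothetical sample data, for illustration only.
sample = [
    Defect("D-001", "requirements", 2, "design review"),
    Defect("D-002", "code", 1, "system test"),
    Defect("D-003", "bad fix", 3, "field"),
]
print(defects_by_origin(sample))
# {'requirements': 1, 'design': 0, 'code': 1, 'documentation': 0, 'bad fix': 1}
```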

Defect Removal Efficiency

The phrase "defect removal efficiency" originated in IBM in the early 1970s. It refers to the percentage of bugs or defects removed before software is delivered to customers. This is an important aspect of software development, but it is not universally measured. According to my observations among major corporations, about a third of large companies measure defect removal efficiency.

It is useful to measure the average and maximum efficiency of every major kind of review, inspection, and test stage. This allows companies to select an optimal series of removal steps for projects of various kinds and sizes. Testing alone is not very efficient. A combination of reviews and inspections and multiple test stages is most efficient. Leading companies remove from 95% to more than 99% of all defects prior to delivery of software to customers. Laggards seldom exceed 80% in terms of defect removal efficiency and may drop below 50%. The US average is about 85% [5].
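The arithmetic behind defect removal efficiency is simple; the sketch below is a minimal illustration (mine, not IBM's definition verbatim), assuming the common convention of dividing defects removed before delivery by all defects found before and after delivery:

```python
def defect_removal_efficiency(pre_release_defects: int, post_release_defects: int) -> float:
    """DRE as a percentage: defects removed before delivery / total defects found.

    Post-release defects are usually counted over a fixed window,
    such as the first 90 days of customer use (an assumption here)."""
    total = pre_release_defects + post_release_defects
    if total == 0:
        return 100.0  # nothing found anywhere; treat as fully efficient
    return 100.0 * pre_release_defects / total

# Illustrative numbers: 850 defects removed by reviews, inspections, and testing;
# 150 reported by customers after delivery.
print(round(defect_removal_efficiency(850, 150), 1))  # 85.0 -- close to the US average cited above
```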

Delivered Defects by Application

It is common among large companies to accumulate statistics on errors reported by users as soon as software is delivered. Monthly reports showing the defect trends against all applications are prepared and given to executives, and they are also summarized on an annual basis. These reports may include supplemental statistics such as defect reports by country, state, industry, client, and so on.

Defect Severity Levels

All major companies use some kind of a severity scale for evaluating incoming bugs or defects reported from the field. The number of plateaus varies from one to five. In general, "Severity 1" defects are problems that cause the system to fail completely; the severity scale then descends in seriousness. A few companies are using the newer "orthogonal defect classification" developed by IBM [1]. In addition to severity levels, this method captures information about the business importance of various kinds of bugs or defects.

Complexity of Software

It has been known for many years that complex code is difficult to maintain and has higher-than-average defect rates. A variety of complexity analysis tools are commercially available that support standard complexity measures such as cyclomatic and essential complexity, the two most widely used complexity measures. Both measures were developed by the complexity pioneer, Tom McCabe [8].
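For readers who have not used a complexity tool, here is a minimal sketch of the idea (my own approximation, not McCabe's tool-grade implementation): cyclomatic complexity of a single routine can be estimated as the number of decision points plus one, which the snippet below counts by walking a Python syntax tree.

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe cyclomatic complexity as decision points + 1.

    Commercial tools compute V(G) = E - N + 2P on the control-flow graph;
    for structured code this simpler count agrees closely."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.BoolOp):
            decisions += len(node.values) - 1     # 'a or b or c' adds two branches
        elif isinstance(node, (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)):
            decisions += 1
    return decisions + 1

SAMPLE = """
def triage(severity, reported_by_customer):
    if severity == 1 or reported_by_customer:
        return "urgent"
    for _ in range(3):
        pass
    return "routine"
"""
print(cyclomatic_complexity(SAMPLE))  # 4: the if, the 'or', the for, plus 1
```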

Test Case Coverage

Software testing may or may not cover every branch and pathway through applications. A variety of commercial tools are available that monitor the results of software testing and help to identify portions of applications where testing was sparse or did not occur.

Cost of Quality Control and Defect Repairs

One significant aspect of quality measurement is to keep accurate records of the costs and resources associated with various forms of defect prevention and removal. For software, these measures include:

* The costs of software assessments
* The costs of quality baseline studies
* The costs of reviews, inspections, and testing
* The costs of warranty repairs and post-release maintenance
* The costs of quality tools
* The costs of quality education
* The costs of your software quality assurance organization
* The costs of user satisfaction surveys
* The costs of any litigation involving poor quality or customer losses attributed to poor quality

About 50% of large companies quantify the "cost of quality" [2].
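As a hedged illustration of how the categories above roll up (the figures are invented and the category names simply mirror the bullet list), cost of quality is usually reported as a single total and as a share of overall software spending:

```python
# Annual cost-of-quality roll-up; all figures are hypothetical US dollars.
cost_of_quality = {
    "software assessments": 120_000,
    "quality baseline studies": 80_000,
    "reviews, inspections, and testing": 2_400_000,
    "warranty repairs and post-release maintenance": 1_900_000,
    "quality tools": 300_000,
    "quality education": 250_000,
    "software quality assurance organization": 1_100_000,
    "user satisfaction surveys": 90_000,
    "quality-related litigation and customer losses": 0,
}
total_software_spend = 25_000_000  # hypothetical annual software budget

coq_total = sum(cost_of_quality.values())
print(f"Cost of quality: ${coq_total:,} "
      f"({100 * coq_total / total_software_spend:.1f}% of software spend)")
```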

SOFTWARE PRODUCTIVITY AND SCHEDULE MEASURES

The measurement of software schedules, software effort, and software costs is an important topic in major corporations. As of 2003, many major corporations have adopted function point metrics rather than the older and inadequate "lines of code" metric [4]. However, large corporations in the defense industry (and a few others) attempt to measure productivity using the obsolete lines of code measure.

The topic of software productivity tends to have more benchmark studies than almost any other. About 65% of the large corporations I've visited have commissioned various benchmark comparisons of their software performance. Normally these studies are performed by external consulting groups. Here are some of the key productivity measures examined by large software producers:

Size Measures

Because costs and schedules of software projects are directly related to the size of the application, this is an important topic. Industry leaders measure the size of the major deliverables associated with software projects. Size data is kept in two ways. One method is to record the size of actual deliverables such as pages of specifications, pages of user manuals, screens, test cases, and source code. The second way is to normalize the data for comparative purposes. Here the function point metric is now the most common and the most useful. Examples of normalized data would be pages of specifications produced per function point, source code produced per function point, and test cases produced per function point. The function point metric defined by the International Function Point Users Group (IFPUG) is now the major metric used for software data collection. (The IFPUG Web site, www.ifpug.org, is a source of information on function point publications and uses of function points; see also IT Measurement: Practical Advice from the Experts [3].)

Schedule Measures

Many large companies measure overall project schedules from start to finish. However, about 25% of leading large companies measure the schedules of every activity and how those activities overlap or are carried out in parallel. Overall schedule measurements without any details are inadequate for any kind of serious process improvement.

Cost Measures

Almost all large companies measure the costs of software projects. About 25% of the leaders measure the effort for every activity, starting with requirements and continuing through maintenance. These measures include all major activities, such as technical documentation, integration, quality assurance, and so on. Leading large companies tend to have rather complete charts of accounts, with no serious gaps or omissions. Three kinds of normalized data are typically created for development productivity studies:

1. Work hours per function point by activity and in total
2. Function points produced per staff-month by activity and in total
3. Cost per function point by activity and in total

Cost benchmarking is very common. Cost benchmarks can be either fairly high-level, such as total software expenses, or fairly granular, such as benchmarks of specific projects. Some large corporations use both forms of benchmarking.
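To make the three normalized forms concrete, here is a minimal sketch (my own, with hypothetical activity data and an assumed 132 work hours per staff-month) that derives all three from the same raw effort and cost records for a project of a given function point size:

```python
# Hypothetical per-activity effort (work hours) and cost for one project.
activities = {
    "requirements":  {"hours": 400,   "cost": 40_000},
    "design":        {"hours": 900,   "cost": 90_000},
    "coding":        {"hours": 2_600, "cost": 260_000},
    "testing":       {"hours": 1_800, "cost": 180_000},
    "documentation": {"hours": 300,   "cost": 30_000},
}
function_points = 1_200          # delivered size of the application
HOURS_PER_STAFF_MONTH = 132      # assumed conversion factor; varies by company

def normalized_productivity(activities, fp):
    total_hours = sum(a["hours"] for a in activities.values())
    total_cost = sum(a["cost"] for a in activities.values())
    return {
        "work hours per function point": total_hours / fp,
        "function points per staff-month": fp / (total_hours / HOURS_PER_STAFF_MONTH),
        "cost per function point": total_cost / fp,
    }

for name, value in normalized_productivity(activities, function_points).items():
    print(f"{name}: {value:,.2f}")
```

The same three ratios can be produced by activity simply by applying the division to each activity's hours and cost rather than to the totals.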

Maintenance Measures

Because maintenance and enhancement of aging software are now the dominant activities of the software world, most companies also measure maintenance productivity. An interesting metric for maintenance is "maintenance assignment scope." This is defined as the number of function points of software that one programmer can support during a calendar year. Other maintenance measures include number of customers supported per staff member, number of defects repaired per time period, and rate of growth of applications over time.
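A hedged sketch of the "maintenance assignment scope" arithmetic, with invented numbers: the observed scope is simply supported function points per programmer per year, and dividing a portfolio's size by that scope suggests the maintenance staffing it implies.

```python
def maintenance_assignment_scope(supported_fp: float, programmers: int) -> float:
    """Function points of installed software supported per programmer per year."""
    return supported_fp / programmers

def implied_maintenance_staff(portfolio_fp: float, assignment_scope: float) -> float:
    """Maintenance staff implied by a portfolio, given an observed assignment scope."""
    return portfolio_fp / assignment_scope

# Illustrative only: 18 programmers currently support 27,000 FP of installed software.
scope = maintenance_assignment_scope(27_000, 18)          # 1,500 FP per programmer per year
print(scope, implied_maintenance_staff(500_000, scope))   # staffing for a 500,000-FP portfolio
```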

Indirect Cost Measures

About 15% of the large companies I've visited measure costs of indirect software activities. Some of the indirect activities — such as travel, meeting costs, training and education, moving and living, legal expenses, and the like — are so expensive that they cannot be overlooked.

Rates of Requirements Change

The advent of function point metrics has allowed direct measurement of the rate at which software requirements change. The observed rate of change in the US is about 2% per calendar month. The rate of change is derived from two measurement points: the function point total of an application when the requirements are first defined and the function point total when the software is delivered to actual customers. About 20% of large companies measure requirements change. It is significant that when outsourced projects go bad and end up in court for breach of contract, almost every case includes claims of excessive requirements change.
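One plausible way to turn the two measurement points into a monthly rate is sketched below (my own arithmetic, with invented numbers); a simple average of total growth over the elapsed months is shown, though a compound-growth form is also used in practice.

```python
def monthly_requirements_growth(fp_at_requirements: float,
                                fp_at_delivery: float,
                                months_elapsed: float) -> float:
    """Average requirements growth per calendar month, as a percentage."""
    total_growth = (fp_at_delivery - fp_at_requirements) / fp_at_requirements
    return 100.0 * total_growth / months_elapsed

# Illustrative: an application sized at 1,000 FP when requirements were first
# defined reaches 1,240 FP at delivery 12 months later.
print(monthly_requirements_growth(1_000, 1_240, 12))  # 2.0 (% per month, matching the US figure cited above)
```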

PROCESS ASSESSMENT, OR "SOFT FACTOR" MEASURES

Even accurate quality and productivity data is of no value unless it can be explained why some projects are visibly better or worse than others. More than half of the large corporations I've visited have commissioned one or more assessment studies. Assessments are far more common among companies that produce systems software, embedded software, or military software than they are among companies that produce normal information systems.

In general, software process assessments, which are usually performed by consulting organizations, cover the following topics:

Software Processes

This topic deals with the entire suite of activities that are performed from early requirements through deployment. How the project is designed, what quality assurance steps are used, and how configuration control is managed are some of the topics included. This information is recorded in order to guide future process improvement activities. If historical development methods are not recorded, there is no statistical way of separating ineffective methods from effective ones.

Software Tool Suites

There are more than 2,500 software development tools on the commercial market and at least the same number of proprietary tools that companies have built for their own use. It is very important to explore the usefulness of the available tools, and that means that each project must record the tools utilized. Thoughtful companies identify gaps and missing features and use this kind of data for planning improvements.

Software Infrastructure

The number, size, and kinds of departments within large organizations are an important topic, as are the ways of communicating across organizational boundaries. Other factors that exert a significant impact on results include whether a project uses matrix or hierarchical management and whether a project involves a single location or multiple cities or countries.

Software Team Skills and Experience

Large corporations can have more than 100 different occupation groups within their software domains. Some of these specialists include quality assurance, technical writing, testing, integration and configuration control, network specialists, and many more. Since large software projects do better with specialists than with generalists, it is important to record the occupation groups used.

Staff and Management Training

Software personnel, like medical doctors and attorneys, need continuing education to stay current. Leading companies tend to provide 10-15 days of education per year, for both technical staff members and software management. Assessments explore this topic. Normally, training takes place between assignments and is not a factor on specific projects, unless activities such as formal inspections or joint application design are being used for the first time.

BUSINESS AND CORPORATE MEASURES

Thus far, measurement has been discussed at the level of software projects. However, software supports other business operations. Therefore, all large companies with thousands of software personnel perform many kinds of business measurements. About 70% of the large corporations I've visited use some form of the "balanced scorecard" approach for business measures. This method, developed by David Norton and Robert Kaplan [7], combines financial performance measures, customer measures, and business goals and can be customized with other factors such as productivity and quality. Below are just a few samples of corporate measures to illustrate the topics of concern.

Portfolio Measures

Major corporations can own from 250,000 to more than 1,000,000 function points of software, apportioned across thousands of programs and dozens to hundreds of systems. Many large enterprises know the sizes of their portfolios, their growth rate, replacement cost, quality levels, and many other factors.

Market Share Measures

Most large companies know quite a lot about their markets, market shares, and competitors. For example, industry leaders in the commercial software domain tend to know how every one of their products is selling in every country and how well competitive products are selling in every country. Some companies carry out market share studies with their own personnel. Other companies depend upon outside consulting groups. Still other companies use both internal and external sources of data. Much of this kind of information is available from various industry sources such as Dun & Bradstreet, Mead Data Central, Fortune magazine, and other journals.

SUMMARY AND CONCLUSIONS

The software industry is struggling to overcome a very bad reputation for poor quality and long schedules. The companies that have been most successful in improving quality and shortening schedules have also been the ones with the best measurements.

Since large companies have the greatest costs for software and build most of the world's major software systems, it is natural for large companies to have the most complete and sophisticated measurement programs. Within the set of large software companies, those with the best measurement programs have the highest success rate in building software applications. Good measurement programs and good software practices are almost always found together.

REFERENCES

1. Chillarege, Ram. "ODC Basics for ODC Process Measurement, Analysis and Control." Proceedings of the Fourth International Conference on Software Quality. ASQC Software Division, 1994.

2. Crosby, Philip B. Quality Is Free: The Art of Making Quality Certain. New American Library, 1979.

3. International Function Point Users Group. IT Measurement: Practical Advice from the Experts. Addison-Wesley, 2002.

4. Jones, Capers. "Sizing Up Software." Scientific American (December 1998).

5. Jones, Capers. Software Quality: Analysis and Guidelines for Success. International Thomson Computer Press, 1997.

6. Kan, Stephen H. Metrics and Models in Software Quality Engineering, 2nd edition. Addison-Wesley, 2003.

7. Kaplan, Robert S., and David P. Norton. "Using the Balanced Scorecard as a Strategic Management System." Harvard Business Review (January-February 1996).

8. McCabe, Tom. "A Complexity Measure." IEEE Transactions on Software Engineering, Vol. 2, No. 4 (1976).

ADDITIONAL READING

The literature on software measurement and metrics is expanding rapidly. Following are a few of the more significant titles to illustrate the topics that are available.

Boehm, Barry W. Software Engineering Economics. Prentice Hall, 1982.

Garmus, David, and David Herron. Function Point Analysis. Addison-Wesley, 2001.

Garmus, David, and David Herron. Measuring the Software Process: A Practical Guide to Functional Measurement. Prentice Hall, 1996.

Grady, Robert B., and Deborah L. Caswell. Software Metrics: Establishing a Company-Wide Program. Prentice Hall, 1987.

Howard, Alan, ed. Software Metrics and Project Management Tools. Applied Computer Research, 1997.

Jones, Capers. Applied Software Measurement, 2nd ed. McGraw-Hill, 1996.

Jones, Capers. Software Assessments, Benchmarks, and Best Practices. Addison-Wesley, 2000.

Miller, Sharon E., and George T. Tucker. "Software Development Process Benchmarking." Proceedings of the IEEE Global Telecommunications Conference. IEEE Communications Society, 1991.

Putnam, Lawrence H., and Ware Myers. Measures for Excellence: Reliable Software on Time, Within Budget. Yourdon Press Computing Series, Pearson Education POD, 1992.

Putnam, Lawrence H., and Ware Myers. Industrial Strength Software: Effective Management Using Measurement. IEEE Press, 1997.

Capers Jones is Founder of Software Productivity Research (SPR). After SPR was acquired by Artemis Management Systems, he became Chief Scientist Emeritus of Artemis Management Systems. Mr. Jones is an author and speaker on software productivity, quality, project management, and measurement and the developer of the SPQR models (Software Productivity, Quality, and Reliability estimators). He was formerly Assistant Director of Measurements with ITT Programming Technology Center. Prior to this, Mr. Jones was a project team leader for the software process improvement group at IBM's Systems Development Division. The team was chartered to improve the quality and productivity of IBM's commercial software systems. Mr. Jones also served as team leader of software process assessments at Nolan, Norton, and Company. He was a programmer/analyst for the Office of the Surgeon General in Washington, DC, and also for Crane Company in Chicago. Mr. Jones is a graduate of the University of Florida. He is a member of IEEE and the International Function Point Users Group (IFPUG). Mr. Jones was awarded a lifetime membership in IFPUG for his work in software measurement and metrics analysis.

Mr. Jones can be reached at [email protected].

Extracting Real Value from Process Improvement

by Thomas M. Cagley, Jr.

Despite the hopes of process improvement advocates, there is limited information in the literature as to the quantifiable benefits of process improvement. This is due to several factors:

* No standard for accounting benefits
* Inconsistency in applying cost accounting standards
* Failure to recognize natural evolution or improvement
* Lack of formal quantification of productivity or efficiency improvements

These factors suggest the need for a different approach to assessing process improvement costs and benefits in terms of quantitatively matching process capability/maturity to "faster, better, and cheaper" project performance. (It has been noted that pay rate variances between countries and regions within countries cause additional complications when comparing costs; cost variances require normalization for comparisons.)

In response to the growing need to demonstrate quantitative value of improvement effort, in quality and dollars, this author has developed a joint model and productivity assessment methodology. The methodology generates specific recommendations that match an organization's business goals with specific process changes. In most cases, the recommendations focus on specific tasks and activities rather than on the generic goal of attaining a Capability Maturity Model (CMM) level. (The SW-CMM was used in the assessments described in this article; however, the processes have been mapped to the CMM Integration (CMMI) and tested with similar results.) Recommendations include targeted quantitative productivity improvements and reductions in time to market, delivered defects, total defects, and maintenance effort. The intent is to help organizations demonstrate value early in the process improvement cycle through many small changes rather than a "big bang." The resulting recommendations provide a road map that will enable the organization to better leverage effort and cost and thus derive true benefit from process improvement.

INPUT #1: CMM-BASED APPRAISALS

In 1986, the Software Engineering Institute (SEI) began developing a process maturity framework to help organizations improve their software process and to provide a means of assessing software development capability. After more than four years' experience with the framework, the SEI evolved it into the CMM for Software. Version 1.1 of the Software CMM (SW-CMM), published in 1993, was based on actual practice, included practices that were believed representative of the industry's best practices, and provided a framework to meet the needs of individuals who perform process improvement and process appraisal activities. The SW-CMM was designed to focus on a software organization's capability for producing high-quality products consistently and predictably. In this case, capability was defined as the inherent ability of a software process (i.e., in-use activities, methods, and practices) to produce planned results.

The staged structure of the SW-CMM includes five maturity levels and is based on product quality principles. The SEI adapted these principles into the maturity framework, establishing a project management and engineering foundation for quantitative control of the software process. Each level comprises a set of process goals that, when satisfied, stabilize process components. The model therefore reflects prioritized improvement actions for those processes that are of value across an organization.

At the next level of detail, the model provides heightened management insight (or at least the capability for it) into project methods, progress, and results. This additional insight, when coupled with effective and timely corrective action by management, is the primary driver for improved project results.

From a higher-level perspective, the SW-CMM provides an organized framework for producing software faster, better, and cheaper. It stands to reason, for example, that sound estimates produce more realistic plans that, when effectively tracked, better identify risks that require proper management so they don't become cost or schedule impacts. However, what has been missing until now is a quantitative correlation between the SW-CMM practices, their stabilization, and the resulting impact as to just how much faster, how much better, and how much cheaper an organization can produce a given set of products.

A CMM-based appraisal includes 15 discrete tasks partitioned across four phases: planning, preparation, conducting, and reporting. The conducting phase activities include data collection and consolidation tasks performed at the organization's site. These appraisals include an expert evaluation of an organization's process capability against the SW-CMM or other reference models, including appraisals that result in maturity level ratings.

INPUT #2: PRODUCTIVITY ASSESSMENTS

Serious metrics-based process improvement programs begin with a quantitative baseline of current organizational data. The organization needs a baseline for cost justifying the investments required for advancement. A baseline provides a platform for comparing changes within an organization.

A "baseline" is a point-in-time inventory of the number and sizes of a relevant group of software applications and/or projects. Statistical sampling techniques are the best means to identify the relevant group of projects or applications (although studies of whole portfolio populations offer the most predictive data). Software baselines also include the other "hard" data (schedules, costs, effort, defects, user satisfaction) associated with the sample. The combination of size and other hard data allows a comparison of related projects and applications against either industry average results or best-in-class results. A comparison of this type is referred to as a "benchmark." On-site data collection provides the most reliable results for baselines and benchmarks.
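As a hedged illustration of the baseline-versus-benchmark distinction (this sketch is mine, not the author's methodology, and both the baseline figures and the "industry average" reference values are invented), a benchmark essentially expresses each hard-data measure in a baseline relative to a reference profile:

```python
# A baseline: point-in-time hard data for a sampled group of projects.
baseline = {
    "function points per staff-month": 9.8,
    "delivered defects per function point": 0.045,
    "schedule months per 1,000 function points": 14.0,
}

# Hypothetical industry-average reference values for the same measures.
industry_average = {
    "function points per staff-month": 12.0,
    "delivered defects per function point": 0.038,
    "schedule months per 1,000 function points": 12.5,
}

def benchmark(baseline, reference):
    """Express each baseline measure as a percentage of the reference value.

    100 means parity; for productivity, below 100 is worse than the
    reference, while for defects and schedule, above 100 is worse."""
    return {k: round(100 * baseline[k] / reference[k], 1) for k in baseline}

print(benchmark(baseline, industry_average))
```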

The overall process for baselining productivity is as follows: develop a sample, gather quantitative data, and then develop recommendations based on the data and the data collection process (see Figure 1). The organization's size and process maturity, along with the level of metrics integration, are indicators of complexity. All these factors drive the effort required to perform the assessment process.

Sample Selection

The first phase of baselining is to determine a sample size. The benchmarkers need to generate samples that will allow them to forecast the impact of process improvements and to generate statistical confidence in the overall results. The sample should be both representative of the organization and statistically sound in order for the data to be used for forecasting improvements.

[Figure 1 — Productivity baseline approach: sample selection, then data collection (project size, other quantitative metrics, and project attributes), then data analysis and recommendation development.]

The benchmarkers forecast improvements by leveraging models constructed in tools such as JMP or Excel, as well as estimation tools such as SMART Predictor or SEER-SEM. Less mature organizations or organizations that have not developed historical metrics typically generate a sample that is representative of perceived demographics rather than based on statistical techniques. Samples should include a balanced set of environmental attributes, such as platform, language, project size, and degree of vendor functionality present in the development portfolio. I strongly suggest the use of function points as the sizing metric for both applications and projects. Function points are an industry standard measure of software size that can be applied regardless of technology or language. (See www.ifpug.org for more information on function points.)
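To illustrate the "statistically sound sample" point (a sketch of my own, not the author's procedure; in practice the models would be built in tools such as JMP or Excel, as the text notes), a simple confidence interval around the sampled projects' mean productivity shows how much forecasting precision a given sample supports:

```python
import math
import statistics

def mean_confidence_interval(values, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation;
    a t-distribution would be slightly more precise for small samples)."""
    mean = statistics.mean(values)
    stderr = statistics.stdev(values) / math.sqrt(len(values))
    return mean - z * stderr, mean + z * stderr

# Hypothetical productivity figures (FP per staff-month) for 12 sampled projects.
sample_productivity = [8.2, 11.5, 9.7, 13.1, 10.4, 7.9, 12.2, 9.1, 10.8, 11.0, 8.8, 9.9]
low, high = mean_confidence_interval(sample_productivity)
print(f"Mean productivity is roughly between {low:.1f} and {high:.1f} FP per staff-month")
```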

Data Collection

The data collection phase of a baseline focuses on collecting basic metrics for each of the projects or applications in the sample. This phase can include developing or reviewing the metrics based on the organization's process and metrics maturity. The metrics included in a quantitative baseline typically include size, effort, defect, and schedule data. Quantitative data describes what has occurred but not why. Project attribute data that describes project classification and organizational characteristics is needed to complete the picture.

Data Analysis and Recommendation Development

The third phase of the baseline process is to analyze the data and develop recommendations. Data analysis requires combining the strengths, weaknesses, and recommendations generated in the CMM appraisal and the quantitative findings according to the mapped relationships between the two models. The combined data allows the benchmarkers to derive comparisons to industry data at any relevant levels of granularity. They use the results of the assessment and comparison to industry data to perform sensitivity and "what-if" analyses while developing improvement recommendations.

AND THERE THE TWAIN SHALL MEET: A COMBINED ASSESSMENT APPROACH

CMM-based appraisals and metrics-based productivity assessments have evolved as separate processes. However, each assessment technique is focused on increasing organization effectiveness and efficiency.

In developing the initial combined assessment process, I reviewed both models and assessment methods to determine the feasibility of a combined assessment. The results of this evaluation identified the following:

* There was sufficient overlap and similarities between the two models and the assessment methods (data collection, data consolidation, and reporting) to determine that a combined assessment was feasible.

* Overlap between CMM and productivity assessment models provided both commonality and robustness.

* Overlap between assessment methods provided opportunities to leverage assessment activities.

* Synergies between both models and assessment methods would increase the value of combined results and joint recommendations.

These findings validated the fea-

sibility of a joint CMM (capability-

focused) and productivity

(effectiveness-focused) assessment.

After the initial review, I performed

further analysis on the two models

and assessment methods to

gauge the amount of model over-

lap and determine how the joint

assessment could be tailored.

Figure 2 — Model overlap and synergies.

3 See www.ifpug.org for more information on function points.

Figure 2 shows the overlap between the two models. The goals of each assessment drove tailoring decisions, which had to include the following:

• The results of the combined assessment had to contribute to process improvements.

• The combined assessment and results, including recommendations, had to optimize value to the sponsor, including supporting the business, optimizing cost, and minimizing disruption.

• The appraisal process had to be reliable, in that the combined event would create a repeatable process, standardize the conduct of the combined appraisal, yield predictable results, and allow for use by both inhouse and outside consultants.

My analysis illustrated that a moderate degree of overlap existed between components of the CMM and productivity assessment models. I then used the overlap to calibrate the SW-CMM to the quantitative productivity assessment to create a model used to predict the impact of organizational process changes.

PUTTING IT TO THE TEST

In the fall of 1995, the combined SW-CMM and productivity assessment was implemented at a major US IT organization. The underlying business considerations that supported a combined appraisal included the overlap and synergies mentioned earlier, as well as the very real need to reduce appraisal impacts. When the two types of assessments are combined, the overall cost and impact are lower than if the assessments are done separately. The assessment team proposed the combined assessment because it strongly believed that the productivity and quality gains that are thought to result from improved process maturity could be quantified. The overall appraisal flow is illustrated in Figure 3.

In the combined assessment, at a conceptual level, data would be gathered via common team activities, processed using common rules, and then parsed to the respective model/method rules associated with each model. This overall assessment flow is based essentially on the major phases of a CMM Appraisal Framework–compliant appraisal: the planning and preparation phase, the conducting (on-site appraisal activities) phase, and the reporting phase.

Assessment tailoring decisions were guided by the following considerations:

• The joint assessment had to be designed in such a way as to produce the same result that standalone assessments would.

• The outcome of the tailored assessment had to be considered legitimate by the respective communities associated with each model.

• The combined assessment had to have the rigor of individual assessments while reducing the duplication of information collected during the assessment.

At a practical level, the integrated assessment involved the following activities:

• Performing joint planning and preparation

• Conducting joint data-gathering sessions where feasible

• Performing separate rating and reporting tasks consistent with each method as it applied to each model

• Performing joint "results analysis" to create an integrated report and recommendations

Organizational coverage was another significant issue. As a starting point, an assessment team consisting of some inhouse personnel and myself selected a set of 62 development projects for the productivity assessment.

Figure 3 — Overall appraisal flow.

We chose these particular projects in order to provide a statistically significant representation of the organization's development environment.4 Our selection criteria included platform, programming language mix, and the presence of vendor-supplied functionality. From these 62 projects, we then selected eight for the CMM appraisal. The project selection criteria for the CMM appraisal included:

• Software-intensive projects either completed or in the latter stages of development and testing

• Projects that, as a group, represented the site's software work

• Projects that, as a group, had work for which the CMM key process areas (KPAs) could be evaluated

• Projects for which personnel were available to participate during the site visits

• Projects that were consistent with (a subset of) the projects selected for the process analysis

Data collection, data consolidation, and rating activities were performed independently by the respective development teams,5 and the assessment team separately developed and reported the results, including conclusions and recommendations.6 I then combined, analyzed, and presented these results and recommendations for subsequent process improvement.

WHAT THE JOINT ASSESSMENT REVEALED

From a SW-CMM Perspective

The joint assessment found that the organization met most goals of the CMM Level 2 KPAs but not those relating to the software quality assurance and software subcontract management KPAs. At CMM Level 3, goals for the organization process focus, integrated software management, and intergroup coordination KPAs were fully satisfied, but just one of the two goals in the organization process definition and software product engineering KPAs was satisfied. The organization was therefore rated at Level 1 against the SW-CMM, version 1.1. Major recommendations for achieving CMM Levels 2 and 3 in a future appraisal included the following:

• Enhancing the collection, analysis, and planning for delivery of training

• Formalizing subcontract management procedures already in place

• Developing and deploying a process for formal peer reviews of work products

• Institutionalizing SQA reviews/audits

From a Productivity Assessment Perspective

Data was collected from all of the projects in the sample. The assessment team used the collected data to generate comparisons for six basic software metrics (see Table 1). The comparisons for each project were made to projects of the same type, size, and complexity.7

Table 1 — Productivity Metrics at the Portfolio Level

Time to market: Slower than average
Productivity: More productive than average
Project staffing: Projects use more staff than average
Defect removal efficiency: Below average
Delivered defects: Higher than average
Project documentation: Less documentation produced than average

4 The initial combination assessment was done for a client with three development locations (two major and one minor).

5 Data collection was coordinated through daily joint team meetings.

6 Further refinement of the methodology will reduce the amount of independent data collection.

7 Comparisons are made based on available data, which typically includes industry data and internal organizational data, if available.


For example, we would compare a 500-function-point client-server project to a statistical subset of the ISBSG (International Software Benchmarking Standards Group) database, or other industry data, made up of 500-function-point client-server projects. Therefore, deviation was measured from a relative zero point.

The portfolio-level metrics profile reflects the results gathered against the SW-CMM. Organizations that actively manage with the assistance of metrics require the process discipline engendered in the project management areas (i.e., the software project planning and project tracking and oversight KPAs) to attain maximum effectiveness from the metrics.8 Some observers have noted that there is a propensity to trade off between time to market, project staffing profiles, and productivity. It appears that if an organization that is beginning to develop process discipline isolates its focus on any one of these variables, it takes its eye off the others. Organizations at Level 1 or 2 tend not to have the process artifacts and discipline required for optimizing all three variables.

JOINT ANALYSIS AND RECOMMENDATIONS

The primary goal of the joint assessment was to develop and prioritize improvement goals based on both the SW-CMM and productivity assessment frameworks. The prioritization was based on metrics deemed important by the sponsor and the organization's management. Process improvement recommendations were formulated after joint analysis of all of the assessment results.

Using the mapped relationship between the SW-CMM and the productivity assessment framework, I generated the impact of the improvement recommendations. Table 2 shows an example of one of the specific recommendations. Each recommendation included the forecasted quantitative benefits of its implementation.

Each of the specific recommendations identified the CMM KPAs that would be strengthened and provided a quantified estimate of the impact on productivity, time to market, delivered defects, total defects, and maintenance. The factors used to show change were selected based on feedback from the sponsor and the goals of the client organization.


Table 2 — Recommendation Example

Improvement: Methodical review of deliverables by author and peers
• Define peer review process
• Identify deliverables to be reviewed
• Collect and use defect data
• Review and audit process

Forecasted impact:
Productivity improvement: 8%-11%
Time-to-market reduction: 3%-5%
Delivered defect reduction: 12%-16%
Total defects reduction: 7%-9%
Maintenance reduction: 6%-8%

Key process areas impacted:
• Peer reviews
• Software product engineering
• Organization process definition
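The arithmetic behind such a forecast is simple to reproduce. The sketch below is a hypothetical illustration rather than the author's actual model: it applies the percentage ranges from Table 2 to a set of assumed baseline values (the baseline numbers and names are not from the article) to produce forecast ranges.

```python
# Illustrative sketch only: applies the improvement ranges from Table 2 to
# assumed baseline figures. Baseline values and names are hypothetical.

def apply_range(baseline, low_pct, high_pct, reduction=False):
    """Return the (low, high) forecast after a percentage change.

    Reductions (defects, time to market, maintenance) shrink the value;
    productivity improvements grow it.
    """
    sign = -1 if reduction else 1
    return (baseline * (1 + sign * low_pct / 100.0),
            baseline * (1 + sign * high_pct / 100.0))

# Assumed baseline values for a hypothetical portfolio.
baseline = {
    "productivity_fp_per_staff_month": 8.0,
    "time_to_market_months": 6.0,
    "delivered_defects_per_100_fp": 4.0,
}

forecast = {
    "productivity_fp_per_staff_month":
        apply_range(baseline["productivity_fp_per_staff_month"], 8, 11),
    "time_to_market_months":
        apply_range(baseline["time_to_market_months"], 3, 5, reduction=True),
    "delivered_defects_per_100_fp":
        apply_range(baseline["delivered_defects_per_100_fp"], 12, 16, reduction=True),
}

for metric, (low, high) in forecast.items():
    print(f"{metric}: {low:.2f} to {high:.2f}")
```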

8 It should be noted that if an organization waits until it has attained the proper level of discipline before using metrics for management, it will typically be too late. I recommend measuring early and letting the resulting demand for the metrics pull the process forward.


The recommendations that came out of the joint assessment ranged from specific changes in the client's development methodology all the way up to and including KPA-level process implementation (e.g., peer reviews). The steps required to achieve CMM Levels 2 and 3 reflect large increases in productivity and reductions in time to market. The integrated recommendations gave the client critical information for supporting effective allocation of scarce resources for future improvement activities.

CONCLUSION

The joint CMM and productivity assessment was successful both in its intended design and in its outcomes. In measuring the capability and productivity of the organization, the assessments found that the organization was at CMM Level 1 with overall above-average productivity. The individual CMM and productivity results, as well as the joint recommendations, proved useful to the organization by focusing future process improvement efforts.

This assessment represented a continued evolution in coupling process maturity and productivity assessments. The combined assessment significantly reduced the time and effort impact on the client organization that would have resulted from separate events. One impact reduction was the ability to combine assessment preparations. Also, combining the models and methods reduced the assessment team's time on-site from three weeks to two weeks. Finally, and most importantly, the combined assessment allowed the organization to develop integrated recommendations based on joint coverage, which would have been difficult to develop from separate events.

The following improvements to the methodology and its implementation are under development:

• Continued refinement of the mapping of organizational effectiveness attributes and their "levels" (leading edge, above average, average, below average, deficient) to the SW-CMM key practices (and CMMI process areas) in order to provide a stronger basis for performing joint data collection and consolidation

• A more effective correlation between CMM techniques and productivity data collection processes, plus additional work aids (e.g., scripted questions, detailed interview planning, and well-defined requirements for the organizational documentation to be reviewed) to enhance the ability to perform joint data collection

• Deepening the level of team member expertise in both CMM and productivity assessment methods so as to permit a single team to perform the combined assessment

• Introduction of interim reviews to facilitate early discussion of findings and support identification of trends or global issues, to improve data capture and contribute to more efficient changes in data collection strategies, schedule, and focus during the combined assessment

Thomas Cagley is a Managing Senior Consultant for The David Consulting Group. He is an authority in guiding organizations through the process of integrating software measurement with model-based assessments to yield effective and efficient process improvement programs. Mr. Cagley is a recognized industry expert in the measurement and estimation of software projects. His areas of expertise encompass management experience in methods and metrics, quality integration, quality assurance, and the application of the Software Engineering Institute's Capability Maturity Model to achieve process improvements.

Mr. Cagley has managed many types of projects within the IT field, including large-scale software development, conversion, and organizational transformation projects. Based on his expertise, Mr. Cagley managed the development of an internal project management certification program for software project managers for a major bank holding company. He has also managed and performed quality assurance (technical and process) for large IT organizations. He is a frequent speaker at metrics, quality, and project management conferences.

Mr. Cagley can be reached at E-mail: [email protected].


Hitting the Sweet Spot: Metrics Success at AT&T

by John Cirone, Patricia Hinerman, and Patrick Rhodes

There is ample literature supporting the value of software measurement — and the relative immaturity of the software development industry, implying unfledged software measurement models. However, the attention given to understanding software productivity and quality is encouraging. A software measurement program can provide a useful and proactive management tool that combines quantitative and qualitative components [1]. In addition, not only does a metrics program provide a great deal of insight into the software development and maintenance process, but as Raymond J. Offen and Ross Jeffery observe, it reveals the intersection of technical and market imperatives and the tensions between them [6].

Shari Lawrence Pfleeger's analysis suggests that two out of three metrics initiatives do not last beyond a second year [7]. Karl Wiegers further notes that up to 80% of software metrics initiatives fail [8]. He suggests the following reasons:

• Lack of management commitment

• Measuring too much, too soon

• Measuring too little, too late

• Measuring the wrong things

• Having imprecise metrics definitions

• Using metrics data to evaluate individuals

• Using metrics to motivate rather than to understand

• Collecting unused data

• Lacking training and communication

• Misinterpreting the metrics data

Creating a reasonable, phased project with clear definitions and goals can prevent many of these problems. However, if there is no management commitment, or misinformation is communicated, it will be difficult to establish a successful program.

IT performance measures, which have been used in the past to show productivity and quality performance results, must now evolve into more sophisticated, business-oriented measures, and CIOs will be required to demonstrate their contribution to the business using this new form of measurement [1]. Traditional metrics programs measure individual projects, a factor that is integral to a development manager's project planning and execution. However, Anandasivam Gopal and his coauthors suggest that a program's success depends on senior management [2]. As we explain the value of a metrics program to management and financial sponsors, it is difficult to ask for support without a direct incentive in the form of quantitative information to manage the business.

BUILDING A SOLID FOUNDATION AT AT&T

Looking back to its inception in April 1999, our organization's measurement program was based on some key factors:

• Incremental implementation

• Developer participation

• Separate and dedicated program managers

• Regularly scheduled feedback (quarterly scorecard reviews)

• Automated data collection

• Training

• Support of the project champion's goals

Case studies agree that the key factors found in our approach are vital for success [3]. Wiegers would also say that the way to create a successful measurement culture is to start small, explain why, share the data, define the data clearly, and understand data trends.



Choosing Function Points

One of the fundamental decisions that we made early in the program was to use function points to measure the size of our applications in order to derive productivity, cost, cycle time, and quality measures. Although there are many measurement models and methodologies to quantify software size, we chose function points as our unit of measure and applied them consistently across the organization.

Function point counting measures software development from the customer's point of view, quantifying the functionality provided to the user based primarily on logical design. This definition is in agreement with the International Function Point Users Group's Function Point Counting Practices Manual, Release 4.1 [4]. Our decision to use function points is supported by metrics experts David Garmus, David Herron, and Capers Jones, who contend that function points, although not perfect, are the best choice for studying software economics and quality [1, 5].

Getting Outside Help

The second fundamental decision was to use an outside vendor to count our function points. This enabled us to share some of the risk for program success, alleviate bias in counting practices, and have an experienced skill set readily available. The main objectives of our program were significant, and we realized early that it would take time and practice to create a program that would produce useful and accurate measures for charting the company's direction. Our objectives were to:

• Create and maintain a management tool to measure software development and maintenance independent of the technology used for implementation

• Measure functionality that the user requests and receives

• Develop repeatable and measurable performance profiles

• Improve the metrics data collection process

• Establish a set of measures that support improvement initiatives (productivity and quality)

• Identify measures that support business strategies

• Expand to help estimate and choose projects as well as manage contracts

• Align with supporting programs such as release planning, project management, our standard development framework, and software quality assurance

Doing Time Reporting Right

We learned quickly that a well-defined and reasonable time-reporting process was essential once we had accurate function point counts. We found that if we did not have a time-reporting process in place, or maturing in parallel with our function point counting program, function points would be less valuable.

In order to understand our productivity and costs, we needed to categorize work effort to apply to the function points, the unit in which effort was being measured. We spent time defining the work activity, which we categorized as new development and maintenance. Next, we created projects, which combined like applications together by customer and category (warehouse, transactional internally developed, transactional vendor developed), and limited the number of projects so individuals would be more likely to report to the correct project. Each project had the agreed-upon task structure below it for time-reporting purposes. We tracked releases (enhancement projects) by quarter and built quarterly development tracking into the task structure.

Taking a Phased Approach

We took a phased approach and assigned a central project management group to track process, communicate definitions, provide status, and support 100% time reporting. This central group prioritized applications, completed a baseline count for the 42% of applications identified as core applications, and performed estimated counts for the remainder.


We chose not to use backfiring (a computed function point value based on the total number of lines of code and the complexity level of the programming language) because of its reported inaccuracy [1].

Each quarter, we counted releases for the baselined applications, and we replaced all estimated counts with actual counts by the end of 2000. We set goals to replace estimates with actual counts and prioritized releases (we counted medium and large releases). In an effort to manage to a budget, we combined small releases over quarters and counted them collectively while merging the effort expended across quarters to create them.

Benchmarking Our Performance

Once we felt that we were providing information to count function points accurately and reporting time appropriately, we became comfortable with the results that were being reported on our quarterly scorecards. Our next question was how we were doing as compared to other development shops. We began to look for benchmark data, which helped serve two important purposes. It facilitated inspection of our measures for accuracy, and it gave us the information we needed to set goals for achieving above our industry's benchmark.

One of the most tedious tasks was to align our data definitions with those of the benchmark data so that we would actually have comparative data. The benchmark data was organized by the following categories: warehouse, transactional internally developed, and transactional vendor developed; and function point ranges: S, M, L, and XL. We used other variables as well — operating system, programming languages, and so on — to get the best comparisons. Our benchmark data (specific to our portfolio) identified a "sweet spot" to aim for. For us, this translated to quarterly releases for cycle time and release sizes of 76-175 function points.

Keeping It Simple

We tried to keep the program as simple and nonintrusive to the development and maintenance process as possible. We used data that already existed, and where we needed to collect new data, we automated and simplified collection as much as we could.

Gopal et al.'s study supports the need to (1) keep the technical environment "user-friendly" and (2) engage stakeholders (i.e., developers, managers, and senior leadership) in order to have a successful metrics program. Their study shows a positive and significant interdependence between these two factors, which contributed to metrics program performance. They further state that a successful metrics program is one that results in an increase of metrics use in decision making and has a positive impact on organizational performance [2].

Leveraging the Data

We began to see that the data we collected and analyzed became very useful not just for application-specific decisions but for decisions that would impact the entire development shop. However, the latter required us to arrive at an aggregated view of the work being done while still being able to drill down into the details to explain trends and anomalies. As we experimented to come up with an aggregated view and then agreed upon a method, we soon found that the data helped us plan and manage human resources more effectively. We reviewed our time reporting on a regular basis and began to understand our development-to-maintenance ratio as compared to our plan. As we started new projects, this data helped us see how to distribute resources more effectively across the entire development shop.

Data analysis was also influential in creating processes to support hitting the "sweet spot" (for both release size and cycle time) where we can be most competitive. We have created recommendations for a quarterly release schedule, fine-tuned our time reporting and estimating in a centralized manner across the development shop, and aligned our tools with the processes we have developed.

Finally, we are now able to explain the value we provide in a measurable way. One of the most marketable outputs from this program is the ability to articulate our value and to react quickly to information requests, with supporting details.

WAYS TO MEASURE PRODUCTIVITY — AND WAYS NOT TO

For this article, we will focus on the labor productivity metric for new software development. The measure is a quantification of the number of function points of new software functionality that one software developer can develop in one month's time. This productivity metric is called "function points per staff-month" and is calculated as the number of function points developed for a release divided by the person-months of labor expended. The same approach can be used for the development cycle time and software maintenance metrics as well as new software development productivity.

Table 1 contains sample data for two successive quarters. Each row in the quarter represents newly developed software that was instituted into the production system in that quarter (although some of the labor associated with the release may have been expended in prior quarters). For the purposes of this article, we assume all of the systems in this universe can be categorized as either a transactional system or a data warehouse system (indicated in column B). Column C is the size of the release stated in function points. Column D is the staff-months of labor expended to develop each release. Column E is the productivity metric "function points per staff-month," calculated by dividing column C by column D. An aggregate productivity total can also be correctly calculated by dividing the total in column C by the total in column D.

Table 2 contains sample external benchmark data, which was derived from a software benchmarking firm after a study of the applications being benchmarked. In this case, the benchmark for a given release is determined through a combination of the type of application (column A) and the release size in function points (column B).

Pairing the quarterly release data in Table 1 with the benchmark data in Table 2 yields the benchmark comparison in Table 3. Column C is our achieved productivity calculated in Table 1. Column D is the benchmark productivity determined by looking up the application type and release size in the benchmark data table (shown in Table 2).

This comparison is great for measuring individual releases.


Table 1 — Quarterly Software Release Data

Quarter #1
Release # (A)  Type of Application (B)  Release Size (C)  Labor in Staff-Months (D)  Productivity (E)
1              Transactional            10                1.2                        8.33
2              Transactional            175               17                         10.29
3              Data warehouse           15                3                          5.00
4              Data warehouse           75                9                          8.33
5              Transactional            100               7.5                        13.33
Quarter #1 Total                        375               37.7                       9.95

Quarter #2
1              Data warehouse           90                14                         6.43
2              Data warehouse           100               12                         8.33
3              Transactional            80                8                          10.00
4              Data warehouse           75                9                          8.33
5              Data warehouse           10                2                          5.00
6              Transactional            60                4.5                        13.33
Quarter #2 Total                        415               49.5                       8.38
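Because the metric is just a ratio, it is easy to script. The snippet below is a minimal sketch (not the authors' tooling) that reproduces the per-release and aggregate figures in Table 1 from data typed in directly.

```python
# Minimal sketch: reproduces the productivity figures in Table 1.
# Each tuple is (application type, release size in FP, labor in staff-months).

quarter1 = [
    ("Transactional", 10, 1.2),
    ("Transactional", 175, 17),
    ("Data warehouse", 15, 3),
    ("Data warehouse", 75, 9),
    ("Transactional", 100, 7.5),
]

quarter2 = [
    ("Data warehouse", 90, 14),
    ("Data warehouse", 100, 12),
    ("Transactional", 80, 8),
    ("Data warehouse", 75, 9),
    ("Data warehouse", 10, 2),
    ("Transactional", 60, 4.5),
]

def productivity(fp, staff_months):
    """Function points per staff-month for a single release."""
    return fp / staff_months

def aggregate_productivity(releases):
    """Total FP divided by total labor, matching the column totals in Table 1."""
    total_fp = sum(fp for _, fp, _ in releases)
    total_labor = sum(labor for _, _, labor in releases)
    return total_fp / total_labor

for name, releases in (("Quarter #1", quarter1), ("Quarter #2", quarter2)):
    for app_type, fp, labor in releases:
        print(f"{name} {app_type}: {productivity(fp, labor):.2f} FP/staff-month")
    print(f"{name} aggregate: {aggregate_productivity(releases):.2f}")  # 9.95 and 8.38
```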

Table 2 — Function Point Development Productivity Benchmark Data

Type of Application (A)  Release Size in Function Points (B)  Benchmark Productivity* (C)
Transactional            1-50                                 10
Transactional            51-150                               12.5
Transactional            > 150                                11
Data warehouse           1-50                                 6
Data warehouse           51-150                               7.5
Data warehouse           > 150                                7

*Productivity expressed as new function points developed per staff-month of labor.
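A lookup like Table 2 reduces to a small function. The version below is a sketch that assumes the size bands shown in the table (1-50, 51-150, greater than 150); it is illustrative only, not a vendor API.

```python
# Sketch of the Table 2 lookup: benchmark productivity by application type
# and release size band. Band boundaries mirror the table above.

BENCHMARKS = {
    "Transactional": [(50, 10.0), (150, 12.5), (float("inf"), 11.0)],
    "Data warehouse": [(50, 6.0), (150, 7.5), (float("inf"), 7.0)],
}

def benchmark_productivity(app_type, release_size_fp):
    """Return the benchmark FP per staff-month for a release of this type and size."""
    for upper_bound, productivity in BENCHMARKS[app_type]:
        if release_size_fp <= upper_bound:
            return productivity
    raise ValueError("unreachable: the last band is open-ended")

# Example: the 175-FP transactional release in quarter #1 falls in the > 150 band.
print(benchmark_productivity("Transactional", 175))   # 11.0
print(benchmark_productivity("Data warehouse", 75))   # 7.5
```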


Detailed examination and root cause analysis of releases that exceeded or missed the benchmark can lead to great process improvements, which ultimately yield consistently higher productivity in future releases. This is the traditional value of the function point metric, and, in our opinion, it continues to be its greatest value.

The problem lies in using this data to calculate some sort of overall benchmarked productivity measure that can be compared from quarter to quarter. This would be useful for measuring overall trends in development productivity and for executive-level benchmarking presentations. Below we examine (and discount) some simple approaches to creating such a measure.

Trend the Actual Productivity

This approach would not use any external benchmark data; we would simply trend our actual aggregate productivity. In our sample data, using the quarterly productivity totals from column E in Table 1, we would say that productivity in aggregate declined from 9.95 in quarter #1 to 8.38 in quarter #2, a drop of 15.8%.

This is a flawed approach because the benchmark data tells us that enhancements on decision support (data warehouse) applications are inherently less productive than enhancements for transactional applications. Also, productivity varies by the size of the release. This approach ignores those factors entirely. In an even modestly complicated development shop, quarterly productivity measures can vary widely from quarter to quarter depending on the type of applications enhanced as well as the size of the enhancement.

Count the Winners and Losers

This approach expresses the number of releases exceeding the benchmark as a percentage of the total number of releases. In our sample data, for quarter #1, two of five releases (40%) exceeded the benchmark. In quarter #2, three of six releases (50%) exceeded the benchmark, an improvement of 10 percentage points.

This is a valid metric and one we use every quarter. It is useful for measuring how often a development group is achieving benchmark. Its use is limited, however, because it weights all releases equally, regardless of how big the release was.
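Expressed as code, this measure is a simple ratio over the per-release comparison. The sketch below uses the actual and benchmark productivity pairs from Table 3 and is illustrative only.

```python
# Sketch: share of releases beating their benchmark, using the
# (actual, benchmark) productivity pairs from Table 3.

quarter1_pairs = [(8.33, 10), (10.29, 11), (5.00, 6), (8.33, 7.5), (13.33, 12.5)]
quarter2_pairs = [(6.43, 7), (8.33, 7.5), (10.00, 11), (8.33, 7.5), (5.00, 6), (13.33, 12.5)]

def percent_exceeding(pairs):
    """Percentage of releases whose actual productivity exceeds the benchmark."""
    winners = sum(1 for actual, benchmark in pairs if actual > benchmark)
    return 100.0 * winners / len(pairs)

print(percent_exceeding(quarter1_pairs))  # 40.0
print(percent_exceeding(quarter2_pairs))  # 50.0
```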


Table 3 — Actual Versus Benchmark Productivity

Quarter #1
Type of Application (B)  Actual Productivity (C)  Benchmark Productivity (D)  Comparison to Benchmark (E)
Transactional            8.33                     10                          Below
Transactional            10.29                    11                          Below
Data warehouse           5.00                     6                           Below
Data warehouse           8.33                     7.5                         Above
Transactional            13.33                    12.5                        Above
Total                    9.95                     9.40

Quarter #2
Data warehouse           6.43                     7                           Below
Data warehouse           8.33                     7.5                         Above
Transactional            10.00                    11                          Below
Data warehouse           8.33                     7.5                         Above
Data warehouse           5.00                     6                           Below
Transactional            13.33                    12.5                        Above
Total                    8.38                     8.58                        Below

Table 4 — Averaging the Benchmark Data

            Actual Productivity  Benchmark Productivity  Actual as % of Benchmark
Quarter #1  9.95                 9.40                    105.9%
Quarter #2  8.38                 8.58                    97.7%


Average the Benchmark Data

In this approach we would compare our aggregate productivity to the average of the benchmark productivity (see Table 4). In our example, we would average the benchmark data in column D of Table 3, giving us 9.40 for quarter #1 and 8.58 for quarter #2. Using this method, we could create quarter-over-quarter comparability by expressing actual productivity as a percentage of average benchmark productivity.

This method would tell us that our aggregate productivity dropped from 105.9% of benchmark to 97.7% quarter over quarter, a 7.7% drop in productivity.

This approach is better, but the simple averaging of the benchmark for each enhancement again weights them all the same, regardless of how big they were. Consider a quarter with two releases, a huge one (1,000+ function points) that beat the benchmark by a large margin and a tiny one (1 function point) that missed the benchmark by an equally large margin. This simple average calculation would indicate that we hit the benchmark exactly, when intuitively it felt like a very productive quarter.

OUR APPROACH

Over time we have refined the benchmark averaging method described above to weight the aggregate benchmark average by the size of the releases. Conceptually, we calculate the labor it would have taken if we had performed exactly at benchmark and then calculate aggregate benchmark productivity from that number. This is explained in more detail below.

The formula for our benchmark is Productivity = Release Size / Labor. Using simple algebra, we infer that Labor = Release Size / Productivity. If we want to know the labor required to perform exactly at benchmark, we can further refine the formula as Benchmark Labor = Release Size / Benchmark Productivity. In Table 5, we calculated this theoretical benchmark labor, shown in column E, by dividing column C by column D. We can total column E to get a grand total of the labor it would have taken if every enhancement had been performed exactly at benchmark.

To get the weighted average aggregate benchmark, we divide the total function points in all enhancements by the total labor at benchmark. In Table 5, this is the total in column C divided by the total in column E:

Quarter #1 Weighted Average Aggregate Benchmark = 375/37.4 = 10.02

Quarter #2 Weighted Average Aggregate Benchmark = 415/49.9 = 8.32


Table 5 — Weighted Average Aggregate Benchmark

Quarter #1
Release # (A)  Type of Application (B)  Release Size (C)  Benchmark Productivity (D)  Labor Required to Achieve Benchmark (E)
1              Transactional            10                10                          1.0
2              Transactional            175               11                          15.9
3              Data warehouse           15                6                           2.5
4              Data warehouse           75                7.5                         10.0
5              Transactional            100               12.5                        8.0
Quarter #1 Total                        375                                           37.4

Quarter #2
1              Data warehouse           90                7                           12.9
2              Data warehouse           100               7.5                         13.3
3              Transactional            80                11                          7.3
4              Data warehouse           75                7.5                         10.0
5              Data warehouse           10                6                           1.7
6              Transactional            60                12.5                        4.8
Quarter #2 Total                        415                                           49.9
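The weighted calculation falls directly out of the formula Benchmark Labor = Release Size / Benchmark Productivity. The sketch below is illustrative (not the authors' spreadsheet); it recomputes the Table 5 totals and the Table 6 ratios from unrounded values, so the last digits differ slightly from the article, which works from figures rounded to one or two decimals.

```python
# Sketch: weighted average aggregate benchmark, per the formula in the text.
# Each tuple is (release size in FP, actual labor, benchmark productivity),
# transcribed from Tables 1 and 5.

quarter1 = [(10, 1.2, 10), (175, 17, 11), (15, 3, 6), (75, 9, 7.5), (100, 7.5, 12.5)]
quarter2 = [(90, 14, 7), (100, 12, 7.5), (80, 8, 11), (75, 9, 7.5), (10, 2, 6), (60, 4.5, 12.5)]

def weighted_comparison(releases):
    total_fp = sum(size for size, _, _ in releases)
    actual_labor = sum(labor for _, labor, _ in releases)
    # Labor each release would have needed at exactly the benchmark rate.
    benchmark_labor = sum(size / bench for size, _, bench in releases)
    actual = total_fp / actual_labor
    weighted_benchmark = total_fp / benchmark_labor
    return actual, weighted_benchmark, 100.0 * actual / weighted_benchmark

for name, releases in (("Quarter #1", quarter1), ("Quarter #2", quarter2)):
    actual, bench, pct = weighted_comparison(releases)
    print(f"{name}: actual {actual:.2f}, weighted benchmark {bench:.2f}, {pct:.1f}% of benchmark")

# Prints roughly:
#   Quarter #1: actual 9.95, weighted benchmark 10.02, 99.2% of benchmark
#   Quarter #2: actual 8.38, weighted benchmark 8.31, 100.9% of benchmark
# Table 6 reports 99.3% and 100.7% because it divides the already-rounded values.
```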


Now we can express our aggregate productivity as a percentage of the weighted average aggregate productivity to get a ratio we can compare (see Table 6).

Using this method, we actually trended our productivity upward by 1.4% quarter over quarter. Note that the weighted average indicates a trend that is increasing, while the simple average indicated that productivity had dropped (an outcome that further illustrates the frailties of the simple average).

When developing the aggregate benchmark, this method takes into account release size as well as application type and weights them appropriately. A trend plotted from this data is an extremely valuable tool for communicating overall productivity trends. We currently have a 12-quarter trend line that highlights steady improvements over time. This has proven to be a valuable addition to the metrics program as a high-level communication tool.

CONTINUOUS PROCESS IMPROVEMENT

During the early stages of implementing a measurement program, an organization should focus on validating the process at the fundamental level to ensure a sound base for the program. Key requirements in the initial phase of the program are consistent definitions for system applications, system documentation, and development and maintenance task code structures. We implemented various process improvements to enhance the accuracy of time-reporting data, clearly defined development and maintenance activities, and formally captured software release data for use in development release sizing. Comparison to industry benchmark data was completed at the detail level to ensure consistent application of development and maintenance task levels.

We compiled quarterly trend analyses for both development and maintenance productivity metrics over a sustained time period to assist in validating the accuracy of the metrics process. This gave us the opportunity to expand the analysis and reporting at an aggregate level for IT and executive management reporting.

Once the measurement program matured at the detailed level, the focus of process improvements shifted from validation of the data to internal analysis and aggregate comparisons to industry data. Our next phase is to expand the metrics process further back into the development lifecycle. Currently, all analysis addresses software productivity following the completion of the development lifecycle. We have investigated performing function point analysis of potential software releases prior to the system development phase, based on detailed user and system requirements.

Projecting the function point count of software releases at the requirements stage will assist in resource planning, improve overall cost estimates, more accurately determine software delivery schedules, and highlight additional functional changes to requirements that are inherent, but not often measured, within the development cycle (scope creep). This measure can be accomplished utilizing the same requirements documentation available at the end of the project and would not require the developers to compile additional documents. This data analysis would also highlight the potential stability or volatility of the business requirements presented in the initial development phase and would be an invaluable tool in managing user and executive management expectations for software delivery.

Our organization has also planned another key quality process improvement to more closely align defect tracking with the economic measurement process.


Table 6 — Aggregate Productivity as a Percentage of the Weighted Average Aggregate Productivity

            Actual Productivity  Weighted Benchmark Productivity  Actual as Percentage of Benchmark
Quarter #1  9.95                 10.02                            99.3%
Quarter #2  8.38                 8.32                             100.7%


Going forward, the development shop intends to track the relationship between release sizes, associated productivity measures, cycle time, and software defects encountered in production as part of its overall analysis.

At the detailed level, we have utilized industry benchmark data to compare software delivery for individual application software releases, and we have tracked aggregate productivity against the aggregate benchmark data for the total development shop. This aggregate benchmark represents an industry average, and while this has served us well, we have matured past comparing ourselves to the average. Our next goal is to determine a first-quartile (top 25%) benchmark and our performance in comparison to external high-performing IT organizations.

SUMMARY

Our program utilizes the precision of a traditional software measurement program to track diverse application-level productivity through function points, and it summarizes the information to produce an aggregate measure of the overall effectiveness of the development shop against an aggregate composite of industry benchmarks. The comparison is an effective tool for demonstrating IT productivity and value to both IT and executive-level management.

In developing the program, we viewed the implementation as a phased approach and focused our initial investment on the accuracy and validity of the captured data. As with any long-term IT initiative in the current corporate climate, demonstrated short-term success is significant in gaining continued commitment from the organization. For the program to be sustainable over time, it must have commitment from the developers, be flexible enough at the detail level to adapt to inevitable organizational changes, and continue to mature through process improvements and continual leadership vision and sponsorship.

REFERENCES

1. Garmus, D., and D. Herron. Function Point Analysis. Addison-Wesley, 2000.

2. Gopal, A., M.S. Krishnan, T. Mukhopadhyay, and D.R. Goldenson. "Measurement Programs in Software Development: Determinants of Success." IEEE Transactions on Software Engineering, Vol. 28, No. 9 (September 2002), pp. 863-875.

3. Hall, T., and N. Fenton. "Implementing Effective Software Metrics Programs." IEEE Software (March/April 1997), pp. 55-64.

4. International Function Point Users Group. Function Point Counting Practices Manual, Release 4.1. IFPUG Standards, 1999.

5. Jones, C. Applied Software Measurement. McGraw-Hill, 1996.

6. Offen, R.J., and R. Jeffery. "Establishing Software Measurement Programs." IEEE Software (March/April 1997), pp. 45-53.

7. Pfleeger, S.L. "Lessons Learned in Building a Corporate Metrics Program." IEEE Software (May 1993), pp. 67-74.

8. Wiegers, K. "10 Traps to Avoid." Software Development, Vol. 5, No. 10 (October 1997), pp. 49-53.

John Cirone received his BS in computer science from Kean University and an MS in technology management from Stevens Institute of Technology. He currently leads the IT organization responsible for all of the finance and human resource systems at AT&T. The metrics program discussed in this article was instituted for Mr. Cirone's organization.

Mr. Cirone can be reached at AT&T, One AT&T Way, Room 2B121, Bedminster, NJ 07921-0752, USA. E-mail: [email protected].

Patricia Hinerman received her BS in neuroscience from the University of Scranton and an MBA from Rutgers University. She is currently a member of AT&T's technical staff and has seven years of experience managing requirements and developing software applications. Ms. Hinerman was one of the principals in the development and maintenance of the metrics program discussed in the article.

Ms. Hinerman can be reached at AT&T, 30 Knightsbridge Road, Room 52G19, Piscataway, NJ 08854-3913, USA. E-mail: [email protected].

Patrick Rhodes received his BA in political science and English and an MBA from Rutgers University. He is currently a member of AT&T's technical staff and has 20 years of experience supporting all phases of system development within the AT&T financial systems. Mr. Rhodes was one of the principals responsible for the development and implementation of the metrics program discussed in the article.

Mr. Rhodes can be reached at AT&T, 30 Knightsbridge Road, Room 52A254, Piscataway, NJ 08854-3913, USA. E-mail: [email protected].


From Important to Vital: The Evolution of a Metrics Program from Internal to Outsourced Applications

by Barbara Beech

My journey in metrics began more than six years ago, when I was asked to put together metrics for a group within our IT development division. The group was composed of about 40 systems and 600 IT staff, and it collected very little, if any, metrics data. There was a mandate across the total development division to provide metrics data for review with our CIO.

I knew collecting metrics data was going to be a hard sell, especially with the technical staff. Technical staff typically feel that collecting all this data just gets in the way of "real work." Luckily, I had come from a similar organization and had faced this same task. This is the story of how I got started, what happened, and how this program evolved once we began to outsource our work. The adage "you can't manage what you don't measure" is really true. Unless a CIO has some measures in place for demonstrating both the cost and quality of systems, it is impossible to tell how well the organization is doing compared to the industry average and best in class.

Since I had the top-level support I needed, my first task was to assess the existing situation to see where the organization was. An enterprise metrics handbook and enterprise metrics repository that I had previously helped to develop would serve as the standards for what metrics we would collect and how we would report them.

Most articles I read say top-level support is critical to making a metrics program happen. This certainly is true, as it helps to motivate groups to begin the process of collecting metrics data. However, top-level support will only go so far. You need to get down to the details to make metrics collection and reporting a reality. Here are the steps we took:

• We developed a metrics program plan for the organization. This consisted of the metrics I wanted to collect and the process for collection.

• We developed an action plan to begin the collection of data. It is important to get upper management to realize that data collection does not happen overnight!

• We developed the software quality assurance (SQA) role within the organization and ensured that the SQAs worked for a central, independent group.

• We gave the SQAs responsibility for collecting metrics data for their projects.

• We established a metrics coordinator who was responsible for both scheduling function point counts across the applications and validating data.

• We coupled metrics collection with improvements the organization was making to reach Level 2 of the Software Engineering Institute's (SEI) Capability Maturity Model (CMM). This was very effective, since achieving Level 2 requires you to show that you are collecting metrics data.

• I began communicating the value of metrics at various division meetings. Sell, sell, and sell! This involved explaining how the data collected would help the groups identify how they were doing and help improve processes. I made sure the groups understood that the data would not be used against them.


• We identified the metrics that would be easiest to collect first. I felt it was important to start reporting something soon, even if it wasn't all the metrics that were planned.

• I worked with the groups that were the most receptive, which put peer pressure on the other groups to follow.

• We developed a scorecard at various levels (application, division, and all systems) and reviewed the scorecard at monthly meetings with the various groups.

• We began the process of using the collected metrics data for estimation purposes. This became a harder task than it seemed, because we couldn't use the metrics data directly without a profile of the project staff. Still, it showed a lot of promise and value for the CIO and the development division.

Figure 1 shows the action plan that we developed. It is important to note that it would take at least six months to get a good sample of data from across the organization, and it took longer to obtain a good base of data for some metrics than for others.

Figure 2 shows an example of the scorecard we developed. There were different versions within the organization at various levels, such as the division and application levels. There are some important points to note here. We measured quarterly progress as well as year-to-date progress. We also established objectives after reviewing our baseline data and compared our quarterly results to the monthly objectives. Our yearly objectives were based on percentage improvements from our baseline results. I also compared our results to best-in-class data in addition to our yearly objective.

THE CHALLENGES WE FACED

The Perception That Metrics Were More Work

To tackle this, I tried to make the collection of metrics data as transparent to the technical staff as possible. The SQAs were charged with the task of collecting metrics data for the projects they worked on. This worked quite well. Since the SQAs worked with the project groups, it was easier for them to collect the data and report it.

Using Function Points to Size the Work

There is an endless debate on the use of function points, one that needs to be put to bed early in a metrics program. I challenged the skeptics to give me a better size measure to use. To date, no one has provided anything better than function points. I know they have their limitations, but they are the best method the IT community has for sizing work.


Figure 1 — Our metrics action plan. For each metric, the plan set quarterly targets for the percentage of system releases providing metrics data (quarters through 1Q99 and 2Q99) and the action plans to attain them:

Productivity: targets 29%, 33%, 50%, 75%, 100%. Schedule function point counting and establish the metrics program and metrics collection process.
Defect removal efficiency: targets 29%, 33%, 35%, 50%, 75%, 100%. Increase the number of inspections and training; monitor inspection reports.
Delivered defect density: targets 29%, 33%, 50%, 60%, 75%, 100%. Establish the metrics program and metrics collection process; automate defect collection.
Overall defect density: targets 29%, 33%, 50%, 60%, 75%, 100%. Increase the number of inspections; publish inspection results; automate defect collection.
Labor cost/FP: targets 29%, 33%, 50%, 75%, 100%. Schedule function point counting and establish the metrics program and metrics collection process.
Development cycle time: targets 29%, 33%, 50%, 75%, 100%. Establish the metrics program and metrics collection process.
Cost variance: targets 29%, 33%, 50%, 75%, 100%. Establish the metrics program and metrics collection process; further implement the automated estimating tool.


So we hired an external consultant to baseline and count enhancement projects. We tried to take up the least amount of the project teams' time to do this. Having an external resource do the function point counting on an as-needed basis was very effective from a time and cost perspective.

Collecting Accurate Time Data

Most metrics require accurate time data to be collected on projects. You need to know the number of hours spent on specific projects to determine a project's cost per function point. Although we had an internal time-tracking system, ensuring that data was entered correctly and charged to the right project was quite a challenge. Our SQAs would review the time data for all projects to ensure it was correct. However, with an internal organization, time reporting is never going to be exact. As long as the organization was evaluated on the overall cost of all projects, ensuring that the time for all projects was correct was a difficult if not impossible task. Tracking overtime hours was also a challenge. So we did the best we could with the data while acknowledging its limitations.

Collecting Defect Data

If you want to know the quality of your systems, you need to collect the defects associated with your projects. This is probably the hardest data to begin to gather, and it takes the longest. If there is no defect data currently being collected, then it could take months to begin to collect this information. You need a tool to gather and record the defects associated with all application releases and their severity levels. It is best to standardize on a tool, but that depends on the environment you are working in: client-server or mainframe. So you might have several tools across your organization.

Furthermore, collecting the defects is not enough; you need to ensure that they are associated with a specific application release. The best measure to use here is delivered defect density. Looking at total defects from requirements to production is great, but very few organizations track defects well enough in the requirements stage to make this a very valid measure. Most organizations seem to collect data only on testing defects. To collect defects on up-front work (i.e., requirements/coding), you need to have a good inspection process in place.


Figure 2 — Our metrics scorecard. For each end-to-end metric, the scorecard showed 1Q99 and 2Q99 results, YTD progress, the 1999 objective, best in class, and a status compared with the 1999 objective:

Productivity (FP/effort-months): Red
Cost/function point: Green
Cost variance (actual:estimated): Green
Defect removal efficiency: Yellow
Delivered defect density: Red
Overall defect density: Green
Project cycle time: Green
Process compliance: Green

Status key: Red = major improvement needed compared with the 1999 objective. Yellow = some improvement needed compared with the 1999 objective (within 20%) or additional data needed. Green = as good as (within 2%) or better than the 1999 objective.

Page 31: IT Metrics and Benchmarking - Semantic Scholar · of IT metrics and benchmarking — some successfully and some not so successfully. IT has become a direct contributor to bottom-line

Get The Cutter Edge free: www.cutter.com

data on overall defect density, I

doubt we were actually capturing

all the defects throughout the entire

software development process.
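To illustrate the delivered defect density measure described above, here is a minimal sketch in Python; the defect records, severity scale, and release size are hypothetical, and a real implementation would pull this data from whatever defect-tracking tool the application uses.

# Minimal sketch: delivered defect density for one application release.
# Defect records, severity scale, and size are hypothetical illustrations.

def delivered_defect_density(defects, release_id, function_points, severity_cutoff=3):
    """Count production defects against a release, per function point.
    Severities run from 1 (most severe) up; only 1..severity_cutoff are counted."""
    delivered = [d for d in defects
                 if d["release"] == release_id
                 and d["found_in"] == "production"
                 and d["severity"] <= severity_cutoff]
    return len(delivered) / function_points

defects = [
    {"release": "3.2", "found_in": "production", "severity": 1},
    {"release": "3.2", "found_in": "system_test", "severity": 2},
    {"release": "3.2", "found_in": "production", "severity": 3},
]
print(delivered_defect_density(defects, "3.2", function_points=250))  # 2 / 250 = 0.008

A similar query, run over defects that surface in later releases, could feed the residual or latent defect density measure mentioned below.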

MAKING YOUR METRICS DATA USEFUL TO OTHERS

The following are some ways to make the metrics data you collect of value to others:

• Communicate the value of the data you are collecting and show that it can be used to improve processes. This is a tough sell to some groups, but it is easier with those groups that want to achieve SEI CMM Level 2 and improve their processes.

• Post data and send the metrics results out to the application development teams frequently.

• Try to tie metrics results to objectives. This is difficult, since many times getting a project out is more important than anything else and becomes the overwhelming objective.

• Couple metrics data with an estimating process. Use the data you collect to help you estimate your work more accurately.

• Translate metrics results into a real public relations opportunity for the CIO. If IT development costs and quality are within industry standards — to say nothing of best in class — this can go a long way toward demonstrating the effectiveness and efficiency of IT development within the company.

HOW OUTSOURCING AFFECTED OUR METRICS PROGRAMS

Two years into my work on the metrics program, senior management decided to outsource all the development work. Obviously, this changed things a lot. Suddenly, defining the right service levels for the outsource contract became critical, and metrics collection became very important!

What we found out:

• Outsourcing automatically engages all your systems in the collection of service-level data. What was a difficult sell prior to outsourcing becomes part of the contract, and all groups must participate whether they like it or not.

• We didn't feel that our internal data was robust enough to use as service-level agreement (SLA) targets, so we decided to rebaseline the data with the vendor.

• Time tracking becomes easier, since this is how the vendor is paid.

• Application development teams that weren't tracking metrics data before now need to get on the bandwagon.

• Quality measures are critical, and so is tracking defect data. Tracking defect data minimizes risk to the business when new releases are deployed.

• You need to measure not only delivered defect density but also residual or latent defect density to get the best picture of the application quality.

• Different measures need to be added besides cost and quality. We also need to measure our vendor on responsiveness to requests and timely delivery of enhancements.

• The vendor now owns all application work, which was previously split among various internal IT groups with competing interests. This means that measuring projects end to end has become easier.

• Validation of data is critical, since a vendor can be penalized if it misses service-level targets.

• Benchmarking has become very important, since we need to ensure that our outsource cost and quality are at least meeting industry averages.

• Establishing the right service levels at the beginning of the contract is vital.

What works better:

• The vendor is now responsible for the collection of all data and needs to provide information specified in the contract. Therefore, there is less fighting with internal organizations just to collect the data.

• Time tracking needs to be more exact than within our internal organization.

• Defect tracking is initiated for all applications, and standard tool sets are applied.

What is harder:

• There is less flexibility to change or add new metrics.

• Validation of data that the vendor is providing is a difficult task. This is more critical now, since we need to ensure that the vendor is reporting correct data.

• Root cause analysis processes for service-level misses need to be developed with much rigor in a multivendor environment. Something we did not do before as an internal organization now needs a lot of focus, since we need to know the reason for any service-level failures and what is being done to correct them.

The chart in Figure 3 lists the metrics we have begun collecting from our vendors in the following areas:

• Cost: critical financial drivers

• Quality: measures directly affecting internal AT&T customers

• Responsiveness: metrics ensuring that AT&T meets commitments to business partners

• Customer Satisfaction: qualitative measurement of the success of the partnership

WHAT WE HAVE LEARNED

• Metrics are essential to any software development project or program, whether it is inhouse or outsourced.

• Even though I sometimes see metrics programs disappear, they always come back in some shape or form.

• You can't know where you are as an IT development organization if you don't measure quality and cost.

• Standard definition of metrics is important.

• Collection of metrics data is hard and takes time.

• Knowing where you stand in relation to the rest of the industry is important for setting improvement goals.

• There is no point in getting hung up on function points. They are just a size measure, and the best one I have found to date across various platforms.

• Metrics collection should be coupled with process improvement activities.

• The data must be believable. If no one believes that the data you are reporting is accurate, they will not take the metrics you report seriously, and your metrics program will soon lose steam.

Barbara Beech is a District Manager at AT&T in the Consumer CIO Vendor Management Division. She has worked at AT&T for 19 years in the area of software development. During that time, she was involved in the development of new systems supporting both business and consumer services. For the past seven years, her focus has been on process and metrics. She has worked to establish a balanced scorecard, helped application teams achieve CMM Level 2, and supported the definition of service levels for outsourcing initiatives.

Ms. Beech can be reached at AT&T, 30 Knightsbridge Road, Room 53C338, Piscataway, NJ 08854, USA. Tel: +1 732 457 3715; E-mail: [email protected].


Figure 3 — Metrics we collect from our vendors.

Cost:
• Cost per function point for enhancements
• Estimate accuracy (committed versus delivered)
• Estimate accuracy (original versus committed)

Responsiveness:
• Project estimates within commitment time frames
• Enhancements within commitment time frames
• Ad hoc requests within commitment time frames
• Enhancement cycle time

Quality:
• Delivered defect density
• Residual defect density
• System availability (IUMs)
• Number of production application defects
• Production application defects closed within commitment time frames
• End-to-end user response time for critical systems
• Business process metrics

Customer Satisfaction:
• Customer satisfaction survey

Key Development Initiatives:
• Critical deliverables
• Key deliverables
• Key development initiatives must be delivered with the specified functionality, schedule, and quality (production defects)

Benchmarking for the Rest of Us

by Jim Brosseau

To be effective, businesses of all sizes need to understand their own performance. While large, established organizations will typically have a solid infrastructure and a constant finger on their pulse, smaller, growing companies often struggle with fundamental issues. This is highly prevalent in technology organizations, where an entrepreneur with the Next Great Idea suddenly finds that organizational issues are consuming more and more of his or her precious time. It is wise for any organization to have a clear understanding of how well it is performing, either as an impetus to improve or as a basis for understanding how much work it should reasonably take on in the future.

What is the best structure for our organization? How productive should we expect to be? What should we be paying our staff? These and many other questions need to be resolved for small, growing companies to get beyond their nontechnical hurdles to success.

With all these questions to answer and so little time, there is often a rush to quickly find "the solution," whether a general solution really exists for the industry as a whole or not. Among the frequently asked questions are the following:

What is the appropriate ratio of software testers to developers? Companies want to use this number to mold the structure of the development organization, but there is no right answer here. I have worked on significant projects in which the developers successfully performed the bulk of the testing of the system, and I have worked with teams where dedicated testers outnumbered developers almost 2:1 and still could not keep up with the issues that were cropping up.

How productive should I expect my team to be (given a variety of factors)? There is clear benchmarking data available indicating average ratios of function points to lines of code and productivity in terms of lines of code per day, given the type and criticality of the application being developed. A great many data points lie behind the scenes to make the information statistically relevant, usually with extremely wide variations. This variability rarely makes it to the surface of the data presented, but it is a strong indicator that your mileage may vary — and probably by a large amount, even within your team.

How should I compensate my staff? Industry salary reports have dropped from their staggering heights of a few years ago to reflect the changing times. Still, there is significant geographical variation to consider.

For a variety of reasons, many companies turn to externally generated benchmarking data to provide the answers they need. Unfortunately, there is a dark side to the quick and sometimes blind use of external information. Use of benchmarking data needs to be carefully tempered if it is to provide value for organizational improvement.

BENCHMARKING DATA IS ALLURING

For better or worse, most organizations refer to industry benchmarking data as a means of gauging their performance. Like most people in the industry, I've done my fair share of quoting statistics from the Standish Group's 1994 CHAOS Report [6] and used the quarterly reports from the Software Engineering Institute (SEI) to describe industry performance in discussions with clients. A number of people and organizations have collected and disseminated a great deal of benchmarking data over the years, including the guest editor of this issue, David Garmus.

Collected benchmarking data is relatively easy to obtain as, for the most part, it is readily available, if for a price. It generally comes from well established, reputable sources, either published in books or trade journals or available for purchase from a number of organizations worldwide. It is usually well organized and indexed in a manner that will allow you to quickly arrive at the information you are looking for. Using data from reputable sources will help you to back up your assertions and can make your arguments much more compelling and defendable. It can be an indication that you have "done your homework."

At times, however, the allure of benchmarking data comes from its external sterility. The data provided is based on other people's performance, and it may provide a sanitized look at what the industry is doing. For some organizations, it can become a game to blithely quote industry performance figures while avoiding internal measurement, knowing that the truth can be a bitter pill to swallow.

BENCHMARKING DATA: CAVEAT EMPTOR

He uses statistics as a drunken man uses lampposts — for support rather than illumination.
— Andrew Lang

Imagine a situation in which you decide when it is best to leave for work in the morning by observing your neighbor's departure patterns. Over the course of a month, his average departure time is 7:15, give or take five minutes or so. That's pretty consistent, so you decide that 7:15 must be appropriate for you as well. Unfortunately, your neighbor works about a mile away, while you have a cross-town trek. Worse yet, you may be on the afternoon shift, or you may work from home. Is that benchmarking data worthwhile?

There are a number of problems associated with using the industry data that we have all turned to on occasion. We need to be extremely careful to drill down past the superficial presentation of the data — usually a simplified table or graph — to determine if it is applicable to our situation at all. Quite often, the data will be presented in a form that may be visually compelling while obfuscating some important elements of information that would otherwise be helpful. With the general availability of spreadsheets and graphics packages, we often find ourselves interested in the superficial presentation rather than the intrinsic information.

Wide Variability, Hidden Bias

Beyond the simple data points presented in benchmark data, it is important to recognize that the underlying data may have potential hidden biases or wide variations within the sample space. These attributes, if not clearly understood, can lead one to rely more heavily on the benchmark data than is reasonable.

Parametric estimation models, for example, are essentially the result of curve-fitting exercises based on a broad sample space of thousands of completed software projects, which can make the models compelling to use for early, whole-project estimates. The SLIM parametric estimation model is based on a large number of projects, divided into roughly a dozen different industry types [3], with the intent to provide a sample space that is relevant to your situation. As you drill deeper, though, you find that the variation within each of these industry types is very wide and that your performance may actually be closer to the median performance of an industry type that does not appear to be close to yours.

The COCOMO II parametric model [1] introduces a bias of another form. While the sample space is much smaller, it is important to note that many of the projects that have been used for curve-fitting the model are primarily in the defense and aerospace realm, where practices are such that there is a very low correlation with commercial software development or other development types.

Both the SLIM and COCOMO II models have been fit primarily with projects that are fairly large in terms of effort and scope. It would be erroneous to assume that the models could be extrapolated down for use on small projects. To blindly use these models "out of the box" for small projects or projects that have not been calibrated appropriately would be to generate estimates that are falsely defendable. While the data behind the models has been validated, that does not mean that it cleanly maps to your situation.¹

¹Beyond the selection of a specific parametric model for estimation, there is the question of which estimation procedure to use. Many organizations will try to take a published procedure (such as that used by the NASA Software Engineering Laboratory [2]) and its embedded information (such as uncertainty, phases, and approaches) and call it their own. While there are industry-wide principles that an estimation procedure should embrace, there is not a one-size-fits-all solution.
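To make the calibration point concrete, here is a minimal sketch in Python of the general shape these parametric models take: effort grows nonlinearly with size and is scaled by cost-driver multipliers. The coefficients below are illustrative placeholders, not the published SLIM or COCOMO II calibrations, which is precisely the caution being raised: applied "out of the box" to a project far smaller than the fitting data, the curve extrapolates well beyond the evidence behind it.

# Sketch of a COCOMO-style parametric effort model: effort (person-months)
# grows nonlinearly with size (KSLOC). Coefficients are illustrative
# placeholders, NOT the published COCOMO II calibration.

def parametric_effort(ksloc, a=2.9, exponent=1.10, effort_multipliers=()):
    """Nominal effort = a * size^exponent, adjusted by cost-driver multipliers."""
    effort = a * (ksloc ** exponent)
    for m in effort_multipliers:
        effort *= m
    return effort

# Fit on large projects, the same curve extrapolated to a 2 KSLOC project
# can be badly wrong unless recalibrated on local data.
for size in (2, 50, 400):
    print(size, "KSLOC ->", round(parametric_effort(size), 1), "person-months")

Calibrating the coefficient and exponent against a handful of your own completed projects, where that history exists, is what ties the curve back to your situation.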

Sparse, Slanted SEI Data

The SEI's quarterly Process Maturity Profile of the Software Community may suffer from bias problems of its own. According to the August 2002 release [5], the report shows that 19.3% of reporting organizations are performing at Level 1 (the Initial level) of the SEI's Capability Maturity Model (CMM) scale, which is quite a strong positive indicator for the industry as a whole. The fine print, however, indicates that this figure is "based on the most recent assessment, since 1998, of 1,124 organizations."

There are a couple of points to note here. This sample space is extremely small considering the number of software development organizations worldwide. In addition, it is biased not only toward organizations that are aware of the SEI, the CMM, and the suggested best practices they promote, but also toward organizations that have reported results to the SEI from formal assessments.

In most organizations that I have worked with in the past four years, the majority of the people were not aware of the SEI, and their practices and performance clearly placed them in the Initial level of the CMM. Among those organizations that claimed to have attained a higher level of maturity on the SEI's scale (i.e., Levels 3-5), most, in my experience, were not able to perform in accordance with even those goals attributed to the Repeatable level (Level 2).

(Ir)Relevance of Annual Data

Often, benchmarking data is provided on an annual basis, which allows you to subscribe and remain current. One must be careful, however, to determine whether time-based trends would be relevant for the information provided. Clearly, annualized reports showing the equivalent of the average results of 1,000 coin tosses will yield limited additional insight. While some benchmarking data will benefit from annualized updates (such as new data sectors or evolving trends), there are other classes of data that do not benefit to the same degree.

It’s Not a Divining RodThere is danger in using bench-

marking data to determine direc-

tion for your organization. Industry

averages in IT spending, for exam-

ple, can be extremely revealing if

you are on the receiving end of that

spending trend. They can also be

used as one of the drivers for fore-

casting, especially if historical

spending trends have tracked well

with your performance in the past.

If you are looking at how much

your organization should be spend-

ing, historical benchmarking data

will tell you where the industry has

been, but it will not help you

resolve how to best address your

organizational needs in the future.

Budgeting for future spending

based on industry trends fails to

address what is important for you.

BENCHMARKING DATA’SSILVER LINING

All models are wrong.

Some models are useful.

— George Box

All this is not to say that you should

never use externally generated

benchmarking data within your

organization. There is a great deal

of consideration and industry

Vol. 16, No. 6 35

1Beyond the selection of a specific para-

metric model for estimation, there is the

question of which estimation procedure

to use. Many organizations will try to

take a published procedure (such as

that used by the NASA Software

Engineering Laboratory [2]) and its

embedded information (such as uncer-

tainty, phases, and approaches) and call

it their own. While there are industry-

wide principles that an estimation pro-

cedure should embrace, there is not a

one-size-fits-all solution.

Historical benchmarking data

will tell you where the indus-

try has been, but it will not

help you resolve how to best

address your organizational

needs in the future.

Page 36: IT Metrics and Benchmarking - Semantic Scholar · of IT metrics and benchmarking — some successfully and some not so successfully. IT has become a direct contributor to bottom-line

research that has gone into much

of the available benchmarking

data, and it is important to under-

stand how and whether the data

appropriately applies to your

situation.

Benchmarking data that is used as a basis for or result of certifications or qualifications — such as ISO quality standards, SEI maturity levels, or the Project Management Institute's Project Management Professional (PMP) designation — provides an indication that the organization or individual has passed a baseline level of performance or understanding. ISO-certified organizations have clearly identified their quality practices and demonstrated that they "practice what they preach" (although this is not a guarantee that their next project will be a success), and the certification can reasonably be used as part of the criteria in an acquisition process. Individuals with the PMP designation have been assessed to have knowledge of a base set of commonly accepted project management best practices and have performed a prescribed amount of work in the project management arena (but this is not a guarantee that they are effective project managers).

For much of the benchmarking data that is available, the underlying assumptions, variability of the data, and inherent biases can usually be identified with some digging. The information may be published along with the primary information that has been distilled, available from the provided reference information, or obtained through deeper discussion with the data provider if one is so inclined (and diligent).

THE PERSONALIZED SOLUTION: ANSWER YOUR OWN QUESTIONS FIRST

A reasonable approach to the use of metrics data is to balance external benchmarking data with internally derived data to help you understand whether or not you are achieving your organizational goals. As Peter Senge noted in The Dance of Change, we need to measure to learn rather than to merely report [4].

With an understanding that the first step is to identify our goals in the measurement process, we can lean on Vic Basili's GQM approach or extend and elaborate on that practice using techniques such as the balanced scorecard. Identifying these goals and the model we will use to validate the goals allows us to remove biases from the response and resist the temptation to use data simply because it is readily available. Our quests become tightly coupled with our culture and organizational needs.
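As a minimal illustration of the Goal-Question-Metric idea, here is a sketch in Python that walks from one goal down to questions and candidate metrics; the goal, questions, and metrics shown are invented examples, not Basili's formulation or any particular organization's program.

# Minimal Goal-Question-Metric (GQM) sketch: start from a goal, derive the
# questions that would answer it, and only then choose metrics. The entries
# below are invented examples for illustration.

gqm = {
    "goal": "Improve delivered quality of the billing application from the customer's viewpoint",
    "questions": {
        "Are fewer defects reaching production, release over release?":
            ["delivered defect density (defects/FP)", "severity profile of production defects"],
        "Are we finding defects earlier in the lifecycle?":
            ["defect removal efficiency", "percentage of defects found before system test"],
    },
}

# Walking the structure top-down keeps collection tied to the goal rather than
# to whatever data happens to be easy to gather.
print(gqm["goal"])
for question, metrics in gqm["questions"].items():
    print(" ", question)
    for metric in metrics:
        print("   -", metric)

The value is in the top-down ordering: a metric that cannot be traced back to a question, or a question that cannot be traced to a goal, is a candidate for not being collected at all.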

Using this approach, we can then perform our own internal measurements and make comparisons against industry benchmarks where it makes sense. With the added internal diligence, we will have a more valuable understanding of the biases that are inherent in the data (and a comfort that the biases are more likely to be working for us than against us) and of the uncertainty or variability in the data set.

It is important to recognize the distinction between statistical variability across industry benchmarking data and individual performance variability that will arise in your own measured data. The former is an indication of the relative applicability of the information to your situation, while the latter is an expected artifact of the measurement approach that needs to be fully appreciated. You need to accept individual variation as a fact of life. Even if you use the information to cull the low-performing individuals (not a recommended practice), you will continue to have variability; by definition, 50% of the people will always fall below the median of your data set. It is dangerous to fall into the trap of using measures for segregating the team rather than for improving the organization.

Some of your greatest insight will come as you track your own measurements over time and observe the variation and trends that are revealed. This information is not something that can be gained from industry benchmarking data, but you can see whether you are tracking toward or away from the industry data, which will provide a greater indication of the applicability of the benchmarking information to your situation.

For this tracking to be effective, you need to be consistent in your measurement approaches within your organization over time. One commonly hears concerns in the industry about inconsistency of measurement, whether it be in the histogram categories used to collect data or in the approach taken (the highly variable lines-of-code measure, for example). The bottom line here is that you should select a specific approach, identify that it is the standard, and stick with it in order to ensure that you are indeed making apples-to-apples comparisons.

Industry benchmarking data definitely has its place in your arsenal of information for making strategic business decisions. Still, it has limitations that must be overcome with a deep understanding of why you are measuring and balanced with data gathered internally with reasonable approaches. Taken with a grain of salt, benchmarking information can give us the perspective we need to better understand what our internal information is telling us.

REFERENCES

1. Boehm, Barry, Bradford Clark, Ellis Horowitz, Ray Madachy, Richard Shelby, and Chris Westland. "Cost Models for Future Software Life Cycle Processes: COCOMO 2.0." Annals of Software Engineering (1995) (http://sunset.usc.edu/research/COCOMOII/index.html).

2. National Aeronautics and Space Administration. The Manager's Handbook for Software Development, Revision 1 (Software Engineering Laboratory Series SEL-84-101). NASA, 1990 (http://sel.gsfc.nasa.gov/website/documents/online-doc/84-101.pdf).

3. Putnam, Lawrence, and Ware Myers. Measures for Excellence: Reliable Software on Time, Within Budget. Yourdon Press Computing Series, Pearson Education POD, 1992.

4. Senge, Peter, Art Kleiner, Charlotte Roberts, George Roth, Rick Ross, and Bryan Smith. The Dance of Change: The Challenges to Sustaining Momentum in Learning Organizations. Currency/Doubleday, 1999.

5. Software Engineering Institute. Process Maturity Profile of the Software Community 2002 Mid-Year Update. SEI, August 2002.

6. The Standish Group. The CHAOS Report. The Standish Group, 1994 (www.standishgroup.com/sample_research/chaos_1994_1.php).

Jim Brosseau has 20 years' experience in the software industry in a wide variety of roles, application platforms, and domains. A common thread through his experience has been a drive to find a less painful approach to software development. Mr. Brosseau has worked in quality assurance at Canadian Marconi and was involved in the development and management of the test infrastructure used to support the Canadian Automated Air Traffic System. He is Principal of the Clarrus Consulting Group in Vancouver, Canada, and in the past four years, he has consulted with numerous organizations throughout North America, specifically to improve their development practices.

Mr. Brosseau publishes the Clarrus Compendium, a free weekly newsletter with a unique perspective on the software industry (www.clarrus.com/resources.htm). He has been published in PM Network magazine, the PMI GovSIG magazine, and the SEA Software Journal, and he has made presentations at Comdex West, PSQT North, the New Brunswick SPIN group, and several local associations.

Mr. Brosseau can be reached at Clarrus Consulting Group Inc., 7770 Elford Avenue, Burnaby, BC Canada V3N 4B7. Tel: +1 604 540 6718; Fax: +1 604 648 9534; E-mail: [email protected].


The Practical Collection, Acquisition, and Application of Software Metrics

by Peter R. Hill

Before we consider software measurement and the collection of software metrics, we need to ask why we want to put ourselves through the pain. The short answer is that the organization's return on investment for IT has come under increased scrutiny from senior business executives and directors. Consequently, IT now has to operate like other parts of the organization, being aware of its performance, its contribution to the organization's success, and opportunities for improvement. How can IT executives achieve this without performance data? Flying blind is not an option.

So what is it that managers need to know? Here are some of the questions that executives in the banking industry raised with me during recent discussions about the use of metrics in that sector:

• How do I know if my internal IT operation is performing satisfactorily?

• How do I decide whether I should outsource some or all of my IT operations?

• How do I know if my outsourcer is performing?

• What are the risk factors I should consider in an IT project?

• What questions should I ask to ensure that an IT project proposal is realistic?

• How do I know if a project is healthy? What should I be worrying about?

• What are the infrastructure trends for software development (languages, platforms, tools, etc.)?

And the list goes on. Furthermore, none of these questions can be answered without sound data.

COLLECTING DATA

Organization-Level Versus Industry-Level Data

Having established that we need data, how do we go about collecting it? It would seem from the questions listed above that there are two levels at which we need data: initially at the organization level and then at the industry level (with the ability to look at subsets of industry data — by industry sector, for example). If an organization collects data about its own IT projects and builds a repository from this data, it can use it for macro-estimation of future projects, do internal benchmarking, track performance improvement, analyze what seems to work and not work in its operations, and so on. This is a very good start, but many of the questions being raised by management go beyond the organization itself. Once there is a need to benchmark against the world outside, estimate a project type that the organization has never done before, or analyze the performance of other languages and tools, then we have to find industry data.

In order for data to be useful, we have to be able to compare "apples with apples." Thus it is important that the data collected at an organization level can be compared to data collected at the industry level.

The obvious question is, "What should I collect?" Now this is the fun bit! If you ask your people what data they think you need to collect, the list will be almost endless. If you then produce a questionnaire to collect all the data requested, the same people will wail: "I can't collect and enter all that. I haven't got the time or patience!" It's a lot like system functional requirements — "nice to have" is fine, as long as you can afford it.


ISBSG Questionnaire

The International Software Benchmarking Standards Group (ISBSG) established its initial data collection standard more than 10 years ago. ISBSG constantly monitors the use of its data collection package and reviews the package content. It has endeavored to reach a balance between what data is good to have and what is practical to collect. Rather than reinvent the wheel, any organization can use the ISBSG data collection questionnaire, in total or in part, for its own use. The questionnaire is available free from the ISBSG Web site (www.isbsg.org), with no obligation to submit data to the group. But whatever data collection mechanism you end up with, ensure that the only data you are collecting is data that will be used and useful.

If you employ a questionnaire approach to data collection, you should give some thought to developing a set of questions that provide a degree of cross-checking. Such an approach will allow you to assess the collected data and rate it for completeness and integrity. You can then consider the ratings when selecting a data set for analysis. As a guide, the ISBSG employs the four rating levels shown in Table 1.

Even D-rated projects may be worth retaining, as they may contain some data, perhaps qualitative, that could be useful for a specific analysis.
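As an illustration of this kind of rating and cross-checking, here is a minimal sketch in Python; the fields, the effort cross-check, and the thresholds are hypothetical stand-ins, not ISBSG's actual assessment rules.

# Minimal sketch: rate a submitted project record for completeness and
# integrity, in the spirit of the A-D ratings in Table 1. Fields, rules,
# and thresholds are hypothetical.

CORE_FIELDS = ("size_fp", "effort_hours", "elapsed_months", "count_approach")

def rate_project(record):
    missing = [f for f in CORE_FIELDS if not record.get(f)]
    if len(missing) >= 2:
        return "C"  # too much missing to assess integrity
    # Cross-check: does reported effort roughly agree with team size x duration?
    if record.get("team_size") and record.get("elapsed_months"):
        implied = record["team_size"] * record["elapsed_months"] * 130  # ~hours per person-month
        ratio = record["effort_hours"] / implied
        if ratio < 0.3 or ratio > 3.0:
            return "D"  # effort inconsistent with team size and duration
        if ratio < 0.6 or ratio > 1.7:
            return "B"  # plausible but questionable
    return "B" if missing else "A"

print(rate_project({"size_fp": 500, "effort_hours": 7300, "elapsed_months": 16,
                    "count_approach": "IFPUG", "team_size": 4}))

Recording the rating alongside each project makes it easy to include or exclude lower-rated records when selecting a data set for a particular analysis.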

Automated Collection

Manual data collection will always be painful, and collection after the fact may increase the likelihood of error. The best collection approach must surely be one that is automatic, that occurs throughout the complete lifecycle of a project and goes virtually unnoticed. Some project management tools collect data as part of the natural planning and management process, from initial estimation through to maintenance and support. Such a system removes the pain of collection and increases the integrity of the data.

ACQUIRING INDUSTRY DATA

Once you decide to obtain and use industry data, more questions arise. Where do I get industry data? How do I know that it is sound data? How can I be sure that the data has not been manipulated to suit someone's specific agenda? How do I know it's not biased?

Where Do I Get It?

Despite the need for industry data, there seem to be few sources. The commercial consulting companies that offer benchmarking services tend not to let you look at the data used in their benchmark reports. Industry groups will sometimes arrange to collect data from a number of organizations that agree to participate, but with everyone seeking a competitive advantage, gaining cooperation can be difficult. Governments have been known to encourage industry benchmarking. In Finland, for example, the government supported the establishment of a national software repository to which the major Finnish organizations contributed metrics data so that they could benchmark themselves and improve their performance. For its part, the ISBSG repository is "open"; the data is available to anyone who wishes to purchase a copy.


Table 1 — ISBSG's Data Rating Levels

A: The data provided was assessed as being sound, with nothing being identified that might affect its integrity.
B: While the data was assessed as being sound, there are some factors that could affect the credibility of the data.
C: Because significant data was not provided, it was not possible to assess the integrity of the data.
D: Because of one factor or a combination of factors, little credibility should be given to the data.


How Do I Know It’s Sound?Establishing whether the data that

you are buying is sound involves

seeking answers to the following

questions. Is the collection instru-

ment well thought out and proven?

Has the data been rated? How old is

the data? Can I use the data to com-

pare “apples with apples”?

Has the Data Been Manipulated?

The possibility of data manipulation is sometimes raised. Would an organization or other entity supply false data to a repository? The answer could be "yes" if there is an opportunity to profit from such deceit. However, if the anonymity of the submitter is maintained and certain types of reporting on the data are avoided, then there is no point in submitting false data. Where anonymity is ensured, only those who submitted a project can identify that project in the repository. Why would they want to fool themselves?

Although this approach removes the likelihood of data manipulation, it also removes the possibility of comparing one organization to another. Projects from a specific industry sector may be identifiable, but projects from a specific organization (other than your own) will not be. So if your organization is a bank and you want to benchmark your bank against other specific banks, you can only do it with their cooperation — and then their honesty.

Is the Data Biased?

Data quality extends beyond the integrity of the individual entries in the repository that you are proposing to use. Is the data representative of the industry as a whole? At this stage of the IT industry's maturity, that answer would surely be "no" in all cases.

Any organization that has software metrics data, or has hired consultants to gather such data, or has contributed data to a repository is displaying a certain level of maturity that is likely to put it at the upper end of the scale. Normally the data collected is from completed projects. Sadly, that in itself excludes a lot of IT projects! Human nature also plays a role, and despite anonymity, the temptation may be to submit only the better projects. Consequently, if you have satisfied yourself about the factual integrity of the data, then it is highly likely to be representative of the top 25% of the IT industry. As we will see in the following examples, knowing this is useful when you come to use the data.

APPLYING INDUSTRY DATA TO GOOD USE

So now that we have got some data and we know what we have got, what will we do with it? Let's answer a couple of the banking executives' questions.

Should I Outsource My IT Operations?

Answering this question is a practical application of benchmarking. If you have data about the performance of your internal IT organization, then you can compare it to industry data. Such a comparison might reveal that the internal group is doing a good job or that only certain activities should be outsourced. If outsourcing is being considered, such an exercise will also provide the basis for establishing outsourcer performance requirements. Obviously it is important to ensure that you compare like with like. There is no point, for example, in comparing the performance factors of projects developed on a PC with those developed on a mainframe.

As a very simple example, you might use industry data¹ to gauge your IT development organization's performance on the basis of the number of hours it takes to deliver a function point of functionality. From the industry data, you could select a data set based on projects with characteristics similar to yours: banking sector, new developments, mainframe, COBOL. Figure 1 shows one possible result.

¹Industry figures used in the examples are from the ISBSG repository.

In this simple example, the productivity of your IT organization, as measured in the number of hours it takes to produce a function point of functionality, looks pretty good; it takes fewer hours than the industry median of 14. Similar benchmark reports could be produced for speed of delivery or for a number of different project characteristic sets to cover the bank's portfolio of projects. Where "Your Organization" appears on the resulting graphs could influence an outsourcing discussion.
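A minimal sketch in Python of the comparison behind Figure 1: filter an industry data set down to projects with similar characteristics, then place your own hours-per-function-point figure against the quartiles. The records, field names, and values below are hypothetical stand-ins for a real repository extract such as the ISBSG data.

# Minimal sketch: compare your hours-per-function-point with industry quartiles
# for similar projects. Records and field names are hypothetical stand-ins.
from statistics import quantiles

industry = [
    {"sector": "banking", "type": "new", "platform": "mainframe", "language": "COBOL", "hours_per_fp": 9.5},
    {"sector": "banking", "type": "new", "platform": "mainframe", "language": "COBOL", "hours_per_fp": 14.0},
    {"sector": "banking", "type": "new", "platform": "mainframe", "language": "COBOL", "hours_per_fp": 21.0},
    {"sector": "banking", "type": "new", "platform": "mainframe", "language": "COBOL", "hours_per_fp": 12.5},
    {"sector": "insurance", "type": "new", "platform": "PC", "language": "C++", "hours_per_fp": 7.0},
]

similar = [p["hours_per_fp"] for p in industry
           if p["sector"] == "banking" and p["type"] == "new"
           and p["platform"] == "mainframe" and p["language"] == "COBOL"]

q1, median, q3 = quantiles(similar, n=4)   # 25th, 50th, 75th percentiles
your_rate = 11.0                            # hypothetical: your hours per function point
print(f"industry 25%={q1:.1f}, median={median:.1f}, 75%={q3:.1f}, you={your_rate}")
print("fewer hours than the industry median" if your_rate < median else "at or above the industry median")

The filtering step is where the earlier "compare like with like" caution applies: every characteristic you match on shrinks the sample, so there is a trade-off between relevance and the number of projects behind the quartiles.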

How Can I Ensure an IT Project Proposal Is Realistic?

If a software development project is being submitted for funding approval, how will the decisionmakers know whether or not the proposal is realistic? If the proposal were for the construction of a building, a quantity surveyor would have already estimated the project within 5% of its likely cost and would have similar calculations for speed of delivery and total duration. The questions that the decisionmakers need to ask about the proposed IT project are, "How does the proposed project compare against industry data for similar projects?" and "Is the proposal realistic?" These comparisons can be made at a number of levels:

• Project component breakdown percentages (files, reports, inquiries, etc.),² to ensure that nothing has been missed and that, if the project does differ significantly from the industry norms, the reasons are known and verifiable.

• The project lifecycle phase breakdown (plan, specify, design, build, test, implement), again to ensure that nothing has been missed and that, if the project does differ significantly from the industry norms, the reasons are known and verifiable.

• Project delivery rate, work effort, speed of delivery, and duration. The industry data for comparable projects will quickly reveal whether the project being proposed is realistic. If it does vary greatly from the industry norms, particularly if it looks optimistic, then the reasons should be known and verifiable.

²There are stable industry ratios for project components and project lifecycle phases. See the ISBSG Software Metrics Compendium (www.isbsg.org).

For example, if an organization is proposing a banking project of 500 function points, with the same characteristics as the one shown in Figure 1, the industry figures in Table 2 will provide a reality check.

Table 2 — Estimated Metrics for a Hypothetical Banking Project (500 Function Points)
(columns: project delivery rate in hours per function point; project work effort in hours; speed of delivery in function points per month; duration in months)

Optimistic: 9.3 | 4,672 | 46.6 | 10.7
Likely: 14.5 | 7,270 | 30.5 | 16.4
Conservative: 21.5 | 10,733 | 19.8 | 25.2
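To show how such a reality check can be applied, here is a minimal sketch in Python that derives expected effort for a 500-function-point proposal from the Table 2 delivery rates and flags a bid that undercuts even the optimistic case; the proposal figure and the 20% tolerance are hypothetical choices for illustration.

# Minimal sketch: sanity-check a project proposal against benchmark delivery rates.
# The delivery rates come from Table 2 (hours per function point for comparable
# banking projects); the proposal figure and 20% tolerance are hypothetical.

BENCHMARK_HOURS_PER_FP = {"optimistic": 9.3, "likely": 14.5, "conservative": 21.5}

def reality_check(size_fp, proposed_effort_hours, tolerance=0.20):
    expected = {k: rate * size_fp for k, rate in BENCHMARK_HOURS_PER_FP.items()}
    verdict = ("questionable: cheaper than the industry optimistic case"
               if proposed_effort_hours < expected["optimistic"] * (1 - tolerance)
               else "plausible" if proposed_effort_hours >= expected["likely"] * (1 - tolerance)
               else "optimistic: the reasons should be known and verifiable")
    return expected, verdict

expected, verdict = reality_check(size_fp=500, proposed_effort_hours=4000)
print({k: round(v) for k, v in expected.items()})   # prints roughly 4650 / 7250 / 10750 hours
print(verdict)

The small differences between these computed figures and the effort column in Table 2 presumably reflect rounding in the published delivery rates.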

Given that we believe the industry data comes from "better" projects, then any proposal that provides figures that are closer to optimistic than likely should be questioned with vigor. Of course we could add other project characteristics, such as application type, to further define our project. As long as the resulting sample data set has a reasonable number of projects in it, then each additional characteristic will improve the comparison between our proposed project and the selected group of industry projects.

Figure 1 — Comparing development performance. The chart plots hours per function point (on a scale of 0 to 25) for the industry 75th percentile, the industry median, the industry 25th percentile, and Your Organization.

There is no doubt that individual organizations need data not only at their own IT activity level but also at the industry level. Use of good software metrics data has extended beyond internal IT benchmarking and project estimation to the broader areas of IT and business management, including:

• Outsource performance management

• Development scope management

• Development infrastructure planning

• Business case reality checking

There are mature data collection packages and tools available that provide guidance on what data to collect and how to make collection easier. There is a growing industry body of knowledge that can be used to help IT make its contribution to organizational strategy, competitive advantage, and profitability.

Peter Hill is the Executive Director of the International Software Benchmarking Standards Group (ISBSG), a not-for-profit organization. ISBSG members are the software metrics organizations of 11 countries. The group has built, grows, maintains, and exploits repositories of IT industry data. Mr. Hill has compiled and edited four books for the ISBSG: Software Project Estimation, The Benchmark Release 6, Practical Project Estimation, and The Software Metrics Compendium. Over many years, Mr. Hill has written articles and delivered papers at IS- and business-oriented conferences in Australia, New Zealand, Finland, Spain, and Malaysia.

Mr. Hill can be reached at Tel: +61 3 9844 0560; Fax: +61 3 9844 0561; E-mail: [email protected].




David Garmus, Guest Editor

David Garmus is a Principal in The David Consulting Group and an acknowledged authority in the sizing, measurement, and estimation of software application development and maintenance. He has helped numerous CIOs and CFOs successfully manage expectations in software development projects, using function point analysis to enable effective IT cost management and achieve a realistic return on investment. He is the coauthor of Function Point Analysis: Measurement Practices for Successful Software Projects and Measuring the Software Process: A Practical Guide to Functional Measurements. He has served as the President of the International Function Point Users Group (IFPUG).

Mr. Garmus has more than 30 years of experience managing software development and maintenance and has taught college-level courses in computing- and finance-related subjects. He is a member of Project Management Institute, the Quality Assurance Institute, and the IEEE Computer Society, and he holds a BS from the University of California–Los Angeles and an MBA from the Harvard Business School. He can be reached at [email protected].

Upcoming Issue Themes

Enterprise Architecture Governance
The New CIO Agenda
Usability
Patterns
Killing IT Projects

Cutter IT Journal Topic Index

June 2003: IT Metrics and Benchmarking
May 2003: Is Open Source Ready for Prime Time?
April 2003: Project Portfolio Management: Blueprint for Efficiency or Formula for Boondoggle?
March 2003: Critical Chain Project Management: Coming to a Radar Screen Near You!
February 2003: XP and Culture Change: Part II
January 2003: Ending "Garbage In, Garbage Out": IT's Role in Improving Data Quality
December 2002: Preventing IT Burnout
November 2002: Globalization: Boon or Bane?
October 2002: Whither Wireless?
September 2002: XP and Culture Change
August 2002: Plotting a Testing Course in the IT Universe
July 2002: Confronting Complexity: Contemporary Software Testing
June 2002: B2B Collaboration: Where to Start?