Measurement and Metrics for Test Managers


DESCRIPTION

To be most effective, test managers must develop and use metrics to help direct the testing effort and make informed recommendations about the software’s release readiness and associated risks. Because one important testing activity is to “measure” the quality of the software, test managers must measure the results of both the development and testing processes. Collecting, analyzing, and using metrics is complicated because many developers and testers are concerned that the metrics will be used against them. Join Rick Craig as he addresses common metrics—measures of product quality, defect removal efficiency, defect density, defect arrival rate, and testing status. Learn the guidelines for developing a test measurement program, rules of thumb for collecting data, and ways to avoid “metrics dysfunction.” Rick identifies several metrics paradigms and discusses the pros and cons of each. Delegates are urged to bring their metrics problems and issues for use as discussion points.

TRANSCRIPT

Page 1: Measurement and Metrics for Test Managers

 

 

 

  

 

MG PM Tutorial, 4/7/2014, 1:00 PM
"Measurement and Metrics for Test Managers"

Presented by:
Rick Craig, Software Quality Engineering

Brought to you by:
SQE Training
340 Corporate Way, Suite 300, Orange Park, FL 32073
888-268-8770 · 904-278-0524 · [email protected] · www.sqe.com

Page 2: Measurement and Metrics for Test Managers

     

            

Rick Craig, Software Quality Engineering

A consultant, lecturer, author, and test manager, Rick Craig has led numerous teams of testers on both large and small projects. In his twenty-five years of consulting worldwide, Rick has advised and supported a diverse group of organizations on many testing and test management issues. From large insurance providers and telecommunications companies to smaller software services companies, he has mentored senior software managers and helped test teams improve their effectiveness. Rick is coauthor of Systematic Software Testing and is a frequent speaker at testing conferences, including every STAR conference since its inception.

Page 3: Measurement and Metrics for Test Managers

MEASUREMENT AND METRICS FOR TEST MANAGERS
Introduction

Administrivia
• Course timing
• Meal timing
• Electronic devices
• Facilities
• Smoking
• Breaks

Page 4: Measurement and Metrics for Test Managers

Course Agenda
1. Introduction to Software Measurement
2. Metrics—Rules of Thumb
3. A Tester's Dashboard
4. Estimation (Optional)

1. INTRODUCTION TO SOFTWARE MEASUREMENT

Page 5: Measurement and Metrics for Test Managers

What is software measurement?

"It's easy to get numbers; what is hard is to know they are right and understand what they mean."
— Bill Hetzel

What is software measurement?

"Quantified observations" about any aspect of software (product, process, or project)

Page 6: Measurement and Metrics for Test Managers

Lord Kelvin
"To measure is to know"
"If you cannot measure it, you cannot improve it"
"The more you understand what is wrong with a figure, the more valuable that figure becomes"

There Are Lots and Lots of Measures
Primitive:
– Aspirins consumed this week
– Number of staff assigned to project A
– Pages of requirements specifications
– Hours worked to accomplish change request X
– Number of operational failures in system Y this year
– Lines of code in program Z
Computed:
– Defects per 1,000 lines of code in program A
– Productivity in function points delivered by person B
– Quality score for project C
– Average coffee consumption per line of code
– Accuracy of hours worked per week is ± 20%

Page 7: Measurement and Metrics for Test Managers

Common Metrics
• Test defects
• Defects after release
• Open problems
• Open issues
• Schedule performance
• Process compliance (e.g., ISO)
• Test results
• Reliability
• Time fixing problems
• Defects from fixes
• Lines of code
• Plan and schedule changes

Uncommon Metrics
• Code coverage
• Complexity
• Cost of rework
• Cost of quality
• Defect age

Page 8: Measurement and Metrics for Test Managers

Basic Definitions
The four Ms:
• Measure: primitive (raw data), e.g., 13, 34, 17, 74, 42, 34, 56, 77, 94, 34, 45, 63, 45, 67, 12, 31, 61, 06, 91, 42
• Metric: computed (information)
• Meter
• Meta-measure
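To make the four Ms concrete, here is a minimal sketch (the lines-of-code figures, threshold, and module counts are invented for illustration; only the raw defect numbers come from the slide). It treats a meter as a threshold that flags action and a meta-measure as a check on the measurement data itself:

```python
# Illustrative sketch of the four Ms; figures other than the raw defect counts are assumed.
defects_found = [13, 34, 17, 74, 42]              # measure: primitive raw data
lines_of_code = [4000, 9000, 2500, 12000, 6000]   # measure: primitive raw data (assumed)

# Metric: information computed from measures (defects per 1,000 lines of code).
defect_density = [d / (loc / 1000) for d, loc in zip(defects_found, lines_of_code)]

# Meter: a threshold on a metric that flags when action is needed (threshold assumed).
ALERT_THRESHOLD = 5.0
alerts = [density > ALERT_THRESHOLD for density in defect_density]

# Meta-measure: a measure of the measurement itself, e.g., how complete the raw data is.
modules_reporting, modules_total = 5, 6
data_completeness = modules_reporting / modules_total

print([round(d, 1) for d in defect_density], alerts, round(data_completeness, 2))
```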

What Makes a Good Measure?
• Simple
• Objective
• Easily collected
• Robust
• Valid

Page 9: Measurement and Metrics for Test Managers

What Can Measures Do for You?
• Facilitate estimation
• Identify risky areas
• Measure testing status
• Measure/predict product quality
• Measure test effectiveness
• Identify training opportunities
• Identify process improvement opportunities
• Provide "meters" to flag actions

2. METRICS—RULES OF THUMB

Page 10: Measurement and Metrics for Test Managers

Metrics—Rules of Thumb
• The Human Element
• The Basics
• KISS
• And a Myth or Two

The Human Element
• Without buy-in, metrics may be falsified
• Without buy-in, metrics may be ignored
Buy-in is key

Page 11: Measurement and Metrics for Test Managers

Class Discussion
How do you obtain buy-in?

Ways to Obtain Buy-in
• Training
• Metrics
• Feedback loops
• Reviews
• Participation

Page 12: Measurement and Metrics for Test Managers

The Human Element
• Measure processes and products instead of people if possible
• Beware of the dark side of the Hawthorne Effect

Two Sides of Measurement
"…the information may be used against me."
versus
"…the information will help me understand what is going on and do a better job."

Page 13: Measurement and Metrics for Test Managers

The Hawthorne Effect
Measuring people improves their productivity

The Human Element
• Tailor metrics to the audience
• Users, managers, and practitioners all have different languages
• Set the appropriate level of detail
• How you present the material matters

Page 14: Measurement and Metrics for Test Managers

Who is your audience?
• Users
• Developers
• Testers

% of Red Cars Soars
[Chart: % of red cars for 2008-2010 (roughly 25.4%, 25.5%, 26%) plotted on a y-axis that runs only from about 25.1 to 26.1, so the small increase looks dramatic]

Page 15: Measurement and Metrics for Test Managers

% of Red Cars Soars?
[Chart: the same data replotted on a full 0-100% y-axis; the increase from about 25.4% to 26% is barely visible]

The Human Factor
• Training is required
• Metrics are not second nature
• Your metrics are affected by how they are collected
• Establish ranges of expected values
• Publish historical values

Page 16: Measurement and Metrics for Test Managers

The Basics
• Use a metric to validate a metric
• Use meta-measures
• Use meters when possible
• Consistency sometimes trumps accuracy
• Subjective is good; objective is better

KISS — Keep It Simple Sir
• More is not always better
• All metrics are not forever
  – Consider temporary metrics
  – Consider sampling
• Automate collection when possible

Page 17: Measurement and Metrics for Test Managers

3. A TESTER'S DASHBOARD

A Dashboard

Page 18: Measurement and Metrics for Test Managers

Establish a Dashboard
• Easy to use/understand at a glance:
  – Quality of product
  – Status
  – Test effectiveness
  – Resources
  – Issues
* Remember you need at least two metrics per "instrument"

Measures of Quality
• It is difficult to develop practical measures of quality
• The cost to achieve various quality levels must be taken into account
• Many quality metrics are relatively subjective
• Quality goals will be affected by the industry and corporate culture

Page 19: Measurement and Metrics for Test Managers

What Is Quality?
• Quality = meeting requirements (stated and/or implied)

Sample Quality Factors and Criteria
• Correctness
• Reliability
• Testability
• Flexibility
• Usability
• Portability
• Interoperability
• Efficiency
• Integrity
• Maintainability
• Revisability
• Survivability
• Correctability

Page 20: Measurement and Metrics for Test Managers

Defect Density/Clustering
[Bar chart: # of defects per 1,000 lines of code, by module name (D, B, A, C, E, F)]

Defect Density
Issues:
• Coverage of tests
• Weighting of defects
• Weighting by relative risk
• What to use as the denominator
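A minimal sketch of the calculation behind a chart like this (module names follow the slide; the defect counts and sizes are invented, and the choice of denominator is exactly the issue noted above):

```python
# Defect density per module, in defects per 1,000 lines of code (KLOC).
# Module names are from the slide; counts and sizes are illustrative only.
modules = {
    "D": {"defects": 120, "loc": 8000},
    "B": {"defects": 95,  "loc": 10000},
    "A": {"defects": 60,  "loc": 9000},
    "C": {"defects": 30,  "loc": 7000},
    "E": {"defects": 12,  "loc": 6000},
    "F": {"defects": 5,   "loc": 5000},
}

for name, m in modules.items():
    density = m["defects"] / (m["loc"] / 1000)
    print(f"Module {name}: {density:.1f} defects/KLOC")
```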

Page 21: Measurement and Metrics for Test Managers

Effect of Complexity on Quality
[Chart: probability of a post-release defect plotted against complexity]

Other Measures of Product Quality
• Customer satisfaction
• Repeat customers?
• Referrals?
• Calls to the help desk?
• Timeliness?
• Defect age?
• Complexity?
• Rework?
• Reliability?

Page 22: Measurement and Metrics for Test Managers

Quality of Product
• Record any current measures of product quality that you are using here. Give them a grade for effectiveness (A, B, C, etc.)
• Any new metrics you would use?
* Remember you need at least two metrics per instrument

Establish a Dashboard
• Easy to use/understand at a glance:
  – Quality of product
  – Status
  – Test effectiveness
  – Resources
  – Issues
* Remember you need at least two metrics per instrument

Page 23: Measurement and Metrics for Test Managers

Status Reporting
• The Master Test Plan should specify:
  – What to report
  – How often
  – To whom

Common Test Status Metrics
% of Test Cases Executed
Issues:
• Weighting of TC by coverage metrics
• Weighting of TC by risk
• Weighting of TC by execution effort
• Weighting of TC by time to execute
What do you really want to know?

Page 24: Measurement and Metrics for Test Managers

Sample Test Status Report (raw data)
Project: Online-Trade    Date: 4/23/2009

Feature       Total Tests   # Tested   % Complete   # Success   % Success
Open Acct          46           46         100           41          89
Sell Order         36           25          69           25          69
Buy Order          19           17          89           12          63
…
Totals            395          320          81          311          79
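The percentage columns in a report like this are derived directly from the raw counts; a minimal sketch (only the three rows shown above are used, so the totals differ from the slide's 395/320/311):

```python
# Sketch: compute % complete and % success from raw test-status counts.
rows = [                    # (feature, total tests, tests executed, tests passed)
    ("Open Acct", 46, 46, 41),
    ("Sell Order", 36, 25, 25),
    ("Buy Order", 19, 17, 12),
]

grand_total = grand_tested = grand_passed = 0
for feature, total, tested, passed in rows:
    print(f"{feature:10s} {total:4d} {tested:4d} {100*tested/total:4.0f}% {passed:4d} {100*passed/total:4.0f}%")
    grand_total += total
    grand_tested += tested
    grand_passed += passed

print(f"{'Totals':10s} {grand_total:4d} {grand_tested:4d} "
      f"{100*grand_tested/grand_total:4.0f}% {grand_passed:4d} {100*grand_passed/grand_total:4.0f}%")
```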

Open and Closed Over Time
[Chart: defects detected ("incoming") vs. defects fixed ("released") per week, weeks 2-20]
[Chart: cumulative open defects over time, days 0-40]
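Curves like these are usually derived from per-defect open and close dates; a minimal sketch, assuming a simple record format with week numbers:

```python
# Sketch: weekly detected vs. fixed counts and the cumulative open count.
# The defect records below are invented; closed_week=None means still open.
defects = [
    {"id": 1, "opened_week": 2, "closed_week": 4},
    {"id": 2, "opened_week": 3, "closed_week": None},
    {"id": 3, "opened_week": 3, "closed_week": 6},
    {"id": 4, "opened_week": 5, "closed_week": 5},
]

open_count = 0
for week in range(1, 9):
    detected = sum(1 for d in defects if d["opened_week"] == week)
    fixed = sum(1 for d in defects if d["closed_week"] == week)
    open_count += detected - fixed
    print(f"week {week}: detected={detected} fixed={fixed} cumulative open={open_count}")
```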

Page 25: Measurement and Metrics for Test Managers

When Is the Software "Good Enough"?
When to stop testing:
• Test exit criteria met
• Return on investment (ROI) not sufficient
• Defect arrival rate
• Resources exhausted
  – Time
  – Money
• Profiles (based on failures encountered using profiles of real data)
• Project cancelled!

Software Psychology
What is "good enough"?
[Chart: number of bugs found over time]

Page 26: Measurement and Metrics for Test Managers

Economics of Test and Failure
[Chart; Source: IBM Systems Sciences Institute]

Stopping Criteria — Revisited
Normal:
• Test set exit criteria
• Remaining defects estimation criteria
  – Defect history of past software
  – Defect history of current item
  – Software complexity
  – Combination of these
• Diminishing return criteria
  – Cost to detect next defect
• Combined criteria
Abnormal:
• Resource exhaustion
  – Schedule
  – Budget
  – System access
  – Patience
• Project redirection

"There is no single, valid, rational criterion for stopping. Furthermore, given any set of applicable criteria, how each is weighed depends very much on the product, the environment, the culture, and the attitude to risk."
— Boris Beizer

Page 27: Measurement and Metrics for Test Managers

Test Summary Report
• Report identifier
• References
  – Test items (with revision #s)
  – Environments
  – References
• Variances (deviations)
  – From test plan or requirements
  – Reasons for deviations
• Summary of incidents
  – Resolved incidents
  – Defect patterns
  – Unresolved incidents
• Adequacy assessment
  – Evaluation of coverage
  – Identify uncovered attributes
• Summary of activities
  – System/CPU usage
  – Staff time
  – Elapsed time
• Software evaluation
  – Limitations
  – Failure likelihood
• Approvals

Status
• Record any current test status measures that you are using here. Give them a grade for effectiveness (A, B, C, etc.)
• Any new metrics you would use?
* Remember you need at least two metrics per instrument

Page 28: Measurement and Metrics for Test Managers

Establish a Dashboard
• Easy to use/understand at a glance:
  – Quality of product
  – Status
  – Test effectiveness
  – Resources
  – Issues
* Remember you need at least two metrics per instrument

How Do You Measure Test Effectiveness?

Page 29: Measurement and Metrics for Test Managers

A Common Answer
– Coverage
– Defect age (phase or product version)
– # of bugs
– Defect density
– Defect removal efficiency
– Defect seeding
– Mutation analysis
– Customer complaints

Three Major Categories
(the slides that follow group these into customer satisfaction measures, defect measures, and coverage measures)

Page 30: Measurement and Metrics for Test Managers

Customer Satisfaction Measures
Issues:
• Who to ask
• "After the fact"
• Difficulty in measuring
• Doesn't differentiate between the effectiveness of development and testing

Customer Satisfaction Measures
• Subjective is good
• Objective is better

Page 31: Measurement and Metrics for Test Managers

Defect Measures
• Why is it important to track defects?
• What are some ways to analyze defects?
• DDP
• Defect density
• Defect age

Why is it important to track defects?
• Identify process improvement
• Identify training needs
• Identify problematic (high-risk) areas
• Determine test status

Page 32: Measurement and Metrics for Test Managers

Defect Analysis — Example
• Phase
• Type
• Severity
• Priority
• Author
• Age
• Module
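In practice this analysis is just a grouping of the defect log by each attribute; a minimal sketch (attribute names from the slide, defect records invented):

```python
# Sketch: break a defect log down by the attributes listed above.
from collections import Counter

defects = [
    {"phase": "Coding", "type": "Logic", "severity": "High", "module": "Billing"},
    {"phase": "Requirements", "type": "Omission", "severity": "Medium", "module": "Billing"},
    {"phase": "Coding", "type": "Interface", "severity": "Low", "module": "Reports"},
]

for attribute in ("phase", "type", "severity", "module"):
    print(attribute, dict(Counter(d[attribute] for d in defects)))
```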

Defect Detection Percentage (DDP)

DDP = (Defects Discovered / Defects at Start) × 100%

"85% is the average DRE for US software projects greater than 1,000 function points in size."
— Capers Jones
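A minimal sketch of the calculation; here "defects at start" is approximated as everything found in testing plus everything reported afterwards, which is one common after-the-fact interpretation rather than the only one:

```python
def defect_detection_percentage(found_by_testing: int, found_after_release: int) -> float:
    """DDP = defects discovered by testing / defects present at the start of testing x 100%."""
    defects_at_start = found_by_testing + found_after_release  # approximation; see the DDP issues slide
    return 100.0 * found_by_testing / defects_at_start

# Example: 170 defects found in test, 30 reported after release -> DDP = 85%
print(defect_detection_percentage(170, 30))
```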

Page 33: Measurement and Metrics for Test Managers

Defect Detection Percentage (DDP)
Issues:
• Severity and distribution of defects
• How to know when all bugs are found
• "After the fact"
• What constitutes bug-finding activities?
• Some bugs cannot be found in testing

Defect "Value" (Cost Avoidance)

When discovered               Typical hours to rework/fix
Requirements                            1
High-level design                       1
Detailed design                         1
Code                                    1
Unit test                              3-5
Integration test                       5-10
System/acceptance test                10-30
Production                            20-60+
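The table lends itself to a rough cost-avoidance calculation; a minimal sketch that uses the midpoints of the ranges as single numbers (that simplification is mine, not the course's):

```python
# Sketch: rework hours avoided by finding a defect earlier, based on the table above.
typical_rework_hours = {
    "requirements": 1, "high-level design": 1, "detailed design": 1, "code": 1,
    "unit test": 4, "integration test": 7.5, "system/acceptance test": 20, "production": 40,
}

def hours_avoided(found_in: str, would_have_reached: str) -> float:
    """Hours saved by finding the defect in `found_in` rather than in `would_have_reached`."""
    return typical_rework_hours[would_have_reached] - typical_rework_hours[found_in]

print(hours_avoided("unit test", "production"))  # roughly 36 hours per defect
```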

Page 34: Measurement and Metrics for Test Managers

Defect Age (PhAge)
PhAge counts the number of phases between the phase in which a defect was created and the phase in which it was discovered (0 = found in the phase where it was introduced).

Phase created        PhAge by phase discovered (each later phase adds 1)
Requirements         0  1  2  3  4  5  6  7  8  9
High-level design       0  1  2  3  4  5  6  7  8
Detailed design            0  1  2  3  4  5  6  7
Coding                        0  1  2  3  4  5  6
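A minimal sketch of the PhAge calculation; the ordered phase list beyond the four phases shown in the matrix is an assumption:

```python
# Sketch: defect age (PhAge) = number of phases between creation and discovery.
PHASES = ["requirements", "high-level design", "detailed design", "coding",
          "unit test", "integration test", "system test", "production"]  # order assumed

def phage(phase_created: str, phase_discovered: str) -> int:
    return PHASES.index(phase_discovered) - PHASES.index(phase_created)

print(phage("requirements", "system test"))  # a requirements defect found in system test is 6 phases old
```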

Defect Age
Issues:
• Difficult to do root cause
• Requires weighting of defects
• How to handle latent/masked defects

Page 35: Measurement and Metrics for Test Managers

Coverage Measures
Discussion:
• Requirements vs. design vs. code coverage
• Completeness/accuracy of test basis
• Coverage of test set vs. coverage of tests executed (e.g., we don't always run every test)
• Coverage vs. actual results (DDP)

Mapping Test Cases to Requirements

Requirements spec.        Test plan
3.5.1.3.2                 Test Case #3
…                         …
3.5.1.4.7                 Test Case #5
…                         …
3.6.4.2.1                 Test Case #12
…                         …
3.8.2.7.1                 Test Case #19
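A traceability map like this is enough to compute requirements coverage and to list uncovered requirements; a minimal sketch (requirement IDs from the slide, plus one hypothetical unmapped requirement for illustration):

```python
# Sketch: requirements coverage from a requirement -> test case traceability map.
traceability = {
    "3.5.1.3.2": ["TC-3"],
    "3.5.1.4.7": ["TC-5"],
    "3.6.4.2.1": ["TC-12"],
    "3.8.2.7.1": ["TC-19"],
    "3.9.9.9.9": [],          # hypothetical requirement with no test case yet
}

covered = [req for req, tcs in traceability.items() if tcs]
print(f"Requirements coverage: {100 * len(covered) / len(traceability):.0f}%")
print("Uncovered:", [req for req, tcs in traceability.items() if not tcs])
```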

Page 36: Measurement and Metrics for Test Managers

Requirements/Design Coverage
Conceptual model of requirements/design coverage:

                 Test Case 1   Test Case 2   Test Case 3   Covered?
Requirement A         X                           X           Y
Requirement B                                                  N
Requirement C         X             X                         Y
Feature A                           X                         Y
Feature B             X             X                         Y
Design A              X             X                         Y
Design B                            X                         Y
Design C                                                       N
Design D              X                           X           Y

Requirements/Design Coverage
Issues:
• Only as good as the test basis
• Relatively low coverage of code

Code coverage achieved with requirements tests:
  Major bank (20 apps)     20%
  Major DBMS vendor        47%
  Major h/w s/w vendor     60%
— Source: Bender and Associates

Page 37: Measurement and Metrics for Test Managers

Code Coverage
Conceptual model of code coverage:

              Test run 1   Test run 2   Test run 3   Covered?
Statement A        X            X            X          Y
Statement B        X                         X          Y
Statement C        X                                    Y
Statement D                                             N
Statement E                                  X          Y
Coverage          60%          20%          60%        80%
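The per-run and combined percentages in this model are a simple set calculation; a minimal sketch that reproduces the 60/20/60/80 figures above:

```python
# Sketch: statement coverage per test run and for all runs combined.
statements = {"A", "B", "C", "D", "E"}
covered_by_run = {
    "run 1": {"A", "B", "C"},
    "run 2": {"A"},
    "run 3": {"A", "B", "E"},
}

for run, covered in covered_by_run.items():
    print(run, f"{100 * len(covered) / len(statements):.0f}%")       # 60%, 20%, 60%

combined = set().union(*covered_by_run.values())
print("combined", f"{100 * len(combined) / len(statements):.0f}%")   # 80%
```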

Code Coverage
Issues:
• Requires a tool
• Doesn't prove the code actually "works" correctly
• Did we test the "right code"?
• Statement vs. branch vs. path

Page 38: Measurement and Metrics for Test Managers

Test Effectiveness
• Record any current test effectiveness measures that you are using here. Give them a grade for effectiveness (A, B, C, etc.)
• Any new metrics you would use?
* Remember you need at least two metrics per instrument

Establish a Dashboard
• Easy to use/understand at a glance:
  – Quality of product
  – Status
  – Test effectiveness
  – Resources
  – Issues
* Remember you need at least two metrics per instrument

Page 39: Measurement and Metrics for Test Managers

Resources
• Resource estimates/consumption are necessary in order to do test planning, estimation, budgeting, and staffing
• You must consider the level of granularity in the collection of these metrics based on the accuracy of the required metrics and your ability to validate them
• Some people choose to exclude the resources instrument from the dashboard because they feel it is not a "day to day" metric

Resources
Resource metrics are normally collected in terms of:
• Actual/expected budget
• Actual/expected engineering hours
• Test environment utilization/availability
• Staffing levels
• Contractor availability
• Other hardware/software resources

Page 40: Measurement and Metrics for Test Managers

Resources
• Record any current resource measures that you are using here. Give them a grade for effectiveness (A, B, C, etc.)
• Any new metrics you would use?
* Remember you need at least two metrics per instrument

Establish a Dashboard
• Easy to use/understand at a glance:
  – Quality of product
  – Status
  – Test effectiveness
  – Resources
  – Issues
* Remember you need at least two metrics per instrument

Page 41: Measurement and Metrics for Test Managers

Issues
• This is included to address any important items not otherwise included on the dashboard. These are normally subjective and not necessarily conducive to systematic analysis
• Issues could involve training, installation of new hardware/software, politics—even the weather

A Sample Tester's Dashboard
• Status: % completion, defect info
• Product quality: defect density, performance, etc.
• Test effectiveness: DDP, coverage
• Resources: engineering hours, money
• Issues
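One lightweight way to keep such a dashboard is a small data structure with at least two metrics per instrument; a minimal sketch (every value is invented for illustration):

```python
# Sketch of a tester's dashboard: each instrument carries at least two metrics.
dashboard = {
    "status":             {"% test cases executed": 81, "open defects": 42},
    "product quality":    {"defect density (per KLOC)": 3.2, "p95 response time (ms)": 480},
    "test effectiveness": {"DDP (%)": 85, "requirements coverage (%)": 92},
    "resources":          {"engineering hours used": 1240, "budget consumed (%)": 67},
    "issues":             ["test environment down 2 days", "late requirements changes"],
}

for instrument, readings in dashboard.items():
    print(f"{instrument}: {readings}")
```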

Page 42: Measurement and Metrics for Test Managers

Avoiding Dysfunction
• Measure processes and products—not people!
• Beware of the dark side of the Hawthorne Effect
• Remember that more is not always better
• Avoid the exclusive use of top-down metrics
• Provide training—not all metrics are intuitive
• Consider temporary metrics

Avoiding Dysfunction
• Define each metric, its use, who will see it, expected ranges, etc.
• Remember your audience and tailor to their needs
• Always seek multiple interpretations
• Ask your audience what their interpretation of a metric is before you offer yours
• Sell, sell, sell, sell

Page 43: Measurement and Metrics for Test Managers

One Truth and One Myth in Closing

The Truth (Gilb's Law):
"Anything you need to quantify can be measured in some way that is superior to not measuring it at all"
— Tom Gilb

The Myth:
"Some metrics are always better than no metrics …"

4. ESTIMATION (OPTIONAL)

Page 44: Measurement and Metrics for Test Managers

Estimation

Estimate:
1. A tentative evaluation or rough calculation
2. A preliminary calculation of the cost of a project
3. A judgment based upon one's impressions; opinion
— The American Heritage Dictionary

"It is very difficult to make a vigorous, plausible, and job-risking estimate that is derived by no quantitative method, supported by little data, and certified chiefly by hunches of the managers."
— Fred Brooks

Test Estimation
Estimation: the creation of an approximate target for costs and completion dates

The best estimates:
• Represent the collective wisdom of practitioners and have their buy-in
• Provide specific, detailed catalogs of the costs, resources, tasks, and people involved
• Present, for each activity estimated, the most likely cost, effort, and duration

Page 45: Measurement and Metrics for Test Managers

Test Estimation (cont.)
Factors that can influence cost, effort, and duration include:
• Required level of quality of the system
• Size of the system to be tested
• Historical data
• Process factors (process maturity, etc.)
• Material factors (tools, data, etc.)
• People factors (skills, experience, managers, etc.)

Test Estimation (cont.)
• Delivery of estimates should include justification
• Negotiation and re-work of estimates is normal
• Final estimates represent a balance of organizational and project goals in the areas of quality, schedule, budget, and features

Page 46: Measurement and Metrics for Test Managers

How Good Is Our Industry (at Estimating)?
• Tata: 62% of projects fail to meet schedule; 49% have budget overruns
• Moløkken and Jørgensen: 30-40% overruns

Class Discussion
Why is estimating not done well? Your top five reasons:
1) Too many variables
2) ____________________________________
3) ____________________________________
4) ____________________________________
5) ____________________________________

Page 47: Measurement and Metrics for Test Managers

Why Estimates Are Inaccurate — Part I
• Lack of estimating experience
• Lack of historical data on which to base estimates
• Lack of a systematic estimation process, sound techniques, or models suited to the project
• Failure to include essential activities and products within the scope of the estimates
• Unrealistic expectations or assumptions
• Failure to recognize and address the uncertainty inherent in project estimates
— Practical Software Measurement, Addison-Wesley, 2001

Why Estimates Are Inaccurate — Part II
• Lack of education and training
• Confusing the target with the estimate
• Hope-based planning
• Inability to communicate and support estimates
• Incomplete, changing, and creeping requirements
• Quality surprises (test and re-fix)
— adapted from Linda M. Laird, The Limitations of Estimation

Page 48: Measurement and Metrics for Test Managers

Boehm's Cone of Uncertainty
[Chart: the cone of uncertainty]

NHC Track Forecast Cone
[Image: National Hurricane Center track forecast cone]

Page 49: Measurement and Metrics for Test Managers

"Testing" Track Forecast Cone
(or why it is important to constantly re-estimate)
[Chart: tasks vs. time]

The Fantasy Factor
[Timeline: weeks 0 through 9, marking "Today" and the 1st, 2nd, and 3rd estimates]
What would have to happen to deliver this in four weeks?
What should the estimate have been?

Page 50: Measurement and Metrics for Test Managers

Estimation
1, 2, 3, or 4 variables + many modifiers: Time, Size, Resources
If it's not variable, then it's fixed.

Time vs. Resources
[Image]

Page 51: Measurement and Metrics for Test Managers

Test Estimation Techniques — Examples
• Intuition and guess
• Work breakdown structures
• Three-point estimates (see the sketch below)
• Company standards and norms
• % of project effort or staffing
• Industry averages and predictive models (e.g., FP, TPA)
• Team estimation sessions
  – Wideband Delphi
  – Story point sizing
  – Poker estimation
  – T-shirt sizing
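As one example from the list, a three-point estimate combines optimistic, most likely, and pessimistic values; a minimal sketch using the common PERT weighting (O + 4M + P)/6, which is an assumption since the course does not prescribe a formula:

```python
# Sketch: three-point (PERT-style) estimate for a testing task, in hours.
def three_point_estimate(optimistic: float, most_likely: float, pessimistic: float):
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev

expected, sd = three_point_estimate(20, 32, 60)
print(f"expected ~{expected:.0f} h, likely range ~{expected - 2*sd:.0f}-{expected + 2*sd:.0f} h")
```

Reporting the range rather than the single expected value lines up with the advice on the following slides to avoid single-point estimates and to use ranges of values.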

Karl Wiegers's Estimation Safety Tips
• A goal is not an estimate
• The estimate you produce should be unrelated to what you think the requester wants to hear
• The correct answer to any request for an estimate is "Let me get back to you on that"
• Avoid giving single-point estimates
• Incorporate contingency buffers into estimates

Page 52: Measurement and Metrics for Test Managers

Rick Craig's Tips for Better Estimates
• Do it!
• Collect metrics
• Remember the "fantasy" factor
• Don't "pad" your estimates*
• Don't spend a ton of time
• Estimates don't have to be perfect
  – Estimates are just estimates
  – They will change constantly as you re-estimate
  – Remember planning risks and contingencies
  – Remember Brooks's Law
• If the date is fixed, estimate something else
• Use tools
• Use ranges of values instead of discrete numbers