Measurement and Metrics for Test Managers

© 2013 SQE Training V3.1

Uploaded by techwellpresentations on 06-May-2015

DESCRIPTION

To be most effective, test managers must develop and use metrics to help direct the testing effort and make informed recommendations about the software’s release readiness and associated risks. Because one important testing activity is to “measure” the quality of the software, test managers must measure the results of both the development and testing processes. Collecting, analyzing, and using metrics is complicated because many developers and testers are concerned that the metrics will be used against them. Join Rick Craig as he addresses common metrics—measures of product quality, defect removal efficiency, defect density, defect arrival rate, and testing status. Learn the guidelines for developing a test measurement program, rules of thumb for collecting data, and ways to avoid “metrics dysfunction.” Rick identifies several metrics paradigms—including Goal-Question-Metric—and discusses the pros and cons of each. Delegates are urged to bring their metrics problems and issues for use as discussion points.

TRANSCRIPT

Page 1: Measurement and Metrics for Test Managers

Introduction

Page 2: Measurement and Metrics for Test Managers


Administrivia

Course timing

Electronic devices

Smoking

Meals

Facilities

Breaks


Page 3: Measurement and Metrics for Test Managers


Course Agenda

1. Introduction to Software Measurement
2. Metrics—Rules of Thumb
3. A Tester's Dashboard
4. Estimation (Optional)

1 INTRODUCTION TO SOFTWARE MEASUREMENT

Page 4: Measurement and Metrics for Test Managers


What is software measurement?

   

"It's easy to get numbers; what is hard is to know they are right and understand what they mean."

— Bill Hetzel

What is software measurement?

"Quantified observations" about any aspect of software (product, process, or project)

Page 5: Measurement and Metrics for Test Managers


Lord Kelvin

"To measure is to know."

"If you cannot measure it, you cannot improve it."

"The more you understand what is wrong with a figure, the more valuable that figure becomes."

There Are Lots and Lots of Measures

Primitive:
– Aspirins consumed this week
– Number of staff assigned to project A
– Pages of requirements specifications
– Hours worked to accomplish change request X
– Number of operational failures in system Y this year
– Lines of code in program Z

Computed:
– Defects per 1,000 lines of code in program A
– Productivity in function points delivered by person B
– Quality score for project C
– Average coffee consumption per line of code
– Accuracy of hours worked per week is ±20%

Page 6: Measurement and Metrics for Test Managers


Common Metrics
• Test defects
• Defects after release
• Open problems
• Open issues
• Schedule performance
• Process compliance (e.g., ISO)
• Test results
• Reliability
• Time fixing problems
• Defects from fixes
• Lines of code
• Plan and schedule changes

Uncommon Metrics

• Code coverage
• Complexity
• Cost of rework
• Cost of quality
• Defect age

Page 7: Measurement and Metrics for Test Managers


Basic Definitions

The four Ms:
• Measure
• Metric
• Meter
• Meta-measure

Primitive (raw data), e.g., 13, 34, 17, 74, 42, … vs. computed (information)

What Makes a Good Measure?

• Simple
• Objective
• Easily collected
• Robust
• Valid

Page 8: Measurement and Metrics for Test Managers


What Can Measures Do for You?
• Facilitate estimation
• Identify risky areas
• Measure testing status
• Measure/predict product quality
• Measure test effectiveness
• Identify training opportunities
• Identify process improvement opportunities
• Provide "meters" to flag actions

2 METRICS—RULES OF THUMB

Page 9: Measurement and Metrics for Test Managers


Metrics—Rules of Thumb

• The Human Element
• The Basics
• KISS
• And a Myth or Two

The Human Element

• Without buy-in, metrics may be falsified
• Without buy-in, metrics may be ignored

Buy-in is key

Page 10: Measurement and Metrics for Test Managers


Class Discussion

How do you obtain buy-in?

Ways to Obtain Buy-in

• Training
• Metrics
• Feedback loops
• Reviews
• Participation

Page 11: Measurement and Metrics for Test Managers


The Human Element

• Measure processes and products instead of people if possible

• Beware of the dark side of the Hawthorne Effect

Two Sides of Measurement

…the information may be used against me.

…the information will help me understand what is going on and do a better job.

Page 12: Measurement and Metrics for Test Managers


The Hawthorne Effect

Measuring people improves their productivity

The Human Element

Tailor metrics to the audience: users, managers, and practitioners all have different languages.

Set the appropriate level of detail.

How you present the material matters.

Page 13: Measurement and Metrics for Test Managers


Who is your audience?

Developers

Testers

Users


% of Red Cars Soars

[Bar chart: 2008 = 25.5%, 2009 = 26.0%, 2010 = 25.4%, plotted on a y-axis running only from about 25.1 to 26.1, so a small change looks dramatic]

Page 14: Measurement and Metrics for Test Managers


% of Red Cars Soars?

[The same data (25.5%, 26.0%, 25.4%) plotted on a y-axis from 0 to 100, where the change all but disappears]

The Human Factor

Training is required; metrics are not second nature.

Your metrics are affected by how they are collected.

Establish a range of expected values.

Publish historical values.

Page 15: Measurement and Metrics for Test Managers


The Basics

• Use a metric to validate a metric
• Use meta-measures
• Use meters when possible
• Consistency sometimes trumps accuracy
• Subjective is good; objective is better

KISS ― Keep It Simple Sir
• More is not always better
• All metrics are not forever
  – Consider temporary metrics
  – Consider sampling
• Automate collection when possible

Page 16: Measurement and Metrics for Test Managers


3 A TESTER’S DASHBOARD

Page 17: Measurement and Metrics for Test Managers


A Dashboard


Establish a Dashboard

• Easy to use/understand at a glance:
  – Quality of product
  – Status
  – Test effectiveness
  – Resources
  – Issues

* Remember: you need at least two metrics per "instrument"

Page 18: Measurement and Metrics for Test Managers


Measures of Quality

• It is difficult to develop practical measures of quality

• The cost to achieve various quality levels must be taken into account

• Many quality metrics are relatively subjective

• Quality goals will be affected by the industry and corporate culture

What Is Quality?

Quality: meeting requirements (stated and/or implied)

Page 19: Measurement and Metrics for Test Managers


Sample Quality Factors and Criteria
• Correctness
• Reliability
• Testability
• Flexibility
• Usability
• Portability
• Interoperability
• Efficiency
• Integrity
• Maintainability
• Revisability
• Survivability

Defect Density/Clustering

[Bar chart: defects per 1,000 lines of code by module (D, B, A, C, E, F), showing defects clustering in a few modules]

Page 20: Measurement and Metrics for Test Managers


Defect Density

Issues:
• Coverage of tests
• Weighting of defects
• Weighting by relative risk
• What to use as the denominator
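The arithmetic itself is trivial; the issues above are about what goes into it. A minimal sketch in Python (the module names and counts are invented for illustration):

```python
# Defect density: defects per 1,000 lines of code (KLOC).
# Module names and counts below are illustrative, not from a real project.
def defect_density(defects, lines_of_code):
    """Return defects per 1,000 lines of code."""
    return defects / (lines_of_code / 1000)

modules = {"A": (12, 4000), "B": (30, 5000), "C": (3, 6000)}
densities = {name: defect_density(d, loc) for name, (d, loc) in modules.items()}

# Ranking modules by density surfaces the clustering the chart illustrates.
riskiest = max(densities, key=densities.get)
```

Whatever denominator you choose (LOC, function points, requirements), use it consistently so densities are comparable across modules.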

Effect of Complexity on Quality

[Chart: probability of a post-release defect rises with complexity]

Page 21: Measurement and Metrics for Test Managers


Other Measures of Product Quality
• Customer satisfaction
• Repeat customers?
• Referrals?
• Calls to the help desk?
• Timeliness?
• Defect age?
• Complexity?
• Rework?
• Reliability?

Quality of Product

• Record any current measures of product quality that you are using here. Give them a grade for effectiveness (A, B, C, etc.)

• Any new metrics you would use?

* Remember: you need at least two metrics per instrument

Page 22: Measurement and Metrics for Test Managers


Establish a Dashboard

• Easy to use/understand at a glance:
  – Quality of product
  – Status
  – Test effectiveness
  – Resources
  – Issues

* Remember: you need at least two metrics per instrument

Status Reporting
• The Master Test Plan should specify:
  – What to report
  – How often
  – To whom

Page 23: Measurement and Metrics for Test Managers


Common Test Status Metrics

% of Test Cases Executed

Issues:
• Weighting of TC by coverage metrics
• Weighting of TC by risk
• Weighting of TC by execution effort
• Weighting of TC by time to execute

What do you really want to know?

Sample Test Status Report (raw data)

Project: Online-Trade    Date: 4/23/2009

Feature      Total Tests   # Complete   % Complete   # Success   % Success
Open Acct    46            46           100          41          89
Sell Order   36            25           69           25          69
Buy Order    19            17           89           12          63
…            …             …            …            …           …
Totals       395           320          81           311         79
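The percentage columns in a report like this are derived from the raw counts. A small sketch, using the feature figures from the sample report above:

```python
# Derive % Complete and % Success from the raw counts in the status report.
features = [
    # (feature, total tests, # complete, # success)
    ("Open Acct", 46, 46, 41),
    ("Sell Order", 36, 25, 25),
    ("Buy Order", 19, 17, 12),
]

def status_row(total, complete, success):
    """Return (% complete, % success), rounded to whole percents as in the report."""
    pct = lambda n: round(100 * n / total)
    return pct(complete), pct(success)

rows = {name: status_row(total, c, s) for name, total, c, s in features}
```

Note that "% success" here is success against total tests, not against tests executed; a real report should state which denominator it uses.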

Page 24: Measurement and Metrics for Test Managers


Open and Closed Over Time

[Chart 1: defects per week over 20 weeks, showing incoming, fixed, and released curves]

[Chart 2: cumulative defects over 24 days, showing detected vs. open]
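The "open" curve in a chart like the second one can be derived from the first chart's data: at any point, open = cumulative detected minus cumulative resolved. A sketch with invented weekly counts:

```python
from itertools import accumulate

# Weekly incoming and fixed defect counts (illustrative numbers only).
incoming = [5, 12, 20, 18, 10, 6]
fixed    = [0,  4, 10, 15, 14, 9]

cum_detected = list(accumulate(incoming))
cum_fixed    = list(accumulate(fixed))

# Open defects at the end of each week.
open_defects = [d - f for d, f in zip(cum_detected, cum_fixed)]
# A falling open-defect curve late in testing is one sign of convergence.
```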

When Is the Software "Good Enough"? (When to stop testing)

• Test exit criteria met
• Return on investment (ROI) not sufficient
• Defect arrival rate
• Resources exhausted
  – Time
  – Money
• Profiles (based on failures encountered using profiles of real data)
• Project cancelled!

Page 25: Measurement and Metrics for Test Managers


Software Psychology

What is "good enough"?

[Chart: # of bugs found vs. time, tailing off as testing continues]

Economics of Test and Failure

Source: IBM Systems Sciences Institute


Page 26: Measurement and Metrics for Test Managers


Stopping Criteria ― Revisited

Abnormal:
• Resource exhaustion
  – Schedule
  – Budget
  – System access
  – Patience
• Project redirection

Normal:
• Test set exit criteria
• Remaining defects estimation criteria
  – Defect history of past software
  – Defect history of current item
  – Software complexity
  – Combination of these
• Diminishing return criteria
  – Cost to detect next defect
• Combined criteria

"There is no single, valid, rational criterion for stopping. Furthermore, given any set of applicable criteria, how each is weighed depends very much on the product, the environment, the culture, and the attitude to risk."

— Boris Beizer

Test Summary Report
• Report identifier
• References
  – Test items (with revision #s)
  – Environments
  – References
• Variances (deviations)
  – From test plan or requirements
  – Reasons for deviations
• Summary of incidents
  – Resolved incidents
  – Defect patterns
  – Unresolved incidents
• Adequacy assessment
  – Evaluation of coverage
  – Identify uncovered attributes
• Summary of activities
  – System/CPU usage
  – Staff time
  – Elapsed time
• Software evaluation
  – Limitations
  – Failure likelihood
• Approvals

Page 27: Measurement and Metrics for Test Managers


Status
• Record any current test status measures that you are using here. Give them a grade for effectiveness (A, B, C, etc.)

• Any new metrics you would use?

* Remember: you need at least two metrics per instrument

Establish a Dashboard

• Easy to use/understand at a glance:
  – Quality of product
  – Status
  – Test effectiveness
  – Resources
  – Issues

* Remember: you need at least two metrics per instrument

Page 28: Measurement and Metrics for Test Managers


How Do You Measure Test Effectiveness?


A Common Answer
– Coverage
– Defect age (phase or product version)
– # of bugs
– Defect density
– Defect removal efficiency
– Defect seeding
– Mutation analysis
– Customer complaints

Page 29: Measurement and Metrics for Test Managers


Three Major Categories


Customer Satisfaction Measures

Issues:
• Who to ask
• "After the fact"
• Difficulty in measuring
• Doesn't differentiate between the effectiveness of development and testing

Page 30: Measurement and Metrics for Test Managers


Customer Satisfaction Measures
• Subjective is good
• Objective is better

Defect Measures

• Why is it important to track defects?
• What are some ways to analyze defects?
• DDP
• Defect density
• Defect age

Page 31: Measurement and Metrics for Test Managers


Why is it important to track defects?
• Identify process improvement
• Identify training needs
• Identify problematic (high-risk) areas
• Determine test status

Defect Analysis ― Example

• Phase
• Type
• Severity
• Priority
• Author
• Age
• Module

Page 32: Measurement and Metrics for Test Managers


Defect Detection Percentage (DDP)

DDP = (Defects Discovered / Defects at Start) × 100%

"85% is the average DRE for US software projects greater than 1,000 function points in size."

— Capers Jones
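As a formula in code (the counts are hypothetical; "defects at start" can only be approximated after release, since it includes defects that escaped testing):

```python
def ddp(defects_found_in_test, defects_found_after_release):
    """Defect Detection Percentage: share of all known defects caught by testing.
    'Defects at start' is approximated as everything found in test plus
    everything that later escaped to the field."""
    total = defects_found_in_test + defects_found_after_release
    return 100.0 * defects_found_in_test / total

# Illustrative: 170 defects found in test, 30 escaped to production,
# giving 85%, which matches the Capers Jones industry average quoted above.
rate = ddp(170, 30)
```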

Defect Detection Percentage (DDP)

Issues:
• Severity and distribution of defects
• How to know when all bugs are found
• "After the fact"
• What constitutes bug-finding activities?
• Some bugs cannot be found in testing

Page 33: Measurement and Metrics for Test Managers


Defect "Value" (Cost Avoidance)

When discovered          Typical hours to rework/fix
Requirements             1
High-level design        1
Detailed design          1
Code                     1
Unit test                3–5
Integration test         5–10
System/acceptance test   10–30
Production               20–60+

Defect Age (PhAge)

Phase created ↓ \ Phase discovered →   Req  HLD  DD  Code  Unit  Int  Sys  Acc  Pilot  Prod

Requirements                            0    1    2   3     4     5    6    7    8      9
High-level design                            0    1   2     3     4    5    6    7      8
Detailed design                                   0   1     2     3    4    5    6      7
Coding                                                0     1     2    3    4    5      6
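The PhAge value in the matrix is just the distance between two phases in the lifecycle ordering. A sketch:

```python
# PhAge: how many phases a defect survived before being found.
PHASES = ["requirements", "high-level design", "detailed design", "coding",
          "unit testing", "integration testing", "system testing",
          "acceptance testing", "pilot", "production"]

def phage(created, discovered):
    """Phase age = index of phase discovered minus index of phase created."""
    age = PHASES.index(discovered) - PHASES.index(created)
    if age < 0:
        raise ValueError("a defect cannot be discovered before it was created")
    return age

# A requirements defect found in system testing has PhAge 6.
```

Averaging PhAge across all defects, possibly weighted by severity, gives a single number for how early the process catches problems.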

Page 34: Measurement and Metrics for Test Managers


Defect Age

Issues:
• Difficult to do root cause
• Requires weighting of defects
• How to handle latent/masked defects

Coverage Measures

Discussion:
• Requirements vs. design vs. code coverage
• Completeness/accuracy of test basis
• Coverage of test set vs. coverage of tests executed (e.g., we don't always run every test)
• Coverage vs. actual results (DDP)

Page 35: Measurement and Metrics for Test Managers


Mapping Test Cases to Requirements

Requirements spec.   Test plan
3.5.1.3.2        →   Test Case #3
…
3.5.1.4.7        →   Test Case #5
…
3.6.4.2.1        →   Test Case #12
…
3.8.2.7.1        →   Test Case #19

Requirements/Design Coverage

Conceptual model of requirements/design coverage:

Test Case        1    2    3    Covered?
Requirement A    X    X         Y
Requirement B                   N
Requirement C    X    X         Y
Feature A        X              Y
Feature B        X    X         Y
Design A         X    X         Y
Design B         X              Y
Design C                        N
Design D         X    X         Y
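The conceptual model above amounts to a traceability matrix: an item is "covered" if at least one test case maps to it. A minimal sketch (the mappings are invented to mirror the shape of the model, not real data):

```python
# Traceability: which test cases exercise which requirements/design items.
covers = {
    1: {"Req A", "Req C", "Feature A", "Design A"},
    2: {"Req A", "Req C", "Design A", "Design B"},
    3: {"Feature B", "Design D"},
}

items = {"Req A", "Req B", "Req C", "Feature A", "Feature B",
         "Design A", "Design B", "Design C", "Design D"}

covered = set().union(*covers.values())
uncovered = items - covered          # items with no test at all
coverage_pct = 100 * len(covered & items) / len(items)
```

The uncovered set is usually the actionable output: each entry is either a gap in the test set or a requirement that was never testable to begin with.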

Page 36: Measurement and Metrics for Test Managers


Requirements/Design Coverage

Issues:
• Only as good as the test basis
• Relatively low coverage of code

Code coverage achieved with requirements tests:
  Major bank (20 apps): 20%
  Major DBMS vendor: 47%
  Major h/w s/w vendor: 60%

— Source: Bender and Associates

Code Coverage

Conceptual model of code coverage:

Test run       1     2     3     Covered?
Statement A    X     X     X     Y
Statement B    X           X     Y
Statement C    X                 Y
Statement D                      N
Statement E                X     Y
Coverage       60%   20%   60%   80%
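Per-run and cumulative statement coverage can be computed directly from a table like the one above:

```python
# Statements covered by each test run, matching the conceptual model above.
runs = {
    1: {"A", "B", "C"},
    2: {"A"},
    3: {"A", "B", "E"},
}
statements = {"A", "B", "C", "D", "E"}

per_run = {r: 100 * len(s) / len(statements) for r, s in runs.items()}
cumulative = 100 * len(set().union(*runs.values())) / len(statements)
# Run 2 alone covers 20%, but together the runs reach 80%;
# statement D is never executed by any test.
```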

Page 37: Measurement and Metrics for Test Managers


Code Coverage

Issues:
• Requires a tool
• Doesn't prove the code actually "works" correctly
• Did we test the "right code"?
• Statement vs. branch vs. path

Test Effectiveness
• Record any current test effectiveness measures that you are using here. Give them a grade for effectiveness (A, B, C, etc.)

• Any new metrics you would use?

* Remember: you need at least two metrics per instrument

Page 38: Measurement and Metrics for Test Managers


Establish a Dashboard

• Easy to use/understand at a glance:
  – Quality of product
  – Status
  – Test effectiveness
  – Resources
  – Issues

* Remember: you need at least two metrics per instrument

Resources
• Resource estimates/consumption are necessary in order to do test planning, estimation, budgeting, and staffing

• You must consider the level of granularity in the collection of these metrics, based on the accuracy of the required metrics and your ability to validate them

• Some people choose to exclude the resources instrument from the dashboard because they feel it is not a "day to day" metric

Page 39: Measurement and Metrics for Test Managers


Resources

Resource metrics are normally collected in terms of:
• Actual/expected budget
• Actual/expected engineering hours
• Test environment utilization/availability
• Staffing levels
• Contractor availability
• Other hardware/software resources

Resources
• Record any current resource measures that you are using here. Give them a grade for effectiveness (A, B, C, etc.)

• Any new metrics you would use?

* Remember: you need at least two metrics per instrument

Page 40: Measurement and Metrics for Test Managers


Establish a Dashboard

• Easy to use/understand at a glance:
  – Quality of product
  – Status
  – Test effectiveness
  – Resources
  – Issues

* Remember: you need at least two metrics per instrument

Issues

• This is included to address any important items not otherwise included on the dashboard. These are normally subjective and not necessarily conducive to systematic analysis

• Issues could involve training, installation of new hardware/software, politics—even the weather

Page 41: Measurement and Metrics for Test Managers


A Sample Tester's Dashboard

Status: % completion, defect info
Product Quality: defect density; performance, etc.
Test Effectiveness: DDP, coverage
Resources: engineering hours, money
Issues

Avoiding Dysfunction

• Measure processes and products—not people!
• Beware of the dark side of the Hawthorne Effect
• Remember that more is not always better
• Avoid the exclusive use of top-down metrics
• Provide training—not all metrics are intuitive
• Consider temporary metrics

Page 42: Measurement and Metrics for Test Managers


Avoiding Dysfunction

• Define each metric, its use, who will see it, expected ranges, etc.
• Remember your audience and tailor to their needs
• Always seek multiple interpretations
• Ask your audience what their interpretation of a metric is before you offer yours
• Sell, sell, sell, sell

One Truth and One Myth in Closing

The Truth, Gilb's Law: "Anything you need to quantify can be measured in some way that is superior to not measuring it at all."
— Tom Gilb

The Myth: "Some metrics are always better than no metrics …"

Page 43: Measurement and Metrics for Test Managers


4 ESTIMATION (OPTIONAL)

Page 44: Measurement and Metrics for Test Managers


Estimation

Estimate:
1. A tentative evaluation or rough calculation
2. A preliminary calculation of the cost of a project
3. A judgment based upon one's impressions; opinion
— The American Heritage Dictionary

"It is very difficult to make a vigorous, plausible, and job-risking estimate that is derived by no quantitative method, supported by little data and certified chiefly by hunches of the managers."
— Fred Brooks

Test Estimation

Estimation: the creation of an approximate target for costs and completion dates.

The best estimates:
• Represent the collective wisdom of practitioners and have their buy-in
• Provide specific, detailed catalogs of the costs, resources, tasks, and people involved
• Present, for each activity estimated, the most likely cost, effort, and duration

Page 45: Measurement and Metrics for Test Managers


Test Estimation (cont.)

Factors that can influence cost, effort, and duration include:
• Required level of quality of the system
• Size of the system to be tested
• Historical data
• Process factors (process maturity, etc.)
• Material factors (tools, data, etc.)
• People factors (skills, experience, managers, etc.)

Test Estimation (cont.)

• Delivery of estimates should include justification
• Negotiation and re-work of estimates is normal
• Final estimates represent a balance of organizational and project goals in the areas of quality, schedule, budget, and features

Page 46: Measurement and Metrics for Test Managers


How Good Is Our Industry (at Estimating)?
• Tata: 62% of projects fail to meet schedule; 49% have budget overruns
• Moløkken and Jørgensen: 30–40% overruns

Class Discussion

Why is estimating not done well? Your top five reasons:

1) Too many variables____________________
2) ____________________________________
3) ____________________________________
4) ____________________________________
5) ____________________________________

Page 47: Measurement and Metrics for Test Managers


Why Estimates Are Inaccurate ― Part I
• Lack of estimating experience
• Lack of historical data on which to base estimates
• Lack of systematic estimation process, sound techniques, or models suited to the project
• Failure to include essential activities and products within the scope of the estimates
• Unrealistic expectations or assumptions
• Failure to recognize and address the uncertainty inherent in project estimates

— Practical Software Measurement, Addison-Wesley, 2001

Why Estimates Are Inaccurate ― Part II
• Lack of education and training
• Confusing the target with the estimate
• Hope-based planning
• Inability to communicate and support estimates
• Incomplete, changing, and creeping requirements
• Quality surprises (test and re-fix)

— adapted from Linda M. Laird, The Limitations of Estimation

Page 48: Measurement and Metrics for Test Managers


Boehm's Cone of Uncertainty

NHC Track Forecast Cone


Page 49: Measurement and Metrics for Test Managers


"Testing" Track Forecast Cone
(or why it is important to constantly re-estimate)

[Chart: tasks vs. time, with best-case, expected-case, and worst-case trajectories fanning out like a hurricane forecast cone]

The Fantasy Factor

[Chart: a nine-week timeline showing 1st, 2nd, and 3rd estimates slipping as "today" advances]

What would have to happen to deliver this in four weeks?

What should the estimate have been?

Page 50: Measurement and Metrics for Test Managers


Estimation

1, 2, 3, or 4 variables + many modifiers: Time, Size, Resources

If it's not variable, then it's fixed.

Time vs. Resources


Page 51: Measurement and Metrics for Test Managers


Test Estimation Techniques ― Examples
• Intuition and guess
• Work-breakdown structures
• Three-point estimates
• Company standards and norms
• % of project effort or staffing
• Industry averages and predictive models (e.g., FP, TPA)
• Team estimation sessions
  – Wideband Delphi
  – Story point sizing
  – Poker estimation
  – T-shirt sizing
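One of the techniques listed, the three-point estimate, is often combined with the PERT weighting. A sketch (the task numbers are invented):

```python
def three_point(optimistic, most_likely, pessimistic):
    """PERT-weighted three-point estimate and its standard deviation."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev

# Illustrative: a testing task estimated at 4 days best case,
# 6 days most likely, 14 days worst case.
expected, sd = three_point(4, 6, 14)
# Reporting a range such as "7 ± 1.7 days" is a range, not a
# single-point estimate.
```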

Karl Wiegers's Estimation Safety Tips

• A goal is not an estimate
• The estimate you produce should be unrelated to what you think the requester wants to hear
• The correct answer to any request for an estimate is "Let me get back to you on that"
• Avoid giving single-point estimates
• Incorporate contingency buffers into estimates

Page 52: Measurement and Metrics for Test Managers


Rick Craig's Tips for Better Estimates
• Do it!
• Collect metrics
• Remember the "fantasy" factor
• Don't "pad" your estimates*
• Don't spend a ton of time
• Estimates don't have to be perfect
  – Estimates are just estimates
  – They will change constantly as you re-estimate
  – Remember planning risks and contingencies
  – Remember Brooks's Law
• If the date is fixed, estimate something else
• Use tools
• Use ranges of values instead of discrete numbers