TRANSCRIPT
Empirical Studies in Test-Driven Development
Laurie Williams, NCSU, [email protected]
© Laurie Williams 2007
Agenda
• Overview of Test-Driven Development (TDD)
• TDD Case Studies
• TDD within XP Case Studies
• Summary
Test-driven Development
Overview of TDD via JUnit
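The slide itself shows a live JUnit walkthrough; as a stand-in, here is a minimal sketch of the TDD micro-cycle it illustrates: write a failing test first (“red”), then just enough code to pass (“green”), then refactor. The ShoppingCart class and its API are hypothetical, not from the talk.

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

/**
 * Illustrative only: this test is written first and fails ("red") until
 * ShoppingCart below is implemented ("green"); then refactor.
 * ShoppingCart is a hypothetical example class, not from the talk.
 */
public class ShoppingCartTest {

    @Test
    public void totalSumsItemPrices() {
        ShoppingCart cart = new ShoppingCart();
        cart.add("pen", 150);        // prices in cents
        cart.add("pad", 350);
        assertEquals(500, cart.total());
    }
}

/** Just enough production code to make the test pass. */
class ShoppingCart {
    private int total;
    void add(String name, int priceInCents) { total += priceInCents; }
    int total() { return total; }
}
```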
xUnit tools
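The xUnit family (JUnit, NUnit, CppUnit, HttpUnit, and so on) shares the same basic shape: a per-test fixture created in setup and released in teardown, plus assertions. A minimal sketch in JUnit 4, using java.util.Stack as the code under test:

```java
import static org.junit.Assert.assertEquals;

import java.util.Stack;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

/**
 * Illustrative only: the common xUnit pattern of fixture setup/teardown
 * around each test method, shown here with JUnit 4 annotations.
 */
public class StackTest {

    private Stack<Integer> stack;

    @Before
    public void setUp() {                 // fixture creation before each test
        stack = new Stack<Integer>();
    }

    @Test
    public void pushThenPopReturnsSameValue() {
        stack.push(42);
        assertEquals(Integer.valueOf(42), stack.pop());
    }

    @After
    public void tearDown() {              // fixture cleanup after each test
        stack = null;
    }
}
```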
Agenda
• Overview of Test-Driven Development (TDD)
• TDD Case Studies
• TDD within XP Case Studies
• Summary
Structured Experiment
• 3 companies, 24 professional developers, all working in pairs: RoleModel, John Deere, Ericsson
• Random assignment to groups: test-first vs. test-last
  – The test-last group “refused” to write automated JUnit tests, becoming in effect test-never
• Test-first results:
  – 18% more manual, black-box test cases passed
  – 16% more time taken
  – Good test coverage (98% method, 92% statement, 97% branch)
IBM Retail Device Drivers
[Two charts across releases 1-10: number of developers per release at the Raleigh, Guadalajara, and Martinez sites, and lines of code per release, split into source and test.]
IBM research partners: Julio Sanchez (Mexico) and Michael Maximilien (Almaden)
Building Test Assets
[Chart: Test LOC/Source LOC across releases 1-10; y-axis from 0 to 0.8.]
Defect Density
[Chart: defects/KLOC across releases 1-10, for internal and external defects.]
The New Anti-aging Formula?
[Chart: cyclomatic complexity and Test LOC/Source LOC across releases 1-10.]
Lessons Learned
1. A passionate champion for the practice and tools is necessary.
2. JUnit provides a well-structured framework relative to “ad hoc” automated testing.
3. JUnit can be extended to handle necessary manual intervention (see the sketch after this list).
4. Not all tests can be automated.
5. Create a good design using object-oriented principles.
6. Execute all tests prior to checking code into code base.
7. Write tests prior to code.
8. Institute a nightly build that includes running all the unit tests.
9. Create a test whenever a defect is detected internally or externally.
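Lesson 3 deserves a concrete shape. Below is a minimal sketch, not the IBM team’s actual extension, of one way to fold a manual step into a JUnit test: the test automates what it can and asks the operator to confirm the physical result. The operatorConfirms helper and the receipt-printing scenario are hypothetical.

```java
import static org.junit.Assert.assertTrue;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.junit.Test;

/**
 * Illustrative only: one way to extend JUnit for tests that need a manual
 * step, per lesson 3. The prompt/confirm helper and the receipt-printer
 * scenario are hypothetical, not from the IBM study.
 */
public class ReceiptPrinterManualTest {

    /** Ask the human tester to perform a step and confirm it succeeded. */
    private boolean operatorConfirms(String instruction) throws IOException {
        System.out.println("MANUAL STEP: " + instruction + " [y/n]");
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String answer = in.readLine();
        return answer != null && answer.trim().equalsIgnoreCase("y");
    }

    @Test
    public void printedReceiptIsLegible() throws IOException {
        // Automated part: drive the device driver under test.
        // new ReceiptPrinter().print("TOTAL $10.00");   // hypothetical API
        // Manual part: only a person can judge the physical output.
        assertTrue("Operator rejected the printed receipt",
                operatorConfirms("Check that the receipt printed legibly."));
    }
}
```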
Test-first Performance
[Flowchart: the test-first performance process. Feature loop: design tests, implement a feature, and run the functional tests; once they pass, execute the tests for performance-critical devices, producing a performance log file; if performance problems are found, fix them before checking in; repeat until the project is finished. Master loop: design tests, execute the master test suite, and take performance measurements; if performance problems are found, make suggestions for performance improvement; repeat until the project is finished.]
Research partners: Chih-wei Ho; IBM: Mike Johnson and Michael Maximilien
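To make the flow above concrete, here is a minimal sketch of a performance expectation expressed as an executable JUnit test, written before tuning so the suite flags regressions. The LinePrinter class and the 40 lines/second threshold are hypothetical, not from the study.

```java
import static org.junit.Assert.assertTrue;

import org.junit.Test;

/**
 * Illustrative only: an executable performance expectation in the spirit
 * of the flow above. LinePrinter and the throughput threshold are
 * hypothetical stand-ins, not the study's actual device driver or target.
 */
public class PrinterThroughputTest {

    private static final int LINES = 400;
    private static final double REQUIRED_LINES_PER_SECOND = 40.0;

    @Test
    public void sustainsRequiredThroughput() {
        LinePrinter printer = new LinePrinter();          // hypothetical driver
        long start = System.nanoTime();
        for (int i = 0; i < LINES; i++) {
            printer.printLine("line " + i);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        double throughput = LINES / seconds;
        // Log the measurement so nightly runs build a performance history.
        System.out.printf("throughput = %.1f lines/s%n", throughput);
        assertTrue("Below required throughput: " + throughput,
                throughput >= REQUIRED_LINES_PER_SECOND);
    }

    /** Stand-in for the real device driver; replace with the actual API. */
    private static class LinePrinter {
        void printLine(String s) { /* would send the line to the device */ }
    }
}
```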
Sample Results
[Two bar charts, “Printer Throughput Tests (taller is better)”: throughput for the ADept, ASmkt, MaxS, and SDept devices across scenarios S1-S4, and throughput for the same devices in scenario S3 versus the previous release.]
Agenda
• Overview of Test-Driven Development (TDD)
• TDD Case Studies
• TDD within XP Case Studies
• Summary
IBM: XP-Context Factors (XP-cf)
• Small team (7-10)
• Co-located
• Web development (toolkit)
• Supplier and customer distributed (US and overseas)
• Examined one release “old” (low XP) to the next “new” (more XP)
IBM: XP-Adherence Metrics (XP-am)
XP-am Metric                               Practice           Old        New
Automated test classes per user story      Testing            0.11       0.45
Test coverage (statement)                  Testing            30%        46%
Unit test runs per person-day              Testing            14%        11%
Test LOC/Source LOC                        Testing            0.26       0.42
Acceptance test execution                  Testing            Manual     Manual
Did customers run your acceptance tests?   Testing            No         No
Pairing frequency                          Pair Programming   <5%        48%
Release length                             Short Release      10 months  5 months
Iteration length                           Short Release      Weekly     Weekly
IBM: XP-Outcome Measures (XP-om)
XP Outcome Measure                                         Old    New
Response to customer change                                N/A    0.23
  (ratio: (user stories in + out) / total)
Pre-release quality (test defects/KLOEC of code)           1.0    0.50
Post-release quality (released defects/KLOEC of code)      1.0    0.61
Productivity: stories / PM                                 1.0    1.34
Productivity: relative KLOEC / PM                          1.0    1.70
Productivity: Putnam productivity parameter                1.0    1.92
Customer satisfaction                                      N/A    High (qualitative)
Morale (via survey)                                        1.0    1.11
Sabre-A: XP Context Factors (XP-cf)
Sabre-A: XP-Adherence Metrics (XP-am)
XP-am Metric                                  Practice           Old        New
Automated test classes per new/changed class  Testing            0.036      0.572
Test coverage (statement)                     Testing            N/A        32.9%
Unit test runs per person-day                 Testing            0          1.0
Test LOC/Source LOC                           Testing            0.054      0.296
Acceptance test execution                     Testing            Manual     Manual
Did customers run your acceptance tests?      Testing            No         No
Pairing frequency                             Pair Programming   0%         50%
Release length                                Short Release      18 months  3.5 months
Iteration length                              Short Release      --         10 days
Sabre-A: XP-Outcome Measures (XP-om)
XP Outcome Measure                                         Old    New
Response to customer change                                N/A    N/A
  (ratio: (user stories in + out) / total)
Pre-release quality (test defects/KLOEC of code)           1.0    0.35
Post-release quality (released defects/KLOEC of code)      1.0    0.70
Productivity: stories / PM                                 N/A    N/A
Productivity: relative KLOEC / PM                          1.0    1.46
Productivity: Putnam productivity parameter                1.0    2.89
Customer satisfaction                                      N/A    High (anecdotal)
Morale (via survey)                                        N/A    68.1%
Sabre-P: XP Context Factors (XP-cf)
• Medium-sized team (15)
• Co-located
• Large web application (1M LOC)
• Customers domestic and overseas
• Examined the 13th release of the product, 20 months after starting XP
Sabre-P: XP-Adherence Metrics (XP-am)
XP-am Metric                                  Practice           New
Automated test classes per new/changed class  Testing            0.0225
Test coverage (statement)                     Testing            7.7%
Unit test runs per person-day                 Testing            0.4
Test LOC/Source LOC                           Testing            0.296
Pair programming                              Pair Programming   70%
Release length                                Short Release      3 months
Iteration length                              Short Release      10 days
Sabre-P: XP-Outcome Measures (XP-om)
XP Outcome Measure           Bangalore SPIN benchmarking group   Capers Jones benchmarks
Pre-release defect density   Similar                             Lower
Total defect density         Lower                               Lower
Productivity                 Similar                             Higher
Tekelec: XP Context Factors (XP-cf)
• Small team (4-7; 2 during maintenance phase)
• Geographically distributed
  – Contractors in the Czech Republic working for a US development organization (Tekelec)
• Simulator for a telecommunications signal transfer point system (used to train new customers)
• Considerable requirements volatility
Tekelec: XP-Adherence Metrics (XP-am)
XP-am Metric                                  Practice           New
Automated test classes per new/changed class  Testing            1.0
Test coverage (statement)                     Testing            N/A
Unit test runs per person-day                 Testing            1/day for all; 1/hour for quick set
Test LOC/Source LOC                           Testing            0.91
Pair programming                              Pair Programming   77.5%
Release length                                Short Release      4 months
Iteration length                              Short Release      10 days
Tekelec: XP-Outcome Measures (XP-om)
Outcome Measure                                  F-15 project
Pre-release quality (test defects/KLOEC)         N/A
Post-release quality                             1.62 defects/KLOEC [lower than industry standards]
  (post-release defects/KLOEC)
Customer satisfaction (interview)                Capability: Neutral; Reliability: Satisfied;
                                                 Communication: Very Satisfied
Productivity                                     1.22 KLOEC/PM [lower than industry standards];
                                                 2.32 KLOEC/PM including test code [on par with
                                                 industry standards]
Empirical Studies of XP Teams
Agenda
• Overview of Test-Driven Development (TDD)
• TDD Case Studies
• TDD within XP Case Studies
• Summary
Summary
• Increased quality with “no” long-run productivity impact
• Valuable test assets created in the process
• Indications:
  – Improved design
  – Anti-aging