GTAC 2014: What Lurks in Test Suites?
DESCRIPTION
We all want "better" test suites. But what makes for a good test suite? Certainly, test suites ought to aim for good coverage, at least at the statement level. To be useful, test suites should run quickly enough to provide timely feedback. This talk investigates a number of other dimensions on which to evaluate test suites. It claims that better test suites are more maintainable, more usable (for instance, because they run faster or use fewer resources), and have fewer unjustified failures. In this talk, I'll present and synthesize facts about 10 open-source test suites (from 8,000 to 246,000 lines of code) and evaluate how they are doing.
TRANSCRIPT
Beyond Coverage: What Lurks in Test Suites?
Patrick Lam, @uWaterlooSE (and Felix Fang)
University of Waterloo
Test Suites: Myths vs Realities.
Subjects: Open-Source Test Suites
Basic Test Suite Properties
Benchmark sizes: 30 kLOC (google-visualization) to 495 kLOC (weka)
% of system represented by tests: 5.3% (weka) to 50.4% (joda-time)
Static Test Suite Properties
Test suite versus benchmark size
[Scatter plot of test suite size against benchmark size; fitted slopes m = 0.3002 and m = 0.03514]
# test cases versus # test methods
apache-commons-collections tests
Consider map.TestFlat3Map: it contains 14 test methods, yet 156 test cases.
superclass tests: 42 tests + 4 Apache Commons Collections “bulk tests”
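A minimal JUnit 3-style sketch of how this multiplication happens (class and method names here are illustrative, not apache-cc's actual code): every concrete subclass re-runs all inherited test methods, so a handful of local methods can expand into hundreds of executed test cases.

// Abstract superclass: each concrete subclass re-runs all of these tests.
public abstract class AbstractMapTest extends junit.framework.TestCase {
    protected abstract java.util.Map<String, String> makeMap();

    public void testPutThenGet() {
        java.util.Map<String, String> map = makeMap();
        map.put("key", "value");
        assertEquals("value", map.get("key"));
    }

    public void testEmptyOnCreation() {
        assertTrue(makeMap().isEmpty());
    }
}

// Zero test methods of its own, but two inherited test cases.
public class FlatMapTest extends AbstractMapTest {
    protected java.util.Map<String, String> makeMap() {
        return new java.util.HashMap<String, String>(); // stand-in implementation
    }
}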
Run-time Test Suite Properties
Test suites run quickly
joda-time 4.9s
jdom 5.0s
google-vis 5.1s
jgrapht 16.9s
weka 28.9s
apache-cc 34.0s
poi 36.5s
jmeter 53.0s
jfreechart 241.0s
Failing tests
[Table: failing tests per suite. Most suites report 0 failures; outliers: 76/384 and 3/1109; one suite: n/a]
Continuous Integration: Daily Builds
Continuous Integration: Daily Tests
(via SonarQube, Travis CI, Surefire)
Myth #1:
Coverage is a key property of test suites.
Coverage is central in textbooks
Ammann and Offutt, Introduction to Software Testing
Coverage metrics from EclEmma
Coverage metrics
Reality #1
Coverage is sometimes important, but tools only give limited data.
Guideline #1
Consider metrics beyond reported coverage results:
- weka uses peer review for QA
- not measured by tools: input space coverage (see the sketch below)
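A hypothetical sketch of the gap: one test can achieve 100% statement and branch coverage of abs() while never touching the overflow partition of its input space.

public class AbsCoverageTest extends junit.framework.TestCase {
    // Fully "covered" by the test below, yet the input partition
    // x == Integer.MIN_VALUE is never exercised.
    static int abs(int x) {
        if (x < 0) {
            return -x;
        }
        return x;
    }

    public void testAbs() {
        assertEquals(3, abs(3));   // covers the x >= 0 branch
        assertEquals(3, abs(-3));  // covers the x < 0 branch
        // untested: abs(Integer.MIN_VALUE) overflows and stays negative
    }
}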
Myth #2
Tests are simple.
- test complexity
- test dependencies
Static Code Complexity
Test methods with at least 5 asserts
e.g. from Joda-Time:
public void testEquality() {
    assertSame(getInstance(TOKYO), getInstance(TOKYO));
    assertSame(getInstance(LONDON), getInstance(LONDON));
    assertSame(getInstance(PARIS), getInstance(PARIS));
    assertSame(getInstanceUTC(), getInstanceUTC());
    assertSame(getInstance(), getInstance(LONDON));
}
% Test methods with ≥ 5 asserts
Test Methods with Branches
if (isAllowNullKey() == false) {
    try {
        assertEquals(null, o.nextKey(null));
    } catch (NullPointerException ex) {}
} else {
    assertEquals(null, o.nextKey(null));
}
// from apache-cc
Test Methods with Loops
counter = 0;
while (this.complexPerm.hasNext()) {
    this.complexPerm.getNext();
    counter++;
}
assertEquals(maxPermNum, counter);
// from jgrapht
% Test Methods with Control-Flow
Tests Which Use the Filesystem
Filesystem Usage Details
new File(tempDir, "tzdata");
verifies serialized collections on disk against canonical forms
More Filesystem Usage Details
resources, serialization
creates charts, tests their existence; some comparisons vs test data
Tests Which Use the Network
Network Usage Details
connects to http://sc.openoffice.org
tests HTTP mirror server at localhost
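A minimal sketch of that localhost pattern using the JDK's built-in com.sun.net.httpserver (an illustration, not jmeter's actual mirror-server code):

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class LocalMirror {
    public static HttpServer start() throws Exception {
        // Bind to an ephemeral port on localhost so tests never touch the real network.
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "ok".getBytes("UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server; // tests query server.getAddress().getPort() and issue requests
    }
}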
flip side: Mocks and Stubs
True mocks only in Google Visualization.
Found stubs/fakes in 4 other suites.
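The distinction, sketched with hypothetical names: a stub or fake just cans answers, while a true mock also records interactions so the test can verify them.

interface MailService { void send(String to, String body); }

// Stub/fake: canned behaviour, verifies nothing by itself.
class StubMailService implements MailService {
    public void send(String to, String body) { /* swallow the message */ }
}

// Hand-rolled mock: records interactions for later verification.
class MockMailService implements MailService {
    int sendCount = 0;
    String lastRecipient;
    public void send(String to, String body) {
        sendCount++;
        lastRecipient = to;
    }
    void verifySentOnceTo(String expected) {
        if (sendCount != 1 || !expected.equals(lastRecipient))
            throw new AssertionError("expected exactly one send to " + expected);
    }
}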
Reality #2
Test cases are mostly simple: few asserts, little branching, some filesystem/net usage.
Consequence #2
Many tests don’t need high expertise to write,
but some do!
Myth #3
Test cases are written by hand.
Types of reuse (standard Java)
1. test class setUp()/tearDown()
2. inheritance: e.g. in apache-cc, TestFastHashMap extends AbstractTestMap
3. composition: e.g. in jfreechart, helper class RendererChangeDetector
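A minimal sketch of reuse style 1, with illustrative names: the fixture lives in setUp()/tearDown() once instead of being repeated in every test method.

public class TempFileTest extends junit.framework.TestCase {
    private java.io.File tempFile;

    protected void setUp() throws Exception {
        // Runs before every test method.
        tempFile = java.io.File.createTempFile("fixture", ".txt");
    }

    protected void tearDown() {
        // Runs after every test method, even when a test fails.
        tempFile.delete();
    }

    public void testFileExists() {
        assertTrue(tempFile.exists());
    }

    public void testFileInitiallyEmpty() {
        assertEquals(0, tempFile.length());
    }
}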
JUnit setUp/tearDown usage
Inheritance is heavily used
(> 50% test classes inherit functionality)
Test Classes with Custom Superclasses
Helper Classes Example
from poi:
/** Test utility class to get Records
 *  out of HSSF objects. */
public final class RecordInspector {
    public static Record[] getRecords(...) {}
}
Helper Class Count
weka 1
google-vis 3
jdom 6
joda-time 7
jfreechart 7
jmeter 12
jgrapht 15
apache-cc 22
hsqldb 31
poi 54
Test Clone Example
public void testNominalFiltering() {
    m_Filter = getFilter(Attribute.NOMINAL);
    Instances r = useFilter();
    for (int i = 0; i < r.numAttributes(); i++)
        assertTrue(r.attribute(i).type() != Attribute.NOMINAL);
}

public void testStringFiltering() {
    m_Filter = getFilter(Attribute.STRING);
    Instances r = useFilter();
    for (int i = 0; i < r.numAttributes(); i++)
        assertTrue(r.attribute(i).type() != Attribute.STRING);
}
// from weka
Assertion Fingerprints
detect clones by identifying similar tests
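A toy sketch of the idea, not the actual analysis: fingerprint each test method by the sorted list of assertion calls it contains, and flag methods with identical fingerprints as clone candidates.

import java.util.*;
import java.util.regex.*;

public class AssertionFingerprints {
    private static final Pattern ASSERT_CALL =
        Pattern.compile("\\b(assert[A-Za-z]+|fail)\\s*\\(");

    // Fingerprint = sorted names of assertion calls in a method body.
    static List<String> fingerprint(String methodBody) {
        List<String> calls = new ArrayList<String>();
        Matcher m = ASSERT_CALL.matcher(methodBody);
        while (m.find()) calls.add(m.group(1));
        Collections.sort(calls);
        return calls;
    }

    public static void main(String[] args) {
        // The two weka methods above differ only in constants,
        // so their fingerprints match: both are [assertTrue].
        String nominal = "assertTrue(r.attribute(i).type() != Attribute.NOMINAL);";
        String string  = "assertTrue(r.attribute(i).type() != Attribute.STRING);";
        System.out.println(fingerprint(nominal).equals(fingerprint(string))); // true
    }
}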
Incidence of cloning
How to Refactor?
● setUp/tearDown/subclassing
● JUnit 4: Parameterized Unit Tests, Test Theories
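A self-contained JUnit 4 parameterized sketch (exercising Math.abs purely for illustration): each parameter row re-runs the single test method, replacing a family of cloned methods like the weka pair shown earlier.

import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class AbsTest {
    @Parameters
    public static Collection<Object[]> data() {
        // One row per formerly-cloned test method.
        return Arrays.asList(new Object[][] { { 3, 3 }, { -3, 3 }, { 0, 0 } });
    }

    private final int input;
    private final int expected;

    public AbsTest(int input, int expected) {
        this.input = input;
        this.expected = expected;
    }

    @Test
    public void absMatchesExpected() {
        assertEquals(expected, Math.abs(input));
    }
}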
apache-cc: Bulk tests
public BulkTest bulkTestKeySet() {
    return new TestSet(makeFullMap().keySet());
}
● runs all tests in the TestSet class with the object returned from makeFullMap().keySet()
jdom: Generated Test Case Stubs
class ClassGenerator makes e.g.:
class TestDocument {
    void test_TCC__List();
    void test_TCM__int_hashCode();
}
Developer still needs to populate tests.
Automated Testing Technology
In our test suites, the principal automation technology was cut-and-paste.
Reality #3
Automated test generation is uncommon in our test suites.
Guideline
Maximize reuse:
whatever works for you!
setUp/tearDown, inheritance, parameterized tests, ...
Suggestion
Use automated test generation tools! Some examples:
● Korat (structurally complex tests)
● Randoop (random testing)
● CERT Basic Fuzzing Framework
http://mit.bme.hu/~micskeiz/pages/code_based_test_generation.html
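For a flavour of what such tools emit, here is an illustrative sketch in the style of a Randoop-generated regression test (not actual tool output): a randomly chosen method sequence plus assertions capturing the behaviour observed at generation time.

import static org.junit.Assert.assertEquals;
import java.util.ArrayList;
import java.util.Collections;
import org.junit.Test;

public class RegressionTest0 {
    @Test
    public void test001() {
        // Randomly chosen method sequence...
        ArrayList<Integer> list = new ArrayList<Integer>();
        list.add(10);
        list.add(-4);
        Collections.sort(list);
        // ...with regression assertions recording observed behaviour.
        assertEquals(2, list.size());
        assertEquals(Integer.valueOf(-4), list.get(0));
    }
}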
Summary
Myths:
1. Coverage is a key property of test suites. ≈
2. Tests are simple. ✓
3. Tests are written by hand. ✓
Data: https://docs.google.com/spreadsheets/d/1xAsdk35tJAOM4WGbGloliS4ovDJ8_MDn6_Gzk0DXEZQ