GTAC 2014: What Lurks in Test Suites?
DESCRIPTION
We all want "better" test suites. But what makes for a good test suite? Certainly, test suites ought to aim for good coverage, at least at the statement level. To be useful, test suites should run quickly enough to provide timely feedback. This talk investigates a number of other dimensions on which to evaluate test suites. It claims that better test suites are more maintainable, more usable (for instance, because they run faster or use fewer resources), and have fewer unjustified failures. In this talk, I'll present and synthesize facts about 10 open-source test suites (from 8,000 to 246,000 lines of code) and evaluate how they are doing.
TRANSCRIPT
Beyond Coverage: What Lurks in Test Suites?
Patrick Lam, @uWaterlooSE (and Felix Fang)
University of Waterloo
Test Suites: Myths vs Realities.
Subjects: Open-Source Test Suites
Basic Test Suite Properties
Benchmark sizes: 30 kLOC (google-visualization) to 495 kLOC (weka)
% of system represented by tests: 5.3% (weka) to 50.4% (joda-time)
Static Test Suite Properties
Test suite versus benchmark size
[Scatter plot of test suite size against benchmark size; fitted slopes m = 0.3002 and m = 0.03514]
# test cases versus # test methods
apache-commons-collections tests
Consider map.TestFlat3Map: it contains 14 test methods, yet 156 test cases.
superclass tests: 42 tests + 4 Apache Commons Collections “bulk tests”
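A minimal JUnit 3-style sketch of how this multiplication happens (class and method names here are illustrative, not apache-cc's actual code): every concrete subclass re-runs all inherited test methods, so a handful of local methods can expand into hundreds of executed test cases.

// Abstract superclass: each concrete subclass re-runs all of these tests.
public abstract class AbstractMapTest extends junit.framework.TestCase {
    protected abstract java.util.Map<String, String> makeMap();

    public void testPutThenGet() {
        java.util.Map<String, String> map = makeMap();
        map.put("key", "value");
        assertEquals("value", map.get("key"));
    }

    public void testEmptyOnCreation() {
        assertTrue(makeMap().isEmpty());
    }
}

// Zero test methods of its own, but two inherited test cases.
public class FlatMapTest extends AbstractMapTest {
    protected java.util.Map<String, String> makeMap() {
        return new java.util.HashMap<String, String>(); // stand-in implementation
    }
}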
Run-time Test Suite Properties
Test suites run quickly
joda-time 4.9s
jdom 5.0s
google-vis 5.1s
jgrapht 16.9s
weka 28.9s
apache-cc 34.0s
poi 36.5s
jmeter 53.0s
jfreechart 241.0s
Failing tests
[Table: failing tests per suite. Most suites report 0 failures; outliers: 76/384 and 3/1109; one suite: n/a]
Continuous Integration: Daily Builds
Continuous Integration: Daily Tests
(via SonarQube, Travis CI, Surefire)
Myth #1:
Coverage is a key property of test suites.
Coverage is central in textbooks
Ammann and Offutt, Introduction to Software Testing
Coverage metrics from EclEmma
Coverage metrics
Reality #1
Coverage is sometimes important, but tools only give limited data.
Guideline #1
Consider metrics beyond reported coverage results:
- weka uses peer review for QA
- not measured by tools: input space coverage (see the sketch below)
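A hypothetical sketch of the gap: one test can achieve 100% statement and branch coverage of abs() while never touching the overflow partition of its input space.

public class AbsCoverageTest extends junit.framework.TestCase {
    // Fully "covered" by the test below, yet the input partition
    // x == Integer.MIN_VALUE is never exercised.
    static int abs(int x) {
        if (x < 0) {
            return -x;
        }
        return x;
    }

    public void testAbs() {
        assertEquals(3, abs(3));   // covers the x >= 0 branch
        assertEquals(3, abs(-3));  // covers the x < 0 branch
        // untested: abs(Integer.MIN_VALUE) overflows and stays negative
    }
}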
Myth #2
Tests are simple.
- test complexity
- test dependencies
Static Code Complexity
Test methods with at least 5 asserts
e.g. from Joda-Time:
public void testEquality() {
    assertSame(getInstance(TOKYO), getInstance(TOKYO));
    assertSame(getInstance(LONDON), getInstance(LONDON));
    assertSame(getInstance(PARIS), getInstance(PARIS));
    assertSame(getInstanceUTC(), getInstanceUTC());
    assertSame(getInstance(), getInstance(LONDON));
}
% Test methods with ≥ 5 asserts
Test Methods with Branches
if (isAllowNullKey() == false) {
    try {
        assertEquals(null, o.nextKey(null));
    } catch (NullPointerException ex) {}
} else {
    assertEquals(null, o.nextKey(null));
}
// from apache-cc
Test Methods with Loops
counter = 0;
while (this.complexPerm.hasNext()) {
    this.complexPerm.getNext();
    counter++;
}
assertEquals(maxPermNum, counter);
// from jgrapht
% Test Methods with Control-Flow
Tests Which Use the Filesystem
Filesystem Usage Details
new File(tempDir, "tzdata");
verifies serialized collections on disk against canonical forms
More Filesystem Usage Details
resources, serialization
creates charts, tests their existence; some comparisons vs test data
Tests Which Use the Network
Network Usage Details
connects to http://sc.openoffice.org
tests HTTP mirror server at localhost
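A minimal sketch of that localhost pattern using the JDK's built-in com.sun.net.httpserver (an illustration, not jmeter's actual mirror-server code):

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class LocalMirror {
    public static HttpServer start() throws Exception {
        // Bind to an ephemeral port on localhost so tests never touch the real network.
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "ok".getBytes("UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server; // tests query server.getAddress().getPort() and issue requests
    }
}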
flip side: Mocks and Stubs
True mocks only in Google Visualization.
Found stubs/fakes in 4 other suites.
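The distinction, sketched with hypothetical names: a stub or fake just cans answers, while a true mock also records interactions so the test can verify them.

interface MailService { void send(String to, String body); }

// Stub/fake: canned behaviour, verifies nothing by itself.
class StubMailService implements MailService {
    public void send(String to, String body) { /* swallow the message */ }
}

// Hand-rolled mock: records interactions for later verification.
class MockMailService implements MailService {
    int sendCount = 0;
    String lastRecipient;
    public void send(String to, String body) {
        sendCount++;
        lastRecipient = to;
    }
    void verifySentOnceTo(String expected) {
        if (sendCount != 1 || !expected.equals(lastRecipient))
            throw new AssertionError("expected exactly one send to " + expected);
    }
}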
Reality #2
Test cases are mostly simple: few asserts, little branching, some filesystem/net usage.
Consequence #2
Many tests don’t need high expertise to write,
but some do!
Myth #3
Test cases are written by hand.
Types of reuse (standard Java)
1. test class setUp()/tearDown()
2. inheritance: e.g. in apache-cc, TestFastHashMap extends AbstractTestMap
3. composition: e.g. in jfreechart, helper class RendererChangeDetector
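A minimal sketch of reuse style 1, with illustrative names: the fixture lives in setUp()/tearDown() once instead of being repeated in every test method.

public class TempFileTest extends junit.framework.TestCase {
    private java.io.File tempFile;

    protected void setUp() throws Exception {
        // Runs before every test method.
        tempFile = java.io.File.createTempFile("fixture", ".txt");
    }

    protected void tearDown() {
        // Runs after every test method, even when a test fails.
        tempFile.delete();
    }

    public void testFileExists() {
        assertTrue(tempFile.exists());
    }

    public void testFileInitiallyEmpty() {
        assertEquals(0, tempFile.length());
    }
}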
JUnit setUp/tearDown usage
Inheritance is heavily used
(> 50% test classes inherit functionality)
Test Classes with Custom Superclasses
Helper Classes Example
from poi:
/** Test utility class to get Records
 *  out of HSSF objects. */
public final class RecordInspector {
    public static Record[] getRecords(...) {}
}
Helper Class Count
weka 1
google-vis 3
jdom 6
joda-time 7
jfreechart 7
jmeter 12
jgrapht 15
apache-cc 22
hsqldb 31
poi 54
Test Clone Example
public void testNominalFiltering() {
    m_Filter = getFilter(Attribute.NOMINAL);
    Instances r = useFilter();
    for (int i = 0; i < r.numAttributes(); i++)
        assertTrue(r.attribute(i).type() != Attribute.NOMINAL);
}

public void testStringFiltering() {
    m_Filter = getFilter(Attribute.STRING);
    Instances r = useFilter();
    for (int i = 0; i < r.numAttributes(); i++)
        assertTrue(r.attribute(i).type() != Attribute.STRING);
}
// from weka
Assertion Fingerprints
detect clones by identifying similar tests
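A toy sketch of the idea, not the actual analysis: fingerprint each test method by the sorted list of assertion calls it contains, and flag methods with identical fingerprints as clone candidates.

import java.util.*;
import java.util.regex.*;

public class AssertionFingerprints {
    private static final Pattern ASSERT_CALL =
        Pattern.compile("\\b(assert[A-Za-z]+|fail)\\s*\\(");

    // Fingerprint = sorted names of assertion calls in a method body.
    static List<String> fingerprint(String methodBody) {
        List<String> calls = new ArrayList<String>();
        Matcher m = ASSERT_CALL.matcher(methodBody);
        while (m.find()) calls.add(m.group(1));
        Collections.sort(calls);
        return calls;
    }

    public static void main(String[] args) {
        // The two weka methods above differ only in constants,
        // so their fingerprints match: both are [assertTrue].
        String nominal = "assertTrue(r.attribute(i).type() != Attribute.NOMINAL);";
        String string  = "assertTrue(r.attribute(i).type() != Attribute.STRING);";
        System.out.println(fingerprint(nominal).equals(fingerprint(string))); // true
    }
}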
Incidence of cloning
How to Refactor?
● setUp/tearDown/subclassing
● JUnit 4: Parameterized Unit Tests, Test Theories
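A self-contained JUnit 4 parameterized sketch (exercising Math.abs purely for illustration): each parameter row re-runs the single test method, replacing a family of cloned methods like the weka pair shown earlier.

import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class AbsTest {
    @Parameters
    public static Collection<Object[]> data() {
        // One row per formerly-cloned test method.
        return Arrays.asList(new Object[][] { { 3, 3 }, { -3, 3 }, { 0, 0 } });
    }

    private final int input;
    private final int expected;

    public AbsTest(int input, int expected) {
        this.input = input;
        this.expected = expected;
    }

    @Test
    public void absMatchesExpected() {
        assertEquals(expected, Math.abs(input));
    }
}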
apache-cc: Bulk tests
public BulkTest bulkTestKeySet() {
    return new TestSet(makeFullMap().keySet());
}
● runs all tests in the TestSet class with the object returned from makeFullMap().keySet()
jdom: Generated Test Case Stubs
class ClassGenerator makes e.g.:
class TestDocument {
    void test_TCC__List();
    void test_TCM__int_hashCode();
}
Developer still needs to populate tests.
Automated Testing Technology
In our test suites, the principal automation technology was cut-and-paste.
Reality #3
Automated test generation is uncommon in our test suites.
Guideline
Maximize reuse:
whatever works for you!
setUp/tearDown, inheritance, parameterized tests, ...
Suggestion
Use automated test generation tools! Some examples:
● Korat (structurally complex tests)
● Randoop (random testing)
● CERT Basic Fuzzing Framework
http://mit.bme.hu/~micskeiz/pages/code_based_test_generation.html
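For a flavour of what such tools emit, here is an illustrative sketch in the style of a Randoop-generated regression test (not actual tool output): a randomly chosen method sequence plus assertions capturing the behaviour observed at generation time.

import static org.junit.Assert.assertEquals;
import java.util.ArrayList;
import java.util.Collections;
import org.junit.Test;

public class RegressionTest0 {
    @Test
    public void test001() {
        // Randomly chosen method sequence...
        ArrayList<Integer> list = new ArrayList<Integer>();
        list.add(10);
        list.add(-4);
        Collections.sort(list);
        // ...with regression assertions recording observed behaviour.
        assertEquals(2, list.size());
        assertEquals(Integer.valueOf(-4), list.get(0));
    }
}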
Summary
Myths:
1. Coverage is a key property of test suites. ≈
2. Tests are simple. ✓
3. Tests are written by hand. ✓
Data: https://docs.google.com/spreadsheets/d/1xAsdk35tJAOM4WGbGloliS4ovDJ8_MDn6_Gzk0DXEZQ