Effective Testing of Apache Accumulo Iterators

Josh Elser, Accumulo Summit 2016, 2016/10/11

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Engineer at Hortonworks, Member of the Apache Software Foundation

Top-Level Projects
• Apache Accumulo®
• Apache Calcite™
• Apache Commons™
• Apache HBase®
• Apache Phoenix™

ASF Incubator
• Apache Fluo™
• Apache Gossip™
• Apache Pirk™
• Apache Rya™
• Apache Slider™

These Apache project names are trademarks or registered trademarks of the Apache Software Foundation.

A Novel Feature of Apache Accumulo

• SortedKeyValueIterator (SKVI or “Iterators”)
• Computation offload
• Reduced I/O
• Rumored to be called “cool” by Jeff Dean

[Diagram: server-side uses of Iterators: transformations, predicate pushdown, filters, aggregations, combiners, versioning, security]

Apache Accumulo Iterators

• Column Slices (CfCqSliceFilter)
• Basic Statistics (StatsCombiner)
• Value/Array Concatenation (Summing[Array]Combiner)
• Aggregations (WholeRowIterator, WholeColumnFamilyIterator)
• In-Row operations (AndIterator, OrIterator)
• Filters (RegExFilter, GrepIterator, FirstEntryInRowIterator)

Reads

• Clients request a Range of data
• Key to Row to Tablet to TabletServer
• Sorted, merged read of memory and files
• Computation offload and RPC boost

[Diagram: within a Tablet, in-memory data and several RFiles are read through Iterators and returned to the Client]
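
The "sorted, merged read of memory and files" can be pictured as a k-way merge over sources that are each already sorted. The following is a minimal sketch in plain Java, not Accumulo's implementation: the class name `MergedReadSketch` and the use of bare strings as keys are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration of a sorted, merged read; not Accumulo code. */
public class MergedReadSketch {

  /** Pairs the current head of a source with the iterator it came from. */
  private static final class Head {
    final String key;
    final Iterator<String> rest;

    Head(String key, Iterator<String> rest) {
      this.key = key;
      this.rest = rest;
    }
  }

  /** Merges already-sorted sources (one in-memory map, several files) into one sorted list. */
  public static List<String> merge(List<List<String>> sortedSources) {
    PriorityQueue<Head> heap = new PriorityQueue<>(Comparator.comparing((Head h) -> h.key));
    for (List<String> source : sortedSources) {
      Iterator<String> it = source.iterator();
      if (it.hasNext()) {
        heap.add(new Head(it.next(), it));
      }
    }
    List<String> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      // The smallest head across all sources is the next key in sorted order.
      Head smallest = heap.poll();
      merged.add(smallest.key);
      if (smallest.rest.hasNext()) {
        heap.add(new Head(smallest.rest.next(), smallest.rest));
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<String> memory = Arrays.asList("row2", "row5");
    List<String> fileA = Arrays.asList("row1", "row4");
    List<String> fileB = Arrays.asList("row3");
    // prints [row1, row2, row3, row4, row5]
    System.out.println(merge(Arrays.asList(memory, fileA, fileB)));
  }
}
```

Because the merged stream is itself sorted, anything layered on top of it (a filter, a combiner) can work in a single forward pass.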

Reads with Iterators

• A poor-man’s “VIEW”
• Server-side transformation at query-time

Raw Key                      Raw Value           Transformed Key           Transformed Value
3141592 siblings:brothers    Bobby,Steven        3141592 siblings:count    4
3141592 siblings:sisters     Sally,Francine
3141593 siblings:brothers    Frank               3141593 siblings:count    3
3141593 siblings:sisters     Amy,Loretta
3141594 siblings:brothers                        3141594 siblings:count    2
3141594 siblings:sisters     Rebecca,Savannah
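
The logic behind this query-time view can be sketched in plain Java. This is only an illustration of the transformation, under the assumption that values are comma-separated name lists: a real Iterator would implement SortedKeyValueIterator and run inside the TabletServer, whereas the class `SiblingCountSketch` and its nested sorted maps are hypothetical stand-ins.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical plain-Java stand-in for a read-time "view"; not an Accumulo iterator. */
public class SiblingCountSketch {

  /** Counts the non-empty names in a comma-separated value such as "Bobby,Steven". */
  static int countNames(String commaSeparated) {
    int count = 0;
    for (String name : commaSeparated.split(",")) {
      if (!name.trim().isEmpty()) {
        count++;
      }
    }
    return count;
  }

  /** Maps each row ID to the number of names found across all of its columns. */
  public static SortedMap<String, Integer> siblingCounts(
      SortedMap<String, SortedMap<String, String>> rows) {
    SortedMap<String, Integer> counts = new TreeMap<>();
    for (Map.Entry<String, SortedMap<String, String>> row : rows.entrySet()) {
      int total = 0;
      for (String value : row.getValue().values()) {
        total += countNames(value);
      }
      counts.put(row.getKey(), total);
    }
    return counts;
  }

  /** Adds one row with its brothers and sisters columns. */
  static void putRow(SortedMap<String, SortedMap<String, String>> rows,
      String rowId, String brothers, String sisters) {
    SortedMap<String, String> columns = new TreeMap<>();
    columns.put("siblings:brothers", brothers);
    columns.put("siblings:sisters", sisters);
    rows.put(rowId, columns);
  }

  /** The raw rows from the table above. */
  public static SortedMap<String, SortedMap<String, String>> sampleRows() {
    SortedMap<String, SortedMap<String, String>> rows = new TreeMap<>();
    putRow(rows, "3141592", "Bobby,Steven", "Sally,Francine");
    putRow(rows, "3141593", "Frank", "Amy,Loretta");
    putRow(rows, "3141594", "", "Rebecca,Savannah");
    return rows;
  }

  public static void main(String[] args) {
    // prints {3141592=4, 3141593=3, 3141594=2}
    System.out.println(siblingCounts(sampleRows()));
  }
}
```

The empty brothers value for row 3141594 is the kind of input worth including in a unit test, since an empty string still splits into one (empty) token.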

Compactions

• Bounds number of files and performance
• Iterators provide data optimization mechanism

[Diagram: a Tablet's RFiles before compaction pass through Iterators and are rewritten into fewer RFiles after compaction]

Compactions with Iterators

• Deferred aggregation
• Rewrite application data in optimal form

Raw Key                      Raw Value           Transformed Key              Transformed Value
3141592 siblings:brothers    Bobby,Steven        3141592 siblings:brothers    …
                                                 3141592 siblings:count       4
3141592 siblings:sisters     Sally,Francine      3141592 siblings:sisters     …
3141593 siblings:brothers    Frank               3141593 siblings:brothers    …
                                                 3141593 siblings:count       3
3141593 siblings:sisters     Amy,Loretta         3141593 siblings:sisters     …
3141594 siblings:brothers                        3141594 siblings:brothers    …
                                                 3141594 siblings:count       2
3141594 siblings:sisters     Rebecca,Savannah    3141594 siblings:sisters     …
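
Deferred aggregation is the idea behind Combiners: writers append partial values cheaply, and the aggregation happens later, when a compaction rewrites the files. The sketch below shows that idea in plain Java with a summing rule. It is a different example from the siblings table, and the class `DeferredSumSketch` and its entry-list representation are assumptions for illustration, not Accumulo's Combiner API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of deferred aggregation at compaction time; not Accumulo code. */
public class DeferredSumSketch {

  /** Collapses every run of entries sharing a key into one entry holding their sum. */
  public static List<Map.Entry<String, Long>> compact(List<Map.Entry<String, Long>> sortedEntries) {
    List<Map.Entry<String, Long>> out = new ArrayList<>();
    String currentKey = null;
    long sum = 0;
    for (Map.Entry<String, Long> entry : sortedEntries) {
      if (currentKey != null && !currentKey.equals(entry.getKey())) {
        // Key changed: emit the finished aggregate and start a new one.
        out.add(new SimpleEntry<>(currentKey, sum));
        sum = 0;
      }
      currentKey = entry.getKey();
      sum += entry.getValue();
    }
    if (currentKey != null) {
      out.add(new SimpleEntry<>(currentKey, sum));
    }
    return out;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Long>> raw = Arrays.asList(
        new SimpleEntry<>("page1 hits", 1L),
        new SimpleEntry<>("page1 hits", 2L),
        new SimpleEntry<>("page2 hits", 5L));
    // prints [page1 hits=3, page2 hits=5]
    System.out.println(compact(raw));
  }
}
```

The input must already be sorted by key, which is exactly what a compaction's merged read provides.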

Better for Everyone

• Iterators are great
– Abstraction for system-level filters and optimizations
– Better performance for power-users

• Lots of things Iterators are not
– Triggers
– Hooks
– Coprocessors
– “Hammers”

• Iterators do not generally replace
– Flink, Hive, Mesos, Presto, Storm, Spark, YARN, etc.
– Can in some cases

On Building an Iterator

The API is not particularly intuitive

Hard to create/support SKVIv2

Edge-cases in production are hard to understand

Lots of things not to do in an Iterator
– Learned by trial and error

Difficult to gain insight in production systems

Existing Testing Tools

Unit Testing
Good
– Fast
– Concise/Simple
– Given input, verify output
Bad
– Not end-to-end
– Not representative invocation

MiniAccumuloCluster (MAC) Testing
Good
– Same server execution as production
– Same client interaction as production
Bad
– Slow/Memory intensive
– Pedantic to write tests
– Might not catch production edge-cases
– Impacted by environment

What’s the happy medium?

Iterator Testing Harness

• Testing harness designed to capture common pitfalls
– ACCUMULO-626 in >=1.8.0

• Complementary to unit tests and MiniAccumuloCluster tests

• The good parts
– Fast
– Generalized/Reusable tests
– Extensible

• The bad parts
– Not directly using TabletServer for invocation
– Subtle failures

Iterator Testing Harness

• Testing an Iterator requires three things
– Input data
– Expected output
– Collection of test cases to run

• Test cases found via reflection
– Common edge cases provided
– Easy to develop and run new test cases

JUnit4 integration

    @Parameters
    public static Object[][] data() {
      // Input data, expected output, and the test cases to run against the Iterator
      IteratorTestInput input = createIteratorInput();
      IteratorTestOutput expectedOutput = createIteratorOutput();
      List<IteratorTestCase> testCases = createTestCases();
      return BaseJUnit4IteratorTest.createParameters(input, expectedOutput, testCases);
    }

Example Test Cases

• Iterator Instantiation
– Does the Iterator have a visible no-args constructor?

• “deepCopy()” safety
– Can a “deepCopy()” of an Iterator be used like the original?

• Stateless “hasTop()”
– Do multiple invocations of “hasTop()” cause incorrect results/errors?

• Re-seek()’ing
– Accumulo will re-instantiate scan sessions and use new Ranges
– Does the Iterator still return correct results in this case?
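
To make the last three checks concrete, here is a minimal sketch in plain Java. It does not use the Accumulo API or the harness itself: `ToyIterator` is a hypothetical, heavily simplified stand-in for a SortedKeyValueIterator over string keys, and the re-seek check assumes the new Range starts just after the last key returned, exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

/** Hypothetical, simplified checks; not the Accumulo SortedKeyValueIterator API. */
public class IteratorChecksSketch {

  /** Toy iterator over sorted string keys. */
  static final class ToyIterator {
    private final NavigableMap<String, String> data;
    private String top; // current key, or null when exhausted

    ToyIterator(NavigableMap<String, String> data) {
      this.data = data;
    }

    /** Positions at the first key at or after start (inclusive) or strictly after it. */
    void seek(String start, boolean inclusive) {
      top = inclusive ? data.ceilingKey(start) : data.higherKey(start);
    }

    boolean hasTop() {
      return top != null; // reads state only, never advances
    }

    String getTopKey() {
      return top;
    }

    void next() {
      top = data.higherKey(top);
    }

    ToyIterator deepCopy() {
      return new ToyIterator(data); // the copy must be seeked before use
    }
  }

  /** Calling hasTop() several times per entry must not skip or duplicate entries. */
  public static boolean hasTopIsStateless(NavigableMap<String, String> data, List<String> expected) {
    ToyIterator it = new ToyIterator(data);
    it.seek("", true);
    List<String> keys = new ArrayList<>();
    while (it.hasTop() && it.hasTop() && it.hasTop()) {
      keys.add(it.getTopKey());
      it.next();
    }
    return keys.equals(expected);
  }

  /** After every returned key, throw the iterator away and re-seek a fresh one past that key. */
  public static boolean survivesReseek(NavigableMap<String, String> data, List<String> expected) {
    List<String> keys = new ArrayList<>();
    ToyIterator it = new ToyIterator(data);
    it.seek("", true);
    while (it.hasTop()) {
      String last = it.getTopKey();
      keys.add(last);
      it = new ToyIterator(data); // simulates a re-instantiated scan session
      it.seek(last, false);       // new range begins after the last key returned
    }
    return keys.equals(expected);
  }

  /** A deep copy, once seeked, must return the same entries as the original would. */
  public static boolean deepCopyMatches(NavigableMap<String, String> data, List<String> expected) {
    ToyIterator copy = new ToyIterator(data).deepCopy();
    copy.seek("", true);
    List<String> keys = new ArrayList<>();
    while (copy.hasTop()) {
      keys.add(copy.getTopKey());
      copy.next();
    }
    return keys.equals(expected);
  }
}
```

An iterator that buffers entries in a field, or advances inside hasTop(), passes a simple input/output unit test yet fails checks of this shape, which is the class of bug the generalized test cases are meant to catch.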

In an Ideal World

• Good testing means faster deployments
• Faster deployment means more value for customers
• Automated tests combat technical debt in code growth
• More automation reduces developer stress

Unit Tests + MiniAccumuloCluster + Iterator Testing Harness = [image]

In an Ideal World

A Trio of Testing Approaches

• Unit Tests (test lifecycle phase)
– Fast verification given input/output
– Validate impact of state

• Iterator Testing Harness (test lifecycle phase)
– Catch common mistakes
– Basic lifetime/API validation
– Encourage best practices

• MiniAccumuloCluster (integration-test lifecycle phase)
– Functional/Acceptance tests
– Does the ingest/query system function?
– Real execution of Iterator by TabletServer

In an Ideal World

• Standalone environment
– The “laptop test”
– Sanity check

• Staging environments
– Small cluster with a subset of data
– Correctness and performance

[Diagram: Code → Unit Tests, Iterator Test Harness, MAC → Binary Artifacts → Standalone → Staging → Production Deploy]

In an Ideal World

• No more “voodoo” and “black magic”
• Find common errors fast
• Catch bad Iterator design early
• Standardized testing methodology
• Community contributes new tests
• Increase in quality, reusability, and confidence

Thank You

Twitter: @josh_elser
Email: [email protected] / [email protected]