Apache Accumulo 1.8.0 Overview


Post on 16-Apr-2017






Apache Accumulo 1.8.0 TBD

Apache Accumulo 1.8.0 Overview
Josh Elser
Apache Accumulo Meetup Group
2016/06/27

© Hortonworks Inc. 2011-2016. All Rights Reserved

Hortonworks: Powering the Future of Data

Apache Accumulo 1.8.0
- First release candidate in the works
- A minor release, but significantly more work required than a patch release
  - ContinuousIngest and verification
  - RandomWalk
- Long time coming...


Semantic Versioning
- Defines a set of rules for software projects to adhere to across different versions
  - Clear understanding of compatibility
- Rules are defined in terms of a public API
  - Defined by the project adopting SemVer
- Major: incompatible changes, deprecations removed
- Minor: backwards-compatible features added
- Patch: backwards-compatible bug fixes only (no features)
- http://semver.org - major.minor.patch
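The major/minor/patch rules above can be sketched as a small compatibility check. This is only an illustration of the SemVer rules as listed on the slide, not code from any project: the class and method names are made up, pre-release suffixes are ignored, and a major version of at least 1 is assumed. It says nothing about code that reaches outside the declared public API.

```java
/** A major.minor.patch version number, as described at semver.org. */
final class SemVer {
    final int major;
    final int minor;
    final int patch;

    SemVer(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    /** Parses "major.minor.patch"; pre-release and build suffixes are not handled. */
    static SemVer parse(String text) {
        String[] parts = text.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected major.minor.patch: " + text);
        }
        return new SemVer(Integer.parseInt(parts[0]),
                          Integer.parseInt(parts[1]),
                          Integer.parseInt(parts[2]));
    }

    /**
     * Returns true when code written against the public API of builtAgainst
     * is covered by the SemVer rules when it runs against runtime.
     */
    static boolean publicApiCompatible(SemVer builtAgainst, SemVer runtime) {
        // A new major version may contain incompatible changes.
        if (builtAgainst.major != runtime.major) {
            return false;
        }
        // An older minor version may lack features the code relies on.
        if (runtime.minor < builtAgainst.minor) {
            return false;
        }
        // Same major, same or newer minor: patch releases only fix bugs.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(publicApiCompatible(parse("2.3.1"), parse("2.4.0"))); // true
        System.out.println(publicApiCompatible(parse("2.4.0"), parse("2.3.1"))); // false
        System.out.println(publicApiCompatible(parse("2.4.0"), parse("3.0.0"))); // false
    }
}
```

The check is deliberately one-directional: moving forward within a major version is covered, moving backward to an older minor version is not.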

Apache Accumulo and Semantic Versioning
- Apache Accumulo defines a public API
  - Made up of Java classes, defined by packages
  - The goal is to describe how user code should function across releases
- Recursively, all public types in the following packages (excluding impl, thrift, or crypto sub-packages)
  - org.apache.accumulo.core.{client,data,security}
  - org.apache.accumulo.minicluster
- Other concerns for compatibility too
  - RPC classes
  - Persistent data (RFiles and ZooKeeper)
- Not comprehensive!
  - Not all user-facing code is yet included in the public API
  - Monitoring UIs and data
  - Start/stop scripts
  - The Accumulo Shell
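As a rough illustration of the package rule above, the following sketch decides from a fully qualified class name whether the class falls inside the public API packages. The class `PublicApiCheck` and its method are hypothetical helpers written for this example. It works on names only, so it cannot tell whether a type is actually declared public.

```java
import java.util.Arrays;
import java.util.List;

final class PublicApiCheck {
    // Package roots listed on the slide.
    private static final List<String> ROOTS = Arrays.asList(
        "org.apache.accumulo.core.client",
        "org.apache.accumulo.core.data",
        "org.apache.accumulo.core.security",
        "org.apache.accumulo.minicluster");

    // Sub-package names that are excluded from the public API.
    private static final List<String> EXCLUDED = Arrays.asList("impl", "thrift", "crypto");

    /** Name-based check only; visibility of the type is not inspected. */
    static boolean isPublicApi(String className) {
        int lastDot = className.lastIndexOf('.');
        if (lastDot < 0) {
            return false; // no package at all
        }
        String pkg = className.substring(0, lastDot);
        for (String root : ROOTS) {
            if (pkg.equals(root)) {
                return true;
            }
            if (pkg.startsWith(root + ".")) {
                // Reject the class if any sub-package segment is excluded.
                String rest = pkg.substring(root.length() + 1);
                for (String segment : rest.split("\\.")) {
                    if (EXCLUDED.contains(segment)) {
                        return false;
                    }
                }
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isPublicApi("org.apache.accumulo.core.client.Connector"));  // true
        System.out.println(isPublicApi("org.apache.accumulo.core.data.thrift.TKey"));  // false
        System.out.println(isPublicApi("org.apache.accumulo.core.util.Pair"));         // false
    }
}
```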

Apache Accumulo and Semantic Versioning
POP QUIZ!
- Is it guaranteed that your application from 1.7.1 will work against 1.8.0?
- What about a 1.6.5 application?
- Are you guaranteed to be able to roll back an upgrade from 1.8.0 to 1.7.1?
- Is it guaranteed that your 1.8.0 application will work against 1.7.0?

Notable changes currently staged for Apache Accumulo 1.8.0


System Administrator Changes
- [ACCUMULO-925] - Launch scripts should use a PIDfile
  - New script: start-daemon.sh
    - Encapsulates only the things that need to happen on the machine starting a process
    - No SSHing
  - Support for PID files to track processes
  - Rotating .out and .err files on start
    - Critical for delayed JVM layer issues

Performance!
- [ACCUMULO-3423] - Speed up write-ahead log (WAL) roll-overs
  - Changes how references to WALs are stored by Accumulo
  - Reduces the number of writes when switching to a new WAL
  - Uses ZooKeeper to track the state, copies into tablet row before recovery starts
  - 10-30% faster over previous implementation (while exacerbating the problem)
- [ACCUMULO-1124] - Optimize index size in RFile
  - RFiles have data and index blocks; index from RowID to data block containing that RowID
  - Large RowIDs bloat the index (e.g. inverted URL)
  - Fewer index blocks can be cached
  - Related work: [ACCUMULO-4164] and [ACCUMULO-4314]

New Features
- [ACCUMULO-3913] - Add per table sampling
  - Helpful in running analytics over some percentage of the total data
  - Can automatically create samples during compaction or on the fly using Iterators
  - Configurable hashing to ensure consistency across index and data tables
    - No dangling references from index records or unreachable data records
  - Consider snapshotting a sample of a table: after compaction, it is just a normal table
- [ACCUMULO-4187] - Rate limiting of major compactions
  - Compactions can strain system resources: hardware, JVM and HDFS
  - Normally, desirable to process compactions as fast as possible
  - Can negatively affect low-latency workloads
  - Configure a limit in bytes per second that a TabletServer should process during compaction
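To see why hashing on a shared field keeps index and data samples consistent, here is a minimal sketch of hash-modulus sampling. It is a generic illustration, not Accumulo's sampler implementation: the `HashSampler` class, the use of CRC32, and the idea of hashing a document ID are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class HashSampler {
    private final int modulus;

    /** Keeps roughly one out of every modulus distinct field values. */
    HashSampler(int modulus) {
        if (modulus <= 0) {
            throw new IllegalArgumentException("modulus must be positive");
        }
        this.modulus = modulus;
    }

    /** The decision depends only on the field value, so it is repeatable. */
    boolean inSample(String field) {
        CRC32 crc = new CRC32();
        crc.update(field.getBytes(StandardCharsets.UTF_8));
        return crc.getValue() % modulus == 0;
    }

    public static void main(String[] args) {
        HashSampler sampler = new HashSampler(10);
        // Hypothetical layout: the data table is keyed by document ID and the
        // index table maps a term to a document ID. Hashing the document ID in
        // both places means an index entry is sampled exactly when the record
        // it points to is sampled.
        for (int i = 0; i < 50; i++) {
            String docId = "doc" + i;
            boolean dataRowSampled = sampler.inSample(docId);
            boolean indexEntrySampled = sampler.inSample(docId);
            if (dataRowSampled != indexEntrySampled) {
                throw new IllegalStateException("inconsistent sample for " + docId);
            }
            if (dataRowSampled) {
                System.out.println(docId + " is in the sample");
            }
        }
    }
}
```

Hashing the index table's own row (the term) instead of the document ID would select an unrelated subset, which is how dangling references and unreachable records arise.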

New Features (pt. 2)
- [ACCUMULO-3948] - Enable A/B testing of scan iterators on a table
  - Classpath context is a definition of JARs which the TabletServer should dynamically load
  - Configuration allows a context to be specified when using a [Batch]Scanner
  - Multiple implementations of the same SKVIterator classes can co-exist
  - Useful in testing new implementations of iterators on real data before switching production
- [ACCUMULO-626] - Create an iterator fuzz tester
  - Writing SKVIterators is notoriously difficult
    - Many common pitfalls and gotchas, often not appearing until real use
  - A testing framework codifies these edge cases and can automatically test iterators
    - Similar to security fuzzing
  - Users must provide data sets and the expected outcome from using their SKVIterator
  - A supplement to unit testing and MiniAccumuloCluster, not a replacement
  - Test cases implicitly encourage good design of SKVIterators
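The idea of codifying an edge case and checking it against user-supplied data and expected output can be sketched without Accumulo at all. The toy harness below is not the Accumulo iterator test framework and uses none of its classes: `IteratorHarness`, `Stack`, and the two sample iterators are invented for illustration. It models one well-known pitfall, where the server tears an iterator down mid-scan and rebuilds it positioned just after the last key returned.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

final class IteratorHarness {
    /** Hypothetical stand-in for an iterator: wraps a sorted source of entries. */
    interface Stack extends UnaryOperator<Iterator<Map.Entry<String, String>>> {}

    /** Test case 1: one uninterrupted scan over all of the data. */
    static List<String> fullScan(NavigableMap<String, String> data, Stack stack) {
        List<String> keys = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = stack.apply(data.entrySet().iterator());
        while (it.hasNext()) {
            keys.add(it.next().getKey());
        }
        return keys;
    }

    /** Test case 2: rebuild the iterator after every entry, resuming after the last key. */
    static List<String> reseekScan(NavigableMap<String, String> data, Stack stack) {
        List<String> keys = new ArrayList<>();
        NavigableMap<String, String> remaining = data;
        while (true) {
            Iterator<Map.Entry<String, String>> it = stack.apply(remaining.entrySet().iterator());
            if (!it.hasNext()) {
                return keys;
            }
            String key = it.next().getKey();
            keys.add(key);
            remaining = data.tailMap(key, false); // exclusive of the last key returned
        }
    }

    /** The user supplies the data set and the expected keys; every test case must agree. */
    static boolean passes(NavigableMap<String, String> data, Stack stack, List<String> expected) {
        return expected.equals(fullScan(data, stack)) && expected.equals(reseekScan(data, stack));
    }

    /** A stateless filter: keeps entries whose value matches. */
    static Stack valueFilter(Predicate<String> keep) {
        return source -> {
            List<Map.Entry<String, String>> out = new ArrayList<>();
            while (source.hasNext()) {
                Map.Entry<String, String> e = source.next();
                if (keep.test(e.getValue())) {
                    out.add(e);
                }
            }
            return out.iterator();
        };
    }

    /** A stateful iterator: keeps every second entry, counted from wherever it starts. */
    static Stack everySecond() {
        return source -> {
            List<Map.Entry<String, String>> out = new ArrayList<>();
            int seen = 0;
            while (source.hasNext()) {
                Map.Entry<String, String> e = source.next();
                if (seen++ % 2 == 0) {
                    out.add(e);
                }
            }
            return out.iterator();
        };
    }

    public static void main(String[] args) {
        NavigableMap<String, String> data = new TreeMap<>();
        data.put("a", "keep");
        data.put("b", "drop");
        data.put("c", "keep");
        data.put("d", "drop");
        List<String> expected = List.of("a", "c");
        System.out.println(passes(data, valueFilter("keep"::equals), expected)); // true
        System.out.println(passes(data, everySecond(), expected));               // false
    }
}
```

Both iterators produce the expected keys on an uninterrupted scan, so an ordinary unit test would accept either. Only the re-seek case exposes the counter-based one, which is the kind of bug that otherwise appears only in real use.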

New APIs
- [ACCUMULO-2883] - Add API to fetch current locations of Tablets
  - Long-standing feature request (order of years)
  - Extremely useful for distributed execution engines for locality-aware computation
    - Apache Hive, Presto, Apache Drill, Apache Spark, etc.
  - Smart placement can reduce client-to-Accumulo network traffic
  - Locality with Accumulo Tablets also implies locality with HDFS data (over time)
- [ACCUMULO-4165] - Create a user level API for RFile
  - Example of a glaring hole in the public API
  - Only stable way to create an RFile is via MapReduce
  - Provides a supported API for reading and writing RFiles
  - Simplifies implementation and use of RFile access internally too
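As a sketch of how an execution engine might use tablet locations, the snippet below groups tablets by the server hosting them so that each task can be scheduled next to its data. The input map is made-up data standing in for whatever the location lookup returns. The actual Accumulo API and its return types are not shown here, and `LocalityPlanner` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LocalityPlanner {
    /** Inverts a tablet-to-server map into a server-to-tablets work plan. */
    static Map<String, List<String>> groupByServer(Map<String, String> tabletToServer) {
        Map<String, List<String>> plan = new TreeMap<>();
        for (Map.Entry<String, String> e : tabletToServer.entrySet()) {
            plan.computeIfAbsent(e.getValue(), host -> new ArrayList<>()).add(e.getKey());
        }
        return plan;
    }

    public static void main(String[] args) {
        // Hypothetical tablets (named by row range) and the hosts serving them.
        Map<String, String> locations = new LinkedHashMap<>();
        locations.put("(-inf, g]", "host1:9997");
        locations.put("(g, p]", "host2:9997");
        locations.put("(p, +inf)", "host1:9997");

        // One task per server, each reading only the tablets hosted there.
        groupByServer(locations).forEach((host, tablets) ->
            System.out.println(host + " -> " + tablets));
    }
}
```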

Changes to be wary of
- [ACCUMULO-3409] - Move default ports out of ephemeral range
  - Traditional ephemeral range on Linux: [32768, 61000]
  - Transient connections can prevent processes from starting
  - Monitor HTTP port moves from 50095 to 9995
- [ACCUMULO-4077] - Upgrade to Apache Thrift 0.9.3
  - Thrift is used by Accumulo for RPCs
  - Serialized messages are compatible (with caveats) across releases, but Java classes are not
  - A massive pain for downstream integrations
  - If you require a different version of Thrift and want to use Accumulo 1.8.0:
    - Shade+Relocate your version of Thrift in your application
    - Upgrade to Apache Thrift 0.9.3
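The port change can be checked with simple arithmetic. This small sketch uses the ephemeral range quoted on the slide, though the effective range on a given host may be configured differently. `PortCheck` is a name made up for this example.

```java
final class PortCheck {
    // Range quoted on the slide; treat it as an assumption for any given host.
    static final int EPHEMERAL_LOW = 32768;
    static final int EPHEMERAL_HIGH = 61000;

    /** True when the OS could hand this port to an outgoing connection. */
    static boolean inEphemeralRange(int port) {
        return port >= EPHEMERAL_LOW && port <= EPHEMERAL_HIGH;
    }

    public static void main(String[] args) {
        System.out.println(inEphemeralRange(50095)); // true: the old Monitor port could already be taken
        System.out.println(inEphemeralRange(9995));  // false: the new Monitor port is outside the range
    }
}
```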

Thank You
- Email: elserj@apache.org
- Twitter: @josh_elser
- Mailing list: dev@accumulo.apache.org
