Accumulo Summit 2014: Data-Center Replication with Apache Accumulo

Data-Center Replication with Apache Accumulo. Josh Elser, Member of Technical Staff; PMC, Apache Accumulo. June 12th, 2014. © Hortonworks Inc. 2014. Apache, Accumulo, Apache Accumulo, and ZooKeeper are trademarks of the Apache Software Foundation.

Upload: accumulo-summit

Post on 28-Nov-2014


DESCRIPTION

Speaker: Josh Elser. Apache Accumulo presently lacks the ability to automatically replicate its contents to another Accumulo instance with low latency. The only options currently available involve quiescing a table, exporting that table, copying it to the remote instance, and importing it. This is unacceptable for a few reasons, the most important being that the table must be unavailable while it is exported. This talk will outline the problems in designing a low-latency replication system for Accumulo tables, describe an implementation that leverages some useful features of Accumulo, and outline future work in the area.

TRANSCRIPT

Page 1: Accumulo Summit 2014: Data-Center Replication with Apache Accumulo

© Hortonworks Inc. 2014

Data-Center Replication with Apache Accumulo

Josh Elser

Member of Technical Staff

PMC, Apache Accumulo

June 12th, 2014

Apache, Accumulo, Apache Accumulo, and ZooKeeper are trademarks of the Apache Software Foundation.

Page 2: Accumulo Summit 2014: Data-Center Replication with Apache Accumulo

Justification

Shouldn’t Accumulo be able to handle failures?

Absolutely

• Accumulo is great at surviving through failure conditions
• Always chooses the durable option by default

BUT

• Sometimes wide-spread failure over some set of resources makes them unavailable for some time

Page 3: Accumulo Summit 2014: Data-Center Replication with Apache Accumulo

Justification

• Expect the worst hardware failure
  – Both in magnitude and frequency
• Administration problems
  – Human error is inevitable
• A single data-center location may be limiting
  – Geography: 150ms to send a packet from California to the Netherlands and back again.
  – Throughput: Concurrent user access (1K, 10K, 100K?)
• Applications must also consider failure conditions
  – Must maintain its service-level agreement (SLA)

Jeff Dean – “Numbers everyone should know”

http://static.googleusercontent.com/media/research.google.com/en/us/people/jeff/stanford-295-talk.pdf

Page 4: Accumulo Summit 2014: Data-Center Replication with Apache Accumulo

Justification

Wait, failure doesn’t really happen at the data center level… right?

Inside Google data centers – http://www.google.com/about/datacenters/gallery/images/_2000/IDI_014.jpg

Page 5: Accumulo Summit 2014: Data-Center Replication with Apache Accumulo

Justification

• HVAC Issues
  – Microsoft: http://blogs.office.com/2013/03/13/details-of-the-hotmail-outlook-com-outage-on-march-12th/
• “Acts of God”
  – Hurricane Sandy (NYC): http://www.datacenterknowledge.com/archives/2012/10/30/major-flooding-nyc-data-centers/
  – Thunderstorms (AWS, Netflix, Heroku, Pinterest, Instagram): http://www.datacenterknowledge.com/archives/2012/06/30/amazon-data-center-loses-power-during-storm/
• Squirrels
  – Level(3): http://blog.level3.com/level-3-network/the-10-most-bizarre-and-annoying-causes-of-fiber-cuts/

Page 6: Accumulo Summit 2014: Data-Center Replication with Apache Accumulo

Justification

• “The Joys of Real Hardware” – Jeff Dean
• Typical first year with a new cluster
  ~0.5 overheating (~1-2 days)
  ~1 PDU failure (~6 hours)
  ~20 Rack failures (1-6 hours)
  ~3 Router failures (~1 hour)
• What happens when you suddenly lose a large percent of nodes in your cluster?

Jeff Dean – “The Joys of Real Hardware”

http://static.googleusercontent.com/media/research.google.com/en/us/people/jeff/stanford-295-talk.pdf

Page 7: Accumulo Summit 2014: Data-Center Replication with Apache Accumulo

Justification

And what about your software…

Don’t forget about Grace Hopper. Bugs will show up one way or another.

Lots of great technical “post-mortems” involving software problems from companies like GitHub, Google, Dropbox, Cloudflare, and others.

Unmodified source: http://www.bugzilla.org/img/buggie.png

License: http://creativecommons.org/licenses/by-sa/2.0/

Page 8: Accumulo Summit 2014: Data-Center Replication with Apache Accumulo

Implementation

• A framework for tracking data written to a table
• Interfaces to transmit data to another storage system
• Asynchronous replication, eventual consistency
• Configurable, cyclic replication graph
• “Primary-push” system from primary to peers
• Survives prolonged peer outages

[Diagram labels: NoSQL, X, RDBMS]
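The combination of asynchronous replication, primary-push, and surviving peer outages can be illustrated with a toy model. The sketch below is not Accumulo code: `Peer`, `Primary`, and the in-memory backlog are hypothetical names chosen for illustration, and a real system would persist its backlog rather than hold it in memory.

```python
from collections import deque


class Peer:
    """Hypothetical stand-in for a remote system that receives replicated data."""

    def __init__(self):
        self.available = True
        self.applied = []

    def receive(self, batch):
        if not self.available:
            raise ConnectionError("peer unreachable")
        self.applied.extend(batch)


class Primary:
    """Accepts writes locally and pushes them to a peer asynchronously."""

    def __init__(self, peer):
        self.peer = peer
        self.data = []
        self.pending = deque()  # writes not yet accepted by the peer

    def write(self, mutation):
        # The write succeeds locally even when the peer is down.
        self.data.append(mutation)
        self.pending.append(mutation)

    def push(self, batch_size=2):
        """Try to drain the backlog; return how many mutations were replicated."""
        sent = 0
        while self.pending:
            count = min(batch_size, len(self.pending))
            batch = [self.pending[i] for i in range(count)]
            try:
                self.peer.receive(batch)
            except ConnectionError:
                break  # keep the backlog and retry on a later call
            for _ in batch:
                self.pending.popleft()
            sent += len(batch)
        return sent
```

While the peer is unavailable, `write` keeps succeeding and `push` returns 0; once the peer returns, the next `push` drains the backlog in order. The peer lags the primary in the meantime, which is the eventual-consistency trade-off named on the slide.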

Page 9: Accumulo Summit 2014: Data-Center Replication with Apache Accumulo

Implementation

• Write-Ahead Logs (WALs) are the source of data for replication
  – Tricky because WALs are per-TabletServer, not per-table.
• Primary storage of book-keeping in a new table
• Requires some records in Accumulo metadata table
• Pluggable replication work assignment by the Master to TabletServers on primary
  – Default implementation uses ZooKeeper
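To see why per-TabletServer WALs are tricky, consider that one log interleaves writes for every table hosted on that server, so replicating a single table means filtering the log. The following is a minimal sketch with a made-up `WalEntry` record; real Accumulo WAL records carry far more detail than a table id and a mutation.

```python
from collections import namedtuple

# Hypothetical, simplified WAL entry used only for this illustration.
WalEntry = namedtuple("WalEntry", ["table_id", "mutation"])


def mutations_for_table(wal_entries, table_id):
    """Return, in log order, the mutations in one WAL that belong to table_id."""
    return [e.mutation for e in wal_entries if e.table_id == table_id]


# One server's log, holding writes for two different tables.
wal = [
    WalEntry("t1", "put row1"),
    WalEntry("t2", "put rowA"),
    WalEntry("t1", "put row2"),
]

replicated = mutations_for_table(wal, "t1")
```

Here only the two `t1` mutations would be sent to a peer, even though the whole file has to be read to find them.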

Page 10: Accumulo Summit 2014: Data-Center Replication with Apache Accumulo

Accumulo Replication

• TabletServers track use of WALs
  – Only for tables configured for replication
• Master reads these records
  – Creates units of replication work (unit = file needing replication to a peer)
  – Makes replication work units available
  – Cleans up records which are fully replicated
• TabletServers acquire the replication work
  – Read part of file, replicate, record progress, repeat
  – Tries to replicate as much data as possible
• Garbage Collector manages unused WALs
  – Records an update when a WAL is no longer referenced
  – Only removes WALs when replication is complete
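The division of labor on this slide can be modeled as a small piece of bookkeeping. The sketch below is an assumption-laden toy: `ReplicationBook` and all of its methods are hypothetical, entries are counted rather than stored, and the real implementation keeps these records in Accumulo tables rather than Python dictionaries.

```python
class ReplicationBook:
    """Toy bookkeeping for WAL files that need replication (names hypothetical)."""

    def __init__(self, peers):
        self.peers = list(peers)
        self.wal_length = {}     # WAL name -> number of entries written so far
        self.referenced = set()  # WALs still in use by a TabletServer
        self.progress = {}       # (wal, peer) -> entries already replicated

    def track(self, wal, entries):
        """TabletServer: record that a WAL holds data for a replicated table."""
        self.wal_length[wal] = self.wal_length.get(wal, 0) + entries
        self.referenced.add(wal)

    def create_work(self):
        """Master: one unit of work per (file, peer) pair that is not finished."""
        for wal in self.wal_length:
            for peer in self.peers:
                self.progress.setdefault((wal, peer), 0)
        return [unit for unit in self.progress if not self._done(unit)]

    def replicate(self, unit, chunk):
        """TabletServer: replicate up to `chunk` entries and record progress."""
        wal, _peer = unit
        remaining = self.wal_length[wal] - self.progress[unit]
        sent = min(chunk, remaining)
        self.progress[unit] += sent
        return sent

    def close(self, wal):
        """Garbage collector: note that no TabletServer references the WAL."""
        self.referenced.discard(wal)

    def removable(self):
        """WALs that are unreferenced and fully replicated to every peer."""
        return sorted(
            wal for wal in self.wal_length
            if wal not in self.referenced
            and all(self._done((wal, peer)) for peer in self.peers)
        )

    def _done(self, unit):
        wal, _peer = unit
        return self.progress.get(unit, 0) >= self.wal_length[wal]
```

With two peers, a WAL that has been fully replicated to only one of them never appears in `removable()`, which mirrors the rule that the Garbage Collector only removes WALs when replication is complete.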

Page 11: Accumulo Summit 2014: Data-Center Replication with Apache Accumulo

Implementation

ReplicaSystem – interface for defining how data is replicated to a peer.

• Runs inside a Primary tserver, does the “heavy lifting”
• AccumuloReplicaSystem – implementation that sends data to another Accumulo instance
  – Primary tserver asks the Peer’s Master for a tserver
  – Primary tserver sends serialized Mutations from the WAL for the given table to the Peer tserver
  – Peer tserver applies the Mutations locally and reports the number of Mutations applied back to the Primary

[Diagram labels: Primary, Peer]
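The three-step exchange described for AccumuloReplicaSystem can be sketched as a pluggable interface with one in-memory implementation. This is only a Python analogue for illustration: Accumulo itself is written in Java, so the actual interface will not look like this, and `PeerMaster`, `PeerTabletServer`, `InMemoryReplicaSystem`, and every method signature here are hypothetical.

```python
from abc import ABC, abstractmethod


class ReplicaSystem(ABC):
    """Illustrative analogue of a pluggable replication target."""

    @abstractmethod
    def replicate(self, mutations):
        """Send mutations to the peer; return how many the peer applied."""


class PeerTabletServer:
    """Hypothetical peer-side server that applies mutations to a local table."""

    def __init__(self):
        self.table = {}

    def apply(self, mutations):
        for row, value in mutations:
            self.table[row] = value
        return len(mutations)


class PeerMaster:
    """Hypothetical peer-side master that hands out a tserver to talk to."""

    def __init__(self, tservers):
        self.tservers = tservers
        self._next = 0

    def pick_tserver(self):
        tserver = self.tservers[self._next % len(self.tservers)]
        self._next += 1
        return tserver


class InMemoryReplicaSystem(ReplicaSystem):
    """Follows the steps on the slide, without any network or serialization."""

    def __init__(self, peer_master):
        self.peer_master = peer_master

    def replicate(self, mutations):
        tserver = self.peer_master.pick_tserver()  # 1. ask the peer's Master
        applied = tserver.apply(list(mutations))   # 2. send the mutations
        return applied                             # 3. count reported back
```

Because the primary only depends on `replicate` returning a count, a different subclass could target a non-Accumulo system without changing the caller, which is the point of making the interface pluggable.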

Page 12: Accumulo Summit 2014: Data-Center Replication with Apache Accumulo

Implementation

• And it works!
• Will be introduced as a part of Apache Accumulo 1.7.0
• Also included in the next release of HDP
• Currently (6/2014) waiting on review from other devs

Page 13: Accumulo Summit 2014: Data-Center Replication with Apache Accumulo

Future

• Replication to other types of systems
  – Other NoSQL systems, RDBMSes, others?
• Support for replication of bulk imports
• Support for Conditional Mutations
  – Maybe? Somehow?
• Consistency of table configurations
  – Iterator/Combiner definitions
  – Could cause undesired implications
  – Peer system is smaller than primary, cannot hold as much data
  – Need to set shorter TTL/age-off on peer than primary
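One way to picture the table-configuration concern: the same replicated updates can produce different scan results when the two tables are configured differently. The sketch below assumes, purely for illustration, that the primary sums all versions of a cell with a combiner while the peer keeps only the newest version; both functions are hypothetical stand-ins, not Accumulo iterators.

```python
def scan_with_summing_combiner(versions):
    """Scan-time view when a summing combiner is configured on the column."""
    return sum(versions)


def scan_with_latest_version_only(versions):
    """Scan-time view when only the newest version of a cell is kept."""
    return versions[-1]


# The same three updates to one cell, in timestamp order, replicated unchanged.
replicated_updates = [1, 2, 3]

primary_view = scan_with_summing_combiner(replicated_updates)
peer_view = scan_with_latest_version_only(replicated_updates)
```

Under these assumptions the primary reports 6 for the cell while the peer reports 3, even though replication delivered every update correctly.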

Page 14: Accumulo Summit 2014: Data-Center Replication with Apache Accumulo

Email: [email protected]: @je2451

I owe a huge thank you to everyone who was a part of this in one way or another.

JIRA: https://issues.apache.org/jira/browse/ACCUMULO-378