Data-Center Replication with Apache Accumulo
Posted on 25-Jan-2015
DESCRIPTION: Talk given at Accumulo Summit 2014 in Hyattsville, MD. Apache Accumulo presently lacks the ability to automatically replicate its contents to another Accumulo instance with low latency and no administrator intervention. This talk will outline the problems in designing a low-latency replication system for Accumulo tables, describe an implementation that leverages some useful features of Accumulo, and outline future work in the area.
1. Data-Center Replication with Apache Accumulo
   Josh Elser, Member of Technical Staff; PMC, Apache Accumulo
   June 12th, 2014
   Hortonworks Inc. 2014
   Apache, Accumulo, Apache Accumulo, and ZooKeeper are trademarks of the Apache Software Foundation.
2. Justification
   - Shouldn't Accumulo be able to handle failures?
   - Absolutely
     - Accumulo is great at surviving failure conditions
     - Always chooses the durable option by default
   - BUT
     - Sometimes wide-spread failure over some set of resources makes them unavailable for some time

3. Justification
   - Expect the worst hardware failure
     - Both in magnitude and frequency
   - Administration problems
     - Human error is inevitable
   - A single data-center location may be limiting
     - Geography: 150ms to send a packet from California to the Netherlands and back again
     - Throughput: concurrent user access (1K, 10K, 100K?)
   - Applications must also consider failure conditions
     - Must maintain their service-level agreement (SLA)
   Source: Jeff Dean, "Numbers everyone should know", http://static.googleusercontent.com/media/research.google.com/en/us/people/jeff/stanford-295-talk.pdf

4. Justification
   - Wait, failure doesn't really happen at the data-center level... right?
   Image: Inside Google data centers, http://www.google.com/about/datacenters/gallery/images/_2000/IDI_014.jpg

5. Justification
   - HVAC issues
     - Microsoft: http://blogs.office.com/2013/03/13/details-of-the-hotmail-outlook-com-outage-on-march-12th/
   - Acts of God
     - Hurricane Sandy (NYC): http://www.datacenterknowledge.com/archives/2012/10/30/major-flooding-nyc-data-centers/
     - Thunderstorms (AWS, Netflix, Heroku, Pinterest, Instagram): http://www.datacenterknowledge.com/archives/2012/06/30/amazon-data-center-loses-power-during-storm/
   - Squirrels
     - Level(3): http://blog.level3.com/level-3-network/the-10-most-bizarre-and-annoying-causes-of-fiber-cuts/

6. Justification
   - "The Joys of Real Hardware" (Jeff Dean)
   - Typical first year with a new cluster
     - ~0.5 overheating (~1-2 days)
     - ~1 PDU failure (~6 hours)
     - ~20 rack failures (1-6 hours)
     - ~3 router failures (~1 hour)
   - What happens when you suddenly lose a large percentage of the nodes in your cluster?
   Source: Jeff Dean, "The Joys of Real Hardware", http://static.googleusercontent.com/media/research.google.com/en/us/people/jeff/stanford-295-talk.pdf

7. Justification
   - And what about your software?
   - Don't forget about Grace Hopper.
   - Bugs will show up one way or another
   - Lots of great technical post-mortems involving software problems from companies like GitHub, Google, Dropbox, Cloudflare, and others.
   Image: unmodified source http://www.bugzilla.org/img/buggie.png, license http://creativecommons.org/licenses/by-sa/2.0/

8. Implementation
   - A framework for tracking data written to a table
   - Interfaces to transmit data to another storage system
   - Asynchronous replication, eventual consistency
   - Configurable, cyclic replication graph
   - Primary-push system from primary to peers
   - Survives prolonged peer outages
   [Figure labels: NoSQL, X, RDBMS]

9. Implementation
   - Write-Ahead Logs (WALs) are the source of data for replication
     - Tricky because WALs are per-TabletServer, not per-table
   - Primary storage of book-keeping in a new table
     - Requires some records in the Accumulo metadata table
   - Pluggable replication work assignment by the Master to TabletServers on the primary
     - Default implementation uses ZooKeeper

10. Implementation
   - TabletServers track use of WALs
     - Only for tables configured for replication
   - Master reads these records
     - Creates units of replication work
     - Unit = file needing replication to a peer
     - Makes replication work units available
     - Cleans up records which are fully replicated
   - TabletServers acquire the replication work
     - Read part of file, replicate, record progress, repeat
     - Tries to replicate as much data as possible
   - Garbage Collector manages unused WALs
     - Records an update when a WAL is no longer referenced
     - Only removes WALs when replication is complete

11. Implementation
   - ReplicaSystem: interface for defining how data is replicated to a peer
     - Runs inside a Primary tserver, does the heavy lifting
   - AccumuloReplicaSystem: implementation that sends data to another Accumulo instance
     - Primary tserver asks the Peer's Master for a tserver
     - Primary tserver sends serialized Mutations from the WAL for the given table to the Peer tserver
     - Peer tserver applies the Mutations locally and reports the number of Mutations applied back to the Primary
   [Figure labels: Primary, Peer]

12. Implementation
   - And it works!
   - Will be introduced as a part of Apache Accumulo 1.7.0
   - Also included in the next release of HDP
   - Currently (6/2014) waiting on review from other devs

13. Future
   - Replication to other types of systems
     - Other NoSQL systems, RDBMSes, others?
   - Support for replication of bulk imports
   - Support for Conditional Mutations
     - Maybe? Somehow?
   - Consistency of table configurations
     - Iterator/Combiner definitions
     - Could cause undesired implications
   - Peer system is smaller than primary, cannot hold as much data
     - Need to set a shorter TTL/age-off on the peer than on the primary

14. Contact
   - Email: firstname.lastname@example.org
   - Twitter: @je2451
   - I owe a huge thank you to everyone who was a part of this in one way or another.
   - JIRA: https://issues.apache.org/jira/browse/ACCUMULO-378
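
Illustrative sketch (not from the talk): the work cycle on slide 10 ("read part of file, replicate, record progress, repeat") and the rule that a WAL is only removed once replication is complete could be modeled roughly as below. This is a toy in-memory model in Python, not Accumulo code. The names `Peer`, `WorkUnit`, `replicate`, and `wal_removable`, the batch size, and the representation of a WAL as a list of (table, mutation) pairs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical stand-in for a peer instance; it just stores what it is sent."""
    name: str
    applied: list = field(default_factory=list)

    def apply(self, mutations):
        # The peer applies the mutations locally and reports how many it applied.
        self.applied.extend(mutations)
        return len(mutations)


@dataclass
class WorkUnit:
    """One unit of replication work: a WAL file that must reach one peer."""
    wal_name: str
    peer: Peer
    offset: int = 0  # number of WAL entries already processed for this peer


def replicate(unit, wal_entries, table, batch_size=2):
    """Read part of the file, replicate, record progress, repeat."""
    while unit.offset < len(wal_entries):
        chunk = wal_entries[unit.offset:unit.offset + batch_size]
        # WALs are per-TabletServer, so keep only entries for the replicated table.
        mutations = [m for (t, m) in chunk if t == table]
        acked = unit.peer.apply(mutations) if mutations else 0
        if acked != len(mutations):
            break  # not fully confirmed; a later attempt resumes from the same offset
        unit.offset += len(chunk)  # record progress only after the peer confirms
    return unit.offset == len(wal_entries)


def wal_removable(wal_name, units, wal_length, still_referenced):
    """A WAL may go only when it is unreferenced and every peer has all of it."""
    fully_replicated = all(
        u.offset == wal_length for u in units if u.wal_name == wal_name
    )
    return (not still_referenced) and fully_replicated


# Example: one WAL shared by two tables, replicated to two peers.
wal = [
    ("orders", "put r1"),
    ("logs", "put x"),
    ("orders", "put r2"),
    ("orders", "delete r1"),
    ("logs", "put y"),
]
peer_a, peer_b = Peer("peer_a"), Peer("peer_b")
units = [WorkUnit("wal-0001", peer_a), WorkUnit("wal-0001", peer_b)]

replicate(units[0], wal, "orders")
print(wal_removable("wal-0001", units, len(wal), still_referenced=False))
replicate(units[1], wal, "orders")
print(wal_removable("wal-0001", units, len(wal), still_referenced=False))
```

In this model, progress is tracked per (file, peer) pair, so a peer that is unavailable for a long time delays removal of the WAL without blocking replication to the other peers. This mirrors the "survives prolonged peer outages" goal on slide 8.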