accumulo summit 2015: reactive programming in accumulo: the observable wal [internals]

The Observable WAL Reactive Programming in Accumulo

Author: accumulo-summit

Post on 15-Jul-2015






What is Write-Ahead Logging
"In computer science, write-ahead logging (WAL) is a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems." - Wikipedia
Allows updates to the database to be done in place.
This is advantageous because it reduces the need to modify indexes and the places where the data is stored.

Primary Use Case for WAL
Write-ahead logging is primarily used for disaster recovery.
Write-ahead logging is a technique in which all modifications are written to a log before they are applied. Usually both redo and undo information is stored in the write-ahead log. This guarantees that all modifications to the data can be played and replayed.
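The log-before-apply rule can be illustrated with a minimal in-memory sketch. It is not how Accumulo implements its WAL: `TinyStore`, its methods, and the entry shape are hypothetical names chosen for illustration, and a plain array stands in for a durable log file. Only redo information is kept here.

```javascript
// Minimal in-memory sketch of write-ahead logging (illustrative names only).
class TinyStore {
  constructor(log = []) {
    this.log = log;        // stands in for a durable, append-only log file
    this.data = new Map(); // in-memory state that would be lost on a crash
  }

  // Record the modification in the log first, then apply it.
  put(key, value) {
    this.log.push({ op: 'put', key, value });
    this.data.set(key, value);
  }

  remove(key) {
    this.log.push({ op: 'remove', key });
    this.data.delete(key);
  }

  // Rebuild state by replaying the log entries in order.
  static recover(log) {
    const store = new TinyStore(log);
    for (const entry of log) {
      if (entry.op === 'put') store.data.set(entry.key, entry.value);
      else if (entry.op === 'remove') store.data.delete(entry.key);
    }
    return store;
  }
}

const store = new TinyStore();
store.put('row1', 'a');
store.put('row2', 'b');
store.remove('row1');

// Simulate a crash: only the log survives, and state is rebuilt from it.
const recovered = TinyStore.recover(store.log);
console.log([...recovered.data.entries()]);
```

Because every change reaches the log before the in-memory map, replaying the log from the start always reproduces the last state.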

Accumulo Write-Ahead Log (WAL)
Write-ahead logs are stored per tablet server.
Write-ahead logs are stored in HDFS (as of Accumulo 1.5), generally under /accumulo/wal/
They are written to HDFS so that they are available to all tablet servers.
An Accumulo WAL is a file containing a series of key-value pairs of serialized objects.

Tablet Server Flow
[Diagram: Tablet Server, WAL (HDFS), Mem Table, ISAM File (Indexed Sequential Access Method file)]
1. Write
2. Mutation goes to WAL
3. Minor compactions written to WAL

Master Use of the WAL
According to the Accumulo manual, the Master coordinates startup, shutdown, and recovery using the WALs if a tablet server fails.
If a tablet server fails, its tablets are reassigned to a different tablet server, and the WAL is used to copy the entries that were in the failed TServer's MemTable to the new tablet server hosting the tablets.
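The recovery step can be pictured as a filtered replay. The following is a simplified sketch and not Accumulo's recovery code: the entry shape (`tabletId`, `row`, `value`) and the function `recoverTablets` are assumptions made for illustration.

```javascript
// Hypothetical WAL entries: each mutation is tagged with the tablet it belongs to.
const wal = [
  { tabletId: 6, row: 'r1', value: 'a' },
  { tabletId: 7, row: 'r9', value: 'x' },
  { tabletId: 6, row: 'r2', value: 'b' },
];

// Replay only the entries for the tablets being reassigned, rebuilding
// their in-memory tables on the new server.
function recoverTablets(walEntries, tabletIds) {
  const memTables = new Map(tabletIds.map((id) => [id, new Map()]));
  for (const entry of walEntries) {
    const memTable = memTables.get(entry.tabletId);
    if (memTable) memTable.set(entry.row, entry.value);
  }
  return memTables;
}

// Tablet 6 is reassigned; entries for tablet 7 are ignored.
const recoveredTablets = recoverTablets(wal, [6]);
console.log([...recoveredTablets.get(6).entries()]);
```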

Metadata Information
What WAL information is stored in the metadata table?
3< log: []|6

There can be many WALs for a particular table.
This record tells us the name of the WAL file; in this case it would be /accumulo/wal/. It also tells us that the data for the tablet is encoded using the tablet ID of 6 within the log file.

LogFileKey
Source is located at: server/tserver/.../tserver/logger/LogFileKey.java
Consists of:
Log Event (Open, Define Tablet, Mutation, Many Mutations, Compaction Start, and Compaction Finish)
Tablet ID
Other information depending on the event:
  TServer Session (Open)
  Filename (Compaction Start)
  Sequence (Mutation, Many Mutations, Define Tablet, Compaction Start, and Compaction Finish)
  KeyExtent Tablet (Define Tablet)

LogFileValue
This is generally used when dealing with the Mutation and Many Mutations log events.
The value consists of a number of Mutation objects, which are serialized into the WAL value.

Reactive Programming
Definition: programming with asynchronous data streams.
Examples include clickstreams and event buses.
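A small event bus shows the idea in code. This sketch uses Node's built-in `EventEmitter`; the `click` event name and the payload fields are made up for illustration. `EventEmitter` invokes listeners synchronously, so this only demonstrates the subscribe-and-react shape, not asynchronous delivery.

```javascript
const { EventEmitter } = require('events');

const bus = new EventEmitter();
const seen = [];

// Subscribers declare how to react to events; they never poll for new data.
bus.on('click', (event) => seen.push(`${event.user} clicked ${event.target}`));

// Producers push events onto the bus as they happen.
bus.emit('click', { user: 'alice', target: 'search' });
bus.emit('click', { user: 'bob', target: 'login' });

console.log(seen);
```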

What should a Reactive System be?

Image courtesy of Reactive Manifesto

Why Reactive Programming?
Modern applications are becoming more collaborative and require sharing data with multiple users in real time.
The polling model had problems scaling to this, as users would put constant load on the database due to the low polling intervals needed to keep up with the database.
Most modern systems and databases are moving to a push infrastructure to ease stress on the database.

WAL + Reactive Programming = Fun
We can use the WAL to monitor for additions, updates, and deletions, and then push those to other pieces of code that handle the various mutations:
Replication
Updating external indexes
Triggering updates to websites, etc.

How do we do it?
We write a simple program to create an event stream for tables:
1. Monitor the WAL directory for updates using the MultiReader class (see: server/tserver/.../logging/LogReader as an example)
2. Correlate the Tablet IDs to a particular table
3. Push the Mutations to the users
4. Profit!!!!!!

Step 2.5 (Security)
Before we push the mutations to the different users, we have to do some sort of security verification. The easiest method is to gather users' security credentials when they register for notifications and verify that they are allowed to see each mutation. Accumulo has no built-in mechanism at this level to guarantee security visibility, so it is left to the application.
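The steps above, including the security check, can be sketched end to end without touching Accumulo itself. Everything in this sketch is an assumption for illustration. The entry objects are a simplified stand-in for LogFileKey/LogFileValue pairs, with a `tableId` standing in for the KeyExtent. The `labels` and `auths` check is a much reduced substitute for real visibility evaluation. Reading the WAL files (step 1) is replaced by an in-memory array.

```javascript
// Hypothetical, simplified WAL entries modelled on the LogFileKey events.
const entries = [
  { event: 'DEFINE_TABLET', tabletId: 6, tableId: '3' },
  { event: 'MUTATION', tabletId: 6, mutations: [{ row: 'r1', labels: ['public'] }] },
  { event: 'MANY_MUTATIONS', tabletId: 6, mutations: [
    { row: 'r2', labels: ['secret'] },
    { row: 'r3', labels: [] },
  ] },
  { event: 'COMPACTION_START', tabletId: 6 },
];

// Step 2.5: a subscriber may see a mutation only if it holds every label.
function canSee(subscriber, mutation) {
  return mutation.labels.every((label) => subscriber.auths.has(label));
}

function publish(walEntries, subscribers) {
  const tabletToTable = new Map();
  for (const entry of walEntries) {
    if (entry.event === 'DEFINE_TABLET') {
      // Step 2: remember which table each tablet ID belongs to.
      tabletToTable.set(entry.tabletId, entry.tableId);
    } else if (entry.event === 'MUTATION' || entry.event === 'MANY_MUTATIONS') {
      const tableId = tabletToTable.get(entry.tabletId);
      for (const mutation of entry.mutations) {
        for (const sub of subscribers) {
          // Step 3: push to subscribers of this table that pass the check.
          if (sub.tableId === tableId && canSee(sub, mutation)) {
            sub.onMutation(mutation);
          }
        }
      }
    }
    // Other events (such as compactions) carry no mutations and are skipped.
  }
}

const received = { analyst: [], guest: [] };
publish(entries, [
  { tableId: '3', auths: new Set(['public', 'secret']), onMutation: (m) => received.analyst.push(m.row) },
  { tableId: '3', auths: new Set(['public']), onMutation: (m) => received.guest.push(m.row) },
]);
console.log(received);
```

Checking credentials per mutation, rather than per subscription, matters here because a single table can hold mutations with different visibility requirements.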

Lambda Architectures
"Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream processing methods." - Wikipedia
Reactive programming is best used with the stream-processing component of Lambda architectures.

AWS Lambda
Lambda architecture using events from the Amazon AWS ecosystem

[Diagram: DynamoDB Event, S3 Event, and Kinesis Event feeding AWS Lambda]

Node JS Code

console.log('Loading event');

exports.handler = function(event, context) {
    // Log one field of the incoming event payload
    console.log(event.key1);
};

Stargate (Coming Soon)
AWS Lambda clone using Docker

[Diagram: Accumulo Event, HDFS Event, Storm Event]

Stargate (ctd)
Users can upload Docker containers, which can then be used to subscribe to events.
Each container has a REST server with an /events endpoint that accepts POST requests; this is where the system posts the events.
Security is handled by this system and passed along to the upstream systems.
Scaling will be handled based on the events that are coming in.

Questions

Links
Stargate Architecture Manifesto Programming GIST