
The Observable WAL: Reactive Programming in Accumulo

What is Write-Ahead Logging

“In computer science, write-ahead logging (WAL) is a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems.” - Wikipedia

● Allows updates to a database to be done “in-place”

o This is advantageous because it reduces the need to modify indexes and the places where the data is stored

Primary Use Case for WAL

● Write-ahead logging is primarily used for disaster recovery

● Write-ahead logging is a technique in which all modifications are written to a log before they are applied. Usually both redo and undo information is stored in the write-ahead log.

● This guarantees that all modifications to the data can be played and replayed.
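
The log-before-apply rule and the replay guarantee can be illustrated with a minimal in-memory sketch. This is not how Accumulo implements its WAL; the TinyStore class, its fields, and the entry layout are hypothetical names chosen for illustration, and an array stands in for a durable log file.

```javascript
// Minimal in-memory sketch of write-ahead logging (all names are hypothetical).
class TinyStore {
  constructor(log = []) {
    this.log = log;        // stands in for a durable, append-only log file
    this.data = new Map(); // stands in for the in-place data store
  }

  // Record the change in the log first, then apply it in place.
  put(key, value) {
    const entry = { op: "put", key, redo: value, undo: this.data.get(key) };
    this.log.push(entry);
    this.apply(entry);
  }

  // Apply the redo information of a single log entry.
  apply(entry) {
    if (entry.op === "put") this.data.set(entry.key, entry.redo);
  }

  // Rebuild state after a crash by replaying the surviving log from the start.
  static recover(log) {
    const store = new TinyStore(log);
    for (const entry of log) store.apply(entry);
    return store;
  }
}

const store = new TinyStore();
store.put("row1", "a");
store.put("row1", "b");

// Simulate a crash: the in-place data is lost, but the log survives.
const recovered = TinyStore.recover(store.log);
console.log(recovered.data.get("row1"));
```

Because every change reaches the log before it is applied, replaying the log in order reproduces the last applied state.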

Accumulo Write-Ahead Log (WAL)

● Write-ahead logs are stored per tablet server

● Write-ahead logs are stored in HDFS (as of Accumulo 1.5), generally under /accumulo/wal/<tablet server+port>

● They are written to HDFS so that they are available to all Tablet Servers

● An Accumulo write-ahead log (WAL) is a file containing a series of key-value pairs of serialized objects

Tablet Server Flow

[Diagram: Tablet Server, WAL (HDFS), MemTable, ISAM File (Indexed Sequential Access Method File)]

1. Write

2. Mutation goes to WAL

3. Minor compactions written to WAL

Master use of the WAL

● According to the Accumulo Manual, the Master will coordinate startup, shutdown, and recovery of changes in the WALs when a tablet server fails.

● If a tablet server fails, its tablets are reassigned to a different tablet server, and the WAL is used to copy the entries that were in the failed TServer's MemTable to the new tablet server hosting the tablets

Metadata Information

What WAL information is stored in the metadata table?

3< log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 [] 127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995|6

There can be many WALs for a particular table

● This record tells us the name of the WAL file; in this case it would be /accumulo/wal/127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995

● It also tells us that the data for the tablet is encoded using the tablet ID of 6 within the log file
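
As a rough sketch, the value of the metadata record above can be split into the WAL path and the tablet ID. The parseLogEntry helper is a hypothetical name, and the "<wal file>|<tablet id>" layout and the /accumulo/wal root are taken only from the example record on this slide, not from the Accumulo API.

```javascript
// Hypothetical helper: split a metadata "log" value of the form
// "<host>+<port>/<uuid>|<tabletId>" into a WAL path and a tablet ID.
// The layout is assumed from the single example record shown above.
function parseLogEntry(value, walRoot = "/accumulo/wal") {
  const sep = value.lastIndexOf("|");
  if (sep < 0) {
    throw new Error("expected '<wal file>|<tablet id>' but got: " + value);
  }
  const walFile = value.slice(0, sep);
  const tabletId = Number.parseInt(value.slice(sep + 1), 10);
  if (Number.isNaN(tabletId)) {
    throw new Error("tablet id is not a number in: " + value);
  }
  return { walPath: walRoot + "/" + walFile, tabletId };
}

const entry = parseLogEntry(
  "127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995|6"
);
console.log(entry.walPath, entry.tabletId);
```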

LogFileKey

Source is located at: server/tserver/.../tserver/logger/LogFileKey.java

Consists of:

● Log Event (Open, Define Tablet, Mutation, Many Mutations, Compaction Start, and Compaction Finish)

● Tablet ID

● Other information depending on the event

o TServer Session (Open)

o Filename (Compaction Start)

o Sequence (Mutation, Many Mutations, Define Tablet, Compaction Start and Compaction Finish)

o KeyExtent Tablet (Define Tablet)

LogFileValue

This is generally used when dealing with the Mutation and Many Mutations log events

● The value itself consists of a number of Mutation objects, which it serializes into the WAL values

Reactive Programming

Definition: programming with asynchronous data streams

Examples of this are clickstreams and event buses

What should a Reactive System be?

Image courtesy of Reactive Manifesto

Why Reactive Programming?

● Modern applications are becoming more collaborative and require sharing data with multiple users in real time.

● The polling model had problems scaling to this, as users would put constant load on the database due to the short polling intervals needed to keep up with changes in the database.

● Most modern systems and databases are moving to a Push infrastructure to ease stress on the database

WAL + Reactive Programming = Fun

We can use the WAL to monitor for additions, updates, and deletions and then push those to other pieces of code to handle the various mutations

● Replication

● Updating external indexes

● Trigger updates to websites, etc...

How do we do it?

We write a simple program to create an event stream for tables:

1. Monitor the WAL directory for updates using the MultiReader class (see: server/tserver/.../logging/LogReader as an example)

2. Correlate the Tablet IDs to a particular table

3. Push the Mutations to the users.

4. Profit!!!!!!
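
Steps 2 and 3 can be sketched with Node's built-in EventEmitter. This sketch skips step 1 entirely: it assumes the WAL entries have already been read and decoded into plain objects, which in practice would mean deserializing LogFileKey/LogFileValue pairs in Java. The entry shape, the event-name strings (modeled on the log events listed earlier), and the streamMutations function are hypothetical.

```javascript
const { EventEmitter } = require("events");

// Hypothetical, already-decoded WAL entries. A real reader would produce
// these by deserializing LogFileKey/LogFileValue pairs from the WAL files.
const walEntries = [
  { event: "DEFINE_TABLET", tabletId: 6, tableId: "3" },
  { event: "MUTATION", tabletId: 6, mutations: [{ row: "r1", value: "v1" }] },
  {
    event: "MANY_MUTATIONS",
    tabletId: 6,
    mutations: [{ row: "r2", value: "v2" }, { row: "r3", value: "v3" }],
  },
  { event: "COMPACTION_START", tabletId: 6 },
];

// Correlate tablet IDs to tables (step 2) and push each mutation to
// whoever is listening on the emitter (step 3).
function streamMutations(entries, emitter) {
  const tabletToTable = new Map();
  for (const entry of entries) {
    if (entry.event === "DEFINE_TABLET") {
      // Assumed: the Define Tablet event identifies the tablet's table.
      tabletToTable.set(entry.tabletId, entry.tableId);
    } else if (entry.event === "MUTATION" || entry.event === "MANY_MUTATIONS") {
      const tableId = tabletToTable.get(entry.tabletId);
      for (const mutation of entry.mutations) {
        emitter.emit("mutation", { tableId, mutation });
      }
    }
    // Other events (open, compaction start/finish) carry no mutations.
  }
}

const emitter = new EventEmitter();
const received = [];
emitter.on("mutation", (m) => received.push(m));
streamMutations(walEntries, emitter);
console.log(received.length);
```

Subscribers only ever see mutation events tagged with a table ID; the WAL bookkeeping events stay inside the reader.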

Step 2.5 (Security)

Before we push the mutations to the different users, we have to do some sort of security verification.

● The easiest method is, when users “register” for notifications, to gather their security credentials and verify that they are allowed to see each mutation.

● Accumulo has no built-in mechanism at this level to enforce visibility security, so it is left to the application.
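
One possible shape for this application-level check is sketched below. It is a deliberate simplification: real Accumulo column visibilities are boolean expressions, whereas here a mutation's visibility is modeled as a plain list of labels that must all be held. The canSee, registerSubscriber, and pushMutation functions are hypothetical names.

```javascript
// Simplification: a visibility is modeled as a list of labels that the
// subscriber must hold in full. Real visibility expressions are richer.
function canSee(requiredLabels, userAuths) {
  return requiredLabels.every((label) => userAuths.has(label));
}

// "Registering" captures the subscriber's authorizations once, up front.
function registerSubscriber(subscribers, name, auths, onMutation) {
  subscribers.push({ name, auths: new Set(auths), onMutation });
}

// Push a mutation only to subscribers allowed to see it.
function pushMutation(subscribers, mutation) {
  for (const subscriber of subscribers) {
    if (canSee(mutation.visibility, subscriber.auths)) {
      subscriber.onMutation(mutation);
    }
  }
}

const subscribers = [];
const seen = { alice: [], bob: [] };
registerSubscriber(subscribers, "alice", ["public", "secret"], (m) =>
  seen.alice.push(m.row)
);
registerSubscriber(subscribers, "bob", ["public"], (m) => seen.bob.push(m.row));

pushMutation(subscribers, { row: "r1", visibility: ["public"] });
pushMutation(subscribers, { row: "r2", visibility: ["public", "secret"] });
pushMutation(subscribers, { row: "r3", visibility: [] });
console.log(seen);
```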

Lambda Architectures

Lambda architecture is a data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream processing methods. - Wikipedia

Reactive programming is best used with the stream processing component of Lambda Architectures.

AWS Lambda

Lambda Architecture using Events from the Amazon AWS Ecosystem

DynamoDB Event

S3 Event

Kinesis Event

AWS Lambda

Node JS Code

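console.log("Loading event");

// Lambda entry point: invoked with each incoming event
exports.handler = function (event, context) {
    // Log the "key1" field of the event payload
    console.log(event.key1);
};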

Stargate (Coming Soon)

AWS Lambda clone using Docker

Accumulo Event

HDFS Event

Storm Event

Stargate (ctd…)

● Users can upload Docker containers, which can then be used to subscribe to events.

● Each container has a REST server with an /events endpoint that accepts POST requests; this is where the system will post the events

● Security is handled by this system and passed along to the upstream systems

● Scaling will be handled based on the events that are coming in
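
A container-side /events endpoint might look roughly like the following sketch, which uses only Node's built-in http module. The slides do not describe Stargate's wire format, so the JSON body, the response shape, the port in the usage comment, and the handleEvent and createEventServer names are all assumptions.

```javascript
const http = require("http");

// Hypothetical handler: whatever the container does with each posted event.
const receivedEvents = [];
function handleEvent(event) {
  receivedEvents.push(event);
  return { accepted: true, count: receivedEvents.length };
}

// Build (but do not start) a server exposing POST /events.
// A JSON request body is an assumption, not a documented format.
function createEventServer() {
  return http.createServer((req, res) => {
    if (req.method !== "POST" || req.url !== "/events") {
      res.writeHead(404);
      res.end();
      return;
    }
    let body = "";
    req.on("data", (chunk) => {
      body += chunk;
    });
    req.on("end", () => {
      let event;
      try {
        event = JSON.parse(body);
      } catch (err) {
        res.writeHead(400); // reject bodies that are not valid JSON
        res.end();
        return;
      }
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify(handleEvent(event)));
    });
  });
}

// To run inside a container (port chosen arbitrarily):
// createEventServer().listen(8080);
```

Keeping handleEvent separate from the HTTP plumbing means the event logic can be exercised without starting a server.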

Questions

Links

Stargate https://github.com/immuta/stargate

Accumulo-Observable-WAL https://github.com/immuta/accumulo-observable-wal

Lambda Architecture http://lambda-architecture.net/

Reactive Manifesto http://www.reactivemanifesto.org/

Reactive Programming GIST https://gist.github.com/staltz/868e7e9bc2a7b8c1f754

