Tungsten Replicator for Kafka, Elasticsearch, Cassandra
Topics
In today's session:
• Replicator Basics
• Filtering and Glue
• Kafka and Options
• Elasticsearch and Options
• Cassandra
• Future Direction
Asynchronous replication decouples transaction processing on master and slave DBMS nodes
[Diagram: MySQL/Oracle writes DBMS-specific logs (e.g., redo or binary); the Master Replicator (Extractor) reads them into THL (events + metadata); the Slave Replicator (Applier) downloads transactions via the network and applies them using JDBC]

Extractor Options
Option 1: Local install — the extractor reads directly from the logs, even when the DBMS service is down. This is the default.
Option 2: Remote — the extractor gets log data via the MySQL replication slave protocol (which requires the DBMS service to be online) or the Redo Reader feature. This is how RDS and Oracle extraction are handled.
Parallel apply maximizes DBMS I/O bandwidth when updating replicas
[Diagram: Master Replicator → THL → Parallel Queue (events + metadata) → Slave]

Slave Replicator Pipeline: each stage runs its own Extract → Filter → Apply sequence, with the final stage applying across multiple parallel threads.
Stages: remote-to-thl → thl-to-q → q-to-dbms
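The staged pipeline can be sketched conceptually in a few lines of Python. The stage names match the slide, but the code is only an illustration of the extract → filter → apply pattern, not the actual Tungsten implementation:

```python
# Conceptual sketch of a staged replication pipeline: each stage pulls
# events from its input queue, optionally filters them, and applies
# them into the next stage's queue. Illustrative only.

def run_stage(events, filter_fn=None):
    """One stage: extract -> filter -> apply."""
    out = []
    for event in events:                      # extract from input queue
        if filter_fn and not filter_fn(event):
            continue                          # filter drops the event
        out.append(event)                     # apply into the next queue
    return out

# remote-to-thl -> thl-to-q -> q-to-dbms
remote = [{"seqno": 1, "schema": "sbtest"}, {"seqno": 2, "schema": "tmp"}]
thl = run_stage(remote)
queue = run_stage(thl, filter_fn=lambda e: e["schema"] != "tmp")
applied = run_stage(queue)
print(applied)  # only the sbtest event survives the filter
```

In the real replicator the q-to-dbms stage fans out across multiple apply threads, which is what maximizes DBMS I/O bandwidth on the slave.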
Why Kafka
• Kafka is a high-performance message bus
• NOT a database
• Great for distributing messages and firing/triggering operations on content
• Log aggregation
• Activity/security tracking
• Metrics
• Auditing
• Data ingestion for Hadoop
Mass Data Collection with Kafka
[Diagram: many data sources feeding through Tungsten Replicator into multiple Kafka instances]
Multiple Target Distribution
[Diagram: Tungsten Replicator distributing changes from multiple databases to multiple Kafka targets, feeding downstream consumers such as image processing and metrics]
How Kafka Replication Works
[Diagram: MySQL/Oracle writes DBMS-specific logs (e.g., redo or binary); the Master Replicator (Extractor) reads them into THL (events + metadata); the Slave Replicator (Applier) downloads transactions via the network and the native Kafka applier publishes them to Kafka, coordinated through Zookeeper]
What Tungsten Replicator Does to Apply into Kafka
• Takes an incoming row and converts it to a message
• Message metadata consists of:
  – Schema name, table name
  – Sequence number
  – Commit timestamp
  – Operation type
• Embedded message content
Message Structure
[Diagram: rows from each Schema/Table are published to a topic named Schema_Table; each row becomes one message with a MsgID composed of the schema, table, and primary key]
Sample Message
{
  "_meta_committime" : "2017-05-27 14:27:18.0",
  "_meta_source_schema" : "sbtest",
  "_meta_seqno" : "10130",
  "_meta_source_table" : "sbtest",
  "_meta_optype" : "INSERT",
  "record" : {
    "c" : "Base Msg",
    "k" : "100",
    "id" : "255759",
    "pad" : "Some other submsg"
  }
}
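A row-to-message conversion matching the sample above can be sketched as follows; the metadata field names are taken from the sample message, while the function name and the string-typing of record values are assumptions:

```python
import json

def row_to_message(schema, table, seqno, optype, row, committime):
    """Convert one replicated row into the message layout shown above.
    Illustrative sketch, not the actual applier code."""
    return {
        "_meta_committime": committime,
        "_meta_source_schema": schema,
        "_meta_seqno": str(seqno),
        "_meta_source_table": table,
        "_meta_optype": optype,
        # values appear as strings in the sample message
        "record": {k: str(v) for k, v in row.items()},
    }

msg = row_to_message(
    "sbtest", "sbtest", 10130, "INSERT",
    {"c": "Base Msg", "k": 100, "id": 255759, "pad": "Some other submsg"},
    "2017-05-27 14:27:18.0",
)
print(json.dumps(msg, indent=2))
```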
Customizable Elements
• Whether acknowledgements are required from Kafka
• How much distribution/replication is required before sending the message
• Format of the message key
• Whether to embed the schema and table name
• Whether the commit timestamp should be embedded
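Two of these conventions, the topic name (Schema_Table) and the message key, can be sketched as small helpers; the acknowledgement behaviour maps onto the standard Kafka producer `acks` setting. All names and values here are illustrative:

```python
def topic_for(schema, table):
    """Topic naming convention from the slides: Schema_Table."""
    return f"{schema}_{table}"

def key_for(schema, table, pkey):
    """Message key built from schema, table, and primary key
    (the separator here is an assumption)."""
    return f"{schema}.{table}.{pkey}"

# Producer-side knob corresponding to the acknowledgement option above:
# "all" waits for the full in-sync replica set, "1" for the leader only,
# "0" sends without waiting. Values are illustrative defaults.
producer_config = {
    "acks": "all",
}

print(topic_for("sbtest", "sbtest"), key_for("sbtest", "sbtest", 255759))
```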
Demo
Elasticsearch
• Immediately replicate data into Elasticsearch for searching
• Contains the core text and content of the records
• Provides the original information to track back to the record
• Content is structured against the schema (index type) and table name (index)
• Document ID is based on the primary key and other configurable information
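The mapping above can be sketched as a small helper that builds the REST target for one row. The function name is hypothetical, and the table → index / schema → type assignment follows the slide's description:

```python
def to_es_action(schema, table, pkey, row):
    """Build the Elasticsearch index/type/id target and body for one
    replicated row. Illustrative sketch of the mapping, not the
    actual applier code."""
    return {
        "_index": table.lower(),   # table name -> index
        "_type": schema.lower(),   # schema -> index type
        "_id": str(pkey),          # primary key -> document id
        "_source": row,            # the record content itself
    }

doc = to_es_action("mg", "msg", 99999,
                   {"msg": "Hello ElasticSearch", "id": "99999"})
# Indexing this via REST would be: PUT /msg/mg/99999 with the _source body
print(doc)
```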
How Elasticsearch Replication Works
[Diagram: DBMS logs via redo logging (or a Redo Reader-generated PLOG for Oracle) are read by the Master Replicator (Extractor) into THL (events + metadata); the Slave Replicator (Applier) downloads transactions via the network and applies them through the Elasticsearch REST API]
Sample Entry
{
  "_id" : "99999",
  "_type" : "mg",
  "found" : true,
  "_version" : 2,
  "_index" : "msg",
  "_source" : {
    "msg" : "Hello ElasticSearch",
    "id" : "99999"
  }
}
Replicating into Cassandra
Demo
Cassandra
• Great for fast online and CRM style deployments
• Highly fault tolerant and scalable
• Has some data and formatting changes
  – Currently needs our DDL translation tool (soon to be built in)
• Quasi table/document style
How Cassandra Replication Works
[Diagram: the Master Replicator extracts changes; the Slave Replicator writes them out as CSV; a Ruby connector loads the CSV into staging tables; a JS-driven merge moves the staged rows into the base tables]
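The CSV → staging → merge flow can be modelled with plain Python dicts standing in for the Cassandra staging and base tables; all names are illustrative, and the real pipeline's handling of deletes is omitted:

```python
import csv
import io

def load_staging(csv_text, columns):
    """Parse one replicator CSV batch into staged row dicts
    (stand-in for the Ruby connector's staging load)."""
    return [dict(zip(columns, row))
            for row in csv.reader(io.StringIO(csv_text))]

def merge(base, staged, key):
    """Upsert staged rows into the base table keyed by primary key
    (stand-in for the JS-driven merge step): last write wins."""
    for row in staged:
        base[row[key]] = row
    return base

staged = load_staging("1,hello\n2,world\n1,updated\n", ["id", "msg"])
base = merge({}, staged, "id")
print(base["1"]["msg"])  # the later row for id 1 wins the merge
```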
Cassandra
Demo
Future Direction for these appliers and related technology
• Full transaction support for Kafka
• Support for Amazon Elasticsearch
• Kafka extraction
  – Parsing contents of Kafka message queues
  – Database updates
  – Large-scale distribution of database changes
  – Filtering and re-submission
General Tungsten Replicator Functionality
• Expanding the standard filter technology
  – Data translation (dates, numbers, hex)
  – Basic lookup/combination to aid ETL-style deployments
  – Data munging/obfuscation (PII, credit cards) for analytics
• More appliers
  – InfluxDB
  – SQL Server
  – PostgreSQL
  – Hadoop JDBC
  – MemSQL
  – Amazon (Aurora, Elasticsearch)
  – CouchDB/Base
• THL compression/encryption
Next Steps
• If you are interested in knowing more about Tungsten Replicator and would like to try it out for yourself, please contact our sales team, who will be able to take you through the details and set up a POC – [email protected]
• Read the documentation at http://docs.continuent.com/tungsten-replicator-5.2/index.html
• Subscribe to our Tungsten University YouTube channel! http://tinyurl.com/TungstenUni
For more information, contact us:
MC Brown
VP Products
[email protected]