Tungsten Replicator for Kafka, Elasticsearch, Cassandra
Topics
In today's session:
• Replicator Basics
• Filtering and Glue
• Kafka and Options
• Elasticsearch and Options
• Cassandra
• Future Direction
Asynchronous replication decouples transaction processing on master and slave DBMS nodes
[Diagram: MySQL/Oracle writes DBMS-specific logs (e.g., redo or binary); the Master Replicator (Extractor) reads them into THL (events + metadata); the Slave Replicator (Applier) downloads transactions via the network and applies them using JDBC]

Extractor Options
Option 1: Local install — the extractor reads directly from the logs, even when the DBMS service is down. This is the default.
Option 2: Remote — the extractor gets log data via the MySQL replication slave protocol (which requires the DBMS service to be online) or the Redo Reader feature. This is how RDS and Oracle extraction are handled.
Parallel apply maximizes DBMS I/O bandwidth when updating replicas
[Diagram: Master Replicator → THL → Parallel Queue (events + metadata) → Slave]

Slave Replicator Pipeline: each stage runs its own Extract → Filter → Apply sequence, with the final stage applying across multiple parallel threads.
Stages: remote-to-thl → thl-to-q → q-to-dbms
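The staged pipeline can be sketched conceptually in a few lines of Python. The stage names match the slide, but the code is only an illustration of the extract → filter → apply pattern, not the actual Tungsten implementation:

```python
# Conceptual sketch of a staged replication pipeline: each stage pulls
# events from its input queue, optionally filters them, and applies
# them into the next stage's queue. Illustrative only.

def run_stage(events, filter_fn=None):
    """One stage: extract -> filter -> apply."""
    out = []
    for event in events:                      # extract from input queue
        if filter_fn and not filter_fn(event):
            continue                          # filter drops the event
        out.append(event)                     # apply into the next queue
    return out

# remote-to-thl -> thl-to-q -> q-to-dbms
remote = [{"seqno": 1, "schema": "sbtest"}, {"seqno": 2, "schema": "tmp"}]
thl = run_stage(remote)
queue = run_stage(thl, filter_fn=lambda e: e["schema"] != "tmp")
applied = run_stage(queue)
print(applied)  # only the sbtest event survives the filter
```

In the real replicator the q-to-dbms stage fans out across multiple apply threads, which is what maximizes DBMS I/O bandwidth on the slave.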
Why Kafka
• Kafka is a high-performance message bus
• NOT a database
• Great for distributing messages and firing/triggering operations on content
• Log aggregation
• Activity/security tracking
• Metrics
• Auditing
• Data ingestion for Hadoop
Mass Data Collection with Kafka
[Diagram: many data sources feeding through Tungsten Replicator into multiple Kafka instances]
Multiple Target Distribution
[Diagram: Tungsten Replicator distributing changes from multiple databases to multiple Kafka targets, feeding downstream consumers such as image processing and metrics]
How Kafka Replication Works
[Diagram: MySQL/Oracle writes DBMS-specific logs (e.g., redo or binary); the Master Replicator (Extractor) reads them into THL (events + metadata); the Slave Replicator (Applier) downloads transactions via the network and the native Kafka applier publishes them to Kafka, coordinated through Zookeeper]
What Tungsten Replicator Does to Apply into Kafka
• Takes an incoming row and converts it to a message
• Message metadata consists of:
  – Schema name, table name
  – Sequence number
  – Commit timestamp
  – Operation type
• Embedded message content
Message Structure
[Diagram: rows from each Schema/Table are published to a topic named Schema_Table; each row becomes one message with a MsgID composed of the schema, table, and primary key]
Sample Message
{
  "_meta_committime" : "2017-05-27 14:27:18.0",
  "_meta_source_schema" : "sbtest",
  "_meta_seqno" : "10130",
  "_meta_source_table" : "sbtest",
  "_meta_optype" : "INSERT",
  "record" : {
    "c" : "Base Msg",
    "k" : "100",
    "id" : "255759",
    "pad" : "Some other submsg"
  }
}
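A row-to-message conversion matching the sample above can be sketched as follows; the metadata field names are taken from the sample message, while the function name and the string-typing of record values are assumptions:

```python
import json

def row_to_message(schema, table, seqno, optype, row, committime):
    """Convert one replicated row into the message layout shown above.
    Illustrative sketch, not the actual applier code."""
    return {
        "_meta_committime": committime,
        "_meta_source_schema": schema,
        "_meta_seqno": str(seqno),
        "_meta_source_table": table,
        "_meta_optype": optype,
        # values appear as strings in the sample message
        "record": {k: str(v) for k, v in row.items()},
    }

msg = row_to_message(
    "sbtest", "sbtest", 10130, "INSERT",
    {"c": "Base Msg", "k": 100, "id": 255759, "pad": "Some other submsg"},
    "2017-05-27 14:27:18.0",
)
print(json.dumps(msg, indent=2))
```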
Customizable Elements
• Whether acknowledgements are required from Kafka
• How much distribution/replication is required before sending the message
• Format of the message key
• Whether to embed the schema and table name
• Whether the commit timestamp should be embedded
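Two of these conventions, the topic name (Schema_Table) and the message key, can be sketched as small helpers; the acknowledgement behaviour maps onto the standard Kafka producer `acks` setting. All names and values here are illustrative:

```python
def topic_for(schema, table):
    """Topic naming convention from the slides: Schema_Table."""
    return f"{schema}_{table}"

def key_for(schema, table, pkey):
    """Message key built from schema, table, and primary key
    (the separator here is an assumption)."""
    return f"{schema}.{table}.{pkey}"

# Producer-side knob corresponding to the acknowledgement option above:
# "all" waits for the full in-sync replica set, "1" for the leader only,
# "0" sends without waiting. Values are illustrative defaults.
producer_config = {
    "acks": "all",
}

print(topic_for("sbtest", "sbtest"), key_for("sbtest", "sbtest", 255759))
```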
Demo
Elasticsearch
• Immediately replicate data into Elasticsearch for searching
• Contains the core text and content of the records
• Provides the original information to track back to the record
• Content is structured against the schema (index type) and table name (index)
• Document ID is based on the primary key and other configurable information
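The mapping above can be sketched as a small helper that builds the REST target for one row. The function name is hypothetical, and the table → index / schema → type assignment follows the slide's description:

```python
def to_es_action(schema, table, pkey, row):
    """Build the Elasticsearch index/type/id target and body for one
    replicated row. Illustrative sketch of the mapping, not the
    actual applier code."""
    return {
        "_index": table.lower(),   # table name -> index
        "_type": schema.lower(),   # schema -> index type
        "_id": str(pkey),          # primary key -> document id
        "_source": row,            # the record content itself
    }

doc = to_es_action("mg", "msg", 99999,
                   {"msg": "Hello ElasticSearch", "id": "99999"})
# Indexing this via REST would be: PUT /msg/mg/99999 with the _source body
print(doc)
```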
How Elasticsearch Replication Works
[Diagram: DBMS logs via redo logging (or a Redo Reader-generated PLOG for Oracle) are read by the Master Replicator (Extractor) into THL (events + metadata); the Slave Replicator (Applier) downloads transactions via the network and applies them through the Elasticsearch REST API]
Sample Entry
{
  "_id" : "99999",
  "_type" : "mg",
  "found" : true,
  "_version" : 2,
  "_index" : "msg",
  "_source" : {
    "msg" : "Hello ElasticSearch",
    "id" : "99999"
  }
}
Replicating into Cassandra
Demo
Cassandra
• Great for fast online and CRM style deployments
• Highly fault tolerant and scalable
• Has some data and formatting changes
  – Currently needs our DDL translation tool (soon to be built in)
• Quasi table/document style
How Cassandra Replication Works
[Diagram: the Master Replicator extracts changes; the Slave Replicator writes them out as CSV; a Ruby connector loads the CSV into staging tables; a JS-driven merge moves the staged rows into the base tables]
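The CSV → staging → merge flow can be modelled with plain Python dicts standing in for the Cassandra staging and base tables; all names are illustrative, and the real pipeline's handling of deletes is omitted:

```python
import csv
import io

def load_staging(csv_text, columns):
    """Parse one replicator CSV batch into staged row dicts
    (stand-in for the Ruby connector's staging load)."""
    return [dict(zip(columns, row))
            for row in csv.reader(io.StringIO(csv_text))]

def merge(base, staged, key):
    """Upsert staged rows into the base table keyed by primary key
    (stand-in for the JS-driven merge step): last write wins."""
    for row in staged:
        base[row[key]] = row
    return base

staged = load_staging("1,hello\n2,world\n1,updated\n", ["id", "msg"])
base = merge({}, staged, "id")
print(base["1"]["msg"])  # the later row for id 1 wins the merge
```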
Cassandra
Demo
Future Direction for these appliers and related technology
• Full transaction support for Kafka
• Support for Amazon Elasticsearch
• Kafka extraction
  – Parsing contents of Kafka message queues
  – Database updates
  – Large-scale distribution of database changes
  – Filtering and re-submission
General Tungsten Replicator Functionality
• Expanding the standard filter technology
  – Data translation (dates, numbers, hex)
  – Basic lookup/combination to aid ETL-style deployments
  – Data munging/obfuscation (PII, credit cards) for analytics
• More appliers
  – InfluxDB
  – SQL Server
  – PostgreSQL
  – Hadoop JDBC
  – MemSQL
  – Amazon (Aurora, Elasticsearch)
  – CouchDB/Base
• THL compression/encryption
Next Steps
• If you are interested in knowing more about Tungsten Replicator and would like to try it out for yourself, please contact our sales team, who will be able to take you through the details and set up a POC – [email protected]
• Read the documentation at http://docs.continuent.com/tungsten-replicator-5.2/index.html
• Subscribe to our Tungsten University YouTube channel! http://tinyurl.com/TungstenUni
For more information, contact us:
MC Brown
VP Products
[email protected]