Real-Time Data Streaming using Kafka & Storm
DESCRIPTION
This presentation describes three real use cases of real-time data streaming and how they were implemented at LivePerson using Kafka and Storm.
TRANSCRIPT
![Page 1: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/1.jpg)
LivePerson Case Study: Real-Time Data Streaming
March 20th, 2014
Ran Silberman
![Page 2: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/2.jpg)
About me
● Technical Leader of the Data Platform at LivePerson
● Bird watcher and amateur bird photographer
Pharaoh Eagle-Owl / Bubo ascalaphus: this is what the people on the previous slide were looking at… (photo: Amir Silberman)
![Page 3: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/3.jpg)
Agenda
● Why we chose Kafka + Storm
● How the implementation was done
● Measures of success
● Two examples of use
● Tips from our experience
![Page 4: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/4.jpg)
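Data in LivePerson
[Diagram: a visitor on a customer site (chat window) and an agent (agent console) both connect to the LivePerson SaaS server; login, monitor, invite, and chat traffic passes through rules, intelligence, and decision logic, and every interaction produces data, adding up to big data.]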
![Page 5: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/5.jpg)
Legacy Data Flow in LivePerson
[Diagram: RealTime servers → ETL (sessionize, modeling, schema view) → BI DWH (Oracle); labels: real-time data, historical data]
![Page 6: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/6.jpg)
Why Kafka + Storm?
● Need to scale out and plan for future scale
○ The limit for scale should not be the technology
○ Let the limit be the cost of (commodity) hardware
● Which data platforms can be implemented quickly?
○ Open source: fast-evolving, with an active community
○ Micro-services: do only what you ought to do!
● Are there risks in this choice?
○ Yes! The technology is not mature enough
○ But there is no other mature technology that can address our needs!
![Page 7: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/7.jpg)
Long-eared Owl / Asio otus (photo: Amir Silberman)
![Page 8: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/8.jpg)
Legacy Data Flow in LivePerson
[Diagram: RealTime servers → ETL (sessionize, modeling, schema view) → BI DWH (Oracle) → Customers]
![Page 9: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/9.jpg)
1. Move to Hadoop
[Diagram: RealTime servers → HDFS / Hadoop (ETL: sessionize, modeling, schema view); an MR job transfers data to the BI DWH (Vertica), which serves Customers]
![Page 10: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/10.jpg)
2. Move to Kafka
[Diagram: RealTime servers → Kafka (Topic-1) → HDFS / Hadoop; an MR job transfers data to the BI DWH (Vertica), which serves Customers]
![Page 11: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/11.jpg)
3. Integrate with new producers
[Diagram: RealTime servers → Kafka Topic-1 and New RealTime servers → Kafka Topic-2; Kafka → HDFS / Hadoop; an MR job transfers data to the BI DWH (Vertica), which serves Customers]
![Page 12: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/12.jpg)
4. Add real-time BI
[Diagram: RealTime servers → Kafka Topic-1 and New RealTime servers → Kafka Topic-2; Kafka → HDFS / Hadoop, where an MR job transfers data to the BI DWH (Vertica); Kafka → Storm topology → Analytics DB; Customers]
![Page 13: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/13.jpg)
Architecture
[Diagram: Real-time servers → Kafka → Storm (real-time processing) → Cassandra / Couchbase → Dashboards]
Some numbers: Cyber Monday 2013
● Flow rate into Kafka: 33 MB/sec
● Flow rate from Kafka: 20 MB/sec
● Total daily data in Kafka: 17 billion events
● 4 topologies reading all events
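A little arithmetic turns the Cyber Monday 2013 figures above into per-second and per-event estimates. The sketch assumes 1 MB = 10^6 bytes. For the volume and event-size figures, it also assumes the 33 MB/sec ingress rate held for a full 24 hours. The slide does not say whether 33 MB/sec is a peak or an average, so treat those two results as rough estimates. The function name `daily_rates` is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_rates(events_per_day: float, ingress_mb_per_sec: float) -> dict:
    """Back-of-the-envelope rates.

    Assumes 1 MB = 10**6 bytes and that the ingress rate holds for all 24 hours.
    """
    avg_events_per_sec = events_per_day / SECONDS_PER_DAY
    bytes_per_day = ingress_mb_per_sec * 1e6 * SECONDS_PER_DAY
    return {
        "avg_events_per_sec": avg_events_per_sec,
        "tb_per_day_if_sustained": bytes_per_day / 1e12,
        "bytes_per_event_if_sustained": bytes_per_day / events_per_day,
    }


if __name__ == "__main__":
    rates = daily_rates(events_per_day=17e9, ingress_mb_per_sec=33)
    print(f"average events/sec:       {rates['avg_events_per_sec']:,.0f}")
    print(f"TB/day if sustained:      {rates['tb_per_day_if_sustained']:.2f}")
    print(f"bytes/event if sustained: {rates['bytes_per_event_if_sustained']:.0f}")
```

With the slide's numbers, this gives:
● roughly 197,000 events/sec on average
● about 2.85 TB/day
● about 168 bytes per event, under the sustained-rate assumption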
![Page 14: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/14.jpg)
Eurasian Wryneck / Jynx torquilla (photo: Amir Silberman)
![Page 15: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/15.jpg)
Two use cases
1. Visitor List
2. Agent State
![Page 16: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/16.jpg)
1st Storm Use Case: “Visitors List”
Use case:
● Show a list of visitors in the “Agent Console”
● Collect data about visitors in real time
● Visitor stickiness in the streaming process
![Page 17: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/17.jpg)
Visitors List Topology
![Page 18: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/18.jpg)
Selected Analytics DB: Couchbase
1st Storm Use Case: “Visitors List”
● Document store: for complex documents
● Searchable: possible to search by different attributes
● High throughput: read & write
![Page 19: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/19.jpg)
First Storm Topology: Visitor Feed
“Visitor List” topology; Analytics DB: Couchbase (document store)
[Diagram: Kafka events stream → Kafka Spout → (emit) Parse Avro into tuple → (emit) Analyze relevant events → (emit) Write event to Visitor document → Add/Update in Couchbase]
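The spout → parse → analyze → write chain above can be sketched as a pipeline of Python generators, one function per Storm component. This is a sequential, single-process stand-in under several assumptions:
● JSON replaces Avro.
● A plain dict replaces Couchbase.
● The event fields (`visitor_id`, `type`) and the set of "relevant" event types are hypothetical.

In a real topology, each stage would be a spout or bolt running as parallel tasks.

```python
import json
from typing import Dict, Iterable, Iterator

# Hypothetical event types that matter for the visitor list.
RELEVANT_TYPES = {"page_view", "chat_invite", "chat_start"}


def kafka_spout(raw_messages: Iterable[bytes]) -> Iterator[bytes]:
    """Stand-in for the Kafka spout: emits raw payloads in arrival order."""
    yield from raw_messages


def parse_bolt(payloads: Iterable[bytes]) -> Iterator[dict]:
    """Stand-in for "Parse Avro into tuple"; JSON replaces Avro here."""
    for payload in payloads:
        yield json.loads(payload)


def analyze_bolt(events: Iterable[dict]) -> Iterator[dict]:
    """Stand-in for "Analyze relevant events": drops events the list ignores."""
    for event in events:
        if event.get("type") in RELEVANT_TYPES:
            yield event


def write_bolt(events: Iterable[dict], store: Dict[str, dict]) -> None:
    """Stand-in for "Write event to Visitor document" (add or update)."""
    for event in events:
        visitor_id = event["visitor_id"]
        doc = store.setdefault(visitor_id, {"visitor_id": visitor_id, "events": []})
        doc["events"].append(event["type"])


if __name__ == "__main__":
    raw = [
        b'{"visitor_id": "v1", "type": "page_view"}',
        b'{"visitor_id": "v1", "type": "heartbeat"}',
        b'{"visitor_id": "v2", "type": "chat_start"}',
    ]
    couchbase_stand_in: Dict[str, dict] = {}
    write_bolt(analyze_bolt(parse_bolt(kafka_spout(raw))), couchbase_stand_in)
    print(couchbase_stand_in)
```

The "heartbeat" event is filtered out by the analyze stage, so each visitor document holds only the relevant event types.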
![Page 20: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/20.jpg)
Visitors List: Storm considerations
● Complex calculations before sending to the DB
○ Ignore delayed events
○ Reorder events before storing
● Document cached in memory
● Fields grouping to the bolt that writes to Couchbase
● High parallelism in the bolt that writes to Couchbase
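The considerations above fit together as follows. Fields grouping on the visitor id sends all of a visitor's events to one writer task. That task can then keep the visitor's document in memory, reorder events by timestamp, and drop events that arrive too late. This is a minimal sketch under stated assumptions:
● The routing uses crc32 modulo the task count. Storm computes its own hash of the grouped values, so real task numbers will differ.
● The 30-second lateness threshold and all class and function names are hypothetical.

```python
import bisect
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

MAX_DELAY_MS = 30_000  # hypothetical lateness threshold


def fields_grouping(visitor_id: str, num_tasks: int) -> int:
    """Pick the writer task for a visitor: hash of the grouping field mod #tasks.

    crc32 keeps the result stable across runs; Storm uses its own hash,
    so actual task numbers will differ.
    """
    return zlib.crc32(visitor_id.encode("utf-8")) % num_tasks


@dataclass
class VisitorDoc:
    visitor_id: str
    events: List[Tuple[int, str]] = field(default_factory=list)  # sorted by timestamp
    latest_ts_ms: int = 0


class WriterTaskCache:
    """In-memory documents held by one writer task."""

    def __init__(self) -> None:
        self.docs: Dict[str, VisitorDoc] = {}

    def on_event(self, visitor_id: str, ts_ms: int, event_type: str) -> bool:
        """Apply one event; return True if the document changed and should be written."""
        doc = self.docs.setdefault(visitor_id, VisitorDoc(visitor_id))
        if ts_ms < doc.latest_ts_ms - MAX_DELAY_MS:
            return False  # delayed beyond the threshold: ignore
        bisect.insort(doc.events, (ts_ms, event_type))  # keep events in timestamp order
        doc.latest_ts_ms = max(doc.latest_ts_ms, ts_ms)
        return True
```

Because a visitor's events always reach the same task, that task's cached copy is the only one in the topology. This is what makes in-memory caching and reordering safe. High parallelism on the writer bolt then spreads different visitors across many tasks.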
![Page 21: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/21.jpg)
Visitors List Topology
![Page 22: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/22.jpg)
European Roller / Coracias garrulus (photo: Amir Silberman)
![Page 23: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/23.jpg)
2nd Storm Use Case: “Agent State”
Use case:
● Show agent activity in the “Agent Console”
● Count agent statistics
● Display graphs
![Page 24: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/24.jpg)
Agent Status Topology
![Page 25: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/25.jpg)
Selected Analytics DB: Cassandra
2nd Storm Use Case: “Agent State”
● Wide-column store DB
● Highly available, with no single point of failure
● High throughput
● Optimized for counters
![Page 26: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/26.jpg)
Second Storm Topology: Agent Status
“Agent Status” topology; Analytics DB: Cassandra (wide-column store)
[Diagram: Kafka events stream → Kafka Spout → (emit) Parse Avro into tuple → (emit) Analyze relevant events → Send events → Add to Cassandra → data visualization using Highcharts]
![Page 27: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/27.jpg)
Agent Status: Storm considerations
● Counters are stored by the topology
● Calculations are done after reading from the DB
● Delayed events should not be ignored
● Order of events does not matter
● Highcharts is used for data visualization
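Order and delay stop mattering here because counter increments are commutative. Applying the same events in any order yields the same totals, and a late event simply increments its own time bucket. This sketch uses a `Counter` as a stand-in for Cassandra counter columns. It computes a derived statistic only at read time, mirroring "calculations are done after reading from the DB". The event shape, the one-minute bucket size, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

BUCKET_MS = 60_000  # hypothetical one-minute buckets

# Hypothetical event shape: (agent_id, timestamp_ms, metric_name)
Event = Tuple[str, int, str]


def count_events(events: Iterable[Event]) -> Counter:
    """Increment one counter per (agent, minute bucket, metric), as a topology might."""
    counters: Counter = Counter()
    for agent_id, ts_ms, metric in events:
        bucket = ts_ms - ts_ms % BUCKET_MS
        counters[(agent_id, bucket, metric)] += 1  # analogous to a counter-column increment
    return counters


def per_minute_average(counters: Counter, agent_id: str, metric: str,
                       start_ms: int, end_ms: int) -> float:
    """A derived statistic computed at read time, from stored counters."""
    buckets = range(start_ms - start_ms % BUCKET_MS, end_ms, BUCKET_MS)
    if len(buckets) == 0:
        return 0.0
    return sum(counters[(agent_id, b, metric)] for b in buckets) / len(buckets)
```

Reversing or shuffling the input leaves the counters unchanged. This is why the Agent State topology, unlike the Visitors List, needs neither reordering nor a lateness cutoff.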
![Page 28: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/28.jpg)
Spur-winged Lapwing / Vanellus spinosus (photo: Amir Silberman)
![Page 29: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/29.jpg)
3rd Storm Use Case: Data Auditing
Use case:
● Need to be able to tell whether events arrived
○ Were there any missing events?
○ Were there any duplicated events?
○ How long did it take for events to arrive?
● The data itself is not important, only the count of events
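When only counts matter, the three questions above can be answered per time bucket:
● Missing events: fewer arrivals than the producer reported.
● Duplicated events: more arrivals than the producer reported.
● Latency: arrival time minus send time.

This sketch assumes each event carries hypothetical header fields giving its send timestamp, and that the producer's count for the bucket is known. One limitation of counting alone: a lost event and a duplicate in the same bucket cancel out and go undetected.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class AuditResult:
    missing: int
    duplicated: int
    max_latency_ms: int


def audit_bucket(produced_count: int,
                 received: Iterable[Tuple[int, int]]) -> AuditResult:
    """Compare counts for one time bucket.

    received holds (send_ts_ms, arrival_ts_ms) pairs, assumed to come from
    hypothetical header fields on each event that reached the consumer.
    """
    arrivals = list(received)
    diff = len(arrivals) - produced_count
    max_latency = max((arrived - sent for sent, arrived in arrivals), default=0)
    return AuditResult(
        missing=max(-diff, 0),
        duplicated=max(diff, 0),
        max_latency_ms=max_latency,
    )
```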
![Page 30: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/30.jpg)
3rd Storm Use Case: Data Auditing
[Diagram: Realtime server → Kafka topics (kafka1) → Storm Sync topology → Auditing topic → Audit-loader topology → MySQL; a Hadoop audit job over HDFS; Auditor]
![Page 31: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/31.jpg)
“Sync Audit” Topology: sync messages between two topics
[Diagram: Kafka events stream → Kafka Spout → (emit) Parse Avro into tuple → (emit) Analyze relevant events → Send events → Add to Kafka Audit topic]
![Page 32: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/32.jpg)
“Load Audit” Topology; Analytics DB: MySQL (RDBMS)
[Diagram: Kafka Audit topic → Kafka Spout → (emit) Parse Avro into tuple → (emit) Analyze relevant events → Send events → Add to MySQL → Auditing report]
![Page 33: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/33.jpg)
“Load Audit” Topology:
● Stores statistics of event counts
● SQL-type DB
● Used for auditing and other statistics
● Requires metadata in the event header
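The load step aggregates a batch of event headers into per-bucket counts and adds them to a SQL table. This sketch uses Python's built-in `sqlite3` as a stand-in for MySQL. It uses "INSERT OR IGNORE" followed by "UPDATE" rather than a database-specific upsert, so it runs on older SQLite versions too. The following are all hypothetical:
● the header fields `topic`, `producer`, and `send_ts_ms`
● the `audit_counts` table
● the one-minute bucket size

```python
import sqlite3
from collections import Counter
from typing import Dict, Iterable

BUCKET_MS = 60_000  # hypothetical one-minute audit buckets


def create_schema(conn: sqlite3.Connection) -> None:
    """One row per (topic, producer, time bucket) with a running event count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit_counts ("
        " topic TEXT, producer TEXT, bucket_ms INTEGER, events INTEGER,"
        " PRIMARY KEY (topic, producer, bucket_ms))"
    )


def load_audit(conn: sqlite3.Connection, headers: Iterable[Dict]) -> None:
    """Aggregate event counts from header metadata and add them to the table."""
    batch: Counter = Counter()
    for h in headers:
        bucket = h["send_ts_ms"] - h["send_ts_ms"] % BUCKET_MS
        batch[(h["topic"], h["producer"], bucket)] += 1
    with conn:  # commit the whole batch as one transaction
        for (topic, producer, bucket), n in batch.items():
            conn.execute(
                "INSERT OR IGNORE INTO audit_counts VALUES (?, ?, ?, 0)",
                (topic, producer, bucket),
            )
            conn.execute(
                "UPDATE audit_counts SET events = events + ? "
                "WHERE topic = ? AND producer = ? AND bucket_ms = ?",
                (n, topic, producer, bucket),
            )
```

Aggregating in the topology before writing keeps the number of SQL statements proportional to the number of buckets, not the number of events. An auditing report is then a simple query over `audit_counts`.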
![Page 34: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/34.jpg)
Challenges:
● High network traffic
● Writing to Kafka is faster than reading
● All topologies read all events
● How to avoid resource starvation in Storm
Subalpine Warbler / Sylvia cantillans (photo: Amir Silberman)
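"Writing to Kafka is faster than reading" shows up as consumer lag: the gap between the last offset written to a partition and the offset a consumer has committed. The backlog only shrinks when the read rate exceeds the write rate. This sketch assumes the per-partition offsets have already been collected, however that is done. It treats a partition with no committed offset as starting at 0, which is a simplification. The function names are hypothetical.

```python
from typing import Dict, Optional


def consumer_lag(log_end_offsets: Dict[int, int],
                 committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: messages written but not yet consumed.

    A partition with no committed offset is treated as starting at 0 (a simplification).
    """
    return {p: end - committed_offsets.get(p, 0) for p, end in log_end_offsets.items()}


def seconds_to_catch_up(lag_events: int, read_rate: float,
                        write_rate: float) -> Optional[float]:
    """Time to drain the backlog; None if consumers never catch up."""
    if read_rate <= write_rate:
        return None
    return lag_events / (read_rate - write_rate)
```

Tracking lag per partition over time is one concrete way to "monitor data flow". A steadily growing lag means the topology is falling behind the producers.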
![Page 35: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/35.jpg)
Optimizations of Kafka
● Increase the Kafka consuming rate by adding partitions
● Run on physical machines with RAID
● Set retention according to actual needs
● Monitor data flow!
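Setting retention "to the proper need" is largely a disk-sizing calculation: ingress rate × retention time × replication factor, spread over the brokers. This is a rough sketch that assumes:
● a steady ingress rate
● data spread evenly across brokers
● 1 MB = 10^6 bytes

It ignores index files, compression, and free-space headroom. The example values in the test (24 hours, replication factor 2, 6 brokers) are hypothetical and not LivePerson's configuration.

```python
def retention_disk_tb(ingress_mb_per_sec: float, retention_hours: float,
                      replication_factor: int, num_brokers: int) -> float:
    """Approximate disk needed per broker, in TB (1 MB = 10**6 bytes).

    Assumes a steady ingress rate and data spread evenly across brokers;
    ignores index files, compression, and free-space headroom.
    """
    retained_bytes = ingress_mb_per_sec * 1e6 * retention_hours * 3600
    return retained_bytes * replication_factor / num_brokers / 1e12
```

For example, 33 MB/sec kept for 24 hours with a replication factor of 2 across 6 brokers comes to about 0.95 TB per broker. Doubling the retention doubles that figure, which is why retention should match what consumers actually need.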
![Page 36: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/36.jpg)
Optimizations of Storm
● Number of Kafka spouts = total number of partitions
● Set “Isolation mode” for important topologies
● Validate that network cards can carry the network traffic
● Run the Storm cluster on high-CPU machines
● Monitor servers' CPU & memory (Graphite)
● Assess the minimum number of cores a topology needs
○ Use the “load” figure in “top” to find server load
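Matching the number of Kafka spout tasks to the number of partitions follows from the fact that each partition is read by a single spout task. The effect of mismatched counts is easy to see with a round-robin assignment of partitions to tasks:
● Extra tasks sit idle.
● Too few tasks means some tasks read several partitions.

The round-robin scheme here is an illustrative assumption, not necessarily the exact algorithm any particular Kafka spout uses.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, num_spout_tasks: int) -> Dict[int, List[int]]:
    """Round-robin partitions over spout tasks; each partition gets exactly one task."""
    assignment: Dict[int, List[int]] = {task: [] for task in range(num_spout_tasks)}
    for partition in range(num_partitions):
        assignment[partition % num_spout_tasks].append(partition)
    return assignment
```

With 4 partitions and 6 spout tasks, two tasks receive nothing. With 6 partitions and 3 tasks, every task reads two partitions. Only equal counts give each task exactly one partition.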
![Page 37: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/37.jpg)
Demo
● Agent Console: https://z1.le.liveperson.net/
71394613 / [email protected]
● My Site - http://birds-of-israel.weebly.com/
![Page 38: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/38.jpg)
Questions?
Little Owl / Athene noctua (photo: Amir Silberman)
![Page 39: Real Time Data Streaming using Kafka & Storm](https://reader034.vdocuments.net/reader034/viewer/2022042700/554a042cb4c905e56c8b53f6/html5/thumbnails/39.jpg)
Thank you!
Ruff / Philomachus pugnax (photo: Amir Silberman)