Kafka and Stream Processing, Taking Analytics Real-Time
Mike Spicer - Lead Architect, IBM Streams
Traditional Processing vs. Stream Processing
[Diagram: traditional processing (data repository, data query, request/response) vs. stream processing (real-time analytics, data results)]
Traditional Processing – Historical fact finding
• Find and analyze information stored on disk
• Batch paradigm, pull model
• Query-driven: submit queries to static data
Stream Processing – Current fact finding
• Analyze data in motion, before it is stored
• Low-latency paradigm, push model
• Data-driven: bring the data to the analytics
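To make the pull/push contrast concrete, here is a minimal sketch in plain Python. It uses a running mean as a stand-in analytic (an illustrative choice; the talk does not name one): the batch function needs the whole stored data set before it can answer, while the streaming version produces an updated result as each event arrives.

```python
from typing import Iterable, Iterator


def batch_mean(repository: list[float]) -> float:
    """Traditional / pull model: data is stored first, then queried."""
    return sum(repository) / len(repository)


def streaming_mean(events: Iterable[float]) -> Iterator[float]:
    """Stream / push model: update the result as each event arrives."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield total / count  # a fresh result is available after every event


readings = [10.0, 12.0, 11.0, 15.0]
live = list(streaming_mean(readings))
print(live)                  # [10.0, 11.0, 11.0, 12.0]
print(batch_mean(readings))  # 12.0
```

Both arrive at the same final answer; the difference is that the streaming version had a usable result after the first event instead of after the last one.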
What Makes Kafka Ideal for Stream Processing
FAST –
• A single Streams Kafka source or sink can consume or produce 100,000s of msgs/sec
SCALABLE –
• Partitioned Kafka topics work with parallel Streams Kafka sources
• Parallel sources in the same consumer group can consume 1,000,000s of msgs/sec
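Scaling works because each partition of a topic is read by exactly one member of a consumer group. The sketch below is a simplified model of Kafka's range-style assignment for a single topic (members sorted, partitions split into contiguous ranges, the first few members taking one extra); the consumer names are hypothetical and this is not the Streams connector's code.

```python
def range_assign(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Split partition numbers 0..partitions-1 into contiguous ranges per member."""
    members = sorted(consumers)  # range assignment orders group members first
    per_consumer, extra = divmod(partitions, len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` members each take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


print(range_assign(8, ["source-c", "source-a", "source-b"]))
# {'source-a': [0, 1, 2], 'source-b': [3, 4, 5], 'source-c': [6, 7]}
```

One consequence: parallel sources beyond the partition count get nothing to read (for example, `range_assign(2, ["a", "b", "c"])` leaves `c` with an empty list), so the partition count caps the useful parallelism.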
DURABLE –
• Kafka is distributed and replicated
• Messages are logged and replayable for a configured retention period
• Streams Kafka connectors support guaranteed processing
  • Source supports exactly-once (and at-least-once) semantics
  • Sink supports at-least-once semantics
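The DURABLE bullets distinguish exactly-once from at-least-once processing. The toy model below, in plain Python rather than any Kafka or Streams API, shows why replaying from a committed offset on its own gives at-least-once results (anything emitted after the last commit is emitted again after a restart), and how checkpointing the offset together with operator state is one common way to get exactly-once effects on that state. The talk does not describe the connectors' actual mechanism; the log contents, checkpoint interval and crash point are all hypothetical.

```python
LOG = [5, 3, 7, 2]      # payloads; the list index stands in for the Kafka offset
CHECKPOINT_EVERY = 2    # hypothetical commit/checkpoint interval, in messages


def at_least_once(crash_at: int) -> int:
    """Results are emitted immediately, but offsets are committed only periodically."""
    emitted, committed = [], 0
    for offset, value in enumerate(LOG):
        if offset == crash_at:
            break                       # simulated crash: nothing after this runs
        emitted.append(value)           # side effect already visible downstream
        if (offset + 1) % CHECKPOINT_EVERY == 0:
            committed = offset + 1      # commit the next offset to read
    # Restart from the last committed offset; anything emitted since that
    # commit is replayed and therefore emitted a second time.
    emitted.extend(LOG[committed:])
    return sum(emitted)


def exactly_once(crash_at: int) -> int:
    """Offset and operator state are checkpointed together and restored together."""
    state, checkpoint = 0, (0, 0)       # checkpoint = (next offset, state)
    for offset, value in enumerate(LOG):
        if offset == crash_at:
            break
        state += value
        if (offset + 1) % CHECKPOINT_EVERY == 0:
            checkpoint = (offset + 1, state)
    # Restart: roll state back to the checkpoint, then replay from its offset.
    next_offset, state = checkpoint
    for value in LOG[next_offset:]:
        state += value
    return state


print(sum(LOG), at_least_once(crash_at=3), exactly_once(crash_at=3))  # 17 24 17
```

The replay is what makes both recoveries possible; the difference is only whether the state that the replayed messages land on has been rolled back to match the replay point.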
A UNIVERSAL HUB –
• Hub connecting all applications and data sources
• Isolation between producers and consumers
Streaming Analytics Can Handle Many Use Cases
IBM Streams is being applied in many use cases –
• Market and Customer Intelligence
• Revenue, Upsell / Cross Sell
• Personalized Customer Experience
• Network Analytics
• IoT, Connected Car and Telematics
• National / Cyber Security, PII & PCI Data Leakage
• Health and Improved Patient Outcomes
• Operational Optimization
Example Real-Time Analytics Use Cases
North American Telco Real-Time Advertising –
• Click-through rate and revenue up 50%
• ~30M in-memory profiles, 500 SPSS models
• Purchases, web clickstream, CDRs (call detail records), IPTV viewing, behavioral events
• Total events ~1.2B per day, 210K per second
• Average latency 8 ms
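A quick consistency check on the telco figures: spread evenly over a day, ~1.2B events is only about 13,900 events per second, so the 210K per second figure is presumably a peak or burst rate rather than a daily average. The slide does not say which.

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
events_per_day = 1.2e9                    # "~1.2B per day" from the slide
average_rate = events_per_day / SECONDS_PER_DAY

print(round(average_rate))                # 13889 events/s on average
print(round(210_000 / average_rate, 1))   # 15.1x: quoted rate vs. daily average
```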
Thomson Reuters Eikon –
• News ingest and analytics
  • News, market data and metadata streams to HBase
• Signal App: real-time technical analysis
  • Bollinger Bands, simple moving average, etc.
• VolSurf: real-time volatility surfaces
  • 200K instruments, 100K msgs/sec
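The Signal App indicators named above are sliding-window calculations over a price stream. A minimal sketch, assuming the conventional definitions (a simple moving average over the last `window` prices, with bands at `k` population standard deviations above and below it); the defaults of 20 and 2.0 are the usual textbook values, not figures from the talk.

```python
from collections import deque
from math import sqrt
from typing import Iterable, Iterator


def bollinger(prices: Iterable[float], window: int = 20,
              k: float = 2.0) -> Iterator[tuple[float, float, float]]:
    """Yield (sma, upper, lower) for each price once `window` prices have arrived."""
    buf = deque(maxlen=window)  # sliding window: the oldest price drops out
    for price in prices:
        buf.append(price)
        if len(buf) < window:
            continue            # not enough history yet
        sma = sum(buf) / window
        # Population standard deviation over the same window.
        std = sqrt(sum((p - sma) ** 2 for p in buf) / window)
        yield sma, sma + k * std, sma - k * std


ticks = [10.0, 12.0, 14.0, 12.0, 10.0]
for sma, upper, lower in bollinger(ticks, window=3):
    print(round(sma, 2), round(upper, 2), round(lower, 2))
```

This recomputes the sums over the whole window on each tick for clarity; at 100K msgs/sec a production version would more likely maintain running sums so each update is constant time.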
[Diagram: real-time advertising architecture]
• Customers interact through multiple channels (e.g. the website); captured data includes search keywords, page content, cookies, IP addresses, device info and actions within a window of time
• In-motion behavior analysis in IBM Streams: match with a global ID, map keywords to attributes and the classification hierarchy, invoke behavior models/scores
• Predictive models: scoring, segmentation, analysis, association
• Customer data: transactions from all customers, plus per-customer profiles:
  • Descriptive: age, gender, family situation, zip code
  • Transactions from this customer: cardholder since YYYYMM, average transaction value, monthly transaction value, categories purchased, brands purchased
  • Interactions: web registration, web visits, customer service contacts, channel preference
  • Attitudes: satisfaction scores, shopper type, eco score
• Target advertising platform (campaign management), serving advertisers
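The in-motion behavior analysis step can be sketched as a per-user sliding time window: events that have already been matched to a global ID are mapped from keywords to categories, stale events are evicted, and a score is recomputed on every arrival. The keyword map, weights, 300-second window and capped linear score are all hypothetical placeholders for the classification hierarchy and the SPSS behavior models on the slide.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 300                      # hypothetical "window of time"
KEYWORD_TO_CATEGORY = {                   # hypothetical classification map
    "iphone": "mobile",
    "data plan": "mobile",
    "hd channels": "iptv",
}
CATEGORY_WEIGHTS = {"mobile": 0.4, "iptv": 0.3}  # stand-in for a behavior model


class BehaviorWindow:
    """Keeps each user's recent categorized events and scores them on arrival."""

    def __init__(self) -> None:
        # global_id -> deque of (timestamp, category), oldest first
        self.events = defaultdict(deque)

    def on_event(self, global_id: str, ts: float, keyword: str) -> float:
        window = self.events[global_id]
        category = KEYWORD_TO_CATEGORY.get(keyword)
        if category is not None:          # unmapped keywords are ignored
            window.append((ts, category))
        # Evict events that have fallen out of the time window.
        while window and window[0][0] <= ts - WINDOW_SECONDS:
            window.popleft()
        # Toy linear score over the window, capped at 1.0.
        return min(1.0, sum(CATEGORY_WEIGHTS[c] for _, c in window))


bw = BehaviorWindow()
print(bw.on_event("user-1", 0, "iphone"))         # 0.4
print(bw.on_event("user-1", 60, "data plan"))     # 0.8
print(bw.on_event("user-1", 400, "hd channels"))  # 0.3 (earlier events expired)
```

Keeping the state in memory per user matches the "~30M in-memory profiles" idea at toy scale; a real deployment would partition this state across parallel operators by global ID.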
[Diagram: ingestion technology. Inputs: SDI data (metadata), Elektron (market data), news and others, processed by IBM Streams]
Real-Time Analytics from the Center to the Edge
Quarks for edge analytics on a device or gateway –
• Lightweight embedded streaming analytics runtime
• Analyze events locally on the edge
• Reduce communication costs by sending only relevant events
Device Hub –
• Device management
• Message broker (including MQTT and Kafka)
• Public device hub API supports custom device hubs
IBM Streams for streaming analytics –
• High-performance, full-featured streaming analytics
• Build windows of state and correlate across devices
• Access to data-of-record systems, e.g. medical history
• Control edge devices based on analytics
• Central job management and health summary
• Automatic application connectivity
[Diagram: edge devices and a gateway connected through messaging (MQTT, Kafka, etc.) to the cluster]
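One common way an edge runtime cuts communication costs, as the Quarks bullets describe, is a deadband filter: a reading is forwarded only when it has moved more than a set amount since the last reading that was sent. This is a plain Python sketch of the idea, not the Quarks API; the band width and sensor values are hypothetical.

```python
from typing import Iterable, Iterator


def deadband(readings: Iterable[float], band: float) -> Iterator[float]:
    """Forward a reading only if it moved more than `band` from the last one sent."""
    last_sent = None
    for value in readings:
        if last_sent is None or abs(value - last_sent) > band:
            last_sent = value
            yield value  # relevant event: send it on toward the device hub


sensor = [20.0, 20.1, 20.2, 21.5, 21.6, 19.0]
sent = list(deadband(sensor, band=1.0))
print(sent)                                        # [20.0, 21.5, 19.0]
print(f"sent {len(sent)} of {len(sensor)} readings")  # sent 3 of 6 readings
```

Comparing against the last value sent, rather than the previous reading, keeps a slow drift from slipping through unreported one small step at a time.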
Real-Time Analytics – What Are You Waiting For?
The world is real-time, so analyze it in real time –
• Acquire events as they happen
• Analyze in real time to detect and predict insights
• Act immediately to change outcomes
Forrester Research described the following key takeaway in its recent Wave report –
• All Data Is Born Fast
“All data originates in a flash, whether it is from Internet-of-Things (IoT) devices, web clicks, transactions, or mobile app usage. But traditional analytics is done much, much later. Why wait? AD&D pros can use streaming analytics embedded in applications to get actionable value tout de suite. So what are you waiting for? Streaming analytics solutions can capture perishable insights on real-time data to bring immediate context to all IoT, mobile, web, and enterprise apps.”
The Forrester Wave™: Big Data Streaming Analytics Platforms, Q1 2016
The Forrester Wave is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave are trademarks of Forrester Research, Inc. The Forrester Wave is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
Thank You!