acm debs 2015: realtime streaming analytics patterns

ACM DEBS 2015: Realtime Streaming Analytics Patterns Srinath Perera Sriskandarajah Suhothayan WSO2 Inc.

Upload: srinath-perera

Post on 21-Apr-2017


Page 1: ACM DEBS 2015: Realtime Streaming Analytics Patterns

ACM DEBS 2015: Realtime Streaming Analytics Patterns

Srinath Perera, Sriskandarajah Suhothayan

WSO2 Inc.

Page 2

Data Analytics (Big Data)

o Scientists have been doing this for 25 years with MPI (1991), using special hardware.
o It took off with Google's MapReduce paper (2004); Apache Hadoop, Hive, and a whole ecosystem followed.
o Later Spark emerged, and it is faster.
o But processing still takes time.

Page 3

Value of Some Insights Degrades Fast!

o For some use cases (e.g., stock markets, traffic, surveillance, patient monitoring), the value of insights degrades very quickly with time.
  o E.g., stock markets and the speed of light
o We need technology that can produce outputs fast:
  o Static queries, but with very fast output (alerts, realtime control)
  o Dynamic and interactive queries (data exploration)

Page 4

History

▪ Realtime analytics is not new either!
  - Active Databases (2000+)
  - Stream processing (Aurora, Borealis (2005+), and later Storm)
  - Distributed Streaming Operators (e.g., a database research topic around 2005)
  - CEP Vendor Roadmap (from http://www.complexevents.com/2014/12/03/cep-tooling-market-survey-2014/)

Page 5

Data Analytics Landscape

Page 6

Realtime Interactive Analytics

o Usually done to support interactive queries
o Index data to make it readily accessible so you can respond to queries fast (e.g., Apache Drill)
o Tools like Druid, VoltDB, and SAP HANA can do this with all data in memory to make things really fast.

Page 7

Realtime Streaming Analytics

o Process data without storing it (as data comes in)
o Queries are fixed (static)
o Triggers when given conditions are met
o Technologies:
  o Stream Processing (Apache Storm, Apache Samza)
  o Complex Event Processing/CEP (WSO2 CEP, Esper, StreamBase)
  o MicroBatches (Spark Streaming)

Page 9

Why Realtime Streaming Analytics Patterns?

o Reason 1: The usual advantages
  o Give us better understanding
  o Give us better vocabulary to teach and communicate
  o Tools can implement them
  o ..
o Reason 2: Under the theme of realtime analytics, a lot of people get carried away with the word count example. Patterns show that word count is just the tip of the iceberg.

Page 10

Earlier Work on Patterns

o Patterns from SQL (project, join, filter, etc.)
o Event Processing Technical Society (EPTS) reference architecture
  o Higher-level patterns such as tracking, prediction, and learning, in addition to low-level operators that come from SQL-like languages
o Esper's Solution Patterns Document (50 patterns)
o Coral8 White Paper

Page 11

Basic Patterns

o Pattern 1: Preprocessing (filter, transform, enrich, project, ...)
o Pattern 2: Alerts and Thresholds
o Pattern 3: Simple Counting and Counting with Windows
o Pattern 4: Joining Event Streams
o Pattern 5: Data Correlation, Missing Events, and Erroneous Data

Page 12

Patterns for Handling Trends

o Pattern 7: Detecting Temporal Event Sequence Patterns
o Pattern 8: Tracking (track something over space or time)
o Pattern 9: Detecting Trends (rise, fall, turn, triple bottom)
o Pattern 13: Online Control

Page 13

Mixed Patterns

o Pattern 6: Interacting with Databases
o Pattern 10: Running the Same Query in Batch and Realtime Pipelines
o Pattern 11: Detecting and Switching to Detailed Analysis
o Pattern 12: Using a Machine Learning Model

Page 14

Earlier Work on Patterns

Page 15

Realtime Streaming Analytics Tools

Page 16

Implementing Realtime Analytics

o It is tempting to write custom code. A filter looks very easy, but things quickly become too complex. Don't!
o Option 1: Stream Processing (e.g., Storm). Kind of works. It is like MapReduce: you have to write code.
o Option 2: Spark Streaming. More compact than Storm, but cannot do some stateful operations.
o Option 3: Complex Event Processing. Compact, SQL-like language, fast.

Page 17

Stream Processing

o Program a set of processors and wire them up; data flows through the graph.
o A middleware framework handles data flow, distribution, and fault tolerance (e.g., Apache Storm, Samza).
o Processors may be on the same machine or on multiple machines.

Page 18

Writing a Storm Program

o Write Spout(s)
o Write Bolt(s)
o Wire them up
o Run

Page 19

Write Bolts

We will use the shorthand shown on the left of the slide to explain.

    public static class WordCount extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            // .. do something …
            collector.emit(new Values(word, count));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }

Page 20

Wire Up and Run

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new RandomSentenceSpout(), 5);
        builder.setBolt("split", new SplitSentence(), 8)
               .shuffleGrouping("spout");
        builder.setBolt("count", new WordCount(), 12)
               .fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        if (args != null && args.length > 0) {
            // Cluster mode: submit with the topology name given on the command line.
            conf.setNumWorkers(3);
            StormSubmitter.submitTopologyWithProgressBar(args[0], conf, builder.createTopology());
        } else {
            // Local mode: run inside an in-process LocalCluster.
            conf.setMaxTaskParallelism(3);
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("word-count", conf, builder.createTopology());
            ...
        }
    }

Page 21

Complex Event Processing

Page 22

Micro Batches (e.g., Spark Streaming)

o Process data in small batches, then combine them into final results (e.g., Spark)
o Works for simple aggregates, but tricky for complex operations (e.g., event sequences)
o Can do it with MapReduce as well if the deadlines are not too tight.

Page 23

SQL Like Query Languages

o SQL-like data processing languages (e.g., Apache Hive)
o Since many people understand SQL, Hive made large-scale (Big Data) processing accessible to many
o Expressive, short, and sweet.
o Define core operations that cover 90% of problems
o Let experts dig in when they like!

Page 24

CEP = SQL for Realtime Analytics

o Easy to follow from SQL
o Expressive, short, and sweet.
o Define core operations that cover 90% of problems
o Let experts dig in when they like!

Page 25

Pattern Implementations

Page 27

Pattern 1: Preprocessing

o What? Clean up and prepare data via operations like filter, project, enrich, split, and transformations
o Use cases?
  o From a Twitter data stream, we extract the author, timestamp, and location fields, and then filter them based on the location of the author.
  o From a temperature stream, we extract the temperature and room number of the sensor and filter by them.

Page 28

Filter

In CEP (Siddhi):

from TempStream[roomNo > 245 and roomNo <= 365]
select roomNo, temp
insert into ServerRoomTempStream;

In Storm: (shown as code on the original slide)
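For comparison, the same filter written by hand might look like the plain-Java sketch below. It deliberately avoids the Storm API (in a topology this logic would typically sit inside a bolt's `execute` method), and the `TempEvent`, `RoomTemp`, and `ServerRoomFilter` names are hypothetical; they simply mirror the `TempStream(deviceID long, roomNo int, temp double)` definition used later in the deck.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical input event mirroring TempStream(deviceID long, roomNo int, temp double).
final class TempEvent {
    final long deviceId;
    final int roomNo;
    final double temp;

    TempEvent(long deviceId, int roomNo, double temp) {
        this.deviceId = deviceId;
        this.roomNo = roomNo;
        this.temp = temp;
    }
}

// Hypothetical output event mirroring ServerRoomTempStream(roomNo, temp).
final class RoomTemp {
    final int roomNo;
    final double temp;

    RoomTemp(int roomNo, double temp) {
        this.roomNo = roomNo;
        this.temp = temp;
    }
}

final class ServerRoomFilter {
    // Same predicate as the query: roomNo > 245 and roomNo <= 365.
    static boolean isServerRoom(TempEvent e) {
        return e.roomNo > 245 && e.roomNo <= 365;
    }

    // Filter the events, then project each survivor to (roomNo, temp).
    static List<RoomTemp> process(List<TempEvent> events) {
        List<RoomTemp> out = new ArrayList<>();
        for (TempEvent e : events) {
            if (isServerRoom(e)) {
                out.add(new RoomTemp(e.roomNo, e.temp));
            }
        }
        return out;
    }
}
```

Even this trivial case needs event classes, a loop, and an explicit projection, which is the gap the three-line Siddhi query closes.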

Page 29

Architecture of WSO2 CEP

Page 30

CEP Event Adapters

Supports several transports (network access):
● SOAP ● HTTP ● JMS ● SMTP ● SMS ● Thrift ● Kafka ● Websocket ● MQTT

Supports database writes using Map messages:
● Cassandra ● RDBMS

Supports custom event adapters via its pluggable architecture!

Page 31

Stream Definition (Data Model)

{
  "name": "soft.drink.coop.sales",
  "version": "1.0.0",
  "nickName": "Soft_Drink_Sales",
  "description": "Soft drink sales",
  "metaData": [
    {"name": "region", "type": "STRING"}
  ],
  "correlationData": [
    {"name": "transactionID", "type": "STRING"}
  ],
  "payloadData": [
    {"name": "brand", "type": "STRING"},
    {"name": "quantity", "type": "INT"},
    {"name": "total", "type": "INT"},
    {"name": "user", "type": "STRING"}
  ]
}

Page 32

Projection

define stream TempStream(deviceID long, roomNo int, temp double);

from TempStream
select roomNo, temp
insert into OutputStream;

Page 33

Inferred Streams

from TempStream
select roomNo, temp
insert into OutputStream;

The output stream definition is inferred from the query:

define stream OutputStream(roomNo int, temp double);

Page 34

Enrich

from TempStream
select roomNo, temp, 'C' as scale
insert into OutputStream;

define stream OutputStream(roomNo int, temp double, scale string);

from TempStream
select deviceID, roomNo, avg(temp) as avgTemp
insert into OutputStream;

Page 35

Transformation

from cseEventStream[price >= 20 and symbol == 'IBM']
select symbol, volume
insert into StockQuote;

from TempStream
select concat(deviceID, '-', roomNo) as uid,
       toFahrenheit(temp) as tempInF,
       'F' as scale
insert into OutputStream;

Page 36

Split

from TempStream
select roomNo, temp
insert into RoomTempStream;

from TempStream
select deviceID, temp
insert into DeviceTempStream;

Page 37

Pattern 2: Alerts and Thresholds

o What? Detect a condition and generate alerts based on it (e.g., alarm on high temperature).
  o These alerts can be based on a simple value or on more complex conditions such as the rate of increase.
o Use cases?
  o Raise an alert when a vehicle is going too fast
  o Alert when a room is too hot

Page 38

Filter Alert

from TempStream[roomNo > 245 and roomNo <= 365 and temp > 40]
select roomNo, temp
insert into AlertServerRoomTempStream;
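The query above covers the simple-value case. For the rate-of-increase condition mentioned under Pattern 2, one option is to remember the previous reading per room and alert when the temperature rises faster than a chosen rate. This is a minimal plain-Java sketch, not a WSO2 CEP feature; the class name, the degrees-per-second unit, and the millisecond timestamps are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Alerts when a room's temperature rises faster than a configured rate.
final class RateOfIncreaseAlert {
    private final double maxDegreesPerSecond;
    private final Map<Integer, Long> lastTimeMs = new HashMap<>();
    private final Map<Integer, Double> lastTemp = new HashMap<>();

    RateOfIncreaseAlert(double maxDegreesPerSecond) {
        this.maxDegreesPerSecond = maxDegreesPerSecond;
    }

    // Returns true if this reading should raise an alert for its room.
    boolean onReading(int roomNo, double temp, long timestampMs) {
        Long prevTime = lastTimeMs.get(roomNo);
        Double prevTemp = lastTemp.get(roomNo);
        if (prevTime != null && timestampMs <= prevTime) {
            return false; // ignore out-of-order or duplicate timestamps
        }
        lastTimeMs.put(roomNo, timestampMs);
        lastTemp.put(roomNo, temp);
        if (prevTime == null) {
            return false; // first reading for this room: no rate yet
        }
        double seconds = (timestampMs - prevTime) / 1000.0;
        double degreesPerSecond = (temp - prevTemp) / seconds;
        return degreesPerSecond > maxDegreesPerSecond;
    }
}
```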

Page 39

Pattern 3: Simple Counting and Counting with Windows

o What? Aggregate functions like Min, Max, Percentiles, etc.
  o Many of them (e.g., count, sum, min, max, mean) can be computed without storing any data
  o Most useful when used with a window
o Use cases?
  o Most metrics need a time bound so we can compare them (errors per day, transactions per second)
  o The Linux load average gives us an idea of the overall trend by reporting averages over the last 1, 5, and 15 minutes.
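Count, sum, min, max, and mean can be maintained in constant memory by updating a few running values per event; exact percentiles are the exception, because they need the values themselves (or an approximate summary). A minimal sketch with a hypothetical `RunningStats` class:

```java
// Running aggregates over an unbounded stream, kept in O(1) memory.
final class RunningStats {
    private long count;
    private double sum;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    // Update every aggregate from the new value; the value itself is not stored.
    void add(double value) {
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    long count() { return count; }

    double sum() { return sum; }

    double min() { return min; }

    double max() { return max; }

    // Mean is derived from count and sum; NaN before the first event.
    double mean() { return count == 0 ? Double.NaN : sum / count; }
}
```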

Page 40

Types of Windows

o Sliding windows vs. batch (tumbling) windows
o Time vs. length windows

Also supports:
o Unique window
o First unique window
o External time window

Page 41

Window

In Storm

Page 42

Aggregation

In CEP (Siddhi)

from TempStream
select roomNo, avg(temp) as avgTemp
insert into HotRoomsStream;

Page 43

Sliding Time Window

from TempStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
insert all events into AvgRoomTempStream;
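One way to implement a sliding time window such as `#window.time(1 min)` by hand is to keep the events of the last minute in a queue, expire old ones as new ones arrive, and maintain a running sum. The sketch below assumes events arrive in timestamp order and only emits on arrival (the `insert all events` clause in the query also emits when events expire); `SlidingTimeAverage` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Average over a sliding time window, e.g. the last 60 000 ms for #window.time(1 min).
final class SlidingTimeAverage {
    private static final class Entry {
        final long ts;
        final double value;

        Entry(long ts, double value) {
            this.ts = ts;
            this.value = value;
        }
    }

    private final long windowMs;
    private final Deque<Entry> window = new ArrayDeque<>();
    private double sum; // running sum of the values currently in the window

    SlidingTimeAverage(long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        this.windowMs = windowMs;
    }

    // Adds an event (timestamps must be non-decreasing) and returns the current average.
    double add(long ts, double value) {
        window.addLast(new Entry(ts, value));
        sum += value;
        // Expire events that fell out of the window; the newest event always stays.
        while (window.peekFirst().ts <= ts - windowMs) {
            sum -= window.removeFirst().value;
        }
        return sum / window.size();
    }

    int size() {
        return window.size();
    }
}
```

Each event is added and removed once, so updates are amortized O(1); over long runs the repeated subtraction can accumulate floating-point error.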

Page 44

Group By

from TempStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert all events into HotRoomsStream;

Page 45

Batch Time Window

from TempStream#window.timeBatch(5 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert all events into HotRoomsStream;
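A batch window differs from a sliding one: events are collected into non-overlapping batches and results are emitted once per batch. Combined with `group by roomNo`, a hand-written version might bucket events per room and flush the averages when a batch boundary is crossed. This sketch assumes non-decreasing timestamps and batches aligned to multiples of the batch length, and it flushes a batch only when the next event arrives (there is no timer); `TumblingRoomAverage` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Per-room averages over non-overlapping (tumbling) time batches.
final class TumblingRoomAverage {
    private final long batchMs;
    private long batchStart = Long.MIN_VALUE; // start of the current batch; MIN_VALUE = no events yet
    private final Map<Integer, double[]> acc = new LinkedHashMap<>(); // roomNo -> {sum, count}

    TumblingRoomAverage(long batchMs) {
        this.batchMs = batchMs;
    }

    // Adds an event. When it falls into a new batch, returns the previous batch's
    // averages grouped by room; otherwise returns an empty map.
    Map<Integer, Double> add(long ts, int roomNo, double temp) {
        Map<Integer, Double> emitted = new LinkedHashMap<>();
        long start = ts - Math.floorMod(ts, batchMs); // batches aligned to multiples of batchMs
        if (batchStart != Long.MIN_VALUE && start != batchStart) {
            for (Map.Entry<Integer, double[]> e : acc.entrySet()) {
                emitted.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
            }
            acc.clear();
        }
        batchStart = start;
        double[] sumCount = acc.computeIfAbsent(roomNo, k -> new double[2]);
        sumCount[0] += temp;
        sumCount[1] += 1;
        return emitted;
    }
}
```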

Page 46

Pattern 4: Joining Event Streams

o What? Create a new event stream by joining multiple streams
  o The complication comes with time, so at least one window is needed
  o Often used with a window
o Use cases?
  o To detect when a player has kicked the ball in a football game
  o To correlate TempStream with the state of the regulator and trigger control commands

Page 47

Join with Storm

Page 49

Join

In CEP (Siddhi)

define stream TempStream(deviceID long, roomNo int, temp double);
define stream RegulatorStream(deviceID long, roomNo int, isOn bool);

from TempStream[temp > 30.0]#window.time(1 min) as T
  join RegulatorStream[isOn == false]#window.length(1) as R
  on T.roomNo == R.roomNo
select T.roomNo, R.deviceID, 'start' as action
insert into RegulatorActionStream;
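Roughly, the join above keeps hot readings (temp > 30.0) from the last minute and the single most recent `isOn == false` regulator event, and matches an arrival on either side against the other side's window. The plain-Java sketch below approximates those semantics under stated simplifications (events arrive in time order, and the caller supplies the current time for regulator events); all class names are hypothetical, and this is not how Siddhi implements joins internally.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// TempStream's deviceID is omitted because the query does not select it.
final class TempReading {
    final int roomNo;
    final double temp;
    final long ts;

    TempReading(int roomNo, double temp, long ts) {
        this.roomNo = roomNo;
        this.temp = temp;
        this.ts = ts;
    }
}

final class RegulatorState {
    final long deviceId;
    final int roomNo;
    final boolean isOn;

    RegulatorState(long deviceId, int roomNo, boolean isOn) {
        this.deviceId = deviceId;
        this.roomNo = roomNo;
        this.isOn = isOn;
    }
}

final class RegulatorAction {
    final int roomNo;
    final long deviceId;
    final String action;

    RegulatorAction(int roomNo, long deviceId, String action) {
        this.roomNo = roomNo;
        this.deviceId = deviceId;
        this.action = action;
    }
}

final class HotRoomRegulatorJoin {
    private static final long WINDOW_MS = 60_000;   // T: #window.time(1 min)
    private final Deque<TempReading> hot = new ArrayDeque<>();
    private RegulatorState lastOff;                  // R: #window.length(1) after isOn == false

    // Called for each TempStream event.
    List<RegulatorAction> onTemp(TempReading t) {
        List<RegulatorAction> out = new ArrayList<>();
        if (t.temp <= 30.0) {
            return out;                              // filter: temp > 30.0
        }
        expire(t.ts);
        hot.addLast(t);
        if (lastOff != null && lastOff.roomNo == t.roomNo) {
            out.add(new RegulatorAction(t.roomNo, lastOff.deviceId, "start"));
        }
        return out;
    }

    // Called for each RegulatorStream event; nowMs is the current time.
    List<RegulatorAction> onRegulator(RegulatorState r, long nowMs) {
        List<RegulatorAction> out = new ArrayList<>();
        if (r.isOn) {
            return out;                              // filter: isOn == false
        }
        lastOff = r;                                 // length(1): only the newest event is kept
        expire(nowMs);
        for (TempReading t : hot) {
            if (t.roomNo == r.roomNo) {
                out.add(new RegulatorAction(t.roomNo, r.deviceId, "start"));
            }
        }
        return out;
    }

    // Drop readings that are older than the time window.
    private void expire(long nowMs) {
        while (!hot.isEmpty() && hot.peekFirst().ts <= nowMs - WINDOW_MS) {
            hot.removeFirst();
        }
    }
}
```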

Page 50

Pattern 5: Data Correlation, Missing Events, and Erroneous Data

o What? Find correlations and use them to detect and handle missing and erroneous data
o Use cases?
  o Detecting a missing event (e.g., detect a customer request that has not been responded to within 1 hour of its reception)
  o Detecting erroneous data (e.g., detecting failed sensors using a set of sensors that monitor overlapping regions; the redundant data can be used to find erroneous sensors and remove their data from further processing)

Page 51

Missing Event in Storm

Page 52

Missing Event in CEP

In CEP (Siddhi)

from RequestStream#window.time(1 hour)
insert expired events into ExpiryStream;

from r1=RequestStream -> r2=Response[id == r1.id] or r3=ExpiryStream[id == r1.id]
select r1.id as id ...
having r2.id == null
insert into AlertStream;
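The two queries express a timeout: every request starts a one-hour timer, and an alert fires if the timer expires before a matching response arrives. A plain-Java sketch of the same idea follows. It assumes a fixed timeout and unique request ids, so arrival order in a `LinkedHashMap` is also deadline order; time is passed in explicitly, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MissingResponseDetector {
    private final long timeoutMs;
    // requestId -> arrival time, in arrival order (= deadline order, since the timeout is fixed).
    private final LinkedHashMap<String, Long> pending = new LinkedHashMap<>();

    MissingResponseDetector(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void onRequest(String id, long ts) {
        pending.put(id, ts);
    }

    // A response cancels the timer for its request.
    void onResponse(String id) {
        pending.remove(id);
    }

    // Advances time and returns ids whose response did not arrive within the timeout.
    List<String> advanceTo(long nowMs) {
        List<String> missing = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (e.getValue() + timeoutMs > nowMs) {
                break; // every later entry expires even later
            }
            missing.add(e.getKey());
            it.remove();
        }
        return missing;
    }
}
```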

Page 53

Pattern 6: Interacting with Databases

o What? Combine realtime data with historical data
o Use cases?
  o On a transaction, looking up the customer's age using the ID from the customer database to detect fraud (enrichment)
  o Checking a transaction against blacklists and whitelists in the database
  o Receiving input from the user (e.g., the daily discount amount may be updated in the database, and the query will then pick it up automatically without human intervention)

Page 54

Querying Databases

In Storm

Page 55

Event Table

In CEP (Siddhi)

define table CardUserTable (name string, cardNum long);

@from(eventtable = 'rdbms', datasource.name = 'CardDataSource', table.name = 'UserTable', caching.algorithm = 'LRU')
define table CardUserTable (name string, cardNum long);

Cache types supported:
● Basic: a size-based algorithm based on FIFO.
● LRU (Least Recently Used): the least recently used event is dropped when the cache is full.
● LFU (Least Frequently Used): the least frequently used event is dropped when the cache is full.
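The LRU policy above can be sketched with the JDK's `LinkedHashMap` in access-order mode, overriding `removeEldestEntry` to evict when the cache is full. Here the cache sits in front of a loader function that stands in for the RDBMS lookup behind `CardUserTable` (for example, card number to user name); `LruTableCache` and the loader are hypothetical, and this is not WSO2 CEP's internal cache implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Read-through LRU cache: hits are served from memory, misses go to the loader.
final class LruTableCache<K, V> {
    private final Function<K, V> loader;
    private final LinkedHashMap<K, V> cache;

    LruTableCache(int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true: iteration runs from least to most recently accessed.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // drop the least recently used entry when full
            }
        };
    }

    V get(K key) {
        V value = cache.get(key); // a hit also marks the entry as most recently used
        if (value == null) {
            value = loader.apply(key); // stand-in for a query against the backing table
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // containsKey does not change the access order.
    boolean isCached(K key) {
        return cache.containsKey(key);
    }
}
```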

Page 56

Join : Event Table

define stream Purchase (price double, cardNo long, place string);
define table CardUserTable (name string, cardNum long);

from Purchase#window.length(1) join CardUserTable
  on Purchase.cardNo == CardUserTable.cardNum
select Purchase.cardNo as cardNo, CardUserTable.name as name, Purchase.price as price
insert into PurchaseUserStream;

Page 57

Insert : Event Table

define stream FraudStream (price double, cardNo long, userName string);
define table BlacklistedUserTable (name string, cardNum long);

from FraudStream
select userName as name, cardNo as cardNum
insert into BlacklistedUserTable;

Page 58

Update : Event Table

define stream LoginStream (userID string, islogin bool, loginTime long);
define table LastLoginTable (userID string, time long);

from LoginStream
select userID, loginTime as time
update LastLoginTable
  on LoginStream.userID == LastLoginTable.userID;

Page 59

Pattern 7: Detecting Temporal Event Sequence Patterns

o What? Detect a temporal sequence of events or conditions arranged in time
o Use cases?
  o Detect suspicious activities like a small transaction immediately followed by a large transaction
  o Detect ball possession in a football game
  o Detect suspicious financial patterns like large buy and sell behaviour within a short time period

Page 60

Pattern

In Storm

Page 61

Pattern

In CEP (Siddhi)

define stream Purchase (price double, cardNo long, place string);

from every (a1 = Purchase[price < 100] -> a3 = ..) -> a2 = Purchase[price > 10000 and a1.cardNo == cardNo]
    within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into PotentialFraud;
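Leaving aside the elided `a3` step, the query matches, per card, a purchase under 100 followed by one over 10000 within one day, and `every` makes each small purchase start its own partial match. A plain-Java sketch of that state machine, keyed by card number, is below; it emits one alert per completed partial match, and the class name and alert format are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Detects, per card, a purchase under 100 followed by one over 10000 within a day.
final class SmallThenLargeDetector {
    private static final long WITHIN_MS = 24L * 60 * 60 * 1000; // within 1 day

    // cardNo -> start times of partial matches (small purchases awaiting a large one)
    private final Map<Long, Deque<Long>> pending = new HashMap<>();

    // Returns one alert ("cardNo,price,place") per partial match this purchase completes.
    List<String> onPurchase(long cardNo, double price, String place, long ts) {
        List<String> alerts = new ArrayList<>();
        Deque<Long> starts = pending.computeIfAbsent(cardNo, k -> new ArrayDeque<>());
        // Discard partial matches that can no longer complete within the time limit.
        while (!starts.isEmpty() && ts - starts.peekFirst() > WITHIN_MS) {
            starts.removeFirst();
        }
        if (price > 10000) {
            for (Long ignored : starts) {
                alerts.add(cardNo + "," + price + "," + place);
            }
            starts.clear(); // each partial match completes at most once
        }
        if (price < 100) {
            starts.addLast(ts); // 'every': each small purchase starts a new partial match
        }
        return alerts;
    }
}
```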

Page 62

Pattern 8: Tracking

o What? Track something over space or time and detect given conditions
o Use cases?
  o Tracking a fleet of vehicles, making sure that they adhere to speed limits, routes, and geo-fences
  o Tracking wildlife, making sure the animals are alive (they will not move if they are dead) and that they do not leave the reservation
  o Tracking airline luggage and making sure it has not been sent to the wrong destination
  o Tracking a logistics network and figuring out bottlenecks and unexpected conditions
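For the fleet-tracking use case, a minimal check might compare each position update against a speed limit and a geofence. A circular fence and the haversine great-circle distance are assumptions chosen for brevity (real geofences are often polygons), and `VehicleTracker` is a hypothetical name.

```java
// Checks a vehicle's position update against a circular geofence and a speed limit.
final class VehicleTracker {
    private static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius

    private final double fenceLat;
    private final double fenceLon;
    private final double fenceRadiusM;
    private final double speedLimitMps;

    VehicleTracker(double fenceLat, double fenceLon, double fenceRadiusM, double speedLimitMps) {
        this.fenceLat = fenceLat;
        this.fenceLon = fenceLon;
        this.fenceRadiusM = fenceRadiusM;
        this.speedLimitMps = speedLimitMps;
    }

    // Great-circle distance in metres using the haversine formula.
    static double distanceM(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    // Returns a violation label, or null if the update is within limits.
    String check(double lat, double lon, double speedMps) {
        if (distanceM(fenceLat, fenceLon, lat, lon) > fenceRadiusM) {
            return "outside geofence";
        }
        if (speedMps > speedLimitMps) {
            return "speeding";
        }
        return null;
    }
}
```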

Page 63

TFL: Traffic Analytics

Built using TFL (Transport for London) open data feeds: http://goo.gl/9xNiCm http://goo.gl/04tX6k

Page 64

Pattern 9: Detecting Trends

o What? Detect an overall trend over time
o Useful in stock markets, SLA enforcement, auto scaling, and predictive maintenance
o Use cases?
  o Rise and fall of values, and turn (switch from a rise to a fall)
  o Outliers: values that deviate from the current trend by a large amount
  o Complex trends like “Triple Bottom” and “Cup and Handle” [17]

Page 65

Trend in Storm

Build and apply a state machine

Page 66

Sequence

In CEP (Siddhi)

from t1=TempStream,
     t2=TempStream[(isNull(t2[last].temp) and t1.temp < temp) or
                   (t2[last].temp < temp and not(isNull(t2[last].temp)))]+
within 5 min
select t1.temp as initialTemp, t2[last].temp as finalTemp, t1.deviceID, t1.roomNo
insert into IncreasingHotRoomsStream;

Page 67

Partition

In CEP (Siddhi)

partition with (roomNo of TempStream)
begin
    from t1=TempStream,
         t2=TempStream[(isNull(t2[last].temp) and t1.temp < temp) or
                       (t2[last].temp < temp and not(isNull(t2[last].temp)))]+
    within 5 min
    select t1.temp as initialTemp, t2[last].temp as finalTemp, t1.deviceID, t1.roomNo
    insert into IncreasingHotRoomsStream;
end;
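What the partitioned query computes per room can be approximated in plain Java: keep, per room, the start of the current strictly rising run, and report the (initial, final) temperatures while the run keeps rising within five minutes of its start. Siddhi's actual sequence-matching and emission rules are more detailed than this; the class name and the choice to restart the run at the reading that breaks it are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Per-room detection of strictly increasing temperature runs within a time limit.
final class IncreasingTempTracker {
    private static final class Run {
        final double initialTemp;
        final long startTs;
        double lastTemp;

        Run(double temp, long ts) {
            this.initialTemp = temp;
            this.startTs = ts;
            this.lastTemp = temp;
        }
    }

    private final long withinMs;
    private final Map<Integer, Run> runs = new HashMap<>(); // one state per roomNo partition

    IncreasingTempTracker(long withinMs) {
        this.withinMs = withinMs;
    }

    // Returns {initialTemp, finalTemp} while the room's run keeps rising, otherwise null.
    double[] onReading(int roomNo, double temp, long ts) {
        Run run = runs.get(roomNo);
        if (run != null && temp > run.lastTemp && ts - run.startTs <= withinMs) {
            run.lastTemp = temp;
            return new double[] {run.initialTemp, temp};
        }
        runs.put(roomNo, new Run(temp, ts)); // rise broken or too old: start a new run here
        return null;
    }
}
```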

Page 68

Detecting Trends in Real Life

o The paper “A Complex Event Processing Toolkit for Detecting Technical Chart Patterns” (HPBC 2015) used this idea to identify stock chart patterns.
o It used kernel regression for smoothing and detected maxima and minima.
o Then any pattern can be written as a temporal event sequence.

Page 69

Pattern 10: Lambda Architecture

o What? Run the same query in both realtime and batch pipelines. This uses realtime analytics to fill the lag in batch analytics results.
  o Also called the “Lambda Architecture” (a term coined by Nathan Marz). See Jay Kreps's “Questioning the Lambda Architecture”.
o Use cases?
  o For example, if batch processing takes 15 minutes, results would always lag 15 minutes behind the current data. Here realtime processing fills the gap.
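The gap-filling idea can be sketched as a query-time merge: the batch view holds results for events before the last batch run's cutoff, the realtime pipeline tracks only events after it, and a query adds the two. The per-key count, the explicit cutoff timestamp, and the class name are hypothetical; a real system would keep the realtime side as compact aggregates rather than a raw event list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Query-time merge of a batch view and a realtime view for per-key event counts.
final class LambdaCounts {
    private static final class Event {
        final String key;
        final long ts;

        Event(String key, long ts) {
            this.key = key;
            this.ts = ts;
        }
    }

    private Map<String, Long> batchView = new HashMap<>(); // output of the last batch run
    private long batchCutoffMs = Long.MIN_VALUE;            // batch view covers events with ts < cutoff
    private final List<Event> recent = new ArrayList<>();   // realtime pipeline state

    // Realtime pipeline: keep only events the batch view does not cover yet.
    void onEvent(String key, long ts) {
        if (ts >= batchCutoffMs) {
            recent.add(new Event(key, ts));
        }
    }

    // A batch run covering every event with ts < cutoffMs has finished.
    void onBatchComplete(Map<String, Long> newBatchView, long cutoffMs) {
        batchView = new HashMap<>(newBatchView);
        batchCutoffMs = cutoffMs;
        recent.removeIf(e -> e.ts < cutoffMs); // now counted in the batch view
    }

    // Answer = batch result + realtime delta since the cutoff.
    long count(String key) {
        long total = batchView.getOrDefault(key, 0L);
        for (Event e : recent) {
            if (e.key.equals(key)) {
                total++;
            }
        }
        return total;
    }
}
```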

Page 70

Lambda Architecture. How?

Page 71

Pattern 11: Detecting and Switching to Detailed Analysis

o What? Detect a condition that suggests some anomaly, and analyze it further using historical data.
o Use cases?
  o Use basic rules to detect fraud (e.g., a large transaction), then pull out all transactions made with that credit card over a longer time period (e.g., 3 months of data) from the batch pipeline and run a detailed analysis
  o While monitoring weather, detect conditions like high temperature or low pressure in a given region, and then start a high-resolution localized forecast for that region
  o Detect good customers (e.g., through expenditure of more than $1000 within a month), and then run a detailed model to evaluate the potential of offering a deal

Page 72

Pattern 11: How?

Page 73

Pattern 12: Using a Machine Learning Model

o What? The idea is to train a model (often a machine learning model) and then use it within the realtime pipeline to make decisions.
  o For example, you can build a model using R, export it as PMML (Predictive Model Markup Language), and use it within your realtime pipeline.
o Use cases?
  o Fraud detection
  o Segmentation
  o Predicting churn

Page 74

Predictive Analytics

o Build models and use them with WSO2 CEP, BAM, and ESB using the upcoming WSO2 Machine Learner product (2015 Q2)
o Build models using R, export them as PMML, and use them within WSO2 CEP
o Call R scripts from CEP queries

Page 75

PMML Model

In CEP (Siddhi)

from TransactionStream#ml:applyModel('/path/logisticRegressionModel1.xml', timestamp, amount, ip)
insert into PotentialFraudsStream;
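Applying a logistic regression model amounts to a weighted sum of the input features passed through the logistic function and compared against a threshold. In the sketch below the coefficients and the 0.5 threshold are placeholders (a real model would be loaded from the exported PMML file), and the features are assumed to be numeric already; an IP address, for example, would need to be encoded first. `LogisticModel` is a hypothetical name, not the `ml:applyModel` implementation.

```java
// Scores a logistic regression model: p = 1 / (1 + exp(-(b0 + sum_i w_i * x_i))).
final class LogisticModel {
    private final double intercept;
    private final double[] weights;

    LogisticModel(double intercept, double... weights) {
        this.intercept = intercept;
        this.weights = weights.clone();
    }

    double probability(double... features) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException("expected " + weights.length + " features");
        }
        double z = intercept;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Flags the event when the predicted probability reaches the threshold.
    boolean isPotentialFraud(double threshold, double... features) {
        return probability(features) >= threshold;
    }
}
```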

Page 76

Pattern 13: Online Control

o What? Control something online. This involves problems like current situation awareness, predicting the next value(s), and deciding on corrective actions.
o Use cases?
  o Autopilot
  o Self-driving
  o Robotics

Page 77

Fraud Demo

Page 78

Scaling & HA for Pattern Implementations

Page 79

So how do we scale a system?

o Vertical Scaling
o Horizontal Scaling

Page 80

Vertical Scaling

Page 81

Horizontal Scaling

E.g., calculate the mean
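The mean scales horizontally because it splits into mergeable partial results: each node keeps a (sum, count) pair for its share of the events, and a final step adds the pairs and divides once. Averaging the nodes' means instead would be wrong whenever nodes see different numbers of events. A minimal sketch with a hypothetical `PartialMean` class:

```java
import java.util.List;

// Mergeable partial result for a distributed mean: each node keeps (sum, count).
final class PartialMean {
    final double sum;
    final long count;

    PartialMean(double sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    // Partial result computed locally on one node.
    static PartialMean of(double... values) {
        double s = 0;
        for (double v : values) {
            s += v;
        }
        return new PartialMean(s, values.length);
    }

    // Merging is order-independent (up to floating-point rounding), so nodes can combine in any order.
    PartialMean merge(PartialMean other) {
        return new PartialMean(sum + other.sum, count + other.count);
    }

    double mean() {
        return count == 0 ? Double.NaN : sum / count;
    }

    // Final step: merge all partials, then divide once.
    static double combine(List<PartialMean> parts) {
        PartialMean total = new PartialMean(0, 0);
        for (PartialMean p : parts) {
            total = total.merge(p);
        }
        return total.mean();
    }
}
```

For example, partials (6, 3) and (10, 1) merge to (16, 4), giving 4, whereas the mean of the two local means (2 and 10) would be 6.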

Page 85

Horizontal Scaling ...

How about scaling the median?

If and only if we can partition!
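A median has no such small mergeable summary; the median of per-node medians is generally not the overall median. It does scale when events can be partitioned so that all values needed for one result land on the same node, for example one median per room with events routed by `roomNo`. The sketch simulates nodes as in-memory maps; the hash-based routing and the `PartitionedMedian` name are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Exact per-room medians with events partitioned across nodes by roomNo.
final class PartitionedMedian {
    // Each "node" is simulated as a map: roomNo -> values seen for that room.
    private final List<Map<Integer, List<Double>>> nodes = new ArrayList<>();

    PartitionedMedian(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Route by key so that every value for a room lands on the same node.
    int nodeFor(int roomNo) {
        return Math.floorMod(Integer.hashCode(roomNo), nodes.size());
    }

    void add(int roomNo, double value) {
        nodes.get(nodeFor(roomNo)).computeIfAbsent(roomNo, k -> new ArrayList<>()).add(value);
    }

    // Computed entirely on the owning node; no cross-node merge is needed.
    double median(int roomNo) {
        List<Double> values = nodes.get(nodeFor(roomNo)).get(roomNo);
        if (values == null || values.isEmpty()) {
            return Double.NaN;
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1 ? sorted.get(n / 2) : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }
}
```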

Page 87

Scalable Realtime Solutions ...

Spark Streaming
o Supports distributed processing
o Runs micro batches
o Does not support pattern & sequence detection

Apache Storm
o Supports distributed processing
o Stream processing engine

Page 88

Why not use Apache Storm?

Advantages
o Supports distributed processing
o Supports partitioning
o Extendable
o Open source

Disadvantages
o Need to write Java code
o Need to start from basic principles (& data structures)
o Adapting to change is slow
o No support for governing artifacts

Page 89

WSO2 CEP += Apache Storm

Advantages
o Supports distributed processing
o Supports partitioning
o Extendable
o Open source

Disadvantages addressed
o No need to write Java code (supports an SQL-like query language)
o No need to start from basic principles (supports a high-level language)
o Adapting to change is fast
o Govern artifacts using Toolboxes
o etc ...

Page 90

How do we scale?

Page 91

How do we scale ...

Page 92

Scaling with Storm

Page 93

Siddhi QL

define stream StockStream (symbol string, volume int, price double);

@name('Filter Query')
from StockStream[price > 75]
select *
insert into HighPriceStockStream;

@name('Window Query')
from HighPriceStockStream#window.time(10 min)
select symbol, sum(volume) as sumVolume
insert into ResultStockStream;

Page 94

Siddhi QL - with partition

define stream StockStream (symbol string, volume int, price double);

@name('Filter Query')
from StockStream[price > 75]
select *
insert into HighPriceStockStream;

@name('Window Query')
partition with (symbol of HighPriceStockStream)
begin
    from HighPriceStockStream#window.time(10 min)
    select symbol, sum(volume) as sumVolume
    insert into ResultStockStream;
end;

Page 95

Siddhi QL - distributed

define stream StockStream (symbol string, volume int, price double);

@name('Filter Query') @dist(parallel = '3')
from StockStream[price > 75]
select *
insert into HighPriceStockStream;

@name('Window Query') @dist(parallel = '2')
partition with (symbol of HighPriceStockStream)
begin
    from HighPriceStockStream#window.time(10 min)
    select symbol, sum(volume) as sumVolume
    insert into ResultStockStream;
end;

Page 96

On Storm UI

Page 98

High Availability

Page 99

HA / Persistence

o Option 1: Side by side
  o Recommended
  o Takes 2X hardware
  o Gives zero downtime
o Option 2: Snapshot and restore
  o Uses less hardware
  o Will lose events between snapshots
  o Downtime during recovery
  o ** In some scenarios, you can use event tables to keep intermediate state

Page 100

Siddhi Extensions

● Function extension
● Aggregator extension
● Window extension
● Transform extension

Page 101

Siddhi Query : Function Extension

from TempStream
select deviceID, roomNo, custom:toKelvin(temp) as tempInKelvin, 'K' as scale
insert into OutputStream;

Page 102

Siddhi Query : Aggregator Extension

from TempStream
select deviceID, roomNo, temp, custom:stdev(temp) as stdevTemp, 'C' as scale
insert into OutputStream;

Page 103

Siddhi Query : Window Extension

from TempStream#window.custom:lastUnique(roomNo, 2 min)
select *
insert into OutputStream;

Page 104

Siddhi Query : Transform Extension

from XYZSpeedStream#transform.custom:getVelocityVector(v, vx, vy, vz)
select velocity, direction
insert into SpeedStream;