WSO2 Product Release Webinar - Introducing the WSO2 Complex Event Processor
WSO2 Product Release Webinar
WSO2 Complex Event Processor 2.0.1
Simplifying High Performant Data Processing
S. Suhothayan (Suho) Software Engineer,
Data Technologies Team.
Outline
• What is Complex Event Processing?
• WSO2 CEP Server & SOA integration
• The Siddhi runtime CEP engine
• High availability, persistence and scalability of WSO2 CEP
• How CEP can be combined with Business Activity Monitoring (BAM)
• Demo
Complex Event Processing?
Complex event processing is about listening to events and detecting patterns in near real time without storing all events.
CEP Is & Is NOT!
• Is NOT!
o Simple filters - Simple Event Processing
- E.g. Is this a gold or platinum customer?
o Joining multiple event streams - Event Stream Processing
• Is!
o Processing multiple event streams
o Identifying meaningful patterns among streams
o Using temporal windows
- E.g. Notify if there is a 10% increase in overall trading activity AND the average price of commodities has fallen 2% in the last 4 hours
WSO2 CEP Server
• Enterprise grade server for CEP runtimes
• Supports several transports (network access)
• Supports several data formats
• Support for multiple CEP runtimes
• Governance
• Monitoring
• Tools (WSO2 Dev Studio)
CEP Brokers
• Is an adaptor for receiving and publishing events
• Has the configurations to connect to external endpoints
• It's many-to-many with the CEP engine
CEP Brokers
• Support for several transports (network access) and data formats
o SOAP/WS-Eventing
- XML messages
o REST
- JSON messages
o JMS
- Map messages
- XML messages
- Text messages
o SMTP (Email)
- Text messages
o Thrift
- WSO2 Event (WSO2 data format; High Performant Event Capturing & Delivery Framework; supports Java/C/C++/C# via Thrift language bindings)
• Brokers are pluggable!
CEP Buckets
• Is an isolated logical execution unit
• Each CEP bucket has a set of
o Queries
o Input & output event mappings
• It's one-to-one with a CEP backend runtime engine
Open Source CEP Runtimes for Buckets
• Siddhi
o Apache License, a Java library, tuple-based event model
o Supports distributed processing
o Supports multiple query models
- Based on a SQL-like language
- Filters, Windows, Joins, Ordering and others
• Esper, http://esper.codehaus.org (Deprecated)
o GPLv2 License, a Java library, events can be XML, Map, Object
o Supports multiple query models
- Based on a SQL-like language
- Filters, Windows, Joins, Ordering and others
• Drools Fusion (Deprecated)
o Apache License, a Java library
o Support for temporal reasoning + windows
Developer Studio UI
• Eclipse-based tool to define buckets
• Can manage the configurations throughout the production lifecycle
• Note: 2.1.0 still does not support Text Output Mapping
Monitoring
• Provides real-time statistical visualizations of request & response counts over time, per CEP server, bucket, broker and topic.
Siddhi Queries
• Filters and Projection
• Windows
o Events are processed within temporal windows (e.g. for aggregation and joins). Time window vs. length window.
• Joins
o Join two streams
• Event ordering
o Identify event sequences and patterns
Filters
• Filters the events by conditions
• Conditions
o >, <, ==, <=, >=, !=
o contains, instanceof
o and, or, not
• Example
from <stream-name> [<conditions>]* insert into <stream-name>
from cseEventStream[price >= 20 and symbol=='IBM'] insert into StockQuote symbol, volume
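To make the filter-and-projection semantics concrete, the example query above can be read as the plain Java loop below. This is only an illustrative sketch, not the Siddhi API: the class `FilterSketch`, the event classes and the `filter` method are hypothetical names, and the attribute types (double price, long volume) are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: this is NOT the Siddhi API.
final class FilterSketch {

    // Input event, mirroring cseEventStream (attribute types are assumed).
    static final class StockEvent {
        final String symbol;
        final double price;
        final long volume;

        StockEvent(String symbol, double price, long volume) {
            this.symbol = symbol;
            this.price = price;
            this.volume = volume;
        }
    }

    // Output event, mirroring the projection "symbol, volume".
    static final class StockQuote {
        final String symbol;
        final long volume;

        StockQuote(String symbol, long volume) {
            this.symbol = symbol;
            this.volume = volume;
        }
    }

    // Rough equivalent of:
    //   from cseEventStream[price >= 20 and symbol=='IBM']
    //   insert into StockQuote symbol, volume
    static List<StockQuote> filter(List<StockEvent> events) {
        List<StockQuote> out = new ArrayList<>();
        for (StockEvent e : events) {
            if (e.price >= 20 && "IBM".equals(e.symbol)) {   // condition
                out.add(new StockQuote(e.symbol, e.volume)); // projection
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<StockEvent> in = Arrays.asList(
                new StockEvent("IBM", 25.0, 100),   // passes both conditions
                new StockEvent("IBM", 15.0, 200),   // price too low
                new StockEvent("MSFT", 30.0, 300)); // wrong symbol
        for (StockQuote q : filter(in)) {
            System.out.println(q.symbol + " " + q.volume); // prints: IBM 100
        }
    }
}
```

A real engine evaluates the condition per arriving event rather than over a list; the list is used here only to keep the sketch short.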
Window
• Types of Windows
o (Time | Length) (Sliding | Batch) windows
• Types of aggregate functions
o sum, avg, max, min
• Example
from <stream-name> [<conditions>]#window.<window-name>(<parameters>) insert [<output-type>] into <stream-name>
from cseEventStream[price >= 20]#window.lengthBatch(50) insert into StockQuote symbol, avg(price) as avgPrice group by symbol having avgPrice>50
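The length-batch example above combines a filter, a batch window, a grouped average and a having clause. The sketch below shows one way to read that combination in plain Java. It is not Siddhi code: `LengthBatchSketch` and `onEvent` are hypothetical names, and it assumes one summary per group is produced each time the batch fills, which may differ from the engine's actual output cadence.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a length-batch window; NOT the Siddhi API.
final class LengthBatchSketch {
    private final int batchSize;
    private final List<String> symbols = new ArrayList<>();
    private final List<Double> prices = new ArrayList<>();

    LengthBatchSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    // Returns symbol -> avgPrice for groups with avgPrice > 50 when the
    // batch fills; returns an empty map while the batch is still filling.
    Map<String, Double> onEvent(String symbol, double price) {
        Map<String, Double> result = new LinkedHashMap<>();
        if (price < 20) {
            return result; // filter: [price >= 20]
        }
        symbols.add(symbol);
        prices.add(price);
        if (symbols.size() < batchSize) {
            return result; // batch not full yet
        }
        // group by symbol: accumulate {sum, count} per symbol
        Map<String, double[]> acc = new LinkedHashMap<>();
        for (int i = 0; i < symbols.size(); i++) {
            double[] a = acc.get(symbols.get(i));
            if (a == null) {
                a = new double[2];
                acc.put(symbols.get(i), a);
            }
            a[0] += prices.get(i);
            a[1] += 1;
        }
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            double avg = e.getValue()[0] / e.getValue()[1];
            if (avg > 50) { // having avgPrice > 50
                result.put(e.getKey(), avg);
            }
        }
        // batch semantics: the whole window is discarded once processed
        symbols.clear();
        prices.clear();
        return result;
    }

    public static void main(String[] args) {
        LengthBatchSketch window = new LengthBatchSketch(2);
        window.onEvent("IBM", 60);
        System.out.println(window.onEvent("IBM", 80)); // batch of 2 is full
    }
}
```

A sliding window would instead evict only the oldest event on each arrival, so the aggregate would be updated per event rather than per batch.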
Join
• Join two streams based on a condition and window
• Unidirectional: only events arriving on the unidirectional stream trigger the join
• Example
from <stream>#<window> [unidirectional] join <stream>#<window> on <condition> within <time> insert into <stream>
from TickEvent[symbol=='IBM']#window.length(2000) join NewsEvent#window.time(5 min) insert into JoinStream *
Pattern
• Checks whether condition A happens before/after condition B
• Can do iterative checks via the "every" keyword.
• Here with "within <time>", Siddhi emits only events that are within that time of each other
• Example
from [every] <condition> -> [every] <condition> … <condition> within <time> insert into <stream-name> (<attribute-name>* | * )
from every (a1 = purchase[price < 10]) -> a2 = purchase[price > 10000 and a1.cardNo==a2.cardNo] within 1 day insert into potentialFraud a1.cardNo as cardNo, a2.price as price, a2.place as place
Example event stream: a1 x1 k5 a2 n7 y1
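The fraud pattern above (a small purchase followed within a day by a very large purchase on the same card) can be sketched as a tiny state machine in plain Java. This is an assumption-laden illustration, not how Siddhi implements patterns: `FraudPatternSketch` and `onPurchase` are hypothetical names, timestamps are assumed to be in milliseconds, and each pending small purchase is assumed to be consumed by the first large purchase on that card.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of "every a1 -> a2 within 1 day"; NOT the Siddhi API.
final class FraudPatternSketch {
    static final long ONE_DAY_MS = 24L * 60 * 60 * 1000;

    // cardNo -> timestamps of small purchases still waiting for a match.
    // "every" means each small purchase starts its own pending match.
    private final Map<String, Deque<Long>> pendingSmall = new HashMap<>();

    // Returns the number of pattern matches completed by this event.
    int onPurchase(String cardNo, double price, long timestampMs) {
        if (price < 10) { // condition for a1
            Deque<Long> q = pendingSmall.get(cardNo);
            if (q == null) {
                q = new ArrayDeque<>();
                pendingSmall.put(cardNo, q);
            }
            q.addLast(timestampMs);
            return 0;
        }
        if (price > 10000) { // condition for a2 (same cardNo via the map key)
            Deque<Long> q = pendingSmall.remove(cardNo);
            if (q == null) {
                return 0;
            }
            int matches = 0;
            for (long t : q) {
                if (timestampMs - t <= ONE_DAY_MS) { // within 1 day
                    matches++;
                }
            }
            return matches;
        }
        return 0; // unrelated events in between do not break the pattern
    }

    public static void main(String[] args) {
        FraudPatternSketch p = new FraudPatternSketch();
        p.onPurchase("1234", 5.0, 0L);        // a1
        p.onPurchase("1234", 500.0, 1000L);   // ignored
        System.out.println(p.onPurchase("1234", 20000.0, 2000L)); // prints: 1
    }
}
```

The sketch never expires old entries; a production implementation would also evict small purchases older than one day to bound memory.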
Sequence
• Regular expressions supported
o * - Zero or more matches (reluctant).
o + - One or more matches (reluctant).
o ? - Zero or one match (reluctant).
o or - either event
• Here we have to refer to events returned by * and + using square brackets to access a specific occurrence of that event
from <event-regular-expression> within <time> insert into <stream>
from a1 = requestOrder[action == "buy"], b1 = cseEventStream[price > a1.price and symbol==a1.symbol]+, b2 = cseEventStream[price < b1.price] insert into purchaseOrder a1.symbol as symbol, b1[0].price as firstPrice, b2.price as orderPrice
Example event stream: a1 b1 b1 b2 n7 y1
• We compared Siddhi with Esper, the widely used open source CEP engine
• For evaluation, we set up different queries using both systems, pushed events into the system, and measured the time until all of them were processed.
• We used an Intel(R) Xeon(R) X3440 @ 2.53GHz, 4 cores, 8M cache, 8GB RAM, running Debian with the 2.6.32-5-amd64 kernel
Performance Results
Simple filter without window
Performance Comparison With ESPER
from StockTick[price > 6] return symbol, price
State machine query for pattern matching
Performance Comparison With ESPER
From f=FraudWarningEvent -> p=PINChangeEvent(accountNumber=f.accountNumber) return accountNumber;
Performance of WSO2 CEP
• Here we published data from two client publisher nodes to the CEP server node and sent the triggered notifications of CEP to a client subscriber node.
• To test the worst-case scenario, 100% of the data published to CEP is received at the subscriber node after processing (no data is filtered)
• We used an Intel® Core™ i7-2630QM CPU @ 2.00GHz, 8 cores, 8GB RAM, running Ubuntu 12.04 with the 3.2.0-32-generic kernel, for running CEP, and Intel® Core™ i3-2350M CPU @ 2.30GHz, 4 cores, 4GB RAM, running Ubuntu 12.04 with the 3.2.0-32-generic kernel, for the three client nodes.
Simple filter without window
Performance of WSO2 CEP
from StockTick[price > 6] return symbol, price
WSO2 CEP Throughput (average, kilo events/sec, by number of clients)
# Clients    Avg (kilo events/sec)
1            67
2            135
3            181
4            210
5            212
6            232
7            245
8            250
9            234
10           186
50           187
100          112
HA / Persistence
• Ability to recover runtime state in the case of a failure.
• Enables queries to span lifetimes much greater than server uptime.
• Takes periodic snapshots and stores all state information to a scalable persistence store (Apache Cassandra).
• Supports pluggable persistent stores.
Scaling
• Vertical scaling
o Can be distributed as a pipeline
• Horizontal scaling
o Queries such as windows, patterns and joins have shared state, hence are hard to distribute!
o Use a distributed cache (Hazelcast) to achieve this - shared memory and batch processing
Event Recording
• Ability to record all/some of the events for future processing
• Few options
o Publish them to a Cassandra cluster using the WSO2 data bridge API or BAM (can process data in Cassandra with Hadoop using WSO2 BAM).
o Write them to a distributed cache
o Custom Thrift-based event recorder
Scenario
• Monitoring a stock exchange for game-changing moments
• Two input event streams
o Event stream of stock quotes from a stock exchange
o Event stream of word counts on various company names from Twitter pages
• Check whether the last traded price of the stock has changed significantly (by 2%) within the last minute, and people are tweeting about that company (> 10) within the last minute
Input events
• Input events are JMS Maps
o Stock Exchange Stream
Map<String, Object> map1 = new HashMap<String, Object>();
map1.put("symbol", "MSFT");
map1.put("price", 26.36);
publisher.publish("AllStockQuotes", map1);
o Twitter Stream
Map<String, Object> map1 = new HashMap<String, Object>();
map1.put("company", "MSFT");
map1.put("wordCount", 8);
publisher.publish("TwitterFeed", map1);
Queries
from allStockQuotes[win.time(60000)]
  insert into fastMovingStockQuotes symbol, price, avg(price) as averagePrice
  group by symbol
  having ((price > averagePrice*1.02) or (averagePrice*0.98 > price))
from twitterFeed[win.time(60000)]
  insert into highFrequentTweets company as company, sum(wordCount) as words
  group by company
  having (words > 10)
from fastMovingStockQuotes[win.time(60000)] as fastMovingStockQuotes
  join highFrequentTweets[win.time(60000)] as highFrequentTweets
  on fastMovingStockQuotes.symbol==highFrequentTweets.company
  insert into predictedStockQuotes fastMovingStockQuotes.symbol as company, fastMovingStockQuotes.averagePrice as amount, highFrequentTweets.words as words
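The two having clauses in these queries boil down to simple arithmetic on the one-minute aggregates. The sketch below spells that arithmetic out in plain Java; `ScenarioSketch`, `isFastMoving` and `shouldAlert` are hypothetical names introduced for illustration, and the windowing and join that Siddhi performs are assumed to have already produced `averagePrice` and `words`.

```java
// Hypothetical illustration of the scenario's conditions; NOT the Siddhi API.
final class ScenarioSketch {

    // having ((price > averagePrice*1.02) or (averagePrice*0.98 > price)):
    // the last price is more than 2% above or below the one-minute average.
    static boolean isFastMoving(double price, double averagePrice) {
        return price > averagePrice * 1.02 || averagePrice * 0.98 > price;
    }

    // The join emits an alert only when the stock is fast moving AND the
    // company was mentioned more than 10 times in the last minute.
    static boolean shouldAlert(double price, double averagePrice, int words) {
        return isFastMoving(price, averagePrice) && words > 10;
    }

    public static void main(String[] args) {
        // With an assumed one-minute average of 25.0, the 2% band is 24.5 to 25.5.
        System.out.println(isFastMoving(26.36, 25.0));    // prints: true
        System.out.println(shouldAlert(26.36, 25.0, 8));  // prints: false
        System.out.println(shouldAlert(26.36, 25.0, 12)); // prints: true
    }
}
```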
Alert
• As an Email
Hi,
Within the last minute, people have been tweeting about {company} {words} times, and the last traded price of {company} has changed by 2% and is now trading at ${amount}.
From CEP
Useful links
• WSO2 CEP 2.0.1
http://wso2.com/products/complex-event-processor/
• Distributed Processing Sample With Siddhi CEP and ActiveMQ JMS Broker
http://suhothayan.blogspot.com/2012/08/distributed-processing-sample-for-wso2.html
• Creating Custom Data Publishers to BAM/CEP
http://wso2.org/library/articles/2012/07/creating-custom-agents-publish-events-bamcep
• WSO2 BAM 2.0.1
http://wso2.com/products/business-activity-monitor/