![Page 1: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/1.jpg)
Parallel Complex Event Processing
Karol Grzegorczyk, 03-06-2013
![Page 2: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/2.jpg)
Big Data classification
[http://en.wikipedia.org/wiki/File:3_states_of_data.jpg]
![Page 3: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/3.jpg)
Event-driven architecture
![Page 4: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/4.jpg)
Complex Event Processing solutions
● Open Source:
– Esper
– Drools Fusion
– Storm
– WSO2 Complex Event Processor
● Proprietary software:
– Oracle Complex Event Processing
– StreamBase Complex Event Processing
– Informatica RulePoint
– TIBCO Complex Event Processing
![Page 5: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/5.jpg)
Esper
● Two editions:
― Open source library
― Enterprise server based on Jetty
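● The core component of Esper is the CEP engine.
● The CEP engine works like a database turned upside down: instead of storing data and running queries against it, it stores queries and runs the data through them.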
● Expressions are defined in Event Processing Language (EPL)
― Declarative domain specific language
― Similar to SQL, but it uses views instead of tables and events instead of records (rows)
― Views are reused among EPL statements for efficiency!
select * from OrderEvent.win:length(5)
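To make the sliding behaviour of win:length(5) concrete, here is a minimal plain-Java sketch. It does not use the Esper API; the class LengthWindow and its insert method are hypothetical names. It keeps the last N events and reports which event, if any, is pushed out when a new one arrives.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for a length window such as win:length(5); not the Esper API. */
class LengthWindow<E> {
    private final int size;
    private final Deque<E> events = new ArrayDeque<>();

    LengthWindow(int size) {
        this.size = size;
    }

    /** Adds an event; returns the oldest event if the window was full and it had to be evicted. */
    Optional<E> insert(E event) {
        events.addLast(event);
        if (events.size() > size) {
            return Optional.of(events.removeFirst());
        }
        return Optional.empty();
    }

    /** Current window contents, oldest first. */
    List<E> contents() {
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        LengthWindow<String> window = new LengthWindow<>(5);
        for (int i = 1; i <= 7; i++) {
            window.insert("order-" + i).ifPresent(old -> System.out.println("evicted " + old));
        }
        System.out.println(window.contents()); // [order-3, order-4, order-5, order-6, order-7]
    }
}
```

A deque is used because a length window only ever appends at one end and evicts from the other.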
![Page 6: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/6.jpg)
Streams
● A complex event can be built from several data streams:
select * from AlertEvent as a, NewsEvent as n
where a.symbol = n.symbol
● Esper defines two types of data streams:
― Filter-based event stream:
select * from OrderEvent(itemType='shirt')
― Pattern-based event stream:
select * from pattern [
OrderEvent(itemType='shirt') -> OrderEvent(itemType='trousers')]
● Filter-based and pattern-based streams can be joined!
● Events can be forwarded to other streams using the INSERT INTO clause.
● An event can also be modified (using the UPDATE keyword) before it is applied to any select statements.
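The pattern-based stream above uses the followed-by operator (->). As a rough illustration of its semantics, the plain-Java sketch below scans a list of orders for a shirt order that is later followed by a trousers order. It is not Esper's pattern engine: the OrderEvent fields are assumed from the EPL filter, and the class and method names are hypothetical. As we understand Esper's semantics, a pattern without the every operator fires at most once, which is why the sketch stops at the first match.

```java
import java.util.List;
import java.util.Optional;

/** Plain-Java sketch of the followed-by (->) pattern; not Esper's pattern engine. */
class FollowedByPattern {
    record OrderEvent(int id, String itemType) {}
    record Match(OrderEvent first, OrderEvent second) {}

    /** Finds the first 'shirt' order followed later by a 'trousers' order (no 'every', so one match at most). */
    static Optional<Match> shirtThenTrousers(List<OrderEvent> stream) {
        OrderEvent shirt = null;
        for (OrderEvent e : stream) {
            if (shirt == null) {
                if (e.itemType().equals("shirt")) {
                    shirt = e; // left-hand side satisfied; now wait for the right-hand side
                }
            } else if (e.itemType().equals("trousers")) {
                return Optional.of(new Match(shirt, e));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<OrderEvent> orders = List.of(
                new OrderEvent(1, "trousers"), // ignored: no shirt seen yet
                new OrderEvent(2, "shirt"),
                new OrderEvent(3, "shoes"),
                new OrderEvent(4, "trousers"));
        shirtThenTrousers(orders).ifPresent(m ->
                System.out.println(m.first().id() + " -> " + m.second().id())); // 2 -> 4
    }
}
```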
![Page 7: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/7.jpg)
Views
● Views derive events from streams (both filter-based and pattern-based)
● The default view contains all events that arrived on the stream since the statement was added to the engine.
● View types:
– Data windows (e.g. length, time)
– Named windows
– Extension views (sorted window, ranked window, time-order view)
– Standard views (unique, grouped, size, lastevent)
– Statistics view (univariate, regression, correlation)
![Page 8: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/8.jpg)
Esper processing
● Update listeners and subscriber objects are associated with EPL statements
● By default, listeners and subscribers are notified when new events matching the EPL statement arrive (insert stream)
● In addition, listeners and subscribers can be notified when an event matching the EPL statement is removed from a window, for example because the window has reached its size limit (remove stream)
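The insert/remove stream distinction can be sketched without Esper. Esper's UpdateListener receives new and old events together, and, as far as we know, old (remove-stream) events are delivered only when the statement asks for them, e.g. with select irstream. The sketch below only mimics that shape: WindowListener and NotifyingLengthWindow are hypothetical names, and events are plain objects rather than Esper EventBeans.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch of insert-stream / remove-stream notification; the types here are hypothetical, not Esper's. */
class IrStreamDemo {
    /** Receives arriving events (insert stream) and evicted events (remove stream) in one call. */
    interface WindowListener<E> {
        void update(List<E> newEvents, List<E> oldEvents);
    }

    /** A length window that reports each arrival together with whatever it had to evict. */
    static final class NotifyingLengthWindow<E> {
        private final int size;
        private final Deque<E> events = new ArrayDeque<>();
        private final WindowListener<E> listener;

        NotifyingLengthWindow(int size, WindowListener<E> listener) {
            this.size = size;
            this.listener = listener;
        }

        void send(E event) {
            events.addLast(event);
            List<E> removed = new ArrayList<>();
            while (events.size() > size) {
                removed.add(events.removeFirst());
            }
            listener.update(List.of(event), removed);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        NotifyingLengthWindow<String> window = new NotifyingLengthWindow<String>(2,
                (in, out) -> log.add("new=" + in + " old=" + out));
        window.send("A");
        window.send("B");
        window.send("C");
        log.forEach(System.out::println);
        // new=[A] old=[]
        // new=[B] old=[]
        // new=[C] old=[A]
    }
}
```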
![Page 9: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/9.jpg)
[Esper Reference]
![Page 10: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/10.jpg)
Filtering
Esper provides two types of filtering:
● Stream-level filtering:
select * from OrderEvent(type= 'shirt')
● Post-data-window filtering:
select * from OrderEvent where type = 'shirt'
![Page 11: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/11.jpg)
[Esper Reference]
![Page 12: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/12.jpg)
[Esper Reference]
![Page 13: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/13.jpg)
Stream-level filtering vs post-data-window filtering
select * from OrderEvent(type= 'shirt')
vs
select * from OrderEvent where type = 'shirt'
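The first form is preferred: non-matching events are discarded on arrival, before they reach any data window. Sometimes, however, post-data-window filtering is what is needed, for example:
Take the last one hundred orders (of any type) and calculate the average price of the trousers among them.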
select avg(price) from OrderEvent.win:length(100) where type = 'trousers'
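The two filter placements answer different questions, which a small plain-Java comparison makes visible. This is not Esper code; FilterPlacement, its methods, and the sample prices are illustrative. Post-window filtering averages the trousers found among the last n orders, while stream-level filtering, i.e. OrderEvent(type='trousers').win:length(n), averages the last n trousers orders.

```java
import java.util.List;
import java.util.OptionalDouble;

/** Contrasts the two EPL filter placements with plain Java; names and data are illustrative only. */
class FilterPlacement {
    record OrderEvent(String type, double price) {}

    /** Like: select avg(price) from OrderEvent.win:length(n) where type = '...' */
    static OptionalDouble postWindowFilter(List<OrderEvent> stream, int n, String type) {
        List<OrderEvent> window = stream.subList(Math.max(0, stream.size() - n), stream.size());
        return window.stream().filter(e -> e.type().equals(type)).mapToDouble(OrderEvent::price).average();
    }

    /** Like: select avg(price) from OrderEvent(type='...').win:length(n) */
    static OptionalDouble streamLevelFilter(List<OrderEvent> stream, int n, String type) {
        List<OrderEvent> matching = stream.stream().filter(e -> e.type().equals(type)).toList();
        List<OrderEvent> window = matching.subList(Math.max(0, matching.size() - n), matching.size());
        return window.stream().mapToDouble(OrderEvent::price).average();
    }

    public static void main(String[] args) {
        List<OrderEvent> orders = List.of(
                new OrderEvent("trousers", 10), new OrderEvent("trousers", 20),
                new OrderEvent("shirt", 5), new OrderEvent("trousers", 30));
        // Window of the last 2 orders: shirt 5, trousers 30, so only 30 counts.
        System.out.println(postWindowFilter(orders, 2, "trousers").getAsDouble()); // 30.0
        // Window of the last 2 trousers orders: 20 and 30.
        System.out.println(streamLevelFilter(orders, 2, "trousers").getAsDouble()); // 25.0
    }
}
```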
![Page 14: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/14.jpg)
Data Windows
● Basic windows:
― Length window (win:length)
― Length batch window (win:length_batch)
― Time window (win:time)
― Time batch window (win:time_batch)
● Advanced time windows
― Externally-timed window (win:ext_timed)
― Externally-timed batch window (win:ext_timed_batch)
― Time-Length combination batch window (win:time_length_batch)
― Time-Accumulating window (win:time_accum)
― Keep-All window (win:keepall)
― First Length (win:firstlength)
― First Time (win:firsttime)
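The difference between a sliding window and its batch counterpart can be sketched with win:length_batch. A sliding length window updates on every event. A length batch window instead buffers events until it holds n of them, releases them together, and starts over. The plain-Java class below is a hypothetical illustration of that behaviour, not the Esper implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a length batch window (compare win:length_batch); not the Esper implementation. */
class LengthBatchWindow<E> {
    private final int size;
    private final List<E> buffer = new ArrayList<>();

    LengthBatchWindow(int size) {
        this.size = size;
    }

    /** Buffers the event; once 'size' events are collected, returns them as one batch and starts over. */
    List<E> insert(E event) {
        buffer.add(event);
        if (buffer.size() < size) {
            return List.of(); // batch not complete yet: nothing is released
        }
        List<E> batch = List.copyOf(buffer);
        buffer.clear();
        return batch;
    }

    public static void main(String[] args) {
        LengthBatchWindow<Integer> window = new LengthBatchWindow<>(3);
        for (int i = 1; i <= 7; i++) {
            List<Integer> batch = window.insert(i);
            if (!batch.isEmpty()) {
                System.out.println(batch); // prints [1, 2, 3], then [4, 5, 6]; event 7 is still buffered
            }
        }
    }
}
```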
![Page 15: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/15.jpg)
[Esper Reference]
![Page 16: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/16.jpg)
[Esper Reference]
![Page 17: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/17.jpg)
Scaling Esper
● According to the Esper documentation, Esper processes over 500,000 events/s on dual-CPU 2 GHz Intel hardware, with an average engine latency below 3 microseconds (below 10 µs with more than 99% predictability), on a VWAP benchmark with 1,000 statements registered in the system; this tops out at 70 Mbit/s at 85% CPU usage.
● Parallel processing
– Within one machine
- Context partitions
– With multiple machines
- Partitioned stream
- Partition by use case
![Page 18: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/18.jpg)
Context
● Context partition: the basic unit of locking
● By default, a statement has a single context partition
● Context types:
― Keyed Segmented
― Hash Segmented
― Category Segmented
― Non-overlapping context
― Overlapping context
● Nested contexts
![Page 19: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/19.jpg)
Keyed Segmented Context
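create context ByCustomerAndAccount partition by custId and account from BankTxn
context ByCustomerAndAccount select custId, account, sum(amount) from BankTxn
Implicit grouping in the select statement: each context partition holds exactly one custId and account combination, so no group by clause is needed.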
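The per-partition state of the keyed segmented context can be modelled as a map from the partition key to a running sum. The plain-Java sketch below is only an analogy, not the Esper API. The BankTxn fields follow the slide's EPL, but the field types (String IDs, long amounts) are assumptions. Because every distinct (custId, account) pair gets its own entry, the sum needs no separate grouping step.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java analogy for a keyed segmented context; field types are assumed, not taken from Esper. */
class KeyedSegmentedSketch {
    record BankTxn(String custId, String account, long amount) {}
    record PartitionKey(String custId, String account) {}

    /** One running sum per (custId, account) partition. */
    static Map<PartitionKey, Long> sumPerPartition(List<BankTxn> txns) {
        Map<PartitionKey, Long> sums = new LinkedHashMap<>();
        for (BankTxn t : txns) {
            sums.merge(new PartitionKey(t.custId(), t.account()), t.amount(), Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<BankTxn> txns = List.of(
                new BankTxn("c1", "a1", 100), new BankTxn("c1", "a2", 50),
                new BankTxn("c1", "a1", 25), new BankTxn("c2", "a1", 10));
        sumPerPartition(txns).forEach((k, v) ->
                System.out.println(k.custId() + "/" + k.account() + " = " + v));
        // c1/a1 = 125
        // c1/a2 = 50
        // c2/a1 = 10
    }
}
```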
![Page 20: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/20.jpg)
Hash Segmented Context
Assigns events to context partitions based on the result of a hash function and a modulo operation.
create context SegmentedByCustomerHash coalesce by hash_code(custId) from BankTxn granularity 16 preallocate
context SegmentedByCustomerHash select custId, account, sum(amount) from BankTxn group by custId, account
No implicit grouping in the select statement! Several customers can hash to the same one of the 16 partitions, so an explicit group by is required.
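Why the group by is needed becomes clear once two customers collide in one partition. The sketch below computes a partition index as hash modulo 16. Java's String.hashCode() stands in for EPL's hash_code(), and Esper's actual hash function may differ; the class and method names are hypothetical. The IDs "Aa" and "BB" are used because they have the same String.hashCode(), so both land in partition 0.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of hash segmentation with granularity 16; not the Esper API. */
class HashSegmentedSketch {
    record BankTxn(String custId, String account, long amount) {}

    static final int GRANULARITY = 16;

    /** Hash then modulo. String.hashCode() is a stand-in; Esper's hash_code() may produce other values. */
    static int partitionOf(String custId) {
        return Math.floorMod(custId.hashCode(), GRANULARITY);
    }

    /** partition -> (custId/account -> sum); the inner key plays the role of "group by custId, account". */
    static Map<Integer, Map<String, Long>> sumPerPartition(List<BankTxn> txns) {
        Map<Integer, Map<String, Long>> partitions = new TreeMap<>();
        for (BankTxn t : txns) {
            partitions.computeIfAbsent(partitionOf(t.custId()), p -> new TreeMap<>())
                    .merge(t.custId() + "/" + t.account(), t.amount(), Long::sum);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" share a hash code, so they share a partition but must not share a sum.
        List<BankTxn> txns = List.of(
                new BankTxn("Aa", "acc1", 100),
                new BankTxn("BB", "acc1", 40),
                new BankTxn("Aa", "acc1", 5));
        System.out.println(sumPerPartition(txns)); // {0={Aa/acc1=105, BB/acc1=40}}
    }
}
```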
![Page 21: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/21.jpg)
Category Segmented Context
Assigns events to context partitions based on the values of one or more event properties, using a predicate expression(s) to define context partition membership.
create context CategoryByTemp group temp < 65 as cold, group temp between 65 and 85 as normal, group temp > 85 as large from SensorEvent
context CategoryByTemp select context.label, count(*) from SensorEvent
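The category context above assigns each SensorEvent to the category whose predicate it satisfies and counts per label. The plain-Java sketch below mirrors those predicates. It assumes, as in SQL, that EPL's between is inclusive, so 65 and 85 both count as normal. The class name and the double type of temp are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of the CategoryByTemp context; not the Esper API. */
class CategorySegmentedSketch {
    record SensorEvent(double temp) {}

    /** Mirrors the slide's three predicates; 'between' is treated as inclusive of both ends. */
    static String label(SensorEvent e) {
        if (e.temp() < 65) return "cold";
        if (e.temp() <= 85) return "normal";
        return "large";
    }

    /** Roughly: context CategoryByTemp select context.label, count(*) from SensorEvent */
    static Map<String, Long> countPerCategory(List<SensorEvent> events) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (SensorEvent e : events) {
            counts.merge(label(e), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<SensorEvent> events = List.of(
                new SensorEvent(50), new SensorEvent(65), new SensorEvent(85),
                new SensorEvent(90), new SensorEvent(70));
        System.out.println(countPerCategory(events)); // {cold=1, normal=3, large=1}
    }
}
```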
![Page 22: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/22.jpg)
Non-overlapping context
A non-overlapping context partition is created when the start condition is met and ended when the end condition is met. There is always either one or zero context partitions.
create context NineToFive start (0, 9, *, *, *) end (0, 17, *, *, *)
context NineToFive select * from TrafficEvent(speed >= 100)
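The crontab-style conditions (minute, hour, day of month, month, day of week) open the context at 09:00 and close it at 17:00. The plain-Java sketch below approximates the statement by checking each event's time of day. It is an offline simplification, not the Esper API. Treating 17:00 itself as outside the context is an assumption, as are the TrafficEvent field types.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.List;

/** Plain-Java sketch of the NineToFive non-overlapping context; not the Esper API. */
class NineToFiveSketch {
    record TrafficEvent(LocalDateTime time, int speed) {}

    static final LocalTime START = LocalTime.of(9, 0); // start (0, 9, *, *, *)
    static final LocalTime END = LocalTime.of(17, 0);  // end (0, 17, *, *, *)

    /** True while the single context partition exists; the end boundary is treated as exclusive here. */
    static boolean contextActive(LocalDateTime t) {
        LocalTime time = t.toLocalTime();
        return !time.isBefore(START) && time.isBefore(END);
    }

    /** Roughly: context NineToFive select * from TrafficEvent(speed >= 100) */
    static List<TrafficEvent> speeding(List<TrafficEvent> events) {
        return events.stream()
                .filter(e -> contextActive(e.time()))
                .filter(e -> e.speed() >= 100)
                .toList();
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2013, 6, 3);
        List<TrafficEvent> events = List.of(
                new TrafficEvent(day.atTime(8, 30), 120),   // before 09:00: no context partition
                new TrafficEvent(day.atTime(10, 15), 110),  // inside and fast enough
                new TrafficEvent(day.atTime(11, 0), 90),    // inside but too slow
                new TrafficEvent(day.atTime(17, 30), 130)); // after 17:00: partition has ended
        System.out.println(speeding(events).size()); // 1
    }
}
```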
![Page 23: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/23.jpg)
Overlapping context
This context initiates a new context partition when an initiating condition occurs, and terminates one or more context partitions when the terminating condition occurs.
create context CtxTrainEnter initiated by TrainEnterEvent as te terminated after 5 minutes
context CtxTrainEnter select t1 from pattern [t1=TrainEnterEvent -> timer:interval(5 min) and not TrainLeaveEvent(trainId = context.te.trainId)]
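The intent of the statement is to report trains that have not left within 5 minutes of entering. Each entry opens its own 5-minute partition, and partitions for different trains may overlap. The plain-Java sketch below captures that intent as an offline check over recorded events. It does not reproduce Esper's exact pattern or context-start semantics, and all names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Offline sketch of the CtxTrainEnter use case; not Esper's pattern or context engine. */
class TrainOverstaySketch {
    record TrainEnterEvent(String trainId, Instant time) {}
    record TrainLeaveEvent(String trainId, Instant time) {}

    static final Duration LIMIT = Duration.ofMinutes(5); // "terminated after 5 minutes"

    /** Reports each entered train with no matching leave event before its 5-minute partition ends. */
    static List<String> overstays(List<TrainEnterEvent> enters, List<TrainLeaveEvent> leaves) {
        List<String> alerts = new ArrayList<>();
        for (TrainEnterEvent enter : enters) {
            Instant deadline = enter.time().plus(LIMIT);
            boolean left = leaves.stream().anyMatch(l -> l.trainId().equals(enter.trainId())
                    && !l.time().isBefore(enter.time())
                    && l.time().isBefore(deadline));
            if (!left) {
                alerts.add(enter.trainId());
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2013-06-03T09:00:00Z");
        List<TrainEnterEvent> enters = List.of(
                new TrainEnterEvent("IC1", t0), new TrainEnterEvent("IC2", t0.plusSeconds(60)));
        List<TrainLeaveEvent> leaves = List.of(
                new TrainLeaveEvent("IC1", t0.plusSeconds(120)),  // left after 2 minutes
                new TrainLeaveEvent("IC2", t0.plusSeconds(420))); // left after 6 minutes
        System.out.println(overstays(enters, leaves)); // [IC2]
    }
}
```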
![Page 24: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/24.jpg)
Context nesting
In the case of nested contexts, the context declared first controls the lifecycle of the context(s) declared after it.
create context NineToFiveSegmented
context NineToFive start (0, 9, *, *, *) end (0, 17, *, *, *),
context SegmentedByCustomer partition by custId from BankTxn
context NineToFiveSegmented select custId, account, sum(amount) from BankTxn group by account
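One consequence of the outer context controlling the lifecycle is that the per-customer partitions end at 17:00 and their sums start again from zero on the next day. The plain-Java sketch below models this offline by including the day in the grouping key. It is an analogy under stated assumptions, not the Esper API. Treating 17:00 as exclusive, and the BankTxn field types, are assumptions.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Offline analogy for the NineToFiveSegmented nested context; not the Esper API. */
class NestedContextSketch {
    record BankTxn(LocalDateTime time, String custId, String account, long amount) {}

    /** Outer context: active from 09:00 (inclusive) to 17:00 (treated as exclusive here). */
    static boolean nineToFive(LocalDateTime t) {
        LocalTime time = t.toLocalTime();
        return !time.isBefore(LocalTime.of(9, 0)) && time.isBefore(LocalTime.of(17, 0));
    }

    /** Sums keyed by day/custId/account: when the outer context ends, the inner partitions end with it. */
    static Map<String, Long> sums(List<BankTxn> txns) {
        Map<String, Long> result = new TreeMap<>();
        for (BankTxn t : txns) {
            if (!nineToFive(t.time())) {
                continue; // outside the outer context, no customer partitions exist
            }
            LocalDate day = t.time().toLocalDate();
            result.merge(day + "/" + t.custId() + "/" + t.account(), t.amount(), Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        LocalDate d1 = LocalDate.of(2013, 6, 3);
        List<BankTxn> txns = List.of(
                new BankTxn(d1.atTime(10, 0), "c1", "a1", 100),
                new BankTxn(d1.atTime(18, 0), "c1", "a1", 999),          // ignored: after 17:00
                new BankTxn(d1.plusDays(1).atTime(10, 0), "c1", "a1", 7)); // new day, fresh partition
        System.out.println(sums(txns)); // {2013-06-03/c1/a1=100, 2013-06-04/c1/a1=7}
    }
}
```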
![Page 25: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/25.jpg)
Partitioning without context declaration
Grouped data window std:groupwin()
What is the difference between:
select avg(price) from OrderEvent.std:groupwin(itemType).win:length(10)
And
select avg(price) from OrderEvent.win:length(10) group by itemType
?
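One reading of the question, based on our understanding of Esper's grouped data window, is as follows. std:groupwin(itemType).win:length(10) keeps the last 10 events per itemType, and without group by it computes a single average over all retained events. win:length(10) group by itemType keeps the last 10 events overall and computes one average per itemType. The plain-Java sketch below contrasts the two under that assumption, with a window of 2 to keep the data small. Names and data are illustrative, not Esper API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Contrasts grouped data windows with group by, in plain Java; not the Esper API. */
class GroupwinVsGroupBy {
    record OrderEvent(String itemType, double price) {}

    /** Like OrderEvent.std:groupwin(itemType).win:length(n) without group by: n events per itemType, one average. */
    static double groupwinAverage(List<OrderEvent> stream, int n) {
        Map<String, Deque<OrderEvent>> windows = new HashMap<>();
        for (OrderEvent e : stream) {
            Deque<OrderEvent> window = windows.computeIfAbsent(e.itemType(), k -> new ArrayDeque<>());
            window.addLast(e);
            if (window.size() > n) {
                window.removeFirst(); // each itemType has its own length window
            }
        }
        return windows.values().stream()
                .flatMap(Deque::stream)
                .mapToDouble(OrderEvent::price)
                .average()
                .orElse(Double.NaN);
    }

    /** Like OrderEvent.win:length(n) group by itemType: the last n events overall, one average per itemType. */
    static Map<String, Double> groupByAverages(List<OrderEvent> stream, int n) {
        List<OrderEvent> window = stream.subList(Math.max(0, stream.size() - n), stream.size());
        return window.stream().collect(Collectors.groupingBy(OrderEvent::itemType,
                Collectors.averagingDouble(OrderEvent::price)));
    }

    public static void main(String[] args) {
        List<OrderEvent> orders = List.of(
                new OrderEvent("shirt", 10), new OrderEvent("shirt", 20),
                new OrderEvent("shirt", 30), new OrderEvent("trousers", 100));
        System.out.println(groupwinAverage(orders, 2)); // 50.0 (shirts 20 and 30, trousers 100)
        System.out.println(groupByAverages(orders, 2)); // averages: shirt 30.0, trousers 100.0
    }
}
```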
![Page 26: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/26.jpg)
Parallel processing on multiple machines
● Partitioned stream
● Partition by use case
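The two multi-machine strategies differ in what decides the target node. The plain-Java router below is a hypothetical illustration, not part of Esper Enterprise Edition. With a partitioned stream, the partition key (here a hashed key field) picks the node, so all state for one key stays on one machine. With partition by use case, each node hosts the statements for one use case, approximated here by event type. The Event record and the node mapping are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical event router illustrating two partitioning strategies; not an Esper API. */
class EventRouter {
    record Event(String type, String key) {}

    /** Partitioned stream: events with the same key always go to the same node. */
    static int nodeByKey(Event e, int nodes) {
        return Math.floorMod(e.key().hashCode(), nodes);
    }

    /** Partition by use case: the node hosting the use case for this event type receives it. */
    static int nodeByUseCase(Event e, Map<String, Integer> useCaseNodes) {
        Integer node = useCaseNodes.get(e.type());
        if (node == null) {
            throw new IllegalArgumentException("no node hosts use case for " + e.type());
        }
        return node;
    }

    public static void main(String[] args) {
        Map<String, Integer> useCases = Map.of("Order", 0, "Payment", 1);
        List<Event> events = List.of(new Event("Order", "c1"), new Event("Payment", "c1"));
        for (Event e : events) {
            System.out.println(e + " key-node=" + nodeByKey(e, 4) + " use-case-node=" + nodeByUseCase(e, useCases));
        }
    }
}
```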
![Page 27: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/27.jpg)
[Esper Enterprise Edition Reference]
![Page 28: Parallel Complex Event Processing](https://reader034.vdocuments.net/reader034/viewer/2022050614/554f71adb4c905c8088b5628/html5/thumbnails/28.jpg)
Thank you