DEBS 2011 presentation: Pattern Rewriting for Event Processing Optimization, by Ella Rabinovich, Opher Etzion and Avigdor Gal
IBM Haifa Research Lab – Event Processing
© 2011 IBM Corporation
Pattern Rewriting Framework for Event Processing Optimization
Ella Rabinovich, Opher Etzion, Avigdor Gal
Motivation
Previous studies indicate that there is a major performance degradation as application complexity increases.
Adi A., Etzion O. Amit - the situation manager. The VLDB Journal – The International Journal on Very Large Databases. Volume 13 Issue 2, 2004.
Mendes M., Bizarro P., Marques P. Benchmarking event processing systems: current state and future directions. WOSP/SIPEW 2010: 259-260.
[Figure: event processing system benchmark; throughput by category (standby world, noisy world, filtered world, complex world)]
[Figure: event processing system benchmark; latency (ms) by category (standby world, noisy world, filtered world, complex world)]
[Figure: performance study of event processing systems; throughput (× 10^3) by category (selection and projection, aggregation over windows, joins, pattern detection) for system 1, system 2, system 3]
Optimize complex scenarios
Optimization tools
Blackbox optimizations: distribution, parallelism, scheduling, load balancing, load shedding
Whitebox optimizations: implementation selection, implementation optimization, pattern rewriting (our focus)
An example of a complex scenario
E1, E2, E3, …, E15, E16
A process has 16 steps that have to be executed in a predefined order; termination of each step creates an event with a status code (SC).
The process is reported as committed when the 16 steps have completed in the correct order (sequence pattern) and the pattern assertion is satisfied.
The assertion may look like:
E1.SC == E2.SC or E3.SC < 4
For this scenario we achieved a more than tenfold decrease in latency, or a more than 20% increase in throughput.
Pattern Rewriting Approach
The goal: create an equivalent pattern that provides better performance.
Subsumption of a common logic:
seq(E1,E2,E3,E4) and seq(E1,E2,E3,E5,E6) are rewritten as seq(E1,E2,E3), which derives DE, followed by seq(DE,E4) and seq(DE,E5,E6).
Splitting for parallel execution:
all(E1,E2,E3,E4) is rewritten as all(E1,E2), which derives DE1, and all(E3,E4), which derives DE2, followed by all(DE1,DE2).
Rewriting techniques exist in other domains, such as rule systems and SQL queries.
Due to the inherent complexity of event processing patterns, there are some unique challenges.
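The splitting of an all pattern for parallel execution can be illustrated with a minimal sketch. The class `AllPattern` and both helper functions are hypothetical names introduced here, and the sketch assumes the simplest possible semantics (a single detection, no assertion, no time window), which is not necessarily how the authors' engine behaves.

```python
class AllPattern:
    """Detects once every required event type has been seen (single detection)."""

    def __init__(self, required, derived):
        self.required = set(required)
        self.derived = derived
        self.seen = set()
        self.done = False

    def on_event(self, event_type):
        """Feed one event type; return the derived event name on detection, else None."""
        if self.done or event_type not in self.required:
            return None
        self.seen.add(event_type)
        if self.seen == self.required:
            self.done = True
            return self.derived
        return None


def detect_original(stream):
    """Index at which all(E1,E2,E3,E4) is detected, or None."""
    pattern = AllPattern(["E1", "E2", "E3", "E4"], "MATCH")
    for i, event_type in enumerate(stream):
        if pattern.on_event(event_type) is not None:
            return i
    return None


def detect_rewritten(stream):
    """Index at which all(DE1,DE2) is detected, fed by all(E1,E2) and all(E3,E4)."""
    left = AllPattern(["E1", "E2"], "DE1")
    right = AllPattern(["E3", "E4"], "DE2")
    top = AllPattern(["DE1", "DE2"], "MATCH")
    for i, event_type in enumerate(stream):
        for sub in (left, right):  # the two halves could run in parallel
            derived = sub.on_event(event_type)
            if derived is not None and top.on_event(derived) is not None:
                return i
    return None


stream = ["E3", "E1", "E4", "E9", "E2"]
print(detect_original(stream), detect_rewritten(stream))
```

Under these simplified assumptions both alternatives detect at the same input event; the policies and assertions discussed on the following slides are what make the general case harder.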
Challenges: Assertion Split
A pattern assertion (PA) is a predicate that the event collection needs to satisfy for the pattern to be matched.
Example 1: seq(E1,E2,E3) with pattern assertion PA: E1.SC == E2.SC OR E3.SC < 4
Split: seq(E1,E2) with PA’: E1.SC == E2.SC, deriving DE, followed by seq(DE,E3) with PA’’: E3.SC < 4
The direct connection of the two patterns implies an “AND” operator between PA’ and PA’’, whereas the original assertion uses OR.
Example 2: seq(E1,E2,E3) with pattern assertion PA: E1.SC == E3.SC AND E2.SC == 0
Split: seq(E1,E3) with PA’: E1.SC == E3.SC, deriving DE, followed by seq(DE,E2) with PA’’: E2.SC == 0
The assertion should be separable in terms of its variables.
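The effect of the implied AND can be checked by brute force over a small range of status codes. This is only a sketch of the assertion logic: it ignores event ordering and policies, and the function names and the range of status codes (0 to 5) are assumptions made for illustration.

```python
from itertools import product


def pa_or(e1, e2, e3):
    """Original assertion: E1.SC == E2.SC OR E3.SC < 4."""
    return e1 == e2 or e3 < 4


def split_or(e1, e2, e3):
    """Chained split: PA' gates the derived event, PA'' is then ANDed with it."""
    de_derived = e1 == e2            # PA' on seq(E1,E2)
    return de_derived and e3 < 4     # PA'' on seq(DE,E3)


def pa_and(e1, e2, e3):
    """Original assertion: E1.SC == E3.SC AND E2.SC == 0."""
    return e1 == e3 and e2 == 0


def split_and(e1, e2, e3):
    """Chained split of the AND assertion into two variable-disjoint parts."""
    de_derived = e1 == e3            # PA'
    return de_derived and e2 == 0    # PA''


def disagreements(original, split, codes=range(6)):
    """Status-code triples on which the original and the split assertion differ."""
    return [sc for sc in product(codes, repeat=3) if original(*sc) != split(*sc)]


print((1, 1, 5) in disagreements(pa_or, split_or))   # OR split changes the result
print(disagreements(pa_and, split_and))              # AND split never disagrees
```

The OR assertion disagrees with its chained split on, for example, status codes (1, 1, 5), while the AND assertion, whose conjuncts use disjoint variables, splits without changing the result.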
Assertion Split – Solution
Convert the pattern assertion expression into conjunctive normal form (CNF).
Identify independent participants’ sub-groups by generating an assertion variables dependency graph.
The maximal number of independent partitions implies the finest granulation of the assertion expression.
Example assertion:
(E1.SC > E2.SC) AND (E4.SC > E5.SC) AND NOT ((E5.SC==E6.SC) AND (E3.SC==77))
CNF:
(E1.SC > E2.SC) AND (E4.SC > E5.SC) AND (NOT(E5.SC==E6.SC) OR NOT(E3.SC==77))
[Figure: dependency graph over the participants E1, E2, E3, E4, E5, E6]
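The dependency-graph step can be sketched as a connected-components computation over the variables of the CNF clauses. This is one possible realization, not necessarily the authors' algorithm; the function names are hypothetical, clauses are assumed to be given as strings that each reference at least one event as `E<n>.`, and the conversion to CNF itself is not shown.

```python
import re
from collections import defaultdict


def clause_variables(clause):
    """Return the event names (E1, E2, ...) referenced by one CNF clause."""
    return set(re.findall(r"\b(E\d+)\.", clause))


def independent_partitions(cnf_clauses):
    """Group participants that are connected through shared CNF clauses."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for clause in cnf_clauses:
        variables = sorted(clause_variables(clause))
        for v in variables:
            find(v)  # register every variable as a graph node
        for other in variables[1:]:
            parent[find(variables[0])] = find(other)  # edge inside a clause

    groups = defaultdict(set)
    for v in list(parent):
        groups[find(v)].add(v)

    def number(event):
        return int(event[1:])

    parts = [sorted(g, key=number) for g in groups.values()]
    return sorted(parts, key=lambda g: number(g[0]))


cnf = [
    "(E1.SC > E2.SC)",
    "(E4.SC > E5.SC)",
    "(NOT(E5.SC==E6.SC) OR NOT(E3.SC==77))",
]
print(independent_partitions(cnf))
```

For the CNF expression on this slide the sketch yields two independent sub-groups, {E1, E2} and {E3, E4, E5, E6}, because E5 links the second and third clauses.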
Pattern Matching - Policies
Pattern: seq(PG, ATM-W) within 10 minutes
Instance selection policy: PG1, PG2, ATM-W1
Cardinality policy: PG1, PG2, ATM-W1 (first detection), ATM-W2 (additional detection?)
Consumption policy: PG1, PG2, ATM-W1 (first detection – are instances consumed?), ATM-W2
Challenges: Policies Mapping
Naïve pattern split, keeping the original policies in the rewritten version, will result in incorrect matching:
Original: seq(E1,E2,E3) {single, last, …}
Rewritten: seq(E1,E2) {single, last, …}, deriving DE, followed by seq(DE,E3) {single, last, …}
[Figure: event stream e1.1, e2.1 (blood pressure measure), e2.2 (blood pressure measure), e3.1, shown for both alternatives with their detection points]
Policies Mapping – Solution
Mapping of policies in the rewritten alternative (f2’ + f2’’), based on the original pattern (f1):

policy              original (f1)   rewritten (f2’)   rewritten (f2’’)
                    seq(E1,E2,E3)   seq(E1,E2)        seq(DE,E3)
Cardinality         single          unrestricted      single
Instance selection  last            last              last
Consumption         -               reuse             -

policy              original (f1)   rewritten (f2’)   rewritten (f2’’)
                    seq(E1,E2,E3)   seq(E1,E2)        seq(DE,E3)
Cardinality         unrestricted    unrestricted      unrestricted
Instance selection  last            each              last
Consumption         consume         reuse             consume

+ pattern assertion extensions
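The first mapping (original {single, last}) can be illustrated on the stream e1.1, e2.1, e2.2, e3.1. The sketch below is a deliberately simplified model: `Seq2`, `original_seq3` and `rewritten_seq3` are hypothetical names, only the cardinality of the first stage is varied, "last" selection with reuse is hard-wired, and time windows and assertions are ignored.

```python
def original_seq3(stream):
    """Reference: seq(E1,E2,E3) {single, last}, evaluated at the first matching E3."""
    for i, (etype, name) in enumerate(stream):
        if etype != "E3":
            continue
        e2_positions = [j for j in range(i) if stream[j][0] == "E2"]
        for j in reversed(e2_positions):  # "last": prefer the newest E2
            e1_positions = [k for k in range(j) if stream[k][0] == "E1"]
            if e1_positions:
                return (stream[e1_positions[-1]][1], stream[j][1], name)
    return None


class Seq2:
    """Two-step sequence with 'last' instance selection and reuse of the first step."""

    def __init__(self, first, second, derived, single):
        self.first, self.second, self.derived = first, second, derived
        self.single = single          # cardinality: single vs. unrestricted
        self.latest_first = None
        self.fired = False

    def on_event(self, etype, names):
        """names: tuple of contributing event names; returns (type, names) or None."""
        if self.single and self.fired:
            return None
        if etype == self.first:
            self.latest_first = names  # keep only the newest instance
            return None
        if etype == self.second and self.latest_first is not None:
            self.fired = True
            return (self.derived, self.latest_first + names)
        return None


def rewritten_seq3(stream, first_stage_single):
    """seq(E1,E2) deriving DE, followed by seq(DE,E3) {single, last}."""
    stage1 = Seq2("E1", "E2", "DE", single=first_stage_single)
    stage2 = Seq2("DE", "E3", "MATCH", single=True)
    for etype, name in stream:
        derived = stage1.on_event(etype, (name,))
        if derived is not None:
            stage2.on_event(*derived)  # feed DE into the second stage
        result = stage2.on_event(etype, (name,))
        if result is not None:
            return result[1]
    return None


stream = [("E1", "e1.1"), ("E2", "e2.1"), ("E2", "e2.2"), ("E3", "e3.1")]
print(original_seq3(stream))
print(rewritten_seq3(stream, first_stage_single=True))   # naive: policies copied
print(rewritten_seq3(stream, first_stage_single=False))  # mapped: unrestricted first stage
```

In this model the naive split fires the first stage once, at e2.1, and so reports e2.1 instead of e2.2; making the first stage unrestricted, as in the first mapping table, restores the original match.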
Equivalence assurance
Denotational semantics approach:
An event processing pattern is a function (f), mapping the pattern’s input (participant set, PS) into its output (matching set, MS). We formally demonstrate that for the same PS both alternatives produce the identical MS:
f1(PS, …) == f2’((f2’’(PS’, …) ∪ PS’’), …) ∀ PS
Original: participant set (PS) → seq(E1,…, EN) with PA, Policies → matching set (MS)
Rewritten: participant set (PS) → seq(E1, …, EK) with PA’, Policies’ → seq(DE, EK+1, …, EN) with PA’’, Policies’’ → matching set (MS)
Throughput vs. Latency Tradeoff
Pattern throughput is the average rate of events the pattern can process.
The detecting event latency is the delay between the last input event causing the pattern detection and the detection itself, resulting in derivation of an output event.
Example: seq(E1,E2,E3) produces derived event DE
Detecting event latency = DE.detection_time - E3.detection_time
DE.detection_time: time DE was detected by the system
E3.detection_time: time E3 arrived at the system
[Figure: seq(E1,…, EN) compared with seq(E1, …, EN-2) followed by seq(DE, EN-1, EN); throughput and latency under lazy evaluation and eager evaluation]
Bi-objective Performance Optimization
Define a bi-objective performance function
Assign a scalar weight to each objective to be optimized
  Weight α to pattern throughput (th)
  Complementary weight (1-α) to the detecting event latency (lt)
Minimize the goal function of the form:
g = α*lt + C*(1-α)*(1/th)
Simulation-based approach to select the optimal rewriting alternative (minimizing the goal function g)
For a set of rewriting alternatives A = {A1, … AK}, find argmin_Ai ( g )
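Selecting the alternative that minimizes g can be sketched as follows, using the simulation figures reported in the Experimental Results slide. The weight (called `alpha` here) and the scaling constant `c` are assumptions: the slides give no concrete values, and `c = 10000` is chosen only so that both terms have a comparable magnitude. The code follows the goal function as written, so `alpha` scales the latency term.

```python
# (rewriting, throughput in events/s, detecting event latency in ms),
# copied from the Experimental Results slide.
ALTERNATIVES = [
    ("0 : 8", 140, 260),
    ("1 : 7", 110, 189),
    ("2 : 6", 142, 174),
    ("3 : 5", 155, 147),
    ("4 : 4", 165, 95),
    ("5 : 3", 172, 63),
    ("6 : 2", 163, 32),
    ("7 : 1", 95, 15),
]


def goal(alpha, c, throughput, latency):
    """g = alpha*lt + C*(1-alpha)*(1/th); lower is better."""
    return alpha * latency + c * (1 - alpha) * (1 / throughput)


def best_alternative(alpha, c=10_000.0, alternatives=ALTERNATIVES):
    """Return the rewriting whose goal value is minimal (argmin over alternatives)."""
    return min(alternatives, key=lambda a: goal(alpha, c, a[1], a[2]))[0]


for alpha in (0.0, 0.5, 1.0):
    print(alpha, best_alternative(alpha))
```

With these assumed values, `alpha = 0` selects the maximum-throughput split (5 : 3), `alpha = 1` selects the minimum-latency split (7 : 1), and `alpha = 0.5` selects 6 : 2.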
Experimental Results
Simulation results for seq(E1, …, E16), split of pairs

#   rewriting   rewritten pattern throughput (event/s)   detected event latency (ms)
1   0 : 8       140                                      260
2   1 : 7       110                                      189
3   2 : 6       142                                      174
4   3 : 5       155                                      147
5   4 : 4       165                                      95
6   5 : 3       172                                      63
7   6 : 2       163                                      32
8   7 : 1       95                                       15

[Annotations: lazy, eager; the Pareto frontier; min latency; max throughput; the base pattern; not in the Pareto frontier]
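The Pareto frontier of the reported results can be recomputed with a short dominance check. This is a generic sketch, with hypothetical function names, that treats higher throughput and lower latency as better; it is not taken from the authors' tooling.

```python
# (rewriting, throughput in events/s, detected event latency in ms), from the table above.
RESULTS = [
    ("0 : 8", 140, 260),
    ("1 : 7", 110, 189),
    ("2 : 6", 142, 174),
    ("3 : 5", 155, 147),
    ("4 : 4", 165, 95),
    ("5 : 3", 172, 63),
    ("6 : 2", 163, 32),
    ("7 : 1", 95, 15),
]


def dominates(a, b):
    """True if a is at least as good as b on both objectives and better on one."""
    _, th_a, lt_a = a
    _, th_b, lt_b = b
    return th_a >= th_b and lt_a <= lt_b and (th_a > th_b or lt_a < lt_b)


def pareto_frontier(results):
    """Rewritings that no other alternative dominates."""
    return [r[0] for r in results if not any(dominates(o, r) for o in results)]


print(pareto_frontier(RESULTS))
```

For these figures the frontier consists of the splits 5 : 3 (maximum throughput), 6 : 2 and 7 : 1 (minimum latency); the base pattern 0 : 8 is dominated.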
Future Work
Pattern rewriting framework for event processing optimization
  With more than tenfold performance improvement between the original pattern and its rewritten alternative
Future research and practical activities
  Investigation of additional rewritings
    Using patterns of the same type (e.g., for the all pattern)
    Additional methods for rewriting (e.g., seq using all and filter agents)
  Elaborating an algorithm for event processing network rewriting
  Exploring a heuristic-based approach for selection of the rewriting alternative of the sequence pattern