
MIDDLEWARE SYSTEMS RESEARCH GROUP

msrg.org

Predictive Publish/Subscribe Matching

Joint work with Vinod Muthusamy & Haifeng Liu

University of Toronto

P-ToPSS project

Hans-Arno Jacobsen

Little Anecdote

Date: Mon, 14 Sep … 10:37:26 -0400
From: "security@noc ..."
To: …
Cc: … CNS Security Admin
Subject: DDoS attack originating from …

/var/log/secure* & LogWatch

aaron/password from 211.43.206.53: …
abdullah/password from 211.43.206.53:
abraham/password from 211.43.206.53:
abram/password from 211.43.206.53:
account/password from 142.150.237.133:
account/password from 211.43.206.53:
adam/password from 211.43.206.53:
addison/password from 211.43.206.53:
aditya/password from 211.43.206.53:
admin/password from 142.150.237.133: 18 Time(s)
admin/password from 211.43.206.53: 18 Time(s)
administrator/password from 142.150.237.133: 3 Time(s)
administrator/password from 211.43.206.53: 3 Time(s)
jacobsen/password from 191.43.206.53: 2 Time(s)

And So It Happened: Post-mortem forensics via events across different logs

… denied
John 211.43.206.53 successful timestamp…
John logoff timestamp…
John 190.35.106.46 successful timestamp…
John password changed


Had set user john with password john!

Predictive Analytics?

• Series of failed login attempts from the same IP
– System is under attack

• Series of failed login attempts from the same IP, followed by a successful login from that IP, followed by an immediate logoff
– System compromised

• Could we predict, with a certain probability, that the system is about to be compromised after observing only a partial match of the above pattern?
– E.g., "failed logins from IP, successful login from IP"

Compromised? Compromised? Compromised?

Events, Subscriptions & Publish/Subscribe

• Here, events are
– Login attempts, logoffs, system compromised

• Here, subscriptions are
– Specific patterns of interest, such as:
– Series of login attempts from the same IP
– Series of login attempts from the same IP, followed by a logoff

• The publish/subscribe system is the abstraction that matches subscriptions against the observed events

• A match detects the event of interest, e.g., system compromised

Outline

• Predictive Toronto Publish/Subscribe System

• Event & subscription language model

• Matching with P-ToPSS

• Predicting with P-ToPSS

• Evaluation

P-ToPSS Is the Latest ToPSS Member

• For many applications, raising an alert after malicious activity has occurred is too late:
– Credit card fraud (fraud committed)
– Network intrusion (system compromised)
– Problem determination (problem occurred)
– Root-cause analysis (system crashed, poor user experience)

• What is needed is the capability to predict the probability that a given subscription will match in the future.

• P-ToPSS computes the probability that a subscription will match based on the event history and on the partial matches observed so far.

P-ToPSS Model

[Figure: the Match Engine reports full matches (e.g., "cs2 is matched", "cs1 is fully matched") and predictions (e.g., "cs1 will be matched with probability 0.5", "cs4 will be matched with probability 0.75", "cs1 will be matched with probability 0.8")]

Publish/subscribe matching problem
• Find all matches

Publish/subscribe prediction problem
• Find partial matches
• Determine subscriptions with matching probability > threshold

Event Model

An event: e = {(a1, v1), (a2, v2), …, (an, vn)}

Event stream: {e1, e2, …, ek, …}

Events are ordered by system timestamps

Subscription Language Model

• Primitive subscriptions
– S = p1 ∧ p2 ∧ p3 ∧ …
– pi is a Boolean predicate

• Composite subscriptions
– CS = R(S1, S2, S3, …, Sm)

• R: operators
– Temporal operators:
• , : contiguous sequence
• ; : non-contiguous sequence
• @ : explicit temporal operator
– Boolean operators:
• ∧ : conjunction
• ∨ : disjunction

• Contiguous event sequence: no event can be skipped

• Non-contiguous event sequence: events can be skipped

Example

s1: ip=$x ∧ login=denied

s2: ip=$x ∧ login=denied

s3: ip=$x ∧ login=success

s4: ip=$x ∧ login=success

s5: ip=$x ∧ action=passwd

s6: ip=$x ∧ action=logoff


cs_intrusion = s1; ( ( s2;s3@(t3-t2<d) ) (s4,s5) );s6

cs_intrusion matched by {e0, e1}, e2, e3, e4
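All primitive subscriptions in the example share the variable $x, which forces every matched event to come from the same IP. A minimal Python sketch of primitive-subscription matching with such a variable, assuming events are dicts of attribute-value pairs and a primitive subscription is a list of (attribute, operator, value) predicates combined by conjunction. The function `match_primitive` and this encoding are hypothetical illustrations, not the P-ToPSS interface.

```python
import operator
from typing import Any, Optional

# Hypothetical encoding: predicates are (attribute, operator, value) triples that
# must all hold (conjunction). Values starting with "$" are variables.
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}

def match_primitive(sub, event: dict[str, Any],
                    bindings: Optional[dict[str, Any]] = None) -> Optional[dict[str, Any]]:
    """Return the (possibly extended) variable bindings if `event` satisfies
    every predicate of `sub`, or None if some predicate fails."""
    env = dict(bindings or {})
    for attr, op, value in sub:
        if attr not in event:
            return None                     # missing attribute: predicate is false
        if isinstance(value, str) and value.startswith("$"):
            if value not in env:            # first occurrence binds the variable
                if op != "=":
                    raise ValueError("this sketch binds variables only with '='")
                env[value] = event[attr]
                continue
            value = env[value]              # later occurrences must agree with the binding
        if not OPS[op](event[attr], value):
            return None
    return env

s1 = [("ip", "=", "$x"), ("login", "=", "denied")]
s3 = [("ip", "=", "$x"), ("login", "=", "success")]

e0 = {"ip": "211.43.206.53", "login": "denied"}
e2 = {"ip": "211.43.206.53", "login": "success"}
e_other = {"ip": "142.150.237.133", "login": "success"}

env = match_primitive(s1, e0)
print(env)                                  # {'$x': '211.43.206.53'}
print(match_primitive(s3, e2, env))         # same IP: bindings returned
print(match_primitive(s3, e_other, env))    # different IP: None
```

Carrying the bindings from one primitive match to the next is what lets a sequence such as s1; …; s6 refer to "the same IP" across events.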

Problem Statement

• Matching problem
– Given a set of composite subscriptions CS and an event stream {ei}, find all cs = R(s1, s2, …, sn) ∈ CS such that there exists {ej1, ej2, …, ejn} ⊆ {ei} where ej1 matches s1, …, ejn matches sn, subject to R, and all time constraints are satisfied.

• Prediction problem
– Find all partially matched cs such that

Pr_cs(full match | partial match) > θ_cs

Required Matching Tasks

• Composite subscription: s1; ( (s2;s3@(t3-t1<d) ) (s4,s5) );s6

• Primitive subscriptions, like si, matching single events (i.e., sets of attribute-value pairs)

• Sequences of primitive subscriptions matching consecutive and non-consecutive events in the input

• Boolean expressions, like the combination of term1 = (s2;s3@(t3-t1<d)) and term2 = (s4,s5) above, matching higher-level patterns of events

• Computation of probabilities to predict full matches given partial matches

Matching Engine

[Figure: matching engine architecture with four components: Primitive Subscriptions Matcher, State Machine Engine, Boolean Expression Tree Matcher, and Prediction Engine. Labeled flows: event stream, primitive subscription matches, derived events, partial matches, full matches, and predictions (subscription, matching probability > θS). Running example: s1; ( (s2;s3@(t3-t1<d))(s4,s5) );s6, with sub-expressions term1, term2, s2;s3, and s3.]

Algorithms for Matching Tasks

• Primitive Subscription Matcher
– BDD-based approach (our ICDCS'05 algorithm)
– Alternatively, our SIGMOD'01 algorithm or our new indEX (fastest Boolean Expression Index in the market)

• Boolean Expression Tree Matcher (state-based)
– Extension of the Rete algorithm as an in-memory event processing network (Forgy, 1982)
– For extensions & implementation, see our PADRES code base at padres.msrg.org

Algorithms for Matching Tasks

• State Machine Engine
– Based on evaluating finite state machines (FSMs)
– Combined with techniques that merge states to amortize the processing of similar subscriptions
– Combined with algorithms and data structures that track time conditions

• Prediction Engine
– Based on training and evaluating a Markov model
• Trained on past events
• Evaluated over the event stream

State Machine Engine

• State machine creation
• State machine evaluation

Example: F, F, F @(tN3-tN1<d), S

We abstract for ease of presentation:
• F represents a primitive subscription that evaluates to true for a failed login
• S represents a primitive subscription that evaluates to true for a successful login
• The index in a time constraint refers to a position (state) in the subscription (FSM)

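[Figure: FSM for F, F, F @(tN3-tN1<d), S with states N0, N1 (F), N2 (F,F), N3 (F,F,F), and N4 (F,F,F,S); transitions labeled F, F, F, and S guarded by @(tS3-tS1<d). Each state records t, the time of the most recent transition into the state.]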

• The explicit temporal operator is treated as another predicate, evaluated over the transition times tracked for all states

Contiguous sequence operator

[Figure: evaluation of S = F, F, F @(tN3-tN1<d), S over an event stream of F events at times t1, t2, t3, showing the current states. At t1: N1 (F) with t1. At t2: N2 (F,F) with t2 and N1 (F) with t2. At t3: N3 (F,F,F) with t3, N2 (F,F) with t3, and N1 (F) with t3. The transition out of N3 is guarded by @(tS3-tS1<d).]

Contiguous sequence operator
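The walkthrough keeps one FSM instance per candidate starting event: after three F events, instances sit in N3, N2, and N1 at the same time. A minimal Python sketch of that behaviour, assuming events arrive as (timestamp, symbol) pairs already reduced to the abstract symbols F and S, with X standing for any other event. The helper `match_contiguous` is hypothetical, and it checks the temporal constraint only on the final S transition.

```python
# Hypothetical sketch (not the P-ToPSS implementation) of the contiguous sequence
#   F, F, F @(tN3 - tN1 < d), S
# Each partial match (FSM instance) stores its transition times [tN1, tN2, ...].
PATTERN = ["F", "F", "F", "S"]

def match_contiguous(events, d, pattern=PATTERN):
    """events: (timestamp, symbol) pairs in timestamp order.
    Returns the transition-time lists of all full matches."""
    active, matches = [], []
    for t, sym in events:
        survivors = []
        for times in active:
            state = len(times)               # instance is currently in N_state
            if pattern[state] != sym:
                continue                     # contiguous: a non-matching event kills the instance
            new = times + [t]
            if len(new) < len(pattern):
                survivors.append(new)
            elif new[2] - new[0] < d:        # explicit temporal operator: tN3 - tN1 < d
                matches.append(new)
        if sym == pattern[0]:
            survivors.append([t])            # every F may also start a new instance in N1
        active = survivors
    return matches

stream = [(1, "F"), (2, "F"), (3, "X"), (4, "F"), (5, "F"), (6, "F"), (7, "S")]
print(match_contiguous(stream, d=5))   # [[4, 5, 6, 7]]
print(match_contiguous(stream, d=2))   # []
```

The X at time 3 discards the instances started at times 1 and 2, which is exactly the "no event can be skipped" rule; with d=2 the surviving instance fails the temporal check (6 - 4 is not < 2).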

Example: F; S1; F; S2@(tS2-tS1<T)

[Figure: FSM with states N0, N1 (F), N2 (F;S), N3 (F;S;F), and N4 (F;S;F;S@T); transitions labeled F, S, F, and S@T]

• Events that do not contribute to matching a subscription are allowed to occur (the FSM must remain in the current state; achieved via self-links)

• Upon a match of the next primitive subscription:
– Time conditions, if any, are checked
– Transition times are updated

• Transition times are tracked only for primary & secondary links

Non-contiguous sequence operator

[Figure: link labels F, *, S, *, F, * on states N1, N2, and N3]

• Primary link: first transition into the state
• Secondary link: continued matching of the primitive subscription that led to the transition into this state
• Self link: triggered for every event except those that trigger primary & secondary links
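A single-instance Python sketch of the three link types for F; S1; F; S2@(tS2-tS1<T), where S1 and S2 denote the first and second S. It rests on two assumptions not spelled out in the slides: a secondary link overwrites the stored time of the most recent transition into the current state, and a failed time condition leaves the instance where it is. The function `run_noncontiguous` and the tuple encoding are hypothetical.

```python
# Hypothetical sketch (not the P-ToPSS implementation): one FSM instance for
#   F; S; F; S@(tS2 - tS1 < T)
# State k means the first k primitive subscriptions have matched, and
# last[k] is the time of the most recent transition into state N_k.
PATTERN = ("F", "S", "F", "S")

def run_noncontiguous(events, T, pattern=PATTERN):
    """Return (reached_final_state, transition_times)."""
    final = len(pattern)
    state, last = 0, {}
    for t, sym in events:
        if sym == pattern[state]:
            # Primary link N_state -> N_state+1. The last step carries the explicit
            # temporal operator; tS1 is the time stored for N2 (entered via the first S).
            if state + 1 == final and not (t - last[2] < T):
                continue                  # condition fails: stay, as on a self link
            state += 1
            last[state] = t
            if state == final:
                return True, last
        elif state > 0 and sym == pattern[state - 1]:
            last[state] = t               # secondary link: same step matched again, refresh time
        # any other event takes the self link: state and times unchanged
    return False, last

stream = [(1, "F"), (2, "S"), (3, "S"), (4, "X"), (5, "F"), (6, "S")]
print(run_noncontiguous(stream, T=4))                   # (True, {1: 1, 2: 3, 3: 5, 4: 6})
print(run_noncontiguous(stream[:2] + stream[3:], T=4))  # (False, {1: 1, 2: 2, 3: 5})
```

In the first run, the second S at time 3 takes the secondary link and refreshes tS1, so the final check 6 - 3 < 4 succeeds; without that event, 6 - 2 < 4 fails and the instance stays in N3.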

Example: S1; S2 @T1; S3 @T2 @T3

T1: (tS2 - tS1 < 3)
T2: (tS3 - tS1 < 6)
T3: (tS3 - tS2 > 3)

[Figure: FSM with states N0, N1 (S1), N2 (S1;S2), and N3 (S1;S2;S3); forward transitions S1, S2 @T1, and S3 @T2 @T3; self links labeled not(S1), not(S2), S2 ∧ not(T1), not(S3), and S3 ∧ (not(T2) ∨ not(T3)). Event stream over time: S1 at t1, t2, t3; S2 at t4, t5, t6; S3 at t7, t8.]

Transition times tracked per state:
• Time(S1): S1:t1, S1:t2, S1:t3
• Time(S2): S2:t4 with Tc(S1) = {t2, t3}; S2:t5 with Tc(S1) = {t3}
• Time(S3): S3:t8 with Tc(S2) = {t4} and Tc(S1) = {t3}
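To check the candidate-time sets above, here is a brute-force Python sketch that keeps every compatible partial match. It assumes integer timestamps t_k = k, which the slide does not specify; with that choice it reproduces Tc(S1) = {t2, t3} for S2 at t4, Tc(S1) = {t3} for S2 at t5, and the single full match via t3, t4, t8. The function `match_sequence` and the step encoding are hypothetical; it enumerates tuples instead of tracking times inside merged state machines.

```python
# Hypothetical sketch (not the P-ToPSS implementation) of
#   S1; S2 @T1; S3 @T2 @T3
# with T1: tS2 - tS1 < 3, T2: tS3 - tS1 < 6, T3: tS3 - tS2 > 3.
# A partial match is the tuple of its transition times (tS1, tS2, ...).
STEPS = [
    ("S1", lambda ts, t: True),
    ("S2", lambda ts, t: t - ts[0] < 3),                    # T1
    ("S3", lambda ts, t: t - ts[0] < 6 and t - ts[1] > 3),  # T2 and T3
]

def match_sequence(events, steps=STEPS):
    """Return (full matches, partial matches) for a non-contiguous sequence."""
    partial, full = [()], []          # () is the empty match in the initial state N0
    for t, sym in events:
        extended = []
        for ts in partial:
            name, cond = steps[len(ts)]
            if sym == name and cond(ts, t):   # forward transition with its time conditions
                new = ts + (t,)
                (full if len(new) == len(steps) else extended).append(new)
        partial += extended           # old matches stay put: self links skip events
    return full, partial

# Assumed timestamps t_k = k for the event stream on the slide.
stream = [(1, "S1"), (2, "S1"), (3, "S1"), (4, "S2"), (5, "S2"),
          (6, "S2"), (7, "S3"), (8, "S3")]
full, partial = match_sequence(stream)
print([p for p in partial if len(p) == 2])  # [(2, 4), (3, 4), (3, 5)]
print(full)                                 # [(3, 4, 8)]
```

S3 at t7 yields no match because 7 - 4 and 7 - 5 both fail T3. Since partial matches are never discarded here, `partial` grows with the stream, which illustrates why long-lived partial matches call for the garbage collection scheme mentioned in the evaluation.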

Merging State Machines

Two states N1 and N2 are equivalent iff:
1. N1 and N2 have the same number of incoming transitions.
2. All incoming transitions arrive from equivalent states and are triggered by the same set of events.

Initial states are equivalent.

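[Figure: three separate FSMs, N0 →a→ N1 (a) →b→ N2 (a;b) →c→ N3 (a;b,c), N0 →a→ N1 (a) →b→ N2 (a;b) →d→ N3 (a;b,d), and N0 →a→ N1 (a), with self links labeled *. In the merged FSM, M0 →a→ M1 (a) →b→ M2 (a;b), which branches via c to M3 (a;b,c) and via d to M4 (a;b,d); the single-event subscription keeps a separate state N5 (a), also reached via a.]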

Merged:

• a; b; c
• a; b; d
• a
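A minimal Python sketch of merging sequence subscriptions under the two equivalence conditions. It assumes a simplified setting where every state has a single predecessor, so the conditions reduce to an equivalence key of (predecessor state, triggering event, has self link). The self link, present when a ";" follows the event, adds one incoming transition, which is why the state for the subscription a stays separate from M1. The functions `parse` and `merge_fsms` are hypothetical.

```python
import re

def parse(sub):
    """'a; b; c' -> [('a', ';'), ('b', ';'), ('c', '')]: each event with the operator after it."""
    return re.findall(r"(\w+)\s*([;,]?)", sub)

def merge_fsms(subs):
    """Build one FSM for several sequence subscriptions, sharing equivalent states.
    Hypothetical equivalence key: (predecessor state, triggering event, has self link)."""
    state_of = {}                 # equivalence key -> state id
    final_of = {}                 # subscription -> its final state
    n_states = 1                  # state 0 is the shared initial state
    for sub in subs:
        cur = 0
        for sym, op in parse(sub):
            # A ';' after sym means the state waits non-contiguously and gets a self link,
            # i.e., one more incoming transition than an otherwise identical state.
            key = (cur, sym, op == ";")
            if key not in state_of:
                state_of[key] = n_states
                n_states += 1
            cur = state_of[key]
        final_of[sub] = cur
    return n_states, final_of

subs = ["a; b; c", "a; b; d", "a"]
unmerged = sum(len(parse(s)) + 1 for s in subs)   # each FSM has its own initial state
merged, finals = merge_fsms(subs)
print(unmerged, merged)   # 10 6
print(finals)             # {'a; b; c': 3, 'a; b; d': 4, 'a': 5}
```

The six merged states correspond to M0 to M4 plus N5 in the figure. A full implementation would need a general equivalence check for states with several predecessors; the key used here covers only chains.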

Markov Model for Prediction

• FSMs record incremental matches of subscriptions
• The probability of transitioning to the next state for a given event depends only on the current state
• Hence, our FSMs are Markov processes
• Our prediction algorithm uses the properties of Markov processes to predict future matches based on the current state and the event history
– Probability of reaching the final state in n events
– Probability of reaching the final state in the next 1, 2, 3, …, n events
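One standard way to obtain these probabilities from a Markov chain, assuming the final state is absorbing: the probability of having reached the final state within k events is the (current, final) entry of the k-th power of the transition matrix. The three-state chain and the threshold below are made-up illustration values, and `prob_final_within` is a hypothetical helper that shows the textbook computation rather than the internals of the P-ToPSS prediction engine.

```python
def mat_mul(A, B):
    """Plain-Python matrix product (matrices as lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def prob_final_within(P, current, final, n):
    """[Pr(final reached within 1 event), ..., within n events] from state `current`.
    Assumes the final state is absorbing (P[final][final] == 1), so (P^k)[current][final]
    is the probability of having reached it within k steps."""
    probs, Pk = [], P
    for _ in range(n):
        probs.append(Pk[current][final])
        Pk = mat_mul(Pk, P)
    return probs

# Hypothetical 3-state chain N0 -> N1 -> N2 (final, absorbing).
P = [[0.5, 0.5, 0.0],
     [0.2, 0.4, 0.4],
     [0.0, 0.0, 1.0]]
probs = prob_final_within(P, current=1, final=2, n=3)
theta = 0.5                                  # hypothetical prediction threshold θ
print([round(p, 3) for p in probs])          # [0.4, 0.56, 0.664]
print([p > theta for p in probs])            # [False, True, True]
```

A subscription whose current partial match sits in N1 would thus be reported as "will match" for a look-ahead of two or more events, but not for one.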

Prediction & Training

• Compute the long-run transition probability of reaching a given state
• Based on the input (event history), we count the number of times each transition is taken
• Based on these counters, we compute the transition probabilities of the model
• Complete Markov chain with a finite state space
• Transition probability from state i to j:
– pij = Pr(Xn+1 = j | Xn = i)
– Conditional probability of transitioning to j given i
– pij = (# times transition i→j is taken) / (# all transitions taken from state i)
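A sketch of the counting step, assuming the training input is a set of observed FSM state sequences and that each count is normalized by the number of transitions leaving state i, as the conditional definition pij = Pr(Xn+1 = j | Xn = i) requires. The function name and the state traces are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_transition_probs(paths):
    """Maximum-likelihood estimate p_ij = count(i -> j) / count(i -> any)
    from observed state sequences (one list of states per training run)."""
    counts = defaultdict(Counter)
    for path in paths:
        for i, j in zip(path, path[1:]):      # consecutive states form one transition
            counts[i][j] += 1
    return {i: {j: c / sum(row.values()) for j, c in row.items()}
            for i, row in counts.items()}

# Hypothetical FSM state traces recorded while replaying past events.
history = [["N0", "N0", "N1", "N2"],
           ["N0", "N1", "N1", "N0"]]
p = estimate_transition_probs(history)
print(p["N0"])   # {'N0': 0.3333333333333333, 'N1': 0.6666666666666666}
```

N0 → N0 is taken once and N0 → N1 twice, so the row for N0 sums to one; the resulting table can be arranged as the transition matrix used for prediction.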

Experiments

• Synthetic workload

• Real data set

Effect of Number of Subscriptions

[Figure: number of states and average matching time per event vs. number of subscriptions; workloads: Gaussian and Uniform; annotations: more sharing, less sharing]

• Merging reduces the number of states by up to 30% for the given data set
• The number of states increases linearly with the number of subscriptions
• More states are required for workloads with less state-sharing potential

• Matching time increases with the number of subscriptions
• More sharing requires more processing, as a given event may trigger more transitions

Effect of Number of Non-contiguous Operators

• Matching time increases with the number of non-contiguous operators

• More and more subscription instances remain partially matched, waiting for events

• This calls for a garbage collection scheme

[Figure: average matching time per event]

Experiments on Synthetic Workload

• Precision decreases as the look-ahead increases
• Precision increases as the prediction threshold increases and stabilizes for large thresholds

Expert Model (full) vs. Learned Model

[Figure: precision of the full model (about 1400 states) vs. the learned model (5 states)]

Precision is defined as true positives / all predictions.
Result: with increasing look-ahead, the learned model yields higher precision.

Conclusions

• P-ToPSS is a new publish/subscribe model for event stream processing
• Predicts the probability that a subscription will match in the future
• Performs traditional publish/subscribe matching
• Supports state-based, temporal, and Boolean operators over predicates (complex subscriptions)
• Based on Markov chains for prediction
• In our experiments, the prediction performance of the learned model is better than that of the hand-crafted model
