Efficient Filtering in Publish-Subscribe Systems using BDD

Alexis Campailla, Sagar Chaki, Edmund Clarke, Somesh Jha, Helmut Veith

Prepared by Nabeel Mohamed, 4/16/08

DESCRIPTION

Slides prepared based on the paper Efficient Filtering in Publish-Subscribe Systems using BDD by Alexis Campailla, Sagar Chaki, Edmund Clarke, Somesh Jha, and Helmut Veith

TRANSCRIPT

Page 1: Efficient Filtering in Pub-Sub Systems using BDD

Efficient Filtering in Publish-Subscribe Systems using BDD

Alexis Campailla, Sagar Chaki, Edmund Clarke, Somesh Jha, Helmut Veith

Prepared by Nabeel Mohamed

4/16/08

Page 2: Efficient Filtering in Pub-Sub Systems using BDD

Outline

Research problem at hand

Content-based Publish-Subscribe

Subscription Query Language

BDD Semantics

BDD Based matching

Experimental Results

Discussion (Pros and Cons)


Page 3: Efficient Filtering in Pub-Sub Systems using BDD

Research Problem at Hand

Loosely coupled interactions in publish-subscribe systems allow very large-scale systems to be built

However, the filtering techniques used are a major bottleneck

The efficiency of the filtering technique plays a major role in scalability

Whatever technique we use should be provably correct

Page 4: Efficient Filtering in Pub-Sub Systems using BDD

Major Contributions

A precise semantics for matching messages (events) to subscriptions (subscription queries)

Modeling filtering as a satisfiability check on BDDs


Page 6: Efficient Filtering in Pub-Sub Systems using BDD

Publish-Subscribe Systems

[Figure: publishers publish messages to a network of distributed content routers, which performs distributed subscription management and routing; subscribers call Subscribe() and Unsubscribe() and receive messages through Notify().]

Page 7: Efficient Filtering in Pub-Sub Systems using BDD

Publish-Subscribe Systems

Publishers and Subscribers are

loosely coupled

◦ Space decoupled

◦ Time decoupled

◦ Synchronization decoupled

Content routers (brokers) form a

structured p2p system

Scalable Systems


Page 8: Efficient Filtering in Pub-Sub Systems using BDD

Message (Event) Filtering

Filtering

◦ Matching incoming messages (events) generated by publishers against subscription criteria

◦ A main task of content routers (brokers): the filtering engine

Content-based pub-sub systems route messages (events) based on the content itself

Example: Filter quotes with symbol = Google and offer price < 400 in a financial ticker.
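A minimal sketch of the filter in the ticker example, assuming a quote is a plain dictionary. The field names `symbol` and `offer_price` and the sample quotes are hypothetical and do not come from the slides.

```python
# Hypothetical quote representation: a dict with 'symbol' and 'offer_price'.
def matches(quote):
    """Content-based filter: symbol = Google and offer price < 400."""
    price = quote.get("offer_price", float("inf"))  # a missing price never matches
    return quote.get("symbol") == "Google" and price < 400

quotes = [
    {"symbol": "Google", "offer_price": 395.0},
    {"symbol": "Google", "offer_price": 410.0},
    {"symbol": "IBM", "offer_price": 120.0},
]

# Only quotes whose content satisfies the subscription are delivered.
delivered = [q for q in quotes if matches(q)]
```

The routing decision depends only on the content of each quote, not on a topic or channel name, which is what distinguishes content-based filtering.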

Page 9: Efficient Filtering in Pub-Sub Systems using BDD

Example Pub-Sub Systems

Stock market feeds

◦ For delivery of financial data such as stock quotes, trade reports, news, etc. to customers

◦ OPRA feed disseminates more than 100,000 quotes/sec

Sensor networks

Network traffic analysis

Transaction log analysis

Page 10: Efficient Filtering in Pub-Sub Systems using BDD

Desirable Functions of a Filtering Engine

Correctness:

◦ Correctly matching incoming messages with subscription criteria

Expressiveness:

◦ Rich subscription language

Efficiency:

◦ Real-time matching

Scalability:

◦ Handling a large number of subscriptions

Dynamic:

◦ Capability to add and remove subscriptions online

Page 11: Efficient Filtering in Pub-Sub Systems using BDD

Related Work

Most existing systems support only conjunctive subscriptions

◦ GRYPHON

◦ SIENA

◦ Le Subscribe

Example: A subscription that needs 27 GRYPHON-like conjunctive subscriptions can be handled naturally by a single BDD.
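A figure such as 27 arises, for instance, when a subscription is a conjunction of three disjunctions with three alternatives each. That shape and the atom names below are assumptions made for illustration, not necessarily the subscription used on the slide. The sketch expands such a subscription into the conjunction-only subscriptions that a conjunctive system would need.

```python
from itertools import product

# Hypothetical subscription: (a1 or a2 or a3) and (b1 or b2 or b3) and (c1 or c2 or c3).
clauses = [
    ["a1", "a2", "a3"],
    ["b1", "b2", "b3"],
    ["c1", "c2", "c3"],
]

# A conjunction-only system needs one subscription per way of picking
# one atom from every clause (the disjunctive normal form).
conjunctive_subscriptions = [tuple(choice) for choice in product(*clauses)]
count = len(conjunctive_subscriptions)  # 3 * 3 * 3
```

A BDD over the same nine atoms keeps the Boolean structure as it is, so no such expansion is required.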

Page 12: Efficient Filtering in Pub-Sub Systems using BDD

Related Work

Some systems have higher expressive power at the expense of less efficient filtering.

◦ ELVIN

Can we come up with an efficient filtering technique while providing an expressive subscription language?

BDD-based filtering may be employed in existing systems to improve matching efficiency


Page 14: Efficient Filtering in Pub-Sub Systems using BDD

Subscription Query Language

The language used to describe subscription criteria or subscriptions

Three subscription languages of increasing complexity

◦ SiSL – Simple Subscription Language

◦ StSL – Strict Subscription Language

◦ DeSL – Default Subscription Language

Page 15: Efficient Filtering in Pub-Sub Systems using BDD

Messages and Attributes

V = <v1, …, vn> = a finite sequence of attributes

Each attribute vi has a type

Each attribute vi has a corresponding domain

Event schema = the sequence of the types of v1, …, vn

Page 16: Efficient Filtering in Pub-Sub Systems using BDD

Messages and Attributes

A message = an assignment of values to some (not necessarily all) of the attributes

Formally, a message is a mapping m such that for each attribute v, either m(v) is a value in the domain of v or m(v) = * (m does not define v)

A message is total if it defines all attributes in V.

Page 17: Efficient Filtering in Pub-Sub Systems using BDD

Messages and Attributes – Example 1

Let V = <company, product, price> over the event schema <STR, STR, DBL>

Consider the following message:

<company> IBM </company>

<product>PC AT, 20 Mhz, 256 KB RAM</product>

<price>5000</price>

This describes a total message m1 where m1(company) = “IBM”, m1(product) = “PC AT, 20 Mhz, 256 KB RAM” and m1(price) = 5000.

Page 18: Efficient Filtering in Pub-Sub Systems using BDD

Messages and Attributes – Example 2

Consider the following message:

<company> IBM </company>

<product>PC AT, 20 Mhz, 256 KB RAM</product>

This describes a different message m2 which is not total (i.e. partial), since m2(price) = *.
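A minimal sketch of total and partial messages over V = <company, product, price>, assuming a message is a dictionary and the undefined value * is stored as the string "*". The helper names `make_message` and `is_total` are hypothetical.

```python
UNDEFINED = "*"  # stand-in for the undefined value * used on the slides

V = ("company", "product", "price")

def make_message(**values):
    """Map every attribute in V to the given value, or to * if none is given."""
    return {v: values.get(v, UNDEFINED) for v in V}

def is_total(m):
    """A message is total if it defines all attributes in V."""
    return all(m[v] != UNDEFINED for v in V)

m1 = make_message(company="IBM", product="PC AT, 20 Mhz, 256 KB RAM", price=5000.0)
m2 = make_message(company="IBM", product="PC AT, 20 Mhz, 256 KB RAM")
```

Here `m1` is total, while `m2` is partial because `m2["price"]` is `"*"`.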

Page 19: Efficient Filtering in Pub-Sub Systems using BDD

Three Subscription Languages

SiSL – Simple Subscription Language

◦ All messages are total

StSL – Strict Subscription Language

◦ Messages define all attributes that occur in the query (subscription criteria)

◦ SiSL is a subset of StSL

DeSL – Default Subscription Language

◦ All attributes are initialized to default values (e.g. using NULL)

◦ Extends the functionality of SiSL to heterogeneous message formats

Page 20: Efficient Filtering in Pub-Sub Systems using BDD

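Formalizing SiSL Queries (Subscriptions)

Atomic formulas

Let v be an attribute in V

If v has a numeric type and c is a constant in the domain of v, then the formulas v = c, v < c, c < v are atomic formulas.

For the other numeric types, atomic formulas are defined similarly.

If v has type STR and s is a string constant, then string comparisons between v and s, including a substring test, are atomic formulas.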

Page 21: Efficient Filtering in Pub-Sub Systems using BDD

Formalizing SiSL Queries (Subscriptions)

Atoms = the set of atomic formulas

A query Q is a Boolean combination of atomic formulas

For a query Q, two sets are used: the set of attributes occurring in Q, and the set of atomic formulas occurring in Q

Page 22: Efficient Filtering in Pub-Sub Systems using BDD

Formalizing SiSL Queries (Subscriptions)

Abbreviations

Page 23: Efficient Filtering in Pub-Sub Systems using BDD

Example: SiSL Query

A SiSL query can match all messages for 1000 MHz PCs manufactured by IBM, Dell or Siemens that cost at most $1000.
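A minimal sketch of such a query, assuming queries are encoded as nested tuples and messages as dictionaries. The encoding, the operator names (`eq`, `lt`, `substr`), and the sample messages are hypothetical; "at most" is written as a disjunction of `<` and `=`.

```python
def evaluate(q, m):
    """Evaluate query q (nested tuples) on a total message m (a dict)."""
    op = q[0]
    if op == "and":
        return all(evaluate(s, m) for s in q[1:])
    if op == "or":
        return any(evaluate(s, m) for s in q[1:])
    if op == "not":
        return not evaluate(q[1], m)
    _, attr, const = q
    if op == "eq":
        return m[attr] == const
    if op == "lt":
        return m[attr] < const
    if op == "substr":  # const occurs as a substring of the attribute value
        return const in m[attr]
    raise ValueError(f"unknown operator: {op}")

query = (
    "and",
    ("or", ("eq", "company", "IBM"), ("eq", "company", "Dell"),
           ("eq", "company", "Siemens")),
    ("substr", "product", "1000 Mhz"),
    ("or", ("lt", "price", 1000), ("eq", "price", 1000)),  # price <= 1000
)

m_match = {"company": "Dell", "product": "PC, 1000 Mhz, 128 MB RAM", "price": 950.0}
m_miss = {"company": "Dell", "product": "PC, 1000 Mhz, 128 MB RAM", "price": 1200.0}
```

`evaluate(query, m_match)` is true and `evaluate(query, m_miss)` is false, because only the price atom differs between the two messages.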

Page 24: Efficient Filtering in Pub-Sub Systems using BDD

Formalizing SiSL Queries (Subscriptions)

Q[m] = the instantiation of a query Q by a message m.

Definition:

Q[m] is defined as the query obtained from Q by replacing all variables v for which m(v) ≠ * by m(v).

Definition:

The SiSL query Q matches the total message m if Q[m] evaluates to true.

Page 25: Efficient Filtering in Pub-Sub Systems using BDD

Formalizing StSL Queries (Subscriptions)

StSL (Strict Subscription Language) is a generalization of SiSL.

Definition: adequacy

A message m is adequate for a query Q if, for all attributes v occurring in Q, it holds that m(v) ≠ *.

Definition:

The query Q matches m iff m is adequate for Q and the instantiation of Q by m evaluates to true.
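A minimal sketch of the StSL definitions, applied directly to queries rather than to BDDs. It assumes queries are nested tuples with hypothetical operators `eq` and `lt`, and that the undefined value * is stored as the string "*". It illustrates the semantics only and is not the matching algorithm from the paper.

```python
UNDEFINED = "*"  # marker for an attribute the message does not define

def attributes(q):
    """Set of attributes occurring in query q."""
    if q[0] in ("and", "or", "not"):
        return set().union(*(attributes(s) for s in q[1:]))
    return {q[1]}

def is_adequate(m, q):
    """m is adequate for q if it defines every attribute occurring in q."""
    return all(m.get(v, UNDEFINED) != UNDEFINED for v in attributes(q))

def evaluate(q, m):
    op = q[0]
    if op == "and":
        return all(evaluate(s, m) for s in q[1:])
    if op == "or":
        return any(evaluate(s, m) for s in q[1:])
    if op == "not":
        return not evaluate(q[1], m)
    _, attr, const = q
    return m[attr] == const if op == "eq" else m[attr] < const  # "eq" or "lt"

def stsl_matches(q, m):
    """StSL: the message must be adequate AND satisfy the query."""
    return is_adequate(m, q) and evaluate(q, m)

q = ("or", ("eq", "company", "IBM"), ("lt", "price", 1000))
m_total = {"company": "IBM", "price": 5000.0}
m_partial = {"company": "IBM", "price": UNDEFINED}
```

`m_partial` satisfies the first disjunct but is rejected, because it leaves `price` undefined and is therefore not adequate for `q`.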

Page 26: Efficient Filtering in Pub-Sub Systems using BDD

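Formalizing DeSL Queries (Subscriptions)

DeSL (Default Subscription Language) is the most general of the three.

For each attribute vi, there is a default value di

Definition:

The default extension of m is the total message that agrees with m on every attribute m defines and assigns the default value di to every attribute vi with m(vi) = *.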

Page 27: Efficient Filtering in Pub-Sub Systems using BDD

Formalizing DeSL Queries (Subscriptions)

Definition:

The query Q matches the message m under default semantics if Q matches the default extension of m (i.e. Q instantiated by the default extension evaluates to true)
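A minimal sketch of the default extension, assuming messages are dictionaries and the undefined value * is stored as the string "*". The default values chosen here (empty strings and 0.0) are hypothetical.

```python
UNDEFINED = "*"
V = ("company", "product", "price")

# Hypothetical default values, one per attribute.
DEFAULTS = {"company": "", "product": "", "price": 0.0}

def default_extension(m):
    """Return a total message: keep defined values, fill the rest with defaults."""
    return {
        v: m[v] if m.get(v, UNDEFINED) != UNDEFINED else DEFAULTS[v]
        for v in V
    }

partial = {"company": "IBM", "price": UNDEFINED}
extended = default_extension(partial)
```

After this step every message is total, so matching under default semantics reduces to the total-message case.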


Page 29: Efficient Filtering in Pub-Sub Systems using BDD

BDDs (Binary Decision Diagrams)

Notations

A = a set of propositional variables

A linear ordering (variable ordering) on A

An ordered BDD over A, whose non-terminal nodes are labeled by variables in A and whose terminals are labeled by 0 or 1

The Boolean function represented by each node v in the BDD

Page 30: Efficient Filtering in Pub-Sub Systems using BDD

Properties of BDDs

Each non-terminal node v has two out-edges: a low edge and a high edge

Let a non-terminal node v with label ai have successors u and w at the low and high edges respectively. Then the function f_v represented by v satisfies f_v = (¬ai ∧ f_u) ∨ (ai ∧ f_w)

Size = # nodes in the BDD

Page 31: Efficient Filtering in Pub-Sub Systems using BDD

Example: BDD

The following BDD represents the Boolean function x AND (y OR z).

[Figure: BDD for x AND (y OR z) under a fixed variable ordering]
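A minimal sketch of a BDD for x AND (y OR z), assuming the variable ordering x < y < z; the ordering and the node names are assumptions. Each node's function is computed from its low and high successors, and the result is compared with the formula on all eight assignments.

```python
from itertools import product

# node name -> (label, low successor, high successor); terminals are 0 and 1.
# Assumed ordering: x < y < z.
NODES = {
    "nz": ("z", 0, 1),       # represents z
    "ny": ("y", "nz", 1),    # represents y OR z
    "nx": ("x", 0, "ny"),    # represents x AND (y OR z)
}

def f(node, assignment):
    """Function of a node: (not a and f(low)) or (a and f(high))."""
    if node in (0, 1):
        return bool(node)
    label, low, high = NODES[node]
    a = assignment[label]
    return (not a and f(low, assignment)) or (a and f(high, assignment))

agrees = all(
    f("nx", {"x": x, "y": y, "z": z}) == (x and (y or z))
    for x, y, z in product([False, True], repeat=3)
)
```

With this ordering the diagram has three non-terminal nodes, one per variable.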

Page 32: Efficient Filtering in Pub-Sub Systems using BDD

Shared BDDs (SBDDs)

While an OBDD represents one Boolean function, an SBDD represents multiple Boolean functions.

An SBDD is a collection of component OBDDs respecting the same variable ordering.

An SBDD has a set of output nodes Vo = {o1, …, on} corresponding to the Boolean functions <f1, …, fn> respectively.

Page 33: Efficient Filtering in Pub-Sub Systems using BDD

SBDDs

Every root node of a component OBDD belongs to Vo

Notation: a shared BDD is written as the BDD together with its output nodes {o1, …, on}

This shared BDD is polynomial-time computable from any other shared BDD over A for <f1, …, fn>

Page 34: Efficient Filtering in Pub-Sub Systems using BDD

Example: Shared BDD

[Figure: shared BDD in which nodes 1, 2 and 3 each represent a Boolean function]

Page 35: Efficient Filtering in Pub-Sub Systems using BDD

BDD Data Structure

A BDD with n nodes is represented as a graph whose vertices are the natural numbers 1, …, n.

The adjacency relationship is described by an array of size n.

ith element = (low[i], high[i], label[i], value[i])

◦ low[i] = low successor of i

◦ high[i] = high successor of i

◦ label[i] = label of i

◦ value[i] = used later to store the result of the BDD evaluation corresponding to i.
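A minimal sketch of this array layout for the function x AND (y OR z). How terminals are stored (no successors, no label, a preset value) and the unused index 0 are assumptions made so that vertices can be numbered 1, …, n.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    low: Optional[int]            # index of the low successor (None for terminals)
    high: Optional[int]           # index of the high successor (None for terminals)
    label: Optional[str]          # variable labelling the node (None for terminals)
    value: Optional[int] = None   # filled in during evaluation

# Index 0 is unused so that vertices are numbered 1..n.
bdd = [
    None,
    Node(None, None, None, 0),  # 1: terminal 0
    Node(None, None, None, 1),  # 2: terminal 1
    Node(1, 2, "z"),            # 3: z
    Node(3, 2, "y"),            # 4: y OR z
    Node(1, 4, "x"),            # 5: x AND (y OR z)
]

n = len(bdd) - 1
```

Storing successors as plain indices keeps the whole diagram in one contiguous array, which suits a single sequential pass over the nodes.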

Page 36: Efficient Filtering in Pub-Sub Systems using BDD

BDD Evaluation

The EvalBDD algorithm computes the value of each node in the BDD under a given truth assignment, where the ith propositional variable takes the value of the ith component of the assignment

Page 37: Efficient Filtering in Pub-Sub Systems using BDD

BDD Evaluation

Notice that we can compute the values of the Boolean functions associated with all output nodes in one pass.
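A minimal sketch of a one-pass evaluation over the array representation, assuming nodes are numbered so that every successor has a smaller index than its parent and that terminals carry a preset value. The function name `eval_bdd` and the example diagram are hypothetical; this is not claimed to be the authors' exact algorithm.

```python
def eval_bdd(low, high, label, value, assignment):
    """Fill value[i] for every node in one left-to-right pass.

    Assumes successors always have smaller indices than their parents,
    so value[succ] is already known when node i is visited.
    """
    for i in range(1, len(label)):
        if label[i] is None:  # terminal: value is preset to 0 or 1
            continue
        succ = high[i] if assignment[label[i]] else low[i]
        value[i] = value[succ]
    return value

# Shared BDD with two output nodes: node 4 = y OR z, node 5 = x AND (y OR z).
low = [None, None, None, 1, 3, 1]
high = [None, None, None, 2, 2, 4]
label = [None, None, None, "z", "y", "x"]
value = [None, 0, 1, None, None, None]

eval_bdd(low, high, label, value, {"x": False, "y": True, "z": False})
```

After the call, `value[4]` and `value[5]` hold the results for both output nodes, and the shared sub-diagram for y OR z has been evaluated only once.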

Page 38: Efficient Filtering in Pub-Sub Systems using BDD

BDD Restrictions

The idea is to restrict the possible truth assignments to those under which an external constraint f (a Boolean function over A) evaluates to true

Definition: f-restriction


Page 40: Efficient Filtering in Pub-Sub Systems using BDD

Query BDDs

Key Idea

◦ Represent many subscription queries by a single shared BDD whose nodes correspond to atomic sub-formulas of the queries.

◦ Messages are matched against queries by simply running EvalBDD on the shared BDD.

Page 41: Efficient Filtering in Pub-Sub Systems using BDD

Query BDDs

Q1, …, Qn, a sequence of queries over the set of attributes V

A = the set of atomic sub-formulas of the queries.

A set of propositional variables is introduced such that each atomic sub-formula a in A is assigned a propositional variable

Each query yields a Boolean query obtained by substituting each a with its propositional variable

Page 42: Efficient Filtering in Pub-Sub Systems using BDD

Example: Query BDDs

Let two subscriptions be received that together contain three distinct atomic sub-formulas

Three atomic sub-formulas => three propositional variables
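A minimal sketch of how distinct atomic sub-formulas can be mapped to propositional variables. The two subscriptions, the tuple encoding, and the variable names p1, p2, p3 are hypothetical and are not the ones shown on the slide.

```python
# Hypothetical subscriptions; atoms are (attribute, operator, constant) tuples.
q1 = ("and", ("company", "=", "IBM"), ("price", "<", 1000))
q2 = ("or", ("price", "<", 1000), ("product", "=", "PC"))

def atoms(q):
    """List the atomic sub-formulas of q in order of first occurrence."""
    if q[0] in ("and", "or", "not"):
        found = []
        for sub in q[1:]:
            for a in atoms(sub):
                if a not in found:
                    found.append(a)
        return found
    return [q]

def assign_variables(queries):
    """Give every distinct atom across all queries one propositional variable."""
    mapping = {}
    for q in queries:
        for a in atoms(q):
            mapping.setdefault(a, f"p{len(mapping) + 1}")
    return mapping

variables = assign_variables([q1, q2])
```

The atom `price < 1000` occurs in both subscriptions but receives a single variable, which is what lets the two queries share nodes in one BDD.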

Page 43: Efficient Filtering in Pub-Sub Systems using BDD

Example: Query BDDs

[Figure: SBDD corresponding to the queries under the chosen variable order]

Page 44: Efficient Filtering in Pub-Sub Systems using BDD

Query Matching: SiSL

Use the EvalBDD algorithm for query matching

A query Qi is considered matched if the BDD node corresponding to Qi evaluates to 1.

Bottom-up evaluation makes sure sub-queries are evaluated only once.
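A minimal end-to-end sketch of SiSL matching, assuming three hypothetical atoms and a hand-built shared BDD for Q1 = p1 AND p2 and Q2 = p2 OR p3 under the ordering p1 < p2 < p3. The atoms, queries, and node numbering are assumptions for illustration.

```python
# Hypothetical atomic sub-formulas, each tied to one propositional variable.
ATOMS = {
    "p1": lambda m: m["company"] == "IBM",
    "p2": lambda m: m["price"] < 1000,
    "p3": lambda m: "1000 Mhz" in m["product"],
}

# Hand-built shared BDD; successors always precede their parents.
# Index 0 is unused; nodes 1 and 2 are the terminals 0 and 1.
low = [None, None, None, 1, 3, 1, 1]
high = [None, None, None, 2, 2, 2, 5]
label = [None, None, None, "p3", "p2", "p2", "p1"]
OUTPUTS = {"Q1": 6, "Q2": 4}  # Q1 = p1 AND p2, Q2 = p2 OR p3

def match(message):
    """Return the queries whose output node evaluates to 1 for this message."""
    assignment = {p: test(message) for p, test in ATOMS.items()}
    value = [None, 0, 1] + [None] * (len(label) - 3)
    for i in range(3, len(label)):  # bottom-up, one pass
        value[i] = value[high[i] if assignment[label[i]] else low[i]]
    return [q for q, node in OUTPUTS.items() if value[node] == 1]

m_a = {"company": "Dell", "product": "PC, 1000 Mhz", "price": 1200.0}
m_b = {"company": "IBM", "product": "PC, 500 Mhz", "price": 900.0}
```

Each atom is tested once per message, and every node value is computed once, regardless of how many queries share it.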

Page 45: Efficient Filtering in Pub-Sub Systems using BDD

Query Matching: DeSL

Same as handling complete messages

When a message is received, it is extended to a total message before performing the matching.

Page 46: Efficient Filtering in Pub-Sub Systems using BDD

Query Matching: StSL

Recall that a message m matches a subscription Q iff m is adequate for Q and m satisfies Q.

A modified EvalBDD can be used to perform faster matching

Key Ideas

◦ An undefined atom renders all sub-formulas in which it occurs undefined.

◦ Treat * as a new value, undefined

Page 47: Efficient Filtering in Pub-Sub Systems using BDD

Query Matching: StSL

MVEvalBDD for StSL is significantly faster than EvalBDD for SiSL


Page 49: Efficient Filtering in Pub-Sub Systems using BDD

# Nodes in SBDD vs. # Subscriptions

The number of nodes scales almost linearly with the number of subscriptions

◦ High scalability

Restriction further reduces the node count, minimizing memory requirements

Page 50: Efficient Filtering in Pub-Sub Systems using BDD

Matching time for SiSL and StSL

Inputs: number of subscription queries and message density (how close the messages are to total)

Partial messages can be matched quickly.

[Figure: matching time for StSL queries]


Page 52: Efficient Filtering in Pub-Sub Systems using BDD

Variable Ordering vs. BDD size

Variable ordering has a tremendous influence on BDD size.
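A minimal sketch that counts the nodes of a reduced ordered BDD for the textbook function (x1 AND y1) OR (x2 AND y2) OR (x3 AND y3) under two orderings. The function `robdd_size` is a hypothetical helper that enumerates all assignments, so it is only suitable for small examples; the choice of function is an illustration and is not taken from the slides.

```python
def robdd_size(f, order):
    """Count nodes (terminals included) of the reduced ordered BDD of f."""
    unique = {}  # (variable, low id, high id) -> node id

    def build(i, env):
        if i == len(order):
            return int(f(env))  # terminal 0 or 1
        low = build(i + 1, {**env, order[i]: False})
        high = build(i + 1, {**env, order[i]: True})
        if low == high:  # redundant test: no node needed
            return low
        key = (order[i], low, high)
        if key not in unique:  # share isomorphic sub-diagrams
            unique[key] = len(unique) + 2  # ids 0 and 1 are the terminals
        return unique[key]

    build(0, {})
    terminals = 2 if unique else 1
    return len(unique) + terminals

def f(e):
    return (e["x1"] and e["y1"]) or (e["x2"] and e["y2"]) or (e["x3"] and e["y3"])

interleaved = robdd_size(f, ["x1", "y1", "x2", "y2", "x3", "y3"])
separated = robdd_size(f, ["x1", "x2", "x3", "y1", "y2", "y3"])
```

For this function the interleaved ordering gives 8 nodes and the separated ordering gives 16, and the gap widens quickly as more pairs are added.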

Page 53: Efficient Filtering in Pub-Sub Systems using BDD

Pros

Introduces a well-formed semantics to describe the matching process in publish-subscribe systems

Treating matching as a satisfiability check on an SBDD allows multiple subscriptions to be checked incrementally

Scalable

StSL is more efficient than SiSL

Page 54: Efficient Filtering in Pub-Sub Systems using BDD

Cons/Improvements

Does not describe any heuristics for selecting the variable ordering (finding an optimal ordering is NP-hard);

◦ Can we order based on the significance of the attributes involved?

Does not explore the possibility of eliminating redundancies due to semantically related atomic sub-formulas (e.g. price = 100 and price > 80) (again NP-hard)

◦ Can we further reduce the node count by exploiting the semantics without causing side effects?

The efficiency of matching is not compared with existing systems

Page 55: Efficient Filtering in Pub-Sub Systems using BDD

Conclusion

Two major contributions

◦ A precise semantics for matching messages to subscriptions

◦ Modeling filtering as a satisfiability check on BDDs

Page 56: Efficient Filtering in Pub-Sub Systems using BDD

Questions


Page 57: Efficient Filtering in Pub-Sub Systems using BDD

Thank You
