be-tree: an index structure to efficiently match boolean expressions over high...

65
Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions BE-Tree: An Index Structure to Efficiently Match B oolean E xpressions over High-dimensional Space Mohammad Sadoghi Hans-Arno Jacobsen University of Toronto June 15, 2011 Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 1 / 45

Upload: others

Post on 17-May-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

BE-Tree: An Index Structure to Efficiently MatchBoolean Expressions over High-dimensional Space

Mohammad Sadoghi Hans-Arno Jacobsen

University of Toronto

June 15, 2011

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 1 / 45

Page 2: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

1 Application Scenarios

2 Matching Problem

3 BE-Tree (Boolean Expression-Tree)

4 BE-Tree Adaptation Policies

5 Experimental Evaluation

6 Conclusions

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 2 / 45

Page 3: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Computational Advertising (A Billion-dollar Industry)

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 3 / 45

Page 4: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Computational Advertising (A Billion-dollar Industry)

Advertiser

Advertising Campaign

SonySears

AmazonAdvertisement (BE):age < 32credit-score > 630num-visits > 4price = 150

Advertiser

Subscriptions

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 4 / 45

Page 5: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Computational Advertising (A Billion-dollar Industry)

Broker

Advertiser

Online User

Advertising Campaign

car=BMWnum-visits=13

year=2008

model=X3

credit-score=647age=25

SonySears

AmazonAdvertisement (BE):age < 32credit-score > 630num-visits > 4price = 150

User Profiles

Clickstream

Advertiser

Subscriptions

Events Events

“BMW X3 2008”

price<235

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 5 / 45

Page 6: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Computational Advertising (A Billion-dollar Industry)

Broker

Advertiser

Online User

Advertising Campaign

Targeted Ads Targeted Ads

car=BMWnum-visits=13

year=2008

model=X3

credit-score=647age=25

SonySears

AmazonAdvertisement (BE):age < 32credit-score > 630num-visits > 4price = 150

User Profiles

Clickstream

Advertiser

Subscriptions

Events Events

“BMW X3 2008”

price<235

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 6 / 45

Page 7: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Application Scenarios

1 Computational advertising (targeted advertising)

2 Computational finance (algorithmic trading)

3 Intrusion detection (deep packet inspection)

4 Real-time data analysis (data analytics)

5 Emerging mobile applications in co-spaces (location-based services)

6 Approximate string matching (data cleansing)

Common Denominator

To continuously evaluate a set of predefined patterns/specifications(subscriptions) over incoming events.

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 7 / 45

Page 8: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Application Scenarios

1 Computational advertising (targeted advertising)

2 Computational finance (algorithmic trading)

3 Intrusion detection (deep packet inspection)

4 Real-time data analysis (data analytics)

5 Emerging mobile applications in co-spaces (location-based services)

6 Approximate string matching (data cleansing)

Common Denominator

To continuously evaluate a set of predefined patterns/specifications(subscriptions) over incoming events.

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 7 / 45

Page 9: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Challenges Derived from Application Scenarios

Key matching problem challenges addressed in this work

1 Enable an expressive matching semantics that supports constraints onboth the subscription and the event.

2 Handle subscriptions with expressive operators that impose conditionsonly on a small set of dimensions, which results in a high degree ofoverlap among subscriptions.

3 Scale to large collections of subscriptions with thousands ofdimensions.

4 Sustain high matching rates of events in presence of frequentinsertions and deletions of subscriptions.

5 Adapt to temporal changes of workload distribution (self-adjustingmechanism), i.e., avoid structure deterioration.

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 8 / 45

Page 10: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Challenges Derived from Application Scenarios

Key matching problem challenges addressed in this work

1 Enable an expressive matching semantics that supports constraints onboth the subscription and the event.

2 Handle subscriptions with expressive operators that impose conditionsonly on a small set of dimensions, which results in a high degree ofoverlap among subscriptions.

3 Scale to large collections of subscriptions with thousands ofdimensions.

4 Sustain high matching rates of events in presence of frequentinsertions and deletions of subscriptions.

5 Adapt to temporal changes of workload distribution (self-adjustingmechanism), i.e., avoid structure deterioration.

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 8 / 45

Page 11: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

1 Application Scenarios

2 Matching Problem

3 BE-Tree (Boolean Expression-Tree)

4 BE-Tree Adaptation Policies

5 Experimental Evaluation

6 Conclusions

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 9 / 45

Page 12: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Language and Data Model

Both subscription and event are defined as Boolean expressions which arecollections of predicates.

A predicate P is a triple consisting of an attribute uniquely representing a dimensionin n-dimensional space, an operator, and a set of values, denoted by P(attr,opt,val).A predicate P(x) either accepts or rejects an input x such thatP : x −→ {True, False}, where x ∈ Dom(Pattr).Each predicate supports relational operators (<, ≤, =, 6=, ≥, >), set operators (∈,/∈), or the SQL BETWEEN operator.Formally, a Boolean expression e is defined over an n-dimensional space as follows:

Definition

e ={

P(attr,opt,val)1 ∧ · · · ∧ P

(attr,opt,val)k

},

where k ≤ n; i , j ≤ k , Pattri = Pattr

j iff i = j

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 10 / 45

Page 13: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Language and Data Model

Both subscription and event are defined as Boolean expressions which arecollections of predicates.A predicate P is a triple consisting of an attribute uniquely representing a dimensionin n-dimensional space, an operator, and a set of values, denoted by P(attr,opt,val).

A predicate P(x) either accepts or rejects an input x such thatP : x −→ {True, False}, where x ∈ Dom(Pattr).Each predicate supports relational operators (<, ≤, =, 6=, ≥, >), set operators (∈,/∈), or the SQL BETWEEN operator.Formally, a Boolean expression e is defined over an n-dimensional space as follows:

Definition

e ={

P(attr,opt,val)1 ∧ · · · ∧ P

(attr,opt,val)k

},

where k ≤ n; i , j ≤ k , Pattri = Pattr

j iff i = j

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 10 / 45

Page 14: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Language and Data Model

Both subscription and event are defined as Boolean expressions which arecollections of predicates.A predicate P is a triple consisting of an attribute uniquely representing a dimensionin n-dimensional space, an operator, and a set of values, denoted by P(attr,opt,val).A predicate P(x) either accepts or rejects an input x such thatP : x −→ {True, False}, where x ∈ Dom(Pattr).

Each predicate supports relational operators (<, ≤, =, 6=, ≥, >), set operators (∈,/∈), or the SQL BETWEEN operator.Formally, a Boolean expression e is defined over an n-dimensional space as follows:

Definition

e ={

P(attr,opt,val)1 ∧ · · · ∧ P

(attr,opt,val)k

},

where k ≤ n; i , j ≤ k , Pattri = Pattr

j iff i = j

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 10 / 45

Page 15: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Language and Data Model

Both subscription and event are defined as Boolean expressions which arecollections of predicates.A predicate P is a triple consisting of an attribute uniquely representing a dimensionin n-dimensional space, an operator, and a set of values, denoted by P(attr,opt,val).A predicate P(x) either accepts or rejects an input x such thatP : x −→ {True, False}, where x ∈ Dom(Pattr).Each predicate supports relational operators (<, ≤, =, 6=, ≥, >), set operators (∈,/∈), or the SQL BETWEEN operator.

Formally, a Boolean expression e is defined over an n-dimensional space as follows:

Definition

e ={

P(attr,opt,val)1 ∧ · · · ∧ P

(attr,opt,val)k

},

where k ≤ n; i , j ≤ k , Pattri = Pattr

j iff i = j

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 10 / 45

Page 16: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Language and Data Model

Both subscription and event are defined as Boolean expressions which arecollections of predicates.A predicate P is a triple consisting of an attribute uniquely representing a dimensionin n-dimensional space, an operator, and a set of values, denoted by P(attr,opt,val).A predicate P(x) either accepts or rejects an input x such thatP : x −→ {True, False}, where x ∈ Dom(Pattr).Each predicate supports relational operators (<, ≤, =, 6=, ≥, >), set operators (∈,/∈), or the SQL BETWEEN operator.Formally, a Boolean expression e is defined over an n-dimensional space as follows:

Definition

e ={

P(attr,opt,val)1 ∧ · · · ∧ P

(attr,opt,val)k

},

where k ≤ n; i , j ≤ k , Pattri = Pattr

j iff i = j

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 10 / 45

Page 17: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Matching Semantics

Stabbing Subscription

Given an event e and a set of subscriptions S, find all subscriptions si ∈ Ssatisfied by e.

Definition

SQ(e) ={

si | ∀Pattr,opt,valq (x) ∈ si , ∃Pattr,opt,val

o (x) ∈ e,

Pattrq = Pattr

o ,∃x ∈ Dom(Pattrq ), Pq(x) ∧ Po(x)

} (1)

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 11 / 45

Page 18: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Matching Semantics

Stabbing Subscription

Given an event e and a set of subscriptions S, find all subscriptions si ∈ Ssatisfied by e.

Definition

SQ(e) ={

si | ∀Pattr,opt,valq (x) ∈ si , ∃Pattr,opt,val

o (x) ∈ e,

Pattrq = Pattr

o ,∃x ∈ Dom(Pattrq ), Pq(x) ∧ Po(x)

} (1)

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 11 / 45

Page 19: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Matching Semantics

Stabbing Subscription

Given an event e and a set of subscriptions S, find all subscriptions si ∈ Ssatisfied by e.

Definition

SQ(e) ={

si | ∀Pattr,opt,valq (x) ∈ si , ∃Pattr,opt,val

o (x) ∈ e,

Pattrq = Pattr

o ,∃x ∈ Dom(Pattrq ), Pq(x) ∧ Po(x)

} (1)

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 11 / 45

Page 20: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

1 Application Scenarios

2 Matching Problem

3 BE-Tree (Boolean Expression-Tree)

4 BE-Tree Adaptation Policies

5 Experimental Evaluation

6 Conclusions

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 12 / 45

Page 21: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Key Principles

Most Important Observation

To utilize the underlying high-dimensional, discrete, finite, and multi-valued attributespace that is assumed in many practical scenarios.

Most Important Design Feature

To systematically explore the space in two iterative phases of space partitioning andspace clustering (i) to cope with the curse of dimensionality (ii) to support dynamicinsertion and removal of subscriptions independent of their orders.

The two-phase space-cutting technique consists of

1 space partitioning: global structuring to determine the best splitting dimension2 space clustering: local structuring for each partition to determine the best

grouping of expressions with respect to the expressions’ range of values for eachdimension

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 13 / 45

Page 22: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Key Principles

Most Important Observation

To utilize the underlying high-dimensional, discrete, finite, and multi-valued attributespace that is assumed in many practical scenarios.

Most Important Design Feature

To systematically explore the space in two iterative phases of space partitioning andspace clustering (i) to cope with the curse of dimensionality (ii) to support dynamicinsertion and removal of subscriptions independent of their orders.

The two-phase space-cutting technique consists of

1 space partitioning: global structuring to determine the best splitting dimension2 space clustering: local structuring for each partition to determine the best

grouping of expressions with respect to the expressions’ range of values for eachdimension

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 13 / 45

Page 23: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Key Principles

Most Important Observation

To utilize the underlying high-dimensional, discrete, finite, and multi-valued attributespace that is assumed in many practical scenarios.

Most Important Design Feature

To systematically explore the space in two iterative phases of space partitioning andspace clustering (i) to cope with the curse of dimensionality (ii) to support dynamicinsertion and removal of subscriptions independent of their orders.

The two-phase space-cutting technique consists of

1 space partitioning: global structuring to determine the best splitting dimension2 space clustering: local structuring for each partition to determine the best

grouping of expressions with respect to the expressions’ range of values for eachdimension

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 13 / 45

Page 24: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

The Complete Structure

c-node

l-node

c-directory

p-node

p-directory

c-directory

p-directoryl-nodel-nodel-node

c-node c-node c-node c-node

p-node

p-node p-node

O(1)

O(logN)O(klogN)

Space Clustering

Space Partitioning

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 14 / 45

Page 25: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

The Complete Structure

c-node

l-node

c-directory

p-node

p-directory

c-directory

p-directoryl-nodel-nodel-node

c-node c-node c-node c-node

p-node

p-node p-node

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 15 / 45

Page 26: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

The Partition Directory

dim_i dim_j

c-node

l-node

p-node

p-directory

p-node

c-directory c-directory

Maintaining information for dimensions

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 16 / 45

Page 27: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

The Complete Structure

c-node

l-node

c-directory

p-node

p-directory

c-directory

p-directoryl-nodel-nodel-node

c-node c-node c-node c-node

p-node

p-node p-node

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 17 / 45

Page 28: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

The Cluster Directory

[1, N]

p-node

[1, N/2] [N/2, N]

[1,N/4] [N/4,N/2] [3N/4,N]

[1] [N/2]

c-node

l-node

c-directory

c-node

l-node l-node l-node l-node

c-node c-node c-node

buckets

Maintaining information for the range of values

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 18 / 45

Page 29: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Intuition Behind the Two-phase Space-cutting Technique

Required rules to avoid the cascading split problem and to ensure adeterministic property for altering between space partitioning and clustering.

1 Insertion rule: expression is always inserted into the smallest bucketthat encloses it.

2 Forced split rule: non-atomic bucket (i.e., a multi-valued, divisiblebucket) is always split before switching back to space partitioning.

3 Merge rule: underflowing leaf bucket is merged with its parent only ifthe parent bucket is not partitioned yet.

Invariance

Every expression always resides in the smallest bucket that encloses it, and anon-atomic leaf bucket is never partitioned.

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 19 / 45

Page 30: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Intuition Behind the Two-phase Space-cutting Technique

Required rules to avoid the cascading split problem and to ensure adeterministic property for altering between space partitioning and clustering.

1 Insertion rule: expression is always inserted into the smallest bucketthat encloses it.

2 Forced split rule: non-atomic bucket (i.e., a multi-valued, divisiblebucket) is always split before switching back to space partitioning.

3 Merge rule: underflowing leaf bucket is merged with its parent only ifthe parent bucket is not partitioned yet.

Invariance

Every expression always resides in the smallest bucket that encloses it, and anon-atomic leaf bucket is never partitioned.

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 19 / 45

Page 31: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Intuition Behind the Two-phase Space-cutting Technique

Required rules to avoid the cascading split problem and to ensure adeterministic property for altering between space partitioning and clustering.

1 Insertion rule: expression is always inserted into the smallest bucketthat encloses it.

2 Forced split rule: non-atomic bucket (i.e., a multi-valued, divisiblebucket) is always split before switching back to space partitioning.

3 Merge rule: underflowing leaf bucket is merged with its parent only ifthe parent bucket is not partitioned yet.

Invariance

Every expression always resides in the smallest bucket that encloses it, and anon-atomic leaf bucket is never partitioned.

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 19 / 45

Page 32: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

1 Application Scenarios

2 Matching Problem

3 BE-Tree (Boolean Expression-Tree)

4 BE-Tree Adaptation Policies

5 Experimental Evaluation

6 Conclusions

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 20 / 45

Page 33: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

BE-Tree Self-adjustment Mechanism

Ranking Objective

A novel ranking objective that directly reduces the matching cost as opposed to aranking that is biased towards either the least or the most popular dimension(s).

The matching cost consists of

1 minimizing false candidate computations: reduce the number of predicateevaluations before an unsatisfied predicate is reached, namely, penalizingmultiple search paths and penalizing paths that produce many false candidates(Loss function).

2 minimizing true candidate computations: promote the evaluation of thecommon predicate among matched expressions exactly once (Gain function).

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 21 / 45

Page 34: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

BE-Tree Self-adjustment Mechanism

Ranking Objective

A novel ranking objective that directly reduces the matching cost as opposed to aranking that is biased towards either the least or the most popular dimension(s).

The matching cost consists of

1 minimizing false candidate computations: reduce the number of predicateevaluations before an unsatisfied predicate is reached, namely, penalizingmultiple search paths and penalizing paths that produce many false candidates(Loss function).

2 minimizing true candidate computations: promote the evaluation of thecommon predicate among matched expressions exactly once (Gain function).

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 21 / 45

Page 35: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Self-adjustment Mechanism (Cost-based Ranking Model)

Definition

Rank(ni ) =

{βGain(ni )− γLoss(ni ) if ni is a l-node(∑

lj∈c(ni ) Rank(lj))− γLoss(ni ) otherwise

,where 0 ≤ β, γ ≤ 1

(2)

Loss(ni ) =∑

e′∈windowm(ni )

# discarded pred eval for e ′

|window(ni )|(3)

Gain(lj) = αsGains(lj) + αcGainc(lj), 0 ≤ αs , αc ≤ 1 (4)

Gains(lj) = # subsumed pred, Gainc(lj) = # covered pred (5)

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 22 / 45

Page 36: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Self-adjustment Mechanism (Cost-based Ranking Model)

Definition

Rank(ni ) =

{βGain(ni )− γLoss(ni ) if ni is a l-node(∑

lj∈c(ni ) Rank(lj))− γLoss(ni ) otherwise

,where 0 ≤ β, γ ≤ 1

(2)

Loss(ni ) =∑

e′∈windowm(ni )

# discarded pred eval for e ′

|window(ni )|(3)

Gain(lj) = αsGains(lj) + αcGainc(lj), 0 ≤ αs , αc ≤ 1 (4)

Gains(lj) = # subsumed pred, Gainc(lj) = # covered pred (5)

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 22 / 45

Page 37: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Self-adjustment Mechanism (Cost-based Ranking Model)

Definition

Rank(ni ) =

{βGain(ni )− γLoss(ni ) if ni is a l-node(∑

lj∈c(ni ) Rank(lj))− γLoss(ni ) otherwise

,where 0 ≤ β, γ ≤ 1

(2)

Loss(ni ) =∑

e′∈windowm(ni )

# discarded pred eval for e ′

|window(ni )|(3)

Gain(lj) = αsGains(lj) + αcGainc(lj), 0 ≤ αs , αc ≤ 1 (4)

Gains(lj) = # subsumed pred, Gainc(lj) = # covered pred (5)

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 22 / 45

Page 38: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Self-adjustment Mechanism (Cost-based Ranking Model)

Definition

Rank(ni ) =

{βGain(ni )− γLoss(ni ) if ni is a l-node(∑

lj∈c(ni ) Rank(lj))− γLoss(ni ) otherwise

,where 0 ≤ β, γ ≤ 1

(2)

Loss(ni ) =∑

e′∈windowm(ni )

# discarded pred eval for e ′

|window(ni )|(3)

Gain(lj) = αsGains(lj) + αcGainc(lj), 0 ≤ αs , αc ≤ 1 (4)

Gains(lj) = # subsumed pred, Gainc(lj) = # covered pred (5)

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 22 / 45

Page 39: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

BE-Tree Example (Insertion)

S1= [a>0, b=2, e>4]

S2= [c<3, f>1, m<2]

S3= [d=1, f<3]

S4= [a=3]

S5= [d>3, f>2]

S6= [d=4, f<4]

S7= [a<4, b=1]

S8= [a<3]

S9= [a>1]

c-node

l-node

{S1,S

2,S

3}

Insertion Sequence

l-node's maxcap

= 3

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 23 / 45

Page 40: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

BE-Tree Example (Insertion)

S1= [a>0, b=2, e>4]

S2= [c<3, f>1, m<2]

S3= [d=1, f<3]

S4= [a=3]

S5= [d>3, f>2]

S6= [d=4, f<4]

S7= [a<4, b=1]

S8= [a<3]

S9= [a>1]

c-node

l-node

{S2}

Insertion Sequence

l-node's maxcap

= 3

{S1,S

2,S

3,S

4}

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 24 / 45

Page 41: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

BE-Tree Example (Insertion)

S1= [a>0, b=2, e>4]

S2= [c<3, f>1, m<2]

S3= [d=1, f<3]

S4= [a=3]

S5= [d>3, f>2]

S6= [d=4, f<4]

S7= [a<4, b=1]

S8= [a<3]

S9= [a>1]

c-node

l-node

p-node

c-node

p-directory

[1, 4]c-directory

a{S

2}

l-node

{S1,S

4}

Insertion Sequence

l-node's maxcap

= 3

{S2,S

3}

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 25 / 45

Page 42: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

BE-Tree Example (Insertion)

S1= [a>0, b=2, e>4]

S2= [c<3, f>1, m<2]

S3= [d=1, f<3]

S4= [a=3]

S5= [d>3, f>2]

S6= [d=4, f<4]

S7= [a<4, b=1]

S8= [a<3]

S9= [a>1]

c-node

l-node

p-node

c-node

p-directory

[1, 4]c-directory

a{S

2}

l-node

{S1,S

4}

Insertion Sequence

l-node's maxcap

= 3

{S2,S

3,S

5,S

6}

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 26 / 45

Page 43: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

BE-Tree Example (Insertion)

S1= [a>0, b=2, e>4]

S2= [c<3, f>1, m<2]

S3= [d=1, f<3]

S4= [a=3]

S5= [d>3, f>2]

S6= [d=4, f<4]

S7= [a<4, b=1]

S8= [a<3]

S9= [a>1]

c-node

l-node

p-node p-node

c-node

p-directory

[1, 4]c-directory

c-node

l-node

a d

{S3, S

5,S

6}

{S2}

l-node

{S1,S

4}

Insertion Sequence

l-node's maxcap

= 3

{ S2}

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 27 / 45

Page 44: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

BE-Tree Example (Insertion)

c-node

l-node

p-node p-node

c-node

p-directory

[1, 4]c-directory

c-node

l-node

a d

{S3, S

5,S

6}

{S2}

S1= [a>0, b=2, e>4]

S2= [c<3, f>1, m<2]

S3= [d=1, f<3]

S4= [a=3]

S5= [d>3, f>2]

S6= [d=4, f<4]

S7= [a<4, b=1]

S8= [a<3]

S9= [a>1]

l-node

{S1,S

4,S

7,S

8}

Insertion Sequence

l-node's maxcap

= 3

{ S2}

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 28 / 45

Page 45: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

BE-Tree Example (Insertion)

c-node

l-node

p-node p-node

c-nodec-node

l-node

p-directory

[1, 4]

[2, 4]

c-directory

c-node

l-node

a d

{S4}

{S3, S

5,S

6}

{S2}

S1= [a>0, b=2, e>4]

S2= [c<3, f>1, m<2]

S3= [d=1, f<3]

S4= [a=3]

S5= [d>3, f>2]

S6= [d=4, f<4]

S7= [a<4, b=1]

S8= [a<3]

S9= [a>1]

l-node

{S1,S

7,S

8}

Insertion Sequence

l-node's maxcap

= 3

{ S2}

{S1,S

7,S

8}

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 29 / 45

Page 46: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

BE-Tree Example (Insertion)

c-node

l-node

p-node p-node

c-nodec-node

l-node

p-directory

[1, 4]

[2, 4]

c-directory

c-node

l-node

a d

{S4}

{S3, S

5,S

6}

{S2}

S1= [a>0, b=2, e>4]

S2= [c<3, f>1, m<2]

S3= [d=1, f<3]

S4= [a=3]

S5= [d>3, f>2]

S6= [d=4, f<4]

S7= [a<4, b=1]

S8= [a<3]

S9= [a>1]

l-node

{S8,S

9}

Insertion Sequence

l-node's maxcap

= 3

{ S2}

{S8,S

9}{S

1,S

7,S

8,S

9}

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 30 / 45

Page 47: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

BE-Tree Example (Insertion)

c-node

l-node

p-node p-node

c-nodec-node

l-node

p-directory

[1, 4]

[2, 4]

c-directory

c-node

l-node

a d

p-directoryb

p-node c-nodel-node

{S4}

{S3, S

5,S

6}

{S2}

{S1,S

7}

S1= [a>0, b=2, e>4]

S2= [c<3, f>1, m<2]

S3= [d=1, f<3]

S4= [a=3]

S5= [d>3, f>2]

S6= [d=4, f<4]

S7= [a<4, b=1]

S8= [a<3]

S9= [a>1]

l-node

{S8,S

9}

Insertion Sequence

l-node's maxcap

= 3

{ S2}

{S8,S

9}

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 31 / 45

Page 48: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

BE-Tree Example (the Notion of a Key)

c-node

l-node

p-node p-node

c-nodec-node

l-node

p-directory

[1, 4]

[2, 4]

c-directory

c-node

l-node

a d

p-directoryb

p-node c-nodel-node

{S4}

{S3, S

5,S

6}

{S2}

{S1,S

7}

S1= [a>0, b=2, e>4]

S2= [c<3, f>1, m<2]

S3= [d=1, f<3]

S4= [a=3]

S5= [d>3, f>2]

S6= [d=4, f<4]

S7= [a<4, b=1]

S8= [a<3]

S9= [a>1]

l-node

{S8,S

9}

l-node's maxcap

= 3

{ S2}

{S8,S

9}

Insertion Sequence

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 32 / 45

Page 49: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

BE-Tree Example (Matching)

c-node

l-node

p-node p-node

c-nodec-node

l-node

p-directory

[1, 4]

[2, 4]

c-directory

c-node

l-node

a d

p-directoryb

p-node c-nodel-node

{S4}

{S3, S

5,S

6}

{S2}

{S1,S

7}

S1= [a>0, b=2, e>4]

S2= [c<3, f>1, m<2]

S3= [d=1, f<3]

S4= [a=3]

S5= [d>3, f>2]

S6= [d=4, f<4]

S7= [a<4, b=1]

S8= [a<3]

S9= [a>1]

l-node

{S8,S

9}

Insertion Sequence

l-node's maxcap

= 3

{ S2}

{S8,S

9}

Evente

1=[a=1, b=2, e=3]

e1=[a=1]

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 33 / 45

Page 50: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

BE-Tree Example (Matching)

c-node

l-node

p-node p-node

c-nodec-node

l-node

p-directory

[1, 4]

[2, 4]

c-directory

c-node

l-node

a d

p-directoryb

p-node c-nodel-node

{S4}

{S3, S

5,S

6}

{S2}

{S1,S

7}

S1= [a>0, b=2, e>4]

S2= [c<3, f>1, m<2]

S3= [d=1, f<3]

S4= [a=3]

S5= [d>3, f>2]

S6= [d=4, f<4]

S7= [a<4, b=1]

S8= [a<3]

S9= [a>1]

l-node

{S8,S

9}

Insertion Sequence

l-node's maxcap

= 3

{ S2}

{S8,S

9}

e1=[a=1, b=2, e=3]

e1=[a=1]

Event

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 34 / 45

Page 51: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

1 Application Scenarios

2 Matching Problem

3 BE-Tree (Boolean Expression-Tree)

4 BE-Tree Adaptation Policies

5 Experimental Evaluation

6 Conclusions

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 35 / 45

Page 52: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Experimental Evaluation

Algorithms

1 BE: BE-Tree (our index structure)

2 BE-B: BE-Tree (batch version)

3 GR: IBM Gryphon (Aguilera et al., PODC’99)

4 P: Propagation Algorithm (Fabret et al. SIGMOD’01)

5 k-ind: k-index (Whang et al. VLDB’09)

6 SIFT: Counting Algorithm (Yan et al. TODS’94)

7 SCAN: Sequential Scan

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 36 / 45

Page 53: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Workload Configurations

Table: Synthetic and Real Workload Properties

Wor

kloa

dS

ize

Nu

mb

erof

Dim

ensi

ons

Dim

ensi

onC

ard

inal

ity

Pre

dic

ate

Sel

ecti

vity

Dim

ensi

onS

elec

tivi

ty

Su

b/E

ven

t

Siz

e

%E

qu

alit

yP

red

Mat

chP

rob

DB

LP

(Au

thor

)

DB

LP

(Tit

le)

Mat

chP

rob

(Au

thor

)

Mat

chP

rob

(Tit

le)

Size 100K-1M 1M 100K 100K 100K 100K 1M 1M 100-760K 50-250K 400k 150

Number of Dim 400 50-1400 400 400 400 400 400 400 677 677 677 677

Cardinality 48 48 48-150K 48 2-10 48 48 48 26 26 26 26

Avg. Sub Size 7 7 7 7 7 5-66 7 7 8 35 8 30

Avg. Event Size 15 15 15 15 15 13-81 15 15 8 35 16 43

Pred Avg. Range Size % 12 12 12 6-50 — 12 12 12 — — 12 12

% Equality Pred 0.3 0.3 0.3 0.3 1.0 0.3 0.2-1.0 0.3 1.0 1.0 0.3 0.3

Op Class Med Med Med Med Min Med Med Lo-Hi Min Min Lo-Hi Lo-Hi

Match Prob % 1 1 1 1 — 1 1 0.01-9 — — 0.01-9 0.01-9

The results in this paper were verified by the SIGMOD repeatability committee.

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 37 / 45

Page 54: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Workload Configurations

Table: Synthetic and Real Workload Properties

Wor

kloa

dS

ize

Nu

mb

erof

Dim

ensi

ons

Dim

ensi

onC

ard

inal

ity

Pre

dic

ate

Sel

ecti

vity

Dim

ensi

onS

elec

tivi

ty

Su

b/E

ven

t

Siz

e

%E

qu

alit

yP

red

Mat

chP

rob

DB

LP

(Au

thor

)

DB

LP

(Tit

le)

Mat

chP

rob

(Au

thor

)

Mat

chP

rob

(Tit

le)

Size 100K-1M 1M 100K 100K 100K 100K 1M 1M 100-760K 50-250K 400k 150

Number of Dim 400 50-1400 400 400 400 400 400 400 677 677 677 677

Cardinality 48 48 48-150K 48 2-10 48 48 48 26 26 26 26

Avg. Sub Size 7 7 7 7 7 5-66 7 7 8 35 8 30

Avg. Event Size 15 15 15 15 15 13-81 15 15 8 35 16 43

Pred Avg. Range Size % 12 12 12 6-50 — 12 12 12 — — 12 12

% Equality Pred 0.3 0.3 0.3 0.3 1.0 0.3 0.2-1.0 0.3 1.0 1.0 0.3 0.3

Op Class Med Med Med Med Min Med Med Lo-Hi Min Min Lo-Hi Lo-Hi

Match Prob % 1 1 1 1 — 1 1 0.01-9 — — 0.01-9 0.01-9

The results in this paper were verified by the SIGMOD repeatability committee.

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 37 / 45

Page 55: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Effect of Workload Size on Matching (Log Scale)

0.5

1

2

4

8

16

32

64

128

256

100K300K

500K700K

900K1MM

atc

hin

g T

ime/E

vent (m

s)

Varying Number of Subscriptions

BE-BBEGR

Pk-IndSIFT

SCAN

(a) Uniform: Workload Size

1

2

4

8

16

32

64

128

256

512

1024

100K300K

500K700K

900K1MM

atc

hin

g T

ime/E

vent (m

s)

Varying Number of Subscriptions

BE-BBEGR

Pk-IndSIFT

SCAN

(b) Zipf: Workload Size

Figure: Varying Workload Size

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 38 / 45

Page 56: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Effect of Dimension Cardinality on Matching (Log Scale)

0.5

1

2

4

8

16

32

48 2401.2K

6K 30K150K

Matc

hin

g T

ime/E

vent (m

s)

Varying Cardinality; Sub=100K

BE-BBEGR

Pk-IndSIFT

SCAN

(a) Uniform: Dimension Cardinality

1

2

4

8

16

32

64

48 2401.2K

6K 30K150K

Matc

hin

g T

ime/E

vent (m

s)

Varying Cardinality; Sub=100K

BE-BBEGR

Pk-IndSIFT

SCAN

(b) Zipf: Dimension Cardinality

Figure: Varying Dimension Cardinality

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 39 / 45

Page 57: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Effect of Predicate Expressiveness on Matching (Log Scale)

4

8

16

32

64

128

256

0.20.3

0.40.6

0.81.0

Matc

hin

g T

ime/E

vent (m

s)

Varying % of Equality Pred; Sub=1M

BE-BBEGR

Pk-IndSIFT

SCAN

(a) Uniform: % Equality Predicate

4

8

16

32

64

128

256

512

1024

2048

0.20.3

0.40.6

0.81.0

Matc

hin

g T

ime/E

vent (m

s)

Varying % of Equality Pred; Sub=1M

BE-BBEGR

Pk-IndSIFT

SCAN

(b) Zipf: % Equality Predicate

Figure: Varying % of Equality Predicates

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 40 / 45

Page 58: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

1 Application Scenarios

2 Matching Problem

3 BE-Tree (Boolean Expression-Tree)

4 BE-Tree Adaptation Policies

5 Experimental Evaluation

6 Conclusions

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 41 / 45

Page 59: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Conclusions

The main benefits in the overall design of BE-Tree are to

1 support expressive set of predicate operators (e.g., range predicates)through a novel predicate transformation

2 handle subscriptions that impose conditions only on a small set ofdimensions (of a high-dimensional space), resulting in a high degree ofoverlap, through the two-phase space-cutting technique that identifiesdense subspaces and copes with the curse of dimensionality

3 adapt to temporal changes of the workload distribution and to sustain highmatching rates in presence of frequent subscription updates through

1 a cascading-split-free approach when inserting and removing expressions2 a deterministic clustering and a partition-clustering alteration strategy that

are independent of the insertion sequence3 a novel cost-based technique to optimize the matching cost and recycle

ineffective buckets

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 42 / 45

Page 60: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Conclusions

The main benefits in the overall design of BE-Tree are to

1 support expressive set of predicate operators (e.g., range predicates)through a novel predicate transformation

2 handle subscriptions that impose conditions only on a small set ofdimensions (of a high-dimensional space), resulting in a high degree ofoverlap, through the two-phase space-cutting technique that identifiesdense subspaces and copes with the curse of dimensionality

3 adapt to temporal changes of the workload distribution and to sustain highmatching rates in presence of frequent subscription updates through

1 a cascading-split-free approach when inserting and removing expressions2 a deterministic clustering and a partition-clustering alteration strategy that

are independent of the insertion sequence3 a novel cost-based technique to optimize the matching cost and recycle

ineffective buckets

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 42 / 45

Page 61: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Conclusions

The main benefits in the overall design of BE-Tree are to

1 support expressive set of predicate operators (e.g., range predicates)through a novel predicate transformation

2 handle subscriptions that impose conditions only on a small set ofdimensions (of a high-dimensional space), resulting in a high degree ofoverlap, through the two-phase space-cutting technique that identifiesdense subspaces and copes with the curse of dimensionality

3 adapt to temporal changes of the workload distribution and to sustain highmatching rates in presence of frequent subscription updates through

1 a cascading-split-free approach when inserting and removing expressions2 a deterministic clustering and a partition-clustering alteration strategy that

are independent of the insertion sequence3 a novel cost-based technique to optimize the matching cost and recycle

ineffective buckets

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 42 / 45

Page 62: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Conclusions

The main benefits in the overall design of BE-Tree are to

1 support expressive set of predicate operators (e.g., range predicates)through a novel predicate transformation

2 handle subscriptions that impose conditions only on a small set ofdimensions (of a high-dimensional space), resulting in a high degree ofoverlap, through the two-phase space-cutting technique that identifiesdense subspaces and copes with the curse of dimensionality

3 adapt to temporal changes of the workload distribution and to sustain highmatching rates in presence of frequent subscription updates through

1 a cascading-split-free approach when inserting and removing expressions2 a deterministic clustering and a partition-clustering alteration strategy that

are independent of the insertion sequence3 a novel cost-based technique to optimize the matching cost and recycle

ineffective buckets

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 42 / 45

Page 63: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Thank You,

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 43 / 45

Page 64: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Effect of Sub/Event Size on Matching (Log Scale)

0.5

1

2

4

8

16

32

64

128

256

5/137/15

14/25

27/43

45/61

66/81

Matc

hin

g T

ime/E

vent (m

s)

Varying Sub/Event Size; Sub=100K

BE-BBEGR

Pk-IndSIFT

SCAN

(a) Uniform: Sub/Event Size

1

2

4

8

16

32

64

128

256

512

1024

5/137/15

14/25

27/43

45/61

66/81

Matc

hin

g T

ime/E

vent (m

s)

Varying Sub/Event Size; Sub=100K

BE-BBEGR

Pk-IndSIFT

SCAN

(b) Zipf: Sub/Event Size

Figure: Varying Subscription/Event Size

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 44 / 45

Page 65: BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High ...msrg.org/publications/presentations/2011/moSIGMOD11-BE... · 2011. 6. 27. · BE-Tree: An Index Structure

Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions

Effect of Matching Probability on Matching (Log Scale)

(a) Uniform: % Matching Prob (b) Zipf: % Matching Prob

Figure: Varying % of Matching Probability

Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 45 / 45