Indexing Boolean Expressions Author : Steven Euijong Whang Presented by : Aparna Kulkarni

Post on 18-Jan-2016

Indexing Boolean Expressions

Author : Steven Euijong Whang

Presented by : Aparna Kulkarni

Problem

Online Display Advertising Example
- BE: age ∈ {10, 20} & country ∉ {US}
- S: age=20 & country=FR & gender=F

Given an assignment S, find all matching Boolean expressions (BEs)
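To make the matching problem concrete, here is a minimal Python sketch that checks one conjunction against one assignment by brute force. The tuple layout `(attribute, op, values)` and the function name `matches` are illustrative choices, not taken from the slides. The sketch also assumes that a ∉ predicate holds when the attribute is missing from the assignment.

```python
# Hypothetical representation: a BE is a list of (attribute, op, values) predicates.
def matches(expression, assignment):
    """Return True if every predicate of the conjunction holds for the assignment."""
    for attr, op, values in expression:
        # True when the attribute is assigned and its value is in the predicate's set.
        present = attr in assignment and assignment[attr] in values
        if op == "in" and not present:
            return False
        if op == "not_in" and present:
            return False
    return True

be = [("age", "in", {10, 20}), ("country", "not_in", {"US"})]
s = {"age": 20, "country": "FR", "gender": "F"}
print(matches(be, s))  # True
```

Scanning every BE this way costs time linear in the number of expressions, which is the cost the inverted index is meant to avoid.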

Possible Applications

- Display advertising
- Publish/subscribe system
- Expert systems
- Pattern matching in AI
- Compliance checker

Contributions

Use inverted indexing techniques for ‘complex’ BEs

- DNF, CNF expressions of ∈, ∉ predicates with multiple values

Support top-k pruning given relevance score

Outline

Inverted index construction

Search algorithms

– DNF

– CNF (∈ only)

Experimental results

Inverted Index

E1: A ∈ {1}
E2: A ∈ {1} & B ∈ {2} & C ∈ {3,4}
S: A=1 & B=2

Key      Posting List
(A,1)    E1, E2
(B,2)    E2
(C,3)    E2
(C,4)    E2
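The table above can be rebuilt with a few lines of Python. This is only a sketch for ∈-only conjunctions. The counting step at the end (an expression matches when it appears in as many retrieved posting lists as it has predicates) is an illustrative simplification, and the names `build_inverted_index`, `hits`, and `matched` are hypothetical.

```python
from collections import Counter, defaultdict

def build_inverted_index(expressions):
    """Map each (attribute, value) key to the sorted list of expressions using it."""
    index = defaultdict(list)
    for name in sorted(expressions):
        for attr, values in expressions[name].items():
            for value in sorted(values):
                index[(attr, value)].append(name)
    return dict(index)

expressions = {
    "E1": {"A": {1}},
    "E2": {"A": {1}, "B": {2}, "C": {3, 4}},
}
index = build_inverted_index(expressions)

s = {"A": 1, "B": 2}
# Posting lists selected by the assignment's attribute-value pairs.
hits = {key: index.get(key, []) for key in s.items()}
# An expression matches only if all of its predicates were hit.
counts = Counter(name for names in hits.values() for name in names)
matched = sorted(n for n, c in counts.items() if c == len(expressions[n]))
print(matched)  # ['E1']
```

E2 is found in two posting lists but has three predicates, so it is rejected. This is why the full algorithm partitions the index by conjunction size K.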

Inverted List Construction

ID  Expression                                 K
1   age ∈ {3} ∧ state ∈ {NY}                   2
2   age ∈ {3} ∧ gender ∈ {F}                   2
3   age ∈ {3} ∧ gender ∈ {M} ∧ state ∉ {CA}    2
4   state ∈ {CA} ∧ gender ∈ {M}                2
5   age ∈ {3, 4}                               1
6   state ∉ {CA, NY}                           0

K  Key and UB          Posting List
0  (state,CA), 2.0     (6, ∉, 0)
   (state,NY), 5.0     (6, ∉, 0)
   Z, 0                (6, ∈, 0)
1  (age, 3), 1.0       (5, ∈, 0.1)
   (age, 4), 3.0       (5, ∈, 0.5)
2  (state,NY), 5.0     (1, ∈, 4.0)
   (age, 3), 1.0       (1, ∈, 0.1) (2, ∈, 0.1) (3, ∈, 0.2)
   (gender, F), 2.0    (2, ∈, 0.3)
   (state,CA), 2.0     (3, ∉, 0) (4, ∈, 1.5)
   (gender,M), 1.0     (3, ∈, 0.5) (4, ∈, 0.9)
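The construction of the K-partitioned index can be sketched as follows. The sketch assumes that K is the number of ∈ predicates in a conjunction and that size-0 conjunctions are also added to the special Z list, as the table suggests. Weights and UB values are left out for brevity, and the names `CONJUNCTIONS`, `conjunction_size`, and `build_index` are hypothetical.

```python
from collections import defaultdict

# Conjunctions from the table above: id -> list of (attribute, op, values).
CONJUNCTIONS = {
    1: [("age", "in", {3}), ("state", "in", {"NY"})],
    2: [("age", "in", {3}), ("gender", "in", {"F"})],
    3: [("age", "in", {3}), ("gender", "in", {"M"}), ("state", "not_in", {"CA"})],
    4: [("state", "in", {"CA"}), ("gender", "in", {"M"})],
    5: [("age", "in", {3, 4})],
    6: [("state", "not_in", {"CA", "NY"})],
}

def conjunction_size(predicates):
    """K is the number of 'in' predicates; 'not_in' predicates do not count."""
    return sum(1 for _, op, _ in predicates if op == "in")

def build_index(conjunctions):
    """Return {K: {key: [(id, op), ...]}} with posting lists sorted by id."""
    index = defaultdict(lambda: defaultdict(list))
    for cid in sorted(conjunctions):
        k = conjunction_size(conjunctions[cid])
        for attr, op, values in conjunctions[cid]:
            for value in sorted(values):
                index[k][(attr, value)].append((cid, op))
        if k == 0:
            # Size-0 conjunctions also go into the special Z list.
            index[0]["Z"].append((cid, "in"))
    return {k: dict(lists) for k, lists in index.items()}

index = build_index(CONJUNCTIONS)
sizes = {cid: conjunction_size(p) for cid, p in CONJUNCTIONS.items()}
print(sizes)  # {1: 2, 2: 2, 3: 2, 4: 2, 5: 1, 6: 0}
```

Because conjunctions are processed in increasing ID order, every posting list comes out sorted by ID, which the search algorithm relies on when it skips ahead.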

Algorithm 1:
1: input: inverted list idx and assignment S
2: output: set of IDs O matching S
3: O ← ∅
4: for K = min(idx.MaxConjunctionSize, |S|) ... 0 do
5:     /* List of posting lists matching A for conjunction size K */
6:     PLists ← idx.GetPostingLists(S, K)
7:     InitializeCurrentEntries(PLists)
8:     /* Processing K=0 and K=1 are identical */
9:     if K = 0 then K ← 1
10:    /* Too few posting lists for any conjunction to be satisfied */
11:    if PLists.size() < K then
12:        continue to next for loop iteration
13:    while PLists[K-1].CurrEntry ≠ EOL do
14:        SortByCurrentEntries(PLists)
15:        /* Check if the first K posting lists have the same conjunction ID in their current entries */
16:        if PLists[0].CurrEntry.ID = PLists[K-1].CurrEntry.ID then
17:            /* Reject conjunction if a ∉ predicate is violated */
18:            if PLists[0].CurrEntry.AnnotatedBy(∉) then
19:                RejectID ← PLists[0].CurrEntry.ID
20:                for L = K .. (PLists.size()-1) do
21:                    if PLists[L].CurrEntry.ID = RejectID then
22:                        /* Skip to smallest ID where ID > RejectID */
23:                        PLists[L].SkipTo(RejectID+1)
24:                    else
25:                        break out of for loop
26:                continue to next while loop iteration
27:            else /* conjunction is fully satisfied */
28:                O ← O ∪ {PLists[K-1].CurrEntry.ID}
29:            /* NextID is the smallest possible ID after current ID */
30:            NextID ← PLists[K-1].CurrEntry.ID + 1
31:        else
32:            /* Skip first K-1 posting lists */
33:            NextID ← PLists[K-1].CurrEntry.ID
34:        for L = 0 ... K-1 do
35:            /* Skip to smallest ID such that ID ≥ NextID */
36:            PLists[L].SkipTo(NextID)
37: return O
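The pseudocode can be exercised with a small Python sketch. This is an illustrative rendering under stated assumptions, not the author's implementation. `INDEX` holds the posting lists of the earlier index table with weights omitted, and `Cursor`, `sort_key`, and `match` are hypothetical names. When a ∉ predicate rejects a conjunction, the sketch advances every list currently positioned on the rejected ID, which is one way to guarantee progress.

```python
# Posting lists from the index table, weights omitted.
# Entries are (conjunction id, op), sorted by id within each list.
INDEX = {
    0: {("state", "CA"): [(6, "not_in")], ("state", "NY"): [(6, "not_in")],
        "Z": [(6, "in")]},
    1: {("age", 3): [(5, "in")], ("age", 4): [(5, "in")]},
    2: {("state", "NY"): [(1, "in")],
        ("age", 3): [(1, "in"), (2, "in"), (3, "in")],
        ("gender", "F"): [(2, "in")],
        ("state", "CA"): [(3, "not_in"), (4, "in")],
        ("gender", "M"): [(3, "in"), (4, "in")]},
}

class Cursor:
    """A posting list plus the position of its current entry."""
    def __init__(self, entries):
        self.entries, self.pos = entries, 0

    def current(self):
        return self.entries[self.pos] if self.pos < len(self.entries) else None

    def skip_to(self, min_id):
        while self.current() is not None and self.current()[0] < min_id:
            self.pos += 1

def sort_key(cursor):
    entry = cursor.current()
    if entry is None:                          # EOL sorts last
        return (float("inf"), 0)
    cid, op = entry
    return (cid, 0 if op == "not_in" else 1)   # not_in first for equal ids

def match(index, assignment):
    matched = set()
    keys = list(assignment.items()) + ["Z"]
    for k in range(min(max(index), len(assignment)), -1, -1):
        level = index.get(k, {})
        cursors = [Cursor(level[key]) for key in keys if key in level]
        need = max(k, 1)                       # K=0 is processed like K=1
        if len(cursors) < need:
            continue
        while True:
            cursors.sort(key=sort_key)
            if cursors[need - 1].current() is None:
                break
            first_id, first_op = cursors[0].current()
            last_id = cursors[need - 1].current()[0]
            if first_id == last_id:
                if first_op == "not_in":       # violated not_in: reject this id
                    for c in cursors:
                        entry = c.current()
                        if entry is not None and entry[0] == first_id:
                            c.skip_to(first_id + 1)
                    continue
                matched.add(first_id)          # first K lists agree on one id
                next_id = first_id + 1
            else:
                next_id = last_id
            for c in cursors[:need]:
                c.skip_to(next_id)
    return matched

print(sorted(match(INDEX, {"age": 3, "state": "CA", "gender": "M"})))  # [4, 5]
```

For S = {age=3, state=CA, gender=M}, conjunction 3 is rejected by its ∉ predicate on state, conjunction 4 is found at K=2, and conjunction 5 at K=1.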

DNF Algorithm

S: {age = 3, state = CA, gender = M}
Posting lists for assignment S

K  Key           Posting List
0  (state,CA)    (6, ∉)
   Z             (6, ∈)
1  (age, 3)      (5, ∈)
2  (age, 3)      (1, ∈) (2, ∈) (3, ∈)
   (state,CA)    (3, ∉) (4, ∈)
   (gender,M)    (3, ∈) (4, ∈)

DNF Algorithm

S: {age = 3, state = CA, gender = M}
Posting lists for K=2

K  Key           Posting List
2  (age, 3)      (1, ∈) (2, ∈) (3, ∈)
   (state,CA)    (3, ∉) (4, ∈)
   (gender,M)    (3, ∈) (4, ∈)

DNF Algorithm

S: {age = 3, state = CA, gender = M}
Posting lists for K=2 after first skipping

K  Key           Posting List
2  (state,CA)    (3, ∉) (4, ∈)
   (age, 3)      (1, ∈) (2, ∈) (3, ∈)
   (gender,M)    (3, ∈) (4, ∈)

DNF Algorithm

S: {age = 3, state = CA, gender = M}
Posting lists for K=2 after second skipping

K  Key           Posting List
2  (state,CA)    (3, ∉) (4, ∈)
   (gender,M)    (3, ∈) (4, ∈)
   (age, 3)      (1, ∈) (2, ∈) (3, ∈) EOL
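The skipping steps in this walkthrough depend on a `SkipTo` operation that moves a posting list to the smallest ID greater than or equal to a target. One plausible way to implement it on a sorted list is binary search, sketched below. The class name `PostingList` and the use of `None` for EOL are assumptions made for illustration.

```python
from bisect import bisect_left

class PostingList:
    """Sorted conjunction IDs with a current position; None stands for EOL."""
    def __init__(self, ids):
        self.ids = sorted(ids)
        self.pos = 0

    def current(self):
        return self.ids[self.pos] if self.pos < len(self.ids) else None

    def skip_to(self, next_id):
        # Search only from the current position so the cursor never moves back.
        self.pos = bisect_left(self.ids, next_id, self.pos)

age3 = PostingList([1, 2, 3])
age3.skip_to(3)
first = age3.current()    # 3: entries 1 and 2 are skipped
age3.skip_to(4)
second = age3.current()   # None: the list has reached EOL
print(first, second)      # 3 None
```

This mirrors the (age, 3) list in the tables above: it first jumps from ID 1 to ID 3, then runs off the end once ID 3 is rejected.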

CNF: Inverted List

ID  Expression
c1  (A ∈ {1} ∨ B ∈ {1}) ∧ (C ∈ {1} ∨ D ∈ {1})
c2  (A ∈ {1} ∨ C ∈ {2}) ∧ (B ∈ {1} ∨ D ∈ {1})
c3  (A ∈ {1} ∨ B ∈ {1}) ∧ (C ∈ {2} ∨ D ∈ {1})
c4  (A ∈ {1} ∨ B ∈ {1}) ∧ (A ∈ {1, 2} ∨ D ∈ {1})
c5  (A ∈ {1} ∨ B ∈ {1}) ∧ (C ∉ {1, 2} ∨ D ∉ {1} ∨ E ∈ {1})
c6  A ∉ {1} ∨ B ∈ {1}

Each posting list entry is (ID, ∈/∉, DisjID, weight).

K  Key, UB        Posting List
0  (A, 1), 0.5    (6, ∉, 0, 0)
   (B, 1), 1.5    (6, ∈, 0, 0.1)
   Z, 0           (6, ∈, −1, 0)
1  (C, 1), 2.5    (5, ∉, 1, 0)
   (C, 2), 3.0    (5, ∉, 1, 0)
   (D, 1), 3.5    (5, ∉, 1, 0)
   (A, 1), 0.5    (5, ∈, 0, 0.1)
   (B, 1), 1.5    (5, ∈, 0, 0.7)
   (E, 1), 4.5    (5, ∈, 1, 3.9)
2  (A, 1), 0.5    (1, ∈, 0, 0.1) (2, ∈, 0, 0.3) (3, ∈, 0, 0.3) (4, ∈, 0, 0.1)
   (B, 1), 1.5    (1, ∈, 0, 0.3) (2, ∈, 1, 0.5) (3, ∈, 0, 0.3) (4, ∈, 0, 0.5)
   (C, 1), 2.5    (1, ∈, 1, 0.2)
   (D, 1), 3.5    (1, ∈, 1, 2.1) (2, ∈, 1, 2.5) (3, ∈, 1, 1.7) (4, ∈, 1, 1.9)
   (C, 2), 3.0    (2, ∈, 0, 2.5) (3, ∈, 1, 2.7)
   (A, 1), 0.5    (4, ∈, 1, 0.1)
   (A, 2), 1.0    (4, ∈, 0, 0.1)
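The CNF index differs from the DNF one in two ways that a short sketch can show: the size K of a CNF, and the disjunction ID carried by each posting entry. The sketch assumes K counts the disjunctions that contain no ∉ predicate, which agrees with the K values in the table. Weights are omitted, and `CNFS`, `cnf_size`, and `posting_entries` are hypothetical names.

```python
# CNFs from the table above: id -> list of disjunctions,
# each disjunction being a list of (attribute, op, values) predicates.
CNFS = {
    1: [[("A", "in", {1}), ("B", "in", {1})], [("C", "in", {1}), ("D", "in", {1})]],
    2: [[("A", "in", {1}), ("C", "in", {2})], [("B", "in", {1}), ("D", "in", {1})]],
    3: [[("A", "in", {1}), ("B", "in", {1})], [("C", "in", {2}), ("D", "in", {1})]],
    4: [[("A", "in", {1}), ("B", "in", {1})], [("A", "in", {1, 2}), ("D", "in", {1})]],
    5: [[("A", "in", {1}), ("B", "in", {1})],
        [("C", "not_in", {1, 2}), ("D", "not_in", {1}), ("E", "in", {1})]],
    6: [[("A", "not_in", {1}), ("B", "in", {1})]],
}

def cnf_size(disjunctions):
    """K counts the disjunctions that contain no 'not_in' predicate."""
    return sum(1 for d in disjunctions if all(op == "in" for _, op, _ in d))

def posting_entries(cnf_id, disjunctions):
    """Yield (key, (id, op, disj_id)) for each attribute-value pair of a CNF."""
    for disj_id, disjunction in enumerate(disjunctions):
        for attr, op, values in disjunction:
            for value in sorted(values):
                yield (attr, value), (cnf_id, op, disj_id)

sizes = {cid: cnf_size(d) for cid, d in CNFS.items()}
entries_c2 = list(posting_entries(2, CNFS[2]))
print(sizes)  # {1: 2, 2: 2, 3: 2, 4: 2, 5: 1, 6: 0}
```

For c2 the generated entries place (A, 1) and (C, 2) in disjunction 0 and (B, 1) and (D, 1) in disjunction 1, as in the K=2 rows of the table.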

1: [Steps 1∼15 of Algorithm 1 except for Step 4]
2: if PLists[0].CurrEntry.ID = PLists[K-1].CurrEntry.ID then
3:     /* For each disjunction in the current CNF, one counter is initialized to the negative number of ∉ predicates */
4:     Counters.Initialize(PLists[0].CurrEntry.ID)
5:     for L = 0 ... (PLists.size()-1) do
6:         if PLists[L].CurrEntry.ID = PLists[0].CurrEntry.ID then
7:             /* Ignore entries in the Z posting list */
8:             if PLists[L].CurrEntry.DisjID = -1 then
9:                 continue to next for loop
10:            if PLists[L].CurrEntry.AnnotatedBy(∉) then
11:                Counters[PLists[L].CurrEntry.DisjID]++
12:            else /* Disjunction is satisfied */
13:                Counters[PLists[L].CurrEntry.DisjID] ← 1
14:        else
15:            break
16:    Satisfied ← true
17:    for L = 0 ... Counters.size()-1 do
18:        /* No ∈ or ∉ predicates were satisfied */
19:        if Counters[L] = 0 then
20:            Satisfied ← false
21:    if Satisfied = true then
22:        O ← O ∪ {PLists[K-1].CurrEntry.ID}
23: [Steps 29∼37 of Algorithm 1]
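The counter logic of steps 3 to 20 can be isolated in a small Python function. This is a sketch of the check for a single candidate CNF, assuming the caller has already gathered the posting entries that share the candidate's ID. The function name `cnf_satisfied` and its argument layout are hypothetical.

```python
def cnf_satisfied(not_in_counts, matched_entries):
    """Decide whether one candidate CNF is satisfied.

    not_in_counts[d] is the number of not_in predicates in disjunction d.
    matched_entries is a list of (op, disj_id) pairs, one per posting list
    whose current entry carries the candidate's ID.
    """
    # Each counter starts at minus the number of not_in predicates.
    counters = [-n for n in not_in_counts]
    for op, disj_id in matched_entries:
        if disj_id == -1:            # entry from the Z posting list: ignore
            continue
        if op == "not_in":
            counters[disj_id] += 1   # one not_in predicate is violated
        else:
            counters[disj_id] = 1    # an in predicate satisfies the disjunction
    # A counter of 0 means no predicate of that disjunction holds.
    return all(c != 0 for c in counters)

# Assignment S = {A=1, C=2} against three CNFs of the example index.
c3 = cnf_satisfied([0, 0], [("in", 0), ("in", 1)])       # (A,1) and (C,2)
c2 = cnf_satisfied([0, 0], [("in", 0), ("in", 0)])       # both in disjunction 0
c6 = cnf_satisfied([1], [("not_in", 0), ("in", -1)])     # A ∉ {1} is violated
print(c3, c2, c6)  # True False False
```

c2 shows why appearing in K posting lists is not enough for a CNF: both of its hits fall in the same disjunction, so the counter for the second disjunction stays at 0.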

CNF Algorithm

S: {A = 1, C = 2}
Posting lists for assignment S

K  Key, UB        Posting List
0  (A, 1), 0.5    (6, ∉, 0, 0)
   Z, 0           (6, ∈, −1, 0)
1  (C, 2), 3.0    (5, ∉, 1, 0)
   (A, 1), 0.5    (5, ∈, 0, 0.1)
2  (A, 1), 0.5    (1, ∈, 0, 0.1) (2, ∈, 0, 0.3) (3, ∈, 0, 0.3) (4, ∈, 0, 0.1)
   (C, 2), 3.0    (2, ∈, 0, 2.5) (3, ∈, 1, 2.7)
   (A, 1), 0.5    (4, ∈, 1, 0.1)

Ranking Boolean Expressions: DNF Algorithm

1. After sorting the posting lists in Step 14, the sum of UB(A, v) × wS(A, v) for every posting list PLists[L] such that PLists[L].CurrEntry.ID ≤ PLists[K-1].CurrEntry.ID is an upper bound for the score of the conjunction PLists[K-1].CurrEntry.ID. If the upper bound is less than the Nth highest conjunction score, we can skip all the posting lists with CurrEntry.ID less than or equal to PLists[K-1].CurrEntry.ID and continue to the next while loop iteration at Step 13.

2. Before processing PLists from Step 7, the sum of the top-K UB(A, v) × wS(A, v) values for all the posting lists in PLists is an upper bound of the score for all the matching conjunctions with size K. If the upper bound is less than the Nth highest conjunction score, we can skip processing PLists for the current K-index and continue to the next for loop iteration at Step 4.
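Both pruning rules compare an upper bound against the Nth highest score found so far. One common way to track that threshold is a bounded min-heap, sketched below. The class `TopN` and the helpers `upper_bound` and `can_skip` are hypothetical names. The UB values come from the index table earlier in the deck, and the `weights` values are simply illustrative inputs.

```python
import heapq

class TopN:
    """Keeps the N best (score, id) pairs seen so far in a min-heap."""
    def __init__(self, n):
        self.n = n
        self.heap = []

    def threshold(self):
        """Nth highest score so far, or -inf while fewer than N results exist."""
        return self.heap[0][0] if len(self.heap) == self.n else float("-inf")

    def offer(self, score, item_id):
        if len(self.heap) < self.n:
            heapq.heappush(self.heap, (score, item_id))
        elif score > self.heap[0][0]:
            heapq.heapreplace(self.heap, (score, item_id))

def upper_bound(lists, weights):
    """Sum UB(A, v) * wS(A, v) over the given (key, UB) posting lists."""
    return sum(ub * weights[key] for key, ub in lists)

def can_skip(lists, weights, top_n):
    """True when no conjunction covered by these lists can enter the top N."""
    return upper_bound(lists, weights) < top_n.threshold()

weights = {("age", 3): 0.8, ("gender", "F"): 0.9}   # illustrative wS values
lists = [(("age", 3), 1.0), (("gender", "F"), 2.0)]

top = TopN(1)
before = can_skip(lists, weights, top)   # no results yet, so nothing is skipped
top.offer(4.08, 1)                       # a conjunction scoring 4.08 is found
after = can_skip(lists, weights, top)    # bound 2.6 < 4.08, so these lists are skipped
print(before, after)  # False True
```

Returning negative infinity until N results exist keeps the rule safe: nothing is pruned before the top-N set is full.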

Ranking Boolean Expressions: DNF Algorithm Example

S: {age = 3, state = NY, gender = F}

wS(age, 3) = 0.8, wS(state, NY) = 1.0, and wS(gender, F) = 0.9.

Score c1 = w1(state, NY) × wS(state, NY) + w1(age, 3) × wS(age, 3) = 4.0 × 1.0 + 0.1 × 0.8 = 4.08.

UB c2 = UB(age, 3) × wS(age, 3) + UB(gender, F) × wS(gender, F) = 1.0 × 0.8 + 2.0 × 0.9 = 2.6.

UB for K=1: UB(age, 3) × wS(age, 3) = 1.0 × 0.8 = 0.8.

K  wS   Key, UB             Posting List
1  0.8  (age, 3), 1.0       (5, ∈, 0.1)
2  1.0  (state,NY), 5.0     (1, ∈, 4.0)
   0.8  (age, 3), 1.0       (1, ∈, 0.1) (2, ∈, 0.1) (3, ∈, 0.2)
   0.9  (gender, F), 2.0    (2, ∈, 0.3)
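The arithmetic in this example can be replayed in Python as a quick check. The dictionaries below copy the weights and upper bounds from the slide. The helper `weighted_sum` is a hypothetical name, and the comparison assumes N = 1, so that c1's score is the current threshold.

```python
import math

# Assignment weights wS, upper bounds UB, and c1's own weights from the slide.
w_s = {("age", 3): 0.8, ("state", "NY"): 1.0, ("gender", "F"): 0.9}
ub = {("age", 3): 1.0, ("state", "NY"): 5.0, ("gender", "F"): 2.0}
w_c1 = {("state", "NY"): 4.0, ("age", 3): 0.1}

def weighted_sum(per_key, keys):
    """Sum per_key[k] * wS[k] over the given keys."""
    return sum(per_key[k] * w_s[k] for k in keys)

score_c1 = weighted_sum(w_c1, w_c1)                       # 4.0*1.0 + 0.1*0.8
ub_c2 = weighted_sum(ub, [("age", 3), ("gender", "F")])   # 1.0*0.8 + 2.0*0.9
ub_k1 = weighted_sum(ub, [("age", 3)])                    # 1.0*0.8

# With N = 1 and c1 already scored, c2 and the whole K=1 partition are pruned.
prune_c2 = ub_c2 < score_c1
prune_k1 = ub_k1 < score_c1
print(math.isclose(score_c1, 4.08), prune_c2, prune_k1)  # True True True
```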

Ranking Boolean Expressions: CNF Algorithm

First technique same as DNF algorithm

Cannot apply 2nd technique

Example: S: {A=1, C=2}, wS(A, 1) = 0.1 and wS(C, 2) = 0.9

Matching CNFs are c3 and c4.

Score c3 = w3(A, 1) × wS(A, 1) + w3(C, 2) × wS(C, 2) = 0.3 × 0.1 + 2.7 × 0.9 = 2.46

Skip processing c4, as upper bound of c4 = UB(A, 1) × wS(A, 1) + UB(A, 1) × wS(A, 1) = 0.5 × 0.1 + 0.5 × 0.1 = 0.1

wS   Key & UB       Posting List
0.1  (A, 1), 0.5    (1, ∈, 0, 0.1) (2, ∈, 0, 0.3) (3, ∈, 0, 0.3) (4, ∈, 0, 0.1)
0.9  (C, 2), 3.0    (2, ∈, 0, 2.5) (3, ∈, 1, 2.7)
0.1  (A, 1), 0.5    (4, ∈, 1, 0.1)
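In the CNF case the same key can head more than one posting list, so the bound for c4 adds UB(A, 1) twice. The short check below reproduces that calculation. It uses a list of (key, value) pairs instead of a dictionary so that the repeated key is kept. The names are hypothetical, and N = 1 is assumed.

```python
import math

w_s = {("A", 1): 0.1, ("C", 2): 0.9}                # assignment weights wS

# (key, weight) pairs for c3 and (key, UB) pairs for the two lists holding c4.
w_c3 = [(("A", 1), 0.3), (("C", 2), 2.7)]
ub_lists_c4 = [(("A", 1), 0.5), (("A", 1), 0.5)]    # same key, two posting lists

def weighted_sum(pairs):
    """Sum value * wS[key] over (key, value) pairs, duplicates included."""
    return sum(value * w_s[key] for key, value in pairs)

score_c3 = weighted_sum(w_c3)          # 0.3*0.1 + 2.7*0.9
ub_c4 = weighted_sum(ub_lists_c4)      # 0.5*0.1 + 0.5*0.1
skip_c4 = ub_c4 < score_c3             # with N = 1, c4 cannot beat c3
print(math.isclose(score_c3, 2.46), math.isclose(ub_c4, 0.1), skip_c4)
```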

Data Set

Assignments = Display advertising impressions (ad opportunities)

BEs = Synthetic workloads generated from display advertising contracts

- Up to 1 million DNF/CNF BEs generated
- High dimensional (~1500)

DNF algorithm scalability

CNF algorithm scalability

Top-k algorithms results

Conclusion

Proposed algorithms that use inverted indexes to efficiently search matching DNF/CNF BEs

Proposed top-k algorithms for BEs based on relevance scores

Questions???