Promotion Analysis in Multi-Dimensional Space (VLDB 2009). Tianyi Wu, Dong Xin, Qiaozhu Mei...

Post on 25-Dec-2015


Content

Introduction
Promotion analysis problem
Problem definition
Promotiveness measure
The basic query execution framework
Subspace pruning
Object pruning
Promotion cube
Experimental evaluations
Conclusion

Introduction

Promotion has been playing a key role in marketing…

Retailer Category Readership Year #Sales

A Sci & Tech College students 2009 20

A Sci & Tech University students 2009 5

A Comedy University students 2009 9

B Sci & Tech College students 2009 20

B Sci & Tech University students 2009 7

B Fiction University students 2009 5

B Comedy Kindergarten 2009 20

B Comedy College students 2009 10

C Sci & Tech College students 2009 12

… … … … …

A Sci & Tech College students 2010 22

A Sci & Tech University students 2010 4

A Comedy College students 2010 1

B Sci & Tech College students 2010 13

B Sci & Tech University students 2010 30

B Fiction University students 2010 5

B Comedy Kindergarten 2010 20

B Comedy College students 2010 10

C Sci & Tech College students 2010 16

C Comedy Kindergarten 2010 52

Book sales database

Manager of retailer A

What is the rank of our book sales among other retailers?

We ranked 3rd among all book retailers!

Global aggregate result

Retailer #Sales
A 61
B 180
C 80

E.g., to compute the aggregate value of the cell for retailer A, we project all tuples with Retailer = "A" and sum up their sales.

Discover the most interesting subspaces where our brand (Retailer A) is highly ranked among its competitors.


Retailer Category Readership #Sales
A Sci & Tech College students 42
B Sci & Tech College students 33
C Sci & Tech College students 28

We are the top-1 bookseller in the { Readership = College students, Category = Sci & Tech } segment!

Full space versus subspaces:

Full space: global rank. Compares the target with ALL objects in ALL aspects. The result may not be interesting. Low cost: a single SQL query.

Subspaces: local rank. Compares the target with the objects in a certain area. A globally low-ranked object may become prominent in some subspaces. High cost: a naïve approach is to compute the rank for ALL possible subspaces and return the interesting ones.

Promotion query

Terminology for the book sales table: Retailer is the object dimension; Category, Readership, and Year are the subspace dimensions; #Sales is the score dimension; Retailer A is the target object.
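The contrast between the global rank and a local rank can be sketched in a few lines of Python. This is only an illustration: it uses the rows visible in the table above (the rows elided with "…" are not included, so totals for retailers with elided rows can differ from the slide), and the names `ROWS`, `DIM_INDEX`, and `sales_rank` are hypothetical.

```python
from collections import defaultdict

# Visible rows only: (retailer, category, readership, year, sales).
ROWS = [
    ("A", "Sci & Tech", "College students", 2009, 20),
    ("A", "Sci & Tech", "University students", 2009, 5),
    ("A", "Comedy", "University students", 2009, 9),
    ("B", "Sci & Tech", "College students", 2009, 20),
    ("B", "Sci & Tech", "University students", 2009, 7),
    ("B", "Fiction", "University students", 2009, 5),
    ("B", "Comedy", "Kindergarten", 2009, 20),
    ("B", "Comedy", "College students", 2009, 10),
    ("C", "Sci & Tech", "College students", 2009, 12),
    ("A", "Sci & Tech", "College students", 2010, 22),
    ("A", "Sci & Tech", "University students", 2010, 4),
    ("A", "Comedy", "College students", 2010, 1),
    ("B", "Sci & Tech", "College students", 2010, 13),
    ("B", "Sci & Tech", "University students", 2010, 30),
    ("B", "Fiction", "University students", 2010, 5),
    ("B", "Comedy", "Kindergarten", 2010, 20),
    ("B", "Comedy", "College students", 2010, 10),
    ("C", "Sci & Tech", "College students", 2010, 16),
    ("C", "Comedy", "Kindergarten", 2010, 52),
]

# Position of each subspace dimension inside a row.
DIM_INDEX = {"category": 1, "readership": 2, "year": 3}


def sales_rank(rows, target, **subspace):
    """Rank `target` by SUM(sales) among retailers inside `subspace`.

    `subspace` maps a dimension name to a required value; an empty
    mapping means the full space. Returns (rank, totals per retailer).
    """
    totals = defaultdict(int)
    for row in rows:
        if all(row[DIM_INDEX[dim]] == value for dim, value in subspace.items()):
            totals[row[0]] += row[4]
    target_total = totals.get(target, 0)
    # Rank = 1 + number of retailers with a strictly larger total.
    rank = 1 + sum(1 for total in totals.values() if total > target_total)
    return rank, dict(totals)


global_rank, global_totals = sales_rank(ROWS, "A")
local_rank, local_totals = sales_rank(
    ROWS, "A", category="Sci & Tech", readership="College students"
)
print(global_rank, local_rank, local_totals)
```

The Year dimension is left unconstrained in the local query, which is why the sales of 2009 and 2010 are added together in the segment totals.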

Introduction

Person promotion

An NBA manager would like to promote Michael Jordan as a superstar. He is the 3rd all-time leading scorer. Further analysis (local rank in some subspaces):

Top scorer in the guard position.
Top scorer on the Chicago Bulls team.
11 individual years' scoring champion.

Player Position Team Year Game … Score
Michael Jordan Guard Chicago Bulls 1998 vs N.Y. Knicks … 33
Michael Jordan Guard Chicago Bulls 1998 vs Utah Jazz … 15
Scottie Pippen Small Forward Chicago Bulls 1998 vs Utah Jazz … 18
… … … … … … …

Here Player is the object dimension and Michael Jordan is the target object; Position, Team, Year, Game, … are the subspace dimensions; Score is the score dimension.

Introduction

The promotion query problem

Given: an object (e.g., a product or a person).

Goal: discover the most interesting subspaces where the object is highly ranked.

Problem Definition

Promotiveness measure

Problem Definition

Object Location Year Score
T1 NY 2008 0.5
T1 WA 2008 0.8
T2 WA 2007 1.0
T2 WA 2008 1.0
T3 NY 2007 0.3
T3 WA 2007 0.6
T3 WA 2008 0.7

Object is the object dimension, Location and Year are the subspace dimensions, and Score is the score dimension.

Query
Target object: T1
Aggregation measure: SUM
Goal: Discover the most interesting subspaces where T1 is highly ranked.

All possible subspaces:

{*}
{NY} {WA} {2007} {2008}
{NY,2007} {NY,2008} {WA,2007} {WA,2008}

Note that the target object T1 appears only in Year = 2008, so the subspace {2007} and its child subspaces {NY,2007} and {WA,2007} can be pruned.

Target subspaces of T1:

{*}
{NY} {WA} {2008}
{NY,2008} {WA,2008}
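A small sketch of how the target subspaces could be enumerated: every tuple of the target object is generalized in all possible ways, so subspaces in which the target never appears (such as {2007}) are never generated. The names `DATA` and `target_subspaces`, and the use of "*" as a wildcard, are illustrative assumptions rather than the authors' implementation.

```python
from itertools import product

# Example table from the slides: (object, location, year, score).
DATA = [
    ("T1", "NY", 2008, 0.5),
    ("T1", "WA", 2008, 0.8),
    ("T2", "WA", 2007, 1.0),
    ("T2", "WA", 2008, 1.0),
    ("T3", "NY", 2007, 0.3),
    ("T3", "WA", 2007, 0.6),
    ("T3", "WA", 2008, 0.7),
]


def target_subspaces(data, target):
    """Return every subspace that contains at least one tuple of `target`.

    A subspace is a tuple with one entry per subspace dimension; "*"
    means the dimension is unconstrained.
    """
    subspaces = set()
    for obj, *dims, _score in data:
        if obj != target:
            continue
        # Each dimension is either kept at its value or generalized to "*".
        for keep_mask in product([True, False], repeat=len(dims)):
            subspaces.add(
                tuple(value if keep else "*" for value, keep in zip(dims, keep_mask))
            )
    return subspaces


t1_subspaces = target_subspaces(DATA, "T1")
print(sorted(t1_subspaces, key=str))
```

For T1 this yields six subspaces, matching the lattice above: ("*", "*") stands for {*}, ("NY", "*") for {NY}, and so on.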

Subspace {*}: SUM(T1) = 1.3, Rank(T1) = 3rd / 3

Object SUM(Score)
T1 0.5 + 0.8 = 1.3
T2 1 + 1 = 2
T3 0.3 + 0.6 + 0.7 = 1.6

We project all tuples of T1 into this cell and sum up their scores.

Subspace {2008}: SUM(T1) = 1.3, Rank(T1) = 1st / 3

Object Year SUM(Score)
T1 2008 0.5 + 0.8 = 1.3
T2 2008 1
T3 2008 0.7

We project all tuples of T1 with Year = "2008" into this cell and sum up their scores.

Subspace {NY,2008}: SUM(T1) = 0.5, Rank(T1) = 1st / 1

Object Location Year SUM(Score)
T1 NY 2008 0.5
T2 NY 2008 no tuples
T3 NY 2008 no tuples

We project all tuples of T1 with Location = "NY" and Year = "2008" into this cell and sum up their scores.

T1 ranks 1st in both {2008} and {NY,2008}. Which one is more interesting?
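The per-subspace aggregation and ranking can be reproduced with a short Python sketch. It assumes the SUM measure and the example table; `rank_in_subspace` and the "*" wildcard are hypothetical names chosen for illustration.

```python
# Example table from the slides: (object, location, year, score).
DATA = [
    ("T1", "NY", 2008, 0.5),
    ("T1", "WA", 2008, 0.8),
    ("T2", "WA", 2007, 1.0),
    ("T2", "WA", 2008, 1.0),
    ("T3", "NY", 2007, 0.3),
    ("T3", "WA", 2007, 0.6),
    ("T3", "WA", 2008, 0.7),
]


def rank_in_subspace(data, target, subspace):
    """Return (SUM of target, rank of target, number of objects) in `subspace`.

    `subspace` has one entry per dimension; "*" leaves it unconstrained.
    Returns None if the target has no tuple in the subspace.
    """
    totals = {}
    for obj, *dims, score in data:
        if all(s == "*" or s == v for s, v in zip(subspace, dims)):
            totals[obj] = totals.get(obj, 0.0) + score
    if target not in totals:
        return None
    # Rank = 1 + number of objects with a strictly larger aggregate.
    rank = 1 + sum(1 for total in totals.values() if total > totals[target])
    return totals[target], rank, len(totals)


for subspace in [("*", "*"), ("*", 2008), ("NY", 2008)]:
    print(subspace, rank_in_subspace(DATA, "T1", subspace))
```

Both {2008} and {NY,2008} give rank 1, but the third returned value (the number of objects in the subspace) differs, which is the kind of information a significance measure can use to tell them apart.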

Problem Definition

Promotiveness of a subspace S: a class of measures that quantify how well a subspace S can promote the target object T.

Rank of the target object, Rank(S, T): a higher rank means more promotive.

Significance of the subspace, Sig(S): a more significant subspace (e.g., one with more objects) means more promotive.

P(S, T) = f(Rank(S, T)) * g(Sig(S))

Examples:
P(S, T) = Rank^{-1}(S, T)
P(S, T) = Rank^{-1}(S, T) * ObjectCount(S)
P(S, T) = Rank^{-1}(S, T) * I(ObjectCount(S) > MinSig)
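The three example measures can be written down directly. The sketch below applies them to the ranks and object counts computed earlier for {*}, {2008}, and {NY,2008}; the function names and the choice MinSig = 1 are assumptions made for illustration.

```python
def p_rank_only(rank, object_count):
    """P(S, T) = Rank^{-1}(S, T)."""
    return 1.0 / rank


def p_rank_times_count(rank, object_count):
    """P(S, T) = Rank^{-1}(S, T) * ObjectCount(S)."""
    return object_count / rank


def make_p_min_sig(min_sig):
    """P(S, T) = Rank^{-1}(S, T) * I(ObjectCount(S) > MinSig)."""
    def measure(rank, object_count):
        return 1.0 / rank if object_count > min_sig else 0.0
    return measure


# (rank of T1, number of objects) for three subspaces of the running example.
STATS = {
    "{*}": (3, 3),
    "{2008}": (1, 3),
    "{NY,2008}": (1, 1),
}


def score_all(stats, measure):
    """Apply a promotiveness measure to every subspace."""
    return {subspace: measure(rank, count) for subspace, (rank, count) in stats.items()}


by_rank = score_all(STATS, p_rank_only)
by_count = score_all(STATS, p_rank_times_count)
by_min_sig = score_all(STATS, make_p_min_sig(1))
print(by_rank, by_count, by_min_sig)
```

Under the rank-only measure {2008} and {NY,2008} tie, while both significance-aware measures prefer {2008}, which contains three objects instead of one.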

Problem Definition

The promotion query problem

Input: a target object T and a top-R parameter.

Output: the top-R subspaces with the largest P(S, T) scores.

Assume the simple ranking model P(S, T) = Rank^{-1}(S, T).

Query processing methods

Query execution framework

Basic framework: use a recursive process that partitions and aggregates the data to compute the target object's rank in each subspace. Starting from the full data set, the process alternates between a partition step and an aggregation step.

Object Location Year Score
T1 NY 2008 0.5
T1 WA 2008 0.8
T2 WA 2007 1.0
T2 WA 2008 1.0
T3 NY 2007 0.3
T3 WA 2007 0.6
T3 WA 2008 0.7

Start from the coarsest subspace {*}.

{*}: SUM(T1) = 1.3, Rank(T1) = 3rd / 3

Partition the data on the first dimension (i.e., Location). Generate the candidate subspaces {NY} and {WA} by substituting values in that dimension.

Object Location Year Score
T1 NY 2008 0.5
T3 NY 2007 0.3
T2 WA 2007 1.0
T2 WA 2008 1.0
T1 WA 2008 0.8
T3 WA 2007 0.6
T3 WA 2008 0.7

Recursively operate on the child subspace {NY} and perform aggregation. T1 ranks 1st among two objects (i.e., T1 and T3).

{NY}: SUM(T1) = 0.5, Rank(T1) = 1st / 2

Partition the {NY} data on the next dimension (i.e., Year). Generate the candidate subspaces {NY,2007} and {NY,2008} by substituting values in that dimension.

Recursively operate on the child subspaces. The target object T1 does not appear in {NY,2007}, so that subspace is pruned.

{NY,2007}: pruned
{NY,2008}: SUM(T1) = 0.5, Rank(T1) = 1st / 1

Move on to the second Location partition, {WA}, and perform aggregation.

{WA}: SUM(T1) = 0.8, Rank(T1) = 3rd / 3

Partition the {WA} data on Year to generate the candidate subspaces {WA,2007} and {WA,2008}.

Object Location Year Score
T2 WA 2007 1.0
T3 WA 2007 0.6
T1 WA 2008 0.8
T2 WA 2008 1.0
T3 WA 2008 0.7

T1 does not appear in {WA,2007}, so that subspace is pruned.

{WA,2007}: pruned
{WA,2008}: SUM(T1) = 0.8, Rank(T1) = 2nd / 3

Finally, partition the full data set on Year alone to generate the candidate subspaces {2007} and {2008}.

Object Location Year Score
T3 WA 2007 0.6
T3 NY 2007 0.3
T2 WA 2007 1.0
T1 NY 2008 0.5
T1 WA 2008 0.8
T2 WA 2008 1.0
T3 WA 2008 0.7

T1 does not appear in {2007}, so that subspace is pruned.

{2007}: pruned
{2008}: SUM(T1) = 1.3, Rank(T1) = 1st / 3

Finish.

Results for all subspaces:

{*}: SUM(T1) = 1.3, Rank(T1) = 3rd / 3
{NY}: SUM(T1) = 0.5, Rank(T1) = 1st / 2
{WA}: SUM(T1) = 0.8, Rank(T1) = 3rd / 3
{2008}: SUM(T1) = 1.3, Rank(T1) = 1st / 3
{NY,2008}: SUM(T1) = 0.5, Rank(T1) = 1st / 1
{WA,2008}: SUM(T1) = 0.8, Rank(T1) = 2nd / 3
{2007}, {NY,2007}, {WA,2007}: pruned

Return the top-3 subspaces.

With P(S, T) = Rank^{-1}(S, T), the top-3 subspaces are {NY}, {2008}, and {NY,2008}; T1 ranks 1st in each of them.

With P(S, T) = Rank^{-1}(S, T) * I(ObjCount(S) > 1), the subspace {NY,2008} contains only one object and scores 0, so the top-3 subspaces are {NY}, {2008}, and {WA,2008}.
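The recursive partition-and-aggregate walk-through can be sketched as follows. This is a minimal illustration under the SUM measure, not the authors' code: `promotion_query`, its parameters, and the "*" wildcard are hypothetical, and each partition is only split on later dimensions so that every subspace is visited exactly once.

```python
# Example table from the slides: (object, location, year, score).
DATA = [
    ("T1", "NY", 2008, 0.5),
    ("T1", "WA", 2008, 0.8),
    ("T2", "WA", 2007, 1.0),
    ("T2", "WA", 2008, 1.0),
    ("T3", "NY", 2007, 0.3),
    ("T3", "WA", 2007, 0.6),
    ("T3", "WA", 2008, 0.7),
]


def promotion_query(data, target, num_dims, top_r, measure):
    """Return (top-R subspaces, {subspace: (rank, object count)})."""
    results = {}

    def recurse(rows, subspace, start_dim):
        # Aggregation step: SUM(score) per object inside this subspace.
        totals = {}
        for obj, *_dims, score in rows:
            totals[obj] = totals.get(obj, 0.0) + score
        if target not in totals:
            return  # the target does not appear here: prune
        rank = 1 + sum(1 for t in totals.values() if t > totals[target])
        results[subspace] = (rank, len(totals))
        # Partition step: split on each remaining dimension in turn.
        for dim in range(start_dim, num_dims):
            partitions = {}
            for row in rows:
                partitions.setdefault(row[1 + dim], []).append(row)
            for value, part in partitions.items():
                child = subspace[:dim] + (value,) + subspace[dim + 1:]
                recurse(part, child, dim + 1)

    recurse(data, ("*",) * num_dims, 0)
    ranked = sorted(results, key=lambda s: -measure(*results[s]))
    return ranked[:top_r], results


top3_rank, all_results = promotion_query(
    DATA, "T1", 2, 3, lambda rank, count: 1.0 / rank
)
top3_sig, _ = promotion_query(
    DATA, "T1", 2, 3, lambda rank, count: 1.0 / rank if count > 1 else 0.0
)
print(top3_rank, top3_sig)
```

Because every target subspace is still aggregated in full, the cost grows with the number of subspaces, which is what pruning techniques aim to reduce.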

Query execution framework

The basic execution framework computes ALL subspaces, so the overall cost could be prohibitive for large datasets.

We therefore develop optimization techniques based on thresholding:
Subspace pruning
Object pruning

Efficient computation methods
Subspace pruning
Object pruning

Subspace pruning

Key motivation: If the upper bound of the promotiveness score of an unseen subspace is lower than the current top-R promotiveness score, we can prune the subspace.

How do we obtain an upper bound on the promotiveness scores of the unseen subspaces? With P(S, T) = Rank^{-1}(S, T), it suffices to obtain a lower bound on the rank.

Assumptions: The aggregation measure is a monotone function (e.g., SUM over non-negative scores). The Sig measure is also monotone.

Key observations

{*}
{A} {B} {C}
{AB} {AC} {BC}
{ABC}

Observation 1: Every object in a child subspace must also be a member of its parent subspace.

Observation 2: An object's aggregate score in a child subspace must be smaller than or equal to its aggregate score in the parent subspace.

Subspace pruning: example

Target object: t7. Return the top-1 promotive subspace, with P(S, t7) = Rank^{-1}(S, t7).

Subspace lattice:
S1 = {a}
S2 = {ab} S4 = {ac}
S3 = {abc}

Initialization step: We first scan the dataset once to calculate the aggregate of the target object t7 in each subspace.

Subspace Aggregate of target object
S1 = {a} 0.7
S2 = {ab} 0.6
S3 = {abc} 0.3
S4 = {ac} 0.4

Start: Compute the aggregates of the objects in subspace S1 = {a}, then in S2 = {ab} and S3 = {abc}, updating the current top-1 result after each subspace.

S1 = {a}: t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2). Aggregate of t7 = 0.7, Rank = 3, P = 1/3. Current top-1 result: S1 (1/3).

S2 = {ab}: t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2). Aggregate of t7 = 0.6, Rank = 2, P = 1/2. Current top-1 result: S2 (1/2).

S3 = {abc}: t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1). Aggregate of t7 = 0.3, Rank = 3, P = 1/3. Current top-1 result: S2 (1/2).

Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Current top-1 result : S2(1/2) Can we deduce a lower bound of Rank of S4?

Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Current top-1 result : S2(1/2) Can we deduce a lower bound of Rank of S4?

Given the tuples in S3, can we deduce some of the members of S4?

Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} t3(?) t6(?) t7(?) t1(?) t5(?) 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Can we deduce a lower bound of Rank of S4?

Can we say something about the scores of these members?

Current top-1 result : S2(1/2)

Tuples in the child subspace must also appear in the parent subspaces.

Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} t3(?) t6(?) t7(0.4) t1(?) t5(?) 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Can we deduce a lower bound of Rank of S4?

Can we say something about the scores of these members?

Current top-1 result : S2(1/2)

Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} t3(0.6) t6(0.5) t7(0.4) t1(0.1) t5(0.1) 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Can we deduce a lower bound of Rank of S4?

Can we say something about the scores of these members?

Current top-1 result : S2(1/2)

The aggregate score of a tuple in the child subspace must be smaller than or equal to its score in the parent subspace.

Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} t3(0.6) t6(0.5) t7(0.4) t1(0.1) t5(0.1) 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Can we deduce a lower bound of Rank of S4?

Current top-1 result : S2(1/2)

?

Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} t3(0.6) t6(0.5) t7(0.4) t1(0.1) t5(0.1) 0.4 >= 3

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Can we deduce a lower bound of Rank of S4?

Current top-1 result : S2(1/2)

Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} t3(0.6) t6(0.5) t7(0.4) t1(0.1) t5(0.1) 0.4 >= 3 ?

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

S4 Pruned!!!

Current top-1 result : S2(1/2)

At least two objects (t3 and t6) score higher than t7 in S4, so the rank of t7 in S4 is at least 3 and the promotive score of S4 is less than or equal to 1/3. This is less than the current top-1 promotive score (1/2), so S4 can be pruned.
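The bound used to prune S4 can be sketched in a few lines of Python. This is only an illustration of the reasoning on the slides, not the authors' implementation: the function name `rank_lower_bound` is hypothetical, the aggregate is assumed to be monotone (a score in the parent subspace is never smaller than in the child), and rank is assumed to be 1 plus the number of objects with a strictly larger score, as the S1 and S2 rows suggest.

```python
def rank_lower_bound(child_scores, target, target_score_in_parent):
    """Lower bound on the target's rank in a parent subspace.

    Assumption: every object's score in the parent is >= its score in the
    child, so any object whose child score already exceeds the target's
    parent score is certainly ranked ahead of the target in the parent.
    """
    certainly_ahead = sum(
        1
        for obj, score in child_scores.items()
        if obj != target and score > target_score_in_parent
    )
    return certainly_ahead + 1


# Scores of the child subspace S3 = {abc}, taken from the slide.
s3 = {"t3": 0.6, "t6": 0.5, "t7": 0.3, "t1": 0.1, "t5": 0.1}

# The aggregate of t7 in the parent S4 = {ac} is 0.4.
bound = rank_lower_bound(s3, "t7", 0.4)

# Current top-1 promotiveness is 1/2 (from S2); P = 1 / Rank.
best_p = 0.5
can_prune = 1 / bound < best_p
print(bound, can_prune)
```

Only t3 and t6 are known to exceed 0.4, so the bound is 3 and the best possible promotiveness of S4 is 1/3, which cannot beat 1/2.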

Subspace pruning

Object pruning

Key motivation: try to prune objects by obtaining an upper bound on the aggregate score of unseen objects.

Unseen objects whose upper bound is no larger than the smallest aggregate score of the target object can be pruned.

Object pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Minimum of the aggregate scores of t7: = min{0.7, 0.6, 0.3, 0.4} = 0.3

Object pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t1(0.7) t7(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Minimum of the aggregate scores of t7: = min{0.7, 0.6, 0.3, 0.4} = 0.3

Question: Can we prune some objects in the subtree of S1?

Object pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t1(0.7) t7(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Minimum of the aggregate scores of t7: = min{0.7, 0.6, 0.3, 0.4} = 0.3

Question: Can we prune some objects in the subtree of S1?

0.2 is the upper bound of the aggregate scores of t2 in the subtree of S1. i.e. the aggregate score will only be <= 0.2 !!!

Object pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t1(0.7) t7(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Minimum of the aggregate scores of t7: = min{0.7, 0.6, 0.3, 0.4} = 0.3

Since the minimum of the aggregate scores of t7 is 0.3, the aggregate scores of t2 will not affect the rank of t7 in the subtree of S1.

0.2 is the upper bound of the aggregate scores of t2 in the subtree of S1. i.e. the aggregate score will only be <= 0.2 !!!

Object pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t1(0.7) t7(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Minimum of the aggregate scores of t7: = min{0.7, 0.6, 0.3, 0.4} = 0.3

Since the minimum of the aggregate scores of t7 is 0.3, the aggregate scores of t2 will not affect the rank of t7 in the subtree of S1.

Pruned!!!

0.2 is the upper bound of the aggregate scores of t2 in the subtree of S1. i.e. the aggregate score will only be <= 0.2 !!!

Object pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t1(0.7) t7(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Minimum of the aggregate scores of t7: = min{0.7, 0.6, 0.3, 0.4} = 0.3

Similarly, t4 and t5 (upper bound 0.3, which does not exceed the minimum 0.3) can also be pruned!!!
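A minimal sketch of this object-pruning step, assuming that an object's score in S1 is an upper bound on its score anywhere in the subtree of S1 and that ties do not push the target down (rank counts only strictly larger scores). The helper name `prune_objects` is hypothetical.

```python
def prune_objects(parent_scores, target, target_scores_in_subtree):
    """Split objects into those that may still outrank the target and those that cannot.

    An object is kept only if its upper bound (its score in the parent
    subspace) is strictly larger than the smallest aggregate score of the
    target over the subtree.
    """
    threshold = min(target_scores_in_subtree)
    kept = {
        obj: score
        for obj, score in parent_scores.items()
        if obj == target or score > threshold
    }
    pruned = sorted(obj for obj in parent_scores if obj not in kept)
    return kept, pruned


# Scores in S1 = {a} and the aggregates of t7 in S1..S4, taken from the slide.
s1 = {"t6": 1.2, "t3": 1.0, "t1": 0.7, "t7": 0.7, "t4": 0.3, "t5": 0.3, "t2": 0.2}
kept, pruned = prune_objects(s1, "t7", [0.7, 0.6, 0.3, 0.4])
print(pruned)
```

With the threshold min{0.7, 0.6, 0.3, 0.4} = 0.3, the objects t2, t4 and t5 drop out and only t6, t3 and t1 need to be aggregated alongside t7 in the descendants of S1.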

Object pruning

Promotion cube

Promotion cube

Promotion cell: Given a subspace S, a promotion cell S.Pcell is defined as the sequence of the top-k largest object aggregate scores in S.

Promotion cube: The promotion cube consists of a set of triples in the format (S, S.Pcell, Sig), where Sig is the significance of the subspace S.
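As a rough illustration of these two definitions, a promotion cell can be materialized by keeping only the k largest aggregate scores of a subspace. The function name `build_pcell` and the tuple layout are assumptions made for this sketch; the S4 scores are the ones listed in the appendix slide, and Sig = 1 follows the example tables.

```python
import heapq


def build_pcell(scores, k):
    """Return the top-k largest aggregate scores in descending order (objects are dropped)."""
    return heapq.nlargest(k, scores.values())


# Full aggregate scores of S4 = {ac}.
s4 = {"t3": 0.8, "t6": 0.7, "t1": 0.6, "t7": 0.4, "t5": 0.2, "t2": 0.1, "t4": 0.1}

# One (S, S.Pcell, Sig) triple of the promotion cube, with k = 3.
cube = [("ac", build_pcell(s4, 3), 1)]
print(cube)
```

Storing scores without object identifiers keeps each cell small while still allowing rank bounds to be derived at query time.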

Promotion cube

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Subspace Top-3 largest aggregate scores in the subspace

Sig

S1 = {a} (1.2) (1.0) (0.7) 1

S2 = {ab} (0.7) (0.6) (0.6) 1

S3 = {abc} (0.6) (0.5) (0.3) 1

S4 = {ac} (0.8) (0.7) (0.6) 1

In the promotion cube, we precompute the top-3 largest aggregate scores (not the objects) in each subspace.

Promotion cube

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Subspace Top-3 largest aggregate scores in the subspace

Sig

S1 = {a} (1.2) (1.0) (0.7) 1

S2 = {ab} (0.7) (0.6) (0.6) 1

S3 = {abc} (0.6) (0.5) (0.3) 1

S4 = {ac} (0.8) (0.7) (0.6) 1

In the promotion cube, we precompute the top-3 largest aggregate scores (not the objects) in each subspace.

Subspace Top-3 largest aggregate scores in the subspace

Sig

S1 = {a} (1.2) (1.0) (0.7) 1

S2 = {ab} (0.7) (0.6) (0.6) 1

S3 = {abc} (0.6) (0.5) (0.3) 1

S4 = {ac} (0.8) (0.7) (0.6) 1

Promotion cube

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Can you tell the exact rank of t7 in S1? The aggregate score of t7 is 0.7, and there are 2 other objects with an aggregate value larger than that of t7!

Subspace Top-3 largest aggregate scores in the subspace

Sig

S1 = {a} (1.2) (1.0) (0.7) 1

S2 = {ab} (0.7) (0.6) (0.6) 1

S3 = {abc} (0.6) (0.5) (0.3) 1

S4 = {ac} (0.8) (0.7) (0.6) 1

Promotion cube

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} No need to compute 0.7 3 1/3

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Current top-1 result : S1(1/3)

Can you tell the exact rank of t7 in S1? The aggregate score of t7 is 0.7, and there are 2 other objects with an aggregate value larger than that of t7!

Subspace Top-3 largest aggregate scores in the subspace

Sig

S1 = {a} (1.2) (1.0) (0.7) 1

S2 = {ab} (0.7) (0.6) (0.6) 1

S3 = {abc} (0.6) (0.5) (0.3) 1

S4 = {ac} (0.8) (0.7) (0.6) 1

Promotion cube

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} No need to compute 0.7 3 1/3

S2 = {ab} No need to compute 0.6 2 1/2

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Current top-1 result : S2(1/2)

Subspace Top-3 largest aggregate scores in the subspace

Sig

S1 = {a} (1.2) (1.0) (0.7) 1

S2 = {ab} (0.7) (0.6) (0.6) 1

S3 = {abc} (0.6) (0.5) (0.3) 1

S4 = {ac} (0.8) (0.7) (0.6) 1

Promotion cube

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} No need to compute 0.7 3 1/3

S2 = {ab} No need to compute 0.6 2 1/2

S3 = {abc} No need to compute 0.3 3 1/3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Current top-1 result : S2(1/2)

Subspace Top-3 largest aggregate scores in the subspace

Sig

S1 = {a} (1.2) (1.0) (0.7) 1

S2 = {ab} (0.7) (0.6) (0.6) 1

S3 = {abc} (0.6) (0.5) (0.3) 1

S4 = {ac} (0.8) (0.7) (0.6) 1

Promotion cube

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} No need to compute 0.7 3 1/3

S2 = {ab} No need to compute 0.6 2 1/2

S3 = {abc} No need to compute 0.3 3 1/3

S4 = {ac} No need to compute 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Current top-1 result : S2(1/2)

Can you tell the exact rank of t7 in S4? No! The aggregate score of t7 is 0.4, and there are at least 3 objects with an aggregate value larger than that of t7!

Subspace Top-3 largest aggregate scores in the subspace

Sig

S1 = {a} (1.2) (1.0) (0.7) 1

S2 = {ab} (0.7) (0.6) (0.6) 1

S3 = {abc} (0.6) (0.5) (0.3) 1

S4 = {ac} (0.8) (0.7) (0.6) 1

Promotion cube

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} No need to compute 0.7 3 1/3

S2 = {ab} No need to compute 0.6 2 1/2

S3 = {abc} No need to compute 0.3 3 1/3

S4 = {ac} No need to compute 0.4 >=3

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

S4 Pruned!!!

Current top-1 result : S2(1/2)

Can you tell the exact rank of t7 in S4? No! The aggregate score of t7 is 0.4, and there are at least 3 objects with an aggregate value larger than that of t7!
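The lookups walked through above can be sketched as follows. This is an assumed reading of the slides rather than the published PromoCube algorithm: `rank_from_pcell` is a hypothetical helper, and rank is again taken to be 1 plus the number of strictly larger scores. Note that with three larger scores the sketch yields a bound of 4 for S4, slightly tighter than the ">=3" shown on the slide; either bound is enough to prune S4.

```python
def rank_from_pcell(pcell, target_score, k):
    """Return (rank, exact) for the target using only the precomputed top-k scores.

    If fewer than all stored scores exceed the target's score, every object
    outside the cell scores no higher than the target, so the rank is exact.
    Otherwise only a lower bound on the rank is known.
    """
    larger = sum(1 for s in pcell if s > target_score)
    exact = larger < len(pcell) or len(pcell) < k
    return larger + 1, exact


# Promotion cells (k = 3) and the aggregates of t7, taken from the slides.
cube = {
    "a": [1.2, 1.0, 0.7],
    "ab": [0.7, 0.6, 0.6],
    "abc": [0.6, 0.5, 0.3],
    "ac": [0.8, 0.7, 0.6],
}
t7 = {"a": 0.7, "ab": 0.6, "abc": 0.3, "ac": 0.4}

bounds = {s: rank_from_pcell(cube[s], t7[s], 3) for s in cube}

# Best promotiveness (P = 1 / Rank) among subspaces whose rank is already exact.
best_p = max(1 / rank for rank, exact in bounds.values() if exact)

# Subspaces that would still need aggregation: inexact and possibly better than best_p.
needs_aggregation = [
    s for s, (rank, exact) in bounds.items() if not exact and 1 / rank > best_p
]
print(bounds, best_p, needs_aggregation)
```

For this example every subspace is either answered exactly from its cell (S1, S2, S3) or pruned by its bound (S4), so no aggregation is required.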

Experimental evaluations

Settings

Implementation

Pentium 3GHz processor

2GB of memory

160G hard disk

WinXP / Microsoft Visual C# 2008 (in-memory)

Dataset

DBLP Dataset

TPC-H

Algorithms

PromoRank: The basic query execution framework.

PromoRank++: The basic query execution framework with subspace pruning and object pruning.

PromoCube

DBLP Dataset

Subspace dimensions: Conference (2,506), Year (50), Database (boolean), Data mining (boolean), Information retrieval (boolean), Machine learning (boolean)

Object dimension: Author (450K)

Score dimension: Paper count

Base tuples: 1.76M

DBLP Dataset

The running time increases as R increases, because the pruning threshold is determined by the current top-R’s aggregate score, and the threshold becomes looser as R becomes larger.

PromoCube performs extremely well when R is small, because in that case PromoCube can directly return the result using O(1) lookup time.

DBLP Dataset

[Figure: number of subspace aggregations]

PromoCube performs extremely well when R is small, because in that case PromoCube can directly return the result using O(1) lookup time.

The running time increases as R increases, because the pruning threshold is determined by the current top-R’s aggregate score, and the threshold becomes looser as R becomes larger.

TPC-H Dataset

Subspace dimensions: l_shipdate (2526), l_quantity (50), l_discount (11), l_tax (9), l_linenumber (7), l_returnflag (3)

Object dimension: l_suppkey (10,000)

Score dimension: l_extendedprice (ranges from 901.00 to 104949.50)

Base tuples: 6,001,215

TPC-H

The gap between PromoRank and PromoRank++ is not large when the number of dimensions is small. This is because the total number of target subspaces is itself quite small, so there is less chance to perform pruning that exploits the parent-child relationship.

PromoCube is increasingly faster as the number of tuples grows. This is because the actual aggregation and partition cost saving of PromoCube is much larger: PromoCube prunes subspaces before any aggregation happens, but PromoRank++ prunes subspaces during the aggregation process.

Runtime increases when dimensionality increases. This is because there will be more target subspaces when there are more dimensions.

TPC-H

All algorithms run faster when there are more objects, because more objects means fewer target subspaces for each object. With other parameters unchanged, if there are more objects, each object will appear in fewer tuples, resulting in fewer target subspaces for each object.

Both PromoRank++ and PromoCube favor large cardinalities, because…

With other parameters unchanged, larger cardinality implies more subspaces.

With the same number of tuples, the chance of two tuples having the same dimension values becomes lower.

Therefore, it is more likely that the aggregate scores would be equal across parent-child subspaces, thereby providing a tighter lower bound for Rank.

TPC-H

{*}

{NY} {WA} {2007} {2008}

{NY,2007} {NY,2008} {WA,2007} {WA,2008}

{*}

{NY}

{NY,2007}

{2007}

{NY,2008}

{2008}

Both PromoRank++ and PromoCube favor large cardinalities, because…

With other parameters unchanged, larger cardinality implies more subspaces.

With the same number of tuples, the chance of two tuples having the same dimension value becomes lower.

Therefore, it is more likely that the aggregate scores would be equal across parent-child subspaces, thereby providing a tighter lower bound for Rank.

TPC-H

{*}

{NY} {WA} {2007} {2008}

{NY,2007} {NY,2008} {WA,2007} {WA,2008}

Object Location Year Score

T1 NY 2007 0.6

T2 NY 2008 0.4

Object Location Year Score

T1 NY 2007 0.6

T2 WA 2008 0.4

{*}

{NY}

{NY,2007}

{2007}

{NY,2008}

{2008}

Both PromoRank++ and PromoCube favor large cardinalities, because…

With other parameters unchanged, larger cardinality implies more subspaces.

With the same number of tuples, the chance of two objects having the same dimension value becomes lower (sparse).

Therefore, it is more likely that the aggregate scores would be equal across parent-child subspaces, thereby providing a tighter lower bound for Rank.

0.6 0.4

1

0.6 0.4

0.6 0.4

Conclusion

Introduced the promotion analysis problem.

Presented a basic query execution framework.

Proposed two pruning techniques and the Promotion Cube for efficient query processing.

The End

Appendix

Object pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t1(0.7) t7(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t3(0.6) t7(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} t3(0.8) t6(0.7) t1(0.6) t7(0.4) t5(0.2) t2(0.1) t4(0.1) 0.4 4 1/4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)

Introduction

Promotion has been playing a key role in marketing…

Manager

Query execution framework

Basic framework: use a recursive process to partition and aggregate the data to compute the target object’s rank in each subspace.

Depth-first manner

{*}

{A} {B} {C} {D}

{AB} {AC} {AD} {BC} {BD} {CD}

{ABC} {ABD} {ACD} {BCD}

{ABCD}
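The recursive partition-and-aggregate idea can be sketched as below. This is a simplified illustration, not the PromoRank code: the function name `promo_rank`, the row layout, the SUM aggregate, and the sample sales rows are all made up for the example, and no pruning is applied.

```python
from collections import defaultdict


def promo_rank(rows, dims, target):
    """Depth-first enumeration of subspaces, returning {subspace: rank of target}.

    rows: list of (object, {dimension: value}, score).
    A subspace is a tuple of (dimension, value) pairs; () stands for {*}.
    """
    results = {}

    def recurse(part, subspace, start):
        # Aggregate (SUM, an assumption) the scores of each object in this partition.
        totals = defaultdict(float)
        for obj, _, score in part:
            totals[obj] += score
        if target not in totals:
            return  # the target has no tuples here, nor in any descendant
        own = totals[target]
        results[subspace] = 1 + sum(1 for s in totals.values() if s > own)
        # Partition on each remaining dimension and descend depth-first.
        for i in range(start, len(dims)):
            groups = defaultdict(list)
            for row in part:
                groups[row[1][dims[i]]].append(row)
            for value, group in groups.items():
                recurse(group, subspace + ((dims[i], value),), i + 1)

    recurse(rows, (), 0)
    return results


# Hypothetical base tuples: (retailer, dimensions, sales).
rows = [
    ("A", {"Category": "SciTech", "Year": 2009}, 25),
    ("A", {"Category": "Comedy", "Year": 2009}, 9),
    ("B", {"Category": "SciTech", "Year": 2009}, 27),
    ("B", {"Category": "Comedy", "Year": 2009}, 30),
    ("C", {"Category": "SciTech", "Year": 2009}, 12),
    ("A", {"Category": "SciTech", "Year": 2010}, 26),
    ("B", {"Category": "SciTech", "Year": 2010}, 13),
    ("C", {"Category": "SciTech", "Year": 2010}, 16),
]
ranks = promo_rank(rows, ["Category", "Year"], "A")
for subspace, rank in ranks.items():
    print(subspace, rank)
```

Restricting the recursion to increasing dimension indices visits each subspace once, which mirrors the depth-first traversal of the lattice shown on the slide.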

TPC-H

Effectiveness of promotion query
