dp idp exploredb

Skyline Ranking à la IR
George Valkanas¹, Apostolos N. Papadopoulos², Dimitrios Gunopulos¹
¹ University of Athens, Greece; ² Aristotle University of Thessaloniki, Greece
1st ExploreDB Workshop, Athens, Greece, 28th March, 2014

Post on 11-Aug-2014


DESCRIPTION

Ranking skyline points with an IR-style weighting scheme

TRANSCRIPT

Page 1: Dp idp exploredb

George Valkanas¹, Apostolos N. Papadopoulos², Dimitrios Gunopulos¹

Skyline Ranking à la IR

¹ University of Athens, Greece
² Aristotle University of Thessaloniki, Greece

1st ExploreDB Workshop
Athens, Greece
28th March, 2014

Page 2: Dp idp exploredb

Skyline Problem Introduction

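• Dataset $D = (p_1, p_2, \ldots, p_n)$ in $d$-dimensional space
• Preferences for each dimension: min, max
• $p$ dominates $q$ iff $p_i \le q_i \;\forall i = 1, \ldots, d$ and $\exists j : p_j < q_j$ (written here with min preferred on every dimension)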
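The dominance test above translates almost directly into code. The following is a minimal sketch, assuming smaller values are preferred on every dimension; the names `dominates` and `skyline` are illustrative and do not come from the slides, and the quadratic scan is only meant to make the definition concrete.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """True if p dominates q (assumption: smaller is better on every dimension)."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


data = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
print(skyline(data))
```

On this toy input, (3, 3) and (5, 5) are both dominated by (2, 2), so only the other three points remain in the skyline.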

Page 3: Dp idp exploredb

Usefulness of Skyline

• Multi-Objective optimization

• Exploratory Search

• Improve Recommendations

• Data summarization technique

• Building block for defining competitiveness

Page 4: Dp idp exploredb

Skyline Cardinality Explosion

$O((\ln n)^{d-1})$

• Skyline becomes too large to inspect manually
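To get a feel for how quickly the $(\ln n)^{d-1}$ term grows with dimensionality, the sketch below simply evaluates it for a fixed $n$. The function name is hypothetical, and the constants hidden by the $O(\cdot)$ notation are ignored, so the values indicate a growth trend rather than actual skyline sizes.

```python
import math


def skyline_growth_term(n: int, d: int) -> float:
    """Evaluate (ln n)^(d-1); constants hidden by the O() bound are ignored."""
    return math.log(n) ** (d - 1)


# Growth of the term for one million points as dimensionality increases.
for d in (2, 3, 5, 8):
    print(f"d={d}: {skyline_growth_term(1_000_000, d):,.0f}")
```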

Page 5: Dp idp exploredb

Solving the Cardinality Problem

• Select subset of size k
– Coverage-based
– Contour representation
– Diversification

• Ranking
– Top-k Dominating
– Subspace dominance
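As one concrete instance of the ranking family, a top-k dominating style ranking can be sketched by counting how many points each skyline point dominates. This is a simplified illustration, assuming min preferences on every dimension and restricting the ranking to skyline points; the function names are hypothetical.

```python
def dominates(p, q):
    """True if p dominates q (assumption: smaller is better on every dimension)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def top_k_dominating_skyline(points, k):
    """Rank skyline points by the number of points they dominate; return the top k."""
    sky = [p for p in points if not any(dominates(q, p) for q in points)]
    counted = [(p, sum(dominates(p, q) for q in points)) for p in sky]
    counted.sort(key=lambda pair: -pair[1])  # stable sort: ties keep input order
    return counted[:k]


data = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5), (2, 5)]
print(top_k_dominating_skyline(data, 2))
```

Here every dominated point contributes exactly 1 to the count, regardless of how far it is from the skyline or how many skyline points dominate it.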

Page 6: Dp idp exploredb

Skyline + IR: Intuition

• Dominated points are not equally important
• Scheme similar to TF-IDF

Page 7: Dp idp exploredb

Skyline + IR: How ?

• 2 Factors
– DP (~ tf)

– IDP (~ idf)

• DP-IDP
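The formulas for the two factors did not survive in this transcript, so the sketch below only illustrates the tf-idf analogy under stated assumptions. It assumes that dp weights a dominated point by the inverse of its distance in skyline layers (points in the layer right below the skyline count most), and that idp is the logarithm of the skyline size divided by the number of skyline points dominating that point. Both definitions, and all function names, are assumptions for illustration and not necessarily the authors' exact formulation.

```python
import math


def dominates(p, q):
    """True if p dominates q (assumption: smaller is better on every dimension)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def layers(points):
    """Peel skyline layers: layer 1 is the skyline, layer 2 the skyline of the rest, ..."""
    remaining, result, depth = list(points), {}, 1
    while remaining:
        front = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        for p in front:
            result[p] = depth
        remaining = [p for p in remaining if p not in front]
        depth += 1
    return result


def dp_idp_scores(points):
    """Score each skyline point as the sum of dp * idp over the points it dominates."""
    layer = layers(points)
    sky = [p for p in points if layer[p] == 1]
    scores = {s: 0.0 for s in sky}
    for p in points:
        if layer[p] == 1:
            continue
        dominators = [s for s in sky if dominates(s, p)]
        dp = 1.0 / (layer[p] - 1)                    # assumed: closer layers weigh more
        idp = math.log(len(sky) / len(dominators))   # assumed: idf-style rarity weight
        for s in dominators:
            scores[s] += dp * idp
    return scores


data = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5), (2, 5)]
print(dp_idp_scores(data))
```

Under these assumptions, a point dominated by every skyline point contributes nothing (its idp is log 1 = 0), which mirrors how a term that occurs in every document carries no weight in tf-idf.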

Page 8: Dp idp exploredb

Ranking the Skyline

• Baseline:
– For each skyline point sp
  • Iterate over its dominated points, and SUM
– Slow
– Unnecessary computations

• Alternative? Bound the score
– Lower
– Upper
– Prune skyline points
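The pruning idea can be sketched generically: once every skyline point has a lower and an upper bound on its score, any point whose upper bound falls below the k-th largest lower bound cannot reach the top-k and can be dropped. The function name, the dictionary layout, and the example bounds are hypothetical; how the bounds themselves are derived is not shown here.

```python
import heapq


def prune_by_bounds(bounds, k):
    """Return the candidates that may still belong to the top-k.

    bounds maps a candidate name to a (lower, upper) pair of score bounds.
    """
    if len(bounds) <= k:
        return set(bounds)
    # k-th largest lower bound: at least k candidates are guaranteed to score this high.
    threshold = heapq.nlargest(k, (lower for lower, _ in bounds.values()))[-1]
    return {name for name, (_, upper) in bounds.items() if upper >= threshold}


example = {"A": (5.0, 9.0), "B": (2.0, 4.0), "C": (6.0, 7.0), "D": (1.0, 5.5)}
print(sorted(prune_by_bounds(example, k=2)))
```

With k = 2 the threshold is 5.0, so B (upper bound 4.0) is pruned without ever computing its exact score, while D survives because its upper bound of 5.5 still leaves room to overtake A.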

Page 9: Dp idp exploredb

A Simpler Representation

• More comprehensible when reasoning about bounds

Page 15: Dp idp exploredb

Bounding the Score

• Q1: What is the score for B?
• A1: Depends on the assignment of the remaining edges

• Q2: What is the maximum score for B?
• A2: Assign the remaining edges appropriately

• Q3: What is the appropriate way?
• A3:
– Same layer → Higher score (dp)
– Minimum overlap → Higher score (idp)

• No overlap → Loose bounds

Page 16: Dp idp exploredb

The SkyIR Algorithm

Page 22: Dp idp exploredb

The SkyIR Algorithm

• Priority can be:
– Round Robin (RRB)
– Pending points (PND)
– Upper Bound (UBS)
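The three priority schemes can be pictured as different ways of choosing which skyline point to refine next. The sketch below is an assumption-heavy illustration: the `Candidate` fields, the `pick_next` function, and in particular the reading of PND as "fewest pending dominated points first" are not taken from the slides, which do not spell out the selection rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    pending: int        # dominated points not yet processed (assumed meaning)
    upper_bound: float  # current upper bound on the candidate's score


def pick_next(cands: List[Candidate], scheme: str, turn: int = 0) -> Optional[Candidate]:
    """Choose the next skyline point to refine under a given priority scheme."""
    active = [c for c in cands if c.pending > 0]
    if not active:
        return None
    if scheme == "RRB":   # round robin: cycle through the active candidates
        return active[turn % len(active)]
    if scheme == "PND":   # assumed: candidate with the fewest pending points first
        return min(active, key=lambda c: c.pending)
    if scheme == "UBS":   # candidate with the largest upper bound first
        return max(active, key=lambda c: c.upper_bound)
    raise ValueError(f"unknown scheme: {scheme}")


cands = [Candidate("A", 4, 7.5), Candidate("B", 1, 3.0), Candidate("C", 9, 9.2)]
for scheme in ("RRB", "PND", "UBS"):
    print(scheme, pick_next(cands, scheme).name)
```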

Page 23: Dp idp exploredb

Experimental Setup

• Datasets

• Algorithms
– Baseline
– SkyIR
  • Bounds: Loose (LS), Collaborative (CB)
  • 3 Priority schemes: RRB, PND, UBS

Page 24: Dp idp exploredb

Total Runtime – IND (independent) distribution

k=5, d=3

CB-UBS is 4x faster than the Baseline

Page 25: Dp idp exploredb

Total Runtime – ANT (anti-correlated) distribution

• Interesting fact: ANT is easier than IND (fewer layers to extract)

Page 26: Dp idp exploredb

Total Runtime – Forest Cover

Page 27: Dp idp exploredb

Memory Consumption

CB, k=5

PND is the best memory-wise

Page 28: Dp idp exploredb

Conclusions

• IR-style ranking for skyline
– Formal framework
– Bounds for efficient computation

• SkyIR algorithm
– Experimental evaluation

• Future Work
– Speed up / Scale up
– Improve bounds (lower, upper)
– Approximation technique(s)

Page 29: Dp idp exploredb

Thank you!

Questions?

Acknowledgements: Heraclitus II fellowship, THALIS – GeomComp, THALIS – DISFER, ARISTEIA – MMD, FP7 INSIGHT