dp idp exploredb
Ranking skyline points with an IR-style weighting scheme
Skyline Ranking à la IR
George Valkanas¹, Apostolos N. Papadopoulos², Dimitrios Gunopulos¹
¹University of Athens, Greece
²Aristotle University of Thessaloniki, Greece
1st ExploreDB Workshop, Athens, Greece, 28th March, 2014
Skyline Problem Introduction
• Dataset D = (p1, p2, …, pn) in d-dimensional space
• Preferences for each dimension: min, max
• p dominates q iff p_i ≤ q_i for all i = 1, …, d and there exists j with p_j < q_j (stated for min preferences)
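The dominance definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming smaller values are preferred in every dimension; the names `dominates` and `skyline` and the sample data are illustrative, not taken from the slides.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p dominates q: no worse everywhere, strictly better somewhere.

    ASSUMPTION: smaller is better in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


data = [(1, 9), (3, 3), (9, 1), (4, 4), (5, 8)]
print(skyline(data))  # (4, 4) and (5, 8) are dominated by (3, 3)
```

A dimension with a max preference can be handled by negating its values before calling `skyline`.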
Usefulness of Skyline
• Multi-Objective optimization
• Exploratory Search
• Improve Recommendations
• Data summarization technique
• Building block for defining competitiveness
Skyline Cardinality Explosion
O((ln n)^(d-1))
• Skyline becomes too large to inspect manually
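To get a feel for how quickly the bound above grows with dimensionality, the short sketch below evaluates only the growth term (ln n)^(d-1). It ignores the constants hidden by the O-notation, so the printed values are not skyline sizes; the function name is illustrative.

```python
import math


def skyline_growth_term(n: int, d: int) -> float:
    """Growth term (ln n)^(d-1) from the bound; constants are ignored."""
    return math.log(n) ** (d - 1)


n = 1_000_000
for d in (2, 3, 5, 8):
    print(f"d={d}: (ln n)^(d-1) = {skyline_growth_term(n, d):,.0f}")
```

For a fixed n the term grows exponentially in d, which is the effect the slide title refers to.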
Solving the Cardinality Problem
• Select subset of size k
– Coverage-based
– Contour representation
– Diversification
• Ranking
– Top-k Dominating
– Subspace dominance
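As a point of comparison for the ranking family above, top-k dominating ranking orders points by how many other points they dominate. The sketch below applies that count to skyline points only; it assumes min preferences, and the function names and data are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """ASSUMPTION: smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def top_k_dominating(sky: List[Point], dataset: List[Point], k: int):
    """Rank skyline points by the number of dataset points they dominate."""
    counts = [(s, sum(dominates(s, p) for p in dataset)) for s in sky]
    # sorted() is stable, so ties keep their input order
    return sorted(counts, key=lambda pair: pair[1], reverse=True)[:k]


data = [(1, 9), (3, 3), (9, 1), (4, 4), (5, 8), (2, 10)]
sky = [(1, 9), (3, 3), (9, 1)]
print(top_k_dominating(sky, data, k=2))
```

Every dominated point contributes exactly 1 to each of its dominators here, regardless of how deep it lies or how many skyline points dominate it.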
Skyline + IR: Intuition
• Dominated points are not equally important
• Scheme similar to TF-IDF
Skyline + IR: How ?
• 2 Factors
– DP (~ tf)
– IDP (~ idf)
• DP-IDP
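The slides name the two factors but the transcript does not preserve their formulas. The sketch below is therefore only one hypothetical reading of the TF-IDF analogy: it ASSUMES dp = 1 / layer of the dominated point (shallower layers weigh more) and idp = log(|skyline| / number of skyline points dominating it) (points dominated by few skyline points weigh more). Both formulas, and all function names, are assumptions for illustration and may differ from the authors' definitions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """ASSUMPTION: smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline_layers(points: List[Point]) -> Dict[Point, int]:
    """Layer 1 is the skyline, layer 2 the skyline of what remains, and so on."""
    layer_of: Dict[Point, int] = {}
    remaining = list(points)
    level = 1
    while remaining:
        front = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        for p in front:
            layer_of[p] = level
        remaining = [p for p in remaining if p not in layer_of]
        level += 1
    return layer_of


def dp_idp_scores(points: List[Point]) -> Dict[Point, float]:
    """Sum dp * idp over the points each skyline point dominates."""
    layer_of = skyline_layers(points)
    sky = [p for p in points if layer_of[p] == 1]
    scores = {s: 0.0 for s in sky}
    for p in points:
        if layer_of[p] == 1:
            continue
        dominators = [s for s in sky if dominates(s, p)]
        dp = 1.0 / layer_of[p]                      # ASSUMED dp formula
        idp = math.log(len(sky) / len(dominators))  # ASSUMED idp formula
        for s in dominators:
            scores[s] += dp * idp
    return scores


data = [(1, 9), (3, 3), (9, 1), (4, 4), (5, 8), (2, 10), (4, 10)]
scores = dp_idp_scores(data)
for s, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(s, round(score, 3))
```

Under these assumptions a dominated point shared by every skyline point contributes nothing, mirroring a term that appears in every document under idf.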
Ranking the Skyline
• Baseline:
– For each skyline point sp
• Iterate over its dominated points, and SUM
– Slow
– Unnecessary computations
• Alternative? Bound the score
– Lower
– Upper
• Prune skyline points
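The pruning idea above can be illustrated independently of how the bounds are obtained. In the sketch below each candidate carries a (lower, upper) interval for its score, and a candidate is dropped from a top-k search once k other candidates are guaranteed to score higher. The function name and the numbers are hypothetical; this is a generic bound-based pruning rule, not the authors' exact procedure.

```python
from typing import Dict, Set, Tuple


def prunable(bounds: Dict[str, Tuple[float, float]], k: int) -> Set[str]:
    """Candidates whose upper bound is below the k-th largest lower bound.

    Such a candidate cannot be in the top-k: at least k others have a
    lower bound, and hence a true score, strictly above its upper bound.
    """
    lowers = sorted((lb for lb, _ in bounds.values()), reverse=True)
    if len(lowers) < k:
        return set()
    kth_lower = lowers[k - 1]
    return {name for name, (_, ub) in bounds.items() if ub < kth_lower}


# Hypothetical (lower, upper) score bounds for four skyline points
bounds = {"A": (5.0, 9.0), "B": (4.0, 6.0), "C": (1.0, 3.5), "D": (0.5, 4.5)}
print(prunable(bounds, k=2))
```

Tighter bounds shrink the intervals and let more candidates be pruned before their exact score is computed.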
A Simpler Representation
• More comprehensible for bounds
Bounding the Score
• Q1: What is the score for B?
• A1: Depends on the assignment of the remaining edges
• Q2: What is the maximum score for B?
• A2: Assign the remaining edges appropriately
• Q3: What is the appropriate way?
• A3:
– Same layer → Higher score (dp)
– Minimum overlap → Higher score (idp)
• No overlap → Loose bounds
The SkyIR Algorithm
• Priority can be:
– Round Robin (RRB)
– Pending points (PND)
– Upper Bound (UBS)
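The three priorities above decide which skyline point to work on next. The transcript does not say how each one is computed, so the sketch below is a hypothetical reading: RRB picks the least recently served candidate, PND ASSUMES that the candidate with the most pending dominated points goes first, and UBS ASSUMES that the candidate with the largest upper bound goes first. The `Candidate` fields and the `pick_next` function are illustrative names, not part of the presented algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    pending: int           # dominated points not yet examined (hypothetical)
    upper_bound: float     # current upper bound on the score (hypothetical)
    last_served: int = -1  # step at which this candidate was last picked


def pick_next(cands: List[Candidate], scheme: str) -> Candidate:
    """Choose the next skyline point to process under a priority scheme."""
    if scheme == "RRB":
        # Round robin: the candidate that has waited longest
        return min(cands, key=lambda c: c.last_served)
    if scheme == "PND":
        # ASSUMED direction: most pending points first
        return max(cands, key=lambda c: c.pending)
    if scheme == "UBS":
        # ASSUMED direction: largest upper bound first
        return max(cands, key=lambda c: c.upper_bound)
    raise ValueError(f"unknown scheme: {scheme}")


cands = [
    Candidate("A", pending=3, upper_bound=7.5, last_served=0),
    Candidate("B", pending=10, upper_bound=4.0, last_served=2),
    Candidate("C", pending=1, upper_bound=9.0, last_served=1),
]
for scheme in ("RRB", "PND", "UBS"):
    print(scheme, pick_next(cands, scheme).name)
```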
Experimental Setup
• Datasets
• Algorithms
– Baseline
– SkyIR
• Bounds: Loose (LS), Collaborative (CB)
• 3 Priority schemes: RRB, PND, UBS
Total Runtime – IND (independent) distribution
k=5, d=3
CB-UBS is 4x faster than the Baseline
Total Runtime – ANT (anti-correlated) distribution
• Interesting fact: ANT is easier than IND (fewer layers to extract)
Total Runtime – Forest Cover
Memory Consumption
CB, k=5
PND is the best memory-wise
Conclusions
• IR-style ranking for skyline
– Formal framework
– Bounds for efficient computation
• SkyIR algorithm
– Experimental evaluation
• Future Work
– Speed up / Scale up
– Improve bounds (lower, upper)
– Approximation technique(s)
Thank you!
Questions?
Acknowledgements: Heraclitus II fellowship, THALIS – GeomComp, THALIS – DISFER, ARISTEIA – MMD, FP7 INSIGHT