Towards Multidimensional Skyline Analysis

Jian Pei
Simon Fraser University, Canada

http://www.cs.sfu.ca/~jpei
jpei@cs.sfu.ca

Joint work with Y. Tao, M. Ester and W. Jin

Searching Flights to Sydney

• Price, travel time and number of stops all matter!
• A (long) list of all feasible flights? Boring to review
• Presenting only some selected flights – how?
  – Vancouver → Honolulu → Sydney ($2100, 19 hours, 1 stop): Good!
  – Vancouver → Honolulu → Auckland → Sydney ($1980, 24 hours, 2 stops): Also good, cheaper, though longer travel time and more stops
  – Vancouver → Los Angeles → Honolulu → Sydney ($2060, 28 hours, 3 stops): Not good, more expensive, longer travel time, and more stops than the route via Auckland!
• Skyline routes – all possible trade-offs among price, travel time and number of stops that are superior to the others

Domination and Skyline

• A set of objects S in an n-dimensional space D = (D1, …, Dn)
  – Numeric dimensions for illustration in this talk
• For u, v ∈ S, u dominates v if
  – u is better than v in at least one dimension, and
  – u is not worse than v in any other dimension
  – For illustration in this talk, the smaller the better
• u ∈ S is a skyline object if u is not dominated by any other object in S
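The definition above can be sketched in a few lines of Python. This is a minimal illustration, not one of the algorithms from the talk: it uses a plain quadratic scan, the function names `dominates` and `skyline` are chosen here for illustration, and the route labels are made up. The three tuples are the (price, hours, stops) values of the Sydney flight example, with smaller values being better.

```python
from typing import Dict, List, Sequence, Tuple


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u dominates v: not worse in any dimension, better in at least one."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def skyline(objects: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Names of the objects that no other object dominates (quadratic scan)."""
    return [
        name
        for name, u in objects.items()
        if not any(dominates(v, u) for other, v in objects.items() if other != name)
    ]


# (price in $, travel time in hours, number of stops); labels are illustrative.
flights = {
    "via Honolulu": (2100, 19, 1),
    "via Honolulu and Auckland": (1980, 24, 2),
    "via Los Angeles and Honolulu": (2060, 28, 3),
}

print(skyline(flights))  # ['via Honolulu', 'via Honolulu and Auckland']
```

The third route is dropped because the route via Auckland is cheaper, faster and has fewer stops.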

Finding the Skyline in Full Space

• Many existing methods
• Divide-and-conquer and block nested loops by Borzsonyi et al.
• Sort-first-skyline (SFS) by Chomicki et al.
• Using bitmaps and the relationships between the skyline and the minimum coordinates of individual points, by Tan et al.
• Using nearest-neighbor search by Kossmann et al.
• The progressive branch-and-bound method by Papadias et al.

Full Space Skyline Is Not Enough!

• Skylines in subspaces
  – Mr. Richer does not care about the price. How can we derive the superior trade-offs between travel time and number of stops from the full space skyline?
• Sky cube – computing skylines in all non-empty subspaces (Yuan et al., VLDB’05)
  – Any subspace skyline query can be answered (efficiently)
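To make the sky cube idea concrete, the sketch below materializes the skyline of every non-empty subspace by brute force. It is only an illustration of what the cube contains, not the computation method of Yuan et al.; the names `subspace_skyline` and `sky_cube` and the route labels are assumptions made here. Dimensions are indexed 0 = price, 1 = travel time, 2 = stops, using the flight values from the earlier slide.

```python
from itertools import combinations


def dominates_on(u, v, dims):
    """u dominates v when only the dimensions in dims are considered."""
    return all(u[d] <= v[d] for d in dims) and any(u[d] < v[d] for d in dims)


def subspace_skyline(objects, dims):
    """Sorted names of the objects not dominated by any object in subspace dims."""
    return sorted(
        name
        for name, u in objects.items()
        if not any(dominates_on(v, u, dims) for v in objects.values())
    )


def sky_cube(objects, n_dims):
    """Map every non-empty subspace (a tuple of dimension indexes) to its skyline."""
    return {
        dims: subspace_skyline(objects, dims)
        for k in range(1, n_dims + 1)
        for dims in combinations(range(n_dims), k)
    }


flights = {
    "route1": (2100, 19, 1),  # via Honolulu
    "route2": (1980, 24, 2),  # via Honolulu and Auckland
    "route3": (2060, 28, 3),  # via Los Angeles and Honolulu
}

cube = sky_cube(flights, 3)
print(cube[(1, 2)])     # Mr. Richer's query (time, stops): ['route1']
print(cube[(0, 1, 2)])  # full space: ['route1', 'route2']
```

With n dimensions the cube has 2^n − 1 entries, which is why answering a subspace query becomes a lookup but building the cube is expensive.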

Sky Cube

Understanding Skylines

• Understanding skyline objects
  – Both Wilt Chamberlain and Michael Jordan are in the full space skyline of the Great NBA Players. Which merits, respectively, really make them outstanding?
  – How are they different?
• Finding the decisive subspaces – the minimal combinations of factors that determine the (subspace) skyline membership of an object
  – Total rebounds for Chamberlain; (total points, total rebounds, total assists) and (games played, total points, total assists) for Jordan

Redundancy in Sky Cube

Does it just happen that skylines in multiple subspaces are identical?

Observations

• a, b and c are in the skyline of (X, Y)
  – Both a and c are in some subspace skylines
  – b is not in any subspace skyline
• d and e are not in the skyline of (X, Y)
  – d is in the skyline of subspace X
  – e is not in any subspace skyline

• Why and in which subspaces is an object in the skyline?

Subspace Skylines Monotonic?

• Is subspace skyline membership monotonic?
  – x is in the skylines in spaces ABCD and A, but it is not in the skyline in ABD – it is dominated by y in ABD
• x and y collapse in AD; x and y are in the skylines of the same subspaces of AD

Coincident Groups

• How to capture groups of objects that share values in subspaces?

• (G, B) is a coincident group (c-group) if all objects in G share the same values on all dimensions in B
  – G_B is the projection of the objects in G on B
• A c-group (G, B) is maximal if no further objects or dimensions can be added into the group
  – Example: (xy, AD)
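The two definitions can be checked mechanically. The sketch below is an illustration under assumptions: the values of x, y and z are hypothetical (the figure with the original data is not part of this transcript) and were chosen only so that x and y agree exactly on A and D, as in the example (xy, AD). The function names are also made up here.

```python
# Hypothetical values; only the fact that x and y coincide on A and D
# is taken from the slides.
objects = {
    "x": {"A": 2, "B": 4, "C": 1, "D": 3},
    "y": {"A": 2, "B": 3, "C": 5, "D": 3},
    "z": {"A": 5, "B": 4, "C": 1, "D": 6},
}


def shared_dims(objects, group):
    """Dimensions on which all objects in the group have the same value."""
    names = sorted(group)
    first = objects[names[0]]
    return {d for d in first if all(objects[n][d] == first[d] for n in names)}


def is_c_group(objects, group, dims):
    """(group, dims) is a c-group if the group coincides on every dimension in dims."""
    return set(dims) <= shared_dims(objects, group)


def is_maximal_c_group(objects, group, dims):
    """A c-group is maximal if neither a dimension nor an object can be added."""
    if not dims or set(dims) != shared_dims(objects, group):
        return False  # not a c-group, or another shared dimension could be added
    rep = objects[sorted(group)[0]]
    # No object outside the group may match the group on all of dims.
    return not any(
        all(o[d] == rep[d] for d in dims)
        for name, o in objects.items()
        if name not in group
    )


print(is_maximal_c_group(objects, {"x", "y"}, "AD"))  # True
print(is_maximal_c_group(objects, {"x", "y"}, "D"))   # False: A can still be added
```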

C-Group Lattices

C-group lattices → (quotient) → Maximal c-group lattices

Where are the skylines? Are they also in good structure?

Skyline Groups

• A maximal c-group (G, B) is a skyline group if G_B is in the subspace skyline of B
• How to characterize the subspaces where G_B is in the skyline?
  – (x, ABCD) is a skyline group
  – If the set of subspaces is convex, we can use bounds

Decisive Subspaces

• For a skyline group (G, B), a subspace C ⊆ B is decisive if
  – G_C is in the subspace skyline of C
  – No other objects share the same values with objects in G on C
  – C is minimal – no C’ ⊂ C has the above two properties
• (x, ABCD) is a skyline group; AC and CD are decisive
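The three conditions translate directly into a brute-force search over subspaces, smallest first. The sketch below is an illustration only, not the algorithm of the talk. The data values are hypothetical: they were chosen here so that the result agrees with the slide's statement that AC and CD are decisive for (x, ABCD). The names `qualifies` and `decisive_subspaces` are likewise made up.

```python
from itertools import combinations

# Hypothetical values, smaller is better.
objects = {
    "x": {"A": 2, "B": 4, "C": 1, "D": 3},
    "y": {"A": 2, "B": 3, "C": 5, "D": 3},
    "z": {"A": 5, "B": 4, "C": 1, "D": 6},
}


def dominates(u, v, dims):
    return all(u[d] <= v[d] for d in dims) and any(u[d] < v[d] for d in dims)


def qualifies(group, dims):
    """First two conditions: the group's projection is in the skyline of dims,
    and no object outside the group shares that projection."""
    rep = objects[sorted(group)[0]]
    outsiders = [o for name, o in objects.items() if name not in group]
    in_skyline = not any(dominates(o, rep, dims) for o in outsiders)
    exclusive = not any(all(o[d] == rep[d] for d in dims) for o in outsiders)
    return in_skyline and exclusive


def decisive_subspaces(group, space):
    """Minimal subspaces of space that satisfy qualifies()."""
    found = []
    for k in range(1, len(space) + 1):  # smaller subspaces are examined first
        for dims in combinations(space, k):
            if any(set(f) <= set(dims) for f in found):
                continue  # contains a smaller qualifying subspace, so not minimal
            if qualifies(group, dims):
                found.append(dims)
    return ["".join(f) for f in found]


print(decisive_subspaces({"x"}, "ABCD"))  # ['AC', 'CD']
```

In this data x is not decisive on A or D alone because y shares its values there, and not on C alone because z shares its value.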

Semantics

• In which subspaces is an object or a group of objects in the skyline?
• The skyline membership of skyline groups is established by their decisive subspaces
  – For skyline group (G, B), if C is decisive, then G is in the skyline of any subspace C’ where C ⊆ C’ ⊆ B
• Signature of skyline group: Sig(G, B) = (G_B, C1, …, Ck), where C1, …, Ck are all decisive subspaces

Example

The skyline membership of an object is determined by the skyline groups in which it participates

An object u is in the skyline of subspace C if and only if there exists a skyline group (G, B) and its decisive subspace C’ such that u ∈ G and C’ ⊆ C ⊆ B
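The membership rule above amounts to a simple lookup once the skyline groups and their decisive subspaces are known. The sketch below is a hedged illustration: `SkylineGroup` and `in_subspace_skyline` are names invented here, and the list holds only the single group stated on the slides, (x, ABCD) with decisive subspaces AC and CD. Because the other skyline groups of the example are not listed in this transcript, the answers are complete only for subspaces covered by that one group.

```python
from typing import FrozenSet, List, NamedTuple


class SkylineGroup(NamedTuple):
    members: FrozenSet[str]         # G
    space: FrozenSet[str]           # B
    decisive: List[FrozenSet[str]]  # C1, ..., Ck


def in_subspace_skyline(obj: str, subspace: str, groups: List[SkylineGroup]) -> bool:
    """u is in the skyline of C iff some group (G, B) with u in G has a
    decisive subspace C' such that C' is a subset of C and C is a subset of B."""
    c = frozenset(subspace)
    return any(
        obj in g.members and c <= g.space and any(dec <= c for dec in g.decisive)
        for g in groups
    )


# Only the skyline group stated on the slides; the list is incomplete.
groups = [
    SkylineGroup(frozenset({"x"}), frozenset("ABCD"), [frozenset("AC"), frozenset("CD")])
]

print(in_subspace_skyline("x", "ACD", groups))  # True: AC is contained in ACD
print(in_subspace_skyline("x", "ABD", groups))  # False: ABD contains neither AC nor CD
```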

Subspace Skyline Analysis

• All skyline projections form a lattice (skyline projection lattice)
  – A sub-lattice of the c-group lattice
• All skyline groups form a lattice (skyline group lattice)
  – A quotient lattice of the skyline projection lattice
  – A sub-lattice of the maximal c-group lattice

Relationship Among Lattices

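C-group lattices → (quotient) → Maximal c-group lattices
Skyline projection lattices → (quotient) → Skyline group lattices

Skyline projection lattices are sub-lattices of c-group lattices; skyline group lattices are sub-lattices of maximal c-group lattices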

OLAP Analysis on Skylines

• Subspace skylines

• Relationships between skylines in subspaces

• Closure information

Full Space vs. Subspace Skylines

• For any skyline group (G, B), there exists at least one object u ∈ G such that u is in the full space skyline
  – Can use u as the representative of the group
• An object not in the full space skyline can be in some subspace skyline only if it collapses to some full space skyline objects
  – All objects not in the full space skyline and not collapsing to any full space skyline object can be removed from skyline analysis
  – If only the projections are concerned, the full space skyline objects are sufficient for skyline analysis

Computing Skylines in All Subspaces

• NP-hard
  – Intuition: the curse of dimensionality – there are an exponential number of subspaces
• Reduction from frequent itemset mining

Tid  Items
T1   {a, b, c}
T2   {a, c, d, e}
T3   {b, c, d, e}

If min_sup = 2, the frequent itemsets are a, b, c, d, e, ac, bc, cd, ce, de and cde

Oid  a  b  c  d  e
O1   0  0  0  1  1
O2   0  1  0  0  0
O3   1  0  0  0  0
O0   0  0  0  0  0

Sup(cde) = (# skyline objects in subspace cde) − 1
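The reduction can be replayed on the example tables. The sketch below is an illustration with names invented here (`to_objects`, `support_via_skyline`): each transaction becomes an object with 0 for an item it contains and 1 for an item it lacks, and an extra all-zero object O0 is added. In the subspace of an itemset, the skyline then consists of O0 plus exactly the transactions that contain every item of the set.

```python
transactions = {
    "O1": {"a", "b", "c"},       # T1
    "O2": {"a", "c", "d", "e"},  # T2
    "O3": {"b", "c", "d", "e"},  # T3
}
items = "abcde"


def to_objects(transactions, items):
    """Encode each transaction as a 0/1 vector (0 = item present) and add O0."""
    objs = {
        oid: {i: (0 if i in t else 1) for i in items}
        for oid, t in transactions.items()
    }
    objs["O0"] = {i: 0 for i in items}
    return objs


def dominates(u, v, dims):
    return all(u[d] <= v[d] for d in dims) and any(u[d] < v[d] for d in dims)


def support_via_skyline(objs, itemset):
    """Support of itemset = number of skyline objects in that subspace, minus 1."""
    size = sum(
        1
        for u in objs.values()
        if not any(dominates(v, u, itemset) for v in objs.values())
    )
    return size - 1


objs = to_objects(transactions, items)
print(support_via_skyline(objs, "cde"))  # 2 (T2 and T3)
print(support_via_skyline(objs, "ab"))   # 1 (T1 only)
```

Any procedure that reports the skyline of every subspace therefore also yields the support of every itemset, which is the intuition behind the hardness claim on the slide.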

Subspace Skyline Computation

• Compute the set of skyline groups and their signatures
  – NP-hard: reduction from frequent closed itemset mining
• Top-down enumeration of subspaces
  – Similar ideas in skyline cube computation
• For each subspace, find skyline groups and decisive subspaces
  – Find (subspace) skylines by sorting
  – Share sorting and use merge-sorting as much as possible

Enumerating Subspaces

• Using a top-down enumeration tree
  – Each child explores a proper subspace with one dimension less
  – All objects not in the skyline of the parent subspace and not collapsing to one skyline object of the parent subspace can be removed
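One common way to build such a tree so that every subspace is visited exactly once is to let each node drop only dimensions that come after the one dropped by its parent. The sketch below shows that scheme as an assumption; the transcript does not show the exact tree used in the talk, and the object pruning step described above is left out. The name `enumerate_subspaces` is invented here.

```python
def enumerate_subspaces(dims):
    """Yield every non-empty subspace of dims once, parents before children.

    Each child is its parent with one dimension removed. A node may only
    remove dimensions positioned after the last one removed on its path,
    which prevents the same subspace from being reached twice.
    """
    def visit(space, start):
        yield space
        for i in range(start, len(dims)):
            child = tuple(d for d in space if d != dims[i])
            if child:  # skip the empty subspace
                yield from visit(child, i + 1)

    yield from visit(tuple(dims), 0)


for space in enumerate_subspaces("ABC"):
    print("".join(space))
# ABC, BC, C, B, AC, A, AB (one per line)
```

Because a parent is always processed before its children, the objects that survive in the parent can be passed down as the only candidates for the child.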

Computing Skylines by Sorting

• Sort all objects in lexicographic ascending order
  – a-d-b-e-c
• Check objects in the sorted list: an object is in the skyline if it is not dominated by any skyline object before it in the list
  – {a, b, c} are skyline objects
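The sorting step works because an object can never be dominated by one that follows it in lexicographic order, so each object only needs to be compared with the skyline objects already found. A minimal sketch follows; the (X, Y) coordinates are hypothetical, picked here so that the sorted order and the skyline match the a-d-b-e-c example, and the function name is illustrative.

```python
# Hypothetical (X, Y) coordinates, smaller is better.
points = {
    "a": (1, 5),
    "b": (2, 3),
    "c": (4, 1),
    "d": (1, 7),
    "e": (3, 4),
}


def dominates(u, v):
    return all(p <= q for p, q in zip(u, v)) and any(p < q for p, q in zip(u, v))


def skyline_by_sorting(objects):
    """Return (sorted order, skyline) using one pass over the sorted list."""
    ordered = sorted(objects.items(), key=lambda kv: kv[1])  # lexicographic order
    result = []
    for name, u in ordered:
        # Only skyline objects found so far can dominate u.
        if not any(dominates(v, u) for _, v in result):
            result.append((name, u))
    return [name for name, _ in ordered], [name for name, _ in result]


order, sky = skyline_by_sorting(points)
print(order)  # ['a', 'd', 'b', 'e', 'c']
print(sky)    # ['a', 'b', 'c']
```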

Efficient Local Sorting

• Not necessary to sort for each subspace
  – A sorted list in subspace (A, B, C, D) can be used in subspaces (A), (A, B), (A, B, C)
  – To generate a sorted list in subspace (B, C, D), we can use merge sort to merge the sublists of different values on A
• If a non-skyline object collapses to a skyline object, the skyline object “absorbs” the non-skyline object by taking the non-skyline object’s id
  – A non-skyline object may be “absorbed” by multiple skyline objects
  – Recursively reduce the number of objects and shorten the sorted lists
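The merge idea can be sketched with the standard library. In a list sorted on (A, B, C, D), all rows with the same value on A form a contiguous run that is already sorted on (B, C, D), so a k-way merge of those runs gives the (B, C, D) order without a full re-sort. The rows and the function name below are made up for illustration, and the absorption of collapsed objects is not modelled.

```python
import heapq
from itertools import groupby


def resort_without_first(sorted_rows):
    """Rows sorted on (A, B, C, D) -> the same rows sorted on (B, C, D)."""
    # Each run shares one value on A and is therefore sorted on (B, C, D).
    runs = [list(run) for _, run in groupby(sorted_rows, key=lambda r: r[0])]
    return list(heapq.merge(*runs, key=lambda r: r[1:]))


# Illustrative rows (A, B, C, D), already in lexicographic order.
rows = [
    (1, 2, 3, 4),
    (1, 5, 0, 1),
    (2, 1, 9, 9),
    (2, 2, 3, 5),
    (3, 0, 0, 7),
]

print(resort_without_first(rows))
# [(3, 0, 0, 7), (2, 1, 9, 9), (1, 2, 3, 4), (2, 2, 3, 5), (1, 5, 0, 1)]
```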

Results on Great NBA Players’

• 17,266 records
• 4 attributes are selected
• 67 skyline records in the full space, 146 decisive subspaces

# Skyline Groups vs. Dimensionality

• Dimensionality: the complexity of subspaces
  – A 1-d subspace has only one skyline group
  – A high-dimensional subspace may have many skyline groups
  – # skyline groups tends to increase when dimensionality increases
• Number of subspaces
  – An n-d data set has n 1-d subspaces, 1 n-d (sub-)space, and n!/[(n/2)!(n/2)!] n/2-d subspaces (if n is even)
• The number of skyline groups in subspaces of dimensionality k depends on the joint effect of the two factors
  – When k < n/2, the two factors are consistent
  – When k > n/2, the two factors are contrasting
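The second factor is just the binomial coefficient C(n, k), which grows up to k = n/2 and shrinks afterwards. A small sketch, with a function name chosen here for illustration:

```python
from math import comb


def subspaces_by_dimensionality(n):
    """Number of k-dimensional subspaces of an n-dimensional space, for k = 1..n."""
    return {k: comb(n, k) for k in range(1, n + 1)}


counts = subspaces_by_dimensionality(4)
print(counts)                # {1: 4, 2: 6, 3: 4, 4: 1}
print(sum(counts.values()))  # 15 non-empty subspaces in total
```

For n = 4 the middle level k = 2 has 4!/(2!2!) = 6 subspaces, matching the formula on the slide.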

About the Synthetic Data Sets

• Independent: attribute values are uniformly distributed

• Correlated: if a record is good in one dimension, it is likely to be good in others as well

• Anti-correlated: if a record is good in one dimension, it is unlikely to be good in others

Scalability w.r.t. Database Size

[Figure: three panels, one per synthetic data set: Independent, Correlated, Anti-correlated]

Scalability w.r.t. Dimensionality

Conclusions

• Skyline analysis is important in many applications
  – Only skyline objects in the full space may not be enough
• Skyline cube is powerful to answer subspace skyline queries
  – But it is interesting to ask why an object is in the subspace skylines, and more
• Skyline groups and decisive subspaces – capturing the semantics of subspace skylines
• OLAP subspace skyline analysis
• An efficient algorithm to compute skyline groups
• Latest progress: An efficient algorithm to query subspace skylines (Tao et al., ICDE’06)

References

• J. Pei, W. Jin, M. Ester, and Y. Tao. "Catching the Best Views of Skyline: A Semantic Approach Based on Decisive Subspaces". In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB'05), Trondheim, Norway, August 30-September 2, 2005.

• Y. Tao, X. Xiao, and J. Pei. "SUBSKY: Efficient Computation of Skylines in Subspaces". In Proceedings of the 22nd International Conference on Data Engineering (ICDE'06), Atlanta, GA, USA, April 3-7, 2006.

Thank You!

Photo credits:
Vancouver, BC, Canada: http://members.virtualtourist.com/m/822f5/dc80f/
Trondheim, Norway: by Gerold Jung
Hong Kong: http://lambcutlet.org/gallery/Day_6/Hong_Kong_Island_skyline_on_a_cloudy_night_around_Central