Skyline Charuka Silva


Post on 01-Apr-2015


Page 1: Skyline Charuka Silva. Outline Charuka Silva, Skyline2  Motivation  Skyline Definition  Applications  Skyline Query  Similar Interesting Problem

Skyline

Charuka Silva


Outline

Charuka Silva, Skyline2

Motivation
Skyline Definition
Applications
Skyline Query
Similar Interesting Problem
Algorithms
  Divide and Conquer Algorithm
  Index based Algorithm
  Nearest Neighbor


Trip to Nassau (Bahamas): find a hotel that is cheap and close to the beach.

The two goals conflict, since hotels near the beach tend to be more expensive.

A travel agent can suggest all interesting hotels. A hotel is interesting if it is not worse than any other hotel in both dimensions, that is, no other hotel is both cheaper and closer to the beach.

We call this set of interesting hotels the Skyline.


Distribution of Hotels


Formal Skyline Definition

The skyline is defined as those points that are not dominated by any other point.

A point dominates another point if it is as good or better in all dimensions and better in at least one dimension.
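The definition translates directly into a pairwise test. The following is a minimal Python sketch, assuming that smaller values are better in every dimension; the names `dominates` and `naive_skyline` and the sample hotels are illustrative and do not come from the slides.

```python
def dominates(p, q):
    """True if p is as good as q in all dimensions and better in at least one.

    Assumption: smaller is better in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def naive_skyline(points):
    """Keep every point that no other point dominates (quadratic time)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance to the beach).
hotels = [(50, 8), (80, 2), (60, 9), (120, 1), (90, 5)]
print(naive_skyline(hotels))  # [(50, 8), (80, 2), (120, 1)]
```

Here (60, 9) is dominated by (50, 8) and (90, 5) by (80, 2), so neither appears in the result.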


Where Does It Apply?

The skyline operator is important for applications involving multi-criteria decision making.


Some Applications

Customer information systems, travel agencies, and mobile city guides. The skyline has to be recomputed as the user moves.

The skyline of Manhattan, for instance, can be computed as the set of buildings that are tall and close to the Hudson River.

Decision support (business intelligence), e.g. customers who buy a lot and complain little.

Data visualization, e.g. the points of an object that are visible from a certain perspective can be determined.

Distributed query optimization, e.g. finding the set of interesting sites that have high computation power and are close to the data needed to execute the query.


Skyline Query

SELECT * FROM Hotels SKYLINE OF price MIN, distance MIN;

What else: the SKYLINE OF clause can also use MAX, and it can be combined with joins, GROUP BY, and so on.
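The MIN and MAX annotations only fix which direction counts as better in each column. A small Python sketch of that semantics follows; the function `skyline_of`, the row layout, and the sample data are assumptions made for illustration and do not correspond to any real database API.

```python
def skyline_of(rows, criteria):
    """Evaluate a SKYLINE OF clause over a list of dict rows.

    criteria is a list of (column, "min" or "max") pairs (illustrative format).
    """
    def key(row):
        # Negate MAX columns so that smaller is always better.
        return tuple(row[c] if d == "min" else -row[c] for c, d in criteria)

    def dominates(p, q):
        # As good everywhere and different somewhere means better somewhere.
        return all(a <= b for a, b in zip(p, q)) and p != q

    keys = [key(r) for r in rows]
    return [r for r, k in zip(rows, keys)
            if not any(dominates(other, k) for other in keys)]


# Hypothetical customers: buy a lot (MAX) and complain little (MIN).
customers = [
    {"id": 1, "purchases": 10, "complaints": 0},
    {"id": 2, "purchases": 5, "complaints": 1},
    {"id": 3, "purchases": 12, "complaints": 3},
]
result = skyline_of(customers, [("purchases", "max"), ("complaints", "min")])
print([c["id"] for c in result])  # [1, 3]
```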


Skyline Query Results

The result of the query is {a, i, k}.


Top-K Queries vs. Skyline

Top-K (or ranked) queries retrieve the best K objects that minimize a specific preference function.

E.g., given the preference function f(x,y) = x + y, the top-3 query retrieves <i,5>, <h,7>, <m,8> (in this order).
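A top-K query can be sketched in a few lines of Python. The coordinates below are hypothetical: the slide's figure is not part of the transcript, so they were chosen only so that the scores match the ones quoted above. The helper name `top_k` is likewise illustrative.

```python
import heapq

# Hypothetical coordinates, chosen only so that the scores match the slide.
points = {"a": (1, 9), "h": (4, 3), "i": (3, 2), "k": (9, 1), "m": (6, 2)}


def top_k(points, k, f):
    """Return the k (label, score) pairs with the smallest score f(x, y)."""
    scored = ((label, f(*xy)) for label, xy in points.items())
    return heapq.nsmallest(k, scored, key=lambda item: item[1])


print(top_k(points, 3, lambda x, y: x + y))  # [('i', 5), ('h', 7), ('m', 8)]
```

With these sample coordinates, h and m are both dominated by i, so they appear in the top-3 answer without being skyline points. The top-K answer also changes when f changes, whereas the skyline does not depend on any preference function.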


Divide-and-Conquer (D&C)

Divide the dataset into several partitions so that each partition fits in memory.

Compute the partial skyline of the points in every partition.

Merge the partial skylines to obtain the full skyline.

Algorithm 1
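The three steps can be sketched as follows. This is a simplified Python illustration under stated assumptions: partitions are consecutive chunks of the input rather than spatial regions, the merge step simply recomputes the skyline over the union of the partial skylines, and smaller is better in every dimension. The function names and sample coordinates are hypothetical.

```python
def dominates(p, q):
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """In-memory skyline of one partition (naive pairwise comparison)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def dc_skyline(points, partition_size):
    """Divide into chunks, compute partial skylines, then merge them."""
    partial = []
    for start in range(0, len(points), partition_size):
        chunk = points[start:start + partition_size]  # one partition
        partial.extend(skyline(chunk))                # its partial skyline
    # Merge: a point survives only if no partial-skyline point dominates it.
    return skyline(partial)


# Hypothetical coordinates (the slide's figure is not in the transcript).
pts = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
       (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]
print(sorted(dc_skyline(pts, 4)))  # [(1, 9), (3, 2), (9, 1)]
```

The merge is safe because every point of the full skyline is also in the partial skyline of its own partition, and every other point is dominated by some full-skyline point.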


Partitioned Space

Partial skylines of the four partitions: {a, c, g}, {d}, {i}, {m, k}


Divide and Conquer

All points in the skyline of s3 must remain.

Those in s2 are discarded, because they are dominated by s3.

Each skyline point in s1 is compared only with points in s3, since no point in s2 or s4 can dominate those in s1.
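These merge rules can be sketched in Python for the two-dimensional case. The layout is an assumption, since the figure is not part of the transcript: s3 is taken to be the lower-left partition, s1 the upper-left, s4 the lower-right, and s2 the upper-right, with smaller values being better. Treating s4 symmetrically to s1 and falling back to a full check of s2 when s3 is empty are additions of this sketch, and the split point and coordinates are hypothetical.

```python
def dominates(p, q):
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]


def quadrant_skyline(points, mx, my):
    """Skyline of 2-D points using four partitions split at (mx, my)."""
    s1 = skyline([p for p in points if p[0] < mx and p[1] >= my])   # upper left
    s2 = skyline([p for p in points if p[0] >= mx and p[1] >= my])  # upper right
    s3 = skyline([p for p in points if p[0] < mx and p[1] < my])    # lower left
    s4 = skyline([p for p in points if p[0] >= mx and p[1] < my])   # lower right

    def survivors(candidates, against):
        return [p for p in candidates
                if not any(dominates(q, p) for q in against)]

    result = list(s3)              # s3 always remains
    result += survivors(s1, s3)    # s1 is checked against s3 only
    result += survivors(s4, s3)    # assumed symmetric to s1
    if not s3:
        # s2 is discarded outright only when s3 holds at least one point.
        result += survivors(s2, s1 + s4)
    return result


# Hypothetical coordinates and split point.
pts = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
       (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]
print(sorted(quadrant_skyline(pts, 5.5, 5.5)))  # [(1, 9), (3, 2), (9, 1)]
```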


Drawbacks

D&C is efficient only for small data sets. If the data set is large, the partitioning process requires reading and writing the entire data set at least once, which incurs a high I/O cost.

Not suitable for online applications: it cannot report any results until the partitioning process completes.


Index Based Skyline

Organize a set of d-dimensional points into d lists: a point p = (p1, p2, ..., pd) is assigned to the ith list (1 ≤ i ≤ d) when pi is its smallest coordinate.

Points in each list are sorted in ascending order of their minimum coordinate (minC).

A batch in the ith list consists of points that have the same ith coordinate.

Algorithm 2
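The list construction can be sketched in Python as follows. The function name `build_index_lists`, the tie-breaking rule (ties go to the lowest dimension), and the sample coordinates are assumptions of this sketch; plain in-memory lists stand in for whatever index structure the original algorithm uses.

```python
from itertools import groupby


def build_index_lists(points):
    """Return d lists; each list is a sequence of (minC, batch) pairs."""
    d = len(points[0])
    lists = [[] for _ in range(d)]
    for p in points:
        # Index of the smallest coordinate; ties go to the lowest dimension.
        i = min(range(d), key=lambda j: p[j])
        lists[i].append(p)
    # Sort each list by minC and group points with equal minC into batches.
    return [
        [(min_c, list(batch))
         for min_c, batch in groupby(sorted(lst, key=min), key=min)]
        for lst in lists
    ]


# Hypothetical coordinates for the points a, b, i, m, k.
pts = [(1, 9), (2, 10), (3, 2), (6, 2), (9, 1)]
for index_list in build_index_lists(pts):
    print(index_list)
# [(1, [(1, 9)]), (2, [(2, 10)])]
# [(1, [(9, 1)]), (2, [(3, 2), (6, 2)])]
```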


Index List


Processing a batch

Compute the skyline inside the batch.

Among the computed points, add to the skyline list the ones that are not dominated by any of the already-found skyline points.


Processing a batch

Load the first batch of each list, {a} and {k}, and handle the one with the minimum minC: add {a} to the skyline list.

Compare batches {b} and {k}, and add {k} to the skyline list.

Load {b} and {i, m}. Find the skyline inside {i, m} first, which is {i}.

Compare {i} and {b}, and add {i} to the skyline list.

The algorithm stops, since the minC of every remaining batch is greater than or equal to the coordinates of i.

The skyline is {a, k, i}.
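The walkthrough can be reproduced with a short Python sketch. It makes several assumptions that are not in the slides: in-memory lists replace the index, batches from different lists that share the same minC are processed together (so that ties across lists cannot produce a wrong result), and the stopping test is strict, so the sketch may look at one more batch than the slide does. The coordinates are hypothetical.

```python
from heapq import merge
from itertools import groupby


def dominates(p, q):
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def index_skyline(points):
    d = len(points[0])
    lists = [[] for _ in range(d)]
    for p in points:
        lists[min(range(d), key=lambda j: p[j])].append(p)
    for lst in lists:
        lst.sort(key=min)  # ascending minC within each list

    skyline = []
    # Visit batches of all lists in ascending order of minC.
    for min_c, group in groupby(merge(*lists, key=min), key=min):
        # Stop once a skyline point lies strictly below every remaining point.
        if any(max(s) < min_c for s in skyline):
            break
        batch = list(group)
        # Skyline inside the batch first ...
        local = [p for p in batch if not any(dominates(q, p) for q in batch)]
        # ... then keep only points not dominated by earlier skyline points.
        fresh = [p for p in local if not any(dominates(s, p) for s in skyline)]
        skyline.extend(fresh)
    return skyline


# Hypothetical coordinates for a, b, c, h, i, k, m, n.
pts = [(1, 9), (2, 10), (4, 8), (4, 3), (3, 2), (9, 1), (6, 2), (8, 3)]
print(sorted(index_skyline(pts)))  # [(1, 9), (3, 2), (9, 1)]
```

In this run the point (4, 8) is never examined: by the time its batch is reached, the skyline point (3, 2) is smaller than every remaining minC.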


Pros and Cons

The hashing technique is straightforward and incurs low CPU overhead, but it has a high I/O cost, since multiple queries access a large part of the space.

Propagate and merge incur a high I/O cost, because the to-do list is scanned every time a point is discovered and when finding the best fit to merge.


Nearest Neighbor (NN)

Performs an NN query on the R-tree to find the point with the minimum distance from the origin of the axes (point o).

Distances are computed according to the L1 norm.

All the points in the dominance region are excluded from further consideration.

The result of the NN search is used to partition the data universe recursively.

Algorithm 3


Nearest Neighbor (NN)

Two partitions: (i) [0, i_x) × [0, ∞) and (ii) [0, ∞) × [0, i_y)

Partition 1: subdivisions 1 and 3. Partition 2: subdivisions 1 and 2.


Nearest Neighbor (NN)

The partitions that result from the discovery of a skyline point are inserted into a to-do list.

While the to-do list is not empty, NN removes one of the partitions from the list and recursively repeats the same process.
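The to-do loop can be sketched in Python for two dimensions. This is an illustration under stated assumptions: a linear scan stands in for the NN query on the R-tree, coordinates are non-negative so that the L1 distance from the origin is x + y, and a set removes the duplicates that arise because the two partitions overlap. The function name and sample coordinates are hypothetical.

```python
import math


def nn_skyline(points):
    """2-D skyline by repeated nearest-neighbor search from the origin."""
    skyline = set()                  # the set also removes duplicates
    todo = [(math.inf, math.inf)]    # each region is [0, bx) x [0, by)
    while todo:
        bx, by = todo.pop()
        inside = [p for p in points if p[0] < bx and p[1] < by]
        if not inside:
            continue                 # empty region, nothing to report
        # Nearest neighbor of the origin under the L1 norm (linear scan).
        p = min(inside, key=lambda q: q[0] + q[1])
        skyline.add(p)
        # Two overlapping partitions that exclude p's dominance region.
        todo.append((p[0], by))      # [0, p_x) x [0, by)
        todo.append((bx, p[1]))      # [0, bx) x [0, p_y)
    return skyline


# Hypothetical coordinates.
pts = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
       (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]
print(sorted(nn_skyline(pts)))  # [(1, 9), (3, 2), (9, 1)]
```

Each new region is strictly smaller than the one it came from and no longer contains the point just found, so the loop terminates.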


Nearest Neighbor (NN)

[0, a_x) × [0, ∞): subdivisions 1 and 3

[0, i_x) × [0, a_y): subdivisions 1 and 2


NN Concepts

Laisser-faire: A main-memory hash table stores the skyline points found so far.

Propagate: When a point p is found, all the partitions in the to-do list that contain p are removed and re-partitioned according to p.

Merge: The main idea is to merge partitions in the to-do list, thus reducing the number of queries that have to be performed.

Fine-grained Partitioning: The original NN algorithm generates d partitions after a skyline point is found. An alternative approach is to generate 2^d non-overlapping subdivisions.


References

S. Börzsönyi, D. Kossmann, and K. Stocker. The skyline operator. In Proc. IEEE Conf. on Data Engineering, Heidelberg, Germany, 2001.

K.-L. Tan, P.-K. Eng, and B. C. Ooi. Efficient progressive skyline computation. In Proc. of the Conf. on Very Large Data Bases, Rome, Italy, Sept. 2001.

H. T. Kung, F. Luccio, and F. P. Preparata. On finding the maxima of a set of vectors. Journal of the ACM, 22(4), 1975.

D. Kossmann, F. Ramsak, and S. Rost. Shooting stars in the sky: an online algorithm for skyline queries. VLDB, 2002.

D. Papadias, Y. Tao, G. Fu, and B. Seeger. An optimal and progressive algorithm for skyline queries. In Conf. on Management of Data, ACM SIGMOD, 2003.
