Skyline
Charuka Silva
Outline
Charuka Silva, Skyline2
Motivation
Skyline Definition
Applications
Skyline Query
Similar Interesting Problem
Algorithms
  Divide and Conquer Algorithm
  Index based Algorithm
  Nearest Neighbor
Trip to Nassau (Bahamas): find a hotel that is cheap and close to the beach.
The two goals conflict, as the hotels near the beach tend to be more expensive.
A travel agent can suggest all interesting hotels.
Interesting are all hotels that are not worse than any other hotel in both dimensions.
We call this set of interesting hotels the Skyline.
Distribution of Hotels
Formal Skyline Definition
The skyline is defined as the set of points that are not dominated by any other point.
A point dominates another point if it is as good or better in all dimensions and better in at least one dimension.
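The definition translates almost directly into code. The following is a minimal sketch, assuming that smaller values are better in every dimension; the hotel tuples (price, distance) are made-up illustration data, not taken from the slides.

```python
def dominates(p, q):
    """True if p is as good or better than q in all dimensions
    and strictly better in at least one (smaller is better)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def skyline(points):
    """Naive O(n^2) skyline: keep every point no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance to beach).
hotels = [(50, 2.0), (80, 0.5), (90, 1.0), (60, 2.5), (120, 0.2)]
result = skyline(hotels)
print(result)
```

Here (90, 1.0) is dominated by (80, 0.5) and (60, 2.5) by (50, 2.0), so three hotels remain.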
Where Does It Apply?
The Skyline operator is important for applications involving multi-criteria decision making.
Some Applications
Customer information systems, travel agencies and mobile city guides. The skyline has to be recomputed as the user moves.
The Skyline of Manhattan, for instance, can be computed as the set of buildings which are high and close to the Hudson river.
Decision support (business intelligence), e.g. customers who buy more and complain little.
Data visualization, e.g. the points of an object seen from a certain perspective can be determined.
Distributed query optimization, e.g. find the set of interesting sites which have high computation power and are close to the data needed to execute the query.
Skyline Query
SELECT * FROM Hotels SKYLINE OF price MIN, distance MIN
What else: MAX, joins, GROUP BY and so on.
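The SKYLINE OF clause is an extension and is not available in standard SQL engines. As a sketch of what the query above computes, the same result can be expressed with a NOT EXISTS subquery; the example below runs it on SQLite through Python's sqlite3 module. The table contents are hypothetical illustration data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hotels (name TEXT, price REAL, distance REAL)")
# Hypothetical hotels: (name, price, distance to beach).
conn.executemany(
    "INSERT INTO Hotels VALUES (?, ?, ?)",
    [("A", 50, 2.0), ("B", 80, 0.5), ("C", 90, 1.0),
     ("D", 60, 2.5), ("E", 120, 0.2)],
)

# A hotel h is in the skyline if no other hotel o is at least as good
# in both dimensions and strictly better in one of them.
rows = conn.execute("""
    SELECT h.name FROM Hotels h
    WHERE NOT EXISTS (
        SELECT 1 FROM Hotels o
        WHERE o.price <= h.price AND o.distance <= h.distance
          AND (o.price < h.price OR o.distance < h.distance))
    ORDER BY h.price
""").fetchall()
skyline_names = [r[0] for r in rows]
print(skyline_names)
conn.close()
```

For a MAX dimension the comparisons on that column would be reversed (>= and >).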
Skyline Query Results
The result of the query is {a, i, k}.
Top-K Queries vs. Skyline
Top-K (or ranked) queries retrieve the best K objects that minimize a specific preference function.
E.g. given the preference function f(x,y) = x + y, the top-3 query retrieves <i,5>, <h,7>, <m,8> (in this order).
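A top-K query can be sketched with a heap-based selection. The coordinates below are an assumption: the figure is not part of this transcript, so they were chosen only to be consistent with the results quoted on the slides (scores 5, 7, 8 and skyline {a, i, k}).

```python
import heapq

# Assumed coordinates (x, y), consistent with the quoted results.
POINTS = {
    "a": (1, 9), "b": (2, 10), "c": (4, 8), "d": (6, 7), "e": (9, 10),
    "f": (7, 5), "g": (5, 6), "h": (4, 3), "i": (3, 2), "k": (9, 1),
    "l": (10, 4), "m": (6, 2), "n": (8, 3),
}


def f(p):
    """Preference function from the slide: f(x, y) = x + y."""
    return p[0] + p[1]


# The 3 points with the smallest score, in ascending score order.
top3 = heapq.nsmallest(3, POINTS.items(), key=lambda item: f(item[1]))
ranked = [(name, f(p)) for name, p in top3]
print(ranked)
```

Unlike the skyline, the answer depends on the chosen preference function: under these coordinates h and m are in the top 3 but not in the skyline, while a and k are in the skyline but not in the top 3.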
Divide-and-Conquer (D&C)
Divide the data set into several partitions so that each partition fits in memory.
Compute the partial skyline of the points in every partition.
Merge the partial skylines to obtain the full skyline.
Algorithm 1
Partitioned Space
Partial skylines: {a,c,g}, {d}, {i}, {m,k}
Divide and Conquer
All points in the skyline of s3 must remain.
Those in s2 are discarded, since they are dominated by the points in s3.
Each skyline point in s1 is compared only with the points in s3, because no point in s2 or s4 can dominate those in s1.
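The partition, partial-skyline and merge steps can be sketched as follows. Several things are assumptions: the point coordinates (chosen to reproduce the partial skylines quoted on the slides), the split values, and the naming of the quadrants (s1 upper left, s2 upper right, s3 lower left, s4 lower right). Everything is kept in memory here, whereas the real algorithm partitions data on disk.

```python
def dominates(p, q):
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def skyline(points):
    """Naive skyline of a dict name -> (x, y)."""
    return {n: p for n, p in points.items()
            if not any(dominates(o, p) for o in points.values())}


POINTS = {  # assumed coordinates
    "a": (1, 9), "b": (2, 10), "c": (4, 8), "d": (6, 7), "e": (9, 10),
    "f": (7, 5), "g": (5, 6), "h": (4, 3), "i": (3, 2), "k": (9, 1),
    "l": (10, 4), "m": (6, 2), "n": (8, 3),
}
SPLIT_X, SPLIT_Y = 5.5, 5.5  # assumed split lines


def partition(points):
    parts = {"s1": {}, "s2": {}, "s3": {}, "s4": {}}
    for name, (x, y) in points.items():
        left, low = x < SPLIT_X, y < SPLIT_Y
        key = "s3" if left and low else "s1" if left else "s4" if low else "s2"
        parts[key][name] = (x, y)
    return parts


def merge(partials):
    s1, s2, s3, s4 = (partials[k] for k in ("s1", "s2", "s3", "s4"))
    if not s3:
        # Without a lower-left point the shortcut does not apply.
        return skyline({**s1, **s2, **s4})
    result = dict(s3)                 # skyline of s3 always remains
    for part in (s1, s4):             # s2 is dominated by s3 and dropped
        result.update({n: p for n, p in part.items()
                       if not any(dominates(q, p) for q in s3.values())})
    return result


partials = {k: skyline(v) for k, v in partition(POINTS).items()}
final = merge(partials)
print({k: sorted(v) for k, v in partials.items()}, sorted(final))
```

Comparing s4 only against s3 relies on the same argument the slide gives for s1: s1 and s4 cannot dominate each other, and s2 cannot dominate either of them.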
Drawbacks
D&C is efficient only for small data sets. If the data set is large, the partitioning process requires reading and writing the entire data set at least once, which means high I/O cost.
Not suitable for online applications: it cannot report any results until the partitioning process completes.
Index Based Skyline
Organize the set of d-dimensional points into d lists: a point p = (p1, p2, ..., pd) is assigned to the ith list (1≤i≤d) when pi is its smallest coordinate.
Points in each list are sorted in ascending order of their minimum coordinate (minC).
A batch in the ith list consists of points that have the same ith coordinate.
Algorithm 2
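The list organization described above can be sketched in memory as follows. The coordinates are assumed (the figure is not in this transcript), lists are 0-indexed in the code, and ties between coordinates go to the lower list index, which is a choice made for this sketch. The original method keeps the lists in an index structure on disk.

```python
from itertools import groupby

POINTS = {  # assumed coordinates
    "a": (1, 9), "b": (2, 10), "c": (4, 8), "d": (6, 7), "e": (9, 10),
    "f": (7, 5), "g": (5, 6), "h": (4, 3), "i": (3, 2), "k": (9, 1),
    "l": (10, 4), "m": (6, 2), "n": (8, 3),
}


def build_index_lists(points, d):
    """Return d lists; each list is a sequence of (minC, [names]) batches."""
    lists = [[] for _ in range(d)]
    for name, p in points.items():
        i = min(range(d), key=lambda j: p[j])   # dimension of smallest coord
        lists[i].append((p[i], name))
    batched = []
    for entries in lists:
        entries.sort()                          # ascending minC
        batched.append([(min_c, [name for _, name in group])
                        for min_c, group in groupby(entries, key=lambda e: e[0])])
    return batched


index_lists = build_index_lists(POINTS, 2)
for number, batches in enumerate(index_lists, start=1):
    print("list", number, batches)
```

With these coordinates the first batches of the two lists are {a} and {k}, followed by {b} and {i,m}.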
Index List
Processing a batch
Compute the skyline inside the batch.
Among the computed points, add those not dominated by any of the already-found skyline points to the skyline list.
Processing a batch
Load the first batch of each list, i.e. {a} and {k}, and handle the one with the minimum minC; add {a} to the skyline list.
Compare batches {b} and {k}, and add {k} to the list.
Load {b} and {i,m}. Find the skyline inside {i,m} first, which is {i}.
Compare {i} and {b}, and add {i} to the skyline list.
The algorithm stops, since the minC of every remaining batch is greater than or equal to the coordinates of i.
The skyline is {a,k,i}.
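The walkthrough above can be reproduced with a small in-memory sketch. The coordinates are assumed, ties in minC are resolved in favour of the lower list index, and the stop test used here (some skyline point has all coordinates less than or equal to the smallest pending minC) is one reading of the termination rule on the slide.

```python
from itertools import groupby


def dominates(p, q):
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


POINTS = {  # assumed coordinates
    "a": (1, 9), "b": (2, 10), "c": (4, 8), "d": (6, 7), "e": (9, 10),
    "f": (7, 5), "g": (5, 6), "h": (4, 3), "i": (3, 2), "k": (9, 1),
    "l": (10, 4), "m": (6, 2), "n": (8, 3),
}


def batches_for(points, i, d):
    """Batches (minC, [names]) of list i, in ascending minC order."""
    own = sorted((p[i], n) for n, p in points.items()
                 if min(range(d), key=lambda j: p[j]) == i)
    return [(c, [n for _, n in g]) for c, g in groupby(own, key=lambda e: e[0])]


def index_skyline(points, d):
    lists = [batches_for(points, i, d) for i in range(d)]
    pos = [0] * d                      # next unread batch per list
    result = []
    while True:
        heads = [(lists[i][pos[i]][0], i) for i in range(d) if pos[i] < len(lists[i])]
        if not heads:
            break
        min_c, i = min(heads)          # batch with the minimum minC
        if any(max(points[s]) <= min_c for s in result):
            break                      # every remaining point is dominated
        names = lists[i][pos[i]][1]
        pos[i] += 1
        # Skyline inside the batch, then filter against points found so far.
        local = [n for n in names
                 if not any(dominates(points[o], points[n]) for o in names)]
        result += [n for n in local
                   if not any(dominates(points[s], points[n]) for s in result)]
    return result


found = index_skyline(POINTS, 2)
print(found)
```

The points are reported progressively in the order a, k, i, and the loop exits before the batches {c} and {h,n} are processed.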
Pros and Cons
The hashing technique is straightforward and incurs low CPU overhead, but it has high I/O cost, since multiple queries access a large part of the space.
Propagate and merge incur high I/O cost, because the to-do list is scanned every time a point is discovered and when finding the best fit to merge.
Nearest Neighbor (NN)
Perform a NN query on the R-tree to find the point with the minimum distance from the beginning of the axes (point o).
Distances are computed according to the L1 norm.
All the points in the dominance region are exempt from further consideration.
The result of the NN search is used to partition the data universe recursively.
Algorithm 3
Nearest Neighbor (NN)
Two partitions: (i) [0,ix) × [0,∞) and (ii) [0,∞) × [0,iy)
Partition 1: subdivisions 1 and 3. Partition 2: subdivisions 1 and 2.
Nearest Neighbor (NN)
The partitions resulting from the discovery of a skyline point are inserted in a to-do list.
While the to-do list is not empty, NN removes one of the partitions from the list and recursively repeats the same process.
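The to-do list loop can be sketched for the two-dimensional case as follows. This is not the R-tree based implementation: a linear scan stands in for the NN query on the R-tree, the coordinates are assumed, and the to-do list is processed first-in first-out, which is a choice made for this sketch.

```python
import math
from collections import deque

POINTS = {  # assumed coordinates
    "a": (1, 9), "b": (2, 10), "c": (4, 8), "d": (6, 7), "e": (9, 10),
    "f": (7, 5), "g": (5, 6), "h": (4, 3), "i": (3, 2), "k": (9, 1),
    "l": (10, 4), "m": (6, 2), "n": (8, 3),
}


def nn_query(points, bx, by):
    """Point in [0,bx) x [0,by) nearest to the origin under the L1 norm.
    A linear scan is used here in place of an R-tree NN search."""
    inside = [(x + y, name) for name, (x, y) in points.items()
              if x < bx and y < by]
    return min(inside)[1] if inside else None


def nn_skyline(points):
    result = []
    todo = deque([(math.inf, math.inf)])   # start with the whole data universe
    while todo:
        bx, by = todo.popleft()
        name = nn_query(points, bx, by)
        if name is None:
            continue                        # empty partition
        result.append(name)                 # the NN is a skyline point
        nx, ny = points[name]
        todo.append((nx, by))               # partition [0,nx) x [0,by)
        todo.append((bx, ny))               # partition [0,bx) x [0,ny)
    return result


found = nn_skyline(POINTS)
print(found)
```

In two dimensions the overlap of the two new partitions is always empty, so no skyline point is reported twice; in higher dimensions duplicates can occur and must be eliminated.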
Nearest Neighbor (NN)
[0,ax) × [0,∞): subdivisions 1 and 3
[0,ix) × [0,ay): subdivisions 1 and 2
NN Concepts
Laisser-faire: A main memory hash table stores the skyline points found so far.
Propagate: When a point p is found, all the partitions in the to-do list that contain p are removed and re-partitioned according to p.
Merge: The main idea is to merge partitions in the to-do list, thus reducing the number of queries that have to be performed.
Fine-grained Partitioning: The original NN algorithm generates d partitions after a skyline point is found. An alternative approach is to generate 2^d non-overlapping subdivisions.