![Page 1: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/1.jpg)
Work-Efficient Parallel Skyline Computation for the GPU
Authors: Kenneth S. Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University).
Type: Research Paper
Presented by: Dardan Xhymshiti
Fall 2015
![Page 2: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/2.jpg)
Outline
Introduction
Skyline computation
Related work
GPU-friendly partitioning
The SkyAlign algorithm
Experimental evaluation
![Page 3: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/3.jpg)
Introduction Skyline operator:
First introduced:
Stephan Börzsönyi, Donald Kossmann, Konrad Stocker, 2001
(Universität Passau and Technische Universität München, Germany)
![Page 4: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/4.jpg)
Introduction Skyline operator: Example:
1. You go skiing for a day at one of Colorado's ski resorts.
2. You have spent a lot of money.
3. Your car breaks down.
4. You try to find the nearest and cheapest hotel.
5. You take out your phone and launch some tourist application.
6. It lists many hotels in different locations at a variety of prices.
7. You want to find the CHEAPEST and the NEAREST one.
![Page 5: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/5.jpg)
Introduction Skyline operator: Example:
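Query results:

| Result query | Price | Distance (miles) |
|---|---|---|
| Hotel A | $120 | 1.5 |
| Hotel B | $140 | 1 |
| Hotel C | $200 | 2 |
| Hotel D | $150 | 0.7 |
| … | … | … |

[Figure: the hotels plotted by price against distance in miles.]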
![Page 6: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/6.jpg)
Introduction Skyline operator: Example:
Skyline set = {Hotel A, Hotel B, Hotel D}
Hotel C is left out because Hotel A is both cheaper and nearer.
Term: Dominance
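The hotel example can be checked with a few lines of code. The following is a minimal brute-force sketch, not the algorithm from the paper; the function names `dominates` and `skyline` are chosen here for illustration, and it assumes that smaller values are better for both price and distance.

```python
# Hotels from the slide as (price in dollars, distance in miles); smaller is better.
hotels = {
    "Hotel A": (120, 1.5),
    "Hotel B": (140, 1.0),
    "Hotel C": (200, 2.0),
    "Hotel D": (150, 0.7),
}


def dominates(p, q):
    """True if p is no worse than q everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Brute force: keep every point that no other point dominates."""
    return sorted(
        name for name, q in points.items()
        if not any(dominates(p, q) for p in points.values())
    )


print(skyline(hotels))  # ['Hotel A', 'Hotel B', 'Hotel D']
```

Every hotel is compared with every other hotel here, which is exactly the quadratic point-to-point work that the rest of the talk tries to avoid.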
![Page 7: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/7.jpg)
Introduction Major problems:
Multidimensional data.
Computation intensive.
Tuple-to-tuple (point-to-point) comparisons.
What has been done so far:
State-of-the-art sequential algorithms.
Parallel skyline query processing algorithms.
These often try to achieve the device's maximum theoretical compute throughput.
Throughput is costly. The most efficient GPU algorithm, GSS, does up to 650 times more work than the best sequential algorithm, even when executing on 2688 cores.
For benchmark datasets, sequential algorithms perform 3x faster than GPU ones.
Should we use the GPU or not?
![Page 8: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/8.jpg)
Introduction
Sequential algorithms achieve high performance by using:
Trees
Recursion
Strict ordering of computation
Unpredictable branching
Many of these techniques are not compatible with the GPU.
Motivation:
Come up with a new algorithm, called SkyAlign, which:
MAIN POINT: avoids point-to-point comparisons as much as it can.
Employs a globally static grid scheme to make the dataset compatible with the GPU.
Does not maximize THROUGHPUT but is WORK-EFFICIENT.
![Page 9: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/9.jpg)
Introduction
[Figure: parallel vs. sequential computation of the skyline set from a dataset.]
![Page 10: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/10.jpg)
Skyline computation Notations:
P : dataset
$n = |P|$: number of tuples (points) in the dataset
$d$: dimensions (number of attributes)
$p, q$: arbitrary points
$p[i]$: the value of the $i$-th attribute in the tuple (point) $p$
Id
1 2 3
2 2 1
2 4 1
3 3 3
![Page 11: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/11.jpg)
Skyline computation Skyline definitions:
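Skyline is defined through the concept of dominance.
Definition 1:
Data point (tuple) A dominates data point B iff:
1. $A_i \le B_i$ for all attribute values, and
2. $A_i < B_i$ for at least one attribute value.
Definition 2 (in this paper):
Point $p$ dominates point $q$, denoted by $p \prec q$, iff $\forall i: p[i] \le q[i]$ and $\exists j: p[j] < q[j]$.
If neither $p \prec q$ nor $q \prec p$, we say that $p$ and $q$ are incomparable.
Transitivity: if $p \prec q$ and $q \prec r$, then $p \prec r$.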
![Page 12: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/12.jpg)
Skyline computation Measuring skyline work:
Dominance Test (DT)
Determining whether a data point $p$ dominates a data point $q$ by comparing them point-to-point.
The number of DTs performed measures the skyline work done.
Mask Test (MT)
Define a bitmask for each point by comparing it with a skyline (pivot) point.
Use transitivity to prune the number of tests.
Mask Tests are much cheaper than Dominance Tests.
![Page 13: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/13.jpg)
Skyline computation GPU Computation
Tesla K80: 4992 CUDA cores.
Threads are grouped into warps, usually of size 32.
Warps are grouped into thread blocks.
All threads within a warp execute the same instruction at the same time.
Problem: branch divergence.
![Page 14: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/14.jpg)
Related work Partition-based skyline algorithms
Divide-and-Conquer:
Halves the data space recursively at the median of an arbitrarily chosen dimension and solves each half. The results are then merged.
Sequential partition-based algorithms:
These algorithms employ recursive, point-based partitioning.
For each partition, a skyline point (pivot) is found, and the other points are partitioned based on their relationship to the pivot.
The work performed depends on the pivot selected.
SkyAlign is a partition-based method, but it is not recursive and has no merge step.
![Page 15: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/15.jpg)
Related work Sort-based (and GPU) skyline algorithms
Obtain efficiency from monotonicity and transitivity.
Block-nested-loops algorithm (BNL)
Each unprocessed point $q$ is compared, using a DT, against each point $p$ that is currently held as a skyline point. If $p$ dominates $q$, then $q$ is removed and control passes to the next point.
Sort-first skyline (SFS)
Sorts the data points prior to executing BNL. Once a point is added to the solution, it will never be removed.
GNL
Assigns a thread to every point $p$. Each point's thread compares it with the other data points to check the dominance criterion.
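The difference between BNL and SFS can be made concrete by counting dominance tests. The sketch below is a simplified, in-memory illustration rather than the published algorithms: the `CountingDT` helper is hypothetical, the sort key (the sum of the attributes) is one possible monotone score assumed here, and the four points are a reading of the small example table from the notation slide.

```python
def dominates(p, q):
    """Dominance test (DT): p is no worse everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


class CountingDT:
    """Hypothetical helper: wraps the dominance test and counts its calls."""

    def __init__(self):
        self.count = 0

    def __call__(self, p, q):
        self.count += 1
        return dominates(p, q)


def bnl(points, dt):
    """Block-nested-loops: test each point against the current window."""
    window = []
    for q in points:
        if any(dt(p, q) for p in window):
            continue  # q is dominated; move on to the next point
        # q survives, so evict window points that q dominates, then add q.
        window = [p for p in window if not dt(q, p)] + [q]
    return window


def sfs(points, dt):
    """Sort-first skyline: sort by a monotone score, then never evict."""
    solution = []
    for q in sorted(points, key=sum):
        if not any(dt(p, q) for p in solution):
            solution.append(q)
    return solution


points = [(1, 2, 3), (2, 2, 1), (2, 4, 1), (3, 3, 3)]
bnl_dt, sfs_dt = CountingDT(), CountingDT()
print(bnl(points, bnl_dt), "DTs:", bnl_dt.count)
print(sfs(points, sfs_dt), "DTs:", sfs_dt.count)
```

Because a dominating point always has a smaller attribute sum than the point it dominates, sorting by the sum guarantees that a point added to the SFS solution can never be evicted later, so the eviction tests that BNL performs are not needed.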
![Page 16: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/16.jpg)
GPU-Friendly partitioning
Work efficiency of skyline algorithms comes from skipping DTs.
To know which DTs between two data points $p$ and $q$ can be skipped, we need to know whether they are incomparable. Transitivity helps with this.
Example:
1. Say that we have three data tuples: $p$, $q$ and $r$.
2. The relationship of $p$ with $r$ is represented with one bit for each dimension $i \in [0, d)$ (Mask Test).
3. The relationship of $q$ with $r$ is likewise represented with one bit for each dimension $i \in [0, d)$.
4. Incomparability between $p$ and $q$ can then be detected by comparing these two bitmasks.
A Mask Test (MT) is cheaper than a Dominance Test (DT).
![Page 17: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/17.jpg)
GPU-Friendly partitioning
Getting to know point-based methods
Point-based recursive partitioning methods use a quad-tree partitioning of the data set and record skyline points in a tree as they are found.
[Figure: quad-tree example with labels A, B, C, D, E and F; skyline points (pivots) are marked.]
![Page 18: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/18.jpg)
GPU-Friendly partitioning
Each tree node contains a bitmask that records on which dimensions it is worse than its parent.
When processing a point $q$, the quad tree can be used to eliminate DTs for $q$.
First, a bitmask is built recording the dimension-wise relationship of $q$ to the root of the tree (in this case C).
If all bits are set (all bits in the bitmask are 1), $q$ is dominated. Otherwise, only those children of the root (B, E) whose bitmasks, compared with the bitmask of $q$, do not imply incomparability need to be visited.
A deeper tree permits skipping more DTs.
![Page 19: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/19.jpg)
GPU-Friendly partitioning
Why is recursive partitioning not preferred?
High divergence
Traversal
Consider the case where points in F are to be compared with points in D. First, a DT with the root E is performed for each point, generating bitmasks. These bitmasks are then used to determine which branches of D each point of F should traverse. The results often diverge.
Partitioning
Each partition has to be sub-partitioned relative to its own pivot.
The pivot needs to be a skyline point.
High dimensions
Quad-tree partitioning does not scale well with dimensionality.
![Page 20: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/20.jpg)
GPU-Friendly partitioning
A static grid alternative
Each dimension is split based on the quartiles computed from the dimension values.
Three global pivots are defined, one corresponding to each quartile boundary.
For each point, two bitmasks are defined:
1. One bitmask relative to the median.
2. One bitmask relative to either the first or the third quartile.
First level: all the points are partitioned by their relationship to the median of the dataset.
Second level: all the points are partitioned by their relationship to either the first or the third quartile.
Do we need a third level?
![Page 21: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/21.jpg)
GPU-Friendly partitioning
Definition of masks
Let:
$Q_1[i]$ be the first quartile for the attribute $i$
$Q_2[i]$ be the median for the attribute $i$
$Q_3[i]$ be the third quartile for the attribute $i$
We denote by:
$m_p$: the median-level-resolution bitmask for point $p$
$m'_p$: the quartile-level-resolution bitmask for point $p$
![Page 22: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/22.jpg)
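GPU-Friendly partitioning
Definition of masks
For dimension $i$, bit $i$ of the median-level mask of a point $p$ is 1 (is set) if $p[i]$ is larger than or equal to the median on dimension $i$.
For dimension $i$, bit $i$ of the quartile-level mask of $p$ is 1 (is set) if $p[i]$ is larger than or equal to the corresponding quartile (first or third) on dimension $i$.
We have: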
![Page 23: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/23.jpg)
GPU-Friendly partitioning
Definition of masks
because the point is less than the x-median and greater than the y-median.
The same holds for the others.
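The two mask definitions can be sketched in code. This is an illustration under stated assumptions, not the authors' implementation: the function names, the bit layout (bit $i$ stands for dimension $i$, with dimension 0 as the least significant bit), the rule that a point below the median is compared with the first quartile and any other point with the third quartile, and the boundary values 25, 50 and 75 are all chosen here for the example.

```python
def median_mask(p, medians):
    """Bit i is set when p[i] >= the median of dimension i."""
    mask = 0
    for i, (value, median) in enumerate(zip(p, medians)):
        if value >= median:
            mask |= 1 << i
    return mask


def quartile_mask(p, q1, medians, q3):
    """Bit i is set when p[i] >= the quartile on p's own side of the median."""
    mask = 0
    for i, value in enumerate(p):
        # Assumption: below the median compare with Q1, otherwise with Q3.
        boundary = q3[i] if value >= medians[i] else q1[i]
        if value >= boundary:
            mask |= 1 << i
    return mask


# Hypothetical quartile boundaries for a two-dimensional dataset (x, y).
q1, medians, q3 = (25, 25), (50, 50), (75, 75)
p = (30, 80)  # below the x-median, above the y-median

print(format(median_mask(p, medians), "02b"))            # 10
print(format(quartile_mask(p, q1, medians, q3), "02b"))  # 11
```

Together the two masks place each point in one of four intervals per dimension, which is the static grid that replaces the recursive quad-tree.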
![Page 24: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/24.jpg)
GPU-Friendly partitioning
How to define incomparability using statically-based MTs
We can define incomparability between two bitmasks by considering:
1. Ordering (the number of bits that are 1 in each bitmask).
2. Bitwise relationships.
The authors define these equations for both resolutions; they rely on the transitivity property with respect to the median:
Median-level resolution
![Page 25: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/25.jpg)
GPU-Friendly partitioning Median-level resolution
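This equation checks whether the median-level mask of $p$ has any bits set (equal to 1) that are not also set in the median-level mask of $q$ (are 0 there). If so, then there exists a dimension $i$ such that $q[i]$ is below the median on dimension $i$ while $p[i]$ is at or above it, so $q[i] < p[i]$. Consequently, $p$ does not dominate $q$.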
Example:
![Page 26: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/26.jpg)
GPU-Friendly partitioning Median-level resolution
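If the median-level mask of $p$ has more 1's than the median-level mask of $q$ does, then it necessarily contains a 1 that is not set in the mask of $q$, so $p$ does not dominate $q$.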
Example:
![Page 27: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/27.jpg)
GPU-Friendly partitioning Median-level resolution
If the median-level masks of $p$ and $q$ have the same order, then the only condition under which all bits set in the mask of $p$ are also set in the mask of $q$ is that the masks are identical. If the bitmasks are not identical, then neither $p$ dominates $q$ nor $q$ dominates $p$, because both masks have the same order but different arrangements of 1s.
Example: , but and
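The three median-level rules can be checked mechanically. The sketch below uses hypothetical function names and assumes the convention that a set bit means "at or above the median on that dimension"; it exhaustively confirms that the two order-based rules follow from the bitwise rule for all 3-bit masks.

```python
from itertools import product


def order(mask):
    """Order of a mask: the number of bits that are 1."""
    return bin(mask).count("1")


def cannot_dominate(m_p, m_q):
    """True when the masks alone prove that p does not dominate q.

    If p has a bit set that q lacks, p is at or above the median on a
    dimension where q is below it, so p is worse than q there.
    """
    return (m_p | m_q) != m_q


# Exhaustively confirm the two order-based shortcuts for all 3-bit masks.
for m_p, m_q in product(range(8), repeat=2):
    if order(m_p) > order(m_q):
        assert cannot_dominate(m_p, m_q)
    if order(m_p) == order(m_q) and m_p != m_q:
        assert cannot_dominate(m_p, m_q) and cannot_dominate(m_q, m_p)

print(cannot_dominate(0b10, 0b01), cannot_dominate(0b01, 0b11))  # True False
```

A `False` result does not mean that $p$ dominates $q$; it only means the mask test is inconclusive and a full dominance test is still required.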
![Page 28: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/28.jpg)
GPU-Friendly partitioning Quartile-level resolution
![Page 29: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/29.jpg)
The SkyAlign Algorithm
A global static partitioning of the data set is computed.
One thread is assigned to each data point.
At a high level, SkyAlign consists of $d$ iterations. In the $i$-th iteration, the remaining points are compared, each by its own thread, to all points with order $i$ (the number of set bits in the mask), using MTs and DTs as necessary.
After each iteration, dominated points are removed and all surviving points are moved into the solution.
![Page 30: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/30.jpg)
The SkyAlign Algorithm
Pre-filter: eliminates points that are easy to identify as not in the skyline by defining a threshold as the min of max values:
$\text{threshold} = \min_{p \in P} \max_{i \in [0, d)} p[i]$,
which is some point's max value and the smallest such largest value in the data.
Each thread is responsible for one point and checks whether all of that point's values are larger than the threshold. If so, the point is dominated by the point that defines the threshold and is eliminated.
Id
1 2 3
2 2 1
2 4 1
3 3 3
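The pre-filter can be traced on the small table above. This is a sequential sketch of the idea, with a list comprehension standing in for the one-thread-per-point check, and it assumes that the table rows are four three-dimensional points; the function name `prefilter` is chosen here for illustration.

```python
def prefilter(points):
    """Drop points that are dominated by the point defining the threshold."""
    # Threshold: the smallest of the per-point maximum values.
    threshold = min(max(p) for p in points)
    # A point whose every value exceeds the threshold is strictly worse on
    # all dimensions than the point that defines the threshold.
    kept = [p for p in points if min(p) <= threshold]
    return threshold, kept


points = [(1, 2, 3), (2, 2, 1), (2, 4, 1), (3, 3, 3)]
threshold, kept = prefilter(points)
print(threshold)  # 2
print(kept)       # [(1, 2, 3), (2, 2, 1), (2, 4, 1)]
```

The threshold 2 comes from the point (2, 2, 1), and (3, 3, 3) is removed because all of its values exceed 2. The filter is cheap but incomplete: (2, 4, 1) is also dominated, yet it survives this step and has to be caught later.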
![Page 31: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/31.jpg)
The SkyAlign Algorithm
Mask assignment: masks are assigned to each point, given the quartiles of the dataset for each dimension.
![Page 32: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/32.jpg)
The SkyAlign Algorithm
Data sorting: sort the data points by the order of their masks (the number of set bits).
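To show how mask assignment, sorting by order and the level-wise comparison fit together, here is a sequential sketch. It is an illustration under several assumptions and not the authors' GPU kernel: it uses only the median-level mask, it computes the medians with Python's `statistics.median`, it processes one order (level) at a time in plain loops instead of one thread per point, and the function name `levelwise_skyline` is hypothetical.

```python
import statistics


def dominates(p, q):
    """Dominance test (DT)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def median_mask(p, medians):
    """Bit i is set when p[i] >= the median of dimension i."""
    return sum(1 << i for i, (v, m) in enumerate(zip(p, medians)) if v >= m)


def order(mask):
    """Number of set bits in a mask."""
    return bin(mask).count("1")


def levelwise_skyline(points):
    medians = [statistics.median(column) for column in zip(*points)]
    # Mask assignment, then data sorting by mask order.
    tagged = sorted(((median_mask(p, medians), p) for p in points),
                    key=lambda t: order(t[0]))
    solution = []  # (mask, point) pairs already confirmed as skyline points
    for level in range(len(medians) + 1):
        layer = [t for t in tagged if order(t[0]) == level]
        survivors = []
        for m_q, q in layer:
            dominated = False
            for m_p, p in solution + layer:
                if (m_p | m_q) != m_q:
                    continue  # mask test: p cannot dominate q, so skip the DT
                if dominates(p, q):  # dominance test
                    dominated = True
                    break
            if not dominated:
                survivors.append((m_q, q))
        solution.extend(survivors)
    return [p for _, p in solution]


points = [(1, 2, 3), (2, 2, 1), (2, 4, 1), (3, 3, 3)]
print(levelwise_skyline(points))  # [(1, 2, 3), (2, 2, 1)]
```

Processing the levels in increasing order works because a point can only be dominated by points of the same or a lower order, so every point that survives its own level is final.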
![Page 34: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/34.jpg)
Experimental evaluation
Evaluation is done by comparing SkyAlign against state-of-the-art sequential, multi-core, and GPU skyline algorithms.
Algorithms used for comparison:
BSkyTree, Hybrid, GSS, SkyAlign
Testing is done using synthetic data generated by a skyline dataset generator, which produced datasets that are correlated, independent and anticorrelated.
By default: and
Environment:
Quad-core Intel i7 at 3.40GHz, with 16GB of RAM, using an NVIDIA GTX Titan GPU.
![Page 35: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/35.jpg)
Experimental evaluation
Run-time performance
Measure the execution time of the four algorithms, testing them on datasets with variations in distribution, dimensionality and cardinality.
1. Cardinality (d = 12)
2. Dimensionality (n = )
![Page 36: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/36.jpg)
Experimental evaluation
Work-efficiency
Compare the performance of the four algorithms with respect to:
1. Dominance tests (DT)
2. Work-efficiency
![Page 38: Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti](https://reader036.vdocuments.net/reader036/viewer/2022062503/5a4d1ae17f8b9ab059977242/html5/thumbnails/38.jpg)
Thank You