Engineering A Cache Oblivious Sorting Algorithm
Gerth Brodal, Rolf Fagerberg and Kristoffer Vinther
Presenter: Rawn Henry, February 25, 2019
Memory Hierarchy
Typical sizes:
ØL1 cache: 32kB – 64kB
ØL2 cache: 256kB – 512kB
ØL3 cache: 8MB – 32MB
ØMain memory: 4GB – 32GB
ØDisk: terabytes
Observations
ØMemory accesses are usually the bottleneck in algorithms, since the CPU is much faster than main memory
ØWe want to minimize the number of times we have to fetch data from slow memory by maximizing data reuse
Cache Aware vs Cache Oblivious
ØBoth have cache-friendly access patterns.
ØCache-aware algorithms depend on architectural parameters such as cache size and the depth of the hierarchy, whereas cache-oblivious algorithms do not.
ØCache-aware algorithms tend to be faster but are not portable without retuning.
ØWe want the speed of cache-friendly access together with the portability of a cache-oblivious algorithm.
Funnel Sort Algorithm Description
ØRecursively sort n⅓ contiguous arrays of n⅔ items
ØMerge the sorted sequences using an n⅓-merger
ØBase case is a merger with k = 2
ØSimilar to merge sort but with a different merging routine
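A minimal Python sketch of the recursion above. It assumes Python's `heapq.merge` as a stand-in for the n⅓-merger (a heap-based merge, not a cache-oblivious funnel) and falls back to `sorted()` for small inputs instead of using the k = 2 base case. The names `funnel_sort`, `ceil_cube_root` and the cutoff `base` are hypothetical.

```python
import heapq


def ceil_cube_root(n: int) -> int:
    """Smallest integer k with k**3 >= n (exact integer arithmetic)."""
    k = 1
    while k ** 3 < n:
        k += 1
    return k


def funnel_sort(a: list, base: int = 16) -> list:
    """Sort about n^(1/3) contiguous segments of about n^(2/3) items each,
    then merge them.

    Assumptions: heapq.merge stands in for the cache-oblivious merger,
    and inputs of at most `base` items are sorted with sorted().
    """
    n = len(a)
    if n <= base:
        return sorted(a)
    k = ceil_cube_root(n)          # number of segments, ~ n^(1/3)
    seg = -(-n // k)               # ceiling division: segment length ~ n^(2/3)
    runs = [funnel_sort(a[i:i + seg], base) for i in range(0, n, seg)]
    return list(heapq.merge(*runs))  # stand-in for the n^(1/3)-merger
```

Replacing `heapq.merge` with a real k-merger is where the cache-oblivious behavior would come from; the recursion shape stays the same.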
Funnel Sort Picture
Taken from 6.172 lecture 15 Fall 2018
Sorting bounds
Quick sort
ØWork = $O(n \lg n)$ (expected)
ØCache misses = $O\left(\frac{n}{B}\lg n\right)$
Funnel sort
ØWork = $O(n \lg n)$
ØCache misses = $O\left(\frac{n}{B}\log_M n\right)$
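A worked comparison of the two cache bounds, using hypothetical sizes counted in items: n = 2^30 items and a cache holding M = 2^20 items. The n/B term is shared, so the ratio of the bounds is lg n / log_M n = lg M, here 20. Constants hidden by the O-notation are ignored, so this only indicates the asymptotic advantage. `miss_factor_ratio` is a hypothetical helper.

```python
import math


def miss_factor_ratio(n: float, m: float) -> float:
    """Ratio of the lg n factor (quicksort bound) to the log_M n factor
    (funnelsort bound); the shared n/B term and all constants cancel."""
    return math.log2(n) / math.log(n, m)


# Hypothetical sizes, in items: n = 2^30 items, cache M = 2^20 items.
n, M = 2 ** 30, 2 ** 20
print(math.log2(n))             # lg n
print(math.log(n, M))           # log_M n
print(miss_factor_ratio(n, M))  # analytically equal to lg M
```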
Issues with Funnel Sort
ØIn practice a K-funnel cannot always be split into √K bottom funnels, because √K need not be an integer, so the split requires rounding.
ØvEB layout performs well for binary trees but does not perform well for complex data structures.
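A small check of when the √K split stays exact at every level down to the k = 2 base case. `splits_evenly` is a hypothetical helper. Only K of the form 2^(2^i) qualify, so for almost every K the recursive split involves rounding.

```python
import math


def splits_evenly(k: int) -> bool:
    """True if a k-funnel splits into sqrt(k) bottom funnels of sqrt(k)
    inputs at every level, all the way down to k = 2."""
    while k > 2:
        r = math.isqrt(k)
        if r * r != k:  # sqrt(k) is not an integer: rounding needed
            return False
        k = r
    return k == 2


print([k for k in range(2, 70_000) if splits_evenly(k)])
# prints [2, 4, 16, 256, 65536]
```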
Lazy Funnel Sort
ØTo overcome the rounding problem, only binary mergers are used
ØEach binary merger takes two sorted input streams and outputs a single sorted stream
ØvEB layout is very friendly to binary trees
ØAnalysis of algorithm remains the same despite changes
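A Python sketch of a tree of lazy binary mergers. It assumes one uniform buffer capacity for every node (the paper tunes buffer sizes with α) and ignores memory layout entirely. A node refills a child's buffer only when that buffer runs empty, and keeps merging until its own output buffer is full. `Leaf`, `BinaryMerger` and `lazy_funnel_merge` are hypothetical names.

```python
from collections import deque


class Leaf:
    """An input run: all of it is already buffered, nothing more arrives."""

    def __init__(self, run):
        self.buf = deque(run)
        self.exhausted = True

    def fill(self):
        pass  # nothing left to produce


class BinaryMerger:
    """Lazy binary merger with a bounded output buffer (uniform capacity
    here, an assumption of this sketch)."""

    def __init__(self, left, right, capacity):
        self.left, self.right = left, right
        self.capacity = capacity
        self.buf = deque()
        self.exhausted = False  # True once both inputs are drained

    def fill(self):
        while len(self.buf) < self.capacity:
            # Lazy step: refill a child only when its buffer is empty.
            for child in (self.left, self.right):
                if not child.buf and not child.exhausted:
                    child.fill()
            lb, rb = self.left.buf, self.right.buf
            if lb and rb:
                src = lb if lb[0] <= rb[0] else rb
            elif lb or rb:
                src = lb or rb
            else:
                self.exhausted = True
                return
            self.buf.append(src.popleft())


def lazy_funnel_merge(runs, capacity=4):
    """Merge sorted runs through a tree of lazy binary mergers."""
    if not runs:
        return []
    nodes = [Leaf(r) for r in runs]
    while len(nodes) > 1:  # pair nodes up level by level
        nxt = [BinaryMerger(nodes[i], nodes[i + 1], capacity)
               for i in range(0, len(nodes) - 1, 2)]
        if len(nodes) % 2:
            nxt.append(nodes[-1])
        nodes = nxt
    root, out = nodes[0], []
    while True:
        root.fill()
        if not root.buf:
            return out
        out.extend(root.buf)
        root.buf.clear()
```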
Lazy k-Funnel Sort Diagram
Algorithm Parameters
ØLazy funnel sort recursively sorts $N^{1/d}$ segments of size $N^{1-1/d}$, then performs an $N^{1/d}$-merge
Ø α – controls the buffer size
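To see how d shifts the balance between the number of segments and their size, this hypothetical `split` helper computes the segment count (the smallest integer k with k^d ≥ N) and the resulting segment size for N = 10^6. d = 3 reproduces the n⅓ split described earlier; larger d gives a smaller merger and larger segments.

```python
def ceil_root(n: int, d: int) -> int:
    """Smallest integer k with k**d >= n, for n >= 1."""
    k = max(1, round(n ** (1 / d)))  # floating-point estimate...
    while k ** d < n:                # ...corrected with exact integers
        k += 1
    while k > 1 and (k - 1) ** d >= n:
        k -= 1
    return k


def split(n: int, d: int) -> tuple[int, int]:
    """(number of segments ~ N^(1/d), segment size ~ N^(1-1/d))."""
    k = ceil_root(n, d)
    return k, -(-n // k)  # ceiling division for the segment size


for d in (2, 3, 4):
    print(d, split(10 ** 6, d))
# prints:
# 2 (1000, 1000)
# 3 (100, 10000)
# 4 (32, 31250)
```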
Optimizations: k-Merger Structure
ØMemory layout: BFS, DFS or vEB; nodes and buffers stored separately or together
ØTree navigation method: pointers or address calculations
ØInvocation style: recursive or iterative
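A sketch of the vEB layout for a complete binary tree, using 1-based heap numbering (children of node i are 2i and 2i + 1). It assumes the top subtree gets ⌊h/2⌋ levels; putting the extra level in the top half instead is an equally common convention. BFS layout is plain heap order, shown for contrast. Both function names are hypothetical.

```python
def veb_order(h: int, root: int = 1) -> list[int]:
    """Heap indices of a complete binary tree of height h (h levels),
    listed in van Emde Boas order."""
    if h == 1:
        return [root]
    top = h // 2          # levels in the top subtree
    bottom = h - top      # levels in each bottom subtree
    order = veb_order(top, root)
    # The bottom subtrees are rooted `top` levels below `root`.
    for j in range(2 ** top):
        order += veb_order(bottom, root * 2 ** top + j)
    return order


def bfs_order(h: int) -> list[int]:
    """BFS layout: nodes stored level by level, i.e. heap order."""
    return list(range(1, 2 ** h))


print(veb_order(4))
# prints [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```

Each recursive block is stored contiguously, so a root-to-leaf path touches O(log_B n) blocks whatever the block size B is.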
Optimal k-Merger Structure
ØSwept k in [15, 270] and performed $20\,000\,000 / k^3$ merges
ØOn 3 architectures, the best configuration for the merger structure was recursive invocation, pointer-based navigation, vEB layout, and nodes and buffers stored separately
Optimizations: Choosing the right merger
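ØA merge can safely run for the minimum of the number of elements left in each input buffer and the space remaining in the output buffer before any buffer must be checked again
ØMerging algorithms compared: an optimal merging algorithm, a hybrid of the optimal algorithm and a heuristic, and a simple merge
ØThe simple merge was the fastest, probably due to hardware branch prediction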
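The two loops below contrast a merge that tests both input buffers and the output buffer on every element with a bounded loop that first computes the minimum of the remaining input elements and the free output space, then runs that many steps without buffer tests. This is a Python sketch only: both functions and their (inputs, indices, output, capacity) signature are hypothetical, and Python cannot show the branch-prediction effects that decided the measurements.

```python
def merge_simple(a, i, b, j, out, cap):
    """Test both inputs and the output buffer on every step."""
    while i < len(a) and j < len(b) and len(out) < cap:
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return i, j


def merge_bounded(a, i, b, j, out, cap):
    """m = min(left in a, left in b, room in out) steps are always safe,
    so the inner loop needs no buffer tests; m is then recomputed."""
    while True:
        m = min(len(a) - i, len(b) - j, cap - len(out))
        if m <= 0:
            return i, j
        for _ in range(m):
            if a[i] <= b[j]:
                out.append(a[i])
                i += 1
            else:
                out.append(b[j])
                j += 1
```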
Optimization: Degree of Merges
ØSimple merge by comparing the first elements of the inputs
ØTournament trees
ØIncreasing the merge degree decreases the height of the merger tree, meaning fewer tree traversals and less data movement between levels.
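A sketch of the simple k-way merge (the function name is hypothetical): every output element costs a scan over the k current heads, up to k − 1 key comparisons, so it suits small merge degrees.

```python
def simple_kway_merge(runs):
    """Merge k sorted runs by scanning the current heads for the minimum."""
    pos = [0] * len(runs)
    out = []
    while True:
        best = -1
        for i, run in enumerate(runs):
            # Strict < keeps the earliest run on ties (stable merge).
            if pos[i] < len(run) and (best < 0 or run[pos[i]] < runs[best][pos[best]]):
                best = i
        if best < 0:  # every run is exhausted
            return out
        out.append(runs[best][pos[best]])
        pos[best] += 1
```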
Tournament trees
Taken from 6.172 lecture 15 Fall 2018
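A sketch of a k-way merge using a winner tree, one form of tournament tree (`tournament_merge` is a hypothetical name). After each output only the matches on the winner's leaf-to-root path are replayed, about lg k comparisons instead of k − 1, at the price of extra index bookkeeping per element.

```python
def tournament_merge(runs):
    """Internal node t stores the index of the run whose current head
    wins (is smallest) within t's subtree."""
    k = len(runs)
    size = 1
    while size < k:  # pad the leaves to a power of two
        size *= 2
    pos = [0] * k

    def alive(r):
        return r < k and pos[r] < len(runs[r])

    def winner(a, b):
        if not alive(a):
            return b
        if not alive(b):
            return a
        return a if runs[a][pos[a]] <= runs[b][pos[b]] else b

    tree = [0] * (2 * size)
    for r in range(size):  # leaves hold run indices; padding runs are dead
        tree[size + r] = r
    for t in range(size - 1, 0, -1):
        tree[t] = winner(tree[2 * t], tree[2 * t + 1])

    out = []
    while alive(tree[1]):
        w = tree[1]
        out.append(runs[w][pos[w]])
        pos[w] += 1
        t = (size + w) // 2  # replay only the matches on w's path
        while t >= 1:
            tree[t] = winner(tree[2 * t], tree[2 * t + 1])
            t //= 2
    return out
```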
Optimal Merge pattern
Ø4- or 5-way mergers were found to be optimal; tournament trees have too much overhead to be worthwhile.
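Worked arithmetic for the height argument: a merger tree with k inputs built from z-way nodes has ⌈log_z k⌉ levels. For a hypothetical k = 256, binary nodes need 8 levels while 4- or 5-way nodes need 4. `merger_height` is a hypothetical helper.

```python
def merger_height(k: int, z: int) -> int:
    """Levels in a merger tree with k inputs built from z-way nodes:
    the smallest h with z**h >= k."""
    h, reach = 0, 1
    while reach < k:
        reach *= z
        h += 1
    return h


for z in (2, 4, 5):
    print(z, merger_height(256, z))
# prints:
# 2 8
# 4 4
# 5 4
```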
Optimization: Caching Mergers
ØAll calls at the same level of the outer recursion use a k-merger of the same size, so instead of rebuilding the merger for each call, it is built once and reused.
ØAchieved speedups of 3-5% on all architectures
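A sketch of merger caching keyed by k. It assumes calls that share a merger run one after another (as in a depth-first recursion) and that a reused merger's buffers are reset before each use, which is omitted here. `MergerCache` and the builder passed to it are hypothetical.

```python
class MergerCache:
    """Build each k-merger once and return the same object afterwards."""

    def __init__(self, build_merger):
        self._build = build_merger
        self._by_k = {}
        self.builds = 0  # number of mergers actually constructed

    def get(self, k):
        merger = self._by_k.get(k)
        if merger is None:
            merger = self._build(k)
            self._by_k[k] = merger
            self.builds += 1
        return merger


# Usage with a stand-in builder: 64 requests for a 16-merger, 1 construction.
cache = MergerCache(lambda k: {"k": k, "buffers": [[] for _ in range(k)]})
for _ in range(64):
    m = cache.get(16)
print(cache.builds)  # prints 1
```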
Other Optimizations
ØBase-case sorting algorithm: GCC's quicksort, used to avoid building mergers of height less than 2
ØTuning parameters alpha (to control the buffer size) and d (to control the progression of the recursion)
Results
ØPerformance depended on the architecture
ØQuicksort was better on architectures with very fast memory buses or slower CPUs, where memory was less of a bottleneck
ØFunnel sort generally outperformed quicksort for larger n
Results
Results – Faster Memory Arch
Results – QS Memory Sensitivity
Results – External Sorting