adapting tree structures for processing with simd instructions
DESCRIPTION
ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS. Authors: Steffen Zeuch, Frank Huber, Johann-Christoph Freytag Humboldt-Universität zu Berlin {zeuchste,huber,freytag}@informatik.hu-berlin.de. 1. Motivation. B + -Tree: common index structure - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/1.jpg)
ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS
1
Authors: Steffen Zeuch, Frank Huber, Johann-Christoph FreytagHumboldt-Universität zu Berlin
{zeuchste,huber,freytag}@informatik.hu-berlin.de
![Page 2: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/2.jpg)
MotivationB+-Tree: common index structure
Common node-internal search algorithm:
Binary search in O(log2n)
2
Can we do better?
Yes with SIMD!
![Page 3: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/3.jpg)
Outline1. Background
2. Binary Search and SIMD
3. Segmented Tree
4. Segmented Trie
5. Evaluation
6. Conclusion
3
![Page 4: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/4.jpg)
Single Instruction Multiple Data:
Available on CPU and GPU
Arithmetical, comparison, conversion, logical
SIMD
4
3 2
5 4
+2 +2
Add const to vector
Add two vectors
3 2 65 67
67 69
+
Compare two vectors
0 -1
≥
673 265
![Page 5: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/5.jpg)
Binary Search
5SeparatorSearch KeyExcludedSearch Space
Iteration
1
2
3
4
5
Search Key = 9
![Page 6: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/6.jpg)
Outline1. Background
2. Binary Search and SIMD
3. Segmented Tree
4. Segmented Trie
5. Evaluation
6. Conclusion
6
![Page 7: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/7.jpg)
Binary Search - two Separator
7SeparatorSearch KeyExcludedSearch Space
Search Key = 9Iteration
1
2
3
![Page 8: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/8.jpg)
Binary Search + SIMD
8
0 -1
SIMD Register C
>=
Separator
Search KeyExcludedSearch Space
8 17 9 9SIMD
Register A
SIMD Register
B
![Page 9: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/9.jpg)
Problem: SIMD on CPU
SIMD on CPU do not support Scatter and Gather functionality.
9
8 94 x 32-bit
SIMD Register 10 11
SIMD load(start position)
![Page 10: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/10.jpg)
Solution: K-ary Search by Schlegel et al.
10SeparatorSearch KeyExcludedSearch Space
3-ary Search Tree(k = 3)
Linearized Order
Search Key = 9
![Page 11: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/11.jpg)
Applied K-ary Search
11SeparatorSearch KeyExcludedSearch Space
3-ary Search Tree
Linearized Order Search Key = 91
2
3
![Page 12: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/12.jpg)
Degree of Parallelism
12
SIMDBandwidth
SearchMethod
DataType
ParallelComparisons
128-bit
K-arySearch
8-bit 16
16-bit 8
32-bit 4
64-bit 2
BinarySearch
All 1
![Page 13: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/13.jpg)
Outline1. Background
2. Binary Search and SIMD
3. Segmented Tree
4. Segmented Trie
5. Evaluation
6. Conclusion
13
![Page 14: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/14.jpg)
Segmented Tree
14
Change inner-node search algorithm from commonlybinary search to k-ary search.
![Page 15: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/15.jpg)
Problem: Unfilled Nodes
15
3-ary Search Tree
Linearized Order
K-ary requirement: multiple of k-1 keys Smax+1
![Page 16: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/16.jpg)
ReorderingNew keys require reordering:
Sorting → Inserting → Linearizing
Exceptions:
Empty Node
Key is greater than the largest existing key
16
![Page 17: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/17.jpg)
Segmented TreeAdvantages:
High resource utilization
Less iterations required
Binary Search: log2n vs. K-ary Search logkn
Disadvantages:
Reordering overhead
Large data types decrease performance
17
![Page 18: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/18.jpg)
Outline1. Background
2. Binary Search and SIMD
3. Segmented Tree
4. Segmented Trie
5. Evaluation
6. Conclusion
18
![Page 19: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/19.jpg)
Segmented Trie
19
Key (Dec)
Key (Hex)
Partial Key(Hex)
Level 1
Level 2
![Page 20: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/20.jpg)
Segmented Trie
20
![Page 21: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/21.jpg)
Segmented TrieAdvantages:
High SIMD search performance
Prefix compression
Early termination
Disadvantages:
Fix level count
Reordering overhead
21
![Page 22: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/22.jpg)
SegTree vs. SegTrie
22
SegTreeSegTree SegTrieSegTrie
Derived From
B+-Tree Prefix B-Tree
Number of Iterations Tree Height Max. #Level
(Early termination)
Number of Level Dynamic Static (Pre-defined)
DOP Depends onData Type
16 (8-bit)
![Page 23: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/23.jpg)
Outline1. Background
2. Binary Search and SIMD
3. Segmented Tree
4. Segmented Trie
5. Evaluation
6. Conclusion
23
![Page 24: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/24.jpg)
Test SetupHW/SW Configuration:
CPU: Intel Xeon 5520, 4 x 2,26 GHz
L1: 32KB, L2: 256 KB, L3: 8 MB, MM: 8 GB
Cacheline: 128 Byte, SIMD bandwidth: 128 Bit
Windows 7 64-bit Professional
Test Dataset:
Synthetically generated, ascending, starting at 024
![Page 25: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/25.jpg)
Evaluation: Bitmask
25
Three Algorithms:
1. Bit Shifting
2. Case-Switch
3. PopCnt
0 -1 SIMD Register C
>=
8 17 9 9SIMD Register A
SIMD Register B
![Page 26: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/26.jpg)
Evaluation: SegTree
26
![Page 27: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/27.jpg)
Evaluation: SegTrie
27
![Page 28: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/28.jpg)
Outline1. Background
2. Binary Search and SIMD
3. Segmented Tree
4. Segmented Trie
5. Evaluation
6. Conclusion
28
![Page 29: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/29.jpg)
Our Contributions B+-Tree and prefix B-Tree using SIMD
Transformation and search algorithm for breadth-first and depth-first data layout
Three algorithms for interpreting a SIMD comparison result
Solution for an arbitrary key count
29
Thanks
![Page 30: ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS](https://reader035.vdocuments.net/reader035/viewer/2022062422/56812d03550346895d91db6d/html5/thumbnails/30.jpg)
Backup
30