Future Vector Microprocessor Extensions for Data Aggregations
Timothy Hayes, Oscar Palomar, Osman Unsal, Adrian Cristal, Mateo Valero
ISCA-43 (2016)
Motivation
Data generation is growing at an exponential rate
Increasing demand to summarise/aggregate data quickly
Since ~2005, frequency scaling is no longer viable
  Explicit forms of parallelism must be used
Data-level parallelism (DLP) is excellent when available
  Vector SIMD ISAs are highly efficient
    Compact representation, implicit parallelism, scalability
    Energy-efficient hardware implementations
Vector SIMD ISAs are ideal when DLP is regular
  Many algorithms need transformations to make their DLP regular
  Transformations often hurt performance
Contributions
We examine the applicability of vector SIMD to aggregations
We propose and evaluate different algorithms:
  1. Using transformations so that typical vector instructions apply
  2. Vectorising directly with our novel vector instructions
We evaluate with many datasets:
  Five distributions
  Twenty-two cardinalities
Speedups between 2.7x and 7.6x over a scalar baseline
This work extends our HPCA-21 (2015) article: Hayes et al., VSR Sort: A Novel Vectorised Sorting Algorithm.
I will skip many details due to time constraints
Presentation Contents
I. Motivation
II. What is Data Aggregation?
III. Experimental Setup
IV. Algorithms
1. Scalar Baseline
2. Polytable
3. Sorted Reduce
4. Monotable
5. Partially Sorted Monotable
6. Summary
V. Conclusions
What is a Data Aggregation?
A frequently occurring operation, found in:
  SQL GROUP BY queries
  MapReduce
  Statistical languages
  OLAP cubes
A reduction of key-value pairs: all values sharing a key are combined into one result for that key
Uses an aggregation function, e.g.
  SUM
  MINIMUM
  MAXIMUM
  AVERAGE
What is a Data Aggregation? (SUM example)
keys:   0 1 3 2 2 0 3 1
values: 7 5 13 25 85 33 9 44
result: key 0 = 7 + 33 = 40, key 1 = 5 + 44 = 49, key 2 = 25 + 85 = 110, key 3 = 13 + 9 = 22
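The SUM example above, plus the other aggregation functions listed on the previous slide, can be reproduced with a short Python sketch. It is illustrative only: `group_aggregate` and `average` are hypothetical names, and the sketch materialises every group as a list, whereas the algorithms in this talk keep a running aggregate per key.

```python
from collections import defaultdict


def group_aggregate(keys, values, func):
    """Group values by key, then apply func to each group's list of values."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    return {k: func(vs) for k, vs in sorted(groups.items())}


def average(vs):
    return sum(vs) / len(vs)


keys = [0, 1, 3, 2, 2, 0, 3, 1]
values = [7, 5, 13, 25, 85, 33, 9, 44]

print(group_aggregate(keys, values, sum))      # {0: 40, 1: 49, 2: 110, 3: 22}
print(group_aggregate(keys, values, min))      # {0: 7, 1: 5, 2: 25, 3: 9}
print(group_aggregate(keys, values, max))      # {0: 33, 1: 44, 2: 85, 3: 13}
print(group_aggregate(keys, values, average))  # {0: 20.0, 1: 24.5, 2: 55.0, 3: 11.0}
```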
Query and Algorithms
Query: SELECT key, COUNT(*), SUM(value) FROM table GROUP BY key
Scalar baseline – no vector instructions
Regular DLP – typical vector instructions
  A. Polytable
  B. Standard Sorted Reduce [not in presentation]
Irregular DLP – novel vector instructions
  A. Advanced Sorted Reduce
  B. Monotable
  C. Partially Sorted Monotable
Datasets: Five Distributions
Example keys for each distribution (32 keys, 8 unique values):
1. Uniform:      4 8 2 3 6 7 4 4 1 6 6 7 1 5 2 4 8 1 3 1 2 2 3 3 7 8 5 5 7 6 5 8
2. Sorted:       1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 7 7 7 7 8 8 8 8
3. Sequential:   1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
4. Heavy Hitter: 3 8 2 3 6 7 3 4 1 3 6 3 3 5 3 4 8 3 3 1 3 2 3 3 7 3 3 5 7 3 5 8
5. Zipfian:      6 8 4 3 2 1 2 2 3 4 8 7 6 1 2 7 3 1 2 5 4 1 5 1 3 1 1 1 1 2 1 5
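The generator used for the evaluation is not described here. As a rough illustration, the five distributions can be sketched in Python under stated assumptions: `make_keys` is a hypothetical helper, the heavy hitter is assumed to take about half of the tuples (in the example row, key 3 appears 15 times out of 32), and the Zipf exponent is assumed to be 1.

```python
import random


def make_keys(dist, n, c, seed=0):
    """Hypothetical generator: n keys in 1..c following one of five distributions."""
    rng = random.Random(seed)
    if dist == "uniform":
        return [rng.randint(1, c) for _ in range(n)]
    if dist == "sorted":
        # Ascending runs of about n / c copies of each key.
        return [1 + i * c // n for i in range(n)]
    if dist == "sequential":
        # 1, 2, ..., c, 1, 2, ..., c, ...
        return [1 + i % c for i in range(n)]
    if dist == "hhitter":
        # Assumption: one randomly chosen key takes about half of the tuples;
        # the remaining tuples are uniform over 1..c.
        hitter = rng.randint(1, c)
        return [hitter if rng.random() < 0.5 else rng.randint(1, c)
                for _ in range(n)]
    if dist == "zipf":
        # Assumption: Zipf exponent s = 1, so key r is drawn with weight 1 / r.
        ranks = range(1, c + 1)
        return rng.choices(ranks, weights=[1 / r for r in ranks], k=n)
    raise ValueError(f"unknown distribution: {dist}")


for dist in ("uniform", "sorted", "sequential", "hhitter", "zipf"):
    print(f"{dist:>10}:", *make_keys(dist, 32, 8))
```

The sorted and sequential rows are deterministic and match the example rows above exactly; the three random distributions will differ in detail from the slide.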
Datasets: Cardinalities
Cardinality (C) is the number of unique keys within a dataset, e.g.
  N = 10,000,000 tuples
  C = 4 to 10,000,000
Grouped into four cardinality divisions:
1. Low cardinalities – many repeated keys
2. Low-normal cardinalities
3. High-normal cardinalities
4. High cardinalities – many unique keys
Examples with N = 8:
  C = 1: 1 1 1 1 1 1 1 1
  C = 2: 1 1 2 1 2 2 2 1
  C = 4: 2 1 1 3 4 2 4 1
  C = 8: 3 2 8 5 4 6 1 7
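The 22 cardinalities printed on the x-axes of the result plots (4, 9, 19, 38, …, 5,000,000, 10,000,000) appear to be N halved repeatedly and rounded down, i.e. C = ⌊N / 2^k⌋ for k = 0 … 21. The sketch below regenerates them under that assumption and adds a hypothetical helper, `keys_with_cardinality`, that builds N keys containing exactly C unique values, like the N = 8 examples above.

```python
import random

N = 10_000_000


def cardinalities(n=N, count=22):
    """n // 2**k for k = count-1 down to 0, in ascending order."""
    return [n // 2**k for k in range(count - 1, -1, -1)]


def keys_with_cardinality(n, c, seed=0):
    """Hypothetical helper: n keys from 1..c in which every key appears at least once."""
    if not 1 <= c <= n:
        raise ValueError("need 1 <= c <= n")
    rng = random.Random(seed)
    keys = list(range(1, c + 1)) + [rng.randint(1, c) for _ in range(n - c)]
    rng.shuffle(keys)
    return keys


print(cardinalities()[:6])     # [4, 9, 19, 38, 76, 152]
print(len(cardinalities()))    # 22
print(keys_with_cardinality(8, 4))
```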
Simulation Framework
Custom simulation framework
  PTLsim – 32 nm Westmere microarchitecture
  DRAMSim2 – DDR3-1333
Extended vector SIMD support
  Heavily influenced by classical vector machines, e.g. the CRAY-1
  Emphasis on integer operations
  16 vector registers, each holding 64 elements of 64 bits
  Pipelined functional units with 4 lockstepped parallel lanes
  Masked operations
  Indexed memory operations, i.e. gather/scatter
  Integrated in the out-of-order superscalar pipeline
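A quick worked check of what these parameters imply. The register-file size follows directly from the figures above; the cycle count is only an estimate that assumes each lane processes one element per cycle and ignores pipeline start-up latency.

```latex
\begin{aligned}
\text{vector register file} &= 16 \times 64 \times 64\ \text{bit} = 65\,536\ \text{bit} = 8\ \text{KiB} \\
\text{cycles per full-length vector operation} &\approx \frac{64\ \text{elements}}{4\ \text{lanes}} = 16
\end{aligned}
```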
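Scalar Baseline
keys:   1 5 5 3
values: 27 19 43 31
table (indexed by key 1–5), initialised to 0
Each key-value pair is processed in turn: table[key] += value
e.g. after the pair (3, 31), table[3] = 31; once all four pairs are processed, table[5] = 19 + 43 = 62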
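A minimal Python sketch of the scalar baseline for the COUNT/SUM query, assuming (as in the slide's table indexed 1–5) that keys are small non-negative integers, so each aggregate table is an array indexed directly by key with no hashing. The function name `scalar_group_by` is hypothetical; index 0 is simply unused in the example.

```python
def scalar_group_by(keys, values, num_keys):
    """Scalar baseline for
    SELECT key, COUNT(*), SUM(value) FROM table GROUP BY key.

    Assumes keys are integers in 0..num_keys-1.
    """
    counts = [0] * num_keys
    sums = [0] * num_keys
    for k, v in zip(keys, values):
        counts[k] += 1   # COUNT(*)
        sums[k] += v     # SUM(value)
    return counts, sums


counts, sums = scalar_group_by([1, 5, 5, 3], [27, 19, 43, 31], num_keys=6)
print(sums)    # [0, 27, 0, 31, 0, 62]
print(counts)  # [0, 1, 0, 1, 0, 2]
```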
Scalar Baseline – Results
[Figure: cycles per tuple (y-axis, 0–135) against cardinality (x-axis, 22 values from 4 to 10,000,000, grouped into low, low-normal, high-normal and high divisions) for the uniform, sorted, sequential, hhitter and zipf distributions]
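Polytable
keys:   1 5 5 3
values: 27 19 43 31
table (indexed by key 1–5), initialised to 0
Process m key-value pairs at once with a vector gather-modify-scatter
Gather-modify-scatter conflict: the two elements with key 5 both gather table[5] = 0, so when they scatter back one of the two updates (+19 or +43) is lost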
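The conflict can be reproduced by emulating vector gather and scatter element by element. This is a sketch, not the simulated ISA: `VLEN`, `gather`, `scatter` and `naive_vector_sum` are hypothetical names, and the scatter assumes that when several elements target the same address the writes happen in element order, so the last one survives.

```python
VLEN = 4  # hypothetical vector length m: key-value pairs processed per step


def gather(table, idx):
    """Emulated indexed load: element i reads table[idx[i]]."""
    return [table[i] for i in idx]


def scatter(table, idx, vals):
    """Emulated indexed store. Assumption: duplicate addresses are written in
    element order, so the last element's write is the one that survives."""
    for i, v in zip(idx, vals):
        table[i] = v


def naive_vector_sum(keys, values, num_keys, m=VLEN):
    """Gather-modify-scatter over m pairs at a time, with no conflict handling."""
    table = [0] * num_keys
    for start in range(0, len(keys), m):
        k = keys[start:start + m]
        v = values[start:start + m]
        old = gather(table, k)                              # gather
        scatter(table, k, [o + x for o, x in zip(old, v)])  # modify + scatter
    return table


print(naive_vector_sum([1, 5, 5, 3], [27, 19, 43, 31], num_keys=6))
# [0, 27, 0, 31, 0, 43]  <- table[5] should be 62; the +19 update was lost
```

With distinct keys in every vector the result is correct; only repeated keys within one vector lose updates.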
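Polytable
Use m local tables, one per vector element, each indexed by key 1–5 and initialised to 0
keys:   1 5 5 3
values: 27 19 43 31
Each element updates only its own table, so the two updates to key 5 (19 and 43) land in different tables: no conflict ✓
The m local tables are finally merged into one result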
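A Python sketch of the Polytable idea under the same emulation assumptions as before. The layout (m tables stored back to back, element `lane` using table `lane`) and the name `polytable_sum` are illustrative choices, not taken from the paper. The sketch also makes the cost visible: memory for m copies of the table and a final merge pass over all of them.

```python
def polytable_sum(keys, values, num_keys, m=4):
    """Polytable-style SUM with m private tables, one per vector element.

    The tables are stored back to back in one array; element `lane` of every
    vector only touches table `lane`, so the m addresses in each gather and
    scatter are always distinct and no update can be lost.
    """
    tables = [0] * (m * num_keys)
    for start in range(0, len(keys), m):
        k = keys[start:start + m]
        v = values[start:start + m]
        idx = [lane * num_keys + key for lane, key in enumerate(k)]
        old = [tables[i] for i in idx]          # gather (distinct addresses)
        for i, o, x in zip(idx, old, v):        # scatter (no conflicts)
            tables[i] = o + x
    # Merge: sum the m partial tables element-wise into the final result.
    return [sum(tables[lane * num_keys + key] for lane in range(m))
            for key in range(num_keys)]


print(polytable_sum([1, 5, 5, 3], [27, 19, 43, 31], num_keys=6))
# [0, 27, 0, 31, 0, 62]
```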
Polytable – Results
[Figure: two panels, scalar and polytable; each plots cycles per tuple (y-axis, 0–135) against cardinality (x-axis, 4 to 10,000,000; low, low-normal, high-normal and high divisions) for the uniform, sorted, sequential, hhitter and zipf distributions]
Sorted Reduce
keys:   5 1 5 3
values: 27 19 43 31
Sort the pairs with VSR Sort, our sorting algorithm from HPCA-21
  Based on vectorised radix sort
  Uses novel vector SIMD instructions
  Avoids gather-modify-scatter conflicts using Vector Prior Instances (VPI)
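VSR Sort itself is not reproduced here. As a reference point, below is a plain scalar LSD radix sort of key-value pairs, the kind of algorithm VSR Sort vectorises; the name `radix_sort_pairs` and the 8-bit digit width are arbitrary choices for this sketch. The two read-modify-write updates marked in the comments are where a naive vectorisation hits gather-modify-scatter conflicts, because several elements of one vector can share the same digit.

```python
def radix_sort_pairs(keys, values, key_bits=32, digit_bits=8):
    """Stable LSD radix sort of (key, value) pairs; keys must be in [0, 2**key_bits).

    Each pass is a counting sort on one digit: histogram, exclusive prefix
    sum, then a stable scatter. Cost is O(passes * n), passes = key_bits / digit_bits.
    """
    radix = 1 << digit_bits
    mask = radix - 1
    for shift in range(0, key_bits, digit_bits):
        # 1. Histogram of the current digit (counts[d] += 1 is a read-modify-write).
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive prefix sum: start offset of each digit's bucket.
        offsets, total = [0] * radix, 0
        for d in range(radix):
            offsets[d], total = total, total + counts[d]
        # 3. Stable scatter (offsets[d] += 1 is another read-modify-write).
        out_k, out_v = [0] * len(keys), [0] * len(values)
        for k, v in zip(keys, values):
            d = (k >> shift) & mask
            out_k[offsets[d]], out_v[offsets[d]] = k, v
            offsets[d] += 1
        keys, values = out_k, out_v
    return keys, values


print(radix_sort_pairs([5, 1, 5, 3], [27, 19, 43, 31]))
# ([1, 3, 5, 5], [19, 31, 27, 43])
```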
Sorted Reduce: Vector Prior Instances (VPI)
input:  2 2 3 1
output: 0 1 0 0
(least significant element on the left; each output element counts the earlier elements whose input value equals its own)
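VPI can be emulated in a few lines of Python, and the emulation shows how it removes conflicts from one vector step of a radix-sort scatter. This is a sketch under assumptions: `vpi` and `vector_bucket_positions` are hypothetical names, the bucket offsets are made up, and the scatter is assumed to let the last element with a given digit win. The same gather, add-VPI, scatter pattern also applies to the histogram step.

```python
def vpi(vec):
    """Emulate Vector Prior Instances: element i of the result is the number
    of less significant elements j < i with vec[j] == vec[i]."""
    seen = {}
    out = []
    for x in vec:
        out.append(seen.get(x, 0))
        seen[x] = seen.get(x, 0) + 1
    return out


def vector_bucket_positions(digits, offsets):
    """One vector step of a radix-sort scatter using VPI (a sketch).

    Elements sharing a digit gather the same bucket offset; adding VPI gives
    each of them a distinct output slot. Writing slot + 1 back (last element
    with a given digit wins) advances the bucket pointer past all of them.
    """
    old = [offsets[d] for d in digits]                 # gather
    slots = [o + p for o, p in zip(old, vpi(digits))]  # conflict-free slots
    for d, s in zip(digits, slots):                    # scatter, element order
        offsets[d] = s + 1
    return slots


print(vpi([2, 2, 3, 1]))                                # [0, 1, 0, 0]
offsets = [0, 0, 4, 6]                                  # hypothetical bucket starts
print(vector_bucket_positions([2, 2, 3, 1], offsets))  # [4, 5, 6, 0]
print(offsets)                                          # [0, 1, 6, 7]
```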
Sorted Reduce
1. Sort the key-value pairs by key:
   keys 5 1 5 3, values 27 19 43 31  →  keys 5 5 3 1, values 27 43 31 19
2. Reduce (+) each run of equal keys to produce the output:
   key 1: 19, key 3: 31, key 5: 27 + 43 = 70
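A scalar Python sketch of the two steps, computing COUNT and SUM per run. Python's built-in stable sort stands in for VSR Sort, and the vectorised reduction from the talk is not reproduced; `sorted_reduce` is a hypothetical name. Because the reduction works on contiguous runs, it needs no table indexed by key, and the output holds only keys that actually occur.

```python
def sorted_reduce(keys, values):
    """Sort pairs by key, then reduce each run of equal keys to (COUNT, SUM)."""
    pairs = sorted(zip(keys, values), key=lambda kv: kv[0])  # stand-in for VSR Sort
    out_keys, out_counts, out_sums = [], [], []
    for k, v in pairs:
        if out_keys and out_keys[-1] == k:   # same run: accumulate
            out_counts[-1] += 1
            out_sums[-1] += v
        else:                                # new run: start a new group
            out_keys.append(k)
            out_counts.append(1)
            out_sums.append(v)
    return out_keys, out_counts, out_sums


print(sorted_reduce([5, 1, 5, 3], [27, 19, 43, 31]))
# ([1, 3, 5], [1, 1, 2], [19, 31, 70])
```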
Sorted Reduce – Results
[Figure: two panels, scalar and advanced sorted reduce; each plots cycles per tuple (y-axis, 0–135) against cardinality (x-axis, 4 to 10,000,000; low, low-normal, high-normal and high divisions) for the uniform, sorted, sequential, hhitter and zipf distributions]
Monotable
The Polytable algorithm needs to replicate tables
  This avoids gather-modify-scatter conflicts
  But the m table copies hurt performance
The Sorted Reduce algorithm uses VSR Sort
  VSR Sort uses VPI to resolve gather-modify-scatter conflicts
Could VPI also be used to optimise Polytable?
  VPI is not sufficient, but…
  Its hardware could be reused
Create a similar-style but different instruction
  Vector Group Aggregate: SUM (VGAsum)
  Similar to VPI, but takes a second vector of values
  Vectorises the scalar baseline without transformations
Monotable: Vector Group Aggregate SUM (VGAsum)
key:    2 0 2 2
value:  6 12 3 5
output: 6 12 9 14
(least significant element on the left; each output element is the sum of the values of all elements up to and including itself that share its key)
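A Python sketch of VGAsum and of Monotable built on it. VGAsum is emulated element by element here, whereas the proposal makes it a single vector instruction; `vgasum` and `monotable_sum` are hypothetical names. After VGAsum, the last instance of each key within a vector carries that key's full partial sum, so the sketch relies on the assumption that the scatter lets the last element with a given address win: that element writes old value + group sum, and one shared table suffices.

```python
def vgasum(keys, values):
    """Emulate VGAsum: element i gets the sum of values[j] over all j <= i
    (itself and less significant elements) with keys[j] == keys[i]."""
    running = {}
    out = []
    for k, v in zip(keys, values):
        running[k] = running.get(k, 0) + v
        out.append(running[k])
    return out


def monotable_sum(keys, values, num_keys, m=4):
    """Monotable-style SUM: one shared table, VGAsum resolves conflicts."""
    table = [0] * num_keys
    for start in range(0, len(keys), m):
        k = keys[start:start + m]
        v = values[start:start + m]
        old = [table[x] for x in k]            # gather
        partial = vgasum(k, v)                 # per-key running sums in the vector
        for x, o, p in zip(k, old, partial):   # scatter in element order:
            table[x] = o + p                   # last instance of a key wins
    return table


print(vgasum([2, 0, 2, 2], [6, 12, 3, 5]))                       # [6, 12, 9, 14]
print(monotable_sum([1, 5, 5, 3], [27, 19, 43, 31], num_keys=6))  # [0, 27, 0, 31, 0, 62]
```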
Monotable – Results
[Figure: two panels, scalar and monotable; each plots cycles per tuple (y-axis, 0–135) against cardinality (x-axis, 4 to 10,000,000; low, low-normal, high-normal and high divisions) for the uniform, sorted, sequential, hhitter and zipf distributions]
Partially Sorted Monotable
Losing locality hurts performance
Fully sorting can have a high overhead
  VSR Sort has O(k·n) complexity, where k is the number of key bits sorted on
  If we reduce k, we reduce the overhead
Sort (partition) on only the k most significant bits of each key; ignore the remaining least significant bits
  e.g. key 92,345 = 0001 0110 1000 1011 1001 in binary
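A Python sketch of the idea, under stated assumptions: `KEY_BITS`, `partition_by_msbs` and `monotable_sum` are hypothetical names, and Python's stable sort on the top k bits stands in for the reduced number of radix passes a partial VSR Sort would run. After partitioning, consecutive tuples share their top k bits and therefore fall in a table region of only 2^(KEY_BITS − k) entries, which is the locality the slide aims to recover. The aggregate is unchanged by the partial sort.

```python
KEY_BITS = 20  # hypothetical key width; 92,345 fits in 20 bits


def partition_by_msbs(keys, values, k, key_bits=KEY_BITS):
    """Stable partial sort on only the k most significant key bits."""
    shift = key_bits - k
    order = sorted(range(len(keys)), key=lambda i: keys[i] >> shift)
    return [keys[i] for i in order], [values[i] for i in order]


def monotable_sum(keys, values, num_keys, m=4):
    """Monotable sketch: gather, emulated VGAsum, scatter (last instance wins)."""
    table = [0] * num_keys
    for start in range(0, len(keys), m):
        k, v = keys[start:start + m], values[start:start + m]
        old = [table[x] for x in k]                 # gather
        running, partial = {}, []
        for x, y in zip(k, v):                      # emulated VGAsum
            running[x] = running.get(x, 0) + y
            partial.append(running[x])
        for x, o, p in zip(k, old, partial):        # scatter in element order
            table[x] = o + p
    return table


print(format(92_345, "020b"))     # 00010110100010111001
print(92_345 >> (KEY_BITS - 8))   # 22, i.e. partition 0b00010110

keys = [5, 1, 5, 3, 7, 2]
values = [27, 19, 43, 31, 8, 4]
pk, pv = partition_by_msbs(keys, values, k=1, key_bits=3)
print(pk)                                  # [1, 3, 2, 5, 5, 7]
print(monotable_sum(pk, pv, num_keys=8))   # [0, 19, 4, 31, 0, 70, 0, 8]
```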
Partially Sorted Monotable – Results
[Figure: two panels, scalar and partially sorted monotable; each plots cycles per tuple (y-axis, 0–135) against cardinality (x-axis, 4 to 10,000,000; low, low-normal, high-normal and high divisions) for the uniform, sorted, sequential, hhitter and zipf distributions]
Summary – Best Speedups Overall
[Figure: average speedup over scalar (y-axis, 1–8) for each cardinality division (low, low-normal, high-normal, high) and each distribution (uniform, sorted, sequential, hhitter, zipf); speedups range from 2.7x to 7.6x]
Summary – Best Speedups Overall
[Figure: the same speedup chart, with each bar labelled by its best-performing algorithm: mono (Monotable), poly (Polytable), ps-mono (Partially Sorted Monotable) or sorted reduce; speedups range from 2.7x to 7.6x]
Conclusions
Aggregating data quickly is important
DLP and SIMD are an attractive way to accelerate it
Aggregation algorithms are simple, but their DLP is irregular
We proposed various algorithms that:
  A. Use transformations and typical vector SIMD instructions
  B. Avoid transformations using our novel vector instructions
We evaluated them using many data distributions and cardinalities
  Speedups between 2.7x and 7.6x over the scalar baseline
  The best solution depends on the input characteristics