universal scalable matrix multiplication
TRANSCRIPT
-
7/31/2019 Universal Scalable Matrix Multiplication
[Text not recovered; surviving symbols: r, c; M K, K N; a_r, a_c, b_r, b_c; a_r, K; K, b_c.]
[Text not recovered; surviving symbols: pb, pb; a_r, pb, pb, b_c.]
[Text not recovered; surviving symbols: a_r, b_c; "kth iter" (twice); 1, k_iter, K.]
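The symbols that survive on these pages (a shared dimension K, a panel width pb, and an iteration index bounded by 1 and K) are consistent with a matrix product accumulated panel by panel along K. The following is only a minimal single-process sketch under that assumption, not the authors' method: the function name panel_matmul and its arguments are hypothetical, and the distribution over an r by c processor grid is left out entirely.

```python
def panel_matmul(A, B, pb):
    """Multiply A (M x K) by B (K x N), accumulating along K in panels of width pb.

    Hypothetical illustration: A and B are lists of rows; pb is the panel width.
    """
    if pb < 1:
        raise ValueError("pb must be a positive integer")
    M, K = len(A), len(A[0])
    if len(B) != K:
        raise ValueError("inner dimensions do not match")
    N = len(B[0])
    C = [[0.0] * N for _ in range(M)]

    # Step over the shared dimension K one panel at a time.
    # The last panel is narrower when pb does not divide K.
    for k0 in range(0, K, pb):
        k1 = min(k0 + pb, K)
        # Add the product of A[:, k0:k1] and B[k0:k1, :] to C.
        for i in range(M):
            row_a, row_c = A[i], C[i]
            for k in range(k0, k1):
                a_ik = row_a[k]
                row_b = B[k]
                for j in range(N):
                    row_c[j] += a_ik * row_b[j]
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C1 = panel_matmul(A, B, pb=1)
C2 = panel_matmul(A, B, pb=2)
```

The panel width changes only how the accumulation is grouped, so C1 and C2 hold the same product. In a distributed setting each panel would typically be communicated before the local update, which is why the panel width can affect the time per iteration.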
pb = 1, 4, 16
pb = 1
[Figure: "MPI (only)". x-axis: #processors (0 to 70); y-axis: time per iter (0 to 120); series: 128, 256, 512, 1024, 2048, 4096.]
[Figure, three panels. x-axis in each: #processors (0 to 70); y-axis in each: time per iter; series in each: 128, 256, 512, 1024, 2048, 4096.
Panel 1: "MPI+OpenMP (#threads 2)", y-axis 0 to 700.
Panel 2: "MPI+OpenMP (#threads 4)", y-axis 0 to 500.
Panel 3: "MPI+OpenMP (#threads 6)", y-axis 0 to 450.]
[Figure, three panels. x-axis in each: #processors (0 to 80); y-axis in each: time per iter; series in each: 128, 256, 512, 1024, 2048, 4096.
Panel 1: "MPI+CUDA (block size 2)", y-axis 0 to 60.
Panel 2: "MPI+CUDA (block size 4)", y-axis 0 to 35.
Panel 3: "MPI+CUDA (block size 16)", y-axis 0 to 14.]