Parallel Iterative Solution of the Hermite Collocation Equations on GPUs
![Page 1: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/1.jpg)
Parallel Iterative Solution of the Hermite Collocation Equations on GPUs
Emmanuel N. Mathioudakis
Co-Authors : Elena Papadopoulou – Yiannis Saridakis – Nikolaos Vilanakis
TECHNICAL UNIVERSITY OF CRETE
DEPARTMENT OF SCIENCES
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
73100 CHANIA - CRETE - GREECE
![Page 2: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/2.jpg)
Talk Overview
• Hermite Collocation for elliptic BVPs and derivation of the Collocation linear system
• Development of a parallel algorithm for the Schur Complement method on Shared Memory Parallel Architectures
• Parallel implementation on multicore computers with GPUs
![Page 3: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/3.jpg)
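Hermite Collocation Method
BVP:
$$\mathcal{L}u(x,y) = f(x,y), \quad (x,y) \in \Omega$$
$$\mathcal{B}u(x,y) = g(x,y), \quad (x,y) \in \partial\Omega$$
Collocation equations at the collocation points $(\sigma_i^x, \sigma_j^y)$:
$$\mathcal{L}u(\sigma_i^x,\sigma_j^y) - f(\sigma_i^x,\sigma_j^y) = 0$$
$$\mathcal{B}u(\sigma_i^x,\sigma_j^y) - g(\sigma_i^x,\sigma_j^y) = 0$$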
![Page 4: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/4.jpg)
Hermite Collocation Method…
![Page 5: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/5.jpg)
Hermite Collocation Method…
![Page 6: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/6.jpg)
Red – Black Collocation Linear system
![Page 7: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/7.jpg)
Red – Black Collocation Linear system
![Page 8: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/8.jpg)
Red – Black Collocation Linear system
![Page 9: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/9.jpg)
Red – Black Collocation Linear system
![Page 10: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/10.jpg)
Red – Black Collocation Linear system
![Page 11: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/11.jpg)
Model Problem
$$\nabla^2 u(x,y) + \lambda\, u(x,y) = f(x,y), \quad (x,y) \in \Omega$$
$$u(x,y) = g(x,y), \quad (x,y) \in \partial\Omega$$
with a sign condition on $\lambda$ relative to 0.
![Page 12: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/12.jpg)
Helmholtz Collocation Matrix
![Page 13: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/13.jpg)
Red – Black Collocation Linear system
![Page 14: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/14.jpg)
The collocation matrix is large and sparse, and it lacks convenient properties (e.g. symmetry, definiteness).
Iterative + Parallel
![Page 15: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/15.jpg)
Iterative Solution
with
![Page 16: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/16.jpg)
Iterative Solution
![Page 17: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/17.jpg)
Eigenvalues of Collocation matrix
![Page 18: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/18.jpg)
Schur Complement Iterative Solution
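As a rough illustration of the Schur complement idea for a system split into red and black unknowns, the sketch below eliminates the red block and solves a reduced system for the black block. The block names (`A_RR`, `A_RB`, `A_BR`, `A_BB`), the choice of which colour to eliminate, and the small dense test matrices are all assumptions made for illustration; they are not taken from the slides. The reduced system is solved directly here for brevity, whereas the talk pairs the Schur complement with the BiCGSTAB iterative method.

```python
def matvec(A, v):
    """Dense matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def schur_solve(A_RR, A_RB, A_BR, A_BB, b_R, b_B):
    """Solve [[A_RR, A_RB], [A_BR, A_BB]] [x_R; x_B] = [b_R; b_B] via the Schur complement."""
    n_R, n_B = len(A_RR), len(A_BB)
    # W = A_RR^{-1} A_RB, built one column at a time.
    cols = [solve(A_RR, [row[j] for row in A_RB]) for j in range(n_B)]
    W = [[cols[j][i] for j in range(n_B)] for i in range(n_R)]
    # Schur complement S = A_BB - A_BR W.
    S = [[A_BB[i][j] - sum(A_BR[i][k] * W[k][j] for k in range(n_R))
          for j in range(n_B)] for i in range(n_B)]
    # Reduced right-hand side b_B - A_BR A_RR^{-1} b_R.
    y = solve(A_RR, b_R)
    rhs = [bb - t for bb, t in zip(b_B, matvec(A_BR, y))]
    x_B = solve(S, rhs)  # direct solve here; an iterative solver could be used instead
    # Back-substitute for the red unknowns.
    x_R = solve(A_RR, [br - t for br, t in zip(b_R, matvec(A_RB, x_B))])
    return x_R, x_B

# Hypothetical non-symmetric test blocks with a known solution.
A_RR = [[4.0, 0.0], [0.0, 5.0]]
A_RB = [[1.0, 2.0], [0.0, 1.0]]
A_BR = [[1.0, 0.0], [2.0, 1.0]]
A_BB = [[6.0, 0.0], [0.0, 7.0]]
x_R_true, x_B_true = [1.0, 2.0], [3.0, 4.0]
b_R = [u + v for u, v in zip(matvec(A_RR, x_R_true), matvec(A_RB, x_B_true))]
b_B = [u + v for u, v in zip(matvec(A_BR, x_R_true), matvec(A_BB, x_B_true))]
x_R, x_B = schur_solve(A_RR, A_RB, A_BR, A_BB, b_R, b_B)
```

The appeal of the reduction is that only the smaller system in `x_B` has to be solved iteratively, provided solves with the eliminated diagonal block are cheap.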
![Page 19: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/19.jpg)
Parallel Iterative Solution of Collocation Linear system on Shared Memory Architectures
• Uniform Load Balancing between core threads
• Minimal Idle Cycles of core threads
• Minimal Communication
![Page 20: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/20.jpg)
case of ns = 2p
$x_1^{(R)} \quad x_2^{(R)} \quad x_3^{(R)} \quad x_4^{(R)}$
![Page 21: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/21.jpg)
case of ns = 2p
$x_{p+1}^{(B)} \quad x_{p+2}^{(B)} \quad x_{p+3}^{(B)} \quad x_{p+4}^{(B)}$
![Page 22: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/22.jpg)
case of ns = 2p
V1 V2 V3 V4 V5 V6 V7 V8
$x_1^{(R)} \quad x_2^{(R)} \quad x_3^{(R)} \quad x_4^{(R)} \quad x_{p+1}^{(B)} \quad x_{p+2}^{(B)} \quad x_{p+3}^{(B)} \quad x_{p+4}^{(B)}$
![Page 23: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/23.jpg)
V1 V2 V3 V4 V5
![Page 24: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/24.jpg)
Mapping into a fixed size Architecture of N Cores
case of k = 2p/N even: 2k virtual threads
Core $P_j$, $j = 1,\dots,N$, with sub-vectors $x_l^{(B)}$ and $x_l^{(R)}$ for
$l = (j-1)k+1,\dots,jk$
$l = 2p+(j-1)k+1,\dots,2p+jk$
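The two index ranges on this slide can be enumerated per core with a short sketch. The function name `core_index_ranges` and the sample values `p=4`, `N=2` are hypothetical, and the sketch does not decide which range belongs to the red and which to the black sub-vectors, since the slide text does not make that pairing explicit.

```python
def core_index_ranges(p, N):
    """Return {j: (first_range, second_range)} for cores j = 1..N, with k = 2p/N."""
    if (2 * p) % N != 0:
        raise ValueError("2p must be divisible by N")
    k = 2 * p // N
    if k % 2 != 0:
        raise ValueError("this sketch covers only the even-k case")
    mapping = {}
    for j in range(1, N + 1):
        # l = (j-1)k+1, ..., jk
        first = list(range((j - 1) * k + 1, j * k + 1))
        # l = 2p+(j-1)k+1, ..., 2p+jk
        second = list(range(2 * p + (j - 1) * k + 1, 2 * p + j * k + 1))
        mapping[j] = (first, second)
    return mapping

mapping = core_index_ranges(p=4, N=2)
for j, (first, second) in mapping.items():
    print(f"P{j}: {first} and {second}")
```

Each core receives k indices from each range, i.e. 2k in total, and together the N cores cover the indices 1 to 4p exactly once, which is what gives every core the same amount of work.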
![Page 25: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/25.jpg)
Parallel Schur Complement Iterative Solution
![Page 26: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/26.jpg)
Parallel BiCGSTAB
![Page 27: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/27.jpg)
Parallel BiCGSTAB
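For orientation, here is a minimal serial sketch of the textbook unpreconditioned BiCGSTAB iteration in plain Python. It is not the parallel variant presented in the talk: the dense list-of-lists matrix, the zero initial guess, the relative stopping tolerance and the 3x3 test system are all assumptions chosen to keep the example small, and no breakdown handling is included.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def bicgstab(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned BiCGSTAB with x0 = 0; returns (x, iterations used)."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                      # r0 = b - A x0 with x0 = 0
    r_hat = r[:]                  # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= tol * b_norm:
            # Converged after the first half-step.
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            return x, it
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        rho = rho_new
    return x, max_iter

# Hypothetical small non-symmetric, diagonally dominant test system.
A = [[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
x_true = [1.0, -2.0, 3.0]
b = matvec(A, x_true)
x, iters = bicgstab(A, b)
residual = math.sqrt(sum((bi - ci) ** 2 for bi, ci in zip(b, matvec(A, x))))
```

Each iteration needs two matrix-vector products plus a handful of inner products and vector updates, which are the operations a parallel implementation has to distribute across threads.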
![Page 28: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/28.jpg)
The Dirichlet Helmholtz Problem
$$u(x,y) = 10\,\phi(x)\,\phi(y), \quad (x,y) \in [0,1]\times[0,1]$$
with
$$\phi(x) = (x^2 - x)\,e^{-100(x-0.1)^2}$$
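A small sketch of this test function, assuming the formula on the slide reads u(x,y) = 10 φ(x) φ(y) with φ(x) = (x² − x) e^{−100(x−0.1)²}. The function names `phi` and `u_exact` are hypothetical. The factor (x² − x) makes u vanish on the whole boundary of the unit square, and the exponential concentrates the solution near (0.1, 0.1).

```python
import math

def phi(x):
    """phi(x) = (x^2 - x) * exp(-100 (x - 0.1)^2)."""
    return (x * x - x) * math.exp(-100.0 * (x - 0.1) ** 2)

def u_exact(x, y):
    """u(x, y) = 10 * phi(x) * phi(y) on the unit square."""
    return 10.0 * phi(x) * phi(y)

# Values on the boundary x = 0 and x = 1, and at the centre of the peak.
boundary_values = [u_exact(0.0, 0.3), u_exact(1.0, 0.3), u_exact(0.3, 0.0), u_exact(0.3, 1.0)]
peak_value = u_exact(0.1, 0.1)
```

At (0.1, 0.1) the exponential equals 1, so the value there is 10 · (−0.09)² = 0.081.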
![Page 29: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/29.jpg)
HP SL390s
6 core [email protected]
24GB memory
Oracle Linux 6.3 x64
PGI 13.5 Fortran
PCI-e gen2 x16
HP SL390s – Tesla M2070 GPUs
+ 2 x
![Page 30: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/30.jpg)
Realization on HP SL390s Tesla GPU machine: Iterations / Error measurements

| ns | BiCGSTAB Iterations | ‖b – Ax_n‖₂ |
|------|------|----------|
| 256 | 294 | 6.06e-11 |
| 512 | 589 | 2.85e-11 |
| 1024 | 1161 | 1.39e-11 |
| 2048 | 3726 | 9.59e-12 |
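The growth of the iteration counts listed above can be checked with a few lines of arithmetic. The dictionary below only restates the reported numbers, and the variable names are illustrative.

```python
# Reported BiCGSTAB iteration counts, keyed by ns.
iterations = {256: 294, 512: 589, 1024: 1161, 2048: 3726}

sizes = sorted(iterations)
# Ratio of iteration counts each time ns doubles.
ratios = [iterations[b] / iterations[a] for a, b in zip(sizes, sizes[1:])]
print([round(r, 2) for r in ratios])  # [2.0, 1.97, 3.21]
```

The count roughly doubles for each doubling of ns up to 1024, and grows by a factor of about 3.2 in the last step to ns = 2048.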
![Page 31: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/31.jpg)
Realization on HP SL390s Tesla GPU machine Time measurements
![Page 32: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/32.jpg)
Realization on HP SL390s Tesla GPU machine Time measurements
![Page 33: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/33.jpg)
Realization on HP SL390s Tesla GPU machine Time measurements
![Page 34: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/34.jpg)
Realization on HP SL390s Tesla GPU machine Time measurements
![Page 35: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/35.jpg)
Realization on HP SL390s Tesla GPU machine Time measurements
![Page 36: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/36.jpg)
Realization on HP SL390s Tesla GPU machine Time measurements
![Page 37: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/37.jpg)
Conclusions
• A new parallel algorithm has been developed that implements the Schur complement method with the BiCGSTAB iterative solver for the Hermite Collocation equations.
• The algorithm is realized on Shared Memory multi-core machines with GPUs.
• A performance acceleration of up to 30% is observed.
![Page 38: Parallel iterative solution of the hermite collocation equations on gpus](https://reader033.vdocuments.net/reader033/viewer/2022051516/559a99a31a28ab603d8b45fb/html5/thumbnails/38.jpg)
Future work
• Design an efficient parallel Schur complement algorithm for the Hermite Collocation equations on Multiprocessor/Grid machines with GPUs.