Joe Eaton, November 19, 2015
AmgX 2.0: Scaling toward CORAL
Agenda
Introduction to AmgX
Current Capabilities
Scaling
V2.0
Roadmap for the future
AmgX
Fast, scalable linear solvers, emphasis on iterative methods
Flexible toolkit for GPU accelerated Ax = b solver
Simple API makes it easy to solve your problems faster
“Using AmgX has allowed us to exploit the power of the GPU while freeing up development time to concentrate on reservoir simulation.”
Garf Bowen, Ridgeway Kite Software
AmgX in Reservoir Simulation
Solve Faster
Solve Larger Systems
Flexible High Level API
[Figure: bar chart of Application Time (seconds), lower is better. CPU: 1150; GPU Custom: 197; AmgX: 98.]
3-phase Black Oil Reservoir Simulation. 400K grid blocks solved fully implicitly.
CPU: Intel Xeon CPU E5-2670
GPU: NVIDIA Tesla K10
AmgX 2.0: New Features since 1.0
Classical AMG with truncation, robust aggressive coarsening
Complex arithmetic
GPUDirect, RDMA-async
Power8 support, Maxwell support
Crash-proof object management
Re-usable setup phase
Adaptors for major solver packages:
HYPRE, PETSc, Trilinos
Import data structures directly to AmgX for solve, export solution
Host or Device pointer support
JSON configuration
Key Features
Un-smoothed Aggregation AMG
Krylov methods: CG, GMRES, BiCGStab, IDR
Smoothers and Solvers:
Block-Jacobi, Gauss-Seidel
Incomplete LU, Dense LU
KPZ-Polynomial, Chebyshev
Flexible composition system
Scalar or coupled block systems, multi-precision
MPI, OpenMP support
Auto-consolidation
Flexible, simple high level C API
Minimal Example With Config
// One header
#include "amgx_c.h"
// Read config file
AMGX_create_config(&cfg, cfgfile);
// Create resources based on config
AMGX_resources_create_simple(&res, cfg);
// Create solver object, A, x, b, set precision
AMGX_solver_create(&solver, res, mode, cfg);
AMGX_matrix_create(&A, res, mode);
AMGX_vector_create(&x, res, mode);
AMGX_vector_create(&b, res, mode);
// Read coefficients from a file
AMGX_read_system(&A, &x, &b, matrixfile);
// Setup and Solve Loop
AMGX_solver_setup(solver, A);
AMGX_solver_solve(solver, b, x);
// Download Result
AMGX_download_vector(&x);
Config file (cfgfile):
solver(main)=FGMRES
main:max_iters=100
main:convergence=RELATIVE_MAX
main:tolerance=0.1
main:preconditioner(amg)=AMG
amg:algorithm=AGGREGATION
amg:selector=SIZE_8
amg:cycle=V
amg:max_iters=1
amg:max_levels=10
amg:smoother(smoother)=BLOCK_JACOBI
amg:relaxation_factor=0.75
amg:presweeps=1
amg:postsweeps=2
amg:coarsest_sweeps=4
determinism_flag=1
Integrates easily with MPI and OpenMP domain decomposition
Adding GPU support to existing applications raises new issues
Proper ratio of CPU cores / GPU?
How can multiple CPU cores (MPI ranks) share a single GPU?
How does MPI switch between two sets of ‘ranks’: one set for CPUs, one set for GPUs?
AmgX handles this via Consolidation
Consolidate multiple smaller sub-matrices into single matrix
Handled automatically during the PCIe data copy
[Figure: Consolidation. The original problem (unknowns u1 to u7) is partitioned to 2 MPI ranks (Rank 0 and Rank 1), which perform a boundary exchange using halo copies u'2 and u'4. Both partitions are then copied over PCIe and consolidated onto 1 GPU, which again holds u1 to u7.]
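The consolidation idea can be sketched on the host side in plain C. This is a minimal illustration, not AmgX code and not the AmgX API: the `csr_t` type and the `csr_consolidate` and `csr_free` functions are hypothetical names, and the sketch assumes each rank owns a contiguous block of rows stored in CSR format with global column indices, so that consolidation reduces to concatenating the row blocks in rank order.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical CSR container (not an AmgX type); column indices are global. */
typedef struct {
    int     n_rows;
    int     nnz;
    int    *row_ptr;  /* n_rows + 1 entries, row_ptr[0] == 0 */
    int    *col_idx;  /* nnz entries */
    double *val;      /* nnz entries */
} csr_t;

/* Release the arrays owned by a consolidated matrix. */
void csr_free(csr_t *m)
{
    free(m->row_ptr);
    free(m->col_idx);
    free(m->val);
    m->row_ptr = NULL;
    m->col_idx = NULL;
    m->val = NULL;
}

/* Concatenate the row blocks in parts[0..n_parts-1] (rank order) into one
 * matrix. Returns 0 on success and -1 if an allocation fails. */
int csr_consolidate(const csr_t *parts, int n_parts, csr_t *out)
{
    int rows = 0, nnz = 0;
    for (int p = 0; p < n_parts; ++p) {
        assert(parts[p].row_ptr[0] == 0);  /* each part is a standalone CSR block */
        rows += parts[p].n_rows;
        nnz  += parts[p].nnz;
    }

    out->n_rows  = rows;
    out->nnz     = nnz;
    out->row_ptr = malloc((size_t)(rows + 1) * sizeof *out->row_ptr);
    out->col_idx = malloc((size_t)(nnz > 0 ? nnz : 1) * sizeof *out->col_idx);
    out->val     = malloc((size_t)(nnz > 0 ? nnz : 1) * sizeof *out->val);
    if (!out->row_ptr || !out->col_idx || !out->val) {
        csr_free(out);
        return -1;
    }

    int row_off = 0, nnz_off = 0;
    out->row_ptr[0] = 0;
    for (int p = 0; p < n_parts; ++p) {
        /* Shift this part's row pointers by the nonzeros already copied. */
        for (int i = 0; i < parts[p].n_rows; ++i)
            out->row_ptr[row_off + i + 1] = nnz_off + parts[p].row_ptr[i + 1];
        memcpy(out->col_idx + nnz_off, parts[p].col_idx,
               (size_t)parts[p].nnz * sizeof *out->col_idx);
        memcpy(out->val + nnz_off, parts[p].val,
               (size_t)parts[p].nnz * sizeof *out->val);
        row_off += parts[p].n_rows;
        nnz_off += parts[p].nnz;
    }
    return 0;
}
```

With global column indices, entries that coupled the two ranks through halo values simply become ordinary off-diagonal entries of the single consolidated matrix, so no boundary exchange is needed between the merged partitions.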
Consolidation Examples
1 CPU socket <=> 1 GPU
Dual socket CPU <=> 2 GPUs
Dual socket CPU <=> 4 GPUs
Arbitrary Cluster:
4 nodes x [2 CPUs + 3 GPUs], connected by InfiniBand (IB)
PETSc KSP vs AmgX performance test
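PDE:
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = -12\pi^2 \cos(2\pi x)\cos(2\pi y)\cos(2\pi z)$$
BCs:
$$\left.\frac{\partial u}{\partial x}\right|_{x=0} = \left.\frac{\partial u}{\partial x}\right|_{x=1} = \left.\frac{\partial u}{\partial y}\right|_{y=0} = \left.\frac{\partial u}{\partial y}\right|_{y=1} = \left.\frac{\partial u}{\partial z}\right|_{z=0} = \left.\frac{\partial u}{\partial z}\right|_{z=1} = 0$$
Exact solution:
$$u(x,y,z) = \cos(2\pi x)\cos(2\pi y)\cos(2\pi z)$$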
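As a quick consistency check (not part of the original slide), the stated exact solution can be substituted into the PDE and the boundary conditions. Each second derivative contributes the same factor:

$$\frac{\partial^2 u}{\partial x^2} = -(2\pi)^2 \cos(2\pi x)\cos(2\pi y)\cos(2\pi z) = -4\pi^2 u, \qquad \frac{\partial^2 u}{\partial y^2} = \frac{\partial^2 u}{\partial z^2} = -4\pi^2 u$$

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = -12\pi^2 \cos(2\pi x)\cos(2\pi y)\cos(2\pi z)$$

$$\frac{\partial u}{\partial x} = -2\pi \sin(2\pi x)\cos(2\pi y)\cos(2\pi z) = 0 \quad \text{at } x = 0 \text{ and } x = 1$$

The same argument applies in y and z, so the right-hand side and the zero-flux boundary conditions both agree with the exact solution.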
PETSc vs AmgX
7x speedup @ 4M unknowns: 16 cores vs 1 GPU
8x speedup @ 100M unknowns: 512 cores vs 32 GPUs
Machine specification
GPU nodes:
GPU: two K20m per node
CPU nodes:
CPU: two Intel Xeon E5-2670 per node (16 cores per node in total)
PETSc KSP solver
SPE10 Cases
We derived several test cases from the SPE10 permeability distribution by fixing an x-y resolution and adding resolution in z, using a two-point flux approximation (TPFA) stencil.
SPE10 Matrix Tests
GPU: NVIDIA K40
CPU: HYPRE on 10-core Ivy Bridge Xeon E5-2690 v2 @ 3.0 GHz
[Figure: "1 Socket vs 1 GPU". Speedup (axis 0 to 5) vs. millions of unknowns (axis 0 to 10).]
Scaling up the right way
Poisson Equation / Laplace operator
Titan (Oak Ridge National Laboratory)
GPU: NVIDIA K20x (one per node)
CPU: 16-core AMD Opteron 6274 @ 2.2 GHz
Aggregation and Classical Weak Scaling, 8 million DOF per GPU
[Figure: Setup time in seconds (axis 0 to 12) vs. number of GPUs (1 to 512). Series: AmgX 1.0 (PMIS), AmgX 1.0 (AGG).]
Poisson Equation / Laplace operator (same Titan configuration)
Aggregation and Classical Weak Scaling, 8 million DOF per GPU
[Figure: "Time per Iteration vs Log(P)". Solve time (axis 0 to 0.16) vs. number of GPUs (1 to 512). Series: ClassicalAMGSolve, AggregationAMGSolve, each with a linear trendline.]
Trendlines: y = 0.0062x + 0.0719 (R² = 0.9249) and y = 0.0022x + 0.0585 (R² = 0.9437)
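To make the "linear in Log(P)" reading concrete, the two trendline equations from this slide can be evaluated at a given GPU count. The sketch below is illustrative only and rests on two assumptions that the slide text does not confirm: that x is the 1-based position on the category axis (x = log2(P) + 1, so P = 1 maps to x = 1 and P = 512 to x = 10), and that P is a power of two. The type and function names are hypothetical, and the extracted text does not say which series each trendline belongs to, so the fits are labeled A and B.

```c
/* One linear trendline, y = slope * x + intercept, as printed on the slide. */
typedef struct {
    double slope;
    double intercept;
} trend_t;

/* The two fits from the slide; their series assignment is not stated. */
static const trend_t FIT_A = {0.0062, 0.0719};
static const trend_t FIT_B = {0.0022, 0.0585};

/* Integer log2 for a power-of-two GPU count (assumption: P = 2^k, P >= 1). */
static int log2_pow2(int p)
{
    int k = 0;
    while (p > 1) {
        p >>= 1;
        ++k;
    }
    return k;
}

/* Modeled time per iteration at n_gpus GPUs.
 * Assumption: x is the 1-based axis position, x = log2(P) + 1. */
double trend_time_per_iteration(trend_t fit, int n_gpus)
{
    double x = (double)(log2_pow2(n_gpus) + 1);
    return fit.slope * x + fit.intercept;
}
```

Under these assumptions, going from 1 to 512 GPUs adds nine slope increments to the modeled time per iteration, which is the logarithmic growth in P that the slide title points to.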
Poisson Equation / Laplace operator (same Titan configuration)
Classical AMG Preconditioner, 8 million DOF per GPU
[Figure: Iterations (axis 0 to 120) vs. number of GPUs (1 to 512). Series: PCG, GMRES.]
Poisson Equation / Laplace operator (same Titan configuration)
Classical AMG Preconditioner, 8 million DOF per GPU
[Figure: Solve time in seconds (axis 0 to 16) vs. number of GPUs (1 to 512). Series: GMRES, PCG.]
AmgX 2.0: MPI with GPUDirect RDMA
4x lower latency, 3x Bandwidth, 45% lower CPU utilization
Basic Coarsening
Aggressive Coarsening
Less Memory, Faster Setup
AmgX 2.0 Licensing
Developer/Academic License
Non-commercial use, free
Commercial License, Developer License, Premier Support Service
Subscription License (node/year)
Includes Support and Maintenance
Volume based pricing
Site License
Perpetual License
20% Maintenance and Support
AmgX Roadmap
Continuous Improvement
Availability Features
Classical AMG
- multi node
- multi GPU
- Aggressive coarsening
Complex Arithmetic + Aggregation
Easy interfaces, python
PETSc, HYPRE, Trilinos
Robust convergence on SPE10
GPUDirect v2.0
Scalable Sparse Eigensolvers
Scaling past 512 GPUs
Range Decomposition AMG
Guaranteed convergence aggregation
Commercial License
Premier Support
AmgX 2.5 Q2 2016
AmgX 2.0 Release Q4 2015
CUDA 8.0 with Pascal Support
Tuning for Maxwell
AmgX 2.0 was made by a great team of contributors.
AmgX 2.0 Team: Marat Arsaev, Joe Eaton, Alex Fender, Andrei Schaffer
AmgX 2.0 Devtechs: Simon Layton, Nikolai Sakharnykh, Nikolay Markovskiy
Interns: Rohit Gupta, Constantine Stulov