large-scale reservoir simulation on gpu - gpu technology...
-
RESERVOIR SIMULATION
Large-Scale Reservoir Simulation on GPU
Song Yu, Hui Liu
Advisor: Dr. Zhangxing (John) Chen
University of Calgary
-
Outline
• Introduction
• GPU-based Linear Solver
• GPU-based Reservoir Simulation
• Numerical Experiments
• Conclusions
-
Introduction
• Numerical methods: FDM, FEM, FVM → matrix system
• A system matrix arising from simulation: sparse, highly nonsymmetric and ill-conditioned.
• The general choice: Krylov subspace solvers with preconditioners.
• Large-scale reservoir simulation time: 80%-90% spent in the solver
• Speed up linear solvers → speed up reservoir simulation
-
GPU Architecture (Tesla)
[Figure: GPU block diagram. Labels: HOST I/F, GigaThread, DRAM I/F (six instances), L2, GPU SM.]
-
GPU-based Linear Solver Package
-
GMRES
Iterative algorithm for solving linear systems of equations of the form Ax = b.
For an n × n matrix, GMRES converges to the exact solution within n iterations (in exact arithmetic). In practice n is very large, so restarted GMRES(m) is used, with a restart length m much smaller than n.
GMRES converges after a small number of iterations when it is used in conjunction with a good preconditioner.
Main computational cost:
• BLAS operations:
    y = αx (vector scale)
    z = x^T y (dot product)
    y = αx + y (saxpy)
• Matrix-vector product: y = Ax
• Preconditioning operation: Mr = b
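The restart idea can be sketched in a few lines of plain Python. This is an illustrative, unpreconditioned GMRES(m) on a small dense matrix, not the GPU package from the slides; the names `matvec`, `dot` and `gmres_restarted` are hypothetical, and the dense `matvec` stands in for the sparse GPU kernel. It assumes A is nonsingular.

```python
import math

def matvec(A, x):
    """Dense y = A x (stand-in for the sparse matrix-vector kernel)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gmres_restarted(A, b, m=20, rtol=1e-8, max_restarts=50):
    """Solve A x = b with unpreconditioned GMRES(m), starting from x = 0."""
    x = [0.0] * len(b)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_restarts):
        r = [bi - ci for bi, ci in zip(b, matvec(A, x))]
        beta = math.sqrt(dot(r, r))
        if beta <= rtol * b_norm:
            break
        V = [[ri / beta for ri in r]]  # orthonormal Krylov basis
        H = []                         # H[j]: column j of the triangularized Hessenberg matrix
        cs, sn = [], []                # Givens rotation coefficients
        g = [beta]                     # rotated right-hand side of the least-squares problem
        k = 0
        for j in range(m):
            w = matvec(A, V[j])
            h = []
            for i in range(j + 1):     # modified Gram-Schmidt
                hij = dot(w, V[i])
                h.append(hij)
                w = [wk - hij * vk for wk, vk in zip(w, V[i])]
            h_next = math.sqrt(dot(w, w))
            for i in range(j):         # apply earlier rotations to the new column
                h[i], h[i + 1] = (cs[i] * h[i] + sn[i] * h[i + 1],
                                  -sn[i] * h[i] + cs[i] * h[i + 1])
            rho = math.hypot(h[j], h_next)
            c, s = h[j] / rho, h_next / rho
            cs.append(c)
            sn.append(s)
            h[j] = rho                 # the rotation zeroes the subdiagonal entry
            g.append(-s * g[j])
            g[j] = c * g[j]
            H.append(h)
            k = j + 1
            # |g[k]| is the residual norm of the current least-squares solution
            if abs(g[k]) <= rtol * b_norm or h_next == 0.0:
                break
            V.append([wk / h_next for wk in w])
        # Back substitution on the k x k upper-triangular system
        y = [0.0] * k
        for i in range(k - 1, -1, -1):
            acc = g[i] - sum(H[j][i] * y[j] for j in range(i + 1, k))
            y[i] = acc / H[i][i]
        for j in range(k):
            x = [xi + y[j] * vj for xi, vj in zip(x, V[j])]
    return x
```

Every step of the inner loop reduces to the operations listed on the slide: one matrix-vector product plus dot products, scalings and saxpy updates, which is why those kernels dominate the run time.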
-
Preconditioner
The convergence rate of iterative linear solvers depends strongly on the condition number of the matrix. Preconditioners are used to reduce the condition number and speed up the convergence of iterative solvers.
Ax = b → M^-1 A x = M^-1 b, with M ≈ A ≈ LU
Two criteria for choosing M:
1. M is a good approximation of A
2. M^-1 is easy to compute, or Mx = b is easy to solve
• ILU is one of the most popular preconditioner families. Some nonzero elements in the L and U factors are dropped to reduce the cost and the number of fill-ins. ILU has many variants based on the level of fill-in:
1. No fill-in: ILU(0) is the simplest. Its lower and upper triangular factors keep nonzero elements only at positions where the original matrix has nonzero elements.
2. With fill-in: ILUT, with a numerical threshold, and ILU(k), with fill-in level k.
The more fill-in, the more time the factorization takes. It is a trade-off between accuracy and speed.
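A minimal sketch of ILU(0) in plain Python may help make the "no fill-in" rule concrete. It stores the matrix as dense lists of lists purely for brevity (an assumption; a real solver would use a sparse format), and the names `ilu0` and `ilu0_solve` are hypothetical.

```python
def ilu0(A):
    """Return L and U packed in one matrix (L has an implicit unit diagonal).

    Updates are applied only where A itself is nonzero, so no fill-in is created.
    """
    n = len(A)
    LU = [row[:] for row in A]
    pattern = [[A[i][j] != 0.0 for j in range(n)] for i in range(n)]
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue
            LU[i][k] /= LU[k][k]          # multiplier, stored in the L part
            for j in range(k + 1, n):
                if pattern[i][j]:         # skip positions that would be fill-in
                    LU[i][j] -= LU[i][k] * LU[k][j]
    return LU

def ilu0_solve(LU, r):
    """Apply the preconditioner: solve (L U) z = r by two triangular solves."""
    n = len(r)
    y = r[:]
    for i in range(n):                    # forward solve with unit-diagonal L
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    z = y[:]
    for i in range(n - 1, -1, -1):        # backward solve with U
        z[i] = (y[i] - sum(LU[i][j] * z[j] for j in range(i + 1, n))) / LU[i][i]
    return z
```

For a tridiagonal matrix, exact LU produces no fill-in, so ILU(0) coincides with the exact factorization; for a matrix such as `[[4, 1, 1], [1, 4, 0], [1, 0, 4]]` the two zero positions stay zero in the factors, which is where the approximation comes from.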
-
Sparse matrix vector multiplication
• Matrix: HEC format, a hybrid of the ELL format and the CSR format
[Figure: HEC storage layout. Panel labels: ELL format, CSR format. Array labels: J, V, Ap, J, V. Index labels: i, i, i+1.]
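The slide does not spell out how entries are split between the two parts, so the following is only one plausible reading: the first `ell_width` nonzeros of each row go into fixed-width ELL arrays (J, V), and any remaining nonzeros of that row go into CSR arrays (Ap, J, V). The function names, the dictionary layout and the `-1` padding marker are assumptions made for this sketch.

```python
def to_hec(rows, ell_width):
    """Build a hybrid ELL + CSR structure.

    rows[i] is a list of (column, value) pairs for row i.
    """
    n = len(rows)
    # ELL part, stored column-major: ell_J[k][i] is the k-th entry of row i.
    ell_J = [[-1] * n for _ in range(ell_width)]   # -1 marks padding
    ell_V = [[0.0] * n for _ in range(ell_width)]
    # CSR part for the entries that do not fit into the ELL width.
    csr_Ap, csr_J, csr_V = [0], [], []
    for i, row in enumerate(rows):
        for k, (col, val) in enumerate(row):
            if k < ell_width:
                ell_J[k][i] = col
                ell_V[k][i] = val
            else:
                csr_J.append(col)
                csr_V.append(val)
        csr_Ap.append(len(csr_J))
    return {"n": n, "ell_J": ell_J, "ell_V": ell_V,
            "csr_Ap": csr_Ap, "csr_J": csr_J, "csr_V": csr_V}

def hec_spmv(m, x):
    """y = A x. On a GPU, each iteration of the outer loop would be one thread."""
    y = [0.0] * m["n"]
    for i in range(m["n"]):
        acc = 0.0
        for J_k, V_k in zip(m["ell_J"], m["ell_V"]):    # regular ELL part
            if J_k[i] >= 0:
                acc += V_k[i] * x[J_k[i]]
        for p in range(m["csr_Ap"][i], m["csr_Ap"][i + 1]):  # irregular CSR part
            acc += m["csr_V"][p] * x[m["csr_J"][p]]
        y[i] = acc
    return y
```

The motivation for a hybrid of this kind is that the fixed-width ELL arrays give regular memory access for the bulk of the rows, while the CSR part absorbs the few long rows without padding every row to the maximum length.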
-
GPU-based Linear Solver Package
-
GPU-based Reservoir Simulation
• Conservation Equations
  – Material Conservation
  – Energy Conservation
• Linear Solver
  – Linear solver, e.g. GMRES, BiCGSTAB, ORTHOMIN
  – Nonlinear (Newton) solver
-
Jacobian Matrix Example
Newton system for three grid cells, with residuals R_o, R_w, R_e and unknowns p, s, T in each cell. The matrix is block tridiagonal: the blocks coupling cells 1 and 3 are zero.
\[
\begin{pmatrix}
J_{11} & J_{12} & 0 \\
J_{21} & J_{22} & J_{23} \\
0 & J_{32} & J_{33}
\end{pmatrix}
\begin{pmatrix} \Delta x_1 \\ \Delta x_2 \\ \Delta x_3 \end{pmatrix}
= -
\begin{pmatrix} R_1 \\ R_2 \\ R_3 \end{pmatrix}^{n}
\]
where
\[
J_{ij} =
\begin{pmatrix}
\dfrac{\partial R_{o,i}}{\partial p_j} & \dfrac{\partial R_{o,i}}{\partial s_j} & \dfrac{\partial R_{o,i}}{\partial T_j} \\
\dfrac{\partial R_{w,i}}{\partial p_j} & \dfrac{\partial R_{w,i}}{\partial s_j} & \dfrac{\partial R_{w,i}}{\partial T_j} \\
\dfrac{\partial R_{e,i}}{\partial p_j} & \dfrac{\partial R_{e,i}}{\partial s_j} & \dfrac{\partial R_{e,i}}{\partial T_j}
\end{pmatrix},
\quad
\Delta x_j = \begin{pmatrix} \Delta p_j \\ \Delta s_j \\ \Delta T_j \end{pmatrix},
\quad
R_i = \begin{pmatrix} R_{o,i} \\ R_{w,i} \\ R_{e,i} \end{pmatrix}.
\]
-
GPU-based Reservoir Simulation
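[Figure: two simulation flowcharts; the second solves the matrix equation on the GPU.]
Flow 1: Data input → Initialization → Start time step loop → Start Newton iteration → Build Jacobian & r.h.s. → Solve matrix equation → Converged? (No: next Newton iteration; Yes: Update and I/O) → All time steps done? (No: next time step; Yes: End)
Flow 2 (GPU): Data input → Initialization → Start time step loop → Start Newton iteration → Build Jacobian & r.h.s. → Solve matrix equation on GPU → Converged? (No: next Newton iteration; Yes: Update and I/O) → Time to end? (No: next time step; Yes: End)
Matrix solver: Matrix preprocess on CPU → Generate PC M on CPU → Transfer data to GPU → Solve Ax = b on GPU → Transfer x back to CPU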
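The loop structure of this workflow can be sketched on a toy problem. The example below is not the black oil simulator: it applies backward Euler and Newton's method to the scalar equation du/dt = -u^3, chosen only so that the code stays short. All function names are hypothetical, and `solve_linear` is a placeholder for the whole "matrix solver" stage (preprocess, build preconditioner, transfer, solve on GPU, transfer back).

```python
def residual(u, u_old, dt):
    """Backward-Euler residual of du/dt = -u**3 (toy stand-in for the conservation equations)."""
    return u - u_old + dt * u ** 3

def jacobian(u, dt):
    """Derivative of the residual with respect to u (a 1 x 1 'Jacobian matrix')."""
    return 1.0 + 3.0 * dt * u ** 2

def solve_linear(J, rhs):
    """Placeholder for the linear solve; in the slides this step runs on the GPU."""
    return rhs / J

def simulate(u0, dt, n_steps, tol=1e-10, max_newton=20):
    history = [u0]
    u = u0
    for _ in range(n_steps):              # time step loop
        u_old = u
        for _ in range(max_newton):       # Newton iteration
            R = residual(u, u_old, dt)    # build right-hand side
            if abs(R) < tol:              # converged?
                break
            u += solve_linear(jacobian(u, dt), -R)
        else:
            raise RuntimeError("Newton iteration did not converge")
        history.append(u)                 # update and I/O
    return history
```

Calling `simulate(1.0, 0.1, 10)` returns the initial value followed by ten time-step solutions. In the real simulator only `solve_linear` changes between the CPU and GPU versions, which is why the speedup of the whole run depends on how much of the total time that call accounts for.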
-
Numerical Experiments
• CPU: Intel Xeon X5570, 8 MB cache, 2.93 GHz, 32 GB memory
• GPU: NVIDIA Tesla C2050/C2070, 3 GB/6 GB memory, ECC disabled
• Environment: Linux (Fedora 13 x86_64, kernel 2.6.34.7-61), CUDA Toolkit 4.0, GCC 4.4.5
• Compiler options: -arch=sm_20 -Xcompiler "-Wall" -O3
-
Numerical Experiments
Case 1: Testing 4 preconditioners and 3 solvers.
Case 2: Testing the effect of the number of blocks on the speedup of BILU(0) and BILUT.
Case 3: Testing the speedup of the whole simulation process.
Case description
Matrix    N          NNZ         NNZ/ROW
SPE10-1   2,188,851  29,915,573  13.7
SPE10-2   2,188,851  29,915,573  13.7
-
Case 1
Case description
Matrix    N          NNZ         NNZ/ROW
SPE10-1   2,188,851  29,915,573  13.7
Experimental parameters
Relative tolerance        1E-3
Restart m                 40
Neumann polynomial order  16
METIS partition           8
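The slides do not define the Neumann polynomial preconditioner, so the sketch below assumes a common form: a truncated Neumann series scaled by the diagonal D of A, M^-1 = (I + N + N^2 + ... + N^p) D^-1 with N = I - D^-1 A and p the polynomial order. The function name `neumann_apply` and the dense storage are hypothetical choices for illustration.

```python
def neumann_apply(A, r, order=16):
    """Return z = (I + N + ... + N**order) D^-1 r, with N = I - D^-1 A.

    Evaluated by the recurrence z <- z + D^-1 (r - A z), which needs only
    matrix-vector products and vector updates (no triangular solves).
    """
    n = len(r)
    d_inv = [1.0 / A[i][i] for i in range(n)]
    z = [d_inv[i] * r[i] for i in range(n)]          # order-0 term: D^-1 r
    for _ in range(order):
        Az = [sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]
        z = [z[i] + d_inv[i] * (r[i] - Az[i]) for i in range(n)]
    return z
```

Under this assumption each application costs `order` matrix-vector products, all of which parallelize well. That would be consistent with the pattern in the result tables, where the Neumann polynomial shows the largest GPU speedup but also the longest absolute run times.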
-
Performance comparison
Solver  PC            Iterations  CPU time  GPU time  Speedup
GMRES   Neumann Poly  30          1620.5    125.9     12.9
        ILU(0)        18          263.8     27.9      9.5
        BILU(0)       20          307.8     27.5      11.2
All preconditioners reach a speedup of about 10x. BILU(0) and ILU(0) both converge fast.
-
Performance comparison
Solver    PC            Iterations  CPU time  GPU time  Speedup
BiCGSTAB  Neumann Poly  359         740.7     64.7      11.4
          ILU(0)        260/249     84.3      11.7      7.2
          BILU(0)       243         85.6      9         9.5
Solver    PC            Iterations  CPU time  GPU time  Speedup
ORTHOMIN  Neumann Poly  543         1449.9    114.1     12.7
          ILU(0)        392         284.8     30.1      9.5
          BILU(0)       400         283       27.6      10.3
Speedup → about 10x. BiCGSTAB with ILU(0) and BILU(0) solves faster than GMRES and ORTHOMIN.
-
Case 2
GMRES(20) + block ILU(0)
Blks  Iterations  CPU time  GPU time  Speedup
1     21          121.1     15        8.14
4     23          124.33    15        8.27
8     23          126.40    15.32     8.23
16    29          180.06    19.05     9.44
GMRES(20) + block ILUT
Blks  Iterations  CPU time  GPU time  Speedup
1     5           34.20     11.70     2.92
4     7           44.67     10.35     4.30
8     7           45.78     9.58      4.76
16    10          63.13     12.43     5.07
-
Case 3: GPU-based Reservoir Simulation
• The SPE 10 Comparative Solution Project
• Fine grid (60 * 220 * 85)
• Highly heterogeneous
Relative tolerance        1e-3
Restart m                 60
Neumann polynomial order  16
Number of blocks          8
-
Solver    PC            CPU time  GPU time  Speedup
GMRES     Neumann Poly  4h49m23s  29m43s    9.7
          ILU(0)        1h30m16s  17m18s    5.2
          BILU(0)       2h37m02s  20m18s    7.7
BiCGSTAB  Neumann Poly  4h14m57s  36m13s    7
          ILU(0)        1h0m40s   31m42s    1.9
          BILU(0)       1h7m22s   34m28s    2
ORTHOMIN  Neumann Poly  7h57m11s  56m27s    8.5
          ILU(0)        2h25m48s  37m58s    3.8
          BILUK(0)      2h37m23s  41m22s    3.8
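The speedup column is the CPU wall time divided by the GPU wall time. A small helper, with the hypothetical names `to_seconds` and `speedup`, shows how the entries can be recomputed from the time strings in the table:

```python
import re

def to_seconds(t):
    """Convert a duration such as '4h49m23s' or '29m43s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", t)
    if match is None:
        raise ValueError(f"unrecognized duration: {t!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return 3600 * hours + 60 * minutes + seconds

def speedup(cpu_time, gpu_time):
    """Ratio of CPU wall time to GPU wall time."""
    return to_seconds(cpu_time) / to_seconds(gpu_time)
```

For GMRES with the Neumann polynomial, `speedup("4h49m23s", "29m43s")` is 17363 s / 1783 s, which rounds to the 9.7 shown in the table.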
-
Conclusions
• Implemented a GPU-based linear solver package including BLAS operations, linear solvers, preconditioners and several preprocessing methods.
• Compared the speedups of different linear solvers and preconditioners, and achieved around 10x speedup for the SPE10 matrix.
• Coupled our GPU-based linear solver package with an in-house black oil reservoir simulator to speed up the SPE10 simulation problem; using GMRES, we achieve a speedup of 5-10x across the different preconditioners.
All publications can be accessed at:
• http://sites.google.com/site/monramax/publication
-
THANK YOU