Recent advances in HPC with FreeFem++
Pierre Jolivet
Laboratoire Jacques-Louis Lions
Laboratoire Jean Kuntzmann
Fourth workshop on FreeFem++
December 6, 2012
With F. Hecht, F. Nataf, C. Prud’homme.
Introduction New solvers Domain decomposition methods Conclusion
Outline
1 Introduction
    Flashback to last year
    A new machine
2 New solvers
    MUMPS
    PARDISO
3 Domain decomposition methods
    The necessary tools
    Preconditioning Krylov methods
    Nonlinear problems
4 Conclusion
2 / 26 Pierre Jolivet HPC and FreeFem++
Where were we one year ago?
What we had:
• FreeFem++ version 3.17,
• a “simple” toolbox for domain decomposition methods,
• 3 supercomputers (SGI, Bull, IBM).
What we achieved:
• linear problems with up to 120 million unknowns solved in a few minutes,
• good scaling up to ∼2048 processes on a BlueGene/P.
FreeFem++ runs on the following parallel architectures:

Machine        Number of cores        Memory   Peak performance
hpc1@LJLL      [email protected] GHz    640 GB   ∼10 TFLOP/s
titane@CEA     [email protected] GHz    37 TB    140 TFLOP/s
babel@IDRIS    40960@850 MHz          20 TB    139 TFLOP/s
curie@CEA      [email protected] GHz    320 TB   1.6 PFLOP/s

http://www-hpc.cea.fr, Bruyères-le-Châtel, France.
http://www.idris.fr, Orsay, France (grant PETALh).
http://www.prace-project.eu (grant HPC-PDE).
FreeFem++ and the linear solvers
set(A, solver = sparsesolver) in FreeFem++
⇓
an instance of VirtualSolver<T>, from which every solver inherits.
Interfacing a new solver basically consists in implementing:
1 the constructor MySolver(const MatriceMorse<T>&),
2 the pure virtual method
  void Solver(const MatriceMorse<T>&, KN_<T>&, const KN_<T>&) const,
3 the destructor ~MySolver().
real[int] x = A^-1 * b will call MySolver::Solver.
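The pattern on this slide (factorize once in the constructor, then reuse the factors in every call to the solve method) can be sketched outside FreeFem++. The following is a minimal Python transposition, not the FreeFem++ C++ interface: the class names `VirtualSolver` and `DenseLU`, the dense storage and the pivoted LU factorization are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class VirtualSolver(ABC):
    """Hypothetical stand-in for the abstract base class of all solvers."""

    @abstractmethod
    def solve(self, b):
        """Return x such that A x = b."""


class DenseLU(VirtualSolver):
    """Toy solver: the constructor factorizes, solve() only substitutes."""

    def __init__(self, A):
        n = len(A)
        self.lu = [row[:] for row in A]   # L (unit diagonal) and U, stored in place
        self.perm = list(range(n))        # row permutation from partial pivoting
        for k in range(n):
            p = max(range(k, n), key=lambda i: abs(self.lu[i][k]))
            if self.lu[p][k] == 0.0:
                raise ValueError("singular matrix")
            self.lu[k], self.lu[p] = self.lu[p], self.lu[k]
            self.perm[k], self.perm[p] = self.perm[p], self.perm[k]
            for i in range(k + 1, n):
                self.lu[i][k] /= self.lu[k][k]
                for j in range(k + 1, n):
                    self.lu[i][j] -= self.lu[i][k] * self.lu[k][j]

    def solve(self, b):
        n = len(b)
        x = [b[p] for p in self.perm]     # apply the permutation to b
        for i in range(n):                # forward substitution with L
            for j in range(i):
                x[i] -= self.lu[i][j] * x[j]
        for i in reversed(range(n)):      # backward substitution with U
            for j in range(i + 1, n):
                x[i] -= self.lu[i][j] * x[j]
            x[i] /= self.lu[i][i]
        return x


solver = DenseLU([[4.0, 1.0], [1.0, 3.0]])   # factorization happens here
x = solver.solve([1.0, 2.0])                 # cheap, can be repeated for new b
```

The design point is the same as in the C++ interface: the expensive work is tied to the lifetime of the solver object, so repeated solves with the same matrix only pay for the substitutions.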
MUltifrontal Massively Parallel Sparse direct Solver
http://graal.ens-lyon.fr/MUMPS
Distributed memory direct solver.
Compiled by FreeFem++ with --enable-download.
Renumbering via AMD, QAMD, AMF, PORD, (Par)METIS, (PT-)SCOTCH.
Solves unsymmetric and symmetric linear systems!
MUMPS and FreeFem++
load "MUMPS"
int[int] l = [1, 1, 2, 2];
mesh Th;
if(mpirank != 0) // no need to store the matrix on ranks other than 0
    Th = square(1, 1, label = l);
else
    Th = square(150, 150, label = l);
fespace Vh(Th, P2);
varf lap(u, v) = int2d(Th)(dx(u)*dx(v) + dy(u)*dy(v)) + int2d(Th)(v)
               + on(1, u = 1);
real[int] b = lap(0, Vh);
matrix A = lap(Vh, Vh);
set(A, solver = sparsesolver);
Vh x; x[] = A^-1 * b;
plot(Th, x, wait = 1, dim = 3, fill = 1, cmm = "sparsesolver", value = 1);
Intel MKL PARDISO
http://software.intel.com/en-us/intel-mkl
Shared memory direct solver.
Part of the Intel Math Kernel Library.
Renumbering via METIS or threaded nested dissection.
Solves unsymmetric and symmetric linear systems!
PARDISO and FreeFem++
load "PARDISO"
include "cube.idp"
int n = 35; int[int] NN = [n, n, n];
real[int, int] BB = [[0,1], [0,1], [0,1]];
int[int, int] L = [[1,1], [3,4], [5,6]];
mesh3 Th = Cube(NN, BB, L);
fespace Vh(Th, P2);
varf lap(u, v) = int3d(Th)(dx(u)*dx(v) + dz(u)*dz(v) + dy(u)*dy(v))
               + int3d(Th)(v) + on(1, u = 1);
real[int] b = lap(0, Vh);
matrix A = lap(Vh, Vh);
real timer = mpiWtime();
set(A, solver = sparsesolver);
cout << "Factorization: " << mpiWtime() - timer << endl;
Vh x; timer = mpiWtime();
x[] = A^-1 * b;
cout << "Solve: " << mpiWtime() - timer << endl;
Is it really necessary?
If you are using direct methods within FreeFem++: yes!
Why? Sooner or later, UMFPACK will blow up:
UMFPACK V5.5.1 (Jan 25, 2011): ERROR: out of memory
umfpack_di_numeric failed
Motivation
One of the most straightforward ways to solve boundary value problems (BVPs) in parallel.
Based on the “divide and conquer” paradigm:
1 assemble,
2 factorize and
3 solve smaller problems.
A short introduction I
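Consider the following BVP in $\mathbb{R}^2$:
$$\nabla \cdot (\kappa \nabla u) = F \text{ in } \Omega, \qquad B(u) = 0 \text{ on } \partial\Omega.$$
[Figure: the domain $\Omega$, split into two overlapping subdomains $\Omega_1$ and $\Omega_2$.]
Then, solve in parallel:
$$\begin{aligned}
\nabla \cdot (\kappa \nabla u_1^{m+1}) &= F_1 && \text{in } \Omega_1 \\
B(u_1^{m+1}) &= 0 && \text{on } \partial\Omega_1 \cap \partial\Omega \\
u_1^{m+1} &= u_2^{m} && \text{on } \partial\Omega_1 \cap \Omega_2
\end{aligned}$$
$$\begin{aligned}
\nabla \cdot (\kappa \nabla u_2^{m+1}) &= F_2 && \text{in } \Omega_2 \\
B(u_2^{m+1}) &= 0 && \text{on } \partial\Omega_2 \cap \partial\Omega \\
u_2^{m+1} &= u_1^{m} && \text{on } \partial\Omega_2 \cap \Omega_1
\end{aligned}$$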
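The iteration above can be tried on a one-dimensional model problem. The sketch below is an illustration under stated assumptions, not the FreeFem++ implementation: it takes $-u'' = 1$ on $(0, 1)$ with $u(0) = u(1) = 0$ (that is, $\kappa = 1$, $F = -1$ and Dirichlet conditions for $B$), a finite difference grid with 19 interior nodes, and an arbitrary overlap. The names `thomas` and `parallel_schwarz` are hypothetical helpers.

```python
def thomas(n, rhs):
    """Solve the n-by-n system tridiag(-1, 2, -1) x = rhs (Thomas algorithm)."""
    c = [0.0] * n                      # modified super-diagonal
    d = [0.0] * n                      # modified right-hand side
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def parallel_schwarz(n=19, a=8, b=11, iterations=100):
    """Subdomain 1 owns nodes 0..b-1, subdomain 2 owns nodes a..n-1 (a < b)."""
    h = 1.0 / (n + 1)
    f = [h * h] * n                    # right-hand side of -u'' = 1, scaled by h^2
    u1 = [0.0] * b                     # local iterate on subdomain 1
    u2 = [0.0] * (n - a)               # local iterate on subdomain 2
    for _ in range(iterations):
        g1 = u2[b - a]                 # u_2^m at node b, the interface of subdomain 1
        g2 = u1[a - 1]                 # u_1^m at node a-1, the interface of subdomain 2
        rhs1 = f[:b]
        rhs1[-1] += g1                 # Dirichlet data enters the last local equation
        rhs2 = f[a:]
        rhs2[0] += g2                  # Dirichlet data enters the first local equation
        # both local solves use data from iteration m only: they are independent
        u1 = thomas(b, rhs1)
        u2 = thomas(n - a, rhs2)
    return u1, u2, h


u1, u2, h = parallel_schwarz()
```

Because both local solves read only the previous iterates, they can run concurrently, which is the point of the method. On this model problem the local solutions converge to the discrete solution $x(1 - x)/2$ at the grid nodes.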
A short introduction II
We have effectively divided, but we have yet to conquer.
Duplicated unknowns coupled via a partition of unity:
$$I = \sum_{i=1}^{N} R_i^T D_i R_i,$$
where $R_i^T$ is the prolongation from $V_i$ to $V$.
Then,
$$u^{m+1} = \sum_{i=1}^{N} R_i^T D_i u_i^{m+1}.$$
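A small sketch may help to see what the partition of unity does. It assumes that each $R_i$ is a Boolean restriction stored as a list of global indices and that each $D_i$ is diagonal with entries equal to one over the number of subdomains sharing the unknown; this is one possible choice among others, and the function names are hypothetical.

```python
def partition_of_unity(n, subdomains):
    """Diagonal of each D_i: 1 / (number of subdomains containing the unknown)."""
    multiplicity = [0] * n
    for idx in subdomains:
        for g in idx:
            multiplicity[g] += 1
    return [[1.0 / multiplicity[g] for g in idx] for idx in subdomains]


def restrict(idx, u):
    """R_i u: keep the entries of u that belong to subdomain i."""
    return [u[g] for g in idx]


def glue(n, subdomains, D, local_vectors):
    """sum_i R_i^T D_i u_i: weighted prolongation of the local vectors."""
    u = [0.0] * n
    for idx, d, ui in zip(subdomains, D, local_vectors):
        for k, g in enumerate(idx):
            u[g] += d[k] * ui[k]
    return u


n = 10
subdomains = [list(range(0, 6)), list(range(4, 10))]   # unknowns 4 and 5 are duplicated
D = partition_of_unity(n, subdomains)
v = [float(g * g) for g in range(n)]
# restricting then gluing gives back v, i.e. sum_i R_i^T D_i R_i = I
w = glue(n, subdomains, D, [restrict(idx, v) for idx in subdomains])
```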
One-level preconditioners
A common preconditioner is:
$$M^{-1}_{\text{RAS}} := \sum_{i=1}^{N} R_i^T D_i (R_i A R_i^T)^{-1} R_i.$$
For future reference, let $A_{ij} := R_i A R_j^T$.
These preconditioners don’t scale as $N$ increases:
$$\kappa(M^{-1}A) \leqslant C \frac{1}{H^2}\left(1 + \frac{H}{\delta h}\right).$$
They act as low-pass filters (think about multigrid methods).
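To make the formula for $M^{-1}_{\text{RAS}}$ concrete, here is a minimal sketch that applies it to a residual and uses it in a plain fixed-point iteration $x \leftarrow x + M^{-1}_{\text{RAS}}(b - Ax)$ rather than in a Krylov method. Everything specific is an assumption made for the example: dense storage, a 1D Laplacian with 19 unknowns, two overlapping index sets, and a Boolean partition of unity in which each duplicated unknown is owned by exactly one subdomain.

```python
def solve_dense(A, b):
    """Gaussian elimination with partial pivoting (works on copies)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def apply_ras(A, subdomains, D, r):
    """Return sum_i R_i^T D_i (R_i A R_i^T)^{-1} R_i r."""
    z = [0.0] * len(r)
    for idx, d in zip(subdomains, D):
        Aii = [[A[g][k] for k in idx] for g in idx]    # local matrix R_i A R_i^T
        yi = solve_dense(Aii, [r[g] for g in idx])     # local solve
        for k, g in enumerate(idx):
            z[g] += d[k] * yi[k]                       # weighted prolongation
    return z


def ras_richardson(n=19, iterations=100):
    h = 1.0 / (n + 1)
    A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
          for j in range(n)] for i in range(n)]
    b = [h * h] * n
    subdomains = [list(range(0, 11)), list(range(8, n))]
    # Boolean partition of unity: unknowns below 10 belong to the first subdomain
    D = [[1.0 if g < 10 else 0.0 for g in subdomains[0]],
         [1.0 if g >= 10 else 0.0 for g in subdomains[1]]]
    x = [0.0] * n
    for _ in range(iterations):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        x = [xi + zi for xi, zi in zip(x, apply_ras(A, subdomains, D, r))]
    return x, h


x, h = ras_richardson()
```

In a real code the local matrices would be factorized once and reused at every application; they are rebuilt here only to keep the sketch short.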
Two-level preconditioners
A common technique in the field of DDM, MG, deflation: introduce an auxiliary “coarse” problem.
Let $Z$ be a rectangular matrix so that the “bad eigenvectors” of $M^{-1}A$ belong to the space spanned by its columns. Define
$$E := Z^T A Z, \qquad Q := Z E^{-1} Z^T.$$
$Z$ has $O(N)$ columns, hence $E$ is much smaller than $A$.
In theory, the following preconditioner can scale:
$$P^{-1}_{\text{A-DEF1}} := M^{-1}(I - AQ) + Q.$$
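The effect of the coarse correction can be checked on a toy example: for any vector $z$ in the span of the columns of $Z$, $QAz = z$ and $(I - AQ)Az = 0$, so $P^{-1}_{\text{A-DEF1}} A z = z$ and the corresponding eigenvalues are moved to 1. The sketch below verifies this numerically. The choices are illustrative assumptions only: a 6-by-6 1D Laplacian, a Jacobi preconditioner in place of $M^{-1}$, and a $Z$ made of two piecewise constant vectors instead of vectors computed from an eigenvalue problem.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def solve_dense(A, b):
    """Gaussian elimination with partial pivoting (works on copies)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def make_adef1(A, Z, apply_M):
    """Z is a list of columns; returns x -> M^{-1}(x - A Q x) + Q x."""
    AZ = [matvec(A, z) for z in Z]
    E = [[dot(zi, azj) for azj in AZ] for zi in Z]      # E = Z^T A Z (small)

    def apply_Q(x):
        y = solve_dense(E, [dot(z, x) for z in Z])      # E^{-1} Z^T x
        return [sum(yj * z[k] for yj, z in zip(y, Z)) for k in range(len(x))]

    def apply_P(x):
        q = apply_Q(x)
        deflated = [xi - ai for xi, ai in zip(x, matvec(A, q))]
        return [mi + qi for mi, qi in zip(apply_M(deflated), q)]

    return apply_P


n = 6
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]
Z = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]]
jacobi = lambda r: [ri / 2.0 for ri in r]               # stand-in for M^{-1}
P = make_adef1(A, Z, jacobi)
images = [P(matvec(A, z)) for z in Z]                   # should reproduce Z
```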
Preconditioners in action
Back to our Krylov method of choice: GMRES.

procedure GMRES(input vector x_0, right-hand side b)
    r_0 ← P^{-1}_{A-DEF1} (b − A x_0)
    v_0 ← r_0 / ||r_0||_2
    for i = 0, ..., m − 1 do
        w ← P^{-1}_{A-DEF1} A v_i
        for j = 0, ..., i do
            h_{j,i} ← ⟨w, v_j⟩
        end for
        v_{i+1} ← w − Σ_{j=0}^{i} h_{j,i} v_j
        h_{i+1,i} ← ||v_{i+1}||_2
        v_{i+1} ← v_{i+1} / h_{i+1,i}
        apply Givens rotations to h_{:,i}
    end for
    y_m ← arg min ||H_m y_m − β e_1||_2 with β = ||r_0||_2
    return x_0 + V_m y_m
end procedure

Three kinds of operations in this procedure are global:
• sparse matrix-vector multiplications (A v_i),
• dot products (⟨w, v_j⟩ and the norms),
• preconditioner-vector computations (applications of P^{-1}_{A-DEF1}).
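For readers who want to run the GMRES procedure above, here is a minimal sequential sketch. It follows the same steps (left preconditioning, classical Gram-Schmidt, Givens rotations, no restart), but it is not the `GMRESDDM` routine of FreeFem++: the function signature, the breakdown threshold and the example operator are assumptions. The operator and the preconditioner are passed as functions, which is where the three global operations would be implemented in parallel.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def gmres(matvec, precond, b, x0, m):
    """Left-preconditioned GMRES, at most m iterations, no restart."""
    r = precond([bi - ai for bi, ai in zip(b, matvec(x0))])
    beta = math.sqrt(dot(r, r))
    if beta == 0.0:
        return list(x0)
    V = [[ri / beta for ri in r]]          # orthonormal Krylov basis
    R = []                                 # R[i]: column i of H after the rotations
    cs, sn, g = [], [], [beta]
    for i in range(m):
        w = precond(matvec(V[i]))
        h = [dot(w, v) for v in V]         # classical Gram-Schmidt coefficients
        for hj, v in zip(h, V):
            w = [wk - hj * vk for wk, vk in zip(w, v)]
        norm_w = math.sqrt(dot(w, w))      # h_{i+1,i}
        for j in range(i):                 # apply the previous rotations
            h[j], h[j + 1] = (cs[j] * h[j] + sn[j] * h[j + 1],
                              -sn[j] * h[j] + cs[j] * h[j + 1])
        rho = math.hypot(h[i], norm_w)     # new rotation annihilates h_{i+1,i}
        cs.append(h[i] / rho)
        sn.append(norm_w / rho)
        h[i] = rho
        R.append(h)
        g.append(-sn[i] * g[i])
        g[i] = cs[i] * g[i]
        if norm_w <= 1e-14 * beta:         # breakdown: the Krylov space is exhausted
            break
        V.append([wk / norm_w for wk in w])
    k = len(R)
    y = [0.0] * k                          # back substitution for the least squares
    for i in reversed(range(k)):
        y[i] = (g[i] - sum(R[j][i] * y[j] for j in range(i + 1, k))) / R[i][i]
    x = list(x0)
    for yj, v in zip(y, V):
        x = [xk + yj * vk for xk, vk in zip(x, v)]
    return x


n = 6


def A_mul(x):
    """Example operator: tridiagonal with -1, 3, -0.5 on the three diagonals."""
    return [3.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - 0.5 * (x[i + 1] if i < n - 1 else 0.0) for i in range(n)]


b = [float(i + 1) for i in range(n)]
x = gmres(A_mul, lambda r: [ri / 3.0 for ri in r], b, [0.0] * n, n)
```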
In practice
A scalable DDM framework must include routines for:
• point-to-point (p2p) communications to compute sparse matrix-vector products (spmv),
• p2p communications to apply a one-level preconditioner,
• global transfers between master and slave processes to apply “coarse” corrections,
• direct solvers for the “coarse” problem and local problems.
In FreeFem++
subdomain A(S, D, restrictionIntersection, arrayIntersection,
            communicator = mpiCommWorld); // build buffers for spmv
preconditioner E(A); // build M^{-1}_{RAS}
if(CoarseOperator) ...
    attachCoarseOperator(E, A, A = GEVPA, B = GEVPB, parameters = parm); // build P^{-1}_{A-DEF1}
GMRESDDM(A, E, u[], rhs, dim = 100, iter = 100, eps = eps);
Nonlinear elasticity
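We solve an evolution problem of hyperelasticity on a beam: find $u$ such that $\forall t \in [0; T]$, $\forall v \in [H^1(\Omega)]^3$,
$$\int_\Omega \rho \ddot{u} \cdot v + \int_\Omega \underbrace{(I + \nabla u)}_{=: F(u)} \Sigma : \nabla v = \int_\Omega f \cdot v + \int_{\partial\Omega_S} g \cdot v$$
$$u = 0 \text{ on } \partial\Omega_D$$
$$\sigma(u) \cdot n = 0 \text{ on } \partial\Omega_N$$
where
$$\Sigma = \lambda \operatorname{tr}(E) I + 2\mu E.$$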
In FreeFem++
Add another operator, rebuild (currently it only rebuilds $A_{ii}^{-1}$).
The rest is (almost) the same.
for(int k = 0; k < 200; ++k) {
    SchemeInit(u)
    if(k > 0)
        S = vPbNonlinear(Wh, Wh, solver = CG);
    rhs = vPbLinear(0, Wh);
    rebuild(S, A, E); // update the local solvers for the new operator S
    GMRESDDM(A, E, h[], rhs, dim = 100, iter = 100, eps = eps);
    SchemeAdvance(u, h)
}
Amuse-bouches before lunch I
[Figure: strong scaling on 1 024 to 8 192 processes. Left axis: timing relative to 1 024 processes (1 to 8), compared with linear speedup. Right axis: number of iterations (10 to 25).]
Linear elasticity in 2D with P3 FE.
1 billion unknowns solved in 31 seconds at peak performance.
Amuse-bouches before lunch II
[Figure: strong scaling on 1 024 to 8 192 processes. Left axis: timing relative to 1 024 processes (1 to 8), compared with linear speedup. Right axis: number of iterations (10 to 25).]
Linear elasticity in 3D with P2 FE.
80 million unknowns solved in 35 seconds at peak performance.
Amuse-bouches before lunch III
[Figure: weak scaling on 1 024 to 12 288 processes. Left axis: weak efficiency relative to 1 024 processes (0% to 100%). Right axis: number of d.o.f. in millions (844 to 10 176).]
Scalar diffusivity in 2D with P3 FE.
10 billion unknowns solved in 160 seconds on 12 288 processes.
Amuse-bouches before lunch IV
[Figure: weak scaling on 1 024 to 8 192 processes. Left axis: weak efficiency relative to 1 024 processes (0% to 100%). Right axis: number of d.o.f. in millions (130 to 1 051).]
Scalar diffusivity in 3D with P2 FE.
1 billion unknowns solved in 150 seconds on 8 192 processes.
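A quick arithmetic check of the weak-scaling setup, using only the axis values quoted on the two scalar diffusivity slides. It assumes that the smallest and largest d.o.f. ticks (844 and 10 176 million in 2D, 130 and 1 051 million in 3D) correspond to the smallest and largest process counts, and that weak efficiency is defined in the usual way, as the reference time divided by the time on more processes at constant load per process. The helper names are hypothetical.

```python
def dofs_per_process(dofs_in_millions, processes):
    """Average number of unknowns handled by one process."""
    return dofs_in_millions * 1.0e6 / processes


def weak_efficiency(t_reference, t_n):
    """Usual definition: time of the reference run divided by time of the larger run."""
    return t_reference / t_n


# (d.o.f. in millions, number of processes) at both ends of each plot
runs = {
    "2D, P3": [(844, 1024), (10176, 12288)],
    "3D, P2": [(130, 1024), (1051, 8192)],
}
loads = {name: [dofs_per_process(d, p) for d, p in pairs]
         for name, pairs in runs.items()}
for name, (small, large) in loads.items():
    print(name, round(small), round(large))
```

Under these assumptions the load stays close to 0.82 million unknowns per process in 2D and close to 0.13 million in 3D, which is what a weak-scaling experiment requires.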
To sum up
1 FreeFem++ + two-level DDM = easy framework to solve large systems.
2 New solvers give a performance boost on smaller systems.
New problems being tackled:
• more complex nonlinearities,
• reuse of Krylov and deflation subspaces,
• BNN and FETI methods.
Thank you for your attention.