On Preconditioning of Real Linear Systems of Equations with Complex Shifts
Jens Saak
joint work with Hartwig Anzt (ICL UTK)
and
Edmond Chow (CC GATech)
Joint GAMM DMV Annual Meeting, TU Braunschweig, March 8, 2016
Motivation
Model order reduction of linear time-invariant dynamical systems often employs the rational Krylov subspace method (RKSM) of some flavor:
Moment Matching,
IRKA,
Balanced Truncation (computing the Gramians via ADI or RKSM).
Dominant computation in all these methods
Sequences of Shifted Linear Systems (SLS)
$P_j (A + \sigma_j E) X_j = P_j Y_j$, for $j \in I \subset \mathbb{N}$,
with real data $A \in \mathbb{R}^{n \times n}$, $E \in \mathbb{R}^{n \times n}$, $Y_j \in \mathbb{R}^{n \times m}$, $m \ll n$, but possibly complex shifts $\sigma_j \in \mathbb{C}$.
For iterative solvers: find a sequence of preconditioners $P_j$, reusing information where possible!
Jens Saak, [email protected] Preconditioning Linear Systems w. Complex Shifts 2/14
Agenda
1. Motivation
2. Prior Work
   Reusing and Realigning Preconditioner
   The Symmetric Case
   Complex Linear Systems
3. Structure Exploiting Preconditioners
   Problem Structure
   Preconditioning Approaches
4. Experiments
Prior Work
Reusing and Realigning Preconditioner
1. reuse the preconditioner for small matrix changes, i.e. similar shifts
2. update the incomplete factorization preconditioner [Bellavia/De Simone/di Serafino/Morini ’11, Benzi/Bertaccini ’03, Bertaccini ’04, Calgaro/Chehab/Saad ’10, Duintjer Tebbens/Tuma ’07]
Main Issues
reuse reduces the efficiency of the preconditioner
updates have limited potential for parallel execution
Goal
highly parallel preconditioner update and application
Prior Work
The Symmetric Case [Anzt/Chow/S. ’14–’16]
Preconditioned iterative solution of Sequences of Shifted Linear Systems (SLS)
$P_j (A + \sigma_j E) X_j = P_j Y_j$, for $j \in I \subset \mathbb{N}$,
for $A$, $E$ symmetric and $\sigma_j$ real.
Idea: Iterative computation/approximation of preconditioners
If $|\sigma_j - \sigma_{j-1}|$ is small enough, then $\|P_j - P_{j-1}\|$ is small $\Rightarrow$ $P_{j-1}$ is a good initial guess for $P_j$.
Use [Chow/Patel ’15, Chow/Anzt/Dongarra ’15] for (asynchronous) iterative computation of incomplete factorization preconditioners.
Prior Work
The Symmetric Case
Idea: Minimize $\|A - U^T U\|$ via fixed point approach [Chow/et al. ’15]
Algorithm 1: Iterative IC Algorithm (shared memory, asynchronous)
Input: Initial guess $U \in \mathbb{R}^{n \times n}$ and allowed pattern $S_U$
Output: Approximate incomplete Cholesky factor $U$
for sweep = 1, 2, ... until convergence do
    parallel for $(i, j) \in S_U$ do
        $s = a_{ij} - \sum_{k=1}^{i-1} u_{ki} u_{kj}$
        if $i \neq j$ then
            $u_{ij} = s / u_{ii}$
        else
            $u_{ii} = \sqrt{s}$
        end
    endfor
end
Synchronous version globally convergent, asynchronous only locally, e.g. [Frommer/Szyld ’00]
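The sweeps of Algorithm 1 can be sketched in Python as follows. This is a minimal sequential sketch of the synchronous variant only (each update in a sweep reads the factor from the previous sweep). The function names `iterative_ic`, `pattern_residual` and `upper_triangle`, the dense list-of-lists storage, and the 3×3 test matrix are illustrative assumptions, not the GPU implementation behind the timings in this talk. The last lines illustrate the warm-start idea for a nearby shift, assuming E = I and a naive cold start from the upper triangle of the matrix.

```python
import math

def iterative_ic(a, pattern, u0, sweeps):
    """Synchronous fixed-point sweeps for A ~ U^T U on a sparsity pattern.

    a       -- symmetric positive definite matrix (list of lists)
    pattern -- (i, j) pairs with i <= j where U may be nonzero
    u0      -- initial guess for the upper triangular factor U
    """
    u = [row[:] for row in u0]
    for _ in range(sweeps):
        new = [row[:] for row in u]      # read the old factor, write a new one
        for i, j in pattern:             # updates are independent of each other
            s = a[i][j] - sum(u[k][i] * u[k][j] for k in range(i))
            new[i][j] = s / u[i][i] if i != j else math.sqrt(s)
        u = new
    return u

def pattern_residual(a, u, pattern):
    """Frobenius norm of A - U^T U restricted to the pattern."""
    n = len(a)
    return math.sqrt(sum(
        (a[i][j] - sum(u[k][i] * u[k][j] for k in range(n))) ** 2
        for i, j in pattern))

def upper_triangle(a):
    """Upper triangular part of a, used here as a simple cold start."""
    n = len(a)
    return [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]

# Assumed test problem: tridiagonal SPD matrix, E = I, two nearby real shifts.
A = [[4.0, 2.0, 0.0], [2.0, 5.0, 2.0], [0.0, 2.0, 5.0]]
S_U = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]

# Cold start from the upper triangle of A; five sweeps suffice for this matrix.
U_prev = iterative_ic(A, S_U, upper_triangle(A), sweeps=5)

# Shifted matrix A + 0.21 * I: compare one sweep from a warm and a cold start.
B = [[A[i][j] + (0.21 if i == j else 0.0) for j in range(3)] for i in range(3)]
res_warm = pattern_residual(B, iterative_ic(B, S_U, U_prev, 1), S_U)
res_cold = pattern_residual(B, iterative_ic(B, S_U, upper_triangle(B), 1), S_U)
print(res_warm, res_cold)
```

In this small example a single sweep started from the previous factor leaves a much smaller residual than a single sweep from the cold start, which is the effect the warm-start idea relies on.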
Prior Work
The Symmetric Case: Oberwolfach Rail¹
[Figure, two panels. Panel "Wachspress shifts": shift (log scale, 10^-6 to 10^4) versus shift index (0 to 40) for disc_1 through disc_7. Panel "condition numbers of $A + \sigma_j E$": condition number (log scale, 10^1 to 10^6) versus shift index (0 to 35).]
disc 1 = 371 dof, disc 2 = 1 357 dof, disc 3 = 5 177 dof, disc 4 = 20 209 dof, disc 5 = 79 841 dof, disc 6 = 317 377 dof, disc 7 = 1 265 537 dof
¹ FEniCS-based reimplementation by Maximilian Behr
Prior Work
The Symmetric Case: Oberwolfach Rail
[Figure, two panels ("1 sweep" and "2 sweeps"): PCG iterations (0 to 500) versus shift index (5 to 30); legend entries IC, SIG, PFIG.]
Prior Work
The Symmetric Case: Oberwolfach Rail
             Time [ms]                        Speedup
             IC       Number of sweeps        Number of sweeps
dof                   1      3      5         1        3        5
371          4.88     0.03   0.06   0.08      167.70   87.14    57.48
1 357        9.01     0.03   0.06   0.08      309.62   158.07   107.39
5 177        15.60    0.04   0.08   0.12      421.62   200.00   134.48
20 209       32.00    0.08   0.19   0.30      405.58   167.54   105.26
79 841       66.40    0.23   0.62   1.01      286.21   106.58   65.74
317 377      142.00   0.78   2.25   3.73      182.75   63.11    38.07
1 265 537    323.00   2.94   8.77   14.50     109.86   36.83    22.28
Table: Time [ms] for the exact incomplete Cholesky factorization using NVIDIA’s cuSPARSE library and for 1, 3, and 5 sweeps of the iterative counterpart, with the respective speedups.
Prior Work
Complex Linear Systems
Consider
$Ax = b$
for $A \in \mathbb{C}^{n \times n}$, $x, b \in \mathbb{C}^n$. With $A_r, A_i \in \mathbb{R}^{n \times n}$, $x_r, x_i, b_r, b_i \in \mathbb{R}^n$ write
$A = A_r + \imath A_i, \quad x = x_r + \imath x_i, \quad b = b_r + \imath b_i.$
Idea, e.g. [Day/Heroux ’01, Benzi/Bertaccini ’08]
Rewrite to real linear system
$\begin{bmatrix} A_r & -A_i \\ A_i & A_r \end{bmatrix} \begin{bmatrix} x_r \\ x_i \end{bmatrix} = \begin{bmatrix} b_r \\ b_i \end{bmatrix}$
and employ block preconditioner.
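As a quick sanity check of this rewriting, the following pure-Python sketch builds the real block matrix for a small complex matrix and confirms that multiplying it with the stacked vector [x_r; x_i] gives the stacked right-hand side [b_r; b_i]. The helper names `real_equivalent`, `matvec` and `stack` and the 2×2 data are assumptions made for illustration only.

```python
def real_equivalent(a):
    """Return the real 2n x 2n block matrix [[Ar, -Ai], [Ai, Ar]] of complex A."""
    top = [[v.real for v in row] + [-v.imag for v in row] for row in a]
    bottom = [[v.imag for v in row] + [v.real for v in row] for row in a]
    return top + bottom

def matvec(m, v):
    """Dense matrix-vector product on lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def stack(z):
    """Map a complex vector x to the real vector [x_r; x_i]."""
    return [v.real for v in z] + [v.imag for v in z]

# Assumed 2x2 example with exactly representable entries.
A = [[2 + 1j, -1j], [1 + 0j, 3 - 2j]]
x = [1 - 1j, 2 + 0.5j]
b = matvec(A, x)                       # complex product b = A x

lhs = matvec(real_equivalent(A), stack(x))   # real block matrix times [x_r; x_i]
rhs = stack(b)                               # [b_r; b_i]
print(lhs, rhs)
```

Since the two products agree for every x, solving the real 2n×2n system is equivalent to solving the complex n×n one, at the price of doubling the dimension.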
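Structure Exploiting Preconditioners
Problem Structure
In our special case
$A_{j,r} = A + \operatorname{Re}(\sigma_j) E, \quad A_{j,i} = \operatorname{Im}(\sigma_j) E,$
$M_j = \begin{bmatrix} \operatorname{Re}(\sigma_j) & -\operatorname{Im}(\sigma_j) \\ \operatorname{Im}(\sigma_j) & \operatorname{Re}(\sigma_j) \end{bmatrix}$ and $J_j = \begin{bmatrix} 0 & -\operatorname{Im}(\sigma_j) \\ \operatorname{Im}(\sigma_j) & 0 \end{bmatrix}.$
Then
$A_j V_j = W_j \Leftrightarrow \begin{bmatrix} A_{j,r} & -A_{j,i} \\ A_{j,i} & A_{j,r} \end{bmatrix} \begin{bmatrix} V_{j,r} \\ V_{j,i} \end{bmatrix} = \begin{bmatrix} W_{j,r} \\ W_{j,i} \end{bmatrix}$
$\Leftrightarrow (I_2 \otimes A + M_j \otimes E) V_j = W_j$
$\Leftrightarrow (I_2 \otimes A_{j,r} + J_j \otimes E) V_j = W_j$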
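The equivalence of the block form and the two Kronecker forms can be checked numerically. The sketch below uses NumPy with small assumed test matrices (a tridiagonal A, a diagonal E) and one assumed shift. It only illustrates the structure and is not the setup used in the experiments.

```python
import numpy as np

n = 4
# Assumed real test data: A tridiagonal, E diagonal and invertible.
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
E = np.diag(np.arange(1.0, n + 1.0))
sigma = -3.71 + 2.39j                      # one complex shift sigma_j

s_re, s_im = sigma.real, sigma.imag
M = np.array([[s_re, -s_im], [s_im, s_re]])
J = np.array([[0.0, -s_im], [s_im, 0.0]])
A_r = A + s_re * E                         # A_{j,r}
A_i = s_im * E                             # A_{j,i}
I2 = np.eye(2)

block = np.block([[A_r, -A_i], [A_i, A_r]])
kron_M = np.kron(I2, A) + np.kron(M, E)    # I_2 (x) A + M_j (x) E
kron_J = np.kron(I2, A_r) + np.kron(J, E)  # I_2 (x) A_{j,r} + J_j (x) E

# Solving the real 2n x 2n system reproduces the complex solution.
w = np.arange(1.0, n + 1.0) + 1j * np.ones(n)
x = np.linalg.solve(A + sigma * E, w)
v = np.linalg.solve(block, np.concatenate([w.real, w.imag]))

print(np.allclose(block, kron_M), np.allclose(block, kron_J))
print(np.allclose(v, np.concatenate([x.real, x.imag])))
```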
Structure Exploiting Preconditioners
Preconditioning Approaches
Matrix Representations
$\mathcal{A} = (I_2 \otimes A + M_j \otimes E)$, $\mathcal{A} = (I_2 \otimes A_{j,r} + J_j \otimes E)$
Derived Preconditioner Definitions
1. $P_1 := (M_j \otimes E)^{-1} = (M_j^{-1} \otimes E^{-1})$. Note: $M_j^{-1} = \frac{1}{|\sigma_j|^2} M_j^T$
2. $P_2 := (I_2 \otimes A)^{-1} = (I_2 \otimes A^{-1})$
3. $P_3 := (I_2 \otimes A_{j,r})^{-1} = (I_2 \otimes A_{j,r}^{-1})$
$P_1 \mathcal{A} = (M_j^{-1} \otimes E^{-1}A) + I_{2n} = (\frac{1}{|\sigma_j|^2} M_j^T \otimes E^{-1}A) + I_{2n} \approx I_{2n}$ for $|\sigma_j|$ large
+ realignment exceptionally cheap
+ application can be recast: $P_1 R_j \Leftrightarrow \frac{1}{|\sigma_j|^2} E^{-1} R_j M_j$ (with $R_j$ reshaped to $n \times 2$ on the right-hand side)
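The recast application of $P_1$ follows from the identity $(B \otimes C)\operatorname{vec}(X) = \operatorname{vec}(C X B^T)$ together with $M_j^{-T} = M_j / |\sigma_j|^2$. A small NumPy sketch under assumed test data (a diagonally dominant E, one shift, column-wise stacking of real and imaginary parts) compares it against the explicitly formed Kronecker product. The function names `apply_p1_kron` and `apply_p1_recast` are hypothetical.

```python
import numpy as np

n = 5
# Assumed test data: E symmetric, strictly diagonally dominant (hence invertible).
E = 4.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
sigma = -3.0 + 2.0j
M = np.array([[sigma.real, -sigma.imag], [sigma.imag, sigma.real]])

def apply_p1_kron(r):
    """Reference: form P1 = M^{-1} (x) E^{-1} explicitly (only viable for tiny n)."""
    return np.kron(np.linalg.inv(M), np.linalg.inv(E)) @ r

def apply_p1_recast(r):
    """Same result via one solve with E for two right-hand sides."""
    R = r.reshape(n, 2, order="F")          # columns: real and imaginary part
    Z = np.linalg.solve(E, R) @ M / abs(sigma) ** 2
    return Z.reshape(-1, order="F")         # stack the two columns again

r = np.arange(1.0, 2 * n + 1.0)
same_result = np.allclose(apply_p1_kron(r), apply_p1_recast(r))
# Check of the note M^{-1} = M^T / |sigma|^2.
inverse_ok = np.allclose(np.linalg.inv(M), M.T / abs(sigma) ** 2)
print(same_result, inverse_ok)
```

The recast form never touches a 2n×2n matrix: changing the shift only changes the 2×2 factor, which is why realigning $P_1$ to a new shift is so cheap.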
$P_2 \mathcal{A} = I_{2n} + M_j \otimes A^{-1}E$
+ better capturing of $A$, same realignment cost
− unclear when $M_j \otimes A^{-1}E$ vanishes
$P_3 \mathcal{A} = I_{2n} + J_j \otimes A_{j,r}^{-1}E$
+ non-identity parts concentrated on the counter-diagonal
+ good preconditioning for small $\operatorname{Re}(\sigma_j)$ or $\operatorname{Im}(\sigma_j)$
The counter-diagonal blocks are:
$\pm \operatorname{Im}(\sigma_j) (A + \operatorname{Re}(\sigma_j)E)^{-1} E = \pm \frac{\operatorname{Im}(\sigma_j)}{\operatorname{Re}(\sigma_j)} (A + \operatorname{Re}(\sigma_j)E)^{-1} (A + \operatorname{Re}(\sigma_j)E - A) = \pm \frac{\operatorname{Im}(\sigma_j)}{\operatorname{Re}(\sigma_j)} \left(I_n - (A + \operatorname{Re}(\sigma_j)E)^{-1}A\right).$
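The structure of the $P_3$-preconditioned matrix can be verified on a toy problem. The NumPy sketch below uses assumed data (tridiagonal A, E = I, one shift with nonzero real part) and a hypothetical helper `apply_p3` that applies $P_3$ through real solves with $A_{j,r}$ only. A dense solve stands in for the ILU-based approximate solve that a practical preconditioner would use.

```python
import numpy as np

n = 4
# Assumed test data, chosen only for illustration.
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
E = np.eye(n)
sigma = -0.5 + 2.0j
s_re, s_im = sigma.real, sigma.imag

A_r = A + s_re * E                               # A_{j,r}
J = np.array([[0.0, -s_im], [s_im, 0.0]])
K = np.kron(np.eye(2), A_r) + np.kron(J, E)      # real 2n x 2n system matrix

def apply_p3(r):
    """Apply P3 = I_2 (x) A_r^{-1}: real solves with A_r for both halves."""
    R = r.reshape(n, 2, order="F")
    return np.linalg.solve(A_r, R).reshape(-1, order="F")

# P3 K = I_{2n} + J (x) A_r^{-1} E, assembled column by column.
P3K = np.column_stack([apply_p3(K[:, k]) for k in range(2 * n)])
expected = np.eye(2 * n) + np.kron(J, np.linalg.solve(A_r, E))

# Counter-diagonal block: Im(s) A_r^{-1} E = Im(s)/Re(s) (I_n - A_r^{-1} A).
lower_left = P3K[n:, :n]
identity_form = s_im / s_re * (np.eye(n) - np.linalg.solve(A_r, A))
print(np.allclose(P3K, expected), np.allclose(lower_left, identity_form))
```

The diagonal blocks of the preconditioned matrix are exact identities here, so only the two counter-diagonal blocks carry the deviation from $I_{2n}$.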
Experiments
FEniCS Convection-Diffusion on Unit Square¹, n = 9801
P3-GMRES iterations for SLS (tol = $10^{-8}$, maxiter = 250)
Re(σ)           −3.71·10^2   −3.71·10^2   −3.71·10^2   −3.71·10^-6   −3.71·10^4
Im(σ)            2.39·10^2    2.39·10^-2   2.39·10^-6   2.39·10^2     2.39·10^2
no prec.         174          136          136          250           11
block ILU(0)     54           41           41           130           5
block ILU(1)     35           26           26           85            4
block ILU(2)     26           19           19           65            4
block ILU(3)     21           14           14           54            4
block \          15           2            1            38            4
full ILU(0)      52           41           41           125           5
¹ Model data by Maximilian Behr
Conclusions & Outlook
Conclusions
analysis of the problem structure gives rise to 2 promising preconditioner candidates
classic results suggesting block diagonal preconditioning confirmed for the structures investigated
preconditioner only requires one $n \times n$ ILU with a real shift (enables reuse of prior results)
Outlook
proper HPC implementation
comparison with existing strategies
Thank you for your attention!