
A simplified presentation of Simplified GMRES

Martin H. Gutknecht

Seminar for Applied Mathematics, ETH Zurich

First International Conference on Numerical Algebra and Scientific Computing

(NASC06)

Beijing, Oct. 22–25, 2006


Krylov subspaces and Krylov space solvers

DEFINITION. Given a nonsingular $A \in \mathbb{C}^{N \times N}$ and $y \neq o \in \mathbb{C}^N$, the $n$th Krylov (sub)space $\mathcal{K}_n(A, y)$ generated by $A$ from $y$ is

$$\mathcal{K}_n :\equiv \mathcal{K}_n(A, y) :\equiv \operatorname{span}(y, Ay, \dots, A^{n-1}y). \tag{1}$$

DEFINITION. A (standard) Krylov space method for solving a linear system $Ax = b$ or, briefly, a (standard) Krylov space solver is an iterative method starting from some initial approximation $x_0$ and the corresponding residual $r_0 :\equiv b - Ax_0$ and generating, for all or at least most $n$, iterates $x_n$ such that

$$x_n - x_0 = q_{n-1}(A)\, r_0 \in \mathcal{K}_n(A, r_0) \tag{2}$$

with a polynomial $q_{n-1}$ of exact degree $n - 1$.


Krylov space solvers (cont’d)

The residuals of a Krylov space solver satisfy

$$r_n = p_n(A)\, r_0 \in r_0 + A\mathcal{K}_n(A, r_0) \subseteq \mathcal{K}_{n+1}(A, r_0), \tag{3}$$

where $p_n$ is a polynomial of degree $n$, which is related to the polynomial $q_{n-1}$ of (2) by

$$p_n(\zeta) = 1 - \zeta\, q_{n-1}(\zeta). \tag{4}$$

In particular,

$$p_n(0) = 1. \tag{5}$$

DEFINITION. $p_n \in \mathcal{P}_n$ is the $n$th residual polynomial. Condition (5) is its consistency condition.
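As a small numerical check of (2) to (5), the following sketch picks an arbitrary polynomial $q_2$ and confirms that the residual of $x_3 = x_0 + q_2(A) r_0$ equals $p_3(A) r_0$ with $p_3(\zeta) = 1 - \zeta q_2(\zeta)$. The test matrix, the right-hand side and the coefficients of $q_2$ are made up for illustration and are not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
A = rng.standard_normal((N, N)) + N * np.eye(N)  # illustrative test matrix
b = rng.standard_normal(N)
x0 = np.zeros(N)
r0 = b - A @ x0

# Hypothetical coefficients of q_2(zeta) = c[0] + c[1]*zeta + c[2]*zeta**2
c = np.array([0.3, -0.05, 0.01])

# q_2(A) r0, evaluated term by term
qA_r0 = sum(ck * (np.linalg.matrix_power(A, k) @ r0) for k, ck in enumerate(c))

x3 = x0 + qA_r0          # an iterate of the form (2) with n = 3
r3 = b - A @ x3          # its true residual
pA_r0 = r0 - A @ qA_r0   # p_3(A) r0 with p_3(zeta) = 1 - zeta*q_2(zeta), see (4)

# Consistency condition (5): p_3(0) = 1 - 0*q_2(0) = 1
p3_at_zero = 1.0 - 0.0 * c[0]
```

The identity holds for any choice of coefficients, since $b - A(x_0 + q(A) r_0) = r_0 - A q(A) r_0$.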

REMARK. For some Krylov space solvers (e.g., BICG) there may exist exceptional situations, where for some $n$ the iterate $x_n$ and the residual $r_n$ are not defined.

There are also nonstandard Krylov space methods where the search space for $x_n - x_0$ is still a Krylov space, but one that differs from $\mathcal{K}_n(A, r_0)$.

REMARK. With respect to the “influence on the development and practice of science and engineering in the 20th century”, Krylov space methods are considered one of the ten most important classes of numerical methods.

Minimum residual methods

DEFINITION. A minimum residual method (in the general sense) for solving $Ax = b$ is one where in some norm (given by a Hermitian positive definite (Hpd) matrix $C$)

$$\|r_n\|_C :\equiv \|b - Ax_n\|_C = \min_{x \in x_0 + \mathcal{K}_n} \|b - Ax\|_C. \tag{6}$$

Examples:

method/algorithm                          requirement    C
conjugate gradients (CG)                  A Hpd          $A^{-1}$
conjugate residuals (CR)                  A Hermitian    $I$
generalized conjugate residuals (GCR)     -              $I$
generalized minimal residuals (GMRES)     -              $I$

DEFINITION. A minimum residual method (in the true sense) is one where additionally $C = I$.


The Arnoldi algorithm (based on CGS)

Algorithm. Let a nonsingular matrix $A$ and a nonzero vector $y$ be given. For constructing a nested set of orthonormal bases $\{y_0, y_1, \dots, y_m\}$ for the nested Krylov subspaces $\mathcal{K}_{m+1}(A, y)$ ($m = 0, \dots, \nu(y, A) - 1$) we let $\eta_0 := \|y\|$, $y_0 := y/\eta_0$ and compute, for $n = 0, 1, \dots, m - 1$,

$$\widetilde{y} := Ay_n - y_n \eta_{n,n} - \cdots - y_0 \eta_{0,n},$$

$$y_{n+1} := \widetilde{y}/\eta_{n+1,n},$$

where the coefficients $\eta_{0,n}, \eta_{1,n}, \dots, \eta_{n,n}$ are chosen to make $\widetilde{y}$ orthogonal to $y_0, y_1, \dots, y_n$, while $\eta_{n+1,n}$ is used to normalize $\widetilde{y}$:

$$\eta_{k,n} :\equiv \langle y_k, Ay_n \rangle \quad (k = 0, \dots, n), \qquad \eta_{n+1,n} :\equiv \|\widetilde{y}\|. \tag{8}$$

When $n = m - 1 = \nu - 1$, then $\widetilde{y} = o$ and the process terminates.

The Arnoldi algorithm (based on MGS)

Instead of classical Gram-Schmidt (CGS) we should apply modified Gram-Schmidt (MGS):

Algorithm. Let a nonsingular matrix $A$ and a nonzero vector $y$ be given. For constructing a nested set of orthonormal bases $\{y_0, y_1, \dots, y_m\}$ for the nested Krylov subspaces $\mathcal{K}_{m+1}(A, y)$ ($m = 1, \dots, \nu(y, A) - 1$) we let $\eta_0 := \|y\|$, $y_0 := y/\eta_0$ and compute, for $n = 0, 1, \dots, m - 1$,

$$\widetilde{y} := Ay_n,$$

$$\widetilde{y} := \widetilde{y} - y_k \eta_{k,n}, \qquad \eta_{k,n} :\equiv \langle y_k, \widetilde{y} \rangle \quad (k = n, n - 1, \dots, 0),$$

$$y_{n+1} := \widetilde{y}/\eta_{n+1,n}, \qquad \eta_{n+1,n} :\equiv \|\widetilde{y}\|.$$

When $n = m - 1 = \nu - 1$, then $\widetilde{y} = o$ and the process terminates.
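The MGS variant above can be sketched in NumPy as follows. This is a minimal illustration, not code from the talk: the function name `arnoldi_mgs`, the breakdown tolerance `tol` and the return convention are assumptions made here.

```python
import numpy as np

def arnoldi_mgs(A, y, m, tol=1e-12):
    """Arnoldi process with modified Gram-Schmidt (illustrative sketch).

    Returns (Y, H) where Y has m+1 orthonormal columns y_0, ..., y_m and
    H is the (m+1) x m extended Hessenberg matrix with A @ Y[:, :m] == Y @ H.
    If the process terminates early at step n, the square pair
    (Y[:, :n+1], H[:n+1, :n+1]) with A @ Y == Y @ H is returned instead.
    """
    N = y.shape[0]
    Y = np.zeros((N, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    Y[:, 0] = y / np.linalg.norm(y)            # y_0 := y / eta_0
    for n in range(m):
        w = A @ Y[:, n]                        # temporary vector, starts as A y_n
        for k in range(n, -1, -1):             # k = n, n-1, ..., 0 as on the slide
            H[k, n] = np.vdot(Y[:, k], w)      # eta_{k,n} = <y_k, w> (vdot conjugates y_k)
            w = w - Y[:, k] * H[k, n]
        H[n + 1, n] = np.linalg.norm(w)        # eta_{n+1,n}
        if abs(H[n + 1, n]) <= tol:            # w = o: the Krylov space is exhausted
            return Y[:, : n + 1], H[: n + 1, : n + 1]
        Y[:, n + 1] = w / H[n + 1, n]
    return Y, H
```

The only difference from CGS is that each coefficient is computed from the partially orthogonalized vector `w` rather than from $Ay_n$ itself.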

The Arnoldi relation

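We can cast the recursions into matrix identities. After $m - 1$ steps we define the $N \times m$ matrix

$$Y_m :\equiv \begin{pmatrix} y_0 & y_1 & \cdots & y_{m-1} \end{pmatrix}$$

and an extended Hessenberg matrix of size $(m + 1) \times m$:

$$H_m :\equiv \begin{pmatrix}
\eta_{0,0} & \eta_{0,1} & \cdots & \eta_{0,m-1} \\
\eta_{1,0} & \eta_{1,1} & \cdots & \eta_{1,m-1} \\
 & \eta_{2,1} & \ddots & \vdots \\
 & & \ddots & \eta_{m-1,m-1} \\
 & & & \eta_{m,m-1}
\end{pmatrix}.$$

Then

$$AY_m = Y_{m+1} H_m. \tag{10}$$

This identity is often referred to as the Arnoldi relation.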

Transforming Ax = b to coordinate space

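Since $x_n - x_0 \in \mathcal{K}_n(A, r_0)$, there exists a coordinate vector $k_n \in \mathbb{C}^n$ such that

$$x_n = x_0 + Y_n k_n, \qquad r_n = r_0 - AY_n k_n. \tag{11}$$

In view of $y_0 = r_0/\rho_0$ with $\rho_0 :\equiv \|r_0\|$, we find, by inserting (10) and using

$$r_0 = y_0 \rho_0 = Y_{n+1} e_1 \rho_0, \tag{12}$$

with $e_1 :\equiv \begin{pmatrix} 1 & 0 & 0 & \cdots \end{pmatrix}^T \in \mathbb{R}^{n+1}$, that

$$r_n = Y_{n+1} (e_1 \rho_0 - H_n k_n). \tag{13}$$

If $Y_{n+1}$ has orthonormal columns, minimizing $\|r_n\|$ is equivalent to minimizing the norm of its coordinate vector, the quasi-residual

$$q_n :\equiv e_1 \rho_0 - H_n k_n. \tag{14}$$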


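The problem

$$\|r_n\|^2 = \|q_n\|^2 = \|e_1 \rho_0 - H_n k_n\|^2 = \min! \tag{15}$$

is an $(n + 1) \times n$ least-squares problem with matrix $H_n$.

It can be solved by a (full) QR decomposition $H_n = Q_{n+1} \underline{R}_n^{QR}$, where $Q_{n+1}$ is unitary of order $n + 1$ and $\underline{R}_n^{QR}$ is $(n + 1) \times n$ upper triangular, that is, an $n \times n$ upper triangular block $R_n^{QR}$ on top of a zero row. Let

$$\underline{h}_n :\equiv \begin{pmatrix} h_n \\ \eta_n \end{pmatrix} :\equiv Q_{n+1}^\star e_1 \rho_0. \tag{16}$$

Since

$$\|e_1 \rho_0 - H_n k_n\|^2 = \|Q_{n+1}^\star e_1 \rho_0 - \underline{R}_n^{QR} k_n\|^2 = \|\underline{h}_n - \underline{R}_n^{QR} k_n\|^2 = \|h_n - R_n^{QR} k_n\|^2 + |\eta_n|^2,$$

we see that the solution is

$$k_n = (R_n^{QR})^{-1} h_n \quad\text{with}\quad \|e_1 \rho_0 - H_n k_n\|^2 = |\eta_n|^2. \tag{17}$$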
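The steps (11) to (17) can be sketched as a small, unrestarted GMRES routine. This is only an illustration under simplifying assumptions made here: real arithmetic, no breakdown handling, and a full QR decomposition computed from scratch with `np.linalg.qr` instead of being built up incrementally. The function name `gmres_sketch` is hypothetical.

```python
import numpy as np

def gmres_sketch(A, b, x0, n):
    """n steps of GMRES via Arnoldi and a full QR of H_n (illustrative sketch).

    Returns the iterate x_n and |eta_n|, which equals ||r_n|| by (17).
    """
    r0 = b - A @ x0
    rho0 = np.linalg.norm(r0)
    N = b.shape[0]
    Y = np.zeros((N, n + 1))
    H = np.zeros((n + 1, n))               # extended Hessenberg matrix H_n
    Y[:, 0] = r0 / rho0
    for j in range(n):                     # Arnoldi with modified Gram-Schmidt
        w = A @ Y[:, j]
        for k in range(j + 1):
            H[k, j] = Y[:, k] @ w
            w = w - Y[:, k] * H[k, j]
        H[j + 1, j] = np.linalg.norm(w)
        Y[:, j + 1] = w / H[j + 1, j]      # assumes no breakdown
    Q, R = np.linalg.qr(H, mode="complete")  # Q: (n+1)x(n+1), R: (n+1)xn
    rhs = np.zeros(n + 1)
    rhs[0] = rho0                          # e_1 rho_0
    h_ext = Q.T @ rhs                      # (16): first n entries h_n, last entry eta_n
    k = np.linalg.solve(R[:n, :], h_ext[:n])  # (17); R[:n, :] is upper triangular
    x = x0 + Y[:, :n] @ k                  # (11)
    return x, abs(h_ext[n])
```

A generic `np.linalg.solve` is used for the triangular system to keep the sketch dependency-free; a dedicated triangular solver would be the more economical choice.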


GMRES (cont’d)

Determine $Q_{n+1}$ only in its factored form, as a product of $n$ Givens rotations chosen to annihilate the subdiagonal elements of $H_n$.

In each step the GMRES algorithm of Saad and Schultz (1986)

extends the Krylov space: $\mathcal{K}_n(A, r_0) \to \mathcal{K}_{n+1}(A, r_0)$,

extends $H_n$ and updates its QR decomposition: $H_n = Q_{n+1} \underline{R}_n^{QR} \to H_{n+1} = Q_{n+2} \underline{R}_{n+1}^{QR}$,

updates $k_n$ and $|\eta_n|^2 = \|r_n\|^2$ (but not $x_n$ and $r_n$).

At convergence (when $|\eta_n| \le \mathrm{tol}$): compute $x_n$ and $r_n$, and check $\|r_n\|$.

The memory requirement grows linearly with $n$, so restarts are needed.


Is it possible to simplify GMRES?

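For sure, we can determine $r_n$ much more easily! Recall:

$$\|r_n\| = \min_{r \in r_0 + A\mathcal{K}_n} \|r\|.$$

Figure (sketch showing $-r_n$, $r_0 - r_n$ and the affine space $r_0 + A\mathcal{K}_n$): $r_n$ is the best approximation of $0$ from $r_0 + A\mathcal{K}_n$. Therefore, $r_0 - r_n$ is the best approximation of $r_0$ from $A\mathcal{K}_n$, and consequently, $r_0 - r_n$ is the orthogonal projection of $r_0$ on $A\mathcal{K}_n$.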

Computing r0 − rn as projection of r0

The orthogonal projection $r_0 - r_n$ of $r_0$ on $A\mathcal{K}_n$ is easiest to compute if we have an orthonormal basis of $A\mathcal{K}_n$.

We can use the Arnoldi process, started with $Ar_0$, to construct such an orthonormal basis $\{y_1, \dots, y_n\}$ of $A\mathcal{K}_n$, $n = 1, 2, \dots$. Then,

$$r_0 - r_n = \sum_{j=1}^{n} y_j y_j^\star r_0.$$

If we introduce the $N \times n$ matrix $Y_n = \begin{pmatrix} y_1 & \dots & y_n \end{pmatrix}$, then

$$r_0 - r_n = Y_n Y_n^\star r_0 = Y_n k_n \quad\text{with}\quad k_n :\equiv Y_n^\star r_0. \tag{18}$$
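The projection (18) is short enough to try out directly. The sketch below builds an orthonormal basis of $A\mathcal{K}_n$ by Arnoldi (MGS) started with $Ar_0$ and then forms $k_n$ and $r_n$. It assumes real arithmetic and no breakdown; the helper name `orthonormal_basis_AK` and the test data are made up for this illustration.

```python
import numpy as np

def orthonormal_basis_AK(A, r0, n):
    """Columns y_1, ..., y_n: orthonormal basis of A K_n(A, r0) (illustrative sketch)."""
    N = r0.shape[0]
    Y = np.zeros((N, n))
    w = A @ r0                                 # Arnoldi is started with A r0
    for j in range(n):
        for i in range(j):                     # modified Gram-Schmidt sweep
            w = w - Y[:, i] * (Y[:, i] @ w)
        Y[:, j] = w / np.linalg.norm(w)        # assumes no breakdown
        w = A @ Y[:, j]                        # next Arnoldi candidate
    return Y

rng = np.random.default_rng(1)
N, n = 8, 4
A = rng.standard_normal((N, N)) + N * np.eye(N)   # illustrative test matrix
b = rng.standard_normal(N)
r0 = b.copy()                                  # corresponds to x0 = 0

Y = orthonormal_basis_AK(A, r0, n)
k = Y.T @ r0                                   # k_n = Y_n^* r0, see (18)
r_n = r0 - Y @ k                               # minimum residual, no least-squares solve
```

Note that no Hessenberg matrix and no QR decomposition is needed to obtain $r_n$; the coefficients of the Arnoldi process are only required later, for recovering $x_n$.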


Computing r0 − rn as projection of r0 (cont’d)

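Unlike in GMRES, $k_n$ is obtained from $k_{n-1}$ by just appending a new component:

$$k_n \equiv: \begin{pmatrix} \kappa_1 \\ \vdots \\ \kappa_n \end{pmatrix} = \begin{pmatrix} k_{n-1} \\ \kappa_n \end{pmatrix}.$$

Incidentally, $k_n$ is also the solution of the normal equations

$$Y_n^\star Y_n k_n = Y_n^\star r_0$$

associated with the least-squares problem $Y_n k_n \approx r_0$.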


How do we find xn?

Finding xn is the nontrivial part of this approach.

Walker and Zhou (1994) came up with a solution; they called this method Simpler GMRES. We may just write SGMRES.

However, their derivation is not very elegant.

Using (block) matrix identities we can give a much shorter and more transparent derivation.

How do we find xn? (cont’d)

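The Arnoldi process is started with

$$y_1 :\equiv \frac{Ar_0}{\|Ar_0\|}, \quad\text{so}\quad Ar_0 = y_1 \rho_{0,0}, \qquad \rho_{0,0} :\equiv \|Ar_0\|.$$

Combining this with the Arnoldi relation $AY_{n-1} = Y_n H_{n-1}$ we get

$$A \begin{pmatrix} r_0 & Y_{n-1} \end{pmatrix} = Y_n R_n, \tag{19}$$

where

$$R_n :\equiv \begin{pmatrix} e_1 \rho_{0,0} & H_{n-1} \end{pmatrix} = \left( \begin{array}{c|c} \rho_{0,0} & \\ 0 & H_{n-1} \\ \vdots & \\ 0 & \end{array} \right) \tag{20}$$

is upper triangular. So, by (19), $A^{-1} Y_n = \begin{pmatrix} r_0 & Y_{n-1} \end{pmatrix} R_n^{-1}$.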

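Again:

$$A^{-1} Y_n = \begin{pmatrix} r_0 & Y_{n-1} \end{pmatrix} R_n^{-1}. \tag{21}$$

On the other hand: by (18), $r_0 - r_n = Y_n k_n$, and we get

$$x_n - x_0 = -(x_0 - x_n) = A^{-1}(r_0 - r_n) = A^{-1} Y_n k_n,$$

so

$$x_n = x_0 + A^{-1} Y_n k_n. \tag{22}$$

Finally, using (21) we obtain

$$x_n = x_0 + \begin{pmatrix} r_0 & Y_{n-1} \end{pmatrix} m_n, \quad\text{where}\quad m_n :\equiv R_n^{-1} k_n. \tag{23}$$

As in GMRES there is no need to update $x_n$, as we will compute it only once $\|r_n\|$ is small enough.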
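Putting (18) to (23) together gives a compact sketch of Simpler GMRES. It is meant as an illustration of the derivation above, not as the implementation of Walker and Zhou: real arithmetic is assumed, breakdowns are not handled, and the function name `simpler_gmres` is hypothetical.

```python
import numpy as np

def simpler_gmres(A, b, x0, n):
    """n steps of Simpler GMRES following (18)-(23) (illustrative sketch).

    Returns the iterate x_n and the residual r_n = r0 - Y_n k_n.
    """
    r0 = b - A @ x0
    N = b.shape[0]
    Y = np.zeros((N, n))            # y_1, ..., y_n: orthonormal basis of A K_n
    R = np.zeros((n, n))            # R_n = (e_1 rho_00, H_{n-1}), upper triangular
    w = A @ r0
    R[0, 0] = np.linalg.norm(w)     # rho_00 = ||A r0||
    Y[:, 0] = w / R[0, 0]
    for j in range(1, n):           # Arnoldi (MGS); column j of R is column j-1 of H
        w = A @ Y[:, j - 1]
        for i in range(j):
            R[i, j] = Y[:, i] @ w
            w = w - Y[:, i] * R[i, j]
        R[j, j] = np.linalg.norm(w)
        Y[:, j] = w / R[j, j]       # assumes no breakdown
    k = Y.T @ r0                    # (18): k_n = Y_n^* r0
    m = np.linalg.solve(R, k)       # (23): m_n = R_n^{-1} k_n
    B = np.column_stack([r0, Y[:, : n - 1]])   # (r0  Y_{n-1})
    x = x0 + B @ m                  # (23)
    r = r0 - Y @ k                  # (18)
    return x, r
```

By construction `A @ B` equals `Y @ R`, which is relation (19), so `b - A @ x` agrees with the returned `r` up to roundoff.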


Walker and Zhou (1994) prefer a variant of this formula for $x_n$, where $r_0$ is substituted using (18) with $n$ replaced by $n - 1$.

To write it in block matrix form we have to separate the first element of $m_n$:

$$m_n \equiv: \begin{pmatrix} \mu_n \\ \widetilde{m}_n \end{pmatrix}. \tag{24}$$

Then $x_n = x_0 + r_0 \mu_n + Y_{n-1} \widetilde{m}_n$, and thus

$$x_n = x_0 + r_{n-1} \mu_n + Y_{n-1} (\widetilde{m}_n + k_{n-1} \mu_n). \tag{25}$$

Updating the residual norm

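Recall:

$$k_n \equiv: \begin{pmatrix} \kappa_1 \\ \vdots \\ \kappa_n \end{pmatrix} = \begin{pmatrix} k_{n-1} \\ \kappa_n \end{pmatrix}.$$

So the orthogonal decomposition of $r_0$ in (18) becomes

$$r_0 = r_n + \sum_{j=1}^{n} y_j \kappa_j \tag{26}$$

and implies

$$\|r_0\|^2 = \|r_n\|^2 + \sum_{j=1}^{n} |\kappa_j|^2. \tag{27}$$

In particular, we have the recursions

$$r_n := r_{n-1} - y_n \kappa_n, \quad\text{and}\quad \|r_n\|^2 := \|r_{n-1}\|^2 - |\kappa_n|^2. \tag{28}$$

These recursions are vulnerable to roundoff, however. Walker and Zhou (1994) again have a better variant.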
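For completeness, the step from (26) to (27) and (28) can be written out as follows. It only uses that $r_n$ is orthogonal to $A\mathcal{K}_n = \operatorname{span}\{y_1, \dots, y_n\}$ and that the $y_j$ are orthonormal.

```latex
% r_n is orthogonal to span{y_1, ..., y_n}; the y_j are orthonormal
\begin{aligned}
\|r_0\|^2
  &= \Bigl\| r_n + \sum_{j=1}^{n} y_j \kappa_j \Bigr\|^2
   = \|r_n\|^2 + \Bigl\| \sum_{j=1}^{n} y_j \kappa_j \Bigr\|^2
   = \|r_n\|^2 + \sum_{j=1}^{n} |\kappa_j|^2 , \\
\|r_n\|^2
  &= \|r_0\|^2 - \sum_{j=1}^{n} |\kappa_j|^2
   = \|r_{n-1}\|^2 - |\kappa_n|^2 .
\end{aligned}
```

The second line also hints at where the roundoff problem comes from: once $\|r_n\|$ is small compared with $\|r_0\|$, the norm update subtracts nearly equal quantities.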


Updating the solution

Walker and Zhou also gave several update formulas for $x_n$ (although they are not needed here).

The simplest is based on introducing direction vectors. Let

$$Z_n :\equiv \begin{pmatrix} z_0 & z_1 & \dots & z_{n-1} \end{pmatrix} :\equiv \begin{pmatrix} r_0 & Y_{n-1} \end{pmatrix} R_n^{-1}. \tag{29}$$

Then (23) yields

$$x_n = x_0 + Z_n k_n = x_{n-1} + z_{n-1} \kappa_n. \tag{30}$$

The direction vectors $z_k$ are obtained by a recursion that can be extracted from $Z_n R_n = \begin{pmatrix} r_0 & Y_{n-1} \end{pmatrix}$. In general, this recursion involves all previous direction vectors and is therefore too costly.


Simpler MINRES

Walker and Zhou (1994) did not mention that, in the same way, we can introduce Simpler MINRES.

As in the transition from GMRES to MINRES, when we apply Simpler GMRES to a real symmetric or Hermitian matrix, we can profit from simplifications that make the algorithm much less costly.

First, the Arnoldi process generating $H_n$ is replaced by the symmetric Lanczos process generating an extended real symmetric tridiagonal matrix $T_n$.


Simpler MINRES (cont’d)

Then, most importantly, $R_n$ of (20) becomes a banded upper triangular matrix of bandwidth 3 only:

$$R_n :\equiv \begin{pmatrix} e_1 \beta_0 & T_{n-1} \end{pmatrix} = \begin{pmatrix}
\beta_0 & \alpha_1 & \beta_1 & & \\
 & \beta_1 & \alpha_2 & \ddots & \\
 & & \beta_2 & \ddots & \beta_{n-2} \\
 & & & \ddots & \alpha_{n-1} \\
 & & & & \beta_{n-1}
\end{pmatrix},$$

where now $\beta_0 :\equiv \|Ar_0\|$.

Now, the relation

$$Z_n R_n = \begin{pmatrix} r_0 & Y_{n-1} \end{pmatrix}$$

can be viewed as a three-term recursion for computing the direction vectors $z_k$ that are needed in (30) for updating $x_n$:

$$z_0 := r_0/\beta_0,$$
$$z_1 := (y_1 - z_0 \alpha_1)/\beta_1,$$
$$z_n := (y_n - z_{n-1} \alpha_n - z_{n-2} \beta_{n-1})/\beta_n, \quad n = 2, 3, \dots.$$

Using these recursions we obtain a Simpler MINRES or SMINRES algorithm that does not require storing the Lanczos basis or any other set of $O(n)$ vectors from $\mathbb{R}^N$ or $\mathbb{C}^N$.

So, it is comparable in cost to CG, CR, and MINRES.

Unlike CG and the standard OMIN version of CR it does not require $A$ to be spd or Hpd.
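A minimal sketch of how these short recursions might be combined into an SMINRES loop is given below. It is a reading of the slides rather than a published algorithm: real symmetric $A$ is assumed, breakdowns ($\beta_j = 0$) are not handled, the coefficient $\kappa_j$ is computed from the current residual (which equals $\langle y_j, r_0 \rangle$ in exact arithmetic), and the function name `simpler_minres` is hypothetical.

```python
import numpy as np

def simpler_minres(A, b, x0, n):
    """n steps of a Simpler MINRES sketch for real symmetric A (illustrative)."""
    x = np.array(x0, dtype=float)
    r = b - A @ x
    w = A @ r
    beta_prev = np.linalg.norm(w)    # beta_0 = ||A r0||
    y = w / beta_prev                # y_1 = A r0 / beta_0
    z = r / beta_prev                # z_0 = r0 / beta_0
    y_old = np.zeros_like(y)         # stands in for the absent y_0
    z_old = np.zeros_like(z)         # stands in for the absent z_{-1}
    for j in range(1, n + 1):
        kappa = y @ r                # kappa_j = <y_j, r_{j-1}>
        x = x + z * kappa            # x_j = x_{j-1} + z_{j-1} kappa_j
        r = r - y * kappa            # r_j = r_{j-1} - y_j kappa_j, see (28)
        # Lanczos step: A y_j = y_{j-1} beta_{j-1} + y_j alpha_j + y_{j+1} beta_j
        w = A @ y
        alpha = y @ w
        w = w - y * alpha - y_old * beta_prev
        beta = np.linalg.norm(w)
        # z_j = (y_j - z_{j-1} alpha_j - z_{j-2} beta_{j-1}) / beta_j
        z_new = (y - z * alpha - z_old * beta_prev) / beta
        y_old, y = y, w / beta
        z_old, z = z, z_new
        beta_prev = beta
    return x, r
```

Only a fixed number of vectors of length $N$ is kept, independent of $n$. The recursions preserve $A z_{j-1} = y_j$, which is why the updates of `x` and `r` stay consistent. The final pass through the loop performs one Lanczos step more than strictly needed; a tuned version would skip it.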

Conclusions: pros and cons

SGMRES and SMINRES are simpler than GMRES and MINRES, respectively: there is no need to QR decompose a Hessenberg or a tridiagonal matrix.

They do not transfer the problem to coordinate space, but just reflect the orthogonal projection / best approximation property of a minimum residual method.

The numerical experiments of Walker and Zhou (1994) indicated that SGMRES is less accurate / stable than GMRES, and Liesen, Rozložník and Strakoš (2002) supported this by an analysis.

But does this matter when we need restarts anyway?

It may matter for SMINRES.

Outlook on block version for systems with MRHS

Block GMRES is complicated, in particular when we want to implement an efficient/aggressive deflation technique for dealing with linearly dependent residuals.

An efficient QR decomposition for the resulting block Hessenberg matrices is also quite complicated; see (Gutknecht and Schmelzer 2006).

Simplified Block GMRES has been introduced by Liu and Zhong (2006), though without deflation.

It has the potential of becoming simpler and more efficient than block GMRES.

Thanks for listening and come to ...

References

M. H. Gutknecht and T. Schmelzer (2006), ‘Updating the QR decomposition of block tridiagonal and block Hessenberg matrices generated by block Krylov space methods’. Manuscript.

J. Liesen, M. Rozložník and Z. Strakoš (2002), ‘Least squares residuals and minimal residual methods’, SIAM J. Sci. Comput. 23(5), 1503–1525.

H.-L. Liu and B.-J. Zhong (2006), ‘Simpler block GMRES for nonsymmetric systems with multiple right-hand sides’. Manuscript.

Y. Saad and M. H. Schultz (1986), ‘GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems’, SIAM J. Sci. Statist. Comput. 7, 856–869.

H. F. Walker and L. Zhou (1994), ‘A simpler GMRES’, Numer. Linear Algebra Appl. 1(6), 571–581.
