TRANSCRIPT
The Conjugate Gradient Method
Jason E. Hicken
Aerospace Design Lab
Department of Aeronautics & Astronautics
Stanford University
14 July 2011
Lecture Objectives
• describe when CG can be used to solve Ax = b
• relate CG to the method of conjugate directions
• describe what CG does geometrically
• explain each line in the CG algorithm
We are interested in solving the linear system

Ax = b

where x, b ∈ ℝⁿ and A ∈ ℝⁿˣⁿ, and the matrix A is symmetric positive-definite (SPD):

Aᵀ = A (symmetric)
xᵀAx > 0 for all x ≠ 0 (positive-definite)

Such systems arise in, e.g.,
• discretization of elliptic PDEs
• optimization of quadratic functionals
• nonlinear optimization problems
When A is SPD, solving the linear system is the same as minimizing the quadratic form

f(x) = (1/2) xᵀAx − bᵀx.

Why? If x⋆ is the minimizing point, then

∇f(x⋆) = Ax⋆ − b = 0

and, for x ≠ x⋆,

f(x) − f(x⋆) > 0. (homework)
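This equivalence is easy to spot-check numerically. A minimal pure-Python sketch, using the 2×2 SPD matrix from the model problem that appears later in the lecture (the helper functions `matvec` and `dot` are ours, not part of the slides):

```python
# Spot-check: for SPD A, the gradient of f vanishes at the solution of Ax = b,
# and f is strictly larger at any other point.

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def f(A, b, x):
    # quadratic form f(x) = (1/2) x^T A x - b^T x
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x)

A = [[5.0, -3.0], [-3.0, 5.0]]   # SPD model matrix used later in the lecture
b = [4.0, 4.0]
x_star = [2.0, 2.0]              # exact solution of A x = b

grad = [axi - bi for axi, bi in zip(matvec(A, x_star), b)]   # grad f = A x - b
print(grad)   # [0.0, 0.0]

# f(x) - f(x_star) > 0 for x != x_star (spot-check a few points)
for x in ([0.0, 0.0], [1.0, 3.0], [2.5, 1.5]):
    assert f(A, b, x) > f(A, b, x_star)
```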
Definitions
Let xᵢ be the approximate solution to Ax = b at iteration i.

error: eᵢ ≡ xᵢ − x
residual: rᵢ ≡ b − Axᵢ

The following identities for the residual will be useful later.

rᵢ = −Aeᵢ
rᵢ = −∇f(xᵢ)
Model problem

[  5  −3 ] (x₁)   (4)
[ −3   5 ] (x₂) = (4)

[Figure: contours of the quadratic form, with the lines 5x₁ − 3x₂ = 4 and −3x₁ + 5x₂ = 4 intersecting at the solution x = (2, 2)ᵀ]
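For reference, a 2×2 system this small can be solved directly; a quick Cramer's-rule sketch confirming the solution x = (2, 2)ᵀ shown in the figure:

```python
# Solve the 2x2 model problem [5 -3; -3 5] x = (4, 4)^T by Cramer's rule.
A = [[5.0, -3.0], [-3.0, 5.0]]
b = [4.0, 4.0]

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]      # 25 - 9 = 16
x1 = (b[0] * A[1][1] - A[0][1] * b[1]) / det     # (20 + 12) / 16 = 2
x2 = (A[0][0] * b[1] - b[0] * A[1][0]) / det     # (20 + 12) / 16 = 2
print([x1, x2])   # [2.0, 2.0]
```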
Review: Steepest Descent Method
Qualitatively, how will steepest descent proceed on our model problem, starting at x₀ = (1/3, 1)ᵀ?

[Figure: contours of the quadratic form, with the zig-zagging steepest-descent iterates starting from x₀]
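The zig-zag behaviour can be reproduced with a short steepest-descent sketch (pure Python; the tolerance and 1000-iteration cap are arbitrary choices of ours): even for this 2×2 system, many more than two iterations are needed.

```python
# Steepest descent with exact line search on the model problem,
# starting from x0 = (1/3, 1)^T as on the slide.

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

A = [[5.0, -3.0], [-3.0, 5.0]]
b = [4.0, 4.0]
x = [1.0 / 3.0, 1.0]

iters = 0
r = [bi - axi for bi, axi in zip(b, matvec(A, x))]   # r = -grad f(x)
while dot(r, r) > 1e-20 and iters < 1000:
    alpha = dot(r, r) / dot(r, matvec(A, r))         # exact line search
    x = [xi + alpha * ri for xi, ri in zip(x, r)]
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    iters += 1

print(iters, x)   # converges to ~(2, 2), but in far more than 2 iterations
```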
How can we eliminate this zig-zag behaviour?

To find the answer, we begin by considering the easier problem

[ 2  0 ] (x₁)   (4√2)
[ 0  8 ] (x₂) = ( 0 ),

f(x) = x₁² + 4x₂² − 4√2 x₁.

Here, the equations are decoupled, so we can minimize in each direction independently. What do the contours of the corresponding quadratic form look like?
Simplified problem

[ 2  0 ] (x₁)   (4√2)
[ 0  8 ] (x₂) = ( 0 )

[Figure: elliptical contours aligned with the coordinate axes; the error e₀ is decomposed along the two axes and eliminated one axis at a time]
Method of Orthogonal Directions

Idea: Express the error as a sum of n orthogonal search directions

e ≡ x₀ − x = ∑_{i=0}^{n-1} αᵢdᵢ.

At iteration i + 1, eliminate component αᵢdᵢ.
• never need to search along dᵢ again
• converge in n iterations!

How would we apply the method of orthogonal directions to a non-diagonal matrix?
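On the decoupled problem the idea is immediate; a sketch (the starting point (0.5, 1)ᵀ is an arbitrary choice of ours), converging in exactly n = 2 line minimizations along the coordinate axes:

```python
import math

# Method of orthogonal directions on the diagonal problem
# [2 0; 0 8] x = (4*sqrt(2), 0)^T, searching along the coordinate axes.

A = [[2.0, 0.0], [0.0, 8.0]]
b = [4.0 * math.sqrt(2.0), 0.0]
x = [0.5, 1.0]                  # arbitrary starting point

for i in range(2):
    # exact line minimization along axis i: alpha = (b - A x)_i / A_ii
    r_i = b[i] - sum(A[i][j] * x[j] for j in range(2))
    x[i] += r_i / A[i][i]       # eliminates the i-th error component exactly

print(x)   # [2*sqrt(2), 0.0]: converged in n = 2 iterations
```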
Review of Inner Products
The search directions in the method of orthogonal directions are orthogonal with respect to the dot product.

The dot product is an example of an inner product.

Inner Product
For x, y, z ∈ ℝⁿ and α ∈ ℝ, an inner product (·,·) : ℝⁿ × ℝⁿ → ℝ satisfies
• symmetry: (x, y) = (y, x)
• linearity: (αx + y, z) = α(x, z) + (y, z)
• positive-definiteness: (x, x) > 0 ⇔ x ≠ 0
Fact: (x, y)_A ≡ xᵀAy is an inner product.

A-orthogonality (conjugacy)
We say two vectors x, y ∈ ℝⁿ are A-orthogonal, or conjugate, if

(x, y)_A = xᵀAy = 0.

What happens if we use A-orthogonality rather than standard orthogonality in the method of orthogonal directions?
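A quick numerical illustration (the pair below is a hypothetical conjugate pair we constructed for the model matrix): two vectors can be A-orthogonal without being orthogonal in the usual dot-product sense.

```python
# Check A-orthogonality (conjugacy) for the model matrix.

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def a_inner(A, x, y):
    # the inner product (x, y)_A = x^T A y
    return dot(x, matvec(A, y))

A = [[5.0, -3.0], [-3.0, 5.0]]
x = [1.0, 0.0]
y = [3.0, 5.0]            # chosen so that x^T A y = 5*3 - 3*5 = 0

print(a_inner(A, x, y))   # 0.0: x and y are conjugate
print(dot(x, y))          # 3.0: but not orthogonal in the usual sense
```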
Let {p₀, p₁, . . . , pₙ₋₁} be a set of n linearly independent vectors that are A-orthogonal. If pᵢ is the i-th column of P, then

PᵀAP = Σ

where Σ is a diagonal matrix.

Substitute x = Py into the quadratic form:

f(Py) = (1/2) yᵀΣy − (Pᵀb)ᵀy.

We can apply the method of orthogonal directions in y-space.
New Problem: how do we get the set {pᵢ} of conjugate vectors?

Gram-Schmidt Conjugation
Let {d₀, d₁, . . . , dₙ₋₁} be a set of linearly independent vectors, e.g., the coordinate axes.
• set p₀ = d₀
• for i > 0,

pᵢ = dᵢ − ∑_{j=0}^{i-1} βᵢⱼpⱼ

where βᵢⱼ = (dᵢ, pⱼ)_A / (pⱼ, pⱼ)_A.
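The procedure above transcribes directly into code; a minimal pure-Python sketch, conjugating the coordinate axes with respect to the model matrix:

```python
# Gram-Schmidt conjugation of the coordinate axes with respect to
# the model matrix A, producing A-orthogonal directions p0, p1.

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def a_inner(A, x, y):
    return dot(x, matvec(A, y))

A = [[5.0, -3.0], [-3.0, 5.0]]
d = [[1.0, 0.0], [0.0, 1.0]]    # linearly independent seed vectors

p = [list(d[0])]                # p0 = d0
for i in range(1, len(d)):
    pi = list(d[i])
    for j in range(i):
        # beta_ij = (d_i, p_j)_A / (p_j, p_j)_A
        beta = a_inner(A, d[i], p[j]) / a_inner(A, p[j], p[j])
        pi = [pik - beta * pjk for pik, pjk in zip(pi, p[j])]
    p.append(pi)

print(p)                        # [[1.0, 0.0], [0.6, 1.0]]
print(a_inner(A, p[0], p[1]))   # 0.0: the directions are conjugate
```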
The Method of Conjugate Directions
Force the error at iteration i + 1 to be conjugate to the search direction pᵢ:

pᵢᵀAeᵢ₊₁ = pᵢᵀA(eᵢ + αᵢpᵢ) = 0

⇒ αᵢ = −(pᵢᵀAeᵢ)/(pᵢᵀApᵢ) = (pᵢᵀrᵢ)/(pᵢᵀApᵢ)

• never need to search along pᵢ again
• converge in n iterations!
The Method of Conjugate Directions

[  5  −3 ] (x₁)   (4)
[ −3   5 ] (x₂) = (4)

[Figure sequence: the coordinate axes d₀, d₁ are conjugated into the A-orthogonal directions p₀, p₁; starting from x₀, the iterates reach the solution x = (2, 2)ᵀ in two steps]
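The two-step convergence in the figure can be reproduced with a short sketch, using the conjugate pair p₀ = (1, 0)ᵀ, p₁ = (3/5, 1)ᵀ obtained by Gram-Schmidt conjugation of the coordinate axes, and the same starting point x₀ = (1/3, 1)ᵀ used earlier:

```python
# Method of conjugate directions on the model problem; with n = 2
# A-orthogonal directions it reaches the exact solution in 2 steps.

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

A = [[5.0, -3.0], [-3.0, 5.0]]
b = [4.0, 4.0]
p = [[1.0, 0.0], [0.6, 1.0]]    # A-orthogonal (conjugate) search directions

x = [1.0 / 3.0, 1.0]
for pi in p:
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]   # current residual
    alpha = dot(pi, r) / dot(pi, matvec(A, pi))          # alpha_i = p_i^T r_i / p_i^T A p_i
    x = [xi + alpha * pik for xi, pik in zip(x, pi)]

print(x)   # ~ [2.0, 2.0] after exactly two iterations
```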
The Method of Conjugate Directions is well defined, and avoids the “zig-zagging” of Steepest Descent.

What about computational expense?
• If we choose the dᵢ in Gram-Schmidt conjugation to be the coordinate axes, the Method of Conjugate Directions is equivalent to Gaussian elimination.
• Keeping all the pᵢ is the same as storing a dense matrix!

Can we find a smarter choice for dᵢ?
Error Decomposition Using pᵢ

[Figure: the initial error e₀ decomposed along the conjugate directions as α₀p₀ + α₁p₁]
The error at iteration i can be expressed as

eᵢ = ∑_{k=i}^{n-1} αₖpₖ,

so the error must be conjugate to pⱼ for j < i:

pⱼᵀAeᵢ = 0 ⇒ pⱼᵀrᵢ = 0,

but from Gram-Schmidt conjugation we have

pⱼ = dⱼ − ∑_{k=0}^{j-1} βⱼₖpₖ.

Therefore

pⱼᵀrᵢ = dⱼᵀrᵢ − ∑_{k=0}^{j-1} βⱼₖpₖᵀrᵢ
0 = dⱼᵀrᵢ, j < i.
Thus, the residual at iteration i is orthogonal to the vectors dⱼ used in the previous iterations:

dⱼᵀrᵢ = 0, j < i

Idea: what happens if we choose dᵢ = rᵢ?
• residuals become mutually orthogonal
• rᵢ is orthogonal to pⱼ, for j < i ⋆
• rᵢ₊₁ becomes conjugate to pⱼ, for j < i

This last point is not immediately obvious, so we will prove it. This result has significant implications for Gram-Schmidt conjugation.

⋆ we showed this is true for any choice of dᵢ
The solution is updated according to

xⱼ₊₁ = xⱼ + αⱼpⱼ
⇒ rⱼ₊₁ = rⱼ − αⱼApⱼ
⇒ Apⱼ = (1/αⱼ)(rⱼ − rⱼ₊₁).

Next, take the dot product of both sides with an arbitrary residual rᵢ:

rᵢᵀApⱼ =  (rᵢᵀrᵢ)/αᵢ,     i = j
        = −(rᵢᵀrᵢ)/αᵢ₋₁,  i = j + 1
        = 0,               otherwise.
We can show that the first case (i = j) contains no new information (homework). Divide the remaining cases by pⱼᵀApⱼ and insert the definition of αᵢ₋₁:

(rᵢᵀApⱼ)/(pⱼᵀApⱼ) = βᵢⱼ = −(rᵢᵀrᵢ)/(rᵢ₋₁ᵀrᵢ₋₁),  i = j + 1
                        = 0,                      otherwise.

We recognize the L.H.S. as the coefficients in Gram-Schmidt conjugation
• only one coefficient is nonzero!
The Conjugate Gradient Method
Set p₀ = r₀ = b − Ax₀ and i = 0

αᵢ = (pᵢᵀrᵢ)/(pᵢᵀApᵢ)            (step length)
xᵢ₊₁ = xᵢ + αᵢpᵢ                 (sol. update)
rᵢ₊₁ = rᵢ − αᵢApᵢ                (resid. update)
βᵢ₊₁,ᵢ = −(rᵢ₊₁ᵀrᵢ₊₁)/(rᵢᵀrᵢ)    (G.S. coeff.)
pᵢ₊₁ = rᵢ₊₁ − βᵢ₊₁,ᵢ pᵢ          (Gram-Schmidt)
i := i + 1
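The algorithm transcribes line by line into code; a pure-Python sketch (the tolerance and iteration cap are our additions), keeping the slide's sign convention in which βᵢ₊₁,ᵢ carries a minus sign and is then subtracted in the direction update:

```python
# Conjugate Gradient, line by line from the slide, for SPD A.

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def conjugate_gradient(A, b, x0, tol=1e-12, max_iter=100):
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]   # r0 = b - A x0
    p = list(r)                                          # p0 = r0
    for _ in range(max_iter):
        rr = dot(r, r)
        if rr <= tol * tol:
            break
        Ap = matvec(A, p)
        alpha = dot(p, r) / dot(p, Ap)                   # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]    # sol. update
        r = [ri - alpha * api for ri, api in zip(r, Ap)] # resid. update
        beta = -dot(r, r) / rr                           # G.S. coeff. (note the minus sign)
        p = [ri - beta * pi for ri, pi in zip(r, p)]     # Gram-Schmidt update
    return x

A = [[5.0, -3.0], [-3.0, 5.0]]
b = [4.0, 4.0]
print(conjugate_gradient(A, b, [1.0 / 3.0, 1.0]))   # ~ [2.0, 2.0]
```

On this 2×2 SPD system, CG terminates in two iterations, matching the n-step convergence of the method of conjugate directions.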
The Conjugate Gradient Method

[  5  −3 ] (x₁)   (4)
[ −3   5 ] (x₂) = (4)

[Figure sequence: starting from x₀, the CG iterates reach the solution x = (2, 2)ᵀ in two steps]
Lecture Objectives
• describe when CG can be used to solve Ax = b
  A must be symmetric positive-definite
• relate CG to the method of conjugate directions
  CG is a method of conjugate directions with the choice dᵢ = rᵢ, which simplifies Gram-Schmidt conjugation
• describe what CG does geometrically
  Performs the method of orthogonal directions in a transformed space where the contours of the quadratic form are aligned with the coordinate axes
• explain each line in the CG algorithm
References
• Saad, Y., “Iterative Methods for Sparse Linear Systems”, second edition
• Shewchuk, J. R., “An Introduction to the Conjugate Gradient Method Without the Agonizing Pain”