Chapter 8: Eigenvalues and Singular Values
TRANSCRIPT
July 20, 2013

Uri M. Ascher and Chen Greif
Department of Computer Science, The University of British Columbia
{ascher,greif}@cs.ubc.ca

Slides for the book A First Course in Numerical Methods (published by SIAM, 2011)
http://www.ec-securehost.com/SIAM/CS07.html
Goals of this chapter

• To find out how eigenvalues and singular values of a given matrix are computed;
• to find out how the largest (and smallest) few eigenvalues and singular values are computed;
• to see some interesting applications of eigenvalues and singular values.
Uri Ascher & Chen Greif (UBC Computer Science) A First Course in Numerical Methods July 20, 2013 1 / 1
Outline
• Algorithms for a few eigenvalues
• Uses of eigenvalues and eigenvectors
• Uses of SVD
• (A taste of) algorithms for all eigenvalues and SVD
Recall eigenvalues and singular values (Ch. 4)
• For a real, square n × n matrix A, an eigenvalue λ and corresponding eigenvector x ≠ 0 satisfy Ax = λx.
• There are n (possibly complex) eigenvalues λ1, λ2, . . . , λn and eigenvectors x1, x2, . . . , xn s.t. Axi = λixi. For Λ = diag(λi),

AX = XΛ.

• If A is non-defective then the eigenvector matrix X is nonsingular:

X−1AX = Λ.

• For a real, m × n matrix A, the singular value decomposition (SVD) is

A = UΣV T

where U (m × m) and V (n × n) are orthogonal matrices and Σ is “diagonal”, consisting of zeros and the singular values σ1 ≥ σ2 ≥ · · · ≥ σr > 0, r ≤ min(m, n).
• Note: the singular values are the square roots of the eigenvalues of AT A.
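The last relation is easy to check numerically. A minimal sketch in NumPy (the slides use Matlab); the 5 × 3 random test matrix is an illustrative choice, not from the slides:

```python
import numpy as np

# Check that the singular values of A equal the square roots of the
# eigenvalues of A^T A. The test matrix is an arbitrary random choice.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

sigma = np.linalg.svd(A, compute_uv=False)        # singular values, descending
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # eigenvalues of A^T A, descending

assert np.allclose(sigma, np.sqrt(lam))
```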
Classes of methods for eigenvalue problem
• In general, one must iterate to find eigenvalues. Nonetheless, methods for finding all eigenvalues of a matrix that is not very large resemble direct solvers in spirit; in particular, they are based on matrix decompositions.
• To find a few eigenvalues there are the basic power and inverse power methods, and their generalization to orthogonal iterations. Large and sparse eigenvalue solvers are based on Lanczos and Arnoldi iterations (not discussed here). In Matlab: eigs
• Methods for finding all eigenvalues are based on orthogonal similarity transformations. In Matlab: eig
• In general, algorithms for finding the SVD are related to those for finding eigenvalues. In Matlab: svd
Power method
• Given v0, expand using the eigenpairs:

v_0 = \sum_{j=1}^{n} \beta_j x_j,

where A x_j = \lambda_j x_j, j = 1, . . . , n.
• Then

A v_0 = \sum_{j=1}^{n} \beta_j A x_j = \sum_{j=1}^{n} (\beta_j \lambda_j) x_j.

• Multiplying by A another k − 1 times gives

A^k v_0 = \sum_{j=1}^{n} \beta_j \lambda_j^{k-1} A x_j = \sum_{j=1}^{n} (\beta_j \lambda_j^{k}) x_j.
Power method (cont.)
Assuming:
• A is non-defective,
• |λ1| > |λj|, j = 2, . . . , n,
• β1 ≠ 0,
we obtain (after normalization at each step)

A^k v_0 → x_1.

Given the eigenvector, obtain the eigenvalue from the Rayleigh quotient:

\lambda_1 = \frac{x_1^T A x_1}{x_1^T x_1}.
Power method algorithm and properties
• Algorithm: for k = 1, 2, . . . until termination:

v = A v_{k-1},
v_k = v / ∥v∥_2,
\lambda_1^{(k)} = v_k^T A v_k.

• Properties:
  • Simple, basic, can be slow.
  • Can be applied to large, sparse or implicit matrices.
  • Limiting assumptions.
  • Used as a building block for other, more robust algorithms.
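The algorithm above can be sketched in a few lines. A hedged example in NumPy (the slides use Matlab); the diagonal test matrix and iteration count are illustrative choices:

```python
import numpy as np

# Sketch of the power method under the stated assumptions
# (non-defective A, a single dominant eigenvalue).
def power_method(A, v0, num_iters=500):
    v = v0 / np.linalg.norm(v0)
    for _ in range(num_iters):
        w = A @ v                     # v = A v_{k-1}
        v = w / np.linalg.norm(w)     # v_k = v / ||v||_2
    lam = v @ A @ v                   # Rayleigh quotient (v has unit norm)
    return lam, v

A = np.diag(np.arange(1.0, 33.0))    # eigenvalues 1..32, dominant lambda_1 = 32
lam, v = power_method(A, np.ones(32))
assert abs(lam - 32.0) < 1e-8
```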
Example: power method
u = [1:32]; v = [1:30, 30, 32]; A = diag(u); B = diag(v);
[Figure: absolute error vs. iteration (semilog scale) for the power method applied to A = diag(u) and B = diag(v).]
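The experiment can be reproduced with a short sketch (NumPy; the iteration count is an illustrative choice). B has a larger eigenvalue gap (|λ2/λ1| = 30/32 vs. 31/32 for A), so it should converge faster:

```python
import numpy as np

# Power iteration error after k steps, for the two matrices on the slide.
def power_error(A, lam_true, k):
    v = np.ones(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(k):
        v = A @ v
        v /= np.linalg.norm(v)
    return abs(v @ A @ v - lam_true)   # Rayleigh-quotient error

A = np.diag(np.arange(1.0, 33.0))                                  # diag(1:32)
B = np.diag(np.concatenate([np.arange(1.0, 31.0), [30.0, 32.0]]))  # diag([1:30,30,32])

errA = power_error(A, 32.0, 100)
errB = power_error(B, 32.0, 100)
assert errB < errA   # better-separated spectrum converges faster
```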
Inverse power method
• The power method is good only for a well-separated dominant eigenvalue. What about other eigenvalues? What about the more general case of looking for a non-dominant eigenpair?
• If we know an α that approximates well some simple eigenvalue λs, then |λs − α| ≪ |λi − α| for all i ≠ s. Hence

|λs − α|^{−1} ≫ |λi − α|^{−1}, for all i ≠ s.

• These are the eigenvalues of (A − αI)^{−1}, which has the same eigenvectors!
• The parameter α is called a shift.
Inverse power method algorithm
• Algorithm: for k = 1, 2, . . . until termination:

Solve (A − αI) v = v_{k-1},
v_k = v / ∥v∥,
\lambda^{(k)} = v_k^T A v_k.

• Properties:
  • Can be applied to find different eigenvalues using different shifts.
  • But one must solve (possibly many) linear systems.
  • If α is fixed then there is one matrix and many right-hand sides: if a direct method can be applied, form the LU decomposition of A − αI once. (See Chapter 5.)
  • Alternatively, can set α = αk and update it as the iteration proceeds using λ(k−1). Now each iteration is more expensive, but the algorithm may converge much faster.
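A hedged sketch of the fixed-shift variant in NumPy. As noted above, a real code would factor A − αI once with LU; for brevity this sketch calls a dense solver each iteration. The shift α = 33 (near λ = 32) is an illustrative choice:

```python
import numpy as np

# Inverse power method with a fixed shift alpha.
def inverse_power(A, alpha, v0, num_iters=50):
    n = A.shape[0]
    M = A - alpha * np.eye(n)
    v = v0 / np.linalg.norm(v0)
    for _ in range(num_iters):
        v = np.linalg.solve(M, v)   # solve (A - alpha I) v = v_{k-1}
        v /= np.linalg.norm(v)      # v_k = v / ||v||
    return v @ A @ v                # Rayleigh quotient

A = np.diag(np.arange(1.0, 33.0))
lam = inverse_power(A, 33.0, np.ones(32))   # converges to the nearest eigenvalue, 32
assert abs(lam - 32.0) < 1e-10
```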
Example: inverse power method
u = [1:32]; A = diag(u);
[Figure: absolute error vs. iteration (semilog scale) for the inverse power method on A = diag(u) with shifts α = 33 and α = 35.]
Uses of eigenvalues and eigenvectors
• The famous PageRank algorithm for enabling quick web searching amounts to finding the dominant eigenvector, corresponding to the largest eigenvalue λ = 1, of a huge matrix A in which relations among web pages are recorded.
• Often, the matrix A represents a discretization of a differential equation (such as those in Chapters 7 and 16). Correspondingly, A may be large and sparse. The eigenvector x in the relation Ax = λx then corresponds to a discretization of an eigenfunction.
• Eigenvalues are used for determining ℓ2 condition numbers and naturally arise in many other analysis-related tasks.
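The PageRank idea can be illustrated on a toy scale: the ranking vector is the dominant eigenvector (λ = 1) of a column-stochastic link matrix, found by the power method. The 3-page link structure below is a made-up example, not from the slides:

```python
import numpy as np

# Toy link matrix: column j holds the outgoing links of page j, each
# weighted equally. Page 0 links to 1 and 2; page 1 to 2; page 2 to 0.
A = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])

v = np.ones(3) / 3.0
for _ in range(200):        # power method; lambda_1 = 1 for a stochastic matrix
    v = A @ v
    v /= v.sum()            # normalize so the entries sum to 1

assert np.allclose(A @ v, v, atol=1e-8)   # fixed point: Av = v, i.e. lambda = 1
```

For this particular graph the ranking vector works out to (0.4, 0.2, 0.4): pages 0 and 2 sit on the main cycle and outrank page 1.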
Singular value decomposition (SVD)
• Recall material from Chapter 4 (where there is also a PCA example):
• For a real, m × n matrix A, the singular value decomposition (SVD) is given by

A = UΣV T

where U (m × m) and V (n × n) are orthogonal matrices and Σ is “diagonal”, consisting of zeros and the singular values σ1 ≥ σ2 ≥ · · · ≥ σr > 0, r ≤ min(m, n).
• The singular values are the square roots of the eigenvalues of AT A.
• If r = n ≤ m then

κ2(A) = σ1/σn = \sqrt{κ2(AT A)}.

• Note that, unlike for eigenvalues, A need not be square, the singular values are real and nonnegative, and the transformation to “diagonal” form is always well-conditioned.
Solving the linear least squares problem using SVD
• Since A = UΣV T,

∥b − Ax∥ = ∥U^T b − Σy∥,  where x = V y.

• If κ(A) = σ1/σn is not too large then one can proceed as with the QR decomposition (Chapter 6).
• If κ(A) = σ1/σn is too large then determine an effective cutoff: going backward from n, find r, 1 ≤ r < n, such that σ1/σr is not too large, and set σr+1 = · · · = σn = 0, obtaining a truncated Σ̂. This changes the problem, from A to Â = UΣ̂V T, regularizing it!
• Precisely the same procedure can also be applied to problems Ax = b where A is square but too ill-conditioned (so that the solution using GEPP may be meaningless).
Regularization
Solve a nearby well-conditioned problem safely:

1. Search r = n, n − 1, . . . until r is found such that σ1/σr is tolerable in size. This is the condition number of the problem that we actually solve:

min{∥x∥ ; x = argmin ∥b − Ax∥}.

2. Let ui be the ith column vector of U; set zi = ui^T b, i = 1, . . . , r.
3. Set yi = σi^{−1} zi, i = 1, 2, . . . , r, and yi = 0, i = r + 1, . . . , n.
4. Let vi be the ith column vector of V; set x = \sum_{i=1}^{r} yi vi.
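The four steps above translate directly into a short sketch (NumPy; the tolerance for σ1/σr is a made-up choice). The rank-deficient test matrix, whose third column is the sum of the first two, is also illustrative:

```python
import numpy as np

# Truncated-SVD least squares, following the four regularization steps.
def tsvd_lstsq(A, b, cond_max=1e8):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > s[0] / cond_max))  # step 1: effective cutoff r
    z = U[:, :r].T @ b                    # step 2: z_i = u_i^T b
    y = z / s[:r]                         # step 3: y_i = z_i / sigma_i
    return Vt[:r].T @ y                   # step 4: x = sum_i y_i v_i

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 3.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])
x = tsvd_lstsq(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]   # SVD-based minimum-norm solution
assert np.allclose(x, x_ref)
```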
Example
• The least squares problem min_x ∥Cx − b∥ is easily solved for

C = [  1  0  1
       2  3  5
       5  3 −2
       3  5  4
      −1  6  3 ],   b = [  4
                          −2
                           5
                          −2
                           1 ].

Call the solution x̂.
• Next consider min_x ∥Ax − b∥, where we add to C a column that is the sum of the three previous ones:

A = [  1  0  1  2
       2  3  5 10
       5  3 −2  6
       3  5  4 12
      −1  6  3  8 ].

• Note that A has m = 5, n = 4, r = 3: we can’t reliably solve this using the normal equations or even QR.
• But the truncated SVD method works: if x̃ solves the problem with A, we obtain ∥x̃∥ ≈ ∥x̂∥ and ∥Ax̃ − b∥ ≈ ∥Cx̂ − b∥.
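The claim can be verified in a few lines of NumPy (here the SVD-based minimum-norm solver stands in for the truncated SVD procedure; since A's extra column lies in the range of C, the two problems share the same minimal residual):

```python
import numpy as np

# The matrices C, A and vector b from the example on this slide.
C = np.array([[ 1.0, 0, 1], [2, 3, 5], [5, 3, -2], [3, 5, 4], [-1, 6, 3]])
b = np.array([4.0, -2, 5, -2, 1])
A = np.hstack([C, C.sum(axis=1, keepdims=True)])   # 4th column = sum of the others

x_hat = np.linalg.lstsq(C, b, rcond=None)[0]       # full-rank problem
x_tld = np.linalg.lstsq(A, b, rcond=None)[0]       # rank-deficient, minimum-norm

assert np.linalg.matrix_rank(A) == 3
assert np.isclose(np.linalg.norm(A @ x_tld - b), np.linalg.norm(C @ x_hat - b))
```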
Truncated SVD and data compression
• Given an m × n matrix A, the best rank-r approximation of A = UΣV T is the matrix

A_r = \sum_{i=1}^{r} \sigma_i u_i v_i^T.

• This is another example of truncated SVD (TSVD), so named because only the first r columns of U and V are utilized.
• It is a model reduction technique, which gives a best approximation in the sense that ∥A − Ar∥2 is minimal over all possible rank-r matrices. The minimum residual norm is equal to σr+1.
• Note that Ar uses only r(m + n + 1) storage locations – significantly fewer than mn if r ≪ min(m, n).
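A sketch checking both the construction of A_r and the claim ∥A − A_r∥2 = σr+1 (the matrix sizes and r = 5 are illustrative choices):

```python
import numpy as np

# Best rank-r approximation via the truncated SVD.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 12))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

r = 5
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r]   # A_r = sum_{i<=r} sigma_i u_i v_i^T

# Spectral-norm error equals the first discarded singular value.
assert np.isclose(np.linalg.norm(A - A_r, 2), s[r])
```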
Example
Consider the following clown image, taken from Matlab’s image repository:
[Figure: the original 200 × 320 clown image.]
Compressed image of clown example
Here is what we get if we use r = 20.
[Figure: the compressed clown image, reconstructed from the rank-20 approximation.]
Assessment
• The original image requires 200 × 320 = 64000 matrix entries to be stored; the compressed image requires merely 20 · (200 + 320 + 1) ≈ 10000 storage locations.
• By storing less than 16% of the data, we get a reasonable image. It’s not great, but all the main features of the clown are clearly shown.
• What are less clear in the compressed image are fine (high-frequency) features, such as the details of the clown’s hair.
• Certainly, specialized techniques such as the DCT (Chapter 13) and wavelets are far superior to the SVD for the task of image compression. Still, our example visually shows that most of the information is already stored in the leading singular vectors.
Finding all eigenvalues
• Consider A real and square. Apply orthogonal similarity transformations

A_0 = A;  A_{k+1} = Q_k^T A_k Q_k,  k = 0, 1, 2, . . . ,

so that Ak → upper triangular (diagonal in the symmetric case) form.
• QR algorithm (distinct from the QR decomposition!): two stages.
• Stage I: Given A, use Householder reflections (Chapter 6) to zero out elements below the diagonal in the first column: Q(1)T A = A(1). However, to maintain similarity we must also multiply by Q(1) on the right, so

A1 = Q(1)T A Q(1).
QR algorithm: stages I-II
• Define

A1 = Q(1)T A Q(1).

But we can only zero elements below the main subdiagonal.
• This leads to an upper Hessenberg form. (Tridiagonal if A is symmetric.)
• Stage II: Let A0 be the given matrix, transformed into upper Hessenberg form;
for k = 0, 1, 2, . . . until termination:
  QR decomposition: Qk Rk = Ak − αk I;
  Construct next iterate: Ak+1 = Rk Qk + αk I.
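Stage II can be sketched compactly. This hedged NumPy example uses the simple shift αk = (Ak)nn; a production code would first reduce A to Hessenberg form (Stage I), use a more robust shift, and deflate converged eigenvalues. The 3 × 3 symmetric test matrix is an illustrative choice:

```python
import numpy as np

# Shifted QR iteration (Stage II). Each step Ak+1 = Rk Qk + alpha_k I is an
# orthogonal similarity transformation of Ak, so the eigenvalues are preserved.
def qr_iteration(A, num_iters=200):
    Ak = A.copy()
    I = np.eye(Ak.shape[0])
    for _ in range(num_iters):
        alpha = Ak[-1, -1]                     # simple (Rayleigh-quotient) shift
        Q, R = np.linalg.qr(Ak - alpha * I)    # Qk Rk = Ak - alpha_k I
        Ak = R @ Q + alpha * I                 # Ak+1 = Rk Qk + alpha_k I
    return Ak

# Symmetric example: the iterates approach diagonal form.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0]])
Ak = qr_iteration(A)
assert np.allclose(np.sort(np.diag(Ak)), np.sort(np.linalg.eigvalsh(A)), atol=1e-6)
```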