Chapter 8: Eigenvalues and Singular Values
TRANSCRIPT
July 20, 2013

Uri M. Ascher and Chen Greif
Department of Computer Science, The University of British Columbia
{ascher,greif}@cs.ubc.ca

Slides for the book A First Course in Numerical Methods (published by SIAM, 2011)
http://www.ec-securehost.com/SIAM/CS07.html
Goals of this chapter

• To find out how eigenvalues and singular values of a given matrix are computed;
• to find out how the largest (and smallest) few eigenvalues and singular values are computed;
• to see some interesting applications of eigenvalues and singular values.
Uri Ascher & Chen Greif (UBC Computer Science) A First Course in Numerical Methods July 20, 2013 1 / 1
Outline
• Algorithms for a few eigenvalues
• Uses of eigenvalues and eigenvectors
• Uses of SVD
• (A taste of) algorithms for all eigenvalues and SVD
Recall eigenvalues and singular values (Ch. 4)
• For a real, square n × n matrix A, an eigenvalue λ and corresponding eigenvector x ≠ 0 satisfy Ax = λx.
• There are n (possibly complex) eigenvalues λ1, λ2, . . . , λn and eigenvectors x1, x2, . . . , xn s.t. Axi = λixi. For Λ = diag(λi),

AX = XΛ.

• If A is non-defective then the eigenvector matrix X is nonsingular:

X−1AX = Λ.

• For a real, m × n matrix A, the singular value decomposition (SVD) is

A = UΣV T

where U (m × m) and V (n × n) are orthogonal matrices and Σ is “diagonal”, consisting of zeros and the singular values σ1 ≥ σ2 ≥ · · · ≥ σr > 0, r ≤ min(m, n).
• Note: the singular values are the square roots of the eigenvalues of AT A.
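The last relation is easy to check numerically. A minimal sketch in NumPy (the slides use Matlab); the 5 × 3 random test matrix is an illustrative choice, not from the slides:

```python
import numpy as np

# Check that the singular values of A equal the square roots of the
# eigenvalues of A^T A. The test matrix is an arbitrary random choice.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

sigma = np.linalg.svd(A, compute_uv=False)        # singular values, descending
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # eigenvalues of A^T A, descending

assert np.allclose(sigma, np.sqrt(lam))
```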
Classes of methods for eigenvalue problem
• In general, one must iterate to find eigenvalues. Nonetheless, methods for finding all eigenvalues of a matrix that is not very large resemble direct solvers in spirit; in particular, they are based on matrix decompositions.
• To find a few eigenvalues there are the basic power and inverse power methods, and their generalization to orthogonal iterations. Large and sparse eigenvalue solvers are based on Lanczos and Arnoldi iterations (not discussed here). In Matlab: eigs
• Methods for finding all eigenvalues are based on orthogonal similarity transformations. In Matlab: eig
• In general, algorithms for finding the SVD are related to those for finding eigenvalues. In Matlab: svd
Power method
• Given v0, expand using the eigenpairs:

v_0 = \sum_{j=1}^{n} \beta_j x_j,

where A x_j = \lambda_j x_j, j = 1, . . . , n.
• Then

A v_0 = \sum_{j=1}^{n} \beta_j A x_j = \sum_{j=1}^{n} (\beta_j \lambda_j) x_j.

• Multiplying by A another k − 1 times gives

A^k v_0 = \sum_{j=1}^{n} \beta_j \lambda_j^{k-1} A x_j = \sum_{j=1}^{n} (\beta_j \lambda_j^{k}) x_j.
Power method (cont.)
Assuming:
• A is non-defective,
• |λ1| > |λj|, j = 2, . . . , n,
• β1 ≠ 0,
we obtain (after normalization at each step)

A^k v_0 → x_1.

Given the eigenvector, obtain the eigenvalue from the Rayleigh quotient:

\lambda_1 = \frac{x_1^T A x_1}{x_1^T x_1}.
Power method algorithm and properties
• Algorithm: for k = 1, 2, . . . until termination:

v = A v_{k-1},
v_k = v / ∥v∥_2,
\lambda_1^{(k)} = v_k^T A v_k.

• Properties:
  • Simple, basic, can be slow.
  • Can be applied to large, sparse or implicit matrices.
  • Limiting assumptions.
  • Used as a building block for other, more robust algorithms.
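The algorithm above can be sketched in a few lines. A hedged example in NumPy (the slides use Matlab); the diagonal test matrix and iteration count are illustrative choices:

```python
import numpy as np

# Sketch of the power method under the stated assumptions
# (non-defective A, a single dominant eigenvalue).
def power_method(A, v0, num_iters=500):
    v = v0 / np.linalg.norm(v0)
    for _ in range(num_iters):
        w = A @ v                     # v = A v_{k-1}
        v = w / np.linalg.norm(w)     # v_k = v / ||v||_2
    lam = v @ A @ v                   # Rayleigh quotient (v has unit norm)
    return lam, v

A = np.diag(np.arange(1.0, 33.0))    # eigenvalues 1..32, dominant lambda_1 = 32
lam, v = power_method(A, np.ones(32))
assert abs(lam - 32.0) < 1e-8
```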
Example: power method
u = [1:32]; v = [1:30, 30, 32]; A = diag(u); B = diag(v);
[Figure: absolute error vs. iteration (semilog scale) for the power method applied to A = diag(u) and B = diag(v).]
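The experiment can be reproduced with a short sketch (NumPy; the iteration count is an illustrative choice). B has a larger eigenvalue gap (|λ2/λ1| = 30/32 vs. 31/32 for A), so it should converge faster:

```python
import numpy as np

# Power iteration error after k steps, for the two matrices on the slide.
def power_error(A, lam_true, k):
    v = np.ones(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(k):
        v = A @ v
        v /= np.linalg.norm(v)
    return abs(v @ A @ v - lam_true)   # Rayleigh-quotient error

A = np.diag(np.arange(1.0, 33.0))                                  # diag(1:32)
B = np.diag(np.concatenate([np.arange(1.0, 31.0), [30.0, 32.0]]))  # diag([1:30,30,32])

errA = power_error(A, 32.0, 100)
errB = power_error(B, 32.0, 100)
assert errB < errA   # better-separated spectrum converges faster
```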
Inverse power method
• The power method is good only for a well-separated dominant eigenvalue. What about other eigenvalues? What about the more general case of looking for a non-dominant eigenpair?
• If we know an α that approximates well some simple eigenvalue λs, then |λs − α| ≪ |λi − α| for all i ≠ s. Hence

|λs − α|^{−1} ≫ |λi − α|^{−1}, for all i ≠ s.

• These are the eigenvalues of (A − αI)^{−1}, which has the same eigenvectors!
• The parameter α is called a shift.
Inverse power method algorithm
• Algorithm: for k = 1, 2, . . . until termination:

Solve (A − αI) v = v_{k-1},
v_k = v / ∥v∥,
\lambda^{(k)} = v_k^T A v_k.

• Properties:
  • Can be applied to find different eigenvalues using different shifts.
  • But one must solve (possibly many) linear systems.
  • If α is fixed then there is one matrix and many right-hand sides: if a direct method can be applied, form the LU decomposition of A − αI once. (See Chapter 5.)
  • Alternatively, can set α = αk and update it as the iteration proceeds using λ(k−1). Now each iteration is more expensive, but the algorithm may converge much faster.
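A hedged sketch of the fixed-shift variant in NumPy. As noted above, a real code would factor A − αI once with LU; for brevity this sketch calls a dense solver each iteration. The shift α = 33 (near λ = 32) is an illustrative choice:

```python
import numpy as np

# Inverse power method with a fixed shift alpha.
def inverse_power(A, alpha, v0, num_iters=50):
    n = A.shape[0]
    M = A - alpha * np.eye(n)
    v = v0 / np.linalg.norm(v0)
    for _ in range(num_iters):
        v = np.linalg.solve(M, v)   # solve (A - alpha I) v = v_{k-1}
        v /= np.linalg.norm(v)      # v_k = v / ||v||
    return v @ A @ v                # Rayleigh quotient

A = np.diag(np.arange(1.0, 33.0))
lam = inverse_power(A, 33.0, np.ones(32))   # converges to the nearest eigenvalue, 32
assert abs(lam - 32.0) < 1e-10
```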
Example: inverse power method
u = [1:32]; A = diag(u);
[Figure: absolute error vs. iteration (semilog scale) for the inverse power method on A = diag(u) with shifts α = 33 and α = 35.]
Uses of eigenvalues and eigenvectors
• The famous PageRank algorithm for enabling quick web searching amounts to finding the dominant eigenvector, corresponding to the largest eigenvalue λ = 1, of a huge matrix A in which relations among web pages are recorded.
• Often, the matrix A represents a discretization of a differential equation (such as those in Chapters 7 and 16). Correspondingly, A may be large and sparse. The eigenvector x in the relation Ax = λx then corresponds to a discretization of an eigenfunction.
• Eigenvalues are used for determining ℓ2 condition numbers and naturally arise in many other analysis-related tasks.
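The PageRank idea can be illustrated on a toy scale: the ranking vector is the dominant eigenvector (λ = 1) of a column-stochastic link matrix, found by the power method. The 3-page link structure below is a made-up example, not from the slides:

```python
import numpy as np

# Toy link matrix: column j holds the outgoing links of page j, each
# weighted equally. Page 0 links to 1 and 2; page 1 to 2; page 2 to 0.
A = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])

v = np.ones(3) / 3.0
for _ in range(200):        # power method; lambda_1 = 1 for a stochastic matrix
    v = A @ v
    v /= v.sum()            # normalize so the entries sum to 1

assert np.allclose(A @ v, v, atol=1e-8)   # fixed point: Av = v, i.e. lambda = 1
```

For this particular graph the ranking vector works out to (0.4, 0.2, 0.4): pages 0 and 2 sit on the main cycle and outrank page 1.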
Singular value decomposition (SVD)
• Recall material from Chapter 4 (where there is also a PCA example):
• For a real, m × n matrix A, the singular value decomposition (SVD) is given by

A = UΣV T

where U (m × m) and V (n × n) are orthogonal matrices and Σ is “diagonal”, consisting of zeros and the singular values σ1 ≥ σ2 ≥ · · · ≥ σr > 0, r ≤ min(m, n).
• The singular values are the square roots of the eigenvalues of AT A.
• If r = n ≤ m then

κ2(A) = σ1/σn = \sqrt{κ2(AT A)}.

• Note that, unlike for eigenvalues, A need not be square, the singular values are real and nonnegative, and the transformation to “diagonal” form is always well-conditioned.
Solving the linear least squares problem using SVD
• Since A = UΣV T,

∥b − Ax∥ = ∥U^T b − Σy∥,  where x = V y.

• If κ(A) = σ1/σn is not too large then one can proceed as with the QR decomposition (Chapter 6).
• If κ(A) = σ1/σn is too large then determine an effective cutoff: going backward from n, find r, 1 ≤ r < n, such that σ1/σr is not too large, and set σr+1 = · · · = σn = 0, obtaining a truncated Σ̂. This changes the problem, from A to Â = UΣ̂V T, regularizing it!
• Precisely the same procedure can also be applied to problems Ax = b where A is square but too ill-conditioned (so that the solution using GEPP may be meaningless).
Regularization
Solve a nearby well-conditioned problem safely:

1. Search r = n, n − 1, . . . until r is found such that σ1/σr is tolerable in size. This is the condition number of the problem that we actually solve:

min{∥x∥ ; x = argmin ∥b − Ax∥}.

2. Let ui be the ith column vector of U; set zi = ui^T b, i = 1, . . . , r.
3. Set yi = σi^{−1} zi, i = 1, 2, . . . , r, and yi = 0, i = r + 1, . . . , n.
4. Let vi be the ith column vector of V; set x = \sum_{i=1}^{r} yi vi.
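The four steps above translate directly into a short sketch (NumPy; the tolerance for σ1/σr is a made-up choice). The rank-deficient test matrix, whose third column is the sum of the first two, is also illustrative:

```python
import numpy as np

# Truncated-SVD least squares, following the four regularization steps.
def tsvd_lstsq(A, b, cond_max=1e8):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > s[0] / cond_max))  # step 1: effective cutoff r
    z = U[:, :r].T @ b                    # step 2: z_i = u_i^T b
    y = z / s[:r]                         # step 3: y_i = z_i / sigma_i
    return Vt[:r].T @ y                   # step 4: x = sum_i y_i v_i

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 3.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])
x = tsvd_lstsq(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]   # SVD-based minimum-norm solution
assert np.allclose(x, x_ref)
```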
Example
• The least squares problem min_x ∥Cx − b∥ is easily solved for

C = [  1  0  1
       2  3  5
       5  3 −2
       3  5  4
      −1  6  3 ],   b = [  4
                          −2
                           5
                          −2
                           1 ].

Call the solution x̂.
• Next consider min_x ∥Ax − b∥, where we add to C a column that is the sum of the three previous ones:

A = [  1  0  1  2
       2  3  5 10
       5  3 −2  6
       3  5  4 12
      −1  6  3  8 ].

• Note that A has m = 5, n = 4, r = 3: we can’t reliably solve this using the normal equations or even QR.
• But the truncated SVD method works: if x̃ solves the problem with A, we obtain ∥x̃∥ ≈ ∥x̂∥ and ∥Ax̃ − b∥ ≈ ∥Cx̂ − b∥.
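The claim can be verified in a few lines of NumPy (here the SVD-based minimum-norm solver stands in for the truncated SVD procedure; since A's extra column lies in the range of C, the two problems share the same minimal residual):

```python
import numpy as np

# The matrices C, A and vector b from the example on this slide.
C = np.array([[ 1.0, 0, 1], [2, 3, 5], [5, 3, -2], [3, 5, 4], [-1, 6, 3]])
b = np.array([4.0, -2, 5, -2, 1])
A = np.hstack([C, C.sum(axis=1, keepdims=True)])   # 4th column = sum of the others

x_hat = np.linalg.lstsq(C, b, rcond=None)[0]       # full-rank problem
x_tld = np.linalg.lstsq(A, b, rcond=None)[0]       # rank-deficient, minimum-norm

assert np.linalg.matrix_rank(A) == 3
assert np.isclose(np.linalg.norm(A @ x_tld - b), np.linalg.norm(C @ x_hat - b))
```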
Truncated SVD and data compression
• Given an m × n matrix A, the best rank-r approximation of A = UΣV T is the matrix

A_r = \sum_{i=1}^{r} \sigma_i u_i v_i^T.

• This is another example of truncated SVD (TSVD), so named because only the first r columns of U and V are utilized.
• It is a model reduction technique, which gives a best approximation in the sense that ∥A − Ar∥2 is minimal over all possible rank-r matrices. The minimum residual norm is equal to σr+1.
• Note that Ar uses only r(m + n + 1) storage locations – significantly fewer than mn if r ≪ min(m, n).
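A sketch checking both the construction of A_r and the claim ∥A − A_r∥2 = σr+1 (the matrix sizes and r = 5 are illustrative choices):

```python
import numpy as np

# Best rank-r approximation via the truncated SVD.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 12))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

r = 5
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r]   # A_r = sum_{i<=r} sigma_i u_i v_i^T

# Spectral-norm error equals the first discarded singular value.
assert np.isclose(np.linalg.norm(A - A_r, 2), s[r])
```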
Example
Consider the following clown image, taken from Matlab’s image repository:
[Figure: the original 200 × 320 clown image.]
Compressed image of clown example
Here is what we get if we use r = 20.
[Figure: the compressed clown image, reconstructed from the rank-20 approximation.]
Assessment
• The original image requires 200 × 320 = 64000 matrix entries to be stored; the compressed image requires merely 20 · (200 + 320 + 1) ≈ 10000 storage locations.
• By storing less than 16% of the data, we get a reasonable image. It’s not great, but all the main features of the clown are clearly shown.
• What are less clear in the compressed image are fine (high-frequency) features, such as the details of the clown’s hair.
• Certainly, specialized techniques such as the DCT (Chapter 13) and wavelets are far superior to the SVD for the task of image compression. Still, our example visually shows that most of the information is already stored in the leading singular vectors.
Finding all eigenvalues
• Consider A real and square. Apply orthogonal similarity transformations

A_0 = A;  A_{k+1} = Q_k^T A_k Q_k,  k = 0, 1, 2, . . . ,

so that Ak → upper triangular (diagonal in the symmetric case) form.
• QR algorithm (distinct from the QR decomposition!): two stages.
• Stage I: Given A, use Householder reflections (Chapter 6) to zero out elements below the diagonal in the first column: Q(1)T A = A(1). However, to maintain similarity we must also multiply by Q(1) on the right, so

A1 = Q(1)T A Q(1).
QR algorithm: stages I-II
• Define

A1 = Q(1)T A Q(1).

But we can only zero elements below the main subdiagonal.
• This leads to an upper Hessenberg form. (Tridiagonal if A is symmetric.)
• Stage II: Let A0 be the given matrix, transformed into upper Hessenberg form;
for k = 0, 1, 2, . . . until termination:
  QR decomposition: Qk Rk = Ak − αk I;
  Construct next iterate: Ak+1 = Rk Qk + αk I.
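Stage II can be sketched compactly. This hedged NumPy example uses the simple shift αk = (Ak)nn; a production code would first reduce A to Hessenberg form (Stage I), use a more robust shift, and deflate converged eigenvalues. The 3 × 3 symmetric test matrix is an illustrative choice:

```python
import numpy as np

# Shifted QR iteration (Stage II). Each step Ak+1 = Rk Qk + alpha_k I is an
# orthogonal similarity transformation of Ak, so the eigenvalues are preserved.
def qr_iteration(A, num_iters=200):
    Ak = A.copy()
    I = np.eye(Ak.shape[0])
    for _ in range(num_iters):
        alpha = Ak[-1, -1]                     # simple (Rayleigh-quotient) shift
        Q, R = np.linalg.qr(Ak - alpha * I)    # Qk Rk = Ak - alpha_k I
        Ak = R @ Q + alpha * I                 # Ak+1 = Rk Qk + alpha_k I
    return Ak

# Symmetric example: the iterates approach diagonal form.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0]])
Ak = qr_iteration(A)
assert np.allclose(np.sort(np.diag(Ak)), np.sort(np.linalg.eigvalsh(A)), atol=1e-6)
```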