Singular Value Decomposition - Cornell University · 2008-07-16

Singular Value Decomposition

CS3220 - Summer 2008
Jonathan Kaldor


Another Factorization?

• We’ve already looked at A=LU (for n x n matrices) and A=QR (for m x n matrices)

• Both of them are exceedingly useful, but somewhat specialized


Extending QR

• We factored A=QR because we wanted an “easy” system to solve for the least squares problem (namely, upper triangular system)

• Recall also that when solving n x n systems, we observed that diagonal systems were even easier to solve

• Can we come up with a factorization where we only have orthogonal and diagonal matrices?


The SVD

• For any m x n matrix A, we can factor it into A = U∑V^T, where:

U: m x m orthogonal matrix
V: n x n orthogonal matrix
∑: m x n diagonal matrix, with ∑_{i,i} = σ_i ≥ 0

The σ_i’s are typically ordered so that σ_i ≥ σ_{i+1} for i = 1 ... p - 1, where p = min(m, n) is the number of singular values


Terminology

• The σ_i’s are called the singular values of the matrix A

• The columns u_i of U are called the left singular vectors of A

• The columns v_i of V are called the right singular vectors of A


Existence

• Note: we have not made any qualifications about A

• In particular, A doesn’t need to be full rank, and m can be less than, equal to, or greater than n (works for matrices A of any size). Every matrix has an SVD factorization


Uniqueness

• SVD is “mostly” unique. Singular values σ_i are unique, and singular vectors u_i and v_i are unique up to choice of sign (flipped together) if the σ_i’s are distinct. If σ_i = σ_{i+1} for some i, then SVD is not unique

• Example: identity matrix (I = Q I Q^T for any orthogonal matrix Q)


How to Compute?

• Short answer: in MATLAB, with [U, S, V] = svd(A);

• Longer answer: algorithm is beyond the scope of this course

• Sufficient to know that factorization exists, we can compute it, and that computing it is more expensive than LU or QR factorization.
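
For readers without MATLAB, the same call can be sketched in Python with NumPy. This is only an illustration: the matrix values are made up, and note that `numpy.linalg.svd` returns V^T (not V) and the singular values as a 1-D array rather than the matrix ∑.

```python
import numpy as np

# Small 3 x 2 example matrix (values chosen arbitrarily for illustration).
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# full_matrices=True returns U (m x m) and V^T (n x n); s holds the singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n diagonal matrix Sigma from the 1-D array of singular values.
m, n = A.shape
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

# Multiply the factors back together to confirm A = U Sigma V^T.
A_rebuilt = U @ Sigma @ Vt

print("singular values:", s)
print("reconstruction error:", np.linalg.norm(A - A_rebuilt))
```

The reconstruction error should be on the order of floating point roundoff.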


What does it mean?

• Let A = U∑V^T. What is Av_1 (A multiplied by the first column of V)?

• V^T v_1 = e_1 = [1;0;0;...0]

• ∑e_1 = σ_1 e_1 = [σ_1;0;0;...0]

• Uσ_1 e_1 = σ_1 U e_1 = σ_1 u_1

• Extending this shows that Av_i = σ_i u_i for all i
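
The relation Av_i = σ_i u_i is easy to check numerically. The sketch below uses Python with NumPy and an arbitrary 2 x 3 example matrix (an assumption for illustration). Since this A has only two singular values, the third column of V has no matching σ and is simply mapped to zero.

```python
import numpy as np

# Arbitrary 2 x 3 example matrix of rank 2.
A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0]])

U, s, Vt = np.linalg.svd(A)
V = Vt.T  # columns of V are the right singular vectors

# Check A v_i = sigma_i u_i for every singular value.
checks = []
for i, sigma in enumerate(s):
    lhs = A @ V[:, i]
    rhs = sigma * U[:, i]
    checks.append(bool(np.allclose(lhs, rhs)))

# The remaining column of V has no singular value, so A sends it to zero.
leftover = A @ V[:, 2]

print(checks)
print(leftover)
```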

• Take A to be a 2 x 2 matrix. Then U and V are both 2 x 2 matrices

[Figure sequence: the unit circle with v_1 and v_2 marked. Multiplying by V^T rotates the v vectors to the axes. Multiplying by ∑ scales the 1st axis by σ_1 and the second by σ_2. Multiplying by U rotates the resulting ellipse to u_1, u_2. Multiplying by U∑V^T applies all three steps at once.]

• In 2D, this says that A takes directions v_1 and v_2, scales them by σ_1 and σ_2, and then rotates them to u_1 and u_2

• Maps the unit circle (with v_1 and v_2 marked on it) to an ellipse whose axes point along u_1 and u_2, with half-lengths σ_1 and σ_2

• This geometric argument is true in general (but best not to try and imagine it for n > 3!)
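
The circle-to-ellipse picture can be checked numerically. The sketch below (Python with NumPy, example matrix chosen arbitrarily) maps points on the unit circle through A, expresses the images in the u_1, u_2 coordinates, and tests the ellipse equation (c_1/σ_1)^2 + (c_2/σ_2)^2 = 1.

```python
import numpy as np

# Arbitrary invertible 2 x 2 example matrix.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
U, s, Vt = np.linalg.svd(A)

# Sample points on the unit circle (one point per column).
theta = np.linspace(0.0, 2.0 * np.pi, 200)
circle = np.vstack([np.cos(theta), np.sin(theta)])

# Map every point through A.
image = A @ circle

# Coordinates of each image point along u_1 and u_2.
coords = U.T @ image

# An ellipse with half-axes sigma_1 and sigma_2 satisfies this equation.
ellipse_eq = (coords[0] / s[0]) ** 2 + (coords[1] / s[1]) ** 2

# Lengths of the image vectors; these lie between sigma_2 and sigma_1.
lengths = np.linalg.norm(image, axis=0)

print(ellipse_eq.min(), ellipse_eq.max())
print(lengths.min(), lengths.max(), s)
```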


Rank Deficiency

• Suppose A is 2 x 2, not the zero matrix, and not full rank (i.e. it’s singular). This means that it maps the 2D plane to a proper subspace of the plane (i.e. a line). This can also be seen as mapping the unit circle to a degenerate ellipse where one axis has length 0

• In terms of our rotation and scaling operations, this is equivalent to having σ_2 = 0 (why σ_2?)


• Thus, we can use the SVD to determine the rank of a matrix: Compute the SVD, and count the number of singular values > 0 (in practice, need to count number of singular values > some small epsilon to account for floating point issues)
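
A minimal sketch of this rank test in Python with NumPy follows. The function name `numerical_rank` and its default tolerance are assumptions made for this example. The default scales machine epsilon by the matrix size and the largest singular value, which is one common choice of "small epsilon", not the only one.

```python
import numpy as np

def numerical_rank(A, tol=None):
    """Count the singular values of A that exceed a small tolerance."""
    s = np.linalg.svd(A, compute_uv=False)
    if tol is None:
        # Assumed default: machine epsilon scaled by size and largest singular value.
        tol = max(A.shape) * np.finfo(float).eps * s[0]
    return int(np.sum(s > tol))

# Example: the third row is a combination of the first two, so the rank is 2.
B = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
rank_B = numerical_rank(B)

print(np.linalg.svd(B, compute_uv=False))
print(rank_B)
```

Printing the singular values shows why the tolerance matters: the smallest one is typically tiny but not exactly zero.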

• If the matrix has rank r, the factorization can be partitioned into blocks:

\[
A = \begin{bmatrix} U_1 & U_2 \end{bmatrix}
\begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix},
\qquad \Sigma_1 = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)
\]

where U_1 is m x r, U_2 is m x (m - r), ∑_1 is r x r, V_1^T is r x n, and V_2^T is (n - r) x n


Range and Null Space

• Suppose v_i is a column of V_2. What is Av_i?

• If v_i is in V_2, then σ_i = 0. That means that Av_i = σ_i u_i = 0

• Thus, v_i is in the null space of A

• This means that the columns of V_2 form an orthonormal basis for the null space of A (or null(A))


• Similarly, let x be any vector. Then Ax = U_1 ∑_1 V_1^T x is a linear combination of the columns of U_1

• This means that the columns of U_1 form an orthonormal basis for range(A)


• Since the columns of V_1 are orthogonal to those of V_2, and V_2 is an orthonormal basis for null(A), it follows that V_1 is an orthonormal basis for the orthogonal complement of null(A) (or null(A)⊥)

• Similarly, the columns of U_2 are orthogonal to those of U_1, so U_2 is an orthonormal basis for range(A)⊥
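
The four bases can be read straight off a computed SVD. The sketch below uses Python with NumPy. The example matrix and the relative tolerance `1e-10` used to decide which singular values count as zero are assumptions for illustration.

```python
import numpy as np

# Rank-2 example: the second row is twice the first.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])

U, s, Vt = np.linalg.svd(A)

# Numerical rank, using an assumed relative tolerance.
r = int(np.sum(s > 1e-10 * s[0]))

# Partition U and V by the rank.
U1, U2 = U[:, :r], U[:, r:]          # range(A) and its orthogonal complement
V1, V2 = Vt[:r, :].T, Vt[r:, :].T    # null(A) complement and null(A)

# Any vector x: Ax should already lie in the span of U1.
x = np.array([1.0, -2.0, 0.5])
Ax = A @ x
Ax_projected = U1 @ (U1.T @ Ax)

print("rank:", r)
print("A @ V2:", (A @ V2).ravel())
print("U2^T @ A:", (U2.T @ A).ravel())
```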


Skinny SVD

• Just like QR, there are some columns of U and V that we don’t need

• Namely, the component of x along any vector in V_2 gets zeroed out when multiplying by ∑, and any vector in U_2 is only ever multiplied by a zero row of ∑

• Can reconstruct A using only U_1, ∑_1 = ∑(1:r, 1:r), and V_1, where U_1 is m x r and V_1 is n x r: A = U_1 ∑_1 V_1^T
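
A short sketch of the skinny SVD in Python with NumPy is given below. The example matrix and the relative tolerance are assumptions. `full_matrices=False` already drops the columns of U beyond the first n, and the code then trims further down to the rank r.

```python
import numpy as np

# 4 x 2 example of rank 1: the second column is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0],
              [4.0, 8.0]])

# Reduced SVD: U is 4 x 2, s has 2 entries, Vt is 2 x 2.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the parts that belong to nonzero singular values.
r = int(np.sum(s > 1e-10 * s[0]))   # assumed relative tolerance
U1 = U[:, :r]                       # m x r
S1 = np.diag(s[:r])                 # r x r
V1 = Vt[:r, :].T                    # n x r

# Reconstruct A from the trimmed factors only.
A_skinny = U1 @ S1 @ V1.T

print(U1.shape, S1.shape, V1.shape)
print(np.linalg.norm(A - A_skinny))
```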


Generalizing Least Squares

• We know how to solve least squares as long as A is of full rank

• Suppose A is instead of rank r (σ_{r+1} = ... = σ_n = 0). Can we still find an acceptable least squares solution?

• Note: have to combine both overdetermined and underdetermined strategies (may not be an exact solution, and we can add any vector in the null space without changing the answer)


Rank-Deficient Least Squares

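• Expand the residual using the SVD:

\[
\begin{aligned}
\|Ax - b\|_2^2 &= \|U \Sigma V^T x - b\|_2^2 \\
&= \left\| \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} x - b \right\|_2^2 \\
&= \left\| \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} x - \begin{bmatrix} U_1^T \\ U_2^T \end{bmatrix} b \right\|_2^2 \\
&= \left\| \begin{bmatrix} \Sigma_1 & 0 \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} x - U_1^T b \right\|_2^2 + \|U_2^T b\|_2^2
\end{aligned}
\]

(The third line multiplies inside the norm by U^T, which leaves the 2-norm unchanged because U is orthogonal.)

• Let y = [V_1^T; V_2^T] x = [y_1; y_2] and a = U_1^T b. Then the first term becomes

\[
\left\| \begin{bmatrix} \Sigma_1 & 0 \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} x - U_1^T b \right\|_2^2
= \left\| \begin{bmatrix} \Sigma_1 & 0 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} - a \right\|_2^2
= \|\Sigma_1 y_1 - a\|_2^2
\]

• Can choose y_2 to be anything without changing this term. Let it be 0 to minimize the norm of x

• Then our solution is simply ∑_1 y_1 = a and y_2 = 0. Recall that y_1 = V_1^T x and a = U_1^T b, and that x = Vy = V_1 y_1 + V_2 y_2. Substituting everything, we get x = V_1 ∑_1^{-1} U_1^T b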


• Note that this included both of the observations we made while solving overdetermined and underdetermined systems (ignoring ‖U_2^T b‖ since there was no way to minimize it, and seeing that we could arbitrarily set part of our solution vector (V_2^T x) without changing the residual)
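
The formula x = V_1 ∑_1^{-1} U_1^T b can be sketched in Python with NumPy as follows. The function name `svd_least_squares`, the relative tolerance, and the example data are all assumptions for illustration. For this A, every x with x_1 + x_2 = 2 gives the same residual, and the formula picks the one of smallest norm.

```python
import numpy as np

def svd_least_squares(A, b, rel_tol=1e-10):
    """Minimum-norm least squares solution via x = V1 * inv(Sigma1) * U1^T * b."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > rel_tol * s[0]))      # numerical rank (assumed tolerance)
    U1, S1, V1 = U[:, :r], s[:r], Vt[:r, :].T
    # Dividing by S1 applies inv(Sigma1), since Sigma1 is diagonal.
    return V1 @ ((U1.T @ b) / S1)

# Rank-1 example: both columns of A are identical.
A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])
b = np.array([1.0, 3.0, 5.0])

x = svd_least_squares(A, b)

# Adding a null-space vector changes x but not the residual.
z = np.array([1.0, -1.0])
res_min_norm = np.linalg.norm(A @ x - b)
res_shifted = np.linalg.norm(A @ (x + z) - b)

print(x, res_min_norm, res_shifted)
```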


Matrix Inverse

• Let A = U∑V^T be an n x n matrix of full rank. Then A^{-1} = V∑^{-1}U^T

• What can we observe about this?

• Roles of U and V are swapped

• Singular values of the inverse are the reciprocals of the singular values of A: 1/σ_i (note that they are in reverse order because of the way we order singular values)
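
Both observations can be checked on a small example. The sketch below uses Python with NumPy and an arbitrary invertible 2 x 2 matrix (an assumption for illustration).

```python
import numpy as np

# Arbitrary invertible example matrix.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

U, s, Vt = np.linalg.svd(A)

# A^{-1} = V * inv(Sigma) * U^T, with inv(Sigma) = diag(1 / sigma_i).
A_inv = Vt.T @ np.diag(1.0 / s) @ U.T

# Singular values of the inverse, as NumPy orders them (largest first).
s_inv = np.linalg.svd(A_inv, compute_uv=False)

print(A_inv @ A)
print(s, s_inv)
```

The singular values of the inverse come out as 1/σ_i listed in the opposite order.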


Matrix (Pseudo)Inverse

• We can generalize this notion of the matrix inverse to come up with the pseudoinverse, which exists for m x n matrices of rank r: A^+ = V_1 ∑_1^{-1} U_1^T, where V_1, ∑_1, and U_1 are defined from the skinny SVD

• This is in a sense the closest matrix to the inverse for matrices that don’t have an inverse


• Note that when A is n x n and full rank, U = U_1, V = V_1, and ∑ = ∑_1, so the pseudoinverse is the inverse.

• Note that the pseudoinverse is just what we came up with in the general rank-deficient least squares case: x = A^+ b solves the least squares problem Ax ≈ b
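
A minimal sketch of building A^+ from the skinny SVD, in Python with NumPy, is shown below. The function name `pseudoinverse` and its relative tolerance are assumptions for this example. NumPy also ships its own routine, `numpy.linalg.pinv`, which is normally the one to use in practice.

```python
import numpy as np

def pseudoinverse(A, rel_tol=1e-10):
    """Build A^+ = V1 * inv(Sigma1) * U1^T from the skinny SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > rel_tol * s[0]))   # numerical rank (assumed tolerance)
    return Vt[:r, :].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

# Rank-deficient 3 x 2 example: no ordinary inverse exists.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
A_plus = pseudoinverse(A)

# Full-rank square example: the pseudoinverse should equal the inverse.
B = np.array([[4.0, 1.0],
              [2.0, 3.0]])
B_plus = pseudoinverse(B)

# Least squares solution of Ax ~ b through the pseudoinverse.
b = np.array([1.0, 0.0, 2.0])
x = A_plus @ b

print(A_plus)
print(x)
```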


Sidestep: Matrix Norms

• When introducing least squares, we introduced the concept of vector norms, which measured size.

• There is a corresponding notion of norms for matrices as well, which have properties similar to those of the vector norms:

‖A‖ > 0 if A ≠ 0
‖c A‖ = |c| ‖A‖
‖A + B‖ ≤ ‖A‖ + ‖B‖


• Frobenius norm of a matrix: ‖A‖_F = sqrt(sum(sum(A_{i,j}^2)))


• If ‖·‖_n is some vector norm, then there is a corresponding matrix norm defined as

\[ \|A\|_n = \max_{x \neq 0} \frac{\|Ax\|_n}{\|x\|_n} \]

Note: this does not necessarily give us a rule for how to compute the matrix norm


• ‖A‖_1 happens to be easy to compute directly: it’s max_i ‖A(:,i)‖_1, i.e. the maximum one-norm of the columns of A

• ‖A‖_2 unfortunately is rather difficult to compute directly... at least, without our fancy new SVD. Namely, ‖A‖_2 = σ_1
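
The three norms above can be computed by hand and compared against NumPy's built-in `numpy.linalg.norm`. This is a sketch with an arbitrary example matrix. The last line also uses the fact that the Frobenius norm equals the square root of the sum of the squared singular values.

```python
import numpy as np

# Arbitrary example matrix.
A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

s = np.linalg.svd(A, compute_uv=False)

# Frobenius norm: square root of the sum of squared entries.
fro = np.sqrt(np.sum(A ** 2))

# One-norm: largest column sum of absolute values.
one = np.max(np.sum(np.abs(A), axis=0))

# Two-norm: the largest singular value.
two = s[0]

# Frobenius norm again, this time from the singular values.
fro_from_svd = np.sqrt(np.sum(s ** 2))

print(fro, one, two, fro_from_svd)
```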


• Matrix norms are useful for measuring the maximum amount any vector x is scaled by when multiplying by A

• We’ll see on Wednesday that they are also useful for measuring error propagation in linear systems


Rank-k Approximation

• A = U∑V^T can be expressed as a sum of rank-1 matrices:

\[ A = \sum_{i=1}^{n} \sigma_i u_i v_i^T \]

• Idea: cut off summation at value k < n

• Gives us the rank-k approximation A_k


• The rank-k approximation A_k is the closest rank-k matrix to the original matrix A when measured using the Frobenius matrix norm

• ‖A - A_k‖_F is minimal over all rank-k matrices

• Same as setting the singular values σ_{k+1} ... σ_n to 0.
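
A sketch of the truncation in Python with NumPy follows. The helper name `rank_k_approx` and the example matrix are assumptions. The Frobenius error of the truncation equals the square root of the sum of the squares of the discarded singular values, which the code checks.

```python
import numpy as np

def rank_k_approx(A, k):
    """Keep only the first k terms of the sum of sigma_i * u_i * v_i^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Arbitrary 4 x 3 example matrix.
A = np.array([[4.0, 0.0, 2.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 5.0],
              [2.0, 2.0, 2.0]])

A1 = rank_k_approx(A, 1)

s = np.linalg.svd(A, compute_uv=False)

# Actual error of the rank-1 approximation in the Frobenius norm.
err = np.linalg.norm(A - A1, 'fro')

# Error predicted from the discarded singular values.
err_predicted = np.sqrt(np.sum(s[1:] ** 2))

print(err, err_predicted)
```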


Principal Component Analysis (PCA)

• Powerful tool for data analysis

• General idea: we have some data in a high dimensional space, but it actually comes from some low dimensional model, maybe corrupted by some noise


• Example: Suppose we have a zip line (rope between two trees) that we can slide down. We set up a bunch of cameras to record our motion as we slide down. Each camera records an (x,y) position for us at each point in time - our position in camera space. We have 2n measurements, if we have n cameras. However, our motion is primarily in one direction (plus some error)


Variance and Covariance

• Variance of a set of m measurements: how much it spreads from the mean
∑(X_i - X̄)^2 / m
where X̄ is the mean of the data

• Covariance: how much a pair of measurements change together (dependence between the two variables)
∑(X_i - X̄)(Y_i - Ȳ) / m


• Idea: choose basis that maximizes variance, minimizes covariance of measurements in that basis

• Maximizing variance: finding important dimensions

• Minimizing covariance: reducing dependence between pairs of measurements


• If we have experimental data for measurement X in an m x 1 vector x (shifted so its mean is 0), then the variance is simply x^T x / m

• If we have experimental data for two measurements X and Y in m x 1 vectors x and y (each shifted to mean 0), then the covariance is simply x^T y / m


• Let X be our m x n matrix of experimental data (m trials, n measurements per trial, adjusted so mean of each measurement is 0)

• Covariance matrix can then be expressed as X^T X / m (the constant factor 1/m does not change the basis we are looking for)

• Want to find orthonormal basis that diagonalizes covariance matrix: this will minimize covariance, maximize variance (in that basis)


PCA

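• We can use the SVD to find a good basis: if X = U∑V^T, then X^T X = V∑^T∑V^T, which is diagonal in the basis given by V

• Singular values σ_i give us the relative importance of each dimension (the variance along v_i is proportional to σ_i^2)

• Columns of V give the orthonormal basis we’re interested in.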
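
The whole PCA recipe can be sketched in Python with NumPy on synthetic data. Everything about the data here is an assumption made for illustration: a hidden 1-D motion along a made-up direction, observed as noisy 2-D points, loosely in the spirit of the zip line example with a single camera.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 500                                   # number of trials
t = rng.normal(size=m)                    # hidden 1-D position along the line
direction = np.array([3.0, 4.0]) / 5.0    # assumed true direction of motion (unit length)
noise = 0.05 * rng.normal(size=(m, 2))    # small measurement noise
X = np.outer(t, direction) + noise        # m x 2 data matrix

# Adjust so the mean of each measurement is 0.
X = X - X.mean(axis=0)

# SVD of the data matrix; rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
principal = Vt[0]

# Variance captured along each principal direction.
variances = s ** 2 / m

# Covariance matrix, expressed in the new basis: should be diagonal.
cov = X.T @ X / m
cov_in_basis = Vt @ cov @ Vt.T

print("principal direction:", principal)
print("variances:", variances)
print("covariance in new basis:\n", cov_in_basis)
```

The first principal direction should line up with the assumed direction of motion (up to sign), and nearly all the variance should fall on the first axis.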