Matrix Decompositions Cheat Sheet - Skoltech




Matrix Decompositions Cheat Sheet
Numerical linear algebra course at Skoltech by Ivan Oseledets
Poster is prepared by TAs Maxim Rakhuba and Alexandr Katrutsa
Based on the idea by Skoltech students, NLA 2016


Each decomposition below is described by four entries: Definition, Existence and uniqueness (∃!), Algorithms, and Use cases.

SVD (Singular Value Decomposition)

Definition:
A = U Σ V*, where U ∈ C^{m×m} and V ∈ C^{n×n} are unitary and Σ ∈ R^{m×n} is diagonal.
- r = rank(A)
- U, V — unitary
- σ1 ≥ … ≥ σr > 0 are the nonzero singular values
- columns of U, V are singular vectors
Note: SVD can also be defined with U ∈ C^{m×p}, Σ ∈ R^{p×p} and V ∈ C^{n×p}, p = min{m, n}

Existence and uniqueness (∃!):
- Singular values are unique
- If all σi are different, U and V are unique up to a unitary diagonal D: U Σ V* = (UD) Σ (VD)*
- If some σi coincide, then U and V are not unique

Algorithms:
- SVD via spectral decomposition of AA* and A*A — stability issues
- Stable algorithm, O(mn²) flops (m > n):
  1. Bidiagonalize A by Householder reflections: A = U1 B V1*
  2. Find the SVD of B = U2 Σ V2* by spectral decomposition of T (2 options):
     a) T = B*B, but don't form T explicitly!
     b) T = [[0, B*], [B, 0]], then permute T to tridiagonal form
  3. U = U1 U2, V = V1 V2

Use cases:
- Data compression, as the Eckart–Young theorem states that the truncated SVD A_k = U_k Σ_k V_k*, with U_k ∈ C^{m×k}, Σ_k = diag(σ1, …, σk) ∈ R^{k×k} and V_k* ∈ C^{k×n}, yields the best rank-k approximation to A in ‖·‖_2 and ‖·‖_F (see the sketch below)
- Calculation of the pseudoinverse A+, e.g. in solving over/underdetermined, singular, or ill-posed linear systems
- Feature extraction in machine learning
Note: applied to centered data, the SVD is known as principal component analysis (PCA)
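A minimal numpy sketch of the Eckart–Young statement above (the random example matrix is assumed for illustration; np.linalg.svd is the stable LAPACK-backed routine):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 40))

U, s, Vh = np.linalg.svd(A, full_matrices=False)  # thin SVD: A = U @ diag(s) @ Vh

k = 5
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]       # truncated SVD A_k

# Eckart-Young: the 2-norm error equals the first discarded singular value.
print(np.linalg.norm(A - A_k, 2), s[k])           # the two numbers agree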

Skeleton (also known as rank decomposition)

Definition:
A = C R with C ∈ C^{m×r} and R ∈ C^{r×n} (often written as A = U V^⊤), or, using matrix entries, A = Ĉ Â^{-1} R̂, where Ĉ ∈ C^{m×r} holds r columns of A, R̂ ∈ C^{r×n} holds r rows of A, and Â ∈ C^{r×r} is the submatrix at their intersection.
- r = rank(A)
- C and Ĉ have full column rank; R and R̂ have full row rank

Existence and uniqueness (∃!):
Not unique:
- in the A = CR version, for any S with det(S) ≠ 0: CR = C S S^{-1} R = C̃ R̃
- in the A = Ĉ Â^{-1} R̂ version, any r linearly independent columns and rows can be chosen

Algorithms (assuming m > n):
- truncated SVD, O(mn²) flops: C = U_r Σ_r, R = V_r*
- RRQR: O(mnr) flops
- cross approximation: O((n + m) r²) flops; based on greedy maximization of |det(Â)|; might fail on some A
- optimization methods (ALS, …) for ‖A − CR‖ → min over C, R, sometimes with additional constraints, e.g.
  – nonnegativity of the elements of C and R
  – small norms of C and R

Use cases:
- Model reduction, data compression, and speedup of computations in numerical analysis: a rank-r matrix with r ≪ n, m needs only O((n + m) r) ≪ nm elements of storage
- Feature extraction in machine learning, where it is also known as matrix factorization
- All applications where SVD applies, since the Skeleton decomposition can be transformed into truncated SVD form (see the sketch below)
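A sketch of the Ĉ Â^{-1} R̂ form, with the caveat that rows and columns are picked here by pivoted QR rather than by the cross-approximation algorithm the poster mentions; the matrix and sizes are made up for illustration:

import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
m, n, r = 60, 50, 4
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # exact rank-r matrix

_, _, cols = qr(A, pivoting=True)        # column pivots of A
_, _, rows = qr(A.T, pivoting=True)      # column pivots of A^T, i.e. row pivots of A
J, I = cols[:r], rows[:r]

C_hat = A[:, J]                          # m x r  chosen columns
R_hat = A[I, :]                          # r x n  chosen rows
A_hat = A[np.ix_(I, J)]                  # r x r  intersection submatrix

A_skel = C_hat @ np.linalg.solve(A_hat, R_hat)          # A ~ C_hat A_hat^{-1} R_hat
print(np.linalg.norm(A - A_skel) / np.linalg.norm(A))   # ~ machine precision here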

Schur

Definition:
A = U T U*, where U ∈ C^{n×n} is unitary and T ∈ C^{n×n} is upper triangular with λ1, …, λn on its diagonal.
- U is unitary
- λ1, …, λn are the eigenvalues of A
- columns of U are Schur vectors

Existence and uniqueness (∃!):
- Not unique in terms of both U and T: a permutation of λ1, …, λn in T changes both U and the off-diagonal part of T

Algorithms:
- QR algorithm, O(n⁴) flops: A_k = Q_k R_k, A_{k+1} = R_k Q_k
- "Smart" QR algorithm, O(n³) flops (see the sketch below):
  1. Reduce A to upper Hessenberg form Ã = Q* A Q
     Note: each iteration of the QR algorithm then costs only O(n²)
  2. Run the QR algorithm on Ã with a shifting strategy to speed up convergence

Use cases:
- Computation of the matrix spectrum
- Computation of matrix functions (Schur–Parlett algorithm)
- Solving matrix equations (e.g. the Sylvester equation)
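A minimal numpy/scipy sketch of the two-step scheme (illustration only: the example matrix is taken symmetric so that the unshifted iteration converges to a diagonal T with real eigenvalues; production codes add shifts and deflation):

import numpy as np
from scipy.linalg import hessenberg, schur

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
A = A + A.T                         # symmetric, so all eigenvalues are real

Ak = hessenberg(A)                  # step 1: upper Hessenberg (here tridiagonal) form
for _ in range(300):                # step 2: unshifted QR iterations
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q                      # A_{k+1} = R_k Q_k is similar to A_k

T, U = schur(A)                     # LAPACK's Schur form A = U T U* for comparison
print(np.sort(np.diag(Ak)))         # both diagonals approach the eigenvalues of A
print(np.sort(np.diag(T)))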

Spectral

Definition:
A = S Λ S^{-1}, where Λ = diag(λ1, …, λn) and S ∈ C^{n×n} is nonsingular.
- λ1, …, λn are the eigenvalues of A
- columns of S are eigenvectors

Existence and uniqueness (∃!):
- ∃ iff for every λi the geometric multiplicity equals the algebraic multiplicity
- ∃ with unitary S iff A is normal: AA* = A*A, e.g. Hermitian
- If all λi are different, then unique up to permutation and scaling of the eigenvectors
- If some λi coincide, S is not unique

Algorithms:
- If A = A*: Jacobi method, O(n³) flops
- If AA* = A*A: QR algorithm, O(n³) flops
- If AA* ≠ A*A, O(n³) flops (see the sketch below):
  1. Find the Schur form A = U T U* via the QR algorithm
  2. Given T, find its eigenvectors V
  3. S = U V, Λ = diag(T)

Use cases:
- The full spectral decomposition is rarely used unless all eigenvectors are needed
- If one needs only the spectrum, the Schur decomposition is the method of choice
- If a matrix has no spectral decomposition, the Schur decomposition is preferable for numerics compared to the Jordan form
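A sketch of steps 1–3 for a non-normal matrix (random example assumed; the back-substitution for the eigenvectors of the triangular T assumes distinct eigenvalues, which holds almost surely here):

import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))          # non-normal, diagonalizable a.s.

T, U = schur(A, output='complex')        # step 1: complex Schur form A = U T U*
lam = np.diag(T)

# step 2: eigenvectors of the triangular T, one back-substitution per eigenvalue
n = T.shape[0]
V = np.eye(n, dtype=complex)
for j in range(1, n):
    # solve (T[:j,:j] - lam[j] I) v = -T[:j, j] for the leading part of column j
    V[:j, j] = np.linalg.solve(T[:j, :j] - lam[j] * np.eye(j), -T[:j, j])

S = U @ V                                # step 3: S = U V, Lambda = diag(T)
print(np.linalg.norm(A @ S - S * lam))   # A S = S Lambda, up to rounding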

QR

Definition:
- m ≥ n: A = Q R, where Q ∈ C^{m×n} is left unitary (orthonormal columns) and R ∈ C^{n×n} is upper triangular
- m < n: A = Q R, where Q ∈ C^{m×m} is unitary and R ∈ C^{m×n} is upper triangular

Existence and uniqueness (∃!):
- Unique if all diagonal elements of R are set to be positive

Algorithms (assuming m > n):
- Gram–Schmidt (GS) process: 2mn² flops; not stable
- modified Gram–Schmidt (MGS) process: 2mn² flops; stable
- via Householder reflections: 2mn² − (2/3)n³ flops; best for dense matrices and sequential computer architectures; stable
- via Givens rotations: 3mn² − n³ flops; best for sparse matrices and parallel computer architectures; stable

Use cases:
- Computation of an orthogonal basis of a linear space
- Solving the least squares problem (m > n): ‖Ax − b‖_2 → min over x ⇒ x = R^{-1} Q* b (see the sketch below)
- Solving linear systems
  Note: more stable than LU, but with a larger constant
- Don't confuse the QR decomposition with the QR algorithm!
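A short sketch of the least-squares recipe x = R^{-1} Q* b (example data assumed; np.linalg.qr computes a Householder-based thin QR through LAPACK):

import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))        # m > n, full column rank
b = rng.standard_normal(100)

Q, R = np.linalg.qr(A)                   # thin QR: Q is 100x5, R is 5x5
x = solve_triangular(R, Q.T @ b)         # back substitution; never form a pseudoinverse

print(np.linalg.norm(x - np.linalg.lstsq(A, b, rcond=None)[0]))  # agrees with lstsq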

RRQR (Rank-Revealing QR)

Definition:
A P = Q R
- P is a permutation matrix
- Q is unitary
- R is upper triangular; its columns split into a leading r block and a trailing n − r block, and the trailing diagonal block is (numerically) zero
- r = rank(A)

Existence and uniqueness (∃!):
- Not unique, since any r linearly independent columns can be selected

Algorithms:
- Basic algorithm: Householder QR with column pivoting (see the sketch below). On the k-th iteration:
  1. Find the column of largest norm in R_k[:, k:n]
  2. Swap this column with the k-th column
  3. Zero the subcolumn below the k-th diagonal entry by a Householder reflection → R_{k+1}
  Complexity: O(mnr) flops

Use cases:
- Solving rank-deficient least squares problems
- Finding a subset of linearly independent columns
- Computation of a matrix approximation of a given rank
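A sketch using scipy's column-pivoted Householder QR on a made-up rank-3 matrix; the diagonal of R reveals the numerical rank:

import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
m, n, r = 40, 30, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-3 by construction

Q, R, P = qr(A, pivoting=True)          # A[:, P] = Q @ R
print(np.abs(np.diag(R))[:6])           # |R[k,k]| drops sharply after k = r
print(P[:r])                            # indices of r linearly independent columns of A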

LU

Definition:
A = L U, where L ∈ C^{n×n} is lower triangular with unit diagonal and U ∈ C^{n×n} is upper triangular. Let det(A) ≠ 0.

Existence and uniqueness (∃!):
- LU ∃ iff all leading minors ≠ 0
- Unique if det(A) ≠ 0

Algorithms:
- Different versions of Gaussian elimination, O(n³) flops. In LU, use a permutation of rows or columns for stability (LUP)
- The O(n³) cost can be decreased for sparse matrices by appropriate permutations, e.g.
  – minimum degree ordering
  – Cuthill–McKee algorithm
- A banded matrix with bandwidth b can be decomposed in O(nb²) flops

Use cases (LU, LDL, and Cholesky):
- Solving linear systems: given A = LU, solving Ax = b costs O(n²) (see the sketch below):
  1. Forward substitution: Ly = b
  2. Backward substitution: Ux = y
- Matrix inversion
- Computation of the determinant
Cholesky is also used for:
- computing the QR decomposition
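A sketch of the factor-once, solve-cheaply pattern (example data assumed; scipy.linalg.lu returns the row-pivoted factorization A = P L U):

import numpy as np
from scipy.linalg import lu, solve_triangular

rng = np.random.default_rng(0)
n = 300
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

P, L, U = lu(A)                                  # factor once, O(n^3)
y = solve_triangular(L, P.T @ b, lower=True)     # 1. forward substitution  L y = P^T b
x = solve_triangular(U, y)                       # 2. backward substitution U x = y

print(np.linalg.norm(A @ x - b))                 # small residual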

LDL

Definition:
A = L D L*, where L ∈ C^{n×n} is lower triangular with unit diagonal and D ∈ C^{n×n} is diagonal. Let det(A) ≠ 0.

Existence and uniqueness (∃!):
- LDL ∃ iff A = A* and all leading minors ≠ 0

Cholesky

Definition:
A = L L*, where L ∈ C^{n×n} is lower triangular.

Existence and uniqueness (∃!):
- Cholesky ∃ iff A = A* and A ⪰ 0
- Unique if A ≻ 0
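A sketch for a matrix that is Hermitian positive definite by construction (example data assumed; cho_factor/cho_solve reuse the factorization for O(n²) solves):

import numpy as np
from scipy.linalg import cholesky, cho_factor, cho_solve

rng = np.random.default_rng(0)
n = 200
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)          # symmetric positive definite by construction
b = rng.standard_normal(n)

L = cholesky(A, lower=True)          # exists since A = A* and A > 0
print(np.linalg.norm(A - L @ L.T))   # reconstructs A

c, low = cho_factor(A)               # factor once, then O(n^2) per solve
x = cho_solve((c, low), b)
print(np.linalg.norm(A @ x - b))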

References
[1] G. H. Golub and C. F. Van Loan, Matrix Computations, JHU Press, 4th ed., 2013.
[2] L. N. Trefethen and D. Bau III, Numerical Linear Algebra, vol. 50, SIAM, 1997.
[3] E. E. Tyrtyshnikov, A Brief Introduction to Numerical Analysis, Springer Science & Business Media, 2012.

Contact information
Course materials: https://github.com/oseledets/nla2016
Email: [email protected]
Our research group website: oseledets.github.io