Session: Spectral Theorem, PCA & Singular Value Decomposition

Optimization and Computational Linear Algebra for Data Science

Marylou Gabrié (based on material by Léo Miolane)

Posted on 15-May-2022

TRANSCRIPT

Page 1

Session: Spectral Theorem, PCA & Singular Value Decomposition

Optimization and Computational Linear Algebra for Data Science

Marylou Gabrié (based on material by Léo Miolane)

Page 2

Midterm

The midterm exam is in one week.

Scope: Sessions � to � included; HW� to HW� included.

Knowing is not enough! You need to practice: review problems are available on last year's course webpage.

Practice is not enough! You need to know the definitions/theorems/propositions.

Past years' midterms are also available, with solutions.

Important: when working on a problem, spend at least �� minutes on it before looking at the solution (in case you are stuck).

You can bring notes, but if you think you need them for the exam, you are probably not prepared enough.

Page 3

Contents

1. The Spectral Theorem
   1.1 Theorem
   1.2 Consequences
   1.3 The Theorem behind PCA
2. Principal Component Analysis
3. Singular Value Decomposition

Page 4

1. The Spectral Theorem

Page 5

1.1 The Spectral Theorem

Theorem. Let A ∈ ℝ^(n×n) be a symmetric matrix. Then there is an orthonormal basis of ℝ^n composed of eigenvectors of A.

That means that if A is symmetric, then there exists an orthonormal basis (v_1, …, v_n) of ℝ^n and λ_1, …, λ_n ∈ ℝ such that

A v_i = λ_i v_i for all i ∈ {1, …, n}.

Theorem (Matrix formulation). Let A ∈ ℝ^(n×n) be a symmetric matrix. Then there exist an orthogonal matrix P and a diagonal matrix D, both of size n × n, such that

A = P D Pᵀ.
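The matrix formulation can be checked numerically. A minimal sketch with numpy (the random symmetric matrix is an illustration, not from the slides): `np.linalg.eigh` returns the eigenvalues and an orthogonal matrix of eigenvectors for any symmetric matrix.

```python
import numpy as np

# Sketch: verify the spectral theorem on a random symmetric matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                      # symmetrize to get A = A^T

eigvals, P = np.linalg.eigh(A)         # for symmetric A, P is orthogonal
D = np.diag(eigvals)

assert np.allclose(P @ P.T, np.eye(4))  # columns form an orthonormal basis
assert np.allclose(A, P @ D @ P.T)      # A = P D P^T
```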

Page 6

The spectral orthonormal basis

[Board sketch] For any x ∈ ℝ^n, write x = ⟨v_1, x⟩v_1 + … + ⟨v_n, x⟩v_n in the orthonormal basis of eigenvectors. Then

A x = λ_1⟨v_1, x⟩v_1 + … + λ_n⟨v_n, x⟩v_n.

Define the orthogonal matrix P = (v_1 | … | v_n) and D = Diag(λ_1, …, λ_n). One checks that P D Pᵀ x = A x for any x ∈ ℝ^n, hence A = P D Pᵀ.

Page 7

Geometric interpretation

[Board sketch illegible in the transcript.]

Page 8

1.2 Consequences

If

A = P Diag(λ_1, …, λ_n) Pᵀ

for some orthogonal matrix P, then:

Consequence #1: λ_1, …, λ_n are the only eigenvalues of A, and the number of times an eigenvalue appears on the diagonal equals its multiplicity.

Page 9

Proof sketch on an example

Consider n = 3 and

A = P Diag(3, 3, −1) Pᵀ, where P = (v_1 | v_2 | v_3) is an orthogonal matrix.

[Board sketch] A v_1 = 3 v_1, A v_2 = 3 v_2 and A v_3 = −v_3. Hence the multiplicity of λ = 3 satisfies m_3 ≥ 2, since Span(v_1, v_2) ⊆ E_3, and the multiplicity of λ = −1 satisfies m_{−1} ≥ 1, since Span(v_3) ⊆ E_{−1}. Since m_3 + m_{−1} ≤ 3 = dim ℝ³, both inequalities are equalities and there are no other eigenvalues.

Page 10

Proof sketch on an example (continued)

Page 11

1.2 Consequences

If

A = P Diag(λ_1, …, λ_n) Pᵀ

for some orthogonal matrix P, then:

Consequence #2: the rank of A equals the number of non-zero λ_i's on the diagonal:

rank(A) = #{ i | λ_i ≠ 0 }.
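A small numerical illustration of this consequence (the matrices are hypothetical, chosen for the example): build A = P Diag(λ) Pᵀ with one zero eigenvalue and compare the eigenvalue count with numpy's rank computation.

```python
import numpy as np

# Sketch: rank of a symmetric matrix = number of non-zero eigenvalues.
rng = np.random.default_rng(1)
P, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # random orthogonal matrix
lam = np.array([3.0, 3.0, -1.0, 0.0])             # exactly one zero eigenvalue
A = P @ np.diag(lam) @ P.T

rank_from_eigs = int(np.sum(np.abs(lam) > 1e-10))
assert rank_from_eigs == np.linalg.matrix_rank(A) == 3
```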

Page 12

Proof

[Board sketch] Note that ker(A) = E_0, the eigenspace associated with the eigenvalue 0. Order the eigenvalues so that λ_1, …, λ_r ≠ 0 and λ_{r+1} = … = λ_n = 0. The vectors v_{r+1}, …, v_n lie in ker(A) and are linearly independent (the family is orthonormal), so Span(v_{r+1}, …, v_n) ⊆ ker(A). Conversely, any x ∈ ker(A) writes x = α_1 v_1 + … + α_n v_n with 0 = A x = α_1 λ_1 v_1 + … + α_r λ_r v_r, forcing α_1 = … = α_r = 0, hence x ∈ Span(v_{r+1}, …, v_n). Therefore dim ker(A) = n − r and rank(A) = r.

Page 13

1.2 Consequences

If

A = P Diag(λ_1, …, λ_n) Pᵀ

for some orthogonal matrix P, then:

Consequence #3: A is invertible if and only if λ_i ≠ 0 for all i. In that case

A⁻¹ = P Diag(1/λ_1, …, 1/λ_n) Pᵀ.

Page 14

Proof

Exercise.

Page 15

1.2 Consequences

If

A = P Diag(λ_1, …, λ_n) Pᵀ

for some orthogonal matrix P, then:

Consequence #4: Tr(A) = λ_1 + … + λ_n.

[Board sketch] For A ∈ ℝ^(n×n), Tr(A) = Σ_i A_{ii}. If B ∈ ℝ^(n×m) and C ∈ ℝ^(m×n), then Tr(BC) = Tr(CB). Hence

Tr(A) = Tr(P D Pᵀ) = Tr(D Pᵀ P) = Tr(D) = λ_1 + … + λ_n.
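The trace identity is easy to sanity-check numerically (the random matrix is an illustration, not from the slides):

```python
import numpy as np

# Sketch: the trace of a symmetric matrix equals the sum of its eigenvalues.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2

assert np.isclose(np.trace(A), np.linalg.eigvalsh(A).sum())
```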

Page 16

1.3 The Theorem behind PCA

Theorem. Let A be an n × n symmetric matrix and let λ_1 ≥ … ≥ λ_n be its n eigenvalues and v_1, …, v_n an associated orthonormal family of eigenvectors. Then

λ_1 = max_{‖v‖=1} vᵀAv and v_1 = arg max_{‖v‖=1} vᵀAv.

Moreover, for k = 2, …, n:

λ_k = max_{‖v‖=1, v⊥v_1,…,v_{k−1}} vᵀAv and v_k = arg max_{‖v‖=1, v⊥v_1,…,v_{k−1}} vᵀAv.

[Board note] For instance, for k = 2: λ_2 = max_{‖v‖=1, v⊥v_1} vᵀAv and v_2 = arg max_{‖v‖=1, v⊥v_1} vᵀAv.
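The first claim of the theorem can be probed numerically: a minimal sketch (random test matrix and sampling loop are illustrative assumptions) checking that v_1 attains λ_1 and that no random unit vector beats it.

```python
import numpy as np

# Sketch: the top eigenvalue maximizes the quadratic form v^T A v over unit v.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2

eigvals, V = np.linalg.eigh(A)        # eigenvalues in ascending order
lam1, v1 = eigvals[-1], V[:, -1]      # largest eigenvalue and its eigenvector

assert np.isclose(v1 @ A @ v1, lam1)  # v1 attains the value lam1
for _ in range(1000):                 # no random unit vector does better
    v = rng.standard_normal(5)
    v /= np.linalg.norm(v)
    assert v @ A @ v <= lam1 + 1e-9
```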

Page 17

Proof

[Board sketch] Let u ∈ ℝ^n with ‖u‖ = 1, and let (α_1, …, α_n) be the coordinates of u in the basis (v_1, …, v_n), i.e. u = α_1 v_1 + … + α_n v_n. Then

A u = α_1 λ_1 v_1 + … + α_n λ_n v_n,

and, since the basis is orthonormal,

uᵀAu = λ_1 α_1² + … + λ_n α_n².

We therefore maximize uᵀAu subject to ‖u‖ = 1.

Page 18

Proof (continued)

[Board sketch] Maximize λ_1 α_1² + … + λ_n α_n² subject to α_1² + … + α_n² = 1. Since λ_1 ≥ … ≥ λ_n, the maximum is achieved for α_1 = ±1, α_2 = … = α_n = 0. This corresponds to u = ±v_1, for which uᵀAu = λ_1.

Page 19

Proof (continued)

[Board sketch] If now we want to maximize uᵀAu subject to ‖u‖ = 1 and u ⊥ v_1, the same argument with α_1 = 0 gives the maximum λ_2, achieved at u = ±v_2; iterating yields the remaining λ_k and v_k.

Page 20

2. Principal Component Analysis

Page 21

Empirical mean and covariance

We are given a dataset of n points a_1, …, a_n ∈ ℝ^d.

Case d = 1:

Mean: μ = (1/n) Σ_{i=1}^n a_i ∈ ℝ.

Variance: σ² = (1/n) Σ_{i=1}^n (a_i − μ)² ∈ ℝ.

Case d ≥ 2:

Mean: μ = (1/n) Σ_{i=1}^n a_i ∈ ℝ^d.

Covariance matrix: S = (1/n) Σ_{i=1}^n (a_i − μ)(a_i − μ)ᵀ ∈ ℝ^(d×d), which equals (1/n) Σ_{i=1}^n a_i a_iᵀ if μ = 0.

Page 22

Empirical mean and covariance (same slide, with board annotation)

[Board sketch] S = (1/n) Σ_{i=1}^n (a_i − μ)(a_i − μ)ᵀ ∈ ℝ^(d×d).
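These definitions translate directly into numpy. A minimal sketch (the random dataset is an illustrative assumption), cross-checked against numpy's own covariance routine:

```python
import numpy as np

# Sketch: empirical mean and covariance of n points a_1, ..., a_n in R^d.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))          # n = 100 points, d = 3; rows are the a_i

mu = A.mean(axis=0)                        # mean, a vector in R^d
centered = A - mu
S = centered.T @ centered / len(A)         # (1/n) * sum (a_i - mu)(a_i - mu)^T

assert S.shape == (3, 3)
assert np.allclose(S, S.T)                     # covariance matrices are symmetric
assert np.allclose(S, np.cov(A.T, bias=True))  # matches numpy's 1/n covariance
```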

Page 23

PCA

We are given a dataset of n points a_1, …, a_n ∈ ℝ^d, where d is "large".

Goal: represent this dataset in lower dimension, i.e. find ã_1, …, ã_n ∈ ℝ^k where k ≪ d.

Assume that the dataset is centered: Σ_{i=1}^n a_i = 0.

Then S can simply be written as

S = Σ_{i=1}^n a_i a_iᵀ = AᵀA,

where A is the n × d "data matrix" (also called the design matrix), whose rows are a_1ᵀ, …, a_nᵀ.

Page 24

Direction of maximal variance

Page 25

Direction of maximal variance

[Board sketch] For a unit vector v, the variance of the dataset in direction v is (up to the 1/n factor, which does not change the maximizer) the sum of the squared projections ⟨v, a_i⟩. Since the data is centered,

Σ_{i=1}^n ⟨v, a_i⟩² = Σ_{i=1}^n vᵀ a_i a_iᵀ v = vᵀ (Σ_{i=1}^n a_i a_iᵀ) v = vᵀSv ∈ ℝ.

Page 26

Direction of maximal variance

Good news: S = AᵀA is symmetric.

Spectral Theorem: let λ_1 ≥ λ_2 ≥ … ≥ λ_n be the eigenvalues of S and (v_1, …, v_n) an associated orthonormal basis of eigenvectors.

[Board sketch] S = AᵀA is positive semi-definite, so λ_1 ≥ λ_2 ≥ … ≥ λ_n ≥ 0. By the previous theorem (1.3), vᵀSv is maximized over unit vectors by v = v_1. The one-dimensionally reduced dataset is then ã_1 = ⟨v_1, a_1⟩, …, ã_n = ⟨v_1, a_n⟩, each in ℝ.

Page 27

2nd direction of maximal variance

[Board sketch] By the theorem, v_2 maximizes vᵀSv subject to ‖v‖ = 1 and v ⊥ v_1. A 2-d reduction of the dataset a_1, …, a_n, each in ℝ^d, is then

ã_i = (⟨v_1, a_i⟩, ⟨v_2, a_i⟩) ∈ ℝ², for i = 1, …, n.

Page 28

jth direction of maximal variance

The "jth direction of maximal variance" is v_j, since v_j is a solution of

maximize vᵀSv, subject to ‖v‖ = 1, v ⊥ v_1, v ⊥ v_2, …, v ⊥ v_{j−1}.

The dimensionally reduced dataset in k dimensions is then

ã_1 = (⟨v_1, a_1⟩, ⟨v_2, a_1⟩, …, ⟨v_k, a_1⟩), ã_2 = (⟨v_1, a_2⟩, …, ⟨v_k, a_2⟩), …, ã_n = (⟨v_1, a_n⟩, …, ⟨v_k, a_n⟩).

Page 29

Recap

How to compute the reduced-dimensional dataset?

[Board sketch]
1. Center the data so that Σ_i a_i = 0.
2. Compute the covariance S = Σ_i a_i a_iᵀ = AᵀA.
3. Compute the eigendecomposition of S, with eigenvalues sorted λ_1 ≥ λ_2 ≥ … and eigenvectors v_1, v_2, ….
4. Keep the k largest.
5. Compute ã_i = (⟨v_1, a_i⟩, …, ⟨v_k, a_i⟩).
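The recap above can be sketched end-to-end in a few lines of numpy (function name and test data are illustrative, not from the course):

```python
import numpy as np

# Sketch of the recap: PCA via the eigendecomposition of the covariance matrix.
def pca_reduce(A: np.ndarray, k: int) -> np.ndarray:
    """Reduce the rows of A (n points in R^d) to k dimensions."""
    A = A - A.mean(axis=0)             # 1. center the data
    S = A.T @ A                        # 2. covariance matrix S = A^T A
    eigvals, V = np.linalg.eigh(S)     # 3. eigendecomposition (ascending order)
    Vk = V[:, ::-1][:, :k]             # 4. keep the k largest eigenvalues' vectors
    return A @ Vk                      # 5. coordinates <v_j, a_i>

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
reduced = pca_reduce(A, 2)
assert reduced.shape == (50, 2)
```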

Page 30

Which value of k should we take?

[Board sketch] 1st way: keep a given fraction of the total variance. For instance, choose the smallest k such that

(λ_1 + … + λ_k) / (λ_1 + … + λ_d) ≥ 0.8.
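This rule is a one-liner with a cumulative sum. A minimal sketch (the helper name and example eigenvalues are illustrative):

```python
import numpy as np

# Sketch: smallest k whose leading eigenvalues capture >= 80% of total variance.
def choose_k(eigvals_desc: np.ndarray, threshold: float = 0.8) -> int:
    frac = np.cumsum(eigvals_desc) / eigvals_desc.sum()
    return int(np.searchsorted(frac, threshold) + 1)

lam = np.array([5.0, 3.0, 1.0, 0.5, 0.5])   # eigenvalues, sorted decreasing
assert choose_k(lam) == 2                    # (5 + 3) / 10 = 0.8 >= 0.8
```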

Page 31

Which value of k should we take?

[Board sketch] 2nd way: plot the eigenvalues of S and look for a gap. For instance, a gap between λ_2 and λ_3 suggests keeping k = 2.

Page 32

3. Singular Value Decomposition

[Board sketch: scatter plot of centered data a_1, …, a_n.]

Page 33

PCA

Data matrix A ∈ ℝ^(n×m).

"Covariance matrix" S = AᵀA ∈ ℝ^(m×m).

S is symmetric positive semi-definite.

Spectral Theorem: there exists an orthonormal basis v_1, …, v_m of ℝ^m such that the v_i's are eigenvectors of S associated with the eigenvalues λ_1 ≥ … ≥ λ_m ≥ 0.

Page 34

Singular values/vectors

For i = 1, …, m:

- we define σ_i = √λ_i, called the ith singular value of A;
- we call v_i the ith right singular vector of A.

For i = 1, …, r (where r = rank(A)):

- we call u_i = (1/σ_i) A v_i the ith left singular vector of A.

If r < n, we add u_{r+1}, …, u_n such that u_1, …, u_n is an orthonormal basis of ℝ^n.

Page 35

Singular Value Decomposition

Theorem. Let A ∈ ℝ^(n×m). Then there exist two orthogonal matrices U ∈ ℝ^(n×n) and V ∈ ℝ^(m×m) and a matrix Σ ∈ ℝ^(n×m) such that Σ_{1,1} ≥ Σ_{2,2} ≥ … ≥ 0 and Σ_{i,j} = 0 for i ≠ j, that verify

A = U Σ Vᵀ.
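The theorem, and its link to the eigenvalues of S = AᵀA from the previous slide, can be verified with numpy's SVD routine (the random matrix is an illustrative assumption):

```python
import numpy as np

# Sketch: np.linalg.svd computes A = U Sigma V^T, and the squared singular
# values are the eigenvalues of S = A^T A.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A)                  # s: singular values, descending
Sigma = np.zeros((6, 4))
np.fill_diagonal(Sigma, s)

assert np.allclose(A, U @ Sigma @ Vt)        # A = U Sigma V^T
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]  # eigenvalues of A^T A, descending
assert np.allclose(s**2, eigvals)            # sigma_i = sqrt(lambda_i)
```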

Page 36

Geometric interpretation of U Σ Vᵀ

Page 37

Questions?

[Board sketch] For centered data points a_1, …, a_n ∈ ℝ^d (here i is the data-point index):

- variance of the 1st coordinate: (1/n) Σ_i (a_{i,1})²;
- covariance of the 1st and 2nd coordinates: (1/n) Σ_i a_{i,1} a_{i,2};
- in general, the covariance of the kth and ℓth features is S_{k,ℓ} = (1/n) Σ_i a_{i,k} a_{i,ℓ}.

Page 38

Questions?