SINGULAR VALUE DECOMPOSITION (SVD)/ PRINCIPAL COMPONENTS ANALYSIS (PCA) 1


Page 1

SINGULAR VALUE DECOMPOSITION (SVD)/ PRINCIPAL COMPONENTS ANALYSIS (PCA)

Page 2

SVD - EXAMPLE

U, S, VT = numpy.linalg.svd(img)

Page 3

SVD - EXAMPLE

[Figure: rank-k reconstructions U[:, :k] S[:k] VT[:k, :] of the image, for k = full rank, 600, 300, 100, 50, 20, 10]

Page 4

PCA - INTRODUCTION

X =
⎡ 1  2   4 ⎤
⎢ 2  1   5 ⎥
⎢ 3  4  10 ⎥
⎣ 4  3  11 ⎦

Page 5

PCA - INTRODUCTION

Page 6

PCA - INTRODUCTION

Page 7

PCA - INTRODUCTION

Page 8

PRINCIPAL COMPONENT ANALYSIS

• A technique to find the directions along which the points (a set of tuples) in high-dimensional data line up best.

• Treat the set of tuples as a matrix M and find the eigenvectors for MMᵀ or MᵀM.

• The matrix of these eigenvectors can be thought of as a rigid rotation in a high-dimensional space.

• When this transformation is applied to the original data, the axis corresponding to the principal eigenvector is the one along which the points are most "spread out".
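A minimal NumPy sketch of these bullets, on made-up data: the eigenvector matrix of MᵀM is orthogonal (a rigid rotation or reflection), so applying it preserves distances, while the spread of the rotated points is greatest along the principal eigenvector:

```python
import numpy as np

# Hypothetical set of tuples: 100 points in 3 dimensions, centered.
rng = np.random.default_rng(1)
M = rng.random((100, 3))
M = M - M.mean(axis=0)

# Eigenvectors of M^T M (symmetric, so eigh applies; eigenvalues ascending).
eigvals, eigvecs = np.linalg.eigh(M.T @ M)

# The eigenvector matrix is orthogonal: a rigid rotation/reflection.
print(np.allclose(eigvecs.T @ eigvecs, np.eye(3)))  # True

# Rotating the points preserves their lengths ...
rotated = M @ eigvecs

# ... and the variance is largest along the principal eigenvector
# (the last column, since eigh sorts eigenvalues in ascending order).
print(rotated.var(axis=0))
```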

Page 9

PRINCIPAL COMPONENT ANALYSIS

• When this transformation is applied to the original data, the axis corresponding to the principal eigenvector is the one along which the points are most "spread out".

• This axis is the one along which the variance of the data is maximized.

• Points can best be viewed as lying along this axis with small deviations from it.

• Likewise, the axis corresponding to the second eigenvector is the axis along which the variance of distances from the first axis is greatest, and so on.

Page 10

PRINCIPAL COMPONENT ANALYSIS

• Principal Component Analysis (PCA) is a dimensionality reduction method.

• The goal is to embed data from a high-dimensional space onto a small number of dimensions.

• Its most frequent use is in exploratory data analysis and visualization.

• It can also be helpful in regression (linear or logistic), where we can transform the input variables into a smaller number of predictors for modeling.

Page 11

PRINCIPAL COMPONENT ANALYSIS

• Mathematically, given a data set {x1, x2, …, xn}, where xi is the vector of p variable values for the i-th observation, return the matrix of linear transformations [ϕ1, ϕ2, …, ϕp] that retains maximal variance.

• You can think of the first vector ϕ1 as a linear transformation that embeds observations into 1 dimension:

Z1 = ϕ11 X1 + ϕ21 X2 + … + ϕp1 Xp

Page 12

PRINCIPAL COMPONENT ANALYSIS

• You can think of the first vector ϕ1 as a linear transformation that embeds observations into 1 dimension,

Z1 = ϕ11 X1 + ϕ21 X2 + … + ϕp1 Xp

where ϕ1 is selected so that the resulting data set {z1, …, zn} has maximum variance.

• In order for this to make sense mathematically, the data has to be centered: each Xi has zero mean.

• The transformation vector ϕ1 has to be normalized, i.e., ∑_{j=1}^{p} ϕ_{j1}² = 1.

Page 13

PRINCIPAL COMPONENT ANALYSIS

• In order for this to make sense mathematically, the data has to be centered: each Xi has zero mean.

• The transformation vector ϕ1 has to be normalized, i.e., ∑_{j=1}^{p} ϕ_{j1}² = 1.

• We can find ϕ1 by solving an optimization problem: maximize variance subject to the normalization constraint:

max_{ϕ_{11},…,ϕ_{p1}}  (1/n) ∑_{i=1}^{n} ( ∑_{j=1}^{p} ϕ_{j1} x_{ij} )²    s.t.  ∑_{j=1}^{p} ϕ_{j1}² = 1
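This constrained maximization has a closed-form solution: the maximizer is the top eigenvector of XᵀX (a standard fact the slides arrive at on the spectral-theorem page). A sketch on made-up centered data:

```python
import numpy as np

# Hypothetical centered data: n = 50 observations, p = 4 variables.
rng = np.random.default_rng(2)
X = rng.random((50, 4))
X = X - X.mean(axis=0)
n = X.shape[0]

# The maximizer of (1/n) sum_i (sum_j phi_j1 x_ij)^2 subject to
# sum_j phi_j1^2 = 1 is the eigenvector of X^T X with largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
phi1 = eigvecs[:, -1]            # eigh returns eigenvalues in ascending order

best = ((X @ phi1) ** 2).mean()  # the maximized objective

# No other unit vector does better:
other = rng.normal(size=4)
other /= np.linalg.norm(other)
print(best >= ((X @ other) ** 2).mean())  # True
```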

Page 14

PRINCIPAL COMPONENT ANALYSIS

• We can find ϕ1 by solving an optimization problem: maximize variance subject to the normalization constraint:

max_{ϕ_{11},…,ϕ_{p1}}  (1/n) ∑_{i=1}^{n} ( ∑_{j=1}^{p} ϕ_{j1} x_{ij} )²    s.t.  ∑_{j=1}^{p} ϕ_{j1}² = 1

• The second transformation ϕ2 is obtained similarly, with the added constraint that ϕ2 is orthogonal to ϕ1.

• Taken together, [ϕ1, ϕ2] define a pair of linear transformations of the data into 2-dimensional space:

Z_{n×2} = X_{n×p} [ϕ1, ϕ2]_{p×2}
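The two-dimensional embedding Z = X [ϕ1, ϕ2] can be sketched as follows, again on made-up centered data:

```python
import numpy as np

# Hypothetical centered data: n = 30 observations, p = 5 variables.
rng = np.random.default_rng(3)
X = rng.random((30, 5))
X = X - X.mean(axis=0)

# Top two eigenvectors of X^T X give [phi1, phi2], a p x 2 matrix.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
phi = eigvecs[:, ::-1][:, :2]    # reverse to put the largest eigenvalue first

# Z_{n x 2} = X_{n x p} [phi1, phi2]_{p x 2}
Z = X @ phi
print(Z.shape)                              # (30, 2)
print(np.allclose(phi.T @ phi, np.eye(2)))  # True: orthonormal columns
```

The orthogonality check reflects the slide's constraint that ϕ2 is orthogonal to ϕ1.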

Page 15

PRINCIPAL COMPONENT ANALYSIS

• Taken together, [ϕ1, ϕ2] define a pair of linear transformations of the data into 2-dimensional space: Z_{n×2} = X_{n×p} [ϕ1, ϕ2]_{p×2}

• Each of the columns of the Z matrix is called a principal component.

• The units of the PCs are meaningless.

• In practice we may also scale the Xj to have unit variance.

• In general, if the variables Xj are measured in different units (e.g., miles vs. liters vs. dollars), they should be scaled to have unit variance.

Page 16

SPECTRAL THEOREM

For PCA: Cov(X, X) = XXᵀ

(XᵀX)ϕ = λϕ  ⟹  XXᵀ(Xϕ) = λ(Xϕ)

Using the spectral theorem, the matrices XXᵀ and XᵀX share the same nonzero eigenvalues.

Conclusion: to get an eigenvector of XXᵀ from an eigenvector ϕ of XᵀX, multiply ϕ on the left by X.

This is very powerful, particularly if the number of observations, m, and the number of predictors, n, are drastically different in size.
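A quick numerical check of both claims on this slide, using a made-up 6×3 matrix:

```python
import numpy as np

# Hypothetical matrix with m = 6 observations and n = 3 predictors.
rng = np.random.default_rng(4)
X = rng.random((6, 3))

small = np.linalg.eigvalsh(X.T @ X)  # 3 eigenvalues of the 3x3 matrix
big = np.linalg.eigvalsh(X @ X.T)    # 6 eigenvalues of the 6x6 matrix

# The nonzero eigenvalues coincide; the 6x6 matrix just pads with zeros.
print(np.allclose(np.sort(big)[-3:], np.sort(small)))  # True

# And X maps an eigenvector phi of X^T X to an eigenvector of X X^T:
vals, vecs = np.linalg.eigh(X.T @ X)
phi = vecs[:, -1]
v = X @ phi
print(np.allclose((X @ X.T) @ v, vals[-1] * v))  # True
```

This is why one always diagonalizes the smaller of the two matrices when m and n differ greatly.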

Page 17

EXAMPLE - PCA

X =
⎡ 1  2 ⎤
⎢ 2  1 ⎥
⎢ 3  4 ⎥
⎣ 4  3 ⎦

Eigenvalues and eigenvectors?

Page 18

EXAMPLE - PCA

X =
⎡ 1  2 ⎤
⎢ 2  1 ⎥
⎢ 3  4 ⎥
⎣ 4  3 ⎦

From the spectral theorem: (XᵀX)ϕ = λϕ

XᵀX = ⎡ 1  2  3  4 ⎤ X = ⎡ 30  28 ⎤
      ⎣ 2  1  4  3 ⎦     ⎣ 28  30 ⎦

Page 19

EXAMPLE - PCA

X =
⎡ 1  2 ⎤
⎢ 2  1 ⎥
⎢ 3  4 ⎥
⎣ 4  3 ⎦

From the spectral theorem: (XᵀX)ϕ = λϕ  ⟹  ((XᵀX) − λI)ϕ = 0

det ⎡ 30−λ    28  ⎤ = 0  ⟹  λ = 58 and λ = 2
    ⎣  28   30−λ ⎦

Page 20

EXAMPLE - PCA

X =
⎡ 1  2 ⎤
⎢ 2  1 ⎥
⎢ 3  4 ⎥
⎣ 4  3 ⎦

From the spectral theorem: (XᵀX)ϕ = λϕ

⎡ 30  28 ⎤ ⎡ ϕ11 ⎤ = 58 ⎡ ϕ11 ⎤  ⟹  ϕ1 = ⎡ 1/√2 ⎤
⎣ 28  30 ⎦ ⎣ ϕ12 ⎦      ⎣ ϕ12 ⎦          ⎣ 1/√2 ⎦

Page 21

EXAMPLE - PCA

X =
⎡ 1  2 ⎤
⎢ 2  1 ⎥
⎢ 3  4 ⎥
⎣ 4  3 ⎦

From the spectral theorem: (XᵀX)ϕ = λϕ

⎡ 30  28 ⎤ ⎡ ϕ11 ⎤ = 58 ⎡ ϕ11 ⎤  ⟹  ϕ1 = ⎡ 1/√2 ⎤
⎣ 28  30 ⎦ ⎣ ϕ12 ⎦      ⎣ ϕ12 ⎦          ⎣ 1/√2 ⎦

⎡ 30  28 ⎤ ⎡ ϕ21 ⎤ = 2 ⎡ ϕ21 ⎤  ⟹  ϕ2 = ⎡ −1/√2 ⎤
⎣ 28  30 ⎦ ⎣ ϕ22 ⎦     ⎣ ϕ22 ⎦          ⎣  1/√2 ⎦

Page 22

EXAMPLE - PCA

X =
⎡ 1  2 ⎤
⎢ 2  1 ⎥
⎢ 3  4 ⎥
⎣ 4  3 ⎦

From the spectral theorem: (XᵀX)ϕ = λϕ

ϕ1 = ⎡ 1/√2 ⎤    ϕ2 = ⎡ −1/√2 ⎤    λ1 = 58,  λ2 = 2
     ⎣ 1/√2 ⎦         ⎣  1/√2 ⎦

ϕ = ⎡ 1/√2  −1/√2 ⎤
    ⎣ 1/√2   1/√2 ⎦

Page 23

EXAMPLE - PCA

ϕ1 = ⎡ 1/√2 ⎤    ϕ2 = ⎡ −1/√2 ⎤    λ1 = 58,  λ2 = 2
     ⎣ 1/√2 ⎦         ⎣  1/√2 ⎦

Z = Xϕ =
⎡ 1  2 ⎤ ⎡ 1/√2  −1/√2 ⎤   ⎡ 3/√2   1/√2 ⎤
⎢ 2  1 ⎥ ⎣ 1/√2   1/√2 ⎦ = ⎢ 3/√2  −1/√2 ⎥
⎢ 3  4 ⎥                   ⎢ 7/√2   1/√2 ⎥
⎣ 4  3 ⎦                   ⎣ 7/√2  −1/√2 ⎦
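The worked example on Pages 17-23 can be verified numerically (following the slides, X is used as-is, without mean-centering):

```python
import numpy as np

# The slides' example matrix.
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3]], dtype=float)

print(X.T @ X)                  # [[30. 28.] [28. 30.]]
vals = np.linalg.eigvalsh(X.T @ X)
print(vals)                     # [ 2. 58.]  (ascending order)

s = np.sqrt(2)
phi = np.array([[1 / s, -1 / s],
                [1 / s,  1 / s]])  # columns phi1, phi2 from the slides
Z = X @ phi
print(Z * s)  # rows: [3 1], [3 -1], [7 1], [7 -1], i.e. Z in units of 1/sqrt(2)
```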

Page 24

EXAMPLE - PCA

Z =
⎡ 3/√2   1/√2 ⎤
⎢ 3/√2  −1/√2 ⎥
⎢ 7/√2   1/√2 ⎥
⎣ 7/√2  −1/√2 ⎦

[Figure: the points (1,2), (2,1), (3,4), (4,3) plotted with the principal axes, with the midpoints (1.5, 1.5) and (3.5, 3.5) marked; along the new axes the points have coordinates (3/√2, 1/√2), (3/√2, −1/√2), (7/√2, 1/√2), (7/√2, −1/√2)]

Page 25

PCA STEPS - STEP 1 - MEAN SUBTRACTION

Page 26

PCA STEPS - STEP 2 - COVARIANCE MATRIX

[Figure: top 5 rows of the mean-centered data, and the covariance matrix]

Page 27

PCA STEPS - STEP 3 - EIGENVALUES & EIGENVECTORS OF THE COVARIANCE MATRIX

Page 28

PCA STEPS - STEP 4 - PRINCIPAL COMPONENTS

Multiply each eigenvector by (usually the square root of) its corresponding eigenvalue.

Plot them on top of the data.

Page 29

PCA STEPS - STEP 5 - PROJECT DATA ALONG THE DOMINANT PC

newData = PC1 × oldData
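The five steps above can be strung together in NumPy. The data set shown on the slides is not in the transcript, so a made-up correlated 2-D data set stands in for it:

```python
import numpy as np

# Hypothetical correlated 2-D data (the slides' own data is not reproduced).
rng = np.random.default_rng(5)
t = rng.normal(size=100)
data = np.column_stack([t, 0.8 * t + 0.2 * rng.normal(size=100)])

# Step 1: mean subtraction
centered = data - data.mean(axis=0)

# Step 2: covariance matrix
cov = np.cov(centered, rowvar=False)

# Step 3: eigenvalues and eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # sort largest-first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: scale each eigenvector by the square root of its eigenvalue
# (these are the arrows one would plot on top of the data).
arrows = eigvecs * np.sqrt(eigvals)

# Step 5: project the data along the dominant PC
new_data = centered @ eigvecs[:, 0]
print(new_data.shape)  # (100,)
```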

Page 30

HOW MANY PRINCIPAL COMPONENTS?

• How many PCs should we consider in post-hoc analysis?

• One result of PCA is a measure of the variance attributed to each PC relative to the total variance of the data set.

• We can calculate the percentage of variance explained (PVE) for the m-th PC:

PVE_m = ( ∑_{i=1}^{n} z_{im}² ) / ( ∑_{j=1}^{p} ∑_{i=1}^{n} x_{ij}² )

Page 31

HOW MANY PRINCIPAL COMPONENTS?

• We can calculate the percentage of variance explained for the m-th PC:

PVE_m = ( ∑_{i=1}^{n} z_{im}² ) / ( ∑_{j=1}^{p} ∑_{i=1}^{n} x_{ij}² )
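The PVE formula can be computed directly. On made-up centered data, the PVEs come out non-negative, in decreasing order, and summing to 1:

```python
import numpy as np

# Hypothetical centered data: n = 40 observations, p = 3 variables.
rng = np.random.default_rng(6)
X = rng.random((40, 3))
X = X - X.mean(axis=0)   # the PVE formula assumes centered data

eigvals, eigvecs = np.linalg.eigh(X.T @ X)
phi = eigvecs[:, ::-1]   # all eigenvectors, largest eigenvalue first
Z = X @ phi              # every principal component as a column

# PVE_m = sum_i z_im^2 / (sum_j sum_i x_ij^2)
pve = (Z ** 2).sum(axis=0) / (X ** 2).sum()
print(np.round(pve, 4))
print(pve.sum())         # the PCs together explain all of the variance
```

A common rule of thumb is to keep PCs until the cumulative PVE reaches some threshold, or until a scree plot of PVE_m flattens out.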