Techniques for studying correlation and covariance structure Principal Components Analysis (PCA) Factor Analysis

Upload: edwina-wilcox

Post on 23-Dec-2015


Page 1

Techniques for studying correlation and covariance structure

Principal Components Analysis (PCA)

Factor Analysis

Page 2

Principal Component Analysis

Page 3

Let $\vec{x}$ have a p-variate Normal distribution with mean vector $\vec{\mu}$ and covariance matrix $\Sigma$.

Definition:

The linear combination

$$C_1 = a_1 x_1 + \cdots + a_p x_p = \vec{a}'\vec{x}$$

is called the first principal component if $\vec{a} = (a_1, \ldots, a_p)'$ is chosen to maximize

$$\mathrm{Var}(C_1) = \mathrm{Var}(\vec{a}'\vec{x}) = \vec{a}'\Sigma\vec{a}$$

subject to $\vec{a}'\vec{a} = a_1^2 + \cdots + a_p^2 = 1$.

Page 4

Consider maximizing

$$V = \mathrm{Var}(\vec{a}'\vec{x}) = \vec{a}'\Sigma\vec{a}$$

subject to $\vec{a}'\vec{a} = a_1^2 + \cdots + a_p^2 = 1$.

Using the Lagrange multiplier technique, let

$$g(\vec{a}, \lambda) = V + \lambda(1 - \vec{a}'\vec{a}) = \vec{a}'\Sigma\vec{a} + \lambda(1 - \vec{a}'\vec{a})$$

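Page 5

Now

$$\frac{\partial g(\vec{a}, \lambda)}{\partial \lambda} = 1 - \vec{a}'\vec{a} = 0 \quad \text{if } \vec{a}'\vec{a} = 1$$

and

$$\frac{\partial g(\vec{a}, \lambda)}{\partial \vec{a}} = 2\Sigma\vec{a} - 2\lambda\vec{a} = \vec{0} \quad \text{if } \Sigma\vec{a} = \lambda\vec{a}$$

Thus $\vec{a}$ is an eigenvector of $\Sigma$ and $\lambda$ is the eigenvalue associated with $\vec{a}$.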

Page 6

Also

$$\mathrm{Var}(\vec{a}'\vec{x}) = \vec{a}'\Sigma\vec{a} = \vec{a}'(\lambda\vec{a}) = \lambda\,\vec{a}'\vec{a} = \lambda$$

Hence $\mathrm{Var}(\vec{a}'\vec{x})$ is maximized if $\lambda$ is the largest eigenvalue of $\Sigma$.

Summary

$$C_1 = a_1 x_1 + \cdots + a_p x_p = \vec{a}'\vec{x}$$

is the first principal component if $\vec{a} = (a_1, \ldots, a_p)'$ is the eigenvector of length 1 (i.e. $\vec{a}'\vec{a} = a_1^2 + \cdots + a_p^2 = 1$) of $\Sigma$ associated with the largest eigenvalue $\lambda_1$ of $\Sigma$.
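The summary above can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up 2 × 2 covariance matrix `Sigma` (hypothetical, not from the slides). It compares the variance $\vec{a}'\Sigma\vec{a}$ along many random unit vectors with the largest eigenvalue.

```python
import numpy as np

# Hypothetical covariance matrix, chosen only for illustration.
Sigma = np.array([[4.0, 2.0],
                  [2.0, 3.0]])

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(Sigma)
lam1 = eigvals[-1]          # largest eigenvalue
a1 = eigvecs[:, -1]         # unit-length eigenvector for lam1

# Variance a' Sigma a along 10,000 random unit vectors a.
rng = np.random.default_rng(0)
A = rng.normal(size=(10_000, 2))
A /= np.linalg.norm(A, axis=1, keepdims=True)
variances = np.einsum("ij,jk,ik->i", A, Sigma, A)
max_random = variances.max()

print("largest eigenvalue:       ", lam1)
print("variance along a1:        ", a1 @ Sigma @ a1)
print("best random unit vector:  ", max_random)
```

No random direction should give a variance above `lam1`, and the variance along `a1` should equal `lam1` up to rounding.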

Page 7

The complete set of Principal components

Let $\vec{x}$ have a p-variate Normal distribution with mean vector $\vec{\mu}$ and covariance matrix $\Sigma$.

Definition:

The set of linear combinations

$$\begin{aligned} C_1 &= a_{11} x_1 + \cdots + a_{1p} x_p = \vec{a}_1'\vec{x} \\ &\;\;\vdots \\ C_p &= a_{p1} x_1 + \cdots + a_{pp} x_p = \vec{a}_p'\vec{x} \end{aligned}$$

are called the principal components of $\vec{x}$ if $\vec{a}_i = (a_{i1}, \ldots, a_{ip})'$ are chosen such that

$$\vec{a}_i'\vec{a}_i = a_{i1}^2 + \cdots + a_{ip}^2 = 1$$

and

Page 8

1. Var(C1) is maximized.

2. Var(Ci) is maximized subject to Ci being independent of C1, …, Ci-1 (the previous i - 1 principal components)

Note: we have already shown that $\vec{a}_1 = (a_{11}, \ldots, a_{1p})'$ is the eigenvector of $\Sigma$ associated with the largest eigenvalue, $\lambda_1$, of the covariance matrix and

$$\mathrm{Var}(C_1) = \mathrm{Var}(\vec{a}_1'\vec{x}) = \vec{a}_1'\Sigma\vec{a}_1 = \lambda_1$$

Page 9

We will now show that $\vec{a}_i = (a_{i1}, \ldots, a_{ip})'$ is the eigenvector of $\Sigma$ associated with the ith largest eigenvalue, $\lambda_i$, of the covariance matrix and

$$\mathrm{Var}(C_i) = \mathrm{Var}(\vec{a}_i'\vec{x}) = \vec{a}_i'\Sigma\vec{a}_i = \lambda_i$$

Proof (by induction: assume true for i - 1, then prove true for i)

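Page 10

Now

$$\begin{pmatrix} C_1 \\ \vdots \\ C_i \end{pmatrix} = \begin{pmatrix} \vec{a}_1'\vec{x} \\ \vdots \\ \vec{a}_i'\vec{x} \end{pmatrix} = \begin{pmatrix} \vec{a}_1' \\ \vdots \\ \vec{a}_i' \end{pmatrix}\vec{x} = A\vec{x}$$

has covariance matrix

$$A\Sigma A' = \begin{pmatrix} \vec{a}_1' \\ \vdots \\ \vec{a}_i' \end{pmatrix}\Sigma\,(\vec{a}_1, \ldots, \vec{a}_i) = \begin{pmatrix} \vec{a}_1'\Sigma\vec{a}_1 & \cdots & \vec{a}_1'\Sigma\vec{a}_{i-1} & \vec{a}_1'\Sigma\vec{a}_i \\ \vdots & & \vdots & \vdots \\ \vec{a}_{i-1}'\Sigma\vec{a}_1 & \cdots & \vec{a}_{i-1}'\Sigma\vec{a}_{i-1} & \vec{a}_{i-1}'\Sigma\vec{a}_i \\ \vec{a}_i'\Sigma\vec{a}_1 & \cdots & \vec{a}_i'\Sigma\vec{a}_{i-1} & \vec{a}_i'\Sigma\vec{a}_i \end{pmatrix}$$

$$= \begin{pmatrix} \lambda_1 & \cdots & 0 & \lambda_1\vec{a}_1'\vec{a}_i \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & \lambda_{i-1} & \lambda_{i-1}\vec{a}_{i-1}'\vec{a}_i \\ \lambda_1\vec{a}_i'\vec{a}_1 & \cdots & \lambda_{i-1}\vec{a}_i'\vec{a}_{i-1} & \vec{a}_i'\Sigma\vec{a}_i \end{pmatrix}$$

The last step uses the induction hypothesis: for $j, k < i$ we have $\Sigma\vec{a}_j = \lambda_j\vec{a}_j$, $\vec{a}_j'\vec{a}_j = 1$ and $\vec{a}_j'\vec{a}_k = 0$ when $j \ne k$.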

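Page 11

Hence Ci is independent of C1, …, Ci-1 if

$$\vec{a}_j'\Sigma\vec{a}_i = \lambda_j\vec{a}_j'\vec{a}_i = 0 \quad \text{for } j = 1, \ldots, i-1$$

We want to maximize $\mathrm{Var}(C_i) = \vec{a}_i'\Sigma\vec{a}_i$ subject to

1. $\vec{a}_1'\vec{a}_i = 0, \ldots, \vec{a}_{i-1}'\vec{a}_i = 0$

2. $\vec{a}_i'\vec{a}_i = 1$

Let

$$g(\vec{a}_i, \lambda, \phi_1, \ldots, \phi_{i-1}) = \vec{a}_i'\Sigma\vec{a}_i + \lambda(1 - \vec{a}_i'\vec{a}_i) + \phi_1\vec{a}_1'\vec{a}_i + \cdots + \phi_{i-1}\vec{a}_{i-1}'\vec{a}_i$$

where $\lambda$ and $\phi_1, \ldots, \phi_{i-1}$ are Lagrange multipliers.

Page 12

Now

$$\frac{\partial g(\vec{a}_i, \lambda, \phi_1, \ldots, \phi_{i-1})}{\partial \phi_j} = \vec{a}_j'\vec{a}_i = 0 \quad \text{for } j = 1, \ldots, i-1$$

and

$$\frac{\partial g(\vec{a}_i, \lambda, \phi_1, \ldots, \phi_{i-1})}{\partial \lambda} = 1 - \vec{a}_i'\vec{a}_i = 0$$

Page 13

Now

$$\frac{\partial g(\vec{a}_i, \lambda, \phi_1, \ldots, \phi_{i-1})}{\partial \vec{a}_i} = 2\Sigma\vec{a}_i - 2\lambda\vec{a}_i + \phi_1\vec{a}_1 + \cdots + \phi_{i-1}\vec{a}_{i-1} = \vec{0}$$

hence

$$\Sigma\vec{a}_i = \lambda\vec{a}_i - \tfrac{1}{2}\left(\phi_1\vec{a}_1 + \cdots + \phi_{i-1}\vec{a}_{i-1}\right) \qquad (1)$$

Also for j < i

$$\vec{a}_j'\Sigma\vec{a}_i = \lambda\,\vec{a}_j'\vec{a}_i - \tfrac{1}{2}\left(\phi_1\vec{a}_j'\vec{a}_1 + \cdots + \phi_{i-1}\vec{a}_j'\vec{a}_{i-1}\right)$$

The left side is $\lambda_j\vec{a}_j'\vec{a}_i = 0$ (because $\Sigma\vec{a}_j = \lambda_j\vec{a}_j$), and on the right side only the $\phi_j$ term survives, so

$$0 = 0 - \tfrac{1}{2}\phi_j\,\vec{a}_j'\vec{a}_j$$

Hence $\phi_j = 0$ for j < i and equation (1) becomes

$$\Sigma\vec{a}_i = \lambda\vec{a}_i$$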

Page 14

Thus the ith principal component is $C_i = \vec{a}_i'\vec{x}$, where $\vec{a}_1, \ldots, \vec{a}_p$ are the eigenvectors of $\Sigma$ associated with the eigenvalues $\lambda_1 \ge \cdots \ge \lambda_p$, and

1. Var(C1) is maximized.

2. Var(Ci) is maximized subject to Ci being independent of C1, …, Ci-1 (the previous i - 1 principal components)

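Page 15

Recall that any positive definite matrix $\Sigma$ can be written as

$$\Sigma = (\vec{a}_1, \ldots, \vec{a}_p)\begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_p \end{pmatrix}\begin{pmatrix} \vec{a}_1' \\ \vdots \\ \vec{a}_p' \end{pmatrix} = PDP'$$

where $\vec{a}_1, \ldots, \vec{a}_p$ are eigenvectors of $\Sigma$ of length 1 and $\lambda_1 \ge \cdots \ge \lambda_p > 0$ are eigenvalues of $\Sigma$.

$P = (\vec{a}_1, \ldots, \vec{a}_p)$ is an orthogonal matrix ($P'P = PP' = I$).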

Page 16

Example

In this example wildlife (moose) population density was measured over time (once a year) in three areas.

Year  Area 1  Area 2  Area 3      Year  Area 1  Area 2  Area 3
   1    11.3    14.1     6.9        13     6.1     9.9     6.8
   2    10.4    14.0    11.2        14     9.7    13.2     6.6
   3     9.9    13.0     8.7        15     8.1     9.4     4.0
   4     8.2    11.4     3.3        16    11.3    11.8     4.9
   5    10.1    11.9     8.7        17     8.8    11.5     8.8
   6    10.7    13.8    12.5        18     9.4    11.6     5.7
   7    11.0    14.9     8.9        19     7.5    11.4     4.9
   8     7.1     8.5     3.7        20     8.8    10.7     7.2
   9    14.7    14.5    12.1        21     7.5    11.1     7.0
  10     5.4     9.0     4.1        22     9.1    13.2     8.9
  11     7.3     7.6     5.6        23     6.8     9.8     7.6
  12    10.2    10.9     7.3

Page 17

[Figure: picture of the data, with axes Area 1, Area 2 and Area 3]

Page 18

The Sample Statistics

The mean vector

$$\bar{x} = \begin{pmatrix} 9.10 \\ 11.62 \\ 7.19 \end{pmatrix}$$

The covariance matrix

$$S = \begin{pmatrix} 4.297 & 3.307 & 3.295 \\ 3.307 & 4.015 & 3.527 \\ 3.295 & 3.527 & 6.566 \end{pmatrix}$$

The correlation matrix

$$R = \begin{pmatrix} 1 & .796 & .620 \\ .796 & 1 & .687 \\ .620 & .687 & 1 \end{pmatrix}$$
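The sample statistics can be recomputed from the data table. This is a minimal sketch assuming NumPy; the array `X` is typed in from the table (rows are years 1 to 23, columns are Area 1, Area 2, Area 3), and the variable names are illustrative.

```python
import numpy as np

# Moose density data: rows are years 1..23, columns are Area 1, Area 2, Area 3.
X = np.array([
    [11.3, 14.1, 6.9], [10.4, 14.0, 11.2], [9.9, 13.0, 8.7],
    [8.2, 11.4, 3.3], [10.1, 11.9, 8.7], [10.7, 13.8, 12.5],
    [11.0, 14.9, 8.9], [7.1, 8.5, 3.7], [14.7, 14.5, 12.1],
    [5.4, 9.0, 4.1], [7.3, 7.6, 5.6], [10.2, 10.9, 7.3],
    [6.1, 9.9, 6.8], [9.7, 13.2, 6.6], [8.1, 9.4, 4.0],
    [11.3, 11.8, 4.9], [8.8, 11.5, 8.8], [9.4, 11.6, 5.7],
    [7.5, 11.4, 4.9], [8.8, 10.7, 7.2], [7.5, 11.1, 7.0],
    [9.1, 13.2, 8.9], [6.8, 9.8, 7.6],
])

x_bar = X.mean(axis=0)               # mean vector
S = np.cov(X, rowvar=False)          # sample covariance matrix (divisor n - 1)
R = np.corrcoef(X, rowvar=False)     # sample correlation matrix

print(np.round(x_bar, 2))
print(np.round(S, 3))
print(np.round(R, 3))
```

`rowvar=False` tells NumPy that each column, not each row, is a variable.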

Page 19

Principal component Analysis

The eigenvalues of S

$$\lambda_1 = 11.85974, \quad \lambda_2 = 2.204232, \quad \lambda_3 = 0.814249$$

The eigenvectors of S

$$\vec{a}_1 = \begin{pmatrix} .522 \\ .523 \\ .674 \end{pmatrix}, \quad \vec{a}_2 = \begin{pmatrix} .582 \\ .359 \\ -.730 \end{pmatrix}, \quad \vec{a}_3 = \begin{pmatrix} .624 \\ -.773 \\ .117 \end{pmatrix}$$

The principal components

$$\begin{aligned} C_1 &= .522\,x_1 + .523\,x_2 + .674\,x_3 \\ C_2 &= .582\,x_1 + .359\,x_2 - .730\,x_3 \\ C_3 &= .624\,x_1 - .773\,x_2 + .117\,x_3 \end{aligned}$$
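The eigenvalues and eigenvectors of S can be reproduced as follows. This is a sketch assuming NumPy. The matrix `S` is entered to three decimals, with the (2,2) entry taken as 4.015, the Area 2 sample variance computed from the data table. Eigenvectors are determined only up to sign, so the code fixes the sign by making the first entry of each vector positive (a convention chosen here, not stated in the slides).

```python
import numpy as np

# Sample covariance matrix of the three areas (three-decimal values).
S = np.array([[4.297, 3.307, 3.295],
              [3.307, 4.015, 3.527],
              [3.295, 3.527, 6.566]])

eigvals, eigvecs = np.linalg.eigh(S)     # ascending eigenvalues for symmetric S
order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
lam = eigvals[order]
A = eigvecs[:, order]                    # column i is the eigenvector a_(i+1)
A = A * np.sign(A[0, :])                 # sign convention: first entry positive

# Principal component scores for the year 1 observation from the data table.
x = np.array([11.3, 14.1, 6.9])
scores = A.T @ x                         # (C_1, C_2, C_3)

print(np.round(lam, 3))
print(np.round(A, 3))
print(np.round(scores, 3))
```

Because `S` is rounded, the output agrees with the quoted eigenvalues only to about two or three decimals.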

Page 20

[Figure: axes Area 1, Area 2 and Area 3, showing the first principal component]

$$C_1 = .522\,x_1 + .523\,x_2 + .674\,x_3$$

Page 21

[Figure: axes Area 1, Area 2 and Area 3, showing the second principal component]

$$C_2 = .582\,x_1 + .359\,x_2 - .730\,x_3$$

Page 22

[Figure: axes Area 1, Area 2 and Area 3, showing the third principal component]

$$C_3 = .624\,x_1 - .773\,x_2 + .117\,x_3$$

Page 23

Graphical Picture of Principal Components

Multivariate Normal data falls in an ellipsoidal pattern.

The shape and orientation of the ellipsoid is determined by the covariance matrix $\Sigma$.

The eigenvectors of $\Sigma$ are vectors giving the directions of the axes of the ellipsoid.

The eigenvalues give the lengths of these axes: each axis length is proportional to the square root of the corresponding eigenvalue.

Page 24

Recall that if $\Sigma$ is a positive definite matrix

$$\Sigma = \lambda_1\vec{a}_1\vec{a}_1' + \cdots + \lambda_p\vec{a}_p\vec{a}_p' = (\vec{a}_1, \ldots, \vec{a}_p)\begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_p \end{pmatrix}\begin{pmatrix} \vec{a}_1' \\ \vdots \\ \vec{a}_p' \end{pmatrix} = PDP'$$

where P is an orthogonal matrix (P'P = PP' = I) with the columns equal to the eigenvectors of $\Sigma$, and D is a diagonal matrix with diagonal elements equal to the eigenvalues of $\Sigma$.

Page 25

The vector of Principal components

$$\vec{C} = \begin{pmatrix} C_1 \\ \vdots \\ C_p \end{pmatrix} = \begin{pmatrix} \vec{a}_1'\vec{x} \\ \vdots \\ \vec{a}_p'\vec{x} \end{pmatrix} = P'\vec{x}$$

has covariance matrix

$$\Sigma_C = P'\Sigma P = P'(PDP')P = (P'P)D(P'P) = D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_p \end{pmatrix}$$

Page 26

An orthogonal matrix rotates vectors, thus

$$\vec{C} = P'\vec{x}$$

rotates the vector $\vec{x}$ into the vector of Principal components $\vec{C}$.

Also

$$\mathrm{tr}(D) = \sum_{i=1}^{p}\lambda_i = \mathrm{tr}(\Sigma_C) = \mathrm{tr}(P'\Sigma P) = \mathrm{tr}(\Sigma PP') = \mathrm{tr}(\Sigma) = \sum_{i=1}^{p}\sigma_{ii}$$

so that

$$\sum_{i=1}^{p}\mathrm{var}(C_i) = \sum_{i=1}^{p}\mathrm{var}(x_i) = \text{Total Variance of } \vec{x}$$
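Both facts, that the principal components have diagonal covariance matrix D and that the total variance is preserved, can be checked numerically. This is a sketch assuming NumPy and a made-up 3 × 3 covariance matrix `Sigma` (hypothetical values chosen for illustration).

```python
import numpy as np

# Hypothetical positive definite covariance matrix.
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])

lam, P = np.linalg.eigh(Sigma)       # columns of P are unit eigenvectors
lam, P = lam[::-1], P[:, ::-1]       # order so that lam[0] is the largest

cov_C = P.T @ Sigma @ P              # covariance matrix of C = P'x
total_variance = np.trace(Sigma)     # sum of the variances of x_1, ..., x_p

print(np.round(cov_C, 10))           # diagonal, with lam on the diagonal
print(lam.sum(), total_variance)     # the two totals agree
```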

Page 27

The ratio

$$\frac{\lambda_i}{\sum_{j=1}^{p}\lambda_j} = \frac{\lambda_i}{\sum_{j=1}^{p}\sigma_{jj}} = \frac{\mathrm{var}(C_i)}{\text{Total Variance of } \vec{x}}$$

denotes the proportion of variance explained by the ith principal component Ci.

Page 28

The Example

    i     lambda_i    % variance
    1     11.8597       79.71%
    2      2.20423      14.82%
    3      0.81425       5.47%
Total     14.8782      100%
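The percentages in the table follow directly from the ratio defined on the previous slide. A minimal sketch assuming NumPy, with the eigenvalues copied from the table:

```python
import numpy as np

# Eigenvalues of S from the example.
lam = np.array([11.8597, 2.20423, 0.81425])

proportion = lam / lam.sum()         # share of total variance per component
cumulative = np.cumsum(proportion)   # share explained by the first k components

for i, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"C{i}: {100 * p:6.2f}%   cumulative {100 * c:6.2f}%")
```

The cumulative column is not in the slide; it is added here because it is the quantity usually inspected when deciding how many components to keep.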

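Page 29

Also

$$\mathrm{cov}(C_i, x_j) = \mathrm{cov}(\vec{a}_i'\vec{x}, \vec{e}_j'\vec{x}) = \vec{a}_i'\Sigma\vec{e}_j = \vec{a}_i'\left(\lambda_1\vec{a}_1\vec{a}_1' + \cdots + \lambda_p\vec{a}_p\vec{a}_p'\right)\vec{e}_j = \lambda_i\vec{a}_i'\vec{e}_j = \lambda_i a_{ij}$$

where

$$\vec{e}_j = (0, 0, \ldots, 0, 1, 0, \ldots, 0)'$$

with the 1 in the jth position.

$$\mathrm{corr}(C_i, x_j) = \frac{\mathrm{cov}(C_i, x_j)}{\sqrt{\mathrm{Var}(C_i)\,\mathrm{Var}(x_j)}} = \frac{\lambda_i a_{ij}}{\sqrt{\lambda_i\sigma_{jj}}} = \frac{\sqrt{\lambda_i}\,a_{ij}}{\sqrt{\sigma_{jj}}}$$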
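The correlation formula can be verified against a direct computation. This is a sketch assuming NumPy and using the sample covariance matrix of the moose example in place of $\Sigma$. The (2,2) entry is taken as 4.015, the Area 2 sample variance computed from the data table.

```python
import numpy as np

S = np.array([[4.297, 3.307, 3.295],
              [3.307, 4.015, 3.527],
              [3.295, 3.527, 6.566]])

lam, P = np.linalg.eigh(S)
lam, P = lam[::-1], P[:, ::-1]       # column i of P is the eigenvector a_(i+1)
var_x = np.diag(S)                   # sigma_jj

# Direct route: cov(C_i, x_j) is entry (i, j) of P'S, then divide by the
# standard deviations of C_i and x_j.
cov_Cx = P.T @ S
corr_direct = cov_Cx / np.sqrt(np.outer(lam, var_x))

# Formula from the slide: sqrt(lambda_i) * a_ij / sqrt(sigma_jj).
corr_formula = np.sqrt(lam)[:, None] * P.T / np.sqrt(var_x)[None, :]

print(np.round(corr_formula, 3))     # row i: correlations of C_(i+1) with x_1..x_3
```

For each variable, the squared correlations with all p components add up to 1, because the components together account for all of that variable's variance.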

Page 30

Comment: If, instead of the covariance matrix $\Sigma$, the correlation matrix is used to extract the Principal components, then the Principal components are defined in terms of the standard scores of the observations:

$$z_i = \frac{x_i - \mu_i}{\sqrt{\sigma_{ii}}}$$

and

$$\mathrm{corr}(C_i, z_j) = \sqrt{\lambda_i}\,a_{ij}^{*}$$

where $\lambda_i$ and $a_{ij}^{*}$ are now the eigenvalues and eigenvector elements of the correlation matrix.

The correlation matrix is the covariance matrix of the standard scores of the observations.
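The comment can be illustrated with simulated data. This is a sketch assuming NumPy; the data `X` are randomly generated (hypothetical, with deliberately different scales per column), not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations on 3 correlated variables whose
# scales differ by factors of 10.
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
X = (rng.normal(size=(200, 3)) @ mixing) * np.array([1.0, 10.0, 100.0])

# Standard scores, using the sample standard deviation (divisor n - 1).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

R = np.corrcoef(X, rowvar=False)     # correlation matrix of X
cov_Z = np.cov(Z, rowvar=False)      # covariance matrix of the standard scores

lam, A = np.linalg.eigh(R)
lam, A = lam[::-1], A[:, ::-1]
C = Z @ A                            # principal component scores
loadings = np.sqrt(lam) * A          # loadings[j, i] = corr(C_(i+1), z_(j+1))

print(np.round(cov_Z - R, 12))       # zero matrix
print(np.round(loadings, 3))
```

With the correlation matrix the eigenvalues sum to p (here 3), so each variable contributes equally to the total variance regardless of its original units.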

Page 31

More Examples

Page 32

Example 2: Bone Lengths of White Leghorn Fowl

The correlation matrix of the complete set of six fowl bone measurements had the following form:

                Skull    Skull
                Length   Breadth  Humerus  Ulna     Femur    Tibia
Skull Length    1.000    0.584    0.615    0.601    0.570    0.600
Skull Breadth            1.000    0.576    0.530    0.526    0.555
Humerus                           1.000    0.940    0.875    0.878
Ulna                                       1.000    0.877    0.886
Femur                                               1.000    0.924
Tibia                                                        1.000

Page 33

Table: Principal Components

                              Component
Dimension                1       2       3       4       5       6
Skull:  Length         0.74    0.45    0.49   -0.02    0.01    0.00
        Breadth        0.70    0.59   -0.41    0.00    0.00   -0.01
Wing:   Humerus        0.95   -0.16   -0.03    0.22    0.05    0.16
        Ulna           0.94   -0.21    0.01    0.20   -0.04   -0.17
Leg:    Femur          0.93   -0.24   -0.04   -0.21    0.18   -0.03
        Tibia          0.94   -0.19   -0.03   -0.20   -0.19    0.04
Eigenvalue             4.568   0.714   0.412   0.173   0.076   0.057
% of Total Variance    76.1    11.9     6.9     2.9     1.3     0.9
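The eigenvalues and the first column of the table can be recomputed from the correlation matrix of Example 2. This is a sketch assuming NumPy. It also assumes that the table entries are the correlations $\sqrt{\lambda_i}\,a_{ij}$ between components and variables, which is consistent with the column sums of squares but is not stated on the slide. Signs of eigenvectors are arbitrary, so whole columns may come out with flipped sign.

```python
import numpy as np

names = ["Skull length", "Skull breadth", "Humerus", "Ulna", "Femur", "Tibia"]

# Upper triangle of the fowl correlation matrix, as given in Example 2.
U = np.array([
    [1.000, 0.584, 0.615, 0.601, 0.570, 0.600],
    [0.000, 1.000, 0.576, 0.530, 0.526, 0.555],
    [0.000, 0.000, 1.000, 0.940, 0.875, 0.878],
    [0.000, 0.000, 0.000, 1.000, 0.877, 0.886],
    [0.000, 0.000, 0.000, 0.000, 1.000, 0.924],
    [0.000, 0.000, 0.000, 0.000, 0.000, 1.000],
])
R = U + U.T - np.eye(6)              # fill in the lower triangle by symmetry

lam, A = np.linalg.eigh(R)
lam, A = lam[::-1], A[:, ::-1]       # largest eigenvalue first
loadings = A * np.sqrt(lam)          # column i: correlations of component i+1
pct = 100 * lam / lam.sum()

print(np.round(lam, 3))
print(np.round(pct, 1))
for name, row in zip(names, np.round(loadings, 2)):
    print(f"{name:14s}", row)
```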

Page 34

Identification of Components:

Component  Description
1          General average of all bone dimensions (Size)
2          Comparison of skull size with wing and leg lengths
3          Comparison of skull length and breadth (Skull Shape)
4          Comparison of wing and leg lengths
5          Comparison of femur and tibia
6          Comparison of humerus and ulna

Page 35

Example 3: Wechsler Adult Intelligence Scale Subtest Scores

Table: Principal Components

                                Component
WAIS subtest               1       2       3       4
Information              0.83    0.33   -0.04   -0.01
Comprehension            0.75    0.31    0.07   -0.17
Arithmetic               0.72    0.25   -0.08    0.35
Similarities             0.78    0.14    0.00   -0.21
Digit Span               0.62    0.00   -0.38    0.58
Vocabulary               0.83    0.38   -0.03   -0.16
Digit Symbol             0.72   -0.36   -0.26   -0.01
Picture Completion       0.78   -0.10   -0.25   -0.01
Block Design             0.72   -0.26    0.36    0.18
Picture Arrangement      0.72   -0.23    0.04   -0.05
Object Assembly          0.65   -0.30    0.47    0.13
Age                     -0.34    0.80    0.26    0.18
Years of Education       0.75    0.01   -0.30   -0.23

Eigenvalue               6.69    1.42    0.80    0.71
% of Total Variance     51.47   10.90    6.15    5.48
Cum % of Variance       51.47   62.37   68.52   74.01

Page 36

Identification of Components:

Component  Description
1          General intellectual performance
2          Experiential or age factor: a bipolar dimension comparing verbal or informational skills known to increase with advancing age to subtests measuring spatial-perceptual qualities and other cognitive abilities known to decrease with age
3          Spatial imagery or perception dimension
4          Numerical facility

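Page 37

Computation of the eigenvalues and eigenvectors of $\Sigma$

Recall:

$$\Sigma = \lambda_1\vec{a}_1\vec{a}_1' + \cdots + \lambda_p\vec{a}_p\vec{a}_p'$$

$$\Sigma^2 = \left(\lambda_1\vec{a}_1\vec{a}_1' + \cdots + \lambda_p\vec{a}_p\vec{a}_p'\right)\left(\lambda_1\vec{a}_1\vec{a}_1' + \cdots + \lambda_p\vec{a}_p\vec{a}_p'\right) = \lambda_1^2\vec{a}_1\vec{a}_1' + \cdots + \lambda_p^2\vec{a}_p\vec{a}_p'$$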

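Page 38

continuing we see that:

$$\Sigma^n = \lambda_1^n\vec{a}_1\vec{a}_1' + \cdots + \lambda_p^n\vec{a}_p\vec{a}_p' = \lambda_1^n\left[\vec{a}_1\vec{a}_1' + \left(\frac{\lambda_2}{\lambda_1}\right)^n\vec{a}_2\vec{a}_2' + \cdots + \left(\frac{\lambda_p}{\lambda_1}\right)^n\vec{a}_p\vec{a}_p'\right]$$

For large values of n

$$\Sigma^n \approx \lambda_1^n\vec{a}_1\vec{a}_1'$$

since $(\lambda_j/\lambda_1)^n \to 0$ for every $\lambda_j < \lambda_1$.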

Page 39

The algorithm for computing the eigenvector $\vec{a}_1$

1. Compute $\Sigma^2, \Sigma^4, \Sigma^8, \Sigma^{16}$, etc., rescaling so that the elements do not become too large in value, i.e. rescale so that the largest element is 1.

2. Compute $\vec{a}_1$ using the fact that $\Sigma^n \approx k\,\vec{a}_1\vec{a}_1'$ for large $n$ and $\vec{a}_1'\vec{a}_1 = 1$.

3. Compute $\lambda_1$ using $\Sigma\vec{a}_1 = \lambda_1\vec{a}_1$.

Page 40

4. Repeat using the matrix

$$\Sigma_2 = \Sigma - \lambda_1\vec{a}_1\vec{a}_1' = \lambda_2\vec{a}_2\vec{a}_2' + \cdots + \lambda_p\vec{a}_p\vec{a}_p'$$

5. Continue with i = 2, …, p - 1 using the matrix

$$\Sigma_{i+1} = \Sigma_i - \lambda_i\vec{a}_i\vec{a}_i' = \lambda_{i+1}\vec{a}_{i+1}\vec{a}_{i+1}' + \cdots + \lambda_p\vec{a}_p\vec{a}_p'$$
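Steps 1 to 5 can be sketched in code as follows. This is an illustrative implementation assuming NumPy, not the Excel procedure used in the lecture. The function names, the number of squarings (`n_squarings=20`) and the choice of which column of the squared matrix to normalize are all assumptions made here. It also assumes the eigenvalues are distinct, so that the repeated squaring converges. The input is the covariance matrix of the moose example with the (2,2) entry taken as 4.015, the Area 2 sample variance computed from the data table.

```python
import numpy as np

def leading_eigenpair(M, n_squarings=20):
    """Steps 1-3: repeated squaring with rescaling, then read off a and lambda."""
    B = np.array(M, dtype=float)
    for _ in range(n_squarings):
        B = B @ B                      # M^2, M^4, M^8, ...
        B = B / np.abs(B).max()        # rescale so the largest element is 1
    # B is now approximately k * a a', so every column is a multiple of a.
    col = B[:, np.argmax(np.abs(B).sum(axis=0))]
    a = col / np.linalg.norm(col)      # enforce a'a = 1
    lam = a @ M @ a                    # from M a = lam a and a'a = 1
    return lam, a

def eigen_by_deflation(Sigma, n_squarings=20):
    """Steps 4-5: remove each component found and repeat on what is left."""
    M = np.array(Sigma, dtype=float)
    lams, vecs = [], []
    for _ in range(M.shape[0]):
        lam, a = leading_eigenpair(M, n_squarings)
        lams.append(lam)
        vecs.append(a)
        M = M - lam * np.outer(a, a)   # Sigma_(i+1) = Sigma_i - lam_i a_i a_i'
    return np.array(lams), np.column_stack(vecs)

S = np.array([[4.297, 3.307, 3.295],
              [3.307, 4.015, 3.527],
              [3.295, 3.527, 6.566]])

lams, vecs = eigen_by_deflation(S)
print(np.round(lams, 4))
print(np.round(vecs, 3))               # column i is a_(i+1), up to sign
```

Squaring the matrix doubles the exponent at each step, so 20 squarings correspond to a power of about one million; far fewer would suffice here, and the count is kept large only to make the sketch insensitive to the eigenvalue ratios.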

Example – Using Excel - Eigen