principal component analysis principles and application

Post on 22-Dec-2015


Principal Component Analysis: Principles and Application

Fast Computers, Multi-Sensor Instruments, Large Data Sets

Examples:

• Satellite Data

• Digital Camera, Video Data

• Tomography

• Particle Imaging Velocimetry (PIV)

• Ultrasound Velocimetry (UVP)

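Large Data Sets

Low resolution image, stored as the array of values $p(x_i, y_j)$:

$$
\begin{pmatrix}
p(x_1, y_1) & p(x_1, y_2) & \cdots & p(x_1, y_{600}) \\
p(x_2, y_1) & p(x_2, y_2) & \cdots & p(x_2, y_{600}) \\
\vdots & \vdots & & \vdots \\
p(x_{400}, y_1) & p(x_{400}, y_2) & \cdots & p(x_{400}, y_{600})
\end{pmatrix}
$$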

• There are 400 x 600 = 240,000 pieces of information.

• Not all of this information is independent => information compression (data compression)

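Example 1. Two-Component Velocity Measurement

Experiment:

• Consider the flow past a cylinder, and suppose we position a cross-wire probe downstream of the cylinder.

• With a cross-wire probe we can measure two components of the velocity at successive time intervals and store the results in a computer:

$$
\begin{pmatrix} u_1 \\ v_1 \end{pmatrix},\;
\begin{pmatrix} u_2 \\ v_2 \end{pmatrix},\; \ldots,\;
\begin{pmatrix} u_j \\ v_j \end{pmatrix},\; \ldots,\;
\begin{pmatrix} u_m \\ v_m \end{pmatrix}
\qquad (\text{time} \rightarrow)
$$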

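Mathematical Representation of Data

• As the previous slide suggests, the pair of velocities can be represented as a column vector:

$$
\mathbf{u}_j = \begin{pmatrix} u_j \\ v_j \end{pmatrix}
$$

• $\mathbf{u}$ is a vector at position $\mathbf{x}$ in physical space:

[Figure: the vector u drawn at the point x in the (x, y) plane.]

• The magnitude and angle of the vector change with time.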

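Basic Statistics

• Mean velocity:

$$
\bar{\mathbf{u}} = \begin{pmatrix} \bar{u} \\ \bar{v} \end{pmatrix},
\quad \text{where the bar means} \quad
\bar{u} = \frac{1}{m} \sum_{j=1}^{m} u_j
$$

• Variance:

$$
\mathrm{Var}(u) = \sigma_u^2 = \frac{1}{m} \sum_{j=1}^{m} (u_j - \bar{u})^2,
\qquad
\mathrm{Var}(v) = \sigma_v^2 = \frac{1}{m} \sum_{j=1}^{m} (v_j - \bar{v})^2
$$

• Covariance:

$$
\mathrm{cov}(u, v) = \frac{1}{m} \sum_{j=1}^{m} (u_j - \bar{u})(v_j - \bar{v}),
\qquad
\mathrm{cov}(v, u) = \mathrm{cov}(u, v)
$$

• Correlation:

$$
\rho_{uv} = \frac{\mathrm{cov}(u, v)}{\sigma_u \sigma_v},
\qquad -1 \le \rho_{uv} \le 1
$$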
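The four statistics on this slide can be sketched in NumPy as follows. This is only an illustration: the function name `basic_statistics`, the dictionary keys and the synthetic `u`, `v` samples are assumptions made for the example, not part of the original slides. The 1/m normalisation is the one used in the formulas.

```python
import numpy as np

def basic_statistics(u, v):
    """Mean, variance, covariance and correlation using the 1/m normalisation."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u_bar, v_bar = u.mean(), v.mean()
    var_u = np.mean((u - u_bar) ** 2)             # sigma_u^2
    var_v = np.mean((v - v_bar) ** 2)             # sigma_v^2
    cov_uv = np.mean((u - u_bar) * (v - v_bar))   # cov(u, v) = cov(v, u)
    rho_uv = cov_uv / np.sqrt(var_u * var_v)      # correlation, between -1 and 1
    return {"u_bar": u_bar, "v_bar": v_bar, "var_u": var_u,
            "var_v": var_v, "cov_uv": cov_uv, "rho_uv": rho_uv}

# Synthetic stand-in for a two-component velocity record (hypothetical data).
rng = np.random.default_rng(0)
u = rng.normal(1.0, 0.5, size=2000)
v = 0.8 * u + rng.normal(0.0, 0.2, size=2000)

stats = basic_statistics(u, v)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Because `v` is built from `u` plus a little noise, the correlation comes out close to 1, which is the situation the scatter plot on the next slide describes.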

Plot u vs v

[Figure: scatter plot of the measured pairs $(u_j, v_j)$, $j = 1, \ldots, m$, in the (u, v) plane.]

The data look correlated

Examine the Statistics

Move to a data centered coordinate system

[Figure: the scatter plot in the (u, v) plane with new axes u', v' whose origin is at the mean $(\bar{u}, \bar{v})$.]

In the centered coordinates, $u'_i = u_i - \bar{u}$ and $v'_i = v_i - \bar{v}$.

Calculate the Covariance matrix

$$
\begin{pmatrix}
\dfrac{1}{m} \sum_{i=1}^{m} u_i'^{\,2} & \dfrac{1}{m} \sum_{i=1}^{m} u'_i v'_i \\[2ex]
\dfrac{1}{m} \sum_{i=1}^{m} v'_i u'_i & \dfrac{1}{m} \sum_{i=1}^{m} v_i'^{\,2}
\end{pmatrix}
$$

• Diagonal terms are the variances in the u' and v' directions.

• Off-diagonal terms are the covariance or cross-correlation.

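Rotate coordinates to remove the correlations

[Figure: the scatter plot in the (u, v) plane with rotated axes u'' and v'' aligned with the two principal directions of the data.]

Covariance matrix in the (u'', v'') coordinate system:

$$
\frac{1}{m}
\begin{pmatrix}
\sum_{i=1}^{m} u_i''^{\,2} & 0 \\
0 & \sum_{i=1}^{m} v_i''^{\,2}
\end{pmatrix}
$$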
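A minimal NumPy sketch of this centering and rotation, under stated assumptions: the rotation is taken from the eigenvectors of the 2 x 2 covariance matrix (computed with `numpy.linalg.eigh`), and the function name `principal_axis_transform` and the synthetic data are hypothetical.

```python
import numpy as np

def principal_axis_transform(u, v):
    """Rotate centered (u, v) data onto the eigenvectors of its covariance matrix."""
    data = np.vstack([u, v]).astype(float)               # 2 x m
    centered = data - data.mean(axis=1, keepdims=True)   # (u', v')
    m = centered.shape[1]
    cov = centered @ centered.T / m                      # covariance in (u', v')
    eigvals, eigvecs = np.linalg.eigh(cov)               # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                    # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    rotated = eigvecs.T @ centered                       # (u'', v'')
    cov_rotated = rotated @ rotated.T / m                # covariance in (u'', v'')
    return cov, cov_rotated, eigvals

# Hypothetical correlated data standing in for the measured velocities.
rng = np.random.default_rng(0)
u = rng.normal(2.0, 1.0, size=3000)
v = 0.5 * u + rng.normal(0.0, 0.3, size=3000)

cov, cov_rotated, eigvals = principal_axis_transform(u, v)
print("covariance in (u', v'):\n", np.round(cov, 4))
print("covariance in (u'', v''):\n", np.round(cov_rotated, 4))
```

In the rotated coordinates the off-diagonal entries vanish to rounding error, and the diagonal entries equal the eigenvalues of the original covariance matrix.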

We have just carried out a Principal Axis Transformation.

This is the first step in a Principal Component Analysis (PCA).

Principal Component Analysis

A procedure for transforming a set of correlated variables into a new set of uncorrelated variables.

How do we do it?

Construction of the PCA coordinate system

The PCA coordinate system is one that maximizes the mean squared projection of the data. In this sense it is an “optimal” orthogonal coordinate system. Its popularity is primarily due to its dimension reducing properties.

The basic algorithm for constructing the PCA eigenvectors is:

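• Find the best direction (line) in the space, $\boldsymbol{\phi}_1$.

• Find the best direction (line) $\boldsymbol{\phi}_2$ with the restriction that it must be orthogonal to $\boldsymbol{\phi}_1$.

• Find the best direction (line) $\boldsymbol{\phi}_i$ with the restriction that $\boldsymbol{\phi}_i$ is orthogonal to $\boldsymbol{\phi}_j$ for all $j < i$.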
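To make "best direction" concrete, the sketch below does a brute-force search in two dimensions for the unit vector that maximizes the mean squared projection, then compares it with the leading eigenvector of the covariance matrix. The angle grid, the synthetic data and the helper name `mean_squared_projection` are assumptions made for illustration. In two dimensions the second direction is fixed by orthogonality, so no second search is needed.

```python
import numpy as np

def mean_squared_projection(centered, direction):
    """Mean squared projection of the columns of `centered` onto a direction."""
    direction = direction / np.linalg.norm(direction)
    return np.mean((direction @ centered) ** 2)

# Hypothetical correlated 2-D data, centered on its mean.
rng = np.random.default_rng(1)
m = 2000
u = rng.normal(size=m)
v = 0.6 * u + 0.3 * rng.normal(size=m)
centered = np.vstack([u - u.mean(), v - v.mean()])        # 2 x m

# Step 1: scan unit vectors over half a turn and keep the best one.
angles = np.linspace(0.0, np.pi, 721)                     # 0.25 degree steps
candidates = np.vstack([np.cos(angles), np.sin(angles)])  # 2 x 721
scores = np.mean((candidates.T @ centered) ** 2, axis=1)
best = candidates[:, np.argmax(scores)]

# Step 2: in 2-D the only direction orthogonal to `best` is its 90 degree rotation.
second = np.array([-best[1], best[0]])

# Compare with the eigenvectors of the covariance matrix (ascending eigenvalues).
eigvals, eigvecs = np.linalg.eigh(centered @ centered.T / m)
leading = eigvecs[:, -1]

print("alignment with leading eigenvector:", abs(best @ leading))
print("best score vs largest eigenvalue:", scores.max(), eigvals[-1])
print("second score vs smallest eigenvalue:",
      mean_squared_projection(centered, second), eigvals[0])
```

The search and the eigenvector agree up to the resolution of the angle grid, which is why the eigenvalue problem can replace the direction-by-direction search.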

How do we find this nice coordinate system?

Calculate the eigenvalues and eigenvectors of the Covariance Matrix

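Example 2. Velocity Profile Measurement

Experiment:

• Pipe flow: measurement of the velocity profile.

[Figure: velocity profile u(z) plotted against z.]

$$
\mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix},
\quad \text{where } u_k = u(z_k)
$$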

Vectors in Profile Space

• As before, we represent the velocities in the form of a column vector, but this time the vector is not in physical space.

• The space in which our vector lives is one we shall call profile space or pattern space.

• Profile space has n dimensions. In this example, the position $z_k$ defines a direction in profile space.

• As time evolves, we measure a sequence of velocity profiles:

$$
\begin{pmatrix} u(z_1, t_1) \\ u(z_2, t_1) \\ \vdots \\ u(z_n, t_1) \end{pmatrix},\;
\begin{pmatrix} u(z_1, t_2) \\ u(z_2, t_2) \\ \vdots \\ u(z_n, t_2) \end{pmatrix},\; \ldots,\;
\begin{pmatrix} u(z_1, t_j) \\ u(z_2, t_j) \\ \vdots \\ u(z_n, t_j) \end{pmatrix},\; \ldots,\;
\begin{pmatrix} u(z_1, t_m) \\ u(z_2, t_m) \\ \vdots \\ u(z_n, t_m) \end{pmatrix}
\qquad (\text{time} \rightarrow)
$$

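The Preliminary Calculations

1. UVP Data Matrix (n x m = 128 x 1024)

$$
\mathbf{U} =
\begin{pmatrix}
u(z_1, t_1) & u(z_1, t_2) & \cdots & u(z_1, t_m) \\
u(z_2, t_1) & u(z_2, t_2) & \cdots & u(z_2, t_m) \\
\vdots & \vdots & & \vdots \\
u(z_n, t_1) & u(z_n, t_2) & \cdots & u(z_n, t_m)
\end{pmatrix}
$$

2. Mean Profile Matrix (n x m)

$$
\bar{\mathbf{U}} =
\begin{pmatrix}
\bar{u}(z_1) & \bar{u}(z_1) & \cdots & \bar{u}(z_1) \\
\bar{u}(z_2) & \bar{u}(z_2) & \cdots & \bar{u}(z_2) \\
\vdots & \vdots & & \vdots \\
\bar{u}(z_n) & \bar{u}(z_n) & \cdots & \bar{u}(z_n)
\end{pmatrix},
\qquad
\bar{u}(z_i) = \frac{1}{m} \sum_{k=1}^{m} u(z_i, t_k)
$$

3. Centered Data Matrix (n x m)

$$
\mathbf{X} = \frac{1}{\sqrt{m}} \left( \mathbf{U} - \bar{\mathbf{U}} \right)
$$

4. Covariance Matrix (n x n = 128 x 128)

$$
\mathbf{R} =
\underbrace{\frac{1}{\sqrt{m}} \left( \mathbf{U} - \bar{\mathbf{U}} \right)}_{\mathbf{X}}\,
\underbrace{\frac{1}{\sqrt{m}} \left( \mathbf{U} - \bar{\mathbf{U}} \right)^{T}}_{\mathbf{X}^{T}},
\qquad
\mathbf{R} = \mathbf{X}\mathbf{X}^{T}
$$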
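The four preliminary steps can be sketched in NumPy as below. This assumes the centered matrix carries the factor 1/sqrt(m), so that the covariance matrix is simply R = X X^T. The function name `preliminary_calculations` and the random stand-in for the 128 x 1024 UVP record are hypothetical.

```python
import numpy as np

def preliminary_calculations(U):
    """Return the mean profile matrix, centered data matrix and covariance matrix."""
    U = np.asarray(U, dtype=float)                 # step 1: n x m data matrix
    n, m = U.shape
    mean_profile = U.mean(axis=1, keepdims=True)   # time average at each position z_i
    U_bar = np.repeat(mean_profile, m, axis=1)     # step 2: n x m, identical columns
    X = (U - U_bar) / np.sqrt(m)                   # step 3: centered and scaled
    R = X @ X.T                                    # step 4: n x n covariance matrix
    return U_bar, X, R

# Synthetic stand-in for a UVP record: n = 128 positions, m = 1024 time samples.
rng = np.random.default_rng(2)
n, m = 128, 1024
U = rng.normal(size=(n, m))

U_bar, X, R = preliminary_calculations(U)
print(U_bar.shape, X.shape, R.shape)
```

Note the size reduction: the eigenvalue problem is posed on the 128 x 128 matrix R rather than on the 128 x 1024 data matrix.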

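The Diagonalization

Eigenvalue Equation

$$
\left( \mathbf{R} - \lambda_k \mathbf{I} \right) \boldsymbol{\phi}_k = \mathbf{0},
\quad k = 1, \ldots, n,
\qquad \text{or in matrix form} \qquad
\mathbf{R}\boldsymbol{\Phi} = \boldsymbol{\Phi}\boldsymbol{\lambda}
$$

Eigenvalues

$$
\boldsymbol{\lambda} =
\begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix}
$$

Eigenvectors (eigenprofiles)

$$
\boldsymbol{\Phi} =
\begin{pmatrix} \boldsymbol{\phi}_1 & \cdots & \boldsymbol{\phi}_n \end{pmatrix} =
\begin{pmatrix}
\phi_{11} & \cdots & \phi_{1n} \\
\vdots & & \vdots \\
\phi_{n1} & \cdots & \phi_{nn}
\end{pmatrix},
\qquad
\boldsymbol{\phi}_i^{T} \boldsymbol{\phi}_k =
\begin{cases}
1, & i = k \\
0, & i \neq k
\end{cases}
$$

Note: $\lambda_k = \sigma_k^2$ is the variance of the data in the direction $\boldsymbol{\phi}_k$.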
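The diagonalization and the note on variance can be checked numerically. The sketch below uses `numpy.linalg.eigh`, which is suited to symmetric matrices such as a covariance matrix, and sorts the modes by decreasing eigenvalue (an ordering convention assumed here). The names `diagonalise`, `lam`, `Phi` and the synthetic profiles are hypothetical.

```python
import numpy as np

def diagonalise(R):
    """Eigenvalues (descending) and matching eigenvectors of a symmetric matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Synthetic correlated profiles: n = 16 positions, m = 400 time samples.
rng = np.random.default_rng(3)
n, m = 16, 400
mixing = rng.normal(size=(n, n))
U = mixing @ rng.normal(size=(n, m))

X = (U - U.mean(axis=1, keepdims=True)) / np.sqrt(m)   # centered, scaled by 1/sqrt(m)
R = X @ X.T
lam, Phi = diagonalise(R)

# Eigenvectors are orthonormal: Phi^T Phi = I.
orthonormal = np.allclose(Phi.T @ Phi, np.eye(n))
# In the eigenvector basis the covariance matrix is diagonal.
diagonal = np.allclose(Phi.T @ R @ Phi, np.diag(lam))
# Variance of the data along each eigenprofile; X already contains the 1/sqrt(m).
variances = np.sum((Phi.T @ X) ** 2, axis=1)

print(orthonormal, diagonal, np.allclose(variances, lam))
```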

Example 3. Taylor-Couette Flow

UVP Example

[Figure: UVP data plotted in the space-time plane.]

Covariance Matrix

[Figure: the covariance matrix (space x space) before and after diagonalisation.]

compression!!

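The Eigenvalue Spectrum: (Signal) Energy Spectrum

Energy Fraction

$$
E_k = \frac{\lambda_k}{\sum_{j=1}^{n} \lambda_j},
\qquad
\sum_{k=1}^{n} E_k = 1
$$

[Figure: $E_k$ against mode number (1 to 128), vertical axis from 0 to 1.]

[Figure: cumulative sum of $E_k$ against mode number (1 to 20), vertical axis from 0 to 1.]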
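A short sketch of the energy fraction and its cumulative sum. The helper `modes_for_energy` and the 90% threshold are illustrative assumptions; the slides do not state a cut-off rule. The eigenvalues are made up so that the arithmetic is easy to follow.

```python
import numpy as np

def energy_fractions(eigenvalues):
    """Energy fraction E_k of each mode and its cumulative sum, largest mode first."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    E = lam / lam.sum()
    return E, np.cumsum(E)

def modes_for_energy(cumulative, threshold):
    """Smallest number of leading modes whose cumulative energy reaches `threshold`."""
    count = np.searchsorted(cumulative, threshold) + 1
    return int(min(count, len(cumulative)))

# Made-up eigenvalue spectrum (sums to 16).
lam = np.array([8.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.125])
E, cumulative = energy_fractions(lam)
n_modes = modes_for_energy(cumulative, 0.9)   # hypothetical 90% cut-off

print("E_k:", E)
print("cumulative:", cumulative)
print("modes needed for 90% of the energy:", n_modes)
```

For this spectrum the first three modes hold 87.5% of the energy and the first four hold 93.75%, so four modes are needed to pass the 90% mark.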

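Filtering and Reconstruction

• Decompose X into signal and noise dominated components (subspaces):

$$
\mathbf{X} = \mathbf{X}_F + \mathbf{X}_{Noise}
$$

where $\mathbf{X}_F$ is the Filtered data and $\mathbf{X}_{Noise}$ is the Residual.

• Reconstruct filtered UVP velocity (the factor $\sqrt{m}$ undoes the $1/\sqrt{m}$ scaling of the centered data matrix):

$$
\mathbf{U}_F = \sqrt{m}\, \mathbf{X}_F + \bar{\mathbf{U}}
$$

[Figure: three panels showing $\mathbf{U}$, $\mathbf{U}_F$ and the residual $\mathbf{U} - \mathbf{U}_F = \sqrt{m}\, \mathbf{X}_{Noise}$.]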
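One common way to carry out this split, assumed here because the slides do not spell it out, is to project the centered data onto the leading eigenvectors (the signal subspace) and treat the remainder as noise. The function name `pca_filter`, the number of retained modes and the synthetic data are all hypothetical. The centered matrix is again taken to include the 1/sqrt(m) scaling.

```python
import numpy as np

def pca_filter(U, n_signal_modes):
    """Split U into a filtered part built from the leading modes and a residual."""
    U = np.asarray(U, dtype=float)
    m = U.shape[1]
    U_bar = U.mean(axis=1, keepdims=True)         # mean profile (n x 1, broadcasts)
    X = (U - U_bar) / np.sqrt(m)                  # centered data matrix
    lam, Phi = np.linalg.eigh(X @ X.T)            # ascending eigenvalues
    Phi = Phi[:, np.argsort(lam)[::-1]]           # largest variance first
    Phi_signal = Phi[:, :n_signal_modes]          # assumed signal subspace
    X_F = Phi_signal @ (Phi_signal.T @ X)         # projection onto signal subspace
    X_noise = X - X_F                             # residual
    U_F = np.sqrt(m) * X_F + U_bar                # back to velocity units
    return U_F, X_F, X_noise

# Synthetic record: one coherent space-time pattern plus white noise.
rng = np.random.default_rng(4)
n, m = 32, 512
z = np.linspace(0.0, 1.0, n)[:, None]
t = np.linspace(0.0, 10.0, m)[None, :]
clean = 1.0 + np.sin(2 * np.pi * z) * np.cos(2 * np.pi * t)
U = clean + 0.1 * rng.normal(size=(n, m))

U_F, X_F, X_noise = pca_filter(U, n_signal_modes=1)
rms_raw = np.sqrt(np.mean((U - clean) ** 2))
rms_filtered = np.sqrt(np.mean((U_F - clean) ** 2))
print("rms error before filtering:", rms_raw)
print("rms error after filtering: ", rms_filtered)
```

In practice the number of retained modes would be read off the eigenvalue spectrum; here it is set to one because the synthetic signal contains a single coherent pattern.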

Eigenvalue Spectrum

Filtered Time Series (Channel 70)

[Figure: three panels showing the raw data, the filtered data and the residual.]

Power Spectra (Integrated over all channels)

Superimpose the Spectra

Generalizations

• Response to a stimulus

• Comparison of multiple data sets obtained by varying a parameter to study a transition.

Generalise:

$$
\mathbf{X} = \frac{1}{\sqrt{m}} \left( \mathbf{U} - \mathbf{U}_{ref} \right)
$$

ref U 0
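A sketch of the same analysis carried out about an arbitrary reference state instead of the mean profile. The function name `pca_about_reference`, its `U_ref` argument and the two reference choices compared below (the mean profile and a zero reference) are illustrative assumptions, not choices confirmed by the slides.

```python
import numpy as np

def pca_about_reference(U, U_ref=None):
    """Eigenvalues and eigenvectors of X X^T with X = (U - U_ref) / sqrt(m).

    U_ref may be a scalar, an n x 1 profile or a full n x m matrix.
    If omitted, the time-mean profile is used (ordinary PCA).
    """
    U = np.asarray(U, dtype=float)
    n, m = U.shape
    if U_ref is None:
        U_ref = U.mean(axis=1, keepdims=True)
    U_ref = np.broadcast_to(np.asarray(U_ref, dtype=float), (n, m))
    X = (U - U_ref) / np.sqrt(m)
    lam, Phi = np.linalg.eigh(X @ X.T)
    order = np.argsort(lam)[::-1]
    return lam[order], Phi[:, order]

# Synthetic data with a non-zero mean profile.
rng = np.random.default_rng(5)
n, m = 8, 300
U = 2.0 + rng.normal(size=(n, m))

lam_mean, _ = pca_about_reference(U)               # reference = mean profile
lam_zero, _ = pca_about_reference(U, U_ref=0.0)    # reference = zero (example choice)

print("total energy about the mean:", lam_mean.sum())
print("total energy about zero:    ", lam_zero.sum())
```

With a zero reference the mean flow itself contributes to the eigenvalues, so the total energy is larger than for the mean-centered case. For a stimulus-response study the reference could instead be the unstimulated state.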
