computational intelligence: methods and applications lecture 5 eda and linear transformations....

14
Computational Intelligence: Computational Intelligence: Methods and Applications Methods and Applications Lecture 5 EDA and linear transformations. Włodzisław Duch Dept. of Informatics, UMK Google: W Duch

Upload: anne-darlene-foster

Post on 28-Dec-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Computational Intelligence: Methods and Applications Lecture 5 EDA and linear transformations. Włodzisław Duch Dept. of Informatics, UMK Google: W Duch

Computational Intelligence: Computational Intelligence: Methods and ApplicationsMethods and Applications

Lecture 5 EDA and linear transformations.

Włodzisław Duch

Dept. of Informatics, UMK

Google: W Duch

Page 2: Computational Intelligence: Methods and Applications Lecture 5 EDA and linear transformations. Włodzisław Duch Dept. of Informatics, UMK Google: W Duch

Chernoff facesChernoff facesHumans have specialized brain areas for face recognition.

For d < 20 represent each feature by changing some face elements.

Interesting applets:

http://www.cs.uchicago.edu/~wiseman/chernoff/

http://www.cs.unm.edu/~dlchao/flake/chernoff/ (Chernoff park)

http://kspark.kaist.ac.kr/Human%20Engineering.files/Chernoff/Chernoff%20Faces.htm

Page 3: Computational Intelligence: Methods and Applications Lecture 5 EDA and linear transformations. Włodzisław Duch Dept. of Informatics, UMK Google: W Duch

Fish viewFish viewOther shapes may also be used to visualized data, for example fish.

Page 4: Computational Intelligence: Methods and Applications Lecture 5 EDA and linear transformations. Włodzisław Duch Dept. of Informatics, UMK Google: W Duch

Ring visualizationRing visualization (SunBurst) (SunBurst)Show tree-like hierarchical representation in form of rings.

Page 5: Computational Intelligence: Methods and Applications Lecture 5 EDA and linear transformations. Włodzisław Duch Dept. of Informatics, UMK Google: W Duch

Other EDA techniquesOther EDA techniques

NIST Engineering Statistics Handbook has a chapter on

exploratory data analysis (EDA).

http://www.itl.nist.gov/div898/handbook/index.htm

Unfortunately many visualization programs are written for X-Windows only, are in Fortran, or S or R languages.

Sonification: data converted to sounds!

Example: sound of EEG data.

Java Voice

Think about potential applications! More: http://sonification.de/

http://en.wikipedia.org/wiki/Sonification

Page 6: Computational Intelligence: Methods and Applications Lecture 5 EDA and linear transformations. Włodzisław Duch Dept. of Informatics, UMK Google: W Duch

CI approach to visualizationCI approach to visualization

Scatterograms: project all data on two features.

Find more interesting directions to create projections.

Linear projections:

• Principal Component Analysis,

• Discriminant Component Analysis,

• Projection Pursuit – “define interesting” projections.

Non-linear methods – more advanced, some will appear later.

Statistical methods: multidimensional scaling.

Neural methods: competitive learning, Self-Organizing Maps.

Kernel methods, principal curves and surfaces.

Information-theoretic methods.

Page 7: Computational Intelligence: Methods and Applications Lecture 5 EDA and linear transformations. Włodzisław Duch Dept. of Informatics, UMK Google: W Duch

Distances in feature spacesDistances in feature spaces

Data vector, d-dimensions XT = (X1, ... Xd), YT = (Y1, ... Yd)

Distance, or metric function, is a 2-argument function that satisfies:

Distance functions measure (dis)similarity.

, 0; , ,

, , ,

d d d

d d d

X Y X Y X Y Y X

X Y X Z Z Y

1/ 2

2

21

11

d

i ii

d

i ii

X Y

X Y

X Y

X Y

Popular distance functions:

Euclidean distance (L2 norm)

Manhattan (city-block) distance

(L1 norm)

Page 8: Computational Intelligence: Methods and Applications Lecture 5 EDA and linear transformations. Włodzisław Duch Dept. of Informatics, UMK Google: W Duch

Two metric functionsTwo metric functions

Equidistant points in 2D:

Euclidean case: circle or sphere Manhattan case: square

Identical distance between two points X, Y: imagine that in 10 D !

X1

X2

X1

X2

L L X P Y P

X X

YY

All points in the shaded area have the same Manhattan distance to X and Y!

isotropic non-isotropic

Page 9: Computational Intelligence: Methods and Applications Lecture 5 EDA and linear transformations. Włodzisław Duch Dept. of Informatics, UMK Google: W Duch

Linear transformationsLinear transformations

2D vectors X in a unit circle with mean (1,1); Y = AX, A = 2x2 matrix

The shape and the mean of data distribution is changed.

Scaling (diagonal aii elements); rotation (off-diag), mirror reflection.

Distances between vectors are not invariant: ||Y1-Y2||≠||X1-X2||

1 11 12 1

2 21 22 2

Y a a X

Y a a X

Page 10: Computational Intelligence: Methods and Applications Lecture 5 EDA and linear transformations. Włodzisław Duch Dept. of Informatics, UMK Google: W Duch

Invariant distancesInvariant distances

Euclidean distance is not invariant to linear transformations Y = AX,

scaling of units has strong influence on distances.

How to select scaling/rotations for simplest description of data?

Orthonormal matrices: ATA = I, are inducing rigid rotations.

To achieve full invariance requires therefore standardization of data (scaling invariance) and should use covariance matrix.

Mahalanobis metric will replace ATA by inverse of the covariance matrix.

T21 2 1 2 1 2

T1 2 1 2T

Y Y Y Y Y Y

X X A A X X

Page 11: Computational Intelligence: Methods and Applications Lecture 5 EDA and linear transformations. Włodzisław Duch Dept. of Informatics, UMK Google: W Duch

Data standardizationData standardization

For each vector component X(j)T=(X1(j), ... Xd

(j)), j=1 .. n

calculate mean and std: n – number of vectors, d – their dimension

( ) ( )

1 1

1 1;

i

n nj j

ij j

X Xn n

X X Vector of mean

feature values.

Averages over rows.

(1) (2) ( )

(1) (2) ( )1 1 1 1

(1) (2) ( )2 2 2 2

(1) (2) ( )

n

n

n

nd d d d

X X X X

X X X X

X X X X

X X X

Page 12: Computational Intelligence: Methods and Applications Lecture 5 EDA and linear transformations. Włodzisław Duch Dept. of Informatics, UMK Google: W Duch

Standard deviationStandard deviation

Calculate standard deviation:

Why n1, not n ? If true mean was known it should be n, but if the mean

is calculated the formula with n1 converges to the true variance!

Transform X => Z, standardized data vectors:

( )

1

22 ( )

1

1

1

1

i

i i

nj

ij

nj

ij

X Xn

X Xn

Vector of mean feature values.

Variance = square of standard deviation (std), sum of all deviations from the mean value.

( ) ( )j ji i i iZ X X

Page 13: Computational Intelligence: Methods and Applications Lecture 5 EDA and linear transformations. Włodzisław Duch Dept. of Informatics, UMK Google: W Duch

Standardized dataStandardized data

Std data: zero mean and unit variance.

Standardize data after making data transformation.

Effect: data is invariant to scaling only; for diagonal transformations distances after standardization are invariant, are based on identical units.

Note: it does not mean that all data models are performing better!

How to make data invariant to any linear transformations?

,

( ) ( )

1 1

2 22 ( ) ( ) 2

1 1

1 10

1 11

1 1

i i

Z i i i

n nj j

i i ij j

n nj j

i i ij j

Z Z X Xn n

Z Z X Xn n

Page 14: Computational Intelligence: Methods and Applications Lecture 5 EDA and linear transformations. Włodzisław Duch Dept. of Informatics, UMK Google: W Duch

Std exampleStd example

Before std

After std

Mean and std are shown using a colored bar; minimum and max values may extend outside.

Some features (ex. yellow), have large values; some (ex: gray) have small values; this may depend on units used to measure them.

Standardized data have all mean 0 and =1, thus contribution from different features to similarity or distance calculation is comparable.