Exercises: Multivariate Data Analysis

Upload: tracy-newton

Post on 26-Dec-2015


Page 1: Exercices Multivariate Data Analysis. Topic 1 Multivariate Data Analysis Topic 1 Theory: Multivariate Data Analysis Introduction to Multivariate Data

Exercises: Multivariate Data Analysis


Topic 1: Multivariate Data Analysis

Theory:
• Introduction to Multivariate Data Analysis
• Principal Component Analysis (PCA)
• Multivariate Linear Regression (MLR, PCR and PLSR)

Laboratory exercises:
• Introduction to MATLAB
• Examples of PCA (cluster analysis of samples, identification and geographical distribution of contamination sources/patterns, …)
• Examples of Multivariate Regression (prediction of concentration of chemicals from spectral analysis, investigation of correlation patterns and of the relative importance of variables, …)

Romà Tauler (IDAEA, CSIC, Barcelona), February 2009


Introduction to MATLAB.

What is MATLAB?

MATLAB is a contraction of "Matrix Laboratory" and, though originally designed as a tool for the manipulation of matrices, it is now capable of performing a wide range of numerical computations.

MATLAB also possesses extensive graphics capabilities.


Introduction to MATLAB

• Command-line programming environment: command window prompt (»)
• Matrix algebra: scalars, vectors, matrices
• Work / use:
  • Interactively at the command line
  • Create/use programs (functions or scripts)
  • Toolboxes add additional functionality


The MATLAB Workspace

The workspace is where variables are stored, created, manipulated and operated on.

Save workspace variables

Information about variables in the workspace: who and whos

»whos
  Name         Size          Bytes  Class
  fsparse      100x100        1604  sparse array
  modstruct    1x1             130  struct array
  my3D         10x20x104    166400  double array
  mymat        5x4             160  double array
  myvect       1x3              24  double array
  somechars    1x8              16  char array
  zcells       2x2          167082  cell array

Grand total is 41766 elements using 335416 bytes


MATLAB Data Types

• double: double-precision floating-point number array (this is the traditional MATLAB matrix or array)
• sparse: 2-D real (or complex) sparse matrix
• struct: structure array
• cell: cell array
• char: character array
• logical: logical array (1, 0)
• <class_name>: custom object class
• dataset: Standard Data Object


Command Line Help

• help functionname
• lookfor method
• which functionname
• helpwin


Importing Data into MATLAB

• MATLAB can read flat ASCII files
• Import Wizard
• A variety of image formats can be imported with the IMREAD function (JPEG, BMP, TIFF, etc.)
• Various spreadsheet import functions
• Custom-developed routines for reading binary instrument files

Additional Functions for Importing Data

• 'xlsfinfo': reads sheet names from an .xls file
• 'xlsread': reads in data from an .xls file


Format types


>>A = [1 2 0; 2 5 -1; 4 10 -1]
A =
     1     2     0
     2     5    -1
     4    10    -1
>>B = A'
B =
     1     2     4
     2     5    10
     0    -1    -1
>>C = A .* B
C =
     1     4     0
     4    25   -10
     0   -10     1

The same applies to the ./ and .\ operators (element-by-element division).

The NaN concept: NaN is the IEEE arithmetic representation for Not-a-Number. A NaN is obtained as the result of mathematically undefined operations such as 0.0/0.0 and inf-inf.
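As a small illustration of the difference between element-by-element and matrix operators, and of how NaN arises, the following sketch reuses the matrices A and B from the session above. The variable names C_elem, C_mat, R, z and w are chosen here for illustration only.

```matlab
% Element-by-element versus matrix operators, and NaN/Inf behaviour.
A = [1 2 0; 2 5 -1; 4 10 -1];
B = A';

C_elem = A .* B;    % element-by-element product (same result as C above)
C_mat  = A * B;     % true matrix product, generally different from A .* B

R = A ./ B;         % element-by-element right division
                    % R(3,1) = 4/0 gives Inf, which is not a NaN

z = 0/0;            % mathematically undefined, gives NaN
w = Inf - Inf;      % also gives NaN

% isnan flags NaN entries; NaN == NaN is false, so comparing with == does not work
flags = isnan([z w R(3,1)]);   % logical vector [1 1 0]
```

Because NaN propagates through arithmetic, it is common to locate such entries with isnan before computing means or variances.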


Useful functions for beginners:

• HELP: On-line help, display text at command line.
• LOOKFOR: Search all M-files for keyword.
• WHOS: List current variables, long form.
• MAX: Largest component.
• MIN: Smallest component.
• ROUND, CEIL, FLOOR, FIX: Rounding.
• SQUEEZE: Remove singleton dimensions.
• FIND: Find indices of nonzero elements.
• MEAN: Average or mean value.
• ISNAN: True for Not-a-Number.
• FLIPUD: Flip matrix in up/down direction.
• FLIPDIM: Flip matrix along specified dimension.
• RESHAPE: Change size.
• PERMUTE: Permute array dimensions.
• REPMAT: Replicate and tile an array.
• EVAL: Execute string with MATLAB expression.


Indexing into Three-way (and higher) Arrays

MATLAB supports three-way and higher arrays. Indexing extends easily to multi-way arrays:

»x = round(rand(4,5,2)*10)
x(:,:,1) =
    10     9     8     9     9
     2     8     4     7     9
     6     5     6     2     4
     5     0     8     4     9
x(:,:,2) =
     1     1     3     4     8
     4     2     2     9     5
     8     2     0     5     2
     0     6     7     4     7

»x(:,:,2) = ones(4,5)*5
x(:,:,1) =
    10     9     8     9     9
     2     8     4     7     9
     6     5     6     2     4
     5     0     8     4     9
x(:,:,2) =
     5     5     5     5     5
     5     5     5     5     5
     5     5     5     5     5
     5     5     5     5     5


Cell Arrays

Cell arrays are a handy way to store matrices of different lengths, for example from batch process data:

»x = cell(4,1)
x =
    []
    []
    []
    []
»x{1} = rand(4,5);
»x{2} = rand(10,5);
»x{3} = rand(6,5);
»x{4} = rand(8,5);
»x
x =
    [ 4x5 double]
    [10x5 double]
    [ 6x5 double]
    [ 8x5 double]
»x{1}
ans =
    0.8381    0.8318    0.3046    0.3028    0.3784
    0.0196    0.5028    0.1897    0.5417    0.8600
    0.6813    0.7095    0.1934    0.1509    0.8537
    0.3795    0.4289    0.6822    0.6979    0.5936
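Once batches of different length are stored in a cell array, they usually need to be inspected or stacked. The sketch below shows one possible way to do this with base MATLAB functions; the names nrows, sizes and allData are illustrative and do not come from the slides.

```matlab
% Store four "batches" with different numbers of rows but the same 5 columns.
nrows = [4 10 6 8];
x = cell(4,1);
for k = 1:numel(x)
    x{k} = rand(nrows(k), 5);    % braces {} access the content of cell k
end

% Number of rows of each batch, collected into a 4x1 numeric vector.
sizes = cellfun(@(m) size(m,1), x);

% Stack all batches vertically into one 28x5 matrix
% (possible here because every batch has 5 columns).
allData = vertcat(x{:});
```

Note that x(1) returns a 1x1 cell, whereas x{1} returns the 4x5 matrix stored inside it.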


help diary

 DIARY Save text of MATLAB session.
    DIARY FILENAME causes a copy of all subsequent command window input and most of the resulting command window output to be appended to the named file. If no file is specified, the file 'diary' is used.

    DIARY OFF suspends it.
    DIARY ON turns it back on.
    DIARY, by itself, toggles the diary state.

    Use the functional form of DIARY, such as DIARY('file'), when the file name is stored in a string.

    See also save.

    Reference page in Help browser: doc diary


Introduction to Linear Algebra

• Definitions

• scalar, vector, matrix

• Linear Algebra Operations

• vector and matrix addition

• vector and matrix multiplication

• projection

• Gaussian elimination

• the concept of rank

• matrix inverses

• rank deficiency

• ......


Projection of a vector y onto a vector x

Projection of a vector y onto a subspace X (onto the columns of X)
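The formulas for these two projections did not survive in this transcript. As a hedged sketch, the standard least-squares expressions are the projection of y onto x, ŷ = x (xᵗy)/(xᵗx), and the projection of y onto the columns of X, ŷ = X (XᵗX)⁻¹ Xᵗ y. The numerical vectors below are made up for illustration.

```matlab
% Projection of a vector y onto a vector x.
x = [1; 2; 2];
y = [3; 0; 3];
y_hat = x * (x'*y) / (x'*x);     % component of y along x
r     = y - y_hat;               % residual, orthogonal to x

% Projection of y onto the subspace spanned by the columns of X.
X = [1 0; 1 1; 1 2];
P = X * ((X'*X) \ X');           % projection matrix (X must have full column rank)
y_sub = P * y;                   % projected vector
r_sub = y - y_sub;               % residual, orthogonal to every column of X

% Orthogonality checks (both values are zero up to rounding error).
check_vec = x' * r;
check_sub = norm(X' * r_sub);
```

The backslash operator solves the linear system directly, which is generally preferable to forming inv(X'*X) explicitly.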


Diagonalization of a non-singular symmetric matrix. Eigenvalues and eigenvectors. Calculation of the principal components.

(X1, X2, ..., Xn) → Linear Transformation → (PC1, PC2, ..., PCn)

$$
\begin{aligned}
PC_1 &= l_{11}X_1 + l_{12}X_2 + \dots + l_{1n}X_n \\
PC_2 &= l_{21}X_1 + l_{22}X_2 + \dots + l_{2n}X_n \\
&\;\;\vdots \\
PC_n &= l_{n1}X_1 + l_{n2}X_2 + \dots + l_{nn}X_n
\end{aligned}
$$

In matrix form, PC = X L:

$$
(PC_1, PC_2, \dots, PC_n) = (X_1, X_2, \dots, X_n)
\begin{pmatrix}
l_{11} & l_{21} & \cdots & l_{n1} \\
l_{12} & l_{22} & \cdots & l_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
l_{1n} & l_{2n} & \cdots & l_{nn}
\end{pmatrix}
$$

with the constraints applied in ascending order:

1. Var(PC1) maximum
2. Var(PC2) maximum but with Cov(PC1,PC2) = 0
...
n. Var(PCn) maximum but with Cov(PC1,PCn) = 0, Cov(PC2,PCn) = 0, Cov(PC3,PCn) = 0, ..., Cov(PCn-1,PCn) = 0


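* Diagonalization of a non-singular square symmetric matrix

S = Cov(X1, X2, ..., Xn)

$$
S = L\,\mathrm{Diag}(\lambda_1, \lambda_2, \dots, \lambda_n)\,L^{t} = L\,D(\lambda)\,L^{t}
$$

L is an orthonormal matrix; its columns are the eigenvectors of S (loadings).

$$
S =
\begin{pmatrix}
l_{11} & l_{21} & \cdots & l_{n1} \\
l_{12} & l_{22} & \cdots & l_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
l_{1n} & l_{2n} & \cdots & l_{nn}
\end{pmatrix}
\begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix}
\begin{pmatrix}
l_{11} & l_{12} & \cdots & l_{1n} \\
l_{21} & l_{22} & \cdots & l_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
l_{n1} & l_{n2} & \cdots & l_{nn}
\end{pmatrix}
$$

The eigenvalues of matrix S are on the diagonal of matrix D(λ):

$$
\lambda_1 = \mathrm{Var}(PC_1),\quad \lambda_2 = \mathrm{Var}(PC_2),\quad \dots,\quad \lambda_n = \mathrm{Var}(PC_n)
$$

$$
s_{11} + s_{22} + \dots + s_{nn} = \mathrm{Trace}(S) = \mathrm{Trace}(D(\lambda)) = \lambda_1 + \lambda_2 + \dots + \lambda_n
$$

$$
\mathrm{Det}(S) = \mathrm{Det}(D(\lambda)) = \lambda_1 \lambda_2 \cdots \lambda_n
$$

$Z_{nn} = (z_{ij})$ is the matrix of scores: the object coordinates in the new axes (new variables, or PCs).

$$
Z_{nn} = X_{nn} L_{nn}\,; \qquad Z_{nn} L^{t}_{nn} = X_{nn}
$$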
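The properties listed above can be checked numerically. The following is a minimal sketch using the base MATLAB function eig on a small made-up data matrix; it assumes the data are mean-centered before the scores are computed, and the variable names are illustrative rather than taken from the slides.

```matlab
% Small made-up data set: 5 objects (rows), 3 variables (columns).
X  = [2 4 1; 3 7 2; 5 8 1; 6 12 4; 9 13 3];
m  = size(X,1);
Xc = X - ones(m,1)*mean(X);        % mean-centered data

S = (Xc'*Xc)/(m-1);                % covariance matrix
S = (S + S')/2;                    % enforce exact symmetry before eig

[L, D] = eig(S);                   % columns of L: eigenvectors (loadings)
[lambda, idx] = sort(diag(D), 'descend');
L = L(:, idx);                     % order PCs by decreasing variance

Z = Xc*L;                          % scores: object coordinates on the PCs

% Numerical checks of the stated properties (all close to zero).
err_diag  = norm(S - L*diag(lambda)*L');   % S = L D(lambda) L'
err_orth  = norm(L'*L - eye(3));           % L is orthonormal
err_trace = abs(trace(S) - sum(lambda));   % Trace(S) = sum of eigenvalues
err_det   = abs(det(S) - prod(lambda));    % Det(S) = product of eigenvalues
err_var   = norm(var(Z) - lambda');        % Var(PCk) = lambda_k
err_back  = norm(Z*L' - Xc);               % Z L' recovers the (centered) data
```

eig does not guarantee any particular ordering of the eigenvalues, which is why they are sorted explicitly here.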


Linear combinations of the original variables:
• Factors
• Principal Components (PC)
• Canonical Variables
• Latent Variables
• Discriminant Functions
• ...

Linear combination of random variables:

$$
\begin{aligned}
y &= a_1 x_1 + a_2 x_2 + \dots + a_n x_n = a^{t} x \\
E(y) &= a_1 E(x_1) + a_2 E(x_2) + \dots + a_n E(x_n) \\
\mathrm{Var}(y) &= (a_1, a_2, \dots, a_n)\, S\, a = a^{t} S\, a
\end{aligned}
$$

where S is the variance-covariance matrix of X.

$$
\begin{aligned}
z &= b_1 x_1 + b_2 x_2 + \dots + b_n x_n = b^{t} x \\
\mathrm{Cov}(y, z) &= a^{t} S\, b
\end{aligned}
$$


• Noise Filtering

Only the first principal components are selected, e.g. if e PCs are selected:

$$
X_{mn} = Z_{me}\, L^{t}_{en} + E_{mn}
$$

$E_{mn}$ is the residuals matrix, obtained after subtracting the contributions of the first e PCs.
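A possible way to see this decomposition at work is sketched below. It uses the base MATLAB function svd on made-up data with one dominant pattern plus a small deterministic perturbation; the data and all variable names are assumptions for illustration.

```matlab
% Made-up data: 20 objects, 3 variables, one dominant pattern plus small "noise".
t     = (1:20)';
Xtrue = [t, 2*t, -t];                       % rank-one structure
noise = 0.05 * sin((1:20)' * [7 11 13]);    % small deterministic perturbation
X     = Xtrue + noise;

Xc = X - ones(20,1)*mean(X);                % mean centering

[U, Sg, V] = svd(Xc, 'econ');               % columns of V: loadings
sv = diag(Sg);                              % singular values, decreasing

e    = 1;                                   % number of PCs kept
L    = V(:, 1:e);                           % loadings of the first e PCs
Z    = Xc * L;                              % scores on the first e PCs
Xhat = Z * L';                              % filtered (reconstructed) data
E    = Xc - Xhat;                           % residuals matrix

% The residual sum of squares equals the sum of the discarded
% squared singular values.
ssq_resid     = norm(E, 'fro')^2;
ssq_discarded = sum(sv(e+1:end).^2);
```

Increasing e reduces the residuals further, at the risk of putting noise back into the model.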


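* Euclidean Distance

$$
\begin{aligned}
d^2(O_i, O_j) &= d^2\big((x_{i1}, x_{i2}, \dots, x_{in}),\ (x_{j1}, x_{j2}, \dots, x_{jn})\big) \\
&= (x_{i1}-x_{j1})^2 + (x_{i2}-x_{j2})^2 + \dots + (x_{in}-x_{jn})^2 \\
&= (x_{i1}-x_{j1}, \dots, x_{in}-x_{jn})\; I
\begin{pmatrix} x_{i1}-x_{j1} \\ x_{i2}-x_{j2} \\ \vdots \\ x_{in}-x_{jn} \end{pmatrix}
\end{aligned}
$$

* Mahalanobis Distance

$$
d_m^2(O_i, O_j) = (x_{i1}-x_{j1}, \dots, x_{in}-x_{jn})\; S^{-1}
\begin{pmatrix} x_{i1}-x_{j1} \\ x_{i2}-x_{j2} \\ \vdots \\ x_{in}-x_{jn} \end{pmatrix}
$$

where S is the covariance matrix.

It takes into account the covariance between variables!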
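The two distances can be compared on a small example. The sketch below uses a made-up two-variable data set and the standard form of the Mahalanobis distance, in which the difference vector is weighted by the inverse of the covariance matrix S; the variable names are illustrative.

```matlab
% Made-up data: 5 objects, 2 strongly correlated variables.
X = [2 4; 3 7; 5 8; 6 12; 9 13];
S = cov(X);                        % covariance matrix of the variables

d = X(1,:) - X(5,:);               % difference between objects 1 and 5 (row vector)

d2_euc = d * eye(2) * d';          % squared Euclidean distance (weight matrix I)
d2_mah = d * (S \ d');             % squared Mahalanobis distance (weight matrix inv(S))

% With uncorrelated, unit-variance variables (S = I) both distances coincide.
d2_mah_identity = d * (eye(2) \ d');
```

Because the two variables are correlated and have different variances, the Mahalanobis distance between the two objects differs from the Euclidean one.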


Univariate Statistics

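mean:

$$
\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}
$$

variance:

$$
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n-1}
$$

standard deviation:

$$
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n-1}}
$$

covariance:

$$
s_{x,y} = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n-1}
$$

correlation:

$$
r_{x,y} = \frac{s_{x,y}}{s_x\, s_y}
$$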


Multivariate Statistics

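Matrix X of experimental measures, X(n,m):

$$
X(n,m) =
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1m} \\
x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nm}
\end{pmatrix}
$$

Vector of column means: $\bar{x} = (\bar{x}_1, \bar{x}_2, \dots, \bar{x}_m)$, where

$$
\bar{x}_j = \frac{\sum_{i=1}^{n} x_{ij}}{n}, \qquad j = 1, \dots, m
$$

Matrix of variances-covariances $S(m,m) = (s^2_{ij})$. It is a square symmetric matrix.

$$
s^2_{jl} = \mathrm{Cov}(x_j, x_l) = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{il} - \bar{x}_l)}{n-1}
$$

$$
S =
\begin{pmatrix}
s^2_{11} & s^2_{12} & \cdots & s^2_{1m} \\
s^2_{21} & \cdots & \cdots & s^2_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
s^2_{m1} & \cdots & \cdots & s^2_{mm}
\end{pmatrix}
$$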


Multivariate Statistics

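$$
s^2_{jj} = \mathrm{Var}(x_j) = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}{n-1} = \frac{1}{n-1}\,\lVert x_j - \bar{x}_j \rVert^2
$$

Mean-centered data matrix:

$$
\tilde{X}(n,m) = X(n,m) - \mathbf{1}_n\,(\bar{x}_1, \bar{x}_2, \dots, \bar{x}_m) =
\begin{pmatrix}
x_{11}-\bar{x}_1 & x_{12}-\bar{x}_2 & \cdots & x_{1m}-\bar{x}_m \\
x_{21}-\bar{x}_1 & x_{22}-\bar{x}_2 & \cdots & x_{2m}-\bar{x}_m \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1}-\bar{x}_1 & x_{n2}-\bar{x}_2 & \cdots & x_{nm}-\bar{x}_m
\end{pmatrix}
$$

Covariance matrix:

$$
S(m,m) = \frac{1}{n-1}\,\tilde{X}^{T}(m,n)\,\tilde{X}(n,m)
$$

Standard deviations:

$$
(s_1, s_2, \dots, s_m) = \big((s^2_{11})^{1/2},\ (s^2_{22})^{1/2},\ \dots,\ (s^2_{mm})^{1/2}\big)
$$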


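Multivariate Statistics

Correlation matrix C(m,m)

X(n,m) → mean centering → $\tilde{X}(n,m)$ → standardizing → $X_s(n,m)$

$$
(x_{ij}) \;\rightarrow\; (x_{ij} - \bar{x}_j) \;\rightarrow\; \left(\frac{x_{ij} - \bar{x}_j}{s_j}\right)
$$

$$
C(m,m) = \mathrm{Corr}(X_j) = \frac{1}{n-1}\, X_s^{T} X_s
$$

Covariance matrix with respect to the origin:

$$
M(m,m) = \frac{1}{n}\, X^{T} X
$$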
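The matrix expressions for the covariance and correlation matrices can be verified against the built-in functions cov and corrcoef. The sketch below uses a small made-up data matrix; the variable names follow the notation of the slides where possible (Xc for the mean-centered matrix, Xs for the standardized one).

```matlab
% Made-up data: n = 5 objects (rows), m = 3 variables (columns).
X = [2 4 1; 3 7 2; 5 8 1; 6 12 4; 9 13 3];
[n, m] = size(X);

xbar = mean(X);                    % vector of column means (1 x m)
Xc   = X - ones(n,1)*xbar;         % mean-centered data matrix

S = (Xc'*Xc)/(n-1);                % covariance matrix S(m,m)
s = sqrt(diag(S))';                % standard deviations (1 x m)

Xs = Xc ./ (ones(n,1)*s);          % standardized (autoscaled) data matrix
C  = (Xs'*Xs)/(n-1);               % correlation matrix C(m,m)

M = (X'*X)/n;                      % covariance matrix with respect to the origin

% Comparison with the built-in functions (differences close to zero).
err_S = norm(S - cov(X));
err_C = norm(C - corrcoef(X));
```

The diagonal of C is 1 by construction, since every standardized column has unit variance.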


Summary: from the data matrix to the covariance and correlation matrices

$$
X(n,m) = (x_{ij}) \;\rightarrow\; \tilde{X}(n,m) = (x_{ij} - \bar{x}_j)
$$

$$
S =
\begin{pmatrix}
s^2_{11} & s^2_{12} & \cdots & s^2_{1m} \\
s^2_{21} & \cdots & \cdots & s^2_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
s^2_{m1} & \cdots & \cdots & s^2_{mm}
\end{pmatrix},
\qquad
s^2_{jj} = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}{n-1},
\qquad
s^2_{jl} = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{il} - \bar{x}_l)}{n-1}
$$

$$
C =
\begin{pmatrix}
r_{11} & r_{12} & \cdots & r_{1m} \\
r_{21} & \cdots & \cdots & r_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
r_{m1} & \cdots & \cdots & r_{mm}
\end{pmatrix},
\qquad
r_{ij} = \frac{s^2_{ij}}{s_i\, s_j},
\qquad
s_i = \sqrt{s^2_{ii}}
$$


Univariate Normal Distribution with mean μ and standard deviation σ

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}}
$$

The sample mean, m, is an estimate of the population mean μ, and the standard deviation of the sample, s, is an estimate of the standard deviation of the population, σ.


Multivariate Normal Distribution

$\mu = (\mu_1, \mu_2, \dots, \mu_n)$: population mean

$\bar{x} = (\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n)$: sample mean, as an estimate of μ

Σ: covariance matrix (matrix S is an estimate of Σ)

$$
f(x_1, x_2, \dots, x_n) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}\;
\exp\!\left[-\frac{1}{2}\,(x_1-\mu_1, \dots, x_n-\mu_n)\;\Sigma^{-1}
\begin{pmatrix} x_1-\mu_1 \\ \vdots \\ x_n-\mu_n \end{pmatrix}\right]
$$


Other subjects to consider (exercises):
• Statistical distributions (with MATLAB)
• Elementary statistical functions (in MATLAB)
• Statistical tests
• ANOVA
• Experimental design
• ...


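Comparison of a sample mean with a known value (population mean, μ0):

[Flowchart]

BEGIN

• n ≥ 30?
  • yes: $z_{calc} = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$. If $|z_{calc}| \le 1.96$: yes, $\mu_x = \mu_0$; no, $\mu_x \ne \mu_0$. END
  • no: $t_{calc} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$. If $|t_{calc}| \le t_{tab}$ (n − 1 d.f.): yes, $\mu_x = \mu_0$; no, $\mu_x \ne \mu_0$. END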
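The small-sample branch of this decision scheme can be sketched in a few lines of base MATLAB. The measurements and the reference value below are made up for illustration, and the tabulated value t_tab = 2.365 (two-sided, 95 % confidence, 7 degrees of freedom) is entered by hand from a standard t table rather than computed.

```matlab
% Made-up sample of n = 8 measurements and a known reference value mu0.
x   = [10.2 9.8 10.5 10.1 9.9 10.4 10.0 10.3];
mu0 = 10;
n   = numel(x);

xbar = mean(x);                            % sample mean
s    = std(x);                             % sample standard deviation (n-1 in the denominator)

t_calc = (xbar - mu0) / (s/sqrt(n));       % calculated t statistic

t_tab = 2.365;                             % table value: 95 %, two-sided, n-1 = 7 d.f.

% If |t_calc| <= t_tab the sample mean does not differ significantly from mu0.
no_significant_difference = abs(t_calc) <= t_tab;
```

For this data set t_calc is about 1.73, below the tabulated value, so the difference between the sample mean and mu0 is not significant at the 95 % level.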


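Comparison between the means of two samples

[Flowchart]

BEGIN

1. Normality? If no, apply a transformation. If there is still no normality after the transformation, use non-parametric tests. END

2. n1 and n2 ≥ 30?
   • yes: $z_{cal} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$. If $|z_{cal}| \le 1.96$: yes, $\mu_1 = \mu_2$; no, $\mu_1 \ne \mu_2$. END
   • no: F test, $\sigma_1^2 = \sigma_2^2$?

3. F test yes (equal variances):

$$
s^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t_{cal} = \frac{\bar{x}_1 - \bar{x}_2}{s\,\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
$$

   If $|t_{cal}| \le t_{tab}$ ($n_1 + n_2 - 2$ d.f.): yes, $\mu_1 = \mu_2$; no, $\mu_1 \ne \mu_2$. END

4. F test no (unequal variances):

$$
t_{cal} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
$$

   $t_{cal}$ is compared either with $t_{tab}$ for a number of degrees of freedom calculated from $s_1^2/n_1$ and $s_2^2/n_2$, or with

$$
t' = \frac{t_1\,\dfrac{s_1^2}{n_1} + t_2\,\dfrac{s_2^2}{n_2}}{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}},
\qquad
t_1 = t_{tab}\ (n_1 - 1\ \text{d.f.}),
\quad
t_2 = t_{tab}\ (n_2 - 1\ \text{d.f.})
$$

   If $|t_{cal}| \le t'$: yes, $\mu_1 = \mu_2$; no, $\mu_1 \ne \mu_2$. END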


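Comparison between the means of two samples, using the differences d

[Flowchart]

BEGIN

1. Normality? If no, apply a transformation. If there is still no normality after the transformation, use non-parametric tests. END

2. n ≥ 30?
   • yes: $z_{cal} = \dfrac{\bar{d}}{s_d/\sqrt{n}}$. If $|z_{cal}| \le 1.96$: yes, $\mu_d = 0$; no, $\mu_d \ne 0$. END
   • no: $t_{cal} = \dfrac{\bar{d}}{s_d/\sqrt{n}}$. If $|t_{cal}| \le t_{tab}$ (n − 1 d.f.): yes, $\mu_d = 0$; no, $\mu_d \ne 0$. END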


Principal Component Analysis (PCA)


Unsupervised Pattern Recognition

Supervised Pattern Recognition


>> load arch
>> whos
  Name     Size     Bytes  Class     Attributes
  arch     75x10     6000  double                 (data matrix)
  class    75x1       600  double                 (classification index)
  samps    75x5       750  char                   (sample labels)
  vars     10x2        40  char                   (variable labels)
>> plot(arch)
>> plot(arch')

[Figure: plot(arch), values (0 to 1800) against sample number (0 to 80); plot(arch'), values (0 to 1800) against variable (Fe, Ti, Ba, Ca, K, Mn, Rb, Sr, Y, Zr)]


Data Statistics: min: 45, max: 1100, mean: 334.7000, median: 131, mode: 45, std: 386.7365, range: 1055

[Figure: box plot of values (0 to 1800) by column number: 1 Fe, 2 Ti, 3 Ba, 4 Ca, 5 K, 6 Mn, 7 Rb, 8 Sr, 9 Y, 10 Zr]

[Figure: histograms of the ten variables: Fe, Ti, Ba, Ca, K, Mn, Rb, Sr, Y, Zr]

boxplot(arch)

hist(arch(:,v))


Data pretreatment

>> xcal=arch(1:63,:);
>> xtest=arch(64:75,:);
>> axcal=auto(xcal);
>> subplot(1,2,1),plot(axcal);
>> subplot(1,2,2),plot(axcal');
>> boxplot(axcal)

[Figure: plot(axcal), autoscaled values (-3 to 3) against sample number (0 to 60); plot(axcal'), autoscaled values (-3 to 3) against variable number (2 to 10); box plot of autoscaled values (-2.5 to 2.5) by column number (1 to 10)]
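The function auto used above is not part of base MATLAB; it presumably comes from the chemometrics toolbox used in the course. Assuming it performs autoscaling (each column mean-centered and divided by its standard deviation), an equivalent can be sketched with base functions only. The small matrix below is made up, since the arch data set is not reproduced here.

```matlab
% Made-up calibration data: 5 samples, 3 variables on very different scales.
xcal = [2 40 1; 3 70 2; 5 80 1; 6 120 4; 9 130 3];
n    = size(xcal,1);

mx = mean(xcal);                                  % column means of the calibration set
sx = std(xcal);                                   % column standard deviations

% Autoscaling: zero mean and unit standard deviation for every column.
axcal = (xcal - ones(n,1)*mx) ./ (ones(n,1)*sx);

% A new (test) sample is scaled with the calibration means and
% standard deviations, not with its own.
xtest  = [4 90 2];
axtest = (xtest - mx) ./ sx;
```

After autoscaling, every variable contributes with the same variance to the subsequent PCA, regardless of its original units.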


Number of components

larch=svd(arch)
larch=larch(1:10)
plot(larch)

laxcal=svd(axcal)
laxcal =
   18.1975
   11.4439
    8.2437
    7.1865
    3.9787
    2.9436
    2.4939
    1.8726
    1.4955
    1.3505
plot(laxcal)
plot(larch)

[Figure: singular values against component number (1 to 10): larch (0 to 14000) and laxcal (0 to 20)]
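The singular values of the autoscaled calibration matrix are directly related to the eigenvalues of its covariance matrix: with n = 63 calibration samples, each eigenvalue is the squared singular value divided by n − 1. The sketch below applies this relation to the laxcal values printed above; the variable names ev, pct and cumpct are illustrative.

```matlab
% Singular values of axcal as printed above (63 samples, 10 variables).
laxcal = [18.1975 11.4439 8.2437 7.1865 3.9787 ...
           2.9436  2.4939 1.8726 1.4955 1.3505]';
n = 63;                              % number of calibration samples (rows of xcal)

ev     = laxcal.^2 / (n-1);          % eigenvalues of Cov(axcal)
pct    = 100 * ev / sum(ev);         % percent variance captured by each PC
cumpct = cumsum(pct);                % cumulative percent variance

% For autoscaled data the eigenvalues add up to the number of variables (10).
total_variance = sum(ev);
```

The first eigenvalue comes out at about 5.34, that is, about 53.4 % of the total variance, and the first four components together capture about 93.8 %.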


PCA Principal components analysis

PCA on axcal

I/O: [scores,loads,ssq,res,reslm,tsqlm,tsq] = pca(data,plots,scl,lvs);

The input is the data matrix (data).

Outputs are the scores (scores), loadings (loads), variance info (ssq), residuals (res), Q limit (reslm), T^2 limit (tsqlm), and T^2's (tsq).

Optional inputs are:
• (plots): plots = 0 suppresses all plots, plots = 1 [default] produces plots with no confidence limits, plots = 2 produces plots with limits, plots = -1 plots the eigenvalues only (without limits)
• (scl): a vector for plotting scores against (if scl = 0, sample numbers will be used)
• (lv): a scalar which specifies the number of principal components to use in the model and which suppresses the prompt for number of PCs

[scores,loads,ssq,res,reslm,tsqlm,tsq]=pca(axcal);


        Percent Variance Captured by PCA Model

  Principal     Eigenvalue     % Variance     % Variance
  Component         of          Captured       Captured
   Number         Cov(X)        This PC         Total
  ---------     ----------     ----------     ----------
       1        5.34e+000        53.41          53.41
       2        2.11e+000        21.12          74.53
       3        1.10e+000        10.96          85.50
       4        8.33e-001         8.33          93.83
       5        2.55e-001         2.55          96.38
       6        1.40e-001         1.40          97.78
       7        1.00e-001         1.00          98.78
       8        5.66e-002         0.57          99.35
       9        3.61e-002         0.36          99.71
      10        2.94e-002         0.29         100.00


[Figure: four panels, "Variable Number vs. Loadings for PC# 1" to "Variable Number vs. Loadings for PC# 4"; x-axis: Variable Number (1 Fe, 2 Ti, 3 Ba, 4 Ca, 5 K, 6 Mn, 7 Rb, 8 Sr, 9 Y, 10 Zr); y-axis: Loadings for PC# 1 to PC# 4]

Page 101: Exercices Multivariate Data Analysis. Topic 1 Multivariate Data Analysis Topic 1 Theory: Multivariate Data Analysis Introduction to Multivariate Data

0 10 20 30 40 50 60 70-5

-4

-3

-2

-1

0

1

2

3

4

5

Sample Number

Sco

re o

n P

C#

1

Sample Scores with 95% Limits

0 10 20 30 40 50 60 70-4

-3

-2

-1

0

1

2

3

Sample Number

Sco

re o

n P

C#

2

Sample Scores with 95% Limits

0 10 20 30 40 50 60 70-3

-2

-1

0

1

2

3

Sample Number

Sco

re o

n P

C#

3

Sample Scores with 95% Limits

0 10 20 30 40 50 60 70-2.5

-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

Sample Number

Sco

re o

n P

C#

4

Sample Scores with 95% Limits

Page 102: Exercices Multivariate Data Analysis. Topic 1 Multivariate Data Analysis Topic 1 Theory: Multivariate Data Analysis Introduction to Multivariate Data

PLTLOADS Plots loadings from PCA

This function may be used to make 2-D and 3-D plots of loadings vectors against each other. The inputs to the function are the matrix of loadings vectors (loads), where each column represents a loadings vector from the PCA function, and an optional variable of labels (labels) which describe the original data variables.

Note: labels must be a "column vector" where each label is in single quotes and has the same number of letters.

Example: labels = ['Height'; 'Weight'; 'Waist '; 'IQ    ']

The function will prompt to select 2-D or 3-D plots, for the numbers of the PCs, and whether you would like "drop lines" and axes on the 3-D plots.

I/O: pltloads(loads,labels)


pltloads(loads,vars);

[Figures: Loadings for PC# 1 versus PC# 2, PC# 1 versus PC# 3 and PC# 1 versus PC# 4; points labelled Fe, Ti, Ba, Ca, K, Mn, Rb, Sr, Y and Zr]

pltscrs(scores,samps(1:63,:),class(1:63,:))

[Figures: Scores for PC# 1 versus PC# 2, PC# 1 versus PC# 3 and PC# 1 versus PC# 4; points labelled with the sample names (K, BL, SH and ANA series)]

PCAPRO Projects new data on old principal components model.

Inputs are the new data (newdata), the old loadings (loads), the old variance info (ssq), the limit for q (q), the limit for t^2 (tsq) and an optional variable (plots) which suppresses the plots when set to 0. Outputs are the new scores (scores), residuals (resids) and t^2 values (tsqs). These are plotted as the function proceeds if plots ~= 0.

I/O: [scores,resids,tsqs] = pcapro(newdata,loads,ssq,q,tsq,plots);

WARNING: Be sure that (newdata) is scaled the same as the original data!

AUTO Autoscales matrix to mean zero unit variance

Autoscales a matrix (x) and returns the resulting matrix (ax) with mean-zero unit variance columns, a vector of means (mx) and a vector of standard deviations (stdx) used in the scaling.

I/O: [ax,mx,stdx] = auto(x);

SCALE Scales matrix as specified.

Scales a matrix (x) using the means (mx) and standard deviations (stdx) specified.

I/O: sx = scale(x,mx,stdx);


axtest=scale(xtest,mx,stdx);
[scores_xtest]=pcapro(axtest,loads,ssq,reslm,tsqlm);
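The idea behind these two calls can be sketched in base MATLAB. This is only an illustration under stated assumptions: that scale(x,mx,stdx) subtracts mx and divides by stdx, and that the scores returned by pcapro are the scaled new data multiplied by the old loadings (pcapro also evaluates residual and T^2 statistics against the limits, which is only hinted at here). The data values and the names xcal, res_test and k are hypothetical.

```matlab
% Hypothetical calibration data (5 samples x 3 variables) and new samples
xcal  = [1 2 3; 2 4 1; 3 6 5; 4 8 2; 5 10 9];
xtest = [2 5 4; 6 11 7];

% Autoscale the calibration set and keep the scaling parameters
mx   = mean(xcal);
stdx = std(xcal);
axcal = (xcal - repmat(mx, size(xcal,1), 1)) ./ repmat(stdx, size(xcal,1), 1);

% Scale the new data with the calibration means and standard deviations
axtest = (xtest - repmat(mx, size(xtest,1), 1)) ./ repmat(stdx, size(xtest,1), 1);

% Loadings from the SVD of the autoscaled calibration data
[U, S, V] = svd(axcal, 'econ');
k = 2;                          % number of PCs kept in the model
loads = V(:, 1:k);

% Project both sets onto the old loadings
scores_cal  = axcal  * loads;
scores_test = axtest * loads;

% Sum of squared residuals of each new sample (the quantity compared with a Q limit)
res_test = sum((axtest - scores_test * loads').^2, 2);

disp(scores_test)
disp(res_test)
```

Scaling xtest with its own mean and standard deviation instead of mx and stdx would place the new samples in a different coordinate system, which is what the WARNING in the PCAPRO help text guards against.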

[Figures: New Sample Scores with 95% Limits from Old Model; Score on PC# 1, PC# 2, PC# 3 and PC# 4 versus Sample Number]

pltscrs([scores;scores_xtest],samps);

[Figures: Scores for PC# 1 versus PC# 2, PC# 1 versus PC# 3 and PC# 1 versus PC# 4; calibration samples labelled by name, projected test samples labelled s1 to s12]

Exercise: multivariate data analysis of environmental samples

• NW Mediterranean contamination by organic compounds


load env
whos

Name          Size      Bytes    Class     Attributes
sampnames     22x1       1458    cell
textdata      74x2      10874    cell
varnames      74x1       6296    cell
x             22x96     16896    double

plot(x)
plot(x')

[Figures: plot(x), values versus sample number, and plot(x'), values versus variable number; vertical scale x 10^4; annotations: variable 25 (UCM), PCBs]

Samples:
1 'Ty27', 2 'BC12', 3 'BC15', 4 'Ty23', 5 'TyK', 6 'Ty8', 7 'Ty17', 8 'BC4', 9 'Ty3', 10 'Ty19', 11 'BC8',
12 'A2', 13 'BC10', 14 'BC6', 15 'BC4', 16 'D3', 17 'BC9', 18 'D2', 19 'C1', 20 'D1', 21 'BC11', 22 'BC7'

Variables 1-24 (alkanes):
'n-C16' 'n-C17' 'n-C18' 'n-C19' 'n-C20' 'n-C21' 'n-C22' 'n-C23' 'n-C24' 'n-C25' 'n-C26' 'n-C27' 'n-C28' 'n-C29' 'n-C30' 'n-C31' 'n-C32' 'n-C33' 'n-C34' 'n-C35' 'n-C36' 'n-C37' 'n-C38' 'n-C39'

Variables 25-48:
'UCM (' 'pristane' 'phytane' 'fluoranthene' 'phenanthrene' 'anthracene' 'methylphenanthrene' 'dimethylphenanthrenes' 'fluoranthene' 'acephenanthrylene' 'pyrene' 'methylfluoranthenes' 'benzo[a]fluorene' 'benzo[b]fluorene' 'retene' 'benzo[b]phenanthrene' 'benz[a]anthracene' 'chrysene + triphenylene' 'benzo[/+b+/c]fluoranthenes' 'benzo[a]fluoranthene' 'benzo[e]pyrene' 'benzo[a]pyrene' 'perylene' 'indeno[7,1,2,3-cde/]chrysene'

Variables 49-65:
'indeno[1,2,3-cd]pyrene' 'benzo[ghi]perylene' 'benzo[ghi]fluoranthene' 'cyclopenta[cd]pyrene' 'dibenzoanthracenes' 'benzo[b]chrysene' 'coronene' '302 ??' 'naphtho[1,2-b]thiophene' 'dibenzothiophene' 'naphtho[2,1-b]thiophene' '4-methyldibenzothiophene' '3,2-methyldibenzothiophene' '1-methyldibenzothiophene' 'benzo[b]naphtho[2,1-d]thiophene' 'benzo[b]naphtho[1,2-d]thiophene' 'benzo[b]naphtho[2,3-b]thiophene'

Groups: alkanes 1-24; PAHs, alkenes 26-65

Variables 66-84:
'PCB-52' 'PCB-101' 'PCB-118' 'PCB-153' 'PCB-138' 'PCB-187' 'PCB-128' 'PCB-180' 'PCB-170' 'o,p'-DDD' 'o,p'-DDE' 'o,p'-DDT' 'p,p'-DDE' 'p,p'-DDD' 'p,p'-DDT' 'hexachlorobenzene' 'hexachlorohexane' 'lindane' 'octachlorostyrene'

Variables 85-96:
'27-nor-24-methylcholesta-5a,22(E)-dien-3β-ol' 'cholesta-5a,22(E)-dien-3β-ol' 'cholesterol' 'cholestanol' 'brassicasterol' '24-methyl-5a(W)-cholest-22(E)-en-3β-ol' '24-methylcholest-5-en-3β-ol' 'stigmasterol' '24-ethyl-5a-cholest-22-en-3β-ol' 'β-sitosterol' '24-ethyl-5a-cholestan-3β-ol' 'dinosterol'

Groups: organochlorine compounds, PCBs 66-84; sterols 85-96

[Figures: boxplots (Values versus Column Number) of variables 1-24, of variables 26-50 (excluding variable 25, UCM), of variables 51-74, and of all 96 variables]

>> stdx=std(x);
>> mx=mean(x);
>> sx=scale(x,zeros(1,96),stdx);
>> plot(sx)
>> plot(sx')
>> lsx=svd(sx);
>> lsx=lsx(1:10)

[Figures: plot(sx), plot(sx') and the first 10 singular values lsx versus component number]

lsx = 74.5755 28.6436 14.0461 9.3163 8.7101 8.3390 7.3844 6.4339 5.5273 5.1046

3 components?


[scores,loads,ssq,res,reslm,tsqlm,tsq]=pca(sx);

Warning: Data does not appear to be mean centered.
Variance captured table should be read as sum of squares captured.

Percent Variance Captured by PCA Model

Principal     Eigenvalue     % Variance     % Variance
Component         of          Captured       Captured
 Number         Cov(X)        This PC         Total
---------     ----------     ----------     ----------
    1         2.65e+002        78.50          78.50
    2         3.91e+001        11.58          90.08
    3         9.39e+000         2.78          92.86
    4         4.13e+000         1.23          94.09
    5         3.61e+000         1.07          95.16
    6         3.31e+000         0.98          96.14
    7         2.60e+000         0.77          96.91
    8         1.97e+000         0.58          97.49
    9         1.45e+000         0.43          97.92
   10         1.24e+000         0.37          98.29
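The warning appears because sx was scaled with zeros(1,96) as the vector of means, so its columns keep their means. The sketch below, in base MATLAB with small hypothetical data, contrasts scaling only with autoscaling. It then checks that the first eigenvalues in the table agree with lsx.^2/(m-1) for m = 22 samples; this normalization is an assumption that happens to reproduce the printed values.

```matlab
% Hypothetical data: 4 samples x 2 variables
x = [1 10; 2 30; 3 20; 6 40];
m = size(x, 1);
mx = mean(x);
stdx = std(x);

% Scaling only (means taken as zero), as in scale(x,zeros(1,96),stdx)
sx = x ./ repmat(stdx, m, 1);
% Autoscaling: subtract the column means, then divide by the standard deviations
ax = (x - repmat(mx, m, 1)) ./ repmat(stdx, m, 1);

disp(mean(sx))   % column means are not zero: the data are not mean centered
disp(mean(ax))   % column means are numerically zero

% Leading singular values of the scaled environmental data, copied from the lsx output
lsx = [74.5755 28.6436 14.0461];
lam = lsx.^2 / (22 - 1);   % compare with 2.65e+002, 3.91e+001 and 9.39e+000
disp(lam)
```

Without mean centering, the first component mostly describes the average profile of the samples, which is consistent with PC 1 capturing 78.50% of the sum of squares here.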

[Figures: Variable Number vs. Loadings for PC# 1, PC# 2 and PC# 3; annotated variable groups: alkanes, higher molecular weight alkanes, PAHs, PCBs, sterols]

[Figures: Sample Scores with 95% Limits; Score on PC# 1, PC# 2 and PC# 3 versus Sample Number]

pltloads(loads);

[Figure: Loadings for PC# 1 versus PC# 2; points labelled with the variable numbers 1 to 96; annotated groups: alkanes, sterols, PCBs, PAHs]

[Figure: Loadings for PC# 1 versus PC# 3; points labelled with the variable numbers 1 to 96; annotated groups: alkanes, PAHs, PCBs]

pltscrs(scores,samp)

[Figure: Scores for PC# 1 versus PC# 2; points labelled with the sample names; annotated group: open sea]

[Figure: Scores for PC# 1 versus PC# 3; points labelled with the sample names]

cluster(x,samp)

[Figure: Dendrogram Using Mahalanobis Distance on 3 PCs; horizontal axis: Distance to K-Nearest Neighbor; samples labelled by name; annotated groups: open sea, BCN, Gulf of Lion]
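The dendrogram title refers to a Mahalanobis distance on 3 PCs. A sketch of how such distances can be computed in base MATLAB follows, assuming this means the Euclidean distance between PCA scores after each score column has been divided by its standard deviation. The internals of cluster are not shown in these slides, so this is an illustration and not the toolbox code; the data and variable names are hypothetical.

```matlab
% Hypothetical data: 6 samples x 4 variables
x = [1 2 0 1; 2 1 1 0; 0 1 2 2; 9 8 7 9; 8 9 9 7; 7 7 8 8];
m = size(x, 1);
xc = x - repmat(mean(x), m, 1);           % mean centre the columns

[U, S, V] = svd(xc, 'econ');
k = 3;                                     % number of PCs retained
scores = xc * V(:, 1:k);                   % PCA scores
lam = diag(S(1:k, 1:k)).^2 / (m - 1);      % variance of each score column

% Scale each score column to unit variance, then take Euclidean distances
z = scores ./ repmat(sqrt(lam'), m, 1);
D = zeros(m);
for i = 1:m
    for j = 1:m
        D(i, j) = norm(z(i, :) - z(j, :));
    end
end
disp(D)
```

A hierarchical clustering routine would then merge samples starting from the smallest entries of D. Because every retained PC is given equal weight, components with small variance influence the grouping as much as PC 1.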

[axscores,axloads,axssq,axres,axreslm,axtsqlm,axtsq]=pca(ax);

Percent Variance Captured by PCA Model

Principal     Eigenvalue     % Variance     % Variance
Component         of          Captured       Captured
 Number         Cov(X)        This PC         Total
---------     ----------     ----------     ----------
    1         3.98e+001        41.42          41.42
    2         2.57e+001        26.78          68.21
    3         8.07e+000         8.41          76.62
    4         3.72e+000         3.88          80.50
    5         3.33e+000         3.47          83.97
    6         3.22e+000         3.35          87.32
    7         2.60e+000         2.70          90.02
    8         1.93e+000         2.01          92.03
    9         1.31e+000         1.36          93.39
   10         1.17e+000         1.22          94.62

autoscaled data

[Figures: Variable Number vs. Loadings for PC# 1, PC# 2 and PC# 3]

[Figures: Sample Scores with 95% Limits; Score on PC# 1, PC# 2 and PC# 3 versus Sample Number]

[Figure: Loadings for PC# 1 versus PC# 2; points labelled with the variable numbers 1 to 96; annotated groups: PAHs, alkanes, PCBs, sterols]

[Figure: Loadings for PC# 1 versus PC# 3; points labelled with the variable numbers 1 to 96]

[Figure: Scores for PC# 1 versus PC# 2; points labelled with the sample names; annotated groups: open sea (less contaminated), Gulf of Lion, Ebro Delta, BCN]

[Figure: Scores for PC# 1 versus PC# 3; points labelled with the sample names; annotated groups: Ebro Delta, BCN]

cluster(x,samp)

[Figure: Dendrogram Using Mahalanobis Distance on 3 PCs; horizontal axis: Distance to K-Nearest Neighbor; samples labelled by name; annotated groups: open sea, BCN, Gulf of Lion]

CHEMOMETRICS STUDY OF CTD ARCTIC SEA WATER DATA

• Introduction

• CTD data description

• PCA results for XTOT, Xdcm, Xsurf, Xdeep

• PLS prediction yfluor = f(Xdcm)

• PARAFAC modelling of X(80,200,10)

• MCR of Xfluor, Xcond, Xtemp,...

• PCA of continuous integrated data

ATOS PROJECT (July 2007)

[Figure: map of the CTD stations 1 to 49, between 30 W and 20 E and between 70 N and 90 N]

[Figures: longitude (long) and latitude (latd) versus Sample; points labelled by station, decluttered; annotations: E, W, 78N, 80N]

[Figure: enlarged map of stations y6a to y49a, between 2 E and 18 E and between 78 N and 81 N; annotations: SW, NE, colder waters, warmer waters, ICE]

10 CTD measured variables:
1 Depth, press
2 Temperature, temp
3 Conductivity, cond
4 Salt concentration, salt
5 Dissolved oxygen, oxyg
6 Beam light transmission, btrm
7 Fluorescence, fluor
8 Turbidimetry, turb
9 Latitude, latd
10 Longitude, long

CTD data evaluation and integration

[Diagram: one data table per CTD station (CTD station 1, ..., CTD station 20, ..., CTD station 49), each with depths (100, ..., 1000 m) and 10 variables]

Raw data table: X(53367x10), from 81 CTD experiments (49 stations with replicates and depths to 100-1000 m)

Station 19 was removed because it had only 54 depths

80 experiments, 49 stations

Building X(10,80,200)

For each variable: Xvar(80,200), Yvar (80 stations)

Fast data acquisition

Data should be averaged and filtered
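The arrangement named on this slide, an array X(10,80,200) with one 80 x 200 matrix Xvar per variable, can be sketched in base MATLAB as follows. The array is filled with placeholder numbers, and the names nvar, nexp, ndepth and Xunf are illustrative. The unfolding in the last step is one common way to prepare such an array for PCA and is an assumption, not necessarily what was done in this study.

```matlab
nvar = 10; nexp = 80; ndepth = 200;

% Placeholder three-way array X(variable, experiment, depth)
X = reshape(1:(nvar*nexp*ndepth), nvar, nexp, ndepth);

% Matrix for one variable, e.g. variable 2 (temp): experiments x depths
v = 2;
Xvar = reshape(X(v, :, :), nexp, ndepth);

% Unfold to experiments x (depths*variables): one block of 200 columns per variable
Xunf = reshape(permute(X, [2 3 1]), nexp, ndepth*nvar);

disp(size(Xvar))   % 80 200
disp(size(Xunf))   % 80 2000
```

In this layout, columns (v-1)*200+1 to v*200 of Xunf hold the same numbers as Xvar for variable v.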


[Figures: image plots with colour scales of salt, cond, temp and temp/salt, with axes 10 to 80 and 10 to 100; annotations: SW, NW, NE and labels 23, 49, 37, 11]

[Figures: image plots with colour scales of fluor, btrm, oxyg and turb, with axes 10 to 80 and 10 to 100; labels 23, 49, 37, 11]

[Figures: fluo, temp, salt and oxyg plotted against a 0 to 80 axis for the surf, dcm and deep layers]

[Figure: Correlation Map, Variables in Original Order (variables 1 to 10); the scale gives the value of R for each variable pair, from -0.8 to 0.8]

Correlation in dcm (chlorophyll maximum, fluorescence maximum)

Variables: 1 'press', 2 'temp', 3 'cond', 4 'salt', 5 'oxyg', 6 'btra', 7 'fluo', 8 'turb', 9 'long', 10 'latd'
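A correlation map of this kind is built from the matrix R of pairwise correlation coefficients between the columns of the data matrix. The following is a minimal sketch with the base MATLAB functions corrcoef and imagesc, using a small hypothetical data matrix rather than the CTD data.

```matlab
% Hypothetical data: 6 samples x 3 variables
x = [1 2 9; 2 4 7; 3 6 8; 4 8 3; 5 10 4; 6 12 1];

R = corrcoef(x);          % R(i,j) = correlation between columns i and j

imagesc(R, [-1 1]);       % colour-coded map of R on a fixed -1 to 1 scale
colorbar;                 % the scale gives the value of R for each variable pair
set(gca, 'XTick', 1:3, 'YTick', 1:3);
title('Correlation Map, Variables in Original Order');
```

In this toy example the second column is twice the first, so R(1,2) equals 1, while the third column decreases as the first increases and gives a negative R(1,3).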


Percent Variance Captured by PCA Model

Principal     Eigenvalue     % Variance     % Variance
Component         of          Captured       Captured
 Number         Cov(X)        This PC         Total
---------     ----------     ----------     ----------
    1         3.95e+000        39.52          39.52
    2         3.10e+000        31.00          70.53
    3         1.54e+000        15.42          85.95
    4         8.46e-001         8.46          94.41

1 2 3 4 5 6 7 8 9 10-0.4

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

Variable

Load

ings

on

PC

1 (

39.5

2%)

press

temp cond

salt

oxyg

btra

fluo turb

long latd

Variables/Loadings Plot for ydcm

1 2 3 4 5 6 7 8 9 10-0.5

-0.4

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

Variable

Load

ings

on

PC

2 (

31.0

0%)

press

temp cond

salt

oxyg

btra

fluo turb

long latd

Variables/Loadings Plot for ydcm

1 2 3 4 5 6 7 8 9 10-0.4

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

0.6

Variable

Load

ings

on

PC

3 (

15.4

2%)

press

temp cond

salt

oxyg

btra

fluo

turb

long latd

Variables/Loadings Plot for ydcm

PCAX(80,10)At DCM

Page 150: Exercices Multivariate Data Analysis. Topic 1 Multivariate Data Analysis Topic 1 Theory: Multivariate Data Analysis Introduction to Multivariate Data

PCA of X(80,10) at DCM

[Figure: Samples/Scores Plot of ydcm (decluttered). Scores on PC 2 (31.00%) vs. scores on PC 1 (39.52%); points carry sample labels (1, 2a, 2b, ..., 49b). Variable annotations on the plot: temp, cond, salt; oxyg; fluor, turb; btrm.]

Page 151

Percent Variance Captured by PCA Model

Principal     Eigenvalue     % Variance     % Variance
Component         of          Captured       Captured
 Number         Cov(X)        This PC         Total
---------     ----------     ----------     ----------
    1         3.86e+000        38.55          38.55
    2         2.01e+000        20.11          58.67
    3         1.53e+000        15.27          73.94
    4         1.19e+000        11.87          85.81

PCA of X(80,10) at surface
Excluded samples: 67, 42b

[Figure: Variables/Loadings Plot for ysurf. Loadings on PC 1 (38.55%) vs. variable: press, temp, cond, salt, oxyg, btra, fluo, turb, long, latd.]

[Figure: Variables/Loadings Plot for ysurf. Loadings on PC 2 (20.11%) vs. variable: press, temp, cond, salt, oxyg, btra, fluo, turb, long, latd.]

[Figure: Variables/Loadings Plot for ysurf. Loadings on PC 3 (13.35%) vs. variable: press, temp, cond, salt, oxyg, btra, fluo, turb, long, latd.]

Page 152

PCA of X(80,10) at surface
Excluded station 67

[Figure: Samples/Scores Plot of ysurf (decluttered). Scores on PC 2 (20.11%) vs. scores on PC 1 (38.55%); points carry sample labels (1, 1b, 2a, ..., 49b). Variable annotations on the plot: latd, long; temp, cond, salt, fluor; oxyg, btrm.]
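One way to compare the DCM and surface PCA models is to ask how many components each needs to reach a given cumulative variance. The sketch below, in Python, uses the "% Variance Captured Total" columns of the two tables; the 85% cut-off and the function name `pcs_needed` are hypothetical choices for illustration, not values taken from the slides.

```python
# Cumulative % variance captured, copied from the two PCA tables.
cumulative = {
    "DCM": [39.52, 70.53, 85.95, 94.41],
    "surface": [38.55, 58.67, 73.94, 85.81],
}


def pcs_needed(cum_percent, threshold):
    """Smallest number of PCs whose cumulative % reaches the threshold.

    Returns None if the listed PCs never reach it.
    """
    for n_pcs, value in enumerate(cum_percent, start=1):
        if value >= threshold:
            return n_pcs
    return None


threshold = 85.0  # hypothetical cut-off, not taken from the slides
needed = {name: pcs_needed(vals, threshold) for name, vals in cumulative.items()}
print(needed)
```

With this cut-off the DCM model needs three PCs and the surface model four, which reflects the flatter eigenvalue profile of the surface data.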

Page 155

Fluorescence PLS prediction from other parameters
(DCM data X(80,9), y(80,1))

Linear regression model using Partial Least Squares calculated with SIMPLS
Cross validation: random samples w/ 8 splits

Percent Variance Captured by Regression Model

         -----X-Block-----     -----Y-Block-----
Comp      This      Total       This      Total
----     -------   -------     -------   -------
  1       29.00     29.00       76.95     76.95
  2       38.72     67.72        2.51     79.46

[Figure: Variables/Loadings Plot for yred11. Reg Vector for Y 1 vs. variable: press, temp, cond, salt, oxyg, btra, turb, long, latd.]

[Figure: Variables/Loadings Plot for ydcm. VIP Scores for Y 1 vs. variable: press, temp, cond, salt, oxyg, btra, turb, long, latd.]

[Figure: Samples/Scores Plot of ydcm (decluttered). Y Predicted 1 vs. Y Measured 1; points carry sample labels.]
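The PLS report on this slide mentions cross validation with random samples and 8 splits. A minimal sketch of how 80 samples could be dealt into 8 random test sets is shown below, in Python rather than MATLAB. The function name `random_cv_splits` and the seed are hypothetical, and the toolbox's own procedure may differ in detail (for instance by repeating the random draw several times).

```python
import random


def random_cv_splits(n_samples, n_splits, seed=0):
    """Shuffle the sample indices and deal them into n_splits test sets."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Deal round-robin so the split sizes differ by at most one sample.
    return [sorted(indices[k::n_splits]) for k in range(n_splits)]


n_samples = 80  # rows of X(80,9) on the slide
splits = random_cv_splits(n_samples, n_splits=8)

for k, test_idx in enumerate(splits, start=1):
    held_out = set(test_idx)
    calibration_idx = [i for i in range(n_samples) if i not in held_out]
    # A PLS model would be fitted on calibration_idx and used to predict
    # test_idx here; the model fitting itself is outside this sketch.
    print(f"split {k}: {len(calibration_idx)} calibration, {len(test_idx)} test")
```

Each sample is left out exactly once, so the pooled test-set predictions give one cross-validated prediction per sample.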

Page 156

Fluorescence PLS prediction from other parameters, excluding beam transmission and turbidity
(DCM data X(80,7), y(80,1))

Linear regression model using Partial Least Squares calculated with SIMPLS
Cross validation: random samples w/ 8 splits

Percent Variance Captured by Regression Model

         -----X-Block-----     -----Y-Block-----
Comp      This      Total       This      Total
----     -------   -------     -------   -------
  1       34.94     34.94       33.23     33.23
  2       39.55     74.49       10.53     43.76
  3       11.72     86.21       12.44     56.20

[Figure: Variables/Loadings Plot for yred11. Loadings on LV 3 (11.72%) vs. variable: press, temp, cond, salt, oxyg, long, latd.]

[Figure: Variables/Loadings Plot for yred11. VIP Scores for Y 1 vs. variable: press, temp, cond, salt, oxyg, long, latd.]

[Figure: Samples/Scores Plot of ydcm (decluttered). Y Predicted 1 vs. Y Measured 1; points carry sample labels.]
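The VIP (variable importance in projection) plots summarise how much each X variable contributes to the latent variables, weighted by the y-variance each latent variable explains. The sketch below, in Python, implements the commonly used single-y VIP formula. The function name `vip_scores` and all the numbers in the toy example are hypothetical; they are not taken from the CTD data, and the toolbox used for the slides may compute VIP from slightly different quantities.

```python
import math


def vip_scores(W, T, q):
    """VIP scores for a single-y PLS model.

    W: list of A weight vectors (each of length p), one per latent variable
    T: list of A score vectors (each of length n)
    q: list of A y-loadings (regression of y on each score vector)
    """
    p = len(W[0])
    # y-variance explained by each latent variable: q_a^2 * t_a' t_a
    ss = [qa * qa * sum(t * t for t in ta) for qa, ta in zip(q, T)]
    total_ss = sum(ss)
    vips = []
    for j in range(p):
        acc = 0.0
        for wa, ssa in zip(W, ss):
            norm_sq = sum(w * w for w in wa)
            acc += ssa * (wa[j] ** 2) / norm_sq
        vips.append(math.sqrt(p * acc / total_ss))
    return vips


# Toy model (hypothetical numbers): 3 variables, 2 latent variables, 4 samples.
W = [[0.8, 0.6, 0.0], [0.0, 0.6, 0.8]]
T = [[1.0, -1.0, 2.0, -2.0], [1.0, 1.0, -1.0, -1.0]]
q = [0.5, 0.25]

vips = vip_scores(W, T, q)
print([round(v, 3) for v in vips])
```

By construction the squared VIP scores average to 1 over the variables, which is why variables with VIP above 1 are usually read as the more influential ones.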

Page 157

Fluorescence PLS prediction from other parameters
(surf data X(80,9), y(80,1))

Percent Variance Captured by Regression Model

         -----X-Block-----     -----Y-Block-----
Comp      This      Total       This      Total
----     -------   -------     -------   -------
  1       33.89     33.89       29.08     29.08
  2       22.29     56.18        3.53     32.61
  3       14.83     71.01        2.76     35.37
  4        4.80     75.81        5.10     40.48

[Figure: Variables/Loadings Plot for ysurf. Reg Vector for Y 1 vs. variable: press, temp, cond, salt, oxyg, btra, turb, long, latd.]

[Figure: Variables/Loadings Plot for ysurf. VIP Scores for Y 1 vs. variable: press, temp, cond, salt, oxyg, btra, turb, long, latd.]

[Figure: Samples/Scores Plot of ysurf (decluttered). Y Predicted 1 vs. Y Measured 1; points carry sample labels.]