a guide to the practical use of multivariate analysis in sims...a guide to the practical use of...

63
A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington, UK A copy of these slides can be downloaded from http://www.npl.co.uk/nanoanalysis/chemometrics.html

Upload: others

Post on 27-Feb-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

A Guide to the Practical Use ofMultivariate Analysis in SIMS

J LS Lee, I S GilmoreNational Physical Laboratory, Teddington, UK

A copy of these slides can be downloaded from http://www.npl.co.uk/nanoanalysis/chemometrics.html

Page 2: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 2

0

20

40

60

80

100

120

1990 2000 2010

Year of Publication

Num

ber

of P

ublic

atio

ns

PCAMCRPLSDFAANNs

Why are we here?

Page 3: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 3

Data analysis

SIMS Dataset

SIMS Dataset

How is it related to known properties?

Where are they located?

What chemicalsare on the surface?

Calibration / Quantification

Classification

Identification

Can we predictthese properties?

Which group does it belong to?

What are the differencesbetween groups?

Page 4: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 4

Contents

1. Introduction• What is multivariate analysis?• Some matrix algebra…

2. Identification3. Quantification and prediction4. Classification5. Conclusion

Page 5: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 5

Multivariate analysis

• Multivariate = More than 1 variable• Multivariate analysis is the statistical study of the dependence (covariance)

between different variables

• Variables are numerical values that we can measure on a sampleExample 1: A sample of people

Variables: Height, weight, shoe size, days since last haircut…

Example 2: A sample of weatherVariables: Temperature, humidity, wind speed, visibility, UV index…

Example 3: A sample of SIMS spectraVariables: Intensity of Si+ peak, intensity of O+ peak, intensity of C3H5

+

peak…

Key points• Many surface analytical technique, incl. SIMS and

XPS, gives data that are multivariate in nature• Finding correlations in the data is the key to

multivariate analysis!

Page 6: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 6

Why use multivariate analysis?

• Modern ToF-SIMS instrument generates huge, multivariatedata sets

• Manual analysis involves selecting a sub-set of most interesting features for analysis by eye

• Multivariate analysis involves simultaneous statistical analysis of all the variables

• Multivariate analysis can summarise the data with a large number of variables, using a much smaller number of “factors”

72 ion images (out of > 400!)

Page 7: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 7

Advantages and disadvantages

• Advantages– Fast and efficient on modern computers– Uses all information available– Improves signal to noise ratio– Statistically valid, removes potential bias

• Disadvantages– Lots of different methods, procedures, terminologies– Can be difficult to understand and interpret

Page 8: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 8

Contents

1. Introduction• What is multivariate analysis?• Some matrix algebra…

2. Identification3. Quantification and prediction4. Classification5. Conclusion

Page 9: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 9

Data matrix

X has 3 row and 5 columns →3 × 5 data matrix

Each row (spectra) is represented by a vector

=66301224

124222018

21110329

X

Variables

Sam

ples

Mass spectrum of Sample 1

010203040

1 2 3 4 5Mass

Inte

nsity

Mass spectrum of Sample 2

0

10

20

30

1 2 3 4 5Mass

Inte

nsity

Mass spectrum of Sample 3

010203040

1 2 3 4 5Mass

Inte

nsity

A matrix is simply a rectangular table of numbers!

Page 10: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 10

Matrix algebra and SIMS

Mass spectrum of Sample 1

010203040

1 2 3 4 5Mass

Inte

nsity

Mass spectrum of Sample 2

0

10

20

30

1 2 3 4 5Mass

Inte

nsity

Mass spectrum of Sample 3

010203040

1 2 3 4 5Mass

Inte

nsity

Mass spectrum of Chemical 1

0

2

4

6

8

1 2 3 4 5Mass

Inte

nsity

Mass spectrum of Chemical 2

0

2

4

6

1 2 3 4 5Mass

Inte

nsity

= ×

60Sample

3

42Sample

2

15Sample

1

Chemical2

Chemical1

60

42

15

Variables [mass]S

amples

Variables [mass]Chemicals

Chem

icals

Data matrix Chemical spectraSample composition= ×

11524

40161

66301224

124222018

21110329

= ×

Sam

ples

Page 11: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 11

Matrix algebra and SIMS

1. Each spectrum can be represented by a vector

2. Instead of x, y, z in 3D real space, the axes are mass1, mass2, mass3… etc in variable space (also ‘data space’)

3. Assuming the data are a linear combination of chemical spectra, we can write it as a product of two matrices

4. There are infinite number of possible solutions!

60

42

15

Variables [mass]S

amples

Variables [mass]Chemicals

Chem

icals

Data matrix Chemical spectraSample composition= ×

11524

40161

66301224

124222018

21110329

= ×

Sam

ples

Page 12: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 12

Factor analysis

1. Each spectrum can be represented by a vector

2. Instead of x, y, z in 3D real space, the axes are mass1, mass2, mass3… etc in variable space (also ‘data space’)

3. Assuming the data are a linear combination of chemical spectra, we can write it as a product of two matrices

4. There are infinite number of possible solutions!

-1.63.8

-0.23.7

1.93.6

Variables [mass]S

amples

Variables [mass]Factors

Factors

Data matrix “Loadings”“Scores”= ××××

−−− 4.43.13.59.59.3

5.30.17.57.57.4

66301224

124222018

21110329

= ××××

Sam

ples

We can describe the data as a linear combination of spectra, by writing the data matrix as product of two matrices:

One contains the spectra (“loadings”)One containing the contributions (“scores”)

This is the basis of factor analysis!

Page 13: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 13

Contents

1. Introduction2. Identification

• Principal component analysis (PCA)• PCA walkthrough• Data preprocessing• PCA examples• Multivariate curve resolution (MCR)• MCR examples

3. Quantification and prediction4. Classification5. Conclusion

Page 14: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 14

Data analysis

SIMS Dataset

SIMS Dataset

How is it related to known properties?

Where are they located?

What chemicalsare on the surface?

Calibration / Quantification

Classification

Identification

Can we predictthese properties?

Which group does it belong to?

What are the differencesbetween groups?

Page 15: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 15

Terminology

In order to clarify existing terminology and emphasise the relationship between the different multivariate techniques, we are going to adopt the following terminology in this lecture

ISO 18115-1 Surface chemical analysis – Vocabulary –Part 1: General terms and terms for the spectroscopies

ScoresPure Component Concentration

Scores, Projections

Projection of the samples onto the factors

TScores

LoadingsPure Component Spectrum

Loadings, Eigenvector

Projection of a factor onto the variables

PLoadings

Latent Vectors, Latent Variables

Pure Component

Principal Component

An axis in the data space of a factor analysis model, representing an underlying dimension that contributes to summarising or accounting for the original data set

-Factor

PLSMCRPCADefinitionSymbolTerms Here

Page 16: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 16

Principal component analysis (PCA)

m1

m2

PCA Factor 1

PCA Factor 2

• Factors are directions in the data space chosen such that they reflect interesting properties of the dataset

• Equivalent to a rotation in data space – factors are new axes

• Data described by their projections onto the factors

PCA Factor 1

PCA Factor 2

Page 17: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 17

Principal component analysis (PCA)

m1

m2

PCA Factor 1

PCA Factor 2

• The projection of the PCA factors onto the original variables (m1, m2) are ‘loadings’

• The projection of the samples (stars) onto the PCA factors are ‘scores’

• The data is fully described by D factors, where D is the ‘dimensionality of the data’ (number of samples or variables, whichever is smaller)

Data matrix

Scores matrix Loadings matrix

PTX ′=

PCA Factor 1

PCA Factor 2

I = no. of samplesK = no. of mass units

D = dimensionality of data

))(()( KDDIKI ××=×

Page 18: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 18

Principal component analysis (PCA)

m1

m2

PCA Factor 1

PCA Factor 2

• PCA extract orthogonal (uncorrelated)factors that successively capture the largest amount of variance within the data

• The amount of variance described by each factor is called ‘eigenvalue’

PCA Factor 1(78%)

PCA Factor 2 (22%)

Data matrix

Scores matrix Loadings matrix

PTX ′=))(()( KDDIKI ××=×

I = no. of samplesK = no. of mass units

D = dimensionality of data

Page 19: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 19

Principal component analysis (PCA)

m1

m2

PCA Factor 1

• By removing higher factors (small variance due to noise) we can reduce the dimensionality of data ⇒ ‘factor compression’

• Often hundreds of variables can be described with just a handful of factors!

Data matrix

Scores matrix Loadings matrix

Residuals (noise)

EPTX +′=

PCA Factor 1(78%)

)())(()( KIKNNIKI ×+××=×

I = no. of samplesK = no. of mass units

N = no. of PCA factors

PCA reproduceddata matrix

Scores matrix Loadings matrix

))(()( KNNIKI ××=×PTX ′=

Page 20: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 20

Number of factors

1 2 3 4 5 6 7 810-15

10-10

10-5

100

105

1010

Var

ianc

e ca

ptur

ed (

Eig

enva

lues

)

1 2 3 4 5 6 7 8101

102

103

104

105

106

107

108

PCA Factor

(a)

(b)

Data set of 8 spectra from mixing 3 pure compound spectra

no noise

Poisson noise

max 5000 counts

1. Prior knowledge of system

2. ‘Scree test’:Eigenvalue plot levels off in a linearly decreasing manner after 3 factors

3. Percentage of variance captured byNth PCA factor:

4. Percentage of total variance captured by first N PCA factors:

%100seigenvalue all of sum

eigenvalue th

×N

%100seigenvalue all of sum

to up seigenvalue of sum ×N

Page 21: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 21

Contents

1. Introduction2. Identification

• Principal component analysis (PCA)• PCA walkthrough• Data preprocessing• PCA examples• Multivariate curve resolution (MCR)• MCR examples

3. Quantification and prediction4. Classification5. Conclusion

Page 22: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 22

PCA walkthrough

Eight library spectra:

PS 2480PS 3550PMMA 2170PMMA 2500PEG 1470PEG 4250PPG 425PPG 1000

Unit mass binnedand mean centeredprior to analysis

Calculation using MATLAB with PLS Toolbox 4.0

Page 23: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 23

PCA walkthrough

Perform scree test using the log eigenvalue plot

Page 24: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 24

PCA walkthrough

First PCA factor (PC1):

Page 25: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 25

PCA walkthrough

Second PCA factor (PC2):

Page 26: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 26

PCA walkthrough

Third PCA factor (PC3):

Page 27: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 27

PCA walkthrough

Biplot - PCA factor 1 against PCA factor 2:

Page 28: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 28

PCA walkthrough

Using PCA we have effectively reduced 300 correlated variables (mass units) to 3 independent variables (factors) by which all the samples can be characterised.

Page 29: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 29

Contents

1. Introduction2. Identification

• Principal component analysis (PCA)• PCA walkthrough• Data preprocessing• PCA examples• Multivariate curve resolution (MCR)• MCR examples

3. Quantification and prediction4. Classification5. Conclusion

Page 30: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 30

Data preprocessing

Data preprocessing is the manipulation of data prior to data analysis…

Page 31: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 31

• Enhances PCA by bringing out important variance in dataset• Makes assumption about the nature of variance in data• Can distort interpretation and quantification

• Includes:– Peak selection and binning– Centering

• Mean centering– Scaling

• Normalisation• Variance scaling• Poisson scaling• Binominal scaling

Data preprocessing

Page 32: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 32

Peak selection and binning

All peaks of interest included?What threshold to use?

Auto peak search

Straightforward to use but detailed information lost

Unit mass binning

0.5 u binning *Separates organic from inorganics

Peaks of interest onlyUnexpected features lost

Manual selection

A Henderson et al., Surf. Interface Anal. 41 (2009) 666-674

Important considerations

• What information are we putting into PCA? What is included? What is omitted?

• Do we need to apply further processinge.g. dead time correction?

Page 33: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 33

• Subtract mean spectrum from each sample• PCA describes variations from the mean

Mean centering

m1

Factor 2

Factor 1

Raw data

1st factor goes from origin to centre of gravity of data

m2

m1

m3

Factor 1Factor 2

Mean Centering

1st factor goes from origin and accounts for the highest variance

( )kikik :mean~

XXX −=

m3

m2

Page 34: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 34

Normalisation

• Divide each spectrum by a constant for each sample e.g. intensity of a specific ion, total ion intensity

• Assumes chemical variances can be described by relative changes in ion intensities

• Preserves the shape of spectra

• Reduces effects of topography, sample charging, changes in primary ion current

( ) iki

ik XX

X ×=:sum

~ 1

0 50 100 150Mass, u

0 50 100 150Mass, u

Normalised by total ion intensity

Page 35: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 35

Variance scaling

• Divide each variable by its standard deviation in the dataset• Equalises importance of each variable (i.e. mass)• Problematic for weak peaks – usually used with peak selection• Called ‘auto scaling’ if combined with mean centering

Var

ianc

e

Mean

For each variable (mass, in SIMS spectrum)

Raw dataMean-

centeringVariance scaling

Auto scaling

Diagram from P. Geladi and B. Kowalski, Partial Least-Squares Regression: A Tutorial, Analytica Chimica Acta, 185 (1986) 1

( ) ikk

ik XX

X ×=:var

~ 1

Page 36: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 36

Poisson and binomial scaling

• SIMS data is dominated by Poisson counting noise – statistical uncertainty of a peak is proportional to intensity

• The noise becomes binomial for saturated data with dead time correction

• Divide data by the estimated noise variance of each data point

M R Keenan et al. Surf. Interface Anal., 36 (2004) 203M R Keenan et al., Surf. Interface Anal., 40 (2008) 97

No preprocessingNo. of factors?

Poisson scaling4 factors needed

• Emphasises weak peaks which vary above the expected counting noise, over intense peaks varying solely due to counting statistics

• Provides better noise rejection in PCA

Page 37: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 37

Contents

1. Introduction2. Identification

• Principal component analysis (PCA)• PCA walkthrough• Data preprocessing• PCA examples• Multivariate curve resolution (MCR)• MCR examples

3. Quantification and prediction4. Classification5. Conclusion

Page 38: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 38

PCA example (1)

• Three protein compositions (100% fibrinogen, 50% fibrinogen / 50% albumin, 100% albumin) adsorbed onto poly(DTB suberate)

• Loadings on first factor (PC1) shows relative abundance of amino acid peaks of two proteins

• Scores on PC1 separates samples based on protein composition

D.J. Graham et al, Appl. Surf. Sci., 252 (2006) 6860

PC

1 S

core

s (6

2%)

PC

1 Lo

adin

gs (

62%

)

Fib

Alb

Page 39: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 39

PCA example (2)

• SIMS spectra acquired for antiferritin with or without trehalose coating

• Largest variance (PC 1) arises from sample heterogeneity

• PC 2 distinguishes samples protected by trehalose –higher intensities of polar and hydrophilic amino acid fragments

• Trehalose preserves protein conformation in UHV

N. Xia et al, Langmuir, 18 (2002) 4090

Page 40: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 40

PCA example (3)

• 16 different single protein films adsorbed on mica

• Excellent classification of proteins using only 2 factors

• Loadings consistent with total amino acid composition of various proteins

• 95% confidence limits provide means for identification / classification

M. Wagner & D. G. Castner, Langmuir, 17 (2001) 4649

PC

2 S

core

s (1

9%)

PC1 Scores (53%)

PC

A L

oadi

ngs

Page 41: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 41

PCA image analysis

I, rows

J, columnsK, m

ass

peaks963

852

741

963

852

741

• ‘Datacube’ contains a raster of I x J pixels and K mass peaks

• The datacube is rearranged into 2D data matrix with dimensions [(I × J) × K] prior to PCA – ‘unfolding’

• PCA results are folded to form scores images prior to interpretation

9

8

7

6

5

4

3

2

1

963

852

741 unfold

9

8

7

6

5

4

3

2

1

9

8

7

6

5

4

3

2

1

Page 42: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 42

2 4 6 8 10 12 14 16 18 20

0

0.2

0.4

0.6

0.8

Sorted eigenvector index

log

(eig

env

alue

s)

2 4 6 8 10 12 14 16 18 20-2

-1

0

1

2

Sorted eigenvector index

log(

eig

enva

lue

s)

2 4 6 8 10 12 14 16 18 20-5

-4.5-4

-3.5-3

-2.5-2

-1.5

Sorted eigenvector index

log

(eig

enva

lues

)

PCA image example (1)

Immiscible PC / PVC polymer blend42 counts per pixel on average

Total ion image

Mean centering

Normalisation

Poisson scaling

Only 2 factors needed –dimensionality of image reducedby a factor of 20!

J L S Lee, I S Gilmore, in Surface Analysis: The Principal Techniques 2nd edition (eds J C Vickerman, I S Gilmore), Wiley.

Page 43: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 43

-5

0

5

10

0 5 10 15 20 25 30 35 40-0.5

0

0.5

1

1.5

2

Mass, u

PCA image example (1)

PCA results after Poisson scaling and mean centering

1st factor distinguishes PVC and PC phases

PC1 loadings

PC1 scores

-2

0

2

4

6

8

10

0 5 10 15 20 25 30 35 40-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

Mass, u

2nd factor shows detector saturation for intense 35Cl peak

PC2 loadings

PC2 scores

35Cl + 37Cl

O + OH

J L S Lee, I S Gilmore, in Surface Analysis: The Principal Techniques 2nd edition (eds J C Vickerman, I S Gilmore), Wiley.

Page 44: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 44

PCA image example (2)

72 ion images (out of > 400!)

Image courtesy ofDr Ian FletcherIntertek MSG

50µµµµmMass, u

Total Spectra

Page 45: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 45

PCA image example (2)

Hair fibre with multi-component pretreatment

PCA factors are abstract combinations of chemical components andoptimally describe variance – PCA results can be difficult to interpret!

Image courtesy of Dr Ian Fletcher, Intertek MSG

Fac

tor

1

1000

2000

3000

4000

5000

Fac

tor

2

-3000

-2000

-1000

0

1000

0

0.2

0.4

A

C

D E

B

-0.2

0

0.2

0.4

0.6

E

CB

0.8

PCA scores PCA loadings

Fac

tor

1

1000

2000

3000

4000

5000

Fac

tor

2

-3000

-2000

-1000

0

1000

Fac

tor

1

1000

2000

3000

4000

5000

Fac

tor

2

-3000

-2000

-1000

0

1000

0

0.2

0.4

A

C

D E

B

-0.2

0

0.2

0.4

0.6

E

CB

0.8

0

0.2

0.4

A

C

D E

B

-0.2

0

0.2

0.4

0.6

E

CB

0.8

PCA scores PCA loadings

Fac

tor

3

-1000

0

1000

2000

3000

Fac

tor

4

-1000

0

1000

2000

Fac

tor

5

-1000

0

1000

2000

-0.4

-0.2

0

0.2

0.4

0.6

0.8

D

B

A

-0.4

-0.2

0

0.2

0.4

0.6

A

E

B

C

-0.4

-0.2

0

0.2

0.4

0.6

0.8

Mass (arb. scale)

C

D

B

Fac

tor

3

-1000

0

1000

2000

3000

Fac

tor

4

-1000

0

1000

2000

Fac

tor

5

-1000

0

1000

2000

Fac

tor

3

-1000

0

1000

2000

3000

Fac

tor

4

-1000

0

1000

2000

Fac

tor

5

-1000

0

1000

2000

-0.4

-0.2

0

0.2

0.4

0.6

0.8

D

B

A

-0.4

-0.2

0

0.2

0.4

0.6

A

E

B

C

-0.4

-0.2

0

0.2

0.4

0.6

0.8

Mass (arb. scale)

C

D

B

-0.4

-0.2

0

0.2

0.4

0.6

0.8

D

B

A

-0.4

-0.2

0

0.2

0.4

0.6

A

E

B

C

-0.4

-0.2

0

0.2

0.4

0.6

0.8

Mass (arb. scale)

C

D

B

J L S Lee et al., Surf. Interface Anal. 2009, 41, 653-665

Page 46: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 46

PCA summary

• PCA describes the original data using factors, consisting of loadingsand scores which efficiently accounts for variance in the data

• Eigenvalues give the variance captured by the corresponding factors

• Data preprocessing method needs to be selected with care

• PCA is excellent for discrimination and classification based on differences in spectra, and for identifying important mass peaks

• PCA factors optimally describe variance – PCA results may be difficult to interpret

Data matrix

Projection of samplesonto factors (scores matrix)

Projection of factorsonto variables (loadings matrix)

Residuals (noise)

EPTX +′=

I = no. of samplesK = no. of mass unitsN = no. of factors

Page 47: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 47

Contents

1. Introduction2. Identification

• Principal component analysis (PCA)• PCA walkthrough• Data preprocessing• PCA examples• Multivariate curve resolution (MCR)• MCR examples

3. Quantification and prediction4. Classification5. Conclusion

Page 48: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 48

Multivariate curve resolution (MCR)

• PCA factors are directions that describes variance– positive and negative peaks in the loadings– can be difficult to interpret

• What if we want to resolve original chemical spectra and reverse the following process?

• Try multivariate curve resolution (MCR)!

60

42

15

Variables [mass]

Sam

ples

Variables [mass]Chemicals

Chem

icals

Data matrix Chemical spectraSample composition= ×

11524

40161

66301224

124222018

21110329

= ×S

amples

Page 49: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 49

Multivariate curve resolution (MCR)

Data matrix

Projection of samplesonto factors (scores matrix)

Projection of factors onto variables (loadings matrix)

Residuals (noise)

EPTX +′=)())(()( KIKNNIKI ×+××=×

I = no. of samplesK = no. of mass unitsN = no. of factors

m1

m2 MCR Factor 2

MCR Factor 1

MCR is designed for recovery of chemical spectra and contributions from a multi-component mixture, when little or no prior information about the composition is available

MCR assumes linear combination of chemical spectra (loadings) and contributions (scores) – only an approximation in SIMS

Page 50: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 50

Multivariate curve resolution (MCR)

MCR uses an iterative least-squares algorithm to extract solutions, while applying suitable constraints

With non-negativity constraint, MCR factors resemble SIMS spectra and chemical contributions more directly, as these must be positive

m1

m2 MCR Factor 2

MCR Factor 1

Data matrix

Projection of samplesonto factors (scores matrix)

Projection of factors onto variables (loadings matrix)

Residuals (noise)

EPTX +′=)())(()( KIKNNIKI ×+××=×

I = no. of samplesK = no. of mass unitsN = no. of factors

Page 51: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 51

Outline of MCR Raw Data

Constraints•Non-negativity•Equality

Convergence criterion

•Non-negativity•Equality

MCRalternating-least-squares

optimisation

EPTX +′=

Initial Estimates of T or P•Random initialisation•PCA loadings or scores•Varimax rotated PCA loadings or scores

•Pure variable detection algorithm e.g. SIMPLISMA Number of Factors

ReproducedData Matrix

•Noise filtered data•Ensures MCR solution is robust

X

MCRScores T

MCRLoadings P

PCA

Data Matrix X

Page 52: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 52

• MCR solutions are not unique!• Accuracy of resolved spectra depends on the existence of pixels or

samples where there is only contribution from one chemical component (‘selectivity’)

• Good initial estimates, suitable data preprocessing and correct number of factors are essential

m2

Rotational ambiguity

m1

MCR Factor 2

MCR Factor 1

m2

m1

MCR Factor 2

MCR Factor 1

Page 53: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 53

Contents

1. Introduction2. Identification

• Principal component analysis (PCA)• PCA walkthrough• Data preprocessing• PCA examples• Multivariate curve resolution (MCR)• MCR examples

3. Quantification and prediction4. Classification5. Conclusion

Page 54: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 54

MCR image example (1)

0 10 20 30 400

0.5

1

1.5

2

(a)

Loadings on MCR factor 1

0 10 20 30 400

0.5

1

1.5

Mass, u

Loadings on MCR factor 2

(b)

Scores on MCR factor 1

Scores on MCR factor 2

0

5

10

0

5

10

Simple PVC / PC polymer blend

• MCR extracts two distinctive factors, corresponding to PVC and PC respectively

• Straightforward interpretation

PVC

PC

J L S Lee et al., Surf. Interface Anal. 2008, 40, 1-14

Page 55: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 55

MCR image example (2)

72 ion images (out of > 400!)

Image courtesy ofDr Ian FletcherIntertek MSG

50µµµµmMass, u

Total Spectra

Page 56: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 56

MCR image example (2)

0

0.2

0.4

0.6

D

B

0

0.2

0.4 E

0

0.2

0.4

0.6

0.8

1

Mass (arb. scale)

C

Fac

tor

3

0

1000

2000

3000

4000

5000

Fac

tor

4

0

1000

2000

3000

4000

Fac

tor

5

0

1000

2000

MCR scores MCR loadings

0

0.2

0.4

0.6

0.8

1

B

0

0.2

0.4

0.6

0.8A

Fac

tor

1

0

1000

2000

3000

4000

5000

6000

Fac

tor

2

0

1000

2000

3000

MCR scores MCR loadings

Mass (arb. scale)

MCR loadings resemble SIMS spectra (characteristic peaks A-E) and fragments, and scores directly reveal spatial distributions!

J L S Lee et al., Surf. Interface Anal. 2009, 41, 653-665

Image courtesy of Dr Ian Fletcher, Intertek MSG

Page 57: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 57

PCA Scores 1 PCA Scores 2 PCA Scores 3MCR Scores 1 MCR Scores 2 MCR Scores 3

MCR resolves the original images unambiguously!

MCR image example (3)

• We take three pictures and assign each with a SIMS spectra (PBC, PC, PVT)

• The pictures are combined to form a multivariate image dataset• Poisson noise are added to the image (avg ~50 counts per pixel)

Page 58: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 58

MCR summary

• MCR describes the original data using factors, consisting of loadingsand scores which which resembles chemical spectra and contributions from a multi-component mixture, respectively

• MCR uses an iterative algorithm to extract solutions, while applying suitable constraints e.g. non-negativity

• Good initial estimates and suitable data preprocessing are essential

• MCR is excellent for identification and localisation of chemicals in complex mixtures and allows for direct interpretation

Data matrix

Projection of samplesonto factors (scores matrix)

Projection of variablesonto factors (loadings matrix)

Residuals (noise)

EPTX +′=

I = no. of samplesK = no. of mass unitsN = no. of factors

Page 59: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 59

Identification summary

Easy –

Non-negative scores and loadings

Medium / Difficult –

Abstract, orthogonal factors

Easy –

Single ion images

Ease of interpretation

Difficult ? –Possibly depend on system studied

Easy –Higher factors capture small variance

Difficult –Only if substance is known

Detection of minor components

Easy –Full spectra obtained

Medium –Important peaks and correlation

Difficult –Characteristic peaks only

Chemical identification

Most suitable for

Attributes

Identification for unknown mixtures

Discrimination of similar chemical phases

Simple dataset with good prior knowledge

MCRPCAManual analysis

Page 60: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 60

Contents

1. Introduction2. Identification3. Quantification and prediction4. Classification5. Conclusion

Page 61: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 61

Data analysis

SIMS Dataset

SIMS Dataset

How is it related to known properties?

Where are they located?

What chemicalsare on the surface?

Calibration / Quantification

Identification

Can we predictthese properties?

PCA, MCR

PLS

ClassificationWhich group does it belong to?

What are the differencesbetween groups?

PC-DFA, PLS-DA

Page 62: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 62

Conclusion

In this lecture we looked at– Identification, quantification & classification using multivariate analysis– Importance of validation for predictive models– Data preprocessing techniques and their effects

Surface and Interface AnalysisMultivariate Analysis special issues (Volume 41 Issue 2 & 8, Feb/Aug 2009)

http://mvsa.nb.uw.edu/Community website with tutorials, links and software

Developed by Dan Graham and hosted by NESAC/BIO

Surface Analysis: The Principal Techniques 2nd edition , Chapter 10 “The application of multivariate data analysis techniques in surface analysis”

http://www.npl.co.uk/nanoanalysis/chemometrics.html

• Tutorial slides• Multivariate analysis terminology

defined in ISO 18115-1

Page 63: A Guide to the Practical Use of Multivariate Analysis in SIMS...A Guide to the Practical Use of Multivariate Analysis in SIMS J LS Lee, I S Gilmore National Physical Laboratory, Teddington,

Slide 63

Bibliography

General• J. L. S. Lee et al, “The application of multivariate data analysis techniques in surface analysis”, in Surface Analysis: The

Principal Techniques 2nd edition (eds J C Vickerman, I S Gilmore), Wiley.• S. Wold, Chemometrics; what do we mean with it, and what do we want from it?, Chemom. Intell. Lab. Syst. 30 (1995) 109• E. R. Malinowski, Factor analysis in Chemistry, John Wiley and Sons (2002)• P. Geladi et al, Multivariate image analysis, John Wiley and Sons (1996)• J. L. S. Lee et al, Quantification and methodology issues in multivariate analysis of ToF-SIMS data for mixed organic systems,

Surf. Interface Anal. 40 (2008) 1• D. J. Graham, NESAC/BIO ToF-SIMS MVA web resource, http://nb.engr.washington.edu/nb-sims-resource/

PCA• D. J. Graham et al, Information from complexity: challenges of ToF-SIMS data interpretation, Appl. Surf. Sci. 252 (2006) 6860• M. R. Keenan et al, Accounting for Poisson noise in the multivariate analysis of ToF-SIMS spectrum images, Surf. Interface

Anal. 36 (2004) 203• M. R. Keenan et al, Mitigating dead-time effects during multivariate analysis of ToF-SIMS spectral images, Surf. Interface Anal.

40 (2008) 97

MCR• N. B. Gallagher et al, Curve resolution for multivariate images with applications to TOF-SIMS and Raman, Chemom. Intell.

Lab. Syst. 73 (2004) 105• J. A. Ohlhausen et al, Multivariate statistical analysis of time-of-flight secondary ion mass spectrometry using AXSIA, Appl.

Surf. Sci. 231-232 (2004) 230• R. Tauler, A. de Juan, MCR-ALS Graphic User Friendly Interface, http://www.ub.edu/mcr/

PLS• P. Geladi et al, Partial Least-Squares Regression: A Tutorial, Analytica Chimica Acta 185 (1986) 1• A. M. C. Davies et al, Back to basics: observing PLS, Spectroscopy Europe 17 (2005) 28