Multivariate Resolution in Chemistry

Lecture 2

Roma Tauler
IIQAB-CSIC, Spain

e-mail: [email protected]

Lecture 2

• Resolution of two-way data.
• Resolution conditions.
  – Selective and pure variables.
  – Local rank.
  – Natural constraints.
• Non-iterative and iterative resolution methods and algorithms.
• Multivariate Curve Resolution using Alternating Least Squares, MCR-ALS.
• Examples of application.

Multivariate (Soft) Self-Modeling Curve Resolution (definition)

• Group of techniques that aim to recover the response profiles (spectra, pH profiles, time profiles, elution profiles, ...) of more than one component in an unresolved and unknown mixture obtained from chemical processes and systems, when little or no prior information is available about the nature and/or composition of these mixtures.

Chemical reaction systems monitored using spectroscopic measurements

[Figure: data matrix D (I × J) decomposed as D = C S^T + E, with the concentration profiles C and the spectra S^T of the N reacting species.]

$$d_{ij} = \sum_{k=1}^{N} c_{ik} s_{kj} + e_{ij}$$

Bilinearity!

Analytical characterization of complex environmental, industrial and food mixtures using hyphenated methods (chromatography or continuous flow methods with spectroscopic detection)

[Figure: LC-DAD coelution; the data matrix D (NR × NC) is decomposed as D = C S^T + E, with elution profiles C and spectra S^T.]

$$d_{ij} = \sum_{k=1}^{N} c_{ik} s_{kj} + e_{ij}$$

Bilinearity!

Protein folding and dynamic protein-nucleic acid interaction processes

[Figure: FT-IR absorbance spectra (1400-1900 cm-1) and concentration profiles vs. temperature (20-80 ºC) of the species P1, P2, D1 and D2, together with the D2O and protein contributions; 43.8 ºC and 63.9 ºC are marked. The data matrix D (NR × NC) is decomposed as D = C S^T + E.]

$$d_{ij} = \sum_{k=1}^{N} c_{ik} s_{kj} + e_{ij}$$

Bilinearity!

Environmental source resolution and apportionment

[Figure: data matrix D (22 samples × concentrations of 96 organic compounds) decomposed as D = C S^T + E, where C gives the source distribution over the samples and S^T the source composition.]

Bilinearity!

Soft-modelling

MCR bilinear model for two-way data:

$$d_{ij} = \sum_{n=1}^{N} c_{in} s_{nj} + e_{ij}, \qquad \mathbf{D} = \mathbf{C}\mathbf{S}^T + \mathbf{E}$$

where D has I rows (samples) and J columns (variables), and
• d_ij is the data measurement (response) of variable j in sample i;
• n = 1, ..., N indexes the components (species, sources, ...);
• c_in is the concentration of component n in sample i;
• s_nj is the response of component n at variable j.
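As a sketch of the bilinear model, the NumPy snippet below builds a noise-free two-way data set D = C S^T + E and checks the element-wise form d_ij = Σ_n c_in s_nj against the matrix product. The Gaussian profiles, their positions and the matrix sizes are illustrative assumptions, not data from the lecture.

```python
import numpy as np

def gaussian(x, center, width):
    """Gaussian band, used here only as a synthetic profile shape."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def simulate_bilinear(n_times=60, n_wavelengths=50, noise=0.0, seed=0):
    """Build D = C S^T + E from synthetic concentration profiles and spectra."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_times)
    w = np.arange(n_wavelengths)
    # C (I x N): one column per component (elution/concentration profiles)
    C = np.column_stack([gaussian(t, 20, 5), gaussian(t, 30, 6), gaussian(t, 40, 5)])
    # S (J x N): one column per component (pure spectra); S.T is N x J
    S = np.column_stack([gaussian(w, 10, 6), gaussian(w, 25, 8), gaussian(w, 38, 5)])
    E = noise * rng.standard_normal((n_times, n_wavelengths))
    return C @ S.T + E, C, S

D, C, S = simulate_bilinear()
# Element-wise form of the same model: d_ij = sum_n c_in * s_nj (e_ij = 0 here)
i, j = 25, 17
d_ij = sum(C[i, n] * S[j, n] for n in range(C.shape[1]))
print(np.isclose(D[i, j], d_ij), np.linalg.matrix_rank(D))  # prints: True 3
```

For noise-free bilinear data the rank of D equals the number of components N; with noise, the rank must be estimated (e.g. by PCA).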


Resolution conditions to reduce MCR rotational ambiguities (unique solutions?)

• Selective variables for every component
• Local rank conditions (Resolution Theorems)
• Natural constraints
  – non-negativity
  – unimodality
  – closure (mass balance)
• Multiway data (i.e. trilinear data, ...)
• Hard-modelling constraints
  – mass-action law
  – rate law
  – ...
• Shape constraints (Gaussian, Lorentzian, asymmetric peak shape, log peak shape, ...)
• ...

Unique resolution conditions

First possibility: using selective/pure variables

1. Wavelength-selective ranges, where only one component absorbs: the elution profiles can be estimated without ambiguities.
2. Elution-time-selective ranges, where only one component is present: the spectra can be estimated without ambiguities.

Detection of 'purest' (most selective) variables

Methods focused on finding the most representative (purest) rows (or columns) in a data matrix.

• Based on PCA
  – Key Set Factor Analysis (KSFA)
• Based on the use of real variables
  – SIMPLe-to-use Interactive Self-modelling Mixture Analysis (SIMPLISMA)
  – Orthogonal Projection Approach (OPA)

How to detect purest/selective variables?

Selective variables are the most pure/representative/dissimilar/orthogonal (linearly independent) variables!

Examples of proposed methods for the detection of selective variables:
• Key set variables, KSFA: E.R. Malinowski, Anal. Chim. Acta, 134 (1982) 129; IKSFA, Chemolab, 6 (1989) 21
• SIMPLISMA: W. Windig & J. Guilment, Anal. Chem., 63 (1991) 1425-1432
• Orthogonal Projection Approach, OPA: F. Cuesta-Sánchez et al., Anal. Chem. 68 (1996) 79
• ...

SIMPLISMA

• Finds the purest process or signal variables in a data set.

[Figure: data matrix with process variables and signal variables. The most dissimilar signal variables give approximate concentration profiles; the most dissimilar process variables give approximate signal profiles.]

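SIMPLISMA

HPLC-DAD: purest retention times

• Variable purity

$$p_i = \frac{s_i}{m_i}$$

where s_i is the standard deviation and m_i the mean of variable i (here a retention time, computed over the signal variables).

Signal variables: s_i ↑, m_i ↓ ⇒ p_i ↑. Noisy variables, whose mean is close to zero, also reach high purity values.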

SIMPLISMA

HPLC-DAD: purest retention times

• Variable purity with a noise offset

$$p_i = \frac{s_i}{m_i + f}$$

where s_i is the standard deviation, m_i the mean of variable i and f the offset (% noise).

Signal variables keep a high purity, while for noisy variables → p_i ↓
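The effect of the offset f can be sketched numerically. In this hypothetical NumPy example the columns of X are the candidate variables; the offset is assumed to be f percent of the largest mean (a common SIMPLISMA convention, stated here as an assumption), and the three test variables are invented to show one selective, one mixed and one noisy case.

```python
import numpy as np

def simplisma_purity(X, f=5.0):
    """Purity of each column (variable) of X: p_i = s_i / (m_i + offset).

    Assumption: the offset is f percent of the largest column mean,
    so that f plays the role of the '% noise' parameter.
    """
    m = X.mean(axis=0)                 # m_i: mean of variable i
    s = X.std(axis=0)                  # s_i: standard deviation of variable i
    offset = (f / 100.0) * m.max()
    return s / (m + offset)

c1 = np.linspace(0.0, 1.0, 20)         # component 1 grows ...
c2 = 1.0 - c1                          # ... while component 2 decays
X = np.column_stack([
    c1,                                # selective variable: only component 1
    0.5 * c1 + 0.5 * c2,               # mixed variable: constant signal, purity ~ 0
    np.tile([0.0, 0.02], 10),          # noisy variable with a mean close to zero
])
print(np.argmax(simplisma_purity(X, f=0.0)))   # 2: the noisy variable wins
print(np.argmax(simplisma_purity(X, f=10.0)))  # 0: the offset damps the noisy variable
```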

SIMPLISMA

Working procedure
1. Selection of the first pure variable: max(p_i).
2. Normalisation of spectra.
3. Selection of the second pure variable:
   a. Calculation of weights: $w_i = \det(\mathbf{Y}_i^T\mathbf{Y}_i)$, where Y_i contains the normalised variable i and the first pure variable.
   b. Recalculation of purity: p'_i = w_i p_i
   c. Next purest variable: max(p'_i)

SIMPLISMA

Working procedure (continued)
4. Selection of the third pure variable:
   a. Calculation of weights: $w_i = \det(\mathbf{Y}_i^T\mathbf{Y}_i)$, where Y_i now contains the normalised variable i and the first two pure variables.
   b. Recalculation of purity: p''_i = w_i p_i
   c. Next purest variable: max(p''_i)

...
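A compact sketch of this working procedure follows. It is a simplified reading, not Windig's original code: the normalisation in step 2 is assumed to be unit-length scaling of each variable, the offset is f percent of the largest mean, and the test data (three Gaussian bands, each with a selective wavelength region) are hypothetical.

```python
import numpy as np

def simplisma(X, n_pure, f=5.0):
    """Return the indices of the n_pure purest columns of X (simplified SIMPLISMA).

    Assumptions: purity p_i = s_i / (m_i + f% of the largest mean), and each
    column is scaled to unit length before computing the determinant weights.
    """
    m, s = X.mean(axis=0), X.std(axis=0)
    p = s / (m + (f / 100.0) * m.max())
    norms = np.linalg.norm(X, axis=0)
    Y = X / np.where(norms > 0, norms, 1.0)        # normalised variables
    selected = [int(np.argmax(p))]                 # first pure variable: max(p_i)
    for _ in range(1, n_pure):
        w = np.empty(X.shape[1])
        for i in range(X.shape[1]):
            Yi = Y[:, selected + [i]]              # candidate i + pure variables so far
            w[i] = np.linalg.det(Yi.T @ Yi)        # w_i = det(Y_i^T Y_i), ~0 if collinear
        selected.append(int(np.argmax(w * p)))     # p'_i = w_i p_i -> max
    return selected

# Hypothetical test data: 30 mixtures of 3 components whose spectral bands
# (centres 10, 25, 40; width 3) each have a selective wavelength region.
rng = np.random.default_rng(0)
wl = np.arange(50)
S = np.column_stack([np.exp(-0.5 * ((wl - c) / 3.0) ** 2) for c in (10, 25, 40)])
C = rng.uniform(0.1, 1.0, size=(30, 3))
X = C @ S.T                                        # columns = wavelengths
print(sorted(simplisma(X, 3)))
```

The determinant weight is close to 1 for a candidate that is orthogonal to the pure variables already chosen and close to 0 for a candidate that is a linear combination of them, which is why correlated variables are skipped.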

SIMPLISMA

Graphical information
• Purity spectrum: plot of p_i vs. variables.
• Standard deviation spectrum: plot of the 'purity-corrected' standard deviation cs_i vs. variables, with cs_i = w_i s_i.

SIMPLISMA

Graphical information

[Figure: mean spectrum, standard deviation spectrum and 1st pure spectrum vs. retention time (0-60), with the purest variable (retention time 31) marked, and the concentration profiles.]

If the 1st pure variable is too noisy ⇒ f is too low and should be increased.

SIMPLISMA

Graphical information

[Figure: 2nd pure spectrum and 2nd standard deviation spectrum vs. retention time, with the selected variables 31 and 40 marked, and the concentration profiles.]

SIMPLISMA

Graphical information

[Figure: 3rd pure spectrum and 3rd standard deviation spectrum vs. retention time, with the selected variables 31, 40 and 23 marked, and the concentration profiles.]

SIMPLISMA

Graphical information

[Figure: 4th pure spectrum and 4th standard deviation spectrum vs. retention time, with the selected variables 31, 40, 23 and 13 marked, and the concentration profiles.]

SIMPLISMA

Graphical information

[Figure: 5th pure spectrum and 5th standard deviation spectrum vs. retention time (scales of order 10^-18 and 10^-14).]

Noisy pattern in both spectra: no more significant contributions.

SIMPLISMA

Information
• Purest variables in the two modes.
• Purest signal and concentration profiles.
• Number of compounds.

Unique resolution conditions

• Many chemical mixture systems (evolving or not) do not have selective variables for all the components of the system.

• When the selected variables are not (totally) selective, their detection is still very useful: it gives an initial description of the system that reduces its complexity, and it provides good initial estimates of the species profiles for most of the resolution methods.


Second possibility: using local rank information

What is local rank?

Local rank is the rank of reduced data regions in either of the two orders of the original data matrix.

It can be obtained by Evolving Factor Analysis (EFA) and derived methods (EFA, FSMW-EFA, ...).

Conditions for unique solutions (unique resolution, uniqueness) based on local rank information have been described as Resolution Theorems: Rolf Manne, On the resolution problem in hyphenated chromatography, Chemometrics and Intelligent Laboratory Systems, 1995, 27, 89-94.
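Local rank can be sketched with forward and backward Evolving Factor Analysis: singular values are computed for windows that grow from the top (forward) or from the bottom (backward) of D, and the number of singular values above a noise threshold is the local rank. The threshold (a relative tolerance suitable only for noise-free data) and the simulated two-component elution profiles are assumptions for the example.

```python
import numpy as np

def efa(D, n_sv=3):
    """Forward and backward Evolving Factor Analysis.

    forward[i]  : leading singular values of D[:i+1] (window grows downwards)
    backward[i] : leading singular values of D[i:]   (window grows upwards)
    """
    I = D.shape[0]
    forward = np.zeros((I, n_sv))
    backward = np.zeros((I, n_sv))
    for i in range(I):
        sv = np.linalg.svd(D[: i + 1], compute_uv=False)[:n_sv]
        forward[i, : len(sv)] = sv
        sv = np.linalg.svd(D[i:], compute_uv=False)[:n_sv]
        backward[i, : len(sv)] = sv
    return forward, backward

def local_rank(sv, rel_tol=1e-8):
    """Number of singular values above a (hypothetical) relative noise threshold."""
    return (sv > rel_tol * sv.max()).sum(axis=1)

def band(x, centre, width):
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

t, wl = np.arange(60), np.arange(40)
c1 = np.where((t >= 10) & (t <= 30), band(t, 20, 4), 0.0)   # present in rows 10-30
c2 = np.where((t >= 25) & (t <= 45), band(t, 35, 4), 0.0)   # present in rows 25-45
D = np.outer(c1, band(wl, 12, 5)) + np.outer(c2, band(wl, 26, 5))
fwd, bwd = efa(D)
print(local_rank(fwd)[[5, 20, 40]], local_rank(bwd)[[50, 35, 20]])
```

With real data the threshold would be set from the noise level (e.g. the singular values of a region without signal) rather than from a fixed relative tolerance.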

Resolution Theorems

Theorem 1: If all interfering compounds that appear inside the concentration window of a given analyte also appear outside this window, it is possible to calculate the concentration profile of the analyte without ambiguities.

$$\mathbf{D}(\mathbf{I} - \mathbf{V}\mathbf{V}^T) = \mathbf{c}_a\left[\mathbf{s}_a^T - \sum_{m=1}^{n-1} (\mathbf{s}_a^T\mathbf{v}_m)\,\mathbf{v}_m^T\right]$$

The matrix V defines the vector subspace where the analyte is not present and all the interferents are present. V can be found by PCA (loadings) of the submatrix where the analyte is not present!

Resolution Theorems

[Figure: concentration profiles of the analyte and the interferences vs. retention time (0-60).]

Local rank along the retention time direction: 1111111222222222111222222211111111

This local rank information can be obtained from submatrix analysis (EFA, EFF).

Matrix V^T may be obtained from PCA of the regions where the analyte is not present:

$$\mathbf{D}(\mathbf{I} - \mathbf{V}\mathbf{V}^T) = \mathbf{c}_a\left[\mathbf{s}_a^T - \sum_{m=1}^{n-1} (\mathbf{s}_a^T\mathbf{v}_m)\,\mathbf{v}_m^T\right]$$

This is a rank-one matrix! The concentration profile of the analyte c_a may be resolved from D and V^T.
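Theorem 1 translates almost directly into a projection: PCA of the analyte-free rows gives V, the projector I − VV^T removes every interferent, and the remaining rank-one matrix yields c_a up to a scale factor. The NumPy sketch below assumes noise-free data, a known number of interferents and hypothetical Gaussian profiles; the same projection is the core of WFA.

```python
import numpy as np

def analyte_profile(D, absent_rows, n_interf):
    """Resolve the analyte concentration profile c_a (up to scale), Theorem 1.

    absent_rows: rows where the analyte is absent but all interferents appear.
    n_interf:    number of interferents (rank of that submatrix).
    """
    # V: PCA loadings of the analyte-free submatrix (span of the interferent spectra)
    _, _, Vt = np.linalg.svd(D[absent_rows], full_matrices=False)
    V = Vt[:n_interf].T                          # J x m, orthonormal columns
    P = np.eye(D.shape[1]) - V @ V.T             # projector I - V V^T
    U, sv, _ = np.linalg.svd(D @ P, full_matrices=False)  # rank one: c_a (s_a^o)^T
    c = U[:, 0] * sv[0]
    return c if c.sum() >= 0 else -c             # remove the SVD sign ambiguity

def band(x, centre, width):
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

t, wl = np.arange(60), np.arange(40)
c_a = np.where((t >= 22) & (t <= 38), band(t, 30, 3), 0.0)    # analyte window 22-38
C = np.column_stack([c_a, band(t, 20, 6), band(t, 40, 6)])    # interferents everywhere
S = np.column_stack([band(wl, 20, 6), band(wl, 10, 6), band(wl, 30, 6)])
D = C @ S.T
c_hat = analyte_profile(D, np.flatnonzero(c_a == 0), n_interf=2)
print(np.corrcoef(c_hat, c_a)[0, 1])
```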

Resolution Theorems

Theorem 2: If, for every interference, the concentration window of the analyte has a subwindow where the interference is absent, then it is possible to calculate the spectrum of the analyte.

[Figure: concentration profiles of the analyte and of interferences 1 and 2 vs. retention time (0-60), showing the region where interference 1 is not present and the region where interference 2 is not present (local rank information).]

Resolution Theorems

Theorem 3: For a resolution based only upon rank information in the chromatographic direction, the conditions of Theorems 1 and 2 are not only sufficient but also necessary.

Resolution based on local rank conditions

[Figure, left: elution profiles vs. retention time (0-60) of a system that can be totally resolved using local rank information!!!]

[Figure, right: elution profiles vs. retention time (0-60) of a system that cannot be totally resolved (only partially) based only on local rank information.]

Unique resolution conditions?

[Figure: elution profiles vs. retention time (0-60) with an embedded peak.]

In the case of embedded peaks, resolution conditions based on local rank are not fulfilled! Resolution without ambiguities will be difficult when a single matrix is analyzed.

Conclusions about unique resolution conditions based on local rank analysis

In order to have a correct resolution of the system and to apply the resolution theorems, it is very important to have:

1) an accurate detection of local rank information (EFA-based methods);

2) a way to introduce this local rank information into the resolution process, using either:
   – non-iterative direct resolution methods, or
   – iterative optimization methods.

Resolution Theorems

• Resolution theorems can be used in both matrix directions (modes/orders): in the chromatographic and in the spectral direction.

• Resolution theorems can be easily extended to multiway data and augmented data matrices (unfolded, matricized three-way data) (Lecture 3).

• Many resolution methods are implicitly based on these resolution theorems.


Third possibility: using natural constraints

Natural constraints are previously known conditions that the profile solutions should fulfil. We know that certain solutions are not correct!

Even when neither selective variables nor local rank resolution conditions are present, natural constraints can be applied. They significantly reduce the number of possible solutions (rotational ambiguity).

However, natural constraints alone do not, in general, produce unique solutions.

Natural constraints

• Non-negativity:
  – species profiles in one or both orders are not negative (concentration and spectral profiles)
• Unimodality:
  – some species profiles have only one maximum (e.g. concentration profiles)
• Closure:
  – the sum of the species concentrations is a known constant value (e.g. in reaction-based systems, the mass balance equation)

Non-negativity

[Figure: plain LS profile(s) and constrained profile(s) update vs. retention time (0-50); the plain LS profiles contain negative values, the constrained update does not.]

Unimodality

[Figure: concentration profiles vs. retention time (0-50) before and after applying the unimodality constraint.]

Closure

[Figure: concentration profiles vs. pH (2-9) before and after applying the closure (mass balance) constraint, Σ c = c_total.]

Hard-modelling

[Figure: concentration profiles vs. pH (2-9) before and after fitting a physicochemical model.]

Unique resolution conditions

Fourth possibility: multiway, multiset data analysis and matrix augmentation strategies (Lecture 3)

• A set of correlated data matrices of the same system obtained under different conditions is analyzed simultaneously (matrix augmentation).

• Factor analysis ambiguities can be solved more easily for three-way data, especially for trilinear three-way data.

Multivariate Curve Resolution (MCR) methods

• Non-iterative resolution methods: Rank Annihilation Evolving Factor Analysis (RAEFA), Window Factor Analysis (WFA), Heuristic Evolving Latent Projections (HELP), Subwindow Factor Analysis (SFA), Gentle, ...

• Iterative resolution methods: Iterative Target Factor Analysis (ITTFA), Positive Matrix Factorization (PMF), Alternating Least Squares (ALS), ...

Non-iterative resolution methods are mostly based on the detection and use of local rank information:

• Rank Annihilation by Evolving Factor Analysis (RAEFA, H. Gampp et al., Anal. Chim. Acta 193 (1987) 287)

• Non-iterative EFA (M. Maeder, Anal. Chem. 59 (1987) 527)

• Window Factor Analysis (WFA, E.R. Malinowski, J. Chemometr., 6 (1992) 29)

• Heuristic Evolving Latent Projections (HELP, O.M. Kvalheim et al., Anal. Chem. 64 (1992) 936)

WFA method description (E.R. Malinowski, J. Chemometr., 6 (1992) 29)

$$\mathbf{D} = \mathbf{C}\mathbf{S}^T = \sum_{i=1}^{n} \mathbf{c}_i\mathbf{s}_i^T$$

1. Evaluate the window where the analyte n is present (EFA, EFF, ...).
2. Create the submatrix D_o by deleting the window of the analyte n.
3. Apply PCA to D_o: $\mathbf{D}_o = \mathbf{U}_o\mathbf{V}_o^T = \sum_{j=1}^{m} \mathbf{u}_{oj}\mathbf{v}_{oj}^T$, with m = n − 1.
4. The spectra of the interferents are linear combinations of these loadings: $\mathbf{s}_i = \sum_{j=1}^{m} \beta_{ij}\mathbf{v}_{oj}$.
5. The analyte spectrum has a part s_n^o that lies in the subspace orthogonal to V_o^T.
6. The concentration profile of the analyte c_n can be calculated from:

$$\mathbf{D}(\mathbf{I} - \mathbf{V}_o\mathbf{V}_o^T) = \mathbf{c}_n\mathbf{s}_n^{o\,T} = \mathbf{D}_n$$

D_n is a rank-one matrix, and s_n^o is the part of the analyte spectrum s_n that is orthogonal to the interference spectra; c_n and s_n^o can be obtained directly!! Like the 1st Resolution Theorem!!!

Non-iterative resolution methods based on the detection and use of local rank information

[Scheme: a) EFA or EFF gives the concentration window of the nth component; D = U V^T has rank n. b) Deleting this window gives D_o = U_o V_o^T, of rank n − 1. c) A vector v_n^o of the space of V^T is found orthogonal to V_o^T. d) D v_n^o ∝ c_n.]

Non-iterative resolution methods based on the detection and use of local rank information

The main drawbacks of non-iterative resolution methods (like WFA) are:

a) the impossibility of solving data sets with non-sequential profiles (e.g., data sets with embedded profiles);

b) the harmful effects of a poor definition of the concentration windows.

Non-iterative resolution methods based on the detection and use of local rank information

Improving WFA has been the main goal of modifications of this algorithm:

• E.R. Malinowski, "Automatic Window Factor Analysis. A more efficient method for determining concentration profiles from evolutionary spectra", J. Chemometr. 10, 273-279 (1996).

• Subwindow Factor Analysis (SFA), based on the systematic comparison of matrix windows sharing one compound in common: R. Manne, H. Shen and Y. Liang, "Subwindow factor analysis", Chemom. Intell. Lab. Syst., 45, 171-176 (1999).

Iterative resolution methods (third alternative!)

Iterative Target Factor Analysis, ITTFA
– P.J. Gemperline, J. Chem. Inf. Comput. Sci., 1984, 24, 206-12
– B.G.M. Vandeginste et al., Anal. Chim. Acta 1985, 173, 253-264

Alternating Least Squares, ALS
– R. Tauler, A. Izquierdo-Ridorsa and E. Casassas, Chemometrics and Intelligent Laboratory Systems, 1993, 18, 293-300.
– R. Tauler, A.K. Smilde and B.R. Kowalski, J. Chemometrics 1995, 9, 31-58.
– R. Tauler, Chemometrics and Intelligent Laboratory Systems, 1995, 30, 133-146.

Iterative Target Factor Analysis

[Figure: a) geometrical representation of ITTFA from the initial needle targets x1in and x2in, which are transformed into x1out and x2out; b) evolution of the shape of the two profiles (vs. t_R) through the ITTFA process.]

Iterative resolution methods

Iterative Target Factor Analysis, ITTFA

ITTFA obtains each concentration profile following the steps below:

1. Calculation of the score matrix by PCA.
2. Use of an estimated concentration profile as initial target.
3. Projection of the target onto the score space.
4. Constraint of the projected target.
5. Projection of the constrained target.
6. Go to 4 until convergence is achieved.
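The ITTFA steps can be sketched as follows. Assumptions: the constraint in step 4 is non-negativity (clipping negative values), the initial target is a needle at one retention time, convergence is judged on the change of the projected target, and the two well-separated elution profiles are simulated.

```python
import numpy as np

def ittfa(D, n_comp, target, n_iter=100, tol=1e-12):
    """Iterative Target Factor Analysis for one concentration profile.

    The constraint applied at step 4 is non-negativity (an assumption;
    other constraints, e.g. unimodality, could be used instead).
    """
    U, _, _ = np.linalg.svd(D, full_matrices=False)
    Un = U[:, :n_comp]                       # step 1: PCA score space (orthonormal)
    x = Un @ (Un.T @ target)                 # steps 2-3: project the initial target
    for _ in range(n_iter):
        x_c = np.clip(x, 0.0, None)          # step 4: constrain the projected target
        x_new = Un @ (Un.T @ x_c)            # step 5: project the constrained target
        converged = np.linalg.norm(x_new - x) <= tol * np.linalg.norm(x_new)
        x = x_new
        if converged:                        # step 6: otherwise go back to step 4
            break
    x = np.clip(x, 0.0, None)
    return x / np.linalg.norm(x)

def band(x, centre, width):
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

t, wl = np.arange(60), np.arange(40)
C = np.column_stack([band(t, 15, 4), band(t, 45, 4)])
S = np.column_stack([band(wl, 12, 5), band(wl, 26, 5)])
D = C @ S.T
needle = np.zeros(60)
needle[15] = 1.0                             # needle target at the first peak maximum
x = ittfa(D, 2, needle)
print(np.corrcoef(x, C[:, 0])[0, 1])
```

With strongly overlapping profiles the result depends on the initial target and on the constraint used, which is the rotational ambiguity discussed in this lecture.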


Multivariate Curve Resolution (MCR)

[Figure: mixed information, the data matrix D (retention times t_R × wavelengths λ), is decomposed into pure component information: C (pure concentration profiles c_1, ..., c_n) and S^T (pure signals s_1, ..., s_n).]

• Pure concentration profiles: chemical model, process evolution, compound contribution, relative quantitation.
• Pure signals: compound identity, source identification and interpretation.

An algorithm to solve bilinear models using Multivariate Curve Resolution (MCR):

Alternating Least Squares (MCR-ALS)

C and S^T are obtained by iteratively solving the two alternating LS equations:

$$\min_{\hat{\mathbf{C}}} \left\| \hat{\mathbf{D}}_{PCA} - \hat{\mathbf{C}}\hat{\mathbf{S}}^T \right\| \qquad \min_{\hat{\mathbf{S}}^T} \left\| \hat{\mathbf{D}}_{PCA} - \hat{\mathbf{C}}\hat{\mathbf{S}}^T \right\|$$

• Optional constraints (local rank, non-negativity, unimodality, closure, ...) are applied at each iteration.
• Initial estimates of C or S^T are obtained from EFA or from pure variable detection methods.

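Multivariate Curve Resolution Alternating Least Squares

Model: D = C S^T + E

PCA model: $\hat{\mathbf{D}}_{PCA} = \mathbf{U}\mathbf{V}^T$

Algorithm to find the solution:

$$\min_{\hat{\mathbf{C}}\ (\text{constraints})} \left\| \hat{\mathbf{D}}_{PCA} - \hat{\mathbf{C}}\hat{\mathbf{S}}^T \right\| \qquad \min_{\hat{\mathbf{S}}^T\ (\text{constraints})} \left\| \hat{\mathbf{D}}_{PCA} - \hat{\mathbf{C}}\hat{\mathbf{S}}^T \right\|$$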

Multivariate Curve Resolution Alternating Least Squares (MCR-ALS): unconstrained solution

D = C S^T + E

• Initial estimates of C or S^T are obtained from EFA or from pure variable detection methods.

1) $\hat{\mathbf{S}}^T = \mathbf{C}^+\hat{\mathbf{D}}_{PCA}$
2) $\mathbf{C} = \hat{\mathbf{D}}_{PCA}(\hat{\mathbf{S}}^T)^+$

• Optional constraints are applied at each iteration!

C^+ and (S^T)^+ are the pseudoinverses of C and S^T, respectively.

Matrix pseudoinverses

C and S^T are not square matrices, so their inverses are not defined.

If they are full rank, i.e. the rank of C is equal to the number of its columns and the rank of S^T is equal to the number of its rows, the generalized inverse or pseudoinverse is defined as follows:

$$\begin{aligned}
\mathbf{D} &= \mathbf{C}\mathbf{S}^T & \mathbf{D} &= \mathbf{C}\mathbf{S}^T\\
\mathbf{C}^T\mathbf{D} &= \mathbf{C}^T\mathbf{C}\,\mathbf{S}^T & \mathbf{D}\mathbf{S} &= \mathbf{C}\,\mathbf{S}^T\mathbf{S}\\
(\mathbf{C}^T\mathbf{C})^{-1}\mathbf{C}^T\mathbf{D} &= \mathbf{S}^T & \mathbf{D}\mathbf{S}(\mathbf{S}^T\mathbf{S})^{-1} &= \mathbf{C}\\
\mathbf{C}^+\mathbf{D} &= \mathbf{S}^T & \mathbf{D}(\mathbf{S}^T)^+ &= \mathbf{C}
\end{aligned}$$

where C^+ = (C^T C)^{-1} C^T and (S^T)^+ = S (S^T S)^{-1}.

C^+ and (S^T)^+ are the pseudoinverses of C and S^T, respectively. They also provide the best least-squares solutions of the overdetermined linear systems of equations. If C and S^T are not full rank, their pseudoinverses can still be defined using SVD.
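The pseudoinverse formulas can be checked numerically with NumPy: for full-rank random C and S^T (hypothetical data), C^+ D recovers S^T, D (S^T)^+ recovers C, and the explicit formulas agree with the SVD-based `np.linalg.pinv`.

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.uniform(0.0, 1.0, size=(40, 3))        # I x N, full column rank (hypothetical)
ST = rng.uniform(0.0, 1.0, size=(3, 25))       # N x J, full row rank (hypothetical)
D = C @ ST

C_pinv = np.linalg.inv(C.T @ C) @ C.T          # C+ = (C^T C)^-1 C^T
ST_pinv = ST.T @ np.linalg.inv(ST @ ST.T)      # (S^T)+ = S (S^T S)^-1

print(np.allclose(C_pinv @ D, ST))             # C+ D = S^T
print(np.allclose(D @ ST_pinv, C))             # D (S^T)+ = C
print(np.allclose(C_pinv, np.linalg.pinv(C)))  # same as the SVD-based pseudoinverse
```

In practice `np.linalg.pinv` (or `np.linalg.lstsq`) is preferable to forming (C^T C)^{-1} explicitly, because it stays stable when C is ill-conditioned or rank deficient.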

Flowchart of MCR-ALS

[Scheme: data matrix D → number of components (PCA) → initial estimates (EFA, FSMW-EFA, local rank, purest variables) → constraints (natural, selectivity, local rank, shape, equality, correlation, hard model, ...) → ALS → C (quantitative information), S^T (qualitative information) and E (fit and diagnostics).]

Iterative resolution methods

Alternating Least Squares, MCR-ALS

ALS optimizes concentration and spectral profiles using a constrained alternating least squares method. The main steps of the method are:

1. Calculation of the PCA-reproduced data matrix.
2. Calculation of initial estimates of concentration or spectral profiles (e.g., using SIMPLISMA or EFA).
3. Alternating least squares:
   – iterative constrained least-squares estimation of C or S^T;
   – iterative constrained least-squares estimation of S^T or C;
   – test of convergence.
4. Interpretation of results.
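A minimal MCR-ALS loop following steps 1 to 3 is sketched below. It is not the MCR-ALS toolbox: the only constraint is non-negativity applied by setting negative values to zero, the lack of fit is computed against the raw data, and the simulated chromatographic data and the rough initial spectra are assumptions for the example.

```python
import numpy as np

def mcr_als(D, ST, n_iter=500, tol=1e-12):
    """Minimal MCR-ALS sketch. Only constraint: non-negativity of C and S^T,
    imposed by setting negative values to zero after each LS step."""
    n = ST.shape[0]
    U, sv, Vt = np.linalg.svd(D, full_matrices=False)
    D_pca = (U[:, :n] * sv[:n]) @ Vt[:n]                  # step 1: PCA-reproduced data
    lof_old = np.inf
    for _ in range(n_iter):                               # step 3: ALS
        C = np.clip(D_pca @ np.linalg.pinv(ST), 0, None)  # C = D_PCA (S^T)+
        ST = np.clip(np.linalg.pinv(C) @ D_pca, 0, None)  # S^T = C+ D_PCA
        lof = np.linalg.norm(D - C @ ST) / np.linalg.norm(D)  # lack of fit
        if abs(lof_old - lof) < tol:                      # convergence test
            break
        lof_old = lof
    return C, ST, lof

def band(x, centre, width):
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

rng = np.random.default_rng(0)
t, wl = np.arange(60), np.arange(50)
C_true = np.column_stack([band(t, 20, 5), band(t, 30, 6), band(t, 40, 5)])
ST_true = np.vstack([band(wl, 10, 6), band(wl, 25, 8), band(wl, 38, 5)])
D = C_true @ ST_true + 1e-4 * rng.standard_normal((60, 50))
ST0 = ST_true + 0.1 * rng.uniform(size=ST_true.shape)     # step 2: rough initial estimate
C, ST, lof = mcr_als(D, ST0)
print(f"lack of fit: {100 * lof:.3f} %")
```

A good fit alone does not guarantee that C and S^T are the true profiles; the remaining rotational ambiguity depends on the constraints and on the data.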

Flowchart of MCR-ALS

Journal of Chemometrics, 1995, 9, 31-58; Chemometrics and Intelligent Laboratory Systems, 1995, 30, 133-146; Journal of Chemometrics, 2001, 15, 749-7; Analytica Chimica Acta, 2003, 500, 195-210

[Scheme: data matrix D → SVD or PCA (estimation of the number of components) → initial estimation → ALS optimization under constraints → resolved concentration profiles C and resolved spectra S^T, with D = C S^T + E (bilinear model). Results of the ALS optimization procedure: fit and diagnostics.]

Data matrix decomposition according to a bilinear model:

$$\min_{\hat{\mathbf{C}}} \left\| \hat{\mathbf{D}}_{PCA} - \hat{\mathbf{C}}\hat{\mathbf{S}}^T \right\| \qquad \min_{\hat{\mathbf{S}}^T} \left\| \hat{\mathbf{D}}_{PCA} - \hat{\mathbf{C}}\hat{\mathbf{S}}^T \right\|$$

Until recently, MCR-ALS input had to be typed in at the MATLAB command line.

This was troublesome and difficult in complex cases where several data matrices are analyzed simultaneously and/or different constraints are applied to each of them for an optimal resolution.

Now: a graphical user-friendly interface for MCR-ALS

J. Jaumot, R. Gargallo, A. de Juan and R. Tauler, Chemometrics and Intelligent Laboratory Systems, 2005, 76(1) 101-110

Multivariate Curve Resolution Home Page

http://www.ub.es/gesq/mcr/mcr.htm

Example. Analysis of multiple experiments: analysis of 4 HPLC-DAD runs, each of them containing four compounds.

Alternating Least Squares: initial estimates

• from EFA-derived methods (for evolving systems such as chromatography, titrations, ...)

• from 'pure' variable detection methods such as SIMPLISMA (for non-evolving systems and/or for very poorly resolved systems, ...)

• individually and directly selected from the data using chemical reasoning (e.g. first and last spectrum, isosbestic points, ...)

• from known profiles, ...

Alternating Least Squares with constraints

• Natural constraints: non-negativity, unimodality, closure, ...

• Equality constraints: selectivity, zero-concentration windows, known profiles, ...

• Optional shape constraints (Gaussian shapes, asymmetric shapes)

• Hard-modelling constraints (rate law, equilibrium mass-action law, ...)

• ...

How to implement constrained ALS optimization algorithms in an optimal way in the least-squares sense?

Considerations:

How to implement these algorithms so that all the constraints are fulfilled simultaneously in every least-squares step (in one LS shot) of the optimization?

Updating (substitution) methods work well most of the time! Why? Because the optimal solutions that best fit the data (apart from noise and degrees of freedom) also fulfil the constraints of the system.

Constraints are used to drive the optimization in the right direction within the band of feasible solutions.

Implementation of constraints: non-negativity case

a) Forcing values during the iteration (e.g. negative values to zero):
   – intuitive
   – fast
   – easy to implement
   – can be used individually for each profile, independently
   – less efficient

b) Using rigorous non-negative least-squares optimization procedures:
   – more statistically efficient
   – more efficient
   – more difficult to implement
   – has to be applied to all profiles simultaneously
   – different approaches (penalty functions, constrained optimization, elimination, ...)
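The difference between options a) and b) can be seen on a single least-squares step. The sketch assumes one column d of D (one wavelength) and known, randomly generated concentration profiles C; option b) is implemented here by exhaustive enumeration of the sets of non-zero components, which is exact but only practical for a few components, whereas real implementations use an active-set algorithm such as Lawson-Hanson (available, for instance, as `scipy.optimize.nnls`).

```python
import itertools
import numpy as np

def ls_clipped(C, d):
    """Option a): unconstrained least squares, then negative values forced to zero."""
    x, *_ = np.linalg.lstsq(C, d, rcond=None)
    return np.clip(x, 0.0, None)

def nnls_exhaustive(C, d):
    """Option b), rigorous NNLS: min ||d - C x|| subject to x >= 0.

    Every possible set of non-zero components is tried (exact, but only
    practical for a handful of components).
    """
    n = C.shape[1]
    best_x, best_r = np.zeros(n), np.linalg.norm(d)
    for k in range(1, n + 1):
        for support in itertools.combinations(range(n), k):
            cols = list(support)
            xs, *_ = np.linalg.lstsq(C[:, cols], d, rcond=None)
            if np.all(xs >= 0):                  # feasible LS solution on this support
                x = np.zeros(n)
                x[cols] = xs
                r = np.linalg.norm(d - C @ x)
                if r < best_r:
                    best_x, best_r = x, r
    return best_x

rng = np.random.default_rng(0)
C = rng.uniform(0.0, 1.0, size=(40, 3))          # concentration profiles (hypothetical)
d = C @ np.array([1.0, -0.3, 0.5])               # data column whose LS solution is negative
for solve in (ls_clipped, nnls_exhaustive):
    x = solve(C, d)
    print(solve.__name__, np.round(x, 3), round(float(np.linalg.norm(d - C @ x)), 4))
```

Both answers are non-negative, but only the rigorous one is the least-squares optimum under the constraint, so its residual is never larger than that of the clipped solution.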

How to implement constrained ALS optimization algorithms in an optimal way in the least-squares sense?

Different rigorous least-squares approaches have been proposed:

- Non-negative least-squares methods (Lawson CL, Hanson RJ. Solving Least Squares Problems. Prentice-Hall: 1974; Bro R, de Jong S. J. Chemometrics 1997; 11: 393–40; Mark H. Van Benthem and Michael R. Keenan, Journal of Chemometrics, 18, 441-450; ...)

- Unimodal least-squares approaches (R. Bro, N.D. Sidiropoulos, J. of Chemometrics, 1998, 12, 223-247)

- Equality constraints (Van Benthem M, Keenan M, Haaland D. J. Chemometrics 2002; 16, 613–622, ...)

- Use of penalty terms in the objective functions to optimize

- Non-linear optimization with non-linear constraints (PMF, Multilinear Engine, sequential quadratic programming, ...)

Checking active constraints: are the constraints still active at the optimum ALS solution?

Starting from the ALS solutions (D_PCA, C_ALS, S_ALS), new unconstrained solutions are calculated:

C_unc = D_PCA (S^T_ALS)^+
S^T_unc = (C_ALS)^+ D_PCA

Active non-negativity constraints in the C matrix (negative values of C_unc; r = row, c = column):

 r   c   value
19   1   -4.1408e-003
21   1   -3.2580e-003
23   1   -1.8209e-003
24   1   -3.3004e-003
 1   2   -1.1663e-002
 2   2   -2.1166e-002
 3   2   -2.1081e-002
 4   2   -3.8524e-003
25   2   -1.9865e-003
26   2   -1.3210e-003
 7   3   -5.9754e-003
 8   3   -5.5289e-004

S^T matrix: empty matrix, 0-by-3 (no active constraints).

Deviations are small!!!

[Figure: ALS (c1-c3 als, s1-s3 als) and unconstrained (c1-c3 unc, s1-s3 unc) concentration profiles and spectra.]

Proposal: check the ALS solutions for active constraints and whether the deviations are large!

Implementation of unimodality constraints

• 'vertical' unimodality: forcing the non-unimodal parts of the profile to zero

• 'horizontal' unimodality: forcing the non-unimodal parts of the profile to be equal to the last unimodal value

• 'average' unimodality: forcing the non-unimodal parts of the profile to be an average between the two extreme values while remaining unimodal

• using monotone regression procedures
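The 'vertical' and 'horizontal' variants can be sketched for a single profile. This is one possible reading of the definitions above, not the toolbox implementation: the main maximum is taken as the global maximum, and the example profile is made up.

```python
import numpy as np

def unimodal_vertical(c):
    """'Vertical' unimodality: beyond the first point (on either side of the
    maximum) where the profile rises again, set the profile to zero."""
    c = np.asarray(c, dtype=float).copy()
    k = int(np.argmax(c))                     # main maximum
    for i in range(k + 1, len(c)):            # right side must not increase
        if c[i] > c[i - 1]:
            c[i:] = 0.0
            break
    for i in range(k - 1, -1, -1):            # left side must not increase leftwards
        if c[i] > c[i + 1]:
            c[: i + 1] = 0.0
            break
    return c

def unimodal_horizontal(c):
    """'Horizontal' unimodality: a point that breaks unimodality takes the
    value of the last unimodal point (running minimum away from the maximum)."""
    c = np.asarray(c, dtype=float).copy()
    k = int(np.argmax(c))
    for i in range(k + 1, len(c)):
        c[i] = min(c[i], c[i - 1])
    for i in range(k - 1, -1, -1):
        c[i] = min(c[i], c[i + 1])
    return c

profile = [0.0, 1.0, 3.0, 2.0, 2.5, 1.0, 0.5, 0.8]   # secondary maximum at index 4
print(unimodal_vertical(profile))
print(unimodal_horizontal(profile))
```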

Implementation of closure/normalization constraints

Equality constraints

Closure constraints, at experimental point i with 3 concentration profiles:

c_i1 + c_i2 + c_i3 = t_i   (closure)
c_i1 r_1 + c_i2 r_2 + c_i3 r_3 = t_i

$$\mathbf{C}\mathbf{r} = \mathbf{t} \;\Rightarrow\; \mathbf{r} = \mathbf{C}^+\mathbf{t}$$

These are equality constraints!

Normalization constraints:
max(s) = 1, spectrum maximum
max(c) = 1, peak maximum
||s|| = 1, area, length, ...
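Closure as an equality constraint can be sketched as follows: solve C r = t in the least-squares sense and rescale each concentration profile by its factor. In an ALS iteration the corresponding rows of S^T would be divided by the same factors so that C S^T is unchanged (an assumption about how the step is used). The data are hypothetical: true profiles that sum to 1 at every point, distorted by arbitrary scale factors.

```python
import numpy as np

def apply_closure(C, t):
    """Closure as an equality constraint: solve C r = t in the LS sense
    (r = C+ t) and rescale concentration profile n by r[n]."""
    r = np.linalg.pinv(C) @ t
    return C * r, r                              # column n multiplied by r[n]

rng = np.random.default_rng(0)
C_true = rng.dirichlet(np.ones(3), size=20)      # 20 points, rows sum to t_i = 1
C_est = C_true * np.array([2.0, 0.5, 4.0])       # same profiles with arbitrary scales
C_closed, r = apply_closure(C_est, np.ones(20))
print(np.round(r, 3))
```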

Implementation of selectivity/local rank constraints

Using a masking matrix C_sel or S^T_sel:

• C_sel (same size as C): from local rank (EFA), the elements known to be zero are set to 0, while the elements marked x are left free.

• S^T_sel (same size as S^T): fixing a known spectrum; the elements marked k hold the known values, while the elements marked x are left free.

Solving intensity ambiguities in MCR-ALS

$$d_{ij} = \sum_n c_{in} s_{nj} = \sum_n \left(k\, c_{in}\right)\left(\frac{1}{k}\, s_{nj}\right)$$

k is arbitrary. How to find the right one? In the simultaneous analysis of multiple data matrices, intensity/scale ambiguities can be solved:
a) in relative terms (directly);
b) in absolute terms, using external knowledge.

Two-way data: MCR-ALS for quantitative determinations (Talanta, 2008, 74, 1201-10)

[Scheme: D → ALS → C, S^T. The ALS concentration values of the calibration samples, c_ALS^cal, are selected and regressed against the reference values c_ref (concentration correlation constraint, multivariate calibration); the local model gives the updated C and the predictions for the unknown samples.]

Local model (calibration samples):

$$\mathbf{c}_{ref} = b\,\mathbf{c}_{ALS}^{cal} + b_0 + \text{error}$$

Prediction:

$$\hat{\mathbf{c}}_{pred} = b\,\mathbf{c}_{ALS}^{pred} + b_0$$
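The local model of the correlation constraint is an ordinary linear regression. The sketch assumes one analyte, a calibration set given by the hypothetical index array `cal_idx`, and made-up ALS values that differ from the reference concentrations only by a scale and an offset; in the actual constraint this update is applied inside each ALS iteration.

```python
import numpy as np

def correlation_constraint(c_als, cal_idx, c_ref):
    """Fit c_ref = b * c_ALS(cal) + b0 on the calibration samples and use the
    local model to rescale the ALS values of all samples."""
    b, b0 = np.polyfit(c_als[cal_idx], c_ref, 1)   # slope b, intercept b0
    return b * c_als + b0, b, b0

c_true = np.array([1.0, 2.0, 3.5, 4.0, 5.5, 2.2, 3.3])  # hypothetical concentrations
c_als = 0.2 * c_true + 0.05                        # ALS values: arbitrary scale and offset
cal_idx = np.arange(5)                             # first 5 samples = calibration set
c_upd, b, b0 = correlation_constraint(c_als, cal_idx, c_true[cal_idx])
print(b, b0, c_upd[5:])                            # predictions for the last 2 samples
```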

Validation of the quantitative determination: spectrophotometric analysis of nucleic base mixtures

Protein and moisture determination in agricultural samples (ray-grass) by PLSR and MCR-ALS (Talanta, 2008, 74, 1201-10)

         RMSEP          SEP            Bias                 Correlation       RE (%)
         ALS    PLS     ALS    PLS     ALS       PLS        ALS     PLS       ALS    PLS
HUM      0.312  0.249   0.315  0.248   7.30e-4   4.50e-2    0.9755  0.986     3.70   2.96
PB       0.782  0.564   0.788  0.571   7.35e-2   3.31e-2    0.9860  0.993     4.65   3.67

Soft-hard modelling

[Figure: concentration (a.u.) vs. time profiles of species A, B, C and X: soft-modelled profiles C_SM and hard-modelled profiles C_HM.]

Non-linear model fitting

$$\min_{k_1,k_2} \lVert \mathbf{C}_{HM} - \mathbf{C}_{SM} \rVert, \qquad \mathbf{C}_{HM} = f(k_1, k_2)$$

• All or some of the concentration profiles can be constrained.
• All or some of the batches can be constrained.
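A minimal sketch of the fitting step, assuming a consecutive first-order scheme A -> B -> C (an illustration; the slides only name k1 and k2) and synthetic "soft" profiles on the same scale as the model. A plain grid search stands in for the non-linear least-squares optimizer a real implementation would use; all names are hypothetical.

```python
import math


def consecutive_profiles(k1, k2, times, a0=1.0):
    """Hard-model profiles [A], [B], [C] for A -k1-> B -k2-> C (first order)."""
    rows = []
    for t in times:
        a = a0 * math.exp(-k1 * t)
        if abs(k1 - k2) < 1e-12:          # limiting case k1 == k2
            b = a0 * k1 * t * math.exp(-k1 * t)
        else:
            b = a0 * k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
        rows.append([a, b, a0 - a - b])   # mass balance gives [C]
    return rows


def sse(P, Q):
    """Sum of squared differences between two matrices (lists of rows)."""
    return sum((p - q) ** 2 for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


def fit_rate_constants(C_soft, times, grid):
    """Grid search for (k1, k2) minimising ||C_HM - C_SM||^2."""
    best = None
    for k1 in grid:
        for k2 in grid:
            err = sse(consecutive_profiles(k1, k2, times), C_soft)
            if best is None or err < best[0]:
                best = (err, k1, k2)
    return best


times = list(range(21))
C_soft = consecutive_profiles(0.5, 0.2, times)      # stand-in for ALS soft profiles
grid = [round(0.05 * i, 2) for i in range(1, 21)]   # 0.05, 0.10, ..., 1.00
err, k1_hat, k2_hat = fit_rate_constants(C_soft, times, grid)
print(k1_hat, k2_hat)  # 0.5 0.2
```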

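Implementation of hard modelling and shape constraints

$$\min \lVert \mathbf{D} - \mathbf{C}\mathbf{S}^T \rVert, \qquad \mathbf{D} = \mathbf{C}\mathbf{S}^T$$
ALS(D, S^T) → C;  ALS(D, C) → S^T

Kinetic model with rate constants k1, k2, k3: C_soft → rate law (ordinary differential equations) → integration → C_soft/hard

Rate law and integrated rate law:
$$\frac{d[\mathrm{A}]}{dt} = -k_1[\mathrm{A}] \;\Rightarrow\; [\mathrm{A}] = [\mathrm{A}]_0\, e^{-k_1 t}$$
$$\frac{d[\mathrm{B}]}{dt} = k_1[\mathrm{A}] - k_2[\mathrm{B}] \;\Rightarrow\; [\mathrm{B}] = [\mathrm{A}]_0\, \frac{k_1}{k_2 - k_1}\left(e^{-k_1 t} - e^{-k_2 t}\right) \quad (k_1 \neq k_2)$$
...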
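One way to check the integrated rate laws above is to integrate the ordinary differential equations numerically and compare. This sketch uses the classical fourth-order Runge-Kutta method with hypothetical rate constants; it only covers the [A] and [B] equations written above.

```python
import math


def rate_law(a, b, k1, k2):
    """Right-hand side of d[A]/dt = -k1[A], d[B]/dt = k1[A] - k2[B]."""
    return -k1 * a, k1 * a - k2 * b


def integrate_rk4(a0, k1, k2, t_end, h=0.01):
    """Classical fourth-order Runge-Kutta integration from t = 0 to t_end."""
    a, b = a0, 0.0
    for _ in range(round(t_end / h)):
        da1, db1 = rate_law(a, b, k1, k2)
        da2, db2 = rate_law(a + h / 2 * da1, b + h / 2 * db1, k1, k2)
        da3, db3 = rate_law(a + h / 2 * da2, b + h / 2 * db2, k1, k2)
        da4, db4 = rate_law(a + h * da3, b + h * db3, k1, k2)
        a += h / 6 * (da1 + 2 * da2 + 2 * da3 + da4)
        b += h / 6 * (db1 + 2 * db2 + 2 * db3 + db4)
    return a, b


def integrated_law(a0, k1, k2, t):
    """Closed-form [A] and [B] (valid for k1 != k2)."""
    a = a0 * math.exp(-k1 * t)
    b = a0 * k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
    return a, b


num = integrate_rk4(1.0, 0.5, 0.2, t_end=10.0)
exact = integrated_law(1.0, 0.5, 0.2, t=10.0)
print(num, exact)  # the two pairs agree to many decimal places
```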

Quality of MCR Solutions: Rotational Ambiguities

Factor Analysis (PCA) data matrix decomposition:
$$\mathbf{D} = \mathbf{U}\mathbf{V}^T + \mathbf{E}$$
'True' data matrix decomposition:
$$\mathbf{D} = \mathbf{C}\mathbf{S}^T + \mathbf{E}$$
$$\mathbf{D} = \mathbf{U}\mathbf{T}\mathbf{T}^{-1}\mathbf{V}^T + \mathbf{E} = \mathbf{C}\mathbf{S}^T + \mathbf{E}, \qquad \mathbf{C} = \mathbf{U}\mathbf{T}, \quad \mathbf{S}^T = \mathbf{T}^{-1}\mathbf{V}^T$$

How to find the rotation matrix T? Matrix decomposition is not unique!
T(N,N) is any non-singular matrix: there is rotational freedom for T.
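A small numerical illustration of rotational freedom, with hypothetical profiles and an arbitrary 2x2 T: C T and T^-1 S^T reproduce exactly the same data as C and S^T. With this particular T the first rotated spectrum gets a negative value, so a non-negativity constraint would exclude it; this is how constraints narrow the set of feasible T.

```python
def matmul(A, B):
    """Plain-Python matrix product A @ B (lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse_2x2(T):
    """Inverse of a non-singular 2x2 matrix."""
    (a, b), (c, d) = T
    det = a * d - b * c
    if det == 0:
        raise ValueError("T must be non-singular")
    return [[d / det, -b / det], [-c / det, a / det]]


C = [[1.0, 0.0], [0.8, 0.3], [0.4, 0.7], [0.0, 1.0]]   # 4 samples x 2 species
ST = [[1.0, 0.5, 0.1], [0.1, 0.6, 1.0]]                # 2 species x 3 channels
T = [[1.0, 0.3], [0.2, 1.0]]                            # arbitrary non-singular T

C_rot = matmul(C, T)                  # C T
ST_rot = matmul(inverse_2x2(T), ST)   # T^-1 S^T

D = matmul(C, ST)
D_rot = matmul(C_rot, ST_rot)
print(all(abs(x - y) < 1e-12 for rx, ry in zip(D, D_rot) for x, y in zip(rx, ry)))  # True
print(C_rot[0], ST_rot[0])  # different profiles, same fit to D; ST_rot[0] has a negative entry
```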

Is it possible to define bands and limits for the feasible solutions (T_max and T_min)? T_max and T_min can be calculated from the constraints of the system.

1) What are the variables of the problem?
T (rotation matrix): D = C T T^{-1} S^T

2) What is the objective function f(T) to optimize?
For every species i = 1, ..., n_s:
$$f_i(\mathbf{T}) = \frac{\lVert \mathbf{c}_i\,\mathbf{s}_i^T \rVert}{\lVert \mathbf{C}\,\mathbf{S}^T \rVert}$$
f(T) is a scalar value between 0 and 1: it gives the relative contribution of species i compared to the global measured signal!

Constrained non-linear optimization problem (NCP):
Find T which makes min/max f(T) under g_e(T) = 0 and g_i(T) ≤ 0,
where T is the matrix of variables, f(T) is a scalar non-linear function of T and g(T) is the vector of non-linear constraints (Matlab Optimization Toolbox fmincon function).

3) What are the constraints g(T)?
• normalization/closure: g_norm/g_clos
• non-negativity: g_cneg/g_sneg
• known values/selectivity: g_known/g_sel
• unimodality: g_unim
• trilinearity (three-way data): g_tril
Are they equality or inequality constraints?
• equality, g_e: normalization/closure, known values
• inequality, g_i: non-negativity, selectivity, unimodality, trilinearity

4) What are the initial estimations of C and S^T?
• Initial estimations of C and S^T are obtained by MCR-ALS.
• Initial estimations should fulfill the constraints of the system (non-negativity, unimodality, closure, selectivity, local rank, ...).

5) What are the initial values of T?
• T_ini = eye(N), the N×N identity matrix.
• NCP depends on the initial values of T! (local minima, convergence speed, ...)

Optimization algorithm (R. Tauler, Journal of Chemometrics, 2001, 15, 627-646):
• Initial estimations of the C_ALS and S_ALS profiles are obtained by MCR-ALS; T = eye(number of species).
• For each species, define the objective function f(T) = norm(c(T) s^T(T)) / norm(C S^T), with C(T) = C_ALS T and S^T(T) = T^{-1} S^T_ALS.
• Select the constraints g(T).
• Find T_min, which gives a minimum of f(T) under constraints g_i(T) ≤ 0, g_e(T) = 0.
• Find T_max, which gives a maximum of f(T) under constraints g_i(T) ≤ 0, g_e(T) = 0.
• Build the minimum band: c_min = c_ALS T_min, s^T_min = T_min^{-1} s^T_ALS.
• Build the maximum band: c_max = c_ALS T_max, s^T_max = T_max^{-1} s^T_ALS.
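A minimal sketch of the objective function f_i(T) for a two-species case, with hypothetical profiles. It only evaluates f_i for a given T (identity and one trial rotation that keeps all profiles non-negative); the constrained min/max search, done with fmincon in the original work, is not reproduced here.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product A @ B (lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse_2x2(T):
    """Inverse of a non-singular 2x2 matrix."""
    (a, b), (c, d) = T
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def frobenius(M):
    return math.sqrt(sum(v * v for row in M for v in row))


def species_contribution(C_als, ST_als, T, i):
    """f_i(T) = ||c_i s_i^T|| / ||C S^T|| with C = C_ALS T and S^T = T^-1 S^T_ALS."""
    C = matmul(C_als, T)
    ST = matmul(inverse_2x2(T), ST_als)
    c_i = [row[i] for row in C]
    s_i = ST[i]
    # The Frobenius norm of an outer product is the product of the vector norms.
    norm_i = math.sqrt(sum(c * c for c in c_i)) * math.sqrt(sum(s * s for s in s_i))
    return norm_i / frobenius(matmul(C, ST))


C_als = [[1.0, 0.0], [0.8, 0.3], [0.4, 0.7], [0.0, 1.0]]
ST_als = [[1.0, 0.5, 0.1], [0.1, 0.6, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]          # T_ini = eye(N)
T = [[1.0, 0.1], [0.1, 1.0]]                 # a hypothetical trial rotation

print(species_contribution(C_als, ST_als, identity, 0))
print(species_contribution(C_als, ST_als, T, 0))  # changes with T, while C S^T does not
```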


Calculation of feasible bands in the resolution of a single chromatographic run (run 1)

Applied constraints were spectra and elution profiles non-negativity and spectra normalization:

[Figure: maximum and minimum feasible bands for the elution profiles and the spectra profiles.]


Applied constraints were spectra and elution profiles non-negativity, spectra normalization and unimodality:

[Figure: feasible bands obtained with and without the unimodality constraint.]


Applied constraints were spectra and elution profiles non-negativity, spectra normalization and selectivity/local rank (31-51, 45-51, 1-8, 1-15):

[Figure: maximum and minimum feasible bands for the elution profiles and the spectra profiles.]

Evaluation of boundaries of feasible bands: previous studies
• W.H. Lawton and E.A. Sylvestre, Technometrics, 1971, 13, 617-633
• O.S. Borgen and B.R. Kowalski, Anal. Chim. Acta, 1985, 174, 1-26
• K. Sasaki, S. Kawata and S. Minami, Appl. Opt., 1983, 22, 3599-3603
• R.C. Henry and B.M. Kim, Chemom. Intell. Lab. Syst., 1990, 8, 205-216
• P.D. Wentzell, J.-H. Wang, L.F. Loucks and K.M. Miller, Can. J. Chem., 1998, 76, 1144-1155
• P. Gemperline, Anal. Chem., 1999, 71, 5398-5404
• R. Tauler, J. Chemometrics, 2001, 15, 627-646
• M. Leger and P.D. Wentzell, Chemom. Intell. Lab. Syst., 2002, 171-188

Quality of MCR results: error propagation and resampling methods

• How does experimental error/noise in the input data matrices affect MCR-ALS results?
• For ALS calculations there is no known analytical formula to calculate error estimates (as there is, e.g., in linear least-squares regression).
• Bootstrap estimations using resampling methods are attempted.

MCR-ALS: Quality Assessment. Propagation of experimental noise into the MCR-ALS solutions

Experimental noise is propagated into the MCR-ALS solutions and causes uncertainties in the obtained results. To estimate these uncertainties for non-linear models like MCR-ALS, computer-intensive resampling methods can be used (J. of Chemometrics, 2004, 18, 327-340; J. Chemometrics, 2006, 20, 4-67).

[Figure: noise added to the data; mean, max and min profiles and confidence range profiles.]
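The noise-addition idea can be sketched on a deliberately simple estimator: Gaussian noise is added to the data many times, the estimate is recomputed each time, and the spread of the estimates gives mean, min, max and standard deviation. As a stand-in for a full MCR-ALS run (which this sketch does not reproduce), the estimator is the least-squares concentration of one species with a known, hypothetical pure spectrum.

```python
import random
import statistics


def ls_concentration(d, s):
    """Least-squares concentration of one species from spectrum d, given its pure spectrum s."""
    return sum(di * si for di, si in zip(d, s)) / sum(si * si for si in s)


def noise_addition(d, s, noise_sd, n_rep=500, seed=0):
    """Add Gaussian noise to d n_rep times, re-estimate, and summarise the spread."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_rep):
        d_noisy = [di + rng.gauss(0.0, noise_sd) for di in d]
        estimates.append(ls_concentration(d_noisy, s))
    return (statistics.mean(estimates), statistics.stdev(estimates),
            min(estimates), max(estimates))


s = [0.1, 0.5, 1.0, 0.5, 0.1]          # hypothetical pure spectrum
d = [2.0 * v for v in s]               # noise-free spectrum for concentration 2.0
mean, sd, lo, hi = noise_addition(d, s, noise_sd=0.01)
print(f"mean={mean:.4f} sd={sd:.4f} range=[{lo:.4f}, {hi:.4f}]")
```

For this linear estimator the spread can be checked against theory (sd = noise_sd / ||s||); for MCR-ALS no such formula is available, which is why the resampling route is taken.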

Error propagation: parameter confidence range

Noise level                 Real              0.1 %             1 %               2 %               5 %
                            pk1     pk2       pk1     pk2       pk1     pk2       pk1     pk2       pk1     pk2
Theoretical value           3.6660  4.9244    -       -         -       -         -       -         -       -
Monte Carlo simulations
   Value                    -       -         3.666   4.924     3.669   4.926     3.676   4.917     3.976   5.074
   Stand. dev.              -       -         0.001   0.001     0.0065  0.012     0.012   0.024     0.434   0.759
Noise addition
   Value                    -       -         3.654   4.922     3.659   4.913     3.665   4.910     4.075   5.330
   Stand. dev.              -       -         0.001   0.002     0.006   0.026     0.010   0.040     0.487   1.122
JackKnife
   Value                    -       -         3.655   4.920     3.660   4.913     3.667   4.913     4.082   5.329
   Stand. dev.              -       -         0.004   0.003     0.009   0.024     0.012   0.047     0.514   1.091
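The jackknife standard deviations come from leaving part of the data out and repeating the analysis. The generic leave-one-out formula can be sketched as follows; the statistic here is a simple mean so the result can be checked analytically, whereas in the study it would be a full MCR-ALS resolution followed by the parameter (pk) estimation, which this sketch does not reproduce.

```python
import math
import statistics


def jackknife_sd(data, statistic):
    """Jackknife standard deviation of statistic(data), leaving one item out at a time."""
    n = len(data)
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((v - mean_loo) ** 2 for v in leave_one_out))


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(jackknife_sd(data, statistics.mean))  # equals stdev(data) / sqrt(n) for the mean
```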

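Maximum Likelihood MCR-ALS solutions

Without including uncertainties:
$$Q = \sum_{i=1}^{m}\sum_{j=1}^{n}\left(d_{i,j} - \hat{d}_{i,j}\right)^2, \qquad \hat{\mathbf{D}} = \hat{\mathbf{C}}\,\hat{\mathbf{S}}^T_{ALS}, \qquad \frac{\partial Q}{\partial \mathbf{C}} = 0, \quad \frac{\partial Q}{\partial \mathbf{S}^T} = 0$$

Unconstrained ALS solution:
$$\mathbf{C} = \mathbf{D}_{PCA}\,\mathbf{S}\left(\mathbf{S}^T\mathbf{S}\right)^{-1} = \mathbf{D}_{PCA}\left(\mathbf{S}^T\right)^{+}$$
$$\mathbf{S}^T = \left(\mathbf{C}^T\mathbf{C}\right)^{-1}\mathbf{C}^T\mathbf{D}_{PCA} = \mathbf{C}^{+}\mathbf{D}_{PCA}$$

Including uncertainties σ_{i,j}:
$$Q = \sum_{i=1}^{m}\sum_{j=1}^{n}\frac{\left(d_{i,j} - \hat{d}_{i,j}\right)^2}{\sigma_{i,j}^2}$$

Unconstrained WALS solution:
rows, with W_i = Σ_i^{-1}:
$$\mathbf{c}(i,:) = \mathbf{d}(i,:)\,\mathbf{W}_i\,\mathbf{S}\left(\mathbf{S}^T\mathbf{W}_i\mathbf{S}\right)^{-1}$$
columns, with W_j = Σ_j^{-1}:
$$\mathbf{s}(:,j) = \left(\mathbf{C}^T\mathbf{W}_j\mathbf{C}\right)^{-1}\mathbf{C}^T\mathbf{W}_j\,\mathbf{d}(:,j)$$
where Σ_i and Σ_j are the diagonal matrices of the variances σ²_{i,j} of row i and of column j, respectively.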
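A sketch of the unconstrained row-wise WALS update c(i,:) = d(i,:) W_i S (S^T W_i S)^-1, assuming W_i = diag(1/σ²_ij). It solves the symmetric normal equations by Gaussian elimination instead of forming the inverse; the helper names, spectra and uncertainties are hypothetical, and no MCR constraints are applied.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def wals_row(d_row, ST, sigma_row):
    """c(i,:) = d(i,:) W_i S (S^T W_i S)^-1 with W_i = diag(1 / sigma_ij^2)."""
    w = [1.0 / s ** 2 for s in sigma_row]
    n_comp, n_chan = len(ST), len(w)
    # Normal equations (S^T W_i S) c^T = S^T W_i d(i,:)^T; the matrix is symmetric.
    A = [[sum(w[j] * ST[p][j] * ST[q][j] for j in range(n_chan)) for q in range(n_comp)]
         for p in range(n_comp)]
    rhs = [sum(w[j] * ST[p][j] * d_row[j] for j in range(n_chan)) for p in range(n_comp)]
    return solve(A, rhs)


ST = [[1.0, 0.5, 0.1, 0.0], [0.0, 0.2, 0.8, 1.0]]   # 2 species x 4 channels
c_true = [2.0, 3.0]
d_row = [sum(c * s[j] for c, s in zip(c_true, ST)) for j in range(4)]
d_row[0] += 1.0                                       # a large error in channel 0

c_ols = wals_row(d_row, ST, [1.0, 1.0, 1.0, 1.0])    # equal weights: plain ALS row update
c_wls = wals_row(d_row, ST, [100.0, 1.0, 1.0, 1.0])  # channel 0 known to be very noisy
print(c_ols, c_wls)
```

With equal uncertainties the update reduces to the unweighted ALS solution; down-weighting the noisy channel pulls the estimate back towards the true concentrations.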

MCR-ALS results quality assessment

Data fitting:
$$\text{lof}\,(\%) = 100\sqrt{\frac{\sum_{i=1}^{n}\sum_{j=1}^{m} e_{i,j}^2}{\sum_{i=1}^{n}\sum_{j=1}^{m} x_{i,j}^2}}, \qquad e_{i,j} = x_{i,j} - \hat{x}_{i,j}$$
$$R^2\,(\%) = 100\,\frac{\sum_{i=1}^{n}\sum_{j=1}^{m} x_{i,j}^2 - \sum_{i=1}^{n}\sum_{j=1}^{m} e_{i,j}^2}{\sum_{i=1}^{n}\sum_{j=1}^{m} x_{i,j}^2}$$

Profiles recovery:
• similarity between a recovered profile x and the true profile y: r = x^T y / (||x|| ||y||)
• recovery angles α = acosd(r²), i.e. the inverse cosine expressed in sexagesimal degrees:

r²      1    0.99   0.95   0.90   0.80   0.70   0.60   0.50   0.40   0.30   0.20   0.10   0.00
α (°)   0    8.1    18     26     37     46     53     60     66     72     78     84     90
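The fit and recovery measures above can be sketched directly from their definitions (the data are hypothetical). From the two formulas, R² (%) = 100 − lof²/100, which is consistent with the tables below (e.g. lof = 12.6% gives R² ≈ 98.4%). The recovery angle follows the acosd(r²) form used in the table above.

```python
import math


def lof_percent(X, X_hat):
    """Lack of fit (%) = 100 * sqrt(sum e_ij^2 / sum x_ij^2)."""
    sse = sum((x - xh) ** 2 for row, row_h in zip(X, X_hat) for x, xh in zip(row, row_h))
    sst = sum(x * x for row in X for x in row)
    return 100.0 * math.sqrt(sse / sst)


def r2_percent(X, X_hat):
    """Explained variance (%) = 100 * (sum x_ij^2 - sum e_ij^2) / sum x_ij^2."""
    sse = sum((x - xh) ** 2 for row, row_h in zip(X, X_hat) for x, xh in zip(row, row_h))
    sst = sum(x * x for row in X for x in row)
    return 100.0 * (sst - sse) / sst


def recovery_angle(x, y):
    """Recovery angle in degrees, alpha = acosd(r^2), r = x'y / (||x|| ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    r = dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return math.degrees(math.acos(min(1.0, r * r)))  # clamp rounding above 1


X = [[1.0, 2.0], [3.0, 4.0]]
X_hat = [[1.0, 2.0], [3.0, 3.0]]
lof = lof_percent(X, X_hat)
print(round(lof, 2), round(r2_percent(X, X_hat), 2))   # 18.26 96.67
print(round(100 - lof ** 2 / 100, 2))                  # 96.67, same as R2
print(round(recovery_angle([1.0, 0.0], [1.0, 1.0]), 1))  # r^2 = 0.5 -> 60.0
```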

Homoscedastic noise case

Y + E = X
Noise structure: r = 0.01*max(max(Y)) = 3.21; S = I .* r; E = S .* N(0,1)
lof (%) = 14%, R² = 98.0%, mean(S/N) = 21.7

[Figure: Y, E and X, with their singular values (SVD).]

[Figure: resolved profiles f1-f4 (F^T) and g1-g4 (G). Red: maximum and minimum feasible bands; blue: 'true' profiles; +: results from 'true' initial estimates; *: results from 'purest' initial estimates.]

No noise and homoscedastic noise cases results

                                                         recovery angles α (°)
System       init     method   lof %   R² %    f1    f2    f3    f4    g1    g2    g3    g4
No noise     true     ALS      0       100     0     0     0     0     0     0     0     0
No noise     purest   ALS      0       100     1.8   11    7.9   5.0   5.9   9.1   13    2.8
max band     -        Bands    0       100     3.1   13    7.5   5.5   8.2   18    10    1.7
min band     -        Bands    0       100     2.1   3.7   3.9   3.9   5.2   8.1   14    3.0
Homo noise   true     ALS      12.6    98.4    3.0   12    8.7   2.1   4.8   12    9.0   2.4
Homo noise   purest   ALS      12.6    98.4    3.0   17    8.5   5.0   7.1   12    16    3.7
Homo noise   -        Theor    14.0    98.0    -     -     -     -     -     -     -     -
Homo noise   -        PCA      12.6    98.4    -     -     -     -     -     -     -     -

Heteroscedastic noise case (low, medium and high noise levels)

Y + E = X
Noise structure: r = 5, 10, 20 (low, medium, high); S = r * R(0,1), uniformly distributed random numbers in the interval 0-1; E = S .* N(0,1), normally distributed
lof (%) = 12, 25, 44%; R² = 99, 94, 80%; mean(S/N) = 17, 10, 3

[Figure: Y, E and X for the low (L), medium (M) and high (H) noise levels, with their singular values (SVD) and the resolved profiles G and F^T.]
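The two noise structures used in these simulations can be sketched as follows. This reads `I` in S = I .* r as a matrix of ones and R(0,1) as uniform random numbers in [0, 1) (both assumptions about the slide notation); the data matrix and function names are hypothetical. The matrix S holds one standard deviation per element and could serve as the σ_ij of the weighted (WALS) solution.

```python
import random


def homoscedastic_noise(Y, level=0.01, seed=0):
    """S = r for every element, r = level * max(max(Y)); E = S .* N(0,1)."""
    rng = random.Random(seed)
    r = level * max(max(row) for row in Y)
    S = [[r for _ in row] for row in Y]
    E = [[s * rng.gauss(0.0, 1.0) for s in s_row] for s_row in S]
    return S, E


def heteroscedastic_noise(Y, r, seed=0):
    """S = r * U(0,1) element-wise (uniform in [0, 1)); E = S .* N(0,1)."""
    rng = random.Random(seed)
    S = [[r * rng.random() for _ in row] for row in Y]
    E = [[s * rng.gauss(0.0, 1.0) for s in s_row] for s_row in S]
    return S, E


Y = [[100.0, 250.0, 321.0], [50.0, 120.0, 80.0]]    # hypothetical noise-free data
S_ho, E_ho = homoscedastic_noise(Y)                 # r = 0.01 * 321 = 3.21 everywhere
S_he, E_he = heteroscedastic_noise(Y, r=10.0)       # "medium" level, one std per element
X = [[y + e for y, e in zip(yr, er)] for yr, er in zip(Y, E_he)]
print(S_ho[0][0], [round(v, 2) for v in S_he[0]])
```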

[Figure: resolved F profiles without and with weighting. Blue: 'true' F; +: from 'true'; *: from 'purest'. Weighting improves recoveries.]

[Figure: resolved G profiles without and with weighting. Blue: 'true' G; +: from 'true'; *: from 'purest'. Weighting gives an overall improvement of the recoveries.]

Heteroscedastic noise case results

                                                              recovery angles α (°)
System (case)           init     method   lof % (exp)  R² % (exp)  f1    f2    f3    f4    g1    g2    g3    g4
Hetero noise (low)      purest   ALS      10.7         98.8        3.1   14    9.0   3.8   7.0   10    15    4.3
Hetero noise (low)      purest   WALS     12.0         98.6        2.6   12    15    4.3   7.8   15    15    3.7
Theoretical             -        -        12.0         98.6        -     -     -     -     -     -     -     -
PCA                     -        -        10.7         98.8        -     -     -     -     -     -     -     -
Hetero noise (medium)   purest   ALS      22.3         95.0        7.7   22    22    5.7   7.2   21    24    4.5
Hetero noise (medium)   purest   WALS     24.0         94.2        6.6   22    18    5.7   7.4   14    17    5.5
Theoretical             -        -        25.0         93.6        -     -     -     -     -     -     -     -
PCA                     -        -        22.0         95.1        -     -     -     -     -     -     -     -
Hetero noise (high)     purest   ALS      40.0         84.0        12    33    38    10    15    38    34    9.0
Hetero noise (high)     purest   WALS     43.1         81.4        12    26    25    6.0   5.0   27    16    3.0
Theoretical             -        -        44.2         80.4        -     -     -     -     -     -     -     -
PCA                     -        -        40.8         83.4        -     -     -     -     -     -     -     -







Spectrometric titrations: an easy way to generate two- and three-way data in the study of chemical reactions and interactions

[Figure: experimental set-up: spectrophotometer, peristaltic pump, 0.050 ml autoburette, stirrer, pH meter, thermostatic bath (T = 37 °C), computer and printer.]

Three spectrometric titrations of a complexation system at different ligand to metal ratios R

[Figure: spectra recorded along the titrations at R = 1.5, R = 2 and R = 3; wavelength 400-900 nm.]

MCR-ALS resolved concentration profiles at R = 1.5

[Figure: concentration profiles vs. pH (3-9) from individual resolution, compared with those from simultaneous resolution and with the theoretical profiles.]

MCR-ALS resolved spectra profiles at R = 1.5

[Figure: resolved spectra (400-900 nm) from individual resolution, compared with those from simultaneous resolution and with the theoretical spectra.]

Process analysis

[Figure: one process IR run: raw IR absorbance data, second-derivative signal, and second-derivative data reproduced by PCA (3 PCs), all vs. spectra channel.]

R. Tauler, B. Kowalski and S. Fleming, Anal. Chem., 65 (1993) 2040-47

[Figure: EFA of 2nd derivative data: initial estimation of the process profiles for 3 components.]

[Figure: ALS resolved pure IR spectra profiles (absorbance, a.u., vs. spectra channel).]

[Figure: ALS resolved pure concentration profiles (concentration, a.u., vs. time) in the simultaneous analysis of eight runs of the process.]

Study of conformational equilibria of polynucleotides: poly(adenylic)-poly(uridylic) acid system, melting data

[Figure: relative concentration vs. temperature (°C, 20-90) for Melting 1 and Melting 2, and resolved spectra (240-300 nm) of the species poly(A)-poly(U) ds, poly(A)-poly(U)-poly(U) ts, poly(A) cs, poly(A) rc and poly(U) rc.]

R. Tauler, R. Gargallo, M. Vives and A. Izquierdo-Ridorsa, Chemometrics and Intelligent Laboratory Systems, 1998

[Figure: source contribution profiles, and resolved composition profiles using nnls.]

Historical Evolution of Multivariate Curve Resolution Methods

• Extension to more than two components
• Target Factor Analysis and Iterative Target Factor Analysis methods
• Local rank detection, Evolving Factor Analysis, Window Factor Analysis
• Rank Annihilation derived methods
• Detection and selection of pure (selective) variables based methods
• Alternating Least Squares methods, 1992
• Implementation of soft-modelling constraints (non-negativity, unimodality, closure, selectivity, local rank, ...), 1993
• Extension to higher-order data, multiway methods (extension of bilinear models to augmented data matrices), 1993-5
• Trilinear (PARAFAC) models, 1997
• Implementation of hard-modelling constraints, 1997
• Breaking rank deficiencies by matrix augmentation, 1998
• Calculation of feasible bands, 2001
• Noise propagation, 2002
• Tucker models, 2005
• Weighted Alternating Least Squares method (Maximum Likelihood), 2006
• ...
