multivariate resolution inmultivariate resolution in chemistry
TRANSCRIPT
Multivariate Resolution inMultivariate Resolution in Chemistryy
Lecture 2Lecture 2
Roma TaulerRoma TaulerIIQAB-CSIC, SpainQ , p
e-mail: [email protected]
Lecture 2Lecture 2
R l ti f t d t• Resolution of two-way data.• Resolution conditions.
– Selective and pure variables. – Local rank
N t l t i t– Natural constraints. • Non-iterative and iterative resolution methods
d l ithand algorithms. • Multivariate Curve Resolution using Alternating
L t S MCR ALSLeast Squares, MCR-ALS. • Examples of application.
Multivariate (Soft) Self Modeling CurveMultivariate (Soft) Self Modeling CurveResolution (definition)
• Group of techniques which intend the recovery of the response profiles (spectrarecovery of the response profiles (spectra, pH profiles, time profiles, elution profiles,....) of more than one component in anof more than one component in an unresolved and unknown mixture obtained from chemical processes and systemsfrom chemical processes and systemswhen no (little) prior information is available about the nature and/oravailable about the nature and/or composition of these mixtures.
Chemical reaction systems monitored using t i tspectroscopic measurements
J J
ST1.5
0.8
1
STC+C EI I
0.5
1
0.4
0.6SC
J
0 20 40 60 80 1000
0 10 20 30 400
0.2
N
J
1
1.5
D
ijekjsN1k ikcijd +∑=
=
Bili it !
I D0.5
1
Bilinearity!0 10 20 30 40 50 60 70 80 90
0
Analytical characterization of complex environmental, industrial and food mixtures using hyphenated methods
-5 4
g yp(chromatography or continuous flow methods with
spectroscopic detection).x 10
1.5
2
2.5
3
3.5x 10
STCSTNC
0.5
1
1
1.5
2
+C
ST
ENR
0 20 40 600
0 20 40 60 80 1000
0.5 +C ENR
D
1.2
0 8
1
LC-DAD coelution NC
D
0 2
0.4
0.6
0.8
ijekjsN1k ikcijd +∑=
NRD
0 10 20 30 40 50 60-0.2
0
0.2jj1kj =
Bilinearity!
Protein folding and dynamic protein-nucleic acid interaction processes.
0.70.80.9
ce (a
.u.) ST
0.8
1CD2O and Cprotein
ion
(a.u
.)
P1 STNC
0 10.20.30.40.50.6
Abs
orba
nc
P1
P
D1
D2
0
0.2
0.4
0.6
Con
cent
rat
43.8 ºC
63.9 ºC
P2
D1
D2 +CST
ENRNR
D
14001500160017001800190000.1
Wavenumber (cm-1)
P220 30 40 50 60 70 800
Temperature (ºC)
NC
NR
1
1.2
1.4
ance
protein
NdNR
NC
0.4
0.6
0.8
Abs
orba
ijekjs1k ikcijd +∑=
=
Bilinearity!
NRD
1400150016001700180019000
0.2
Wavenumber (cm-1)
D2OBilinearity!
Environmental source resolution and apportiomentpp
0
0.05
0.1
0.15
0.2
5
10
15
20
+CST
E
0 10 20 30 40 50 60 70 80 90 1000
0 10 20 30 40 50 60 70 80 90 1000
0.05
0.1
0.15
0.2
0.3
0.4
0 5 10 15 20 250
0 5 10 15 20 250
10
20
30
+C ENR NR
0 10 20 30 40 50 60 70 80 90 1000
0.1
0.20 5 10 15 20 25
0 5 10 15 20 250
5
10
15
20
sourcecomposition
N
NC6
22 samples
sourcedistribution
ijekjsN1k ikcijd +∑=
=
NRD
3
4
522 samples
Bilinearity!1
2
Bilinearity!0 10 20 30 40 50 60 70 80 90 100
0
concn. of 96 organic compounds
Soft-modelling
MCR bilinear model for two way data:
N
d ∑J
ij in nj ijn 1
d c s e=
= +∑I
dij
TD CS E= +D
dij is the data measurement (response) of variable j in sample idij is the data measurement (response) of variable j in sample in=1,...,N are the number of components (species, sources...)cin is the concentration of component n in sample i;s is the response of component nsnj is the response of component nat variable j
Lecture 2Lecture 2
R l ti f t d t• Resolution of two-way data. • Resolution conditions.
– Selective and pure variables. – Local rank
N t l t i t– Natural constraints. • Non-iterative and iterative resolution methods
d l ithand algorithms. • Multivariate Curve Resolution using Alternating
L t S MCR ALSLeast Squares, MCR-ALS. • Examples of application.
Resolution conditions to reduce MCR rotation ambiguities (unique solutions?)
•Selective variables for every component•Local rank conditions (Resolution Theorems)•Natural Constraints
•non-negativity•unimodality•closure (mass-balance)closure (mass balance)
•Multiway Data (i.e. trilinear data...)•Hard-modelling constraints
•mass action law•mass-action law•rate law•....
Sh t i t ( i l t i i t i k h l•Shape constraints (gaussian, lorentzian, assimetric peak shape, log peak shape, ...)•....
Unique resolution conditionsFirst possibility: using selective/pure variables
2l th l ti
First possibility: using selective/pure variables
1
wavelength selectiveRanges, where only onecomponent absorbs
1 elution profiles can beestimated without ambiguitiesg
2elution time selective ranges, where only one component is
1present spectra can be estimated without ambiguities
Detection of ‘purest’ (more selective) variables
Methods focused on finding the most representative g p(purest) rows (or columns) in a data matrix.
Based on PCABased on PCA• Key Set Factor Analysis (KSFA)
Based on the use of real variables• Simple-to-use Interactive Self-modelling analysis p g y
(SIMPLISMA) • Orthogonal Projection Approach (OPA)Orthogonal Projection Approach (OPA)
How to detect purest/selective variables?
Selective variables are the more pure/representative/ dissimilar/orthogonal (linearly independent) variables..!
Examples of proposed methods for detection ofExamples of proposed methods for detection of selective variables:•Key set variables KSFA E.D.Malinowski, Anal.Chim Acta,Key set variables KSFA E.D.Malinowski, Anal.Chim Acta, 134 (1982) 129; IKSFA, Chemolab, 6 (1989) 21 •SIMPLISMA: W.Windig & J.Guilmet, Anal. Chem., 63 (1991) 1425-1432)•Orthogonal Projection Analysis OPA: F.Cuesta-Sanchez et al Anal Chem 68 (1996) 79)et al., Anal. Chem. 68 (1996) 79) •.......
SIMPLISMA
• Finds the purest process or signal variables in a data setdata set.
Most dissimilar signal variablesMost dissimilar signal variables (approximate concentration profiles)
Proocess va
Most dissimilar process variables (approximate signal profiles) ariables
( pp g p )
Signal variables
SIMPLISMA
HPLC-DAD Purest retention times
• Variable purity
i
ii m
sp =
Reten
iimntion tim si Std. deviation
M
mes
Signal variables
mi Mean
Noisy variablesSignal variables Noisy variables
si ↑ mi ↓ ⇒ pi ↑
SIMPLISMA
HPLC-DAD Purest retention times
• Variable purity
fmspi
ii +=
Reten
ifmi +ntion tim si Std. deviation
M
mes
Signal variables
mi Meanf % noise (offset)
Signal variables
Noisy variables → pi ↓
SIMPLISMA
Working procedure1. Selection of first pure variable. max(pi)2. Normalisation of spectra.3. Selection of second pure variable.
a. Calculation of weights (wi)
Rete1a. Calculation of weights (wi) ention tii( )iT
ii YYdetw =
mes
Si l i bl
b. Recalculation of purity (p’i)p’i = wi pi
Signal variables
YiT
c. Next purest variable. max(p’i)
SIMPLISMA
Working procedure3. Selection of third pure variable.
a Calculation of weights (w )a. Calculation of weights (wi)
Rete1( )iT
ii YYdetw = ention tim2b. Recalculation of purity (p’’i)’’ i
mes
Signal variables
p’’i = wi pi
c. Next purest variable. max(p’’i)i
Signal variables
YiT
.
.
.
SIMPLISMA
Graphical informationGraphical information
• Purity spectrum.Plot of pi vs. variables.Plot of pi vs. variables.
• Std. deviation spectrum.Std. deviation spectrum.Plot of ‘purity corrected’ std. dev. (csi) vs.
variablesvariablescsi = wi si
SIMPLISMA
Graphical information10000
Mean spectrum
1 2
1.4Concentration profiles
0
5000
1
1.2
banc
e
0 10 20 30 40 50 600
4000
Std. deviation spectrum
0.6
0.8
Abs
orb
0 10 20 30 40 50 600
2000
1 t t
0.2
0.4
0 5
1
1st pure spectrum
31
0 10 20 30 40 50 600
Retention times 0 10 20 30 40 50 600
0.5
if 1 t i bl i t iif 1st variable is too noisy ⇒ f is too low and should be increased
SIMPLISMA
Graphical information2nd pure spectrum
1 2
1.4Concentration profiles
0 1
0.15
0.22nd pure spectrum
40
1
1.2
banc
e
0 10 20 30 40 50 600
0.05
0.1
0.6
0.8
Abs
orb
311500
0 10 20 30 40 50 60
2nd std. dev. spectrum
0.2
0.4
500
1000
0 10 20 30 40 50 600
Retention times 0 10 20 30 40 50 600
500
SIMPLISMA
Graphical information3 d t
1 2
1.4Concentration profiles
40 0.04
0.063rd pure spectrum
23
1
1.2
banc
e
0 02
0
0.02
0.6
0.8
Abs
orb
310 10 20 30 40 50 60
-0.02
1503rd std. dev. spectrum
0.2
0.4
50
100
0 10 20 30 40 50 600
Retention times 0 10 20 30 40 50 60-50
0
SIMPLISMA
Graphical information4th pure spectrum
1 2
1.4Concentration profiles
402
3x 10
-3 4th pure spectrum
1
1.2
banc
e
0 10 20 30 40 50 60-1
0
1 13
0.6
0.8
Abs
orb
3123
84th std. dev. spectrum
0 10 20 30 40 50 60
0.2
0.4
0
2
4
6
0 10 20 30 40 50 600
Retention times0 10 20 30 40 50 60
-2
0
SIMPLISMA
Graphical information2
x 10-18 5th pure spectrum
1 2
1.4Concentration profiles
40 0
1
1
1.2
banc
e
130 10 20 30 40 50 60
-1
0
0.6
0.8
Abs
orb
3123
131
x 10-14 5th std. dev. spectrum
0.2
0.4
1
0
0 10 20 30 40 50 600
Retention times
0 10 20 30 40 50 60-1
Noisy pattern in both spectra
No more significant contributions
SIMPLISMA
InformationInformation
• Purest variables in the two modes.Purest variables in the two modes.
• Purest signal and concentration profiles.
• Number of compounds.
Unique resolution conditions
•Many chemical mixture systems (evolving or y y ( gnot) do not have selective variables for all the components of the systemp y
•When selected variables are not (totally) ( y)selective, their detection is still very useful as an initial description of the system reducing its p y gcomplexity and because they provide good initial estimations of species profiles useful for most of p pthe resolution methods
Lecture 2Lecture 2
R l ti f t d t• Resolution of two-way data. • Resolution conditions.
– Selective and pure variables. – Local rank
N t l t i t– Natural constraints. • Non-iterative and iterative resolution methods
d l ithand algorithms. • Multivariate Curve Resolution using Alternating
L t S MCR ALSLeast Squares, MCR-ALS. • Examples of application.
Unique resolution conditions
Second possibility: using local rank information
What is local rank?
Local rank is the rank of reduced data regions in any of the two orders of the original data matrix
It can be obtained by Evolving Factor Analysisderived methods (EFA FSMW-EFA )derived methods (EFA, FSMW EFA, ...)
Conditions for unique solutions (unique resolution, uniqueness) based using local rank information have been described as: Resolution TheoremsRolf Manne, On the resolution problem in hyphenated chromatography. Chemometrics and Intelligent Laboratory Systems, 1995, 27, 89-94
Resolution Theorems
Theorem 1: If all interfering compounds that appear inside the concentration window of a given analyte also appearthe concentration window of a given analyte also appear
outside this window, it is possible to calculate without ambiguities the concentration profile of the analyte
[ ] ⎥⎤
⎢⎡ ∑ TTTTVVID )([ ] ⎥
⎦⎢⎣
−=− ∑m
Tmm
Ta
Taa
T vvsscVVID )(
V matrix defines the vector subspace where the analyte is notV matrix defines the vector subspace where the analyte is not present and all the interferents are present. V matrix can be found by PCA (loadings) of the submatrix where the analyte is not present!
1x 10 -5
lResolution Theorems
0.6
0.7
0.8
0.9 analyte
interference
0 2
0.3
0.4
0.5
0.6
interference
interference
0 10 20 30 40 50 600
0.1
0.2
1111111222222222111222222211111111
This local rank information can be obtained from submatrix analysis (EFA, EFF)
11111112222222221112222222111111111111111 ------------ 111---------- 11111111
y ( , )Matrix VT may be obtained from PCA of the regions where the analyteis not present
Thi i kn 1
T T T T−⎡ ⎤∑This is a rank
one matrix!T T T T
a a a m mm 1
D(I VV ) c s (s v )v=
⎡ ⎤− = −⎢ ⎥⎣ ⎦∑
concentration profile of analyte ca may be resolved from D and VT
Resolution Theorems
Theorem 2: If for every interference the concentration window of the analyte has a subwindow where the interference is absent, y ,
then it is possible to calculate the spectrum of the analyte
x 10 -5
0.8
0.9
1
analyteinterference 1
0.3
0.4
0.5
0.6
0.7
interference 2
region where interference 2 region where interference 10 10 20 30 40 50 60
0
0.1
0.2
region where interference 2is not present
region where interference 1is not present
Local rankinformation
Resolution Theorems
Theorem 3. For a resolution based only upon rank information in the chromatographic direction the conditions of Theorems 1
and 2 are not only sufficient but also necessary conditions
Resolution based on local rank conditions1.5
x 10 -5
1.6
1.8
2x 10
-5
Resolution based on local rank conditions
0.5
1
0.6
0.8
1
1.2
1.4
0 10 20 30 40 50 600
0 10 20 30 40 50 600
0.2
0.4
thi tthis system canbe totally resolvedusing local rank
this system cannotbe totally resolved(only partially) based
information!!!( y p y)only in local rankinformation
Unique resolution conditions?
1.8
2x 10
-5
In the case ofembedded peaks,
1.4
1.6
embedded peaks,resolution conditions based
0.8
1
1.2
on local rank arenot fulfilled!
0.4
0.6
resolution withoutambiguities will be
0 10 20 30 40 50 60
0
0.2ambiguities will bedifficult when a singlematrix is analyzed
Conclusions about unique resolutionconditions based on local rank analysis
I d t h t l ti f th t d t
conditions based on local rank analysis
In order to have a correct resolution of the system and to apply resolution theorems it is very important to have:
1) an accurate detection of local rank information EFA based methods
2) This local rank information can be introduced in the resolution process using either:
i i di l i h dnon-iterative direct resolution methodsiterative optimization methods
Resolution Theorems
•Resolution theorems can be used in the two matrix directions (modes/orders), in the chromatographic and in the spectral directionspectral direction.
•Resolution theorems can be easily extended to multiwayResolution theorems can be easily extended to multiway data and augmented data matrices (unfolded, matricized three-way data) Lecture 3
•Many resolution methods are implicitly based on these resolution theoremsresolution theorems
Lecture 2
R l ti f t d t• Resolution of two-way data. • Resolution conditions.
– Selective and pure variables– Local rank
N t l t i t– Natural constraints. • Non-iterative and iterative resolution methods
d l ithand algorithms. • Multivariate Curve Resolution using Alternating
L t S MCR ALSLeast Squares, MCR-ALS. • Examples of application.
Unique resolution conditions
Third possibility: using natural constraints
Natural constraints are previously known conditions p ythat the profile solutions should have. We know that certain solutions are not correct!
Even when non selective variables nor local rank resolutions conditions are present natural constraints canresolutions conditions are present, natural constraints can be applied. They reduce significantly the number of possible solutions (rotation ambiguity) p ( g y)
However, natural constraints alone, do not produce unique solutions in general
Natural constraints
• Non negativity:– species profiles in one or two orders are not
negative (concentration and spectra profiles)• Unimodality:
– some species profiles have only one maximum (i.e. concentration profiles)
Cl• Closure– the sum of species concentration is a known
t t l (i i ti b d tconstant value (i.e. in reaction based systems = mass balance equation)
Non-negativity
Cc0 35
C*0 3
Constrained profile(s) update
plain LS profile(s). 0.2
0.25
0.3
0.35
0.15
0.2
0.25
0.3
plain LS profile(s).
0.05
0.1
0.15
-0.05
0
0.05
0.1
0 10 20 30 40 500
Retention times0 10 20 30 40 50
-0.1
Retention times
Unimodality
C* Cc
0.25
0.3
0.35c
0.25
0.3
0.35
0.1
0.15
0.2
0.1
0.15
0.2
0 5 10 15 20 25 30 35 40 45 500
0.05
Retention times0 5 10 15 20 25 30 35 40 45 50
0
0.05
Retention times
Closure
Σ = ctotal
C*0 35 0 35
CcMass balance
0.25
0.3
0.35
c0.25
0.3
0.35 c
c
0.1
0.15
0.2 ctotal
0.1
0.15
0.2 ctotal
2 3 4 5 6 7 8 90
0.05
pH2 3 4 5 6 7 8 9
0
0.05
pH
Hard-modelling
C* CcPhysicochemical model
0.7
0.8
0.9
1
c
0.7
0.8
0.9
1
y
0.3
0.4
0.5
0.6
0 2
0.3
0.4
0.5
0.6
2 3 4 5 6 7 8 90
0.1
0.2
pH2 3 4 5 6 7 8 9
0
0.1
0.2
pH
Unique resolution conditionsUnique resolution conditions
Forth possibility: by multiway, multiset data analysisForth possibility: by multiway, multiset data analysis and matrix augmentation strategies (Lecture 3)
• A set of correlated data matrices of the same system obtained under different conditions are simultaneously analyzed (Matrix Augmentation)
• Factor Analysis ambiguities can be solved more easily for three-way data, specially for trilinear three-way data
Lecture 2
R l ti f t d t• Resolution of two-way data. • Resolution conditions.
– Selective and pure variables– Local rank
N t l t i t– Natural constraints. • Non-iterative and iterative resolution methods
d l ithand algorithms. • Multivariate Curve Resolution using Alternating
L t S MCR ALSLeast Squares, MCR-ALS. • Examples of application.
Multivariate Curve Resolution (MCR) methods
•Non-iterative resolution methodsNo e ve eso u o e odsRank Annihilation Evolving Factor Analysis (RAEFA)Window Factor Analysis (WFA)Heuristic Evolving Latent Projections (HELP)Subwindow Factor Analysis (SFA)GentleGentle.....
•Iterative resolution methodsIterative Factor Factor Analysis (ITF)Positive Matrix Factorization (PMF)Alternating Least Squares (ALS)…….
Non-iterative resolution methods are mostly based on d t ti d f l l k i f tidetection and use of local rank information
• Rank Annihilation by Evolving Factor Analysis (RAEFA, H.Gampp et al. Anal.Chim.Acta 193 (1987) 287)287)
• Non-iterative EFA (M.Maeder, Anal.Chem. 59 (1987) 527)527)
• Window Factor Analysis (WFA, E.R.Malinowski, J.Chemomet., 6 (1992) 29), ( ) )
• Heuristic Evolving Latent Projections (HELP, O.M.Kvalheim et al., Anal.Chem. 64 (1992) 936)( ) )
WFA method descriptionE R Malinowski J Chemomet 6 (1992) 29)E.R.Malinowski, J.Chemomet., 6 (1992) 29)
D = C ST = Σ cisTi i=1,...,n
1. Evaluate the window where the analyte n is present (EFA, EFF..)2 C t b t i Do d l ti th i d f th l t2. Create submatrix Do deleting the window of the analyte n3. Apply PCA to Do = Uo VTo = Σ uo
jvToj j=1,...,m, m==n-1
4 Spectra of the interferents are: s = Σ β vTo j=1 m4. Spectra of the interferents are: si = Σ βij vToj j=1,...m
5. Spectra of the analyte lie in the orthogonal subspace of VTo
6 Concentration of the analyte c can be calculated from:6. Concentration of the analyte cn can be calculated from:T o
nn n n n(I VV )D s c Dβ− = = Dn is a rank one matrixs o is part of the spectrum of thesn is part of the spectrum of theanalyte sn which is orthogonal tothe interference spectra cn and sn
o can be obtainedobtaineddirectly!! Like 1st Resolution Theorem!!!
Non-iterative resolution methods based on detection and f l l k i f ti
a) D
use of local rank information
a)EFA or EFF: conc. window nth
component= U
VT
Rank n
b)
= UoVTo
R k ( 1)
Do
c)Do
U Rank (n - 1)
T VToVT VTo
vnTo
∈ ⊥
d)
orthogonal
d)
= D
vno
cn
Non-iterative resolution methods based on detection and f l l k i f tiuse of local rank information
The main drawbacks of non-iterative resolution methods (like WFA) are:
a) the impossibility to solve data sets with non-sequential profiles (e.g., data sets q p ( g ,with embedded profiles)
b) h d ff f b d d fi i ib) the dangerous effects of a bad definition of concentration windows.
Non-iterative resolution methods based on detection and
Improving WFA has been the main goal of modifications of
use of local rank information
Improving WFA has been the main goal of modifications of this algorithm:
E.R. Malinowski, Automatic Window Factor Analysis. A more efficient method for determining concentration profiles from evolutionary spectra”. J. Chemometr. 10, 273-279 (1996).
Subwindow Factor Analysis (SFA) based on the systematic comparison of matrix windows sharing onesystematic comparison of matrix windows sharing one compound in common. R. Manne, H. Shen and Y. Liang. “Subwindow factor analysis”. Chemom. Intell. Lab. Sys., 45, 171-176 (1999).
Iterative resolution methods (third alternative!)
Iterative Target Factor Analysis, ITTFAP J G li J Ch I f C t S i 1984– P.J. Gemperline, J.Chem.Inf.Comput.Sci., 1984, 24, 206-12
– B G M Vandeginste et al Anal Chim Acta 1985B.G.M.Vandeginste et al., Anal.Chim.Acta 1985, 173, 253-264
Alternating Least Squares, ALS– R.Tauler, A.Izquierdo-Ridorsa and E.Casassas.
Ch t i d I t lli t L b tChemometrics and Intelligent Laboratory Systems, 1993, 18, 293-300.
– R Tauler A K Smilde and B R Kowalski J– R. Tauler, A.K. Smilde and B.R Kowalski. J. Chemometrics 1995, 9, 31-58.
– R.Tauler, Chemometrics and Intelligent , gLaboratory Systems, 1995, 30, 133-146.
Iterative Target Factor Analysisa)
x1in
xa) Geometrical representation
x1out
x2in
x2out
ITTFAof ITTFA from initialneedle targets x1in and x2in
b)
2out
1x1in x1ou
t
b) Evolution of the shapeof the two profiles through
tR
x
tR
x
the ITTFA process
x2ou
t
x2in
tR tRtR tR
ITTFA
Iterative resolution methods
Iterative Target Factor Analysis ITTFA
ITTFA gets each concentration profile following the stepsbelow:below:
1. Calculation of the score matrix by PCA.2. Use of an estimated concentration profile as initialtarget.3 Projection of the target onto the score space3. Projection of the target onto the score space.4. Constraint of the target projected.5. Projection of the constrained target.5. Projection of the constrained target.6. Go to 4 until convergence is achieved.
Lecture 2Lecture 2
R l ti f t d t• Resolution of two-way data. • Resolution conditions.
– Selective and pure variables– Local rank
N t l t i t– Natural constraints. • Non-iterative and iterative resolution methods
d l ithand algorithms. • Multivariate Curve Resolution using Alternating
L t S MCR ALSLeast Squares, MCR-ALS. • Examples of application.
Soft-modelling
MCR bilinear model for two way data:
N
d ∑J
ij in nj ijn 1
d c s e=
= +∑I
dij
TD CS E= +D
dij is the data measurement (response) of variable j in sample idij is the data measurement (response) of variable j in sample in=1,...,N are the number of components (species, sources...)cin is the concentration of component n in sample i;s is the response of component nsnj is the response of component nat variable j
Multivariate Curve Resolution (MCR)
Pure component informationMixed information
s1
t
λ
C
STsn
c nc 1
D
tR
CD
WavelengthsRetention times
Pure concentration profilesChemical model
Pure signalsCompound identity
Process evolutionCompound contribution
relative quantitation
Compound identitysource identification and Interpretation
An algorithm to solve Bilinear models using Multivariate Curve Resolution (MCR):
Alternating Least Squares (MCR-ALS)
C and ST are obtained by solving iteratively the two alternating LS equations:
TPCAC
SCDmin ˆˆˆˆ
− TPCA
SSCDmin
Tˆˆˆ −
• Optional constraints (local rank, non-negativity, unimodality,closure,…) are applied at each iteration• Initial estimates of C or S are obtained from EFA or from pure variable detection methods.
Multivariate Curve ResolutionAlternating Least SquaresAlternating Least Squares
ModelTD = CS +E
Model ˆ TPCAD = UV
ˆˆ T
PCAC constraintsmin D - CS
C,constraints
ˆ Ti D CS
Algorithm to findthe Solution
T
TPCAS ,constraints
min D - CS
Multivariate Curve Resolution Alternating Least SquaresAlternating Least Squares
(MCR-ALS)U t i d S l tiUnconstrained Solution
I iti l ti t f C• Initial estimates of C or S are obtained from EFA or from pure
TD=C S +EEFA or from pure variable detection methods
PCAˆT +1)S =C D
• Optional constraints are applied at each
PCA
ˆ T +
1) S C D
2)C=D (S )iteration ! PCA2) C D (S )C+ and (ST)+ are the pseudoinversesC and (S ) are the pseudoinverses
of C and ST respe ctively
Matrix pseudoinverses
C and ST are not square matrices. Their inverses are not defined
If they are full rank, i.e. the rank of C is equal to the number of its columns, and the rank of ST is equal to the number of its rows,The generalized inverse or pseudoinverse is defined:The generalized inverse or pseudoinverse is defined:
D = C ST D = C ST
CT D = CT C ST D S = C ST SC D = C C S D S = C S S(CT C)-1 CT D = (CT C)-1(CT C) ST D S (ST S)-1 = C (ST S) (ST S)-1
(CT C)-1 CT D = ST D S (ST S)-1 = CC+ D = ST D (ST)+ = CC+ D = ST D (ST)+ = CWhere C+ = (CT C)-1 CT Where (ST)+ = S (ST S)-1
C+ and (ST)+ are the pseudoinverses of C and ST respectively. They also provide the best least squares estimations of the overdetermined linear system of equations. If C and ST are not full rank, it is still possible to define their pseudoinverses using SVD
Flowchart of DMCR-ALS 1
PCA EFA FSMWEFApurest2
Constraints:N t l
N.components Local RankInitialeatimates
NaturalSelectivity
Local Rank Shape
34Shape
EqualityCorrelationHard model
ALS5
Hard model..........
STQuantitative E Fit and
DiagnosticsC
SQualitativeInformation
QuantitativeInformation
Diagnostics
Iterative resolution methods
Alternating Least Squares MCR-ALS
ALS optimizes concentration and spectra profiles using a constrained alternating least squares method. The main steps of the method are:
1 Calculation of the PCA reproduced data matrix1. Calculation of the PCA reproduced data matrix.2. Calculation of initial estimations of concentration or spectral profiles (e.g, using SIMPLISMA or EFA).spectral profiles (e.g, using SIMPLISMA or EFA).3. Alternating Least Squares
Iterative least squares constrained estimation of C or ST
Iterative least squares constrained estimation of ST or CTest convergence
4 Interpretation of results4. Interpretation of results
Flowchart of MCR-ALS
D C ST + E
Journal of Chemometrics, 1995, 9, 31-58; Chemomet.Intel. Lab. Systems, 1995, 30, 133-146Journal of Chemometrics, 2001, 15, 749-7; Analytica Chimica Acta, 2003, 500,195-210
Resolved
STD = C ST + E(bilinear model)
Data InitialSVDor ALS
ResolvedSpectraprofiles
olve
dtr
atio
nfi
les
Matrix Estimationor
PCA optimization
Res
oC
once
npr
of
E+
DC
Estimation of the number
of components
Initial estimation ALS optimization
D t t i
C
componentsCONSTRAINTS Results of the ALS optimization
procedure:Fit and Diagnostics
Data matrix decomposition according to a bilinear model
Tˆˆˆ TˆˆˆTPCAC
SCDmin ˆˆˆˆ
− TPCA
SSCDmin
Tˆˆˆ −
Until recentlyMCR-ALS input had to be typed in
Until recently the MATLAB command line
Troublesome and difficult in complex cases where several data matrices are simultaneously analyzed and/or different constraints are applied to each of them for an optimal resolution
A graphical user-friendlyNow
A graphical user friendly interface for MCR-ALS
J Jaumot R Gargallo A de Juan and R Tauler ChemometricsJ. Jaumot, R. Gargallo, A. de Juan and R. Tauler, Chemometrics and Intelligent Laboratory Systems, 2005, 76(1) 101-110
Multivariate Curve ResolutionHome PageHome Page
http://www.ub.es/gesq/mcr/mcr.htm
E l A l i f lti l i t A l i f 4Example. Analysis of multiple experiments. Analysis of 4 HPLC-DAD runs each of them containing four compounds
Alternating Least Squares Initial estimates
• from EFA derived methods (for evolving methods like chromatography titrations )chromatography, titrations...)
• from ‘pure’ variable (SIMPLISMA) detection methods (for non-evolving methods and/or for very poorly resolved g y p ysystems...)
• from individually and directly selected from the data using chemical reasoning (i.e first and last spectrum; isosbestic points, ....)
• from known profiles ...
Alternating Least Squares with constraints
• Natural constraints: non-negativity; unimodality, closureclosure,...
• Equality constraints: selectivity, zero concentration windows, known profiles..., p
• Optional Shape constraints (gaussian shapes, asymmetric shapes)
• Hard modeling constraints (rate law, equilibrium mass-action law...)
• ......................
How to implement constrained ALS optimization algorithms in optimal way from a least squares sense?optimal way from a least squares sense?
Considerations:
How to implement these algorithms in a way that all the constraints be fulfilled simultaneously at the same timeconstraints be fulfilled simultaneously at the same time(in every least squares step - in one LS shot- of the optimization)?
Updating (substitution) methods do work well most of the times! Why? Because the optimal solutions which better fit the data (apart from noise and degrees of freedom) do also fulfill the constraints of the system
Constraints are used to lead the optimization in the right direction within feasible band solutions. .
Implementation of constraintsNon negativity constraints caseNon-negativity constraints case
a) forcing values during iteration (e g negative values to zero)a) forcing values during iteration (e.g negative values to zero)intuitivefasteasy to implementit can be used individually for each profile independentlyless efficient
b) using non-negative rigurous least squares optimization proceures:more statistically efficientmore statistically efficientmore efficientmore difficult to implementpit has to be used to all profiles simultaneouslydifferent approaches (penalty functions, constrainedoptimization, elimination...
How to implement constrained ALS optimization algorithms in optimal way from a least squaresalgorithms in optimal way from a least squares
sense?
Different rigorous least-squares approaches have been proposed
- Non-negative least squares methods (Lawson CL, Hanson RJ. Solving Least Squares Problems.Prentice-Hall: 1974; Bro R, de Jong S. J. Chemometrics 1997; 11: 393–40; Mark H.Van Benthem and Michael R.Keenan, Journal of Chemometrics, 18, 441-450; ...)
- Unimodal least-squares approaches (R.Bro, N.D.Sidiropoulus, J.of Chemometrics, 1998, 12, 223-247)
- Equality constraints (Van Benthem M, Keenan M, Haaland D. J. Chemometrics 2002; 16, 613–622....)
- Use of penalty terms in the objective functions to optimize
- Non-linear optimization with non-linear constraints (PMF, Multilinear Engine, sequential quadratic programming.....
Active non negativity constraints:Checking active constraints:Are still active the constraints at the optimum ALS solution?
Active non-negativity constraints:C matrix
r c value19 1 4 1408e 003
Checking active constraints:ALS solutions DPCA, CALS, SALS
New unconstrained solutions 19 1 -4.1408e-00321 1 -3.2580e-00323 1 -1.8209e-00324 1 -3.3004e-0031 2 1 1663e 002
Cunc = DPCA (STALS)+
STunc = (CALS)+ DPCA
Deviationsare small!!!1 2 -1.1663e-002
2 2 -2.1166e-0023 2 -2.1081e-0024 2 -3.8524e-00325 2 1 9865e 0031
1.5
2c1 alsc2 alsc3 alsc1 unc
25 2 -1.9865e-00326 2 -1.3210e-0037 3 -5.9754e-0038 3 -5.5289e-004
0 5
0
0.5
1 c2 uncc3 unc
ST matrixEmpty matrix: 0-by-3
0 5 10 15 20 25-0.5
0.4
0.5s1 alss2 alss3 als
0.1
0.2
0.3s1 uncs2 uncs3 unc
Proposal: Check ALS solutions for active
constraints and if0 5 10 15 20 25 30 35 40 45 50
0constraints and if deviations are large!
Implementation of unimodality constraints
‘vertical’ unimodality: forcing non-unimodal parts of the profile to zero p p
‘horizontal’ unimodality: forzing non-unimodal parts of the profile to be equal to the last unimodal value
‘average’ unimodality: forcing non-unimodal parts of the profile to p pbe an average between the two extreme values being still unimodal
using momotone regressionproceduresprocedures
Implementation of closure/ /normalization constraintsconstraints
Equality constraints:Closure constraintsexperimental point i 3 concn profilesexperimental point i, 3 concn profilesci1 + ci2 + ci3 = tici1r1+ci2r2+ci3r3 = ti
• • Σ•= t
closure
.
i1 1 i2 2 i3 3 iC r = tr = C+ t These are equality
constraints!Normalization constraintsmax(s) = 1, spectra maximummax(c) = 1 peak maximum
constraints!
max(c) = 1, peak maximum||(s)|| = 1, area, length,................................
Implementation of selectivity/local rank constraints
⎞⎛ 00
constraintsUsing a masking Csel or ST
sel matrix
⎟⎟⎟⎟⎞
⎜⎜⎜⎜⎛
xxxxx
x000
⎟⎟⎟⎟⎟
⎜⎜⎜⎜⎜
=xxx......selC
From local rank (EFA) setting some values to zero ⎟⎟
⎠⎜⎜
⎝ xxxx
00
⎥⎥⎤
⎢⎢⎡
kkkkkkxxxxxxxx ...
TS
i i k
⎥⎥⎥
⎦⎢⎢⎢
⎣
=xxxxxxkkkkkk
...
...selS
Fixing a kown spectrum
Solving intensity ambiguitiesSolving intensity ambiguities in MCR-ALS
d ∑ ∑ k1
d c s c sij in nj
n
in nj
n
= =∑ ∑ kk
k is arbitrary. How to find the right one?In the simultaneous analysis of multiple data matricesintensity/scale ambiguities can be solved y ga) in relative terms (directly)b) in absolute terms using external knowledge ) g g
Two-way dataMCR-ALS for quantitative determinations
T l t 2008 74 1201 10
DSTALS
C
Talanta, 2008, 74, 1201-10
D
S l tUpdated
CConcentration
correlationSelectrefc
b b
correlationconstraint
(multivariatecalALSc cal
ALSc calc calcb, b0
(multivariatecalibration)
Local model
predcpredALSc
calALSref cc −
Errorbcbc 0calALSf ++=
cALS pred
c
b, b0 dˆ
Errorbcbc 0ALSref ++
predALSc predc
0predALS
pred bcbc +=ˆ
V lid ti f th tit tiValidation of the quantitative determination: spectrophotometric analysis ofspectrophotometric analysis of nucleic bases mixtures
Protein and moisture determination in agricultural samples (ray-grass) by PLSR and MCR-ALS
Talanta 2008 74 1201 10Talanta, 2008, 74, 1201-10
RMSEP SEP Bias Correlation RE (%)
ALS PLS ALS PLS ALS PLS ALS PLS ALS PLS
HUM 0.312 0.249 0.315 0.248 7.30 e-4 4.50 e-2 0.9755 0.986 3.70 2.96
PB 0.782 0.564 0.788 0.571 7.35 e-2 3.31 e-2 0.9860 0.993 4.65 3.67
Soft-Hard modellingA B C X
0.6
0.7
0.8
0.9
1n
(a.u
.)
A C
0 6
0.7
0.8
0.9
1
(a.u
.)
A B C XA C
0.1
0.2
0.3
0.4
0.5
Con
cent
ratio
n
B X
0 1
0.2
0.3
0.4
0.5
0.6
Con
cent
ratio
n (
B X
C C0 1 2 3 4 5 6 7 8 9 10
0
Time 0 1 2 3 4 5 6 7 8 9 100
0.1
TimeCSM CHM
N li d lNon-linear model fitting
min(CHM - CSM)
• All or some of the concentration profiles can be
CHM = f(k1, k2)
• All or some of the concentration profiles can be constrained.
• All or some of the batches can be constrained.All or some of the batches can be constrained.
Implementation of hard modelling and shape constraintsconstraints
min ||D –C ST||ALS (D ST) CD = C ST ALS (D,ST) → CALS (D,C) → ST
k3 C
k2 B
k1 A D
C
rate
Csoft Csoft/hard
Ordinary differential equations Integration
Law
d[A]dt
= -k1 [A]
d[B]dt
= k1 [A]- k2 [B]
[A]= [A]0 e-kt
[B]= [A]0 k1
k1 - k2 (e-k1t - e-k2t )
dt……………….. …………….……………….. …………….
Quality of MCR SolutionsRotational AmbiguitiesRotational Ambiguities
Factor Analysis (PCA) Data Matrix DecompositionD U VT ED = U VT + E
‘True’ Data Matrix DecompositionD = C ST + ED = C S + E
D = U T T-1 VT + E = C ST + EC = U T; ST = T-1 VT
How to find the rotation matrix T?Matrix decomposition is not unique!Matrix decomposition is not unique!
T(N,N) is any non-singular matrixThere is rotational freedom for T
It is possible to define bands and límits for the feasible solutions
(Tmax y Tmin)?1) What are the variables of the problem?
T (rotation matrix),
•0 1•0.2•0.3•0.4•0.5
max min
HowTmax and Tmin
( ),D = C T T-1 ST
2) What is the objective function f(T) to•0 •5 •10 •15 •20 •25 •30 •35 •40 •45 •50•0
0.1
•1
•1.5
max mincan becalculated from the
t i t
optimize?
For every species i = 1,..,ns
•0 •5 •10 •15 •20 •25 •30 •35 •40•0
•0.5constraintsof the system
ij ijjcs
f( ) f( )i ics ∑T TConstrained Non-Linear Optimization
Problem (NCP)
jori iT
ij iji,j
f( ) f( )csC S
i i= =∑
T T
Find T which makes: min/max f(T)under ge(T) = 0and gi(T) ≤ 0 f(T) is a scalar value between 0 and 1!
where T is the matrix of variables, f(T) is a scalar non-linear functin of T and g(T) is the vector of non-linear constraints
This function gives the relative contribution of species i compared to h l b l d i l!
Matlab Optimizarion Toolbox fmincon functionthe global measured signal!
3) What are the constraints g(T)?
Optimization algorithmR.Tauler. Journal of Chemometrics, 2001, 15, 627-646 3) What are the constraints g(T)?
The following constraints are considerednormalization/closure gnorm/gclosnon-negativity gcneg/gsneg
Initial estimations of CALS and SALS
profiles are obtained by MCR-ALST=eye(number of species)known values/selectivity gknown/gsel
unimodality gunimtrilinearity (three-way data) gtril For each species define objective function
f(T) ( (T) (T)) ( T / T)
T eye(number of species)
Are they equality or inequality constraints?
4) What are the initial estimations of C and ST?•Initial estimaciones of C y ST are obtained by MCR- Select constraints g(T):
f(T)=norm(c(T)s(T))=norm(cALS T sALS / T)
Initial estimaciones of C y S are obtained by MCRALS•Initial estimations should fulfill the constraints of the system (non-negativity, uunimodality, closure,
g( )equality ge: normalization/closure, known values,
inequality gi: non-negartivity, selectivity, unimodality, trilinearity,
selectivity, local rank ,…)5) What are the initial values of T?
•NCP depends on initial values of T! (local minima, convergence speed )
Find Tmin which gives a minimumof f(T)
under constraints gi(T)<0, ge(T)=0
Find Tmax which gives a maximumof f(T)
under constraints gi(T)<0. ge(T)=0convergence, speed …)
⎟⎟⎞
⎜⎜⎛
0...100...01 Built minimum band
cmin = cALS / Tmin
Built maximum bandcmax = cALS / Tmax
⎟⎟⎟⎟
⎠⎜⎜⎜⎜
⎝ 1...00............
Tini = eye(N) =c c /smin = sALS / Tmin
c c /smax=sALS / Tmax
0.9 3
0.7
0.8
1 5
2
2.5
0.6
0.7
0.5
1
1.5
0.4
0.50 10 20 30 40 50 60
0
0.3
3
4x 104
0.1
0.2
2
3
0 1
0
0
1
0 10 20 30 40 50 60-0.1
0 20 40 60 80 1000
Calculation of feasible bands in the resolution of a single chromatographic run (run 1)
Applied constraints were spectra and elution profiles non-negativity and spectra normalization:
4 4 0.6 0.6
elution profiles spectra profiles
1
2
3
1
2
3
0.2
0.4
0.2
0.4
0 20 40 600
0 20 40 600
4 4
0 10 20 30 400
0 10 20 30 400
0.6 0.6
1
2
3
1
2
3
0.2
0.4
0.2
0.4
0 20 40 600
0 20 40 600
0 10 20 30 400
0 10 20 30 400
Calculation of feasible bands in the resolution of a single chromatographic run (run 1)
Applied constraints were spectra and elution profiles non-negativity, spectra normalization:, and unimodality
1 2
1.4
1.6
0.8
1
1.2
0.4
0.6
unimodality
0 10 20 30 40 50 600
0.2unimodality
no unimodality
Calculation of feasible bands in the resolution of a single chromatographic run (run 1)
Applied constraints were spectra and elution profiles non-negativity, spectra normalization:, and selectivity/local rank
(31 51 45 51 1 8 1 15)(31-51, 45-51, 1-8,1-15)
0.4
0.5
0.4
0.53
3
4
0
0.1
0.2
0.3
0
0.1
0.2
0.3
0
1
2
0
1
2
3
0 10 20 30 400
0.4
0.5
0.4
0.5
0 10 20 30 400
0 20 40 600
0 20 40 600
2
3
2
3
0 10 20 30 400
0.1
0.2
0.3
0 10 20 30 400
0.1
0.2
0.3
0
1
2
0
1
2
0 10 20 30 400 10 20 30 400 20 40 60 0 20 40 60
Evaluation of boundaries of feasible bands:
• W H Lawton and E A Sylvestre Technometrics 1971 13 617-
Previous studies• W.H.Lawton and E.A.Sylvestre, Technometrics, 1971, 13, 617-633•O.S.Borgen and B.R.Kowalski, Anal. Chim. Acta, 1985, 174, 1-g26•K.Kasaki, S.Kawata, S.Minami, Appl. Opt., 1983 (22), 3599-3603R C H d B M Ki (Ch d I ll L b S•R.C.Henry and B.M.Kim (Chemomet. and Intell. Lab. Syst.,
1990, 8, 205-216)•P D Wentzell J-H Wang L F Loucks and K M MillerP.D.Wentzell, J-H. Wang, L.F.Loucks and K.M.Miller (Can.J.Chem. 76, 1144-1155 (1998))•P. Gemperline (Analytical Chemistry, 1999, 71, 5398-5404)p ( y y )•R.Tauler (J.of Chemometrics 2001, 15, 627-46)•M.Legger and P.D.Wentzell, Chemomet and Intell. Lab. Syst., gg , y ,2002, 171-188
Quality of MCR resultsQuality of MCR resultsError propagation and resampling methods
•How experimental error/noise in the input data t i ff t MCR ALS lt ?matrices affects MCR-ALS results?
•For ALS calc lations there is no kno n•For ALS calculations there is no known analytical formula to calculate error estimations. (i e like in linear lesast squares regressions)(i.e. like in linear lesast-squares regressions)
•Bootstrap estimations using resampling methods•Bootstrap estimations using resampling methods is attempted
MCRMCR--ALS: Quality AssessmentALS: Quality AssessmentPropagation of experimental noise into the MCR-ALS solutionsPropagation of experimental noise into the MCR ALS solutions
Experimental noise is propagated into the MCR-ALS solutions andcauses uncertainties in the obtained results.
To estimate these uncertainties for non-linear models like MCR-ALS computer intensive resampling methods can be used
Noise added
(J. of Chemometrics, 2004, 18, 327–340; J.Chemometrics, 2006, 20, 4-67)Mean, max and min profiles Confidence range profiles
Error PropagationParameter Confidence Range
Real 0.1 % 1 % 2 % 5 %
pk1 pk2 pk1 pk2 pk1 pk2 pk1 pk2 pk1 pk2
Theoretical Value Value 3.6660
4.9244 - - - - - - - -
MonteCarlo Simulations
Value - - 3.666 4.924 3.669 4.926 3.676 4.917 3.976 5.074
Stand.dev.
- - 0.001 0.001 0.0065 0.012 0.012 0.024 0.434 0.759
Noise Addition
Value - - 3.654 4.922 3.659 4.913 3.665 4.910 4.075 5.330
StandStand.dev.
- - 0.001 0.002 0.006 0.026 0.010 0.040 0.487 1.122
Value - - 3.655 4.920 3.660 4.913 3.667 4.913 4.082 5.329
JackKnifeStand.
dev.- - 0.004 0.003 0.009 0.024 0.012 0.047 0.514 1.091
Maximum Likelihood MCR-ALS solutions2 2Q Q∂ ∂2 2
T
Q Q= 0, = 0S C∂ ∂
= −∂ ∂
TALSÁLS
ˆ ˆQ D C S ,
Without including t i ti
Including uncertaintiesσ
2ˆ( )m n d d−
uncertainties σi,j
2 2, ,
1 1
ˆ( )m n
i j i ji j
Q d d= =
= −∑∑, ,2
21 1 ,
( )i j i j
i j i j
d dQ
σ= =
= ∑∑Unconstrained WALS solution1 1i j= =
,1−Σ= iiW rows or { }jiσ=ΣUnconstrained ALS solution
Unconstrained WALS solution
T T -1 +PCA PCA
T 1 T +
S = (C C) CD = C Dˆ ˆ
ˆ ˆT -1c(i :)=d(i :)WS(S WS)
,1−Σ= jjW columns{ }ji,σΣ
T -1 T +PCA PCAC = D S(S S) = D (S ) i i
T T -1 Tj j
c(i,:)=d(i,:)WS(S WS)s (:,j)=(C W C) C W d(:,j);
MCR-ALS results quality assesment
Data Fitting n
1i
m
1j2
ji, xxee
lof −==∑ ∑= =100
- lof % ji,ji,ji,n
1i
m
1j2
ji,
xxe ,x
lof ==∑ ∑= =
100
∑ ∑∑ ∑ n mn m
- R %∑ ∑
∑ ∑∑ ∑ = == =−
= n m
n
i
m
j jin
i
m
j ji
x
exR
2
1 12
,1 12
,2 100
Profiles recovery∑ ∑= =i j jix
1 1 ,
yxT
αcos2r- r2 (similarity)
l
yx== αcosr
- recovery angles measured by the inverse cosine α, expressed in hexadecimal degrees )(cos 2rda=αr2 1 0.99 0.95 0.90 0.80 0.70 0.60 0.50 0.40 0.30 0.20 0.10 0.00α 0 8.1 18 26 37 46 53 60 66 72 78 84 90
300
350
10
15
350
400
Y + E = X
150
200
250
0
5
150
200
250
300
lof (%) = 14%R2 98.0%mean(S/N)=21 7
0
50
100
15
-10
-5
0
50
100
150 mean(S/N)=21.7
0 10 20 300
0 10 20 30-15
0 10 20 30
Noise structure:r = 0.01*max(max(Y)) = 3.21
HOMOCEDASTICNOISE CASE
600
700
0.7
0.8
( ( ))S = I .* rE = S .* N(0,1)
SVD
NOISE CASE
300
400
500
0.4
0.5
0.6
700
800
900
36
38
40
700
800
900 818.1348 9
815.2346 6
SVDY E X
100
200
0.1
0.2
0.3
0 5 100
100
200
300
400
500
600
0 5 1026
28
30
32
34
0 5 100
100
200
300
400
500
600 348.9112.966.137.0
346.6104.162.90.0
0 5 10 15 20 25 300
0 10 20 30 40 500
G FT
37.039.436.6
0.0
0.7
0.8
0.6
0.7
Red max and min bandsBlue ‘true’ FT
0.4
0.5
0.6
0.4
0.5
f
Blue true F+ from ‘true’ * from pure
0.2
0.3
0.2
0.3
f1f2
0 5 10 15 20 25 30 35 40 45 500
0.1
0 5 10 15 20 25 30 35 40 45 500
0.1
0.6
0.7
0.5
0.6
f3 f4
0.4
0.5
0.3
0.4
0.2
0.3
0 1
0.2
0 5 10 15 20 25 30 35 40 45 500
0.1
0 5 10 15 20 25 30 35 40 45 500
0.1
300
350
120
140
Red max and min bandsBlue ‘true’ G
200
250
80
100
Blue true G+ from ‘true’ * from ‘pure’
100
150
40
60
g1g2
0 5 10 15 20 25 300
50
0 5 10 15 20 25 300
20
100
120
600
700
60
80
400
500g3 g4
20
40
200
300
0 5 10 15 20 25 300
0 5 10 15 20 25 300
100
No noise and homocedastic noise cases resultsrecovery angles
System init method lof % R2% f1 f2 f3 f4g1 g2 g3 g4
recovery angles α
g1 g2 g3 g4
No noise true ALS 0 100 0 0 0 00 0 0 00 0 0 0
No noise purest ALS 0 100 1.8 11 7.9 5.05.9 9.1 13 2.8
max band - Bands 0 100 3.1 13 7.5 5.58.2 18 10 1.7
min band - Bands 0 100 2.1 3.7 3.9 3.95.2 8.1 14 3.0
Homo noise true ALS 12.6 98.4 3.0 12 8.7 2.14.8 12 9.0 2.4
Homo noise purest ALS 12.6 98.4 3.0 17 8.5 5.07.1 12 16 3.7
H i Th 14 0 98 0Homo noise ----- Theor 14.0 98.0 ---- ---- ---- ----Homo noise ----- PCA 12.6 98.4 ---- ---- ---- ----
250
300
350
250
300
350
10
15
Y + E = X
150
200
250
100
150
200
250
-5
0
5 lof (%) = 12, 25, 44%R2 99, 94, 80%mean(S/N) = 17, 10, 3
0 10 20 300
50
100
0 10 20 30
0
50
100
0 10 20 30-15
-10
Noise structure:r = 5, 10, 20S = r * R(0 1) (interv 0 1)
random numbersHETEROCEDASTICNOISE CASE
Low, Medium, High
500
600
700
0.6
0.7
0.8S = r. R(0,1) (interv 0-1)E = S.* N(0,1)
L M HSVD
Y E X
NormalDistributed
, , g
200
300
400
0 2
0.3
0.4
0.5 L M H814 829 823348 340 347111 118 154
815347104
Y E X
500
600
700
800
900
120
130
140
150
500
600
700
800
900
0 5 10 15 20 25 300
100
0 10 20 30 40 500
0.1
0.2
G FT
111 118 15467 82 13533 64 130
L M H
104630 0 5 10
0
100
200
300
400
0 5 1090
100
110
0 5 100
100
200
300
400
G FT L M H36 71 14534 69 134
>>
0.7
0.8
0.7
0.8
Red max and min bandsBlue ‘true’ FT
0.4
0.5
0.6
0.4
0.5
0.6Blue true F+ from ‘true’ * from pure• No Weighting
0.2
0.3
0.2
0.3
g g
0 5 10 15 20 25 30 35 40 45 500
0.1
0 5 10 15 20 25 30 35 40 45 500
0.1
0.6
0.7
0.5
0.6
0.4
0.5
0.3
0.4
0.2
0.3
0.1
0.2
0 5 10 15 20 25 30 35 40 45 500
0.1
0 5 10 15 20 25 30 35 40 45 50-0.1
0
0.7
0.8
0.6
0.7
Red max and min bandsBlue ‘true’ FT
0.4
0.5
0.6
0.4
0.5
Blue true F+ from ‘true’ * from pure • weighting
0.2
0.3
0.2
0.3
g g
0 5 10 15 20 25 30 35 40 45 500
0.1
0 5 10 15 20 25 30 35 40 45 500
0.1
0.6
0.7
0.5
0.6
weighting
0.4
0.5
0.3
0.4improvesrecoveries
0.2
0.3
0.1
0.2
0 5 10 15 20 25 30 35 40 45 500
0.1
0 5 10 15 20 25 30 35 40 45 50-0.1
0
300
350
120
140
Red max and min bandsBlue ‘true’ G
150
200
250
60
80
100Blue true G+ from ‘true’ * from pure• no weighting
50
100
20
40
g g
0 5 10 15 20 25 30-50
0
0 5 10 15 20 25 30-20
0
120
140
600
700
60
80
100
400
500
20
40
200
300
0 5 10 15 20 25 30-20
0
0 5 10 15 20 25 300
100
300
350
140
160
180
Red max and min bandsBlue ‘true’ G
150
200
250
80
100
120
Blue true G+ from ‘true’ * from pure• weighting
50
100
20
40
60
g g
0 5 10 15 20 25 30-50
0
0 5 10 15 20 25 30-20
0
weightingrecovery
120
140
160
700
800
recoveryoverallimprovement
80
100
120
400
500
600
20
40
60
200
300
0 5 10 15 20 25 30-20
0
0 5 10 15 20 25 300
100
System init w lof % R2% f1 f2 f3 f4
Hoterocedastic noise case resultsrecovery angles α
(Case) exp exp g1 g2 g3 g4Hetero noise purest ALS 10.7 98.8 3.1 14 9.0 3.8
(low) 7.0 10 15 4.3Hetero noise purest WALS 12.0 98.6 2.6 12 15 4.3
(low) 7.8 15 15 3.7Theoretical ---- ---- 12.0 98.6 ---- ---- ---- ----PCA 10 7 98 8PCA ---- ---- 10.7 98.8 ---- ---- ---- ----
Hetero noise purest ALS 22.3 95.0 7.7 22 22 5.7(medium) 7 2 21 24 4 5(medium) 7.2 21 24 4.5
Hetero noise purest WALS 24.0 94.2 6.6 22 18 5.7(medium 7.4 14 17 5.5
Theoretical 25 0 93 6Theoretical ---- ---- 25.0 93.6 ---- ---- ---- ----PCA ---- ---- 22.0 95.1 ---- ---- ---- ----
Hetero noise purest ALS 40 0 84 0 12 33 38 10Hetero noise purest ALS 40.0 84.0 12 33 38 10(high) 15 38 34 9.0
Hetero noise purest WALS 43.1 81.4 12 26 25 6.0(high) 5.0 27 16 3.0(high) 5.0 27 16 3.0
Theoretical ---- ---- 44.2 80.4 ---- ---- ---- ----PCA ---- ---- 40.8 83.4 ---- ---- ---- ----
Lecture 2
R l ti f t d t• Resolution of two-way data. • Resolution conditions.
– Selective and pure variables– Local rank
N t l t i t– Natural constraints. • Non-iterative and iterative resolution methods
d l ithand algorithms. • Multivariate Curve Resolution using Alternating
L t S MCR ALSLeast Squares, MCR-ALS. • Examples of application.
Spectrometric titrations: An easy way for the generation of two- and three-way data in the study of chemical reactions and interactions
Peristaltic
y y
pump
SpectrophotometerComputer
0.050 mlAutoburettePrinter
Stirrer
T=37oC-125.3
pHmeter
Thermostatic bath
0.4
Three spectrometric titrations of a complexation system at different ligand to metal ratios R
0.2
0.3 R=1.5
400 450 500 550 600 650 700 750 800 850 9000
0.1
0.3
0.4
0.5
R=2
400 450 500 550 600 650 700 750 800 850 9000
0.1
0.2
400 450 500 550 600 650 700 750 800 850 900
0.4
0.5
R=3
0.1
0.2
0.33
400 450 500 550 600 650 700 750 800 850 9000
nm
100
MCR-ALS resolved concentration profiles at R=1.5
80
90
Simoultaneous
70
80 Simoultaneousresolution andtheoretical
50
60 Individualresolution
30
40
20
30
3 4 5 6 7 8 90
10
pH
100
MCR-ALS resolved concentration profiles at R=2.0
90
100
Individualresolution
70
80 Simoultaneousresolution andtheoretical
50
60
30
40
10
20
3 4 5 6 7 8 90
10
pH
100
MCR-ALS resolved concentration profiles at R=3.0
90Simoultaneousresolution andtheoretical Individual
70
80Individualresolution
50
60
40
20
30
0
10
3 4 5 6 7 8 9pH
45
MCR-ALS resolved spectra profiles
40
Simoultaneous
30
35 resolution andtheoretical
25
Individual
15
20d v dua
resolutionat R=1.5
10
400 450 500 550 600 650 700 750 800 850 9000
5
400 450 500 550 600 650 700 750 800 850 900nm
Process analysis 4x 10
-4Process analysis0
2
tive
0.08
0.09
0.1
-6
-4
-2
sign
al s
econ
d de
rivat
2nd derivative
0 04
0.05
0.06
0.07
IR a
bsor
banc
e
0 10 20 30 40 50 60 70-10
-8
spectra channel
0.01
0.02
0.03
0.04
0
2
4x 10
-4
0 10 20 30 40 50 60 70spectra channel
One process IR run (raw data) -6
-4
-2
sign
al s
econ
d de
rivat
ive
2nd derivativeand PCA(3 PCs)
0 10 20 30 40 50 60 70-10
-8
spectra channel
(3 PCs)
R.Tauler, B.Kowalski and S.Fleming Anal. Chem., 65 (1993) 2040-47
0.357
ALS resolved pure IR spectra profiles
0.2
0.25
0.3
tratio
n, a
.u.5
6
2
0.05
0.1
0.15
conc
ent
3
4
abso
rban
ce, a
.u.
0 20 40 60 80 100 120 1400
time
EFA of 2nd derivative data: 0
1
2
1
3
initial estimation of process profilesfor 3 components
0 10 20 30 40 50 60 700
spectra channel
0.25
0.15
0.2
a.u.
3
13
3
0.1conc
entra
tion,
a
11
1
1ALS resolved pure concetration profilesin the simultaneous analysis of eigth
runs of the process
0 100 200 300 400 500 600 700 8000
0.05
time
1 1
1
11
1
2
2
33 2
2222
Melting 1Melting 20.91
atio
nStudy of conformational
0 40.50.60.70.8
e co
ncen
tra poly(A)-poly(U) ds
poly(U) rc
equilibria of polynucleotides
20 30 40 50 60 70 80 9000.10.20.30.4
Rel
ativ
e
poly(A)-poly(U)-poly(U) tspoly(A) cs
poly(A) rc
0 2
20 30 40 50 60 70 80 900Temperature (oC)
poly(A) poly(U)
00.050.10.150.2
00.050.10.150.2 rc
ss
poly(adenylic)-poly(uridylic) acid systemlti d t 0 1
0.150.2
0 10.150.2
24026028030002402602803000
R.Tauler, R.Gargallo, M.Vives and A I i d Rid
Melting data2402602803000
0.050.1
24026028030000.050.1
poly(A)-poly(U) ds poly(A) poly(U) poly(U) tA.Izquierdo-RidorsaChemometrics and Intelligent Lab
Systems, 1998
poly(A) poly(U) ds poly(A)-poly(U)-poly(U) t
1source contribution profiles using
0.5
0 5 10 15 20 250
1
0.5
1
0 5 10 15 20 250
0 5
1
0 5 10 15 20 250
0.5
0 5 10 15 20 25
6resolved composition profiles using nnls
2
4
0 20 40 60 80 1000
2
6
2
4
6
0 20 40 60 80 1000
2
6
4
6
0 20 40 60 80 1000
2
Historical Evolution of Multivariate Curve R l ti M th dResolution Methods
• Extension to more than two components• Extension to more than two components • Target Factor Analysis and Iterative Target Factor Analysis Methods• Local Rank Detection, Evolving Factor Analysis, Window Factor Analysis. • Rank Annihilation derived methods• Detection and selection of pure (selective) variables based methods• Alternating Least Squares methods, 1992• Implementation of soft modelling constraints (non-negativity, unimodality, closure,
selectivity local rank ) 1993selectivity, local rank,…) 1993• Extension to higher order data, multiway methods (extension of bilinear models to
augmented data matrices), 1993-5• Trilinear (PARAFAC) models, 1997• Implementation of hard modelling constraints 1997• Implementation of hard-modelling constraints, 1997• Breaking rank deficiencies by matrix augmentation, 1998• Calculation of feasible bands, 2001• Noise propagation,2002p p g• Tucker models, 2005• Weighted Alternating Least Squares method (Maximum Likelihood),2006• …