Pre-whitening of data by covariance-weighted pre-processing
Harald Martens¹*, Martin Høy², Barry M. Wise³, Rasmus Bro¹ and Per B. Brockhoff⁴
¹Department of Food and Dairy Science, Royal Veterinary and Agricultural University, DK-1958 Frederiksberg C, Denmark
²Institute of Chemistry, Norwegian University of Science and Technology, N-7491 Trondheim, Norway
³Eigenvector Research Inc., Manson, WA, USA
⁴Department of Mathematics and Physics, Royal Veterinary and Agricultural University, DK-1871 Frederiksberg C, Denmark
Received 7 May 2001; Revised 9 September 2002; Accepted 22 November 2002
A data pre-processing method is presented for multichannel `spectra' from process spectrophotometers and other multichannel instruments. It may be seen as a `pre-whitening' of the spectra, and serves to make the instrument `blind' to certain interferants while retaining its analyte sensitivity. Thereby the instrument selectivity may be improved already prior to multivariate calibration. The result is a reduced need for process perturbation or sample spiking just to generate calibration samples that span the unwanted interferants. The method consists of shrinking the multidimensional data space of the spectra in the off-axis dimensions corresponding to the spectra of these interferants. A `nuisance' covariance matrix S is first constructed, based on prior knowledge or estimates of the major interferants' spectra, and the scaling matrix \(G = S^{-1/2}\) is defined. The pre-processing then consists of multiplying each input spectrum by G. When these scaled spectra are analysed in conventional chemometrics software by PCA, PCR, PLSR, curve resolution, etc., the modelling becomes simpler, because it does not have to account for variations in the unwanted interferants. The obtained model parameters may finally be descaled by \(G^{-1}\) for graphical interpretation. The pre-processing method is illustrated by the use of prior spectroscopic knowledge to simplify the multivariate calibration of a fibre optical vis/NIR process analyser. The 48-dimensional spectral space, corresponding to the 48 instrument wavelength channels used, is shrunk in two of its dimensions, defined by the known spectra of two major interferants. Successful multivariate calibration could then be obtained, based on a very small calibration sample set. The paper then shows how pre-whitening can reduce the number of bilinear PLSR components in multivariate calibration models. The nuisance covariance S is based either on prior knowledge of the interferants' spectra or on an estimate of the interferants' spectral subspace from the calibration data at hand. The relationship of the pre-processing to weighted and generalized least squares from classical statistics is outlined. Copyright © 2003 John Wiley & Sons, Ltd.
KEYWORDS: pre-whitening; covariance; weighted; preprocessing; GLS; prior knowledge; process; multivariate calibration
1. INTRODUCTION
1.1. Reducing unwanted effects
Classical chemical modelling, where prior knowledge is used to formulate mathematical models based on causal/mechanistic/first-principles theory, has problems when the a priori knowledge is erroneous or incomplete. On the other hand, data-driven explorative modelling, such as multivariate regression of one set of variables Y on another set of variables X, has problems if the available data are inadequate. Sometimes, purely data-driven modelling requires large amounts of input data for estimation of parameters that one already knows.
The goal of the present covariance-weighted pre-processing technique is to maintain the flexibility of the data-driven `soft modelling', but to reduce the requirements for empirical calibration data, by including quantitative prior knowledge in the modelling. If successful, this should reduce the existing prerequisite for spanning all relevant types of variation by the calibration samples, a requirement that has made multivariate calibration of process analysers expensive and cumbersome. It should also decrease the total number of calibration samples needed, as fewer statistical
*Correspondence to: H. Martens, Department of Food and Dairy Science, Royal Veterinary and Agricultural University, DK-1958 Frederiksberg C, Denmark. E-mail: Harald.Martens@mail.tele.dk
Copyright © 2003 John Wiley & Sons, Ltd.
JOURNAL OF CHEMOMETRICS
J. Chemometrics 2003; 17: 153–165. Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cem.780
eigenanalyses, these software systems include a weighting-
based pre-processing step, to balance the relevance and noise
levels of the different variables. This weighting may be
written as
\[ X = X_{\mathrm{Input}}\,G \tag{2} \]
where G (K × K) is a scaling matrix. In this conventional weighting, G is diagonal, with scaling elements that are the inverse of a predefined standard deviation s (K × 1). In the commonly used standardization, vector s is defined as the total initial standard deviation \(s_0\) of the K variables in the set of available objects. However, it is also possible, and statistically preferable, to define s as the standard uncertainty of the different variables, i.e. the expected standard deviation of their errors.
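As a minimal sketch of the conventional diagonal weighting in Equation (2), assuming NumPy and small synthetic data (the array sizes, the noise levels and the helper name `diagonal_weighting` are illustrative, not from the paper), G is built as diag(1/s) from either the total standard deviation \(s_0\) or an assumed standard uncertainty:

```python
import numpy as np

def diagonal_weighting(X_input, s):
    """Return X = X_input @ G with diagonal G = diag(1/s) (Eq. 2), and G itself."""
    G = np.diag(1.0 / np.asarray(s, dtype=float))   # K x K diagonal scaling matrix
    return X_input @ G, G

rng = np.random.default_rng(0)
X_input = rng.normal(size=(10, 4)) * np.array([1.0, 2.0, 5.0, 0.5])  # synthetic data
Xc = X_input - X_input.mean(axis=0)                 # mean-centre the K = 4 variables

# Standardization: s = total initial standard deviation s0 of each variable
s0 = Xc.std(axis=0, ddof=1)
X_std, G_std = diagonal_weighting(Xc, s0)

# Uncertainty weighting: s = assumed standard uncertainty per variable (hypothetical values)
s_unc = np.array([0.1, 0.1, 0.4, 0.4])
X_unc, G_unc = diagonal_weighting(Xc, s_unc)
```

Standardization gives every variable unit variance regardless of its information content, whereas the uncertainty-based choice downweights variables only in proportion to their expected noise.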
More formally, the scaling matrix G may be seen as the
inverse square root of the diagonal variance elements in
matrix S:
\[ G = S^{-1/2} \tag{3} \]
Defining \(S = \mathrm{diag}(s^2)\) and replacing X by \(X_{\mathrm{Input}}G = X_{\mathrm{Input}}S^{-1/2}\) in the PCA and PLSR definitions shows that the pre-processing of the X-variables is equivalent (see Appendix I) to defining the score vectors as eigenvectors of \(X_{\mathrm{Input}}S^{-1}X'_{\mathrm{Input}}\) in PCA/PCR and of \(X_{\mathrm{Input}}S^{-1}X'_{\mathrm{Input}}YY'\) in PLSR (after deflation). In the NIPALS estimation algorithm the same result may be attained by using weighted least squares (WLS) in the repeated regression over X-variables that defines each score vector.
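The equivalence stated above can be checked numerically. The sketch below (NumPy, synthetic data, illustrative variable names) uses a diagonal S and verifies that the PCA score directions of \(X = X_{\mathrm{Input}}S^{-1/2}\) are eigenvectors of \(X_{\mathrm{Input}}S^{-1}X'_{\mathrm{Input}}\), and that, for a single y-variable, the first PLSR score is an eigenvector of \(X_{\mathrm{Input}}S^{-1}X'_{\mathrm{Input}}yy'\):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 8, 5
X_input = rng.normal(size=(N, K))
X_input -= X_input.mean(axis=0)              # mean-centred input spectra
y = rng.normal(size=N)
y -= y.mean()                                # mean-centred single Y-variable

s = rng.uniform(0.5, 2.0, size=K)            # assumed standard uncertainties
S_inv = np.diag(1.0 / s**2)                  # S^{-1} for S = diag(s^2)
X = X_input @ np.diag(1.0 / s)               # Eq. (2) with G = S^{-1/2}

M = X_input @ S_inv @ X_input.T              # X_input S^{-1} X_input'

# PCA/PCR: left singular vectors of X are eigenvectors of M, with eigenvalues sv**2
U, sv, _ = np.linalg.svd(X, full_matrices=False)
pca_check = M @ U - U * sv**2

# PLSR: first score t1 = X w1 with w1 proportional to X'y is an eigenvector
# of M y y' with eigenvalue y' M y
t1 = X @ (X.T @ y)
pls_check = M @ np.outer(y, y) @ t1 - (y @ M @ y) * t1
```

Both residual arrays vanish to rounding error because \(XX' = X_{\mathrm{Input}}GG'X'_{\mathrm{Input}} = X_{\mathrm{Input}}S^{-1}X'_{\mathrm{Input}}\).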
If the errors in different X-variables are correlated, S
becomes a covariance matrix with non-zero off-diagonal
elements. From more or less approximate prior knowledge about this uncertainty covariance, Equation (3) may still be
used for defining the pre-processing. Equation (2) then
yields a covariance-weighted pre-processing of the input
data. The equivalent NIPALS algorithm then requires
generalized least squares (GLS) regression [1,2] over the
X-variables to estimate the score vectors. Further details of
the relationship between classical GLS and the present use of
covariance-weighted pre-processing for `pre-whitening' of
spectral data are given in Appendix II, which also describes the converse object weighting used to remove correlated errors between objects.
2.2.3. Definition of the pre-processing weights G
A practical implementation of Equation (3) is based on eigenanalysis of the uncertainty variance-covariance matrix S in terms of its eigenvectors V and eigenvalues λ:
\[ SV = V\,\mathrm{diag}(\lambda) \tag{4a} \]
The covariance weighting matrix is here defined as
\[ G = V\,\mathrm{diag}(\lambda^{-1/2})\,V' \tag{4b} \]
The chosen symmetrical definition of G is not mandatory as long as \(GG' = S^{-1}\), but it simplifies the visual interpretation of the weighted model parameters and residuals.
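Equations (4a) and (4b) translate directly into a few lines of NumPy. The following is a sketch under the assumption that S is symmetric positive definite; the function name `covariance_weights` and the positivity tolerance are illustrative choices, not part of the paper:

```python
import numpy as np

def covariance_weights(S, rel_tol=1e-12):
    """Symmetric G = V diag(lambda^{-1/2}) V' from S V = V diag(lambda) (Eqs 4a, 4b)."""
    S = 0.5 * (S + S.T)                    # remove round-off asymmetry
    lam, V = np.linalg.eigh(S)             # eigenvalues (ascending), orthonormal eigenvectors
    if lam.min() <= rel_tol * lam.max():
        raise ValueError("S must be positive definite to form S^{-1/2}")
    return V @ np.diag(lam ** -0.5) @ V.T

# Example with a hypothetical nuisance covariance of the form d^2 L L' + I
rng = np.random.default_rng(2)
L = rng.normal(size=(6, 2))
S = 2.0 * L @ L.T + np.eye(6)
G = covariance_weights(S)
```

Using `numpy.linalg.eigh` rather than a general eigensolver exploits the symmetry of S and returns real eigenvalues with orthonormal eigenvectors, which is what makes G symmetric.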
2.2.4. Deweighting the model parameters
The loadings P and residuals E of the X-variables, obtained from the bilinear model of the mean-centred, weighted X-data,
\[ X = TP' + E \tag{5a} \]
may be descaled to fit the model of the mean-centred, unweighted data, i.e.
\[ X_{\mathrm{Input}} = TP'_{\mathrm{Input}} + E_{\mathrm{Input}} \tag{5b} \]
If G is symmetrical and has full rank (see below), the inversion of Equation (2) gives
\[ X_{\mathrm{Descaled}} = X_{\mathrm{Input}} = XG^{-1} \tag{5c} \]
Likewise,
\[ E_{\mathrm{Descaled}} = EG^{-1} \tag{5d} \]
and
\[ P_{\mathrm{Descaled}} = G^{-1}P \tag{5e} \]
This simplifies the graphical interpretation of the X-loadings.
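A small numerical sketch (NumPy, synthetic data and an arbitrary positive definite S, both illustrative) confirms that the descaled loadings and residuals of Equations (5c)–(5e), together with the unchanged scores T, reproduce the unweighted data, i.e. that Equation (5b) holds with \(P_{\mathrm{Input}} = P_{\mathrm{Descaled}}\) and \(E_{\mathrm{Input}} = E_{\mathrm{Descaled}}\):

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, A = 12, 6, 2
X_input = rng.normal(size=(N, K))
X_input -= X_input.mean(axis=0)                     # mean-centred, unweighted data
B = rng.normal(size=(K, K))
S = B @ B.T + K * np.eye(K)                         # arbitrary positive definite S (illustrative)
lam, V = np.linalg.eigh(S)
G = V @ np.diag(lam ** -0.5) @ V.T                  # symmetric, full-rank G (Eq. 4b)

X = X_input @ G                                     # weighted data (Eq. 2)
U, sv, Wt = np.linalg.svd(X, full_matrices=False)   # PCA of the weighted data
T = U[:, :A] * sv[:A]                               # scores (N x A)
P = Wt[:A].T                                        # loadings (K x A)
E = X - T @ P.T                                     # residuals (Eq. 5a)

G_inv = np.linalg.inv(G)
X_descaled = X @ G_inv                              # Eq. (5c): recovers X_input
E_descaled = E @ G_inv                              # Eq. (5d)
P_descaled = G_inv @ P                              # Eq. (5e); relies on G being symmetric
```

If a non-symmetric G with \(GG' = S^{-1}\) were used instead, the loadings would have to be descaled with \((G^{-1})'\) rather than \(G^{-1}\).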
In regression methods such as PCR and PLSR the mean-centred, reduced-rank linear regression model, based on the scaled X-variables, may be summarized as
\[ Y = XB_A + F_A \tag{5f} \]
where the regression coefficient matrix \(B_A\) (K × J) uses A latent variables and \(F_A\) (N × J) represents residuals. \(B_A\) may be seen as linear combinations of orthogonal X-loadings (PCR) or orthogonal loading-like loading weights (PLSR). For graphical interpretation, \(B_A\) may therefore be descaled in analogy to Equation (5e) as
\[ B_{A,\mathrm{Descaled}} = G^{-1}B_A \tag{5g} \]
On the other hand, the regression coefficients suitable for prediction of the Y-variables directly from the unweighted X-variables,
\[ \hat{Y}_A = X_{\mathrm{Input}}B_{A,\mathrm{ForInput}} \tag{5h} \]
may be obtained by inserting Equation (2) into Equation (5f), yielding
\[ B_{A,\mathrm{ForInput}} = GB_A \tag{5i} \]
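The distinction between the descaled coefficients of Equation (5g) and the rescaled prediction coefficients of Equation (5i) is easy to get wrong in code. The sketch below assumes NumPy and uses principal component regression (PCR) on synthetic data as a stand-in for any bilinear regression; `pcr_coefficients` is a hypothetical helper, not a function from the paper's software:

```python
import numpy as np

def pcr_coefficients(X, Y, A):
    """Rank-A PCR coefficients B_A (K x J) for mean-centred X (N x K) and Y (N x J)."""
    _, _, Wt = np.linalg.svd(X, full_matrices=False)
    P = Wt[:A].T                                   # first A loadings
    Q, *_ = np.linalg.lstsq(X @ P, Y, rcond=None)  # regress Y on the A scores
    return P @ Q

rng = np.random.default_rng(4)
N, K, J, A = 15, 6, 1, 3
X_input = rng.normal(size=(N, K))
X_input -= X_input.mean(axis=0)
Y = X_input @ rng.normal(size=(K, J)) + 0.1 * rng.normal(size=(N, J))
Y -= Y.mean(axis=0)

R = rng.normal(size=(K, K))
S = R @ R.T + np.eye(K)                            # illustrative positive definite S
lam, V = np.linalg.eigh(S)
G = V @ np.diag(lam ** -0.5) @ V.T                 # G = S^{-1/2}

B_A = pcr_coefficients(X_input @ G, Y, A)          # model on scaled X (Eq. 5f)
B_descaled = np.linalg.solve(G, B_A)               # Eq. (5g): G^{-1} B_A, for plotting
B_for_input = G @ B_A                              # Eq. (5i): for unweighted X_input
Y_hat = X_input @ B_for_input                      # Eq. (5h)
```

Only `B_for_input` should be applied to raw spectra; `B_descaled` is meant for inspection and plotting.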
2.2.5. Definition of the uncertainty covariance S from prior knowledge
In the situation with undesired interferants outlined in Equation (1), it is natural to define S from the nuisance term \(DL' + E\). The spectra L of the interferants (the undesired variation patterns) may sometimes be assumed known, while their concentrations D are unknown. The formally correct definition could then be
\[ S = L\,\mathrm{cov}(D)\,L' + \mathrm{cov}(E) \tag{6a} \]
where cov(D) represents the expected variance-covariance of the interferant concentrations and cov(E) represents the covariance of other, unidentified error patterns plus the variance of random i.i.d. noise. In practice, the variation in interferant concentrations may be difficult to specify and may e.g. be replaced by the approximation
\[ \mathrm{cov}(D) = d^2 I \tag{6b} \]
where \(d^2\) is the expected average variance of the interferants' concentrations; intercorrelations between the interferants' concentrations are assumed to be negligible. The scalar d is given in the unit of interferant concentrations. Moreover, it may often be adequate to assume that the errors in E are uncorrelated, i.e.
\[ \mathrm{cov}(E) = \mathrm{diag}(s^2) \tag{6c} \]
Thereby Equation (6a) simplifies to
\[ S = d^2LL' + \mathrm{diag}(s^2) \tag{6d} \]
If all the X-variables have about the same uncertainty variance \(s^2\), i.e.
\[ \mathrm{cov}(E) = s^2I \tag{6e} \]
this leads to a further simplification. With the expected average interferant concentration variance \(d^2\) being a general scaling factor determining the contribution of the interferant spectra, this further simplifies the definition of S (after division by \(s^2\), which only rescales \(d^2\) and multiplies G by a constant) to
\[ S = d^2LL' + I \tag{6f} \]
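Equations (6d) and (6f) can be assembled as below. This is a sketch assuming NumPy; since the measured interferant spectra of Figure 1 are not reproduced here, L is a random placeholder, and the function name is illustrative:

```python
import numpy as np

def nuisance_covariance(L, d2, s=None):
    """S = d^2 L L' + diag(s^2) (Eq. 6d); with s=None this is S = d^2 L L' + I (Eq. 6f).

    L  : K x n matrix whose columns are interferant spectra (e.g. [l1, l2])
    d2 : expected average variance of the interferant concentrations
    s  : length-K vector of standard uncertainties, or None for unit noise
    """
    K = L.shape[0]
    noise = np.eye(K) if s is None else np.diag(np.asarray(s, dtype=float) ** 2)
    return d2 * (L @ L.T) + noise

rng = np.random.default_rng(5)
L = rng.normal(size=(48, 2))            # hypothetical stand-in for [l1, l2]
S_6f = nuisance_covariance(L, d2=100.0)
S_6d = nuisance_covariance(L, d2=100.0, s=np.full(48, 2.0))
```

Because \(LL'\) is positive semidefinite, adding the noise term guarantees that S is positive definite, so \(S^{-1/2}\) always exists.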
By defining the scaling factor d sufficiently large, the pre-processing \(X = X_{\mathrm{Input}}S^{-1/2}\) (Equations (2) and (3)) can in effect make the subsequent least squares-based modelling of X completely insensitive (`blind') to signal variations caused by the unknown interferant concentrations. Only the net analyte signal, obtained as the residual after projecting K (Equation (1)) on L, will remain in X, together with unmodelled variations and measurement noise.
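The `blindness' argument can be illustrated numerically. In the sketch below (NumPy; random placeholder spectra, not the litmus data), the length of each interferant spectrum after multiplication by \(G = S^{-1/2}\), with S from Equation (6f), shrinks as \(d^2\) grows, while the net analyte signal, the part of a hypothetical analyte spectrum orthogonal to L, passes through unchanged:

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric S^{-1/2} via eigendecomposition (Eqs 4a, 4b)."""
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(lam ** -0.5) @ V.T

rng = np.random.default_rng(6)
K = 20
L = rng.normal(size=(K, 2))       # hypothetical interferant spectra l1, l2 (columns)
k = rng.normal(size=K)            # hypothetical analyte spectrum
# Net analyte signal: residual of k after projection on the column space of L
k_nas = k - L @ np.linalg.lstsq(L, k, rcond=None)[0]

def interferant_gain(d2):
    """Length of each interferant spectrum after pre-processing, relative to before."""
    G = inv_sqrt(d2 * L @ L.T + np.eye(K))          # S from Eq. (6f), G = S^{-1/2}
    return np.linalg.norm(G @ L, axis=0) / np.linalg.norm(L, axis=0)

def nas_preserved(d2):
    """True if the net analyte signal is unchanged by G."""
    G = inv_sqrt(d2 * L @ L.T + np.eye(K))
    return np.allclose(k_nas @ G, k_nas)

for d2 in [0.0, 0.1, 1.0, 100.0]:
    print(d2, interferant_gain(d2).round(4), nas_preserved(d2))
```

The reason is visible in Equation (6f): S acts as the identity on every direction orthogonal to the columns of L, so G does too, whereas along the left singular vectors of L the spectra are divided by \(\sqrt{1 + d^2\sigma_i^2}\), where \(\sigma_i\) are the singular values of L.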
2.2.6. Definition of the uncertainty covariance S from previous residuals
When explicit prior knowledge about the spectrum of the individual interferants in L is lacking, the required information may instead be defined from spectral modelling residuals in previous calibration data. If X and Y data from a previous relevant set of M objects are available, D, the spectral residuals in these data, may be obtained after projection of X on the J known constituent concentrations Y:
\[ D = \left(I - Y(Y'Y)^{-1}Y'\right)X \tag{7a} \]
These residuals D may then be used for estimating the future error covariance matrix S, by defining L in Equation (6f) as e.g. the first few (A) principal components of D, obtained by singular value decomposition of D (here S denotes the diagonal matrix of singular values, not the covariance matrix):
\[ USV' = D \tag{7b} \]
In the notation of e.g. Matlab the subspace of the interferants may be defined as
\[ L = V(:,1{:}A)\,S(1{:}A,1{:}A) \tag{7c} \]
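Equations (7a)–(7c) can be sketched in NumPy as follows. As in Equation (7b), the singular values returned by the SVD play the role of the diagonal matrix S there, not of the covariance S. The function name and the synthetic data (random placeholder spectra with small noise and no offsets, so mean-centring is omitted) are illustrative assumptions:

```python
import numpy as np

def interferant_subspace(X, Y, A):
    """Estimate L (K x A) from residuals of X after projection on Y (Eqs 7a-7c).

    X : M x K spectra, Y : M x J (or length-M) known concentrations.
    """
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)
    H = Y @ np.linalg.solve(Y.T @ Y, Y.T)              # Y (Y'Y)^{-1} Y'
    D = (np.eye(len(X)) - H) @ X                       # Eq. (7a): spectral residuals
    _, sv, Vt = np.linalg.svd(D, full_matrices=False)  # Eq. (7b): D = U S V'
    return Vt[:A].T * sv[:A]                           # Eq. (7c): V(:,1:A) S(1:A,1:A)

# Synthetic check with hypothetical spectra: X = y k' + C L_true' + small noise
rng = np.random.default_rng(7)
M, K = 15, 20
y = rng.uniform(0, 1, size=M)
k, L_true = rng.normal(size=K), rng.normal(size=(K, 2))
C = 2.0 * rng.normal(size=(M, 2))                      # unknown interferant levels
X = np.outer(y, k) + C @ L_true.T + 1e-6 * rng.normal(size=(M, K))
L_est = interferant_subspace(X, y, A=2)
```

The estimated columns are not the individual interferant spectra, but they span (almost) the same subspace, which is all that Equation (6f) needs.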
2.2.7. Definition of the uncertainty covariance S from the data at hand
Equations (7a)–(7c) may alternatively be based on the X and Y data at hand in the actual set of N calibration samples, instead of on previous data. However, care must then be taken to avoid overfitting. For instance, if cross-validation and jackknifing are to be used for statistical assessment of a calibration model, S may have to be re-estimated within each cross-validation segment.
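The key point for the data-at-hand variant is that S is re-estimated inside every cross-validation segment, using only that segment's training objects. The sketch below assumes NumPy, a single y-variable, a basic PLS1 (NIPALS) routine written for illustration, and an arbitrary fixed \(d^2 = 10^4\); the helper names (`interferant_subspace`, `pls1`, `loo_rmsep`) and the synthetic noise-free data are hypothetical, not the paper's software:

```python
import numpy as np

def inv_sqrt(S):
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(lam ** -0.5) @ V.T

def interferant_subspace(X, y, A):
    """L from residuals of X after projection on y (Eqs 7a-7c); X, y mean-centred."""
    D = X - np.outer(y, y @ X) / (y @ y)               # Eq. (7a) for a single y
    _, sv, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:A].T * sv[:A]

def pls1(X, y, A):
    """Basic PLS1 (NIPALS) on mean-centred X, y; returns coefficients b (K,)."""
    X, y = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(A):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        p, qa = X.T @ t / (t @ t), (y @ t) / (t @ t)
        X -= np.outer(t, p)                            # deflate X
        y -= qa * t                                    # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q))

def loo_rmsep(X_input, y, A_pls, A_int=None, d2=1e4):
    """Leave-one-out RMSEP; if A_int is given, S is re-estimated in every segment."""
    N, K = X_input.shape
    errors = np.empty(N)
    for i in range(N):
        keep = np.arange(N) != i
        x_mean, y_mean = X_input[keep].mean(axis=0), y[keep].mean()
        Xc, yc = X_input[keep] - x_mean, y[keep] - y_mean
        G = np.eye(K)                                  # `OLS': no pre-processing
        if A_int is not None:                          # data-based GLS, training data only
            L = interferant_subspace(Xc, yc, A_int)
            G = inv_sqrt(d2 * L @ L.T + np.eye(K))     # Eq. (6f), then Eq. (3)
        b_for_input = G @ pls1(Xc @ G, yc, A_pls)      # Eq. (5i)
        errors[i] = (X_input[i] - x_mean) @ b_for_input + y_mean - y[i]
    return np.sqrt(np.mean(errors ** 2))

# Hypothetical noise-free data: one analyte plus two varying interferants
rng = np.random.default_rng(8)
N, K = 15, 20
y = rng.uniform(0, 1, size=N)
X_input = np.outer(y, rng.normal(size=K)) + 2.0 * rng.normal(size=(N, 2)) @ rng.normal(size=(2, K))
rmsep_ols = loo_rmsep(X_input, y, A_pls=1)
rmsep_gls = loo_rmsep(X_input, y, A_pls=1, A_int=2)
```

Estimating L once from all N objects before splitting would let each left-out object influence its own pre-processing, which is the kind of overfitting the paragraph warns against.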
3. MATERIALS AND METHODS
3.1. Input data
The data set used for illustrating the pre-processing has been chosen for its simplicity, in order to make the method clear.
The data [11] concern the determination of the protonated
state of a chemical dye, litmus.
3.2. Methods
Transmitted light spectra T were measured remotely by fibre optics in an industrial process spectrophotometer (Guided Wave Model 200). The transmittance spectra were converted into absorbance (here referred to as `optical density' (OD)) spectra and collected in K = 48 wavelength channels between about 400 and 700 nm. These OD spectra were termed \(X_{\mathrm{Input}}\) and were available for a total of 23 samples.
The samples contain different known concentrations [11]
of protonated (red-coloured) litmus, which is the analyte to
be calibrated for here, Y = [protonated litmus]. In addition,
the samples have various unknown concentration variations
of two interferants, unprotonated (blue-coloured) litmus
(due to varying pH) and white zinc oxide powder. The data
were analysed in Matlab® Version 5.3 (The MathWorks, Inc.)
using the first author's software.
4. RESULTS
4.1. Previous results for the same data
Without any interferants the OD data are expected to increase proportionally with the concentration of the red-coloured analyte, Y = [protonated litmus], at each wavelength k where the analyte absorbs light, \(x_{\mathrm{Input},k}\), k = 1, 2, ..., K.
However, the two interferants (blue litmus, white powder)
generate selectivity problems: strongly varying but un-
known levels of one or both of the interferants make it
impossible to determine the analyte by conventional
univariate calibration based on a single wavelength channel.
Such selectivity problems may be removed by multi-
variate calibration [2], without knowing anything about the
spectral characteristics of the pure analyte and the interferants, and without even knowing the concentrations of the interferants in the calibration samples, as demonstrated for
these data in References [2,11]. However, this requires that
the calibration sample set spans not only the analyte's
concentration but also each of the interferants' concentra-
tions. The present paper shows how additional spectral
information about the interferants may be used to filter out
their effects by shrinking the X-space, to the extent that they
do not have to be modelled and therefore not even spanned
by the calibration set.
4.2. Input data
The two full curves in Figure 1 show the known interference structures in the present application example: the instrument responses \(L = [l_1, l_2]\) (crosses) of the two interferants, represented by their OD spectra at K = 48 wavelength channels in the visible wavelength range. These
4.3. Increasing degree of shrinkage of input data
The rest of Figure 2 illustrates how the spectra X look after increased downscaling of the two known interferants' impact in the pre-processing \(X = X_{\mathrm{Input}}G = X_{\mathrm{Input}}S^{-1/2}\) (Equations (2) and (3)). The error covariance matrix S was here defined by the simplified expression in Equation (6f) as an increasingly weighted sum of the covariance \(d^2LL'\) (where \(L = [l_1, l_2]\) from Figure 1) plus a constant noise variance, \(\mathrm{diag}(s^2) = I\).
The scalar \(d^2\) determines the degree of shrinkage. The four rows in Figure 2 represent four increasing degrees of shrinkage, \(d^2\) = 0, 0.1, 1 and 100. This may be thought of as four different subjective judgements of the relevance of the two interferants. The left side of the figure shows a gradual simplification of the X-data, until with \(d^2 = 100\) (Figure 2(g)) only one systematic pattern of variation is clearly discernible from the random measurement noise.
The right side of the figure confirms this: as the contributions from the two interferants are diminished, the ability of the remaining absorbance variation in X to describe the analyte Y increases. Without any shrinkage of the interferants' absorbance contributions (\(d^2 = 0\)), three PCs were required to describe both X and Y. Already at \(d^2 = 1\) most of the variation in Y is described after only one PC. With \(d^2 = 100\) the first PC gives more or less a complete description of X as well (Figure 2(h)). Equivalently (see Appendix I), this means that \(X_{\mathrm{Input}}S^{-1}X'_{\mathrm{Input}}\) has only one large eigenvalue.
4.4. A priori information for OLS, WLS and GLS pre-processing
Figure 3 compares the pre-processing parameters in conventional unweighted linear regression (here termed `OLS'), in the pre-processing with diagonal S, as used in e.g. most chemometric software (here termed `WLS'), and in the new covariance-weighted pre-processing (here termed `GLS'; see Appendix II). The left subplots show the uncertainty information assumed available a priori in each of the three cases. The right subplots illustrate the effect of the pre-processing for three arbitrary X-variables (out of 48), namely #10, 20 and 30, for all the samples.
In the top row (`OLS') no prior information is used (in Equation (6d), \(\mathrm{diag}(s^2) = I\) and \(d^2 = 0\)). The variation in the three directions #10, 20 and 30 shows the information that we expect from Figure 2(a).
Figure 2. Effect of increasing degree of GLS shrinkage of input data. Left: GLS pre-processed input data \(X = X_{\mathrm{Input}}G\), where \(X_{\mathrm{Input}}\) denotes the input spectra (a). Right: cumulative fit (fraction of explained variance, R²) of X (crosses, full line) and Y (circles, broken line) as a function of PCA component a = 1–4. Rows 1–4: covariance scaling factors \(d^2\) = 0, 0.1, 1 and 100 respectively (Equation (6d)).
The X-variables from wavelength channel #30 onwards
represent mostly baseline information. In order to visualize
the effect of the WLS pre-processing available in most
chemometrics software packages today, we make the
subjective assumption that the baseline channels from #30
onwards contain mainly irrelevant noise (we ignore that the
X-data in this region may carry useful baseline information). Therefore we a priori ascribe relative standard uncertainty \(s_k = 1\) for X-variables k = 1–29, but increase this to \(s_k = 4\) for k = 30–48, and use these expected noise levels as s in
Equation (6d). For this WLS pre-processing, the covariance
shrinkage factor is still defined as d = 0. The vertical variation
in X-variable #30 is seen to have been reduced in Figure 3(d)
compared with Figure 3(b), but otherwise the sample
configuration is unchanged and the cloud of sample points
still spans three dimensions.
In the third row (`GLS') we additionally employ the spectral background knowledge about the two interferants from Figure 1, \(l_1\) and \(l_2\), with shrinkage factor \(d^2 = 100\). We retain the value of s from the WLS case to illustrate how the variance \(\mathrm{diag}(s^2)\) and the covariance \(d^2LL'\) in Equation (6d) can be used at the same time. The cloud of sample points in Figure 3(f) now spans mainly a single dimension: variations in net analyte signal. Many of the interferant effects have been removed already during pre-processing.
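The three parameter sets compared in Figure 3 differ only in how S is assembled from Equation (6d). As a sketch (NumPy; L is a random placeholder because the measured spectra of Figure 1 are not reproduced here, and the helper name `weights` is illustrative):

```python
import numpy as np

K = 48
rng = np.random.default_rng(9)
L = rng.normal(size=(K, 2))                    # placeholder for the interferant spectra [l1, l2]

# Relative standard uncertainties: 1 for channels 1-29, 4 for channels 30-48
s_wls = np.where(np.arange(1, K + 1) >= 30, 4.0, 1.0)

S_ols = np.eye(K)                              # Eq. (6d) with diag(s^2) = I, d^2 = 0
S_wls = np.diag(s_wls ** 2)                    # variance weighting only, d^2 = 0
S_gls = 100.0 * L @ L.T + np.diag(s_wls ** 2)  # variance plus covariance, d^2 = 100

def weights(S):
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(lam ** -0.5) @ V.T      # G = S^{-1/2} (Eqs 3, 4b)

G_ols, G_wls, G_gls = (weights(S) for S in (S_ols, S_wls, S_gls))
```

For the diagonal WLS case, G reduces to diag(1/s), i.e. channels 30–48 are simply multiplied by 0.25; only the GLS case has off-diagonal elements.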
4.5. Calibration based on very few samples
In this subsection we illustrate one possible use of pre-whitening: the removal of interference effects not seen in the calibration sample set. Conventional cross-validated PLSR is used as the calibration method.
In regression-based multivariate calibration, all the inter-
ference phenomena that may occur in future samples have to
be represented in the calibration sample set, with sufficient
clarity and sufficiently independent of the other types of
variations. Sometimes that is difficult to attain, for economic
or practical reasons, for instance when calibrating an
industrial process spectrophotometer. The covariance-
weighted pre-processing method allows interference phe-
nomena with known spectra L to be corrected for at the pre-
processing stage, so that they do not have to be spanned in
the calibration set.
The first column of subplots in Figure 4 shows the original absorbance spectra \(X_{\mathrm{Input}}\). The second column of subplots in
Figure 4 shows the spectra after pre-processing by the three
methods illustrated in Figure 3 for three of the X-variables.
Figure 3. Comparison of OLS, WLS and GLS pre-processing. Top (a,b), OLS; middle (c,d), WLS; bottom (e,f), GLS. Left: information
available a priori. Right: data plotted in 3D for X-variables #10, 20 and 30. Each point represents one sample's spectrum.
Calibration set. The three densely dotted curves in Figure 4(a) represent N = 3 objects that are here treated as if they were the only samples available with both X- and Y-data. This tiny calibration sample set has relative analyte concentrations Y = [0.009, 0.365, 0.679]'. Test set. For the sake of illustration, the thin curves in Figure 4 represent the remaining 20 objects, which will now be treated as a new, future set, for which Y is to be predicted from their spectra X. These input data are the same for the OLS, WLS and GLS cases (rows 1, 2 and 3 in Figure 4).
The three densely dotted curves were used as X in
calibration against Y, with the model parameters estimated
by PLSR. In all three cases, OLS, WLS and GLS, the PLSR
model with one PC appeared to perform best in the small
calibration set, because the calibration samples only spanned
the analyte variation and no interferants. The linear regression coefficient vector \(B_{A=1}\) gave a more or less equally `perfect' fit in the N = 3 calibration samples for all three pre-processing methods, as evidenced by the three dots along the `ideal' diagonal (middle column of subplots in Figure 4).
The analyte concentration in the remaining 20 `unknown' samples, \(\hat{Y}_A\), was now predicted from their spectra, using the `optimal' calibration model \(B_{A=1}\). The circles in the middle column of subplots in Figure 4 show that the OLS and WLS calibration models gave bad Y-predictions in the new, independent samples, while the GLS calibration model gave
good prediction. The reason is that variations in the input
spectra due to varying, uncontrolled levels of the two
interferants were not seen in the calibration set and hence
were left unchecked by the conventional unweighted and
variance-weighted cases (OLS and WLS). In contrast, the
damaging effects of the interferants on the predictive ability
of the calibration model were more or less eliminated by the
covariance-weighted pre-processing (GLS).
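The mechanism behind Figure 4 can be reproduced in miniature. The sketch below (NumPy) uses the three calibration Y-values quoted above but otherwise synthetic, hypothetical spectra: the interferants are held constant in the N = 3 calibration samples and vary in 20 test samples, and a one-component PLS1 model is fitted with and without covariance weighting based on the (assumed known) interferant spectra L, with \(d^2 = 100\) in Equation (6f):

```python
import numpy as np

def inv_sqrt(S):
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(lam ** -0.5) @ V.T

def one_component_pls(X, y):
    """One-PC PLS1 on mean-centred X, y: b = w (y't)/(t't) with w = X'y/||X'y||."""
    w = X.T @ y
    w /= np.linalg.norm(w)
    t = X @ w
    return w * (y @ t) / (t @ t)

rng = np.random.default_rng(10)
K = 20
k = rng.normal(size=K)                           # hypothetical analyte spectrum
L = rng.normal(size=(K, 2))                      # hypothetical interferant spectra, assumed known
c0 = np.array([0.5, 0.5])                        # constant interferant level in calibration

y_cal = np.array([0.009, 0.365, 0.679])          # calibration Y-values from the text
X_cal = np.outer(y_cal, k) + c0 @ L.T            # interferants do not vary in calibration
y_test = rng.uniform(0, 1, size=20)
C_test = c0 + rng.normal(size=(20, 2))           # interferants vary in the `future' samples
X_test = np.outer(y_test, k) + C_test @ L.T

def rmsep(G):
    """Fit on the 3 calibration samples after weighting by G, predict the 20 test samples."""
    x_mean, y_mean = X_cal.mean(axis=0), y_cal.mean()
    b = one_component_pls((X_cal - x_mean) @ G, y_cal - y_mean)
    y_hat = (X_test - x_mean) @ (G @ b) + y_mean   # Eq. (5i): b_ForInput = G b
    return np.sqrt(np.mean((y_hat - y_test) ** 2))

rmsep_ols = rmsep(np.eye(K))
rmsep_gls = rmsep(inv_sqrt(100.0 * L @ L.T + np.eye(K)))
```

Because the calibration spectra contain only the analyte direction, the unweighted model has no way of learning to ignore the interferants, whereas G has already removed most of their variation before the regression sees the data.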
The two rightmost columns of subplots in Figure 4 show the X-residuals after the one-dimensional PLSR model, in terms of the scaled residuals E (obtained after projection of X on the first PC \(t_1\)) and their descaled version \(E_{\mathrm{Descaled}}\) (Equation (5d)), respectively. This shows that the unmodelled interference information was clearly visible for the
Figure 4. Calibration with very few samples. Top (a-1 to a-5), OLS; middle (b-1 to b-5), WLS; bottom (c-1 to c-5), GLS. Column 1: input data \(X_{\mathrm{Input}}\) of three calibration samples (densely dotted) and 20 unknown test samples. Column 2: scaled spectra for regression modelling, \(X = X_{\mathrm{OLS}}\), \(X_{\mathrm{WLS}}\) or \(X_{\mathrm{GLS}}\). Column 3: Y-values predicted from optimal models, \(\hat{y}_{i,A=1}\) (ordinate), vs measured values \(y_i\) (abscissa); target line \(\hat{y}_{i,A=1} = y_i\). Column 4: spectral residuals E from the one-PC PLSR model of the scaled X-data. Column 5: spectral residuals E after descaling by Equation (5d), \(E_{\mathrm{Descaled}}\).
new unknown samples, both for the OLS/WLS and GLS
cases. In the GLS case, E was very low (Figure 4(c-4))
compared with the scaled X-data (Figure 4(c-2)), even for the
20 `new' samples. However, after descaling, the characteris-
tic signals of the two unmodelled interferants became clearly
visible in the residual spectra EDescaled (Figure 4(c-5)). These
residuals may be submitted to a second bilinear modelling, yielding a second set of score vectors and residual variances,
for outlier analysis, etc.
In summary, the pre-processing in this case allowed us to
make a valid calibration model with a small and otherwise
inadequate calibration set, in spite of a glaring lack of
interferant variability between the calibration objects. This
illustrates that shrinking away interference effects in the
X-space by pre-whitening makes it possible to use fewer
calibration samples, and in particular fewer Y-data, and
hence to get cheaper and simpler calibration models.
4.6. Calibration based on many samples
The next two figures illustrate another advantage of pre-whitening: the ability to reduce the required dimensionality of the calibration model for a given set of calibration samples. The main purpose of this reduction is to simplify model interpretation, with a possible enhancement of the predictive performance. In this case all the available objects from Figure 2(a) are used as calibration samples (N = 23). The same parameter sets (termed OLS, WLS and GLS) were used as in the last example, and PLSR was again used for developing the calibration models.
Full leave-one-out cross-validation was used for assessing the models in terms of their optimal rank A and their root mean square error of prediction in Y, \(\mathrm{RMSEP}(Y)_A\). The input
spectra of the calibration samples now comprise all N (3 + 20 = 23) curves displayed in the first column of subplots in Figure 4. The three full curves in Figure 5 show the predictive ability of the OLS, WLS and GLS cases, in terms of the cross-validated \(\mathrm{RMSEP}(Y)_A\) vs A = 0, 1, 2, ..., 6. (The dotted curve will be discussed later.)
The figure first of all shows that while the OLS and WLS
models require at least A = 3 PCs to reach acceptably low
predictive error, the GLS model did so with only A = 1 PC.
Moreover, a slight improvement in predictive ability was
attained: using two PCs, the GLS case gives a lower
predictive error than the OLS/WLS cases gave with three
or more PCs.
Finally, Figure 6 illustrates the effect of rescaling and
descaling of the model parameters, in this case of the
estimated regression coefficient vector at the lowest accep-
table rank, for OLS, WLS and GLS. The OLS solution is
superimposed on the WLS and GLS solutions as a dotted
line, for comparison.
The left column of subplots shows BA, as obtained from
bilinear PLSR at the optimal number of PCs (A), based on the
scaled X-variables in the OLS, WLS and GLS cases. The three
ways of pre-processing may be seen to yield somewhat
different scaled regression coefficients. Moreover, while the
OLS and WLS solutions required A = 3 PCs, the GLS solution
required only A = 1 PC.
Figure 5. Calibration based on all samples: predictive performance after OLS, WLS and GLS pre-processing. Prediction error of y, estimated by full leave-one-out cross-validation, from PLSR modelling of \(X = X_{\mathrm{Input}}G\) with \(G = S^{-1/2}\). Squares: OLS; S = I (no pre-processing). Circles: WLS; S diagonal (variance weighting). Triangles: knowledge-based GLS; S defined from two known interferant spectra \(l_1\) and \(l_2\). Dotted curve: data-based GLS; S defined from spectral residuals after projection of \(X_{\mathrm{Input}}\) on y.
The middle column shows the rescaled coefficient spectrum \(B_{A,\mathrm{ForInput}}\) (Equation (5i)), suitable for application directly to the input X-variables. Again the OLS solution is superimposed on the WLS and GLS solutions (dotted line).
The scaling of the individual X-variables in vector \(B_{A,\mathrm{ForInput}}\) is independent of the pre-processing of the X-variables, so the only difference between the solutions is due to the impact of the pre-processing on the estimation process itself. Figures 6(e) and 6(h) show that the downweighting of the X-variables ≥ channel #30 has rendered the other channels more important for separating the baseline variations due to the turbidity from the blue-coloured interferant and the red-coloured analyte. The wavelength channels just below #30, with low absorbance at the end of interferant spectrum \(l_1\) (Figure 1), are given higher relative importance in the
modelling. This confirms that in a rank-reduced calibration
model such as the present low-rank PLSR modelling, there
are several almost equivalent ways to combine the 48 input
variables in order to attain the desired selectivity enhance-
ment.
The right column of subplots in Figure 6 shows the
descaled coefficient spectrum BA,Descaled (Equation (5g)),
suitable for graphical interpretation, with the OLS solution
again superimposed (dotted line). Now the obvious effect of e.g. the sharp downweighting of X-variables ≥ channel #30 has been removed.
The three solutions are qualitatively similar: they have positive values below about channel #15, as expected from
the spectral characteristic of the analyte red litmus, and
negative values at higher wavelength channels in order to
compensate for the possible presence of the interferants blue
litmus and white ZnO. However, quantitatively, the three
solutions are somewhat different. This shows that with
different pre-processing methods the PLSR models needed
to describe different Y-relevant patterns of variation in the
data in order to attain the desired selectivity.
4.6.1. Definition of the uncertainty covariance S from the calibration data at hand
The dotted curve in Figure 5 represents the results when interferant spectra L (Figure 1) were considered unknown, and instead estimated from the X- and Y-data of the 23 samples in the actual calibration data set at hand. As before,
leave-one-out cross-validation was employed, with re-
estimation of the spectral interferant covariance S for each
cross-validation segment. The figure shows that the pre-
whitening based on the estimated spectral residual matrix D
(Equation (7a)) with its dominant subspace L (Equations (7b)
and (7c), using A = 2 PCs) gives almost as simple modelling
as the one based on prior knowledge of the two interferants'
individual spectra L=[l1, l2]: in both cases the number of
PLSR components required is reduced, because the model
does not have to span these major interferants. However, the
prediction error is now slightly higher. A possible reason for
this is that the former, knowledge-based pre-processing used the known spectra L as additional independent information
in estimating S, while the latter, data-driven pre-processing
had no such extra information available.
Figure 6. Calibration based on all samples: regression coefficients estimated, rescaled and descaled. Top, OLS (A = 3 PCs); middle, WLS (A = 3 PCs); bottom, GLS (A = 1 PC). Left: coefficients \(\hat{B}_A\) obtained from scaled spectra X. Middle: rescaled coefficients \(\hat{B}_{A,\mathrm{ForInput}}\) (Equation (5i)), applicable directly to unscaled input spectra \(X_{\mathrm{Input}}\). Right: descaled coefficients \(\hat{B}_{A,\mathrm{Descaled}}\) (Equation (5g)); weighting effects removed. Dotted curves: OLS estimate \(\hat{B}_{A=3}\) from (a), for comparison.
5. DISCUSSION
Figure 4 demonstrated an ability of the covariance-weighted
pre-processing to give good predictive ability even for new
samples with interferants not present in the calibration set. This
may become important in e.g. calibrating industrial process
analysers, when it is difficult to perturb the actual process
enough to get a sufficiently informative calibration sample
set. By introducing prior knowledge about known inter-
ferants' spectral signatures, the interferants can be compen-
sated for already in a pre-processing filtering step, and thus
do not have to vary in the calibration set.
Figure 5 demonstrated that the covariance-weighted `GLS'
pre-processing yielded calibration models with lower rank
than those from the conventional `OLS' and `WLS' methods.
High-dimensional models are generally cumbersome to
interpret graphically, so that is an advantage. Moreover, as
long as the uncertainty covariance S represents prior
knowledge, a slight improvement in prediction ability may
be expected, because the subsequent calibration thenrequires fewer statistical parameters to be estimated from
the available Ncalibration data.
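The rank reduction can be mimicked on simulated data. In this sketch everything is hypothetical: one invented analyte band, two invented interferant bands, random concentrations, small white noise, and the same assumed S = L'L + s²I as an illustrative stand-in for the paper's nuisance covariance. It compares the relative singular values of the mean-centred data before and after scaling; the helper names are illustrative only.

```python
import numpy as np


def gaussian_band(channels, centre, width):
    """Synthetic absorption band; purely illustrative."""
    return np.exp(-0.5 * ((channels - centre) / width) ** 2)


def inv_sqrtm(S):
    """Symmetric inverse square root S^(-1/2) of a positive definite matrix."""
    eigval, eigvec = np.linalg.eigh(S)
    return (eigvec / np.sqrt(eigval)) @ eigvec.T


def relative_singular_values(X):
    """Singular values of the column-centred matrix, divided by the largest."""
    sv = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    return sv / sv[0]


rng = np.random.default_rng(0)
channels = np.arange(100, dtype=float)
k_analyte = gaussian_band(channels, 50, 6)            # hypothetical analyte spectrum
L = np.vstack([gaussian_band(channels, 40, 8),        # hypothetical known
               gaussian_band(channels, 62, 10)])      # interferant spectra

N = 40
c = rng.uniform(0, 1, size=(N, 1))                    # analyte levels
a = rng.uniform(0, 1, size=(N, 2))                    # interferant levels, vary freely
X_input = c * k_analyte + a @ L + 1e-3 * rng.standard_normal((N, channels.size))

G = inv_sqrtm(L.T @ L + 1e-4 * np.eye(channels.size))   # assumed S = L'L + s2*I
rel_before = relative_singular_values(X_input)
rel_after = relative_singular_values(X_input @ G)
print("before:", np.round(rel_before[:4], 4))
print("after: ", np.round(rel_after[:4], 4))
```

Before scaling, three singular values stand out (analyte plus two interferants); after scaling a single dominant component remains, mirroring the drop from A = 3 to A = 1 reported for the `GLS' calibration.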
5.1. Comparison with other methods
The covariance-weighted pre-processing based on prior
known spectra L has the advantage of reducing interference
without consuming degrees of freedom from the available,
often expensive Y-data. In that respect it resembles spectral
interference subtraction (SIS) [12]. If, instead, S is estimated
from the data [X, Y] at hand, the pre-processing has
some similarity to so-called orthogonal signal correction
(OSC) [13] and direct orthogonalization (DO) [14]. Extended
multiplicative signal correction (EMSC) [12,15] has similar
properties to SIS and covariance-weighted pre-processing,
but allows for removal of both additive and multiplicative
effects.
There is one major difference in how the covariance-weighted
pre-processing and the set of OSC, DO, SIS and EMSC methods
attempt to reduce the interference effects in $X_{\mathrm{Input}}$.
The latter methods subtract the effects in one way or
another. In contrast, the new covariance-weighted
pre-processing is based on shrinking by division (i.e.
multiplication by $S^{-1/2}$, the inverse square root of S; see
Equations (2) and (3)). The full consequences of this
distinction are not yet clear. However, it may be noted that
DO [14] is particularly similar to the data-driven estimation
of the interferant subspace L (Equations (7a)–(7c); Figure 5,
dotted line), even though it employs subtraction instead of
inverted scaling to eliminate the effect of the interferants.
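The distinction between subtracting and shrinking can be made concrete in a few lines. In this hypothetical sketch the "subtraction" route is a plain least-squares projection onto the orthogonal complement of the interferant spectra; it stands in generically for the subtraction family and is not an implementation of OSC, DO, SIS or EMSC. The "division" route multiplies by S^{-1/2} with an assumed S = L'L + s²I. After rescaling by s, the shrunk spectrum is close to the projected one, but for finite s² the interferant directions are only shrunk, never set exactly to zero.

```python
import numpy as np


def gaussian_band(channels, centre, width):
    """Synthetic absorption band; purely illustrative."""
    return np.exp(-0.5 * ((channels - centre) / width) ** 2)


def inv_sqrtm(S):
    """Symmetric inverse square root S^(-1/2) of a positive definite matrix."""
    eigval, eigvec = np.linalg.eigh(S)
    return (eigvec / np.sqrt(eigval)) @ eigvec.T


channels = np.arange(100, dtype=float)
L = np.vstack([gaussian_band(channels, 40, 8), gaussian_band(channels, 62, 10)])
x = 0.7 * gaussian_band(channels, 50, 6) + 0.4 * L[0] + 0.2 * L[1]  # hypothetical spectrum

# Subtraction: remove the least-squares fit of the interferant spectra.
coef, *_ = np.linalg.lstsq(L.T, x, rcond=None)
x_subtracted = x - coef @ L

# Shrinking by division: multiply by S^(-1/2) with the assumed S = L'L + s2*I.
s2 = 1e-4
G = inv_sqrtm(L.T @ L + s2 * np.eye(channels.size))
x_shrunk = np.sqrt(s2) * (x @ G)       # rescaled by s so the two are comparable

residual_interference = np.linalg.norm(L @ x_shrunk)   # small but non-zero
rel_gap = np.linalg.norm(x_shrunk - x_subtracted) / np.linalg.norm(x_subtracted)
```

Making s2 smaller tightens the agreement, so in this simple setting projection behaves like the limiting case of shrinking by division.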
5.2. Pre-colouring the spectra
Instead of just shrinking the X-space in particularly
undesired or irrelevant directions, one may also reformulate
the covariance-weighted pre-processing to expand the
X-space in directions known to be particularly desired or
relevant. For instance, after having contracted the X-space to
filter out irrelevant or detrimental interferants, the X-space
could then be expanded in the dimension of the analyte's
spectrum (curve 3, Figure 1), to enhance this desired type of
variation over e.g. random measurement noise in the
subsequent multivariate subspace analysis. Preliminary
Monte Carlo simulations (not shown here) indicate this to
have some statistical advantage.
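One conceivable way to implement such pre-colouring, offered only as a hypothetical sketch (this section does not specify a construction): after pre-whitening, post-multiply by an expansion matrix H = I + (γ − 1)uu', where u is the unit vector along the whitened analyte spectrum and γ > 1 is a chosen gain. H stretches the analyte direction by γ and leaves every direction orthogonal to it untouched. The spectra, S = L'L + s²I, γ = 5 and the helper `expansion_matrix` are all invented for illustration.

```python
import numpy as np


def gaussian_band(channels, centre, width):
    """Synthetic absorption band; purely illustrative."""
    return np.exp(-0.5 * ((channels - centre) / width) ** 2)


def inv_sqrtm(S):
    """Symmetric inverse square root S^(-1/2) of a positive definite matrix."""
    eigval, eigvec = np.linalg.eigh(S)
    return (eigvec / np.sqrt(eigval)) @ eigvec.T


def expansion_matrix(direction, gamma):
    """Hypothetical H = I + (gamma - 1) u u' with u = direction / |direction|."""
    u = direction / np.linalg.norm(direction)
    return np.eye(u.size) + (gamma - 1.0) * np.outer(u, u)


channels = np.arange(100, dtype=float)
L = np.vstack([gaussian_band(channels, 40, 8), gaussian_band(channels, 62, 10)])
k_analyte = gaussian_band(channels, 50, 6)

G = inv_sqrtm(L.T @ L + 1e-4 * np.eye(channels.size))   # shrink interferants
k_white = k_analyte @ G
H = expansion_matrix(k_white, gamma=5.0)                # then expand the analyte
G_coloured = G @ H          # combined pre-processing: X = X_input @ G_coloured

# A direction orthogonal to the analyte direction, for checking that H leaves it alone.
rng = np.random.default_rng(0)
z = rng.standard_normal(channels.size)
u = k_white / np.linalg.norm(k_white)
z_orth = z - (z @ u) * u
```

Because H is symmetric and acts only along u, the combined matrix G·H can be used wherever G was used, and descaling would simply use (G·H)^{-1}.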
The pre-processing has been used for pre-whitening
spectral X-variables in this paper. However, it may equally
well be applied to the set of Y-variables. Appendix I outlines
various equivalent alternatives for integrating the interferant
covariance matrix S into the actual estimators in PCA/PCR
and PLSR, instead of using $S^{-1/2}$ for pre-processing. When
prior knowledge is available about the error covariance
between the objects (samples), the pre-processing may also
be applied over the objects, in a bilinear analogy to the
conventional GLS estimator (Appendix II).
It should be noted that after covariance-weighted pre-
processing to remove all major interferants, the remaining
spectra mainly show the net signal of the analyte plus
random noise (see Figure 2(g)). Of course, if the spectrum of
the analyte, K (Equation (1)), is a linear combination of the
spectra L of the interferants, the covariance-weighted pre-
processing will filter out the analyte effect too; the remaining
net analyte signal is zero. Thus the usual requirement in
quantitative analysis, that the analyte spectrum has to be
linearly independent of the major interferant spectra,
remains valid.
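The linear-independence requirement can be checked numerically through the net analyte signal, computed here (as an illustrative choice) as the part of the analyte spectrum orthogonal to the interferant spectra L. The spectra are hypothetical, and S = L'L + s²I is again an assumed stand-in: an analyte spectrum built as a linear combination of the rows of L has a net analyte signal of zero up to rounding, and the covariance-weighted scaling shrinks it just as it shrinks the interferants.

```python
import numpy as np


def gaussian_band(channels, centre, width):
    """Synthetic absorption band; purely illustrative."""
    return np.exp(-0.5 * ((channels - centre) / width) ** 2)


def inv_sqrtm(S):
    """Symmetric inverse square root S^(-1/2) of a positive definite matrix."""
    eigval, eigvec = np.linalg.eigh(S)
    return (eigvec / np.sqrt(eigval)) @ eigvec.T


def net_analyte_signal(k, L):
    """Part of spectrum k orthogonal to the row space of L (least-squares residual)."""
    coef, *_ = np.linalg.lstsq(L.T, k, rcond=None)
    return k - coef @ L


channels = np.arange(100, dtype=float)
L = np.vstack([gaussian_band(channels, 40, 8), gaussian_band(channels, 62, 10)])
k_independent = gaussian_band(channels, 50, 6)     # not in the span of L
k_dependent = 0.6 * L[0] - 0.3 * L[1]              # lies in the span of L

nas_independent = np.linalg.norm(net_analyte_signal(k_independent, L))
nas_dependent = np.linalg.norm(net_analyte_signal(k_dependent, L))

G = inv_sqrtm(L.T @ L + 1e-4 * np.eye(channels.size))
gain_independent = np.linalg.norm(k_independent @ G) / np.linalg.norm(k_independent)
gain_dependent = np.linalg.norm(k_dependent @ G) / np.linalg.norm(k_dependent)
```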
6. CONCLUSIONS
A method has been presented for covariance-weighted pre-
processing of multivariate input data. It facilitates the use of
prior knowledge about undesired (and desired) structures
that are expected to vary in the input data. Its purpose is to
reduce the complexity of the ensuing model and to improve
its predictive ability. The method was illustrated by reducing the effect of spectral variations due to
interferants with known spectra.
In general, multivariate calibration by low-rank regres-
sion, using e.g. PCR or PLSR, has proven highly effective for
solving selectivity problems in complex systems. Many
unidentified interference problems can even be dealt with, as
long as they are spanned well in the calibration sample set
and picked up clearly by the multichannel instrument.
However, the present combination of prior knowledge
and empirical calibration data may simplify calibration,
because already known parameters do not have to be
estimated statistically from the calibration data. The final statistical regression stage in the calibration process could
then primarily be used for finding and correcting unknown
or unexpected phenomena in the data. Thereby calibration of
multichannel instruments may become less expensive and
time-consuming, and easier to understand.
APPENDIX I. EIGENVECTOR EXPRESSIONS FOR COVARIANCE-WEIGHTED PRE-PROCESSING
In PCA, each latent variable (PC) is an eigenvector of XX'
(after suitable mean centring). If the score vector for an
individual PC, t, is scaled to t't = 1, this may be written as
$t\lambda = (XX')t$. Inserting $X = X_{\mathrm{Input}} S^{-1/2}$ (Equations (2) and (3))
into this eigenvalue expression yields the covariance-weighted
expression $t\lambda = (X_{\mathrm{Input}} S^{-1} X_{\mathrm{Input}}')t$. Equivalently, t
is then a left-hand singular vector of $X_{\mathrm{Input}} S^{-1/2}$.
Conversely, if the PCA loading vector p is scaled to
p'p = 1, then $p\lambda = (X'X)p$. Inserting $X = X_{\mathrm{Input}} S^{-1/2}$ gives
$p\lambda = (S^{-1/2} X_{\mathrm{Input}}' X_{\mathrm{Input}} S^{-1/2})p$; p is then a right-hand
singular vector of $X_{\mathrm{Input}} S^{-1/2}$.
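The equivalence of the two PCA routes can be verified numerically. In this sketch `X_input` is random, mean-centred stand-in data and S is an arbitrary symmetric positive definite matrix (both hypothetical). Route 1 takes numpy's SVD of the pre-processed matrix X_Input S^{-1/2}; route 2 takes the leading eigenvectors of the S^{-1}-weighted cross-product matrices. The score and loading vectors agree up to sign, and λ equals the squared leading singular value.

```python
import numpy as np


def inv_sqrtm(S):
    """Symmetric inverse square root S^(-1/2) of a positive definite matrix."""
    eigval, eigvec = np.linalg.eigh(S)
    return (eigvec / np.sqrt(eigval)) @ eigvec.T


rng = np.random.default_rng(1)
N, K = 12, 8
X_input = rng.standard_normal((N, K))
X_input -= X_input.mean(axis=0)                   # mean centring
A = rng.standard_normal((K, K))
S = A @ A.T + K * np.eye(K)                       # hypothetical uncertainty covariance
S_inv_half = inv_sqrtm(S)

# Route 1: covariance-weighted pre-processing, then an ordinary SVD.
U, sv, Vt = np.linalg.svd(X_input @ S_inv_half, full_matrices=False)
t_svd, p_svd = U[:, 0], Vt[0]

# Route 2: eigenvectors of the S^(-1)-weighted cross-product matrices.
eval_t, evec_t = np.linalg.eigh(X_input @ np.linalg.inv(S) @ X_input.T)
eval_p, evec_p = np.linalg.eigh(S_inv_half @ X_input.T @ X_input @ S_inv_half)
t_eig, p_eig = evec_t[:, -1], evec_p[:, -1]      # eigh sorts eigenvalues ascending
```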
In PLSR, each component is an eigenvector of the X–Y
covariance structure [10]. For instance, with orthonormal
scores, t is defined by $t\lambda = (XX'YY')t$ (after suitable deflation
for previous components). With $X = X_{\mathrm{Input}} S^{-1/2}$ this gives
the expression $t\lambda = (X_{\mathrm{Input}} S^{-1} X_{\mathrm{Input}}' YY')t$. Conversely, the
orthonormal loading weight w for each component, used for
defining t = Xw (after suitable deflation of X for previous
components), is defined by $w\lambda = (X'YY'X)w$. Covariance-weighted
pre-processing is equivalent to defining
$w\lambda = (S^{-1/2} X_{\mathrm{Input}}' YY' X_{\mathrm{Input}} S^{-1/2})w$, or w as the first
left-hand singular vector of $S^{-1/2} X_{\mathrm{Input}}' Y$.
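A similar check for the first PLSR component (so no deflation is needed), again with hypothetical random data and an arbitrary positive definite S: the loading weight w obtained from the weighted eigen-equation coincides, up to sign, with the first column of U in numpy's SVD of S^{-1/2}X'_Input Y, and the score t = Xw satisfies tλ = (X_Input S^{-1} X'_Input YY')t.

```python
import numpy as np


def inv_sqrtm(S):
    """Symmetric inverse square root S^(-1/2) of a positive definite matrix."""
    eigval, eigvec = np.linalg.eigh(S)
    return (eigvec / np.sqrt(eigval)) @ eigvec.T


rng = np.random.default_rng(2)
N, K, J = 15, 6, 2
X_input = rng.standard_normal((N, K))
Y = rng.standard_normal((N, J))
X_input -= X_input.mean(axis=0)
Y -= Y.mean(axis=0)
A = rng.standard_normal((K, K))
S = A @ A.T + K * np.eye(K)                       # hypothetical uncertainty covariance
S_inv_half = inv_sqrtm(S)
X = X_input @ S_inv_half                          # covariance-weighted pre-processing

# w from the eigen-equation  w*lambda = (S^-1/2 X'_input Y Y' X_input S^-1/2) w
M_w = S_inv_half @ X_input.T @ Y @ Y.T @ X_input @ S_inv_half
eval_w, evec_w = np.linalg.eigh(M_w)
w_eig, lam = evec_w[:, -1], eval_w[-1]

# w as the first left-hand singular vector of S^-1/2 X'_input Y
U, sv, _ = np.linalg.svd(S_inv_half @ X_input.T @ Y, full_matrices=False)
w_svd = U[:, 0]

t = X @ w_eig                                     # first PLS score, t = X w
M_t = X_input @ np.linalg.inv(S) @ X_input.T @ Y @ Y.T
```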
Hence the PCA/PCR and PLSR solutions may be
obtained either by covariance-weighted pre-processing
$X = X_{\mathrm{Input}} S^{-1/2}$ followed by standard OLS-based software
for PCA/PCR or PLSR, or by eigenvector decomposition of
cross-product matrices weighted by $S^{-1}$. The latter is
analogous to generalised least squares (GLS) regression.
APPENDIX II. GLS AND COVARIANCE-WEIGHTED PRE-PROCESSING
The relationship between generalized least squares (GLS)
regression and covariance-weighted pre-processing will be
demonstrated here. In weighted least squares (WLS) the
regressor–regressor and regressor–regressand cross-product
matrices are modified by the inverse error covariance matrix
$S^{-1}$. When S has off-diagonal elements, this approach is
called `GLS' in some statistical literature [2]. The terms `WLS'
and `GLS' are therefore employed here to distinguish purely
variance-based weighting from covariance-based weighting.
In some other statistical literature the WLS and GLS terms
are used more interchangeably. More details are given in
Reference [1].
II.1. Regression over objects
In the conventional OLS case the input data for one or more
regressands, $Y_{\mathrm{Input}}$ (N×J) $= [y_{\mathrm{Input},j},\ j = 1, 2, \ldots, J]$, are modelled
by projection on one or more regressors, $X_{\mathrm{Input}}$ (N×K) $= [x_{\mathrm{Input},k},\ k = 1, 2, \ldots, K]$,
over a set of N objects, according to the linear model
$Y_{\mathrm{Input}} = X_{\mathrm{Input}} B + F_{\mathrm{Input}}$ (ignoring the mean centring). To estimate
the regression coefficients B (K×J), the conventional estimator fits each
regressand $y_{\mathrm{Input}}$ (N×1) individually to $X_{\mathrm{Input}}$ by minimizing
$f_{\mathrm{Input}}' f_{\mathrm{Input}}$. This yields the conventional full-rank OLS estimator
$$\hat{B} = (X_{\mathrm{Input}}' X_{\mathrm{Input}})^{-1} X_{\mathrm{Input}}' Y_{\mathrm{Input}}.$$
If the correlation pattern between the response errors in
the N objects, $S_N$ (N×N), is known, the GLS estimator
$$\hat{B} = (X_{\mathrm{Input}}' S_N^{-1} X_{\mathrm{Input}})^{-1} X_{\mathrm{Input}}' S_N^{-1} Y_{\mathrm{Input}}$$
yields better estimates, because it minimizes $f_{\mathrm{Input}}' S_N^{-1} f_{\mathrm{Input}}$
for each regressand, i.e. the importance of the correlated error
pattern is down-weighted.
Equivalently, the pre-whitening operators $X = S_N^{-1/2} X_{\mathrm{Input}}$
and $Y = S_N^{-1/2} Y_{\mathrm{Input}}$ allow the model to be rewritten as
$Y = XB + F$. The same GLS estimator may now be rewritten
as $\hat{B} = (X'X)^{-1} X'Y$, which shows that covariance-weighted
pre-processing allows the GLS estimation of B to be
performed by conventional OLS tools. This was here shown
for full-rank OLS/GLS regression, but is equally applicable
for regression methods that handle collinear X-variables,
such as ridge regression and the bilinear methods PCR and PLSR.
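The equivalence can be checked numerically. A minimal sketch with simulated, hypothetical data: the object-error covariance S_N is an arbitrary positive definite matrix, the errors are drawn with that covariance, and the explicit GLS formula is compared with an ordinary least-squares fit (numpy's `lstsq`) of the pre-whitened data $S_N^{-1/2}Y_{\mathrm{Input}}$ on $S_N^{-1/2}X_{\mathrm{Input}}$.

```python
import numpy as np


def inv_sqrtm(S):
    """Symmetric inverse square root S^(-1/2) of a positive definite matrix."""
    eigval, eigvec = np.linalg.eigh(S)
    return (eigvec / np.sqrt(eigval)) @ eigvec.T


rng = np.random.default_rng(3)
N, K, J = 30, 3, 2
X_input = rng.standard_normal((N, K))
B_true = rng.standard_normal((K, J))
A = rng.standard_normal((N, N))
S_N = A @ A.T / N + 0.5 * np.eye(N)                   # hypothetical error covariance
F_input = np.linalg.cholesky(S_N) @ rng.standard_normal((N, J))   # correlated errors
Y_input = X_input @ B_true + F_input

# Explicit GLS estimator (X' S_N^-1 X)^-1 X' S_N^-1 Y
S_N_inv = np.linalg.inv(S_N)
B_gls = np.linalg.solve(X_input.T @ S_N_inv @ X_input, X_input.T @ S_N_inv @ Y_input)

# Pre-whiten the objects, then use a plain OLS solver
W = inv_sqrtm(S_N)
B_ols_on_whitened, *_ = np.linalg.lstsq(W @ X_input, W @ Y_input, rcond=None)
```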
II.2. Regression over X-variables
The converse case is traditional direct multivariate calibration
or multicomponent curve resolution according to Beer's
law. Here each spectrum $x_{\mathrm{Input}}$ (1×K) in the matrix
$X_{\mathrm{Input}} = [x_{\mathrm{Input},k};\ k = 1, 2, \ldots, K]$ is modelled by a set of J known
analyte spectra $K_{\mathrm{Input}}$ (K×J) in the linear regression model
$X_{\mathrm{Input}} = C K_{\mathrm{Input}}' + E_{\mathrm{Input}}$, where C (N×J) is the matrix of
unknown analyte concentrations and $E_{\mathrm{Input}}$ (N×K) is the
matrix of spectral residuals (ignoring baseline offsets). When
the constituent spectrum matrix $K_{\mathrm{Input}}$ has full column rank,
the OLS estimator minimizes $e_{\mathrm{Input}} e_{\mathrm{Input}}'$ for each row in
$X_{\mathrm{Input}}$, yielding
$$\hat{C} = X_{\mathrm{Input}} K_{\mathrm{Input}} (K_{\mathrm{Input}}' K_{\mathrm{Input}})^{-1}.$$
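If the correlation pattern between the response errors
in the K X-variables, S (K×K), is known, then the
GLS estimator minimizes $e_{\mathrm{Input}} S^{-1} e_{\mathrm{Input}}'$ and yields
$$\hat{C} = X_{\mathrm{Input}} S^{-1} K_{\mathrm{Input}} (K_{\mathrm{Input}}' S^{-1} K_{\mathrm{Input}})^{-1}.$$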
The equivalent covariance-weighted pre-processing solution
for curve resolution pre-whitens the spectra
$[X; K'] = [X_{\mathrm{Input}}; K_{\mathrm{Input}}'] S^{-1/2}$, thereby shrinking away the noise
correlations between the X-variables. The model may then
be written as $X = CK' + E$ and the GLS concentration estimate
may be obtained by $\hat{C} = XK(K'K)^{-1}$, i.e. by an OLS
expression.

In summary, prior knowledge about the uncertainty
covariances S may be used to improve linear regression. In
Appendix I the same was shown for bilinear regressions. In
both cases, one may either analyse the input data directly by
GLS or GLS-like expressions, involving $S^{-1}$, or perform
covariance-weighted pre-processing of the input data by
$S^{-1/2}$, followed by OLS or OLS-like expressions, as
illustrated in this paper.
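For the curve-resolution case of Appendix II.2, the same check can be written in a few lines. Hypothetical pure-component spectra, concentrations and a positive definite noise covariance S between the K X-variables are simulated; the explicit GLS concentration estimate is then compared with an ordinary least-squares fit after pre-whitening both the measured spectra and the pure-component spectra by $S^{-1/2}$.

```python
import numpy as np


def inv_sqrtm(S):
    """Symmetric inverse square root S^(-1/2) of a positive definite matrix."""
    eigval, eigvec = np.linalg.eigh(S)
    return (eigvec / np.sqrt(eigval)) @ eigvec.T


rng = np.random.default_rng(4)
N, K, J = 10, 40, 2
K_input = np.abs(rng.standard_normal((K, J)))         # hypothetical pure spectra (K x J)
C_true = rng.uniform(0, 1, size=(N, J))
A = rng.standard_normal((K, K))
S = A @ A.T / K + 0.1 * np.eye(K)                     # hypothetical noise covariance
# Each row of E_input has covariance proportional to S.
E_input = 0.01 * rng.standard_normal((N, K)) @ np.linalg.cholesky(S).T
X_input = C_true @ K_input.T + E_input

# GLS: C = X S^-1 K (K' S^-1 K)^-1, written with a linear solve
S_inv = np.linalg.inv(S)
C_gls = np.linalg.solve(K_input.T @ S_inv @ K_input, (X_input @ S_inv @ K_input).T).T

# Pre-whiten [X; K'] by S^-1/2, then ordinary least squares: C = X K (K'K)^-1
G = inv_sqrtm(S)
X, K_white = X_input @ G, G @ K_input
C_ols_on_whitened = np.linalg.lstsq(K_white, X.T, rcond=None)[0].T
```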
REFERENCES
1. Read BC. Weighted least squares. In Encyclopedia of Statistical Sciences, vol. 9, Kotz S, Johnson NL (eds). Wiley-Interscience, J. Wiley & Sons Inc: New York, 1988; 576–578.
2. Martens H, Naes T. Multivariate Calibration. Wiley: Chichester, 1989.
3. Gower JC. Generalised canonical analysis. In Multiway Data Analysis, Coppi R, Bolasco S (eds). Elsevier: Amsterdam, 1989; 221–232.
4. Bullmore E, Long C, Suckling J, Fadili J, Calvert G, Zelaya F, Carpenter A, Brammer M. Colored noise and computational inference in neurophysiological (fMRI) time series analysis: resampling methods in time and wavelet domains. Human Brain Mapp. 2001; 12: 61–78.
5. De Lathauwer L, de Moor B, Vandewalle J. An introduction to independent component analysis. J. Chemometrics 2000; 14: 123–149.
6. Kuldvee R, Kaljurand M, Smit HC. Improvement of signal-to-noise ratio of electropherograms and analysis reproducibility with digital signal processing and multiple injections. J. High Resol. Chromatogr. 1998; 21: 169–174.
7. Wentzell PD, Andrews DT, Kowalski BR. Maximum likelihood multivariate calibration. Anal. Chem. 1997; 69: 2299–2311.
8. Wentzell PD, Lohnes MT. Maximum likelihood principal component analysis with correlated measurement errors: theoretical and practical considerations. Chemometrics Intell. Lab. Syst. 1999; 45: 65–85.
9. Paatero P, Tapper U. Positive matrix factorisation: a non-negative factor model with optimal utilisation of error estimates of data values. Environmetrics 1994; 5: 111–126.
10. Höskuldsson A. PLS regression methods. J. Chemometrics 1988; 2: 211–228.
11. Martens H, Martens M. Multivariate Analysis of Quality. An Introduction. Wiley: Chichester, 2001.
12. Martens H, Stark E. Extended multiplicative signal correction and spectral interference subtraction: new pre-processing methods for near infrared spectroscopy. J. Pharmaceut. Biomed. Anal. 1991; 9: 625–635.
13. Wold S, Antti H, Lindgren F, Öhman J. Orthogonal signal correction of near-infrared spectra. Chemometrics Intell. Lab. Syst. 1998; 44: 175–185.
14. Andersson CA. Direct orthogonalization. Chemometrics Intell. Lab. Syst. 1999; 47: 51–63.
15. Martens H, Pram Nielsen J, Balling Engelsen S. Light scattering and light absorbance separated by extended multiplicative signal correction (EMSC). Application to NIT analysis of powder mixtures. Anal. Chem. 2003; 75: 394–404.