FAQ on calibration
Rationale
From experience gained in inter-laboratory comparison studies, be it proficiency tests or method
validation studies by collaborative trial, we know that the importance of instrument calibration and its
effect on analysis results is frequently underestimated. Participants in some of these studies expressed
the wish for more guidance on instrument calibration, even though a number of international guidance
documents are already available. This situation could be explained by the difficult language used in
some of the guides, or by a lack of knowledge of where to find practical and easily understandable
guidance. The more than five million hits returned at the time of writing by one of the most prominent
internet search engines for the term "instrument calibration" do not really improve the situation.
We therefore felt it necessary to prepare this document, which aims to address some aspects of
standard preparation and instrument calibration for the determination of polycyclic aromatic
hydrocarbons (PAHs) and mycotoxins in food that seem to cause difficulties for some operators.
As in every question-and-answer scenario, a question can be asked very specifically, resulting in a
similarly specific answer, or it can be asked rather generally, in which case the answer will also tend
to be general. This can result in a dilemma: the answer to a question might not be the one of interest,
either because it is too specific and not applicable to the slightly different question one actually has,
or because it is too general and does not touch on the specific aspect one had in mind.
In other words:
“If a model is simple, it likely will be wrong,
if it is complex, it surely is impractical”
Applying this to this guide, the compromise was to answer both relevant general questions and a few
specific ones that are sometimes encountered. The format of the guide was chosen on purpose: as a
frequently asked questions (FAQ) document it remains open to address and include any question
regarding standard preparation and instrument calibration that might come up in the future.
Index

Standard preparation
1. Where do I get reference materials for PAH analysis from?
2. Which level of purity of reference materials is acceptable?
3. Which are the advantages of gravimetric standard preparation?
4. Which type of balance do I need for the preparation of calibration standards?
5. Is serial dilution of a standard solution for the preparation of calibration standards acceptable?
6. How shall I store PAH standard solutions?
7. Which containers shall I use for storage of standard solutions?
8. How shall I estimate the shelf life of my standard preparations?
9. Which type of volumetric glassware may I use for the preparation of calibration standards?
10. How do I verify the concentration of my standard preparations?
11. How many points do I need for a calibration curve?
12. How many replicates per calibration point?
13. Why shall the concentration levels of the calibration standards be equidistant?
14. Which range of concentration has the calibration to cover?
15. Which type of internal standard shall I use?
16. When do I need to prepare matrix matched calibration standards?
17. How do I determine matrix effects?
18. In which sequence shall I measure the calibration standards?

Evaluation of calibration measurements
19. How shall I test for linearity of the calibration?
20. Does a correlation coefficient (r) of 0.99 indicate linearity of calibration?
21. Which level of R² is sufficient?
22. Which information can I get from the plot of residuals?
23. What is the residual standard deviation?
24. May I force the calibration curve through the origin?
25. What is homo- and heteroscedasticity?
26. How do I test for homoscedasticity / heteroscedasticity?
27. Linear regression or weighted linear regression – which shall I apply?
28. May I remove outliers?
29. How do I estimate confidence and prediction intervals?

General
30. Is there any internationally harmonised document on calibration?
31. Where can I get guidance on calibration?
Standard preparation
1. Where do I get reference materials for PAH analysis from?

A number of suppliers of chemicals have PAH standards in their assortment. A non-exhaustive list of
suppliers, as well as links to other sources of information, is given in the following:
The International Society for Polycyclic Compounds (ISPAC) has on its website a list of suppliers of
polycyclic aromatic hydrocarbons and heterocyclic aromatic compounds both neat and in solution:
ISPAC Standards
A searchable database on suppliers of different chemicals is on the homepage of Chemindustry
(www.chemindustry.com). The following link gives an example for suppliers of benzo[a]pyrene (neat,
and in solution):
ChemIndustry: example of search for benzo[a]pyrene
A similar searchable database, which besides the names of different suppliers also returns some
information on the product (e.g. packaging size), can be found at www.chemexper.com:
chemexper.com
A large collection of PAH reference substances, among others different certified reference materials, is
included in the 2008/2009 catalogue of LGC. It contains single substance reference materials (neat and
in solutions, native and labelled) as well as PAH mixtures.
LGC standards
- Important suppliers of reference materials for PAHs in Europe (non-exhaustive list):
ALFA Aesar, Chiron, Dr. Ehrenstorfer, SIGMA Aldrich, VWR
- Suppliers of certified reference materials (CRMs) for PAHs:
the Institute for Reference Materials and Measurements (IRMM), LGC, the National Institute of
Standards and Technology (NIST)
2. Which level of purity of reference materials is acceptable?

A purity of 100 % would be desirable, but in reality most of the target PAHs (the 15+1 EU priority
PAHs) are available on the market at purities above 95 %.
Hence the operator has to choose a reference material with a purity that is suitable for the particular
task. However, care must be taken that impurities do not interfere with the target analytes.
The purity of the reference substances shall be taken into account in the calculation of the standard
concentrations, and the uncertainty of the purity shall be included in the measurement uncertainty
estimate.
3. Which are the advantages of gravimetric standard preparation?

Weighing is more precise than handling volumes, which normally results in smaller uncertainties.
Handling low volumes of liquids is difficult due to the influence of many factors, such as surface
tension, and frequently leads to bias.
For gravimetric standard preparation it should be noted that the uncertainty from weighing increases
with decreasing amounts of weighed substance. This has consequences for the selection of the type of
balance and the weighing procedure applied.
A prerequisite for gravimetric standard preparation is thermal equilibrium of the balance and all
chemicals and consumables which are used for the standard preparation. Thermal equilibration might
take a couple of hours especially in case of large solvent volumes.
Before starting with gravimetric standard preparation make sure that the balance is working properly,
by applying suitable check weights.
4. Which type of balance do I need for the preparation of calibration standards?
An analytical balance with a readability of 0.1 mg, or of 0.01 mg for weighing substances at levels as
low as about 30 milligrams, will be fit for purpose, meaning that the uncertainty of weighing is at an
acceptable level.
The US Pharmacopeia [1] defines the minimum permissible weight of a balance as a load that will
give a relative uncertainty of less than 0.1%. As a rule of thumb the minimum weight can be estimated
for a balance by multiplying the readability of the balance (e.g. 0.1 mg) with a factor between 3000
and 5000.
However, the applicability of this rule of thumb depends on the precision of the balance and has to be
evaluated experimentally according to Eq 1:

3 × stdev(10 measurements) ≤ 0.001 × weighed amount    (Eq 1)
It has to be noted that the minimum weight corresponds only to the amount of substance weighed and
does not include the tare weight of the weighing vessel!
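Both the rule of thumb and the experimental check of Eq 1 are easy to automate. A minimal sketch (function names and example values are illustrative, not from this guide; masses in grams):

```python
import statistics

def min_weight_rule_of_thumb(readability_g, factor=3000):
    """Rule of thumb: minimum weight = readability x 3000 to 5000."""
    return readability_g * factor

def passes_min_weight_check(readings_g, weighed_amount_g):
    """Eq 1: 3 x stdev of 10 repeated weighings must not exceed
    0.1 % of the weighed amount (tare weight excluded)."""
    return 3 * statistics.stdev(readings_g) <= 0.001 * weighed_amount_g

# Balance with 0.1 mg readability: estimated minimum weight about 0.3 g
print(min_weight_rule_of_thumb(0.0001))
```

The check must be run on the net amounts actually weighed, since the relative uncertainty grows as the weighed amount shrinks.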
In any case, care has to be taken that the balance is calibrated and working according to its
specifications. Provisions on environmental conditions must also be respected; for example, too low
air humidity leads to electrostatic problems and might cause bias.
Further information on the use of balances in standard preparation can be found in the open
literature, e.g. in a paper by Ch. Burgess and R.D. McDowall [2].
5. Is serial dilution of a standard solution for the preparation of calibration standards acceptable?

No! Two aspects have to be taken into account in standard preparation by serial dilution. The probably
more important one is the inability to identify a biased standard preparation.
Figure 1 A presents a standard preparation scheme in which a bias in the preparation of dilution 1 (D1)
from the stock standard solution (S) cannot be identified from the measurement results of the
calibration standards (CS1 to CS5). Even worse is scheme B, which comprises a cascade of
dilutions of the calibration standards. Besides the risk of unidentified bias, it gives a high uncertainty
for the concentration of standard CS5, which is prepared in six dilution steps. According to the law of
error propagation, the relative uncertainty of CS5 equals the square root of the sum of the squared
relative uncertainties of the individual preparation steps from S to CS5, which is of course larger than
the uncertainty of any other calibration standard shown in Figure 1.
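The propagation argument can be illustrated numerically. A sketch under assumed per-step relative uncertainties (the 1 % figure is illustrative, not from this guide):

```python
import math

def combined_rel_uncertainty(step_rel_uncertainties):
    """Law of error propagation for a chain of dilutions:
    the relative uncertainties of the steps add in quadrature."""
    return math.sqrt(sum(u ** 2 for u in step_rel_uncertainties))

# Scheme B: CS5 reached after six dilution steps of 1 % each
u_serial = combined_rel_uncertainty([0.01] * 6)    # about 2.4 %
# Scheme C: each calibration standard reached in two steps from the stock
u_parallel = combined_rel_uncertainty([0.01] * 2)  # about 1.4 %
```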
Figure 1: Different schemes for the preparation of calibration standards. Each arrow represents one
dilution step (S: stock standard solution; D1 dilution 1; CS1 to CS5: calibration standard solutions)
The most appropriate of the three schemes is shown in Figure 1 C. The calibration standard
solutions are prepared from independent dilutions of the stock standard solution. By doing so,
an error in the preparation of an intermediate dilution (D1 to D5) should be detectable in the
measurement results of the calibration standards.
Duplicating scheme C with two independent stock standard solutions provides the highest level of
information about the correctness of the calibration standards.
In practice the preparation of calibration standards needs thorough planning. The handling of low
volumes or low masses shall be avoided as much as possible. In the case of PAHs, limitations are
encountered in the preparation of the stock standard solutions, caused by the low solubility of some
PAHs (e.g. dibenzopyrenes) in the majority of organic solvents.
6. How shall I store PAH standard solutions?

PAH standard solutions shall be stored in amber glassware in the dark, due to the potential
degradation of PAHs by UV light. Room temperature (about 20 °C) is recommended by a number of
suppliers for the storage of PAH standard solutions. Opened commercial standards and in-house
standard preparations should be stored cooled to avoid solvent losses. Do not put PAH standard
solutions in the freezer, as the solubility of some PAHs might be affected at low temperatures.
7. Which containers shall I use for storage of standard solutions?

Amber glassware with Teflon®-lined closures should be used.
As a general rule, the headspace above the standard solution shall be as small as possible. It is also
recommended to divide stock standard solution preparations into several units of small volume for
storage, in order to preserve the composition of the portions that are not currently in use.
8. How shall I estimate the shelf life of my standard preparations?

The shelf life of a product is the time during which the average characteristic of the product remains
within an approved specification.
Translated to standard preparation, this means that the change of the standard concentration, and the
associated uncertainty, must not exceed certain predefined limits.
This sounds fine in theory, but causes several problems in practice.
The first constraint is the definition of the maximum tolerable change of concentration, which might
be caused by degradation of the analyte, loss of solvent, etc. The question to be answered is how much
the change of the composition of the standard preparation may contribute to the combined
measurement uncertainty. There is no general guidance on this; an appropriate value has to be set on a
case-by-case basis. However, a relative change of the standard concentration of 1 % to 2 % could be
acceptable.
The second problem is the identification of changes in practice, and, related to that, the set-up of the
experimental plan to prove agreement with the predefined specifications. At the beginning of such
studies little knowledge of the stability of the standard solutions is available. Hence the shelf life has
to be estimated based on experience with similar substances or on information from the literature.
The duration of the study has to cover at least this first estimate of the shelf life.
The tested standard solution must be independent of the standard solution that is used for instrument
calibration, in order to identify any changes and hence estimate its shelf life. Laboratories usually use
one standard preparation at a time, as standard solutions are expensive. Hence requesting the
preparation of a fresh standard solution for each set of shelf-life experiments would be illusory; in
addition, the preparation of fresh standard solutions would make the determination of the shelf life
superfluous. More economical would be to apply a single, second, independent standard solution over
the whole period of the shelf-life experiments. However, this does not provide the requested
information, because in the case of significant differences it is not possible to trace back which of the
two standard solutions has changed. Therefore it might be worth looking for alternatives.
A possibility could be to use, in the shelf-life experiments, a chemical as internal standard that is
available in large amounts at low cost. This chemical serves as a reference point. The solution of the
reference chemical has to be prepared freshly for each set of shelf-life experiments. The low cost
allows large quantities to be used in the standard preparation, which lowers the risk of bias. In the
experiments, relative response factors between the analyte and the reference chemical are determined,
and any changes are monitored. The selection of a chemical to serve as reference point depends on the
properties of the analyte.
The integrity/stability of standard preparations has to be monitored over the whole shelf life of the
standard preparation. Control charts shall be applied for this purpose. Repeated measurements shall be
performed at each control point in order to estimate the variability of the measurements.
The shelf life of the standard preparation can be shortened or extended depending on the experimental
results.
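A minimal sketch of such control-chart monitoring (the relative response factors and the mean ± 3 s limits are illustrative assumptions, not prescribed by this guide):

```python
import statistics

def control_limits(initial_rrfs, k=3):
    """Control chart limits: mean +/- k standard deviations of the
    relative response factors from the initial control points."""
    m = statistics.mean(initial_rrfs)
    s = statistics.stdev(initial_rrfs)
    return m - k * s, m + k * s

def out_of_control(rrf, limits):
    """Flag a later control point that falls outside the limits."""
    low, high = limits
    return rrf < low or rrf > high

# Relative response factors analyte/reference from the first control points
limits = control_limits([1.02, 0.99, 1.00, 1.01, 0.98])
# A later value outside the limits signals a change in the stored standard
# (or in the freshly prepared reference solution)
```

An out-of-control point only shows that something changed; the freshly prepared reference solution is what allows attributing the change to the stored standard.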
9. Which type of volumetric glassware may I use for the preparation of calibration standards?
The contribution of glassware tolerances to the overall uncertainty of the method is very low, but not
negligible. Class A glassware according to ISO standard 1042:1983 shall be used. For light-sensitive
substances the glassware shall be made of amber glass. The maximum tolerances for different
volumes are also given in ISO standard 1042:1983 (for instance, ± 0.04 ml for a 25 ml flask).
However, it has to be pointed out that the handling (filling, emptying, parallax error) of volumetric
glassware will probably contribute to the total uncertainty of the standard preparation to a larger
extent than the tolerances according to ISO standard 1042:1983. Gravimetric standard preparation is
considered superior to volumetric standard preparation with regard to precision.
10. How do I verify the concentration of my standard preparations?

The verification of the standard concentration is crucial for assuring the quality of analysis results. In
a limited number of cases the concentration of standard preparations can be verified by application of
reference methods; e.g. the concentration of aflatoxin standard solutions in methanol/water can be
verified by photometry. More often, the concentration of a particular standard preparation can only be
verified against other standard preparations. Best practice in that respect would be verification against
a solution with certified values for the analyte(s). However, such certified reference materials (CRMs)
are frequently not available.
Hence the concentration of the standard preparation shall be evaluated against an independent
standard preparation. The minimum requirement is to verify the concentration of a new standard
preparation against the concentration of the preceding standard preparation.
Bracketing calibration, as detailed in ISO standard 11095:1996, should preferably be applied for the
verification measurements, as this technique usually yields greater accuracy than linear calibration.
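In essence, bracketing reduces the determination to a linear interpolation between two standards whose concentrations closely enclose the expected sample concentration. A simplified sketch (hypothetical signal values; ISO standard 11095:1996 additionally prescribes the measurement design around this step):

```python
def bracketing(conc_low, signal_low, conc_high, signal_high, signal_sample):
    """Interpolate the sample concentration between two bracketing
    standards measured close in time to the sample."""
    frac = (signal_sample - signal_low) / (signal_high - signal_low)
    return conc_low + frac * (conc_high - conc_low)

# Standards at 9.0 and 11.0 ug/ml enclosing a sample expected near 10 ug/ml
conc = bracketing(9.0, 450.0, 11.0, 550.0, 498.0)   # about 9.96 ug/ml
```

Because the interpolation spans only a narrow interval, non-linearity of the instrument response over a wider range has little influence on the result.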
11. How many points do I need for a calibration curve?

Before answering this question, the purpose of the calibration experiment has to be defined.
One has to distinguish between the calibration of a measurement system and the check of the validity
of the calibration of a measurement system.
Both topics are treated in depth by international standards such as ISO standard 11095:1996 and the
IUPAC guideline "Guidelines for calibration in analytical chemistry".
The first case, called in ISO standard 11095:1996 the "basic method", is usually applied for the
estimation of linear calibration functions. It encompasses the measurement of a certain number of
reference materials (calibration standards) at different concentration levels.
ISO standard 11095:1996 sets the minimum number of calibration levels for the basic calibration
method at three. However, it also says that the number of levels shall be increased for an initial
assessment of the calibration function. This initial assessment corresponds to the operations
performed during method validation to assess the linear range of a measurement method. The
EURACHEM Guide "The Fitness for Purpose of Analytical Methods" specifies for that purpose at
least six concentration levels plus blank. The above-mentioned IUPAC guide does not specify any
concrete number of calibration levels. Commission Decision 2002/657/EC stipulates at least five
concentration levels, including zero, for the construction of a calibration curve.
Other documents might lay down a different number of calibration levels. For example, ISO standard
15302:2007 specifies four calibration levels, whereas the LGC/VAM guide "Preparation of Calibration
Curves" defines seven calibration levels, including blank, as the minimum requirement for an initial
assessment of the calibration function. ISO 8466-1:1990 even demands ten calibration levels.
As can be seen, the design of calibration experiments and the number of calibration levels depend
very much on the purpose of the experiment and on existing knowledge. For the analysis method that
became an ISO standard, the linearity of the instrument response had presumably already been
established; hence ISO regarded four calibration levels as sufficient for the estimation of the
calibration function. Less knowledge on the shape of the calibration function requires measurements
at more concentration levels. The inclusion of blank or zero levels in the calibration design is required
if the blank or zero sample produces a signal of the same nature as the signal produced by the analyte.
If the blank or zero sample does not produce any signal, it can be excluded from the calibration
experiments.
In general, three concentration levels are required to fit a non-linear function, and at least one more
calibration level is needed for the statistical assessment of the calibration model. Increasing the
number of calibration levels and the number of replicate analyses per level reduces the width of the
confidence and prediction intervals. However, the return in terms of narrowing confidence intervals
diminishes with the number of calibration levels; exceeding ten calibration levels does not provide
any additional benefit. Figure 2 shows the confidence intervals for simulated calibration experiments
performed with different numbers of calibration levels. Each calibration level was measured once.
The underlying data are displayed in Table 1.
Table 1: Data of simulated calibration experiments including different numbers of calibration levels

Level   2 points  3 points  4 points  7 points  9 points
1       1.05      1.05      1.05      1.05      1.05
2                                     2         2
3                                     2.8       2.8
4                           3.85                3.9
5                 5.1                 5.1       5.1
6                                     5.85      5.85
7                           7.1       7.1       7.1
8                                               8.05
9       8.8       8.8       8.8       8.8       8.8

Slope(x) + Intercept:
2 points: 0.9688x + 0.0813; 3 points: 0.9688x + 0.1396; 4 points: 0.9837x + 0.0357;
7 points: 0.9871x + 0.0178; 9 points: 0.9950x − 0.0139
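The regression parameters in Table 1 can be reproduced with an ordinary least-squares fit; a minimal sketch for the seven-level design (data taken from Table 1):

```python
import numpy as np

levels = np.array([1, 2, 3, 5, 6, 7, 9], dtype=float)
response = np.array([1.05, 2.0, 2.8, 5.1, 5.85, 7.1, 8.8])

# First-degree polynomial fit = ordinary linear regression
slope, intercept = np.polyfit(levels, response, 1)
print(round(slope, 4), round(intercept, 4))  # 0.9871 0.0178
```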
Figure 2: Confidence intervals for simulated calibration experiments at two (black dashed line), three
(red dashed line), four (green dashed line), seven (purple dashed line) and nine (grey dashed line)
concentration levels (concentration points). The lines in the middle represent the calibration curves
corresponding to the different scenarios.
The check of the validity of a calibration system has to be clearly distinguished from the initial
calibration. This procedure is based on the information gained in the initial calibration experiment.
ISO standard 11095:1996 applies the term "control method" to this check. At least two, preferably
three, calibration levels are used to monitor the validity of the calibration function via control charts,
and to detect any shifts or errors.
12. How many replicates per calibration point?

ISO standard 11095:1996 demands at least two replicate analyses per calibration level and
recommends as many as possible. At least two replicate analyses are necessary to evaluate the
calibration for constancy of the residual standard deviation. This information is needed to decide
which regression model is most appropriate (see below).
Increasing the number of replicate analyses follows, like the number of calibration levels, the law of
diminishing returns. Hence more than five replicate analyses per calibration level provide little
additional benefit.
NOTE: A very important point to consider is that all performance data associated with a standard
method are based on the calibration procedure specified therein. If you deviate from this calibration
procedure, it is your responsibility to demonstrate that the modified calibration procedure gives
equivalent results.
13. Why shall the concentration levels of the calibration standards be equidistant?
The reason is that the higher the concentration of a calibration standard, the more weight it carries in
the calculation of the calibration curve (this is called leverage). As a result, the calculated slope and
intercept might be influenced disproportionately by one data point.
The effect is demonstrated with a simulated calibration experiment. In Figure 3 each calibration level
corresponds in concentration to double the next lower concentration level. Two data points of the
example, corresponding to the highest concentration level and to one concentration level at the lower
end of the concentration range, were manipulated one at a time, and calibration curves were
determined by linear regression. In each of the experiments one data point was given a relative offset
of -20 %. The respective data points are indicated by bold dots.
The effect on the regression curve of the offset of the data point at the lower end of the calibration
range (green dot) is marginal. The contrary is the case when the data for the highest concentration
level are biased: the signal value of this data point was changed from about 800 to 600, and as a
consequence both the slope and the intercept of the calibration curve change significantly.
This effect follows from the principle of the applied regression method, which aims to minimise the
sum of the squared residuals. Since the residual (in absolute signal units) caused by the relative offset
(20 %) is much larger at the upper end of the calibration range than at the lower end, the data point at
the upper end of the calibration range gets a higher weight, as mentioned before. Such relative offsets
are caused in practice by e.g. pipetting mistakes.
Figure 3: Simulated calibration experiments with a relative offset of -20% of one data point in each
experiment. The offset of the red dot (●) at the higher level has a much bigger influence on the
resulting red calibration curve (—) than the offset of the green dot (●) on the resulting green
calibration curve (—).
(Plot: signal, 0 to 800, versus analyte concentration, 0 to 80, illustrating leverage.)
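The leverage effect can be reproduced numerically. A sketch with assumed values chosen to mimic the magnitudes of Figure 3 (levels doubling from 5 to 80, true signal = 10 × concentration):

```python
import numpy as np

x = np.array([5.0, 10.0, 20.0, 40.0, 80.0])  # each level doubles the previous
y_true = 10.0 * x                             # unbiased signals, slope 10

y_high_biased = y_true.copy()
y_high_biased[-1] *= 0.8                      # -20 % offset, highest level
y_low_biased = y_true.copy()
y_low_biased[0] *= 0.8                        # -20 % offset, lowest level

slope_high, _ = np.polyfit(x, y_high_biased, 1)
slope_low, _ = np.polyfit(x, y_low_biased, 1)
print(round(slope_high, 2), round(slope_low, 2))  # 7.89 10.07
```

The same -20 % relative offset shifts the slope by about 2.1 units when applied at the top of the range, but by less than 0.1 units at the bottom.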
It has to be stressed that calibration designs based on standard concentrations that correspond to
multiples of the next lower concentration are strongly discouraged, although they are frequently
found in practice.
The difference in the effect of one biased calibration point on the regression curve is displayed in
Figure 4, both for a set of six equidistant concentration levels and for a set of six unevenly distributed
concentration levels (multiplication factor = 2). The offset of the data point at the highest
concentration level has less influence on the regression curve in the calibration with equidistant
concentration levels than in the one with unevenly distributed concentration levels.
Figure 4: Effect of one biased calibration point (at concentration level 80, offset of signal = -20%) on
the regression curve of calibration experiments with equidistant (pink) and unevenly distributed (blue)
analyte concentration levels
14. Which range of concentration has the calibration to cover?

The calibration shall cover at least the content/concentration range in which you will need to report
results. The calibration range also defines the working range of the analysis method.
The calibration standards have to be at concentration levels corresponding to the concentration levels
of the ready-to-measure/inject sample.
As a result, the calibration range can be very narrow (e.g. around a legislative limit), provided the
interest concerns only this small working range. The upper limit of the concentration range that a
calibration experiment may span is not defined. However, factors such as homo-/heteroscedasticity
(see below) shall be taken into account in the design of the experiments. As a rule of thumb, the ratio
between the concentrations of the highest and lowest concentration levels shall not exceed a factor of
10 to 20.
Occasionally the analyte content of test samples will exceed the concentration range the instrument is
calibrated for. In that respect, caution has to be exercised before "simply" diluting the test sample
extract to bring it to a concentration level that is covered by the instrument calibration and
re-analysing it. This may be possible in many cases, but is not per se applicable to all analysis
methods, due to the alteration of matrix effects. However, where shown by experiments to be
appropriate, a dilution can be made.
15. Which type of internal standard shall I use?

The most important properties of a suitable internal standard are:
• the internal standard must behave in the same way as, or at least very similarly to, the analyte in
question;
• the internal standard must not be present in the sample itself, otherwise the interpretation of the
internal standard data can be jeopardised;
• the concentration of the internal standard added to the sample shall preferably be in the middle
of the range of expected analyte concentrations.
There are different options for the choice of internal standards. The applicability of the different
possibilities depends on the purpose of the internal standard and the applied detection system.
For example, if the analysis method comprises chromatography with optical detection (such as
fluorescence or UV-absorption) the chosen internal standard has to behave chemically and physically
very similar to the analyte (e.g. in extraction and clean up steps), but must be chromatographically
resolved from the analyte. Often analogues of the actual analyte are taken for this purpose.
Structural isomers of target analytes (e.g. benzo[b]chrysene) are applied for the determination of
PAHs in food by high performance liquid chromatography with fluorescence detection (HPLC-FLD).
Another option is the application of fluorinated analogues of the target PAHs, because their chemical
properties are very similar and chromatographic separation can easily be achieved. The same holds
true for deuterium-substituted PAHs, which in HPLC also show slightly different retention
characteristics compared to the native compounds.
In the field of mycotoxins, aflatoxicol (a metabolite of aflatoxins) is used as internal standard for the
determination of aflatoxins. Structurally similar substances have also been proposed when a
derivatisation is required and the analogue (internal standard) must react in the same manner as the
analyte. Examples are the use of verrucarol for the determination of Fusarium toxins (for GC methods),
squaric acid for the determination of moniliformin (for HPLC-FLD methods) or de-epoxy-
deoxynivalenol (DOM-1) for GC methods.
If the chosen internal standard has a very different retention time, and therefore most likely rather
different chemical behaviour (e.g. in terms of polarity), it is likely that it also behaves differently from
the analyte during extraction or clean-up. As a result, close structural analogues of the analyte are
preferred.
In the case of chromatography coupled to mass selective detection, the substances of choice are isotope-
labelled analogues of the analyte. This allows detection of both substances (the analyte and the
labelled internal standard) at the same or very similar retention time, which is necessary for
compensating for matrix effects. The choice between deuterated and 13C-labelled substances needs to
take several facts into account.
The differently labelled substances might show significant physico-chemical differences. Per-
deuterated substances, as commercialised for some PAHs, have, as mentioned above, different retention
characteristics compared to the native compounds, which might cause problems when it comes
to the compensation of matrix effects in mass spectrometry. The possibility of deuterium-hydrogen
exchange cannot be excluded with deuterated compounds. Also, the loss of deuterium atoms in
chemical reactions of the analyte, such as derivatisation reactions, might lead to problems in
distinguishing between the mass spectrometric signals of the native compound and the labelled
analogue. This phenomenon is encountered in the determination of acrylamide by GC-MS after
chlorination and consecutive dehydrochlorination: the hydrogen isotope clusters of some fragment
ions of the labelled and native acrylamide overlap partially, which makes them unsuitable for
quantitative analysis.
13C-labelled compounds do not present such problems. However, the costs for this kind of labelled
substances are substantially higher than for deuterated substances, and their availability is limited.
16. When do I need to prepare matrix matched calibration standards?
A matrix matched calibration is needed in those cases where the matrix (even after clean-up
procedures) has an influence on the signal obtained for the analyte during measurement. Many
analysis systems are sensitive to matrix effects, e.g. LC-MS or GC-MS. Also fluorescence detection
can be subject to matrix influences (e.g. fluorescence quenching). However, care must be taken that
the matrix used to prepare the calibrant is sufficiently well matched to the matrix of the sample.
Isotope dilution with isotope labelled analogues of the target analyte is frequently applied to
compensate for matrix effects. The basic assumption with this technique is that relative responses
between the analyte and the labelled analogue stay constant.
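Under that assumption of a constant relative response, the isotope dilution calculation reduces to a simple ratio. The following is a minimal sketch with hypothetical peak areas and spike levels; the function name and numbers are illustrative, not from the source.

```python
# Isotope dilution quantification (hypothetical numbers): the analyte
# concentration follows from the response ratio of native analyte to the
# isotope-labelled internal standard, assuming a constant relative
# response factor (rrf) between the two.

def isotope_dilution_conc(area_analyte, area_label, conc_label, rrf=1.0):
    """Analyte concentration from peak areas and the spiked label concentration."""
    return (area_analyte / area_label) * conc_label / rrf

# Labelled standard spiked at 5.0 ug/kg (hypothetical)
c = isotope_dilution_conc(area_analyte=12000, area_label=10000, conc_label=5.0)
print(c)  # 6.0 ug/kg
```

Because the ratio of responses is taken within the same injection, signal suppression or enhancement by the matrix cancels out as long as it affects analyte and label equally.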
17. How do I determine matrix effects?
Matrix effects can be identified from calibration curves obtained with matrix matched calibration
standards and calibration solutions in solvent. Matrix effects are encountered when the intercepts
and/or the slopes of the regression curves for the two sets of calibration solutions are significantly
different from each other. Ignoring these differences would lead in the former case to a constant bias
and in the latter case to a proportional bias.
The procedure to identify matrix effects is the same as to estimate a recovery function.
In the first step a calibration curve is constructed by linear regression with the calibration standards in
solvent solution. In the next step another calibration curve is constructed from the measurement data
of the matrix matched calibration standards.
Before proceeding it must be ensured that the precision of the two calibration curves is comparable.
Otherwise any significant difference between the calibration curves might be hidden by the different
levels of precision. This is accomplished by testing the residual standard deviations of the two
calibration curves for significant differences (with an F-test) at the 99% confidence level. The number
of degrees of freedom is N−2 for each calibration experiment (with N = number of data points).
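The F-test on the two residual standard deviations can be sketched as follows. The numbers and the tabulated critical value are illustrative assumptions, not values from the source; in practice the critical value is taken from F tables for the chosen confidence level and degrees of freedom.

```python
# F-test comparing the residual standard deviations of two calibration
# curves (solvent vs matrix matched); hypothetical values, critical value
# taken from F tables.
s_solvent, s_matrix = 0.12, 0.15   # residual standard deviations (hypothetical)
n1, n2 = 8, 8                      # data points per calibration -> df = n - 2 = 6

# The larger variance goes into the numerator so that F >= 1
F = max(s_solvent, s_matrix)**2 / min(s_solvent, s_matrix)**2
F_crit = 8.47                      # F(0.99; 6, 6) from tables (assumed here)
comparable = F < F_crit            # True: precisions are comparable
```

Only if `comparable` is true does the apparent-concentration comparison described next give a meaningful answer.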
Given that no significant differences between the residual standard deviations were identified, the
measurement data (y-values) of the matrix matched calibration are applied to the calibration function
gained with the calibration standards in solvent, and the corresponding concentration values (x-values)
are calculated. These values are called in the following "apparent concentration values". In the next
step a linear regression is performed on the concentration data of the calibration standard solutions in
solvent (x-values) and the apparent concentration values (used as signal data, y-values). This
regression curve contains the information on matrix effects. A slope different from one indicates
potential concentration-proportional signal enhancement or suppression. An intercept
different from zero indicates concentration-independent bias. However, as the regression is based on a
particular data set, the question has to be answered whether the deviations from the ideal values
(slope = 1, intercept = 0) are significant or just random, as a consequence of the variability in the limited
number of data points. To answer this question, the confidence intervals (95% confidence level) of the
regression parameters have to be determined. If the confidence interval of the slope includes the value
one, and that of the intercept the value zero, then it can be concluded that there is no statistically
significant difference between the calibration with calibration standards in solvent and the matrix
matched calibration.
Matrix matched calibrations are an alternative to isotope dilution to compensate for matrix effects
when using mass spectrometry for measurement.
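The apparent-concentration procedure can be sketched as below. The data are hypothetical (a matrix that suppresses the signal by roughly 17%); the helper function `linreg` is an assumption introduced for illustration.

```python
import math

def linreg(x, y):
    """Ordinary least-squares fit y = b0 + b1*x; returns (b0, b1, s_res)."""
    n = len(x)
    xm, ym = sum(x)/n, sum(y)/n
    sxx = sum((xi - xm)**2 for xi in x)
    sxy = sum((xi - xm)*(yi - ym) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ym - b1*xm
    s_res = math.sqrt(sum((yi - (b0 + b1*xi))**2 for xi, yi in zip(x, y)) / (n - 2))
    return b0, b1, s_res

# Hypothetical calibration data at identical levels in solvent and in matrix
x = [1.0, 2.0, 4.0, 6.0, 8.0, 10.0]            # concentrations
y_solvent = [105, 198, 402, 596, 801, 1003]    # signals, solvent standards
y_matrix = [88, 170, 335, 500, 665, 830]       # signals, matrix matched

b0, b1, _ = linreg(x, y_solvent)
# Apparent concentrations: matrix signals evaluated on the solvent curve
x_app = [(yi - b0) / b1 for yi in y_matrix]
a0, a1, _ = linreg(x, x_app)
# a1 below one points to proportional signal suppression; a0 away from zero
# would point to constant bias (significance via 95 % confidence intervals)
```

With these numbers the slope `a1` comes out well below one, flagging a proportional matrix effect that a calibration in pure solvent would not compensate.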
18. In which sequence shall I measure the calibration standards?
Generally the sequence in which the calibration standards are measured should be random.
The decision on whether to measure the calibration standards at the beginning of a sample sequence, at
the beginning and at the end of the sample sequence, or randomly distributed over the sample
sequence depends on the stability of the measurement system.
ISO standard 11095:1996 specifies as general requirements that "the measurements from which the
calibration function was calculated are representative of the normal conditions under which the
measurement system operates" and "that the measurement system is in a state of control". If the
measurement system is stable throughout the whole sample sequence then all approaches will give
equal results. However the design of the measurement sequence has to be modified if any instrument
drift is expected. Such modifications could exist of repeated analyses of the calibration solutions
during the measurement sequence or the inclusion of an increased number of quality control samples
in the measurement sequence.
Evaluation of calibration measurements
19. How shall I test for linearity of the calibration?
For the purposes of this document, linearity means that the calibration is best described by a straight
line. Linearity may also mean that the estimated parameters enter the model linearly, which would
also be true for a parabola, something that is not a straight line at all.
A straight line can be described by Eq 5:

y_nk = β0 + β1·x_n + ε_nk    Eq 5

with
β0 = intercept
β1 = slope of the line
y_nk = the kth measured response at calibration level n
x_n = the concentration of the analyte at calibration level n
ε_nk = the residual for the kth measurement at calibration level n.
The residual is the difference between the measured response and the response value calculated from
the calibration function:

ε_nk = y_nk − ŷ_n
Plotting all ε_nk over ŷ_n (residuals over fitted values) results in the so-called residual plot. This plot is a very
valuable diagnostic tool. If the points are evenly distributed around a horizontal line through zero, the
straight line function is appropriate (see Figures 5 & 6).
Another, more complex, approach is the lack-of-fit test. If the lack-of-fit test is not significant, then a
straight line function describes the calibration data appropriately. Replicate measurements at each
calibration level are a prerequisite for a lack-of-fit test.
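A lack-of-fit test can be sketched as follows, using hypothetical duplicate measurements at each level. The resulting F value is compared against the tabulated critical value F(0.95; m−2, n−m); the function and data are illustrative assumptions, not from the source.

```python
import math

def lack_of_fit_F(x, y):
    """Lack-of-fit F statistic for a straight-line fit.
    x, y: lists including replicate measurements at repeated x levels."""
    n = len(x)
    xm, ym = sum(x)/n, sum(y)/n
    b1 = sum((xi-xm)*(yi-ym) for xi, yi in zip(x, y)) \
       / sum((xi-xm)**2 for xi in x)
    b0 = ym - b1*xm
    # Pure-error sum of squares: scatter of replicates around their level means
    levels = {}
    for xi, yi in zip(x, y):
        levels.setdefault(xi, []).append(yi)
    ss_pe = sum(sum((yi - sum(ys)/len(ys))**2 for yi in ys)
                for ys in levels.values())
    # Residual sum of squares around the fitted line
    ss_res = sum((yi - (b0 + b1*xi))**2 for xi, yi in zip(x, y))
    ss_lof = ss_res - ss_pe          # lack-of-fit sum of squares
    m = len(levels)                  # number of distinct calibration levels
    df_lof, df_pe = m - 2, n - m
    return (ss_lof/df_lof) / (ss_pe/df_pe)

# Hypothetical duplicates at four levels, close to y = 10x
x = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0, 8.0, 8.0]
y = [10.1, 9.9, 20.2, 19.8, 40.1, 39.9, 80.2, 79.8]
F = lack_of_fit_F(x, y)   # near zero here: no evidence of lack of fit
```

A large F relative to the tabulated critical value means the straight line leaves systematic structure unexplained, which agrees with what a curved residual plot would show.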
20. Does a correlation coefficient (r) of 0.99 indicate linearity of calibration?
No! The correlation coefficient is a measure of how much of the variability of y can be predicted from x.
An r-value of 1 indicates that y can be completely predicted from x, and a value of 0 indicates that y
cannot be predicted from x. A parabola, which is markedly not a straight line, may have a correlation
coefficient of 0.99. And the r-value may even improve by adding a quadratic term to one's calibration
function, which then is certainly not linear in our sense of the word.
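The point is easy to demonstrate numerically. The data below are a deliberately constructed, hypothetical example: the responses lie exactly on a parabola, yet the correlation coefficient exceeds 0.99.

```python
import math

# Responses lying exactly on a parabola (hypothetical data)
x = list(range(1, 11))
y = [(xi + 5)**2 for xi in x]   # clearly non-linear, no straight line here

n = len(x)
xm, ym = sum(x)/n, sum(y)/n
sxy = sum((a - xm)*(b - ym) for a, b in zip(x, y))
r = sxy / math.sqrt(sum((a - xm)**2 for a in x)
                    * sum((b - ym)**2 for b in y))
# r exceeds 0.99 despite the pronounced curvature
```

A residual plot of the straight-line fit to these data would immediately reveal the curvature that r conceals, which is why question 19 recommends the residual plot as the diagnostic tool.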
21. Which level of R² is sufficient?
One cannot define a sufficient level of R²! The closer R² is to 1, the better the quality of the predictions
made through the calibration. But certain calibration problems may never get beyond R² = 0.98, while
for others 0.998 is a sign of an error.
[Figure 5: residual plot (residuals vs fitted values) – straight line appropriate]
[Figure 6: residual plot (residuals vs fitted values) – straight line inappropriate]
22. Which information can I get from the plot of residuals?
The plot of residuals (see question 19) can show whether the assumption of linearity is met. It can
also be used to check for homo- or heteroscedasticity of the calibration data, and it is an indicator of the
residual variability.
23. What is the residual standard deviation?
The residual standard deviation is a measure of the goodness of fit of the calibration. The smaller the
residual standard deviation, the closer the measured data points are to the calculated calibration curve.
It is used to calculate the significance of the intercept and the slope.
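For a straight-line calibration with n points, the residual standard deviation is the square root of the residual sum of squares divided by n − 2. A minimal sketch with hypothetical data:

```python
import math

# Residual standard deviation of a straight-line calibration (hypothetical data)
x = [1.0, 2.0, 4.0, 6.0, 8.0]
y = [2.1, 3.9, 8.2, 11.8, 16.1]

n = len(x)
xm, ym = sum(x)/n, sum(y)/n
b1 = sum((a - xm)*(b - ym) for a, b in zip(x, y)) \
   / sum((a - xm)**2 for a in x)
b0 = ym - b1*xm
# n - 2 degrees of freedom: two parameters (intercept, slope) were estimated
s_res = math.sqrt(sum((b - (b0 + b1*a))**2 for a, b in zip(x, y)) / (n - 2))
```

`s_res` carries the units of the response and is the σ̂ that enters the confidence and prediction interval formulas in question 29.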
24. May I force the calibration curve through the origin?
If the test of significance shows that the estimated intercept is not different from zero, then the
intercept term is dropped from the calibration function and the calibration curve is assumed to
originate at x = 0 and y = 0. Otherwise the intercept term must be kept and the calibration curve is
assumed to originate at x = 0 and y = intercept.
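The significance test on the intercept is a t-test of the estimated intercept against its standard error. A sketch with hypothetical data follows; the critical t value is taken from tables and assumed here for n − 2 = 3 degrees of freedom.

```python
import math

# t-test of the intercept (hypothetical data): if |t| < t_crit the intercept
# is statistically indistinguishable from zero and the calibration curve
# may be forced through the origin.
x = [1.0, 2.0, 4.0, 6.0, 8.0]
y = [2.0, 4.1, 7.9, 12.1, 15.9]

n = len(x)
xm, ym = sum(x)/n, sum(y)/n
sxx = sum((a - xm)**2 for a in x)
b1 = sum((a - xm)*(b - ym) for a, b in zip(x, y)) / sxx
b0 = ym - b1*xm
s_res = math.sqrt(sum((b - (b0 + b1*a))**2 for a, b in zip(x, y)) / (n - 2))
se_b0 = s_res * math.sqrt(1/n + xm**2/sxx)   # standard error of the intercept
t = b0 / se_b0
t_crit = 3.182   # two-sided 95 %, 3 degrees of freedom (from tables)
force_through_origin = abs(t) < t_crit
```

With these numbers the intercept is not significant, so the fit through the origin would be justified for this particular data set.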
25. What is homo- and heteroscedasticity?
Homoscedasticity is the term for calibration data having roughly equal variability over the whole
calibration range. If the data's variability changes from one end of the range to the other, the data
are said to be heteroscedastic.
26. How do I test for homoscedasticity / heteroscedasticity?
Both can be judged from the residual plot. In the case of homoscedasticity the residuals lie more or
less within a band parallel to the x-axis. In the case of heteroscedasticity the residuals assume a fan
shape, tight at one end and spread out at the opposite end (see Figures 7 & 8).
27. Linear regression or weighted linear regression – which shall I apply?
For homoscedastic data, ordinary linear regression is appropriate. But if the data are heteroscedastic,
ordinary linear regression will result in inflated estimates of the residual standard deviation. Therefore
weighted linear regression should be used in such cases.
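A common choice is to weight each point by the inverse of its variance, 1/s_i², so that the noisy high-concentration points do not dominate the fit. A minimal sketch with hypothetical data whose scatter grows with concentration:

```python
# Weighted linear regression (weights 1/s_i^2) for heteroscedastic
# calibration data; data and per-level standard deviations are hypothetical.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [1.1, 2.0, 4.2, 7.9, 16.3]
s = [0.05, 0.10, 0.20, 0.40, 0.80]   # standard deviation grows with level

w = [1/si**2 for si in s]
sw = sum(w)
# Weighted means replace the ordinary means of the unweighted formulas
xw = sum(wi*xi for wi, xi in zip(w, x)) / sw
yw = sum(wi*yi for wi, yi in zip(w, y)) / sw
b1 = sum(wi*(xi - xw)*(yi - yw) for wi, xi, yi in zip(w, x, y)) \
   / sum(wi*(xi - xw)**2 for wi, xi in zip(w, x))
b0 = yw - b1*xw
```

With constant weights the formulas collapse back to ordinary least squares, so the weighted fit is a strict generalisation of the unweighted one.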
28. May I remove outliers?
If an outlying value can be traced back to a failure in the system (e.g. injection error, bad
chromatography, pipetting error, etc.), then it is permissible to remove it or, better yet, to repeat the
measurement in question. If such tracing does not reveal any failure, the outlying value should be
considered a real but rare incident and kept in the data set.
[Figure 7: residual plot (residuals vs fitted values) – homoscedastic data]
[Figure 8: residual plot (residuals vs fitted values) – heteroscedastic data]
29. How do I estimate confidence and prediction intervals?
They are estimated from the estimates of intercept (β̂0), slope (β̂1) and residual standard
deviation (σ̂) according to Eq 6:

y_C = β̂0 + β̂1·x_C ± t(p, n−2) · σ̂ · √( 1/n + (x_C − x̄)² / Σ(x_n − x̄)² )    Eq 6

with
y_C = upper or lower bound of the confidence interval at x_C
x_C = value of x for which the confidence interval is computed
β̂0 = estimate of the intercept
β̂1 = estimate of the slope
σ̂ = estimate of the residual standard deviation
t(p, n−2) = Student's t for probability p and n−2 degrees of freedom
n = number of observations
x̄ = average of all x-values of the calibration
x_n = individual x-values of the calibration
The confidence interval, or in the case of regression analysis better the confidence band, defines the region
in which, with a certain probability (usually 95%), the regression line would be found if the calibration
were repeated under similar conditions. As such, the confidence band is of minor interest. More
important for the task of calibration is the prediction band, which is wider than the confidence band.
y_P = β̂0 + β̂1·x_P ± t(p, n−2) · σ̂ · √( 1 + 1/n + (x_P − x̄)² / Σ(x_n − x̄)² )    Eq 7

The subscript C (confidence) was replaced by the subscript P (prediction); otherwise the same
definitions as above apply.
The projection of the outer bounds of this prediction band onto the y-axis defines the range of values
which could reasonably be expected if one were to predict a new y for a new x.
The above formulas are for the ordinary least squares approach. If a weighted least squares approach
has to be used because of heteroscedasticity the weighted equivalents of all the estimates are used in
Eq 6 and 7.
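Eq 6 and 7 can be sketched directly in code. The data are hypothetical and the Student's t value is taken from tables for n − 2 = 4 degrees of freedom; the helper function `bands` is an assumption introduced for illustration.

```python
import math

# Confidence and prediction intervals for an ordinary least-squares
# calibration line (Eq 6 and Eq 7); hypothetical data.
x = [1.0, 2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.9, 4.1, 8.0, 12.2, 15.8, 20.1]

n = len(x)
xm, ym = sum(x)/n, sum(y)/n
sxx = sum((a - xm)**2 for a in x)
b1 = sum((a - xm)*(b - ym) for a, b in zip(x, y)) / sxx
b0 = ym - b1*xm
s = math.sqrt(sum((b - (b0 + b1*a))**2 for a, b in zip(x, y)) / (n - 2))
t = 2.776   # two-sided 95 %, 4 degrees of freedom (from tables)

def bands(x0):
    """(confidence half-width, prediction half-width) at x0, per Eq 6 and 7."""
    g = 1/n + (x0 - xm)**2 / sxx
    return t*s*math.sqrt(g), t*s*math.sqrt(1 + g)

conf_hw, pred_hw = bands(5.0)
# The prediction half-width is always the wider of the two because of the
# extra "1 +" term under the square root in Eq 7
```

Both half-widths are smallest near x̄ and grow towards the ends of the calibration range, which is why extrapolating beyond the calibrated range inflates the uncertainty so quickly.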
General
30. Is there any internationally harmonised document on calibration?
ILAC/OIML Guide on Calibration
ILAC Guide G24:2007 / OIML D 10:2007 "Guidelines for the determination of calibration intervals of
measuring instruments", ILAC, Silverwater, Australia, 2007
IUPAC Recommendations 1998
K. Danzer, L.A. Currie (1998), "Guidelines for Calibration in Analytical Chemistry – Part 1.
Fundamentals and single component calibration", Pure&Appl.Chem., 70: 993-1014
ISO Guide
ISO Guide 32:1997 "Calibration in analytical chemistry and use of certified reference materials", ISO,
Geneva, Switzerland, 1997
ISO Standard
ISO 8466-1:1990 "Water quality – Calibration and evaluation of analytical methods and estimation of
performance characteristics. Part 1: Statistical evaluation of the linear calibration function", ISO,
Geneva, Switzerland, 1990
ISO Standard
ISO 11095:1996 "Linear calibration using reference materials", ISO, Geneva, Switzerland, 1996
31. Where can I get guidance on calibration?
LGC best practice guide for calibration design
LGC Document "Preparation of calibration curves – a guide to best practice" (2003)
L. Cuadros-Rodríguez, L. Gámiz-Gracia, E.M. Almansa-López, J.M. Bosque-Sendra (2001)
"Calibration in chemical measurement processes. II. A methodological approach", Trends in Anal.
Chem., 20: 620-636