
FAQ on calibration


Rationale

From experience gained in inter-laboratory comparison studies, be they proficiency tests or method validation studies by collaborative trial, we know that the importance of instrument calibration and its effect on analysis results is frequently underestimated. Participants in some of the studies expressed the wish for more guidance on instrument calibration, even though a number of international guidance documents are already available. This may be explained by the difficult language used in some of the guides, or by a lack of knowledge of where to find practical and easily understandable guidance. The more than five million hits returned at the time of writing by one of the most prominent internet search engines for the term "instrument calibration" do little to improve the situation. We therefore felt it necessary to prepare this document, which addresses some aspects of standard preparation and instrument calibration for the determination of polycyclic aromatic hydrocarbons (PAHs) and mycotoxins in food that seem to cause difficulties for some operators.

As in every question-and-answer scenario, a question can be asked very specifically, resulting in a similarly specific answer, or rather generally, in which case the answer will also be rather general. This can lead to a dilemma: the answer to a question may not be the one of interest, either because it is too specific and not applicable to the (slightly different) question one actually has, or because it is too general and does not touch the specific aspect one had in mind.

In other words:

"If a model is simple, it likely will be wrong; if it is complex, it surely is impractical."

Applied to this guide, the compromise was to address both relevant general issues and a few specific ones that are sometimes encountered. The frequently asked questions (FAQ) format was chosen on purpose, because it remains open to address and include any question regarding standard preparation and instrument calibration that might come up in the future.


Index

Standard preparation
1. Where do I get reference materials for PAH analysis from?
2. Which level of purity of reference materials is acceptable?
3. Which are the advantages of gravimetric standard preparation?
4. Which type of balance do I need for the preparation of calibration standards?
5. Is serial dilution of a standard solution for the preparation of calibration standards acceptable?
6. How shall I store PAH standard solutions?
7. Which containers shall I use for storage of standard solutions?
8. How shall I estimate the shelf life of my standard preparations?
9. Which type of volumetric glassware may I use for the preparation of calibration standards?
10. How do I verify the concentration of my standard preparations?
11. How many points do I need for a calibration curve?
12. How many replicates per calibration point?
13. Why shall the concentration levels of the calibration standards be equidistant?
14. Which concentration range does the calibration have to cover?
15. Which type of internal standard shall I use?
16. When do I need to prepare matrix matched calibration standards?
17. How do I determine matrix effects?
18. In which sequence shall I measure the calibration standards?

Evaluation of calibration measurements
19. How shall I test for linearity of the calibration?
20. Does a correlation coefficient (r) of 0.99 indicate linearity of calibration?
21. Which level of R² is sufficient?
22. Which information can I get from the plot of residuals?
23. What is the residual standard deviation?
24. May I force the calibration curve through the origin?
25. What is homo- and heteroscedasticity?
26. How do I test for homoscedasticity / heteroscedasticity?
27. Linear regression or weighted linear regression – which shall I apply?
28. May I remove outliers?
29. How do I estimate confidence and prediction intervals?

General
30. Is there any internationally harmonised document on calibration?
31. Where can I get guidance on calibration?


Standard preparation

1. Where do I get reference materials for PAH analysis from?

A number of suppliers of chemicals have PAH standards in their assortment. A non-exhaustive list of suppliers, together with links to other sources of information, is given below:

The International Society for Polycyclic Compounds (ISPAC) has on its website a list of suppliers of

polycyclic aromatic hydrocarbons and heterocyclic aromatic compounds both neat and in solution:

ISPAC Standards

A searchable database of suppliers of different chemicals can be found on the homepage of Chemindustry (www.chemindustry.com). The following link gives an example of a search for suppliers of benzo[a]pyrene (neat and in solution):

ChemIndustry: example of search for benzo[a]pyrene

A similar searchable database, which besides the names of different suppliers also returns some information on the product (e.g. packaging size), can be found on the webpage www.chemexper.com

chemexper.com

A large collection of PAH reference substances, among them different certified reference materials, is included in the 2008/2009 catalogue of LGC. It contains single-substance reference materials (neat and in solution, native and labelled) as well as PAH mixtures.

LGC standards

- Important suppliers of reference materials for PAHs in Europe (non-exhaustive list): ALFA Aesar, Chiron, Dr. Ehrenstorfer, SIGMA Aldrich, VWR

- Suppliers of certified reference materials (CRMs) for PAHs: the Institute for Reference Materials and Measurements (IRMM), LGC, the National Institute of Standards and Technology (NIST)


2. Which level of purity of reference materials is acceptable?

A purity of 100 % would be desirable, but in reality most of the target PAHs (15+1 EU priority PAHs) are available on the market at purities above 95 %. Hence the operator has to choose a reference material with a purity that is suitable for the particular task. However, care must be taken that impurities do not interfere with the target analytes.

The purity of the reference substances shall be considered in the calculation of the standard

concentrations. The uncertainty of the purity shall be included in the measurement uncertainty

estimate.

3. Which are the advantages of gravimetric standard preparation?

Weighing is more precise than handling volumes, which normally results in smaller uncertainties. Handling low volumes of liquids is difficult due to the influence of many factors, such as surface tension, and frequently leads to bias.

For gravimetric standard preparation it shall be noted that the uncertainty from weighing increases

with decreasing amounts of weighed substance. This has consequences for the selection of the type of

balance and the weighing procedure applied.

A prerequisite for gravimetric standard preparation is thermal equilibrium of the balance and all

chemicals and consumables which are used for the standard preparation. Thermal equilibration might

take a couple of hours especially in case of large solvent volumes.

Before starting with gravimetric standard preparation make sure that the balance is working properly,

by applying suitable check weights.

4. Which type of balance do I need for the preparation of calibration standards?

An analytical balance with a readability of 0.1 mg, or of 0.01 mg for weighing substances at levels as low as about 30 mg, will be fit for purpose, meaning that the uncertainty of weighing is at an acceptable level.

The US Pharmacopeia [1] defines the minimum permissible weight of a balance as a load that will

give a relative uncertainty of less than 0.1%. As a rule of thumb the minimum weight can be estimated

for a balance by multiplying the readability of the balance (e.g. 0.1 mg) by a factor between 3000 and 5000.


However, the applicability of this rule of thumb depends on the precision of the balance and has to be evaluated experimentally according to Eq 1:

$$3 \times \mathrm{Stdev}_{\text{of 10 measurements}} \leq 0.0010 \times \text{amount weighed}$$   (Eq 1)

where $\mathrm{Stdev}_{\text{of 10 measurements}}$ is the standard deviation of 10 replicate measurements of the load.

It has to be noted that the minimum weight corresponds only to the amount of substance weighed and

does not include the tare weight of the weighing vessel!
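As an illustration, the following minimal sketch applies Eq 1 to ten replicate readings of a small check load. All numerical values are invented for illustration; replace them with your own balance data.

```python
import statistics

# Ten replicate balance readings of a small check load, in grams
# (illustrative values only; use your own measurements).
readings = [0.03002, 0.02999, 0.03001, 0.03000, 0.02998,
            0.03003, 0.03001, 0.02999, 0.03000, 0.03002]

load = statistics.mean(readings)        # amount weighed, in grams
stdev = statistics.stdev(readings)      # sample standard deviation

# Eq 1: the load satisfies the 0.1 % criterion if 3 x stdev <= 0.0010 x load.
if 3 * stdev <= 0.0010 * load:
    print(f"Load {load * 1000:.1f} mg is above the minimum weight.")
else:
    min_weight = 3 * stdev / 0.0010     # smallest acceptable net load
    print(f"Increase the net load to at least {min_weight * 1000:.1f} mg.")
```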

In any case, care must be taken that the balance is calibrated and working according to its specifications. Provisions on environmental conditions must also be respected; e.g. too low air humidity leads to electrostatic problems and might cause bias.

Further information on the use of balances in standard preparation can be found in the open literature, e.g. in a paper by C. Burgess and R.D. McDowall [2].

5. Is serial dilution of a standard solution for the preparation of calibration standards acceptable?

No! Two aspects have to be taken into account in standard preparation by serial dilution. The more important aspect is probably the inability to identify a biased standard preparation.

Figure 1 A presents a standard preparation scheme where bias in the preparation of dilution 1 (D1)

from the stock standard solution (S) cannot be identified from the measurement results of the

calibration standards (CS1 to CS5). Even worse would be scheme B which includes a cascade of

dilutions of the calibration standards. Besides the risk of unidentified bias it provides high uncertainty

of the concentration of standard CS5, which is prepared in six dilution steps. According to the law of error propagation, the uncertainty of CS5 equals the square root of the sum of the squared uncertainties of the preparation steps S to CS5, which is of course larger than the uncertainty of any other calibration standard shown in Figure 1.
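To illustrate, the sketch below combines assumed relative standard uncertainties for the stock preparation and the six dilution steps of scheme B in quadrature; all numerical values are illustrative assumptions.

```python
import math

# Illustrative relative standard uncertainties (as fractions) of each
# preparation step in scheme B: stock S, then six serial dilution steps.
step_uncertainties = [0.005, 0.004, 0.004, 0.004, 0.004, 0.004, 0.004]

# Law of propagation of uncertainty for multiplicative dilution steps:
# relative uncertainties combine as the root of the sum of squares.
combined = math.sqrt(sum(u**2 for u in step_uncertainties))
print(f"Relative uncertainty of CS5: {combined:.2%}")  # larger than any single step
```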

FAQ on calibration

6

Figure 1: Different schemes for the preparation of calibration standards. Each arrow represents one dilution step (S: stock standard solution; D1: dilution 1; CS1 to CS5: calibration standard solutions). [Diagram of schemes A, B and C: in A all calibration standards derive from a single intermediate dilution D1; in B the calibration standards form a cascade of serial dilutions; in C each calibration standard is prepared from an independent dilution of the stock.]

The most appropriate of the three schemes is shown in Figure 1 C. The calibration standard

solutions are prepared from independent dilutions of the stock standard solution. By doing so,

an error in the preparation of an intermediate dilution (D1 to D5) should be detectable in the

measurement results of the calibration standards.

Duplicating scheme C with two independent stock standard solutions provides the highest level of

information about the correctness of the calibration standards.

In practice the preparation of calibration standards needs thorough planning. The handling of low volumes or low masses shall be avoided as far as possible. In the case of PAHs, limitations are encountered in the preparation of the stock standard solutions, caused by the low solubility of some PAHs (e.g. dibenzopyrenes) in the majority of organic solvents.

6. How shall I store PAH standard solutions?

PAH standard solutions shall be stored in amber glassware in the dark, due to potential degradation of PAHs by UV light. Room temperature (about 20 °C) is recommended for the storage of PAH standard solutions by a number of suppliers. Opened commercial standards and own standard preparations should be stored cooled to avoid solvent losses. Do not put PAH standard solutions in the freezer, as the solubility of some PAHs might be affected at low temperatures.


7. Which containers shall I use for storage of standard solutions?

Amber glassware with Teflon®-lined closures should be used. As a general rule, the headspace above the standard solution shall be as small as possible. It is also recommended to divide stock standard solution preparations into several units of small volume for storage, in order to preserve the composition of the portions that are not currently in use.

8. How shall I estimate the shelf life of my standard preparations?

The shelf life of a product is the time during which the average characteristics of the product remain within an approved specification. Translated to standard preparation, this means that neither the change of the standard concentration nor the associated uncertainty may exceed certain predefined limits. This is fine in theory, but causes several problems in practical implementation.

The first challenge is to define the maximum tolerable change of concentration, which might be caused by degradation of the analyte, loss of solvent, etc. The question to be answered is how much the change in the composition of the standard preparation may contribute to the combined measurement uncertainty. There is no general guidance on this; an appropriate value has to be set on a case-by-case basis. However, a relative change of the standard concentration of 1 % to 2 % could be acceptable.

The second problem is the identification of changes in practice and, related to that, the set-up of an experimental plan to prove agreement with the predefined specifications. At the beginning of such studies little knowledge about the stability of the standard solutions is available. Hence the shelf life has to be estimated from experience with similar substances or from information in the literature. The study has to cover at least this first estimate of the shelf life.

The tested standard solution must be independent of the standard solution that is used for instrument calibration, in order to identify any changes and hence estimate its shelf life. Laboratories usually use one standard preparation at a time, as standard solutions are expensive. Requesting a fresh standard solution for each set of shelf life experiments would therefore be illusory; in addition, the preparation of fresh standard solutions would make the determination of the shelf life superfluous. It would be more economical to apply a single, second, independent standard solution over the whole period of the shelf life experiments. However, this does not provide the requested information, because in the case of significant differences it is not possible to trace back which of the two standard solutions has changed. Therefore it might be worth looking for alternatives.

One possibility is to apply in the shelf life experiments a chemical, serving as internal standard, that is available in large amounts at low cost. This chemical serves as a reference point. The reference-point solution has to be prepared freshly for each set of shelf life experiments. The low cost allows using large quantities in the standard preparation, which lowers the risk of bias. In the experiments, relative response factors between the analyte and the reference chemical are determined, and any changes are monitored. The selection of the reference chemical depends on the properties of the analyte.

The integrity/stability of standard preparations has to be monitored over the whole shelf life of the

standard preparation. Control charts shall be applied for this purpose. Repeated measurements shall be

performed at each control point in order to estimate the variability of the measurements.

The shelf life of the standard preparation can be shortened or extended depending on the experimental results.
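As a sketch of such monitoring, the code below computes relative response factors against a reference chemical and checks a new value against control limits of ±3 standard deviations derived from baseline measurements; the function name and all values are illustrative assumptions, not a prescribed procedure.

```python
import statistics

# Relative response factor (RRF) between the analyte and a freshly
# prepared reference chemical, monitored over the shelf-life study.
def rrf(analyte_area, analyte_conc, ref_area, ref_conc):
    return (analyte_area / analyte_conc) / (ref_area / ref_conc)

# Baseline RRFs from the first control point (illustrative values).
baseline = [1.021, 1.015, 1.019, 1.018, 1.022]
mean, s = statistics.mean(baseline), statistics.stdev(baseline)

# Control-chart limits at +/- 3 standard deviations around the baseline mean.
lower, upper = mean - 3 * s, mean + 3 * s

new_rrf = rrf(analyte_area=50400, analyte_conc=10.0, ref_area=49100, ref_conc=10.0)
status = "in control" if lower <= new_rrf <= upper else "out of control - investigate"
print(f"RRF = {new_rrf:.3f} ({status})")
```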

9. Which type of volumetric glassware may I use for the preparation of calibration standards?

The contribution of glassware tolerance to the global uncertainty of the method is very low, but not negligible. Class A glassware according to ISO standard 1042:1983 shall be applied. For light-sensitive substances the glassware shall be made of amber glass. The maximum tolerances for different volumes are also given in ISO standard 1042:1983 (for instance, ± 0.04 ml for a 25 ml flask). However, it has to be pointed out that the handling (filling, emptying, parallax error) of volumetric glassware will probably contribute to the total uncertainty of the standard preparation to a larger extent than the tolerances according to ISO standard 1042:1983. Gravimetric standard preparation is considered superior to volumetric standard preparation with regard to precision.

10. How do I verify the concentration of my standard preparations?

The verification of the standard concentration is crucial for assuring the quality of analysis results. In a limited number of cases the concentration of standard preparations can be verified by application of reference methods; e.g. the concentration of aflatoxin standard solutions in methanol/water can be verified by photometry. More commonly, the concentration of a particular standard preparation can only be verified against other standard preparations. Best practice in that respect would be verification against

a solution with certified values for the analyte(s). Such certified reference materials (CRMs) are

frequently not available.

Hence the concentration of the standard preparation shall be evaluated against an independent standard

preparation.

The minimum requirement is to verify the concentration of a new standard preparation against the

concentration of the preceding standard preparation.


Bracketing calibration, as detailed in ISO standard 11095:1996, shall preferably be applied for the verification measurements, as this technique usually yields greater accuracy than linear calibration.

11. How many points do I need for a calibration curve?

Before answering this question, the purpose of the calibration experiment has to be defined. One has to distinguish between the calibration of a measurement system and the check of the validity of the calibration of a measurement system.

Both topics are treated in depth by international standards such as ISO standard 11095:1996 and the

IUPAC guideline "Guidelines for calibration in analytical chemistry".

The first case, called in ISO standard 11095:1996 the "basic method", is usually applied for the

estimation of linear calibration functions. It encompasses the measurement of a certain number of

reference materials (calibration standards) at different concentration levels.

ISO standard 11095:1996 sets the minimum number of calibration points/levels for the basic calibration method at three. However, it also says that the number of levels shall be increased for an initial assessment of the calibration function. This initial assessment is equal to operations performed

during method validation to assess the linear range of a measurement method. The EURACHEM

Guide "The Fitness for Purpose of Analytical Methods" specifies for that purpose at least six

concentration levels plus blank. The above mentioned IUPAC guide does not specify any concrete

number of calibration levels. Commission Decision 2002/657/EC stipulates at least 5 concentration

levels including zero for the construction of a calibration curve.

Other documents might lay down a different number of calibration levels. For example, ISO standard 15302:2007 specifies four calibration levels, whereas the LGC/VAM guide "Preparation of Calibration Curves" defines seven calibration levels, including blank, as the minimum requirement for an initial assessment of the calibration function. ISO 8466-1:1990 even demands ten calibration levels.

As can be seen, the design of calibration experiments and the number of calibration levels depend very much on the purpose of the experiment and on existing knowledge. The linearity of the instrument response was probably tested for the analysis method that became an ISO standard; hence ISO regarded four calibration levels as sufficient for the estimation of the calibration function. Less knowledge about the shape of the calibration function requires measurements at more concentration levels. The inclusion of blank or zero levels into the calibration design is required if the

blank or zero sample produces a signal that is of the same nature as the signal produced by the analyte.

If the blank or zero sample does not produce any signal it can be excluded from the calibration

experiments.


In general, three concentration levels are required to fit a non-linear function, and at least one more calibration level is needed for the statistical assessment of the calibration model. Increasing the number of calibration levels and the number of replicate analyses per level reduces the width of confidence and prediction intervals. However, the return in terms of narrowing confidence intervals diminishes with the number of calibration levels. Exceeding ten calibration levels does not provide

any additional benefit. Figure 2 shows the confidence intervals for simulated calibration experiments

performed at different numbers of calibration levels. Each calibration level was measured once. The

underlying data are displayed in Table 1.

Table 1: Data of simulated calibration experiments including different numbers of calibration levels

Number of calibration points:   2         3         4         7         9
Level   Response
1                               1.05      1.05      1.05      1.05      1.05
2                               -         -         -         2         2
3                               -         -         -         2.8       2.8
4                               -         -         3.85      -         3.9
5                               -         5.1       -         5.1       5.1
6                               -         -         -         5.85      5.85
7                               -         -         7.1       7.1       7.1
8                               -         -         -         -         8.05
9                               8.8       8.8       8.8       8.8       8.8
Slope(x) + intercept:           0.9688x   0.9688x   0.9837x   0.9871x   0.9950x
                                +0.0813   +0.1396   +0.0357   +0.0178   -0.0139
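The straight-line fits in Table 1 can be reproduced with a few lines of code. The following sketch uses numpy, with the level-to-design assignment as read from the table; it is an illustration, not part of the original experiment.

```python
import numpy as np

# Calibration designs from Table 1: (concentration level, response) pairs.
designs = {
    2: [(1, 1.05), (9, 8.8)],
    3: [(1, 1.05), (5, 5.1), (9, 8.8)],
    4: [(1, 1.05), (4, 3.85), (7, 7.1), (9, 8.8)],
    7: [(1, 1.05), (2, 2), (3, 2.8), (5, 5.1), (6, 5.85), (7, 7.1), (9, 8.8)],
    9: [(1, 1.05), (2, 2), (3, 2.8), (4, 3.9), (5, 5.1),
        (6, 5.85), (7, 7.1), (8, 8.05), (9, 8.8)],
}

# Ordinary least-squares straight line for each design.
for n, points in designs.items():
    x, y = np.array(points).T
    slope, intercept = np.polyfit(x, y, 1)
    print(f"{n} levels: y = {slope:.4f}x {intercept:+.4f}")
```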


Figure 2: Confidence intervals for simulated calibration experiments at two (black dashed line), three

(red dashed line), four (green dashed line), seven (purple dashed line) and nine (grey dashed line)

concentration levels (concentration points). The lines in the middle represent the calibration curves

corresponding to the different scenarios.

The check of the validity of a calibration system has to be clearly distinguished from the initial

calibration. This procedure is based on the information gained in an initial calibration experiment. ISO

standard 11095:1996 applies the term "Control method" for the check of the validity of a calibration

system. At least two, preferably three calibration levels are used to monitor via control charts the

validity of the calibration function, and to detect any shifts or errors.

12. How many replicates per calibration point?

ISO standard 11095:1996 demands at least two replicate analyses per calibration level and recommends as many as possible. At least two replicate analyses are necessary to evaluate the calibration for constancy of the residual standard deviation. This information is needed to decide which regression model is most appropriate (see below).


Increasing the number of replicate analyses follows, like the number of calibration levels, the law of diminishing returns. Hence, more than five replicate analyses per calibration level provide little additional benefit.

NOTE: A very important point to consider is that all performance data associated with a standard method are based on the calibration procedure described therein. If you deviate from this calibration procedure, it is your responsibility to demonstrate that the modified calibration procedure gives equivalent results.

13. Why shall the concentration levels of the calibration standards be equidistant?

The reason is that the higher the concentration of a calibration standard, the more weight it carries in the calculation of the calibration curve (this is called leverage). As a result, the calculated slope and intercept might be influenced disproportionately by one data point.

The effect is demonstrated with a simulated calibration experiment. In Figure 3, each calibration level corresponds to double the concentration of the next lower level. Two data points of the example, corresponding to the highest concentration level and to one concentration level at the lower end of the concentration range, were manipulated, one at a time, and calibration curves were determined by linear regression. In each of the experiments one data point was given a relative offset of -20 %. The respective data points are indicated by bold dots.

The effect of the offset of the data point at the lower end of the calibration range (green dot) on the regression curve is marginal. The contrary is the case if the data point for the highest concentration level is biased. The signal value of this data point was changed from about 800 to 600. As a consequence, both the slope and the intercept of the calibration curve change significantly.

This effect is based on the principle of the applied regression method, which aims to minimise the sum of the squared residuals. Since the residual (in absolute signal value) caused by the relative offset (20 %) is much higher at the upper end of the calibration range than at the lower end, the data point at the upper end of the calibration range gets a higher weight, as mentioned before. Such relative offsets are caused in practice by e.g. pipetting mistakes.

Figure 3: Simulated calibration experiments with a relative offset of -20 % of one data point in each experiment. The offset of the red dot (●) at the higher level has a much bigger influence on the resulting red calibration curve (—) than the offset of the green dot (●) on the resulting green calibration curve (—). [Plot: signal (0 to 800) versus analyte concentration (0 to 80).]

It has to be stressed that the use of calibration designs in which each standard concentration is a multiple of the next lower concentration is strongly discouraged, although such designs are frequently found in practice.

The difference in effect on the regression curve of one biased calibration point is displayed in Figure 4

both for a set of six equidistant concentration levels and a set of six unevenly distributed concentration

levels (multiplication factor = 2). The offset of the data point at the highest concentration level has less

influence on the regression curve in the calibration with equidistant concentration levels than with

unevenly distributed concentration levels.
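The leverage effect can be reproduced numerically. The following sketch fits straight lines to a noise-free signal (an assumed true slope of 10) after biasing the highest point by -20 %, for an approximately equidistant design and for a design in which each level doubles the previous one; all concentration values are illustrative. The doubling design gives the top point the highest leverage, pulling the fitted slope further from its true value.

```python
import numpy as np

def fit_with_top_offset(x, slope=10.0):
    """Fit a straight line after biasing the highest point's signal by -20 %."""
    y = slope * np.asarray(x, dtype=float)   # ideal, noise-free signals
    y[-1] *= 0.80                            # -20 % relative offset at the top level
    return np.polyfit(x, y, 1)               # returns (slope, intercept)

equidistant = [13, 26, 40, 53, 66, 80]       # roughly evenly spaced levels
geometric   = [2.5, 5, 10, 20, 40, 80]       # each level doubles the previous one

for name, x in [("equidistant", equidistant), ("geometric", geometric)]:
    b1, b0 = fit_with_top_offset(x)
    print(f"{name:12s}: slope {b1:.2f} (true 10.00), intercept {b0:+.1f}")
```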

Figure 4: Effect of one biased calibration point (at concentration level 80, offset of signal = -20 %) on the regression curves of calibration experiments with equidistant (pink) and unevenly distributed (blue) analyte concentration levels. [Plot: signal (0 to 800) versus analyte concentration (0 to 100), with linear regression lines for both designs.]


14. Which concentration range does the calibration have to cover?

The calibration shall cover at least the content/concentration range in which you will need to report results. The calibration range also defines the working range of the analysis method. The calibration standards have to be at concentration levels corresponding to those of the ready-to-measure/inject sample.

As a result, the calibration range can be very narrow (e.g. around a legislative limit), provided the interest concerns only this small working range. The upper concentration that a calibration experiment may span is not defined. However, factors such as homo-/heteroscedasticity (see below) shall be taken into account in the design of the experiments. As a rule of thumb, the ratio between the concentrations of the highest and lowest concentration levels shall not exceed a factor of 10 to 20.

Occasionally the analyte content of test samples will exceed the concentration range the instrument is calibrated for. In that respect, caution is required when "simply" diluting the test sample extract to bring it to a concentration level that is covered by the instrument calibration and re-analysing it. This may be possible in many cases, but is per se not applicable to all analysis methods, due to the alteration of matrix effects. However, where shown by experiments to be appropriate, a dilution can be made.


15. Which type of internal standard shall I use?

The most important properties of a suitable internal standard are:

• the internal standard must behave the same as, or at least very similarly to, the analyte in question;

• the internal standard must not be present in the sample itself, otherwise the interpretation of the internal standard data can be jeopardised;

• the concentration of the internal standard added to the sample shall preferably be in the middle of the range of expected analyte concentrations.

There are different options for the choice of internal standards. The applicability of the different

possibilities depends on the purpose of the internal standard and the applied detection system.

For example, if the analysis method comprises chromatography with optical detection (such as fluorescence or UV absorption), the chosen internal standard has to behave chemically and physically very similarly to the analyte (e.g. in extraction and clean-up steps), but must be chromatographically resolved from the analyte. Often analogues of the actual analyte are used for this purpose.

Structural isomers of target analytes (e.g. benzo[b]chrysene) are applied for the determination of PAHs in food by high performance liquid chromatography with fluorescence detection (HPLC-FLD). Another option is the application of fluorinated analogues of the target PAHs, because their chemical properties are very similar and chromatographic separation can easily be achieved. The same holds true for deuterium-substituted PAHs, which in HPLC also show slightly different retention characteristics compared to the native compounds.

In the field of mycotoxins aflatoxicol (a metabolite of aflatoxins) is used as internal standard for the

determination of aflatoxins. Structurally similar substances have also been proposed when a

derivatisation is required and the analogue (internal standard) must react in the same manner as the

analyte. Examples are the use of verrucarol for the determination of fusarium toxins (for GC methods),

squaric acid for the determination of moniliformin (for HPLC-FL methods) or de-epoxy

deoxynivalenol (DOM-1) for GC methods.

If the chosen internal standard has a very different retention time, and therefore most likely rather different chemical behaviour (e.g. in terms of polarity), it is likely that it also behaves differently from the analyte during extraction or clean-up. For this reason, close structural analogues of the analyte are preferably used.


In the case of chromatography coupled to mass selective detection, the substances of choice are isotope-labelled analogues of the analyte. This allows the detection of both substances (the analyte and the labelled internal standard) at the same or a very similar retention time, which is necessary for compensating for matrix effects. The choice between deuterated and 13C-labelled substances needs to take several considerations into account.

The differently labelled substances might show significant physico-chemical differences. Per-deuterated substances, as commercialised for some PAHs, have, as mentioned above, different retention characteristics compared to the native compounds, which might cause problems when it comes to compensating for matrix effects in mass spectrometry. The possibility of deuterium-hydrogen exchange cannot be excluded with deuterated compounds. Also, the loss of deuterium atoms in chemical reactions of the analyte, such as derivatisation reactions, might make it difficult to distinguish between the mass spectrometric signals of the native compound and the labelled analogue. This phenomenon is encountered in the determination of acrylamide by GC-MS after chlorination and consecutive dehydrochlorination: the hydrogen isotope clusters of some fragment ions of the labelled and native acrylamide overlap partially, which makes them unsuitable for quantitative analysis.

13C-labelled compounds do not present such problems. However, the costs of such labelled substances are substantially higher than for deuterated substances, and their availability is limited.

16. When do I need to prepare matrix matched calibration standards?

A matrix matched calibration is needed in those cases where the matrix (even after clean-up procedures) has an influence on the signal obtained for the analyte during measurement. Many analysis systems are sensitive to matrix effects, e.g. LC-MS or GC-MS. Fluorescence detection can also be subject to matrix influences (e.g. fluorescence quenching). However, care must be taken that the matrix used to prepare the calibrant is sufficiently well matched to the matrix of the sample.

Isotope dilution with isotope labelled analogues of the target analyte is frequently applied to

compensate for matrix effects. The basic assumption with this technique is that relative responses

between the analyte and the labelled analogue stay constant.


17. How do I determine matrix effects?

Matrix effects can be identified from calibration curves obtained with matrix matched calibration standards and with calibration solutions in solvent. Matrix effects are encountered when the intercepts and/or the slopes of the regression curves for the two sets of calibration solutions are significantly different from each other. Ignoring these facts would lead, in the former case, to constant bias and, in the latter case, to proportional bias.

The procedure to identify matrix effects is the same as to estimate a recovery function.

In the first step a calibration curve is constructed by linear regression with the calibration standards in

solvent solution. In the next step another calibration curve is constructed from the measurement data

of the matrix matched calibration standards.

Before proceeding, it must be ensured that the precision of the two calibration curves is comparable. Otherwise, any significant difference between the calibration curves might be hidden by the different levels of precision. This is accomplished by testing the residual standard deviations of the two calibration curves for significant differences (with an F-test) at the 99 % confidence level. The number of degrees of freedom for each calibration experiment is N − 2 (with N = number of data points).

Given that no significant difference between the residual standard deviations was identified, the measurement data (y-values) of the matrix matched calibration are applied to the calibration function obtained with the calibration standards in solvent, and the corresponding concentration values (x-values) are calculated. These values are called in the following "apparent concentration values". In the next step, a linear regression is performed on the concentration data of the calibration standard solutions in solvent (x-values) and the apparent concentration values (used as signal data, i.e. y-values). This regression curve contains the information on matrix effects. A slope different from one indicates potential concentration-proportional signal enhancement or suppression. An intercept different from zero indicates concentration-independent bias. However, as the regression is based on a particular data set, the question has to be answered whether the deviations from the ideal values (slope = 1, intercept = 0) are significant or just random, as a consequence of the variability in the limited number of data points. To answer this question, the confidence intervals (95 % confidence level) of the regression parameters have to be determined. If the confidence interval of the slope includes the value one, and that of the intercept the value zero, then it can be concluded that there is no statistically significant difference between the calibration with calibration standards in solvent solution and the matrix matched calibration.
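The following sketch outlines this procedure (F-test of the residual standard deviations, apparent concentrations, regression, and confidence intervals of slope and intercept) with numpy and scipy; all calibration data are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def residual_sd(x, y):
    """Straight-line fit and residual standard deviation (N - 2 degrees of freedom)."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return slope, intercept, np.sqrt(np.sum(resid**2) / (len(x) - 2))

# Illustrative data: the same concentrations measured in solvent and
# as matrix-matched standards (replace with your own measurements).
x = np.array([10, 20, 30, 40, 50, 60], dtype=float)
y_solvent = np.array([101, 205, 298, 402, 497, 603], dtype=float)
y_matrix = np.array([112, 226, 330, 441, 548, 662], dtype=float)

b1, b0, s_solv = residual_sd(x, y_solvent)
_, _, s_mat = residual_sd(x, y_matrix)

# Step 1: F-test (99 % confidence) that the two residual SDs are comparable.
F = max(s_solv, s_mat)**2 / min(s_solv, s_mat)**2
df = len(x) - 2
assert F < stats.f.ppf(0.99, df, df), "precisions differ - comparison not valid"

# Step 2: apparent concentrations of the matrix-matched standards, read off
# the solvent calibration function, then regressed against the true values.
apparent = (y_matrix - b0) / b1
m, c = np.polyfit(x, apparent, 1)
resid = apparent - (m * x + c)
s = np.sqrt(np.sum(resid**2) / df)
t = stats.t.ppf(0.975, df)                      # 95 % confidence level
sxx = np.sum((x - x.mean())**2)
ci_slope = t * s / np.sqrt(sxx)
ci_intercept = t * s * np.sqrt(1 / len(x) + x.mean()**2 / sxx)

# Matrix effects are indicated if 1 lies outside the slope CI (proportional
# bias) or 0 lies outside the intercept CI (constant bias).
print(f"slope {m:.3f} +/- {ci_slope:.3f}, intercept {c:.2f} +/- {ci_intercept:.2f}")
```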

Matrix matched calibrations are an alternative to isotope dilution to compensate for matrix effects

when using mass spectrometry for measurement.


18. In which sequence shall I measure the calibration standards?

Generally, the sequence in which the calibration standards are measured should be random. The decision whether to measure the calibration standards at the beginning of a sample sequence, at the beginning and at the end of the sample sequence, or randomly distributed over the sample sequence depends on the stability of the measurement system.

ISO standard 11095:1996 specifies as general requirements that "the measurements from which the calibration function was calculated are representative of the normal conditions under which the measurement system operates" and "that the measurement system is in a state of control". If the measurement system is stable throughout the whole sample sequence, then all approaches will give equal results. However, the design of the measurement sequence has to be modified if any instrument drift is expected. Such modifications could consist of repeated analyses of the calibration solutions during the measurement sequence, or the inclusion of an increased number of quality control samples in the measurement sequence.

Evaluation of calibration measurements

19. How shall I test for linearity of the calibration?

For the purposes of this document, linearity means that the calibration can best be described by a straight line. Linearity may also mean that the function is linear in the estimated parameters, which would also be true for a parabola, even though a parabola is not a straight line at all.

A straight line can be described by Eq 5:

$$y_{nk} = \beta_0 + \beta_1 x_n + \varepsilon_{nk}$$   (Eq 5)

with

$\beta_0$ = intercept

$\beta_1$ = line slope

$y_{nk}$ = the kth measured response of calibration level n

$x_n$ = the concentration of the analyte in calibration level n

$\varepsilon_{nk}$ = the residual for the kth measurement of calibration level n.

The residual is the difference between the measured response and the response value calculated from the calibration function:

$$\varepsilon_{nk} = y_{nk} - \hat{y}_n$$


Plotting all $\varepsilon_{nk}$ over $\hat{y}_n$ (residuals over fitted values) results in the so-called residual plot. This plot is a very valuable diagnostic tool. If the points are evenly distributed around a horizontal line through zero, the straight-line function is appropriate (see Figures 5 & 6).

Another, more complex approach is the lack-of-fit test. If the lack-of-fit test is not significant, then a straight-line function describes the calibration data appropriately. Replicate measurements at each calibration level are a prerequisite for a lack-of-fit test.
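A lack-of-fit test can be sketched as follows, assuming three replicates per level; the data are illustrative. The residual sum of squares is split into pure error (replicate scatter around the level means) and lack of fit (deviation of the level means from the fitted line), and the ratio of their mean squares is compared against the F-distribution.

```python
import numpy as np
from scipy import stats

# Illustrative calibration data with three replicates per level.
x = np.repeat([10, 20, 30, 40, 50], 3).astype(float)
y = np.array([ 98, 101, 103, 198, 204, 201, 305, 299, 302,
              401, 396, 404, 499, 505, 501], dtype=float)

slope, intercept = np.polyfit(x, y, 1)

# Pure error: scatter of the replicates around their level means.
levels = np.unique(x)
ss_pe = sum(((y[x == lv] - y[x == lv].mean())**2).sum() for lv in levels)
df_pe = len(y) - len(levels)

# Lack of fit: deviation of the level means from the fitted line.
ss_lof = sum((x == lv).sum() * (y[x == lv].mean() - (slope * lv + intercept))**2
             for lv in levels)
df_lof = len(levels) - 2

F = (ss_lof / df_lof) / (ss_pe / df_pe)
p = stats.f.sf(F, df_lof, df_pe)
print(f"lack-of-fit F = {F:.2f}, p = {p:.3f}")  # p > 0.05: straight line adequate
```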

20. Does a correlation coefficient (r) of 0.99 indicate linearity of calibration?

No! The correlation coefficient is a measure of how much of the variability of y can be predicted by x. An r-value of 1 indicates that y can be completely predicted by x, and a value of 0 indicates that y cannot be predicted by x. A parabola, which is markedly not a straight line, may have a correlation coefficient of 0.99. Moreover, the r-value may improve when a quadratic term is added to the calibration function, which is then certainly not linear in our sense of the word.
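A two-line check illustrates the point: for a parabola sampled away from its vertex (domain chosen purely for illustration), the correlation coefficient still exceeds 0.99.

```python
import numpy as np

# A clearly curved (parabolic) response still yields r close to 1.
x = np.linspace(10, 20, 11)
y = x**2                                  # parabola, not a straight line

r = np.corrcoef(x, y)[0, 1]               # Pearson correlation coefficient
print(f"r = {r:.4f}")                     # about 0.996 despite the curvature
```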

21. Which level of R² is sufficient?

One cannot define a sufficient level of R²! The closer R² is to 1, the better the quality of the predictions made through the calibration. But certain calibration problems may never get beyond R² = 0.98, while for others 0.998 is a sign of an error.

Figure 5: Residual plot for which a straight line is appropriate (residuals versus fitted values).

Figure 6: Residual plot for which a straight line is inappropriate (residuals versus fitted values).


22. Which information can I get from the plot of residuals?

The plot of residuals (see point 19) can show whether the assumption of linearity is met. It can also be used to check for homo- or heteroscedasticity of the calibration data, and it is an indicator of the residual variability.

23. What is the residual standard deviation?

The residual standard deviation is a measure of the goodness of fit of the calibration. The smaller the residual standard deviation, the closer the measured data points are to the calculated calibration curve. It is used to calculate the significance of the intercept and the slope.

24. May I force the calibration curve through the origin?

If the test of significance shows that the estimated intercept is not different from zero, then the intercept term is dropped from the calibration function and the calibration curve is assumed to originate at x = 0 and y = 0. Otherwise the intercept term must be kept, and the calibration curve is assumed to originate at x = 0 and y = intercept.

25. What is homo- and heteroscedasticity?

Homoscedasticity is the term for calibration data having about equal variability over the whole calibration range. If the data's variability changes from one end of the range to the other, the data are called heteroscedastic.


26. How do I test for homoscedasticity / heteroscedasticity?

Whether one is dealing with homo- or heteroscedasticity can be determined from the residual plot. In the case of homoscedasticity the residuals all lie more or less within a band parallel to the x-axis. In the case of heteroscedasticity the residuals assume a fan shape, tight at one end and spread out at the opposite end (see Figures 7 & 8).

27. Linear regression or weighted linear regression – which shall I apply?

For homoscedastic data, ordinary linear regression is appropriate. But if the data are heteroscedastic, ordinary linear regression will result in inflated estimates of the residual standard deviation. Therefore weighted linear regression should be used in such cases.
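A minimal sketch of the two options follows, assuming known (illustrative) per-level standard deviations; numpy's polyfit accepts weights proportional to 1/σ.

```python
import numpy as np

# Illustrative heteroscedastic calibration data: the signal scatter grows
# with concentration, so each point is weighted by the inverse of its
# standard deviation.
x = np.array([10, 20, 40, 60, 80, 100], dtype=float)
y = np.array([103, 196, 410, 590, 825, 980], dtype=float)
s = np.array([2, 4, 8, 12, 16, 20], dtype=float)    # per-level standard deviations

# Weighted least squares versus ordinary least squares.
w_slope, w_intercept = np.polyfit(x, y, 1, w=1.0 / s)
o_slope, o_intercept = np.polyfit(x, y, 1)

print(f"weighted: y = {w_slope:.3f}x + {w_intercept:.2f}")
print(f"ordinary: y = {o_slope:.3f}x + {o_intercept:.2f}")
```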

28. May I remove outliers?

If an outlying value can be traced back to a failure in the system (e.g. injection error, bad chromatography, pipetting error, etc.), then it is permissible to remove it, or better yet to repeat the measurement in question. If such tracing does not reveal any failure, then the outlying value should be considered a real but rare incident and kept in the data set.

Figure 7: Residual plot of homoscedastic data (residuals versus fitted values).

Figure 8: Residual plot of heteroscedastic data (residuals versus fitted values).


29. How do I estimate confidence and prediction intervals?

They are estimated from the estimates of the intercept ($\hat{\beta}_0$), the slope ($\hat{\beta}_1$) and the residual standard deviation ($\hat{\sigma}$) according to Eq 6:

$$y_C = \hat{\beta}_0 + \hat{\beta}_1 x_C \pm t_{p,n-2}\,\hat{\sigma}\,\sqrt{\frac{1}{n} + \frac{(x_C - \bar{x})^2}{\sum_n (x_n - \bar{x})^2}}$$   (Eq 6)

with

$y_C$ = upper or lower bound of the confidence interval for $x_C$

$x_C$ = value of x for which to compute the confidence interval

$\hat{\beta}_0$ = estimate of the intercept

$\hat{\beta}_1$ = estimate of the slope

$\hat{\sigma}$ = estimate of the residual standard deviation

$t_{p,n-2}$ = Student's t for probability p and n − 2 degrees of freedom

n = number of observations

$\bar{x}$ = average of all x-values of the calibration

$x_n$ = individual x-values of the calibration

The confidence interval, or in the case of regression analysis better the confidence band, defines the region in which, with a certain probability (usually 95 %), the regression line would be found if the calibration were repeated under similar conditions. As such, the confidence band is of minor interest. More important for the task of calibration is the prediction band, which is wider than the confidence band:

$$y_P = \hat{\beta}_0 + \hat{\beta}_1 x_P \pm t_{p,n-2}\,\hat{\sigma}\,\sqrt{1 + \frac{1}{n} + \frac{(x_P - \bar{x})^2}{\sum_n (x_n - \bar{x})^2}}$$   (Eq 7)

The subscript C (confidence) was replaced by the subscript P (prediction); otherwise the same definitions as above apply. The projection of the outer bounds of this prediction band onto the y-axis defines the range of values which could reasonably be expected if one were to predict a new y for a new x.

The above formulas are for the ordinary least squares approach. If a weighted least squares approach has to be used because of heteroscedasticity, the weighted equivalents of all the estimates are used in Eq 6 and Eq 7.
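Eq 6 and Eq 7 translate directly into code. The following sketch computes both half-widths for an illustrative data set.

```python
import numpy as np
from scipy import stats

# Illustrative calibration data (replace with your own measurements).
x = np.array([10, 20, 30, 40, 50, 60], dtype=float)
y = np.array([105, 198, 304, 397, 502, 601], dtype=float)

n = len(x)
b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)
sigma = np.sqrt(np.sum(resid**2) / (n - 2))     # residual standard deviation
t = stats.t.ppf(0.975, n - 2)                   # 95 % two-sided
sxx = np.sum((x - x.mean())**2)

def bands(x0):
    """Half-widths of the confidence (Eq 6) and prediction (Eq 7) intervals at x0."""
    conf = t * sigma * np.sqrt(1 / n + (x0 - x.mean())**2 / sxx)
    pred = t * sigma * np.sqrt(1 + 1 / n + (x0 - x.mean())**2 / sxx)
    return conf, pred

x0 = 35.0
conf, pred = bands(x0)
y0 = b0 + b1 * x0
print(f"y({x0}) = {y0:.1f}, confidence +/- {conf:.1f}, prediction +/- {pred:.1f}")
```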


General

30. Is there any internationally harmonised document on calibration?

ILAC/OIML guide on calibration

ILAC Guide G24:2007 / OIML D 10:2007, "Guidelines for the determination of calibration intervals of measuring instruments", ILAC, Silverwater, Australia, 2007.

IUPAC Recommendations 1998

K. Danzer, L.A. Currie (1998), "Guidelines for Calibration in Analytical Chemistry – Part 1. Fundamentals and single component calibration", Pure & Appl. Chem., 70: 993–1014.

ISO Guide

ISO Guide 32:1997 "Calibration in analytical chemistry and use of certified reference materials", ISO,

Geneva, Switzerland, 1997

ISO Standard

ISO 8466-1:1990 "Water quality – Calibration and evaluation of analytical methods and estimation of

performance characteristics. Part 1: Statistical evaluation of the linear calibration function", ISO,

Geneva, Switzerland, 1990

ISO Standard

ISO 11095:1996 "Linear calibration using reference materials", ISO, Geneva, Switzerland, 1996

31. Where can I get guidance on calibration?

LGC best practice guide for calibration design

LGC, "Preparation of calibration curves – a guide to best practice", 2003.

L. Cuadros-Rodríguez, L. Gámiz-Gracia, E.M. Almansa-López, J.M. Bosque-Sendra (2003), "Calibration in chemical measurement processes. II. A methodological approach", Trends in Anal. Chem., 20: 620–636.


[1] United States Pharmacopeia, Chapter 41, 28th Edition, Rockville, Maryland, USA, 2005.

[2] C. Burgess and R.D. McDowall, "A question of balance? Part 2: Putting principles into practice", LCGC Europe, 19/3 (2006).