
    The issue of bias in analytical measurements generates a lot of

    debate. Existential debates (does bias exist? should it?) are often

    mixed with more practical debates (what's the best way to calculate

    bias?). Here's a description of the different kinds of bias that

    (might?) exist in the laboratory.

    January 2010

    Does Bias Exist? Should we pay attention to it?

    What kinds of bias exist?

    Bias from reference material, reference method or standard

    Bias from Proficiency Testing (PT) or External Quality

    Assurance (EQA)

    Bias from Peer Group

    Bias from Comparative method

    Bias between instruments

    Bias between reagent lots

    So, if Bias Exists, when do we assess it?

    The Commutability Conundrum and the Matrix Effect

    Can't we just forget about Bias?

    "The combination of imprecision and bias into a single parameter

    appears to simplify daily quality assurance. However, no practical

    benefit of such a combination in daily quality assurance has been

    demonstrated in comparison with the well-established separate checks

    for both errors. Laboratorians are used to thinking in terms of

    imprecision and bias separately. It has been postulated that

    clinicians may favor combination models. Many clinicians are well

    aware that laboratory results vary, but they assume that bias can be

    neglected. They are not used to combining both errors in any model.

    Therefore, combination models are probably also of no benefit to

    clinicians."

    Benefits of combining bias and imprecision in quality assurance of clinical chemical

    procedures, Rainer Haeckel and Werner Wosniok, J Lab Med 2007;31(2):87–89.

    "A majority of the methods used in thyroid function testing have

    biases that limit their clinical utility. Traditional proficiency

    testing materials do not adequately reflect these biases."

    Analytic Bias of Thyroid Function Tests: Analysis of a College of American

    Pathologists Fresh Frozen Serum Pool by 3900 Clinical Laboratories. Bernard W.

    Steele, MD; Edward Wang, PhD; George G. Klee, MD, PhD; Linda M. Thienpont, PhD;

    Steven J. Soldin, PhD; Lori J. Sokoll, PhD; William E. Winter, MD; Susan A. Fuhrman,

    MD; Ronald J. Elin, MD, PhD. Arch Pathol Lab Med. 2005;129:310–317.

    "Analytic bias caused by assay differences and reagent variations can

    cause major problems for clinicians trying to interpret the test

    results."

    Clinical interpretation of reference intervals and reference limits. A plea for

    assay harmonization. George Klee, Clin Chem Lab Med 2004: 42(7):752-757

    "That's the news from Lake Wobegon, where all the women are strong,

    all the men are good-looking, and all the children are above average."

    Garrison Keillor, Prairie Home Companion

    In the United States, bias is always a hot issue, particularly in

    media and politics. "Bias" is the typical accusation thrown by

    supporters of the Political Party of the "Buffalo" when a report in

    the media comes out that they believe is somehow favorable to

    Political Party of the "Fox". Likewise, if a media outlet criticizes a

    policy or person associated with "Buffalos", the Buffalos cry foul.

    Both "Buffalos" and "Foxes" allege that different media outlets,

    journals, or research groups are biased in favor of their opponents.

    As a result, networks, newspapers and journalists are vilified by one

    side or another or frequently both. Objective truth - whether a policy

    is actually good for the country, or whether a politician has told the

    truth or lied, for example - is often lost in the finger-pointing.

    Back in the laboratory, the fight over bias is not quite as

    contentious, although at times it seems the conversation is almost as

    lively. There is both an existential debate (does bias exist? should

    we allow it to exist when we detect it? should we incorporate bias

    into our calculations?) and a practical concern (what's the best way

    to determine bias? what is the "truth" against which we determine our

    bias?). Often, one part of the argument overshadows the other part. As

    we argue about whether or not bias should be incorporated into our

    models and calculations, we may forget to discuss or even consider the

    best way to practically calculate bias.

    Does Bias exist? Should we pay attention to it?

    The discussion of whether or not bias exists has been covered in

    other discussions in the literature and on this website and the

    blog. But if you want a quick recap: the ISO GUM model (Guide to

    Uncertainty of Measurements) asserts that Measurement

    Uncertainty (MU) is the best expression of performance by

    laboratory tests - and this expression does not include bias.

    Bias, therefore, should be eliminated whenever found, so that

    Measurement Uncertainty can be calculated. Attempts have been

    made since the original formulation of Measurement Uncertainty

    to include and account for bias [for example, Quality assessment of

    quantitative analytical results in laboratory medicine by root mean square of

    measurement deviation, Rainer Macdonald, J Lab Med 2006:30(3):111-117],

    but these attempts have been found wanting [Calculation of Measurement

    Uncertainty - Why Bias Should Be Treated Separately, Linda M. Thienpont, Clinical

    Chemistry 54: 1587-1588, 2008; Letter to the Editor: Benefits of combining bias and

    imprecision in quality assurance of clinical chemical procedures, Rainer Haeckel and

    Werner Wosniok, J Lab Med 2007:31(2):87-89]

    On the other side of the debate, the Total Error model acknowledges

    the existence of bias and includes it in the calculations. The Total

    Error model agrees with the Measurement Uncertainty model that bias

    should be eliminated where possible, but is not dogmatic on this

    point. [On the practical side, recommendations for calculating Total

    Error include an assumption of zero bias when data is not available

    for this quantity.] And if bias really is zero, the estimates for

    Total Error and Measurement Uncertainty converge.

    In other words, Measurement Uncertainty is biased against bias, but

    Total Error is not.
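    The convergence can be illustrated numerically. The sketch below uses
    one common formulation of Total Error (TE = |bias| + z × CV, with
    z = 1.65 for a one-sided 95% estimate) against an expanded uncertainty
    of k × CV with bias assumed eliminated; all figures are hypothetical.

```python
# Total Error vs. expanded Measurement Uncertainty: a minimal numeric
# sketch with hypothetical figures. TE = |bias| + z * CV is one common
# formulation; the MU estimate here is simply k * CV, since the MU model
# assumes bias has been eliminated.

def total_error(bias_pct, cv_pct, z=1.65):
    """Total Error estimate, as a percent of the target value."""
    return abs(bias_pct) + z * cv_pct

def expanded_uncertainty(cv_pct, k=1.65):
    """Expanded uncertainty (percent), bias assumed to be zero."""
    return k * cv_pct

cv = 2.0  # hypothetical analytical CV, %
print(round(total_error(1.5, cv), 2))      # 1.5% bias: TE = 4.8
print(round(total_error(0.0, cv), 2))      # zero bias: TE = 3.3
print(round(expanded_uncertainty(cv), 2))  # MU = 3.3 -> the two converge
```

    With zero bias the two estimates are numerically identical, which is
    the convergence described above.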

    Here's an example of bias in the real world. Quest Diagnostics,

    specifically its subsidiary Nichols Institute Diagnostics, was fined

    $302,000,000 because of bias in 2009 (a $40,000,000 criminal fine plus

    $262,000,000 as a civil settlement of the False Claims Act). It was

    found that the Nichols Advantage Chemiluminescence Intact Parathyroid

    Hormone Immunoassay "provided inaccurate and unreliable results" and

    that during "periods of time...provided elevated results." These

    results caused "some medical providers to submit false claims for

    reimbursement to federal health programs for unnecessary treatments."

    In other words, a high bias on this test led to unnecessary

    operations. That's the real world impact of bias.

    [Quest Diagnostics to Pay U.S. $302 Million to Resolve Allegations That a Subsidiary

    Sold Misbranded Test Kits, Department of Justice Press Release, April 15, 2009.

    http://www.usdoj.gov/opa/pr/2009/April/09-civ-350.html]

    For those who still contend that bias doesn't exist, because

    everywhere it's detected a correction is made to eliminate it, there's

    no need to read further. For those who suspect bias does exist, does

    affect the laboratory and cannot always be eliminated, read on.

    What kinds of bias exist?

    Just because we've decided to acknowledge the existence of bias

    doesn't make life any easier. The harder question is how to measure

    bias. Since bias is a relative term - you measure it against something

    else - you have to decide, What is the standard?

    There are many possible biases, including, just to name a few,

    * Bias from reference material or reference method

    * Bias from the all-method mean of a PT or EQA survey

    * Bias from the mean of a peer group

    * Bias from a comparison method

    * Bias between identical instruments in the same laboratory

    * Bias between reagent lots

    Bias from a reference material, reference method, or

    standard

    For some analytes, there is a gold standard (or reference) method or

    material. There is, in other words, a "true" value that should be

    achieved by all methods. To get to this true value, and relate your

    laboratory method to it, you must enter the world of Metrology.

    "Metrology has been very good about identifying reference methods and

    reference materials and putting together a formal traceability chain

    so that you can tie your kit calibrator in your clinical lab back to a

    reference material and a reference method that are internationally

    recognized...The whole idea is that you can then come close to

    scientific truth rather than a test result that is a relative truth."

    David Armbruster, quoted in The Pursuit of Traceability, Bill Malone, Clinical

    Laboratory News, October 2009, cover story

    When you calculate bias against a reference method and/or reference

    material, you're figuring out a "true" bias, one that is more

    scientifically true than just relatively true. With the former, you

    know how far you are from the true answer. With the latter, you only

    know that you aren't getting the same answer as everyone else.

    Bias calculated from PT or EQA

    One of the routine ways to determine bias is to compare the results of

    your laboratory against those of other laboratories through

    proficiency testing (PT), which is sometimes known as external quality

    assurance (EQA). Typically, a sample is sent out to all laboratories

    in the program, all laboratories run the sample and report the result,

    then the program tabulates the results and issues a report back to the

    labs. Each report typically states the difference (or bias) between the

    individual laboratory's result and that of the PT/EQA group method

    mean. Given that information, each individual laboratory is supposed

    to decide if the bias is significant and warrants a correction,

    adjustment, or calibration on their part.
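    The arithmetic behind that decision is simple: a difference from the
    group mean, often supplemented by a standard deviation index (SDI). A
    minimal sketch, with entirely hypothetical numbers:

```python
# Sketch of the bias arithmetic behind a PT/EQA report: the lab's result
# compared with the group (all-method or peer) mean. All numbers are
# hypothetical. Reports that also supply the group SD let the lab express
# its deviation as a standard deviation index (SDI).

def pt_bias(lab_result, group_mean):
    """Return (absolute bias, percent bias) vs. the group mean."""
    absolute = lab_result - group_mean
    return absolute, 100.0 * absolute / group_mean

def sdi(lab_result, group_mean, group_sd):
    """Standard deviation index: bias expressed in group-SD units."""
    return (lab_result - group_mean) / group_sd

# Hypothetical glucose PT sample: lab reports 104 mg/dL, group mean 100, SD 2.5
print(pt_bias(104.0, 100.0))   # (4.0, 4.0): +4 mg/dL, +4%
print(sdi(104.0, 100.0, 2.5))  # 1.6: within 2 SDI
```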

    For some analytes, reference methods and/or reference materials are

    used, so they include a definitive value for the event or sample. This

    means that all labs should get a specific result, and every increment

    away from that result is considered bias. If you determine bias using

    the method mean from a reference method or reference material, you

    are measuring the difference between your result and the "true"

    result.

    For many other analytes, no reference methods exist - or, even though

    a reference method may be available, the PT/EQA group might not run

    the sample using it - so there is no definitive value reported for the

    sample. Instead, there is only the "all-method" mean reported. There

    are other terms for this mean, and sometimes it is simply called the

    group mean. But the essential meaning is that this mean is the (albeit

    trimmed in some way) average of all the different laboratory results.

    In other words, the mean is close to the answer that all the

    laboratories reported. This doesn't mean that the answer is the "true"

    answer, because all the laboratory methods could be biased in the same

    direction (revisit the concept of "precise but not accurate"). Here,

    if you determine bias using an all-method mean, you are measuring the

    difference between your laboratory result and the results that most of

    the other laboratories got.

    Bias from a peer group

    The next possible way to measure bias is through a peer group. This is

    very similar to participating in PT or an EQA program, except all the

    participants in the testing event are, well, peers. A peer group

    typically is a group of the same instruments using the same controls

    and/or reagents. So the answers that each laboratory obtains should be

    much closer to each other. Now, again, while there should be smaller

    differences between participants, the peer group mean is not a "true"

    mean like you get with a reference method and/or reference material.

    Peer group means are, in effect, all-method means for a single method.

    You have more confidence that your bias exists, because if all your

    peers diverge from your value, there must be something going on (you

    can't blame the difference on different methods or materials anymore).

    If the peer group is using a reference material or including a

    reference method measurement with the results, the value of the report

    is improved. Still, in the absence of additional information, peer

    group reports are better, but they cannot tell you if you have a

    "true" bias.

    Bias from a comparative method

    Part of the method validation process includes a comparison of methods

    study, typically done between the new method that has just been

    purchased and the old method which is being replaced. Note the

    difference between "comparative" and "reference" method. The

    comparative method is only a relative comparison; there is no claim to

    scientific truth here. It could be that the old method was more

    scientifically true while the new method is less scientifically true,

    so a new relative bias exists in the wrong direction.

    In a sense, any bias determined by a comparison study is still quite

    real - because test results that span the switch-over to the new

    method will be shifted up or down even when there is only a relative

    difference between the new and old method. A patient receiving care

    before and after the switch could see a rise or fall in their test

    results, leading, in the worst case, to misdiagnosis and inappropriate treatment.

    So this is "real" bias - even if it isn't "true" bias.
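    The size of that switch-over shift is exactly what a comparison-of-methods
    study estimates. A minimal sketch with invented paired results (a real
    validation study would use 40 or more samples and examine regression as
    well as the mean difference):

```python
# Minimal method-comparison sketch: the same patient samples run on the
# old (comparative) and new method, summarized Bland-Altman style as a
# mean difference (the relative bias) with 95% limits of agreement. The
# paired results below are invented for illustration.
from statistics import mean, stdev

old_method = [4.1, 5.6, 7.2, 3.9, 6.8, 5.1, 8.0, 4.5]
new_method = [4.3, 5.9, 7.4, 4.0, 7.1, 5.3, 8.4, 4.6]

diffs = [n - o for n, o in zip(new_method, old_method)]
bias = mean(diffs)      # average shift of the new method vs. the old
sd_diff = stdev(diffs)  # scatter of the paired differences

print(f"mean bias: {bias:.3f}")
print(f"95% limits of agreement: {bias - 1.96 * sd_diff:.3f} "
      f"to {bias + 1.96 * sd_diff:.3f}")
```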

    Bias between identical methods/instruments in the same

    laboratory

    In large health systems, laboratory testing volumes have grown to the

    point where it's possible that multiple big box analyzers reside in

    the same laboratory. But even when the same instrument is used, with

    the same lot of reagents, the same calibrators, and the same lot of

    quality controls, two "identical" instruments won't be. That is, each

    instrument will have its own performance, and the same sample run on

    instrument A will have a different result than when it is run on

    instrument B. Since patients within the health system can't control on

    which instrument their samples will run, this is a bias that they

    undoubtedly will experience.

    The question is, how big is that bias? Two identical instruments

    within the same laboratory are the ultimate peer group, and it should

    be easy to determine the nature and extent of any bias. It falls to

    the laboratory professionals to determine the bias between the two

    instruments and, once that bias is calculated, they must make a

    judgment on whether or not that bias will impact patient care. If the

    laboratory decides that a medically important between-instrument bias

    exists, they then must take some action to account for this bias in

    reports or they must eliminate the bias in some way.
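    A between-instrument check can be as simple as running split samples on
    both analyzers and judging the mean paired difference against an
    allowable limit. A hedged sketch (both the data and the 0.2 mmol/L limit
    are hypothetical):

```python
# Sketch of a between-instrument bias check: the same patient samples run
# on two "identical" analyzers. The mean paired difference is taken as the
# between-instrument bias and compared against a medically allowable
# limit. The results and the 0.2 mmol/L limit are hypothetical.
from statistics import mean

inst_a = [5.2, 6.1, 4.8, 7.3, 5.5, 6.9]  # e.g. glucose, mmol/L
inst_b = [5.4, 6.2, 5.0, 7.5, 5.6, 7.2]

bias_ab = mean(b - a for a, b in zip(inst_a, inst_b))
allowable = 0.2  # hypothetical allowable between-instrument bias, mmol/L

print(f"between-instrument bias: {bias_ab:.3f} mmol/L")
if abs(bias_ab) > allowable:
    print("exceeds the allowable limit -> investigate and recalibrate")
else:
    print("within the allowable limit")
```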

    If you think this is not a big problem, step out of the laboratory for

    the moment and head for the near-patient testing environment: this

    same challenge happens writ large with point-of-care (POC) devices.

    With hundreds if not thousands of different operators and dozens or

    possibly hundreds of the same POC device, how does a health system

    ensure that POC device A-1 delivers the same test results as POC

    device A-34? The US accreditation and regulatory systems have shrunk

    from the implications of this problem, and health systems and

    diagnostic manufacturers blanch at the costs of the monitoring that

    would be required to truly track the performance of these devices.

    Bias between reagent lots (and control lots)

    Even if you decided to isolate your instrument and method from the

    rest of the testing world - using only one instrument in your

    laboratory, exclusively, without referencing any outside results - you

    still cannot escape bias. Why? Because you make changes to your

    instrument periodically and the method itself changes over time.

    Sometimes this is expressed as growing (or optimistically, declining)

    imprecision. Other changes occur more distinctly.

    Take reagent switches. When you bring in a new lot of reagents, those

    materials are not the same as the old lot. Manufacturers take great

    pains to make them as close to identical as possible, but there will

    always be differences. Everyone hopes that these are small

    differences. As with the two- or multiple-instrument problem above,

    laboratories have to identify, calculate, and make a judgment about

    any difference. Re-calibration usually takes care of the issues with

    reagent switches, but laboratories need to monitor QC carefully after

    a reagent switch. In fact, changing reagents is an event that may

    trigger a run of extra controls.

    A shift in values for control lots is also common - but it's not

    really a bias. Just as with the reagents, the controls can't be

    manufactured perfectly, so there is always a slightly different mean

    for each control lot. But this shift is one of the easiest to correct.

    Good Laboratory Practice (and CLSI guideline C24-A3) recommends a

    crossover period between the old and new control lots of several days

    to several weeks, depending on the stability of the control lots

    (hematology controls have limited stability, so the crossover period

    may only be a few days). In this way, a laboratory can phase in the

    new control lot, characterizing the performance of the new lot and

    providing a comparison with the old lot.
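    The arithmetic of the crossover itself is straightforward: collect
    parallel results on the new lot, then set its QC target and ranges from
    its own observed mean and SD rather than carrying over the old lot's
    values. A sketch with invented crossover data:

```python
# Control-lot crossover sketch in the spirit of CLSI C24: the new control
# lot is run alongside the old one, and its QC target and ranges are set
# from its own observed mean and SD. The 20 crossover results are invented.
from statistics import mean, stdev

new_lot = [98.2, 99.1, 97.8, 98.9, 99.4, 98.0, 98.6, 99.0, 97.9, 98.4,
           98.8, 99.2, 98.1, 98.7, 99.3, 98.5, 98.3, 98.9, 98.6, 99.0]

target = mean(new_lot)
sd = stdev(new_lot)

print(f"new-lot target mean: {target:.2f}")
print(f"2 SD control range:  {target - 2 * sd:.2f} to {target + 2 * sd:.2f}")
```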

    So, Bias Exists. Now, when do we assess it? Do we determine

    bias at a specific time? At a specific level?

    In addition to choosing the source or reference for comparison in the

    estimation of bias, laboratories also have to answer a question of

    scale, or time. Over what timeframe does a laboratory want to

    determine its bias?

    With a method validation study, for instance a comparison of methods

    study, the bias calculations represent a specific window of time (the

    duration of the study). The bias calculation is valuable for that

    specific time period, but after that, the value diminishes as more

    time elapses and the instrument, method, and laboratory staff change.

    Likewise, the results of a PT or EQA event are quite specific and few

    in number. In the US, PT events may happen only two or three times per

    year, and involve between two and five samples per event. Using just a

    handful of data points to determine bias may not inspire confidence in

    the calculations. For a longer, broader view of bias, you may want to

    average a number of events and samples together. Outside the US, PT

    and EQA may be more frequent and involve more samples, so confidence

    in the bias determinations is higher.

    Here is where peer group evaluation may be more helpful. Often peer

    groups collect all the data, or at least a lot more data points than

    PT or EQA challenges.

    Another technique of monitoring bias on a continuous basis is patient

    split-sampling. When there is an available reference/comparative

    method, you can run the same patient sample on the "test" method and

    comparative method on an ongoing basis. In our earlier scenario with

    the two identical instruments in the same laboratory, for example,

    split-sampling would be a good technique to monitor the differences

    between the methods/instruments. Many health systems don't have the

    ability to run that continuous comparison in-house, unfortunately.

    The Commutability Conundrum and the Matrix effect

    No, this isn't time to take the blue pill (although after learning

    about all the biases in the laboratory world, you might wish you could

    wake up from this metrology Wonderland). There's one last issue when

    it comes to bias and how we measure it. It's called Commutability.

    "The term 'commutability' was first used to describe the ability of a

    reference or control material to have interassay properties comparable

    to the properties demonstrated by authentic clinical samples when

    measured by more than one analytical method ... More recent metrologic

    documents expand the concept; they describe commutability as the

    equivalence of the mathematical relationships between the results of

    different measurement procedures for a reference material and for

    representative samples from healthy and diseased individuals."

    W. Greg Miller, Gary L Myers, Robert Rej, Why Commutability Matters, Clin Chem 52(4):

    553. 2006.

    Commutability, in layman's terms, means that if a bias is detected by

    control materials, there is also a bias in real patient samples. The

    control behaves like the patient sample. The assumption that the

    control behaves like the patient sample is built into the very

    foundation of quality control (if the controls were unstable and

    wholly different in behavior from patient samples, there would be no

    point in running them).

    Unfortunately, it is not easy to build a cost-efficient control

    material that behaves exactly like a real patient sample. The

    challenges of creating control materials are a huge topic outside the

    scope of our focus in this lesson. Suffice it to say that

    manufacturers try to create controls that are as close as practically

    possible to patient samples, and laboratories try to put up with the

    differences between controls and patient samples. For cholesterol and

    glycated hemoglobin, for example, there is a strong commitment to

    create a traceability chain and find, minimize, or eliminate matrix

    effects. For other analytes, however, there is at best mixed success

    in eliminating matrix biases.

    When there is a distinct difference between the method performance

    using a control material versus a patient sample, usually because of

    the constituents of the control material, this is called a Matrix

    Effect (which has absolutely no relationship to the Wachowski

    brothers, Keanu Reeves, or Laurence Fishburne). This, in effect, is

    another bias. The control materials are biased from the patient

    samples. Matrix Effects are most obvious when you plot different

    methods and instruments in PT or EQA groups; whenever there is a

    marked difference between instrument A and instrument B on the same

    event, the answer is often a matrix effect (really, it's a bias of the

    biases).

    Can't we just forget about bias? Pretend it doesn't exist?

    "A fundamental goal of laboratory medicine is that results for

    patients' samples will be comparable and independent of the medical

    laboratory that produced the results. Routine measurement procedures

    of acceptable analytical specificity that have calibration traceable

    to the same higher order reference material or reference measurement

    procedure should produce numerical values for clinical samples that

    are comparable irrespective of time, place, or laboratory generating

    the result."

    W. Greg Miller, Gary L Myers, Robert Rej, Why Commutability Matters, Clin Chem 52(4):

    553. 2006.

    In a world where professional guidelines are making global

    recommendations for cutoff limits, where multinational diagnostic

    manufacturers are issuing reference ranges that are often used by

    customers as de facto reference ranges around the world, and where

    pay-for-performance schemes are implemented with agency-mandated

    cutoffs, it is not possible to ignore bias.

    Furthermore, while one hand of ISO is encouraging measurement

    uncertainty and the elimination of bias as a factor in the performance

    of methods, another ISO-driven effort is toward traceability,

    standardization, and harmonization. This latter effort explicitly

    recognizes that there are biases between methods and urges some form

    of standardization to harmonize results. In the US, there is no

    regulatory mandate for standardization, harmonization, or

    traceability. Indeed, the FDA has no real power to order medical

    device manufacturers to supply traceability information with their

    applications for FDA clearance and approval.

    In a perfect world, these two ISO desires would be fulfilled. Methods

    would be traceable, standardized, and/or harmonized, so that many of

    the biases discussed here could be eliminated. In that world,

    measurement uncertainty would be easy to calculate because you

    wouldn't have to ignore biases.

    In our less-than-perfect world, however, not only are methods often

    un-traceable, un-standardized, and un-harmonious, there are also

    biases that will still exist even in the presence of national

    standardization programs (witness HbA1c). We have lots of biases which

    have not yet been eliminated or reduced sufficiently to assure the

    comparability of laboratory test results.
