
    The issue of bias in analytical measurements generates a lot of

    debate. Existential debates (does bias exist? should it?) are often

    mixed with more practical debates (what's the best way to calculate

    bias?). Here's a description of the different kinds of bias that

    (might?) exist in the laboratory.

    January 2010

    Does Bias Exist? Should we pay attention to it?

    What kinds of bias exist?

    Bias from reference material, reference method or standard

    Bias from Proficiency Testing (PT) or External Quality

    Assurance (EQA)

    Bias from Peer Group

    Bias from Comparative method

    Bias between instruments

    Bias between reagent lots

    So, if Bias Exists, when do we assess it?

    The Commutability Conundrum and the Matrix Effect

    Can't we just forget about Bias?

    "The combination of imprecision and bias into a single parameter

    appears to simplify daily quality assurance. However, no practical

    benefit of such a combination in daily quality assurance has been

    demonstrated in comparison with the well-established separate checks

    for both errors. Laboratorians are used to thinking in terms of

    imprecision and bias separately. It has been postulated that

    clinicians may favor combination models. Many clinicians are well

    aware that laboratory results vary, but they assume that bias can be

    neglected. They are not used to combining both errors in any model.

    Therefore, combination models are probably also of no benefit to

    clinicians."

    Benefits of combining bias and imprecision in quality assurance of clinical chemical

    procedures, Rainer Haeckel and Werner Wosniok, J Lab Med 2007;31(2):87–89.

    "A majority of the methods used in thyroid function testing have

    biases that limit their clinical utility. Traditional proficiency

    testing materials do not adequately reflect these biases."

    Analytic Bias of Thyroid Function Tests: Analysis of a College of American

    Pathologists Fresh Frozen Serum Pool by 3900 Clinical Laboratories. Bernard W.

    Steele, MD; Edward Wang, PhD; George G. Klee, MD, PhD; Linda M. Thienpont, PhD;

    Steven J. Soldin, PhD; Lori J. Sokoll, PhD; William E. Winter, MD; Susan A. Fuhrman,

    MD; Ronald J. Elin, MD, PhD. Arch Pathol Lab Med. 2005;129:310–317.

    "Analytic bias caused by assay differences and reagent variations can

    cause major problems for clinicians trying to interpret the test

    results."

    Clinical interpretation of reference intervals and reference limits. A plea for

    assay harmonization. George Klee, Clin Chem Lab Med 2004: 42(7):752-757

    "That's the news from Lake Wobegon, where all the women are strong,

    all the men are good-looking, and all the children are above average."

    Garrison Keillor, Prairie Home Companion

    In the United States, bias is always a hot issue, particularly in

    media and politics. "Bias" is the typical accusation thrown by

    supporters of the Political Party of the "Buffalo" when a report in

    the media comes out that they believe is somehow favorable to

    Political Party of the "Fox". Likewise, if a media outlet criticizes a

    policy or person associated with "Buffalos", the Buffalos cry foul.

    Both "Buffalos" and "Foxes" allege that different media outlets,

    journals, or research groups are biased in favor of their opponents.

    As a result, networks, newspapers and journalists are vilified by one

    side or another or frequently both. Objective truth - whether a policy

    is actually good for the country, or whether a politician has told the

    truth or lied, for example - is often lost in the finger-pointing.

    Back in the laboratory, the fight over bias is not quite as

    contentious, although at times it seems the conversation is almost as

    lively. There is both an existential debate (does bias exist? should

    we allow it to exist when we detect it? should we incorporate bias

    into our calculations?) and a practical concern (what's the best way

    to determine bias? what is the "truth" against which we determine our

    bias?). Often, one part of the argument overshadows the other part. As

    we argue about whether or not bias should be incorporated into our

    models and calculations, we may forget to discuss or even consider the

    best way to practically calculate bias.

    Does Bias exist? Should we pay attention to it?

    The discussion of whether or not bias exists has been covered in

    other discussions in the literature and on this website and the

    blog. But if you want a quick recap: the ISO GUM model (Guide to

    Uncertainty of Measurements) asserts that Measurement

    Uncertainty (MU) is the best expression of performance by

    laboratory tests - and this expression does not include bias.

    Bias, therefore, should be eliminated whenever found, so that

    Measurement Uncertainty can be calculated. Attempts have been

    made since the original formulation of Measurement Uncertainty

    to include and account for bias [for example, Quality assessment of

    quantitative analytical results in laboratory medicine by root mean square of

    measurement deviation, Rainer Macdonald, J Lab Med 2006:30(3):111-117],

    but these attempts have been found wanting [Calculation of Measurement

    Uncertainty - Why Bias Should Be Treated Separately, Linda M. Thienpont, Clinical

    Chemistry 54: 1587-1588, 2008; Letter to the Editor: Benefits of combining bias and

    imprecision in quality assurance of clinical chemical procedures, Rainer Haeckel and

    Werner Wosniok, J Lab Med 2007:31(2):87-89]

    On the other side of the debate, the Total Error model acknowledges

    the existence of bias and includes it in the calculations. The Total

    Error model agrees with the Measurement Uncertainty model that bias

    should be eliminated where possible, but is not dogmatic on this

    point. [On the practical side, recommendations for calculating Total

    Error include an assumption of zero bias when data is not available

    for this quantity.] And if bias really is zero, the estimates for

    Total Error and Measurement Uncertainty converge.

    In other words, Measurement Uncertainty is biased against bias, but

    Total Error is not.
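    The convergence can be illustrated numerically. The sketch below uses
    one common formulation of Total Error (TE = |bias| + z × CV, with
    z = 1.65 for a one-sided 95% estimate) against an expanded uncertainty
    of k × CV with bias assumed eliminated; all figures are hypothetical.

```python
# Total Error vs. expanded Measurement Uncertainty: a minimal numeric
# sketch with hypothetical figures. TE = |bias| + z * CV is one common
# formulation; the MU estimate here is simply k * CV, since the MU model
# assumes bias has been eliminated.

def total_error(bias_pct, cv_pct, z=1.65):
    """Total Error estimate, as a percent of the target value."""
    return abs(bias_pct) + z * cv_pct

def expanded_uncertainty(cv_pct, k=1.65):
    """Expanded uncertainty (percent), bias assumed to be zero."""
    return k * cv_pct

cv = 2.0  # hypothetical analytical CV, %
print(round(total_error(1.5, cv), 2))      # 1.5% bias: TE = 4.8
print(round(total_error(0.0, cv), 2))      # zero bias: TE = 3.3
print(round(expanded_uncertainty(cv), 2))  # MU = 3.3 -> the two converge
```

    With zero bias the two estimates are numerically identical, which is
    the convergence described above.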

    Here's an example of bias in the real world. Quest Diagnostics,

    specifically its subsidiary Nichols Institute Diagnostics, was fined

    $302,000,000 because of bias in 2009 (a $40,000,000 criminal fine plus

    $262,000,000 as a civil settlement of the False Claims Act). It was

    found that the Nichols Advantage Chemiluminescence Intact Parathyroid

    Hormone Immunoassay "provided inaccurate and unreliable results" and

    that during "periods of time...provided elevated results." These

    results caused "some medical providers to submit false claims for

    reimbursement to federal health programs for unnecessary treatments."

    In other words, a high bias on this test led to unnecessary

    operations. That's the real world impact of bias.

    [Quest Diagnostics to Pay U.S. $302 Million to Resolve Allegations That a Subsidiary

    Sold Misbranded Test Kits, Department of Justice Press Release, April 15, 2009.

    http://www.usdoj.gov/opa/pr/2009/April/09-civ-350.html]

    For those who still contend that bias doesn't exist, because

    everywhere it's detected a correction is made to eliminate it, there's

    no need to read further. For those who suspect bias does exist, does

    affect the laboratory and cannot always be eliminated, read on.

    What kinds of bias exist?

    Just because we've decided to acknowledge the existence of bias

    doesn't make life any easier. The harder question is how to measure

    bias. Since bias is a relative term - you measure it against something

    else - you have to decide, What is the standard?

    There are many possible biases, including, just to name a few,

    * Bias from reference material or reference method

    * Bias from the all-method mean of a PT or EQA survey

    * Bias from the mean of a peer group

    * Bias from a comparison method

    * Bias between identical instruments in the same laboratory

    * Bias between reagent lots

    Bias from a reference material, reference method, or

    standard

    For some analytes, there is a gold standard (or reference) method or

    material. There is, in other words, a "true" value that should be

    achieved by all methods. To get to this true value, and relate your

    laboratory method to it, you must enter the world of Metrology.

    "Metrology has been very good about identifying reference methods and

    reference materials and putting together a formal traceability chain

    so that you can tie your kit calibrator in your clinical lab back to a

    reference material and a reference method that are internationally

    recognized...The whole idea is that you can then come close to

    scientific truth rather than a test result that is a relative truth."

    David Armbruster, quoted in The Pursuit of Traceability, Bill Malone, Clinical

    Laboratory News, October 2009, cover story

    When you calculate bias against a reference method and/or reference

    material, you're figuring out a "true" bias, one that is more

    scientifically true than just relatively true. With the former, you

    know how far you are from the true answer. With the latter, you only

    know that you aren't getting the same answer as everyone else.

    Bias calculated from PT or EQA

    One of the routine ways to determine bias is to compare the results of

    your laboratory against those of other laboratories through

    proficiency testing (PT), which is sometimes known as external quality

    assurance (EQA). Typically, a sample is sent out to all laboratories

    in the program, all laboratories run the sample and report the result,

    then the program tabulates the results and issues a report back to the

    labs. Each report typically states the difference (or bias) between the

    individual laboratory's result and that of the PT/EQA group method

    mean. Given that information, each individual laboratory is supposed

    to decide if the bias is significant and warrants a correction,

    adjustment, or calibration on their part.
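    The arithmetic behind that decision is simple: a difference from the
    group mean, often supplemented by a standard deviation index (SDI). A
    minimal sketch, with entirely hypothetical numbers:

```python
# Sketch of the bias arithmetic behind a PT/EQA report: the lab's result
# compared with the group (all-method or peer) mean. All numbers are
# hypothetical. Reports that also supply the group SD let the lab express
# its deviation as a standard deviation index (SDI).

def pt_bias(lab_result, group_mean):
    """Return (absolute bias, percent bias) vs. the group mean."""
    absolute = lab_result - group_mean
    return absolute, 100.0 * absolute / group_mean

def sdi(lab_result, group_mean, group_sd):
    """Standard deviation index: bias expressed in group-SD units."""
    return (lab_result - group_mean) / group_sd

# Hypothetical glucose PT sample: lab reports 104 mg/dL, group mean 100, SD 2.5
print(pt_bias(104.0, 100.0))   # (4.0, 4.0): +4 mg/dL, +4%
print(sdi(104.0, 100.0, 2.5))  # 1.6: within 2 SDI
```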

    For some analytes, reference methods and/or reference materials are

    used, so they include a definitive value for the event or sample. This

    means that all labs should get a specific result, and every increment

    away from that result is considered bias. If you determine bias using

    the method mean from a reference method or reference material, you

    are measuring the difference between your result and the "true"

    result.

    For many other analytes, no reference methods exist - or, even though

    a reference method may be available, the PT/EQA group might not run

    the sample using it - so there is no definitive value reported for the

    sample. Instead, there is only the "all-method" mean reported. There

    are other terms for this mean, and sometimes it is simply called the

    group mean. But the essential meaning is that this mean is the (albeit

    trimmed in some way) average of all the different laboratory results.

    In other words, the mean is close to the answer that all the

    laboratories reported. This doesn't mean that the answer is the "true"

    answer, because all the laboratory methods could be biased in the same

    direction (revisit the concept of "precise but not accurate"). Here,

    if you determine bias using an all-method mean, you are measuring the

    difference between your laboratory result and the results that most of

    the other laboratories got.

    Bias from a peer group

    The next possible way to measure bias is through a peer group. This is

    very similar to participating in PT or an EQA program, except all the

    participants in the testing event are, well, peers. A peer group

    typically is a group of the same instruments using the same controls

    and/or reagents. So the answers that each laboratory obtains should be

    much closer to each other. Now, again, while there should be smaller

    differences between participants, the peer group mean is not a "true"

    mean like you get with a reference method and/or reference material.

    Peer group means are, in effect, all-method means for a single method.

    You have more confidence that your bias exists, because if all your

    peers diverge from your value, there must be something going on (you

    can't blame the difference on different methods or materials anymore).

    If the peer group is using a reference material or including a

    reference method measurement with the results, the value of the report

    is improved. Still, in the absence of additional information, peer

    group reports are better, but they cannot tell you if you have a

    "true" bias.

    Bias from a comparative method

    Part of the method validation process includes a comparison of methods

    study, typically done between the new method that has just been

    purchased and the old method which is being replaced. Note the

    difference between "comparative" and "reference" method. The

    comparative method is only a relative comparison; there is no claim to

    scientific truth here. It could be that the old method was more

    scientifically true while the new method is less scientifically true,

    so a new relative bias exists in the wrong direction.

    In a sense, any bias determined by a comparison study is still quite

    real - because test results that span the switch-over to the new

    method will be shifted up or down even when there is only a relative

    difference between the new and old method. A patient receiving care

    before and after the switch could see a rise or fall in their test

    results, leading, in the worst case, to misdiagnosis and inappropriate treatment.

    So this is "real" bias - even if it isn't "true" bias.
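    The size of that switch-over shift is exactly what a comparison-of-methods
    study estimates. A minimal sketch with invented paired results (a real
    validation study would use 40 or more samples and examine regression as
    well as the mean difference):

```python
# Minimal method-comparison sketch: the same patient samples run on the
# old (comparative) and new method, summarized Bland-Altman style as a
# mean difference (the relative bias) with 95% limits of agreement. The
# paired results below are invented for illustration.
from statistics import mean, stdev

old_method = [4.1, 5.6, 7.2, 3.9, 6.8, 5.1, 8.0, 4.5]
new_method = [4.3, 5.9, 7.4, 4.0, 7.1, 5.3, 8.4, 4.6]

diffs = [n - o for n, o in zip(new_method, old_method)]
bias = mean(diffs)      # average shift of the new method vs. the old
sd_diff = stdev(diffs)  # scatter of the paired differences

print(f"mean bias: {bias:.3f}")
print(f"95% limits of agreement: {bias - 1.96 * sd_diff:.3f} "
      f"to {bias + 1.96 * sd_diff:.3f}")
```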

    Bias between identical methods/instruments in the same

    laboratory

    In large health systems, laboratory testing volumes have grown to the

    point where it's possible that multiple big box analyzers reside in

    the same laboratory. But even when the same instrument is used, with

    the same lot of reagents, the same calibrators, and the same lot of

    quality controls, two "identical" instruments won't be. That is, each

    instrument will have its own performance, and the same sample run on

    instrument A will have a different result than when it is run on

    instrument B. Since patients within the health system can't control on

    which instrument their samples will run, this is a bias that they

    undoubtedly will experience.

    The question is, how big is that bias? Two identical instruments

    within the same laboratory are the ultimate peer group, and it should

    be easy to determine the nature and extent of any bias. It falls to

    the laboratory professionals to determine the bias between the two

    instruments and, once that bias is calculated, they must make a

    judgment on whether or not that bias will impact patient care. If the

    laboratory decides that a medically important between-instrument bias

    exists, they then must take some action to account for this bias in

    reports or they must eliminate the bias in some way.
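    A between-instrument check can be as simple as running split samples on
    both analyzers and judging the mean paired difference against an
    allowable limit. A hedged sketch (both the data and the 0.2 mmol/L limit
    are hypothetical):

```python
# Sketch of a between-instrument bias check: the same patient samples run
# on two "identical" analyzers. The mean paired difference is taken as the
# between-instrument bias and compared against a medically allowable
# limit. The results and the 0.2 mmol/L limit are hypothetical.
from statistics import mean

inst_a = [5.2, 6.1, 4.8, 7.3, 5.5, 6.9]  # e.g. glucose, mmol/L
inst_b = [5.4, 6.2, 5.0, 7.5, 5.6, 7.2]

bias_ab = mean(b - a for a, b in zip(inst_a, inst_b))
allowable = 0.2  # hypothetical allowable between-instrument bias, mmol/L

print(f"between-instrument bias: {bias_ab:.3f} mmol/L")
if abs(bias_ab) > allowable:
    print("exceeds the allowable limit -> investigate and recalibrate")
else:
    print("within the allowable limit")
```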

    If you think this is not a big problem, step out of the laboratory for

    the moment and head for the near-patient testing environment: this

    same challenge happens writ large with point-of-care (POC) devices.

    With hundreds if not thousands of different operators and dozens or

    possibly hundreds of the same POC device, how does a health system

    ensure that POC device A-1 delivers the same test results as POC

    device A-34? The US accreditation and regulatory systems have shrunk

    from the implications of this problem, and health systems and

    diagnostic manufacturers blanch at the costs of the monitoring that

    would be required to truly track the performance of these devices.

    Bias between reagent lots (and control lots)

    Even if you decided to isolate your instrument and method from the

    rest of the testing world - using only one instrument in your

    laboratory, exclusively, without referencing any outside results - you

    still cannot escape bias. Why? Because you make changes to your

    instrument periodically and the method itself changes over time.

    Sometimes this is expressed as growing (or optimistically, declining)

    imprecision. Other changes occur more distinctly.

    Take reagent switches. When you bring in a new lot of reagents, those

    materials are not the same as the old lot. Manufacturers take great

    pains to make them as close to identical as possible, but there will

    always be differences. Everyone hopes that these are small

    differences. As with the two- or multiple-instrument problem above,

    laboratories have to identify, calculate, and make a judgment about

    any difference. Re-calibration usually takes care of the issues with

    reagent switches, but laboratories need to monitor QC carefully after

    a reagent switch. In fact, changing reagents is an event that may

    trigger a run of extra controls.

    A shift in values for control lots is also common - but it's not

    really a bias. Just as with the reagents, the controls can't be

    manufactured perfectly, so there is always a slightly different mean

    for each control lot. But this shift is one of the easiest to correct.

    Good Laboratory Practice (and CLSI guideline C24-A3) recommends a

    crossover period between the old and new control lots of several days

    to several weeks, depending on the stability of the control lots

    (hematology controls have limited stability, so the crossover period

    may only be a few days). In this way, a laboratory can phase in the

    new control lot, characterizing the performance of the new lot and

    providing a comparison with the old lot.
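    The arithmetic of the crossover itself is straightforward: collect
    parallel results on the new lot, then set its QC target and ranges from
    its own observed mean and SD rather than carrying over the old lot's
    values. A sketch with invented crossover data:

```python
# Control-lot crossover sketch in the spirit of CLSI C24: the new control
# lot is run alongside the old one, and its QC target and ranges are set
# from its own observed mean and SD. The 20 crossover results are invented.
from statistics import mean, stdev

new_lot = [98.2, 99.1, 97.8, 98.9, 99.4, 98.0, 98.6, 99.0, 97.9, 98.4,
           98.8, 99.2, 98.1, 98.7, 99.3, 98.5, 98.3, 98.9, 98.6, 99.0]

target = mean(new_lot)
sd = stdev(new_lot)

print(f"new-lot target mean: {target:.2f}")
print(f"2 SD control range:  {target - 2 * sd:.2f} to {target + 2 * sd:.2f}")
```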

    So, Bias Exists. Now, when do we assess it? Do we determine

    bias at a specific time? At a specific level?

    In addition to choosing the source or reference for comparison in the

    estimation of bias, laboratories also have to answer a question of

    scale, or time. Over what timeframe does a laboratory want to

    determine its bias?

    With a method validation study, for instance a comparison of methods

    study, the bias calculations represent a specific window of time (the

    duration of the study). The bias calculation is valuable for that

    specific time period, but after that, the value diminishes as more

    time elapses and the instrument, method, and laboratory staff change.

    Likewise, the results of a PT or EQA event are quite specific and few

    in number. In the US, PT events may happen only two or three times per

    year, and involve between two and five samples per event. Using just a

    handful of data points to determine bias may not inspire confidence in

    the calculations. For a longer, broader view of bias, you may want to

    average a number of events and samples together. Outside the US, PT

    and EQA may be more frequent and involve more samples, so confidence

    in the bias determinations is higher.

    Here is where peer group evaluation may be more helpful. Often peer

    groups collect all the data, or at least a lot more data points than

    PT or EQA challenges.

    Another technique of monitoring bias on a continuous basis is patient

    split-sampling. When there is an available reference/comparative

    method, you can run the same patient sample on the "test" method and

    comparative method on an ongoing basis. In our earlier scenario with

    the two identical instruments in the same laboratory, for example,

    split-sampling would be a good technique to monitor the differences

    between the methods/instruments. Many health systems don't have the

    ability to run that continuous comparison in-house, unfortunately.

    The Commutability Conundrum and the Matrix effect

    No, this isn't time to take the blue pill (although after learning

    about all the biases in the laboratory world, you might wish you could

    wake up from this metrology Wonderland). There's one last issue when

    it comes to bias and how we measure it. It's called Commutability.

    "The term 'commutability' was first used to describe the ability of a

    reference or control material to have interassay properties comparable

    to the properties demonstrated by authentic clinical samples when

    measured by more than one analytical method ... More recent metrologic

    documents expand the concept; they describe commutability as the

    equivalence of the mathematical relationships between the results of

    different measurement procedures for a reference material and for

    representative samples from healthy and diseased individuals."

    W. Greg Miller, Gary L Myers, Robert Rej, Why Commutability Matters, Clin Chem 52(4):

    553. 2006.

    Commutability, in layman's terms, means that if a bias is detected by

    control materials, there is also a bias in real patient samples. The

    control behaves like the patient sample. The assumption that the

    control behaves like the patient sample is built into the very

    foundation of quality control (if the controls were unstable and

    wholly different in behavior from patient samples, there would be no

    point in running them).

    Unfortunately, it is not easy to build a cost-efficient control

    material that behaves exactly like a real patient sample. The

    challenges of creating control materials are a huge topic outside the

    scope of our focus in this lesson. Suffice it to say that

    manufacturers try to create controls that are as close as practically

    possible to patient samples, and laboratories try to put up with the

    differences between controls and patient samples. For cholesterol and

    glycated hemoglobin, for example, there is a strong commitment to

    create a traceability chain and find, minimize, or eliminate matrix

    effects. For other analytes, however, there is at best mixed success

    in eliminating matrix biases.

    When there is a distinct difference between the method performance

    using a control material versus a patient sample, usually because of

    the constituents of the control material, this is called a Matrix

    Effect (which has absolutely no relationship to the Wachowski

    brothers, Keanu Reeves, or Laurence Fishburne). This, in effect, is

    another bias. The control materials are biased from the patient

    samples. Matrix Effects are most obvious when you plot different

    methods and instruments in PT or EQA groups; whenever there is a

    marked difference between instrument A and instrument B on the same

    event, the answer is often a matrix effect (really, it's a bias of the

    biases).

    Can't we just forget about bias? Pretend it doesn't exist?

    "A fundamental goal of laboratory medicine is that results for

    patients' samples will be comparable and independent of the medical

    laboratory that produced the results. Routine measurement procedures

    of acceptable analytical specificity that have calibration traceable

    to the same higher order reference material or reference measurement

    procedure should produce numerical values for clinical samples that

    are comparable irrespective of time, place, or laboratory generating

    the result."

    W. Greg Miller, Gary L Myers, Robert Rej, Why Commutability Matters, Clin Chem 52(4):

    553. 2006.

    In a world where professional guidelines are making global

    recommendations for cutoff limits, where multinational diagnostic

    manufacturers are issuing reference ranges that are often used by

    customers as de facto reference ranges around the world, and where

    pay-for-performance schemes are implemented with agency-mandated

    cutoffs, it is not possible to ignore bias.

    Furthermore, while one hand of ISO is encouraging measurement

    uncertainty and the elimination of bias as a factor in the performance

    of methods, another ISO-driven effort is toward traceability,

    standardization, and harmonization. This latter effort explicitly

    recognizes that there are biases between methods and urges some form

    of standardization to harmonize results. In the US, there is no

    regulatory mandate for standardization, harmonization, or

    traceability. Indeed, the FDA has no real power to order medical

    device manufacturers to supply traceability information with their

    applications for FDA clearance and approval.

    In a perfect world, these two ISO desires would be fulfilled. Methods

    would be traceable, standardized, and/or harmonized, so that many of

    the biases discussed here could be eliminated. In that world,

    measurement uncertainty would be easy to calculate because you

    wouldn't have to ignore biases.

    In our less-than-perfect world, however, not only are methods often

    un-traceable, un-standardized, and un-harmonious, there are also

    biases that will still exist even in the presence of national

    standardization programs (witness HbA1c). We have lots of biases which

    have not yet been eliminated or reduced sufficiently to assure the

    comparability of laboratory test results.
