Clin Perinatol 30 (2003) 343–350
Benchmarking techniques to improve neonatal
care: uses and abuses
Michele C. Walsh, MD, MS a,b,*
a Department of Pediatrics, Case Western Reserve University, USA
b Neonatal Intensive Care Unit, Rainbow Babies & Children’s Hospital, University Hospitals of
Cleveland, 11000 Euclid Avenue, Mailstop 6010, Cleveland, OH 44106–6010, USA
In neonatology, as in health care in general, there is a desire to compare
outcomes across institutions. The goals of these comparisons are diverse and
range from a desire to reduce costs of care to improving patient outcomes. A new
goal of these comparisons is to reward institutions with practices that enhance
patient safety, such as the efforts launched by the Leapfrog Group and
recently by the Centers for Medicare and Medicaid Services. The pioneering work
by Wennberg et al [1–4] was among the first to highlight the differences in
practice among institutions and regions of the country. Through careful dissection
of large patient care claims data sets, he and his colleagues were able to
demonstrate that variations in the frequency of operations, such as cardiac
catheterization, coronary artery bypass operations, and prostatectomy, could not
be explained by differences in patient characteristics. Instead the differences were
explained by differences in physicians’ preference for treatments. Some of these
practice variations are driven by uncertainty (ie, the lack of sound evidence for
any specific course of treatment). Benchmarking is an approach that capitalizes
on these variations in practice and associated outcome differences in an attempt to
reduce the uncertainty about the effects of specific treatment strategies.
Horbar [5] has proposed a theoretical framework to explain variations in
practice (Fig. 1). In his model, as the strength of evidence increases, agreement
on standard practice increases and the expected variation in a practice should
decrease, and in fact any variation may be considered inappropriate. Conversely,
when the evidence is weak, variation is expected to increase. Horbar and the
Vermont-Oxford Network have shown, however, that differences exist even in
the use of practices that have been shown to be highly effective, such as the
administration of antenatal corticosteroids and the early administration of
surfactant.

Fig. 1. Variations in practice can be explained by the interaction of the strength of the evidence
together with the degree of agreement. When the evidence is strong, variation in practice is
inappropriate. When evidence is weak, variation is expected and even desirable. (Modified from
concepts presented by JD Horbar, Variations in Neonatal Care, Hot Topics in Neonatology, 2001;
with permission.)

0095-5108/03/$ – see front matter © 2003 Elsevier Inc. All rights reserved.
doi:10.1016/S0095-5108(03)00016-2
* Corresponding author. Neonatal Intensive Care Unit, Rainbow Babies & Children’s Hospital,
University Hospitals of Cleveland, 11000 Euclid Avenue, Mailstop 6010, Cleveland, OH 44106–6010.
E-mail address: [email protected]

As predicted by the model, other less well studied practices in
neonatology vary widely among institutions. Differences have been shown to
exist in strategies for managing chronic lung disease and persistent pulmonary
hypertension, and the use of inotropes, pain medication, analgesic agents, and
blood transfusions [6–9]. This inherent variation in practice constitutes a natural
albeit uncontrolled experiment and allows one to explore the impact of different
practices on outcomes. One tool that is useful in exploiting this variation is
termed ‘‘benchmarking.’’
Uses of benchmarking
Benchmarking, a technique developed in the business world in the 1980s, is
the process of comparing outcomes together with a detailed examination of the
processes responsible for the outcomes (Box 1). In benchmarking, one identifies
an institution as the leader in some outcome, studies in detail the processes of
their care, and then emulates their practices with the goal of improving one’s own
outcomes. For example, a business in the hospitality industry that wishes to
improve the satisfaction of its customers might study the practices of industry
leaders, such as the Ritz Carlton Hotel or Disney World. The practices that
employees in those outstanding businesses use are called ‘‘best practices’’ and
are then adopted by businesses that wish to improve.
Box 1. Uses of benchmarking techniques

• Comparisons of care processes and outcomes among like populations using data collected with similar definitions
• Breaking down provincial tendencies of isolated health care to demonstrate that care is not always given in the accustomed manner
• Integration of practice variation with evidence-based medicine to accelerate the process of change
• Restoring equipoise that allows a critical examination of practices, setting the stage for change
• Generating hypotheses that can be tested in clinical trials

The terminology of best practices elicits a strong negative reaction in some
physicians. They argue that such terminology within health care implies that the
practices have been exhaustively researched and proved to be superior. They
argue further that the practice of benchmarking runs counter to the movement that
seeks sound evidence for medical practices. In fact, benchmarking is a tool that
complements evidence-based medicine. Evidence-based medical practices are
based on rigorous scientific standards. Unfortunately, most practices in neo-
natology have not been rigorously studied and evidence of effectiveness does not
exist. Indeed, every physician is faced daily with deciding between alternative
treatment strategies that are essentially unstudied. Hundreds of decisions (when
to begin feedings, how rapidly to advance feedings, goals for PaCO2 and O2
saturation, and so forth) must be made in the absence of high-quality evidence to
guide these decisions. Plesk et al [10] have argued that it might not be possible to
conduct randomized trials in some areas because of the nature of the intervention,
the large number of potential treatments and variations on treatments needing
study, or the rapid pace of technologic change. Benchmarking is one tool that
may be used to learn from the existing natural experiment that is produced by
variations in practice among institutions. To reduce the distrust that the use of the
term ‘‘best practices’’ evokes in many physicians, Plesk et al [10] have proposed
the somewhat awkward but more acceptable and accurate term ‘‘potentially better
practices.’’ Plesk et al [10] state that this terminology ‘‘suggests that we are
looking for improvement ideas that have both the logical appeal and the
experience in practice that suggest that they might be beneficial if implemented
in our institution.’’
Several limitations of benchmarking must be recognized. Benchmarking is
only appropriate in areas where definitive scientific evidence of the superiority of
a treatment does not exist. One must also recognize that one is studying only a
subset of the potentially better practices in use; still better practices may
remain undiscovered. As with any practice
change, the potential for unanticipated adverse consequences exists and must be
monitored carefully. Benchmarking is one component of a continuous quality
improvement cycle of research, diligent evaluation of one’s own practices and
outcomes, formal visits to better performing sites, implementation of change, and
measurement of expected improvement.
Examples of the use of benchmarking as an improvement tool in health care
are numerous. One of the first and best documented benchmarking examples is
that of the New England Cardiovascular Collaborative [11,12]. This collabora-
tive group included 23 cardiothoracic surgeons from seven centers in Maine,
New Hampshire, and Vermont. Comparison of risk-adjusted mortality rates, from
a registry of coronary artery bypass graft surgery, convinced participants that
differences in mortality rates across institutions could not be explained by
differences in severity of illness, but were caused by unknown, and probably
subtle, differences in aspects of patient care [11]. The group designed a three-
part intervention to reduce mortality associated with coronary artery bypass graft
surgery [12]. The intervention consisted of feedback of outcome data by center,
training in continuous quality techniques, and site visits to other medical centers.
The main outcome measure was a comparison of the observed and expected
mortality ratio in the postintervention period. Mortality at six of the seven
centers declined significantly in the postintervention period for an overall 24%
reduction in mortality. The only center whose mortality did not decline was the
best-performing center before the intervention. Comparisons with historical controls
are notoriously prone to biases toward improvement over time. Comparison of
the New England Cardiovascular Collaborative with the national Health Care
Financing Administration data, however, showed the decline in mortality was
significantly better than that experienced at other centers in the nation during the
same period. Other potential sources of bias include selection bias (centers that
are motivated to improve might implement other changes that improve outcome)
and the Hawthorne effect (outcomes tend to get better when they are observed
more carefully).
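The collaborative's main outcome measure, the ratio of observed to expected mortality, can be sketched in a few lines. The counts and risk estimates below are hypothetical illustrations, not the collaborative's data; the published analysis used a validated risk-adjustment model.

```python
# Sketch of an observed/expected (O/E) mortality ratio, the outcome
# measure described above. All numbers here are invented.

def oe_ratio(observed_deaths, predicted_risks):
    """O/E ratio: observed deaths divided by the sum of each
    patient's model-predicted probability of death."""
    expected = sum(predicted_risks)
    return observed_deaths / expected

# Hypothetical center: 12 deaths among 400 patients whose risk model
# predicted these probabilities of death (one entry per patient).
risks = [0.02] * 300 + [0.10] * 80 + [0.30] * 20  # expected deaths = 20
print(round(oe_ratio(12, risks), 3))  # a ratio below 1.0 means fewer
                                      # deaths than the case mix predicts
```

A ratio above 1.0 would instead flag more deaths than the severity of illness predicts, which is what prompted the collaborative's process investigation.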
In neonatology, the Vermont Oxford Collaborative for Quality Improvement
has conducted two demonstration projects using benchmarking methodology.
One group of six neonatal intensive care units (NICUs) focused on reducing
nosocomial infection, whereas another group of four NICUs focused on reducing
the rate of bronchopulmonary dysplasia [13]. Preliminary analyses have demon-
strated significant improvement in both outcomes between 1994, the year before
the beginning of the project, and 1996, the year after implementation of the
‘‘potentially better practices’’ had begun. The overall rate of nosocomial infection
at the six NICUs in the infection subgroup declined from 26.3% in 1994 to
20.9% in 1996 (P = .007). The rate of supplemental oxygen administration at
36 weeks for infants weighing 501 to 1000 g decreased from 43.5% in 1994 to
31.5% in 1996 (P = .03) at the four NICUs in the bronchopulmonary dysplasia
subgroup. There was significant variation among NICUs with respect to whether
improvement occurred and the magnitude of improvement achieved in each
subgroup. The improvements observed at the NICUs in these subgroups were
significantly larger than were the changes observed at the 66 other North
American Vermont Oxford Network centers not participating in the NICU
project. Benchmarking is a relatively new approach in medicine and further
study with more rigorous designs is needed to demonstrate the effectiveness and
validity of this approach.
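The P values quoted above come from comparing proportions between the two periods. A minimal two-proportion z-test (pooled-variance normal approximation) sketches that arithmetic; the denominators of 1000 infants per period are invented for illustration and are not the network's actual cohort sizes or statistical methods.

```python
from math import sqrt, erf

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions,
    using the pooled-variance normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided tail
    return z, p_value

# Infection rates quoted in the text (26.3% in 1994, 20.9% in 1996),
# with hypothetical denominators of 1000 infants per period.
z, p = two_proportion_z(263, 1000, 209, 1000)
print(round(z, 2), round(p, 4))
```

With these illustrative denominators the difference is already significant at conventional levels; the published P = .007 reflects the network's actual sample sizes and analysis.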
Abuses of benchmarking
Techniques of benchmarking can be abused if the basic principles on which
the method is based are violated. Benchmarking begins with an assumption that
patient populations are similar, and that the outcomes studied are clearly defined
and measured in the same way among institutions. Benchmarking proceeds with
a careful evaluation of care processes associated with desired outcomes. A
common but incorrect methodology that is labeled benchmarking takes the
shortcut of using claims data, such as the Medicare database, or information drawn
from hospital charge data, to compare outcomes among institutions. Such efforts
are fraught with error because inherently different patient populations are
compared; they are also not true benchmarking because they examine outcomes only
and do not provide detailed data on the processes of care.
A common failure in the use of claims-based data is that such data are often
generated from the coding of medical records using ICD-9 codes and diagnosis-
related groups. These coding systems were designed for adult patients and do not
function well for pediatric patients in general and neonates in particular. Errors in
coding are common. Before any such data are used for comparison, neo-
natologists are wise to validate the data generated against other data sources,
such as an existing database or unit logbook to ensure that all patients are
captured and that values generated pass scrutiny at least at this level. For
example, if the ICD-9 or diagnosis-related group report for a large NICU
enumerates five patients per year with a birthweight less than 750 g with an
average length of stay of 15 days, neonatologists immediately recognize that
these numbers are erroneous. More subtle errors can only be detected by detailed
inspection of an internal database to confirm or refute the ICD-9 and diagnosis-
related group data reported. All such data must be verified before they are used as a
basis for comparison among institutions.
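A sanity check of this kind is easy to automate. The sketch below, with invented records and a hypothetical `audit` helper, compares an ICD-9/claims extract against the unit's own logbook for infants under 750 g, flagging both missing patients and implausible length-of-stay averages.

```python
# Cross-check a claims/ICD-9 extract against the NICU logbook before
# trusting it for benchmarking. All records below are invented.

claims_extract = [  # (patient_id, birthweight_g, length_of_stay_days)
    ("A01", 720, 15), ("A02", 690, 14),
]
unit_logbook = [
    ("A01", 720, 88), ("A02", 690, 95), ("A03", 610, 120),
]

def audit(claims, logbook, bw_cutoff=750):
    """Return infants the claims data missed, plus average LOS in
    each source, restricted to birthweight below bw_cutoff grams."""
    claims_ids = {pid for pid, bw, _ in claims if bw < bw_cutoff}
    log_ids = {pid for pid, bw, _ in logbook if bw < bw_cutoff}
    missing = log_ids - claims_ids  # in the logbook, absent from claims
    avg_claims = sum(los for _, bw, los in claims if bw < bw_cutoff) / len(claims_ids)
    avg_log = sum(los for _, bw, los in logbook if bw < bw_cutoff) / len(log_ids)
    return missing, avg_claims, avg_log

missing, claims_los, log_los = audit(claims_extract, unit_logbook)
print(missing)              # one infant dropped by the claims extract
print(claims_los, log_los)  # a 14.5-day average LOS for <750 g infants
                            # should immediately raise suspicion
```

The mismatch between the two averages is exactly the kind of gross error the text describes: no NICU discharges a cohort of infants under 750 g in about 2 weeks.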
Neonatologists must be aware of the potential pitfalls that exist in comparing
such charge-based data. Current neonatal care arises from a regionalized system
of care with planned differences in the level of severity of illness in the patients
cared for in different level units. If the impact of the regionalized system is not
recognized, erroneous conclusions about care are particularly likely to occur. For
example, if two units are compared for the outcome length of stay and unit A is
found to have an average length of stay of 8.5 days, and unit B is found to have
an average length of stay of 12.5 days, an investigator who is unfamiliar with
clinical aspects of neonatology might erroneously conclude that unit A is more
efficient than unit B. Numerous other potential explanations exist, however, for
the difference in length of stay including the following: unit A is a level 2 unit
and unit B is a level 3 unit caring for more complex and ill patients; unit A
transfers all complex patients to another institution, whereas unit B accepts
patients from other units; unit B has equally complex patients as unit A but
transfers stabilized patients back to their original hospitals to complete their
convalescence. Neonatologists must be aware of these limitations and ensure that
all comparative reports are adjusted for severity of illness.
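The length-of-stay pitfall can be made concrete with a small sketch. The units, strata, and stay lengths below are invented, and the two-level "severity" label is a stand-in; a real comparison would use a validated severity-of-illness score.

```python
# Why crude average length of stay (LOS) misleads: compare units within
# severity strata, not overall. All data here are hypothetical.

stays = [  # (unit, severity_stratum, los_days)
    ("A", "low", 6), ("A", "low", 8), ("A", "high", 20),
    ("B", "low", 7), ("B", "high", 22), ("B", "high", 25), ("B", "high", 30),
]

def mean_los(records, unit, stratum=None):
    """Average LOS for one unit, optionally within one severity stratum."""
    vals = [los for u, s, los in records
            if u == unit and (stratum is None or s == stratum)]
    return sum(vals) / len(vals)

# Crude comparison: unit A looks far "more efficient" than unit B ...
print(round(mean_los(stays, "A"), 1), round(mean_los(stays, "B"), 1))
# ... but within the low-severity stratum the units are comparable;
# unit B simply cares for more high-severity infants.
print(mean_los(stays, "A", "low"), mean_los(stays, "B", "low"))
```

The crude means differ by roughly 10 days while the within-stratum means are identical, which is the erroneous "unit A is more efficient" conclusion the text warns against.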
Protecting participants in quality improvement initiatives
In benchmarking and other quality improvement projects, one principle must
be kept in focus: patient protection. Any quality improvement project that
changes practice has the potential for unanticipated adverse consequences just
as that potential exists in randomized clinical trials. Participants in randomized
trials are protected from such consequences by the vigilance of the investigators,
oversight by institutional review boards, and data safety monitoring committees.
Strict regulations have evolved to ensure that subjects in trials are informed of the
risks and consent to participate. Patients who are affected by quality improvement
projects deserve the same protections. Casarett et al [14] have recently
proposed standards for determining when quality improvement projects should be
considered research. They propose that quality improvement projects be reviewed
and regulated as research if most patients involved are not expected to benefit
directly from the knowledge gained or if additional risks or burdens are imposed
to make the results generalizable. Whether the first of these criteria should require
institutional review as research is controversial. This standard should certainly be
applied in any quality improvement project that meets the second criterion.
Applying benchmarking to improve quality in practice
Benchmarking to improve the outcomes of care can be done at any institution.
But doing benchmarking well can be expensive and time consuming. The
necessary steps are outlined in Box 2.

Box 2. Steps in a successful benchmarking initiative

1. Compare outcomes to those of similar institutions.
2. Select an outcome to improve.
3. Scrutinize practices to become familiar with the details of care.
4. Identify comparator institutions and conduct interviews or site visits to identify potentially better practices.
5. Involve all disciplines of the care team in the project.
6. Using a physician champion, build consensus for the practice change.
7. Implement the change.
8. Measure the impact of the change on both the process and the outcomes frequently: weekly or monthly. Disseminate the results to all members of the care team.
9. Re-evaluate the project within 6 weeks of implementation to assess successes and barriers.
10. Redesign areas in need of additional improvement and begin a new cycle of change.

The payoffs for this investment are
improved quality of care. Better quality of care often translates directly into more
efficient care with shorter lengths of stay and a more favorable impact on the
hospital’s financial well being. Health administrators are well aware of this link
between quality and efficiency and often financially support teams that are
motivated to undertake these efforts. Rogowski et al [15] analyzed the economic
implications of collaborative quality improvement and reported improved patient
outcomes and substantial cost savings.
The necessary ingredients for a successful benchmarking initiative include
health team members dedicated to improving care, access to comparative data on
outcomes, and clinicians with a willingness to change current practices. These
characteristics represent the finest traditions in clinical medicine. Most physicians
intuitively recognize benchmarking processes as being similar to the opinions
they collect from respected colleagues when confronted with a challenging case.
True benchmarking expands and formalizes this process and provides access to
data that inform decisions. Some physicians are wary of participating in
benchmarking or other quality improvement processes for fear that they will be
asked to relinquish autonomy or practice ‘‘cookbook’’ medicine. Nothing could
be farther from the truth. Physicians are encouraged to continue to individualize
the treatment of their patients, while having a better understanding of the complex
interplay of processes, personnel, and organizations that impact that care.
Neonatologists are fortunate to have available an organized forum to shepherd
benchmarking and quality improvement efforts. The Vermont-Oxford Network
Evidence-Based Quality Improvement Collaborative for Neonatology, established
in 1998, provides high-quality support for NICUs beginning their collaborative
improvement efforts. For a nominal fee, member institutions participate in
2-year programs that facilitate improvement. Quarterly and annual reports allow
institutions to continue their improvement initiatives and monitor their progress
over time. These processes have not been tested rigorously in randomized
trials, but numerous reported projects suggest that they are effective. As
evidence of this, many health care insurers specifically
reward Vermont-Oxford Network institutions with higher scores on report cards.
Improving the quality of neonatal care and the outcomes of tiny patients are
goals that all neonatologists support. Benchmarking adds a tool to the armamen-
tarium to produce these changes and complements improvements driven by
evidence-based medicine.
References
[1] Wennberg J, Roos N, Sola L, et al. Use of claims data systems to evaluate health outcomes:
mortality and reoperation following prostatectomy. JAMA 1987;257:933–6.
[2] Wennberg J, Mulley AG, Hanley D, et al. An assessment of prostatectomy for benign urinary tract
obstruction. JAMA 1988;259:3027–30.
[3] Winslow CM, Kosecoff JB, Chassin MR, Kanouse DE, Brook RH. The appropriateness of carotid
endarterectomy. N Engl J Med 1988;318:721–7.
[4] Winslow C, Chassin M, Kanouse DE, Brook RH, et al. The appropriateness of performing
coronary artery bypass surgery. JAMA 1988;260:505–9.
[5] Horbar JD. Variations in neonatal intensive care. In: Lucey J, editor. Hot topics in neonatology.
Washington: 2001.
[6] Horbar JD. Hospital and patient characteristics associated with variation in 28 day mortality rates
for VLBW infants. Pediatrics 1997;99:149–56.
[7] Kahn DJ, Richardson DK, Gray JE, et al. Variation in neonatal intensive care units in narcotic
administration. Arch Pediatr Adolesc Med 1998;152:844–51.
[8] Ringer SA, Richardson DK, Sacher RA, Keszler M, Churchill WH. Variations in transfusion
practice in neonatal intensive care. Pediatrics 1998;101:194–200.
[9] Walsh-Sukys MC, Tyson J, Wright LL, Bauer CR, Korones SB, Stevenson DK, et al. Persistent
pulmonary hypertension of the newborn in the era before nitric oxide: practice variation and
outcomes. Pediatrics 2000;105:14–20.
[10] Plesk PE, Al-Aweel IC, Gray JE, Richardson DK. Characterizing practice style in neonatal
intensive care. Pediatr Res 1998;43:226A; Quality improvement methods in clinical medicine.
Pediatrics 1999;103:203–14.
[11] O’Connor GT, Plume SK, Olmstead EM, et al. A regional prospective study of in-hospital
mortality associated with coronary artery bypass grafting. JAMA 1991;266:803–9.
[12] O’Connor GT, Plume SK, Olmstead EM, et al. A regional intervention to improve the hospital
mortality associated with coronary artery bypass graft surgery. JAMA 1996;275:841–6.
[13] Horbar JD, Rogowski JR, Plesk PE, et al. Collaborative quality improvement for neonatal
intensive care. Pediatrics 2001;107:14–22.
[14] Casarett D, Karlawish JH, Sugarman J. Determining when quality improvement initiatives
should be considered research: proposed criteria and implications. JAMA 2000;283:2275–80.
[15] Rogowski JA, Horbar JD, Plesk PE, et al. Economic implications of neonatal intensive care unit
collaborative quality improvement. Pediatrics 2001;107:23–9.