

EPIDEMIOLOGY AND CLINICAL DECISION MAKING 0889-8529/97 $0.00 + .20

META-ANALYSIS AND ENDOCRINOLOGY

Diana B. Petitti, MD, MPH

Meta-analysis is a quantitative approach for systematically identifying and statistically combining the results of studies to arrive at summary conclusions about a body of research. The term meta-analysis was coined in 1976 by sociologist Roger Glass from the Greek prefix meta, which means "transcending," and the root analysis.24 The development of meta-analysis in the social sciences grew out of the perception that narrative literature reviews were selective in their inclusion of studies and subjective in their weighting of studies.

The use of meta-analysis in medicine has grown rapidly over the last decade (Table 1). Its expanding use in medicine coincides with the focus of medical research on the randomized clinical trial and with the growing emphasis on evidence-based clinical decision making. Meta-analysis and evidence-based medicine have become inextricably linked.19

Meta-analysis was initially used in medicine mainly as a way to overcome the problems associated with small and individually inconclusive clinical trials. It is now used more generally to summarize clinical trials irrespective of their size. Earlier, meta-analysis was applied mostly to combine the results of randomized clinical trials. There are many topics for which randomized trials are impossible, and for these topics nonexperimental studies provide clinically relevant information. Thus, meta-analysis of nonexperimental studies has become common. Meta-analysis is used with increasing frequency not simply to derive a single estimate of effect but more broadly to examine and try to understand the reasons for contradictions among different studies of the same topic.

USE OF META-ANALYSIS IN ENDOCRINOLOGY

Published articles that report the results of meta-analysis are found more often in generalist than specialty journals (Table 2). Among specialty journals, meta-analysis has been published most often in cardiology and obstetric journals. In the period from January 1, 1993 through January 31, 1996, no studies that used meta-analysis were identified in a MEDLINE search of two specialty journals, Diabetes and Endocrinology, that have endocrinologists as a primary audience.

From the Department of Research and Evaluation, Kaiser Permanente, Southern California Region, Pasadena, California

ENDOCRINOLOGY AND METABOLISM CLINICS OF NORTH AMERICA

VOLUME 26 * NUMBER 1 - MARCH 1997

Table 1. NUMBER OF PUBLISHED ARTICLES CONCERNING META-ANALYSIS IN MEDLINE FOR SEVERAL PERIODS

Period       Number of Articles   Articles/Year
1980-1984                     0               0
1985-1989                   299              60
1990-1992                  1146             382
1993-1995                  1533             511

A search of MEDLINE to identify published meta-analyses of four topics pertinent to endocrinologists (diabetes mellitus, lipid disorders, osteoporosis, thyroid disease) for the period January 1, 1993 through January 31, 1996 yielded 43 articles (Table 3). Among these topics, lipid disorders were most often the subject of meta-analysis. Seventeen of the 43 meta-analyses (40%) published in the period from January 1, 1993 through January 31, 1996 on topics pertinent to endocrinology were meta-analyses of nonexperimental studies or included nonexperimental as well as experimental studies.

Table 4 describes four recently published meta-analyses pertinent to endocrinology.5, 14, 20, 57 These meta-analyses are used to illustrate various points throughout this article. The four meta-analyses exemplify the trend toward meta-analysis of nonexperimental as well as experimental studies.

STEPS IN A META-ANALYSIS

Overview

There are five steps in a meta-analysis. First, studies with relevant data are identified. Second, eligibility criteria for inclusion and exclusion of the identified studies are defined. Third, data are abstracted from eligible studies. Fourth, the abstracted data are analyzed statistically. Fifth, heterogeneity and the reasons for heterogeneity are examined.

Table 2. NUMBER OF ARTICLES USING META-ANALYSIS OR COMMENTING ON META-ANALYSIS PUBLISHED IN VARIOUS GENERALIST AND SPECIALIST JOURNALS FROM JANUARY 1, 1993 THROUGH JANUARY 31, 1996

Journal                                            Number of Articles
Generalist
  Annals of Internal Medicine                                      35
  Journal of the American Medical Association                      48
  New England Journal of Medicine                                  11
Specialist
  American Journal of Cardiology                                   12
  Diabetes                                                          0
  Endocrinology                                                     0
  Journal of Infectious Disease                                     1
  Journal of Urology                                                6
  Obstetrics & Gynecology                                          19
  Radiology                                                        12

Table 3. NUMBER OF PUBLISHED ARTICLES USING META-ANALYSIS FOR SEVERAL TOPICS PERTINENT TO ENDOCRINOLOGY FROM JANUARY 1, 1993 THROUGH JANUARY 31, 1996

Topic                 Number of Articles
Diabetes mellitus                      8
Lipid disorder*                       30
Osteoporosis                           2
Thyroid disease†                       3

*Cholesterol, triglycerides, apolipoproteins, lipoproteins, hyperlipidemia.
†Hypothyroidism, hyperthyroidism, Graves' disease, thyroid neoplasms, thyroidectomy, thyroid hormones.

Identification of Studies

A systematic, explicit approach to the identification of studies with relevant data is one of the features of meta-analysis that distinguishes it from a qualitative literature review. Ideally, the search for a meta-analysis would identify all of the relevant information, including information from unpublished studies and from studies that are still in progress. In practice, the retrieval of information for a meta-analysis is usually limited to published information. In a review of 150 published meta-analyses, Cook and co-workers12 found that 46 (30.7%) included unpublished data in their primary analysis.

The identification of published studies usually begins with a computerized search of MEDLINE and other computerized literature databases. The titles and abstracts of studies identified in the computerized search are scanned to exclude ones that are clearly irrelevant. The full texts of the remaining articles are retrieved and read to determine whether they contain information on the topic of interest. The reference lists of articles with information on the topic of interest are reviewed to identify citations to other studies of the same topic, and publications that were not identified in the computerized literature search are retrieved and reviewed for the presence of relevant information. Reference lists of review articles are also perused to check for the completeness of the assembled list of relevant publications. In many cases, the list of studies identified by computer literature search and by reference checks is submitted for review by a knowledgeable expert who is asked to identify studies of the topic that have not been included in the list.

Table 4. DESCRIPTION OF FOUR RECENTLY PUBLISHED META-ANALYSES PERTINENT TO ENDOCRINOLOGY

Study                  Topic                                                   Design
Warshafsky et al57     Garlic and serum cholesterol                            Randomized trials
Cummings and Psaty14   Cholesterol lowering and death from injury              Randomized trials*
Bohlen et al5          ACE inhibitors and proteinuria in diabetics             Longitudinal, randomized trials
Faber and Galloe20     Change in bone mass in women treated with L-thyroxine   Cross-sectional

*Animal studies, cross-sectional studies, and cohort studies were also identified but quantitative estimates of effect were not derived from these studies.

The number of potentially relevant publications retrieved in a MEDLINE search frequently is large. For example, in the meta-analysis by Bohlen and co-workers5 comparing the effect on proteinuria of ACE inhibitors and other antihypertensive agents, 260 publications were identified in the literature search conducted as described previously.

The work of retrieving all of the potentially relevant literature for a meta-analysis can be large, and thus the conduct of a meta-analysis can be expensive and time-consuming.

Defining Eligibility Criteria

The next step in a meta-analysis is specification of eligibility criteria. Explicit delineation of eligibility criteria for inclusion in a meta-analysis ensures reproducibility of the meta-analysis and minimizes bias in the selection of studies for the meta-analysis. Eligibility criteria are usually given as inclusion criteria and exclusion criteria. The reasons for the inclusions and exclusions are generally explained or documented in the protocol for the meta-analysis.

Table 5 shows the inclusion and exclusion criteria for the meta-analysis of garlic and cholesterol lowering by Warshafsky and co-workers.57 These exemplify the explicitness that is necessary in the definition of eligibility criteria in a well-conducted meta-analysis.

When the eligibility criteria for a meta-analysis are strict, the final number of studies included in the meta-analysis may be small in comparison with the number of studies identified in the literature search. In the meta-analysis by Warshafsky and colleagues,57 28 of 260 studies obtained in the literature search measured the effect of garlic on serum cholesterol in humans. Of these 28 studies, only 5 fulfilled the eligibility criteria for the meta-analysis.

A log of excluded studies is kept, and the reasons for exclusion are recorded. In many publications reporting the results of meta-analysis, citations to excluded as well as included studies are provided. The citation of excluded studies is desirable because it allows readers and reviewers to assess the completeness of the attempt at retrieving relevant studies.

Table 5. INCLUSION AND EXCLUSION CRITERIA IN META-ANALYSIS OF GARLIC AND CHOLESTEROL LOWERING

Inclusion criteria
  Published
  Randomized with placebo control
  Effect of garlic on cholesterol a prespecified hypothesis
  Total cholesterol in experimental and control groups exceeded 5.7 mmol/L in 75% of subjects
Exclusions
  Information to calculate effect size not presented
  Control medication altered lipids
  Purpose of study, effect of garlic on fat loading

From Warshafsky S, Kamer RS, Sivak SL: Effect of garlic on total serum cholesterol: A meta-analysis. Ann Intern Med 119:599, 1993; with permission.

Data Abstraction

The next step in the meta-analysis is the abstraction of data from publications and study reports. In determining eligibility, information that documents whether identified studies are eligible for the meta-analysis is abstracted for all of the studies identified in the literature search. Next, for all eligible studies, information on the relevant outcomes of the study is abstracted.

The procedures for abstracting data in a meta-analysis should be similar to procedures used to abstract data from medical records or other administrative documents. That is, data should be abstracted onto structured forms that have been pretested. There should be an explicit plan to ensure minimal coder bias and the reliability of data abstraction.52, 53

Statistical Analysis

The next step is statistical analysis of the data. The result of the statistical analysis is most often a summary estimate of effect size and a measure of its variance or a 95% confidence interval. The summary estimate is interpreted as the estimate of the effect size combining all of the eligible studies.

The summary estimate is determined by the type of study and the question being addressed. The meta-analysis of the effect of garlic on serum cholesterol by Warshafsky and colleagues57 derived a summary estimate of the reduction in serum cholesterol for garlic compared with placebo (-0.59 mmol/L; 95% confidence interval, -0.74 mmol/L to -0.44 mmol/L). The meta-analysis of cholesterol lowering and death from injury by Cummings and Psaty14 derived a summary estimate of the relative risk of death from injury in men treated with cholesterol-lowering drugs compared with the risk in a control group (1.41; confidence interval, 0.95 to 2.10). The meta-analysis comparing ACE inhibitors with other antihypertensive agents by Bohlen and co-workers5 derived a summary estimate of the percentage reduction in urinary albumin or protein in diabetic patients treated with ACE inhibitors (45%; 95% confidence interval, -64% to -25%), in patients treated with diuretics with or without beta-blockers (23%; 95% confidence interval, -35% to -11%), and in patients treated with calcium channel antagonists (17%; 95% confidence interval, -33% to -2%).

The statistical formulas used to estimate effect size and variance or 95% confidence intervals are not presented here. The interested reader is referred to introductory descriptions of these methods by Greenland,26 Petitti,42 and Wolf.58 The articles by Cooper and Hedges,13 Hunter and colleagues, and Hedges and Olkin30 are advanced discussions of statistical analysis for meta-analysis.
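As a rough illustration of what a summary estimate and its 95% confidence interval involve, the sketch below uses inverse-variance (fixed-effect) weighting, one common approach described in the introductory texts cited above. The article does not say which formula each of the four meta-analyses used, and the study values, the function name, and the variable names here are hypothetical.

```python
import math

def fixed_effect_summary(estimates, std_errors, z=1.96):
    """Inverse-variance weighted summary estimate with an approximate 95% CI."""
    # Each study is weighted by the reciprocal of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    return pooled, (pooled - z * se_pooled, pooled + z * se_pooled)

# Hypothetical differences in serum cholesterol (mmol/L); not data from any cited study.
effects = [-0.70, -0.45, -0.60]
std_errors = [0.20, 0.15, 0.10]

pooled, (low, high) = fixed_effect_summary(effects, std_errors)
print(f"{pooled:.2f} mmol/L (95% confidence interval, {low:.2f} to {high:.2f})")
```

Larger, more precise studies dominate the pooled value, which is why the summary sits closest to the third hypothetical study.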

Examination of Heterogeneity

Heterogeneity literally means dissimilarity. The term heterogeneity in meta-analysis is used to refer to two things: dissimilarities in the quantitative results of individual studies in the meta-analysis, called statistical heterogeneity, and dissimilarities in the designs, end points, and methodologies of studies in the meta-analysis, usually called clinical heterogeneity. Examining the reasons for statistical heterogeneity is an increasingly important goal of meta-analysis.11, 54 A meta-analysis that does not assess the possibility of statistical heterogeneity and attempt to explain the heterogeneity should be interpreted cautiously.

Statistical heterogeneity may be caused by clinical heterogeneity. An understanding of how clinical heterogeneity affects effect measures can provide important scientific and clinical insights.54 The usefulness of identifying statistical heterogeneity and exploring the relationship between clinical heterogeneity and statistical heterogeneity is illustrated in the meta-analysis of the effects ACE inhibitors have in comparison with other antihypertensive agents on proteinuria in diabetic patients.5 In that study, the researchers found evidence of statistical heterogeneity in the effect of calcium channel antagonists. They compared the change in urinary protein in patients treated with nifedipine with the change in patients treated with other calcium channel blockers. An increase in urinary protein was found in patients treated with nifedipine. When studies of nifedipine were excluded from the analysis, the percentage change in urinary albumin or protein in patients treated with calcium channel antagonists was -35% (95% confidence interval, -47% to -24%) compared with -17% in all studies of calcium channel antagonists. The researchers were able to identify specific biologic mechanisms that could account for the difference in the effect of nifedipine compared with other calcium channel antagonists and concluded that nifedipine might have a specific adverse effect on urinary protein in diabetes.

Statistical heterogeneity is assessed by conducting a formal statistical test. When studies are found to be statistically heterogeneous based on the results of a formal test, some investigators argue that a summary measure of effect size should not be calculated.26, 27 Others recommend using a random effects model to take the heterogeneity into account statistically.22

For more information on the formulas used for calculating statistics to assess statistical heterogeneity and the complex arguments for and against the use of fixed or random effect models, the reader is referred to the articles by Fleiss,21 DerSimonian and Laird,15 Laird and Mosteller, and Wolf.58 Articles by Colditz and co-workers,11 Bailey, Fleiss and Gross,22 Peto, and Thompson and Pocock55 discuss fixed versus random effect models in detail.
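The sketch below shows one widely used pairing of a heterogeneity statistic with a random effects model: Cochran's Q and the DerSimonian and Laird moment estimate of between-study variance. It is offered only as an illustration under that assumption, since the article does not present formulas or say which test each meta-analysis applied. The input values and all function and variable names are hypothetical.

```python
import math

def cochran_q(estimates, std_errors):
    """Cochran's Q statistic, its degrees of freedom, and the fixed-effect weights."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    return q, len(estimates) - 1, weights

def dersimonian_laird(estimates, std_errors):
    """Random effects summary using the DerSimonian-Laird variance estimate."""
    q, df, weights = cochran_q(estimates, std_errors)
    scale = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / scale)  # between-study variance, floored at zero
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    return pooled, math.sqrt(1.0 / sum(re_weights)), tau2

# Hypothetical percentage changes in urinary protein; one study points the other way.
changes = [-35.0, -30.0, 10.0, -40.0]
std_errors = [8.0, 10.0, 9.0, 12.0]

q, df, fixed_weights = cochran_q(changes, std_errors)
re_pooled, re_se, tau2 = dersimonian_laird(changes, std_errors)
fixed_se = math.sqrt(1.0 / sum(fixed_weights))
print(f"Q = {q:.1f} on {df} df, tau^2 = {tau2:.1f}")
print(f"random effects SE = {re_se:.1f}, fixed effect SE = {fixed_se:.1f}")
```

Q is referred to a chi-square distribution with the stated degrees of freedom. When Q exceeds its degrees of freedom, the random effects model widens the confidence interval instead of explaining the discrepancy, which is why the text stresses examining the reasons for heterogeneity.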

The statistical power of tests of heterogeneity is low. For this reason, the absence of statistical evidence of heterogeneity should not preclude an assessment of the possible effects of clinical heterogeneity on the study results. In the meta-analysis of cholesterol lowering and death from injury,14 a statistical test of heterogeneity for all primary prevention trials, which included trials of both diet and drugs to lower cholesterol, had a P value greater than 0.2, which is not significant. Other investigators had previously hypothesized that cholesterol lowering by drug therapy but not by diet increased the risk of death from injury. Cummings and Psaty14 appropriately compared estimates of relative risk of death from injury in primary prevention studies that used drugs to lower cholesterol with studies that used diet, even though there was no statistical evidence of heterogeneity. The relative risk estimates for men were 1.39 (95% confidence interval, 0.84 to 2.29) for studies of drugs and 1.50 (95% confidence interval, 0.76 to 2.94) for the single study of diet. The meta-analysis provided no evidence of a difference in the risk of death from injury for cholesterol lowering by drugs compared with cholesterol lowering by diet.
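The drug and diet estimates quoted above can be compared with a back-of-the-envelope calculation. The sketch below recovers approximate standard errors of the log relative risks from the published 95% confidence intervals and forms a z statistic for their difference. This is an illustrative technique chosen here, not necessarily the comparison Cummings and Psaty performed, and the function and variable names are hypothetical.

```python
import math

def log_rr_and_se(rr, lower, upper, z=1.96):
    """Log relative risk and its standard error, recovered from a 95% CI."""
    return math.log(rr), (math.log(upper) - math.log(lower)) / (2 * z)

# Relative risks for men as quoted in the text.
drug_log_rr, drug_se = log_rr_and_se(1.39, 0.84, 2.29)
diet_log_rr, diet_se = log_rr_and_se(1.50, 0.76, 2.94)

# Difference on the log scale, assuming the two estimates are independent.
difference = drug_log_rr - diet_log_rr
se_difference = math.sqrt(drug_se ** 2 + diet_se ** 2)
z_statistic = difference / se_difference
ratio_of_rrs = math.exp(difference)

print(f"ratio of relative risks = {ratio_of_rrs:.2f}, z = {z_statistic:.2f}")
```

The z statistic is far from the conventional threshold of 1.96, consistent with the conclusion that the data give no evidence of a difference between drugs and diet.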

POOLING DISTINGUISHED FROM META-ANALYSIS

In pooled analysis, the original subject-level data are obtained from investigators and analyzed using common definitions, coding, and cut-points for variables. It is possible to adjust for the same confounders using the same statistical model. Pooled analysis is not the same as meta-analysis. Special problems and limitations are associated with pooled analysis, including problems related to the unwillingness of investigators to provide data for inclusion in the pooled study.10 If some investigators with relevant data do not agree to cooperate, the pooled analysis may be biased by the exclusion of these studies.

Pooled analysis is more labor-intensive and time-intensive than meta-analysis. An article by Friedenreich23 describes how pooled analysis of epidemiologic studies is done.

LIMITATIONS OF META-ANALYSIS

Publication Bias

The validity of meta-analysis is critically dependent on the identification and consideration of all information on a given topic. Most meta-analyses include only published studies, which makes them vulnerable to publication bias. Publication bias refers to the greater likelihood that research with statistically significant results will be submitted and published in comparison with research with nonsignificant and null results.4

The existence of a bias in favor of the publication of statistically significant results is well documented.18, 49, 51 Easterbrook and co-workers18 examined the publication status of 285 analyzed studies for which institutional review board approval had been obtained between 1984 and 1987 in Oxford. A total of 154 studies had statistically significant results and 131 did not. Of the 154 studies with statistically significant results, 60.4% had been published. Of the 131 studies that did not have statistically significant results, 34.4% had been published.

Publication bias frequently is attributed to editorial policies that favor the publication of positive results and to a presumed bias of journal reviewers against negative results. Dickersin and co-workers16 found that more than 90% of 124 unpublished studies from Johns Hopkins University had not been submitted for publication. Only 6 of the 124 unpublished studies had been submitted for publication and rejected by a journal. Thus, publication bias is a result of the investigator's failure to submit negative studies and not of the policies of journals with regard to negative results or reviewer bias against negative results.

Statistical and quasistatistical solutions to assessing and overcoming publication bias have been described.46, 47 Most of these approaches to correct for bias in the original study or to determine the effect of publication bias on the conclusion of a meta-analysis are not recommended because they are poorly based in statistical theory or make insupportable statistical assumptions.32, 45

The method referred to as the "file drawer" estimate is used frequently in an attempt to address the issue of publication bias. The method has considerable intuitive appeal. It purports to estimate the number of unpublished studies that would need to exist to negate the results of the published studies that were included in the meta-analysis. In making this estimate, it is assumed that the mean effect in unobserved unpublished studies is zero, and that the mean number of subjects in each of the hypothetical unpublished studies is the same as the mean in the published studies. In reality, it is possible that the mean effect might be in the opposite direction in the unobserved unpublished studies. It is impossible to know how large the unobserved studies might be; they might be much larger than the observed studies. Use of the file drawer method is based on circular reasoning. It leads to a predictable conclusion; if one assumes there is no publication bias, publication bias does not account for the results.
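To make the assumption behind the file drawer estimate concrete, the sketch below uses the commonly cited fail-safe N formula attributed to Rosenthal, in which each hypothetical unpublished study contributes a z score of exactly zero. Whether this is the precise variant the cited authors used is an assumption, and the z scores and names are hypothetical.

```python
import math

def fail_safe_n(z_scores, z_critical=1.645):
    """Number of zero-effect studies needed to pull the combined z below z_critical."""
    k = len(z_scores)
    # Combined z is sum(z) / sqrt(k + n); solving for n gives the expression below.
    n = (sum(z_scores) / z_critical) ** 2 - k
    return max(0, math.ceil(n))

# Hypothetical z scores from five published studies.
published_z = [2.1, 1.8, 2.5, 1.2, 2.9]
needed = fail_safe_n(published_z)
print(needed)
```

The calculation has no term for unpublished studies whose effects run in the opposite direction or whose sample sizes are larger, which is the weakness described above.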

The meta-analysis of changes in bone mass in women treated with L-thyroxine20 calculated a file drawer estimate of the number of unpublished studies that would have to exist to make the observed decrease in bone mass in women treated with L-thyroxine insignificant. The other meta-analyses described in Table 4 did not carry out calculations of this type. These calculations are rare in meta-analyses of medical topics.

The funnel plot is an exception to this general caution with regard to statistical and quasistatistical approaches to publication bias. This graphical technique was first described by Light and Pillemer. The effect measure, appropriately scaled, is plotted on the horizontal axis and the sample size on the vertical axis. In the absence of publication bias, the graph should have the shape of a funnel that is viewed sideways with the large opening down and the tip pointed up and centered on the true effect size. If there is bias against the publication of null results or results showing an adverse effect of the treatment, the left corner of the pyramidal part of the funnel will be distorted or missing. When a funnel plot is distorted, publication bias should be suspected.
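A funnel plot is normally judged by eye, but the idea of a missing left corner can be sketched numerically. The example below takes (effect, sample size) pairs, uses a sample-size-weighted mean as a rough stand-in for the center of the funnel, and counts small studies on each side of it. The data, the cutoff of 150 subjects, and the function names are hypothetical, and the count is only a crude substitute for inspecting the plot itself.

```python
def weighted_center(studies):
    """Sample-size-weighted mean effect, used here as the funnel's center."""
    total_n = sum(n for _, n in studies)
    return sum(effect * n for effect, n in studies) / total_n

def corner_counts(studies, center, small_cutoff):
    """Count small studies falling to the left and right of the center."""
    small_effects = [effect for effect, n in studies if n < small_cutoff]
    left = sum(1 for effect in small_effects if effect < center)
    right = sum(1 for effect in small_effects if effect > center)
    return left, right

# Hypothetical (effect, sample size) pairs; positive effects favor treatment.
studies = [
    (-0.02, 1200), (0.03, 900), (0.00, 700), (0.10, 300), (-0.08, 350),
    (0.25, 80), (0.30, 60), (0.18, 100), (0.35, 40), (0.22, 70),
]

center = weighted_center(studies)
left_small, right_small = corner_counts(studies, center, small_cutoff=150)
print(f"center = {center:.3f}; small studies left = {left_small}, right = {right_small}")
```

In this made-up data set every small study lies to the right of the center, the pattern that would leave the left corner of the funnel empty. The same pairs could be passed to any plotting library, with effect on the horizontal axis and sample size on the vertical axis.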

Other Causes of Incomplete Information Retrieval

Even if all studies of a topic are published, it may be difficult to identify and retrieve them, and a meta-analysis may be biased because of incomplete retrieval of information from the published literature. For example, published studies may be missed in a MEDLINE search because the indexing of letters is incomplete, because published abstracts are selectively indexed, or because the search algorithm is imperfect. Relevant studies may be buried in what is often called the "fugitive literature," that is, government reports, book chapters, the proceedings of conferences, and published dissertations.

The exclusion of published abstracts is especially problematic. Only 35% to 40% of abstracts are followed by a full report within 4 to 5 years.17, 25, 35, 36 Chalmers and co-workers6 showed that publication in full form is unrelated to the quality of the original study. A meta-analysis that excludes published abstracts could be biased toward the exclusion of negative studies.

It is common to base meta-analysis on journal articles available in English. Gregoire and co-workers28 reported that 78% of the meta-analyses they identified had language restrictions. Language restrictions are probably imposed because it is difficult to retrieve publications in foreign-language journals. When such articles are retrieved, translation can be costly.

Moher and co-workers assessed the completeness of reporting, design characteristics, and analytic approaches in 133 randomized trials published in English and 96 randomized trials published in French, German, Italian, and Spanish. There were no differences in the completeness of reporting or in the proportion of "adequately reported" studies between those published in English and other languages. Differences in analytic approaches were found, but these differences were identifiable and could be taken into account in a meta-analysis. Because studies published in languages other than English are equal in quality to studies published in English, and because a main goal of meta-analysis is comprehensiveness and completeness of retrieval of relevant information, a well-conducted meta-analysis should include information published in any language.

In the meta-analysis of garlic and cholesterol lowering,57 two of the five studies included in the final analysis were published in a language other than English. Of the total of 14 controlled trials examining garlic and cholesterol lowering (9 were excluded from the final analysis), 6 were published in foreign-language journals. Restriction of the meta-analysis to publications in English would have resulted in a paltry yield for this topic.

A critical review of a meta-analysis should consider the completeness of information retrieval and the possibility of bias due to failure to identify and include all of the pertinent published information. The failure to retrieve published information from non-English journals is as important a threat to the validity of meta-analysis as publication bias.

Bias in the Original Studies

The validity of meta-analysis is dependent on the validity of the studies that are included in the meta-analysis. Meta-analysis cannot compensate for bias in the original studies. If the original studies fail to randomize properly, have an improper control group, mismeasure exposure or disease, do not control adequately for confounding, or have a low response rate or a high rate of loss to follow-up, meta-analysis will not remedy the problem. Meta-analysis is particularly useless when most of the studies of the topic at hand are similarly flawed. The problem created when bias in the original studies is ignored has been called the "garbage in-garbage out" problem.

ACCOUNTING FOR STUDY QUALITY

Meta-analysis focuses attention on the quality of individual studies and the adequacy and completeness of the reporting of the methods and results of these studies. Critical review of all of the studies of a given topic almost always leads to the conclusion that they have not all been performed with equal care. The recognition that not all studies are equal in the quality of conduct and reporting has led to attempts to assess study quality formally and to take it into account in meta-analysis.

Attempts to account for study quality in a meta-analysis have focused for the most part on the development of standard instruments and checklists to evaluate the quality of studies and study reporting.9 Measured study quality has then been used as a stratification or weighting variable in meta-analysis. Alternatively, poor measured study quality is used as a basis for excluding a study entirely from the meta-analysis, which is the equivalent of giving the study a statistical weight of zero.

Accounting for study quality is appealing in theory. In practice, it has been difficult to develop valid and reliable instruments to assess quality. The completeness and detail of reporting in a study, which are often dictated by journal limitations on space, have a large influence on the measured quality of the study and may result in an excellent study being inappropriately down-weighted in the analysis. Empiric study reveals large disparities in the development of scales and checklists to assess quality.

The standardized assessment of study quality is a subfield of meta-analysis that is still in its infancy. Given the theoretical and practical problems of assessing study quality, meta-analyses that use study quality as a weighting variable should be interpreted cautiously.

Concern about study quality and the completeness of reporting of study methods and results has been especially intense for randomized trials. This concern has resulted in the development of explicit standards for the reporting of randomized trials in biomedical journals (Standards Reporting Group). Adherence to these standards will enhance the ability to conduct careful meta-analysis of randomized trials in the future.

VALIDITY, REPRODUCIBILITY, AND QUALITY OF META-ANALYSIS

Validity

Validity is the ability of a method to measure what it is meant to measure. As applied to meta-analysis, validity means the ability to ascertain "the truth" about the relationship between an intervention and an outcome or between an exposure and a disease. Two investigative groups have attempted to assess the validity of meta-analysis by comparing the results of a meta-analysis with a single large gold standard trial.

Chalmers and co-workers8 compared the results of a meta-analysis of multiple, undersized randomized trials with the results of a large gold standard trial for three topics: beta-blockers in myocardial infarction, phenobarbital and intracranial hemorrhage in neonates, and intravenous streptokinase in myocardial infarction. For all three topics, the results of the meta-analysis and the gold standard study were at least somewhat discrepant.

Villar and co-workers56 performed meta-analyses for 30 different interventions during pregnancy and childbirth. The results of the 30 meta-analyses were compared with the results of the largest trial for each intervention. Total agreement was defined to be a relative risk in the same direction and the same statistical significance (or lack of it) for both the meta-analysis and the largest trial. The meta-analysis and the largest trial were considered to be in partial agreement if the relative risk was in the same direction but the two studies differed with regard to statistical significance. The meta-analysis and the largest trial were considered to disagree when the relative risks were in opposite directions. There was total or partial agreement for 24 of 30 topics (80%) and disagreement for 6 of 30 (20%).
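The three agreement categories defined above amount to a simple decision rule, sketched below. The function name and the example values are hypothetical, and treating a relative risk of exactly 1.0 as having no direction (and therefore as disagreement) is an assumption made here, since the text does not cover that case.

```python
def classify_agreement(meta_rr, meta_significant, trial_rr, trial_significant):
    """Classify a meta-analysis against the largest trial using the stated definitions."""
    # Same direction means both relative risks lie on the same side of 1.0.
    same_direction = (meta_rr - 1.0) * (trial_rr - 1.0) > 0
    if not same_direction:
        return "disagreement"
    if meta_significant == trial_significant:
        return "total agreement"
    return "partial agreement"

# Hypothetical comparisons, not results from the cited study.
examples = [
    classify_agreement(0.80, True, 0.85, True),
    classify_agreement(0.80, True, 0.90, False),
    classify_agreement(0.80, True, 1.10, False),
]
print(examples)
```

Under these definitions, a meta-analysis and a trial that point the same way but differ in statistical significance still count toward the 80% agreement figure, so that figure mixes total and partial agreement.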

Based on the observation that meta-analysis did not always square with the results of a large trial of the same topic, Villar and colleagues56 raised concerns about deciding not to conduct a large trial based on the existence of a meta-analysis composed of many small studies. Their concern parallels that expressed elsewhere, that is, that the apparent certainty conveyed by a meta-analysis would dissuade individuals from undertaking (and agencies from funding) further original research on the same topic.

Reproducibility

The reproducibility or replicability of a method is critical to its validity. A method that does not yield reproducible results cannot be valid. Chalmers and co-workers7 compared the results of independent replications of meta-analyses for 18 topics. For 12 of the 18 topics, all of the replicate meta-analyses agreed both in the direction of the effect of the intervention and with regard to whether the effect was statistically significant at a probability level of 0.05. For 3 of the 18 topics, at least one of the replicates found an effect opposite in direction to another replicate. The replicability of meta-analysis of nonexperimental studies is unstudied. The poor replicability of meta-analysis is troublesome and much ignored.
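The two criteria used to judge replicate agreement (same direction of effect, same significance status at the 0.05 level) can be checked mechanically. The sketch below is a minimal illustration under stated assumptions: each replicate is represented as a hypothetical pair of a signed effect (for example, a log odds ratio) and a P value, and an effect of exactly zero is grouped with negative effects. None of these representational choices comes from the original comparison.

```python
def replicates_agree(replicates, alpha=0.05):
    """Return True if all replicates agree in direction and in significance.

    replicates: list of (effect, p_value) pairs, where effect is signed
    (assumed here to be on a scale such as the log odds ratio).
    """
    if not replicates:
        raise ValueError("at least one replicate is required")
    # Assumption: an effect of exactly 0 is grouped with negative effects.
    directions = {effect > 0 for effect, _ in replicates}
    significance = {p_value < alpha for _, p_value in replicates}
    return len(directions) == 1 and len(significance) == 1


def any_opposite_direction(replicates):
    """Return True if at least one replicate is opposite in sign to another."""
    effects = [effect for effect, _ in replicates]
    return min(effects) < 0 < max(effects)


# Hypothetical replicate sets for illustration only.
consistent = [(-0.30, 0.01), (-0.25, 0.03)]
mixed_significance = [(-0.30, 0.01), (-0.10, 0.20)]
opposite = [(-0.30, 0.01), (0.15, 0.40)]
```

A topic such as `mixed_significance` fails the agreement check without any replicate being opposite in direction, which is why the two counts reported above (12 of 18 and 3 of 18) do not sum to 18.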

Quality

In a 1987 publication, Sacks and co-workers48 evaluated the quality of 86 published meta-analyses. Only 24 of 86 meta-analyses addressed all of six major areas that were considered to measure the quality of meta-analysis. For 18 of 23 features considered to measure the quality of meta-analysis, more than 50% of the meta-analyses were rated as inadequate. A more recent systematic assessment of the quality of meta-analysis could not be identified.

In an editorial about meta-analysis of randomized controlled trials published in the Journal of the American Medical Association, Moher and Olkin37 pointed out several recent examples in which the quality of a published meta-analysis and the reporting of the meta-analyses were deficient. In an essay based on his qualitative impressions about meta-analysis, Bailar1 concluded that, in practice, meta-analysis was sometimes "careless, and even biased." He noted the frequent failure to observe elementary precautions that might prevent the misinterpretation of meta-analysis. Bailar was particularly concerned about the focus on a single, combined, "best" quantitative estimate from all studies.

Concerns about the quality of meta-analysis are important. Perhaps because of the popularity of the method, inexperienced practitioners are conducting meta-analysis. There is much room for improvement in the conduct and reporting of meta-analysis.

STRENGTHS OF META-ANALYSIS

Meta-analysis has many strengths and many advantages over traditional narrative literature review. It defines a standard approach to literature retrieval that ensures complete retrieval of relevant information on a topic. It forces the reviewer/analyst to define explicit criteria for accepting or rejecting studies as a basis for drawing conclusions. It provides a framework for identifying and then explaining inconsistencies in studies of the same topic. It aids in the identification of areas in which further research should be undertaken to resolve inconsistencies among studies. When studies are consistent, it provides an estimate of effect that is useful in guiding policy and public health decisions. Meta-analysis has legitimized literature review as an academically productive endeavor.

SUMMARY

It is likely that more studies that use meta-analysis will be published in the endocrinologic literature. Major strengths of meta-analysis are the systematic ascertainment of research on a given topic and the explicit delineation of reasons for accepting or rejecting studies as a basis for drawing conclusions.

The tendencies of meta-analysis to focus on a single estimate of effect and to ignore heterogeneity are problems both in the conduct of meta-analysis and in the way in which it is interpreted. Meta-analysis cannot overcome bias in the original studies. It is difficult to perform a good meta-analysis and easy to perform a bad one.

The critical reader should not be overawed by the results of a meta-analysis. Reading a meta-analysis should not substitute for careful reading of the primary studies on which the meta-analysis is based. Meta-analysis should not be used to stifle the conduct of original research.

References

1. Bailar JC: The practice of meta-analysis. J Clin Epidemiol 48:149, 1995

2. Bailey KR: Inter-study differences: How should they influence the interpretation and analysis of results? Stat Med 6:351, 1987

3. Bayarri MJ: Comment on "Selection models and the file drawer problem." Stat Sci 3:128, 1988

4. Begg CB, Berlin JA: Publication bias: A problem in interpreting medical data. J Royal Stat Soc A 151:419, 1988

5. Bohlen L, de Courten M, Weidmann P: Comparative study of the effect of ACE inhibitors and other antihypertensive agents on proteinuria in diabetic patients. Am J Hypertens 7:84S, 1994

6. Chalmers I, Adams M, Dickersin K, et al: A cohort study of summary reports of controlled trials. JAMA 263:1401, 1990

7. Chalmers TC, Berrier J, Sacks HS, et al: Meta-analysis of clinical trials as a scientific discipline. II. Replicate variability and comparison of studies that agree and disagree. Stat Med 6:733, 1987

8. Chalmers TC, Levin H, Sacks HS, et al: Meta-analysis of clinical trials as a scientific discipline. I. Control of bias and comparison with large cooperative trials. Stat Med 6:315, 1987

9. Chalmers TC, Smith H Jr, Blackburn B, et al: A method for assessing the quality of a randomized control trial. Controlled Clin Trials 2:31, 1981

10. Checkoway H: Data pooling in occupational studies. J Occup Med 33:1257, 1991

11. Colditz GA, Burdick E, Mosteller F: Heterogeneity in meta-analysis of data from epidemiologic studies: A commentary. Am J Epidemiol 142:371, 1995

12. Cook DJ, Guyatt GH, Ryan G, et al: Should unpublished data be included in meta-analyses? Current convictions and controversies. JAMA 269:2749, 1993

13. Cooper H, Hedges LV (eds): The Handbook of Research Synthesis. New York, Russell Sage Foundation, 1994

14. Cummings P, Psaty BM: The association between cholesterol and death from injury. Ann Intern Med 120:848, 1994

15. DerSimonian R, Laird N: Meta-analysis in clinical trials. Controlled Clin Trials 7:177, 1986

16. Dickersin K, Min Y-I, Meinert CL: Factors influencing publication of research results: Follow-up of applications submitted to two institutional review boards. JAMA 267:374, 1992

17. Dudley HAF: Surgical research: Master or servant? Am J Surg 135:458, 1978

18. Easterbrook PJ, Berlin JA, Gopalan R, et al: Publication bias in research. Lancet 337:867, 1991

19. Eysenck HJ: An exercise in mega-silliness. Am Psychol 33:517, 1978

20. Faber J, Galloe AM: Changes in bone mass during prolonged subclinical hyperthyroidism due to L-thyroxine: A meta-analysis. Eur J Endocrinol 130:350, 1994

21. Fleiss JL: Statistical Methods for Rates and Proportions. New York, John Wiley & Sons, 1981, p 160

22. Fleiss JL, Gross AJ: Meta-analysis in epidemiology, with special reference to studies of the association between exposure to environmental tobacco smoke and lung cancer: A critique. J Clin Epidemiol 44:127, 1991

23. Friedenreich CM: Methods for pooled analyses of epidemiologic studies. Epidemiology 4:295, 1993

24. Glass GV: Primary, secondary and meta-analysis of research. Educ Res 5:3, 1976

25. Goldman L, Loscalzo A: Fate of cardiology research originally published in abstract form. N Engl J Med 303:255, 1980

26. Greenland S: Quantitative methods in the review of epidemiologic literature. Epidemiol Rev 9:1, 1987

27. Greenland S, Salvan A: Bias in the one-step method for pooling study results. Stat Med 9:247, 1990

28. Gregoire G, Derderian F, Le Lorier J: Selecting the language of the publications included in a meta-analysis: Is there a tower of Babel bias? J Clin Epidemiol 48:159, 1995

29. Ham C, Hunter DJ, Robinson R: Evidence based policy making: Research must inform health policy as well as medical care. BMJ 310:71, 1995

30. Hedges LV, Olkin I: Statistical Methods for Meta-analysis. Orlando, Florida, Academic Press, 1985

31. Hunter JE, Schmidt FL, Jackson BG: Meta-analysis: Cumulating Research Findings Across Studies. Beverly Hills, Sage Publications, 1982

32. Iyengar SI, Greenhouse JB: Selection models and the file drawer problem. Stat Sci 3:109, 1988

33. Laird NM, Mosteller F: Some statistical methods for combining experimental results. Int J Technol Assess Health Care 6:5, 1990

34. Light RJ, Pillemer DB: Summing Up: The Science of Reviewing Research. Cambridge, MA, Harvard University Press, 1984

35. McCormick MC, Holmes JH: Publication of research presented at the pediatric meetings: Change in selection. Am J Dis Child 139:122, 1985

36. Meranze J, Ellison N, Greenhow DE: Publications resulting from anesthesia meeting abstracts. Anesth Analg 161:445, 1982

37. Moher D, Olkin I: Meta-analysis of randomized controlled trials: A concern for standards. JAMA 274:1962, 1995

38. Moher D, Fortin P, Jadad AR, et al: Completeness of reporting of trials published in languages other than English: Implications for conduct and reporting of systematic reviews. Lancet 347:363, 1996

39. Moher D, Jadad AR, Nichol G, et al: Assessing the quality of randomized controlled trials: An annotated bibliography of scales and checklists. Controlled Clin Trials 16:62, 1995

40. Orwin RG: A fail-safe N for effect size in meta-analysis. J Educ Stat 8:157, 1983

41. Orwin RG: Evaluating coding decisions. In Cooper H, Hedges LV (eds): The Handbook of Research Synthesis. New York, Russell Sage Foundation, 1994, p 139

42. Petitti DB: Meta-analysis, Decision Analysis, and Cost-Effectiveness Analysis: Methods for Quantitative Synthesis in Medicine. New York, Oxford University Press, 1994

43. Petitti DB: Of babies and bath water. Am J Epidemiol 140:779, 1994

44. Peto R: Why do we need systematic overviews of randomized trials? Stat Med 6:233, 1987

45. Rao CR: Comment on "Selection models and the file drawer problem." Stat Sci 3:131, 1988

46. Rosenthal R: The "file drawer problem" and tolerance for null results. Psychol Bull 86:638, 1979

47. Rosenthal R, Rubin DB: Further meta-analytic procedures for assessing cognitive gender differences. J Educ Psychol 74:708, 1982

48. Sacks HS, Berrier J, Reitman D, et al: Meta-analysis of randomized controlled trials. N Engl J Med 316:450, 1987

49. Simes JR: Publication bias: The case for an international registry of trials. J Clin Oncol 4:1529, 1986

50. Standards of Reporting Trials Group: A proposal for structured reporting of randomized controlled trials. JAMA 272:1926, 1994

51. Sterling TD: Publication decisions and their possible effects on inferences drawn from tests of significance-or vice versa. JASA 54:30, 1959

52. Stock WA: Systematic coding for research synthesis. In Cooper H, Hedges LV (eds): The Handbook of Research Synthesis. New York, Russell Sage Foundation, 1994, p 125

53. Stock WA, Okun MA, Haring MJ, et al: Rigor in data synthesis: A case study of reliability in meta-analysis. Educational Researcher 11:10, 1982

54. Thompson SG: Why sources of heterogeneity in meta-analysis should be investigated. BMJ 309:1351, 1994

55. Thompson SG, Pocock SJ: Can meta-analysis be trusted? Lancet 338:1127, 1991

56. Villar J, Carroli G, Belizan JM: Predictive ability of meta-analyses of randomised controlled trials. Lancet 345:772, 1995

57. Warshafsky S, Kamer RS, Sivak SL: Effect of garlic on total serum cholesterol: A meta-analysis. Ann Intern Med 119:599, 1993

58. Wolf FM: Meta-analysis: Quantitative Methods for Research Synthesis. Newbury Park, CA, Sage Publications, 1986

Address reprint requests to Diana B. Petitti, MD, MPH

Kaiser Permanente Southern California Region

393 East Walnut Street Pasadena, CA 91188