
PHARMACEUTICAL STATISTICS

Pharmaceut. Statist. 2004; 3: 123–131 (DOI:10.1002/pst.106)

The analysis of the AB/BA cross-over trial in the medical literature

Stephen Senn¹,²,*,† and Sally Lee¹

¹Department of Epidemiology and Public Health, University College, London, UK

²Department of Statistical Science, University College, London, UK

The evolution of opinion as to how to analyse the AB/BA cross-over trials is described by examining the recommendations of three key papers. The impact of these papers on the medical literature is analysed by looking at citation rates as a function of various factors. It is concluded that amongst practitioners there is a highly imperfect appreciation of the issues raised by the possibility of carry-over. Copyright © 2004 John Wiley & Sons Ltd.

Keywords: carry-over; citation analysis; generalized linear model

THE AB/BA DESIGN

A very common object of clinical research is to compare the effect of two treatments, an experimental treatment or formulation (A) and either a placebo or a standard therapy or a standard formulation (B). Often the subjects are patients and the outcome is therapeutic or pharmacodynamic response. Sometimes the subjects are healthy volunteers and then, more usually, tolerability or some pharmacokinetic parameter, as in a so-called bioequivalence study, is measured. If the disease is chronic and the effect of treatment is reversible, a cross-over trial may be an attractive option. A natural design to employ is a two-period design using two sequences, whereby patients are allocated at random to receive A followed by B (sequence AB) or B followed by A (sequence BA).

The randomization is frequently constrained so that the numbers on each sequence are equal. An alternative design in two periods is to allocate patients to the four sequences AB, BA, AA, and BB [1]. This design has considerable practical disadvantages and is rarely employed; it will not be considered here. However, its existence means that the phrase ‘two-period design’ is not unambiguous, and in this paper, to avoid such ambiguity, the phrase ‘AB/BA design’ will be used instead.

An obvious difficulty with the AB/BA design is that the data from the second period may reflect not only the effect of the treatment given in that period but also the residual effect of treatments given in the first period. This phenomenon is usually referred to as ‘carry-over’. Clearly, if carry-over is present it is highly likely to bias the estimate of the treatment effect. Sometimes a so-called washout period, a period in which no treatment is given, is employed between treatments. Usually, however, there is no statistical reason to do this, although in an active controlled study, if a drug–drug interaction is feared, it may be required for medical reasons. Eventually most pharmaceuticals, and certainly all those considered for cross-over trials, are eliminated from the body whether or not other treatments are being given. In many, but not all, cases the rate of elimination is not (greatly) influenced by other treatments and where this is so there is no particular advantage in having a conventional washout unless baseline measurements are desired. Instead the strategy of an ‘active washout’ may be employed: measurement of the effects of treatment is delayed until such time as the effects of any previous treatment have been eliminated [2]. To distinguish conventional washout from such active washout, we refer to it as ‘passive washout’.

Copyright © 2004 John Wiley & Sons, Ltd.

*Correspondence to: Stephen Senn, Department of Epidemiology and Public Health, University College, London, UK

†E-mail: [email protected]

In this paper we shall discuss the analysis of AB/BA designs where either an active washout has been employed or, if a passive washout has been employed, baseline measurements are not available. Baseline measurements taken after a passive washout and before the second period begins are sometimes incorporated in analysis as part of a general strategy for dealing with carry-over. We shall not cover this possibility in this paper for two reasons. First, it would complicate the discussion considerably. Second, it is most unlikely that the effect of carry-over, were it to occur, would be the same at the end of the washout period as at the end of the second period, where, due to a further elapse of time, it would usually be smaller. That being so, the strategies proposed for using baselines to deal with carry-over are unconvincing [3].

A potential further disadvantage of cross-over designs in general, which also affects the AB/BA design, concerns drop-out. Suppose, for simplicity, that patients can be divided into two groups: those who will not discontinue the trial prematurely, however long it is, and those who may. Since the probability of discontinuation must increase with time, it is almost inconceivable that the drop-out rate per patient amongst the second group will not be higher in a cross-over trial than in a parallel-group trial. Thus, other things being equal, unless all patients in a clinical trial are of the first sort, we may expect to see more patients with incomplete data in a cross-over trial than in a parallel-group trial. This causes further complications for analysis. However, the issues it raises, whether the fact that data are missing is informative and how to recover inter-patient information, are not germane to our discussion and we shall not consider these matters.

Thus, we shall assume in what follows that the statistician faced with analysing an AB/BA design will have data from each of the two treatments (no baselines) for each of $n_1$ patients in the sequence AB and for each of $n_2$ patients in the sequence BA. We consider the history of advice as to how to analyse AB/BA designs under such circumstances. First, however, we outline briefly the main purpose of this paper.

PURPOSE OF THIS PAPER

The AB/BA cross-over trial has a poor reputation and a controversial history, which we cover below. That is not, however, the main purpose of this paper. The evolution of the analysis of cross-over trials in practice has not necessarily followed that of recommendations in the statistical literature. This evolution is currently being investigated by one of us (SL) in a number of ways: first, changing recommendations in textbooks of medical statistics; second, current practice in analysis within the pharmaceutical industry; third, the actual analysis of cross-over trials in papers in the medical press; and finally, citation of key statistical papers in the medical literature. It is the last of these that forms the subject of this paper. We consider the pattern of citations over time for the three key papers: Grizzle [4–6], Hills and Armitage [7] and Freeman [8].

However, because this is the first of the series of proposed papers investigating the way in which cross-over trials are analysed in practice, we devote considerable space to covering the history of the controversy. This will be done in the sections that follow before proceeding to the citation analysis itself.


THREE KEY PAPERS: A BRIEF HISTORY OF THE AB/BA DESIGN

Modern approaches to analysing the AB/BA cross-over design in patients can be said to have started with an influential paper in Biometrics by Grizzle [4–6]. We now proceed to explain Grizzle’s model and approach.

If we denote the random variable for the outcome on patient $j$, $j = 1, \ldots, n_i$, of sequence $i$, $i = 1, 2$, in period $k$, $k = 1, 2$, as $Y_{ijk}$, then Grizzle’s model assumes that

$$
\begin{aligned}
E(Y_{1j1}) &= \mu + \phi_A + \pi_1, \\
E(Y_{1j2}) &= \mu + \phi_B + \pi_2 + \lambda_A, \\
E(Y_{2j1}) &= \mu + \phi_B + \pi_1, \\
E(Y_{2j2}) &= \mu + \phi_A + \pi_2 + \lambda_B
\end{aligned}
\tag{1}
$$

for all $j$. Here $\phi_A$ and $\phi_B$ are treatment parameters, $\pi_1$ and $\pi_2$ are period parameters and $\lambda_A$ and $\lambda_B$ are carry-over parameters. The model is overparameterized, but the following contrasts are estimable:

$$
\phi = \phi_A - \phi_B, \qquad \lambda = \lambda_A - \lambda_B
$$

Interest is centred on the first of these, the treatment effect $\phi$. As Grizzle points out, a third potential contrast of interest,

$$
\pi = \pi_1 - \pi_2
$$

is not estimable without further assumptions, for example, that $\lambda = 0$.

The above model for expected responses is supplemented by an error-structure model,

$$
Y_{ijk} = E(Y_{ijk}) + \xi_{ij} + \varepsilon_{ijk} \tag{2}
$$

Here the $\xi_{ij}$ are ‘patient effects’, assumed normally distributed independently of each other with mean 0 and variance $\sigma_\xi^2$, and the $\varepsilon_{ijk}$ are within-patient errors distributed independently of each other and of the patient effects with mean 0 and variance $\sigma_\varepsilon^2$. Thus (1) and (2) together define a mixed model. This model implies that the variance–covariance matrix for $\gamma_{ijk} = \xi_{ij} + \varepsilon_{ijk}$ has a ‘block-diagonal’ form with variances equal to $\sigma^2 = \sigma_\xi^2 + \sigma_\varepsilon^2$ and covariances equal to $\rho\sigma^2 = \sigma_\xi^2$. Clearly the error structure implies that $0 \le \rho \le 1$. A slightly more general approach is to modify (2), writing

$$
Y_{ijk} = E(Y_{ijk}) + \gamma_{ijk}, \tag{3}
$$

use a block-diagonal structure for variances and covariances as before, but allow $-1 \le \rho \le 1$. This model is also implicitly considered by Grizzle [4–6] in some places. In practice, which of these two variants is used makes little or no difference to analysis.
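As a worked illustration of the block-diagonal structure just described (a sketch that uses only the variances and covariances stated above, not an additional result from the paper), the two errors for a single patient have the covariance matrix below. The variances of a patient's period total and period difference follow directly from it.

```latex
\operatorname{Var}
\begin{pmatrix} \gamma_{ij1} \\ \gamma_{ij2} \end{pmatrix}
= \sigma^2
\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix},
\qquad
\operatorname{Var}(\gamma_{ij1} + \gamma_{ij2}) = 2\sigma^2(1 + \rho),
\qquad
\operatorname{Var}(\gamma_{ij1} - \gamma_{ij2}) = 2\sigma^2(1 - \rho).
```

The first of these variances is what drives the variance of the carry-over contrast, and the second explains why within-patient comparisons gain precision as $\rho$ increases.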

Next, Grizzle shows that the contrast

$$
\bar{y}_{1\cdot 1} - \bar{y}_{2\cdot 1} \tag{4}
$$

has expectation $\phi$ and variance $\{n/(n_1 n_2)\}\sigma^2$, where $n = n_1 + n_2$. Similarly,

$$
(\bar{y}_{1\cdot 1} + \bar{y}_{1\cdot 2}) - (\bar{y}_{2\cdot 1} + \bar{y}_{2\cdot 2}) \tag{5}
$$

has expectation $\lambda$ and variance $\{2n/(n_1 n_2)\}\sigma^2(1 + \rho)$.

The contrast given by (4) is based on first-period data only and has the structure of a treatment estimate from a parallel-group trial. Following Freeman [8], we shall refer to this as PAR. Similarly, we shall refer to (5) as CARRY.
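The two contrasts can be computed directly from the four sequence-by-period cell means. The following is a minimal sketch, assuming each patient's data are held as a `(period 1, period 2)` pair; the function names and the small data set are hypothetical and serve only to illustrate expressions (4) and (5).

```python
from statistics import mean


def cell_means(seq_ab, seq_ba):
    """Return the four cell means (y11, y12, y21, y22).

    Each argument is a list of (period1, period2) outcomes, one pair per
    patient; seq_ab holds sequence AB (i = 1), seq_ba sequence BA (i = 2).
    """
    y11 = mean([p1 for p1, _ in seq_ab])
    y12 = mean([p2 for _, p2 in seq_ab])
    y21 = mean([p1 for p1, _ in seq_ba])
    y22 = mean([p2 for _, p2 in seq_ba])
    return y11, y12, y21, y22


def par(seq_ab, seq_ba):
    """Contrast (4): difference of first-period means."""
    y11, _, y21, _ = cell_means(seq_ab, seq_ba)
    return y11 - y21


def carry(seq_ab, seq_ba):
    """Contrast (5): difference of the summed period means."""
    y11, y12, y21, y22 = cell_means(seq_ab, seq_ba)
    return (y11 + y12) - (y21 + y22)


# Hypothetical outcomes, invented purely for illustration.
ab = [(10.0, 7.0), (12.0, 8.0), (11.0, 9.0)]
ba = [(8.0, 11.0), (7.0, 12.0), (9.0, 10.0)]

print(par(ab, ba), carry(ab, ba))
```

In this invented example the patient totals are balanced across the two sequences, so CARRY is zero while PAR is not.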

The problem with PAR is that it is an inefficient estimate. It is based on first-period data. Thus, it not only ignores data from the second period but also fails to exploit the within-patient nature of the trial. Grizzle has this to say: ‘If the residual effects of the two drugs are equal we can delete residual effects from the model. However, before deciding to alter the model, a preliminary test should be made at some high level of significance, say $\alpha = .10$ or $\alpha = .15$.’ He then refers to a paper by Larson and Bancroft [9] regarding the dropping of terms from regression models and concludes that ‘few serious errors would be made if the preliminary test were made at $\alpha = .10$.’ However, as we shall discuss below, this conclusion is not correct.

Grizzle pointed out that if carry-over could be dropped from consideration and it could be assumed that $\lambda = 0$, then an efficient and unbiased estimate of the treatment effect $\phi$ could be found. This was

$$
\{(\bar{y}_{1\cdot 1} - \bar{y}_{1\cdot 2}) + (\bar{y}_{2\cdot 2} - \bar{y}_{2\cdot 1})\}/2 \tag{6}
$$

We shall refer to (6) as CROS.

Thus, effectively what Grizzle proposed as a strategy for analysing was the following. First, test the null hypothesis $H_\lambda: \lambda = 0$ at size $\alpha = 0.10$ using CARRY. If this is not rejected, test the hypothesis $H_\phi: \phi = 0$ at size $\alpha = 0.05$ using CROS. If $H_\lambda$ is rejected, test $H_\phi$ at size $\alpha = 0.05$ using PAR. This has come to be known as ‘the two-stage procedure’.
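The decision rule of the two-stage procedure can be sketched as follows. This is only an illustration of the logic, not a recommended analysis (the discussion of Freeman's paper below explains why the procedure is misleading). Two assumptions are made here that are not in the original: p-values come from a normal approximation to a pooled two-sample statistic, because the Python standard library has no t distribution, and the helper names `two_sample_p` and `two_stage` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_p(x, y):
    """Two-sided p-value for a difference in means (pooled variance).

    Assumption: a normal approximation stands in for the t distribution.
    """
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    stat = (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))
    return 2 * (1 - NormalDist().cdf(abs(stat)))


def two_stage(seq_ab, seq_ba, alpha_carry=0.10, alpha_treat=0.05):
    """Return (test used, whether the treatment effect is declared significant)."""
    # Stage 1 (CARRY): compare patient totals between the two sequences.
    totals_ab = [p1 + p2 for p1, p2 in seq_ab]
    totals_ba = [p1 + p2 for p1, p2 in seq_ba]
    if two_sample_p(totals_ab, totals_ba) >= alpha_carry:
        # Carry-over not rejected: CROS, based on within-patient differences.
        x = [p1 - p2 for p1, p2 in seq_ab]
        y = [p1 - p2 for p1, p2 in seq_ba]
        used = "CROS"
    else:
        # Carry-over rejected: PAR, based on first-period data only.
        x = [p1 for p1, _ in seq_ab]
        y = [p1 for p1, _ in seq_ba]
        used = "PAR"
    return used, two_sample_p(x, y) < alpha_treat


# Hypothetical data sets, invented to exercise both branches.
balanced_ab = [(10.0, 7.0), (12.0, 8.0), (11.0, 9.0)]
balanced_ba = [(8.0, 11.0), (7.0, 12.0), (9.0, 10.0)]
shifted_ab = [(10.0, 9.0), (12.0, 11.0), (11.0, 10.0)]
shifted_ba = [(5.0, 4.0), (6.0, 5.0), (7.0, 6.0)]

print(two_stage(balanced_ab, balanced_ba))
print(two_stage(shifted_ab, shifted_ba))
```

Testing the difference between the two sequences' mean within-patient differences is equivalent to testing CROS, since (6) is half that difference.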

The second milestone in the history of the analysis of the AB/BA cross-over trial was an expository paper by Hills and Armitage in the British Journal of Clinical Pharmacology [7]. This covered planning and analysis for a variety of outcomes. The authors also included an extensive discussion of treatment-by-period interaction, explaining how it might arise. To understand the relevance of this, it will be helpful to examine expression (1) and note that the difference between the expectation in period 1 of a first sequence value and a second sequence value will be $\phi$, whereas the corresponding difference from period 2, but this time subtracting the first sequence value from the second, is $\phi - \lambda$. It follows that if we were to adopt the logic of a parallel-group trial and provide two estimates of the treatment effect, one from the first period and one from the second, by comparing patients in different sequences, we would have estimates whose expectation would differ by $-\lambda$. Thus, the parameter $\lambda$ could represent a treatment-by-period interaction. As Hills and Armitage explain, such treatment-by-period interaction could arise as a result of pharmacological carry-over, psychological carry-over or non-additivity of treatment and period effects.
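The two period-specific differences mentioned above can be written out term by term from expression (1). This is only a worked restatement of that algebra, using the symbols already defined.

```latex
\begin{aligned}
\text{Period 1:}\quad E(Y_{1j1}) - E(Y_{2j1})
  &= (\mu + \phi_A + \pi_1) - (\mu + \phi_B + \pi_1) = \phi, \\
\text{Period 2:}\quad E(Y_{2j2}) - E(Y_{1j2})
  &= (\mu + \phi_A + \pi_2 + \lambda_B) - (\mu + \phi_B + \pi_2 + \lambda_A) = \phi - \lambda, \\
\text{Period 2 minus period 1:}\quad (\phi - \lambda) - \phi &= -\lambda .
\end{aligned}
```

The period parameters cancel within each period, which is why the discrepancy between the two parallel-group style estimates isolates $\lambda$.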

Hills and Armitage presented an analysis corresponding to Grizzle’s linear model but in simpler terms, rendering it easier for the practitioner to implement and also thereby avoiding some algebraic slips in Grizzle’s presentation [5, 6] (although making some numerical ones!). Although showing some reservations about the two-stage analysis, particularly on account of the low power of the preliminary test for carry-over, Hills and Armitage gave a very clear description of it and endorsed its use, concluding their paper with the following recommendation: ‘the internal evidence that the basic assumptions of the cross-over are fulfilled must be presented and if necessary the conclusions should be based on the first period only’ [7, p. 19].

As we shall show below, perhaps on account of its clarity, but also perhaps because it was published in a medical rather than a statistical journal, the paper by Hills and Armitage had considerable impact. It had the effect of making Grizzle’s approach known to a much wider audience and of promoting its use.

The third of the milestone papers was a paper by Freeman in Statistics in Medicine [8]. To understand its importance, consider the claim by Grizzle that the resulting first-stage test when carried out at the 10% level introduces little bias into the procedure. This claim is based on the paper by Larson and Bancroft [9] referred to above. These authors were looking at a rather different situation: essentially that of a backward elimination problem in regression. They considered the bias that might arise if true predictors were dropped from a regression model based on lack of significance of effect. The models considered by Larson and Bancroft, however, have a single error term, not two as in Grizzle’s work. It is true that the model represented by (1) and (2) can be replaced by a single error model in which patient effects are fixed and not random, and that, in the case where there are no missing periods and the carry-over is not included, this will yield identical inferences [2]. Such a model is a simple linear model of the sort considered by Larson and Bancroft [9]. However, since carry-over is confounded with fixed patient effects in such a model, there is no question of testing for the significance of carry-over and thus no question of applying the two-stage procedure nor, for that matter, of applying the backward elimination procedure as regards the carry-over effect. Thus, the investigations of Larson and Bancroft are not directly relevant to Grizzle’s purpose. Furthermore, their investigations are concerned with bias in prediction and not significance tests for effects. Yet, in the context of clinical trials and, in particular, that of drug regulation it is the latter that has been the focus of interest.

Freeman’s paper formally investigates Grizzle’s claim and shows it to be false [8]. Freeman considers the joint distribution of CARRY and CROS and of CARRY and PAR. Whereas CARRY and CROS are orthogonal, CARRY and PAR are highly correlated. CARRY is based on the sum of first- and second-period values, PAR on the first-period values, and it is not hard to show that if Grizzle’s model applies, the correlation between the two will plausibly lie between $1/\sqrt{2}$ and 1. The consequence of this is that where CARRY is significant PAR will also tend to be so. By double integration of the sample space, Freeman shows that the Type I error rate for the test of the treatment effect for this procedure when both treatment effect and carry-over are zero lies between 7% and 9.5% for a claimed 5% size. As Senn has pointed out, this unconditional Type I error rate is made up of two conditional error rates: an error rate of 5% with probability 90% corresponding to the use of CROS, and an error rate between 25% and 50% with probability 10% corresponding to the use of PAR [10]. Hence, either the two-stage procedure is irrelevant (the answer is the same as using CROS alone) or the result is deeply misleading [2, 11, 12]. Freeman’s conclusion, one with which we completely agree, is that ‘the two-stage analysis is too potentially misleading to be of practical use’ [8, p. 1421].
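The arithmetic behind these figures can be checked in a few lines. The sketch below combines the two conditional error rates quoted above into the unconditional rate. It also evaluates the correlation between PAR and CARRY as $\sqrt{(1+\rho)/2}$, an expression that is not quoted in the text but follows from the variances given for (4) and (5) under Grizzle's model. The function names are hypothetical.

```python
from math import sqrt


def corr_par_carry(rho):
    """Correlation between PAR and CARRY under Grizzle's model.

    Derived from Var(PAR) = {n/(n1 n2)} s^2, Var(CARRY) = {2n/(n1 n2)} s^2 (1 + rho)
    and Cov(PAR, CARRY) = {n/(n1 n2)} s^2 (1 + rho).
    """
    return sqrt((1 + rho) / 2)


def unconditional_type1(par_error, p_carry_sig=0.10, cros_error=0.05):
    """Mix the conditional error rates of the CROS and PAR branches."""
    return (1 - p_carry_sig) * cros_error + p_carry_sig * par_error


# Correlation at the two ends of the range 0 <= rho <= 1.
print(corr_par_carry(0.0), corr_par_carry(1.0))

# Conditional PAR error rates of 25% and 50% give the stated bounds.
print(unconditional_type1(0.25), unconditional_type1(0.50))
```

With a conditional error rate of 25% the overall rate is about 7%, and with 50% it is about 9.5%, matching the range reported by Freeman.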

CITATION ANALYSIS

Citation analysis is a bibliometric method that uses reference citations found in scientific papers as an analytical tool [13]. It has been widely used to quantify the influence of research articles on science [14]. Citation analysis has been used to search for fundamental articles and identify key contributors in many research areas, including biomedicine, economics, information systems, computing and chemistry.

In this paper we present a detailed bibliometric study of the three key publications discussed above in terms of their citation rate, journal type and citation type. Data for this citation analysis are generated using Science Citation Index Expanded on the Web of Science (http://wos.mimas.ac.uk) provided by the Institute for Scientific Information (ISI). Science Citation Index Expanded is a source database of articles published in journals and contains all the end-of-article citations in papers published by more than 5700 scientific journals across 164 scientific disciplines. The database includes journal titles of all research fields in which cross-over trials are applied. Since an average of 17 750 new records are added every week, it should be noted that the results of this citation analysis of the two-stage procedure are based on a search performed in February 2002.

In order to quantify the influences of each paper and their area of impact, the analysis carried out on these bibliographic data includes (a) the citation rate of each paper over time, (b) the journals in which citing papers are published and (c) the citation type. We also present a statistical analysis of these data as a means of standardizing for the influence of other factors, in particular the fact that these articles appeared in 1965, 1979 and 1989. Although we do not take this analysis too seriously, we consider, nevertheless, that it has some utility in controlling for strong secular trends.

In interpreting the results that follow, it should be borne in mind that the data represent citations for the years 1981 to 2001 inclusive (together with a few weeks of 2002), data before 1981 not being available. Since both the Grizzle [4] and the Hills and Armitage [7] papers were published before the start of this period they are both represented for 21 years. On the other hand, Freeman’s paper [8] was published in 1989 and therefore only provides data for the 13 years from 1990 onwards.

SEARCH RESULTS: CITATION RATE

The two papers describing the two-stage procedure together received 1328 citations over the period of just over 21 years from 1981 to February 2002, Hills and Armitage [7] accounting for 961 citations and Grizzle [4] accounting for the other 367. This is a mean per year of 48 citations of Hills and Armitage [7] and 18 for Grizzle [4]. Putting these together, there have been roughly 5 citations on the two-stage procedure per month for the past two decades (excluding, of course, other papers that may have described it). The significance of this citation rate can be weighed against the figure concluded by Garfield in 1971 that only 1% of all published items are cited more than 15 times a year [15]. This indicates that each of these two papers is in the top 1% of cited articles.

These two papers are also ‘citation classics’. De Solla Price [16] suggested that a paper which is cited more than 4 times every year can be qualified as a ‘classic’. The citations on each of Grizzle [4] and Hills and Armitage [7] have exceeded this number for at least 22 years. For Hills and Armitage the citation rate has always been greater than 4 times this boundary figure.

In contrast, Freeman [8] is far less cited than the other two. Freeman has only been cited 56 times for the 12-year period from 1990 to 2001. It does not quite qualify for the admittedly somewhat arbitrary label ‘classic’, since there are a few years when Freeman was cited less than four times.

The results year by year are presented in Table I, which also includes the number of cross-over trial papers for each year. We shall use these figures to try to standardize for citation rates in the next section.

A STANDARDIZED COMPARISON OF CITATION FREQUENCY

Two factors that might plausibly affect the number of citations a paper receives in a given year are the age of the cited paper and the number of papers published in the given year that would, in principle, be capable of citing the paper. The general lines of this latter concept are clear enough, but the details have some fuzziness. In our analysis we have used the number of papers published in a given year satisfying a Medline search using keywords ‘crossover trial’, ‘crossover trials’, ‘cross-over trial’, ‘cross-over trials’, ‘changeover trial’, ‘changeover trials’, ‘change-over trial’ and ‘change-over trials’. Any paper with at least one of these keywords was included. The search results were reviewed and papers which were not cross-over trial reports were excluded.

Table I. Citation of the three key papers by year.

| Year | Grizzle [4] | Hills and Armitage [7] | Freeman [8] | Total number of cross-over trial reports from Medline search |
|---|---|---|---|---|
| 1981 | 20 | 17 | N/A | 95 |
| 1982 | 15 | 28 | N/A | 126 |
| 1983 | 20 | 45 | N/A | 121 |
| 1984 | 21 | 32 | N/A | 129 |
| 1985 | 17 | 49 | N/A | 134 |
| 1986 | 19 | 58 | N/A | 168 |
| 1987 | 19 | 59 | N/A | 160 |
| 1988 | 12 | 65 | N/A | 156 |
| 1989 | 22 | 84 | N/A | 165 |
| 1990 | 24 | 61 | 1 | 148 |
| 1991 | 24 | 49 | 3 | 182 |
| 1992 | 24 | 57 | 2 | 181 |
| 1993 | 20 | 60 | 8 | 177 |
| 1994 | 13 | 49 | 5 | 166 |
| 1995 | 14 | 50 | 6 | 166 |
| 1996 | 29 | 41 | 8 | 167 |
| 1997 | 17 | 42 | 6 | 161 |
| 1998 | 6 | 31 | 5 | 126 |
| 1999 | 12 | 33 | 5 | 122 |
| 2000 | 10 | 20 | 4 | 124 |
| 2001 | 5 | 29 | 3 | 188 |
| Total (up to 2001, not including 1980) | 363 | 959 | 56 | 3162 |
| 2002 | 9 | 23 | 4 | |
| Total (up to 18/09/02, 1980 not included) | 372 | 982 | 60 | |

The data set consists of the following variables: citation (number of citations), author (a categorical variable taking on the values 1, 2, 3 for Grizzle, Hills and Armitage and Freeman, respectively), citage (the years since publication) and tt (the total number of cross-over trial papers published in the year in question). It should be noted that tt is really only a proxy measure for something that might be termed ‘citing potential’. For example, although many of the papers that contribute to citation will be included amongst those that consist of tt, there are some exceptions. Some of the papers citing Grizzle, Hills and Armitage, and most of those citing Freeman, as we shall see, are actually statistical methodological papers and so not included in tt. Therefore it is included as an offset variable. It should also be noted that it is, of course, possible for any paper included in tt to cite none, one, two or all three of Grizzle, Hills and Armitage, and Freeman.
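For readers who wish to reproduce the analysis, the data set described above can be assembled from the yearly counts in Table I. The following is a minimal sketch: the row layout, the `build_rows` helper and the extra `log_tt` field are hypothetical, and `citage` is taken here to be the calendar year minus the publication year (1965, 1979 or 1989), which is an assumption about how ‘years since publication’ was coded.

```python
from math import log

YEARS = list(range(1981, 2002))  # 1981 to 2001 inclusive

# Yearly citation counts and Medline totals transcribed from Table I.
GRIZZLE = [20, 15, 20, 21, 17, 19, 19, 12, 22, 24, 24,
           24, 20, 13, 14, 29, 17, 6, 12, 10, 5]
HILLS_ARMITAGE = [17, 28, 45, 32, 49, 58, 59, 65, 84, 61, 49,
                  57, 60, 49, 50, 41, 42, 31, 33, 20, 29]
FREEMAN = [1, 3, 2, 8, 5, 6, 8, 6, 5, 5, 4, 3]  # 1990 to 2001 only
TT = [95, 126, 121, 129, 134, 168, 160, 156, 165, 148, 182,
      181, 177, 166, 166, 167, 161, 126, 122, 124, 188]

# Author codes 1, 2, 3 as in the text, mapped to publication year.
PUBLISHED = {1: 1965, 2: 1979, 3: 1989}


def build_rows():
    """Return one dict per (author, year) observation."""
    tt_by_year = dict(zip(YEARS, TT))
    series = {
        1: dict(zip(YEARS, GRIZZLE)),
        2: dict(zip(YEARS, HILLS_ARMITAGE)),
        3: dict(zip(range(1990, 2002), FREEMAN)),
    }
    rows = []
    for author, by_year in series.items():
        for year, citation in by_year.items():
            rows.append({
                "author": author,
                "year": year,
                "citation": citation,
                "citage": year - PUBLISHED[author],  # assumed coding
                "tt": tt_by_year[year],
                "log_tt": log(tt_by_year[year]),  # offset on the log scale
            })
    return rows


rows = build_rows()
print(len(rows))
```

The column totals recovered from these rows can be compared with the totals printed in Table I as a check on the transcription.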

There are 54 observations in total: 21 each for Grizzle and Hills and Armitage, and 12 for Freeman. The results below are those obtained by fitting a generalized linear model [17, 18] in proc genmod of SAS®, assuming that citation has a Poisson distribution with author and citage as predictors.

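The code is

    /* OFFSET= expects a variable name, so log(tt) is computed first.
       The name logtt is introduced here for illustration only. */
    data com;
      set com;
      logtt = log(tt);
    run;

    proc genmod data=com;
      class author;
      /* Poisson regression with log link; dscale allows for overdispersion */
      model citation = author citage / offset=logtt dist=poisson link=log
            obstats residuals dscale type1 type3;
      /* pairwise contrasts between the three papers (author levels 1, 2, 3) */
      estimate 'Grizzle - Freeman' author 1 0 -1;
      estimate 'H&A - Freeman' author 0 1 -1;
      estimate 'H&A - Grizzle' author -1 1 0;
    run;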

Note that use of the dscale option allows for overdispersion. The results include the following,

LR Statistics for Type 3 Analysis

| Source | Num DF | Den DF | F value | Pr > F |
|---|---|---|---|---|
| author | 2 | 50 | 142.59 | <0.0001 |
| citage | 1 | 50 | 11.73 | 0.0006 |

from which a result is confirmed that was obvious from inspection of the raw data: there is a considerable difference between citation rates for the three papers that cannot be explained by differences in their age, nor by volume of trials in the years in question.

As regards the comparisons between the three papers, the following results giving estimates, standard errors and 95% confidence limits apply:

| Label | Estimate | SE | Lower confidence limit | Upper confidence limit |
|---|---|---|---|---|
| Grizzle – Freeman | 1.86 | 0.26 | 1.34 | 2.37 |
| Hills/Armitage – Freeman | 2.48 | 0.21 | 2.06 | 2.90 |
| Hills/Armitage – Grizzle | 0.62 | 0.14 | 0.35 | 0.90 |

These estimates are on the log scale, and exponentiation yields values of 6.42, 11.94 and 1.86 for the relative (adjusted) citation rates of these papers.
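The back-transformation is a one-line calculation. The sketch below simply exponentiates the three reported contrast estimates; the dictionary and its labels are hypothetical conveniences.

```python
from math import exp

# Contrast estimates on the log scale, as reported above.
log_estimates = {
    "Grizzle - Freeman": 1.86,
    "Hills/Armitage - Freeman": 2.48,
    "Hills/Armitage - Grizzle": 0.62,
}

# Relative (adjusted) citation rates, rounded to two decimals.
rate_ratios = {label: round(exp(value), 2) for label, value in log_estimates.items()}

for label, ratio in rate_ratios.items():
    print(label, ratio)
```

The same transformation applied to the confidence limits would give intervals for the rate ratios.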

An alternative approach to dealing with overdispersion is to model this directly using the negative binomial distribution rather than the Poisson distribution. When this is done the results are very similar, with estimates of contrasts (standard errors) as follows: 1.87 (0.24), 2.47 (0.17), 0.59 (0.15).

CITATION TYPE

Another question of interest concerns the type of citing paper: whether this is more methodological or consists, for example, of a medical application.

Table II shows the citations cross-tabulated by type of citing paper, medical or statistical, and paper cited. All reports of clinical trials are categorized as medical papers. Papers containing discussion of statistical methodology are categorized as statistical papers. Citing papers which do not fulfil the description above are categorized as ‘other’. The subject areas of journals where the papers fall in the ‘other’ category include biology, veterinary science, aquaculture, computer science, social science, physics and health science. The purposes of citing the two-stage procedure in such journals are less obvious, and as the main focus of the impact of the two-stage procedure is on its applications to the analysis of cross-over trials in a clinical context, ‘other citation types’ are excluded from the discussion.

Clearly there is a very different pattern for the three papers, which is confirmed by a highly significant Pearson–Fisher chi-square value of 194.5 on two degrees of freedom (based on a conditional analysis which compared only citation types ‘medical’ and ‘statistical’). Again, this is a significance test of questionable validity, since there may be some dependence amongst results due to multiple citations, but the value is so large that the message is quite clear: there is a genuine difference in citation patterns between the three papers.

Hills and Armitage receive the highest proportion of citations (94%) from medical papers, whereas Freeman received the lowest proportion (39%) and Grizzle has an intermediate proportion (76%). These large differences seem only partially explicable by journal type. It is clear that the British Journal of Clinical Pharmacology, in which Hills and Armitage [7] was published, is likely to be read by many physicians carrying out cross-over trials. Much of the subject matter of that journal is concerned with phase I and phase II trials in pharmaceuticals, and these are often run as cross-over trials. However, Biometrics, in which Grizzle [4] appeared, is arguably more methodologically orientated than Statistics in Medicine, in which Freeman [8] appeared, and thus the higher relative (and absolute) citation rate of Grizzle in the medical literature does not seem explicable on these grounds.
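The chi-square value and the proportions quoted in this section can be recomputed from the ‘medical’ and ‘statistical’ columns of Table II. The following is a minimal sketch of the ordinary Pearson chi-square for a 3 × 2 table; the variable and function names are hypothetical, and only the counts come from Table II.

```python
# (medical, statistical) citation counts from Table II; 'other' is excluded.
COUNTS = {
    "Freeman": (22, 34),
    "Grizzle": (267, 83),
    "Hills and Armitage": (856, 51),
}


def pearson_chi_square(table):
    """Pearson chi-square statistic for a rows-by-2 contingency table."""
    rows = list(table.values())
    col_totals = [sum(r[c] for r in rows) for c in range(2)]
    grand_total = sum(col_totals)
    chi2 = 0.0
    for row in rows:
        row_total = sum(row)
        for c in range(2):
            expected = row_total * col_totals[c] / grand_total
            chi2 += (row[c] - expected) ** 2 / expected
    return chi2


def medical_share(table):
    """Proportion of medical citations among medical + statistical ones."""
    return {paper: med / (med + stat) for paper, (med, stat) in table.items()}


chi2 = pearson_chi_square(COUNTS)
shares = medical_share(COUNTS)
degrees_of_freedom = (len(COUNTS) - 1) * (2 - 1)

print(round(chi2, 1), degrees_of_freedom)
print({paper: round(100 * share) for paper, share in shares.items()})
```

Note that the 94% figure for Hills and Armitage is reproduced only when ‘other’ citations are excluded from the denominator, in line with the conditional analysis described above.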

DISCUSSION

Our own view is that the paper by Freeman [8] provided a necessary correction to those of Grizzle [4] and Hills and Armitage [7]. It is thus disappointing to see that, whilst it is a paper that has attracted much interest, it is achieving relatively little impact compared to Grizzle and Hills and Armitage and that this position is even clearer when one looks at the result in terms of journal type.

Our analysis, however, has only looked at the relative impact of these three papers. Many cross-over trials continue to be analysed in ways that could not be justified by appeal to any of these papers or indeed to any of the monographs that have been written on the subject [2, 19–21]. (Not that these are necessarily in agreement with each other!) A common approach is to use the matched-pairs t test. This perhaps is not too bad, but worse is sometimes encountered: for example, separate testing back to baseline within each treatment group with no direct comparison of treatments.

However, the more general issue of how cross-over trials are actually analysed in the medical literature requires a different approach: that of sampling published articles directly, rather than by looking at those that cite key papers. This further investigation will form the subject of a separate paper.

Table II. Citations cross-tabulated by paper cited and type of citing article.

| Paper | Medical | Statistical | Other | Total |
|---|---|---|---|---|
| Freeman | 22 | 34 | 0 | 56 |
| Grizzle | 267 | 83 | 13 | 363 |
| Hills and Armitage | 856 | 51 | 52 | 959 |
| Total | 1145 | 168 | 65 | 1378 |


ACKNOWLEDGEMENT

We thank an anonymous referee for helpful comments.

REFERENCES

1. Balaam LN. A two period design with t² experimental units. Biometrics 1968; 24:61–73.

2. Senn SJ. Cross-over trials in clinical research. Wiley: Chichester, 2002.

3. Senn SJ. The AB/BA crossover: past, present and future? Statistical Methods in Medical Research 1994; 3:303–324.

4. Grizzle JE. The two-period change over design and its use in clinical trials. Biometrics 1965; 21:467–480.

5. Grizzle JE. Correction to Grizzle (1965). Biometrics 1974; 30:727.

6. Grieve AP. The two-period changeover design in clinical trials [letter]. Biometrics 1982; 38:517.

7. Hills M, Armitage P. The two-period cross-over clinical trial. British Journal of Clinical Pharmacology 1979; 8:7–20.

8. Freeman P. The performance of the two-stage analysis of two-treatment, two-period cross-over trials. Statistics in Medicine 1989; 8:1421–1432.

9. Larson HJ, Bancroft TA. Biases in prediction by regression for certain incompletely specified models. Biometrika 1963; 50:391–401.

10. Senn SJ. Problems with the two stage analysis of crossover trials [letter; comment]. British Journal of Clinical Pharmacology 1991; 32:133.

11. Senn SJ. Cross-over trials. In Encyclopedia of Biostatistics, Armitage P, Colton T (eds). Wiley: Chichester, 1998; Vol. 2, pp. 1033–1049.

12. Senn SJ. Crossover design. In Encyclopedia of Biopharmaceutical Statistics, Chou SC, Lin JP (eds). Marcel Dekker: New York, 2000, pp. 142–149.

13. Garfield E, Welljams-Dorof A. Citation data: their use as quantitative indicators for science and technology evaluation and policy-making. Science and Public Policy 1992; 19:321–327.

14. Redman J, Willett P, Allen FH, Taylor R. A citation analysis of the Cambridge Crystallographic Data Centre. Journal of Applied Crystallography 2001; 34:375–380.

15. Garfield E. Play the new game of twenty citations. Wherein ISI reveals the fifty most frequently cited non-journal items. Current Contents 1971; 224–228.

16. De Solla Price DJ. Networks of scientific papers. Science 1965; 149:510–515.

17. Nelder JA, Wedderburn RWM. Generalized linear models. Journal of the Royal Statistical Society A 1972; 132:107–120.

18. McCullagh P, Nelder JA. Generalized linear models. Chapman & Hall: London, 1989.

19. Jones B, Kenward MG. Design and analysis of cross-over trials, 2nd edition. Chapman & Hall/CRC: Boca Raton, FL, 2003.

20. Ratkowsky DA, Evans MA, Alldredge JR. Cross-over experiments: design, analysis, and application. Marcel Dekker: New York, 1993.

21. Cotton JD. Analyzing within-subject experiments. Lawrence Erlbaum Associates: Mahwah, NJ, 1998.
