robust and realistic approaches to carry-over



STATISTICS IN MEDICINE

Statist. Med. 17, 2849–2864 (1998)

ROBUST AND REALISTIC APPROACHES TO CARRY-OVER§

STEPHEN SENN* AND DIMITRIOS LAMBROU

Department of Statistical Science, University College London, London, U.K.

If the area of the experiment were kept constant and the replication increased by using smaller plots we should only gain in precision if, as abundant agricultural experimentation shows to be generally the case, the greater proximity of the smaller areas led to a greater similarity of the fertility of the soil. The practical limit to plot subdivision is set, in agricultural experiments, by the necessity of discarding a strip at the edge of each plot. The width of the strip depends on the competition of neighbouring plants for moisture, soil nutrients and light, and is independent of the size of plots. Consequently, as smaller plots are used, a larger proportion of the experimental area has to be discarded.

R. A. FISHER, The Design of Experiments1 (pp. 60–61)

SUMMARY

The relationship between choice of model for carry-over and choice of efficient cross-over design is studied, in particular with reference to two-treatment designs in four periods and two sequences. The effect of model misspecification is also examined. It is concluded that previous claims regarding efficient designs are not necessarily reasonable. © 1998 John Wiley & Sons, Ltd.

INTRODUCTION

Research into the design and analysis of cross-over trials is dominated by the problem of carry-over. Carry-over has been defined as ‘the persistence (whether physically or in terms of effect) of a treatment applied in one period in a subsequent period of treatment’.2 When it occurs, carry-over arises as a consequence of treatment (indeed, it is often referred to as the ‘residual effect’). It should not be confused with the period effect, which is any secular change between periods of measurement which is not connected with treatments given at any time, but which applies to all patients, either as a consequence of a change in their disease status or as a result of changes in the conditions of measurement. This is not to say that carry-over need always be directly pharmacological, in the sense of pharmacokinetics. It need not be a residual chemical persistence of the previous treatment in the blood. In a double blind clinical trial, however, it must be at least indirectly due to pharmacology. This is because, in a double blind trial, the only difference between treatments (and hence between sequences) is pharmacological; shape and colour are the same and identities of the treatments are unknown. Hence, as has not always been appreciated, if one talks of ‘psychological’ carry-over under such circumstances, the origin of this so-called psychological carry-over is in the pharmacological effect of a previous engendering treatment. This is not to say that other factors may not modify this carry-over. Indeed, just as it is possible for the residual effect of an engendering treatment to perturb some treatment currently being studied, so it is also possible for the residual effect itself to be modified by the perturbed treatment. This point is regularly overlooked and the investigation of its consequences forms an important part of this paper.

* Correspondence to: Stephen Senn, Room 316, 1-19 Torrington Place, University College London, London WC1E 6BT, U.K.

§ Presented at the International Society for Clinical Biostatistics, Seventeenth International Meeting, Budapest, Hungary, August 1996.

CCC 0277–6715/98/242849–16$17.50 © 1998 John Wiley & Sons, Ltd.

A popular design for comparing two treatments is the AB/BA cross-over. In the presence of carry-over, however, it is not possible to produce unbiased and efficient estimators of the treatment effect using this design, and a number of authors have suggested multi-period alternatives as a means of dealing with the problem. In investigating such designs there has been near universal reliance on the ‘simple carry-over model’, whereby it is assumed that the effect of carry-over lasts for one period and is determined entirely by the engendering treatment and not at all modified by the perturbed treatment.3–6 As Fleiss7,8 and Senn2,9–11 have pointed out, however, this model is not realistic and the question then arises as to whether, despite its unreal nature, it is useful. Fleiss,7,8 for example, pointed out that it would be more natural to assume that a treatment does not carry over into itself since, usually, for any reasonable period of study, one may expect that steady-state will have been reached. Senn has taken various designs and shown that application of the simple carry-over model can increase the bias in the estimate of the treatment effect compared to ignoring carry-over altogether if steady-state or other forms of carry-over apply.2,9 Matthews,12,13 however, has investigated the performance of designs which are efficient for the simple carry-over model and shown that they are also generally efficient for the steady-state model and hence that the approach of finding suitable designs for the former is generally robust. A similar claim has been made by Jones and Kenward5 (p. 178). In this paper we show, however, that this claim depends upon a specific and (in our view) unrealistic view of what is required for design robustness and that, given a more ‘robust’ view of robustness, it is not entirely correct.

The plan of this paper is as follows. First, we shall discuss carefully the notion of a period since this is essential to an understanding of the design problem for cross-over trials. Second, we shall consider carry-over itself and show why the simple carry-over model is not reasonable. Third, we shall consider two different approaches to design robustness, drawing attention to an important distinction between them. Fourth, in the main section of the paper, we shall present the results of some investigations into the performance of various designs for two treatments in four periods and two sequences. Finally, we shall offer some tentative conclusions.

We want to make it quite clear at the outset, however, that we do not regard this investigation as being in any sense definitive. Indeed, our object tends in the other direction. We shall conclude that no general recommendation regarding an optimal design for cross-over trials is possible and that previous approaches to this problem have been unrealistic, optimistic and naive.

PERIODS

Patients are very rarely recruited simultaneously in a cross-over trial. A period is almost never a single given calendar date (an exception might be bioequivalence studies) and for some patients in a cross-over trial, period 1 may actually be later than period 2 was for others. Furthermore, an important distinction must be drawn between single-dose trials (where the purpose is usually to study pharmacodynamic effects of treatment) and multiple-dose trials (which are often used for therapeutic purposes).2,12–14 (The common usage of single and multiple ‘doses’ in this context refers to whether single or repeat administrations of a treatment are being given. It has nothing to do with whether a number of different unit doses are being studied as, say, in a dose-finding trial.) In single-dose trials, the design constraint may well be the number of periods. This is because, very often, the patient will be called into the clinic on each occasion and studied throughout the day. This has been, for example, a common approach in single-dose trials in asthma. There is then a limit to the number of occasions for which a patient may be called into the clinic and this limit is the time constraint. For multiple-dose studies, however, the patient is given a course of medication and supplies are renewed from time to time. Usually such renewal is associated with a visit to the doctor and it is true that each such visit imposes some extra burden on the patient, but essentially it is the total time on the study that is important. Furthermore, in many cases the strategy of an active wash-out would be employed.2 Patients would be switched over almost immediately to the next treatment. The measurements taken in the first few days after the switch would then be discarded for the purpose of analysis. The situation is very similar to that for agricultural experiments described by Fisher1 in our opening quotation.

For such multiple-dose experiments, it is often largely a matter of analytic convention as to how a design is defined. Consider, for example, a 26 week study in asthma comparing a beta-agonist with a placebo in which patients are switched over at 13 weeks, either from placebo to beta-agonist or vice versa. Perhaps the first week after the switch might be sacrificed (corresponding to the ‘strip’ at the edge of the ‘plot’ in the quotation from Fisher above) and, because the steady-state of treatments is to be studied, the same would be done at the beginning of the first thirteen weeks. The peak expiratory flow (PEF) measurements on waking in the morning, averaged over the 12 weeks, might be used as the outcome measure. Obviously this is an AB/BA design, but it has been stated that an ‘optimal’ design in four periods and two sequences is the AABB/BBAA design5 and the design in question can be turned into this, simply by dividing the period of measurements in two. However, it does not make the slightest bit of difference to the patient as to whether a single average is produced from one 12-week period by the statistician or whether two six-week averages are produced, and the only advantage in the latter course would be that some more complicated function could be fitted to the patients’ response over the whole of the trial. Using the 12 weeks together corresponds to giving equal weight to the two six-week sub-periods and obviously, by identifying them separately, we have other options. Note, however, how such a consideration exposes the essentially arbitrary nature of the simple carry-over model. For, in order to use it in connection with the AABB/BBAA design, one is now claiming not that carry-over lasts for one period but that it lasts no longer than seven weeks. Furthermore, if one week is being allowed for treatments to reach steady-state, it seems strange that carry-over should be presumed to last for up to seven weeks.
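The equal-weighting remark can be checked with a few lines of arithmetic. The sketch below is only an illustration: the weekly PEF values are invented for the purpose, and the function name `subperiod_averages` is hypothetical. It shows that one 12-week average coincides with the equally weighted mean of the two six-week averages.

```python
import math
from statistics import mean

# Hypothetical weekly PEF averages (l/min) for the 12 analysed weeks of one
# treatment period; these numbers are invented for illustration only.
WEEKLY_PEF = [402, 398, 405, 410, 407, 399, 401, 404, 408, 412, 406, 403]


def subperiod_averages(weekly):
    """Return (12-week mean, equally weighted mean of the two 6-week means)."""
    half = len(weekly) // 2
    overall = mean(weekly)
    first_half = mean(weekly[:half])
    second_half = mean(weekly[half:])
    return overall, (first_half + second_half) / 2


overall, equal_weight = subperiod_averages(WEEKLY_PEF)
print(math.isclose(overall, equal_weight))  # the two summaries agree
```

Any other weighting of the two sub-period averages would give a different summary, which is the sense in which identifying the sub-periods separately opens up other options.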

A genuinely different design, however, can be produced if we choose to switch treatments more often during the 26 weeks. Thus, for example, the ABAB/BABA design is genuinely a different design to the AB/BA design, as it has three switches of treatment rather than one. Furthermore, such a design could exploit an autocorrelation across periods within patients analogous to that to which Fisher refers but at the expense of discarding four weeks of observation rather than two. We shall not pursue this particular point further here, as it will be considered in a future paper. However, we simply make the point to show that, for such therapeutic trials, treating the number of design periods as a primitive constraint rather than a derived solution to the problem makes little sense. It is the total time that is relevant. It seems to us that the design problem for such trials is quite other than has been supposed. It will consist in using active wash-out to deal with carry-over and considering what the loss of experimental material imposed by a switch, together with the presumed correlation structure, implies about the analysis.

For single dose designs, it does make sense to talk about periods, although here, ironically, the potential problem of carry-over is much less. This is because it is a practical proposition to arrange for periods of wash-out which are many times longer than the presumed duration of action of the treatment. Frequently, in fact, a variable wash-out is employed and a minimum length of time for this is fixed. For example, in a trial in asthma comparing salbutamol to formoterol conducted by Palmqvist et al.,15 the wash-out varied from seven to 95 days. In such a case, the number of periods available for analysis may indeed form a constraint. It is also our opinion, however, that in the context of such designs it makes little sense to make any allowance for carry-over at all.

If this analysis of the two basic types of cross-over design is accepted, it suggests that (i) for single-dose designs the problem of carry-over can be safely ignored, and (ii) for multi-dose therapeutic designs, discussing the relative performance of designs for a fixed number of periods is to miss the point. This leaves no room whatsoever for the conventional investigations of design efficiency which have generally been carried out to date. If this claim were accepted, there would be no need for this paper. However, this point of view does not seem to be general and, in our opinion, the development of the cross-over trial is being held back as a consequence. For example, in comparison with the enormous body of work on carry-over, very little seems to have been done on using multi-period designs to investigate patient by treatment interaction.16 Hence, we shall investigate the consequences of allowing for carry-over more conventionally, that is to say, accepting both that substantial carry-over may occur and that the number of periods available may form the design constraint. This does not mean that we regard this as being necessarily an appropriate way of looking at matters. We are merely responding to the debate as others appear to wish to conduct it.

CARRY-OVER

As already explained, the simple carry-over model assumes that carry-over lasts only for one period and depends only on the engendering and not the perturbed treatment. The steady-state model assumes that carry-over of a treatment into itself does not occur because it will have already reached steady-state by the time one period of treatment has been taken. The steady-state model has rarely been investigated but seems much more natural for multi-dose trials than for single-dose trials. (Amongst those who have worked on optimal design, for example, Matthews12,13 is one of the few who has carried out investigations of both approaches. There is also a brief discussion in Jones and Kenward5 on pp. 177–178.) The simple carry-over model is not particularly reasonable for either major type of cross-over. It might apply to a single-dose design if the dose was such that one was in a linear part of the dose–response curve. There is no particular reason in general, however, why it should apply. It is sometimes suggested that it may be appropriate for psychological as opposed to pharmacological response (see above) but our view is that these claims would require detailed analysis to substantiate them. Drugs are developed because the direct effects of treatment are presumed explicable in terms of pharmacology; it seems most reasonable to assume that this explication applies to the residual effects also and in any case, psychophysics holds no more comfort in this regard than does pharmacodynamics. (The Weber–Fechner law is a case in point.) An extensive discussion of the implication of theories of pharmacodynamic response for carry-over is given by Senn2 in chapter 10.

Nevertheless, it is not hard to think of cases in which carry-over might be considerable but neither the steady-state model nor the simple carry-over model would be appropriate. A model may be false but useful, however, and it might be that behaving as if the simple carry-over model applied, even though it did not, might be good strategy.12,13 Furthermore, if estimators are produced which are unbiased in the presence of both steady-state and simple carry-over, they will be unbiased for carry-over which corresponds to any possible mixture of the two cases. For two-treatment designs, this covers all cases of carry-over lasting one period. Therefore it is perhaps of interest to consider the two extreme cases and we shall do that in due course. As part of the investigation into the effect of various types of carry-over we shall first, however, develop a general model for belief in carry-over to be used later. This is necessary because, although the comparison of designs may be made in terms of variance alone where the model is not in dispute, where there are disputes about the model, the issue of bias is raised and also that of mean square error.

Consider carry-over where an active treatment is being compared with placebo and suppose that carry-over can be characterized in terms of the dose–response curve of the active treatment. This is a useful way of looking at things because on pharmacokinetic grounds it can be regarded as being generally reasonable for many drugs (but there are some exceptions) that the same fraction of active substance would carry into a placebo period as into a period of treatment with an active drug. Any difference in carry-over is then due to pharmacodynamic response. However, it is not necessary to regard carry-over in this way. It is simply a device for allowing a given ‘amount’ of carry-over that depends on (in the sense of being caused by) an engendering treatment only to translate into differential effects on a perturbed treatment. Now consider four general cases. Cases one and two are steady-state carry-over and simple carry-over, which we have already encountered. Steady-state corresponds to a dose response approaching saturation (diminishing returns to scale) and simple carry-over corresponds to an approximately linear dose response over the region of interest (constant returns to scale). Case three is ‘super-carry-over’; here the response shows increasing returns to scale. Case four is where the patient has too much of a good thing and we actually have negative returns to scale and a point has been reached where an increase in the dose produces a reduction in the response.

These four cases can be described in terms of a single parameter $r$ giving the effect of carry-over into an active treatment as a ratio of the effect of carry-over into placebo. For strict application of case one we have $r = 0$. In practice we might allow that $r$ could be greater than 0 but much less than 1. For case two we have $r = 1$. Again we might relax the definition of case two to cover $r \approx 1$. For case three we have $r > 1$ and for case four $r < 0$. Assume that to case $i$ of carry-over we assign a probability $\phi_i$. At a later stage, numerical probabilities will be assigned to these. For the moment we consider what conditional belief in $r$ might be, given confidence that case $i$ is the sort of carry-over which applies.

Figure 1 shows four such conditional distributions, the result of some introspection carried out by one of us (SS). Of course, other investigators might have quite different distributions and, in any case, one might expect that in practice the probabilities ought to be dependent on what is known about the indication and the class of treatments. However, these probability distributions are not to be used as priors in a Bayesian analysis but as part of an investigation into design and analysis efficiency and robustness, as will be discussed below in due course. Note that there is some overlap of the distributions so that ‘case 1’ is potentially capable of producing a higher value of $r$ than case 2 and so forth. These distributions are merely stages towards producing an overall probability distribution for $r$. This has been done by allocating probabilities of 0.8, 0.1, 0.05 and 0.05 to $\phi_1$, $\phi_2$, $\phi_3$ and $\phi_4$, using these as a mixing distribution and hence producing the integrated probability distribution given in Figure 2. This has, of course, a curious shape but it may be defended by noting that the steady-state carry-over model and simple carry-over model simply replace the distribution given in Figure 1 by spikes at 0 and 1, respectively. Figure 2 may appear fantastic but it has a good claim to being as realistic as these extreme alternatives. Of course, the distribution given in Figure 2 is loaded very much in favour of the steady-state model compared with the simple carry-over model. However, again this may be defended. Either the trial is a single dose trial, in which case the absolute level of carry-over will in any case be small (it does not much matter then which model for $r$ applies, since carry-over could be ignored altogether), or the trial is multiple-dose. In that case, it will have been designed so that the treatment will have reached steady-state by the end of a period. Now, of course, the trialists may be mistaken in their design, but it seems more reasonable to assume that they will get it nearly right than that they will get it completely wrong.

Figure 1. Probability distributions for $r$ given four basic cases ($r$ is the effect of carry-over into an active treatment as a ratio of the effect of carry-over into placebo)

Figure 2. Unconditional probability distribution for carry-over
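The mixing step can be sketched numerically. In the sketch below only the mixing probabilities 0.8, 0.1, 0.05 and 0.05 come from the text. The conditional distributions are not recoverable from Figure 1 here, so the Normal components in `COMPONENTS` (their means and standard deviations) are purely hypothetical stand-ins, as are all function names.

```python
import math

# Mixing probabilities for cases 1-4, as stated in the text.
PHI = {1: 0.8, 2: 0.1, 3: 0.05, 4: 0.05}

# ASSUMPTION: hypothetical Normal(mean, sd) stand-ins for the conditional
# distributions of r given each case; these are NOT read off Figure 1.
COMPONENTS = {1: (0.0, 0.15), 2: (1.0, 0.15), 3: (1.5, 0.3), 4: (-0.5, 0.3)}


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(r):
    """Unconditional density of r: conditional densities weighted by PHI."""
    return sum(PHI[i] * normal_pdf(r, *COMPONENTS[i]) for i in PHI)


def mixture_mean():
    """Mean of r under the mixture (weighted mean of the component means)."""
    return sum(PHI[i] * COMPONENTS[i][0] for i in PHI)


def integrate(f, lo, hi, steps):
    """Trapezoidal rule on an evenly spaced grid."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + k * h) for k in range(1, steps))
    return total * h


print(round(integrate(mixture_pdf, -4.0, 5.0, 9000), 4))  # close to 1
print(round(mixture_mean(), 4))
```

Replacing each hypothetical component by a spike at 0 or at 1 recovers the steady-state and simple carry-over models as limiting cases of the same construction.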

DESIGN ROBUSTNESS

It is difficult to formalize the notion of a robust design but the basic idea is that a design should be chosen as being reasonably efficient for a general range of models rather than simply because it is optimal for a given model. This seems similar to the idea of choosing an estimator that is reasonably efficient for a number of error distributions rather than, say, one that is optimal when the errors are Normally distributed and homoscedastic. When examined more closely, however, a difficulty emerges and it becomes important to identify at least two different kinds of design robustness:

Type I: it is assumed that although the appropriate model is not available at the time the design is chosen, it will be by the time the design is analysed. A design is chosen which keeps the options open as regards analysis.

Type II: it is not known which model is appropriate at the time of selecting a design and it is not expected that this knowledge will become available before analysis. A design must be chosen which is adequate even when the wrong ‘model’ is applied.

Now, although he does not make this explicitly clear, Matthews’s investigations of the robustness of the designs commonly recommended in connection with the simple carry-over model implicitly assume that type I robustness is involved. In practice, however, it is type II robustness that is important. The reason that this is so is that if a decision as to the ‘correct’ model is not available at the time of choosing the design, then this could only become available by the time of analysis in one of two ways. Either the results of another investigation would become available in the meantime, or some diagnostic examination of the data to be analysed would determine which was appropriate. However, the first case, although not without interest, really corresponds to a different problem altogether: that of designing a sequential investigation of a treatment through a series of trials. The second case corresponds to a two-stage procedure: a preliminary analysis of the data (formal or informal, it makes little difference) followed by a ‘definitive’ analysis using the final model. This sort of behaviour is incoherent in Bayesian terms (a considerable uncertainty is ‘resolved’ on the basis of rather little information) and when formally examined in frequentist terms such procedures do not perform well. (The two-stage analysis of cross-over trials, stepwise regression and outlier detection methods are cases in point.)

When Type II robustness is investigated, however, there is a problem. The conventional approach to choosing designs is to compare their performance in terms of some variance criterion. It is, of course, assumed that the estimators are unbiased. Where, however, it is not known with certainty which model applies, this requires either that some means be found of producing unbiased estimates without knowledge of the definitive model, or that design/estimation strategies be compared in terms of mean square error or some similar measure. For the first strategy some hyper-model is used which subsumes all possible models in contention. For the second, some means of establishing the likely relative importance of bias and variance is needed.
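For reference, the standard decomposition behind the second strategy is given below. The notation is an assumption introduced here: $\tau$ stands for the treatment contrast of interest and $\hat{\tau}$ for an estimator of it under whichever model is fitted.

```latex
\mathrm{MSE}(\hat{\tau})
  = \mathrm{E}\!\left[(\hat{\tau} - \tau)^{2}\right]
  = \mathrm{Var}(\hat{\tau}) + \left[\mathrm{E}(\hat{\tau}) - \tau\right]^{2}
```

When the fitted model is correct the squared-bias term vanishes and comparison by variance alone suffices; when it is not, a design with a smaller variance can still have the larger mean square error.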

Of course, the number of possible designs that might be considered is enormous. The purpose of this investigation, however, is not an exhaustive catalogue of designs but an exploration of the consequences of the model for carry-over on design robustness. For this purpose we limit ourselves to a particular much-studied (if seldom used) class of design: dual balanced designs in two sequences and four periods. These designs are capable of a simple and robust analysis5 and included amongst these seven designs are three particularly attractive ones in which each patient receives each treatment for an equal number (two) of periods.

Our general approach to investigating the effect of carry-over on the choice of design is, therefore, as follows:

1. We shall consider all possible dual balanced designs for comparing two treatments in four periods and two sequences.

2. Where available, minimum variance unbiased treatment estimators will be given for each design for:
(i) the case ignoring carry-over;
(ii) the simple carry-over model;
(iii) the steady-state model;
(iv) any arbitrary mixture of (ii) and (iii).
(In finding these estimators we shall assume homoscedasticity and independence of within-patient errors and also that the patient effects are fixed.)

3. Efficiencies of the design and model combination will be given.

4. We shall consider what we refer to as type I and type II robustness.

GENERAL MODEL

We assume that two treatments A and B are compared using two sequences and four periods. The model assumed in this investigation is of the general form given by Jones and Kenward5 with one modification:

$$Y_{ijk} = \mu + s_{ik} + \pi_j + \tau_{d[i,j]} + \lambda_{r[i,j-1,j]} + e_{ijk}. \tag{1}$$

Here, $i$ subscripts a sequence, $i = 1, 2$, and $j$ a period, $j = 1, 2, 3, 4$, whereas $k$ subscripts a subject within a sequence. The general ‘intercept’ is represented by $\mu$, $s_{ik}$ is the effect due to subject $k$ of sequence $i$, $\pi_j$ is the effect of period $j$ and $e_{ijk}$ is a general disturbance term assumed to have constant variance $\sigma^2$ and covariance with other disturbance terms of zero. The direct treatment effect which applies in sequence $i$ and period $j$ is $\tau_{d[i,j]}$, where $d[i,j]$ indicates which treatment was applied in period $j$ for sequence $i$. Thus $d[i,j] = A$ or $B$ and is determined by the design. The difference between the model as presented here and that in Jones and Kenward is that the carry-over effect, $\lambda$, is indexed not only by the treatment of the preceding period but also by the current treatment. Thus the indexing term is $r[i, j-1, j]$ and this can take on six possible values: AA; AB; BB; BA; −A, and −B. It will be assumed that $\lambda_{-A}$ and $\lambda_{-B}$, which are carry-over terms applying in the first period, are zero (an assumption which is standardly made). The simple carry-over (SC) model then corresponds to assuming that $\lambda_{AA} = \lambda_{AB}$ and $\lambda_{BB} = \lambda_{BA}$. For the steady-state (SS) carry-over model it is assumed that $\lambda_{BB} = \lambda_{AA} = 0$. A more general model makes no such restrictions and eliminating this more general form of carry-over corresponds to eliminating both simple and steady-state carry-over. We assume that carry-over lasts for one period only.
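The indexing of the carry-over term can be made concrete with a small sketch. The function names and the string labels (for example `lambda_A` for the common value of the two carry-over terms engendered by A under the simple carry-over model) are hypothetical conveniences introduced here, not notation from the paper.

```python
def carry_over_labels(sequence):
    """Return the carry-over index (previous, current) for each period.

    The first period has no preceding treatment and is labelled '-A' or '-B'.
    """
    labels = ["-" + sequence[0]]
    labels += [prev + cur for prev, cur in zip(sequence, sequence[1:])]
    return labels


def restricted_term(label, model):
    """Name the carry-over parameter left by a model, or None if it is zero."""
    if label.startswith("-"):
        return None                       # first-period terms are set to zero
    if model == "SC":
        return "lambda_" + label[0]       # depends on the engendering treatment only
    if model == "SS":
        # a treatment does not carry over into itself
        return None if label[0] == label[1] else "lambda_" + label
    return "lambda_" + label              # general model: no restriction


print(carry_over_labels("AABB"))
# ['-A', 'AA', 'AB', 'BB']
print([restricted_term(lab, "SS") for lab in carry_over_labels("ABBA")])
# [None, 'lambda_AB', None, 'lambda_BA']
```

Listing the labels period by period makes it easy to see which designs contain no AA or BB cell at all, and hence for which designs the steady-state and general models impose the same constraints.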

EFFICIENCIES OF DESIGNS

We are interested in estimators for the treatment effect qA!q

Band in the general linear model

these will simply correspond to linear combinations of the eight cell means formed by the

2856 S. SENN AND D. LAMBROU

( 1998 John Wiley & Sons, Ltd. Statist. Med. 17, 2849—2864 (1998)

Page 9: Robust and realistic approaches to carry-over

Table I. Weights for four-period two-sequence designs (weights for periods 1 to 4 of the first sequence of each pair)

Design  First sequence  Model    Period 1  Period 2  Period 3  Period 4
1       A A B B         Neither  1/4       1/4       -1/4      -1/4
                        SC       6/20      4/20      -7/20     -3/20
                        SS       1/4       1/4       0         -1/2
                        Both     1         -1/2      0         -1/2
2       A B A B         Neither  1/4       -1/4      1/4       -1/4
                        SC       1         -1/4      -1/2      -1/4
                        SS       1         -1/4      -1/2      -1/4
                        Both     1         -1/4      -1/2      -1/4
3       A B B A         Neither  1/4       -1/4      -1/4      1/4
                        SC       6/20      -3/20     -7/20     4/20
                        SS       1/2       0         -1/2      0
                        Both     1         -1/2      0         -1/2
4       A B A A         Neither  1/6       -3/6      1/6       1/6
                        SC       2/12      -6/12     -1/12     5/12
                        SS       1/2       -1/2      -1/2      1/2
                        Both     1         -1/2      -1/2      0
5       A A B A         Neither  1/6       1/6       -3/6      1/6
                        SC       2/12      5/12      -6/12     -1/12
                        SS       1/2       1/2       -1/2      -1/2
                        Both     1         0         -1/2      -1/2
6       A B B B         Neither  3/6       -1/6      -1/6      -1/6
                        SC       4/8       -2/8      -1/8      -1/8
                        SS       1/2       0         -1/4      -1/4
                        Both     n/a       n/a       n/a       n/a
7       A A A B         Neither  1/6       1/6       1/6       -3/6
                        SC       0         1/4       1/4       -1/2
                        SS       n/a       n/a       n/a       n/a
                        Both     n/a       n/a       n/a       n/a

cross-classification of four periods and two sequences. The task then is to determine the weights for each cell mean for each design/model combination. We consider four models: no carry-over; simple carry-over; steady-state carry-over, and ‘both’ (that is, any arbitrary mixture of the two). The weights may be determined by identifying the appropriate entries in

$$(X^T X)^{-1} X^T$$

where X is a suitably constructed design (or more properly design-model) matrix. Model (1) is over-parameterized so that a reduced form of X leading to an invertible $X^T X$ must be used or one must work with generalized inverses from which the identifiable contrast $\tau_A - \tau_B$ must be established. Alternatively, the simple system of successive elimination, together with a final minimization of sums of squares of weights, as illustrated by Senn,2 may be employed. Note that because we are only interested in the variance of a single treatment contrast, the issue of which particular variance function to minimize (as in classical A, D, E optimality17) does not arise.
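The weight calculation described above can be sketched numerically. The following Python fragment is only an illustration under stated assumptions, not the authors' own computation: it assumes NumPy is available, uses a hypothetical helper `cell_weights`, adopts a dummy-variable coding of X chosen here for convenience, and obtains the weights as the minimum-norm solution of $X^T w = c$ through the Moore-Penrose pseudoinverse, where c picks out the treatment contrast.

```python
import numpy as np

def cell_weights(seq1, model="none"):
    """Minimum-norm unbiased weights for tau_A - tau_B over the 8 cell means.

    seq1  : treatments of the first sequence, e.g. "AABB"; the second
            sequence is taken to be its mirror image (A <-> B).
    model : "none", "SC" (simple) or "SS" (steady-state) carry-over.
    Returns a 2 x 4 array: one row per sequence, one column per period.
    """
    if model not in ("none", "SC", "SS"):
        raise ValueError("model must be 'none', 'SC' or 'SS'")
    swap = {"A": "B", "B": "A"}
    seqs = [seq1, "".join(swap[t] for t in seq1)]
    rows = []
    for s, seq in enumerate(seqs):
        for j, trt in enumerate(seq):
            prev = seq[j - 1] if j > 0 else None
            row = [1.0, float(s == 1)]                  # intercept, sequence
            row += [float(j == p) for p in (1, 2, 3)]   # periods 2 to 4
            row += [float(trt == "A")]                  # tau_A - tau_B
            if model == "SC":
                # carry-over indexed by the preceding treatment only
                row += [float(prev == "A"), float(prev == "B")]
            elif model == "SS":
                # carry-over only where the treatment changes
                row += [float(prev == "A" and trt == "B"),
                        float(prev == "B" and trt == "A")]
            rows.append(row)
    X = np.array(rows)
    c = np.zeros(X.shape[1])
    c[5] = 1.0                                          # treatment column
    w = np.linalg.pinv(X.T) @ c                         # min-norm solution of X^T w = c
    if not np.allclose(X.T @ w, c):
        raise ValueError("treatment contrast is not estimable under this model")
    return w.reshape(2, 4)

print(cell_weights("AABB", "SC") * 20)
```

For the AABB/BBAA design under the simple carry-over coding assumed here, the first row should come out close to 6, 4, -7, -3 (in twentieths), with the second row its negative. The pseudoinverse is used rather than a plain inverse so that the redundant columns of this coding do not need to be removed by hand.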

The appropriate systems of weights are given in Table I, where the numbering of the designs follows that given in Jones and Kenward5 (chapter 4). Only the weights for the first sequence in a pair are reproduced. The weights for the second sequence are simply the negative of those in the


Table II. Variances of treatment differences for four-period two-sequence designs

Design  Number  No carry-over  Simple carry-over  Steady-state carry-over  Both
AABB    1       0.500 (1/2)    0.550 (11/20)      0.750 (3/4)              3.000
ABAB    2       0.500 (1/2)    2.750 (11/4)       2.750 (11/4)             2.750 (11/4)
ABBA    3       0.500 (1/2)    0.550 (11/20)      1.000                    3.000
ABAA    4       0.667 (2/3)    0.916 (11/12)      2.000                    3.000
AABA    5       0.667 (2/3)    0.916 (11/12)      2.000                    3.000
ABBB    6       0.667 (2/3)    0.687 (11/16)      0.750 (3/4)              n/a
AAAB    7       0.667 (2/3)    0.750 (3/4)        n/a                      n/a

first. These weights all share the property that when added over any sequence or any period they will come to zero. When added over all cells labelled A they come to one and when added over all cells labelled B they come to minus one. They also come to zero when added over certain other cells; which cells depends on the model for carry-over assumed. If we take, for example, the AABB/BBAA design, then for the simple carry-over model (SC) the weights are 6/20, 4/20, -7/20, -3/20 and -6/20, -4/20, 7/20, 3/20. This model asserts that a term corresponding to the carry-over for A occurs in cells 2 and 3 of the first sequence and cell 4 of the second. The sum of these weights is 4/20 - 7/20 + 3/20 = 0. Similarly, the weights attached to the carry-over associated with B sum to zero. On the other hand, if the steady-state model (SS) applies to this design, then the weights are 1/4, 1/4, 0, -2/4 and -1/4, -1/4, 0, 2/4. A carry-over due to A occurs in cell 3 of sequence 1 only and the weight here is 0. If the weights for ‘both’ are tried it will be seen that either form of carry-over is eliminated.

Given our assumptions about the disturbance terms, the efficiency of the designs may be compared by summing the squares of the weights. Assume, for argument’s sake, that there are n patients allocated to each sequence. Then the variance of any design/model combination may be obtained by multiplying the figure given in Table II by $\sigma^2/n$, where $\sigma^2$ is the variance of the within-patient errors.
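As a small check of this rule, the sketch below sums the squared weights for design 1 using exact fractions. The weights are transcribed from Table I; the names `TABLE_I_DESIGN_1` and `variance_factor` are illustrative only.

```python
from fractions import Fraction as F

# First-sequence weights transcribed from Table I for design 1 (AABB/BBAA).
TABLE_I_DESIGN_1 = {
    "none": [F(1, 4), F(1, 4), F(-1, 4), F(-1, 4)],
    "SC":   [F(6, 20), F(4, 20), F(-7, 20), F(-3, 20)],
    "SS":   [F(1, 4), F(1, 4), F(0), F(-1, 2)],
    "both": [F(1), F(-1, 2), F(0), F(-1, 2)],
}

def variance_factor(first_sequence_weights):
    """Sum of squared weights over all eight cells.

    The second sequence carries the negatives of the first-sequence
    weights, so it contributes the same sum of squares again.
    """
    return 2 * sum(w * w for w in first_sequence_weights)

for model, weights in TABLE_I_DESIGN_1.items():
    # Multiply by sigma^2 / n to obtain the variance of the estimator.
    print(model, variance_factor(weights))
```

The four factors obtained are 1/2, 11/20, 3/4 and 3, which agree with the design 1 row of Table II.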

If the designs are compared it will be seen that the three designs (1, 2 and 3) in which each patient receives A twice and B twice are the most efficient for the model without carry-over. Two of these (designs 1 and 3) are also the most efficient when the simple carry-over model applies. On the other hand, designs 1 and 6 are the most efficient when the steady-state model applies. It might be thought, therefore, that a safe recommendation is to use design 1. There is, however, a problem. If we wish to have type II robustness and we require unbiased estimators, then none of the designs is particularly good; typically the variance is six times that which applies in the best possible case (no carry-over and designs 1, 2 or 3). There is one exception and that is design 2. However, design 2 is the worst possible design if we know for sure that simple carry-over applies or we know for sure that steady-state carry-over applies. Nevertheless, if we desire unbiased estimators and we know that carry-over applies but do not know which form (but that it will last for at most one period) there is no escape from the fact that design 2 is the design of choice.

This immediately raises the possibility of an alternative approach altogether; that of being prepared to accept a limited amount of bias as an acceptable cost of an efficient estimator. The question then arises as to which model should be used. As Senn has pointed out, the simple carry-over model is not necessarily a good choice.2 Consider design 1 again and suppose, just for


Table III. Expected absolute bias as a fraction of $\lambda$ for four-period two-sequence designs as a function of r

Design  Number  No carry-over  Simple carry-over  Steady-state carry-over  Both
AABB    1       r/2 - 1/4      7(r - 1)/20        3r/4                     0
ABAB    2       -3/4           0                  0                        0
ABBA    3       r/4 - 1/2      7(r - 1)/20        r/2                      0
ABAA    4       r/6 - 2/3      5(r - 1)/12        r/2                      0
AABA    5       r/6 - 1/3      5(r - 1)/12        r/2                      0
ABBB    6       r/3 - 1/6      (r - 1)/4          r/2                      n/a
AAAB    7       r/3 - 1/2      (r - 1)/2          n/a                      n/a

the sake of argument, that B is a placebo and A is an active treatment. It seems plausible that if carry-over occurs it will be most extensive in period 3 of sequence 1 where a placebo is given after a double period of active treatment. However, of the four weighting schemes, the one for which the absolute weight given to this cell is the largest is, in fact, the SC model. It is thus possible that this model will produce the most biased estimators. In the next section, therefore, we consider potential biases associated with model/design combinations.

BIAS OF VARIOUS APPROACHES

Assume that A is an active treatment and that B is a placebo and that carry-over can only be engendered by the active treatment. Table III and Figures 3 and 4 give the bias of various model/design combinations for the seven designs as a function of r, the ratio of carry-over from A into A to the carry-over from A into B. Figure 3 covers the three designs which are uniform on the patients (each patient receives A and B equally). Figure 4 covers the four other designs. With the exception of design 7, the plots cover three of the possible models for each design (adjusting for both forms of carry-over is not included since this has a bias of zero whatever the value for r). Design 7 does not permit the elimination of steady-state carry-over. If Table III and Figures 3 and 4 are studied it will be seen that for designs 3, 4, 5 and 6 the bias as a function of r is identical for the steady-state model.

From Table III it can be seen that bias for the simple carry-over model is proportional to the factor (r - 1). This reflects, of course, the fact that the estimator must eliminate the carry-over completely for this model when r = 1. On the other hand, bias for the steady-state model is proportional to r, because for r = 0 estimators associated with this model are unbiased. The unadjusted estimators do not, of course, in general eliminate carry-over and their biases are not generally zero unless $\lambda = 0$. If we ignore the inefficient design 2, then particularly attractive design/model combinations from the point of view of bias are designs 1 and 6 in conjunction with the no carry-over model. In both cases the bias lies between that for simple and steady-state carry-over, and in both cases the bias is zero, whatever the value of $\lambda$, if r = 1/2.
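The bias expressions for design 1 can be recovered directly from the Table I weights. The sketch below is an illustration only: the function `bias` and the constant names are hypothetical, and it encodes the assumption stated in this section that only the active treatment A carries over, with $\lambda$ from A into B and $r\lambda$ from A into A.

```python
from fractions import Fraction as F

# First-sequence weights for design 1 (AABB/BBAA), transcribed from Table I.
W_NONE = [F(1, 4), F(1, 4), F(-1, 4), F(-1, 4)]
W_SC = [F(6, 20), F(4, 20), F(-7, 20), F(-3, 20)]
W_SS = [F(1, 4), F(1, 4), F(0), F(-1, 2)]

def bias(seq1, weights1, r, lam=1):
    """Bias of a weighted cell-mean estimator of tau_A - tau_B.

    Assumes carry-over arises from A only: lam from A into B,
    r * lam from A into A, and nothing after B.
    """
    swap = {"A": "B", "B": "A"}
    seq2 = "".join(swap[t] for t in seq1)
    weights2 = [-w for w in weights1]          # second sequence: negated weights
    total = 0
    for seq, weights in ((seq1, weights1), (seq2, weights2)):
        for j in range(1, 4):                  # periods 2 to 4
            if seq[j - 1] == "A":
                carry = r * lam if seq[j] == "A" else lam
                total += weights[j] * carry
    return total

for r in (F(0), F(1, 2), F(1)):
    print(r, bias("AABB", W_NONE, r), bias("AABB", W_SC, r), bias("AABB", W_SS, r))
```

With these inputs the unadjusted estimator has zero bias at r = 1/2, the simple carry-over estimator at r = 1 and the steady-state estimator at r = 0, in line with the design 1 row of Table III.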

In studying these graphs it should be borne in mind that on the whole it is the absolute biases of these design/model combinations which are important. This is because the contribution to mean square error is via the square of the bias so that its sign is not relevant. The qualification ‘on the whole’ is because there is one further consideration which does make the sign of the bias


Figure 3. Bias of three estimation approaches for three designs as a function of r, the carry-over from A into A as a ratio of the carry-over from A into B. The carry-over from A into B, $\lambda$, is assumed equal to 1

Figure 4. Bias of three estimation approaches for four designs as a function of r, the carry-over from A into A as a ratio of the carry-over from A into B. The carry-over from A into B, $\lambda$, is assumed equal to 1


important. If r is greater than 0, it implies that steady-state has not been reached by the end of a period. However it is really the steady-state effect of a treatment we should like to study. This means that we would really like to estimate what is sometimes referred to as the total effect: $(\tau_A - \tau_B) + (\lambda_{AA} - \lambda_{BB})$. Hence for values of r between 0 and 1, which is the most important range, a negative bias of the treatment effect is actually more serious, other things being equal, than a positive bias. This really reinforces the desirability of designs 1 and 6. These are designs which have been identified as good designs in connection with the simple carry-over model5,12,13 and would, therefore, seem partly to vindicate research based on that approach. However, there are three important points to note. First, that if this argument is made, the near inevitability of bias if carry-over occurs must be accepted. If this is not accepted we are either back to using design 2 and adjusting for both types of carry-over or we must rely on unsupported statements that carry-over must be of a particular form. Second, although the designs may appear vindicated, the simple carry-over model is not. It is actually the approach that does not adjust for carry-over that does best. Third, as soon as we accept the possibility of bias we must accept that ‘optimality’ can no longer be understood in conventional terms.

Figures 3 and 4 show bias as a function of r. The next step is to consider expected bias by using the probability distribution for r given in Figure 2. This is used as a mixing distribution for the conditional absolute bias given r. The reason that the absolute bias is used is that an estimator which was extremely un-robust, producing estimates with extreme negative or extreme positive bias as the case might be, might otherwise appear to be robust. The mean square error, of course, includes a squared bias term and this may then be calculated from this expected absolute bias.
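In symbols, the calculation just described may be written as follows. The notation is introduced here for illustration only and is not the authors': $b(r)$ denotes the bias of a design/model combination as a fraction of $\lambda$ (as in Table III), $F$ the mixing distribution for r of Figure 2, and $v$ the corresponding variance factor of Table II.

$$\bar{b} = \int |b(r)| \, \mathrm{d}F(r), \qquad \mathrm{MSE} \approx v\,\frac{\sigma^2}{n} + \bar{b}^{\,2} \lambda^2 .$$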

In terms of bias, it appears that the steady-state estimators perform best. This is, of course, only to be expected since the probability distribution given in Figure 2 represents a stronger belief in this form of carry-over than in any other. In view of the good conditional (on r) performance of designs 1 and 6 it is, of course, only to be expected that they also perform well in terms of bias when this is studied unconditionally. If adjusting for simple carry-over and not adjusting is compared, it will be seen that the latter strategy is preferable. This is only partly a reflection of the fact that the mixing distribution we have used places most weight on r = 0. As already discussed, the conditional properties of not adjusting are generally good.

Since the unadjusted estimators also have the lowest variance, it thus follows that whatever the degree of carry-over, given a choice between using the simple carry-over model or no carry-over at all, then given the probability distribution of Figure 2, the policy of not adjusting would be best. However, given the bias advantage of the steady-state approach, no such simple statement regarding the choice between it and not adjusting is possible. We now consider this aspect in terms of mean square error.

MEAN SQUARE ERROR

It is not generally possible to investigate mean square error without addressing the issue of absolute precision. This may be seen very simply by considering the AB/BA cross-over. The within-patient estimator of the treatment effect is (potentially) biased but efficient. The between-patient estimator based on first period values is unbiased but inefficient. However, which has lower mean square error will depend not only on the degree of carry-over but on the number of patients. Given very many patients even the between-patient estimator will have a low variance. The bias of the within-patient estimator may then dominate. However, given the number of


Table IV. Expected absolute bias as a fraction of $\lambda$ for four-period two-sequence designs, assuming that the probability distribution for r given in Figure 2 applies

Design  Number  No carry-over  Simple carry-over  Steady-state carry-over  Both
AABB    1       0.189          0.274              0.193                    0
ABAB    2       0.751          0                  0                        0
ABBA    3       0.442          0.274              0.129                    0
ABAA    4       0.628          0.326              0.129                    0
AABA    5       0.295          0.326              0.129                    0
ABBB    6       0.127          0.191              0.129                    n/a
AAAB    7       0.422          0.391              n/a                      n/a

patients commonly recruited to a cross-over trial, the mean square error for the within-patient estimator is likely to be much lower.11,14,18

However, some progress can be made even without considering absolute precision. Suppose that for a given design/model combination (call this combination 1), another combination can be found which has both lower bias and lower variance (call this combination 2). Then combination 1 is dominated by combination 2. Because both bias and variance are always lower in combination 2, its mean square error will be lower than that of combination 1, whatever the sample size.
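The dominance argument can be sketched as a simple filter. The fragment below applies it to the no-carry-over columns of Tables II and IV. It is an illustration only: the names are hypothetical, and it assumes a weak form of dominance (no worse on both criteria and strictly better on at least one) so that designs tied on variance can still be ranked by bias.

```python
# (variance factor, expected absolute bias) for the no-carry-over estimator,
# transcribed from Tables II and IV; keys are design numbers.
NO_CARRY_OVER = {
    1: (0.500, 0.189),
    2: (0.500, 0.751),
    3: (0.500, 0.442),
    4: (0.667, 0.628),
    5: (0.667, 0.295),
    6: (0.667, 0.127),
    7: (0.667, 0.422),
}

def dominates(a, b):
    """True if a is no worse than b on both criteria and better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def survivors(combos):
    """Design numbers that are not dominated by any other combination."""
    return sorted(
        key for key, value in combos.items()
        if not any(dominates(other, value) for other in combos.values())
    )

print(survivors(NO_CARRY_OVER))
```

Under these assumptions only designs 1 and 6 remain for the no-carry-over model.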

Using this argument, for example, with the help of Tables II and IV we can note that for the no carry-over model we may eliminate all designs except 1 and 6. For the SC model we would retain designs 1, 3 and 6. However, comparing now across models for designs 1 and 6, the no-carry-over models have both lower bias and lower variance than the SC models so that the latter are dominated for this design. For the SS model designs 1 and 3 would survive on an internal comparison. However, in connection with design 1, the no-carry-over model dominates on both bias and variance so that the SS model can be eliminated. This would appear to leave design 3 but if this is compared with design 6 and no-carry-over it loses out on both variance and (just) bias. This leaves the no carry-over model for designs 1 and 6. The variances as a proportion of $\sigma^2/n$ are 1/2 and 2/3, respectively, and the biases as a function of $\lambda$ are 0.189 and 0.127. The condition that design 6 is better than design 1 is thus the condition that

$$\frac{\sigma^2}{2n} + (0.189)^2 \lambda^2 > \frac{2\sigma^2}{3n} + (0.127)^2 \lambda^2$$

or that

$$\lambda^2 > 8.5\,\sigma^2/n.$$

Now, since the variance of the treatment estimate associated with design 1 is $0.5\sigma^2/n$, this implies that the ratio of carry-over to the standard error of the treatment effect would have to be in excess of 4 for design 6 to be preferable. This would be very high precision for the treatment effect itself, and of course, the carry-over is liable to be much less. Thus it seems that to the extent that any conclusion is possible, the combination of design 1 and not adjusting is best.
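The arithmetic behind these two figures can be reproduced in a few lines. This is a sketch using the rounded entries of Tables II and IV; the variable names are illustrative.

```python
import math

var_1, var_6 = 1 / 2, 2 / 3      # variance factors, no carry-over (Table II)
bias_1, bias_6 = 0.189, 0.127    # expected absolute bias (Table IV)

# Design 6 has the lower mean square error when
#   var_1 * s2/n + bias_1**2 * lam**2 > var_6 * s2/n + bias_6**2 * lam**2,
# that is, when lam**2 exceeds `threshold` times s2/n.
threshold = (var_6 - var_1) / (bias_1**2 - bias_6**2)

# Carry-over expressed in standard errors of the design 1 estimate,
# whose variance is var_1 * s2/n.
ratio = math.sqrt(threshold / var_1)

print(round(threshold, 1), round(ratio, 2))
```

The threshold comes out at about 8.5 and the ratio at a little over 4, matching the values quoted in the text.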

However, it should be stressed that this conclusion is conditional on the assumption regarding the independence of the within-patient error terms made in connection with the model given by (1) above. If, for example, an autoregressive structure applies but carry-over may be assumed negligible, then designs 2 and 3 are particularly attractive. This is true even if ordinary least squares (OLS) is applied rather than generalized least squares (GLS) as the approach to


estimation. (Matthews gives a good discussion of the value of designs for which OLS is efficient or nearly efficient despite autocorrelation.13) On the other hand, design 1 is less attractive.

DISCUSSION

Some caution is called for regarding our results. We have only presented here the results for designs in two sequences. These designs are interesting because, as Jones and Kenward point out, a particularly simple and robust analysis is possible for them. A simple contrast can be produced for each patient and these can then be compared between the two sequences using the t-test or Mann-Whitney (or Wilcoxon rank sum) test. These approaches finesse any need to make explicit assumptions about the within-patient correlation structure. However, designs with more sequences might deal with carry-over more successfully. Consider for example the design in four sequences AABB/BBAA/ABBA/BAAB. A naive system of weights is one whereby each of the eight A cells is weighted 1/8 and each of the B cells is weighted -1/8. These are the appropriate weights for the no-carry-over model but also happen to eliminate simple carry-over as well. This elimination is thus ‘free’ for this design and achieved at no cost in terms of variance. This is, of course, well known as this design is often described as being ‘optimal’. On the other hand, however, one should not get too carried away with this possibility. This system of weights does nothing to deal with steady-state carry-over and hence is unlikely to eliminate carry-over in practice.
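The claim about the four-sequence design can be checked by adding up the naive weights over the cells in which each carry-over term would appear. The following sketch uses hypothetical helper names and exact fractions.

```python
from fractions import Fraction as F

SEQUENCES = ["AABB", "BBAA", "ABBA", "BAAB"]

def naive_weight(trt):
    """Naive weight: 1/8 for every A cell, -1/8 for every B cell."""
    return F(1, 8) if trt == "A" else F(-1, 8)

def carry_over_sum(prev, cur=None):
    """Sum of naive weights over cells preceded by treatment `prev`.

    If `cur` is given, only cells whose current treatment is `cur`
    are counted (used for the steady-state terms AB and BA).
    """
    total = F(0)
    for seq in SEQUENCES:
        for j in range(1, 4):                  # periods 2 to 4
            if seq[j - 1] == prev and (cur is None or seq[j] == cur):
                total += naive_weight(seq[j])
    return total

# Simple carry-over: indexed by the preceding treatment only.
print(carry_over_sum("A"), carry_over_sum("B"))
# Steady-state carry-over: arises only where the treatment changes.
print(carry_over_sum("A", "B"), carry_over_sum("B", "A"))
```

The simple carry-over sums are both zero, whereas the steady-state sums are -3/8 and 3/8, so a steady-state carry-over would pass into the naive estimate.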

In fact, our investigation shows that the claim that designs that are optimal for simple carry-over are generally robust is only partially justified. Two of the designs which statisticians using the simple carry-over model have identified as being good designs (designs 1 and 6) are reasonable. A third which is generally identified as good (design 3) is less so unless carry-over may be assumed negligible. However, the irony is that for this claim to be correct two unconventional steps have to be taken. First, bias in estimation has to be accepted. If it is not, then short of declaring that simple carry-over applies by fiat, the best design is design 2, a design which has traditionally been identified as the worst. The second point is that the best approach to estimation is actually to ignore carry-over altogether.

A further irony is that the AABB/BBAA design analysed using the weights appropriate for no carry-over is, of course, effectively an AB/BA design with two measurements per period to which a summary measures approach has been applied. It is thus easy to see that the claim that the ‘problems’ with this design can be cured by adding more periods is not reasonable.

Of course, some of our investigations are dependent on the mixing distribution for r we have assumed. Others, of course, would come up with different distributions and we anticipate criticism in this respect. The object of this investigation has not been, however, to provide general recommendations valid in all cases. On the contrary, what we have wished to show is that such recommendations are not possible. In our opinion, what those who still cling to the simple carry-over model need to provide are arguments in favour of it, or evidence that it obtains. These have been conspicuous by their absence.

Our conclusion is that the statistician cannot provide medical colleagues with a general design and estimation strategy that can be guaranteed to deal with carry-over. Our practical advice is to turn the problem the other way round. Rather than the statistician trying to persuade the life scientist that statistical solutions to carry-over are available, the statistician should inquire of the life scientist what needs to be done to minimize the problem. Close collaboration between both is needed to establish suitable passive or active wash-out periods. In general, pharmacology, as


much as statistics, is the key to successful design of trials in drug development, and cross-over trials are no exception.

ACKNOWLEDGEMENT

We are grateful to an anonymous referee for helpful comments.

REFERENCES

1. Fisher, R. A. The Design of Experiments, in Bennett, J. H. (ed.), Statistical Methods, Experimental Design, and Scientific Inference, Oxford University Press, Oxford, 1990.

2. Senn, S. J. Cross-over Trials in Clinical Research, Wiley, Chichester, 1993.

3. Kershner, R. P. and Federer, W. T. ‘Two-treatment crossover designs for estimating a variety of effects’, Journal of the American Statistical Association, 76, 612-619 (1981).

4. Laska, E. M., Meisner, M. and Kushner, H. B. ‘Optimal crossover designs in the presence of carryover effects’, Biometrics, 39, 1087-1091 (1983).

5. Jones, B. and Kenward, M. Design and Analysis of Cross-Over Trials, Chapman and Hall, 1989.

6. Ratkowsky, D. A., Evans, M. A. and Alldredge, J. R. Cross-over Experiments, Design, Analysis and Application, Marcel Dekker, New York, 1993.

7. Fleiss, J. ‘Letter to the editor, On multiperiod crossover studies’, Biometrics, 42, 449-450 (1986).

8. Fleiss, J. ‘A critique of recent research on the two-treatment cross-over design’, Controlled Clinical Trials, 10, 237-243.

9. Senn, S. J. ‘Is the simple carry-over model useful?’, Statistics in Medicine, 11, 715-726 (1992).

10. Senn, S. J. ‘Some controversies in designing and allocating cross-over trials’, Biocybernetics and Biomedical Engineering, 15, 27-39 (1995).

11. Senn, S. J. ‘Cross-over trials at the cross-roads?’, Applied Clinical Trials, 4, 24-31 (1995).

12. Matthews, J. N. S. ‘Modelling and optimality in the design of crossover studies for medical applications’, Journal of Statistical Planning and Inference, 42, 89-108 (1994).

13. Matthews, J. ‘Multi-period cross-over trials’, Statistical Methods in Medical Research, 3, 383-405 (1994).

14. Senn, S. J. ‘The AB/BA crossover: past, present and future?’, Statistical Methods in Medical Research, 3, 303-324 (1994).

15. Palmqvist, M., Balder, B., Lowhagen, O., Melander, B., Svedmyr, N. and Wahlander, L. ‘Late asthmatic reaction decreased after pretreatment with salbutamol and formoterol a new long acting β2 agonist’, Journal of Allergy and Clinical Immunology, 89, 844-849 (1992).

16. Senn, S. J. Statistical Issues in Drug Development, Wiley, Chichester, 1997.

17. Atkinson, A. C. and Donev, A. N. Optimum Experiment Designs, Oxford Science Publications, Oxford, 1992.

18. Senn, S. J. ‘The AB/BA Cross-over: How to perform the two-stage analysis if you can’t be persuaded that you shouldn’t’, in Liber Amicorum Roel van Strik, Erasmus University, Rotterdam, 1996, pp. 93-100.
