some controversies in planning and analysing multi-centre trials


STATISTICS IN MEDICINE

Statist. Med. 17, 1753–1765 (1998)

SOME CONTROVERSIES IN PLANNING AND ANALYSING MULTI-CENTRE TRIALS†

STEPHEN SENN*

Department of Statistical Science & Department of Epidemiology and Public Health, University College London, Gower Street, London WC1E 6BT, U.K.

SUMMARY

It is shown that a rational approach to planning multi-centre trials will lead to an unequal distribution of patients across centres. Consequently different approaches to estimation will yield different estimates. However, some such approaches are not reasonable and it is concluded that multi-centre trials are less problematic than is commonly supposed. © 1998 John Wiley & Sons, Ltd.

... every thesis and hypothesis have an offspring of propositions;- and each proposition has its own conclusions; every one of which leads the mind on again into fresh enquiries and doubtings.

LAURENCE STERNE, Tristram Shandy

INTRODUCTION

A concern in planning clinical trials is the supply of patients. Considerable efforts are made in the planning stage to estimate rates of accrual and later, when the trial is implemented, to see that these rates are achieved. The longer a trial takes the more, other things being equal, it will cost the sponsor, in particular in terms of lost sales. In many trials the time between recruiting the first and the last patient is a considerable proportion of the total duration of the trial. For example, a ‘three month’ study in asthma might take 15 months to complete due to 12 months of recruitment. The sample sizes required by most clinical trials are such (the exceptions being cross-over trials in some specialties) that acceptable recruitment times can only be achieved by recruiting patients simultaneously to a number of centres. Hence, although they are regarded by some commentators as being problematical, multi-centre trials are none the less granted to be an absolute necessity. In this paper,†1 I shall consider some of the controversial issues which such multi-centre trials raise. These are listed under individual section headings below.

SHOULD PATIENT NUMBERS PER CENTRE BE ROUGHLY EQUAL?

A point of view which is often propounded is that they should be. It is maintained that it is the sign of a bad trial that the numbers of patients per centre are unequal.2 There are perhaps two reasons why this is believed to be the case. First, it is argued that we ought to wish to treat centres equally, and therefore, failure to recruit equal numbers of patients to centres indicates either poor planning or poor control in execution. Second, it is argued that trials in which the numbers of patients differ greatly from centre to centre will be inefficient. This latter point will be dealt with below under another heading. The first is considered here.

* Correspondence to: Stephen Senn, Department of Statistical Science and Department of Epidemiology and Public Health, University College London, Gower Street, London WC1E 6BT, U.K.

† This paper is a shortened version of Chapter 14 of Statistical Issues in Drug Development1

CCC 0277–6715/98/151753–13$17.50 © 1998 John Wiley & Sons, Ltd.

In contradistinction to many commentators, it is my view that unequal numbers of patients per centre, far from being a sign of poor planning or execution, are a necessary and logical consequence of a rational approach to clinical trials. It is rare, although it is occasionally the case, that the rate-limiting factor for a given centre in treating patients in a clinical trial is resource availability: doctors, nurses, and so forth. It is more usually the case that the process as a whole is governed by the rate at which patients arrive who both satisfy the inclusion criteria and are willing to give consent to enter the trial. If one seeks a model to describe the probability of arrival of a given number of patients at a given clinic in a given period (say six months) then the simplest candidate is the Poisson distribution. This is a distribution for which the variance is equal to the mean so that, for example, if the mean number of arrivals were 9 patients per six months, the variance would also be 9. (In practice most real life distributions of events per time period show more variation than that exhibited by the Poisson but it often provides a useful starting point.)

When describing the relative variability of a distribution, however, it is not the ratio of variance to mean which is important but that of standard deviation to mean (the coefficient of variation). This feature is a useful one which can be exploited given an appropriate attitude. The sum of a number of independent Poisson variables is also a Poisson variable so that if we had four centres each with a mean (Poisson) arrival rate of 9 patients per six months overall, we should have a trial where arrival was described by a Poisson distribution with mean 36 per six months. Now, since the standard deviation is the square root of the variance, the ratio of standard deviation to mean for a Poisson with mean 36 is 6/36 = 0.17, whereas for a Poisson with mean 9 it is 3/9 = 0.33.
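The arithmetic of this paragraph can be checked in a few lines. The following is only an illustrative sketch; the function name `poisson_cv` and the variable names are introduced here for illustration and are not part of the paper.

```python
import math

def poisson_cv(mean):
    """Coefficient of variation (standard deviation / mean) of a Poisson count."""
    # For a Poisson variable the variance equals the mean.
    return math.sqrt(mean) / mean

per_centre_mean = 9          # patients per six months in one centre
n_centres = 4
# The sum of independent Poisson variables is Poisson with the summed mean.
pooled_mean = per_centre_mean * n_centres

cv_single = poisson_cv(per_centre_mean)
cv_pooled = poisson_cv(pooled_mean)
print(round(cv_single, 2), round(cv_pooled, 2))  # prints 0.33 0.17
```

Quadrupling the expected number of arrivals halves the relative variability, since the coefficient of variation falls as one over the square root of the mean.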

This phenomenon is really nothing more than a particular instance of the law of large numbers; an increase in numbers bringing about a reduction in relative variability. What is the origin of this reduced relative variability? It comes from the possibility of mutually (randomly) compensating behaviour of the centres. One centre, by chance, may have a smaller than average influx of patients in the period for which the study is planned to run but another may, by chance, have a greater than average number of patients. Provided we allow the centres to compensate for each other, this feature may be exploited.

Is there any way in which we can destroy this ‘safety in numbers’ property? Yes: we can insist on applying individual identical targets for each centre in order to try and ensure that the numbers recruited per centre are equal. If we adopt this procedure, then the recruitment rate is driven by the slowest recruiting centre. Far from having obtained stability through large numbers we shall have designed a trial where the extremes are important.

Consider, for example, a trial in four centres in which it is determined to recruit 96 patients in total. Suppose that in each centre recruitment follows a Poisson distribution with mean arrival rate of 24 patients per year. Mean inter-arrival time is then 1/24 years or 0.042 year. The time to recruit the 24th patient in a given centre will be given by the gamma distribution with parameters 0.042 and 24. The mean recruitment time for a centre will be 1 year and the median will be 0.986 year. The probability of completing in one year or less in a given centre will be 0.527.

If, however, we have to wait for every centre to complete then the probability of completing in one year or less is the product of the probabilities that each centre does so and is (0.527)^4 = 0.077.


Further calculations show that the median time to finish the whole trial is 1.2 years and that three-quarters of all trials will take more than 1.1 years to finish. If, on the other hand, for the purpose of recruitment we treated the whole trial as one, simply requiring a total recruitment of 96, then the median recruitment time would be 0.997 years, nearly two and a half months less (see Appendix I for details).
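The figures in this example can be approximated numerically. The sketch below is not the calculation of Appendix I, which is not reproduced here. It assumes independent centres with Poisson arrivals, uses the identity that the time to the n-th arrival is at most t exactly when at least n arrivals have occurred by time t, and finds quantiles by bisection. The helper names `erlang_cdf` and `solve_time` are introduced purely for illustration.

```python
import math

def erlang_cdf(t, n, rate):
    """P(time to the n-th arrival <= t) for Poisson arrivals at the given rate."""
    # Equivalent to P(Poisson(rate * t) >= n) = 1 - P(Poisson(rate * t) <= n - 1).
    mu = rate * t
    term = math.exp(-mu)        # P(N = 0)
    below = 0.0
    for k in range(n):
        below += term           # accumulate P(N = k) for k = 0 .. n - 1
        term *= mu / (k + 1)
    return 1.0 - below

def solve_time(cdf, target, lo=0.0, hi=5.0):
    """Bisection for the time t at which an increasing cdf reaches target."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

centres, per_centre, rate = 4, 24, 24.0   # values taken from the example in the text

def one_centre(t):
    return erlang_cdf(t, per_centre, rate)

def all_centres(t):
    # Every centre must reach its own target of 24 patients.
    return one_centre(t) ** centres

def pooled(t):
    # A single overall target of 96 with the centres' arrivals pooled.
    return erlang_cdf(t, centres * per_centre, centres * rate)

p_one = one_centre(1.0)
p_all = all_centres(1.0)
median_one = solve_time(one_centre, 0.5)
median_all_equal = solve_time(all_centres, 0.5)
lower_quartile_all_equal = solve_time(all_centres, 0.25)
median_pooled = solve_time(pooled, 0.5)

print(round(p_one, 3), round(p_all, 3))
print(round(median_one, 3), round(median_all_equal, 2),
      round(lower_quartile_all_equal, 2), round(median_pooled, 3))
```

Under these assumptions the results should agree closely with the values quoted above. The gap between `median_all_equal` and `median_pooled` is the cost of insisting on equal numbers per centre.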

The above argument shows the minimum likely effect of requiring equal numbers per centre and assumes that the true long-term recruitment rates are equal in all centres. If one further supposed that such true rates vary from centre to centre then the requirement for equal numbers per centre becomes even more hopeless. Hence, it is the case that any rational approach to planning and managing multi-centre trials must allow for disparity in the number of patients per centre.

ARE TRIALS WITH UNEQUAL NUMBERS OF PATIENTS PER CENTRE INEFFICIENT?

If it were the case that a trial with very variable numbers of patients per centre were less efficient in terms of the precision of the estimate delivered at the end of the trial than a trial with the same total numbers of patients distributed equally among the centres, then the superiority of the former with respect to recruitment rate might be completely vitiated. We should be able to recruit patients faster but would have to recruit more of them in order to provide an estimate of equal precision. It is thus necessary to consider what effect unequal numbers have on the precision of the treatment estimate.

This turns out to be a rather complex matter because, associated with a multi-centre trial, one can conceive of at least three different sorts of estimate which might be calculated even where the trialist’s aim is to allow for differences between centres. Two of these estimates are so-called ‘fixed’ effect estimates, the other is a random effect estimate. It is really only with one of the two fixed effect estimators that this problem of inefficiency is serious. I shall discuss this point below when comparing ‘type II’ and ‘type III’ approaches to inference. For the moment suffice it to say that it is not true that trials with unequal numbers of patients per centre are inefficient unless we insist on weighting centres equally (the type III approach).

SHOULD WE USE ‘TYPE II’ OR ‘TYPE III’ SUMS OF SQUARES?

As we have explained, multi-centre trials are usually unbalanced. For unbalanced experiments, the simplest sums of squares attributable to effects (so-called type I sums of squares) have values which depend on the order in which they are fitted. This means that in analysing such experiments for the purpose of assigning a sum of squares to any given effect, it is necessary to make a decision as to which other effects are to be deemed to have been fitted first. The strategy of fitting all other effects first when attempting to say something definitive about another is generally agreed as sensible by statisticians. There is one exception, however, and that is to do with interactive effects such as sex-by-centre, or treatment-by-centre interaction (the tendency for the actual efficacy of the treatment to be different in different centres). One school maintains that such interactive effects should also be fitted first (the ‘type III’ philosophy),3 another maintains that when looking at the effect of treatment (say) all other effects should be fitted first except for any interactive effect in which treatment itself is involved (this is the type II philosophy).4 Thus, if we were to use PROC GLM of SAS the type II sum of squares for treatment would correspond to the type I sum of squares for treatment for

    model OUTCOME = SEX CENTRE SEX*CENTRE TREATMENT TREATMENT*CENTRE

where SEX*CENTRE and TREATMENT*CENTRE are interactive terms. On the other hand, the type III sum of squares for treatment would correspond to the type I sum of squares for treatment for

    model OUTCOME = SEX CENTRE SEX*CENTRE TREATMENT*CENTRE TREATMENT

There are also treatment estimators which correspond to the particular approaches for estimating sums of squares and it is easier to see the relative merits of the two approaches in terms of these estimators. As regards treatment effects adjusted for the effect of centres, the approaches can be described as follows. First, each approach would estimate a separate treatment effect in each centre. This corresponds to adjusting treatment for pure differences between centres. The type III approach would then combine these separate treatment effects by weighting each of these equally. For the type II approach they would be weighted according to the precision with which they had been estimated so that larger centres would receive more weight. Thus, to make a political analogy, type II sums of squares are like the U.S. House of Representatives in which states are represented according to their population. On the other hand, type III sums of squares are like the Senate: each state is represented equally.
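The two ways of combining centre-specific estimates can be sketched as follows. The per-centre effects and patient numbers are invented purely for illustration. The sketch further assumes equal allocation to the two arms within each centre and a common within-centre variance, so that the precision of each centre's estimate is proportional to its number of patients. The function name `combine` is hypothetical.

```python
def combine(effects, weights):
    """Weighted average of centre-specific treatment effects."""
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, effects)) / total

# Hypothetical data: estimated treatment effect and patients recruited per centre.
effects = [250.0, 100.0, 400.0]
patients = [120, 20, 10]

# 'Senate' weighting: every centre counts equally (type III).
type_iii = combine(effects, [1.0] * len(effects))

# 'House of Representatives' weighting: weight proportional to precision,
# which under the stated assumptions is proportional to centre size (type II).
type_ii = combine(effects, patients)

print(type_iii, type_ii)  # prints 250.0 240.0
```

In this invented example the two small centres pull the equally weighted estimate as strongly as the large one does, whereas the precision-weighted estimate stays close to the result from the centre that supplied most of the information.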

The arguments made in favour of type III approaches are as follows: (i) if treatment effects vary from centre to centre then the only interpretable overall treatment effect would be a straightforward average of the centre effects; (ii) (a related point) if we use the treatment estimate as the basis of a test of a hypothesis then the hypothesis we test concerns some average of the true treatment effects in each centre. It would be absurd if this hypothesized average itself depended on the numbers of patients we happened to have recruited to the trial.

The type II proponent might argue as follows: (i) We really do not care which centres we recruit to the trial provided they deliver enough information. It is therefore nonsense not to weight them according to the amount of information they provide. (ii) If we consider all the centres we might have included but did not, then the type III approach also depends arbitrarily on the numbers of patients recruited; it depends on whether the centre recruited none or some. (iii) Under the null hypothesis of no treatment effect there cannot be any treatment-by-centre interaction anyway and so any weighted combination of the treatment effects by centre forms a valid test of this hypothesis. Why not use the most efficient?

My own view is that, although the type III philosophy seems plausible at first, it is untenable. It leads to paradoxes. For example, given two centres, a large and a small centre, unless the small centre is at least one-third the size of the large centre, the type III treatment estimate will have a larger variance than that based on the large centre alone. Thus more information is worse than less. Figure 1 shows the variance of the treatment effect in a two-centre trial as a function of the number of patients in the second centre. The effect is bizarre to say the least. In this sense, the general view that trials with variable centre sizes are inefficient is true. However, provided a type II approach is used, the problem and paradox disappear.
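The one-third threshold can be verified under simple assumptions. The sketch below assumes equal allocation to the two arms within each centre and a common patient-level variance sigma2, so that the treatment estimate from a centre with n patients has variance 4 * sigma2 / n. The function names are introduced here for illustration only.

```python
def var_large_only(n_large, sigma2=1.0):
    """Variance of the treatment estimate from the large centre alone."""
    return 4 * sigma2 / n_large

def var_type_iii(n_large, n_small, sigma2=1.0):
    """Variance of the equally weighted mean of the two centre estimates."""
    return 0.25 * (4 * sigma2 / n_large + 4 * sigma2 / n_small)

def var_type_ii(n_large, n_small, sigma2=1.0):
    """Variance of the precision-weighted mean of the two centre estimates."""
    return 4 * sigma2 / (n_large + n_small)

n_large = 90
for n_small in (10, 30, 60):
    print(n_small,
          round(var_type_iii(n_large, n_small), 4),
          round(var_type_ii(n_large, n_small), 4),
          round(var_large_only(n_large), 4))
```

With 90 patients in the large centre, the type III variance exceeds that of the large centre alone until the small centre reaches 30 patients, which is one-third of 90. The type II variance is smaller than that of the large centre alone for any number of additional patients.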

Figure 1. Type III efficiency in a two-centre trial. The number of patients in the first centre is fixed. The plot shows the effect on efficiency as more patients are recruited to the second centre

In fact, analyses as carried out by statisticians wedded to the type III philosophy show signs of many concessions to a type II approach. For example, it is a common habit to combine small centres for analysis. This, of course, downweights their influence on the final result, thus producing an answer more like the type II approach. Furthermore, for meta-analyses, nobody uses an analysis which weights the trials equally. It is true that a random effects analysis (see below) is sometimes advocated but where a fixed effects analysis is employed it is essentially a type II analysis which is used. A similar concession is made when fitting baselines. Most statisticians first subtract the overall mean baseline from the baseline for each patient. This has the consequence that the treatment effect for the average patient is estimated whether or not type III or type II sums of squares are used. Not doing this, however, although it has no consequences for the type II approach, has the disastrous effect on the type III approach that the treatment effect is estimated for a patient with baseline 0. Such patients may be dead and so it will be no surprise if the treatment is proved ineffective!

SHOULD WE USE FIXED OR SHOULD WE USE RANDOM EFFECT MODELS?

Choices between fixed or random effect estimators arise wherever we have data sets with multiple levels within the experimental units, for example, patients within trials for a meta-analysis, episodes per patient for a series of n-of-1 trials or patients within centres for a multi-centre trial. The issue is an extremely complex one and it is difficult to give hard and fast rules as to which is appropriate.5,6

The choice between them has something to do with treatment-by-centre interaction although (in my view) rather with our attitude towards it, than whether its presence is detectable or not.7 It is a commonplace that patients vary from centre to centre. Provided that the treatment effect is additive on our chosen scale of measurement (in other words, whatever the state of patients, it brings the same measured benefit), such differences may be simply eliminated in analysis. Suppose, for example, that in one large centre in an asthma trial the mean value for forced expiratory volume in one second (FEV1) for placebo patients is 1620 ml whereas that for patients being given active treatment is 1870 ml but that in another centre the value for placebo patients is 1900 ml whereas that for patients being given active treatment is 2150 ml. Clearly there are important differences between centres; patients on active treatment in the one are apparently worse off than those on placebo in the other! However the difference (active − placebo) is 250 ml in each centre, so that it is conceivable that the treatment is having the same effect (given an appropriate definition of effect). Suppose, furthermore, that we believed that the treatment effect was identical in every centre in the trial (whatever general differences in level there might be) and that it would also be the same in any centre we might study in the future, even if the patients there did not fulfil the inclusion criteria of the trial. (Of course random variation would mean that we would not always observe identical apparent effects as in the above example but we might conceivably believe them to be similar.) In that case the following treatment effects would be the same:

(i) the true mean of the effects for all patients in the trial;
(ii) the mean over all centres in the trial of the true mean effect for each centre;
(iii) the true individual effect for any patient in the trial;
(iv) the true mean effect for all possible selections of patients into the trial as designed;
(v) the true effect for any future patient or centre to which we might wish to apply the results.

This list is by no means exhaustive but shows that very different sorts of treatment effects may be entertained.

Suppose, however, that the treatment effect was not the same from centre to centre. Then not all of the above would be identical. For example, (i) and (ii) would be the same if we had equal numbers of patients per treatment group per centre but not otherwise. (This at least is one apparent advantage of trials with equal numbers per centre!) In general, therefore, (as we saw above) we may have a choice of effects to estimate.

Suppose that we remain very unambitious and adopt as an objective simply to show that the drug had an effect in these patients and also to describe what effect we think might have applied. We posit as a null hypothesis that the drug had no effect at all. That being the case it follows that the effect is identically zero in every centre. We adopt as an alternative hypothesis that the drug has an effect somewhere. Under the alternative hypothesis, the effect might well be different in different centres. However, because this is a multi-centre trial it follows that the numbers per centre are scarcely adequate to produce a reasonable estimate individually for each centre. Although they are not in themselves valuable, however, we may nevertheless produce individual estimates per centre with a view to combining them in one overall estimator. Such an estimator, although technically subject to some bias when applied to individual centres or patients, may have a much lower variance, and, therefore, nevertheless be more accurate (a justification of the type II attitude). Furthermore, to test the null hypothesis, any combination we choose (provided it is prespecified) of the estimates from the individual centres may be employed.

Since we have adopted as our objective the description and detection of treatment effects in these patients there is no need for us to consider patients we might have studied, and, it thus follows, from which we might have recruited. This does not mean, however, that chance does not affect our treatment estimate. Ideally, to measure the treatment effect in these patients, we should like to be able to study each twice under identical conditions, once under each treatment. In practice a given patient will have been allocated to a particular treatment (using some form of randomization). This allocation introduces variability. Furthermore if patients do not react identically to treatment and especially if these effects vary from centre to centre, then what we define as the treatment effect is a partly arbitrary choice. If, however, we make this choice taking account (a) of the random variation introduced by allocation of treatment to these patients, and (b) with a view to describing what happened in these patients, and in particular if in calculating standard errors and confidence intervals we regard (a) as being the source of variability, then we have what is known as a fixed effects approach. As we discussed above, this can still leave us with a choice of estimates. However, my own opinion is that the type II approach is logical.

If, however, we consider instead that we wish to make probabilistic statements about patients in general, including those from centres we did not include, then we have to move to regarding the centres themselves as some realization of a random process. The true difference in treatment effects from centre to centre now becomes a further source of random variation because, although if we restrict inference to these centres only, this variability is frozen (the differences are what they are and that is an end of it), if we talk about future centres then these may also vary and there is no reason why these other differences should be exactly the same as those which now apply. If we take account of this further source of variation, not only in forming estimators but also in calculating confidence intervals, then we have a random effects model.

The technical consequence of the random effects model as regards its effect on estimates is generally to produce values between those produced by the type II and type III fixed effects approach (see Appendix II for details). This is because we now have two sources of random variation: random variation within centres and random variation of the treatment effect between centres. Whereas the former will be smaller the more patients in a given centre, the latter will not be generally affected by the number of patients in the centre. Therefore if we weight treatment estimates according to their reliability this leads only to a partial weighting by number of patients and hence produces an answer between type II and type III. The effect on the estimated variance of the resulting treatment effect and hence on standard errors, confidence limits, tests of hypotheses and p-values is more complex and is perhaps most easily explained by considering the case of equal numbers of patients in each treatment group in each centre. Here type II and type III fixed and random effects approaches produce the same estimate. Estimated variances for type II and type III will also be identical. For the random effects model they would usually be much higher. This is as it should be, since in the first case we are saying something about what happened in the centres actually studied. In the second case we are attempting to say what might happen in other centres.
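The partial weighting described here can be illustrated with a small numerical sketch. It is not the derivation of Appendix II, which is not reproduced here. It assumes the usual inverse-variance form in which each centre's weight is the reciprocal of its within-centre variance plus a between-centre variance, here called `tau2`. The data, the value of `tau2` and the function name are all invented for illustration.

```python
def weighted_estimate(effects, variances, tau2=0.0):
    """Inverse-variance weighted mean, with between-centre variance tau2 added."""
    weights = [1.0 / (v + tau2) for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

sigma2 = 100.0                 # assumed common patient-level variance
patients = [90, 10]            # hypothetical: one large and one small centre
effects = [4.0, 8.0]           # hypothetical centre-specific estimates
# Within-centre variance of each estimate, assuming equal allocation to two arms.
variances = [4 * sigma2 / n for n in patients]

type_ii = weighted_estimate(effects, variances)                   # tau2 = 0
random_effects = weighted_estimate(effects, variances, tau2=4.0)  # assumed tau2
type_iii = sum(effects) / len(effects)                            # equal weights

print(round(type_ii, 3), round(random_effects, 3), round(type_iii, 3))
```

As `tau2` grows relative to the within-centre variances the weights approach equality and the estimate moves towards the type III value. With `tau2` equal to zero the type II estimate is recovered.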

If we consider whether we should use fixed or random effects then the following points apply.

Pro fixed:

1. Gives a fairly precise answer to a fairly well defined question.
2. Is the only possible approach for a single-centre trial. Many trialists would be quite happy to run a trial in a single centre if only they could and this implies that there must be an implicit acceptance on their part of the fixed effect policy.
3. It is also the only realistic option when we have very few centres.
4. Random sampling of centres does not take place in clinical trials and it is rather hard to define precisely what question the random effects model is answering.
5. The definition of a centre is largely arbitrary. What makes a centre: the patients (who will not be the same tomorrow as they are today); the doctors and nurses (who may leave to take jobs elsewhere)? The random effects model involves a degree of ‘reification’ of the centre, granting it a substantive significance it does not really have. Centre is not a well defined experimental unit to the same degree as patient.

Pro random:

1. We develop treatments to say things about their effects on patients in general. We should not dodge this issue even if it is difficult to come up with precise answers.
2. If we attempt to answer this difficult question the random effects model will almost certainly produce a better approximation.
3. In practice confidence limits for random effect models will be wider than for fixed effect models and this is a more realistic representation of the true uncertainties if we are interested in prediction.
4. If we are interested to say something about patients in a given centre, the fixed effect approach leaves little alternative but to use the results from that centre only. The random effects approach will allow us to combine information from the given centre with information from all centres in a way which may be more appealing and useful.

Unlike for the controversy over type II and type III sums of squares, I have no firm opinion as to which of fixed or random effects is the right approach and consider that both have their place. I would nearly always propose a fixed effect analysis of a clinical trial. I might also consider that a random effect analysis would be useful on occasion, especially if there were rather many centres which had been fairly widely selected. I am, however, rather sceptical of some of the enthusiasm which proponents of the random effects model seem to generate on occasion. There are a number of worrying problems to be addressed when dealing with random effect models. The definition of the centre, for example, is an important one. Most trials are randomized in sub-centre blocks. If we were to fit a random effects model at that level rather than the centre level, we should end up weighting the blocks approximately equally, and since centres differ according to the number of blocks, move back closer to the type II estimator. Furthermore, if we consider that the reason for a centre-by-treatment interaction is that larger centres tend to produce different results to smaller ones, then centre size becomes a stratum. If the distribution of centres in the trial reflects the distribution in general, then weighting centres more nearly equally will give a misleading position as to what happens on average. Again we should have to move closer to the type II estimator. In short, because centre is not a well-defined experimental unit (the label might sometimes be applied separately to two physicians in the same hospital and sometimes to hospitals in different countries) random effect models involve a partially arbitrary choice. This does not make them inappropriate but it means that one should be cautious in the claims made for them.

SHOULD WE STUDY EFFECTS FROM INDIVIDUAL CENTRES?

It is difficult to think of any objections to this but if such study is undertaken, caution must be exercised. First, it should be appreciated that results from individual centres are almost never of interest in themselves. However, it may be that if centres can be found which produce a particular type of result then some common factor can be identified. This might be useful knowledge. Furthermore, looking for unusual centres may be part of an approach to detecting fraud.8 The second point is that the precision of centre-specific treatment estimates is very low. It is almost invariably the case that the reason that a multi-centre trial has been contemplated is that the power of a single-centre trial would be far too low. As a consequence we may expect considerable chance variation between centres. Consider a placebo controlled multi-centre trial with 80 per cent power in total at 5 per cent level (two-sided). Suppose that the true treatment effect is identically equal to the clinically relevant difference. It then follows that provided we have at least six centres, there is an odds-on chance that at least one of them will show an ‘effect reversal’ (the placebo will appear superior). Figure 2 provides a plot of the probability of at least one effect reversal as a function of the number of centres. (Details of the calculations are given in Appendix III.) Hence the trialist needs to be on guard not to overinterpret results from single centres.

Figure 2. Probability of at least one effect reversal as a function of the number of centres given 80 per cent overall power for a two-sided significance level of 5 per cent

WHAT EFFECT DOES TREATMENT-BY-CENTRE INTERACTION HAVE ON THE POWER OF MULTI-CENTRE TRIALS?

This issue is not what it seems, since, to the extent that treatment-by-centre interaction has an adverse effect on the power of clinical trials, it probably has an even worse effect on single-centre trials than on multi-centre ones. Consider an extreme case in which only half of all centres would be capable of showing any treatment effect whatsoever. It then follows that the probability of getting a significant result in any of the centres incapable of showing an effect, if run as a single-centre trial, is 5 per cent, the type I error rate (assuming the most usual conventional rate is used). At the very best the probability of getting a significant result in one of the other sort of centres must be less than 100 per cent. Hence the overall power of the single-centre trial must be less than $\tfrac{1}{2} \times 5$ per cent $+\ \tfrac{1}{2} \times 100$ per cent, and hence less than 52.5 per cent, however large the centre. On the other hand, for a multi-centre trial by increasing the number of centres one could continue to increase the power.

The point about multi-centre trials is rather that they cause us to think about the problem of treatment-by-centre interaction. We are thus tempted to believe that such interaction is uniquely a problem for multi-centre trials. In fact it is more of a problem for single-centre trials, but because there is no way of investigating the phenomenon we assume that it does not exist. Figure 3 shows the power as a function of the number of centres for a trial designed to have 80 per cent power for a two-sided test at the 5 per cent level given that the treatment effect is believed to be 6 in every centre for a standard deviation of 10 but is, in fact, 8 with probability 1/2 and 4 with probability 1/2 (see Appendix IV for details).


Figure 3. Power of a multi-centre trial as a function of the number of centres given treatment-by-centre interaction (see text for explanation)

The above discussion is of course posited on the assumption that the fact that one has more centres in a trial does not adversely affect the average quality of the centres. If it is possible to identify a centre which is better in terms of quality of work than all others, or if monitoring standards suffer as one includes more centres, then it is possible that a single-centre trial could be at an advantage.

IN CONCLUSION: OUT OF SIGHT, OUT OF MIND

The conclusion which I offer is this: multi-centre trials do not bring new problems compared to single-centre trials, although they may bring new opportunities. The problems which affect multi-centre trials are essentially those from which single-centre trials suffer also. The difference is that multi-centre trials cause us to think about these difficulties and hence to confront them. In single-centre trials we have a tendency to ignore them. Only if one believes that to bury one’s head in the sand is an effective way of avoiding danger can single-centre trials be regarded as less problematic than multi-centre ones.

APPENDIX I: RECRUITMENT

It is assumed that each centre contributes patients according to a common stable Poisson process with intensity $\lambda$ per year. It is required to recruit $N$ patients in total in $k$ centres. Thus inter-arrival times in a given centre are negative exponential with mean $1/\lambda$ and we require the time, $t$, to recruit patient number $n_i$ in a given centre $i$. This is given by the gamma distribution with parameters $B = 1/\lambda$ and $C = n_i$. The probability density is

$$f(t; \lambda, n_i) = \frac{(\lambda t)^{n_i - 1} e^{-\lambda t}}{(n_i - 1)!/\lambda}. \tag{1}$$

The distribution function is given by

$$F(t; \lambda, n_i) = 1 - e^{-\lambda t} \sum_{j=0}^{n_i - 1} \frac{(\lambda t)^j}{j!}. \tag{2}$$


For strategy 1 there is no individual centre requirement. The time to recruit $N$ patients in $k$ centres is given by the gamma distribution with parameters $B = 1/(k\lambda)$ and $C = N$. The probability density function is

$$g(t, k\lambda, N) = \frac{(k\lambda t)^{N - 1} e^{-\lambda k t}}{(N - 1)!/(k\lambda)} \tag{3}$$

and the distribution function is

$$G(t, k\lambda, N) = 1 - e^{-\lambda k t} \sum_{j=0}^{N - 1} \frac{(\lambda k t)^j}{j!}. \tag{4}$$

For strategy 2 each centre must recruit the same number of patients and we have distribution function

$$H(t) = \{F(t, \lambda, N/k)\}^k \tag{5}$$

and probability density function

$$h(t) = \frac{d}{dt} H(t). \tag{6}$$

By setting (4) or (5) equal to 0.5 and solving for $t$ we can obtain median recruitment times for the two strategies. For $k = 4$, $N = 96$ and $\lambda = 24$ we obtain $G^{-1}(0.5) = 0.99$ and $H^{-1}(0.5) = 1.20$.
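The median calculation can be checked numerically. The following is only a sketch under stated assumptions: the function names `erlang_cdf` and `median_time` are illustrative and not from the paper, the distribution functions (4) and (5) are evaluated by direct summation, and the equation "distribution function = 0.5" is solved by simple bisection on an assumed bracket of 0 to 10 years.

```python
import math

def erlang_cdf(t, rate, n):
    """Probability that the n-th arrival of a Poisson process with the given
    rate has occurred by time t, i.e. the form of equations (2) and (4)."""
    if t <= 0:
        return 0.0
    x = rate * t
    term = math.exp(-x)      # j = 0 term of e^{-x} x^j / j!
    total = term
    for j in range(1, n):    # accumulate the terms j = 1, ..., n - 1
        term *= x / j
        total += term
    return 1.0 - total

def median_time(cdf, lo=0.0, hi=10.0, tol=1e-10):
    """Solve cdf(t) = 0.5 by bisection (assumes cdf is increasing and that
    the median lies between lo and hi; the bracket is an assumption)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

k, N, lam = 4, 96, 24.0      # values used in the text

# Strategy 1: pooled recruitment with intensity k * lam, equation (4).
g_median = median_time(lambda t: erlang_cdf(t, k * lam, N))

# Strategy 2: every centre must recruit N / k patients, equation (5).
h_median = median_time(lambda t: erlang_cdf(t, lam, N // k) ** k)

print(f"strategy 1 median recruitment time: {g_median:.3f} years")
print(f"strategy 2 median recruitment time: {h_median:.3f} years")
```

Under these assumptions the strategy 2 median comes out at roughly a fifth longer than the strategy 1 median, in line with the figures quoted above.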

APPENDIX II: ESTIMATORS

Assume that we have $k$ centres. The true treatment effect for centre $i$ is $\tau_i$ and its estimator is $\hat{\tau}_i$ with variance $\sigma_i^2$. An overall treatment effect will be estimated by $\sum_{i=1}^{k} w_i \hat{\tau}_i$ where $w_i$ is the weight for centre $i$ and $\sum_{i=1}^{k} w_i = 1$. The type III approach consists of setting $w_i = 1/k$ for all $i$ so that $E(\hat{\tau}_{III}) = \sum_{i=1}^{k} \tau_i / k = \bar{\tau}$ and $\mathrm{var}(\hat{\tau}_{III}) = \sum_{i=1}^{k} \sigma_i^2 / k^2$. The type II approach uses weights $w_i = (1/\sigma_i^2) / \sum_{j=1}^{k} (1/\sigma_j^2)$. If the variances within centres are identically equal to $\sigma^2$ (this is an assumption which is commonly made and then becomes a theoretical property of the model rather than an empirical fact) and we have $n_i$ patients in total in centre $i$ with $n_i/2$ on each treatment arm then for the type II approach we have

$$w_i = \frac{1/(4\sigma^2/n_i)}{\sum_{j=1}^{k} \{1/(4\sigma^2/n_j)\}} = \frac{n_i}{\sum_{j=1}^{k} n_j}. \tag{7}$$

Hence centres are weighted according to the number of patients. It then follows that under these circumstances expectations and variances of the type II estimator are given by

$$E(\hat{\tau}_{II}) = \frac{\sum_{i=1}^{k} n_i \tau_i}{\sum_{i=1}^{k} n_i} \quad \text{and} \quad \mathrm{var}(\hat{\tau}_{II}) = \frac{4\sigma^2}{\sum_{i=1}^{k} n_i}.$$

It is the first of these properties which makes the estimator unacceptable to the type III adherent but it becomes unobjectionable when one realizes that the individual $\tau_i$ are completely without interest.

The random effects estimator, $\hat{\tau}_R$, uses weights

$$w_i = \frac{1/(\gamma^2 + \sigma_i^2)}{\sum_{j=1}^{k} \{1/(\gamma^2 + \sigma_j^2)\}} \tag{8}$$


where $\gamma^2 = \mathrm{var}(\tau)$ and the treatment effect is now regarded as random. If $\gamma^2$ is large compared to the values of $\sigma_i^2$ then (8) $\approx 1/k$. On the other hand if $\gamma^2$ is small, then (8) $\approx$ (7). Hence in practice it will usually be the case that either $\hat{\tau}_{II} \le \hat{\tau}_R \le \hat{\tau}_{III}$ or $\hat{\tau}_{II} \ge \hat{\tau}_R \ge \hat{\tau}_{III}$. Note that if we have equal numbers per centre so that $n_i = n$ for all $i$ then (8) = (7) = $1/k$ so that all three estimators are identical. Even under these circumstances, however, the variance of the random effects estimator will exceed the other two.
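The three weighting schemes can be compared side by side. This is a minimal sketch, assuming a common within-centre variance so that the variance of the estimate in centre i is 4σ²/n_i as in (7); the function names, the centre sizes and the value of σ² are hypothetical and chosen only for illustration.

```python
def type_iii_weights(n):
    """Equal weight 1/k for each of the k centres."""
    k = len(n)
    return [1.0 / k] * k

def type_ii_weights(n):
    """Weights proportional to centre size, equation (7)."""
    total = sum(n)
    return [n_i / total for n_i in n]

def random_effects_weights(n, sigma2, gamma2):
    """Equation (8), with the variance of each centre estimate taken
    to be 4 * sigma2 / n_i (common within-centre variance assumed)."""
    raw = [1.0 / (gamma2 + 4.0 * sigma2 / n_i) for n_i in n]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical centre sizes and within-centre variance, for illustration only.
n = [40, 20, 10, 10]
sigma2 = 100.0

print("type II :", type_ii_weights(n))
print("type III:", type_iii_weights(n))
for gamma2 in (0.0, 10.0, 1.0e6):
    weights = random_effects_weights(n, sigma2, gamma2)
    print(f"random effects, gamma2 = {gamma2:g}:", [round(w, 3) for w in weights])
```

With γ² set to zero the random effects weights coincide with the type II weights, and as γ² grows they approach the equal weights of the type III approach, which is the limiting behaviour described in the text.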

APPENDIX III: PROBABILITIES OF EFFECT REVERSALS

The precision $t$ (ratio of clinically relevant difference to standard error) of a trial designed to have a power of $1 - \beta$ for a two-sided test of size $\alpha$ will be

$$t = \Phi^{-1}(1 - \alpha/2) + \Phi^{-1}(1 - \beta)$$

where $\Phi(\cdot)$ is the distribution function of the standard Normal and $\Phi^{-1}(\cdot)$ is its inverse. If patients are equally divided between $k$ centres, then the precision in a given centre will be $t/\sqrt{k}$. If we assume that the probability of an effect reversal overall is negligible and suppose that the overall treatment effect is positive, then the probability of not showing a reversal in a given centre is the probability that the estimated treatment effect is positive in that centre, which is $1 - \Phi(-t/\sqrt{k}) = \Phi(t/\sqrt{k})$. Hence, the probability that it is positive in all $k$ centres is $\{\Phi(t/\sqrt{k})\}^k$, from which the probability of at least one reversal is

$$1 - \{\Phi(t/\sqrt{k})\}^k. \tag{9}$$
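Expression (9) is simple to evaluate. The sketch below uses only the Python standard library; the function name `reversal_probability` and its default arguments (two-sided 5 per cent level, 80 per cent power) are illustrative choices matching the example in the text rather than anything prescribed by it.

```python
from statistics import NormalDist

def reversal_probability(k, alpha=0.05, power=0.80):
    """Probability of at least one effect reversal among k equal-sized
    centres, following expression (9)."""
    z = NormalDist()                                   # standard Normal
    t = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)    # overall precision
    return 1 - z.cdf(t / k ** 0.5) ** k

# Tabulate the probability for 1 to 12 centres.
for k in range(1, 13):
    print(f"{k:2d} centres: {reversal_probability(k):.3f}")
```

Under these assumptions the probability first exceeds one half at six centres, which is the basis of the "odds-on" statement in the main text.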

APPENDIX IV: POWER OF SINGLE-CENTRE AND MULTI-CENTRE TRIALS

Assume that centres are of equal sizes and of two sorts: those in which with probability $p$ there is a treatment effect of size $\tau_1$ and those for which with probability $1 - p$ there is a treatment effect of size $\tau_2$. Assume that the within-trial variance is $\sigma^2$, that there are $k$ centres and that the total number of patients is $N$. Assume that there are equal numbers of patients per group per centre and that this integer number may be approximated by $N/(2k)$.

It is also assumed that the trial has been designed to have a target power $1 - \beta$ for a size $\alpha$ based on a treatment effect of $\Delta = p\tau_1 + (1 - p)\tau_2$, based on the mistaken assumption that the treatment effect is constant from centre to centre, but that the analysis employed will eliminate patient-by-treatment interaction. From the power and size requirements we may find $N$ as

$$N = \frac{4\sigma^2 \{\Phi^{-1}(1 - \alpha/2) + \Phi^{-1}(1 - \beta)\}^2}{\{p\tau_1 + (1 - p)\tau_2\}^2}. \tag{10}$$

The number of centres of one type or another actually chosen is assumed binomial$(k, p)$ and the conditional power given the number of each type chosen is easily calculated. The sum of the product of the marginal and conditional distribution gives the required power, $\pi(k, p)$, as a function of the proportion of centres over all of each type and the number of centres in the trial. Thus

$$\pi(k, p) = \sum_{i=0}^{k} \frac{k!}{i!\,(k - i)!}\, p^i (1 - p)^{k - i}\, \Phi\!\left[ \frac{\{i\tau_1 + (k - i)\tau_2\}/k}{2\sigma/\sqrt{N}} - \Phi^{-1}(1 - \alpha/2) \right]. \tag{11}$$
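The power calculation behind Figure 3 can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the function names are hypothetical, $N$ from (10) is treated as a real number rather than rounded to an integer, the binomial exponent is taken as $k - i$, the average effect given $i$ centres of the first sort is taken as $\{i\tau_1 + (k - i)\tau_2\}/k$, and its standard error is taken as $2\sigma/\sqrt{N}$.

```python
from math import comb, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Normal

def planned_n(delta, sigma, alpha=0.05, power=0.80):
    """Total sample size from (10), left unrounded (an assumption)."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return 4 * sigma ** 2 * z_sum ** 2 / delta ** 2

def multicentre_power(k, p, tau1, tau2, sigma, alpha=0.05, power=0.80):
    """Power as in (11): average the conditional power over the binomial
    number i of centres with effect tau1 among the k centres."""
    delta = p * tau1 + (1 - p) * tau2          # effect assumed at planning
    n_total = planned_n(delta, sigma, alpha, power)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    total = 0.0
    for i in range(k + 1):
        prob_i = comb(k, i) * p ** i * (1 - p) ** (k - i)
        mean_effect = (i * tau1 + (k - i) * tau2) / k
        total += prob_i * Z.cdf(mean_effect * sqrt(n_total) / (2 * sigma) - z_crit)
    return total

# Setting of Figure 3: planned effect 6, standard deviation 10,
# true effect 8 or 4 with probability 1/2 each.
for k in (1, 2, 5, 10, 20, 50):
    print(f"{k:2d} centres: power = {multicentre_power(k, 0.5, 8.0, 4.0, 10.0):.3f}")
```

With a constant effect (both effects set to 6) the function returns the planned 80 per cent for any number of centres, which is a convenient check on the reconstruction of (10) and (11).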


ACKNOWLEDGEMENTS

I thank Willi Maurer, Uwe Ferner and Robert O’Neill for having introduced me to the issue of effect reversal and the referee for helpful comments.

REFERENCES

1. Senn, S. J. Statistical Issues in Drug Development, Wiley, Chichester, 1997.

2. CPMP Working Party on Efficacy of Medicinal Products. ‘Biostatistical methodology in clinical trials in applications for marketing authorizations for medicinal purposes’, Statistics in Medicine, 14, 1659–1682 (1995).

3. Speed, F. M., Hocking, R. R. and Hackney, O. P. ‘Methods of analysis of linear models with unbalanced data’, Journal of the American Statistical Association, 73, 105–112 (1978).

4. Nelder, J. ‘A reformulation of linear models’, Journal of the Royal Statistical Society, Series A, 140, 48–76 (1977).

5. Berry, D. A. ‘Basic principles in designing and analysing clinical studies’, in Berry, D. A. (ed.), Statistical Methodology in the Pharmaceutical Industry, Marcel Dekker, New York, 1990.

6. Fleiss, J. L. ‘Analysis of data from multiclinic trials’, Controlled Clinical Trials, 10, 237–243 (1986).

7. Senn, S. J. ‘A personal view of some controversies in allocating treatment to patients in clinical trials’, Statistics in Medicine, 14, 2661–2674 (1995).

8. Ward, P. ‘Europe takes tentative steps to combat fraud’, Applied Clinical Trials, 4, (9), 36–40 (1995).
