Hans van Houwelingen and the Art of Summing up

Stephen Senn*

Department of Statistics, University of Glasgow, Glasgow, G12 8QW, UK

Received 17 March 2009, revised 15 July 2009, accepted 13 October 2009

Some personal remarks about Hans van Houwelingen’s approach to biostatistics in general are followed by a discussion of his article with Koos Zwinderman and Theo Stijnen outlining a bivariate approach to meta-analysis. It is concluded that this is more radical than many may realise in that it permits inter-trial information to be recovered. This has some advantages but in theory opens the door to bias. In practice, however, the size of this bias is likely to be small. I end with some further personal remarks to Hans.

Key words: Bias; Bivariate model; Inter-trial information; Meta-analysis; Random effects.

1 Introduction

I can’t remember where I first heard Hans speak. I firmly associate him with the meetings of the International Society for Clinical Biostatistics (ISCB), which he and I have both attended for many years. I know that we were both at the meeting in Cardiff organised by Douglas Wilson in 1986 but can’t remember whether I heard Hans then. What I do know is that by the time that Hans gave his keynote address to the ISCB in Budapest 10 years later, I must have already heard him speak many times. I remember that I knew what to expect and was not disappointed. I do not mean that the content of his lecture was predictable. Hans has worked on many themes and continues to produce fresh and innovative work. I mean that the style was predictably “Hans”, which is to say that it included challenging material but was also relevant to the practice and purpose of biostatistics.

Hans’s talk was eventually published in the proceedings of the ISCB (van Houwelingen, 1997). As it is a review article, it makes rather easier reading to the less mathematically inclined (among which, alas, I must include myself) than some he has written but it still refers to mathematical challenges. It also refers to some other challenges of data analysis and it is one of these that I wish to treat here. I quote from Hans’s article and a section in which he recounts some statistical nightmares:

“Meta-analysis is another nightmare. I wholeheartedly support the idea of combining evidence from different sources, but the popular practice of analysing summary measures from selected publications is a poor man’s solution. As I said before, I hope that we will have full multi-centre multi-study databases that can be analysed by appropriate random effect models considering both random variation within and between studies and/or centres. In that ideal situation we are back to the data, there is no meta-aspect on the analysis anymore and the term meta-analysis can be skipped from the dictionary. In the meantime, we have to walk the poor man’s path of meta-analysis and to perform the best analysis of the evaluable summary data modelling between study variation by explanatory variables at the meta-level and residual random effects.”

* Corresponding author: e-mail: [email protected], Phone: +44-141-330-5141, Fax: +44-141-330-4814

© 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Biometrical Journal 52 (2010) 1, 85–94. DOI: 10.1002/bimj.200900074

I share Hans’s hope that the development of multi-centre multi-study databases will eventually allow us to do away with meta-analysis (MA) in its current form, although I hope that Hans won’t mind my pulling his leg a little by saying that there is something slightly utopian in this passage, vaguely reminiscent of Engels’s oft quoted remark that as the working of the dialectic proceeds towards its apotheosis, “the state is not abolished, it withers away”. Even more than a dozen years after Hans’s lecture, I don’t think that there is any prospect of MA of summary statistics withering away and it is MA that I propose to consider in this article.

In fact, it is another of Hans’s articles, also published in Statistics in Medicine, and also reporting work presented at an ISCB meeting, that I propose to look at more closely here, namely his article with Koos Zwinderman and Theo Stijnen (van Houwelingen, Zwinderman, and Stijnen, 1993). A later paper in Statistics in Medicine (van Houwelingen, Arends, and Stijnen, 2002) explains the approach in more detail. I shall refer to the earlier article as HZS. In fact, HZS has got so much in its short span of 11 pages that I shall only be concentrating on one section of it, that is to say Section 5, which describes a parametric Normal–binomial mixture for modelling MA. Before I do so, however, I wish to dispose of four red herrings or misconceptions.

In doing this, I want to make it quite clear that I do not think that these are misconceptions that Hans shares. On the contrary, I suspect that he finds them so obviously wrong that they are not worth a mention. However, I think it will be useful to take this opportunity to clear them out of the way as a prelude to discussing HZS, since HZS actually involves a more radical assault on some habits of MA than many might suppose.

2 Four Misconceptions of Meta-Analysis

The first misconception of MA is that an analysis of original data is more precise than an analysis based on summary statistics. It is indeed true that an analysis based on summary statistics as conventionally carried out is frequently inefficient. This is because such MAs weight the treatment contrasts from individual trials by observed information and observed information is imprecise (Senn, 2000, 2007b). However, to take the case of continuous data, given standard summaries in the form of means and variances per treatment arm, it is possible to reconstruct what a conventional ordinary least-squares (OLS) analysis of the original data would show. This is summarized in Table 1 using data based on Yudkin et al. describing an MA of four placebo-controlled trials of azathioprine in multiple sclerosis published in The Lancet (Yudkin et al., 1991). These data are quoted by Petitti in her book (see Table 8-2 on p. 117, Petitti, 1994) and I shall follow her example of using the data on the Kurtzke disability status score at 2 years of follow-up. The variance figure in the third column is the estimated variance of the original data at the patient level assuming that there is homoscedasticity across arms within studies but not across studies. Bartlett’s test is equivocal as regards this assumption, giving a chi-square of 6.82 on three degrees of freedom (DF) and a p-value of 0.08.

Now the point is that to produce the OLS solution, all that we need to do is impose the assumption of homoscedasticity across studies. In that case, by weighting each of the variances by the DF, we recover a single estimate of the patient-level variance for all studies of 1.544. The difference between MA and OLS is now as follows. In each case, we proceed to estimate a figure for the variance of the contrasts trial by trial. To do this, we multiply the patient-level variance by the values of $(1/n_1 + 1/n_2)$ for each study, where $n_1$ is the number of patients on azathioprine and $n_2$ is the number on placebo. For the OLS solution the variance in question is 1.544 for every study and for MA it is as summarized in Table 1. The situation is summarized in Table 2, where the column headed DF gives the degrees of freedom by which the between-patient variances per trial in the MA column have been weighted to produce the repeated figure in the OLS column. The column headed Multiplier gives the multiplier $(1/n_1 + 1/n_2)$ based on patient totals per arm and the last two columns are the estimated variances of the treatment contrasts according to the two methods. Standard meta-analytic software will now permit one to produce the OLS solution. All that it is necessary to do is input the treatment contrasts from Table 1 and the variances (or standard errors calculated from the variances) from the last column of Table 2. Use of the second to last column of Table 2 yields the standard meta-analytic solution. Here, the conventional MA method yields an overall fixed effects estimate (standard error) of −0.215 (0.111), whereas the OLS solution is −0.189 (0.113).
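To make the arithmetic concrete, the following short Python sketch (the function and variable names are mine, not anything from the article or from HZS) reproduces both fixed effects solutions from the summary statistics in Table 1.

```python
import numpy as np

# Per-trial summaries from Table 1 (Kurtzke score contrasts, azathioprine minus placebo)
contrast = np.array([-0.12, -0.66, -0.25, -0.25])
s2 = np.array([1.614, 0.897, 1.233, 1.878])   # between-patient variance per trial
n1 = np.array([162, 15, 30, 27])              # patients on azathioprine
n2 = np.array([175, 20, 32, 25])              # patients on placebo

mult = 1 / n1 + 1 / n2                        # (1/n1 + 1/n2) per trial
df = n1 + n2 - 2                              # within-trial degrees of freedom

def fixed_effect(v):
    """Inverse-variance-weighted estimate of the common effect and its standard error."""
    w = 1 / v
    return np.sum(w * contrast) / np.sum(w), np.sqrt(1 / np.sum(w))

# Conventional MA: each trial supplies its own variance estimate
est_ma, se_ma = fixed_effect(s2 * mult)

# OLS: pool the patient-level variances across trials (homoscedasticity) first
s2_pooled = np.sum(df * s2) / np.sum(df)      # 1.544
est_ols, se_ols = fixed_effect(s2_pooled * mult)

print(f"MA : {est_ma:.3f} ({se_ma:.3f})")     # -0.215 (0.111)
print(f"OLS: {est_ols:.3f} ({se_ols:.3f})")   # -0.189 (0.113)
```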

Which is better? The MA solution can never be optimal, although it will always appear to give a more precise answer (as here) than the OLS or indeed any other solution (Senn, 2000). (In consequence, conventional MA confidence intervals generally do not achieve claimed coverage probabilities (Senn, 2007a).) If, as is scarcely credible, there is true homoscedasticity, then OLS is optimal. In practice, some ideal compromise, to be achieved (with great difficulty) by putting random effects on the variances, might be better. The point is, however, that it is not necessary to have the original data to do one or the other. This applies, a fortiori of course, to binary outcomes, where the equivalence of the Bernoulli formulation of the likelihood at patient level with the binomial formulation at trial level is well known and reflected in algorithms for logistic regression in all major packages. This, of course, does not carry over when we come to have a look at covariate information and it is this, maybe, that Hans has in mind when referring to databases in the passage above. However, as will be discussed in Section 3, the HZS approach to MA does require information in the form of summary statistics per arm. It cannot proceed on the basis of trial-specific treatment contrasts.

The second misconception of MA is that a fixed effects MA is equivalent to an OLS analysis of a linear model in which trial and treatment but not their interaction are fitted. This has been claimed at least twice in the pages of Biometrics (Mathew, 1999; Olkin and Sampson, 1998) but it is somewhat misleading (Senn, 2000).

Table 1  Summary of results in terms of the Kurtzke disability scale from four trials of azathioprine in multiple sclerosis.a)

Study   Contrast   Variance   Patients (n1)   Patients (n2)
16       −0.12      1.614         162             175
17       −0.66      0.897          15              20
20       −0.25      1.233          30              32
21       −0.25      1.878          27              25

a) The variance in question is the between-patient within-trial estimate assuming homoscedasticity between arms but not between trials. The identifying numbers for studies are as given in Yudkin et al. (1991).

Table 2  Various statistics calculated from Table 1.

                  Between-patient variances                Variances of contrasts
Study    DF        MA        OLS        Multiplier          MA        OLS
16      335       1.614     1.544        0.012             0.019     0.018
17       33       0.897     1.544        0.117             0.105     0.180
20       60       1.233     1.544        0.065             0.080     0.100
21       50       1.878     1.544        0.077             0.145     0.119

As the above demonstration showed, by imposing the assumption of homoscedasticity across trials we recover the OLS estimate and, indeed, provided we calculate the variances of the individual trial contrasts this way, we can recover the OLS estimate using standard MA software. This is, indeed, the same estimate of the treatment effect we would get by analysing the original data fitting main effects of trial and treatment. It is also the estimate we would get fitting an interaction provided that we use the type II philosophy in calculating the main effect of treatment. However, the estimate of the standard error of the treatment contrast is not that which comes from a model without interaction. It is that which comes from a model with interaction. This can be seen by considering that in a conventional MA of $N$ patients in total allocated to one of the two treatments in $k$ trials, 2 DF are lost in estimating each of the $k$ variances so that there are $N - 2k$ DF used in total for estimating variances. On the other hand, a main effects model would lose 1 DF for the grand mean, 1 for treatment and $k - 1$ for trials to leave $N - k - 1$. The difference between the two is $k - 1$, which is the number of DF available for the interaction. In other words, the MA approach corresponds mutatis mutandis to the OLS approach, with interaction fitted.
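As a check on this accounting (my arithmetic, using the azathioprine data; it is not a calculation given in the article): with $N = 486$ patients and $k = 4$ trials,
\[
N - 2k = 478, \qquad N - k - 1 = 481, \qquad (N - k - 1) - (N - 2k) = k - 1 = 3,
\]
the three DF being exactly those available for the trial-by-treatment interaction.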

This brings me on to the third misconception. A conventional fixed effects MA is sometimes claimed to depend on an assumption of no interaction. However, this is a rather strange claim if MA corresponds (apart from local estimation of variances) to the OLS model with interactions. If we knew that the treatment effect was the same from trial to trial, it would be more efficient (in a second-order sense) to pool the $k - 1$ DF from the interaction with the error term. In fact, it is possible (if slightly tedious) to recover this information from the Q statistic for homogeneity. (I am tempted to show how to do this here but the margin of the article is not wide enough to hold the result.)

I would say, rather, that whether one uses a fixed or random effects MA depends on purpose. For instance, if one wishes to test the null hypothesis that the treatment is ineffective, the hypothesis of no trial-by-treatment interaction can be regarded as being included in the hypothesis being tested. It is therefore legitimate to proceed in testing this hypothesis as if the assumptions were true (Senn, 2004).

My fourth misconception is the belief that a linear model in which not only the trial-by-treatment interaction but also the trial effects are treated as random corresponds to a random effects MA. This might, indeed, be a very natural way to model such data. Indeed, for example, in their excellent book on mixed-effects models, Brown and Prescott (2006) effectively assume that a standard random effects MA proceeds in this way. (See Section 5.6 of that book.) It does not.

From its very beginning, either implicitly or explicitly, the idea of local control was built into MA. This means that all conventional linear MA approaches can be based on within trial–treatment contrasts. Recovery of information between trials is not permitted but it would be if the trial effects were treated as random. Of course, in a Bayesian framework everything is random but the analogous difference to the frequentist distinction between a fixed effects and a random effects approach to the main effect of trial is the difference between a non-hierarchical and a hierarchical model. The fixed effects analogue has a non-hierarchical vague prior on the mean effect for each trial, whereas the random effects analogue has a single distribution for the trial effects, although it in turn will have a prior distribution on its mean and variance.
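To make the distinction concrete, a minimal sketch in symbols (the notation and the particular vague priors are mine, not taken from the article): writing $\beta_i$ for the main effect of trial $i$, the fixed effects analogue places independent vague priors on each trial,
\[
\beta_i \sim N(0, 10^6), \qquad i = 1, \ldots, k,
\]
whereas the random effects analogue makes the trial effects exchangeable draws from a common distribution whose mean and variance themselves have (vague) priors,
\[
\beta_i \mid b, \sigma_b^2 \sim N(b, \sigma_b^2), \qquad b \sim N(0, 10^6), \quad \sigma_b \sim \text{a vague prior}.
\]
Only the second, hierarchical form allows information to flow between trials.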

3 The Bivariate Approach of Hans van Houwelingen and Colleagues

This method is outlined in Section 5 of HZS. The approach is specifically illustrated with binary data but it is not the treatment of binary data per se which is most controversial: that relates to the modelling of outcomes at the patient level. An earlier part of HZS has some extremely interesting proposals regarding this but the treatment in Section 5 of this aspect is more conventional. The controversial aspect of Section 5, however, has to do with the modelling of effects at the trial level.

HZS propose to regard the “true” mean response for trial $i$, $i = 1, \ldots, k$, as a bivariate vector
\[
\boldsymbol{\theta}_i = \begin{pmatrix} \theta_{iT} \\ \theta_{iC} \end{pmatrix} \qquad (1)
\]
where the $\boldsymbol{\theta}_i$ are distributed independently from trial to trial but within trials as a bivariate Normal. That is to say,
\[
\boldsymbol{\theta}_i \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \qquad (2)
\]
identically and independently, where
\[
\boldsymbol{\mu} = \begin{pmatrix} \mu_T \\ \mu_C \end{pmatrix}, \qquad
\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_T^2 & \sigma_{TC} \\ \sigma_{TC} & \sigma_C^2 \end{pmatrix}. \qquad (3)
\]
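To see what such a Normal–binomial mixture amounts to, one can simulate from it. The sketch below is purely illustrative (the numerical values, the choice of the logit scale and the names are mine, not HZS’s specification or data): per-trial true parameters are drawn from the bivariate Normal above and binomial outcomes are then generated per arm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative population values on the logit scale: (mu_T, mu_C) and Sigma
mu = np.array([-1.0, -0.5])
Sigma = np.array([[0.25, 0.15],
                  [0.15, 0.20]])

k, n = 10, 200                                       # trials, patients per arm

theta = rng.multivariate_normal(mu, Sigma, size=k)   # per-trial true (theta_iT, theta_iC)
p = 1 / (1 + np.exp(-theta))                         # event probabilities per arm
events = rng.binomial(n, p)                          # observed events per arm and trial

print(events[:3])                                    # first three simulated trials
```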

Now, while this model may seem perfectly natural from a perspective of multivariate analysis, this perspective itself may seem a little strange to trialists. For example, what has suddenly become very important is the random distribution of observed responses under treatment and control. But this is precisely a level at which the statistician has had no random input. It is not obvious in this particular formulation at what level, if any, the fact that one has used a randomised controlled trial enters into the picture.

Of course, it is true that randomisation models are not quite the same as Normal theory models and we regularly use the latter while invoking the former, a habit that is not without its critics. (See, for example, Ludbrook and Dudley’s article in The American Statistician (Ludbrook and Dudley, 1998).) Indeed, there is an argument for saying that where blinding is crucial one goes beyond randomisation tests at one’s peril (Senn, 1994, 2004).

Another feature that is strange from the point of view of the conventional analysis of clinical trials is that it appears that separate effects for treatment and control now become identifiable. The usual point of view is that only contrasts are identifiable. This is a difference between sampling and experimental views of inference and was noted, for example, by Michael Healy in commenting on John Nelder’s famous article on linear models (Nelder, 1977). See also my article in Statistics in Medicine (Senn, 2004).

One further issue is slightly puzzling at first sight. How does the conventional random effects variance appear in this model? How does one know if the random variation from trial to trial is important?

This can be seen if we reformulate the model as follows. An alternative to Eq. (1) would be
\[
\boldsymbol{\nu}_i = \begin{pmatrix} \theta_{iT} + \theta_{iC} \\ \theta_{iT} - \theta_{iC} \end{pmatrix}
\]
with alternatives to Eqs. (2) and (3) of
\[
\boldsymbol{\nu}_i \sim N(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)
\]
and
\[
\boldsymbol{\mu}_0 = \begin{pmatrix} \omega \\ \delta \end{pmatrix}, \qquad
\boldsymbol{\Sigma}_0 = \begin{pmatrix} \sigma_T^2 + \sigma_C^2 + 2\sigma_{TC} & \sigma_T^2 - \sigma_C^2 \\ \sigma_T^2 - \sigma_C^2 & \sigma_T^2 + \sigma_C^2 - 2\sigma_{TC} \end{pmatrix}.
\]
Here, $\delta$ is the average “treatment effect” and $\omega$ is a general level parameter. Now, it is the bottom right-hand element of $\boldsymbol{\Sigma}_0$ that corresponds to what is usually referred to as the “random effects variance” and this is seen to involve both the variances and the covariances. An alternative representation of $\boldsymbol{\Sigma}_0$ would be
\[
\sigma_T^2 \begin{pmatrix} 1 + \phi^2 + 2\rho\phi & 1 - \phi^2 \\ 1 - \phi^2 & 1 + \phi^2 - 2\rho\phi \end{pmatrix}
\]

where $\phi = \sigma_C / \sigma_T$ is the ratio of standard deviations and $\rho$ is the correlation coefficient. Some alternative formulations implicitly assume that $\phi = 1$, an assumption that HZS relax. In practice, however, we can expect values of $\phi$ close to 1, although the combination of patients-by-treatment interaction and a variation in patient types from trial to trial might produce some difference in variances. (A referee points out to me that if the control is a placebo in practice we might expect $\sigma_T > \sigma_C$ and hence $\phi < 1$.) Suppose we write

\[
\tau^2 = \left[\boldsymbol{\Sigma}_0\right]_{22} = \sigma_T^2\,(1 + \phi^2 - 2\rho\phi)
\]
for the random effects variance. It can be seen that a large random effects variance requires differences in observed outcomes from trial to trial causing $\sigma_T^2$ (or $\sigma_C^2$) to be large and also that the correlation should be at most moderately positive. I have argued elsewhere, and this has implications for any choice of Bayesian prior distributions, that it is also inherently more reasonable to suppose that large values of $\tau^2$ are unlikely unless $\delta$ is large (Senn, 2007b).
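The identity is easy to check numerically. The sketch below (values arbitrary, not taken from any real meta-analysis) computes $\tau^2$ once as the variance of the treatment contrast implied by $\boldsymbol{\Sigma}$ and once from the $(\phi, \rho)$ parameterization.

```python
import numpy as np

# Arbitrary illustrative values
sigma_T, sigma_C, rho = 0.9, 0.8, 0.6
sigma_TC = rho * sigma_T * sigma_C

# Covariance of the per-trial true means (theta_iT, theta_iC)
Sigma = np.array([[sigma_T**2, sigma_TC],
                  [sigma_TC,   sigma_C**2]])

# Transform to (theta_iT + theta_iC, theta_iT - theta_iC)
A = np.array([[1,  1],
              [1, -1]])
Sigma0 = A @ Sigma @ A.T

tau2_direct = Sigma0[1, 1]                       # bottom right-hand element
phi = sigma_C / sigma_T
tau2_formula = sigma_T**2 * (1 + phi**2 - 2 * rho * phi)

print(tau2_direct, tau2_formula)                 # both 0.586 for these values
```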

4 What is Controversial about This?

What is controversial about this, and related approaches to MA, is not that it allows for a treatment effect that varies randomly from trial to trial. As HZS state, it is unrealistic to assume that the treatment effect will be constant and I and many other statisticians would agree with them that a random effects MA is a legitimate and interesting thing to do in many circumstances. I would maintain that a fixed effects MA is also something of interest to do in all circumstances and it might be that HZS would disagree with this but that is another matter.

The consequence is that the approach appears to allow the recovery of inter-trial information. (See the Appendix for an explanation.) This means that the principle of strict concurrent control is abandoned. This raises the possibility of bias. I suspect, for reasons that I will outline below, that in practice the amount of bias, if any, will be low. Nevertheless, I think it has not been sufficiently appreciated by some that this principle is abandoned by some random effect modelling approaches, including that of HZS.

Of course, there are certain purposes for which it may be necessary to allow the main effect of trial to be random. For example, if one wishes to examine and exploit the way in which the treatment effect varies from trial to trial, then the way that the trials themselves vary becomes a key feature of the problem and may need explicit modelling. The one occasion I have been privileged enough to be a co-author of Hans (van Houwelingen and Senn, 1999) provides an application to “baseline risk” and this was subsequently developed in greater detail by members of Hans’s group (Arends et al., 2000).

However, the possibility of bias caused by recovering inter-trial information is at least worth a second thought and I consider it below.

5 Is it a Problem to Recover Inter-trial Information?

HZS state in their article, “We have work in progress in which the same methodology is applied to multicentre multitreatment studies where the treatment effect $\theta$ can vary randomly from centre to centre.” (p. 2282). However, although, as has been pointed out elsewhere, there is a strong homology between a multi-centre trial and an MA of many trials, in that both are hierarchical data structures, a multi-centre trial may have involved randomisation of patients between as well as within centres but an MA will not have involved randomisation between trials (Senn, 2000).

An interesting case study, that it would be nice to analyse in more detail, is that of the TARGET study reported in The Lancet (Farkouh et al., 2004; Schnitzer et al., 2004) and which I also looked at in Pharmaceutical Statistics (Senn, 2008). This was a study comparing lumiracoxib, naproxen and ibuprofen which was conducted, for reasons of convenience in blinding and administering treatments, in two sub-studies, in one of which lumiracoxib was compared with naproxen and in the other of which it was compared with ibuprofen. In terms of baseline characteristics, there is a striking homogeneity of patients between arms within sub-studies but a striking heterogeneity across sub-studies.

One must be careful, however, in assuming that this points to the danger of recovering between-study information. The standard by which heterogeneity is judged is using patients as the index of sample size and this is not appropriate for all purposes (Senn and Harrell, 1997). To explore these data further, it would be necessary to see whether, using not patients but centres-within-substudy as the primary unit of inference, there was still heterogeneity. Data at this level of detail are not available to me but presumably the trial sponsor would be able to provide them and it might be an interesting exercise to have a look again at this trial.


More disturbing, perhaps, is the evidence provided by a recent examination by Julious and Wang (2008) in the Drug Information Journal, in which they found evidence both of a change over time of the effect of placebo on occlusion rates in controlled trials of aspirin after coronary surgery and on response in trials of depression. Since HZS effectively assume that trials are exchangeable, such trends are potentially a problem.

I say potentially a problem because it would take more than this to cause a difficulty for the HZS approach. It would require in addition that the amount of inter-trial information was appreciable. For MAs comparing two treatments only, this does not seem likely unless many of the trials involved unequal randomisation. Even where this was the case it would not lead to a marked bias unless there was a correlation between allocation and other trends.

In short, although I respect the principle of concurrent control and consider that HZS (and other approaches that treat the main effects of trial as random) do not entirely respect this principle, I consider that in practice little harm is likely to be done.

Nevertheless, I think that researchers should at least be aware of what they are embarking on when they undertake such an analysis. Also, by the same token, the question arises: if the dangers of treating trials as random are not great, is this because for most purposes such approaches make little difference, and if so, is much gained? I hope that Hans will accept the challenge, in due course, of answering this.

6 To Sum up Hans Summing up

Hans’s contributions to MA have been both original and challenging. I was tempted to say, thinking of Hans’s article with Theo Stijnen (Stijnen and van Houwelingen, 1990) in this journal in 1990, “some twenty years after he started working in this field we are only just catching up”. However, when I look at the references in that article, I find a doctoral thesis by Hans that is dated 1973 and entitled “On empirical Bayes rules for the continuous exponential family”. It seems that Hans’s adventures in data shrinkage have been going on for 40 years! I suspect, furthermore, that Hans’s contributions will continue and in that context I have some good news and some bad news for Hans on his retirement.

The good news for you, Hans, is that you are required to do less teaching, less writing of grant proposals and above all less administration. The bad news is that we expect you to do more: more mentoring and more research.

I look forward to learning more from Hans at future ISCB meetings not only with great pleasurebut also with a little apprehension.

Acknowledgements  This article was written while on study leave from Glasgow University, visiting ENSAI in Rennes, and I am grateful to both institutions for supporting my research. I am grateful to a referee and the editors for helpful comments.

Conflict of Interests Statement

The authors have declared no conflict of interest.

Appendix: Random Main Effects of Trials and the Recovery of Inter-trial Information

We illustrate this with a simple case in which two treatments, placebo and active, are studied in two trials. This leads to a fourfold table of mean responses with columns (say) representing treatments and indexed $j = 1, 2$ and rows (say) representing trials and indexed $i = 1, 2$. For simplicity, the treatment effect, $\tau$, that is to say the difference between active and placebo, is assumed constant from trial to trial. That being so, we can represent the variances and covariances between the four means as follows.

\[
\begin{array}{ccc}
(\gamma^2 + \psi_{11}^2) & \;\longleftarrow\; \gamma^2 \;\longrightarrow\; & (\gamma^2 + \psi_{12}^2) \\[4pt]
\updownarrow\ \nwarrow\!\searrow & 0 \quad 0 \quad 0 & \swarrow\!\nearrow\ \updownarrow \\[4pt]
(\gamma^2 + \psi_{21}^2) & \;\longleftarrow\; \gamma^2 \;\longrightarrow\; & (\gamma^2 + \psi_{22}^2)
\end{array}
\qquad (4)
\]

Here, the terms in brackets are the variances and the terms between the arrows are the covariances of the means in the fourfold table, with the arrows indicating the positions of the terms for which the covariance applies. The terms of the form $\psi_{ij}^2$ represent the precision with which mean responses have been measured within trial and treatment groups and they in turn would reflect the variance at patient level and the number of patients, so that we might (most simply) have something like
\[
\psi_{ij}^2 = \frac{\sigma_{ij}^2}{n_{ij}},
\]
where $\sigma_{ij}^2$ is a patient-level variance and $n_{ij}$ is the number of patients in trial $i$ under treatment $j$. The term $\gamma^2$ is the variance in the true mean from group to group.

Now to estimate the treatment effect we must use a linear combination of the four cell means with weights of the form
\[
\begin{array}{cc}
w_1 & w_2 \\
-w_1 - 1 & -w_2 + 1.
\end{array}
\qquad (5)
\]

This is because in our two-by-two table the left-hand column means reflect the placebo treatment and the right-hand column means the active treatment. The weights associated with the former must add to $-1$ and the latter to $+1$ if we are to produce an unbiased estimate of the contrast.

We now need to choose $w_1, w_2$ so as to minimise the variance of our estimate. This requires minimisation of the function (6) below, which is the variance of the linear combination obtained by applying the weights in Eq. (5) to the means from our two-by-two table and using the variances and covariances as given by Eq. (4).

\[
\begin{aligned}
f(w_1, w_2) = {} & w_1^2(\gamma^2 + \psi_{11}^2) + w_2^2(\gamma^2 + \psi_{12}^2) + 2 w_1 w_2 \gamma^2 \\
& + (1 + w_1)^2(\gamma^2 + \psi_{21}^2) + (1 - w_2)^2(\gamma^2 + \psi_{22}^2) \\
& - 2(1 + w_1)(1 - w_2)\gamma^2 + \lambda(w_1 + w_2).
\end{aligned}
\qquad (6)
\]

Here, $\lambda$ is a Lagrange multiplier. If we are prepared to regard the main effects of trial as random we have $\lambda = 0$ but, if not, we need the constraint that the sum of weights in any row (the rows correspond to the trials) adds to zero. If we differentiate with respect to $\lambda, w_1, w_2$ and set the derivatives equal to zero, we obtain the system of equations

\[
\begin{aligned}
w_1 + w_2 &= 0 \\
w_1(2\gamma^2 + \psi_{11}^2 + \psi_{21}^2) + 2 w_2 \gamma^2 + \psi_{21}^2 + \lambda/2 &= 0 \\
w_2(2\gamma^2 + \psi_{22}^2 + \psi_{12}^2) + 2 w_1 \gamma^2 - \psi_{22}^2 + \lambda/2 &= 0
\end{aligned}
\qquad (7)
\]

Tedious but elementary algebra then yields the solution

\[
w_1 = \frac{-(\psi_{21}^2 + \psi_{22}^2)}{\psi_{11}^2 + \psi_{12}^2 + \psi_{21}^2 + \psi_{22}^2}, \qquad w_2 = -w_1.
\qquad (8)
\]


This is the fixed effects solution. Note that it forces the weights in Eq. (5) to add to 0 in any row, as may be seen by inspecting Eq. (8). It thus eliminates the main effect of trial. When the effect of trial is random, however, we only use the second and third equations in Eq. (7) and set $\lambda = 0$. This yields the solution

\[
\begin{aligned}
w_1 &= \bigl[-2\gamma^2(\psi_{21}^2 + \psi_{22}^2) - \psi_{21}^2(\psi_{12}^2 + \psi_{22}^2)\bigr]/\kappa \\
w_2 &= \bigl[\,2\gamma^2(\psi_{21}^2 + \psi_{22}^2) + \psi_{22}^2(\psi_{11}^2 + \psi_{21}^2)\bigr]/\kappa \\
\kappa &= 2\gamma^2(\psi_{11}^2 + \psi_{12}^2 + \psi_{21}^2 + \psi_{22}^2) + (\psi_{11}^2 + \psi_{21}^2)(\psi_{12}^2 + \psi_{22}^2).
\end{aligned}
\qquad (9)
\]

Note that as $\gamma^2 \to \infty$ the solution given by Eq. (9) approaches that given by Eq. (8). Thus from one point of view the solution with trial effects as fixed is equivalent to the one with trial effects random but with infinite variance. Similarly, if the trials are balanced, so that $\psi_{11}^2 = \psi_{12}^2$ and $\psi_{21}^2 = \psi_{22}^2$, then Eqs. (8) and (9) reduce to the identical solution
\[
w_1 = \frac{-\psi_{21}^2}{\psi_{11}^2 + \psi_{21}^2}, \qquad w_2 = \frac{\psi_{21}^2}{\psi_{11}^2 + \psi_{21}^2}.
\]
Again we have $w_2 = -w_1$. This is because in the balanced case within-trial contrasts are fully efficient and there is no between-trial information to recover.

However, more generally, Eqs. (8) and (9) are not identical and the scheme of weights given by Eq. (9), since it represents an unconstrained optimisation, will give variances that are lower than that given by Eq. (8). The difference will be most marked for cases where $\gamma^2$ is small and where the imbalance between arms of trials is strong (and where the degree of imbalance varies from trial to trial). Furthermore, unlike the case for Eq. (8), it is not in general the case for Eq. (9) that $w_2 = -w_1$. Hence, the analysis does not proceed on the basis of within-trial contrasts. It thus breaks the principle of concurrent control and it is this that makes it controversial.
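The appendix argument is easy to check numerically. The sketch below (the numbers and names are mine and purely illustrative) evaluates the fixed effects weights of Eq. (8) and, by solving the two unconstrained equations of Eq. (7) with $\lambda = 0$, the weights of Eq. (9); it confirms that the unconstrained solution achieves a variance no larger than the constrained one and that, in general, $w_2 \neq -w_1$.

```python
import numpy as np

# Illustrative within-cell variances psi2[i, j], trials i = 1, 2 (rows) and
# treatments j = placebo, active (columns), plus a between-trial variance gamma2.
psi2 = np.array([[0.04, 0.10],    # trial 1, unbalanced one way
                 [0.09, 0.05]])   # trial 2, unbalanced the other way
gamma2 = 0.02

def weights_and_variance(random_trials):
    """Minimum-variance weights (w1, w2) and the variance of the resulting contrast.

    random_trials=False keeps the constraint w1 + w2 = 0 (fixed trial effects, Eq. 8);
    random_trials=True drops it (random trial effects, Eq. 9), so inter-trial
    information can be recovered."""
    A = 2 * gamma2 + psi2[0, 0] + psi2[1, 0]
    B = 2 * gamma2 + psi2[0, 1] + psi2[1, 1]
    if random_trials:
        # Unconstrained normal equations from Eq. (7) with lambda = 0
        w1, w2 = np.linalg.solve([[A, 2 * gamma2], [2 * gamma2, B]],
                                 [-psi2[1, 0], psi2[1, 1]])
    else:
        # Fixed effects solution, Eq. (8)
        w1 = -(psi2[1, 0] + psi2[1, 1]) / psi2.sum()
        w2 = -w1
    # Variance of the weighted combination, Eq. (6) with lambda = 0
    var = (w1**2 * (gamma2 + psi2[0, 0]) + w2**2 * (gamma2 + psi2[0, 1])
           + 2 * w1 * w2 * gamma2
           + (1 + w1)**2 * (gamma2 + psi2[1, 0])
           + (1 - w2)**2 * (gamma2 + psi2[1, 1])
           - 2 * (1 + w1) * (1 - w2) * gamma2)
    return (w1, w2), var

print(weights_and_variance(False))   # weights sum to zero within each trial
print(weights_and_variance(True))    # w2 != -w1 in general; variance no larger
```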

References

Arends, L. R., Hoes, A. W., Lubsen, J., Grobbee, D. E. and Stijnen, T. (2000). Baseline risk as predictor of treatment benefit: three clinical meta-re-analyses. Statistics in Medicine 19, 3497–3518.

Brown, H. and Prescott, R. (2006). Applied Mixed Models in Medicine (2nd edn). Wiley, Chichester.

Farkouh, M. E., Kirshner, H., Harrington, R. A., Ruland, S., Verheugt, F. W., Schnitzer, T. J., Burmester, G. R., Mysler, E., Hochberg, M. C., Doherty, M., Ehrsam, E., Gitton, X., Krammer, G., Mellein, B., Gimona, A., Matchaba, P., Hawkey, C. J. and Chesebro, J. H. (2004). Comparison of lumiracoxib with naproxen and ibuprofen in the Therapeutic Arthritis Research and Gastrointestinal Event Trial (TARGET), cardiovascular outcomes: randomised controlled trial. Lancet 364, 675–684.

Julious, S. A. and Wang, S. J. (2008). How biased are indirect comparisons, particularly when comparisons are made over time in controlled trials? Drug Information Journal 42, 625–633.

Ludbrook, J. and Dudley, H. (1998). Why permutation tests are superior to t and F tests in biomedical research. American Statistician 52, 127–132.

Mathew, T. (1999). On the equivalence of meta-analysis using literature and using individual patient data. Biometrics 55, 1221–1223.

Nelder, J. A. (1977). A reformulation of linear models. Journal of the Royal Statistical Society A 140, 48–77.

Olkin, I. and Sampson, A. (1998). Comparison of meta-analysis versus analysis of variance of individual patient data. Biometrics 54, 317–322.

Petitti, D. B. (1994). Meta-Analysis, Decision Analysis and Cost-Effectiveness Analysis. Oxford University Press, New York.

Schnitzer, T. J., Burmester, G. R., Mysler, E., Hochberg, M. C., Doherty, M., Ehrsam, E., Gitton, X., Krammer, G., Mellein, B., Matchaba, P., Gimona, A. and Hawkey, C. J. (2004). Comparison of lumiracoxib with naproxen and ibuprofen in the Therapeutic Arthritis Research and Gastrointestinal Event Trial (TARGET), reduction in ulcer complications: randomised controlled trial. Lancet 364, 665–674.

Senn, S. J. (1994). Fisher’s game with the devil. Statistics in Medicine 13, 217–230.

Senn, S. J. (2000). The many modes of meta. Drug Information Journal 34, 535–549.

Senn, S. J. (2004). Added values: controversies concerning randomization and additivity in clinical trials. Statistics in Medicine 23, 3729–3753.

Senn, S. J. (2007a). Statistical Issues in Drug Development (2nd edn). Wiley, Hoboken.

Senn, S. J. (2007b). Trying to be precise about vagueness. Statistics in Medicine 26, 1417–1430.

Senn, S. J. (2008). Lessons from TGN1412 and TARGET: implications for observational studies and meta-analysis. Pharmaceutical Statistics 7, 294–301.

Senn, S. J. and Harrell, F. (1997). On wisdom after the event. Journal of Clinical Epidemiology 50, 749–751.

Stijnen, T. and van Houwelingen, J. C. (1990). Empirical Bayes methods in clinical trials meta-analysis. Biometrical Journal 32, 335–346.

van Houwelingen, H. (1997). The future of biostatistics: expecting the unexpected. Statistics in Medicine 16, 2773–2784.

van Houwelingen, H. C. and Senn, S. J. (1999). Letter to the Editor: Investigating underlying risk as a source of heterogeneity in meta-analysis. Statistics in Medicine 18, 110–115.

van Houwelingen, H. C., Arends, L. R. and Stijnen, T. (2002). Advanced methods in meta-analysis: multivariate approach and meta-regression. Statistics in Medicine 21, 589–624.

van Houwelingen, H. C., Zwinderman, K. H. and Stijnen, T. (1993). A bivariate approach to meta-analysis. Statistics in Medicine 12, 2273–2284.

Yudkin, P. L., Ellison, G. W., Ghezzi, A., Goodkin, D. E., Hughes, R. A., McPherson, K., Mertin, J. and Milanese, C. (1991). Overview of azathioprine treatment in multiple sclerosis. Lancet 338, 1051–1055.
