
PHARMACEUTICAL STATISTICS

Pharmaceut. Statist. 2008; 7: 294–301

Published online 2 April 2008 in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/pst.322

Lessons from TGN1412 and TARGET: implications for observational studies and meta-analysis

Stephen Senn*,†

Department of Statistics, University of Glasgow, Glasgow, UK

Two very different studies are examined: the first, a very large trial in osteoarthritis (the so-called TARGET study) and the second a very small ‘first-in-man’ study of the monoclonal antibody TGN1412. In each trial the unbiased estimate of the treatment effect is not efficient and in consequence the efficient estimate is not unbiased. In the case of the large trial it seems reasonable that unbiased estimation is desirable but in the case of the small trial it leads to absurd conclusions. These two cases are examined in detail and some general lessons for the analysis of clinical trials and observational studies and collections of studies are drawn. Copyright © 2008 John Wiley & Sons, Ltd.

Keywords: veiled trial; bias; variance; mean-square error; concurrent control

1. INTRODUCTION

Two very different studies with which I have been involved seem to carry quite different lessons regarding the value of concurrent control. The first of these, TARGET (Therapeutic Arthritis Research & Gastrointestinal Event Trial), was a very large trial in osteoarthritis and I was a member of the data safety monitoring board. The second was the small trial in healthy volunteers of TGN1412 that was abandoned in March 2006 after all subjects given the active treatment suffered extreme adverse reactions within a short space of time after starting the trial. I was the chairman of a Royal Statistical Society working party on first-in-man studies that tried to draw lessons from this trial [1].

The TARGET study points to the value of concurrent control and the study in TGN1412 to its irrelevance. This apparent contradiction is easily resolved in terms of the well-known statistical phenomenon of bias–variance trade-off. Nevertheless, it seems to me that these studies point to some interesting lessons and that it is valuable to consider these and contrast the studies. In particular, consideration of these trials carries some implications for observational studies and also meta-analysis. To draw these lessons is the purpose of this paper.

*Correspondence to: Stephen Senn, Department of Statistics, University of Glasgow, 15 University Gardens, Glasgow G12 8QQ, UK.

†E-mail: [email protected]

The plan of the paper is as follows. In Section 2 I consider the TARGET study and in Section 3 the trial in TGN1412. This is then followed by a brief discussion of some relevant points in Section 4. Finally, some implications for causal inference in observational studies are considered in a brief concluding fifth section.

2. THE TARGET STUDY

This was at the time probably the largest controlled study ever run in osteoarthritis. More than 18 000 patients were entered into the study. Although the condition being treated was osteoarthritis, it was not the primary purpose of the trial to compare the treatments as regards their effects on this condition; it was taken as given that all three were effective in osteoarthritis. Instead, the treatments were compared as regards their gastrointestinal (GI) [2] and cardiovascular (CV) [3] side-effects. Lumiracoxib is a so-called COX-II inhibitor. It was expected to have fewer GI side-effects than naproxen and ibuprofen, which were established non-steroidal anti-inflammatory drugs. On the other hand, it was considered possible that COX-II inhibitors, including lumiracoxib, might have more CV side-effects.

The total patient numbers on the trial were 9156 on lumiracoxib, 4415 on ibuprofen and 4754 on naproxen. This appears to be (approximately) a 2:1:1 randomization. However, an interesting feature of this trial is that patients were not randomized to one of the three treatments. Any given patient was only randomized to one of two, since the trial was run in the form of two sub-studies, the first of which compared lumiracoxib with ibuprofen in a 1:1 randomization and the second of which compared lumiracoxib with naproxen in a 1:1 randomization. The reasons for this were to do with administrative convenience but also to make blinding easier. It is a common feature of clinical trials that blind comparison of active treatments can only be achieved by using the so-called double dummy technique. Suppose that active treatment A is compared with active treatment B in a trial. Patients are allocated to one of the two groups. They either receive active A and placebo to B or active B and placebo to A. To effectively blind administration where three treatments were being compared would require a treble dummy technique and this increases the treatment burden on patients and reduces compliance. The net consequence is that as regards comparison of all three treatments, TARGET is what has been referred to as a veiled study [4]: although patients do not know which treatment they are receiving, they do know that there is a possible treatment they are not receiving.

In fact, it is impossible to design such a study not to be veiled unless a treble dummy technique is used. However, there is a further feature of the TARGET study that is important, and which will be examined below, namely that patients were not randomized to the two sub-studies. As a referee has pointed out to me, such random allocation would have been possible and this would have had the advantage of making the three arms comparable in terms of demographic variables (or at least only randomly different). The disadvantage would have been that two different sorts of treatment pack, differing by sub-study, would have had to have been handled in every centre. In fact an intermediate design would have been possible. Centres could have been randomized to sub-study. This would have been a hierarchical design with two levels of randomization. There would then have been some between-centre information recoverable in a random effects analysis. Whatever the pros and cons of these three alternatives are, that is to say randomizing within centres to sub-study, randomizing centres to sub-study or allocating centres not at random to a given sub-study, the fact is that the last alternative was chosen.

In formal statistical terms, the TARGET study is thus of a randomized incomplete block design (but only randomized within blocks). If one is concerned about bias, either due to the trial’s veiled nature or due to non-random allocation, one should not make a direct comparison, for example, of ibuprofen and naproxen. In fact, these two treatments could only be compared in terms of a double contrast: within each sub-study the difference to lumiracoxib would have to be compared. These two differences would then have to be compared with each other. This double unbiased contrast would have a variance that was twice that of the single (potentially biased) contrast of naproxen and ibuprofen. A similar consideration applies to the comparison of lumiracoxib with either of the other two treatments, although here the effect is less dramatic. An unbiased contrast of lumiracoxib to naproxen would only use patients on lumiracoxib from the naproxen sub-study. A (potentially biased) contrast with lower variance would use all the patients. Suppose that in total there are $4n$ patients with $2n$ on lumiracoxib ($n$ in each sub-study) and $n$ each on naproxen and ibuprofen. The variance of the biased contrast would be approximately proportional to $1/(2n) + 1/n = 3/(2n)$ and that of the unbiased contrast to $1/n + 1/n = 2/n = 4/(2n)$. The ratio of the latter to the former is $4/3$. Therefore, insisting on an unbiased contrast leads to a variance inflation of $1/3$.
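The variance arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: it assumes independent arm means with a common unit variance, and the function name `contrast_variance` and the arm size `n = 1000` are hypothetical choices, not taken from the TARGET analysis.

```python
from fractions import Fraction


def contrast_variance(*arm_sizes):
    """Variance, in units of sigma^2, of a +/-1 contrast of independent arm means.

    Each arm mean contributes sigma^2 / m, where m is the arm size.
    """
    return sum(Fraction(1, m) for m in arm_sizes)


n = 1000  # hypothetical arm size; the ratios below do not depend on it

# Lumiracoxib versus naproxen.
biased = contrast_variance(2 * n, n)   # all 2n lumiracoxib patients: 3/(2n)
unbiased = contrast_variance(n, n)     # same sub-study only: 2/n
ratio = unbiased / biased
print(ratio)        # 4/3
print(ratio - 1)    # 1/3, the variance inflation

# Ibuprofen versus naproxen.
direct = contrast_variance(n, n)        # single (potentially biased) contrast
double = contrast_variance(n, n, n, n)  # difference of two within-sub-study differences
print(double / direct)  # 2
```

Exact fractions are used so that the ratios 4/3 and 2 appear without rounding error.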

This immediately raises the issue (relevant to observational studies) as to whether these more efficient contrasts are in actual fact biased. The TARGET study is an interesting one in this respect. First, the conditions under which it was run are, by the standards of observational studies, extremely favourable to eliminating bias. Except for the choice of comparator, the protocols for the two sub-studies are identical, and the two sub-studies were run simultaneously and supervised by the same group of scientists. Second, the trial is of high quality with very many covariate values being reported.

Table I, however, shows the problem for any optimistic view of the matter. The distributions of four selected binary demographic covariates relevant to CV outcomes are shown. Within each sub-study it can be seen that the distribution of covariates is extremely similar between the two arms. However, across sub-studies the pattern is quite different.

Although I do not consider that significance tests on baseline variables are generally useful for comparing treatment arms in randomized clinical trials, unless they are carried out as a quality control check on the randomization procedure itself [5, 6], if all treatment arms are considered together this is not a randomized trial. In that connection it becomes interesting to consider which of the sorts of differences seen in these demographic characteristics could have occurred by chance. Table II gives the results of applying three logistic regression models to each of the four demographic variables listed in Table I. The table lists the deviance, fitting each of three possible factors: sub-study, treatment given that sub-study is already in the model and treatment without having sub-study in the model. These should have (approximately) a chi-square distribution under the null hypothesis that there is no systematic difference between groups associated with the classification in question for the variable in question. For the factor sub-study the degrees of freedom should be 1 and for the other two they should be 2. Also given are the P-values associated with the various significance tests.

Table I. Distribution of selected demographic characteristics in the TARGET study (based on [3]).

                               Sub-study 1                    Sub-study 2
Demographic characteristic     Lumiracoxib    Ibuprofen       Lumiracoxib    Naproxen
                               (n = 4376)     (n = 4397)      (n = 4741)     (n = 4730)
Use of low-dose aspirin        975 (22%)      966 (22%)       1195 (25%)     1193 (25%)
History of vascular disease    393 (9%)       340 (8%)        588 (12%)      559 (12%)
Cerebrovascular disease        69 (2%)        65 (1%)         108 (2%)       107 (2%)
Dyslipidaemias                 1030 (24%)     1025 (23%)      799 (17%)      809 (17%)

For every demographic variable, it will be seen that there is a significant ‘effect’ of sub-study, a non-significant ‘effect’ of treatment if sub-study is in the model and a significant ‘effect’ of treatment if sub-study is not in the model.

In other words, the process by which patients were allocated to treatment is such that sub-study must be in the model in order to permit a valid comparison of the outcomes of this trial between treatments, since there are already important differences between treatment groups at baseline.
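As a rough check on the tail probabilities reported in Table II, the deviances can be referred to chi-square distributions with 1 degree of freedom (sub-study) or 2 degrees of freedom (the two treatment factors). The sketch below is not the original analysis: it uses only closed-form chi-square tail formulas for 1 and 2 degrees of freedom from the Python standard library, the helper name `chi2_sf` is hypothetical, and small discrepancies from the printed P-values are to be expected because the deviances are themselves rounded.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, for df = 1 or 2 only."""
    if df == 1:
        # A chi-square(1) variable is the square of a standard Normal.
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # A chi-square(2) variable is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")


# (effect, variable): (deviance, degrees of freedom, P-value printed in Table II)
TABLE_II = {
    ("Substudy", "Aspirin"): (23.57, 1, 0.0000),
    ("Substudy", "Vascular history"): (70.14, 1, 0.00000),
    ("Substudy", "Cerebrovascular"): (13.538, 1, 0.0002),
    ("Substudy", "Dyslipidaemias"): (117.98, 1, 0.0000),
    ("Treatment given substudy", "Aspirin"): (0.13, 2, 0.9365),
    ("Treatment given substudy", "Vascular history"): (5.23, 2, 0.07332),
    ("Treatment given substudy", "Cerebrovascular"): (0.144, 2, 0.9304),
    ("Treatment given substudy", "Dyslipidaemias"): (0.17, 2, 0.9194),
    ("Treatment", "Aspirin"): (13.40, 2, 0.0012),
    ("Treatment", "Vascular history"): (47.41, 2, 0.00000),
    ("Treatment", "Cerebrovascular"): (7.745, 2, 0.0208),
    ("Treatment", "Dyslipidaemias"): (54.72, 2, 0.0000),
}

for (effect, variable), (deviance, df, printed) in TABLE_II.items():
    recomputed = chi2_sf(deviance, df)
    print(f"{effect:26s} {variable:18s} {recomputed:.4f} (printed {printed})")
```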

Note that this is not the same as saying that one is entitled to carry out a logistic regression of one of the outcome variables (for example CV events) provided only that sub-study is fitted in addition to treatment. This is a necessary condition but not a sufficient one. If, as must surely be the case, these demographic variables have prognostic value, then given that they have been observed, they must be fitted. At least, that is my point of view [7, 8]. It is rather that even if they are fitted, then one is still obliged to fit sub-study. This, of course, immediately raises a dilemma for an observational study, since this, unlike the TARGET study, would have no element of randomization. I shall return to this point once I have considered TGN1412.

3. TGN1412

On the morning of 13 March 2006 a first-in-man study of TeGenero’s monoclonal antibody TGN1412 was carried out by the contract research organization Parexel at their research facility adjacent to Northwick Park Hospital. The trial plan envisaged a number of dose–cohorts. The first cohort would consist of eight healthy volunteers allocated in the ratio 3:1 to the lowest dose of TGN1412 or placebo. In fact, there were two blocks of four subjects so that in each block, three would be given treatment and one placebo. (This feature will be ignored in what follows.) Dosing was at 10 min intervals. Within a few hours of dosing (at most) all healthy volunteers allocated to TGN1412 had begun to exhibit symptoms of a severe ‘cytokine storm’ and by midnight all had been admitted to intensive care at Northwick Park. As mentioned in the Introduction, this trial and its implications for future conduct of first-in-man studies were the subject of a report of a Royal Statistical Society working party [9].

Note that had the trial proceeded to conclusion, which would have involved further cohorts of eight volunteers allocated six to TGN1412 (at higher doses) and two to placebo, then it would have had the structure of a veiled trial [4], because different doses are allocated to different cohorts. This would have raised an inferential issue, not addressed in the protocol, as to whether or not the placebo subjects from different cohorts should be pooled in order to judge the effect of a given dose, or whether that dose should only be compared with the accompanying placebo. Even if the healthy volunteers are allocated to different cohorts at random and not just randomly to placebo or TGN1412 within cohort, they will be aware as to which cohort they have been assigned. Thus, in order to eliminate fully the sort of bias that blinding is designed to eliminate, it would have been necessary to compare subjects on a particular dose only to placebo subjects in the same cohort. As it turned out, of course, the trial was stopped and this consideration is not relevant. It would have had some relevance had the trial continued to conclusion.

Table II. Results of carrying out significance tests on the baseline demographic variables.

Effect                      Aspirin    Vascular history    Cerebrovascular    Dyslipidaemias

Deviances for the four demographic variables
Substudy                    23.57      70.14               13.538             117.98
Treatment given substudy    0.13       5.23                0.144              0.17
Treatment                   13.40      47.41               7.745              54.72

Approximate chi-square probabilities for the four demographic variables
Substudy                    0.0000     0.00000             0.0002             0.0000
Treatment given substudy    0.9365     0.07332             0.9304             0.9194
Treatment                   0.0012     0.00000             0.0208             0.0000

The results from the trial can be summarized as in Table III. This has the very familiar form of a 2 × 2 contingency table, with all the observations on the main diagonal. If this is analysed using Fisher’s exact test the resulting one-sided P-value is 0.0357. Since the conventional standard for significance in drug development is 2.5% one sided, the result is not even significant by conventional standards. Of course, Fisher’s exact test is not uncontroversial and although I do not accept most of the common reasons for rejecting it as being a valid approach to analysing contingency tables, it is my opinion that where one of the margins has not been fixed in advance and all the results lie on the diagonal, it does discard some information. After all, the fact that the side-effects split 6 to 2 is suggestive even without knowing how they split. A possible alternative would be Barnard’s test [10], which yields a P-value of 0.0111, which is significant by conventional standards.
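The Fisher P-value quoted above follows directly from the hypergeometric distribution: with both margins treated as fixed, only one of the 28 ways of choosing which six of the eight subjects react puts all six reactions in the TGN1412 group. A minimal sketch of that calculation, using only the Python standard library (the function name `fisher_one_sided` is a hypothetical helper, not a library routine):

```python
from math import comb


def fisher_one_sided(events_trt, n_trt, events_ctl, n_ctl):
    """One-sided Fisher exact P-value: probability, with both margins fixed,
    of at least `events_trt` events in the treated group."""
    total_events = events_trt + events_ctl
    n_total = n_trt + n_ctl
    upper = min(n_trt, total_events)
    favourable = sum(
        comb(n_trt, k) * comb(n_ctl, total_events - k)
        for k in range(events_trt, upper + 1)
    )
    return favourable / comb(n_total, total_events)


# Table III: 6 of 6 on TGN1412 and 0 of 2 on placebo had a cytokine storm.
p_fisher = fisher_one_sided(events_trt=6, n_trt=6, events_ctl=0, n_ctl=2)
print(round(p_fisher, 4))  # 0.0357, that is 1/28
```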

However, both of these P-values are clearly irrelevant, and that is the point of the example. They do not begin to do justice to the strength of evidence from the case, even accepting the limitations that P-values have in this respect. The reason must be that other information is used in coming to a decision: this information is of two sorts. First, the background risk of a cytokine storm is so low (it sometimes accompanies severe influenza) that this makes the occurrence of one, let alone six, already extremely indicative of a causal relationship. Figure 1 shows a plot of the P-value that would result if one were to test the null hypothesis that the probability of a cytokine storm was identical for placebo and TGN1412, as a function of some posited probability, $\theta$, for this event. It can be seen that for low $\theta$, the P-value is extremely impressive. The P-value corresponding to Barnard’s test is that which applies when $\theta = 6/8$, the observed proportion of events. Note also that since the most extreme result has occurred, the P-value is the probability of an actual result given the null hypothesis, and since the best supported alternative suggests that this probability would be 1, the P-value is also a likelihood ratio [11]. Secondly, the temporal coincidence of side-effects within a short time of dosing is very suggestive [12]. This effect is not captured by Figure 1, which does not use this information. Timing of side-effects is a matter that is increasingly exercising the attention of statisticians [13].
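The curve in Figure 1 can be sketched numerically. Because the observed table is the most extreme one possible, the P-value for a posited common probability of reaction is simply the probability that all six subjects on TGN1412 react and neither placebo subject does. The code below is an illustration under that reading of the figure; the function name `p_value_given_theta` and the grid of probabilities are assumptions made for the sketch.

```python
def p_value_given_theta(theta, n_trt=6, n_ctl=2):
    """Probability, under a common reaction probability theta, that every
    treated subject reacts and no placebo subject does."""
    return theta ** n_trt * (1.0 - theta) ** n_ctl


# The curve peaks at the observed proportion of events, 6/8, which gives
# the P-value attributed to Barnard's test in the text.
p_barnard = p_value_given_theta(6 / 8)
print(round(p_barnard, 4))  # 0.0111

# For a rare event the same data are far more striking.
for theta in (0.5, 0.1, 0.01, 0.001):
    print(theta, p_value_given_theta(theta))

# Locate the maximum on a grid of posited probabilities (step 0.001).
grid = [i / 1000 for i in range(1001)]
theta_at_max = max(grid, key=p_value_given_theta)
print(theta_at_max)
```

The very small values at low posited probabilities are what the text means by the P-value being "extremely impressive" when the background risk is low.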

For further discussion of this study see the RSS working party report [1] and the commentary in this journal by Julious [14].

4. DISCUSSION

Table III. Summary of results from the trial of TGN1412.

                 Cytokine storm
Treatment        No      Yes     Total
Placebo          2       0       2
TGN1412          0       6       6
Total            2       6       8

Figure 1. P-value for testing the null hypothesis that there is no difference in the risk of side-effect for TGN1412 or placebo as a function of the assumed background risk of a reaction. [Plot not reproduced. Horizontal axis: common probability of reaction, from 0 to 1. Vertical axis: P-value or likelihood ratio. The point corresponding to Barnard’s test, at a probability of 6/8, is marked.]

These two trials seem to give very different lessons. In the case of the TARGET study it seems to be that going beyond any form of inference sanctioned by the way in which the patients were randomized to treatment is very risky. In the trial of TGN1412, the opposite seems to be the case. Only using the sort of inference to which the randomization points seems quite inappropriate. If one compares the two studies, then the features given in Table IV are notable.

In fact our difference in attitude to the two trials, in the one case believing that a very cautious and formal approach is appropriate (TARGET) and in the other that such an approach completely fails to summarize the implications of the trial (TGN1412), is easily explained in terms of that well-known statistical phenomenon the bias–variance trade-off. In a frequentist framework, the overall reliability of an estimator can be expressed in terms of mean-square error, MSE: the sum of the square of its bias and its variance. We thus have the formula

$$\mathrm{MSE} = \beta^2 + \gamma^2$$

where $\beta$ is the bias and $\gamma$ is the standard error. Usually, we have that $\gamma^2$ is (roughly) inversely proportional to the number of subjects, $N$, in the study. Hence, the larger the study, the greater the relative importance of the square of the bias to the MSE.

This position is reflected in Figure 2, which shows, in very general terms, the MSE for two estimators. One is unbiased but with a larger variance and the other is biased but with a smaller variance. For small sample size, the biased estimator has lower MSE. For larger sample size, the unbiased estimator has the lower MSE.
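The crossover described here can be made concrete with a small numerical sketch. It assumes, purely for illustration, that each estimator has variance equal to a constant divided by the sample size; the bias of 0.05 and the variance constants 2.0 and 1.5 (echoing the 4:3 variance ratio of Section 2) are hypothetical values, as are the function names.

```python
def mse(bias, unit_variance, n):
    """Mean-square error: squared bias plus a variance of unit_variance / n."""
    return bias ** 2 + unit_variance / n


def crossover_n(bias, var_unbiased, var_biased):
    """Sample size at which the two MSE curves cross, found by solving
    var_unbiased / n = bias**2 + var_biased / n for n."""
    return (var_unbiased - var_biased) / bias ** 2


BIAS = 0.05           # hypothetical bias of the more efficient estimator
VAR_UNBIASED = 2.0    # hypothetical variance constant, unbiased estimator
VAR_BIASED = 1.5      # hypothetical variance constant, biased estimator

n_star = crossover_n(BIAS, VAR_UNBIASED, VAR_BIASED)
print(n_star)  # approximately 200 for these illustrative values

for n in (10, 50, 200, 1000, 100000):
    unbiased = mse(0.0, VAR_UNBIASED, n)
    biased = mse(BIAS, VAR_BIASED, n)
    better = "biased" if biased < unbiased else "unbiased"
    print(n, round(unbiased, 5), round(biased, 5), better)
```

Below the crossover the biased estimator has the smaller MSE; above it the unbiased estimator does, and the MSE of the biased estimator levels off at the squared bias.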

Table IV. Comparison of two studies as regards various features.

Design
    TARGET: Partially randomized and hence veiled
    TGN1412: Partially randomized and hence veiled

Number of subjects
    TARGET: Many
    TGN1412: Few

Background information
    TARGET: Considerable from one point of view but not precise enough as regards the effect of the control treatments in the populations studied to provide much additional usable information
    TGN1412: None as regards TGN1412. Very much as regards the probability of severe adverse reactions in subjects given no treatment. Information on timing valuable and not used by conventional analyses

Magnitude of effect
    TARGET: Sought for effect was small. Observed effects (not discussed in this note) were also small
    TGN1412: Sought for effect unclear but observed effect huge

Potential for bias
    TARGET: Small but large relative to the sought-for treatment effect
    TGN1412: Possibly large compared with anticipated effects but small compared with observed effects

Variance of conventional treatment estimates
    TARGET: Small
    TGN1412: Large

Figure 2. Mean-square error as a function of sample size for an unbiased and a biased estimator when the bias is $\beta$. [Plot not reproduced. Horizontal axis: sample size. Vertical axis: mean-square error. Two curves, labelled ‘Biased’ and ‘Unbiased’; the level $\beta^2$ is marked on the vertical axis.]

Asymptotically, the MSE for the biased estimator approaches $\beta^2$ whereas for the unbiased estimator it is zero. In terms of our examples, TGN1412 belongs towards the very left of the diagram and TARGET towards the very right. The common approach of requiring unbiased estimators and then choosing that with minimum variance among that class only makes sense if the contribution of bias to MSE would be large. This is the case with TARGET but not at all the case with TGN1412.

In the case of TGN1412, the variance of a formal controlled estimator of risk is so large that uncontrolled inferences, despite their potential biases, are valuable.

As a referee pointed out to me, however, the real dilemma arises because, although we can estimate $\gamma$, the bias, $\beta$, is inherently unknowable. The most that we can do is place plausible bounds on it, or in Bayesian terms, a prior distribution for it. Here the two examples I have taken suggest that things are easier than they may be in practice. For a large study we clearly want unbiased estimators. For a small study we may want efficient ones. However, for studies of intermediate size it may not be at all obvious what is the best approach.

There are some lessons here for meta-analysis. There has been much discussion of the appropriate way to take account of the quality of studies. It has been proposed that they might be given lower weight. However, what consideration of MSE shows is that, other things being equal, the weights given to poor quality studies should vary with the amount of information available. No simple rule such as, for example, ‘give such studies x% of the weight they would have if of good quality’ can be applied, since the value of x would decline the more good quality studies there were until eventually, given enough studies of adequate quality, one would pay no attention whatsoever to those of poor quality.
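One way to see why no fixed percentage can work is a deliberately simplified two-source model, which is an assumption of this sketch and not a method proposed in the paper. Suppose the good-quality evidence gives an unbiased estimate with variance `v_good`, the poor-quality evidence gives an independent estimate with variance `v_poor` and a bias `bias` that is treated as known, and the two are combined as a weighted average with weight w on the poor-quality estimate. The MSE of the combination is (1 - w)^2 v_good + w^2 (v_poor + bias^2), which is minimized at w = v_good / (v_good + v_poor + bias^2). All names and numerical values below are hypothetical.

```python
def optimal_weight_poor(v_good, v_poor, bias):
    """MSE-minimizing weight on the biased (poor-quality) estimate."""
    return v_good / (v_good + v_poor + bias ** 2)


def weight_if_good(v_good, v_poor):
    """Inverse-variance weight the same estimate would get with no bias."""
    return v_good / (v_good + v_poor)


V_POOR = 0.04   # hypothetical variance of the poor-quality evidence
BIAS = 0.3      # hypothetical bias of the poor-quality evidence

# As good-quality information accumulates, v_good shrinks.
results = []
for v_good in (1.0, 0.1, 0.01, 0.001):
    w = optimal_weight_poor(v_good, V_POOR, BIAS)
    x = 100.0 * w / weight_if_good(v_good, V_POOR)
    results.append((v_good, w, x))
    print(f"v_good={v_good:<6} weight={w:.4f} x={x:.1f}%")
```

In this sketch both the weight and the percentage x fall as `v_good` shrinks, so no single value of x is right for all amounts of good-quality information. In practice the bias is unknown, which is the dilemma noted above.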

5. CONCLUSIONS

In conclusion, these are the (very obvious) lessons to which these studies point.

(a) Confining inferences to those permitted by formal controlled analyses of controlled experiments is far too limiting. Such a restriction cannot possibly accommodate all of the needs of drug development. The case of TGN1412 underlines this point dramatically.

(b) Nevertheless, where effects being studied are small but precise inferences are needed, large precisely controlled experiments can play a vital role.

(c) However, practical realities, resource constraints and the fact that some background information is always available mean that for many questions the information from uncontrolled studies can be very valuable.

(d) On the other hand, such uncontrolled studies will hit a limit of overall precision as they increase in size. Eventually, as sample size increases, bias will come to be the dominating term in MSE and this bias will be incapable of further reduction simply by increasing the size of the study.

(e) To the extent that blinding of subjects and experimenters is considered crucial in any controlled study, the randomization of the study plays a crucial guiding role in analysis. The case of TARGET shows this clearly.

(f) It will often be necessary to model the effect of confounding variables in uncontrolled studies. This does not mean that they may be ignored in controlled studies.

Point (d) has particular relevance for meta-analysis. There has been considerable discussion of weighting schemes for quality in meta-analysis. A key implication is that the relative weight given to studies of lower quality should vary according to the total amount of information available [15]. A further implication is that one may have to be cautious with approaches to meta-analysis, such as that of van Houwelingen et al. [16], that treat the main effect of the trial as random. Such approaches permit the recovery of inter-trial information [17]. Although such information will usually be very little, this may not be the case if trials are unbalanced. The situation is then somewhat analogous to an analysis of the TARGET study without including sub-study as a factor. Of course, for certain purposes, treating the trial effect as random may be appropriate [18, 19].

ACKNOWLEDGEMENTS

I thank two referees for their helpful comments.

REFERENCES

1. Working Party on Statistical Issues in First-in-Man Studies. Statistical issues in first-in-man studies. Journal of the Royal Statistical Society, Series A 2007; 170(3):517–579.

2. Schnitzer TJ, Burmester GR, Mysler E, Hochberg MC, Doherty M, Ehrsam E et al. Comparison of lumiracoxib with naproxen and ibuprofen in the Therapeutic Arthritis Research and Gastrointestinal Event Trial (TARGET), reduction in ulcer complications: randomised controlled trial. The Lancet 2004; 364(9435):665–674.

3. Farkouh ME, Kirshner H, Harrington RA, Ruland S, Verheugt FW, Schnitzer TJ et al. Comparison of lumiracoxib with naproxen and ibuprofen in the Therapeutic Arthritis Research and Gastrointestinal Event Trial (TARGET), cardiovascular outcomes: randomised controlled trial. The Lancet 2004; 364(9435):675–684.

4. Senn SJ. A personal view of some controversies in allocating treatment to patients in clinical trials [see comments]. Statistics in Medicine 1995; 14(24):2661–2674.

5. Senn SJ. Covariate imbalance and random allocation in clinical trials [see comments]. Statistics in Medicine 1989; 8(4):467–475.

6. Senn SJ. Testing for baseline balance in clinical trials. Statistics in Medicine 1994; 13(17):1715–1726.

7. Senn SJ. Added values: controversies concerning randomization and additivity in clinical trials. Statistics in Medicine 2004; 23(24):3729–3753.

8. Senn SJ. Baseline balance and valid statistical analyses: common misunderstandings. Applied Clinical Trials 2005; 14(3):24–27.

9. Working Party on Statistical Issues in First-in-Man Studies. Report of the Working Party on Statistical Issues in First-in-Man Studies. Royal Statistical Society: London, 2007.

10. Barnard GA. A new test for 2 × 2 tables. Nature 1945; 156:177.

11. Senn SJ. Dicing with death. Cambridge University Press: Cambridge, 2003.

12. Senn SJ. Comment on Farrington and Whitaker: semiparametric analysis of case series data. Journal of the Royal Statistical Society, Series C (Applied Statistics) 2006; 55:581–583.

13. Farrington CP, Whitaker HJ. Semiparametric analysis of case series data (with discussion). Journal of the Royal Statistical Society, Series C (Applied Statistics) 2006; 55(5):1–28.

14. Julious S. A personal perspective on the Royal Statistical Society report of the working party on statistical issues in first-in-man studies. Pharmaceutical Statistics 2007; 6(2):75–78.

15. Detsky AS, Naylor CD, O’Rourke K, McGeer AJ, L’Abbe KA. Incorporating variations in the quality of individual randomized trials into meta-analysis. Journal of Clinical Epidemiology 1992; 45(3):255–265.

16. Van Houwelingen HC, Zwinderman KH, Stijnen T. A bivariate approach to meta-analysis. Statistics in Medicine 1993; 12(24):2273–2284.

17. Senn SJ. The many modes of meta. Drug Information Journal 2000; 34:535–549.

18. van Houwelingen H, Senn S. Investigating underlying risk as a source of heterogeneity in meta-analysis [letter; comment]. Statistics in Medicine 1999; 18(1):110–115.

19. van Houwelingen HC, Arends LR, Stijnen T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Statistics in Medicine 2002; 21(4):589–624.
