
Comment

Stephen Senn*

Department of Statistics, University of Glasgow, 15 University Gardens, Glasgow, G12 8QQ, United Kingdom

* Corresponding author: e-mail: [email protected], Phone: +44 (0)141 330 5141, Fax: +44 (0)141 330 4814

Biometrical Journal 47 (2005) 2, 133–135, DOI: 10.1002/bimj.200510109

Those of us who have, over the years, been following Dr. Berger’s many, varied and excellent contributions to the literature on clinical trials have come to expect papers of great originality and interest. This latest example (Berger, 2005) does not disappoint. In it he describes yet another potential headache for the clinical trials community, that of third-order bias. In so doing he exhibits a degree of cynicism and scepticism regarding the possible conduct of clinical trials which I, for one, find very congenial. After all, it is the hallmark of a statistician to disbelieve almost anything and everything. Dr. Berger has such scepticism in spades and shows us yet another way in which trials that are designed to be randomised can be subverted by recruiting physicians so that they are not. One might refer to such trials as randomised by intention but not execution.

That trials designed to be randomised can have their allocation subverted is clearly illustrated by the case of the CAPPP trial (Hansson et al., 1999). This open trial in hypertension in nearly 11,000 patients used allocation by sealed envelope. There was a difference in mean blood pressure at baseline which was highly significant. (It was described as p < 0.00001 by the authors but this does not begin to do it justice. The t-statistics are 5.8 and 8.9 for systolic and diastolic blood pressure respectively.) The authors and others (Peto, 1999) attributed this to the use of envelopes but perhaps Dr. Berger’s third-order or some other order of bias was involved. I myself liked to speculate that some gremlin may have got into a minimisation procedure and inverted it to give us a maximisation procedure instead, but alas, must abandon that theory. Apparently a modified form of randomised block design was used, with blocks of size 36 constrained so that the difference between treatment numbers on the two arms never exceeded three at any intermediate point in the block.
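To make the design concrete: the CAPPP report does not, as far as I know, spell out the exact algorithm, but a constrained block of this kind could be generated by simple rejection sampling, as in the following minimal sketch (the function name and the rejection-sampling approach are illustrative assumptions of mine, not the trial’s actual procedure).

```python
import random

def constrained_block(block_size=36, max_imbalance=3, rng=random):
    """Draw one block of A/B assignments, balanced overall, whose running
    difference |#A - #B| never exceeds max_imbalance at any point.
    Rejection sampling: shuffle a balanced block and retry until valid."""
    half = block_size // 2
    while True:
        block = ["A"] * half + ["B"] * half
        rng.shuffle(block)
        imbalance, valid = 0, True
        for arm in block:
            imbalance += 1 if arm == "A" else -1
            if abs(imbalance) > max_imbalance:
                valid = False
                break
        if valid:
            return block

print("".join(constrained_block()))
```

Note, in passing, that the very constraint aids prediction: whenever the running imbalance reaches three, the next allocation is known with certainty to any unblinded observer keeping count.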

Nor should we believe that all trialists are saints. I certainly came across some sinners during the time I worked in the pharmaceutical industry. (Lest I be misunderstood, I should point out that the sinners in question were not themselves working for the industry.) I was, however, more worried about the invention of data than the subversion of allocation. (For some reason the consultant who treated a patient two weeks after her death, according to the independent testimony of her general practitioner, sticks in the mind.) Perhaps my priorities were wrong.

Dr. Berger refers disapprovingly to a ‘denial of the form of the various forms of residual selection bias’ but perhaps there is also some denial in believing that one ‘can compensate for selection bias’. No compensation of any form has rescued or could rescue the CAPPP study from its critics, who regard it as only fit for the dustbin. Any adjustment strategy must assume conditional unbiasedness given adjustment, and any testing strategy must assume that the trialist is not running the same test himself. So no strategy of testing is proof against ruthless criminals and hence against determined scepticism on the part of the reader. I gave an example some years ago of how a Machiavellian trialist using the “Devil’s algorithm” could try and evade detection while stacking the cards in his favour (Senn, 1994b). Of course, this was when faced with a relatively simple test. More elaborate strategies would be necessary for more elaborate tests.
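The mechanism such a trialist exploits is easily demonstrated. What follows is not the Devil’s algorithm itself but a hypothetical sketch of the general idea, assuming an unmasked trial with permuted blocks of size two, in which the second allocation of every block is perfectly predictable and the recruiter slips a healthier patient into the queue whenever the favoured arm is certain to come next.

```python
import random
import statistics

def baseline_imbalance(n_blocks=10_000, rng=random):
    """Simulate an unmasked trial with permuted blocks of size two.
    The second slot of each block is perfectly predictable; a scheming
    recruiter enrols the healthier of two waiting patients (lower score)
    whenever the favoured arm 'A' is certain to be allocated next."""
    scores = {"A": [], "B": []}
    for _ in range(n_blocks):
        block = ["A", "B"] if rng.random() < 0.5 else ["B", "A"]
        for position, arm in enumerate(block):
            if position == 1 and arm == "A":   # next allocation is known
                score = min(rng.gauss(0, 1), rng.gauss(0, 1))
            else:
                score = rng.gauss(0, 1)
            scores[arm].append(score)
    return statistics.mean(scores["A"]) - statistics.mean(scores["B"])

print(baseline_imbalance())   # about -0.28: arm A healthier at baseline
```

Baseline imbalance of this kind, correlated with the predictability of the allocation, is exactly what a Berger and Exner type test looks for.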

On the other hand, knowing that such tests might be carried out might deter some and as such they could be useful. Perhaps, for example, quality control departments of pharmaceutical companies could add them to the range of procedures used. All tests, however, carry with them a risk of false positives, so they are not free, and in my view there is no point in carrying out such tests unless you are prepared to follow up, investigate and if necessary prosecute. As far as I am aware only one investigator in the UK has ever been found guilty on the basis of statistical evidence alone of fraudulently manipulating patient allocation (Anonymous, 2003; Dyer, 2003; Farthing, 2004). I raise this not to claim that fraud is rare but that its nemesis may involve more than statistics.

I am also not convinced that we should always prepare for the worst case. To give another example affecting trial quality, I am largely persuaded by the arguments of Day et al. (1998) that double data entry makes no useful contribution to data quality. However, that argument is based on plausible error rates in data entry. If the error rate were very high, then even triple data entry would not be good enough. To give another example, for any conclusion of equivalence, no degree of blinding is proof against the determined data-falsifier. All that is necessary is to generate data at random from a single distribution (Senn, 1993, 1994a). This does not necessarily mean, however, that all therapeutic equivalence trials should be abandoned, although it must be accepted that there simply are some claims that could never be convincingly proved by such trials. For instance, no active control study of a homeopathic agent could ever convincingly prove its efficacy if all that was demonstrated was closeness of effect, to any degree desired, between the two arms.
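The fabrication route to ‘equivalence’ is easy to see in a small simulation. The following is a minimal sketch under assumed conditions: both arms are invented from one and the same normal distribution, and the equivalence margin of 0.3 standard deviations is chosen purely for illustration. The TOST-style criterion used (a 90% confidence interval lying within the margin) is a standard device, but none of the numbers come from Senn’s papers.

```python
import math
import random
import statistics

def tost_equivalence(x, y, margin):
    """Declare equivalence if the 90% confidence interval for the mean
    difference lies entirely inside (-margin, margin).  Normal approximation."""
    diff = statistics.mean(x) - statistics.mean(y)
    se = math.sqrt(statistics.variance(x) / len(x) +
                   statistics.variance(y) / len(y))
    lower, upper = diff - 1.645 * se, diff + 1.645 * se
    return -margin < lower and upper < margin

rng = random.Random(2005)
# A falsifier invents both 'arms' from one and the same distribution ...
active = [rng.gauss(0.0, 1.0) for _ in range(500)]
control = [rng.gauss(0.0, 1.0) for _ in range(500)]
# ... and 'equivalence' duly follows, blinding notwithstanding.
print(tost_equivalence(active, control, margin=0.3))  # almost surely True
```

The point is Senn’s (1993, 1994a): closeness of two fabricated arms proves nothing about either treatment.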

Maybe in the end we have to accept that randomisation and blinding can only ever provide protection up to a point. For example, Philipson & Desimone (1997) have a ‘proof’ that no trial can ever be truly blind: patients’ speculation about effects will lead to treatment identification, which will unblind investigators, which could drive the various orders of bias Dr. Berger identifies. On the other hand, of course, despite claims to the contrary (Fergusson et al., 2004), it is really unblinding through efficacy that we want in clinical trials (Senn, 2004), but not, of course, at the expense of subverting allocation. How are we to solve this conundrum? Perhaps we should all be Bayesians and simply ask ourselves for any given trial which of two explanations, treatment effect or bias, is more plausible, a piece of bookkeeping that is trivial in odds form (see the sketch at the end of this comment). The most determined critic can never be convinced and may after all be right. For example, I believe that Americans landed on the moon in 1969 and have even met a very nice gentleman, apparently an astronaut, who claimed to have gone there himself on a subsequent mission. He seemed a truthful sort and I am inclined to believe him, but the internet is full of pages claiming that it was all a Hollywood-type stunt, rather like the film Capricorn One starring Elliott Gould. When the claim being made is really big, no amount of post-testing, no Berger and Exner tests (Berger & Exner, 1999) and no possible adjustments will satisfy the critics. You are faced with Hume’s problem of miracles: it is always easier to believe in the existence of liars than of wonders. (But see Earman, 2002 for a criticism of Hume.) You will have to arrange to have the critics supervise your trial in its last details as it is run. This is just what John Maddox did in investigating Benveniste’s claims for homeopathy (Benveniste, 1988; Maddox et al., 1988). He went to Benveniste’s laboratory and took a professional magician, James Randi, with him to supervise both randomisation and blinding. Although there are statisticians who are also magicians (DeGroot, 1986), most of us will not be able to do likewise. Perhaps an awareness of third-order bias will help us, perhaps not. Either way, I welcome this interesting addition to the literature on clinical trials.
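By way of that promised sketch: in odds form the Bayesian bookkeeping really is trivial. The numbers below are purely illustrative and drawn from no trial at all.

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Bayes' theorem in odds form: posterior odds of 'genuine treatment
    effect' against 'bias or fraud' given the observed trial result."""
    return prior_odds * likelihood_ratio

# Hume's point in two lines: a result favouring 'effect' over 'bias'
# by 1000 to 1 merely cancels a reader whose prior scepticism runs
# 1000 to 1 the other way, leaving the posterior odds at evens.
print(posterior_odds(prior_odds=1 / 1000, likelihood_ratio=1000))  # 1.0
```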

Acknowledgements I am grateful to Jan Lanke for helpful discussions on the CAPPP trial. I bear sole responsibility for the views expressed here.

References

Anonymous (2003). Gulf war syndrome doctor faked £90m trial for diabetic drug. British Nursing News Online. http://www.bnn-online.co.uk/comments_display.asp?HeadlineID=36531&Year=2003.
Benveniste, J. (1988). Benveniste on Nature investigation. Science 241(4869), 1028.
Berger, V. (2005). Quantifying the magnitude of baseline covariate imbalances resulting from selection bias in randomized clinical trials. Biometrical Journal 47, 119–127.
Berger, V. W. and Exner, D. V. (1999). Detecting selection bias in randomized clinical trials. Controlled Clinical Trials 20(4), 319–327.
Day, S., Fayers, P., and Harvey, D. (1998). Double data entry: what value, what price? Controlled Clinical Trials 19(1), 15–24.
DeGroot, M. (1986). A conversation with Persi Diaconis. Statistical Science 1, 319–334.
Dyer, O. (2003). GMC reprimands doctor for research fraud. British Medical Journal 326(7392), 730.
Earman, J. (2002). Bayes’s theorem, miracles and theism. In: Swinburne, R. (ed.), Bayes’s Theorem. Oxford University Press, Oxford.
Farthing, M. J. G. (2004). ‘Publish and be damned’ … the road to research misconduct. The Journal of the Royal College of Physicians of Edinburgh 34, 301–304.
Fergusson, D., Glass, K. C., Waring, D., and Shapiro, S. (2004). Turning a blind eye: the success of blinding reported in a random sample of randomised, placebo controlled trials. British Medical Journal 328(7437), 432.
Hansson, L., Lindholm, L. H., Niskanen, L., Lanke, J., Hedner, T., Niklason, A., Luomanmäki, K., Dahlöf, B., de Faire, U., Mörlin, C., Karlberg, B. E., Wester, P. O., and Björk, J. E. (1999). Effect of angiotensin-converting-enzyme inhibition compared with conventional therapy on cardiovascular morbidity and mortality in hypertension: the Captopril Prevention Project (CAPPP) randomised trial. Lancet 353, 611–616.
Maddox, J., Randi, J., and Stewart, W. W. (1988). “High-dilution” experiments a delusion. Nature 334(6180), 287–290.
Peto, R. (1999). Failure of randomisation by “sealed” envelope. Lancet 354(9172), 73.
Philipson, T. and Desimone, J. (1997). Experiments and subject sampling. Biometrika 84(3), 619–630.
Senn, S. J. (1993). Inherent difficulties with active control equivalence studies. Statistics in Medicine 12, 2367–2376.
Senn, S. J. (1994a). Fisher’s game with the devil. Statistics in Medicine 13, 217–230.
Senn, S. J. (1994b). Testing for baseline balance in clinical trials. Statistics in Medicine 13, 1715–1726.
Senn, S. J. (2004). Turning a blind eye: authors have blinkered view of blinding. British Medical Journal 328(7448), 1135–1136.
