PHARMACEUTICAL STATISTICS
Pharmaceut. Statist. 2003; 2: 239–240 (DOI:10.1002/pst.090)
Disappointing dichotomies
Stephen Senn*,†
Department of Statistics, University of Glasgow
A regrettably common means of judging the effect of treatments is by ‘responder analysis’: at the end of the trial every patient is classified according to a binary variable with ‘responded’ and ‘did not respond’ as the alternatives [1]. Sometimes the classification is natural. For example, in a trial in anti-infectives it may be possible to separate patients into ‘cured’ and ‘not cured’ categories, and this may seem to be the only meaningful thing to do. Or, in a trial in lung cancer, it may seem natural to use some standard time of follow-up and classify the patients as either alive or dead at the end. However, even in such cases a simple analysis using a chi-square test on a fourfold table, or, where covariate information is employed, its more sophisticated cousin, logistic regression, may be unwise. To take the case of lung cancer, in the long run we are all dead, and if the follow-up time has been chosen unwisely we may fail to find important genuine differences between treatment groups. A better approach may be to use the time of death as the outcome variable and hence survival analysis rather than logistic regression.
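The lung-cancer point can be sketched with a small simulation (all numbers here are hypothetical, chosen only for illustration): when the fixed follow-up time is long relative to typical survival, the alive/dead dichotomy shows both groups almost entirely dead, even though the survival times themselves still separate the treatments clearly.

```python
import math
import random
import statistics

random.seed(1)

N = 5_000
# Hypothetical exponential survival times (months): control median 12, treated 18.
control = [random.expovariate(math.log(2) / 12) for _ in range(N)]
treated = [random.expovariate(math.log(2) / 18) for _ in range(N)]

FOLLOW_UP = 60  # an unwisely long fixed follow-up time (months)

dead_control = sum(t <= FOLLOW_UP for t in control) / N
dead_treated = sum(t <= FOLLOW_UP for t in treated) / N

# The dichotomy at the end of follow-up shows both groups nearly all dead...
print(f"dead by month {FOLLOW_UP}: control {dead_control:.1%}, treated {dead_treated:.1%}")
# ...while the survival times themselves differ substantially.
print(f"median survival: control {statistics.median(control):.1f} months, "
      f"treated {statistics.median(treated):.1f} months")
```

A chi-square test on the alive/dead fourfold table is left comparing two proportions squeezed up near 100%, whereas an analysis of the survival times retains the full difference between the groups.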
However, such dichotomies are often far from natural and instead arbitrarily constructed from continuous (or nearly continuous) measurements. An approach common in many areas, hypertension and depression to name but two, is to compare the result at outcome with that at baseline and then classify patients as responders or not depending on whether some arbitrary threshold of difference has been reached [2, pp. 118–119]. There are many reasons why this is undesirable. The
first is that it is inefficient and leads to at the very least a 40% increase in the sample size required [3]. (If the cut-point has been chosen unwisely, matters can be much worse.) The second is that it is vulnerable to the sort of trends that clinical trials are designed to control. Consider, for example, a trial in hypertension in which patients have been selected for inclusion on the basis of a single baseline measurement, compared with one in which some form of repeated measurement over a period has been used. Other things being equal, in the former case there is likely to be a greater regression to the mean effect. Hence the response rate in both control and intervention groups is likely to be higher. The expected difference between the groups will not necessarily be the same, either on the probability scale or on the log-odds scale. Third, such an approach makes inefficient use of the baseline measurements. Being based on change scores, a double inefficiency is introduced. The dichotomy is more inefficient than the change score, as already discussed, and the change score is already more inefficient than analysis of covariance [2, pp. 106–108]. Finally, it encourages naïve and inappropriate judgements of causality. It is inherently liable to be extravagantly interpreted as showing whether a given patient did or did not gain a benefit from treatment. However, every patient in a clinical trial could show the same true degree of benefit from treatment but through measurement error some would have a difference that exceeded the response threshold and some would not. We could then make erroneous judgements about the proportion showing response [4].
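The efficiency point can be illustrated with a simulation (the effect size, cut-point and sample sizes below are hypothetical, chosen only to make the contrast visible): at the same sample size, a test on the continuous outcome has markedly higher power than a test comparing ‘responder’ proportions after dichotomizing.

```python
import math
import random

random.seed(7)

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

N = 100        # patients per arm (hypothetical)
DELTA = 0.4    # true treatment effect in standard-deviation units (hypothetical)
CUT = 0.0      # hypothetical 'responder' cut-point at the control mean
SIMS = 2_000
ALPHA = 0.05

wins_cont = wins_dich = 0
for _ in range(SIMS):
    a = [random.gauss(0, 1) for _ in range(N)]
    b = [random.gauss(DELTA, 1) for _ in range(N)]

    # Continuous analysis: two-sample z-test on the means (known unit variance).
    z = (sum(b) / N - sum(a) / N) / math.sqrt(2 / N)
    wins_cont += 2 * (1 - norm_cdf(abs(z))) < ALPHA

    # Responder analysis: z-test comparing proportions above the cut-point.
    pa, pb = sum(x > CUT for x in a) / N, sum(x > CUT for x in b) / N
    pooled = (pa + pb) / 2
    se = math.sqrt(2 * pooled * (1 - pooled) / N)
    zd = (pb - pa) / se if se > 0 else 0.0
    wins_dich += 2 * (1 - norm_cdf(abs(zd))) < ALPHA

print(f"power, continuous outcome:  {wins_cont / SIMS:.0%}")
print(f"power, responder dichotomy: {wins_dich / SIMS:.0%}")
```

Under these assumptions the continuous analysis detects the effect in roughly four trials out of five while the responder analysis does so far less often; the gap widens further if the cut-point sits away from the centre of the distribution.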
Copyright © 2003 John Wiley & Sons, Ltd.

*Correspondence to: Stephen Senn, Department of Statistics, University of Glasgow, 15 University Gardens, Glasgow G12 8QQ.
†E-mail: [email protected]

Unfortunately, such inefficient measures are becoming enshrined in regulatory practice. For example, the CPMP guideline on multiplicity more or less takes it for granted that such measures will be used and, indeed, are desirable [5]. The current fashionable obsession with numbers needed to treat is unfortunately fuelling the demand for dichotomies [6]. There are now a number of guidelines for specific therapeutic areas that require responder analyses. They are often defended in terms of ‘clinical relevance’, but in my opinion this phrase is simply a mantra that is chanted to justify bad habits. Take the case of blood pressure. Trialists seem to have forgotten that this is itself a surrogate measure (admittedly a very familiar one) for strokes, kidney damage, eye damage, heart disease and so forth. If clinically relevant measures are needed, then these therapeutic outcomes are relevant. Diastolic blood pressure as a continuous outcome will predict these sequelae of hypertension better than if dichotomized.
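The predictive claim can likewise be sketched in simulation (the smooth risk model and the 90 mmHg cut below are hypothetical, not fitted to any data): ranking patients by their continuous diastolic pressure discriminates subsequent events better than the dichotomized version of the same measurement.

```python
import math
import random

random.seed(3)

N = 4_000

def event_risk(bp):
    # Hypothetical smooth model: event probability rises with diastolic BP (mmHg).
    return 1 / (1 + math.exp(-(bp - 100) / 5))

bps = [random.gauss(88, 10) for _ in range(N)]
events = [random.random() < event_risk(bp) for bp in bps]

def auc(scores):
    # Probability a random event-patient scores higher than a random non-event
    # patient (ties count one half): the area under the ROC curve.
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc_continuous = auc(bps)
auc_dichotomized = auc([bp >= 90 for bp in bps])  # hypothetical 90 mmHg cut

print(f"AUC, continuous diastolic BP: {auc_continuous:.3f}")
print(f"AUC, dichotomized at 90 mmHg: {auc_dichotomized:.3f}")
```

Throwing away the within-category ordering of blood pressures can only reduce discrimination; the dichotomy cannot distinguish a patient at 91 mmHg from one at 120 mmHg.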
The net consequence of dichotomizing continuous data is that trials are much bigger than they need be, that our inferences are poorer and that we are wasting both resources and lives. Thinking about appropriate measurement is an important part of any science. Physicists take it very seriously. Trialists should do the same.
REFERENCES
1. Altman DG. Statistics in medical journals: some recent trends. Statistics in Medicine 2000; 19:3275–3289.
2. Senn SJ. Statistical issues in drug development. Wiley: Chichester, 1997.
3. Whitehead J. Sample size calculations for ordered categorical data. Statistics in Medicine 1993; 12:2257–2271.
4. Senn SJ. Author's reply to Walter and Guyatt. Drug Information Journal 2003; 37:7–10.
5. Committee for Proprietary Medicinal Products. Points to consider on multiplicity issues in clinical trials. 2002; 1–11.
6. Grieve AP. The number needed to treat: a useful clinical measure or a case of the Emperor's new clothes? Pharmaceutical Statistics 2003; 2:87–102.