On Plotting Renovated Samples
Author(s): Peter J. Smith
Source: Biometrics, Vol. 51, No. 3 (Sep., 1995), pp. 1147-1151
Published by: International Biometric Society
Stable URL: http://www.jstor.org/stable/2533014
Accessed: 25/06/2014 09:07
BIOMETRICS 51, 1147-1151 September 1995
On Plotting Renovated Samples
Peter J. Smith
Department of Statistics and Operations Research, Royal Melbourne Institute of Technology,
G.P.O. Box 2476V, Melbourne, Victoria, Australia 3001
SUMMARY
In this note we use the Buckley-James method for censored regression in the p-sample problem where the samples are subject to right-censoring. The samples are reconstructed so as to remove the effect of censoring, and graphical procedures based on quantiles (such as boxplots) may then be used as a standard data-analytic tool to describe the variable being measured.
1. Introduction
When two censored samples are to be compared, a commonly used initial graphical approach is to place the product-limit survival curves on the same diagram. In such plots, it may be difficult to visually separate the curves. To facilitate direct visual communication of the information contained in censored data, Gentleman and Crowley (1992) show how to construct rank plots, Q-Q plots and comparative boxplots; the focus is on the plot as a functional of the product-limit estimator.
In this paper, we take the approach of renovating the data to provide a view of what the response would be like had it been unaffected by censoring. We apply renovated scatterplots (Smith and Zhang, 1995) to the p-sample problem using the Buckley-James method (Buckley and James, 1979; Miller and Halpern, 1982; James and Smith, 1984; Lai and Ying, 1991; Lin and Wei, 1992a; Hillis, 1993) for regression with censored data. We note that the linear model is often appropriate when the response is measured on the logarithm scale (Buckley and James, 1979).
2. Right-censoring
Suppose that the outcomes $y_1, y_2, \ldots, y_n$ of a positive-valued right-censored response variable $Y$ comprise $p = 2$ samples, of combined size $n$, with group membership held by the covariate

$$x_i = \begin{cases} 1 & \text{if } z_i \text{ is from Sample 1;} \\ 0 & \text{if } z_i \text{ is from Sample 2.} \end{cases}$$

This means that instead of observing the outcome $y_i$ directly, we observe the data

$$(x_1, z_1, \delta_1), (x_2, z_2, \delta_2), \ldots, (x_n, z_n, \delta_n), \qquad (1)$$

where $z_i = \min\{y_i, t_i\}$ denote the observed responses defined in terms of $t_i$, the censor time associated with $y_i$, and the censor indicators

$$\delta_i = \begin{cases} 1 & \text{if } y_i < t_i; \\ 0 & \text{otherwise} \end{cases}$$

return the value 1 only when $y_i$ is observed exactly (uncensored). We assume that survival is independent of the causes of censoring and that the censor times are fixed (as is likely the case for an experiment termination date). Our methods are also applicable to left-censoring, which is a special case of right-censoring with the response axis reversed (Turnbull, 1974).
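The observation scheme in (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `censor` and the numerical values are invented here and do not come from the paper.

```python
def censor(y, t):
    """Turn true outcomes y and fixed censor times t into observed data.

    Returns z_i = min(y_i, t_i) and delta_i = 1 when y_i < t_i (observed
    exactly), 0 otherwise, following the definitions in the text.
    """
    z = [min(yi, ti) for yi, ti in zip(y, t)]
    delta = [1 if yi < ti else 0 for yi, ti in zip(y, t)]
    return z, delta


# Hypothetical outcomes and censor times, for illustration only.
y = [2.0, 5.0, 3.5, 7.0]
t = [4.0, 4.0, 6.0, 6.0]
z, delta = censor(y, t)
```

In this toy case the second and fourth outcomes exceed their censor times, so only the censor times 4.0 and 6.0 are recorded for them, with indicator 0.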
Key words: Buckley-James estimator; Censored regression; p-sample problem; Right-censoring.

3. Data Renovation
When the censor indicator $\delta$ is interpreted as a plot symbol, the data $(x_i, z_i, \delta_i)$, $i = 1, 2, 3, \ldots, n$, may be depicted in a scatterplot composed of censored points $(x_i, z_i, \circ)$ and uncensored points $(x_i, z_i, \bullet)$ in two lines of dots standing above the respective covariate values. When the scatterplot is used as a guide for the effect of the explanatory variable on the response, the points are in the "wrong place": the plotted points are lower than they would likely be in the absence of censoring. We lift the positions of these points by using the Buckley-James method, which we now describe in an easily programmed matrix formulation.
For the p-sample problem we fit the model

$$Y = X\beta + R \qquad (2)$$

in terms of: $(p - 1)$ parameters $\beta$; matrix $X$ of order $n \times (p - 1)$; residual vector $R = (R_1, R_2, \ldots, R_n)^T$, where the $R_i$ are independent and identically distributed with mean $\alpha$, finite variance, and common unknown distribution function $F = 1 - S$. Censored points in the scatterplot are replaced by their estimated conditional expected values by using a weighted linear combination of observed ranked residuals $E(b) = (e_1(b), e_2(b), \ldots, e_n(b))^T = Z - Xb$ from a fitted line of "slope" $b$ from the observed data $Z = (z_1, z_2, \ldots, z_n)^T$. Let $\hat F = 1 - \hat S$ denote the product-limit estimator based on the observed residual vector $E(b)$. The weights (Buckley and James, 1979) used in the linear combination are defined by

$$w_{ik}(b) = \begin{cases} \dfrac{d\hat F(e_k(b))\,\delta_k (1 - \delta_i)}{\hat S(e_i(b))} & \text{if } e_k(b) > e_i(b); \\ 0 & \text{otherwise.} \end{cases} \qquad (3)$$

If the true parameter were known, these weights would estimate $dF(r)/S(r_i)$ in the equation

$$E[R_i \mid R_i > r_i] = \int_{r_i}^{\infty} r \, \frac{dF(r)}{S(r_i)},$$

so that the conditional expectation $E(R_i \mid R_i > e_i(\beta))$ is estimated as a linear combination $\sum_{k=1}^{n} e_k(b)\, w_{ik}(b)$ when $b$ is near $\beta$.
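As a small worked illustration of the weights in (3), consider four ordered residuals $e_1 = 1$, $e_2 = 2$, $e_3 = 3$, $e_4 = 4$ with censor indicators $\delta = (1, 0, 1, 1)$, so that only the second residual is censored. These numbers are invented for the sketch and are not taken from the paper; $d\hat F(e_k)$ is read here as the product-limit probability mass at $e_k$.

```latex
\begin{align*}
d\hat F(e_1) &= \tfrac14, \qquad \hat S(e_1) = \tfrac34, \\
\hat S(e_2) &= \tfrac34 \quad \text{(no jump at a censored residual)}, \\
d\hat F(e_3) &= \tfrac34 \cdot \tfrac12 = \tfrac38, \qquad
d\hat F(e_4) = \tfrac38, \\
w_{23} &= \frac{d\hat F(e_3)}{\hat S(e_2)} = \tfrac12, \qquad
w_{24} = \frac{d\hat F(e_4)}{\hat S(e_2)} = \tfrac12, \\
\sum_{k} e_k\, w_{2k} &= \tfrac12 \cdot 3 + \tfrac12 \cdot 4 = 3.5 .
\end{align*}
```

The censored residual at 2 would therefore be replaced by 3.5, the estimated mean of the residual distribution beyond 2.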
In a multivariate setting, such as for comparing $p > 2$ samples, the Buckley-James method consists of determining an iterative solution $b = \hat\beta_n$ to the equation

$$(X^T X)^{-1} X^T Y^*(b) - b = 0 \qquad (4)$$

through the renovated responses $Y^*(b) = Xb + W(b)(Z - Xb)$, where

$$W(b) = \begin{pmatrix} \delta_1 & w_{12}(b) & w_{13}(b) & \cdots & w_{1n}(b) \\ 0 & \delta_2 & w_{23}(b) & \cdots & w_{2n}(b) \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & \delta_n \end{pmatrix}$$

is the renovation weight matrix containing the censor indicators on the main diagonal (Smith and Zhang, 1995). A "solution," $b = \hat\beta_n$, is reached iteratively when the norm of the left side of (4) is minimum.
In a univariate setting for comparing $p = 2$ samples, the equation in (4) may be easily solved iteratively: $\hat\beta_n$ is the solution to

$$\frac{\sum_i (x_i - \bar x)\, y_i^*(b)}{\sum_i (x_i - \bar x)^2} - b = 0;$$

$\hat\alpha_n$ is then the mean of the resulting partial residuals $y_i^*(\hat\beta_n) - \hat\beta_n x_i$. This gives $\hat\beta_n$ as a Buckley-James estimator of $\beta$. Distributional properties of $\hat\beta_n$ are succinctly outlined in Lin and Wei (1992b).
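The two-sample iteration can be sketched in plain Python as follows. This is a minimal illustration, not the author's program: the function names, the naive starting value, the stopping rule, the tie handling and the convention of treating the largest residual as uncensored are all assumptions made here, and the toy data are invented.

```python
def product_limit(e, delta):
    """Product-limit mass and survival value at each residual.

    Assumptions: tied uncensored residuals precede tied censored ones, and
    the largest residual is treated as uncensored so that the estimated
    survival function reaches zero.
    """
    n = len(e)
    order = sorted(range(n), key=lambda i: (e[i], -delta[i]))
    mass, surv = [0.0] * n, [0.0] * n
    s, pos = 1.0, 0
    while pos < n:
        end = pos
        while end < n and e[order[end]] == e[order[pos]]:
            end += 1
        group = order[pos:end]
        at_risk = n - pos
        deaths = group if end == n else [i for i in group if delta[i] == 1]
        for i in deaths:
            mass[i] = s / at_risk
        s *= 1.0 - len(deaths) / at_risk
        for i in group:
            surv[i] = s
        pos = end
    return mass, surv


def renovate(x, z, delta, b):
    """Renovated responses y*(b): uncensored points stay where they are;
    censored points move to x*b plus the weighted mean of larger residuals."""
    e = [zi - b * xi for xi, zi in zip(x, z)]
    mass, surv = product_limit(e, delta)
    y_star = []
    for i in range(len(z)):
        if delta[i] == 1 or surv[i] == 0.0:
            y_star.append(z[i])
        else:
            tail = sum(mass[k] * e[k] for k in range(len(z)) if e[k] > e[i])
            y_star.append(b * x[i] + tail / surv[i])
    return y_star


def buckley_james_two_sample(x, z, delta, tol=1e-8, max_iter=200):
    """Iterate the slope equation; returns (intercept, slope, y_star)."""
    n = len(z)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    # Naive start: least squares slope that ignores censoring.
    b = sum((xi - xbar) * zi for xi, zi in zip(x, z)) / sxx
    for _ in range(max_iter):
        y_star = renovate(x, z, delta, b)
        b_new = sum((xi - xbar) * yi for xi, yi in zip(x, y_star)) / sxx
        converged = abs(b_new - b) < tol
        b = b_new
        if converged:
            break
    y_star = renovate(x, z, delta, b)
    a = sum(yi - b * xi for xi, yi in zip(x, y_star)) / n
    return a, b, y_star


# Hypothetical toy data: one censored point (z = 3.0) in the group with x = 1.
x = [1, 1, 1, 0, 0, 0]
z = [2.0, 3.0, 4.0, 1.0, 2.5, 3.5]
delta = [1, 0, 1, 1, 1, 1]
a, b, y_star = buckley_james_two_sample(x, z, delta)
```

For this toy data the iteration settles at a slope of about 1, with the censored point lifted from 3.0 to about 4.0. The iteration cap is there because the Buckley-James iteration is not guaranteed to converge in general; a fuller implementation would need to detect and handle oscillation.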
4. Plots of Renovated Data
Once a Buckley-James solution $b = \hat\beta_n$ has been found in the least squares two-sample problem, then the data may be "renovated": by a renovated dotplot we mean a plot of $(X, Y^*, \delta)$, where

$$Y^* = Y^*(\hat\beta_n) = X\hat\beta_n + W(Z - X\hat\beta_n), \qquad (5)$$

$W = W(\hat\beta_n)$, and $\delta$ contains the plot symbols for uncensored points and renovated points.
Notice that when the Buckley-James method provides a unique solution, $\hat\beta_n$ may then be written as $\hat\beta_n = (X^T W X)^{-1} X^T W Y^*$ (Smith and Zhang, 1995) and is the least squares estimator of $\beta$ using data from the renovated dotplot. Standard boxplot comparisons may then take place on the renovated data $(X, Y^*, \delta)$.
The renovated data may be usefully employed in Q-Q plots to detect an underlying distribution for the response. Importantly, after renovation, for each of the p samples we may produce plots of the empirical survivor function $S_m^*(u)$, defined as the fraction of the $m$ within-sample data exceeding $u$. When the linear model is appropriate, the consistency of the Buckley-James estimators implies that the renovated points will provide a guide to the shape of the survival function for each group.
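The empirical survivor function described above is simple to compute. The sketch below assumes the renovated values for one sample are already available as a list; the function name and the sample values are hypothetical.

```python
def empirical_survivor(sample, u):
    """Fraction of the within-sample values that exceed u."""
    return sum(1 for value in sample if value > u) / len(sample)


# Hypothetical renovated sample, for illustration only.
renovated = [1.0, 2.0, 2.0, 3.0]
s_at_2 = empirical_survivor(renovated, 2.0)
```

Evaluating the function on a grid of `u` values gives a step function that drops at every distinct data point, which is the curve one would plot for each group.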
5. Comparative Boxplots: An Example
Lawless (1982), Gehan (1965), and others have discussed data from a clinical trial examining steroid-induced remission times (weeks) for leukemia patients. One group of 21 patients were given 6-mercaptopurine (6-MP); a second group of 21 patients were given a placebo. Since the trial lasted 1 year and patients were admitted to the trial during the year, right-censoring occurred at the cut-off date when some patients were still in remission. Observations $\log_e Z$ on log remission time $\log_e Y$ are as follows:
6-MP (Group 1):
  1.79  1.79  1.79  1.79+ 1.95  2.20+ 2.30  2.30+ 2.40+ 2.56  2.77
  2.83+ 2.94+ 3.00+ 3.09  3.14  3.22+ 3.47+ 3.47+ 3.52+ 3.56+
Placebo (Group 0):
  0.00  0.00  0.69  0.69  1.10  1.39  1.39  1.61  1.61  2.08  2.08
  2.08  2.08  2.40  2.40  2.48  2.48  2.71  2.83  3.09  3.14
The '+' denotes right-censoring in the 6-MP group, so that 6+ represents an observed 6-week remission which was still in effect at the closure of the trial.
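For readers who wish to work with the listing directly, the '+' notation can be converted into the pair of observed responses and censor indicators used earlier. The sketch below is illustrative: the helper name `parse_censored` is hypothetical, and the 6-MP values are typed in from the listing above.

```python
import math


def parse_censored(listing):
    """Split a whitespace-separated listing into values z and indicators
    delta, where a trailing '+' marks a right-censored observation."""
    z, delta = [], []
    for token in listing.split():
        z.append(float(token.rstrip("+")))
        delta.append(0 if token.endswith("+") else 1)
    return z, delta


six_mp = ("1.79 1.79 1.79 1.79+ 1.95 2.20+ 2.30 2.30+ 2.40+ 2.56 2.77 "
          "2.83+ 2.94+ 3.00+ 3.09 3.14 3.22+ 3.47+ 3.47+ 3.52+ 3.56+")
z, delta = parse_censored(six_mp)

# Back-transform the first censored value to weeks (rounded).
weeks = round(math.exp(z[3]))
```

With this listing, 9 of the 21 observations in the 6-MP group are uncensored, and the first censored value, 1.79+, back-transforms to the 6+ weeks mentioned in the text.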
Figure 1. Dotplots of original and renovated data for leukemia log remission times.
The Buckley-James method provides an exact solution to the model parameters in this two-sample problem on the logarithm scale with covariate x = 1 for 6-MP; x = 0 for placebo. The logarithmic transformation has the effect of stabilising the variance in the two groups being compared. In regression terminology, the appropriateness of the linear model is important since, for an exact solution, the least squares line for the renovated data and the Buckley-James line for the pre-renovation data coincide (Smith and Zhang, 1995). The renovated $\log_e Y$ data are:
6-MP (Group 1):
  1.79  1.79  1.79  3.37  1.95  3.50  2.30  3.53  3.53  2.56  2.77
  3.72  3.79  3.79  3.09  3.14  3.87  4.02  4.02  4.02  4.12
Placebo (Group 0):
  0.00  0.00  0.69  0.69  1.10  1.39  1.39  1.61  1.61  2.08  2.08
  2.08  2.08  2.40  2.40  2.48  2.48  2.71  2.83  3.09  3.14
Dotplots of the original data and the renovated data are given in Figure 1, where multiple points have been overwritten by a single plot symbol. Notice that it is the smallest censor times which receive the greatest renovation; the censored observation at 1.79 log weeks is renovated to over 3.37 log weeks. In general, censored points with the most negative residuals are renovated towards the mean. The renovated dotplot is a view of the patients' survival if no censoring were present. Such renovation produces a change of rank order in the data set, with wider spread apparent in the 6-MP data in Figure 1.
Figure 2. Boxplots of renovated data for leukemia log remission times for 'Placebo' and '6MP-Renovate'; for comparison, '6MP-PL' is a Gentleman and Crowley (1992) boxplot of the 6-MP data based on the PL-estimator.
Having established the new ranks in the renovation process, boxplot comparisons of the two groups may proceed. In particular, in Figure 2, we concentrate on the 6-MP data where the right-censoring occurs. The boxplot method of Gentleman and Crowley (1992), which is based on inverting the product-limit estimator to locate appropriate quantiles for the boxplot display labelled "6MP-PL", is compared directly with the renovated boxplot labelled "6MP-Renovate". The difference between the two plots is partially caused by the large proportion of censored data at the top of the 6-MP distribution leaving the product-limit estimator apparently "hanging" before its conventional assignment to zero beyond the largest observation (Efron, 1967; Miller, 1981).
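The idea of locating boxplot quantiles by inverting the product-limit estimator can be sketched as follows. The quantile rule used here (the smallest uncensored value at which the estimated survival drops to 1 - p or below) is an assumption made for illustration and may differ in detail from the rule of Gentleman and Crowley (1992); the function names are hypothetical, and the data are the 6-MP log remission times from the example.

```python
def km_steps(z, delta):
    """Distinct uncensored values paired with the product-limit survival
    estimate just after each of them."""
    steps, s = [], 1.0
    for u in sorted({zi for zi, di in zip(z, delta) if di == 1}):
        at_risk = sum(1 for zi in z if zi >= u)
        deaths = sum(1 for zi, di in zip(z, delta) if zi == u and di == 1)
        s *= 1.0 - deaths / at_risk
        steps.append((u, s))
    return steps


def km_quantile(z, delta, p):
    """Smallest uncensored value with survival estimate <= 1 - p, or None
    when the estimate never falls that far."""
    for u, s in km_steps(z, delta):
        if s <= 1.0 - p:
            return u
    return None


# 6-MP log remission times and censor indicators (1 = uncensored).
z = [1.79, 1.79, 1.79, 1.79, 1.95, 2.20, 2.30, 2.30, 2.40, 2.56, 2.77,
     2.83, 2.94, 3.00, 3.09, 3.14, 3.22, 3.47, 3.47, 3.52, 3.56]
delta = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1,
         0, 0, 0, 1, 1, 0, 0, 0, 0, 0]

lower_quartile = km_quantile(z, delta, 0.25)
median = km_quantile(z, delta, 0.50)
upper_quartile = km_quantile(z, delta, 0.75)
```

Under this rule the lower quartile is 2.56 and the median is 3.14, while the upper quartile is undefined (`None`) because the estimated survival function never falls to 0.25. This is one way to see the "hanging" behaviour at the top of the 6-MP distribution.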
Figure 3. Comparisons for the 6-MP group: empirical survivor function for renovated log lifetime (dashed lines); product-limit estimator for log lifetime (dotted lines). Horizontal axis: log remission.
This effect is demonstrated in Figure 3, where, on the log scale, the product-limit estimator $\hat S$ is represented by the dotted line and is an estimator of the survival function for remission. On the same log scale, the empirical survivor function $S_m^*$ of the renovated data is represented by dashed lines. As a consequence of the final residual rankings of both groups combined, the renovation process moves some censored times beyond the largest pre-renovation observation in the 6-MP group, thereby reducing the "hanging effect" of the estimated survival function at the top of the distribution. For the empirical survivor function, the points of discontinuity occur at every distinct renovated data point. In comparison, because of the redistribute-to-the-right algorithm (Efron, 1967), the jump sizes at the points of discontinuity of $\hat S$ increase towards the top of the distribution.
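The redistribute-to-the-right idea can be sketched in a few lines: every observation starts with mass 1/n, and each censored observation, taken from left to right, hands its mass in equal shares to all observations on its right. The function name, the tie handling (uncensored before censored) and the toy data are assumptions made for this illustration.

```python
def redistribute_to_the_right(z, delta):
    """Return (value, mass) pairs in increasing order of z after
    redistributing the mass of censored points to the right."""
    n = len(z)
    order = sorted(range(n), key=lambda i: (z[i], -delta[i]))
    mass = [1.0 / n] * n
    for pos, i in enumerate(order):
        if delta[i] == 0 and pos < n - 1:
            share = mass[pos] / (n - pos - 1)
            for later in range(pos + 1, n):
                mass[later] += share
            mass[pos] = 0.0
    return [(z[i], mass[pos]) for pos, i in enumerate(order)]


# Hypothetical data: the second of four observations is censored.
jumps = redistribute_to_the_right([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])

# Hypothetical data: the largest observation is censored and keeps its mass.
hanging = redistribute_to_the_right([1.0, 2.0, 3.0], [1, 1, 0])
```

In the first toy case the jumps are 0.25, 0, 0.375 and 0.375, so the jump sizes grow after the censored point. In the second, the final censored observation retains mass 1/3, so the estimate does not reach zero at the largest observation.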
Notice that generally $S_m^*$ depends on the censoring pattern in both the samples being compared, whereas $\hat S$ is determined from a single sample. However, when the linear model is appropriate, the consistency of the Buckley-James estimators implies that, provided that the expected number of censored observations and uncensored observations is large over the support of the survival distribution (Meier, 1975), both $\hat S$ and $S_m^*$ are uniformly consistent estimators of the same survival function. For moderate sample sizes, when the linear model is appropriate, the graph of the empirical survivor function of the renovated data provides an alternative to the graph of the product-limit estimator on the observed data.
ACKNOWLEDGEMENTS
Part of the research in this work was undertaken while the author was visiting the Department of Statistics, University of Auckland, New Zealand. The author thanks the referees for their editorial comments and suggestions.
RÉSUMÉ
Dans cette note, nous utilisons la méthode de Buckley-James pour la régression censurée dans un problème de p échantillons quand les échantillons sont censurés à droite. Les échantillons sont reconstitués pour éliminer l'effet de la censure et des méthodes graphiques basées sur les quantiles (telles que les "boxplots") peuvent être utilisées comme méthodes standards d'analyse de données pour décrire la variable qui est mesurée.
REFERENCES
Buckley, J. J. and James, I. R. (1979). Linear regression with censored data. Biometrika 66, 429-436.
Efron, B. (1967). The two sample problem with censored data. Proceedings of...