A Generalized Estimating Equation Method for Fitting Autocorrelated Ordinal Score Data with an Application in Horticultural Research
Author(s): N. R. Parsons, R. N. Edmondson and S. G. Gilmour
Source: Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 55, No. 4 (2006), pp. 507-524
Published by: Wiley for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/3879106



App. Statist. (2006) 55, Part 4, pp. 507-524

A generalized estimating equation method for fitting autocorrelated ordinal score data with an application in horticultural research

N. R. Parsons and R. N. Edmondson

University of Warwick, Coventry, UK

and S. G. Gilmour

Queen Mary, University of London, UK

[Received March 2005. Final revision April 2006]

Summary. Generalized estimating equations for correlated repeated ordinal score data are developed assuming a proportional odds model and a working correlation structure based on a first-order autoregressive process. Repeated ordinal scores on the same experimental units, not necessarily with equally spaced time intervals, are assumed and a new algorithm for the joint estimation of the model regression parameters and the correlation coefficient is developed. Approximate standard errors for the estimated correlation coefficient are developed and a simulation study is used to compare the new methodology with existing methodology. The work was part of a project on post-harvest quality of pot-plants and the generalized estimating equation model is used to analyse data on poinsettia and begonia pot-plant quality deterioration over time. The relationship between the key attributes of plant quality and the quality and longevity of ornamental pot-plants during shelf and after-sales life is explored.

Keywords: Generalized estimating equations; Ordinal scores; Plant quality scores; Proportional odds model; Repeated measures

1. Introduction

Modelling ordinal score data that are repeated over time is common to a wide range of problems and has been studied by many researchers. Miller et al. (1993), Kenward et al. (1994) and Lipsitz et al. (1994) have adopted the generalized estimating equation (GEE) approach that was proposed by Liang and Zeger (1986) for proportional odds models for ordinal responses. Maximum likelihood procedures have been developed by Molenberghs and Lesaffre (1994) and Glonek and McCullagh (1995) and a Bayesian approach has been suggested by Girard and Parent (2001).

In ornamental plant and crop research, the quality of produce is paramount; see for example Conover (1986), Nielsen and Starkey (1999) and Williams et al. (2000). Ornamental crop quality can be modelled by visually scoring quality on an ordinal scale and then relating the quality scores to a set of underlying plant attributes by using an appropriate statistical model. However, for long-lived ornamental products such as pot-plants, plant quality scores must be repeated over time and must then be related to plant attributes measured on the same plants at the same times. For example, Table 1 shows a sample of records from a simulated home-life begonia pot-plant trial comprising ordinal scores for overall plant quality together with a series of plant attributes recorded at weekly intervals during 6 weeks. In this paper, we develop modified GEE methods and apply the methodology to model the dependences of the observed repeated ordinal responses on the underlying plant attributes. It is worth noting that, because pot-plants can continue to grow and develop after purchase, quality can improve as well as decline; therefore we make no assumption about monotonically declining quality in our models.

Table 1. Data records for five plants, showing the change in plant attributes, from a simulated home-life begonia trial conducted during 6 weeks in spring 1999 at Horticulture Research International (Efford, Hampshire, UK)

Plant  Week†  Flowers‡                          Buds‡    Leaves‡                 Quality
              Single  Double  Dropped  Damaged  dropped  Dropped  Damaged  Pale  score
1      0      0       15      0        4        0        0        1        N     4
1      1      0       20      0        5        0        0        1        N     3
1      2      0       20      1        5        0        0        1        Y     2
1      3      0       21      3        4        0        0        1        Y     2
1      4      0       16      6        0        1        0        1        Y     1
1      5      0       14      3        0        4        1        1        Y     1
1      6      0       11      3        0        7        1        2        Y     1
2      0      3       12      1        0        4        4        0        N     4
2      1      5       12      0        0        8        4        0        N     3
2      2      5       16      0        0        8        4        0        Y     3
2      3      7       15      5        0        6        5        0        Y     1
2      4      7       10      10       2        2        6        0        Y     1
2      5      11      2       13       21       0        6        0        Y     1
2      6      18      1       4        24       0        6        1        Y     1
3      0      0       17      0        6        0        0        3        N     4
3      1      0       21      0        9        0        0        3        N     3
3      2      0       22      0        9        0        0        3        Y     3
3      3      0       24      4        7        0        0        4        Y     2
3      4      1       19      14       0        4        0        4        Y     1
3      5      8       11      13       0        9        1        3        Y     1
3      6      12      6       6        0        11       1        4        Y     1
4      0      0       15      0        1        0        1        0        N     4
4      1      0       18      0        0        0        1        0        N     3
4      2      0       20      0        0        0        1        1        N     3
4      3      0       27      0        0        0        1        0        N     2
4      4      0       33      5        0        0        1        0        Y     2
4      5      1       26      9        2        0        1        0        Y     1
4      6      5       28      5        2        0        2        0        Y     2
5      0      0       20      0        0        8        3        0        N     3
5      1      0       26      0        0        12       3        0        N     3
5      2      0       25      1        0        12       5        0        Y     2
5      3      0       22      4        1        11       3        2        Y     1
5      4      0       14      10       5        3        4        0        Y     1
5      5      0       6       14       18       0        6        0        Y     1
5      6      0       4       2        15       0        8        0        Y     1

† Number of weeks of simulated home-life.
‡ Counts of damaged or dropped flowers, buds and leaves, counts of single and double flowers and an indicator of the presence (Y) or absence (N) of pale leaves.

Address for correspondence: N. R. Parsons, Risk Initiative and Statistical Consultancy Unit, Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK. E-mail: [email protected]

© 2006 Royal Statistical Society 0035-9254/06/55507

In this paper, GEE methodology is developed for fitting a proportional odds model (McCullagh, 1980) to repeated ordinal scores, where the primary scientific interest is the estimation of marginal mean regression parameters. A common assumption for this class of models is that the working correlation between observations on the same experimental unit at different times is 0. This assumption is convenient, as no additional modelling of the correlation structure is required, but is unrealistic and unlikely to be valid. Efficiency studies by some researchers have shown that modelling the correlation structure can be important (Sutradhar and Das, 2000) and it seems clear that fitting a model that takes explicit account of any underlying error structure is likely to be more realistic and efficient than simply assuming no underlying correlation structure. In the completely general case, the complexity of the ordinal regression model leads to complex and intractable correlation structures with many parameters that cannot be reliably estimated. However, the general model can be simplified by making realistic assumptions about the correlation structure and, in this paper, a model for repeated ordinal score data will be developed based on an autoregressive process for observations with a single unknown correlation coefficient. Autoregressive correlation structures provide good approximations to reality while still allowing reliable and efficient estimates of model parameters.

First, we describe some typical ornamental plant data and then develop a modified GEE methodology and obtain explicit equations for estimating the parameters of an assumed model with repeated ordinal scores and an autoregressive correlation structure. Next, we use simulated data sets to assess the consistency and efficiency of the methodology proposed. Finally, we use the methodology to examine data that were recorded over a period of 6 weeks of simulated home-life from an experiment on the quality of ornamental pot-plants and show how our model can be used to describe the contribution of individual plant attributes to overall plant quality.

2. Ornamental plant data

Data were collected from simulated home-life trials, on 144 commercially grown begonia (Begonia elatior) and poinsettia (Euphorbia pulcherrima) ornamental pot-plants at Horticulture Research International (Efford, Hampshire, UK) in spring 1999 and winter 1999-2000 respectively. One of the main aims of the trials, which we shall concentrate on in this paper, was to produce sufficient variability in the crop response to allow the development of models relating overall plant quality to the measured plant attributes. This was achieved by applying transport stress to two widely grown cultivars of each species at three levels of simulated transport stress (low, medium and high) during transport from the nursery to the retailer. The effects of transport stress on plant quality, during the 6 weeks following marketing, were then assessed in a range of simulated home-life environments comprising all factorial combinations of two temperatures (21 °C and 16 °C), two light regimes (12 μmol m⁻² s⁻¹ and 6 μmol m⁻² s⁻¹) and two watering regimes (standard and fluctuating for begonia, and standard and wet for poinsettia). The quality of the plants was assessed weekly throughout all 6 weeks of simulated home-life with the same assessor making an overall plant quality score on a four-point ordinal scale (1, very poor; 2, poor; 3, good; 4, very good) at each assessment occasion.

Immediately before each quality assessment, a series of plant attribute records were made comprising, for begonia,

(a) flower count (single and double flowers),
(b) flower drop,
(c) bud drop,
(d) damaged flower count,
(e) damaged leaf count,
(f) leaf drop and
(g) the presence of pale leaves,

and, for poinsettia,


(a) cyathia (flower) drop as a proportion of the total count,
(b) bract (red leaf) drop,
(c) leaf drop,
(d) the presence of bract edge necrosis,
(e) the presence of pale leaves and
(f) the presence of pale bracts.

Table 1 shows data from five typical begonia plants recorded at each week of the trial during simulated home-life. Flowers were predominantly of the double type (flower within a flower) with the numbers of single (standard) flowers increasing in the final weeks of the trial. As the plants matured during home-life, flowers, buds and leaves became discoloured (damaged or pale) and finally dropped from the plants (to be removed before quality assessments). The change in these attributes was associated with the decline in quality scores during home-life, and the aim of the modelling work was to assess the contribution of each attribute to the overall quality of the plant.

3. A proportional odds model

3.1. Model data

We assume a model for repeated observations on the same experimental unit where observations are made at a number of time points. Let $N$ experimental units be scored at each of $T$ time points by using an ordinal scale with $K$ categories. Let $Y_{it}$ represent the score on the $i$th experimental unit at the $t$th time where the individual scores are made on an integer-valued scale from 1 to $K$, where $K$ represents the optimum score. Let $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{iT})$ be a vector of scores for the $i$th experimental unit over the set of $T$ time points. In addition to the ordinal scores, assume that a multivariate vector of measured variables, $x_{it}$, is observed on each experimental unit at each time point $t$. A marginal ordinal regression model can then be used to explain the relationship between the ordinal scores $Y_{it}$ and the measured variables $x_{it}$.

3.2. Model specification

The probability that an ordinal score $Y_{it}$ falls within a particular score category can be related to the measured variables $x_{it}$ by a proportional odds model based on cumulative logits (McCullagh and Nelder (1989), chapter 5):

$$\log\left(\frac{\mu_{itk}}{1 - \mu_{itk}}\right) = \beta_{0k} + x_{it}'\beta. \qquad (1)$$

Here, $\mu_{itk} = P(Y_{it} \le k)$ is the cumulative probability for all scores $Y_{it} \le k$, the $\beta_{0k}$ for $k = 1, \ldots, K-1$ are cut points to be estimated from the data and $\beta$ is a vector of model parameters. The cut points ($-\infty < \beta_{01} < \cdots < \beta_{0(K-1)} < \infty$) define the divisions between the ordinal score categories on the cumulative logit scale.

Equation (1) transforms the ordinal scale to a continuous scale based on the linear predictor $x_{it}'\beta$, where the cut points $\beta_{0k}$ define the class boundaries on the continuous scale of measurement. The predicted response $x_{it}'\beta$ is a linear function of the measured variates $x_{it}$ and is a convenient model for relating the ordinal scores to the measured variates. The probability that a sampling unit has a score that is greater than $k$ is $1 - \mu_{itk}$ and, since, from equation (1), a decrease in $x_{it}'\beta$ implies a decrease in $\mu_{itk}$, those attributes that increase the score will have negative regression coefficients, whereas those attributes that decrease the score will have positive regression coefficients. To ensure that higher values of $q_{it}$ indicate higher scores, we choose $q_{it} = -x_{it}'\beta$ to represent the objective score for the $i$th unit at the $t$th time point. The cut points $\beta_{0k}$ give a partition of the objective score $q_{it}$ into intervals $\beta_{0(k-1)} < q_{it} < \beta_{0k}$, where the $k$th interval covers the continuous objective score for the $k$th ordinal score category $Y_{it} = k$. A scale without fixed reference points is difficult to interpret, so if desired the objective score model can simply be rescaled to give predictions on the original scale of the ordinal scores; see Hannah and Quigley (1996).

The objective quality score model with regression parameters that are independent of the assessment occasion and the cut point categories proves most useful for ornamental plant data and for this reason only the proportional odds model is considered here. However, in situations where more complex models may be more applicable, tests of the proportionality assumption are available (Stiger et al., 1999).
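As a rough illustration of the proportional odds model in equation (1), the following Python sketch converts a set of cut points and a value of the linear predictor into the probabilities of the K score categories. The function name, the cut point values and the two values of the linear predictor are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def category_probabilities(cut_points, eta):
    """Return P(Y = k) for k = 1..K under the cumulative logit model.

    cut_points : increasing values (beta_01, ..., beta_0(K-1)); assumed inputs
    eta        : value of the linear predictor x_it' beta
    """
    cut_points = np.asarray(cut_points, dtype=float)
    # Cumulative probabilities mu_k = P(Y <= k) from the inverse logit.
    mu = 1.0 / (1.0 + np.exp(-(cut_points + eta)))
    # Append P(Y <= K) = 1, then difference to get the category probabilities.
    cumulative = np.concatenate([mu, [1.0]])
    return np.diff(cumulative, prepend=0.0)

# Hypothetical cut points for a four-point scale and two linear predictor values.
probs_low = category_probabilities([-1.0, 0.5, 2.0], eta=-1.0)
probs_high = category_probabilities([-1.0, 0.5, 2.0], eta=1.0)

# Expected scores: a larger linear predictor shifts mass to the lower scores.
scores = np.arange(1, 5)
mean_low = float(scores @ probs_low)
mean_high = float(scores @ probs_high)
```

In this sketch the larger value of the linear predictor gives the smaller expected score, which matches the sign convention described above: attributes that increase the score enter with negative coefficients.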

4. Generalized estimating equations

4.1. The generalized estimating equation model

For repeated ordinal score data, the most widely used method for parameter estimation for the proportional odds model that is shown in equation (1), and the method that is described here, is the GEE approach; see for example Hardin and Hilbe (2002). The GEE method, which was originally introduced by Liang and Zeger (1986), avoids strong assumptions about the distribution and the dependence structure of the repeated measures that are required for the full likelihood model by solving multivariate analogues of the quasi-likelihood estimating equations (Wedderburn, 1974). Although the resulting estimates are not maximum likelihood estimates, they do have asymptotic normality and consistency (Liang and Zeger, 1986). The most widely used GEE methods model the effects of the explanatory variables $\beta$ and the correlation between observations $\alpha$, using a working correlation matrix $R$, separately. The matrix $R$ can be constructed by using a priori assumptions about the likely correlation structure, or it can be calculated and updated by using an iterative fitting algorithm, although Pepe and Anderson (1994) have offered some cautionary comments on fitting correlated longitudinal data.

Typically, when inferences on $\beta$ are of prime importance, the most widely used GEE method estimates $\beta$ by solving an estimating equation $q_\beta(\beta; y, \alpha) = 0$, treating the $\alpha$ as nuisance parameters that can be estimated for many common correlation structures by using Pearson residuals $r_{it} = V_{it}^{-1/2}(Y_{it} - \mu_{it})$ and moment estimators (Liang and Zeger, 1986) or, more generally, by solving a second estimating equation $q_\alpha(\alpha; y, \beta) = 0$ in an analogous manner to the method that is used to estimate $\beta$ (Prentice, 1988). The most widely used approach for estimating $\alpha$ is to use a simple function $\sum_i r_{iu} r_{iv}/N$ of the Pearson residuals that can, in principle, be derived directly from the solution of an estimating equation for $\alpha$ (Hardin and Hilbe, 2002). This approach was exemplified for a range of working correlation structures by Lipsitz et al. (1994) and for working independence by Kenward et al. (1994).

Iterative estimation using Pearson residuals to estimate $\alpha$ from the solution of an estimating equation can lead to non-convergence and to estimates of $\alpha$ that are not sensible for the working correlation model (Crowder, 1995), and this has led some researchers to suggest alternatives (Chaganty, 1997; Wang and Carey, 2003). As the models that are required for plant quality score data are often complex (Parsons, 2004) and involve a large number of explanatory variates, model selection and fitting can be problematic and in extreme cases may fail to converge. These difficulties have led us to develop an alternative method of estimating $\alpha$ by minimizing an objective function $Q_\alpha(\alpha; y, \beta)$ (Crowder, 1995), rather than using a simple function of the Pearson residuals. The modified method GEEM that is described here is similar to other GEE methods in that it uses an estimating equation $q_\beta(\beta; y, \alpha) = 0$ to estimate $\beta$. However, it differs in the method that it uses to estimate $\alpha$. Rather than directly solving an additional estimating equation $q_\alpha(\alpha; y, \beta) = 0$ we estimate $\alpha$, in this paper the first-order autoregressive correlation model, by minimizing the logarithm of the determinant of the covariance matrix of the regression parameters at each step of a fitting algorithm. The use of the logarithm of the determinant of the covariance matrix as a suitable objective function $Q_\alpha(\alpha; y, \beta)$ is discussed in Section 4.4. The method for estimating $\beta$ is described in Section 4.2 and the method for estimating $\alpha$ is described in detail in Sections 4.3-4.5.

4.2. Estimation of $\beta$

GEEs for ordinal models redefine a set of $K$ ordinal responses as $K - 1$ binary responses and then fit a proportional odds model for the marginal probabilities by assuming a suitable choice of working correlation matrix for the dependences between the binary responses (Kenward et al., 1994). At each time point, each experimental unit has an ordinal score between 1 and $K$ and these scores can be transformed into a new set of $K - 1$ binary variables by the equations $Z_{itk} = 1$ if $Y_{it} \le k$ or $Z_{itk} = 0$ if $Y_{it} > k$ for $k = 1, \ldots, K - 1$. This transformation gives a data vector $Z_{it} = (Z_{it1}, \ldots, Z_{it(K-1)})$ of $K - 1$ binary variables for each experimental unit at each time point $t$. There are $S$ explanatory variates $x_{it}$ observed at each of the $T$ time points and the complete $T \times S$ matrix of explanatory variates for each sampling unit $i = 1, \ldots, N$ is $X_{0i} = (x_{i1}, x_{i2}, \ldots, x_{iT})'$. The complete data matrix including the cut points and the explanatory variables for the $i$th sampling unit is $X_i = (1_T \otimes I_{K-1}, X_{0i} \otimes 1_{K-1})$. Here, $1_T$ and $1_{K-1}$ are $T$-dimensional and $(K - 1)$-dimensional vectors of unit elements respectively, and $I_{K-1}$ is the $(K - 1)$-dimensional identity matrix.
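The recoding of ordinal scores into cumulative binary indicators and the Kronecker-product layout of the data matrix can be sketched in Python as follows. This is a minimal illustration, assuming NumPy; the function names and the small data set (three occasions, four categories, two covariates) are hypothetical.

```python
import numpy as np

def binary_responses(scores, K):
    """Map T ordinal scores in 1..K to a T x (K-1) array with Z_tk = 1 if Y_t <= k."""
    scores = np.asarray(scores)
    k = np.arange(1, K)                      # k = 1, ..., K-1
    return (scores[:, None] <= k[None, :]).astype(int)

def design_matrix(X0, K):
    """Build X_i = (1_T (x) I_{K-1}, X_0i (x) 1_{K-1}) from the T x S matrix X_0i."""
    X0 = np.asarray(X0, dtype=float)
    T = X0.shape[0]
    cut_block = np.kron(np.ones((T, 1)), np.eye(K - 1))   # one indicator column per cut point
    cov_block = np.kron(X0, np.ones((K - 1, 1)))          # each covariate row repeated K-1 times
    return np.hstack([cut_block, cov_block])

# Hypothetical unit: T = 3 occasions, K = 4 categories, S = 2 covariates.
scores = [4, 3, 1]
X0 = [[0.0, 15.0], [1.0, 20.0], [6.0, 16.0]]
Z = binary_responses(scores, K=4)    # rows: [0 0 0], [0 0 1], [1 1 1]
X = design_matrix(X0, K=4)           # shape (T * (K - 1), (K - 1) + S) = (9, 5)
```

Each row of `X` pairs one cut point indicator with the covariates of the corresponding occasion, so stacking the rows of `Z` gives a response vector that lines up with `X` row by row.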

Let the data vector of binary measurements for the $i$th sampling unit be $Z_i = (Z_{i1}', \ldots, Z_{iT}')'$ and let $E[Z_i] = \mu_i$ where $\mu_i = (\mu_{i1}', \ldots, \mu_{iT}')'$, $\mu_{it} = (\mu_{it1}, \ldots, \mu_{it(K-1)})'$ and $\mu_{itk} = P(Y_{it} \le k)$. Let $\beta_0 = (\beta_{01}, \ldots, \beta_{0(K-1)})$ be a $(K - 1)$-dimensional vector of cut point parameters, let $(\beta_1, \beta_2, \ldots, \beta_S)$ be an $S$-dimensional vector of regression parameters and let $\beta$ denote the combined vector of cut point and regression parameters. Let $g(\cdot)$ represent the logit function. Then equation (1) can be re-expressed as

$$E[Z_i] = g^{-1}(X_i\beta). \qquad (2)$$

Let $D_i = \partial\mu_i(\beta)/\partial\beta$ and $W_i = V_i^{1/2} R_i V_i^{1/2}$, where $V_i^{1/2}$ is a matrix containing the square roots of the variances of the elements of $Z_i$, $\{\mu_{itk}(1 - \mu_{itk})\}^{1/2}$, along the leading diagonal. Let $R_i$ be the matrix of correlations between the elements of $Z_i$. For any given set of explanatory variables, and assuming that the model equations are fully identified, the model parameters of equation (2) can be estimated by iterative reweighted least squares by equating the GEEs $q_\beta(\beta; Z, \alpha)$ to 0 where

$$q_\beta(\beta; Z, \alpha) = \sum_{i=1}^{N} D_i' W_i^{-1} (Z_i - \mu_i). \qquad (3)$$

The covariance matrix of the regression parameters can be shown to be

$$V_\beta = \left(\sum_{i=1}^{N} D_i' W_i^{-1} D_i\right)^{-1} \left\{\sum_{i=1}^{N} D_i' W_i^{-1}\,\mathrm{cov}(Z_i)\,W_i^{-1} D_i\right\} \left(\sum_{i=1}^{N} D_i' W_i^{-1} D_i\right)^{-1}. \qquad (4)$$

An estimate of $V_\beta$ can be found by replacing $\beta$ by its current estimate and by replacing $\mathrm{cov}(Z_i)$ by $V_i^{1/2}(I_T \otimes S)V_i^{1/2}$ along the leading diagonal blocks and $(Z_i - \mu_i)(Z_i - \mu_i)'$ on the off-diagonal blocks (Liang and Zeger, 1986). Equation (4) is a consistent estimator of the variances of the regression parameters regardless of the specification of $W_i$. Equation (3) will produce consistent estimates of the model parameters as $N \to \infty$, even if the covariance structure of $Z_i$ is not correctly specified (Liang and Zeger, 1986). However, for practical values of $N$, the efficiency of estimation of the parameter vector $\beta$ and the reliability of inference can be improved if $R_i$, the 'working' correlation matrix, is chosen to be as close to the true correlation matrix of $Z_i$ as possible.
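The sandwich form of equation (4) can be sketched in a few lines of Python. This is a minimal sketch, assuming NumPy and assuming that the per-unit matrices $D_i$, $W_i$ and an estimate of $\mathrm{cov}(Z_i)$ have already been computed; the function name and the randomly generated inputs are hypothetical and serve only to check the algebra.

```python
import numpy as np

def sandwich_covariance(D_list, W_list, cov_list):
    """Evaluate equation (4) from per-unit matrices D_i, W_i and cov(Z_i)."""
    p = D_list[0].shape[1]
    bread = np.zeros((p, p))
    meat = np.zeros((p, p))
    for D, W, C in zip(D_list, W_list, cov_list):
        WinvD = np.linalg.solve(W, D)        # W_i^{-1} D_i, with W_i symmetric
        bread += D.T @ WinvD                 # D_i' W_i^{-1} D_i
        meat += WinvD.T @ C @ WinvD          # D_i' W_i^{-1} cov(Z_i) W_i^{-1} D_i
    bread_inv = np.linalg.inv(bread)
    return bread_inv @ meat @ bread_inv

# Hypothetical inputs: 5 units, 6 binary responses per unit, 3 parameters.
rng = np.random.default_rng(0)
D_list = [rng.normal(size=(6, 3)) for _ in range(5)]
W_list = []
for _ in range(5):
    A = rng.normal(size=(6, 6))
    W_list.append(A @ A.T + 6.0 * np.eye(6))   # symmetric positive definite

# If cov(Z_i) equals W_i, equation (4) collapses to the inverse of the outer sum.
V_model = sandwich_covariance(D_list, W_list, W_list)
```

Passing the working weights themselves as the covariance estimate is only a sanity check; in practice `cov_list` would hold the estimate of $\mathrm{cov}(Z_i)$ described in the text.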

4.3. Specification of $R_i$

The first stage in estimating $R_i$ is to account for the correlations between the derived binary responses $Z_i$ within each time point. It can be shown (see Kenward et al. (1994)) that the expected correlation between binary variables $Z_{itj}$ and $Z_{itk}$ is given by $\rho_{jk} = \rho_{kj} = \{\exp(\beta_{0j} - \beta_{0k})\}^{1/2}$ where $j < k$. By assumption, the same correlation matrix applies to every set of binary variables at every time point $t$ and can be written as the $(K - 1) \times (K - 1)$-dimensional matrix

$$S = \begin{pmatrix} \rho_{11} & \cdots & \rho_{1(K-1)} \\ \vdots & \ddots & \vdots \\ \rho_{(K-1)1} & \cdots & \rho_{(K-1)(K-1)} \end{pmatrix}.$$

Assuming that the longitudinal model for the repeated observations is continuous, the matrix of correlations between observations on consecutive pairs of vectors of binary observations on the same unit must approach $S$ as the interval between the observations approaches 0. Therefore a natural limiting model for the correlation matrix between consecutive vectors of observations is $\alpha S$ where $0 < \alpha < 1$ is a weighting coefficient that approaches 1 as the interval between the observations decreases. Further, if we assume a first-order autoregressive model for $\alpha$, the correlation matrix between vectors of observations at time points $t_i$ and $t_j$ becomes $\alpha^{|t_i - t_j|} S$. Under these assumptions, a natural model for the working correlation matrix for the set of vectors of observations on the same experimental unit is $R_\alpha = C_\alpha \otimes S$, where $C_\alpha$ is the $(T \times T)$-dimensional autoregressive matrix

$$C_\alpha = \begin{pmatrix} 1 & \alpha^{d_{12}} & \cdots & \alpha^{d_{1T}} \\ \alpha^{d_{21}} & 1 & \cdots & \alpha^{d_{2T}} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha^{d_{T1}} & \cdots & \alpha^{d_{T(T-1)}} & 1 \end{pmatrix}$$

where $d_{ij} = d_{ji} = |t_i - t_j|$. The correlation matrix $R_\alpha$ is a function of a single unknown scalar parameter $\alpha$ and, under this model, the problem of estimating the weighting matrix $W_i$ in equation (3) reduces to the problem of estimating the single unknown parameter $\alpha$ of $R_\alpha$.
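The construction of the working correlation matrix from the cut points, the observation times and the autoregressive parameter can be sketched as follows. This is an illustrative sketch, assuming NumPy and $0 < \alpha < 1$ as in the text; the function names, the cut point values and the unequally spaced times are hypothetical.

```python
import numpy as np

def within_time_correlation(cut_points):
    """S with rho_jk = exp{(beta_0j - beta_0k) / 2} for j < k, symmetric, unit diagonal."""
    b = np.asarray(cut_points, dtype=float)
    # For increasing cut points, (beta_0j - beta_0k) / 2 with j < k equals -|difference| / 2.
    return np.exp(-np.abs(b[:, None] - b[None, :]) / 2.0)

def ar1_matrix(times, alpha):
    """C_alpha with (i, j) element alpha ** |t_i - t_j|; times need not be equally spaced."""
    t = np.asarray(times, dtype=float)
    return alpha ** np.abs(t[:, None] - t[None, :])

def working_correlation(cut_points, times, alpha):
    """R_alpha = C_alpha (x) S."""
    return np.kron(ar1_matrix(times, alpha), within_time_correlation(cut_points))

# Hypothetical values: K = 4 categories, observations at weeks 0, 1 and 3.
cut_points = [-1.0, 0.5, 2.0]
times = [0.0, 1.0, 3.0]
S = within_time_correlation(cut_points)
R = working_correlation(cut_points, times, alpha=0.5)
```

Because the time gaps enter only through $|t_i - t_j|$, the same code covers equally and unequally spaced assessments; the block of `R` linking weeks 0 and 3 is simply $0.5^3 S$ in this example.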

4.4. Estimation of $\alpha$

An alternative to directly formulating and solving an additional estimating equation for $\alpha$ (Hardin and Hilbe, 2002), in an analogous manner to that developed for $\beta$ in Section 4.2, is to minimize an objective function $Q_\alpha(\alpha; Z, \beta)$. $V_\beta$ is a consistent estimator of the variance matrix of the regression parameters and is a function of the unknown correlation parameter $\alpha$ and therefore a natural candidate for an objective function is $\log|V_\beta|$. The determinant $|V_\beta|$ is the generalized variance of $\beta$ and minimizing $\log|V_\beta|$ is equivalent to minimizing the size of the confidence region of the estimated parameter vector $\hat\beta$. The criterion can also be justified by using information theory (Kullback, 1997), as minimizing $|V_\beta|$ is equivalent to maximizing the determinant of the information matrix. To ensure that $R_\alpha$ remains positive definite during model fitting, $\alpha$ is constrained to lie in the interval $(-1, 1)$ by making the transformation $\phi = \log(1 + \alpha) - \log(1 - \alpha)$ for $-1 < \alpha < 1$, thus allowing $\log|V_\beta|$ to be minimized as a function of $\phi$. Using current estimates of the model parameters $\beta_m$ and covariance matrix $V_{\beta_m}$, an updated estimate of $\phi$, after $m$ iterations of the Newton-Raphson method, is given by

$$\phi_{m+1} = \phi_m - \left[\frac{\partial^2\{\log|V_{\beta_m}|\}}{\partial\phi^2}\right]^{-1} \left[\frac{\partial\{\log|V_{\beta_m}|\}}{\partial\phi}\right]_{\phi=\phi_m}. \qquad (5)$$

Expressions for the first and second derivatives of $\log|V_\beta|$ are given in Appendix A. Fitting proceeds by using an initial estimate of $\phi$ and equations (3) and (4) to estimate $\beta$ and $V_\beta$, and an updated estimate of $\phi$ is given by equation (5). The fitting procedure is iterated until convergence is achieved. Estimates of parameters $\beta$ and $\phi$ will be obtained provided that equation (3) has a solution $\hat\beta$ for every $\phi$.
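The alternation between estimating the regression parameters and updating the transformed correlation parameter by a Newton-Raphson step can be sketched schematically in Python. This sketch makes two loud assumptions: the derivatives are approximated by central finite differences rather than by the analytic expressions of Appendix A, and the functions `toy_update_beta` and `toy_log_det_v` are hypothetical stand-ins for solving equation (3) and evaluating the log-determinant of equation (4); they are not the paper's model.

```python
import numpy as np

def newton_step(objective, phi, h=1e-4):
    """One Newton-Raphson update for phi, with derivatives from central differences."""
    f_plus, f_mid, f_minus = objective(phi + h), objective(phi), objective(phi - h)
    first = (f_plus - f_minus) / (2.0 * h)
    second = (f_plus - 2.0 * f_mid + f_minus) / h ** 2
    return phi - first / second

def fit_alternating(update_beta, log_det_v, phi0=0.0, tol=1e-8, max_iter=50):
    """Alternate between estimating beta for fixed phi and updating phi."""
    phi = phi0
    beta = update_beta(phi)
    for _ in range(max_iter):
        phi_new = newton_step(lambda p: log_det_v(beta, p), phi)
        beta = update_beta(phi_new)
        converged = abs(phi_new - phi) < tol
        phi = phi_new
        if converged:
            break
    return beta, phi

# Toy stand-ins (hypothetical): a fixed "estimate" of beta and a smooth convex
# objective with its minimum at phi = 1.2, used only to exercise the loop.
def toy_update_beta(phi):
    return np.array([2.0])

def toy_log_det_v(beta, phi):
    u = phi - 1.2
    return -5.0 + u ** 2 + 0.1 * u ** 4

beta_hat, phi_hat = fit_alternating(toy_update_beta, toy_log_det_v)
# Back-transform to the correlation scale, which always lies in (-1, 1).
alpha_hat = (np.exp(phi_hat) - 1.0) / (np.exp(phi_hat) + 1.0)
```

Working on the transformed scale means that the Newton-Raphson step is unconstrained, while the back-transformed correlation estimate stays inside the admissible interval.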

The difference between the method that is proposed here and other GEE methods can be clarified by noting that equation (5) is the Newton-Raphson method for solving the estimating equation $\partial\{\log|V_\beta(\phi)|\}/\partial\phi = 0$. Therefore, it is evident that our method uses the same estimating equation as other GEE methods for estimating $\beta$, but it uses a different estimating equation for the correlation parameter $\alpha$. The Newton-Raphson method converges quadratically near a root, making it a good choice for a function such as $\log|V_\beta|$ whose derivatives can be evaluated efficiently for the correlation structure of Section 4.3. The strong convergence property of the Newton-Raphson method (Lange, 2004) and the desirable model robustness properties of $V_\beta$ (Kauermann and Carroll, 2001) make this method of estimating $\alpha$ particularly suitable for modelling complex and variable data from biological systems.

4.5. Variance of $\hat\phi$

An approximate standard error for the estimated transformed correlation parameter $\hat\phi$ can be found by using a method based on maximum likelihood (Lindsey (1996), chapter 3). Let $\log|V_\beta(\phi)|$ be a continuous function on the interval $(-\infty, \infty)$ and let $\log|V_\beta(\phi)|$ have a minimum $\hat\phi$ in this interval. Then a second-order approximation for $\log|V_\beta(\phi)|$ at $\hat\phi$ is given by

$$\log|V_\beta(\phi)| \approx \log|V_\beta(\hat\phi)| + \frac{1}{2}(\phi - \hat\phi)^2 \left[\frac{\partial^2\{\log|V_\beta(\phi)|\}}{\partial\phi^2}\right]_{\phi=\hat\phi}.$$

Thus the function $|M(\phi)|$, where $M(\phi) = V_\beta^{-1}(\phi)$ and $|V_\beta(\phi)|^{-1} = |V_\beta^{-1}(\phi)|$, can be approximated by a normal distribution function with mean $\hat\phi$ and standard deviation $s$, where $s^2 = 1/[\partial^2\{\log|V_\beta(\phi)|\}/\partial\phi^2]_{\phi=\hat\phi}$, giving

$$|M(\phi)| \approx k \exp\{-(\phi - \hat\phi)^2/2s^2\}.$$

An approximate standard error for the parameter $\hat\phi$ is given by

$$s = \left[\frac{\partial^2\{\log|V_\beta(\phi)|\}}{\partial\phi^2}\right]^{-1/2}_{\phi=\hat\phi}$$

and confidence intervals for $\phi$ can be calculated if we assume an approximate normal distribution for $\hat\phi$. Confidence intervals for $\hat\alpha$ can be obtained by back-transforming the interval boundaries for $\hat\phi$ to the original scale of $\alpha$ by using the transformation

$$\alpha = \frac{\exp(\phi) - 1}{\exp(\phi) + 1}.$$
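The transformation, its inverse and the back-transformed confidence interval can be sketched in Python as follows. The function names are illustrative, and the numerical values of the estimate and of the second derivative are hypothetical inputs chosen only to show the calculation; the multiplier 1.96 assumes a nominal 95% interval under the approximate normality described above.

```python
import math

def alpha_to_phi(alpha):
    """phi = log(1 + alpha) - log(1 - alpha), defined for -1 < alpha < 1."""
    return math.log(1.0 + alpha) - math.log(1.0 - alpha)

def phi_to_alpha(phi):
    """Inverse transformation alpha = {exp(phi) - 1} / {exp(phi) + 1}."""
    return (math.exp(phi) - 1.0) / (math.exp(phi) + 1.0)

def alpha_confidence_interval(phi_hat, second_derivative, z=1.96):
    """Back-transform a normal-theory interval for phi to the scale of alpha.

    second_derivative is the curvature of log|V_beta(phi)| at phi_hat, so that
    the approximate standard error is s = second_derivative ** (-1/2).
    """
    s = second_derivative ** -0.5
    return phi_to_alpha(phi_hat - z * s), phi_to_alpha(phi_hat + z * s)

# Hypothetical values: an estimate corresponding to alpha = 0.5 and a curvature of 25.
phi_hat = alpha_to_phi(0.5)
lower, upper = alpha_confidence_interval(phi_hat, second_derivative=25.0)
```

The resulting interval is asymmetric about the point estimate on the scale of the correlation, but its end points cannot fall outside the interval (-1, 1).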

5. Simulation study

The efficiencies of parameter estimation of the three GEE methods

(a) GEEMR1, the modified GEE method that was described in Section 4,
(b) GEEAR1, a conventional GEE method using the model residuals to estimate $\alpha$ as

$$\sum_{i,k,t} r_{itk}\, r_{i(t+1)k} / \{N(K - 1)(T - 1)\}$$

(see Hardin and Hilbe (2002), chapter 3), and
(c) GEEI, assumed working independence for the correlation model,

were compared by using a simulation study. A repeated binary measurement data generator was used to produce simulated data by using the algorithm of Park et al. (1996). 1000 repeated ordinal score data sets were generated with autoregressive correlation parameters $\alpha = 0.5$ and $\alpha = 0.7$, cut point parameters given by $\beta_{01} =$ , $\beta_{02} = 1$ and $\beta_{03} = 5$, and a regression parameter given by $\beta = 2$, representing the effect of a within-cluster (time-varying) covariate. Each of the 1000 simulations comprised 50 sampling units, with three observations (at equally spaced time points) per unit. Table 2 shows mean estimates and nominal 95% confidence intervals, based on the parameter estimates at convergence for 1000 simulations, for $\beta$, $\beta_{01}$, $\beta_{02}$, $\beta_{03}$, $\log|V_\beta|$ and $\alpha$ for each of the three methods for each correlation parameter.
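The residual-based estimator used by the conventional method GEEAR1 is a simple average of lag-one products, which the following Python sketch makes explicit. This is an illustration only, assuming NumPy: the function name is hypothetical, and the residuals are standardized Gaussian values with a first-order autoregressive structure, used as stand-ins for Pearson residuals rather than generated by the algorithm of Park et al. (1996).

```python
import numpy as np

def lag_one_alpha(residuals):
    """Average of r_itk * r_i(t+1)k over units i, occasions t and cut points k.

    residuals : array of shape (N, T, K-1) holding standardized residuals r_itk.
    """
    r = np.asarray(residuals, dtype=float)
    N, T, Km1 = r.shape
    products = r[:, :-1, :] * r[:, 1:, :]       # pairs of consecutive occasions
    return products.sum() / (N * Km1 * (T - 1))

# Hypothetical stand-in residuals with lag-one correlation 0.5.
rng = np.random.default_rng(1)
N, T, Km1, rho = 5000, 3, 3, 0.5
r = np.empty((N, T, Km1))
r[:, 0, :] = rng.normal(size=(N, Km1))
for t in range(1, T):
    noise = rng.normal(size=(N, Km1))
    r[:, t, :] = rho * r[:, t - 1, :] + np.sqrt(1.0 - rho ** 2) * noise

alpha_hat = lag_one_alpha(r)
```

Nothing in this moment estimator constrains the result to the interval (-1, 1), which is one practical difference from the transformed parameterization of Section 4.4.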

For both correlation values and for all three methods, the regression parameter estimates were close to their true values and the confidence intervals included the true model parameters, confirming that equation (3) produces consistent estimates of the model parameters, even when the working correlation matrix is not correctly specified, as was the case for method GEEI. Similarly, for methods GEEMR1 and GEEAR1, the estimates of $\alpha$ were close to their true values and the confidence intervals included the true model parameters. As reported in similar simulation studies elsewhere, Table 2 shows that, for both choices of correlation parameter, the variance of the parameter estimates (as measured by $\log|V_\beta|$) was smaller for methods GEEMR1 and GEEAR1, which specifically estimated the correlation parameter, than for method GEEI, which did not. There were no significant differences in values of $\log|V_\beta|$ between methods GEEMR1 and GEEAR1 for either correlation, indicating that the estimation methods were similarly efficient. The simulation study also showed that there was good agreement between the actual precision and the estimated precision for the regression parameter $\beta$ (the data are not shown), indicating that the estimated standard errors were indeed reflecting the true precision of the regression parameter estimates.

For $\alpha = 0.5$, both method GEEMR1 and method GEEAR1 converged in a small number of iterations (5 or fewer) for the overwhelming majority of the 1000 simulations. However, a small number of the simulations proved problematic for both methods, with slow convergence that required 7 or more iterations for 11 and 30 of the 1000 simulations for methods GEEMR1 and GEEAR1 respectively. The significantly larger number of simulations that showed slow convergence for the GEEAR1 method compared with the GEEMR1 method is evidence that the GEEMR1 method has better convergence properties than the GEEAR1 method. The difference in the relative convergence rates of the two methods suggests that, for very large data sets or for very large simulation studies, the improved speed of convergence of the modified method would be substantial. Large scale simulation studies are beyond the scope of this paper but these limited studies support the theoretical advantages of the Newton-Raphson iteration algorithm that is used in the GEEMR1 method compared with the GEEAR1 method. Where the assumptions of a first-order autoregressive model for R are fully valid, information theoretic arguments (Kullback, 1997) strongly support the use of the GEEMR1 method compared with GEEAR1.

Table 2. Means and nominal 95% confidence intervals, based on 1000 simulations, for $\beta$, $\beta_{01}$, $\beta_{02}$, $\beta_{03}$, $\alpha$ and $\log|V_\beta|$ for correlation parameters 0.5 and 0.7, using methods GEEMR1, GEEAR1 and GEEI

Parameter           GEEMR1                            GEEAR1                            GEEI
                    Mean (Lower, Upper)               Mean (Lower, Upper)               Mean (Lower, Upper)

$\alpha = 0.5$
$\beta$             2.005 (1.985, 2.026)              2.014 (1.993, 2.035)              2.010 (1.990, 2.029)
$\beta_{01}$        0.496 (0.489, 0.503)              0.499 (0.492, 0.507)              0.497 (0.489, 0.504)
$\beta_{02}$        1.002 (0.988, 1.015)              0.991 (0.978, 1.005)              1.000 (0.987, 1.014)
$\beta_{03}$        5.031 (4.943, 5.119)              5.040 (4.950, 5.131)              5.028 (4.938, 5.119)
$\alpha$            0.505 (0.498, 0.512)              0.500 (0.495, 0.504)
$\log|V_\beta|$     -12.965 (-12.990, -12.940)        -12.955 (-12.980, -12.931)        -12.871 (-12.897, -12.844)

$\alpha = 0.7$
$\beta$             1.997 (1.980, 2.015)              1.993 (1.976, 2.009)              1.995 (1.978, 2.011)
$\beta_{01}$        0.501 (0.493, 0.509)              0.498 (0.490, 0.506)              0.497 (0.489, 0.504)
$\beta_{02}$        0.999 (0.983, 1.015)              0.993 (0.978, 1.008)              0.995 (0.981, 1.010)
$\beta_{03}$        4.993 (4.900, 5.088)              5.014 (4.925, 5.106)              5.030 (4.934, 5.128)
$\alpha$            0.700 (0.693, 0.705)              0.704 (0.700, 0.708)
$\log|V_\beta|$     -12.821 (-12.845, -12.798)        -12.821 (-12.845, -12.798)        -12.643 (-12.670, -12.616)

6. Analysis of ornamental plant data

Models relating plant quality during home-life to a range of measured attributes were fitted to data from plant species begonia and poinsettia separately. Model fitting proceeded using the GenStat procedure GEEORDINAL (which is available from the authors at http://www2.warwick.ac.uk/fac/sci/statistics/staff/research/parsons/mgee); the procedure is a modified version of the standard GenStat statistical software procedure GEE (Kenward and Smith, 1995; GenStat, 2000). The method of model selection initially included all main effects and all two-factor treatment interaction effects and then discarded terms if the corresponding Wald statistic (Rotnitzky and Jewell, 1990) indicated a lack of significance. Table 3 shows the estimates of $\beta$ that were based on the optimal model found by this method of selection using the GEEMR1 algorithm, and standard errors based on the robust covariance matrix of equation (4), for both poinsettia and begonia data. Table 3 also shows estimates of $\beta$ for the optimal model using the GEEI algorithm, the working independence model (setting $\alpha = 0$ throughout model fitting), as a comparison. Estimation of model parameters was not sensitive to the initial value of $\phi$, and it was found that using an initial value of $\phi = 0.8$ gave convergent model parameters (the maximum relative change for any of the regression parameters was less than $10^{-3}$) after nine and 11 cycles of the fitting algorithm, for begonia and poinsettia respectively. Values of $\log|V_\beta(\hat\phi)|$ and $\partial^2\{\log|V_\beta(\phi)|\}/\partial\phi^2|_{\phi=\hat\phi}$ at convergence were -35.985 and 3.917 for poinsettia and -70.913 and 11.733 for begonia.
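As a rough illustration of the term-by-term screening described above, a Wald statistic for a single coefficient can be formed from an estimate and its robust standard error. The sketch below assumes the simplest one-degree-of-freedom case with a chi-squared reference distribution; the generalized Wald tests of Rotnitzky and Jewell (1990) cover multi-parameter hypotheses and are not reproduced here. The function name `wald_single_term` is hypothetical, and the numbers are the GEEMR1 estimate and standard error of the bract edge necrosis by bract colour interaction in Table 3.

```python
import math

def wald_single_term(estimate, se):
    """Wald statistic and chi-squared (1 d.f.) p-value for one coefficient.

    Assumes a single parameter tested against zero with a robust standard error.
    """
    w = (estimate / se) ** 2
    # For 1 d.f., P(chi-squared > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value

# Interaction term for poinsettia (GEEMR1 column of Table 3).
w, p_value = wald_single_term(-1.262, 0.460)
```

Under these assumptions a term would be retained when the p-value falls below the chosen significance level; the level that the authors used is not stated in this section.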


Table 3. Parameter estimates and standard errors se, using algorithms GEEMR1 and GEEI, for models relating plant attributes to expert quality assessments for poinsettia and begonia

Model parameter                           GEEMR1              GEEI
                                          Estimate  se        Estimate  se

Poinsettia
$\beta_{01}$                              -2.020    0.222     -2.132    0.294
$\beta_{02}$                              0.072     0.228     -0.045    0.290
$\beta_{03}$                              3.229     0.332     3.252     0.369
Cyathia drop                              -4.524    0.345     -4.773    0.379
Bract drop                                0.060     0.012     0.117     0.024
Bract edge necrosis                       0.871     0.163     0.989     0.203
Foliar colour                             1.466     0.188     1.694     0.272
Bract colour                              1.707     0.458     2.291     0.569
Bract edge necrosis × Bract colour        -1.262    0.460     -2.081    0.538
$\phi$                                    1.263     0.505

Begonia
$\beta_{01}$                              -4.428    0.301     -4.336    0.321
$\beta_{02}$                              -1.470    0.250     -0.994    0.274
$\beta_{03}$                              1.025     0.197     1.862     0.218
Double-flower count                       -0.031    0.010     -0.072    0.011
Single-flower count                       -0.084    0.013     -0.108    0.014
Flower drop                               0.264     0.030     0.311     0.034
Bud drop                                  0.270     0.035     0.285     0.042
Foliar colour                             3.063     0.177     3.719     0.191
Damaged flower count                      0.105     0.032     0.105     0.034
Flower drop × Bud drop                    -0.026    0.004     -0.031    0.005
$\phi$                                    0.552     0.292

The estimated autoregressive correlation of $\hat\alpha = 0.559$, for poinsettia, was relatively large for method GEEMR1 and resulted in this method giving significantly smaller estimates for the standard errors of all the regression parameters than did method GEEI. Fig. 1 shows a plot of $\log|V_\beta|$ calculated over a range of values of $\phi$, using the model regression parameter estimates at convergence, and also a second-order approximation to this curve (see Section 4.5). The approximation was realistic over the central range of values and suggested an approximate standard error for the correlation parameter $\phi$ of $s = 0.505$. A 95% confidence interval for $\alpha$ was given by (0.136, 0.810) and therefore there is good evidence of a positive autoregressive correlation between observations for poinsettia. The estimated autoregressive correlation for begonia ($\hat\alpha = 0.269$) was relatively modest and the approximation for the standard error of $\hat\phi$ was good (the data are not shown) and suggested an approximate standard error for the correlation parameter $\phi$ of $s = 0.292$.
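The interval calculation described in this paragraph can be retraced numerically. The sketch below is a minimal illustration rather than the authors' code: it assumes a normal quantile of 1.96 and uses the rounded values of $\hat\phi$ and $s$ quoted for poinsettia; the function names are hypothetical.

```python
import math

def phi_from_alpha(alpha):
    """Transformed scale: phi = log(1 + alpha) - log(1 - alpha)."""
    return math.log(1.0 + alpha) - math.log(1.0 - alpha)

def alpha_from_phi(phi):
    """Back-transformation: alpha = (exp(phi) - 1) / (exp(phi) + 1)."""
    return (math.exp(phi) - 1.0) / (math.exp(phi) + 1.0)

def se_from_curvature(curvature):
    """Approximate standard error of phi_hat from the second derivative of log|V_beta|."""
    return 1.0 / math.sqrt(curvature)

def alpha_interval(phi_hat, s, z=1.96):
    """Normal-theory interval for phi, back-transformed to the alpha scale."""
    return alpha_from_phi(phi_hat - z * s), alpha_from_phi(phi_hat + z * s)

# Poinsettia: curvature 3.917 at convergence, phi_hat = 1.263 and s = 0.505 as reported.
s_poinsettia = se_from_curvature(3.917)
lower, upper = alpha_interval(1.263, 0.505)
```

With these inputs the curvature of 3.917 reproduces $s = 0.505$ to three decimal places and the back-transformed limits round to the reported interval (0.136, 0.810). Because the transformation is monotone, the interval on the $\alpha$ scale is asymmetric about $\hat\alpha$ and always lies inside $(-1, 1)$.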

The parameter estimates in Table 3 represent the effects of a unit change in the respective variates on plant quality, expressed on a cumulative logit scale. For poinsettia, the model coefficients showed that, as the plant lost cyathia, the overall plant quality declined whereas high bract drop, the presence of bract edge necrosis, pale upper canopy leaves and pale bracts all reduced overall plant quality. The effects of bract edge necrosis and pale bracts on reduced plant quality were less than additive when both symptoms were observed together. This was realistic, as we would expect the effect of bract edge necrosis on plant quality to be less marked for pale bracts (probably near the end of home-life) than for bright red bracts (at the start of home-life). Similarly, for begonia, the model coefficients showed that high double- and single-flower counts improved overall plant quality whereas high flower drop, bud drop, the presence of pale upper canopy leaves and a high damaged flower count all reduced overall plant quality. The effects of bud and flower drops on reduced plant quality were less than additive.

Fig. 1. $\log|V_\beta|$ (•) versus $\phi$, for values of $\alpha$ in the range [0.2, 0.8], and a second-order approximation (-) to this curve, for the poinsettia data (horizontal axis: correlation parameter ($\phi$))

In general, the parameter estimates appeared to agree with the expectations of the assessor for both crops. The measured attributes that were recorded covered the range of plant features that have previously been reported to be important for quality retention during home-life for poinsettia (Miller and Heins, 1983; Scott et al., 1983; Nell and Barrett, 1986) and begonia (Larson, 1980; Hoyer, 1985) and it is unsurprising that these characteristics also proved important in this study. However, the models that are described here are the first to indicate the relative importance of each of the measured plant attributes to overall plant quality. This work has important implications for growers in suggesting changes in current methods of pot-plant production, as many of the post-production physiological aspects of potted plants are known to be influenced by production factors (Nielsen and Starkey, 1999). Thus, a grower could choose to modify the production environment to optimize only those aspects of physiology that determine the quality of pot-plants during post-production retailing and home-life.

Using the parameter estimates from Table 3, objective quality scores $q_{it}$ can be determined for each pot-plant and at each occasion during home-life. The objective quality score is fully determined from a range of readily measurable plant attributes and can be used to quantify how changes in the measured attribute data affect perceived plant quality. The effects of experimental treatments on home-life quality can then be determined by a simple analysis of variance of a number of derived variables (e.g. mean quality during home-life and rate of quality decline during home-life) calculated as functions of the time course of objective quality scores $q_i = (q_{i1}, q_{i2}, \ldots, q_{iT})$ for each pot-plant.
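The two derived variables mentioned as examples can be computed from a time course of scores as follows. This is a sketch under stated assumptions: the rate of decline is taken here to be the least-squares slope of score against occasion, which is one plausible reading and not necessarily the authors' definition, and the scores are hypothetical.

```python
def mean_quality(scores):
    """Mean objective quality score over the recorded occasions."""
    return sum(scores) / len(scores)

def decline_rate(scores, times=None):
    """Least-squares slope of score against time (negative values mean declining quality)."""
    if times is None:
        times = list(range(1, len(scores) + 1))  # equally spaced occasions 1..T
    t_bar = sum(times) / len(times)
    q_bar = mean_quality(scores)
    sxy = sum((t - t_bar) * (q - q_bar) for t, q in zip(times, scores))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx

# Hypothetical time course q_i for one pot-plant (illustrative values only).
q_i = [4.0, 3.0, 2.0, 1.0]
summary = (mean_quality(q_i), decline_rate(q_i))
```

Each pot-plant then contributes one value per derived variable, which is the form of data that a standard analysis of variance of treatment effects requires.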

7. Discussion

We have presented a methodology for the analysis of longitudinal ordinal score data that relates the scores to a range of underlying measured variates assuming a model that allows successive observations on the same experimental unit to be positively correlated over time. Our method relates the score data to the underlying measured variates by using a modified GEE methodology that is based on a proportional odds model for the repeated ordinal scores. The working correlation structure that we developed uses only a single autoregressive model parameter and this gives a model that is relatively simple to estimate and interpret. Our experience with this model is that our fitting algorithm has good convergence properties and is reasonably fast and robust against the initial choice of the correlation parameter. For this method to be applicable, $\log|V_\beta|$ must have a minimum on the interval $0 < \alpha < 1$, but otherwise we expect the algorithm to converge in all situations, provided that $V_\beta$ is positive definite at each iteration, which is also a necessary condition for the working independence model. The approximate standard error that we have suggested for the autoregressive correlation parameter depends on the appropriateness of the quadratic approximation for $\log|V_\beta(\phi)|$ at $\hat\phi$. In our example, the transformation $\phi = \log(1 + \alpha) - \log(1 - \alpha)$ gave an approximate confidence region for $\alpha$ showing that there was good evidence of a positive correlation between successive observations on the same experimental unit.

The simulation study showed that the GEEMR1 method gave consistent estimates of the model regression parameters and the correlation parameter. The efficiency of estimation was at least as good as the more conventional method (GEEAR1), which uses the model residuals to estimate the correlation parameter directly, and was significantly better than the working independence method (GEEI). Our experience with real plant quality data suggests that the GEEMR1 method converges more reliably than the GEEAR1 method and there was some evidence to support this observation from the significantly lower number of simulations that converged particularly slowly, for the simple models of Section 5, for method GEEMR1 compared with method GEEAR1. We think that convergence is more likely to be affected by anomalous values of the Pearson residuals for the GEEAR1 method than for the GEEMR1 method; therefore we recommend the use of the modified GEEMR1 method for routine analysis of real data sets. A comprehensive review and simulation study would be useful to explore the reliability and speed of convergence of all available GEE methods for ordinal data. However, only practical experience with real data sets is likely to show the relative advantages of the two methods and it would be valuable to see the two methods compared in practice on real and varied data sets.

The specification of $R_i$ in Section 4.3 is a simple working correlation matrix for ordinal data with an assumed first-order autoregressive correlation structure that does not appear to have been much used in the current context. However, we think that the specification is natural and useful for data of this type. There was some gain in efficiency of estimation of the parameters of equation (2) when using the working correlation structure of Section 4.3 relative to the independence model and we expect that the potential gains in efficiency could be much larger for data sets with stronger correlations between successive observations. Another possible reason for studying the correlation structure of repeated observations is for the planning and design of future research. If successive repeated observations are strongly correlated then the recording interval may be too short and the recording effort may not be making the best use of available resources. Observations with short recording intervals do not give full information if the autocorrelation is high and a more efficient strategy may be to record more experimental units but with fewer and more widely spaced observations. In our example, we found that weekly recording intervals gave small but significant autocorrelations of 0.559 and 0.269 for poinsettia and begonia respectively, and we think that more frequent recording would have given little extra useful information. Knowledge of the correlation structure of the data has helped to confirm that our sampling interval of 1 week was about right for these particular crops. Although estimation of the correlation parameter helped to confirm our choice of sampling interval, it should be emphasized that the limiting value of $\hat\alpha$, even if it exists, may depend on the form that is chosen for R, and thus may not be a well-defined quantity associated with an underlying stochastic model.

We expect that the autoregressive correlation structure and estimation method that we have discussed will have applicability in a range of problems. Assessment of the adequacy of a fitted GEE model is problematic because there is no likelihood function on which to base goodness-of-fit tests. Goodness-of-fit tests have been proposed for GEE modelling (Barnhart and Williamson, 1998), but their application is non-trivial, particularly for complex regression problems. Model selection criteria based on quasi-likelihood have also been suggested for GEEs (Pan, 2001). Further work in assessing the goodness of fit of GEE models with correlated data structures would be valuable.

Acknowledgements

This research work was carried out while the first author was employed at Warwick Horticulture Research International and supported by Horticulture LINK, project Hort 194. The authors are grateful to the Associate Editor and two referees for their helpful comments and suggestions.

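Appendix A: Expressions for $\partial\{\log|V_\beta(\phi)|\}/\partial\phi$ and $\partial^2\{\log|V_\beta(\phi)|\}/\partial\phi^2$

Expressions for the first and second derivatives of $\log|V_\beta|$ are given as follows. From equation (4) let $V_\beta = H^{-1} G H^{-1}$, where

$$H = \sum_{i=1}^{N} D_i' W_i^{-1} D_i$$

and

$$G = \sum_{i=1}^{N} D_i' W_i^{-1} \operatorname{cov}(Z_i) W_i^{-1} D_i,$$

and let the logarithm of the determinant of $V_\beta$ be written as $\log|V_\beta| = \log|G| - 2\log|H|$. Assuming that $H$ and $G$ are of full rank and $\lambda_1, \ldots, \lambda_L$ and $\eta_1, \ldots, \eta_L$ are the eigenvalues of $H$ and $G$ respectively, then $|H| = \lambda_1 \cdots \lambda_L$ and $|G| = \eta_1 \cdots \eta_L$. The first derivative of $\log|V_\beta|$ with respect to $\alpha$ is

$$\frac{\partial}{\partial\alpha}\log|V_\beta| = \frac{\partial}{\partial\alpha}\log|G| - 2\frac{\partial}{\partial\alpha}\log|H|$$

where

$$\frac{\partial}{\partial\alpha}\log|H| = \sum_{l} \frac{1}{\lambda_l}\frac{\partial\lambda_l}{\partial\alpha}$$

and

$$\frac{\partial}{\partial\alpha}\log|G| = \sum_{l} \frac{1}{\eta_l}\frac{\partial\eta_l}{\partial\alpha}.$$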


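The second derivative of $\log|V_\beta|$ is

$$\frac{\partial^2}{\partial\alpha^2}\log|V_\beta| = \frac{\partial^2}{\partial\alpha^2}\log|G| - 2\frac{\partial^2}{\partial\alpha^2}\log|H|$$

where

$$\frac{\partial^2}{\partial\alpha^2}\log|H| = \sum_{l}\left\{\frac{1}{\lambda_l}\frac{\partial^2\lambda_l}{\partial\alpha^2} - \frac{1}{\lambda_l^2}\left(\frac{\partial\lambda_l}{\partial\alpha}\right)^2\right\}$$

and

$$\frac{\partial^2}{\partial\alpha^2}\log|G| = \sum_{l}\left\{\frac{1}{\eta_l}\frac{\partial^2\eta_l}{\partial\alpha^2} - \frac{1}{\eta_l^2}\left(\frac{\partial\eta_l}{\partial\alpha}\right)^2\right\}.$$

Derivatives on the transformed scale, $\alpha = \{\exp(\phi) - 1\}/\{\exp(\phi) + 1\}$, are given by

$$\frac{\partial}{\partial\phi}\log|V_\beta| = \frac{\partial\alpha}{\partial\phi}\,\frac{\partial}{\partial\alpha}\log|V_\beta|$$

and

$$\frac{\partial^2}{\partial\phi^2}\log|V_\beta| = \frac{\partial^2\alpha}{\partial\phi^2}\,\frac{\partial}{\partial\alpha}\log|V_\beta| + \left(\frac{\partial\alpha}{\partial\phi}\right)^2\frac{\partial^2}{\partial\alpha^2}\log|V_\beta|,$$

where

$$\frac{\partial\alpha}{\partial\phi} = \frac{2\exp(\phi)}{\{1 + \exp(\phi)\}^2}$$

and

$$\frac{\partial^2\alpha}{\partial\phi^2} = \frac{2\exp(\phi)\{1 - \exp(\phi)\}}{\{1 + \exp(\phi)\}^3}.$$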
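The change of scale from $\alpha$ to $\phi$ is an application of the chain rule and can be checked numerically. In the sketch below the function names are hypothetical and the objective $f(\alpha) = \alpha^2$ is only a stand-in for $\log|V_\beta|$, chosen because its derivatives are known exactly.

```python
import math

def alpha_of_phi(phi):
    """alpha = (exp(phi) - 1) / (exp(phi) + 1)."""
    return (math.exp(phi) - 1.0) / (math.exp(phi) + 1.0)

def dalpha_dphi(phi):
    """First derivative of alpha with respect to phi."""
    e = math.exp(phi)
    return 2.0 * e / (1.0 + e) ** 2

def d2alpha_dphi2(phi):
    """Second derivative of alpha with respect to phi."""
    e = math.exp(phi)
    return 2.0 * e * (1.0 - e) / (1.0 + e) ** 3

def to_phi_scale(phi, d1_alpha, d2_alpha):
    """Convert first and second derivatives taken in alpha into derivatives in phi."""
    a1, a2 = dalpha_dphi(phi), d2alpha_dphi2(phi)
    return a1 * d1_alpha, a2 * d1_alpha + a1 ** 2 * d2_alpha

# Stand-in objective f(alpha) = alpha^2, so f' = 2 alpha and f'' = 2.
phi = 1.263
alpha = alpha_of_phi(phi)
d1_phi, d2_phi = to_phi_scale(phi, 2.0 * alpha, 2.0)
```

Comparing `d1_phi` and `d2_phi` with central finite differences of $f\{\alpha(\phi)\}$ is a quick way to catch sign errors in the two derivative formulas.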

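The first derivative of $\lambda_l$ is given by differentiating the expression $H p_l = \lambda_l p_l$ with respect to $\alpha$, where $p_l$ is the $l$th eigenvector of $H$, to give

$$(H - I\lambda_l)\frac{\partial p_l}{\partial\alpha} = \left(I\frac{\partial\lambda_l}{\partial\alpha} - \frac{\partial H}{\partial\alpha}\right)p_l.$$

Since $p_l'(H - I\lambda_l) = 0$, it follows that

$$\frac{\partial\lambda_l}{\partial\alpha} = p_l'\frac{\partial H}{\partial\alpha}p_l.$$

Differentiating $\lambda_l'$ again with respect to $\alpha$ gives

$$\frac{\partial^2\lambda_l}{\partial\alpha^2} = p_l'\frac{\partial^2 H}{\partial\alpha^2}p_l + \frac{\partial p_l'}{\partial\alpha}\frac{\partial H}{\partial\alpha}p_l + p_l'\frac{\partial H}{\partial\alpha}\frac{\partial p_l}{\partial\alpha}. \qquad (6)$$

Since

$$P(\Lambda - I\lambda_l)P'\frac{\partial p_l}{\partial\alpha} = \left(I\frac{\partial\lambda_l}{\partial\alpha} - \frac{\partial H}{\partial\alpha}\right)p_l,$$

where $P$ is the matrix of eigenvectors of $H$ and $\Lambda$ is the corresponding diagonal matrix of eigenvalues, and since $p_l'\,\partial p_l/\partial\alpha = 0$ because of the orthogonality of $p_l$ and $\partial p_l/\partial\alpha$, it follows that a generalized inverse solution for $\partial p_l/\partial\alpha$ is

$$\frac{\partial p_l}{\partial\alpha} = P E_l P'\left(I\frac{\partial\lambda_l}{\partial\alpha} - \frac{\partial H}{\partial\alpha}\right)p_l$$

where $E_l$ is a diagonal matrix with elements $e_r = 1/(\lambda_r - \lambda_l)$ if $r \neq l$ and $e_l$ is arbitrary.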


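Substitution into equation (6) gives

$$\frac{\partial^2\lambda_l}{\partial\alpha^2} = p_l'\frac{\partial^2 H}{\partial\alpha^2}p_l + p_l'\left(I\frac{\partial\lambda_l}{\partial\alpha} - \frac{\partial H}{\partial\alpha}\right)P E_l P'\frac{\partial H}{\partial\alpha}p_l + p_l'\frac{\partial H}{\partial\alpha}P E_l P'\left(I\frac{\partial\lambda_l}{\partial\alpha} - \frac{\partial H}{\partial\alpha}\right)p_l,$$

and, since $p_l' P E_l P' = e_l p_l'$ and $P E_l P' = \sum_r p_r p_r' e_r$, this can be expressed as

$$\frac{\partial^2\lambda_l}{\partial\alpha^2} = p_l'\frac{\partial^2 H}{\partial\alpha^2}p_l + 2e_l\left(\frac{\partial\lambda_l}{\partial\alpha}\right)^2 - 2p_l'\frac{\partial H}{\partial\alpha}\left(e_l p_l p_l' + \sum_{r\neq l}\frac{p_r p_r'}{\lambda_r - \lambda_l}\right)\frac{\partial H}{\partial\alpha}p_l.$$

Eliminating the arbitrary $e_l$ gives the explicit solution

$$\frac{\partial^2\lambda_l}{\partial\alpha^2} = p_l'\left(\frac{\partial^2 H}{\partial\alpha^2} - 2\frac{\partial H}{\partial\alpha}F_l\frac{\partial H}{\partial\alpha}\right)p_l$$

where

$$F_l = \sum_{r\neq l}\frac{p_r p_r'}{\lambda_r - \lambda_l}.$$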
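The eigenvalue derivative formulas, and the derivatives of $\log|H|$ built from them, can be checked on a small example. The sketch below is illustrative only: the matrices `A0`, `A1` and `A2` defining a toy $H(\alpha) = A_0 + \alpha A_1 + \alpha^2 A_2$ are invented for the check and have nothing to do with the GEE quantities, and the formulas assume that the eigenvalues are distinct.

```python
import numpy as np

def eigen_derivatives(H, dH, d2H):
    """Eigenvalues of symmetric H with their first and second derivatives."""
    lam, P = np.linalg.eigh(H)          # columns of P are the eigenvectors p_l
    d1 = np.empty_like(lam)
    d2 = np.empty_like(lam)
    for l in range(lam.size):
        p = P[:, l]
        d1[l] = p @ dH @ p              # first derivative: p_l' (dH) p_l
        # F_l = sum over r != l of p_r p_r' / (lambda_r - lambda_l)
        F = np.zeros_like(H)
        for r in range(lam.size):
            if r != l:
                F += np.outer(P[:, r], P[:, r]) / (lam[r] - lam[l])
        d2[l] = p @ (d2H - 2.0 * dH @ F @ dH) @ p
    return lam, d1, d2

def logdet_derivatives(H, dH, d2H):
    """First and second derivatives of log|H| assembled from eigenvalue derivatives."""
    lam, d1, d2 = eigen_derivatives(H, dH, d2H)
    return np.sum(d1 / lam), np.sum(d2 / lam - (d1 / lam) ** 2)

# Toy symmetric positive definite H(alpha); values are arbitrary.
A0 = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.5], [0.0, 0.5, 2.0]])
A1 = np.array([[1.0, 0.2, 0.1], [0.2, 0.5, 0.0], [0.1, 0.0, 0.3]])
A2 = np.array([[0.3, 0.0, 0.1], [0.0, 0.2, 0.1], [0.1, 0.1, 0.4]])

def H_of(alpha):
    return A0 + alpha * A1 + alpha ** 2 * A2

alpha = 0.5
first, second = logdet_derivatives(H_of(alpha), A1 + 2.0 * alpha * A2, 2.0 * A2)
```

The same two functions apply to $G$ with $q_l$ and $\eta_l$ in place of $p_l$ and $\lambda_l$. Near-equal eigenvalues make $F_l$ unstable, so a practical implementation would need to guard against that case.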

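Expressions for the first and second derivatives of $\eta_l$, $\eta_l'$ and $\eta_l''$, can readily be derived in an analogous way to that described above and are given by

$$\frac{\partial\eta_l}{\partial\alpha} = q_l'\frac{\partial G}{\partial\alpha}q_l$$

and

$$\frac{\partial^2\eta_l}{\partial\alpha^2} = q_l'\left(\frac{\partial^2 G}{\partial\alpha^2} - 2\frac{\partial G}{\partial\alpha}F_l\frac{\partial G}{\partial\alpha}\right)q_l,$$

where

$$F_l = \sum_{r\neq l}\frac{q_r q_r'}{\eta_r - \eta_l}$$

and $q_l$ is the $l$th eigenvector of $G$. The first and second derivatives of $H$ are given by

$$\frac{\partial H}{\partial\alpha} = \sum_{i=1}^{N} D_i' V_i^{-1/2}\left(\frac{\partial C^{-1}}{\partial\alpha}\otimes S^{-1}\right)V_i^{-1/2}D_i$$

and

$$\frac{\partial^2 H}{\partial\alpha^2} = \sum_{i=1}^{N} D_i' V_i^{-1/2}\left(\frac{\partial^2 C^{-1}}{\partial\alpha^2}\otimes S^{-1}\right)V_i^{-1/2}D_i,$$

and the first and second derivatives of $G$ by

$$\frac{\partial G}{\partial\alpha} = \sum_{i=1}^{N} D_i' V_i^{-1/2}\left(\frac{\partial C^{-1}}{\partial\alpha}\otimes S^{-1}\right)V_i^{-1/2}\operatorname{cov}(Z_i)V_i^{-1/2}(C^{-1}\otimes S^{-1})V_i^{-1/2}D_i + \sum_{i=1}^{N} D_i' V_i^{-1/2}(C^{-1}\otimes S^{-1})V_i^{-1/2}\operatorname{cov}(Z_i)V_i^{-1/2}\left(\frac{\partial C^{-1}}{\partial\alpha}\otimes S^{-1}\right)V_i^{-1/2}D_i$$

and

$$\frac{\partial^2 G}{\partial\alpha^2} = \sum_{i=1}^{N} D_i' V_i^{-1/2}\left(\frac{\partial^2 C^{-1}}{\partial\alpha^2}\otimes S^{-1}\right)V_i^{-1/2}\operatorname{cov}(Z_i)V_i^{-1/2}(C^{-1}\otimes S^{-1})V_i^{-1/2}D_i + 2\sum_{i=1}^{N} D_i' V_i^{-1/2}\left(\frac{\partial C^{-1}}{\partial\alpha}\otimes S^{-1}\right)V_i^{-1/2}\operatorname{cov}(Z_i)V_i^{-1/2}\left(\frac{\partial C^{-1}}{\partial\alpha}\otimes S^{-1}\right)V_i^{-1/2}D_i + \sum_{i=1}^{N} D_i' V_i^{-1/2}(C^{-1}\otimes S^{-1})V_i^{-1/2}\operatorname{cov}(Z_i)V_i^{-1/2}\left(\frac{\partial^2 C^{-1}}{\partial\alpha^2}\otimes S^{-1}\right)V_i^{-1/2}D_i$$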


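where

$$C^{-1} = \begin{pmatrix}
\dfrac{1}{1-\alpha^{2d_{21}}} & \dfrac{-\alpha^{d_{12}}}{1-\alpha^{2d_{12}}} & 0 & \cdots & 0 \\
\dfrac{-\alpha^{d_{21}}}{1-\alpha^{2d_{21}}} & \dfrac{1-\alpha^{2d_{31}}}{(1-\alpha^{2d_{21}})(1-\alpha^{2d_{32}})} & \dfrac{-\alpha^{d_{23}}}{1-\alpha^{2d_{23}}} & & \vdots \\
0 & \dfrac{-\alpha^{d_{32}}}{1-\alpha^{2d_{32}}} & \ddots & \ddots & 0 \\
\vdots & & \ddots & \dfrac{1-\alpha^{2d_{T,T-2}}}{(1-\alpha^{2d_{T-1,T-2}})(1-\alpha^{2d_{T,T-1}})} & \dfrac{-\alpha^{d_{T-1,T}}}{1-\alpha^{2d_{T-1,T}}} \\
0 & \cdots & 0 & \dfrac{-\alpha^{d_{T,T-1}}}{1-\alpha^{2d_{T,T-1}}} & \dfrac{1}{1-\alpha^{2d_{T,T-1}}}
\end{pmatrix}$$

and the first and second derivatives of $C^{-1}$ with respect to $\alpha$ follow by differentiation of the expressions in each cell of the matrix. For the most common case of equally spaced observations $d_{i,i-n} = d_{i-n,i} = n$, and the inverse matrix is

$$C^{-1} = \frac{1}{1-\alpha^2}\begin{pmatrix}
1 & -\alpha & 0 & \cdots & 0 \\
-\alpha & 1+\alpha^2 & -\alpha & & \vdots \\
0 & -\alpha & \ddots & \ddots & 0 \\
\vdots & & \ddots & 1+\alpha^2 & -\alpha \\
0 & \cdots & 0 & -\alpha & 1
\end{pmatrix}$$

with derivatives given by

$$\frac{\partial C^{-1}}{\partial\alpha} = \frac{1}{(1-\alpha^2)^2}\begin{pmatrix}
2\alpha & -(1+\alpha^2) & 0 & \cdots & 0 \\
-(1+\alpha^2) & 4\alpha & -(1+\alpha^2) & & \vdots \\
0 & -(1+\alpha^2) & \ddots & \ddots & 0 \\
\vdots & & \ddots & 4\alpha & -(1+\alpha^2) \\
0 & \cdots & 0 & -(1+\alpha^2) & 2\alpha
\end{pmatrix}$$

and

$$\frac{\partial^2 C^{-1}}{\partial\alpha^2} = \frac{2}{(1-\alpha^2)^3}\begin{pmatrix}
1+3\alpha^2 & -\alpha(3+\alpha^2) & 0 & \cdots & 0 \\
-\alpha(3+\alpha^2) & 2(1+3\alpha^2) & -\alpha(3+\alpha^2) & & \vdots \\
0 & -\alpha(3+\alpha^2) & \ddots & \ddots & 0 \\
\vdots & & \ddots & 2(1+3\alpha^2) & -\alpha(3+\alpha^2) \\
0 & \cdots & 0 & -\alpha(3+\alpha^2) & 1+3\alpha^2
\end{pmatrix}.$$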
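For equally spaced observations the three tridiagonal matrices above are easy to build and to verify. The following sketch assumes that $C$ has elements $\alpha^{|t-t'|}$; the helper names are hypothetical.

```python
import numpy as np

def ar1_corr(alpha, T):
    """First-order autoregressive correlation matrix with elements alpha^|t - t'|."""
    idx = np.arange(T)
    return alpha ** np.abs(idx[:, None] - idx[None, :])

def tridiag(diag_end, diag_mid, off, T):
    """Symmetric tridiagonal matrix with distinct corner and interior diagonal values."""
    M = np.zeros((T, T))
    np.fill_diagonal(M, diag_mid)
    M[0, 0] = M[-1, -1] = diag_end
    i = np.arange(T - 1)
    M[i, i + 1] = M[i + 1, i] = off
    return M

def ar1_inverse(alpha, T):
    """Closed-form inverse of the equally spaced AR(1) correlation matrix."""
    return tridiag(1.0, 1.0 + alpha ** 2, -alpha, T) / (1.0 - alpha ** 2)

def ar1_inverse_d1(alpha, T):
    """First derivative of the inverse with respect to alpha."""
    return tridiag(2.0 * alpha, 4.0 * alpha, -(1.0 + alpha ** 2), T) / (1.0 - alpha ** 2) ** 2

def ar1_inverse_d2(alpha, T):
    """Second derivative of the inverse with respect to alpha."""
    a2 = alpha ** 2
    core = tridiag(1.0 + 3.0 * a2, 2.0 * (1.0 + 3.0 * a2), -alpha * (3.0 + a2), T)
    return 2.0 * core / (1.0 - a2) ** 3

# Check the closed form against the definition for T = 4 occasions.
alpha, T = 0.559, 4
residual = ar1_inverse(alpha, T) @ ar1_corr(alpha, T) - np.eye(T)
```

Working with the closed forms avoids a numerical matrix inversion at every iteration, which is presumably why the appendix states them explicitly.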

References

Barnhart, H. X. and Williamson, J. M. (1998) Goodness-of-fit tests for GEE modeling with binary responses. Biometrics, 54, 720-729.
Chaganty, N. R. (1997) An alternative approach to the analysis of longitudinal data via generalized estimating equations. J. Statist. Planng Inf., 63, 39-54.
Conover, C. A. (1986) Quality. Acta Hort., 181, 201-205.
Crowder, M. (1995) On the use of a working correlation matrix in using generalised linear models for repeated measures. Biometrika, 82, 407-410.
GenStat (2000) Genstat for Windows, Release 4.2, 5th edn. Oxford: VSN International.
Girard, P. and Parent, E. (2001) Bayesian analysis of autocorrelated ordered categorical data for industrial quality monitoring. Technometrics, 43, 180-191.
Glonek, G. F. V. and McCullagh, P. (1995) Multivariate logistic models. J. R. Statist. Soc. B, 57, 533-546.
Hannah, M. and Quigley, P. (1996) Presentation of ordinal regression analysis on the original scale. Biometrics, 52, 771-775.
Hardin, J. W. and Hilbe, J. M. (2002) Generalized Estimating Equations. New York: Chapman and Hall.
Hoyer, L. (1985) Bud and flower drop in Begonia elatior Sirene caused by ethylene and darkness. Acta Hort., 167, 387-391.
Kauermann, G. and Carroll, R. J. (2001) A note on the efficiency of sandwich covariance matrix estimation. J. Am. Statist. Ass., 96, 1387-1396.
Kenward, M. G., Lesaffre, E. and Molenberghs, G. (1994) An application of maximum likelihood and generalized estimating equations to the analysis of ordinal data from a longitudinal study with cases missing at random. Biometrics, 50, 945-954.
Kenward, M. G. and Smith, D. M. (1995) Computing the generalized estimating equations for repeated ordinal measurements. Genstat Newslett., 32, 63-70.
Kullback, S. (1997) Information Theory and Statistics. New York: Dover Publications.
Lange, K. (2004) Optimization. New York: Springer.
Larson, R. A. (1980) Begonias. In Introduction to Floriculture. New York: Academic Press.
Liang, K.-Y. and Zeger, S. L. (1986) Longitudinal data analysis using generalized linear models. Biometrika, 73, 13-22.
Lindsey, J. K. (1996) Parametric Statistical Inference. Oxford: Oxford University Press.
Lipsitz, S. R., Kim, K. and Zhao, L. (1994) Analysis of repeated categorical data using generalized estimating equations. Statist. Med., 13, 1149-1163.
McCullagh, P. (1980) Regression models for ordinal data (with discussion). J. R. Statist. Soc. B, 42, 109-142.
McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models, 2nd edn. London: Chapman and Hall.
Miller, M. E., Davis, C. S. and Landis, J. R. (1993) The analysis of longitudinal polytomous data: generalized estimating equations and connections with weighted least squares. Biometrics, 49, 1033-1044.
Miller, S. and Heins, R. D. (1983) Variation in cyathia abscission of poinsettia cultivars in a greenhouse and a simulated postharvest environment. HortScience, 21, 270-272.
Molenberghs, G. and Lesaffre, E. (1994) Marginal modeling of correlated ordinal data using a multivariate Plackett distribution. J. Am. Statist. Ass., 89, 633-644.
Nell, T. A. and Barrett, J. E. (1986) Growth and incidence of bract necrosis in Gutbier V-14 Glory poinsettia. J. Am. Soc. Hort. Sci., 111, 266-269.
Nielsen, B. and Starkey, K. R. (1999) Influence of production factors on postharvest life of potted roses. Postharv. Biol. Technol., 16, 157-167.
Pan, W. (2001) Akaike's information criterion in generalized estimating equations. Biometrics, 57, 120-125.
Park, C. G., Park, T. and Shin, D. (1996) A simple method for generating correlated binary variates. Am. Statistn, 50, 306-310.
Parsons, N. R. (2004) Statistical methods for improving pot-plant quality and robustness. PhD Thesis. Queen Mary, University of London, London.
Pepe, M. S. and Anderson, G. L. (1994) A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. Communs Statist. Simuln, 29, 939-951.
Prentice, R. L. (1988) Correlated binary regression with covariates specific to each binary observation. Biometrics, 44, 1033-1048.
Rotnitzky, A. and Jewell, N. P. (1990) Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data. Biometrika, 77, 485-497.
Scott, L. F., Blessington, T. M. and Price, J. A. (1983) Postharvest effects of temperature, dark storage duration, and sleeving on quality retention of 'Gutbier V-14 Glory' poinsettia. HortScience, 18, 749-750.
Stiger, T. R., Barnhart, H. X. and Williamson, J. M. (1999) Testing proportionality in the proportional odds model fitted with GEE. Statist. Med., 18, 1419-1433.
Sutradhar, B. C. and Das, K. (2000) On the accuracy of efficiency of estimating equation approach. Biometrics, 56, 622-625.
Wang, Y.-G. and Carey, V. (2003) Working correlation structure misspecification, estimation and covariate design: implications for generalized estimating equations performance. Biometrika, 90, 29-41.
Wedderburn, R. W. M. (1974) Quasi-likelihood functions, generalized linear models and the Gauss-Newton method. Biometrika, 61, 439-447.
Williams, M. H., Rosenqvist, E. and Buchhave, M. (2000) The effect of reducing water availability on the post-production quality of potted miniature roses (Rosa × hybrida). Postharv. Biol. Technol., 18, 143-150.