
Meaningful change in cancer-specific quality of life scores: Differences between improvement and worsening

David Cella, Elizabeth A. Hahn & Kelly Dineen
Evanston Northwestern Healthcare and Northwestern University, Evanston, Illinois, USA (E-mail: d-cella@nwu.edu)

Accepted in revised form 19 December 2001

Abstract

Introduction: There has been increased recent attention to the clinical meaningfulness of group change scores on health-related quality of life (HRQL) questionnaires. It has been assumed that improvements and declines of comparable magnitude have the same meaning or value. Method: We assessed 308 cancer patients with the Functional Assessment of Cancer Therapy (FACT) and a Global Rating of Change. Patients were classified into five levels of change in HRQL and its dimensions based upon their responses to retrospective ratings of change after 2 months: sizably worse, minimally worse, no change, minimally better, and sizably better. Raw score and standardized score changes on the FACT-G subscales and total score were then compared across different categories of patient-rated change. Results: The relationship between actual FACT change scores and retrospective ratings of change was modest but usually statistically significant (r: 0.07 to 0.35). Change scores associated with each retrospective rating category were evaluated to determine estimates of meaningful difference. Patients who reported global worsening of HRQL dimensions had considerably larger change scores than those reporting comparable global improvements. Although related to a ceiling effect, this remained true even after removing cases that began near the ceiling of the questionnaire. Discussion: Relatively small gains in HRQL have significant value. Comparable declines may be less meaningful, perhaps due to patients’ tendency to minimize personal negative evaluations about one’s condition. This has important implications for the interpretation of the meaningfulness of change scores in HRQL questionnaires. Factors such as adaptation to disease, response shift, dispositional optimism and the need for signs of clinical improvement may be contributing to the results and should be investigated in future studies.

Keywords: Clinical significance, Functional Assessment of Cancer Therapy (FACT), Meaningful change, Responsiveness

Introduction

Health-related quality of life (HRQL) is an important factor in rational decision-making regarding treatment options. The increased interest in identifying meaningful change in HRQL reflects an emerging emphasis on the evaluation of meaningful outcomes to the patient [1]. As patients assume more active roles in their treatment, they have become increasingly concerned with the impact of treatment upon their lives. Patients value the provision of HRQL information along with other relevant outcome data as they make important treatment decisions. Interpretable HRQL data from standardized questionnaires will assist clinicians in delivering this information reliably.

HRQL is a multidimensional construct that encompasses physical, mental and social health domains [2–7]. Several HRQL questionnaires have been validated for generic and disease-specific assessment. An important next step in the maturation of these questionnaires is to determine a meaningful improvement or decline in scores, to help assess the value of a given treatment.

Quality of Life Research 11: 207–221, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.


Regardless of the type of HRQL measure used, the ability to determine clinically meaningful change is essential, and the patient is the final arbiter. For this reason, we adhere to the essential underpinning of the McMaster approach to determining a minimally important difference. Specifically, if a change score on a given questionnaire can be calibrated to a global rating offered by a patient regarding change in a given dimension, then that calibrated change score can be considered as an ‘anchor’ for subjectively meaningful change.

Available literature suggests or at least implies that the significance of a change score is independent of the direction of change. It seems to have been assumed that the value of HRQL improvement is equivalent to the value of HRQL decline. Clinical experience challenges this assumption. Specifically, in oncology treatment it has been observed that small improvements in HRQL can be very meaningful to patients who are looking for any indication of response to treatment. On the other hand, when HRQL declines during cancer treatment, patients have been observed to adaptively minimize negative attributions regarding the decline. In summary, it appears that patients can endure rather large negative changes in their quality of life for the purpose of potential gains in the future. Likewise, patients are often heartened by what would be perceived from a purely statistical viewpoint as small positive changes.

If there is a difference in the meaning or value patients place upon change in HRQL, depending upon whether that change is positive or negative, it should be measurable with a properly designed study. To determine whether meaningfulness of change depends upon direction of change, we examined calibrations of change scores to global change ratings separately by direction of change. Previous researchers using similar methods typically combine improved with worsened scores into single groups according to the magnitude, or absolute difference, of globally rated change [8–9]. Yet there is reason from the cognitive and health psychology literature to expect that improvements would be more highly valued than comparable declines.

Clinical vs. statistical significance

Until fairly recently there has been a tendency for statistical significance testing to dominate the reporting of findings, especially in clinical trials [10]. Often, clinical trials have large numbers of subjects. As Kraemer [11] cogently points out, one can find a statistically significant effect in almost any study simply by increasing the sample size [12]. However, finding a statistically significant change does not tell us if the change is clinically significant or subjectively meaningful. Problems arise when trying to derive clinical meaning from statistically significant results. While several authors discuss methods for evaluating the reliability of change scores (i.e., making some adjustment for measurement error), these methods do not provide the best way to determine the meaningfulness of change [13–15]. Published articles usually do not provide enough information about the sample to allow for a more comprehensive understanding of significant findings. Also, the over-reliance on the use of hypothesis testing and measures of significance leads many to disregard other important findings such as response patterns, degrees of effect, and the time course of effect [16, 17]. So, although tests of statistical significance can reveal change, they do not indicate whether the measured change is meaningful to the average patient.
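The sample-size point can be illustrated with a short sketch. It assumes a paired t-test on change scores, for which the test statistic equals the standardized mean change (mean change divided by the SD of the change scores) multiplied by the square root of n. The function name and the example value of 0.1 are hypothetical and are not taken from the study.

```python
import math


def paired_t_statistic(standardized_change: float, n: int) -> float:
    """t statistic of a paired t-test, given mean change / SD of change and n pairs."""
    return standardized_change * math.sqrt(n)


# Hypothetical example: the same small standardized change (0.1) at growing n.
for n in (25, 100, 400, 1600):
    t = paired_t_statistic(0.1, n)
    # Two-sided p-value from a normal approximation (an assumption; the exact
    # test would use the t distribution with n - 1 degrees of freedom).
    p = math.erfc(abs(t) / math.sqrt(2))
    print(f"n={n:5d}  t={t:.2f}  approx p={p:.4f}")
```

The standardized change is identical in every row; only the sample size moves the p-value, which is why significance alone says little about meaningfulness.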

Lydick and Epstein [18] provide a summary of approaches to the study of clinical meaningfulness, dividing them into two categories: distribution-based or anchor-based interpretations. Distribution-based definitions are based on statistical distributions, e.g., effect size measures [12], the reliable change index [15, 19–21], and other measures using means and SDs obtained from research studies or reference populations. Anchor-based interpretations of clinical meaningfulness are defined as measures that compare, or anchor, HRQL changes to other clinical changes, similar to the methods used to evaluate the construct validity of a measure. The most commonly used anchor-based measure is the global assessment originally described by Jaeschke et al. [8]. Originally referred to as the minimal clinically important difference, the now-preferred term ‘minimally important difference’ (MID) is defined as: ‘the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management’ [8, p. 408].


They developed a brief, two-part scale in which patients are first asked to make global ratings of changes (GRC) in specified domains. For example, in a study of HRQL in chronic respiratory disease, they asked patients whether or not their shortness of breath was worse, about the same, or better. Based upon their response to this first question, patients then used a seven-point scale to rate how much better or worse they were. Preliminary data suggested that estimates of meaningful change based upon clinical experience closely corresponded with change using this method.

Although much work is being done in this area, a recent review of the literature suggests that there is a lack of consensus on how responsiveness (sensitivity to clinical change) should be quantified [22]. Kazis et al. [23] described several different ways effect sizes can be used to provide a clearer understanding of change in scores. In general the authors felt that because there is no accepted notion of clinically meaningful change, effect sizes could be compared to Cohen’s [12] general guidelines to supplement widely used tests of statistical significance. They also suggested that HRQL effect sizes might differ somewhat from Cohen’s benchmarks, which were not derived from HRQL measures.

Osoba et al. [9] applied an anchor-based approach to address the issue of ‘subjective significance’ in cancer patients. Using this approach, the authors compared changes over time on the European Organization for Research and Treatment of Cancer (EORTC) quality of life questionnaire (QLQ-C30) with a newly developed subjective significance questionnaire (SSQ). They examined effect sizes (mean change scores divided by the SD of subjects at baseline) and compared these effect sizes with patient-rated level of change (e.g., no change, moderate change, very much change). They found that the magnitudes of change scores increased in the expected directions.

In examining the approaches, we chose to conduct our study using an adaptation of the McMaster method of determining magnitudes of clinically meaningful difference scores. Ideally, the anchor, in this case the patients’ GRC in each of the measured dimensions, should correlate (r ≥ 0.5) with actual change score in that dimension. We then evaluated whether the magnitude of the anchor (GRC) could be a useful criterion for meaningful change, and specifically whether the direction of the GRC (better vs. worse health) made a difference in the magnitude of the change score.
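The anchor-based calibration described above amounts to averaging questionnaire change scores within each global-rating category while keeping the direction of change separate. A minimal sketch follows; the function name, category labels, and data are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from statistics import mean


def mean_change_by_anchor(records):
    """Average questionnaire change score within each global-rating category.

    `records` is an iterable of (anchor_category, change_score) pairs.
    """
    grouped = defaultdict(list)
    for category, change in records:
        grouped[category].append(change)
    return {category: mean(values) for category, values in grouped.items()}


# Hypothetical data for illustration only (not study results).
records = [
    ("worse", -6.0), ("worse", -4.0),
    ("no change", 0.0), ("no change", 1.0),
    ("better", 2.0), ("better", 1.0),
]
means = mean_change_by_anchor(records)
# Keeping direction separate lets the two magnitudes be compared directly.
print(abs(means["worse"]), abs(means["better"]))
```

Pooling the "worse" and "better" groups by absolute magnitude, as earlier studies did, would hide any asymmetry between the two directions.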

Methods

Participants

Patients were recruited from an urban academic medical center. Possible participants were identified by chart review and physician referral. Participants must have met these criteria: (1) 18 years of age or older, (2) able to read and speak English, (3) diagnosis of breast, colorectal, head and neck, lung or prostate cancer (any stage of disease), (4) absence of brain metastasis, delirium, psychosis, or severe depression, and (5) signed informed consent. Patients could be in active treatment or follow-up. A total of 308 patients were enrolled in the study. Approximately half of the enrolled patients were female, with a mean age of 58.8 years, and one-fourth were ethnic minorities (Table 1). Nearly one-third had a diagnosis of lung cancer and the remaining diagnoses included breast cancer (25%), colorectal cancer (19%), prostate cancer (14%) or head-and-neck cancer (9%). One-fourth of the patients had no current evidence of disease, and nearly 40% had metastatic disease.

Measures

Version 3 of the functional assessment of cancer therapy-general (FACT-G)
The FACT-G was given along with the disease-specific subscale fitting the patient’s cancer diagnosis. The FACT-G, developed empirically using interviews with cancer patients and providers, was designed to be brief, yet sensitive to changes in HRQL. Internal consistency, score stability, and validity are well-documented elsewhere [24–29]. The scored items in the FACT-G employ a Likert-type format (0 = ‘not at all’ to 4 = ‘very much’). The items assess four dimensions of HRQL: physical well-being (PWB) (seven items, range: 0–28), social/family well-being (SWB) (seven items, range: 0–28), emotional well-being (EWB) (six items, range: 0–24), and functional well-being (FWB) (seven items, range: 0–28). Although the disease-specific subscales were also administered for five solid tumors (lung, breast, colorectal, prostate, head and neck), only FACT-G data are presented here.
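The subscale ranges quoted above follow directly from the item counts and the 0–4 response format. The sketch below reproduces that arithmetic. It assumes that the FACT-G total is the sum of the four listed subscales, which agrees with the baseline means in Table 1. It does not implement the official scoring algorithm, and the constant and function names are illustrative.

```python
# Item counts per FACT-G subscale as stated in the text; each item is scored 0-4.
ITEMS_PER_SUBSCALE = {"PWB": 7, "SWB": 7, "EWB": 6, "FWB": 7}
MAX_ITEM_SCORE = 4


def subscale_range(subscale: str) -> tuple:
    """Return the (minimum, maximum) possible score for a subscale."""
    return (0, ITEMS_PER_SUBSCALE[subscale] * MAX_ITEM_SCORE)


def total_range() -> tuple:
    """Return the (minimum, maximum) possible total, assuming a simple sum."""
    return (0, sum(ITEMS_PER_SUBSCALE.values()) * MAX_ITEM_SCORE)


for name in ITEMS_PER_SUBSCALE:
    print(name, subscale_range(name))
print("FACT-G", total_range())
```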

Table 1. Patient characteristics

| Characteristic | Total, N = 308 (%) | Completed^a, N = 197 (%) | Not completed^a, N = 111 (%) | p-Value^b |
|---|---|---|---|---|
| Female gender | 157 (51.0) | 99 (50.2) | 58 (52.2) | 0.736 |
| Mean age (±SD) | 58.8 (±12.1) | 58.4 (±12.3) | 59.4 (±11.7) | 0.502 |
| Diagnosis | | | | |
| Breast | 76 (24.7) | 52 (26.4) | 24 (21.6) | |
| Colon | 59 (19.1) | 40 (20.3) | 19 (17.1) | |
| Head & neck | 29 (9.4) | 19 (9.8) | 10 (9.0) | |
| Prostate | 43 (14.0) | 33 (16.8) | 10 (9.0) | |
| Lung | 101 (32.8) | 53 (26.9) | 48 (43.2) | 0.042 |
| Race/ethnicity | | | | |
| White non-Hispanic | 228 (74.5) | 152 (77.6) | 76 (69.1) | |
| Black non-Hispanic | 62 (20.3) | 35 (17.9) | 27 (24.6) | |
| Hispanic | 10 (3.3) | 5 (2.6) | 5 (4.6) | |
| Other | 6 (1.9) | 4 (2.0) | 2 (1.8) | 0.375 |
| Marital status | | | | |
| Single | 33 (10.7) | 22 (11.2) | 11 (9.9) | |
| Married | 211 (68.7) | 139 (70.9) | 72 (64.9) | |
| Separated | 6 (1.9) | 3 (1.5) | 3 (2.7) | |
| Divorced | 29 (9.4) | 18 (9.2) | 11 (9.9) | |
| Widowed | 28 (9.1) | 14 (7.4) | 14 (12.6) | 0.500 |
| Living status | | | | |
| Alone | 44 (14.3) | 30 (15.2) | 14 (12.6) | |
| With other adults (no children) | 190 (61.7) | 120 (60.9) | 70 (63.0) | |
| With other adults and children | 67 (21.8) | 43 (21.8) | 24 (21.6) | |
| With children only | 6 (1.9) | 3 (1.5) | 3 (2.7) | |
| Institution/retirement home | 1 (0.3) | 1 (0.5) | 0 (0.0) | 0.833 |
| Education level | | | | |
| < High school degree | 35 (11.4) | 20 (10.2) | 15 (13.8) | |
| = High school degree | 153 (50.0) | 94 (47.7) | 59 (54.1) | |
| > High school degree | 118 (38.6) | 83 (42.1) | 35 (32.1) | 0.201 |
| Patient rated ECOG PSR | | | | |
| Normal activity | 119 (38.8) | 85 (43.2) | 34 (30.9) | |
| Some symptoms | 111 (36.2) | 70 (35.5) | 41 (37.3) | |
| <50% Daytime in bed | 60 (19.5) | 34 (17.3) | 26 (23.6) | |
| >50% Daytime in bed/bedridden | 17 (5.5) | 8 (4.0) | 9 (8.18) | 0.093 |
| Extent of disease | | | | |
| No evidence of disease | 78 (26.9) | 59 (32.2) | 19 (17.8) | |
| Local disease | 50 (17.2) | 29 (15.8) | 21 (19.7) | |
| Regional spread | 47 (16.2) | 31 (16.9) | 16 (15.0) | |
| Distant metastases | 115 (39.7) | 64 (35.0) | 51 (47.7) | 0.033 |
| FACT subscale scores (±SD) | | | | |
| PWB | 21.2 (±6.2) | 21.9 (±6.2) | 20.1 (±6.1) | 0.019 |
| SWB | 22.3 (±4.8) | 22.4 (±4.6) | 22.1 (±5.1) | 0.598 |
| EWB | 18.1 (±4.5) | 18.3 (±4.1) | 17.5 (±5.0) | 0.158 |
| FWB | 18.8 (±6.4) | 19.3 (±6.3) | 17.9 (±6.5) | 0.500 |
| FACT-G | 80.4 (±15.9) | 81.9 (±15.8) | 77.7 (±15.8) | 0.023 |

Values in the table refer to the number (percent) of patients, unless otherwise specified. ECOG PSR – Eastern Cooperative Oncology Group performance status rating; PWB – physical well-being; SWB – social/family well-being; EWB – emotional well-being; FWB – functional well-being; FACT-G – total.
a Completed: patient completed both assessments; not completed: patient completed only the baseline assessment.
b p-Value for comparison between the two groups.


Global rating of change (GRC)
The GRC scale was used as the criterion against which actual change scores could be compared and calibrated. To align GRC ratings to the subscales of the FACT, each FACT dimension (PWB, FWB, EWB, SWB, and total HRQL) was briefly summarized, and then patients rated whether they experienced no change, a worsening, or an improvement in that area. If they experienced a worsening or an improvement, they rated the degree of change on a seven-point scale. Originally, the two questions on the GRC were combined to define a 15-point scale ranging from −7 (a great deal worse) through 0 (no change) to +7 (a great deal better). Across the five GRCs (n = 972 responses), there were 241 responses (25%) where patients circled ‘about the same’ for the first question, and then ‘1 = almost the same, hardly any better/worse at all’. We considered these to be indications of insignificant change, and coded them accordingly. Forty-eight responses (5%) were uninterpretable because they were ambiguous. For example, some patients first circled ‘about the same’ as opposed to the other choices (‘better’ or ‘worse’), and then rated change on the 1–7 scale using responses from 2 (a little better/worse) to 5 (a good deal better/worse). Since it was unclear whether the change was better or worse, we deleted these responses from the analyses. From the 15 response categories we derived five distinct patient groups for each subscale and total score: (1) sizably worse (−5, −6, −7), (2) minimally worse (−2, −3, −4), (3) no change (−1, 0, +1), (4) minimally better (+2, +3, +4), and (5) sizably better (+5, +6, +7).
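The collapse of the 15-point GRC scale into five groups can be written as a small lookup. This is a sketch of the grouping rule stated above; the function name is hypothetical, and the handling of ambiguous responses (which the authors deleted) is not modeled.

```python
def grc_category(score: int) -> str:
    """Map a combined GRC score (-7..+7) to one of the five change groups."""
    if not -7 <= score <= 7:
        raise ValueError("GRC score must lie between -7 and +7")
    if score <= -5:
        return "sizably worse"
    if score <= -2:
        return "minimally worse"
    if score <= 1:
        return "no change"
    if score <= 4:
        return "minimally better"
    return "sizably better"


for s in (-7, -3, -1, 0, 1, 2, 6):
    print(s, grc_category(s))
```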

Performance status rating scale (PSR) [30]
This is a single item (five-level) performance rating scale designed to be completed by either the patient or the health provider. It taps the degree to which symptoms from disease or treatment interfere with activity level. It is frequently used in cancer research and provides a somewhat coarse indication of physical activity level.

FACT utility ladder
The FACT utility ladder asks patients to rate their PWB, EWB, SWB, FWB, and overall quality of life on a 0 (the same as being dead) to 10 (the best possible health) scale [31]. Patients were presented with a visual aid of a ladder representing the gradations from 0 to 10, and asked to place a mark where they would rate their current functioning. This scale was not used in the current analyses.

Procedure

Patients were recruited as they waited for scheduled outpatient appointments. After written informed consent was obtained, the interviewer gathered demographic and clinical data, including PSR. The patient then completed the FACT (with appropriate disease-specific subscales), and the FACT utility ladder interview, in that order. The interviewer was available during this time to offer any needed assistance. Two to three months following baseline assessment, the patient completed (in order): the FACT with disease-specific subscale, the FACT utility ladder interview, and the GRC scale. The GRC was administered last so its completion would not exert influence upon the completion of the FACT questionnaire. Patients completed assessments in clinic during a visit that corresponded with their 2–3 month window; in cases where the patient was not scheduled to come to clinic during that time, mail surveys were conducted with telephone follow-up to assist compliance and data completeness.

Five variables were examined for change over time: four FACT subscale scores (PWB, EWB, SWB, and FWB) and the FACT-G total score. Time 1 to Time 2 change scores were computed as the mean change in score per subscale (change from baseline) and tested for statistical significance using a paired t-test. Analysis of variance techniques were used to compare mean change scores across the five GRC groups and to compare the absolute value of the mean change for ‘sizably better’ vs. ‘sizably worse’, and ‘minimally better’ vs. ‘minimally worse.’ In addition to computation of change scores, effect sizes were computed by dividing the mean change by the SD of the baseline mean score for all patients who completed both assessments [12]. Spearman rank correlation coefficients between FACT scores and GRC scores were examined, and 95% CI were obtained to evaluate the precision of the estimates.
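The effect size definition above is a single division, shown here as a minimal sketch. The function name is illustrative; the input values are the PWB raw-score figures reported in Table 5 (baseline SD of 6.18 for all 182 cases).

```python
def effect_size(mean_change: float, baseline_sd: float) -> float:
    """Mean change divided by the SD of baseline scores."""
    if baseline_sd <= 0:
        raise ValueError("baseline SD must be positive")
    return mean_change / baseline_sd


# PWB raw scores from Table 5, all cases.
print(round(effect_size(-8.23, 6.18), 2))  # sizably worse: -1.33
print(round(effect_size(1.31, 6.18), 2))   # sizably better: 0.21
```

Because every group is divided by the same baseline SD, the effect sizes preserve the asymmetry between worsening and improvement that appears in the raw change scores.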

All analyses were performed using the raw FACT scores (see scoring algorithms in Cella [32]) as well as logit measures derived from item response theory (IRT) modeling. The logit measures achieve interval measurement properties and allow one to make equal-interval assumptions about meaningful change scores at different levels of the measurement continuum. We used an extension of the Rasch measurement model for rating scale data (i.e., items with ordered response categories such as those used in the FACT) [33–35]. The model has three components: (1) an estimate of each patient’s ability to achieve a high score (high HRQL), (2) an estimate of each item’s difficulty (the degree to which an item would be unlikely to be answered in a manner reflecting a high HRQL) and (3) response thresholds for each step in the rating scale (there are m − 1 steps in an m-category scale). These logit measures were derived using WINSTEPS software that converts raw scores to (interval) ‘Rasch measures’ using IRT-based analyses [36]. IRT analysis was also used to evaluate the validity of the GRC. Specifically, the GRC ratings for the four FACT subscales were analyzed to determine the extent to which they defined distinct indicators of change. The item separation index, G_I, which is the adjusted test SD divided by the average calibration error, was used to calculate the number of item difficulty strata with centers three calibration errors apart: [(4G_I + 1)/3] [35].
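The separation index and strata formula can be checked with a few lines of arithmetic. This is a sketch of the two formulas as stated in the text, with hypothetical function names; the value 1.25 is the separation index reported in the Results.

```python
def separation_index(adjusted_sd: float, mean_calibration_error: float) -> float:
    """Item separation index: adjusted test SD over average calibration error."""
    return adjusted_sd / mean_calibration_error


def number_of_strata(separation: float) -> float:
    """Number of strata with centers three calibration errors apart."""
    return (4 * separation + 1) / 3


print(number_of_strata(1.25))  # 2.0
```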

Patient characteristics were compared using standard statistical tests for continuous and categorical data. Aside from WINSTEPS score conversions, all data were analyzed using SAS programs [37]. Because our purpose was to determine the extent to which one can calibrate change scores to GRC scores with confidence, rather than to detect group differences via inferential statistics, no adjustment was made for multiple comparisons.

Results

Patient characteristics

Of the 308 patients enrolled, 197 (64%) completed both assessments. Eight of those patients provided incomplete data, leaving a total of 189 for most of the analyses. Of the 111 who were not followed at Time 2, 11 died and 11 were too ill to complete the questionnaires. The remaining 89 patients were lost to follow-up. The lung cancer patients comprised the largest group of non-completers. Compared to those who completed the study, the non-completers had more advanced disease, worse baseline PSR and slightly lower baseline HRQL scores, but did not differ on sociodemographic variables (see Table 1).

Changes in the FACT

When viewed as a group, ignoring GRC rating, the total sample means changed very little from Time 1 to Time 2 (Table 2). In addition, across the five FACT subscales, most patients (46–63%) did not report significant change in their GRC (Table 3). Patients were, however, able to provide distinct evaluations of change for the various HRQL dimensions. For example, although 13 patients rated their PWB as sizably worse, only one of these patients also rated all other FACT domains as sizably worse (results not shown). Similarly, only 11 patients rated all FACT domains as sizably better. Rasch model analysis of the GRC ratings for the four FACT subscales demonstrated the expected orderly progression of the step calibrations (i.e., response thresholds for the five GRC categories) and evidence for at least two statistically distinct change strata (item separation index = 1.25; number of strata = [4(1.25) + 1]/3 = 2.00) [35]. These results (not shown) provide some evidence for the reliability and validity of the GRC scale.

Table 2. Descriptive summary of FACT scores

| Scale | Reliability (baseline/follow-up) | Raw scores: baseline mean (SD) | Raw scores: follow-up mean (SD) | Raw scores: p-Value | Interval measures: baseline mean (SD) | Interval measures: follow-up mean (SD) | Interval measures: p-Value |
|---|---|---|---|---|---|---|---|
| PWB (n = 182) | 0.90/0.89 | 21.88 (6.18) | 21.31 (6.06) | 0.143 | 2.08 (1.88) | 1.87 (1.73) | 0.068 |
| SWB (n = 178) | 0.68/0.78 | 22.55 (4.58) | 22.62 (5.15) | 0.840 | 1.64 (1.55) | 1.65 (1.76) | 0.978 |
| EWB (n = 180) | 0.75/0.79 | 18.27 (4.11) | 18.28 (4.17) | 0.988 | 1.44 (1.70) | 1.45 (1.75) | 0.950 |
| FWB (n = 184) | 0.88/0.88 | 19.50 (6.29) | 18.74 (6.22) | 0.044 | 1.57 (2.09) | 1.29 (1.91) | 0.018 |
| FACT-G (n = 177) | 0.91/0.92 | 81.92 (15.87) | 80.62 (16.72) | 0.177 | 1.21 (1.29) | 1.09 (1.33) | 0.129 |

Reliability = Cronbach’s α coefficient based on raw score data; p-Value from paired t-test; see Table 1 for FACT subscale abbreviations.

Given the number of patients in the sample with advanced disease, it was notable that many indicated a sizable improvement in their HRQL across subscales. A relatively small number of patients indicated a sizable worsening of their HRQL across subscales. The small number of patients in the sizably worse group causes the estimate for the worse group to be relatively less stable and representative.

Table 4 presents the relationships between the GRC ratings and the associated FACT scores. Except for the SWB domain, the magnitude of the relationship between the change score and the associated GRC was in the low-moderate range (0.30 ± 0.05) with a precision of approximately 0.14. The GRC was also significantly and positively associated with the follow-up score and generally uncorrelated with the baseline score, a finding anticipated by Norman and colleagues [38] based on their review of similar studies. These authors have also suggested that an unbiased measure of change should show a negative correlation with baseline scores. Regression to the mean induces a negative correlation between baseline values and change scores when the baseline and follow-up scores have equal variances [39, 40], so it may be reasonable to expect a negative correlation between baseline scores and an ordinal measure of change such as the GRC.
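The regression-to-the-mean argument can be made explicit with a short derivation. It is a sketch under assumptions that the text does not spell out: X and Y denote baseline and follow-up scores with a common variance σ² and correlation ρ.

```latex
\begin{aligned}
\operatorname{Cov}(X,\, Y - X) &= \rho\sigma^{2} - \sigma^{2} = -\sigma^{2}(1-\rho),\\
\operatorname{Var}(Y - X) &= \sigma^{2} + \sigma^{2} - 2\rho\sigma^{2} = 2\sigma^{2}(1-\rho),\\
\operatorname{Corr}(X,\, Y - X) &= \frac{-\sigma^{2}(1-\rho)}{\sigma\,\sqrt{2\sigma^{2}(1-\rho)}} = -\sqrt{\frac{1-\rho}{2}} .
\end{aligned}
```

Under these assumptions the correlation is negative whenever ρ < 1; for example, ρ = 0.8 gives a baseline-to-change correlation of about −0.32.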

Tables 5–9 summarize the main findings. Data are presented in raw score units and IRT-transformed interval measure units. In nearly all cases, the mean change on the FACT subscale or total score was greater for those who reported global worsening than for those who reported global improvement. Among ‘GRC-sizably worse’ patients, the mean change in the raw score ranged from −8.23 to −4.00 across the four subscales. The same ranges were −3.07 to −1.95 for the minimally worse category (raw scores). This magnitude of change was comparable to that expected from previous experience using the FACT and was the magnitude of change we hypothesized would occur in this group. The ‘GRC-sizably better’ groups had considerably smaller mean change scores. The

Table 3. GRC scale: frequency of responses by subscale

| GRC category | PWB, N = 182 (%) | SWB, N = 178 (%) | EWB, N = 180 (%) | FWB, N = 184 (%) | FACT-G, N = 177 (%) |
|---|---|---|---|---|---|
| Sizably worse | 13 (7) | 2 (1) | 5 (3) | 9 (5) | 7 (4) |
| Minimally worse | 21 (11) | 6 (3) | 22 (12) | 20 (11) | 16 (9) |
| No change | 83 (46) | 112 (63) | 94 (52) | 99 (54) | 100 (57) |
| Minimally better | 31 (17) | 33 (19) | 30 (17) | 35 (19) | 29 (16) |
| Sizably better | 34 (19) | 25 (14) | 29 (16) | 21 (11) | 25 (14) |

Values in the table refer to the number (percent) of patients.
See description of categories in the methods section.

Table 4. Correlations between FACT scores and GRC

| GRC | FACT score: baseline | FACT score: follow-up | FACT score: change |
|---|---|---|---|
| PWB | 0.09 (−0.06, 0.25) | 0.38 (0.25, 0.51)* | 0.35 (0.21, 0.49)* |
| SWB | −0.05 (−0.20, 0.09) | −0.03 (−0.19, 0.12) | 0.07 (−0.09, 0.22) |
| EWB | 0.19 (0.06, 0.33)* | 0.37 (0.23, 0.51)* | 0.27 (0.14, 0.40)* |
| FWB | 0.08 (−0.08, 0.24) | 0.26 (0.13, 0.40)* | 0.27 (0.13, 0.40)* |
| FACT-G | 0.11 (−0.04, 0.27) | 0.33 (0.19, 0.47)* | 0.34 (0.21, 0.48)* |

* p < 0.05.
Spearman rank correlation coefficients (and the 95% CI) are shown in the table.
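For readers who want to reproduce this kind of entry, the sketch below computes a Spearman coefficient as a Pearson correlation on average ranks, and an approximate 95% CI. The CI uses Fisher's z transformation, which is an assumption: the article does not state how its intervals were obtained, and the authors used SAS rather than this code. All function names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI via Fisher's z (an assumed method, see lead-in)."""
    z = math.atanh(r)
    half = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)


# Hypothetical toy data, not study data.
print(spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
print(fisher_ci(0.35, 182))
```

With r = 0.35 and n = 182, the Fisher interval is close to, but not identical with, the interval reported for the PWB change score, so it should be read as an approximation only.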


Table 5. Change in PWB

| GRC PWB | N | Raw scores: baseline mean (SD) | Raw scores: change mean (SD) | Raw scores: effect size | Interval measures: baseline mean (SD) | Interval measures: change mean (SD) | Interval measures: effect size |
|---|---|---|---|---|---|---|---|
| All cases (n = 182) | | | | | | | |
| Sizably worse | 13 | 20.15 (7.70) | −8.23 (8.02) | −1.33 | 1.76 (2.29) | −2.21 (2.37) | −1.18 |
| Minimally worse | 21 | 19.56 (6.28) | −2.25 (4.17) | −0.36 | 1.32 (1.55) | −0.68 (1.01) | −0.36 |
| No change | 83 | 23.01 (5.18) | −0.34 (4.51) | −0.06 | 2.40 (1.68) | −0.14 (1.24) | −0.07 |
| Minimally better | 31 | 20.32 (7.91) | 1.06 (4.72) | 0.17 | 1.61 (2.36) | 0.25 (1.71) | 0.13 |
| Sizably better | 34 | 22.67 (5.43) | 1.31 (4.12) | 0.21 | 2.32 (1.72) | 0.23 (1.46) | 0.12 |
| Overall, baseline mean (SD) | 182 | 21.88 (6.18) | | | 2.08 (1.88) | | |
| Score < 26 at baseline (n = 114) | | | | | | | |
| Sizably worse | 9 | 16.89 (7.04) | −5.78 (7.19) | −0.98 | 0.55 (1.48) | −1.12 (1.56) | −0.84 |
| Minimally worse | 16 | 17.35 (5.54) | −1.70 (4.64) | −0.29 | 0.66 (1.10) | −0.36 (0.94) | −0.27 |
| No change | 50 | 20.29 (5.04) | −0.04 (5.63) | −0.01 | 1.32 (1.08) | 0.04 (1.28) | 0.03 |
| Minimally better | 20 | 16.70 (7.71) | 2.45 (4.88) | 0.41 | 0.38 (1.96) | 0.74 (1.44) | 0.55 |
| Sizably better | 19 | 19.25 (5.06) | 3.24 (4.56) | 0.55 | 1.07 (1.08) | 1.00 (1.45) | 0.75 |
| Overall, baseline mean (SD) | 114 | 18.81 (5.92) | | | 0.96 (1.34) | | |

Overall p-value (p < 0.001) from one-way analysis of variance of the change scores across the five groups.
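The one-way analysis of variance behind the Table 5 footnote can be approximated from the published group sizes, mean changes and standard deviations alone. The sketch below does this for the all-case raw change scores. It assumes the reported SDs are sample standard deviations (n − 1 denominator) and works from rounded table values, so it is only a rough check on the reported result, not a reproduction of the authors' SAS analysis. The function name `anova_f_from_summary` is illustrative.

```python
# All-case raw change scores from Table 5:
# (GRC group, n, mean change, SD of change)
groups = [
    ("Sizably worse", 13, -8.23, 8.02),
    ("Minimally worse", 21, -2.25, 4.17),
    ("No change", 83, -0.34, 4.51),
    ("Minimally better", 31, 1.06, 4.72),
    ("Sizably better", 34, 1.31, 4.12),
]


def anova_f_from_summary(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA
    computed from per-group n, mean and sample SD."""
    total_n = sum(n for _, n, _, _ in groups)
    grand_mean = sum(n * mean for _, n, mean, _ in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for _, n, mean, _ in groups)
    # Within-group sum of squares: recovered from each group's sample variance.
    ss_within = sum((n - 1) * sd ** 2 for _, n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = anova_f_from_summary(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

With these inputs the F statistic comes out a little above 11 on 4 and 177 degrees of freedom, which is in line with the reported p < 0.001.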

Table 6. Change in SWB

| GRC SWB | N | Raw scores (a): baseline mean (SD) | Raw scores (a): change mean (SD) | Raw scores (a): effect size | Interval measures (b): baseline mean (SD) | Interval measures (b): change mean (SD) | Interval measures (b): effect size |
|---|---|---|---|---|---|---|---|
| All cases (n = 178) | | | | | | | |
| Sizably worse | 2 | 23.67 (0.47) | −7.33 (6.13) | −1.60 | 1.80 (0.88) | −2.27 (0.95) | −1.48 |
| Minimally worse | 6 | 20.08 (5.31) | −2.69 (6.76) | −0.59 | 1.00 (1.50) | −0.87 (2.53) | −0.56 |
| No change | 112 | 23.01 (4.48) | 0.34 (4.37) | 0.07 | 1.81 (1.58) | 0.09 (1.52) | 0.06 |
| Minimally better | 33 | 21.35 (4.86) | 0.13 (5.35) | 0.03 | 1.17 (1.42) | 0.03 (1.37) | 0.02 |
| Sizably better | 25 | 22.53 (4.47) | 0.06 (4.36) | 0.01 | 1.69 (1.54) | −0.03 (1.41) | −0.02 |
| Overall, baseline mean (SD) | 178 | 22.55 (4.58) | | | 1.64 (1.55) | | |
| Score < 26 at baseline (n = 127) | | | | | | | |
| Sizably worse | 2 | 23.67 (0.47) | −7.33 (6.13) | −1.80 | 1.80 (0.88) | −2.27 (0.95) | −2.29 |
| Minimally worse | 5 | 18.90 (4.98) | −1.03 (6.04) | −0.25 | 0.53 (1.08) | −0.28 (2.34) | −0.28 |
| No change | 75 | 20.90 (4.01) | 1.21 (4.91) | 0.30 | 0.91 (1.03) | 0.52 (1.53) | 0.53 |
| Minimally better | 27 | 20.06 (4.40) | 0.20 (5.93) | 0.05 | 0.64 (0.88) | 0.07 (1.51) | 0.07 |
| Sizably better | 18 | 20.69 (3.87) | 0.42 (5.01) | 0.10 | 0.90 (0.92) | 0.19 (1.43) | 0.19 |
| Overall, baseline mean (SD) | 127 | 20.65 (4.08) | | | 0.85 (0.99) | | |

Overall p-value from one-way analysis of variance of the change scores across the five groups. (a) p = 0.115. (b) p = 0.151. (c) p = 0.183. (d) p = 0.084.


Table 7. Change in EWB

| GRC EWB | N | Raw scores (a): baseline mean (SD) | Raw scores (a): change mean (SD) | Raw scores (a): effect size | Interval measures (b): baseline mean (SD) | Interval measures (b): change mean (SD) | Interval measures (b): effect size |
|---|---|---|---|---|---|---|---|
| All cases (n = 180) | | | | | | | |
| Sizably worse | 5 | 13.60 (4.72) | −4.80 (3.19) | −1.17 | −0.24 (1.21) | −1.69 (0.97) | −0.99 |
| Minimally worse | 22 | 16.64 (4.67) | −1.95 (3.86) | −0.47 | 0.78 (1.70) | −0.78 (1.22) | −0.46 |
| No change | 94 | 18.52 (4.00) | 0.10 (3.93) | 0.02 | 1.54 (1.72) | 0.02 (1.75) | 0.01 |
| Minimally better | 30 | 18.24 (4.17) | 1.06 (4.17) | 0.26 | 1.52 (1.77) | 0.24 (1.82) | 0.14 |
| Sizably better | 29 | 19.55 (3.13) | 0.92 (2.48) | 0.22 | 1.85 (1.35) | 0.63 (1.29) | 0.37 |
| Overall, baseline mean (SD) | 180 | 18.27 (4.11) | | | 1.44 (1.70) | | |
| Score < 22 at baseline (n = 133) | | | | | | | |
| Sizably worse | 5 | 13.60 (4.72) | −4.80 (3.19) | −1.36 | −0.24 (1.21) | −1.69 (0.97) | −1.44 |
| Minimally worse | 20 | 15.95 (4.31) | −1.95 (4.05) | −0.55 | 0.43 (1.31) | −0.68 (1.20) | −0.58 |
| No change | 66 | 16.70 (3.35) | 0.99 (3.98) | 0.28 | 0.67 (1.18) | 0.45 (1.56) | 0.38 |
| Minimally better | 24 | 16.93 (3.58) | 1.74 (4.28) | 0.49 | 0.83 (1.16) | 0.57 (1.66) | 0.49 |
| Sizably better | 18 | 17.72 (2.59) | 1.59 (2.63) | 0.45 | 0.99 (0.92) | 0.94 (1.30) | 0.80 |
| Overall, baseline mean (SD) | 133 | 16.65 (3.54) | | | 0.67 (1.17) | | |

Overall p-value from one-way analysis of variance of the change scores across the five groups. (a) p = 0.001. (b) p = 0.005. (c) p < 0.001.

Table 8. Change in FWB

| GRC FWB | N | Raw scores (a): baseline mean (SD) | Raw scores (a): change mean (SD) | Raw scores (a): effect size | Interval measures (b): baseline mean (SD) | Interval measures (b): change mean (SD) | Interval measures (b): effect size |
|---|---|---|---|---|---|---|---|
| All cases (n = 184) | | | | | | | |
| Sizably worse | 9 | 15.22 (5.40) | −4.00 (5.89) | −0.64 | 0.18 (1.24) | −0.93 (1.30) | −0.45 |
| Minimally worse | 20 | 18.08 (6.42) | −3.07 (5.64) | −0.49 | 1.23 (2.22) | −1.07 (1.83) | −0.51 |
| No change | 99 | 20.69 (5.42) | −1.07 (4.65) | −0.17 | 1.92 (1.94) | −0.32 (1.62) | −0.15 |
| Minimally better | 35 | 16.77 (7.35) | 1.23 (5.16) | 0.20 | 0.73 (2.17) | 0.19 (1.56) | 0.09 |
| Sizably better | 21 | 21.57 (6.31) | 0.99 (4.19) | 0.16 | 2.31 (2.20) | 0.12 (1.43) | 0.06 |
| Overall, baseline mean (SD) | 184 | 19.50 (6.29) | | | 1.57 (2.09) | | |
| Score < 26 at baseline (n = 145) | | | | | | | |
| Sizably worse | 9 | 15.22 (5.40) | −4.00 (5.89) | −0.74 | 0.18 (1.24) | −0.93 (1.30) | −0.67 |
| Minimally worse | 16 | 15.72 (4.74) | −1.90 (5.22) | −0.35 | 0.28 (1.10) | −0.47 (1.28) | −0.34 |
| No change | 76 | 18.69 (4.55) | −0.77 (5.02) | −0.14 | 1.05 (1.17) | −0.04 (1.43) | −0.03 |
| Minimally better | 29 | 14.76 (6.41) | 2.10 (5.09) | 0.39 | 0.06 (1.70) | 0.53 (1.41) | 0.38 |
| Sizably better | 15 | 19.20 (5.95) | 1.86 (4.62) | 0.34 | 1.23 (1.53) | 0.54 (1.28) | 0.39 |
| Overall, baseline mean (SD) | 145 | 17.41 (5.42) | | | 0.73 (1.39) | | |

Overall p-value from one-way analysis of variance of the change scores across the five groups. (a) p = 0.002. (b) p = 0.032. (c) p = 0.004. (d) p = 0.018.


same pattern of change was observed for the logit scores (interval measures). Evidence of a baseline measurement ceiling effect prompted us to delete people who at baseline scored within two points of the highest possible subscale score, and within eight points of the highest possible total score. The results of both comparisons (all patients and group with ceiling cases removed) are presented in Tables 5–9. With the exception of the SWB subscale (Table 6), mean changes were significantly different (p < 0.05) between the five change groups for the raw scores and interval measures, and for comparisons with and without patients at the ceiling. In addition, the absolute value of the mean change was nearly always significantly greater (p < 0.05) for the worse group compared to the corresponding better group. We also examined the extent to which the steps between categories of GRC (e.g., the step from sizably worse to minimally worse is one of 16 steps in each of the Tables 5–9) were ordered as would be predicted by a significant association between GRC and change score.
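The ceiling rule described above can be written as a simple filter on baseline scores. The sketch below is one reading of that rule, not the authors' code. The maximum scores are assumptions inferred from the cutoffs printed in Tables 5–9 (for example, a PWB cutoff of "< 26" with a two-point margin implies a maximum of 28); the patient records and the function names `below_ceiling` and `remove_ceiling_cases` are hypothetical.

```python
# Assumed maximum possible scores, inferred from the table cutoffs
# (< 26, < 26, < 22, < 26 for the subscales and < 100 for the total).
MAX_SCORE = {"PWB": 28, "SWB": 28, "EWB": 24, "FWB": 28, "FACT-G": 108}

# Margins stated in the text: two points for subscales, eight for the total.
CEILING_MARGIN = {"PWB": 2, "SWB": 2, "EWB": 2, "FWB": 2, "FACT-G": 8}


def below_ceiling(scale, baseline_score):
    """True if the baseline score is not within the ceiling margin."""
    return baseline_score < MAX_SCORE[scale] - CEILING_MARGIN[scale]


def remove_ceiling_cases(scale, patients):
    """Keep only patients whose baseline score leaves room to improve."""
    return [p for p in patients if below_ceiling(scale, p["baseline"])]


# Hypothetical baseline PWB scores, for illustration only.
patients = [
    {"id": 1, "baseline": 27},
    {"id": 2, "baseline": 20},
    {"id": 3, "baseline": 26},
]
kept = remove_ceiling_cases("PWB", patients)
print([p["id"] for p in kept])
```

Under these assumptions only the patient with a baseline of 20 is retained, since 26 and 27 both fall within two points of the assumed maximum of 28.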

Physical well-being (PWB)

In both the all-case and ceiling-removed analyses, we observed the predicted ordering of raw and interval change scores in 15 of the 16 steps; the one exception was the interval scores for minimally better vs. sizably better in the all-case analysis. The absolute value of the raw score change associated with the minimal GRC change rating was near 2.0 for all but the minimally better group in the all-case analysis, where it was closer to 1.0. Except for this one group, effect sizes associated with minimal change were in the range of 0.30–0.50, whereas they were in the 0.50–1.30 range for sizable changes. Even after removal of baseline ceiling cases, smaller change scores and effect sizes were noted in the sizably better group compared to the sizably worse group. In most comparisons, the absolute value of the mean change was significantly greater (p < 0.05) for the sizably worse and minimally worse groups compared to the sizably better and minimally better groups, respectively. This pattern was not always observed in the ceiling-removed analyses; specifically, the absolute value of the mean change was greater for the minimally better group compared to the minimally worse group.
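The effect sizes quoted for PWB are consistent with dividing each group's mean change by the overall baseline standard deviation of the sample. That definition is an assumption here, inferred from the Table 5 values rather than stated in this section. A minimal sketch for the all-case raw scores:

```python
# Overall baseline SD of the raw PWB score, all cases (Table 5).
OVERALL_BASELINE_SD = 6.18

# Mean raw change by GRC category, all cases (Table 5).
pwb_raw_change = {
    "Sizably worse": -8.23,
    "Minimally worse": -2.25,
    "No change": -0.34,
    "Minimally better": 1.06,
    "Sizably better": 1.31,
}


def effect_size(mean_change, baseline_sd):
    """Standardized change: mean change in units of baseline SD (assumed definition)."""
    return mean_change / baseline_sd


pwb_effect_sizes = {
    category: round(effect_size(change, OVERALL_BASELINE_SD), 2)
    for category, change in pwb_raw_change.items()
}
for category, es in pwb_effect_sizes.items():
    print(f"{category}: {es:+.2f}")
```

Rounded to two decimals, these values match the raw-score effect size column of Table 5 for the all-case analysis (−1.33, −0.36, −0.06, 0.17, 0.21).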

Social/family well-being (SWB)

Across the all-case and ceiling-removed analyses, we observed the predicted ordering of raw and interval change scores between the sizably worse, minimally worse and no change groups, and note that all of the step disorder lies in the transitions involving the better categories. This observation, combined with the very low sample sizes in the worse categories, and the lack of correlation between the GRC and change score overall (Table 4), suggests these results for SWB are inconclusive.

Table 9. Change in FACT-G

| GRC FACT-G | N | Raw scores: baseline mean (SD) | Raw scores: change mean (SD) | Raw scores: effect size | Interval measures: baseline mean (SD) | Interval measures: change mean (SD) | Interval measures: effect size |
|---|---|---|---|---|---|---|---|
| All cases (n = 177) | | | | | | | |
| Sizably worse | 7 | 72.60 (18.75) | −19.71 (13.95) | −1.24 | 0.57 (1.12) | −1.20 (0.87) | −0.93 |
| Minimally worse | 16 | 73.47 (18.92) | −9.87 (12.69) | −0.62 | 0.62 (1.33) | −0.68 (0.97) | −0.53 |
| No change | 100 | 84.51 (13.31) | −1.34 (11.77) | −0.08 | 1.40 (1.25) | −0.13 (1.03) | −0.10 |
| Minimally better | 29 | 74.65 (19.58) | 5.48 (11.57) | 0.35 | 0.63 (1.33) | 0.33 (0.88) | 0.26 |
| Sizably better | 25 | 88.00 (12.43) | 1.67 (9.41) | 0.11 | 1.68 (1.06) | 0.10 (0.80) | 0.08 |
| Overall, baseline mean (SD) | 177 | 81.92 (15.87) | | | 1.21 (1.29) | | |
| Score < 100 at baseline (n = 158) | | | | | | | |
| Sizably worse | 7 | 72.60 (18.75) | −19.71 (13.95) | −1.32 | 0.57 (1.12) | −1.20 (0.87) | −1.15 |
| Minimally worse | 14 | 69.61 (16.87) | −7.78 (12.10) | −0.52 | 0.29 (1.04) | −0.45 (0.76) | −0.43 |
| No change | 88 | 82.00 (12.14) | −0.54 (11.86) | −0.04 | 1.09 (0.90) | 0.01 (0.92) | 0.01 |
| Minimally better | 27 | 72.59 (18.66) | 5.97 (11.81) | 0.40 | 0.46 (1.21) | 0.27 (0.81) | 0.26 |
| Sizably better | 22 | 86.08 (11.98) | 2.01 (9.98) | 0.13 | 1.48 (0.93) | 0.13 (0.82) | 0.13 |
| Overall, baseline mean (SD) | 158 | 79.44 (14.97) | | | 0.94 (1.04) | | |

Overall p-value (p < 0.001) from one-way analysis of variance of the change scores across the five groups.

Emotional well-being (EWB)

Across the all-case and ceiling-removed analyses, we generally observed the predicted ordering of raw and interval change scores (14 of 16 transitions). The two exceptions were in raw score differences from minimally to sizably better in the all-case and ceiling-removed analyses. As with PWB, the absolute value of the raw score change associated with minimal GRC change rating was near 2.0 for all but the minimally better in the all-case analysis only, where it was closer to 1.0. Except for the minimally better group, effect sizes associated with minimal change were in the range of 0.50–0.60, whereas they were in the 0.50–1.40 range for sizable changes. Even after removal of baseline ceiling cases, smaller change scores and effect sizes were usually observed in the sizably better group compared to the sizably worse group. In all comparisons, the absolute value of the mean change was significantly greater (p < 0.05) for the sizably worse and minimally worse groups compared to the sizably better and minimally better groups, respectively. It is also noteworthy that those who indicated a sizable worsening also had the lowest FACT baseline EWB scores. This group also demonstrated the greatest degree of change in their scores across time.
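The "14 of 16 transitions" count for EWB can be checked directly from the Table 7 mean changes. The sketch below assumes that a step is "ordered as predicted" when the mean change strictly increases from one GRC category to the next more favorable one; that operational definition and the name `count_ordered_steps` are illustrative, not taken from the paper.

```python
# Mean EWB change by GRC category, from sizably worse to sizably better (Table 7).
ewb_mean_change = {
    "all cases, raw": [-4.80, -1.95, 0.10, 1.06, 0.92],
    "all cases, interval": [-1.69, -0.78, 0.02, 0.24, 0.63],
    "ceiling removed, raw": [-4.80, -1.95, 0.99, 1.74, 1.59],
    "ceiling removed, interval": [-1.69, -0.68, 0.45, 0.57, 0.94],
}


def count_ordered_steps(series_by_analysis):
    """Count adjacent-category steps where the mean change increases.

    Returns (ordered_steps, total_steps) over all analyses.
    """
    ordered = 0
    total = 0
    for changes in series_by_analysis.values():
        for lower, higher in zip(changes, changes[1:]):
            total += 1
            if higher > lower:
                ordered += 1
    return ordered, total


ordered, total = count_ordered_steps(ewb_mean_change)
print(f"{ordered} of {total} steps ordered as predicted")
```

Both disordered steps fall between minimally better and sizably better in the raw scores, which agrees with the two exceptions named in the text.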

Functional well-being (FWB)

Across the all-case and ceiling-removed analyses, we observed the predicted ordering of raw and interval change scores in most (12 of 16) comparisons, with the disordering usually observed in the sizably better group. Absolute raw score change associated with minimal GRC change rating was in the 2.0–3.0 range for all but the minimally better in the all-case analysis only, where it was closer to 1.0. Except for the better groups in the all-case analysis, effect sizes associated with minimal change were in the range of 0.35–0.50, whereas they were in the 0.35–0.75 range for sizable changes. Even after removal of baseline ceiling cases, smaller change scores and effect sizes were noted in the sizably better group compared to the sizably worse group.

The absolute value of the mean change was significantly greater (p < 0.05) for the sizably worse group compared to the sizably better group. The only exception was in the all-case comparison of interval change scores, where a trend towards significance was observed (p = 0.068). Among all cases, the absolute value of the minimally worse change score was significantly greater (p < 0.05) than that observed for the minimally better group. This pattern was not observed in the ceiling-removed analyses.

Total FACT-G

Across the all-case and ceiling-removed analyses, we generally observed the predicted ordering of raw and interval change scores (12 of 16 steps). All step disorders occurred between the minimally better and sizably better categories. For worsening overall HRQL, raw score change associated with minimal GRC change rating was in the 8.0–10.0 range, whereas it was in the 5.0–6.0 range for improvement. Effect sizes associated with minimal change were in the range of 0.30–0.60, whereas they were in the 1.0–1.3 range for sizable worsening. Even after removal of baseline ceiling cases, smaller change scores and effect sizes were noted in the sizably better group compared to the sizably worse group. In fact, smaller change scores were noted in the sizably better group than among the minimally better group. Of note is the large difference in starting point for those patients who reported getting sizably better (mean = 86.08) vs. those who reported getting minimally better (mean = 72.59). In all comparisons, the absolute value of the mean change was significantly greater (p < 0.05) for the sizably worse and minimally worse groups compared to the sizably better and minimally better groups, respectively.

Discussion

The results of this study provide some support for the responsiveness of the FACT questionnaire to patient-rated meaningful changes in PWB, FWB, EWB, and overall HRQL. The absence of association between the patient’s retrospective GRC and the actual change score in social well-being lessens confidence in the estimates of meaningful change for this particular domain. With the exception of the social domain, patients appeared able to offer a global retrospective rating that bears sufficient association to actual change scores to enable comparison of the two.

We were able to categorically examine change scores, whether raw or interval measures, to highlight the differences between minimal change and more extensive change. Unlike previous reports that de-emphasize a distinction between improvement and worsening of one’s condition, we chose to highlight whatever distinction may exist. Indeed, smaller improvements than declines were associated with comparable GRC, the anchor selected in this investigation. When we removed cases near the ceiling to correct for the ceiling effect, this finding, while attenuated, remained. Sneeuw and colleagues [41] also collapsed improvement and decline categories. Closer examination of the data prior to collapsing revealed apparent differences in magnitude of change based on direction of change. Similarly, using a method that asks patients to compare themselves to others, Canadian investigators have shown that clinically important differences may be viewed asymmetrically in patients with rheumatoid arthritis [42] and head and neck cancer [43]. In both cases, smaller positive differences in scores between patients comparing themselves with one another are associated with believing one is better off than is the case with negative differences.

Consistent with other studies discussed by Norman et al. [38], we observed a positive relationship between the GRC rating and the follow-up scores, and the absence of a relationship between GRC and the baseline scores. It is noteworthy that many of the cases removed from the analysis to correct for the baseline ceiling effect were in fact people who rated their HRQL as improving sizably. While it is possible that these patients experienced a true and dramatic improvement in their domain-specific HRQL, it is also possible that they are displaying a general yea-saying or positive health response tendency. This trend is also noted in the sizably worse group of patients in EWB. These findings may reflect a subset of patients who see their HRQL as always good (or bad) and always improving (or declining), perhaps reflecting a dispositional optimism (or pessimism). Another possibility is that patients’ internal frame of reference changed during the time elapsed between the first and second assessment, indicating a response shift. This is a general methodological concern and possible limitation of this approach to determining a minimally important difference in change score.

Comparatively few patients indicated a worsening in their HRQL across FACT subscales. To a degree this is related to losing patients with worsening HRQL to follow-up. Therefore, the small numbers in the minimally and sizably worse groups may provide less stable estimates of change due to the smaller sample size and the possibility of greater variability in the sample. Taking into account the tentative nature of the findings in this group, several points deserve further discussion. First, those patients who indicated a minimal or sizable worsening of their symptoms showed changes in their FACT scores that are consistent with the expected direction of change. These findings are also consistent with other published data on the FACT [24–29, 44] and the McMaster studies [8, 13, 45, 46]. Second, in the sizably worse groups, the degree of raw score change was large, ranging from −8.23 (PWB) to −4.00 (FWB). While the range of change scores was larger than we expected, the mean change fell within our expectations. To conclude, among patients who report a sizable worsening of their HRQL, average change scores on the FACT are quite large, indicating that it took a greater amount of negative change for patients to shift to a perception of worsening HRQL.

The minimally worse and minimally better groups highlight the smallest degree of meaningful change. Importantly, those indicating a change for the better were more responsive to very small degrees of change and those experiencing a worsening required a somewhat larger degree of change. The degree of change for both the minimally better and minimally worse groups fell in the 1.0–3.0 range. This fits with our clinical experience and is consistent with our original hypotheses.
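One way to make this asymmetry concrete is to compare the absolute mean change in the minimally worse group with that in the minimally better group on each scale. The sketch below uses only the all-case raw change scores from Tables 5, 7, 8 and 9 and leaves out SWB, whose results the text treats as inconclusive. The ratio is an illustrative summary, not a statistic reported by the authors, and it says nothing about the ceiling-removed analyses, where the pattern was weaker.

```python
# All-case raw mean change: (minimally worse, minimally better).
minimal_change = {
    "PWB": (-2.25, 1.06),     # Table 5
    "EWB": (-1.95, 1.06),     # Table 7
    "FWB": (-3.07, 1.23),     # Table 8
    "FACT-G": (-9.87, 5.48),  # Table 9
}


def worsening_to_improvement_ratio(worse_change, better_change):
    """Ratio of absolute mean changes; values above 1 mean that a larger
    score change accompanied 'minimally worse' than 'minimally better'."""
    return abs(worse_change) / abs(better_change)


ratios = {
    scale: worsening_to_improvement_ratio(worse, better)
    for scale, (worse, better) in minimal_change.items()
}
for scale, ratio in ratios.items():
    print(f"{scale}: {ratio:.2f}")
```

On these figures every ratio exceeds 1, roughly between 1.8 and 2.5, which is the direction of asymmetry described in the paragraph above.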

The results for patients who report a sizable change further illustrate the distinction between improvement and decline. Whereas there were much larger change scores associated with patients who report sizable worsening, the change scores associated with sizable improvement were essentially the same as those associated with minimal improvement. This was true even after patients near the ceiling were removed. In this study, the small degree of change associated with global ratings of improvement may be influenced by several factors. First, the baseline scores of these groups were high, leaving little room for improvement (ceiling effect). Removing cases near the ceiling will not fully address this problem, especially for patients who report sizable improvement. Another possible explanation is that a smaller improvement in the context of cancer care has more meaning than a decline does. It may be that oncology patients have a ‘response bias’ to viewing even small changes as positive and meaningful in light of the tremendous physical and emotional costs and investment in treatment. Another possibility is that being diagnosed with cancer and undergoing treatment causes a shift in patients’ internal standards of HRQL. This has been referred to as ‘response-shift’ [47], and if present might introduce a bias that threatens study validity.

A third possibility is that the GRC taps into dispositional optimism [48]. Optimists may be inclined to report improvements in HRQL which are not detectable in health status measures because those measures have already been answered near the ceiling. This reporting style may be an epiphenomenon in itself, or it may be due to a lack of appropriate and relevant questionnaire items, pointing toward the need for improvement in the instrument. Although the FACT has a relatively minor problem with ceiling effects, it remains possible that other items could be added to reduce this problem. To account for the possible independent effect of dispositional optimism, adding a measure of optimism as a possible adjustment factor is worth considering when a GRC will be used as the criterion for improvement.

In this study, the largest number of patients indicated no change in their HRQL. As expected, there was little to no change in their mean FACT scores. As this study was not specifically designed to assess change, we did not sample equally from those expected to experience a change in their HRQL (those in active treatment) and those not (those who completed treatment). Again, the baseline scores of this group were generally high across all subscales.

This study highlights the need for a more sensitive and comprehensive approach to examining change. Clearly the magnitude of change needed for a patient to experience it as meaningful varies depending upon a variety of factors. What the findings of this study may reflect is the subtle shift in one’s self and worldview that occurs following adaptation to the experience of being diagnosed with cancer and undergoing treatment [49–52]. It may be that one’s definition and parameters of HRQL change in a way that is not fully captured using a statistical understanding of change over time. The reporting of global improvement may be in part driven by dispositional optimism (and the health state itself, as suggested by Norman et al. [38]), or some similar reporting style. As the current study was not designed to test this distinction, the answer to this must await further study.

The findings of this study support the use of this methodology to detect score differences that are connected with worsening in most HRQL domains and total HRQL. A more complete understanding of HRQL improvement may require more attention to the ceiling of measurement or the addition of other measures to assess the impact of the meaning of the cancer experience, shift in perspective over time, or the role of optimism.

Although the initial results are promising, this study does have limitations. First, the modification of the McMaster global rating scale from an interview to a self-administered, paper-and-pencil format resulted in some unforeseen problems. Some patients provided uninterpretable data and were therefore dropped from the analysis. A second limitation concerned the higher than expected rate of patients lost to follow-up. The lost to follow-up group had slightly lower HRQL scores, poorer performance status, and more advanced disease at baseline assessment. At least 11 of these patients died; another 11 were too ill to complete the questionnaires. Obtaining information from this group would have added to our understanding of those who indicated a worsening of their HRQL. In addition, a parallel assessment of optimism would have allowed us to examine its impact on ratings of HRQL and of global change.

Understanding these caveats, the convergence of evidence in this study supports the findings of previous research. Comparing change scores on standardized instruments to GRC seems to offer a valid, supporting anchor for understanding the performance of a questionnaire in longitudinal research. These data contribute to the refinement of a well-validated instrument, the FACT, using a modification of a design reported by the McMaster group. When comparing these results to the findings of previous studies using the FACT questionnaire, including mixed cancer [25], breast cancer [24], lung cancer [26], head and neck cancer [28, 44], and prostate cancer [27], we find in general that changes of 2–3 raw scale points in the PWB and FWB subscales are associated with meaningful changes in patient activity level as measured by the ECOG PSR. This was considered preliminary supporting evidence of a clinically meaningful difference. This same degree of change has generally been equivalent to a ‘meaningful’ [12] effect size of 0.3 and has been comparable to the findings of the McMaster group [8, 13, 45, 46] and Osoba et al. [9]. In this study, this difference emerged on the PWB and FWB subscales. EWB changes have been less well studied, but results are similar where available [53]. SWB changes have not been studied. The total FACT-G raw score changes in the range of 5–7 points have also emerged using different approaches and again here in this study, providing important converging evidence, particularly when health status declines over time, as is often the case in advanced cancer.

Acknowledgements

The authors thank Josephine Ribaudo and Stacie Hudgens for assistance with the statistical analyses. Results from this study were presented at the Sixth Annual Conference of the International Society for Quality of Life Research, Barcelona, Spain, November, 1999. The study was supported in part by unrestricted grants from Glaxo Wellcome and AstraZeneca Pharmaceuticals.

References

1. De Haes J, Stiggelbout A. Assessment of values, utilities, and preferences in cancer patients. Cancer Treat Rev 1996; 22: 13–26.

2. Aaronson NK. Quality of life assessment in cancer clinical trials. In: Holland JC (ed.), Psychosocial Aspects of Oncology, New York, NY: Springer-Verlag, 1990: 97–113.

3. Calman K. Definitions and dimensions of quality of life. In: Aaronson NK, Beckman J (eds), The Quality of Life of Cancer Patients, New York: Raven Press, 1987: 1–9.

4. Cella D. Quality of life: The concept. J Palliative Care 1992; 8: 8–13.

5. Cella D, Bonomi A. Measuring quality of life: 1995 update. Oncology 1995; 9(Suppl. 11): 47–60.

6. Schipper H, Clinch J, Powell V. Definitions and conceptual issues. In: Spilker B (ed.), Quality of Life Assessments in Clinical Trials, New York: Raven Press, 1990: 11–24.

7. Cella D. Quality of life: Concepts and definition. J Pain Symptom Manag 1994; 9(3): 186–192.

8. Jaeschke R, Singer J, Guyatt G. Measurement of health status: Ascertaining the minimal clinically important difference. Control Clin Trials 1989; 10: 407–415.

9. Osoba D, Rodrigues G, Myles J, Zee B, Pater J. Interpreting the significance of changes in health-related quality of life scores. J Clin Oncol 1998; 16: 139–144.

10. Pocock S, Hughes M, Lee R. Statistical problems in the reporting of clinical trials: A survey of three medical journals. New Engl J Med 1987; 317: 426–432.

11. Kraemer H. Reporting the size of effects in research studies to facilitate assessments of practical or clinical significance. Psychoneuroendocrinology 1992; 17: 527–536.

12. Cohen J. Statistical Power Analysis for the Behavioral Sciences, 2nd edn. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc, 1988.

13. Guyatt G, Walter S, Norman G. Measuring change over time: Assessing the usefulness of evaluative instruments. J Chron Dis 1987; 40: 171–178.

14. Hsu LM. Reliable changes in psychotherapy: Taking into account regression toward the mean. Behav Assess 1989; 11: 459–467.

15. Jacobson N, Truax P. Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. J Consult Clin Psych 1991; 59: 12–19.

16. Braitman L. Statistical, clinical, and experimental evidence in randomized controlled trials. Ann Intern Med 1983; 98: 407–408.

17. Salsburg D. The religion of statistics as practiced in medical journals. Am Stat 1985; 39: 220–223.

18. Lydick E, Epstein R. Interpretation of quality of life changes. Qual Life Res 1993; 2: 221–226.

19. Jacobson NS, Follette WC, Revenstorf D. Psychotherapy outcome research: Methods for reporting variability and evaluating clinical significance. Behav Ther 1984; 15: 336–352.

20. Jacobson N, Revenstorf D. Statistics for assessing the clinical significance of psychotherapy techniques: Issues, problems, and new developments. Behav Assess 1988; 10: 133–145.

21. Christensen L, Mendoza JL. A method of assessing change in a single subject: An alteration of the RC index. Behav Ther 1986; 17: 305–308.

22. Husted JA, Cook RJ, Farewell VT, Gladman DD. Methods for assessing responsiveness: A critical review and recommendations. J Clin Epidemiol 2000; 53: 459–468.


23. Kazis L, Anderson J, Meehan R. Effect sizes for interpreting changes in health status. Med Care 1989; 27(3, Supplement): S178–S189.

24. Brady M, Cella D, Mo F, et al. Reliability and validity of the functional assessment of cancer therapy-breast quality of life instrument. J Clin Oncol 1997; 15: 974–986.

25. Cella D, Tulsky D, Gray G, et al. The functional assessment of cancer therapy (FACT) scale: Development and validation of the general version. J Clin Oncol 1993; 11: 570–579.

26. Cella D, Bonomi A, Lloyd S, Tulsky D, Kaplan E, Bonomi P. Reliability and validity of the functional assessment of cancer therapy-lung (FACT-L) quality of life instrument. Lung Cancer 1995; 12: 199–220.

27. Esper P, Mo F, Chodak G, Sinner M, Cella D, Pienta K. Measuring quality of life in men with prostate cancer using the functional assessment of cancer therapy-prostate instrument. Urology 1997; 50(6): 920–928.

28. List MA, D’Antonio LL, Cella DF, et al. The performance status scale for head and neck cancer patients and the functional assessment of cancer therapy-head and neck (FACT-H&N) scale: A study of utility and validity. Cancer 1996; 77: 2294–2301.

29. Ward WL, Hahn EA, Mo F, Hernandez L, Tulsky DS, Cella D. Reliability and validity of the functional assessment of cancer therapy-colorectal (FACT-C) quality of life instrument. Qual Life Res 1999; 8: 181–195.

30. Zubrod CG, Schneiderman M, Frei E, et al. Appraisal of methods for the study of chemotherapy in cancer in man: Comparative therapeutic trial of nitrogen mustard and triethylene thiophosphoramide. J Chron Dis 1960; 11: 7–33.

31. Yellen S, Cella D. Someone to live for: Social well-being, parenthood status, and decision-making in oncology. J Clin Oncol 1995; 13: 1255–1264.

32. Cella DF. Manual of the functional assessment of chronic illness therapy (FACIT Scales) – Version 4. Evanston, IL: Center on Outcomes Research and Education (CORE), Evanston Northwestern Healthcare & Northwestern University, November 1997.

33. Andrich D. A rating formulation for ordered response categories. Psychometrika 1978; 43: 561–573.

34. Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Danmarks Paedagogiske Institut, 1960 (Chicago: University of Chicago Press; 1980).

35. Wright BD, Masters GN. Rating Scale Analysis: Rasch Measurement. Chicago, IL: MESA Press, 1982.

36. Linacre JM, Wright BD. A User’s Guide to BIGSTEPS/WINSTEPS/MINISTEP: Rasch-Model Computer Programs. Chicago, IL: MESA Press, 1998.

37. SAS Software Release 6.12. Cary, NC: SAS Institute Inc., 1996.

38. Norman GR, Stratford P, Regehr G. Methodological problems in the retrospective computation of responsiveness to change: The lesson of Cronbach. J Clin Epidemiol 1997; 50: 869–879.

39. Blomqvist N. On the relation between change and initial value. J Am Stat Assoc 1977; 72: 746–749.

40. Oldham PD. A note on the analysis of repeated measurements of the same subjects. J Chron Dis 1962; 15: 969–977.

41. Sneeuw K, Muller M, Aaronson N. Interpreting the significance of changes in EORTC QLQ C30 and COOP/WONCA scores. Qual Life Res 2000; 9: 256.

42. Wells GA, Tugwell P, Kraag GR, Baker PRA, Groh J, Redelmeier DA. Minimum important difference between patients with rheumatoid arthritis: The patient’s perspective. J Rheumatol 1993; 20(3): 557–560.

43. Ringash GJ, Redelmeier DA, O’Sullivan B, Bezjak AA. Asymmetry of good and bad minimal important differences in quality of life for laryngeal cancer patients. Qual Life Res 1999; 8: 604.

44. D’Antonio L, Zimmerman G, Cella D, Long S. Quality of life and functional status measures in patients with head and neck cancer. Arch Otolaryngol 1996; 122: 482–487.

45. Guyatt G, Deyo R, Charlson M, Levine M, Mitchell A. Responsiveness and validity in health status measurement: A clarification. J Clin Epidemiol 1989; 42: 403–408.

46. Guyatt G, Jaeschke R. Measurement in clinical trials: Choosing the appropriate approach. In: Spilker B (ed.), Quality of Life Assessments in Clinical Trials. New York: Raven Press, 1990.

47. Schwartz CE, Sprangers MAG (eds). Adaptation to Changing Health: Response Shift in Quality-of-Life Research. Washington DC: American Psychological Association, 2000: 227 pp.

48. Scheier M, Carver C. Optimism, coping, and health: Assessment and implications of generalized outcome expectancies. Health Psychol 1985; 4: 219–247.

49. Andrykowski M, Brady M, Hunt J. Positive psychosocial adjustment in potential bone marrow transplant recipients: Cancer as a psychosocial transition. Psycho-Oncol 1993; 2: 261–276.

50. Kennedy B, Tellegen A, Kennedy S, Havernick N. Psychological responses of patients cured of advanced cancer. Cancer 1976; 38: 2184–2191.

51. Rieker P, Fitzgerald E, Kalish L, et al. Psychosocial factors, curative factors, and behavioral outcomes: A comparison of testis cancer survivors and a control group of healthy men. Cancer 1989; 64: 2399–2407.

52. Thompson S, Pitts J. Factors relating to a person’s ability to find meaning after a diagnosis of cancer. J Psychosoc Oncol 1993; 11: 1–21.

53. McCain N, Zellar J, Cella D, Urbanski P, Novack R. The influence of stress management training in the HIV disease. Nursing Res 1996; 45(4): 246–253.

Address for correspondence: David Cella, Ph.D., Center on Outcomes, Research and Education, Evanston Northwestern Healthcare, 1033 University Place, Evanston, Illinois 60201, USA

Phone: +1-8475701720; Fax: +1-8475701735

E-mail: [email protected]
