
UvA-DARE is a service provided by the library of the University of Amsterdam (http://dare.uva.nl)

UvA-DARE (Digital Academic Repository)

Morbidity after lymph node dissection in patients with cancer: Incidence, risk factors, and prevention

Stuiver, M.M.

Link to publication

Citation for published version (APA):
Stuiver, M. M. (2014). Morbidity after lymph node dissection in patients with cancer: Incidence, risk factors, and prevention.

General rights
It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons).

Disclaimer/Complaints regulations
If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.

Download date: 29 Jul 2020


CHAPTER 7

Psychometric properties of three patient-reported outcome measures for the assessment of shoulder disability after neck dissection

Martijn M. Stuiver MSc1,2, Marieke R. ten Tusscher MSc1, Anita van Opzeeland PT3, Wim Brendeke PT4, Robert Lindeboom PhD2, Pieter U. Dijkstra PhD5, Neil K. Aaronson PhD6

1. Department of Physiotherapy, The Netherlands Cancer Institute, Amsterdam, The Netherlands
2. Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
3. Department of Physiotherapy, Medical Centre Leeuwarden, Leeuwarden, The Netherlands
4. Department of Physiotherapy, Rijnstate Hospital, Arnhem, The Netherlands
5. University of Groningen, University Medical Centre Groningen, Department of Rehabilitation and Department of Oral and Maxillofacial Surgery, Groningen, The Netherlands
6. Division of Psychosocial Research and Epidemiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands

Submitted


ABSTRACT

Background
Patient-reported outcome measures evaluating shoulder disability after neck dissection (ND) have not been sufficiently validated. We assessed the psychometric properties of the Shoulder Disability Questionnaire (SDQ), the Neck Dissection Impairment Index (NDII) and the Shoulder Pain and Disability Index (SPADI) in patients after ND.

Methods
A total of 107 patients completed the SDQ, NDII and SPADI on 4 occasions over 6 months and underwent physical examination. We assessed internal consistency, test-retest reliability, clinical and construct validity, and responsiveness to change. The possibility of combining the NDII and SPADI items into a single scale was explored by Rasch analysis.

Results
All questionnaires exhibited good reliability and validity. After dichotomisation of the item responses, a Rasch model could be fitted to the combined NDII and SPADI items.

Conclusion
The results support the suitability of the SDQ, NDII and SPADI for use in ND patients. Combining the SPADI and NDII in a single Rasch scale improves the distribution of item difficulties, but reduces variability and discriminative ability.


Introduction

Shoulder complaints such as pain and restricted range of motion are well-known sequelae of neck dissection (ND) for head and neck cancer 1. Shoulder complaints can negatively affect daily activities and compromise the patient's health-related quality of life 2-4. The prevalence of shoulder disability after ND ranges from 20% after selective neck dissection (SND) to 77% after radical neck dissection (RND), although there is considerable variability across studies 1,5,6.

Patient-reported outcome measures (PROMs) are used in research and clinical practice to quantify the subjective shoulder complaints resulting from neck dissection. A number of PROMs are currently available to assess shoulder complaints, but their psychometric properties for use in ND populations have been insufficiently established. This complicates the interpretation of study findings and may also partly account for the variability in the reported prevalence of ND-related shoulder complaints and disability 1,7.

The University of Washington Quality of Life questionnaire (UW-QOL) 8-11, the Shoulder Disability Questionnaire (SDQ) 2,5,8,9, the Shoulder Pain and Disability Index (SPADI) 12,13, and the Neck Dissection Impairment Index (NDII) 8,9,13-16 are the most commonly used PROMs for assessing ND-related shoulder complaints 1. The UW-QOL is a head-and-neck-cancer-specific questionnaire that includes a single item on shoulder function 17. Although this may suffice for screening purposes 8, the lack of detail limits its usefulness for evaluating changes in shoulder complaints over time in prevention or treatment trials. The SDQ and SPADI were developed for use in patients with general shoulder pathology 18-20. Although both questionnaires have exhibited good psychometric properties in various clinical populations 21-23, neither has been validated for use in an ND population. The NDII was developed specifically to assess the disability and quality-of-life impact of neck dissection. Preliminary data supported its validity in a single, small cross-sectional study 14.

Although the SDQ, SPADI and NDII all assess shoulder complaints, they do so in different ways. The SDQ primarily reflects the physical function domain of the International Classification of Functioning, Disability and Health (ICF). The SPADI also includes items related to activity restriction, although these are limited to simple activities such as reaching for and carrying objects. While the SDQ and SPADI assess shoulder pain more comprehensively, the NDII includes assessment of more complex activities, such as adverse changes in overall activity level. Additionally, it contains items relating to the ICF domain of social participation, such as the ability to work and to engage in social and recreational activities.

The primary aim of our study was to conduct a comprehensive evaluation of the psychometric properties of the SDQ, the SPADI and the NDII in patients who have undergone ND, including reliability, validity and responsiveness to change over time. Additionally, we used item response theory analysis to determine the extent to which it is empirically justifiable to combine the items of the SPADI and the NDII into a single, more comprehensive measure of shoulder complaints, and to evaluate the psychometric properties of such a combined measure.


Methods

Setting and patients

We consecutively recruited patients from three specialized head and neck cancer (HNC) centres in the Netherlands: The Netherlands Cancer Institute, the Medical Centre Leeuwarden and the Rijnstate Hospital Arnhem. Patients were recruited by their treating physiotherapist during regular outpatient follow-up visits. All patients provided written informed consent. The medical ethics committees of the participating hospitals approved the study. Patients were eligible if they were aged 18 years or older and had undergone a neck dissection 1 to 3 months earlier as part of their treatment for HNC. Exclusion criteria were: lack of basic written and oral command of the Dutch language; serious psychiatric or cognitive problems that would preclude completion of self-report questionnaires; prior serious shoulder complaints unrelated to the neck dissection (e.g., due to orthopaedic or rheumatoid disorders); or accessory nerve damage prior to the neck dissection.

Sociodemographic and medical characteristics

We extracted age, gender, height and weight, primary tumour, type and extent of neck dissection, and (neo)adjuvant treatment of the neck (radiotherapy, chemoradiation, chemotherapy) from the medical records. Highest level of education, current profession and leisure activities involving use of the arm or neck on the operated side were collected through self-report.

PROMs

The SDQ is a 16-item scale with three response categories (yes/no/not applicable). Sum scores are calculated as the percentage of applicable items that are endorsed 18. The SPADI is a 13-item scale that uses a 0-10 numerical rating scale 20. For both the SDQ and SPADI, higher scores indicate more complaints. The NDII contains 10 items, each with 5 response options with verbal anchors ranging from 'not at all' to 'a lot'; higher scores indicate fewer complaints 14. Because no Dutch translation of the NDII was available, we performed a standard forward-backward translation procedure. The provisional Dutch version of the NDII was then pilot tested for clarity in a small sample of 10 patients. This led to minor rephrasing of three questions, after which the Dutch NDII was considered fit for further psychometric evaluation.
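The SDQ scoring rule described above (the percentage of applicable items that are endorsed) can be sketched in Python as follows. This is an illustrative sketch only: the function name `sdq_score` and the response encoding (`True` for "yes", `False` for "no", `None` for "not applicable") are our assumptions, and the handling of items a patient simply skipped is not modelled.

```python
from typing import Optional, Sequence


def sdq_score(responses: Sequence[Optional[bool]]) -> Optional[float]:
    """Percentage (0-100) of applicable SDQ items that are endorsed.

    Hypothetical encoding: True = "yes", False = "no", None = "not applicable".
    Returns None when no item is applicable, because the score is then undefined.
    """
    applicable = [r for r in responses if r is not None]
    if not applicable:
        return None
    # True counts as 1 and False as 0 when summed.
    return 100.0 * sum(applicable) / len(applicable)


# Example: 4 "yes", 8 "no" and 4 "not applicable" answers on the 16 items.
example = [True] * 4 + [False] * 8 + [None] * 4
print(round(sdq_score(example), 1))  # 33.3
```

Because "not applicable" items drop out of the denominator, two patients with the same number of "yes" answers can receive different SDQ scores.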

The SDQ, NDII and SPADI were administered at four time points: (T1) during a follow-up visit 1-3 months after surgery; (T2) within 7 days after T1; and (T3) and (T4) during routine medical follow-up visits approximately 3 and 6 months after T1. At T3 and T4, patients were asked whether or not they felt a need for rehabilitation treatment of their shoulder.

We assessed health-related quality of life at all time points except T2 with the Dutch-language version of the RAND 36-item Health Survey (RAND-36) 24,25, a generic 36-item questionnaire that has been used in previous studies addressing shoulder morbidity in Dutch HNC centres 2,4. The RAND-36 includes 9 scales assessing physical functioning, social functioning, role limitations due to physical problems, role limitations due to emotional problems, mental health, vitality, bodily pain, general health perception and health change.

Figure 1 summarizes the measurements taken at all time points.


Physical examination

At all time points except T2, we measured active range of motion (AROM) for abduction using an inclinometer according to a standardized protocol, and assessed the presence of pain (yes/no) on passive external rotation of the shoulder. AROM for abduction is indicative of accessory nerve dysfunction 26 and, like pain on external rotation, is a predictor of shoulder disability 2.

Statistical analysis

Statistical analyses were performed using R, version 2.15.2 (R Core Team, Vienna, Austria) 27, and OPLM (CITO, Arnhem, the Netherlands) 28.

Figure 1. Study flow

Descriptive statistics

We generated descriptive statistics (frequency and percentage, mean and standard deviation, or median and range, as appropriate) for sociodemographic and medical variables. We calculated sum scores for the SDQ, SPADI and NDII, and linearly transformed all scores to a 0-100 range, maintaining the original scoring directions. For all follow-up points, we calculated summary statistics (mean, standard deviation, median, minimum and maximum) per scale, as well as floor and ceiling effects (expressed as the proportion of patients with the best and the worst possible score, respectively). We used mean imputation for patients with fewer than 50% of items missing on a questionnaire, and excluded patients with more than 50% missing items.
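A minimal sketch of the missing-data rule, the 0-100 rescaling and the floor/ceiling calculation is given below. It assumes that "mean imputation" means replacing a patient's missing items with the mean of that patient's answered items on the same questionnaire (the text does not specify which mean was used), and that a questionnaire with exactly 50% missing items is still imputed. Following the footnote to Table 3, the floor is taken as the best possible score and the ceiling as the worst; all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def impute_person_mean(items: Sequence[Optional[float]]) -> Optional[List[float]]:
    """Fill missing items (None) with the mean of the patient's answered items.

    Returns None (questionnaire excluded) when more than half of the items are missing.
    Assumption: exactly 50% missing is still imputed.
    """
    answered = [x for x in items if x is not None]
    if not answered or len(items) - len(answered) > len(items) / 2:
        return None
    person_mean = sum(answered) / len(answered)
    return [person_mean if x is None else x for x in items]


def rescale_0_100(raw: float, lowest: float, highest: float) -> float:
    """Linear transformation of a raw sum score to 0-100, keeping the scoring direction."""
    return 100.0 * (raw - lowest) / (highest - lowest)


def floor_ceiling(scores: Sequence[float], best: float, worst: float) -> Tuple[float, float]:
    """Percentages of patients at the best (floor) and worst (ceiling) possible score."""
    n = len(scores)
    floor = 100.0 * sum(1 for s in scores if s == best) / n
    ceiling = 100.0 * sum(1 for s in scores if s == worst) / n
    return floor, ceiling


# SPADI example: 13 items scored 0-10, so the raw sum ranges from 0 to 130.
items = [2, 3, None, 5, 0, 1, 4, 2, 3, None, 1, 0, 2]
filled = impute_person_mean(items)
print(rescale_0_100(sum(filled), 0, 130))
# For the SPADI the best score is 0 and the worst is 100; for the NDII it is reversed.
print(floor_ceiling([0.0, 0.0, 50.0, 100.0], best=0.0, worst=100.0))  # (50.0, 25.0)
```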

Item Response Theory Scaling

One of the objectives of the study was to examine, by means of item response theory (IRT) analysis, the possibility of combining the NDII and SPADI items into a single scale.

Rasch models and related IRT-based models estimate the 'difficulty' of individual items together with person (dis)ability on a common logit scale. This enables visualization of item difficulty along the continuum of a construct, so that gaps and redundancies in item difficulty can be detected. It also provides a meaningful ordering of the items, which enhances clinical interpretation.
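For a dichotomous item, the Rasch model expresses the log-odds of endorsing the item as the difference between the person location θ and the item difficulty b, both on the logit scale, so that P = exp(θ - b) / (1 + exp(θ - b)). The short sketch below illustrates this; in the present context "endorsing" an item means reporting the problem it describes, so θ can be read as the level of shoulder disability. The function name is ours.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """P(item endorsed | person location theta, item difficulty b), both in logits.

    Dichotomous Rasch model: P = exp(theta - b) / (1 + exp(theta - b)).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# A patient located exactly at an item's difficulty endorses it with probability 0.5;
# an "easier" item (lower b) is endorsed more often by the same patient.
for b in (-1.0, 0.0, 1.0):
    print(b, round(rasch_probability(0.0, b), 3))
```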


Rasch models have been applied to improve questionnaires in many settings, including the assessment of quality of life and mood in cancer survivors 29,30. Using a Rasch model has important benefits: when the data fit the model, the resulting scale has true metric (interval-level) properties. Also, the resulting scale is considered 'person free', meaning that the measurement properties of the items are expected to hold in other populations as well 31. In Rasch analysis, item fit tests can be used to evaluate the appropriateness of the item response scales, i.e., whether item score categories should be collapsed before summation. Rasch analysis assumes, and tests, the unidimensionality of a scale. After exploratory factor analysis and inspection of the scree plot, we performed Rasch analysis on the combined NDII and SPADI items with the OPLM software package. We estimated item difficulty locations, assessed the extent of item difficulty coverage along the continuum of subjective shoulder disability, and tested the fit of the combined items to the unidimensional Rasch model.

The SDQ was not included in this analysis, since its 'not applicable' response option precludes meaningful dichotomization of responses, which is a prerequisite for Rasch analysis. Data from all time points were used in the analysis.

Item difficulties were estimated using conditional maximum likelihood (CML) estimation in a one-parameter logistic (Rasch) model. Because this method makes no assumptions about the distribution of the latent trait in the sample or about the way the sample is selected, it accommodates the use of dependent observations 32. The Rasch analysis consisted of two parts. First, we examined the appropriateness of the rating scale of each SPADI and NDII item in OPLM and collapsed disordered rating step categories. Second, we fitted the data to the one-parameter logistic model with the collapsed rating categories. Fit of the items to the unidimensional model was tested using specific item-oriented fit statistics, so-called M-tests, which compare deviations of observed and expected frequencies of item scores for shoulder patients. M-test values follow a t-distribution, and values between -2 and +2 indicate fit for an item 28. Overall fit of the combined scales to the unidimensional Rasch model was examined using the R1c statistic, whose P-value should exceed 0.05 to accept the model for the data 28. We then calculated absolute agreement between expected and observed item scores, conditional on the sum score. We plotted the item difficulty locations to identify gaps and redundancies and to assess the extent to which measurement sensitivity and comprehensiveness could be improved by combining both questionnaires.
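Conditional maximum likelihood works because, under the Rasch model, the probability of a response pattern given its sum score r does not depend on θ: it equals the product of ε_i = exp(-b_i) over the endorsed items, divided by the elementary symmetric function γ_r of all ε_i. The sketch below, with hypothetical item difficulties, checks this identity numerically and writes out the conditional log-likelihood that a CML routine would maximise. It is not OPLM's implementation and performs no optimisation. It also shows why patients with zero scores on all items carry no information about the item parameters: their conditional pattern probability is 1.

```python
import math
from itertools import product
from typing import List, Sequence


def elementary_symmetric(eps: Sequence[float]) -> List[float]:
    """gamma[r]: sum over all patterns with sum score r of the product of eps_i over the
    endorsed items, built up one item at a time with the standard recursion."""
    gamma = [1.0]
    for e in eps:
        new = gamma + [0.0]
        for r in range(1, len(new)):
            new[r] += e * gamma[r - 1]
        gamma = new
    return gamma


def conditional_pattern_prob(x: Sequence[int], b: Sequence[float]) -> float:
    """P(pattern x | its sum score) under the Rasch model; theta has cancelled out."""
    eps = [math.exp(-bi) for bi in b]
    numerator = math.prod(e for e, xi in zip(eps, x) if xi == 1)
    return numerator / elementary_symmetric(eps)[sum(x)]


def joint_pattern_prob(x: Sequence[int], b: Sequence[float], theta: float) -> float:
    """Unconditional P(pattern x | theta), used only to check the identity above."""
    p = 1.0
    for xi, bi in zip(x, b):
        pi = 1.0 / (1.0 + math.exp(-(theta - bi)))
        p *= pi if xi == 1 else 1.0 - pi
    return p


def conditional_log_likelihood(b: Sequence[float], patterns: Sequence[Sequence[int]]) -> float:
    """CML log-likelihood of item difficulties b; all-0 and all-1 patterns contribute 0."""
    eps = [math.exp(-bi) for bi in b]
    gamma = elementary_symmetric(eps)
    return sum(-sum(xi * bi for xi, bi in zip(x, b)) - math.log(gamma[sum(x)]) for x in patterns)


# Hypothetical item difficulties (logits) and one response pattern with sum score 2.
b = [-1.0, 0.0, 0.5, 1.2]
x = (1, 0, 1, 0)
for theta in (-2.0, 1.5):
    same_score = [y for y in product((0, 1), repeat=len(b)) if sum(y) == sum(x)]
    brute_force = joint_pattern_prob(x, b, theta) / sum(
        joint_pattern_prob(y, b, theta) for y in same_score
    )
    # The two printed values agree (up to rounding), whatever theta is.
    print(theta, round(brute_force, 6), round(conditional_pattern_prob(x, b), 6))
```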

Additionally, we evaluated the combined scale alongside the original PROMs using classical test theory, as described below. For this purpose, a sum score was calculated by summing the item scores as used in the IRT analysis, and a linear transformation was applied to obtain a 0-100 score, with higher scores indicating more complaints.

Reliability

We calculated intraclass correlation coefficients (ICC(2,1)) 31 for all scales, using the T1 and T2 measurements to assess test-retest reliability, and Cronbach's alpha coefficient 31 on the T1 data to estimate internal consistency.
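The two reliability coefficients can be computed from their standard definitions, as sketched below: Cronbach's alpha from item and total-score variances, and ICC(2,1) (two-way random effects, absolute agreement, single measurement, as defined by Shrout and Fleiss) from the mean squares of a two-way ANOVA without replication. The function names and the data layout (one row per patient) are assumptions for illustration, and the sketch does not compute the confidence intervals reported in Table 3.

```python
from statistics import mean, variance
from typing import Sequence


def cronbach_alpha(data: Sequence[Sequence[float]]) -> float:
    """data[p][i]: score of patient p on item i (complete cases only)."""
    k = len(data[0])
    item_variances = [variance([row[i] for row in data]) for i in range(k)]
    total_variance = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def icc_2_1(data: Sequence[Sequence[float]]) -> float:
    """Shrout-Fleiss ICC(2,1): two-way random effects, absolute agreement, single measurement.

    data[p][j]: score of patient p on occasion j (e.g. j = 0 for T1, j = 1 for T2).
    """
    n, k = len(data), len(data[0])
    grand = mean([x for row in data for x in row])
    row_means = [mean(row) for row in data]
    col_means = [mean([row[j] for row in data]) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


# A systematic shift between T1 and T2 lowers the absolute-agreement ICC,
# even though the patients keep their rank order.
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))  # 1.0
print(icc_2_1([[1, 2], [2, 3], [3, 4]]))
```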


Clinical validity

To assess clinical validity, we calculated the area under the receiver operating characteristic (ROC) curve (AUC) for the SDQ, SPADI, NDII and the combined scale, with the patients' self-reported need for rehabilitation as the criterion. The AUC reflects the probability that a randomly selected patient with a self-reported need for shoulder rehabilitation has a worse questionnaire score than a randomly selected patient without such a need.
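The AUC can be computed without drawing the ROC curve, as the proportion of (need, no need) patient pairs in which the patient with a need has the worse score, counting ties as one half; this is the Mann-Whitney formulation of the area under the empirical ROC curve. The sketch below is a hypothetical helper. It assumes that higher scores mean more complaints (so NDII scores would need to be reversed first) and does not reproduce the confidence intervals reported in the Results.

```python
from typing import Sequence


def auc(scores_with_need: Sequence[float], scores_without_need: Sequence[float]) -> float:
    """Probability that a random patient with a treatment need has a higher (worse) score
    than a random patient without one; ties count as 1/2.

    Assumes higher scores mean more complaints, so NDII scores should be reversed first.
    """
    wins = 0.0
    for a in scores_with_need:
        for c in scores_without_need:
            if a > c:
                wins += 1.0
            elif a == c:
                wins += 0.5
    return wins / (len(scores_with_need) * len(scores_without_need))


print(auc([2, 4], [1, 3]))  # 0.75: three of the four pairs are ordered correctly
```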

Known-groups validity is also an aspect of clinical validity 31. To establish this property for each scale, we compared median scores between several subgroups of patients based on T1 data: patients with RND or modified RND (level 1-5 dissection) versus SND; patients with AROM for abduction ≥90° versus <90°; and patients who had shoulder pain on external rotation versus patients who did not. Previous research has demonstrated that shoulder disability differs significantly between these subgroups 2,33.
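The known-groups comparisons in Table 4 use a Mann-Whitney U test with continuity correction. A self-contained sketch of the normal approximation, including the usual correction for ties, is given below. The function names and the sign convention (Z > 0 when the first group tends to score higher) are our choices, and the sketch returns a two-sided p-value; a one-sided p-value would be half of it when the difference lies in the hypothesised direction.

```python
import math
from typing import Dict, List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_z(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """U statistic for x, normal-approximation Z (tie- and continuity-corrected)
    and two-sided p-value. Z > 0 means x tends to score higher than y."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    diff = u - n1 * n2 / 2
    # Continuity correction: move U - E[U] half a unit towards zero.
    correction = 0.5 if diff > 0 else -0.5 if diff < 0 else 0.0
    z = (diff - correction) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p_two_sided


print(mann_whitney_z([1, 2, 3], [4, 5, 6]))
```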

Construct (convergent and divergent) validity

We assessed convergent and divergent correlations between each questionnaire and the RAND-36 domains, as well as shoulder range of motion. Correlations between all questionnaires were also calculated. For this purpose, we constructed univariable linear multilevel models with random intercepts per patient to account for the repeated measurements. All variables (SDQ, NDII, SPADI, combined scale, ROM for shoulder abduction and scores on each of the RAND-36 domains) were centred and scaled by subtracting the mean and dividing by the standard deviation, so that the Beta coefficients could be interpreted as correlation coefficients. We expected moderate to high correlations (r > 0.40) 34 of the PROMs with AROM for shoulder abduction and with the RAND-36 domains physical functioning, role functioning-physical and bodily pain; moderate correlations (0.3 < r < 0.5) 34 with social functioning; and small correlations (r < 0.3) 34 with role functioning-emotional, mental health, energy, general health perception and health change. Additionally, we plotted the mean scores of the SDQ, NDII, SPADI and the combined scale over time in relation to ROM for shoulder abduction, to visually assess the responsiveness to change of the scales compared with an external reference measure.
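The reason for standardizing all variables is that, in a simple (single-level) regression, the slope of one standardized variable on another equals Pearson's correlation coefficient. The sketch below verifies this on illustrative numbers. It does not fit the random-intercept multilevel model used in the study; in that model the standardized fixed-effect slope combines within- and between-patient associations, so it approximates a correlation rather than reproducing it exactly.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def standardize(v: Sequence[float]) -> List[float]:
    """Centre on the mean and divide by the (sample) standard deviation."""
    m, s = mean(v), stdev(v)
    return [(x - m) / s for x in v]


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((c - my) ** 2 for c in y)
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative values only
y = [2.0, 1.0, 4.0, 3.0, 6.0]
# The two numbers agree up to floating-point rounding.
print(ols_slope(standardize(x), standardize(y)), pearson_r(x, y))
```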

Results

We enrolled 107 patients in the study. Characteristics of the sample are shown in Table 1. Ninety-two patients (86%) returned their T2 questionnaires. T2 questionnaires that were completed and returned later than 8 days after T1 were excluded from the test-retest analyses (n = 32). Additionally, some questionnaires contained too many missing items and were therefore excluded from the analysis, leaving between 54 and 58 evaluable patients for the T2 measurement (Table 3). Patients who did not return their questionnaire (in time) were, on average, 6 years younger and 2 weeks closer to their date of surgery than patients who did.

Eighty-eight patients (82%) completed the T3 questionnaires and 82 (77%) the T4 questionnaires. The number of available questionnaires, mean time since surgery, and reasons for loss to follow-up are depicted in Figure 1. Not all patients returned fully completed questionnaires. Missing data on the questionnaires were <10% at all time points, except for the SPADI at T1 (14% missing).


Table 1. Descriptive statistics of the study sample

Characteristic (1/2)  Frequency  Percent

Total number of participants 107 100

Male 78 73

Median age (min-max) 62 (31-83)

Median BMI (min-max) 25.6 (15.2-42.6)

Localisation primary tumour

Larynx/pharynx 7 6

Oropharynx/ tongue 43 40

Salivary glands 11 10

Skin/ lip 34 32

Other 12 12

T classification

Carcinoma in situ 1 <1

1 20 19

2 25 23

3 12 11

4 6 6

x 17 16

unknown* 26 24

N classification

0 39 36

1 20 19

2 16 15

x 6 6

unknown* 26 24

Surgical procedure

Radical (modified) neck dissection 33 31

Accessory nerve sacrificed 10 10

Sternocleidomastoid muscle sacrificed 40 37

Internal jugular vein sacrificed 32 30


Characteristic (2/2)  Frequency  Percent

Radiotherapy/chemotherapy

Neoadjuvant radiotherapy 11 10

Neoadjuvant chemoradiation 2 2

Adjuvant chemotherapy 2 2

Currently on chemotherapy 1 <1

Adjuvant radiotherapy 46 43

Currently on radiotherapy treatment 18 17

Education†

Elementary school 11 10

Secondary school (high school) 15 14

Vocational education 43 40

Higher vocational education (B) 22 21

University 13 12

Employment‡

None (retired, unemployed) 46 43

Desk job 35 32

Light physical work 8 8

Moderate to heavy physical work 6 6

Homemaker 4 4

Leisure activities involving use of arm/neck§

No relevant activities reported 37 35

Sports/ exercise 39 36

Handcrafting (timbering etc.) 17 16

Gardening 8 8

Community work 3 3

Musician 1 1

Other 2 2

* For patients who had previously been treated elsewhere, no TN classification is available.
† Level of education is missing for 2 persons.
‡ Employment is missing for 8 persons.
§ If participants were active in more than one category, the most strenuous activity category is reported.


Item Response Theory analysis

For the IRT analysis, only complete data from all time points were used, and patients with zero scores on all items were excluded. Thus, a total of 292 observations were included in the analysis.

Visual inspection of the scree plot suggested unidimensionality of the combined items. The first factor explained 60% of the variance, and although the second factor added another 9% explained variance, the correlation between the two factors was >0.90 at all time points. We considered this sufficient evidence to proceed with the Rasch analysis.

Rating scale analysis showed disordered rating scale step categories for all items, which was resolved by dichotomising the rating scales of the SPADI and the NDII. We recoded SPADI item scores <4 as 0 and scores ≥4 as 1. For the NDII, we recoded the categories 'not at all' and 'a little' as 0 and all higher categories ('moderate' to 'very much') as 1. After dichotomisation, a Rasch-type model could be fitted, with an R1c of 71.5 with 66 degrees of freedom (p = 0.30), indicating good model fit. Absolute agreement of expected and observed item scores, conditional on the sum score, ranged between 78% and 96% (median 88%). The overall ICC between expected and observed scores was 0.996.
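The recoding rules above, together with the 0-100 combined score described in the Methods (the sum of the dichotomised items, linearly transformed), can be sketched as follows. The sketch assumes that each NDII response is stored as a category index from 0 ('not at all') to 4 ('very much'), with 1 = 'a little' and 2 = 'moderate'; the label of the remaining category is not given in the text, and all function names are hypothetical.

```python
from typing import Sequence


def dichotomise_spadi(score: int) -> int:
    """SPADI item on the 0-10 rating scale: scores < 4 -> 0, scores >= 4 -> 1."""
    return 1 if score >= 4 else 0


def dichotomise_ndii(category: int) -> int:
    """NDII item as a category index 0-4, with 0 = 'not at all' and 1 = 'a little'
    (assumed coding). 'Not at all'/'a little' -> 0; 'moderate' and higher -> 1."""
    return 1 if category >= 2 else 0


def combined_score(spadi_items: Sequence[int], ndii_items: Sequence[int]) -> float:
    """0-100 combined score over the 13 SPADI and 10 NDII dichotomised items;
    higher scores indicate more complaints."""
    if len(spadi_items) != 13 or len(ndii_items) != 10:
        raise ValueError("expected 13 SPADI items and 10 NDII items")
    items = [dichotomise_spadi(s) for s in spadi_items] + [dichotomise_ndii(c) for c in ndii_items]
    return 100.0 * sum(items) / len(items)


print(combined_score([5] * 13, [0] * 10))  # 13 of the 23 items scored 1
```

Dichotomising before summing means that, for example, SPADI ratings of 4 and 10 contribute equally to the combined score, which is the loss of variability discussed later in this chapter.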

Figure 2 displays the item difficulty spread of the separate and combined SPADI and NDII items. The plot shows that the NDII and SPADI both have gaps in item difficulty coverage, which can be resolved by combining the two scales. There was also some overlap, with 4 items having equal item difficulty. Table 2 shows the item content ordered by item difficulty.

Figure 2. Item difficulty on a logit scale for the Shoulder Pain and Disability Index (SPADI), the Neck Dissection Impairment Index (NDII) and the combined scale. Dots represent scale items and are stacked in case of equal item difficulty.


Table 2. Item order (easy to difficult) and the problems addressed in the combined questionnaire using dichotomized responses. The italicized items have equal difficulty.

Item* Problem queried

N5 Limitations with lifting heavy objects.

S12 Difficulty with carrying heavy objects of 10 pounds (5 kg)

S11 Difficulty with placing an object on a high shelf

S1 Pain at its worst > 3 on Numeric Rating Scale

S3 Pain when reaching for something on a high shelf

N2 Bothered by stiffness in neck or shoulder

N1 Pain or discomfort of the neck or shoulder

N10 Limitations with work (including work at home)

N6 Limitations reaching up to kitchen top level

S7 Difficulty with washing the back

N9 Limitations in leisure time activities

N7 Diminished overall activity level

S2 Pain when lying on the involved side

S8 Difficulty with putting on an undershirt or jumper

S5 Pain while pushing with involved arm

S4 Pain when touching the back of the neck

S6 Difficulty with washing hair

S9 Difficulty with putting on a front buttoned shirt

N3 Difficulty with self care

S13 Difficulty with removing something from back pocket

N4 Limitations with lifting light objects

N8 Diminished participation in social activities

S10 Difficulty with putting on trousers

* Letters and numbers correspond to the original scale and item (N= Neck Dissection Impairment Index, S= Shoulder Pain and Disability Index).

Classical Test Theory analysis

Reliability, floor and ceiling effects
All questionnaires exhibited good to excellent internal consistency and test-retest reliability, with Cronbach's alpha ranging from 0.91 to 0.96 and ICC(2,1) from 0.84 to 0.93. The NDII exhibited the smallest floor effects, followed by the SPADI, the SDQ and the combined scale. Floor effects increased with follow-up time, reaching 56% for the combined scale at T4. Some ceiling effects were present for the SDQ and the combined scale. The number of valid questionnaires, reliability statistics, score summary statistics and floor/ceiling effects at all time points are shown in Table 3.


Table 3. Number of valid questionnaires, reliability and descriptive statistics for the questionnaires at all time points

Instrument*     Time  N valid  Alpha†  ICC(2,1) (95% CI)‡  P       Mean  SD    Median  Min  Max  Floor (%)§  Ceiling (%)§
SDQ             T1    103      0.91    -                   -       33    28.4  31      0    100  19          3
                T2    58       -       0.84 (0.74–0.90)    <0.001  27    28.0  25      0    100  23          1
                T3    87       -       -                   -       27    26.2  20      0    100  25          1
                T4    77       -       -                   -       14    18.9  0       0    81   40          0
SPADI           T1    92       0.96    -                   -       23    21.8  17      0    80   16          0
                T2    56       -       0.91 (0.85–0.95)    <0.001  23    22.9  16      0    82   20          0
                T3    85       -       -                   -       18    18.9  12      0    77   19          0
                T4    76       -       -                   -       12    17.7  3       0    86   33          0
NDII            T1    101      0.94    -                   -       73    21.0  78      10   100  5           0
                T2    54       -       0.93 (0.87–0.96)    <0.001  75    21.2  78      10   100  11          0
                T3    85       -       -                   -       80    19.1  85      8    100  13          0
                T4    76       -       -                   -       87    12.7  90      43   100  20          0
Combined scale  T1    104      0.94    -                   -       28    28.8  17      0    100  26          2
                T2    56       -       0.90 (0.84–0.94)    <0.001  25    29.3  9       0    91   39          0
                T3    88       -       -                   -       19    26.3  7       0    100  42          1
                T4    78       -       -                   -       11    19.6  0       0    87   56          0

* Original item scores were used for calculating sum scores of the individual instruments, and dichotomized item scores for the combined scale. Lower scores indicate less disability on the Shoulder Pain and Disability Index (SPADI), Shoulder Disability Questionnaire (SDQ) and combined score, and higher disability on the Neck Dissection Impairment Index (NDII).
† Cronbach's alpha as calculated on T1 data.
‡ Test-retest reliability between T1 and T2.
§ Floor and ceiling effects are expressed as the percentage of respondents with the best and the worst possible score, respectively.


Clinical validity
There were no statistically significant differences between the questionnaires in their ability to discriminate between patients with and without a self-reported need for treatment. At T3 and T4, 29 and 17 patients, respectively, expressed a need for treatment. At T3, the area under the receiver operating characteristic curve (AUC) was 0.85 (95% CI 0.78–0.94) for the SDQ, 0.85 (95% CI 0.77–0.94) for the SPADI, 0.85 (95% CI 0.77–0.94) for the NDII and 0.79 (95% CI 0.69–0.90) for the combined scale. At T4, discriminatory ability was lower for all scales, with an AUC of 0.77 (95% CI 0.63–0.91) for the SDQ, 0.71 (95% CI 0.57–0.86) for the SPADI, 0.74 (95% CI 0.58–0.90) for the NDII and 0.72 (95% CI 0.57–0.87) for the combined scale.

Known-groups comparison
Median scores on the SDQ, SPADI, NDII and the combined scale differed in the expected direction between all known groups (Table 4). All differences were statistically significant at the 0.05 level, except for the comparisons between R(M)ND and SND, where only the NDII score differences were significant.

Convergent correlations, divergent correlations and responsiveness to change
Convergent and divergent correlations with the RAND-36 domains and objectively measured shoulder function were as expected (Table 5). Visual assessment of the change in mean scores of the SDQ, NDII, SPADI and the combined scale showed a strong association over time with the change in shoulder AROM for abduction, supporting their responsiveness to change (Figure 3).

Table 4. Known-groups comparisons at T1*

                Type of neck dissection     AROM abduction              Pain at passive external rotation of the shoulder
Scale           R(M)ND  SND   Z     p       <90°    >90°  Z     p       yes     no    Z     p
n               33      74                  68      39                  22      85
SDQ†            38      28    -0.9  0.17    40      6     -5.5  <0.01   52      19    -3.3  <0.01
SPADI‡          22      13    -1.4  0.08    22      4     -4.2  <0.01   32      13    -3.5  <0.01
NDII§           70      80    1.8   0.03    68      86    4.7   <0.01   68      80    2.5   <0.01
Combined scale  26      13    -1.3  0.10    30      4     -4.7  <0.01   52      13    -3.6  <0.01

* Listed are the number of patients per subgroup (n), median scores, and Z-scores and corresponding p-values from a Mann-Whitney U test with continuity correction.
† Shoulder Disability Questionnaire; higher scores indicate more disability.
‡ Shoulder Pain and Disability Index; higher scores indicate more disability.
§ Neck Dissection Impairment Index; higher scores indicate less disability.


Figure 3. Mean scores over time for the Shoulder Disability Questionnaire (SDQ), Shoulder Pain and Disability Index (SPADI), Neck Dissection Impairment Index (NDII) and the combined scale. For ease of interpretation, an inverse score is used for the NDII (lower scores indicating fewer complaints), and the Y-axis for shoulder abduction is reversed (a descending line for active range of motion (AROM) reflects improved shoulder function). T2 data are omitted because AROM was not measured at T2.

Discussion

Our results support the reliability and validity of the SDQ, SPADI and NDII for assessing shoulder complaints after neck dissection, with the SPADI and NDII exhibiting the highest (and comparable) reliability. While the SPADI provides more detail on pain, the NDII would be the obvious choice if aspects of activity and social participation are of interest. Also, the NDII was the only scale that was sensitive to the type of neck dissection. In addition, the NDII exhibited the smallest floor effects. Floor effects on all scales increased over time, as shoulder function improved in a large number of patients. Floor effects were largest for the combined scale. A post-hoc analysis showed that, among patients with a score of 0 on the combined scale, only one patient expressed a need for treatment. This could indicate that the observed floor effects appropriately reflect the absence of serious shoulder complaints.

We hypothesized that the scales could be complementary, and used Rasch analysis to explore this possibility. To combine the NDII and SPADI in a Rasch model, item scores were dichotomised. Our choice of cut-off points for the NDII was to some extent arbitrary. Different cut-points have been described for dichotomising 0-10 point numeric (pain) scales such as those used in the SPADI 35,36. We also considered the often-used 5-point cut-off, but it resulted in misfit of the Rasch model, indicating that the 4-point cut-off discriminates best between trait levels. Dichotomising responses comes at the cost of losing variability in the data, but in return improves interpretability. The IRT analysis showed that a number of items had disordered step ratings, which was resolved by dichotomisation.
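The dichotomisation step and the model it feeds can be sketched as follows. This is a hedged illustration under stated assumptions: the helper names are hypothetical, and the direction of the 4-point cut-off (a response of 4 or higher scored as 1) is assumed rather than taken from the chapter. The sketch uses the plain dichotomous Rasch model; the OPLM software cited for this study 28 additionally allows fixed item discrimination indices, which are omitted here.

```python
import math
from typing import List, Sequence


def dichotomise(responses: Sequence[int], cutoff: int = 4) -> List[int]:
    """Recode 0-10 item responses to 0/1.

    Assumption: a response at or above the cutoff is scored 1 ("complaint present").
    """
    for r in responses:
        if not 0 <= r <= 10:
            raise ValueError(f"response out of 0-10 range: {r}")
    return [1 if r >= cutoff else 0 for r in responses]


def rasch_probability(theta: float, b: float) -> float:
    """P(X = 1) under the dichotomous Rasch model.

    theta: person location (disability) in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


print(dichotomise([0, 3, 4, 10]))   # [0, 0, 1, 1]
print(rasch_probability(0.5, 0.5))  # 0.5
```

A person whose location equals an item's difficulty has a 50% chance of scoring 1 on that item; after dichotomisation each item has a single difficulty, so disordered step ratings can no longer occur.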


Figure 2 shows that patients with a disability score between -1 and 0 logit, or between 0.5 and 1.5 logit, cannot be distinguished from one another with the SPADI. The same applies to patients with NDII scores between -2 and -1 logit, and between 0 and 1 logit. Combining the SPADI and NDII into a single scale resolved these gaps and resulted in a more even spread of item difficulties.
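One simple way to locate such gaps is to sort the item difficulties and report adjacent pairs that lie far apart on the logit scale. The sketch below is a hypothetical heuristic, not the procedure used to produce Figure 2; the 1-logit threshold and all difficulty values are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def difficulty_gaps(
    difficulties: Sequence[float], max_gap: float = 1.0
) -> List[Tuple[float, float]]:
    """Return (lower, upper) pairs of adjacent item difficulties (in logits)
    that are more than max_gap apart; max_gap = 1.0 is an arbitrary assumption."""
    ordered = sorted(difficulties)
    return [(lo, hi) for lo, hi in zip(ordered, ordered[1:]) if hi - lo > max_gap]


# Hypothetical item difficulties for two scales, for illustration only.
scale_a = [-2.0, -0.9, 0.2, 1.6]
scale_b = [-1.5, -0.4, 0.9]
print(len(difficulty_gaps(scale_a)))       # 3
print(difficulty_gaps(scale_a + scale_b))  # []
```

Pooling the difficulties of two scales fills gaps only where one scale has items in the regions the other lacks, which is the complementarity the combined scale relies on.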

Some limitations of this study should be noted. Only 60 patients returned their T1 questionnaires within an appropriate time window, which limited the number of observations available for the test-retest analysis. Also, at all time points there was a substantial number of invalid questionnaires due to missing or ambiguous responses (e.g., two response options chosen for a single item). Although this proportion was below 10% at most time points, it nevertheless suggests that a small percentage of patients found it difficult to complete the questionnaires. The number of missing items could possibly be reduced with computer-aided assessment, which can provide instant feedback on missed items. However, considering that most patients in this population are over 60 years of age, one cannot assume that all patients will have the requisite computer skills or access to the internet. This is likely to be less of a problem with future generations of patients.
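The test-retest analysis mentioned above compares each patient's scores on two occasions. This section does not state which coefficient was used; the intraclass correlation coefficient for absolute agreement, ICC(2,1) in Shrout and Fleiss's notation, is a common choice and is assumed here. The sketch below computes it from a patients × occasions table using two-way ANOVA mean squares; the function name and the example data are hypothetical, and this is not the study's analysis code.

```python
from typing import Sequence


def icc_2_1(data: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    data holds one row per patient and one column per occasion (e.g. test, retest).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least 2 patients")
    k = len(data[0])
    if k < 2 or any(len(row) != k for row in data):
        raise ValueError("need at least 2 occasions in a rectangular table")
    grand = sum(x for row in data for x in row) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Made-up test-retest scores: identical on both occasions, then a constant +1 shift.
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))  # 1.0
print(icc_2_1([[1, 2], [2, 3], [3, 4]]))  # about 0.667
```

Because the agreement form penalises systematic differences between occasions, a uniform shift in scores lowers the coefficient even though the ranking of patients is unchanged; with few observations, as in this study, such estimates are correspondingly imprecise.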

Although sufficient, the sample size in our study was relatively small, particularly for the IRT analysis. Therefore, although our results are promising, they need to be confirmed in future studies.

Clinical validity was assessed in part using patients' self-expressed need for shoulder rehabilitation as a criterion. Although this is a clinically relevant anchor, perceived need for treatment may be influenced by factors other than shoulder complaints alone 2.

Our initial aim was to include all questionnaires that had previously been used in studies on shoulder complaints after neck dissection, but we chose not to include the UW-QoL 17 because of its very limited coverage of shoulder problems (a single item). The Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire has also been used in a fairly recent study evaluating shoulder complaints after neck dissection 37, but that study was published after enrolment for the current study had started. Hence, this scale was not included in our study. A recent cross-sectional clinimetric study provided some evidence for the reliability and validity of the DASH in patients following neck dissection 38.

Although our study provides information on the performance of the scales between 1 and 8 months after neck dissection, future studies are needed to evaluate the scales when used earlier or later in the cancer care trajectory of these patients.

Notable strengths of the study include its longitudinal design, the inclusion of shoulder range-of-motion measures, and the comprehensive approach taken to psychometric evaluation, in particular the use of item response theory.

Conclusion

The results of this study support the suitability of the SDQ, NDII and SPADI for assessing shoulder complaints in individual patients after neck dissection, and for evaluating change in these complaints over time. Combining the SPADI and NDII into a single scale and dichotomising the responses yields a Rasch scale that allows for true metric measurement and provides a meaningful item ordering, as well as a better spread of item difficulties along the continuum of shoulder disability, but at the expense of lower variability and discriminative ability.


Table 5. Correlation coefficients between the shoulder questionnaires and the RAND-36* domains and shoulder abduction.

| | SDQ | SPADI | NDII | Combined scale |
|---|---|---|---|---|
| Shoulder questionnaires | | | | |
| SDQ† | 1 | 0.78 | -0.76 | 0.77 |
| SPADI‡ | | 1 | -0.75 | 0.91 |
| NDII§ | | | 1 | -0.87 |
| RAND-36 domains | | | | |
| Physical functioning | -0.45 | -0.52 | 0.49 | -0.54 |
| Social functioning | -0.41 | -0.37 | 0.48 | -0.44 |
| Role functioning physical | -0.49 | -0.40 | 0.47 | -0.43 |
| Role functioning emotional | -0.28 | -0.23 | 0.31 | -0.26 |
| Mental health | -0.28 | -0.24 | 0.28 | -0.28 |
| Vitality | -0.41 | -0.45 | 0.45 | -0.47 |
| Pain | -0.59 | -0.55 | 0.65 | -0.59 |
| Health perception | -0.19 | -0.19 | 0.24 | -0.25 |
| Health change | -0.33 | -0.27 | 0.37 | -0.34 |
| Shoulder range of motion | | | | |
| Abduction | -0.56 | -0.46 | 0.50 | -0.49 |

* RAND 36-item Health Survey; higher scores indicate better health.
† Shoulder Disability Questionnaire; higher scores indicate more disability.
‡ Shoulder Pain and Disability Index; higher scores indicate more disability.
§ Neck Dissection Impairment Index; higher scores indicate less disability.

Acknowledgement

We would like to thank M.L. Vos, M.B. Pantlin and J.C. Chepeha for their assistance with the translation of the NDII, and P. Venema and E.M. de Boer for patient recruitment.

REFERENCES

1. Goldstein DP, Ringash J, Bissada E, Jaquet Y, Irish J, Chepeha D, et al. Scoping review of the literature on shoulder impairments and disability after neck dissection. Head Neck 2014;36:299–308.

2. Stuiver MM, van Wilgen CP, de Boer EM, de Goede CJT, Koolstra M, van Opzeeland A, et al. Impact of shoulder complaints after neck dissection on shoulder disability and quality of life. Otolaryngol Head Neck Surg 2008;139:32–9.

3. Terrell JE, Welsh DE, Bradford CR, Chepeha DB, Esclamado RM, Hogikyan ND, et al. Pain, quality of life, and spinal accessory nerve status after neck dissection. Laryngoscope 2000;110:620–6.

4. van Wilgen CP, Dijkstra PU, van der Laan BFAM, Plukker JT, Roodenburg JLN. Shoulder and neck morbidity in quality of life after surgery for head and neck cancer. Head Neck 2004;26:839–44.

5. van Wilgen CP, Dijkstra PU, van der Laan BFAM, Plukker JTM, Roodenburg JLN. Shoulder complaints after nerve sparing neck dissections. Int J Oral Maxillofac Surg 2004;33:253–7.

6. Shone GR, Yardley MP. An audit into the incidence of handicap after unilateral radical neck dissection. J Laryngol Otol 1991;105:760–2.

7. Goldstein DP, Ringash J, Bissada E, Jacquet Y, Irish J, Chepeha D, et al. Evaluation of shoulder disability questionnaires used for the assessment of shoulder disability after neck dissection for head and neck cancer. Head Neck 2013. doi:10.1002/hed.23490

8. Rogers SN, Scott B, Lowe D. An evaluation of the shoulder domain of the University of Washington quality of life scale. Br J Oral Maxillofac Surg 2007;45:5–10.

9. Orhan KS, Demirel T, Baslo B, Orhan EK, Yücel EA, Güldiken Y, et al. Spinal accessory nerve function after neck dissections. J Laryngol Otol 2007;121:44–8.

10. Kuntz AL, Weymuller EA. Impact of neck dissection on quality of life. Laryngoscope 1999;109:1334–8.

11. Laverick S, Lowe D, Brown JS, Vaughan ED, Rogers SN. The Impact of Neck Dissection on Health-Related Quality of Life. Arch Otolaryngol Head Neck Surg 2004;130:149–54.

12. Selcuk A, Selcuk B, Bahar S, Dere H. Shoulder function in various types of neck dissection. Role of spinal accessory nerve and cervical plexus preservation. Tumori 2008;94:36–9.

13. McNeely ML, Parliament MB, Seikaly H, Jha N, Magee DJ, Haykowsky MJ, et al. Effect of exercise on upper extremity pain and dysfunction in head and neck cancer survivors. Cancer 2008;113:214–22.

14. Taylor RJ, Chepeha JC, Teknos TN, Bradford CR, Sharma PK, Terrell JE, et al. Development and validation of the neck dissection impairment index: a quality of life measure. Arch Otolaryngol Head Neck Surg 2002;128:44–9.

15. Güldiken Y, Orhan KS, Demirel T, Ural HI, Yücel EA, Değer K. Assessment of shoulder impairment after functional neck dissection: long term results. Auris Nasus Larynx 2005;32:387–91.

16. Murer K, Huber GF, Haile SR, Stoeckli SJ. Comparison of morbidity between sentinel node biopsy and elective neck dissection for treatment of the N0 neck in patients with oral squamous cell carcinoma. Head Neck 2011;33:1260–4.

17. Hassan SJ, Weymuller EA. Assessment of quality of life in head and neck cancer patients. Head Neck 1993;15:485–96.

18. van der Heijden GJ, Leffers P, Bouter LM. Shoulder disability questionnaire design and responsiveness of a functional status measure. J Clin Epidemiol 2000;53:29–38.


19. Elvers RI, Oostendorp RAB, N SI. The Dutch-language version of the Shoulder Pain and Disability Index (SPADI-Dutch Version) in patients after subacromial decompression according to Neer: internal consistency and construct validity. Dutch Journal of Physical Therapy 2003;113:126–31.

20. Roach KE, Budiman-Mak E, Songsiridej N, Lertratanakul Y. Development of a shoulder pain and disability index. Arthritis Care Res 1991;4:143–9.

21. Beaton D, Richards RR. Assessing the reliability and responsiveness of 5 shoulder questionnaires. J Shoulder Elbow Surg 1998;7:565–72.

22. Bot SDM, Terwee CB, van der Windt DA, Bouter LM, Dekker J, de Vet HC. Clinimetric evaluation of shoulder disability questionnaires: a systematic review of the literature. Ann Rheum Dis 2004;63:335–41.

23. van der Windt DA, van der Heijden GJ, de Winter AF, Kroes BW, Deville W, Bouter LM. The responsiveness of the shoulder disability questionnaire. Ann Rheum Dis 1998;57:82–7.

24. van der Zee KI, Sanderman R, Heyink JW, de Haes H. Psychometric qualities of the RAND 36-item health survey 1.0: A multidimensional measure of general health status. Int J Behav Med 1996;3:104–22.

25. van der Zee KI, Sanderman R. Het meten van de algemene gezondheidstoestand met de RAND-36, een handleiding [Measuring general health status with the RAND-36: a manual]. 2nd ed. Groningen: Research Institute SHARE, UMCG, Groningen University; 2012.

26. Dijkstra PU, van Wilgen CP, Buijs RP, Brendeke W, de Goede CJ, Kerst A, et al. Incidence of shoulder pain after neck dissection: a clinical explorative study for risk factors. Head Neck 2001;23:947–53.

27. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing. www.R-project.org. 2013.

28. Glas CAW, Verhelst NDG. One Parameter Logistic Model (OPLM). Arnhem, the Netherlands; 1995.

29. Smith AB, Wright P, Selby PJ, Velikova G. A Rasch and factor analysis of the Functional Assessment of Cancer Therapy-General (FACT-G). Health Qual Life Outcomes 2007;5:19.

30. Lambert S, Pallant JF, Girgis A. Rasch analysis of the Hospital Anxiety and Depression Scale among caregivers of cancer survivors: implications for its use in psycho-oncology. Psycho-Oncology 2011;20:919–25.

31. Streiner DL, Norman GR. Health Measurement Scales. New York: Oxford University Press; 2008.

32. Verhelst NDG, Glas CAW. The one parameter logistic model. In: Fischer GH, Molenaar IW, editors. Rasch Models: foundations, recent developments and applications. New York: Springer-Verlag; 1995.

33. Cheng PT, Hao SP, Lin YH, Yeh AR. Objective comparison of shoulder dysfunction after three neck dissection techniques. Ann Otol Rhinol Laryngol 2000;109(8 Pt 1):761–6.

34. Cohen J. Statistical power analysis for the behavioral sciences. New York: Academic Press; 1977.

35. Fejer R, Jordan A, Hartvigsen J. Categorising the severity of neck pain: Establishment of cut-points for use in clinical and epidemiological research. Pain 2005;119:176–82.

36. Serlin RC, Mendoza TR, Nakamura Y, Edwards KR, Cleeland CS. When is cancer pain mild, moderate or severe? Grading pain severity by its interference with function. Pain 1995;61:277–84.

37. Carr SD, Bowyer D, Cox G. Upper limb dysfunction following selective neck dissection: A retrospective questionnaire study. Head Neck 2009;31(6):789–92.

38. Goldstein DP, Ringash J, Irish JC, Gilbert R, Gullane P, Brown D, et al. Assessment of the Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire for use in patients following neck dissection for head and neck cancer. Head Neck 2013. doi:10.1002/hed.23593