
Generic measures: limitations of use within specific settings?

J Freeman

Institute of Health Studies

Plymouth University

Properties

Clinical: feasibility

Psychometric: reliability, validity, appropriateness, responsiveness

Validity

Does it measure what it says it measures?

Content validity
Criterion validity
Construct validity (convergent and discriminant)

(Bowling 1997)

Construct validity

The extent to which empirical data supports hypotheses concerning the attributes being measured

Detective work; jigsaw puzzle

Appropriateness

Is the range of the construct measured within the sample similar to the range covered by the instrument?

Van der Putten et al. 1999

The 36-item Short Form Health Survey (SF-36):

Gold-standard generic self-report measure of health status

Adopted & disseminated world-wide

Standardised UK and US versions

SF-36 dimensions

Dimension                    No. of items
Physical function            10
Physical role limitations    4
Emotional role limitations   3
Emotional well-being         5
Bodily pain                  2
Energy / vitality            4
Social function              2
Health perceptions           5

The SF-36

Relatively few studies have evaluated its use as an outcome measure for clinical practice or clinical trials in MS

Aim of study

To explore the reliability, validity, clinical appropriateness, and responsiveness of the SF-36 in MS patients within a health care setting

Methods

150 patients with clinically definite MS

Broad spectrum of disease severity

Assessments completed once in 106 patients and twice in 44 rehabilitation inpatients

Assessments

Disease severity: EDSS
Health status: SF-36
Disability: FIM
Handicap: LHS
Emotional well-being: GHQ

Assessment of construct validity...

Convergent validity: correlations between SF-36 dimensions & instruments measuring similar & different constructs

Group differences validity: ANOVA to differentiate between different groups
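The convergent validity check described above can be sketched in Python. This is only an illustration, not the study's analysis: the function name `pearson_r` and the example scores are hypothetical, and Pearson's r is an assumption, since the slides do not say which correlation coefficient was used.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five patients (not data from the study).
sf36_physical = [10, 25, 40, 55, 90]
related_scale = [60, 75, 80, 100, 120]    # instrument measuring a similar construct
unrelated_scale = [3, 9, 2, 8, 4]         # instrument measuring a different construct

r_convergent = pearson_r(sf36_physical, related_scale)
r_discriminant = pearson_r(sf36_physical, unrelated_scale)
```

Convergent validity is supported when `r_convergent` is substantial, and discriminant validity when `r_discriminant` is weak.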

...Assessment of construct validity

Hypothesis testing: t-tests to investigate whether results are in line with theoretical expectation

Assessment of appropriateness

Examination of the scale score distributions of the 8 dimensions and the 2 summary components of the SF-36 and all other measures: range, mean, SD, floor, ceiling
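The score distribution check above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `score_distribution`, the 0 to 100 scale bounds and the example scores are hypothetical, and the sample SD (n - 1 denominator) is assumed.

```python
from statistics import mean, stdev


def score_distribution(scores, scale_min=0, scale_max=100):
    """Summarise a list of scale scores: range, mean, SD, floor and ceiling.

    Floor is the percentage of respondents at the lowest possible score,
    ceiling the percentage at the highest possible score.
    """
    n = len(scores)
    return {
        "range": (min(scores), max(scores)),
        "mean": mean(scores),
        "sd": stdev(scores),  # sample SD (assumption)
        "floor_pct": 100 * sum(s == scale_min for s in scores) / n,
        "ceiling_pct": 100 * sum(s == scale_max for s in scores) / n,
    }


# Hypothetical scores for ten patients (not data from the study).
example_scores = [0, 0, 0, 5, 10, 25, 40, 60, 100, 100]
summary = score_distribution(example_scores)
```

A large share of respondents at either extreme means the scale cannot register further deterioration (floor) or improvement (ceiling) in those patients.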

Sample characteristics

Mean age: 45 yrs (24 - 78)
Female: 68%
Disease pattern: SP 50%, RR 33%, PP 11%, Benign 6%
Mean years since diagnosis: 11 (0.1 - 38)
Mean EDSS: 5.7 (1 - 9)

Results: convergent & discriminant validity

Convergent & discriminant validity supported

Substantial correlations with related scales, e.g. FIM with SF-36 physical function (r = 0.68), EDSS (r = 0.82)

Weak correlations with unrelated scales, e.g. GHQ with SF-36 physical function (r = 0.26)

Results: group differences validity

Group differences validity supported

Significant differences demonstrated in health status at different levels of disease severity (p<0.05)

Results: hypothesis testing

As hypothesised:

Patients requiring carer assistance reported lower physical scores (p<0.0001)

Patients scoring > 5 GHQ points reported lower SF-36 emotional scores (p<0.0001)

Results: appropriateness

Scores span the entire spectrum of available range

Significant floor and ceiling effects (>20%) in:
- physical function
- physical role limitations
- emotional role limitations
- bodily pain

Results: appropriateness

Floor & ceiling effects particularly marked when patient selection restricted to narrow range

- physical dimensions 52% floor in severe group

- physical role limitations 84% floor in severe group

- role limitations 45% ceiling in mild group

[Figure: score range, with floor and ceiling marked]

Implications

Spectrum of SF-36 scale too limited to detect changes which may occur in people with MS

Likely to limit its potential responsiveness

Limited usefulness within specific MS populations / settings

Recommendations

Generic measures should be tested for specific populations and for specific purposes

When evaluating health status in MS the SF-36 should be supplemented with other relevant & validated measures to ensure comprehensive & valid measurement

Recommendations

Clinicians & researchers should understand the properties of an outcome measure when choosing an instrument and interpreting the information it generates

...the measure you choose is key in determining effectiveness

Properties of Outcome Measures



Reliability of gait measurements using CODA mpx30 motion analysis system

Veronica Maynard

Institute of Health Studies

University of Plymouth

Reliability

Reliability refers to the consistency or repeatability of a measurement taken under the same conditions

Factors affecting reliability

instrumental reliability - reliability of measurement device

rater reliability - reliability of rater administering measurement device

response reliability - reliability/stability of variable being measured

Sources of error

Measurement error: difference between a measurement & its true value

Systematic error: bias resulting from one or more processes

Random error

Reliability

3 broad categories of reliability:
- equivalence (reproducibility)
- stability (repeatability)
- internal consistency (homogeneity)

Types of reliability & how they are determined

Type of reliability              How determined
Equivalence or reproducibility   Inter-rater reliability
Stability or consistency         Intra-rater or test-retest reliability
Internal consistency             Split half reliability & item analysis

(Adapted from: Sim & Wright 2000, p.132)

Aim of study

To determine intra-rater and inter-rater reliability of gait measurements using CODA mpx30 motion analysis system

Reliability studies (I)

Intra-rater reliability study:

10 healthy subjects

mean age 39.2 (29-52) yrs

3 recordings

single trained observer

Reliability studies (II)

Inter-rater reliability study:

19 healthy subjects

mean age 34.4 (20-49) yrs

3 trained observers

Procedure

self-selected speed

Investigators blind

Points for analysis:
i) initial contact (IC)
ii) mid-stance (MSt)
iii) mid-swing (MSw)

[Figure: stick figure illustrations of position of right leg (red) at 1) IC 2) MSt and 3) MSw. Joint angles, moments and powers were determined at these points in the gait cycle.]

Procedure (cont)

Spatiotemporal parameters: walking velocity, duration of stance, duration of swing

Kinematic variables: hip, knee & ankle angles at IC, MSt & MSw

Kinetic variables: moments & power at hip, knee, ankle at IC and MSt

Analysis

Sagittal plane data

Bland & Altman methods

Intraclass correlation coefficient (ICC) to determine consistency and agreement among ratings
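The two analysis methods named above can be sketched in Python. This is a minimal sketch, not the study's code: the function names and the example angles are hypothetical, the limits of agreement use the conventional bias ± 1.96 SD of the differences, and the ICC form shown (two-way random effects, absolute agreement, single measurement, often written ICC(2,1)) is an assumption, since the slides do not say which form was used.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Bias and 95% limits of agreement between paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return {
        "means": means,   # x-axis of a Bland & Altman plot
        "diffs": diffs,   # y-axis of a Bland & Altman plot
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }


def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per subject, one column per rater."""
    n = len(ratings)       # subjects
    k = len(ratings[0])    # raters or sessions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical ankle angles (degrees) for four subjects, two sessions each.
session_1 = [10.0, 12.0, 14.0, 16.0]
session_2 = [9.0, 13.0, 13.0, 17.0]
agreement = bland_altman(session_1, session_2)
icc = icc_2_1([list(pair) for pair in zip(session_1, session_2)])
```

The two methods answer different questions: the limits of agreement express error in the units of measurement (degrees here), while the ICC is a unitless ratio that also depends on how much the subjects differ from one another.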

[Figure: Right Ankle Sagittal Rotation. Horizontal axis: Time (% Gait Cycle), divided into stance phase and swing phase, with IC, MSt, TO and MSw marked. Vertical axis: Dorsiflexion (+ve) (degrees), -20 to 20.]

Graphical illustration of sagittal plane joint movement of the ankle during a single gait cycle (dorsiflexion positive, plantarflexion negative). IC = Initial contact; MSt = Mid stance; TO = Toe off; MSw = Mid swing

Results (I)

Intra-rater study:

Good agreement for spatio-temporal parameters

Generally low ICC values (ICC < 0.75) for all parameters

Bland & Altman plots showed reasonable agreement for kinematic data at ankle and knee

Summary of key findings (II)

Inter-rater study:

Generally good agreement for spatio-temporal parameters (ICC > 0.70)

Lower ICC values & wide limits of agreement for kinematic data (especially hip)

[Figure: two Bland & Altman plots, 1) and 2). Horizontal axis: angle mean am-pm (degrees). Vertical axis: angle difference am-pm (degrees).]

Examples of distribution plots from Bland & Altman test for am-pm repeatability showing mean measurements against differences between measurements for ankle range of motion (degrees) at 1) initial contact 2) mid stance.

Factors affecting reliability

Errors associated with marker placement

Soft tissue motion

Natural variation in individual gait cycle

Sampling rate

Recommendations

Standard protocol for marker placement

Training of observers

Averaging of min 3 gait cycles (Winter 1984)

Interpret with caution data from single cycle

General Recommendations

Standard protocol

Training

Averaging may be required

Determine level of error

Assess reliability before use in research/clinically

Assess reliability in population under study

Responsiveness

S.K. Spooner PhD BSc SRCh

Scheme Co-ordinator Podiatry

Properties



Responsiveness to Change

HRQOL measures should be responsive to interventions that change HRQOL

Evaluating responsiveness requires assessing HRQOL relative to an external indicator of change

Testing for Responsiveness

Measurement tools should be tested on patients receiving treatment of known efficacy

Capable of detecting treatment effects?

Responsiveness Indices

Effect size (ES) = D / SD

Standardized Response Mean (SRM) = D / SD+

Guyatt responsiveness statistic (RS) = D / SD++

Where:
D = mean change
SD = baseline SD
SD+ = SD of D
SD++ = SD of D among “unchanged”
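The three indices defined above can be sketched in Python. This is a minimal sketch under stated assumptions: the function names and the example scores are hypothetical, change is taken as follow-up minus baseline, and the sample SD (n - 1 denominator) is assumed throughout.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Per-patient change, follow-up minus baseline."""
    return [after - before for before, after in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """ES = mean change / SD of baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """SRM = mean change / SD of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


def guyatt_rs(baseline, follow_up, stable_baseline, stable_follow_up):
    """RS = mean change / SD of change among patients judged unchanged."""
    changes = change_scores(baseline, follow_up)
    stable_changes = change_scores(stable_baseline, stable_follow_up)
    return mean(changes) / stdev(stable_changes)


# Hypothetical scores (not data from any study cited here).
admission = [10, 20, 30, 40]
discharge = [14, 22, 36, 46]
stable_first = [20, 30, 40, 50]     # patients judged "unchanged"
stable_second = [21, 29, 41, 49]

es = effect_size(admission, discharge)
srm = standardized_response_mean(admission, discharge)
rs = guyatt_rs(admission, discharge, stable_first, stable_second)
```

All three share the same numerator and differ only in the denominator, so the same mean change can look small against a widely spread baseline (ES) and large against consistent change scores (SRM).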

So How Big Are Different Changes?

Effect size benchmarks:
Small: 0.20 - 0.49
Moderate: 0.50 - 0.79
Large: 0.80 or above
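The benchmarks above can be applied mechanically, as in this short sketch. The function name `classify_effect_size` is hypothetical; the "negligible" label for values below 0.20 and the use of the absolute value (so that deterioration is classified by magnitude) are assumptions added for illustration.

```python
def classify_effect_size(es):
    """Label an effect size using the small / moderate / large benchmarks."""
    magnitude = abs(es)  # assumption: classify by size, ignoring direction
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.20:
        return "small"
    return "negligible"  # assumption: label for values below the benchmarks


labels = {value: classify_effect_size(value) for value in (0.01, 0.30, 0.56, 0.93)}
```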

Example 1

Freeman, J. et al.: Clinical appropriateness: a key factor in outcome measurement selection: the 36 item short form health survey in multiple sclerosis. J Neurol Neurosurg Psychiatry 2000; 68:150-156

Results

n=44

Effect sizes for SF-36 dimensions ranged from negligible to small (0.01-0.30)

Pain & Physical Function demonstrated statistically significant change from admission to discharge

Results

In contrast:

Functional Independence Measure (ES = 0.56)

London Handicap Scale (ES = 0.58)

28-item General Health Questionnaire (ES = 0.51)

Example 2

Mens, J.M. et al: Reliability & validity of hip abduction strength to measure disease severity in posterior pelvic pain since pregnancy. Spine 2002; 27(15): 1674-9

Example 2

Responsiveness of hip abduction strength expressed as standardized response mean was compared with responsiveness of Quebec Back Pain Disability Scale in patients with PPPP

Results

Responsiveness of hip abduction strength was “large” (SRM =0.93)

In comparison, Quebec Back Pain Disability Scale (SRM = 1.20)

Change and Responsiveness Depends on Treatment

[Figure: Treatment Outcomes. Impact on SF-36 (axis 2 to 12) for hip replacement, shoulder surgery, heart valve surgery and ulcer medication.]

Magnitude of Change Should Parallel Underlying Change

[Figure: Change in HRQOL (axis 0 to 12) plotted against size of intervention.]

Generic vs Condition Specific Instruments

SF-36 is a generic measure, and may contain items unrelated to the disease being studied.

Generic vs Condition Specific Instruments

Generic instruments are most useful in discriminating and making comparisons of different disease states for determining severity of disease impact and cross-condition comparisons.

Generic vs Condition Specific Instruments

Disease-specific instruments can assess limitations or restrictions associated with particular disease states.

May be more responsive to minimally significant changes.

Value Depends on Cost

Whatever instrument is employed, the importance of HRQOL change depends on what it costs to produce it!
