Post on 22-Dec-2015
Generic measures: limitations of use within specific settings?
J Freeman
Institute of Health Studies
Plymouth University
Properties
Clinical feasibility
Psychometric: reliability, validity, appropriateness, responsiveness
Validity
Does it measure what it says it measures?
- Content validity
- Criterion validity
- Construct validity (convergent and discriminant)
Bowling 1997
Construct validity
The extent to which empirical data supports hypotheses concerning the attributes being measured
detective work jigsaw puzzle
Appropriateness
Is the range of the construct measured within the sample similar to the range covered by the instrument?
Van der Putten et al 1999
The 36-item Short Form Health Survey (SF-36):
Gold-standard generic self-report measure of health status
Adopted & disseminated world-wide
Standardised UK and US versions
SF-36 dimensions
Dimension                    No. items
Physical function            10
Physical role limitations    4
Emotional role limitations   3
Emotional well-being         5
Bodily pain                  2
Energy / vitality            4
Social function              2
Health perceptions           5
The SF-36
Relatively few studies have evaluated its use as an outcome measure for clinical practice or clinical trials in MS
Aim of study
To explore the reliability, validity, clinical appropriateness, and responsiveness of the SF-36 in MS patients within a health care setting
Methods
150 patients with clinically definite MS
Broad spectrum of disease severity
Assessments completed once in 106 patients and twice in 44 rehabilitation inpatients
Assessments
- Disease severity: EDSS
- Health status: SF-36
- Disability: FIM
- Handicap: LHS
- Emotional well-being: GHQ
Assessment of construct validity...
- Convergent validity: correlations between SF-36 dimensions & instruments measuring similar & different constructs
- Group differences validity: ANOVA to differentiate between different groups
...Assessment of construct validity
- Hypothesis testing: t-tests to investigate whether results are in line with theoretical expectation
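A minimal sketch of the convergent and discriminant check described above: correlate one SF-36 dimension with a measure of a similar construct and with a measure of a different construct. The function name `pearson_r` and all scores are hypothetical and serve only as illustration; they are not data from the study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for six patients (illustration only, not study data).
sf36_physical = [10, 25, 40, 55, 70, 90]
fim_total = [60, 72, 80, 95, 100, 118]   # similar construct (disability)
ghq_total = [5, 2, 9, 3, 8, 4]           # different construct (emotional well-being)

r_related = pearson_r(sf36_physical, fim_total)
r_unrelated = pearson_r(sf36_physical, ghq_total)
print(f"related: r = {r_related:.2f}, unrelated: r = {r_unrelated:.2f}")
```

Convergent validity is supported when the related pair correlates substantially, and discriminant validity when the unrelated pair correlates weakly.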
Assessment of appropriateness
Examination of the scale score distributions of the 8 dimensions and the 2 summary components of the SF-36, and of all other measures: range, mean, SD, floor, ceiling
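Floor and ceiling effects can be sketched as the percentage of respondents who score at the lowest or highest possible value of a scale. The function name `floor_ceiling`, the 0 to 100 scoring range and the scores below are assumptions made for illustration; they are not data from the study.

```python
def floor_ceiling(scores, minimum=0, maximum=100):
    """Return the percentage of respondents at the lowest and highest possible score."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_pct = 100 * sum(1 for s in scores if s == minimum) / n
    ceiling_pct = 100 * sum(1 for s in scores if s == maximum) / n
    return floor_pct, ceiling_pct

# Hypothetical dimension scores for ten patients (illustration only).
scores = [0, 0, 0, 15, 30, 45, 60, 75, 100, 100]
floor_pct, ceiling_pct = floor_ceiling(scores)
print(f"floor: {floor_pct:.0f}%, ceiling: {ceiling_pct:.0f}%")
```

A dimension with a large share of patients at either extreme cannot register further deterioration or improvement in those patients.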
Sample characteristics
- Mean age: 45 yrs (24 - 78)
- Female: 68%
- Disease pattern: SP 50%, RR 33%, PP 11%, benign 6%
- Mean years since diagnosis: 11 (0.1 - 38)
- Mean EDSS: 5.7 (1 - 9)
Results: convergent & discriminant validity
Convergent & discriminant validity supported
- Substantial correlations with related scales, e.g. FIM with SF-36 physical function (r = 0.68), EDSS (r = 0.82)
- Weak correlations with unrelated scales, e.g. GHQ with SF-36 physical function (r = 0.26)
Results: group differences validity
Group differences validity supported
Significant differences demonstrated in health status at different levels of disease severity (p<0.05)
Results: hypothesis testing
As hypothesised:
- Patients requiring carer assistance reported lower physical scores (p<0.0001)
- Patients scoring > 5 GHQ points reported lower SF-36 emotional scores (p<0.0001)
Results: appropriateness
Scores span the entire spectrum of available range
Significant floor and ceiling effects (>20%) in:
- physical function
- physical role limitations
- emotional role limitations
- bodily pain
Results: appropriateness
Floor & ceiling effects particularly marked when patient selection restricted to narrow range
- physical dimensions 52% floor in severe group
- physical role limitations 84% floor in severe group
- role limitations 45% ceiling in mild group
[Figure: score range, with floor and ceiling marked]
Implications
Spectrum of SF-36 scale too limited to detect changes which may occur in pwMS
- likely to limit its potential responsiveness
- limited usefulness within specific MS populations / settings
Recommendations
Generic measures should be tested for specific populations and for specific purposes
When evaluating health status in MS the SF-36 should be supplemented with other relevant & validated measures to ensure comprehensive & valid measurement
Recommendations
Clinicians & researchers should understand the properties of an outcome measure when choosing an instrument and interpreting the information it generates
...the measure you choose is key in determining effectiveness
Properties of Outcome Measures
Clinical feasibility
Psychometric: reliability, validity, appropriateness, responsiveness
Reliability of gait measurements using CODA mpx30 motion analysis system
Veronica Maynard
Institute of Health Studies
University of Plymouth
Reliability
Reliability refers to the consistency or repeatability of a measurement taken under the same conditions
Factors affecting reliability
instrumental reliability - reliability of measurement device
rater reliability - reliability of rater administering measurement device
response reliability - reliability/stability of variable being measured
Sources of error
- Measurement error: difference between a measurement & its true value
- Systematic error: bias resulting from one or more processes
- Random error
Reliability
3 broad categories of reliability:
- equivalence (reproducibility)
- stability (repeatability)
- internal consistency (homogeneity)
Types of reliability & how they are determined
Type of reliability              How determined
Equivalence or reproducibility   Inter-rater reliability
Stability or consistency         Intra-rater or test-retest reliability
Internal consistency             Split half reliability & item analysis
(Adapted from: Sim & Wright 2000, p.132)
Aim of study
To determine intra-rater and inter-rater reliability of gait measurements using CODA mpx30 motion analysis system
Reliability studies (I)
Intra-rater reliability study:
10 healthy subjects
mean age 39.2 (29-52) yrs
3 recordings
single trained observer
Reliability studies (II)
Inter-rater reliability study:
19 healthy subjects
mean age 34.4 (20-49) yrs
3 trained observers
Procedure
self-selected speed
Investigators blind
Points for analysis:
i) initial contact (IC)
ii) mid-stance (MSt)
iii) mid swing (MSw)
[Figure: stick figure illustrations of position of right leg (red) at 1) IC, 2) MSt and 3) MSw. Joint angles, moments and powers were determined at these points in the gait cycle.]
Procedure (cont)
- Spatiotemporal parameters: walking velocity, duration of stance, duration of swing
- Kinematic variables: hip, knee & ankle angles at IC, MSt & MSw
- Kinetic variables: moments & power at hip, knee & ankle at IC and MSt
Analysis
Sagittal plane data
Bland & Altman methods
Intraclass correlation coefficient (ICC) to determine consistency and agreement among ratings
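A minimal sketch of the two analyses named above, under stated assumptions. The slides do not say which ICC form was used; this sketch assumes a two-way model and returns both the absolute agreement and the consistency single-measure coefficients. The Bland & Altman limits assume the usual 95% limits of agreement (mean difference plus or minus 1.96 SD of the differences). Function names and all numbers are hypothetical, not data from the study.

```python
from statistics import mean, stdev

def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # half-width of the 95% limits of agreement
    return bias, bias - spread, bias + spread

def icc_two_way(ratings):
    """Return (agreement, consistency) single-measure ICCs from a two-way ANOVA.

    ratings: one row per subject, one column per rater (or session).
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    agreement = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return agreement, consistency

# Hypothetical ankle angles (degrees) from a morning and an afternoon session.
am = [10.0, 12.0, 9.0, 11.0, 13.0]
pm = [11.0, 11.0, 10.0, 12.0, 12.0]
bias, lower, upper = bland_altman(am, pm)

# Small illustrative rating table (6 subjects x 4 raters), not study data.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
agreement, consistency = icc_two_way(ratings)
print(f"bias {bias:.2f}, limits {lower:.2f} to {upper:.2f}")
print(f"ICC agreement {agreement:.2f}, consistency {consistency:.2f}")
```

In the illustrative table the raters rank subjects similarly but differ in their overall level, so the consistency coefficient is much higher than the agreement coefficient. This is one reason to report which ICC form was used.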
[Figure: Right Ankle Sagittal Rotation. Graphical illustration of sagittal plane joint movement of the ankle during a single gait cycle (dorsiflexion positive, plantarflexion negative). Axes: dorsiflexion (+ve) (degrees), -20 to 20, against time (% gait cycle), with stance phase and swing phase marked. IC = Initial contact; MSt = Mid stance; TO = Toe off; MSw = Mid swing]
Results (I)
Intra-rater study:
- Good agreement for spatio-temporal parameters
- Generally low ICC values (ICC < 0.75) for all parameters
- Bland & Altman plots showed reasonable agreement for kinematic data at ankle and knee
Summary of key findings (II)
Inter-rater study:
- Generally good agreement for spatio-temporal parameters (ICC > 0.70)
- Lower ICC values & wide limits of agreement for kinematic data (especially hip)
[Figure: Examples of distribution plots from Bland & Altman test for am-pm repeatability, showing mean measurements against differences between measurements for ankle range of motion (degrees) at 1) initial contact and 2) mid stance. Axes: angle mean am-pm (degrees) against angle difference am-pm (degrees)]
Factors affecting reliability
Errors associated with marker placement
Soft tissue motion
Natural variation in individual gait cycle
Sampling rate
Recommendations
- Standard protocol for marker placement
- Training of observers
- Averaging of min 3 gait cycles (Winter 1984)
- Interpret with caution data from single cycle
General Recommendations
- Standard protocol
- Training
- Averaging may be required
- Determine level of error
- Assess reliability before use in research/clinically
- Assess reliability in population under study
Responsiveness
S.K. Spooner PhD BSc SRCh
Scheme Co-ordinator Podiatry
Properties
Clinical feasibility
Psychometric: reliability, validity, appropriateness, responsiveness
Responsiveness to Change
HRQOL measures should be responsive to interventions that change HRQOL
Evaluating responsiveness requires assessing HRQOL relative to an external indicator of change
Testing for Responsiveness
Measurement tools should be tested on patients receiving treatment of known efficacy
Capable of detecting treatment effects?
Responsiveness Indices
Effect size (ES) = D/SD
Standardized Response Mean (SRM) = D/SD+
Guyatt responsiveness statistic (RS) = D/SD++
Where:
D = mean change
SD = baseline SD
SD+ = SD of change scores
SD++ = SD of change scores among “unchanged” patients
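The three indices differ only in the denominator, which the following sketch makes explicit. It assumes paired baseline and follow-up scores, the sample (n - 1) standard deviation, and a separate group of patients judged unchanged on an external criterion. Function names and scores are hypothetical, not data from either example study.

```python
from statistics import mean, stdev

def change_scores(baseline, follow_up):
    """Per-patient change from baseline to follow-up."""
    return [after - before for before, after in zip(baseline, follow_up)]

def effect_size(baseline, follow_up):
    """ES: mean change divided by the SD of baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)

def standardized_response_mean(baseline, follow_up):
    """SRM: mean change divided by the SD of the change scores."""
    change = change_scores(baseline, follow_up)
    return mean(change) / stdev(change)

def guyatt_rs(baseline, follow_up, stable_baseline, stable_follow_up):
    """RS: mean change divided by the SD of change among unchanged patients."""
    change = change_scores(baseline, follow_up)
    stable_change = change_scores(stable_baseline, stable_follow_up)
    return mean(change) / stdev(stable_change)

# Hypothetical scores (illustration only).
baseline = [40, 50, 60, 70, 80]
follow_up = [45, 60, 65, 80, 85]
stable_baseline = [50, 60, 70, 80]      # patients judged unchanged
stable_follow_up = [52, 58, 71, 79]

es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)
rs = guyatt_rs(baseline, follow_up, stable_baseline, stable_follow_up)
print(f"ES = {es:.2f}, SRM = {srm:.2f}, RS = {rs:.2f}")
```

The same mean change gives very different index values depending on which variability it is scaled against, so indices of different types should not be compared directly.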
So How Big Are Different Changes?
Effect size benchmarks
- Small: 0.20 - 0.49
- Moderate: 0.50 - 0.79
- Large: 0.80 or above
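The benchmarks can be applied mechanically, as in this small sketch. The function name `classify_effect_size` is hypothetical, and two choices are assumptions: the sign is ignored, and values below 0.20 are labelled "negligible".

```python
def classify_effect_size(value):
    """Label the magnitude of an effect size using the benchmarks above."""
    size = abs(value)  # assumption: direction of change is ignored
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "negligible"  # assumption: label for values below 0.20

for es in (0.01, 0.30, 0.56, 0.93):
    print(es, classify_effect_size(es))
```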
Example 1
Freeman, J. et al.: Clinical appropriateness: a key factor in outcome measurement selection: the 36 item short form health survey in multiple sclerosis. J Neurol Neurosurg Psychiatry 2000; 68:150-156
Results
n=44
- Effect sizes for SF-36 dimensions ranged from negligible to small (0.01-0.30)
- Pain & Physical Function demonstrated statistically significant change from admission to discharge
Results
In contrast:
- Functional Independence Measure (ES = 0.56)
- London Handicap Scale (ES = 0.58)
- 28-item General Health Questionnaire (ES = 0.51)
Example 2
Mens, J.M. et al: Reliability & validity of hip abduction strength to measure disease severity in posterior pelvic pain since pregnancy. Spine 2002; 27(15): 1674-9
Example 2
Responsiveness of hip abduction strength expressed as standardized response mean was compared with responsiveness of Quebec Back Pain Disability Scale in patients with PPPP
Results
Responsiveness of hip abduction strength was “large” (SRM =0.93)
In comparison, Quebec Back Pain Disability Scale (SRM = 1.20)
Change and Responsiveness Depend on Treatment
[Figure: Treatment Outcomes. Impact on SF-36 (axis 2 to 12) for hip replacement, shoulder surgery, heart valve surgery and ulcer medication]
Magnitude of Change Should Parallel Underlying Change
[Figure: change in HRQOL (axis 0 to 12) against size of intervention]
Generic vs Condition Specific Instruments
SF-36 is a generic measure, and may contain items unrelated to the disease being studied.
Generic vs Condition Specific Instruments
Generic instruments are most useful in discriminating and making comparisons of different disease states, for determining severity of disease impact, and for cross-condition comparisons.
Generic vs Condition Specific Instruments
Disease-specific instruments can assess limitations or restrictions associated with particular disease states.
May be more responsive to minimally significant changes.
Value Depends on Cost
Whatever instrument is employed, the importance of HRQOL change depends on what it costs to produce it!