![Page 1: RELIABILITY consistency or reproducibility of a test score (or measurement)](https://reader036.vdocuments.net/reader036/viewer/2022062320/56649d435503460f94a1fa2d/html5/thumbnails/1.jpg)
RELIABILITY
consistency or reproducibility of a test score (or measurement)
Common approaches to estimating reliability
Classical True Score Theory
– test-retest, alternate forms, internal consistency: useful for estimating relative decisions
– intraclass correlation: useful for estimating absolute decisions
Generalizability Theory
– can estimate both relative & absolute
Reliability is a concept central to all behavioral sciences. To some extent all measures are unreliable. This is especially true of psychological measures and of measurements based on human observation.
Sources of Error
Random
– fluctuations in the measurement based purely on chance.
Systematic
– measurement error that affects a score because of some particular characteristic of the person or the test that has nothing to do with the construct being measured.
CTST
X = T + E
– Recognizes only two sources of variance
  – test-retest (stability)
  – alternate forms (equivalence in item sampling)
  – test-retest with alternate forms (stability & equivalence, but these are confounded)
– Cannot adequately estimate individual sources of error influencing a measurement
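As a hedged aside, the two sources of variance can be written out explicitly. The sketch below assumes, as classical theory usually does but the slide does not state, that the true score T and the error E are uncorrelated. The symbols σ²_X, σ²_T, and σ²_E are notation introduced here for the observed-score, true-score, and error variances.

```latex
\sigma^2_X = \sigma^2_T + \sigma^2_E,
\qquad
\text{reliability} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
```

Because all error is lumped into the single term σ²_E, this form cannot say how much of the error comes from raters, occasions, or items.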
ICC
Uses ANOVA to partition variance into between-subjects and within-subjects components
– Has some ability to accommodate multiple sources of variance
– Does not provide an integrated approach to estimating reliability under multiple conditions
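The between-subjects and within-subjects partition can be sketched in a few lines. This is a minimal illustration only: it assumes the one-way random-effects form of the ICC for a single measurement, often written ICC(1,1), which is one of several ICC variants and may not be the one the presenter had in mind. The function name `icc_oneway` and the data are hypothetical.

```python
from statistics import mean


def icc_oneway(scores):
    """One-way random-effects ICC for a single measurement, ICC(1,1).

    `scores` is a list of rows, one row per subject, each holding the same
    number of repeated measurements (for example, one per rater).
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # measurements per subject
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]

    # Between-subjects and within-subjects sums of squares.
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(scores, row_means) for x in row)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical data: 4 subjects, each measured 3 times.
example = [[9, 10, 11], [5, 6, 7], [1, 2, 3], [13, 14, 15]]
icc = icc_oneway(example)
print(f"ICC(1,1) = {icc:.3f}")
```

In this form every within-subject difference is treated as undifferentiated error, which is why the approach does not extend cleanly to several error sources at once.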
Generalizability Theory
The Dependability of Behavioral Measurements (1972), Cronbach, Gleser, Nanda, & Rajaratnam
Dependability
The accuracy of generalizing from a person’s observed score on a measure to the average score that person would have received under all possible testing conditions the tester would be willing to accept.
The Decision Maker
The score on which the decision is to be based is only one of many scores that might serve the same purpose. The decision maker is almost never interested in the response given at the particular moment of testing.
Ideally the decision should be based on that person’s mean score over all possible measurement occasions.
Universe of Generalization
Definition & establishment of the universe of admissible observations:
– observations that the decision maker is willing to treat as interchangeable.
– all sources of influence acting on the measurement of the trait under study.
What are the sources of ERROR influencing your measurement?
Generalizability Issues
Facet of Generalization
– raters, trials, days, clinics, therapists
Facet of Determination
– usually people, but can vary (e.g., raters)
Types of Studies
Generalizability Study (G-Study)
Decision Study (D-Study)
G-Study
Purpose is to anticipate the multiple uses of a measurement.
To provide as much information as possible about the sources of variation in the measurement.
The G-Study should attempt to identify and incorporate into its design as many potential sources of variation as possible.
D-Study
Makes use of the information provided by the G-Study to design the best possible application of the measurement for a particular purpose.
Planning a D-Study:
– defines the Universe of Generalization
– specifies the proposed interpretation of the measurement.
– uses G-Study information to evaluate the effectiveness of alternative designs for minimizing error and maximizing reliability.
Design Considerations
Fixed Facets
Random Facets
Fixed Facet
When the levels of the facet exhaust all possible conditions in the universe to which the investigator wants to generalize.
When the levels of the facet represent a convenient sub-sample of all possible conditions in the universe.
Random Facets
When it is assumed that the levels of the facet represent a random sample of all possible levels described by the facet.
If you are willing to EXCHANGE the conditions (levels) under study for any other set of conditions of the same size from the universe.
Types of Decisions
Relative
– establish a rank order of individuals (or groups).
– the comparison of a subject’s performance against others in the group.
Absolute
– to index an individual’s (or group’s) absolute level of measurement.
– measurement results are to be made independent from the performance of others in the group.
Statistical Modeling
ANOVA
– just as ANOVA partitions a dependent variable into effects for the independent variables (main effects & interactions), G-theory uses ANOVA to partition an individual’s measurement score into an effect for the universe score and an effect for each source of error and their interactions in the design.
Statistical Modeling
In ANOVA we were driven to test specific hypotheses about our independent variables and thus sought out the F statistic and p-value.
In G-theory we use ANOVA to partition the different sources of variance and then to estimate their amount (the variance components).
One Facet Design
4 Sources of Variability
– systematic differences among subjects (object of measurement)
– systematic differences among raters (occasions, items)
– subjects*raters interaction
– random error
(the subjects*raters interaction and random error are confounded)
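The one-facet partition can be sketched as follows. This is an illustration under stated assumptions, not the presenter's procedure: it assumes a fully crossed subjects × raters design with one score per cell, estimates the components from the usual expected mean squares, and follows the common convention of setting negative estimates to zero. The function name `one_facet_components` and the data are hypothetical.

```python
def one_facet_components(scores):
    """Variance components for a crossed subjects x raters design.

    `scores[i][j]` is the score given to subject i by rater j.
    Returns the subject, rater, and residual components; the residual
    holds the subjects*raters interaction confounded with random error.
    """
    n_s = len(scores)
    n_r = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_s * n_r)
    s_means = [sum(row) / n_r for row in scores]
    r_means = [sum(row[j] for row in scores) / n_s for j in range(n_r)]

    # Sums of squares for subjects, raters, and the residual.
    ss_s = n_r * sum((m - grand) ** 2 for m in s_means)
    ss_r = n_s * sum((m - grand) ** 2 for m in r_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_s - ss_r

    ms_s = ss_s / (n_s - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_s - 1) * (n_r - 1))

    return {
        "subjects": max((ms_s - ms_res) / n_r, 0.0),
        "raters": max((ms_r - ms_res) / n_s, 0.0),
        "residual": ms_res,
    }


# Hypothetical data: 4 subjects scored by 2 raters.
vc = one_facet_components([[2, 3], [4, 6], [6, 7], [8, 10]])
for source, value in vc.items():
    print(f"{source}: {value:.3f}")
```

With only one score per cell there is no way to separate the interaction from random error, which is why the two appear as a single residual term.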
Two Facet Design
Components of Variance
Example of a fully crossed two-facet design (Kroll et al.)
Seven sources of variance are estimated:
– subjects
– raters
– observations
– s x r
– s x o
– r x o
– s x r x o, e
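Written as an equation, the seven sources add up to the total variance of a single observed score. This is the standard decomposition for a fully crossed random-effects design; the notation X_sro for the score of subject s from rater r on observation o is introduced here for illustration.

```latex
\sigma^2(X_{sro}) = \sigma^2_s + \sigma^2_r + \sigma^2_o
  + \sigma^2_{sr} + \sigma^2_{so} + \sigma^2_{ro} + \sigma^2_{sro,e}
```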
Variance Components
[Figure: Venn diagram of the variance components: Subjects (s), Raters (r), Observations (o), (s x r), (s x o), (o x r), and (s x r x o) + Error]
TABLE 1 - Variance Components and Percentage of Variation for Measures of Pelvic Tilt (raters = 2, observations = 5)

| Source of Variation | Resting Pelvic Tilt VC | Resting Pelvic Tilt Percent | Anterior Pelvic Tilt VC | Anterior Pelvic Tilt Percent | Posterior Pelvic Tilt VC | Posterior Pelvic Tilt Percent |
|---|---|---|---|---|---|---|
| Persons | 19.956 | 75.2 | 47.683 | 84.8 | 20.607 | 72.3 |
| Raters | 1.726 | 6.5 | 0.000 | 0.0 | 2.508 | 8.8 |
| Observations | 0.148 | 0.6 | 0.000 | 0.0 | 0.011 | 0.0 |
| P x R | 1.671 | 6.3 | 1.935 | 3.4 | 1.910 | 6.7 |
| P x O | 0.042 | 0.2 | 0.972 | 1.7 | 1.077 | 3.8 |
| R x O | 0.000 | 0.0 | 0.000 | 0.0 | 0.000 | 0.0 |
| P x R x O, E | 3.050 | 11.5 | 5.646 | 10.0 | 2.394 | 8.4 |

Abbreviations: VC = variance component; P x R = persons by raters; P x O = persons by observations; R x O = raters by observations; P x R x O, E = persons by raters by observations combined with residual error
TABLE 2 - Variance Components and Percentage of Variation for Modified Schober, Attraction Method, and Lower Abdominal Strength Measures (raters = 2, observations = 3)

| Source | Modified Schober VC | Modified Schober Percent | Attraction Method VC | Attraction Method Percent | Lower Abdominal Strength VC | Lower Abdominal Strength Percent |
|---|---|---|---|---|---|---|
| Persons | 1.006 | 67.8 | 0.360 | 81.3 | 105.055 | 52.9 |
| Raters | 0.000 | 0.0 | 0.000 | 0.0 | 0.000 | 0.0 |
| Observations | 0.008 | 0.5 | 0.000 | 0.0 | 0.000 | 0.0 |
| P x R | 0.181 | 12.2 | 0.000 | 0.0 | 71.349 | 36.0 |
| P x O | 0.029 | 2.0 | 0.083 | 18.7 | 3.695 | 1.9 |
| R x O | 0.016 | 1.1 | 0.000 | 0.0 | 0.757 | 0.4 |
| P x R x O, E | 0.243 | 16.4 | 0.000 | 0.0 | 17.577 | 8.9 |

Abbreviations: VC = variance component; P x R = persons by raters; P x O = persons by observations; R x O = raters by observations; P x R x O, E = persons by raters by observations combined with residual error
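Relative Error
Facet of Determination: Subjects
[Figure: Venn diagram of the variance components: Subjects (s), Raters (r), Observations (o), (s x r), (s x o), (o x r), and (s x r x o) + Error]

$$\sigma^2_{rel} = \frac{\sigma^2_{sr}}{n_r} + \frac{\sigma^2_{so}}{n_o} + \frac{\sigma^2_{sro,e}}{n_r n_o}$$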
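Absolute Error
Facet of Determination: Subjects
[Figure: Venn diagram of the variance components: Subjects (s), Raters (r), Observations (o), (s x r), (s x o), (o x r), and (s x r x o) + Error]

$$\sigma^2_{abs} = \frac{\sigma^2_{r}}{n_r} + \frac{\sigma^2_{o}}{n_o} + \frac{\sigma^2_{sr}}{n_r} + \frac{\sigma^2_{so}}{n_o} + \frac{\sigma^2_{or}}{n_o n_r} + \frac{\sigma^2_{sro,e}}{n_o n_r}$$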
Generalizability Coefficients
AKA: Reliability Coefficients
Absolute Generalizability Coefficient for Subjects:

$$\phi = \frac{\sigma^2_s}{\sigma^2_s + \sigma^2_{abs}}$$

Relative Generalizability Coefficient for Subjects:

$$\rho^2 = \frac{\sigma^2_s}{\sigma^2_s + \sigma^2_{rel}}$$
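As a worked check, the relative and absolute error variances and the two coefficients can be computed directly from variance components. The sketch below uses the Resting Pelvic Tilt column of Table 1; the function names and dictionary keys are hypothetical, chosen only for this illustration. Because the published components are rounded to three decimals, results may differ from the published coefficients in the last digit.

```python
def error_variances(vc, n_r, n_o):
    """Relative and absolute error variance for a subjects x raters x observations design."""
    rel = vc["sr"] / n_r + vc["so"] / n_o + vc["sro_e"] / (n_r * n_o)
    # Absolute error adds the rater, observation, and rater-by-observation terms.
    abs_err = rel + vc["r"] / n_r + vc["o"] / n_o + vc["ro"] / (n_r * n_o)
    return rel, abs_err


def g_coefficients(vc, n_r, n_o):
    """Return (rho2, phi): the relative and absolute generalizability coefficients."""
    rel, abs_err = error_variances(vc, n_r, n_o)
    rho2 = vc["s"] / (vc["s"] + rel)
    phi = vc["s"] / (vc["s"] + abs_err)
    return rho2, phi


# Variance components for Resting Pelvic Tilt, copied from Table 1.
resting = {
    "s": 19.956, "r": 1.726, "o": 0.148,
    "sr": 1.671, "so": 0.042, "ro": 0.000, "sro_e": 3.050,
}

rho2_g, phi_g = g_coefficients(resting, n_r=2, n_o=5)  # design as studied
rho2_d, phi_d = g_coefficients(resting, n_r=1, n_o=1)  # one rater, one observation

print(f"2 raters, 5 observations: rho2={rho2_g:.3f}, phi={phi_g:.3f}")
print(f"1 rater, 1 observation:   rho2={rho2_d:.3f}, phi={phi_d:.3f}")
```

For the design as studied this gives roughly 0.946 and 0.907, in line with the G-study values reported for Resting Pelvic Tilt in Table 5. The absolute coefficient is never larger than the relative one, because absolute error includes every term of relative error plus the rater and observation main effects.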
TABLE 3 - Variance Components and Percentage of Variation for Right and Left Hamstring Flexibility Measures (raters = 2, observations = 3)

| Source | Right Hamstring Flexibility VC | Right Hamstring Flexibility Percent | Left Hamstring Flexibility VC | Left Hamstring Flexibility Percent |
|---|---|---|---|---|
| Persons | 398.526 | 93.1 | 382.639 | 91.9 |
| Raters | 0.000 | 0.0 | 0.000 | 0.0 |
| Observations | 1.767 | 0.4 | 2.123 | 0.5 |
| P x R | 20.656 | 4.8 | 24.030 | 5.8 |
| P x O | 0.708 | 0.2 | 1.235 | 0.3 |
| R x O | 0.001 | 0.0 | 0.707 | 0.2 |
| P x R x O, E | 6.407 | 1.5 | 5.727 | 1.4 |

Abbreviations: VC = variance component; P x R = persons by raters; P x O = persons by observations; R x O = raters by observations; P x R x O, E = persons by raters by observations combined with residual error
TABLE 4 - Variance Components and Percentage of Variation for Abdominal and Trunk Muscle Endurance Methods (raters = 2, observations = 2)

| Source | Abdominal Muscle Endurance VC | Abdominal Muscle Endurance Percent | Trunk Muscle Endurance VC | Trunk Muscle Endurance Percent |
|---|---|---|---|---|
| Persons | 646.177 | 68.9 | 1160.656 | 83.6 |
| Raters | 43.936 | 4.7 | 0.000 | 0.0 |
| Observations | 0.000 | 0.0 | 0.000 | 0.0 |
| P x R | 0.000 | 0.0 | 21.732 | 1.6 |
| P x O | 15.736 | 1.7 | 24.559 | 1.8 |
| R x O | 0.000 | 0.0 | 0.000 | 0.0 |
| P x R x O, E | 232.117 | 24.7 | 181.944 | 13.1 |

Abbreviations: VC = variance component; P x R = persons by raters; P x O = persons by observations; R x O = raters by observations; P x R x O, E = persons by raters by observations combined with residual error
TABLE 5 - Generalizability of Pelvic Tilt Measures

| | Resting Pelvic Tilt G-study | Resting Pelvic Tilt D-study | Anterior Pelvic Tilt G-study | Anterior Pelvic Tilt D-study | Posterior Pelvic Tilt G-study | Posterior Pelvic Tilt D-study |
|---|---|---|---|---|---|---|
| nr | 2 | 1 | 2 | 1 | 2 | 1 |
| no | 5 | 1 | 5 | 1 | 5 | 1 |
| ρ² | 0.946 | 0.809 | 0.967 | 0.848 | 0.936 | 0.793 |
| φ | 0.907 | 0.750 | 0.967 | 0.848 | 0.886 | 0.723 |

Abbreviations: nr = number of raters; no = number of observations; ρ² = generalizability (G) coefficient for relative decisions; φ = G-coefficient for absolute decisions
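The comparison of alternative designs that a D-study performs can be sketched as a small sweep over the number of raters and observations. This is an illustration only: it reuses the Resting Pelvic Tilt components from Table 1, and the helper name `rho2_phi`, the dictionary keys, and the range of designs are assumptions made for this example, not part of the original study.

```python
from itertools import product

# Variance components for Resting Pelvic Tilt, copied from Table 1.
RESTING = {
    "s": 19.956, "r": 1.726, "o": 0.148,
    "sr": 1.671, "so": 0.042, "ro": 0.000, "sro_e": 3.050,
}


def rho2_phi(vc, n_r, n_o):
    """Relative (rho2) and absolute (phi) G coefficients for n_r raters and n_o observations."""
    rel = vc["sr"] / n_r + vc["so"] / n_o + vc["sro_e"] / (n_r * n_o)
    abs_err = rel + vc["r"] / n_r + vc["o"] / n_o + vc["ro"] / (n_r * n_o)
    return vc["s"] / (vc["s"] + rel), vc["s"] / (vc["s"] + abs_err)


# Evaluate every design with 1-3 raters and 1-5 observations.
designs = {
    (n_r, n_o): rho2_phi(RESTING, n_r, n_o)
    for n_r, n_o in product(range(1, 4), range(1, 6))
}

for (n_r, n_o), (rho2, phi) in sorted(designs.items()):
    print(f"n_r={n_r} n_o={n_o}  rho2={rho2:.3f}  phi={phi:.3f}")
```

With these components, adding a second rater raises φ more than adding four extra observations to a single rater, because the rater-related components are much larger than the observation-related ones.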
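TABLE 6 - Generalizability of Trunk Flexibility and Strength Measures

| | Modified Schober G | Modified Schober D | Attraction Method G | Attraction Method D | Lower Abdominal Strength G | Lower Abdominal Strength D | Right Hamstring Flexibility G | Right Hamstring Flexibility D | Left Hamstring Flexibility G | Left Hamstring Flexibility D |
|---|---|---|---|---|---|---|---|---|---|---|
| nr | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 1 |
| no | 3 | 1 | 3 | 1 | 3 | 1 | 3 | 1 | 3 | 1 |
| ρ² | 0.877 | 0.690 | 0.928 | 0.813 | 0.752 | 0.531 | 0.972 | 0.935 | 0.966 | 0.925 |
| φ | 0.873 | 0.678 | 0.928 | 0.813 | 0.724 | 0.530 | 0.970 | 0.931 | 0.964 | 0.919 |

Abbreviations: G = G-study; D = D-study; nr = number of raters; no = number of observations; ρ² = generalizability (G) coefficient for relative decisions; φ = G-coefficient for absolute decisions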
TABLE 7 - Generalizability of Trunk Endurance Measures

| | Flexion G-study | Flexion D-study | Extension G-study | Extension D-study |
|---|---|---|---|---|
| nr | 2 | 1 | 2 | 1 |
| no | 2 | 1 | 2 | 1 |
| ρ² | 0.908 | 0.723 | 0.944 | 0.836 |
| φ | 0.880 | 0.689 | 0.944 | 0.836 |

Abbreviations: nr = number of raters; no = number of observations; ρ² = generalizability (G) coefficient for relative decisions; φ = G-coefficient for absolute decisions