REVIEW I
• Reliability
  • Index of Reliability
    • Theoretical correlation between observed & true scores
  • Standard Error of Measurement
    • Reliability measure
    • Degree to which an observed score fluctuates due to measurement errors
• Factors affecting reliability
• A test must be RELIABLE to be VALID
Index of reliability: $\sqrt{r_{xx'}}$
$SEM = S\sqrt{1 - r_{xx'}}$
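A minimal sketch of the two reliability formulas on this slide, the index of reliability, sqrt(r_xx'), and SEM = S * sqrt(1 - r_xx'). The function names and the example values (SD of 10 points, reliability of 0.84) are hypothetical and chosen only for illustration.

```python
import math


def index_of_reliability(r_xx: float) -> float:
    """Theoretical correlation between observed and true scores: sqrt(r_xx')."""
    if not 0.0 <= r_xx <= 1.0:
        raise ValueError("reliability coefficient must be between 0 and 1")
    return math.sqrt(r_xx)


def standard_error_of_measurement(sd: float, r_xx: float) -> float:
    """SEM = S * sqrt(1 - r_xx'), where S is the standard deviation of the test."""
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    if not 0.0 <= r_xx <= 1.0:
        raise ValueError("reliability coefficient must be between 0 and 1")
    return sd * math.sqrt(1.0 - r_xx)


# Hypothetical example: test SD = 10 points, reliability r_xx' = 0.84
print(round(index_of_reliability(0.84), 2))
print(round(standard_error_of_measurement(10.0, 0.84), 2))
```

With these assumed values the SEM is about 4 points. A perfectly reliable test (r_xx' = 1) would give an SEM of 0.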
REVIEW II
• Types of validity
  • Content-related (face)
    • Represents important/necessary knowledge
    • Use “experts” to establish
  • Criterion-related
    • Evidence of a statistical relationship w/ trait being measured
    • Alternative measures must be validated w/ criterion measure
  • Construct-related
    • Validates unobservable theoretical measures
REVIEW III
• Standard Error of Estimate
  • Validity measure
  • Degree of error in estimating a score based on the criterion
• Methods of obtaining a criterion measure
  • Actual participation
  • Perform criterion
  • Predictive measures
• Interpreting “r”
$SEE = S\sqrt{1 - r_{xy}^2}$
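A minimal sketch of the standard error of estimate, SEE = S * sqrt(1 - r_xy^2). It assumes S is the standard deviation of the criterion scores. The function name and the example values (SD of 5.0, validity coefficient of 0.80) are hypothetical.

```python
import math


def standard_error_of_estimate(sd_criterion: float, r_xy: float) -> float:
    """SEE = S * sqrt(1 - r_xy^2); S is assumed to be the criterion SD."""
    if sd_criterion < 0:
        raise ValueError("standard deviation cannot be negative")
    if not -1.0 <= r_xy <= 1.0:
        raise ValueError("correlation must be between -1 and 1")
    return sd_criterion * math.sqrt(1.0 - r_xy ** 2)


# Hypothetical example: criterion SD = 5.0, validity coefficient r_xy = 0.80
print(round(standard_error_of_estimate(5.0, 0.80), 2))
```

With these assumed values the SEE is about 3.0. The error of estimate shrinks toward 0 as r_xy approaches 1 (or -1) and equals S when r_xy is 0.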
Criterion-Referenced Measurement
Poor Sufficient Better
It’s all about me: did I get ‘there’ or not?
Criterion-Referenced Testing, aka Mastery Learning
• Standard Development
  • Judgmental: use experts (typical in human performance)
  • Normative: theoretically accepted criteria
  • Empirical: cutoff based on available data
  • Combination: expert & norms typically combined
Advantages of Criterion-Referenced Measurement
• Represent specific, desired performance levels linked to a criterion
• Independent of the % of the population that meets the standard
• If not met, specific diagnostic evaluations can be made
• Degree of performance is not important; reaching the standard is
• Performance linked to specific outcomes
• Individuals know exactly what is expected of them
Limitations of Criterion-Referenced Measurement
• Cutoff scores always involve subjective judgment
• Misclassifications can be severe
• Motivation can be impacted (frustration or boredom)
Setting a Cholesterol “Cut-Off”
[Figure: N of deaths (y-axis, 0 to 600) by cholesterol level in mg/dl (x-axis, 160 to 270)]
Statistical Analysis of CRTs
• Nominal data (categorical: major, gender, pass/fail, etc.)
• Contingency table development (2x2 chi-square)
• Chi-square analysis (used w/ categorical variables)
• Proportion of agreement (see next slide)
• Phi coefficient (correlation for dichotomous (y/n) variables)
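A minimal sketch of the phi coefficient and the chi-square statistic for a 2x2 contingency table. It assumes the cells are labeled n1, n2 (top row) and n3, n4 (bottom row), and it omits any continuity correction, so chi-square equals N times phi squared. The counts in the example are hypothetical.

```python
import math


def phi_coefficient(n1: int, n2: int, n3: int, n4: int) -> float:
    """Phi for a 2x2 table laid out as [[n1, n2], [n3, n4]]."""
    row1, row2 = n1 + n2, n3 + n4
    col1, col2 = n1 + n3, n2 + n4
    denominator = math.sqrt(row1 * row2 * col1 * col2)
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (n1 * n4 - n2 * n3) / denominator


def chi_square_2x2(n1: int, n2: int, n3: int, n4: int) -> float:
    """Chi-square (1 df, no continuity correction) = N * phi^2."""
    total = n1 + n2 + n3 + n4
    return total * phi_coefficient(n1, n2, n3, n4) ** 2


# Hypothetical counts: fail/fail = 20, fail/pass = 5, pass/fail = 10, pass/pass = 65
print(round(phi_coefficient(20, 5, 10, 65), 2))
print(round(chi_square_2x2(20, 5, 10, 65), 2))
```

Phi ranges from -1 to 1; a table with all cases in the n1 and n4 cells gives phi = 1.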
Proportion of Agreement (P)
Sum the correctly classified cells / total
P = (n1 + n4)/(n1 + n2 + n3 + n4)
Examples on board
Considerations with CRT
• The same as norm-referenced testing
• Reliability (consistency)
  • Equivalence: is the PACER equivalent to the 1-mi run/walk?
  • Stability: does the same test result in consistent findings?
• Validity (truthfulness of measurement)
  • Criterion-related: concurrent or predictive
  • Construct-related: establish cut scores (see Fig. 7.3)
Meeting Criterion-Referenced Standards: Possible Decisions

                            Truly Below Criterion    Truly Above Criterion
Did not achieve standard    Correct Decision         False Positive
Did achieve standard        False Negative           Correct Decision
CRT Reliability: Test/Retest of a single measure

                 Day 2
                 Fail    Pass
Day 1    Fail    n1      n2
         Pass    n3      n4

P = (n1 + n4)/(n1 + n2 + n3 + n4)
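A minimal sketch of how the four cells could be tallied from paired pass/fail results and turned into a proportion of agreement. It assumes n1 = fail on both days, n2 = fail then pass, n3 = pass then fail, n4 = pass on both days. The function names and the eight-student data set are hypothetical.

```python
from typing import Sequence, Tuple


def tally_2x2(day1: Sequence[bool], day2: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Return (n1, n2, n3, n4) from paired results, where True means pass."""
    if len(day1) != len(day2):
        raise ValueError("both days must have the same number of results")
    n1 = n2 = n3 = n4 = 0
    for first, second in zip(day1, day2):
        if not first and not second:
            n1 += 1  # failed on both days
        elif not first and second:
            n2 += 1  # failed on day 1, passed on day 2
        elif first and not second:
            n3 += 1  # passed on day 1, failed on day 2
        else:
            n4 += 1  # passed on both days
    return n1, n2, n3, n4


def proportion_of_agreement(n1: int, n2: int, n3: int, n4: int) -> float:
    """P = (n1 + n4) / (n1 + n2 + n3 + n4)."""
    total = n1 + n2 + n3 + n4
    if total == 0:
        raise ValueError("table is empty")
    return (n1 + n4) / total


# Hypothetical results for 8 students (True = met the standard)
day1 = [True, True, False, True, False, True, True, False]
day2 = [True, False, False, True, False, True, True, True]
cells = tally_2x2(day1, day2)
print(cells)                            # (2, 1, 1, 4)
print(proportion_of_agreement(*cells))  # 0.75
```

The same tally works for a validity check by passing field-test results and criterion results in place of the two test days.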
CRT Validity: Use of a field test and criterion measure

                    Field Test
                    Fail    Pass
Criterion   Fail    n1      n2
            Pass    n3      n4
Example 1
FITNESSGRAM Standards (1987)
Run/walk test (did not achieve / did achieve the standard) vs. criterion VO2max (below / above)
Cell counts: 24 (4%), 21 (4%), 64 (11%), 472 (81%)
Correctly classified cells: 24 and 472
P = (24 + 472)/(24 + 21 + 64 + 472) = 496/581 = 85%
Example 2
AAHPERD Standards (1988)
Run/walk test (did not achieve / did achieve the standard) vs. criterion VO2max (below / above)
Cell counts: 130 (22%), 23 (4%), 201 (35%), 227 (39%)
Correctly classified cells: 130 and 227
P = (130 + 227)/(130 + 23 + 201 + 227) = 357/581 = 61%
Compare Examples 1-2: FITNESSGRAM standards (81% in the achieved-and-above cell; P = 85%) are a better predictor of VO2max than AAHPERD standards (39%; P = 61%)
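The arithmetic in Examples 1 and 2 can be checked with a short sketch. The cell counts come from the slides; the function name and the dictionary layout are hypothetical.

```python
from typing import Sequence


def proportion_of_agreement(correct: Sequence[int], misclassified: Sequence[int]) -> float:
    """P = correctly classified cases / all cases."""
    total = sum(correct) + sum(misclassified)
    if total == 0:
        raise ValueError("table is empty")
    return sum(correct) / total


# Cell counts taken from the two examples (581 cases each)
examples = {
    "FITNESSGRAM (1987)": {"correct": (24, 472), "misclassified": (21, 64)},
    "AAHPERD (1988)": {"correct": (130, 227), "misclassified": (23, 201)},
}

for name, cells in examples.items():
    p = proportion_of_agreement(cells["correct"], cells["misclassified"])
    print(f"{name}: P = {p:.0%}")
# FITNESSGRAM (1987): P = 85%
# AAHPERD (1988): P = 61%
```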
Criterion-referenced Measurement
Find a friend: Explain one thing that you learned today and share WHY IT MATTERS to you as a future professional