![Page 1: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/1.jpg)
Item Response Theory
Dan Mungas, Ph.D.
Department of Neurology
University of California, Davis
![Page 2: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/2.jpg)
What is it?
Why should anyone care?
![Page 3: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/3.jpg)
IRT Basics
![Page 4: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/4.jpg)
Item Response Theory - What Is It
• Modern approach to psychometric test development
  – Mathematical measurement theory
  – Associated numeric and computational methods
• Widely used in large scale educational, achievement, and aptitude testing
• More than 50 years of conceptual and methodological development
![Page 5: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/5.jpg)
Item Response Theory - Methods
• Dataset consists of rectangular table
  – rows correspond to examinees
  – columns correspond to items
• IRT applications simultaneously estimate examinee ability and item parameters
  – iterative, maximum likelihood estimation algorithms
  – processor intensive, no longer a problem
![Page 6: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/6.jpg)
Basic Data Structure
Subject Item1 Item2 Item3 Item4
S1 X11 X12 X13 X14
S2 X21 X22 X23 X24
S3 X31 X32 X33 X34
S4 X41 X42 X43 X44
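As a minimal sketch of this layout, the response table can be held as a 2-D array with one row per examinee and one column per item. The values below are made-up dichotomous responses (1 = correct, 0 = incorrect), not data from the presentation, and the variable names are illustrative.

```python
import numpy as np

# Hypothetical dichotomous responses: rows are subjects S1..S4, columns are Item1..Item4.
responses = np.array([
    [1, 1, 0, 0],   # S1
    [1, 0, 0, 0],   # S2
    [1, 1, 1, 0],   # S3
    [1, 1, 1, 1],   # S4
])

n_subjects, n_items = responses.shape
raw_scores = responses.sum(axis=1)            # simple sum score per subject
proportion_correct = responses.mean(axis=0)   # proportion correct per item
```

IRT software typically starts from a table of this shape; the sum scores and item proportions here are only classical summaries, not IRT estimates.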
![Page 7: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/7.jpg)
Item Types
• Dichotomous
• Multiple Choice
• Polytomous
  – Information is greater for a polytomous item than for the same item dichotomized at a cutpoint
![Page 8: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/8.jpg)
What is the item level response
• Smallest discrete unit (e.g., Object Naming)
• Sum of correct responses (trials in word list learning test)
• For practical reasons, continuous measures might have to be recoded into ordinal scales with reduced response categories (e.g., 10 or 15 categories)
![Page 9: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/9.jpg)
Item Response Theory - Basic Results
• Item parameters
  – difficulty
  – discrimination
  – correction for guessing
    • most applicable for multiple choice items
• Subject Ability (in the psychometric sense)
  – Capacity to successfully respond to test items (or propensity to respond in a certain direction)
  – Net result of all genetic and environmental influences
  – Measured by scales composed of homogeneous items
• Item difficulty and subject ability are on the same scale
![Page 10: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/10.jpg)
Item Characteristic Curves
[Figure: item characteristic curves for Item 1, Item 2, and Item 3. x-axis: Ability (-3 to 3); y-axis: Proportion Correct (0.0 to 1.0).]
![Page 11: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/11.jpg)
Item Response Theory - Outcomes
• Item-Level Results
  – Item Characteristic Curve (ICC)
    • non-linear function relating ability to probability of correct response to item
  – Item Information Curve (IIC)
    • non-linear function showing precision of measurement (reliability) at different ability points
  – Both curves are defined by the item parameters
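To make the link between item parameters and the two curves concrete, the sketch below assumes a two-parameter logistic (2PL) item. The function names and the parameter values (discrimination a = 1.5, difficulty b = 0.0) are made up for illustration and are not taken from the slides.

```python
import numpy as np

def icc_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def iic_2pl(theta, a, b):
    """Item information under the 2PL model: a^2 * P * (1 - P)."""
    p = icc_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)

theta = np.linspace(-3, 3, 61)           # ability grid
p = icc_2pl(theta, a=1.5, b=0.0)         # item characteristic curve
info = iic_2pl(theta, a=1.5, b=0.0)      # item information curve
```

Under this model the information curve peaks where ability equals the item difficulty, and a larger discrimination makes the peak taller and narrower.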
![Page 12: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/12.jpg)
Item Characteristic Curves
[Figure: item characteristic curves for Item 1, Item 2, and Item 3. x-axis: Ability (-3 to 3); y-axis: Proportion Correct (0.0 to 1.0).]
![Page 13: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/13.jpg)
Information Curves
[Figure: information curves for Item 1, Item 2, Item 3, and Total. x-axis: Ability (-3 to 3); y-axis: Information (0.0 to 4.0).]
![Page 14: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/14.jpg)
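[Figure, two panels. Top: item characteristic curves for Item 1, Item 2, and Item 3 (x-axis: Ability, -3 to 3; y-axis: Proportion Correct, 0.0 to 1.0). Bottom: information curves for Item 1, Item 2, Item 3, and Total (x-axis: Ability, -3 to 3; y-axis: Information, 0.0 to 4.0).]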
![Page 15: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/15.jpg)
Item Response Theory - Outcomes
• Test-Level Results
  – Test Characteristic Curve (TCC)
    • non-linear function relating ability to expected total test score
  – Test Information Curve (TIC)
    • non-linear function showing precision of measurement (reliability) at different ability points
  – Both are sums of the item-level functions of the included items
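The summation can be sketched as follows, assuming dichotomous 2PL items. The three sets of item parameters are hypothetical and chosen only to show the mechanics.

```python
import numpy as np

# Hypothetical 2PL parameters for three items (discrimination a, difficulty b).
a = np.array([1.0, 1.5, 2.0])
b = np.array([-1.0, 0.0, 1.0])

theta = np.linspace(-3, 3, 61)

# Item-level curves: one row per ability point, one column per item.
p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
item_info = a ** 2 * p * (1.0 - p)

tcc = p.sum(axis=1)           # expected total score at each ability level
tic = item_info.sum(axis=1)   # test information at each ability level
```

Because both test-level curves are plain sums, adding or dropping an item changes them in a predictable way, which is what makes item selection for a target information curve possible.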
![Page 16: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/16.jpg)
Test Characteristic Curve: Mini-Mental State Examination
[Figure: test characteristic curve. x-axis: Ability Metric (-3.0 to 3.0); y-axis: Total Score (0 to 30).]
![Page 17: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/17.jpg)
Information Curves
![Page 18: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/18.jpg)
Item Response Theory - Fundamental Assumptions
• Unidimensionality - items measure a homogeneous, single domain
• Local independence - covariance among items is determined only by the latent dimension measured by the item set
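Local independence is often written as a factorization of the joint response probability. The notation below is introduced here for illustration and does not appear in the slides: X_i is the response to item i, θ is the latent ability, and n is the number of items.

$$
P(X_1 = x_1, \dots, X_n = x_n \mid \theta) = \prod_{i=1}^{n} P(X_i = x_i \mid \theta)
$$

In words, once ability is held fixed, knowing the response to one item tells you nothing more about the response to another.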
![Page 19: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/19.jpg)
IRT Models
• 1PL (Rasch)
  – Only Difficulty and Ability are estimated
  – Discrimination is assumed to be equal across items
• 2PL
  – Discrimination, Difficulty, and Ability are estimated
  – Guessing is assumed to not have an effect
• 3PL
  – Discrimination, Difficulty, Guessing, and Ability are estimated (multiple choice items)
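The three models can be viewed as nested versions of one logistic function. The form below is a common parameterization, shown as a sketch; the symbols are assumptions introduced here (a_i discrimination, b_i difficulty, c_i guessing or lower asymptote, θ ability), and some texts add a scaling constant of 1.7 inside the exponent.

$$
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}
$$

Setting c_i = 0 gives the 2PL model, and additionally constraining all a_i to a common value gives the 1PL (Rasch) model.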
![Page 20: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/20.jpg)
Item Response Theory - Invariance Properties
• Invariance requires that basic assumptions are met
• Item parameters are invariant across different samples
  – Within the range of overlap of distributions
  – Distributions of samples can differ
• Ability estimates are invariant across different item sets
  – Assumes that ability range of items spans ability range of subjects that is of interest
![Page 21: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/21.jpg)
Why Do We Care - Applications of IRT in Health Care Settings
• Refined scoring of tests
• Characterization of psychometric properties of existing tests
• Construction of new tests
![Page 22: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/22.jpg)
Test Scoring
• IRT permits refined scoring of items that allows for differential weighting of items based on their item parameters
![Page 23: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/23.jpg)
Physical Function Scale (Hays, Morales & Reise, 2000)
Item                                                                     Limited a lot   Limited a little   Not limited at all
Vigorous activities (running, lifting heavy objects, strenuous sports)   1               2                  3
Climbing one flight                                                      1               2                  3
Walking more than 1 mile                                                 1               2                  3
Walking one block                                                        1               2                  3
Bathing / dressing self                                                  1               2                  3
Preparing meals / doing laundry                                          1               2                  3
Shopping                                                                 1               2                  3
Getting around inside home                                               1               2                  3
Feeding self                                                             1               2                  3
![Page 24: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/24.jpg)
How to Score Test
• Simple approach: there are numbers that will be circled; total these up, and we have a score.
• But: should “limited a lot” for walking a mile receive the same weight as “limited a lot” in getting around inside the home?
• Should “limited a lot” for walking one block be twice as bad as “limited a little” for walking one block?
![Page 25: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/25.jpg)
How IRT Can Help
• IRT provides us with a data-driven means of rational scoring for such measures
• Items that are more discriminating are given greater weight
• In practice, the simple sum score is often very good; improvement is at the margins
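The weighting can be illustrated with a small sketch. It assumes dichotomous 2PL items and a grid-search maximum likelihood ability estimate; the item parameters and the helper name `ml_ability` are hypothetical and are not from the Physical Function Scale.

```python
import numpy as np

# Hypothetical 2PL item parameters (not estimates from any real scale).
a = np.array([0.5, 1.0, 2.0])    # discrimination
b = np.array([0.0, 0.0, 0.0])    # difficulty

def ml_ability(x, grid=np.linspace(-4, 4, 801)):
    """Grid-search maximum likelihood ability estimate for response pattern x."""
    p = 1.0 / (1.0 + np.exp(-a * (grid[:, None] - b)))
    loglik = (x * np.log(p) + (1 - x) * np.log(1 - p)).sum(axis=1)
    return grid[np.argmax(loglik)]

# Both patterns have a sum score of 2, but they differ in which item was missed.
missed_weak_item = ml_ability(np.array([0, 1, 1]))    # missed the least discriminating item
missed_strong_item = ml_ability(np.array([1, 1, 0]))  # missed the most discriminating item
```

Under these assumptions, the two patterns receive different ability estimates even though their sum scores are equal: missing the highly discriminating item pulls the estimate down more than missing the weakly discriminating one.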
![Page 26: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/26.jpg)
Description of Psychometric Properties
• The Test Information Curve (TIC) shows reliability that continuously varies by ability
  – Depicts ability levels associated with high and low reliability
• The standard error of measurement is directly related to information value (I(θ))
  – SEM = 1 / sqrt(I(θ))
• SEM and I(θ) also have a direct correspondence to traditional r
  – r = 1 - 1 / I(θ)
![Page 27: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/27.jpg)
I(θ), SEM, r
I(θ)   SEM (s.d. units)   r
1      1.00               0.00
2      0.71               0.50
4      0.50               0.75
9      0.33               0.89
12     0.29               0.92
16     0.25               0.94
25     0.20               0.96
36     0.17               0.97
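The tabled values follow from the two relations SEM = 1 / sqrt(I(θ)) and r = 1 - 1 / I(θ), with ability expressed in s.d. units. A short sketch that recomputes them (the function names are illustrative):

```python
import math

def sem_from_information(info):
    """Standard error of measurement in s.d. units: 1 / sqrt(I)."""
    return 1.0 / math.sqrt(info)

def reliability_from_information(info):
    """Reliability analogue: 1 - 1 / I, with ability scaled to unit variance."""
    return 1.0 - 1.0 / info

for info in (1, 2, 4, 9, 12, 16, 25, 36):
    sem = round(sem_from_information(info), 2)
    r = round(reliability_from_information(info), 2)
    print(info, sem, r)
```

Read the other way, a target reliability of 0.90 corresponds to an information value of 10 at that ability level.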
![Page 28: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/28.jpg)
TICs for English and Spanish language Versions of Two Scales
[Figure: test information curves for Object Naming English, Object Naming Spanish, 3MSE English, and 3MSE Spanish. x-axis: Ability (-3 to 3); y-axis: Information (0 to 16).]
Mungas et al., 2004
![Page 29: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/29.jpg)
Construction of New Scales
• Items can be selected to create scales with desired measurement properties
• Can be used for prospective test development
• Can be used to create new scales from existing tests/item pools
• IRT will not overcome inadequate items
![Page 30: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/30.jpg)
TICs from an Existing Global Cognition Scale and Re-Calibrated Existing Cognitive Tests
[Figure: test information curves for Global, Memory, Executive, and Mattis DRS. x-axis: Ability (standard score metric, 40 to 130); y-axis: Information (0 to 35).]
Mungas et al., 2003
![Page 31: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/31.jpg)
Principles of Scale Construction
• Information corresponds to assessment goals
  – Broad and flat TIC for longitudinal change measure in population with heterogeneous ability
  – For selection or diagnostic test, peak at point of ability continuum where discrimination is most important
  – But normal cognition spans a 4.0 s.d. range, and the range is even greater in demographically diverse populations
![Page 32: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/32.jpg)
Other Issues In IRT
• Polytomous IRT models are available
  – Useful for ordinal (Likert) rating scales
• Each possible score of the item (minus 1) is treated like a separate item with a different difficulty parameter
• Information is greater for a polytomous item than for the same item dichotomized at a cutpoint
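One polytomous model that matches this description is the graded response model, used here as an assumed example since the slides do not name a specific model. Each step between adjacent categories gets its own difficulty (threshold), and category probabilities are differences of adjacent cumulative curves. The function name and parameter values are hypothetical.

```python
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a graded response model.

    thresholds: increasing difficulty parameters, one per step between
    adjacent categories (number of categories minus 1).
    """
    thresholds = np.asarray(thresholds, dtype=float)
    # P(score >= k) for each step, treated like a dichotomous 2PL item.
    p_at_or_above = 1.0 / (1.0 + np.exp(-a * (theta - thresholds)))
    # Pad with P(score >= lowest) = 1 and P(score > highest) = 0, then difference.
    cumulative = np.concatenate(([1.0], p_at_or_above, [0.0]))
    return cumulative[:-1] - cumulative[1:]

# Hypothetical 3-category item, e.g. "limited a lot", "limited a little", "not limited at all".
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 1.0])
```

A 3-category item therefore carries two difficulty parameters plus one discrimination, rather than the single difficulty of a dichotomized version.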
![Page 33: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/33.jpg)
Other Issues in IRT
• Applicable to broad range of content domains
• IRT certainly applies to cognitive abilities
• Also applies to other health outcomes
  – Quality of life
  – Physical function
  – Fatigue
  – Depression
  – Pain
![Page 34: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/34.jpg)
Other Issues in IRT
• Differential Item Functioning - Test Bias
• IRT provides explicit methods to evaluate and quantify the extent to which items and tests have different measurement properties in different groups
  – e.g., racial and ethnic groups, linguistic groups, gender
![Page 35: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/35.jpg)
English and Spanish Item Characteristic Curves for “Lamb/Cordero” Item
[Figure: item characteristic curves for the English and Spanish versions of the item. x-axis: Ability Metric (-3 to 3); y-axis: Probability of Correct Response (0.00 to 1.00).]
![Page 36: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/36.jpg)
English and Spanish Item Characteristic Curves for “Stone/Piedra” Item
[Figure: item characteristic curves for the English and Spanish versions of the item. x-axis: Ability Metric (-3 to 3); y-axis: Probability of Correct Response (0.00 to 1.00).]
![Page 37: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/37.jpg)
Differential Item Functioning (DIF)
• DIF refers to systematic bias in measuring “true” ability; it does not address group differences in ability
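A minimal sketch of what DIF looks like numerically, assuming a 2PL item that has been calibrated separately in two groups. The parameter values are invented and do not correspond to the Lamb/Cordero or Stone/Piedra items.

```python
import numpy as np

def icc_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 61)

# Hypothetical group-specific calibrations of the same item.
p_group_1 = icc_2pl(theta, a=1.2, b=-0.5)
p_group_2 = icc_2pl(theta, a=1.2, b=0.5)   # harder for group 2 at equal ability

# Gap between the curves at each ability level; zero everywhere would mean no DIF.
dif_gap = p_group_1 - p_group_2
max_gap = float(np.abs(dif_gap).max())
```

The comparison is made at matched ability, which is why DIF is a statement about the item and not about whether the groups differ in average ability.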
![Page 38: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/38.jpg)
Challenges/ Limitations of IRT
• Large samples required for stable estimation
  – 150-200 for 1PL
  – 400-500 for 2PL
  – 600-1000 for 3PL
• Analytic methods are labor intensive
  – There are a number of (expensive *) applications readily available for IRT analyses
  – Evaluation of basic assumptions, identification of appropriate model, and systematic IRT analysis require considerable expertise and labor
* but, R!!
![Page 39: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/39.jpg)
Computerized Adaptive Testing (CAT)
• IRT based computer driven method
• Selects items that most closely match examinee’s ability
• Administers only items needed to achieve a pre-specified level of precision in measurement (information, s.e.m., reliability)
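The adaptive loop can be sketched as below. This is an illustration under several assumptions that are not in the slides: a randomly generated 2PL item bank, a simulated examinee, maximum-information item selection, and an expected a posteriori (EAP) ability estimate with a standard normal prior, whose posterior standard deviation stands in for the s.e.m. in the stopping rule. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibrated item bank (2PL): discrimination a, difficulty b.
bank_a = rng.uniform(1.0, 2.0, size=50)
bank_b = rng.uniform(-3.0, 3.0, size=50)

grid = np.linspace(-4, 4, 161)
prior = np.exp(-0.5 * grid ** 2)   # standard normal prior (unnormalized)

def prob(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def run_cat(true_theta, target_sem=0.4, max_items=30):
    posterior = prior.copy()
    available = np.ones(bank_a.size, dtype=bool)
    used = []
    theta_hat, sem = 0.0, np.inf
    while len(used) < max_items and sem > target_sem:
        # Pick the unused item with the most information at the current estimate.
        p = prob(theta_hat, bank_a, bank_b)
        info = bank_a ** 2 * p * (1 - p)
        info[~available] = -np.inf
        item = int(np.argmax(info))
        available[item] = False
        used.append(item)
        # Simulate the examinee's response to the selected item.
        correct = rng.random() < prob(true_theta, bank_a[item], bank_b[item])
        # Update the posterior over ability and re-estimate (EAP).
        p_grid = prob(grid, bank_a[item], bank_b[item])
        posterior *= p_grid if correct else (1 - p_grid)
        posterior /= posterior.sum()
        theta_hat = float((grid * posterior).sum())
        sem = float(np.sqrt(((grid - theta_hat) ** 2 * posterior).sum()))
    return theta_hat, sem, used

theta_hat, sem, used = run_cat(true_theta=1.0)
```

The loop stops as soon as the precision target is met, so examinees whose ability is well covered by the bank see fewer items than a fixed-length test would require.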
![Page 40: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/40.jpg)
Why CAT
• Efficiency
  – Administration
    • Standardization
    • Time efficiency
    • Data collection
  – Scoring
    • Computer can implement complex scoring algorithms
![Page 41: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/41.jpg)
CAT Example 1
![Page 42: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/42.jpg)
CAT Example 2
![Page 43: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/43.jpg)
Practical Considerations for CAT
![Page 44: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/44.jpg)
What You Need for CAT
• Computer technology
  – Item Selection
  – Item Administration
  – Scale Scoring
• Item bank with IRT parameters
  – Range of item difficulty relevant to measurement needs
![Page 45: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/45.jpg)
What is Straightforward/Easy?
• Dichotomous items
• Multiple choice items
• Ordered polytomous response scales
  – Up to 10-15 response options
![Page 46: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/46.jpg)
Technical Challenges
• Continuous response scales (memory, timed tasks)
  – Can be recoded into smaller number of ordered response ranges
    • Lose information
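The recoding step can be sketched with `numpy.digitize`. The completion times and cutpoints below are invented for illustration; in practice the cutpoints would be chosen from the observed distribution.

```python
import numpy as np

# Hypothetical completion times in seconds for a timed task.
times = np.array([12.4, 35.0, 18.9, 61.2, 27.5, 44.1])

# Hypothetical cutpoints defining 4 ordered response ranges.
cutpoints = np.array([20.0, 30.0, 45.0])

# np.digitize returns 0 for values below the first cutpoint, 1 for the next range, and so on.
time_category = np.digitize(times, cutpoints)

# Reverse so that faster times (better performance) receive higher scores.
ordinal_score = len(cutpoints) - time_category
```

Everything within a range is collapsed to the same score (12.4 s and 18.9 s are indistinguishable after recoding), which is the information loss noted above.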
![Page 47: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/47.jpg)
Methodological Challenges
• Sample size requirements
  – Minimally 300-600 cases for stable estimation of item parameters
• Differential Item Functioning and Measurement Bias
  – Essentially involves item calibration within groups of interest
    • e.g., age, education, language, gender, race
  – Available literature provides minimal guidance
![Page 48: Item Response Theory Dan Mungas, Ph.D. Department of Neurology](https://reader035.vdocuments.net/reader035/viewer/2022070606/5a4d1b667f8b9ab0599b0619/html5/thumbnails/48.jpg)
References
• Hays, R. D., Morales, L. S., & Reise, S. P. (2000). Item response theory and health outcomes measurement in the 21st century. Medical Care, 38(9 Suppl), II28-II42.
• Mungas, D., Reed, B. R., & Kramer, J. H. (2003). Psychometrically matched measures of global cognition, memory, and executive function for assessment of cognitive decline in older persons. Neuropsychology, 17(3), 380-392.
• Mungas, D., Reed, B. R., Crane, P. K., Haan, M. N., & González, H. (2004). Spanish and English Neuropsychological Assessment Scales (SENAS): Further development and psychometric characteristics. Psychological Assessment, 16(4), 347-359.