item response theory (irt): enhancing health outcomes measurement · 2020. 12. 18. · irt models...

Item Response Theory (IRT): Enhancing Health Outcomes Measurement Bryce B. Reeve, Ph.D. e-mail: [email protected]


Post on 25-Feb-2021


Page 1: Item Response Theory (IRT): Enhancing Health Outcomes Measurement · 2020. 12. 18. · IRT Models You May See in Outcomes Research Model Item Response Format Model Characteristics

Item Response Theory (IRT): Enhancing Health Outcomes Measurement

Bryce B. Reeve, Ph.D. e-mail: [email protected]

Page 2:

Presentation Overview

• IRT Models
– Theory of IRT (Reeve)
– IRT item, scale, and person properties (Reeve)
– Comparison with Classical Test Theory (Reeve)
– IRT Assumptions and Model Fit (Orlando Edelen)
– IRT Scoring (Orlando Edelen)

• Applying IRT to enhancing health outcomes measurement
– Designing and evaluating scales (Siemons; Krishnan)
– Assessing Differential Item Functioning (DIF) (Orlando Edelen)
– Linking scales (Glas; Oude Voshaar)
– Item Banking and Computerized Adaptive Testing (Bjorner; Nikolaus)

Page 3:

Please Note:

• The quality of a health outcomes measure depends on the care the developer(s) took in using qualitative and quantitative methods that integrate multiple perspectives throughout the process.

• IRT methods do not replace the classical/traditional test theory methods for item/scale analysis.

• IRT analysis is not a magic wand!

– It cannot fix bad data or poorly defined constructs

– By itself, it does not address all forms of validity and other attributes that evaluate the quality of a questionnaire.

Page 4:

The Need for Better Outcome Measures

Needs / Challenges

• Develop measures that are valid, reliable, and sensitive to detect clinically meaningful change.
• Have a minimum set of questions to reduce respondent burden.
• Different forms of an instrument to measure different health levels.
• Different forms to be linked on the same metric for group comparisons.
• Non-biased measurement across groups.
• Detect differences in group perceptions.

Page 5:

What is Item Response Theory (IRT)?

1. Theory for Scale Construction

2. Methodology for:
– Evaluating the properties of items within a scale and the overall scale
– Refining a scale
– Scoring an individual
– Linking multiple scales onto a common metric
– Tailoring PRO measures to an individual or group

• IRT is designed for:
– Modeling latent “unobservable” variables (traits, domains, θ)
– Multi-item scales/questionnaires

Page 6:

IRT Model: Item Characteristic Curves

[Figure: item characteristic curves (False and True responses) for the item “I am unhappy some of the time”. x-axis: Depression (θ), -3 to 3, anchored “None” and “Severe”; y-axis: Probability of Response, 0.00 to 1.00.]

Page 7:

IRT Model

[Figure: item characteristic curve (True response) for “I am unhappy some of the time”, with b = .25 marked. x-axis: Depression (θ), -3 to 3, anchored “None” and “Severe”; y-axis: Probability of Response, 0.00 to 1.00.]

b (item) parameter:
- Threshold
- Location
- Difficulty
- Severity

Page 8:

IRT Models

[Figure: item characteristic curves (True response) for “I am unhappy some of the time.” and “I don’t care what happens to me.”, with b = .25 and b = 1.33 marked. x-axis: Depression (θ), -3 to 3, anchored “None” and “Severe”; y-axis: Probability of Response, 0.00 to 1.00.]

b parameter:
- Threshold
- Location
- Difficulty
- Severity

Page 9:

IRT Model

[Figure: item characteristic curve (True response) for “I am unhappy some of the time”, with b = .25 and a = 2.83 marked. x-axis: Depression (θ), -3 to 3, anchored “None” and “Severe”; y-axis: Probability of Response, 0.00 to 1.00.]

a (item) parameter:
- slope
- discrimination
- relationship with trait

Page 10:

IRT Models

[Figure: item characteristic curves (True response) for “I am unhappy some of the time.”, “I don’t care what happens to me.”, and “I cry easily.” Parameters as listed on the slide: b = 0.25, 1.33, -0.23; a = 2.83, 1.11, 2.20. x-axis: Depression (θ), -3 to 3, anchored “None” and “Severe”; y-axis: Probability of Response, 0.00 to 1.00.]

Page 11:

IRT Model: Item Characteristic Curves

[Figure: item characteristic curves (False and True responses) for “I am unhappy some of the time”. x-axis: Depression (θ), -3 to 3, anchored “None” and “Severe”; y-axis: Probability of Response, 0.00 to 1.00.]

$$P(X_i = 1 \mid \theta) = \frac{1}{1 + e^{-1.7\,a_i(\theta - b_i)}}$$
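The two-parameter logistic curve on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function name `icc_2pl` is made up here, and pairing the second and third a values with the second and third items is an assumption based on the order in which the slides list them.

```python
import math

D = 1.7  # scaling constant that appears in the slide's formula


def icc_2pl(theta, a, b, d=D):
    """Probability of a 'True' response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-d * a * (theta - b)))


# (a, b) values quoted on the slides; the item-to-parameter pairing for
# the second and third items is assumed from listing order.
items = {
    "I am unhappy some of the time": (2.83, 0.25),
    "I don't care what happens to me": (1.11, 1.33),
    "I cry easily": (2.20, -0.23),
}

for text, (a, b) in items.items():
    probs = [round(icc_2pl(t, a, b), 2) for t in (-1.0, 0.0, 1.0, 2.0)]
    print(f"{text}: {probs}")
```

At θ = b the probability is exactly 0.5, which is why b is read as the item's location or severity; a larger a makes the curve steeper around that point.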

Page 12:

IRT: Item Information Curves
(The range of the latent construct over which an item is most useful for distinguishing among respondents)

[Figure: item information curves for “I am unhappy some of the time”, “I don’t care what happens to me”, and “I cry easily”. Parameters as listed on the slide: b = 0.25, 1.33, -0.23; a = 2.83, 1.11, 2.20. x-axis: Depression (θ), -3 to 3, anchored “None” and “Severe”; y-axis: Information, 0.0 to 2.0.]
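For a dichotomous 2PL item, the information curve follows directly from the response probability: I_i(θ) = (D·a_i)² · P_i(θ) · (1 − P_i(θ)). The sketch below assumes that standard result. Whether the plotted curves include the 1.7 constant cannot be recovered from the transcript, so `d` is left as a parameter; the function names are illustrative.

```python
import math


def icc_2pl(theta, a, b, d=1.7):
    """2PL probability of a 'True' response."""
    return 1.0 / (1.0 + math.exp(-d * a * (theta - b)))


def item_information(theta, a, b, d=1.7):
    """Fisher information of a 2PL item at trait level theta."""
    p = icc_2pl(theta, a, b, d)
    return (d * a) ** 2 * p * (1.0 - p)


# Information peaks at theta == b, where p = 0.5, so the peak height
# is (d * a)^2 / 4 and depends only on the discrimination.
peak_plain = item_information(0.25, 2.83, 0.25, d=1.0)
peak_scaled = item_information(0.25, 2.83, 0.25, d=1.7)
print(round(peak_plain, 3), round(peak_scaled, 3))
```

The design point worth noticing is that b decides where on the depression continuum an item is informative, while a decides how informative it is there.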

Page 13:

Building reliable and efficient measures…

[Figure: item information curves for 10 items from the MMPI-2 Depression Scale, with “I don’t seem to care what happens to me.”, “I am unhappy some of the time.”, and “I cry easily.” labeled. x-axis: Depression (θ), -3.00 to 3.00, anchored “Mild” and “Severe”; y-axis: Information, 0.0 to 2.0.]

Page 14:

Scale (Test) Information Curve
(The range of the latent construct over which a scale is most useful for distinguishing among respondents)

[Figure: scale information curve. x-axis: Depression (θ), -3 to 3, anchored “None” and “Severe”; y-axis: Information, 0 to 25, with reliability reference lines r = .80, .90, .93, .95, and .96.]

$$\text{Reliability} = 1 - \text{Error Variance} = 1 - (\text{Standard Error})^2 = 1 - \frac{1}{\text{Information}}$$

$$TI(\theta) = \sum_{i=1}^{m} I_i(\theta)$$

Page 15:

Questions on the MMPI-2 depression scales were chosen because they maximally discriminate a clinically depressed group from a non-clinical group

[Figure: scale information curve. x-axis: Depression (θ), -3 to 3, anchored “None” and “Severe”; y-axis: Information, 0 to 25, with reliability reference lines r = .80, .90, .93, .95, and .96.]

Page 16:

Standard Error of Measurement Curve
(The range of the latent construct over which a scale is most useful for measuring respondent trait levels)

[Figure: standard error of measurement curve. x-axis: Depression (θ), -3 to 3, anchored “None” and “Severe”; y-axis: Standard Error of Measurement, 0.0 to 2.5.]

$$SEM(\theta) = \frac{1}{\sqrt{I(\theta)}}$$
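To make the "range over which a scale is most useful" concrete, the sketch below converts information values to standard errors and reports the span of θ where the error stays under a chosen cutoff. The information values and the cutoff are hypothetical, chosen only for illustration, and the helper assumes the precise region is one contiguous span.

```python
import math


def sem_from_information(info):
    """Standard error of measurement implied by an information value."""
    return 1.0 / math.sqrt(info)


def precise_range(thetas, infos, max_sem):
    """Lowest and highest theta whose SEM is at or below max_sem, or None."""
    ok = [t for t, i in zip(thetas, infos)
          if i > 0 and sem_from_information(i) <= max_sem]
    return (min(ok), max(ok)) if ok else None


# Hypothetical scale information values on a coarse theta grid
thetas = [-1.0, 0.0, 1.0, 2.0]
infos = [2.0, 10.0, 16.0, 3.0]
print(precise_range(thetas, infos, max_sem=0.32))
```

A cutoff of about 0.32 corresponds to information of 10, that is, to an approximate reliability of .90 under the relationship given for the scale information curve.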

Page 17:

What is the reduction in information going from a 22 to 12 item scale?

[Figure: scale information curves. x-axis: Fear of Disease Recurrence (θ), -3.00 to 3.00, anchored “Low” and “High”; y-axis: Information, 0 to 25, with reference lines r = .80, .90, .93, and .95.]

*r = approximate reliability

Page 18:

What about IRT models for questions with more than two response categories?

Data from responses to the PROMIS Depression Item Bank.

Page 19:

Item Response Theory (IRT): Category Response Curves

[Figure, two panels: category response curves (Never, Rarely, Sometimes, Often, Always) for “In the past 7 days, I felt unhappy.” and “In the past 7 days, I felt I had no reason for living.” x-axis: Depressive Symptoms (θ), -3.00 to 3.00, anchored “very mild” and “severe”; y-axis: P, 0.0 to 1.0.]

Page 20:

Item Response Theory (IRT)

In the past 7 days, I felt unhappy.

[Figure, two panels. Category Response Curves (Never, Rarely, Sometimes, Often, Always): y-axis P, 0.0 to 1.0. Information Function: y-axis Information, 0.0 to 4.0. Both panels share the x-axis: Depressive Symptoms (θ), -3.00 to 3.00, anchored “very mild” and “severe”.]

Page 21:

Item Response Theory (IRT)

In the past 7 days, I felt I had no reason for living.

[Figure, two panels. Category Response Curves (Never, Rarely, Sometimes, Often, Always): y-axis P, 0.0 to 1.0. Information Function: y-axis Information, 0.0 to 4.0. Both panels share the x-axis: Depressive Symptoms (θ), -3.00 to 3.00, anchored “very mild” and “severe”.]
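Category response curves like these can be generated from a polytomous model. The sketch below assumes Samejima's graded response model, which is one common choice for ordered five-category items; the transcript does not say which model produced the plots, and the parameter values are hypothetical.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    thresholds must be in increasing order; K - 1 thresholds give K categories.
    """
    # Cumulative curves: P(response in category k or higher), k = 1..K-1
    cumulative = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    bounds = [1.0] + cumulative + [0.0]
    # Each category probability is the gap between adjacent cumulative curves
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


labels = ["Never", "Rarely", "Sometimes", "Often", "Always"]
a, thresholds = 2.5, [-0.5, 0.3, 1.0, 1.8]  # hypothetical item parameters

for theta in (-3.0, 0.0, 3.0):
    probs = grm_category_probs(theta, a, thresholds)
    print(theta, dict(zip(labels, (round(p, 2) for p in probs))))
```

Shifting all thresholds to the right gives an item such as “I felt I had no reason for living”, where “Never” remains the most likely answer until symptoms are severe.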

Page 22:

Item Response Theory (IRT): Item Information Functions

[Figure: item information functions for “I felt unhappy.”, “I felt depressed.”, “I withdrew from other people.”, “I felt worthless.”, and “I felt I had no reason for living.” x-axis: Depressive Symptoms (θ), -3.00 to 3.00, anchored “very mild” and “severe”; y-axis: Information, 0.0 to 6.0.]

Response options: 1. Never; 2. Rarely; 3. Sometimes; 4. Often; 5. Always

Page 23:

IRT Family of Models

IRT models come in many varieties (over 100) to handle:

• Unidimensional and multidimensional data

• Binary, polytomous, and continuous response data

• Ordered as well as unordered response data

Page 24:

IRT Models You May See in Outcomes Research

| Model | Item Response Format | Model Characteristics |
| Rasch / 1-Parameter Logistic | Dichotomous | Discrimination power equal across all items. Threshold varies across items. |
| 2-Parameter Logistic | Dichotomous | Discrimination and threshold parameters vary across items. |
| Graded Response | Polytomous | Ordered responses. Discrimination varies across items. |
| Nominal | Polytomous | No pre-specified item order. Discrimination varies across items. |
| Partial Credit (Rasch Model) | Polytomous | Discrimination power constrained to be equal across items. |
| Rating Scale (Rasch Model) | Polytomous | Discrimination equal across items. Item threshold steps equal across items. |
| Generalized Partial Credit | Polytomous | Variation of Partial Credit Model with discrimination varying among items. |

Page 25:

Applications of IRT models for Health Outcomes Measurement

Pages 26–28:

1. Design and Evaluation

Page 29:

2. Testing for Differential Item Functioning (DIF)

[Figure, two panels: the items “In the past 7 days, I cried” and “In the past 7 days, I felt blue”, each with response options None of the time, A little of the time, Some of the time, Most of the time, and All of the time. x-axis: Depression.]

Page 30:

3. Linking Health Outcome Measures

[Figure: the PROMIS Depression Measure and the CES Depression Scale.]

Page 31:

3. Linking Health Outcome Measures

[Figure: the PROMIS Depression Measure, the CES Depression Scale, and the Beck Depression Inventory placed on a common Depression metric, with axis ticks at 20, 30, 40, 50, 60, 70, and 80.]
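The 20 to 80 axis on this slide looks like a T-score metric (mean 50, standard deviation 10 in a reference population), which is how PROMIS scores are usually reported. Under that assumption, which the transcript does not confirm, converting an IRT trait estimate to the common metric is a linear rescaling. The function name is illustrative.

```python
def theta_to_t_score(theta, mean=50.0, sd=10.0):
    """Rescale a standardized trait estimate to a T-score metric."""
    return mean + sd * theta


# Trait levels from -3 to 3 cover the 20 to 80 range on the axis
for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(theta, theta_to_t_score(theta))
```

Once items from different instruments are calibrated to the same θ scale, scores from any of them can be reported on this one metric.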

Page 32:

4. Item Banking and Computerized Adaptive Testing (CAT)

[Figure: a Depression Item Bank (Item 1 through Item n) arrayed along a continuum labeled no depression, mild depression, moderate depression, severe depression, and extreme depression. Example item: “In the past 7 days, I felt unhappy:” with response options Never, Rarely, Sometimes, Often, Always.]
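One common way a CAT picks the next question from an item bank is to choose the unused item with the most information at the respondent's current trait estimate. The sketch below shows that selection rule only. It is a simplification: the bank is hypothetical, the items are treated as dichotomous 2PL items rather than five-category items, and real CAT engines add stopping rules and re-estimate θ after every answer.

```python
import math


def item_information(theta, a, b):
    """Fisher information of a 2PL item (logistic metric)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)


def select_next_item(theta_hat, bank, administered):
    """Return the id of the unused item that is most informative at theta_hat."""
    candidates = [item_id for item_id in bank if item_id not in administered]
    if not candidates:
        raise ValueError("item bank exhausted")
    return max(candidates, key=lambda i: item_information(theta_hat, *bank[i]))


# Hypothetical bank: item id -> (a, b)
bank = {"mild": (2.0, -1.0), "moderate": (2.0, 0.0), "severe": (2.0, 1.5)}

print(select_next_item(1.2, bank, administered=set()))
print(select_next_item(1.2, bank, administered={"severe"}))
```

With equal discriminations the rule simply picks the item whose location is closest to the current estimate, which is how a CAT tailors questions to each respondent's level of depression.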

Page 33:

Traditional Measurement Theory (Classical Test Theory, CTT) versus Modern Measurement Theory

Page 34:

| Classical Test Theory | Item Response Theory |
| Measures of precision fixed for all scores | Precision measures vary across scores |
| Longer scales increase reliability | Shorter, targeted scales can be equally reliable |
| Scale properties are sample dependent | Item & scale properties are invariant within a linear transformation |
| Comparing person scores dependent on item set | Person scores comparable across different item sets |
| Comparing respondents requires parallel scales | Different scales can be placed on a common metric |
| Mixed item formats lead to unbalanced impact on total scale scores | Easily handles mixed item formats |
| Summed scores are on an ordinal scale | Scores on interval scale |
| | Graphical tools for item and scale analysis |

Page 35:

Questions on the MMPI-2 depression scales were chosen because they maximally discriminate a clinically depressed group from a non-clinical group

[Figure: scale information curve. x-axis: Depression (θ), -3 to 3, anchored “None to mild” and “Severe”; y-axis: Information, 0 to 25, with reliability reference lines r = .80, .90, .93, .95, and .96.]

Page 36:

Conclusions

• IRT serves as a powerful analytic tool to help design health outcomes measures.

• Limitations
– Lack of user-friendliness of software
– Required knowledge of measurement theory
– Needs large sample sizes

Page 37:

Important Deadlines

April 12: Oral and Poster Presentation Abstract Submissions Due
May 31: Scholarship Applications and Award Nominations Due
July 1: Presenters Confirm Participation (Oral and Poster Presentations)
August 12: Early Registration Deadline
September 16: Advanced Registration Deadline
September 16: ISOQOL Hotel Room Block Closes

Energizing the Science of Quality of Life Research: Where have we been and where can we go?

isoqol.org/2013conference

Pages 38–39:

Sample Size Issues

• The IRT model to be estimated
– The number of parameters drives the required sample size; Rasch models need less data.

• The number of items or questions
– The number of items affects the required sample size.

• The number of response options
– The number of response categories affects the required sample size.

• Unidimensionality of construct
– How well the data meet the assumption of unidimensionality affects the required sample size.

• The item properties
– Items at the extremes need more data.

• Population distribution
– How respondents are distributed across the theta continuum affects the required sample size.

• Purpose of study
– Evaluation of an instrument: smaller sample sizes needed.
– Estimating accurate respondent scores: larger sample sizes needed.
– Calibrating items for an item bank: larger sample sizes needed.

Page 40:

Rasch / 1-Parameter Logistic IRT Model

[Figure: item characteristic curves for “I am unhappy some of the time”, “I don’t care what happens to me”, and “I cry easily”, with b = 0.25, 1.33, and -0.23 as listed on the slide. x-axis: Depression (θ), -3 to 3, anchored “None” and “Severe”; y-axis: Probability of Response, 0.00 to 1.00.]

$$P(X_i = 1 \mid \theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}$$

$$P(X_i = 1 \mid \theta) = \frac{1}{1 + e^{-1.7\,a_i(\theta - b_i)}}$$

$$P(X_i = 1 \mid \theta) = \frac{1}{1 + e^{-(\theta - b_i)}}$$

Page 41:

2-Parameter Logistic IRT Model

[Figure: item characteristic curves for three items. Parameters as listed on the slide: b = 0.25, 1.33, -0.23; a = 2.83, 1.11, 2.20. x-axis: Depression (θ), -3 to 3, anchored “None” and “Severe”; y-axis: Probability of Response, 0.00 to 1.00.]

$$P(X_i = 1 \mid \theta) = \frac{1}{1 + e^{-1.7\,a_i(\theta - b_i)}}$$