Applying Ideal Point IRT Models to Score Single Stimulus and Pairwise Preference Personality Items Stephen Stark (USF) Oleksandr S. Chernyshenko (UC, NZ) Fritz Drasgow (UIUC)


Post on 23-Dec-2015


Applying Ideal Point IRT Models to Score Single Stimulus and Pairwise Preference Personality Items

Stephen Stark (USF)

Oleksandr S. Chernyshenko (UC, NZ)

Fritz Drasgow (UIUC)


Overview

“Problems” with current personality assessment procedures
The case for ideal point response process assumptions in personality
Ideal point IRT models for single statement and pairwise preference items
Score comparability study


Personality Scale Construction Today

Rooted in Classical Test Theory (CTT) and Common Factor Theory (CFT)
Uses single stimulus format, fixed length scales and total scores in all analyses and interpretations
Existing inventories
  Are static
  Contain a large number of relatively short scales


Problem # 1

Current scales worked well for research purposes, where the interest is to “understand the relationship” between constructs
But these measures are not well suited for adaptive formats or feedback purposes
  Item parameters are scale dependent
  Item difficulties do not directly correspond to item content, because of reverse scoring
  Scales are too short to have good precision
More flexible test construction technology is needed


Problem # 2

CTT and CFT make a dominance response process assumption
  This has been “adopted” from cognitive ability testing
To satisfy constraints of the dominance assumption
  Reverse scoring of negative items is introduced
  Neutral or extreme items are deleted from item pools because they have low item-total correlations (loadings)
This results in depleted item pools and scales with properties more suitable for scholarship exams


Dominance Response Process and Personality Items (MBR, 2001; JAP, 2006)

Person endorses item if her standing on the latent trait, theta, is more extreme than that of the item.
Only appropriate for moderately positive/negative items (e.g., “I like/dislike parties”)

[Figure: x-axis Theta (-3 to 3); y-axis Prob of Positive Response (0.0 to 1.0); labels: Item, Person]
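A dominance response process can be sketched with a monotone logistic curve. The two-parameter logistic form and the values of `a` and `b` below are illustrative assumptions; the slides do not name a specific dominance model or give parameter values.

```python
import math

def dominance_irf(theta, a=1.5, b=0.0):
    """2PL-style curve: endorsement probability rises monotonically with theta.

    a (discrimination) and b (item location) are hypothetical values.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# The further the person is above the item location, the likelier the endorsement.
for theta in (-3, -1, 0, 1, 3):
    print(f"theta={theta:+d}  P(endorse)={dominance_irf(theta):.3f}")
```

Under this kind of curve a respondent can never be "too high" on the trait to endorse an item, which is why it suits only moderately positive or negative statements.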


Ideal Point Process: A More Flexible Alternative?

Person endorses item if her standing on the latent trait, theta, is near that of the item.
“My social skills are about average.”
Disagree either because:
  Too introverted (uncomfortable talking to people)
  Too extraverted (great skills)

[Figure: x-axis Theta (-3.0 to 3.0); y-axis 0.0 to 1.0; labels: Item, Too Introverted, Too Extraverted]
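The single-peaked shape of an ideal point response can be illustrated with a toy curve. The bell-shaped function below is a hypothetical stand-in chosen only to show nonmonotonicity; it is not the GGUM or any model used in this research, and `delta` and `width` are made-up parameters.

```python
import math

def toy_ideal_point_irf(theta, delta=0.0, width=1.0):
    """Hypothetical single-peaked curve: endorsement peaks where theta equals delta."""
    return math.exp(-((theta - delta) ** 2) / (2.0 * width ** 2))

# Respondents far below (too introverted) and far above (too extraverted)
# the item location are both unlikely to endorse a neutral item.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"theta={theta:+.1f}  P(endorse)={toy_ideal_point_irf(theta):.3f}")
```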


Ideal Point Process and Personality (JAP, 2006; Psych Assessment, in press)

Ideal point IRT models provided better fit to a wider variety of personality items than dominance IRT models
Many nonmonotonic, but highly discriminating items have been found
30% more items were retained in item pools
  More items are available for scale construction


Conclusions and Further Basic Research

Ideal point process offers numerous advantages for improving current measures
More research is needed
  Only a few ideal point models are available; more flexibility is needed
  Item and person parameter estimation must be improved (APM, 2005)
  Responses to adaptive scales may be more complicated than we think
Note that this research carries limited applied value, because traditional items are easily FAKED


Single Stimulus Response Format

Items consist of individual statements
  I get along well with others. (A+)
  I try to be the best at everything I do. (C+)
  I insult people. (A-)
  My peers call me “absent minded.” (C-)
Agree/Disagree or Likert type (SD, D, N, A, SA) response options are used
In each case, the socially desirable response is obvious.


How to Deal With Faking?

Social Desirability (SD) scales are often used to “detect” and “correct” for faking
  Adjustments made to content scale scores
  Little effect on validity
Correcting for faking using SD scores is problematic, because…
  SD scales may function differently across testing situations (JAP, 2001)
Need to develop fake-resistant items


Search for Fake-Resistant Formats

Empirically keyed, nontransparent items
  But problems with construct and face validity data
Biodata or situational judgments
  Do not measure personality directly
  Can be easily faked as soon as respondents are told personality is being assessed
Forced-choice (FC) items
  Halo and other biases are reduced (Borman et al., 2001)
  Intuitively, should reduce faking (Jackson et al., 2000)


Unidimensional Pairwise Preference Format

Create items by pairing stimuli that are on the same dimension, but represent different locations on the trait continuum

Sociability item:
  I talk a lot. (+3)
  My social skills are about average. (0)

Respondent chooses statement that is “More Like Me”

Navy Computer Adaptive Personality Scales (NCAPS) uses this format


Multidimensional Pairwise Preference Format

Create items by pairing stimuli that are similar in desirability, but represent different dimensions

Positive item:
  I get along well with others. (A+)
  I set very high standards for myself. (C+)

Negative item:
  I insult people. (A-)
  I work just enough to pass my classes. (C-)

A variation of this approach is the tetrad format (Army AIM or SHL’s OPQ-32-i)


Scoring Forced Choice Measures

Traditional scoring of FC items is problematic
  Unidimensional FC scale scores have bimodal distributions
  Multidimensional FC scores are ipsative
    Inter-individual comparisons not possible
    Scale scores correlate negatively (even facets of Big 5)
Scoring lacks a formal psychometric model
  Difficult to evaluate scoring accuracy
  Does not provide insight about item construction
  Not usable for adaptive testing
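The ipsativity problem can be seen in a small simulation. This is a sketch under stated assumptions: "traditional scoring" is taken to mean one point for the dimension of each chosen statement, and the respondents here choose at random among three hypothetical dimensions, so any correlation comes from the scoring constraint alone.

```python
import math
import random

def score_forced_choice(choices, n_dims):
    """Assumed traditional scoring: each chosen statement adds one point to its dimension."""
    scores = [0] * n_dims
    for dim in choices:
        scores[dim] += 1
    return scores

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)                   # fixed seed so the sketch is repeatable
N_ITEMS, N_DIMS, N_PEOPLE = 30, 3, 200   # hypothetical sizes

people = [
    score_forced_choice([rng.randrange(N_DIMS) for _ in range(N_ITEMS)], N_DIMS)
    for _ in range(N_PEOPLE)
]
columns = list(zip(*people))

# Every respondent's scale scores sum to the same total (ipsative scores).
print({sum(p) for p in people})
for i in range(N_DIMS):
    for j in range(i + 1, N_DIMS):
        print(f"r(scale {i}, scale {j}) = {pearson(columns[i], columns[j]):.2f}")
```

Because a point given to one scale is a point withheld from the others, the scales are pushed toward negative correlations even though the simulated choices are unrelated across dimensions.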


Are Forced Choice Scores Equivalent to Traditional Scores?

FC measures are gaining popularity
But direct comparisons of traditional FC and SS scores are not possible
  “Score inflations” can only be evaluated within measures
  Correlations between measures are low
Before evaluating FC measures in operational settings:
  Scores must be normative
  Under honest conditions, FC and SS scores should be the same


Response Format Study (in review)

Used advances in IRT to obtain normative scores for Order, Self Control and Sociability
  36-item Single Stimulus measure
  36-pair Unidimensional Pairwise Preference measure
  36-pair Multidimensional Pairwise Preference measure
All scores were estimated using IRT
All items administered under honest conditions (N=602 for self reports and N=110 for observers)


IRT Model for Single Stimulus Items

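Generalized Graded Unfolding Model (GGUM; Roberts et al., 1998)
  GGUM fit personality items well (Chernyshenko, 2002)
  No reverse scoring needed

\[
P(Z_i = z \mid \theta_j) =
\frac{\exp\left\{\alpha_i\left[z(\theta_j - \delta_i) - \sum_{k=0}^{z}\tau_{ik}\right]\right\}
    + \exp\left\{\alpha_i\left[(M - z)(\theta_j - \delta_i) - \sum_{k=0}^{z}\tau_{ik}\right]\right\}}
     {\sum_{w=0}^{C}\left(\exp\left\{\alpha_i\left[w(\theta_j - \delta_i) - \sum_{k=0}^{w}\tau_{ik}\right]\right\}
    + \exp\left\{\alpha_i\left[(M - w)(\theta_j - \delta_i) - \sum_{k=0}^{w}\tau_{ik}\right]\right\}\right)}
\]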
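The GGUM category probabilities can be computed directly from the model's standard published form, in which theta is the person's trait level, alpha the item discrimination, delta the item location, tau_k the category thresholds with tau_0 = 0, C the number of response categories minus one, and M = 2C + 1. The following is a minimal sketch; the parameter values in the usage loop are hypothetical and are not estimates from this study.

```python
import math

def ggum_probs(theta, alpha, delta, taus):
    """Return [P(Z=0), ..., P(Z=C)] for one item under the GGUM.

    taus holds tau_1..tau_C; tau_0 is fixed at 0 and added here.
    """
    C = len(taus)
    M = 2 * C + 1
    cum_tau = [0.0]                      # running sums of tau_0..tau_z
    for t in taus:
        cum_tau.append(cum_tau[-1] + t)
    d = theta - delta
    numer = [
        math.exp(alpha * (z * d - cum_tau[z]))
        + math.exp(alpha * ((M - z) * d - cum_tau[z]))
        for z in range(C + 1)
    ]
    total = sum(numer)
    return [n / total for n in numer]

# Dichotomous item (0 = Disagree, 1 = Agree) with made-up parameters.
for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
    p_agree = ggum_probs(theta, alpha=1.0, delta=0.0, taus=[-1.0])[1]
    print(f"theta={theta:+.1f}  P(agree)={p_agree:.3f}")
```

The agreement probability is symmetric around delta and highest when theta equals delta, which is why negatively worded or neutral items can be scored without reverse keying.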


Example: “Ideal Point IRT” Order Scale

[Figure: GGUM Fit Plot for ORD23: “My room neatness is about average.” x-axis: Theta (-3.0 to 3.0); y-axis: Probability of Positive Response (0.0 to 1.0); curves: ORF, EMP]

[Figure: GGUM Fit Plot for ORD24: “Half of the time I do not put things in their proper place.” x-axis: Theta (-3.0 to 3.0); y-axis: Probability of Positive Response (0.0 to 1.0); curves: ORF, EMP]


IRT Model for Scoring Unidimensional Pairwise Preferences (Stark & Drasgow, 2002)

Zinnes and Griggs (1974) Probabilistic Unfolding Model (ZG model)
Idea: Respondent has ideal point representing his/her perception of typical behavior (trait level)
Task: On each trial, respondent chooses the statement that better describes him/her



Equation for ZG Item Response Functions

\[
P_{jk}(\theta) = 1 - \Phi(a_{jk}) - \Phi(b_{jk}) + 2\,\Phi(a_{jk})\,\Phi(b_{jk})
\]
\[
a_{jk} = (2\theta - \mu_j - \mu_k)/\sqrt{3}
\]
\[
b_{jk} = \mu_j - \mu_k
\]

\(\Phi\) is the cumulative standard normal
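The ZG item response function can be evaluated with only the standard library. The sketch below uses the published form of the model, P = 1 - Phi(a) - Phi(b) + 2 Phi(a) Phi(b) with a = (2 theta - mu_j - mu_k) / sqrt(3) and b = mu_j - mu_k, where mu_j and mu_k are the statement locations; the locations in the usage loop are hypothetical.

```python
import math

def phi(x):
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def zg_prob(theta, mu_j, mu_k):
    """ZG probability that a respondent at theta prefers statement j to statement k."""
    a = (2.0 * theta - mu_j - mu_k) / math.sqrt(3.0)
    b = mu_j - mu_k
    return 1.0 - phi(a) - phi(b) + 2.0 * phi(a) * phi(b)

# Hypothetical pair: statement j located at 0.0, statement k at 3.0.
for theta in (-1.0, 0.0, 1.5, 3.0, 4.0):
    print(f"theta={theta:+.1f}  P(prefer j)={zg_prob(theta, 0.0, 3.0):.3f}")
```

A respondent sitting at one statement's location strongly prefers it, and the two preference probabilities for a pair always sum to one.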


[Figure: IRF for Stimulus-Pair j = 17, k = 18; vertical axis: P_{j,k}(θ) from 0.00 to 1.00; horizontal axis ticks at 2.00, 4.00, 6.00, 8.00]


IRT Model for Scoring Multidimensional Pairwise Preferences (Stark, 2002; Stark, Chernyshenko, & Drasgow, 2005)

Respondent evaluates each stimulus (personality statement) separately and makes independent decisions about endorsement.
Stimuli may be on different dimensions.
Single stimulus response probabilities P{0} and P{1} computed using a unidimensional ideal point model for “traditional” items (GGUM)

\[
P_{(s>t)_i}(\theta_{d_s}, \theta_{d_t})
= \frac{P_{st}\{1,0\}}{P_{st}\{1,0\} + P_{st}\{0,1\}}
\approx \frac{P_s\{1\}\,P_t\{0\}}{P_s\{1\}\,P_t\{0\} + P_s\{0\}\,P_t\{1\}}
\]

1 = Agree, 0 = Disagree

Refer to new pairwise preference model as MDPP
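The MDPP preference probability combines two single-statement agreement probabilities: the chance of agreeing with s and disagreeing with t, divided by the chance that exactly one of the two is endorsed. A minimal sketch, assuming the agreement probabilities P_s{1} and P_t{1} have already been obtained (for example from the GGUM); the numeric inputs are hypothetical.

```python
def mdpp_prob(p_s_agree, p_t_agree):
    """P(prefer s over t) from independent single-statement agreement probabilities.

    Undefined when both probabilities are 0 or both are 1 (denominator is zero).
    """
    s_only = p_s_agree * (1.0 - p_t_agree)   # P_s{1} * P_t{0}
    t_only = (1.0 - p_s_agree) * p_t_agree   # P_s{0} * P_t{1}
    return s_only / (s_only + t_only)

# A respondent likely to agree with s (0.8) and unlikely to agree with t (0.2).
print(round(mdpp_prob(0.8, 0.2), 3))
```

Because each statement's probability depends only on its own dimension, the two statements in a pair may measure different traits.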


Model Notation

[Figure: panels (a) and (b)]


Normative Score Recovery

Roberts et al. (2000) and Stark (1998, 2002) showed in simulation studies:
  Accurate normative scores could be recovered for GGUM, ZG and MDPP models
  10 items or pairs per dimension are sufficient to obtain reasonable estimates
But no empirical study has compared scores from these 3 formats, even under “honest” conditions
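Score recovery from a short set of pairs can be sketched with a grid-based expected a posteriori (EAP) estimate under the ZG model. Everything specific here is an assumption made for illustration: the slides do not say which estimator was used, the standard normal prior and the grid are arbitrary choices, and the ten statement-location pairs are invented.

```python
import math

def phi(x):
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def zg_prob(theta, mu_j, mu_k):
    """ZG probability of preferring statement j to statement k."""
    a = (2.0 * theta - mu_j - mu_k) / math.sqrt(3.0)
    b = mu_j - mu_k
    return 1.0 - phi(a) - phi(b) + 2.0 * phi(a) * phi(b)

def eap_theta(pairs, choices, lo=-4.0, hi=4.0, n_points=81):
    """Grid EAP estimate of theta; choices[i] is 1 if statement j was preferred."""
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    posterior = []
    for theta in grid:
        weight = math.exp(-0.5 * theta * theta)   # assumed N(0, 1) prior, unnormalized
        for (mu_j, mu_k), u in zip(pairs, choices):
            p = zg_prob(theta, mu_j, mu_k)
            weight *= p if u == 1 else (1.0 - p)
        posterior.append(weight)
    total = sum(posterior)
    return sum(t * w for t, w in zip(grid, posterior)) / total

# Ten hypothetical (mu_j, mu_k) pairs for one dimension.
PAIRS = [(-2.0, -1.0), (-1.0, 0.0), (0.0, 1.0), (1.0, 2.0), (-1.5, 0.5),
         (-0.5, 1.5), (2.0, 0.0), (1.0, -1.0), (0.0, -2.0), (1.5, -0.5)]
prefers_higher = [1 if mu_j > mu_k else 0 for mu_j, mu_k in PAIRS]
prefers_lower = [1 - u for u in prefers_higher]

print(eap_theta(PAIRS, prefers_higher), eap_theta(PAIRS, prefers_lower))
```

The estimates land on a common theta metric rather than a within-person ranking, which is the sense in which IRT scoring of pairwise preferences yields normative scores.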


Results for Conscientiousness Facets

Dimension     Format   Order               Self Control
                       GGUM  MDPP  ZG      GGUM  MDPP  ZG
Order         GGUM     .83
              MDPP     .76   .76
              ZG       .75   .74   .75
Self Control  GGUM     .34   .42   .39     .67
              MDPP     .18   .34   .25     .64   .69
              ZG       .23   .31   .28     .66   .62   .70

Correlations = reliability

Positive correlation for MDPP facet scores.
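One way to read cross-format correlations against reliabilities is the classical correction for attenuation, r_xy / sqrt(r_xx * r_yy). This formula is brought in here for illustration and is not an analysis reported in the slides. The worked call assumes, as a labeled guess, that the diagonal entries of the correlation table are reliabilities and uses the Order GGUM (.83), Order MDPP (.76), and their cross-format correlation (.76).

```python
import math

def disattenuated_r(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Assumption: .83 and .76 are the reliabilities of the Order GGUM and MDPP scores.
print(round(disattenuated_r(0.76, 0.83, 0.76), 2))
```

A corrected value near 1 would be in line with treating the formats as alternate forms of the same measure.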


Results for Order and Sociability

Dimension    Format   Order               Sociability
                      GGUM  MDPP  ZG      GGUM  MDPP  ZG
Order        GGUM     .83
             MDPP     .76   .76
             ZG       .75   .74   .75
Sociability  GGUM     -.08  -.20  -.13    .77
             MDPP     -.10  -.14  -.13    .79   .76
             ZG       -.05  -.18  -.12    .73   .73   .73

Correlations = reliability


Criterion Validities

Criterion                       Order               Sociability
                                GGUM  MDPP  ZG      GGUM  MDPP  ZG
Preventative Health Behaviors   .15   .16   .21     .09   .01   .04
Traffic Risk Behaviors          -.17  -.22  -.23    .08   .06   .18
Substance Avoidance             .09   .17   .14     -.20  -.18  -.18
Study Behaviors                 .38   .38   .38     .02   .01   .00

Criterion validities are comparable


Conclusions

Under honest conditions, MDPP, ZG, and SS versions of the questionnaire provided equivalent measurement and can be viewed as alternate forms
Moving toward FC formats did not affect the validity of personality scores.
Observing a positive correlation between Order and Self Control MDPP scales provided empirical evidence for normative scoring


Current Research

Results of this study speak in favor of using ZG and MDPP IRT models for scoring FC scales

Having IRT models makes transition to adaptive testing easy

Adaptive format may offer additional benefit of fake resistance (see NCAPS presentations for recent IMTA talks)

Current studies:
  How to best pair stimuli?
  How many unidimensional pairings are needed?
  Will increasing # of dimensions lead to more fake-resistant scores?
  Can we better detect faking using forced choice than traditional format?