teoria masurarii

Upload: ttelecom2000

Post on 01-Jun-2018

228 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/9/2019 teoria masurarii

    1/67

  • 8/9/2019 teoria masurarii

    2/67

    An Introduction to Modern Measurement Theory

    This tutorial was written as an introduction to the basics of item response theory (IRT) modelingand its applications to health outcomes measurement for the National Cancer Institutes Cancerutcomes Measurement !or"ing #roup (CM!#)$ In no way this tutorial is meant to replaceany te%t on measurement theory& but only ser'e as a steppingstone for health care researchers tolearn this methodology in a framewor" that may be more appealing to their research field$ Anillustration of IRT modeling is pro'ided in the appendi% and can pro'ide better insight into theutility of these methods$ This tutorial is only a draft 'ersion that may undergo many re'isions$

    lease feel free to comment or as" *uestions of the author of this piece+

    ,ryce ,$ Ree'e& h$-$ utcomes Research ,ranch Applied Research rogram-i'ision of Cancer Control and opulation .ciences National Cancer Institute Tel+ (/01) 2345264

    7a%+ (/01) 4/2/610 8mail+ ree'eb9mail$nih$go'

  • 8/9/2019 teoria masurarii

    3/67

    :

    utline

    Topic

    Reference Terms;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

    An Introduction to Modern Measurement Theory;;;;;;;;;;;;;;;;;$

    A ,rief ed Adapti'e Tests;;;;;;;;;;;;;

    Conclusion;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;$

    References;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;$$

    Illustration of IRT Modeling;;;;;;;;;;;;;;;;;;;;;;$$$

    /4

    /?

    42

    46

    21

    1/

    14

    12161?13:0:::/:4

    :2

    /1

    age

    /

    4

    2

    6

    11

    appendi%

  • 8/9/2019 teoria masurarii

    4/67

    /

    Reference+ .ome Terms @ou !ill .ee in this Tutorial as well as in the =iterature

    Assumption of =ocal Independence A response to a *uestion is independent of responses toother *uestions in a scale after controlling for the latent trait (construct) measured by the scale$

    Assumption of Bnidimensionality the set of *uestions are measuring a single continuous latent'ariable (construct)$

    Construct (a$"$a$ trait& domain& ability& latent 'ariable& or theta) see definition of theta below$

    Information 7unction (for a scale or item) an inde%& typically displayed in a graph& indicatingthe range of trait le'el o'er which an item or test is most useful for distinguishing amongindi'iduals$ The information function characteri>es the precision of measurement for persons atdifferent le'els of the underlying latent construct& with higher information denoting moreprecision$

    Item Duestion in a scale

    Item Characteristic Cur'e (ICC& a$"$a$ item response function& IR7& or item trace line) The ICCmodels the relationship between a persons probability for endorsing an item category and thele'el on the construct measured by the scale$

    Item -ifficulty (Threshold) arameter b point on the latent scale where a person has a 20Echance of responding positi'ely to the scale item$

    Item -iscrimination (.lope) arameter a describes the strength of an itemFs discriminationbetween people with trait le'els () below and abo'e the threshold b$ The a parameter may alsobe interpreted as describing how an item may be related to the trait measured by the scale$

    .cale (measure& *uestionnaire& or test) A scale in this tutorial is assumed to measure a singleconstruct or domain$

    .lope arameter .ee Item -iscrimination

    Test Characteristic Cur'e (TCC& a$"$a$ Test Response 7unction) The TCC describes thee%pected number of scale items endorsed as a function of the underlying latent 'ariable$

    Theta () The unobser'able (or latent) construct being measured by the *uestionnaire$ These

    constructs or traits are measured along a continuous scale$ 8%amples of constructs in healthoutcomes measurement are depression le'el& physical functioning& an%iety& or social support$

    Threshold arameter .ee Item -ifficulty

  • 8/9/2019 teoria masurarii

    5/67

  • 8/9/2019 teoria masurarii

    6/67

    2

    ,ecause the e%pected participantFs scale score is computed from their responses to each item

    (that is characteri>ed by a set of properties)& the IRT estimated score is sensiti'e to differences

    among indi'idual response patterns and is a better estimate of the indi'idualFs true le'el on the

    trait continuum than CTTFs summed scale score (.antor G Ramsay& 133?)$

    ,ecause of these ad'antages& IRT is being applied in health outcomes research to de'elop

    new measures or impro'e e%isting measures& to in'estigate group differences in item and scale

    functioning& to e*uate scales for crosswal"ing patient scores& and to de'elop computeri>ed

    adapti'e tests$ This handboo" pro'ides an o'er'iew of IRT and the more commonly used

    models in health outcomes research& along with a discussion of the applications of these models

    to de'elop 'alid& reliable& sensiti'e& and feasible endpoint measures$

    A ,rief

  • 8/9/2019 teoria masurarii

    7/67

    5

    normali>ed scores$ =awley (134/) e%tended the statistical analysis of the properties of the

    normal ogi'e cur'e and described ma%imumli"elihood estimation procedures for the item

    parameters and linear appro%imations to those estimates$ 7red =ord (132:) introduced the idea

    of a latent trait or ability and differentiated this construct from obser'ed test score$ =a>arsfeld

    (1320) described the unobser'ed 'ariable as accounting for the obser'ed interrelationships

    among the item responses$

    Considered a milestone in psychometrics (8mbretson G Reise& :000)& =ord and No'ic"s

    (135?) te%tboo" entitled .tatistical Theories of Mental Test .cores pro'ides a rigorous and

    unified statistical treatment of classical test theory$ The remaining half of the boo"& written by

    Allen ,irnbaum& pro'ides an e*ually solid description of the IRT models$ ,oc"& and se'eral

    student collaborators at the Bni'ersity of Chicago& including -a'id Thissen& 8iKi Mura"i&

    Richard #ibbons& and Robert Misle'y de'eloped effecti'e estimation methods and computer

    programs such as ,ilog& Multilog& arscale& and Testfact$ Along with Ait"en (,oc" G Ait"en&

    13?1)& ,oc" de'eloped the algorithm of marginal ma%imum li"elihood method to estimate the

    item parameters that are used in many of these IRT programs$

    In a separate line of de'elopment of IRT models& #eorg Rasch (1350) discussed the need

    for creating statistical models that maintain the property of specific obKecti'ity& the idea that

    people and item parameters be estimated separately but comparable on a similar metric$ Rasch

    inspired #erhard 7ischer (135?) to e%tend the applicability of the Rasch models into

    psychological measurement and ,en !right to teach these methods and help to inspire other

    students to the de'elopment of the Rasch models$ These students& including -a'id Andrich&

    #eoffrey Masters& #raham -ouglas& and Mar" !ilson& helped to push the methodology into

    education and beha'ioral medicine (!right& 1336)$

  • 8/9/2019 teoria masurarii

    8/67

    6

    Item Response Theory Model ,asics

    IRT is a model for e%pressing the association between an indi'idualFs response to an item

    and the underlying latent 'ariable (often called LabilityL or LtraitL) being measured by the

    instrument$ The underlying latent 'ariable in health research may be any measurable construct

    such as physical functioning& ris" for cancer& or depression$ The latent 'ariable& e%pressed as

    theta ()& is a continuous unidimensional construct that e%plains the co'ariance among item

    responses (.teinberg G Thissen& 1332)$ eople at higher le'els of ha'e a higher probability of

    responding correctly or endorsing an item$

    IRT models use item responses to obtain scaled estimates of & as well as to calibrate

    items and e%amine their properties (Mellenbergh& 1334)$ 8ach item is characteri>ed by one or

    more model parameters$ The item difficulty& or threshold& parameter b is the point on the latent

    scale where a person has a 20E chance of responding positi'ely to the scale item (*uestion)$

    Items with high thresholds are less often endorsed (.teinberg G Thissen& 1332)$ The slope& or

    discrimination& parameter a describes the strength of an itemFs discrimination between people

    with trait le'els () below and abo'e the threshold b$ The a parameter may also be interpreted as

    describing how an item may be related to the trait measured by the scale and is directly related&

    under the assumption of a normal distribution& to the biserial itemtest correlation (=inden G

  • 8/9/2019 teoria masurarii

    9/67

    ?

    asymptote parameter or guessing parameter c to possibly e%plain why people of low le'els of the

    trait are responding positi'ely to an item$

    To model the relation of the probability of a correct response to an item conditional on

    the latent 'ariable & trace lines& estimated from the item parameters& are plotted$ Most IRT

    models in research assume that the normal ogi'e or logistic function describes this relationship

    accurately and fits the data$ The logistic function is similar to the normal ogi'e function& and is

    mathematically simpler to use and& as a result& is predominately used in research$ The trace line

    (or sometimes called the item characteristic cur'e& ICC) can be 'iewed as the regression of item

    score on the underlying 'ariable (=ord& 13?0& p$ /4)$ The left graph in 7igure 1 models

    1$00 /0

    Test

    CharacteristicCur'e

    :2

    :0

    12

    10

    2

    0

    0$62

    (P)0$20

    0$:2

    0$00

    / : 1 0 1 : / / : 1 0 1 : /

    Bnderlying =atent Qariable ( ) Bnderlying =atent Qariable ( )

    7igure 1+ =eft figure is an IRT item characteristic cur'e (trace line) for one item$Right figure is test characteristic cur'e for thirty items$

    the probability for endorsing an item conditional on the le'el on the underlying trait$ The higher

    a persons trait le'el (mo'ing from left to right along the scale)& the greater the probability that

    the person will endorse the item$ 7or e%ample& if a *uestion as"ed& Are you unhappy most of the

    dayJ& then the left graph of 7igure 1 shows that people with higher le'els of depression () will

    ha'e higher probabilities for answering yes to the *uestion$ 7or dichotomous items& the

    probability of a negati'e response is high for low 'alues of the underlying 'ariable being

    measured& and decreases for higher le'els on $ The collection of the item trace lines forms the

  • 8/9/2019 teoria masurarii

    10/67

    3

    scaleH thus& the sum of the probabilities of the correct response of the item trace lines yields the

    test characteristic cur'e (TCC)$ The TCC describes the e%pected number of scale items endorsed

    as a function of the underlying latent 'ariable$ The right figure of 7igure 1 presents a TCC cur'e

    for /0 items$ !hen the sum of the probabilities is di'ided by the number of items& the TCC

    gi'es the a'erage probability or e%pected proportion correct as a function of the underlying

    construct (!eiss& 1332)$

    Another important feature of IRT models is the information function& an inde% indicating

    the range of trait le'el o'er which an item or test is most useful for distinguishing among

    indi'iduals$ In other words& the information function characteri>es the precision of measurement

    for persons at different le'els of the underlying latent construct& with higher information

    denoting more precision$ #raphs of the information function place personsF trait le'el on the

    hori>ontal %a%is& and amount of information on the 'ertical ya%is (left graph in 7igure :)$

    1$00 /0

    :2

    r $36

    r $35

    r $32

    r $3/

    r $30

    r $?0

    / : 1 0 1 : /

    Test

    Information

    / : 1 0 1 : /

    0$62

    Information

    :0

    12

    10

    2

    0$20

    0$:2

    0$00 0

    Bnderlying =atent Qariable ( ) Bnderlying =atent Qariable ( )

    7igure :+ Item information cur'e on left side$ Test information cur'e withappro%imate test reliability on right side$

    The shape of the item information function is dependent on the item parameters$ The higher the

    items discrimination& the more pea"ed the information function will beH thus& higher

    discrimination parameters pro'ide more information about indi'iduals whose trait le'els () lie

    near the itemFs threshold 'alue$ The items difficulty parameter(s) determines where the item

  • 8/9/2019 teoria masurarii

    11/67

    10

    information function is located (7lannery& Reise& G !idaman& 1332)$ !ith the assumption of

    local independence (re'iewed below)& the item information 'alues can be summed across all of

    the items in the scale to form the test information cur'e (=ord& 13?0)$

    At each le'el of the underlying trait & the information function is appro%imately e*ual to

    the e%pected 'alue of the in'erse of the s*uared standard errors of the estimates (=ord& 13?0)$

    The smaller the standard error of measurement (.8M)& the more information or precision the

    scale pro'ides about $ 7or e%ample& if a measure has a test information 'alue of 15 at :$0&

    then e%aminee scores at this trait le'el ha'e a standard error of measurement of 1 15 $:2&( )

    indicating good precision (reliability appro%imately $34) at the le'el of theta (7lannery et al$&

    1332)$ The right graph in 7igure : presents a test information function$ Most information

    (precision in measurement) is contained within the middle of the scale (1$0 S S 1$2) with less

    reliability (labeled r in graph) at the high and low ends of the underlying trait$ To obser'e the

    conditional standard error of measurement for a gi'en scale& the in'erse of the s*uare root of the

    test information function across all le'els of the continuum is graphed$

    /$0

    :$0

    1$0

    0$0

    / : 1 0 1 : /

    7igure /+ Test standard error of measurement

    7igure / presents the test .8M with little error (more precision) in measurement for middle to

    high le'els of the underlying trait& and high error in measuring respondents with low le'els of $

    If the scale& presented in 7igures : and /& measures physical functioning from poor ( /) to

  • 8/9/2019 teoria masurarii

    12/67

    11

    high ( /)& then the scale lac"s precision in measuring physically disabled patients ( S 1$2)&

    but ade*uately captures the physical functioning of a'erage to healthy (ambulatory) indi'iduals$

    .cale scoring in item response theory has a maKor ad'antage o'er classical test theory$ In

    classical test theory& the summed scale score is dependent on the difficulty of the items used in

    the selected scale& and therefore& not an accurate measure of a personFs trait le'el$ The procedure

    assumes that e*ual ratings on each item of the scale represent e*ual le'els of the underlying trait

    (Coo"e G Michie& 1336)$ Item response theory& on the other hand& estimates indi'idual latent

    trait le'el scores based on all the information in a participantFs response pattern$ That is& IRT

    ta"es into consideration& which items were answered correctly (positi'ely) and which ones were

    answered incorrectly& and utili>es the difficulty and discrimination parameters of the items when

    estimating trait le'els (!eiss& 1332)$ ersons with the same summed score but different response

    patterns may ha'e different IRT estimated latent scores$ ne person may answer more of the

    highly discriminating and difficult items and recei'e a higher latent score than one who answers

    the same number of items with low discrimination or difficulty$ IRT trait le'el estimation uses

    the item response cur'es associated with the indi'idualFs response pattern$ A statistical

    procedure& such as ma%imum li"elihood estimation& finds the ma%imum of a li"elihood function

    created from the product of the population distribution with the indi'idualFs trace cur'es

    associated with each itemFs right or wrong response$ A full discussion of trait scoring follows the

    o'er'iew of the IRT models$

    Assumptions of Item Response Theory Models

    The IRT model is based on the assumption that the items are measuring a single

    continuous latent 'ariable ranging from U to U$ The unidimensionality of a scale can be

    e'aluated by performing an itemle'el factor analysis& designed to e'aluate the factor structure

  • 8/9/2019 teoria masurarii

    13/67

    1:

    underlying the obser'ed co'ariation among item responses$ The assumption can be e%amined by

    comparing the ratio of the first to the second eigen'alue for each scaled matri% of tetrachoric

    correlations$ This ratio is an inde% of the strength of the first dimension of the data$ .imilarly&

    another indication of unidimensionality is that the first factor accounts for a substantial

    proportion of the matri% 'ariance (=ord& 13?0H Reise G !aller& 1330)$ 7or tests using many

    items& the assumption of unidimensionality may be unrealisticH howe'er& Coo"e and Michie

    (1336) report that IRT models are moderately robust to departures from unidimensionality$ If

    multidimensionality e%ists& the in'estigator may want to consider di'iding the test into subtests

    based on both theory and the factor structure pro'ided by the itemle'el factor analysis$ Multi

    dimensional IRT models do e%ist& but its models as well as informati'e documentation and user

    friendly software are still in de'elopment$

    In the IRT model& the item responses are assumed to be independent of one another+ the

    assumption of local independence$ The only relationship among the items is e%plained by the

    conditional relationship with the latent 'ariable $ In other words& local independence means

    that if the trait le'el is held constant& there should be no association among the item responses

    (Thissen G .teinberg& 13??)$ Qiolation of this assumption may result in parameter estimates that

    are different from what they would be if the data were locally independentH thus& selecting items

    for scale construction based on these estimates may lead to erroneous decisions (Chen G

    Thissen& 1336)$ The assumptions of unidimensionality and local independence are related in

    thatH items found to be locally dependent will appear as a separate dimension in a factor analysis$

    7or some IRT models& the latent 'ariable (not the data response distribution) is assumed

    to be normally distributed within the population$ !ithout this assumption& estimates of for

  • 8/9/2019 teoria masurarii

    14/67

    1/

    some response patterns (e$g$& respondents who do not endorse any of the scale items) ha'e no

    finite 'alues resulting in unstable parameter estimates (Chen G Thissen& 1336)$

    Item Response Theory Models

    Two Theories towards Measurement

    Thissen and rlando (:001) discuss two approaches to model building in item response

    theory$ ne approach is to de'elop a wellfitting model to reflect the item response data by

    parameteri>ing the ability or trait of interest as well as the properties of the items$ The goal of

    this approach is item analysis$ The model should reflect the properties of the item response data

    sufficiently and accurately& so that the beha'ior of the item is summari>ed by the item

    parameters$ The philosophy is that the items are assumed to measure as they do& not as they

    should (Thissen G rlando& :001)$ This approach to model building belie'es the theory of

    measurement is to e%plain (i$e$& model) the data$

    Another approach of IRT model building is to obtain specific measurement properties

    defined by the model to which the item response data must fit$ If the item or a person does not

    fit within the measurement properties of the IRT model& assessed by analysis of residuals (i$e$&

    item and person fit statistics)& the item or person is discarded$ This approach follows that of the

    Rasch (1350) models& and in the cases where the data fits the model& offers a simple

    interpretation for item analysis and scale scoring$ This approach to model building belie'es

    optimal measurement is defined mathematically& and then the class of item response models that

    yield such measurement is deri'ed$

    The two approaches described abo'e yield a di'ision in psychometrics$ Those who

    belie'e health research measurement should be about describing the beha'iors behind the

    response patterns in a sur'ey will use the most appropriate IRT model (e$g$& RaschVne

  • 8/9/2019 teoria masurarii

    15/67

    14

    arameter =ogistic Model& Twoarameter =ogistic Model& #raded Model) to fit the data$ The

    choice of the IRT model is data dependent$ Researchers from the Rasch tradition belie'e that the

    only appropriate models to use are the Rasch family of models& which retain strong mathematical

    properties such as specific obKecti'ity (person parameters and item parameters estimated

    separately) and summed score simple sufficiency (no information from the response pattern is

    needed) (see model descriptions below)$ .e'eral ad'antages of the Rasch model include+ the

    ability of the model to produce more stable estimates of person and item properties when there is

    a small number of respondents& when e%tremely nonrepresentati'e samples are used& and when

    the population distribution o'er the underlying trait is hea'ily s"ewed$

    8mbretson and Reise (:000) suggest one should use the Rasch family of models when

    each item carries e*ual weight (i$e$& each item is e*ually important) in defining the underlying

    'ariable& and when strong measurement model properties (i$e$& specific obKecti'ity& simple

    sufficiency) are desired$ If one desires fitting an IRT model to e%isting data or desires highly

    accurate parameter estimates& then a more comple% model such as the Twoarameter =ogistic

    Model or #raded Model should be used$

    The IRT models

    Table 1 presents se'en common IRT models with potential application to healthrelated

    research$ The table also indicates if the model is appropriate for dichotomous (binary) or

    polytomous (/ or more options) responses& and some characteristics associated with each model$

    Models noted with an asteris" are part of the Rasch family of models and& therefore& retain the

    uni*ue properties of summed score simple sufficiency and specific obKecti'ity$ 8ach of the

    models are discussed below$ ,ecause of the separate de'elopment of the Rasch and ne

    arameter =ogistic models& they are discussed indi'idually$

  • 8/9/2019 teoria masurarii

    16/67

  • 8/9/2019 teoria masurarii

    17/67

    15

    TX

    X Y

    (Thissen G rlando& :001)$ The model in this form has the interpretation of the probability of a

    positi'e response being e*ual to the 'alue of the person parameter X relati'e to the 'alue of the

    item parameter Y (=inden G

  • 8/9/2019 teoria masurarii

    18/67

    16

    1$0

    0$?

    (P)0$2

    0$/

    0$0

    / : 1

    0 1 : /

    7igure 4 Rasch V ne arameter =ogistic IRT Model (6 items)

    The nearameter =ogistic Model

    The de'elopment of the Rasch model was independent of the de'elopment of the one

    parameter logistic model& but both ha'e similar features and are mathematically e*ui'alent$ The

    oneparameter logistic (1=) model trace line for a gi'en item i is+

    Ti (u i 1 ) 1 $

    1 e%pZO a( O bi )[

    Ti (u i 1 ) traces the conditional probability of a positi'e (u i 1) response to item i as a

    function of the trait parameter& the threshold or difficulty parameter bi& and the discrimination

    parameter a$ !here the Rasch model had a fi%ed slope of one for all items& the 1= model only

    re*uires the slope to be e*ual for all items (Thissen G rlando& :001)$

    The population distribution of the underlying 'ariable for the oneparameter logistic

    model (as well as the two and threeparameter logistic models) is usually specified to ha'e a

    population mean of >ero and a 'ariance of one$ The threshold (or difficulty) parameters b i are

    located relati'e to >ero& which is the a'erage trait le'el in the population& and the slope parameter

    a ta"es some 'alue relati'e to the unit standard de'iation of the latent 'ariable (Thissen G

    rlando& :001)$ Thus& it is the latent 'ariable the model is assuming to be normally

    distributed& not the categorical item responses (Thissen G .teinberg& 13??)$

  • 8/9/2019 teoria masurarii

    19/67

    1?

    The Twoarameter =ogistic Model

    The twoparameter logistic model (:=H ,irnbaum& 135?) allows the slope or

    discrimination parameter a to 'ary across items instead of being constrained to be e*ual as in the

    oneparameter logistic or Rasch model$ The relati'e importance of the difference between a

    persons trait le'el and item threshold is determined by the magnitude of the discriminating

    power of the item (8mbretson G Reise& :000)$ The twoparameter logistic model trace line for

    the probability of a positi'e response to item i for a person with latent trait le'el is+

    Ti (u i 1 ) 1 $

    1 e%pZO1$6ai ( O bi )

    The constant& 1$6& is added to the model as an adKustment so that the logistic model appro%imates

    the normal ogi'e model$ Appro%imately half of the literature includes the adKustment and half

    does not (Thissen G .teinberg& 13??)$

    7igure 2 presents := trace lines for fi'e items with 'arying thresholdVdifficulty b and

    discrimination a parameters$ The item mar"ed by a dashed line represents an item with little

    relationship with the underlying 'ariable being measured by the sur'ey$ Items with steeper

    sloped trace lines ha'e more discriminating power$

    1$0

    0$?

    (P)0$2

    0$/

    0$0

    / : 1

    0 1 : /

    7igure 2 Two arameter =ogistic Trace =ines (2 items)

  • 8/9/2019 teoria masurarii

    20/67

    13

    The Threearameter =ogistic IRT Model

    The threeparameter logistic model (/=H =ord& 13?0) was de'eloped in educational

    testing to e%tend the application of item response theory to multiple choice items that may elicit

    guessing$ 7or item i& the threeparameter logistic trace line is+

    Ti (u i 1 ) ci 1 O ci $

    1 e%pZO1$6 a i ( O bi )[

    The guessing parameter c is the probability of a positi'e response to item i e'en if the person

    does not "now the answer$ !hen c 0& the threeparameter model is e*ui'alent to the :=

    model$ Including the guessing parameter changes the interpretation of other parameters in the

    model$ The threshold parameter b is the 'alue of theta at which respondents ha'e a ($2

    $2c)W(100)E chance of responding correctly to the item (Thissen G rlando& :001)$

    1$00

    0$62

    0$20

    0$:2

    0$00

    / : 1 0

    1 : /

    7igure 5 Three arameter =ogistic Model (1 item)

    7igure 5 presents a /= model trace line for one item$ .ur'ey respondents low on the underlying

    trait ha'e a :0 percent probability (c $:) of endorsing the item$

    The interpretation of the guessing parameter for multiple choice tests in educational

    measurement is straightforward& but can be 'ague in health measurement$ In most healthrelated

    measurement research& the guessing parameter is left out& thus opting for the := model$

  • 8/9/2019 teoria masurarii

    21/67

    :0

    beha'ior of participants in the *uestionnaire$ There may be a considerable proportion of

    participants in the sur'ey who may respond positi'ely to an item for other reasons besides the

    trait being measured$ In ability testing (in an educational conte%t)& it is usually assumed that

    respondents will usually inflate their abilities by guessing at the right answer$

  • 8/9/2019 teoria masurarii

    22/67

    :1

    (Thissen& Nelson& et al$& :001)$ The trace line models the probability of obser'ing each response

    alternati'e as a function of the underlying construct (.teinberg G Thissen& 1332)$ The slope a i

    'aries by item i& but within an item& all response trace lines share the same slope

    (discrimination)$ This constraint of e*ual slope for responses within an item "eeps trace lines

    from crossing& thus a'oiding negati'e probabilities$ The threshold parameters b i" 'aries within

    an item with the constraint b"1 S b" S b"1$ At each 'alue b"& the respondent has a 20E

    probability of endorsing the category$ TW("]) is the trace line describing the probability that a

    respondent at any particular le'el of will respond in that scoring category or a higher category$

    The graded model trace line T(u "]) represents the proportion of participants responding to

    that category across which will be a nonmonotonic cur'e& e%cept for the first and last response

    categories (Thissen& Nelson& et al$& :001)$

    7or the first response category " 1& TW(1]) 1H therefore& the trace line T(u 1]) will

    ha'e a monotonically decreasing logistic function with the lowest threshold parameter+

    Ti (u i 1 ) 1 O 1

    $1 e%pZOai ( O bi : )[

    (Thissen& Nelson et al$& :001)$ 7or the last response category " m& TW(m1]) 0H therefore&

    the trace line T(u m]) will ha'e a monotonically increasing logistic function with the highest

    threshold parameter+

    Ti (u i m )

    (Thissen& Nelson et al$& :001)$

    1

    1 e%pZOai ( O bi m )[

  • 8/9/2019 teoria masurarii

    23/67

    ::

    1$00

    .trongly

    -isagree

    0$62

    Agree

    .trongly

    Agree

    0$20 -isagree Neut ral

    0$:2

    0$00

    / : 1 0

    1 : /

    7igure 6 #raded Model (1 item with 2 response categories)

    7igure 6 displays a graded item with fi'e response categories$ The model presents o'er what

    le'els of the underlying trait a person is li"ely to endorse one of the response options$

    The Nominal Model

    roposed by ,oc" (136:)& the Nominal Model is an alternati'e to the #raded Model for

    polytomously scored items& not re*uiring any a priori specification of the order of the mutually

    e%clusi'e response categories with respect to $ The nominal model trace line for scores u 1& :&

    $$$& mi & for item i is+

    Ti (u i % ) e%pZa i % ci % [

    \ e%pZa" 1

    m$

    i" ci " [

    !here is the latent 'ariable& a"s are discrimination parameters& and c"s are the intercepts$ To

    identify the model& additional constraints are imposed$ The sum of each set of parameters must

    e*ual >ero& i$e$

    \a" 0

    m O1

    i" \ ci" 0

    " 0

    m O1

    (Thissen& Nelson et al$& :001)$

  • 8/9/2019 teoria masurarii

    24/67

    :/

    The Nominal Model is used when no prespecified order can be determined among the

    response alternati'es$ In other words& the model allows one to determine which response

    alternati'e order is associated with higher le'els on the underlying trait$ This model has also

    been applied to determine the location of the neutral response in a =i"erttype scale in relation to

    the ordered responses$ nce the item order has been confirmed& the #raded IRT Model is often

    fit to the data$

    The artial Credit Model

    7or items with two or more ordered responses& Masters (13?:) created the partial credit

    model within the Rasch model framewor"& and thus the model shares the desirable characteristics

    of the Rasch family+ simple sum as a sufficient statistic for trait le'el measurement& and separate

    persons and item parameter estimation allowing specifically obKecti'e comparisons$ The partial

    credit model contains two sets of location parameters& one for persons and one for items& on an

    underlying unidimensional construct (Masters G !right& 1336)$

    The partial credit model is a simple adaptation of RaschFs model for dichotomies$ The

    model follows that from the intended order 0 S 1 S :& $ $ $& S m& of a set of categories& the

    conditional probability of scoring % rather than % 1 on an item should increase monotonically

    throughout the latent 'ariable range$ 7or the partial credit model& the e%pectation for person K

    scoring in category % o'er % 1 for item i is modeled+

    e%p( K O Y i% )

    1 e%p( K O Y i% )&

    where Y i% is an item parameter go'erning the probability of scoring % rather than % 1$ The Y i%

    parameter can be thought of as an item step difficulty associated with the location on the

    underlying trait where categories %1 and % intersect$ Rewriting the model& the response function

  • 8/9/2019 teoria masurarii

    25/67

    :4

    for the probability of person K scoring % on one of the possible outcomes 0& 1& :& $ $ $& m of item i

    can be written+

    Ti u i % K ( )\

    e%p \" 0 ( K O Y i" )%

    mK

    h 0

    he%p \" 0 ( K O Y i" ) & % 0& 1& $ $ $& mi&

    (Masters G !right& 1336)$ Thus& the probability of a respondent K endorsing category % for item

    i is a function of the difference between their le'el on the underlying trait and the step difficulty

    ( O Y ) $

    Thissen& Nelson et al$ (:001& p$ 14?) note the artial Credit Model to be a constrained

    'ersion of the Nominal Model in which& Lnot only are the aFs constrained to be linear functions of

    the category codes& all of those linear functions are constrained to ha'e the same slope Zfor all the

    items[L

    The #enerali>ed artial Credit Model is a generali>ation of the artial Credit Model that

    allows the discrimination parameter to 'ary among the items$

    The Rating .cale Model

    The Rating .cale Model (Andrich& 136?a& 136?b) is another member of the Rasch family

    because the model retains the elegant measurement property of simple score sufficiency$ The

    Rating .cale Model is deri'ed from the artial Credit Model with the same constraint of e*ual

    discrimination power across all items$ The Rating .cale Model differs from the artial Credit

    Model in that the distance between difficulty steps (or le'els) from category to category within

    each item is the same across all items$ The Rating .cale Model includes an additional parameter

    ^i& which locates where the item i is on the underlying construct being measured by the scale$

    The response function for the unconditional probability of person K scoring % on one of the

    possible outcomes 0& 1& :& $ $ $& m of item i can be written+

  • 8/9/2019 teoria masurarii

    26/67

    :2

    Ti u i % K ( )\

    e%p \" 0 ( K O (^i Y " ))%

    mK

    h 0

    he%p \" 0 ( K O (^i Y " ))

    & where \ ( O (^0" 0 K i

    Y " )) 0

    (8mbretson G Reise& :000)$ The constraint that a fi%ed set of rating points are used for the entire

    item set re*uires the item formats to be similar throughout the scale (e$g$& all items ha'e four

    response categories)$

    Trait .coring

    In classical test theory& scales are scored typically by summing the responses to the items$

    This summed score may then be linearly transformed to a scaled score estimate of a persons trait

    le'el$ 7or e%ample& if a respondent endorses :0 out of 20 items& he recei'es a score of 40E$ n

    the other hand& item response theory uses the properties of the items (i$e$& item discrimination&

    item difficulty) as well as "nowledge of how item properties influence beha'ior (i$e$& the item

    trace line) to estimate a persons trait score based on their responses to the items (8mbretson G

    Reise& :000)$

    IRT models are used to calculate a persons trait le'el by first estimating the li"elihood of

    the pattern of responses to the items& gi'en the le'el on the underlying trait being measured by

    the scale$ ,ecause the items are locally independent& the li"elihood function = is

    =nitems

    i 1

    _ T (u )i i

    which is simply the product of the indi'idual item trace lines Ti (u i )$ Ti (u i ) models the

    probability of the response u to the item i conditional on the underlying trait $ ften&

    information about the population is included in the estimation process along with the information

    of the item response patterns$ Therefore& the li"elihood function is a product of the IRT trace

  • 8/9/2019 teoria masurarii

    27/67

    :5

    lines for each indi'idual item i multiplied by the population distribution of the latent construct

    ` ( ) +

    =nitems

    i 1

    _ T (u ) ` ( ) $i i

    Ne%t& trait le'els are estimated typically by a ma%imum li"elihood methodH specifically& the

    persons trait le'el ma%imi>es the li"elihood function gi'en the item properties$ Thus& a

    respondents trait le'el is estimated by a process that 1) calculates the li"elihood of a response

    pattern across the continuous le'els of the underlying trait & and :) uses some search method to

    find the trait le'el at the ma%imum of the li"elihood (8mbretson G Reise& :000)$ ften times&

    this search method uses some form of the estimate of the mode (highest pea") or the a'erage of

    the li"elihood function$ These estimates can be linearly transformed to ha'e any mean and

    standard de'iation a researcher may desire$

    As an illustration of trait scoring& 7igure ? presents graphs of the population distribution

    of the underlying 'ariable& four items twoparameter logistic IRT trace lines& and the li"elihood

    function for a persons response pattern of endorsing (i$e$& true response) the first two items and

    not endorsing (i$e$& false response) the last two items of a four item scale$ In 7igure ?a& the

    population distribution of is assumed to be normally distributed& and the scale is set to a mean

    of >ero and 'ariance of one$ ther distributions can be used or no population information can be

    pro'ided in the estimation process$

  • 8/9/2019 teoria masurarii

    28/67

    :6

    These trace lines for nonendorsement are represented by a monotonically decreasing cur'e to

    reflect that respondents high on the latent trait are less li"ely to respond false to (or not endorse)

    the items$

    The li"elihood function (see 7igure ?f) describes the li"elihood of a respondents trait

    le'el gi'en the population distribution of and the persons response pattern of endorsing the

    first two items and not endorsing the last two items& as represented by the following e*uation+

    = ` ( ) W T1 (u1 true ) W T: (u : true )W T/ (u / false ) W T4 (u 4 false )

    The ma%imum of the li"elihood (i$e$& the respondents trait le'el) for this response pattern is

    estimated to be 0$14$ The a'erage of the distribution can also be used as an estimate of the

    respondents trait le'el& 0$1:$

    As another e%ample& 7igure 3 presents graphs of the population distribution& four item

    trace lines& and li"elihood function for a respondent who endorses the first& second& and fourth

    item& but responds negati'ely to the third item$ The ma%imumli"elihood estimate for this

    response pattern is 0$6:$ As e%pected& this response pattern yields a higher estimated score

    because the respondent endorsed one additional item than the prior response pattern$ The

    li"elihood function for the second response pattern is smaller (has less area under the cur'e) than

    the li"elihood function for the first response pattern$ The decreased area reflects an

    inconsistency of item responses in the second pattern$ The items are ordered by thresholds (i$e$&

    item 1 is the least difficult and item 4 is the most difficult item to endorseH see b parameter

    estimates)& therefore& one would e%pect that if a person endorses item four& then they should also

    endorse the first three items with lower difficulties$ As an e%ample& if these four items represent

    physical tas"s of harder comple%ity from wal"ing ten steps (item 1)& wal"ing to your mailbo%

    (item :)& wal"ing a bloc" (item /)& to wal"ing a mile (item 4)& then you would e%pect someone

  • 8/9/2019 teoria masurarii

    29/67

    :?

    7igure ?

    a) opulation -istribution

    (NormalH mean 0H 'ariance 1)

    / : 1 0

    1 : /

    b) Item 1 (a :$//& b 0$14) true response

    1

    c) Item : (a :$02& b 0$0:) true response

    1$0

    0$2 0$2

    0

    / : 1 0

    1 : /

    0$0

    / : 1 0

    1 : /

    d) Item / (a /$46& b 0$:5) false response

    1$0

    e) Item 4 (a :$41& b 1$:6) false response

    1$0

    0$2 0$2

    0$0

    / : 1 0

    1 : /

    0$0

    / : 1 0

    1 : /

    f) =i"elihood function (mode 0$14& a'erage 0$1:)

    / : 1 0

    1 : /

  • 8/9/2019 teoria masurarii

    30/67

    :3

    7igure 3

    a) opulation -istribution

    (NormalH mean 0H 'ariance 1)

    / : 1 0

    1 : /

    b) Item 1 (a :$//& b 0$14) true response

    1

    c) Item : (a :$02& b 0$0:) true response

    1$0

    0$2 0$2

    0

    / : 1 0

    1 : /

    0$0

    / : 1 0

    1 : /

    d) Item / (a /$46& b 0$:5) false response

    1$0

    e) Item 4 (a :$41& b 1$:6) true response

    1$0

    0$2 0$2

    0$0

    / : 1 0

    1 : /

    0$0

    / : 1 0

    1 : /

    f) =i"elihood function (mode 0$6:& a'erage 0$?0)

    / : 1 0

    1 : /

  • 8/9/2019 teoria masurarii

    31/67

    /0

    who endorsed the item wal"ing a mile to also endorse the item wal" a bloc"$

    The oneparameter logistic model or Rasch model constrains the discrimination

    parameter& a& to be e*ual across the four items in the e%ample$ Therefore any response pattern

    with the same number of endorsed items will ha'e the same estimated trait le'el$ nowledge of

    an indi'iduals particular response pattern is not needed and& by the same to"en& information

    about that indi'iduals trait le'el that might be deri'ed from the response pattern is ignored$

    Thus& the total score is a sufficient statistic for estimating trait le'els$ Table : lists all possible

    response patterns for the four binary items (:4 15 patterns)& the summed score of the response

    pattern& and the associated ma%imumli"elihood estimates of the trait le'el calculated by using

    either the twoparameter logistic (:=) IRT model or the Rasch model$ The table shows that

    response patterns with two item endorsements (5 to 11) are estimated by the Rasch model to

    ha'e the same trait le'el ( 0$::)$ 7or these types of item response patterns with e*ual

    number of total responses& the := IRT model will gi'e higher estimates of the trait le'el for

    patterns with higher item thresholds (difficulties)& and li"ewise& lower estimated trait le'els for

    patterns with lower item thresholds$ Thus& the := IRT model will estimate a higher latent trait

    score for a person who gets two harder items correct than a person who endorses two easier

    items$ Those who e%clusi'ely use the Rasch model 'iew this property of the := IRT model as

    a wea"ness$ It is inconsistent for a person to endorse a harder item and not an easier item$

    The earson correlation among the summed score& the Rasch model score& and the :=

    IRT model score& for the abo'e e%ample& are abo'e $36$ -espite the presented data is only an

    e%ample& adding more items still yield correlations among the scores abo'e $3$ .o the *uestion is

    often as"ed& !hy not use the summed score& it is easier to compute .ummed scores are on an

    ordinal scale that assumes the distance between any consecuti'e scores is e*ual$ IRT model

  • 8/9/2019 teoria masurarii

    32/67

    /1

    Table :

    1:/4

    256?

    310111:1/141215

    Item Response attern

    0 not endorse& 1 endorse 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0

    0 0 0 1 1 1 0 0 1 0 1 0

    0 1 1 0 1 0 0 1

    0 1 0 1 0 0 1 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 1 1 1 1 1

    .ummed

    .core 0 1 1 1

    1 : :

    : :

    : : / / / / 4

    : = IRT Model

    Ma%imum =i"elihood 8stimate 0$?: 0$:6 0$:1 0$13

    0$01 0$14 0$12

    0$13 0$/1

    0$/5 0$/6 0$2: 0$6: 0$64 0$?0 1$/2

    1 = IRT V Rasch Model

    Ma%imum =i"elihood 8stimate 0$?4 0$:: 0$:: 0$::

    0$:: 0$:: 0$::

    0$:: 0$::

    0$:: 0$:: 0$61 0$61 0$61 0$61 1$/5

    Note+ The four items are ordered by difficulty with the last item ha'ing the highest threshold$

    scores are on an inter'al scale and distance between scores 'ary depending on the difficulty (and

    sometimes discriminating power) of the *uestion$ 7or e%ample& the difference in physical ability

    to endorse two items that as" if one can wal" 10 feet and :0 feet is certainly different than the

    ability to endorse two items that as" if one can wal" 100 feet and two miles$ Ability scores

    should be close together for the first two items (wal"ing 10 feet and :0 feet)& and scores should

    be farther apart for answering the last two items (wal"ing 100 feet and : miles)$

    Classical Test Theory and Item Response Theory

    The past and most of the present research in health measurement has been grounded in

    classical test theory (CTT) modelsH howe'er wor"s by 8mbretson and Reise (:000) and Reise

    (1333) point out se'eral ad'antages to mo'ing to IRT modeling$ Table / pro'ides se'eral "ey

    differences between CTT and IRT models$

    recision of measurement statistics such as standard error of measurement (.8M) and

    reliability indicate how well an instrument measures a single construct$ The .8M describes an

    e%pected score fluctuation due to error in the measurement tool (8mbretson and Reise& :000)$

  • 8/9/2019 teoria masurarii

    33/67

    /:

    Table /

    Classical Test TheoryMeasures of precision fi%ed for all scores=onger scales increase reliabilityTest properties are sample dependentMi%ed item formats leads to unbalanced impact

    on total test scoresComparing respondents re*uires parallel scales

    .ummed scores are on ordinal scale

    Item Response Theoryrecision measures 'ary across scores.horter& targeted scales can be e*ually reliableTest properties are sample free8asily handles mi%ed item formats$

    -ifferent scales can be placed on a commonmetric.cores on inter'al scale#raphical tools for item and scale analysis

    Reliability is the fraction of obser'ed score 'ariance that is true score 'ariance& or the proportion

    that is not error 'ariance (!ainer G Thissen& in press)$ In CTT& both .8M and reliability (such

    as Cronbachs & internal consistency) measures are fi%ed for all scale scores$ In other words&

    CTT models assume that measurement error is distributed normally and e*ually for all score

    le'els (8mbretson G Reise& :000)$ In IRT& measures of precision are estimated separately for

    each score le'el or response pattern& controlling for the characteristics (e$g$& difficulty) of the

    items in the scale$ recision of measurement is best (low .8M& high reliabilityVinformation)

    typically in the middle of the scale range (or trait continuum)& and precision is least at the low

    and high ends of the continuum where items do not discriminate well among respondents$

    In CTT& scale reliability is a function of the number of items in the scale$

  • 8/9/2019 teoria masurarii

    34/67

  • 8/9/2019 teoria masurarii

    35/67

    /4

    scores on a gi'en construct for respondents from that population who did not answer the same

    *uestions& without intermediate e*uating steps (rlando& .herbourne& G Thissen& :000)$

    A wonderful feature of IRT models are the graphical descriptions of item functioning and

    test functioning$ IRT models allow you to graph the probability of a person endorsing an item

    gi'en the persons le'el on the underlying trait (see trace lines in 7igure 1 for binary data and

    7igure 6 for multipleresponse data)& the reliability or precision of the scale at all le'els of the

    underlying construct (see information cur'es in 7igures : and /)& response pattern scoring (see

    7igures ? and 3)& and group differences in responding to items (see 7igure 12 and 15)$ These

    graphical tools are 'aluable for decision ma"ing& presentation of results& and allowing

    researchers unfamiliar with IRT to better understand its methodology$

    Applications of Item Response Theory in

  • 8/9/2019 teoria masurarii

    36/67

    /2

    Information about the items location on the underlying trait allows researchers the

    opportunity to identify trait le'els not measured or o'erly measured by the instrument$ After

    estimating item properties& there are many graphical resources to 'iew the items location along

    the underlying trait$ ne common and interpretable graphical tool to model the item locations

    for the Rasch model is an item mapJ$ The Rasch or 1= IRT model constrains the items

    discrimination power (item slope) to be e*ui'alent& and estimates the location of the items on the

    underlying trait $ 7igure 10a presents an item map for twel'e items numbered 1 1: abo'e the

    hori>ontal a%is line$ 8%cept for item 5& the items are located at the lower end of the latent scale&

    meaning that the items are measuring low le'els of the trait being measured by the instrument$

    Items 1& /& 2& and ? measure people at similar trait le'els& suggesting that an instrument

    de'eloper may wish to remo'e one or two of the items because they pro'ide redundant

    information$ 7igure 10b displays the item characteristic cur'es (trace lines) for the same set of

    twel'e items$ The Rasch models constraint of e*ual item discriminations (i$e$& e*ual slopes) is

    easily obser'ed in this plot of parallel cur'es$ Items 1& /& 2& and ? are bunched together around

    theta le'els of 0$46$ 7or an o'erall 'iew of the functioning of the twel'e items in the scale& the

    test information function (see 7igure 10c) displays the precision of measurement of respondents

    at different le'els of the underlying latent construct $ A scale de'eloper can easily obser'e that

    the instrument lac"s information for measuring respondents with high le'els of the trait $

    The twoparameter logistic IRT model estimates an items location along the continuous

    scale& but also estimates the discrimination power or an items relationship with the underlying

    'ariable$ 7igure 11 presents item characteristic cur'es for twel'e items with the same location

    (threshold) parameters as items in 7igure 10& but with 'arying le'els of discriminati'e power$

  • 8/9/2019 teoria masurarii

    37/67

    /5

    a$

    11?/ 6 4:1:3 1 2 105

    /$0 :$2 :$0 1$2 1$0 0$2 0$0

    0$2 1$0 1$2 :$0 :$2 /$0

    b$

    1$00 10

    3

    ?

    c$

    Test

    Information

    0$626

    5

    2

    4

    /

    :

    1

    0

    (P)0$20

    0$:2

    0$00

    / : 1 0

    1 : // : 1 0 1 : /

    7igure 10 a$ Item map for twel'e items along the latent trait $ b$ Item characteristic cur'es forthe same twel'e items$ c$ Test information cur'e for the twel'eitem scale$

    7or the redundant item set ( 1& /& 2& and ?)& items 2 and ?& identified by dotted lines& pro'ide

    less precision in measurement of than items 1 and /& and thus& may be remo'ed from the

    scale$

    1$00

    0$62

    (P)0$20

    0$:2

    0$00

    / : 1 0

    1 : /

    7igure 11 := IRT model trace lines for twel'e items$

  • 8/9/2019 teoria masurarii

    38/67

    /6

    lacement of items within a scale may differ depending on the purpose of the instrument$

    A diagnostic instrument& with the intent to identify respondents high on a trait for purposes of

    treatment or to di'ide respondents into one of se'eral groups along a continuous measure& will

    re*uire items with location parameter 'alues at the cutoff points on the scale$ Thus& accuracy in

    person measurement is most precise at the category thresholds$

  • 8/9/2019 teoria masurarii

    39/67

    /?

    item local independence will re'eal redundant items$ Redundant items essentially as" the same

    *uestion of the respondent$ .uch locally dependent items typically ha'e similarlyworded

    formats or sentence stems$

    Together& the applications of both classical test theory and item response theory pro'ide

    instrument de'elopers the tools to understand how each of the items function within a scale$ IRT

    can increase scale precision with fewer items by the selection of items that are highly related to

    the underlying dimensions (.teinberg G Thissen& 1332)$ Also& de'elopers can create shorter

    scales by choosing *uestions (items) that target the population of interest to measure$ .horter

    instruments with high reliability are attracti'e for both instrument responders and graders$

    -ifferential Item 7unctioning

    -ifferential item functioning (-I7) is a condition when an item functions differently for

    respondents from one group to another$ In other words& respondents& with similar le'els on a

    latent trait (e$g$& physical functioning) but who belong to different populations& ha'e a different

    probability of responding to an item$ -I7 items are a serious threat to the 'alidity of the

    instruments to measure the trait le'els of members from different populations or groups$

    Instruments containing such items may ha'e reduced 'alidity for betweengroup comparisons&

    because their scores may be indicati'e of a 'ariety of attributes other than those the scale is

    intended to measure (Thissen& .teinberg& G !ainer& 13??)$

    In the past& differential item functioning has been called Litem biasL in the literature

    because such an item biases one group to ha'e a higher scale score than another group$ 7or

    e%ample& the item I cry easilyJ on the Minnesotta Multiphasic ersonality In'entory (MMI:H

    ,utcher& -ahlstrom& #raham& Tellegen& G aemmer& 13?3) unfa'orably biases females to ha'e

    higher depression scores than males who as a group are culturally condoned not to display such

  • 8/9/2019 teoria masurarii

    40/67

    /3

    emotions$ Item bias indicates that groups are treated unfairlyH howe'er& Angoff (133/) cautions

    that differential item functioning can occur without Kudgements of unfairness to one group or

    another$ 7or e%ample& when two different national groups were compared on a common measure

    as part of a research effort to study linguistic differences (e$g$& Alderman G

  • 8/9/2019 teoria masurarii

    41/67

    40

    and focal group ha'e different item endorsement rates after controlling for group differences on

    the latent 'ariable being measured by the instrument$ 7igure 1: presents two item trace cur'es

    estimated separately for each group& with the focal group ha'ing the higher threshold le'el$ This

    -I7 e%ample shows that& o'er all the le'els on the underlying 'ariable & the reference group has

    a higher probability (P) of endorsing the item than the focal group$ That is& the item is easier

    for the reference group to answer than the focal group$

    1$00

    7o cal #ro up

    Reference #ro up

    0$62

    (P)0$20

    0$:2

    0$00

    / : 1 0

    1 : /

    7igure 1:$ neparameter logistic (or Rasch) model trace linesfor reference (b 0) and focal groups (b 1)$ At e*ual le'elsof the underlying 'ariable& respondents from the reference groupare estimated to ha'e a higher probability of endorsing the itemthan respondents from the focal group$

    7or binary data& the twoparameter logistic IRT model allows for the in'estigation of -I7

    between groups in the itemFs threshold parameter b& slope parameter a& or both parameters$ -I7

    in the slope parameter represents an interaction between the underlying measured 'ariable and

    group membership (Teresi& leinman& G cepe"!eli"son& :000)$ The degree to which an item

    relates to the underlying construct depends on the group being measured$ 7igure 1/ presents the

    reference and focal groupsF twoparameter logistic IRT trace lines for the same item$ -I7

    analyses suggest that the item is more endorsed (or answered correctly) by the reference group

  • 8/9/2019 teoria masurarii

    42/67

    41

    (i$e$& lower b parameter) than the focal group& and the item is more discriminating among

    respondents in the reference group (i$e$& higher a parameter) than the focal group$

    1$00

    7o cal #ro upReference #ro up

    0$62

    (P)0$20

    0$:2

    0$00

    / : 1 0 1 : /

    7igure 1/$ Twoparameter logistic IRT model trace lines forthe focal group (a 1$2& b 0) and reference group(a 1$0& b 1$0)$

    =i"e the twoparameter IRT model& the threeparameter logistic IRT model allows for the

    in'estigation of -I7 in the discrimination parameter and threshold parameter& but also e'aluates

    -I7 in the pseudoguessing parameter$ 7igure 14 presents threeparameter logistic IRT cur'es

    for the reference and focal groups$ -I7 in the pseudoguessing parameter indicates that groups

    1$00

    7ocal #roup

    0$62

    Reference#roup

    (P)0$20

    0$:2

    0$00

    / : 1 0 1 : /

    7igure 14$ Threeparameter logistic IRT cur'es for the referencegroup (a 1$2& b 0& c $:0) and focal group (a 1$0& b 1$0&c $:2)$

  • 8/9/2019 teoria masurarii

    43/67

    4:

    are differently affected by some confounding 'ariable that e%plains why respondents low on the

    underlying 'ariable measured by the instrument may endorse the item$

    Rasch modelers in'estigate differential item functioning only in the threshold (location)

    parameter$ This methodology has strict re*uirements to maintain the elegance of the Rasch

    model (e$g$& sum score sufficiency)$ Any item that differs in its ability to discriminate among

    respondents compared to other items in a measure is considered a misfitting item to the Rasch

    model (.mith& 1331)$ Thus& if an item has different estimated slopes (i$e$& discrimination ability)

    between the reference and focal groups& the item is not considered biased& but is considered

    misfit and is remo'ed from the measure$

    As an illustration of the application of -I7 analysis in the framewor" of Rasch

    measurement& Tennant (:000& une) in'estigated differences in responding patterns between

    respondents from the Bnited ingdom and Italy on items in the *uality of life measurement scale

    =M.Do=$ .tudy results found significant differential item ordering (i$e$& different threshold

    parameter estimates) between the two groups of respondents$ 7igure 12 presents the =M.Do=

    scale items ordered by their estimated threshold parameters for respondents from Italy and the

    B$ Item ordering represents differences in groupsF attitudes towards *uality of life for their

    culture$ TennantFs (:000& une) results suggest a strong need to in'estigate -I7 for all health

    measures to create a core set of outcome measures that can be used across different cultures$

    Many researchers (e$g$& Angoff& 133/H Camilli G .hepard& 1334) belie'e in'estigation of

    -I7 in the framewor" of Rasch measurement is limited$ Nonconsideration of the possible

    differences in respect to discrimination power or differences in respect to pseudoguessing will

    result in undetected -I7 items and will lead to the ultimate remo'al of the most useful items in a

    measure (Angoff& 133/)$ Therefore& applications of the Rasch models limit researchersF

  • 8/9/2019 teoria masurarii

    44/67

    4/

    understanding of group differences in responding to items in a measure$ IRT models that allow

    the le'el of discrimination to 'ary from item to item describe the data more accurately than

    models that constrain the slope parameter to be e*ual across items$

    1$0

    Italy

    0$2

    I ha'e felt tired

    B

    I ha'e felt happy about the future

    I ha'e felt good about my appearanceI ha'e felt happy about the futureI ha'e had as much energy as usualI ha'e felt tired

  • 8/9/2019 teoria masurarii

    45/67

    44

    differently percei'ed$ It may be that the nonclinical respondents interpret the item to mean

    LwellL in the physical sense& where as the clinical respondents may interpret the item to

    1$00

    0$62

    Clinical #ro upNo nClinical #ro up

    (P)0$20

    0$:2

    0$00

    / : 1

    0 1 : /

    7igure 15$ Twoparameter logistic trace lines for the item-uring the past few years& I ha'e been well most of the timemodeled for a clinical (a 1$?:& b 1$2?) and nonclinical(a :$31& b 1$2?) group of respondents to the MMI:$

    mean LwellL in the mental sense$ ,ecause the threshold parameter of this item is found to be

    in'ariant across the two study groups& applications of the Rasch model would not identify -I7

    and therefore would li"ely not disco'er the discrepancy between the groupsF understanding of the

    item content$ The Rasch model may find the item to misfit in one group and not the other&

    possibly leading to further in'estigation$

    As a goal for any researcher to create a core set of endpoint measures that could be used

    in a 'ariety of trialbased and obser'ational studies and that is in'ariant to subgroups& it is

    necessary to in'estigate the psychometric properties of the items within the scales$ 8ssential to

    this in'estigation is the analysis of differential item functioning to obser'e if a group such as

    cancer patients interprets or treats the item content of a measure in the same manner as a control

    group$ -I7 analysis based on the IRT model can not only pro'ide identification and remo'al of

    1The clinical sample is a group of :&56: females from both inpatient and outpatient mental ser'ices collected across

  • 8/9/2019 teoria masurarii

    46/67

    42

    biased items& but also pro'ide aid to interpreting psychological or physiological processes

    underlying group differences in responding to particular items$

    Instrument 8*uating and Computeri>ed Adapti'e Tests

    !ith the large number of health care measures for an e'en larger set of outcome

    'ariables& there is a great need to ha'e some standardi>ation of the concepts and metrics to allow

    comparisons of instruments and respondents administered those instruments (!are& ,Korner& G

    osins"i& :000)$ The goal to either de'elop or identify a core set of health care measures (or

    outcome domains) can be accomplished by lin"ing scales together or creating an item pool to

    place all items of a similar construct on a common metric$ The applications of IRT are ideal for

    such a methodology because of the property of item in'ariance$ Items calibrated by the

    appropriate IRT model(s) are lin"ed together on a common metric allowing the creation of

    *uestionnaires that may use a different set of items depending on the target audience of

    responders$ In other words& two responders administered two different assessments (i$e$& sets of

    *uestions) can ha'e scores comparable on a similar metric$

    The goal of instrument lin"ing or e*uating is not to change the content& but to adKust for

    the differences in difficulties of the measures (Mc

  • 8/9/2019 teoria masurarii

    47/67

    45

    scaled$ Therefore& instruments with a different number or difficulty of items can be lin"ed by

    responses to a common set of anchor items$

    As an e%ample& Mc

  • 8/9/2019 teoria masurarii

    48/67

    46

    some precision in measurement is met$ The benefits of a CAT methodology to measurement is

    the reduction in respondent burden to answering many items without a loss in reliability or

    precision in measurement$

    !are& ,Korner& and osins"i (:000) de'eloped an online CAT for measuring the impact

    of headaches$ !ith an item ban" of 2/ items that co'er the range of headache suffering ()&

    study findings indicate as many as 3? percent of respondents in a controlled simulation test and

    60 percent of respondents in an internet pilot test re*uire fi'e or fewer items to obtain an

    accurate measure of their suffering$ Qalidation tests of the CAT instrument found high

    correlations with three other headache instruments and a moderate correlation with one other

    instrument$

    Conclusion

    There is a great need in the health care industry to ha'e tools to monitor population health

    on a large scale as well as more precise tools to identify those who need and are most li"ely to

    benefit from treatment (!are& ,Korner& G osins"i& :000)$ The applications of item response

    theory modeling can help to create these tools$ Item and scale analysis within the framewor" of

    IRT will ensure reliable& 'alid& and accurate measurement of respondent trait le'els$

    Identification of items that are informati'e or problematic help in'estigators to understand the

    domains they are measuring as well as the populations they measure$

    7urthermore& there is a need in the health care industry to standardi>e the concepts and

    metrics of healthcare measurement to allow comparisons of results across assessment tools and

    across di'erse populations (!are& ,Korner& G osins"i& :000)$ In a reference to measures of

    physical functioning but that applies to all health outcome instruments& Mc

  • 8/9/2019 teoria masurarii

    49/67

    4?

    measurement& item difficulty& response format& scaling method& and psychometric rigor$ ,ecause

    of the limitations with classical test theory in regards to sample and test dependencies& these

    'ariations ma"e it nearly impossible to compare respondent scores across different measures$

    ed adapti'e tests that reduce

    respondent burden and increases reliable measurement by using a methodology that targets in on

    a respondents true score$

    .o& why are the methodologies of item response theory slow to be adopted into the health

    care measurement field Item response theory was de'eloped within the framewor" of

    educational testing and so most of the literature and terminology is oriented towards that

    discipline (ed& it is unclear if a latent trait such as *uality of life or patient

    satisfaction can be modeled in the same manner$ Another limitation of the modern measurement

    theory is the comple%ities of the mathematical IRT models$ Most researchers ha'e been trained

    in classical test theory and are comfortable with reporting statistics such as summed scale scores&

    proportions correct& and Cronbachs alpha$ ,eyond the mathematical formulas& there are the

    comple%ities of the numerous IRT models themsel'es as to what circumstances are appropriate

  • 8/9/2019 teoria masurarii

    50/67

    43

    to use IRT and which model to use$ There is not e'en a consensus among psychometricians as to

    the definition of measurement and which IRT models fit that definition$ Adding to the burden of

    confusion& the numerous a'ailable IRT software in the mar"et are not userfriendly and often

    yield different results (parameter and trait estimates) because of the different estimation

    processes used by the software$

    -espite these limitations& the practical applications of IRT cannot be ignored$

    nowledge of IRT is spreading as more and more classes are being taught within the uni'ersity

    disciplines of psychology& education& and public health& and at seminars and conferences

    throughout the world$ Along with this& more boo"s and tutorials are being written on the subKect

    as well as more userfriendly software is being de'eloped$ Research applying IRT models are

    appearing more fre*uently in health care Kournals& and much of their concluding comments is

    directed towards discussing the benefits and limitations of using the methodology in this field$

    Together& a better understanding of the models and applications of IRT will emerge and IRT will

    be as commonly used as the methodology of classical test theory$ This effort will result in

    instruments that are shorter& reliable& and targeted towards the population of interest$

    ne further note is that item response theory is only one step towards the goal of the

    creation of reliable and 'alid health care measures$

  • 8/9/2019 teoria masurarii

    51/67

    20

    domains of content& drafting items to measure the constructs& field testing& test norming& and

    conducting reliability and 'alidity studies;If these steps are not handled well& bad

    measurements will follow$J

  • 8/9/2019 teoria masurarii

    52/67

    21

    References

    Alderman& -$ =$& G

  • 8/9/2019 teoria masurarii

    53/67

    2:

    7ischer& #$ (135?)$ sychologische test theorie Zsychological test theory[$ ,ern+

  • 8/9/2019 teoria masurarii

    54/67

  • 8/9/2019 teoria masurarii

    55/67

    24

    Teresi& $ A$& leinman& M$& G cepe"!eli"son& $ (:000)$ Modern psychometric methods for detection of differential item functioning+ application to cogniti'e assessment measures$ .tatistics in Medicine& 13& 152115?/$

    Thissen& -$ (1331)$ Multilog BserFs #uide$ Multiple& categorical item analysis and test scoring using Item Response Theory& Qersion 5$/$ Chicago& I=+ .cientific .oftware International$

    Thissen& -$& Nelson& =$& ,illeaud& $ G Mc=eod& =$ (:001)$ Chapter 4Item response theory for items scored in more than two categories$ In -$ Thissen G

  • 8/9/2019 teoria masurarii

    56/67

    22

    Illustration of IRT Modeling

    .cale

    The ten items listed in Table A1 ma"e up the

  • 8/9/2019 teoria masurarii

    57/67

    25

    IRT analyses& ten percent random subsamples of :&253 females and :&53? males were drawn

    from the full sample to reduce computational burden$ This illustration will focus on responses

    from the clinical female sample& but will also discuss results from the analyses on the clinical

    male sample responses in the in'estigation of differential item functioning$

    Classical Test Theory Analysis of -ata

    The items in the data set were formatted so that a score of 0 indicates the respondent

    answered the item in a manner consistent with lower le'els of brooding and a score of 1 indicates

    the respondent answered the item in a manner consistent with higher le'els of brooding$ The

    a'erage summed score for the females in the clinical sample is /$34 and a standard de'iation of

    :$34$ The histogram in 7igure A1 displays the fre*uency of all summed scores among the

    7igure A1

  • 8/9/2019 teoria masurarii

    58/67

    26

    Table A: lists descripti'e statistics of the indi'idual items in the brooding subscale$ The

    column titled a'erage (when item endorsed)J shows the a'erage summed score for those

    respondents who endorsed the item in a direction consistent with higher le'els of brooding$

    =arger differences between this score and the scale a'erage of /$34 indicate the respondent is

    li"ely to endorse more of the other items in the scaleH thus& the respondent reports more

    broodingJ beha'iors when they endorse the item$ The proportion of the :&253 females in the

    Table A: Classical Test Theory Item Analyses ten itemA'erageproportion sum scale (when itemendorsed ItemA'erageendorsed) (n :253)1 eriods when I couldnt get goingJ/$342$3:0$201

    : I wish I could be as happy as others/$342$?60$240/ I dont seem to care what happens to me/$346$?20$1214 Criticism or scolding hurts me terribly/$342$/40$22:2 I certainly feel useless at times/$345$0/0$2045 I cry easily/$342$410$2456 I am afraid of losing my mind/$345$660$:35? I brood a great deal/$346$1/0$:4?3 I usually feel that life is worthwhile/$346$410$1?510 I am happy most of the time/$345$250$41/

    pointbiserial correlation(itemscale) 0$562

    0$61/ 0$25/ 0$2/0 0$6:0 0$24? 0$5:5 0$5:4 0$252 0$64?

    clinical sample who endorsed the item in a direction consistent with higher le'els of brooding is

    pro'ided in the column mar"ed proportion endorsedJ$ The proportion of respondents endorsing

    an item is an indication of item difficulty or the amount of the trait measured by the test that

    must be needed to endorse the item$ In the last column& the pointbiserial correlation is a

    measure of the relationship between the item endorsement and scale score$ Items 4 (Criticism or

    scolding hurts me terribly) and 5 (I cry easily) are least related to the total scale score and items

    2 (I certainly feel useless at times) and 10 (I am happy most of the time) ha'e the highest

    relationship with the total scale score$

    Modern Measurement Theory

  • 8/9/2019 teoria masurarii

    59/67

    2?

    ,efore an IRT model can be fit to the data& the assumptions of unidimensionality and

    local independence must be assessed$ Analysis of the factor structure within a scale is often

    carried out by researchers using the classical test theory methods$

  • 8/9/2019 teoria masurarii

    60/67

    23

    what happens to me) and 3 (I usually feel that life is worthwhile) to be locally dependent$

    .olutions to remo'e local dependence in the subscale will be re'iewed later in the discussion of

    recommended scale modifications$

    Choosing the Appropriate Item Response Theory Model

    ,ecause of the binary (dichotomous) response format of the items& either the one

    parameter logistic IRT (Rasch) model& the twoparameter logistic (:=) IRT model& or the three

    parameter logistic (/=) IRT model may be appropriate for the data$ #oodnessoffit statistics

    may be used to test for the amount of impro'ement in model fit to the data$ The simpler Rasch

    model is nested within the := model& which is nested within the more comple% /= IRT model$

    The degrees of freedom for the test of the difference between the goodnessoffit statistics

    between nested models is the difference in additional parameters needed to be estimated for the

    more comple% model$ 7or e%ample& the test of difference in model fit between the := and

    Rasch model has nine degrees of freedom representing the remo'al of the constraint of e*ual

    discrimination across the ten items by the := IRT model$ The difference in fit statistics

    suggests that the := IRT model pro'ide a significantly impro'ed fit (# :(3) /64$?& p S $01) to

    the data than the Rasch model$ 8'idence for allowing the discrimination power to 'ary among

    scale items in the := model may also be obser'ed in the 'ariation of item factor loadings from

    the factor analysis and the 'ariation of itemscale correlations (pointbiserial correlations)

    pro'ided in Table A:$ Tests for the impro'ement in model fit for the /= model did not find

    any additional e%planatory power for using the more comple% model$ Also& some may argue

    theoretically that the /= models pseudoguessing parameter may not pro'ide any insight into

    the respondents beha'ior in health care research$

  • 8/9/2019 teoria masurarii

    61/67

    50

    Item and .cale Analysis using Item Response Theory

    Table A/ displays the twoparameter logistic IRT model estimated slope and threshold

    parameters for the ten items in the brooding subscale$ The high slope parameters (a) indicate the

    items are highly related to the latent trait measured by the scale$ 7or e%ample& I brood a great

    deal along with I am happy most of the time and I certainly feel useless at times all ha'e a high

    Table A/ IRT 8stimated arameters for the ,rooding .ubscale clinical females (n :253) itemab 1eriods when I couldnt get goingJ1$320$0: :I wish I could be as happy as others:$450$12 /I dont seem to care what happens to me:$:01$//

    4Criticism or scolding hurts me terribly1$0/0$:5 2I certainly feel useless at times:$4:0$0/ 5I cry easily1$110$:/ 6I am afraid of losing my mind1$610$62 ?I brood a great deal1$?40$3/ 3I usually feel that life is worthwhile1$?41$:4 10I am happy most of the time:$?/0$:2Note+ Multilog was used to calculate the twoparameter logistic model$

    loading (a parameter)& as would be e%pected for a scale measuring brooding$ The location of the

    items across the underlying trait are scattered from 0$:5 to 1$//$ The item with the

    lowest threshold is I cry easily& suggesting that it does not ta"e high le'els of the brooding trait

    for a clinical female to endorse this item$ n the other hand& it ta"es high le'els of brooding for

    a person to respond falseJ to the item I usually feel that life is worthwhile$

    7igure A/ displays each of the items trace line (top graph) and information cur'e

    (bottom graph) for the clinical female sample$ 7rom the item trace lines& one can determine how

    discriminating each item is relati'e to each other and o'er what range of depressi'e se'erity

  • 8/9/2019 teoria masurarii

    62/67

    51

    7igure A/ Item Characteristic Cur'es and Information Cur'es for 8ach of the Items in the

  • 8/9/2019 teoria masurarii

    63/67

    5:

    7igure A4 Test Characteristic Cur'e& Test Information Cur'e& and .tandard 8rror ofMeasurement Cur'e for the

  • 8/9/2019 teoria masurarii

    64/67

    5/

    items best discriminate among indi'idual trait le'els$ The two items mar"ed by dash lines (I cry

    easily and Criticism or scolding hurts me terribly) are the least related to the underlying trait in

    relation to the other items in the scale$ The item information cur'es are an inde% indicating the

    range of the brooding trait o'er which an item is most useful for distinguishing among

    indi'iduals$ Information cur'es with high pea"s denote items with high discrimination& thus

    pro'iding more information o'er the trait le'els around the itemFs estimated threshold$ The two

    items mar"ed by dashed lines (I cry easily and Criticism or scolding hurts me terribly) pro'ide

    little precision to estimating trait le'els around the items thresholds$

    7igure A4 displays the brooding subscaleFs test characteristic cur'e (top)& test

    information cur'e (middle)& and test conditional standard error of measurement cur'e (bottom)

    for the clinical female group$ The test characteristic cur'e describes the e%pected number of

    items that will be endorsed in a direction consistent with brooding& conditional on the le'el on

    the underlying trait$ The test information cur'e shows what le'el of the underlying 'ariable the

    scale is most accurate for measuring le'els of brooding$ An in'erse s*uareroot transformation

    of the test information cur'e yields the test standard error of measurement cur'e& which

    describes the precision in measurement of trait le'els across theta$ Together& these diagrams

    suggest that the subscale measures respondents from middle to high le'els (1 S S :) of the

    latent brooding trait$ No items in the subscale appear to measure low le'els of the brooding trait$

    This finding is e%pected gi'en that the MMI:& a diagnostic instrument& is designed to identify

    indi'iduals scoring high on depressi'e traits$

    Analysis of -ifferential Item 7unctioning

    -ifferential Item 7unctioning (-I7) is a condition when an item functions differently for

    respondents from one group to another& after controlling for differences between the groups on

  • 8/9/2019 teoria masurarii

    65/67

    54

    the underlying trait$ To illustrate the analysis of -I7 within the framewor" of IRT& responses

    from the clinical female sample as well as responses from a clinical male sample will be

    considered$ This analysis will focus on how items within the brooding subscale may function

    differently between females and male respondents in a clinical population$

    The clinical male sample responses were analy>ed in a similar manner to the

    methodology used abo'e for the clinical female respondents$ The scale appears to be

    unidimensional with some possible problems with item local independence$ The := IRT model

    was fit to the clinical male data and item functioning appears to be similar to the findings from

    the clinical females$

    nce parameters ha'e been estimated separately for each group& the ne%t step is to

    identify items that may function differently between the groups and a set of items that can be

    used as anchorsJ for testing -I7$ Item anchors are items thought to function similarly between

    the groups and will be used to lin" the two groups together on a similar metric$ To identify item

    candidates for -I7 and anchors for lin"ing the two populations& item parameters for the entire

    scale are plotted on a bi'ariate graph for a pairwise comparison among groups (Angoff& 13?:)$

    This step is performed separately for each estimated parameter (e$g$& item b parameters estimated

    from clinical females 'ersus estimated from clinical males& see 7igure A2)$ Bnbiased items will

    present a linear relationship in the scatterplot& while outlying or di'ergent items may suggest

    differential item functioning between the groups (Mac"innon& orm& Christensen& .cott&

    ontally

    depending on which group is higher on theta (Angoff& 13?:)$ In the left diagram of 7igure A2&

  • 8/9/2019 teoria masurarii

    66/67

    52

    the item I cry easily appears to ha'e a higher threshold for clinical males than females$ In the

    right diagram& discrimination power for this item appears to be e*ual across the two groups

    7igure A2 ,i'ariate plots of item parameters between female and male groups

    b param eters

    /$2

    1$5

    Clinical Males1$:

    0$?

    0$4

    0$0

    0$4

    0$4

    I cry easily

    Clinical Males

    /$0

    :$2

    :$0

    1$2

    1$0

    0$2

    0$0

    0$0 0$4 0$? 1$: 1$5 0$0 0$2 1$0 1$2 :$0 :$2 /$0 /$2

    Clinical 7em ales Clinical 7em ales

    I cry easily

    a param eters

  • 8/9/2019 teoria masurarii

    67/67

    55

    7igure A5 Item Characteristic Cur'es for 7emales and Males

    1$0

    0$?

    0$5

    0$4

    0$:

    0$0

    /

    0$:

    0$4

    : 1 0

    brooding trait

    1 : /

    Clinical 7emales ICC

    Clinical M ales ICC

    I cry easily$

    .cale .ummary

    The analyses find that the tenitem scale functions well for measuring respondents high

    on the latent trait (see test information cur'e in 7igure A4)$ This is consistent with the purpose

    of the MMI:& to identify persons high on problematic personality traits that may need clinical

    help$ If one wanted to design an instrument that would accurately measure respondents at all

    le'els of the latent trait& one would need to add items that measure respondents at the low le'els

    of the scale$

    8arly in the analyses& two item pairs were found to be locally dependent& that is&

    responses to the items are highly correlated after controlling for the underlying trait measured by

    the scale$ 7or the locally dependent item pair& ,rooding or scolding hurts me terribly and I cry

    easily& the latter item was found to function differently between female and male respondents$

    This item has been identified by many as a characteristically poor item because of the differences

    in societyFs attitude regarding gender roles and emotional display$ eeping this item in the scale

    (P)

    56

    symptoms& and if test constructors wish to include the item on the depression scales& it may be

    beneficial to reword the item to LI spend a great deal of time cryingL or LMost days I cry$L

    The other item pair found to be locally dependent is I donFt seem to care what happens to

    me and I usually feel that life is worthwhile$ ,oth items pro'ide precision in measurement in the

    higher le'els of the brooding trait$ Remo'al of one of the items will certainly decrease

    reliability$ Therefore& another option to remo'e item dependence is to combine the items into a

    testlet$ A testlet combines one or more items into a single item with multiple categories$ This

    single item preser'es the unidimensionality of the scale without remo'al of any information from

    the items$ Thus& for parameter estimation and scoring& a testlet can combine both of these two

    dependent items into an item with ordered le'els of+ 0 endorsed neither item& 1 endorsed the

    item I usually feel that life is worthwhile& : endorsed the item I dont seem to care what

    happens to me& or / endorsed both items$ IRT can easily model this testlet with a graded

    model along with the other dichotomously scored items in the scale without any complications$