
DOCUMENT RESUME

ED 235 197    TM 830 594

AUTHOR Eignor, Daniel R.; Cook, Linda L.
TITLE An Investigation of the Feasibility of Using Item Response Theory in the Pre-Equating of Aptitude Tests.
INSTITUTION College Entrance Examination Board, New York, N.Y.
PUB DATE Apr 83
NOTE 54p.; Paper presented at the Annual Meeting of the American Educational Research Association (67th, Montreal, Quebec, April 11-15, 1983). Small print in some figures.
PUB TYPE Speeches/Conference Papers (150) -- Reports - Research/Technical (143)

EDRS PRICE MF01/PC03 Plus Postage.
DESCRIPTORS Aptitude Tests; *College Entrance Examinations; *Equated Scores; *Feasibility Studies; Goodness of Fit; *Latent Trait Theory
IDENTIFIERS *Pre Equating (Tests); Scholastic Aptitude Test

ABSTRACT
The purpose of this study was to determine the extent to which item parameters estimated on pretest data from the verbal section of the Scholastic Aptitude Test (SAT) can be used for equating purposes in a situation where intact final form SAT testing data have normally been used. Items appearing in two final SAT-verbal forms were calibrated almost completely from pretest data. An elaborate linkage system was devised and utilized to get parameter estimates for the items, contained in multiple pretests, on the same scale. The final forms under study were equated to two different old forms; the equatings were then redone using item parameter estimates based on the pretest data. For each form, the item response theory (IRT) equating based on pretest statistics was then compared to the IRT equating based on intact final form data and the linear equating used operationally. The results varied considerably across forms, ranging from acceptable to marginally acceptable. The overall results of the pre-equating were deemed sufficiently promising that an investigation of pre-equating two forms of SAT-mathematical will be undertaken. (BW)

***********************************************************************
Reproductions supplied by EDRS are the best that can be made from the original document.
***********************************************************************


U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

This document has been reproduced as received from the person or organization originating it.
Minor changes have been made to improve reproduction quality.
Points of view or opinions stated in this document do not necessarily represent official NIE position or policy.

An Investigation of the Feasibility of Using Item Response Theory in the Pre-equating of Aptitude Tests 1,2,3

Daniel R. Eignor
Linda L. Cook

Educational Testing Service

"PERMISSION TO REPRODUCE THIS MATERIAL HAS BEEN GRANTED BY

TO THE EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)."

1 A paper presented at the annual meeting of AERA, Montreal, 1983.

2 This study was supported by the College Board through Joint Staff Research and Development Committee funding.

3 The authors would like to acknowledge the technical assistance of Martha Stocking and Nancy Wright and the programming assistance of Inge Stiebritz and Karen Carroll in performing this study.


Introduction

The current thrust of research devoted to the applications of item response theory (IRT) has generated an active interest in the use of IRT methods in the solution of score equating problems (see Cook and Eignor, 1983). Because of the special properties of test data characterized by IRT models, users are often able to solve problems not amenable to traditional equating methods. For other situations, IRT equating offers an alternative against which to evaluate traditional methods. In addition, a number of other important outcomes accrue from the use of IRT for equating tests; among these are: 1) improved equating, including better equating at the ends of the scale where important decisions are often made; 2) greater test security through less dependence on items in common with a single old form; 3) easier re-equating should items be deleted; and 4) the possible reduction of bias or drift in equating introduced when traditional methods are used over time in certain situations, most notably when the equating samples for the old and new forms are not random samples from the same population.

While the above listed outcomes accrue as the result of the application of any IRT equating method, if the test forms to be equated can be pre-equated using IRT methods, a number of additional advantages accrue. Pre-equating refers to the process of establishing conversions from raw to scaled scores prior to the time the new test is administered operationally. The process depends on the adequate pretesting of a pool of items from which the new test will be built, the calibration of these items using IRT methods, and the utilization of a linking scheme to place the IRT parameters from the pretested items on the same scale. Among the additional advantages offered by IRT pre-equating are: 1) since equating using IRT pre-equating methods is possible prior to the actual administration of the test, new forms can be introduced at low volume special administrations, a particular problem if traditional methods are used; 2) since pre-equating permits linkages to many old forms, it is the most likely of any equating method to yield acceptable results should testing legislation mandate the disclosure of pretest or equating items; 3) pre-equating would allow more time to do reasonableness and quality control checks, which are normally done in a hurried fashion due to score reporting deadlines; and 4) pre-equating would actually permit a reduction in the usual score reporting cycle while simultaneously allowing more time to do the equating itself. In short, the listed advantages that can potentially accrue from the use of IRT pre-equating build a strong case for investigation of the feasibility of application of this method. In this report, the applicability of IRT pre-equating to the Scholastic Aptitude Test (SAT) verbal section is considered.

Problem and Purpose

To date, investigations of the feasibility of pre-equating using IRT for tests developed and administered by Educational Testing Service for the College Board have been done using data from the Test of Standard Written English (TSWE) (Bejar and Wingersky, 1982). The Bejar and Wingersky study (1982) indicated some discrepancies between pre-equating results and the results from traditional equatings in situations where traditional equating was a reasonable procedure. The calibration system used for pre-equating TSWE was considerably different, however, from any system that could be devised for pre-equating the SAT. Thus, although the results of the TSWE pre-equating study were not altogether promising, there is little reason to suggest that these results are generalizable to pre-equating the SAT. For this reason, it was deemed important to investigate the feasibility of pre-equating the SAT using an appropriate calibration system, such as that devised for this study.

The purpose of this study was to determine the extent to which item parameters estimated on SAT-verbal pretest data can be used for equating purposes in a situation where intact final form SAT testing data have normally been used. The items that appear in any final SAT-verbal form come from multiple pretests, and to the extent that the item parameter estimates are sensitive to the context in which the item appears, or to sample differences, there may be differences between these parameter estimates and parameter estimates generated using data from the actual final form administration, resulting in a discrepancy between equating based on pretest item parameter estimates and intact final form item parameter estimates. More specifically, in the study, items appearing in two final SAT-verbal forms, 3ASA3 and 3BSA3, were calibrated almost completely from pretest data. (See section on Item Calibration Design and Linkage System.) An elaborate linkage system, quite representative of the system that would exist were pre-equating to be considered for operational use, was devised and utilized to get parameter estimates for the items, contained in multiple pretests, on the same scale. The two verbal forms under consideration were both part of this linkage system.

The effects of using the parameter estimates obtained from the pretest data on the equating process were evaluated in the following way. Each of the SAT-verbal final forms under study, when administered for the first time operationally, was equated to two different old forms and the results of the equatings averaged. Conventional linear equating methods were used when this equating was done. These equatings were redone using item parameter estimates based on the pretest data and item parameter estimates generated from the intact final form administration. In each case, IRT true-score equating was performed. For each form, the IRT equating based on pretest statistics was then compared to the IRT equating based on intact final form data and the linear equating used operationally when each form was put on scale. IRT equating based on intact final form data and linear equating results were used as criteria in this study for the following reasons: (1) in recent IRT equating feasibility studies (Petersen, Cook, and Stocking, in press; Kingston and Dorans, 1982), it has been demonstrated that intact form IRT true-score equating is a viable equating method for aptitude test data; and (2) the linear methods actually performed to put the forms on scale operationally have undergone many years of scrutiny through their use for operational score reporting purposes. This study was done using two SAT-verbal forms so that all results could be replicated. This should form the basis for drawing stronger conclusions about the feasibility of pre-equating the SAT-verbal section than had the replication not taken place.

Methodology

Description of Tests

Test booklets containing SAT forms such as those used in this study consist of six 30-minute sections: two SAT-verbal sections, two SAT-mathematical sections, one Test of Standard Written English (TSWE), and one variable section. All examinees at a given administration take the same test sections except for the variable section, where different subsamples of the total group receive different variable sections. The variable section consists of either one of two verbal or mathematical common item equating sections (anchor tests) or one of a number of verbal, mathematical, or TSWE pretests. In this study, data from only the verbal sections, verbal common item equating sections, and verbal pretests were used. The samples used for calibration purposes in the study took either the verbal sections and one of the verbal common item equating sections or the verbal sections and one of the verbal pretests.

The two SAT-verbal sections contain a total of 85 five-choice items (45 items in one section, 40 items in the other section) comprised of 25 antonyms, 20 analogies, 15 sentence completions, and 5 reading passages, each of which is followed by 5 items based on the passage. The verbal common item equating sections contain 40 items (10 of each type); these sections are built to be as parallel as possible to the 40-item SAT-verbal section. The verbal pretest sections contain either 45 or 40 items and are built to be as parallel as possible to the comparable length SAT-verbal sections.

Prior to 1982, raw scores on the SAT were typically transformed to scaled scores on the College Board 200 to 800 scale via linear equating methods. Since January of 1982, IRT true-score equating using intact final form data has been used to put forms on scale. SAT-verbal raw scores are obtained scores that have been corrected for guessing. Raw scores are computed by the formula $R - W/k$, where $R$ is the number of correct responses, $W$ is the number of incorrect responses, and $(k+1)$ equals the number of choices per item.
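As a small illustration of the guessing correction just described, the sketch below computes a formula score for five-choice items. The function name and the example counts are hypothetical and are not taken from the study.

```python
def formula_score(num_right: int, num_wrong: int, num_choices: int = 5) -> float:
    """Raw score corrected for guessing: R - W/k, where k + 1 = num_choices."""
    k = num_choices - 1
    return num_right - num_wrong / k


# Hypothetical examinee on 85 five-choice items:
# 60 right, 20 wrong, 5 omitted (omitted items carry no penalty).
score = formula_score(60, 20)
print(score)  # 55.0
```

With five choices per item, each wrong answer costs one quarter of a point, so an examinee who guesses at random has an expected score of zero.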

Item Calibration Design and Linkage System

Pretest items corresponding to the verbal sections of two forms of the SAT, 3ASA3 and 3BSA3, were calibrated and placed on a common scale through an elaborate linkage system which utilized data on overlapping items from the administration of intact final forms with either pretest sections or common item equating sections. The calibration linkage system, involving the pretests, final forms, and equating sections, is depicted in Figure 1. Responses from randomly selected samples of approximately 3000 examinees taking each pretest-final form combination and approximately 2700 taking each final form-equating section combination were used for calibration purposes. Each box in Figure 1 represents a separate calibration (computer run). The dotted-line boxes within the larger boxes indicate the overlapping items that were used to place parameter estimates on the same scale within a single calibration run. The directional arrows between the boxes indicate that a scaling program (described in a later section of this paper) was run to place parameter estimates from the separate calibration runs on the same scale. It should be noted that all items contained in each 40-item equating section appearing in Figure 1 were calibrated; however, this was not the case for all items in each pretest or final form. In order to reduce calibration costs, only the 40-item section of SAT-verbal forms used for linking purposes and only the 170 (85 items x 2 forms) pretest items which eventually appeared in final forms 3ASA3 and 3BSA3 were calibrated. Table 1 contains the total number of items and also the total number of examinees responding to the items for each of the 13 calibration runs. Table 2 lists the number of pretest items calibrated in each of the runs. Further reductions in costs were made possible by using existing parameter estimates from the SAT IRT Scale Drift Study (Petersen, Cook, and Stocking, in press) whenever possible. Also, certain final form-equating section combinations from the Scale Drift Study (shown as ovals in Figure 1) and certain final form-equating section calibration runs (numbered 9 and 13 in Figure 1) were linked into the overall calibration linking system though they were not essential to getting the pretest parameter estimates on the same scale. This was done for equating purposes, and will be described in a later section.

IRT Model and Item Calibration

Item response theory (IRT) assumes that there is a mathematical function which relates the probability of a correct response on an item

Pretest data did not exist for 8 of the 85 items in Form 3ASA3; therefore, final form data had to be used in the calibration system. These data were obtained from calibration run number 9 in Figure 1.

[Figure 1 appears here in the original; the diagram did not survive text extraction.]

Figure 1: Verbal pre-equating calibration and linking plan. Upper-case letters followed by one digit designate intact SAT final forms. (In the figures, all intact SAT final form designations are abbreviated; A3 is the abbreviation for 3ASA3, etc.) Upper-case letters followed by three or four digits designate pretest forms. Lower-case letters designate common item equating sections. Boxes (numbered 1-13) indicate separate LOGIST calibration runs. Dotted lines within the large boxes indicate overlapping items that were used to place parameter estimates on the same scale within a LOGIST run. Ovals (lettered) indicate final forms and equating sections for which parameter estimates already existed from the SAT IRT Scale Drift Study. Arrows indicate direction in which scaling (linking) runs took place. LOGIST run number 9 is necessary in that all 85 A3 items can be calibrated. (Only 40 items from A3 were calibrated in LOGIST run number 8.) [The remainder of the caption is illegible in the source.]

Table 1

Total Number of Items and Total Number of Examinees for Each of the LOGIST Calibration Runs

LOGIST       Total No.    No. of Pretest  No. of Equating  No. of SAT-Verbal  Total No.
Run No. (1)  of Items     Items           Section Items    Section Items      of
             Calibrated   Calibrated      Calibrated       Calibrated         Examinees

  1            135           55              40               40                8,459
  2            162            2              80               80                8,519
  3            121            1              80               40                7,964
  4            120            -              80               40                6,181
  5            174           14              80               80               14,069
  6            132           12              80               40               22,922
  7            120            -              80               40              [illegible]
  8            137           17              80               40                4,778
  9            125            -              40               85                2,777
 10            298           58             120              120               20,460
 11            161            1              80               80               10,347
 12             82            2              40               40                8,146
 13            125            -              40               85                2,754

Total        1,892          162 (2)         920              810              143,499

(1) LOGIST run number refers to the identification scheme in Figure 1.
(2) Pretest data did not exist for 8 of the 85 items in 3ASA3, and hence, final form data had to be used for calibration purposes. Thus only 162 of the total 170 pretest items were calibrated.

Table 2

Number of Items Calibrated from Each Pretest Form

Pretest   LOGIST        Total No. of       No. of Items   No. of Items
Form      Run No. (1)   Items Calibrated   in 3ASA3       in 3BSA3

C167          1             27                 13             14
C168          1             28                 16             12
X4058         2              2                  -              2
A1128         3              1                  -              1
A2120         5              7                  -              7
A2061         5              4                  -              4
W4057         5              ?                  ?              ?
X5050         6              ?                  ?              ?
X5126         6              2                  ?              ?
X5161         ?              1                  ?              ?
X5132         ?              1                  ?              ?
X5111         6              5                  -              5
X2232         8              2                  1              1
X2111         8              1                  ?              ?
X2069         8              1                  ?              ?
X2163         8              2                  1              1
X2134         8              4                  4              -
X2216         8              1                  ?              ?
X2128         8              6                  ?              ?
Z5069        10              1                  -              1
C237         10             29                 15             14
C238         10             28                 14             14
W5014        11              ?                  ?              ?
Z4125        12              1                  ?              ?
Z4066        12              1                  ?              ?

Totals                     162 (2)             77 (2)         85

? marks an entry that is illegible in the source.

(1) LOGIST run number refers to the identification scheme in Figure 1.
(2) Pretest data did not exist for 8 of the 85 items in 3ASA3, and hence, final form data had to be used for calibration purposes. Thus, only 77 (of 85) pretest items were calibrated for 3ASA3 and 162 (of 170) for both forms.

to an examinee's ability. (See Lord, 1980, for a detailed discussion.) Many different mathematical models of this functional relationship are possible. The model chosen for this study was the three-parameter logistic model. In this model, the probability of a correct response to item $i$, $P_i(\theta)$, is

$$P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-1.702\,a_i(\theta - b_i)}} \qquad (1)$$

where $a_i$, $b_i$, and $c_i$ are three parameters describing the item and $\theta$ represents an examinee's ability. These parameters have specific interpretations: $b_i$ is the point on the $\theta$ metric at the inflection point of $P_i(\theta)$ and is interpreted as the item difficulty; $a_i$ is proportional to the slope of $P_i(\theta)$ at the point of inflection and represents the item discrimination; and $c_i$ is the lower asymptote of $P_i(\theta)$ and represents a pseudo-guessing parameter.
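The three-parameter logistic model can be sketched in a few lines. The function name and the parameter values below are hypothetical and are chosen only to show the interpretation of the parameters; they are not estimates from the study.

```python
import math

D = 1.702  # scaling constant in the three-parameter logistic model


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical item: discrimination 1.0, difficulty 0.5, lower asymptote 0.2.
at_difficulty = p_correct(0.5, a=1.0, b=0.5, c=0.2)   # halfway between c and 1
very_low_ability = p_correct(-7.0, a=1.0, b=0.5, c=0.2)  # approaches c
print(at_difficulty, very_low_ability)
```

At an ability equal to the item difficulty, the probability sits halfway between the lower asymptote and 1, which is why the difficulty is described as the inflection point of the curve.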

The item parameters and examinee abilities for this study were calibrated using the program LOGIST (Wingersky, Barton, and Lord, 1982; Wingersky, 1983). The estimates are obtained by a (modified) maximum likelihood procedure with special procedures for the treatment of omitted items (see Lord, 1974).

LOGIST requires as input the responses to a set of items from a group of examinees, coded to reflect items answered correctly, incorrectly, omitted, and not reached. In addition, the user may specify certain restrictions on the data and parameters in order to speed convergence of the iterative procedure. The major restrictions specified for the study for most of the LOGIST computer runs were:

1. examinees who answered fewer than one-third of the items were not used,
2. a's were restricted to a range of .01 to 1.75,
3. c's were restricted to a range of .0 to .50 or .75(p+), and
4. $\theta$'s were restricted to a range of -7.0 to 5.0.

LOGIST produces as output estimates of a, b, and c for each item, and $\theta$ for each examinee.

Thirteen separate LOGIST runs were necessary to calibrate the pretest items, final form and equating section items used for linking purposes, and the final forms to be used for equating purposes. These LOGIST runs are numbered 1-13 in Figure 1. Each of the separate LOGIST runs generated item parameter estimates on the particular scale defined by the ability distribution of the group of examinees used in the calibration, and hence, a scaling program had to be run to put parameter estimates from the separate LOGIST runs on a common scale. This scaling program also had to be run to put the final form-equating section combinations from the SAT IRT Scale Drift Study (Petersen et al., in press) on the common scale. LOGIST run 10 in Figure 1 was chosen as the base form for scaling purposes because it contains an SAT-verbal form and equating section which are in common with a partial pre-calibration linkage system recently devised (Cook and Petersen, 1982) for possible future operational SAT use.

Scalings

The scalings just referred to are indicated by the directional arrows in Figure 1 (and also Figure 2, to be discussed in the following section). A recently devised scaling method (Stocking and Lord, 1982) was used in the study. Briefly, the method works as follows. Letting $b$, $a$, and $c$ denote item difficulty, discrimination, and lower asymptote parameters, a linear transformation of the form

$$b_T = r\,b + m, \qquad a_T = a / r \qquad (T = \text{transformed}) \qquad (2)$$

is found which places new form item parameters on the base form scale. The $r$ and $m$ of this transformation are chosen to minimize the average squared difference between true scores on the common item set for a particular group of examinees who have taken the base form. It should be noted that $c_T = c$, so that there is no necessity to transform lower asymptote parameters. This method implicitly makes use of data from all the parameters characterizing an item because true scores are used in the minimization process.
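The scaling criterion described above can be sketched as follows. This is only an illustration under stated assumptions: the item parameters are hypothetical, a fixed grid of ability values stands in for the group of examinees who took the base form, and no minimizer is included. The sketch only evaluates the average squared true-score difference for a candidate pair of r and m; it is not the scaling program used in the study.

```python
import math

D = 1.702  # scaling constant used in the three-parameter logistic model


def p3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def true_score(theta, items):
    """Number-right true score on a set of (a, b, c) items."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)


def rescale(items, r, m):
    """Apply the linear transformation: b_T = r*b + m, a_T = a/r, c unchanged."""
    return [(a / r, r * b + m, c) for a, b, c in items]


def criterion(r, m, base_items, new_items, thetas):
    """Average squared true-score difference on the common items."""
    moved = rescale(new_items, r, m)
    diffs = [(true_score(t, base_items) - true_score(t, moved)) ** 2 for t in thetas]
    return sum(diffs) / len(diffs)


# Hypothetical common items on the base scale, as (a, b, c).
base_items = [(1.0, 0.0, 0.20), (0.8, 1.0, 0.15), (1.2, -0.5, 0.25)]

# Simulate the same items calibrated on a different scale, constructed so
# that r = 1.25 and m = -0.4 map them back onto the base scale.
r_true, m_true = 1.25, -0.4
new_items = [(a * r_true, (b - m_true) / r_true, c) for a, b, c in base_items]

# Assumed ability grid standing in for the base-form examinee group.
thetas = [-3.0 + 0.5 * i for i in range(13)]

loss_at_truth = criterion(r_true, m_true, base_items, new_items, thetas)
loss_at_identity = criterion(1.0, 0.0, base_items, new_items, thetas)
print(loss_at_truth, loss_at_identity)
```

In this constructed example the criterion is essentially zero at the generating values of r and m and clearly positive when no transformation is applied, which is the behavior a minimizer over r and m would exploit.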

Equating Design

Operationally, the verbal sections of 3ASA3 and 3BSA3 were each linearly equated to two old SAT forms and the results averaged. These equatings can be used as a means for evaluating the effects of using items calibrated from pretest data in the equating process. The following diagram depicts the actual equatings that took place, and the common item sections used for the equatings.

[Diagram: 3ASA3 (A3) was equated to old forms X2 and Y3, and 3BSA3 (B3) was equated to old forms Y2 and A1, each through a common item equating section. The layout of the original diagram did not survive text extraction.]

For each equating depicted, IRT true-score equating, to be described in detail in the next section, was done three different ways. The first way, referred to as IRT pre-equating, involved the use of item parameter estimates based on pretest items which constitute 3ASA3 and 3BSA3, while the other two ways (both used as criteria to evaluate the IRT pre-equating) involve the use of item parameter estimates based on data collected when 3ASA3 and 3BSA3 were administered as final forms in an intact fashion. The second and third ways differ in the following fashion. In one situation, referred to as intact form calibration system equating, item parameter estimates for 3ASA3, 3BSA3, and the old forms to which they were equated were placed on the same scale, which is essential for IRT equating, by being linked into the overall calibration and linking plan shown in Figure 1. In this situation, the forms to be equated were linked indirectly through multiple scaling runs applied to a number of intervening LOGIST runs which contain multiple final forms and equating sections. This was done in an attempt to simulate conditions of one possible model under which intact final form IRT equating might take place for the SAT in the future. In the other case, referred to as intact form direct link equating, parameter estimates for the new (3ASA3 and 3BSA3) and old forms to be equated were linked directly through common equating sections. This linking is depicted in Figure 2.

Equating Methods

Linear equating methods produce an equating transformation of the form T(x) = Ax + B, where T is the equating transformation, x is the test score to which it is applied, and A and B are parameters estimated from the data. The Tucker, Levine Equally Reliable, and Levine Unequally Reliable linear equating models (Angoff, 1971, pp. 579-583) are the models that were used until 1982 for equating SAT-verbal.

[Figure 2 appears here in the original; the diagram did not survive text extraction. Its two panels are titled "To equate A3 to X2 and Y3 directly using intact form A3 data" and "To equate B3 to Y2 and A1 directly using intact form B3 data."]

Figure 2: Verbal intact form direct link calibration and linking plan. Upper-case letters followed by one digit designate intact SAT final forms. Lower-case letters designate common item equating sections. Boxes and ovals are numbered to directly correspond to comparable boxes and ovals in Figure 1. Arrows indicate direction in which scaling (linking) took place.

Choice of which of the three models to use for score reporting purposes depends on 1) differences in ability between new and old form groups, as measured by a set of common items (anchor test), and 2) whether the new and old forms are equally reliable, which is typically interpreted to mean of equal test length. These models are based on univariate selection sampling theory. Scores on the relevant selection attribute (the attribute on which the equating samples vary) are assumed to be collinear with scores on the anchor test in the case of the Tucker model and with true scores on both the anchor test and the test form in the case of the Levine models. Scores on the common item set (anchor test) are used to estimate performance of the combined group of examinees on both the old and new forms of the test, thus simulating by statistical methods the situation in which the same group of examinees takes both forms of the test.

The parameters A and B of the equating transformation are estimated by means of an equation that expresses the relationship between raw scores on two test forms in standard score terms:

$$\frac{x - M_x}{S_x} = \frac{y - M_y}{S_y} \qquad (3)$$

where x and y refer to the test scores to be equated, and M and S refer to the means and standard deviations of the scores in some group of examinees. Methods using the above equation differ in their identification of the means and standard deviations to be estimated. The Tucker and Levine Equally Reliable methods are based on the estimated means and standard deviations of observed scores whereas the Levine Unequally Reliable method is based on the estimated means and standard deviations of true scores.
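Solving equation (3) for y gives the linear form T(x) = Ax + B with A = S_y / S_x and B = M_y - A M_x. The sketch below carries out that algebra. The means and standard deviations are hypothetical inputs; in practice they would come from whichever of the estimation methods is being applied, and that estimation step is not shown here.

```python
def linear_equating(mean_x: float, sd_x: float, mean_y: float, sd_y: float):
    """Solve (x - M_x)/S_x = (y - M_y)/S_y for y = A*x + B and return (A, B)."""
    slope = sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical estimated moments for the two forms in a common group.
A, B = linear_equating(mean_x=40.0, sd_x=16.0, mean_y=42.0, sd_y=20.0)
equated_mean = A * 40.0 + B  # the mean of x maps to the mean of y
print(A, B, equated_mean)  # 1.25 -8.0 42.0
```

A score one standard deviation above the mean on one form maps to a score one standard deviation above the mean on the other, which is what setting the two standard scores equal means.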

Table 3

Formulas for Linear Conversion Parameters

Tucker

$$A^2 = \frac{S_{yb}^2 + C_{yvb}^2\,(S_{vc}^2 - S_{vb}^2)/S_{vb}^4}{S_{xa}^2 + C_{xva}^2\,(S_{vc}^2 - S_{va}^2)/S_{va}^4}$$

$$B = M_{yb} + C_{yvb}\,(M_{vc} - M_{vb})/S_{vb}^2 - A\,M_{xa} - A\,C_{xva}\,(M_{vc} - M_{va})/S_{va}^2$$

Levine Equally Reliable

$$A^2 = \frac{S_{yb}^2 + (S_{yb}^2 - S_{y''b}^2)(S_{vc}^2 - S_{vb}^2)/(S_{vb}^2 - S_{v''b}^2)}{S_{xa}^2 + (S_{xa}^2 - S_{x''a}^2)(S_{vc}^2 - S_{va}^2)/(S_{va}^2 - S_{v''a}^2)}$$

$$B = M_{yb} + (M_{vc} - M_{vb})\left(\frac{S_{yb}^2 - S_{y''b}^2}{S_{vb}^2 - S_{v''b}^2}\right)^{.5} - A\,M_{xa} - A\,(M_{vc} - M_{va})\left(\frac{S_{xa}^2 - S_{x''a}^2}{S_{va}^2 - S_{v''a}^2}\right)^{.5}$$

Levine Unequally Reliable

$$A^2 = \frac{(S_{yb}^2 - S_{y''b}^2)/(S_{vb}^2 - S_{v''b}^2)}{(S_{xa}^2 - S_{x''a}^2)/(S_{va}^2 - S_{v''a}^2)}$$

$$B = M_{yb} - A\,M_{xa} + (M_{va} - M_{vb})\left(\frac{S_{yb}^2 - S_{y''b}^2}{S_{vb}^2 - S_{v''b}^2}\right)^{.5}$$

Angoff Error Variance Estimates (Anchor Test External to Total Test)

$$S_{p''g}^2 = \frac{S_{pg}^2 S_{vg}^2 - C_{pvg}^2}{S_{vg}^2 + C_{pvg}}, \qquad S_{v''g}^2 = \frac{S_{pg}^2 S_{vg}^2 - C_{pvg}^2}{S_{pg}^2 + C_{pvg}}$$

Notation

New Test Form: X
Old Test Form: Y
Either New or Old Test Form: P
Anchor Test: V
Observed Score: x, y, v, p
Error Score: x'', y'', v'', p''
Group Taking Test X and Test V: a
Group Taking Test Y and Test V: b
Group Taking Test P and Test V: g
Combined Group: c, or (a + b)
Mean: M
Standard Deviation: S
Covariance: C

The formulas for computing the A and B parameters for the Tucker, Levine Equally Reliable, and Levine Unequally Reliable models are given in Table 3. As noted in Table 3, the formulas for the Levine models require error variance estimates. Angoff's method (1953) of estimating error variances is used for operational linear equating. This method assumes that the test to be equated and the anchor test are parallel except for length.

When a new form is equated to two old forms, the final linear parameters to put the new form on scale are arrived at in the following fashion. Each of the old forms has linear parameters for placing it on scale; these parameters are combined with linear parameters generated from the equating relationship to derive parameters to put the new form on scale. There will be a set of parameters for each equating to each old form; the final set of parameters is arrived at by averaging parameters from each of the single equatings.
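The combination step can be written out as a composition of two linear functions followed by an average. In the sketch below all parameter values are hypothetical, and the function names are illustrative only; the composition rule is simply what results from substituting the raw-to-raw equating line into the old form's raw-to-scale line.

```python
def compose(old_form_scale, raw_link):
    """Combine an old form's raw-to-scale line with a new-to-old raw equating line.

    If old raw y = A*x + B and scale = A_old*y + B_old,
    then scale = (A_old*A)*x + (A_old*B + B_old).
    """
    a_old, b_old = old_form_scale
    a, b = raw_link
    return a_old * a, a_old * b + b_old


def average_params(param_sets):
    """Average slopes and intercepts across the single equatings."""
    n = len(param_sets)
    slope = sum(p[0] for p in param_sets) / n
    intercept = sum(p[1] for p in param_sets) / n
    return slope, intercept


# Hypothetical parameters: (slope, intercept) pairs.
via_old_form_1 = compose(old_form_scale=(8.0, 200.0), raw_link=(1.0, 2.0))
via_old_form_2 = compose(old_form_scale=(7.5, 210.0), raw_link=(1.25, -4.0))
final_params = average_params([via_old_form_1, via_old_form_2])
print(via_old_form_1, via_old_form_2, final_params)
# (8.0, 216.0) (9.375, 180.0) (8.6875, 198.0)
```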

Although there are a number of equating techniques possible when using IRT, this study was concerned only with true formula score equating (Lord, 1980). The expected value of an examinee's observed formula score is defined as his or her true formula score. For the true formula score, $\xi$, we have

$$\xi = \sum_{i=1}^{n} \left[ \frac{k_i + 1}{k_i}\,P_i(\theta) - \frac{1}{k_i} \right] \qquad (4)$$

where $n$ is the number of items in the test and $(k_i + 1)$ is the number of choices for item $i$. If we have two tests measuring the same ability $\theta$, then true formula scores $\xi$ and $\eta$ from the two tests are related by the equations

$$\xi = \sum_{i=1}^{n} \left[ \frac{k_i + 1}{k_i}\,P_i(\theta) - \frac{1}{k_i} \right], \qquad \eta = \sum_{j=1}^{m} \left[ \frac{k_j + 1}{k_j}\,P_j(\theta) - \frac{1}{k_j} \right] \qquad (5)$$

Clearly, for a particular $\theta$, corresponding true scores $\xi$ and $\eta$ have identical meaning. They are said to be equated.
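True formula score equating can be illustrated by evaluating both true formula scores over a shared set of ability values. The sketch below assumes the three-parameter logistic model with the 1.702 scaling constant; the two short forms and their item parameters are hypothetical and serve only to show how paired scores arise.

```python
import math

D = 1.702  # scaling constant used in the three-parameter logistic model


def p3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def true_formula_score(theta, items):
    """True formula score; items are (a, b, c, k) with k + 1 answer choices."""
    return sum((k + 1) / k * p3pl(theta, a, b, c) - 1.0 / k for a, b, c, k in items)


def equated_pairs(new_form, old_form, thetas):
    """Pairs of true formula scores on the two forms at the same ability."""
    return [(true_formula_score(t, new_form), true_formula_score(t, old_form))
            for t in thetas]


# Hypothetical three-item forms; the old form is uniformly harder.
new_form = [(1.0, -0.5, 0.2, 4), (0.9, 0.0, 0.2, 4), (1.1, 0.8, 0.2, 4)]
old_form = [(1.0, -0.2, 0.2, 4), (0.9, 0.3, 0.2, 4), (1.1, 1.1, 0.2, 4)]
thetas = [-2.0 + 0.5 * i for i in range(9)]

pairs = equated_pairs(new_form, old_form, thetas)
for new_score, old_score in pairs:
    print(round(new_score, 3), round(old_score, 3))
```

Each printed pair is a new-form score and the old-form score treated as equivalent to it. Because the hypothetical old form is harder, a given new-form true score corresponds to a lower old-form true score.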

Because true formula scores below the chance score level are undefined for the three-parameter logistic model, some method must be established to obtain a relationship between scores below the chance level on the two test forms to be equated. The approach used for this study (Lord, 1980) was to estimate the mean ($M$) and standard deviation ($S$) of below chance level scores on the two tests to be equated via the formulas

$$M = \sum_{i=1}^{n} \left[ \frac{k_i + 1}{k_i}\,c_i - \frac{1}{k_i} \right] \quad \text{and} \quad S^2 = \sum_{i=1}^{n} c_i\,(1 - c_i) \left( \frac{k_i + 1}{k_i} \right)^2 \qquad (6)$$

where $n$ is the number of items in the test, $(k_i + 1)$ is the number of choices for item $i$, and $c_i$ is the pseudo-guessing parameter for item $i$, and then to use these estimates to do a simple linear equating (see equation (3)) between the two sets of below chance level scores.
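The below-chance procedure can be sketched as two steps: compute a mean and standard deviation for each form from the pseudo-guessing parameters, then equate linearly by setting standard scores equal. The variance expression used here treats each item's formula score as a two-valued variable answered correctly with probability c_i; that reading is an assumption, since the printed formula is only partly legible. The item values are hypothetical.

```python
import math


def below_chance_moments(items):
    """Mean and standard deviation of below-chance scores; items are (c, k) pairs."""
    mean = sum((k + 1) / k * c - 1.0 / k for c, k in items)
    variance = sum(c * (1.0 - c) * ((k + 1) / k) ** 2 for c, k in items)
    return mean, math.sqrt(variance)


def below_chance_line(new_items, old_items):
    """Linear equating of below-chance scores, new form (x) to old form (y)."""
    m_x, s_x = below_chance_moments(new_items)
    m_y, s_y = below_chance_moments(old_items)
    slope = s_y / s_x
    return slope, m_y - slope * m_x


# Hypothetical four-item, five-choice forms.
new_items = [(0.25, 4)] * 4
old_items = [(0.20, 4)] * 4

m_new, s_new = below_chance_moments(new_items)
m_old, s_old = below_chance_moments(old_items)
slope, intercept = below_chance_line(new_items, old_items)
print(m_new, s_new, m_old, s_old, slope, intercept)
```

When every pseudo-guessing parameter equals 1/(k + 1), as for the hypothetical old form, the mean is zero, which matches the usual expectation that pure random guessing yields a formula score of zero.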


In practice, true score equating is carried out by substituting estimated parameters into equations (5) and (6). Paired values of $\xi$ and $\eta$ are then computed for a series of arbitrary values of $\theta$. Since we cannot know an examinee's true formula score, we act as if relationships (5) and (6) apply to an examinee's observed formula score.

Two further points require clarification. First, the mechanics of doing IRT true-score equating based on pretest data (pre-equating) and based on intact final form data are exactly the same. What differs are the item parameter estimates that are used to calculate $P_i(\theta)$ in equation (4). In one instance the parameters have been calibrated for the item when given in a pretest, and in the other instance, when the item was given as part of an intact final form. Second, when equating to two old forms using IRT true-score equating techniques, a conversion table is generated for each new form-old form relationship and then the corresponding entries in each table are simply averaged to generate the final table.

Results

Pre-equating Results

A number of figures and tables have been prepared to summarize the results of this study. Because the equatings done for 3ASA3 and for 3BSA3 are independent, and meant to serve as replications of the pre-equating process, the figures for each form can be viewed separately. Because each of the forms was equated to two old forms, there are figures for each of the single equatings and then the equating resulting from the averaging of the single equatings.

In the figures for each equating performed, there are two plots. The first plot compares the raw to scaled score conversion line resulting from the IRT pre-equating to one of the three comparison conversion lines, resulting from either the intact form calibration system IRT equating, the intact form direct link IRT equating, or the intact form linear equating actually used operationally for score reporting purposes. The second plot contains residuals. These residuals are simple differences between scaled scores resulting from the IRT pre-equating and one of the comparison equatings for each possible formula score point. The plots use the comparison equating (intact form calibration system IRT equating, intact form direct link IRT equating, or intact form linear equating) as the baseline and show differences between the pre-equating and the baseline equating results across the formula score scale. As mentioned earlier, the intact form calibration system and direct link IRT equatings were chosen as baseline equatings for these residual plots because this sort of IRT equating has been shown in previous studies to be a viable equating method for SAT data, and provides a good criterion equating against which to evaluate the pre-equating results. The intact form linear equating was also used as a baseline because this was the method actually used to put 3ASA3 and 3BSA3 on scale operationally. Of the three comparison equatings, the intact form direct link equating should provide the best criterion against which to evaluate the pre-equating results in that 1) the relationship between the parameter estimates, for the forms to be equated, from the separate LOGIST runs has not been influenced by intervening scalings, and 2) in contrast with linear

[Figure 3 plots not reproduced. Panels titled "SAT-V A3/SAT-V X2" (equating plots: scaled score against formula true-score) and "SAT-V A3/SAT-V X2 Equating Residuals" (residual plots: difference of scaled scores against formula true-score). Each panel compares the IRT pre-equating with the intact form (CS) IRT equating, the intact form (DL) IRT equating, or the intact form operational linear equating.]

Figure 3: SAT-verbal Form 3ASA3 equated to SAT-verbal Form XSA2 - Plots of 1) IRT pre-equating raw to scaled score transformation compared to corresponding intact form calibration system IRT, direct link IRT, and operational linear equating raw to scaled score transformations, and 2) differences between scaled scores (IRT pre-equating - comparison equating) resulting from the equatings.

[Figure 4 plots not reproduced. Panels titled "SAT-V A3/SAT-V Y3" (equating plots: scaled score against formula true-score) and "SAT-V A3/SAT-V Y3 Equating Residuals" (residual plots: difference of scaled scores against formula true-score). Each panel compares the IRT pre-equating with the intact form (CS) IRT equating, the intact form (DL) IRT equating, or the intact form operational linear equating.]

Figure 4: SAT-verbal Form 3ASA3 equated to SAT-verbal Form YSA3 - Plots of 1) IRT pre-equating raw to scaled score transformation compared to corresponding intact form calibration system IRT, direct link IRT, and operational linear equating raw to scaled score transformations, and 2) differences between scaled scores (IRT pre-equating - comparison equating) resulting from the equatings.

[Figure 5 plots not reproduced. Panels titled "SAT-V A3/SAT-V X2 and Y3" (equating plots: scaled score against formula true-score) and "SAT-V A3/SAT-V X2 and Y3 Equating Residuals" (residual plots: difference of scaled scores against formula true-score). Each panel compares the IRT pre-equating with the intact form (CS) IRT equating, the intact form (DL) IRT equating, or the intact form operational linear equating.]

Figure 5: SAT-verbal Form 3ASA3 equated to SAT-verbal Form XSA2 and SAT-verbal Form YSA3 - Plots of 1) IRT pre-equating raw to scaled score transformation compared to corresponding intact form calibration system IRT, direct link IRT, and operational linear equating raw to scaled score transformations, and 2) differences between scaled scores (IRT pre-equating - comparison equating) resulting from the equatings.


[Figure 7 plots not reproduced. Panels titled "SAT-V B3/SAT-V A1" (equating plots: scaled score against formula true-score) and "SAT-V B3/SAT-V A1 Equating Residuals" (residual plots: difference of scaled scores against formula true-score). Each panel compares the IRT pre-equating with the intact form (CS) IRT equating, the intact form (DL) IRT equating, or the intact form operational linear equating.]

Figure 7: SAT-verbal Form 3BSA3 equated to SAT-verbal Form 3ASA1 - Plots of 1) IRT pre-equating raw to scaled score transformation compared to corresponding intact form calibration system IRT, direct link IRT, and operational linear equating raw to scaled score transformations, and 2) differences between scaled scores (IRT pre-equating - comparison equating) resulting from the equatings.

[Figure 8 plots not reproduced. Panels titled "SAT-V B3/SAT-V Y2 and A1" (equating plots: scaled score against formula true-score) and "SAT-V B3/SAT-V Y2 and A1 Equating Residuals" (residual plots: difference of scaled scores against formula true-score). Each panel compares the IRT pre-equating with the intact form (CS) IRT equating, the intact form (DL) IRT equating, or the intact form operational linear equating.]

Figure 8: SAT-verbal Form 3BSA3 equated to SAT-verbal Form YSA2 and SAT-verbal Form 3ASA1 - Plots of 1) IRT pre-equating raw to scaled score transformation compared to corresponding intact form calibration system IRT, direct link IRT, and operational linear equating raw to scaled score transformations, and 2) differences between scaled scores (IRT pre-equating - comparison equating) resulting from the equatings.

Table 4

Scaled Score Summary Statistics Resulting from Application of Four Equating Methods
SAT-verbal Sections of Forms 3ASA3 and 3BSA3

Form    N         Statistic   IRT Intact Form   IRT Intact Form        Intact Form   IRT
                              (Direct Link)     (Calibration System)   Linear        Pre-equating

3ASA3   126,788   M           437.04            437.01                 441.45        439.26
                  S.D.        111.91            111.30                 108.34        109.65

3BSA3   253,354   M           430.25            430.42                 431.42        440.39
                  S.D.        105.99            105.57                 106.53        110.55

equating, curvilinear relationships are permitted. The residual plots,

in conjunction with data presented in Table 4, to be described next,

provide much of the data upon which to evaluate the results of this

study.
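The residuals shown in the residual plots are plain differences at each formula score point, which a few lines of Python can illustrate. The table layout (raw formula score mapped to scaled score), the function names, and the numbers in the example are assumptions for illustration only.

```python
def equating_residuals(pre_equating, baseline):
    """Scaled score difference (pre-equating minus baseline) at each raw score.

    Both arguments map a raw formula score to a scaled score.
    """
    return {raw: pre_equating[raw] - baseline[raw] for raw in sorted(baseline)}

def largest_residual(residuals):
    """Return (raw score, residual) for the residual with the largest magnitude."""
    raw = max(residuals, key=lambda r: abs(residuals[r]))
    return raw, residuals[raw]

# Hypothetical conversions at three raw score points.
pre = {10: 300.0, 40: 510.0, 70: 690.0}
base = {10: 305.0, 40: 505.0, 70: 702.0}
res = equating_residuals(pre, base)
worst = largest_residual(res)
```

A positive residual means the pre-equating would have reported a higher scaled score than the baseline equating at that raw score.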

Table 4 provides the scaled score means and standard deviations for Forms 3ASA3 and 3BSA3 that would have resulted from use for score reporting purposes of pre-equating, intact form calibration system IRT equating, intact form direct link IRT equating, and intact form linear equating to the old forms. The means and standard deviations were computed using frequencies for the total groups taking Forms 3ASA3 and 3BSA3 at the respective initial intact form administrations.
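Summary statistics of this kind can be computed from a conversion table and a raw score frequency distribution, as in the sketch below. It assumes the standard deviation uses the total group size as the divisor; the source does not say which divisor was used. The function name and example numbers are hypothetical.

```python
import math

def scaled_score_summary(conversion, frequencies):
    """Frequency-weighted mean and SD of scaled scores.

    conversion maps raw formula score -> scaled score; frequencies maps
    raw formula score -> number of examinees. Divides by N (assumption).
    """
    n = sum(frequencies.values())
    mean = sum(conversion[raw] * f for raw, f in frequencies.items()) / n
    var = sum(f * (conversion[raw] - mean) ** 2 for raw, f in frequencies.items()) / n
    return n, mean, math.sqrt(var)

# Hypothetical two-point distribution.
n, mean, sd = scaled_score_summary({20: 400.0, 50: 500.0}, {20: 3, 50: 3})
```

Applying the same frequencies to each method's conversion table isolates the effect of the equating method on the reported mean and standard deviation.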

Based on the data presented for Form 3ASA3, it is clear that the pre-equating was quite successful. In no residual plot is the difference between the scaled score resulting from the pre-equating and the comparison intact form calibration system IRT or direct link IRT equatings more than 15 score points on a scale containing 600 score points. The differences between the pre-equating results and the intact form linear results are greater than the differences resulting from the intact form IRT equatings, particularly at the upper end of the formula score scale. This is because all three IRT equatings demonstrate that the raw to scaled score conversion is curvilinear in this region, and the linear equating cannot account for this curvilinearity. The differences in scaled score means and standard deviations presented in Table 4 are very small. The scaled score means and standard deviations resulting from the two IRT methods used as criteria are almost identical. The scaled score mean resulting from the pre-equating lies between the scaled score mean resulting from either the intact form calibration system or direct link IRT equatings and the scaled score mean resulting from the intact form linear equating. The scaled score difference between the mean resulting from the pre-equating and any of the other equatings is about 2 points. What is particularly interesting to note is the pattern of the residual plots for the comparison of the pre-equating results with the intact form calibration system and direct link IRT equating results, displayed in Figures 3-5. The patterns of residuals are the same across both the single equatings and the average equating. The pre-equating results in lower scaled scores at the bottom and top of the formula score scale and slightly higher scaled scores in the middle region. As mentioned earlier, at no point are these differences greater than 15 scaled score points, and hence, although the pattern of differences is consistent across equatings, the differences themselves are minor when compared to, for instance, the scaled standard error of measurement for SAT-verbal, which is approximately 30 scaled score points.

The pre-equating of Form 3BSA3 was clearly not as successful as the pre-equating for Form 3ASA3. The residual plots show maximum differences in scaled scores resulting from the pre-equating and the operational calibration system or direct link IRT equating of upwards of 20 score points. Once again, the differences between the pre-equating and the intact form linear equating are even greater, particularly in the regions of the formula score scale where the raw to scaled conversion is curvilinear. The differences in scaled score means and standard deviations resulting from the pre-equating and the comparison equatings are much larger than those for Form 3ASA3. The two IRT methods used as criteria produced scaled score summary statistics that are very similar. Unlike the equatings for 3ASA3, scaled score summary statistics produced by the linear equatings are fairly similar to those produced by the IRT criterion equatings. The scaled score mean resulting from the pre-equating is about ten points greater than the scaled score means resulting from the IRT intact form calibration system, IRT intact form direct link, and intact form linear equatings, which are all within about a scaled score point of each other. Once again, the patterns in the residual plots for the IRT pre-equating and the comparison IRT equatings are the same across both of the single equatings (to YSA2 and to 3ASA1) and, hence, the subsequent average equating. The pre-equating results in slightly lower scaled scores at the lower end of the formula score scale and consistently higher scaled scores through the middle and upper end of the formula score scale. The maximum differences occur in all plots around a formula score of 70.

Supplemental Investigations and Results

A number of possible explanations were generated for why the 3BSA3 pre-equating results were different from the 3BSA3 comparison equating results and clearly not of the same quality as the 3ASA3 pre-equating results. Explorations of these possible reasons for the inferiority of the 3BSA3 pre-equating results are reported on next, and also discussed briefly in the conclusions section.

One possible explanation for the 3BSA3 pre-equating results has to do with practice effects generated from the manner in which the test sections are sequenced. In other words, for 3ASA3 there may have been more or less of a balancing effect of the sequencing of the operational final form section and the pretest section (perhaps in about 50% of the final form - pretest combinations represented in Figure 1 the operational section appeared first and in the other 50% of the combinations the pretest section appeared first), while for 3BSA3 the balancing may not have occurred. It can be hypothesized, given that the 3BSA3 pre-equating results are consistently higher in the upper part of the formula score scale than any of the comparison equatings, that the pretest section followed the operational section in a disproportionate number of cases, and that practice effects resulted. An investigation of the sequencing of sections did indeed show that, for 3BSA3, the pretests occurred after the operational sections in 65% of the final form - pretest combinations; but for 3ASA3, this was true 64% of the time. Hence it would appear that the above explanation cannot be used to explain why the 3BSA3 pre-equating results were so different from those for 3ASA3.

Two other potential explanations for differences in pre-equating results have to do with equating samples and LOGIST calibration runs. These are:

1. The use of two different equating samples with the 3BSA3 intact form calibration system and direct link equatings. In doing the 3ASA3 intact form calibration system and direct link equatings, the same equating section, fm, was in common with old forms XSA2 and YSA3, and hence the same sample, and subsequent set of parameter estimates, could be used for both equatings. This was not true for 3BSA3 in that fk was in common with YSA2 and fw with 3ASA1. This necessitated the use of two different samples, and hence, two different sets of item parameter estimates (both sets taken from the SAT IRT Scale Drift Study) to perform the equatings.

2. The use of different versions of the LOGIST program to generate item parameter estimates. For 3ASA3, both the pretest and the final form parameter estimates were generated from the current version of LOGIST, and this is also true of the 3BSA3 pretest parameter estimates. To save on calibration costs, the 3BSA3 final intact form parameter estimates were recovered from the SAT IRT Scale Drift Study (Petersen, et al., in press), run on a different version of LOGIST. It is possible that the updating and refinement of the LOGIST program caused subtle differences in parameter estimates, which collectively caused the differences seen in the residual plots for 3BSA3.

The possible explanations above implicitly assume that it is not the pre-equating for 3BSA3 that is somehow faulty, but instead the comparison IRT equatings. To investigate whether or not it is reasonable to explain the differences in pre-equating results this way, the following was done. The operational final form-equating section combinations needed to equate intact final form 3BSA3 to old forms YSA2 and 3ASA1 (see Figure 2) were run together in one large LOGIST run, using the current version of LOGIST, and the intact final form equating was redone. As a result, the parameter estimates for the 3BSA3 pre-equating and the 3BSA3 intact final form equating were generated using the same version of LOGIST. Further, by running the data for 3BSA3 and the two old forms concurrently, there was no need for scaling parameter estimates (all parameter estimates needed in the equating are automatically on the same scale) and only one set of 3BSA3 final form parameter estimates was used in the equating (unlike the previous IRT comparison equatings). In sum, the results of equating intact final form 3BSA3 to the old forms using the parameter estimates from the concurrent LOGIST run should provide the best criterion possible for evaluating the 3BSA3 pre-equating results.

A comparison of the 3BSA3 pre-equating results to this new IRT comparison equating is presented in Figure 9 for each of the single equatings and the average equating. The new comparison equating has been labeled intact form concurrent equating in this figure. Information on the scaled score summary statistics resulting from this new equating and the others previously described is presented in Table 5.

The results presented in Figure 9 clearly lead to the conclusion that it is not the comparison IRT equatings for Form 3BSA3 that are faulty, but rather the Form 3BSA3 pre-equating results. The data presented in Figure 9 show differences between the IRT pre-equating and the intact form concurrent IRT equating that are comparable to the differences in the residual plots using the other intact form comparison equatings. Thus the possible explanations for differences in equating results based on the use of different versions of LOGIST and multiple sets of parameter estimates, generated from the IRT Scale Drift Study (Petersen et al., in press), must be discounted.

[Figure 9 plots not reproduced. Panels titled "SAT-V B3/SAT-V Y2", "SAT-V B3/SAT-V A1", and "SAT-V B3/SAT-V Y2 and A1" (equating plots: scaled score against formula true-score), each with a matching "Equating Residuals" panel (difference of scaled scores against formula true-score). Each panel compares the IRT pre-equating with the intact form (CONC) IRT equating.]

Figure 9: SAT-verbal Form 3BSA3 equated to SAT-verbal Form YSA2, Form 3ASA1, and Forms YSA2 and 3ASA1 - Plots of 1) IRT pre-equating raw to scaled score transformation compared to intact final form concurrent IRT equating raw to scaled score transformation, and 2) differences between scaled scores (IRT pre-equating - intact final form concurrent IRT equating) resulting from the equatings.

Table 5

Scaled Score Summary Statistics Resulting from Application of Five Equating Methods
SAT-verbal Sections of Form 3BSA3

Form    N         Statistic   IRT Intact Form   IRT Intact Form        Intact Form   IRT Intact Form    IRT
                              (Direct Link)     (Calibration System)   Linear        (Concurrent Run)   Pre-equating

3BSA3   253,354   M           430.25            430.42                 431.42        431.54             440.39
                  S.D.        105.99            105.57                 106.53        105.86             110.55

The only other possible explanation for the differences between the Form 3BSA3 pre-equating and intact form comparison equating results has to do with the quality of the parameter estimates for the 85 3BSA3 items when they appeared in pretest form. In order for the equatings to be as discrepant as they are, the pretest and final form parameter estimates for certain of the items must be quite different. The following method was used to compare these two sets of parameter estimates in an attempt to locate those items for which the pretest parameter estimates were problematic. A mean absolute difference and a mean signed difference between the item response functions for each item, where the functions were generated using the pretest and the final form parameter estimates, were obtained. Using all individuals in the sample taking Form 3BSA3 when calibrated as an intact final form, the absolute difference and the signed difference between the item response functions for each person (i.e., value of θ) were obtained and then averaged across people. Items having the largest mean absolute difference and mean signed difference values were then located. The above analysis was also done for the two sets of Form 3ASA3 item parameter estimates so that the discrepancies between parameter estimates for 3ASA3, where the pre-equating results were more than acceptable, could be compared to the 3BSA3 discrepancies.
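The two discrepancy indices can be sketched for a single item as follows. The sketch assumes a three-parameter logistic item response function with D = 1.7 and takes the signed difference as pretest minus final form; the source does not state the direction of subtraction. The function names, parameter values, and ability values are hypothetical.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic probability of a correct response (assumed model)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def irf_differences(pretest, final, thetas):
    """Mean absolute and mean signed difference between two item response functions.

    pretest and final are (a, b, c) estimates for the same item; thetas holds one
    ability estimate per examinee in the final form calibration sample.
    Signed difference is pretest minus final form (assumed convention).
    """
    diffs = [p3pl(t, *pretest) - p3pl(t, *final) for t in thetas]
    n = len(diffs)
    mean_abs = sum(abs(d) for d in diffs) / n
    mean_signed = sum(diffs) / n
    return mean_abs, mean_signed

# Hypothetical item that looked harder in the pretest (larger b).
mean_abs, mean_signed = irf_differences((1.0, 0.5, 0.2), (1.0, 0.0, 0.2),
                                        [-1.0, 0.0, 1.0])
```

Under this sign convention a negative mean signed difference marks an item that was harder in the pretest than in the final form. Averaging over examinees rather than over an evenly spaced grid weights the comparison toward the ability range where the sample actually falls.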

Using the mean absolute and mean signed differences between the item response functions as criteria for selection of problematic items, thirteen items from 3BSA3 and twelve from 3ASA3 were identified. Upon inspection of these two subsets of problem items, they were found to differ considerably in characteristics. Of the thirteen items identified for Form 3BSA3, eleven were reading comprehension items. Of the eleven, four were based on the same passage and three on another passage. The remaining four reading comprehension items were single items based on four different passages. Of the twelve items identified for Form 3ASA3, four were reading comprehension items (two from one passage, the other two from two other passages different from the first), four were antonym items, three were analogies, and one was a sentence completion item. Of the thirteen 3BSA3 items identified, twelve were more difficult when given in a pretest than in the final form. Of the twelve 3ASA3 items, seven were more difficult when given in a pretest and five when given as part of the intact final form.

Upon closer inspection of the eleven reading comprehension items from 3BSA3 exhibiting large absolute and signed differences in item response functions, it was found that nine of these items came from pretests in which the passage they were linked to was located in the final position of the pretest section. For all but one of these items, the item as it appeared in the pretest was more difficult, sometimes considerably more, than when it appeared in the final form. For the lone exception, a word was deleted from the correct response (the only such occurrence on either 3ASA3 or 3BSA3) between the time when the item was given in a pretest and the time it appeared in the final form. This word also appeared as a key word in the text of the passage and it may be hypothesized that it acted as a "clue" to the correct response, thus explaining the increase in difficulty upon removal of the word from the correct response when the item appeared in the final form. Figure 10 contains plots of the item response functions based on pretest and

[Figure 10 plots not reproduced. One panel per item, titled by item type ("Reading Comprehension Item 1" and so on, followed by an analogy item and an antonym item), plotting probability against theta from -3 to 3 for the pretest and final form item response functions. Each panel lists the pretest and final form a, b, and c estimates, which are not legible in this copy.]

Figure 10: Plots of item response functions based on pretest and intact final form item parameter estimates for thirteen problematic Form 3BSA3 items.

intact final form parameter estimates for the thirteen problematic 3BSA3

items, identified by item type. For the reading comprehension items

(numbered 1-11), items numbered 1-4 are all based on the same reading

passage, as mentioned earlier, and items.numbered 5-7 are based an the

other passage discussed. Reading comPrehensicn item nuMbet 11 is the

item in which the word was deleted from the correct response betWeed

When the item was given in pretest and in final form; The remaining two

problematic items are presented after the reading comprehension items;

one of the items is an analogy item and the other is an antonym item.
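The kind of comparison plotted in Figure 10 can be sketched in a few lines of Python. This is only an illustration, assuming the three-parameter logistic form of the item response function with the conventional scaling constant D = 1.7; the function names (`p_correct`, `max_gap`) and the parameter values are hypothetical and are not taken from the figure.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed here)


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def max_gap(pretest, final, lo=-3.0, hi=3.0, steps=61):
    """Largest absolute difference between two curves on a theta grid."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(abs(p_correct(t, *pretest) - p_correct(t, *final)) for t in grid)


# Hypothetical (a, b, c) estimates for one item; the pretest estimate of b
# is higher, i.e. the item looks harder in the pretest than in the final form.
pretest_est = (0.8, 0.6, 0.15)
final_est = (0.8, 0.1, 0.15)

gap = max_gap(pretest_est, final_est)
print(f"largest vertical gap between the two curves: {gap:.3f}")
```

A summary such as `max_gap` is one simple way to flag items whose pretest and final form curves diverge; it is not the criterion used in this study to identify the thirteen items.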

Upon inspection of the four problematic reading comprehension items

identified in Form 3ASA3, it was determined that, exactly like the

situation for Form 3BSA3, the items were located in pretests where the

passage they were linked to was located last in the pretest section.

All four items were also more difficult when they appeared in the

pretest section than in the final form.

On the basis of the above data, it may be hypothesized that either

something approaching a "fatigue factor" is being exhibited in the

responses of candidates to reading comprehension items based on passages

located at the end of pretest sections or, because of lack of time,

random responses are being supplied by these candidates to certain of

the questions based on this last passage. In either case, the items are

more difficult in the pretest than they are in the final form, where, due

to passage location, a fatigue factor or the supplying of random

responses is not occurring. The data on the reading comprehension items

from both 3BSA3 and 3ASA3 are consistent with this statement.


If the above is happening to pretest reading comprehension items

based on passages located at the end of pretest sections, one might be

concerned about whether there are large discrepancies in parameter

estimates between pretest and final form for reading comprehension items

in the intact final form based on passages at the end of the SAT-verbal

45-item and 40-item sections. The 45-item SAT-verbal sections do not

end with reading comprehension items, but the 40-item sections do. For

3ASA3, the passage upon which the last set of reading comprehension

items (items 36-40) were based was also located in the final position in

the pretest. Two of the five items demonstrated discrepancies large

enough to be included in the overall set of twelve items discussed

earlier. For 3BSA3, the passage upon which the last set of reading

comprehension items (items 36-40) were based was not located at the end

of the pretest section. It was, however, the only reading comprehension

passage in the pretest, and one of these last five items (36-40) in

3BSA3 did exhibit large discrepancies in pretest-final form parameter

estimates. Thus it would appear that while the outcome in terms of

parameter estimate discrepancies for reading comprehension items located

at the end of SAT-verbal sections is not as clear cut as for comparable

pretest reading comprehension items, there is still cause for concern.

The effect on equating of having, in particular, reading

comprehension pretest item difficulties estimated to be lower than they

are when estimated on intact final form data is predictable, and

demonstrated in the Form 3BSA3 pre-equating results. If the same items

are more difficult in the first "test" (made up of pretest items) than

in the second (made up of the items in the intact final form), then the


same raw score on both "tests" should result in a higher scaled score on

the first "test" than the second. This appears to be exactly what is

happening with the 3BSA3 pre-equating results. For 3ASA3, on the other

hand, there is more of a balancing effect of the discrepancies between

pretest and intact final form parameter estimates and the result is that

the pre-equating and the intact final form comparison IRT equatings more

closely coincide. One might still be concerned, however, about the fact

that any discrepancies at all showed up between the pretest and intact

final form parameter estimates, particularly for 3ASA3, where the

problem with final passage reading comprehension items was minimal.

A study by Cook, Eignor, and Petersen (1982), examining the stability over

time of intact final form SAT-verbal (and other testing data) parameter

estimates, can be used to address this issue. The magnitudes of the

discrepancies found in the Cook et al. (1982) study, based on the same

intact final form items given on two occasions, were of the magnitudes

of the discrepancies found in this study. Hence, the lack of parameter

invariance demonstrated by certain of the items in this study may not be

such a serious cause for concern; this will be discussed further in the

conclusions section.
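The direction of the effect described above, in which the same raw score corresponds to a higher score on the "test" whose items are more difficult, can be illustrated with a small sketch of the first step of IRT true-score equating. The sketch assumes the three-parameter logistic model with D = 1.7 and finds, by bisection, the ability at which the sum of the item response functions equals a given score. All item parameters are hypothetical, the helper names are illustrative, and the sketch stops at the ability scale rather than carrying through to scaled scores.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed here)


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def true_score(theta, items):
    """Test characteristic curve: expected number-right score at theta."""
    return sum(p_correct(theta, a, b, c) for a, b, c in items)


def theta_for_score(score, items, lo=-6.0, hi=6.0, tol=1e-8):
    """Ability at which the expected score equals `score` (bisection).

    Valid only for scores between true_score(lo) and true_score(hi);
    the curve is increasing in theta when every a is positive.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if true_score(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical three-item "tests": same items, but each pretest b estimate
# is 0.4 higher (more difficult) than the final form estimate.
final_items = [(0.8, -0.5, 0.15), (1.0, 0.0, 0.15), (0.7, 0.5, 0.15)]
pretest_items = [(a, b + 0.4, c) for a, b, c in final_items]

theta_final = theta_for_score(2.0, final_items)
theta_pre = theta_for_score(2.0, pretest_items)
print(f"theta for a score of 2.0: final {theta_final:.3f}, pretest {theta_pre:.3f}")
```

Because every difficulty is shifted by the same amount in this toy case, the pretest-based ability is exactly 0.4 higher; with real items the discrepancies differ from item to item and may partly cancel, as appears to have happened for 3ASA3.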

Additional Equating Results

In concluding this section, one other noteworthy result should be

mentioned; this follows from a comparison of the intact form calibration

system IRT equating to the intact form direct link and intact form

concurrent IRT equatings. It would appear, based upon the equatings

done, that the equating is more than adequate when done through the

indirect linking of the new and old forms used for equating via the

overall calibration system. That is, even though in this situation the

forms to be equated are linked indirectly through intervening LOGIST

runs, and parameter estimates placed on a scale defined by the ability

distribution of the sample taking a form not used in the equatings, the

quality of the equatings is comparable to that resulting from either

linking the new and old forms directly (direct link equating) or

calibrating all data concurrently so that new and old form parameter

estimates are automatically on the same scale. This has important

implications for the future construction of a large pool of calibrated

items and test forms to be used in intact final form IRT equating of

SAT-verbal.
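Why an indirect link through intervening calibration runs can, in principle, place estimates on the same scale as a direct link may be seen in a short sketch. It assumes that each link in the chain is a linear transformation of the ability scale, theta* = A * theta + B, under which a three-parameter logistic item's estimates become a / A, A * b + B, and c. The constants and helper names below are hypothetical; in practice each A and B is estimated with error, so a chain of links accumulates estimation error that this sketch ignores.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed here)


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def rescale_item(item, link):
    """Move (a, b, c) to the scale defined by theta* = A * theta + B."""
    a, b, c = item
    A, B = link
    return (a / A, A * b + B, c)


def compose(first, second):
    """Single linear link equivalent to applying `first`, then `second`."""
    A1, B1 = first
    A2, B2 = second
    return (A2 * A1, A2 * B1 + B2)


# Hypothetical linking constants for two successive calibration runs.
link_1 = (1.10, -0.20)
link_2 = (0.95, 0.30)
item = (0.8, 0.5, 0.15)

stepwise = rescale_item(rescale_item(item, link_1), link_2)
direct = rescale_item(item, compose(link_1, link_2))
print(stepwise, direct)
```

Rescaling step by step and rescaling once with the composed link give the same estimates, and the probability of a correct response is unchanged when theta is transformed along with the item parameters.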

Conclusions

The results of pre-equating the two forms of SAT-verbal reported on

in this study, when compared to the intact final form IRT equating,

varied considerably, ranging from acceptable for Form 3ASA3 to only

marginally acceptable for Form 3BSA3. Reasons for the inferiority of

the Form 3BSA3 pre-equating results, having to do with the location of

reading passages and reading comprehension items at the end of pretest

sections, have been advanced and discussed. The overall results of the

pre-equating of the two forms of SAT-verbal were deemed sufficiently

promising that an investigation of pre-equating two forms of

SAT-mathematical is presently being undertaken. Data from the second

study, when considered with the data from this study, should supply

information necessary for the consideration of IRT pre-equating of the

SAT on a regular operational basis. The results reported here also have

clear implications for changes in test development practice, having to

do with the positioning of pretest and final form reading comprehension

items and making "minor" changes in the wording of items between pretest

and final form, if pre-equating the SAT-verbal section is to become a

reality.

On a more general level, the results of this study indicate that the

IRT item parameter estimates generated for certain items when given in

pretest form did not remain invariant when given in intact final form.

Based on the results of recent studies, particularly Cook, Eignor, and

Petersen (1982), parameter invariance for all items in a test form would

not be expected to be the case. The real issue is whether the lack of

parameter invariance is serious enough to cause one to dismiss the use

of item response theory for the particular application of concern. The

application in this study is pre-equating and the results, particularly

as they pertain to SAT-verbal Form 3ASA3, suggest that even though there

is some lack of parameter invariance, IRT pre-equating may be a

reasonable equating method for the SAT-verbal section.


References

Angoff, W. H. Test reliability and effective test length. Psychometrika,

1953, 18, 1-14.

Angoff, W. H. Scales, norms and equivalent scores. In R. L. Thorndike

(Ed.), Educational Measurement (2nd ed.). Washington, DC: American

Council on Education, 1971.

Bejar, I. I., and Wingersky, M. S. A study of pre-equating based on item

response theory. Applied Psychological Measurement, 1982, 6, 309-325.

Cook, L. L., and Eignor, D. R. Practical considerations regarding the use

of item response theory to equate tests. In R. K. Hambleton (Ed.),

Applications of item response theory. Vancouver, B.C.: Educational

Research Institute of British Columbia, 1983.

Cook, L. L., and Petersen, N. S. Item response theory equating for the

SAT: New designs and directions. Proposal submitted to College

Board-ETS Joint Staff Research and Development Committee, 1982.

Cook, L. L., Eignor, D. R., and Petersen, N. S. A study of the temporal

stability of IRT item parameter estimates. A paper presented at the

annual meeting of AERA, New York, 1982.

Kingston, N. M., and Dorans, N. J. The feasibility of using item response

theory as a psychometric model for the GRE Aptitude Test. RR-82-12.

Princeton, NJ: Educational Testing Service, 1982.

Lord, F. M. Estimation of latent ability and item parameters when there

are omitted responses. Psychometrika, 1977, 14, 117-138.

Lord, F. M. Applications of item response theory to practical testing

problems. Hillsdale, NJ: Erlbaum, 1980.

Petersen, N. S., Cook, L. L., and Stocking, M. L. IRT versus conventional

equating methods: A comparative study of scale stability. Journal

of Educational Statistics, in press.

Stocking, M. L., and Lord, F. M. Developing a common metric in item

response theory. RR-82-25-ONR. Princeton, NJ: Educational Testing

Service, 1982.

Wingersky, M. S. LOGIST: A program for computing maximum likelihood

procedures for logistic test models. In R. K. Hambleton (Ed.),

Applications of item response theory. Vancouver, B.C.: Educational

Research Institute of British Columbia, 1983.

Wingersky, M. S., Barton, M. A., and Lord, F. M. LOGIST V user's guide.

Princeton, N.J.: Educational Testing Service, 1982.
