
Page 1: STEPS IN TEST DEVELOPMENT & STANDARDIZATION (PART 1)

STEPS IN TEST DEVELOPMENT & STANDARDIZATION (PART 1)

Dr. Meenakshi Shukla

Assistant Professor

Department of Psychology

Magadh University

Bodh Gaya

Page 2

STEP 1: OVERALL PLAN

• Every testing program needs some type of overall plan.

• The first major decisions are: What construct is to be measured? What score interpretations are desired? What test format, or combination of formats (selected response or constructed response/performance), is most appropriate for the planned assessment? What test administration modality will be used (paper-and-pencil or computer-based)?

• One needs to know exactly how and when to begin the program or process, in what sequence tasks must be accomplished, which tasks depend on the successful completion of other tasks, what timeline must be adhered to, who is responsible for carrying out which specific tasks, and how to quality-control all major aspects of the testing program, plus literally hundreds of other issues, decisions, tasks, and operational details.

Page 3

• Step 1, the overall plan, places a systematic framework on all major activities associated with the test development project, makes explicit many of the most important a priori decisions, puts the entire project on a realistic timeline, and emphasizes test security and quality control issues from the outset.

• Many other fundamental decisions must be made as part of an overall test development plan, including:

▪ Who creates test items for selected-response tests, or prompts or stimuli for performance tests?

▪ Who reviews newly written test items, prompts, or other test stimuli?

▪ How is the item or prompt production process managed, and on what timeline?

▪ Who is responsible for the final selection of test items or prompts?

▪ Who produces, publishes, or prints the test?

Page 4

▪ How is test security maintained throughout the test development sequence?

▪ What quality controls are used to ensure accuracy of all testing materials?

▪ When and how is the test administered, and by whom?

▪ Is the test a traditional paper-and-pencil test or a computer-based test?

▪ If required, how is the cut score or passing score established, and by what method?

▪ Who scores the test, and how are the scores reported to examinees?

▪ Who maintains an item bank or item pool of secure test items or performance prompts?

▪ What are the key dates on the timeline of test development to ensure that all major deadlines are met?

▪ Who is responsible for the complete documentation of all the important activities, data results, and evaluation of the test?

Page 5

STEP 2: CONTENT DEFINITION

• One of the most important questions to be answered in the earliest stages of test development is: What content is to be tested? No other issue is as critical, in the earliest stages of developing effective tests, as delineating the content domain to be sampled by the examination. If the content domain is ill defined or not carefully delineated, no amount of care taken with other test development activities can compensate for this inadequacy.

• Define the content domain operationally, delineate clearly the construct to be measured, and successfully implement procedures to systematically and adequately sample the content domain.

Page 6

• Content-defining methods vary in rigor, depending on the purpose of the test, the consequences of decisions made from the resulting test scores, and the amount of defensibility required for any decisions resulting from test scores. For some lower-stakes achievement tests, the content-defining methods may be very simple and straightforward, such as instructors making informal (but informed) judgments about the appropriate content to test. For other very high-stakes examination programs, content definition may begin with a multiyear task or job analysis, costing millions of dollars and requiring the professional services of many testing professionals.

• For high-stakes achievement examinations, test content-defining methods must be systematic, comprehensive, and defensible. For instance, a professional school may wish to develop an end-of-curriculum comprehensive achievement test covering the content of a two-year curriculum, with a passing score on this test required to continue on to a third year of professional education. In this example, rigorous and defensible methods of content definition and delineation of the content domain are required; all decisions on content, formats, and methods of content selection become essential aspects of validity evidence.

Page 7

STEP 3: TEST SPECIFICATIONS: BLUEPRINTING THE TEST

• Test specifications refer to a complete operational definition of test characteristics, in every major detail, and thus include what some authors call the test blueprint. For example, at a minimum, the test specifications must describe:

(1) the type of testing format to be used (selected response or constructed response/performance);

(2) the total number of test items (or performance prompts) to be created or selected for the test, as well as the type or format of test items (e.g., multiple choice, three-option, single-best answer);

(3) the cognitive classification system to be used (e.g., modified Bloom's taxonomy with three levels);

(4) whether or not the test items or performance prompts will contain visual stimuli (e.g., photographs, graphs, charts);

(5) the expected item scoring rules (e.g., 1 point for correct, 0 points for incorrect, with no formula scoring);

(6) how test scores will be interpreted (e.g., norm or criterion referenced); and

(7) the time limit for each item.
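Specifications of this kind can be captured as a simple data structure that downstream tooling can check an assembled form against. The sketch below is purely illustrative: the field names, values, and the `matches_blueprint` helper are assumptions mirroring the seven points above, not part of the lecture.

```python
# A hypothetical test blueprint as a data structure; every value here is
# invented, chosen only to mirror the seven specification points above.
test_specifications = {
    "format": "selected response",                      # (1) testing format
    "num_items": 100,                                   # (2) total item count
    "item_type": "multiple choice, three-option, single-best answer",
    "cognitive_levels": ["recall", "application", "problem solving"],  # (3)
    "visual_stimuli": True,                             # (4) photos, graphs, charts
    "scoring_rule": {"correct": 1, "incorrect": 0, "formula_scoring": False},  # (5)
    "score_interpretation": "criterion referenced",     # (6)
    "time_limit_per_item_seconds": 60,                  # (7)
}

def matches_blueprint(form_items, spec):
    """Hypothetical check that an assembled form has the blueprinted length."""
    return len(form_items) == spec["num_items"]

print(matches_blueprint(list(range(100)), test_specifications))  # True
```

In practice a real blueprint also specifies content-area weights, so such a check would verify item counts per content category as well as the total.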

Page 8

STEP 4: ITEM DEVELOPMENT

• This step concentrates on a discussion of methods used to systematically develop selected-response items.

• Creating effective test items may be more art than science, although there is a solid scientific basis for many of the well-established principles of item writing. The creation and production of effective test questions, designed to measure important content at an appropriate cognitive level, is one of the greater challenges for test developers.

• Early in the test development process, the test developer must decide what test item formats to use for the proposed examination. For most large-scale, cognitive achievement testing programs, the choice of an objectively scorable item format is almost automatic. The multiple-choice format (and its variants) is the item format of choice for most testing programs.

• The multiple-choice item is an extremely efficient format for examinees, but is often a challenge for the item writer.

Page 9

• The principles of writing effective, objectively scored multiple-choice items are well established, and many of these principles have a solid basis in the research literature. Yet knowing the principles of effective item writing is no guarantee of an item writer's ability to actually produce effective test questions.

• Thus, one of the more important validity issues associated with test development concerns the selection and training of item writers. For large-scale examinations, many item writers are often used to produce the large number of questions required for the testing program. The most essential characteristic of an effective item writer is content expertise. Writing ability is also a trait closely associated with the best and most creative item writers.

Page 10

STEP 5: TEST DESIGN AND ASSEMBLY

• Assembling a collection of test items (or performance prompts) into a test or test form is a critical step in test development.

• The specific method and process of assembling test items into final test forms depends on the mode of examination delivery.

• If a single test form is to be administered in paper-and-pencil mode, the test can be assembled manually by skilled test developers (perhaps using computer software to assist manual item selection and assembly).

• If multiple "parallel" test forms are to be assembled simultaneously, human test developers using advanced computer software can assemble the tests.

• If the test is to be administered as a computer-based test, more specialized computer software will likely be needed to assemble multiple test forms and to ensure proper formatting of the fixed-length test form for the computer-delivery software.
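As a toy illustration of software-assisted assembly, the sketch below greedily picks items from a simulated bank so that the form's mean difficulty tracks a target value. This is a deliberate simplification under invented data: real assembly software must also balance content areas, cognitive levels, and item exposure, which this sketch ignores.

```python
# Hypothetical sketch of automated form assembly. Item difficulties
# (p-values) are randomly generated for illustration only.
import random

random.seed(0)
item_bank = [{"id": i, "p": random.uniform(0.2, 0.95)} for i in range(200)]

def assemble_form(bank, length, target_p):
    """Greedy pick: repeatedly take the item whose p-value keeps the
    running mean difficulty of the form closest to the target."""
    remaining = list(bank)
    form = []
    for _ in range(length):
        running = sum(x["p"] for x in form)
        best = min(remaining,
                   key=lambda it: abs((running + it["p"]) / (len(form) + 1) - target_p))
        form.append(best)
        remaining.remove(best)
    return form

form = assemble_form(item_bank, 50, target_p=0.6)
mean_p = sum(it["p"] for it in form) / len(form)
print(round(mean_p, 2))  # mean difficulty lands close to the 0.6 target
```

Production systems typically frame this as a constrained optimization problem (automatic test assembly) rather than a greedy loop, but the goal of meeting blueprint targets is the same.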

Page 11

• If the test is to be administered as a computer-adaptive test, very advanced computer software (automatic test assembly software) will likely be required to adequately manage and resolve multiple test item characteristics simultaneously, in order to create many equivalent test forms.

• Other major considerations in test assembly, at least for traditional paper-and-pencil tests, relate to formatting issues. Tests must be formatted to maximize the ease of reading and minimize any additional cognitive burden that is unrelated to the construct being tested, such as formatting items so that the entire item, together with any visual or graphical stimuli, appears on the same page (or frame).

• Other formatting issues are more psychometric in nature. For example, the placement of pretest (tryout) items within the test form is a formatting issue of interest and concern. Ideally, pretest items are scattered randomly throughout the test form, to minimize any effects of fatigue or lack of motivation.

Page 12

STEP 6: PRETESTING AND ITEM ANALYSIS

• Pretesting involves administering the items to a preliminary sample of subjects; the subjects in this group should be representative of the population of subjects for whom the test itself is intended.

• Next, the items must be refined. Refining the items means eliminating items that do not have the properties we had hoped for and selecting items that have particularly desirable properties, through item analysis (to find the item difficulty and item discrimination) and expert judgment (to get information on the appropriateness of the test items).

• Item analysis consists of statistical analyses of the data produced when test takers respond to test items, conducted for the purpose of providing information about the items rather than the test takers.

• Item analysis provides three kinds of important information about the items: difficulty, discrimination, and differential item functioning (DIF).

Page 13

• Difficulty is exactly what the term implies: how hard the item is. For most tests, the test developers need to know which test items (if any) are so hard that almost none of the test takers can answer them correctly, and which items (if any) are so easy that nearly all the test takers can answer them correctly.

• Knowing the difficulty of the items helps the test makers avoid making a test so hard or so easy that it fails to provide much information about individual test takers. Also, an item that proves to be much harder or much easier than anticipated may be flawed in some way.

• An unexpectedly hard item may be ambiguous, or it may have a wrong answer option (a distractor) that is too nearly correct. An unexpectedly easy item may contain some kind of information that makes the correct answer apparent even to test takers who do not have the knowledge the item is intended to test.
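In classical item analysis, difficulty is usually computed as the proportion of test takers who answer the item correctly (the p-value), so higher p means an easier item. A minimal sketch, with an invented response matrix (1 = correct, 0 = incorrect):

```python
# Classical item difficulty (p-value): proportion of test takers
# answering the item correctly. The response matrix is invented.
responses = [  # rows = test takers, columns = items
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
]

def difficulty(responses, item):
    """Proportion of test takers who answered the given item correctly."""
    scores = [row[item] for row in responses]
    return sum(scores) / len(scores)

for j in range(4):
    print(f"item {j}: p = {difficulty(responses, j):.2f}")
# item 0 comes out easy (p = 0.80) and item 2 hard (p = 0.20)
```

An item with p near 1.0 or near 0.0 tells the developer little about individual differences, which is the statistical form of the point made above.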

Page 14

• Discrimination is the tendency of the item to be answered correctly by test takers who are generally strong in the skills or type of knowledge the item is intended to measure, and to be answered incorrectly by test takers who are not. To evaluate the discriminating power of the item, it is necessary to have a measure of the test takers' proficiency in those skills or that type of knowledge. This measure is the item analysis criterion, or simply the criterion.

• Usually, it is the test taker's score on the full test or on a portion of the test. An item that does not discriminate between test takers who are strong on the criterion and those who are weak on the criterion is likely to be a bad item. It may be ambiguous or misleading. It may have wrong answer options that are too nearly correct. It may test an obsolete skill or point of knowledge.

• On the other hand, it may be a perfectly good item that measures a specific skill or point of knowledge that happens to be known by many test takers who are not especially strong in the other skills or knowledge the test measures.
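One classical way to quantify this is the upper-lower discrimination index: split test takers by total score (the criterion) and subtract the lower group's proportion correct from the upper group's; point-biserial correlation with the total score is a common alternative. The sketch below uses an invented response matrix:

```python
# Upper-lower discrimination index: difference in proportion correct
# between the top- and bottom-scoring halves on the total test score
# (the criterion). The response matrix is invented for illustration.
responses = [  # rows = test takers, columns = items (1 = correct)
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

def discrimination_index(responses, item):
    """p(correct) in the upper half minus p(correct) in the lower half,
    where halves are formed by ranking on total score."""
    totals = [sum(row) for row in responses]
    ranked = sorted(range(len(responses)), key=lambda i: totals[i], reverse=True)
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[half:]
    p_upper = sum(responses[i][item] for i in upper) / len(upper)
    p_lower = sum(responses[i][item] for i in lower) / len(lower)
    return p_upper - p_lower

print(discrimination_index(responses, 0))  # 1.0: strong takers got it, weak did not
```

A value near 0 (or negative) flags the item for the kind of close review described above.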

Page 15

• DIF stands for differential item functioning. DIF is the tendency of an item to function differently in different groups of test takers, groups defined by something other than their proficiency in the subject of the test. An item shows DIF against a group of test takers if it is particularly difficult for members of that group, more difficult than expected from the general performance of that group and the general difficulty of that item.

• The most common reason for DIF analysis is to identify and remove from the test any items that are particularly difficult for test takers who are members of specified demographic groups: women, African Americans, Asian Americans, Hispanic or Latino Americans, and Native Americans. However, DIF analysis can also be used to identify items that are particularly difficult (or particularly easy) for students who have attended a particular type of school or studied a particular type of curriculum.

• DIF is a secondary form of item analysis that is sometimes conducted after the primary item analysis, which focuses on the difficulty of the items and their discrimination between generally strong and generally weak test takers.
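The core idea behind DIF detection is matching: compare the item's proportion correct across groups within strata of equal total score, so that overall proficiency differences are controlled for. Formal procedures such as the Mantel-Haenszel statistic build on this same matched-stratum comparison. A minimal sketch, with all data invented:

```python
# Matched-group DIF sketch: within each total-score stratum, compare the
# proportion answering one item correctly in group A versus group B.
# A consistent gap across strata suggests the item functions differently.
from collections import defaultdict

# (group, item_score, total_score) for each test taker -- invented data
records = [
    ("A", 1, 3), ("A", 1, 3), ("A", 1, 2), ("A", 0, 2), ("A", 1, 1),
    ("B", 0, 3), ("B", 1, 3), ("B", 0, 2), ("B", 0, 2), ("B", 0, 1),
]

strata = defaultdict(lambda: {"A": [], "B": []})
for group, item, total in records:
    strata[total][group].append(item)

for total in sorted(strata):
    a, b = strata[total]["A"], strata[total]["B"]
    gap = sum(a) / len(a) - sum(b) / len(b)
    print(f"total={total}: p(A) - p(B) = {gap:+.2f}")
```

Here the gap is positive at every stratum, the pattern that would flag the item as showing DIF against group B even after matching on proficiency.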

Page 16

HOW ARE ITEM ANALYSIS RESULTS USED?

• The information provided by item analysis helps test developers select the items to be included in each form (edition) of the test, and identify items that need to be revised before they are included in any form of the test.

• Item analysis also serves as a quality-control step, a last chance to catch errors in the scoring key or items that should be excluded from the scoring of the test. (Statistics alone cannot determine which items on a test are good and which are bad, but statistics can be used to identify items that are worth a particularly close look.)

• Item analysis helps test developers decide which items from a current form of a test to use in a future form of the test. It helps the test developers identify items that might be substantially improved by revisions. And it helps the test developers learn what types of items tend to work well and what types tend not to work well in a particular type of test.

• The next step is to administer the revised test to a new sample of subjects.

Page 17

Thank you…