TRANSCRIPT
Test Development
Nicole Williams, MSN, RN-BC
Content Manager
Sarah Hagge, PhD
Psychometrician
Objectives
Identify the negative impact of examination bias
Discuss the impact of enemy items on test validity
Differential Item Functioning (DIF)
Because of the high-stakes nature of the NCLEX®, numerous processes are in place to ensure that the exam is psychometrically sound, valid and legally defensible
One such process includes regular review of the NCLEX for potential biases
Detecting Bias
Bias exists when the test construct measured in one group differs from the construct measured in another group taking the same exam
For example, bias would exist in the NCLEX if it measured nursing knowledge in one group of candidates and another construct, such as reading comprehension, in another
Consequence of Bias
The goal of the NCLEX is to classify candidates into two groups:
  Those who have adequate knowledge, skills and ability to practice entry-level nursing safely
  Those who do not
If bias occurs, the construct of entry-level nursing knowledge may not be measured accurately for some groups of candidates
Methods to Detect and Minimize Bias
Item Development
  Writing
  Review
    Editorial
    SME
    Sensitivity
Analyses
  Differential Item Functioning (DIF)
  Readability
What is DIF?
Investigates bias at the individual item level
Exists when two groups of candidates with similar ability perform differently on an item
In short, one may consider whether the candidate’s response to the item depends on the group to which he/she belongs
DIF Analyses
Statistical analyses are conducted on a focal vs. reference group
  Focal: group of interest (generally the minority)
  Reference: group with whom the focal group is compared (generally the majority)
Method
  Rasch Separate Calibration t-test
  Compares the difference in difficulty of an item for the focal and reference groups
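As a minimal sketch of the comparison described above: a separate calibration t-test is commonly computed as the difference between the two independently estimated item difficulties divided by their joint standard error. The slides do not give the exact formula used for the NCLEX, so the form below, the function name, the focal-minus-reference sign convention and the sample values are all assumptions made for illustration.

```python
import math

def separate_calibration_t(b_focal, se_focal, b_reference, se_reference):
    """t statistic for the difference between two item difficulty estimates.

    b_* are Rasch item difficulties (in logits) estimated separately in the
    focal and reference groups; se_* are their standard errors.
    Assumed convention: a positive t means the item was harder for the focal group.
    """
    joint_se = math.sqrt(se_focal ** 2 + se_reference ** 2)
    return (b_focal - b_reference) / joint_se

# Hypothetical estimates for one item (not NCLEX data).
t = separate_calibration_t(b_focal=0.80, se_focal=0.30,
                           b_reference=0.20, se_reference=0.10)
print(round(t, 2))
```

The focal group is usually much smaller than the reference group, so its standard error tends to dominate the denominator.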
NCLEX DIF Procedure
Routine DIF analyses are conducted semi-annually
Data include all U.S.-educated candidates
Focal and Reference Groups
Gender
  Reference: Female
  Focal: Male
Ethnicity
  Reference: Caucasian
  Focal: African American, Hispanic, Asian Other, Asian Indian, Native American and Pacific Islander
2010 U.S.-Educated NCLEX Candidates
78,222 PN candidates reported gender
164,175 RN candidates reported gender
74,147 PN candidates reported ethnicity
152,069 RN candidates reported ethnicity
Note: 22,008 candidates did not provide information regarding ethnicity; 5,827 candidates did not provide information on gender.
NCLEX DIF Procedure Continued
Analyses are conducted on all pretest and operational items
Minimum sample size requirements
  50 focal group candidates
  400 reference group candidates
Item difficulty is estimated for the two separate groups of candidates
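The sample size rule above can be sketched as a simple eligibility check that runs before any group-specific difficulty is estimated. The two minimums come from the slide; the function name and the example counts are hypothetical.

```python
MIN_FOCAL = 50        # minimum focal group candidates (from the slide)
MIN_REFERENCE = 400   # minimum reference group candidates (from the slide)

def meets_sample_size(n_focal, n_reference):
    """Return True when an item has enough responses in both groups for DIF analysis."""
    return n_focal >= MIN_FOCAL and n_reference >= MIN_REFERENCE

# Hypothetical response counts per item: (focal, reference).
counts = {"item_A": (62, 910), "item_B": (41, 1200), "item_C": (75, 380)}
eligible = [item for item, (nf, nr) in counts.items() if meets_sample_size(nf, nr)]
print(eligible)
```

In this sketch only item_A would go on to DIF analysis, because item_B falls short in the focal group and item_C falls short in the reference group.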
Content Review
Items with large differences in difficulty are flagged for content review
Items displaying statistical DIF may still be content appropriate and valid
  Item content may be within the scope of entry-level nurse practice
    Obstetrics and gynecology
    Operating medical equipment
Content Review Panel
Panel of subject matter experts (SMEs) convened to review items displaying statistical DIF
Panel composition must contain at least
  Five members
  Three ethnic focal groups
  One male
  One member with a background in linguistics
  One licensed RN
Content Review Panel Continued
Panel reviews all items flagged for statistical DIF in the past six months
  Potential bias
  Content relevance for entry-level nursing
Items identified for bias are forwarded to the NCLEX Examination Committee
Content-irrelevant items are removed from operational use
Sample Item #1
The nursing care plan for a 74-year-old resident of a long-term care facility includes actions to promote the quality and duration of the client’s nighttime sleep. Which of the following behaviors, if exhibited by the client, would indicate an appropriate action?
1. The client does mild calisthenics 1 hour before bedtime.
2. The client takes walks in the halls primarily in the afternoon.
3. The client takes naps from mid- to late afternoon.
4. The client drinks warm tea before bedtime.
Sample Item #2
The nurse is caring for a 9-year-old client with bronchial asthma who was admitted with pneumonia. The client is on bed rest. Which of the following would be most appropriate to offer the client?
1. Coloring book and crayons
2. A toy stethoscope and syringe with needle
3. Beads and thread for making jewelry
4. A radio and telephone
Conclusion
The goal of the NCLEX is to ensure public safety by classifying candidates based on whether they can practice entry-level nursing safely and effectively
Analyses such as DIF are conducted to ensure that all candidates receive an examination that accurately measures their entry-level nursing knowledge
Impact of Enemy Items
Effective item sampling from a specified test plan is essential to ensure that the exam is psychometrically sound, valid and legally defensible
One such process which assists in this endeavor is assessing and eliminating item duplication or enemy item pairs
How are Enemy Pairs Developed?
Random
  Occurs coincidentally in the normal process of item development
Direct Intent
  Items similar in nature are purposefully developed
What is an Enemy Item Pair?
Two or more items with very similar content that should not be placed on the same exam, because doing so would impair:
  Content validity
  Face validity
  Measurement precision
Content Validity
The consistency with which the content is represented on the exam may be impacted
The content domain may be considered “oversampled”
The impact is large on a standardized exam because a specific number of items is allocated to each content domain
Face Validity
Item duplication may cause the candidate to question exam validity
Candidate response may be altered due to the perception that the item is redundant
Candidate may become distracted, believing that the repeated item is a “trick”
Measurement Precision
Item duplication may result in what is called Conditional Dependence
The two or more items are most likely correlated
Two dependent areas are being sampled and may lead to errors in ability estimates
Types of Enemy Item Pairs
Duplicate
  Items
  Stems
  Options
  Stimuli
Overlapping Content
Duplicate Items
All item components are virtually identical
  True duplicates, same item except punctuation or other small differences
Duplicate Stems
Identical item stem and varying options
  May occur as a result of developing items used as “variants”
  Less likely to occur when developing authentic items from “scratch”
Duplicate Options
Similar stem and near-identical item options
  Is considered a cost-effective strategy used by test developers to increase item development productivity
With response options so similar, candidates may become confused
Duplicate Stimuli
Identical exam stimulus such as
  Graphics
  Exhibits
  Case scenarios
Using same stimuli across exam items may create candidate confusion
Candidate exposure to the same stimuli multiple times may introduce fatigue
Overlapping Content
Similar content exists in the items (stem or options); however, the verbiage is different
  Same concept, phrased differently
  Difficult to detect; precise effort should be employed to seek these out
  Can occur in differing item formats, e.g. multiple-choice and multiple response
Management of Enemy Item Pairs
Item Development Process
Test Publishing Efforts
Post Exam Administration
Item Development Enemy Management
Efforts placed at the beginning of item development to identify and label enemy pairs
Automated software is now available that can isolate potential enemy pairs
Subject Matter Experts (SMEs) then review potential enemy item pairs, making identification more precise
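The slides do not say how the automated software isolates potential enemy pairs. One simple way to illustrate the idea is word-overlap (Jaccard) similarity between item texts, with pairs above a cutoff sent to SMEs for review. The technique, the 0.6 cutoff, the function names and the item texts below are all assumptions for this sketch, not a description of the actual software.

```python
import re
from itertools import combinations

def tokens(text):
    """Lowercase word and number tokens of an item's text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(text_a, text_b):
    """Share of distinct tokens the two texts have in common (0.0 to 1.0)."""
    ta, tb = tokens(text_a), tokens(text_b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def candidate_enemy_pairs(items, threshold=0.6):
    """Return (id_a, id_b, score) for item pairs at or above the assumed cutoff."""
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(sorted(items.items()), 2):
        score = jaccard(text_a, text_b)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 2)))
    return pairs

# Hypothetical item stems, written only for this example.
items = {
    "A": "Which action promotes nighttime sleep for an older adult client?",
    "B": "Which action promotes nighttime sleep for an older adult resident?",
    "C": "Which toy is most appropriate for a 9-year-old client on bed rest?",
}
pairs = candidate_enemy_pairs(items)
print(pairs)
```

A surface measure like this would catch duplicate stems and options but would tend to miss overlapping content that is phrased differently, which is one reason SME review follows the automated step.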
Test Publishing Enemy Management
Once one or more enemy items are labeled, test developers can activate test driver specifications to prohibit the inclusion of an enemy item once one item in the enemy set has been selected
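A minimal sketch of the exclusion rule described above, assuming the test driver keeps a set of already administered items and a list of labeled enemy sets. The function name, data layout and item IDs are hypothetical; real test driver specifications are not described in the slides.

```python
def eligible_items(pool, administered, enemy_sets):
    """Items that may still be selected for this candidate.

    An item is blocked if it was already administered, or if any member of
    an enemy set it belongs to was already administered.
    """
    blocked = set()
    for enemy_set in enemy_sets:
        if enemy_set & administered:   # one member of the set has been selected
            blocked |= enemy_set       # so every member of the set is prohibited
    return [item for item in pool
            if item not in administered and item not in blocked]

# Hypothetical pool and enemy labels.
pool = ["I1", "I2", "I3", "I4", "I5"]
enemy_sets = [{"I1", "I3"}, {"I4", "I5"}]
remaining = eligible_items(pool, administered={"I1"}, enemy_sets=enemy_sets)
print(remaining)
```

Here I3 drops out once I1 has been administered, while I4 and I5 both stay available until one of them is selected.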
Post Administration Enemy Management
Test developers may analyze item intercorrelations
High intercorrelations may indicate potential enemy pairs
This method may capture the most obscure enemy pairs—those not immediately identifiable, least likely to impact test validity and measurement
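The post-administration check can be sketched as a pairwise correlation of scored responses, flagging pairs above a cutoff for review. This assumes every candidate in the sample answered every item being compared, which would not hold in general for an adaptive exam; the Pearson coefficient, the 0.7 cutoff and the response data are also assumptions for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length score lists (0 = wrong, 1 = right)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0   # no variation, so correlation is undefined; treated as 0 here
    return cov / math.sqrt(var_x * var_y)

def highly_correlated_pairs(responses, threshold=0.7):
    """Item pairs whose correlation meets the assumed cutoff."""
    flagged = []
    for id_a, id_b in combinations(sorted(responses), 2):
        if pearson(responses[id_a], responses[id_b]) >= threshold:
            flagged.append((id_a, id_b))
    return flagged

# Hypothetical scored responses from the same eight candidates.
responses = {
    "I1": [1, 1, 0, 0, 1, 0, 1, 0],
    "I2": [1, 1, 0, 0, 1, 0, 1, 1],
    "I3": [0, 1, 1, 0, 0, 1, 1, 0],
}
flagged = highly_correlated_pairs(responses)
print(flagged)
```

A high correlation only marks a pair as a candidate; items can correlate for reasons other than duplicated content, so flagged pairs would still need content review.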
Future Research
DIF
  Investigate DIF using different reference/focal groups
Enemy Item Management
  Impact of various enemy pairs on test validity: does one type of enemy pair have a stronger/lesser impact on test validity and measurement?
References
Exam Publications
Ensuring Validity of NCLEX® With Differential Item Functioning Analysis
Understanding the Impact of Enemy Items on Test Validity and Measurement Precision