exploring the full-information bifactor model in vertical scaling with construct shift ying li and...
TRANSCRIPT
Exploring the Full-Information Bifactor Model in Vertical Scaling With Construct Shift
Ying Li and Robert W. Lissitz
contentscontents
1. Unidimensionality of tests at each grade level2. Test construct invariance across grade
╳
bifactor model
Vertical scaling
1. Computational simplicity for estimation
2. Ease of interpretation
3. Vertical scaling problems across grades
QuestionsQuestions
PurposePurpose
• to propose and evaluate a bifactor model for IRT vertical scaling that can incorporate construct shift across grades while extracting a common scale,
• to evaluate the robustness of the UIRT model in parameter recovery, and
• to compare parameter estimates from the bifactor and UIRT models
MethodMethod
• Bifactor Model• Gibbons and Hedeker
(1992) generalized the work of Holzinger and Swineford (1937) to derive a bifactor model for dichotomously scored item response data.
0: general factor or ability i0 : discrimination parameter for general factors:group special factor or ability is : discrimination parameter for group-special factordi :overall multidimensional item difficulty
simulationsimulation
• Common item design
• Bifactor model for generating data
• Concurrent calibration– Stable results
• Sample size• Number or percentage of common items
– Test length:60
• Variance of grade-specific factor: degree of construct shift
• 100 replication
Data generationData generation
Item discrimination parameters were set deliberately and repeatedly at 1.2, 1.4, 1.6, 1.8, 2.0, and 2.2 for the general dimension, and fixed at 1.7 for grade-specific
Identifications of Bifactor Model EstimationIdentifications of Bifactor Model Estimation
• For the general dimension, the variance of the general latent dimension was fixed to 1, and the discrimination parameters (loadings) were freely estimated in the study
• For the grade-specific dimensions (s = 1, 2, . . . , k), the discrimination parameters (s = 1, 2, .. . , k) (loadings) were fixed to the true parameter value 1.7, so that the variances of the grade-specific dimensions could be freely estimated
• the common items answered by multiple groups were restricted so that they would have unique item parameters in the multiple-group concurrent calibration
Model EstimationModel Estimation• Multigroup concurrent calibration was
implemented
• The computer program IRTPRO (Cai, Thissen, & du Toit, 2011), using marginal maximumlikelihood estimation (MML) with an EM algorithm, was used to estimate the models
Evaluation criteriaEvaluation criteria
RMSE = root mean square error; SE = standard error; SS = sample size; CI = common item; VR = variance of grade-specific factor.
ResultsResults
Person parameterPerson parameter
• person parameter estimates of the general dimension were better recovered than that of the grade-specific dimensions when the degree of construct shift was small or moderate
• sample size the estimation accuracy
Group parameterGroup parameter
overestimated
UIRTUIRT
• discrimination: overestimated
• Difficulty: well recovered
• construct shift person & group mean
ANOVA Effects for the Simulated FactorsANOVA Effects for the Simulated Factors• Three-way tests of between-subject effects (ANOVA)
– bias, RMSE, and SE
Sample size Degree of Construct shift
Percentage of common item
Bifactor model small~moderate
No~small Small Bias in d
UIRT small~moderate
Large : &Group Mean ability
Large bias in d
& d & SEGroup Mean abilitygrade-specific variance parameter
Large:SE
comparisoncomparison
• UIRT: overestimate discrimation parameter
• person and group mean parameter:
Less accurate
Real dataReal data
• 2006 fall Michigan mathematics assessments• Grade 3, 4, 5• Randomly 4000 examinees
• Bifactor vs UIRT
Variance estimationVariance estimation
R=0.983R=0.983
DiscussionDiscussion
• sample size the estimation accuracy & stability
• Variance of grade-specific dimension stability
• Be caution about construct shift
• Polytomounsly/mixed item format• Incorporate covariates • longitudinal studies
• Common item measure two group-specific ability?
• the item discriminate parameter fixed to the true value
• Multidimensional IRT?