a comparative study of the bias correction methods for ... · 2020. 3. 2. · mum likelihood...

12
Research Article A Comparative Study of the Bias Correction Methods for Differential Item Functioning Analysis in Logistic Regression with Rare Events Data Marjan Faghih, 1 Zahra Bagheri, 1 Dejan Stevanovic, 2 Seyyed Mohhamad Taghi Ayatollahi, 1 and Peyman Jafari 1 1 Department of Biostatistics, Faculty of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran 2 Clinic for Neurology and Psychiatry for Children and Youth, Belgrade, Serbia Correspondence should be addressed to Peyman Jafari; [email protected] Received 25 September 2019; Revised 19 December 2019; Accepted 20 January 2020; Published 25 February 2020 Academic Editor: Ali I. Abdalla Copyright©2020MarjanFaghihetal.isisanopenaccessarticledistributedundertheCreativeCommonsAttributionLicense, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. e logistic regression (LR) model for assessing differential item functioning (DIF) is highly dependent on the asymptotic sampling distributions. However, for rare events data, the maximum likelihood estimation method may be biased and the asymptotic distributions may not be reliable. In this study, the performance of the regular maximum likelihood (ML) estimation is compared with two bias correction methods including weighted logistic regression (WLR) and Firth’s penalized maximum likelihood (PML) to assess DIF for imbalanced or rare events data. e power and type I error rate of the LR model for detecting DIF were investigated under different combinations of sample size, moderate and severe magnitudes of uniform DIF (DIF 0.4 and 0.8), sample size ratio, number of items, and the imbalanced degree (τ). Indeed, as compared with WLR and for severe imbalanced degree (τ 0.069), there were reductions of approximately 30% and 24% under DIF 0.4 and 27% and 23% under DIF 0.8 in the power of the PML and ML, respectively. e present study revealed that the WLR outperforms both the ML and PML estimation methods when logistic regression is used to evaluate DIF for imbalanced or rare events data. 1. Introduction In psychological and educational tests, measurement in- variance is a crucial assumption for comparison of mean scores across people from different cultural, racial, or demographic backgrounds. Assessing this statistical property at the item level, also known as differential item functioning (DIF), is an important part of the process of validating tests. In general, DIF analysis is used to dis- tinguish whether the probability of responding to a specific item on a multi-item scale differs between two groups after controlling for the overall ability that is being measured by the a questionnaire [1]. Nowadays, different statistical methods including the logistic regression (LR) model, multiple group confirmatory factor analysis (MGCFA), and item response theory (IRT) model are available to assess the presence of DIF among subgroups of people [2, 3]. e LR is a model-based approach first introduced by Swaminathan and Rogers to assess DIF for both di- chotomous and polytomous item scored [4, 5]. e LR model is able to control additional continuous and cate- gorical confounders which may affect the results of DIF analysis. Furthermore, the LR model provides a number of effect size measures to quantify the magnitude of uniform and nonuniform DIF which may not be practically or clinically important [6–8]. Uniform DIF occurs when the difference in item response probabilities is constant across the scale. Nonuniform DIF is evident when the direction of DIF differs in different parts of the construct scale [9, 10]. Previous simulation studies have shown that identification of DIF through the LR model may be affected by various factors such as sample size, sample size ratio, magnitude of DIF, scale length (the number of items), and the number of groups [11–14]. Hindawi BioMed Research International Volume 2020, Article ID 1632350, 12 pages https://doi.org/10.1155/2020/1632350

Upload: others

Post on 07-Mar-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Comparative Study of the Bias Correction Methods for ... · 2020. 3. 2. · mum likelihood estimation method (PML) was originally developedbyDavidFirth[16]inordertoreducethesmall

Research ArticleA Comparative Study of the Bias Correction Methods forDifferential ItemFunctioningAnalysis in Logistic RegressionwithRare Events Data

Marjan Faghih1 Zahra Bagheri1 Dejan Stevanovic2 SeyyedMohhamadTaghi Ayatollahi1

and Peyman Jafari 1

1Department of Biostatistics Faculty of Medicine Shiraz University of Medical Sciences Shiraz Iran2Clinic for Neurology and Psychiatry for Children and Youth Belgrade Serbia

Correspondence should be addressed to Peyman Jafari pjbiostatgmailcom

Received 25 September 2019 Revised 19 December 2019 Accepted 20 January 2020 Published 25 February 2020

Academic Editor Ali I Abdalla

Copyright copy 2020Marjan Faghih et al+is is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

+e logistic regression (LR) model for assessing differential item functioning (DIF) is highly dependent on the asymptoticsampling distributions However for rare events data the maximum likelihood estimation method may be biased and theasymptotic distributions may not be reliable In this study the performance of the regular maximum likelihood (ML) estimation iscompared with two bias correction methods including weighted logistic regression (WLR) and Firthrsquos penalized maximumlikelihood (PML) to assess DIF for imbalanced or rare events data +e power and type I error rate of the LR model for detectingDIF were investigated under different combinations of sample size moderate and severe magnitudes of uniform DIF (DIF 04and 08) sample size ratio number of items and the imbalanced degree (τ) Indeed as compared with WLR and for severeimbalanced degree (τ 0069) there were reductions of approximately 30 and 24 under DIF 04 and 27 and 23 underDIF 08 in the power of the PML and ML respectively +e present study revealed that the WLR outperforms both the ML andPML estimation methods when logistic regression is used to evaluate DIF for imbalanced or rare events data

1 Introduction

In psychological and educational tests measurement in-variance is a crucial assumption for comparison of meanscores across people from different cultural racial ordemographic backgrounds Assessing this statisticalproperty at the item level also known as differential itemfunctioning (DIF) is an important part of the process ofvalidating tests In general DIF analysis is used to dis-tinguish whether the probability of responding to a specificitem on a multi-item scale differs between two groups aftercontrolling for the overall ability that is being measured bythe a questionnaire [1] Nowadays different statisticalmethods including the logistic regression (LR) modelmultiple group confirmatory factor analysis (MGCFA)and item response theory (IRT) model are available toassess the presence of DIF among subgroups of people

[2 3] +e LR is a model-based approach first introducedby Swaminathan and Rogers to assess DIF for both di-chotomous and polytomous item scored [4 5] +e LRmodel is able to control additional continuous and cate-gorical confounders which may affect the results of DIFanalysis Furthermore the LR model provides a number ofeffect size measures to quantify the magnitude of uniformand nonuniform DIF which may not be practically orclinically important [6ndash8] Uniform DIF occurs when thedifference in item response probabilities is constant acrossthe scale Nonuniform DIF is evident when the direction ofDIF differs in different parts of the construct scale [9 10]Previous simulation studies have shown that identificationof DIF through the LR model may be affected by variousfactors such as sample size sample size ratio magnitude ofDIF scale length (the number of items) and the number ofgroups [11ndash14]

HindawiBioMed Research InternationalVolume 2020 Article ID 1632350 12 pageshttpsdoiorg10115520201632350

However it should be noted that statistical inference basedon the logistic regression model is highly dependent on theasymptotic properties of the maximum likelihood estimator[9] Under the large sample situations the sampling distri-bution of the maximum likelihood (ML) estimators for thelogistic regression coefficients is asymptotically unbiased andnormal However in small samples it is well known that theML estimations may be biased and the asymptotic propertiesmay not hold [9 15] Accordingly a number of penalizationtechniques such as Firthrsquos method have been introduced tocorrect or reduce the small sample bias of theML estimators ofthe LR model [15 16] A new simulation study has recentlycompared the performance of the LR model based on max-imum likelihood (ML) and Firthrsquos penalized maximumlikelihood (PML) estimation methods in terms of empiricalpower and type I error rate to detect uniform and nonuniformDIF [9] +e results showed that as compared with PML theLR model based on asymptotic ML worked slightly better interms of statistical power although the difference in perfor-mance was not practically important [9]

In addition to the small sample rare events data isanother factor that can substantially influence the as-ymptotic properties of the ML estimators in the LR model[15 17] +e bias of prediction as well as bias in re-gression coefficients is a potential problem that may arisefrom the use of standard logistic regression in thepresence of rare events data Rare events data refer tooccurrences that take place much less frequently thanmore common events [18 19] According to King andZeng ordinary logistic regression can sharply underes-timate the probability of rare events [20] Hence theyproposed weighted logistic regression (WLR) to correctbias terms in regression coefficients and predictedprobability by applying a weighted likelihood function toestimate parameters under pure case-control sampling[20] Moreover a wide variety of penalization methodshave been suggested to resolve the problem of rare eventsdata in the LR model [15ndash17]

Although the effect of bias correction methods on theperformance of the LR model for detecting DIF has beenevaluated by Lee in a small sample [9] such an explanationhas never been provided for rare events data To fill this gapin the present simulation study the three inferentialmethods including WLR PML and ML are compared todiscover what the best correction method is to achieve theadequate power and type I error rate for detecting DIFwhen the response variable in the LR model is imbalancedor rare Hence in a comprehensive simulation study weinvestigate whether the statistical properties of the LRmodel for detecting DIF with and without applying biascorrection methods can be influenced by the degree ofrareness or imbalance data magnitude of DIF sample sizesample size ratio and the length of the scale across ref-erence and focal groups In addition to simulation we havealso used real data to validate and compare the effectivenessof the proposed methods for detecting DIF in practice

2 Methods

21 LogisticRegressionModel forDetectingDIF +epresenceof uniform and nonuniform DIF can be tested by comparingthree different logistic regression (LR) models as follows

Model 1 log it πi( 1113857 lnp yi 1( 1113857

p yi 0( 11138571113888 1113889 β1 + β2θ

Model 2 log it πi( 1113857 lnp yi 1( 1113857

p yi 0( 11138571113888 1113889 β1 + β2θ + β3G

Model 3 log it πi( 1113857 lnp yi 1( 1113857

p yi 0( 11138571113888 1113889 β1 + β2θ + β3G + β4θG

(1)

In these models the term θ is the observed ability of eachrespondent usually defined as total test score and G is anindicator variable representing group membership (refer-ence and focal groups for example male and female ingender variable) According to the abovementioned modelsthe presence of uniform and nonuniform DIF could bedetermined by comparing models 1 and 2 as well as models 2and 3 respectively In both cases the difference between minus 2log likelihoods of the models is compared to a χ2 distributionwith one degree of freedom

For nonuniform DIF items the direction of DIF differsacross latent constructs and the effect of DIF is naturallycancelled out at the level of latent constructs [21] which is themajor reason for focusing on uniform DIF in this study

22 Maximum Likelihood Consider the logistic regressionmodel

P yi 11113868111386811138681113868 xir B1113872 1113873 πi

eXiβ

1 + eXiβ

11 + eminus Xiβ

1

1 + exp minus 1113936kr1 βrxir1113872 1113873

so lnπi

1 minus πi

1113888 1113889 Xiβ

(2)

where (yi xir) xir (i 1 n r 2 k) and yi isin 0 1denotes a sample of n observations of binary outcomevariable y and the vector of independent covariates with 1 times

k dimensions Xi (1 xi2 xik) where xi1 1 is theconstant and β (β1 βk)prime +en ML estimates 1113954βr ofregression parameters and βr are obtained by solving thefollowing score equations

z ln L

zβr

equiv U βr( 1113857 1113944n

i1yi minus πi( 1113857xir 0 (3)

where

2 BioMed Research International

L(β) 1113945n

i1πi( 1113857

yi 1 minus πi( 11138571minus yi 1113945

n

i1

eXiβ

1 + eXiβ1113888 1113889

yi 11 + eXiβ

1113874 11138751minus yi

ln L(β) 1113944n

i1yi ln πi + 1 minus yi( 1113857ln 1 minus πi( 1113857( 1113857 1113944

n

i1yi ln

eXiβ

1 + eXiβ1113888 1113889 + 1 minus yi( 1113857ln

11 + eXiβ

1113874 11138751113888 1113889

(4)

Unfortunately there is no closed-form expression formaximizing the log likelihood of β +e ML estimations cantherefore be obtained using numerical optimization methodswhich start with a guess point and repeat to improve +eNewtonndashRaphson method is one of the most commonly usednumerical methods which needs theHessianmatrix as follows

z2 ln L

zβr2 1113944

n

i1

minus x2ire

Xiβ

1 + eXiβ( 11138572

⎛⎝ ⎞⎠ 1113944n

i1minus xir

2 πi 1 minus πi( 1113857( 11138571113872 1113873 (5)

If vi is defined as πi(1 minus πi) and V diag(v1 vn)then the Hessian matrix can be written as

H(β) minus XprimeVX (6)

+e information matrix is given by I(β) minus (z2 lnLzβ2) XprimeVX and the variance of β is then V(β)

I(β)minus 1 (XprimeVX)minus 1

23 Penalized Maximum Likelihood +e penalized maxi-mum likelihood estimation method (PML) was originallydeveloped by David Firth [16] in order to reduce the smallsample bias of maximum likelihood estimates For expo-nential family models this method corresponds to penali-zation of the likelihood by Jeffreysrsquo invariant prior [22]+us the penalized log likelihood for logistic regressiontakes the following form

ln L(β)lowast

ln L(β) + 5 ln|I(β)| (7)

where |I(β)| denotes the determinant of the Fisher infor-mation matrix evaluated at β Penalized maximum likeli-hood estimates for βr (r 1 k) are involved in calculating

z ln L(β)lowast

zβr

U βr( 1113857lowast

1113944n

i1yi minus πi + hi 05 minus πi( 1113857( 1113857xir 0

r 1 k

(8)

where the hirsquos represent the diagonal elements of the pe-nalized likelihood version of the standard ldquohatrdquo matrix

H V12

X XprimeVX( 1113857minus 1

XprimeV12 (9)

By this method the first-order term is removed from theTaylor series expansion of the bias of the ML estimatorwhich has negligible impacts in large samples but can besevere in small samples or rare events

Now Firth-type estimation 1113954β can be produced itera-tively by NewtonndashRaphson algorithmwith a starting value of1113954β

(0) 0 until convergence is obtained

1113954β(s+1)

1113954βs

+ Iminus 1 1113954β

s1113872 1113873U 1113954β

s1113872 1113873lowast (10)

where the superscript s is the sth iteration If the penalizedlog likelihood assessed at 1113954β(s+1) is less than that assessed at 1113954β

s

then 1113954β(s+1) is recalculated by step-halving +e PML esti-mations can also be created by carrying out theMLEmethodand splitting each main observation i into two new obser-vations having outcomes 1 minus yi and yi with weights hi2and 1 + (hi2) respectively +e new observations changethe score function for the ML method to (yi minus πi)(1+1113864

(hi2)) + (1 minus yi minus πi)hi2xir yi minus πi + hi(05 minus πi)1113864 1113865xir+erefore the breaking of each observation into a nonre-sponse and a response ensures that the finite PML estimatesalways exist [23 24]

+e standard error estimation is based on the root of theinverse diagonal elements of the Fisher information matrixwhich is approximated by minus (z2 ln Llowastzβ2)1113966 1113967

minus 1[16]

24 Weighted Logistic Regression +e weighted logistic re-gression (WLR) introduced by King and Zeng [20 25] is oneof the bias correction methods which considers two cor-rection steps +e first correction concerns the weights tomake up for the differences in the proportion of events in thesample and population So the following weighted log-likelihood should be maximized

ln Lw(β | yX) 1113944n

i1wi ln

eyiXiβ

1 + eXiβ1113888 1113889 (11)

where wi w1yi + w0(1 minus yi) w0 1 minus τ1 minus y w1 τywith y and τ that are the fraction of 1s in the sample andpopulation respectively in which the above weighted log-likelihood is obtained under pure case-control sampling asfollows

Suppose that the joint distribution of y and X in thesample is

fs(yX | β) Ps(X | y β)Ps(y) (12)

Yet since X is a matrix of exploratory variables thenPs(X | y β) P(X | y β) In the other words the condi-tional probabilities of X in the sample and population areequal However the conditional probability of the pop-ulation is

P(X | y β) f(yX | β)

P(y) (13)

But

f(yX | β) P(y |X β)P(X) (14)

BioMed Research International 3

Also by replacing and rearranging

fs(yX | β) Ps(X | y β)Ps(y) P(X | y β)Ps(y) f(yX | β)

P(y)Ps(y)

Ps(y)

P(y)P(y |X β)P(X)

HQ

P(y |X β)P(X) (15)

where H and Q represent the proportions in the sample andpopulation respectively +e likelihood is then

L(β) 1113945n

i1

Hi

Qi

P yi

1113868111386811138681113868 xi β1113872 1113873P xi( 1113857 (16)

+e log-likelihood for WLR can then be rewritten as

ln Lw(β | yX) 1113944n

i1

Qi

Hi

lnP yi

1113868111386811138681113868 xi β1113872 1113873 1113944n

i1

Qi

Hi

lneyiXiβ

1 + eXiβ1113888 1113889 1113944

n

i1wi ln

eyiXiβ

1 + eXiβ1113888 1113889 (17)

where wi QiHi +us the likelihood function must bemultiplied by the inverse of the fractions in order to obtain aconsistent estimator If the proportion of events in thepopulation is more than that in the sample then wi is morethan one and so the nonevent are given less weight and viceversa [26 27] +e weighting method has two seriousproblems that limit its application first the calculation ofthe standard errors through the information matrix is se-verely biased +is problem will be solved by using Whitersquosheteroscedasticity-consistent variance matrix [20]

Second slope coefficient is biased in rare events data Forsolving the second problem the bias in 1113954β can be estimated bythe following weighted least-squares expression

bias(1113954β) XprimeVX( 1113857minus 1

XprimeVξ (18)

If bias (1113954β) is the bias component and 1113954β is the uncorrectedcoefficient then the corrected coefficients 1113957β is 1113957β 1113954βminus

bias(1113954β) where

ξi 05Qii 1 + w1( 11138571113954πi minus w11113858 1113859 (19)

Also Qii are the diagonal elements of Q X(XprimeVX)minus 1

Xprime and V diag 1113954πi(1 minus 1113954πi)wi1113864 1113865+e variance matrix of 1113957β is

V(1113957β) n

n + k1113874 1113875

2V(1113954β) (20)

Since (n(n + k))2 lt 1 thenV(1113957β)ltV(1113954β) and so both thevariance and the bias are now diminished

+e second correction step is to adjust the underesti-mation of the probabilities when using the bias-correctedcoefficients in the logistic model By subtracting a correctionfactor Ci to the biased value of probability 1113957πi

πi 1113957πi minus Ci Ci 05 minus 1113957πi( 11138571113957πi 1 minus 1113957πi( 1113857X0V(1113957β)X0prime (21)

where V(1113957β) is the variance-covariance matrix X0 is a vectorof exploratory variables and πi is the approximate unbiasedestimator by means of a known τ When τ is unknown orpartially known King and Zeng introduced πi 1113957πi + Ci asthe approximate Bayesian estimator In this case the

researcher may specify an upper and lower bound for thepossible range of τ

Maximum Likelihood (ML) Penalized MaximumLikelihood (PML) and rare event logistic regression wereimplemented using the ldquoglmrdquo function and the ldquologistfrdquo Rpackage as well as the ldquorelogitrdquo function in the R packageZelig respectively [28 29]

25 Data Generation In this study an item response theorymodel for binary data was used to produce response data+e mathematical form of the IRT model is

pij Y 1 | θi( 1113857 eaj θi minus bj( 1113857

1 + eaj θi minus bj( 1113857 (22)

where pij(θ) is the probability of correct response for in-dividual i of item j aj denotes the item discriminationparameter bj is the item difficulty parameter and θi rep-resents the ability level for the ith individual In this study bj

(j 2 3 J) and θ parameters were simulated from thestandard normal distribution and aj (j 2 3 J) pa-rameters were random samples of a uniform distributionwithin the interval (1 2) [1]

In this simulation study five factors were varied samplesize magnitude of uniform DIF sample size ratio numberof items and the degree of rareness or imbalance +reesample sizes (N 200 600 1000) and three levels of samplesize ratio (R 1 2 3) were investigated+e sample size ratiobetween the focal and reference groups was set to 1 1 for theequal sample size conditions and 2 1 and 3 1 for the un-equal sample size conditions More specifically we createdconditions with nRnF 100100 67133 and 50150 for thesmall sample size (N 200) and nRnF 300300 200400and 150450 for the medium sample size (N 600) and nRnF 500500 333667 and 250750 for the large sample size(N 1000) Furthermore the two measures with 5 and 15items were simulated (I 5 15)

To generate rare events data we followed similar sce-narios and notations used by King and Zeng In order togenerate imbalanced or rare events data we applied the logitmodel log it(πi) β0 + β1xi πi p(Yi 1) by holding β1

4 BioMed Research International

constant (β1 1) while varying β0 In this case the values ofβ0 minus 3 and β0 minus 2 generate outcomes with the percentagesof ones which are equal to 69 and 156 respectively Itshould be noted that we generate the simulated data basedon the IRTmodel not the logistic regression Hence we needto show that the parameters of the logistic regression and theIRT model with binary responses are one-to-one corre-spondent In order to denote that both models have similarmathematical expressions equation (22) can be rearrangedas log it(πi) minus ajbj + ajθi By comparing this equation withthe logit function log it(πi) β0 + β1xi coefficients β0 andβ1 and the variable xi in the logistic regression model arefound to be respectively correspondent to parameters minus ajbjaj and θi in the IRTmodel (ie β0 ndashajbj β1 aj and xi θi)With the assumption of aj β1 1 the constant parameter(β0) in the logistic regression is equivalent to the minus valueof the difficulty parameter (minus bj) in the IRT model Ac-cordingly in the IRT framework and when aj 1 with thevalues of bj 3 and 2 we can generate responses with 69and 156 imbalanced degree respectively

Statistical power is defined by the proportion of timesthat DIF is correctly identified by the logistic method acrossreplications and the type I error rate also referred to thefalse positive rate represents the proportion of non-DIFitems incorrectly flagged as having DIF in 1000 replications+e type I error rates are averaged over all without DIF items[30]

3 Results

Table 1 presents the statistical power of the LR model basedon three different inferential methods (ML PML andWLR)and under various combinations of τ R N DIF and scalelength +e major finding was that the power of the ML andPML was more affected by imbalanced degree (τ) than byWLR estimation method Specifically for the moderatemagnitude of DIF (DIF 04) regardless of the sample sizeand number of items increasing imbalanced or rarenessdegree τ from 0156 to 0069 resulted in a reduction byapproximately 38 37 and 30 in the power of PMLMLand WLR respectively Furthermore for the severe mag-nitude of DIF (DIF 08) a similar finding was observedie the power of the PML ML and WLR was reducedapproximately by 37 35 and 26 respectively Ingeneral our findings indicated that the power of the threeestimationmethods for detecting DIF could be ordered fromhighest to lowest as follows WLRgeMLge PML Indeed forthe moderate values of DIF (DIF 04) compared withWLR there were reductions of approximately 30 and 24under τ 0069 and 22 and 16 under τ 0156 in thepower for the PML andML respectively On the other handwhen the magnitude of DIF was severe (DIF 08) com-pared with WLR there were reductions of approximately27 and 23 under τ 0069 and 14 and 12 underτ 0156 in the power for the PML and ML respectively

When the magnitude of DIF was moderate (DIF 04)N 1000 R 1 and I 15 the maximum power of the PMLML andWLRwas 46 48 and 53 for τ 0156 and 2528 and 32 for τ 0069 respectively Hence to achieve

the adequate power (08) for detecting moderate DIF werequire a sample size of larger than 1000 (N 1000) Fur-thermore for the severe of DIF (DIF 08) and τ 0156 thepower of the PMLML andWLRwas 75 78 and 83 forI 5 and 80 81 and 87 for I 15 respectively whenN 600 and R 1 2 However under DIF 08 τ 0069R 1 and N 1000 the power of the PML ML and WLRwas 66 69 and 77 for I 5 and 72 74 and 81 forI 15 respectively

Table 2 reports the empirical type I error rate of the LRmodel under various combinations of imbalanced degree ofdata sample size sample size ratio magnitude of DIF andthe number of items In general regardless of the magnitudeof DIF sample size sample size ratio and the number ofitems the average type I error rates of the ML PML andWLR were 006 005 and 003 for τ 0156 and 006 004and 001 for τ 0069 +ese findings indicate that when theimbalanced degree τ increased from 0156 to 0069 theaverage type I error rate of the WLR decreased dramaticallywhile it was close to the 5 level for the ML and PMLmethods However there were some exceptions where thetype I error rates of the ML PML and WLR were higherthan the nominal level of 5 When DIF 08 N 1000I 5 and τ 0156 the empirical type I error rate was 1111 and 9 for ML 8 9 and 8 for PML and 8 7and 7 for WLR when R was equal to 1 2 and 3 re-spectively In this case as shown in Table 2 when thenumber of items increased from 5 to 15 the type I error ratewould be equal or less than 5

+e average power and type I error rate on measureswith 5 and 15 items are depicted in Figures 1 and 2According to Figure 1 irrespective of the number of itemsthe highest power in all combinations belonged to WLRwhile PML and ML had relatively equivalent powerFigure 2 indicates that in most simulation conditions typeI error rate was above or exactly at the nominal significancelevel of 005 for ML and below or exactly equal to thenominal level for the PML method (ML range005ndash011and PML range 003ndash005) However in WLRtype I error rate was lower than the other two methods andfluctuated between 002 and 005 from 0 to 001 forτ 0156 and τ 0069 respectively

31 Real Data Example In this section we used a real dataset to validate the simulation findings Accordingly 387Serbian individuals (403 male and 507 female) wereselected out of a larger sample of 4192 people from elevencountries participating in a cross-cultural study [31] +eparticipants completed the 47-item Revised Child Anxietyand Depression Scale (RCADS) +e RCADS divided intosix subscales including social phobia separation anxietydisorder generalized anxiety panic disorder obsessivecompulsive disorder and major depressive disorder Allitems use the same 4-point Likert-type response scales toassess the frequency of a certain symptom We binarizedthe item responses after transformation of never andsometimes into ldquono symptomrdquo and often and always intoldquosymptomrdquo For details on the RCADS as used in this

BioMed Research International 5

project see [31] +e results of DIF analysis of the RCADSacross male and female Serbian adolescents based on threeinferential methods (WLR PML and ML) are shown inTable 3 Our findings showed that ten items with DIF and33 items without DIF were common across the three

methods However the results indicate that the WLRmethod is more sensitive than the PML and ML fordetecting DIF For example in the separation anxietysubscale while the WLR model identified four items withDIF one and three items exhibited DIF according to PML

Table 2 Type I error rate of different methods of estimation under different combinations

Item Ratio NDIF 04 DIF 08

τ 0156 τ 0069 τ 0156 τ 0069ML PML WLR ML PML WLR ML PML WLR ML PML WLR

5 nf nr200 007 005 002 007 005 001lowast 007 005 003 007 005 001lowast600 005 004 003 005 003 001lowast 008 006 005 006 004 001lowast1000 007 005 004 006 004 001 011 008 008 006 005 001

5 nf 2nr200 007 005 003 007 005 001 008 006 004 007 005 001600 006 004 002 004 004 001 008 006 005 005 004 0011000 006 005 003 005 004 001 011 009 007 006 004 001

5 nf 3nr200 006 004 003 005 004 001 007 005 004 006 004 001600 007 005 003 007 005 001 008 006 004 007 005 0011000 007 006 004 006 004 001 009 007 007 006 005 001

15 nf nr200 005 003 002 005 003 001 005 003 002 005 003 001600 005 004 002 005 004 001 006 004 003 005 004 0011000 005 005 003 005 005 001 005 005 003 005 005 001

15 nf 2nr200 007 005 003 006 005 001lowast 007 005 003 007 005 001lowast600 006 004 003 006 005 001 006 005 003 006 005 0011000 004 004 002 004 003 001lowast 005 004 002 004 004 001lowast

15 nf 3nr200 006 005 002 006 005 001 006 005 002 006 005 001600 005 005 002 005 004 001lowast 005 005 002 005 005 001lowast1000 005 004 002 005 004 001 005 004 002 005 004 001lowast

Note Ratio sample size ratio between the reference and focal groups nr and nf represent sample sizes in reference and focal groups respectively N totalsample size N nr + nf τ the fraction of 1s in the population DIF differential item functioning ML maximum likelihood PML penalized maximumlikelihood WLR Weighted Logistic Regression lowastNear to 001

Table 1 Statistical power of different methods of estimation under different combinations

Item Ratio NDIF 04 DIF 08

τ 0156 τ 0069 τ 0156 τ 0069ML PML WLR ML PML WLR ML PML WLR ML PML WLR

5 nf nr200 013 010 016 009 006 012 034 030 042 020 015 028600 029 026 033 017 015 021 078 075 083 049 045 0581000 046 042 051 023 021 027 094 092 096 069 066 077

5 nf 2nr200 013 011 016 009 007 013 032 029 041 019 015 030600 029 027 032 016 015 022 074 072 082 045 044 0591000 041 039 046 023 022 029 089 088 093 062 061 073

5 nf 3nr200 013 011 017 009 008 016 030 028 041 017 016 029600 021 020 026 013 013 018 061 060 073 037 036 0511000 037 035 042 021 020 026 086 085 091 056 056 070

15 nf nr200 013 011 017 010 009 013 036 033 045 022 018 029600 034 032 038 020 019 023 081 080 087 052 049 0621000 048 046 053 028 025 032 096 096 098 074 072 081

15 nf 2nr200 013 012 017 011 009 015 035 033 044 022 020 032600 034 032 039 020 020 025 077 077 085 051 050 0651000 045 044 050 025 024 031 094 094 097 070 070 080

15 nf 3nr200 013 013 017 010 008 014 031 031 041 019 019 031600 025 025 031 015 015 021 074 074 083 043 044 0571000 037 037 043 023 024 030 090 089 094 064 066 077

Note Ratio sample size ratio between the reference and focal groups nr and nf represent sample sizes in reference and focal groups respectively N totalsample size N nr + nf τ the fraction of 1s in the population DIF differential item functioning ML maximum likelihood PML penalized maximumlikelihood WLR Weighted Logistic Regression

6 BioMed Research International

010

020

030

040

050

060

Pow

er

400 600 800 1000200

Total sample size

(a)

04

06

08

010

Pow

er

400 600 800 1000200

Total sample size

(b)

400 600 800 1000200

Total sample size

010

020

030

040

050

Pow

er

(c)

02

04

06

08

010

Pow

er

400 600 800 1000200

Total sample size

(d)

005

015

010

020

025

030

Pow

er

400 600 800 1000200

Total sample size

(e)

020

040

060

080

Pow

er

400 600 800 1000200

Total sample size

(f )

Figure 1 Continued

BioMed Research International 7

400 600 800 1000200

Total sample size

002

003

004

005

006

007

Type

I er

ror r

ate

(a)

002

004

006

008

Type

I er

ror r

ate

400 600 800 1000200

Total sample size

(b)

002

003

004

005

006

007

Type

I er

ror r

ate

1000600 800400200

Total sample size

(c)

002

003

004

005

006

007

008

Type

I er

ror r

ate

1000600 800400200

Total sample size

(d)

Figure 2 Continued

400 600 800 1000200

Total sample size

005

010

015

020

025

030

Pow

er

(g)

02

04

06

08

Pow

er

400 600 800 1000200

Total sample size

(h)

Figure 1 +e average power of MLE (solid lines) PML (dotted line) and WLR (broken line) methods on measures with 5 and 15 itemsNote Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nr τ 0156)(nf nr τ 0069) and (nf 3nr τ 0069)

8 BioMed Research International

and ML respectively Moreover as compared with ML andPML WLR gives smaller standard error for regressioncoefficient and it can result in increased likelihood offinding DIF

4 Discussion

Detecting DIF based on the logistic regression model can bechallenging for binary data where events are rare or severely

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(e)

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(f )

000

001

002

003

004

005

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(g)

400 600 800 1000200

Total sample size

000

002

004

006

Type

I er

ror r

ate

(h)

Figure 2 +e average type I error rates of MLE (solid lines) PMLE (dotted line) and WLR (broken line) methods on measures with 5 and15 items Note Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nrτ 0156) (nf nr τ 0069) and (nf 3nr τ 0069)

Table 3 +e results of DIF analysis across male and female Serbian individuals based on ML PML and WLR methods

ItemML PML WLR

B SE P value B SE P value B SE P value τ

Social phobia

7 061 029 0032 06 029 0042 061 031 0042 02420 minus 116 044 0007 minus 113 044 001 minus 119 042 0002 01738 124 044 0002 119 043 0004 118 041 0001 01943 minus 157 047 lt0001 minus 152 046 0001 minus 156 044 lt0001 019

Separation anxiety

9 minus 089 048 006 minus 086 047 01 minus 083 036 0015 03117 206 122 0045 17 106 0092 097 075 0008 01145 096 048 0034 091 047 0059 104 056 0036 00946 minus 304 114 0001 minus 269 101 0003 minus 19 051 lt0001 014

Generalized anxiety disorder 13 114 035 0001 112 035 0001 112 035 0001 03137 minus 201 044 lt0001 minus 196 044 lt0001 minus 195 044 lt0001 022

Obsessive compulsive disorder 23 079 043 0059 075 043 01 079 041 0038 018

Major depression2 123 045 0004 118 044 0006 119 045 0003 0196 minus 122 058 0031 minus 118 056 0043 minus 115 042 0004 01221 063 032 0048 062 032 0072 065 034 0049 026

Note P value is reported in three decimal places for more accuracy of comparing the three models τ the fraction of 1s in the population that is extracted froma data set of 4192 adolescents in eleven countries B regression coefficient for testing uniform DIF SE standard error of regression coefficient

BioMed Research International 9

imbalanced +e present study is designed to clarify theunknown consequences of rare events data in the context ofDIF analysis based on the LR model with and withoutapplying bias correction methods Hence in the compre-hensive simulation study we compared the performance ofthe WLR PML and ML methods for detecting DIF withfocus on imbalanced binary response data

According to our simulation findings of the three es-timation methods for detecting DIF the WLR seems to bethe most appealing with regards to its statistical propertiesits type I error rate is close to or less than the nominal 005level and its power is considerably higher than the PML andML Moreover in the present study comparing the threeinferential methods for the real data set confirmed thefindings of the simulation Accordingly as compared withWLR PML and ML were less sensitive for detecting DIFacross male and female Serbian adolescents

It is worthwhile to explain why the WLR outperforms thePML and ML estimation methods for DIF analysis Indeedthe WLR is an extension of the small-sample bias correctionsas described by Cordeiro and McCullagh to the weightedlikelihood [32] A previous simulation study has shown thatthis bias correction method reduces both the bias and thevariance thereby offering the smallest Mean Squared Error(MSE) when compared with that of PML and ML estimationmethods [33] Hence applying the WLR in DIF analysis canlead to reduction in the standard error of the parameterestimates which consequently increases the likelihood offinding DIF Another advantage of the WLR is that it givesunbiased predicted probability [15 20] However the pre-dicted probabilities from the WLRmodel may fall outside theplausible range of 0 to 1 [15]

Furthermore our findings revealed that the logisticregression model based on the traditional ML estimationmethod had a slightly better performance than the PML forimbalanced or rare events data +ese findings were similarto those of Lee who reported that the ML slightly out-performed the PML for detecting DIF in small samples [9]In this case as described by Lee it seems that the sacrifice inthe precision cannot be compensated for the bias reductionof the PML estimates in imbalanced data It should also benoted that while Firthrsquos PML method reduces the bias in theestimates of regression coefficients it introduces bias in thepredicted probabilities and this bias is not negligible for rareevents data [15] To overcome this problem Puhr et alproposed two modified versions of Firthrsquos PML resulting inunbiased predicted probabilities [15] Accordingly theproposed methods including Firthrsquos logistic regression withintercept correction and Firthrsquos logistic regression withadded covariate efficiently improve on predictions fromFirthrsquos logistic regression [15] Since these methods were notimplemented in the standard statistical software such as Rwe were not able to apply them in the context of DIF analysisand compare their results with those of the WLR modelHence in the present study the WLR is the most efficientmethod for DIF analysis under rare events data +e WLRnot only reduces the bias of regression coefficients but alsoprovides unbiased predicted probability in comparison tothe PML method [15 20]

What distinguishes this study from the previous one isthat we simultaneously evaluated the effect of imbalanced dataand small samples (as well as large samples) on the perfor-mance of the three estimation methods for DIF analysis Inaddition one of the advantages of the present study is that itprovides a guideline about the required sample size for DIFanalysis with imbalanced or rare events data Previous sim-ulation studies have shown that the minimum sample size forDIF analysis with logistic regression should be within therange of 100 to 200 per group [34 35] However for moderateimbalanced data (τ 0156) and severe DIF (DIF 08) as ageneral rule of thumb we would suggest imposing a mini-mum of 300 respondents per group to achieve the adequatepower with the WLR method In addition for severe im-balance rate (τ 0069) and DIF 08 theWLR performs wellto ensure the acceptable power with samples of 500 per groupTo find what the minimum number of sample size is toachieve the adequate power (80) for moderate DIF wesimulated items with DIF 04 τ 0156 and 0069 andsample sizes greater than 500 in each group not reported inthe results section +e findings revealed that sample sizes ofat least 1000 per group for τ 0156 and 2000 per group forτ 0069 are required to detect DIF 04 based on the WLRmethod

In the present study we restricted our simulation to onlytwo bias correction methods for detecting DIF namelyWLR and PML Nevertheless there are a limited number ofstudies that evaluated DIF based on other penalizationmethods such as LASSO For example Tutz and Schaubergerapplied LASSO penalization for DIF analysis which includescontinuous variables [36] Furthermore Magis et al pro-posed a new DIF detection method based on LASSO whereall items were simultaneously evaluated for DIF in a singlemodeling approach so that multiple testing would not be aproblem [37]

5 Conclusion

Although the logistic regression (LR) model is one the mostcommon methods for detecting DIF certain samplingstrategies and appropriate bias correction techniques shouldbe applied when LR is implemented on moderate or severeimbalanced data sets In summary our findings revealedthat as compared with ML and PML the WLR is a moresensitive method for detecting DIF when data are imbal-anced or rare Hence as well as its easy application toexisting software the WLR introduced by King and Zeng isstrongly recommended for detecting DIF due to higherpower and lower type I error rate in comparison to PML andML inferential methods However in the future studies ofDIF penalized and bias correction methods should beproposed for ordinal logistic regression in the presence ofrare events or small sample setting

Data Availability

+is is a simulation study and the real data set has beenprovided by Dr Dejan Stevanovic the third author of themanuscript

10 BioMed Research International

Conflicts of Interest

+e authors declare no potential conflicts of interest withrespect to the research authorship andor publication ofthis article

Acknowledgments

+is work was supported by the Shiraz University of MedicalSciences Research Council Shiraz Iran under grant no96-01-01-14601 +is paper was extracted fromMarjan FaghihrsquosPhD thesis under supervision of Prof Peyman JafariPeyman Jafari andMarjan Faghih equally contributed to thiswork

References

[1] B D Zumbo ldquo+ree generations of DIF analyses consideringwhere it has been where it is now and where it is goingrdquoLanguage Assessment Quarterly vol 4 no 2 pp 223ndash2332007

[2] T A Ackerman ldquo+e theory and practice of item responsetheory by de Ayala R Jrdquo Journal of Educational Measure-ment vol 47 no 4 pp 471ndash476 2010

[3] B E Clauser and K M Mazor ldquoUsing statistical proceduresto identify differentially functioning test itemsrdquo EducationalMeasurement Issues and Practice vol 17 no 1 pp 31ndash442005

[4] H Swaminathan and H J Rogers ldquoDetecting differential itemfunctioning using logistic regression proceduresrdquo Journal ofEducational Measurement vol 27 no 4 pp 361ndash370 1990

[5] H J Rogers and H Swaminathan ldquoA comparison of logisticregression and Mantel-Haenszel procedures for detectingdifferential item functioningrdquo Applied Psychological Mea-surement vol 17 no 2 pp 105ndash116 1993

[6] P Narayanon and H Swaminathan ldquoIdentification of itemsthat show nonuniform DIFrdquo Applied Psychological Mea-surement vol 20 no 3 pp 257ndash274 1996

[7] P Jafari D Stevanovic and Z Bagheri ldquoCross-culturalmeasurement equivalence of the KINDL questionnaire forquality of life assessment in children and adolescentsrdquo ChildPsychiatry amp Human Development vol 47 no 2 pp 291ndash3042016

[8] P K Crane L E Gibbons K Ocepek-Welikson et al ldquoAcomparison of three sets of criteria for determining thepresence of differential item functioning using ordinal logisticregressionrdquo Quality of Life Research vol 16 no 1 pp 69ndash842007

[9] S Lee ldquoDetecting differential item functioning using thelogistic regression procedure in small samplesrdquo AppliedPsychological Measurement vol 41 no 1 pp 30ndash43 2017

[10] E Allahyari P Jafari and Z Bagheri ldquoA simulation study toassess the effect of the number of response categories on thepower of ordinal logistic regression for differential itemfunctioning analysis in rating scalesrdquo Computational andMathematical Methods in Medicine vol 2016 Article ID5080826 8 pages 2016

[11] P O Monahan C A McHorney T E Stump andA J Perkins ldquoOdds ratio delta ETS classification andstandardization measures of dif magnitude for binary logisticregressionrdquo Journal of Educational and Behavioral Statisticsvol 32 no 1 pp 92ndash109 2007

[12] S Apinyapibal N Lawthong and S Kanjanawasee ldquoAcomparative analysis of the efficacy of differential itemfunctioning detection for dichotomously scored items amonglogistic regression SIBTEST and raschtree methodsrdquo Proce-dia-Social and Behavioral Sciences vol 191 pp 21ndash25 2015

[13] H-F Chen and K-Y Jin ldquoApplying logistic regression todetect differential item functioning in multidimensionaldatardquo Frontiers in Psychology vol 9 p 1302 2018

[14] D Svetina and L Rutkowski ldquoDetecting differential itemfunctioning using generalized logistic regression in thecontext of large-scale assessmentsrdquo Large-scale Assessments inEducation vol 2 no 1 2014

[15] R Puhr G Heinze M Nold et al ldquoFirthrsquos logistic regressionwith rare events accurate effect estimates and predictionsrdquoStatistics in Medicine vol 36 no 14 pp 2302ndash2317 2017

[16] D Firth ldquoBias reduction of maximum likelihood estimatesrdquoBiometrika vol 80 no 1 pp 27ndash38 1993

[17] M Pavlou G Ambler S Seaman M De Iorio andR Z Omar ldquoReview and evaluation of penalised regressionmethods for risk prediction in low-dimensional data with feweventsrdquo Statistics in Medicine vol 35 no 7 pp 1159ndash11772016

[18] M Maalouf and T B Trafalis ldquoRobust weighted kernel lo-gistic regression in imbalanced and rare events datardquo Com-putational Statistics amp Data Analysis vol 55 no 1pp 168ndash183 2011

[19] M Maalouf D Homouz and T B Trafalis ldquoLogistic re-gression in large rare events and imbalanced data a perfor-mance comparison of prior correction and weightingmethodsrdquo Computational Intelligence vol 34 no 1pp 161ndash174 2018

[20] G King and L Zeng ldquoLogistic regression in rare events datardquoPolitical Analysis vol 9 no 2 pp 137ndash163 2001

[21] I-C Huang W L Leite P Shearer M Seid D A Revickiand E A Shenkman ldquoDifferential item functioning in qualityof life measure between children with and without specialhealth-care needsrdquo Value in Health vol 14 no 6 pp 872ndash883 2011

[22] H Jeffreys ldquoAn invariant form for the prior probability inestimation problemsrdquo Proceedings of the Royal Society ofLondon Series A Mathematical and Physical Sciencesvol 186 no 1007 pp 453ndash461 1946

[23] G Heinze and M Schemper ldquoA solution to the problem ofseparation in logistic regressionrdquo Statistics in Medicinevol 21 no 16 pp 2409ndash2419 2002

[24] G Heinze and M Ploner ldquoFixing the nonconvergence bug inlogistic regression with SPLUS and SASrdquo Computer Methodsand Programs in Biomedicine vol 71 no 2 pp 181ndash187 2003

[25] G King and L Zeng ldquoExplaining rare events in internationalrelationsrdquo International Organization vol 55 no 3pp 693ndash715 2001

[26] M Maalouf ldquoLogistic regression in data analysis an over-viewrdquo International Journal of Data Analysis Techniques andStrategies vol 3 no 3 pp 281ndash299 2011

[27] D E A Sulasih S W Purnami and S P Rahayu BeBeoretical Study of Rare Event Weighted Logistic Regressionfor Classification of Imbalanced Data Universitas Muham-madiyah Surakarta Surakarta Indonesia 2015

[28] G Heinze logistf Firthrsquos Bias Reduced Logistic RegressionR Package Version 123 2018 httpsCRANR-projectorgpackage=logistf

[29] C Choirat J Honaker K Imai G King and O Lau ZeligEveryonersquos Statistical Software Version 5161 2018 httpzeligprojectorg

BioMed Research International 11

[30] M L Ong L Lu S Lee et al ldquoA comparison of the hier-archical generalized linear model multiple-indicators mul-tiple-causes and the item response theory-likelihood ratio testfor detecting differential item functioningrdquo in QuantitativePsychology Research pp 343ndash357 Springer Berlin Germany2015

[31] D Stevanovic Z Bagheri O Atilola et al ldquoCross-culturalmeasurement invariance of the revised Child anxiety anddepression scale across 11 world-wide societiesrdquo Epidemiol-ogy and Psychiatric Sciences vol 26 no 4 pp 430ndash440 2017

[32] G M Cordeiro and P McCullagh ldquoBias correction in gen-eralized linear modelsrdquo Journal of the Royal Statistical SocietySeries B (Methodological) vol 53 no 3 pp 629ndash643 1991

[33] T Maiti and V Pradhan ldquoBias reduction and a solution forseparation of logistic regression with missing covariatesrdquoBiometrics vol 65 no 4 pp 1262ndash1269 2009

[34] B D Zumbo A Handbook on the Beory and Methods ofDifferential Item Functioning (DIF) National DefenseHeadquarters Ottawa Canada 1999

[35] J-S Lai J Teresi and R Gershon ldquoProcedures for the analysisof differential item functioning (DIF) for small sample sizesrdquoEvaluation amp the Health Professions vol 28 no 3 pp 283ndash294 2005

[36] G Tutz and G Schauberger A Penalty Approach to Differ-ential Item Functioning in Rasch Models Springer BerlinGermany 2015

[37] D Magis F Tuerlinckx and P de Boeck ldquoDetection ofdifferential item functioning using the lasso approachrdquoJournal of Educational and Behavioral Statistics vol 40 no 2pp 111ndash135 2015

12 BioMed Research International

Page 2: A Comparative Study of the Bias Correction Methods for ... · 2020. 3. 2. · mum likelihood estimation method (PML) was originally developedbyDavidFirth[16]inordertoreducethesmall

However it should be noted that statistical inference basedon the logistic regression model is highly dependent on theasymptotic properties of the maximum likelihood estimator[9] Under the large sample situations the sampling distri-bution of the maximum likelihood (ML) estimators for thelogistic regression coefficients is asymptotically unbiased andnormal However in small samples it is well known that theML estimations may be biased and the asymptotic propertiesmay not hold [9 15] Accordingly a number of penalizationtechniques such as Firthrsquos method have been introduced tocorrect or reduce the small sample bias of theML estimators ofthe LR model [15 16] A new simulation study has recentlycompared the performance of the LR model based on max-imum likelihood (ML) and Firthrsquos penalized maximumlikelihood (PML) estimation methods in terms of empiricalpower and type I error rate to detect uniform and nonuniformDIF [9] +e results showed that as compared with PML theLR model based on asymptotic ML worked slightly better interms of statistical power although the difference in perfor-mance was not practically important [9]

In addition to the small sample rare events data isanother factor that can substantially influence the as-ymptotic properties of the ML estimators in the LR model[15 17] +e bias of prediction as well as bias in re-gression coefficients is a potential problem that may arisefrom the use of standard logistic regression in thepresence of rare events data Rare events data refer tooccurrences that take place much less frequently thanmore common events [18 19] According to King andZeng ordinary logistic regression can sharply underes-timate the probability of rare events [20] Hence theyproposed weighted logistic regression (WLR) to correctbias terms in regression coefficients and predictedprobability by applying a weighted likelihood function toestimate parameters under pure case-control sampling[20] Moreover a wide variety of penalization methodshave been suggested to resolve the problem of rare eventsdata in the LR model [15ndash17]

Although the effect of bias correction methods on theperformance of the LR model for detecting DIF has beenevaluated by Lee in a small sample [9] such an explanationhas never been provided for rare events data To fill this gapin the present simulation study the three inferentialmethods including WLR PML and ML are compared todiscover what the best correction method is to achieve theadequate power and type I error rate for detecting DIFwhen the response variable in the LR model is imbalancedor rare Hence in a comprehensive simulation study weinvestigate whether the statistical properties of the LRmodel for detecting DIF with and without applying biascorrection methods can be influenced by the degree ofrareness or imbalance data magnitude of DIF sample sizesample size ratio and the length of the scale across ref-erence and focal groups In addition to simulation we havealso used real data to validate and compare the effectivenessof the proposed methods for detecting DIF in practice

2 Methods

21 LogisticRegressionModel forDetectingDIF +epresenceof uniform and nonuniform DIF can be tested by comparingthree different logistic regression (LR) models as follows

Model 1 log it πi( 1113857 lnp yi 1( 1113857

p yi 0( 11138571113888 1113889 β1 + β2θ

Model 2 log it πi( 1113857 lnp yi 1( 1113857

p yi 0( 11138571113888 1113889 β1 + β2θ + β3G

Model 3 log it πi( 1113857 lnp yi 1( 1113857

p yi 0( 11138571113888 1113889 β1 + β2θ + β3G + β4θG

(1)

In these models the term θ is the observed ability of eachrespondent usually defined as total test score and G is anindicator variable representing group membership (refer-ence and focal groups for example male and female ingender variable) According to the abovementioned modelsthe presence of uniform and nonuniform DIF could bedetermined by comparing models 1 and 2 as well as models 2and 3 respectively In both cases the difference between minus 2log likelihoods of the models is compared to a χ2 distributionwith one degree of freedom

For nonuniform DIF items the direction of DIF differsacross latent constructs and the effect of DIF is naturallycancelled out at the level of latent constructs [21] which is themajor reason for focusing on uniform DIF in this study

22 Maximum Likelihood Consider the logistic regressionmodel

P yi 11113868111386811138681113868 xir B1113872 1113873 πi

eXiβ

1 + eXiβ

11 + eminus Xiβ

1

1 + exp minus 1113936kr1 βrxir1113872 1113873

so lnπi

1 minus πi

1113888 1113889 Xiβ

(2)

where (yi xir) xir (i 1 n r 2 k) and yi isin 0 1denotes a sample of n observations of binary outcomevariable y and the vector of independent covariates with 1 times

k dimensions Xi (1 xi2 xik) where xi1 1 is theconstant and β (β1 βk)prime +en ML estimates 1113954βr ofregression parameters and βr are obtained by solving thefollowing score equations

z ln L

zβr

equiv U βr( 1113857 1113944n

i1yi minus πi( 1113857xir 0 (3)

where

2 BioMed Research International

L(β) 1113945n

i1πi( 1113857

yi 1 minus πi( 11138571minus yi 1113945

n

i1

eXiβ

1 + eXiβ1113888 1113889

yi 11 + eXiβ

1113874 11138751minus yi

ln L(β) 1113944n

i1yi ln πi + 1 minus yi( 1113857ln 1 minus πi( 1113857( 1113857 1113944

n

i1yi ln

eXiβ

1 + eXiβ1113888 1113889 + 1 minus yi( 1113857ln

11 + eXiβ

1113874 11138751113888 1113889

(4)

Unfortunately there is no closed-form expression formaximizing the log likelihood of β +e ML estimations cantherefore be obtained using numerical optimization methodswhich start with a guess point and repeat to improve +eNewtonndashRaphson method is one of the most commonly usednumerical methods which needs theHessianmatrix as follows

z2 ln L

zβr2 1113944

n

i1

minus x2ire

Xiβ

1 + eXiβ( 11138572

⎛⎝ ⎞⎠ 1113944n

i1minus xir

2 πi 1 minus πi( 1113857( 11138571113872 1113873 (5)

If vi is defined as πi(1 minus πi) and V diag(v1 vn)then the Hessian matrix can be written as

H(β) minus XprimeVX (6)

+e information matrix is given by I(β) minus (z2 lnLzβ2) XprimeVX and the variance of β is then V(β)

I(β)minus 1 (XprimeVX)minus 1

23 Penalized Maximum Likelihood +e penalized maxi-mum likelihood estimation method (PML) was originallydeveloped by David Firth [16] in order to reduce the smallsample bias of maximum likelihood estimates For expo-nential family models this method corresponds to penali-zation of the likelihood by Jeffreysrsquo invariant prior [22]+us the penalized log likelihood for logistic regressiontakes the following form

ln L(β)lowast

ln L(β) + 5 ln|I(β)| (7)

where |I(β)| denotes the determinant of the Fisher infor-mation matrix evaluated at β Penalized maximum likeli-hood estimates for βr (r 1 k) are involved in calculating

z ln L(β)lowast

zβr

U βr( 1113857lowast

1113944n

i1yi minus πi + hi 05 minus πi( 1113857( 1113857xir 0

r 1 k

(8)

where the hirsquos represent the diagonal elements of the pe-nalized likelihood version of the standard ldquohatrdquo matrix

H V12

X XprimeVX( 1113857minus 1

XprimeV12 (9)

By this method the first-order term is removed from theTaylor series expansion of the bias of the ML estimatorwhich has negligible impacts in large samples but can besevere in small samples or rare events

Now Firth-type estimation 1113954β can be produced itera-tively by NewtonndashRaphson algorithmwith a starting value of1113954β

(0) 0 until convergence is obtained

1113954β(s+1)

1113954βs

+ Iminus 1 1113954β

s1113872 1113873U 1113954β

s1113872 1113873lowast (10)

where the superscript s is the sth iteration If the penalizedlog likelihood assessed at 1113954β(s+1) is less than that assessed at 1113954β

s

then 1113954β(s+1) is recalculated by step-halving +e PML esti-mations can also be created by carrying out theMLEmethodand splitting each main observation i into two new obser-vations having outcomes 1 minus yi and yi with weights hi2and 1 + (hi2) respectively +e new observations changethe score function for the ML method to (yi minus πi)(1+1113864

(hi2)) + (1 minus yi minus πi)hi2xir yi minus πi + hi(05 minus πi)1113864 1113865xir+erefore the breaking of each observation into a nonre-sponse and a response ensures that the finite PML estimatesalways exist [23 24]

+e standard error estimation is based on the root of theinverse diagonal elements of the Fisher information matrixwhich is approximated by minus (z2 ln Llowastzβ2)1113966 1113967

minus 1[16]

24 Weighted Logistic Regression +e weighted logistic re-gression (WLR) introduced by King and Zeng [20 25] is oneof the bias correction methods which considers two cor-rection steps +e first correction concerns the weights tomake up for the differences in the proportion of events in thesample and population So the following weighted log-likelihood should be maximized

ln Lw(β | yX) 1113944n

i1wi ln

eyiXiβ

1 + eXiβ1113888 1113889 (11)

where wi w1yi + w0(1 minus yi) w0 1 minus τ1 minus y w1 τywith y and τ that are the fraction of 1s in the sample andpopulation respectively in which the above weighted log-likelihood is obtained under pure case-control sampling asfollows

Suppose that the joint distribution of y and X in thesample is

fs(yX | β) Ps(X | y β)Ps(y) (12)

Yet since X is a matrix of exploratory variables thenPs(X | y β) P(X | y β) In the other words the condi-tional probabilities of X in the sample and population areequal However the conditional probability of the pop-ulation is

P(X | y β) f(yX | β)

P(y) (13)

But

f(yX | β) P(y |X β)P(X) (14)

BioMed Research International 3

Also by replacing and rearranging

fs(yX | β) Ps(X | y β)Ps(y) P(X | y β)Ps(y) f(yX | β)

P(y)Ps(y)

Ps(y)

P(y)P(y |X β)P(X)

HQ

P(y |X β)P(X) (15)

where H and Q represent the proportions in the sample andpopulation respectively +e likelihood is then

L(β) 1113945n

i1

Hi

Qi

P yi

1113868111386811138681113868 xi β1113872 1113873P xi( 1113857 (16)

+e log-likelihood for WLR can then be rewritten as

ln Lw(β | yX) 1113944n

i1

Qi

Hi

lnP yi

1113868111386811138681113868 xi β1113872 1113873 1113944n

i1

Qi

Hi

lneyiXiβ

1 + eXiβ1113888 1113889 1113944

n

i1wi ln

eyiXiβ

1 + eXiβ1113888 1113889 (17)

where wi QiHi +us the likelihood function must bemultiplied by the inverse of the fractions in order to obtain aconsistent estimator If the proportion of events in thepopulation is more than that in the sample then wi is morethan one and so the nonevent are given less weight and viceversa [26 27] +e weighting method has two seriousproblems that limit its application first the calculation ofthe standard errors through the information matrix is se-verely biased +is problem will be solved by using Whitersquosheteroscedasticity-consistent variance matrix [20]

Second slope coefficient is biased in rare events data Forsolving the second problem the bias in 1113954β can be estimated bythe following weighted least-squares expression

bias(1113954β) XprimeVX( 1113857minus 1

XprimeVξ (18)

If bias (1113954β) is the bias component and 1113954β is the uncorrectedcoefficient then the corrected coefficients 1113957β is 1113957β 1113954βminus

bias(1113954β) where

ξi 05Qii 1 + w1( 11138571113954πi minus w11113858 1113859 (19)

Also Qii are the diagonal elements of Q X(XprimeVX)minus 1

Xprime and V diag 1113954πi(1 minus 1113954πi)wi1113864 1113865+e variance matrix of 1113957β is

V(1113957β) n

n + k1113874 1113875

2V(1113954β) (20)

Since (n(n + k))2 lt 1 thenV(1113957β)ltV(1113954β) and so both thevariance and the bias are now diminished

+e second correction step is to adjust the underesti-mation of the probabilities when using the bias-correctedcoefficients in the logistic model By subtracting a correctionfactor Ci to the biased value of probability 1113957πi

πi 1113957πi minus Ci Ci 05 minus 1113957πi( 11138571113957πi 1 minus 1113957πi( 1113857X0V(1113957β)X0prime (21)

where V(1113957β) is the variance-covariance matrix X0 is a vectorof exploratory variables and πi is the approximate unbiasedestimator by means of a known τ When τ is unknown orpartially known King and Zeng introduced πi 1113957πi + Ci asthe approximate Bayesian estimator In this case the

researcher may specify an upper and lower bound for thepossible range of τ

Maximum Likelihood (ML) Penalized MaximumLikelihood (PML) and rare event logistic regression wereimplemented using the ldquoglmrdquo function and the ldquologistfrdquo Rpackage as well as the ldquorelogitrdquo function in the R packageZelig respectively [28 29]

25 Data Generation In this study an item response theorymodel for binary data was used to produce response data+e mathematical form of the IRT model is

pij Y 1 | θi( 1113857 eaj θi minus bj( 1113857

1 + eaj θi minus bj( 1113857 (22)

where pij(θ) is the probability of correct response for in-dividual i of item j aj denotes the item discriminationparameter bj is the item difficulty parameter and θi rep-resents the ability level for the ith individual In this study bj

(j 2 3 J) and θ parameters were simulated from thestandard normal distribution and aj (j 2 3 J) pa-rameters were random samples of a uniform distributionwithin the interval (1 2) [1]

In this simulation study five factors were varied samplesize magnitude of uniform DIF sample size ratio numberof items and the degree of rareness or imbalance +reesample sizes (N 200 600 1000) and three levels of samplesize ratio (R 1 2 3) were investigated+e sample size ratiobetween the focal and reference groups was set to 1 1 for theequal sample size conditions and 2 1 and 3 1 for the un-equal sample size conditions More specifically we createdconditions with nRnF 100100 67133 and 50150 for thesmall sample size (N 200) and nRnF 300300 200400and 150450 for the medium sample size (N 600) and nRnF 500500 333667 and 250750 for the large sample size(N 1000) Furthermore the two measures with 5 and 15items were simulated (I 5 15)

To generate rare events data we followed similar sce-narios and notations used by King and Zeng In order togenerate imbalanced or rare events data we applied the logitmodel log it(πi) β0 + β1xi πi p(Yi 1) by holding β1

4 BioMed Research International

constant (β1 1) while varying β0 In this case the values ofβ0 minus 3 and β0 minus 2 generate outcomes with the percentagesof ones which are equal to 69 and 156 respectively Itshould be noted that we generate the simulated data basedon the IRTmodel not the logistic regression Hence we needto show that the parameters of the logistic regression and theIRT model with binary responses are one-to-one corre-spondent In order to denote that both models have similarmathematical expressions equation (22) can be rearrangedas log it(πi) minus ajbj + ajθi By comparing this equation withthe logit function log it(πi) β0 + β1xi coefficients β0 andβ1 and the variable xi in the logistic regression model arefound to be respectively correspondent to parameters minus ajbjaj and θi in the IRTmodel (ie β0 ndashajbj β1 aj and xi θi)With the assumption of aj β1 1 the constant parameter(β0) in the logistic regression is equivalent to the minus valueof the difficulty parameter (minus bj) in the IRT model Ac-cordingly in the IRT framework and when aj 1 with thevalues of bj 3 and 2 we can generate responses with 69and 156 imbalanced degree respectively

Statistical power is defined by the proportion of timesthat DIF is correctly identified by the logistic method acrossreplications and the type I error rate also referred to thefalse positive rate represents the proportion of non-DIFitems incorrectly flagged as having DIF in 1000 replications+e type I error rates are averaged over all without DIF items[30]

3 Results

Table 1 presents the statistical power of the LR model basedon three different inferential methods (ML PML andWLR)and under various combinations of τ R N DIF and scalelength +e major finding was that the power of the ML andPML was more affected by imbalanced degree (τ) than byWLR estimation method Specifically for the moderatemagnitude of DIF (DIF 04) regardless of the sample sizeand number of items increasing imbalanced or rarenessdegree τ from 0156 to 0069 resulted in a reduction byapproximately 38 37 and 30 in the power of PMLMLand WLR respectively Furthermore for the severe mag-nitude of DIF (DIF 08) a similar finding was observedie the power of the PML ML and WLR was reducedapproximately by 37 35 and 26 respectively Ingeneral our findings indicated that the power of the threeestimationmethods for detecting DIF could be ordered fromhighest to lowest as follows WLRgeMLge PML Indeed forthe moderate values of DIF (DIF 04) compared withWLR there were reductions of approximately 30 and 24under τ 0069 and 22 and 16 under τ 0156 in thepower for the PML andML respectively On the other handwhen the magnitude of DIF was severe (DIF 08) com-pared with WLR there were reductions of approximately27 and 23 under τ 0069 and 14 and 12 underτ 0156 in the power for the PML and ML respectively

When the magnitude of DIF was moderate (DIF 04)N 1000 R 1 and I 15 the maximum power of the PMLML andWLRwas 46 48 and 53 for τ 0156 and 2528 and 32 for τ 0069 respectively Hence to achieve

the adequate power (08) for detecting moderate DIF werequire a sample size of larger than 1000 (N 1000) Fur-thermore for the severe of DIF (DIF 08) and τ 0156 thepower of the PMLML andWLRwas 75 78 and 83 forI 5 and 80 81 and 87 for I 15 respectively whenN 600 and R 1 2 However under DIF 08 τ 0069R 1 and N 1000 the power of the PML ML and WLRwas 66 69 and 77 for I 5 and 72 74 and 81 forI 15 respectively

Table 2 reports the empirical type I error rate of the LRmodel under various combinations of imbalanced degree ofdata sample size sample size ratio magnitude of DIF andthe number of items In general regardless of the magnitudeof DIF sample size sample size ratio and the number ofitems the average type I error rates of the ML PML andWLR were 006 005 and 003 for τ 0156 and 006 004and 001 for τ 0069 +ese findings indicate that when theimbalanced degree τ increased from 0156 to 0069 theaverage type I error rate of the WLR decreased dramaticallywhile it was close to the 5 level for the ML and PMLmethods However there were some exceptions where thetype I error rates of the ML PML and WLR were higherthan the nominal level of 5 When DIF 08 N 1000I 5 and τ 0156 the empirical type I error rate was 1111 and 9 for ML 8 9 and 8 for PML and 8 7and 7 for WLR when R was equal to 1 2 and 3 re-spectively In this case as shown in Table 2 when thenumber of items increased from 5 to 15 the type I error ratewould be equal or less than 5

+e average power and type I error rate on measureswith 5 and 15 items are depicted in Figures 1 and 2According to Figure 1 irrespective of the number of itemsthe highest power in all combinations belonged to WLRwhile PML and ML had relatively equivalent powerFigure 2 indicates that in most simulation conditions typeI error rate was above or exactly at the nominal significancelevel of 005 for ML and below or exactly equal to thenominal level for the PML method (ML range005ndash011and PML range 003ndash005) However in WLRtype I error rate was lower than the other two methods andfluctuated between 002 and 005 from 0 to 001 forτ 0156 and τ 0069 respectively

31 Real Data Example In this section we used a real dataset to validate the simulation findings Accordingly 387Serbian individuals (403 male and 507 female) wereselected out of a larger sample of 4192 people from elevencountries participating in a cross-cultural study [31] +eparticipants completed the 47-item Revised Child Anxietyand Depression Scale (RCADS) +e RCADS divided intosix subscales including social phobia separation anxietydisorder generalized anxiety panic disorder obsessivecompulsive disorder and major depressive disorder Allitems use the same 4-point Likert-type response scales toassess the frequency of a certain symptom We binarizedthe item responses after transformation of never andsometimes into ldquono symptomrdquo and often and always intoldquosymptomrdquo For details on the RCADS as used in this

BioMed Research International 5

project see [31] +e results of DIF analysis of the RCADSacross male and female Serbian adolescents based on threeinferential methods (WLR PML and ML) are shown inTable 3 Our findings showed that ten items with DIF and33 items without DIF were common across the three

methods However the results indicate that the WLRmethod is more sensitive than the PML and ML fordetecting DIF For example in the separation anxietysubscale while the WLR model identified four items withDIF one and three items exhibited DIF according to PML

Table 2 Type I error rate of different methods of estimation under different combinations

Item Ratio NDIF 04 DIF 08

τ 0156 τ 0069 τ 0156 τ 0069ML PML WLR ML PML WLR ML PML WLR ML PML WLR

5 nf nr200 007 005 002 007 005 001lowast 007 005 003 007 005 001lowast600 005 004 003 005 003 001lowast 008 006 005 006 004 001lowast1000 007 005 004 006 004 001 011 008 008 006 005 001

5 nf 2nr200 007 005 003 007 005 001 008 006 004 007 005 001600 006 004 002 004 004 001 008 006 005 005 004 0011000 006 005 003 005 004 001 011 009 007 006 004 001

5 nf 3nr200 006 004 003 005 004 001 007 005 004 006 004 001600 007 005 003 007 005 001 008 006 004 007 005 0011000 007 006 004 006 004 001 009 007 007 006 005 001

15 nf nr200 005 003 002 005 003 001 005 003 002 005 003 001600 005 004 002 005 004 001 006 004 003 005 004 0011000 005 005 003 005 005 001 005 005 003 005 005 001

15 nf 2nr200 007 005 003 006 005 001lowast 007 005 003 007 005 001lowast600 006 004 003 006 005 001 006 005 003 006 005 0011000 004 004 002 004 003 001lowast 005 004 002 004 004 001lowast

15 nf 3nr200 006 005 002 006 005 001 006 005 002 006 005 001600 005 005 002 005 004 001lowast 005 005 002 005 005 001lowast1000 005 004 002 005 004 001 005 004 002 005 004 001lowast

Note Ratio sample size ratio between the reference and focal groups nr and nf represent sample sizes in reference and focal groups respectively N totalsample size N nr + nf τ the fraction of 1s in the population DIF differential item functioning ML maximum likelihood PML penalized maximumlikelihood WLR Weighted Logistic Regression lowastNear to 001

Table 1 Statistical power of different methods of estimation under different combinations

Item Ratio NDIF 04 DIF 08

τ 0156 τ 0069 τ 0156 τ 0069ML PML WLR ML PML WLR ML PML WLR ML PML WLR

5 nf nr200 013 010 016 009 006 012 034 030 042 020 015 028600 029 026 033 017 015 021 078 075 083 049 045 0581000 046 042 051 023 021 027 094 092 096 069 066 077

5 nf 2nr200 013 011 016 009 007 013 032 029 041 019 015 030600 029 027 032 016 015 022 074 072 082 045 044 0591000 041 039 046 023 022 029 089 088 093 062 061 073

5 nf 3nr200 013 011 017 009 008 016 030 028 041 017 016 029600 021 020 026 013 013 018 061 060 073 037 036 0511000 037 035 042 021 020 026 086 085 091 056 056 070

15 nf nr200 013 011 017 010 009 013 036 033 045 022 018 029600 034 032 038 020 019 023 081 080 087 052 049 0621000 048 046 053 028 025 032 096 096 098 074 072 081

15 nf 2nr200 013 012 017 011 009 015 035 033 044 022 020 032600 034 032 039 020 020 025 077 077 085 051 050 0651000 045 044 050 025 024 031 094 094 097 070 070 080

15 nf 3nr200 013 013 017 010 008 014 031 031 041 019 019 031600 025 025 031 015 015 021 074 074 083 043 044 0571000 037 037 043 023 024 030 090 089 094 064 066 077

Note Ratio sample size ratio between the reference and focal groups nr and nf represent sample sizes in reference and focal groups respectively N totalsample size N nr + nf τ the fraction of 1s in the population DIF differential item functioning ML maximum likelihood PML penalized maximumlikelihood WLR Weighted Logistic Regression

6 BioMed Research International

010

020

030

040

050

060

Pow

er

400 600 800 1000200

Total sample size

(a)

04

06

08

010

Pow

er

400 600 800 1000200

Total sample size

(b)

400 600 800 1000200

Total sample size

010

020

030

040

050

Pow

er

(c)

02

04

06

08

010

Pow

er

400 600 800 1000200

Total sample size

(d)

005

015

010

020

025

030

Pow

er

400 600 800 1000200

Total sample size

(e)

020

040

060

080

Pow

er

400 600 800 1000200

Total sample size

(f )

Figure 1 Continued

BioMed Research International 7

400 600 800 1000200

Total sample size

002

003

004

005

006

007

Type

I er

ror r

ate

(a)

002

004

006

008

Type

I er

ror r

ate

400 600 800 1000200

Total sample size

(b)

002

003

004

005

006

007

Type

I er

ror r

ate

1000600 800400200

Total sample size

(c)

002

003

004

005

006

007

008

Type

I er

ror r

ate

1000600 800400200

Total sample size

(d)

Figure 2 Continued

400 600 800 1000200

Total sample size

005

010

015

020

025

030

Pow

er

(g)

02

04

06

08

Pow

er

400 600 800 1000200

Total sample size

(h)

Figure 1 +e average power of MLE (solid lines) PML (dotted line) and WLR (broken line) methods on measures with 5 and 15 itemsNote Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nr τ 0156)(nf nr τ 0069) and (nf 3nr τ 0069)

8 BioMed Research International

and ML respectively Moreover as compared with ML andPML WLR gives smaller standard error for regressioncoefficient and it can result in increased likelihood offinding DIF

4 Discussion

Detecting DIF based on the logistic regression model can bechallenging for binary data where events are rare or severely

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(e)

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(f )

000

001

002

003

004

005

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(g)

400 600 800 1000200

Total sample size

000

002

004

006

Type

I er

ror r

ate

(h)

Figure 2 +e average type I error rates of MLE (solid lines) PMLE (dotted line) and WLR (broken line) methods on measures with 5 and15 items Note Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nrτ 0156) (nf nr τ 0069) and (nf 3nr τ 0069)

Table 3 +e results of DIF analysis across male and female Serbian individuals based on ML PML and WLR methods

ItemML PML WLR

B SE P value B SE P value B SE P value τ

Social phobia

7 061 029 0032 06 029 0042 061 031 0042 02420 minus 116 044 0007 minus 113 044 001 minus 119 042 0002 01738 124 044 0002 119 043 0004 118 041 0001 01943 minus 157 047 lt0001 minus 152 046 0001 minus 156 044 lt0001 019

Separation anxiety

9 minus 089 048 006 minus 086 047 01 minus 083 036 0015 03117 206 122 0045 17 106 0092 097 075 0008 01145 096 048 0034 091 047 0059 104 056 0036 00946 minus 304 114 0001 minus 269 101 0003 minus 19 051 lt0001 014

Generalized anxiety disorder 13 114 035 0001 112 035 0001 112 035 0001 03137 minus 201 044 lt0001 minus 196 044 lt0001 minus 195 044 lt0001 022

Obsessive compulsive disorder 23 079 043 0059 075 043 01 079 041 0038 018

Major depression2 123 045 0004 118 044 0006 119 045 0003 0196 minus 122 058 0031 minus 118 056 0043 minus 115 042 0004 01221 063 032 0048 062 032 0072 065 034 0049 026

Note P value is reported in three decimal places for more accuracy of comparing the three models τ the fraction of 1s in the population that is extracted froma data set of 4192 adolescents in eleven countries B regression coefficient for testing uniform DIF SE standard error of regression coefficient

BioMed Research International 9

imbalanced +e present study is designed to clarify theunknown consequences of rare events data in the context ofDIF analysis based on the LR model with and withoutapplying bias correction methods Hence in the compre-hensive simulation study we compared the performance ofthe WLR PML and ML methods for detecting DIF withfocus on imbalanced binary response data

According to our simulation findings of the three es-timation methods for detecting DIF the WLR seems to bethe most appealing with regards to its statistical propertiesits type I error rate is close to or less than the nominal 005level and its power is considerably higher than the PML andML Moreover in the present study comparing the threeinferential methods for the real data set confirmed thefindings of the simulation Accordingly as compared withWLR PML and ML were less sensitive for detecting DIFacross male and female Serbian adolescents

It is worthwhile to explain why the WLR outperforms thePML and ML estimation methods for DIF analysis Indeedthe WLR is an extension of the small-sample bias correctionsas described by Cordeiro and McCullagh to the weightedlikelihood [32] A previous simulation study has shown thatthis bias correction method reduces both the bias and thevariance thereby offering the smallest Mean Squared Error(MSE) when compared with that of PML and ML estimationmethods [33] Hence applying the WLR in DIF analysis canlead to reduction in the standard error of the parameterestimates which consequently increases the likelihood offinding DIF Another advantage of the WLR is that it givesunbiased predicted probability [15 20] However the pre-dicted probabilities from the WLRmodel may fall outside theplausible range of 0 to 1 [15]

Furthermore our findings revealed that the logisticregression model based on the traditional ML estimationmethod had a slightly better performance than the PML forimbalanced or rare events data +ese findings were similarto those of Lee who reported that the ML slightly out-performed the PML for detecting DIF in small samples [9]In this case as described by Lee it seems that the sacrifice inthe precision cannot be compensated for the bias reductionof the PML estimates in imbalanced data It should also benoted that while Firthrsquos PML method reduces the bias in theestimates of regression coefficients it introduces bias in thepredicted probabilities and this bias is not negligible for rareevents data [15] To overcome this problem Puhr et alproposed two modified versions of Firthrsquos PML resulting inunbiased predicted probabilities [15] Accordingly theproposed methods including Firthrsquos logistic regression withintercept correction and Firthrsquos logistic regression withadded covariate efficiently improve on predictions fromFirthrsquos logistic regression [15] Since these methods were notimplemented in the standard statistical software such as Rwe were not able to apply them in the context of DIF analysisand compare their results with those of the WLR modelHence in the present study the WLR is the most efficientmethod for DIF analysis under rare events data +e WLRnot only reduces the bias of regression coefficients but alsoprovides unbiased predicted probability in comparison tothe PML method [15 20]

What distinguishes this study from the previous one isthat we simultaneously evaluated the effect of imbalanced dataand small samples (as well as large samples) on the perfor-mance of the three estimation methods for DIF analysis Inaddition one of the advantages of the present study is that itprovides a guideline about the required sample size for DIFanalysis with imbalanced or rare events data Previous sim-ulation studies have shown that the minimum sample size forDIF analysis with logistic regression should be within therange of 100 to 200 per group [34 35] However for moderateimbalanced data (τ 0156) and severe DIF (DIF 08) as ageneral rule of thumb we would suggest imposing a mini-mum of 300 respondents per group to achieve the adequatepower with the WLR method In addition for severe im-balance rate (τ 0069) and DIF 08 theWLR performs wellto ensure the acceptable power with samples of 500 per groupTo find what the minimum number of sample size is toachieve the adequate power (80) for moderate DIF wesimulated items with DIF 04 τ 0156 and 0069 andsample sizes greater than 500 in each group not reported inthe results section +e findings revealed that sample sizes ofat least 1000 per group for τ 0156 and 2000 per group forτ 0069 are required to detect DIF 04 based on the WLRmethod

In the present study we restricted our simulation to onlytwo bias correction methods for detecting DIF namelyWLR and PML Nevertheless there are a limited number ofstudies that evaluated DIF based on other penalizationmethods such as LASSO For example Tutz and Schaubergerapplied LASSO penalization for DIF analysis which includescontinuous variables [36] Furthermore Magis et al pro-posed a new DIF detection method based on LASSO whereall items were simultaneously evaluated for DIF in a singlemodeling approach so that multiple testing would not be aproblem [37]

5 Conclusion

Although the logistic regression (LR) model is one the mostcommon methods for detecting DIF certain samplingstrategies and appropriate bias correction techniques shouldbe applied when LR is implemented on moderate or severeimbalanced data sets In summary our findings revealedthat as compared with ML and PML the WLR is a moresensitive method for detecting DIF when data are imbal-anced or rare Hence as well as its easy application toexisting software the WLR introduced by King and Zeng isstrongly recommended for detecting DIF due to higherpower and lower type I error rate in comparison to PML andML inferential methods However in the future studies ofDIF penalized and bias correction methods should beproposed for ordinal logistic regression in the presence ofrare events or small sample setting

Data Availability

+is is a simulation study and the real data set has beenprovided by Dr Dejan Stevanovic the third author of themanuscript

10 BioMed Research International

Conflicts of Interest

+e authors declare no potential conflicts of interest withrespect to the research authorship andor publication ofthis article

Acknowledgments

+is work was supported by the Shiraz University of MedicalSciences Research Council Shiraz Iran under grant no96-01-01-14601 +is paper was extracted fromMarjan FaghihrsquosPhD thesis under supervision of Prof Peyman JafariPeyman Jafari andMarjan Faghih equally contributed to thiswork

References

[1] B D Zumbo ldquo+ree generations of DIF analyses consideringwhere it has been where it is now and where it is goingrdquoLanguage Assessment Quarterly vol 4 no 2 pp 223ndash2332007

[2] T A Ackerman ldquo+e theory and practice of item responsetheory by de Ayala R Jrdquo Journal of Educational Measure-ment vol 47 no 4 pp 471ndash476 2010

[3] B E Clauser and K M Mazor ldquoUsing statistical proceduresto identify differentially functioning test itemsrdquo EducationalMeasurement Issues and Practice vol 17 no 1 pp 31ndash442005

[4] H Swaminathan and H J Rogers ldquoDetecting differential itemfunctioning using logistic regression proceduresrdquo Journal ofEducational Measurement vol 27 no 4 pp 361ndash370 1990

[5] H J Rogers and H Swaminathan ldquoA comparison of logisticregression and Mantel-Haenszel procedures for detectingdifferential item functioningrdquo Applied Psychological Mea-surement vol 17 no 2 pp 105ndash116 1993

[6] P Narayanon and H Swaminathan ldquoIdentification of itemsthat show nonuniform DIFrdquo Applied Psychological Mea-surement vol 20 no 3 pp 257ndash274 1996

[7] P Jafari D Stevanovic and Z Bagheri ldquoCross-culturalmeasurement equivalence of the KINDL questionnaire forquality of life assessment in children and adolescentsrdquo ChildPsychiatry amp Human Development vol 47 no 2 pp 291ndash3042016

[8] P K Crane L E Gibbons K Ocepek-Welikson et al ldquoAcomparison of three sets of criteria for determining thepresence of differential item functioning using ordinal logisticregressionrdquo Quality of Life Research vol 16 no 1 pp 69ndash842007

[9] S Lee ldquoDetecting differential item functioning using thelogistic regression procedure in small samplesrdquo AppliedPsychological Measurement vol 41 no 1 pp 30ndash43 2017

[10] E Allahyari P Jafari and Z Bagheri ldquoA simulation study toassess the effect of the number of response categories on thepower of ordinal logistic regression for differential itemfunctioning analysis in rating scalesrdquo Computational andMathematical Methods in Medicine vol 2016 Article ID5080826 8 pages 2016

[11] P O Monahan C A McHorney T E Stump andA J Perkins ldquoOdds ratio delta ETS classification andstandardization measures of dif magnitude for binary logisticregressionrdquo Journal of Educational and Behavioral Statisticsvol 32 no 1 pp 92ndash109 2007

[12] S Apinyapibal N Lawthong and S Kanjanawasee ldquoAcomparative analysis of the efficacy of differential itemfunctioning detection for dichotomously scored items amonglogistic regression SIBTEST and raschtree methodsrdquo Proce-dia-Social and Behavioral Sciences vol 191 pp 21ndash25 2015

[13] H-F Chen and K-Y Jin ldquoApplying logistic regression todetect differential item functioning in multidimensionaldatardquo Frontiers in Psychology vol 9 p 1302 2018

[14] D Svetina and L Rutkowski ldquoDetecting differential itemfunctioning using generalized logistic regression in thecontext of large-scale assessmentsrdquo Large-scale Assessments inEducation vol 2 no 1 2014

[15] R Puhr G Heinze M Nold et al ldquoFirthrsquos logistic regressionwith rare events accurate effect estimates and predictionsrdquoStatistics in Medicine vol 36 no 14 pp 2302ndash2317 2017

[16] D Firth ldquoBias reduction of maximum likelihood estimatesrdquoBiometrika vol 80 no 1 pp 27ndash38 1993

[17] M Pavlou G Ambler S Seaman M De Iorio andR Z Omar ldquoReview and evaluation of penalised regressionmethods for risk prediction in low-dimensional data with feweventsrdquo Statistics in Medicine vol 35 no 7 pp 1159ndash11772016

[18] M Maalouf and T B Trafalis ldquoRobust weighted kernel lo-gistic regression in imbalanced and rare events datardquo Com-putational Statistics amp Data Analysis vol 55 no 1pp 168ndash183 2011

[19] M Maalouf D Homouz and T B Trafalis ldquoLogistic re-gression in large rare events and imbalanced data a perfor-mance comparison of prior correction and weightingmethodsrdquo Computational Intelligence vol 34 no 1pp 161ndash174 2018

[20] G King and L Zeng ldquoLogistic regression in rare events datardquoPolitical Analysis vol 9 no 2 pp 137ndash163 2001

[21] I-C Huang W L Leite P Shearer M Seid D A Revickiand E A Shenkman ldquoDifferential item functioning in qualityof life measure between children with and without specialhealth-care needsrdquo Value in Health vol 14 no 6 pp 872ndash883 2011

[22] H Jeffreys ldquoAn invariant form for the prior probability inestimation problemsrdquo Proceedings of the Royal Society ofLondon Series A Mathematical and Physical Sciencesvol 186 no 1007 pp 453ndash461 1946

[23] G Heinze and M Schemper ldquoA solution to the problem ofseparation in logistic regressionrdquo Statistics in Medicinevol 21 no 16 pp 2409ndash2419 2002

[24] G Heinze and M Ploner ldquoFixing the nonconvergence bug inlogistic regression with SPLUS and SASrdquo Computer Methodsand Programs in Biomedicine vol 71 no 2 pp 181ndash187 2003

[25] G King and L Zeng ldquoExplaining rare events in internationalrelationsrdquo International Organization vol 55 no 3pp 693ndash715 2001

[26] M Maalouf ldquoLogistic regression in data analysis an over-viewrdquo International Journal of Data Analysis Techniques andStrategies vol 3 no 3 pp 281ndash299 2011

[27] D E A Sulasih S W Purnami and S P Rahayu BeBeoretical Study of Rare Event Weighted Logistic Regressionfor Classification of Imbalanced Data Universitas Muham-madiyah Surakarta Surakarta Indonesia 2015

[28] G Heinze logistf Firthrsquos Bias Reduced Logistic RegressionR Package Version 123 2018 httpsCRANR-projectorgpackage=logistf

[29] C Choirat J Honaker K Imai G King and O Lau ZeligEveryonersquos Statistical Software Version 5161 2018 httpzeligprojectorg

BioMed Research International 11

[30] M L Ong L Lu S Lee et al ldquoA comparison of the hier-archical generalized linear model multiple-indicators mul-tiple-causes and the item response theory-likelihood ratio testfor detecting differential item functioningrdquo in QuantitativePsychology Research pp 343ndash357 Springer Berlin Germany2015

[31] D Stevanovic Z Bagheri O Atilola et al ldquoCross-culturalmeasurement invariance of the revised Child anxiety anddepression scale across 11 world-wide societiesrdquo Epidemiol-ogy and Psychiatric Sciences vol 26 no 4 pp 430ndash440 2017

[32] G M Cordeiro and P McCullagh ldquoBias correction in gen-eralized linear modelsrdquo Journal of the Royal Statistical SocietySeries B (Methodological) vol 53 no 3 pp 629ndash643 1991

[33] T Maiti and V Pradhan ldquoBias reduction and a solution forseparation of logistic regression with missing covariatesrdquoBiometrics vol 65 no 4 pp 1262ndash1269 2009

[34] B D Zumbo A Handbook on the Beory and Methods ofDifferential Item Functioning (DIF) National DefenseHeadquarters Ottawa Canada 1999

[35] J-S Lai J Teresi and R Gershon ldquoProcedures for the analysisof differential item functioning (DIF) for small sample sizesrdquoEvaluation amp the Health Professions vol 28 no 3 pp 283ndash294 2005

[36] G Tutz and G Schauberger A Penalty Approach to Differ-ential Item Functioning in Rasch Models Springer BerlinGermany 2015

[37] D Magis F Tuerlinckx and P de Boeck ldquoDetection ofdifferential item functioning using the lasso approachrdquoJournal of Educational and Behavioral Statistics vol 40 no 2pp 111ndash135 2015

12 BioMed Research International

Page 3: A Comparative Study of the Bias Correction Methods for ... · 2020. 3. 2. · mum likelihood estimation method (PML) was originally developedbyDavidFirth[16]inordertoreducethesmall

L(β) 1113945n

i1πi( 1113857

yi 1 minus πi( 11138571minus yi 1113945

n

i1

eXiβ

1 + eXiβ1113888 1113889

yi 11 + eXiβ

1113874 11138751minus yi

ln L(β) 1113944n

i1yi ln πi + 1 minus yi( 1113857ln 1 minus πi( 1113857( 1113857 1113944

n

i1yi ln

eXiβ

1 + eXiβ1113888 1113889 + 1 minus yi( 1113857ln

11 + eXiβ

1113874 11138751113888 1113889

(4)

Unfortunately there is no closed-form expression formaximizing the log likelihood of β +e ML estimations cantherefore be obtained using numerical optimization methodswhich start with a guess point and repeat to improve +eNewtonndashRaphson method is one of the most commonly usednumerical methods which needs theHessianmatrix as follows

z2 ln L

zβr2 1113944

n

i1

minus x2ire

Xiβ

1 + eXiβ( 11138572

⎛⎝ ⎞⎠ 1113944n

i1minus xir

2 πi 1 minus πi( 1113857( 11138571113872 1113873 (5)

If vi is defined as πi(1 minus πi) and V diag(v1 vn)then the Hessian matrix can be written as

H(β) minus XprimeVX (6)

+e information matrix is given by I(β) minus (z2 lnLzβ2) XprimeVX and the variance of β is then V(β)

I(β)minus 1 (XprimeVX)minus 1

23 Penalized Maximum Likelihood +e penalized maxi-mum likelihood estimation method (PML) was originallydeveloped by David Firth [16] in order to reduce the smallsample bias of maximum likelihood estimates For expo-nential family models this method corresponds to penali-zation of the likelihood by Jeffreysrsquo invariant prior [22]+us the penalized log likelihood for logistic regressiontakes the following form

ln L(β)lowast

ln L(β) + 5 ln|I(β)| (7)

where |I(β)| denotes the determinant of the Fisher infor-mation matrix evaluated at β Penalized maximum likeli-hood estimates for βr (r 1 k) are involved in calculating

z ln L(β)lowast

zβr

U βr( 1113857lowast

1113944n

i1yi minus πi + hi 05 minus πi( 1113857( 1113857xir 0

r 1 k

(8)

where the hirsquos represent the diagonal elements of the pe-nalized likelihood version of the standard ldquohatrdquo matrix

H V12

X XprimeVX( 1113857minus 1

XprimeV12 (9)

By this method the first-order term is removed from theTaylor series expansion of the bias of the ML estimatorwhich has negligible impacts in large samples but can besevere in small samples or rare events

Now Firth-type estimation 1113954β can be produced itera-tively by NewtonndashRaphson algorithmwith a starting value of1113954β

(0) 0 until convergence is obtained

1113954β(s+1)

1113954βs

+ Iminus 1 1113954β

s1113872 1113873U 1113954β

s1113872 1113873lowast (10)

where the superscript s is the sth iteration If the penalizedlog likelihood assessed at 1113954β(s+1) is less than that assessed at 1113954β

s

then 1113954β(s+1) is recalculated by step-halving +e PML esti-mations can also be created by carrying out theMLEmethodand splitting each main observation i into two new obser-vations having outcomes 1 minus yi and yi with weights hi2and 1 + (hi2) respectively +e new observations changethe score function for the ML method to (yi minus πi)(1+1113864

(hi2)) + (1 minus yi minus πi)hi2xir yi minus πi + hi(05 minus πi)1113864 1113865xir+erefore the breaking of each observation into a nonre-sponse and a response ensures that the finite PML estimatesalways exist [23 24]

+e standard error estimation is based on the root of theinverse diagonal elements of the Fisher information matrixwhich is approximated by minus (z2 ln Llowastzβ2)1113966 1113967

minus 1[16]

24 Weighted Logistic Regression +e weighted logistic re-gression (WLR) introduced by King and Zeng [20 25] is oneof the bias correction methods which considers two cor-rection steps +e first correction concerns the weights tomake up for the differences in the proportion of events in thesample and population So the following weighted log-likelihood should be maximized

ln Lw(β | yX) 1113944n

i1wi ln

eyiXiβ

1 + eXiβ1113888 1113889 (11)

where wi w1yi + w0(1 minus yi) w0 1 minus τ1 minus y w1 τywith y and τ that are the fraction of 1s in the sample andpopulation respectively in which the above weighted log-likelihood is obtained under pure case-control sampling asfollows

Suppose that the joint distribution of y and X in thesample is

fs(yX | β) Ps(X | y β)Ps(y) (12)

Yet since X is a matrix of exploratory variables thenPs(X | y β) P(X | y β) In the other words the condi-tional probabilities of X in the sample and population areequal However the conditional probability of the pop-ulation is

P(X | y β) f(yX | β)

P(y) (13)

But

f(yX | β) P(y |X β)P(X) (14)

BioMed Research International 3

Also by replacing and rearranging

fs(yX | β) Ps(X | y β)Ps(y) P(X | y β)Ps(y) f(yX | β)

P(y)Ps(y)

Ps(y)

P(y)P(y |X β)P(X)

HQ

P(y |X β)P(X) (15)

where H and Q represent the proportions in the sample andpopulation respectively +e likelihood is then

L(β) 1113945n

i1

Hi

Qi

P yi

1113868111386811138681113868 xi β1113872 1113873P xi( 1113857 (16)

+e log-likelihood for WLR can then be rewritten as

ln Lw(β | yX) 1113944n

i1

Qi

Hi

lnP yi

1113868111386811138681113868 xi β1113872 1113873 1113944n

i1

Qi

Hi

lneyiXiβ

1 + eXiβ1113888 1113889 1113944

n

i1wi ln

eyiXiβ

1 + eXiβ1113888 1113889 (17)

where wi QiHi +us the likelihood function must bemultiplied by the inverse of the fractions in order to obtain aconsistent estimator If the proportion of events in thepopulation is more than that in the sample then wi is morethan one and so the nonevent are given less weight and viceversa [26 27] +e weighting method has two seriousproblems that limit its application first the calculation ofthe standard errors through the information matrix is se-verely biased +is problem will be solved by using Whitersquosheteroscedasticity-consistent variance matrix [20]

Second slope coefficient is biased in rare events data Forsolving the second problem the bias in 1113954β can be estimated bythe following weighted least-squares expression

bias(1113954β) XprimeVX( 1113857minus 1

XprimeVξ (18)

If bias (1113954β) is the bias component and 1113954β is the uncorrectedcoefficient then the corrected coefficients 1113957β is 1113957β 1113954βminus

bias(1113954β) where

ξi 05Qii 1 + w1( 11138571113954πi minus w11113858 1113859 (19)

Also Qii are the diagonal elements of Q X(XprimeVX)minus 1

Xprime and V diag 1113954πi(1 minus 1113954πi)wi1113864 1113865+e variance matrix of 1113957β is

V(1113957β) n

n + k1113874 1113875

2V(1113954β) (20)

Since (n(n + k))2 lt 1 thenV(1113957β)ltV(1113954β) and so both thevariance and the bias are now diminished

+e second correction step is to adjust the underesti-mation of the probabilities when using the bias-correctedcoefficients in the logistic model By subtracting a correctionfactor Ci to the biased value of probability 1113957πi

πi 1113957πi minus Ci Ci 05 minus 1113957πi( 11138571113957πi 1 minus 1113957πi( 1113857X0V(1113957β)X0prime (21)

where V(1113957β) is the variance-covariance matrix X0 is a vectorof exploratory variables and πi is the approximate unbiasedestimator by means of a known τ When τ is unknown orpartially known King and Zeng introduced πi 1113957πi + Ci asthe approximate Bayesian estimator In this case the

researcher may specify an upper and lower bound for thepossible range of τ

Maximum Likelihood (ML) Penalized MaximumLikelihood (PML) and rare event logistic regression wereimplemented using the ldquoglmrdquo function and the ldquologistfrdquo Rpackage as well as the ldquorelogitrdquo function in the R packageZelig respectively [28 29]

25 Data Generation In this study an item response theorymodel for binary data was used to produce response data+e mathematical form of the IRT model is

pij Y 1 | θi( 1113857 eaj θi minus bj( 1113857

1 + eaj θi minus bj( 1113857 (22)

where pij(θ) is the probability of correct response for in-dividual i of item j aj denotes the item discriminationparameter bj is the item difficulty parameter and θi rep-resents the ability level for the ith individual In this study bj

(j 2 3 J) and θ parameters were simulated from thestandard normal distribution and aj (j 2 3 J) pa-rameters were random samples of a uniform distributionwithin the interval (1 2) [1]

In this simulation study five factors were varied samplesize magnitude of uniform DIF sample size ratio numberof items and the degree of rareness or imbalance +reesample sizes (N 200 600 1000) and three levels of samplesize ratio (R 1 2 3) were investigated+e sample size ratiobetween the focal and reference groups was set to 1 1 for theequal sample size conditions and 2 1 and 3 1 for the un-equal sample size conditions More specifically we createdconditions with nRnF 100100 67133 and 50150 for thesmall sample size (N 200) and nRnF 300300 200400and 150450 for the medium sample size (N 600) and nRnF 500500 333667 and 250750 for the large sample size(N 1000) Furthermore the two measures with 5 and 15items were simulated (I 5 15)

To generate rare events data we followed similar sce-narios and notations used by King and Zeng In order togenerate imbalanced or rare events data we applied the logitmodel log it(πi) β0 + β1xi πi p(Yi 1) by holding β1

4 BioMed Research International

constant (β1 1) while varying β0 In this case the values ofβ0 minus 3 and β0 minus 2 generate outcomes with the percentagesof ones which are equal to 69 and 156 respectively Itshould be noted that we generate the simulated data basedon the IRTmodel not the logistic regression Hence we needto show that the parameters of the logistic regression and theIRT model with binary responses are one-to-one corre-spondent In order to denote that both models have similarmathematical expressions equation (22) can be rearrangedas log it(πi) minus ajbj + ajθi By comparing this equation withthe logit function log it(πi) β0 + β1xi coefficients β0 andβ1 and the variable xi in the logistic regression model arefound to be respectively correspondent to parameters minus ajbjaj and θi in the IRTmodel (ie β0 ndashajbj β1 aj and xi θi)With the assumption of aj β1 1 the constant parameter(β0) in the logistic regression is equivalent to the minus valueof the difficulty parameter (minus bj) in the IRT model Ac-cordingly in the IRT framework and when aj 1 with thevalues of bj 3 and 2 we can generate responses with 69and 156 imbalanced degree respectively

Statistical power is defined by the proportion of timesthat DIF is correctly identified by the logistic method acrossreplications and the type I error rate also referred to thefalse positive rate represents the proportion of non-DIFitems incorrectly flagged as having DIF in 1000 replications+e type I error rates are averaged over all without DIF items[30]

3 Results

Table 1 presents the statistical power of the LR model basedon three different inferential methods (ML PML andWLR)and under various combinations of τ R N DIF and scalelength +e major finding was that the power of the ML andPML was more affected by imbalanced degree (τ) than byWLR estimation method Specifically for the moderatemagnitude of DIF (DIF 04) regardless of the sample sizeand number of items increasing imbalanced or rarenessdegree τ from 0156 to 0069 resulted in a reduction byapproximately 38 37 and 30 in the power of PMLMLand WLR respectively Furthermore for the severe mag-nitude of DIF (DIF 08) a similar finding was observedie the power of the PML ML and WLR was reducedapproximately by 37 35 and 26 respectively Ingeneral our findings indicated that the power of the threeestimationmethods for detecting DIF could be ordered fromhighest to lowest as follows WLRgeMLge PML Indeed forthe moderate values of DIF (DIF 04) compared withWLR there were reductions of approximately 30 and 24under τ 0069 and 22 and 16 under τ 0156 in thepower for the PML andML respectively On the other handwhen the magnitude of DIF was severe (DIF 08) com-pared with WLR there were reductions of approximately27 and 23 under τ 0069 and 14 and 12 underτ 0156 in the power for the PML and ML respectively

When the magnitude of DIF was moderate (DIF 04)N 1000 R 1 and I 15 the maximum power of the PMLML andWLRwas 46 48 and 53 for τ 0156 and 2528 and 32 for τ 0069 respectively Hence to achieve

the adequate power (08) for detecting moderate DIF werequire a sample size of larger than 1000 (N 1000) Fur-thermore for the severe of DIF (DIF 08) and τ 0156 thepower of the PMLML andWLRwas 75 78 and 83 forI 5 and 80 81 and 87 for I 15 respectively whenN 600 and R 1 2 However under DIF 08 τ 0069R 1 and N 1000 the power of the PML ML and WLRwas 66 69 and 77 for I 5 and 72 74 and 81 forI 15 respectively

Table 2 reports the empirical type I error rate of the LRmodel under various combinations of imbalanced degree ofdata sample size sample size ratio magnitude of DIF andthe number of items In general regardless of the magnitudeof DIF sample size sample size ratio and the number ofitems the average type I error rates of the ML PML andWLR were 006 005 and 003 for τ 0156 and 006 004and 001 for τ 0069 +ese findings indicate that when theimbalanced degree τ increased from 0156 to 0069 theaverage type I error rate of the WLR decreased dramaticallywhile it was close to the 5 level for the ML and PMLmethods However there were some exceptions where thetype I error rates of the ML PML and WLR were higherthan the nominal level of 5 When DIF 08 N 1000I 5 and τ 0156 the empirical type I error rate was 1111 and 9 for ML 8 9 and 8 for PML and 8 7and 7 for WLR when R was equal to 1 2 and 3 re-spectively In this case as shown in Table 2 when thenumber of items increased from 5 to 15 the type I error ratewould be equal or less than 5

+e average power and type I error rate on measureswith 5 and 15 items are depicted in Figures 1 and 2According to Figure 1 irrespective of the number of itemsthe highest power in all combinations belonged to WLRwhile PML and ML had relatively equivalent powerFigure 2 indicates that in most simulation conditions typeI error rate was above or exactly at the nominal significancelevel of 005 for ML and below or exactly equal to thenominal level for the PML method (ML range005ndash011and PML range 003ndash005) However in WLRtype I error rate was lower than the other two methods andfluctuated between 002 and 005 from 0 to 001 forτ 0156 and τ 0069 respectively

31 Real Data Example In this section we used a real dataset to validate the simulation findings Accordingly 387Serbian individuals (403 male and 507 female) wereselected out of a larger sample of 4192 people from elevencountries participating in a cross-cultural study [31] +eparticipants completed the 47-item Revised Child Anxietyand Depression Scale (RCADS) +e RCADS divided intosix subscales including social phobia separation anxietydisorder generalized anxiety panic disorder obsessivecompulsive disorder and major depressive disorder Allitems use the same 4-point Likert-type response scales toassess the frequency of a certain symptom We binarizedthe item responses after transformation of never andsometimes into ldquono symptomrdquo and often and always intoldquosymptomrdquo For details on the RCADS as used in this

BioMed Research International 5

project see [31] +e results of DIF analysis of the RCADSacross male and female Serbian adolescents based on threeinferential methods (WLR PML and ML) are shown inTable 3 Our findings showed that ten items with DIF and33 items without DIF were common across the three

methods However the results indicate that the WLRmethod is more sensitive than the PML and ML fordetecting DIF For example in the separation anxietysubscale while the WLR model identified four items withDIF one and three items exhibited DIF according to PML

Table 2 Type I error rate of different methods of estimation under different combinations

Item Ratio NDIF 04 DIF 08

τ 0156 τ 0069 τ 0156 τ 0069ML PML WLR ML PML WLR ML PML WLR ML PML WLR

5 nf nr200 007 005 002 007 005 001lowast 007 005 003 007 005 001lowast600 005 004 003 005 003 001lowast 008 006 005 006 004 001lowast1000 007 005 004 006 004 001 011 008 008 006 005 001

5 nf 2nr200 007 005 003 007 005 001 008 006 004 007 005 001600 006 004 002 004 004 001 008 006 005 005 004 0011000 006 005 003 005 004 001 011 009 007 006 004 001

5 nf 3nr200 006 004 003 005 004 001 007 005 004 006 004 001600 007 005 003 007 005 001 008 006 004 007 005 0011000 007 006 004 006 004 001 009 007 007 006 005 001

15 nf nr200 005 003 002 005 003 001 005 003 002 005 003 001600 005 004 002 005 004 001 006 004 003 005 004 0011000 005 005 003 005 005 001 005 005 003 005 005 001

15 nf 2nr200 007 005 003 006 005 001lowast 007 005 003 007 005 001lowast600 006 004 003 006 005 001 006 005 003 006 005 0011000 004 004 002 004 003 001lowast 005 004 002 004 004 001lowast

15 nf 3nr200 006 005 002 006 005 001 006 005 002 006 005 001600 005 005 002 005 004 001lowast 005 005 002 005 005 001lowast1000 005 004 002 005 004 001 005 004 002 005 004 001lowast

Note Ratio sample size ratio between the reference and focal groups nr and nf represent sample sizes in reference and focal groups respectively N totalsample size N nr + nf τ the fraction of 1s in the population DIF differential item functioning ML maximum likelihood PML penalized maximumlikelihood WLR Weighted Logistic Regression lowastNear to 001

Table 1 Statistical power of different methods of estimation under different combinations

Item Ratio NDIF 04 DIF 08

τ 0156 τ 0069 τ 0156 τ 0069ML PML WLR ML PML WLR ML PML WLR ML PML WLR

5 nf nr200 013 010 016 009 006 012 034 030 042 020 015 028600 029 026 033 017 015 021 078 075 083 049 045 0581000 046 042 051 023 021 027 094 092 096 069 066 077

5 nf 2nr200 013 011 016 009 007 013 032 029 041 019 015 030600 029 027 032 016 015 022 074 072 082 045 044 0591000 041 039 046 023 022 029 089 088 093 062 061 073

5 nf 3nr200 013 011 017 009 008 016 030 028 041 017 016 029600 021 020 026 013 013 018 061 060 073 037 036 0511000 037 035 042 021 020 026 086 085 091 056 056 070

15 nf nr200 013 011 017 010 009 013 036 033 045 022 018 029600 034 032 038 020 019 023 081 080 087 052 049 0621000 048 046 053 028 025 032 096 096 098 074 072 081

15 nf 2nr200 013 012 017 011 009 015 035 033 044 022 020 032600 034 032 039 020 020 025 077 077 085 051 050 0651000 045 044 050 025 024 031 094 094 097 070 070 080

15 nf 3nr200 013 013 017 010 008 014 031 031 041 019 019 031600 025 025 031 015 015 021 074 074 083 043 044 0571000 037 037 043 023 024 030 090 089 094 064 066 077

Note Ratio sample size ratio between the reference and focal groups nr and nf represent sample sizes in reference and focal groups respectively N totalsample size N nr + nf τ the fraction of 1s in the population DIF differential item functioning ML maximum likelihood PML penalized maximumlikelihood WLR Weighted Logistic Regression

6 BioMed Research International

010

020

030

040

050

060

Pow

er

400 600 800 1000200

Total sample size

(a)

04

06

08

010

Pow

er

400 600 800 1000200

Total sample size

(b)

400 600 800 1000200

Total sample size

010

020

030

040

050

Pow

er

(c)

02

04

06

08

010

Pow

er

400 600 800 1000200

Total sample size

(d)

005

015

010

020

025

030

Pow

er

400 600 800 1000200

Total sample size

(e)

020

040

060

080

Pow

er

400 600 800 1000200

Total sample size

(f )

Figure 1 Continued

BioMed Research International 7

400 600 800 1000200

Total sample size

002

003

004

005

006

007

Type

I er

ror r

ate

(a)

002

004

006

008

Type

I er

ror r

ate

400 600 800 1000200

Total sample size

(b)

002

003

004

005

006

007

Type

I er

ror r

ate

1000600 800400200

Total sample size

(c)

002

003

004

005

006

007

008

Type

I er

ror r

ate

1000600 800400200

Total sample size

(d)

Figure 2 Continued

400 600 800 1000200

Total sample size

005

010

015

020

025

030

Pow

er

(g)

02

04

06

08

Pow

er

400 600 800 1000200

Total sample size

(h)

Figure 1 +e average power of MLE (solid lines) PML (dotted line) and WLR (broken line) methods on measures with 5 and 15 itemsNote Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nr τ 0156)(nf nr τ 0069) and (nf 3nr τ 0069)

8 BioMed Research International

and ML respectively Moreover as compared with ML andPML WLR gives smaller standard error for regressioncoefficient and it can result in increased likelihood offinding DIF

4 Discussion

Detecting DIF based on the logistic regression model can bechallenging for binary data where events are rare or severely

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(e)

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(f )

000

001

002

003

004

005

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(g)

400 600 800 1000200

Total sample size

000

002

004

006

Type

I er

ror r

ate

(h)

Figure 2 +e average type I error rates of MLE (solid lines) PMLE (dotted line) and WLR (broken line) methods on measures with 5 and15 items Note Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nrτ 0156) (nf nr τ 0069) and (nf 3nr τ 0069)

Table 3 +e results of DIF analysis across male and female Serbian individuals based on ML PML and WLR methods

ItemML PML WLR

B SE P value B SE P value B SE P value τ

Social phobia

7 061 029 0032 06 029 0042 061 031 0042 02420 minus 116 044 0007 minus 113 044 001 minus 119 042 0002 01738 124 044 0002 119 043 0004 118 041 0001 01943 minus 157 047 lt0001 minus 152 046 0001 minus 156 044 lt0001 019

Separation anxiety

9 minus 089 048 006 minus 086 047 01 minus 083 036 0015 03117 206 122 0045 17 106 0092 097 075 0008 01145 096 048 0034 091 047 0059 104 056 0036 00946 minus 304 114 0001 minus 269 101 0003 minus 19 051 lt0001 014

Generalized anxiety disorder 13 114 035 0001 112 035 0001 112 035 0001 03137 minus 201 044 lt0001 minus 196 044 lt0001 minus 195 044 lt0001 022

Obsessive compulsive disorder 23 079 043 0059 075 043 01 079 041 0038 018

Major depression2 123 045 0004 118 044 0006 119 045 0003 0196 minus 122 058 0031 minus 118 056 0043 minus 115 042 0004 01221 063 032 0048 062 032 0072 065 034 0049 026

Note P value is reported in three decimal places for more accuracy of comparing the three models τ the fraction of 1s in the population that is extracted froma data set of 4192 adolescents in eleven countries B regression coefficient for testing uniform DIF SE standard error of regression coefficient

BioMed Research International 9

imbalanced +e present study is designed to clarify theunknown consequences of rare events data in the context ofDIF analysis based on the LR model with and withoutapplying bias correction methods Hence in the compre-hensive simulation study we compared the performance ofthe WLR PML and ML methods for detecting DIF withfocus on imbalanced binary response data

According to our simulation findings of the three es-timation methods for detecting DIF the WLR seems to bethe most appealing with regards to its statistical propertiesits type I error rate is close to or less than the nominal 005level and its power is considerably higher than the PML andML Moreover in the present study comparing the threeinferential methods for the real data set confirmed thefindings of the simulation Accordingly as compared withWLR PML and ML were less sensitive for detecting DIFacross male and female Serbian adolescents

It is worthwhile to explain why the WLR outperforms thePML and ML estimation methods for DIF analysis Indeedthe WLR is an extension of the small-sample bias correctionsas described by Cordeiro and McCullagh to the weightedlikelihood [32] A previous simulation study has shown thatthis bias correction method reduces both the bias and thevariance thereby offering the smallest Mean Squared Error(MSE) when compared with that of PML and ML estimationmethods [33] Hence applying the WLR in DIF analysis canlead to reduction in the standard error of the parameterestimates which consequently increases the likelihood offinding DIF Another advantage of the WLR is that it givesunbiased predicted probability [15 20] However the pre-dicted probabilities from the WLRmodel may fall outside theplausible range of 0 to 1 [15]

Furthermore our findings revealed that the logisticregression model based on the traditional ML estimationmethod had a slightly better performance than the PML forimbalanced or rare events data +ese findings were similarto those of Lee who reported that the ML slightly out-performed the PML for detecting DIF in small samples [9]In this case as described by Lee it seems that the sacrifice inthe precision cannot be compensated for the bias reductionof the PML estimates in imbalanced data It should also benoted that while Firthrsquos PML method reduces the bias in theestimates of regression coefficients it introduces bias in thepredicted probabilities and this bias is not negligible for rareevents data [15] To overcome this problem Puhr et alproposed two modified versions of Firthrsquos PML resulting inunbiased predicted probabilities [15] Accordingly theproposed methods including Firthrsquos logistic regression withintercept correction and Firthrsquos logistic regression withadded covariate efficiently improve on predictions fromFirthrsquos logistic regression [15] Since these methods were notimplemented in the standard statistical software such as Rwe were not able to apply them in the context of DIF analysisand compare their results with those of the WLR modelHence in the present study the WLR is the most efficientmethod for DIF analysis under rare events data +e WLRnot only reduces the bias of regression coefficients but alsoprovides unbiased predicted probability in comparison tothe PML method [15 20]

What distinguishes this study from the previous one isthat we simultaneously evaluated the effect of imbalanced dataand small samples (as well as large samples) on the perfor-mance of the three estimation methods for DIF analysis Inaddition one of the advantages of the present study is that itprovides a guideline about the required sample size for DIFanalysis with imbalanced or rare events data Previous sim-ulation studies have shown that the minimum sample size forDIF analysis with logistic regression should be within therange of 100 to 200 per group [34 35] However for moderateimbalanced data (τ 0156) and severe DIF (DIF 08) as ageneral rule of thumb we would suggest imposing a mini-mum of 300 respondents per group to achieve the adequatepower with the WLR method In addition for severe im-balance rate (τ 0069) and DIF 08 theWLR performs wellto ensure the acceptable power with samples of 500 per groupTo find what the minimum number of sample size is toachieve the adequate power (80) for moderate DIF wesimulated items with DIF 04 τ 0156 and 0069 andsample sizes greater than 500 in each group not reported inthe results section +e findings revealed that sample sizes ofat least 1000 per group for τ 0156 and 2000 per group forτ 0069 are required to detect DIF 04 based on the WLRmethod

In the present study we restricted our simulation to onlytwo bias correction methods for detecting DIF namelyWLR and PML Nevertheless there are a limited number ofstudies that evaluated DIF based on other penalizationmethods such as LASSO For example Tutz and Schaubergerapplied LASSO penalization for DIF analysis which includescontinuous variables [36] Furthermore Magis et al pro-posed a new DIF detection method based on LASSO whereall items were simultaneously evaluated for DIF in a singlemodeling approach so that multiple testing would not be aproblem [37]

5 Conclusion

Although the logistic regression (LR) model is one the mostcommon methods for detecting DIF certain samplingstrategies and appropriate bias correction techniques shouldbe applied when LR is implemented on moderate or severeimbalanced data sets In summary our findings revealedthat as compared with ML and PML the WLR is a moresensitive method for detecting DIF when data are imbal-anced or rare Hence as well as its easy application toexisting software the WLR introduced by King and Zeng isstrongly recommended for detecting DIF due to higherpower and lower type I error rate in comparison to PML andML inferential methods However in the future studies ofDIF penalized and bias correction methods should beproposed for ordinal logistic regression in the presence ofrare events or small sample setting

Data Availability

+is is a simulation study and the real data set has beenprovided by Dr Dejan Stevanovic the third author of themanuscript

10 BioMed Research International

Conflicts of Interest

+e authors declare no potential conflicts of interest withrespect to the research authorship andor publication ofthis article

Acknowledgments

+is work was supported by the Shiraz University of MedicalSciences Research Council Shiraz Iran under grant no96-01-01-14601 +is paper was extracted fromMarjan FaghihrsquosPhD thesis under supervision of Prof Peyman JafariPeyman Jafari andMarjan Faghih equally contributed to thiswork

References

[1] B D Zumbo ldquo+ree generations of DIF analyses consideringwhere it has been where it is now and where it is goingrdquoLanguage Assessment Quarterly vol 4 no 2 pp 223ndash2332007

[2] T A Ackerman ldquo+e theory and practice of item responsetheory by de Ayala R Jrdquo Journal of Educational Measure-ment vol 47 no 4 pp 471ndash476 2010

[3] B E Clauser and K M Mazor ldquoUsing statistical proceduresto identify differentially functioning test itemsrdquo EducationalMeasurement Issues and Practice vol 17 no 1 pp 31ndash442005

[4] H Swaminathan and H J Rogers ldquoDetecting differential itemfunctioning using logistic regression proceduresrdquo Journal ofEducational Measurement vol 27 no 4 pp 361ndash370 1990

[5] H J Rogers and H Swaminathan ldquoA comparison of logisticregression and Mantel-Haenszel procedures for detectingdifferential item functioningrdquo Applied Psychological Mea-surement vol 17 no 2 pp 105ndash116 1993

[6] P Narayanon and H Swaminathan ldquoIdentification of itemsthat show nonuniform DIFrdquo Applied Psychological Mea-surement vol 20 no 3 pp 257ndash274 1996

[7] P Jafari D Stevanovic and Z Bagheri ldquoCross-culturalmeasurement equivalence of the KINDL questionnaire forquality of life assessment in children and adolescentsrdquo ChildPsychiatry amp Human Development vol 47 no 2 pp 291ndash3042016

[8] P K Crane L E Gibbons K Ocepek-Welikson et al ldquoAcomparison of three sets of criteria for determining thepresence of differential item functioning using ordinal logisticregressionrdquo Quality of Life Research vol 16 no 1 pp 69ndash842007

[9] S Lee ldquoDetecting differential item functioning using thelogistic regression procedure in small samplesrdquo AppliedPsychological Measurement vol 41 no 1 pp 30ndash43 2017

[10] E Allahyari P Jafari and Z Bagheri ldquoA simulation study toassess the effect of the number of response categories on thepower of ordinal logistic regression for differential itemfunctioning analysis in rating scalesrdquo Computational andMathematical Methods in Medicine vol 2016 Article ID5080826 8 pages 2016

[11] P O Monahan C A McHorney T E Stump andA J Perkins ldquoOdds ratio delta ETS classification andstandardization measures of dif magnitude for binary logisticregressionrdquo Journal of Educational and Behavioral Statisticsvol 32 no 1 pp 92ndash109 2007

[12] S Apinyapibal N Lawthong and S Kanjanawasee ldquoAcomparative analysis of the efficacy of differential itemfunctioning detection for dichotomously scored items amonglogistic regression SIBTEST and raschtree methodsrdquo Proce-dia-Social and Behavioral Sciences vol 191 pp 21ndash25 2015

[13] H-F Chen and K-Y Jin ldquoApplying logistic regression todetect differential item functioning in multidimensionaldatardquo Frontiers in Psychology vol 9 p 1302 2018

[14] D Svetina and L Rutkowski ldquoDetecting differential itemfunctioning using generalized logistic regression in thecontext of large-scale assessmentsrdquo Large-scale Assessments inEducation vol 2 no 1 2014

[15] R Puhr G Heinze M Nold et al ldquoFirthrsquos logistic regressionwith rare events accurate effect estimates and predictionsrdquoStatistics in Medicine vol 36 no 14 pp 2302ndash2317 2017

[16] D Firth ldquoBias reduction of maximum likelihood estimatesrdquoBiometrika vol 80 no 1 pp 27ndash38 1993

[17] M Pavlou G Ambler S Seaman M De Iorio andR Z Omar ldquoReview and evaluation of penalised regressionmethods for risk prediction in low-dimensional data with feweventsrdquo Statistics in Medicine vol 35 no 7 pp 1159ndash11772016

[18] M Maalouf and T B Trafalis ldquoRobust weighted kernel lo-gistic regression in imbalanced and rare events datardquo Com-putational Statistics amp Data Analysis vol 55 no 1pp 168ndash183 2011

[19] M Maalouf D Homouz and T B Trafalis ldquoLogistic re-gression in large rare events and imbalanced data a perfor-mance comparison of prior correction and weightingmethodsrdquo Computational Intelligence vol 34 no 1pp 161ndash174 2018

[20] G King and L Zeng ldquoLogistic regression in rare events datardquoPolitical Analysis vol 9 no 2 pp 137ndash163 2001

[21] I-C Huang W L Leite P Shearer M Seid D A Revickiand E A Shenkman ldquoDifferential item functioning in qualityof life measure between children with and without specialhealth-care needsrdquo Value in Health vol 14 no 6 pp 872ndash883 2011

[22] H Jeffreys ldquoAn invariant form for the prior probability inestimation problemsrdquo Proceedings of the Royal Society ofLondon Series A Mathematical and Physical Sciencesvol 186 no 1007 pp 453ndash461 1946

[23] G Heinze and M Schemper ldquoA solution to the problem ofseparation in logistic regressionrdquo Statistics in Medicinevol 21 no 16 pp 2409ndash2419 2002

[24] G Heinze and M Ploner ldquoFixing the nonconvergence bug inlogistic regression with SPLUS and SASrdquo Computer Methodsand Programs in Biomedicine vol 71 no 2 pp 181ndash187 2003

[25] G King and L Zeng ldquoExplaining rare events in internationalrelationsrdquo International Organization vol 55 no 3pp 693ndash715 2001

[26] M Maalouf ldquoLogistic regression in data analysis an over-viewrdquo International Journal of Data Analysis Techniques andStrategies vol 3 no 3 pp 281ndash299 2011

[27] D E A Sulasih S W Purnami and S P Rahayu BeBeoretical Study of Rare Event Weighted Logistic Regressionfor Classification of Imbalanced Data Universitas Muham-madiyah Surakarta Surakarta Indonesia 2015

[28] G Heinze logistf Firthrsquos Bias Reduced Logistic RegressionR Package Version 123 2018 httpsCRANR-projectorgpackage=logistf

[29] C Choirat J Honaker K Imai G King and O Lau ZeligEveryonersquos Statistical Software Version 5161 2018 httpzeligprojectorg

BioMed Research International 11

[30] M L Ong L Lu S Lee et al ldquoA comparison of the hier-archical generalized linear model multiple-indicators mul-tiple-causes and the item response theory-likelihood ratio testfor detecting differential item functioningrdquo in QuantitativePsychology Research pp 343ndash357 Springer Berlin Germany2015

[31] D Stevanovic Z Bagheri O Atilola et al ldquoCross-culturalmeasurement invariance of the revised Child anxiety anddepression scale across 11 world-wide societiesrdquo Epidemiol-ogy and Psychiatric Sciences vol 26 no 4 pp 430ndash440 2017

[32] G M Cordeiro and P McCullagh ldquoBias correction in gen-eralized linear modelsrdquo Journal of the Royal Statistical SocietySeries B (Methodological) vol 53 no 3 pp 629ndash643 1991

[33] T Maiti and V Pradhan ldquoBias reduction and a solution forseparation of logistic regression with missing covariatesrdquoBiometrics vol 65 no 4 pp 1262ndash1269 2009

[34] B D Zumbo A Handbook on the Beory and Methods ofDifferential Item Functioning (DIF) National DefenseHeadquarters Ottawa Canada 1999

[35] J-S Lai J Teresi and R Gershon ldquoProcedures for the analysisof differential item functioning (DIF) for small sample sizesrdquoEvaluation amp the Health Professions vol 28 no 3 pp 283ndash294 2005

[36] G Tutz and G Schauberger A Penalty Approach to Differ-ential Item Functioning in Rasch Models Springer BerlinGermany 2015

[37] D Magis F Tuerlinckx and P de Boeck ldquoDetection ofdifferential item functioning using the lasso approachrdquoJournal of Educational and Behavioral Statistics vol 40 no 2pp 111ndash135 2015

12 BioMed Research International

Page 4: A Comparative Study of the Bias Correction Methods for ... · 2020. 3. 2. · mum likelihood estimation method (PML) was originally developedbyDavidFirth[16]inordertoreducethesmall

Also by replacing and rearranging

fs(yX | β) Ps(X | y β)Ps(y) P(X | y β)Ps(y) f(yX | β)

P(y)Ps(y)

Ps(y)

P(y)P(y |X β)P(X)

HQ

P(y |X β)P(X) (15)

where H and Q represent the proportions in the sample andpopulation respectively +e likelihood is then

L(β) 1113945n

i1

Hi

Qi

P yi

1113868111386811138681113868 xi β1113872 1113873P xi( 1113857 (16)

+e log-likelihood for WLR can then be rewritten as

ln Lw(β | yX) 1113944n

i1

Qi

Hi

lnP yi

1113868111386811138681113868 xi β1113872 1113873 1113944n

i1

Qi

Hi

lneyiXiβ

1 + eXiβ1113888 1113889 1113944

n

i1wi ln

eyiXiβ

1 + eXiβ1113888 1113889 (17)

where wi QiHi +us the likelihood function must bemultiplied by the inverse of the fractions in order to obtain aconsistent estimator If the proportion of events in thepopulation is more than that in the sample then wi is morethan one and so the nonevent are given less weight and viceversa [26 27] +e weighting method has two seriousproblems that limit its application first the calculation ofthe standard errors through the information matrix is se-verely biased +is problem will be solved by using Whitersquosheteroscedasticity-consistent variance matrix [20]

Second slope coefficient is biased in rare events data Forsolving the second problem the bias in 1113954β can be estimated bythe following weighted least-squares expression

bias(1113954β) XprimeVX( 1113857minus 1

XprimeVξ (18)

If bias (1113954β) is the bias component and 1113954β is the uncorrectedcoefficient then the corrected coefficients 1113957β is 1113957β 1113954βminus

bias(1113954β) where

ξi 05Qii 1 + w1( 11138571113954πi minus w11113858 1113859 (19)

Also Qii are the diagonal elements of Q X(XprimeVX)minus 1

Xprime and V diag 1113954πi(1 minus 1113954πi)wi1113864 1113865+e variance matrix of 1113957β is

V(1113957β) n

n + k1113874 1113875

2V(1113954β) (20)

Since (n(n + k))2 lt 1 thenV(1113957β)ltV(1113954β) and so both thevariance and the bias are now diminished

+e second correction step is to adjust the underesti-mation of the probabilities when using the bias-correctedcoefficients in the logistic model By subtracting a correctionfactor Ci to the biased value of probability 1113957πi

πi 1113957πi minus Ci Ci 05 minus 1113957πi( 11138571113957πi 1 minus 1113957πi( 1113857X0V(1113957β)X0prime (21)

where V(1113957β) is the variance-covariance matrix X0 is a vectorof exploratory variables and πi is the approximate unbiasedestimator by means of a known τ When τ is unknown orpartially known King and Zeng introduced πi 1113957πi + Ci asthe approximate Bayesian estimator In this case the

researcher may specify an upper and lower bound for thepossible range of τ

Maximum Likelihood (ML) Penalized MaximumLikelihood (PML) and rare event logistic regression wereimplemented using the ldquoglmrdquo function and the ldquologistfrdquo Rpackage as well as the ldquorelogitrdquo function in the R packageZelig respectively [28 29]

25 Data Generation In this study an item response theorymodel for binary data was used to produce response data+e mathematical form of the IRT model is

pij Y 1 | θi( 1113857 eaj θi minus bj( 1113857

1 + eaj θi minus bj( 1113857 (22)

where pij(θ) is the probability of correct response for in-dividual i of item j aj denotes the item discriminationparameter bj is the item difficulty parameter and θi rep-resents the ability level for the ith individual In this study bj

(j 2 3 J) and θ parameters were simulated from thestandard normal distribution and aj (j 2 3 J) pa-rameters were random samples of a uniform distributionwithin the interval (1 2) [1]

In this simulation study five factors were varied samplesize magnitude of uniform DIF sample size ratio numberof items and the degree of rareness or imbalance +reesample sizes (N 200 600 1000) and three levels of samplesize ratio (R 1 2 3) were investigated+e sample size ratiobetween the focal and reference groups was set to 1 1 for theequal sample size conditions and 2 1 and 3 1 for the un-equal sample size conditions More specifically we createdconditions with nRnF 100100 67133 and 50150 for thesmall sample size (N 200) and nRnF 300300 200400and 150450 for the medium sample size (N 600) and nRnF 500500 333667 and 250750 for the large sample size(N 1000) Furthermore the two measures with 5 and 15items were simulated (I 5 15)

To generate rare events data we followed similar sce-narios and notations used by King and Zeng In order togenerate imbalanced or rare events data we applied the logitmodel log it(πi) β0 + β1xi πi p(Yi 1) by holding β1

4 BioMed Research International

constant (β1 1) while varying β0 In this case the values ofβ0 minus 3 and β0 minus 2 generate outcomes with the percentagesof ones which are equal to 69 and 156 respectively Itshould be noted that we generate the simulated data basedon the IRTmodel not the logistic regression Hence we needto show that the parameters of the logistic regression and theIRT model with binary responses are one-to-one corre-spondent In order to denote that both models have similarmathematical expressions equation (22) can be rearrangedas log it(πi) minus ajbj + ajθi By comparing this equation withthe logit function log it(πi) β0 + β1xi coefficients β0 andβ1 and the variable xi in the logistic regression model arefound to be respectively correspondent to parameters minus ajbjaj and θi in the IRTmodel (ie β0 ndashajbj β1 aj and xi θi)With the assumption of aj β1 1 the constant parameter(β0) in the logistic regression is equivalent to the minus valueof the difficulty parameter (minus bj) in the IRT model Ac-cordingly in the IRT framework and when aj 1 with thevalues of bj 3 and 2 we can generate responses with 69and 156 imbalanced degree respectively

Statistical power is defined by the proportion of timesthat DIF is correctly identified by the logistic method acrossreplications and the type I error rate also referred to thefalse positive rate represents the proportion of non-DIFitems incorrectly flagged as having DIF in 1000 replications+e type I error rates are averaged over all without DIF items[30]

3 Results

Table 1 presents the statistical power of the LR model basedon three different inferential methods (ML PML andWLR)and under various combinations of τ R N DIF and scalelength +e major finding was that the power of the ML andPML was more affected by imbalanced degree (τ) than byWLR estimation method Specifically for the moderatemagnitude of DIF (DIF 04) regardless of the sample sizeand number of items increasing imbalanced or rarenessdegree τ from 0156 to 0069 resulted in a reduction byapproximately 38 37 and 30 in the power of PMLMLand WLR respectively Furthermore for the severe mag-nitude of DIF (DIF 08) a similar finding was observedie the power of the PML ML and WLR was reducedapproximately by 37 35 and 26 respectively Ingeneral our findings indicated that the power of the threeestimationmethods for detecting DIF could be ordered fromhighest to lowest as follows WLRgeMLge PML Indeed forthe moderate values of DIF (DIF 04) compared withWLR there were reductions of approximately 30 and 24under τ 0069 and 22 and 16 under τ 0156 in thepower for the PML andML respectively On the other handwhen the magnitude of DIF was severe (DIF 08) com-pared with WLR there were reductions of approximately27 and 23 under τ 0069 and 14 and 12 underτ 0156 in the power for the PML and ML respectively

When the magnitude of DIF was moderate (DIF 04)N 1000 R 1 and I 15 the maximum power of the PMLML andWLRwas 46 48 and 53 for τ 0156 and 2528 and 32 for τ 0069 respectively Hence to achieve

the adequate power (08) for detecting moderate DIF werequire a sample size of larger than 1000 (N 1000) Fur-thermore for the severe of DIF (DIF 08) and τ 0156 thepower of the PMLML andWLRwas 75 78 and 83 forI 5 and 80 81 and 87 for I 15 respectively whenN 600 and R 1 2 However under DIF 08 τ 0069R 1 and N 1000 the power of the PML ML and WLRwas 66 69 and 77 for I 5 and 72 74 and 81 forI 15 respectively

Table 2 reports the empirical type I error rate of the LRmodel under various combinations of imbalanced degree ofdata sample size sample size ratio magnitude of DIF andthe number of items In general regardless of the magnitudeof DIF sample size sample size ratio and the number ofitems the average type I error rates of the ML PML andWLR were 006 005 and 003 for τ 0156 and 006 004and 001 for τ 0069 +ese findings indicate that when theimbalanced degree τ increased from 0156 to 0069 theaverage type I error rate of the WLR decreased dramaticallywhile it was close to the 5 level for the ML and PMLmethods However there were some exceptions where thetype I error rates of the ML PML and WLR were higherthan the nominal level of 5 When DIF 08 N 1000I 5 and τ 0156 the empirical type I error rate was 1111 and 9 for ML 8 9 and 8 for PML and 8 7and 7 for WLR when R was equal to 1 2 and 3 re-spectively In this case as shown in Table 2 when thenumber of items increased from 5 to 15 the type I error ratewould be equal or less than 5

+e average power and type I error rate on measureswith 5 and 15 items are depicted in Figures 1 and 2According to Figure 1 irrespective of the number of itemsthe highest power in all combinations belonged to WLRwhile PML and ML had relatively equivalent powerFigure 2 indicates that in most simulation conditions typeI error rate was above or exactly at the nominal significancelevel of 005 for ML and below or exactly equal to thenominal level for the PML method (ML range005ndash011and PML range 003ndash005) However in WLRtype I error rate was lower than the other two methods andfluctuated between 002 and 005 from 0 to 001 forτ 0156 and τ 0069 respectively

31 Real Data Example In this section we used a real dataset to validate the simulation findings Accordingly 387Serbian individuals (403 male and 507 female) wereselected out of a larger sample of 4192 people from elevencountries participating in a cross-cultural study [31] +eparticipants completed the 47-item Revised Child Anxietyand Depression Scale (RCADS) +e RCADS divided intosix subscales including social phobia separation anxietydisorder generalized anxiety panic disorder obsessivecompulsive disorder and major depressive disorder Allitems use the same 4-point Likert-type response scales toassess the frequency of a certain symptom We binarizedthe item responses after transformation of never andsometimes into ldquono symptomrdquo and often and always intoldquosymptomrdquo For details on the RCADS as used in this

BioMed Research International 5

project see [31] +e results of DIF analysis of the RCADSacross male and female Serbian adolescents based on threeinferential methods (WLR PML and ML) are shown inTable 3 Our findings showed that ten items with DIF and33 items without DIF were common across the three

methods However the results indicate that the WLRmethod is more sensitive than the PML and ML fordetecting DIF For example in the separation anxietysubscale while the WLR model identified four items withDIF one and three items exhibited DIF according to PML

Table 2 Type I error rate of different methods of estimation under different combinations

Item Ratio NDIF 04 DIF 08

τ 0156 τ 0069 τ 0156 τ 0069ML PML WLR ML PML WLR ML PML WLR ML PML WLR

5 nf nr200 007 005 002 007 005 001lowast 007 005 003 007 005 001lowast600 005 004 003 005 003 001lowast 008 006 005 006 004 001lowast1000 007 005 004 006 004 001 011 008 008 006 005 001

5 nf 2nr200 007 005 003 007 005 001 008 006 004 007 005 001600 006 004 002 004 004 001 008 006 005 005 004 0011000 006 005 003 005 004 001 011 009 007 006 004 001

5 nf 3nr200 006 004 003 005 004 001 007 005 004 006 004 001600 007 005 003 007 005 001 008 006 004 007 005 0011000 007 006 004 006 004 001 009 007 007 006 005 001

15 nf nr200 005 003 002 005 003 001 005 003 002 005 003 001600 005 004 002 005 004 001 006 004 003 005 004 0011000 005 005 003 005 005 001 005 005 003 005 005 001

15 nf 2nr200 007 005 003 006 005 001lowast 007 005 003 007 005 001lowast600 006 004 003 006 005 001 006 005 003 006 005 0011000 004 004 002 004 003 001lowast 005 004 002 004 004 001lowast

15 nf 3nr200 006 005 002 006 005 001 006 005 002 006 005 001600 005 005 002 005 004 001lowast 005 005 002 005 005 001lowast1000 005 004 002 005 004 001 005 004 002 005 004 001lowast

Note Ratio sample size ratio between the reference and focal groups nr and nf represent sample sizes in reference and focal groups respectively N totalsample size N nr + nf τ the fraction of 1s in the population DIF differential item functioning ML maximum likelihood PML penalized maximumlikelihood WLR Weighted Logistic Regression lowastNear to 001

Table 1 Statistical power of different methods of estimation under different combinations

Item Ratio NDIF 04 DIF 08

τ 0156 τ 0069 τ 0156 τ 0069ML PML WLR ML PML WLR ML PML WLR ML PML WLR

5 nf nr200 013 010 016 009 006 012 034 030 042 020 015 028600 029 026 033 017 015 021 078 075 083 049 045 0581000 046 042 051 023 021 027 094 092 096 069 066 077

5 nf 2nr200 013 011 016 009 007 013 032 029 041 019 015 030600 029 027 032 016 015 022 074 072 082 045 044 0591000 041 039 046 023 022 029 089 088 093 062 061 073

5 nf 3nr200 013 011 017 009 008 016 030 028 041 017 016 029600 021 020 026 013 013 018 061 060 073 037 036 0511000 037 035 042 021 020 026 086 085 091 056 056 070

15 nf nr200 013 011 017 010 009 013 036 033 045 022 018 029600 034 032 038 020 019 023 081 080 087 052 049 0621000 048 046 053 028 025 032 096 096 098 074 072 081

15 nf 2nr200 013 012 017 011 009 015 035 033 044 022 020 032600 034 032 039 020 020 025 077 077 085 051 050 0651000 045 044 050 025 024 031 094 094 097 070 070 080

15 nf 3nr200 013 013 017 010 008 014 031 031 041 019 019 031600 025 025 031 015 015 021 074 074 083 043 044 0571000 037 037 043 023 024 030 090 089 094 064 066 077

Note Ratio sample size ratio between the reference and focal groups nr and nf represent sample sizes in reference and focal groups respectively N totalsample size N nr + nf τ the fraction of 1s in the population DIF differential item functioning ML maximum likelihood PML penalized maximumlikelihood WLR Weighted Logistic Regression

6 BioMed Research International

010

020

030

040

050

060

Pow

er

400 600 800 1000200

Total sample size

(a)

04

06

08

010

Pow

er

400 600 800 1000200

Total sample size

(b)

400 600 800 1000200

Total sample size

010

020

030

040

050

Pow

er

(c)

02

04

06

08

010

Pow

er

400 600 800 1000200

Total sample size

(d)

005

015

010

020

025

030

Pow

er

400 600 800 1000200

Total sample size

(e)

020

040

060

080

Pow

er

400 600 800 1000200

Total sample size

(f )

Figure 1 Continued

BioMed Research International 7

400 600 800 1000200

Total sample size

002

003

004

005

006

007

Type

I er

ror r

ate

(a)

002

004

006

008

Type

I er

ror r

ate

400 600 800 1000200

Total sample size

(b)

002

003

004

005

006

007

Type

I er

ror r

ate

1000600 800400200

Total sample size

(c)

002

003

004

005

006

007

008

Type

I er

ror r

ate

1000600 800400200

Total sample size

(d)

Figure 2 Continued

400 600 800 1000200

Total sample size

005

010

015

020

025

030

Pow

er

(g)

02

04

06

08

Pow

er

400 600 800 1000200

Total sample size

(h)

Figure 1 +e average power of MLE (solid lines) PML (dotted line) and WLR (broken line) methods on measures with 5 and 15 itemsNote Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nr τ 0156)(nf nr τ 0069) and (nf 3nr τ 0069)

8 BioMed Research International

and ML respectively Moreover as compared with ML andPML WLR gives smaller standard error for regressioncoefficient and it can result in increased likelihood offinding DIF

4 Discussion

Detecting DIF based on the logistic regression model can bechallenging for binary data where events are rare or severely

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(e)

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(f )

000

001

002

003

004

005

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(g)

400 600 800 1000200

Total sample size

000

002

004

006

Type

I er

ror r

ate

(h)

Figure 2 +e average type I error rates of MLE (solid lines) PMLE (dotted line) and WLR (broken line) methods on measures with 5 and15 items Note Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nrτ 0156) (nf nr τ 0069) and (nf 3nr τ 0069)

Table 3 +e results of DIF analysis across male and female Serbian individuals based on ML PML and WLR methods

ItemML PML WLR

B SE P value B SE P value B SE P value τ

Social phobia

7 061 029 0032 06 029 0042 061 031 0042 02420 minus 116 044 0007 minus 113 044 001 minus 119 042 0002 01738 124 044 0002 119 043 0004 118 041 0001 01943 minus 157 047 lt0001 minus 152 046 0001 minus 156 044 lt0001 019

Separation anxiety

9 minus 089 048 006 minus 086 047 01 minus 083 036 0015 03117 206 122 0045 17 106 0092 097 075 0008 01145 096 048 0034 091 047 0059 104 056 0036 00946 minus 304 114 0001 minus 269 101 0003 minus 19 051 lt0001 014

Generalized anxiety disorder 13 114 035 0001 112 035 0001 112 035 0001 03137 minus 201 044 lt0001 minus 196 044 lt0001 minus 195 044 lt0001 022

Obsessive compulsive disorder 23 079 043 0059 075 043 01 079 041 0038 018

Major depression2 123 045 0004 118 044 0006 119 045 0003 0196 minus 122 058 0031 minus 118 056 0043 minus 115 042 0004 01221 063 032 0048 062 032 0072 065 034 0049 026

Note P value is reported in three decimal places for more accuracy of comparing the three models τ the fraction of 1s in the population that is extracted froma data set of 4192 adolescents in eleven countries B regression coefficient for testing uniform DIF SE standard error of regression coefficient

BioMed Research International 9

imbalanced +e present study is designed to clarify theunknown consequences of rare events data in the context ofDIF analysis based on the LR model with and withoutapplying bias correction methods Hence in the compre-hensive simulation study we compared the performance ofthe WLR PML and ML methods for detecting DIF withfocus on imbalanced binary response data

According to our simulation findings of the three es-timation methods for detecting DIF the WLR seems to bethe most appealing with regards to its statistical propertiesits type I error rate is close to or less than the nominal 005level and its power is considerably higher than the PML andML Moreover in the present study comparing the threeinferential methods for the real data set confirmed thefindings of the simulation Accordingly as compared withWLR PML and ML were less sensitive for detecting DIFacross male and female Serbian adolescents

It is worthwhile to explain why the WLR outperforms thePML and ML estimation methods for DIF analysis Indeedthe WLR is an extension of the small-sample bias correctionsas described by Cordeiro and McCullagh to the weightedlikelihood [32] A previous simulation study has shown thatthis bias correction method reduces both the bias and thevariance thereby offering the smallest Mean Squared Error(MSE) when compared with that of PML and ML estimationmethods [33] Hence applying the WLR in DIF analysis canlead to reduction in the standard error of the parameterestimates which consequently increases the likelihood offinding DIF Another advantage of the WLR is that it givesunbiased predicted probability [15 20] However the pre-dicted probabilities from the WLRmodel may fall outside theplausible range of 0 to 1 [15]

Furthermore our findings revealed that the logisticregression model based on the traditional ML estimationmethod had a slightly better performance than the PML forimbalanced or rare events data +ese findings were similarto those of Lee who reported that the ML slightly out-performed the PML for detecting DIF in small samples [9]In this case as described by Lee it seems that the sacrifice inthe precision cannot be compensated for the bias reductionof the PML estimates in imbalanced data It should also benoted that while Firthrsquos PML method reduces the bias in theestimates of regression coefficients it introduces bias in thepredicted probabilities and this bias is not negligible for rareevents data [15] To overcome this problem Puhr et alproposed two modified versions of Firthrsquos PML resulting inunbiased predicted probabilities [15] Accordingly theproposed methods including Firthrsquos logistic regression withintercept correction and Firthrsquos logistic regression withadded covariate efficiently improve on predictions fromFirthrsquos logistic regression [15] Since these methods were notimplemented in the standard statistical software such as Rwe were not able to apply them in the context of DIF analysisand compare their results with those of the WLR modelHence in the present study the WLR is the most efficientmethod for DIF analysis under rare events data +e WLRnot only reduces the bias of regression coefficients but alsoprovides unbiased predicted probability in comparison tothe PML method [15 20]

What distinguishes this study from the previous one isthat we simultaneously evaluated the effect of imbalanced dataand small samples (as well as large samples) on the perfor-mance of the three estimation methods for DIF analysis Inaddition one of the advantages of the present study is that itprovides a guideline about the required sample size for DIFanalysis with imbalanced or rare events data Previous sim-ulation studies have shown that the minimum sample size forDIF analysis with logistic regression should be within therange of 100 to 200 per group [34 35] However for moderateimbalanced data (τ 0156) and severe DIF (DIF 08) as ageneral rule of thumb we would suggest imposing a mini-mum of 300 respondents per group to achieve the adequatepower with the WLR method In addition for severe im-balance rate (τ 0069) and DIF 08 theWLR performs wellto ensure the acceptable power with samples of 500 per groupTo find what the minimum number of sample size is toachieve the adequate power (80) for moderate DIF wesimulated items with DIF 04 τ 0156 and 0069 andsample sizes greater than 500 in each group not reported inthe results section +e findings revealed that sample sizes ofat least 1000 per group for τ 0156 and 2000 per group forτ 0069 are required to detect DIF 04 based on the WLRmethod

In the present study we restricted our simulation to onlytwo bias correction methods for detecting DIF namelyWLR and PML Nevertheless there are a limited number ofstudies that evaluated DIF based on other penalizationmethods such as LASSO For example Tutz and Schaubergerapplied LASSO penalization for DIF analysis which includescontinuous variables [36] Furthermore Magis et al pro-posed a new DIF detection method based on LASSO whereall items were simultaneously evaluated for DIF in a singlemodeling approach so that multiple testing would not be aproblem [37]

5 Conclusion

Although the logistic regression (LR) model is one the mostcommon methods for detecting DIF certain samplingstrategies and appropriate bias correction techniques shouldbe applied when LR is implemented on moderate or severeimbalanced data sets In summary our findings revealedthat as compared with ML and PML the WLR is a moresensitive method for detecting DIF when data are imbal-anced or rare Hence as well as its easy application toexisting software the WLR introduced by King and Zeng isstrongly recommended for detecting DIF due to higherpower and lower type I error rate in comparison to PML andML inferential methods However in the future studies ofDIF penalized and bias correction methods should beproposed for ordinal logistic regression in the presence ofrare events or small sample setting

Data Availability

+is is a simulation study and the real data set has beenprovided by Dr Dejan Stevanovic the third author of themanuscript

10 BioMed Research International

Conflicts of Interest

+e authors declare no potential conflicts of interest withrespect to the research authorship andor publication ofthis article

Acknowledgments

+is work was supported by the Shiraz University of MedicalSciences Research Council Shiraz Iran under grant no96-01-01-14601 +is paper was extracted fromMarjan FaghihrsquosPhD thesis under supervision of Prof Peyman JafariPeyman Jafari andMarjan Faghih equally contributed to thiswork

References

[1] B D Zumbo ldquo+ree generations of DIF analyses consideringwhere it has been where it is now and where it is goingrdquoLanguage Assessment Quarterly vol 4 no 2 pp 223ndash2332007

[2] T A Ackerman ldquo+e theory and practice of item responsetheory by de Ayala R Jrdquo Journal of Educational Measure-ment vol 47 no 4 pp 471ndash476 2010

[3] B E Clauser and K M Mazor ldquoUsing statistical proceduresto identify differentially functioning test itemsrdquo EducationalMeasurement Issues and Practice vol 17 no 1 pp 31ndash442005

[4] H Swaminathan and H J Rogers ldquoDetecting differential itemfunctioning using logistic regression proceduresrdquo Journal ofEducational Measurement vol 27 no 4 pp 361ndash370 1990

[5] H J Rogers and H Swaminathan ldquoA comparison of logisticregression and Mantel-Haenszel procedures for detectingdifferential item functioningrdquo Applied Psychological Mea-surement vol 17 no 2 pp 105ndash116 1993

[6] P Narayanon and H Swaminathan ldquoIdentification of itemsthat show nonuniform DIFrdquo Applied Psychological Mea-surement vol 20 no 3 pp 257ndash274 1996

[7] P Jafari D Stevanovic and Z Bagheri ldquoCross-culturalmeasurement equivalence of the KINDL questionnaire forquality of life assessment in children and adolescentsrdquo ChildPsychiatry amp Human Development vol 47 no 2 pp 291ndash3042016

[8] P K Crane L E Gibbons K Ocepek-Welikson et al ldquoAcomparison of three sets of criteria for determining thepresence of differential item functioning using ordinal logisticregressionrdquo Quality of Life Research vol 16 no 1 pp 69ndash842007

[9] S Lee ldquoDetecting differential item functioning using thelogistic regression procedure in small samplesrdquo AppliedPsychological Measurement vol 41 no 1 pp 30ndash43 2017

[10] E Allahyari P Jafari and Z Bagheri ldquoA simulation study toassess the effect of the number of response categories on thepower of ordinal logistic regression for differential itemfunctioning analysis in rating scalesrdquo Computational andMathematical Methods in Medicine vol 2016 Article ID5080826 8 pages 2016

[11] P O Monahan C A McHorney T E Stump andA J Perkins ldquoOdds ratio delta ETS classification andstandardization measures of dif magnitude for binary logisticregressionrdquo Journal of Educational and Behavioral Statisticsvol 32 no 1 pp 92ndash109 2007

[12] S Apinyapibal N Lawthong and S Kanjanawasee ldquoAcomparative analysis of the efficacy of differential itemfunctioning detection for dichotomously scored items amonglogistic regression SIBTEST and raschtree methodsrdquo Proce-dia-Social and Behavioral Sciences vol 191 pp 21ndash25 2015

[13] H-F Chen and K-Y Jin ldquoApplying logistic regression todetect differential item functioning in multidimensionaldatardquo Frontiers in Psychology vol 9 p 1302 2018

[14] D Svetina and L Rutkowski ldquoDetecting differential itemfunctioning using generalized logistic regression in thecontext of large-scale assessmentsrdquo Large-scale Assessments inEducation vol 2 no 1 2014

[15] R Puhr G Heinze M Nold et al ldquoFirthrsquos logistic regressionwith rare events accurate effect estimates and predictionsrdquoStatistics in Medicine vol 36 no 14 pp 2302ndash2317 2017

[16] D Firth ldquoBias reduction of maximum likelihood estimatesrdquoBiometrika vol 80 no 1 pp 27ndash38 1993

[17] M Pavlou G Ambler S Seaman M De Iorio andR Z Omar ldquoReview and evaluation of penalised regressionmethods for risk prediction in low-dimensional data with feweventsrdquo Statistics in Medicine vol 35 no 7 pp 1159ndash11772016

[18] M Maalouf and T B Trafalis ldquoRobust weighted kernel lo-gistic regression in imbalanced and rare events datardquo Com-putational Statistics amp Data Analysis vol 55 no 1pp 168ndash183 2011

[19] M Maalouf D Homouz and T B Trafalis ldquoLogistic re-gression in large rare events and imbalanced data a perfor-mance comparison of prior correction and weightingmethodsrdquo Computational Intelligence vol 34 no 1pp 161ndash174 2018

[20] G King and L Zeng ldquoLogistic regression in rare events datardquoPolitical Analysis vol 9 no 2 pp 137ndash163 2001

[21] I-C Huang W L Leite P Shearer M Seid D A Revickiand E A Shenkman ldquoDifferential item functioning in qualityof life measure between children with and without specialhealth-care needsrdquo Value in Health vol 14 no 6 pp 872ndash883 2011

[22] H Jeffreys ldquoAn invariant form for the prior probability inestimation problemsrdquo Proceedings of the Royal Society ofLondon Series A Mathematical and Physical Sciencesvol 186 no 1007 pp 453ndash461 1946

[23] G Heinze and M Schemper ldquoA solution to the problem ofseparation in logistic regressionrdquo Statistics in Medicinevol 21 no 16 pp 2409ndash2419 2002

[24] G Heinze and M Ploner ldquoFixing the nonconvergence bug inlogistic regression with SPLUS and SASrdquo Computer Methodsand Programs in Biomedicine vol 71 no 2 pp 181ndash187 2003

[25] G King and L Zeng ldquoExplaining rare events in internationalrelationsrdquo International Organization vol 55 no 3pp 693ndash715 2001

[26] M Maalouf ldquoLogistic regression in data analysis an over-viewrdquo International Journal of Data Analysis Techniques andStrategies vol 3 no 3 pp 281ndash299 2011

[27] D E A Sulasih S W Purnami and S P Rahayu BeBeoretical Study of Rare Event Weighted Logistic Regressionfor Classification of Imbalanced Data Universitas Muham-madiyah Surakarta Surakarta Indonesia 2015

[28] G Heinze logistf Firthrsquos Bias Reduced Logistic RegressionR Package Version 123 2018 httpsCRANR-projectorgpackage=logistf

[29] C Choirat J Honaker K Imai G King and O Lau ZeligEveryonersquos Statistical Software Version 5161 2018 httpzeligprojectorg

BioMed Research International 11

[30] M L Ong L Lu S Lee et al ldquoA comparison of the hier-archical generalized linear model multiple-indicators mul-tiple-causes and the item response theory-likelihood ratio testfor detecting differential item functioningrdquo in QuantitativePsychology Research pp 343ndash357 Springer Berlin Germany2015

[31] D Stevanovic Z Bagheri O Atilola et al ldquoCross-culturalmeasurement invariance of the revised Child anxiety anddepression scale across 11 world-wide societiesrdquo Epidemiol-ogy and Psychiatric Sciences vol 26 no 4 pp 430ndash440 2017

[32] G M Cordeiro and P McCullagh ldquoBias correction in gen-eralized linear modelsrdquo Journal of the Royal Statistical SocietySeries B (Methodological) vol 53 no 3 pp 629ndash643 1991

[33] T Maiti and V Pradhan ldquoBias reduction and a solution forseparation of logistic regression with missing covariatesrdquoBiometrics vol 65 no 4 pp 1262ndash1269 2009

[34] B D Zumbo A Handbook on the Beory and Methods ofDifferential Item Functioning (DIF) National DefenseHeadquarters Ottawa Canada 1999

[35] J-S Lai J Teresi and R Gershon ldquoProcedures for the analysisof differential item functioning (DIF) for small sample sizesrdquoEvaluation amp the Health Professions vol 28 no 3 pp 283ndash294 2005

[36] G Tutz and G Schauberger A Penalty Approach to Differ-ential Item Functioning in Rasch Models Springer BerlinGermany 2015

[37] D Magis F Tuerlinckx and P de Boeck ldquoDetection ofdifferential item functioning using the lasso approachrdquoJournal of Educational and Behavioral Statistics vol 40 no 2pp 111ndash135 2015

12 BioMed Research International

Page 5: A Comparative Study of the Bias Correction Methods for ... · 2020. 3. 2. · mum likelihood estimation method (PML) was originally developedbyDavidFirth[16]inordertoreducethesmall

constant (β1 1) while varying β0 In this case the values ofβ0 minus 3 and β0 minus 2 generate outcomes with the percentagesof ones which are equal to 69 and 156 respectively Itshould be noted that we generate the simulated data basedon the IRTmodel not the logistic regression Hence we needto show that the parameters of the logistic regression and theIRT model with binary responses are one-to-one corre-spondent In order to denote that both models have similarmathematical expressions equation (22) can be rearrangedas log it(πi) minus ajbj + ajθi By comparing this equation withthe logit function log it(πi) β0 + β1xi coefficients β0 andβ1 and the variable xi in the logistic regression model arefound to be respectively correspondent to parameters minus ajbjaj and θi in the IRTmodel (ie β0 ndashajbj β1 aj and xi θi)With the assumption of aj β1 1 the constant parameter(β0) in the logistic regression is equivalent to the minus valueof the difficulty parameter (minus bj) in the IRT model Ac-cordingly in the IRT framework and when aj 1 with thevalues of bj 3 and 2 we can generate responses with 69and 156 imbalanced degree respectively

Statistical power is defined by the proportion of timesthat DIF is correctly identified by the logistic method acrossreplications and the type I error rate also referred to thefalse positive rate represents the proportion of non-DIFitems incorrectly flagged as having DIF in 1000 replications+e type I error rates are averaged over all without DIF items[30]

3 Results

Table 1 presents the statistical power of the LR model basedon three different inferential methods (ML PML andWLR)and under various combinations of τ R N DIF and scalelength +e major finding was that the power of the ML andPML was more affected by imbalanced degree (τ) than byWLR estimation method Specifically for the moderatemagnitude of DIF (DIF 04) regardless of the sample sizeand number of items increasing imbalanced or rarenessdegree τ from 0156 to 0069 resulted in a reduction byapproximately 38 37 and 30 in the power of PMLMLand WLR respectively Furthermore for the severe mag-nitude of DIF (DIF 08) a similar finding was observedie the power of the PML ML and WLR was reducedapproximately by 37 35 and 26 respectively Ingeneral our findings indicated that the power of the threeestimationmethods for detecting DIF could be ordered fromhighest to lowest as follows WLRgeMLge PML Indeed forthe moderate values of DIF (DIF 04) compared withWLR there were reductions of approximately 30 and 24under τ 0069 and 22 and 16 under τ 0156 in thepower for the PML andML respectively On the other handwhen the magnitude of DIF was severe (DIF 08) com-pared with WLR there were reductions of approximately27 and 23 under τ 0069 and 14 and 12 underτ 0156 in the power for the PML and ML respectively

When the magnitude of DIF was moderate (DIF 04)N 1000 R 1 and I 15 the maximum power of the PMLML andWLRwas 46 48 and 53 for τ 0156 and 2528 and 32 for τ 0069 respectively Hence to achieve

the adequate power (08) for detecting moderate DIF werequire a sample size of larger than 1000 (N 1000) Fur-thermore for the severe of DIF (DIF 08) and τ 0156 thepower of the PMLML andWLRwas 75 78 and 83 forI 5 and 80 81 and 87 for I 15 respectively whenN 600 and R 1 2 However under DIF 08 τ 0069R 1 and N 1000 the power of the PML ML and WLRwas 66 69 and 77 for I 5 and 72 74 and 81 forI 15 respectively

Table 2 reports the empirical type I error rate of the LRmodel under various combinations of imbalanced degree ofdata sample size sample size ratio magnitude of DIF andthe number of items In general regardless of the magnitudeof DIF sample size sample size ratio and the number ofitems the average type I error rates of the ML PML andWLR were 006 005 and 003 for τ 0156 and 006 004and 001 for τ 0069 +ese findings indicate that when theimbalanced degree τ increased from 0156 to 0069 theaverage type I error rate of the WLR decreased dramaticallywhile it was close to the 5 level for the ML and PMLmethods However there were some exceptions where thetype I error rates of the ML PML and WLR were higherthan the nominal level of 5 When DIF 08 N 1000I 5 and τ 0156 the empirical type I error rate was 1111 and 9 for ML 8 9 and 8 for PML and 8 7and 7 for WLR when R was equal to 1 2 and 3 re-spectively In this case as shown in Table 2 when thenumber of items increased from 5 to 15 the type I error ratewould be equal or less than 5

+e average power and type I error rate on measureswith 5 and 15 items are depicted in Figures 1 and 2According to Figure 1 irrespective of the number of itemsthe highest power in all combinations belonged to WLRwhile PML and ML had relatively equivalent powerFigure 2 indicates that in most simulation conditions typeI error rate was above or exactly at the nominal significancelevel of 005 for ML and below or exactly equal to thenominal level for the PML method (ML range005ndash011and PML range 003ndash005) However in WLRtype I error rate was lower than the other two methods andfluctuated between 002 and 005 from 0 to 001 forτ 0156 and τ 0069 respectively

31 Real Data Example In this section we used a real dataset to validate the simulation findings Accordingly 387Serbian individuals (403 male and 507 female) wereselected out of a larger sample of 4192 people from elevencountries participating in a cross-cultural study [31] +eparticipants completed the 47-item Revised Child Anxietyand Depression Scale (RCADS) +e RCADS divided intosix subscales including social phobia separation anxietydisorder generalized anxiety panic disorder obsessivecompulsive disorder and major depressive disorder Allitems use the same 4-point Likert-type response scales toassess the frequency of a certain symptom We binarizedthe item responses after transformation of never andsometimes into ldquono symptomrdquo and often and always intoldquosymptomrdquo For details on the RCADS as used in this

BioMed Research International 5

project see [31] +e results of DIF analysis of the RCADSacross male and female Serbian adolescents based on threeinferential methods (WLR PML and ML) are shown inTable 3 Our findings showed that ten items with DIF and33 items without DIF were common across the three

methods However the results indicate that the WLRmethod is more sensitive than the PML and ML fordetecting DIF For example in the separation anxietysubscale while the WLR model identified four items withDIF one and three items exhibited DIF according to PML

Table 2 Type I error rate of different methods of estimation under different combinations

Item Ratio NDIF 04 DIF 08

τ 0156 τ 0069 τ 0156 τ 0069ML PML WLR ML PML WLR ML PML WLR ML PML WLR

5 nf nr200 007 005 002 007 005 001lowast 007 005 003 007 005 001lowast600 005 004 003 005 003 001lowast 008 006 005 006 004 001lowast1000 007 005 004 006 004 001 011 008 008 006 005 001

5 nf 2nr200 007 005 003 007 005 001 008 006 004 007 005 001600 006 004 002 004 004 001 008 006 005 005 004 0011000 006 005 003 005 004 001 011 009 007 006 004 001

5 nf 3nr200 006 004 003 005 004 001 007 005 004 006 004 001600 007 005 003 007 005 001 008 006 004 007 005 0011000 007 006 004 006 004 001 009 007 007 006 005 001

15 nf nr200 005 003 002 005 003 001 005 003 002 005 003 001600 005 004 002 005 004 001 006 004 003 005 004 0011000 005 005 003 005 005 001 005 005 003 005 005 001

15 nf 2nr200 007 005 003 006 005 001lowast 007 005 003 007 005 001lowast600 006 004 003 006 005 001 006 005 003 006 005 0011000 004 004 002 004 003 001lowast 005 004 002 004 004 001lowast

15 nf 3nr200 006 005 002 006 005 001 006 005 002 006 005 001600 005 005 002 005 004 001lowast 005 005 002 005 005 001lowast1000 005 004 002 005 004 001 005 004 002 005 004 001lowast

Note Ratio sample size ratio between the reference and focal groups nr and nf represent sample sizes in reference and focal groups respectively N totalsample size N nr + nf τ the fraction of 1s in the population DIF differential item functioning ML maximum likelihood PML penalized maximumlikelihood WLR Weighted Logistic Regression lowastNear to 001

Table 1 Statistical power of different methods of estimation under different combinations

Item Ratio NDIF 04 DIF 08

τ 0156 τ 0069 τ 0156 τ 0069ML PML WLR ML PML WLR ML PML WLR ML PML WLR

5 nf nr200 013 010 016 009 006 012 034 030 042 020 015 028600 029 026 033 017 015 021 078 075 083 049 045 0581000 046 042 051 023 021 027 094 092 096 069 066 077

5 nf 2nr200 013 011 016 009 007 013 032 029 041 019 015 030600 029 027 032 016 015 022 074 072 082 045 044 0591000 041 039 046 023 022 029 089 088 093 062 061 073

5 nf 3nr200 013 011 017 009 008 016 030 028 041 017 016 029600 021 020 026 013 013 018 061 060 073 037 036 0511000 037 035 042 021 020 026 086 085 091 056 056 070

15 nf nr200 013 011 017 010 009 013 036 033 045 022 018 029600 034 032 038 020 019 023 081 080 087 052 049 0621000 048 046 053 028 025 032 096 096 098 074 072 081

15 nf 2nr200 013 012 017 011 009 015 035 033 044 022 020 032600 034 032 039 020 020 025 077 077 085 051 050 0651000 045 044 050 025 024 031 094 094 097 070 070 080

15 nf 3nr200 013 013 017 010 008 014 031 031 041 019 019 031600 025 025 031 015 015 021 074 074 083 043 044 0571000 037 037 043 023 024 030 090 089 094 064 066 077

Note Ratio sample size ratio between the reference and focal groups nr and nf represent sample sizes in reference and focal groups respectively N totalsample size N nr + nf τ the fraction of 1s in the population DIF differential item functioning ML maximum likelihood PML penalized maximumlikelihood WLR Weighted Logistic Regression

6 BioMed Research International

010

020

030

040

050

060

Pow

er

400 600 800 1000200

Total sample size

(a)

04

06

08

010

Pow

er

400 600 800 1000200

Total sample size

(b)

400 600 800 1000200

Total sample size

010

020

030

040

050

Pow

er

(c)

02

04

06

08

010

Pow

er

400 600 800 1000200

Total sample size

(d)

005

015

010

020

025

030

Pow

er

400 600 800 1000200

Total sample size

(e)

020

040

060

080

Pow

er

400 600 800 1000200

Total sample size

(f )

Figure 1 Continued

BioMed Research International 7

400 600 800 1000200

Total sample size

002

003

004

005

006

007

Type

I er

ror r

ate

(a)

002

004

006

008

Type

I er

ror r

ate

400 600 800 1000200

Total sample size

(b)

002

003

004

005

006

007

Type

I er

ror r

ate

1000600 800400200

Total sample size

(c)

002

003

004

005

006

007

008

Type

I er

ror r

ate

1000600 800400200

Total sample size

(d)

Figure 2 Continued

400 600 800 1000200

Total sample size

005

010

015

020

025

030

Pow

er

(g)

02

04

06

08

Pow

er

400 600 800 1000200

Total sample size

(h)

Figure 1 +e average power of MLE (solid lines) PML (dotted line) and WLR (broken line) methods on measures with 5 and 15 itemsNote Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nr τ 0156)(nf nr τ 0069) and (nf 3nr τ 0069)

8 BioMed Research International

and ML respectively Moreover as compared with ML andPML WLR gives smaller standard error for regressioncoefficient and it can result in increased likelihood offinding DIF

4 Discussion

Detecting DIF based on the logistic regression model can bechallenging for binary data where events are rare or severely

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(e)

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(f )

000

001

002

003

004

005

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(g)

400 600 800 1000200

Total sample size

000

002

004

006

Type

I er

ror r

ate

(h)

Figure 2 +e average type I error rates of MLE (solid lines) PMLE (dotted line) and WLR (broken line) methods on measures with 5 and15 items Note Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nrτ 0156) (nf nr τ 0069) and (nf 3nr τ 0069)

Table 3 +e results of DIF analysis across male and female Serbian individuals based on ML PML and WLR methods

ItemML PML WLR

B SE P value B SE P value B SE P value τ

Social phobia

7 061 029 0032 06 029 0042 061 031 0042 02420 minus 116 044 0007 minus 113 044 001 minus 119 042 0002 01738 124 044 0002 119 043 0004 118 041 0001 01943 minus 157 047 lt0001 minus 152 046 0001 minus 156 044 lt0001 019

Separation anxiety

9 minus 089 048 006 minus 086 047 01 minus 083 036 0015 03117 206 122 0045 17 106 0092 097 075 0008 01145 096 048 0034 091 047 0059 104 056 0036 00946 minus 304 114 0001 minus 269 101 0003 minus 19 051 lt0001 014

Generalized anxiety disorder 13 114 035 0001 112 035 0001 112 035 0001 03137 minus 201 044 lt0001 minus 196 044 lt0001 minus 195 044 lt0001 022

Obsessive compulsive disorder 23 079 043 0059 075 043 01 079 041 0038 018

Major depression2 123 045 0004 118 044 0006 119 045 0003 0196 minus 122 058 0031 minus 118 056 0043 minus 115 042 0004 01221 063 032 0048 062 032 0072 065 034 0049 026

Note P value is reported in three decimal places for more accuracy of comparing the three models τ the fraction of 1s in the population that is extracted froma data set of 4192 adolescents in eleven countries B regression coefficient for testing uniform DIF SE standard error of regression coefficient

BioMed Research International 9

imbalanced +e present study is designed to clarify theunknown consequences of rare events data in the context ofDIF analysis based on the LR model with and withoutapplying bias correction methods Hence in the compre-hensive simulation study we compared the performance ofthe WLR PML and ML methods for detecting DIF withfocus on imbalanced binary response data

According to our simulation findings of the three es-timation methods for detecting DIF the WLR seems to bethe most appealing with regards to its statistical propertiesits type I error rate is close to or less than the nominal 005level and its power is considerably higher than the PML andML Moreover in the present study comparing the threeinferential methods for the real data set confirmed thefindings of the simulation Accordingly as compared withWLR PML and ML were less sensitive for detecting DIFacross male and female Serbian adolescents

It is worthwhile to explain why the WLR outperforms thePML and ML estimation methods for DIF analysis Indeedthe WLR is an extension of the small-sample bias correctionsas described by Cordeiro and McCullagh to the weightedlikelihood [32] A previous simulation study has shown thatthis bias correction method reduces both the bias and thevariance thereby offering the smallest Mean Squared Error(MSE) when compared with that of PML and ML estimationmethods [33] Hence applying the WLR in DIF analysis canlead to reduction in the standard error of the parameterestimates which consequently increases the likelihood offinding DIF Another advantage of the WLR is that it givesunbiased predicted probability [15 20] However the pre-dicted probabilities from the WLRmodel may fall outside theplausible range of 0 to 1 [15]

Furthermore our findings revealed that the logisticregression model based on the traditional ML estimationmethod had a slightly better performance than the PML forimbalanced or rare events data +ese findings were similarto those of Lee who reported that the ML slightly out-performed the PML for detecting DIF in small samples [9]In this case as described by Lee it seems that the sacrifice inthe precision cannot be compensated for the bias reductionof the PML estimates in imbalanced data It should also benoted that while Firthrsquos PML method reduces the bias in theestimates of regression coefficients it introduces bias in thepredicted probabilities and this bias is not negligible for rareevents data [15] To overcome this problem Puhr et alproposed two modified versions of Firthrsquos PML resulting inunbiased predicted probabilities [15] Accordingly theproposed methods including Firthrsquos logistic regression withintercept correction and Firthrsquos logistic regression withadded covariate efficiently improve on predictions fromFirthrsquos logistic regression [15] Since these methods were notimplemented in the standard statistical software such as Rwe were not able to apply them in the context of DIF analysisand compare their results with those of the WLR modelHence in the present study the WLR is the most efficientmethod for DIF analysis under rare events data +e WLRnot only reduces the bias of regression coefficients but alsoprovides unbiased predicted probability in comparison tothe PML method [15 20]

What distinguishes this study from the previous one isthat we simultaneously evaluated the effect of imbalanced dataand small samples (as well as large samples) on the perfor-mance of the three estimation methods for DIF analysis Inaddition one of the advantages of the present study is that itprovides a guideline about the required sample size for DIFanalysis with imbalanced or rare events data Previous sim-ulation studies have shown that the minimum sample size forDIF analysis with logistic regression should be within therange of 100 to 200 per group [34 35] However for moderateimbalanced data (τ 0156) and severe DIF (DIF 08) as ageneral rule of thumb we would suggest imposing a mini-mum of 300 respondents per group to achieve the adequatepower with the WLR method In addition for severe im-balance rate (τ 0069) and DIF 08 theWLR performs wellto ensure the acceptable power with samples of 500 per groupTo find what the minimum number of sample size is toachieve the adequate power (80) for moderate DIF wesimulated items with DIF 04 τ 0156 and 0069 andsample sizes greater than 500 in each group not reported inthe results section +e findings revealed that sample sizes ofat least 1000 per group for τ 0156 and 2000 per group forτ 0069 are required to detect DIF 04 based on the WLRmethod

In the present study we restricted our simulation to onlytwo bias correction methods for detecting DIF namelyWLR and PML Nevertheless there are a limited number ofstudies that evaluated DIF based on other penalizationmethods such as LASSO For example Tutz and Schaubergerapplied LASSO penalization for DIF analysis which includescontinuous variables [36] Furthermore Magis et al pro-posed a new DIF detection method based on LASSO whereall items were simultaneously evaluated for DIF in a singlemodeling approach so that multiple testing would not be aproblem [37]

5 Conclusion

Although the logistic regression (LR) model is one the mostcommon methods for detecting DIF certain samplingstrategies and appropriate bias correction techniques shouldbe applied when LR is implemented on moderate or severeimbalanced data sets In summary our findings revealedthat as compared with ML and PML the WLR is a moresensitive method for detecting DIF when data are imbal-anced or rare Hence as well as its easy application toexisting software the WLR introduced by King and Zeng isstrongly recommended for detecting DIF due to higherpower and lower type I error rate in comparison to PML andML inferential methods However in the future studies ofDIF penalized and bias correction methods should beproposed for ordinal logistic regression in the presence ofrare events or small sample setting

Data Availability

+is is a simulation study and the real data set has beenprovided by Dr Dejan Stevanovic the third author of themanuscript

10 BioMed Research International

Conflicts of Interest

+e authors declare no potential conflicts of interest withrespect to the research authorship andor publication ofthis article

Acknowledgments

+is work was supported by the Shiraz University of MedicalSciences Research Council Shiraz Iran under grant no96-01-01-14601 +is paper was extracted fromMarjan FaghihrsquosPhD thesis under supervision of Prof Peyman JafariPeyman Jafari andMarjan Faghih equally contributed to thiswork

References

[1] B D Zumbo ldquo+ree generations of DIF analyses consideringwhere it has been where it is now and where it is goingrdquoLanguage Assessment Quarterly vol 4 no 2 pp 223ndash2332007

[2] T A Ackerman ldquo+e theory and practice of item responsetheory by de Ayala R Jrdquo Journal of Educational Measure-ment vol 47 no 4 pp 471ndash476 2010

[3] B E Clauser and K M Mazor ldquoUsing statistical proceduresto identify differentially functioning test itemsrdquo EducationalMeasurement Issues and Practice vol 17 no 1 pp 31ndash442005

[4] H Swaminathan and H J Rogers ldquoDetecting differential itemfunctioning using logistic regression proceduresrdquo Journal ofEducational Measurement vol 27 no 4 pp 361ndash370 1990

[5] H J Rogers and H Swaminathan ldquoA comparison of logisticregression and Mantel-Haenszel procedures for detectingdifferential item functioningrdquo Applied Psychological Mea-surement vol 17 no 2 pp 105ndash116 1993

[6] P Narayanon and H Swaminathan ldquoIdentification of itemsthat show nonuniform DIFrdquo Applied Psychological Mea-surement vol 20 no 3 pp 257ndash274 1996

[7] P Jafari D Stevanovic and Z Bagheri ldquoCross-culturalmeasurement equivalence of the KINDL questionnaire forquality of life assessment in children and adolescentsrdquo ChildPsychiatry amp Human Development vol 47 no 2 pp 291ndash3042016

[8] P K Crane L E Gibbons K Ocepek-Welikson et al ldquoAcomparison of three sets of criteria for determining thepresence of differential item functioning using ordinal logisticregressionrdquo Quality of Life Research vol 16 no 1 pp 69ndash842007

[9] S Lee ldquoDetecting differential item functioning using thelogistic regression procedure in small samplesrdquo AppliedPsychological Measurement vol 41 no 1 pp 30ndash43 2017

[10] E Allahyari P Jafari and Z Bagheri ldquoA simulation study toassess the effect of the number of response categories on thepower of ordinal logistic regression for differential itemfunctioning analysis in rating scalesrdquo Computational andMathematical Methods in Medicine vol 2016 Article ID5080826 8 pages 2016

[11] P O Monahan C A McHorney T E Stump andA J Perkins ldquoOdds ratio delta ETS classification andstandardization measures of dif magnitude for binary logisticregressionrdquo Journal of Educational and Behavioral Statisticsvol 32 no 1 pp 92ndash109 2007

[12] S Apinyapibal N Lawthong and S Kanjanawasee ldquoAcomparative analysis of the efficacy of differential itemfunctioning detection for dichotomously scored items amonglogistic regression SIBTEST and raschtree methodsrdquo Proce-dia-Social and Behavioral Sciences vol 191 pp 21ndash25 2015

[13] H-F Chen and K-Y Jin ldquoApplying logistic regression todetect differential item functioning in multidimensionaldatardquo Frontiers in Psychology vol 9 p 1302 2018

[14] D Svetina and L Rutkowski ldquoDetecting differential itemfunctioning using generalized logistic regression in thecontext of large-scale assessmentsrdquo Large-scale Assessments inEducation vol 2 no 1 2014

[15] R Puhr G Heinze M Nold et al ldquoFirthrsquos logistic regressionwith rare events accurate effect estimates and predictionsrdquoStatistics in Medicine vol 36 no 14 pp 2302ndash2317 2017

[16] D Firth ldquoBias reduction of maximum likelihood estimatesrdquoBiometrika vol 80 no 1 pp 27ndash38 1993

[17] M Pavlou G Ambler S Seaman M De Iorio andR Z Omar ldquoReview and evaluation of penalised regressionmethods for risk prediction in low-dimensional data with feweventsrdquo Statistics in Medicine vol 35 no 7 pp 1159ndash11772016

[18] M Maalouf and T B Trafalis ldquoRobust weighted kernel lo-gistic regression in imbalanced and rare events datardquo Com-putational Statistics amp Data Analysis vol 55 no 1pp 168ndash183 2011

[19] M Maalouf D Homouz and T B Trafalis ldquoLogistic re-gression in large rare events and imbalanced data a perfor-mance comparison of prior correction and weightingmethodsrdquo Computational Intelligence vol 34 no 1pp 161ndash174 2018

[20] G King and L Zeng ldquoLogistic regression in rare events datardquoPolitical Analysis vol 9 no 2 pp 137ndash163 2001

[21] I-C Huang W L Leite P Shearer M Seid D A Revickiand E A Shenkman ldquoDifferential item functioning in qualityof life measure between children with and without specialhealth-care needsrdquo Value in Health vol 14 no 6 pp 872ndash883 2011

[22] H Jeffreys ldquoAn invariant form for the prior probability inestimation problemsrdquo Proceedings of the Royal Society ofLondon Series A Mathematical and Physical Sciencesvol 186 no 1007 pp 453ndash461 1946

[23] G Heinze and M Schemper ldquoA solution to the problem ofseparation in logistic regressionrdquo Statistics in Medicinevol 21 no 16 pp 2409ndash2419 2002

[24] G Heinze and M Ploner ldquoFixing the nonconvergence bug inlogistic regression with SPLUS and SASrdquo Computer Methodsand Programs in Biomedicine vol 71 no 2 pp 181ndash187 2003

[25] G King and L Zeng ldquoExplaining rare events in internationalrelationsrdquo International Organization vol 55 no 3pp 693ndash715 2001

[26] M Maalouf ldquoLogistic regression in data analysis an over-viewrdquo International Journal of Data Analysis Techniques andStrategies vol 3 no 3 pp 281ndash299 2011

[27] D E A Sulasih S W Purnami and S P Rahayu BeBeoretical Study of Rare Event Weighted Logistic Regressionfor Classification of Imbalanced Data Universitas Muham-madiyah Surakarta Surakarta Indonesia 2015

[28] G Heinze logistf Firthrsquos Bias Reduced Logistic RegressionR Package Version 123 2018 httpsCRANR-projectorgpackage=logistf

[29] C Choirat J Honaker K Imai G King and O Lau ZeligEveryonersquos Statistical Software Version 5161 2018 httpzeligprojectorg

BioMed Research International 11

[30] M L Ong L Lu S Lee et al ldquoA comparison of the hier-archical generalized linear model multiple-indicators mul-tiple-causes and the item response theory-likelihood ratio testfor detecting differential item functioningrdquo in QuantitativePsychology Research pp 343ndash357 Springer Berlin Germany2015

[31] D Stevanovic Z Bagheri O Atilola et al ldquoCross-culturalmeasurement invariance of the revised Child anxiety anddepression scale across 11 world-wide societiesrdquo Epidemiol-ogy and Psychiatric Sciences vol 26 no 4 pp 430ndash440 2017

[32] G M Cordeiro and P McCullagh ldquoBias correction in gen-eralized linear modelsrdquo Journal of the Royal Statistical SocietySeries B (Methodological) vol 53 no 3 pp 629ndash643 1991

[33] T Maiti and V Pradhan ldquoBias reduction and a solution forseparation of logistic regression with missing covariatesrdquoBiometrics vol 65 no 4 pp 1262ndash1269 2009

[34] B D Zumbo A Handbook on the Beory and Methods ofDifferential Item Functioning (DIF) National DefenseHeadquarters Ottawa Canada 1999

[35] J-S Lai J Teresi and R Gershon ldquoProcedures for the analysisof differential item functioning (DIF) for small sample sizesrdquoEvaluation amp the Health Professions vol 28 no 3 pp 283ndash294 2005

[36] G Tutz and G Schauberger A Penalty Approach to Differ-ential Item Functioning in Rasch Models Springer BerlinGermany 2015

[37] D Magis F Tuerlinckx and P de Boeck ldquoDetection ofdifferential item functioning using the lasso approachrdquoJournal of Educational and Behavioral Statistics vol 40 no 2pp 111ndash135 2015

12 BioMed Research International

Page 6: A Comparative Study of the Bias Correction Methods for ... · 2020. 3. 2. · mum likelihood estimation method (PML) was originally developedbyDavidFirth[16]inordertoreducethesmall

project see [31] +e results of DIF analysis of the RCADSacross male and female Serbian adolescents based on threeinferential methods (WLR PML and ML) are shown inTable 3 Our findings showed that ten items with DIF and33 items without DIF were common across the three

methods However the results indicate that the WLRmethod is more sensitive than the PML and ML fordetecting DIF For example in the separation anxietysubscale while the WLR model identified four items withDIF one and three items exhibited DIF according to PML

Table 2 Type I error rate of different methods of estimation under different combinations

Item Ratio NDIF 04 DIF 08

τ 0156 τ 0069 τ 0156 τ 0069ML PML WLR ML PML WLR ML PML WLR ML PML WLR

5 nf nr200 007 005 002 007 005 001lowast 007 005 003 007 005 001lowast600 005 004 003 005 003 001lowast 008 006 005 006 004 001lowast1000 007 005 004 006 004 001 011 008 008 006 005 001

5 nf 2nr200 007 005 003 007 005 001 008 006 004 007 005 001600 006 004 002 004 004 001 008 006 005 005 004 0011000 006 005 003 005 004 001 011 009 007 006 004 001

5 nf 3nr200 006 004 003 005 004 001 007 005 004 006 004 001600 007 005 003 007 005 001 008 006 004 007 005 0011000 007 006 004 006 004 001 009 007 007 006 005 001

15 nf nr200 005 003 002 005 003 001 005 003 002 005 003 001600 005 004 002 005 004 001 006 004 003 005 004 0011000 005 005 003 005 005 001 005 005 003 005 005 001

15 nf 2nr200 007 005 003 006 005 001lowast 007 005 003 007 005 001lowast600 006 004 003 006 005 001 006 005 003 006 005 0011000 004 004 002 004 003 001lowast 005 004 002 004 004 001lowast

15 nf 3nr200 006 005 002 006 005 001 006 005 002 006 005 001600 005 005 002 005 004 001lowast 005 005 002 005 005 001lowast1000 005 004 002 005 004 001 005 004 002 005 004 001lowast

Note Ratio sample size ratio between the reference and focal groups nr and nf represent sample sizes in reference and focal groups respectively N totalsample size N nr + nf τ the fraction of 1s in the population DIF differential item functioning ML maximum likelihood PML penalized maximumlikelihood WLR Weighted Logistic Regression lowastNear to 001

Table 1 Statistical power of different methods of estimation under different combinations

Item Ratio NDIF 04 DIF 08

τ 0156 τ 0069 τ 0156 τ 0069ML PML WLR ML PML WLR ML PML WLR ML PML WLR

5 nf nr200 013 010 016 009 006 012 034 030 042 020 015 028600 029 026 033 017 015 021 078 075 083 049 045 0581000 046 042 051 023 021 027 094 092 096 069 066 077

5 nf 2nr200 013 011 016 009 007 013 032 029 041 019 015 030600 029 027 032 016 015 022 074 072 082 045 044 0591000 041 039 046 023 022 029 089 088 093 062 061 073

5 nf 3nr200 013 011 017 009 008 016 030 028 041 017 016 029600 021 020 026 013 013 018 061 060 073 037 036 0511000 037 035 042 021 020 026 086 085 091 056 056 070

15 nf nr200 013 011 017 010 009 013 036 033 045 022 018 029600 034 032 038 020 019 023 081 080 087 052 049 0621000 048 046 053 028 025 032 096 096 098 074 072 081

15 nf 2nr200 013 012 017 011 009 015 035 033 044 022 020 032600 034 032 039 020 020 025 077 077 085 051 050 0651000 045 044 050 025 024 031 094 094 097 070 070 080

15 nf 3nr200 013 013 017 010 008 014 031 031 041 019 019 031600 025 025 031 015 015 021 074 074 083 043 044 0571000 037 037 043 023 024 030 090 089 094 064 066 077

Note Ratio sample size ratio between the reference and focal groups nr and nf represent sample sizes in reference and focal groups respectively N totalsample size N nr + nf τ the fraction of 1s in the population DIF differential item functioning ML maximum likelihood PML penalized maximumlikelihood WLR Weighted Logistic Regression

6 BioMed Research International

010

020

030

040

050

060

Pow

er

400 600 800 1000200

Total sample size

(a)

04

06

08

010

Pow

er

400 600 800 1000200

Total sample size

(b)

400 600 800 1000200

Total sample size

010

020

030

040

050

Pow

er

(c)

02

04

06

08

010

Pow

er

400 600 800 1000200

Total sample size

(d)

005

015

010

020

025

030

Pow

er

400 600 800 1000200

Total sample size

(e)

020

040

060

080

Pow

er

400 600 800 1000200

Total sample size

(f )

Figure 1 Continued

BioMed Research International 7

400 600 800 1000200

Total sample size

002

003

004

005

006

007

Type

I er

ror r

ate

(a)

002

004

006

008

Type

I er

ror r

ate

400 600 800 1000200

Total sample size

(b)

002

003

004

005

006

007

Type

I er

ror r

ate

1000600 800400200

Total sample size

(c)

002

003

004

005

006

007

008

Type

I er

ror r

ate

1000600 800400200

Total sample size

(d)

Figure 2 Continued

400 600 800 1000200

Total sample size

005

010

015

020

025

030

Pow

er

(g)

02

04

06

08

Pow

er

400 600 800 1000200

Total sample size

(h)

Figure 1 +e average power of MLE (solid lines) PML (dotted line) and WLR (broken line) methods on measures with 5 and 15 itemsNote Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nr τ 0156)(nf nr τ 0069) and (nf 3nr τ 0069)

8 BioMed Research International

and ML respectively Moreover as compared with ML andPML WLR gives smaller standard error for regressioncoefficient and it can result in increased likelihood offinding DIF

4 Discussion

Detecting DIF based on the logistic regression model can bechallenging for binary data where events are rare or severely

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(e)

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(f )

000

001

002

003

004

005

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(g)

400 600 800 1000200

Total sample size

000

002

004

006

Type

I er

ror r

ate

(h)

Figure 2 +e average type I error rates of MLE (solid lines) PMLE (dotted line) and WLR (broken line) methods on measures with 5 and15 items Note Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nrτ 0156) (nf nr τ 0069) and (nf 3nr τ 0069)

Table 3 +e results of DIF analysis across male and female Serbian individuals based on ML PML and WLR methods

ItemML PML WLR

B SE P value B SE P value B SE P value τ

Social phobia

7 061 029 0032 06 029 0042 061 031 0042 02420 minus 116 044 0007 minus 113 044 001 minus 119 042 0002 01738 124 044 0002 119 043 0004 118 041 0001 01943 minus 157 047 lt0001 minus 152 046 0001 minus 156 044 lt0001 019

Separation anxiety

9 minus 089 048 006 minus 086 047 01 minus 083 036 0015 03117 206 122 0045 17 106 0092 097 075 0008 01145 096 048 0034 091 047 0059 104 056 0036 00946 minus 304 114 0001 minus 269 101 0003 minus 19 051 lt0001 014

Generalized anxiety disorder 13 114 035 0001 112 035 0001 112 035 0001 03137 minus 201 044 lt0001 minus 196 044 lt0001 minus 195 044 lt0001 022

Obsessive compulsive disorder 23 079 043 0059 075 043 01 079 041 0038 018

Major depression2 123 045 0004 118 044 0006 119 045 0003 0196 minus 122 058 0031 minus 118 056 0043 minus 115 042 0004 01221 063 032 0048 062 032 0072 065 034 0049 026

Note P value is reported in three decimal places for more accuracy of comparing the three models τ the fraction of 1s in the population that is extracted froma data set of 4192 adolescents in eleven countries B regression coefficient for testing uniform DIF SE standard error of regression coefficient

BioMed Research International 9

imbalanced +e present study is designed to clarify theunknown consequences of rare events data in the context ofDIF analysis based on the LR model with and withoutapplying bias correction methods Hence in the compre-hensive simulation study we compared the performance ofthe WLR PML and ML methods for detecting DIF withfocus on imbalanced binary response data

According to our simulation findings of the three es-timation methods for detecting DIF the WLR seems to bethe most appealing with regards to its statistical propertiesits type I error rate is close to or less than the nominal 005level and its power is considerably higher than the PML andML Moreover in the present study comparing the threeinferential methods for the real data set confirmed thefindings of the simulation Accordingly as compared withWLR PML and ML were less sensitive for detecting DIFacross male and female Serbian adolescents

It is worthwhile to explain why the WLR outperforms thePML and ML estimation methods for DIF analysis Indeedthe WLR is an extension of the small-sample bias correctionsas described by Cordeiro and McCullagh to the weightedlikelihood [32] A previous simulation study has shown thatthis bias correction method reduces both the bias and thevariance thereby offering the smallest Mean Squared Error(MSE) when compared with that of PML and ML estimationmethods [33] Hence applying the WLR in DIF analysis canlead to reduction in the standard error of the parameterestimates which consequently increases the likelihood offinding DIF Another advantage of the WLR is that it givesunbiased predicted probability [15 20] However the pre-dicted probabilities from the WLRmodel may fall outside theplausible range of 0 to 1 [15]

Furthermore our findings revealed that the logisticregression model based on the traditional ML estimationmethod had a slightly better performance than the PML forimbalanced or rare events data +ese findings were similarto those of Lee who reported that the ML slightly out-performed the PML for detecting DIF in small samples [9]In this case as described by Lee it seems that the sacrifice inthe precision cannot be compensated for the bias reductionof the PML estimates in imbalanced data It should also benoted that while Firthrsquos PML method reduces the bias in theestimates of regression coefficients it introduces bias in thepredicted probabilities and this bias is not negligible for rareevents data [15] To overcome this problem Puhr et alproposed two modified versions of Firthrsquos PML resulting inunbiased predicted probabilities [15] Accordingly theproposed methods including Firthrsquos logistic regression withintercept correction and Firthrsquos logistic regression withadded covariate efficiently improve on predictions fromFirthrsquos logistic regression [15] Since these methods were notimplemented in the standard statistical software such as Rwe were not able to apply them in the context of DIF analysisand compare their results with those of the WLR modelHence in the present study the WLR is the most efficientmethod for DIF analysis under rare events data +e WLRnot only reduces the bias of regression coefficients but alsoprovides unbiased predicted probability in comparison tothe PML method [15 20]

What distinguishes this study from the previous one isthat we simultaneously evaluated the effect of imbalanced dataand small samples (as well as large samples) on the perfor-mance of the three estimation methods for DIF analysis Inaddition one of the advantages of the present study is that itprovides a guideline about the required sample size for DIFanalysis with imbalanced or rare events data Previous sim-ulation studies have shown that the minimum sample size forDIF analysis with logistic regression should be within therange of 100 to 200 per group [34 35] However for moderateimbalanced data (τ 0156) and severe DIF (DIF 08) as ageneral rule of thumb we would suggest imposing a mini-mum of 300 respondents per group to achieve the adequatepower with the WLR method In addition for severe im-balance rate (τ 0069) and DIF 08 theWLR performs wellto ensure the acceptable power with samples of 500 per groupTo find what the minimum number of sample size is toachieve the adequate power (80) for moderate DIF wesimulated items with DIF 04 τ 0156 and 0069 andsample sizes greater than 500 in each group not reported inthe results section +e findings revealed that sample sizes ofat least 1000 per group for τ 0156 and 2000 per group forτ 0069 are required to detect DIF 04 based on the WLRmethod

In the present study we restricted our simulation to onlytwo bias correction methods for detecting DIF namelyWLR and PML Nevertheless there are a limited number ofstudies that evaluated DIF based on other penalizationmethods such as LASSO For example Tutz and Schaubergerapplied LASSO penalization for DIF analysis which includescontinuous variables [36] Furthermore Magis et al pro-posed a new DIF detection method based on LASSO whereall items were simultaneously evaluated for DIF in a singlemodeling approach so that multiple testing would not be aproblem [37]

5 Conclusion

Although the logistic regression (LR) model is one the mostcommon methods for detecting DIF certain samplingstrategies and appropriate bias correction techniques shouldbe applied when LR is implemented on moderate or severeimbalanced data sets In summary our findings revealedthat as compared with ML and PML the WLR is a moresensitive method for detecting DIF when data are imbal-anced or rare Hence as well as its easy application toexisting software the WLR introduced by King and Zeng isstrongly recommended for detecting DIF due to higherpower and lower type I error rate in comparison to PML andML inferential methods However in the future studies ofDIF penalized and bias correction methods should beproposed for ordinal logistic regression in the presence ofrare events or small sample setting

Data Availability

+is is a simulation study and the real data set has beenprovided by Dr Dejan Stevanovic the third author of themanuscript

10 BioMed Research International

Conflicts of Interest

+e authors declare no potential conflicts of interest withrespect to the research authorship andor publication ofthis article

Acknowledgments

+is work was supported by the Shiraz University of MedicalSciences Research Council Shiraz Iran under grant no96-01-01-14601 +is paper was extracted fromMarjan FaghihrsquosPhD thesis under supervision of Prof Peyman JafariPeyman Jafari andMarjan Faghih equally contributed to thiswork

References

[1] B D Zumbo ldquo+ree generations of DIF analyses consideringwhere it has been where it is now and where it is goingrdquoLanguage Assessment Quarterly vol 4 no 2 pp 223ndash2332007

[2] T A Ackerman ldquo+e theory and practice of item responsetheory by de Ayala R Jrdquo Journal of Educational Measure-ment vol 47 no 4 pp 471ndash476 2010

[3] B E Clauser and K M Mazor ldquoUsing statistical proceduresto identify differentially functioning test itemsrdquo EducationalMeasurement Issues and Practice vol 17 no 1 pp 31ndash442005

[4] H Swaminathan and H J Rogers ldquoDetecting differential itemfunctioning using logistic regression proceduresrdquo Journal ofEducational Measurement vol 27 no 4 pp 361ndash370 1990

[5] H J Rogers and H Swaminathan ldquoA comparison of logisticregression and Mantel-Haenszel procedures for detectingdifferential item functioningrdquo Applied Psychological Mea-surement vol 17 no 2 pp 105ndash116 1993

[6] P Narayanon and H Swaminathan ldquoIdentification of itemsthat show nonuniform DIFrdquo Applied Psychological Mea-surement vol 20 no 3 pp 257ndash274 1996

[7] P Jafari D Stevanovic and Z Bagheri ldquoCross-culturalmeasurement equivalence of the KINDL questionnaire forquality of life assessment in children and adolescentsrdquo ChildPsychiatry amp Human Development vol 47 no 2 pp 291ndash3042016

[8] P K Crane L E Gibbons K Ocepek-Welikson et al ldquoAcomparison of three sets of criteria for determining thepresence of differential item functioning using ordinal logisticregressionrdquo Quality of Life Research vol 16 no 1 pp 69ndash842007

[9] S Lee ldquoDetecting differential item functioning using thelogistic regression procedure in small samplesrdquo AppliedPsychological Measurement vol 41 no 1 pp 30ndash43 2017

[10] E Allahyari P Jafari and Z Bagheri ldquoA simulation study toassess the effect of the number of response categories on thepower of ordinal logistic regression for differential itemfunctioning analysis in rating scalesrdquo Computational andMathematical Methods in Medicine vol 2016 Article ID5080826 8 pages 2016

[11] P O Monahan C A McHorney T E Stump andA J Perkins ldquoOdds ratio delta ETS classification andstandardization measures of dif magnitude for binary logisticregressionrdquo Journal of Educational and Behavioral Statisticsvol 32 no 1 pp 92ndash109 2007

[12] S Apinyapibal N Lawthong and S Kanjanawasee ldquoAcomparative analysis of the efficacy of differential itemfunctioning detection for dichotomously scored items amonglogistic regression SIBTEST and raschtree methodsrdquo Proce-dia-Social and Behavioral Sciences vol 191 pp 21ndash25 2015

[13] H-F Chen and K-Y Jin ldquoApplying logistic regression todetect differential item functioning in multidimensionaldatardquo Frontiers in Psychology vol 9 p 1302 2018

[14] D Svetina and L Rutkowski ldquoDetecting differential itemfunctioning using generalized logistic regression in thecontext of large-scale assessmentsrdquo Large-scale Assessments inEducation vol 2 no 1 2014

[15] R Puhr G Heinze M Nold et al ldquoFirthrsquos logistic regressionwith rare events accurate effect estimates and predictionsrdquoStatistics in Medicine vol 36 no 14 pp 2302ndash2317 2017

[16] D Firth ldquoBias reduction of maximum likelihood estimatesrdquoBiometrika vol 80 no 1 pp 27ndash38 1993

[17] M Pavlou G Ambler S Seaman M De Iorio andR Z Omar ldquoReview and evaluation of penalised regressionmethods for risk prediction in low-dimensional data with feweventsrdquo Statistics in Medicine vol 35 no 7 pp 1159ndash11772016

[18] M Maalouf and T B Trafalis ldquoRobust weighted kernel lo-gistic regression in imbalanced and rare events datardquo Com-putational Statistics amp Data Analysis vol 55 no 1pp 168ndash183 2011

[19] M Maalouf D Homouz and T B Trafalis ldquoLogistic re-gression in large rare events and imbalanced data a perfor-mance comparison of prior correction and weightingmethodsrdquo Computational Intelligence vol 34 no 1pp 161ndash174 2018

[20] G King and L Zeng ldquoLogistic regression in rare events datardquoPolitical Analysis vol 9 no 2 pp 137ndash163 2001

[21] I-C Huang W L Leite P Shearer M Seid D A Revickiand E A Shenkman ldquoDifferential item functioning in qualityof life measure between children with and without specialhealth-care needsrdquo Value in Health vol 14 no 6 pp 872ndash883 2011

[22] H Jeffreys ldquoAn invariant form for the prior probability inestimation problemsrdquo Proceedings of the Royal Society ofLondon Series A Mathematical and Physical Sciencesvol 186 no 1007 pp 453ndash461 1946

[23] G Heinze and M Schemper ldquoA solution to the problem ofseparation in logistic regressionrdquo Statistics in Medicinevol 21 no 16 pp 2409ndash2419 2002

[24] G Heinze and M Ploner ldquoFixing the nonconvergence bug inlogistic regression with SPLUS and SASrdquo Computer Methodsand Programs in Biomedicine vol 71 no 2 pp 181ndash187 2003

[25] G King and L Zeng ldquoExplaining rare events in internationalrelationsrdquo International Organization vol 55 no 3pp 693ndash715 2001

[26] M Maalouf ldquoLogistic regression in data analysis an over-viewrdquo International Journal of Data Analysis Techniques andStrategies vol 3 no 3 pp 281ndash299 2011

[27] D E A Sulasih S W Purnami and S P Rahayu BeBeoretical Study of Rare Event Weighted Logistic Regressionfor Classification of Imbalanced Data Universitas Muham-madiyah Surakarta Surakarta Indonesia 2015

[28] G Heinze logistf Firthrsquos Bias Reduced Logistic RegressionR Package Version 123 2018 httpsCRANR-projectorgpackage=logistf

[29] C Choirat J Honaker K Imai G King and O Lau ZeligEveryonersquos Statistical Software Version 5161 2018 httpzeligprojectorg

BioMed Research International 11

[30] M L Ong L Lu S Lee et al ldquoA comparison of the hier-archical generalized linear model multiple-indicators mul-tiple-causes and the item response theory-likelihood ratio testfor detecting differential item functioningrdquo in QuantitativePsychology Research pp 343ndash357 Springer Berlin Germany2015

[31] D Stevanovic Z Bagheri O Atilola et al ldquoCross-culturalmeasurement invariance of the revised Child anxiety anddepression scale across 11 world-wide societiesrdquo Epidemiol-ogy and Psychiatric Sciences vol 26 no 4 pp 430ndash440 2017

[32] G M Cordeiro and P McCullagh ldquoBias correction in gen-eralized linear modelsrdquo Journal of the Royal Statistical SocietySeries B (Methodological) vol 53 no 3 pp 629ndash643 1991

[33] T Maiti and V Pradhan ldquoBias reduction and a solution forseparation of logistic regression with missing covariatesrdquoBiometrics vol 65 no 4 pp 1262ndash1269 2009

[34] B D Zumbo A Handbook on the Beory and Methods ofDifferential Item Functioning (DIF) National DefenseHeadquarters Ottawa Canada 1999

[35] J-S Lai J Teresi and R Gershon ldquoProcedures for the analysisof differential item functioning (DIF) for small sample sizesrdquoEvaluation amp the Health Professions vol 28 no 3 pp 283ndash294 2005

[36] G Tutz and G Schauberger A Penalty Approach to Differ-ential Item Functioning in Rasch Models Springer BerlinGermany 2015

[37] D Magis F Tuerlinckx and P de Boeck ldquoDetection ofdifferential item functioning using the lasso approachrdquoJournal of Educational and Behavioral Statistics vol 40 no 2pp 111ndash135 2015

12 BioMed Research International

Page 7: A Comparative Study of the Bias Correction Methods for ... · 2020. 3. 2. · mum likelihood estimation method (PML) was originally developedbyDavidFirth[16]inordertoreducethesmall

010

020

030

040

050

060

Pow

er

400 600 800 1000200

Total sample size

(a)

04

06

08

010

Pow

er

400 600 800 1000200

Total sample size

(b)

400 600 800 1000200

Total sample size

010

020

030

040

050

Pow

er

(c)

02

04

06

08

010

Pow

er

400 600 800 1000200

Total sample size

(d)

005

015

010

020

025

030

Pow

er

400 600 800 1000200

Total sample size

(e)

020

040

060

080

Pow

er

400 600 800 1000200

Total sample size

(f )

Figure 1 Continued

BioMed Research International 7

400 600 800 1000200

Total sample size

002

003

004

005

006

007

Type

I er

ror r

ate

(a)

002

004

006

008

Type

I er

ror r

ate

400 600 800 1000200

Total sample size

(b)

002

003

004

005

006

007

Type

I er

ror r

ate

1000600 800400200

Total sample size

(c)

002

003

004

005

006

007

008

Type

I er

ror r

ate

1000600 800400200

Total sample size

(d)

Figure 2 Continued

400 600 800 1000200

Total sample size

005

010

015

020

025

030

Pow

er

(g)

02

04

06

08

Pow

er

400 600 800 1000200

Total sample size

(h)

Figure 1 +e average power of MLE (solid lines) PML (dotted line) and WLR (broken line) methods on measures with 5 and 15 itemsNote Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nr τ 0156)(nf nr τ 0069) and (nf 3nr τ 0069)

8 BioMed Research International

and ML respectively Moreover as compared with ML andPML WLR gives smaller standard error for regressioncoefficient and it can result in increased likelihood offinding DIF

4 Discussion

Detecting DIF based on the logistic regression model can bechallenging for binary data where events are rare or severely

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(e)

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(f )

000

001

002

003

004

005

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(g)

400 600 800 1000200

Total sample size

000

002

004

006

Type

I er

ror r

ate

(h)

Figure 2 +e average type I error rates of MLE (solid lines) PMLE (dotted line) and WLR (broken line) methods on measures with 5 and15 items Note Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nrτ 0156) (nf nr τ 0069) and (nf 3nr τ 0069)

Table 3 +e results of DIF analysis across male and female Serbian individuals based on ML PML and WLR methods

ItemML PML WLR

B SE P value B SE P value B SE P value τ

Social phobia

7 061 029 0032 06 029 0042 061 031 0042 02420 minus 116 044 0007 minus 113 044 001 minus 119 042 0002 01738 124 044 0002 119 043 0004 118 041 0001 01943 minus 157 047 lt0001 minus 152 046 0001 minus 156 044 lt0001 019

Separation anxiety

9 minus 089 048 006 minus 086 047 01 minus 083 036 0015 03117 206 122 0045 17 106 0092 097 075 0008 01145 096 048 0034 091 047 0059 104 056 0036 00946 minus 304 114 0001 minus 269 101 0003 minus 19 051 lt0001 014

Generalized anxiety disorder 13 114 035 0001 112 035 0001 112 035 0001 03137 minus 201 044 lt0001 minus 196 044 lt0001 minus 195 044 lt0001 022

Obsessive compulsive disorder 23 079 043 0059 075 043 01 079 041 0038 018

Major depression2 123 045 0004 118 044 0006 119 045 0003 0196 minus 122 058 0031 minus 118 056 0043 minus 115 042 0004 01221 063 032 0048 062 032 0072 065 034 0049 026

Note P value is reported in three decimal places for more accuracy of comparing the three models τ the fraction of 1s in the population that is extracted froma data set of 4192 adolescents in eleven countries B regression coefficient for testing uniform DIF SE standard error of regression coefficient

BioMed Research International 9

imbalanced +e present study is designed to clarify theunknown consequences of rare events data in the context ofDIF analysis based on the LR model with and withoutapplying bias correction methods Hence in the compre-hensive simulation study we compared the performance ofthe WLR PML and ML methods for detecting DIF withfocus on imbalanced binary response data

According to our simulation findings of the three es-timation methods for detecting DIF the WLR seems to bethe most appealing with regards to its statistical propertiesits type I error rate is close to or less than the nominal 005level and its power is considerably higher than the PML andML Moreover in the present study comparing the threeinferential methods for the real data set confirmed thefindings of the simulation Accordingly as compared withWLR PML and ML were less sensitive for detecting DIFacross male and female Serbian adolescents

It is worthwhile to explain why the WLR outperforms thePML and ML estimation methods for DIF analysis Indeedthe WLR is an extension of the small-sample bias correctionsas described by Cordeiro and McCullagh to the weightedlikelihood [32] A previous simulation study has shown thatthis bias correction method reduces both the bias and thevariance thereby offering the smallest Mean Squared Error(MSE) when compared with that of PML and ML estimationmethods [33] Hence applying the WLR in DIF analysis canlead to reduction in the standard error of the parameterestimates which consequently increases the likelihood offinding DIF Another advantage of the WLR is that it givesunbiased predicted probability [15 20] However the pre-dicted probabilities from the WLRmodel may fall outside theplausible range of 0 to 1 [15]

Furthermore our findings revealed that the logisticregression model based on the traditional ML estimationmethod had a slightly better performance than the PML forimbalanced or rare events data +ese findings were similarto those of Lee who reported that the ML slightly out-performed the PML for detecting DIF in small samples [9]In this case as described by Lee it seems that the sacrifice inthe precision cannot be compensated for the bias reductionof the PML estimates in imbalanced data It should also benoted that while Firthrsquos PML method reduces the bias in theestimates of regression coefficients it introduces bias in thepredicted probabilities and this bias is not negligible for rareevents data [15] To overcome this problem Puhr et alproposed two modified versions of Firthrsquos PML resulting inunbiased predicted probabilities [15] Accordingly theproposed methods including Firthrsquos logistic regression withintercept correction and Firthrsquos logistic regression withadded covariate efficiently improve on predictions fromFirthrsquos logistic regression [15] Since these methods were notimplemented in the standard statistical software such as Rwe were not able to apply them in the context of DIF analysisand compare their results with those of the WLR modelHence in the present study the WLR is the most efficientmethod for DIF analysis under rare events data +e WLRnot only reduces the bias of regression coefficients but alsoprovides unbiased predicted probability in comparison tothe PML method [15 20]

What distinguishes this study from the previous one isthat we simultaneously evaluated the effect of imbalanced dataand small samples (as well as large samples) on the perfor-mance of the three estimation methods for DIF analysis Inaddition one of the advantages of the present study is that itprovides a guideline about the required sample size for DIFanalysis with imbalanced or rare events data Previous sim-ulation studies have shown that the minimum sample size forDIF analysis with logistic regression should be within therange of 100 to 200 per group [34 35] However for moderateimbalanced data (τ 0156) and severe DIF (DIF 08) as ageneral rule of thumb we would suggest imposing a mini-mum of 300 respondents per group to achieve the adequatepower with the WLR method In addition for severe im-balance rate (τ 0069) and DIF 08 theWLR performs wellto ensure the acceptable power with samples of 500 per groupTo find what the minimum number of sample size is toachieve the adequate power (80) for moderate DIF wesimulated items with DIF 04 τ 0156 and 0069 andsample sizes greater than 500 in each group not reported inthe results section +e findings revealed that sample sizes ofat least 1000 per group for τ 0156 and 2000 per group forτ 0069 are required to detect DIF 04 based on the WLRmethod

In the present study we restricted our simulation to onlytwo bias correction methods for detecting DIF namelyWLR and PML Nevertheless there are a limited number ofstudies that evaluated DIF based on other penalizationmethods such as LASSO For example Tutz and Schaubergerapplied LASSO penalization for DIF analysis which includescontinuous variables [36] Furthermore Magis et al pro-posed a new DIF detection method based on LASSO whereall items were simultaneously evaluated for DIF in a singlemodeling approach so that multiple testing would not be aproblem [37]

5 Conclusion

Although the logistic regression (LR) model is one the mostcommon methods for detecting DIF certain samplingstrategies and appropriate bias correction techniques shouldbe applied when LR is implemented on moderate or severeimbalanced data sets In summary our findings revealedthat as compared with ML and PML the WLR is a moresensitive method for detecting DIF when data are imbal-anced or rare Hence as well as its easy application toexisting software the WLR introduced by King and Zeng isstrongly recommended for detecting DIF due to higherpower and lower type I error rate in comparison to PML andML inferential methods However in the future studies ofDIF penalized and bias correction methods should beproposed for ordinal logistic regression in the presence ofrare events or small sample setting

Data Availability

+is is a simulation study and the real data set has beenprovided by Dr Dejan Stevanovic the third author of themanuscript

10 BioMed Research International

Conflicts of Interest

+e authors declare no potential conflicts of interest withrespect to the research authorship andor publication ofthis article

Acknowledgments

+is work was supported by the Shiraz University of MedicalSciences Research Council Shiraz Iran under grant no96-01-01-14601 +is paper was extracted fromMarjan FaghihrsquosPhD thesis under supervision of Prof Peyman JafariPeyman Jafari andMarjan Faghih equally contributed to thiswork

References

[1] B D Zumbo ldquo+ree generations of DIF analyses consideringwhere it has been where it is now and where it is goingrdquoLanguage Assessment Quarterly vol 4 no 2 pp 223ndash2332007

[2] T A Ackerman ldquo+e theory and practice of item responsetheory by de Ayala R Jrdquo Journal of Educational Measure-ment vol 47 no 4 pp 471ndash476 2010

[3] B E Clauser and K M Mazor ldquoUsing statistical proceduresto identify differentially functioning test itemsrdquo EducationalMeasurement Issues and Practice vol 17 no 1 pp 31ndash442005

[4] H Swaminathan and H J Rogers ldquoDetecting differential itemfunctioning using logistic regression proceduresrdquo Journal ofEducational Measurement vol 27 no 4 pp 361ndash370 1990

[5] H J Rogers and H Swaminathan ldquoA comparison of logisticregression and Mantel-Haenszel procedures for detectingdifferential item functioningrdquo Applied Psychological Mea-surement vol 17 no 2 pp 105ndash116 1993

[6] P Narayanon and H Swaminathan ldquoIdentification of itemsthat show nonuniform DIFrdquo Applied Psychological Mea-surement vol 20 no 3 pp 257ndash274 1996

[7] P Jafari D Stevanovic and Z Bagheri ldquoCross-culturalmeasurement equivalence of the KINDL questionnaire forquality of life assessment in children and adolescentsrdquo ChildPsychiatry amp Human Development vol 47 no 2 pp 291ndash3042016

[8] P K Crane L E Gibbons K Ocepek-Welikson et al ldquoAcomparison of three sets of criteria for determining thepresence of differential item functioning using ordinal logisticregressionrdquo Quality of Life Research vol 16 no 1 pp 69ndash842007

[9] S Lee ldquoDetecting differential item functioning using thelogistic regression procedure in small samplesrdquo AppliedPsychological Measurement vol 41 no 1 pp 30ndash43 2017

[10] E Allahyari P Jafari and Z Bagheri ldquoA simulation study toassess the effect of the number of response categories on thepower of ordinal logistic regression for differential itemfunctioning analysis in rating scalesrdquo Computational andMathematical Methods in Medicine vol 2016 Article ID5080826 8 pages 2016

[11] P O Monahan C A McHorney T E Stump andA J Perkins ldquoOdds ratio delta ETS classification andstandardization measures of dif magnitude for binary logisticregressionrdquo Journal of Educational and Behavioral Statisticsvol 32 no 1 pp 92ndash109 2007

[12] S Apinyapibal N Lawthong and S Kanjanawasee ldquoAcomparative analysis of the efficacy of differential itemfunctioning detection for dichotomously scored items amonglogistic regression SIBTEST and raschtree methodsrdquo Proce-dia-Social and Behavioral Sciences vol 191 pp 21ndash25 2015

[13] H-F Chen and K-Y Jin ldquoApplying logistic regression todetect differential item functioning in multidimensionaldatardquo Frontiers in Psychology vol 9 p 1302 2018

[14] D Svetina and L Rutkowski ldquoDetecting differential itemfunctioning using generalized logistic regression in thecontext of large-scale assessmentsrdquo Large-scale Assessments inEducation vol 2 no 1 2014

[15] R Puhr G Heinze M Nold et al ldquoFirthrsquos logistic regressionwith rare events accurate effect estimates and predictionsrdquoStatistics in Medicine vol 36 no 14 pp 2302ndash2317 2017

[16] D Firth ldquoBias reduction of maximum likelihood estimatesrdquoBiometrika vol 80 no 1 pp 27ndash38 1993

[17] M Pavlou G Ambler S Seaman M De Iorio andR Z Omar ldquoReview and evaluation of penalised regressionmethods for risk prediction in low-dimensional data with feweventsrdquo Statistics in Medicine vol 35 no 7 pp 1159ndash11772016

[18] M Maalouf and T B Trafalis ldquoRobust weighted kernel lo-gistic regression in imbalanced and rare events datardquo Com-putational Statistics amp Data Analysis vol 55 no 1pp 168ndash183 2011

[19] M Maalouf D Homouz and T B Trafalis ldquoLogistic re-gression in large rare events and imbalanced data a perfor-mance comparison of prior correction and weightingmethodsrdquo Computational Intelligence vol 34 no 1pp 161ndash174 2018

[20] G King and L Zeng ldquoLogistic regression in rare events datardquoPolitical Analysis vol 9 no 2 pp 137ndash163 2001

[21] I-C Huang W L Leite P Shearer M Seid D A Revickiand E A Shenkman ldquoDifferential item functioning in qualityof life measure between children with and without specialhealth-care needsrdquo Value in Health vol 14 no 6 pp 872ndash883 2011

[22] H Jeffreys ldquoAn invariant form for the prior probability inestimation problemsrdquo Proceedings of the Royal Society ofLondon Series A Mathematical and Physical Sciencesvol 186 no 1007 pp 453ndash461 1946

[23] G Heinze and M Schemper ldquoA solution to the problem ofseparation in logistic regressionrdquo Statistics in Medicinevol 21 no 16 pp 2409ndash2419 2002

[24] G Heinze and M Ploner ldquoFixing the nonconvergence bug inlogistic regression with SPLUS and SASrdquo Computer Methodsand Programs in Biomedicine vol 71 no 2 pp 181ndash187 2003

[25] G King and L Zeng ldquoExplaining rare events in internationalrelationsrdquo International Organization vol 55 no 3pp 693ndash715 2001

[26] M Maalouf ldquoLogistic regression in data analysis an over-viewrdquo International Journal of Data Analysis Techniques andStrategies vol 3 no 3 pp 281ndash299 2011

[27] D E A Sulasih S W Purnami and S P Rahayu BeBeoretical Study of Rare Event Weighted Logistic Regressionfor Classification of Imbalanced Data Universitas Muham-madiyah Surakarta Surakarta Indonesia 2015

[28] G Heinze logistf Firthrsquos Bias Reduced Logistic RegressionR Package Version 123 2018 httpsCRANR-projectorgpackage=logistf

[29] C Choirat J Honaker K Imai G King and O Lau ZeligEveryonersquos Statistical Software Version 5161 2018 httpzeligprojectorg

BioMed Research International 11

[30] M L Ong L Lu S Lee et al ldquoA comparison of the hier-archical generalized linear model multiple-indicators mul-tiple-causes and the item response theory-likelihood ratio testfor detecting differential item functioningrdquo in QuantitativePsychology Research pp 343ndash357 Springer Berlin Germany2015

[31] D Stevanovic Z Bagheri O Atilola et al ldquoCross-culturalmeasurement invariance of the revised Child anxiety anddepression scale across 11 world-wide societiesrdquo Epidemiol-ogy and Psychiatric Sciences vol 26 no 4 pp 430ndash440 2017

[32] G M Cordeiro and P McCullagh ldquoBias correction in gen-eralized linear modelsrdquo Journal of the Royal Statistical SocietySeries B (Methodological) vol 53 no 3 pp 629ndash643 1991

[33] T Maiti and V Pradhan ldquoBias reduction and a solution forseparation of logistic regression with missing covariatesrdquoBiometrics vol 65 no 4 pp 1262ndash1269 2009

[34] B D Zumbo A Handbook on the Beory and Methods ofDifferential Item Functioning (DIF) National DefenseHeadquarters Ottawa Canada 1999

[35] J-S Lai J Teresi and R Gershon ldquoProcedures for the analysisof differential item functioning (DIF) for small sample sizesrdquoEvaluation amp the Health Professions vol 28 no 3 pp 283ndash294 2005

[36] G Tutz and G Schauberger A Penalty Approach to Differ-ential Item Functioning in Rasch Models Springer BerlinGermany 2015

[37] D Magis F Tuerlinckx and P de Boeck ldquoDetection ofdifferential item functioning using the lasso approachrdquoJournal of Educational and Behavioral Statistics vol 40 no 2pp 111ndash135 2015

12 BioMed Research International

Page 8: A Comparative Study of the Bias Correction Methods for ... · 2020. 3. 2. · mum likelihood estimation method (PML) was originally developedbyDavidFirth[16]inordertoreducethesmall

400 600 800 1000200

Total sample size

002

003

004

005

006

007

Type

I er

ror r

ate

(a)

002

004

006

008

Type

I er

ror r

ate

400 600 800 1000200

Total sample size

(b)

002

003

004

005

006

007

Type

I er

ror r

ate

1000600 800400200

Total sample size

(c)

002

003

004

005

006

007

008

Type

I er

ror r

ate

1000600 800400200

Total sample size

(d)

Figure 2 Continued

400 600 800 1000200

Total sample size

005

010

015

020

025

030

Pow

er

(g)

02

04

06

08

Pow

er

400 600 800 1000200

Total sample size

(h)

Figure 1 +e average power of MLE (solid lines) PML (dotted line) and WLR (broken line) methods on measures with 5 and 15 itemsNote Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nr τ 0156)(nf nr τ 0069) and (nf 3nr τ 0069)

8 BioMed Research International

and ML respectively Moreover as compared with ML andPML WLR gives smaller standard error for regressioncoefficient and it can result in increased likelihood offinding DIF

4 Discussion

Detecting DIF based on the logistic regression model can bechallenging for binary data where events are rare or severely

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(e)

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(f )

000

001

002

003

004

005

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(g)

400 600 800 1000200

Total sample size

000

002

004

006

Type

I er

ror r

ate

(h)

Figure 2 +e average type I error rates of MLE (solid lines) PMLE (dotted line) and WLR (broken line) methods on measures with 5 and15 items Note Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nrτ 0156) (nf nr τ 0069) and (nf 3nr τ 0069)

Table 3 +e results of DIF analysis across male and female Serbian individuals based on ML PML and WLR methods

ItemML PML WLR

B SE P value B SE P value B SE P value τ

Social phobia

7 061 029 0032 06 029 0042 061 031 0042 02420 minus 116 044 0007 minus 113 044 001 minus 119 042 0002 01738 124 044 0002 119 043 0004 118 041 0001 01943 minus 157 047 lt0001 minus 152 046 0001 minus 156 044 lt0001 019

Separation anxiety

9 minus 089 048 006 minus 086 047 01 minus 083 036 0015 03117 206 122 0045 17 106 0092 097 075 0008 01145 096 048 0034 091 047 0059 104 056 0036 00946 minus 304 114 0001 minus 269 101 0003 minus 19 051 lt0001 014

Generalized anxiety disorder 13 114 035 0001 112 035 0001 112 035 0001 03137 minus 201 044 lt0001 minus 196 044 lt0001 minus 195 044 lt0001 022

Obsessive compulsive disorder 23 079 043 0059 075 043 01 079 041 0038 018

Major depression2 123 045 0004 118 044 0006 119 045 0003 0196 minus 122 058 0031 minus 118 056 0043 minus 115 042 0004 01221 063 032 0048 062 032 0072 065 034 0049 026

Note P value is reported in three decimal places for more accuracy of comparing the three models τ the fraction of 1s in the population that is extracted froma data set of 4192 adolescents in eleven countries B regression coefficient for testing uniform DIF SE standard error of regression coefficient

BioMed Research International 9

imbalanced +e present study is designed to clarify theunknown consequences of rare events data in the context ofDIF analysis based on the LR model with and withoutapplying bias correction methods Hence in the compre-hensive simulation study we compared the performance ofthe WLR PML and ML methods for detecting DIF withfocus on imbalanced binary response data

According to our simulation findings of the three es-timation methods for detecting DIF the WLR seems to bethe most appealing with regards to its statistical propertiesits type I error rate is close to or less than the nominal 005level and its power is considerably higher than the PML andML Moreover in the present study comparing the threeinferential methods for the real data set confirmed thefindings of the simulation Accordingly as compared withWLR PML and ML were less sensitive for detecting DIFacross male and female Serbian adolescents

It is worthwhile to explain why the WLR outperforms thePML and ML estimation methods for DIF analysis Indeedthe WLR is an extension of the small-sample bias correctionsas described by Cordeiro and McCullagh to the weightedlikelihood [32] A previous simulation study has shown thatthis bias correction method reduces both the bias and thevariance thereby offering the smallest Mean Squared Error(MSE) when compared with that of PML and ML estimationmethods [33] Hence applying the WLR in DIF analysis canlead to reduction in the standard error of the parameterestimates which consequently increases the likelihood offinding DIF Another advantage of the WLR is that it givesunbiased predicted probability [15 20] However the pre-dicted probabilities from the WLRmodel may fall outside theplausible range of 0 to 1 [15]

Furthermore our findings revealed that the logisticregression model based on the traditional ML estimationmethod had a slightly better performance than the PML forimbalanced or rare events data +ese findings were similarto those of Lee who reported that the ML slightly out-performed the PML for detecting DIF in small samples [9]In this case as described by Lee it seems that the sacrifice inthe precision cannot be compensated for the bias reductionof the PML estimates in imbalanced data It should also benoted that while Firthrsquos PML method reduces the bias in theestimates of regression coefficients it introduces bias in thepredicted probabilities and this bias is not negligible for rareevents data [15] To overcome this problem Puhr et alproposed two modified versions of Firthrsquos PML resulting inunbiased predicted probabilities [15] Accordingly theproposed methods including Firthrsquos logistic regression withintercept correction and Firthrsquos logistic regression withadded covariate efficiently improve on predictions fromFirthrsquos logistic regression [15] Since these methods were notimplemented in the standard statistical software such as Rwe were not able to apply them in the context of DIF analysisand compare their results with those of the WLR modelHence in the present study the WLR is the most efficientmethod for DIF analysis under rare events data +e WLRnot only reduces the bias of regression coefficients but alsoprovides unbiased predicted probability in comparison tothe PML method [15 20]

What distinguishes this study from the previous one isthat we simultaneously evaluated the effect of imbalanced dataand small samples (as well as large samples) on the perfor-mance of the three estimation methods for DIF analysis Inaddition one of the advantages of the present study is that itprovides a guideline about the required sample size for DIFanalysis with imbalanced or rare events data Previous sim-ulation studies have shown that the minimum sample size forDIF analysis with logistic regression should be within therange of 100 to 200 per group [34 35] However for moderateimbalanced data (τ 0156) and severe DIF (DIF 08) as ageneral rule of thumb we would suggest imposing a mini-mum of 300 respondents per group to achieve the adequatepower with the WLR method In addition for severe im-balance rate (τ 0069) and DIF 08 theWLR performs wellto ensure the acceptable power with samples of 500 per groupTo find what the minimum number of sample size is toachieve the adequate power (80) for moderate DIF wesimulated items with DIF 04 τ 0156 and 0069 andsample sizes greater than 500 in each group not reported inthe results section +e findings revealed that sample sizes ofat least 1000 per group for τ 0156 and 2000 per group forτ 0069 are required to detect DIF 04 based on the WLRmethod

In the present study we restricted our simulation to onlytwo bias correction methods for detecting DIF namelyWLR and PML Nevertheless there are a limited number ofstudies that evaluated DIF based on other penalizationmethods such as LASSO For example Tutz and Schaubergerapplied LASSO penalization for DIF analysis which includescontinuous variables [36] Furthermore Magis et al pro-posed a new DIF detection method based on LASSO whereall items were simultaneously evaluated for DIF in a singlemodeling approach so that multiple testing would not be aproblem [37]

5 Conclusion

Although the logistic regression (LR) model is one the mostcommon methods for detecting DIF certain samplingstrategies and appropriate bias correction techniques shouldbe applied when LR is implemented on moderate or severeimbalanced data sets In summary our findings revealedthat as compared with ML and PML the WLR is a moresensitive method for detecting DIF when data are imbal-anced or rare Hence as well as its easy application toexisting software the WLR introduced by King and Zeng isstrongly recommended for detecting DIF due to higherpower and lower type I error rate in comparison to PML andML inferential methods However in the future studies ofDIF penalized and bias correction methods should beproposed for ordinal logistic regression in the presence ofrare events or small sample setting

Data Availability

+is is a simulation study and the real data set has beenprovided by Dr Dejan Stevanovic the third author of themanuscript

10 BioMed Research International

Conflicts of Interest

+e authors declare no potential conflicts of interest withrespect to the research authorship andor publication ofthis article

Acknowledgments

+is work was supported by the Shiraz University of MedicalSciences Research Council Shiraz Iran under grant no96-01-01-14601 +is paper was extracted fromMarjan FaghihrsquosPhD thesis under supervision of Prof Peyman JafariPeyman Jafari andMarjan Faghih equally contributed to thiswork

References

[1] B D Zumbo ldquo+ree generations of DIF analyses consideringwhere it has been where it is now and where it is goingrdquoLanguage Assessment Quarterly vol 4 no 2 pp 223ndash2332007

[2] T A Ackerman ldquo+e theory and practice of item responsetheory by de Ayala R Jrdquo Journal of Educational Measure-ment vol 47 no 4 pp 471ndash476 2010

[3] B E Clauser and K M Mazor ldquoUsing statistical proceduresto identify differentially functioning test itemsrdquo EducationalMeasurement Issues and Practice vol 17 no 1 pp 31ndash442005

[4] H Swaminathan and H J Rogers ldquoDetecting differential itemfunctioning using logistic regression proceduresrdquo Journal ofEducational Measurement vol 27 no 4 pp 361ndash370 1990

[5] H J Rogers and H Swaminathan ldquoA comparison of logisticregression and Mantel-Haenszel procedures for detectingdifferential item functioningrdquo Applied Psychological Mea-surement vol 17 no 2 pp 105ndash116 1993

[6] P Narayanon and H Swaminathan ldquoIdentification of itemsthat show nonuniform DIFrdquo Applied Psychological Mea-surement vol 20 no 3 pp 257ndash274 1996

[7] P Jafari D Stevanovic and Z Bagheri ldquoCross-culturalmeasurement equivalence of the KINDL questionnaire forquality of life assessment in children and adolescentsrdquo ChildPsychiatry amp Human Development vol 47 no 2 pp 291ndash3042016

[8] P K Crane L E Gibbons K Ocepek-Welikson et al ldquoAcomparison of three sets of criteria for determining thepresence of differential item functioning using ordinal logisticregressionrdquo Quality of Life Research vol 16 no 1 pp 69ndash842007

[9] S Lee ldquoDetecting differential item functioning using thelogistic regression procedure in small samplesrdquo AppliedPsychological Measurement vol 41 no 1 pp 30ndash43 2017

[10] E Allahyari P Jafari and Z Bagheri ldquoA simulation study toassess the effect of the number of response categories on thepower of ordinal logistic regression for differential itemfunctioning analysis in rating scalesrdquo Computational andMathematical Methods in Medicine vol 2016 Article ID5080826 8 pages 2016

[11] P O Monahan C A McHorney T E Stump andA J Perkins ldquoOdds ratio delta ETS classification andstandardization measures of dif magnitude for binary logisticregressionrdquo Journal of Educational and Behavioral Statisticsvol 32 no 1 pp 92ndash109 2007

[12] S Apinyapibal N Lawthong and S Kanjanawasee ldquoAcomparative analysis of the efficacy of differential itemfunctioning detection for dichotomously scored items amonglogistic regression SIBTEST and raschtree methodsrdquo Proce-dia-Social and Behavioral Sciences vol 191 pp 21ndash25 2015

[13] H-F Chen and K-Y Jin ldquoApplying logistic regression todetect differential item functioning in multidimensionaldatardquo Frontiers in Psychology vol 9 p 1302 2018

[14] D Svetina and L Rutkowski ldquoDetecting differential itemfunctioning using generalized logistic regression in thecontext of large-scale assessmentsrdquo Large-scale Assessments inEducation vol 2 no 1 2014

[15] R Puhr G Heinze M Nold et al ldquoFirthrsquos logistic regressionwith rare events accurate effect estimates and predictionsrdquoStatistics in Medicine vol 36 no 14 pp 2302ndash2317 2017

[16] D Firth ldquoBias reduction of maximum likelihood estimatesrdquoBiometrika vol 80 no 1 pp 27ndash38 1993

[17] M Pavlou G Ambler S Seaman M De Iorio andR Z Omar ldquoReview and evaluation of penalised regressionmethods for risk prediction in low-dimensional data with feweventsrdquo Statistics in Medicine vol 35 no 7 pp 1159ndash11772016

[18] M Maalouf and T B Trafalis ldquoRobust weighted kernel lo-gistic regression in imbalanced and rare events datardquo Com-putational Statistics amp Data Analysis vol 55 no 1pp 168ndash183 2011

[19] M Maalouf D Homouz and T B Trafalis ldquoLogistic re-gression in large rare events and imbalanced data a perfor-mance comparison of prior correction and weightingmethodsrdquo Computational Intelligence vol 34 no 1pp 161ndash174 2018

[20] G King and L Zeng ldquoLogistic regression in rare events datardquoPolitical Analysis vol 9 no 2 pp 137ndash163 2001

[21] I-C Huang W L Leite P Shearer M Seid D A Revickiand E A Shenkman ldquoDifferential item functioning in qualityof life measure between children with and without specialhealth-care needsrdquo Value in Health vol 14 no 6 pp 872ndash883 2011

[22] H Jeffreys ldquoAn invariant form for the prior probability inestimation problemsrdquo Proceedings of the Royal Society ofLondon Series A Mathematical and Physical Sciencesvol 186 no 1007 pp 453ndash461 1946

[23] G Heinze and M Schemper ldquoA solution to the problem ofseparation in logistic regressionrdquo Statistics in Medicinevol 21 no 16 pp 2409ndash2419 2002

[24] G Heinze and M Ploner ldquoFixing the nonconvergence bug inlogistic regression with SPLUS and SASrdquo Computer Methodsand Programs in Biomedicine vol 71 no 2 pp 181ndash187 2003

[25] G King and L Zeng ldquoExplaining rare events in internationalrelationsrdquo International Organization vol 55 no 3pp 693ndash715 2001

[26] M Maalouf ldquoLogistic regression in data analysis an over-viewrdquo International Journal of Data Analysis Techniques andStrategies vol 3 no 3 pp 281ndash299 2011

[27] D E A Sulasih S W Purnami and S P Rahayu BeBeoretical Study of Rare Event Weighted Logistic Regressionfor Classification of Imbalanced Data Universitas Muham-madiyah Surakarta Surakarta Indonesia 2015

[28] G Heinze logistf Firthrsquos Bias Reduced Logistic RegressionR Package Version 123 2018 httpsCRANR-projectorgpackage=logistf

[29] C Choirat J Honaker K Imai G King and O Lau ZeligEveryonersquos Statistical Software Version 5161 2018 httpzeligprojectorg

BioMed Research International 11

[30] M L Ong L Lu S Lee et al ldquoA comparison of the hier-archical generalized linear model multiple-indicators mul-tiple-causes and the item response theory-likelihood ratio testfor detecting differential item functioningrdquo in QuantitativePsychology Research pp 343ndash357 Springer Berlin Germany2015

[31] D Stevanovic Z Bagheri O Atilola et al ldquoCross-culturalmeasurement invariance of the revised Child anxiety anddepression scale across 11 world-wide societiesrdquo Epidemiol-ogy and Psychiatric Sciences vol 26 no 4 pp 430ndash440 2017

[32] G M Cordeiro and P McCullagh ldquoBias correction in gen-eralized linear modelsrdquo Journal of the Royal Statistical SocietySeries B (Methodological) vol 53 no 3 pp 629ndash643 1991

[33] T Maiti and V Pradhan ldquoBias reduction and a solution forseparation of logistic regression with missing covariatesrdquoBiometrics vol 65 no 4 pp 1262ndash1269 2009

[34] B D Zumbo A Handbook on the Beory and Methods ofDifferential Item Functioning (DIF) National DefenseHeadquarters Ottawa Canada 1999

[35] J-S Lai J Teresi and R Gershon ldquoProcedures for the analysisof differential item functioning (DIF) for small sample sizesrdquoEvaluation amp the Health Professions vol 28 no 3 pp 283ndash294 2005

[36] G Tutz and G Schauberger A Penalty Approach to Differ-ential Item Functioning in Rasch Models Springer BerlinGermany 2015

[37] D Magis F Tuerlinckx and P de Boeck ldquoDetection ofdifferential item functioning using the lasso approachrdquoJournal of Educational and Behavioral Statistics vol 40 no 2pp 111ndash135 2015

12 BioMed Research International

Page 9: A Comparative Study of the Bias Correction Methods for ... · 2020. 3. 2. · mum likelihood estimation method (PML) was originally developedbyDavidFirth[16]inordertoreducethesmall

and ML respectively Moreover as compared with ML andPML WLR gives smaller standard error for regressioncoefficient and it can result in increased likelihood offinding DIF

4 Discussion

Detecting DIF based on the logistic regression model can bechallenging for binary data where events are rare or severely

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(e)

000

002

004

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(f )

000

001

002

003

004

005

006

Type

I er

ror r

ate

1000600 800400200

Total sample size

(g)

400 600 800 1000200

Total sample size

000

002

004

006

Type

I er

ror r

ate

(h)

Figure 2 +e average type I error rates of MLE (solid lines) PMLE (dotted line) and WLR (broken line) methods on measures with 5 and15 items Note Left panel for DIF 04 and right panel for DIF 08 From top to bottom the four panels are (nf nr τ 0156) (nf 3nrτ 0156) (nf nr τ 0069) and (nf 3nr τ 0069)

Table 3 +e results of DIF analysis across male and female Serbian individuals based on ML PML and WLR methods

ItemML PML WLR

B SE P value B SE P value B SE P value τ

Social phobia

7 061 029 0032 06 029 0042 061 031 0042 02420 minus 116 044 0007 minus 113 044 001 minus 119 042 0002 01738 124 044 0002 119 043 0004 118 041 0001 01943 minus 157 047 lt0001 minus 152 046 0001 minus 156 044 lt0001 019

Separation anxiety

9 minus 089 048 006 minus 086 047 01 minus 083 036 0015 03117 206 122 0045 17 106 0092 097 075 0008 01145 096 048 0034 091 047 0059 104 056 0036 00946 minus 304 114 0001 minus 269 101 0003 minus 19 051 lt0001 014

Generalized anxiety disorder 13 114 035 0001 112 035 0001 112 035 0001 03137 minus 201 044 lt0001 minus 196 044 lt0001 minus 195 044 lt0001 022

Obsessive compulsive disorder 23 079 043 0059 075 043 01 079 041 0038 018

Major depression2 123 045 0004 118 044 0006 119 045 0003 0196 minus 122 058 0031 minus 118 056 0043 minus 115 042 0004 01221 063 032 0048 062 032 0072 065 034 0049 026

Note P value is reported in three decimal places for more accuracy of comparing the three models τ the fraction of 1s in the population that is extracted froma data set of 4192 adolescents in eleven countries B regression coefficient for testing uniform DIF SE standard error of regression coefficient

BioMed Research International 9

imbalanced +e present study is designed to clarify theunknown consequences of rare events data in the context ofDIF analysis based on the LR model with and withoutapplying bias correction methods Hence in the compre-hensive simulation study we compared the performance ofthe WLR PML and ML methods for detecting DIF withfocus on imbalanced binary response data

According to our simulation findings of the three es-timation methods for detecting DIF the WLR seems to bethe most appealing with regards to its statistical propertiesits type I error rate is close to or less than the nominal 005level and its power is considerably higher than the PML andML Moreover in the present study comparing the threeinferential methods for the real data set confirmed thefindings of the simulation Accordingly as compared withWLR PML and ML were less sensitive for detecting DIFacross male and female Serbian adolescents

It is worthwhile to explain why the WLR outperforms thePML and ML estimation methods for DIF analysis Indeedthe WLR is an extension of the small-sample bias correctionsas described by Cordeiro and McCullagh to the weightedlikelihood [32] A previous simulation study has shown thatthis bias correction method reduces both the bias and thevariance thereby offering the smallest Mean Squared Error(MSE) when compared with that of PML and ML estimationmethods [33] Hence applying the WLR in DIF analysis canlead to reduction in the standard error of the parameterestimates which consequently increases the likelihood offinding DIF Another advantage of the WLR is that it givesunbiased predicted probability [15 20] However the pre-dicted probabilities from the WLRmodel may fall outside theplausible range of 0 to 1 [15]

Furthermore our findings revealed that the logisticregression model based on the traditional ML estimationmethod had a slightly better performance than the PML forimbalanced or rare events data +ese findings were similarto those of Lee who reported that the ML slightly out-performed the PML for detecting DIF in small samples [9]In this case as described by Lee it seems that the sacrifice inthe precision cannot be compensated for the bias reductionof the PML estimates in imbalanced data It should also benoted that while Firthrsquos PML method reduces the bias in theestimates of regression coefficients it introduces bias in thepredicted probabilities and this bias is not negligible for rareevents data [15] To overcome this problem Puhr et alproposed two modified versions of Firthrsquos PML resulting inunbiased predicted probabilities [15] Accordingly theproposed methods including Firthrsquos logistic regression withintercept correction and Firthrsquos logistic regression withadded covariate efficiently improve on predictions fromFirthrsquos logistic regression [15] Since these methods were notimplemented in the standard statistical software such as Rwe were not able to apply them in the context of DIF analysisand compare their results with those of the WLR modelHence in the present study the WLR is the most efficientmethod for DIF analysis under rare events data +e WLRnot only reduces the bias of regression coefficients but alsoprovides unbiased predicted probability in comparison tothe PML method [15 20]

What distinguishes this study from the previous one isthat we simultaneously evaluated the effect of imbalanced dataand small samples (as well as large samples) on the perfor-mance of the three estimation methods for DIF analysis Inaddition one of the advantages of the present study is that itprovides a guideline about the required sample size for DIFanalysis with imbalanced or rare events data Previous sim-ulation studies have shown that the minimum sample size forDIF analysis with logistic regression should be within therange of 100 to 200 per group [34 35] However for moderateimbalanced data (τ 0156) and severe DIF (DIF 08) as ageneral rule of thumb we would suggest imposing a mini-mum of 300 respondents per group to achieve the adequatepower with the WLR method In addition for severe im-balance rate (τ 0069) and DIF 08 theWLR performs wellto ensure the acceptable power with samples of 500 per groupTo find what the minimum number of sample size is toachieve the adequate power (80) for moderate DIF wesimulated items with DIF 04 τ 0156 and 0069 andsample sizes greater than 500 in each group not reported inthe results section +e findings revealed that sample sizes ofat least 1000 per group for τ 0156 and 2000 per group forτ 0069 are required to detect DIF 04 based on the WLRmethod

In the present study we restricted our simulation to onlytwo bias correction methods for detecting DIF namelyWLR and PML Nevertheless there are a limited number ofstudies that evaluated DIF based on other penalizationmethods such as LASSO For example Tutz and Schaubergerapplied LASSO penalization for DIF analysis which includescontinuous variables [36] Furthermore Magis et al pro-posed a new DIF detection method based on LASSO whereall items were simultaneously evaluated for DIF in a singlemodeling approach so that multiple testing would not be aproblem [37]

5 Conclusion

Although the logistic regression (LR) model is one the mostcommon methods for detecting DIF certain samplingstrategies and appropriate bias correction techniques shouldbe applied when LR is implemented on moderate or severeimbalanced data sets In summary our findings revealedthat as compared with ML and PML the WLR is a moresensitive method for detecting DIF when data are imbal-anced or rare Hence as well as its easy application toexisting software the WLR introduced by King and Zeng isstrongly recommended for detecting DIF due to higherpower and lower type I error rate in comparison to PML andML inferential methods However in the future studies ofDIF penalized and bias correction methods should beproposed for ordinal logistic regression in the presence ofrare events or small sample setting

Data Availability

+is is a simulation study and the real data set has beenprovided by Dr Dejan Stevanovic the third author of themanuscript

10 BioMed Research International

Conflicts of Interest

+e authors declare no potential conflicts of interest withrespect to the research authorship andor publication ofthis article

Acknowledgments

+is work was supported by the Shiraz University of MedicalSciences Research Council Shiraz Iran under grant no96-01-01-14601 +is paper was extracted fromMarjan FaghihrsquosPhD thesis under supervision of Prof Peyman JafariPeyman Jafari andMarjan Faghih equally contributed to thiswork

References

[1] B D Zumbo ldquo+ree generations of DIF analyses consideringwhere it has been where it is now and where it is goingrdquoLanguage Assessment Quarterly vol 4 no 2 pp 223ndash2332007

[2] T A Ackerman ldquo+e theory and practice of item responsetheory by de Ayala R Jrdquo Journal of Educational Measure-ment vol 47 no 4 pp 471ndash476 2010

[3] B E Clauser and K M Mazor ldquoUsing statistical proceduresto identify differentially functioning test itemsrdquo EducationalMeasurement Issues and Practice vol 17 no 1 pp 31ndash442005

[4] H Swaminathan and H J Rogers ldquoDetecting differential itemfunctioning using logistic regression proceduresrdquo Journal ofEducational Measurement vol 27 no 4 pp 361ndash370 1990

[5] H J Rogers and H Swaminathan ldquoA comparison of logisticregression and Mantel-Haenszel procedures for detectingdifferential item functioningrdquo Applied Psychological Mea-surement vol 17 no 2 pp 105ndash116 1993

[6] P Narayanon and H Swaminathan ldquoIdentification of itemsthat show nonuniform DIFrdquo Applied Psychological Mea-surement vol 20 no 3 pp 257ndash274 1996

[7] P Jafari D Stevanovic and Z Bagheri ldquoCross-culturalmeasurement equivalence of the KINDL questionnaire forquality of life assessment in children and adolescentsrdquo ChildPsychiatry amp Human Development vol 47 no 2 pp 291ndash3042016

[8] P K Crane L E Gibbons K Ocepek-Welikson et al ldquoAcomparison of three sets of criteria for determining thepresence of differential item functioning using ordinal logisticregressionrdquo Quality of Life Research vol 16 no 1 pp 69ndash842007

[9] S Lee ldquoDetecting differential item functioning using thelogistic regression procedure in small samplesrdquo AppliedPsychological Measurement vol 41 no 1 pp 30ndash43 2017

[10] E Allahyari P Jafari and Z Bagheri ldquoA simulation study toassess the effect of the number of response categories on thepower of ordinal logistic regression for differential itemfunctioning analysis in rating scalesrdquo Computational andMathematical Methods in Medicine vol 2016 Article ID5080826 8 pages 2016

[11] P O Monahan C A McHorney T E Stump andA J Perkins ldquoOdds ratio delta ETS classification andstandardization measures of dif magnitude for binary logisticregressionrdquo Journal of Educational and Behavioral Statisticsvol 32 no 1 pp 92ndash109 2007

[12] S Apinyapibal N Lawthong and S Kanjanawasee ldquoAcomparative analysis of the efficacy of differential itemfunctioning detection for dichotomously scored items amonglogistic regression SIBTEST and raschtree methodsrdquo Proce-dia-Social and Behavioral Sciences vol 191 pp 21ndash25 2015

[13] H-F Chen and K-Y Jin ldquoApplying logistic regression todetect differential item functioning in multidimensionaldatardquo Frontiers in Psychology vol 9 p 1302 2018

[14] D Svetina and L Rutkowski ldquoDetecting differential itemfunctioning using generalized logistic regression in thecontext of large-scale assessmentsrdquo Large-scale Assessments inEducation vol 2 no 1 2014

[15] R Puhr G Heinze M Nold et al ldquoFirthrsquos logistic regressionwith rare events accurate effect estimates and predictionsrdquoStatistics in Medicine vol 36 no 14 pp 2302ndash2317 2017

[16] D Firth ldquoBias reduction of maximum likelihood estimatesrdquoBiometrika vol 80 no 1 pp 27ndash38 1993

[17] M Pavlou G Ambler S Seaman M De Iorio andR Z Omar ldquoReview and evaluation of penalised regressionmethods for risk prediction in low-dimensional data with feweventsrdquo Statistics in Medicine vol 35 no 7 pp 1159ndash11772016

[18] M Maalouf and T B Trafalis ldquoRobust weighted kernel lo-gistic regression in imbalanced and rare events datardquo Com-putational Statistics amp Data Analysis vol 55 no 1pp 168ndash183 2011

[19] M Maalouf D Homouz and T B Trafalis ldquoLogistic re-gression in large rare events and imbalanced data a perfor-mance comparison of prior correction and weightingmethodsrdquo Computational Intelligence vol 34 no 1pp 161ndash174 2018

[20] G King and L Zeng ldquoLogistic regression in rare events datardquoPolitical Analysis vol 9 no 2 pp 137ndash163 2001

[21] I-C Huang W L Leite P Shearer M Seid D A Revickiand E A Shenkman ldquoDifferential item functioning in qualityof life measure between children with and without specialhealth-care needsrdquo Value in Health vol 14 no 6 pp 872ndash883 2011

[22] H Jeffreys ldquoAn invariant form for the prior probability inestimation problemsrdquo Proceedings of the Royal Society ofLondon Series A Mathematical and Physical Sciencesvol 186 no 1007 pp 453ndash461 1946

[23] G Heinze and M Schemper ldquoA solution to the problem ofseparation in logistic regressionrdquo Statistics in Medicinevol 21 no 16 pp 2409ndash2419 2002

[24] G Heinze and M Ploner ldquoFixing the nonconvergence bug inlogistic regression with SPLUS and SASrdquo Computer Methodsand Programs in Biomedicine vol 71 no 2 pp 181ndash187 2003

[25] G King and L Zeng ldquoExplaining rare events in internationalrelationsrdquo International Organization vol 55 no 3pp 693ndash715 2001

[26] M Maalouf ldquoLogistic regression in data analysis an over-viewrdquo International Journal of Data Analysis Techniques andStrategies vol 3 no 3 pp 281ndash299 2011

[27] D E A Sulasih S W Purnami and S P Rahayu BeBeoretical Study of Rare Event Weighted Logistic Regressionfor Classification of Imbalanced Data Universitas Muham-madiyah Surakarta Surakarta Indonesia 2015

[28] G Heinze logistf Firthrsquos Bias Reduced Logistic RegressionR Package Version 123 2018 httpsCRANR-projectorgpackage=logistf

[29] C Choirat J Honaker K Imai G King and O Lau ZeligEveryonersquos Statistical Software Version 5161 2018 httpzeligprojectorg

BioMed Research International 11

[30] M L Ong L Lu S Lee et al ldquoA comparison of the hier-archical generalized linear model multiple-indicators mul-tiple-causes and the item response theory-likelihood ratio testfor detecting differential item functioningrdquo in QuantitativePsychology Research pp 343ndash357 Springer Berlin Germany2015

[31] D Stevanovic Z Bagheri O Atilola et al ldquoCross-culturalmeasurement invariance of the revised Child anxiety anddepression scale across 11 world-wide societiesrdquo Epidemiol-ogy and Psychiatric Sciences vol 26 no 4 pp 430ndash440 2017

[32] G M Cordeiro and P McCullagh ldquoBias correction in gen-eralized linear modelsrdquo Journal of the Royal Statistical SocietySeries B (Methodological) vol 53 no 3 pp 629ndash643 1991

[33] T Maiti and V Pradhan ldquoBias reduction and a solution forseparation of logistic regression with missing covariatesrdquoBiometrics vol 65 no 4 pp 1262ndash1269 2009

[34] B D Zumbo A Handbook on the Beory and Methods ofDifferential Item Functioning (DIF) National DefenseHeadquarters Ottawa Canada 1999

[35] J-S Lai J Teresi and R Gershon ldquoProcedures for the analysisof differential item functioning (DIF) for small sample sizesrdquoEvaluation amp the Health Professions vol 28 no 3 pp 283ndash294 2005

[36] G Tutz and G Schauberger A Penalty Approach to Differ-ential Item Functioning in Rasch Models Springer BerlinGermany 2015

[37] D Magis F Tuerlinckx and P de Boeck ldquoDetection ofdifferential item functioning using the lasso approachrdquoJournal of Educational and Behavioral Statistics vol 40 no 2pp 111ndash135 2015

12 BioMed Research International

Page 10: A Comparative Study of the Bias Correction Methods for ... · 2020. 3. 2. · mum likelihood estimation method (PML) was originally developedbyDavidFirth[16]inordertoreducethesmall

imbalanced +e present study is designed to clarify theunknown consequences of rare events data in the context ofDIF analysis based on the LR model with and withoutapplying bias correction methods Hence in the compre-hensive simulation study we compared the performance ofthe WLR PML and ML methods for detecting DIF withfocus on imbalanced binary response data

According to our simulation findings of the three es-timation methods for detecting DIF the WLR seems to bethe most appealing with regards to its statistical propertiesits type I error rate is close to or less than the nominal 005level and its power is considerably higher than the PML andML Moreover in the present study comparing the threeinferential methods for the real data set confirmed thefindings of the simulation Accordingly as compared withWLR PML and ML were less sensitive for detecting DIFacross male and female Serbian adolescents

It is worthwhile to explain why the WLR outperforms thePML and ML estimation methods for DIF analysis Indeedthe WLR is an extension of the small-sample bias correctionsas described by Cordeiro and McCullagh to the weightedlikelihood [32] A previous simulation study has shown thatthis bias correction method reduces both the bias and thevariance thereby offering the smallest Mean Squared Error(MSE) when compared with that of PML and ML estimationmethods [33] Hence applying the WLR in DIF analysis canlead to reduction in the standard error of the parameterestimates which consequently increases the likelihood offinding DIF Another advantage of the WLR is that it givesunbiased predicted probability [15 20] However the pre-dicted probabilities from the WLRmodel may fall outside theplausible range of 0 to 1 [15]

Furthermore our findings revealed that the logisticregression model based on the traditional ML estimationmethod had a slightly better performance than the PML forimbalanced or rare events data +ese findings were similarto those of Lee who reported that the ML slightly out-performed the PML for detecting DIF in small samples [9]In this case as described by Lee it seems that the sacrifice inthe precision cannot be compensated for the bias reductionof the PML estimates in imbalanced data It should also benoted that while Firthrsquos PML method reduces the bias in theestimates of regression coefficients it introduces bias in thepredicted probabilities and this bias is not negligible for rareevents data [15] To overcome this problem Puhr et alproposed two modified versions of Firthrsquos PML resulting inunbiased predicted probabilities [15] Accordingly theproposed methods including Firthrsquos logistic regression withintercept correction and Firthrsquos logistic regression withadded covariate efficiently improve on predictions fromFirthrsquos logistic regression [15] Since these methods were notimplemented in the standard statistical software such as Rwe were not able to apply them in the context of DIF analysisand compare their results with those of the WLR modelHence in the present study the WLR is the most efficientmethod for DIF analysis under rare events data +e WLRnot only reduces the bias of regression coefficients but alsoprovides unbiased predicted probability in comparison tothe PML method [15 20]

What distinguishes this study from the previous one isthat we simultaneously evaluated the effect of imbalanced dataand small samples (as well as large samples) on the perfor-mance of the three estimation methods for DIF analysis Inaddition one of the advantages of the present study is that itprovides a guideline about the required sample size for DIFanalysis with imbalanced or rare events data Previous sim-ulation studies have shown that the minimum sample size forDIF analysis with logistic regression should be within therange of 100 to 200 per group [34 35] However for moderateimbalanced data (τ 0156) and severe DIF (DIF 08) as ageneral rule of thumb we would suggest imposing a mini-mum of 300 respondents per group to achieve the adequatepower with the WLR method In addition for severe im-balance rate (τ 0069) and DIF 08 theWLR performs wellto ensure the acceptable power with samples of 500 per groupTo find what the minimum number of sample size is toachieve the adequate power (80) for moderate DIF wesimulated items with DIF 04 τ 0156 and 0069 andsample sizes greater than 500 in each group not reported inthe results section +e findings revealed that sample sizes ofat least 1000 per group for τ 0156 and 2000 per group forτ 0069 are required to detect DIF 04 based on the WLRmethod

In the present study we restricted our simulation to onlytwo bias correction methods for detecting DIF namelyWLR and PML Nevertheless there are a limited number ofstudies that evaluated DIF based on other penalizationmethods such as LASSO For example Tutz and Schaubergerapplied LASSO penalization for DIF analysis which includescontinuous variables [36] Furthermore Magis et al pro-posed a new DIF detection method based on LASSO whereall items were simultaneously evaluated for DIF in a singlemodeling approach so that multiple testing would not be aproblem [37]

5 Conclusion

Although the logistic regression (LR) model is one the mostcommon methods for detecting DIF certain samplingstrategies and appropriate bias correction techniques shouldbe applied when LR is implemented on moderate or severeimbalanced data sets In summary our findings revealedthat as compared with ML and PML the WLR is a moresensitive method for detecting DIF when data are imbal-anced or rare Hence as well as its easy application toexisting software the WLR introduced by King and Zeng isstrongly recommended for detecting DIF due to higherpower and lower type I error rate in comparison to PML andML inferential methods However in the future studies ofDIF penalized and bias correction methods should beproposed for ordinal logistic regression in the presence ofrare events or small sample setting

Data Availability

+is is a simulation study and the real data set has beenprovided by Dr Dejan Stevanovic the third author of themanuscript

10 BioMed Research International

Conflicts of Interest

+e authors declare no potential conflicts of interest withrespect to the research authorship andor publication ofthis article

Acknowledgments

+is work was supported by the Shiraz University of MedicalSciences Research Council Shiraz Iran under grant no96-01-01-14601 +is paper was extracted fromMarjan FaghihrsquosPhD thesis under supervision of Prof Peyman JafariPeyman Jafari andMarjan Faghih equally contributed to thiswork

References

[1] B D Zumbo ldquo+ree generations of DIF analyses consideringwhere it has been where it is now and where it is goingrdquoLanguage Assessment Quarterly vol 4 no 2 pp 223ndash2332007

[2] T A Ackerman ldquo+e theory and practice of item responsetheory by de Ayala R Jrdquo Journal of Educational Measure-ment vol 47 no 4 pp 471ndash476 2010

[3] B E Clauser and K M Mazor ldquoUsing statistical proceduresto identify differentially functioning test itemsrdquo EducationalMeasurement Issues and Practice vol 17 no 1 pp 31ndash442005

[4] H Swaminathan and H J Rogers ldquoDetecting differential itemfunctioning using logistic regression proceduresrdquo Journal ofEducational Measurement vol 27 no 4 pp 361ndash370 1990

[5] H J Rogers and H Swaminathan ldquoA comparison of logisticregression and Mantel-Haenszel procedures for detectingdifferential item functioningrdquo Applied Psychological Mea-surement vol 17 no 2 pp 105ndash116 1993

[6] P Narayanon and H Swaminathan ldquoIdentification of itemsthat show nonuniform DIFrdquo Applied Psychological Mea-surement vol 20 no 3 pp 257ndash274 1996

[7] P Jafari D Stevanovic and Z Bagheri ldquoCross-culturalmeasurement equivalence of the KINDL questionnaire forquality of life assessment in children and adolescentsrdquo ChildPsychiatry amp Human Development vol 47 no 2 pp 291ndash3042016

[8] P K Crane L E Gibbons K Ocepek-Welikson et al ldquoAcomparison of three sets of criteria for determining thepresence of differential item functioning using ordinal logisticregressionrdquo Quality of Life Research vol 16 no 1 pp 69ndash842007

[9] S Lee ldquoDetecting differential item functioning using thelogistic regression procedure in small samplesrdquo AppliedPsychological Measurement vol 41 no 1 pp 30ndash43 2017

[10] E Allahyari P Jafari and Z Bagheri ldquoA simulation study toassess the effect of the number of response categories on thepower of ordinal logistic regression for differential itemfunctioning analysis in rating scalesrdquo Computational andMathematical Methods in Medicine vol 2016 Article ID5080826 8 pages 2016

[11] P O Monahan C A McHorney T E Stump andA J Perkins ldquoOdds ratio delta ETS classification andstandardization measures of dif magnitude for binary logisticregressionrdquo Journal of Educational and Behavioral Statisticsvol 32 no 1 pp 92ndash109 2007

[12] S Apinyapibal N Lawthong and S Kanjanawasee ldquoAcomparative analysis of the efficacy of differential itemfunctioning detection for dichotomously scored items amonglogistic regression SIBTEST and raschtree methodsrdquo Proce-dia-Social and Behavioral Sciences vol 191 pp 21ndash25 2015

[13] H-F Chen and K-Y Jin ldquoApplying logistic regression todetect differential item functioning in multidimensionaldatardquo Frontiers in Psychology vol 9 p 1302 2018

[14] D Svetina and L Rutkowski ldquoDetecting differential itemfunctioning using generalized logistic regression in thecontext of large-scale assessmentsrdquo Large-scale Assessments inEducation vol 2 no 1 2014

[15] R Puhr G Heinze M Nold et al ldquoFirthrsquos logistic regressionwith rare events accurate effect estimates and predictionsrdquoStatistics in Medicine vol 36 no 14 pp 2302ndash2317 2017

[16] D Firth ldquoBias reduction of maximum likelihood estimatesrdquoBiometrika vol 80 no 1 pp 27ndash38 1993

[17] M Pavlou G Ambler S Seaman M De Iorio andR Z Omar ldquoReview and evaluation of penalised regressionmethods for risk prediction in low-dimensional data with feweventsrdquo Statistics in Medicine vol 35 no 7 pp 1159ndash11772016

[18] M Maalouf and T B Trafalis ldquoRobust weighted kernel lo-gistic regression in imbalanced and rare events datardquo Com-putational Statistics amp Data Analysis vol 55 no 1pp 168ndash183 2011

[19] M Maalouf D Homouz and T B Trafalis ldquoLogistic re-gression in large rare events and imbalanced data a perfor-mance comparison of prior correction and weightingmethodsrdquo Computational Intelligence vol 34 no 1pp 161ndash174 2018

[20] G King and L Zeng ldquoLogistic regression in rare events datardquoPolitical Analysis vol 9 no 2 pp 137ndash163 2001

[21] I-C Huang W L Leite P Shearer M Seid D A Revickiand E A Shenkman ldquoDifferential item functioning in qualityof life measure between children with and without specialhealth-care needsrdquo Value in Health vol 14 no 6 pp 872ndash883 2011

[22] H Jeffreys ldquoAn invariant form for the prior probability inestimation problemsrdquo Proceedings of the Royal Society ofLondon Series A Mathematical and Physical Sciencesvol 186 no 1007 pp 453ndash461 1946

[23] G Heinze and M Schemper ldquoA solution to the problem ofseparation in logistic regressionrdquo Statistics in Medicinevol 21 no 16 pp 2409ndash2419 2002

[24] G Heinze and M Ploner ldquoFixing the nonconvergence bug inlogistic regression with SPLUS and SASrdquo Computer Methodsand Programs in Biomedicine vol 71 no 2 pp 181ndash187 2003

[25] G King and L Zeng ldquoExplaining rare events in internationalrelationsrdquo International Organization vol 55 no 3pp 693ndash715 2001

[26] M Maalouf ldquoLogistic regression in data analysis an over-viewrdquo International Journal of Data Analysis Techniques andStrategies vol 3 no 3 pp 281ndash299 2011

[27] D E A Sulasih S W Purnami and S P Rahayu BeBeoretical Study of Rare Event Weighted Logistic Regressionfor Classification of Imbalanced Data Universitas Muham-madiyah Surakarta Surakarta Indonesia 2015

[28] G Heinze logistf Firthrsquos Bias Reduced Logistic RegressionR Package Version 123 2018 httpsCRANR-projectorgpackage=logistf

[29] C Choirat J Honaker K Imai G King and O Lau ZeligEveryonersquos Statistical Software Version 5161 2018 httpzeligprojectorg

BioMed Research International 11

[30] M L Ong L Lu S Lee et al ldquoA comparison of the hier-archical generalized linear model multiple-indicators mul-tiple-causes and the item response theory-likelihood ratio testfor detecting differential item functioningrdquo in QuantitativePsychology Research pp 343ndash357 Springer Berlin Germany2015

[31] D Stevanovic Z Bagheri O Atilola et al ldquoCross-culturalmeasurement invariance of the revised Child anxiety anddepression scale across 11 world-wide societiesrdquo Epidemiol-ogy and Psychiatric Sciences vol 26 no 4 pp 430ndash440 2017

[32] G M Cordeiro and P McCullagh ldquoBias correction in gen-eralized linear modelsrdquo Journal of the Royal Statistical SocietySeries B (Methodological) vol 53 no 3 pp 629ndash643 1991

[33] T Maiti and V Pradhan ldquoBias reduction and a solution forseparation of logistic regression with missing covariatesrdquoBiometrics vol 65 no 4 pp 1262ndash1269 2009

[34] B D Zumbo A Handbook on the Beory and Methods ofDifferential Item Functioning (DIF) National DefenseHeadquarters Ottawa Canada 1999

[35] J-S Lai J Teresi and R Gershon ldquoProcedures for the analysisof differential item functioning (DIF) for small sample sizesrdquoEvaluation amp the Health Professions vol 28 no 3 pp 283ndash294 2005

[36] G Tutz and G Schauberger A Penalty Approach to Differ-ential Item Functioning in Rasch Models Springer BerlinGermany 2015

[37] D Magis F Tuerlinckx and P de Boeck ldquoDetection ofdifferential item functioning using the lasso approachrdquoJournal of Educational and Behavioral Statistics vol 40 no 2pp 111ndash135 2015

12 BioMed Research International

Page 11: A Comparative Study of the Bias Correction Methods for ... · 2020. 3. 2. · mum likelihood estimation method (PML) was originally developedbyDavidFirth[16]inordertoreducethesmall

Conflicts of Interest

+e authors declare no potential conflicts of interest withrespect to the research authorship andor publication ofthis article

Acknowledgments

+is work was supported by the Shiraz University of MedicalSciences Research Council Shiraz Iran under grant no96-01-01-14601 +is paper was extracted fromMarjan FaghihrsquosPhD thesis under supervision of Prof Peyman JafariPeyman Jafari andMarjan Faghih equally contributed to thiswork

References

[1] B D Zumbo ldquo+ree generations of DIF analyses consideringwhere it has been where it is now and where it is goingrdquoLanguage Assessment Quarterly vol 4 no 2 pp 223ndash2332007

[2] T A Ackerman ldquo+e theory and practice of item responsetheory by de Ayala R Jrdquo Journal of Educational Measure-ment vol 47 no 4 pp 471ndash476 2010

[3] B E Clauser and K M Mazor ldquoUsing statistical proceduresto identify differentially functioning test itemsrdquo EducationalMeasurement Issues and Practice vol 17 no 1 pp 31ndash442005

[4] H Swaminathan and H J Rogers ldquoDetecting differential itemfunctioning using logistic regression proceduresrdquo Journal ofEducational Measurement vol 27 no 4 pp 361ndash370 1990

[5] H J Rogers and H Swaminathan ldquoA comparison of logisticregression and Mantel-Haenszel procedures for detectingdifferential item functioningrdquo Applied Psychological Mea-surement vol 17 no 2 pp 105ndash116 1993

[6] P Narayanon and H Swaminathan ldquoIdentification of itemsthat show nonuniform DIFrdquo Applied Psychological Mea-surement vol 20 no 3 pp 257ndash274 1996

[7] P Jafari D Stevanovic and Z Bagheri ldquoCross-culturalmeasurement equivalence of the KINDL questionnaire forquality of life assessment in children and adolescentsrdquo ChildPsychiatry amp Human Development vol 47 no 2 pp 291ndash3042016

[8] P K Crane L E Gibbons K Ocepek-Welikson et al ldquoAcomparison of three sets of criteria for determining thepresence of differential item functioning using ordinal logisticregressionrdquo Quality of Life Research vol 16 no 1 pp 69ndash842007

[9] S Lee ldquoDetecting differential item functioning using thelogistic regression procedure in small samplesrdquo AppliedPsychological Measurement vol 41 no 1 pp 30ndash43 2017

[10] E Allahyari P Jafari and Z Bagheri ldquoA simulation study toassess the effect of the number of response categories on thepower of ordinal logistic regression for differential itemfunctioning analysis in rating scalesrdquo Computational andMathematical Methods in Medicine vol 2016 Article ID5080826 8 pages 2016

[11] P O Monahan C A McHorney T E Stump andA J Perkins ldquoOdds ratio delta ETS classification andstandardization measures of dif magnitude for binary logisticregressionrdquo Journal of Educational and Behavioral Statisticsvol 32 no 1 pp 92ndash109 2007

[12] S Apinyapibal N Lawthong and S Kanjanawasee ldquoAcomparative analysis of the efficacy of differential itemfunctioning detection for dichotomously scored items amonglogistic regression SIBTEST and raschtree methodsrdquo Proce-dia-Social and Behavioral Sciences vol 191 pp 21ndash25 2015

[13] H-F Chen and K-Y Jin ldquoApplying logistic regression todetect differential item functioning in multidimensionaldatardquo Frontiers in Psychology vol 9 p 1302 2018

[14] D Svetina and L Rutkowski ldquoDetecting differential itemfunctioning using generalized logistic regression in thecontext of large-scale assessmentsrdquo Large-scale Assessments inEducation vol 2 no 1 2014

[15] R Puhr G Heinze M Nold et al ldquoFirthrsquos logistic regressionwith rare events accurate effect estimates and predictionsrdquoStatistics in Medicine vol 36 no 14 pp 2302ndash2317 2017

[16] D Firth ldquoBias reduction of maximum likelihood estimatesrdquoBiometrika vol 80 no 1 pp 27ndash38 1993

[17] M Pavlou G Ambler S Seaman M De Iorio andR Z Omar ldquoReview and evaluation of penalised regressionmethods for risk prediction in low-dimensional data with feweventsrdquo Statistics in Medicine vol 35 no 7 pp 1159ndash11772016

[18] M Maalouf and T B Trafalis ldquoRobust weighted kernel lo-gistic regression in imbalanced and rare events datardquo Com-putational Statistics amp Data Analysis vol 55 no 1pp 168ndash183 2011

[19] M Maalouf D Homouz and T B Trafalis ldquoLogistic re-gression in large rare events and imbalanced data a perfor-mance comparison of prior correction and weightingmethodsrdquo Computational Intelligence vol 34 no 1pp 161ndash174 2018

[20] G King and L Zeng ldquoLogistic regression in rare events datardquoPolitical Analysis vol 9 no 2 pp 137ndash163 2001

[21] I-C Huang W L Leite P Shearer M Seid D A Revickiand E A Shenkman ldquoDifferential item functioning in qualityof life measure between children with and without specialhealth-care needsrdquo Value in Health vol 14 no 6 pp 872ndash883 2011

[22] H Jeffreys ldquoAn invariant form for the prior probability inestimation problemsrdquo Proceedings of the Royal Society ofLondon Series A Mathematical and Physical Sciencesvol 186 no 1007 pp 453ndash461 1946

[23] G Heinze and M Schemper ldquoA solution to the problem ofseparation in logistic regressionrdquo Statistics in Medicinevol 21 no 16 pp 2409ndash2419 2002

[24] G Heinze and M Ploner ldquoFixing the nonconvergence bug inlogistic regression with SPLUS and SASrdquo Computer Methodsand Programs in Biomedicine vol 71 no 2 pp 181ndash187 2003

[25] G King and L Zeng ldquoExplaining rare events in internationalrelationsrdquo International Organization vol 55 no 3pp 693ndash715 2001

[26] M Maalouf ldquoLogistic regression in data analysis an over-viewrdquo International Journal of Data Analysis Techniques andStrategies vol 3 no 3 pp 281ndash299 2011

[27] D E A Sulasih S W Purnami and S P Rahayu BeBeoretical Study of Rare Event Weighted Logistic Regressionfor Classification of Imbalanced Data Universitas Muham-madiyah Surakarta Surakarta Indonesia 2015

[28] G Heinze logistf Firthrsquos Bias Reduced Logistic RegressionR Package Version 123 2018 httpsCRANR-projectorgpackage=logistf

[29] C Choirat J Honaker K Imai G King and O Lau ZeligEveryonersquos Statistical Software Version 5161 2018 httpzeligprojectorg

BioMed Research International 11

[30] M L Ong L Lu S Lee et al ldquoA comparison of the hier-archical generalized linear model multiple-indicators mul-tiple-causes and the item response theory-likelihood ratio testfor detecting differential item functioningrdquo in QuantitativePsychology Research pp 343ndash357 Springer Berlin Germany2015

[31] D Stevanovic Z Bagheri O Atilola et al ldquoCross-culturalmeasurement invariance of the revised Child anxiety anddepression scale across 11 world-wide societiesrdquo Epidemiol-ogy and Psychiatric Sciences vol 26 no 4 pp 430ndash440 2017

[32] G M Cordeiro and P McCullagh ldquoBias correction in gen-eralized linear modelsrdquo Journal of the Royal Statistical SocietySeries B (Methodological) vol 53 no 3 pp 629ndash643 1991

[33] T Maiti and V Pradhan ldquoBias reduction and a solution forseparation of logistic regression with missing covariatesrdquoBiometrics vol 65 no 4 pp 1262ndash1269 2009

[34] B D Zumbo A Handbook on the Beory and Methods ofDifferential Item Functioning (DIF) National DefenseHeadquarters Ottawa Canada 1999

[35] J-S Lai J Teresi and R Gershon ldquoProcedures for the analysisof differential item functioning (DIF) for small sample sizesrdquoEvaluation amp the Health Professions vol 28 no 3 pp 283ndash294 2005

[36] G Tutz and G Schauberger A Penalty Approach to Differ-ential Item Functioning in Rasch Models Springer BerlinGermany 2015

[37] D Magis F Tuerlinckx and P de Boeck ldquoDetection ofdifferential item functioning using the lasso approachrdquoJournal of Educational and Behavioral Statistics vol 40 no 2pp 111ndash135 2015

12 BioMed Research International

Page 12: A Comparative Study of the Bias Correction Methods for ... · 2020. 3. 2. · mum likelihood estimation method (PML) was originally developedbyDavidFirth[16]inordertoreducethesmall

[30] M L Ong L Lu S Lee et al ldquoA comparison of the hier-archical generalized linear model multiple-indicators mul-tiple-causes and the item response theory-likelihood ratio testfor detecting differential item functioningrdquo in QuantitativePsychology Research pp 343ndash357 Springer Berlin Germany2015

[31] D Stevanovic Z Bagheri O Atilola et al ldquoCross-culturalmeasurement invariance of the revised Child anxiety anddepression scale across 11 world-wide societiesrdquo Epidemiol-ogy and Psychiatric Sciences vol 26 no 4 pp 430ndash440 2017

[32] G M Cordeiro and P McCullagh ldquoBias correction in gen-eralized linear modelsrdquo Journal of the Royal Statistical SocietySeries B (Methodological) vol 53 no 3 pp 629ndash643 1991

[33] T Maiti and V Pradhan ldquoBias reduction and a solution forseparation of logistic regression with missing covariatesrdquoBiometrics vol 65 no 4 pp 1262ndash1269 2009

[34] B D Zumbo A Handbook on the Beory and Methods ofDifferential Item Functioning (DIF) National DefenseHeadquarters Ottawa Canada 1999

[35] J-S Lai J Teresi and R Gershon ldquoProcedures for the analysisof differential item functioning (DIF) for small sample sizesrdquoEvaluation amp the Health Professions vol 28 no 3 pp 283ndash294 2005

[36] G Tutz and G Schauberger A Penalty Approach to Differ-ential Item Functioning in Rasch Models Springer BerlinGermany 2015

[37] D Magis F Tuerlinckx and P de Boeck ldquoDetection ofdifferential item functioning using the lasso approachrdquoJournal of Educational and Behavioral Statistics vol 40 no 2pp 111ndash135 2015

12 BioMed Research International