Multivariate Behavioral Research, 46:733–755, 2011

Copyright © Taylor & Francis Group, LLC

ISSN: 0027-3171 print/1532-7906 online

DOI: 10.1080/00273171.2011.606757

Identification of Differential Item Functioning in Multiple-Group Settings: A Multivariate Outlier Detection Approach

David Magis
University of Liège and K. U. Leuven

Paul De Boeck
University of Amsterdam and K. U. Leuven

We focus on the identification of differential item functioning (DIF) when more

than two groups of examinees are considered. We propose to consider items as

elements of a multivariate space, where DIF items are outlying elements. Following

this approach, the situation of multiple groups is a quite natural case. A robust

statistics technique is proposed to identify DIF items as outliers in the multivariate

space. For low dimensionalities, up to 2–3 groups, a simple graphical tool is

derived. We illustrate our approach with a reanalysis of data from Kim, Cohen,

and Park (1995) on using calculators for a mathematics test.

Differential item functioning (DIF) is an undesirable phenomenon that can affect

the validity of test conclusions. An item is said to function differently (or, in short,

to exhibit DIF) among two (or more) groups of examinees whenever respondents

with identical ability levels, but who are from different groups, have different

probabilities of endorsing the item. In this sense, DIF can be seen as adding extra

Correspondence concerning this article should be addressed to David Magis, Department of

Mathematics (B37), University of Liège, Grande Traverse 12, B-4000 Liège, Belgium. E-mail:

[email protected]

Downloaded by [University of Liege], [David Magis] at 01:24 18 October 2011

dimensionality in the data (Ackerman, 1992). The identification and removal of

DIF items are a necessity for valid conclusions.

Many DIF detection methods have been proposed for the case of two groups

of examinees. Well-known methods are the Mantel-Haenszel method (Holland &

Thayer, 1988), standardization (Dorans & Kulick, 1986), the SIBTEST method

(Shealy & Stout, 1993), the logistic regression procedure (Swaminathan &

Rogers, 1990), Lord’s chi-square test (Lord, 1980), and the likelihood-ratio test

(Thissen, Steinberg, & Wainer, 1988). For a review of these (and other) methods,

see Clauser and Mazor (1998), Holland and Wainer (1993), Millsap and Everson

(1993), Osterlind and Everson (2009), and Penfield and Camilli (2007).

Although the DIF literature is vast, only a few studies have focused on

multiple-group DIF identification. The relative lack of interest has historic and

pragmatic reasons. Historically, the common case is where the test is constructed

primarily on the basis of (and for) a reference group, whereas an individual wants

to apply it also to a minority group, called the focal group (Angoff & Ford, 1973).

Gender is an exception because no minority group is involved, but the situation

is formally equivalent because there are only two gender groups. In line with

this common situation, the DIF methodology is focused on comparing one group

with another, and if more than one minority group is involved, it makes sense

to compare each of them as a focal group with the reference group. However,

this leads to well-known issues of multiple testing (e.g., Miller, 1981). From

a pragmatic point of view, the focus on just two groups makes sense because

many of the traditional methods cannot easily be extended to a simultaneous

comparison of more than two groups of examinees.

Because of an increasing interest in large-scale or international assessment

studies, such as PISA or TIMSS, there is a real need for multiple-group DIF

methods. Several traditional methods have already been extended: the generalized

Mantel-Haenszel method (Penfield, 2001) and the generalized Lord’s test

(Kim, Cohen, & Park, 1995). It is clear that the likelihood-ratio test approach

can also be applied for multiple groups.

The main purpose of this article is to present a robust outlier method that can

be applied to more than two groups of examinees without an extension. The key

idea is to rely on a transformation of traditional DIF statistics in order to identify

outliers in the multigroup space of these transformed DIF statistics, based on

principles from robust statistics. This approach has several advantages. First,

in principle, any DIF statistic from traditional DIF methods can be considered,

although in practice an application of the approach is much easier if the chosen

statistic is normally distributed. We therefore restricted this study to normally

distributed DIF statistics. If the distribution cannot be assumed to be normal, it

would be necessary to simulate the distribution, such as with a bootstrap method.

Second, the procedure of DIF identification is a one-step process. No further

iterative application of the process is needed. The traditional DIF detection

methods do require an iterative purification process to avoid distortions due

to the presence of other DIF items than the one under consideration. Third,

although the traditional methods are sensitive to Type I error inflation in the

case of asymmetric DIF, it was recently pointed out (Magis & De Boeck, 2011)

that, in the usual case of two groups, the outlier detection approach is robust

against this Type I error inflation. Although it was not a subject of investigation

in the present study, the same kind of robustness for the case of multiple groups

could be expected, based on the statistical principles of the approach.

We start with an explanation of the principles of the robust outlier approach

for the case of two groups. Then, we present the more general approach

independent of the number of groups. A graphical tool is described that can be

used up to three groups. Finally, the general approach and the graphical tool are

illustrated with a real-data example about a test of mathematics.

OUTLIER IDENTIFICATION BETWEEN TWO GROUPS

The traditional DIF methods look rather diverse. Some rely on item response

theory (IRT) models, such as Lord’s test or the likelihood-ratio test methods,

whereas others are built on statistical methods for discrete data, for instance the

Mantel-Haenszel approach, logistic regression, or SIBTEST. Several methods

focus primarily on one type of DIF effect (uniform or nonuniform). Uniform

DIF occurs when the relationship between the group membership and the item

response does not depend on the ability level, that is, the DIF effect is uniform

along the ability scale. Nonuniform DIF implies that the DIF effect can be different

depending on the ability level (Clauser & Mazor, 1998). Other techniques

are able to detect both types of DIF. On the other hand, the various methods

have some important formal aspects in common.

Let $J$ be the number of items and let $z_j$ ($j = 1, \ldots, J$) be the DIF statistic

for item $j$. The form of $z_j$ depends on the DIF method. For instance, with

the SIBTEST method, $z_j$ corresponds to the test statistic $B$ (Shealy & Stout,

1993, Equation 18), and for Lord’s test, it is the chi-square statistic $\chi^2_j$ as given

by Kim et al. (1995, Equation 10). The choice of a DIF statistic is crucial

for the type of DIF. For instance, the statistics based on the Mantel-Haenszel

and SIBTEST methods focus primarily on uniform DIF, whereas the logistic

regression procedure can detect uniform and nonuniform DIF separately and in

combination. With IRT methods, it is mainly the selection of an IRT model that

imposes the kind of DIF effect one can focus on.

The distribution of $z_j$ under the hypothesis of no DIF obviously depends on

the statistic. Most often the asymptotic null distribution is the standard normal

distribution, for instance with SIBTEST’s $B$ statistic, the Mantel-Haenszel estimate

of the common odds ratio, and the standardized areas between two item

characteristic curves (Raju, 1988, 1990). Another common null distribution is

the chi-square distribution, as for the Wald and likelihood-ratio statistics, Lord’s

test statistic (from which a normally distributed statistic can also be derived),

and the Mantel-Haenszel chi-square statistic. In this study, we put emphasis on

normally distributed DIF statistics.

Despite some important intrinsic differences between statistical methods, they

are all based on a common logic with four elements: (a) a DIF statistic is defined;

(b) the statistical distribution of this DIF statistic under the null hypothesis of no

DIF is determined; (c) using this null distribution, the empirical value of the DIF

statistic is compared to the quantiles of the null distribution; and (d) because the

DIF statistic of an item is affected by the presence of DIF among the other items,

a purification procedure is set up until convergence to a supposedly DIF-free

test.
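The iterative purification in element (d) can be sketched as a simple loop. This is a minimal illustration only: the function `purify`, its callback `flag_items`, and the toy statistic with its invented numbers are hypothetical and do not correspond to any specific DIF method discussed here.

```python
def purify(n_items, flag_items, max_iter=10):
    """Repeat a DIF test until the set of flagged items stops changing.

    flag_items(anchor) is a hypothetical callback: given the indices of the
    items currently assumed DIF-free, it returns the indices flagged as DIF.
    """
    flagged = set()
    for _ in range(max_iter):
        anchor = [j for j in range(n_items) if j not in flagged]
        new_flagged = set(flag_items(anchor))
        if new_flagged == flagged:  # same items flagged twice in a row
            break
        flagged = new_flagged
    return flagged


# Toy illustration (invented numbers): each item has a shift, and an item is
# flagged when its shift deviates from the anchor-set mean by more than 0.75.
shifts = [0.0, 0.0, 0.0, 0.0, 2.0, 1.0]


def toy_flag(anchor):
    center = sum(shifts[j] for j in anchor) / len(anchor)
    return {j for j in range(len(shifts)) if abs(shifts[j] - center) > 0.75}


first_pass = toy_flag(list(range(len(shifts))))  # single run, no purification
purified = purify(len(shifts), toy_flag)         # iterated until stable
```

In this toy case the single run flags only the item with the largest shift, whereas the purified run also flags the second shifted item. This is the kind of masking that purification is meant to remove and that the one-step robust approach aims to avoid.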

The (robust) outlier approach is meant to induce the null distribution in a one-

step purification, or in other words, directly from the data without any iteration,

and this null distribution refers to a distribution over items. Suppose that in the

absence of DIF, the DIF statistic is distributed across items in a certain way that

corresponds to noise. The presence of DIF can be seen as a process that adds

a signal to the noise, so that the DIF statistic of affected items deviates from

this distribution. Assuming that the DIF items are a minority, it is possible to

identify those DIF items with a robust outlier approach, without the need of an

iterative purification process, and with a robust estimate of the null distribution

as a result. The crucial assumption is that the DIF items are a minority. However,

if they were a majority, the concept of DIF would no longer apply. The

test is then simply measuring a different construct altogether. Although majority

DIF can still be called DIF, we focus here on minority DIF.

The outlier approach can be sketched in the following way. First, some DIF statistic $z_j$ must be computed for each item, for example, the Mantel-Haenszel estimate of the common odds ratio, or the Raju distance (Raju, 1988). Second, a transformation is applied to this statistic, so that a statistic $\delta_j$ is obtained which is asymptotically normally distributed across items under the null hypothesis of no DIF. In the application further on, we in fact use the Raju distance and its standardization (Raju, 1988). The statistic $\delta_j$ is derived in the following way:

$$\delta_j = \frac{z_j - \bar{z}}{s_z} \tag{1}$$

where $\bar{z}$ and $s_z$ are the sample mean and the sample standard deviation of the $z_j$ measures ($j = 1, \ldots, J$), respectively. The statistic $\delta_j$ actually corresponds to the sample Mahalanobis distance in a one-dimensional space (Mahalanobis, 1936). Hence, using the properties of the Mahalanobis distance and assuming that the $z_j$ measures come from a normal distribution, the statistic $\delta_j^2$ is asymptotically

chi-square distributed with one degree of freedom. However, Mardia, Kent, and

Bibby (1979) showed that under the same conditions, the modified statistic

$$\delta_j^* = \frac{J}{(J-1)^2}\,\delta_j^2 = \frac{J}{(J-1)^2}\,\frac{(z_j - \bar{z})^2}{s_z^2} \tag{2}$$

follows an exact beta distribution with parameters $0.5$ and $(J-2)/2$ (see also Hardin & Rocke, 2005). With shorter tests, this exact distribution is much more accurate than the asymptotic chi-square distribution, and both are nearly identical when the number of items is very large.
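The computation in Equation (2) and its comparison with a beta quantile can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: all function names are hypothetical, the quantile is taken to be the upper $(1-\alpha)$ quantile, at least three items are assumed, and the beta distribution function is obtained by numerical integration and bisection only to keep the sketch free of external libraries.

```python
import math
from statistics import mean, stdev


def beta_cdf(x, a, b, n=2000):
    """CDF of Beta(a, b), valid for a >= 0.5 and b >= 0.5.

    Uses the substitution x = sin(t)**2, which removes the endpoint
    singularities of the density, followed by Simpson's rule (n must be even).
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    upper = math.asin(math.sqrt(x))
    h = upper / n

    def f(t):
        return math.sin(t) ** (2 * a - 1) * math.cos(t) ** (2 * b - 1)

    total = f(0.0) + f(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return 2.0 * (h / 3.0) * total / math.exp(log_beta)


def beta_quantile(p, a, b):
    """Quantile of Beta(a, b) by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def delta_star(z):
    """Modified statistic of Equation (2) for each item."""
    J = len(z)
    z_bar, s_z = mean(z), stdev(z)
    return [J / (J - 1) ** 2 * ((zj - z_bar) / s_z) ** 2 for zj in z]


def flag_outliers(z, alpha=0.05):
    """Flag items whose statistic exceeds the Beta(0.5, (J - 2) / 2) quantile."""
    J = len(z)
    threshold = beta_quantile(1 - alpha, 0.5, (J - 2) / 2)
    return [d > threshold for d in delta_star(z)]


# Invented DIF statistics for ten items; the last one is far from the rest.
z_example = [0.1, -0.3, 0.2, 0.0, -0.1, 0.3, -0.2, 0.1, 0.0, 5.0]
flags = flag_outliers(z_example)
```

With the sample standard deviation in the denominator, the statistic always lies between 0 and 1, which is consistent with a beta reference distribution.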

The exact beta distribution of the $\delta_j^*$ statistic holds on the condition that the $z_j$ statistics are normally distributed. This motivates the restriction to normally distributed DIF statistics. However, the $\delta_j^*$ statistic may still be used as an indicator of an outlying (DIF) item, even when nonnormally distributed $z_j$ measures are considered. This would, however, come at the cost of losing the exact beta distribution, and the search for an accurate DIF classification threshold would be more complicated, for example, because the distribution would need to be simulated, such as with a bootstrap method.

Based on a comparison of the $\delta_j^*$ statistic from Equation (2) with the appropriate quantile of the beta distribution, outlying items can be identified. However, $\delta_j$ and $\delta_j^*$ depend on the sample mean and the sample standard deviation of the DIF measures $z_j$. If one or several measures $z_j$ are outlying, this seriously affects the computation of $\bar{z}$ and $s_z$, and hence the values of $\delta_j$ and $\delta_j^*$. This issue is similar to the issue encountered by the traditional methods, for which item purification was recommended as a way to remove the impact of DIF items on the DIF test results for other items. Here, we propose to replace the regular sample estimates, which are nonrobust, by robust alternatives. The term robust has to be understood in its statistical sense: a robust estimator is much less sensitive to aberrant or outlying measures and returns more reliable estimates than the classical, nonrobust methods. To better emphasize the fact that sample estimates are not robust in the presence of outliers, they are further referred to as the nonrobust estimates.

In the one-dimensional framework of DIF (i.e., when only two groups are studied), straightforward robust estimators of location and scale are the median $\tilde{z}$ and the median absolute deviation (or MAD) $\mathrm{mad}_z$. The MAD corresponds to the median of the absolute differences between the $z_j$s and their median value $\tilde{z}$ (Rousseeuw & Croux, 1993). Replacing the nonrobust estimates by their robust versions, it is possible to obtain the robust outlier identification rule by computing

$$\tilde{\delta}_j^* = \frac{J}{(J-1)^2}\,\tilde{\delta}_j^2 = \frac{J}{(J-1)^2}\,\frac{(z_j - \tilde{z})^2}{\mathrm{mad}_z^2} \tag{3}$$

and comparing $\tilde{\delta}_j^*$ to the same quantile of the beta distribution.
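A minimal sketch of the robust rule in Equation (3) is given below. The function names are hypothetical, and the threshold `q_alpha` is supplied by the caller (for example, a beta quantile computed elsewhere). The optional `scale` argument is an assumption added here: the text defines the MAD without a rescaling constant, which corresponds to `scale=1.0`, whereas some software multiplies the MAD by about 1.4826 to make it comparable to a standard deviation under normality.

```python
from statistics import median


def mad(values, scale=1.0):
    """Median absolute deviation about the median, optionally rescaled."""
    values = list(values)
    center = median(values)
    return scale * median(abs(v - center) for v in values)


def robust_delta_star(z, scale=1.0):
    """Robust statistic of Equation (3) for each item."""
    J = len(z)
    z_med = median(z)
    mad_z = mad(z, scale)
    if mad_z == 0:
        raise ValueError("MAD is zero; the robust statistic is undefined")
    return [J / (J - 1) ** 2 * ((zj - z_med) / mad_z) ** 2 for zj in z]


def flag_robust(z, q_alpha, scale=1.0):
    """Flag items whose robust statistic exceeds the supplied threshold."""
    return [d > q_alpha for d in robust_delta_star(z, scale)]


# Invented DIF statistics for ten items; the last one is far from the rest.
z_example = [0.1, -0.3, 0.2, 0.0, -0.1, 0.3, -0.2, 0.1, 0.0, 5.0]
robust_stats = robust_delta_star(z_example)
```

Unlike the nonrobust statistic, the robust version is not bounded by 1, because the median and the MAD are barely moved by the outlying value. The choice of `scale` changes every statistic by the same factor and can therefore change which borderline items exceed a fixed threshold.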

It is important to notice that by replacing the nonrobust estimates with their

robust counterparts, there is no guarantee that the modified statistic in Equation

(3) follows a beta distribution. However, we keep comparing $\tilde{\delta}_j^*$ to the beta

quantile for a reason that is central to the concept of robust estimators and also

commonly accepted (e.g., Mardia et al., 1979; Maronna, Martin, & Yohai, 2006;

Rousseeuw & Leroy, 1987). The concept of robust estimators is that they capture

the location and the scale of the good part of the data set, that is, the majority

of data points that are unaffected by outliers and therefore may be assumed

to follow a given distribution. In the absence of outliers, robust and nonrobust

estimators usually return very close estimates. When outliers are present, they

distort the distributional assumption and affect the nonrobust estimates, whereas

robust methods get rid of this effect. In sum, the departure from the beta distribution

(caused by replacing nonrobust estimates with robust alternatives) should not greatly

affect the validity of the beta threshold selection.

OUTLIER IDENTIFICATION FOR MORE THAN TWO GROUPS

Consider one reference group and $F$ focal groups with $F \geq 2$. Each item $j$ can be characterized by a vector of DIF measures, say $\mathbf{z}_j = (z_{j1}, \ldots, z_{jF})$, where $z_{jf}$ ($f = 1, \ldots, F$) denotes the measure of DIF effect between the reference group and focal group $f$. It is possible to assume that $z_{jf}$ is a traditional measure of DIF between two groups, as listed in the previous section, and is computed repeatedly between each focal group and the reference group. In the simple case of two groups, $\mathbf{z}_j = z_{j1}$ and is thus a scalar, so that only one dimension must be considered. The DIF identification therefore reduces to the identification of outliers in one set of statistics. For multiple groups, DIF is identified by detecting outlying vectors $\mathbf{z}_j$ among the set of test items. Outlyingness can occur for one or for several components of $\mathbf{z}_j$, which makes the problem multidimensional. DIF can be present between the reference group and one or several focal group(s).

The approach can easily be used without a reference group but instead with an

undifferentiated set of groups as in an international comparison study. When

there is no natural reference group and when the purpose is to identify DIF

across groups in a global way, then any group can function as a reference group.

For the identification of overall DIF in the sense of DIF somewhere between

the groups (without specifying where), the choice of a reference group has no

consequences. The situation is similar to a situation in which multiple categories

are compared through a global test. The choice of a baseline category does not

affect the result of the global test.

A fundamental assumption for further developments is that $\mathbf{z}_j$ arises from a

multivariate normal distribution. This obviously restricts the set of allowable DIF

measures if the desire is to rely on the identification rule to be formulated. Let $\bar{\mathbf{z}}$ be the sample mean vector (of length $F$) and $S$ the sample covariance matrix of the DIF vectors $\mathbf{z}_j$. Then, the sample Mahalanobis distance $\omega_j$ for item $j$ is

$$\omega_j = \sqrt{(\mathbf{z}_j - \bar{\mathbf{z}})' S^{-1} (\mathbf{z}_j - \bar{\mathbf{z}})} \tag{4}$$

where the prime stands for transposed vectors and matrices, and its square $\omega_j^2$ is asymptotically chi-square distributed with $F$ degrees of freedom (Mahalanobis, 1936). Also, the modified statistic

$$\phi_j^* = \frac{J}{(J-1)^2}\,\omega_j^2 = \frac{J}{(J-1)^2}\,(\mathbf{z}_j - \bar{\mathbf{z}})' S^{-1} (\mathbf{z}_j - \bar{\mathbf{z}}) \tag{5}$$

has an exact beta distribution with parameters $F/2$ and $(J-F-1)/2$ (Hardin & Rocke, 2005; Mardia et al., 1979). Note that Equation (4) reduces to Equation (1), and Equation (5) reduces to Equation (2), when $F$ is equal to 1 (i.e., in the usual case of a single focal group). Hence, the detection of outlying vectors of DIF measures $\mathbf{z}_j$ can be achieved by computing the modified statistics $\phi_j^*$ in Equation (5) and comparing them to the corresponding quantile of the beta distribution, similarly as in the previous section: item $j$ is flagged as DIF if and only if

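$$\phi_j^* = \frac{J}{(J-1)^2}\,(\mathbf{z}_j - \bar{\mathbf{z}})' S^{-1} (\mathbf{z}_j - \bar{\mathbf{z}}) > Q_\alpha \tag{6}$$

with $Q_\alpha$ being the quantile of the beta distribution related to significance level $\alpha$.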
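For two focal groups ($F = 2$), the statistic in Equation (5) can be computed with an explicit inverse of the 2 x 2 covariance matrix. The sketch below is illustrative only: the function names and the example values are hypothetical, and the optional `center` and `cov` arguments are an addition made here so that other location and covariance estimates can be supplied in place of the sample mean and sample covariance.

```python
def mean_and_cov(points):
    """Sample mean and covariance (s11, s12, s22) of a list of (z1, z2) pairs."""
    J = len(points)
    m1 = sum(p[0] for p in points) / J
    m2 = sum(p[1] for p in points) / J
    s11 = sum((p[0] - m1) ** 2 for p in points) / (J - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (J - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (J - 1)
    return (m1, m2), (s11, s12, s22)


def squared_mahalanobis(point, center, cov):
    """Squared Mahalanobis distance using the closed-form 2 x 2 inverse."""
    s11, s12, s22 = cov
    d1, d2 = point[0] - center[0], point[1] - center[1]
    det = s11 * s22 - s12 ** 2
    return (s22 * d1 ** 2 - 2 * s12 * d1 * d2 + s11 * d2 ** 2) / det


def phi_star(points, center=None, cov=None):
    """Statistic of Equation (5) for each item, for two focal groups."""
    J = len(points)
    if center is None or cov is None:
        center, cov = mean_and_cov(points)
    return [J / (J - 1) ** 2 * squared_mahalanobis(p, center, cov)
            for p in points]


# Invented pairs of DIF statistics for eight items; the last one is outlying.
pairs = [(0.1, 0.2), (-0.3, 0.1), (0.2, -0.2), (0.0, 0.3),
         (-0.1, -0.1), (0.3, 0.0), (-0.2, -0.3), (4.0, 3.5)]
stats = phi_star(pairs)
```

A quick consistency check is that, with the sample mean and covariance, the squared distances sum to $(J-1)F$, so the statistics sum to $JF/(J-1)$.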

As in the univariate case, DIF between the reference group and any focal group $f$ affects the corresponding $z_{jf}$ measure. Typically, if item $j$ exhibits DIF, then $z_{jf}$ will be much larger or much smaller than most other $z_{kf}$ values ($k \neq j$), which has an inflation effect on the sample variance and covariance in the $S$ matrix. This often leads to reduced $\phi_j^*$ measures and an underdetection of DIF items. Another problem is that the mean $\bar{\mathbf{z}}$ may also be distorted by the presence of DIF items, which can make DIF-free items look as if they were DIF items. To overcome these problems, and in line with the univariate outlier approach, we propose to replace the nonrobust estimates $\bar{\mathbf{z}}$ and $S$ by robust multivariate alternatives. Let $\tilde{\mathbf{z}}$ and $R$ be such robust estimates of the mean vector and the covariance matrix, respectively. The robust identification of outlying vectors then consists in computing the statistics

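$$\tilde{\phi}_j^* = \frac{J}{(J-1)^2}\,\tilde{\omega}_j^2 = \frac{J}{(J-1)^2}\,(\mathbf{z}_j - \tilde{\mathbf{z}})' R^{-1} (\mathbf{z}_j - \tilde{\mathbf{z}}) \tag{7}$$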

and comparing the results to the same quantile of the beta distribution: item j

is flagged as DIF if and only if

$$\tilde{\phi}_j^* = \frac{J}{(J-1)^2}\,(\mathbf{z}_j - \tilde{\mathbf{z}})' R^{-1} (\mathbf{z}_j - \tilde{\mathbf{z}}) > Q_\alpha. \tag{8}$$

In the univariate framework, simple robust alternatives can be considered,

as highlighted previously. However, turning to multivariate statistics leads to

technical complexity and the lack of straightforward robust estimators of the mean vector $\boldsymbol{\mu}$ and

the covariance matrix $\Sigma$. Several robust estimators are available, from four broad categories:

1. The projection methods, which are weighted estimators of the mean and the covariance matrix. The weights are the inverse of an outlying measure derived for a one-dimensional projection of the points. If a point is a multivariate outlier, then a one-dimensional projection of the data can be made in which the degree of outlying is maximized for the point in question. The Donoho-Stahel estimator (Maronna & Yohai, 1995) belongs to this category.

2. Methods based on weighted maximum likelihood estimation, with the weights defined such that outlying observations have lower or zero weight. The S and M estimators (Rocke, 1996; Woodruff & Rocke, 1994) are of that kind.

3. Methods based on optimal subset identification (omitting outlying observations), among which are the Minimum Covariance Determinant (MCD) and the Minimum Volume Ellipsoid estimators (Rousseeuw & Leroy, 1987; Rousseeuw & van Driessen, 1999).

4. The pairwise methods, for which multivariate robust estimators are built from univariate robust estimates and then regularized to guarantee positive definite covariance matrices. The orthogonalized quadrant correlation pairwise estimator (Maronna & Zamar, 2002), and the Orthogonalized Gnanadesikan-Kettenring (OGK) pairwise estimator (Gnanadesikan & Kettenring, 1972; Maronna & Zamar, 2002) belong to this category.

Although relying on different conceptual approaches, these four methods can

provide joint robust estimates of the mean vector and the covariance matrix.

Moreover, it is commonly observed that in the absence of outliers the robust

estimates are close approximations of the regular nonrobust estimates, although

they differ substantially from the nonrobust estimates when outlying vectors are

present. Also, despite some conceptual differences between the robust methods,

they usually return similar estimates. A main drawback is that most robust

methods rely on a kind of tuning parameter that must be fixed in advance.

Additional information can be found in Huber and Ronchetti (2009), Maronna

et al. (2006), and Todorov and Filzmoser (2009).
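To give a concrete feel for the pairwise idea in the fourth category, the sketch below builds a robust 2 x 2 covariance estimate from univariate MAD-based scales, using the identity $\mathrm{cov}(X, Y) = [\mathrm{var}(X + Y) - \mathrm{var}(X - Y)]/4$ with each variance replaced by a squared robust scale. It is a deliberately simplified illustration and not the OGK estimator itself: the orthogonalization step is omitted, so the resulting matrix is not guaranteed to be positive definite. The function names are hypothetical, and the constant 1.4826 (the usual factor that makes the MAD comparable to a standard deviation under normality) is an assumption of this sketch.

```python
from statistics import median


def mad_scale(values, consistency=1.4826):
    """MAD rescaled to be comparable to a standard deviation under normality."""
    values = list(values)
    center = median(values)
    return consistency * median(abs(v - center) for v in values)


def gk_pairwise_estimates(points):
    """Robust center and (r11, r12, r22) for a list of (z1, z2) pairs."""
    x = [p[0] for p in points]
    y = [p[1] for p in points]
    center = (median(x), median(y))
    r11 = mad_scale(x) ** 2
    r22 = mad_scale(y) ** 2
    # Robust analogue of cov(X, Y) = [var(X + Y) - var(X - Y)] / 4.
    plus = mad_scale(a + b for a, b in zip(x, y))
    minus = mad_scale(a - b for a, b in zip(x, y))
    r12 = (plus ** 2 - minus ** 2) / 4.0
    return center, (r11, r12, r22)


# Invented, perfectly correlated pairs with one extreme point.
pts = [(v, v) for v in [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]]
center, (r11, r12, r22) = gk_pairwise_estimates(pts)
```

In this example the extreme pair hardly moves the estimates, whereas the ordinary sample variance of each coordinate would be dominated by it.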

The choice among the available methods is often driven by preferences

and the availability of computer software. However, most robust estimators are

available from the package rrcov (Todorov & Filzmoser, 2009) of the software

R (R Development Core Team, 2010). There are no limitations to the number

of focal groups that can be taken into account. But, the larger this number, the

larger the dimensionality of the mean vector and the covariance matrix to be

estimated, and the more intensive the computational task. In this article, all four

types of robust estimators are considered for the applications and their output is

compared.

GRAPHICAL INTERPRETATION

The use of the Mahalanobis distance in the statistics of Equations (4), (5), and (7) has a useful graphical interpretation. Let $\mathrm{MD}(\mathbf{x}; \boldsymbol{\mu}, \Sigma)$ be the Mahalanobis distance of any $n$-dimensional vector $\mathbf{x}$, for a given mean vector $\boldsymbol{\mu}$ and a covariance matrix $\Sigma$. Then, for any positive constant $Q$, the relationship $\mathrm{MD}(\mathbf{x}; \boldsymbol{\mu}, \Sigma) = Q$ defines an $n$-dimensional ellipsoid whose center is given by $\boldsymbol{\mu}$, whose main axes are given by the eigenvectors of $\Sigma$, and whose axis lengths are proportional to the square roots of the eigenvalues of $\Sigma$ (e.g., Johnson & Wichern, 1998). Furthermore, any $n$-dimensional vector $\mathbf{y}$ satisfying $\mathrm{MD}(\mathbf{y}; \boldsymbol{\mu}, \Sigma) \leq Q$ is located within (or on the boundary of) this ellipsoid. Similarly, the inequality $\mathrm{MD}(\mathbf{y}; \boldsymbol{\mu}, \Sigma) > Q$ implies that $\mathbf{y}$ is located outside of the ellipsoid.
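In two dimensions the geometry of this ellipse can be written in closed form. The sketch below assumes the standard result that the contour $\mathrm{MD} = Q$ has half-axis lengths equal to $Q$ times the square roots of the eigenvalues of the covariance matrix; the function name and the example matrices are hypothetical.

```python
import math


def ellipse_axes(cov, q):
    """Half-axis lengths and major-axis angle of the contour MD = q.

    cov is (s11, s12, s22) for a positive definite 2 x 2 covariance matrix.
    Returns (major half-length, minor half-length, angle in radians).
    """
    s11, s12, s22 = cov
    half_trace = 0.5 * (s11 + s22)
    root = math.sqrt((0.5 * (s11 - s22)) ** 2 + s12 ** 2)
    lam_major, lam_minor = half_trace + root, half_trace - root  # eigenvalues
    angle = 0.5 * math.atan2(2.0 * s12, s11 - s22)
    return q * math.sqrt(lam_major), q * math.sqrt(lam_minor), angle


# Uncorrelated case: the axes are aligned with the coordinate axes.
a1, b1, t1 = ellipse_axes((4.0, 0.0, 1.0), 2.0)
# Equal variances with positive covariance: the major axis lies at 45 degrees.
a2, b2, t2 = ellipse_axes((2.0, 1.0, 2.0), 1.0)
```

When the rule is stated on the squared distance, as in the classification rules of this article, the constant on the right-hand side plays the role of $Q^2$, so its square root should be passed as `q`.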

This graphical interpretation is most useful in the present DIF framework. Consider for instance the nonrobust estimates of the mean vector and the covariance matrix. Recall that an item is flagged as DIF if and only if $\phi_j^* > Q_\alpha$ according to Equation (6). Making use of the definition in Equation (4), the classification rule can be rewritten as

$$(\mathbf{z}_j - \bar{\mathbf{z}})' S^{-1} (\mathbf{z}_j - \bar{\mathbf{z}}) > \frac{(J-1)^2}{J}\,Q_\alpha = Q_\alpha^*. \tag{9}$$

Outlying vectors $\mathbf{z}_j$ are therefore lying outside of the $F$-dimensional ellipsoid defined by the mean vector $\bar{\mathbf{z}}$, the covariance matrix $S$, and the constant $Q_\alpha^*$. The robust classification rule is similar to Equation (9), with nonrobust estimates being replaced by their robust alternatives.

The dimensionality of the ellipsoid equals the number of focal groups. If

there is only one focal group, the multivariate outlier identification reduces to

the classical univariate method and the ellipsoid collapses into a one-dimensional

interval that can easily be delineated on the basis of the statistic as in Equation

(3). In the case of two focal groups, the ellipsoid is a two-dimensional ellipse

whose contour can be described with simple equations. For the case of two focal

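groups and nonrobust estimates, an item is not flagged by the classification rule in Equation (9) when

$$\begin{pmatrix} z_{j1} - \bar{z}_1 & z_{j2} - \bar{z}_2 \end{pmatrix} \begin{pmatrix} s_{11} & s_{12} \\ s_{12} & s_{22} \end{pmatrix}^{-1} \begin{pmatrix} z_{j1} - \bar{z}_1 \\ z_{j2} - \bar{z}_2 \end{pmatrix} \leq Q_\alpha^* \tag{10}$$

which is equivalent to the combination of Equations (11) and (12):

$$z_{j1} \in I_1 = \left[ \bar{z}_1 \pm \sqrt{s_{11}\,Q_\alpha^*} \right] \tag{11}$$

$$z_{j2} \in I_2(z_{j1}) = \left[ \bar{z}_2 + \frac{s_{12}(z_{j1} - \bar{z}_1)}{s_{11}} \pm \frac{1}{s_{11}} \sqrt{(s_{11}s_{22} - s_{12}^2)\{s_{11}Q_\alpha^* - (z_{j1} - \bar{z}_1)^2\}} \right]. \tag{12}$$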

The derivation is provided in the Appendix. In other words, the ellipsoid is fully characterized by the intervals $I_1$ and $I_2(z_{j1})$ for any $z_{j1} \in I_1$. Equations (11) and (12) make it possible to display the two-dimensional ellipse graphically and to locate each item by its pair of DIF measures $(z_{j1}, z_{j2})$. Using robust estimates, the formulas are

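$$z_{j1} \in I_1 = \left[ \tilde{z}_1 \pm \sqrt{r_{11}\,Q_\alpha^*} \right] \tag{13}$$

$$z_{j2} \in I_2(z_{j1}) = \left[ \tilde{z}_2 + \frac{r_{12}(z_{j1} - \tilde{z}_1)}{r_{11}} \pm \frac{1}{r_{11}} \sqrt{(r_{11}r_{22} - r_{12}^2)\{r_{11}Q_\alpha^* - (z_{j1} - \tilde{z}_1)^2\}} \right]. \tag{14}$$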

Because the variance $r_{11}$ and the determinant of $R$ are typically smaller than the nonrobust variance $s_{11}$ and the determinant of $S$, respectively, the robust ellipse is usually smaller in size than the nonrobust ellipse, yielding increased outlier detection rates.
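The two intervals that describe the ellipse can be computed directly, for either the nonrobust or the robust estimates. The sketch below is illustrative only and uses hypothetical function names. It takes the first diagonal entry of the covariance matrix to be a variance, so that the half-width of the first interval is the square root of that variance times the constant; this follows from requiring the term in braces under the square root of the second interval to be nonnegative.

```python
import math


def squared_distance(point, center, cov):
    """Quadratic form (z - c)' C^{-1} (z - c) for a 2 x 2 matrix (c11, c12, c22)."""
    c11, c12, c22 = cov
    d1, d2 = point[0] - center[0], point[1] - center[1]
    det = c11 * c22 - c12 ** 2
    return (c22 * d1 ** 2 - 2 * c12 * d1 * d2 + c11 * d2 ** 2) / det


def interval_1(center, cov, q_star):
    """Range of the first DIF measure covered by the ellipse."""
    half = math.sqrt(cov[0] * q_star)
    return center[0] - half, center[0] + half


def interval_2(z1, center, cov, q_star):
    """Range of the second DIF measure inside the ellipse, given the first."""
    c11, c12, c22 = cov
    d1 = z1 - center[0]
    slack = c11 * q_star - d1 ** 2
    if slack < -1e-12 * c11 * q_star:
        raise ValueError("z1 lies outside the first interval")
    half = math.sqrt((c11 * c22 - c12 ** 2) * max(slack, 0.0)) / c11
    mid = center[1] + c12 * d1 / c11
    return mid - half, mid + half


def is_flagged(point, center, cov, q_star):
    """An item is flagged when it lies outside the ellipse."""
    return squared_distance(point, center, cov) > q_star


# Hypothetical estimates and constant, chosen only for illustration.
center, cov, q_star = (0.0, 0.0), (2.0, 1.0, 2.0), 1.0
lo1, hi1 = interval_1(center, cov, q_star)
lo2, hi2 = interval_2(0.7, center, cov, q_star)
```

Tracing `interval_2` over a grid of values in the first interval yields the lower and upper halves of the ellipse contour, onto which each item can then be plotted by its pair of DIF measures.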

Although the robust outlier approach as such (as defined in Equations [7]

and [8]) has no limitations with respect to the number of groups, it cannot

easily be graphically interpreted when there are more than three focal groups.

The derivation of the ellipsoid becomes rapidly intractable beyond three focal

groups. For more than three groups, the identification of DIF items through

the condition in Equation (8) can still be performed, but without graphical support.

ILLUSTRATIVE EXAMPLE

Data and Original Analysis

The data were from a study by Kim et al. (1995) and consisted of three groups

of 200 students enrolled in calculus and precalculus mathematics courses at a

large Midwestern university in the Fall of 1990. The reference group was not

allowed to make use of scientific calculators during the test, whereas the two

focal groups were allowed to use a calculator, a different kind depending on the

group. Fourteen items were administered to the three groups of students. We

further refer to these data as the Math data set.

Kim et al. (1995) illustrated the generalized Lord’s test method with this

data set. The item parameters were estimated under the two-parameter logistic

(2PL) model, using the marginal maximum likelihood method, and a calibration

to the reference group scale was performed. The generalized Lord’s test was

implemented through iterative linking to reduce the impact of DIF items on the

linking procedure. This iterative linking is equivalent to item purification. To refine the results of the generalized Lord’s test and to determine between which groups DIF occurs, we performed pairwise comparisons based on Lord’s test.

For these pairwise comparisons, the usual Bonferroni correction was adopted.

For all statistical tests an alpha level of .05 was used.

The following results were obtained. First, only item 14 was flagged as DIF after a first run of the generalized Lord’s test. The purification process required three iterations, after which items 10 and 14 were flagged as functioning differently among the three groups. Apparently, item 14 had a masking effect on item 10, which was detected only thanks to the purification. The pairwise group comparisons showed that, for the first focal group in comparison with the reference group, none of the items were flagged as DIF. When comparing the second focal group to the reference group, only item 14 was flagged as DIF, but the DIF of item 10 was almost significant. We conclude that (a) item 14 exhibits DIF, mainly in the second focal group; (b) item 10 exhibits DIF only when the three groups are compared, but strictly speaking not in either of the two focal groups separately; and (c) the other items are not flagged as DIF. Before applying the outlier detection method to these data, we inspect the scatter plot of DIF statistics and their bivariate normality.

Scatter Plot

A graphical representation of the DIF analysis can be obtained by using the

item parameter estimates rescaled to the reference group metric (see Kim et al.,

1995, Tables 3 and 5), and by computing the standardized areas between the

characteristic curves of the reference group and each focal group. Let $a_{j0}$ and

$a_{jf}$ ($f = 1, 2$) be the estimated discrimination parameters of item $j$ in the reference group and in focal group $f$, and let $b_{j0}$ and $b_{jf}$ be the corresponding estimated difficulty parameters. Then, the standardized area $z_{jf}$ between the reference group and focal group $f$ takes the following form (Raju, 1988):

$$z_{jf} = \frac{H_{jf}}{\sigma(H_{jf})} \qquad (15)$$

where

$$H_{jf} = \frac{2(a_{jf} - a_{j0})}{a_{jf}\,a_{j0}} \log\left[1 + \exp\left(\frac{a_{jf}\,a_{j0}\,(b_{jf} - b_{j0})}{a_{jf} - a_{j0}}\right)\right] - (b_{jf} - b_{j0}) \qquad (16)$$

and $\sigma(H_{jf})$ can be found in Raju (1990, Equations 34–39). Under the null hypothesis that item $j$ is not a DIF item, the standardized area $z_{jf}$ has an asymptotic standard normal distribution (Lord, 1980; Raju, 1990). This motivates the choice of the standardized area as a measure of DIF, which is important for the outlier detection method.
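As an illustration of Equation (16), the following minimal Python sketch evaluates $H_{jf}$ from 2PL parameter estimates. The function names and the parameter values in the usage lines are hypothetical, and the standard error $\sigma(H_{jf})$ of Raju (1990) is not implemented, so the sketch stops short of the standardized area $z_{jf}$ itself. Raising an error for equal discriminations is an assumption of this sketch, because Equation (16) is undefined in that case.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable evaluation of log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def raju_h(a_ref: float, b_ref: float, a_foc: float, b_foc: float) -> float:
    """Evaluate H_jf of Equation (16) for one item and one focal group.

    a_ref, b_ref: discrimination and difficulty in the reference group.
    a_foc, b_foc: discrimination and difficulty in the focal group,
    already rescaled to the reference group metric.
    """
    da = a_foc - a_ref
    db = b_foc - b_ref
    if da == 0.0:
        # Assumption of this sketch: Equation (16) divides by (a_jf - a_j0),
        # so equal discriminations are rejected rather than handled.
        raise ValueError("Equation (16) requires a_foc != a_ref")
    prod = a_foc * a_ref
    return 2.0 * da / prod * softplus(prod * db / da) - db


# Hypothetical parameter values, for illustration only.
h_equal_difficulty = raju_h(a_ref=1.0, b_ref=0.0, a_foc=2.0, b_foc=0.0)
h_shifted = raju_h(a_ref=1.0, b_ref=0.0, a_foc=1.2, b_foc=0.5)
print(h_equal_difficulty, h_shifted)
```

With equal difficulties the last term vanishes and the exponent is zero, so the first call reduces to $\log 2$ for these particular values; this gives a quick sanity check of the implementation.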

Figure 1 displays the scatter plot of pairs of $z_{jf}$ statistics for each item, represented by their number in the order of administration. The figure is a plain depiction of the DIF statistics without the application of an outlier method. Clearly, items 10 and 14 depart from the rest of the items, mostly because of larger $z_{jf}$ measures between the second focal group and the reference group, but item 14 also shows a somewhat deviating $z_{jf}$ measure for the first focal group in comparison with the reference group.

These findings are in agreement with the conclusions from Kim et al. (1995).

However, recall that their conclusions were drawn only after item purification,

whereas no purification or iterative procedure was used for Figure 1, and that

our conclusion thus far is drawn only on the basis of a visual inspection without

(yet) applying a distribution-based statistical inference procedure.

Exploratory Testing of Bivariate Normality

The assumption of asymptotic multivariate normality for the vectors $\mathbf{z}_j$ of DIF measures is central for identifying multivariate outliers. It is therefore of interest to inspect the bivariate normality of the vectors $(z_{j1}, z_{j2})$ displayed in Figure 1. We can expect bivariate normality only when there are no DIF items, although the presence of just a few DIF items may not distort the bivariate distribution much. Given that there are only 14 items, a test of bivariate normality cannot be very powerful.

First, under the assumption of bivariate normality, the scatter plot of the

vectors of DIF measures should have an approximate elliptical shape. Apart

from items 10 and 14, the scatter plot looks like an ellipse. However, the

FIGURE 1 Scatter plot of measures of standardized areas between the item response

functions of each of the two focal groups and the reference group (math data set). The plot

shows focal group 1 measures versus focal group 2 measures (in both cases in comparison

with the reference group). Items are displayed by their rank number in the data set.

exclusion of items 10 and 14 is not based on an objective method. A more formal approach to investigate the bivariate normality is the following. As stated previously and under the hypothesis of bivariate normality, the statistic $\phi_j^*$ given in Equation (5) has an exact beta distribution with parameters 1 and 5.5, given that $F = 2$ ($F/2 = 1$, and $[14 - 2 - 1]/2 = 5.5$). This exact beta distribution can be tested with a simple Kolmogorov-Smirnov one-sample test (Conover, 1971) using the empirical values of $\phi_j^*$ computed on the total set of items. Rejecting the null hypothesis of a match between the sample distribution and the theoretical beta distribution would therefore imply that the bivariate normal distribution may not be adequate (but note that the presence of DIF items may invalidate the test).

This Kolmogorov-Smirnov test was performed with the ks.test function of the R software (R Development Core Team, 2010). The result was nonsignificant ($KS = 0.301$; $p = .128$), which can be understood as a reasonable result given

that the number of items was small and that among them a few items were DIF items. The test was repeated twice, with item 14 excluded, and with items 10 and 14 excluded. With just item 14 excluded, the KS test statistic equaled 0.243 ($p = .367$), and with item 10 also excluded, $KS = 0.226$ ($p = .457$). As expected, the exclusion of the most outlying item(s) yielded an improvement of the test results in favor of the null hypothesis (decreased KS statistics, increased $p$ values).
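The Kolmogorov-Smirnov statistic for the full item set can be approximated from the rounded nonrobust statistics listed in Table 2. The sketch below is written in Python rather than with the R function ks.test used in the study, and it computes only the test statistic, not the p value. It relies on the fact that a beta distribution with first parameter 1 has the closed-form distribution function $1 - (1 - x)^b$. The function names are hypothetical, and the result should land close to the reported KS of 0.301, up to the rounding of the tabulated values.

```python
# Nonrobust statistics for items 1 to 14, as listed (rounded) in Table 2.
phi_nonrobust = [0.009, 0.039, 0.077, 0.093, 0.129, 0.017, 0.094,
                 0.027, 0.123, 0.501, 0.107, 0.137, 0.099, 0.700]


def beta_1b_cdf(x: float, b: float) -> float:
    """Distribution function of a Beta(1, b) variable: 1 - (1 - x)**b."""
    x = min(max(x, 0.0), 1.0)
    return 1.0 - (1.0 - x) ** b


def ks_statistic(sample, cdf) -> float:
    """One-sample Kolmogorov-Smirnov statistic (two-sided)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Largest gap between the empirical and theoretical distribution
        # functions, checked just after and just before each jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


n_items, n_focal = 14, 2
shape_b = (n_items - n_focal - 1) / 2  # second beta parameter, 5.5
ks = ks_statistic(phi_nonrobust, lambda x: beta_1b_cdf(x, shape_b))
print(round(ks, 3))
```

The reruns with item 14, or items 10 and 14, excluded cannot be reproduced from Table 2 alone, because the statistics would have to be recomputed on the reduced item set.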

In sum, the assumption of bivariate normality is not rejected for the math

data set and can be used as the basis for the outlier detection approach in this

study. After these exploratory steps, we explain the outlier detection method,

beginning with how the robust multivariate estimates can be obtained.

Robust Estimates

The nonrobust estimates of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ can be obtained directly through the common estimation techniques. For the corresponding robust technique, four estimator types are used: the Donoho-Stahel estimator, the constrained M estimator, the MCD estimator, and the OGK estimator, one from each of the main categories of robust multivariate estimators, as discussed in Section 3. All four are implemented in the R package rrcov (Todorov & Filzmoser, 2009).

The OGK method, in its one-step (not reweighted) version, does not require

fixing any tuning parameter. The Donoho-Stahel and the constrained M estimator

require the specification of the so-called asymptotic rejection probability (arp),

that is, the probability that a nonoutlying observation would be eventually flagged

as outlying. This is closely related to the Type I error of a statistical test, which

is why we fixed the arp value to .05 for both methods. Finally, the MCD

method (in its not reweighted version) requires that the size of the optimal

subset is determined. The optimal subset is the a priori maximum number of

observations that are not outliers. The earlier visual inspection suggests that

the optimal subset consists of 12 items, but this visual inspection is a rather

subjective method. A more objective method would be to apply an elbow criterion to a relevant index. Possibly relevant indices are the KS statistic we have used previously and the determinant of the covariance matrix. Because outliers have an impact on the nonrobust variance and covariance estimates, they also tend to inflate the determinant of the covariance matrix, so the elbow criterion may work. We have used the smallest determinant of the covariance matrix among all possible subsets of $p$ items out of $n$, with $n/2 < p \le n$ ($n = 14$). If the plot of the smallest determinants against the subset size shows an elbow, then the subset size at the elbow is used as the size of the optimal subset. Figure 2 displays the results for subset sizes $p$ from 8 to 14. The elbow occurs at subset size 12, which is the value we will use for the tuning parameter of the MCD estimator.

Although the elbow procedure for the MCD method may seem a reintroduction

FIGURE 2 Plot of the subset size against the smallest determinants of the covariance

matrix (math data set) as a basis to apply the elbow criterion.

of a purification process, it neither includes an iterative purification of DIF items

nor a repeated computation of DIF statistics.
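The exhaustive search behind the elbow criterion can be sketched as follows in Python. The standardized areas of the math data set are not listed in the article, so the points below are purely hypothetical (12 inlying items and 2 outlying items), and the function names are illustrative. For each subset size above half the number of items, the sketch keeps the smallest determinant of the sample covariance matrix over all subsets of that size.

```python
from itertools import combinations


def cov_det_2d(points) -> float:
    """Determinant of the 2x2 sample covariance matrix (denominator n - 1)."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    return s11 * s22 - s12 ** 2


def smallest_determinants(points) -> dict:
    """Smallest covariance determinant for every subset size p, n/2 < p <= n."""
    n = len(points)
    return {
        p: min(cov_det_2d(subset) for subset in combinations(points, p))
        for p in range(n // 2 + 1, n + 1)
    }


# Hypothetical pairs (z_j1, z_j2); NOT the values of the math data set.
inliers = [(-1.2, -0.9), (0.4, -1.5), (1.1, 0.2), (-0.3, -0.4),
           (0.9, -1.1), (-1.8, 0.1), (0.2, -0.7), (1.6, -1.3),
           (-0.6, -1.8), (0.0, 0.4), (-1.0, -0.2), (0.7, -0.5)]
outliers = [(0.5, 2.9), (2.4, 3.6)]
points = inliers + outliers

dets = smallest_determinants(points)
for size, value in dets.items():
    print(size, round(value, 3))
```

With 14 items the search covers fewer than 7,000 subsets, so brute force is affordable here; for longer tests a dedicated algorithm such as the one of Rousseeuw and van Driessen (1999) would be the more practical route.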

Table 1 summarizes the estimates of the mean vector and the covariance matrix with the five methods: the nonrobust estimation and the four robust alternatives. The estimates of $\mu_1$ are rather similar across the five methods, except for the OGK method, which returns a slightly lower value. The estimates of $\mu_2$, however, are clearly lower for the four robust methods ( 0.606 to 0.640 vs. 0.126), as a consequence of outlying values in the second dimension. With respect to the variances, the robust estimates of the first variance component, $\sigma_{11}$, are very close to the nonrobust one, whereas for the second variance component, $\sigma_{22}$, the robust estimates are much smaller (0.672–0.848) than the nonrobust estimate (2.011), which is in agreement with the findings for the means. Finally, the robust estimates of the covariance $\sigma_{12}$ differ from the nonrobust one ( 0.201 to 0.062 vs. 0.598).

In sum, the presence of outlying items has a direct impact on the nonrobust

estimates, especially for the second dimension (when comparing the second

TABLE 1
Nonrobust and Robust Parameter Estimates of the Mean Vector and Covariance Matrix of the Bivariate Standardized Areas (Math Data Set)

Method      $\mu_1$   $\mu_2$   $\sigma_{11}$   $\sigma_{22}$   $\sigma_{12}$
Nonrobust   0.021     0.126     1.572           2.011           0.598
D-S         0.085     0.606     1.508           0.742           0.201
M           0.094     0.629     1.467           0.672           0.175
MCD         0.085     0.606     1.447           0.712           0.193
OGK         0.116     0.640     1.443           0.848           0.062

Note. D-S = Donoho-Stahel estimator; M = constrained M estimator; MCD = minimum covariance determinant estimator; OGK = orthogonalized Gnanadesikan-Kettenring estimator.

focal group with the reference group). Because the robust estimates agree with one another but differ from the nonrobust estimates, we expected the robust methods to agree in their DIF identification and to diverge somewhat from the nonrobust method.
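The impact of the outlying items on the spread of the estimates can be summarized by the determinant of each covariance matrix in Table 1. The Python sketch below is an illustrative check rather than part of the original analysis. It uses only the variance and covariance columns of the table, and because the determinant involves the covariance only through its square, the result does not depend on the sign of $\sigma_{12}$. The variable names are hypothetical.

```python
# (sigma_11, sigma_22, sigma_12) for each method, taken from Table 1.
table1 = {
    "Nonrobust": (1.572, 2.011, 0.598),
    "D-S": (1.508, 0.742, 0.201),
    "M": (1.467, 0.672, 0.175),
    "MCD": (1.447, 0.712, 0.193),
    "OGK": (1.443, 0.848, 0.062),
}

# Determinant of a 2x2 covariance matrix: sigma_11 * sigma_22 - sigma_12**2.
dets = {
    method: v11 * v22 - v12 ** 2
    for method, (v11, v22, v12) in table1.items()
}
robust_methods = [m for m in table1 if m != "Nonrobust"]

for method, det in dets.items():
    print(f"{method:10s} det = {det:.3f}")
```

For these tabulated values every robust determinant, and every robust estimate of $\sigma_{11}$, is smaller than its nonrobust counterpart, which is the pattern that makes the robust ellipses tighter than the nonrobust one.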

DIF Identification

Based on the decision rules of Equations (6) and (8), items 10 and 14 were identified as DIF with all five methods, using an alpha of .05. Table 2 displays the nonrobust statistics $\phi_j^*$ in Equation (5), and also the statistics $\tilde{\phi}_j^*$ in Equation (7) for each robust estimator. The DIF detection threshold is equal to 0.420, that is, the .95 quantile of the beta distribution with parameters 1 and 5.5.
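The threshold and the resulting flags can be reproduced from the values in Table 2 with a few lines of Python. This is a sketch rather than the authors' implementation: it uses the closed-form quantile $1 - \alpha^{1/b}$ of a beta distribution with parameters 1 and $b$, which is available only because the first parameter equals 1 when there are two focal groups. The variable names are hypothetical.

```python
alpha = 0.05
n_items, n_focal = 14, 2
shape_b = (n_items - n_focal - 1) / 2  # second beta parameter, 5.5

# Beta(1, b) has distribution function 1 - (1 - x)**b, so its
# (1 - alpha) quantile is 1 - alpha**(1 / b).
threshold = 1.0 - alpha ** (1.0 / shape_b)

# Statistics for items 1 to 14, as listed in Table 2.
table2 = {
    "Nonrobust": [0.009, 0.039, 0.077, 0.093, 0.129, 0.017, 0.094,
                  0.027, 0.123, 0.501, 0.107, 0.137, 0.099, 0.700],
    "D-S": [0.004, 0.225, 0.055, 0.116, 0.109, 0.113, 0.066,
            0.033, 0.123, 1.017, 0.078, 0.133, 0.126, 2.117],
    "M": [0.005, 0.257, 0.056, 0.123, 0.111, 0.131, 0.066,
          0.035, 0.126, 1.102, 0.079, 0.136, 0.132, 2.360],
    "MCD": [0.004, 0.235, 0.057, 0.121, 0.114, 0.118, 0.068,
            0.034, 0.129, 1.060, 0.081, 0.139, 0.132, 2.206],
    "OGK": [0.003, 0.218, 0.034, 0.118, 0.059, 0.112, 0.034,
            0.046, 0.181, 0.795, 0.046, 0.192, 0.136, 2.199],
}

# Items (numbered from 1) whose statistic exceeds the threshold.
flagged = {
    method: [item for item, value in enumerate(values, start=1)
             if value > threshold]
    for method, values in table2.items()
}
print(round(threshold, 3), flagged)
```

For more than two focal groups the first beta parameter is no longer 1 and the quantile has no such closed form, so a numerical routine would be needed instead.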

The nonrobust ellipse is given in Figure 3, and the four robust ellipses are

shown in Figure 4. Despite differences between the ellipses of the five methods,

they all return the same conclusion that items 10 and 14 are DIF items. The

gap between DIF (outside) and non-DIF (inside) items is clearer with the robust

methods than with the nonrobust method.

We focus now on pairwise comparisons. Starting from an ellipse, the pairwise

comparison can be achieved by projecting the items and the ellipses onto the

axes of the corresponding figure. The item projections in Figures 3 and 4 are

indicated by thin narrow lines on the axes, while the ellipse projections are

indicated as intervals delineated with thick brackets. The intervals in Figure 4

are for the largest ranges among the four robust ellipses.

Comparing the first focal group and the reference group (the projection onto

the x-axis), all items are located inside the non-DIF interval for the nonrobust

method as well as for the robust methods, which is in line with the findings by

Kim et al. (1995). Comparing the second focal group and the reference group

TABLE 2
Nonrobust and Robust Statistics $\phi_j^*$ and $\tilde{\phi}_j^*$ (Math Data Set)

Item   Nonrobust   D-S       M         MCD       OGK
1      0.009       0.004     0.005     0.004     0.003
2      0.039       0.225     0.257     0.235     0.218
3      0.077       0.055     0.056     0.057     0.034
4      0.093       0.116     0.123     0.121     0.118
5      0.129       0.109     0.111     0.114     0.059
6      0.017       0.113     0.131     0.118     0.112
7      0.094       0.066     0.066     0.068     0.034
8      0.027       0.033     0.035     0.034     0.046
9      0.123       0.123     0.126     0.129     0.181
10     0.501^a     1.017^a   1.102^a   1.060^a   0.795^a
11     0.107       0.078     0.079     0.081     0.046
12     0.137       0.133     0.136     0.139     0.192
13     0.099       0.126     0.132     0.132     0.136
14     0.700^a     2.117^a   2.360^a   2.206^a   2.199^a

Note. D-S = Donoho-Stahel estimator; M = constrained M estimator; MCD = minimum covariance determinant estimator; OGK = orthogonalized Gnanadesikan-Kettenring estimator.
^a Larger than the quantile $Q_{\mathrm{Beta}}(0.95; 1, 5.5) = 0.420$ of the beta distribution with 1 and 5.5 degrees of freedom and significance level of .05.

(the projection onto the y-axis), item 14 is located outside the non-DIF interval for the nonrobust and the robust methods, whereas item 10 is located outside the non-DIF interval only for the robust methods. These results are also in line with the findings by Kim et al. (1995), except for item 10 and the robust ellipses: the pairwise Lord’s test did not reveal item 10 as a statistically significant DIF item, whereas our robust outlier method clearly does.

Reference Group

In this example the reference group was naturally defined as the group without

scientific calculators. To illustrate that the multiple-group DIF identification does

not depend on which group is chosen as the reference group, we also considered,

separately, each of the groups with a calculator as the reference group, and

repeated a DIF analysis using the nonrobust and robust methods.

It turned out that although the areas $z_{jf}$ were obviously different from those

obtained previously, the items identified as DIF items were the same. That is,

the nonrobust and the robust methods flagged items 10 and 14 as DIF, and only

those two. This illustrates the invariance of the results with respect to the choice

of the reference group.

FIGURE 3 Scatter plot of measures of standardized areas between the item response

functions of each of the two focal groups and the reference group (math data set). The plot

shows focal group 1 measures versus focal group 2 measures (in both cases in comparison

with the reference group). Items are displayed by their rank number in the data set. The

contour is determined on the basis of the nonrobust method. The marks on the axes are orthogonal

projections of the items onto these axes. The rectangular brackets are orthogonal projections

of the ellipse onto the axes.

DISCUSSION

In the present study we proposed to consider DIF items as multivariate outliers. The method consists of identifying outlying items as items whose Mahalanobis distance is larger than a well-defined quantile under the null hypothesis of no DIF. Robust statistical estimators are proposed instead of the regular, nonrobust estimators to improve the accuracy of the outlier detection. Robust statistics is a common approach in many fields of research (e.g., biometrics or econometrics) but seems to be fairly new in psychometrics (but see Zijlstra,

FIGURE 4 Scatter plot of measures of standardized areas between the item response

functions of each of the two focal groups and the reference group (math data set). The plot

shows focal group 1 measures versus focal group 2 measures (in both cases in comparison

with the reference group). Items are displayed by their rank number in the data set. The

four contours are determined on the basis of Donoho-Stahel estimates (D-S), constrained M estimates (M), MCD estimates (MCD), and OGK estimates (OGK). The marks on the axes are orthogonal projections of the items onto these axes. The rectangular brackets are orthogonal projections of the ellipses onto the axes.

van der Ark, & Sijtsma, 2007, 2011), and certainly for the identification of DIF

(Magis & De Boeck, 2011).

The outlier detection approach is straightforward and easy to implement, and

up to three groups, a simple graphical representation is available. An impor-

tant asset is that the method does not require iterative purification procedures.

The math data set, which was analyzed in this study, illustrates the potential

usefulness of this method. Given the theoretical potential of the method and its

success in the application, it seems promising also for large-scale international

comparison studies and for cross-cultural equivalence studies. However, the

present study must be considered primarily as a proof of concept for DIF as

a robust outlier phenomenon. Further investigation is necessary to develop the

method and find out more precisely how accurate and practical the approach is.

The robust outlier methods can be further developed in several ways. First, an interesting research issue is how to determine detection criteria when the DIF statistics are not normally distributed. This would be particularly useful for methods whose DIF statistics are basically chi-square distributed. For cases in which the distribution is unknown, bootstrap methods may be developed. Second, tools can be developed for exploring and testing more precisely where (between which groups) the DIF occurs. The graphical approach presented previously cannot be used beyond three groups.

To evaluate the robust outlier methods, they must be carefully compared

with the more traditional methods on their accuracy and efficiency. Extensive

simulation studies may help for this evaluation. An interesting aspect of the

comparison is that the robust outlier approach relies on statistics as used in

traditional methodology. The difference does not concern the DIF statistics

that are used, but rather the application of the robust outlier principle to these

statistics.

The robust outlier principle is based on a kind of logic that is also worth considering for the exploration and identification of other discrepancies between a model and the data that are due to discrepant data patterns associated with items or persons. DIF is an example of such a discrepancy, but other examples may be of interest as part of a more specific investigation of model fit than global goodness-of-fit statistics allow.

ACKNOWLEDGMENTS

This research was presented at the 75th annual meeting of the Psychometric

Society (Athens, GA, July 2010). The authors wish to thank the editor and the

anonymous referees for insightful suggestions. The research was financially sup-

ported by a postdoctoral research grant “Chargé de recherches” of the National

Funds for Scientific Research (FNRS, Belgium) and the Research Funds of K.U.

Leuven (GOA/10/02).

REFERENCES

Ackerman, T. A. (1992). A didactic explanation of item bias, item impact, and item validity from a

multidimensional perspective. Journal of Educational Measurement, 29, 67–91.

Angoff, W. H., & Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude. Journal

of Educational Measurement, 10, 95–106.

Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differential item

functioning test items. Educational Measurement: Issues and Practice, 17, 31–44.

Conover, W. J. (1971). Practical nonparametric statistics. New York, NY: Wiley.

Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to

assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of

Educational Measurement, 23, 355–368.

Gnanadesikan, R., & Kettenring, J. (1972). Robust estimates, residuals, and outlier detection with

multiresponse data. Biometrics, 28, 81–124.

Hardin, J., & Rocke, D. M. (2005). The distribution of robust distances. Journal of Computational

and Graphical Statistics, 14, 928–946.

Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel

procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ: Erlbaum.

Holland, P. W., & Wainer, H. (1993). Differential item functioning. Hillsdale, NJ: Erlbaum.

Huber, P. J., & Ronchetti, E. M. (2009). Robust statistics (2nd ed.). Hoboken, NJ: Wiley.

Johnson, R. A., & Wichern, D. W. (1998). Applied multivariate statistical analysis (4th ed.). Upper

Saddle River, NJ: Prentice-Hall.

Kim, S.-H., Cohen, A. S., & Park, T.-H. (1995). Detection of differential item functioning in multiple

groups. Journal of Educational Measurement, 32, 261–276.

Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale,

NJ: Erlbaum.

Magis, D., & De Boeck, P. (2011). A robust outlier approach to prevent Type I error inflation in

DIF. Research report, Department of Mathematics, University of Liège, Belgium.

Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National

Institute of Science of India, 2, 49–55.

Mardia, K., Kent, J., & Bibby, J. (1979). Multivariate analysis. New York, NY: Academic Press.

Maronna, R. A., Martin, D., & Yohai, V. J. (2006). Robust statistics: Theory and methods. New

York, NY: Wiley.

Maronna, R. A., & Yohai, V. J. (1995). The behavior of the Stahel-Donoho robust multivariate

estimator. Journal of the American Statistical Association, 90, 330–341.

Maronna, R. A., & Zamar, R. H. (2002). Robust estimates of location and dispersion for high-

dimensional datasets. Technometrics, 44, 307–317.

Miller, R. G. (1981). Simultaneous statistical inference (2nd ed.). New York, NY: Springer-Verlag.

Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing

measurement bias. Applied Psychological Measurement, 17, 297–334.

Osterlind, S. J., & Everson, H. T. (2009). Differential item functioning (2nd ed.). Thousand Oaks,

CA: Sage.

Penfield, R. D. (2001). Assessing differential item functioning among multiple groups: A comparison of three Mantel-Haenszel procedures. Applied Measurement in Education, 14, 235–259.

Penfield, R. D., & Camilli, G. (2007). Differential item functioning and item bias. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics 26: Psychometrics (pp. 125–167). Amsterdam, the Netherlands: Elsevier.

R Development Core Team. (2010). R: A language and environment for statistical computing.

Vienna, Austria: R Foundation for Statistical Computing.

Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53, 495–502.

Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between

two item response functions. Applied Psychological Measurement, 14, 197–207.

Rocke, D. M. (1996). Robustness properties of S-estimates of multivariate location and shape in

high dimension. Annals of Statistics, 24, 1327–1345.

Rousseeuw, P. J., & Croux, C. (1993). Alternatives to the median absolute deviation. Journal of the

American Statistical Association, 88, 1273–1283.

Rousseeuw, P. J., & Leroy, A. M. (1987). Robust regression and outlier detection. New York, NY:

Wiley.

Rousseeuw, P. J., & van Driessen, K. (1999). A fast algorithm for the minimum covariance deter-

minant estimator. Technometrics, 41, 212–223.

Shealy, R. T., & Stout, W. (1993). A model based standardization approach that separates true

bias/DIF from group ability differences and detects test bias/DIF as well as item bias/DIF.

Psychometrika, 58, 159–194.

Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic

regression procedures. Journal of Educational Measurement, 27, 361–370.

Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group

difference in trace lines. In H. Wainer & H. Braun (Eds.), Test validity (pp. 147–170). Hillsdale,

NJ: Erlbaum.

Todorov, V., & Filzmoser, P. (2009). An object-oriented framework for robust multivariate analysis.

Journal of Statistical Software, 32, 1–47.

Woodruff, D. L., & Rocke, D. M. (1994). Computable robust estimation of multivariate location and shape in high dimension using compound estimators. Journal of the American Statistical Association, 89, 888–896.

Zijlstra, W. P., van der Ark, L. A., & Sijtsma, K. (2007). Outlier detection in test and questionnaire data. Multivariate Behavioral Research, 42, 531–555.

Zijlstra, W. P., van der Ark, L. A., & Sijtsma, K. (2011). Robust Mokken scale analysis by means of the forward search algorithm for outlier detection. Multivariate Behavioral Research, 46, 58–89.

APPENDIX

This section provides the derivation of Equations (11) and (12) from the inequality (10). To simplify the notation, set $F(z_{j1}, z_{j2}) = (\mathbf{z}_j - \bar{\mathbf{z}})' \mathbf{S}^{-1} (\mathbf{z}_j - \bar{\mathbf{z}})$ as the squared Mahalanobis distance, so that Equation (10) takes the simple form $F(z_{j1}, z_{j2}) \le Q^*_\alpha$. By definition, the interval $I_1$ must be such that $z_{j1} \notin I_1$ if and only if $F(z_{j1}, z_{j2}) > Q^*_\alpha$ for any value of $z_{j2}$, or equivalently if and only if the smallest value of $F(z_{j1}, z_{j2})$ (over all $z_{j2}$ values) is larger than $Q^*_\alpha$. This minimum value is obtained as follows. First, one can rewrite $F(z_{j1}, z_{j2})$ as follows:

$$F(z_{j1}, z_{j2}) = \frac{1}{\det \mathbf{S}} \left\{ s_{22}(z_{j1} - \bar{z}_1)^2 - 2 s_{12}(z_{j1} - \bar{z}_1)(z_{j2} - \bar{z}_2) + s_{11}(z_{j2} - \bar{z}_2)^2 \right\} \qquad (17)$$

where $\det \mathbf{S} = s_{11} s_{22} - s_{12}^2$ is the determinant of the covariance matrix $\mathbf{S}$. Note that $F(z_{j1}, z_{j2})$ is convex in $z_{j2}$ and reaches its minimum value whenever

$$z_{j2} = \bar{z}_2 + \frac{s_{12}}{s_{11}} (z_{j1} - \bar{z}_1) = z^*_{j2}. \qquad (18)$$

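In sum, the minimum value of $F(z_{j1}, z_{j2})$ with respect to $z_{j2}$ is equal to

$$\min_{z_{j2}} F(z_{j1}, z_{j2}) = F(z_{j1}, z^*_{j2}) = \frac{(z_{j1} - \bar{z}_1)^2}{s_{11}}. \qquad (19)$$

Therefore

$$z_{j1} \notin I_1 \;\Leftrightarrow\; \frac{(z_{j1} - \bar{z}_1)^2}{s_{11}} > Q^*_\alpha \;\Leftrightarrow\; |z_{j1} - \bar{z}_1| > \sqrt{s_{11} Q^*_\alpha} \;\Leftrightarrow\; z_{j1} \notin \left[\bar{z}_1 \pm \sqrt{s_{11} Q^*_\alpha}\right], \qquad (20)$$

as expected. Keep now $z_{j1}$ fixed in $I_1$, which implies that $(z_{j1} - \bar{z}_1)^2 \le s_{11} Q^*_\alpha$. The condition in Equation (10) can be written as follows, using Equation (17):

$$s_{22}(z_{j1} - \bar{z}_1)^2 - 2 s_{12}(z_{j1} - \bar{z}_1)(z_{j2} - \bar{z}_2) + s_{11}(z_{j2} - \bar{z}_2)^2 - Q^*_\alpha \det \mathbf{S} \le 0, \qquad (21)$$

and the left-hand side of Equation (21) is convex in $z_{j2}$. The discriminant of this convex function is equal to

$$\Delta = 4 \left\{ s_{11} Q^*_\alpha - (z_{j1} - \bar{z}_1)^2 \right\} \det \mathbf{S} \qquad (22)$$

and is nonnegative since $z_{j1}$ belongs to $I_1$. In sum, condition (21) is satisfied if and only if

$$z_{j2} \in \left[ \bar{z}_2 + \frac{s_{12}(z_{j1} - \bar{z}_1) \pm \tfrac{1}{2}\sqrt{\Delta}}{s_{11}} \right] \qquad (23)$$

(where $[a \pm b]$ stands for the interval $[a - b; a + b]$), which is equal to Equation (12).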
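The derivation can be checked numerically with a short Python sketch. The variances and the covariance are the nonrobust estimates of Table 1, whereas the center (0, 0) and the threshold value 5.0 are hypothetical placeholders chosen only for illustration. The half-width of the second interval is taken as $\sqrt{\Delta}/(2 s_{11})$, which is what the quadratic formula gives for the left-hand side of Equation (21) with the discriminant of Equation (22).

```python
import math

# Nonrobust variance and covariance estimates from Table 1.
s11, s22, s12 = 1.572, 2.011, 0.598
det_s = s11 * s22 - s12 ** 2

# Hypothetical center and threshold, for illustration only.
z1_bar, z2_bar = 0.0, 0.0
q_alpha = 5.0


def mahalanobis_sq(z1: float, z2: float) -> float:
    """Squared Mahalanobis distance written as in Equation (17)."""
    d1, d2 = z1 - z1_bar, z2 - z2_bar
    return (s22 * d1 ** 2 - 2.0 * s12 * d1 * d2 + s11 * d2 ** 2) / det_s


def interval_1() -> tuple:
    """Projection of the ellipse onto the first axis, Equation (20)."""
    half = math.sqrt(s11 * q_alpha)
    return z1_bar - half, z1_bar + half


def interval_2(z1: float) -> tuple:
    """Range of z_j2 satisfying condition (21) for a fixed z_j1 in I_1."""
    d1 = z1 - z1_bar
    delta = 4.0 * (s11 * q_alpha - d1 ** 2) * det_s  # Equation (22)
    if delta < 0.0:
        raise ValueError("z1 lies outside the interval I_1")
    center = z2_bar + s12 * d1 / s11                 # Equation (18)
    half = math.sqrt(delta) / (2.0 * s11)
    return center - half, center + half


low, high = interval_2(1.0)
print(interval_1(), (low, high))
print(mahalanobis_sq(1.0, low), mahalanobis_sq(1.0, high))
```

At both endpoints the squared distance equals the threshold, and at the midpoint of the interval it equals $(z_{j1} - \bar{z}_1)^2 / s_{11}$, in line with Equation (19).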
