Multivariate Behavioral Research, 46:733–755, 2011
Copyright © Taylor & Francis Group, LLC
ISSN: 0027-3171 print/1532-7906 online
DOI: 10.1080/00273171.2011.606757
Identification of Differential Item Functioning in Multiple-Group Settings:
A Multivariate Outlier Detection Approach
David Magis
University of Liège and K. U. Leuven
Paul De Boeck
University of Amsterdam and K. U. Leuven
We focus on the identification of differential item functioning (DIF) when more
than two groups of examinees are considered. We propose to consider items as
elements of a multivariate space, where DIF items are outlying elements. Following
this approach, the situation of multiple groups is a quite natural case. A robust
statistics technique is proposed to identify DIF items as outliers in the multivariate
space. For low dimensionalities, up to 2–3 groups, a simple graphical tool is
derived. We illustrate our approach with a reanalysis of data from Kim, Cohen,
and Park (1995) on using calculators for a mathematics test.
Differential item functioning (DIF) is an undesirable phenomenon that can affect
the validity of test conclusions. An item is said to function differently (or, in short,
to exhibit DIF) among two (or more) groups of examinees whenever respondents
with identical ability levels, but who are from different groups, have different
probabilities of endorsing the item. In this sense, DIF can be seen as adding extra
dimensionality in the data (Ackerman, 1992). The identification and removal of
DIF items are a necessity for valid conclusions.
Correspondence concerning this article should be addressed to David Magis, Department of
Mathematics (B37), University of Liège, Grande Traverse 12, B-4000 Liège, Belgium. E-mail:
Downloaded by [University of Liege], [David Magis] at 01:24 18 October 2011
Many DIF detection methods have been proposed for the case of two groups
of examinees. Well-known methods are the Mantel-Haenszel method (Holland &
Thayer, 1988), standardization (Dorans & Kulick, 1986), the SIBTEST method
(Shealy & Stout, 1993), the logistic regression procedure (Swaminathan &
Rogers, 1990), Lord’s chi-square test (Lord, 1980), and the likelihood-ratio test
(Thissen, Steinberg, & Wainer, 1988). For a review of these (and other) methods,
see Clauser and Mazor (1998), Holland and Wainer (1993), Millsap and Everson
(1993), Osterlind and Everson (2009), and Penfield and Camilli (2007).
Although the DIF literature is vast, only a few studies have focused on
multiple-group DIF identification. The relative lack of interest has historic and
pragmatic reasons. Historically, the common case is where the test is constructed
primarily on the basis of (and for) a reference group, whereas an individual wants
to apply it also to a minority group, called the focal group (Angoff & Ford, 1973).
Gender is an exception because no minority group is involved, but the situation
is formally equivalent because there are only two gender groups. In line with
this common situation, the DIF methodology is focused on comparing one group
with another, and if more than one minority group is involved, it makes sense
to compare each of them as a focal group with the reference group. However,
this leads to well-known issues of multiple testing (e.g., Miller, 1981). From
a pragmatic point of view, the focus on just two groups makes sense because
many of the traditional methods cannot easily be extended to a simultaneous
comparison of more than two groups of examinees.
Because of an increasing interest in large-scale or international assessment
studies, such as PISA or TIMSS, there is a real need for multiple-group DIF
methods. Several traditional methods have been extended already: the generalized
Mantel-Haenszel method (Penfield, 2001) and the generalized Lord’s test
(Kim, Cohen, & Park, 1995). It is clear that the likelihood-ratio test approach
can also be applied to multiple groups.
The main purpose of this article is to present a robust outlier method that can
be applied to more than two groups of examinees without an extension. The key
idea is to rely on a transformation of traditional DIF statistics in order to identify
outliers in the multigroup space of these transformed DIF statistics, based on
principles from robust statistics. This approach has several advantages. First,
in principle, any DIF statistic from traditional DIF methods can be considered,
although in practice an application of the approach is much easier if the chosen
statistic is normally distributed. We therefore restricted this study to normally
distributed DIF statistics. If the distribution cannot be assumed to be normal, it
would be necessary to simulate the distribution, such as with a bootstrap method.
Second, the procedure of DIF identification is a one-step process. No further
iterative application of the process is needed. The traditional DIF detection
methods do require an iterative purification process to avoid distortions due
to the presence of other DIF items than the one under consideration. Third,
although the traditional methods are sensitive to Type I error inflation in the
case of asymmetric DIF, it was recently pointed out (Magis & De Boeck, 2011)
that, in the usual case of two groups, the outlier detection approach is robust
against this Type I error inflation. Although it was not a subject of investigation
in the present study, the same kind of robustness for the case of multiple groups
could be expected, based on the statistical principles of the approach.
We start with an explanation of the principles of the robust outlier approach
for the case of two groups. Then, we present the more general approach, which is
independent of the number of groups. A graphical tool is described that can be
used with up to three groups. Finally, the general approach and the graphical tool are
illustrated with a real-data example about a test of mathematics.
OUTLIER IDENTIFICATION BETWEEN TWO GROUPS
The traditional DIF methods look rather diverse. Some rely on item response
theory (IRT) models, such as Lord’s test or the likelihood-ratio test methods,
whereas others are built on statistical methods for discrete data, for instance the
Mantel-Haenszel approach, logistic regression, or SIBTEST. Several methods
focus primarily on one type of DIF effect (uniform or nonuniform). Uniform
DIF occurs when the relationship between the group membership and the item
response does not depend on the ability level, that is, the DIF effect is uniform
along the ability scale. Nonuniform DIF implies that the DIF effect can be different
depending on the ability level (Clauser & Mazor, 1998). Other techniques
are able to detect both types of DIF. On the other hand, the various methods
have some important formal aspects in common.
Let $J$ be the number of items and let $z_j$ ($j = 1, \ldots, J$) be the DIF statistic
for item $j$. The form of $z_j$ depends on the DIF method. For instance, with
the SIBTEST method, $z_j$ corresponds to the test statistic $B$ (Shealy & Stout,
1993, Equation 18), and for Lord’s test, it is the chi-square statistic $\chi^2_j$ as given
by Kim et al. (1995, Equation 10). The choice of a DIF statistic is crucial
for the type of DIF. For instance, the statistics based on the Mantel-Haenszel
and SIBTEST methods focus primarily on uniform DIF, whereas the logistic
regression procedure can detect uniform and nonuniform DIF separately and in
combination. With IRT methods, it is mainly the selection of an IRT model that
imposes the kind of DIF effect one can focus on.
The distribution of zj under the hypothesis of no DIF obviously depends on
the statistic. Most often the asymptotic null distribution is the standard normal
distribution, for instance with SIBTEST’s B statistic, the Mantel-Haenszel estimate
of the common odds ratio, and the standardized areas between two item
characteristic curves (Raju, 1988, 1990). Another common null distribution is
the chi-square distribution, as for the Wald and likelihood-ratio statistics, Lord’s
test statistic (from which a normally distributed statistic can also be derived),
and the Mantel-Haenszel chi-square statistic. In this study, we put emphasis on
normally distributed DIF statistics.
Despite some important intrinsic differences between statistical methods, they
are all based on a common logic with four elements: (a) a DIF statistic is defined;
(b) the statistical distribution of this DIF statistic under the null hypothesis of no
DIF is determined; (c) using this null distribution, the empirical value of the DIF
statistic is compared to the quantiles of the null distribution; and (d) because the
DIF statistic of an item is affected by the presence of DIF among the other items,
a purification procedure is set up until convergence to a supposedly DIF-free
test.
The (robust) outlier approach is meant to induce the null distribution in a one-
step purification, or in other words, directly from the data without any iteration,
and this null distribution refers to a distribution over items. Suppose that in the
absence of DIF, the DIF statistic is distributed across items in a certain way that
corresponds to noise. The presence of DIF can be seen as a process that adds
a signal to the noise, so that the DIF statistic of affected items deviates from
this distribution. Assuming that the DIF items are a minority, it is possible to
identify those DIF items with a robust outlier approach, without the need of an
iterative purification process, and with a robust estimate of the null distribution
as a result. The crucial assumption is that the DIF items are a minority.
If they were a majority, the concept of DIF would no longer apply: the
test would then simply be measuring a different construct altogether. Although majority
DIF can still be called DIF, we focus here on minority DIF.
The outlier approach can be sketched in the following way. First, some DIF
statistic $z_j$ must be computed for each item, for example, the Mantel-Haenszel
estimate of the common odds ratio, or the Raju distance (Raju, 1988). Second, a
transformation is applied to this statistic, so that a statistic $\delta_j$ is obtained which
is asymptotically normally distributed across items under the null hypothesis of
no DIF. In the application further on, we in fact use the Raju distance and its
standardization (Raju, 1988). The statistic $\delta_j$ is derived in the following way:
$$\delta_j = \frac{z_j - \bar{z}}{s_z} \tag{1}$$
where $\bar{z}$ and $s_z$ are the sample mean and the sample standard deviation of the $z_j$ measures
($j = 1, \ldots, J$), respectively. The statistic $\delta_j$ actually corresponds to the
sample Mahalanobis distance in a one-dimensional space (Mahalanobis, 1936).
Hence, using the properties of the Mahalanobis distance and assuming that the
$z_j$ measures come from a normal distribution, the statistic $\delta_j^2$ is asymptotically
chi-square distributed with one degree of freedom. However, Mardia, Kent, and
Bibby (1979) showed that under the same conditions, the modified statistic
$$\delta_j^* = \frac{J}{(J-1)^2}\,\delta_j^2 = \frac{J}{(J-1)^2}\,\frac{(z_j - \bar{z})^2}{s_z^2} \tag{2}$$
follows an exact beta distribution with parameters 0.5 and $(J-2)/2$ (see also
Hardin & Rocke, 2005). With shorter tests, this exact distribution is much more
accurate than the asymptotic chi-square distribution, and both are nearly identical
when the number of items is very large.
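As an illustration of Equation (2), the following Python sketch computes the nonrobust statistic for a made-up set of DIF statistics. The function name and the example values are hypothetical and serve only as an illustration; the standard library is sufficient. Two properties of the nonrobust statistic can be checked on the output: every value lies between 0 and 1, in line with the support of a beta distribution, and the values always sum to J/(J - 1).

```python
from statistics import mean, stdev


def delta_star(z):
    """Nonrobust statistic of Equation (2), one value per item.

    z holds the DIF statistics z_j of the J items (two-group case).
    The scale is the sample standard deviation (divisor J - 1).
    """
    J = len(z)
    z_bar = mean(z)
    s_z = stdev(z)
    return [J / (J - 1) ** 2 * ((zj - z_bar) / s_z) ** 2 for zj in z]


# Hypothetical DIF statistics for 7 items; the last one is a gross outlier.
z_example = [0.2, -0.5, 0.1, 0.4, -0.3, 0.0, 6.0]
values = delta_star(z_example)
print([round(v, 3) for v in values])
# The two printed numbers should agree: sum of the statistics and J / (J - 1).
print(round(sum(values), 6), round(len(values) / (len(values) - 1), 6))
```

Even a gross outlier cannot push its own nonrobust statistic above 1, because the outlier also inflates the mean and the standard deviation against which it is measured.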
The exact beta distribution of the $\delta_j^*$ statistic holds on the condition that the
$z_j$ statistics are normally distributed. This motivates the restriction to normally
distributed DIF statistics. However, the $\delta_j^*$ statistic may still be used as an
indicator of an outlying (DIF) item, even when nonnormally distributed $z_j$ measures
are considered. This would come, however, at the cost of losing the exact beta
distribution, and the search for an accurate DIF classification threshold would
be more complicated, for example, because the distribution would need to be simulated,
such as with a bootstrap method.
Based on a comparison of the $\delta_j^*$ statistic from Equation (2) with the appropriate
quantile of the beta distribution, outlying items can be identified. However,
$\delta_j$ and $\delta_j^*$ depend on the sample mean and the sample standard deviation of the
DIF measures $z_j$. If one or several measures $z_j$ are outlying, this
seriously affects the computation of $\bar{z}$ and $s_z$, and hence the values of $\delta_j$ and
$\delta_j^*$. This issue is similar to the issue encountered by the traditional methods, for
which item purification was recommended as a way to remove the impact of
DIF items on the DIF test results for other items. Here, we propose to replace
the regular sample estimates, which are nonrobust, by robust alternatives. The
term robust has to be understood in its statistical sense: a robust estimator is
much less sensitive to aberrant or outlying measures and returns more reliable
estimates than the classical, nonrobust methods. To better emphasize the fact
that sample estimates are not robust in the presence of outliers, they are further
referred to as the nonrobust estimates.
In the one-dimensional framework of DIF (i.e., when only two groups are
studied), straightforward robust estimators of location and scale are the median
$\tilde{z}$ and the median absolute deviation (or MAD) $\mathrm{mad}_z$. The MAD corresponds
to the median of the absolute differences between the $z_j$s and their median value $\tilde{z}$
(Rousseeuw & Croux, 1993). Replacing the nonrobust estimates by their robust
versions, it is possible to obtain the robust outlier identification rule by computing
$$\tilde{\delta}_j^* = \frac{J}{(J-1)^2}\,\tilde{\delta}_j^2 = \frac{J}{(J-1)^2}\,\frac{(z_j - \tilde{z})^2}{\mathrm{mad}_z^2} \tag{3}$$
and comparing $\tilde{\delta}_j^*$ to the same quantile of the beta distribution.
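A minimal sketch of the robust rule of Equation (3) follows, in Python and with hypothetical names and data. The MAD is computed as defined in the text, without a consistency constant; the optional `mad_scale` argument is an addition of this sketch for readers who prefer the commonly used factor 1.4826, and whether such a factor should be applied here is left open. The beta quantile is not computed; it is passed in as `threshold`.

```python
from statistics import median


def robust_delta_star(z, mad_scale=1.0):
    """Robust statistic of Equation (3), one value per item (two-group case)."""
    J = len(z)
    z_med = median(z)
    # Median absolute deviation around the median (raw, unless mad_scale != 1).
    mad = mad_scale * median([abs(zj - z_med) for zj in z])
    if mad == 0:
        raise ValueError("MAD is zero, so the robust scale is undefined")
    return [J / (J - 1) ** 2 * ((zj - z_med) / mad) ** 2 for zj in z]


def flag_items(z, threshold, mad_scale=1.0):
    """1-based positions of the items whose robust statistic exceeds threshold."""
    values = robust_delta_star(z, mad_scale)
    return [j for j, v in enumerate(values, start=1) if v > threshold]


# Hypothetical DIF statistics for 7 items; the last one is a gross outlier.
z_example = [0.2, -0.5, 0.1, 0.4, -0.3, 0.0, 6.0]
print([round(v, 3) for v in robust_delta_star(z_example)])
```

Because the median and the MAD are barely moved by the seventh value, its robust statistic is very large. Note that the robust statistic can exceed 1, the upper bound of any beta-distributed variable.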
It is important to notice that by replacing the nonrobust estimates with their
robust counterparts, there is no guarantee that the modified statistic in Equation
(3) follows a beta distribution. However, we keep comparing $\tilde{\delta}_j^*$ to the beta
quantile for a reason that is central to the concept of robust estimators and also
commonly accepted (e.g., Mardia et al., 1979; Maronna, Martin, & Yohai, 2006;
Rousseeuw & Leroy, 1987). The concept of robust estimators is that they capture
the location and the scale of the good part of the data set, that is, the majority
of data points that are unaffected by outliers and therefore may be assumed
to follow a given distribution. In the absence of outliers, robust and nonrobust
estimators usually return very close estimates. When outliers are present, they
distort the distributional assumption and affect the nonrobust estimates, whereas
robust methods get rid of this effect. In sum, the departure from the beta distribution
(caused by replacing nonrobust estimates by robust alternatives) should not greatly affect
the validity of the beta threshold selection.
OUTLIER IDENTIFICATION FOR MORE THAN TWO GROUPS
Consider one reference group and $F$ focal groups with $F \geq 2$. Each item $j$ can
be characterized by a vector of DIF measures, say $\mathbf{z}_j = (z_{j1}, \ldots, z_{jF})$, where
$z_{jf}$ ($f = 1, \ldots, F$) denotes the measure of DIF effect between the reference
group and focal group $f$. It is possible to assume that $z_{jf}$ is a traditional measure
of DIF between two groups, as listed in the previous section, and is computed
repeatedly between each focal group and the reference group. In the simple case
of two groups, $\mathbf{z}_j = z_{j1}$ and is thus a scalar, so that only one dimension must
be considered. The DIF identification therefore reduces to the identification of
outliers in one set of statistics. For multiple groups, DIF is identified by detecting
outlying vectors $\mathbf{z}_j$ among the set of test items. Outlyingness can occur in one or
in several components of $\mathbf{z}_j$, which makes the problem multidimensional. DIF
can be present between the reference group and one or several focal group(s).
The approach can easily be used without a reference group but instead with an
undifferentiated set of groups as in an international comparison study. When
there is no natural reference group and when the purpose is to identify DIF
across groups in a global way, then any group can function as a reference group.
For the identification of overall DIF in the sense of DIF somewhere between
the groups (without specifying where), the choice of a reference group has no
consequences. The situation is similar to a situation in which multiple categories
are compared through a global test. The choice of a baseline category does not
affect the result of the global test.
A fundamental assumption for further developments is that zj arises from a
multivariate normal distribution. This obviously restricts the set of allowable DIF
measures if the desire is to rely on the identification rule to be formulated. Set $\bar{\mathbf{z}}$
as the sample mean vector (of length $F$) and $S$ as the sample covariance matrix
of the DIF vectors $\mathbf{z}_j$. Then, the sample Mahalanobis distance $\Delta_j$
for item $j$ is
$$\Delta_j = \sqrt{(\mathbf{z}_j - \bar{\mathbf{z}})' S^{-1} (\mathbf{z}_j - \bar{\mathbf{z}})} \tag{4}$$
where the prime stands for transposed vectors and matrices, and its square $\Delta_j^2$ is asymptotically
chi-square distributed with $F$ degrees of freedom (Mahalanobis, 1936). Also,
the modified statistic
$$\Delta_j^* = \frac{J}{(J-1)^2}\,\Delta_j^2 = \frac{J}{(J-1)^2}\,(\mathbf{z}_j - \bar{\mathbf{z}})' S^{-1} (\mathbf{z}_j - \bar{\mathbf{z}}) \tag{5}$$
has an exact beta distribution with parameters $F/2$ and $(J-F-1)/2$ (Hardin
& Rocke, 2005; Mardia et al., 1979). Note that Equation (4) reduces to Equation
(1), and Equation (5) reduces to Equation (2), when $F$ is equal to 1 (i.e., in
the usual case of a single focal group). Hence, the detection of outlying vectors
of DIF measures $\mathbf{z}_j$ can be achieved by computing the modified statistics $\Delta_j^*$
in Equation (5) and comparing them to the corresponding quantile of the beta
distribution, as in the previous section: item $j$ is flagged as DIF if and
only if
$$\Delta_j^* = \frac{J}{(J-1)^2}\,(\mathbf{z}_j - \bar{\mathbf{z}})' S^{-1} (\mathbf{z}_j - \bar{\mathbf{z}}) > Q_\alpha \tag{6}$$
with $Q_\alpha$ being the quantile of the beta distribution related to significance level
$\alpha$.
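For two focal groups (F = 2), the first beta parameter equals 1 and the beta quantile has a closed form, so the nonrobust rule of Equation (6) can be sketched with the Python standard library alone. In the sketch below, the function names and the data layout are hypothetical, and the quantile is taken to be the upper quantile of order 1 - alpha, which is one reading of "the quantile related to significance level alpha". For other values of F, a numerical beta quantile routine would be needed instead.

```python
def beta_quantile_shape_one(p, b):
    """Quantile of Beta(1, b): the CDF is 1 - (1 - x)**b, inverted in closed form."""
    return 1.0 - (1.0 - p) ** (1.0 / b)


def delta_star_two_focal(z):
    """Statistic of Equation (5) for F = 2; z is a list of (z_j1, z_j2) pairs."""
    J = len(z)
    m1 = sum(p[0] for p in z) / J
    m2 = sum(p[1] for p in z) / J
    # Entries of the sample covariance matrix S (divisor J - 1).
    s11 = sum((p[0] - m1) ** 2 for p in z) / (J - 1)
    s22 = sum((p[1] - m2) ** 2 for p in z) / (J - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in z) / (J - 1)
    det = s11 * s22 - s12 ** 2
    out = []
    for p in z:
        u, v = p[0] - m1, p[1] - m2
        # Squared Mahalanobis distance, using the explicit inverse of a 2 x 2 matrix.
        d2 = (s22 * u * u - 2 * s12 * u * v + s11 * v * v) / det
        out.append(J / (J - 1) ** 2 * d2)
    return out


def flag_dif(z, alpha=0.05):
    """1-based positions of the items flagged by the nonrobust rule (F = 2 only)."""
    J = len(z)
    q_alpha = beta_quantile_shape_one(1 - alpha, (J - 3) / 2)
    return [j for j, v in enumerate(delta_star_two_focal(z), start=1) if v > q_alpha]


# Hypothetical pairs of DIF statistics for 6 items; the last one is outlying.
z_pairs = [(0.1, 0.2), (-0.4, 0.3), (0.5, -0.2), (-0.1, -0.6), (0.3, 0.1), (4.0, 5.0)]
print(flag_dif(z_pairs))
```

Under these assumptions, a test of 14 items with alpha = .05 gives a beta quantile of about 0.420, which corresponds to a bound of about 5.07 on the squared Mahalanobis distance.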
As in the univariate case, DIF between the reference group and any focal
group $f$ affects the corresponding $z_{jf}$ measure. Typically, if item $j$ exhibits
DIF, then $z_{jf}$ will be much larger or much smaller than most other $z_{kf}$ values
($k \neq j$), which inflates the sample variances and covariances in
the $S$ matrix. This often leads to reduced $\Delta_j^*$ measures and an underdetection
of DIF items. Another problem is that the mean $\bar{\mathbf{z}}$ may also be distorted by the
presence of DIF items, which can make DIF-free items look as if they were
DIF items. To overcome these problems, and in line with the univariate
outlier approach, we propose to replace the nonrobust estimates $\bar{\mathbf{z}}$ and $S$ by
some robust multivariate alternatives. Let $\tilde{\mathbf{z}}$ and $R$ be such robust estimates of
the mean vector and the covariance matrix, respectively. The robust identification
of outlying vectors then consists in computing the statistics
$$\tilde{\Delta}_j^* = \frac{J}{(J-1)^2}\,\tilde{\Delta}_j^2 = \frac{J}{(J-1)^2}\,(\mathbf{z}_j - \tilde{\mathbf{z}})' R^{-1} (\mathbf{z}_j - \tilde{\mathbf{z}}) \tag{7}$$
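and comparing the results to the same quantile of the beta distribution: item $j$
is flagged as DIF if and only if
$$\tilde{\Delta}_j^* = \frac{J}{(J-1)^2}\,(\mathbf{z}_j - \tilde{\mathbf{z}})' R^{-1} (\mathbf{z}_j - \tilde{\mathbf{z}}) > Q_\alpha. \tag{8}$$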
In the univariate framework, simple robust alternatives can be considered,
as highlighted previously. However, turning to multivariate statistics leads to
technical complexity and the lack of straightforward robust estimators of $\mu$ and
$\Sigma$. Several robust estimators are available, from four broad categories:
1. The projection methods, which are weighted estimators of the mean and
the covariance matrix. The weights are the inverse of an outlying measure
derived for a one-dimensional projection of the points. If a point is a
multivariate outlier, then a one-dimensional projection of the data can
be made in which the degree of outlying is maximized for the point in
question. The Donoho-Stahel estimator (Maronna & Yohai, 1995) belongs
to this category.
2. Methods based on weighted maximum likelihood estimation, with the
weights defined such that outlying observations have lower or zero weight.
The S and M estimators (Rocke, 1996; Woodruff & Rocke, 1994) are of
that kind.
3. Methods based on optimal subset identification (omitting outlying observations),
among which are the Minimum Covariance Determinant (MCD)
and the Minimum Volume Ellipsoid estimators (Rousseeuw & Leroy,
1987; Rousseeuw & van Driessen, 1999).
4. The pairwise methods, for which multivariate robust estimators are built
from univariate robust estimates and then regularized to guarantee positive
definite covariance matrices. The orthogonalized quadrant correlation
pairwise estimator (Maronna & Zamar, 2002) and the Orthogonalized
Gnanadesikan-Kettenring (OGK) pairwise estimator (Gnanadesikan
& Kettenring, 1972; Maronna & Zamar, 2002) belong to this category.
Although relying on different conceptual approaches, these four methods can
provide joint robust estimates of the mean vector and the covariance matrix.
Moreover, it is commonly observed that in the absence of outliers the robust
estimates are close approximations of the regular nonrobust estimates, although
they differ substantially from the nonrobust estimates when outlying vectors are
present. Also, despite some conceptual differences between the robust methods,
they usually return similar estimates. A main drawback is that most robust
methods rely on a kind of tuning parameter that must be fixed in advance.
Additional information can be found in Huber and Ronchetti (2009), Maronna
et al. (2006), and Todorov and Filzmoser (2009).
The choice among the available methods is often driven by preferences
and the availability of computer software. However, most robust estimators are
available from the package rrcov (Todorov & Filzmoser, 2009) for the R
software (R Development Core Team, 2010). There are no limitations to the number
of focal groups that can be taken into account. But, the larger this number, the
larger the dimensionality of the mean vector and the covariance matrix to be
estimated, and the more intensive the computational task. In this article, all four
types of robust estimators are considered for the applications and their output is
compared.
GRAPHICAL INTERPRETATION
The use of the Mahalanobis distance in the statistics of Equations (4), (5), and (7)
has a useful graphical interpretation. Let $\mathrm{MD}(\mathbf{x}, \mu, \Sigma)$ be the Mahalanobis distance
of any $n$-dimensional vector $\mathbf{x}$, for a given mean vector $\mu$ and a covariance
matrix $\Sigma$. Then, for any positive constant $Q$, the relationship $\mathrm{MD}(\mathbf{x}, \mu, \Sigma) = Q$
defines an $n$-dimensional ellipsoid whose center is given by $\mu$, whose main
axes are given by the eigenvectors of $\Sigma$, and whose axis lengths are proportional to
the square roots of the eigenvalues of $\Sigma$ (e.g., Johnson & Wichern, 1998). Furthermore, any
$n$-dimensional vector $\mathbf{y}$ satisfying $\mathrm{MD}(\mathbf{y}, \mu, \Sigma) \leq Q$ is located within (or on the
boundary of) this ellipsoid. Similarly, the inequality $\mathrm{MD}(\mathbf{y}, \mu, \Sigma) > Q$ implies
that $\mathbf{y}$ is located outside of the ellipsoid.
This graphical interpretation is most useful in the present DIF framework.
Consider for instance the nonrobust estimates of the mean vector and the covariance
matrix. Recall that an item is flagged as DIF if and only if $\Delta_j^* > Q_\alpha$
according to Equation (6). Making use of the definition in Equation (4), the
classification rule can be rewritten as
$$(\mathbf{z}_j - \bar{\mathbf{z}})' S^{-1} (\mathbf{z}_j - \bar{\mathbf{z}}) > \frac{(J-1)^2}{J}\,Q_\alpha = Q_\alpha^*. \tag{9}$$
Outlying vectors $\mathbf{z}_j$ are therefore lying outside of the $F$-dimensional ellipsoid
defined by the mean vector $\bar{\mathbf{z}}$, the covariance matrix $S$, and the constant $Q_\alpha^*$.
The robust classification rule is similar to Equation (9), with nonrobust estimates
being replaced by their robust alternatives.
The dimensionality of the ellipsoid equals the number of focal groups. If
there is only one focal group, the multivariate outlier identification reduces to
the classical univariate method and the ellipsoid collapses into a one-dimensional
interval that can easily be delineated on the basis of the statistic as in Equation
(3). In the case of two focal groups, the ellipsoid is a two-dimensional ellipse
whose contour can be described with simple equations. For the case of two focal
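groups and nonrobust estimates, the region in which an item is not flagged by the
classification rule in Equation (9) is:
$$(z_{j1} - \bar{z}_1,\; z_{j2} - \bar{z}_2) \begin{pmatrix} s_{11} & s_{12} \\ s_{12} & s_{22} \end{pmatrix}^{-1} \begin{pmatrix} z_{j1} - \bar{z}_1 \\ z_{j2} - \bar{z}_2 \end{pmatrix} \leq Q_\alpha^* \tag{10}$$
which is equivalent to the combination of Equations (11) and (12):
$$z_{j1} \in I_1 = \left[\bar{z}_1 \pm \sqrt{s_{11}\,Q_\alpha^*}\right] \tag{11}$$
$$z_{j2} \in I_2(z_{j1}) = \left[\bar{z}_2 + \frac{s_{12}(z_{j1} - \bar{z}_1)}{s_{11}} \pm \frac{1}{s_{11}} \sqrt{(s_{11}s_{22} - s_{12}^2)\{s_{11}Q_\alpha^* - (z_{j1} - \bar{z}_1)^2\}}\right]. \tag{12}$$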
The derivation is provided in the Appendix. In other words, the ellipse is fully
characterized by the intervals $I_1$ and $I_2(z_{j1})$ for any $z_{j1} \in I_1$. Equations (11)
and (12) make it possible to display the two-dimensional ellipse graphically and to locate
each item by its pair of DIF measures $(z_{j1}, z_{j2})$. Using robust estimates, the
formulas are
$$z_{j1} \in I_1 = \left[\tilde{z}_1 \pm \sqrt{r_{11}\,Q_\alpha^*}\right] \tag{13}$$
$$z_{j2} \in I_2(z_{j1}) = \left[\tilde{z}_2 + \frac{r_{12}(z_{j1} - \tilde{z}_1)}{r_{11}} \pm \frac{1}{r_{11}} \sqrt{(r_{11}r_{22} - r_{12}^2)\{r_{11}Q_\alpha^* - (z_{j1} - \tilde{z}_1)^2\}}\right]. \tag{14}$$
Because typically the variance $r_{11}$ and the determinant of $R$ are smaller than
the nonrobust variance $s_{11}$ and the determinant of $S$, respectively, the robust
ellipse is usually smaller in size than the nonrobust ellipse, yielding increased
outlier detection rates.
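The two intervals translate directly into a routine that returns, for a given abscissa, the lower and upper ordinates of the ellipse. The following Python sketch uses hypothetical names: `cov` holds the three distinct entries of either S or R (two variances and one covariance), `center` holds the corresponding location estimates, and `q_star` stands for the rescaled beta quantile. The first interval is taken as the set of abscissas for which the radicand of the second interval is nonnegative.

```python
import math


def quadratic_form(point, center, cov):
    """Quadratic form (z - center)' C^{-1} (z - center) for a 2 x 2 matrix C."""
    s11, s12, s22 = cov
    u, v = point[0] - center[0], point[1] - center[1]
    det = s11 * s22 - s12 ** 2
    return (s22 * u * u - 2 * s12 * u * v + s11 * v * v) / det


def ellipse_band(z1, center, cov, q_star):
    """Lower and upper ordinates of the ellipse at abscissa z1, or None outside I1."""
    s11, s12, s22 = cov
    u = z1 - center[0]
    radicand = (s11 * s22 - s12 ** 2) * (s11 * q_star - u * u)
    if radicand < 0:
        return None
    middle = center[1] + s12 * u / s11
    half = math.sqrt(radicand) / s11
    return middle - half, middle + half


def is_flagged(point, center, cov, q_star):
    """True when the pair of DIF measures lies outside the ellipse."""
    band = ellipse_band(point[0], center, cov, q_star)
    return band is None or not (band[0] <= point[1] <= band[1])


# Hypothetical estimates: center (0, 0), variances 2.0 and 1.0, covariance 0.5.
print(ellipse_band(0.7, (0.0, 0.0), (2.0, 0.5, 1.0), 5.07))
```

Evaluating `ellipse_band` on a grid of abscissas inside the first interval yields the two branches of the contour, which can then be drawn with any plotting tool; items are located by their pair of DIF measures on the same axes.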
Although the robust outlier approach as such (as defined in Equations [7]
and [8]) has no limitations with respect to the number of groups, it cannot
easily be graphically interpreted when there are more than three focal groups.
The derivation of the ellipsoid becomes rapidly intractable beyond three focal
groups. For more than three groups, the identification of DIF items through
the condition in Equation (8) can still be performed, but without graphical support.
ILLUSTRATIVE EXAMPLE
Data and Original Analysis
The data were from a study by Kim et al. (1995) and consisted of three groups
of 200 students enrolled in calculus and precalculus mathematics courses at a
large Midwestern university in the Fall of 1990. The reference group was not
allowed to make use of scientific calculators during the test, whereas the two
focal groups were allowed to use a calculator, a different kind depending on the
group. Fourteen items were administered to the three groups of students. We
further refer to these data as the Math data set.
Kim et al. (1995) illustrated the generalized Lord’s test method with this
data set. The item parameters were estimated under the two-parameter logistic
(2PL) model, using the marginal maximum likelihood method, and a calibration
to the reference group scale was performed. The generalized Lord’s test was
implemented through iterative linking to reduce the impact of DIF items on the
linking procedure. This iterative linking is the equivalent of item purification. To
specify the results from the generalized Lord’s test and to detect between which
groups DIF occurs, we performed a pairwise comparison based on Lord’s test.
For these pairwise comparisons, the usual Bonferroni correction was adopted.
For all statistical tests an alpha level of .05 was used.
The following results were obtained. First, only item 14 was flagged as DIF
after a first run of the generalized Lord’s test. The purification process required
three iterations, and after these iterations, items 10 and 14 were flagged as
functioning differently between the three groups. Apparently, item 14 had a
masking effect on item 10, which was detected only thanks to the purification.
From the pairwise group comparisons it turned out that for the first focal group
in comparison with the reference group, none of the items were flagged as DIF.
When comparing the second focal group to the reference group, only item 14
was flagged as DIF, but the DIF of item 10 was almost significant. We conclude
that (a) item 14 exhibits DIF, mainly in the second focal group; (b) item 10
exhibits DIF only when the three groups are compared, but strictly speaking not
in any of the two focal groups separately; and (c) the other items are not flagged
as DIF. Before applying the outlier detection method on these data, we inspect
the scatter plot of DIF statistics and their bivariate normality.
Scatter Plot
A graphical representation of the DIF analysis can be obtained by using the
item parameter estimates rescaled to the reference group metric (see Kim et al.,
1995, Tables 3 and 5), and by computing the standardized areas between the
characteristic curves of the reference group and each focal group. Let $a_{j0}$ and
$a_{jf}$ ($f = 1, 2$) be the estimated discrimination parameters of item $j$ in the
reference group and in focal group $f$, and let $b_{j0}$ and $b_{jf}$ be the corresponding
estimated difficulty parameters. Then, the standardized area $z_{jf}$ between the
reference group and focal group $f$ takes the following form (Raju, 1988):
$$z_{jf} = \frac{H_{jf}}{\sigma(H_{jf})} \tag{15}$$
where
$$H_{jf} = \frac{2(a_{jf} - a_{j0})}{a_{jf}\,a_{j0}} \log\left[1 + \exp\left\{\frac{a_{jf}\,a_{j0}\,(b_{jf} - b_{j0})}{a_{jf} - a_{j0}}\right\}\right] - (b_{jf} - b_{j0}) \tag{16}$$
and $\sigma(H_{jf})$ can be found in Raju (1990, Equations 34–39). Under the null
hypothesis that item $j$ is not a DIF item, the standardized area $z_{jf}$ has an
asymptotic standard normal distribution (Lord, 1980; Raju, 1990). This motivates
the choice of the standardized area as a measure of DIF, which is important for
the outlier detection method.
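Equation (16) can be evaluated as follows. This Python sketch (hypothetical function names) computes the area H only; the standard error from Raju (1990) is not reproduced here, so the sketch stops short of the standardized area of Equation (15). The logarithm is evaluated in a numerically stable way because the exponent becomes very large when the two discriminations are close, and the formula is undefined when they are equal.

```python
import math


def log1p_exp(x):
    """log(1 + exp(x)) evaluated without overflow for large positive x."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def area_h(a_ref, b_ref, a_foc, b_foc):
    """Area H of Equation (16) from 2PL discriminations (a) and difficulties (b)."""
    if a_ref == a_foc:
        raise ValueError("Equation (16) is undefined for equal discriminations")
    diff_a = a_foc - a_ref
    diff_b = b_foc - b_ref
    prod_a = a_foc * a_ref
    return 2 * diff_a / prod_a * log1p_exp(prod_a * diff_b / diff_a) - diff_b


# Hypothetical item: reference group (a = 1.0, b = 0.0), focal group (a = 1.3, b = 0.4).
print(area_h(1.0, 0.0, 1.3, 0.4))
```

Dividing the returned value by its standard error, computed separately, would give the standardized area used as the DIF measure.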
Figure 1 displays the scatter plot of pairs of zjf statistics for each item,
represented by their number in the order of administration. The figure is a sheer
depiction of the DIF statistics without the application of an outlier method.
Clearly, items 10 and 14 depart from the rest of the items, mostly because of
larger zjf measures between the second focal group and the reference group,
but item 14 shows a somewhat deviating zjf measure also for the first focal
group in comparison with the reference group.
These findings are in agreement with the conclusions from Kim et al. (1995).
However, recall that their conclusions were drawn only after item purification,
whereas no purification or iterative procedure was used for Figure 1, and that
our conclusion thus far is drawn only on the basis of a visual inspection without
(yet) applying a distribution-based statistical inference procedure.
Exploratory Testing of Bivariate Normality
The assumption of asymptotic multivariate normality for the vectors zj of DIF
measures is central for identifying multivariate outliers. It is therefore of interest
to inspect the bivariate normality of the vectors .zj1; zj 2/ displayed in Figure 1.
We can expect bivariate normality only when there are no DIF items, but also
the presence of just a few DIF items may perhaps not distort the bivariate
distribution. Given that there are only 14 items, a test of bivariate normality
cannot be very powerful.
First, under the assumption of bivariate normality, the scatter plot of the
vectors of DIF measures should have an approximate elliptical shape. Apart
from items 10 and 14, the scatter plot looks like an ellipse. However, the
FIGURE 1 Scatter plot of measures of standardized areas between the item response
functions of each of the two focal groups and the reference group (math data set). The plot
shows focal group 1 measures versus focal group 2 measures (in both cases in comparison
with the reference group). Items are displayed by their rank number in the data set.
exclusion of items 10 and 14 is not based on an objective method. A more
formal approach to investigate the bivariate normality is the following. As stated
previously and under the hypothesis of bivariate normality, the statistic $\phi_j^*$ given
in Equation (5) has an exact beta distribution with parameters 1 and 5.5, given
that $F = 2$ ($F/2 = 1$, and $[14 - 2 - 1]/2 = 5.5$). This exact beta distribution
can be tested with a simple Kolmogorov-Smirnov one-sample test (Conover,
1971) using the empirical values of $\phi_j^*$ computed on the total set of items.
Rejecting the null hypothesis of a match between the sample distribution and
the theoretical beta distribution would therefore imply that the bivariate normal
distribution may not be adequate (but note that the presence of DIF items may
invalidate the test).
This Kolmogorov-Smirnov test was performed with the ks.test function of
the R software (R Development Core Team, 2010). The result was nonsignificant
(KS = 0.301, p = .128), which can be understood as a reasonable result given
that the number of items was small and that among them a few items were
DIF items. The test was repeated twice, with item 14 excluded, and with items
10 and 14 excluded. With just item 14 excluded, the KS test statistic equaled
0.243 (p = .367), and with also item 10 excluded, KS = 0.226 (p = .457). As
expected, the exclusion of the most outlying item(s) yielded an improvement of
the test results in favor of the null hypothesis (decreased KS statistics, increased
p values).
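As a rough check of this step, the KS statistic for the full item set can be recomputed from the nonrobust values reported in Table 2. The Python sketch below is an illustration only (the analysis in the text used ks.test in R); the function names are illustrative assumptions, and the sketch relies on the fact that a Beta(1, b) distribution has the closed-form distribution function 1 - (1 - x)^b. Because Table 2 reports values rounded to three decimals, the result should match the reported KS of 0.301 only approximately, and no p value is computed.

```python
# Nonrobust statistics for items 1-14, copied from Table 2.
phi = [0.009, 0.039, 0.077, 0.093, 0.129, 0.017, 0.094,
       0.027, 0.123, 0.501, 0.107, 0.137, 0.099, 0.700]


def beta_1_b_cdf(x, b):
    """Distribution function of Beta(1, b): 1 - (1 - x)^b on [0, 1]."""
    x = min(max(x, 0.0), 1.0)
    return 1.0 - (1.0 - x) ** b


def ks_statistic(sample, cdf):
    """Two-sided one-sample Kolmogorov-Smirnov statistic."""
    xs = sorted(sample)
    n = len(xs)
    # Largest gap above and below the theoretical distribution function.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


n_items, n_focal = 14, 2              # n = 14 items, F = 2 focal groups
b = (n_items - n_focal - 1) / 2       # second beta parameter, 5.5
ks = ks_statistic(phi, lambda x: beta_1_b_cdf(x, b))
print(round(ks, 3))
```

Dropping item 14, or items 10 and 14, from `phi` does not by itself reproduce the other two reported KS values, because the statistics (and the second beta parameter) would first have to be recomputed on the reduced item set.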
In sum, the assumption of bivariate normality is not rejected for the math
data set and can be used as the basis for the outlier detection approach in this
study. After these exploratory steps, we explain the outlier detection method,
beginning with how the robust multivariate estimates can be obtained.
Robust Estimates
The nonrobust estimates of $\mu$ and $\Sigma$ can be obtained directly through the common
estimation techniques. For the corresponding robust technique, four estimator
types are used: the Donoho-Stahel estimator, the constrained M estimator, the
MCD estimator, and the OGK estimator, one from each of the main categories
of robust multivariate estimators, as discussed in Section 3. All four are
implemented in the R package rrcov (Todorov & Filzmoser, 2009).
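For the bivariate case, the nonrobust estimates are simply the sample mean vector and the sample covariance matrix. The Python sketch below computes them for a small set of hypothetical $(z_{j1}, z_{j2})$ pairs (not the math data set) and then forms the scaled squared Mahalanobis distance $n d_j^2/(n-1)^2$. We assume here that this is the form of the statistic in Equation (5), which is consistent with its Beta(1, 5.5) reference distribution for 14 items and two focal groups; all function names are illustrative.

```python
def mean_and_cov(points):
    """Sample mean vector and covariance (n - 1 denominator) of 2-D points."""
    n = len(points)
    m1 = sum(x for x, _ in points) / n
    m2 = sum(y for _, y in points) / n
    s11 = sum((x - m1) ** 2 for x, _ in points) / (n - 1)
    s22 = sum((y - m2) ** 2 for _, y in points) / (n - 1)
    s12 = sum((x - m1) * (y - m2) for x, y in points) / (n - 1)
    return (m1, m2), (s11, s22, s12)


def squared_mahalanobis(point, mean, cov):
    """(z - mean)' S^{-1} (z - mean) for a 2 x 2 covariance matrix S."""
    (m1, m2), (s11, s22, s12) = mean, cov
    d1, d2 = point[0] - m1, point[1] - m2
    det = s11 * s22 - s12 ** 2
    return (s22 * d1 ** 2 - 2 * s12 * d1 * d2 + s11 * d2 ** 2) / det


# Hypothetical (z_j1, z_j2) pairs; the last point is a planted outlier.
z = [(-0.5, -0.8), (0.3, -0.2), (1.1, 0.4), (-1.2, -1.0), (0.6, -0.5),
     (-0.1, 0.2), (0.9, -0.9), (-0.7, 0.1), (0.2, 4.5)]
n = len(z)
mean, cov = mean_and_cov(z)

# Assumed form of the statistic: n * d^2 / (n - 1)^2, which lies in [0, 1].
phi = [n * squared_mahalanobis(p, mean, cov) / (n - 1) ** 2 for p in z]
for j, value in enumerate(phi, start=1):
    print(j, round(value, 3))
```

The planted outlier receives the largest value, but it also inflates the very mean and covariance used to judge it, which is the motivation for replacing these estimates with robust ones.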
The OGK method, in its one-step (not reweighted) version, does not require
fixing any tuning parameter. The Donoho-Stahel and the constrained M estimators
require the specification of the so-called asymptotic rejection probability (arp),
that is, the probability that a nonoutlying observation is eventually flagged
as outlying. This is closely related to the Type I error of a statistical test, which
is why we fixed the arp value to .05 for both methods. Finally, the MCD
method (in its not reweighted version) requires that the size of the optimal
subset be determined. This size is the a priori maximum number of
observations that are not outliers. The earlier visual inspection suggests that
the optimal subset consists of 12 items, but visual inspection is a rather
subjective method. A more objective method is to apply an elbow rule to a
relevant criterion. Possibly relevant criteria are the KS statistic we have
used previously and the determinant of the covariance matrix. Because outliers
have an impact on the nonrobust variance and covariance estimates, they also
tend to inflate the determinant of the covariance matrix, so the elbow criterion
may work. We have used the smallest determinant of the covariance matrix
among all possible subsets of $p$ items out of $n$, with $n/2 < p \le n$ ($n = 14$).
If the plot of the smallest determinants against the subset size $p$ shows an elbow,
then the corresponding value of $p$ is used as the size of the optimal subset. Figure 2
displays the results for subset sizes $p$ from 8 to 14. The elbow occurs at subset size 12,
which is the value we will use for the tuning parameter of the MCD estimator.
Although the elbow procedure for the MCD method may seem a reintroduction
FIGURE 2 Plot of the subset size against the smallest determinants of the covariance
matrix (math data set) as a basis to apply the elbow criterion.
of a purification process, it neither includes an iterative purification of DIF items
nor a repeated computation of DIF statistics.
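The exhaustive search behind the elbow criterion can be sketched as follows. The data are hypothetical (12 clustered points plus 2 planted outliers, not the math data set), the function names are illustrative, and the determinant is that of the usual sample covariance matrix of each subset. This is a brute-force illustration that is feasible only for small $n$; it is not the algorithm used by rrcov.

```python
from itertools import combinations


def cov_det(points):
    """Determinant of the 2 x 2 sample covariance matrix of 2-D points."""
    n = len(points)
    m1 = sum(x for x, _ in points) / n
    m2 = sum(y for _, y in points) / n
    s11 = sum((x - m1) ** 2 for x, _ in points) / (n - 1)
    s22 = sum((y - m2) ** 2 for _, y in points) / (n - 1)
    s12 = sum((x - m1) * (y - m2) for x, y in points) / (n - 1)
    return s11 * s22 - s12 ** 2


def smallest_determinants(points):
    """Smallest covariance determinant for each subset size n/2 < p <= n."""
    n = len(points)
    return {p: min(cov_det(subset) for subset in combinations(points, p))
            for p in range(n // 2 + 1, n + 1)}


# Hypothetical data: 12 regular points and 2 outliers on the second axis.
cluster = [(-1.0, -0.5), (-0.8, 0.3), (-0.5, -0.2), (-0.3, 0.6),
           (-0.1, -0.7), (0.0, 0.1), (0.2, -0.3), (0.4, 0.5),
           (0.6, -0.6), (0.8, 0.2), (1.0, -0.1), (1.2, 0.4)]
outliers = [(0.1, 9.0), (0.3, 12.0)]
z = cluster + outliers

dets = smallest_determinants(z)
for p in sorted(dets):
    print(p, round(dets[p], 4))
```

With $n = 14$ the search covers 6,476 subsets in total. In this constructed example the smallest determinant jumps sharply between subset sizes 12 and 13, because every subset of 13 points must contain at least one outlier; this jump is the elbow.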
Table 1 summarizes the estimates of the mean vector and the covariance
matrix with the five methods: the nonrobust estimation and the four robust
alternatives. The estimates of $\mu_1$ are rather similar across the five methods,
except for the OGK method, which returns a slightly lower value. The estimates
of $\mu_2$, however, are clearly lower for the four robust methods ($-0.606$ to $-0.640$
vs. 0.126), as a consequence of outlying values in the second dimension. With
respect to the variances, the robust estimates of the first variance component, $\sigma_{11}$,
are very close to the nonrobust one, whereas for the second variance component,
$\sigma_{22}$, the robust estimates are much smaller (0.672–0.848) than the nonrobust
estimate (2.011), which is in agreement with the findings for the means. Finally,
the robust estimates of the covariance $\sigma_{12}$ differ from the nonrobust one ($-0.201$
to $-0.062$ vs. 0.598).
In sum, the presence of outlying items has a direct impact on the nonrobust
estimates, especially for the second dimension (when comparing the second
TABLE 1
Nonrobust and Robust Parameter Estimates of the Mean Vector and
Covariance Matrix of the Bivariate Standardized Areas (Math Data Set)

Method       $\mu_1$   $\mu_2$   $\sigma_{11}$   $\sigma_{22}$   $\sigma_{12}$
Nonrobust    0.021     0.126     1.572           2.011           0.598
D-S          0.085    -0.606     1.508           0.742          -0.201
M            0.094    -0.629     1.467           0.672          -0.175
MCD          0.085    -0.606     1.447           0.712          -0.193
OGK          0.116    -0.640     1.443           0.848          -0.062

Note. D-S = Donoho-Stahel estimator; M = constrained M estimator;
MCD = minimum covariance determinant estimator; OGK = orthogonalized
Gnanadesikan-Kettenring estimator.
focal group with the reference group). Because the robust estimates agree with
one another but differ from the nonrobust estimates, we expected the robust
methods to agree in their DIF identification and to differ somewhat from the
nonrobust method.
DIF Identification
Based on the decision rules of Equations (6) and (8), items 10 and 14 were
identified as DIF with all five methods, using an alpha of .05. Table 2 displays
the nonrobust statistics $\phi_j^*$ in Equation (5), and also the statistics $\tilde{\phi}_j^*$ in Equation
(7) for each robust estimator. The DIF detection threshold is equal to 0.420, that
is, the .95 quantile of the beta distribution with parameters 1 and 5.5.
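The threshold can be reproduced by hand, because the quantile function of a Beta(1, b) distribution has the closed form $1 - (1 - p)^{1/b}$. The Python sketch below (illustrative function and variable names) computes the .95 quantile for b = 5.5 and applies it to the nonrobust column of Table 2; it is meant as a check of the arithmetic, not as the implementation used for the analysis.

```python
def beta_1_b_quantile(p, b):
    """Quantile of Beta(1, b): the x that solves 1 - (1 - x)^b = p."""
    return 1.0 - (1.0 - p) ** (1.0 / b)


alpha = 0.05
threshold = beta_1_b_quantile(1 - alpha, 5.5)

# Nonrobust statistics from Table 2, keyed by item number.
nonrobust = {1: 0.009, 2: 0.039, 3: 0.077, 4: 0.093, 5: 0.129,
             6: 0.017, 7: 0.094, 8: 0.027, 9: 0.123, 10: 0.501,
             11: 0.107, 12: 0.137, 13: 0.099, 14: 0.700}

# An item is flagged as DIF when its statistic exceeds the threshold.
flagged = sorted(item for item, value in nonrobust.items() if value > threshold)
print(round(threshold, 3), flagged)
```

The computed threshold agrees with the reported value of 0.420 to three decimals, and only items 10 and 14 exceed it.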
The nonrobust ellipse is given in Figure 3, and the four robust ellipses are
shown in Figure 4. Despite differences between the ellipses of the five methods,
they all return the same conclusion that items 10 and 14 are DIF items. The
gap between DIF (outside) and non-DIF (inside) items is clearer with the robust
methods than with the nonrobust method.
We focus now on pairwise comparisons. Starting from an ellipse, the pairwise
comparison can be achieved by projecting the items and the ellipses onto the
axes of the corresponding figure. The item projections in Figures 3 and 4 are
indicated by thin narrow lines on the axes, while the ellipse projections are
indicated as intervals delineated with thick brackets. The intervals in Figure 4
are for the largest ranges among the four robust ellipses.
Comparing the first focal group and the reference group (the projection onto
the x-axis), all items are located inside the non-DIF interval for the nonrobust
method as well as for the robust methods, which is in line with the findings by
Kim et al. (1995). Comparing the second focal group and the reference group
TABLE 2
Nonrobust and Robust Statistics $\phi_j^*$ and $\tilde{\phi}_j^*$ (Math Data Set)

Item   Nonrobust   D-S       M         MCD       OGK
1      0.009       0.004     0.005     0.004     0.003
2      0.039       0.225     0.257     0.235     0.218
3      0.077       0.055     0.056     0.057     0.034
4      0.093       0.116     0.123     0.121     0.118
5      0.129       0.109     0.111     0.114     0.059
6      0.017       0.113     0.131     0.118     0.112
7      0.094       0.066     0.066     0.068     0.034
8      0.027       0.033     0.035     0.034     0.046
9      0.123       0.123     0.126     0.129     0.181
10     0.501^a     1.017^a   1.102^a   1.060^a   0.795^a
11     0.107       0.078     0.079     0.081     0.046
12     0.137       0.133     0.136     0.139     0.192
13     0.099       0.126     0.132     0.132     0.136
14     0.700^a     2.117^a   2.360^a   2.206^a   2.199^a

Note. D-S = Donoho-Stahel estimator; M = constrained M estimator;
MCD = minimum covariance determinant estimator; OGK = orthogonalized
Gnanadesikan-Kettenring estimator.
^a Larger than the quantile $Q_{\mathrm{Beta}}(0.95; 1, 5.5) = 0.420$ of the beta
distribution with parameters 1 and 5.5, at a significance level of .05.
(the projection on the y-axis), item 14 is located outside the non-DIF interval
for the nonrobust and the robust methods, whereas item 10 is located outside
the non-DIF interval only for the robust methods. Also these results are in line
with the findings by Kim et al., except for item 10 and the robust ellipses. The
pairwise Lord test did not reveal item 10 as a statistically significant DIF item
while our robust outlier method clearly does.
Reference Group
In this example the reference group was naturally defined as the group without
scientific calculators. To illustrate that the multiple-group DIF identification does
not depend on which group is chosen as the reference group, we also considered,
separately, each of the groups with a calculator as the reference group, and
repeated a DIF analysis using the nonrobust and robust methods.
It turned out that although the areas zjf were obviously different from those
obtained previously, the items identified as DIF items were the same. That is,
the nonrobust and the robust methods flagged items 10 and 14 as DIF, and only
those two. This illustrates the invariance of the results with respect to the choice
of the reference group.
FIGURE 3 Scatter plot of measures of standardized areas between the item response
functions of each of the two focal groups and the reference group (math data set). The plot
shows focal group 1 measures versus focal group 2 measures (in both cases in comparison
with the reference group). Items are displayed by their rank number in the data set. The
contour is determined on the basis of the nonrobust method. The marks on the axes are orthogonal
projections of the items onto these axes. The rectangular brackets are orthogonal projections
of the ellipse onto the axes.
DISCUSSION
In the present study we proposed to consider DIF items as multivariate outliers.
The method consists of identifying outlying items as items whose Mahalanobis
distance is larger than a well-defined quantile under the null hypothesis
of no DIF. Robust statistical estimators are proposed instead of the regular,
nonrobust estimators to improve the accuracy of the outlier detection. Robust
statistics is a common approach in many fields of research (e.g., biometrics or
econometrics) but seems to be fairly new in psychometrics (but see Zijlstra,
FIGURE 4 Scatter plot of measures of standardized areas between the item response
functions of each of the two focal groups and the reference group (math data set). The plot
shows focal group 1 measures versus focal group 2 measures (in both cases in comparison
with the reference group). Items are displayed by their rank number in the data set. The
four contours are determined on the basis of Donoho-Stahel estimates (D-S), constrained M
estimates (M), MCD estimates (MCD), and OGK estimates (OGK). The marks on the axes
are orthogonal projections of the items onto these axes. The rectangular brackets are
orthogonal projections of the ellipses onto the axes.
van der Ark, & Sijtsma, 2007, 2011), and certainly for the identification of DIF
(Magis & De Boeck, 2011).
The outlier detection approach is straightforward and easy to implement, and
up to three groups, a simple graphical representation is available. An impor-
tant asset is that the method does not require iterative purification procedures.
The math data set, which was analyzed in this study, illustrates the potential
usefulness of this method. Given the theoretical potential of the method and its
success in the application, it seems promising also for large-scale international
comparison studies and for cross-cultural equivalence studies. However, the
present study must be considered primarily as a proof of concept for DIF as
a robust outlier phenomenon. Further investigation is necessary to develop the
method and find out more precisely how accurate and practical the approach is.
The robust outlier methods can be further developed in several ways. First,
it is an interesting research issue how to determine detection criteria when
the DIF statistics are not normally distributed. This would be particularly useful
for methods whose DIF statistics are basically chi-square distributed. For the
case in which the distribution is unknown, bootstrap methods may be developed.
Second, tools can be developed for exploring and testing more precisely where
(between which groups) the DIF occurs. The graphical approach presented
previously cannot be used beyond three groups.
To evaluate the robust outlier methods, they must be carefully compared
with the more traditional methods on their accuracy and efficiency. Extensive
simulation studies may help for this evaluation. An interesting aspect of the
comparison is that the robust outlier approach relies on statistics as used in
traditional methodology. The difference does not concern the DIF statistics
that are used, but rather the application of the robust outlier principle to these
statistics.
The robust outlier principle is based on a kind of logic that is also worth
considering for the exploration and identification of other discrepancies between
a model and the data that are due to discrepant data patterns associated with
items or persons. DIF is an example of such a discrepancy, but other examples may
be of interest as part of a model fit investigation that is more specific than
global goodness-of-fit statistics allow.
ACKNOWLEDGMENTS
This research was presented at the 75th annual meeting of the Psychometric
Society (Athens, GA, July 2010). The authors wish to thank the editor and the
anonymous referees for insightful suggestions. The research was financially sup-
ported by a postdoctoral research grant “Chargé de recherches” of the National
Funds for Scientific Research (FNRS, Belgium) and the Research Funds of K.U.
Leuven (GOA/10/02).
REFERENCES
Ackerman, T. A. (1992). A didactic explanation of item bias, item impact, and item validity from a
multidimensional perspective. Journal of Educational Measurement, 29, 67–91.
Angoff, W. H., & Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude. Journal
of Educational Measurement, 10, 95–106.
Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differential item
functioning test items. Educational Measurement: Issues and Practice, 17, 31–44.
Conover, W. J. (1971). Practical nonparametric statistics. New York, NY: Wiley.
Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to
assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of
Educational Measurement, 23, 355–368.
Gnanadesikan, R., & Kettenring, J. (1972). Robust estimates, residuals, and outlier detection with
multiresponse data. Biometrics, 28, 81–124.
Hardin, J., & Rocke, D. M. (2005). The distribution of robust distances. Journal of Computational
and Graphical Statistics, 14, 928–946.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel
procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ: Erlbaum.
Holland, P. W., & Wainer, H. (1993). Differential item functioning. Hillsdale, NJ: Erlbaum.
Huber, P. J., & Ronchetti, E. M. (2009). Robust statistics (second edition). Hoboken, NJ: Wiley.
Johnson, R. A., & Wichern, D. W. (1998). Applied multivariate statistical analysis (4th ed.). Upper
Saddle River, NJ: Prentice-Hall.
Kim, S.-H., Cohen, A. S., & Park, T.-H. (1995). Detection of differential item functioning in multiple
groups. Journal of Educational Measurement, 32, 261–276.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale,
NJ: Erlbaum.
Magis, D., & De Boeck, P. (2011). A robust outlier approach to prevent Type I error inflation in
DIF. Research report, Department of Mathematics, University of Liège, Belgium.
Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National
Institute of Science of India, 2, 49–55.
Mardia, K., Kent, J., & Bibby, J. (1979). Multivariate Analysis. New York, NY: Academic Press.
Maronna, R. A., Martin, D., & Yohai, V. J. (2006). Robust statistics: Theory and methods. New
York, NY: Wiley.
Maronna, R. A., & Yohai, V. J. (1995). The behavior of the Stahel-Donoho robust multivariate
estimator. Journal of the American Statistical Association, 90, 330–341.
Maronna, R. A., & Zamar, R. H. (2002). Robust estimates of location and dispersion for high-
dimensional datasets. Technometrics, 44, 307–317.
Miller, R. G. (1981). Simultaneous statistical inference (2nd ed.). New York, NY: Springer-Verlag.
Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing
measurement bias. Applied Psychological Measurement, 17, 297–334.
Osterlind, S. J., & Everson, H. T. (2009). Differential item functioning (2nd ed.). Thousand Oaks,
CA: Sage.
Penfield, R. D. (2001). Assessing differential item functioning among multiple groups: a comparison
of three Mantel-Haenszel procedures. Applied Measurement in Education, 14, 235–259.
Penfield, R. D., & Camilli, G. (2007). Differential item functioning and item bias. In C. R. Rao
& S. Sinharay (Eds.), Handbook of statistics 26: Psychometrics (pp. 125–167). Amsterdam, the
Netherlands: Elsevier.
R Development Core Team. (2010). R: A language and environment for statistical computing.
Vienna, Austria: R Foundation for Statistical Computing.
Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53, 495–502.
Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between
two item response functions. Applied Psychological Measurement, 14, 197–207.
Rocke, D. M. (1996). Robustness properties of S-estimates of multivariate location and shape in
high dimension. Annals of Statistics, 24, 1327–1345.
Rousseeuw, P. J., & Croux, C. (1993). Alternatives to the median absolute deviation. Journal of the
American Statistical Association, 88, 1273–1283.
Rousseeuw, P. J., & Leroy, A. M. (1987). Robust regression and outlier detection. New York, NY:
Wiley.
Rousseeuw, P. J., & van Driessen, K. (1999). A fast algorithm for the minimum covariance deter-
minant estimator. Technometrics, 41, 212–223.
Shealy, R. T., & Stout, W. (1993). A model based standardization approach that separates true
bias/DIF from group ability differences and detects test bias/DIF as well as item bias/DIF.
Psychometrika, 58, 159–194.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic
regression procedures. Journal of Educational Measurement, 27, 361–370.
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group
difference in trace lines. In H. Wainer & H. Braun (Eds.), Test validity (pp. 147–170). Hillsdale,
NJ: Erlbaum.
Todorov, V., & Filzmoser, P. (2009). An object-oriented framework for robust multivariate analysis.
Journal of Statistical Software, 32, 1–47.
Woodruff, D. L., & Rocke, D. M. (1994). Computable robust estimation of multivariate location
and shape on high dimension using compound estimators. Journal of the American Statistical
Association, 89, 888–896.
Zijlstra, W. P., van der Ark, L. A., & Sijtsma, K. (2007). Outlier detection in test and questionnaire
data. Multivariate Behavioral Research, 42, 531–555.
Zijlstra, W. P., van der Ark, L. A., & Sijtsma, K. (2011). Robust Mokken scale analysis by means of
the forward search algorithm for outlier detection. Multivariate Behavioral Research, 46, 58–89.
APPENDIX
This section provides the derivation of Equations (11) and (12) from the
inequality (10). To simplify the notation, set
$F(z_{j1}, z_{j2}) = (z_j - \bar{z})' S^{-1} (z_j - \bar{z})$
as the squared Mahalanobis distance, so that Equation (10) takes the simple
form $F(z_{j1}, z_{j2}) \le Q^*_\alpha$. By definition, the interval $I_1$ must be such that $z_{j1} \notin I_1$
if and only if $F(z_{j1}, z_{j2}) > Q^*_\alpha$ for any value of $z_{j2}$, or equivalently if and
only if the smallest value of $F(z_{j1}, z_{j2})$ (over all $z_{j2}$ values) is larger than $Q^*_\alpha$.
This minimum value is obtained as follows. First, one can rewrite $F(z_{j1}, z_{j2})$
as follows:

$$F(z_{j1}, z_{j2}) = \frac{1}{\det S} \left\{ s_{22}(z_{j1} - \bar{z}_1)^2 - 2 s_{12}(z_{j1} - \bar{z}_1)(z_{j2} - \bar{z}_2) + s_{11}(z_{j2} - \bar{z}_2)^2 \right\} \tag{17}$$

where $\det S = s_{11} s_{22} - s_{12}^2$ is the determinant of the covariance matrix $S$. Note
that $F(z_{j1}, z_{j2})$ is convex in $z_{j2}$ and reaches its minimum value whenever

$$z_{j2} = \bar{z}_2 + \frac{s_{12}}{s_{11}} (z_{j1} - \bar{z}_1) = z_{j2}^*. \tag{18}$$
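In sum, the minimum value of $F(z_{j1}, z_{j2})$ with respect to $z_{j2}$ is equal to

$$\min_{z_{j2}} F(z_{j1}, z_{j2}) = F(z_{j1}, z_{j2}^*) = \frac{(z_{j1} - \bar{z}_1)^2}{s_{11}}. \tag{19}$$

Therefore

$$z_{j1} \notin I_1 \iff \frac{(z_{j1} - \bar{z}_1)^2}{s_{11}} > Q^*_\alpha \iff |z_{j1} - \bar{z}_1| > \sqrt{s_{11} Q^*_\alpha} \iff z_{j1} \notin \left[ \bar{z}_1 \pm \sqrt{s_{11} Q^*_\alpha} \right], \tag{20}$$

as expected. Now keep $z_{j1}$ fixed in $I_1$, which implies that $(z_{j1} - \bar{z}_1)^2 \le s_{11} Q^*_\alpha$.
The condition in Equation (10) can be written as follows, using Equation (17):

$$s_{22}(z_{j1} - \bar{z}_1)^2 - 2 s_{12}(z_{j1} - \bar{z}_1)(z_{j2} - \bar{z}_2) + s_{11}(z_{j2} - \bar{z}_2)^2 - Q^*_\alpha \det S \le 0, \tag{21}$$

and the left-hand side of Equation (21) is a convex quadratic function of $z_{j2}$. The
discriminant of this quadratic function is equal to

$$\Delta = 4 \left\{ s_{11} Q^*_\alpha - (z_{j1} - \bar{z}_1)^2 \right\} \det S \tag{22}$$

and is nonnegative since $z_{j1}$ belongs to $I_1$. In sum, condition (21) is satisfied if and
only if

$$z_{j2} \in \left[ \bar{z}_2 + \frac{2 s_{12}(z_{j1} - \bar{z}_1) \pm \sqrt{\Delta}}{2 s_{11}} \right] \tag{23}$$

(where $[a \pm b]$ stands for the interval $[a - b; a + b]$), which is equal to Equation
(12).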
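The derivation can be checked numerically. The Python sketch below implements Equations (17), (20), (22), and (23) for hypothetical values of the mean vector, the covariance matrix, and the threshold $Q^*_\alpha$ (none of them taken from the math data set; all names are illustrative), and verifies that the endpoints of the conditional interval lie exactly on the ellipse and that the minimum in Equation (19) is attained at $z_{j2}^*$.

```python
import math


def quad_form(z1, z2, mean, cov):
    """F(z1, z2) of Equation (17): squared Mahalanobis distance in 2-D."""
    (m1, m2), (s11, s22, s12) = mean, cov
    det = s11 * s22 - s12 ** 2
    d1, d2 = z1 - m1, z2 - m2
    return (s22 * d1 ** 2 - 2 * s12 * d1 * d2 + s11 * d2 ** 2) / det


def interval_axis1(mean, cov, q):
    """Projection I1 of the ellipse F <= q onto the first axis (Eq. 20)."""
    half = math.sqrt(cov[0] * q)
    return mean[0] - half, mean[0] + half


def interval_axis2_given_z1(z1, mean, cov, q):
    """Range of z2 with F(z1, z2) <= q for a fixed z1 in I1 (Eq. 23)."""
    (m1, m2), (s11, s22, s12) = mean, cov
    det = s11 * s22 - s12 ** 2
    disc = 4 * (s11 * q - (z1 - m1) ** 2) * det      # Equation (22)
    centre = m2 + s12 * (z1 - m1) / s11              # z*_j2 of Equation (18)
    half = math.sqrt(disc) / (2 * s11)
    return centre - half, centre + half


mean = (0.1, -0.2)       # hypothetical mean vector
cov = (1.5, 0.8, 0.3)    # hypothetical (s11, s22, s12)
q = 2.0                  # hypothetical threshold Q*_alpha

i1 = interval_axis1(mean, cov, q)
z1 = 0.9                 # a value inside I1
lo, hi = interval_axis2_given_z1(z1, mean, cov, q)
z2_star = mean[1] + cov[2] * (z1 - mean[0]) / cov[0]

print(i1, (lo, hi))
print(quad_form(z1, lo, mean, cov), quad_form(z1, hi, mean, cov))
print(quad_form(z1, z2_star, mean, cov), (z1 - mean[0]) ** 2 / cov[0])
```

Both endpoint evaluations return the threshold up to rounding error, and the last line prints two equal values, as Equation (19) requires.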