British Journal of Mathematical and Statistical Psychology (1998), 51, 289–309.
Normal Theory Based Test Statistics in Structural Equation Modeling∗
Ke-Hai Yuan and Peter M. Bentler
University of California, Los Angeles
September 12, 1997; Revised, February 20, 1998
∗This research was supported in part by the National Institute on Drug Abuse, grants DA01070 and DA00017.
Even though data sets in psychology are seldom normal, the statistics used to evaluate covariance
structure models are typically based on the assumption of multivariate normality. Consequently,
many conclusions based on normal theory methods are suspect. In this paper, we develop test
statistics that can be correctly applied to the normal theory maximum likelihood estimator. We
propose three new asymptotically distribution free (ADF) test statistics that technically must yield
improved behavior in realistic-sized samples, and use Monte Carlo methods to study their actual
finite sample behavior. Results indicate that there exists an ADF test statistic that also performs
quite well in finite sample situations. Our analysis shows that various forms of ADF test statistics
are sensitive to model degrees of freedom rather than to model complexity. A new index is proposed
for evaluating whether a rescaled statistic will be robust. Recommendations are given regarding
the application of each test statistic.
1. Introduction
With the advance of factor analytic models to more flexible and confirmatory models, and with
the help of popular software like LISREL (Jöreskog & Sörbom, 1993) and EQS (Bentler, 1995),
the literature on structural equation modeling has increased dramatically in the past decade (see,
e.g., Austin & Calderon, 1996; Austin & Wolfle, 1991; Tremblay & Gardner, 1996). There is a
vast amount of recent introductory (Dunn et al., 1993; Byrne, 1994; Mueller, 1996; Schumacker
& Lomax, 1996; Ullman, 1996) and overview material (Hoyle, 1995; Marcoulides & Schumacker,
1996). However, these sources are not complete enough to accurately describe the problems and
alternatives associated with the most critical part of modeling, namely, model evaluation, especially
under conditions with real data that tend to be badly distributed. As recently reviewed by Bentler
and Dudgeon (1996), there exist test statistics, proposed by Browne (1984), whose asymptotic
distributions are known, but their finite sample behaviors are very disappointing (Hu et al., 1992;
Chan, 1995); there also exists a rescaled statistic, proposed by Satorra and Bentler (1988, 1994),
that works well in a variety of distribution situations (Curran et al., 1996; Hu et al., 1992), however,
its asymptotic distribution is generally not known. In this paper, we will propose and study some
new test statistics. We aim to find statistics whose asymptotic distributions are known that also
work well in finite samples with normal and nonnormal data.
Although many methods exist in structural equation modeling, we will concentrate our efforts
on tests associated with Wishart maximum likelihood (ML) estimation for several reasons. First,
the field largely developed on the basis of this classical method of estimation and the associated
likelihood ratio test TML for model evaluation (e.g., Bollen, 1989; Jöreskog, 1969). Second, the
ML method has been implemented in virtually all structural equation modeling software packages.
Third, ML is by far the most widely used method in practice (e.g., Breckler, 1990; Gierl & Mulvenon,
1995). Our new statistics will result from modifications to an existing statistic by Browne (1984,
Proposition 4). A detailed analysis will be given on why the existing statistic fails when model
degrees of freedom get large. Since the Satorra-Bentler rescaled statistic has been reported in many
places to perform well, we will also give a new characterization which permits the robustness of this
rescaled statistic to be evaluated. To motivate the development, we will use a real data example to
demonstrate the inadequacy of existing statistics applied to psychological data.
Harlow, Stein, and Rose (1998) studied a variety of scales and measures regarding psychosocial
functioning, sexual experience, and substance use in an attempt to understand the antecedents of
HIV-risky sexual behaviors in two samples of women assessed across three time points. Nineteen
of their variables, a subset of their study variables for two time points, were made available to us.
Exploratory factor analysis might be used to find out how many latent factors would be needed to
explain the 190 elements of the covariance matrix (171 intercorrelations and 19 variances) of these 19
variables. However, the authors believed that a confirmatory factor model with six factors based on
53 parameters and 137 degrees of freedom could explain the variances and covariances. In particular,
the authors hypothesized that four variables called meaninglessness in life, stress, demoralization,
and powerlessness would be highly related and would measure one latent factor they called Poor
Psychosocial Functioning. The frequency, intensity, and amount of alcohol use were hypothesized
to correlate highly and be good indicators of a latent Alcohol Use factor. Variables of body kissing,
genital kissing and touching, and sexual intercourse from various positions were expected to be good
markers of a latent factor of Common Sexual Experience. Variables of positive, negative and control
psychosexual attitudes were expected to mark a Positive Psychosexual Attitudes factor. Similarly,
Drug Use was indicated by recreational drug use frequency, hard drug use frequency, and amount
of drug use, while Diverse Sexual Experience was marked by third-person sex, use of toys or films
with sex, and engaging in sex that is hurtful or painful.
Letting the six factors be correlated, the authors’ substantive theory implies that their 19 vari-
ables X could be generated by a model X = Λf +ε with Λ(19×6), f(6×1), and ε(19×1). Based on
a sample with size N=213, their model was evaluated by normal theory ML. When referred to the
nominal χ²_137 with TML = 230.46, the associated p-value is 1.01 × 10⁻⁶, implying that the model is not
adequate. However, other evidence, such as a small average absolute residual of about .05 between
the data and model-reproduced correlation matrices, implies that the model may be statistically
adequate. The significant TML may be due to the nonnormality of the data. In fact, examining
the distribution of this data set, we find that the largest marginal kurtosis is 12.96, which is associ-
ated with the hard drug use variable, and Mardia’s normalized multivariate kurtosis coefficient was
11.87. Hence, we cannot believe the normal theory statistic TML. We shall return to this example
after we present some more reliable test statistics.
In the rest of this section, we briefly review the two test statistics proposed by Browne and
Satorra and Bentler. Our new test statistics will be given in the next section. The empirical
performances of these statistics will be studied in sections 3 and 4.
Let Xi = (xi1, . . . , xip)′, i = 1, . . . , N = n + 1, be a sample from X = (x1, . . . , xp)′, where
cov(X) = Σ is the population covariance matrix and S is its sample analogue. A covariance structure
on Σ can be expressed as a matrix function Σ = Σ(θ0) of q unknown parameters. Assuming the
normality of X, the estimate θ is obtained by minimizing
FML(θ) = tr(SΣ−1(θ)) − log |SΣ−1(θ)| − p, (1)
and TML = nFML(θ) is the associated statistic for evaluating the hypothetical model Σ = Σ(θ0).
Under the assumption of multivariate normality and the null hypothesis, TML is asymptotically distributed as χ²_{p∗−q}, where p∗ = p(p + 1)/2. Conditions also exist for normal theory inference to
be valid for nonnormal data with some specific models (Amemiya & Anderson, 1990; Anderson
& Amemiya, 1988; Browne, 1987; Browne & Shapiro, 1988; Mooijaart & Bentler, 1991; Satorra
& Bentler, 1990, 1991; Shapiro, 1987). Unfortunately, there is no effective way of verifying these
conditions in practice. When these conditions are not satisfied, the test statistic TML completely
breaks down as reported by Hu et al. (1992). So if the normal theory based statistic is going to be
used in conditions for which it was not designed, such as with nonnormal data, it will have to be
modified in some way.
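As a computational illustration, the discrepancy function (1) and the statistic TML can be sketched as follows. The 2 × 2 matrices here are hypothetical, chosen purely for illustration and not taken from any data set in the paper:

```python
import numpy as np

def f_ml(S, Sigma):
    """Normal theory ML discrepancy (1): tr(S Sigma^-1) - log|S Sigma^-1| - p."""
    p = S.shape[0]
    A = S @ np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(A)
    return np.trace(A) - logdet - p

# Hypothetical matrices, purely for illustration:
S = np.array([[1.0, 0.5], [0.5, 1.2]])      # sample covariance
Sigma = np.array([[1.0, 0.4], [0.4, 1.2]])  # model-implied covariance Sigma(theta)
N = 213
T_ml = (N - 1) * f_ml(S, Sigma)             # T_ML = n F_ML with n = N - 1
```

Under normality and a correct model, T_ml would be referred to a chi-square distribution with p∗ − q degrees of freedom; the discrepancy is zero exactly when S equals the model-implied covariance.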
Since the assumption of multivariate normality is often unrealistic for data sets in psychology
(e.g., Micceri, 1989), Browne (1984) proposed a test statistic which does not require any specific
distribution assumptions of the observed data. Let vech(·) be an operator which transforms a
symmetric matrix into a vector by picking the nonduplicated elements in the matrix, s = vech(S), σ(θ) = vech(Σ(θ)), and denote the p∗ × q Jacobian matrix corresponding to σ(θ) as σ̇(θ). Then there exists a full column rank p∗ × (p∗ − q) matrix σc(θ) whose columns are orthogonal to those of σ̇(θ). Let Yi = vech[(Xi − X̄)(Xi − X̄)′], and let SY be the corresponding sample covariance matrix of the Yi.
For an estimate θ, the test statistic given by Browne (1984) is defined as

TB(θ) = n e′σc(θ)[σ′c(θ)SY σc(θ)]⁻¹σ′c(θ)e, (2)

where e = s − σ(θ). When SY is nonsingular, another equivalent form for TB, for which one does not need to compute σc(θ), is

TB(θ) = n e′[SY⁻¹ − SY⁻¹σ̇(θ)(σ̇′(θ)SY⁻¹σ̇(θ))⁻¹σ̇′(θ)SY⁻¹]e. (3)
The test statistic TB(θ) asymptotically follows χ²_{p∗−q} as long as θ is consistent. When θ is obtained by minimizing

FGLS(θ) = (s − σ(θ))′SY⁻¹(s − σ(θ)), (4)

it is called the asymptotically distribution free (ADF) estimate and TADF = nFGLS(θ) is called the
classical ADF test statistic. We will refer to TB(θ) as the residual-based ADF test statistic for a
general consistent estimate θ. When sample size is large enough, the statistic TADF does perform
as expected (Chou et al., 1991; Curran et al., 1996; Hu et al., 1992; Muthen & Kaplan, 1985,
1992; Tanaka, 1984). However, in the typical situation with large models and small to moderate
sample sizes, TADF rejects correct models far too frequently, e.g., yielding up to 68% rejection of
the correct model in the study of Curran et al. (1996). This problem was substantially eliminated by
Yuan and Bentler (1997a, in press a), who proposed some new test statistics based on minimizing
(4). Their test statistics are also asymptotically distribution free but outperform the classical ADF
test in small to moderate sample sizes. However, there often exist computational problems and
nonconvergent solutions when minimizing the function FGLS(θ) in (4) when model size is large and
sample size is small, and no statistics can be obtained without a convergent solution. This type of
problem hardly exists with minimizing (1), which is another argument for using ML when data are
not normal. However, there is not much point to using ML if it is impossible to accurately evaluate
models. The purpose of this paper is to develop some new procedures to evaluate models based on
ML estimates.
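The residual-based statistic in (3) can be sketched numerically as follows. The inputs below (a fourth-moment estimate SY, a Jacobian J standing in for σ̇(θ), and a residual vector e) are randomly generated stand-ins for illustration, not output from a fitted model:

```python
import numpy as np

def t_b(e, SY, J, n):
    """Residual-based ADF statistic, form (3):
    T_B = n e'[SY^-1 - SY^-1 J (J' SY^-1 J)^-1 J' SY^-1] e,
    with e = s - sigma(theta) and J the p* x q Jacobian of sigma(theta)."""
    SYinv = np.linalg.inv(SY)
    M = SYinv - SYinv @ J @ np.linalg.inv(J.T @ SYinv @ J) @ J.T @ SYinv
    return float(n * e @ M @ e)

# Randomly generated stand-ins (small dimensions, illustration only):
rng = np.random.default_rng(0)
p_star, q = 6, 2
A = rng.standard_normal((p_star, p_star))
SY = A @ A.T + p_star * np.eye(p_star)  # a positive definite fourth-moment estimate
J = rng.standard_normal((p_star, q))
e = rng.standard_normal(p_star)
TB = t_b(e, SY, J, n=199)
```

Note that a residual lying in the column space of J contributes nothing to the quadratic form, which reflects the fact that TB measures only the part of s − σ(θ) orthogonal to the model.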
When the normality assumption does not hold, we can still estimate the unknown parameter θ0 by minimizing the function FML(θ), but TML will generally approach a nonnegative random variable Q instead of χ²_{p∗−q}. Let W = 2⁻¹D′p(Σ⁻¹ ⊗ Σ⁻¹)Dp, where Dp is the p² × p∗ duplication matrix as defined in Magnus and Neudecker (1988, p. 49), and σ̇ = σ̇(θ0); then

U = W − Wσ̇(σ̇′Wσ̇)⁻¹σ̇′W

is the residual weight matrix, as so named in Bentler and Dudgeon (1996). Satorra and Bentler
(1988) decomposed Q = Σ_{i=1}^{p∗−q} αiτi, where the αi's are the nonzero eigenvalues of the matrix UΓ with Γ = cov{vech[(X − µ)(X − µ)′]}, and the τi's are independent chi-square variates, each with one degree of freedom. When all the αi's are equal to 1, which is the case for normal data, Q follows χ²_{p∗−q}; when all the αi's are equal to a common value α, Q follows αχ²_{p∗−q}. Based on this, Satorra and Bentler
proposed the statistic
TSB = [(p∗ − q)/tr(ÛΓ̂)] TML, (5)

where Γ̂ = SY and Û is a consistent estimate of U. When all the αi's are equal, tr(ÛSY) approaches (p∗ − q)α, and TSB approaches χ²_{p∗−q}. Generally, the asymptotic distribution of TSB is unknown.
Let ᾱ be the mean and SD(α) = [Σ_{i=1}^{p∗−q} (αi − ᾱ)²/(p∗ − q − 1)]^{1/2} the standard deviation of the αi's, respectively. Then the coefficient of variation of the αi's is given by CV(α) = SD(α)/ᾱ. So TSB is asymptotically robust when CV(α) = 0, and TSB will not be asymptotically robust in general.
Existing empirical studies such as Chou et al. (1991), Curran et al. (1996), and Hu et al. (1992)
did not report the index CV(α). They only reported the marginal skewnesses and kurtoses of the
observed variables. However, Yuan and Bentler (in press b) showed that even though the observed
variables have heterogeneous marginal skewnesses and kurtoses, CV(α) can still be zero. So it is
impossible to know whether previous Monte Carlo studies violated the condition CV(α)=0 when
reporting the robustness of TSB. We will study TSB by specifying different CV(α) in our designed
conditions. This will allow us to evaluate the robustness of TSB when CV(α) ≠ 0. There are several
other statistics that rescale TML (e.g., Kano, 1992; Shapiro & Browne, 1987). We will not study
them here since they are not as general as the statistic TSB in robustness (e.g., Hu et al., 1992).
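The index CV(α) and the rescaling in (5) are straightforward to compute once the eigenvalues αi, or the estimates Û and Γ̂, are in hand. A minimal sketch (the eigenvalue lists and matrices below are hypothetical):

```python
import numpy as np

def cv_alpha(alphas):
    """Coefficient of variation CV(alpha) = SD(alpha)/mean(alpha) of the
    nonzero eigenvalues of U*Gamma (divisor p*-q-1, as in the text)."""
    a = np.asarray(alphas, dtype=float)
    sd = np.sqrt(np.sum((a - a.mean()) ** 2) / (len(a) - 1))
    return sd / a.mean()

def t_sb(t_ml, U_hat, Gamma_hat, df):
    """Satorra-Bentler rescaled statistic (5): (p*-q) T_ML / tr(U_hat Gamma_hat)."""
    return df * t_ml / np.trace(U_hat @ Gamma_hat)

# Equal eigenvalues -> CV = 0, the condition for asymptotic robustness of T_SB:
print(cv_alpha([2.0, 2.0, 2.0]))  # 0.0
```

When CV(α) = 0 the rescaling in t_sb recovers a chi-square limit; the larger CV(α) becomes, the less the rescaled statistic can be trusted, which is the point of reporting this index.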
Another aspect we want to make clear in this paper is that the test statistics TB(θ) and TADF are
sensitive to model degrees of freedom p∗− q, rather than to “model complexity”, defined as number
of parameters. Large degrees of freedom previously have been implicated in poor performance of
some test statistics. For example, Browne (1984, p. 79) conjectured that the sensitivity of TML
to kurtosis increases as the number of degrees of freedom for the model increases. Muthén and Kaplan (1992) observed poorer performance of the statistic TADF with models of
larger size. However, they confounded three aspects: number of variables, number of parameters,
and number of degrees of freedom. The four models they studied varied in the same direction on
all three aspects, so that a model that had a greater number of variables also had a greater number of
parameters and more degrees of freedom. Thus it is impossible to conclude which of these aspects is
responsible for poorer performance of TADF . Curran et al. (1996, p. 18) concluded from their review
that: “The ADF χ2 appeared to be very sensitive to model complexity, with extreme inflation of the
model chi-square as the tested model became increasingly complex”, where they define complexity
as the number of parameters estimated in the model. However, this seemingly correct remark is
misleading, as will be seen in our analysis of the statistic TB in the later sections. It is not the
number of parameters that is critical, but the degrees of freedom.
2. Three Asymptotically Distribution Free Test Statistics
In the definition of TB(θ) in (2), SY is used to estimate the population fourth-order moment
matrix Γ. A typical element of Γ is
γij,kl = σijkl − σijσkl,
where σijkl = E(xi − µi)(xj − µj)(xk − µk)(xl − µl) is the fourth-order multivariate moment of X about its mean µ = (µ1, . . . , µp)′ and σij is an element of Σ. The estimation of Γ by SY in Browne
(1984) is only based on consistency considerations. This is a large sample property. We will consider
some new estimates Γ which have the same consistency property as SY , as well as having good finite
sample properties. This will lead to several new test statistics. These test statistics can be applied
to any consistent estimates, but, for the reason mentioned earlier, we will restrict their applications
to the normal theory ML estimates in our study.
In the regression literature, a matrix based on cross products of model residuals is regularly
used for estimating the population counterpart in obtaining standard errors. For a consistent θ,
Γ̂ = (1/n) Σ_{i=1}^{N} (Yi − σ(θ))(Yi − σ(θ))′ = SY + (N/n)(Ȳ − σ(θ))(Ȳ − σ(θ))′ (6)

is also a consistent estimate of Γ. Since Ȳ ≠ σ(θ) in general, Γ̂ is different from SY. Replacing SY by Γ̂ in (2), we obtain a new statistic

TYB(θ) = TB(θ)/(1 + NTB(θ)/n²). (7)
This statistic is also asymptotically distribution free for any consistent θ. As the sample size gets large, TYB approaches χ²_{p∗−q} under the null hypothesis of a correct structure. So TYB is asymptotically equivalent to TB. But since TYB < TB numerically for any finite sample size N, we can expect the overrejection of correct models by TB at smaller sample sizes to be lessened by using TYB.
Notice that an estimate Γ̂ appearing in (2) enters through a form [σ′c(θ)Γ̂σc(θ)]⁻¹. For any symmetric random matrix A, it generally holds that E(A⁻¹) ≥ (EA)⁻¹ (see Exercise 3.7 on page 114 of Muirhead, 1982). So [σ′c(θ)SYσc(θ)]⁻¹ will be stochastically too large for estimating [σ′c(θ)Γσc(θ)]⁻¹ even though SY may be a good estimate of Γ. For example, let Z1, . . ., ZN be a sample from Nm(µ, Ω). Then the corresponding sample covariance SZ is an unbiased estimator of Ω. However, E(SZ⁻¹) = nΩ⁻¹/(n − m − 1), so that an unbiased estimator of Ω⁻¹ is given by (n − m − 1)SZ⁻¹/n rather than by SZ⁻¹. Although SZ⁻¹ is still consistent for Ω⁻¹, it is positively biased, and the bias is proportional to the model degrees of freedom. Note that Ω̂ = σ′c(θ)SYσc(θ) corresponds to the sample covariance matrix of the Zi = σ′c(θ)Yi. Even though the Zi here are not a random sample from Np∗−q(µ, Ω), it is hard to imagine any advantage of Ω̂⁻¹ over [n − (p∗ − q) − 1]Ω̂⁻¹/n in estimating Ω⁻¹. See Yuan and Bentler (1997b) for an application. Using this new estimate of Ω⁻¹, with Ω = σ′c(θ)Γσc(θ), in TB(θ) leads to a new test statistic

TC(θ) = [n − (p∗ − q) − 1]TB(θ)/n. (8)
Similar to TB and TYB, the statistic TC is also asymptotically distribution free and approaches χ²_{p∗−q} as the sample size gets large.
Recognizing that the statistic TB is stochastically too large to be approximated by χ²_{p∗−q} at finite sample sizes, the new statistics TYB and TC modify TB in order to obtain better approximations. The next test procedure is obtained by modifying the reference distribution χ²_{p∗−q} to a new distribution instead of modifying the test statistic TB.
The nice quadratic form of TB in (2) reminded Yuan and Bentler (in press a) of the well-known Hotelling's statistic

T² = N(AX̄ − b)′(ASA′)⁻¹(AX̄ − b),

which is used for testing the hypothesis Aµ = b. In a similar way, we may use (2) to test the hypothesis σ′c(σ(θ0) − σ) = 0. Of course, we can use a chi-square distribution to approximate the
distribution of T², but the approximation will be bad unless the model degrees of freedom are small and N is large. For example, let r be the dimension of the vector b; then

E(T²) = (N − 1)r/(N − r − 2),

var(T²) = 2r(N − 1)²(N − 2)/[(N − r − 2)²(N − r − 4)],

so both the expectation and the variance of T² are much larger than those of χ²_r, respectively. When r/N is not ignorable, T² will be stochastically much larger than a χ²_r variate. This will inevitably lead to overrejection when using critical values from χ²_r to judge the significance of T². A better approximation to the distribution of T² is through an F distribution:

(N − r)T²/[(N − 1)r] ∼ F_{r,N−r}. (9)
Even when a sample is not from a multivariate normal distribution and the observations are not
independent, the relation in (9) can still be exact. Actually, the approximation in (9) is quite
robust to a variety of nonnormal distributions (e.g., Mardia, 1975; Kariya, 1981). Thus motivated,
we propose to approximate the distribution of TB by a Hotelling's T² distribution, giving the statistic

TF = [N − (p∗ − q)]TB/[(N − 1)(p∗ − q)],

which is referred to an F-distribution with degrees of freedom p∗ − q and N − (p∗ − q). Note that the
statistic TF with its associated new distribution is asymptotically equivalent to the residual-based
ADF statistic. Thus, it is also asymptotically distribution free. The difference between TB and TF
will increase as the degrees of freedom p∗ − q get larger for a given finite sample size N .
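For a given value of TB, the three modifications are simple transformations; the sketch below collects (7), (8), and TF together with their reference distributions. The numerical values in the usage line are hypothetical, chosen only to exercise the formulas:

```python
import numpy as np
from scipy.stats import chi2, f as f_dist

def corrected_stats(TB, N, df):
    """Transform T_B into T_YB (7), T_C (8) and T_F, each with a p-value from
    its reference distribution (chi-square for T_YB and T_C, F for T_F)."""
    n = N - 1
    T_YB = TB / (1.0 + N * TB / n ** 2)      # (7)
    T_C = (n - df - 1) * TB / n              # (8)
    T_F = (N - df) * TB / ((N - 1) * df)     # referred to F(df, N - df)
    return (T_YB, chi2.sf(T_YB, df),
            T_C, chi2.sf(T_C, df),
            T_F, f_dist.sf(T_F, df, N - df))

# Hypothetical values of T_B, N and df, for illustration only:
out = corrected_stats(TB=160.0, N=213, df=137)
```

Both TYB and TC are numerically smaller than TB, which is exactly how they counteract the overrejection of TB at finite sample sizes; TF instead leaves the stochastic size of TB alone and changes the reference distribution.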
The above three statistics TY B, TC, and TF are closely related. All of them are simple functions
of TB and are asymptotically distribution free. The asymptotic equivalence of TY B and TC to TB is
easy to see. The asymptotic equivalence of TF to TB can be seen through the ratio

F = [χ²_{p∗−q}/(p∗ − q)] / [χ²_{N−(p∗−q)}/(N − (p∗ − q))],

where (p∗ − q)F approaches χ²_{p∗−q} in distribution. If we were only interested in the asymptotic
distributions of these test statistics, there would be no need to develop TY B, TC or TF , since they
are all equivalent to the existing statistic TB. However, our interest is in the finite sample behavior
of these statistics, which may be quite different from the asymptotic behavior. Since these statistics
are complicated nonlinear functions of the observed variables, the only possible approach to study
their finite sample behavior is through empirical simulations. This will be given in the next two
sections.
3. Models and Designs
Three models are used in our study. The first one is a fifteen dimensional confirmatory factor
model X = Λf + e, as used in Hu et al. (1992), with three common factors each having its own five
indicators. This generates a covariance structure Σ(θ) = ΛΦΛ′ + Ψ. The population parameters
are given by
Λ =
( λ  0  0 )
( 0  λ  0 )
( 0  0  λ ),

Φ =
( 1.0  .30  .40 )
( .30  1.0  .50 )
( .40  .50  1.0 ),

where λ′ = (.70, .70, .75, .80, .80) and 0 is a vector of five zeros. The Ψ is a diagonal matrix which makes all the diagonal elements of Σ be 1.0. In order for the factor model to be identifiable, we
restrict the last factor loading corresponding to each factor at its true value; this fixes the scale of
the factors. All the other nonzero parameters are set as unknown free parameters. So q = 33 for
this model, and degrees of freedom are p∗ − q = 87.
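The population covariance matrix of factor model A can be assembled directly from these values; a sketch, with the block structure of Λ and the choice of Ψ following the description above:

```python
import numpy as np

lam = np.array([.70, .70, .75, .80, .80])  # loadings of each factor's five indicators
Lambda = np.zeros((15, 3))
for k in range(3):
    Lambda[5 * k:5 * (k + 1), k] = lam     # block-diagonal loading pattern
Phi = np.array([[1.0, .30, .40],
                [.30, 1.0, .50],
                [.40, .50, 1.0]])          # factor correlations
Sigma0 = Lambda @ Phi @ Lambda.T
Psi = np.diag(1.0 - np.diag(Sigma0))       # unique variances so that diag(Sigma) = 1
Sigma = Sigma0 + Psi                       # Sigma(theta) = Lambda Phi Lambda' + Psi
```

With q = 33 free parameters, the degrees of freedom are p∗ − q = 15 · 16/2 − 33 = 87, as stated above.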
For the population covariance matrix Σ in the above 3-factor model, we let the observed variables be generated by X = Σ^{1/2}e with cov(e) = I in the second model. So the population covariance in this model is the same as in the first model, and the 3-factor model is still correct for fitting data
from this model. Consequently, we call the first model factor model A and the second model factor
model B.
The details of the variable generation techniques are given in Table 1. In factor model A, the
latent common factors f and the unique factors e are generated under four distribution conditions.
In factor model B, the vector e is generated under two distribution conditions. In factor model A,
condition 1 generates observed normal variables; condition 2 generates variables from an elliptical
distribution with Mardia’s kurtosis parameter equal to 3. Conditions 3 and 4 of factor model A and
conditions 1 and 2 of factor model B generate variables with nonzero skewness and kurtosis. The
difference between factor models A and B is that some aspects of the 4th-order moment matrices
of the observed variables are different.
Insert Table 1 near here
The other model in our study is a fifteen-dimensional intraclass correlation model Σ(θ) = θ1I + θ2 11′, where 1 represents a column vector with all elements 1.0. The population parameters are given by θ1 = θ2 = .5. So q = 2 for this model, and the degrees of freedom are p∗ − q = 118. Let X = Σ^{1/2}e. The e are generated under four different conditions, which are also given in Table 1. As in
the conditions for the confirmatory factor model, conditions 1 and 2 generate normal and elliptical
data, respectively; conditions 3 and 4 generate skewed data with nonzero kurtosis. A motivation
for choosing the intraclass model is to study the sensitivity of the statistic TB to model degrees of
freedom and model complexity. Compared to the confirmatory factor model, the intraclass model
is much simpler, with only two parameters that need to be estimated. On the basis of Curran et
al.’s observations, one would expect TB to perform better for this model than for the factor analysis
model. However, as we shall see in the next section, the statistic TB performs much worse on this
model than on the factor analysis model. This is because the degrees of freedom for this model are
larger.
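A sketch of the intraclass structure and the generation step X = Σ^{1/2}e follows; here e is normal (condition 1), whereas the other conditions of Table 1 would replace the generator for e with a skewed or heavy-tailed one:

```python
import numpy as np

p = 15
theta1, theta2 = 0.5, 0.5
Sigma = theta1 * np.eye(p) + theta2 * np.ones((p, p))  # Sigma = theta1 I + theta2 11'

# Symmetric square root for the generation step X = Sigma^{1/2} e:
vals, vecs = np.linalg.eigh(Sigma)
Sigma_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T

rng = np.random.default_rng(0)
e = rng.standard_normal((200, p))  # condition 1: normal e; other conditions skew e
X = e @ Sigma_half                 # rows of X are the observations X_i

df = p * (p + 1) // 2 - 2          # p* - q = 120 - 2 = 118
```

Despite having only two parameters, this model has more degrees of freedom (118) than the factor model (87), which is what lets the design separate degrees of freedom from model complexity.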
As for the study of statistic TSB, the CV(α)s corresponding to each structure and condition are
given at the bottom of Table 1. For all the conditions in factor model A, and conditions 1 and 2
in the intra-class model, CV(α) = 0. So, under these conditions, the statistic TSB should perform
robustly for large sample sizes. For conditions 1 and 2 of factor model B, CV(α) = .089, indicating
only a small discrepancy among the eigenvalues. Hence, in these conditions we would expect the
statistic TSB still to behave well. For conditions 3 and 4 of the intra-class model, on the other hand,
CV(α) = 2.38, a relatively much larger value. Consequently, we should not expect the statistic
TSB to behave robustly in these two conditions, at least when sample size is large. Since we know
the αi's in all the conditions, we can characterize some aspects of the large sample behavior of TSB
even when CV(α) does not equal zero. For example, in conditions 1 and 2 of factor model B, TSB
approaches a statistic with a standard deviation of 13.24; in conditions 3 and 4 of the intra-class
model, TSB approaches a statistic with a standard deviation of 39.46. Comparing these values with
the standard deviations associated with χ²_87 and χ²_118, which are 13.19 and 15.36, respectively, we
see that TSB should still behave appropriately with conditions 1 and 2 of factor model B, but not
with conditions 3 and 4 of the intra-class model.
The settings in Table 1 allow us to study various aspects of the statistics. The asymptotically
distribution free property of TB, TC, TY B and TF does not necessarily imply that they are distribu-
tion free in finite samples. We can observe their sensitivity to different models and distributional
conditions through our study. For example, in factor model A and the intra-class model, as we
move from condition 1 to condition 4, the data change from normal to slightly nonnormal, and then to severely nonnormal. The behavior of TSB under various values of CV(α) allows us to judge its robustness under
different violations of robustness conditions. The design will also allow us to see whether TB, as well
as other statistics, is sensitive to model complexity or to model degrees of freedom. Even though we
are not studying the statistic TML, we should note that TML is not robust in any of the nonnormal
conditions in Table 1. This was shown for some of these conditions by Hu et al. (1992).
For each model and each condition, estimates of the unknown parameters in the structural model
are obtained by applying the algorithm of iteratively reweighted least squares (Lee & Jennrich, 1979)
to minimize the FML(θ) in (1). Each test statistic is computed for each model and condition. Six
sample sizes 150, 200, 300, 500, 1000, 5000 are used for each case. Each combination is replicated
500 times. The sample mean, sample standard deviation, and rejection rate based on 5% critical
value are calculated for each statistic over the 500 replications. At a chance level, there will be
approximately 500 × .05 = 25 rejections of the correct model.
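The rejection-rate computation over replications reduces to comparing each statistic with the chi-square critical value; a minimal sketch:

```python
import numpy as np
from scipy.stats import chi2

def rejection_rate(stats, df, alpha=0.05):
    """Proportion of replications whose statistic exceeds the chi-square
    critical value at level alpha."""
    return float(np.mean(np.asarray(stats) > chi2.ppf(1 - alpha, df)))
```

For 500 replications of a well-behaved statistic, the rate should be near .05, i.e., about 25 rejections of the correct model, as noted above.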
Although tail behavior is probably the most important aspect of a test statistic, we also want to
understand the overall accuracy of the approximation of the proposed asymptotic distributions to
the finite sample empirical behavior of these statistics. For this we use the well-known Kolmogorov-
Smirnov (KS) statistic, using the empirical distribution FM(x) of each statistic T based on M=500
replications, and comparing to a theoretical distribution function F (x). The KS-statistic is defined
as DKS = supx |FM(x)− F (x)|. A very large DKS indicates that it is inappropriate to use F (x) to
describe the randomness of the statistic T. Critical values of DKS at significance levels .05 and .01 are 1.3581/√M and 1.6276/√M, respectively, which are approximately .061 and .073 for M = 500. More about the KS-statistic can be found in Stuart and Ord (1991, §30.37-§30.42).
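The statistic DKS can be computed from the sorted replications by taking the supremum over both sides of each jump of the empirical distribution function; a sketch, checked here against hypothetical chi-square draws rather than the paper's simulated statistics:

```python
import numpy as np
from scipy.stats import chi2

def d_ks(stats, cdf):
    """D_KS = sup_x |F_M(x) - F(x)|, with the supremum taken over both sides
    of each jump of the empirical distribution function F_M."""
    x = np.sort(np.asarray(stats, dtype=float))
    M = len(x)
    F = cdf(x)
    i = np.arange(1, M + 1)
    return float(max(np.max(i / M - F), np.max(F - (i - 1) / M)))

# Hypothetical check: chi-square(87) draws against their own reference cdf.
rng = np.random.default_rng(1)
draws = chi2.rvs(87, size=500, random_state=rng)
D = d_ks(draws, lambda x: chi2.cdf(x, 87))
crit_05 = 1.3581 / np.sqrt(500)  # approximately .061
```

When the reference distribution is correct, D should typically fall below the .05 critical value; a statistic whose D greatly exceeds it is poorly described by that distribution.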
4. Empirical Results
Simulation results under the various conditions are presented in Tables 2 to 11, where M, SD,
R, DKS represent the sample mean, the sample standard deviation, the rejection rate, and the
KS-statistic, respectively. For easy comparison, the means and standard deviations of the F distri-
butions are given at the bottom of Tables 2 and 8.
Table 2 summarizes the performance of the various test statistics when data are normal. From
Table 2, it is clearly seen that the statistic TB rejects the correct model too frequently for small
to moderate sample sizes, so that inferences based on TB are not reliable at all. In fact, TB does
not perform nominally until N=5000. The statistic TC performs much better than TB, but TC still
overrejects the correct model too frequently at smaller sample sizes. Note that when N=150, the
correction factor of TC over TB is .42; this factor is not small enough to make TC behave properly
in rejection rate. The statistic TSB performs very well except for a little overrejection for small
N . The statistic TY B overcorrects the rejection rate of TB in that, for smaller sample sizes, TY B
rejects the correct model less than the nominal level. For small to moderate sample sizes, TF also
minimally overrejects the correct model, but performs much better than TC.
With respect to the sample mean and sample standard deviation, those corresponding to TB at smaller sample sizes are much larger than those of χ²_87. Even N=5000 is not large enough to solve this problem completely. The sample mean and standard deviation of TC retain some heritage from TB, though they are much better. TSB has minimally larger sample means than 87, while its sample standard deviations match those of χ²_87 very well for all the sample sizes studied. While the means of TYB are a little larger than 87, its sample standard deviations are smaller than those of χ²_87. The sample means and sample standard deviations corresponding to TF also are a little
bit larger than those of the corresponding F-distributions. With respect to the KS-statistic, those
corresponding to TB are the largest for all the sample sizes. For sample sizes up to 500, TSB has
the smallest DKS . For N=1000, TC has the smallest DKS and TY B has the smallest DKS when
N=5000.
Insert Tables 2 and 3 near here
Table 3 gives results for elliptically symmetric data. The rejection rate of TB is still as bad as
it is for normal data; TC still overrejects the correct model with smaller N , but it performs better
than that for the normal data case. The rejection rate of TY B is almost the same as in the normal
data case, with some underrejection of the correct model for smaller sample sizes. The rejection
rates of TSB and TF are uniformly good at all the sample sizes. With respect to the KS-statistic,
TB has the largest DKS and TSB has the smallest DKS over all the sample sizes studied.
Results for factor model A with skewed data are given in Table 4. Most of the test statistics
behave in a pattern similar to that of the elliptical data case. The statistic TB works well when
N=5000; this is the same conclusion as given by Hu et al. (1992) for the classical ADF statistic.
For N=300 and up, the statistic TC behaves very well, showing the effect of the correction factor [n − (p∗ − q) − 1]/n. The test statistic TF changes, for all the sample sizes, from a little overrejection in Table 3 to a little underrejection of the correct model when compared to the 5% nominal level. The
statistic TY B, on the other hand, behaves in a very stable way across these two types of nonnormal
data over all the sample sizes. The KS-statistics all behave in a similar way as found with elliptical
data. The results based on condition 4, with both factors and errors skewed, are similar to those of
condition 3. They are given in Table 5.
Insert Tables 4 and 5 near here
Tables 6 and 7 present the results from factor model B with asymmetric data. It is apparent
that statistics TB, TC , TY B, and TF behave in almost the same way as for conditions 3 and 4 in
factor model A. However, TSB behaves somewhat differently. For example, with moderate sample
sizes, the rejection rate of TSB changes from slightly under the nominal level in Tables 4 and 5 to
slightly over the nominal level in Tables 6 and 7. Also, the rejection rate of TSB is quite high for
smaller sample sizes. This may be due to the minor violation of the robustness condition for TSB.
Insert Tables 6 and 7 near here
Next we turn to the intraclass model, for which the results are given in Tables 8-11. First, it
is instructive to compare Tables 8 and 9 with Tables 2 and 3. All these tables give results that
correspond to normal and elliptical data. The rejection rates of the statistics TB and TC, already
bad in Tables 2 and 3, are even worse in Tables 8 and 9. This is best appreciated through TC
and the KS-statistics. For example, when N=150, the factor (n − (p∗ − q) − 1)/n = .21 used to
obtain TC is still not small enough to correct the large rejection rate of TB. The DKS corresponding
to TB are all equal to 1 for N=150 and 200. Since the upper bound of DKS is 1, this means that
for the intraclass model with N=150 and 200, using any other distribution to approximate that
of TB cannot be worse than using a χ²₁₁₈ in the criterion of the KS-statistic. An implication is
that the statistic TB is very sensitive to the degrees of freedom of a model rather than to model
complexity. With respect to the other statistics, TF rejects the correct intraclass model more often
than the factor model. Since TF is a nonrandom scalar times TB, the overrejection of TB passes
to TF . This can also be observed in Tables 10 and 11. As in Table 2, the statistic TSB in Table 8
overrejects the correct model in a stable way across all the sample sizes. However, TSB underrejects
the correct model in Table 9 for all sample sizes. This may reflect its sensitivity to model complexity
and degrees of freedom. The statistic TY B in the intraclass model performs as it does in the factor
model.
Insert Tables 8 and 9 near here
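The DKS criterion used throughout these comparisons is the ordinary Kolmogorov–Smirnov distance between the empirical distribution of a statistic and the reference chi-square. A minimal sketch (Python with NumPy/SciPy; the inflation factor 5 is a made-up stand-in for a badly behaved statistic, not a value from the study) shows how DKS approaches its upper bound of 1:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
df = 118  # degrees of freedom of the intraclass model

# Hypothetical replications of a statistic whose values sit far in the right
# tail of chi-square_118, mimicking severe overrejection.
inflated = rng.chisquare(df, size=500) * 5

# D_KS: largest gap between the empirical CDF and the chi-square_118 CDF.
d_ks = stats.kstest(inflated, stats.chi2(df).cdf).statistic
print(round(d_ks, 3))  # essentially 1, the upper bound
```

When every replication lies far in the right tail of the reference distribution, the two CDFs are maximally separated, which is exactly what DKS = 1 records for TB in Tables 8 and 9.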
Tables 10 and 11 correspond to conditions 3 and 4 for the intraclass model. These conditions
were designed to demonstrate the possible nonrobustness of the statistic TSB. Even though the
asymptotic distribution of TSB is no longer χ² with p∗ − q degrees of freedom, using a critical value from the chi-square
distribution still gives reasonable rejection rates in small to moderate sample sizes. However, as
N gets larger, the rejection rate of TSB departs from the nominal level. A trend toward non-χ²
behavior can also be observed in the standard deviations and the KS-statistics. While the
standard deviation of χ²₁₁₈ is 15.36, the sample standard deviations of TSB in Tables 10 and 11
are much larger than 15.36 at all sample sizes studied. Asymptotically, the standard deviation of
TSB approaches 39.46 as N gets larger. This trend can be observed in these two tables. The
nonrobustness of TSB can also be easily detected by the DKS . In conditions for which TSB is robust,
the DKS associated with TSB are almost always the smallest. Even when they are not the smallest,
the obtained DKS are only minimally different from the smallest DKS . In Tables 10 and 11, the
DKS associated with TSB uniformly dominate those associated with TY B for all sample sizes, and
those associated with TF for most of the sample sizes. Furthermore, for conditions under which TSB
is robust or nearly robust, the DKS associated with TSB decrease very fast as N increases; however,
they decrease rather slowly in Tables 10 and 11.
Insert Tables 10 and 11 near here
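The reference values quoted above follow from the fact that a χ² variate with df degrees of freedom has variance 2·df; a quick check (Python):

```python
import math

# Standard deviation of a chi-square distribution with df degrees of freedom:
# Var(chi-square_df) = 2 * df.
def chi2_sd(df):
    return math.sqrt(2 * df)

print(round(chi2_sd(118), 2))  # 15.36, the intraclass-model reference value
print(round(chi2_sd(87), 2))   # 13.19, the factor-model counterpart
```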
It is also worthwhile to re-evaluate the model in the example introduced at the beginning of the
paper. Results based on the new test statistics are presented in Table 12. We can see that, similarly
to TML, the statistics TB and TSB also suggest that the model in this example is not adequate, even
though TB is specifically designed for large sample nonnormal data. In contrast, TY B suggests that
the model fits the data very well. Similarly, the statistics TF and TC also suggest that the model
is reasonable. Our new finite sample test statistics are consistent with a priori expectations and
residuals.
Insert Table 12 near here
So far, we have concentrated on the performance of the different statistics under the null hypothesis.
The power of a statistic under an alternative hypothesis is equally important. When a
model slightly departs from the null hypothesis and N becomes large, TC and TY B will approach
the same noncentral chi-square distribution as that of TB, and TF will approach a noncentral F-
distribution with the same noncentrality parameter as the chi-square. The finite sample behavior
in power of these statistics was also studied. Since the power properties of TY B and TF are very
similar to those of their counterparts when applied to the ADF estimates, as reported in Yuan
and Bentler (1997a, in press a), we give only a brief report of these results here. When misfitting
the 3-factor model by a 2-factor model, TY B has an average power of .568 for sample size 150, and
.936 for sample size 200; TF has an average power of .843 for sample size 150, and .982 for sample
size 200; and each of them has an actual power of 1.0 for sample sizes 300 and above. The statistic
TC has better power than either TY B or TF . Note that the power of a test statistic is closely
related to its type I error rate: a decreasing type I error usually is accompanied by an increase in
type II error. This happens with almost any useful test statistic, e.g., the TSB as studied in Curran
et al. (1996). So it is inappropriate to discuss power without controlling the type I error. For
example, the statistic TB always has a power of 1.0 for detecting the wrong 2-factor model, but one
would prefer not to use it since it rejects the correct model with probability 1 at smaller sample
sizes. In most of the statistical literature on power, one generally controls the type I error, since it
is usually considered more important.
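The power comparison above amounts to estimating the proportion of replications exceeding a chi-square critical value that controls the type I error. A sketch under assumed values (the noncentrality parameter 60 and the noncentral chi-square form of the limiting distribution mentioned in the text are illustrative assumptions, not values from the study):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
df = 87     # factor-model degrees of freedom
ncp = 60.0  # hypothetical noncentrality under a misspecified model

# Replications of a statistic whose large-sample distribution is noncentral
# chi-square, as described in the text for T_C and T_YB.
draws = rng.noncentral_chisquare(df, ncp, size=500)

# Control the type I error at 5% via the central chi-square critical value,
# then estimate power as the rejection proportion across replications.
critical = stats.chi2(df).ppf(0.95)
power = float(np.mean(draws > critical))
```

Comparing statistics at a common, controlled type I error in this way is what makes the power figures in the text interpretable.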
5. Discussion and Recommendation
In this paper, we proposed three new test statistics, TY B, TC and TF , and applied each of
them to the normal theory MLE. All of them are asymptotically equivalent to the residual-based
ADF statistic TB. Empirical studies indicate that, to a certain degree, each of them corrects the
overrejection of TB for correct models in finite samples. Among the three, TY B performs most
stably across different models and distribution conditions. The only drawback of TY B as a general
test statistic is a tendency to slightly over-accept correct models when the sample size is small.
The test statistic TF is very reliable for models whose degrees of freedom are not too large; however,
its rejection rate is still too high when the model degrees of freedom get larger. The rejection rate of the
statistic TC is also too high for small to moderate sample sizes, though it dramatically improves on
TB.
The remarkable behavior of TSB is once again verified in our study with CV(α)=0. Even when
CV(α)=2.38, its performance in rejection rate is still very good for smaller sample sizes. However,
the robustness of TSB under many conditions does not necessarily mean that it can be applied to
any type of data. Actually, its robustness will break down when CV(α) gets larger, at least with
larger sample sizes. This is demonstrated in Tables 10 and 11 as well as in Table 12, where TML
and TSB give almost the same model evaluation for the example. The drawback of TSB is that we
do not know its asymptotic distribution unless CV(α)=0.
What general conclusions and recommendations can we make from this study? First, we have
shown definitively that TB should not be used in any case. This is because statistics TY B, TC and
TF all possess the same theoretical property as TB but perform much better. If one needs an asymp-
totically distribution free test statistic when facing severely nonnormal data, TY B is recommended.
Statistic TF is also very reliable when the model degrees of freedom are not so large. The statistic
TSB is remarkably reliable when CV(α) ≈ 0. However, there is currently no known statistical test
for the hypothesis CV(α) = 0 in the population, and the sample estimate of CV(α) may still be
relatively large even if CV(α) = 0. The development of a statistical test for the hypothesis
CV(α) = 0, and empirical research on the sample estimate of CV(α) for different types of nonnormal
data satisfying CV(α) = 0, are interesting topics
for further study. Of course, the matrix Ω = σ′c(θ)SY σc(θ) in (2) needs to be nonsingular in order
to use TY B or TF ; similarly, the matrix U Γ in (5) needs to be of rank p∗ − q for using the statistic
TSB. Assuming matrices Σ and σc are of full rank, we can show that the nonsingularity of Ω is
equivalent to rank(U Γ) = p∗ − q. In situations with a near singular Ω, such as with very small
sample sizes, further study is needed to find an appropriate statistic for nonnormal data. The above
discussion and our recommendations are summarized in Table 13.
Insert Table 13 near here
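The recommendations in Table 13 can be read as a simple decision rule. The sketch below (Python; the function and argument names are ours, and judging "nonsingular" or "near zero" in practice of course requires care) encodes that summary:

```python
def recommend_statistic(normal_data, omega_nonsingular, cv_alpha_near_zero):
    # Encodes the summary in Table 13 (illustrative only).
    if normal_data:
        return "TML"
    if not omega_nonsingular:
        return "further study needed"
    if cv_alpha_near_zero:
        return "TSB"
    return "TY B or TF"

print(recommend_statistic(False, True, False))  # TY B or TF
```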
References
Amemiya, Y. & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor
analysis models. Annals of Statistics, 18, 1453–1463.
Anderson, T. W. & Amemiya, Y. (1988). The asymptotic normal distribution of estimators in
factor analysis under general conditions. Annals of Statistics, 16, 759–771.
Austin, J. T. & Calderon, R. F. (1996). Theoretical and technical contributions to structural
equation modeling: An updated annotated bibliography. Structural Equation Modeling, 3, 105–
175.
Austin, J. T. & Wolfle, D. (1991). Annotated bibliography of structural equation modeling: Tech-
nical work. British Journal of Mathematical and Statistical Psychology, 44, 93–152.
Bentler, P. M. (1995). EQS Structural Equations Program Manual. Encino, CA: Multivariate
Software.
Bentler, P. M. & Dudgeon, P. (1996). Covariance structure analysis: Statistical practice, theory,
and directions. Annual Review of Psychology, 47, 541–570.
Bollen, K. A. (1989). Structural Equations with Latent Variables. New York: Wiley.
Breckler, S. J. (1990). Application of covariance structure modeling in psychology: Cause for
concern? Psychological Bulletin, 107, 260–273.
Browne, M. W. (1984). Asymptotic distribution-free methods for the analysis of covariance struc-
tures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.
Browne, M. W. (1987). Robustness of statistical inference in factor analysis and related models.
Biometrika, 74, 375–384.
Browne, M. W. & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear
latent variate models. British Journal of Mathematical and Statistical Psychology, 41, 193–208.
Byrne, B. M. (1994). Structural Equation Modeling with EQS and EQS/Windows, Thousand Oaks,
CA: Sage.
Chan, W. (1995). Covariance structure analysis of ipsative data. Ph.D. thesis, UCLA.
Chou, C.-P., Bentler, P. M. & Satorra, A. (1991). Scaled test statistics and robust standard errors
for nonnormal data in covariance structure analysis: A Monte Carlo study. British Journal of
Mathematical and Statistical Psychology, 44, 347–357.
Curran, P. S., West, S. G. & Finch, J. F. (1996). The robustness of test statistics to nonnormality
and specification error in confirmatory factor analysis. Psychological Methods, 1, 16–29.
Dunn, G., Everitt, B. & Pickles, A. (1993). Modeling Covariances and Latent Variables Using EQS.
London: Chapman & Hall.
Gierl, M. J. & Mulvenon S. (1995). Evaluating the application of fit indices to structural equation
models in educational research: A review of the literature from 1990 through 1994. Paper
presented at Annual Meetings of the American Educational Research Association, San Francisco.
Harlow, L. L., Stein, J. A. & Rose, J. S. (1998). Substance abuse and risky sexual behavior in
women: A longitudinal stage model. Manuscript under review; based on NIMH Grant MH47233.
Hoyle, R. (ed.) (1995). Structural Equation Modeling: Concepts, Issues, and Applications. Thou-
sand Oaks, CA: Sage.
Hu, L., Bentler, P. M. & Kano, Y. (1992). Can test statistics in covariance structure analysis be
trusted? Psychological Bulletin, 112, 351–362.
Joreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis.
Psychometrika, 34, 183–202.
Joreskog, K. G. & Sorbom, D. (1993). LISREL 8 User’s Reference Guide. Chicago: Scientific
Software International.
Kano, Y. (1992). Robust statistics for test-of-independence and related structural models. Statistics
and Probability Letters, 15, 21–26.
Kariya, T. (1981). A robustness property of Hotelling’s T 2-test. Annals of Statistics, 9, 211–214.
Lee, S.-Y. & Jennrich, R. I. (1979). A study of algorithms for covariance structure analysis with
specific comparisons using factor analysis. Psychometrika, 44, 99–113.
Magnus, J. R. & Neudecker, H. (1988). Matrix Differential Calculus with Applications in Statistics
and Econometrics. New York: Wiley.
Marcoulides, G. A. & Schumacker, R. E. (eds.) (1996). Advanced Structural Equation Modeling:
Issues and Techniques. Mahwah, NJ: Lawrence Erlbaum Associates.
Mardia, K. V. (1975). Assessment of multinormality and the robustness of Hotelling’s T 2 test.
Applied Statistics, 24, 163–171.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological
Bulletin, 105, 156–166.
Mooijaart, A. & Bentler, P. M. (1991). Robustness of normal theory statistics in structural equation
models. Statistica Neerlandica, 45, 159–171.
Mueller, R. O. (1996). Basic Principles of Structural Equation Modeling. New York: Springer
Verlag.
Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. New York: Wiley.
Muthen, B. & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of
non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38,
171–189.
Muthen, B. & Kaplan, D. (1992). A comparison of some methodologies for the factor analysis of
non-normal Likert variables: A note on the size of the model. British Journal of Mathematical
and Statistical Psychology, 45, 19–30.
Satorra, A. & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance
structure analysis. American Statistical Association 1988 proceedings of Business and Economics
Sections (pp. 308–313). Alexandria, VA: American Statistical Association.
Satorra, A. & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of
linear relations. Computational Statistics & Data Analysis, 10, 235–249.
Satorra, A. & Bentler, P. M. (1991). Goodness-of-fit test under IV estimation: Asymptotic robust-
ness of a NT test statistic. In R. Gutierrez & M. J. Valderrama (eds.), Applied Stochastic Models
and Data Analysis (pp. 555–567). Singapore: World Scientific.
Satorra, A. & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance
structure analysis. In A. von Eye & C. C. Clogg (eds.), Latent Variables Analysis: Applications
for Developmental Research (pp. 399–419). Newbury Park, CA: Sage.
Schumacker, R. E. & Lomax, R. G. (1996). A Beginner’s Guide to Structural Equation Modeling.
Mahwah, NJ: Lawrence Erlbaum Associates.
Shapiro, A. (1987). Robustness properties of the MDF analysis of moment structures. South African
Statistical Journal, 21, 39–62.
Shapiro, A. & Browne, M. (1987). Analysis of covariance structures under elliptical distributions.
Journal of the American Statistical Association, 82, 1092–1097.
Stuart, A. & Ord, J. K. (1991). Kendall's Advanced Theory of Statistics, Vol. 2 (5th ed.). New York:
Oxford University Press.
Tanaka, J. S. (1984). Some results on the estimation of covariance structure models. Ph.D. thesis,
UCLA.
Tremblay, P. F. & Gardner, R. C. (1996). On the growth of structural equation modeling in
psychological journals. Structural Equation Modeling, 3, 93–104.
Ullman, J. (1996). Structural equation modeling. In B. G. Tabachnick & L. S. Fidell, Using Mul-
tivariate Statistics (3rd ed., Ch. 14, pp. 709–811). New York: Harper Collins College Publishers.
Yuan, K.-H. & Bentler, P. M. (1997a). Mean and covariance structure analysis: Theoretical and
practical improvements. Journal of the American Statistical Association, 92, 767–774.
Yuan, K.-H. & Bentler, P. M. (1997b). Improving parameter tests in covariance structure analysis.
Computational Statistics & Data Analysis, 26, 177–198.
Yuan, K.-H. & Bentler, P. M. (in press a). F-tests for mean and covariance structure analysis.
Journal of Educational and Behavioral Statistics.
Yuan, K.-H. & Bentler, P. M. (in press b). On normal theory and associated test statistics in
covariance structure analysis under two classes of nonnormal distributions. Statistica Sinica.
Table 1
Designed Conditions∗

Factor Model A: X = Λf + e
  Condition 1¹: f ∼ N(0, Φ), e ∼ N(0, Ψ)
  Condition 2¹: f = f1/R, e = e1/R; f1 ∼ N(0, Φ), e1 ∼ N(0, Ψ), R ∼ √(χ²₅/3)
  Condition 3¹: f = f1/R, e = e1/R; f1 ∼ N(0, Φ), e1 ∼ Lognormal(0, Ψ), R ∼ √(χ²₅/3)
  Condition 4¹: f = f1/R, e = e1/R; f1 ∼ Lognormal(0, Φ), e1 ∼ Lognormal(0, Ψ), R ∼ √(χ²₅/3)

Factor Model B: X = Σ^(1/2)e, Σ = ΛΦΛ′ + Ψ
  Condition 1²: e ∼ Lognormal(0, I)
  Condition 2²: e = e1/R, e1 ∼ Lognormal(0, I), R ∼ √(χ²₅/3)

Intra-class Model: X = Σ^(1/2)e, Σ = θ₁I + θ₂11′
  Condition 1¹: e ∼ N(0, I)
  Condition 2¹: e = e1/R, e1 ∼ N(0, I), R ∼ √(χ²₅/3)
  Condition 3³: e ∼ Lognormal(0, I)
  Condition 4³: e = e1/R, e1 ∼ Lognormal(0, I), R ∼ √(χ²₅/3)

∗ e and f are independent; e1, f1, and R are independent.
¹ CV(α) = 0; ² CV(α) = .089; ³ CV(α) = 2.38.
Table 2
Tail Behavior of Different Statistics
Factor Model A, Condition 1

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    235.33   168.46   130.23   110.05    97.09    89.03
       SD    57.32    34.83    22.99    19.01    15.82    13.89
       R       500      486      413      238      100       43
       DKS¹   .991     .944     .796     .540     .285     .072
TC     M     97.27    94.33    92.03    90.68    88.54    87.47
       SD    23.69    19.51    16.25    15.66    14.43    13.63
       R       140       97       63       58       41       35
       DKS    .251     .197     .139     .111     .047     .030
TSB    M     90.89    89.75    88.24    87.72    87.91    87.29
       SD    13.67    13.34    13.10    13.01    13.16    13.57
       R        48       39       32       23       27       35
       DKS    .126     .098     .052     .049     .052     .029
TY B   M     90.36    90.50    90.22    89.81    88.31    87.44
       SD     8.53    10.06    11.00    12.61    13.06    13.40
       R         4       14       20       32       35       31
       DKS    .219     .192     .159     .114     .062     .026
TF     M      1.14     1.10     1.07     1.05     1.02     1.01
       SD     .279     .227     .188     .181     .166     .157
       R        58       53       41       42       40       33
       DKS    .176     .156     .145     .111     .054     .028
Fn1,n2 M      1.03     1.02     1.01     1.00     1.00     1.00
       SD     .248     .208     .183     .168     .159     .153

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 3
Tail Behavior of Different Statistics
Factor Model A, Condition 2

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    226.24   166.23   129.50   110.04    97.98    89.44
       SD    47.77    28.78    20.40    15.97    13.68    13.06
       R       500      495      422      240       94       33
       DKS¹   .991     .955     .808     .598     .339     .105
TC     M     93.51    93.09    91.51    90.67    89.36    87.87
       SD    19.72    16.10    14.40    13.15    12.47    12.82
       R        82       70       53       33       28       24
       DKS    .182     .189     .142     .135     .103     .058
TSB    M     90.47    88.48    87.19    86.11    86.02    86.67
       SD    12.36    12.13    11.88    11.78    13.12    12.67
       R        36       24       22       15       18       19
       DKS    .127     .088     .044     .055     .057     .031
TY B   M     89.28    90.11    89.99    89.91    89.09    87.84
       SD     7.38     8.55     9.82    10.61    11.31    12.60
       R         2        4       12       19       18       24
       DKS    .216     .197     .182     .165     .113     .058
TF     M      1.10     1.08     1.06     1.05     1.03     1.01
       SD     .232     .188     .167     .152     .144     .148
       R        29       30       32       24       22       24
       DKS    .163     .174     .154     .153     .109     .059

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 4
Tail Behavior of Different Statistics
Factor Model A, Condition 3

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    220.89   162.48   126.04   107.24    97.61    89.69
       SD    42.20    27.34    17.81    14.83    12.70    12.16
       R       500      498      412      208       88       25
       DKS¹   .998     .968     .812     .548     .332     .116
TC     M     91.30    90.99    89.07    88.37    89.02    88.11
       SD    17.42    15.29    12.57    12.21    11.57    11.93
       R        75       56       31       23       22       19
       DKS    .128     .123     .078     .075     .097     .072
TSB    M     92.12    90.64    88.93    88.19    88.24    87.19
       SD    12.17    11.10    11.39    11.66    12.36    13.30
       R        33       25       17       20       28       22
       DKS    .192     .151     .102     .077     .069     .026
TY B   M     88.57    89.04    88.39    88.06    88.81    88.08
       SD     6.78     8.18     8.76     9.99    10.53    11.73
       R         1        3        4        9       14       17
       DKS    .212     .179     .158     .121     .117     .074
TF     M      1.07     1.06     1.03     1.02     1.03     1.01
       SD     .205     .178     .146     .141     .133     .137
       R        20       23       15       18       17       18
       DKS    .127     .132     .122     .100     .108     .073

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 5
Tail Behavior of Different Statistics
Factor Model A, Condition 4

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    220.47   161.82   125.16   107.40    97.10    90.12
       SD    41.28    26.51    17.14    14.65    13.74    12.01
       R       498      495      404      200       87       26
       DKS¹   .998     .959     .799     .552     .291     .142
TC     M     91.13    90.62    88.45    88.50    88.55    88.53
       SD    17.04    14.83    12.10    12.06    12.52    11.79
       R        70       53       21       25       29       20
       DKS    .143     .103     .074     .079     .066     .101
TSB    M     91.77    90.77    88.92    87.52    86.68    87.61
       SD    12.12    11.24    11.53    11.37    12.10    13.41
       R        43       24       15       17       15       26
       DKS    .163     .167     .105     .077     .035     .043
TY B   M     88.52    88.87    87.97    88.17    88.36    88.50
       SD     6.79     7.94     8.49     9.85    11.38    11.59
       R         1        3        2        8       21       19
       DKS    .212     .183     .149     .125     .078     .105
TF     M      1.07     1.06     1.02     1.02     1.02     1.02
       SD     .201     .173     .140     .139     .144     .136
       R        13       19        9       17       25       19
       DKS    .129     .127     .112     .104     .069     .103

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 6
Tail Behavior of Different Statistics
Factor Model B, Condition 1

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    223.50   159.59   125.83   108.20    97.52    89.40
       SD    48.09    25.75    19.60    15.40    13.36    12.42
       R       500      496      392      220       88       31
       DKS¹   .999     .959     .771     .552     .314     .102
TC     M     92.38    89.37    88.92    89.16    88.94    87.82
       SD    19.86    14.40    13.83    12.68    12.18    12.19
       R        91       40       38       26       28       26
       DKS    .145     .079     .071     .082     .094     .061
TSB    M     92.60    92.01    91.05    88.46    87.75    87.30
       SD    13.16    12.94    13.59    12.70    12.84    12.56
       R        50       41       52       25       29       24
       DKS    .179     .178     .121     .063     .032     .030
TY B   M     88.81    88.21    88.20    88.69    88.72    87.80
       SD     7.49     7.80     9.64    10.35    11.04    11.98
       R         2        5        8       13       20       24
       DKS    .203     .171     .115     .115     .115     .065
TF     M      1.09     1.04     1.03     1.03     1.02     1.01
       SD     .234     .168     .160     .146     .140     .140
       R        29       14       19       21       23       25
       DKS    .110     .111     .079     .099     .106     .063

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 7
Tail Behavior of Different Statistics
Factor Model B, Condition 2

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    225.93   163.12   128.80   109.25    98.60    90.01
       SD    44.19    25.36    18.49    14.31    13.59    11.54
       R       499      497      425      232       99       22
       DKS¹   .997     .965     .823     .598     .338     .136
TC     M     93.02    91.35    91.02    90.02    89.92    88.43
       SD    18.30    14.18    13.05    11.78    12.38    11.33
       R        89       49       36       34       26       20
       DKS    .191     .144     .126     .144     .105     .097
TSB    M     96.38    93.44    93.01    91.04    89.21    87.77
       SD    12.55    13.25    13.77    12.35    12.67    11.80
       R        72       52       53       32       29       20
       DKS    .303     .216     .174     .154     .097     .065
TY B   M     89.34    89.31    89.73    89.43    89.61    88.40
       SD     6.94     7.61     9.00     9.61    11.19    11.13
       R         1        2        8        7       20       17
       DKS    .216     .208     .187     .168     .123     .101
TF     M      1.10     1.06     1.05     1.04     1.04     1.02
       SD     .215     .165     .151     .136     .143     .130
       R        23       17       21       21       24       18
       DKS    .155     .165     .156     .157     .114     .099

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 8
Tail Behavior of Different Statistics
Intra-class Correlation Model, Condition 1

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    710.00   340.19   213.68   163.32   136.88   120.23
       SD   222.08    77.25    38.87    26.65    18.83    16.60
       R       500      500      488      362      163       35
       DKS¹  1.00     1.00      .945     .733     .440     .080
TC     M    146.73   137.78   128.92   124.45   120.59   117.37
       SD    45.85    31.25    23.43    20.29    16.57    16.19
       R       225      189      111       87       41       24
       DKS    .409     .365     .249     .180     .071     .043
TSB    M    120.76   120.74   119.83   118.23   117.60   116.83
       SD    16.24    15.75    16.22    15.76    15.52    15.97
       R        38       40       37       27       28       19
       DKS    .092     .084     .066     .028     .051     .047
TY B   M    122.24   124.50   123.83   122.51   120.16   117.36
       SD     6.60    10.34    12.86    15.00    14.51    15.81
       R         4       12       24       37       24       21
       DKS    .318     .263     .204     .130     .089     .038
TF     M      1.29     1.19     1.10     1.06     1.02     .995
       SD     .404     .270     .201     .173     .141     .137
       R        78       90       62       62       31       24
       DKS    .253     .286     .210     .154     .081     .040
Fn1,n2 M      1.07     1.03     1.01     1.01     1.00     1.00
       SD     .319     .213     .170     .150     .139     .132

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 9
Tail Behavior of Different Statistics
Intra-class Correlation Model, Condition 2

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    663.12   324.51   209.38   163.15   139.40   123.02
       SD   183.07    66.43    33.28    22.60    17.42    14.62
       R       500      500      498      412      184       38
       DKS¹  1.00     1.00      .964     .791     .499     .171
TC     M    137.05   131.43   126.32   124.32   122.81   120.09
       SD    37.80    26.88    20.06    17.20    15.33    14.26
       R       174      130       93       59       37       31
       DKS    .334     .248     .198     .159     .134     .100
TSB    M    109.69   109.41   110.34   112.19   112.79   116.01
       SD    16.28    16.34    15.90    15.00    14.18    14.65
       R         7        3        7        9        7       15
       DKS    .201     .193     .191     .149     .157     .054
TY B   M    121.01   122.59   122.57   122.58   122.14   120.03
       SD     6.15     9.27    11.39    12.77    13.34    13.92
       R         3        7       15       25       24       29
       DKS    .293     .251     .175     .192     .148     .104
TF     M      1.21     1.13     1.08     1.06     1.04     1.02
       SD     .333     .232     .172     .147     .130     .121
       R        46       62       46       40       31       30
       DKS    .197     .209     .178     .172     .142     .102

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 10
Tail Behavior of Different Statistics
Intra-class Correlation Model, Condition 3

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    764.34   352.00   221.30   171.09   143.55   125.88
       SD   231.25    69.61    34.49    24.50    17.94    15.33
       R       500      500      498      436      217       62
       DKS¹  1.00     1.00      .979     .837     .578     .200
TC     M    157.96   142.56   133.52   130.37   126.47   122.88
       SD    47.74    28.16    20.79    18.65    15.79    14.95
       R       281      207      145      102       64       48
       DKS    .519     .420     .321     .280     .221     .127
TSB    M    102.44   103.70   103.51   105.93   108.86   111.23
       SD    23.71    25.86    26.79    27.77    28.62    31.27
       R        21       29       35       46       57       63
       DKS    .361     .366     .377     .347     .309     .252
TY B   M    123.88   126.43   126.61   126.99   125.32   122.74
       SD     6.34     8.90    11.28    13.41    13.63    14.57
       R         2        8       29       57       43       42
       DKS    .346     .355     .307     .275     .233     .128
TF     M      1.39     1.23     1.14     1.11     1.07     1.04
       SD     .421     .243     .178     .159     .134     .127
       R       113      115       91       79       52       46
       DKS    .378     .351     .299     .277     .228     .127

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 11
Tail Behavior of Different Statistics
Intra-class Correlation Model, Condition 4

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    752.35   344.56   218.52   167.81   142.07   126.37
       SD   216.09    62.04    31.49    21.03    16.86    13.50
       R       500      500      499      448      219       42
       DKS¹  1.00     1.00      .983     .862     .545     .245
TC     M    155.49   139.54   131.84   127.87   125.16   123.36
       SD    44.61    25.10    18.98    16.01    14.84    13.16
       R       277      183      127       71       53       30
       DKS    .517     .391     .303     .250     .199     .173
TSB    M     98.45    98.67    99.13   100.21   104.70   106.86
       SD    24.10    25.37    25.27    24.76    25.51    26.85
       R        21       20       22       23       39       41
       DKS    .436     .416     .426     .422     .371     .334
TY B   M    123.75   125.63   125.80   125.28   124.21   123.22
       SD     5.83     8.14    10.41    11.67    12.90    12.84
       R         2        3       20       27       37       28
       DKS    .372     .360     .300     .274     .204     .172
TF     M      1.37     1.20     1.13     1.09     1.06     1.05
       SD     .393     .217     .162     .136     .126     .112
       R       100       88       68       51       43       30
       DKS    .365     .356     .283     .261     .203     .173

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 12
Different Statistics for the Model in Example 1

              TML          TB       TC      TSB          TY B     TF
Statistic     230.46       504.93   178.63  220.76       149.31   1.32
P-value       1.0 × 10⁻⁶   0        .01     7.5 × 10⁻⁶   .22      .09
Table 13
Summary and Recommendation

Type of Data                                                Recommended Statistic
Normal data                                                 TML
Nonnormal data with a nonsingular Ω¹                        TY B or TF
Nonnormal data with a nonsingular Ω and a near zero CV(α)   TSB
Nonnormal data with a near singular Ω                       Further study needed

¹ Ω = σ′c(θ)SY σc(θ)