British Journal of Mathematical and Statistical Psychology (1998), 51, 289–309.
Normal Theory Based Test Statistics in Structural Equation Modeling∗
Ke-Hai Yuan and Peter M. Bentler
University of California, Los Angeles
September 12, 1997; Revised, February 20, 1998
∗This research was supported in part by the National Institute on Drug Abuse, grants DA01070 and DA00017.
Even though data sets in psychology are seldom normal, the statistics used to evaluate covariance
structure models are typically based on the assumption of multivariate normality. Consequently,
many conclusions based on normal theory methods are suspect. In this paper, we develop test
statistics that can be correctly applied to the normal theory maximum likelihood estimator. We
propose three new asymptotically distribution free (ADF) test statistics that technically must yield
improved behavior in realistic-sized samples, and use Monte Carlo methods to study their actual
finite sample behavior. Results indicate that there exists an ADF test statistic that also performs
quite well in finite sample situations. Our analysis shows that various forms of ADF test statistics
are sensitive to model degrees of freedom rather than to model complexity. A new index is proposed
for evaluating whether a rescaled statistic will be robust. Recommendations are given regarding
the application of each test statistic.
1. Introduction
With the advance of factor analytic models to more flexible and confirmatory models, and with
the help of popular software like LISREL (Jöreskog & Sörbom, 1993) and EQS (Bentler, 1995),
the literature on structural equation modeling has increased dramatically in the past decade (see,
e.g., Austin & Calderon, 1996; Austin & Wolfle, 1991; Tremblay & Gardner, 1996). There is a
vast amount of recent introductory (Dunn et al., 1993; Byrne, 1994; Mueller, 1996; Schumacker
& Lomax, 1996; Ullman, 1996) and overview material (Hoyle, 1995; Marcoulides & Schumacker,
1996). However, these sources are not complete enough to accurately describe the problems and
alternatives associated with the most critical part of modeling, namely, model evaluation, especially
under conditions with real data that tend to be badly distributed. As recently reviewed by Bentler
and Dudgeon (1996), there exist test statistics, proposed by Browne (1984), whose asymptotic
distributions are known, but their finite sample behaviors are very disappointing (Hu et al., 1992;
Chan, 1995); there also exists a rescaled statistic, proposed by Satorra and Bentler (1988, 1994),
that works well in a variety of distribution situations (Curran et al., 1996; Hu et al., 1992), however,
its asymptotic distribution is generally not known. In this paper, we will propose and study some
new test statistics. We aim to find statistics whose asymptotic distributions are known that also
work well in finite samples with normal and nonnormal data.
Although many methods exist in structural equation modeling, we will concentrate our efforts
on tests associated with Wishart maximum likelihood (ML) estimation for several reasons. First,
the field largely developed on the basis of this classical method of estimation and the associated
likelihood ratio test TML for model evaluation (e.g., Bollen, 1989; Jöreskog, 1969). Second, the
ML method has been implemented in virtually all structural equation modeling software packages.
Third, ML is by far the most widely used method in practice (e.g., Breckler, 1990; Gierl & Mulvenon,
1995). Our new statistics will result from modifications to an existing statistic by Browne (1984,
Proposition 4). A detailed analysis will be given on why the existing statistic fails when model
degrees of freedom get large. Since the Satorra-Bentler rescaled statistic has been reported in many
places to perform well, we will also give a new characterization which permits the robustness of this
rescaled statistic to be evaluated. To motivate the development, we will use a real data example to
demonstrate the inadequacy of existing statistics applied to psychological data.
Harlow, Stein, and Rose (1998) studied a variety of scales and measures regarding psychosocial
functioning, sexual experience, and substance use in an attempt to understand the antecedents of
HIV-risky sexual behaviors in two samples of women assessed across three time points. Nineteen
of their variables, a subset of their study variables for two time points, were made available to us.
Exploratory factor analysis might be used to find out how many latent factors would be needed to
explain the 190 elements of the covariance matrix (171 intercorrelations and 19 variances) of these 19
variables. However, the authors believed that a confirmatory factor model with six factors based on
53 parameters and 137 degrees of freedom could explain the variances and covariances. In particular,
the authors hypothesized that four variables called meaninglessness in life, stress, demoralization,
and powerlessness would be highly related and would measure one latent factor they called Poor
Psychosocial Functioning. The frequency, intensity, and amount of alcohol use were hypothesized
to correlate highly and be good indicators of a latent Alcohol Use factor. Variables of body kissing,
genital kissing and touching, and sexual intercourse from various positions were expected to be good
markers of a latent factor of Common Sexual Experience. Variables of positive, negative and control
psychosexual attitudes were expected to mark a Positive Psychosexual Attitudes factor. Similarly,
Drug Use was indicated by recreational drug use frequency, hard drug use frequency, and amount
of drug use, while Diverse Sexual Experience was marked by third-person sex, use of toys or films
with sex, and engaging in sex that is hurtful or painful.
Letting the six factors be correlated, the authors’ substantive theory implies that their 19 vari-
ables X could be generated by a model X = Λf +ε with Λ(19×6), f(6×1), and ε(19×1). Based on
a sample with size N=213, their model was evaluated by normal theory ML. When referred to the
nominal χ²_137 with TML = 230.46, the associated p-value is 1.01 × 10⁻⁶, implying that the model is not
adequate. However, other evidence, such as a small average absolute residual of about .05 between
the data and model-reproduced correlation matrices, implies that the model may be statistically
adequate. The significant TML may be due to the nonnormality of the data. In fact, examining
the distribution of this data set, we find that the largest marginal kurtosis is 12.96, which is associ-
ated with the hard drug use variable, and Mardia’s normalized multivariate kurtosis coefficient was
11.87. Hence, we cannot believe the normal theory statistic TML. We shall return to this example
after we present some more reliable test statistics.
In the rest of this section, we briefly review the two test statistics proposed by Browne and
Satorra and Bentler. Our new test statistics will be given in the next section. The empirical
performances of these statistics will be studied in sections 3 and 4.
Let Xi = (xi1, . . . , xip)′, i = 1, . . . , N = n + 1, be a sample from X = (x1, . . . , xp)′, where
cov(X) = Σ is the population covariance matrix and S is its sample analogue. A covariance structure
on Σ can be expressed as a matrix function Σ = Σ(θ0) of q unknown parameters. Assuming the
normality of X, the estimate θ is obtained by minimizing
FML(θ) = tr(SΣ−1(θ)) − log |SΣ−1(θ)| − p, (1)
and TML = nFML(θ) is the associated statistic for evaluating the hypothetical model Σ = Σ(θ0).
Under the assumption of multivariate normality and the null hypothesis, TML is asymptotically distributed as χ²_{p∗−q}, where p∗ = p(p + 1)/2. Conditions also exist for normal theory inference to
be valid for nonnormal data with some specific models (Amemiya & Anderson, 1990; Anderson
& Amemiya, 1988; Browne, 1987; Browne & Shapiro, 1988; Mooijaart & Bentler, 1991; Satorra
& Bentler, 1990, 1991; Shapiro, 1987). Unfortunately, there is no effective way of verifying these
conditions in practice. When these conditions are not satisfied, the test statistic TML completely
breaks down as reported by Hu et al. (1992). So if the normal theory based statistic is going to be
used in conditions for which it was not designed, such as with nonnormal data, it will have to be
modified in some way.
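As a computational illustration, the discrepancy function (1) and the statistic TML can be sketched as follows. The 2 × 2 matrices here are hypothetical, chosen purely for illustration and not taken from any data set in the paper:

```python
import numpy as np

def f_ml(S, Sigma):
    """Normal theory ML discrepancy (1): tr(S Sigma^-1) - log|S Sigma^-1| - p."""
    p = S.shape[0]
    A = S @ np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(A)
    return np.trace(A) - logdet - p

# Hypothetical matrices, purely for illustration:
S = np.array([[1.0, 0.5], [0.5, 1.2]])      # sample covariance
Sigma = np.array([[1.0, 0.4], [0.4, 1.2]])  # model-implied covariance Sigma(theta)
N = 213
T_ml = (N - 1) * f_ml(S, Sigma)             # T_ML = n F_ML with n = N - 1
```

Under normality and a correct model, T_ml would be referred to a chi-square distribution with p∗ − q degrees of freedom; the discrepancy is zero exactly when S equals the model-implied covariance.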
Since the assumption of multivariate normality is often unrealistic for data sets in psychology
(e.g., Micceri, 1989), Browne (1984) proposed a test statistic which does not require any specific
distribution assumptions of the observed data. Let vech(·) be an operator which transforms a
symmetric matrix into a vector by picking the nonduplicated elements in the matrix, s = vech(S), σ(θ) = vech(Σ(θ)), and denote the p∗ × q Jacobian matrix corresponding to σ(θ) as σ̇(θ). Then there exists a full column rank p∗ × (p∗ − q) matrix σc(θ) whose columns are orthogonal to those of σ̇(θ). Let Yi = vech[(Xi − X̄)(Xi − X̄)′], and let SY be the corresponding sample covariance matrix of the Yi.
For an estimate θ, the test statistic given by Browne (1984) is defined as

TB(θ) = n e′σc(θ)[σ′c(θ)SY σc(θ)]⁻¹σ′c(θ)e, (2)

where e = s − σ(θ). When SY is nonsingular, another equivalent form for TB, for which one does not need to compute σc(θ), is

TB(θ) = n e′[SY⁻¹ − SY⁻¹σ̇(θ)(σ̇′(θ)SY⁻¹σ̇(θ))⁻¹σ̇′(θ)SY⁻¹]e. (3)
The test statistic TB(θ) asymptotically follows χ²_{p∗−q} as long as θ is consistent. When θ is obtained by minimizing

FGLS(θ) = (s − σ(θ))′SY⁻¹(s − σ(θ)), (4)

it is called the asymptotically distribution free (ADF) estimate and TADF = nFGLS(θ) is called the
classical ADF test statistic. We will refer to TB(θ) as the residual-based ADF test statistic for a
general consistent estimate θ. When sample size is large enough, the statistic TADF does perform
as expected (Chou et al., 1991; Curran et al., 1996; Hu et al., 1992; Muthen & Kaplan, 1985,
1992; Tanaka, 1984). However, in the typical situation with large models and small to moderate
sample sizes, TADF rejects correct models far too frequently, e.g., yielding up to 68% rejection of
the correct model in the study of Curran et al. (1996). This problem was substantially eliminated by
Yuan and Bentler (1997a, in press a), who proposed some new test statistics based on minimizing
(4). Their test statistics are also asymptotically distribution free but outperform the classical ADF
test in small to moderate sample sizes. However, there often exist computational problems and
nonconvergent solutions when minimizing the function FGLS(θ) in (4) when model size is large and
sample size is small, and no statistics can be obtained without a convergent solution. This type of
problem hardly exists with minimizing (1), which is another argument for using ML when data are
not normal. However, there is not much point to using ML if it is impossible to accurately evaluate
models. The purpose of this paper is to develop some new procedures to evaluate models based on
ML estimates.
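The residual-based statistic in (3) can be sketched numerically as follows. The inputs below (a fourth-moment estimate SY, a Jacobian J standing in for σ̇(θ), and a residual vector e) are randomly generated stand-ins for illustration, not output from a fitted model:

```python
import numpy as np

def t_b(e, SY, J, n):
    """Residual-based ADF statistic, form (3):
    T_B = n e'[SY^-1 - SY^-1 J (J' SY^-1 J)^-1 J' SY^-1] e,
    with e = s - sigma(theta) and J the p* x q Jacobian of sigma(theta)."""
    SYinv = np.linalg.inv(SY)
    M = SYinv - SYinv @ J @ np.linalg.inv(J.T @ SYinv @ J) @ J.T @ SYinv
    return float(n * e @ M @ e)

# Randomly generated stand-ins (small dimensions, illustration only):
rng = np.random.default_rng(0)
p_star, q = 6, 2
A = rng.standard_normal((p_star, p_star))
SY = A @ A.T + p_star * np.eye(p_star)  # a positive definite fourth-moment estimate
J = rng.standard_normal((p_star, q))
e = rng.standard_normal(p_star)
TB = t_b(e, SY, J, n=199)
```

Note that a residual lying in the column space of J contributes nothing to the quadratic form, which reflects the fact that TB measures only the part of s − σ(θ) orthogonal to the model.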
When the normality assumption does not hold, we can still estimate the unknown parameter θ0 by minimizing the function FML(θ), but TML will generally approach a nonnegative random variable Q instead of χ²_{p∗−q}. Let W = 2⁻¹D′p(Σ⁻¹ ⊗ Σ⁻¹)Dp, where Dp is the p² × p∗ duplication matrix as defined in Magnus and Neudecker (1988, p. 49), and σ̇ = σ̇(θ0); then

U = W − Wσ̇(σ̇′Wσ̇)⁻¹σ̇′W

is the residual weight matrix, as so named in Bentler and Dudgeon (1996). Satorra and Bentler
(1988) decomposed Q = Σ_{i=1}^{p∗−q} αiτi, where the αi's are the nonzero eigenvalues of the matrix UΓ with Γ = cov{vech[(X − µ)(X − µ)′]}, and the τi's are independent chi-square variates, each with one degree of freedom. When all the αi's are equal to 1, which is the case for normal data, Q follows χ²_{p∗−q}; when all the αi's are equal to a common value α, Q follows αχ²_{p∗−q}. Based on this, Satorra and Bentler
proposed the statistic
TSB = [(p∗ − q)/tr(ÛΓ̂)] TML, (5)

where Γ̂ = SY and Û is a consistent estimate of U. When all the αi's are equal, tr(ÛSY) approaches (p∗ − q)α, and TSB approaches χ²_{p∗−q}. Generally, the asymptotic distribution of TSB is unknown.
Let ᾱ be the mean and SD(α) = [Σ_{i=1}^{p∗−q} (αi − ᾱ)²/(p∗ − q − 1)]^{1/2} the standard deviation of the αi's, respectively. Then the coefficient of variation of the αi's is given by CV(α) = SD(α)/ᾱ. So TSB is asymptotically robust when CV(α) = 0, and TSB will not be asymptotically robust in general.
Existing empirical studies such as Chou et al. (1991), Curran et al. (1996), and Hu et al. (1992)
did not report the index CV(α). They only reported the marginal skewnesses and kurtoses of the
observed variables. However, Yuan and Bentler (in press b) showed that even though the observed
variables have heterogeneous marginal skewnesses and kurtoses, CV(α) can still be zero. So it is
impossible to know whether previous Monte Carlo studies violated the condition CV(α)=0 when
reporting the robustness of TSB. We will study TSB by specifying different CV(α) in our designed
conditions. This will allow us to evaluate the robustness of TSB when CV(α) ≠ 0. There are several
other statistics that rescale TML (e.g., Kano, 1992; Shapiro & Browne, 1987). We will not study
them here since they are not as general as the statistic TSB in robustness (e.g., Hu et al., 1992).
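The index CV(α) and the rescaling in (5) are straightforward to compute once the eigenvalues αi, or the estimates Û and Γ̂, are in hand. A minimal sketch (the eigenvalue lists and matrices below are hypothetical):

```python
import numpy as np

def cv_alpha(alphas):
    """Coefficient of variation CV(alpha) = SD(alpha)/mean(alpha) of the
    nonzero eigenvalues of U*Gamma (divisor p*-q-1, as in the text)."""
    a = np.asarray(alphas, dtype=float)
    sd = np.sqrt(np.sum((a - a.mean()) ** 2) / (len(a) - 1))
    return sd / a.mean()

def t_sb(t_ml, U_hat, Gamma_hat, df):
    """Satorra-Bentler rescaled statistic (5): (p*-q) T_ML / tr(U_hat Gamma_hat)."""
    return df * t_ml / np.trace(U_hat @ Gamma_hat)

# Equal eigenvalues -> CV = 0, the condition for asymptotic robustness of T_SB:
print(cv_alpha([2.0, 2.0, 2.0]))  # 0.0
```

When CV(α) = 0 the rescaling in t_sb recovers a chi-square limit; the larger CV(α) becomes, the less the rescaled statistic can be trusted, which is the point of reporting this index.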
Another aspect we want to make clear in this paper is that the test statistics TB(θ) and TADF are
sensitive to model degrees of freedom p∗− q, rather than to “model complexity”, defined as number
of parameters. Large degrees of freedom previously have been implicated in poor performance of
some test statistics. For example, Browne (1984, p. 79) conjectured that the sensitivity of TML
to kurtosis increases as the number of degrees of freedom for the model increases. Muthén and Kaplan (1992) observed poorer performance of the statistic TADF with models of
larger size. However, they confounded three aspects: number of variables, number of parameters,
and number of degrees of freedom. The four models they studied varied in the same direction on
all three aspects, so that a model that had a greater number of variables also had a greater number of
parameters and more degrees of freedom. Thus it is impossible to conclude which of these aspects is
responsible for poorer performance of TADF . Curran et al. (1996, p. 18) concluded from their review
that: “The ADF χ2 appeared to be very sensitive to model complexity, with extreme inflation of the
model chi-square as the tested model became increasingly complex”, where they define complexity
as the number of parameters estimated in the model. However, this seemingly correct remark is
misleading, as will be seen in our analysis of the statistic TB in the later sections. It is not the
number of parameters that is critical, but the degrees of freedom.
2. Three Asymptotically Distribution Free Test Statistics
In the definition of TB(θ) in (2), SY is used to estimate the population fourth-order moment
matrix Γ. A typical element of Γ is
γij,kl = σijkl − σijσkl,
where σijkl = E(xi − µi)(xj − µj)(xk − µk)(xl − µl) is the fourth-order multivariate moment of X about its mean µ = (µ1, . . . , µp)′ and σij is an element of Σ. The estimation of Γ by SY in Browne
(1984) is only based on consistency considerations. This is a large sample property. We will consider
some new estimates Γ which have the same consistency property as SY , as well as having good finite
sample properties. This will lead to several new test statistics. These test statistics can be applied
to any consistent estimates, but, for the reason mentioned earlier, we will restrict their applications
to the normal theory ML estimates in our study.
In the regression literature, a matrix based on cross products of model residuals is regularly
used for estimating the population counterpart in obtaining standard errors. For a consistent θ,
Γ̂ = (1/n) Σ_{i=1}^{N} (Yi − σ(θ))(Yi − σ(θ))′ = SY + (N/n)(Ȳ − σ(θ))(Ȳ − σ(θ))′ (6)

is also a consistent estimate of Γ. Since Ȳ ≠ σ(θ) in general, Γ̂ is different from SY. Replacing SY by Γ̂ in (2), we obtain a new statistic

TYB(θ) = TB(θ)/(1 + NTB(θ)/n²). (7)
This statistic is also asymptotically distribution free for any consistent θ. As the sample size gets large, TYB approaches χ²_{p∗−q} under the null hypothesis of a correct structure. So TYB is asymptotically equivalent to TB. But since TYB < TB numerically for any finite sample size N, we can expect the overrejection of correct models by TB at smaller sample sizes to be lessened by using TYB.
Notice that an estimate Γ̂ appearing in (2) enters through a form [σ′c(θ)Γ̂σc(θ)]⁻¹. For any symmetric random matrix A, it generally holds that E(A⁻¹) ≥ (EA)⁻¹ (see Exercise 3.7 on page 114 of Muirhead, 1982). So [σ′c(θ)SYσc(θ)]⁻¹ will be stochastically too large for estimating [σ′c(θ)Γσc(θ)]⁻¹ even though SY may be a good estimate of Γ. For example, let Z1, . . ., ZN be a sample from Nm(µ, Ω). Then the corresponding sample covariance SZ is an unbiased estimator of Ω. However, E(SZ⁻¹) = nΩ⁻¹/(n − m − 1), so that an unbiased estimator of Ω⁻¹ is given by (n − m − 1)SZ⁻¹/n rather than by SZ⁻¹. Although SZ⁻¹ is still consistent for Ω⁻¹, it is positively biased, and the bias is proportional to the model degrees of freedom. Note that Ω̂ = σ′c(θ)SYσc(θ) corresponds to the sample covariance matrix of the Zi = σ′c(θ)Yi. Even though the Zi here are not a random sample from Np∗−q(µ, Ω), it is hard to imagine any advantage of Ω̂⁻¹ over [n − (p∗ − q) − 1]Ω̂⁻¹/n in estimating Ω⁻¹. See Yuan and Bentler (1997b) for an application. Using this new estimate of Ω⁻¹, with Ω = σ′c(θ)Γσc(θ), in TB(θ) leads to a new test statistic

TC(θ) = [n − (p∗ − q) − 1]TB(θ)/n. (8)
Similar to TB and TYB, the statistic TC is also asymptotically distribution free and approaches χ²_{p∗−q} as the sample size gets large.
Recognizing that the statistic TB is stochastically too large to be approximated by χ²_{p∗−q} at finite sample sizes, the new statistics TYB and TC modify TB in order to obtain better approximations. The next test procedure is obtained by modifying the reference distribution χ²_{p∗−q} to a new distribution instead of modifying the test statistic TB.
The nice quadratic form of TB in (2) reminded Yuan and Bentler (in press a) of the well-known Hotelling's statistic

T² = N(AX̄ − b)′(ASA′)⁻¹(AX̄ − b),

which is used for testing the hypothesis Aµ = b. In a similar way, we may use (2) to test the hypothesis σ′c(σ(θ0) − σ) = 0. Of course, we can use a chi-square distribution to approximate the
distribution of T², but the approximation will be bad unless the model degrees of freedom are small and N is large. For example, let r be the dimension of the vector b; then

E(T²) = (N − 1)r/(N − r − 2),

var(T²) = 2r(N − 1)²(N − 2)/[(N − r − 2)²(N − r − 4)],

so both the expectation and the variance of T² are much larger than those of χ²_r, respectively. When r/N is not ignorable, T² will be stochastically much larger than a χ²_r variate. This will inevitably lead to overrejection when using critical values from χ²_r to judge the significance of T². A better approximation to the distribution of T² is through an F distribution:

(N − r)T²/[(N − 1)r] ∼ F_{r,N−r}. (9)
Even when a sample is not from a multivariate normal distribution and the observations are not
independent, the relation in (9) can still be exact. Actually, the approximation in (9) is quite
robust to a variety of nonnormal distributions (e.g., Mardia, 1975; Kariya, 1981). Thus motivated,
we propose to approximate the distribution of TB by a Hotelling's T² distribution, giving the statistic

TF = [N − (p∗ − q)]TB/[(N − 1)(p∗ − q)],

which is referred to an F-distribution with degrees of freedom p∗ − q and N − (p∗ − q). Note that the
statistic TF with its associated new distribution is asymptotically equivalent to the residual-based
ADF statistic. Thus, it is also asymptotically distribution free. The difference between TB and TF
will increase as the degrees of freedom p∗ − q get larger for a given finite sample size N .
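For a given value of TB, the three modifications are simple transformations; the sketch below collects (7), (8), and TF together with their reference distributions. The numerical values in the usage line are hypothetical, chosen only to exercise the formulas:

```python
import numpy as np
from scipy.stats import chi2, f as f_dist

def corrected_stats(TB, N, df):
    """Transform T_B into T_YB (7), T_C (8) and T_F, each with a p-value from
    its reference distribution (chi-square for T_YB and T_C, F for T_F)."""
    n = N - 1
    T_YB = TB / (1.0 + N * TB / n ** 2)      # (7)
    T_C = (n - df - 1) * TB / n              # (8)
    T_F = (N - df) * TB / ((N - 1) * df)     # referred to F(df, N - df)
    return (T_YB, chi2.sf(T_YB, df),
            T_C, chi2.sf(T_C, df),
            T_F, f_dist.sf(T_F, df, N - df))

# Hypothetical values of T_B, N and df, for illustration only:
out = corrected_stats(TB=160.0, N=213, df=137)
```

Both TYB and TC are numerically smaller than TB, which is exactly how they counteract the overrejection of TB at finite sample sizes; TF instead leaves the stochastic size of TB alone and changes the reference distribution.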
The above three statistics TY B, TC, and TF are closely related. All of them are simple functions
of TB and are asymptotically distribution free. The asymptotic equivalence of TY B and TC to TB is
easy to see. The asymptotic equivalence of TF to TB can be seen through the ratio

F = [χ²_{p∗−q}/(p∗ − q)] / [χ²_{N−(p∗−q)}/(N − (p∗ − q))],

where (p∗ − q)F approaches χ²_{p∗−q} in distribution. If we were only interested in the asymptotic
distributions of these test statistics, there would be no need to develop TY B, TC or TF , since they
are all equivalent to the existing statistic TB. However, our interest is in the finite sample behavior
of these statistics, which may be quite different from the asymptotic behavior. Since these statistics
are complicated nonlinear functions of the observed variables, the only possible approach to study
their finite sample behavior is through empirical simulations. This will be given in the next two
sections.
3. Models and Designs
Three models are used in our study. The first one is a fifteen dimensional confirmatory factor
model X = Λf + e, as used in Hu et al. (1992), with three common factors each having its own five
indicators. This generates a covariance structure Σ(θ) = ΛΦΛ′ + Ψ. The population parameters
are given by
Λ =
( λ  0  0 )
( 0  λ  0 )
( 0  0  λ ),

Φ =
( 1.0  .30  .40 )
( .30  1.0  .50 )
( .40  .50  1.0 ),

where λ′ = (.70, .70, .75, .80, .80) and 0 is a vector of five zeros. The Ψ is a diagonal matrix which makes all the diagonal elements of Σ be 1.0. In order for the factor model to be identifiable, we
restrict the last factor loading corresponding to each factor at its true value; this fixes the scale of
the factors. All the other nonzero parameters are set as unknown free parameters. So q = 33 for
this model, and degrees of freedom are p∗ − q = 87.
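The population covariance matrix of factor model A can be assembled directly from these values; a sketch, with the block structure of Λ and the choice of Ψ following the description above:

```python
import numpy as np

lam = np.array([.70, .70, .75, .80, .80])  # loadings of each factor's five indicators
Lambda = np.zeros((15, 3))
for k in range(3):
    Lambda[5 * k:5 * (k + 1), k] = lam     # block-diagonal loading pattern
Phi = np.array([[1.0, .30, .40],
                [.30, 1.0, .50],
                [.40, .50, 1.0]])          # factor correlations
Sigma0 = Lambda @ Phi @ Lambda.T
Psi = np.diag(1.0 - np.diag(Sigma0))       # unique variances so that diag(Sigma) = 1
Sigma = Sigma0 + Psi                       # Sigma(theta) = Lambda Phi Lambda' + Psi
```

With q = 33 free parameters, the degrees of freedom are p∗ − q = 15 · 16/2 − 33 = 87, as stated above.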
For the population covariance matrix Σ in the above 3-factor model, we let the observed variables be generated by X = Σ^{1/2}e with cov(e) = I in the second model. So the population covariance in this model is the same as in the first model, and the 3-factor model is still correct for fitting data
from this model. Consequently, we call the first model factor model A and the second model factor
model B.
The details of the variable generation techniques are given in Table 1. In factor model A, the
latent common factors f and the unique factors e are generated under four distribution conditions.
In factor model B, the vector e is generated under two distribution conditions. In factor model A,
condition 1 generates observed normal variables; condition 2 generates variables from an elliptical
distribution with Mardia’s kurtosis parameter equal to 3. Conditions 3 and 4 of factor model A and
conditions 1 and 2 of factor model B generate variables with nonzero skewness and kurtosis. The
difference between factor models A and B is that some aspects of the 4th-order moment matrices
of the observed variables are different.
Insert Table 1 near here
The other model in our study is a fifteen-dimensional intraclass correlation model Σ(θ) = θ1I + θ2 11′, where 1 represents a column vector with all elements 1.0. The population parameters are given by θ1 = θ2 = .5. So q = 2 for this model, and the degrees of freedom are p∗ − q = 118. Let X = Σ^{1/2}e. The e are generated under four different conditions, which are also given in Table 1. As in
the conditions for the confirmatory factor model, conditions 1 and 2 generate normal and elliptical
data, respectively; conditions 3 and 4 generate skewed data with nonzero kurtosis. A motivation
for choosing the intraclass model is to study the sensitivity of the statistic TB to model degrees of
freedom and model complexity. Compared to the confirmatory factor model, the intraclass model
is much simpler, with only two parameters that need to be estimated. On the basis of Curran et
al.’s observations, one would expect TB to perform better for this model than for the factor analysis
model. However, as we shall see in the next section, the statistic TB performs much worse on this
model than on the factor analysis model. This is because the degrees of freedom for this model are
larger.
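A sketch of the intraclass structure and the generation step X = Σ^{1/2}e follows; here e is normal (condition 1), whereas the other conditions of Table 1 would replace the generator for e with a skewed or heavy-tailed one:

```python
import numpy as np

p = 15
theta1, theta2 = 0.5, 0.5
Sigma = theta1 * np.eye(p) + theta2 * np.ones((p, p))  # Sigma = theta1 I + theta2 11'

# Symmetric square root for the generation step X = Sigma^{1/2} e:
vals, vecs = np.linalg.eigh(Sigma)
Sigma_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T

rng = np.random.default_rng(0)
e = rng.standard_normal((200, p))  # condition 1: normal e; other conditions skew e
X = e @ Sigma_half                 # rows of X are the observations X_i

df = p * (p + 1) // 2 - 2          # p* - q = 120 - 2 = 118
```

Despite having only two parameters, this model has more degrees of freedom (118) than the factor model (87), which is what lets the design separate degrees of freedom from model complexity.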
As for the study of statistic TSB, the CV(α)s corresponding to each structure and condition are
given at the bottom of Table 1. For all the conditions in factor model A, and conditions 1 and 2
in the intra-class model, CV(α) = 0. So, under these conditions, the statistic TSB should perform
robustly for large sample sizes. For conditions 1 and 2 of factor model B, CV(α) = .089, indicating
only a small discrepancy among the eigenvalues. Hence, in these conditions we would expect the
statistic TSB still to behave well. For conditions 3 and 4 of the intra-class model, on the other hand,
CV(α) = 2.38, a relatively much larger value. Consequently, we should not expect the statistic
TSB to behave robustly in these two conditions, at least when sample size is large. Since we know
the αi's in all the conditions, we can characterize some aspects of the large sample behavior of TSB
even when CV(α) does not equal zero. For example, in conditions 1 and 2 of factor model B, TSB
approaches a statistic with a standard deviation of 13.24; in conditions 3 and 4 of the intra-class
model, TSB approaches a statistic with a standard deviation of 39.46. Comparing these values with
the standard deviations associated with χ²_87 and χ²_118, which are 13.19 and 15.36, respectively, we
see that TSB should still behave appropriately with conditions 1 and 2 of factor model B, but not
with conditions 3 and 4 of the intra-class model.
The settings in Table 1 allow us to study various aspects of the statistics. The asymptotically
distribution free property of TB, TC, TY B and TF does not necessarily imply that they are distribu-
tion free in finite samples. We can observe their sensitivity to different models and distributional
conditions through our study. For example, in factor model A and the intra-class model, as we
move from condition 1 to condition 4, the data change from normal to slightly nonnormal, and then to severely nonnormal. The behavior of TSB under various values of CV(α) allows us to judge its robustness under
different violations of robustness conditions. The design will also allow us to see whether TB, as well
as other statistics, is sensitive to model complexity or to model degrees of freedom. Even though we
are not studying the statistic TML, we should note that TML is not robust in any of the nonnormal
conditions in Table 1. This was shown for some of these conditions by Hu et al. (1992).
For each model and each condition, estimates of the unknown parameters in the structural model
are obtained by applying the algorithm of iteratively reweighted least squares (Lee & Jennrich, 1979)
to minimize the FML(θ) in (1). Each test statistic is computed for each model and condition. Six
sample sizes 150, 200, 300, 500, 1000, 5000 are used for each case. Each combination is replicated
500 times. The sample mean, sample standard deviation, and rejection rate based on 5% critical
value are calculated for each statistic over the 500 replications. At a chance level, there will be
approximately 500 × .05 = 25 rejections of the correct model.
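The rejection-rate computation over replications reduces to comparing each statistic with the chi-square critical value; a minimal sketch:

```python
import numpy as np
from scipy.stats import chi2

def rejection_rate(stats, df, alpha=0.05):
    """Proportion of replications whose statistic exceeds the chi-square
    critical value at level alpha."""
    return float(np.mean(np.asarray(stats) > chi2.ppf(1 - alpha, df)))
```

For 500 replications of a well-behaved statistic, the rate should be near .05, i.e., about 25 rejections of the correct model, as noted above.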
Although tail behavior is probably the most important aspect of a test statistic, we also want to
understand the overall accuracy of the approximation of the proposed asymptotic distributions to
the finite sample empirical behavior of these statistics. For this we use the well-known Kolmogorov-
Smirnov (KS) statistic, using the empirical distribution FM(x) of each statistic T based on M=500
replications, and comparing to a theoretical distribution function F (x). The KS-statistic is defined
as DKS = supx |FM(x)− F (x)|. A very large DKS indicates that it is inappropriate to use F (x) to
describe the randomness of the statistic T. Critical values of DKS at significance levels .05 and .01 are 1.3581/√M and 1.6276/√M, respectively, which are approximately .061 and .073 for M = 500. More about the KS-statistic can be found in Stuart and Ord (1991, §30.37-§30.42).
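The statistic DKS can be computed from the sorted replications by taking the supremum over both sides of each jump of the empirical distribution function; a sketch, checked here against hypothetical chi-square draws rather than the paper's simulated statistics:

```python
import numpy as np
from scipy.stats import chi2

def d_ks(stats, cdf):
    """D_KS = sup_x |F_M(x) - F(x)|, with the supremum taken over both sides
    of each jump of the empirical distribution function F_M."""
    x = np.sort(np.asarray(stats, dtype=float))
    M = len(x)
    F = cdf(x)
    i = np.arange(1, M + 1)
    return float(max(np.max(i / M - F), np.max(F - (i - 1) / M)))

# Hypothetical check: chi-square(87) draws against their own reference cdf.
rng = np.random.default_rng(1)
draws = chi2.rvs(87, size=500, random_state=rng)
D = d_ks(draws, lambda x: chi2.cdf(x, 87))
crit_05 = 1.3581 / np.sqrt(500)  # approximately .061
```

When the reference distribution is correct, D should typically fall below the .05 critical value; a statistic whose D greatly exceeds it is poorly described by that distribution.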
4. Empirical Results
Simulation results under the various conditions are presented in Tables 2 to 11, where M, SD,
R, DKS represent the sample mean, the sample standard deviation, the rejection rate, and the
KS-statistic, respectively. For easy comparison, the means and standard deviations of the F distri-
butions are given at the bottom of Tables 2 and 8.
Table 2 summarizes the performance of the various test statistics when data are normal. From
Table 2, it is clearly seen that the statistic TB rejects the correct model too frequently for small
to moderate sample sizes, so that inferences based on TB are not reliable at all. In fact, TB does
not perform nominally until N=5000. The statistic TC performs much better than TB, but TC still
overrejects the correct model too frequently at smaller sample sizes. Note that when N=150, the
correction factor of TC over TB is .42; this factor is not small enough to make TC behave properly
in rejection rate. The statistic TSB performs very well except for a little overrejection for small
N . The statistic TY B overcorrects the rejection rate of TB in that, for smaller sample sizes, TY B
rejects the correct model less than the nominal level. For small to moderate sample sizes, TF also
minimally overrejects the correct model, but performs much better than TC.
With respect to the sample mean and sample standard deviation, those corresponding to TB at smaller sample sizes are much larger than those of χ²_87. Even N=5000 is not large enough to solve this problem completely. The sample mean and standard deviation of TC retain some heritage from TB, though they are much better. TSB has minimally larger sample means than 87, while its sample standard deviations match those of χ²_87 very well for all the sample sizes studied. While the means of TYB are a little larger than 87, its sample standard deviations are smaller than those of χ²_87. The sample means and sample standard deviations corresponding to TF also are a little
bit larger than those of the corresponding F-distributions. With respect to the KS-statistic, those
corresponding to TB are the largest for all the sample sizes. For sample sizes up to 500, TSB has
the smallest DKS . For N=1000, TC has the smallest DKS and TY B has the smallest DKS when
N=5000.
Insert Tables 2 and 3 near here
Table 3 gives results for elliptically symmetric data. The rejection rate of TB is still as bad as
it is for normal data; TC still overrejects the correct model with smaller N , but it performs better
than that for the normal data case. The rejection rate of TY B is almost the same as in the normal
data case, with some underrejection of the correct model for smaller sample sizes. The rejection
rates of TSB and TF are uniformly good at all the sample sizes. With respect to the KS-statistic,
TB has the largest DKS and TSB has the smallest DKS over all the sample sizes studied.
Results for factor model A with skewed data are given in Table 4. Most of the test statistics
behave in a pattern similar to that of the elliptical data case. The statistic TB works well when
N=5000; this is the same conclusion as given by Hu et al. (1992) for the classical ADF statistic.
For N=300 and up, the statistic TC behaves very well, showing the effect of the correction factor [n − (p∗ − q) − 1]/n. The test statistic TF changes, for all the sample sizes, from a little overrejection in Table 3 to a little underrejection of the correct model when compared to the 5% nominal level. The
statistic TY B, on the other hand, behaves in a very stable way across these two types of nonnormal
data over all the sample sizes. The KS-statistics all behave in a similar way as found with elliptical
data. The results based on condition 4, with both factors and errors skewed, are similar to those of
condition 3. They are given in Table 5.
Insert Tables 4 and 5 near here
Tables 6 and 7 present the results from factor model B with asymmetric data. It is apparent
that statistics TB, TC , TY B, and TF behave in almost the same way as for conditions 3 and 4 in
factor model A. However, TSB behaves somewhat differently. For example, with moderate sample
sizes, the rejection rate of TSB changes from slightly under the nominal level in Tables 4 and 5 to
slightly over the nominal level in Tables 6 and 7. Also, the rejection rate of TSB is quite high for
smaller sample sizes. This may be due to the minor violation of the robustness condition for TSB.
Insert Tables 6 and 7 near here
Next we turn to the intraclass model, for which the results are given in Tables 8-11. First, it
is instructive to compare Tables 8 and 9 with Tables 2 and 3. All these tables give results that
correspond to normal and elliptical data. The rejection rates of the statistics TB and TC, already
bad in Tables 2 and 3, are even worse in Tables 8 and 9. This is best appreciated through TC
and the KS-statistics. For example, when N=150, the factor (n − (p∗ − q) − 1)/n = .21 used to
obtain TC is still not small enough to correct the large rejection rate of TB. The DKS corresponding
to TB are all equal to 1 for N=150 and 200. Since the upper bound of DKS is 1, this means that
for the intraclass model with N=150 and 200, using any other distribution to approximate that
of TB cannot be worse than using a χ²₁₁₈ in the criterion of the KS-statistic. An implication is
that the statistic TB is very sensitive to the degrees of freedom of a model rather than to model
complexity. With respect to the other statistics, TF rejects the correct intraclass model more often
than the factor model. Since TF is a nonrandom scalar times TB, the overrejection of TB passes
to TF . This can also be observed in Tables 10 and 11. As in Table 2, the statistic TSB in Table 8
overrejects the correct model in a stable way across all the sample sizes. However, TSB underrejects
the correct model in Table 9 for all sample sizes. This may reflect its sensitivity to model complexity
and degrees of freedom. The statistic TY B in the intraclass model performs as it does in the factor
model.
Insert Tables 8 and 9 near here
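The DKS criterion used throughout these comparisons is the ordinary Kolmogorov–Smirnov distance between the empirical distribution of a statistic and the reference chi-square. A minimal sketch (Python with NumPy/SciPy; the inflation factor 5 is a made-up stand-in for a badly behaved statistic, not a value from the study) shows how DKS approaches its upper bound of 1:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
df = 118  # degrees of freedom of the intraclass model

# Hypothetical replications of a statistic whose values sit far in the right
# tail of chi-square_118, mimicking severe overrejection.
inflated = rng.chisquare(df, size=500) * 5

# D_KS: largest gap between the empirical CDF and the chi-square_118 CDF.
d_ks = stats.kstest(inflated, stats.chi2(df).cdf).statistic
print(round(d_ks, 3))  # essentially 1, the upper bound
```

When every replication lies far in the right tail of the reference distribution, the two CDFs are maximally separated, which is exactly what DKS = 1 records for TB in Tables 8 and 9.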
Tables 10 and 11 correspond to conditions 3 and 4 for the intraclass model. These conditions
were designed to demonstrate the possible nonrobustness of the statistic TSB. Even though the
asymptotic distribution of TSB is no longer χ² with p∗ − q degrees of freedom, using a critical value from the chi-square
distribution still gives reasonable rejection rates in small to moderate sample sizes. However, as
N gets larger, the rejection rate of TSB departs from the nominal level. A trend toward non-χ²
behavior can also be observed in the standard deviations and the KS-statistics. While the
standard deviation of χ²₁₁₈ is 15.36, the sample standard deviations of TSB in Tables 10 and 11
are much larger than 15.36 at all sample sizes studied. Asymptotically, the standard deviation of
TSB approaches 39.46 as N gets larger. This trend can be observed in these two tables. The
nonrobustness of TSB can also be easily detected by the DKS . In conditions for which TSB is robust,
the DKS associated with TSB are almost always the smallest. Even when they are not the smallest,
the obtained DKS are only minimally different from the smallest DKS . In Tables 10 and 11, the
DKS associated with TSB uniformly dominate those associated with TY B for all sample sizes, and
those associated with TF for most of the sample sizes. Furthermore, for conditions under which TSB
is robust or nearly robust, the DKS associated with TSB decrease very fast as N increases; however,
they decrease rather slowly in Tables 10 and 11.
Insert Tables 10 and 11 near here
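The reference values quoted above follow from the fact that a χ² variate with df degrees of freedom has variance 2·df; a quick check (Python):

```python
import math

# Standard deviation of a chi-square distribution with df degrees of freedom:
# Var(chi-square_df) = 2 * df.
def chi2_sd(df):
    return math.sqrt(2 * df)

print(round(chi2_sd(118), 2))  # 15.36, the intraclass-model reference value
print(round(chi2_sd(87), 2))   # 13.19, the factor-model counterpart
```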
It is also worthwhile to re-evaluate the model in the example introduced at the beginning of the
paper. Results based on the new test statistics are presented in Table 12. We can see that, similarly
to TML, the statistics TB and TSB also suggest that the model in this example is not adequate, even
though TB is specifically designed for large sample nonnormal data. In contrast, TY B suggests that
the model fits the data very well. Similarly, the statistics TF and TC also suggest that the model
is reasonable. Our new finite sample test statistics are consistent with a priori expectations and
residuals.
Insert Table 12 near here
So far, we have concentrated on the performance of the different statistics under the null hypothesis.
The power of a statistic under an alternative hypothesis is equally important. When a
model slightly departs from the null hypothesis and N becomes large, TC and TY B will approach
the same noncentral chi-square distribution as that of TB, and TF will approach a noncentral F-
distribution with the same noncentrality parameter as the chi-square. The finite sample behavior
in power of these statistics was also studied. Since the power properties of TY B and TF are very
similar to those of their counterparts when applied to the ADF estimates, as reported in Yuan
and Bentler (1997a, in press a), we give only a brief report of these results here. When misfitting
the 3-factor model by a 2-factor model, TY B has an average power of .568 for sample size 150, and
.936 for sample size 200; TF has an average power of .843 for sample size 150, and .982 for sample
size 200; and each of them has an actual power of 1.0 for sample sizes 300 and above. The statistic
TC has better power than either TY B or TF . Note that the power of a test statistic is closely
related to its type I error rate: a decreasing type I error usually is accompanied by an increase in
type II error. This happens with almost any useful test statistic, e.g., the TSB as studied in Curran
et al. (1996). So it is inappropriate to discuss power without controlling the type I error. For
example, the statistic TB always has a power of 1.0 for detecting the wrong 2-factor model, but one
would prefer not to use it since it rejects the correct model with probability 1 at smaller sample
sizes. In most of the statistical literature on power, one generally controls the type I error, since it
is usually considered more important.
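The power comparison above amounts to estimating the proportion of replications exceeding a chi-square critical value that controls the type I error. A sketch under assumed values (the noncentrality parameter 60 and the noncentral chi-square form of the limiting distribution mentioned in the text are illustrative assumptions, not values from the study):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
df = 87     # factor-model degrees of freedom
ncp = 60.0  # hypothetical noncentrality under a misspecified model

# Replications of a statistic whose large-sample distribution is noncentral
# chi-square, as described in the text for T_C and T_YB.
draws = rng.noncentral_chisquare(df, ncp, size=500)

# Control the type I error at 5% via the central chi-square critical value,
# then estimate power as the rejection proportion across replications.
critical = stats.chi2(df).ppf(0.95)
power = float(np.mean(draws > critical))
```

Comparing statistics at a common, controlled type I error in this way is what makes the power figures in the text interpretable.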
5. Discussion and Recommendation
In this paper, we proposed three new test statistics, TY B, TC and TF , and applied each of
them to the normal theory MLE. All of them are asymptotically equivalent to the residual-based
ADF statistic TB. Empirical studies indicate that, to a certain degree, each of them corrects the
overrejection of TB for correct models in finite samples. Among the three, TY B performs most
stably across different models and distribution conditions. The only drawback of TY B as a general
test statistic is a tendency to slightly over-accept correct models when the sample size is small.
The test statistic TF is very reliable for models whose degrees of freedom are not too large; however,
its rejection rate is still too high when the model degrees of freedom get larger. The rejection rate of the
statistic TC is also too high for small to moderate sample sizes, though it dramatically improves on
TB.
The remarkable behavior of TSB is once again verified in our study with CV(α)=0. Even when
CV(α)=2.38, its performance in rejection rate is still very good for smaller sample sizes. However,
the robustness of TSB under many conditions does not necessarily mean that it can be applied to
any type of data. Actually, its robustness will break down when CV(α) gets larger, at least with
larger sample sizes. This is demonstrated in Tables 10 and 11 as well as in Table 12, where TML
and TSB give almost the same model evaluation for the example. The drawback of TSB is that we
do not know its asymptotic distribution unless CV(α)=0.
What general conclusions and recommendations can we make from this study? First, we have
shown definitively that TB should not be used in any case. This is because statistics TY B, TC and
TF all possess the same theoretical property as TB but perform much better. If one needs an asymp-
totically distribution free test statistic when facing severely nonnormal data, TY B is recommended.
Statistic TF is also very reliable when the model degrees of freedom are not so large. The statistic
TSB is remarkably reliable when CV(α) ≈ 0. However, there is currently no known statistical test
for the hypothesis CV(α) = 0 in the population, and the sample estimate of CV(α) may still be
relatively large even if CV(α) = 0. The development of a statistical test for the hypothesis
CV(α) = 0, and empirical research on the sample estimate of CV(α) for different types of nonnormal
data satisfying CV(α) = 0, are interesting topics
for further study. Of course, the matrix Ω = σ′c(θ)SY σc(θ) in (2) needs to be nonsingular in order
to use TY B or TF ; similarly, the matrix U Γ in (5) needs to be of rank p∗ − q for using the statistic
TSB. Assuming matrices Σ and σc are of full rank, we can show that the nonsingularity of Ω is
equivalent to rank(U Γ) = p∗ − q. In situations with a near singular Ω, such as with very small
sample sizes, further study is needed to find an appropriate statistic for nonnormal data. The above
discussion and our recommendations are summarized in Table 13.
Insert Table 13 near here
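The recommendations in Table 13 can be read as a simple decision rule. The sketch below (Python; the function and argument names are ours, and judging "nonsingular" or "near zero" in practice of course requires care) encodes that summary:

```python
def recommend_statistic(normal_data, omega_nonsingular, cv_alpha_near_zero):
    # Encodes the summary in Table 13 (illustrative only).
    if normal_data:
        return "TML"
    if not omega_nonsingular:
        return "further study needed"
    if cv_alpha_near_zero:
        return "TSB"
    return "TY B or TF"

print(recommend_statistic(False, True, False))  # TY B or TF
```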
References
Amemiya, Y. & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor
analysis models. Annals of Statistics, 18, 1453–1463.
Anderson, T. W. & Amemiya, Y. (1988). The asymptotic normal distribution of estimators in
factor analysis under general conditions. Annals of Statistics, 16, 759–771.
Austin, J. T. & Calderon, R. F. (1996). Theoretical and technical contributions to structural
equation modeling: An updated annotated bibliography. Structural Equation Modeling, 3, 105–
175.
Austin, J. T. & Wolfle, D. (1991). Annotated bibliography of structural equation modeling: Tech-
nical work. British Journal of Mathematical and Statistical Psychology, 44, 93–152.
Bentler, P. M. (1995). EQS Structural Equations Program Manual. Encino, CA: Multivariate
Software.
Bentler, P. M. & Dudgeon, P. (1996). Covariance structure analysis: Statistical practice, theory,
and directions. Annual Review of Psychology, 47, 541–570.
Bollen, K. A. (1989). Structural Equations with Latent Variables. New York: Wiley.
Breckler, S. J. (1990). Application of covariance structure modeling in psychology: Cause for
concern? Psychological Bulletin, 107, 260–273.
Browne, M. W. (1984). Asymptotic distribution-free methods for the analysis of covariance struc-
tures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.
Browne, M. W. (1987). Robustness of statistical inference in factor analysis and related models.
Biometrika, 74, 375–384.
Browne, M. W. & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear
latent variate models. British Journal of Mathematical and Statistical Psychology, 41, 193–208.
Byrne, B. M. (1994). Structural Equation Modeling with EQS and EQS/Windows, Thousand Oaks,
CA: Sage.
Chan, W. (1995). Covariance structure analysis of ipsative data. Ph.D. thesis, UCLA.
Chou, C.-P., Bentler, P. M. & Satorra, A. (1991). Scaled test statistics and robust standard errors
for nonnormal data in covariance structure analysis: A Monte Carlo study. British Journal of
Mathematical and Statistical Psychology, 44, 347–357.
Curran, P. S., West, S. G. & Finch, J. F. (1996). The robustness of test statistics to nonnormality
and specification error in confirmatory factor analysis. Psychological Methods, 1, 16–29.
Dunn, G., Everitt, B. & Pickles, A. (1993). Modeling Covariances and Latent Variables Using EQS.
London: Chapman & Hall.
Gierl, M. J. & Mulvenon S. (1995). Evaluating the application of fit indices to structural equation
models in educational research: A review of the literature from 1990 through 1994. Paper
presented at Annual Meetings of the American Educational Research Association, San Francisco.
Harlow, L. L., Stein, J. A. & Rose, J. S. (1998). Substance abuse and risky sexual behavior in
women: A longitudinal stage model. Manuscript under review; based on NIMH Grant MH47233.
Hoyle, R. (ed.) (1995). Structural Equation Modeling: Concepts, Issues, and Applications. Thou-
sand Oaks, CA: Sage.
Hu, L., Bentler, P. M. & Kano, Y. (1992). Can test statistics in covariance structure analysis be
trusted? Psychological Bulletin, 112, 351–362.
Joreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis.
Psychometrika, 34, 183–202.
Joreskog, K. G. & Sorbom, D. (1993). LISREL 8 User’s Reference Guide. Chicago: Scientific
Software International.
Kano, Y. (1992). Robust statistics for test-of-independence and related structural models. Statistics
and Probability Letters, 15, 21–26.
Kariya, T. (1981). A robustness property of Hotelling’s T 2-test. Annals of Statistics, 9, 211–214.
Lee, S.-Y. & Jennrich, R. I. (1979). A study of algorithms for covariance structure analysis with
specific comparisons using factor analysis. Psychometrika, 44, 99–113.
Magnus, J. R. & Neudecker, H. (1988). Matrix Differential Calculus with Applications in Statistics
and Econometrics. New York: Wiley.
Marcoulides, G. A. & Schumacker, R. E. (eds.) (1996). Advanced Structural Equation Modeling:
Issues and Techniques. Mahwah, NJ: Lawrence Erlbaum Associates.
Mardia, K. V. (1975). Assessment of multinormality and the robustness of Hotelling’s T 2 test.
Applied Statistics, 24, 163–171.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological
Bulletin, 105, 156–166.
Mooijaart, A. & Bentler, P. M. (1991). Robustness of normal theory statistics in structural equation
models. Statistica Neerlandica, 45, 159–171.
Mueller, R. O. (1996). Basic Principles of Structural Equation Modeling. New York: Springer
Verlag.
Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. New York: Wiley.
Muthen, B. & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of
non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38,
171–189.
Muthen, B. & Kaplan, D. (1992). A comparison of some methodologies for the factor analysis of
non-normal Likert variables: A note on the size of the model. British Journal of Mathematical
and Statistical Psychology, 45, 19–30.
Satorra, A. & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance
structure analysis. American Statistical Association 1988 proceedings of Business and Economics
Sections (pp. 308–313). Alexandria, VA: American Statistical Association.
Satorra, A. & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of
linear relations. Computational Statistics & Data Analysis, 10, 235–249.
Satorra, A. & Bentler, P. M. (1991). Goodness-of-fit test under IV estimation: Asymptotic robust-
ness of a NT test statistic. In R. Gutierrez & M. J. Valderrama (eds.), Applied Stochastic Models
and Data Analysis (pp. 555–567). Singapore: World Scientific.
Satorra, A. & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance
structure analysis. In A. von Eye & C. C. Clogg (eds.), Latent Variables Analysis: Applications
for Developmental Research (pp. 399–419). Newbury Park, CA: Sage.
Schumacker, R. E. & Lomax, R. G. (1996). A Beginner’s Guide to Structural Equation Modeling.
Mahwah, NJ: Lawrence Erlbaum Associates.
Shapiro, A. (1987). Robustness properties of the MDF analysis of moment structures. South African
Statistical Journal, 21, 39–62.
Shapiro, A. & Browne, M. (1987). Analysis of covariance structures under elliptical distributions.
Journal of the American Statistical Association, 82, 1092–1097.
Stuart, A. & Ord, J. K. (1991). Kendall's Advanced Theory of Statistics, Vol. 2 (5th ed.). New York:
Oxford University Press.
Tanaka, J. S. (1984). Some results on the estimation of covariance structure models. Ph.D. thesis,
UCLA.
Tremblay, P. F. & Gardner, R. C. (1996). On the growth of structural equation modeling in
psychological journals. Structural Equation Modeling, 3, 93–104.
Ullman, J. (1996). Structural equation modeling. In B. G. Tabachnick & L. S. Fidell, Using Mul-
tivariate Statistics (3rd ed., Ch. 14, pp. 709–811). New York: Harper Collins College Publishers.
Yuan, K.-H. & Bentler, P. M. (1997a). Mean and covariance structure analysis: Theoretical and
practical improvements. Journal of the American Statistical Association, 92, 767–774.
Yuan, K.-H. & Bentler, P. M. (1997b). Improving parameter tests in covariance structure analysis.
Computational Statistics & Data Analysis, 26, 177–198.
Yuan, K.-H. & Bentler, P. M. (in press a). F-tests for mean and covariance structure analysis.
Journal of Educational and Behavioral Statistics.
Yuan, K.-H. & Bentler, P. M. (in press b). On normal theory and associated test statistics in
covariance structure analysis under two classes of nonnormal distributions. Statistica Sinica.
Table 1
Designed Conditions∗

Factor Model A: X = Λf + e
  Condition 1¹: f ∼ N(0, Φ), e ∼ N(0, Ψ)
  Condition 2¹: f = f1/R, e = e1/R; f1 ∼ N(0, Φ), e1 ∼ N(0, Ψ), R ∼ √(χ²₅/3)
  Condition 3¹: f = f1/R, e = e1/R; f1 ∼ N(0, Φ), e1 ∼ Lognormal(0, Ψ), R ∼ √(χ²₅/3)
  Condition 4¹: f = f1/R, e = e1/R; f1 ∼ Lognormal(0, Φ), e1 ∼ Lognormal(0, Ψ), R ∼ √(χ²₅/3)

Factor Model B: X = Σ^(1/2)e, Σ = ΛΦΛ′ + Ψ
  Condition 1²: e ∼ Lognormal(0, I)
  Condition 2²: e = e1/R, e1 ∼ Lognormal(0, I), R ∼ √(χ²₅/3)

Intra-class Model: X = Σ^(1/2)e, Σ = θ₁I + θ₂11′
  Condition 1¹: e ∼ N(0, I)
  Condition 2¹: e = e1/R, e1 ∼ N(0, I), R ∼ √(χ²₅/3)
  Condition 3³: e ∼ Lognormal(0, I)
  Condition 4³: e = e1/R, e1 ∼ Lognormal(0, I), R ∼ √(χ²₅/3)

∗ e and f are independent; e1, f1, and R are independent.
¹ CV(α) = 0; ² CV(α) = .089; ³ CV(α) = 2.38.
Table 2
Tail Behavior of Different Statistics
Factor Model A, Condition 1

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    235.33   168.46   130.23   110.05    97.09    89.03
       SD    57.32    34.83    22.99    19.01    15.82    13.89
       R       500      486      413      238      100       43
       DKS¹   .991     .944     .796     .540     .285     .072
TC     M     97.27    94.33    92.03    90.68    88.54    87.47
       SD    23.69    19.51    16.25    15.66    14.43    13.63
       R       140       97       63       58       41       35
       DKS    .251     .197     .139     .111     .047     .030
TSB    M     90.89    89.75    88.24    87.72    87.91    87.29
       SD    13.67    13.34    13.10    13.01    13.16    13.57
       R        48       39       32       23       27       35
       DKS    .126     .098     .052     .049     .052     .029
TY B   M     90.36    90.50    90.22    89.81    88.31    87.44
       SD     8.53    10.06    11.00    12.61    13.06    13.40
       R         4       14       20       32       35       31
       DKS    .219     .192     .159     .114     .062     .026
TF     M      1.14     1.10     1.07     1.05     1.02     1.01
       SD     .279     .227     .188     .181     .166     .157
       R        58       53       41       42       40       33
       DKS    .176     .156     .145     .111     .054     .028
Fn1,n2 M      1.03     1.02     1.01     1.00     1.00     1.00
       SD     .248     .208     .183     .168     .159     .153

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 3
Tail Behavior of Different Statistics
Factor Model A, Condition 2

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    226.24   166.23   129.50   110.04    97.98    89.44
       SD    47.77    28.78    20.40    15.97    13.68    13.06
       R       500      495      422      240       94       33
       DKS¹   .991     .955     .808     .598     .339     .105
TC     M     93.51    93.09    91.51    90.67    89.36    87.87
       SD    19.72    16.10    14.40    13.15    12.47    12.82
       R        82       70       53       33       28       24
       DKS    .182     .189     .142     .135     .103     .058
TSB    M     90.47    88.48    87.19    86.11    86.02    86.67
       SD    12.36    12.13    11.88    11.78    13.12    12.67
       R        36       24       22       15       18       19
       DKS    .127     .088     .044     .055     .057     .031
TY B   M     89.28    90.11    89.99    89.91    89.09    87.84
       SD     7.38     8.55     9.82    10.61    11.31    12.60
       R         2        4       12       19       18       24
       DKS    .216     .197     .182     .165     .113     .058
TF     M      1.10     1.08     1.06     1.05     1.03     1.01
       SD     .232     .188     .167     .152     .144     .148
       R        29       30       32       24       22       24
       DKS    .163     .174     .154     .153     .109     .059

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 4
Tail Behavior of Different Statistics
Factor Model A, Condition 3

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    220.89   162.48   126.04   107.24    97.61    89.69
       SD    42.20    27.34    17.81    14.83    12.70    12.16
       R       500      498      412      208       88       25
       DKS¹   .998     .968     .812     .548     .332     .116
TC     M     91.30    90.99    89.07    88.37    89.02    88.11
       SD    17.42    15.29    12.57    12.21    11.57    11.93
       R        75       56       31       23       22       19
       DKS    .128     .123     .078     .075     .097     .072
TSB    M     92.12    90.64    88.93    88.19    88.24    87.19
       SD    12.17    11.10    11.39    11.66    12.36    13.30
       R        33       25       17       20       28       22
       DKS    .192     .151     .102     .077     .069     .026
TY B   M     88.57    89.04    88.39    88.06    88.81    88.08
       SD     6.78     8.18     8.76     9.99    10.53    11.73
       R         1        3        4        9       14       17
       DKS    .212     .179     .158     .121     .117     .074
TF     M      1.07     1.06     1.03     1.02     1.03     1.01
       SD     .205     .178     .146     .141     .133     .137
       R        20       23       15       18       17       18
       DKS    .127     .132     .122     .100     .108     .073

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 5
Tail Behavior of Different Statistics
Factor Model A, Condition 4

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    220.47   161.82   125.16   107.40    97.10    90.12
       SD    41.28    26.51    17.14    14.65    13.74    12.01
       R       498      495      404      200       87       26
       DKS¹   .998     .959     .799     .552     .291     .142
TC     M     91.13    90.62    88.45    88.50    88.55    88.53
       SD    17.04    14.83    12.10    12.06    12.52    11.79
       R        70       53       21       25       29       20
       DKS    .143     .103     .074     .079     .066     .101
TSB    M     91.77    90.77    88.92    87.52    86.68    87.61
       SD    12.12    11.24    11.53    11.37    12.10    13.41
       R        43       24       15       17       15       26
       DKS    .163     .167     .105     .077     .035     .043
TY B   M     88.52    88.87    87.97    88.17    88.36    88.50
       SD     6.79     7.94     8.49     9.85    11.38    11.59
       R         1        3        2        8       21       19
       DKS    .212     .183     .149     .125     .078     .105
TF     M      1.07     1.06     1.02     1.02     1.02     1.02
       SD     .201     .173     .140     .139     .144     .136
       R        13       19        9       17       25       19
       DKS    .129     .127     .112     .104     .069     .103

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 6
Tail Behavior of Different Statistics
Factor Model B, Condition 1

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    223.50   159.59   125.83   108.20    97.52    89.40
       SD    48.09    25.75    19.60    15.40    13.36    12.42
       R       500      496      392      220       88       31
       DKS¹   .999     .959     .771     .552     .314     .102
TC     M     92.38    89.37    88.92    89.16    88.94    87.82
       SD    19.86    14.40    13.83    12.68    12.18    12.19
       R        91       40       38       26       28       26
       DKS    .145     .079     .071     .082     .094     .061
TSB    M     92.60    92.01    91.05    88.46    87.75    87.30
       SD    13.16    12.94    13.59    12.70    12.84    12.56
       R        50       41       52       25       29       24
       DKS    .179     .178     .121     .063     .032     .030
TY B   M     88.81    88.21    88.20    88.69    88.72    87.80
       SD     7.49     7.80     9.64    10.35    11.04    11.98
       R         2        5        8       13       20       24
       DKS    .203     .171     .115     .115     .115     .065
TF     M      1.09     1.04     1.03     1.03     1.02     1.01
       SD     .234     .168     .160     .146     .140     .140
       R        29       14       19       21       23       25
       DKS    .110     .111     .079     .099     .106     .063

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 7
Tail Behavior of Different Statistics
Factor Model B, Condition 2

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    225.93   163.12   128.80   109.25    98.60    90.01
       SD    44.19    25.36    18.49    14.31    13.59    11.54
       R       499      497      425      232       99       22
       DKS¹   .997     .965     .823     .598     .338     .136
TC     M     93.02    91.35    91.02    90.02    89.92    88.43
       SD    18.30    14.18    13.05    11.78    12.38    11.33
       R        89       49       36       34       26       20
       DKS    .191     .144     .126     .144     .105     .097
TSB    M     96.38    93.44    93.01    91.04    89.21    87.77
       SD    12.55    13.25    13.77    12.35    12.67    11.80
       R        72       52       53       32       29       20
       DKS    .303     .216     .174     .154     .097     .065
TY B   M     89.34    89.31    89.73    89.43    89.61    88.40
       SD     6.94     7.61     9.00     9.61    11.19    11.13
       R         1        2        8        7       20       17
       DKS    .216     .208     .187     .168     .123     .101
TF     M      1.10     1.06     1.05     1.04     1.04     1.02
       SD     .215     .165     .151     .136     .143     .130
       R        23       17       21       21       24       18
       DKS    .155     .165     .156     .157     .114     .099

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 8
Tail Behavior of Different Statistics
Intra-class Correlation Model, Condition 1

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    710.00   340.19   213.68   163.32   136.88   120.23
       SD   222.08    77.25    38.87    26.65    18.83    16.60
       R       500      500      488      362      163       35
       DKS¹  1.00     1.00      .945     .733     .440     .080
TC     M    146.73   137.78   128.92   124.45   120.59   117.37
       SD    45.85    31.25    23.43    20.29    16.57    16.19
       R       225      189      111       87       41       24
       DKS    .409     .365     .249     .180     .071     .043
TSB    M    120.76   120.74   119.83   118.23   117.60   116.83
       SD    16.24    15.75    16.22    15.76    15.52    15.97
       R        38       40       37       27       28       19
       DKS    .092     .084     .066     .028     .051     .047
TY B   M    122.24   124.50   123.83   122.51   120.16   117.36
       SD     6.60    10.34    12.86    15.00    14.51    15.81
       R         4       12       24       37       24       21
       DKS    .318     .263     .204     .130     .089     .038
TF     M      1.29     1.19     1.10     1.06     1.02     .995
       SD     .404     .270     .201     .173     .141     .137
       R        78       90       62       62       31       24
       DKS    .253     .286     .210     .154     .081     .040
Fn1,n2 M      1.07     1.03     1.01     1.01     1.00     1.00
       SD     .319     .213     .170     .150     .139     .132

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 9
Tail Behavior of Different Statistics
Intra-class Correlation Model, Condition 2

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    663.12   324.51   209.38   163.15   139.40   123.02
       SD   183.07    66.43    33.28    22.60    17.42    14.62
       R       500      500      498      412      184       38
       DKS¹  1.00     1.00      .964     .791     .499     .171
TC     M    137.05   131.43   126.32   124.32   122.81   120.09
       SD    37.80    26.88    20.06    17.20    15.33    14.26
       R       174      130       93       59       37       31
       DKS    .334     .248     .198     .159     .134     .100
TSB    M    109.69   109.41   110.34   112.19   112.79   116.01
       SD    16.28    16.34    15.90    15.00    14.18    14.65
       R         7        3        7        9        7       15
       DKS    .201     .193     .191     .149     .157     .054
TY B   M    121.01   122.59   122.57   122.58   122.14   120.03
       SD     6.15     9.27    11.39    12.77    13.34    13.92
       R         3        7       15       25       24       29
       DKS    .293     .251     .175     .192     .148     .104
TF     M      1.21     1.13     1.08     1.06     1.04     1.02
       SD     .333     .232     .172     .147     .130     .121
       R        46       62       46       40       31       30
       DKS    .197     .209     .178     .172     .142     .102

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 10
Tail Behavior of Different Statistics
Intra-class Correlation Model, Condition 3

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    764.34   352.00   221.30   171.09   143.55   125.88
       SD   231.25    69.61    34.49    24.50    17.94    15.33
       R       500      500      498      436      217       62
       DKS¹  1.00     1.00      .979     .837     .578     .200
TC     M    157.96   142.56   133.52   130.37   126.47   122.88
       SD    47.74    28.16    20.79    18.65    15.79    14.95
       R       281      207      145      102       64       48
       DKS    .519     .420     .321     .280     .221     .127
TSB    M    102.44   103.70   103.51   105.93   108.86   111.23
       SD    23.71    25.86    26.79    27.77    28.62    31.27
       R        21       29       35       46       57       63
       DKS    .361     .366     .377     .347     .309     .252
TY B   M    123.88   126.43   126.61   126.99   125.32   122.74
       SD     6.34     8.90    11.28    13.41    13.63    14.57
       R         2        8       29       57       43       42
       DKS    .346     .355     .307     .275     .233     .128
TF     M      1.39     1.23     1.14     1.11     1.07     1.04
       SD     .421     .243     .178     .159     .134     .127
       R       113      115       91       79       52       46
       DKS    .378     .351     .299     .277     .228     .127

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 11
Tail Behavior of Different Statistics
Intra-class Correlation Model, Condition 4

                            Sample Size N
               150      200      300      500     1000     5000
TB     M    752.35   344.56   218.52   167.81   142.07   126.37
       SD   216.09    62.04    31.49    21.03    16.86    13.50
       R       500      500      499      448      219       42
       DKS¹  1.00     1.00      .983     .862     .545     .245
TC     M    155.49   139.54   131.84   127.87   125.16   123.36
       SD    44.61    25.10    18.98    16.01    14.84    13.16
       R       277      183      127       71       53       30
       DKS    .517     .391     .303     .250     .199     .173
TSB    M     98.45    98.67    99.13   100.21   104.70   106.86
       SD    24.10    25.37    25.27    24.76    25.51    26.85
       R        21       20       22       23       39       41
       DKS    .436     .416     .426     .422     .371     .334
TY B   M    123.75   125.63   125.80   125.28   124.21   123.22
       SD     5.83     8.14    10.41    11.67    12.90    12.84
       R         2        3       20       27       37       28
       DKS    .372     .360     .300     .274     .204     .172
TF     M      1.37     1.20     1.13     1.09     1.06     1.05
       SD     .393     .217     .162     .136     .126     .112
       R       100       88       68       51       43       30
       DKS    .365     .356     .283     .261     .203     .173

¹ Critical values for DKS at significance levels .05 and .01 are respectively .061 and .073.
Table 12
Different Statistics for the Model in Example 1

              TML          TB       TC      TSB          TY B     TF
Statistic     230.46       504.93   178.63  220.76       149.31   1.32
P-value       1.0 × 10⁻⁶   0        .01     7.5 × 10⁻⁶   .22      .09
Table 13
Summary and Recommendation

Type of Data                                                Recommended Statistic
Normal data                                                 TML
Nonnormal data with a nonsingular Ω¹                        TY B or TF
Nonnormal data with a nonsingular Ω and a near zero CV(α)   TSB
Nonnormal data with a near singular Ω                       Further study needed

¹ Ω = σ′c(θ)SY σc(θ)