Structural Equation Modeling: Problems and Ambiguities with “Well Fitting” Models
Andrew Tomarken
Overview of Talk
• Brief introduction to structural equation modeling (SEM) with emphasis on core concept of model fit
• Review of several ambiguities and problems associated with well-fitting models that are typically ignored by users
• Conclusions:
– It is important for users to bear in mind what precisely is being tested when assessing model fit
– Users need to look beyond omnibus measures of fit
What is SEM?
• A set of methods for estimating and testing models that are hypothesized to account for the variances and covariances (and possibly mean structures) among a set of variables
• Such models typically consist of sets of linear equations containing free, fixed, or otherwise constrained parameters
• Two types of linear relations can be specified:
– between latent constructs (or factors) and their observable indicators (measurement model)
– between latent constructs (structural model)
• One way to think about it: SEM combines simultaneous equation/econometric approaches and factor-analytic/psychometric approaches
SEM as a General Statistical Approach
• Most statistical procedures conventionally used to test hypotheses can be considered special cases of SEM
• Parallels the development of GLMs in the 1970s and 1980s as a liberalization of classic linear models
• Recent development of multilevel and mixture modeling within the SEM domain represents a further extension of GLMs to latent continuous and categorical variables
• Thus SEM is arguably the most general data-analytic framework at present (Tomarken & Waller, 2005)
Some Advantages of SEM
• High level of explicitness: Forces researchers to specify a model with a high level of detail
• Typically aligns the statistical null hypothesis with the research hypothesis
• In principle, allows for separate assessments of relations between observable indicators and latent variables (measurement model) and among latent variables
• Can test models that are difficult or impossible to test with other procedures (e.g., factor-of-curves and associative growth models)
• Allows you to test the overall fit of even very complex models – and that’s the focus of today’s talk
Path Analysis Model (Lynam et al., 1993)
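[Figure: path diagram in which SES, Test Effort, and VIQ predict Impulsivity (paths a, b, c) and SES and Impulsivity predict Delinquency (paths d, e); e1 and e2 are the error terms.]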
The Figure Implies Linear Equations
Imp = a SES + b TE + c VIQ + e1
Del = d SES + e Imp + e2
Confirmatory Factor Analysis Model
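[Figure: two-factor model. Spatial is measured by visperc, cubes, and lozenges (loadings 1, a, b); Verbal is measured by paragrap, sentence, and wordmean (loadings 1, c, d); e_v, e_c, e_l, e_p, e_s, and e_w are the error terms.]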
The Figure Implies Linear Equations
Visperc = Spatial + e_v
Cubes = a Spatial + e_c
Lozenges = b Spatial + e_l
Paragraph = Verbal + e_p
Sentence = c Verbal + e_s
Wordmean = d Verbal + e_w
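These equations determine a model-implied covariance matrix through the standard factor-model identity Σ = ΛΦΛ' + Θ. The Python sketch below computes it without any SEM package. The loadings are the estimates reported for this model in the output further on (1, .610, 1.198 for spatial; 1, 1.334, 2.234 for verbal); the factor variances, factor covariance, and error variances are hypothetical values chosen only for illustration, because the talk does not report them. The function name implied_cov is illustrative.

```python
# Model-implied covariance matrix of the two-factor CFA:
#   Sigma = Lambda * Phi * Lambda^T + Theta, with Theta diagonal.

INDICATORS = ["visperc", "cubes", "lozenges", "paragrap", "sentence", "wordmean"]

# Loadings: the fixed 1s plus the estimates of a, b, c, d reported in the talk.
LAMBDA = [
    [1.000, 0.0],  # visperc  <- spatial (fixed)
    [0.610, 0.0],  # cubes    <- spatial (a)
    [1.198, 0.0],  # lozenges <- spatial (b)
    [0.0, 1.000],  # paragrap <- verbal (fixed)
    [0.0, 1.334],  # sentence <- verbal (c)
    [0.0, 2.234],  # wordmean <- verbal (d)
]

# HYPOTHETICAL values, not reported in the talk:
PHI = [[25.0, 7.0],   # Var(spatial), Cov(spatial, verbal)
       [7.0, 10.0]]   # Cov(spatial, verbal), Var(verbal)
THETA = [20.0, 10.0, 40.0, 2.5, 8.0, 20.0]  # error variances e_v, e_c, e_l, e_p, e_s, e_w


def implied_cov(lam, phi, theta):
    """Return Lambda Phi Lambda^T + diag(theta) as a list of lists."""
    p, m = len(lam), len(phi)
    sigma = []
    for i in range(p):
        row = []
        for j in range(p):
            common = sum(lam[i][k] * phi[k][q] * lam[j][q]
                         for k in range(m) for q in range(m))
            row.append(common + (theta[i] if i == j else 0.0))
        sigma.append(row)
    return sigma


sigma = implied_cov(LAMBDA, PHI, THETA)
for name, row in zip(INDICATORS, sigma):
    print(f"{name:>9}", " ".join(f"{v:8.3f}" for v in row))
```

Every implied covariance between a spatial and a verbal indicator is a product of two loadings and the factor covariance; those product forms are where the model's restrictions come from. With 6 indicators there are 21 variances and covariances, fitted here with 13 free parameters (4 loadings, 3 factor variances/covariance, 6 error variances).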
Latent Variable Causal Model (Trull, 2001)
[Figure: latent variable model relating Parental Mood Disorder, Parental Disinhibitory Disorder, Abuse, Trait Disinhibition, Trait Negative Affectivity, and Borderline Features. Indicators: SA and PA; DEP, ANX, and HOS; DEL, SD, and IMP; PAI, MMPI, DIBR, and SIDP. Error terms e1 to e12; disturbances D1 to D3.]
A SEM Analysis: What Do We Want to Do?
Estimate Coefficients and Standard Errors
Path                   Estimate   S.E.    C.R.    P     Label
visperc  <--- spatial     1.000
cubes    <--- spatial      .610   .143   4.250   ***    a
lozenges <--- spatial     1.198   .272   4.405   ***    b
paragrap <--- verbal      1.000
sentence <--- verbal      1.334   .160   8.322   ***    c
wordmean <--- verbal      2.234   .263   8.482   ***    d
(*** = p < .001; loadings of 1.000 are fixed, so no S.E. is reported for them)
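In this output, C.R. (critical ratio) is the estimate divided by its standard error, and P refers that ratio to a standard normal distribution. The sketch below recomputes those columns from the rounded values in the table and adds 95% Wald confidence intervals (estimate ± 1.96 × S.E.), which are rarely reported. Small discrepancies from the printed C.R. values are expected because only rounded estimates and standard errors are available here; the helper name wald_summary is illustrative.

```python
import math

# (label, estimate, standard error, C.R. as printed) for the free loadings above.
ROWS = [
    ("cubes <--- spatial (a)", 0.610, 0.143, 4.250),
    ("lozenges <--- spatial (b)", 1.198, 0.272, 4.405),
    ("sentence <--- verbal (c)", 1.334, 0.160, 8.322),
    ("wordmean <--- verbal (d)", 2.234, 0.263, 8.482),
]


def wald_summary(estimate, se, z_crit=1.96):
    """Wald z (critical ratio), two-sided normal p-value, and 95% CI."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p, (estimate - z_crit * se, estimate + z_crit * se)


results = {}
for label, est, se, printed_cr in ROWS:
    z, p, (lo, hi) = wald_summary(est, se)
    results[label] = (z, p, lo, hi)
    print(f"{label:26s} C.R.={z:6.3f} (printed {printed_cr:.3f})  "
          f"p={p:.2e}  95% CI=[{lo:.3f}, {hi:.3f}]")
```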
Assess Overall Fit
Model                NPAR   CHI-SQUARE   DF     P
Correlated Factors     13        7.853    8   .448

Model                RMSEA   LO 90   HI 90   PCLOSE
Correlated Factors    .000    .000    .137     .577
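The P column is the upper-tail probability of the chi-square statistic on its degrees of freedom, and the RMSEA point estimate is commonly computed as sqrt(max(χ² − df, 0) / (df·(N − 1))) (some programs use N instead of N − 1). The sketch below computes the chi-square tail from a series expansion of the regularized incomplete gamma function, using only the standard library. The sample size is not given on this slide, so N is a hypothetical argument; because χ² = 7.853 is below df = 8, the RMSEA point estimate is 0 whatever N is.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the regularized lower gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def rmsea(chi2, df, n):
    """RMSEA point estimate using the N - 1 convention."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


p_value = chi2_sf(7.853, 8)
rmsea_hat = rmsea(7.853, 8, n=100)  # n = 100 is HYPOTHETICAL; the result is 0 for any n here
print(f"p = {p_value:.3f}, RMSEA = {rmsea_hat:.3f}")
```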
Model Comparisons
[Figure: two versions of the two-factor model, one with orthogonal (uncorrelated) factors and one with correlated factors.]
Model                 DF   CHI-SQUARE      P
Orthogonal Factors     9       19.860
Correlated Factors     8        7.853
Nested Comparison      1       12.008   .001
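The nested comparison is the difference between the two chi-square statistics, referred to a chi-square distribution whose degrees of freedom equal the difference in df (9 − 8 = 1, the factor covariance that is fixed at 0 in the orthogonal model and free in the correlated model). For 1 df the tail probability reduces to erfc(sqrt(x/2)), so the standard library suffices. From the rounded statistics the difference is 12.007 rather than the printed 12.008, presumably because the software subtracts unrounded values; the function name below is illustrative.

```python
import math


def chi2_difference_test(chi2_restricted, df_restricted, chi2_full, df_full):
    """Likelihood-ratio (chi-square difference) test for two nested models."""
    diff = chi2_restricted - chi2_full
    ddf = df_restricted - df_full
    if ddf != 1:
        raise ValueError("this sketch only handles a 1-df difference")
    p = math.erfc(math.sqrt(diff / 2.0))  # P(chi-square with 1 df > diff)
    return diff, ddf, p


diff, ddf, p = chi2_difference_test(19.860, 9, 7.853, 8)
print(f"chi-square diff = {diff:.3f} on {ddf} df, p = {p:.4f}")
```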
The Concept of Model Fit in SEM
• The question: Does the structure implied by the model account for the observed variances and covariances among a set of variables?
• We compare the observed covariance matrix to the covariance matrix implied by the model
• A fitting function (F) assesses the discrepancy between S (the sample covariance matrix) and $\hat{\Sigma}$ (the estimated population covariance matrix implied by the model)
• Example: the ML fitting function, $F_{ML} = \log|\hat{\Sigma}| + \mathrm{tr}(S\hat{\Sigma}^{-1}) - \log|S| - p$, where p is the number of observed variables
• F, or something very much like F, appears in the formulae for all conventionally used statistical tests of fit and fit indices
• Estimates of free parameters are chosen to meet two potentially competing goals:
– minimize the discrepancy between the implied and observed matrices
– respect the restrictions (constraints) on the covariance matrix implied by the model
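A plain-Python sketch of the ML fitting function, using Gauss-Jordan elimination for the determinant and inverse (adequate for the small matrices in this talk; a numerical library would normally be used). F_ML is 0 when the implied matrix equals S and grows with the discrepancy; the likelihood-ratio chi-square is conventionally (N − 1) times the minimized F_ML (some programs use N). The example matrices and sample size are hypothetical, and the function names are illustrative.

```python
import math


def det_and_inverse(a):
    """Determinant and inverse of a square matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        pv = m[col][col]
        det *= pv
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return det, [row[n:] for row in m]


def f_ml(s, sigma):
    """F_ML = log|Sigma| + tr(S Sigma^-1) - log|S| - p."""
    p = len(s)
    det_sigma, sigma_inv = det_and_inverse(sigma)
    det_s, _ = det_and_inverse(s)
    trace = sum(s[i][k] * sigma_inv[k][i] for i in range(p) for k in range(p))
    return math.log(det_sigma) + trace - math.log(det_s) - p


# HYPOTHETICAL example: observed correlation .5 vs. a model implying 0.
S = [[1.0, 0.5], [0.5, 1.0]]
SIGMA_INDEPENDENCE = [[1.0, 0.0], [0.0, 1.0]]
F = f_ml(S, SIGMA_INDEPENDENCE)
N = 101  # hypothetical sample size
print(f"F_ML = {F:.4f}, chi-square = (N - 1) * F_ML = {(N - 1) * F:.2f}")
```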
Example of Model-Imposed Restrictions: 3-Variable Mediational Model
[Figure: X → M → Y, with path a from X to M, path b from M to Y, and error terms e1 (on M) and e2 (on Y).]
Equations: M = a X + e1; Y = b M + e2
Implied restriction: Cov(X,Y)*Var(M) = Cov(X,M)*Cov(M,Y)
Standardized: R(X,Y) = R(X,M)*R(M,Y)
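Covariance algebra makes the restriction explicit. Assuming e1 and e2 are uncorrelated with each other and with X, every implied variance and covariance is a function of a, b, Var(X), Var(e1), and Var(e2), and Cov(X,Y)*Var(M) − Cov(X,M)*Cov(M,Y) is identically zero whatever their values. The parameter values below are arbitrary illustrations and the function name is hypothetical.

```python
import math


def mediation_implied(a, b, var_x, var_e1, var_e2):
    """Implied moments of X -> M -> Y with uncorrelated errors e1, e2."""
    var_m = a * a * var_x + var_e1
    var_y = b * b * var_m + var_e2
    cov_xm = a * var_x
    cov_my = b * var_m
    cov_xy = a * b * var_x  # X affects Y only through M
    return var_x, var_m, var_y, cov_xm, cov_my, cov_xy


var_x, var_m, var_y, cov_xm, cov_my, cov_xy = mediation_implied(
    a=0.7, b=-1.3, var_x=4.0, var_e1=2.0, var_e2=5.0)  # arbitrary values

restriction = cov_xy * var_m - cov_xm * cov_my
r_xy = cov_xy / math.sqrt(var_x * var_y)
r_xm = cov_xm / math.sqrt(var_x * var_m)
r_my = cov_my / math.sqrt(var_m * var_y)
print(f"restriction: {restriction:.2e}; R(X,Y) - R(X,M)R(M,Y) = {r_xy - r_xm * r_my:.2e}")
```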
Example of Model-Imposed Restrictions: Confirmatory Factor Model
[Figure: one-factor model in which F1 has loadings a, b, c, and d on indicators v1 to v4, with error terms e1 to e4.]
10 knowns: the 4 variances and 6 covariances among v1-v4
8 free parameters to estimate: 4 factor loadings (a-d) and the variances of the four error terms (e1-e4)
This model also implies a set of constraints on the covariances among the observable variables
C(1,3)C(2,4) = C(1,4)C(2,3) = C(1,2)C(3,4), where C(i,j) is the covariance of vi and vj
This model will result in an estimated or implied covariance matrix that respects these constraints
If the sample and implied matrices agree, the model fits
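For the one-factor model, each implied covariance between two different indicators is the product of their loadings times the factor variance (taken as 1 here, an assumption consistent with all four loadings being free), so each of the three tetrad products reduces to abcd. The sketch checks this with arbitrary loadings and then adds a hypothetical misspecification, an error covariance of .15 between e1 and e2, which breaks the equality involving C(1,2). The three equalities amount to two independent restrictions, matching the 10 − 8 = 2 degrees of freedom. Function names are illustrative.

```python
from itertools import combinations

LOADINGS = [0.8, 0.7, 0.6, 0.5]  # arbitrary values for a, b, c, d
PHI = 1.0                        # factor variance (assumed fixed at 1)


def implied_offdiag(loadings, phi, error_cov=None):
    """Implied covariances C(i, j), i < j (1-based), for a one-factor model."""
    error_cov = error_cov or {}
    out = {}
    for i, j in combinations(range(1, len(loadings) + 1), 2):
        out[(i, j)] = loadings[i - 1] * loadings[j - 1] * phi + error_cov.get((i, j), 0.0)
    return out


def tetrad_products(c):
    """The three products compared by the slide's constraint."""
    return (c[(1, 3)] * c[(2, 4)], c[(1, 4)] * c[(2, 3)], c[(1, 2)] * c[(3, 4)])


clean = tetrad_products(implied_offdiag(LOADINGS, PHI))
# Hypothetical misspecification: e1 and e2 covary by 0.15.
perturbed = tetrad_products(implied_offdiag(LOADINGS, PHI, {(1, 2): 0.15}))
print("one-factor model:   ", [round(v, 6) for v in clean])
print("with e1-e2 cov 0.15:", [round(v, 6) for v in perturbed])
```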
This Should Sound Familiar
• Although the models and the specific criteria minimized may differ, the notion that statistical tests and fit indices evaluate model-imposed restrictions is completely consistent with general principles of statistical modeling, particularly in specific contexts (e.g., ML estimation)
How Do Users Typically Assess Overall Fit?
• Hypothesis testing using inferential statistical tests:
– Likelihood ratio chi-square test of exact fit, which compares the target model to a saturated (just-identified) model
– Nested chi-square tests for competing models (very important for model comparisons)
• Fit indices that indicate degree of fit
• Historically, more methodological papers on SEM have focused on measures of fit than on any other topic
Below the Radar
• Both methodological literature and empirical applications heavily emphasize statistical tests and descriptive indices of fit
• This focus can blind users to an important point: Even “well-fitting” models can have substantial problems and uncertainties that are often ignored by researchers
• Tomarken and Waller’s (2003) review identified several potential problems with apparently well-fitting models that users typically ignore
• Ironically, these issues are not particularly subtle. Rather, they are linked to core features of the concept of “model fit” in the SEM context
Potential Problems/Ambiguities with Well-Fitting Models
(and/or the Researchers Who Test Them)
1. Lack of clarity concerning what exactly is being tested
2. A poorly fitting structural (i.e., path) component that is masked by a well-fitting composite model
3. A large number of equivalent models that will always yield identical fit to the target model
4. Questionable lower-order components of fit
5. Omitted variables that influence constructs included in the model
6. The presence of a number of non-equivalent and non-nested alternative models that could fit better but are rarely ever tested
7. Low power or sensitivity to detect critical misspecifications
8. Specifications driven by hidden post-hoc modifications that lower the validity and replicability of the results
Issue # 1: Do You Know What Exactly Is Being Tested?
• SEM models impose restrictions on variances and covariances among the observed variables (and sometimes on means too).
• Unfortunately:
– Researchers are often unaware of the restrictions tested by even simple models
– Such restrictions sometimes do not reflect what the researcher would identify as core features of the model: the questions that motivated the study in the first place
– Many models impose so many restrictions that it’s typically impossible for even specialists to figure them all out or render them comprehensible in a more global way
• In short:
– People often are unaware of what exactly is being assessed by statistical tests of fit or fit measures, and what is being assessed is often not exactly what the researcher had in mind
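[Figure: two-wave, two-variable (2W2V) panel model. X1 and Y1 predict X2 and Y2; DX2 and DY2 are the wave-2 disturbances; f1 to f9 label the free parameters.]
2W2V Panel Model: 9 free parameters, 1 degree of freedom
Q: What's This Model Testing?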
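[Figure: the same 2W2V panel model, with the covariance between DX2 and DY2 fixed at 0.]
Answer: COV(X2,Y2 · X1,Y1) = 0, i.e., the partial covariance of X2 and Y2, controlling for X1 and Y1, is zero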
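Why does this 1-df model test a partial covariance? The sketch builds the implied covariance matrix of the 2W2V model from its equations (all numeric values are arbitrary) and computes the partial covariance of X2 and Y2 given X1 and Y1. Because each wave-2 variable is a linear function of X1 and Y1 plus its own disturbance, that partial covariance equals Cov(DX2, DY2), the one parameter this model fixes at 0. Function and parameter names are illustrative.

```python
def panel_cov(b_xx, b_xy, b_yx, b_yy, var_x1=1.0, var_y1=1.0, cov_1=0.3,
              psi_x=0.5, psi_y=0.5, psi_xy=0.0):
    """Implied covariance matrix (order X1, Y1, X2, Y2) of the 2W2V model.

    X2 = b_xx*X1 + b_xy*Y1 + DX2;  Y2 = b_yx*X1 + b_yy*Y1 + DY2;
    psi_xy is Cov(DX2, DY2), which the 1-df model fixes at 0.
    """
    c_x1x2 = b_xx * var_x1 + b_xy * cov_1
    c_y1x2 = b_xx * cov_1 + b_xy * var_y1
    c_x1y2 = b_yx * var_x1 + b_yy * cov_1
    c_y1y2 = b_yx * cov_1 + b_yy * var_y1
    v_x2 = b_xx**2 * var_x1 + b_xy**2 * var_y1 + 2 * b_xx * b_xy * cov_1 + psi_x
    v_y2 = b_yx**2 * var_x1 + b_yy**2 * var_y1 + 2 * b_yx * b_yy * cov_1 + psi_y
    c_x2y2 = (b_xx * b_yx * var_x1 + b_xy * b_yy * var_y1
              + (b_xx * b_yy + b_xy * b_yx) * cov_1 + psi_xy)
    return [
        [var_x1, cov_1, c_x1x2, c_x1y2],
        [cov_1, var_y1, c_y1x2, c_y1y2],
        [c_x1x2, c_y1x2, v_x2, c_x2y2],
        [c_x1y2, c_y1y2, c_x2y2, v_y2],
    ]


def partial_cov(s, i, j, g1, g2):
    """Cov(i, j | g1, g2) = s_ij - s_iG S_GG^-1 s_Gj for a two-variable set G."""
    s11, s12, s22 = s[g1][g1], s[g1][g2], s[g2][g2]
    det = s11 * s22 - s12 * s12
    inv = [[s22 / det, -s12 / det], [-s12 / det, s11 / det]]
    left = [s[i][g1], s[i][g2]]
    right = [s[g1][j], s[g2][j]]
    adjust = sum(left[r] * inv[r][c] * right[c] for r in range(2) for c in range(2))
    return s[i][j] - adjust


X1, Y1, X2, Y2 = range(4)
fits = panel_cov(0.5, 0.2, 0.1, 0.6)                 # disturbance covariance 0
misfit = panel_cov(0.5, 0.2, 0.1, 0.6, psi_xy=0.15)  # hypothetical violation
print(partial_cov(fits, X2, Y2, X1, Y1))    # about 0: the restriction holds
print(partial_cov(misfit, X2, Y2, X1, Y1))  # about 0.15: equals Cov(DX2, DY2)
```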
[Figure: three-wave cross-lagged panel model for X and Y (X1 to X3, Y1 to Y3), with disturbances DX2, DY2, DX3, and DY3.]
Cross-Lagged Panel Model: 17 free parameters, 4 restrictions = 4 df
Q: What's This Model Testing?
[Figure: the same cross-lagged panel model, with the four lag-2 paths (from wave-1 to wave-3 variables) fixed at 0.]
Answer: Lag-2 paths = 0
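The degrees of freedom quoted on these slides follow from a simple count when only the covariance structure is modeled: p observed variables supply p(p + 1)/2 distinct variances and covariances, and each free parameter uses one of them. A minimal sketch, assuming no mean structure; the function name is illustrative.

```python
def covariance_df(n_observed, n_free):
    """Degrees of freedom = distinct variances/covariances - free parameters."""
    moments = n_observed * (n_observed + 1) // 2
    return moments - n_free


MODELS = {
    "two-factor CFA (6 indicators, NPAR = 13)": (6, 13),
    "one-factor CFA (4 indicators)": (4, 8),
    "2W2V panel model": (4, 9),
    "three-wave cross-lagged panel model": (6, 17),
}

for name, (p, q) in MODELS.items():
    print(f"{name}: df = {covariance_df(p, q)}")
```

A count like this says how many restrictions a model imposes, not which ones; identifying them (lag-2 paths, tetrads, partial covariances) takes the kind of algebra worked through on these slides.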
This Does Not Mean Overall Model Fit is Irrelevant!
• One might argue: Let’s just ignore fit indices and look at what we’re really interested in
• Flawed argument: One would not want to test coefficients, estimate direct and indirect effects, estimate proportion of variance, and so on, in a model that does not fit well and appears to be misspecified. Parameter estimates and standard errors will be inaccurate.
• Don’t ignore fit but see it as a first step or necessary condition for looking at what you really are interested in. It is not an end in itself.
Why the Problem?
• Educational
• Perceptual/cognitive biases
– Feature-positive effect: We attend more to presence (what’s there) than to absence (what’s not there)
– Model restrictions are usually characterized by absence (e.g., coefficients that are fixed at 0).
• Reliance on graphics and other user-friendly mechanisms for specifying models in software
Can You See It Now?
[Figure: the three-variable model X → Y → Z (residuals Ry and Rz), drawn in two different layouts.]
Why the Problem?
• Complexity of many models makes it impossible to catalogue all restrictions
Trull (2001) Borderline Personality Disorder Model: Identify the 62 Restrictions This Model Is Testing and Win a Prize!
[Figure: the Trull (2001) model, with variables labeled PMD, PDD, SA, PA, NEO_DEP, NEO_HOS, NEO_ANX, NEO_DEL, NEO_SD, NEO_IMP, PAI_BOR, MM_BPD, DIB_R, SIDP, Abuse, TDIS, TNA, and BOR; error terms e1 to e12; disturbances D1 to D3.]
Issue # 2: A well-fitting composite model that masks an ill-fitting structural (path) model
• In many latent variable SEM models it’s important to distinguish between:
– Measurement model: Relations between manifest indicators and latent constructs
– Structural (path) model: Relations among latent constructs
– Composite model: The whole model that combines both the measurement and structural components
• Typically, in latent variable models the clear majority of the restrictions are imposed at the level of the measurement model, and that component often fits well
• Common result: A well-fitting composite model that masks an ill-fitting structural component
• But the main motivation for the study typically is the structural component!
[Figure: Composite (i.e., target) model and measurement model. Both contain latent variables LX, LY, and LZ, each measured by three indicators (X1 to X3, Y1 to Y3, Z1 to Z3) with errors e1 to e9; the figure also labels residuals RLY and RLZ for LY and LZ.]
Chi-Square Tests of the C, M, and S Models
• Composite: Global test of the composite model
• Measurement: Global test of the measurement model
• Structural: Nested test assessing the relative fit of the composite and measurement models, with $\chi^2_S = \chi^2_C - \chi^2_M$ and $df_S = df_C - df_M$
Illustrating the Problem
Model          χ²      df     p      RMSEA
Composite      35.46   25   .080   .0290
Measurement    24.66   24   .425   .0074
Structural     10.80    1   .001   .1402
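The structural row can be reproduced from the composite and measurement rows. The sketch below uses a series-based chi-square tail function (standard library only). The sample size is not given on this slide; N = 500 is an assumption (it is the N reported for a later example in this talk) that reproduces the tabled RMSEA values to within rounding, using the N − 1 convention. Function names are illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the regularized lower gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return max(0.0, 1.0 - total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def rmsea(chi2, df, n):
    """RMSEA point estimate using the N - 1 convention."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


N = 500  # ASSUMED sample size (not stated on this slide)
composite = (35.46, 25)
measurement = (24.66, 24)
structural = (composite[0] - measurement[0], composite[1] - measurement[1])

for name, (chi2, df) in [("Composite", composite), ("Measurement", measurement),
                         ("Structural", structural)]:
    print(f"{name:12s} chi2={chi2:6.2f} df={df:2d} p={chi2_sf(chi2, df):.3f} "
          f"RMSEA={rmsea(chi2, df, N):.4f}")
```

The structural test carries a single degree of freedom, so its misfit is diluted in the composite test, which pools it with the 24 well-fitting measurement restrictions.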
Issue # 3: Equivalent Models
• Two models are equivalent when their assessed fit across all possible samples is identical because they impose identical restrictions on the data
• Such models are ubiquitous in statistics
• In the context of SEM, two models are equivalent when their implied covariance matrices are identical because they impose the same restrictions on the variances and covariances
• If their implied covariance matrices are identical, then for any given sample, their discrepancy functions will be identical.
• If their discrepancy functions are identical, the values of all conventionally used fit indices will be identical.
The Problem
• The typical structural equation model has many equivalent models that impose the same restrictions on the data
• Typically, at least some are compelling theoretical alternatives to the target model of interest
• Such equivalent models are almost never acknowledged by researchers
3 Equivalent Causal Models
• These 3 models share the same restriction: [Cov(x,z)*Var(y)]-[Cov(x,y)*Cov(y,z)] = 0
• If variables are standardized, this restriction is: $r_{xz} - r_{xy}r_{yz} = 0$
• All 3 models predict that the partial correlation between x and z, adjusting for y, equals 0
• The overall fit of these 3 models will always be the same
• However, they represent three radically different claims about causal structure
[Figure: Model 1A: X → Y → Z (residuals Ry and Rz). Model 1B: Z → Y → X (residuals Ry and Rx). Model 1C: X ← Y → Z (residuals Rx and Rz).]
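Equivalence is a property of the implied covariance matrix, which the sketch below makes concrete. For each of the three models it computes, by covariance algebra, the implied correlation matrix with standardized variables and standardized paths set to hypothetical correlations r_xy = .5 and r_yz = .6. All three produce the same matrix, including r_xz = .30 = r_xy·r_yz, so no fit statistic can tell them apart. The function name and values are illustrative.

```python
def implied_cov(order, paths, resid_var):
    """Covariances implied by a recursive path model.

    order: variables in causal order; paths: {child: {parent: coefficient}};
    resid_var: residual variance (or total variance, for an exogenous variable).
    Residuals are assumed uncorrelated with each other and with prior variables.
    """
    cov = {}
    for k, v in enumerate(order):
        parents = paths.get(v, {})
        for w in order[:k]:
            cov[(v, w)] = cov[(w, v)] = sum(b * cov[(u, w)] for u, b in parents.items())
        explained = sum(b1 * b2 * cov[(u1, u2)]
                        for u1, b1 in parents.items() for u2, b2 in parents.items())
        cov[(v, v)] = explained + resid_var[v]
    return cov


R_XY, R_YZ = 0.5, 0.6  # hypothetical correlations
MODELS = {
    "1A: X -> Y -> Z": (["X", "Y", "Z"], {"Y": {"X": R_XY}, "Z": {"Y": R_YZ}},
                        {"X": 1.0, "Y": 1 - R_XY**2, "Z": 1 - R_YZ**2}),
    "1B: Z -> Y -> X": (["Z", "Y", "X"], {"Y": {"Z": R_YZ}, "X": {"Y": R_XY}},
                        {"Z": 1.0, "Y": 1 - R_YZ**2, "X": 1 - R_XY**2}),
    "1C: X <- Y -> Z": (["Y", "X", "Z"], {"X": {"Y": R_XY}, "Z": {"Y": R_YZ}},
                        {"Y": 1.0, "X": 1 - R_XY**2, "Z": 1 - R_YZ**2}),
}

matrices = {}
for name, (order, paths, resid) in MODELS.items():
    cov = implied_cov(order, paths, resid)
    matrices[name] = [[cov[(row_var, col_var)] for col_var in "XYZ"] for row_var in "XYZ"]
    print(name, [[round(v, 4) for v in row] for row in matrices[name]])
```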
Three Equivalent Measurement Models
[Figure: Models 2A and 2B: variants of a one-factor model in which F1 has indicators X1 to X4 with errors e1 to e4. Model 2C: a two-factor model with factors F1 and F2.]
All 3 models impose the same restriction on the implied covariance matrix:
[Cov(x1,x3)*Cov(x2,x4)]-[Cov(x1,x4)*Cov(x2,x3)] = 0
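For the two-factor structure (Model 2C), the sketch assumes F1 is measured by X1 and X2 and F2 by X3 and X4, the assignment under which the stated restriction holds. Every cross-factor covariance is then a product λ_iλ_jφ12, so Cov(x1,x3)Cov(x2,x4) − Cov(x1,x4)Cov(x2,x3) vanishes. With hypothetical loadings and factor correlation, the sketch also shows that the other tetrad, Cov(x1,x2)Cov(x3,x4) − Cov(x1,x3)Cov(x2,x4), does not vanish unless the factors correlate perfectly, which is why this model imposes only the single restriction listed.

```python
LOADINGS = {1: 0.8, 2: 0.7, 3: 0.6, 4: 0.5}  # hypothetical; factor variances fixed at 1
FACTOR = {1: "F1", 2: "F1", 3: "F2", 4: "F2"}  # assumed indicator assignment
PHI_12 = 0.4                                  # hypothetical factor correlation


def cov(i, j):
    """Implied covariance of x_i and x_j (i != j) under the two-factor model."""
    phi = 1.0 if FACTOR[i] == FACTOR[j] else PHI_12
    return LOADINGS[i] * LOADINGS[j] * phi


listed = cov(1, 3) * cov(2, 4) - cov(1, 4) * cov(2, 3)
other = cov(1, 2) * cov(3, 4) - cov(1, 3) * cov(2, 4)
print(f"restriction on the slide: {listed:.6f}")  # 0 up to rounding error
print(f"other tetrad:             {other:.6f}")   # nonzero because |PHI_12| < 1
```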
How Many Equivalent Models? Lower Bound Estimate = 33,925
[Figure: the Trull (2001) borderline personality disorder model shown earlier.]
[Figure: Model 3A (the original model) and three equivalent models (3B, 3C, and 3D) that rearrange the structural relations among Parental Mood Disorder, Parental Disinhibitory Disorder, Abuse, Trait Disinhibition, Trait Negative Affectivity, and Borderline Features.]
Recommendations
• Researchers need to acknowledge presence of equivalent models
• Use designs that limit number of plausible equivalents (e.g., one rarely noted advantage of longitudinal relative to cross-sectional designs).
Issue # 4: Fixated on Fit: Inattention to Lower-Order Components
• What are “lower-order components”?
– Specific model parameters (e.g., path coefficients)
– Measures that can be derived from parameters, such as direct, indirect, and total effects, and proportion of variance
• In most other statistical procedures that we use (e.g., multiple regression), the focus is on lower-order components
• There can be dissociations between measures of overall fit and lower-order components:
– A model can fit perfectly, yet have problematic or disappointing lower-order components
– Lower-order components can indicate very strong effects, yet the overall fit can be terrible
• Problem: Applied researchers often inappropriately de-emphasize lower-order components in favor of reliance on global fit indices
Model Tested
[Figure: X and Q predict Y (paths PYX and PYQ), and Y predicts Z (path PZY); Ry and Rz are the residuals.]
Sample Covariance Matrix SA
       X      Q      Y      Z
X    100
Q     30    100
Y    6.5    6.5    100
Z   0.52   0.52    8.0    100

Sample Covariance Matrix SB
       X      Q      Y      Z
X    100
Q     20    100
Y     55     65    100
Z     65     75     80    100
Overall Fit and Components of Fit when SA and SB Are Analyzed
Measure                Good Fit Region   Matrix SA         Matrix SB
Overall fit
  χ² (df = 2)                            0.00, p = 1.00    421.26, p < .0001
  RMSEA estimate       ≤ .06             0.000             0.648
    Lower 90% limit                      ---               0.596
    Upper 90% limit                      ---               0.701
  SRMSR                ≤ .08             0.000             0.098
  TLI                  ≥ .95             1.126             0.108
  CFI                  ≥ .95             1.000             0.703
Components of fit
  Path coefficients
    PYX                                  0.05, NS          0.44, p < .0001
    PYQ                                  0.05, NS          0.56, p < .0001
    PZY                                  0.08, NS          0.80, p < .0001
  % of variance
    R²(Y)                                < 1%              61%
    R²(Z)                                < 1%              64%
Note: N = 500. χ² = chi-square test of exact fit; RMSEA = root mean squared error of approximation; SRMSR = standardized root mean squared residual; TLI = Tucker-Lewis Index; CFI = Comparative Fit Index; PYX = path coefficient denoting effect of X on Y; PYQ = path coefficient denoting effect of Q on Y; PZY = path coefficient denoting effect of Y on Z; R²(Y), R²(Z) = proportion of variance in Y and Z accounted for; NS = not significant. For these models, the non-standardized and standardized coefficients are identical.
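Because the tested model is recursive with uncorrelated disturbances, its ML path estimates coincide with equation-by-equation least squares, so the components in the table can be recomputed from the correlations (every variance is 100, so each correlation is the covariance divided by 100). The sketch does this for both matrices and also computes the residuals for the two correlations the model restricts, r_XZ and r_QZ, which it implies should equal PZY·r_XY and PZY·r_QY. It does not reproduce the chi-square or fit indices, which require the full ML fit; the function name is illustrative.

```python
def components(cov, var=100.0):
    """Recompute path coefficients, R-squared, and restricted residuals.

    cov: covariances keyed by variable pairs; all variances equal `var`.
    Model: Y = PYX*X + PYQ*Q + Ry;  Z = PZY*Y + Rz.
    """
    r = {k: v / var for k, v in cov.items()}
    r_xq, r_xy, r_qy = r[("X", "Q")], r[("X", "Y")], r[("Q", "Y")]
    p_yx = (r_xy - r_qy * r_xq) / (1 - r_xq**2)
    p_yq = (r_qy - r_xy * r_xq) / (1 - r_xq**2)
    p_zy = r[("Y", "Z")]
    return {
        "PYX": p_yx, "PYQ": p_yq, "PZY": p_zy,
        "R2_Y": p_yx * r_xy + p_yq * r_qy,
        "R2_Z": p_zy**2,
        # observed minus model-implied correlations for the two restricted pairs
        "resid_XZ": r[("X", "Z")] - p_zy * r_xy,
        "resid_QZ": r[("Q", "Z")] - p_zy * r_qy,
    }


S_A = {("X", "Q"): 30, ("X", "Y"): 6.5, ("Q", "Y"): 6.5,
       ("X", "Z"): 0.52, ("Q", "Z"): 0.52, ("Y", "Z"): 8.0}
S_B = {("X", "Q"): 20, ("X", "Y"): 55, ("Q", "Y"): 65,
       ("X", "Z"): 65, ("Q", "Z"): 75, ("Y", "Z"): 80}

for label, s in [("SA", S_A), ("SB", S_B)]:
    print(label, {k: round(v, 4) for k, v in components(s).items()})
```

For SA the restricted correlations are reproduced exactly (perfect fit) although the predictors explain under 1% of the variance; for SB the paths are strong, but the model underpredicts r_XZ and r_QZ by about .21 and .23, which is what drives its poor fit.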
How Can a Model with Problematic Lower-order Components Fit Well?
• Residuals are part of the model!
• Two types of residuals in SEM:
– Residual matrix: the difference between the observed and implied covariance matrices
– Residual variances and covariances (e.g., variance of an endogenous variable not accounted for by its predictors) that are model parameters
• Residual variances:
– Typically are just-identified (impose no restrictions)
– Can easily “fill in the difference” to reproduce the observed variance of a variable even when predictors account for a very small proportion of variance
• In essence, a weak theory can be bailed out by residual terms
Residual Covariances Are Often Critical Too
[Figure: the Trull (2001) borderline personality disorder model shown earlier.]
Other Respects in Which Local Features of a Model Are Ignored
• Confidence intervals around parameter estimates are rarely reported
• Potential problems with tests of parameters are often ignored:
– Reliance on Wald tests
– Incorrect chi-square distributions for tests at the boundary of the parameter space
– Invariance across different parameterizations is often mistakenly assumed
• Assessment of fit at the level of individual subjects is typically ignored (e.g., no analysis of residuals or of individual contributions to fit)
• Irony: In many cases, a more rigorous assessment of a model is afforded by a more traditional multiple regression approach!
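One concrete instance of the boundary problem: when a likelihood-ratio test asks whether a single variance (e.g., a residual or random-effect variance) is zero, the null value lies on the boundary of the parameter space. Under standard regularity conditions, with the other parameters in the interior, the reference distribution is then a 50:50 mixture of χ²(0) and χ²(1) rather than χ²(1). The sketch compares the naive and mixture p-values for a hypothetical LR statistic of 3.0; the naive test is conservative. Function names are illustrative.

```python
import math


def p_chi2_1df(x):
    """Upper-tail probability of a chi-square with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def p_boundary_mixture(x):
    """50:50 mixture of chi-square(0) (a point mass at 0) and chi-square(1)."""
    return 0.5 * p_chi2_1df(x) if x > 0 else 1.0


LR = 3.0  # hypothetical likelihood-ratio statistic
print(f"naive chi-square(1) p = {p_chi2_1df(LR):.4f}")
print(f"boundary mixture p    = {p_boundary_mixture(LR):.4f}")
```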
Issue # 5: Omitted Variables
• Sometimes measures of fit are sensitive to the problem of omitted variables (4A tested model, 4B true model)
• Sometimes they are not (4A tested, 4C true model)
• Thus, a well-fitting model could, and typically does, omit important variables
[Figure: Model 4A (hypothesized model): X → Y → Z, with residuals Ry and Rz. Model 4B (correct alternative structure #1) and Model 4C (correct alternative structure #2): the same X, Y, and Z plus an additional variable Q.]
Omitted Variables: Can Residual Covariance Terms Bail us Out?
• By representing omitted influences that variables may share in common, residual covariance terms can improve model fit
• However, there are limits on the covariances that can be specified
• In addition, they typically do not correct for biased estimates due to omitted variables
True Model
[Figure: variables X1, X2, X3, Y1, and Y2, with disturbances D1 and D2 (each labeled .34); the figure shows coefficient values of 0.4, 0.2, and 0.6.]
Model Tested (X3 omitted)
[Figure: X1 and X2 predicting Y1 and Y2, with disturbances D1 and D2; each of the four path estimates is 0.37.]
Bad fit: χ²(1) = 114.33, p < .0001
Note also the biased path estimates
Revised Model: Now Perfect Fit (Saturated), but Path Estimates Unchanged
[Figure: the tested model with a residual covariance of 0.28 added between D1 and D2; each of the four path estimates is still 0.37.]
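The pattern in this example, biased paths that a residual covariance cannot repair, follows from the classic omitted-variable result. The sketch uses hypothetical values, not those of the true model above: Y1 and Y2 each depend on X1 and on an omitted X3 that correlates .2 with X1. Regressing either outcome on X1 alone gives a slope of b1 + b3·Cov(X1,X3)/Var(X1) rather than b1, and the leftover influence of X3 appears as a residual covariance between Y1 and Y2. In this simple setup both the model without and the model with that residual covariance yield the least-squares slopes, so freeing the covariance fixes the fit but leaves the biased slope unchanged. All names are illustrative.

```python
# HYPOTHETICAL true model (standardized X's):
#   Y1 = B1 * X1 + B3 * X3 + D1,   Y2 = B1 * X1 + B3 * X3 + D2,
#   Var(X1) = Var(X3) = 1, Cov(X1, X3) = R13, D1 and D2 uncorrelated.
B1, B3, R13 = 0.4, 0.6, 0.2

cov_x1_y = B1 * 1.0 + B3 * R13                 # Cov(X1, Y1) = Cov(X1, Y2)
cov_y1_y2 = B1**2 + 2 * B1 * B3 * R13 + B3**2  # shared part only (D's uncorrelated)

# Model that omits X3: Y1 <- X1, Y2 <- X1.
slope_if_x3_omitted = cov_x1_y / 1.0           # least-squares slope = B1 + B3 * R13
bias = slope_if_x3_omitted - B1

# Covariance of Y1 and Y2 not explained by X1: what a freed D1-D2 covariance absorbs.
residual_cov = cov_y1_y2 - cov_x1_y * cov_x1_y / 1.0

print(f"estimated slope = {slope_if_x3_omitted:.3f} (true {B1}), bias = {bias:.3f}")
print(f"residual covariance of Y1 and Y2 given X1 = {residual_cov:.4f}")
```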
Summary
• SEM is a powerful and comprehensive data-analytic technique
• There are a number of issues regarding model-imposed restrictions and assessment of fit that commonly operate under the radar of the applied user
• A well-fitting model can have substantial problems and ambiguities