Structural Equation Modeling – Part 1: Confirmatory Factor Analysis
Pekka Malo, 30E00500 – Quantitative Empirical Research, Spring 2016
Agenda
• Basic concepts
• Confirmatory Factor Analysis (CFA)
• Practical guidelines
• Tutorial on SPSS Amos and CFA
25.01.16 Confirmatory Factor Analysis
2
What is SEM?
Structural equation modeling (SEM) is a collection of statistical techniques that allow a set of
relationships between one or more independent variables (IV’s), either continuous or discrete, and
one or more dependent variables (DV’s), either continuous or discrete, to be examined.
(~ Series of multiple regression equations)
Why use SEM?
1. Estimation of several interrelated relationships
2. Ability to represent unobserved (latent) concepts and correct for measurement error
3. Defines a model to explain an entire set of relationships
What is a latent construct?
• Represents theoretical concepts, which cannot be observed directly
• Similar to factors discussed in Exploratory Factor Analysis
• Needs to be measured indirectly using multiple measured variables (a.k.a. indicator or manifest variables)
Exogenous vs. Endogenous Constructs
• Exogenous construct ~ latent, multi-item equivalent of an independent variable
  – Variate (linear combination) of measures is used to represent a construct
  – Multiple measured variables represent the exogenous constructs
• Endogenous construct ~ latent, multi-item equivalent of a dependent variable
  – Theoretically determined by factors within the model
  – Multiple measured variables represent the endogenous constructs
Example: Two latent constructs
• Loadings represent the relationships from constructs to variables as in factor analysis.
• Path estimates represent the relationships between constructs as does β in regression analysis.
Source: Hair et al. (2010)
[Path diagram: an Exogenous Construct measured by X1–X4, linked to an Endogenous Construct measured by Y1–Y4]
Visual modeling: Path diagrams
• SEM models are commonly described in visual form using "path diagrams", which present relations between constructs and measured variables
• Path diagrams generally consist of two parts:
  – Measurement model: How are the constructs related to measured variables?
  – Structural model: What are the relationships between the constructs?
Types of relationships in SEM
1. Relationship between a construct and a measured variable
[Diagram: an exogenous construct linked to measured variable X; an endogenous construct linked to measured variable Y]
Types of relationships in SEM
2. Relationship between a construct and multiple measured variables
[Diagram: an exogenous construct linked to measured variables X1, X2, X3]
Types of relationships in SEM
3. Dependence relationship between two constructs (structural relationship)
[Diagram: Exogenous construct → Endogenous construct]
Types of relationships in SEM
4. Correlational relationship between constructs
[Diagram: Construct 1 ↔ Construct 2]
Measurement and structural model
[Diagram: a full path model partitioned into its measurement model and its structural model]
“Cause-and-effect” relationships
Substantial evidence is required:
1. Covariation
2. Sequence
3. Non-spurious covariance
4. Theoretical support
Non-spurious relationships
• Original relationship:
[Diagram: Supervisor → Job satisfaction, coefficient 0.50]
• Testing for alternate cause:
[Diagram: Working conditions added as an alternate cause; the direct Supervisor → Job satisfaction path is 0.00, with the Working conditions paths shown as 0.50 and 0.30]
Reliability and measurement error
• A certain degree of measurement error is practically always present
• Reliability = a measure of the degree to which a set of indicators of a latent construct is internally consistent (i.e. the extent to which they measure the same thing)
  – Reliability is generally inversely related to measurement error
Improving statistical estimation
• In the previous multivariate techniques, we have assumed that we can overlook the measurement error in the variables
• SEM automatically applies a “correction” for the amount of measurement error and estimates the correct structural coefficient (i.e. the relationships between constructs)
• Relationship coefficients estimated by SEM tend to be larger than coefficients obtained from multiple regression
Strong theoretical basis needed
• No SEM model should be considered without an underlying theory
• Theory is needed for specifying the path diagram: – Measurement model
– Structural model
Modeling strategies
• Confirmatory modeling strategy
  – Specify a single model
  – "It either works or it doesn't"
• Competing models strategy
  – Multiple alternative specifications
  – Strongest test is to compare models representing different but plausible hypothesized relationships
• Model development strategy
  – Basic model proposed as a starting point
  – SEM used to get insights for re-specification
  – Model needs to be verified with an independent sample
SEM and other multivariate techniques
• SEM is most appropriate when the researcher has multiple constructs, each represented by several measured variables
• SEM ~ hybrid of multiple regression, MANOVA and factor analysis
• Opposite of exploratory techniques; everything is theory driven
Confirmatory Factor Analysis
Similar to EFA in many respects, but with a completely different philosophy. With CFA, the researcher must specify both the number of factors and which variables define each factor.
CFA as a tool for evaluating measurement model
• Specification of the measurement model is a crucial step in SEM (!)
• Commonly CFA is used as a tool to validate the measurement model before specifying and estimating the structural model:
– Are the constructs unidimensional and valid?
– How many indicators should be used for each construct?
– Are the measures able to portray the construct or explain it?
Steps in CFA
Define constructs
Define measurement model
Design the empirical study
Estimate and assess validity
Step 1: Defining the constructs
• Operationalization
• Scales from prior research
• Development of new scales
• Pretesting
Step 2: Defining the measurement model
• Are the constructs unidimensional (i.e. no cross-loadings)?
• Is the measurement model congeneric (i.e. no covariance between or within construct error variances)?
• Is there a sufficient number of indicators per construct (i.e. ensure identification)?
Example: Congeneric model
Each measured variable is related to exactly one construct.
[Diagram: Compensation measured by X1–X4 (loadings Lx1–Lx4, error terms e1–e4) and Teamwork measured by X5–X8 (loadings Lx5–Lx8, error terms e5–e8)]
Source: Hair et al. (2010)
Example: Non-congeneric model
Not every measured variable is related to exactly one construct, and the errors are not independent.
[Figure 11.2: A measurement model with hypothesized cross-loadings and correlated error variances – Compensation (X1–X4, errors δ1–δ4) and Teamwork (X5–X8, errors δ5–δ8), construct correlation Ф21, cross-loadings λx3,2 and λx5,1, and correlated error variances θδ2,1 and θδ7,4]
Source: Hair et al. (2010)
Items per construct
• Good practice dictates a minimum of 3 indicator variables per construct (4 is preferred)
• Assessment of single-item constructs is problematic (if included, they don't generally stand for latent constructs)
• Rationale for the 3-indicator requirement:
  – A measurement model with a single construct and only 2 indicators is under-identified (= there are more parameters than unique covariances)
  – Remember: the number of unique variances and covariances in the observed covariance matrix minus the number of estimated parameters = degrees of freedom
Example: Over-identified construct
[Diagram: a single construct ξ1 with indicators X1–X4, loadings λx1,1–λx4,1 and error terms δ1–δ4 with variances θδ1,1–θδ4,4]

Measured item     Loading estimate   Error variance estimate
X1 = Cheerful     λx1,1 = 0.78       θδ1,1 = 0.39
X2 = Stimulated   λx2,1 = 0.89       θδ2,2 = 0.21
X3 = Lively       λx3,1 = 0.83       θδ3,3 = 0.31
X4 = Bright       λx4,1 = 0.87       θδ4,4 = 0.24

Eight paths to estimate; 10 unique variance-covariance terms.

Symmetric covariance matrix:
     X1    X2    X3    X4
X1   2.01
X2   1.43  2.01
X3   1.31  1.56  2.24
X4   1.36  1.54  1.57  2.00

Model fit: χ² = 14.9, df = 2, p = .001, CFI = .99
Source: Hair et al. (2010)
Formative vs. reflective constructs
• Reflective measurement theory:
  – Latent constructs cause the measured variables
  – CFA is based on the reflective approach
  – Errors occur due to the inability to fully explain the variables
• Formative measurement theory:
  – Measured variables "cause" the construct
  – The error term is the inability of the measured variables to fully explain the construct
  – Formative constructs are not latent
  – Formative constructs are interpreted as indices where each indicator is a cause of the construct
  – May have problems with statistical identification
Formative vs. reflective constructs (cont.)
• Practical implications:
  – Use of formative constructs requires additional variables or constructs to ensure an over-identified model
  – A formative construct should retain all the items that form it: dropping items because of low loadings should not be done (internal consistency and reliability are not so important)
  – In the reflective approach, indicators which have low correlations with the other indicators of the same construct should be removed
Step 3: Design the empirical study
• Choice of measurement scales
• Sampling issues
• Model specification and identification issues
• Countering potential estimation problems
Setting the scales for constructs
• All indicator variables for a construct don't have to be on the same scale
• However, normalization can make interpretation easier
• Before estimation of the model, you need to ensure that the scale of each construct is defined:
  – Fix one loading and set its value to 1 (i.e. don't estimate that loading parameter); or
  – Fix the construct variance and set its value to 1
• Check that multiple values are not constrained to 1 for the purpose of defining the scale
Identification of the model
• Degrees of freedom give the amount of mathematical information available to estimate model parameters
• In the case of SEM, this is given by the number of unique variances and covariances minus the number of parameters:
  df = p(p + 1)/2 − k
• where p = number of observed variables and k = number of estimated parameters
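The degrees-of-freedom formula above can be sketched in a few lines of Python; the parameter counts in the comments assume a single reflective construct scaled by fixing one loading to 1, as described earlier:

```python
def degrees_of_freedom(p: int, k: int) -> int:
    """Net degrees of freedom for a CFA/SEM model.

    p: number of observed variables
    k: number of free parameters to estimate
    """
    unique_terms = p * (p + 1) // 2  # unique variances + covariances
    return unique_terms - k

# Single construct, 4 indicators, one loading fixed to 1:
# estimate 3 loadings + 4 error variances + 1 construct variance = 8
print(degrees_of_freedom(4, 8))  # 2 -> over-identified

# Single construct, 2 indicators: 1 loading + 2 error variances
# + 1 construct variance = 4 parameters, but only 3 unique terms
print(degrees_of_freedom(2, 4))  # -1 -> under-identified
```

This also illustrates why the 3-indicator rule from the earlier slide matters: with only two indicators the model has fewer unique covariance terms than parameters.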
Identification of the model (cont.)
Order condition: Net degrees of freedom must be ≥ 0
• Under-identified ~ more parameters than unique covariance and variance terms
• Just identified ~ df = 0
• Over-identified ~ df > 0
Rank condition: Each parameter is uniquely defined
Recognizing identification problems
• Incorrect indicator specification
  – Not linking an indicator to any construct
  – Linking an indicator to two or more constructs
  – Not creating and linking an error term for each indicator
• Setting the scale of a construct
  – Forgetting to set the scale (either a loading of an indicator or the construct variance)
• Insufficient degrees of freedom
  – Violation of the 3-indicator rule (in particular when sample < 200)
  – More indicators needed, or add constraints to free up degrees of freedom
Recognizing identification problems (cont.)
• Very large standard errors
• Inability to invert the information matrix (no solution found)
• Wildly unreasonable estimates, including negative error variances
• Unstable parameter values
Sample size issues
• Multivariate distribution of data
  – Should have 15 observations for each parameter estimated
• Estimation technique
  – If all assumptions hold, ML works already with a sample of 50
  – In less than ideal conditions, the sample should be at least 200
  – Sample sizes in the range of 100–400 are recommended
• Model complexity (# of constructs, parameters, groups)
• Amount of missing data
• Amount of average error variance among the reflective indicators
  – With communalities less than 0.5 (i.e. standardized loadings less than 0.7), large samples are required for a stable solution
Thumb rules on sample size
• Minimum sample of 100:
  – 5 or fewer constructs, each with more than 3 indicator variables, and high communalities (0.6 or higher)
• Minimum sample of 150:
  – 7 or fewer constructs, modest communalities (0.5), and no under-identified constructs (i.e. fewer than 3 indicators)
• Minimum sample of 300:
  – 7 or fewer constructs, low communalities (below 0.45), and multiple under-identified constructs
• Minimum sample of 500:
  – Models with a large number of constructs, some with lower communalities, and/or having fewer than 3 indicators
Step 4: Examination of model validity
• Are the constructs valid?
• Is the model fit acceptable?
• Diagnostics?
Construct validity
• SEM can be used to evaluate the validity of constructs (i.e. to what extent do the measured items reflect the theoretical latent construct?)
• Aspects of construct validity:
  – Convergent validity: loadings, variance extracted, reliability
  – Discriminant validity
  – Nomological validity
  – Face validity
Convergent validity
• Indicators of a specific construct should "converge", i.e. share a high proportion of variance in common
• Statistics for convergent validity:
  – Loadings
  – Average variance extracted
  – Reliability
Statistics for convergent validity
• Standardized factor loadings and squared factor loadings
  – High loadings indicate convergence
  – Should be statistically significant
• AVE = average variance extracted:
  AVE = (Σ λi²) / n
  – where the squared standardized factor loadings λi² indicate the amount of variation in the indicator that can be explained by the factor, and n is the number of indicators
  – AVE > 0.5 => adequate convergence
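As a small illustration, AVE can be computed directly from standardized loadings; the loadings below are taken from the over-identified construct example (Hair et al., 2010) shown earlier:

```python
def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Loadings from the X1-X4 (Cheerful ... Bright) example
ave = average_variance_extracted([0.78, 0.89, 0.83, 0.87])
print(round(ave, 3))  # 0.712 -> above the 0.5 threshold
```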
Statistics for convergent validity (cont.)
• Construct reliability:
  CR = (Σ λi)² / [(Σ λi)² + Σ V(ei)]
  – where V(ei) = error variance in variable i
  – Should be > 0.7 to warrant good reliability
  – High construct reliability indicates internal consistency, i.e. all measures represent the same construct
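The construct reliability formula can likewise be sketched directly; the loadings and error variances again come from the earlier over-identified construct example:

```python
def construct_reliability(loadings, error_variances):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = sum(loadings) ** 2
    return s / (s + sum(error_variances))

cr = construct_reliability([0.78, 0.89, 0.83, 0.87],
                           [0.39, 0.21, 0.31, 0.24])
print(round(cr, 3))  # 0.908 -> above the 0.7 threshold
```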
Guidelines for evaluating convergent and discriminant validity
• Estimated loadings should be 0.5 or higher
• AVE should be 0.5 or higher to support convergent validity
• AVE estimates for two factors should be greater than the square of the correlation between the two factors to provide evidence of discriminant validity
• Construct reliability should be 0.7 or higher to suggest convergence and internal consistency
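The AVE-versus-squared-correlation guideline (often called the Fornell-Larcker criterion) can be sketched as a simple check; the AVE and correlation values below are hypothetical:

```python
def discriminant_ok(ave_1, ave_2, corr_12):
    """Fornell-Larcker criterion: both AVEs must exceed the
    squared correlation between the two constructs."""
    return min(ave_1, ave_2) > corr_12 ** 2

# Hypothetical: AVEs 0.71 and 0.62, inter-construct correlation 0.45
print(discriminant_ok(0.71, 0.62, 0.45))  # True (0.45^2 = 0.2025)
```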
Discriminant validity • Is the construct unique?
• Does it differ from other constructs?
• Do the individual indicator variables represent only one latent construct?
• Examine correlations between constructs
• Presence of cross-loadings is an indicator of discriminant validity problems
Nomological and face validity
• Face validity ~ "looks like it will work"
  – Needs to be established before the experiment
  – Ensure understanding of every indicator's content and meaning
• Nomological validity ~ "does the construct behave as it should with respect to other constructs?"
  – Theoretical propositions, e.g. "as age increases, memory loss increases"
  – Check whether the correlations between constructs make sense!
Assessment of model validity
• Goodness-of-fit: Does the model-implied covariance matrix match the observed covariance matrix?
– Absolute goodness-of-fit
– Incremental goodness-of-fit
– Parsimonious fit measures
• Construct validity
Chi-square test
• The null hypothesis is that the difference between the sample covariance matrix and the estimated (model-implied) covariance matrix is a zero matrix
• Failing to reject the null hypothesis indicates that the model fits the data
Problems with the Chi-square test
• The Chi-square statistic is a function of the sample size N and the difference between the observed and estimated covariance matrices
• As N increases, so does the test statistic, even when the differences between the matrices don't change
• The Chi-square statistic also increases with the number of observed variables, which makes it more difficult to achieve a fit
Need for complementary statistics!!
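The sample-size sensitivity can be illustrated numerically: under maximum likelihood the chi-square statistic equals (N − 1) times the minimized fit function, so holding the model/data discrepancy fixed while N grows drives the p-value toward zero. The discrepancy value and degrees of freedom below are hypothetical, and the survival function uses the closed form for even df to stay dependency-free:

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square survival function P(X > x), exact for even df."""
    m = df // 2
    term = math.exp(-x / 2)
    total = term
    for k in range(1, m):
        term *= (x / 2) / k
        total += term
    return total

f_min, df = 0.05, 10  # hypothetical minimized discrepancy and model df
for n in (100, 200, 1000):
    stat = (n - 1) * f_min  # same discrepancy, larger sample
    print(n, round(stat, 2), round(chi2_sf_even_df(stat, df), 4))
```

The same model that looks acceptable at N = 100 is firmly rejected at N = 1000, which is exactly why complementary fit statistics are needed.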
Comparative fit indices
• Based on the idea of comparing nested models on a continuum: saturated --- estimated --- independence
• Bentler-Bonett normed fit index (NFI): compares the estimated model to the independence model
  – High values (> 0.95) indicate good fit
• Bentler's comparative fit index (CFI):
  – High values (> 0.95) indicate good fit
Comparative fit indices (cont.)
• Tucker-Lewis Index (TLI):
  – Conceptually similar to NFI
  – Takes model complexity into account
  – Not normalized, but models with good fit generally have values close to 1
• Relative non-centrality index (RNI):
  – Compares the observed fit to that of a null model
  – Higher values represent a better model (> 0.9)
Parsimony fit indices
• Improved either by a better fit or a simpler model
• Conceptually similar to adjusted R2
• Examples:
  – Adjusted goodness-of-fit index (AGFI)
– Parsimony normed fit index (PNFI)
Badness-of-fit indices
• Root mean square error of approximation (RMSEA)
  – Quite broadly used
  – Attempts to correct for the tendency of the chi-square test to reject models with a large sample or a large number of observed variables
  – Lower values imply better fit (< 0.08)
• Root mean square residual (RMR) or standardized RMR (SRMR)
  – Generally, standardized residuals exceeding |4.0| should be scrutinized
  – SRMR > 0.1 indicates a problem with fit
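One common formulation of RMSEA can be sketched directly from the chi-square statistic, its degrees of freedom, and the sample size; the fit statistics below are hypothetical:

```python
import math

def rmsea(chi2_stat, df, n):
    """Root mean square error of approximation (one common formulation)."""
    return math.sqrt(max(chi2_stat - df, 0) / (df * (n - 1)))

# Hypothetical: chi2 = 14.9 on 2 df with N = 200
print(round(rmsea(14.9, 2, 200), 3))   # 0.18 -> well above the 0.08 cutoff
# Hypothetical: chi2 = 10.0 on 8 df with N = 400
print(round(rmsea(10.0, 8, 400), 3))   # 0.025 -> good fit
```

Note how the (N − 1) term in the denominator counteracts the chi-square statistic's growth with sample size, which is the correction mentioned above.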