CFA and More in R!
Outline
Overview of lavaan and PISA data
Data Screening in R, a brief overview
Confirmatory factor analysis (CFA)
• One-factor CFA, continuous vs ordinal data
• Two-factor CFA
• Measurement Invariance
Structural equation modeling (SEM)
• Structural model
• Mediation model
lavaan
An R package for latent variable analysis (Rosseel, 2012).
Regression
CFA
Path analysis
SEM
Publicly downloadable
Yields results comparable to Mplus (mimic="mplus")
lavaan
Basic operators & utilities (please refer to Rosseel, 2017 for more details).
Syntax                   Description
y ~ x1 + x2 + x3 + x4    y is regressed on x1-x4 (works for both observed and latent Vs)
F1 =~ y1 + y2 + y3       F1 is measured by y1-y3
y1 ~~ y2                 error covariance of (y1, y2)
y1 ~ 1                   intercept of y1
cfa()                    analyzes measurement models, including multiple-group models
sem()                    analyzes models with structural paths
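The operators above can be combined in a single model string. The following is a minimal sketch, assuming hypothetical variable names (F1, y1-y3, x1, x2 are placeholders, not PISA variables). lavaanify() only parses the syntax into a parameter table, so no data are needed to try it.

```r
library(lavaan)

# Hypothetical variable names, for illustration only
toy.model <- '
  F1 =~ y1 + y2 + y3   # F1 is measured by y1-y3
  F1 ~ x1 + x2         # F1 is regressed on x1 and x2
  y1 ~~ y2             # error covariance of (y1, y2)
  y1 ~ 1               # intercept of y1
'

# Parse the syntax into a parameter table (no data required)
ptab <- lavaanify(toy.model)
ptab[, c("lhs", "op", "rhs", "free")]
```

Each row of the parameter table corresponds to one model parameter, and the op column shows which operator produced it.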
PISA 2015 Student Data_U.S.
• The Programme for International Student Assessment (PISA)
• Data retrieved from http://www.oecd.org/pisa/data/2015database/
• Coordinated by the Organization for Economic Co-operation and Development (OECD)
• Spanning 35 OECD countries/regions, PISA assesses skills in reading, maths, and science with the focus rotating every three years.
• Targets 15-year-old secondary school students
• Assesses multiple cognitive, social, and emotional well-being measures (e.g., self-efficacy, belief, engagement).
• N = 5712 (2854 males) in science_15.dat
Selected Variables
• Male (1: male; 0: female)
• Parental support (pa_sup1 – pa_sup4), 4-point
• Motivation (mot1 – mot9, 5 for enjoyment, 4 for instrumental), 4-point
• Science efficacy (sci_eff1 – sci_eff8), 4-point
• Home educational resources (HEDRES), continuous
• Science performance (PV1SCIE), continuous (M = 496, SD = 97.5)
• All missing data are coded as 999.
Data Screening in R: A Brief Practice
Set-up
Load the R package lavaan
install.packages("lavaan") ## only need to install once
library(lavaan) ## load the package each time
Define a working directory where the data is stored
setwd("C:/Users/xu.1384/Documents/lavaan workshop")
Read in data
## header=TRUE: our data has variable names; na.strings: our missingness is coded as 999.
science <- read.table("science_15.dat", header=TRUE, na.strings="999")
## to display the first six lines of data
head(science) ## everything looks OK?
Set-up
## in case the first column name is read in incorrectly
names(science)[1] <- "male"
Data Management
Basic summary
dim(science) ## to display the dimension
[1] 5712 24
summary(science) ## to produce summary information of each V
sapply(science, function(x) sum(is.na(x))/length(x)) ## to obtain missing rate
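The missing-rate computation can be tried on a small data frame first. This is a minimal sketch with made-up values (toy, item1-item3 are hypothetical, not PISA variables). It recodes 999 to NA by hand, which is what na.strings does at import, and then takes the proportion of NA values per column.

```r
# Toy data (hypothetical values) with 999 as the missing-data code
toy <- data.frame(item1 = c(1, 2, 999, 4),
                  item2 = c(999, 999, 3, 2),
                  item3 = c(4, 3, 2, 1))

# Recode 999 to NA (read.table does this at import via na.strings)
toy[toy == 999] <- NA

# Proportion of missing values per variable
miss_rate <- colMeans(is.na(toy))
miss_rate
```

colMeans(is.na(x)) gives the same numbers as the sapply() call above, since the mean of a logical vector is the proportion of TRUE values.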
Data Management
To display variable distributions
par(mfrow=c(2,2)) ## tell R to display graphs in 2*2 format
pa <- science[, 2:5] ## to extract parental support items
sapply(pa, function(x) hist(x)) ## to exhibit the histogram of the 4 Vs
Data Management
How are the variables related with each other?
pairs(pa, panel=panel.smooth) ## to show pairwise correlation plot
Data Management
To get the bivariate correlations
cor(pa, use="complete.obs", method="pearson") ## to show the correlations
Confirmatory Factor Analysis
One-factor CFA Example
Parental support (4 items)
Item descriptions
pa_sup1: My parents are interested in my school activities.
pa_sup2: My parents support my educational efforts and achievements.
pa_sup3: My parents support me when I am facing difficulties at school.
pa_sup4: My parents encourage me to be confident.
CFA: a useful tool for measurement purposes.
To test how well the 4 items represent “parental support”.
One-factor CFA Example
R code for model fitting
## specify the one-factor CFA model, naming the latent factor pa_sup
pa.model <- 'pa_sup =~ pa_sup1 + pa_sup2 + pa_sup3 + pa_sup4'
## fit the model: fill in the model, data, and estimator
model.pa <- cfa(model = pa.model, data = science, estimator = "MLR", mimic = "mplus")
## Note: the object name on the left-hand side of "<-" is flexible.
## to obtain the output for the analysis
summary(model.pa, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)
One-factor CFA Example
Please refer to the “Robust” results: the Robust column refers to the MLR estimates, so use this one.
One-factor CFA Example
Estimates
Std.all: scaling both the latent factor and the DVs to have a variance of 1
Std.lv: scaling only the latent factor to have a variance of 1
One-factor CFA Example
## to flexibly extract fit indices of interest
fitMeasures(model.pa, c("cfi.scaled", "tli.scaled","rmsea.scaled","srmr","aic","bic"))
## to extract unstandardized parameter estimates
parameterEstimates(model.pa)
## to obtain the standardized factor loadings only, use the following two steps:
std <- standardizedSolution(model.pa)
std[std$op=="=~","est.std"] ## std$op=="=~" selects the factor loadings
[1] 0.7167092 0.7942240 0.8349395 0.8230712
Note: XX.scaled refers to the MLR estimates.
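The same extraction steps can be tried without the PISA file. This is a minimal sketch using HolzingerSwineford1939, an example data set shipped with lavaan; the three-indicator model and the object names (hs.model, fit, loadings) are for illustration only and are not part of the workshop analysis.

```r
library(lavaan)

# Built-in example data shipped with lavaan (not the PISA data)
hs.model <- 'visual =~ x1 + x2 + x3'
fit <- cfa(hs.model, data = HolzingerSwineford1939, estimator = "MLR")

# Standardized loadings together with their standard errors and p-values
std <- standardizedSolution(fit)
loadings <- std[std$op == "=~", c("lhs", "rhs", "est.std", "se", "pvalue")]
loadings
```

Keeping the lhs and rhs columns makes it easier to see which loading belongs to which item than extracting est.std alone.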
When data is treated as ordinal
One-factor CFA_ordinal data
## check frequency distribution of each V first.
sapply(pa, function(x) table(x)) ## to apply table(x) to all the Vs in pa
One-factor CFA_ordinal data
## fit the same model while treating the variables as ordinal type
model.pa_cat <- cfa(model=pa.model, data=science, estimator = "WLSMV", mimic="mplus", ordered=c("pa_sup1","pa_sup2","pa_sup3","pa_sup4"))
## same procedures for summary and results extraction.
summary(model.pa_cat, fit.measures=TRUE, standardized=TRUE, rsquare=T)
fitMeasures(model.pa_cat, c("cfi.scaled", "tli.scaled","rmsea.scaled","srmr"))
std <- standardizedSolution(model.pa_cat) ## to obtain standardized estimates
std[std$op=="=~","est.std"] ## to extract standardized loadings
[1] 0.7945616 0.8855959 0.9049610 0.8892163
One-factor CFA
• Questions so far?
Highlights
Basic data management
Specify model in lavaan
Specify the fitting function; if the data are ordinal, use “ordered=…”
summary() for model fit and estimates
Ways to extract specific information from output, e.g., fit indices, loadings.
Practice I
Science efficacy (8 items, sci_eff1 – sci_eff8).
To create an “efficacy” data set
Hint: efficacy <- science[, paste0("sci_eff", 1:8)] ## select the 8 items by name (the data set has only 24 columns)
To make histogram plots for the 8 items
Hint: sapply(efficacy, function(x) hist(x))
Please fit a one-factor CFA model for sci_eff treating data as continuous
Does it fit well? All loadings significant?
Please fit the same model while treating data as ordinal.
Does it fit well? All loadings significant?
Please compare selected model fit indices (cfi.scaled, tli.scaled, rmsea.scaled, srmr), and standardized loadings for the two approaches. Which one is better?
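One way to line up the two approaches is to bind the selected fit indices into a small table. The following is a minimal sketch on simulated 4-point items rather than the PISA data; the population loadings, the cut points, and the object names (pop.model, sim4, fit_con, fit_ord, cmp) are all made up for illustration.

```r
library(lavaan)

set.seed(123)
# Simulate continuous responses from a one-factor model (hypothetical loadings)
pop.model <- 'f =~ 0.8*y1 + 0.8*y2 + 0.7*y3 + 0.7*y4'
sim <- simulateData(pop.model, sample.nobs = 1000)

# Cut each variable into four ordered categories to mimic 4-point items
sim4 <- as.data.frame(lapply(sim, function(x)
  as.integer(cut(x, breaks = c(-Inf, -1, 0, 1, Inf)))))

model <- 'f =~ y1 + y2 + y3 + y4'
fit_con <- cfa(model, data = sim4, estimator = "MLR")
fit_ord <- cfa(model, data = sim4, estimator = "WLSMV", ordered = names(sim4))

# Selected fit indices side by side
idx <- c("cfi.scaled", "tli.scaled", "rmsea.scaled", "srmr")
cmp <- round(cbind(continuous = as.numeric(fitMeasures(fit_con, idx)),
                   ordinal    = as.numeric(fitMeasures(fit_ord, idx))), 3)
rownames(cmp) <- idx
cmp
```

The same cbind() pattern should work for the two sci_eff models once they are fitted; the standardized loadings can be lined up in the same way.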
Item Descriptions for Efficacy
To what extent I can…
sci_eff1 Recognise the science question that underlies a newspaper report on a health issue.
sci_eff2 Explain why earthquakes occur more frequently in some areas than in others.
sci_eff3 Describe the role of antibiotics in the treatment of disease.
sci_eff4 Identify the science question associated with the disposal of garbage.
sci_eff5 Predict how changes to an environment will affect the survival of certain species.
sci_eff6 Interpret the scientific information provided on the labelling of food items.
sci_eff7 Discuss how new evidence can lead you to change your understanding about the possibility of life on Mars.
sci_eff8 Identify the better of two explanations for the formation of acid rain.
Sample Output
Model fit comparison
Standardized loading comparison
Two-factor CFA Example
Science Motivation (9 items)
Item descriptions
mot1_enj I have fun when I am learning science.
mot2_enj I like reading about science topics.
mot3_enj I am happy working on science topics.
mot4_enj I enjoy acquiring new knowledge in science.
mot5_enj I am interested in learning about science.
mot6_int Making an effort in my science subject(s) is worth it because this will help me in the work I want to do later on.
mot7_int What I learn in my science subject(s) is important for me because I need this for what I want to do later on.
mot8_int Studying my science subject(s) is worthwhile for me because what I learn will improve my career prospects.
mot9_int Many things I learn in my science subject(s) will help me to get a job.
Two-factor CFA Example
R code for model fitting
## specify the two-factor CFA model, naming the latent factor to be enjoy & instru
mot.2f <- 'enjoy =~ mot1_enj+mot2_enj+mot3_enj+mot4_enj+mot5_enj
instru =~ mot6_int+mot7_int+mot8_int+mot9_int'
## fit the model, fill in the model, data, and estimator;
model.mot_2f<-cfa(mot.2f, data=science, estimator = "MLR", mimic="mplus")
##Note. the naming in the left hand side of “<-” is flexible.
## to obtain the output for the analysis
summary(model.mot_2f, fit.measures=TRUE, standardized=TRUE, rsquare=T)
Two-factor CFA Example
Model fit
Estimates
What’s the inter-factor correlation?
Practice II
Assuming continuous data, fit a one-factor CFA model to the motivation data (9 items);
Compare the one-factor vs two-factor CFA in terms of model fit and parameter estimates.
Which one is better?
Note. If estimator = "ML", a chi-square difference test between two nested models can be run with anova(model1, model2). For the practice, simply look at the respective output.
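The anova() comparison mentioned in the note can be sketched on lavaan's built-in HolzingerSwineford1939 data. This is an illustration only: the models, variables (x1-x6), and object names (one.f, two.f, fit1, fit2, lrt) are not part of the motivation analysis.

```r
library(lavaan)

# One-factor vs two-factor model for the same six indicators
one.f <- 'g =~ x1 + x2 + x3 + x4 + x5 + x6'
two.f <- 'visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6'

fit1 <- cfa(one.f, data = HolzingerSwineford1939, estimator = "ML")
fit2 <- cfa(two.f, data = HolzingerSwineford1939, estimator = "ML")

# Chi-square difference test between the two nested models
lrt <- anova(fit1, fit2)
lrt
```

The output lists both models with their df, AIC, BIC, and chi-square, followed by the chi-square difference and its p-value.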
Sample Output
fit indices of one-factor vs two-factor CFA
Other Models Could be Considered
Bi-factor model: when it is posited that variables are explained by a single underlying construct, while there is uniqueness among groups of items.
[Figure: bi-factor path diagram showing the general trait factor, the uniqueness for intrinsic motivation, and the uniqueness for extrinsic motivation]
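A bi-factor specification for the nine motivation items could look as follows. This is a sketch rather than the workshop's own code: the factor names (general, enj_s, ins_s) and the object names are hypothetical, and using orthogonal = TRUE to fix all factor covariances to zero is one common way to set up such a model. The model is fitted only if the workshop data frame science is already loaded.

```r
library(lavaan)

# Every item loads on the general factor and on one specific factor
# (factor names are hypothetical)
mot.bif <- '
  general =~ mot1_enj+mot2_enj+mot3_enj+mot4_enj+mot5_enj+mot6_int+mot7_int+mot8_int+mot9_int
  enj_s   =~ mot1_enj+mot2_enj+mot3_enj+mot4_enj+mot5_enj
  ins_s   =~ mot6_int+mot7_int+mot8_int+mot9_int
'

# Check how the syntax is parsed (no data required)
ptab.bif <- lavaanify(mot.bif)

# orthogonal = TRUE fixes all factor covariances to zero;
# fit only when the workshop data frame `science` is available
if (exists("science")) {
  model.bif <- cfa(mot.bif, data = science, estimator = "MLR",
                   mimic = "mplus", orthogonal = TRUE)
  print(fitMeasures(model.bif, c("cfi.scaled", "tli.scaled", "rmsea.scaled", "srmr")))
}
```

Bi-factor models can be harder to estimate than correlated-factor models, so convergence warnings are worth checking before interpreting the fit.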
Measurement Invariance (MI)
Measurement Invariance (MI)
Group differences on cognitive, psychological, and social traits are of popular interest in the social sciences, e.g.,
• Gender, race, and SES differences in academic achievement;
• Depression levels between clinical and non-clinical samples;
• Levels of achievement and psychological well-being across Grades 1, 2, 3, ….
It is assumed that scales are measuring the same construct across groups of interest/across time.
How to test it? (Meredith, 1993)
• Configural invariance: same loading patterns
• Metric invariance: equal loadings
• Scalar invariance: equal loadings + intercepts
• Strict invariance: equal loadings + intercepts + error variances (optional)
Measurement Invariance (MI)
Research question: is motivation (enjoy & instrumental) invariantly measured across gender?
Configural Invariance: the two groups share the same loading patterns
[Figure: two-factor path diagrams for Male and Female]
Configural Invariance
MI_1 <- cfa(mot.2f, data=science, group="male", estimator = "MLR", mimic="mplus")
fitMeasures(MI_1, c("chisq.scaled","df","cfi.scaled","tli.scaled","rmsea.scaled","srmr","bic"))
summary(MI_1, fit.measures=TRUE, standardized=TRUE)
The fit indices indicate how well the model fits the data, assuming the groups share the same factor structure.
Metric Invariance
Metric (Weak) Invariance: the two groups share the same structure + loadings
[Figure: two-factor path diagrams for Male and Female; the first loading of each factor is fixed to 1.0 and the remaining loadings (λ21–λ51, λ72–λ92) are the same in both groups]
Metric Invariance
MI_2 <- cfa(mot.2f, data=science, group="male", estimator = "MLR", mimic="mplus", group.equal=c("loadings"))
fitMeasures(MI_2, c("chisq.scaled","df","cfi.scaled", "tli.scaled","rmsea.scaled","srmr","bic"))
summary(MI_2,fit.measures=TRUE, standardized=TRUE)
Scalar Invariance
Scalar (Strong) Invariance: the two groups share the same structure + loadings + intercepts
[Figure: path diagrams for the two groups with item intercepts τ1, τ2, …, τ9; all the intercepts are equal across groups]
Scalar Invariance
MI_3 <- cfa(mot.2f, data=science, group="male", estimator = "MLR", mimic="mplus", group.equal=c("loadings", "intercepts"))
fitMeasures(MI_3, c("chisq.scaled","df","cfi.scaled","tli.scaled","rmsea.scaled","srmr","bic"))
summary(MI_3,fit.measures=TRUE, standardized=TRUE)
Latent mean differences can be interpreted once scalar invariance holds.
Strict Invariance
Strict invariance: equal error variances in addition to strong invariance. Not mandatory.
MI_4 <- cfa(mot.2f, data=science, group="male", estimator = "MLR", mimic="mplus",
group.equal=c("loadings", "intercepts","residuals"))
fitMeasures(MI_4, c("chisq.scaled","df","cfi.scaled", "tli.scaled","rmsea.scaled","srmr","bic"))
summary(MI_4,fit.measures=TRUE, standardized=TRUE)
Summary of MI Steps
Fit and changes in fit
Model       χ2       df  CFI    TLI    RMSEA  SRMR   BIC      Δχ2      Δdf  ΔCFI    ΔTLI    ΔRMSEA
Configural  380.761  52  0.986  0.980  0.048  0.013  78755.5
Metric      410.833  59  0.985  0.981  0.046  0.018  78714.7  -30.072  -7   0.001   -0.001  0.002
Scalar      441.294  66  0.984  0.982  0.045  0.018  78677    -30.461  -7   0.001   -0.001  0.001
Strict      415.363  75  0.985  0.986  0.040  0.020  78624.9  25.931   -9   -0.001  -0.004  0.005
Note: the Δ columns are differences in the fit indices between adjacent models, computed as less constrained model minus more constrained model; e.g., the Metric row is calculated as Configural - Metric.
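The Δ columns can be reproduced from the reported fit indices with a few lines of base R. This is a minimal sketch: the numbers are copied from the table, the object names (fit, delta) are arbitrary, and the cutoffs are the ones cited in this section's evaluation criteria (Cheung and Rensvold, 2002; Chen, 2007).

```r
# Fit indices copied from the summary table
fit <- data.frame(
  model = c("Configural", "Metric", "Scalar", "Strict"),
  cfi   = c(0.986, 0.985, 0.984, 0.985),
  tli   = c(0.980, 0.981, 0.982, 0.986),
  rmsea = c(0.048, 0.046, 0.045, 0.040)
)

# Less constrained minus more constrained, for adjacent models
n <- nrow(fit)
delta <- data.frame(
  comparison = paste(fit$model[-n], "-", fit$model[-1]),
  d_cfi   = round(-diff(fit$cfi), 3),
  d_tli   = round(-diff(fit$tli), 3),
  d_rmsea = round(-diff(fit$rmsea), 3)
)

# Flag changes smaller than the cited cutoffs (.01 for CFI/TLI, .015 for RMSEA)
delta$negligible <- abs(delta$d_cfi) < .01 &
  abs(delta$d_tli) < .01 &
  abs(delta$d_rmsea) < .015
delta
```

In practice the fit indices would be pulled from the fitted models with fitMeasures() instead of being typed in by hand.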
Evaluation Criteria:
a) anova(model1, model2) can be used for the χ2 difference test if estimator = ML. When estimator = MLR, please refer to the
Mplus website: https://www.statmodel.com/chidiff.shtml
The χ2 difference test is sensitive to sample size, so it is tentatively skipped here.
b) Differences in CFI and TLI < .01 and in RMSEA < .015 indicate a negligible change in fit (Cheung and Rensvold, 2002; Chen, 2007)
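For MLR, the scaled difference test described on the Mplus page can be computed by hand. The following is a minimal sketch of that formula as a base R function; the function name scaled_chisq_diff is hypothetical, and the scaling correction factors in the usage example are made-up placeholders because the table does not report them. In lavaan they can be obtained with fitMeasures(model, "chisq.scaling.factor").

```r
# Scaled chi-square difference test for MLR (formula from the Mplus page)
# T0, c0, d0: scaled chi-square, scaling correction factor, and df of the
#             nested (more constrained) model
# T1, c1, d1: the same quantities for the comparison (less constrained) model
scaled_chisq_diff <- function(T0, c0, d0, T1, c1, d1) {
  cd  <- (d0 * c0 - d1 * c1) / (d0 - d1)  # difference-test scaling correction
  TRd <- (T0 * c0 - T1 * c1) / cd         # scaled difference statistic
  df  <- d0 - d1
  c(TRd = TRd, df = df, p = pchisq(TRd, df, lower.tail = FALSE))
}

# Chi-square and df for Metric (nested) vs Configural come from the table;
# the scaling correction factors 1.20 and 1.22 are made-up placeholders
scaled_chisq_diff(T0 = 410.833, c0 = 1.20, d0 = 59,
                  T1 = 380.761, c1 = 1.22, d1 = 52)
```

When both scaling factors equal 1, the statistic reduces to the ordinary difference T0 - T1, which is a quick sanity check on the function.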
Summary of MI Steps
Implications of the MI results:
Motivation has same meaning across gender.
Latent means of enjoy & instru are comparable across gender.
Observed means are also comparable across gender.
What if MI is not satisfied at a certain step? => partial invariance (for details, please refer to Hirschfeld & von Brachel, 2014).
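Partial invariance can be specified in lavaan with the group.partial argument, which frees selected parameters that group.equal would otherwise constrain. This is a minimal sketch on lavaan's built-in HolzingerSwineford1939 data; the choice of x3 and x7 as the freed intercepts is arbitrary and only for illustration.

```r
library(lavaan)

hs.model <- 'visual  =~ x1 + x2 + x3
             textual =~ x4 + x5 + x6
             speed   =~ x7 + x8 + x9'

# Scalar model, except that the intercepts of x3 and x7 are freed across groups
fit.partial <- cfa(hs.model, data = HolzingerSwineford1939, group = "school",
                   group.equal   = c("loadings", "intercepts"),
                   group.partial = c("x3~1", "x7~1"))
fitMeasures(fit.partial, c("chisq", "df", "cfi", "rmsea"))
```

Which parameters to free is usually decided from modification indices or theory; see Hirschfeld & von Brachel (2014) for a worked procedure.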
Practice III
Please run a series of MI models for science efficacy (sci_eff1 –sci_eff8, the one-factor CFA model) across gender.
The χ2 difference test can be skipped for now.
Do not worry about partial invariance; just stop at the first step that fails MI.
Consider using the following table to help organize results.
Model CFI TLI RMSEA SRMR BIC ΔCFI ΔTLI ΔRMSEA
Configural
Metric
Scalar
Strict
Output for Reference
Structural Equation Modeling
SEM Example
Research question: what are the effects of gender (male), home educational resources (HEDRES), and motivation (enjoy & instru) on science performance (PV1SCIE)?
Observed variables: male, HEDRES, PV1SCIE.
Latent variables: enjoy, instru
[Figure: path diagram of the SEM model, with the structural part highlighted]
SEM Example
lavaan code
## specify the SEM model
sem1 <- 'enjoy =~ mot1_enj+mot2_enj+mot3_enj+mot4_enj+mot5_enj
instru =~ mot6_int+mot7_int+mot8_int+mot9_int
PV1SCIE ~ HEDRES + male + enjoy + instru'
## fit the model sem1
sem_1 <- sem(sem1, data=science, estimator = "MLR", mimic="mplus")
## get the selected fit indices
fitMeasures(sem_1, c("chisq.scaled","df","cfi.scaled", "tli.scaled","rmsea.scaled","srmr","bic"))
## to get the output for model fit and parameter estimates
summary(sem_1, fit.measures=TRUE, standardized=TRUE)
SEM Example
Output
Check the p values for the estimates: is anything nonsignificant?
Practice IV
Please delete the nonsignificant predictor(s) for PV1SCIE and re-run the SEM model.
How does it fit?
SEM_Mediation
Research question: what is the effect of HEDRES on PV1SCIE mediated by motivation factors?
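[Figure: mediation path diagram with paths a1 (HEDRES → enjoy), b1 (enjoy → PV1SCIE), a2 (HEDRES → instru), b2 (instru → PV1SCIE), and c (HEDRES → PV1SCIE)]
Direct effect: c
Indirect effects: a1*b1 and a2*b2
Total effect: c + a1*b1 + a2*b2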
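The effect decomposition is plain arithmetic on the path coefficients. This is a minimal worked example in which every coefficient value is made up for illustration; none of these numbers is an estimate from the PISA data.

```r
# Hypothetical path coefficients, for illustration only
a1 <- 0.30; b1 <- 20    # HEDRES -> enjoy,  enjoy  -> PV1SCIE
a2 <- 0.10; b2 <- 5     # HEDRES -> instru, instru -> PV1SCIE
c_direct <- 10          # direct effect: HEDRES -> PV1SCIE

# Indirect effects are products of the paths along each mediated route
indirect_enjoy  <- a1 * b1
indirect_instru <- a2 * b2

# Total effect = direct effect + all indirect effects
total <- c_direct + indirect_enjoy + indirect_instru
c(indirect_enjoy = indirect_enjoy, indirect_instru = indirect_instru, total = total)
```

With these placeholder values the two indirect effects are 6 and 0.5, so the total effect is 16.5.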
SEM_Mediation
lavaan code
mediation <- 'enjoy =~ mot1_enj+mot2_enj+mot3_enj+mot4_enj+mot5_enj
instru =~ mot6_int+mot7_int+mot8_int+mot9_int
# direct effect
PV1SCIE ~ c*HEDRES
# mediator
enjoy ~ a1*HEDRES
PV1SCIE ~ b1*enjoy
instru ~ a2*HEDRES
PV1SCIE ~ b2*instru
# indirect effect (a*b)
a1b1 := a1*b1
a2b2 := a2*b2
# total effect
total := c + (a1*b1)+ (a2*b2)
# allow the two dimensions to be correlated; not a default in sem()
enjoy ~~ instru'
med <- sem(mediation, data=science, estimator = "MLR", mimic="mplus")
fitMeasures(med, c("chisq.scaled","df","cfi.scaled", "tli.scaled","rmsea.scaled","srmr","bic"))
summary(med, fit.measures=TRUE, standardized=TRUE, rsquare=T)
## for bootstrap standard errors, use the following in the fitting function:
sem(mediation, data=science, mimic="mplus",
se = "boot", bootstrap = 10000)
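After a bootstrap fit, percentile confidence intervals for the defined effects can be requested from parameterEstimates(). This is a minimal sketch on simulated observed-variable data; the variable names (x, m, y), the effect sizes, and the small number of bootstrap draws are assumptions chosen to keep the example fast, not recommendations.

```r
library(lavaan)

set.seed(2024)
# Simulated observed-variable mediation data (hypothetical effect sizes)
n <- 300
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.2 * x + rnorm(n)
toy <- data.frame(x, m, y)

med.toy <- '
  m ~ a*x
  y ~ b*m + c*x
  ab    := a*b
  total := c + a*b
'

# bootstrap = 200 keeps the sketch fast; use more draws in practice
fit.boot <- sem(med.toy, data = toy, se = "boot", bootstrap = 200)

# Percentile bootstrap confidence intervals for the defined effects
pe <- parameterEstimates(fit.boot, boot.ci.type = "perc")
pe[pe$op == ":=", c("lhs", "est", "ci.lower", "ci.upper")]
```

An indirect effect is commonly judged by whether its bootstrap interval excludes zero, since the product a*b is usually not normally distributed.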
SEM_Mediation
Output
a) How does the model fit?
b) How much variance in the DV is explained?
c) Any interesting findings?
Practice V
Please fit a mediation model as shown in the following graph:
Exogenous: HEDRES
Mediator: science efficacy (latent)
Outcome: PV1SCIE
[Figure: mediation path diagram with paths a, b, and c]
Some References
• Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9, 233–255.
• Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling, 14, 464–504.
• Hirschfeld, G., & von Brachel, R. (2014). Multiple-Group confirmatory factor analysis in R–A tutorial in measurement invariance with continuous and ordinal indicators. Practical Assessment, Research and Evaluation, 19(7), 1–12.
• Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525–543.
• Organization for Economic Co-Operation and Development (OECD). (2016). Country Note: Key Findings from PISA 2015 for the United States. Paris, France: OECD Publishing. Available online: https://www.oecd.org/pisa/pisa-2015-United-States.pdf
• Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48, 1–36.
• Rosseel, Y. (2017). The lavaan tutorial. Retrieved from http://lavaan.ugent.be/tutorial/tutorial.pdf
Thanks for joining us!
Questions & Feedback