What is Factor Analysis?
- Factor analysis examines the interrelationships among a large number of variables and then attempts to explain them in terms of their common underlying dimensions
  - Common underlying dimensions are referred to as factors
- Interdependence technique
  - No IVs or DVs
  - All variables are considered simultaneously
Why do Factor Analysis?
- Data summarization
  - Research question is to better understand the interrelationships among the variables
  - Identify latent dimensions within the data set
  - Identification and understanding of these underlying dimensions is the goal
- Data reduction
  - Discover underlying dimensions to reduce data to fewer variables so all dimensions are represented in subsequent analyses
  - Surrogate variables, aggregated scales, factor scores
- Precursor to subsequent multivariate techniques
  - Data summarization: latent dimensions become research questions answered with other MV techniques
  - Data reduction: avoid multicollinearity problems; improve reliability of aggregated scales
Assumptions
- Variables must be interrelated
  - 20 unrelated variables = 20 factors
  - Matrix must have a sufficient number of correlations
  - Some underlying factor structure
- Sample must be homogeneous
- Metric variables assumed
- Multivariate normality not required
- Sample size
  - Minimum 50, prefer 100
  - Minimum 5 observations per item, prefer 10 observations per item
Types of Factor Analysis
- Exploratory factor analysis (EFA)
  - Used to discover underlying structure
  - Principal components analysis (PCA) (Thurstone): treats individual items or measures as though they have no unique error
  - Factor analysis (common factor analysis) (Spearman): treats individual items or measures as having unique error
  - Both PCA and FA give similar answers most of the time
- Confirmatory factor analysis (CFA)
  - Used to test whether data fit a priori expectations for data structure
  - Structural equation modeling
Purpose of EFA
- EFA is a data reduction technique
  - Scientific parsimony
  - Which items are measuring virtually the same thing?
- Objective: simplification of items into a subset of concepts or measures
- Part of construct validation (what are the underlying patterns in the data?)
- EFA assesses dimensionality or homogeneity
- Issues:
  - Use principal components analysis (PCA) or factor analysis (FA)?
  - How many factors?
  - What type of rotation?
  - How to interpret? (loadings, cross-loadings)
Types of EFA
- Principal components analysis
  - A composite of the observed variables serves as a summary of those variables
  - Assumes no error in items
  - No assumption of an underlying construct
  - Often used in the physical sciences
  - Precise mathematical solutions possible
  - Unity (1.0) inserted on the diagonal of the matrix
- Factor (or common factor) analysis
  - In SPSS known as principal axis factoring
  - Explains relationships between observed variables in terms of latent variables or factors
  - A factor is a hypothesized construct
  - Assumes error in items
  - Precise math not possible; solved by iteration
  - Communalities (shared variance) on the diagonal
Basic Logic of EFA
- Start with the items you want to reduce
- Create a mathematical combination of variables that maximizes the variance you can predict in all variables: the first principal component or factor
- Form a new combination of items from the residual variance that maximizes the variance you can predict in what is left: the second principal component or factor
- Continue until all variance is accounted for; select the minimal number of factors that captures the most variance
- Interpret the factors; the rotated matrix and loadings are more interpretable
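The extraction logic above can be sketched as repeated eigendecomposition of the correlation matrix, peeling off the component that explains the most remaining variance at each step. This is an illustrative sketch, not any particular package's implementation:

```python
import numpy as np

def extract_components(R, n_components):
    """Sequentially extract principal components from a correlation matrix R.
    Each component is the linear combination explaining the most remaining
    variance; its loadings are eigenvector * sqrt(eigenvalue)."""
    loadings = []
    residual = R.copy()
    for _ in range(n_components):
        evals, evecs = np.linalg.eigh(residual)
        i = np.argmax(evals)                    # largest remaining eigenvalue
        l = evecs[:, i] * np.sqrt(evals[i])     # loadings for this component
        loadings.append(l)
        residual = residual - np.outer(l, l)    # remove the explained variance
    return np.column_stack(loadings)
```

Extracting all k components reproduces R exactly, which is why the procedure "continues until all variance is accounted for"; stopping early keeps only the components that explain the most variance.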
Concepts and Terms
PCA starts with a data matrix of N persons arranged in rows and k measures arranged in columns:

              Measures
  Persons   A   B   C   D ... k
     1
     2
     3
     .
     .
     N

N persons, k different measures. The objective is to explain the data in fewer than the total number of items. PCA is a method to transform the original set of variables into a new set of principal components that are unrelated to each other.
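That transformation can be demonstrated in a few lines; the data here are made up purely for illustration:

```python
import numpy as np

# Illustrative sketch: transform correlated measures into uncorrelated
# principal components. N, k, and the data are made up for the demo.
rng = np.random.default_rng(0)
N, k = 200, 4
X = rng.standard_normal((N, k))
X[:, 1] += 0.8 * X[:, 0]                       # induce correlation between measures
X = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize

R = np.corrcoef(X, rowvar=False)               # k x k correlation matrix
evals, evecs = np.linalg.eigh(R)
scores = X @ evecs                             # component scores per person

C = np.cov(scores, rowvar=False)               # off-diagonals ~0: components unrelated
```

The covariance matrix of the component scores is diagonal: the new variables are unrelated to each other, which is the defining property of principal components.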
Concepts and Terms
- Factor: a linear composite; a way of turning multiple measures into one thing
- Factor score: a measure of one person's score on a given factor
- Factor loadings: correlations of a factor with the items. Variables with high loadings are the distinguishing features of the factor
- Communality (h²): variance in a given item accounted for by all factors; the sum of squared factor loadings in a row of the factor analysis results. These are placed on the diagonal in common factor analysis
- Factorially pure: a test that loads on only one factor
- Scale score: a score for an individual obtained by adding together the items making up a factor
The Process
- Because we are trying to reduce the data, we don't want as many factors as items
- Because each new component or factor is the best linear combination of residual variance, the data can be explained relatively well with many fewer factors than the original number of items
- When to stop taking additional factors is a difficult decision. Primary methods:
  - Scree rule
  - Kaiser criterion (eigenvalues > 1)
How Many Factors?
- Scree plot (Cattell) - not a test
  - Look for the bend in the plot
  - Include the factor located right at the bend point
- Kaiser (or latent root) criterion
  - Eigenvalues greater than 1
  - Also, 1 is the amount of variance accounted for by a single item (r² = 1.00). If an eigenvalue < 1.00, the factor accounts for less variance than a single item
  - Tinsley & Tinsley: the Kaiser criterion can underestimate the number of factors
- A priori hypothesized number of factors
- Percent of variance criterion
- Parallel analysis - eigenvalues higher than expected by chance
- Use these criteria plus theory to make the determination
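Parallel analysis can be sketched directly from its definition: compare the observed eigenvalues with eigenvalues from random data of the same N x k shape, and retain factors whose eigenvalue exceeds the chance benchmark. The simulation count and percentile below are common but illustrative choices:

```python
import numpy as np

def parallel_analysis(X, n_sims=100, percentile=95, seed=0):
    """Retain components whose eigenvalue exceeds the chosen percentile
    of eigenvalues from random normal data of the same N x k shape."""
    rng = np.random.default_rng(seed)
    N, k = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    sims = np.empty((n_sims, k))
    for s in range(n_sims):
        Z = rng.standard_normal((N, k))
        sims[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    threshold = np.percentile(sims, percentile, axis=0)
    return int(np.sum(obs > threshold)), obs, threshold
```

On data with one strong common factor this retains a single component, where the Kaiser criterion might retain chance eigenvalues that drift just above 1.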
Example
R matrix (correlation matrix):

        BlPr  LSat  Chol  LStr  BdWt  JSat  JStr
BlPr    1.00
LSat    -.18  1.00
Chol     .65  -.17  1.00
LStr     .15  -.45   .22  1.00
BdWt     .45  -.11   .52   .16  1.00
JSat    -.21   .85  -.12  -.35  -.05  1.00
JStr     .19  -.21   .02   .79   .19  -.35  1.00
Principal Components Analysis (PCA)

Initial Statistics:
Variable  Communality  *  Factor  Eigenvalue  %Var   Cum%
BLPR      1.00000      *  1       2.85034     40.7   40.7
LSAT      1.00000      *  2       1.74438     24.9   65.6
CHOL      1.00000      *  3       1.16388     16.6   82.3
LSTR      1.00000      *  4        .56098      8.0   90.3
BDWT      1.00000      *  5        .44201      6.3   96.6
JSAT      1.00000      *  6        .20235      2.9   99.5
JSTR      1.00000      *  7        .03607       .5  100.0
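The eigenvalues in this SPSS-style output can be recomputed directly from the R matrix above; a sketch (values should match the table up to the rounding of the printed correlations):

```python
import numpy as np

# Correlation matrix from the example (lower triangle mirrored).
R = np.array([
    [1.00, -.18,  .65,  .15,  .45, -.21,  .19],
    [-.18, 1.00, -.17, -.45, -.11,  .85, -.21],
    [ .65, -.17, 1.00,  .22,  .52, -.12,  .02],
    [ .15, -.45,  .22, 1.00,  .16, -.35,  .79],
    [ .45, -.11,  .52,  .16, 1.00, -.05,  .19],
    [-.21,  .85, -.12, -.35, -.05, 1.00, -.35],
    [ .19, -.21,  .02,  .79,  .19, -.35, 1.00],
])

evals = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending eigenvalues
pct = 100 * evals / evals.sum()                # % variance per component
# The eigenvalues sum to the trace of R (= 7, the number of variables),
# and three of them exceed 1, so the Kaiser criterion retains three.
```

Note that the eigenvalues and % variance are identical in the initial PCA output regardless of how many factors are later retained; only the communality column changes.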
Example (continued)
Factor Matrix (Unrotated):

         Factor 1  Factor 2  Factor 3  ...  Fac7
LSTR      .73738   -.32677    .47575
LSAT     -.71287    .38426    .52039
JSAT     -.70452    .42559    .48553
JSTR      .64541   -.32867    .62912
CHOL      .54945    .68694   -.10453
BDWT      .48867    .60471    .13043
BLPR      .58722    .60269   -.08534

Eigenvalue  2.85034   1.74438   1.16388
Example
Final Statistics:
Variable  Communality  *  Factor  Eigenvalue  %Var  Cum%
BLPR      .71533       *  1       2.85034     40.7  40.7
LSAT      .92665       *  2       1.74438     24.9  65.6
CHOL      .78470       *  3       1.16388     16.6  82.3
LSTR      .87684       *
BDWT      .62149       *
JSAT      .91321       *
JSTR      .92037       *
VARIMAX Rotated Factor Matrix:

        Factor 1  Factor 2  Factor 3     h²
CHOL     .87987   -.10246   -.00574   .78470
BLPR     .83043   -.14875    .05988   .71533
BDWT     .76940    .05630    .16234   .62149
LSAT    -.09806    .94430   -.15917   .92665
JSAT    -.05790    .93376   -.19479   .91321
JSTR     .06542   -.10717    .95110   .92036
LSTR     .12381   -.26465    .88965   .87684

Eigenvalue  2.0883   1.8809   1.7893
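A rotation like the one above can be sketched with the standard SVD-based formulation of the varimax algorithm; this is an illustrative implementation, not SPSS's own code:

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonally rotate a loading matrix L (items x factors) to
    maximize the variance of the squared loadings within each column."""
    p, k = L.shape
    T = np.eye(k)                # accumulated rotation matrix
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        # Gradient of the varimax criterion, projected back onto
        # the set of orthogonal rotations via SVD.
        G = L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        T = u @ vt
        if s.sum() < d * (1 + tol):
            break
        d = s.sum()
    return L @ T
```

Because the rotation is orthogonal, each item's communality (the row sum of squared loadings) is unchanged; the rotation only redistributes variance across the factors.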
Scree Plot

[Scree plot: eigenvalues (y-axis, 0.0 to 3.5) plotted against factor number (x-axis, 1 to 7); the curve drops steeply and then flattens - look for the bend.]

"Scree" comes from a word for the loose rock and debris at the base of a cliff!
Information from EFA

                 FACTOR
Msr       F1     F2     F3     h²
a        .60   -.06    .02    .36
b        .81    .12   -.03    .67
c        .77    .03    .08    .60
d        .01    .65   -.04    .42
e        .03    .80    .07    .65
f        .12    .67   -.05    .47
g        .19   -.02    .68    .50
h        .08   -.10    .53    .30
i        .26   -.13    .47    .31
Sum Sq Ldng  1.76   1.56    .98    Total
% Variance   .195   .173   .109    47.7%
           (1.76/9) (1.56/9) (.98/9)

- A factor loading is the correlation between a factor and an item
- When factors are orthogonal, squared factor loadings give the amount of variance in one variable explained by that factor (F1 explains 36% of the variance in Msr a; F3 explains 46% of the variance in Msr g)
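The summary rows of this table follow mechanically from the loadings; a sketch that recomputes them (small discrepancies come from the loadings being printed to two decimals):

```python
import numpy as np

# Loadings for measures a..i on factors F1..F3, from the table above.
L = np.array([
    [ .60, -.06,  .02],
    [ .81,  .12, -.03],
    [ .77,  .03,  .08],
    [ .01,  .65, -.04],
    [ .03,  .80,  .07],
    [ .12,  .67, -.05],
    [ .19, -.02,  .68],
    [ .08, -.10,  .53],
    [ .26, -.13,  .47],
])

h2 = (L**2).sum(axis=1)     # communalities: squared loadings across each row
ssq = (L**2).sum(axis=0)    # sums of squared loadings down each column
pct = ssq / L.shape[0]      # % variance per factor (divide by k = 9 items)
total = h2.mean()           # average communality = total variance explained
```

The same row/column sums underlie every table in this section: rows give communalities, columns give the (rotated) factor variances, and their grand total gives the percent of variance explained.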
Information from EFA (continued)
- Eigenvalue: the sum of squared loadings down a column (associated with a factor); the total variance in all variables explained by one factor. Factors with eigenvalues less than 1 predict less than the variance of 1 item
- Communality (h²): variance in a given item accounted for by all factors; the sum of squared loadings across a row. It will equal 1 if you retain all possible factors
Information from EFA (continued)
- Average of all communalities (Σh² / k) = proportion of variance in all variables explained by all factors
- If all variables were reproduced perfectly by the factors, the correlation between two original variables would equal the sum of the products of their factor loadings. When reproduction is not perfect, this gives an estimate of the correlation
  - e.g., r_ab ≈ (.60 × .81) + (-.06 × .12) + (.02 × -.03) ≈ .48
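With orthogonal factors, the whole reproduced correlation matrix is just L Lᵀ (with communalities on the diagonal); a sketch of the r_ab example:

```python
import numpy as np

# Loadings for measures a and b from the table above.
a = np.array([.60, -.06,  .02])
b = np.array([.81,  .12, -.03])

r_ab = a @ b   # reproduced correlation: sum of products of loadings
# (.60)(.81) + (-.06)(.12) + (.02)(-.03) = .4782, i.e. about .48
```

Comparing reproduced correlations like this one with the original R matrix is a standard check on how well the retained factors account for the data.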
Information from EFA (continued)
- 1 - h² is the uniqueness of an item: the variance of an item not shared with other items. Unique variance could be random error or systematic
- The factor matrix above is shown after rotation. Eigenvalues are computed on the unrotated and unreduced factor loading matrix, because we are interested in the total variance accounted for in the data. The eigenvalues and % variance accounted for in SPSS are not reordered after rotation
Important Properties of PCA
- Each factor in turn maximizes the variance explained from an R matrix
- For any number of factors obtained, PCs maximize the variance explained
- The amount of variance explained by each PC equals the corresponding characteristic root (eigenvalue)
- All characteristic roots of PCs are positive
- The number of PCs derived equals the number of factors needed to explain all the variance in R
- The sum of the characteristic roots equals the sum of the diagonal elements of R
Rotations
- All original PC and PF solutions are orthogonal
- Once you obtain the minimal number of factors, you have to interpret them
- Interpreting original solutions is difficult; rotation aids interpretation
- You are looking for simple structure
  - Component loadings should be very high for a few variables and near 0 for the remaining variables
  - Each variable should load highly on only 1 component

       Unrotated Matrix    Rotated Matrix
Var      F1     F2           F1     F2
a       .75    .63          .14    .95
b       .69    .57          .14    .90
c       .80    .49          .18    .92
d       .85   -.42          .94    .09
e       .76   -.42          .92    .07
Rotation
- After rotation, the variance accounted for by a factor is spread out. The first factor no longer accounts for the maximum variance possible; the others get more variance. The total variance accounted for is the same
- Two types of rotation:
  - Orthogonal (factors uncorrelated)
  - Oblique (factors correlated)
Rotation
- Orthogonal rotation (rigid, 90 degrees): PCs or PFs remain uncorrelated after transformation
  - Varimax: simplifies column weights toward 1s and 0s. Some items load highly on a factor, the others don't load. Not appropriate if you expect a single factor
  - Quartimax: simplifies toward 1s and 0s in a row. An item loads high on 1 factor, almost 0 on the others. Appropriate if you expect a single general factor
  - Equimax: a compromise of the varimax and quartimax rotations
  - In practice, the choice of rotation makes little difference
Rotation
- Oblique or correlated components (less or more than 90 degrees): accounts for the same % of variance, but the factors are correlated
  - Some say not meaningful with PCA
  - Many factors are theoretically related, so the rotation method should not "force" orthogonality
  - Allows the loadings to more closely match simple structure; correlated solutions will get you closer to simple structure
  - Oblimin (Kaiser) and promax are good
  - Provides a structure matrix of loadings and a pattern matrix of partial weights - which to interpret?
Orthogonal Rotation

       Unrotated Matrix    Rotated Matrix
Var      F1     F2           F1     F2
a       .75    .63          .14    .95
b       .69    .57          .14    .90
c       .80    .49          .18    .92
d       .85   -.42          .94    .09
e       .76   -.42          .92    .07

[Plot: variables a-e in the F1/F2 factor space (axes from -1.00 to 1.00); the rotated axes RF1 and RF2 pass through the two clusters of points while staying at 90 degrees.]
Simple Structure (Thurstone)
(1) Each row of the factor matrix should have at least one 0 loading
(2) The number of items with 0 loadings equals the number of factors; each column has 1 or more 0 loadings
(3) Items load highly on one factor or the other
(4) If there are more than 4 factors, a large portion of items should have zero loadings
(5) For every pair of columns, there should be few cross-loadings
(6) Few if any negative loadings
Simple Structure
           Factor
Msr     1    2    3
a       x    0    0
b       x    0    0
c       x    0    0
d       0    x    0
e       0    x    0
f       0    x    0
g       0    0    x
h       0    0    x
i       0    0    x
j       0    0    x
Oblique Rotation
Example:

       Unrotated Matrix    Rotated Matrix
Var      F1     F2           F1     F2
a       .75    .63          .04    .98
b       .69    .57          .02    .99
c       .80    .49          .01    .97
d       .85   -.42          .99    .01
e       .76   -.42          .98    .02

[Plot: the same variables a-e in the F1/F2 space (axes from -1.00 to 1.00); the rotated axes RF1 and RF2 are allowed to be correlated, so they pass through the clusters at less than 90 degrees.]
Orthogonal or Oblique Rotation?
- Nunnally suggests using orthogonal as opposed to oblique rotations
  - Orthogonal is simpler
  - Leads to the same conclusions
  - Oblique can be misleading
- Ford et al. suggest using oblique unless the orthogonality assumption is tenable
Interpretation
- Factors are usually interpreted by observing which variables load highest on each factor
  - Set a priori criteria for loadings (minimum .3 or higher)
- Name the factor. Always provide the factor loading matrix in the study
- Cross-loadings are problematic
  - Set a priori criteria for a "large" cross-loading
  - Decide a priori what you will do about them
- Factor loadings or summated scales are used to define the new scale. You can go back to the correlation matrix rather than relying only on factor loadings; loadings can be inflated
PCA and FA
- PCA: no constructs of theoretical meaning assumed; a simple mechanical linear combination (1s in the diagonal of R)
- FA: assumes underlying latent constructs and allows for measurement error (communalities in the diagonal of R)
  - Also called PAF or common factor analysis
- PCA uses all the variance; FA uses ONLY shared variance
- In FA you can have indeterminate (unsolvable) solutions. You have to iterate (the computer makes its best "guess") to get the solutions
FA
- Also known as principal axis factoring or common factor analysis
- Steps:
  - Estimate the communalities of the variables (shared variance)
  - Substitute the communalities in place of the 1s on the diagonal of R
  - Perform a principal components analysis on the reduced matrix
  - Iterated FA:
    - Estimate h²
    - Solve for the factor model
    - Calculate new communalities
    - Substitute the new estimates of h² into the matrix and redo
    - Iterate until the communalities don't change much
  - Rotate for interpretation
Estimating Communalities
- Highest correlation of a given variable with the other variables in the data set
- Squared multiple correlation (SMC) of each variable predicted by all other variables in the data set
- Reliability of the variable
- Because you are estimating, and the factors are no longer combinations of actual variables, you can get funny results:
  - Communalities > 1.00
  - Negative eigenvalues
  - Negative uniqueness
Example FA
R matrix (correlation matrix with h² on the diagonal):

        BlPr  LSat  Chol  LStr  BdWt  JSat  JStr
BlPr     .54
LSat    -.18   .89
Chol     .65  -.17   .67
LStr     .15  -.45   .22   .87
BdWt     .45  -.11   .52   .16   .41
JSat    -.21   .85  -.12  -.35  -.05   .86
JStr     .19  -.21   .02   .79   .19  -.35   .87
Principal Axis Factoring (PAF)

Initial Statistics:
Variable  Communality  *  Factor  Eigenvalue  %Var   Cum%
BLPR      .53859       *  1       2.85034     40.7   40.7
LSAT      .88573       *  2       1.74438     24.9   65.6
CHOL      .66685       *  3       1.16388     16.6   82.3
LSTR      .87187       *  4        .56098      8.0   90.3
BDWT      .41804       *  5        .44201      6.3   96.6
JSAT      .86448       *  6        .20235      2.9   99.5
JSTR      .86966       *  7        .03607       .5  100.0
FA
Principal Axis Factoring (PAF)
Factor Matrix (Unrotated):

         Factor 1  Factor 2  Factor 3
LSAT     -.75885    .31104    .54455
LSTR      .70084   -.20961    .36388
JSAT     -.70038    .31502    .39982
JSTR      .68459   -.29044    .66213
CHOL      .48158    .74399   -.07267
BLPR      .48010    .56066   -.02253
BDWT      .36699    .47668    .08381
FA: Principal Axis Factoring (PAF)

Final Statistics:

Variable  Communality  *  Factor  Eigenvalue  %Var  Cum%
BLPR        .54535     *    1       2.62331   37.5  37.5
LSAT        .96913     *    2       1.41936   20.3  57.8
CHOL        .79071     *    3       1.04004   14.9  72.6
LSTR        .66752     *
BDWT        .36893     *
JSAT        .74962     *
JSTR        .99144     *
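The difference between the Initial and Final communalities reflects the core PAF idea: replace the 1s on the diagonal of R with communality estimates and re-extract until the estimates stabilize. A rough sketch of that loop, assuming the Example slide's R matrix and squared multiple correlations as starting values; `paf` is my name for it, and this simplified version is not SPSS's exact implementation:

```python
import numpy as np

def paf(R, n_factors, iters=50):
    """Simplified principal-axis factoring: iterate communalities on the diagonal."""
    Rw = R.copy()
    # Initial communality estimate: squared multiple correlation, 1 - 1/diag(R^-1)
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))
    for _ in range(iters):
        np.fill_diagonal(Rw, h2)                  # reduced correlation matrix
        w, V = np.linalg.eigh(Rw)
        idx = np.argsort(w)[::-1][:n_factors]     # keep the largest eigenvalues
        L = V[:, idx] * np.sqrt(np.clip(w[idx], 0, None))
        h2 = (L**2).sum(axis=1)                   # updated communalities
    return L, h2

# Correlation matrix from the Example slide (BlPr, LSat, Chol, LStr, BdWt, JSat, JStr)
R = np.array([
    [ 1.00, -0.18,  0.65,  0.15,  0.45, -0.21,  0.19],
    [-0.18,  1.00, -0.17, -0.45, -0.11,  0.85, -0.21],
    [ 0.65, -0.17,  1.00,  0.22,  0.52, -0.12,  0.02],
    [ 0.15, -0.45,  0.22,  1.00,  0.16, -0.35,  0.79],
    [ 0.45, -0.11,  0.52,  0.16,  1.00, -0.05,  0.19],
    [-0.21,  0.85, -0.12, -0.35, -0.05,  1.00, -0.35],
    [ 0.19, -0.21,  0.02,  0.79,  0.19, -0.35,  1.00],
])
loadings, h2 = paf(R, 3)
```

Because the diagonal holds estimated communalities rather than 1s, the extracted eigenvalues and loadings differ from the PCA-style Initial Statistics above.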
Rotated Factor Matrix (VARIMAX):

       Factor 1  Factor 2  Factor 3
LSAT    .96846   -.10483   -.14223
JSAT    .83532   -.07092   -.21643
CHOL   -.08425    .88520   -.00547
BLPR   -.11739    .72364    .08898
BDWT   -.00430    .59379    .12778
JSTR   -.10474    .07011    .98770
LSTR   -.28514    .15273    .75026
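VARIMAX itself is a short algorithm: find an orthogonal rotation of the loading matrix that concentrates each variable's loading on one factor. A sketch of the standard SVD-based version in numpy, applied to the unrotated loadings above; the signs and row order of the result may differ from the SPSS table:

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Varimax rotation of a loading matrix L (p variables x k factors)."""
    p, k = L.shape
    T = np.eye(k)          # accumulated orthogonal rotation
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        # SVD of the gradient of the varimax criterion
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - (gamma / p) * Lr @ np.diag((Lr**2).sum(axis=0)))
        )
        T = u @ vt
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
        d_old = d
    return L @ T

# Unrotated loadings from the slide (LSAT, LSTR, JSAT, JSTR, CHOL, BLPR, BDWT)
L = np.array([
    [-0.75885,  0.31104,  0.54455],
    [ 0.70084, -0.20961,  0.36388],
    [-0.70038,  0.31502,  0.39982],
    [ 0.68459, -0.29044,  0.66213],
    [ 0.48158,  0.74399, -0.07267],
    [ 0.48010,  0.56066, -0.02253],
    [ 0.36699,  0.47668,  0.08381],
])
Lr = varimax(L)
```

Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged by VARIMAX.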
Logic of FA

[Path diagram: observed variables BlPr, LSat, Chol, LStr, BdWt, JSat, JStr with unknown factors above them]

How many? What are the factors?

What we found:

[Path diagram: the same seven variables grouped under the three extracted factors]
PCA vs. FA

Pros & cons:
– Pro PCA: has solvable equations. "The math is right."
– Con PCA: lumps garbage together; no underlying concepts.
– Pro FA: considers the role of measurement error; gets at concepts.
– Con FA: mathematical gymnastics.

Practically, there is usually not much difference:
– PCA will tend to converge more consistently
– FA is more meaningful conceptually
PCA vs. FA

Situations where you might want to use FA:
– Where there are 12 or fewer variables (the diagonal will have a large impact)
– Where the correlations between the variables are small (again, the diagonal will have a large impact)

If you have a clear factor structure, the choice won't make much difference.

Otherwise:
– PCA will tend to overfactor
– If you are doing exploratory analysis, you may not mind overfactoring
Using FA Results

Single surrogate measure – choose a single item with a high loading to represent the factor

Summated scale*
– Form a composite from items loading on the same factor
– Average all items that load on a factor (unit weighting)
– Calculate the alpha for reliability
– Name the scale/construct

Factor scores
– Composite measures for each factor, computed for each subject
– Based on all factor loadings for all items
– Not easily replicated
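The summated-scale steps above can be sketched directly: average the items that load on one factor, then compute Cronbach's alpha for the composite. The data here are simulated purely for illustration (the two "satisfaction" indicators and their loadings are hypothetical, not the slide deck's data set):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n subjects x k items) array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: two indicators driven by one latent factor
rng = np.random.default_rng(0)
f = rng.normal(size=200)                         # latent factor score per subject
lsat = f + 0.5 * rng.normal(size=200)            # two noisy indicators of it
jsat = f + 0.5 * rng.normal(size=200)

scale = (lsat + jsat) / 2                        # summated scale (unit weighting)
alpha = cronbach_alpha(np.column_stack([lsat, jsat]))
```

Unit weighting (a plain average) keeps the scale easy to replicate in new samples, which is the usual argument for summated scales over factor scores.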
Reporting

If you create a factor-based scale, describe the process.

For a factor analytic study, report:
– Theoretical rationale for EFA
– Detailed description of subjects and items, including descriptive stats
– Correlation matrix
– Methods used (PCA/FA, communality estimates, factor extraction, rotation)
– Criteria employed for the number of factors and meaningful loadings
– Factor matrix (aka pattern matrix)
Confirmatory Factor Analysis

Part of the construct validation process (do the data conform to expectations regarding the underlying patterns?)

Use SEM packages to perform CFA. An EFA with a specified number of factors as the criterion is NOT a CFA.

Basically, start with a correlation matrix and the expected relationships, then look at whether the expected relationships can reproduce the correlation matrix well.

Tested with a chi-square goodness-of-fit test: if significant, the data don't fit the expected structure, so there is no confirmation. Alternative measures of fit are available.
Logic of CFA

Let's say I believe:

[Path diagram: BlPr, LSat, Chol, LStr, BdWt, JSat, JStr loading on three factors: Phys Hlth, Life Happ, Job Happ]

But the reality is:

[Path diagram: the same seven variables loading on Phys Hlth, Stress, Satisfact]

The data won't confirm the expected structure.
Example

R matrix (correlation matrix):

        BlPr   LSat   Chol   LStr   BdWt   JSat   JStr
BlPr    1.00
LSat    -.18   1.00
Chol     .65   -.17   1.00
LStr     .15   -.45    .22   1.00
BdWt     .45   -.11    .52    .16   1.00
JSat    -.21    .85   -.12   -.35   -.05   1.00
JStr     .19   -.21    .02    .79    .19   -.35   1.00
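The "can the expected relationships reproduce R?" question can be made concrete by reconstructing R from a small number of factors and inspecting the residual correlations. This unrestricted three-component sketch conveys only the flavor of the idea: a real CFA fixes specific loadings to zero and tests fit (e.g., with chi-square) in an SEM package:

```python
import numpy as np

# R matrix from the Example slide (BlPr, LSat, Chol, LStr, BdWt, JSat, JStr)
R = np.array([
    [ 1.00, -0.18,  0.65,  0.15,  0.45, -0.21,  0.19],
    [-0.18,  1.00, -0.17, -0.45, -0.11,  0.85, -0.21],
    [ 0.65, -0.17,  1.00,  0.22,  0.52, -0.12,  0.02],
    [ 0.15, -0.45,  0.22,  1.00,  0.16, -0.35,  0.79],
    [ 0.45, -0.11,  0.52,  0.16,  1.00, -0.05,  0.19],
    [-0.21,  0.85, -0.12, -0.35, -0.05,  1.00, -0.35],
    [ 0.19, -0.21,  0.02,  0.79,  0.19, -0.35,  1.00],
])

# Three-component loading matrix from the eigendecomposition of R
w, V = np.linalg.eigh(R)
idx = np.argsort(w)[::-1][:3]
L = V[:, idx] * np.sqrt(w[idx])     # 7 x 3 loadings

# Reproduced correlations and residuals; good fit means small off-diagonals
R_hat = L @ L.T
resid = R - R_hat
off = resid[~np.eye(7, dtype=bool)]
print("max |residual| off-diagonal:", np.abs(off).max())
```

Large residual correlations flag pairs of variables the hypothesized structure fails to explain, which is the intuition behind the chi-square and residual-based fit measures.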
Do the data fit?

[Path diagram: BlPr, LSat, Chol, LStr, BdWt, JSat, JStr loading on the hypothesized factors Phys Hlth, Life Happ, Job Happ]