an introduction to factor analysis ppt
TRANSCRIPT
A Presentation by
• RAKESH KUMAR
• MUKESH CHANDRA BISHT (PhD Scholar, LNIPE)
AN INTRODUCTION TO EXPLORATORY FACTOR ANALYSIS
“When you CAN MEASURE what you are speaking about and express it in numbers, you know
something about it; but when you CANNOT express it in numbers, your knowledge is of a meagre and
unsatisfactory kind.”
Measurement is necessary.
LORD KELVIN, British Scientist
FIRST NOTABLE MENTION
Charles Edward Spearman was known for his seminal work on testing and measuring HUMAN INTELLIGENCE using FACTOR ANALYSIS during World War I.
CHARLES EDWARD SPEARMAN (BRITISH PSYCHOLOGIST)
A factor is a linear combination of variables. It is a construct that is not directly observed but
that needs to be inferred from the input variables.
What is a factor
• Variable reduction technique
• Reduces a set of variables to a small number of latent (unobservable) factors.
• Factor analysis is a correlational method used to find and describe the underlying factors driving data values for a large set of variables.
Factor Analysis
SIMPLE PATH DIAGRAM FOR A FACTOR ANALYSIS MODEL
• F1 and F2 are two common factors. Y1, Y2, Y3, Y4, and Y5 are observed variables, possibly 5 subtests or measures of other observations such as responses to items on a survey.
• e1, e2, e3, e4, and e5 represent residuals or unique factors, which are assumed to be uncorrelated with each other.
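The path diagram can also be written as a set of equations. The sketch below assumes that every observed variable may load on both common factors; the loading symbols \lambda_{ij} are notation introduced here for illustration and do not appear on the slide, and in the drawn diagram some of these loadings may be absent (zero).

```latex
\begin{aligned}
Y_i &= \lambda_{i1} F_1 + \lambda_{i2} F_2 + e_i, \qquad i = 1, \dots, 5, \\
\operatorname{Cov}(e_i, e_j) &= 0 \quad \text{for } i \neq j .
\end{aligned}
```

Each observed variable is thus a weighted sum of the common factors plus its own unique part e_i.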
• Questionnaire construction
• Test Battery construction
Uses of Factor Analysis
Conducting Factor Analysis
1. Problem formulation
2. Testing the Assumptions
3. Construction of correlation Matrix
4. Method of Factor Analysis
5. Determination of Number of Factors
6. Rotation of Factors
7. Interpretation of Factors
1. No outliers in the data set.
2. Normality of the data set.
3. Adequate sample size.
4. There is no multicollinearity or singularity among the variables.
5. Homoscedasticity between the variables is not required, because factor analysis is a linear function of measured variables.
6. Relationships among the variables should be linear.
7. Data should be metric in nature, i.e. on an interval or ratio scale.
Assumptions to be fulfilled for running Factor analysis
KMO test is used
Bartlett test of sphericity
It tests the null hypothesis that all the correlations between the variables are zero, that is, whether the correlation matrix is an identity matrix. If it is an identity matrix, factor analysis is inappropriate.
Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy
This test checks the adequacy of the data for running the factor analysis. The value of KMO ranges from 0 to 1. The larger the value of KMO, the more adequate the sample is for running the factor analysis. Kaiser recommends accepting values greater than 0.5.
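Both checks can be computed directly from a correlation matrix. The following is a minimal sketch, assuming the standard textbook formulas: Bartlett's chi-square is -((n - 1) - (2p + 5) / 6) * ln|R| with p(p - 1) / 2 degrees of freedom, and KMO compares squared correlations with squared partial correlations taken from the inverse of R. The function names are hypothetical, and the data are the rounded correlations from the swimmers example later in this deck (11 variables, N = 21), so the printed values may differ slightly from the reported output.

```python
import numpy as np

# Lower triangle of the swimmers correlation matrix (rounded to 3 decimals on the slide).
lower = [
    [1.000],
    [-.651, 1.000],
    [-.359, .277, 1.000],
    [.539, -.691, -.492, 1.000],
    [.608, -.709, -.322, .686, 1.000],
    [.469, -.087, -.231, -.045, .255, 1.000],
    [.416, -.048, -.358, .010, .142, .947, 1.000],
    [.513, -.321, -.354, .151, .292, .687, .675, 1.000],
    [.606, -.495, -.400, .366, .602, .577, .522, .739, 1.000],
    [.584, -.515, -.186, .269, .589, .632, .543, .646, .773, 1.000],
    [.455, -.483, .128, .279, .410, .405, .244, .322, .377, .451, 1.000],
]

# Fill in the full symmetric matrix from the lower triangle.
p = len(lower)
R = np.zeros((p, p))
for i, row in enumerate(lower):
    for j, r in enumerate(row):
        R[i, j] = R[j, i] = r


def bartlett_sphericity(R, n):
    """Return Bartlett's chi-square statistic and its degrees of freedom."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)  # ln|R|, computed stably
    chi_square = -((n - 1) - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return chi_square, df


def kmo(R):
    """Return the overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale  # partial correlations between variable pairs
    off_diagonal = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off_diagonal] ** 2)
    a2 = np.sum(partial[off_diagonal] ** 2)
    return r2 / (r2 + a2)


chi_square, df = bartlett_sphericity(R, n=21)
kmo_value = kmo(R)
print(f"Bartlett chi-square = {chi_square:.3f}, df = {df}")
print(f"KMO = {kmo_value:.3f}")
```

A large chi-square relative to its degrees of freedom argues against an identity matrix, and a KMO above 0.5 meets the threshold quoted above.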
•Analyses the pattern of correlations between variables in the correlation matrix
•Which variables tend to correlate highly together?
•If variables are highly correlated, it is likely that they represent the same underlying dimension
Factor analysis pinpoints the clusters of high correlations between variables and for each cluster, it will assign a factor
Construction of the Correlation Matrix
Correlation Matrix
      Q1      Q2      Q3      Q4      Q5      Q6
Q1    1
Q2    .987    1
Q3    .801    .765    1
Q4   -.003   -.088    0       1
Q5   -.051    .044    .213    .968    1
Q6   -.190   -.111    .102    .789    .864    1
• Q1-3 correlate strongly with each other and hardly at all with Q4-6
• Q4-6 correlate strongly with each other and hardly at all with Q1-3
• Two factors!
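The visual reading of the matrix can be mimicked with a few lines of Python. This is only an illustrative sketch, not factor analysis itself: it links any two variables whose absolute correlation reaches a cutoff and reports the resulting groups. The cutoff of 0.7 and the function name are assumptions made for this example.

```python
names = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]

# Lower triangle of the correlation matrix shown on the slide.
lower = [
    [1.0],
    [0.987, 1.0],
    [0.801, 0.765, 1.0],
    [-0.003, -0.088, 0.0, 1.0],
    [-0.051, 0.044, 0.213, 0.968, 1.0],
    [-0.190, -0.111, 0.102, 0.789, 0.864, 1.0],
]


def correlation_clusters(names, lower, threshold=0.7):
    """Group variables connected by |r| >= threshold (simple union-find)."""
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for i, row in enumerate(lower):
        for j, r in enumerate(row[:i]):  # skip the diagonal entry
            if abs(r) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())


clusters = correlation_clusters(names, lower)
print(clusters)  # [['Q1', 'Q2', 'Q3'], ['Q4', 'Q5', 'Q6']]
```

Two groups emerge, which matches the two-factor reading of the matrix.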
Method of Factor Analysis
(A) Principal component analysis
•Provides a unique solution, so that the original data can be reconstructed from the results
•It looks at the total variance among the variables, that is, the unique as well as the common variance.
•In this method, the factor explaining the maximum variance is extracted first.
(B) Common factor analysis
•Uses an estimate of common variance among the original variables to generate the factor solution. Because of this, the number of factors will always be less than the number of original variables.
Other methods include:
Unweighted least squares, Generalized least squares, Maximum likelihood, Principal axis factoring, Alpha factoring, and Image factoring.
Variance of a variable
• Common Variance: variance that a variable shares with other variables in a matrix
• Specific Variance: variance unique to the variable itself
• Error Variance: variance due to measurement error or some random, unknown source
When searching for the factors underlying the relationships between a set of variables, we are interested in detecting and explaining the common variance
Total Variance = common variance + specific variance + error variance
Determination of Number of Factors
EIGENVALUE
•The eigenvalue for a given factor measures the variance in all the variables that is accounted for by that factor.
•It is the amount of variance explained by a factor. It is also called the characteristic root.
Kaiser-Guttman Criterion
This method states that the number of factors to extract equals the number of factors with an eigenvalue of 1 or greater.
The Scree Plot
The scree plot provides a visual display of the variance associated with each factor.
The steep slope shows the large factors.
The gradual trailing off (scree) shows the remaining factors, which usually have eigenvalues below 1.
[Figure: Scree Plot. x-axis: Component Number (1 to 6); y-axis ticks from -.5 to 3.5]
Take the components above the elbow
• Maximizes high item loadings and minimizes low item loadings, thereby producing a more interpretable and simplified solution.
• Two common rotation techniques are orthogonal rotation and oblique rotation.
Rotation of Factors
Rotation
• Orthogonal: Varimax, Quartimax, Equamax
• Oblique: Direct Oblimin, Promax
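To make orthogonal rotation concrete, here is a minimal sketch of varimax using the widely used SVD-based iteration. It is an illustration under stated assumptions, not the routine of any particular statistics package: it omits Kaiser normalization, the function name and parameters are hypothetical, and the input is the unrotated component matrix from the swimmers example later in this deck. Its output may therefore differ in detail (including sign and column order) from the rotated matrix reported there.

```python
import numpy as np

# Unrotated loadings (11 variables x 3 components) from the swimmers example.
unrotated = np.array([
    [.814, -.179, .020],   # Standing Broad Jump
    [-.682, .587, -.136],  # Shuttle Run
    [-.469, .108, .808],   # Fifty Meter Dash
    [.549, -.694, -.230],  # Twelve Meter run and walk
    [.731, -.484, .053],   # Anaerobic capacity
    [.700, .650, .050],    # Weight
    [.647, .663, -.159],   # Height
    [.762, .396, -.087],   # Leg Length
    [.863, .088, -.051],   # Calf Girth
    [.835, .138, .199],    # Thigh Girth
    [.560, -.082, .660],   # Shoulder Width
])


def varimax(loadings, max_iter=100, tol=1e-8):
    """Return (rotated loadings, rotation matrix) for an orthogonal varimax rotation."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Matrix derived from the gradient of the varimax criterion.
        target = rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt  # closest orthogonal matrix to the gradient step
        new_criterion = np.sum(s)
        if criterion != 0.0 and new_criterion / criterion < 1 + tol:
            break
        criterion = new_criterion
    return L @ rotation, rotation


rotated_loadings, rotation = varimax(unrotated)
print(np.round(rotated_loadings, 3))
```

Because the rotation matrix is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the way that variance is distributed across the factors changes.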
KEY TERMINOLOGIES TO KNOW
Factor Loading
• It can be defined as the correlation coefficient between the variable and the factor.
• The squared factor loading of a variable indicates the proportion of that variable's variability explained by the factor. A factor loading of 0.7 is considered to be sufficient, since 0.7² = 0.49, i.e. about half of the variable's variance.
COMMUNALITY
•The communality is the amount of variance each variable in the analysis shares with other variables.
•It is the squared multiple correlation for the variable as the dependent variable, using the factors as predictors, and is denoted by h².
•The value of communality may be considered an indicator of the reliability of a variable.
Variables                Component 1   Component 2   Component 3   Communality
Vividness Qu                -.198         -.805          .061         69%
Control Qu                   .173          .751          .306         69%
Preference Qu                .353          .577         -.549         76%
Generate Test               -.444          .251          .543         55%
Inspect Test                -.773          .051         -.051         60%
Maintain                     .734         -.003          .384         69%
Transform (P&P) Test         .759         -.155          .188         64%
Transform (Comp) Test       -.792          .179          .304         75%
Visual STM Test              .792         -.102          .215         69%
Eigenvalues                  3.36         1.677         1.018          /
% Variance                  37.3%         18.6%         11.3%          /
Communality of Variable 1 (Vividness Qu) = (-.198)² + (-.805)² + (.061)² = .69 or 69%
Eigenvalue of Comp 1 = (-.198)² + (.173)² + (.353)² + (-.444)² + (-.773)² + (.734)² + (.759)² + (-.792)² + (.792)² = 3.36
% Variance of Comp 1 = 3.36 / 9 = 37.3%
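The same arithmetic can be reproduced for every row and column of the loading table. The sketch below is illustrative: the dictionary simply transcribes the rounded loadings from the table above, and the helper names are hypothetical. Because the inputs are rounded, a recomputed value can differ from the printed table by a percentage point.

```python
# Rounded loadings from the table: variable -> (Component 1, Component 2, Component 3)
loadings = {
    "Vividness Qu": (-.198, -.805, .061),
    "Control Qu": (.173, .751, .306),
    "Preference Qu": (.353, .577, -.549),
    "Generate Test": (-.444, .251, .543),
    "Inspect Test": (-.773, .051, -.051),
    "Maintain": (.734, -.003, .384),
    "Transform (P&P) Test": (.759, -.155, .188),
    "Transform (Comp) Test": (-.792, .179, .304),
    "Visual STM Test": (.792, -.102, .215),
}


def communality(row):
    """Sum of squared loadings across components for one variable."""
    return sum(value ** 2 for value in row)


def eigenvalue(loadings, component):
    """Sum of squared loadings down one component (0-based index)."""
    return sum(row[component] ** 2 for row in loadings.values())


h2_vividness = communality(loadings["Vividness Qu"])
ev1 = eigenvalue(loadings, 0)
pct1 = 100 * ev1 / len(loadings)  # total variance equals the number of variables

print(f"Communality (Vividness Qu) = {h2_vividness:.2f}")
print(f"Eigenvalue of Comp 1 = {ev1:.2f}")
print(f"% Variance of Comp 1 = {pct1:.1f}%")
```

Communalities are row sums of squared loadings, while eigenvalues are column sums; dividing an eigenvalue by the number of variables gives its share of total variance.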
In a study on swimmers, eleven physical and physiological parameters were measured. Apply the factor analysis technique to study the factor structure and suggest a test battery that can be used for screening talent in swimming.
Field Example
1. Click on the arrow (as indicated on the slide).
2. Click on Descriptives, then click on Continue.
3. Click on Extraction, select Principal components, then click on Continue.
4. Click on Rotation, select VARIMAX Rotation, then click on Continue.
5. Click on OK.
Interpretation of various outputs
Descriptive Statistics
                              Mean        Std. Deviation   Analysis N
Standing Broad Jump           212.3810    15.45793         21
Shuttle Run                   10.2514     .51167           21
Fifty Meter Dash              7.6938      .80880           21
Twelve Meter run and walk     2488.9524   222.46696        21
Anaerobic capacity            39.9071     12.70207         21
Weight                        37.8095     7.67215          21
Height                        148.3810    10.18566         21
Leg Length                    76.3333     5.18009          21
Calf Girth                    28.5238     1.99045          21
Thigh Girth                   40.5238     3.51595          21
Shoulder Width                38.1429     4.43041          21
Correlation Matrix (columns are numbered in the same order as the rows)
                                    (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)   (10)   (11)
(1) Standing Broad Jump           1.000
(2) Shuttle Run                   -.651  1.000
(3) Fifty Meter Dash              -.359   .277  1.000
(4) Twelve Meter run and walk      .539  -.691  -.492  1.000
(5) Anaerobic capacity             .608  -.709  -.322   .686  1.000
(6) Weight                         .469  -.087  -.231  -.045   .255  1.000
(7) Height                         .416  -.048  -.358   .010   .142   .947  1.000
(8) Leg Length                     .513  -.321  -.354   .151   .292   .687   .675  1.000
(9) Calf Girth                     .606  -.495  -.400   .366   .602   .577   .522   .739  1.000
(10) Thigh Girth                   .584  -.515  -.186   .269   .589   .632   .543   .646   .773  1.000
(11) Shoulder Width                .455  -.483   .128   .279   .410   .405   .244   .322   .377   .451  1.000
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy:   .687
Bartlett's Test of Sphericity:
   Approx. Chi-Square   165.579
   df                   55
   Sig.                 .000
Since the value of KMO (.687) is more than 0.5, the sample taken in the study is adequate to run the factor analysis.
Since the significance value in Bartlett's test of sphericity is less than 0.05, the null hypothesis that all the correlations between the variables are 0 is rejected. The correlation matrix is therefore not an identity matrix, which is what factor analysis requires.
Total Variance Explained
            Initial Eigenvalues                    Extraction Sums of Squared Loadings    Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %   Total   % of Variance   Cumulative %   Total   % of Variance   Cumulative %
1           5.429   49.355          49.355         5.429   49.355          49.355         3.890   35.364          35.364
2           2.157   19.608          68.963         2.157   19.608          68.963         3.692   33.559          68.924
3           1.241   11.285          80.247         1.241   11.285          80.247         1.246   11.324          80.247
4            .595    5.407          85.654
5            .421    3.831          89.485
6            .367    3.336          92.821
7            .243    2.214          95.035
8            .216    1.967          97.001
9            .180    1.637          98.638
10           .137    1.241          99.880
11           .013     .120         100.000
Extraction Method: Principal Component Analysis.
We are looking for an eigenvalue above 1.0.
Cumulative percent of variance explained.
The first three factors will be extracted, as each has an eigenvalue greater than 1.
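The eigenvalue rule and the percentage columns of the table can be checked with a short sketch. It assumes the analysis is based on the correlation matrix, so that the total variance equals the number of variables (11); the function names are hypothetical, and the eigenvalues are the rounded values from the Total Variance Explained table.

```python
# Initial eigenvalues from the Total Variance Explained table (rounded).
eigenvalues = [5.429, 2.157, 1.241, 0.595, 0.421, 0.367,
               0.243, 0.216, 0.180, 0.137, 0.013]


def kaiser_count(eigenvalues, cutoff=1.0):
    """Number of factors whose eigenvalue is at least the cutoff."""
    return sum(1 for ev in eigenvalues if ev >= cutoff)


def percent_variance(eigenvalues):
    """Percent of total variance per factor, with total variance = number of variables."""
    total = len(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


n_factors = kaiser_count(eigenvalues)
pct = percent_variance(eigenvalues)
cumulative = sum(pct[:n_factors])

print(f"Factors retained: {n_factors}")
print(f"% of variance, factor 1: {pct[0]:.2f}")
print(f"Cumulative % for retained factors: {cumulative:.2f}")
```

The count of retained factors is 3, and the recomputed percentages agree with the table up to rounding of the eigenvalues.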
Factor loadings of all the variables on each of the three factors are shown here. Since this is an unrotated factor solution, some of the variables may load on more than one factor. To avoid this situation, the factors are rotated using the varimax rotation technique.
Unrotated Component Matrix
                              Component
                              1       2       3
Standing Broad Jump           .814   -.179    .020
Shuttle Run                  -.682    .587   -.136
Fifty Meter Dash             -.469    .108    .808
Twelve Meter run and walk     .549   -.694   -.230
Anaerobic capacity            .731   -.484    .053
Weight                        .700    .650    .050
Height                        .647    .663   -.159
Leg Length                    .762    .396   -.087
Calf Girth                    .863    .088   -.051
Thigh Girth                   .835    .138    .199
Shoulder Width                .560   -.082    .660
Extraction Method: Principal Component Analysis.
a. 3 components extracted.
Rotated Component Matrix
                              Component
                              1       2       3
Standing Broad Jump           .469    .689   -.003
Shuttle Run                  -.091   -.901   -.090
Fifty Meter Dash             -.292   -.356    .820
Twelve Meter run and walk    -.069    .868   -.279
Anaerobic capacity            .200    .855    .012
Weight                        .954    .010    .079
Height                        .930   -.047   -.128
Leg Length                    .828    .230   -.074
Calf Girth                    .690    .524   -.058
Thigh Girth                   .696    .483    .194
Shoulder Width                .332    .479    .646
Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 4 iterations.
After varimax rotation, the factors are expected to have largely non-overlapping variables. A factor loading above 0.7 indicates that the factor extracts sufficient variance from that variable. Thus, each variable with a loading of 0.7 or more on a particular factor is identified with that factor.
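The 0.7 rule can be applied mechanically to the rotated component matrix. The sketch below is one possible reading of the rule, under the assumption that the sign of a loading is ignored (the absolute value is compared with the cutoff) and that each variable is considered only on the factor where it loads most strongly. The function name is hypothetical.

```python
# Rotated loadings from the Rotated Component Matrix: variable -> (factor 1, 2, 3)
rotated = {
    "Standing Broad Jump": (.469, .689, -.003),
    "Shuttle Run": (-.091, -.901, -.090),
    "Fifty Meter Dash": (-.292, -.356, .820),
    "Twelve Meter run and walk": (-.069, .868, -.279),
    "Anaerobic capacity": (.200, .855, .012),
    "Weight": (.954, .010, .079),
    "Height": (.930, -.047, -.128),
    "Leg Length": (.828, .230, -.074),
    "Calf Girth": (.690, .524, -.058),
    "Thigh Girth": (.696, .483, .194),
    "Shoulder Width": (.332, .479, .646),
}


def assign_to_factors(loadings, cutoff=0.7):
    """Assign each variable to its strongest factor if |loading| >= cutoff."""
    assigned, unassigned = {}, []
    for name, row in loadings.items():
        best = max(range(len(row)), key=lambda k: abs(row[k]))
        if abs(row[best]) >= cutoff:
            assigned.setdefault(best + 1, []).append(name)  # 1-based factor number
        else:
            unassigned.append(name)
    return assigned, unassigned


assigned, unassigned = assign_to_factors(rotated)
for factor in sorted(assigned):
    print(f"Factor {factor}: {', '.join(assigned[factor])}")
print(f"Below cutoff: {', '.join(unassigned)}")
```

Under this strict cutoff, several variables (for example Calf Girth at .690 and Thigh Girth at .696) fall just below 0.7, so the final grouping still calls for the analyst's judgment.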
Name each factor as per your wish
PHYSICAL: Shuttle Run, Fifty Meter Dash, Twelve Meter run and walk
ANTHROPOMETRIC: Weight, Height, Leg Length
THANK YOU FOR YOUR KIND ATTENTION