slide 1 principal components analysis complete problems
TRANSCRIPT
Slide 1
Principal Components Analysis
Complete Problems
Slide 2
Complete Principal Components Analysis
We add three steps to the end of the principal components analysis testing basic relationships:
• Analysis of the impact of outliers
• Split-sample validation analysis
• Computation of Cronbach’s alpha to measure the feasibility of using components as summated scales
Slide 3
Outliers
Outliers can change the factor structure found for a principal components analysis, creating the dilemma of determining which factor structure should be reported.
SPSS calculates factor scores as standard scores.
SPSS suggests that one way to identify outliers is to compute the factor scores and flag the cases whose scores are larger than 3.0 in absolute value (that is, beyond ±3.0).
If we find outliers in our analysis, we redo the analysis, omitting the cases that were outliers.
If the analysis excluding outliers still satisfies the requirement for communalities and the factor structure is the same as the analysis with all cases, it implies that the outliers do not have an impact. If our factor solution changes, we will have to study the outlier cases to determine whether or not we should exclude them.
After testing for outliers, restore the full data set before any further calculations.
Slide 4
Split Sample Validation
To test the generalizability of findings from a principal component analysis, we could conduct a second research study to see if our findings are verified.
A less costly alternative is to split the sample randomly into two halves, do the principal component analysis on each half and compare the results.
If the communalities and the factor loadings are the same on the analysis on each half and the full data set, we have evidence that the findings are generalizable and valid because, in effect, the two analyses represent a study and a replication.
Slide 5
Misleading Results to Watch Out For
When we examine the communalities and factor loadings, we are matching up overall patterns, not exact results: the communalities should all be greater than 0.50 and the pattern of the factor loadings should be the same.
Sometimes the variables will switch their components (variables loading on the first component now load on the second and vice versa), but this does not invalidate our findings.
Sometimes, all of the signs of the factor loadings will reverse themselves (the plus's become minus's and the minus's become plus's), but this does not invalidate our findings because we interpret the size, not the sign of the loadings.
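The pattern matching described on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the SPSS procedure: the function name loading_pattern and the loadings in the example are hypothetical, and the 0.40 cutoff is the one used later for simple structure.

```python
import numpy as np

def loading_pattern(loadings, threshold=0.40):
    """Group the variables by the component they load on.

    Absolute values are used, so sign reversals are ignored, and the
    groups are kept in a set, so component order is ignored as well.
    """
    strong = np.abs(np.asarray(loadings, dtype=float)) >= threshold
    return {
        frozenset(np.flatnonzero(strong[:, k]).tolist())
        for k in range(strong.shape[1])
    }

# Hypothetical loadings: rows are variables, columns are components.
full = np.array([[0.90, 0.10], [0.91, 0.12], [0.15, 0.82], [0.11, 0.83]])
# The same structure with the components switched and every sign reversed.
half = -full[:, ::-1]

same_pattern = loading_pattern(full) == loading_pattern(half)
```

Under these assumptions the two solutions count as the same pattern even though the components switched places and the signs flipped.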
Slide 6
When validation fails
If the validation fails, we are warned that the solution found in the analysis of the full data set is not generalizable and should not be reported as valid findings.
We do have some options when validation fails:
• If the problem is limited to one or two variables, we can remove those variables and redo the analysis.
• Randomly selected samples are not always representative. We might try some different random number seeds and see if our negative finding was a fluke. If we choose this option, we should do a large number of validations to establish a clear pattern, at least 5 to 10. Getting one or two validations to negate the failed validation and support our findings is not sufficient.
Slide 7
Reliability of Summated Scales
One of the common uses of factor analysis is the formation of summated scales, where we sum or average the scores on all the variables loading on a component to create the score for the component.
To verify that the variables for a component are measuring similar entities that are legitimate to add together, we compute Cronbach's alpha.
If Cronbach's alpha is 0.70 or greater (0.60 or greater for exploratory research), we have support for the internal consistency of the items, justifying their use in a summated scale.
Cronbach’s alpha requires that all variables be coded in the same direction. If a variable has a negative loading on a component, that variable must be reverse coded to get the correct value for alpha.
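As a rough illustration of the calculation, here is a minimal Python sketch of Cronbach's alpha using the usual variance-based formula. The function names cronbach_alpha and reverse_code and the 1 to 5 ratings are hypothetical, and the sketch assumes there are no missing values.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a cases-by-items array with no missing values."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

def reverse_code(values, low, high):
    """Flip a scale so that low becomes high and high becomes low."""
    return low + high - np.asarray(values, dtype=float)

# Hypothetical ratings on a 1-5 scale for seven cases.
a = np.array([1, 2, 3, 4, 5, 4, 2])
b = np.array([2, 2, 3, 5, 5, 4, 1])
c_negative = np.array([5, 4, 3, 1, 1, 2, 4])  # worded in the opposite direction

alpha_wrong = cronbach_alpha(np.column_stack([a, b, c_negative]))
alpha_right = cronbach_alpha(
    np.column_stack([a, b, reverse_code(c_negative, 1, 5)])
)
```

With these made-up ratings, alpha is far too low until the third item is reverse coded, which is why the direction of coding has to be checked first.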
Slide 8
The Problem in BlackBoard
The problem statement tells us:
• the data set and variables included in the analysis
• the alpha for the statistical tests
• the seed number to use for the validation analysis
Slide 9
Statement about Level of Measurement
The first statement in the problem asks about level of measurement. Principal components analysis requires that all of the variables included in the analysis are metric.
Slide 10
Marking the Statement about Level of Measurement
All of the variables included in the analysis are ordinal level. We will employ the common convention of treating ordinal variables as metric variables, but we should consider mentioning this as a limitation to the analysis.
Since we treated all variables as metric, we mark the check box.
Slide 11
Statement about Sample Size
We will use the minimum sample size requirement of 150 valid cases recommended by Tabachnick and Fidell (1996).
Slide 12
Run the Principal Components Analysis - 1
Select the Factor command from the Analyze > Data Reduction menu.
To answer the question about the sample size, we run the first principal components analysis.
Slide 13
Run the Principal Components Analysis - 2
First, move the variables listed in the problem to the Variables list box.
Next, click on the Descriptives button to request the statistics needed to evaluate the suitability of the data for factor analysis.
Slide 14
Run the Principal Components Analysis - 3
First, mark the check box for Univariate Statistics to get the number of valid cases for the analysis.
Second, mark the check boxes for the statistics for the suitability of factor analysis:
• Coefficients of the correlation matrix,
• KMO and Bartlett’s test of sphericity, and
• Anti-image correlation matrix.
Third, click on the Continue button to close the Factor Analysis: Descriptives dialog box.
Slide 15
Run the Principal Components Analysis - 4
Click on the Extraction button to tell SPSS what method it should use to extract the factors.
Slide 16
Run the Principal Components Analysis - 5
We will use the default method of Principal Components. The drop down list contains numerous other methods.
We accept the other defaults for displaying the unrotated factor solution and extracting eigenvalues over 1.
Click on the Continue button to close the dialog box.
Slide 17
Run the Principal Components Analysis - 6
Click on the Rotation button to tell SPSS what method it should use to rotate the factors to clarify the interpretation.
Slide 18
Run the Principal Components Analysis - 7
We mark the option button for the Varimax rotation which will make the factors independent of each other.
Click on the Continue button to close the dialog box.
Slide 19
Run the Principal Components Analysis - 8
Having specified the analysis, click on the OK button to produce the output.
Slide 20
Output for Sample Size Requirement
The 509 cases available for this principal components analysis satisfy the minimum sample size requirement of 150 valid cases recommended by Tabachnick and Fidell (1996).
Slide 21
Marking the Statement about Sample Size
Since we satisfied the minimum sample size requirement, we mark the statement.
If we did not satisfy the sample size requirement, we should consider mentioning this fact as a limitation to the analysis. Factor analysis can be numerically unstable when the sample size is small.
Slide 22
The Statement about Suitability for Factor Analysis: Sufficient Correlations
Principal components analysis requires that more than one of the correlations between the variables included in the analysis be greater than 0.30.
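Counting the qualifying correlations by eye is easy to get wrong, so a small Python sketch of the count may help. The function name count_strong_correlations and the data are hypothetical; the sketch follows the wording of the requirement literally and counts only correlations above 0.30, so negative correlations of the same size would need an absolute value.

```python
import numpy as np

def count_strong_correlations(data, cutoff=0.30):
    """Count the distinct pairs of variables whose correlation exceeds cutoff."""
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    # Each pair appears twice in the matrix, so look above the diagonal only.
    upper = r[np.triu_indices_from(r, k=1)]
    return int((upper > cutoff).sum())

# Hypothetical data: x and y move together, z alternates independently.
x = np.arange(1, 9)
y = 2 * x
z = np.array([1, -1, 1, -1, 1, -1, 1, -1])
n_strong = count_strong_correlations(np.column_stack([x, y, z]))
```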
Slide 23
Sufficient Correlations in Correlation Matrix
For this set of variables, there are 9 correlations in the matrix greater than 0.30.
Slide 24
Marking the Statement about Sufficient Correlations
Since there are 9 correlations greater than 0.30, we mark the statement.
Slide 25
The Statement about Suitability for Factor Analysis: Test of Sphericity
Bartlett’s test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix with 1’s, or perfect correlations, on the main diagonal, and 0’s for all of the remaining elements.
If this is true, the variables are not correlated and the factor analysis will not work.
Our goal in this test is to reject the null hypothesis, supporting the contention that there are sufficient correlations, or similarity of values, among the variables that several can be combined into a factor or component.
Slide 26
Bartlett’s Test of Sphericity
Principal component analysis requires that the probability associated with Bartlett's Test of Sphericity (χ²(df=15, N = 509) = 854.15, p < .001) be less than or equal to the level of significance (0.05). The probability associated with the Bartlett Test satisfies this requirement.
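For readers who want to see where the chi-square and degrees of freedom come from, here is a minimal Python sketch using the standard chi-square approximation for Bartlett's test. The function name bartlett_sphericity is hypothetical, and the sketch stops at the statistic and its degrees of freedom; the probability would come from a chi-square distribution.

```python
import numpy as np

def bartlett_sphericity(r, n):
    """Chi-square statistic and degrees of freedom for Bartlett's test.

    r is the correlation matrix of the variables and n is the number
    of valid cases.
    """
    r = np.asarray(r, dtype=float)
    p = r.shape[0]
    _, log_det = np.linalg.slogdet(r)
    chi_square = -(n - 1 - (2 * p + 5) / 6) * log_det
    df = p * (p - 1) // 2
    return chi_square, df

# An identity matrix (no correlations at all) gives a statistic of zero.
chi_identity, df_six = bartlett_sphericity(np.eye(6), n=509)
# A hypothetical pair of correlated variables gives a positive statistic.
chi_pair, df_pair = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], n=509)
```

With six variables the degrees of freedom are 6 × 5 / 2 = 15, which matches the df reported in the SPSS output above.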
Slide 27
Marking the Statement about Bartlett’s Test of Sphericity
Since the probability associated with the Bartlett Test is sufficient to reject the null hypothesis, we mark the check box.
Slide 28
The Statement about Suitability for Factor Analysis: Sampling Adequacy
Sampling adequacy predicts if data are likely to factor well, based on correlation and partial correlation.
The Kaiser-Meyer-Olkin Measure of Sampling Adequacy (MSA) must be greater than 0.50 for each individual variable as well as the set of variables. Variables that do not have an MSA of .50 or greater are removed from the analysis one at a time, until all variables and the overall measure are above .50.
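The comparison of correlations and partial correlations behind the MSA can be sketched in Python as follows. This is a minimal sketch of the usual KMO formula, not SPSS's own code; the function name kmo and the example correlation matrix are hypothetical.

```python
import numpy as np

def kmo(r):
    """Return (MSA for each variable, overall MSA) for correlation matrix r."""
    r = np.asarray(r, dtype=float)
    inverse = np.linalg.inv(r)
    d = np.diag(inverse)
    # Partial correlation of each pair, controlling for the other variables.
    partial = -inverse / np.sqrt(np.outer(d, d))
    off_diagonal = ~np.eye(r.shape[0], dtype=bool)
    r2 = np.where(off_diagonal, r ** 2, 0.0)
    p2 = np.where(off_diagonal, partial ** 2, 0.0)
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return per_variable, overall

# Hypothetical matrix: three variables that all correlate 0.5 with each other.
r_example = np.array([[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]])
msa_each, msa_overall = kmo(r_example)
```

The measure grows toward 1 when the partial correlations are small relative to the ordinary correlations, which is the situation in which the variables are likely to factor well.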
Slide 29
Measures of Sampling Adequacy for Individual Variables
In the initial iteration for suitability of principal components analysis, the MSA for all of the individual variables was greater than 0.50 ("information and knowledge are shared openly within this organization" [q76] - .70; "an effort is made to get the opinions of people throughout the organization" [q77] - .69; "our web site is easy to use and contains helpful information" [q83] - .76; "I have a good understanding of our mission, vision, and strategic plan" [q84] - .73; "I believe we communicate our mission effectively to the public" [q85] - .81; and "my organization encourages me to be involved in my community" [q86] - .84).
Note: Not all MSA’s are shown on this slide.
Slide 30
Kaiser-Meyer-Olkin Measure of Sampling Adequacy
In addition, the overall MSA for the set of variables included in the analysis was 0.75, which exceeds the minimum requirement of 0.50 for overall MSA.
Slide 31
Marking the Statement about Measures of Sampling Adequacy
Since the sampling adequacy measures met the criteria for both individual variables and overall, the check box is marked.
Slide 32
Statement about Initial Number of Factors
Various tests are used to estimate the number of factors to be extracted. This was very important when factor analysis was calculated by hand.
Two of the criteria were the latent root criterion, which was based on the number of eigenvalues greater than 1.0, and the cumulative proportion of variance criterion, which calculated the number of components needed to explain 60% or more of the total variance in the original set of variables.
The problem offers two possible responses.
Slide 33
Initial Number of Factors: Eigenvalues Greater than One
The latent root criterion for number of factors to extract would indicate that there were 2 components to be extracted for these variables, since there were 2 eigenvalues greater than 1.0 (2.84, and 1.05).
Slide 34
Initial Number of Factors: Percentage of Variance Explained
In addition, the cumulative proportion of variance criterion is met with 2 components, which explain 60% or more of the total variance in the original set of variables. A 2 component solution would explain an estimated 64.86% of the total variance.
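Both criteria can be read off the eigenvalues of the correlation matrix. The Python sketch below shows the idea under stated assumptions: the function name number_of_components and the correlation matrix are hypothetical, and the 1.0 and 60% cutoffs are the ones used in this problem.

```python
import numpy as np

def number_of_components(r, min_eigenvalue=1.0, min_variance=0.60):
    """Apply the latent root and cumulative variance criteria to matrix r."""
    eigenvalues = np.sort(np.linalg.eigvalsh(np.asarray(r, dtype=float)))[::-1]
    latent_root = int((eigenvalues > min_eigenvalue).sum())
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    # First position at which the cumulative proportion reaches the target.
    variance_rule = int(np.argmax(cumulative >= min_variance) + 1)
    return latent_root, variance_rule, cumulative

# Hypothetical matrix: two pairs of variables, correlated 0.6 within each pair.
r_example = np.array([
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])
by_latent_root, by_variance, cumulative = number_of_components(r_example)
```

In this made-up example both criteria point to two components, which is the situation the problem statement assumes.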
Slide 35
Marking the Statement about Initial Number of Factors
Since the SPSS default is to extract the number of components indicated by the latent root criterion, our initial factor solution will be based on the extraction of 2 components.
We mark the second statement in the pair.
Note: the question is worded to indicate that both criteria suggest the same number of factors. Should they suggest a different number of factors, neither statement would be marked, but we would still continue with the factor analysis using the number of factors suggested by the latent root criterion.
Slide 36
Statement about First Iteration of Factor Extraction
The problem suggests that the first iteration of the factor solution included a variable ("my organization encourages me to be involved in my community" [q86]) that should be excluded, either because it did not satisfy the requirement for communalities or because it violated simple structure.
Slide 37
Output for Communalities on First Iteration
Examination of the first principal components model extracted by SPSS resulted in the removal of the variable "my organization encourages me to be involved in my community" [q86] from the analysis. "My organization encourages me to be involved in my community" [q86] was removed because its communality (.467) meant that the factor solution explained less than half of the variable's variance. The communality for this variable was less than the minimum requirement that the factor solution should explain at least 50% of the variance in the original variable.
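A communality is the sum of a variable's squared loadings across the extracted components, so the check can be sketched in Python as below. The function name communalities and the loadings are hypothetical; they are not the loadings from this problem.

```python
import numpy as np

def communalities(loadings):
    """Sum of squared loadings for each variable (one row per variable)."""
    return (np.asarray(loadings, dtype=float) ** 2).sum(axis=1)

# Hypothetical loadings on two components for two variables.
example_loadings = [[0.6, 0.3], [0.8, 0.2]]
h2 = communalities(example_loadings)
# Positions of the variables that fall below the 0.50 requirement.
below_minimum = [i for i, value in enumerate(h2) if value < 0.50]
```

Here the first variable has a communality of 0.36 + 0.09 = 0.45, so it would be the candidate for removal.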
Slide 38
Marking the Statement about First Iteration of Factor Extraction
My organization encourages me to be involved in my community [q86] was removed because it did not satisfy the requirement for communalities, i.e. the factors should explain at least 50% of the variance in the variable. Since we have already determined that the variable is to be removed, it was not necessary to check the factor loadings for simple structure. The first statement in the pair is marked.
Slide 39
Removing a Variable from the Factor Analysis - 1
To remove the variable, my organization encourages me to be involved in my community [q86], we select Factor Analysis from the Dialog Recall drop down menu.
Slide 40
Removing a Variable from the Factor Analysis - 2
To remove the variable, highlight the target variable in the Variables list box, and click on the arrow button pointing to the left.
Slide 41
Removing a Variable from the Factor Analysis - 3
Since all of the other specifications for the analysis remain the same, click on the OK button to produce the output for the second iteration.
Slide 42
Statement about Second Iteration of Factor Extraction
The problem suggests that the second iteration of the factor solution included a variable ("I believe we communicate our mission effectively to the public" [q85]) that should be excluded, either because it did not satisfy the requirement for communalities or because it violated simple structure.
Slide 43
Output for Communalities on Second Iteration
Examination of the second principal components model extracted by SPSS produced a table of Communalities in which all variables have the required minimum of .50.
Slide 44
Output for Factor Structure on Second Iteration
Examination of the second principal components model extracted by SPSS resulted in the removal of the variable "I believe we communicate our mission effectively to the public" [q85] from the analysis. The variable "I believe we communicate our mission effectively to the public" [q85] had loadings of 0.40 or higher on component 1 (.526) and component 2 (.536).
Multiple high loadings violate the requirement for simple structure, so this variable was removed from the analysis.
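The simple structure check amounts to counting how many components each variable loads on at 0.40 or higher. A minimal Python sketch, assuming a hypothetical function name cross_loading_variables, uses the two loadings reported above for q85 together with one made-up row for contrast.

```python
import numpy as np

def cross_loading_variables(loadings, names, cutoff=0.40):
    """Names of variables loading at or above cutoff on more than one component."""
    strong = np.abs(np.asarray(loadings, dtype=float)) >= cutoff
    return [name for name, row in zip(names, strong) if row.sum() > 1]

# q85 uses the loadings from the output above; the second row is hypothetical.
example_loadings = [[0.526, 0.536], [0.900, 0.100]]
flagged = cross_loading_variables(example_loadings, ["q85", "hypothetical_item"])
```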
Slide 45
Marking the Statement about Second Iteration of Factor Extraction
I believe we communicate our mission effectively to the public [q85] was removed because it did not satisfy the requirement for simple structure, so the first statement in the pair is marked.
Slide 46
Removing a Variable from the Factor Analysis - 1
To remove the variable, I believe we communicate our mission effectively to the public [q85], we select Factor Analysis from the Dialog Recall drop down menu.
Slide 47
Removing a Variable from the Factor Analysis - 2
To remove the variable, highlight the target variable in the Variables list box, and click on the arrow button pointing to the left.
Slide 48
Removing a Variable from the Factor Analysis - 3
Since all of the other specifications for the analysis remain the same, click on the OK button to produce the output for the third iteration.
Slide 49
Statement about Third Iteration of Factor Extraction
The problem does not indicate that any variables were removed on the third iteration of the factor extraction, which implies that the solution met all of the requirements for a factor analysis solution:
• all the variables remaining in the analysis had communalities above 0.50,
• all the variables demonstrated simple structure, and
• each component had more than one variable loading on it.
Slide 50
Output for Communalities on Third Iteration - 1
Examination of the third principal components model extracted by SPSS produced a table of Communalities in which all four variables have the required minimum of .50.
Slide 51
Output for Factor Structure on Third Iteration - 2
Examination of the third principal components model extracted by SPSS did not show any variables having a loading of .40 or higher on both of the components.
Slide 52
Output for Factor Structure on Third Iteration - 3
Each of the components has two variables loading on it.
If a component had only one variable loading on it, it would make more sense to use the original variable in subsequent analyses rather than the component.
Slide 53
Marking the Statement about Third Iteration of Factor Extraction
On the third iteration, all of the requirements for a factor solution were satisfied.
For the 4 variables not excluded from the analysis, two components can be substituted for the 4 variables.
Since the final solution found two components, we mark the statement.
Slide 54
Statement about Variables Loading on the First Component
Two options are given which suggest different combinations of variables loading on the first component.
Slide 55
Output for Component One
Component 1 included the variables: "information and knowledge are shared openly within this organization" [q76] (loading = .901); and "an effort is made to get the opinions of people throughout the organization" [q77] (loading = .912). We can substitute one component variable for this combination of variables in further analyses.
Since more than one component was extracted, the factor structure is based on the "Rotated Component Matrix"
Slide 56
Marking the Statement about Variables Loading on the First Component
Component 1 included the variables: "information and knowledge are shared openly within this organization" [q76] and "an effort is made to get the opinions of people throughout the organization" [q77]. We mark the first statement in the pair.
Slide 57
Statement about Variables Loading on the Second Component
Two options are given which suggest different combinations of variables loading on the second component.
Slide 58
Output for Component Two
Component 2 included the variables: "our web site is easy to use and contains helpful information" [q83] (loading = .821); and "I have a good understanding of our mission, vision, and strategic plan" [q84] (loading = .833). We can substitute one component variable for this combination of variables in further analyses.
Since more than one component was extracted, the factor structure is based on the "Rotated Component Matrix"
Slide 59
Marking the Statement about Variables Loading on the Second Component
Component 2 included the variables: "our web site is easy to use and contains helpful information" [q83] and "I have a good understanding of our mission, vision, and strategic plan" [q84]. We mark the first statement in the pair.
Slide 60
Statement about Percentage of Variance Explained by Factors
The final statement questions whether or not the factor solution met the standard of explaining 60% of the variance in the variables that were replaced.
Slide 61
Output for Percentage of Variance Explained by Factors
The components explain 77.25% of the total variance in the variables which are included on the components. This percentage of variance explained satisfies the goal of explaining 60% or more of the total original variance in the variables.
Slide 62
Marking the Statement about Percentage of Variance Explained by Factors
Since the percentage of variance explained by the factors satisfies the goal of explaining 60% or more of the total original variance in the variables the components will replace, we mark the final statement.
Slide 63
Statement about Outliers
The next statement requires us to determine whether or not there are any outliers in the results of the principal components analysis. If outliers are found, they are removed from the analysis and the results computed again. If the factor solution is the same as that based on all cases, we conclude that outliers do not have any impact and we report the results based on all cases.
If the solution without outliers is different, we face the difficult decision of which factor structure should be reported. In our problems, we will halt the analysis.
Slide 64
Detecting Outliers - 1
To detect outliers, we compute the factor scores in SPSS.
Select the Factor Analysis command from the Dialog Recall tool button
Slide 65
Detecting Outliers - 2
Click on the Scores… button to access the factor scores dialog box.
The only command we need to change is to request SPSS to compute the factor scores.
Slide 66
Detecting Outliers - 3
First, click on the Save as variables checkbox to create factor variables.
Second, accept the default method using a Regression equation to calculate the scores.
Third, click on the Continue button to complete the specifications.
Slide 67
Detecting Outliers - 4
Click on the OK button to compute the factor scores.
Slide 68
Outliers in the Data Editor
SPSS creates the factor score variables in the data editor window. It names the first factor score “FAC1_1,” and the second factor score “FAC2_1.”
We need to check to see if we have any values for either factor score that are larger than 3.0 in absolute value (beyond ±3.0). One way to check for the presence of large values indicating outliers is to sort the factor variables and see if any fall outside the acceptable range.
Should you forget to delete the factor scores from the previous analysis, SPSS will alter the final digit in the factor name, i.e. instead of naming it FAC1_1, it will name it FAC1_2.
Slide 69
Sort the data to locate outliers for factor one
First, select the FAC1_1 column by clicking on its header.
Second, right click on the column header and select the Sort Ascending command from the drop down menu.
Slide 70
Negative outliers for factor one
Scroll down past the cases for whom factor scores could not be computed because of missing data.
We see that none of the scores for factor one are less than or equal to -3.0, so there are no outliers detected yet.
Slide 71
Positive outliers for factor one
Scrolling down to the bottom of the sorted data set, we see that none of the scores for factor one are greater than or equal to +3.0.
There are no outliers on factor one.
Slide 72
Sort the data to locate outliers on factor two
First, select the FAC2_1 column by clicking on its header.
Second, right click on the column header and select the Sort Ascending command from the drop down menu.
Slide 73
Negative outliers for factor two
Scrolling down past the cases for whom factor scores could not be computed, we see that there are five cases that have a factor score less than or equal to -3.0 on factor 2.
Slide 74
Positive outliers for factor two
Scrolling down to the bottom of the sorted data set, we see that none of the scores for factor two are greater than or equal to +3.0.
We will run the analysis excluding the five negative outliers, and see if it changes our interpretation of the analysis.
Slide 75
Removing the outliers
To see whether or not outliers are having an impact on the factor solution, we will compute the factor analysis without the outliers and compare the results.
To remove the outliers, we will include the cases that are not outliers.
Choose the Select Cases… command from the Data menu.
Slide 76
Setting the If condition
Click on the If… button to enter the formula for selecting cases to include in the analysis.
First, mark the option button for the If condition is satisfied.
Slide 77
Formula to select cases that are not outliers
First, type the formula as shown. The formula says: include cases if the absolute values of both the first and second factor scores are less than 3.0.
Second, click on the Continue button to complete the specification.
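The selection rule described on this slide can be sketched in Python as follows. This is an illustration of the logic only, not the SPSS formula itself; the function name not_outlier and the scores are hypothetical.

```python
import numpy as np

def not_outlier(fac1, fac2, cutoff=3.0):
    """True for cases whose factor scores are both inside the cutoff.

    Missing scores (NaN) fail every comparison, so those cases are
    also left out of the selection.
    """
    fac1 = np.asarray(fac1, dtype=float)
    fac2 = np.asarray(fac2, dtype=float)
    return (np.abs(fac1) < cutoff) & (np.abs(fac2) < cutoff)

# Hypothetical factor scores for four cases; the third has missing data.
fac1_scores = [0.5, -3.2, np.nan, 1.0]
fac2_scores = [0.1, 0.0, 0.2, 3.0]
keep = not_outlier(fac1_scores, fac2_scores)
```

Only the first case is kept: the second is an outlier on factor one, the third has a missing score, and the fourth sits exactly at 3.0 on factor two.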
Slide 78
Complete the select cases command
Having entered the formula for including cases, click on the OK button to complete the selection.
SPSS writes the formula we entered next to the IF button.
Slide 79
The outliers selected out of the analysis
The cases with missing data are also excluded because they do not satisfy the criteria in the formula.
When SPSS selects a case out of the data analysis, it draws a slash through the case number. The cases that we identified as outliers will be excluded.
Slide 80
Repeating the factor analysis
To repeat the factor analysis without the outliers, select the Factor Analysis command from the Dialog Recall tool button
Slide 81
Stopping SPSS from computing factor scores again
On the last factor analysis, we included the specification to compute factor scores. Since we do not need to do this again, we will remove the specification.
Click on the Scores… button to access the factor scores dialog.
Slide 82
Clearing the command to save factor scores
First, clear the Save as variables checkbox. This will deactivate the Method options.
Second, click on the Continue button to complete the specification
Slide 83
Computing the factor analysis
To produce the output for the factor analysis excluding outliers, click on the OK button.
Slide 84
Comparing communalities
All of the communalities for the factor analysis excluding outliers satisfy the minimum requirement of being larger than 0.50.
All of the communalities for the factor analysis including all cases satisfy the minimum requirement of being larger than 0.50.
Though the communalities for each variable are slightly smaller when we excluded outliers, we would not alter our interpretation of the role of these four variables in the solution.
Slide 85
Comparing factor loadings
The pattern of variable loadings on components did not change when the outliers were removed. Component 1 included the variables: "information and knowledge are shared openly within this organization" [q76] and "an effort is made to get the opinions of people throughout the organization" [q77]. Component 2 included the variables: "our web site is easy to use and contains helpful information" [q83] and "I have a good understanding of our mission, vision, and strategic plan" [q84].
The factor loadings for the factor analysis including all cases are shown on the left.
The factor loadings for the factor analysis excluding outliers are shown on the right.
Slide 86
Marking the Statement about Outliers
The presence of outliers did not alter the factor solution. The factor solution based on all cases should be used in further analyses.
We mark the check box for no impact due to outliers.
Had the factor solution changed, we would have halted the analysis until we could understand the problem further.
Slide 87
Statement about Generalizability
Since factor analysis tends to over-fit the data used to develop the model at the expense of generalizability, we will test generalizability with a split sample validation strategy. In this strategy, we divide the sample in half, conduct the factor analysis on each half, and compare the results to the analysis on the full data set.
Slide 88
Deleting the Factor Scores
Before we do the split-sample validation, we will delete the factor scores that we used to detect outliers.
First, highlight the columns containing the factor scores.
Second, select the Clear command from the Edit menu.
Slide 89
Restoring All Cases to the Analysis - 1
We removed cases that were detected as outliers. Before doing our validation, we need to restore these cases to subsequent analyses.
Select the Select Cases command from the Data menu.
Slide 90
Restoring All Cases to the Analysis - 2
First, click on the All cases option button.
Click on the OK button to restore the cases.
Slide 91
All Cases Restored to the Data Set
The slash lines are removed from the case numbers, indicating that all cases are available to the analysis.
Slide 92
Split-sample validation
We validate our analysis by conducting an analysis on each half of the sample. We compare the results of these two split sample analyses with the analysis of the full data set.
To split the sample into two halves, we generate a random variable that indicates which half of the sample each case should be placed in.
To compute a random selection of cases, we need to specify the starting value, or random number seed. Otherwise, the random sequence of numbers that you generate will not match mine, and we will get different results.
Before we do the random selection, you must make certain that your data set is sorted in the original sort order, or the cases in your two half samples will not match mine. To make certain your data set is in the same order as mine, sort your data set in ascending order by case id.
Slide 93
Sorting the data set in ascending order
To make certain the data set is sorted in the original order, highlight the case id column, right click on the column header, and select the Sort Ascending command from the popup menu.
Slide 94
Setting the random number seed - 1
To set the random number seed, select the Random Number Generators… command from the Transform menu.
NOTE: you must use the random number seed that is stated in the problem in order to produce the same results that I found. Any other seed will generate a different random sequence that can produce results that are very different from mine.
Slide 95
Setting the random number seed - 2
First, mark the check box for Set Starting Point.
Second, select the option button for a Fixed Value.
Third, type the seed number provided in the problem directions: 291769.
Fourth, click on the OK button to complete the action.
NOTE: SPSS does not provide any feedback that the seed has been set or changed. If you are in doubt, you can reopen the dialog box and see what it indicates.
Slide 96
Computing the split variable - 1
To enter the formula for the variable that will split the sample in two parts, click on the Compute… command.
Slide 97
Computing the split variable - 2
First, type the name for the new variable, split, into the Target Variable text box.
Second, the formula for the value of split is shown in the text box.
The uniform(1) function generates a random decimal number between 0 and 1. The random number is compared to the value 0.50.
If the random number is less than or equal to 0.50, the value of the formula will be 1, the SPSS numeric equivalent to true. If the random number is larger than 0.50, the formula will return a 0, the SPSS numeric equivalent to false.
Third, click on the OK button to complete the dialog box.
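The logic of the split formula can be sketched in Python as below. This is only an analogy to the SPSS computation: the function name make_split is hypothetical, and NumPy's random number generator is not the SPSS generator, so the same seed will not reproduce the halves that SPSS creates.

```python
import numpy as np

def make_split(n_cases, seed):
    """Assign each case to half 1 or half 0 from a seeded uniform draw."""
    rng = np.random.default_rng(seed)
    draws = rng.uniform(0.0, 1.0, size=n_cases)
    # True becomes 1 and False becomes 0, as in the SPSS formula.
    return (draws <= 0.50).astype(int)

# 509 cases and the seed from the problem directions.
split = make_split(509, seed=291769)
split_again = make_split(509, seed=291769)
```

Setting the seed is what makes the assignment repeatable: running the function twice with the same seed gives the same zeros and ones.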
Slide 98
The split variable in the data editor
In the data editor, the split variable shows a random pattern of zero’s and one’s.
To select half of the sample for each validation analysis, we will first select the cases where split = 0, then select the cases where split = 1.
Slide 99
Repeating the analysis with the first validation sample
To repeat the principal component analysis for the first validation sample, select Factor Analysis from the Dialog Recall tool button.
Slide 100
Using "split" as the selection variable
First, scroll down the list of variables and highlight the variable split.
Second, click on the right arrow button to move the split variable to the Selection Variable text box.
Slide 101
Setting the value of split to select cases
When the variable named split is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt us to enter a specific value for split.
Click on the Value… button to enter a value for split.
Slide 102
Completing the value selection
First, type the value for the first half of the sample, 0, into the Value for Selection Variable text box.
Second, click on the Continue button to complete the value entry.
Slide 103
Requesting output for the first validation sample
When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 0 for the split variable.
Click on the OK button to request the output.
Since the validation analysis requires us to compare the results of the analysis using the two split samples, we will request the output for the second sample before doing any comparison.
Slide 104
Repeating the analysis with the second validation sample
To repeat the principal component analysis for the second validation sample, select Factor Analysis from the Dialog Recall tool button.
Slide 105
Setting the value of split to select cases
Since the split variable is already in the Selection Variable text box, we only need to change its value.
Click on the Value… button to enter a different value for split.
Slide 106
Completing the value selection
First, type the value for the second half of the sample, 1, into the Value for Selection Variable text box.
Second, click on the Continue button to complete the value entry.
Slide 107
Requesting output for the second validation sample
When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split variable.
Click on the OK button to request the output.
Slide 108
Comparing communalities
All of the communalities for the first split sample satisfy the minimum requirement of being larger than 0.50.
Note how SPSS identifies for us which cases we selected for the analysis.
All of the communalities for the second split sample satisfy the minimum requirement of being larger than 0.50.
Slide 109
Comparing factor loadings
The pattern of factor loadings for both split samples shows the variables "information and knowledge are shared openly within this organization" [q76] and "an effort is made to get the opinions of people throughout the organization" [q77] loading on the first component, and "our web site is easy to use and contains helpful information" [q83] and "I have a good understanding of our mission, vision, and strategic plan" [q84] loading on the second component.
Slide 110
Marking the Statement about Generalizability
All of the communalities in both validation samples met the criterion of being larger than 0.50.
The pattern of loadings for both validation samples is the same, and the same as the pattern for the analysis using the full sample.
In effect, we have done the same analysis on two separate sub-samples of cases and obtained the same results.
This validation analysis supports a finding that the results of this principal component analysis are generalizable to the population represented by this data set. We mark the check box.
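The comparison behind this statement can be written out as a small check. This is a sketch under assumptions: the loadings below are hypothetical numbers invented for the example (the slides show the real ones only as SPSS output), the function names are made up, and each variable is assigned to the component where its absolute loading is largest.

```python
def communalities(loadings):
    """Communality of each variable: the sum of its squared loadings."""
    return {var: sum(v * v for v in vals) for var, vals in loadings.items()}


def loading_groups(loadings):
    """Group variables by the component on which each loads most strongly.

    Returns a set of frozensets, so the comparison ignores the order in
    which the components happen to be numbered.
    """
    groups = {}
    for var, vals in loadings.items():
        strongest = max(range(len(vals)), key=lambda i: abs(vals[i]))
        groups.setdefault(strongest, set()).add(var)
    return {frozenset(g) for g in groups.values()}


def validation_supported(full, halves, min_communality=0.50):
    """True if every half has communalities above the minimum and the
    same grouping of variables as the full data set."""
    target = loading_groups(full)
    for half in halves:
        if any(c <= min_communality for c in communalities(half).values()):
            return False
        if loading_groups(half) != target:
            return False
    return True


# Hypothetical loadings (component 1, component 2) for illustration only.
full = {"q76": (0.90, 0.15), "q77": (0.88, 0.20),
        "q83": (0.10, 0.85), "q84": (0.25, 0.78)}
half_0 = {"q76": (0.91, 0.10), "q77": (0.87, 0.22),
          "q83": (0.05, 0.88), "q84": (0.30, 0.74)}
# In this half the components come out in the opposite order.
half_1 = {"q76": (0.18, 0.89), "q77": (0.21, 0.90),
          "q83": (0.84, 0.12), "q84": (0.80, 0.20)}

print(validation_supported(full, [half_0, half_1]))
```

Matching on groups of variables rather than on component numbers reflects the point that we compare overall patterns, not exact results.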
Slide 111
Statement about Summated Scales
The next statement indicates that we can form a summated scale from the variables loading on a component, i.e. summing or averaging the scores for the variables.
The utility of summated scales is measured by Cronbach’s alpha, which should minimally be greater than 0.60, and preferably be greater than 0.70.
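For reference, alpha can be computed by hand from the item scores using the standard formula: k / (k - 1) times (1 minus the sum of the item variances divided by the variance of the summed scale), where k is the number of items. The Python below is a minimal sketch of that formula; the scores are hypothetical, and SPSS remains the tool used in this problem.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of equal-length score lists, one list per scale item."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Summed scale score for each case.
    totals = [sum(case) for case in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores for two items answered by four cases.
item_a = [1, 2, 3, 4]
item_b = [2, 1, 4, 3]
print(round(cronbach_alpha([item_a, item_b]), 3))
```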
Slide 112
Computing Cronbach's Alpha
To compute Cronbach's alpha for each component in our analysis, we select Scale > Reliability Analysis… from the Analyze menu.
Slide 113
Selecting the variables for the first component
First, move the two variables that loaded on the first component (q76 and q77) to the Items list box.
Second, click on the Statistics… button to select the statistics we will need.
Slide 114
Selecting the statistics for the output
First, mark the checkboxes for Item, Scale, and Scale if item deleted.
Second, click on the Continue button.
Slide 115
Completing the specifications
First, if Alpha is not selected as the Model in the drop-down menu, select it now.
Second, click on the OK button to produce the output.
Slide 116
Cronbach's Alpha
The reliability for component 1 as measured by Cronbach's alpha is 0.814, which is greater than the generally agreed upon lower limit of 0.70. The variables included on this component ("information and knowledge are shared openly within this organization" and "an effort is made to get the opinions of people throughout the organization") can be used in a summated scale.
Slide 117
Computing Cronbach's Alpha
To compute Cronbach's alpha for the second scale, we select Reliability Analysis from the Dialog Recall menu.
Slide 118
Selecting the variables for the second component
First, remove the variables that loaded on the first component and move the two variables that loaded on the second component to the Items list box.
Second, since we want the same output we had for the first component, we only need to click on the OK button to produce the output.
Slide 119
Cronbach's Alpha
The reliability for component 2 as measured by Cronbach's alpha is 0.561, which is less than the generally agreed upon lower limit of 0.70, and even less than the 0.60 lower limit for exploratory research. A summated scale based on these variables ("our web site is easy to use and contains helpful information" and "I have a good understanding of our mission, vision, and strategic plan") should not be used.
Slide 120
Cronbach's Alpha if Item Deleted - 1
If alpha is too small, the Cronbach’s Alpha if Item Deleted column may suggest which variable should be removed to improve the internal consistency of the scale variables. It tells us what alpha we would get if the variable listed were removed from the scale.
In this example, it does not produce a result because there are only two items and the removal of one would result in a one-item scale, which is not useful.
Slide 121
Cronbach's Alpha if Item Deleted - 2
Though not part of this problem, this output shows how deleting an item can increase alpha.
If the last item in this table were deleted, alpha would increase to .820, instead of the .686 for alpha with this item included.
Slide 122
Marking the Statement about Summated Scales
Since the variables loading on the second component did not satisfy the reliability criterion, we leave the check box unmarked.
Slide 123
Principal Components Analysis: Level of Measurement
Level of measurement ok (all variables metric)?
No: Do not mark check box for level of measurement. Mark: Inappropriate application of the statistic. Stop.
Yes: Mark check box for level of measurement.
Ordinal level variable treated as metric?
Yes: Consider limitation in discussion of findings.
No: Continue.
Slide 124
Principal Components Analysis: Sample Size
Adequate sample size (at least 150 valid cases)?
Yes: Mark check box for sample size.
No: Do not mark check box for sample size. Consider limitation in discussion of findings.
Run Principal Components Analysis.
Slide 125
Principal Components Analysis: Suitability for Factor Analysis - 1
Two or more correlations ≥ 0.30?
Yes: Mark check box for correlations.
No: Do not mark check box for correlations. Stop, variables not good candidates for factor analysis.
Probability for Bartlett test of sphericity ≤ alpha?
Yes: Mark check box for sphericity test.
No: Do not mark check box for sphericity test. Stop, variables not good candidates for factor analysis.
Slide 126
Principal Components Analysis: Suitability for Factor Analysis - 2
Sampling adequacy ≥ 0.50 for each variable?
Yes: Go on to the KMO measure.
No: Remove variable with lowest MSA. Then check whether one variable remains in the analysis.
  Yes (one variable remaining): Do not mark check box for MSA. Stop, variables not good candidates for factor analysis.
  No: Run PCA again.
KMO measure of sampling adequacy ≥ 0.50?
Yes: Mark check box for MSA.
No: Do not mark check box for MSA. Stop, variables not good candidates for factor analysis.
Slide 127
Principal Components Analysis: Anticipated Number of Factors
Correct number of factors supported by eigenvalues > 1.0 and the number of components needed to explain 60% of the variance?
Yes: Mark correct check box for number of factors.
No: Don’t mark check box for number of factors.
Today, this step provides information to the analyst about the potential solution. When factor analysis was calculated by hand, this step determined how one would do the calculations.
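The two criteria named in this step can be illustrated with a short calculation. The eigenvalues below are hypothetical (they sum to 4, as they would for four standardized variables), and the function names are made up for this sketch.

```python
def components_by_eigenvalue(eigenvalues, cutoff=1.0):
    """Number of components with an eigenvalue greater than the cutoff."""
    return sum(1 for e in eigenvalues if e > cutoff)


def components_for_variance(eigenvalues, target=0.60):
    """Smallest number of components whose cumulative share of the total
    variance reaches the target proportion."""
    total = sum(eigenvalues)
    running = 0.0
    for count, e in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += e
        if running / total >= target:
            return count
    return len(eigenvalues)


# Hypothetical eigenvalues for a four-variable analysis.
eigenvalues = [2.0, 1.2, 0.5, 0.3]
print(components_by_eigenvalue(eigenvalues))
print(components_for_variance(eigenvalues))
```

When the two counts agree, as they do for these made-up values, both criteria support the same number of components.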
Slide 128
Principal Components Analysis: Excluding Variables for Low Communality
Communality for all variables ≥ 0.50?
Yes: Mark check box for communality removal.
No: Remove a variable whose communality is below 0.50. Then check whether one variable remains in the analysis.
  Yes (one variable remaining): Do not mark check box for communality removal. Stop, no viable factor solution.
  No: Run PCA again.
Slide 129
Principal Components Analysis: Excluding Variables for Complex Structure
Simple structure (all variables load on single component)?
Yes: Mark check box for complex structure removal.
No: Remove a variable that loads on more than one component. Then check whether one variable remains in the analysis.
  Yes (one variable remaining): Do not mark check box for complex structure removal. Stop, no viable factor solution.
  No: Run PCA again.
Slide 130
Principal Components Analysis: Excluding Variables for One-variable Components
All components have more than one variable loading?
Yes: Mark check box for one-variable component.
No: Remove the variable that is the only one loading on a component. Then check whether one variable remains in the analysis.
  Yes (one variable remaining): Do not mark check box for one-variable component. Stop, no viable factor solution.
  No: Run PCA again.
Slide 131
Principal Components Analysis: Factor Structure
Correct number of components extracted?
Yes: Mark check box for number of components.
No: Do not mark check box for number of components.
Correct list of variables loaded on component?
Yes: Mark check box for loadings on component.
No: Do not mark check box for loadings on component.
Repeat this step for each component.
Slide 132
Principal Components Analysis: Percent of Variance Explained
Components explain 60% or more of variance of included variables?
Yes: Mark check box for percent of variance.
No: Do not mark check box for percent of variance. Include as limitation in discussion of findings.
Slide 133
Principal Components Analysis: Impact of Outliers - 1
Starting here, we include only the variables in the factor solution.
Re-run factor analysis, requesting regression factor scores.
Are any of the factor scores outliers (beyond ±3.0)?
No: No outliers, mark check box for no impact. Go to validation analysis.
Yes: Re-run factor analysis, excluding outliers.
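The outlier test in this step amounts to flagging any case with a factor score beyond ±3.0 on any component. A minimal sketch, assuming the factor scores have already been saved as standard scores; the case numbers and score values below are hypothetical.

```python
def outlier_cases(factor_scores, limit=3.0):
    """factor_scores: dict of case id -> tuple of standardized factor
    scores, one per component.

    Returns the ids of cases with any score beyond plus or minus the limit.
    """
    return [case for case, scores in factor_scores.items()
            if any(abs(s) > limit for s in scores)]


# Hypothetical factor scores for four cases on two components.
scores = {1: (0.5, -1.2), 2: (3.4, 0.1), 3: (-0.2, -3.1), 4: (2.9, 3.0)}
print(outlier_cases(scores))
```

Case 4 is not flagged in this example because a score of exactly 3.0 is not beyond the limit.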
Slide 134
Principal Components Analysis: Impact of Outliers - 2
Are all of the communalities excluding outliers greater than 0.50?
No: Do not mark check box for no impact. Stop, clarify which analysis should be reported.
Yes: Check the pattern of factor loadings.
Pattern of factor loadings excluding outliers matches pattern for full data set?
No: Do not mark check box for no impact. Stop, clarify which analysis should be reported.
Yes: Mark check box for no impact. Re-run factor analysis, including all cases. Since outliers had no effect, there is no reason to exclude them from the analysis.
Slide 135
Principal Components Analysis: Validation Analysis - 1
Compute split variable using specified random number seed.
Run factor analysis, selecting cases where split = 0.
Run factor analysis, selecting cases where split = 1.
Are all of the communalities for both split samples greater than 0.50?
No: Do not mark check box for validation analysis. Stop, generalizability of findings is questionable.
Yes: Continue with the comparison of factor loadings.
Slide 136
Principal Components Analysis: Validation Analysis - 2
Pattern of factor loadings for split samples matches factor loadings for full data set?
Yes: Mark check box for generalizability.
No: Do not mark check box for validation analysis. Stop, generalizability of findings is questionable.
Slide 137
Principal Components Analysis: Reliability Analysis
Compute Cronbach’s alpha for all components.
Cronbach’s alpha greater than .70 for all components?
Yes: Mark check box for summated scales.
No: Cronbach’s alpha greater than .60 for all components?
  Yes: Mark check box for summated scales. Add note of caution to interpretation.
  No: Do not mark check box for summated scales.
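The reliability decision can be written as a short rule. This sketch uses the two alpha values reported earlier for this problem (0.814 and 0.561); the function name and the returned labels are made up for illustration.

```python
def summated_scale_decision(alphas, preferred=0.70, minimum=0.60):
    """alphas: dict of component name -> Cronbach's alpha."""
    if all(a > preferred for a in alphas.values()):
        return "mark"
    if all(a > minimum for a in alphas.values()):
        return "mark with caution"
    return "do not mark"


# Alpha values reported for the two components in this problem.
print(summated_scale_decision({"component 1": 0.814, "component 2": 0.561}))
```

Because the rule requires every component to pass, the low alpha for component 2 is enough to leave the check box unmarked.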