Principal Components Analysis

Complete Problems

Complete Principal Components Analysis

We add three steps to the end of the principal components analysis testing basic relationships:

• Analysis of the impact of outliers

• Split-sample validation analysis

• Computation of Cronbach’s alpha to measure the feasibility of using components as summated scales

Outliers

Outliers can change the factor structure found for a principal components analysis, creating the dilemma of determining which factor structure should be reported.

SPSS calculates factor scores as standard scores.

SPSS suggests that one way to identify outliers is to compute the factor scores and identify cases whose scores fall beyond ±3.0 as outliers.

If we find outliers in our analysis, we redo the analysis, omitting the cases that were outliers.

If the analysis excluding outliers still satisfies the requirement for communalities and the factor structure is the same as in the analysis with all cases, the outliers do not have an impact. If our factor solution changes, we will have to study the outlier cases to determine whether or not we should exclude them.

After testing for outliers, restore the full data set before any further calculations.

Split Sample Validation

To test the generalizability of findings from a principal component analysis, we could conduct a second research study to see if our findings are verified.

A less costly alternative is to split the sample randomly into two halves, do the principal component analysis on each half and compare the results.

If the communalities and the factor loadings are the same on the analysis on each half and the full data set, we have evidence that the findings are generalizable and valid because, in effect, the two analyses represent a study and a replication.

Misleading Results to Watch Out For

When we examine the communalities and factor loadings, we are matching up overall patterns, not exact results: the communalities should all be greater than 0.50 and the pattern of the factor loadings should be the same.

Sometimes the variables will switch their components (variables loading on the first component now load on the second and vice versa), but this does not invalidate our findings.

Sometimes, all of the signs of the factor loadings will reverse themselves (the plus's become minus's and the minus's become plus's), but this does not invalidate our findings because we interpret the size, not the sign of the loadings.
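
The pattern matching described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not an SPSS feature: the function names and the made-up loadings for variables v1 to v4 are hypothetical. Each variable is assigned to the component on which it has the largest absolute loading, so neither reversed signs nor swapped component numbers affect the comparison.

```python
def loading_pattern(loadings):
    """Group variables by the component on which each loads most strongly.

    `loadings` maps a variable name to its list of loadings, one per component.
    Signs are ignored (absolute values) and the groups are returned as a set of
    frozensets, so the numbering of the components does not matter.
    """
    groups = {}
    for variable, row in loadings.items():
        strongest = max(range(len(row)), key=lambda k: abs(row[k]))
        groups.setdefault(strongest, set()).add(variable)
    return {frozenset(members) for members in groups.values()}


def same_pattern(solution_a, solution_b):
    """True when both solutions group the variables in the same way."""
    return loading_pattern(solution_a) == loading_pattern(solution_b)


# Made-up loadings for four hypothetical variables v1..v4.
full = {"v1": [0.90, 0.10], "v2": [0.88, 0.15],
        "v3": [0.12, 0.82], "v4": [0.20, 0.80]}
# Same grouping, but the components are swapped and every sign is reversed.
half = {"v1": [-0.11, -0.89], "v2": [-0.18, -0.86],
        "v3": [-0.84, -0.10], "v4": [-0.79, -0.22]}
print(same_pattern(full, half))  # True
```

The communalities still have to be checked separately against the 0.50 requirement; this sketch only compares which variables load together.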

When validation fails

If the validation fails, we are warned that the solution found in the analysis of the full data set is not generalizable and should not be reported as valid findings.

We do have some options when validation fails:

• If the problem is limited to one or two variables, we can remove those variables and redo the analysis.

• Randomly selected samples are not always representative. We might try some different random number seeds and see if our negative finding was a fluke. If we choose this option, we should do a large number of validations to establish a clear pattern, at least 5 to 10. Getting one or two validations to negate the failed validation and support our findings is not sufficient.

Reliability of Summated Scales

One of the common uses of factor analysis is the formation of summated scales, where we sum or average the scores on all the variables loading on a component to create the score for the component.

To verify that the variables for a component are measuring similar entities that are legitimate to add together, we compute Cronbach's alpha.

If Cronbach's alpha is 0.70 or greater (0.60 or greater for exploratory research), we have support for the internal consistency of the items, justifying their use in a summated scale.

Cronbach’s alpha requires that all variables be coded in the same direction. If there are negative loadings on a component, those variables must be reverse coded to get the correct value for alpha.
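
To make the alpha calculation concrete, here is a minimal Python sketch using the standard formula alpha = k / (k - 1) × (1 - sum of item variances / variance of the total score). The function names, the 1-to-5 response scale, and the five made-up cases are our own assumptions for illustration; SPSS computes the same statistic through its Reliability Analysis procedure.

```python
from statistics import variance


def reverse_code(values, low, high):
    """Reverse-code item responses on a scale running from `low` to `high`."""
    return [low + high - v for v in values]


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of scores per case.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of the total)
    """
    k = len(items)
    totals = [sum(case) for case in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Made-up responses from five cases to two hypothetical 1-5 items.
item_a = [1, 2, 3, 4, 5]
item_b = [2, 2, 3, 5, 5]
alpha = cronbach_alpha([item_a, item_b])

# The summated scale itself: here, the average of the two items for each case.
scale = [sum(case) / 2 for case in zip(item_a, item_b)]
print(round(alpha, 3))  # 0.968
```

If one of the items were coded in the opposite direction, alpha would come out far too low (even negative) until `reverse_code` is applied to that item, which is why the direction of coding matters.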

The Problem in BlackBoard

The problem statement tells us:

• the data set and variables included in the analysis

• the alpha for the statistical tests

• the seed number to use for the validation analysis

Statement about Level of Measurement

The first statement in the problem asks about level of measurement. Principal components analysis requires that all of the variables included in the analysis are metric.

Marking the Statement about Level of Measurement

All of the variables included in the analysis are ordinal level. We will employ the common convention of treating ordinal variables as metric variables, but we should consider mentioning this as a limitation to the analysis.

Since we treated all variables as metric, we mark the check box.

Statement about Sample Size

We will use the minimum sample size requirement of 150 valid cases recommended by Tabachnick and Fidell (1996).

Run the Principal Components Analysis - 1

To answer the question about the sample size, we run the first principal components analysis.

Select the Factor command from the Analyze > Data Reduction menu.


First, move the variables listed in the problem to the Variables list box.

Next, click on the Descriptives button to request the statistics needed to evaluate the suitability of the data for factor analysis.


First, mark the check box for Univariate Statistics to get the number of valid cases for the analysis.

Second, mark the check boxes for the statistics for the suitability of factor analysis:

• Coefficients of the correlation matrix,

• KMO and Bartlett’s test of sphericity, and

• Anti-image correlation matrix.

Third, click on the Continue button to close the Factor Analysis: Descriptives dialog box.


Click on the Extraction button to tell SPSS what method it should use to extract the factors.


We will use the default method of Principal Components. The drop down list contains numerous other methods.

We accept the other defaults for displaying the unrotated factor solution and extracting eigenvalues over 1.

Click on the Continue button to close the dialog box.


Click on the Rotation button to tell SPSS what method it should use to rotate the factors to clarify the interpretation.


We mark the option button for the Varimax rotation which will make the factors independent of each other.

Click on the Continue button to close the dialog box.


Having specified the analysis, click on the OK button to produce the output.

Output for Sample Size Requirement

The 509 cases available for this principal components analysis satisfy the minimum sample size requirement of 150 valid cases recommended by Tabachnick and Fidell (1996).

Marking the Statement about Sample Size

Since we satisfied the minimum sample size requirement, we mark the statement.

If we did not satisfy the sample size requirement, we should consider mentioning this fact as a limitation to the analysis. Factor analysis can be numerically unstable when the sample size is small.

The Statement about Suitability for Factor Analysis: Sufficient Correlations

Principal components analysis requires that more than one of the correlations between the variables included in the analysis be greater than 0.30.
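
As an illustration of this check, the Python sketch below counts the variable pairs whose correlation exceeds 0.30. The function names and the small made-up data set are hypothetical, and comparing the absolute value of the correlation is our assumption (a strong negative correlation also indicates shared variance).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def count_sufficient_correlations(data, cutoff=0.30):
    """Count variable pairs whose correlation exceeds `cutoff` in absolute value."""
    return sum(
        1 for a, b in combinations(data, 2)
        if abs(pearson(data[a], data[b])) > cutoff
    )


# Made-up scores for three hypothetical variables.
data = {
    "v1": [1, 2, 3, 4, 5],
    "v2": [2, 1, 4, 3, 5],
    "v3": [2, 5, 1, 4, 3],
}
print(count_sufficient_correlations(data))  # 2
```

With six variables there are 15 distinct pairs to examine, which is the lower (or upper) triangle of the correlation matrix that SPSS prints.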

Sufficient Correlations in Correlation Matrix

For this set of variables, there are 9 correlations in the matrix greater than 0.30.

Marking the Statement about Sufficient Correlations

Since there are 9 correlations greater than 0.30, we mark the statement.

The Statement about Suitability for Factor Analysis: Test of Sphericity

Bartlett’s test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix with 1’s, or perfect correlations, on the main diagonal, and 0’s for all of the remaining elements.

If this is true, the variables are not correlated and the factor analysis will not work.

Our goal in this test is to reject the null hypothesis, supporting the contention that there are sufficient correlations, or similarity of values, among the variables that several can be combined into a factor or component.

Bartlett’s Test of Sphericity

Principal component analysis requires that the probability associated with Bartlett's Test of Sphericity (χ²(df=15, N = 509) = 854.15, p < .001) be less than or equal to the level of significance (0.05). The probability associated with the Bartlett Test satisfies this requirement.
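
For readers who want to see where the chi-square value and its degrees of freedom come from, here is a hedged Python sketch of the usual formula for Bartlett's test: chi-square = -((n - 1) - (2p + 5) / 6) × ln|R|, with p(p - 1) / 2 degrees of freedom, where R is the correlation matrix, p the number of variables, and n the number of cases. The function names are our own, and the sketch stops at the test statistic; the p-value would still have to be looked up in a chi-square distribution.

```python
from math import log


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_cases):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test."""
    p = len(corr)
    chi_square = -((n_cases - 1) - (2 * p + 5) / 6) * log(determinant(corr))
    df = p * (p - 1) // 2
    return chi_square, df


# A made-up 2 x 2 correlation matrix with r = 0.5 and 100 hypothetical cases.
chi_square, df = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], n_cases=100)
print(round(chi_square, 2), df)  # 28.05 1
```

With six variables the formula gives 6 × 5 / 2 = 15 degrees of freedom, matching the df reported above. An identity matrix has a determinant of 1, so the statistic is 0 and the null hypothesis cannot be rejected.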

Marking the Statement about Bartlett’s Test of Sphericity

Since the probability associated with the Bartlett Test is sufficient to reject the null hypothesis, we mark the check box.

The Statement about Suitability for Factor Analysis: Sampling Adequacy

Sampling adequacy predicts if data are likely to factor well, based on correlation and partial correlation.

The Kaiser-Meyer-Olkin Measure of Sampling Adequacy (MSA) must be greater than 0.50 for each individual variable as well as the set of variables. Variables that do not have an MSA of .50 or greater are removed from the analysis one at a time, until all variables and the overall measure are above .50.
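
The sketch below shows one common way the MSA is computed from a correlation matrix: squared correlations are compared with squared partial correlations, the latter obtained from the inverse of the correlation matrix. This is an illustration under our own assumptions (hypothetical function names, a made-up matrix), not a reproduction of SPSS's internal code, and it does not handle a singular correlation matrix.

```python
def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [x / scale for x in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def msa(corr):
    """Return (overall KMO, list of per-variable MSA values)."""
    n = len(corr)
    inv = invert(corr)
    # Partial correlation between i and j, controlling for all other variables.
    partial = [[-inv[i][j] / (inv[i][i] * inv[j][j]) ** 0.5 for j in range(n)]
               for i in range(n)]
    r2 = [[corr[i][j] ** 2 if i != j else 0.0 for j in range(n)] for i in range(n)]
    q2 = [[partial[i][j] ** 2 if i != j else 0.0 for j in range(n)] for i in range(n)]
    per_variable = [sum(r2[i]) / (sum(r2[i]) + sum(q2[i])) for i in range(n)]
    total_r2 = sum(map(sum, r2))
    total_q2 = sum(map(sum, q2))
    return total_r2 / (total_r2 + total_q2), per_variable


# Made-up correlation matrix: three hypothetical variables, all correlated 0.5.
corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
overall, individual = msa(corr)
print(round(overall, 3))  # 0.692
```

The MSA approaches 1 when the partial correlations are small relative to the ordinary correlations, which is the situation in which the variables share enough common variance to factor well.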

Measures of Sampling Adequacy for Individual Variables

In the initial iteration for suitability of principal components analysis, the MSA for all of the individual variables was greater than 0.50 ("information and knowledge are shared openly within this organization" [q76] - .70; "an effort is made to get the opinions of people throughout the organization" [q77] - .69; "our web site is easy to use and contains helpful information" [q83] - .76; "I have a good understanding of our mission, vision, and strategic plan" [q84] - .73; "I believe we communicate our mission effectively to the public" [q85] - .81; and "my organization encourages me to be involved in my community" [q86] - .84).

Note: Not all MSA’s are shown on this slide.

Kaiser-Meyer-Olkin Measure of Sampling Adequacy

In addition, the overall MSA for the set of variables included in the analysis was 0.75, which exceeds the minimum requirement of 0.50 for overall MSA.

Marking the Statement about Measures of Sampling Adequacy

Since the sampling adequacy measures met the criteria for both individual variables and overall, the check box is marked.

Statement about Initial Number of Factors

Various tests are used to estimate the number of factors to be extracted. This was very important when factor analysis was calculated by hand.

Two of the criteria are the latent root criterion, which is based on the number of eigenvalues greater than 1.0, and the cumulative proportion of variance criterion, which calculates the number of components needed to explain 60% or more of the total variance in the original set of variables.

The problem offers two possible responses.

Initial Number of Factors: Eigenvalues Greater than One

The latent root criterion for the number of factors to extract would indicate that there were 2 components to be extracted for these variables, since there were 2 eigenvalues greater than 1.0 (2.84 and 1.05).

Initial Number of Factors: Percentage of Variance Explained

In addition, the cumulative proportion of variance criterion of explaining 60% or more of the total variance in the original set of variables can be met with 2 components. A 2 component solution would explain an estimated 64.86% of the total variance.
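
Both criteria can be expressed as short calculations on the eigenvalues. The Python sketch below is illustrative only: the function names are our own, and it uses just the two eigenvalues quoted in these slides (the remaining four are not reported here, but each must be below 1.0). Because the variables are standardized, the total variance equals the number of variables, 6.

```python
def latent_root_count(eigenvalues):
    """Number of components with an eigenvalue greater than 1.0."""
    return sum(1 for value in eigenvalues if value > 1.0)


def components_for_variance(eigenvalues, n_variables, target_pct=60.0):
    """Smallest number of components whose cumulative variance reaches the target.

    With standardized variables the total variance equals the number of
    variables. Returns None if the eigenvalues supplied never reach the target.
    """
    cumulative = 0.0
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if 100.0 * cumulative / n_variables >= target_pct:
            return count
    return None


reported = [2.84, 1.05]  # the two eigenvalues quoted in the slides
print(latent_root_count(reported))                        # 2
print(components_for_variance(reported, n_variables=6))   # 2
print(round(100 * sum(reported) / 6, 2))                  # 64.83
```

The 64.83% obtained from the rounded eigenvalues differs slightly from the 64.86% in the SPSS output, which is computed from unrounded values.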

Marking the Statement about Initial Number of Factors

Since the SPSS default is to extract the number of components indicated by the latent root criterion, our initial factor solution will be based on the extraction of 2 components.

We mark the second statement in the pair.

Note: the question is worded to indicate that both criteria suggest the same number of factors. Should they suggest a different number of factors, neither statement would be marked, but we would still continue with the factor analysis using the number of factors suggested by the latent root criterion.

Statement about First Iteration of Factor Extraction

The problem suggests that the first iteration of the factor solution included a variable (my organization encourages me to be involved in my community [q86]) that should be excluded, either because it did not satisfy the requirement for communalities or because it violated simple structure.

Output for Communalities on First Iteration

Examination of the first principal components model extracted by SPSS resulted in the removal of the variable "my organization encourages me to be involved in my community" [q86] from the analysis. "My organization encourages me to be involved in my community" [q86] was removed because its communality (.467) meant that the factor solution explained less than half of the variable's variance. The communality for this variable was less than the minimum requirement that the factor solution should explain at least 50% of the variance in the original variable.

Marking the Statement about First Iteration of Factor Extraction

My organization encourages me to be involved in my community [q86] was removed because it did not satisfy the requirement for communalities, i.e. the factors should explain at least 50% of the variance in the variable. Since we have already determined that the variable is to be removed, it was not necessary to check the factor loadings for simple structure. The first statement in the pair is marked.

Removing a Variable from the Factor Analysis - 1

To remove the variable, my organization encourages me to be involved in my community [q86], we select Factor Analysis from the Dialog Recall drop down menu.


To remove the variable, highlight the target variable in the Variables list box, and click on the arrow button pointing to the left.


Since all of the other specifications for the analysis remain the same, click on the OK button to produce the output for the second iteration.

Statement about Second Iteration of Factor Extraction

The problem suggests that the second iteration of the factor solution included a variable (I believe we communicate our mission effectively to the public [q85]) that should be excluded, either because it did not satisfy the requirement for communalities or because it violated simple structure.

Output for Communalities on Second Iteration

Examination of the second principal components model extracted by SPSS produced a table of Communalities in which all variables have the required minimum of .50.

Output for Factor Structure on Second Iteration

Examination of the second principal components model extracted by SPSS resulted in the removal of the variable "I believe we communicate our mission effectively to the public" [q85] from the analysis. The variable "I believe we communicate our mission effectively to the public" [q85] had loadings of 0.40 or higher on component 1 (.526) and component 2 (.536).

Multiple high loadings violate the requirement for simple structure, so this variable was removed from the analysis.
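
The two checks applied at each iteration can be written out as a small Python sketch. The function names are hypothetical; the loadings are the ones reported for q85 above. A variable's communality is the sum of its squared loadings across the extracted components, and a variable violates simple structure when it loads at 0.40 or higher on more than one component.

```python
def communality(loadings):
    """Proportion of a variable's variance explained: sum of squared loadings."""
    return sum(value ** 2 for value in loadings)


def violates_simple_structure(loadings, cutoff=0.40):
    """True when a variable loads at or above the cutoff on more than one component."""
    return sum(1 for value in loadings if abs(value) >= cutoff) > 1


q85 = [0.526, 0.536]  # loadings reported for q85 on components 1 and 2
print(round(communality(q85), 3))      # 0.564
print(violates_simple_structure(q85))  # True
```

This reproduces the situation on the second iteration: q85 passes the communality requirement (0.564 is above 0.50) but fails simple structure.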

Marking the Statement about Second Iteration of Factor Extraction

I believe we communicate our mission effectively to the public [q85] was removed because it did not satisfy the requirement for simple structure, so the first statement in the pair is marked.


To remove the variable, I believe we communicate our mission effectively to the public [q85], we select Factor Analysis from the Dialog Recall drop down menu.


To remove the variable, highlight the target variable in the Variables list box, and click on the arrow button pointing to the left.


Since all of the other specifications for the analysis remain the same, click on the OK button to produce the output for the third iteration.

Statement about Third Iteration of Factor Extraction

The problem indicates that no variables were removed on the third iteration of the factor extraction and that the solution met all of the requirements for a factor analysis solution:

• all the variables remaining in the analysis had communalities above 0.50,

• the variables demonstrated simple structure, and

• each component had more than one variable loading on it.

Output for Communalities on Third Iteration - 1

Examination of the third principal components model extracted by SPSS produced a table of Communalities in which all four variables have the required minimum of .50.

Output for Factor Structure on Third Iteration - 2

Examination of the third principal components model extracted by SPSS did not show any variables having a loading of .40 or higher on both of the components.

Output for Factor Structure on Third Iteration - 3

Each of the components has two variables loading on it.

If a component had only one variable loading on it, it would make more sense to use the original variable in subsequent analyses rather than the component.

Marking the Statement about Third Iteration of Factor Extraction

On the third iteration, all of the requirements for a factor solution were satisfied.

For the 4 variables not excluded from the analysis, two components can be substituted for the 4 variables.

Since the final solution found two components, we mark the statement.

Statement about Variables Loading on the First Component

Two options are given which suggest different combinations of variables loading on the first component.

Output for Component One

Component 1 included the variables: "information and knowledge are shared openly within this organization" [q76] (loading = .901); and "an effort is made to get the opinions of people throughout the organization" [q77] (loading = .912). We can substitute one component variable for this combination of variables in further analyses.

Since more than one component was extracted, the factor structure is based on the "Rotated Component Matrix"

Marking the Statement about Variables Loading on the First Component

Component 1 included the variables: "information and knowledge are shared openly within this organization" [q76] and "an effort is made to get the opinions of people throughout the organization" [q77]. We mark the first statement in the pair.

Statement about Variables Loading on the Second Component

Two options are given which suggest different combinations of variables loading on the second component.

Output for Component Two

Component 2 included the variables: "our web site is easy to use and contains helpful information" [q83] (loading = .821); and "I have a good understanding of our mission, vision, and strategic plan" [q84] (loading = .833). We can substitute one component variable for this combination of variables in further analyses.

Since more than one component was extracted, the factor structure is based on the "Rotated Component Matrix"

Marking the Statement about Variables Loading on the Second Component

Component 2 included the variables: "our web site is easy to use and contains helpful information" [q83] and "I have a good understanding of our mission, vision, and strategic plan" [q84]. We mark the first statement in the pair.

Statement about Percentage of Variance Explained by Factors

The final statement questions whether or not the factor solution met the standard of explaining 60% of the variance in the variables that were replaced.

Output for Percentage of Variance Explained by Factors

The components explain 77.25% of the total variance in the variables which are included on the components. This percentage of variance explained satisfies the goal of explaining 60% or more of the total original variance in the variables.

Marking the Statement about Percentage of Variance Explained by Factors

Since the percentage of variance explained by the factors satisfies the goal of explaining 60% or more of the total original variance in the variables the components will replace, we mark the final statement.

Statement about Outliers

The next statement requires us to determine whether or not there are any outliers in the results of the principal components analysis. If outliers are found, they are removed from the analysis and the results computed again. If the factor solution is the same as that based on all cases, we conclude that outliers do not have any impact and we report the results based on all cases.

If the solution without outliers is different, we face the difficult decision of which factor structure should be reported. In our problems, we will halt the analysis.

Detecting Outliers - 1

To detect outliers, we compute the factor scores in SPSS.

Select the Factor Analysis command from the Dialog Recall tool button


Click on the Scores… button to access the factor scores dialog box.

The only command we need to change is to request SPSS to compute the factor scores.


First, click on the Save as variables checkbox to create factor variables.

Second, accept the default method using a Regression equation to calculate the scores.

Third, click on the Continue button to complete the specifications.


Click on the OK button to compute the factor scores.

Outliers in the Data Editor

SPSS creates the factor score variables in the data editor window. It names the first factor score “FAC1_1,” and the second factor score “FAC2_1.”

We need to check to see if we have any values for either factor score that fall outside the range of ±3.0. One way to check for the presence of large values indicating outliers is to sort the factor variables and see if any fall outside the acceptable range.

Should you forget to delete the factor scores from the previous analysis, SPSS will alter the final digit in the factor name, i.e. instead of naming it FAC1_1, it will name it FAC1_2.

Sort the data to locate outliers for factor one

First, select the FAC1_1 column by clicking on its header.

Second, right click on the column header and select the Sort Ascending command from the drop down menu.

Negative outliers for factor one

Scroll down past the cases for whom factor scores could not be computed because of missing data.

We see that none of the scores for factor one are less than or equal to -3.0, so there are no outliers detected yet.

Positive outliers for factor one

Scrolling down to the bottom of the sorted data set, we see that none of the scores for factor one are greater than or equal to +3.0.

There are no outliers on factor one.

Sort the data to locate outliers on factor two

First, select the FAC2_1 column by clicking on its header.

Second, right click on the column header and select the Sort Ascending command from the drop down menu.

Negative outliers for factor two

Scrolling down past the cases for whom factor scores could not be computed, we see that there are five cases that have a factor score less than or equal to -3.0 on factor 2.

Positive outliers for factor two

Scrolling down to the bottom of the sorted data set, we see that none of the scores for factor two are greater than or equal to +3.0.

We will run the analysis excluding the five negative outliers, and see if it changes our interpretation of the analysis.

Removing the outliers

To see whether or not outliers are having an impact on the factor solution, we will compute the factor analysis without the outliers and compare the results.

To remove the outliers, we will include the cases that are not outliers.

Choose the Select Cases… command from the Data menu.

Setting the If condition

First, mark the option button for If condition is satisfied.

Second, click on the If… button to enter the formula for selecting cases to include in the analysis.

Formula to select cases that are not outliers

First, type the formula as shown. The formula says: include a case only if the absolute values of both the first and the second factor scores are less than 3.0.

Second, click on the Continue button to complete the specification.
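
The logic of this selection formula can be sketched in Python as follows. The case numbers and factor scores are made up for illustration, and the function name is hypothetical; `None` stands for a factor score that SPSS could not compute because of missing data. Such cases fail the condition and are therefore left out along with the outliers.

```python
def is_not_outlier(scores, limit=3.0):
    """Keep a case only if every factor score is present and within +/- limit."""
    return all(score is not None and abs(score) < limit for score in scores)


# Made-up factor scores (FAC1_1, FAC2_1) for five hypothetical cases;
# None stands for a score that could not be computed because of missing data.
cases = {
    101: (0.42, -1.10),
    102: (1.95, -3.40),   # outlier on the second factor
    103: (None, None),    # missing data
    104: (-2.80, 2.10),
    105: (0.05, 0.33),
}
selected = [case_id for case_id, scores in cases.items() if is_not_outlier(scores)]
print(selected)  # [101, 104, 105]
```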

Complete the select cases command

Having entered the formula for including cases, click on the OK button to complete the selection.

SPSS writes the formula we entered next to the IF button.

The outliers selected out of the analysis

The cases with missing data are also excluded because they do not satisfy the criteria in the formula.

When SPSS selects a case out of the data analysis, it draws a slash through the case number. The cases that we identified as outliers will be excluded.

Repeating the factor analysis

To repeat the factor analysis without the outliers, select the Factor Analysis command from the Dialog Recall tool button

Stopping SPSS from computing factor scores again

On the last factor analysis, we included the specification to compute factor scores. Since we do not need to do this again, we will remove the specification.

Click on the Scores… button to access the factor scores dialog.

Clearing the command to save factor scores

First, clear the Save as variables checkbox. This will deactivate the Method options.

Second, click on the Continue button to complete the specification

Computing the factor analysis

To produce the output for the factor analysis excluding outliers, click on the OK button.

Comparing communalities

All of the communalities for the factor analysis excluding outliers satisfy the minimum requirement of being larger than 0.50.

All of the communalities for the factor analysis including all cases satisfy the minimum requirement of being larger than 0.50.

Though the communalities for each variable are slightly smaller when we excluded outliers, we would not alter our interpretation of the role of these four variables in the solution.

Comparing factor loadings

The pattern of variable loadings on components did not change when the outliers were removed. Component 1 included the variables: "information and knowledge are shared openly within this organization" [q76] and "an effort is made to get the opinions of people throughout the organization" [q77]. Component 2 included the variables: "our web site is easy to use and contains helpful information" [q83] and "I have a good understanding of our mission, vision, and strategic plan" [q84].

The factor loadings for the factor analysis including all cases are shown on the left.

The factor loadings for the factor analysis excluding outliers are shown on the right.

Marking the Statement about Outliers

The presence of outliers did not alter the factor solution. The factor solution based on all cases should be used in further analyses.

We mark the check box for no impact due to outliers.

Had the factor solution changed, we would have halted the analysis until we could understand the problem further.

Statement about Generalizability

Since factor analysis tends to over-fit the data used to develop the model at the expense of generalizability, we will test generalizability with a split sample validation strategy. In this strategy, we divide the sample in half, conduct the factor analysis on each half, and compare the results to the analysis on the full data set.

Deleting the Factor Scores

Before we do the split-sample validation, we will delete the factor scores that we used to detect outliers.

First, highlight the columns containing the factor scores.

Second, select the Clear command from the Edit menu.

Restoring All Cases to the Analysis - 1

We removed cases that were detected as outliers. Before doing our validation, we need to restore these cases to subsequent analyses.

Select the Select Cases command from the Data menu.

Restoring All Cases to the Analysis - 2

First, click on the All cases option button.

Click on the OK button to restore the cases.

All Cases Restored to the Data Set

The slash lines are removed from the case numbers, indicating that all cases are available to the analysis.

Split-sample validation

We validate our analysis by conducting an analysis on each half of the sample. We compare the results of these two split sample analyses with the analysis of the full data set.

To split the sample into two halves, we generate a random variable that indicates which half of the sample each case should be placed in.

To compute a random selection of cases, we need to specify the starting value, or random number seed. Otherwise, the random sequence of numbers that you generate will not match mine, and we will get different results.

Before we do the random selection, you must make certain that your data set is sorted in the original sort order, or the cases in your two half samples will not match mine. To make certain your data set is in the same order as mine, sort your data set in ascending order by case id.

Sorting the data set in ascending order

To make certain the data set is sorted in the original order, highlight the case id column, right click on the column header, and select the Sort Ascending command from the popup menu.

Setting the random number seed - 1

To set the random number seed, select the Random Number Generators… command from the Transform menu.

NOTE: you must use the random number seed that is stated in the problem in order to produce the same results that I found. Any other seed will generate a different random sequence that can produce results that are very different from mine.

Setting the random number seed - 2

First, mark the check box for Set Starting Point.

Second, select the option button for a Fixed Value.

Third, type the seed number provided in the problem directions: 291769.

Fourth, click on the OK button to complete the action.

NOTE: SPSS does not provide any feedback that the seed has been set or changed. If you are in doubt, you can reopen the dialog box and see what it indicates.

Computing the split variable - 1

To enter the formula for the variable that will split the sample in two parts, click on the Compute… command.

Computing the split variable - 2

First, type the name for the new variable, split, into the Target Variable text box.

Second, the formula for the value of split is shown in the text box.

The uniform(1) function generates a random decimal number between 0 and 1. The random number is compared to the value 0.50.

If the random number is less than or equal to 0.50, the value of the formula will be 1, the SPSS numeric equivalent to true. If the random number is larger than 0.50, the formula will return a 0, the SPSS numeric equivalent to false.

Third, click on the OK button to complete the dialog box.
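
The logic of the split formula can be mirrored in Python as a rough sketch. The function name is our own, and Python's random number generator is not the one SPSS uses, so the same seed will not reproduce the SPSS assignment of cases; the sketch only shows how a seeded uniform draw compared with 0.50 yields a reproducible 0/1 split.

```python
import random


def make_split(n_cases, seed):
    """Assign each case to half 1 (random draw <= 0.50) or half 0 (otherwise)."""
    rng = random.Random(seed)  # fixed seed makes the sequence reproducible
    return [1 if rng.random() <= 0.50 else 0 for _ in range(n_cases)]


split = make_split(509, seed=291769)
print(split.count(0), split.count(1))
```

Running the function again with the same seed returns the same list, while a different seed gives a different division of the cases. The two halves will usually be close to, but not exactly, equal in size.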

The split variable in the data editor

In the data editor, the split variable shows a random pattern of zeros and ones.

To select half of the sample for each validation analysis, we will first select the cases where split = 0, then select the cases where split = 1.

Repeating the analysis with the first validation sample

To repeat the principal component analysis for the first validation sample, select Factor Analysis from the Dialog Recall tool button.

Using "split" as the selection variable

First, scroll down the list of variables and highlight the variable split.

Second, click on the right arrow button to move the split variable to the Selection Variable text box.

Setting the value of split to select cases

When the variable named split is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt us to enter a specific value for split.

Click on the Value… button to enter a value for split.

Completing the value selection

First, type the value for the first half of the sample, 0, into the Value for Selection Variable text box.

Second, click on the Continue button to complete the value entry.

Requesting output for the first validation sample

When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 0 for the split variable.

Click on the OK button to request the output.

Since the validation analysis requires us to compare the results of the analysis using the two split samples, we will request the output for the second sample before doing any comparison.

Repeating the analysis with the second validation sample

To repeat the principal component analysis for the second validation sample, select Factor Analysis from the Dialog Recall tool button.

Setting the value of split to select cases

Since the split variable is already in the Selection Variable text box, we only need to change its value.

Click on the Value… button to enter a different value for split.

Completing the value selection

First, type the value for the second half of the sample, 1, into the Value for Selection Variable text box.

Second, click on the Continue button to complete the value entry.

Requesting output for the second validation sample

When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split variable.

Click on the OK button to request the output.

Comparing communalities

All of the communalities for the first split sample satisfy the minimum requirement of being larger than 0.50.

Note how SPSS identifies for us which cases we selected for the analysis.

All of the communalities for the second split sample satisfy the minimum requirement of being larger than 0.50.

Comparing factor loadings

The pattern of factor loadings is the same for both split samples: the variables "information and knowledge are shared openly within this organization" [q76] and "an effort is made to get the opinions of people throughout the organization" [q77] load on the first component, and "our web site is easy to use and contains helpful information" [q83] and "I have a good understanding of our mission, vision, and strategic plan" [q84] load on the second component.

Marking the Statement about Generalizability

All of the communalities in both validation samples met the criteria.

The pattern of loadings for both validation samples is the same, and the same as the pattern for the analysis using the full sample.

In effect, we have done the same analysis on two separate sub-samples of cases and obtained the same results.

This validation analysis supports a finding that the results of this principal component analysis are generalizable to the population represented by this data set. We mark the check box.
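The comparison just described can be sketched in Python. This is an illustration under stated assumptions, not the SPSS procedure: the loadings are made-up numbers rather than the output for this problem, the function names are hypothetical, and each variable is assigned to the component on which it loads most strongly.

```python
def communalities(loadings):
    """Communality of each variable = sum of its squared loadings."""
    return {var: sum(l * l for l in row) for var, row in loadings.items()}

def grouping(loadings):
    """Group variables by the component on which each loads most strongly."""
    groups = {}
    for var, row in loadings.items():
        strongest = max(range(len(row)), key=lambda j: abs(row[j]))
        groups.setdefault(strongest, set()).add(var)
    # Compare sets of variables, so the order of the components does not matter
    return {frozenset(g) for g in groups.values()}

def validates(full, half_a, half_b, min_communality=0.50):
    """True when both halves meet the communality criterion and match the full pattern."""
    ok_communalities = all(c > min_communality
                           for half in (half_a, half_b)
                           for c in communalities(half).values())
    ok_pattern = grouping(half_a) == grouping(half_b) == grouping(full)
    return ok_communalities and ok_pattern

# Hypothetical rotated loadings (made-up numbers)
full   = {"q76": [0.90, 0.15], "q77": [0.89, 0.20], "q83": [0.10, 0.85], "q84": [0.25, 0.78]}
half_a = {"q76": [0.91, 0.12], "q77": [0.88, 0.22], "q83": [0.08, 0.86], "q84": [0.28, 0.76]}
# The components come out in the opposite order here; the grouping still matches
half_b = {"q76": [0.18, 0.89], "q77": [0.17, 0.90], "q83": [0.84, 0.12], "q84": [0.80, 0.22]}

print(validates(full, half_a, half_b))
```

Comparing groups of variables rather than component numbers reflects the point that we are matching overall patterns, not exact results.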

Statement about Summated Scales

The next statement indicates that we can form a summated scale from the variables loading on a component, i.e. by summing or averaging the scores for the variables.

The utility of summated scales is measured by Cronbach’s alpha, which should minimally be greater than 0.60, and preferably be greater than 0.70.

Computing Cronbach's Alpha

To compute Cronbach's alpha for each component in our analysis, we select Scale > Reliability Analysis… from the Analyze menu.

Selecting the variables for the first component

First, move the two variables that loaded on the first component (q76 and q77) to the Items list box.

Second, click on the Statistics… button to select the statistics we will need.

Selecting the statistics for the output

First, mark the checkboxes for Item, Scale, and Scale if item deleted.

Second, click on the Continue button.

Completing the specifications

First, if Alpha is not selected as the Model in the drop-down menu, select it now.

Second, click on the OK button to produce the output.

Cronbach's Alpha

The reliability for component 1 as measured by Cronbach's alpha is 0.814, which is greater than the generally agreed upon lower limit of 0.70. The variables included on this component ("information and knowledge are shared openly within this organization" and "an effort is made to get the opinions of people throughout the organization") can be used in a summated scale.
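For readers who want to see what the statistic measures, here is a minimal Python sketch of the standard formula for Cronbach's alpha: k / (k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The responses are hypothetical, not the survey data, and the function name is an assumption.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, all in the same case order."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # summated scale score per case
    item_variance = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance / variance(totals))

# Hypothetical responses for two items from six cases
item_a = [4, 5, 3, 2, 4, 1]
item_b = [5, 5, 2, 2, 3, 1]

alpha = cronbach_alpha([item_a, item_b])
print(round(alpha, 3))
```

Alpha rises as the items vary together, which is why it is used to judge whether summing or averaging them is reasonable.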

Computing Cronbach's Alpha

To compute Cronbach's alpha for the second scale, we select Reliability Analysis from the Dialog Recall menu.

Selecting the variables for the second component

First, remove the variables that loaded on the first component and move the two variables that loaded on the second component to the Items list box.

Second, since we want the same output we had for the first component, we only need to click on the OK button to produce the output.

Cronbach's Alpha

The reliability for component 2 as measured by Cronbach's alpha is 0.561, which is less than the generally agreed upon lower limit of 0.70, and even less than the 0.60 lower limit for exploratory research. A summated scale based on these variables ("our web site is easy to use and contains helpful information" and "I have a good understanding of our mission, vision, and strategic plan") should not be used.

Cronbach's Alpha if Item Deleted - 1

If alpha is too small, the Cronbach’s Alpha if Item Deleted column may suggest which variable should be removed to improve the internal consistency of the scale variables. It tells us what alpha we would get if the variable listed were removed from the scale.

In this example, it does not produce a result because there are only two items and the removal of one would result in a one-item scale, which is not useful.

Cronbach's Alpha if Item Deleted - 2

Though not part of this problem, this table shows what the output looks like when deleting an item would increase alpha.

If the last item in this table were deleted, alpha would increase to .820, instead of the .686 for alpha with this item included.
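The "if item deleted" idea can be sketched in Python by recomputing alpha with each item left out in turn. The item names and responses are hypothetical, and the function names are assumptions rather than anything SPSS exposes.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, all in the same case order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance / variance(totals))

def alpha_if_item_deleted(items):
    """Return {item name: alpha of the scale with that item left out}."""
    names = list(items)
    if len(names) < 3:
        # Dropping an item from a two-item scale leaves a single item: no alpha
        return {name: None for name in names}
    return {name: cronbach_alpha([items[n] for n in names if n != name])
            for name in names}

# Hypothetical three-item scale in which item "c" fits poorly with the others
scale = {
    "a": [4, 5, 3, 2, 4, 1],
    "b": [5, 5, 2, 2, 3, 1],
    "c": [2, 3, 3, 1, 2, 3],
}

overall = cronbach_alpha(list(scale.values()))
deleted = alpha_if_item_deleted(scale)
print(round(overall, 3), {name: round(a, 3) for name, a in deleted.items()})
```

In this made-up data, removing item "c" gives a higher alpha than the full three-item scale, which is the situation the column is meant to reveal.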

Marking the Statement about Summated Scales

Since the variables loading on the second component did not satisfy the reliability requirement, we leave the check box unmarked.

Principal Components Analysis: Level of Measurement

Level of measurement ok (all variables metric)?

No: Do not mark check box for level of measurement. Mark: Inappropriate application of the statistic. Stop.

Yes: Mark check box for level of measurement.

Ordinal level variable treated as metric?

Yes: Consider limitation in discussion of findings.

Principal Components Analysis: Sample Size

Adequate sample size (at least 150 valid cases)?

Yes: Mark check box for sample size.

No: Do not mark check box for sample size. Consider limitation in discussion of findings.

Run Principal Components Analysis

Principal Components Analysis: Suitability for Factor Analysis - 1

Two or more correlations ≥ 0.30?

Yes: Mark check box for correlations.

No: Do not mark check box for correlations.

Probability for Bartlett test of sphericity ≤ alpha?

Yes: Mark check box for sphericity test.

No: Do not mark check box for sphericity test.

If a check fails: Stop, variables not good candidate for factor analysis

Principal Components Analysis: Suitability for Factor Analysis - 2

Sampling adequacy ≥ 0.50 for each variable?

No: Remove variable with lowest MSA. Run PCA again.

One variable remaining in analysis?

Yes: Do not mark check box for MSA.

KMO measure of sampling adequacy ≥ 0.50?

Yes: Mark check box for MSA.

No: Do not mark check box for MSA.

Principal Components Analysis: Anticipated Number of Factors

Correct number of factors supported by eigenvalues > 1.0 and the number of components needed to explain 60% of the variance?

Yes: Mark correct check box for number of factors.

No: Don’t mark check box for number of factors.

Today, this step provides information to the analyst about the potential solution. When factor analysis was calculated by hand, this step determined how one would do the calculations.

Principal Components Analysis: Excluding Variables for Low Communality

Yes

No

Mark check box for communality removal

Remove variable load that is only one loading on

component.


analysis? Yes

Stop, no viable factor solution

Run PCA again.

No

Communality for all variables ≥ 0.50?

Do not mark check box for communality removal

Principal Components Analysis: Excluding Variables for Complex Structure

Yes

No

Mark check box for complex structure removal


component.


analysis? Yes

Do not mark check box for complex structure removal


Run PCA again.

No

Simple structure (all variables load on single component)?

Principal Components Analysis: Excluding Variables for One-variable Components

Yes

No

Mark check box for one-variable component


component.


analysis? Yes

Do not mark check box for one-variable component


All components have more than one variable

loading?

Run PCA again.

No

Principal Components Analysis: Factor Structure

Correct number of components extracted?

Yes: Mark check box for number of components.

No: Do not mark check box for number of components.

Correct list of variables loaded on component?

Yes: Mark check box for loadings on component.

No: Do not mark check box for loadings on component.

Repeat this step for each component

Principal Components Analysis: Percent of Variance Explained

Components explain 60% or more of variance of included variables?

Yes: Mark check box for percent of variance.

No: Do not mark check box for percent of variance. Include as limitation in discussion of findings.

Principal Components Analysis: Impact of Outliers - 1

Starting here, we include only the variables in the factor solution.

Re-run factor analysis, requesting regression factor scores

Are any of the factor scores outliers (larger than ±3.0)?

No: No outliers, mark check box for no impact. Go to validation analysis.

Yes: Re-run factor analysis, excluding outliers.

Are all of the communalities excluding outliers greater than 0.50?

No: Do not mark check box for no impact. Stop, clarify which analysis should be reported.

Yes: Does the pattern of factor loadings excluding outliers match the pattern for the full data set?

No: Do not mark check box for no impact. Stop, clarify which analysis should be reported.

Yes: Mark check box for no impact. Re-run factor analysis, including all cases. Since outliers had no effect, there is no reason to exclude them from the analysis.
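The outlier screen in this flowchart can be sketched in Python. The factor scores are hypothetical standard scores, not values saved by SPSS, and the function name is an assumption.

```python
def flag_outliers(scores, cutoff=3.0):
    """Return the indices of cases with any factor score beyond ±cutoff."""
    return [i for i, row in enumerate(scores)
            if any(abs(s) > cutoff for s in row)]

# Hypothetical factor scores: one row per case, one column per component
factor_scores = [
    [0.42, -1.10],
    [3.25, 0.30],    # first component beyond +3.0
    [-0.75, 0.88],
    [1.90, -3.40],   # second component beyond -3.0
]

outliers = flag_outliers(factor_scores)
kept = [row for i, row in enumerate(factor_scores) if i not in outliers]

print(outliers)
print(len(kept))
```

The cases in kept are the ones that would enter the re-run excluding outliers. The full data set is restored afterwards, before any further calculations.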

Principal Components Analysis: Validation Analysis - 1

Compute split variable using specified random number seed

Run factor analysis, selecting cases where split = 0

Run factor analysis, selecting cases where split = 1

Are all of the communalities for both split samples greater than 0.50?

No: Do not mark check box for validation analysis. Stop, generalizability of findings is questionable.

Yes: Does the pattern of factor loadings for the split samples match the factor loadings for the full data set?

No: Do not mark check box for validation analysis. Stop, generalizability of findings is questionable.

Yes: Mark check box for generalizability.

Principal Components Analysis: Reliability Analysis

Compute Cronbach’s alpha for all components

Cronbach’s alpha greater than .70 for all components?

Yes: Mark check box for summated scales.

No: Cronbach’s alpha greater than .60 for all components?

Yes: Mark check box for summated scales. Add note of caution to interpretation.

No: Do not mark check box for summated scales.
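The reliability decision can be written as a small Python function. The function name and return strings are assumptions for illustration. The thresholds are the .70 and .60 limits used above, and the example values are the two alphas reported for this problem.

```python
def summated_scale_decision(alphas, preferred=0.70, minimum=0.60):
    """Apply the reliability rule to the alphas of all components."""
    if all(a > preferred for a in alphas):
        return "mark"
    if all(a > minimum for a in alphas):
        return "mark with caution"
    return "do not mark"

# Alphas reported above for component 1 and component 2
print(summated_scale_decision([0.814, 0.561]))
```

Because the rule requires every component to pass, one weak component is enough to leave the check box unmarked.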
