Slide 1 Principal Components Analysis Complete Problems

Upload: jeremy-taylor

Post on 11-Jan-2016

Page 1: Slide 1 Principal Components Analysis Complete Problems

Slide 1

Principal Components Analysis

Complete Problems

Slide 2

Complete Principal Components Analysis

We add three steps to the end of the principal components analysis testing basic relationships:

• Analysis of the impact of outliers

• Split-sample validation analysis

• Computation of Cronbach's alpha to measure the feasibility of using components as summated scales

Slide 3

Outliers

Outliers can change the factor structure found for a principal components analysis, creating the dilemma of determining which factor structure should be reported.

SPSS calculates factor scores as standard scores.

SPSS suggests that one way to identify outliers is to compute the factor scores and treat cases whose scores fall beyond ±3.0 as outliers.

If we find outliers in our analysis, we redo the analysis, omitting the cases that were outliers.

If the analysis excluding outliers still satisfies the requirement for communalities and the factor structure is the same as in the analysis with all cases, the outliers do not have an impact. If our factor solution changes, we will have to study the outlier cases to determine whether or not we should exclude them.

After testing for outliers, restore the full data set before any further calculations.
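
The outlier rule can be sketched in a few lines of Python. This is an illustration only, not SPSS output: the function name `flag_outliers` and the sample scores are hypothetical, and we assume a case counts as an outlier when any of its factor scores is at or beyond ±3.0.

```python
def flag_outliers(factor_scores, cutoff=3.0):
    """Return the indices of cases with any factor score at or beyond +/- cutoff.

    factor_scores holds one tuple of standardized factor scores per case.
    """
    outliers = []
    for case_index, scores in enumerate(factor_scores):
        if any(abs(score) >= cutoff for score in scores):
            outliers.append(case_index)
    return outliers


# Hypothetical factor scores for four cases (component 1, component 2).
scores = [(0.21, -0.40), (1.95, -3.42), (-0.88, 0.10), (2.99, 2.10)]
print(flag_outliers(scores))  # [1]
```

Because the factor scores are already standard scores, no further standardization is needed before applying the cutoff.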

Slide 4

Split Sample Validation

To test the generalizability of findings from a principal component analysis, we could conduct a second research study to see if our findings are verified.

A less costly alternative is to split the sample randomly into two halves, do the principal component analysis on each half and compare the results.

If the communalities and the factor loadings are the same on the analysis on each half and the full data set, we have evidence that the findings are generalizable and valid because, in effect, the two analyses represent a study and a replication.
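
A random split into two halves can be sketched as follows. This is a minimal Python illustration, not the SPSS procedure: the function name `split_halves` and the seed value are hypothetical, and we assume an exact half-and-half split driven by a fixed seed so the split can be reproduced.

```python
import random


def split_halves(n_cases, seed):
    """Randomly assign case indices 0..n_cases-1 to two halves, reproducibly."""
    rng = random.Random(seed)      # fixed seed gives the same split every time
    indices = list(range(n_cases))
    rng.shuffle(indices)
    half = n_cases // 2
    return sorted(indices[:half]), sorted(indices[half:])


# 509 cases as in this problem; the seed 12345 is only a placeholder.
first_half, second_half = split_halves(509, seed=12345)
print(len(first_half), len(second_half))  # 254 255
```

The principal components analysis is then run once on each half and the two results are compared with the full-sample solution.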

Slide 5

Misleading Results to Watch Out For

When we examine the communalities and factor loadings, we are matching up overall patterns, not exact results: the communalities should all be greater than 0.50 and the pattern of the factor loadings should be the same.

Sometimes the variables will switch their components (variables loading on the first component now load on the second and vice versa), but this does not invalidate our findings.

Sometimes, all of the signs of the factor loadings will reverse themselves (the pluses become minuses and the minuses become pluses), but this does not invalidate our findings because we interpret the size, not the sign, of the loadings.
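
Matching patterns rather than exact results can be made concrete with a small Python sketch. It reduces each loading matrix to the groups of variables that load on each component, so swapped components and reversed signs compare as equal. The function name `loading_pattern`, the 0.40 cutoff for a meaningful loading, and all loading values below are assumptions for illustration.

```python
def loading_pattern(loadings, threshold=0.40):
    """Reduce a loading matrix to the groups of variables loading on each component.

    loadings maps a variable name to a tuple of loadings, one per component.
    Signs are ignored (absolute values) and component order is ignored
    (the groups are collected in a set).
    """
    n_components = len(next(iter(loadings.values())))
    groups = set()
    for component in range(n_components):
        group = frozenset(
            name for name, row in loadings.items()
            if abs(row[component]) >= threshold
        )
        groups.add(group)
    return groups


# Hypothetical loadings: the half sample has swapped components and reversed signs.
full_sample = {"q76": (0.90, 0.15), "q77": (0.91, 0.12),
               "q83": (0.10, 0.82), "q84": (0.18, 0.83)}
half_sample = {"q76": (-0.12, -0.89), "q77": (-0.10, -0.92),
               "q83": (-0.80, -0.14), "q84": (-0.85, -0.11)}
print(loading_pattern(full_sample) == loading_pattern(half_sample))  # True
```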

Slide 6

When validation fails

If the validation fails, we are warned that the solution found in the analysis of the full data set is not generalizable and should not be reported as valid findings.

We do have some options when validation fails:

• If the problem is limited to one or two variables, we can remove those variables and redo the analysis.

• Randomly selected samples are not always representative. We might try some different random number seeds and see if our negative finding was a fluke. If we choose this option, we should do a large number of validations to establish a clear pattern, at least 5 to 10. Getting one or two validations to negate the failed validation and support our findings is not sufficient.

Slide 7

Reliability of Summated Scales

One of the common uses of factor analysis is the formation of summated scales, where we sum or average the scores on all the variables loading on a component to create the score for the component.

To verify that the variables for a component are measuring similar entities that are legitimate to add together, we compute Cronbach's alpha.

If Cronbach's alpha is 0.70 or greater (0.60 or greater for exploratory research), we have support for the internal consistency of the items, justifying their use in a summated scale.

Cronbach's alpha requires that all variables be coded in the same direction. If there are negative loadings on a component, the variable must be reverse coded to get the correct value for alpha.
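
As a rough illustration of what the reliability procedure computes, the sketch below implements the standard formula alpha = k/(k-1) × (1 - sum of item variances / variance of the summed scale) in plain Python. The function names and the sample responses are hypothetical, and the reverse-coding helper assumes the scale's lowest and highest possible values are known.

```python
from statistics import variance


def reverse_code(values, low, high):
    """Reverse-code responses on a scale running from low to high."""
    return [low + high - value for value in values]


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of responses per case."""
    k = len(items)
    totals = [sum(case) for case in zip(*items)]   # summated scale per case
    item_variance = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Hypothetical responses from five cases to two items on a 1-5 scale.
item_a = [1, 2, 3, 4, 5]
item_b = [2, 1, 4, 3, 5]
print(round(cronbach_alpha([item_a, item_b]), 3))  # 0.889
```

An item with a negative loading would be passed through `reverse_code` before being included in `items`.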

Slide 8

The Problem in BlackBoard

The problem statement tells us:

• the data set and variables included in the analysis

• the alpha for the statistical tests

• the seed number to use for the validation analysis

Slide 9

Statement about Level of Measurement

The first statement in the problem asks about level of measurement. Principal components analysis requires that all of the variables included in the analysis are metric.

Slide 10

Marking the Statement about Level of Measurement

All of the variables included in the analysis are ordinal level. We will employ the common convention of treating ordinal variables as metric variables, but we should consider mentioning this as a limitation to the analysis.

Since we treated all variables as metric, we mark the check box.

Slide 11

Statement about Sample Size

We will use the minimum sample size requirement of 150 valid cases recommended by Tabachnick and Fidell (1996).

Slide 12

Run the Principal Components Analysis - 1

Select the Factor command from the Analyze > Data Reduction menu.

To answer the question about the sample size, we run the first principal components analysis.

Slide 13

Run the Principal Components Analysis - 2

First, move the variables listed in the problem to the Variables list box.

Next, click on the Descriptives button to request the statistics needed to evaluate the suitability of the data for factor analysis.

Slide 14

Run the Principal Components Analysis - 3

First, mark the check box for Univariate Statistics to get the number of valid cases for the analysis.

Second, mark the check boxes for the statistics for the suitability of factor analysis:

• Coefficients of the correlation matrix,

• KMO and Bartlett’s test of sphericity, and

• Anti-image correlation matrix.

Third, click on the Continue button to close the Factor Analysis: Descriptives dialog box.

Slide 15

Run the Principal Components Analysis - 4

Click on the Extraction button to tell SPSS what method it should use to extract the factors.

Slide 16

Run the Principal Components Analysis - 5

We will use the default method of Principal Components. The drop down list contains numerous other methods.

We accept the other defaults for displaying the unrotated factor solution and extracting eigenvalues over 1.

Click on the Continue button to close the dialog box.

Slide 17

Run the Principal Components Analysis - 6

Click on the Rotation button to tell SPSS what method it should use to rotate the factors to clarify the interpretation.

Slide 18

Run the Principal Components Analysis - 7

We mark the option button for the Varimax rotation, an orthogonal rotation that keeps the factors independent of each other.

Click on the Continue button to close the dialog box.

Slide 19

Run the Principal Components Analysis - 8

Having specified the analysis, click on the OK button to produce the output.

Slide 20

Output for Sample Size Requirement

The 509 cases available for this principal components analysis satisfy the minimum sample size requirement of 150 valid cases recommended by Tabachnick and Fidell (1996).

Slide 21

Marking the Statement about Sample Size

Since we satisfied the minimum sample size requirement, we mark the statement.

If we did not satisfy the sample size requirement, we should consider mentioning this fact as a limitation to the analysis. Factor analysis can be numerically unstable when the sample size is small.

Slide 22

The Statement about Suitability for Factor Analysis: Sufficient Correlations

Principal components analysis requires that more than one of the correlations between the variables included in the analysis be greater than 0.30.

Slide 23

Sufficient Correlations in Correlation Matrix

For this set of variables, there are 9 correlations in the matrix greater than 0.30.
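
Counting the correlations by eye is easy to get wrong, so the check can also be scripted. The Python sketch below computes each pairwise Pearson correlation and counts those above the cutoff. The function names and data are hypothetical, and we assume that a strong negative correlation counts as well (absolute values are compared). With six variables there are 6 × 5 / 2 = 15 unique pairs to examine.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def count_large_correlations(columns, cutoff=0.30):
    """Count variable pairs whose correlation exceeds the cutoff in absolute value."""
    return sum(
        1
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) > cutoff
    )


# Hypothetical data: "a" and "b" are perfectly correlated, "c" is unrelated to both.
data = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [1, -1, 0, -1, 1]}
print(count_large_correlations(data))  # 1
```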

Slide 24

Marking the Statement about Sufficient Correlations

Since there are 9 correlations greater than 0.30, we mark the statement.

Slide 25

The Statement about Suitability for Factor Analysis: Test of Sphericity

Bartlett’s test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix with 1’s, or perfect correlations, on the main diagonal, and 0’s for all of the remaining elements.

If this is true, the variables are not correlated and the factor analysis will not work.

Our goal in this test is to reject the null hypothesis, supporting the contention that there are sufficient correlations, or similarity of values, among the variables that several can be combined into a factor or component.

Slide 26

Bartlett’s Test of Sphericity

Principal component analysis requires that the probability associated with Bartlett's Test of Sphericity (χ²(df=15, N = 509) = 854.15, p < .001) be less than or equal to the level of significance (0.05). The probability associated with the Bartlett Test satisfies this requirement.
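
For readers who want to see where the chi-square value comes from, the sketch below computes Bartlett's statistic from a correlation matrix using the usual formula χ² = -(N - 1 - (2p + 5)/6) × ln|R|, with p(p - 1)/2 degrees of freedom. It is a plain-Python illustration with hypothetical function names; it does not compute the p-value, which would come from the chi-square distribution.

```python
from math import log


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_cases):
    """Return (chi-square, degrees of freedom) for Bartlett's test of sphericity."""
    p = len(corr)
    chi_square = -(n_cases - 1 - (2 * p + 5) / 6) * log(determinant(corr))
    df = p * (p - 1) // 2
    return chi_square, df


# Two variables correlated at 0.5, 101 cases (hypothetical values).
print(bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], 101))
```

With the six variables in this problem the degrees of freedom are 6 × 5 / 2 = 15, which matches the df reported in the SPSS output.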

Slide 27

Marking the Statement about Bartlett’s Test of Sphericity

Since the probability associated with the Bartlett Test is sufficient to reject the null hypothesis, we mark the check box.

Slide 28

The Statement about Suitability for Factor Analysis: Sampling Adequacy

Sampling adequacy predicts if data are likely to factor well, based on correlation and partial correlation.

The Kaiser-Meyer-Olkin Measure of Sampling Adequacy (MSA) must be greater than 0.50 for each individual variable as well as the set of variables. Variables that do not have an MSA of .50 or greater are removed from the analysis one at a time, until all variables and the overall measure are above .50.
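
The MSA compares squared correlations with squared partial correlations: MSA = Σr² / (Σr² + Σq²), where the partial correlations q are taken from the inverse of the correlation matrix. The plain-Python sketch below follows that definition. The function names are hypothetical, and it assumes an invertible correlation matrix with at least one non-zero off-diagonal correlation.

```python
from math import sqrt


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sampling_adequacy(corr):
    """Return (overall MSA, list of MSA values per variable)."""
    n = len(corr)
    inv = invert(corr)
    # Partial correlation between i and j, controlling for all other variables.
    partial = [[-inv[i][j] / sqrt(inv[i][i] * inv[j][j]) for j in range(n)]
               for i in range(n)]
    per_variable, total_r, total_q = [], 0.0, 0.0
    for i in range(n):
        r_sum = sum(corr[i][j] ** 2 for j in range(n) if j != i)
        q_sum = sum(partial[i][j] ** 2 for j in range(n) if j != i)
        per_variable.append(r_sum / (r_sum + q_sum))
        total_r += r_sum
        total_q += q_sum
    return total_r / (total_r + total_q), per_variable


# Three hypothetical variables that all correlate 0.5 with one another.
overall, individual = sampling_adequacy(
    [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]])
print(round(overall, 3))  # 0.692
```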

Slide 29

Measures of Sampling Adequacy for Individual Variables

In the initial iteration for suitability of principal components analysis, the MSA for all of the individual variables was greater than 0.50 ("information and knowledge are shared openly within this organization" [q76] - .70; "an effort is made to get the opinions of people throughout the organization" [q77] - .69; "our web site is easy to use and contains helpful information" [q83] - .76; "I have a good understanding of our mission, vision, and strategic plan" [q84] - .73; "I believe we communicate our mission effectively to the public" [q85] - .81; and "my organization encourages me to be involved in my community" [q86] - .84).

Note: Not all MSA’s are shown on this slide.

Slide 30

Kaiser-Meyer-Olkin Measure of Sampling Adequacy

In addition, the overall MSA for the set of variables included in the analysis was 0.75, which exceeds the minimum requirement of 0.50 for overall MSA.

Slide 31

Marking the Statement about Measures of Sampling Adequacy

Since the sampling adequacy measures met the criteria for both individual variables and overall, the check box is marked.

Slide 32

Statement about Initial Number of Factors

Various tests are used to estimate the number of factors to be extracted. This was very important when factor analysis was calculated by hand.

Two of the criteria were the latent root criterion, which was based on the number of eigenvalues greater than 1.0, and the cumulative proportion of variance criterion, which calculated the number of components needed to explain 60% or more of the total variance in the original set of variables.

The problem offers two possible responses.

Slide 33

Initial Number of Factors: Eigenvalues Greater than One

The latent root criterion for number of factors to extract would indicate that there were 2 components to be extracted for these variables, since there were 2 eigenvalues greater than 1.0 (2.84, and 1.05).

Slide 34

Initial Number of Factors: Percentage of Variance Explained

In addition, the cumulative proportion of variance criterion, which requires explaining 60% or more of the total variance in the original set of variables, can be met with 2 components. A 2-component solution would explain an estimated 64.86% of the total variance.
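
Both criteria are simple functions of the eigenvalues, as the Python sketch below illustrates. The function names are hypothetical. In the example, only the first two eigenvalues (2.84 and 1.05) come from the output above; the remaining four are placeholders chosen so that the six eigenvalues sum to 6, the number of variables.

```python
def components_by_latent_root(eigenvalues, cutoff=1.0):
    """Number of components with an eigenvalue greater than the cutoff."""
    return sum(1 for value in eigenvalues if value > cutoff)


def components_by_cumulative_variance(eigenvalues, target=0.60):
    """Smallest number of components whose cumulative variance reaches the target."""
    total = sum(eigenvalues)
    running = 0.0
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += value
        if running / total >= target:
            return count
    return len(eigenvalues)


# First two eigenvalues are from the output; the last four are hypothetical.
eigenvalues = [2.84, 1.05, 0.70, 0.60, 0.45, 0.36]
print(components_by_latent_root(eigenvalues))          # 2
print(components_by_cumulative_variance(eigenvalues))  # 2
```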

Slide 35

Marking the Statement about Initial Number of Factors

Since the SPSS default is to extract the number of components indicated by the latent root criterion, our initial factor solution will be based on the extraction of 2 components.

We mark the second statement in the pair.

Note: the question is worded to indicate that both criteria suggest the same number of factors. Should they suggest a different number of factors, neither statement would be marked, but we would still continue with the factor analysis using the number of factors suggested by the latent root criterion.

Slide 36

Statement about First Iteration of Factor Extraction

The problem suggests that the first iteration of the factor solution included a variable ("my organization encourages me to be involved in my community" [q86]) that should be excluded, either because it did not satisfy the requirement for communalities or because it violated simple structure.

Slide 37

Output for Communalities on First Iteration

Examination of the first principal components model extracted by SPSS resulted in the removal of the variable "my organization encourages me to be involved in my community" [q86] from the analysis. The variable was removed because its communality (.467) meant that the factor solution explained less than half of the variable's variance, which is below the minimum requirement that the factor solution explain at least 50% of the variance in the original variable.

Slide 38

Marking the Statement about First Iteration of Factor Extraction

My organization encourages me to be involved in my community [q86] was removed because it did not satisfy the requirement for communalities, i.e. the factors should explain at least 50% of the variance in the variable. Since we have already determined that the variable is to be removed, it was not necessary to check the factor loadings for simple structure. The first statement in the pair is marked.

Slide 39

Removing a Variable from the Factor Analysis - 1

To remove the variable, my organization encourages me to be involved in my community [q86], we select Factor Analysis from the Dialog Recall drop down menu.

Slide 40

Removing a Variable from the Factor Analysis - 2

To remove the variable, highlight the target variable in the Variables list box, and click on the arrow button pointing to the left.

Slide 41

Removing a Variable from the Factor Analysis - 3

Since all of the other specifications for the analysis remain the same, click on the OK button to produce the output for the second iteration.

Slide 42

Statement about Second Iteration of Factor Extraction

The problem suggests that the second iteration of the factor solution included a variable ("I believe we communicate our mission effectively to the public" [q85]) that should be excluded, either because it did not satisfy the requirement for communalities or because it violated simple structure.

Slide 43

Output for Communalities on Second Iteration

Examination of the second principal components model extracted by SPSS produced a table of Communalities in which all variables have the required minimum of .50.

Slide 44

Output for Factor Structure on Second Iteration

Examination of the second principal components model extracted by SPSS resulted in the removal of the variable "I believe we communicate our mission effectively to the public" [q85] from the analysis. The variable "I believe we communicate our mission effectively to the public" [q85] had loadings of 0.40 or higher on component 1 (.526) and component 2 (.536).

Multiple high loadings violate the requirement for simple structure, so this variable was removed from the analysis.
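
The two screening rules used in each iteration can be written down compactly. In the Python sketch below (hypothetical function names), the communality is the sum of the squared loadings across the extracted components, and a variable violates simple structure when it loads at 0.40 or higher, in absolute value, on more than one component. The example uses the two loadings reported for q85.

```python
def communality(loadings):
    """Proportion of a variable's variance explained by the extracted components."""
    return sum(loading ** 2 for loading in loadings)


def violates_simple_structure(loadings, threshold=0.40):
    """True when the variable loads at or above the threshold on several components."""
    high_loadings = sum(1 for loading in loadings if abs(loading) >= threshold)
    return high_loadings > 1


q85 = (0.526, 0.536)                   # loadings on components 1 and 2
print(round(communality(q85), 3))      # 0.564
print(violates_simple_structure(q85))  # True
```

This shows why q85 passed the communality check on the second iteration (0.564 is above 0.50) but still had to be removed for loading on both components.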

Slide 45

Marking the Statement about Second Iteration of Factor Extraction

I believe we communicate our mission effectively to the public [q85] was removed because it did not satisfy the requirement for simple structure, so the first statement in the pair is marked.

Slide 46

Removing a Variable from the Factor Analysis - 1

To remove the variable, I believe we communicate our mission effectively to the public [q85], we select Factor Analysis from the Dialog Recall drop down menu.

Slide 47

Removing a Variable from the Factor Analysis - 2

To remove the variable, highlight the target variable in the Variables list box, and click on the arrow button pointing to the left.

Slide 48

Removing a Variable from the Factor Analysis - 3

Since all of the other specifications for the analysis remain the same, click on the OK button to produce the output for the third iteration.

Slide 49

Statement about Third Iteration of Factor Extraction

The problem indicates that no variables were removed on the third iteration of the factor extraction and that the solution met all of the requirements for a factor analysis solution:

• all the variables remaining in the analysis had communalities above 0.50,

• all the variables demonstrated simple structure, and

• each component had more than one variable loading on it.

Slide 50

Output for Communalities on Third Iteration - 1

Examination of the third principal components model extracted by SPSS produced a table of Communalities in which all four variables have the required minimum of .50.

Slide 51

Output for Factor Structure on Third Iteration - 2

Examination of the third principal components model extracted by SPSS did not show any variables having a loading of .40 or higher on both of the components.

Slide 52

Output for Factor Structure on Third Iteration - 3

Each of the components has two variables loading on it.

If a component had only one variable loading on it, it would make more sense to use the original variable in subsequent analyses rather than the component.

Slide 53

Marking the Statement about Third Iteration of Factor Extraction

On the third iteration, all of the requirements for a factor solution were satisfied.

Two components can be substituted for the 4 variables that were not excluded from the analysis.

Since the final solution found two components, we mark the statement.

Slide 54

Statement about Variables Loading on the First Component

Two options are given which suggest different combinations of variables loading on the first component.

Slide 55

Output for Component One

Component 1 included the variables: "information and knowledge are shared openly within this organization" [q76] (loading = .901); and "an effort is made to get the opinions of people throughout the organization" [q77] (loading = .912). We can substitute one component variable for this combination of variables in further analyses.

Since more than one component was extracted, the factor structure is based on the "Rotated Component Matrix"

Slide 56

Marking the Statement about Variables Loading on the First Component

Component 1 included the variables: "information and knowledge are shared openly within this organization" [q76] and "an effort is made to get the opinions of people throughout the organization" [q77]. We mark the first statement in the pair.

Slide 57

Statement about Variables Loading on the Second Component

Two options are given which suggest different combinations of variables loading on the second component.

Slide 58

Output for Component Two

Component 2 included the variables: "our web site is easy to use and contains helpful information" [q83] (loading = .821); and "I have a good understanding of our mission, vision, and strategic plan" [q84] (loading = .833). We can substitute one component variable for this combination of variables in further analyses.

Since more than one component was extracted, the factor structure is based on the "Rotated Component Matrix"

Slide 59

Marking the Statement about Variables Loading on the Second Component

Component 2 included the variables: "our web site is easy to use and contains helpful information" [q83] and "I have a good understanding of our mission, vision, and strategic plan" [q84]. We mark the first statement in the pair.

Slide 60

Statement about Percentage of Variance Explained by Factors

The final statement questions whether or not the factor solution met the standard of explaining 60% of the variance in the variables that were replaced.

Slide 61

Output for Percentage of Variance Explained by Factors

The components explain 77.25% of the total variance in the variables which are included on the components. This percentage of variance explained satisfies the goal of explaining 60% or more of the total original variance in the variables.

Slide 62

Marking the Statement about Percentage of Variance Explained by Factors

Since the percentage of variance explained by the factors satisfies the goal of explaining 60% or more of the total original variance in the variables the components will replace, we mark the final statement.

Slide 63

Statement about Outliers

The next statement requires us to determine whether or not there are any outliers in the results of the principal components analysis. If outliers are found, they are removed from the analysis and the results computed again. If the factor solution is the same as that based on all cases, we conclude that outliers do not have any impact and we report the results based on all cases.

If the solution without outliers is different, we face the difficult decision of which factor structure should be reported. In our problems, we will halt the analysis.

Slide 64

Detecting Outliers - 1

To detect outliers, we compute the factor scores in SPSS.

Select the Factor Analysis command from the Dialog Recall tool button

Slide 65

Detecting Outliers - 2

Click on the Scores… button to access the factor scores dialog box.

The only change we need to make is to request that SPSS compute the factor scores.

Slide 66

Detecting Outliers - 3

First, click on the Save as variables checkbox to create factor variables.

Second, accept the default method using a Regression equation to calculate the scores.

Third, click on the Continue button to complete the specifications.

Slide 67

Detecting Outliers - 4

Click on the OK button to compute the factor scores.

Slide 68

Outliers in the Data Editor

SPSS creates the factor score variables in the data editor window. It names the first factor score “FAC1_1,” and the second factor score “FAC2_1.”

We need to check to see if we have any values for either factor score that fall beyond ±3.0. One way to check for the presence of large values indicating outliers is to sort the factor variables and see if any fall outside the acceptable range.

Should you forget to delete the factor scores from the previous analysis, SPSS will alter the final digit in the factor name, i.e. instead of naming it FAC1_1, it will name it FAC1_2.

Slide 69

Sort the data to locate outliers for factor one

First, select the FAC1_1 column by clicking on its header.

Second, right click on the column header and select the Sort Ascending command from the drop down menu.

Slide 70

Negative outliers for factor one

Scroll down past the cases for whom factor scores could not be computed because of missing data.

We see that none of the scores for factor one are less than or equal to -3.0, so there are no outliers detected yet.

Slide 71

Positive outliers for factor one

Scrolling down to the bottom of the sorted data set, we see that none of the scores for factor one are greater than or equal to +3.0.

There are no outliers on factor one.

Slide 72

Sort the data to locate outliers on factor two

First, select the FAC2_1 column by clicking on its header.

Second, right click on the column header and select the Sort Ascending command from the drop down menu.

Slide 73

Negative outliers for factor two

Scrolling down past the cases for whom factor scores could not be computed, we see that there are five cases that have a factor score less than or equal to -3.0 on factor 2.

Slide 74

Positive outliers for factor two

Scrolling down to the bottom of the sorted data set, we see that none of the scores for factor two are greater than or equal to +3.0.

We will run the analysis excluding the five negative outliers, and see if it changes our interpretation of the analysis.

Slide 75

Removing the outliers

To see whether or not outliers are having an impact on the factor solution, we will compute the factor analysis without the outliers and compare the results.

To remove the outliers, we will include the cases that are not outliers.

Choose the Select Cases… command from the Data menu.

Slide 76

Setting the If condition

First, mark the option button for the If condition is satisfied.

Second, click on the If… button to enter the formula for selecting cases to include in the analysis.

Slide 77

Formula to select cases that are not outliers

First, type the formula as shown. The formula says: include cases if the absolute values of both the first and second factor scores are less than 3.0.

Second, click on the Continue button to complete the specification.

Slide 78

Complete the select cases command

Having entered the formula for including cases, click on the OK button to complete the selection.

SPSS writes the formula we entered next to the IF button.

Slide 79

The outliers selected out of the analysis

The cases with missing data are also excluded because they do not satisfy the criteria in the formula.

When SPSS selects a case out of the data analysis, it draws a slash through the case number. The cases that we identified as outliers will be excluded.
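
The logic of the selection formula, including why cases with missing data drop out, can be sketched in Python. This is an illustration rather than SPSS syntax: the function name `keep_case` is hypothetical, and a missing factor score is represented by `None`.

```python
def keep_case(fac1, fac2, cutoff=3.0):
    """Mirror the Select Cases condition: keep a case only if both
    factor scores are present and smaller than the cutoff in absolute value."""
    if fac1 is None or fac2 is None:
        return False          # a missing score cannot satisfy the condition
    return abs(fac1) < cutoff and abs(fac2) < cutoff


# Hypothetical cases: (FAC1_1, FAC2_1)
cases = [(0.50, -2.90), (0.50, -3.00), (None, 0.10)]
print([keep_case(f1, f2) for f1, f2 in cases])  # [True, False, False]
```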

Slide 80

Repeating the factor analysis

To repeat the factor analysis without the outliers, select the Factor Analysis command from the Dialog Recall tool button

Slide 81

Stopping SPSS from computing factor scores again

On the last factor analysis, we included the specification to compute factor scores. Since we do not need to do this again, we will remove the specification.

Click on the Scores… button to access the factor scores dialog.

Slide 82

Clearing the command to save factor scores

First, clear the Save as variables checkbox. This will deactivate the Method options.

Second, click on the Continue button to complete the specification

Slide 83

Computing the factor analysis

To produce the output for the factor analysis excluding outliers, click on the OK button.

Slide 84

Comparing communalities

All of the communalities for the factor analysis excluding outliers satisfy the minimum requirement of being larger than 0.50.

All of the communalities for the factor analysis including all cases satisfy the minimum requirement of being larger than 0.50.

Though the communalities for each variable are slightly smaller when we excluded outliers, we would not alter our interpretation of the role of these four variables in the solution.

Page 85: Slide 1 Principal Components Analysis Complete Problems

Slide 85

Comparing factor loadings

The pattern of variable loadings on components did not change when the outliers were removed. Component 1 included the variables: "information and knowledge are shared openly within this organization" [q76] and "an effort is made to get the opinions of people throughout the organization" [q77]. Component 2 included the variables: "our web site is easy to use and contains helpful information" [q83] and "I have a good understanding of our mission, vision, and strategic plan" [q84].

The factor loadings for the factor analysis including all cases are shown on the left.

The factor loadings for the factor analysis excluding outliers are shown on the right.
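
Comparing the two loading tables amounts to checking that each variable has its largest loading on the same component in both runs. The Python sketch below illustrates that check; the function `loading_pattern` and every loading value are hypothetical and are not taken from the SPSS output.

```python
def loading_pattern(loadings):
    """Map each variable to the component on which its absolute loading is largest."""
    pattern = {}
    for variable, row in loadings.items():
        strongest = max(range(len(row)), key=lambda i: abs(row[i]))
        pattern[variable] = strongest + 1  # components are numbered from 1
    return pattern

# Hypothetical rotated loadings: [component 1, component 2].
all_cases = {"q76": [0.90, 0.15], "q77": [0.88, 0.20], "q83": [0.10, 0.85], "q84": [0.25, 0.80]}
no_outliers = {"q76": [0.89, 0.17], "q77": [0.87, 0.21], "q83": [0.12, 0.84], "q84": [0.27, 0.78]}

same_pattern = loading_pattern(all_cases) == loading_pattern(no_outliers)
print(same_pattern)
```

This simple comparison assumes the components come out in the same order in both runs; if the order were swapped, the groupings of variables would have to be compared instead of the component numbers.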

Page 86: Slide 1 Principal Components Analysis Complete Problems

Slide 86

Marking the Statement about Outliers

The presence of outliers did not alter the factor solution. The factor solution based on all cases should be used in further analyses.

We mark the check box for no impact due to outliers.

Had the factor solution changed, we would have halted the analysis until we could understand the problem further.

Page 87: Slide 1 Principal Components Analysis Complete Problems

Slide 87

Statement about Generalizability

Since factor analysis tends to over-fit the data used to develop the model at the expense of generalizability, we will test generalizability with a split sample validation strategy. In this strategy, we divide the sample in half, conduct the factor analysis on each half, and compare the results to the analysis on the full data set.

Page 88: Slide 1 Principal Components Analysis Complete Problems

Slide 88

Deleting the Factor Scores

Before we do the split-sample validation, we will delete the factor scores that we used to detect outliers.

First, highlight the columns containing the factor scores.

Second, select the Clear command from the Edit menu.

Page 89: Slide 1 Principal Components Analysis Complete Problems

Slide 89

Restoring All Cases to the Analysis - 1

We removed cases that were detected as outliers. Before doing our validation, we need to restore these cases to subsequent analyses.

Select the Select Cases command from the Data menu.

Page 90: Slide 1 Principal Components Analysis Complete Problems

Slide 90

Restoring All Cases to the Analysis - 2

First, click on the All cases option button.

Second, click on the OK button to restore the cases.

Page 91: Slide 1 Principal Components Analysis Complete Problems

Slide 91

All Cases Restored to the Data Set

The slash lines are removed from the case numbers, indicating that all cases are available to the analysis.

Page 92: Slide 1 Principal Components Analysis Complete Problems

Slide 92

Split-sample validation

We validate our analysis by conducting an analysis on each half of the sample. We compare the results of these two split sample analyses with the analysis of the full data set.

To split the sample into two halves, we generate a random variable that indicates which half of the sample each case should be placed in.

To compute a random selection of cases, we need to specify the starting value, or random number seed. Otherwise, the random sequence of numbers that you generate will not match mine, and we will get different results.

Before we do the random selection, you must make certain that your data set is sorted in the original sort order, or the cases in your two half samples will not match mine. To make certain your data set is in the same order as mine, sort your data set in ascending order by case id.

Page 93: Slide 1 Principal Components Analysis Complete Problems

Slide 93

Sorting the data set in ascending order

To make certain the data set is sorted in the original order, highlight the case id column, right click on the column header, and select the Sort Ascending command from the popup menu.

Page 94: Slide 1 Principal Components Analysis Complete Problems

Slide 94

Setting the random number seed - 1

To set the random number seed, select the Random Number Generators… command from the Transform menu.

NOTE: you must use the random number seed that is stated in the problem in order to produce the same results that I found. Any other seed will generate a different random sequence that can produce results that are very different from mine.

Page 95: Slide 1 Principal Components Analysis Complete Problems

Slide 95

Setting the random number seed - 2

First, mark the check box for Set Starting Point.

Second, select the option button for a Fixed Value.

Third, type the seed number provided in the problem directions: 291769.

Fourth, click on the OK button to complete the action.

NOTE: SPSS does not provide any feedback that the seed has been set or changed. If you are in doubt, you can reopen the dialog box and see what it indicates.

Page 96: Slide 1 Principal Components Analysis Complete Problems

Slide 96

Computing the split variable - 1

To enter the formula for the variable that will split the sample in two parts, click on the Compute… command.

Page 97: Slide 1 Principal Components Analysis Complete Problems

Slide 97

Computing the split variable - 2

First, type the name for the new variable, split, into the Target Variable text box.

Second, the formula for the value of split is shown in the text box.

The uniform(1) function generates a random decimal number between 0 and 1. The random number is compared to the value 0.50.

If the random number is less than or equal to 0.50, the value of the formula will be 1, the SPSS numeric equivalent to true. If the random number is larger than 0.50, the formula will return a 0, the SPSS numeric equivalent to false.

Third, click on the OK button to complete the dialog box.
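
The logic of the split variable can be illustrated in Python. This is only a sketch of the idea: Python's random number generator is not the SPSS generator, so the same seed will not reproduce the SPSS sequence, and the number of cases used here is hypothetical.

```python
import random

random.seed(291769)  # fixed seed so the same split is produced on every run

n_cases = 10  # hypothetical number of cases, assumed sorted by case id

# A random number at or below 0.50 gives 1 (true); a larger one gives 0 (false).
split = [1 if random.random() <= 0.50 else 0 for _ in range(n_cases)]
print(split)
```

Because the draw is random, the two groups will usually be close to, but not exactly, half of the sample each.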

Page 98: Slide 1 Principal Components Analysis Complete Problems

Slide 98

The split variable in the data editor

In the data editor, the split variable shows a random pattern of zeros and ones.

To select half of the sample for each validation analysis, we will first select the cases where split = 0, then select the cases where split = 1.

Page 99: Slide 1 Principal Components Analysis Complete Problems

Slide 99

Repeating the analysis with the first validation sample

To repeat the principal component analysis for the first validation sample, select Factor Analysis from the Dialog Recall tool button.

Page 100: Slide 1 Principal Components Analysis Complete Problems

Slide 100

Using "split" as the selection variable

First, scroll down the list of variables and highlight the variable split.

Second, click on the right arrow button to move the split variable to the Selection Variable text box.

Page 101: Slide 1 Principal Components Analysis Complete Problems

Slide 101

Setting the value of split to select cases

When the variable named split is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt us to enter a specific value for split.

Click on the Value… button to enter a value for split.

Page 102: Slide 1 Principal Components Analysis Complete Problems

Slide 102

Completing the value selection

First, type the value for the first half of the sample, 0, into the Value for Selection Variable text box.

Second, click on the Continue button to complete the value entry.

Page 103: Slide 1 Principal Components Analysis Complete Problems

Slide 103

Requesting output for the first validation sample

When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 0 for the split variable.

Click on the OK button to request the output.

Since the validation analysis requires us to compare the results of the analysis using the two split samples, we will request the output for the second sample before doing any comparison.

Page 104: Slide 1 Principal Components Analysis Complete Problems

Slide 104

Repeating the analysis with the second validation sample

To repeat the principal component analysis for the second validation sample, select Factor Analysis from the Dialog Recall tool button.

Page 105: Slide 1 Principal Components Analysis Complete Problems

Slide 105

Setting the value of split to select cases

Since the split variable is already in the Selection Variable text box, we only need to change its value.

Click on the Value… button to enter a different value for split.

Page 106: Slide 1 Principal Components Analysis Complete Problems

Slide 106

Completing the value selection

First, type the value for the second half of the sample, 1, into the Value for Selection Variable text box.

Second, click on the Continue button to complete the value entry.

Page 107: Slide 1 Principal Components Analysis Complete Problems

Slide 107

Requesting output for the second validation sample

When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split variable.

Click on the OK button to request the output.

Page 108: Slide 1 Principal Components Analysis Complete Problems

Slide 108

Comparing communalities

All of the communalities for the first split sample satisfy the minimum requirement of being larger than 0.50.

Note how SPSS identifies for us which cases we selected for the analysis.

All of the communalities for the second split sample satisfy the minimum requirement of being larger than 0.50.

Page 109: Slide 1 Principal Components Analysis Complete Problems

Slide 109

Comparing factor loadings

The pattern of factor loadings for both split samples shows the variables "information and knowledge are shared openly within this organization" [q76] and "an effort is made to get the opinions of people throughout the organization" [q77] loading on component one, and "our web site is easy to use and contains helpful information" [q83] and "I have a good understanding of our mission, vision, and strategic plan" [q84] loading on the second component.

Page 110: Slide 1 Principal Components Analysis Complete Problems

Slide 110

Marking the Statement about Generalizability

All of the communalities in both validation samples met the criteria.

The pattern of loadings for both validation samples is the same, and the same as the pattern for the analysis using the full sample.

In effect, we have done the same analysis on two separate sub-samples of cases and obtained the same results.

This validation analysis supports a finding that the results of this principal component analysis are generalizable to the population represented by this data set. We mark the check box.

Page 111: Slide 1 Principal Components Analysis Complete Problems

Slide 111

Statement about Summated Scales

The next statement indicates that we can form a summated scale from the variables loading on a component, i.e. by summing or averaging the scores for the variables.

The utility of summated scales is measured by Cronbach’s alpha, which should minimally be greater than 0.60, and preferably be greater than 0.70.
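
For readers who want to see what the statistic measures, the sketch below computes alpha in Python from the item variances and the variance of the summated score. It uses the standard textbook definition rather than anything specific to SPSS; the function name `cronbach_alpha` and the response data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of equal-length score lists, one list per scale item."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(case) for case in zip(*items)]  # summated scale score per case
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical responses from five cases to two items.
item_a = [1, 2, 3, 4, 5]
item_b = [2, 2, 3, 5, 5]
alpha = cronbach_alpha([item_a, item_b])
print(round(alpha, 3))
```

Alpha rises toward 1 as the items vary together, which is why it is read as a measure of the internal consistency of the scale.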

Page 112: Slide 1 Principal Components Analysis Complete Problems

Slide 112

Computing Cronbach's Alpha

To compute Cronbach's alpha for each component in our analysis, we select Scale > Reliability Analysis… from the Analyze menu.

Page 113: Slide 1 Principal Components Analysis Complete Problems

Slide 113

Selecting the variables for the first component

First, move the two variables that loaded on the first component (q76 and q77) to the Items list box.

Second, click on the Statistics… button to select the statistics we will need.

Page 114: Slide 1 Principal Components Analysis Complete Problems

Slide 114

Selecting the statistics for the output

First, mark the checkboxes for Item, Scale, and Scale if item deleted.

Second, click on the Continue button.

Page 115: Slide 1 Principal Components Analysis Complete Problems

Slide 115

Completing the specifications

First, if Alpha is not selected as the Model in the drop-down menu, select it now.

Second, click on the OK button to produce the output.

Page 116: Slide 1 Principal Components Analysis Complete Problems

Slide 116

Cronbach's Alpha

The reliability for component 1 as measured by Cronbach's alpha is 0.814, which is greater than the generally agreed upon lower limit of 0.70. The variables included on this component ("information and knowledge are shared openly within this organization" and "an effort is made to get the opinions of people throughout the organization") can be used in a summated scale.

Page 117: Slide 1 Principal Components Analysis Complete Problems

Slide 117

Computing Cronbach's Alpha

To compute Cronbach's alpha for the second scale, we select Reliability Analysis from the Dialog Recall menu.

Page 118: Slide 1 Principal Components Analysis Complete Problems

Slide 118

Selecting the variables for the second component

First, remove the variables that loaded on the first component and move the two variables that loaded on the second component to the Items list box.

Second, since we want the same output we had for the first component, we only need to click on the OK button to produce the output.

Page 119: Slide 1 Principal Components Analysis Complete Problems

Slide 119

Cronbach's Alpha

The reliability for component 2 as measured by Cronbach's alpha is 0.561, which is less than the generally agreed upon lower limit of 0.70, and even less than the 0.60 lower limit for exploratory research. A summated scale based on these variables ("our web site is easy to use and contains helpful information" and "I have a good understanding of our mission, vision, and strategic plan") should not be used.

Page 120: Slide 1 Principal Components Analysis Complete Problems

Slide 120

Cronbach's Alpha if Item Deleted - 1

If alpha is too small, the Cronbach’s Alpha if Item Deleted column may suggest which variable should be removed to improve the internal consistency of the scale variables. It tells us what alpha we would get if the variable listed were removed from the scale.

In this example, it does not produce a result because there are only two items and the removal of one would result in a one-item scale, which is not useful.
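
The reason can be seen from the usual formula for Cronbach's alpha, given here as general background rather than taken from the slides. In it, k is the number of items, \sigma_i^2 is the variance of item i, and \sigma_T^2 is the variance of the summated scale:

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_T^{2}}\right)
```

With two items, deleting one leaves k = 1, so the factor k/(k-1) is undefined and no alpha can be computed for the remaining item.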

Page 121: Slide 1 Principal Components Analysis Complete Problems

Slide 121

Cronbach's Alpha if Item Deleted - 2

Though not part of this problem, this table shows what the output looks like when deleting an item would increase alpha.

If the last item in this table were deleted, alpha would increase to .820, instead of the .686 for alpha with this item included.

Page 122: Slide 1 Principal Components Analysis Complete Problems

Slide 122

Marking the Statement about Summated Scales

Since the variables loading on the second component did not satisfy the reliability criterion, we leave the check box unmarked.

Page 123: Slide 1 Principal Components Analysis Complete Problems

Slide 123

Principal Components Analysis: Level of Measurement

No

No

Ordinal level variable treated as metric?

Yes

Yes

Level of measurement ok (all variables metric)?

Consider limitation in discussion of findings

Mark check box for level of measurement

Do not mark check box for level of measurement

Mark: Inappropriate application of the statistic

Stop

Page 124: Slide 1 Principal Components Analysis Complete Problems

Slide 124

Principal Components Analysis: Sample Size

Yes

No

Adequate Sample Size (at least 150 valid cases)

Consider limitation in discussion of findings

Mark check box for sample size

Do not mark check box for sample size

Run Principal Components Analysis

Page 125: Slide 1 Principal Components Analysis Complete Problems

Slide 125

Principal Components Analysis: Suitability for Factor Analysis - 1

No

Mark check box for correlations

Do not mark check box for correlations

Probability for Bartlett test of sphericity ≤ alpha?

Two or more correlations ≥ 0.30?

Yes

Yes

Stop, variables not good candidate for factor analysis

No

Do not mark check box for sphericity test

Stop, variables not good candidate for factor analysis

Mark check box for sphericity test
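
The correlation requirement in this flowchart can be checked mechanically. The Python sketch below counts how many distinct pairs of variables have a correlation of at least 0.30; the function name and the correlation matrix are hypothetical, and counting negative correlations by their absolute value is an assumption of this sketch.

```python
def count_substantial_correlations(corr, threshold=0.30):
    """Count off-diagonal correlations (each pair once) with |r| >= threshold."""
    n = len(corr)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if abs(corr[i][j]) >= threshold
    )

# Hypothetical correlation matrix for four variables.
corr = [
    [1.00, 0.69, 0.21, 0.28],
    [0.69, 1.00, 0.18, 0.31],
    [0.21, 0.18, 1.00, 0.39],
    [0.28, 0.31, 0.39, 1.00],
]

n_substantial = count_substantial_correlations(corr)
meets_requirement = n_substantial >= 2  # "two or more correlations >= 0.30"
print(n_substantial, meets_requirement)
```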

Page 126: Slide 1 Principal Components Analysis Complete Problems

Slide 126

Principal Components Analysis: Suitability for Factor Analysis - 2

No

Yes

Yes

No

Mark check box for MSA

Remove variable with lowest MSA. Run PCA again.

Sampling adequacy ≥0.50 for each variable?

KMO measure of sampling adequacy ≥ 0.50?

One variable remaining in analysis?

Yes

Do not mark check box for MSA

Stop, variables not good candidate for factor analysis

No

Do not mark check box for MSA

Stop, variables not good candidate for factor analysis

Page 127: Slide 1 Principal Components Analysis Complete Problems

Slide 127

Principal Components Analysis: Anticipated Number of Factors

No

Yes

Mark correct check box for number of factors

Don’t mark check box for number of factors

Correct number of factors supported by eigenvalues > 1.0 and the number of components needed to explain 60% of the variance?

Today, this step provides information to the analyst about the potential solution. When factor analysis was calculated by hand, this step determined how one would do the calculations.
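
Both criteria named in this flowchart can be read off a list of eigenvalues. The Python sketch below shows one way to do so; the function names and the eigenvalues are hypothetical and serve only to illustrate the two rules.

```python
def components_by_eigenvalue(eigenvalues):
    """Number of components with an eigenvalue greater than 1.0."""
    return sum(1 for value in eigenvalues if value > 1.0)

def components_for_variance(eigenvalues, target=0.60):
    """Smallest number of leading components whose cumulative share of variance reaches target."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= target:
            return count
    return len(eigenvalues)

eigenvalues = [1.9, 1.2, 0.6, 0.3]  # hypothetical values for four variables

by_eigenvalue = components_by_eigenvalue(eigenvalues)
by_variance = components_for_variance(eigenvalues)
print(by_eigenvalue, by_variance)
```

When the two rules disagree, they bracket the range of solutions the analyst should expect rather than fixing a single answer.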

Page 128: Slide 1 Principal Components Analysis Complete Problems

Slide 128

Principal Components Analysis: Excluding Variables for Low Communality

Yes

No

Mark check box for communality removal

Remove variable load that is only one loading on

component.

One variable remaining in analysis?

Yes

Stop, no viable factor solution

Run PCA again.

No

Communality for all variables ≥ 0.50?

Do not mark check box for communality removal

Page 129: Slide 1 Principal Components Analysis Complete Problems

Slide 129

Principal Components Analysis: Excluding Variables for Complex Structure

Yes

No

Mark check box for complex structure removal

Remove variable load that is only one loading on

component.

One variable remaining in analysis?

Yes

Do not mark check box for complex structure removal

Stop, no viable factor solution

Run PCA again.

No

Simple structure (all variables load on single component)?

Page 130: Slide 1 Principal Components Analysis Complete Problems

Slide 130

Principal Components Analysis: Excluding Variables for One-variable Components

Yes

No

Mark check box for one-variable component

Remove the variable that is the only one loading on a component.

One variable remaining in analysis?

Yes

Do not mark check box for one-variable component

Stop, no viable factor solution

All components have more than one variable loading?

Run PCA again.

No

Page 131: Slide 1 Principal Components Analysis Complete Problems

Slide 131

Principal Components Analysis: Factor Structure

No

Yes

Mark check box for number of components

Do not mark check box for number of components

Correct list of variables loaded on component?

Correct number of components extracted?

No

Do not mark check box for loadings on component

Yes

Mark check box for loadings on component

Repeat this step for each component

Page 132: Slide 1 Principal Components Analysis Complete Problems

Slide 132

Principal Components Analysis: Percent of Variance Explained

No

Yes

Mark check box for percent of variance

Do not mark check box for percent of variance

Components explain 60% or more of variance of included variables?

Include as limitation in discussion of findings

Page 133: Slide 1 Principal Components Analysis Complete Problems

Slide 133

Principal Components Analysis: Impact of Outliers - 1

Yes

No outliers, mark check box for no impact

No

Re-run factor analysis, requesting regression factor scores

Are any of the factor scores outliers (absolute value larger than 3.0)?

Re-run factor analysis, excluding outliers

Yes

Go to validation analysis

Starting here, we include only the variables in the factor solution.

Page 134: Slide 1 Principal Components Analysis Complete Problems

Slide 134

Principal Components Analysis: Impact of Outliers - 2

No

Yes

Mark check box for no impact

Are all of the communalities excluding outliers greater than 0.50?

Pattern of factor loadings excluding outliers match pattern for full data set?

Yes

No

Do not mark check box for no impact

Stop, clarify which analysis should be reported

Do not mark check box for no impact

Stop, clarify which analysis should be reported

Re-run factor analysis, including all cases

Since outliers had no effect, there is no reason to exclude them from the analysis.

Page 135: Slide 1 Principal Components Analysis Complete Problems

Slide 135

Principal Components Analysis: Validation Analysis - 1

Compute split variable using specified random number seed

Run factor analysis, selecting cases where split = 0

Run factor analysis, selecting cases where split = 1

No

Yes

Are all of the communalities for both split samples greater than 0.50?

Do not mark check box for validation analysis

Stop, generalizability of findings is questionable

Page 136: Slide 1 Principal Components Analysis Complete Problems

Slide 136

Principal Components Analysis: Validation Analysis - 2

Mark check box for generalizability

Pattern of factor loadings for split samples matches factor loadings for full data set?

Yes

No

Do not mark check box for validation analysis

Stop, generalizability of findings is questionable

Page 137: Slide 1 Principal Components Analysis Complete Problems

Slide 137

Principal Components Analysis: Reliability Analysis

No

Yes

Mark check box for summated scales

Do not mark check box for summated scales

Cronbach’s alpha greater than .70 for all components?

Compute Cronbach’s alpha for all components

Cronbach’s alpha greater than .60 for all components?

Yes

No

Mark check box for summated scales

Add note of caution to interpretation
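
One reading of this flowchart is sketched below in Python: mark the check box when every alpha exceeds .70, mark it with a note of caution when every alpha exceeds .60, and otherwise leave it unmarked. The function name and the returned labels are hypothetical, and the branch order is an interpretation of the flattened chart rather than something the slide states outright.

```python
def summated_scale_decision(alphas):
    """Return the check-box decision for a list of Cronbach's alpha values."""
    if all(a > 0.70 for a in alphas):
        return "mark"
    if all(a > 0.60 for a in alphas):
        return "mark with note of caution"
    return "do not mark"

# Alphas reported earlier for components 1 and 2.
decision = summated_scale_decision([0.814, 0.561])
print(decision)
```

With the alphas found in this problem the function returns "do not mark", which agrees with leaving the summated scales check box unmarked.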