
1

PSY6010: Statistics, Psychometrics and Research Design

Professor Leora Lawton
Spring 2006

Wednesdays 7-10 PM
Room 204

FACTOR ANALYSIS, CLUSTER ANALYSIS and SEGMENTATIONS

2

1. Purpose of Factor Analysis

Factor Analysis – a ‘data reduction’ technique

1. Technique for dealing with multicollinearity

2. Used to transform Likert scales into factor scores as an alternative to a linear additive scale.

3. Groups attitude items that respondents answer in similar ways into sets of shared attitudes (explains variables in terms of their underlying dimensions).

4. Facilitates interpretation of a large number of variables

5. Factor scores (the grouped attitudes) can be then used as an independent variable.

3

2. Steps to conducting FA

• When creating a questionnaire, often you may want to include a number of attitudinal questions around certain issues.

• When analyzing the data with all these variables you start by selecting those attitudes that you think describe some overall category, for example ‘Taste in Music’.

• These attitudinal variables ideally should be of the same metric (e.g., 1,2,3,4,5). Some say the variables should have 7 values, but 5 works fine. Don’t use dichotomous variables.

• Begin by computing a correlation matrix of all the variables in question. There should be some significant correlations, both positive and negative.

• There should be at least a 4:1 ratio of cases to variables (e.g., a minimum of 100 cases for 25 variables), and a sample size of at least 50.
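The sample-size rule of thumb above can be written as a quick check. This is only a sketch: the function name and its default arguments are illustrative, and the thresholds are simply the 4:1 ratio and the minimum of 50 cases from this slide.

```python
def sample_size_ok(n_cases, n_variables, min_ratio=4, min_cases=50):
    """Return True when both rules of thumb from the slide are met:
    at least `min_ratio` cases per variable and at least `min_cases` cases."""
    return n_cases >= min_cases and n_cases >= min_ratio * n_variables


# 25 variables need at least 100 cases under the 4:1 rule.
print(sample_size_ok(100, 25))
print(sample_size_ok(99, 25))
```

Note that the count of cases should be the number actually used in the analysis, which can be smaller than the file size once missing answers are dropped.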

4

Correlation matrix of musical tastes

• Research issue: You’ve been asked by a music store owner to assist in increasing sales by making sure the placement of music genres in the store is optimal.

• Using GSS93 subset.sav, run a set of frequencies to check that the variables fit the requirements.

• Then run a correlation matrix of all the music questions.
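In the SPSS correlation output the N differs from cell to cell, which is what you get when each correlation uses every respondent who answered both questions in that pair (pairwise deletion). A minimal Python sketch of that idea follows; the ratings are invented for illustration and `None` stands in for a missing answer.

```python
import math


def pearson_pairwise(x, y):
    """Pearson r using only the cases where both values are present.
    Returns (r, n), where n is the number of complete pairs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no variation has no correlation")
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical 1-5 ratings from six respondents; None = did not answer.
classical = [1, 2, None, 4, 5, 3]
opera = [2, 2, 3, None, 5, 4]
r, n = pearson_pairwise(classical, opera)
print(f"r = {r:.3f} based on N = {n} complete pairs")
```

The factor analysis syntax later in the deck uses listwise deletion instead, so its N will be the number of respondents who answered all 11 music questions.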

5

Correlation Matrix

Correlations (for each pair of variables: r = Pearson Correlation, Sig. = Sig. (2-tailed), N = number of cases)

Variable labels: bigband = Bigband Music; blugrass = Bluegrass Music; country = Country Western Music; blues = Blues or R & B Music; musicals = Broadway Musicals; classicl = Classical Music; folk = Folk Music; jazz = Jazz Music; opera = Opera; rap = Rap Music; hvymetal = Heavy Metal Music

Variable  Stat  bigband  blugrass country  blues    musicals classicl folk     jazz     opera    rap      hvymetal
bigband   r     1        .267**   .111**   .269**   .526**   .365**   .311**   .280**   .363**   -.059*   -.091**
bigband   Sig.           .000     .000     .000     .000     .000     .000     .000     .000     .033     .001
bigband   N     1337     1253     1328     1307     1299     1302     1300     1320     1285     1292     1290
blugrass  r     .267**   1        .409**   .194**   .153**   .109**   .399**   .062*    .149**   -.026    -.005
blugrass  Sig.  .000              .000     .000     .000     .000     .000     .024     .000     .352     .847
blugrass  N     1253     1335     1331     1307     1286     1292     1296     1311     1277     1298     1298
country   r     .111**   .409**   1        .033     .008     -.109**  .214**   -.110**  -.029    -.041    -.075**
country   Sig.  .000     .000              .211     .757     .000     .000     .000     .275     .127     .005
country   N     1328     1331     1468     1421     1398     1409     1404     1436     1398     1417     1413
blues     r     .269**   .194**   .033     1        .220**   .199**   .167**   .556**   .206**   .183**   .107**
blues     Sig.  .000     .000     .211              .000     .000     .000     .000     .000     .000     .000
blues     N     1307     1307     1421     1434     1381     1387     1379     1416     1370     1391     1383
musicals  r     .526**   .153**   .008     .220**   1        .499**   .363**   .262**   .452**   .030     -.115**
musicals  Sig.  .000     .000     .757     .000              .000     .000     .000     .000     .272     .000
musicals  N     1299     1286     1398     1381     1412     1381     1362     1398     1359     1369     1366
classicl  r     .365**   .109**   -.109**  .199**   .499**   1        .407**   .281**   .583**   .014     .000
classicl  Sig.  .000     .000     .000     .000     .000              .000     .000     .000     .604     .996
classicl  N     1302     1292     1409     1387     1381     1425     1375     1406     1377     1383     1375
folk      r     .311**   .399**   .214**   .167**   .363**   .407**   1        .112**   .328**   -.058*   -.039
folk      Sig.  .000     .000     .000     .000     .000     .000              .000     .000     .031     .144
folk      N     1300     1296     1404     1379     1362     1375     1414     1393     1359     1374     1373
jazz      r     .280**   .062*    -.110**  .556**   .262**   .281**   .112**   1        .231**   .197**   .100**
jazz      Sig.  .000     .024     .000     .000     .000     .000     .000              .000     .000     .000
jazz      N     1320     1311     1436     1416     1398     1406     1393     1451     1388     1405     1401
opera     r     .363**   .149**   -.029    .206**   .452**   .583**   .328**   .231**   1        .116**   -.013
opera     Sig.  .000     .000     .275     .000     .000     .000     .000     .000              .000     .623
opera     N     1285     1277     1398     1370     1359     1377     1359     1388     1410     1370     1366
rap       r     -.059*   -.026    -.041    .183**   .030     .014     -.058*   .197**   .116**   1        .360**
rap       Sig.  .033     .352     .127     .000     .272     .604     .031     .000     .000              .000
rap       N     1292     1298     1417     1391     1369     1383     1374     1405     1370     1431     1392
hvymetal  r     -.091**  -.005    -.075**  .107**   -.115**  .000     -.039    .100**   -.013    .360**   1
hvymetal  Sig.  .001     .847     .005     .000     .000     .996     .144     .000     .623     .000
hvymetal  N     1290     1298     1413     1383     1366     1375     1373     1401     1366     1392     1423

** Correlation is significant at the 0.01 level (2-tailed).

* Correlation is significant at the 0.05 level (2-tailed).

6

Evaluating Appropriateness of FA

• Check the correlation matrix, which examines only relationships between pairs of variables (i.e., bivariate, not multivariate, correlation).

• So, then select these variables into the FA.

• Analyze – Data Reduction – Factor

• Move all 11 music variables to the Variables window.

• Under Descriptives, click on the option for KMO and Bartlett’s test of sphericity.

• Use Bartlett’s Test of Sphericity to examine the entire matrix. You want to reject the null hypothesis that the matrix is a unity (identity) matrix, i.e., the test should be significant. A unity matrix is one in which all the correlations are 0 except, of course, the correlation between a variable and itself (= 1). (Note that our text says not to place much value on this test in most cases.)

• KMO stands for the Kaiser-Meyer-Olkin Measure of sampling adequacy. It compares the magnitude of the observed correlation coefficients to the partial correlation coefficients (that is, what’s unique about the attribute). Here you want a number closer to 1. Less than .5 indicates that FA may not be appropriate. Ours is .748.
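To make the two diagnostics concrete, here is a hedged Python sketch that computes KMO and Bartlett's chi-square from a correlation matrix using the standard textbook formulas. It uses only three variables (musicals, classicl, opera) with their correlations taken from the matrix on slide 5, so the KMO it prints will not match the .748 that SPSS reports for all 11 variables. The sample size `N_ASSUMED` is an assumption (the smallest pairwise N among these three), and the helper names are illustrative.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    m = [list(row) for row in matrix]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return det


def kmo(R):
    """Overall KMO: squared correlations relative to squared correlations
    plus squared partial correlations (off-diagonal cells only)."""
    P = invert(R)
    p = len(R)
    sum_r2 = sum_partial2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -P[i][j] / math.sqrt(P[i][i] * P[j][j])
                sum_r2 += R[i][j] ** 2
                sum_partial2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_partial2)


def bartlett_chi2(R, n):
    """Bartlett's test of sphericity: returns (chi-square, degrees of freedom)."""
    p = len(R)
    chi2 = -((n - 1) - (2 * p + 5) / 6) * math.log(determinant(R))
    return chi2, p * (p - 1) // 2


# musicals, classicl, opera: correlations copied from the slide 5 matrix.
R_SUBSET = [
    [1.0, 0.499, 0.452],
    [0.499, 1.0, 0.583],
    [0.452, 0.583, 1.0],
]
N_ASSUMED = 1359  # assumption: smallest pairwise N among these three variables

kmo_value = kmo(R_SUBSET)
chi2, df = bartlett_chi2(R_SUBSET, N_ASSUMED)
print(f"KMO = {kmo_value:.3f}; Bartlett chi-square = {chi2:.1f} on {df} df")
```

A large chi-square relative to its degrees of freedom means the matrix is far from a unity matrix, which is the result you want before running FA.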

7

SPSS for PCA/FA

• Analyze – Data Reduction – Factor

• Under Extraction, choose the options for Principal Components, Eigenvalues over 1, and Display unrotated factor solution and scree plot.

• Note that there is an option for Number of Factors. There are times you may want to impose a number rather than letting SPSS decide for you (and it decides based on the eigenvalues in the extraction).

• For Rotation, choose Varimax (variance maximization; it’s the most commonly used), and Display Rotated Solution.

• For Scores, you will want to select Save as Variables/Regression once you have found your solution, but not while you are still in the exploration phase.

8

SPSS for PCA/FA

FACTOR
  /VARIABLES bigband blugrass country blues musicals classicl folk jazz opera rap hvymetal
  /MISSING LISTWISE
  /ANALYSIS bigband blugrass country blues musicals classicl folk jazz opera rap hvymetal
  /PRINT INITIAL KMO EXTRACTION ROTATION
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PC
  /CRITERIA ITERATE(25)
  /ROTATION VARIMAX
  /METHOD=CORRELATION .

9

Interpreting SPSS results

• Under the chart ‘Total Variance Explained’ you will see that four factors have been identified, based on having eigenvalues > 1.

• The scree plot shows you a pictorial view of the eigenvalues. We have four factors; some might want to try five, because that’s where the slope of the eigenvalues changes, or, similarly, try only two. The most important thing is that the solution is interpretable: that it makes sense and that the factors provide insight into your overall concept. An eigenvalue is the amount of variance in the correlation matrix that a factor accounts for (the sum of its squared loadings), so the factor with the largest eigenvalue explains the most variance. Loosely, the more variance a factor captures, the more clearly it stands apart from the others, i.e., the factors are distinguishable.

• The unrotated matrix doesn’t tell you too much, so go directly to the rotated matrix: here’s where the ‘rotated view’ can give you a better picture of the distinctiveness of each factor. Rotation pushes high loadings (the correlations between variables and factors) higher and low loadings lower in the matrix used to calculate the factors; in other words, it makes the factors more distinguishable to the ‘naked eye.’

• In the rotated matrix, you then select the variables (attributes) with the highest coefficients on each factor. This one works out pretty well; sometimes you have to go back to the drawing board to redefine it.

• Try it by limiting the result to just two factors. What underlying issue might be explaining this result compared to the four-factor solution?

10


Rotated Component Matrix(a)

                                  Component
Variable                          1       2       3       4
bigband   Bigband Music           .597    .340    .206    -.189
blugrass  Bluegrass Music         .164    .137    .813    .018
country   Country Western Music   -.074   -.045   .825    -.058
blues     Blues or R & B Music    .133    .850    .143    .105
musicals  Broadway Musicals       .764    .190    .033    -.091
classicl  Classical Music         .841    .097    -.072   .046
folk      Folk Music              .604    -.040   .463    -.012
jazz      Jazz Music              .204    .843    -.086   .099
opera     Opera                   .785    .090    .006    .103
rap       Rap Music               .020    .142    -.027   .793
hvymetal  Heavy Metal Music       -.044   .018    -.012   .822

Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization.

a. Rotation converged in 5 iterations.

You want to find components where the coefficients are at least above .3 and where there is a clear demarcation between the highest coefficients on each component. Note that folk music is high on both components 1 and 3 (.604 and .463). Sometimes, therefore, it is worthwhile to set the number of components to one more, and one fewer, than the default number based on the eigenvalue cutoff you’ve selected.
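The reading of the rotated matrix described above can be sketched in a few lines of Python: assign each variable to the component where its loading is largest in absolute value, and flag any other component where it still loads at .3 or more. The loadings are copied from the rotated component matrix; the function name and the use of .3 as the cross-loading cutoff are illustrative choices, not SPSS output.

```python
# Rotated loadings (components 1-4) from the Rotated Component Matrix.
LOADINGS = {
    "bigband":  (0.597, 0.340, 0.206, -0.189),
    "blugrass": (0.164, 0.137, 0.813, 0.018),
    "country":  (-0.074, -0.045, 0.825, -0.058),
    "blues":    (0.133, 0.850, 0.143, 0.105),
    "musicals": (0.764, 0.190, 0.033, -0.091),
    "classicl": (0.841, 0.097, -0.072, 0.046),
    "folk":     (0.604, -0.040, 0.463, -0.012),
    "jazz":     (0.204, 0.843, -0.086, 0.099),
    "opera":    (0.785, 0.090, 0.006, 0.103),
    "rap":      (0.020, 0.142, -0.027, 0.793),
    "hvymetal": (-0.044, 0.018, -0.012, 0.822),
}


def assign_components(loadings, cross_threshold=0.3):
    """Map each variable to (primary component, list of cross-loading components).
    Components are numbered from 1, as in the SPSS table."""
    result = {}
    for name, row in loadings.items():
        strengths = [abs(v) for v in row]
        primary = strengths.index(max(strengths))
        cross = [i + 1 for i, s in enumerate(strengths)
                 if i != primary and s >= cross_threshold]
        result[name] = (primary + 1, cross)
    return result


assignment = assign_components(LOADINGS)
for name, (primary, cross) in assignment.items():
    note = f" (also loads on component(s) {cross})" if cross else ""
    print(f"{name:9s} -> component {primary}{note}")
```

Variables that come back with a cross-loading are the ones to watch when you rerun the analysis with one more or one fewer component.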

11

Scree Plot: Number of Components

12


Rotated Component Matrix(a)

Component

1 2 3 4

Bigband Music 0.597 0.340 0.206 -0.189

Bluegrass Music 0.164 0.137 0.813 0.018

Country Western Music -0.074 -0.045 0.825 -0.058

Blues or R & B Music 0.133 0.850 0.143 0.105

Broadway Musicals 0.764 0.190 0.033 -0.091

Classical Music 0.841 0.097 -0.072 0.046

Folk Music 0.604 -0.040 0.463 -0.012

Jazz Music 0.204 0.843 -0.086 0.099

Opera 0.785 0.090 0.006 0.103

Rap Music 0.020 0.142 -0.027 0.793

Heavy Metal Music -0.044 0.018 -0.012 0.822

Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization.

a. Rotation converged in 5 iterations.
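Two summaries can be read straight off this table, and a short Python sketch makes the arithmetic explicit. A variable's communality is the sum of its squared loadings across the components, and the variance a rotated component accounts for is the sum of squared loadings down its column. Because the analysis is on a correlation matrix, each of the 11 variables contributes one unit of variance, so dividing by 11 gives a proportion. The loadings are copied from the table; the function names are illustrative.

```python
# Rotated loadings (components 1-4) copied from the table on this slide.
LOADINGS = {
    "Bigband Music":         (0.597, 0.340, 0.206, -0.189),
    "Bluegrass Music":       (0.164, 0.137, 0.813, 0.018),
    "Country Western Music": (-0.074, -0.045, 0.825, -0.058),
    "Blues or R & B Music":  (0.133, 0.850, 0.143, 0.105),
    "Broadway Musicals":     (0.764, 0.190, 0.033, -0.091),
    "Classical Music":       (0.841, 0.097, -0.072, 0.046),
    "Folk Music":            (0.604, -0.040, 0.463, -0.012),
    "Jazz Music":            (0.204, 0.843, -0.086, 0.099),
    "Opera":                 (0.785, 0.090, 0.006, 0.103),
    "Rap Music":             (0.020, 0.142, -0.027, 0.793),
    "Heavy Metal Music":     (-0.044, 0.018, -0.012, 0.822),
}


def communalities(loadings):
    """Share of each variable's variance reproduced by the retained components."""
    return {name: sum(v * v for v in row) for name, row in loadings.items()}


def component_variance(loadings):
    """Sum of squared loadings for each component (column of the table)."""
    n_components = len(next(iter(loadings.values())))
    return [sum(row[k] ** 2 for row in loadings.values())
            for k in range(n_components)]


ssl = component_variance(LOADINGS)
total_share = sum(ssl) / len(LOADINGS)
for k, value in enumerate(ssl, start=1):
    print(f"Component {k}: {value:.3f} ({value / len(LOADINGS):.1%} of variance)")
print(f"All four components together: {total_share:.1%}")
for name, h2 in communalities(LOADINGS).items():
    print(f"{name:22s} communality {h2:.3f}")
```

The figures should agree, up to rounding of the printed loadings, with the rotation sums of squared loadings in the SPSS ‘Total Variance Explained’ table.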

13

Project Recommendations

Current

Aisle 1A Aisle 1B Aisle 2A Aisle 2B

bigband jazz heavy metal rap

bluegrass blues musicals C&W

classical opera

folk

Recommended

Aisle 1A Aisle 1B Aisle 2A Aisle 2B

bigband folk C&W metal

musicals classical bluegrass rap

opera blues

Jazz

14

Homework #8

• Using our own employee dataset (or, if you wish, use your SDA data set and select your own variables), take the following attitudinal variables to understand how people define “quality of work.”

– V11 I have the necessary resources (e.g., computers, databases) to do my work comfortably and efficiently.

– V13 The work I'm responsible for is appropriate for my level of capability.

– V16 I'm challenged and interested in my work.

– V17 My immediate manager recognizes and acknowledges my contributions.

– V22 I have responsibility with the required authority.

– V24 I am satisfied with communications between management and employees.

– v41r Your total compensation (salary, bonuses)

– v42r 401(k), retirement and/or pension

– v43r Availability of PTO (vacation) days

– v44r The office itself (lighting, space, decor)

– v45r Performance awards and bonuses

15

Homework #8

• Run a frequencies test to make sure they are appropriate. Are they? Explain.

• Run a correlations table. Is this appropriate for PCA/FA? Explain.

• On this same selection of variables, conduct tests for KMO and Bartlett. Are we still on track for PCA/FA? Explain.

• Now conduct a factor analysis using these variables, setting the defaults as in the class example. Are you happy with this result? Then try setting the number of components differently, adding one or more, or subtracting, from the first result. Are you happy with this result? Explain.

• What can you say about components of Quality of Work?

16

Using Factor Scores

• Rarely are factor analyses conducted just for themselves. Rather, they are used as attitudinal measures to predict or be associated with other behavior or statuses.

• One could use factor scores as predictors in regression analyses.

• Or, as will be seen in segmentation later this semester, one can use factor scores to cluster with other characteristics to create typologies, or segments, of subgroups in a population.

• Today we’ll go back and use our music taste factors as predictors in other behaviors.

17

Review of Factor Analysis

First, let’s not twist our brains into pretzels, so begin by doing an automatic recode on all musical variables. Give them a consistent new name, e.g., begin or end each name with an ‘r’, so that BIGBAND becomes RBIGBAND.

/VARIABLES bigband blugrass country blues musicals classicl folk jazz opera rap hvymetal

18

Saving the Factor Score

• Analyze – Data Reduction – Factor

– Descriptives (check KMO and Bartlett’s test)

– Extraction (uncheck unrotated matrix, check Scree Plot, and select method = principal components)

– Rotation (select varimax)

– Scores (select Save as Variables)

• Run. Now look at your Variable View, and then at the Data View.

• Now run Descriptive Statistics – Descriptives (Mean, Std Dev, Min, Max) on the new factor score variables.

19

Using Factor Scores in a Regression

• Now, let’s predict tv viewing.

• First, run a frequencies of the variable TV hours watched per week.

• Recode it so that 8 hours and above = 8.

• Create a conceptual model:

TV viewing = a + musical taste + education + sex + age.

Run your regression with these variables.
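The recode step above is a top-code: every value of 8 or more is collapsed to 8, and everything else is left alone. A minimal Python sketch of that rule, assuming missing answers are stored as `None` (the function name and the sample values are illustrative):

```python
def top_code(hours, cap=8):
    """Collapse values at or above `cap` to `cap`; keep missing values missing."""
    if hours is None:
        return None
    return min(hours, cap)


# Hypothetical TV-hours answers, including one missing value.
raw_hours = [0, 3, 8, 12, None, 24]
recoded = [top_code(h) for h in raw_hours]
print(recoded)
```

Top-coding keeps a handful of extreme answers from dominating the regression, at the cost of treating all heavy viewers as the same.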

20

Homework #9

• Using the same factor analysis you ran last week with the employee data (see slide #14), run this factor analysis again and save the factor score variables.

• Now run a regression:

• Overall satisfaction = a + (factor scores) + male + hours worked (hourswk) + whether there was a layoff (v32)

• Explain why this model makes theoretical sense. Now explain the results. If you were an HR manager, what areas would you either try to improve, or make sure they stay as good?

21

Segmentation Using Factor Analysis and Cluster Analysis

• As you learned last week, segmentation analysis is used to create typologies or categorical groups of constituents, such as customers, patrons, etc.

• Often segmentations employ factor score results as well.

• In a segmentation, one first develops any necessary factor scores and saves them as output variables (you will see them added to your data set).

• Then, because the purpose of the segmentation is to create groups that can then be reached through some sort of marketing (social or commercial), or for some other actionable purpose, use demographics that can be employed to target the groups.

• Then, with the factor scores and the sociodemographic variables identified as being logical, use a clustering technique to create the groups.

• We will use cluster analysis, but other techniques include discriminant (also in SPSS), CHAID and CART (separate software packages), and the most adventurous is latent class models (also separate software, such as AMOS).

22

Cluster Analysis - 1

• We’ll use GSS93 subset.sav.

• You will remember our musical factors (go back to slide #12 for results).

• First create names for your factor scores. I’ve labeled them bigclass, bluejazz, cwgrass, heavyrap. Clients like meaningful labels, plus it helps you when reading the output.

• Then, consider possible demographic factors that might relate to musical taste, e.g., sex, age, race, region, education, income.

• Because this kind of analysis tends to be exploratory, you don’t need to specify the logic behind the relationships, but you should have some a priori idea about why these factors might be important in distinguishing the possible groups, in this case, musical taste.

• Cluster analysis doesn’t require recoding of IVs the way the other methods do…specify a categorical variable, or a covariate, as is appropriate.

23

Cluster Analysis - 2

• Analyze - Classify – 2-step Cluster – select factors (categorical variables, e.g., sex) and covariates (ratio, interval or continuous variables).

• In our first round, do not specify the number of clusters.

• Because segmentations are part art, part science, you need to experiment until you find one that ‘works’ for you, so let’s try it with a different number of clusters.
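To see what ‘trying a different number of clusters’ means mechanically, here is a small Python sketch. It uses plain k-means, which is a simpler technique swapped in for illustration; it is not the SPSS TwoStep algorithm, and it handles only continuous inputs. The points are invented stand-ins for two factor scores per respondent, and the starting centroids are simply the first k points so the result is repeatable.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on equal-length numeric tuples.
    Returns (centroids, labels). Starts from the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def within_ss(points, centroids, labels):
    """Total squared distance of points from their own cluster centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2
               for p, lab in zip(points, labels))


# Hypothetical (factor score 1, factor score 2) pairs for six respondents.
scores = [(-1.0, -1.1), (1.0, 0.9), (-0.9, -1.0),
          (1.1, 1.0), (-1.1, -0.9), (0.9, 1.1)]

centroids2, labels2 = kmeans(scores, 2)
print("k = 2 labels:", labels2)
print("k = 2 within-cluster SS:", round(within_ss(scores, centroids2, labels2), 3))
```

Rerunning with a different k and comparing cluster sizes, within-cluster spread, and above all whether the groups make sense, is the experimenting the slide describes.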

24

Syntax for Cluster Analysis

TWOSTEP CLUSTER
  /CATEGORICAL VARIABLES = sex politics
  /CONTINUOUS VARIABLES = bigclass bluejazz cwgrass heavyrap age educ
  /DISTANCE LIKELIHOOD
  /NUMCLUSTERS FIXED = 4
  /HANDLENOISE 0
  /MEMALLOCATE 64
  /CRITERIA INITHRESHOLD (0) MXBRANCH (8) MXLEVEL (3)
  /PLOT BARFREQ PIEFREQ
  /PRINT COUNT SUMMARY
  /SAVE VARIABLE=TSC_4337 .
AIM TSC_4337
  /CATEGORICAL sex politics
  /CONTINUOUS bigclass bluejazz cwgrass heavyrap age educ
  /PLOT ERRORBAR CATEGORY CLUSTER (TYPE=PIE) .

25

Segmentation Homework

• Use the same data set, but this time use the variables for tv viewing and attendance at sports events and art museums for your factors.

• Label the factors, then cluster them with age, sex, political views.

• Try it with 3, 4, and 5 clusters. Which do you find, if any, to be believable? Why?
