Multivariate Data Analysis Using SPSS


Upload: k9denden

Post on 14-Nov-2014


TRANSCRIPT

Page 1: Multivariate Data Analysis Using SPSS

Multivariate Data Analysis Using SPSS

John Zhang

ARL, IUP

Page 2: Multivariate Data Analysis Using SPSS

Topics

A Guide to Multivariate Techniques
Preparation for Statistical Analysis
Review: ANOVA
Review: ANCOVA
MANOVA
MANCOVA
Repeated Measure Analysis
Factor Analysis
Discriminant Analysis
Cluster Analysis

Page 3: Multivariate Data Analysis Using SPSS

Guide-1

Correlation: 1 IV – 1 DV; relationship
Regression: 1+ IV – 1 DV; relation/prediction
T test: 1 IV (cat.) – 1 DV; group diff.
One-way ANOVA: 1 IV (2+ cat.) – 1 DV; group diff.
One-way ANCOVA: 1 IV (2+ cat.) – 1 DV – 1+ covariates; group diff.
One-way MANOVA: 1 IV (2+ cat.) – 2+ DVs; group diff.

Page 4: Multivariate Data Analysis Using SPSS

Guide-2

One-way MANCOVA: 1 IV (2+ cat.) – 2+ DVs – 1+ covariate; group diff.
Factorial MANOVA: 2+ IVs (2+ cat.) – 2+ DVs; group diff.
Factorial MANCOVA: 2+ IVs (2+ cat.) – 2+ DVs – 1+ covariate; group diff.
Discriminant Analysis: 2+ IVs – 1 DV (cat.); group prediction
Factor Analysis: explore the underlying structure

Page 5: Multivariate Data Analysis Using SPSS

Preparation for Stat. Analysis-1

Screen data
– SPSS Utility procedures
– Frequency procedure
Missing data analysis (missing data should be random)
– Check if patterns exist
– Drop data case-wise
– Drop data variable-wise
– Impute missing data

Page 6: Multivariate Data Analysis Using SPSS

Preparation for Stat. Analysis-2

Outliers (generally, statistical procedures are sensitive to outliers)
– Univariate case: boxplot
– Multivariate case: Mahalanobis distance (a chi-square statistic); a point is an outlier when its p-value is < .001
– Treatment:
  – Drop the case
  – Report two analyses (one with the outlier, one without)

Page 7: Multivariate Data Analysis Using SPSS

Preparation for Stat. Analysis-3

Normality
– Testing univariate normality:
  – Q-Q plot
  – Skewness and kurtosis: both should be close to 0 for normal data; treat as not normal when the p-value is < .01 or .001
  – Kolmogorov-Smirnov statistic: a significant result means not normal
– Testing multivariate normality:
  – Scatterplots should be elliptical
  – Each variable must be normal
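
The skewness and kurtosis check above can be sketched in a few lines of Python. This is only an illustration using the plain moment formulas (dividing by n); SPSS reports bias-corrected versions, so its numbers will generally differ slightly. The function name and the two small data sets are assumptions made for the example.

```python
def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from simple moment estimators."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (population form: divide by n).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurt


symmetric = [1, 2, 3, 4, 5]        # illustrative data
right_skewed = [1, 1, 1, 2, 10]    # illustrative data
print(skewness_and_excess_kurtosis(symmetric))
print(skewness_and_excess_kurtosis(right_skewed))
```

A symmetric sample gives skewness 0; a long right tail gives a positive value.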

Page 8: Multivariate Data Analysis Using SPSS

Preparation for Stat. Analysis-4

Linearity
– Linear combinations of variables make sense
– Two variables (or combinations of variables) are linearly related
– Check for linearity:
  – Residual plot in regression
  – Scatterplots

Page 9: Multivariate Data Analysis Using SPSS

Preparation for Stat. Analysis-5

Homoscedasticity: the covariance matrices are equal across groups
– Box’s M test: tests the equality of the covariance matrices across groups
  – Sensitive to normality
– Levene’s test: tests the equality of variances across groups
  – Not sensitive to normality
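
To make Levene's test concrete, here is a minimal Python sketch of the test statistic in its mean-centered form. It is an illustration, not SPSS's implementation; the function name and data are assumptions. The statistic is compared with an F distribution with (k - 1, N - k) degrees of freedom.

```python
def levene_statistic(groups):
    """Levene's W based on absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Z_ij = |x_ij - mean of group i|
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


equal_spread = [[1, 2, 3], [4, 5, 6]]      # same variance in both groups
unequal_spread = [[1, 2, 3], [0, 5, 10]]   # second group is more spread out
print(levene_statistic(equal_spread))
print(levene_statistic(unequal_spread))
```

Groups with identical spread give W = 0, and W grows as the group spreads diverge.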

Page 10: Multivariate Data Analysis Using SPSS

Preparation for Stat. Analysis-Example-1

Steps in preparation for stat. analysis:
– Check variable coding; recode if necessary
– Examine missing data
– Check for univariate outliers, normality, homogeneity of variances (Explore)
– Test for homogeneity of variances (ANOVA)
– Check for multivariate outliers (Regression>Save>Mahalanobis)
– Check for linearity (scatterplots; residual plots in regression)

Page 11: Multivariate Data Analysis Using SPSS

Preparation for Stat. Analysis-Example-2

Use dataset dssft.sav
Objective: we are interested in investigating group differences (satjob2) in income (income91), age (age_2) and education (educ)
Check for coding: need to recode rincome91 into rincome_2 (22, 98, 99 become system missing)
– Transform>Recode>Into Different Variable

Page 12: Multivariate Data Analysis Using SPSS

Preparation for Stat. Analysis-Example-3

Check for missing values
– Use Frequency for categorical variables
– Use Descriptive Stat. for measurement variables
– For categorical variables:
  – If missing values are < 5%, use the List-wise option
  – If >= 5%, define the missing value as a new category
– For measurement variables:
  – If missing values are < 5%, use the List-wise option
  – If between 5% and 15%, use Transform>Replace Missing Value. Replacing less than 15% of the data has little effect on the outcome
  – If greater than 15%, consider dropping the variable or subject
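
The thresholds above can be written as a small decision helper. This Python sketch only encodes the rules of thumb from this slide; the function name and the returned labels are made up for illustration.

```python
def missing_data_action(n_missing, n_total, categorical):
    """Map the share of missing values to the treatment suggested above."""
    share = n_missing / n_total
    if share < 0.05:
        return "list-wise deletion"
    if categorical:
        # 5% or more missing in a categorical variable
        return "treat missing as its own category"
    if share <= 0.15:
        return "replace missing values"
    return "consider dropping the variable or subject"


print(missing_data_action(3, 100, categorical=True))
print(missing_data_action(10, 100, categorical=False))
print(missing_data_action(20, 100, categorical=False))
```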

Page 13: Multivariate Data Analysis Using SPSS

Preparation for Stat. Analysis-Example-4

– Check missing values for satjob2
  – Analyze>Descriptive Statistics>Frequencies
– Check missing values for rincome_2
  – Analyze>Descriptive Statistics>Descriptives
– Replace the missing values in rincome_2
  – Transform>Replace Missing Values

Page 14: Multivariate Data Analysis Using SPSS

Preparation for Stat. Analysis-Example-5

Check for univariate outliers, normality, homogeneity of variances
– Analyze>Descriptive Statistics>Explore
  – Put rincome_2, age_2, and educ into the Dependent List box; satjob2 into the Factor List box
– There are outliers in rincome_2; let's change those outliers to the acceptable min or max value
  – Transform>Recode>Into Different Variable
    – Put rincome_2 into the Original Variable box, type rincome_3 as the new name
    – Replace all values <= 3 by 4; all other values remain the same

Page 15: Multivariate Data Analysis Using SPSS

Preparation for Stat. Analysis-Example-6

Explore rincome_3 again: not normal
– Transform rincome_3 into rincome_4 by ln or sqrt
Explore rincome_4
Check for multivariate outliers
– Analyze>Regression>Linear
  – Put id (dummy variable) into the Dependent box; put rincome_4, age_2, and educ into the Independent box
  – Click Save, then check the Mahalanobis box
  – Compare the Mahalanobis distance with the chi-square critical value at p = .001 and df = number of independent variables
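
The Mahalanobis comparison above can be sketched in plain Python. To keep it dependency-free this sketch uses only two variables (the slide's example has three), made-up data with one planted outlier, and illustrative names. For df = 2 the chi-square upper-tail probability is exp(-x/2), which gives the critical value in closed form.

```python
import math


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the centroid."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] (divide by n - 1).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d' S^-1 d, using the closed-form inverse of a 2x2 matrix.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Chi-square critical value at p = .001 for df = 2: solve exp(-x / 2) = .001.
CRITICAL_DF2 = -2 * math.log(0.001)

# Hypothetical data: 20 points near a line plus one planted outlier.
points = [(i, i + (i % 3)) for i in range(20)] + [(10, 40)]
d2 = mahalanobis_sq_2d(points)
flagged = [p for p, d in zip(points, d2) if d > CRITICAL_DF2]
print(round(CRITICAL_DF2, 3), flagged)
```

With three variables, as in the slide, the critical value at p = .001 is about 16.27 instead.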

Page 16: Multivariate Data Analysis Using SPSS

Preparation for Stat. Analysis-Example-7

Check for multivariate normality:
– Each variable must be univariate normal
– Construct a scatterplot matrix; each scatterplot should have an elliptical shape
Check for homoscedasticity
– Univariate (ANOVA, Levene’s test)
– Multivariate (MANOVA, Box’s M test; use the .01 level of significance)

Page 17: Multivariate Data Analysis Using SPSS

Review: ANOVA -1

One-way ANOVA tests the equality of group means
– Assumptions: independent observations; normality; homogeneity of variance
Two-way ANOVA tests three hypotheses simultaneously:
– Test the interaction of the levels of the two independent variables
  – Interaction occurs when the effect of one factor depends on the level of the second factor
– Test the two independent variables separately
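
As a reminder of what the one-way ANOVA computes, here is a minimal Python sketch of the F statistic (between-group mean square divided by within-group mean square). The function name and the three small groups are illustrative assumptions; it does not compute the p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_stat, df_b, df_w)
```

Here SS between = 6 on 2 df and SS within = 6 on 6 df, so F = 3.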

Page 18: Multivariate Data Analysis Using SPSS

Review: ANCOVA -1

Idea: the difference on a DV often does not depend on just one or two IVs; it may also depend on other measurement variables. ANCOVA takes such dependency into account.
– i.e. it removes the effect of one or more covariates
Assumptions: in addition to the regular ANOVA assumptions, we need:
– Linear relationship between the DV and the covariates
– The slope of the regression line is the same for each group
– The covariates are reliable and measured without error

Page 19: Multivariate Data Analysis Using SPSS

Review: ANCOVA -2

– Homogeneity of slopes = homogeneity of regression = there is no interaction between the IVs and the covariate
  – If the interaction between the covariate and the IVs is significant, ANCOVA should not be conducted
Example: determine whether hours worked per week (hrs2) differ by gender (sex) and between those satisfied and dissatisfied with their job (satjob2), after adjusting for income (i.e. equalizing their income)

Page 20: Multivariate Data Analysis Using SPSS

Review: ANCOVA -3

– Analyze>GLM>Univariate
  – Move hrs2 into the DV box; move sex and satjob2 into the Fixed Factor box; move rincome_2 into the Covariate box
  – Click Model>Custom
    – Highlight all variables and move them to the Model box
    – Make sure the Interaction option is selected
  – Click Options
    – Move sex and satjob2 into the Display Means box
    – Click Descriptive Stat.; Estimates of effect size; and Homogeneity tests
This tests the homogeneity of regression slopes
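
A quick descriptive look at homogeneity of slopes is to fit the DV on the covariate separately within each group and compare the slopes. The Python sketch below does only that; it is not the interaction F test that SPSS reports, and the data values and function names are invented for illustration.

```python
def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slopes_by_group(covariate, dv, group):
    """Fit a separate regression of the DV on the covariate in each group."""
    result = {}
    for g in sorted(set(group)):
        xs = [c for c, gg in zip(covariate, group) if gg == g]
        ys = [d for d, gg in zip(dv, group) if gg == g]
        result[g] = slope(xs, ys)
    return result


# Hypothetical values standing in for rincome_2, hrs2 and sex.
income = [1, 2, 3, 4, 1, 2, 3, 4]
hours = [40, 42, 44, 46, 30, 35, 40, 45]
sex = ["f", "f", "f", "f", "m", "m", "m", "m"]
print(slopes_by_group(income, hours, sex))
```

Clearly different slopes across groups (here 2 vs. 5) are what a significant covariate-by-factor interaction picks up.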

Page 21: Multivariate Data Analysis Using SPSS

Review: ANCOVA -4

– If no interaction is found in the previous step, repeat the previous step but click Model>Full factorial instead of Model>Custom

Page 22: Multivariate Data Analysis Using SPSS

Review: ANOVA -2

– A significant interaction means the two IVs in combination have a significant effect on the DV; thus, it does not make sense to interpret the main effects.
– Assumptions: the same as one-way ANOVA
– Example: the impact of gender (sex) and age (agecat4) on income (rincome_2)
  – Explore (omitted)
  – Analyze>GLM>Univariate
    – Click Model>click Full factorial>Cont.
    – Click Options>click Descriptive Stat; Estimates of effect size; Homogeneity test
    – Click Post Hoc>click LSD; Bonferroni; Scheffe; Cont.
    – Click Plots>put one IV into Horizontal and the other into Separate Lines

Page 23: Multivariate Data Analysis Using SPSS

MANOVA-1

Characteristics
– Similar to ANOVA
– Multiple DVs
– The DVs are correlated and a linear combination makes sense
– It tests whether mean differences among k groups on a combination of DVs are likely to have occurred by chance
– The idea of MANOVA is to find a linear combination that separates the groups ‘optimally’, and perform ANOVA on the linear combination

Page 24: Multivariate Data Analysis Using SPSS

MANOVA-2

Advantages
– The chance of discovering what actually changed as a result of the different treatments increases
– May reveal differences not shown in separate ANOVAs
– No inflation of type I error
– The use of multiple ANOVAs ignores some very important info (the fact that the DVs are correlated)

Page 25: Multivariate Data Analysis Using SPSS

MANOVA-3

Disadvantages
– More complicated
– ANOVA is often more powerful
Assumptions:
– Independent random samples
– Multivariate normal distribution in each group
– Homogeneity of covariance matrices
– Linear relationship among DVs

Page 26: Multivariate Data Analysis Using SPSS

MANOVA-4

Steps in carrying out MANOVA
– Check the assumptions
– If MANOVA is not significant, stop
– If MANOVA is significant, carry out univariate ANOVAs
– If a univariate ANOVA is significant, do Post Hoc tests
If homoscedasticity holds, use Wilks’ Lambda; if not, use Pillai’s Trace. In general, all 4 statistics should be similar.
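
For intuition about Wilks' Lambda, here is a small Python sketch for the special case of two DVs. It computes det(E) / det(T), where E is the within-group SSCP matrix and T the total SSCP matrix. The helper names and data are illustrative assumptions; the sketch does not convert Lambda to an F or a p-value.

```python
def mean2(rows):
    """Mean vector of a list of (dv1, dv2) rows."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def sscp(rows, center):
    """2x2 SSCP matrix around center, returned as (a, b, c) for [[a, b], [b, c]]."""
    a = sum((r[0] - center[0]) ** 2 for r in rows)
    b = sum((r[0] - center[0]) * (r[1] - center[1]) for r in rows)
    c = sum((r[1] - center[1]) ** 2 for r in rows)
    return a, b, c


def det2(m):
    """Determinant of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    return m[0] * m[2] - m[1] ** 2


def wilks_lambda(groups):
    """Wilks' Lambda = det(E) / det(T); values near 0 mean strong separation."""
    all_rows = [r for g in groups for r in g]
    total = sscp(all_rows, mean2(all_rows))
    within = [0.0, 0.0, 0.0]
    for g in groups:
        for i, v in enumerate(sscp(g, mean2(g))):
            within[i] += v
    return det2(within) / det2(total)


separated = [[(1, 2), (2, 1), (3, 3)], [(6, 7), (7, 6), (8, 8)]]
identical = [[(1, 2), (2, 1), (3, 3)], [(1, 2), (2, 1), (3, 3)]]
print(wilks_lambda(separated), wilks_lambda(identical))
```

Lambda equals 1 when the group mean vectors coincide and shrinks toward 0 as the groups separate.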

Page 27: Multivariate Data Analysis Using SPSS

MANOVA-5

Example: An experiment looking at the memory effects of different instructions: 3 groups of human subjects learned nonsense syllables as they were presented and were administered two memory tests: recall and recognition. The first group of subjects was instructed to like or dislike the syllables as they were presented (to generate affect). A second group was told that they would be tested (induce anxiety?). The 3rd group was told to count the syllables as they were presented (interference). The objective is to assess group differences in memory

Page 28: Multivariate Data Analysis Using SPSS

MANOVA-6

How to do it?
– File>Open Data
  – Open the file As9.por in the Instruct>Zhang Multivariate Short Course folder
– Analyze>GLM>Multivariate
  – Move recall and recog into the Dependent Variables box; move group into the Fixed Factors box
  – Click Options; move group into the Display Means box (this will display the marginal means predicted by the model; these means may differ from the observed means if there are covariates or the model is not factorial); the Compare main effects box is for testing every pair of the estimated marginal means for the selected factors
  – Click Estimates of effect size and Homogeneity of variance

Page 29: Multivariate Data Analysis Using SPSS

MANOVA-7

Push buttons:
– Plots: create a profile plot for each DV displaying group means
– Post Hoc: Post Hoc tests for marginal means
– Save: save predicted values, etc.
– Contrast: perform planned comparisons
– Model: specify the model
– Options:
  – Display Means for: display the estimated means predicted by the model
  – Compare main effects: test for significant differences between every pair of estimated marginal means for each of the main effects

Page 30: Multivariate Data Analysis Using SPSS

MANOVA-8

– Observed power: produce a statistical power analysis for your study
– Parameter estimates: check this when you need a predictive model
– Spread vs. level plot: visual display of homogeneity of variance

Page 31: Multivariate Data Analysis Using SPSS

MANOVA-9

Example 2: Check for the impact of job satisfaction (satjob) and gender (sex) on income (rincome_2) and education (educ) (in gssft.sav)
– Screen data: transform educ to educ2 to eliminate cases with ‘6 or less’
– Check for assumptions: Explore
– MANOVA

Page 32: Multivariate Data Analysis Using SPSS

MANCOVA-1

Objective: test for mean differences among groups on a linear combination of DVs after adjusting for the covariate.
Example: test whether there are differences in productivity (measured by income and hours worked) for individuals in different age groups after adjusting for education level

Page 33: Multivariate Data Analysis Using SPSS

MANCOVA-2

Assumptions: similar to ANCOVA
SPSS how to:
– Analyze>GLM>Multivariate
  – Move rincome_2 and educ2 to the DV box; move sex and satjob into the IV box; move age to the Covariate box
  – Check for homogeneity of regression
    – Click Model>Custom; highlight all variables and move them to the Model box
  – If the covariate-IVs interaction is not significant, repeat the process but select Full factorial under Model

Page 34: Multivariate Data Analysis Using SPSS

Repeated Measure Analysis-1

Objective: test for significant differences in means when the same observation appears in multiple levels of a factor
Examples of repeated measure studies:
– Marketing – compare customers’ ratings on 4 different brands
– Medicine – compare test results before, immediately after, and six months after a procedure
– Education – compare performance test scores before and after an intervention program

Page 35: Multivariate Data Analysis Using SPSS

Repeated Measure Analysis-2

The logic of repeated measure: SPSS performs repeated measure ANOVA by computing contrasts (differences) across the repeated measures factor’s levels for each subject, then testing if the means of the contrasts are significantly different from 0; any between-subject tests are based on the means of the subjects.
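
The contrast logic described above can be sketched for a single contrast: collapse each subject's repeated scores into one contrast value, then test whether the mean of those values differs from 0 with a one-sample t statistic. This Python sketch illustrates the idea only; the data, weights, and function name are assumptions, and SPSS's pooled tests involve more than one contrast.

```python
import math


def contrast_t(scores, weights):
    """One-sample t statistic (and df) for a within-subject contrast.

    scores: one list per subject, one value per level of the repeated factor.
    weights: contrast coefficients, one per level.
    """
    # Collapse each subject's repeated scores into a single contrast value.
    values = [sum(w * s for w, s in zip(weights, row)) for row in scores]
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean / (sd / math.sqrt(n)), n - 1


# Hypothetical before/after scores for 4 subjects; contrast = after - before.
scores = [[10, 12], [11, 14], [9, 10], [12, 15]]
t_stat, df = contrast_t(scores, weights=[-1, 1])
print(round(t_stat, 3), df)
```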

Page 36: Multivariate Data Analysis Using SPSS

Repeated Measure Analysis-3

Assumptions:
– Independent observations
– Normality
– Homogeneity of variances
– Sphericity: if two or more contrasts are to be pooled (the test of the main effect is based on this pooling), then the contrasts should be equally weighted and uncorrelated (equal variances and uncorrelated contrasts); this assumption is equivalent to the covariance matrix of the contrasts being diagonal with equal diagonal elements

Page 37: Multivariate Data Analysis Using SPSS

Repeated Measure Analysis-4

Example 1: A study in which 5 subjects were tested in each of 4 drug conditions
Open data file:
– File>Open…Data; select Repmeas1.por
SPSS repeated measure procedure:
– Analyze>GLM>Repeated Measure
  – Within-Subject Factor Name (the name of the repeated measure factor): a repeated measure factor is expressed as a set of variables
    – Replace factor1 with Drug
  – Number of levels: the number of repeated measurements
    – Type 4

Page 38: Multivariate Data Analysis Using SPSS

Repeated Measure Analysis-5

– The Measure pushbutton has two functions
  – For multiple dependent measures (e.g. we recorded 4 measures of physiological stress under each of the drug conditions)
  – To label the factor levels
    – Click Measure; type memory in the Measure name box; click Add
– Click Define: here we link the repeated measure factor levels to variable names; define between-subject factors and covariates
  – Move drug1 – drug4 to the Within-Subject box
  – You can move a selected variable with the up and down buttons

Page 39: Multivariate Data Analysis Using SPSS

Repeated Measure Analysis-6

Model button: by default a complete model
Contrast button: specify particular contrasts
Plot button: create profile plots that graph factor level estimated marginal means for up to 3 factors at a time
Post Hoc: provide Post Hoc tests for between-subject factors
Save button: allows you to save predicted values, residuals, etc.
Options: similar to MANOVA
– Click Descriptive; click Transformation Matrix (it provides the contrasts)

Page 40: Multivariate Data Analysis Using SPSS

Repeated Measure Analysis-7

Interpret the results
1. Look at the descriptive statistics
2. Look at the test for sphericity
   1. If the sphericity test is significant, use the Multivariate results (test on the contrasts). It tests whether all of the contrast variables are zero in the population
   2. If the sphericity test is not significant, use the Sphericity Assumed result
3. Look at the tests for within-subject contrasts: they test the linear trend, the quadratic trend…
   – These may not make sense in some applications, as in this example (but they make sense in terms of time and dosage)

Page 41: Multivariate Data Analysis Using SPSS

Repeated Measure Analysis-8

The transformation matrix provides information on how the linear contrast, etc. are defined
– The first table is for the average across the repeated measure factor (here the coefficients are all .5, meaning each variable is weighted equally; normalization requires that the sum of the squared coefficients equals 1)
– The second table defines the contrasts for the corresponding repeated measure factor
  Linear: increase by a constant, etc.
  Linear and quadratic are orthogonal, etc.
– Having concluded that there are memory differences due to drug condition, we want to know which conditions differ from which others
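The normalization and orthogonality properties of the transformation matrix can be checked with a short sketch. It assumes a within-subject factor with 4 equally spaced levels (consistent with the .5 weights above) and uses the standard integer polynomial coefficients; the names `raw`, `normalize` and `dot` are illustrative and are not SPSS output.

```python
import math

# Polynomial contrasts for a 4-level within-subject factor
# (standard integer coefficients for equally spaced levels; an assumption).
raw = {
    "average":   [1, 1, 1, 1],
    "linear":    [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic":     [-1, 3, -3, 1],
}

def normalize(coefs):
    """Scale coefficients so that their squares sum to 1."""
    length = math.sqrt(sum(c * c for c in coefs))
    return [c / length for c in coefs]

def dot(a, b):
    """Sum of products of two coefficient vectors."""
    return sum(x * y for x, y in zip(a, b))

contrasts = {name: normalize(c) for name, c in raw.items()}

print(contrasts["average"])  # every weight is 0.5

# Each contrast has squared coefficients summing to 1 and is orthogonal
# to every other contrast.
for a in contrasts:
    for b in contrasts:
        expected = 1.0 if a == b else 0.0
        assert abs(dot(contrasts[a], contrasts[b]) - expected) < 1e-12
```

The linear row rises by a constant step between adjacent levels, which is what "increase by a constant" refers to.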

Page 42: Multivariate Data Analysis Using SPSS
Page 43: Multivariate Data Analysis Using SPSS

Repeated Measure Analysis-9

Repeat the analysis, except under the Options button, move ‘drug’ into Display Means, click Compare main effects, and select the Bonferroni adjustment
– Transformation Coefficients (M Matrix): shows how the variables are created for comparison. Here, we compare the drug conditions, so the M matrix is an identity matrix
Suppose we want to test each adjacent pair of means: drug1 vs. drug2; drug2 vs. drug3; drug3 vs. drug4:
– Repeated Measures>Define>Contrasts>select Repeated
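As a rough illustration of the two ideas on this slide, the sketch below applies a Bonferroni adjustment to all pairwise comparisons among four drug conditions and lists the adjacent pairs that a "Repeated" contrast compares. The p-values are made up for illustration, and `bonferroni` is a hypothetical helper, not an SPSS function.

```python
from itertools import combinations

def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

conditions = ["drug1", "drug2", "drug3", "drug4"]
pairs = list(combinations(conditions, 2))  # all 6 pairwise comparisons

# Hypothetical unadjusted p-values, one per pair (illustration only).
raw_p = [0.004, 0.030, 0.200, 0.010, 0.600, 0.045]
adjusted = bonferroni(raw_p)
for (a, b), p, p_adj in zip(pairs, raw_p, adjusted):
    print(f"{a} vs. {b}: p = {p:.3f}, Bonferroni p = {p_adj:.3f}")

# A "Repeated" contrast compares each level only with the next one.
adjacent = list(zip(conditions, conditions[1:]))
print(adjacent)
```

With four conditions there are six pairwise comparisons but only three adjacent ones, so the repeated contrast asks a narrower question.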

Page 44: Multivariate Data Analysis Using SPSS

Repeated Measure Analysis-10

Example 2: A marketing experiment was devised to evaluate whether viewing a commercial produces improved ratings for a specific brand. Ratings on 3 brands were obtained from subjects before and after viewing the commercial. Since the hope was that the commercial would improve ratings of only one brand (A), researchers expected a significant brand by pre-post commercial interaction. There are two between-subjects factors: sex and brand used by the subject

Page 45: Multivariate Data Analysis Using SPSS

Repeated Measure Analysis-11

SPSS how to:
– Analyze>GLM>Repeated Measures
  Replace factor1 with prepost in the Within-Subject Factor box; type 2 in the Number of Levels box; click Add
  Type brand in the Within-Subject Factor box; type 3 in the Number of Levels box; click Add
  Click Measure; type measure in the Measure Name box; click Add
  Note: SPSS expects 2 between-subject factors

Page 46: Multivariate Data Analysis Using SPSS

Repeated Measure Analysis-12

Click the Define button; move the appropriate variables into place; move sex and user into the Between-Subject Factor box
Click the Options button; move sex, user, prepost and brand into the Display Means box
Click the Homogeneity tests and Descriptive boxes
Click Plot; move user into the Horizontal Axis box and brand into the Separate Lines box
Click Continue; OK

Page 47: Multivariate Data Analysis Using SPSS

Factor Analysis-1

The main goal of factor analysis is data reduction. A typical use of factor analysis is in survey research, where a researcher wishes to represent a number of questions with a smaller number of factors
Two questions in factor analysis:
– How many factors are there, and what do they represent (interpretation)?
Two technical aids:
– Eigenvalues
– Percentage of variance accounted for

Page 48: Multivariate Data Analysis Using SPSS

Factor Analysis-2

Two types of factor analysis:
– Exploratory: introduced here
– Confirmatory: SPSS AMOS
Theoretical basis:
– Correlations among variables are explained by underlying factors
– An example of a mathematical one-factor model for two variables:
  V1 = L1*F1 + E1
  V2 = L2*F1 + E2

Page 49: Multivariate Data Analysis Using SPSS

Factor Analysis-3

Each variable is composed of a common factor (F1) multiplied by a loading coefficient (L1, L2: the lambdas or factor loadings) plus a random component
V1 and V2 correlate because they share the common factor, and their correlation should relate to the factor loadings; thus, the factor loadings can be estimated from the correlations
A set of correlations can yield different factor loadings (i.e. the solutions are not unique)
One should pick the simplest solution
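A minimal sketch of why the solutions are not unique in the one-factor, two-variable model. It assumes standardized variables, a factor with unit variance, and random components that are uncorrelated with the factor and with each other; under those assumptions the implied correlation between V1 and V2 is L1*L2. The observed correlation of 0.40 is made up.

```python
import math

def implied_correlation(l1, l2):
    """Correlation between V1 and V2 implied by the one-factor model.

    Assumes standardized variables, Var(F1) = 1, and random components
    E1, E2 uncorrelated with F1 and with each other.
    """
    return l1 * l2

observed_r = 0.40  # hypothetical correlation between V1 and V2

# Several different loading pairs reproduce the same single correlation.
candidates = [(0.8, 0.5), (0.5, 0.8), (math.sqrt(0.4), math.sqrt(0.4))]
for l1, l2 in candidates:
    r = implied_correlation(l1, l2)
    print(f"L1 = {l1:.3f}, L2 = {l2:.3f} -> implied r = {r:.3f}")
    assert abs(r - observed_r) < 1e-9
```

One correlation cannot pin down two loadings, which is why extra criteria such as simplicity are needed to choose among solutions.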

Page 50: Multivariate Data Analysis Using SPSS

Factor Analysis-4

A factor solution needs to be confirmed:
– By a different factor method
– By a different sample
More on terminology
– Factor loading: interpreted as the Pearson correlation between the variable and the factor
– Communality: the proportion of variability for a given variable that is explained by the factors
– Extraction: the process by which the factors are determined from a large set of variables

Page 51: Multivariate Data Analysis Using SPSS

Factor Analysis-5

Principal components: one of the extraction methods
– A principal component is a linear combination of observed variables that is independent of (orthogonal to) the other components
– The first component accounts for the largest amount of variance in the input data; the second component accounts for the largest amount of the remaining variance…
– Components being orthogonal means they are uncorrelated

Page 52: Multivariate Data Analysis Using SPSS

Factor Analysis-6

Possible application of principal components:
– E.g. in survey research, it is common to have many questions addressing one issue (e.g. customer service). These questions are likely to be highly correlated, which makes it problematic to use them in some statistical procedures (e.g. regression). One can instead use factor scores, computed from the factor loadings on each orthogonal component

Page 53: Multivariate Data Analysis Using SPSS

Factor Analysis-7

Principal components vs. other extraction methods:
– Principal components focus on accounting for the maximum amount of variance (the diagonal of a correlation matrix)
– Other extraction methods (e.g. principal axis factoring) focus more on accounting for the correlations between variables (off-diagonal correlations)
– Principal components can be defined as unique combinations of variables, but the other factor methods cannot
– Principal components are used for data reduction but are more difficult to interpret

Page 54: Multivariate Data Analysis Using SPSS

Factor Analysis-8

Number of factors:
– Eigenvalues are often used to determine how many factors to take
  Take as many factors as there are eigenvalues greater than 1
– An eigenvalue represents the amount of standardized variance in the variables accounted for by a factor
– The amount of standardized variance in a variable is 1
– The sum of the eigenvalues of the retained factors, divided by the number of variables, is the proportion of variance accounted for
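The bookkeeping behind eigenvalues, communalities and percentage of variance can be sketched from a loading matrix. The loadings below are hypothetical unrotated loadings for four standardized variables on two components, chosen only to illustrate the arithmetic; they do not come from any dataset in these slides.

```python
# Hypothetical unrotated loadings: rows are variables, columns are factors.
loadings = [
    [0.80, -0.40],
    [0.75, -0.45],
    [0.70,  0.50],
    [0.65,  0.55],
]
n_vars = len(loadings)
n_factors = len(loadings[0])

# Eigenvalue of a factor: sum of squared loadings down its column.
eigenvalues = [sum(row[j] ** 2 for row in loadings) for j in range(n_factors)]

# Communality of a variable: sum of squared loadings across its row.
communalities = [sum(l ** 2 for l in row) for row in loadings]

# Each standardized variable has variance 1, so total variance = n_vars.
pct_variance = [100 * ev / n_vars for ev in eigenvalues]

# Eigenvalue-greater-than-1 rule for the number of factors to keep.
retained = [ev for ev in eigenvalues if ev > 1]

print("eigenvalues:", [round(ev, 4) for ev in eigenvalues])
print("communalities:", [round(c, 4) for c in communalities])
print("% of variance:", [round(p, 2) for p in pct_variance])
print("factors retained:", len(retained))
```

Dividing by the number of variables is what turns an eigenvalue into a share of the total standardized variance.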

Page 55: Multivariate Data Analysis Using SPSS

Factor Analysis-9

Rotation
– Objective: to facilitate interpretation
– Orthogonal rotation: done when data reduction is the objective and factors need to be orthogonal
  Varimax: attempts to simplify interpretation by maximizing the variance of the variable loadings on each factor
  Quartimax: simplifies the solution by finding a rotation that produces high and low loadings across factors for each variable
– Oblique rotation: use when there is reason to allow factors to be correlated
  Oblimin and Promax (Promax runs fast)

Page 56: Multivariate Data Analysis Using SPSS

Factor Analysis-10

Factor scores: if you are satisfied with a factor solution
– You can request that a new set of variables be created that represents the score of each observation on each factor (difficult to interpret)
– You can use the lambda coefficients to judge which variables are highly related to the factor, then compute the sum or the mean of these variables for further analysis (easy to interpret)
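The second option, a simple scale score built from the high-loading variables, might look like the sketch below. The item names, loadings, responses and the 0.5 cutoff are all assumptions made for illustration; the slides do not give a cutoff.

```python
from statistics import mean

# Hypothetical loadings of five survey items on one factor.
loadings = {"q1": 0.82, "q2": 0.76, "q3": 0.15, "q4": 0.68, "q5": 0.22}
cutoff = 0.5  # assumed threshold for "highly related"; not from the slides

high_items = [item for item, l in loadings.items() if abs(l) >= cutoff]

# Hypothetical responses on a 1-5 scale for three respondents.
responses = [
    {"q1": 4, "q2": 5, "q3": 2, "q4": 3, "q5": 1},
    {"q1": 2, "q2": 2, "q3": 5, "q4": 2, "q5": 4},
    {"q1": 5, "q2": 4, "q3": 3, "q4": 3, "q5": 2},
]

# Scale score: mean of the high-loading items for each respondent.
scale_scores = [mean(r[item] for item in high_items) for r in responses]
print(high_items, scale_scores)
```

The resulting score stays on the original response scale, which is what makes it easier to interpret than a saved factor score.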

Page 57: Multivariate Data Analysis Using SPSS

Factor Analysis-11

Sample size: the sample size should be about 10 to 15 times the number of variables (as with other multivariate procedures)
Number of methods: there are 8 factoring methods, including principal components
– Principal axis: accounts for correlations between the variables
– Unweighted least squares: minimizes the residual between the observed and the reproduced correlation matrices

Page 58: Multivariate Data Analysis Using SPSS

Factor Analysis-12

– Generalized least squares: similar to unweighted least squares, but gives more weight to the variables with stronger correlations
– Maximum likelihood: generates the solution that is most likely to have produced the correlation matrix
– Alpha factoring: considers the variables as a sample; not using factor loadings
– Image factoring: decomposes the variables into a common part and a unique part, then works with the common part

Page 59: Multivariate Data Analysis Using SPSS

Factor Analysis-13

Recommendations:
– Principal components and principal axis are the most commonly used methods
– When there is multicollinearity, use principal components
– Rotations are often done. Try to use Varimax

Page 60: Multivariate Data Analysis Using SPSS

Factor Analysis-14

Example 1: whether a small number of athletic skills account for performance in the ten separate decathlon events
– File>Open>Data…; select Olymp88.por
– Looking at correlations:
  Analyze>Correlate>Bivariate
– Principal components with orthogonal rotation
  Analyze>Data Reduction>Factor
  – Select all variables except score
  – Click Extraction button>click Scree plot
  – Check off Unrotated factor solution
  – Click Continue

Page 61: Multivariate Data Analysis Using SPSS

Factor Analysis-15

Click Rotation button>click Varimax; Loading plots; click Continue
Click Options button>click Sorted by size; click the Suppress absolute values box; change .1 to .3; click Continue
Click Descriptives>Univariate descriptives; KMO and Bartlett’s test of sphericity (KMO measures how well the sample data are suited for factor analysis: .9 is great and less than .5 is not acceptable; Bartlett’s test tests the sphericity of the correlation matrix); click Continue
Click OK

Page 62: Multivariate Data Analysis Using SPSS

Factor Analysis-16

Try to validate the first factor solution using a different method
– Analyze>Data Reduction>Factor Analysis
  Click Extraction>select Principal axis factoring; click Continue
  Click Rotation>select Direct Oblimin (leave the delta value at 0, the most oblique value possible); type 50 in the Max Iterations box; click Continue
  Click Scores button>click Save as variables (this involves solving a system of equations for the factors; regression is one of the methods for solving the equations); click Continue
  Click OK

Page 63: Multivariate Data Analysis Using SPSS

Factor Analysis-17

Note: the Pattern matrix gives the standardized linear weights and the Structure matrix gives the correlations between the variables and the factors (in principal component analysis, the component matrix gives both the factor loadings and the correlations)

Page 64: Multivariate Data Analysis Using SPSS

Discriminant Analysis-1

Discriminant analysis characterizes the relationship between a set of IVs and a categorical DV with relatively few categories
– It creates a linear combination of the IVs that best characterizes the differences among the groups
– Predictive discriminant analysis focuses on creating a rule to predict group membership
– Descriptive DA studies the relationship between the DV and the IVs

Page 65: Multivariate Data Analysis Using SPSS

Discriminant Analysis-2

Possible applications:
– Should a bank offer a loan to a new customer?
– Which customers are likely to buy?
– Identify patients who may be at high risk for problems after surgery

Page 66: Multivariate Data Analysis Using SPSS

Discriminant Analysis-3

How does it work?
– Assume the population of interest is composed of distinct populations
– Assume the IVs follow a multivariate normal distribution
– DA seeks a linear combination of the IVs that best separates the populations
– If we have k groups, we need k-1 discriminant functions
– A discriminant score is computed for each function
– This score is used to classify cases into one of the categories

Page 67: Multivariate Data Analysis Using SPSS

Discriminant Analysis-4

– There are three methods to classify group membership:
  Maximum likelihood method: assign a case to group k if the probability of membership is greater in group k than in any other group
  Fisher (linear) classification functions: assign a case to group k if its score on the function for group k is greater than its score on any other function
  Distance function: assign a case to group k if its distance to the centroid of that group is the minimum
  Note: SPSS uses the maximum likelihood method
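The distance-function rule can be sketched in a few lines. This version uses plain Euclidean distance on two made-up predictors, which is an assumption for simplicity; discriminant analysis software typically measures distance in a way that accounts for the covariance of the IVs. The group names and data are hypothetical.

```python
import math

def centroid(points):
    """Mean of each coordinate over a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def classify(case, centroids):
    """Return the group whose centroid is closest to the case."""
    return min(centroids, key=lambda g: math.dist(case, centroids[g]))

# Hypothetical training cases: (age, rating) for two groups.
groups = {
    "buy":    [(25, 8), (30, 9), (28, 7)],
    "no_buy": [(50, 3), (45, 4), (55, 2)],
}
centroids = {g: centroid(pts) for g, pts in groups.items()}

print(classify((32, 6), centroids))
print(classify((48, 3), centroids))
```

In practice the predictors would be standardized or the distance adjusted first, since raw Euclidean distance lets the variable with the largest scale dominate.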

Page 68: Multivariate Data Analysis Using SPSS

Discriminant Analysis-5

Basic steps in DA:
– Identify the variables
– Screen the data: look for outliers, variables that may not be good predictors, etc.
– Run DA
– Check the correct prediction rate
– Check the importance of individual predictors
– Validate the model

Page 69: Multivariate Data Analysis Using SPSS

Discriminant Analysis-6

Assumptions:
– IVs are either dichotomous or measurement
– Normality
– Homogeneity of variances

Page 70: Multivariate Data Analysis Using SPSS

Discriminant Analysis-7

Example 1: VCR buyers filled out a survey; we want to determine which set of demographic information and attitudes best predicts which customers may buy another VCR
– File>Open Data…>CSM.por
– Explore the data
– Analyze>Classify>Discriminant
  Move age, complain, educ, fail, pinnovat, preliabl, puse, qual, use, and value into the Independents box
  Move buyyes into the Grouping box
  Click Define Range; type 1 for Min and 2 for Max
  Click Continue

Page 71: Multivariate Data Analysis Using SPSS

Discriminant Analysis-8

  Click Statistics>click Box’s M and Fisher’s; Continue
  Click Classify button>click Summary table; Separate-groups; Continue
  Click Save button>click Discriminant scores; Continue
  Click OK
– How are the original variables related to the discriminant score?
  Graphs>Scatter>click Define
  – Move pinnovat into X and dis1_1 into Y; move buyyes into the Set Markers by box

Page 72: Multivariate Data Analysis Using SPSS

Discriminant Analysis-9

Since Box’s M test was significant, one can ask SPSS to run DA using the ‘separate covariances’ option (under Classify) and compare the results
From the 1st analysis, we see that ‘age’ was not important; one can redo the analysis without ‘age’ and compare the results

Page 73: Multivariate Data Analysis Using SPSS

Discriminant Analysis-10

Validate the model: leave-one-out classification
– Repeat the analysis; click Classify>click Leave-one-out classification; click Continue
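The idea behind leave-one-out classification can be sketched as follows: each case is held out in turn, the rule is rebuilt from the remaining cases, and the held-out case is then classified. The classifier here is a simple nearest-group-mean rule on a single score, used only as a stand-in for the discriminant rule; the data are hypothetical.

```python
def nearest_mean(train, x):
    """Classify x by the closest group mean (stand-in for the discriminant rule)."""
    means = {}
    for label in {lab for _, lab in train}:
        values = [v for v, lab in train if lab == label]
        means[label] = sum(values) / len(values)
    return min(means, key=lambda lab: abs(x - means[lab]))

def leave_one_out_accuracy(data):
    """Hold out each case in turn, fit on the rest, score the held-out case."""
    hits = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += nearest_mean(train, x) == label
    return hits / len(data)

# Hypothetical (score, group) pairs for groups coded 1 and 2.
data = [(-2.1, 1), (-1.4, 1), (-0.9, 1), (0.3, 1),
        (-0.2, 2), (1.1, 2), (1.8, 2), (2.5, 2)]

print(leave_one_out_accuracy(data))
```

Because each case is classified by a rule that never saw it, the resulting hit rate is a less optimistic estimate than the rate computed on the cases used to build the rule.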

Example 2: predict smoking and drinking habits
– Analyze>Classify>Discriminant
  Move smkdrnk into the Grouping Variable box; move age, attend, black, class, educ, sex and white into the IV list
  Click Statistics>select Fisher’s and Box’s M; Continue
  Click Classify>Summary table; Combined-groups; Territorial map; Continue
  Click OK

Page 74: Multivariate Data Analysis Using SPSS

Cluster Analysis-1

Cluster analysis is an exploratory data analysis technique designed to reveal groups
How?
– By distance: observations that are close together should be in the same group, and observations in different groups should be far apart
Applications:
– Plants and animals into ecological groups
– Companies for product usage

Page 75: Multivariate Data Analysis Using SPSS

Cluster Analysis-2

Two types of method
– Hierarchical: requires observations to remain together once they have joined in a cluster
  Complete linkage
  Between-groups average linkage
  Ward’s method
– Nonhierarchical: no such requirement
  The researcher must pick the number of clusters to run (K-means algorithm)
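A bare-bones sketch of the K-means idea, in which the analyst supplies the number of clusters through the starting centers. The data and starting centers are hypothetical, and this is a plain textbook version rather than the exact procedure SPSS runs.

```python
import math

def k_means(points, centers, max_iter=100):
    """Plain K-means: alternate assignment and center update until stable."""
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical standardized observations; K = 2 is chosen up front.
points = [(-1.2, -0.8), (-0.9, -1.1), (-1.0, -1.0),
          (1.1, 0.9), (0.8, 1.2), (1.0, 1.0)]
labels, centers = k_means(points, centers=[points[0], points[3]])
print(labels)
```

Unlike the hierarchical methods, a point here can change cluster from one pass to the next, which is the "no such requirement" property.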

Page 76: Multivariate Data Analysis Using SPSS

Cluster Analysis-3

Recommendations:
– For relatively small samples (fewer than a few hundred), use hierarchical
– For large samples, use K-means
Example 1: evaluating 20 types of beer
– File>Open>Data; select beer.por
– Analyze>Descriptive Stat>Descriptive
  Move cost, calories, sodium, and alcohol into the variable list
  Click Save standardized values; OK
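Saving standardized values amounts to converting each variable to z-scores so that variables measured in different units contribute comparably to the distances. A minimal sketch follows, using the sample standard deviation (n - 1 in the denominator); the calorie figures are made up and are not the beer.por data.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical calorie counts for five beers (illustration only).
calories = [144, 151, 157, 170, 152]
zcalorie = z_scores(calories)
print([round(z, 3) for z in zcalorie])
```

Without this step a variable such as calories, with values in the hundreds, would dominate a variable such as cost in any distance calculation.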

Page 77: Multivariate Data Analysis Using SPSS

Cluster Analysis-4

Analyze>Classify>Hierarchical Cluster
– Move cost, calories, sodium, and alcohol into the Variable list box
– Move Beer into the Label Cases by box
– Click Plots>click Dendrogram; click None in the Icicle area; Continue
– Click Method>select Z scores from the Standardize drop-down list; Continue
– Click Save>click Range of solutions; range 2-5 clusters; Continue
– OK

Page 78: Multivariate Data Analysis Using SPSS

Cluster Analysis-5

Additional analysis
– Look at the last 4 columns of the data (clu5_1 to clu2_1); they contain the cluster memberships for each solution between 5 and 2 clusters
– Analyze>Descriptive>Frequencies
  Move clu2_1 to clu5_1 into the Variable box
  OK
– Obtain mean profiles for the clusters
  Graph>Line>Summaries of separate variables
  – Click Define>move zcost, zcalorie, zsodium, and zalcohol into the Lines Represent box
  – Click clu4_1 and move it into the Category box

Page 79: Multivariate Data Analysis Using SPSS

Path Analysis-1

Path analysis is a technique based on regression to establish causal relationships
– Start with a diagram of the causal flow
– Direct causal effects model (regression)
  The direct causal effect of an IV on a DV is the coefficient (the number of units of change in the DV for a 1-unit change in the IV)
– Building on the DCEM
Two forms of causal model:
– Diagram
– Equation (structural equation)

Page 80: Multivariate Data Analysis Using SPSS

Path Analysis-2

An example of a causal model
– Structural equation:
  Z4 = p41*Z1 + p42*Z2 + p43*Z3 + e4
– p: path coefficient
– e: disturbance
– Z4: endogenous variable
– Z1: exogenous variable
– Path diagram
  An indirect effect is the product of the path coefficients along the path
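A small sketch of how direct, indirect and total effects combine. The structural equation for Z4 is taken from the slide, but the paths from Z1 into Z2 and Z3 and all numeric coefficients are assumptions made for illustration, since the path diagram itself is not reproduced here.

```python
# Hypothetical standardized path coefficients; keys are (to, from),
# following the p41-style subscripts. Paths into Z2 and Z3 are assumed.
p = {
    (4, 1): 0.30, (4, 2): 0.25, (4, 3): 0.40,  # Z4 = p41*Z1 + p42*Z2 + p43*Z3 + e4
    (2, 1): 0.50,                               # assumed path Z1 -> Z2
    (3, 1): 0.60,                               # assumed path Z1 -> Z3
}

# Direct effect of Z1 on Z4: the single straight arrow.
direct = p[(4, 1)]

# Each indirect effect is the product of the coefficients along one path.
indirect_paths = {
    "Z1 -> Z2 -> Z4": p[(2, 1)] * p[(4, 2)],
    "Z1 -> Z3 -> Z4": p[(3, 1)] * p[(4, 3)],
}
indirect = sum(indirect_paths.values())

# Total causal effect: direct plus all indirect effects.
total = direct + indirect
print(round(direct, 3), round(indirect, 3), round(total, 3))
```

Under these assumed values, Z1 influences Z4 slightly more through the mediating variables than it does directly.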

Page 81: Multivariate Data Analysis Using SPSS

Path Analysis-3

Steps in path analysis:
– Create a path diagram
– Use regression to estimate the structural equation coefficients
– Assess the model:
  Compare the observed and reproduced correlations (the reproduced correlations will be computed by hand)
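The hand computation of reproduced correlations can be sketched for the simplest possible case. The model assumed here is a three-variable chain Z1 -> Z2 -> Z3 with no direct Z1 -> Z3 path, and the observed correlations are made up; the .05 tolerance is the one these slides use when judging whether a model is in question.

```python
# Hypothetical observed correlations among three standardized variables.
observed = {(1, 2): 0.50, (2, 3): 0.60, (1, 3): 0.45}

# Assumed model: Z1 -> Z2 -> Z3, no direct Z1 -> Z3 path. With a single
# predictor per equation, each path coefficient equals the correlation.
p21 = observed[(1, 2)]
p32 = observed[(2, 3)]

reproduced = {
    (1, 2): p21,
    (2, 3): p32,
    (1, 3): p21 * p32,  # the only route from Z1 to Z3 runs through Z2
}

TOLERANCE = 0.05  # difference above which the model is in question
flagged = [pair for pair in observed
           if abs(observed[pair] - reproduced[pair]) > TOLERANCE]

for pair in observed:
    gap = abs(observed[pair] - reproduced[pair])
    print(pair, round(reproduced[pair], 3), round(gap, 3))
print("pairs to investigate:", flagged)
```

A large gap for the Z1 and Z3 pair would suggest that the omitted direct path deserves a test.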

Page 82: Multivariate Data Analysis Using SPSS

Path Analysis-4

Research questions:
– Is our model, which describes the causal effects among the variables ‘region of the world’, ‘status as a developing nation’, ‘number of doctors’, and ‘male life expectancy’, consistent with the observed correlations among these variables?
– If our model is consistent, what are the estimated direct, indirect, and total causal effects among the variables?

Page 83: Multivariate Data Analysis Using SPSS

Path Analysis-5

Legal path:
– No path may pass through the same variable more than once
– No path may go backward on an arrow after going forward on another arrow
– No path may include more than one double-headed curved arrow

Page 84: Multivariate Data Analysis Using SPSS

Path Analysis-6

Component labels:
– D: direct effect (just one straight arrow)
– I: indirect effect (more than one straight arrow)
– S: spurious effect (there is a backward arrow)
– U: effect is uncertain (starts with a double-headed curved arrow)

Page 85: Multivariate Data Analysis Using SPSS

Path Analysis-7

If the model is in question (some of the reproduced correlations differ from the observed correlations by more than .05):
– Test all missing paths (run additional regressions and check the significance of the coefficients)
– Remove existing paths if their coefficients are not significant

Page 86: Multivariate Data Analysis Using SPSS

Logistic regression - Motivations

When the dependent variable is dichotomous, regular regression is not appropriate
– We want to predict a probability
– OLS regression predictions could be any number, not just numbers between 0 and 1
– When dealing with proportions, the variance depends on the mean, so the equal variance assumption in OLS is violated
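The point about variance follows from a short worked step. The symbols Y and p are introduced here only for illustration: if Y takes the value 1 with probability p and 0 otherwise, then

```latex
E[Y] = p, \qquad \operatorname{Var}(Y) = E[Y^2] - (E[Y])^2 = p - p^2 = p(1 - p)
```

so the variance changes whenever the mean p changes. It is largest at p = .5 (where it equals .25) and shrinks toward 0 as p approaches 0 or 1.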

Page 87: Multivariate Data Analysis Using SPSS

Motivations-Continue

Fit an S curve to the data

[Figure: S-shaped curve. Horizontal axis: Income (0 to 10); vertical axis: Prob of Owning Home (0.0 to 1.0)]

Page 88: Multivariate Data Analysis Using SPSS

What is Logistic Regression?

Regressions of the form

$\ln(\text{Odds}) = B_0 + B_1X_1 + \dots + B_kX_k$

ln(Odds) is called a logit

Odds = Prob/(1 − Prob)

$\text{Prob} = \dfrac{e^{B_0 + B_1X_1 + \dots + B_kX_k}}{1 + e^{B_0 + B_1X_1 + \dots + B_kX_k}}$
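The relationships among probability, odds, and log odds can be checked numerically. The sketch below uses only the definitions on this slide; the function names are illustrative choices, not part of SPSS.

```python
import math

def odds_from_prob(p):
    """Odds = Prob / (1 - Prob)."""
    return p / (1.0 - p)

def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(odds_from_prob(p))

def prob_from_logit(z):
    """Invert the logit: Prob = e^z / (1 + e^z)."""
    return math.exp(z) / (1.0 + math.exp(z))

for p in (0.1, 0.5, 0.9):
    z = logit(p)
    # Converting to a logit and back should recover the probability.
    print(f"p={p:.1f} odds={odds_from_prob(p):.3f} logit={z:+.3f} back={prob_from_logit(z):.1f}")
```

Any real-valued logit maps back to a probability between 0 and 1, which is why the model is fitted on the log odds scale.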

Page 89: Multivariate Data Analysis Using SPSS

Application of Logistic Regression

When to use it?
– When the dependent variable is dichotomous

Objectives:
– Run a logistic regression
– Apply a stepwise logistic regression
– Use the ROC (receiver operating characteristic) curve to assess the model

Page 90: Multivariate Data Analysis Using SPSS

Assumptions of logistic regression

The independent variables are interval or dichotomous

All relevant predictors are included, no irrelevant predictors are included, and the form of the relationship is linear (in the logit)

The expected value of the error term is zero

There is no autocorrelation

Page 91: Multivariate Data Analysis Using SPSS

Assumptions of logistic regression – Cont.

There is no correlation between the error and the independent variables

There is an absence of perfect multicollinearity between the independent variables

Need to have a large sample (rule of thumb: n should be > 30 times the number of parameters)

Page 92: Multivariate Data Analysis Using SPSS

Note on assumptions

No need for normality of errors

No need for equal variance

Page 93: Multivariate Data Analysis Using SPSS

Example

Objective: to predict low birth weight babies

Variables:
– Low: 1: <=2500 grams, 0: >2500 grams
– LWT: weight at last menstrual cycle
– Age
– Smoke
– PTL: # of premature deliveries
– HT: History of Hypertension
– UI: uterine irritability
– FTV: # of physician visits during first trimester
– Race: 1=white, 2=black, 3=other

Page 94: Multivariate Data Analysis Using SPSS

Example

File > Open > Data > Select SPSS Portable type > select Birthwt (in Regression)

Analyze > Regression > Binary Logistic
– Move ‘low’ to the Dependent list box
– Move ‘age’, ‘ftv’, ‘ht’, ‘ptl’, ‘race’, ‘smoke’, and ‘ui’ into the Covariate list box

Page 95: Multivariate Data Analysis Using SPSS

Example (cont.)

Click the Categorical button
– Place ‘race’ in the Categorical Covariates box

Click Continue, click Save
– Click the Probability and Group Membership check boxes

Click Continue and then the Options button

Page 96: Multivariate Data Analysis Using SPSS

Example (cont.)

Click on the Classification plots and Hosmer-Lemeshow goodness of fit checkboxes

Click Continue, then OK

Page 97: Multivariate Data Analysis Using SPSS

Logistic outputs

Initial summary output: info on dependent and categorical variables

Block 0: based on the model that includes just a constant – provides baseline info

Block 1: Method Enter – includes the model info
– Chi-square tests if all the coeffs are 0 (similar to ‘F’ in regression)

Page 98: Multivariate Data Analysis Using SPSS

Logistic outputs (cont.)

The Model chi-square value is the difference between the initial and final -2LL (a small value of -2LL indicates a good fit; -2LL=0 indicates a perfect fit)

The Step and Block rows display the results of the last Step and Block (they are the same here because we are not using stepwise regression)
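How the model chi-square arises from the two -2LL values can be sketched by hand. The outcomes and fitted probabilities below are hypothetical, and the function name `minus_two_log_likelihood` is an illustrative choice; the sketch assumes the initial model is the constant-only model, which predicts the overall proportion of 1s for every case.

```python
import math

def minus_two_log_likelihood(outcomes, probs):
    """-2LL for 0/1 outcomes given each case's predicted probability of a 1."""
    log_lik = 0.0
    for y, p in zip(outcomes, probs):
        log_lik += math.log(p) if y == 1 else math.log(1.0 - p)
    return -2.0 * log_lik

# Hypothetical outcomes and fitted probabilities for six cases.
outcomes = [1, 0, 0, 1, 0, 0]
fitted_probs = [0.8, 0.2, 0.3, 0.6, 0.1, 0.4]

# The constant-only model predicts the overall proportion of 1s for everyone.
base_rate = sum(outcomes) / len(outcomes)
initial = minus_two_log_likelihood(outcomes, [base_rate] * len(outcomes))
final = minus_two_log_likelihood(outcomes, fitted_probs)

model_chi_square = initial - final  # improvement over the constant-only model
print(f"initial -2LL={initial:.3f} final -2LL={final:.3f} chi-square={model_chi_square:.3f}")
```

The closer the fitted probabilities sit to the actual 0/1 outcomes, the closer the final -2LL gets to 0 and the larger the chi-square becomes.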

Page 99: Multivariate Data Analysis Using SPSS

Logistic outputs (cont.)

The goodness of fit statistic -2LL is 203.554

Cox & Snell R square – similar to R-square in OLS

Nagelkerke R square (preferred because it can reach 1)

Hosmer and Lemeshow test: tests “there is no difference between expected and observed counts”, i.e. we prefer a non-significant result

Page 100: Multivariate Data Analysis Using SPSS

Logistic outputs (cont.)

Classification table: can our model predict accurately?
– Overall accuracy is 73%
– We do much better on higher birth weight
– Does a poor job on lower birth weight
– A significant model doesn’t mean having high predictability
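A classification table of this kind can be rebuilt from saved predicted probabilities. The sketch below uses hypothetical data (not the Birthwt results) and an illustrative function name; it assumes a case is assigned to group 1 when its predicted probability is at or above the cut-off.

```python
def classification_table(actual, predicted_prob, cutoff=0.5):
    """Count cases by actual group (0/1) and predicted group at the cut-off."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for y, p in zip(actual, predicted_prob):
        predicted = 1 if p >= cutoff else 0
        table[(y, predicted)] += 1
    return table

# Hypothetical actual outcomes and saved predicted probabilities.
actual = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
predicted_prob = [0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.30, 0.40, 0.55, 0.70]

table = classification_table(actual, predicted_prob)
overall = (table[(0, 0)] + table[(1, 1)]) / len(actual)
correct_0 = table[(0, 0)] / (table[(0, 0)] + table[(0, 1)])  # accuracy within group 0
correct_1 = table[(1, 1)] / (table[(1, 0)] + table[(1, 1)])  # accuracy within group 1

print(f"overall={overall:.0%} group 0={correct_0:.0%} group 1={correct_1:.0%}")
```

As on the slide, overall accuracy can hide a large gap between the two groups, so the per-group percentages are worth reading separately.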

Page 101: Multivariate Data Analysis Using SPSS

Interpretation of the coefficients

E.g. HT (hypertension)
– B=1.763: hypertension in the mother increases the log odds by 1.763
– Exp(B)=5.831: hypertension in the mother increases the odds of having a low birth weight baby by a factor of 5.831
– What is the prob. change?

If the original odds are 1:100 (p=.0099), they change to 5.831:100 (p=.0551); if the original odds are 1:1 (p=.5), they change to 5.831:1 (p=.854)
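The probability change can be reproduced directly from Exp(B). This sketch uses the reported Exp(B) of 5.831 and the unrounded odds 5.831:1 for the second case; the function name is an illustrative choice.

```python
def prob_from_odds(odds):
    """Convert odds given as one number (e.g. 1:100 -> 0.01) to a probability."""
    return odds / (1.0 + odds)

EXP_B = 5.831  # Exp(B) for HT, as reported on the slide

for label, odds in (("1:100", 1 / 100), ("1:1", 1.0)):
    before = prob_from_odds(odds)
    after = prob_from_odds(odds * EXP_B)  # the odds are multiplied by Exp(B)
    print(f"odds {label}: p before={before:.4f}, p after={after:.4f}")
```

The same multiplicative change in odds therefore produces a very different change in probability depending on where the case starts.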

Page 102: Multivariate Data Analysis Using SPSS

Interpretation of the coefficients (cont.)

Categorical variable Race:
– First an overall effect
– Race(1) – white: the effect of being white is significant, acting to decrease the odds compared with those of ‘other’ by a factor of .4
– The effect of being black is not significant compared with ‘other’

Page 103: Multivariate Data Analysis Using SPSS

Making prediction

Suppose a mother:
– Age 20
– Weighs 130 pounds
– Smokes
– No hypertension or premature labor
– Has uterine irritability
– White
– Two visits to her doctor

Page 104: Multivariate Data Analysis Using SPSS

Making prediction (cont.)

$P(\text{event}) = \dfrac{1}{1 + e^{-(a + b_1X_1 + \dots + b_kX_k)}}$

P=.397

Predicted not to have a low birth weight baby because the prob. is less than .5

Page 105: Multivariate Data Analysis Using SPSS

Checking classification

Need to study the characteristics of mispredicted cases
– Transform>Compute> Pred_err=1 if…
– Analyze>Compare Means (LWT vs Pred_err)

The mean LWT for the mispredicted cases is much lower than for the correctly predicted cases

Page 106: Multivariate Data Analysis Using SPSS

Residual Analysis

Analyze>Regression>Logistic>Click Save >Click Cook’s, Leverage, Unstandardized, Logit, and Standardized

Examining data
– Cook’s and Leverage should be small (if a case has no influence on the regression result, the values would be 0)
– Res_1 is the residual of the probability (e.g. the 1st case has predicted prob. .29804 and its actual ‘low’ value is 0, so res_1=0-.29804=-.29804)
– Zre_1 is the standardized residual of the probs
– lre_1 is the residual in terms of the logit
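The three residuals can be sketched for the first case. The formulas below are the commonly used definitions (standardized residual = residual divided by the square root of p(1 − p); logit residual = residual divided by p(1 − p)). That SPSS's saved variables follow exactly these definitions is an assumption here, and the function name is illustrative.

```python
import math

def logistic_residuals(actual, predicted_prob):
    """Return (raw, standardized, logit-scale) residuals for one case."""
    variance = predicted_prob * (1.0 - predicted_prob)  # variance of a 0/1 outcome
    raw = actual - predicted_prob
    standardized = raw / math.sqrt(variance)  # Pearson-type residual
    logit_scale = raw / variance              # residual on the logit scale
    return raw, standardized, logit_scale

# First case from the slide: predicted prob. .29804, actual 'low' = 0.
raw, standardized, logit_scale = logistic_residuals(0, 0.29804)
print(f"res={raw:.5f} zre={standardized:.3f} lre={logit_scale:.3f}")
```

Comparing these hand-computed values with the saved res_1, zre_1, and lre_1 columns is a quick way to confirm which definitions the software used.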

Page 107: Multivariate Data Analysis Using SPSS

ROC curve (Receiver Operating Characteristic)

Sensitivity: true positive rate

Specificity: true negative rate

Changing the cut off point (.5) changes both the sensitivity and specificity

ROC can help us to select an ‘optimal’ cut off point

Graph>ROC Curve>move pre_1 to ‘Test Variable’, low to ‘State Variable’, type ‘1’ in the ‘Value of State Variable’, click ‘with diagonal reference line’ and ‘Coordinate points of the ROC Curve’

Page 108: Multivariate Data Analysis Using SPSS

ROC curve interpretation

Vertical axis: sensitivity (true positive rate)

Horizontal axis: 1 – specificity (false positive rate)

Diagonal: reference

Gives the trade off between sensitivity and false positive rates

Pay attention to the area where the curve rises rapidly

The 1st column of ‘coordinate of the curve’ gives the cut off prob.

Page 109: Multivariate Data Analysis Using SPSS

Residual Analysis – Cont.

Examine the distribution of zre_1
– Graph>Interactive>Histogram>drag zre_1 to X axis, click Histogram, click Normal Curve

Note: this plot need not show normality

Finding influential cases
– Graph>Scatterplot>Define>Move id to X axis, coo_1 to Y axis

Multicollinearity
– Use OLS regression to check (?)

Page 110: Multivariate Data Analysis Using SPSS

Multinomial Logistic Regression

The dependent variable is categorical with two or more categories

It is an extension of logistic regression

The assumptions are the assumptions for logistic regression plus ‘the dependent variable has a multinomial distribution’

Page 111: Multivariate Data Analysis Using SPSS

Example

Objective: predict credit risk (3 categories) based on financial and demographic variables

Variables:
– Age
– Income
– Gender
– Marital (single, married, divsepwid)
– Numkids: # of dependent children

Page 112: Multivariate Data Analysis Using SPSS

Example Cont.

– Numcards: # of credit cards
– Howpaid: how often paid (weekly, monthly)
– Mortgage: have a mortgage (y, n)
– Storecar: # of store credit cards
– Loans: # of other loans
– Risk: 1=bad risk-lost, 2=bad risk-profit, 3=good risk

Page 113: Multivariate Data Analysis Using SPSS

How does it work?

Let f(j) be the probability of being in outcome category j
– f(1)=P(bad risk-lost)
– f(2)=P(bad risk-profit)
– f(3)=P(good risk)
– g(1)=f(1)/f(3)
– g(2)=f(2)/f(3)
– g(3)=f(3)/f(3)=1

Page 114: Multivariate Data Analysis Using SPSS

How does it work? – Cont.

Fit the model:
– $\ln(g(1)) = A_1 + B_{11}X_1 + \dots + B_{1k}X_k$
– $\ln(g(2)) = A_2 + B_{21}X_1 + \dots + B_{2k}X_k$
– $\ln(g(3)) = \ln(1) = 0 = A_3 + B_{31}X_1 + \dots + B_{3k}X_k$

$f(j) = \dfrac{g(j)}{\sum_i g(i)}$

Page 115: Multivariate Data Analysis Using SPSS

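How does it work? – Cont.

$f(1) = \dfrac{g(1)}{g(1) + g(2) + 1} = \dfrac{e^{A_1 + B_{11}X_1 + \dots + B_{1k}X_k}}{e^{A_1 + B_{11}X_1 + \dots + B_{1k}X_k} + e^{A_2 + B_{21}X_1 + \dots + B_{2k}X_k} + 1}$

$f(2) = \dfrac{g(2)}{g(1) + g(2) + 1} = \dfrac{e^{A_2 + B_{21}X_1 + \dots + B_{2k}X_k}}{e^{A_1 + B_{11}X_1 + \dots + B_{1k}X_k} + e^{A_2 + B_{21}X_1 + \dots + B_{2k}X_k} + 1}$

$f(3) = \dfrac{g(3)}{g(1) + g(2) + 1} = \dfrac{1}{e^{A_1 + B_{11}X_1 + \dots + B_{1k}X_k} + e^{A_2 + B_{21}X_1 + \dots + B_{2k}X_k} + 1}$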

Page 116: Multivariate Data Analysis Using SPSS

Example – Cont.

File > Open > Data > Select Risk > Open

Move risk into the Dependent list box

Move marital and mortgage into the Factor(s) list box

Move income and numkids into the Covariate(s) list box

Click the Model button
– Click the Cancel button

Page 117: Multivariate Data Analysis Using SPSS

Example (Cont.)

Click the Statistics button
– Check the Classification table check box
– Click Continue

Click Save
– The Multinomial Logistic regression in SPSS version 10 will only save model info in an XML (Extensible Markup Language) format
– Click Cancel

Click OK

Page 118: Multivariate Data Analysis Using SPSS

Multinomial output

Model Fit, Pseudo R-square, and Likelihood ratio test are similar to logistic regression

Parameter estimates table is different
– There are two sets of parameters

One set for the probability ratio of (bad risk-lost)/(good risk)

Another set for the prob. ratio of (bad risk-profit)/(good risk)

Page 119: Multivariate Data Analysis Using SPSS

Interpretation of coefficients

Income in the ‘bad lost’ section
– It is significant
– Exp(B)=.962: the expected probability ratio is decreased a little (by a factor of .962) for a one unit increase in income

Page 120: Multivariate Data Analysis Using SPSS

How to predict?

F(1) – the chance of being in the ‘bad loss’ group

F(2) – the chance of being in the ‘bad profit’ group

F(3) – the chance of being in the ‘good risk’ group

F(j)=g(j)/sum(g(i))

$g(j) = \exp(\text{model}_j)$

Page 121: Multivariate Data Analysis Using SPSS

How to predict? (cont.)

Suppose an individual
– Single, has a mortgage
– No children
– Income of 35,000 pounds

g(1)=.218

g(2)=.767

g(3)=1

Page 122: Multivariate Data Analysis Using SPSS

How to predict?

F(1)=.218/(.218+.767+1)=.110

F(2)=.386

F(3)=.504

The individual is classified as good risk
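The arithmetic behind this classification can be reproduced in a few lines. The g(j) values are the ones given on the slides; the function name and dictionary layout are illustrative choices.

```python
def category_probabilities(ratios):
    """Turn probability ratios g(j) (relative to the reference group) into F(j)."""
    total = sum(ratios.values())
    return {group: g / total for group, g in ratios.items()}

# g(j) values from the slide; 'good risk' is the reference category, so g = 1.
g = {"bad risk-lost": 0.218, "bad risk-profit": 0.767, "good risk": 1.0}

F = category_probabilities(g)
predicted_group = max(F, key=F.get)  # classify into the most probable category

for group, prob in F.items():
    print(f"{group}: {prob:.3f}")
print("classified as:", predicted_group)
```

Dividing each g(j) by their sum guarantees that the three probabilities add up to 1.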

Page 123: Multivariate Data Analysis Using SPSS

Multinomial Logistic Reg. With Interaction

Analyze>Regression>Multinomial Logistic>Click Model, select Custom>specify your model (all main effects and the interaction between Marital and Mortgage)

Interpret the results as usual

Page 124: Multivariate Data Analysis Using SPSS

Interaction effects in logistic Regression

It is similar to OLS regression:
– Add interaction terms to the model as crossproducts
– In SPSS, highlighting two variables (holding down the Ctrl key) and moving them into the variable box will create the interaction term
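A crossproduct term is simply the product of the two predictors for each case. The sketch below assumes two hypothetical 0/1 indicator variables (`single`, `mortgage`); the names and data are illustrative and not taken from the Risk file.

```python
# Hypothetical cases: 0/1 indicator codes for being single and having a mortgage.
cases = [
    {"single": 1, "mortgage": 1},
    {"single": 1, "mortgage": 0},
    {"single": 0, "mortgage": 1},
    {"single": 0, "mortgage": 0},
]

for case in cases:
    # The interaction term is the crossproduct of the two predictors.
    case["single_x_mortgage"] = case["single"] * case["mortgage"]

interaction = [case["single_x_mortgage"] for case in cases]
print(interaction)
```

With indicator coding, the crossproduct is 1 only for cases that belong to both groups, so its coefficient captures the extra effect of the combination beyond the two main effects.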