Multivariate Data Analysis: Regression, Cluster and Factor Analysis on SPSS

26
Aditya Banerjee 86 Amlan Anurag 90 Apoorva Jain 94 Boris Babu Joseph 98

Upload: aditya-banerjee

Post on 22-Jan-2018

TRANSCRIPT

Page 1: Multivariate data analysis   regression, cluster and factor analysis on spss

Aditya Banerjee 86, Amlan Anurag 90, Apoorva Jain 94, Boris Babu Joseph 98

Page 3: Multivariate data analysis   regression, cluster and factor analysis on spss

Regression Equation

Y = 0.243 X6 - 0.286 X7 + 0.248 X9 + 0.127 X11 + 0.546 X12 + 0.227 X20 + 0.200 X21 - 2.010
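As a rough illustration of how the equation is used, the sketch below plugs predictor scores into it. The coefficients are copied from the slide (assuming the term written as .127x11 refers to X11); the function name and the example respondent are hypothetical and not taken from the data set.

```python
# Coefficients copied from the regression equation on the slide.
COEFFICIENTS = {
    "X6": 0.243, "X7": -0.286, "X9": 0.248, "X11": 0.127,
    "X12": 0.546, "X20": 0.227, "X21": 0.200,
}
INTERCEPT = -2.010


def predict_csat(scores):
    """Return the predicted Y for a dict of predictor scores keyed by name."""
    return INTERCEPT + sum(COEFFICIENTS[name] * scores[name] for name in COEFFICIENTS)


# Hypothetical respondent: these scores are made up for illustration only.
example = {"X6": 8.0, "X7": 4.0, "X9": 5.0, "X11": 6.0,
           "X12": 5.0, "X20": 7.0, "X21": 7.0}
print(round(predict_csat(example), 3))
```

Raising X12 by one unit while holding the other scores fixed raises the prediction by .546, which is why it is read as the most influential predictor.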

Product Line (X11) has the smallest coefficient (.127) and therefore the least effect on Csat. It should be looked at last when increasing efforts.

Salesforce Image (X12) has the largest coefficient (.546) and therefore the most effect on Csat. It should be looked at first when increasing efforts.

Page 4: Multivariate data analysis   regression, cluster and factor analysis on spss

Existence of Homoscedasticity: All errors have constant variance

This is tested by looking at scatter plots of each independent variable against the dependent variable.

We see that x6, x12, and x20 show mild heteroscedasticity, but it is small enough to be ignored.

Page 5: Multivariate data analysis   regression, cluster and factor analysis on spss

Functional Form of Regression is Linear: The highest power in the equation is 1, i.e. when plotted, the regression equation is a straight line.

Page 6: Multivariate data analysis   regression, cluster and factor analysis on spss

Normality of Errors: All errors are normally distributed.

As can be seen, there is only one outlier among the errors.

Page 7: Multivariate data analysis   regression, cluster and factor analysis on spss

No Multicollinearity: No dependence between independent variables. This is checked by looking at the Tolerance and VIF values. Tolerance is the share of a variable's variance that is not explained by the other independent variables (1 - R2 when that variable is regressed on the others), and VIF (variance inflation factor) is its reciprocal, showing how much the variance of the variable's coefficient is inflated by that overlap.
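The Tolerance and VIF columns can be reproduced outside SPSS. The following is a minimal sketch, assuming the predictors are available as a numeric array with one column per variable; the function name is hypothetical, and each predictor is regressed on the others with ordinary least squares.

```python
import numpy as np


def tolerance_and_vif(X):
    """Return (tolerance, vif) arrays for an (n_samples, n_predictors) array."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    tolerance = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on an intercept plus all other predictors.
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        tolerance[j] = 1.0 - r2  # share of variance not explained by the others
    return tolerance, 1.0 / tolerance
```

A tolerance close to 1 (VIF close to 1) means the predictor shares little variance with the others; a very small tolerance means a large VIF.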

No Autocorrelation: This is checked by looking at the Durbin-Watson statistic, which ranges from 0 to 4, with values near 2 indicating no autocorrelation. A value of 2.3 is acceptable.
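For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch, assuming the residuals are supplied in observation order (the function name is illustrative):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator
```

Values well below 2 point to positive autocorrelation and values well above 2 to negative autocorrelation.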

Page 8: Multivariate data analysis   regression, cluster and factor analysis on spss

The R2 is .835, and the Adjusted R2 is .822. This shows that the model fits well: about 82% of the variation in the dependent variable is explained after adjusting for the number of predictors.

The standard error of the estimate (SEE) is .5027, which is acceptably low.
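The link between R2 and Adjusted R2 can be checked with the standard formula 1 - (1 - R2)(n - 1)/(n - k - 1). In the sketch below k = 7 matches the seven predictors in the regression equation, while n = 100 is an assumption, since the sample size is not stated on the slides.

```python
def adjusted_r2(r2, n_cases, n_predictors):
    """Adjusted R-squared from R-squared, sample size and predictor count."""
    return 1.0 - (1.0 - r2) * (n_cases - 1) / (n_cases - n_predictors - 1)


# n_cases=100 is assumed for illustration; it is not given in the source.
print(round(adjusted_r2(0.835, n_cases=100, n_predictors=7), 3))
```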

Page 9: Multivariate data analysis   regression, cluster and factor analysis on spss

When efforts are being made to increase Csat, the bulk of our efforts should be directed towards x12.

E-commerce activities show a coefficient of -.268, which suggests that while there is an increase in e-commerce activities, it might not be contributing to increasing consumer satisfaction. Hence, work needs to be done there in the form of discounts or other offers that can be put online.

Page 10: Multivariate data analysis   regression, cluster and factor analysis on spss

The highest correlation seen is between the variables cost control and cash and financial management, at 0.496, which is not very strong.

Page 12: Multivariate data analysis   regression, cluster and factor analysis on spss

To determine the number of factors we put the condition of Eigenvalue > 1. This gave us four factors. But as we can see, four factors explain only 58% of the variance, which is below our acceptable limit. We can also see that after 4 factors, each additional factor explains a very small amount of variation. Hence we set 5 factors a priori and ran the analysis again, the result of which can be seen below.
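The eigenvalue rule and the cumulative variance figure can be illustrated as follows. This is a sketch under the assumption that the raw responses are available as a numeric array with one column per variable; the function name is hypothetical.

```python
import numpy as np


def kaiser_summary(data):
    """Return eigenvalues, cumulative variance explained, and the count above 1."""
    corr = np.corrcoef(data, rowvar=False)
    # eigvalsh returns ascending eigenvalues of the symmetric matrix; reverse them.
    eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
    cumulative = np.cumsum(eigenvalues / eigenvalues.sum())
    n_factors = int((eigenvalues > 1.0).sum())
    return eigenvalues, cumulative, n_factors
```

Reading `cumulative[n_factors - 1]` gives the share of variance explained by the retained factors, which is the figure compared against the acceptable limit.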

Page 17: Multivariate data analysis   regression, cluster and factor analysis on spss

We can see in the factor matrix box that factor 1 has high correlation with variables 4, 7, 10 and 11. Factor 2 has high correlation with variables 3 and 5, factor 3 with variable 6, and factor 4 with variables 8 and 9. Factor 5, as we can see, does not have high correlation with any of the variables. We can also see that variables 1 and 2 do not have a strong correlation with any of the factors.

Hence, on rotation of the matrix, a more equitable distribution of variation can be seen, though the total variance remains the same. Factor 1 shows high correlation with variables 7, 10 and 11. Factor 2 shows high correlation with variables 1 and 3, factor 3 with variables 2 and 4, and factor 4 with variable 8. Variable 6 does not have correlation with any of the factors. Therefore, we can take it as a separate factor.
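The point that rotation redistributes variance without changing the total can be illustrated with an orthogonal rotation. The slides do not name the rotation method used, so the varimax routine below is an assumption chosen for illustration; it is the commonly used SVD-based iteration, and the function name is hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a (variables x factors) loading matrix (varimax)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)  # rotation matrix, kept orthogonal throughout
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ R
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        R = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break  # no meaningful improvement
        criterion = new_criterion
    return L @ R
```

Because the rotation matrix is orthogonal, each variable's communality (the row sum of squared loadings) and the total variance explained are unchanged; only the split across factors moves.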

Page 19: Multivariate data analysis   regression, cluster and factor analysis on spss

Taking the correlation of the variables with their factors, we have given the following labels to the five factors extracted:

1. Cost management
2. Product service
3. Pricing of machinery
4. Marketing
5. Employee productivity

Page 20: Multivariate data analysis   regression, cluster and factor analysis on spss

DATA CLEANING

We have converted the missing values in the Likert scale (1-7) variables.

Values which were shown to be higher than 7 were treated as missing and replaced with the mean of the given variable.

This produced a whole new set of variables for the operation.

This was done using data transform.

TRANSFORM > REPLACE MISSING VALUES

Select the mean as the replacement method
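The same cleaning rule can be sketched outside SPSS. The example assumes one variable's responses are held in a list, that anything outside 1-7 (or a `None`) counts as missing, and that the mean is taken over the valid responses only; the function name and sample values are hypothetical.

```python
def replace_invalid_with_mean(values, low=1, high=7):
    """Replace out-of-range or missing responses with the mean of valid ones."""
    def is_valid(v):
        return v is not None and low <= v <= high

    valid = [v for v in values if is_valid(v)]
    mean = sum(valid) / len(valid)
    return [v if is_valid(v) else mean for v in values]


# Made-up responses: 9 and None stand in for missing values.
print(replace_invalid_with_mean([5, 9, 3, None, 7]))
```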

Page 21: Multivariate data analysis   regression, cluster and factor analysis on spss

CHANGE CAPTURED

Values coded 9 were changed to the mean value of that particular variable.

Page 22: Multivariate data analysis   regression, cluster and factor analysis on spss

FACTOR ANALYSIS

Multicollinearity occurs when 2 or more predictor variables are highly correlated. Because of this, small changes in the data might lead to large jumps in the estimated coefficients.

To address the issue of multicollinearity, we have run factor analysis.

With a KMO > .6, the data are adequate for factor analysis, and the issue of multicollinearity is addressed.

ANALYZE > DIMENSION REDUCTION > FACTOR

Multicollinearity check completed
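For readers who want to see what the KMO figure measures, here is a minimal sketch of the overall Kaiser-Meyer-Olkin statistic: the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations. It assumes the responses are in a numeric array with one column per variable; the function name is hypothetical.

```python
import numpy as np


def kmo_overall(data):
    """Overall KMO measure of sampling adequacy for an (n, p) data array."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations come from the scaled inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    np.fill_diagonal(corr, 0.0)
    np.fill_diagonal(partial, 0.0)
    r2 = (corr ** 2).sum()
    p2 = (partial ** 2).sum()
    return r2 / (r2 + p2)
```

A value closer to 1 means the correlations are mostly shared among several variables rather than confined to pairs, which is what makes factor analysis worthwhile.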

Page 23: Multivariate data analysis   regression, cluster and factor analysis on spss

FACTOR ANALYSIS

Awareness, Attitude & Preference combined to form the first factor, which can be classified as Consumer Attitude, as it covers the variables that may influence consumers and how their perception is built.

Purchase & Loyalty combined to form the second factor, which can be considered as Consumer Loyalty, as these variables reflect how the consumer feels about the brand and holds it above others in comparison.

Page 24: Multivariate data analysis   regression, cluster and factor analysis on spss

CLUSTERING

The largest change in the agglomeration coefficient was noticed from Stage 40 to Stage 41, which means that agglomeration had to stop at Stage 40.

N = 45

No. of Clusters = 45 - 40 = 5
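The stopping rule can be written out as a small calculation: find the stage after which the agglomeration coefficient jumps the most, then subtract that stage number from N. After 40 merges among 45 cases, 45 - 40 = 5 clusters remain. The coefficient values in this sketch are made up; only N = 45 and the stage 40 to 41 jump come from the slide.

```python
def clusters_from_schedule(coefficients, n_cases):
    """coefficients[i] is the agglomeration coefficient at stage i + 1."""
    jumps = [
        coefficients[i + 1] - coefficients[i] for i in range(len(coefficients) - 1)
    ]
    # jumps[i] is the change from stage i + 1 to stage i + 2 (1-based stages).
    stop_stage = jumps.index(max(jumps)) + 1
    return stop_stage, n_cases - stop_stage


# Hypothetical schedule for 45 cases (44 stages) with the big jump after stage 40.
schedule = [float(stage) for stage in range(1, 41)] + [100.0, 101.0, 102.0, 103.0]
print(clusters_from_schedule(schedule, n_cases=45))
```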

Page 25: Multivariate data analysis   regression, cluster and factor analysis on spss

PROFILING AND INTERPRETATION

Gender & Usage

An ANOVA test was run to check whether the clusters differed significantly by Gender or Usage patterns.

No significant differences were found for either.
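As a rough sketch of what the test computes, the one-way ANOVA F statistic is the between-group mean square divided by the within-group mean square. The example assumes a numeric variable split into one list per cluster; the function name is hypothetical, and the p-value step is left out (SciPy's `scipy.stats.f_oneway` returns both F and p).

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of groups of numbers."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2 for group, mean in zip(groups, means)
    )
    ss_within = sum(
        (v - mean) ** 2 for group, mean in zip(groups, means) for v in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

A small F (relative to the critical value for the two degrees of freedom) corresponds to the "no significant difference" outcome reported here.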

Page 26: Multivariate data analysis   regression, cluster and factor analysis on spss

K MEANS VS HIERARCHICAL CLUSTERING

It was found that there were major differences in the number of cases/respondents assigned to each cluster by the two methods.

Although the number of clusters is the same, the mean values of the variables also differ across the two methods because the respondents in each cluster change.

Cluster   Cases
1         15
2         12
3         5
4         5
5         8
Valid     45
Missing   0

Hierarchical Method

K Means Method
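One way to see how far the two methods disagree is to cross-tabulate the cluster labels each method gives to the same respondents. The sketch below assumes two equally long label lists in respondent order; the function names and the tiny example labels are hypothetical.

```python
from collections import Counter


def cross_tabulate(labels_a, labels_b):
    """Count cases for each (method A cluster, method B cluster) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same cases")
    return Counter(zip(labels_a, labels_b))


def cluster_sizes(labels):
    """Number of cases per cluster, ordered by cluster label."""
    return dict(sorted(Counter(labels).items()))


# Made-up labels for five respondents under each method.
hierarchical = [1, 1, 2, 2, 3]
k_means = [1, 2, 2, 2, 3]
print(cluster_sizes(hierarchical), cluster_sizes(k_means))
print(cross_tabulate(hierarchical, k_means))
```

Note that cluster numbers are arbitrary in both methods, so the cross-tabulation is read by looking for pairs with large counts rather than by matching label numbers.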