Introduction to SPSS with the Interpretation and Steps

Ahmad Nazim & W.M. Asyraf
Introduction to SPSS Hands-on


Toolbar

This toolbar is available in the SPSS Data Editor and provides quick and easy access to frequently used features. The following are some of the most frequently used tools in the Data Editor.

Item Description

File Open : opens a data file for analysis.
File Save : saves the file in the active window.
File Print : prints the file in the active window.
Insert Cases : inserts a case above the case containing the active cell.
Insert Variable : inserts a variable to the left of the variable containing the active cell.
Value Labels : toggles between actual values and value labels in the Data Editor.
Select Cases : provides methods for selecting a subgroup of cases based on criteria that include variables and complex expressions.
Split File : splits the data file into separate groups for analysis based on the values of one or more grouping variables.



The box shown above is the dialogue box that appears when PASW SPSS 18.0 is opened. To open an existing file, choose a file under “Open an existing data source”. Since we are not going to use this dialogue box, click the Cancel button to close it.

If you want to open existing data, go to File > Open > Data.



You can now browse to the location where the existing SPSS data file is saved.

Data Editor Window

The window shown above is the Data Editor window. It has 12 pull-down PASW Statistics menus: File, Edit, View, Data, Transform, Analyze, Direct Marketing, Graphs, Utilities, Add-ons, Window and Help. At the bottom left of the window are the tabs for the Data View and Variable View windows.

The Data View window is where you type in your data. However, you must first tell SPSS certain things about your data, which you do in the Variable View window. The Variable View window has 10 columns, which tell the program different things about the variables, such as whether the values are qualitative or quantitative.

Defining Variables

For PASW SPSS 18.0 to analyse the data, the research variables must first be defined in the Data Editor window, before any data are entered. Click the Variable View tab at the bottom left of the PASW Statistics Data Editor. The variable definition view appears, as shown in the figure below. It consists of:

Name

Type

Width

Decimals

Label

Values

Missing

Measure



o A nominal variable has mutually exclusive, but not ordered, categories. For example, your study might compare achievement between genders: male and female.

o An ordinal variable is one where the order matters but the difference between values does not. For example, you might ask patients to rate the amount of pain they are feeling on a scale of 1 to 10. Another example would be movie ratings, from * to *****.

o An interval variable is a measurement where the difference between two values is meaningful. For example, you might ask for the respondent’s salary in order to compare salary with expenses.

o A ratio variable is an interval variable with a natural zero point. For example, time is a ratio variable since a time of 0 is meaningful. A weight of 4 grams is twice a weight of 2 grams, because weight is a ratio variable. A temperature of 100 degrees C is not twice as hot as 50 degrees C, because temperature in degrees C is not a ratio variable. A pH of 3 is not twice as acidic as a pH of 6, because pH is not a ratio variable.

Variable Name  Label                                      Value Label                                                                               Measure
Gender         Respondent’s Sex                           1 = Male, 2 = Female                                                                      Nominal
Qualification  Respondent’s Highest Education Background  1 = SPM, 2 = Diploma, 3 = Bachelor, 4 = Master, 5 = PhD                                   Ordinal
Income         Respondent’s Monthly Income                1 = ≤ RM999, 2 = RM1000 – RM1999, 3 = RM2000 – RM2999, 4 = RM3000 – RM3999, 5 = ≥ RM4000  Interval
Weight         Respondent’s Weight                        Any value                                                                                 Ratio

Value Labels

A value label is a label assigned to a particular value of a variable. For example, for a race variable, we might use the codes 1 = Malay, 2 = Chinese, 3 = Indian, 4 = Others.

In our case, for the gender variable we will use 1 = Male and 2 = Female.

To enter the codes:

Type “1” in the Value box and “male” in the Label box. Then click “Add”.

Type “2” in the Value box and “female” in the Label box. Then click “Add”. Click “OK” to finish.
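The idea behind value labels can also be sketched outside SPSS. The short Python example below is only an illustration: the names GENDER_LABELS and label_for and the list of coded responses are assumptions made for this sketch and are not part of SPSS. It stores the numeric codes and looks up the label only for display, which is what the Value Labels toggle does in the Data Editor.

```python
# Value labels as a plain mapping from stored code to display label.
# GENDER_LABELS and label_for are hypothetical names used only for illustration.
GENDER_LABELS = {1: "Male", 2: "Female"}


def label_for(code, labels):
    """Return the label for a code, or the code itself if it has no label."""
    return labels.get(code, code)


# Hypothetical coded responses, as they would be typed into Data View.
gender_codes = [1, 2, 2, 1, 2]
gender_labels = [label_for(c, GENDER_LABELS) for c in gender_codes]
print(gender_labels)
```

The stored data stay numeric, so statistical procedures can use the codes while tables and charts show the labels.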

Missing Values

Select the Discrete missing values option. Then type “99” (for example), or any other code that is not used as a valid value of the variable, to represent a missing value.
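What a missing-value code does during analysis can be sketched in Python as follows. This is a minimal illustration, assuming the example code 99 from above; the function name valid_values and the ratings are hypothetical. Cases carrying the missing code are excluded before any statistic is calculated.

```python
# Treat a sentinel code as "missing" and exclude it from calculations.
# MISSING_CODE mirrors the example code 99 above; the ratings are hypothetical.
MISSING_CODE = 99


def valid_values(values, missing_code=MISSING_CODE):
    """Return only the values that are not the missing-value code."""
    return [v for v in values if v != missing_code]


ratings = [4, 5, 99, 3, 99, 4]            # 99 marks an unanswered item
valid_ratings = valid_values(ratings)
mean_rating = sum(valid_ratings) / len(valid_ratings)
print(len(valid_ratings), mean_rating)
```

If the code 99 were not declared as missing, it would be averaged in as if it were a real rating, which is why the code must lie outside the valid range of the variable.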

Repeat these steps to label the other variables. For age, the Measure column should be set to “Scale”, since age is an interval measure. The same measure applies to the visit, serv_prop, ser_friendly, serv_clean, serv_time and serv_overall variables. For employment and residence, we will use the “Nominal” measure.



Assumptions of Parametric Tests

1. The data must be normally distributed

2. The groups must have equal variances

3. The data must contain more than 30 cases

Testing Normality



Testing the normality of our data is a prerequisite for inferential statistical techniques. A normality test is also needed before a parametric test is used on the data, because parametric tests are appropriate only for normally distributed data.

To check the normality of a single variable, follow these steps:

Analyze > Descriptive Statistics > Explore

Select the variable of interest, for example age. Then click the “Plots” button.



Ensure that the “Factor levels together” option is selected under Boxplots.

Tick “Stem-and-leaf”, “Histogram” and “Normality plots with tests”.

Click Continue, then OK, for the results.

Normality Test (Single Variable) Output



In the above diagram, the histogram shows a bell-shaped distribution with no skew to either the left or the right. Therefore, the age variable can be treated as normally distributed.

Another way to look at the distribution of our data is to use the Normal Q-Q plot. In our case, the points lie along the straight line and show no systematic pattern of departure from it, so the age data can be treated as normally distributed.
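To make the logic of the Q-Q plot concrete, the following Python sketch pairs each sorted observation with the quantile expected under a normal distribution, and also computes a simple skewness figure. It is only an illustration: the ages are invented, the function names are hypothetical, and the plotting position (i - 0.5) / n is one common convention that may differ from the formula SPSS uses.

```python
from statistics import NormalDist, mean


def qq_points(values):
    """Pair each sorted observation with its theoretical normal quantile.

    Uses the plotting position (i - 0.5) / n, one common convention;
    SPSS may use a different formula, so exact values can differ.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]


def skewness(values):
    """Moment-based skewness; values near 0 suggest a symmetric distribution."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical ages, invented for illustration only.
ages = [22, 25, 27, 28, 30, 30, 32, 33, 35, 38]
points = qq_points(ages)
print(points[0], points[-1])
print(skewness(ages))
```

If the data are close to normal, plotting the second element of each pair against the first gives points that fall near a straight line, which is the pattern described above.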



To check normality for multiple variables, follow these steps:

Analyze > Regression > Linear

Move “overall quality” into the Dependent box. Move the other observed variables (excluding the demographic profile) into the Independent(s) box. Then click the “Plots” button.



Normality Test (Multiple Variables) Output

In the above diagram, the histogram shows a bell-shaped distribution with no skew to either the left or the right. Therefore, the variables in this case study can be treated as normally distributed.

Another way to look at the distribution of our data is to use the Normal Q-Q plot. In our case, the points lie along the straight line and show no systematic pattern of departure from it, so the data in this case study can be treated as normally distributed.



Recode Into Different Variable

Recode assigns discrete values to a variable, based solely on the present values of the variable being recoded. You may want to recode a variable for easier interpretation or decision making.

Step 1: Click Transform > Recode into Different Variables

Step 2: Type “overall” in the Name box. Then click the “Old and New Values” button.



Step 3: Let us recode the overall perception variable:

1 thru 2 = 1 (low / disagree)

3 = 2 (medium / undecided)

4 thru 5 = 3 (high / agree)

Then click the “Continue” button.

The recoded values now appear as a new variable in the last column of the Data View window.
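The recoding rule above can be written as a small Python function. This is a sketch only: the function name recode_overall and the sample ratings are hypothetical, and returning None for values outside 1 to 5 is an assumption of this example, standing in for a missing value.

```python
def recode_overall(value):
    """Recode a 1-5 rating into 1 (low), 2 (medium) or 3 (high).

    Mirrors the rule above: 1 thru 2 -> 1, 3 -> 2, 4 thru 5 -> 3.
    Values outside 1-5 return None, which stands in for a missing value
    here (an assumption of this sketch).
    """
    if 1 <= value <= 2:
        return 1
    if value == 3:
        return 2
    if 4 <= value <= 5:
        return 3
    return None


# Hypothetical serv_overall ratings; the original list is left unchanged,
# just as "Recode into Different Variables" keeps the old variable.
serv_overall = [1, 2, 3, 4, 5, 4]
overall = [recode_overall(v) for v in serv_overall]
print(overall)
```

Writing the result to a new list, rather than overwriting serv_overall, corresponds to recoding into a different variable instead of into the same variable.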



Independent-Samples T Test

The Independent-Samples T Test procedure tests the null hypothesis that the population mean of a variable is the same for two groups of cases. It also displays a confidence interval for the difference between the population means of the groups.

Step 1: Click Analyze > Compare Means > Independent-Samples T Test

Step 2: Transfer the variables to be tested into the Test Variable(s) box, then the gender variable into the Grouping Variable box (below).



Step 3: Click the Define Groups button to define the two categories of the gender variable. In this case, there are only two categories: male for Group 1 and female for Group 2. These categories are coded as the values 1 and 2. Hence, type 1 for Group 1 and 2 for Group 2.

Step 4: Click Continue, then OK. The following output appears:

Group Statistics

Variable             Respondent's sex  N    Mean     Std. Deviation  Std. Error Mean
infrastructure       male              7    3.5714   .53452          .20203
                     female            16   3.6250   .95743          .23936
service quality      male              7    3.4286   .53452          .20203
                     female            16   3.8125   1.04682         .26171
cleanliness quality  male              7    4.2857   .75593          .28571
                     female            16   3.8125   .75000          .18750
queue time           male              7    3.1429   .89974          .34007
                     female            16   2.8750   .88506          .22127
overall quality      male              7    4.0000   .81650          .30861
                     female            16   4.0000   .73030          .18257



The table above shows the means of infrastructure, service quality, cleanliness quality, queue time and overall quality for male and female respondents. The highest mean is for cleanliness quality among males (4.2857), followed by overall quality, which is 4.0000 for both males and females. The lowest mean is for queue time among females (2.8750).

Independent Samples Test

F and Sig. belong to Levene's Test for Equality of Variances. The remaining columns belong to the t-test for Equality of Means. Lower and Upper are the bounds of the 95% Confidence Interval of the Difference.

Variable         Variances                    F      Sig.  t       df      Sig. (2-tailed)  Mean Difference  Std. Error Difference  Lower     Upper
infrastructure   Equal variances assumed      1.447  .242  -.138   21      .892             -.05357          .38888                 -.86228   .75514
                 Equal variances not assumed               -.171   19.387  .866             -.05357          .31322                 -.70827   .60113
service quality  Equal variances assumed      3.000  .098  -.911   21      .372             -.38393          .42131                 -1.26010  .49224
                 Equal variances not assumed               -1.161  20.237  .259             -.38393          .33061                 -1.07306  .30520
cleanliness      Equal variances assumed      .000   .987  1.389   21      .179             .47321           .34064                 -.23519   1.18162
                 Equal variances not assumed               1.385   11.433  .193             .47321           .34174                 -.27550   1.22193
queue time       Equal variances assumed      .106   .748  .665    21      .513             .26786           .40299                 -.57020   1.10592
                 Equal variances not assumed               .660    11.342  .522             .26786           .40571                 -.62184   1.15755
overall quality  Equal variances assumed      .091   .765  .000    21      1.000            .00000           .34256                 -.71239   .71239
                 Equal variances not assumed               .000    10.424  1.000            .00000           .35857                 -.79456   .79456

The table above contains the results of Levene’s Test for equality of variances and the t-test for equality of means. Levene’s test is not significant for any variable (all Sig. > 0.05), so the “Equal variances assumed” rows apply. Most researchers focus on the significance value of the t-test for equality of means to determine whether differences exist between male and female respondents. In this case, p > 0.05 for every variable, so none of the differences is significant. Hence, we fail to reject the null hypothesis: there is no significant difference between males and females on any of the variables included.
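The t, df and Std. Error Difference values for infrastructure can be reproduced from the Group Statistics table alone. The Python sketch below uses the standard pooled-variance and Welch formulas; the function names are hypothetical, and the significance values are not computed because the Python standard library has no t distribution.

```python
from math import sqrt


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Student's t assuming equal variances; returns (t, df, std_error)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df, std_error


def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Welch's t without the equal-variance assumption; returns (t, df, std_error)."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    std_error = sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (mean1 - mean2) / std_error, df, std_error


# Infrastructure row of the Group Statistics table: (N, Mean, Std. Deviation).
male = (7, 3.5714, 0.53452)
female = (16, 3.6250, 0.95743)

t_eq, df_eq, se_eq = pooled_t(*male, *female)
t_w, df_w, se_w = welch_t(*male, *female)
print(round(t_eq, 3), df_eq, round(se_eq, 5))
print(round(t_w, 3), round(df_w, 3), round(se_w, 5))
```

The first line of output corresponds to the “Equal variances assumed” row and the second to the “Equal variances not assumed” row, up to rounding of the inputs.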



One-Way ANOVA

Step 1: Click Analyze > Compare Means > One-Way ANOVA

Step 2: Transfer infrastructure (serv_prop) from the variable list into the Dependent List box, then work place (employment) into the Factor box.

Step 3: Click Post Hoc > Tick LSD



Step 4: Click Options > Tick Descriptive, Fixed and random effects, and Homogeneity of variance test

Step 5: Click Continue, followed by OK. The following result is produced:



Descriptives

Infrastructure

Lower Bound and Upper Bound give the 95% Confidence Interval for Mean.

Group                  N   Mean    Std. Deviation  Std. Error  Lower Bound  Upper Bound  Minimum  Maximum  Between-Component Variance
Government             6   4.1667  1.32916         .54263      2.7718       5.5615       2.00     6.00
Private Sector         6   3.1667  .40825          .16667      2.7382       3.5951       3.00     4.00
GLC Sector             5   3.6000  .54772          .24495      2.9199       4.2801       3.00     4.00
Self Employed          6   3.5000  .54772          .22361      2.9252       4.0748       3.00     4.00
Total                  23  3.6087  .83878          .17490      3.2460       3.9714       2.00     6.00
Model: Fixed Effects               .80677          .16822      3.2566       3.9608
Model: Random Effects                              .21266      2.9319       4.2855                         .06731

The Descriptives table shows the mean infrastructure rating for each category of work place. Respondents in the government sector gave infrastructure the highest mean rating (4.1667), while respondents in the private sector gave it the lowest (3.1667).

Test of Homogeneity of Variances

infrastructure

Levene Statistic df1 df2 Sig.

1.643 3 19 .213

The test of homogeneity of variances is not significant, since 0.213 > 0.05. We therefore fail to reject the null hypothesis, and the population variances of the groups can be treated as approximately equal. This test checks whether the group variances are homogeneous or heterogeneous.



ANOVA

Infrastructure

Sum of Squares df Mean Square F Sig.

Between Groups 3.112 3 1.037 1.594 .224

Within Groups 12.367 19 .651

Total 15.478 22

The significance value in the ANOVA table is 0.224, which is greater than 0.05. Hence, we fail to reject the null hypothesis that the mean infrastructure rating is the same across work places. However, the ANOVA result alone does not identify which work places differ from one another. The LSD post hoc test is therefore used to locate any pairwise differences. The result is shown below:
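The Mean Square and F columns of the ANOVA table follow directly from the sums of squares and degrees of freedom. The Python sketch below recomputes them from the values in the table; the function name anova_f is hypothetical, and the Sig. value is not computed because the Python standard library has no F distribution.

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Values copied from the ANOVA table above.
ms_b, ms_w, f_value = anova_f(3.112, 3, 12.367, 19)
print(round(ms_b, 3), round(ms_w, 3), round(f_value, 3))
```

F is the ratio of the variation between groups to the variation within groups, so a value close to 1, as here, gives little evidence that the group means differ.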

Multiple Comparisons

Infrastructure

LSD

Lower Bound and Upper Bound give the 95% Confidence Interval.

(I) work place     (J) work place     Mean Difference (I-J)  Std. Error  Sig.  Lower Bound  Upper Bound
Government Sector  Private Sector     1.00000*               .46579      .045  .0251        1.9749
                   GLC Sector         .56667                 .48852      .260  -.4558       1.5892
                   Self Employed      .66667                 .46579      .169  -.3082       1.6416
Private Sector     Government Sector  -1.00000*              .46579      .045  -1.9749      -.0251
                   GLC Sector         -.43333                .48852      .386  -1.4558      .5892
                   Self Employed      -.33333                .46579      .483  -1.3082      .6416
GLC Sector         Government Sector  -.56667                .48852      .260  -1.5892      .4558
                   Private Sector     .43333                 .48852      .386  -.5892       1.4558
                   Self Employed      .10000                 .48852      .840  -.9225       1.1225
Self Employed      Government Sector  -.66667                .46579      .169  -1.6416      .3082
                   Private Sector     .33333                 .46579      .483  -.6416       1.3082
                   GLC Sector         -.10000                .48852      .840  -1.1225      .9225

*. The mean difference is significant at the 0.05 level.

The output shows that the government sector and the private sector differ significantly in their mean infrastructure rating (mean difference = 1.00000, p = 0.045).
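The Std. Error column of the LSD table can be checked from the Within Groups mean square and the group sizes. The Python sketch below uses the usual LSD standard error formula; the function name lsd_std_error is hypothetical, and small differences in the last decimal place are expected because the inputs are rounded table values.

```python
from math import sqrt


def lsd_std_error(ms_within, n_i, n_j):
    """Standard error of the difference between two group means under LSD."""
    return sqrt(ms_within * (1 / n_i + 1 / n_j))


ms_within = 12.367 / 19          # Within Groups mean square from the ANOVA table
mean_gov, n_gov = 4.1667, 6      # Government
mean_priv, n_priv = 3.1667, 6    # Private Sector
n_glc = 5                        # GLC Sector

diff = mean_gov - mean_priv
se_gov_priv = lsd_std_error(ms_within, n_gov, n_priv)
se_gov_glc = lsd_std_error(ms_within, n_gov, n_glc)
print(round(diff, 4), round(se_gov_priv, 5), round(se_gov_glc, 5))
```

Comparisons involving the GLC sector have a slightly larger standard error because that group has only 5 respondents.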

Association Analysis



Association analysis is the weakest measure of relationship. It is usually used for categorical data with nominal or ordinal measurement. Measures of association are obtained together with a cross tabulation of qualitative variables (nominal and ordinal). For example, we might want to measure the relationship between gender (male and female) and attitude towards Mathematics (high, medium or low).

Step 1: Click Analyze > Descriptive Statistics > Crosstabs

Step 2: Transfer cleanliness quality into the Row(s) box, then respondent’s sex into the Column(s) box.



Step 3: Click Statistics > Tick Chi-square

Step 4: Click Cells > in the Cell Display dialogue box, tick Observed and Unstandardized (residuals)



Step 5: Click on Continue and then press OK.

Crosstab

                                      respondent's sex
overall quality                       male  female  Total
neither agree nor disagree  Count     2     4       6
                            Residual  .2    -.2
agree                       Count     3     8       11
                            Residual  -.3   .3
strongly agree              Count     2     4       6
                            Residual  .2    -.2
Total                       Count     7     16      23



Chi-Square Tests

                              Value   df   Asymp. Sig. (2-sided)
Pearson Chi-Square            .100a   2    .951
Likelihood Ratio              .100    2    .951
Linear-by-Linear Association  .000    1    1.000
N of Valid Cases              23

a. 5 cells (83.3%) have expected count less than 5. The minimum expected count is 1.83.

First, look at the Pearson Chi-Square value. The Chi-Square test is not significant, since the p-value (0.951) is greater than 0.05. This result indicates that there is no association between gender and perception of overall quality.
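The Pearson Chi-Square value can be recomputed from the counts in the Crosstab table. The Python sketch below is an illustration with hypothetical function and variable names. The p-value shortcut exp(-chi2 / 2) is exact only when df = 2, which happens to be the case here; for other degrees of freedom a statistics library would be needed.

```python
from math import exp


def chi_square(observed):
    """Pearson chi-square for a table of counts; returns (chi2, df, expected)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Counts from the Crosstab table: rows are overall quality, columns male/female.
observed = [[2, 4],   # neither agree nor disagree
            [3, 8],   # agree
            [2, 4]]   # strongly agree

chi2, df, expected = chi_square(observed)
# For df = 2 only, the chi-square upper-tail probability is exp(-chi2 / 2).
p_value = exp(-chi2 / 2) if df == 2 else None
small_cells = sum(e < 5 for row in expected for e in row)
min_expected = min(e for row in expected for e in row)
print(round(chi2, 3), df, round(p_value, 3), small_cells, round(min_expected, 2))
```

The expected counts are what each cell would contain if gender and overall quality were unrelated, and the unstandardized residuals in the Crosstab table are the observed counts minus these expected counts.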

Correlation Analysis

Correlation analysis is used to measure the relationship between variables. Two types of correlation coefficient are available: the Spearman rank correlation coefficient and the Pearson product-moment correlation coefficient.

The Spearman coefficient is appropriate for data that are not normally distributed and is known as the non-parametric version of correlation analysis. The Pearson coefficient is appropriate for normally distributed data and is calculated from the actual data values, whereas Spearman replaces the actual data with ranks.
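The difference between the two coefficients can be seen in a short Python sketch. The data are invented for illustration and the function names are hypothetical. Pearson is computed from the actual values, while Spearman is simply Pearson applied to the ranks, with tied values given the average of their ranks.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation computed from the actual values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y rises with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Because the ranks of x and y agree perfectly, Spearman equals 1 even though the relationship is curved, while Pearson is slightly below 1 because it measures only the linear part of the relationship.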



Step 1: Click Analyze > Correlate > Bivariate

Step 2: Transfer the five variables from the variable list into the Variables box.

Step 3: Click OK.

Correlations



                                          infrastructure  service quality  cleanliness quality  queue time  overall quality
infrastructure       Pearson Correlation  1               -.408            .046                 .111        -.592**
                     Sig. (2-tailed)                      .054             .835                 .613        .003
                     N                    23              23               23                   23          23
service quality      Pearson Correlation  -.408           1                -.147                -.408       -.066
                     Sig. (2-tailed)      .054                             .502                 .053        .763
                     N                    23              23               23                   23          23
cleanliness quality  Pearson Correlation  .046            -.147            1                    -.604**     .401
                     Sig. (2-tailed)      .835            .502                                  .002        .058
                     N                    23              23               23                   23          23
queue time           Pearson Correlation  .111            -.408            -.604**              1           .210
                     Sig. (2-tailed)      .613            .053             .002                             .335
                     N                    23              23               23                   23          23
overall quality      Pearson Correlation  -.592**         -.066            .401                 .210        1
                     Sig. (2-tailed)      .003            .763             .058                 .335
                     N                    23              23               23                   23          23

**. Correlation is significant at the 0.01 level (2-tailed).

The table above shows that only two relationships among the variables are significant: infrastructure with overall quality (r = -.592, p = 0.003) and cleanliness quality with queue time (r = -.604, p = 0.002), both with p < 0.05. Both are moderate negative relationships.
