Tech Tip: Creating Dummy Variables in IBM SPSS Statistics

Upload: presidion

Post on 09-Jan-2017

What are Dummy Variables?

Also known as indicator variables

Used in techniques such as linear regression, which assume that the predictors are measured at the scale level

Dummy coding gets around this assumption, so that categorical predictors can be included

Each dummy variable takes a value of 0 or 1 to indicate the absence (0) or presence (1) of some categorical effect

A variable with k categories requires k - 1 dummy variables

An Example

Suppose you have a nominal variable with more than two categories that you want to use as a predictor in a linear regression analysis, e.g. Job Category, which has three categories

You will then need to create 2 dummy variables (i.e. the number of categories - 1) and include these new dummy variables in your regression model
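As a rough sketch of that last step, assuming the two dummy variables already exist under the names jobcat1 and jobcat2 (the names used later in this deck) and that the outcome is a scale variable called salary (an assumption made for illustration, not stated on this slide), the model could be specified as:

```spss
* Sketch only, salary is an assumed outcome variable.
* jobcat1 and jobcat2 are the dummy variables for job categories 1 and 2.
REGRESSION
  /DEPENDENT salary
  /METHOD=ENTER jobcat1 jobcat2.
```

Each dummy coefficient then estimates the difference in the mean of the outcome between that category and the reference category.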

Considerations

Number of dummy variables: straightforward, k - 1, where k is the number of categories

Choose a reference category: this is the category against which all the other categories are compared

Often the reference category will be the first or last category

Doing this in IBM SPSS Statistics

Dummy coding is built into the Logistic Regression procedures, but dummy variables need to be created manually for Linear Regression and Discriminant Analysis

No single function is available for this

Best to do this using syntax
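For comparison, here is a minimal sketch of the built-in coding in Logistic Regression. The binary outcome minority and the predictor jobcat are assumptions chosen for illustration; substitute your own variables.

```spss
* Sketch only, minority (coded 0/1) is an assumed outcome variable.
* The Indicator contrast tells the procedure to dummy code jobcat internally.
* The last category is the reference by default and Indicator(1) would use the first.
LOGISTIC REGRESSION VARIABLES minority
  /METHOD=ENTER jobcat
  /CONTRAST (jobcat)=Indicator.
```

No dummy variables are added to the dataset here. The coding exists only inside the procedure, which is why Linear Regression needs the manual approaches.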

Approach 1

Using “Employee Data.sav”, located in C:\Program Files\IBM\SPSS\Statistics\*Version\Samples\English

*Version: your SPSS Statistics version, e.g. 20, 21, 22, …

For the variable jobcat, create two dummy variables: jobcat1 and jobcat2

Initially set each dummy variable to 0, then set jobcat1 to 1 for job category 1 and jobcat2 to 1 for job category 2

In this way category 3 becomes the reference category
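The original syntax for this approach was shown as an image and is not part of the transcript. A minimal sketch that follows the steps described above, assuming the Employee data file is the active dataset, might look like this:

```spss
* Sketch of Approach 1, reconstructed from the description and not the original slide.
* Start both dummy variables at 0 for every case.
COMPUTE jobcat1 = 0.
COMPUTE jobcat2 = 0.
* Switch the matching dummy to 1 for job categories 1 and 2.
IF (jobcat = 1) jobcat1 = 1.
IF (jobcat = 2) jobcat2 = 1.
EXECUTE.
```

Cases in category 3 keep a 0 on both variables, which is what makes category 3 the reference. In this form, cases with a missing jobcat value would also receive 0 on both variables.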

Approach 2

Using the VECTOR and LOOP – END LOOP commands

Use the VECTOR command to create the required number of dummy variables, i.e. 2 in this case

Use the LOOP – END LOOP command to loop through each of the dummy variables created by the VECTOR command
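The original syntax slide is not in the transcript. The sketch below is one way to combine the two commands, built around the COMPUTE statement quoted later in this deck. The vector prefix jobcat is taken from that statement; everything else is an assumption.

```spss
* Sketch of Approach 2, reconstructed from the description and not the original slide.
* VECTOR creates two new numeric variables named jobcat1 and jobcat2.
VECTOR jobcat(2).
* The expression (jobcat = #i) evaluates to 1 when true and 0 when false.
LOOP #i = 1 TO 2.
COMPUTE jobcat(#i) = (jobcat = #i).
END LOOP.
EXECUTE.
```

The scratch variable #i exists only during the transformation and is not saved to the dataset. Changing the 2 in both VECTOR and LOOP adapts the sketch to a variable with a different number of categories.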

Approach 2

This approach makes the last category the reference category, because the loop only runs through categories 1 and 2 in COMPUTE jobcat(#i) = (jobcat = #i).

To make the first category the reference category, modify the COMPUTE statement in the syntax as follows:

COMPUTE jobcat(#i) = (jobcat = #i + 1).
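Put into a full loop, the modified statement could be used as in this sketch. The VECTOR and LOOP lines and the variable labels are assumptions added for illustration.

```spss
* Sketch with the first category as the reference.
* Run this in place of the unmodified loop, since both create jobcat1 and jobcat2.
VECTOR jobcat(2).
LOOP #i = 1 TO 2.
COMPUTE jobcat(#i) = (jobcat = #i + 1).
END LOOP.
EXECUTE.
* Labels record which category each dummy now represents.
VARIABLE LABELS jobcat1 "jobcat = 2" /jobcat2 "jobcat = 3".
```

With this version jobcat1 flags category 2 and jobcat2 flags category 3, so the numeric suffix no longer matches the category code.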

Dealing with missing values

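In Approach 1, replace the statements that initially set each dummy variable to 0 with:

• IF (NOT MISSING(jobcat)) jobcat1 = 0.

• IF (NOT MISSING(jobcat)) jobcat2 = 0.

This ensures that cases with a missing value on jobcat are also missing on the dummy variables

Approach 2 deals with missing values implicitly, because the comparison (jobcat = #i) evaluates to missing when jobcat is missing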

Approach 1 modified to account for missing values
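The syntax slide itself is not in the transcript. Combining the two IF statements from the previous slide with the rest of Approach 1 as described gives roughly the following sketch; the final FREQUENCIES check is an addition for illustration.

```spss
* Sketch of Approach 1 with the missing-value modification.
* Set each dummy to 0 only when jobcat is not missing.
IF (NOT MISSING(jobcat)) jobcat1 = 0.
IF (NOT MISSING(jobcat)) jobcat2 = 0.
* Switch the matching dummy to 1 for job categories 1 and 2.
IF (jobcat = 1) jobcat1 = 1.
IF (jobcat = 2) jobcat2 = 1.
EXECUTE.
* Optional check that the missing counts agree across the three variables.
FREQUENCIES VARIABLES=jobcat jobcat1 jobcat2.
```

Cases with a missing jobcat value are skipped by every IF statement, so the new variables keep their initial system-missing value for those cases.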
