SAS Programming: Working With Variables

Upload: harry-fields

Post on 17-Dec-2015

Data Step Manipulations

• New variables should be created during a Data step

• Existing variables should be manipulated during a data step

Missing Values in SAS

• SAS uses a period (.) to represent missing values in a SAS data set

• Different SAS procedures and functions treat missing values differently, so take care whenever your SAS data set contains missing values

Working With Numeric Variables

• SAS uses the standard arithmetic operators: +, -, *, /, ** (exponentiation)

Note on Missing Values: Arithmetic operators propagate missing values.

• SAS has many built-in numeric functions

round(variable, value): Rounds variable to the nearest unit given by value.

sum(variable1, variable2, …): Adds any number of variables and ignores missing values
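The difference between the + operator and the sum function is easiest to see on a record with a missing value. The following is a minimal sketch; the data set name scores and the variables a, b, c are hypothetical and used only for illustration.

```sas
/* Hypothetical data: the second record has a missing value for b */
data scores;
    input a b c;
    total_op  = a + b + c;                 /* missing if any of a, b, c is missing */
    total_fn  = sum(a, b, c);              /* adds only the nonmissing values      */
    avg_round = round(total_fn / 3, 0.1);  /* rounds to the nearest tenth          */
    datalines;
1 2 3
4 . 6
;
run;

proc print data=scores;
run;
```

For the second record, total_op is missing while total_fn is 10, because the function skips the missing value that the operator propagates.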

Acting on Selected Observations

• Working with selected observations - subsets of a SAS data set - is easy in SAS

• First, you must decide on a selection process. What is the distinguishing characteristic of the observations you want to work with?

Selecting Observations: IF-THEN Statements

• The IF-THEN statement is the most common way to select observations. Format:

IF condition THEN action;

• condition is one or more comparisons. For any observation, condition is either true or false. If condition is true, SAS performs the action.

IF-THEN Statement: Example

• Suppose INC is a variable representing annual household income and you want to create a dummy variable, DUM, based on income that takes the value 1 when income is less than $10,000.

IF INC < 10000 THEN DUM=1;
IF INC >= 10000 THEN DUM=0;
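One caveat when building a dummy this way: SAS treats a missing numeric value as smaller than any number, so the condition INC < 10000 is true when INC is missing and DUM would be set to 1. The sketch below guards against that with the MISSING function and IF-THEN/ELSE; the data set name households and the sample values are assumptions made for illustration.

```sas
/* Hypothetical data: the third household has missing income */
data households;
    input inc;
    if missing(inc) then dum = .;      /* leave the dummy missing too */
    else if inc < 10000 then dum = 1;  /* low-income household        */
    else dum = 0;                      /* income of 10,000 or more    */
    datalines;
8500
25000
.
;
run;
```

Using ELSE also means SAS stops checking conditions once one of them is true for the observation.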

Using OBS in condition

• In a SAS data set, each record has an observation number, shown in the OBS column of printed output

• OBS is not a variable you can reference in a condition; in a DATA step, refer to the observation number using the automatic variable _n_, which counts DATA step iterations and matches the observation number when one record is read per iteration

• Example: set INC equal to zero for the first 10 observations

IF _n_ <= 10 THEN INC=0;

Comparison Operators

• There are 6 comparison operators

• Can use either the symbol or the mnemonic

Symbol   Mnemonic   Meaning
=        EQ         Equal to
^=       NE         Not equal to
>        GT         Greater than
<        LT         Less than
>=       GE         Greater than or equal to
<=       LE         Less than or equal to

Multiple Comparisons

• Can make more than one comparison in condition by using AND/OR

• AND / &: All parts must be true for condition to be true

• OR / |: At least one part must be true for condition to be true

• Be careful when combining AND and OR: SAS evaluates AND before OR

• Can use parentheses in condition to make the intended grouping explicit
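The effect of grouping can be checked on a small example. This is a sketch only: the data set flags is hypothetical, and the variables SEX, EDUCATION, and EARN are borrowed from the examples later in these notes. A comparison expression in SAS evaluates to 1 when true and 0 when false, so each result can be stored directly in a variable.

```sas
/* Hypothetical data: sex (1 = male), education level (1-4), annual earnings */
data flags;
    input sex education earn;
    /* Without parentheses, AND is applied first:
       (sex = 1 and education = 4) or earn > 50000 */
    flag_a = (sex = 1 and education = 4 or earn > 50000);
    /* Parentheses force the OR to be evaluated first */
    flag_b = (sex = 1 and (education = 4 or earn > 50000));
    datalines;
0 2 60000
1 4 30000
;
run;
```

For the first record the two flags differ: flag_a is 1 because earnings exceed 50,000, while flag_b is 0 because SEX is not 1.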

Selecting Observations for New SAS Data Sets

• Can use IF-THEN statements to create new SAS data sets

• Either delete or keep selected observations based on condition

Deleting Observations

• Format for IF-THEN:

IF condition THEN DELETE;

• Example: Removing missing observations. Suppose the variable INC is missing for some households and you want to drop these observations

IF INC=. THEN DELETE;

Keeping Selected Observations

• A more straightforward way to create new SAS data sets is to keep only those observations that meet some condition. Observations for which condition is false are not written to the new data set. Format:

IF condition;
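Deleting and keeping are two ways of writing the same selection. The sketch below shows both forms producing the same data set; the names raw, dropped, and kept and the sample values are hypothetical.

```sas
/* Hypothetical data: household 2 has missing income */
data raw;
    input id inc;
    datalines;
1 12000
2 .
3 8000
;
run;

/* Delete the observations with missing income */
data dropped;
    set raw;
    if inc = . then delete;
run;

/* Keep the observations with nonmissing income: same two records */
data kept;
    set raw;
    if inc ne .;
run;
```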

Example

• The file salary.dat contains data for 93 employees of a Chicago bank. The file contains the following variables:

Y: Salary

X: Years of education

E: Months of previous work experience

T: Number of months after 1/1/69 that the individual was hired

• The first 61 observations are females; the last 32 are males

Example: Create Dummy for Males

*Program to create dummy variables and;
*new SAS data sets ;

data salary;
    infile 's:\mysas\salary.dat';
    input y x e t;
    IF _n_ > 61 THEN G=1;    *Males: observations 62 to 93;
    IF _n_ <= 61 THEN G=0;   *Females: observations 1 to 61;
run;

Example: Create Data Set for Males

*Make a new SAS data set composed of only;
*records for males ;

data males;       *New SAS data set;
    set salary;   *Created from salary;
    IF G=1;
run;

Example: Create Data Set for Females

*Make a new SAS data set composed of only;
*records for females ;

data females;     *New SAS data set;
    set salary;   *Created from salary;
    IF G=0;
run;
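As an alternative that is not part of the original examples, both subsets can be written in a single pass through salary by naming two output data sets and using OUTPUT statements. This sketch assumes the salary data set and the dummy variable G were created as in the earlier step.

```sas
/* One DATA step that writes two data sets */
data males females;
    set salary;
    if g = 1 then output males;         /* records flagged as male   */
    else if g = 0 then output females;  /* records flagged as female */
run;
```

This reads salary only once, which can matter for large files; the two separate steps above give the same result.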

Describing Data: Sample Statistics

• Format:

PROC UNIVARIATE <option-list>;
    VAR variable-list;
    BY variable-list;
    FREQ variable;
    WEIGHT variable;

Selected Options

Option              Purpose
DATA=SAS-data-set   Specify the data set. If omitted, uses the most recent SAS data set
FREQ                Generate frequency table
NOPRINT             Suppress printed output

VAR Statement

• List of variables to calculate sample statistics for.

• If no variables are specified, sample statistics are generated for all numeric variables

WEIGHT Statement

• Specifies a numeric variable in the SAS data set whose values are used to weight each observation

BY Statement

• Can be used to obtain separate analyses on observations in groups defined by some value of a variable.

• Example: Suppose SEX=1 if individual is male, SEX=0 if individual is female; EARN=annual earnings.

PROC UNIVARIATE;    *Generates statistics;
    VAR EARN;       *on earnings for men;
    BY SEX;         *and women;
RUN;

BY Statements and Sorting

• Before using a BY statement, the SAS data set must be sorted on the variable specified

• SAS puts the observations in order, based on the values of the variables specified in the BY statement.

• Use PROC SORT

PROC SORT

• FORMAT:

PROC SORT <options>;
    BY <options> variables;

• Sort order: ascending by default. For descending order, put DESCENDING before the variable name on the BY line
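Putting the two procedures together, a BY-group analysis usually takes two steps: sort, then analyze. The following is a minimal sketch; the data set name earnings is hypothetical, while SEX and EARN follow the earlier example.

```sas
/* Sort by SEX so the BY statement in the next step is valid */
proc sort data=earnings;
    by sex;                 /* ascending: SEX=0 first, then SEX=1 */
run;

/* Separate sample statistics on earnings for each value of SEX */
proc univariate data=earnings;
    var earn;
    by sex;
run;
```

To list men first, the sort could use `by descending sex;`, in which case the BY statement in PROC UNIVARIATE needs the same DESCENDING keyword.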

Describing Data: Frequencies

• FORMAT:

PROC FREQ <options>;
    BY variables;
    TABLES requests </ options>;
    WEIGHT variable;

One-Way Frequency Table

• SEX=1 (Male), SEX=0 (Female)

• EDUCATION=1 (Less than High School), =2 (High School), =3 (Some College), =4 (College grad.)

• EARN=Annual Earnings

PROC FREQ;
    TABLES EDUCATION;
RUN;

PROC FREQ;
    TABLES EDUCATION;
    BY SEX;
RUN;