EPIB 698D Lecture 1 Notes
Instructor: Raul Cruz (raulcruz@umd.edu)
1/23/2013




Syllabus


What is SAS?

SAS (“Statistical Analysis System”) was developed in 1976 for both data manipulation and data analysis.

Visit the SAS website: http://www.sas.com


Basics of SAS: 5 Windows

EDITOR – file where you write code and comments for execution by SAS (save as .sas)

LOG – file where notes about the execution of the program are written, as well as errors (save as .log)

OUTPUT – file where results from the program are written (save as .lst)

Explorer Window

Results Window


The SAS interface consists of multiple windows designed for specific functions.

The following windows are open by default:

• Enhanced Editor window: Type SAS programs here. The "enhanced" editor has more advanced features than the traditional "Program Editor" used in SAS 6.12.
• Output window: View the results of SAS procedures, including tables and line charts. Graphs are displayed in a separate Graph window.
• Log window: View SAS programs as they execute, including error messages and warnings.
• Explorer window: Browse your SAS tables (data sets) and libraries. Create new files and file shortcuts.
• Results window: Displays a hierarchical outline of SAS results to simplify output navigation.


SAS Menus
• File: file input/output
• Edit: editing the contents of each window (the contents of the LOG and OUTPUT windows cannot be edited, but they can be deleted)
• View: view programs, log files, output, and data sets
• Tools: editors for graphics, reports, tables, etc.
• Solutions: analysis without writing code
• Window: navigating among windows
• Help: SAS help information


SAS toolbar

The toolbar gives you quick access to commands that are also accessible through the pull-down menus.

Not all operating environments have a toolbar


SAS command bar

The command bar is where you can type SAS commands.

Most commands you can type in the command bar are also accessible through the SAS menus or the toolbar.


Controlling your windows

You can switch to a window by:
• Using the Window pull-down menu
• Typing the name of the window in the command bar
• Clicking on the window


Basic Rules of SAS Code
• Every SAS statement ends with a semicolon (;)
• Lines of data are NOT separated by semicolons
• SAS statements can extend over multiple lines, provided you do not split a word of the statement across lines
• More than one statement can appear on a single line
• You can start a statement anywhere within a line (not recommended)
• SAS is case insensitive (for keywords and names; character data values keep their case)
• Words in a SAS statement are separated by blanks
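The rules above can be seen in a small sketch. The data set name rules_demo and its values are hypothetical; the point is that mixed-case keywords, two statements on one line, and a statement split across lines all run, while the data lines themselves carry no semicolons.

```sas
/* Keywords and names are case insensitive: DATA, Data and data are the same */
DATA rules_demo;
   INPUT Name $ Age;   age2 = AGE * 2;      /* two statements on one line */
   label age2 =
         'Age doubled';                     /* one statement over two lines */
   datalines;
Jane 30
Mary 29
;
run;
```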


SAS Steps
There are two main types of SAS steps:

• DATA step: read in data, manipulate data sets, etc.
• PROC step: perform statistical analyses, etc.

A DATA or PROC step executes when:
• A RUN, QUIT, or CARDS statement is entered
• Another DATA or PROC statement is entered
• The ENDSAS statement is entered


SAS Comments
There are two ways to write a comment:

• /* ...comments... */ : good for long documentation and for commenting out sections of code
• * ...comments... ; : good for commenting out a single statement; everything up to the first semicolon (;) is treated as a comment

In the Enhanced Editor, comments are shown in green (SAS step keywords are shown in blue).
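A short sketch showing both comment styles inside one DATA step (comment_demo and its values are hypothetical). Only y = x * 10 is executed; the commented-out statement has no effect.

```sas
/* A block comment can span several lines and is handy for
   long documentation or for disabling a whole section of code */
data comment_demo;
   input x;
   * A statement-style comment ends at the first semicolon;
   y = x * 10;
   /* y = x * 100; */   /* this statement is commented out and never runs */
   datalines;
1
2
;
run;
```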


Example 1
/* Data set instructor contains information on several teachers */
data instructor;
   input name $ gender $ age;
   cards;
Jane F 30
Mary F 29
Mike M 28
;
run;

proc means;
   var age;
run;


SAS Dataset
Basic structure: a rectangular matrix

                Name   Sex   Age
Observation 1   Jane   F     30
Observation 2   Mary   F     29
Observation 3   Mike   M     28

Columns are variables; rows are observations.


SAS data types

(1) Numeric data: numbers
  • Can be added and subtracted
  • Can have decimal places
  • Can be positive or negative

(2) Character data: contains letters, numerals or special characters
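The difference between the two types shows up as soon as you read data. In this sketch (hypothetical data set types_demo), the $ after id makes it character, so the leading zeros in 007 are kept, while score is numeric and can take part in arithmetic. VTYPE is a standard DATA step function that returns N for numeric and C for character variables.

```sas
/* Hypothetical data: id is read as character ($), score as numeric */
data types_demo;
   input id $ score;
   score_plus = score + 0.5;      /* arithmetic works on numeric variables */
   id_type    = vtype(id);        /* 'C' for character */
   score_type = vtype(score);     /* 'N' for numeric */
   datalines;
007 -2.5
123 10
;
run;
```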


SAS Dataset and variable names

Dataset names:
• Must start with a letter (A-Z) or the underscore character (_)
• Can contain only letters, numbers, or underscores
• Can contain upper- and lowercase letters
• Can be longer than 8 characters in SAS 8.0+ (up to 32 characters)
• Choose names that are easy to remember

Variable names: same rules as dataset names


Examples: valid SAS names

Parts LastName First_Name _Null1_ X12 X1Y1


Examples: invalid SAS names

3Parts (starts with a number)
Last Name (contains a blank)
First-Name (contains a hyphen)
_Null1$ (contains $)
Num% (contains %)


Submitting a program in SAS

First, get your program into the editor:
• Type your program in the editor, or
• Open an existing SAS program: use Open from the File pull-down menu, use the Open icon, or just click your SAS program directly


Submitting a program in SAS

Make your editor window active, and submit your code by:
• Clicking the Submit icon
• Entering submit in the command bar
• Selecting “Submit” from the Run pull-down menu


Submitting a program in SAS

Reading the SAS log window:
• It starts with notes about the version of SAS and your SAS site number
• The original SAS code is shown with line numbers added on the left
• Notes contain information about the SAS data set and the computer resources used


Assessing errors in the .log file
• Non-error SAS messages begin with NOTE:
• SAS error messages begin with ERROR: or possibly WARNING:
• When creating a data set, the NOTEs are important to read because they indicate whether the data set was created correctly. Many times there are no errors, yet the data set is not correct.
• ERROR messages sometimes give you hints about options or keywords in DATA/PROC steps
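To see how a data set can be wrong without any ERROR: message, the sketch below (hypothetical data set log_demo) reads a word where a number is expected. The step completes, SAS writes a NOTE about invalid data to the log, and the value is stored as missing.

```sas
/* Hypothetical input: the second record has text where a number is expected */
data log_demo;
   input name $ age;
   datalines;
Jane 30
Mary thirty
;
run;
/* No ERROR: message is issued, but the log contains a NOTE about invalid
   data and age is set to missing (.) for Mary */
```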


The output window

Viewing results from the output window

You can save and print the contents of the output window.

When you have a lot of output, one easy way to find specific output is to use the list in the Results window.


SAS Data Libraries
A SAS library is simply a location where SAS data sets are stored. In the Explorer window, click on “Libraries”: there are at least three libraries, Sashelp, Sasuser and Work.

Sashelp and Sasuser contain information that controls your SAS session.

Work is the default library; it is a temporary storage location for SAS data sets.
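A sketch of what "default library" means in practice: a data set created with a one-level name (here the hypothetical onelevel) is placed in WORK, so WORK.ONELEVEL names the same data set, and it disappears when the session ends.

```sas
/* A one-level name (no libref) is stored in the default WORK library */
data onelevel;
   x = 1;
run;

/* ...so it can also be referenced with the two-level name WORK.ONELEVEL */
proc print data=work.onelevel;
run;
```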


Creating a new library

Make the “Active Libraries” window active (click Explorer, then click Libraries).

Choose “New” from the File menu, or right-click in the Active Libraries window and choose “New” from the pop-up menu.


Creating a new library
Type the name of the library in the box after Name. This name must be eight characters or fewer, start with a letter or underscore, and contain only letters, numbers and underscores.

In the Path field, type the complete path to the folder or directory where you want to save your data (or use Browse).


Creating a new library

Another way to create a new library is to use the LIBNAME statement to associate the library with a directory accessible from your computer:

LIBNAME mylib 'H:/EPIB698D/week1';

This associates the directory H:/EPIB698D/week1 with the name mylib. mylib is known as a libref (a library reference).


Temporary/permanent SAS datasets

Every SAS data set is stored in a SAS data library. By default, all data sets created during a SAS session are temporary data sets and are deleted when you close SAS: all data sets associated with the library WORK are deleted at the end of the SAS session.

A permanent data set is a data set that is not deleted when SAS is exited. To create a permanent data set, simply use a library other than WORK (a two-level name such as yourlib.instructor) when you create the data set.


To create Permanent SAS datasets

Code to create permanent SAS datasets

libname yourlib 'H:/EPIB698D/week1';

data yourlib.instructor;
   input name $ sex $ age;
   cards;
Mike M 30
Wendy F 29
Jane F 28
;
run;


To access permanent SAS datasets
When you start a new SAS session, the permanent data sets can be accessed directly through a libref, once a LIBNAME statement assigns that libref to the folder where they are stored. The name of the libref can be different from the name you used when creating the permanent data set:

libname mylib 'H:/EPIB698D/week1';

proc print data=mylib.instructor;
run;


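Read existing SAS data: LIBNAME and SET statements

libname epib "D:\";

data new;
   set epib.list_example2;   /* read every observation of the existing data set */
run;

The SET statement reads the observations of an existing SAS data set (here epib.list_example2) into the new data set (here work.new).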


Viewing SAS data with SAS Explorer
• Click the Libraries icon in the Explorer window
• Click the library you want to see
• Click the data set name to open it
• To go back to the previous window within Explorer, choose “Up One Level” from the View menu, or click the Up One Level button on the toolbar


PROC CONTENTS

PROC CONTENTS prints descriptive information about the data set and the variables in it:
• Data set information: name, number of observations, number of variables, and date created
• Variable information: name, internal order, type, length, format/informat, and label

Very useful for getting a snapshot of a data set. Syntax:

proc contents data=data_set_name;
run;
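A sketch using SASHELP.CLASS, a sample data set supplied with SAS, so it runs without any course files. The second step uses the OUT= and NOPRINT options of PROC CONTENTS to save the variable information as a data set (class_info, a name chosen for illustration) instead of printing it.

```sas
/* SASHELP.CLASS is a small sample data set shipped with SAS */
proc contents data=sashelp.class;
run;

/* OUT= saves the variable information as a data set, one row per variable */
proc contents data=sashelp.class out=class_info noprint;
run;
```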


TITLES
Titles are descriptive headers SAS places at the top of each page of the OUTPUT window. A title is set with the TITLE statement followed by a string of characters. The string must be enclosed in single or double quotes. The maximum length for a string is 200 characters.

If you want multiple-line titles, you can use the TITLE statement where the word TITLE is followed by a number:
title1 'EPIB 698A';
title2 'week1';

To clear the title setting, simply execute:
title;


PROC print
The PRINT procedure prints the observations in a SAS data set to the output window. Features:
• Automatic formatting
• Columns labeled with variable names or labels
• Automatic accumulation and printing of subtotals and totals

Syntax:

proc print data=data_set_name options;
   var var1 var2 var3 var4;   /* variables are printed in the order listed */
run;


PROC print (cont.)

Useful options with PROC print:
• double: double-spaces the output
• noobs: suppresses observation numbers
• label: uses variable labels as column headings


PROC print (cont.)
The VAR statement

• The VAR statement is used to specify the variables to process in a PROC step; it is not unique to PROC PRINT.
• Variables are usually processed in the order listed in the VAR statement.
• It applies only to the PROC step it appears in (it is not global).
• If no VAR statement is used, the procedure generally processes all the variables (or all the numeric variables if a calculation is performed).
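The PROC PRINT options, the VAR statement, and the TITLE statements can be combined as in this sketch. It uses SASHELP.CLASS, a sample data set supplied with SAS; the label text 'Age (years)' is only for illustration.

```sas
title1 'EPIB 698D';
title2 'week1';

/* noobs drops the Obs column, label prints labels as column headings, and
   the VAR statement prints Age before Name because that is the order listed */
proc print data=sashelp.class(obs=5) noobs label;
   var Age Name;
   label Age = 'Age (years)';
run;

title;   /* clear the titles so they do not carry over to later output */
```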


Creating Data in SAS, an overview

Creating datasets by hand entry (Viewtable window, CARDS, DATALINES statement)

Reading dataset from external files (not SAS data, INFILE statement)

Using Import/Export facility (a point-and-click approach)



Reading Data Inline, CARDS Statement

You enter the actual data points inside the PROGRAM EDITOR.

Example: CARDS statement

data instructor;
   input name $ gender $ age;
   cards;
Jane F 30
Mary F 29
Mike M 28
;
run;


Reading Dataset from External Files: INFILE Statement

• Identifies the external file that contains the data and has options that control how the records in the file are read into SAS
• Must be used before the INPUT statement, because it locates the data file to be read

Syntax

data data_set_name;
   infile 'directory_and_file_name';
   input variable_list;
run;


Reading raw data separated by spaces: list input

List input (also called free-formatted input) can read data separated by at least one space. By default, SAS assumes data values are separated by one or more blanks.

• It reads all the data in a record; unwanted values cannot be skipped
• Any missing data must be indicated with a period
• Character data must be simple, with no embedded spaces


Reading raw data separated by spaces: list input

Example:

data demographics;
   infile 'F:\teaching\SAS\mydata.txt';
   input Gender $ Age Height Weight;
run;

The $ sign after Gender means that Gender is a character variable.


Specify missing values with list input

We use a period to represent missing values. For example, if the age of the second subject is missing:

Complete data     With a missing age
M 50 168 155      M 50 168 155
F 23 160 101      F . 160 101
M 65 172 220      M 65 172 220
F 35 165 133      F 35 165 133
M 15 171 166      M 15 171 166
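Reading the right-hand column with list input (sketched here with DATALINES instead of an external file; the data set name demographics_missing is hypothetical) gives a missing value for the second subject's age, while the other variables on that record are still read.

```sas
/* The five records from the right-hand column, with age missing for one */
data demographics_missing;
   input Gender $ Age Height Weight;
   datalines;
M 50 168 155
F . 160 101
M 65 172 220
F 35 165 133
M 15 171 166
;
run;
```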


Reading raw data separated by commas

Comma-separated values files (CSV files) use commas as data delimiters. They may or may not enclose character values in quotes.

Example: mydata.csv

"M",50,68,155
"F",23,60,101
"M",65,72,220
"F",35,65,133
"M",15,71,166


Reading raw data separated by commas

data demographics;
   infile 'c:\books\learning\mydata.csv' dsd;
   input Gender $ Age Height Weight;
run;

DSD means delimiter-sensitive data. It has several functions:
(1) It changes the default delimiter from a blank to a comma.
(2) If there are two delimiters in a row, it assumes there is a missing value in between.
(3) If character values are placed in quotes, the quotes are stripped from the value.


INFILE statement, useful options

DSD
• It recognizes two consecutive delimiters as a missing value. Example: for the record 20,30,,50, without DSD (for example, with DLM=',') SAS treats consecutive commas as a single delimiter and reads 20 30 50 . ; with the DSD option, SAS reads 20 30 . 50 .
• It allows you to include the delimiter within quoted strings. Example: a comma-separated file whose data include values like "George Bush, Jr.". With the DSD option, SAS recognizes that the comma in "George Bush, Jr." is part of the name, and not a separator indicating a new variable.
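A sketch of both DSD behaviours using in-stream data. The data set name dsd_demo, the variables a, b and c, and the second record are hypothetical; the LENGTH statement is added so the 16-character name is not cut off at the list-input default of 8 characters.

```sas
/* Record 1 has an empty field (,,) and a comma inside quotes */
data dsd_demo;
   infile datalines dsd;
   length name $ 20;
   input name $ a b c;
   datalines;
"George Bush, Jr.",20,,50
"Jane",1,2,3
;
run;
/* name keeps its embedded comma (quotes stripped), and b is missing */
```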


INFILE statement (cont.)

Useful options, continued:
• obs=: specifies the last record to be read into the data set
• firstobs=: specifies the first line of data to be read into the data set. Useful if there is a header row in the file.

data test;
   infile 'C:\mydata.txt' obs=4 firstobs=2;   /* reads records 2 through 4 */
   input Gender $ Age Height Weight;
run;
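The same options work with in-stream data. This sketch (hypothetical data set test_lines) puts a header row on line 1, so firstobs=2 skips it and obs=4 stops after the fourth line: three observations are created.

```sas
data test_lines;
   infile datalines firstobs=2 obs=4;   /* read lines 2, 3 and 4 only */
   input Gender $ Age Height Weight;
   datalines;
Gender Age Height Weight
M 50 68 155
F 23 60 101
M 65 72 220
F 35 65 133
;
run;
```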


Filename statement

The FILENAME statement identifies a file and associates it with a reference name (a fileref); you then use this fileref in your INFILE statement instead of the actual file name.

filename datafile 'F:\teaching\SAS\mydata.csv';

data demographics;
   infile datafile dsd;
   input Gender $ Age Height Weight;
run;
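To try the FILENAME statement without a copy of mydata.csv, this sketch assumes a temporary file: the TEMP device type gives the fileref datafile a scratch file, a DATA _NULL_ step writes two records from mydata.csv into it with FILE and PUT, and the INFILE statement then reads it through the fileref. With a real file you would give its quoted path instead of TEMP.

```sas
/* Assumption: a TEMP fileref stands in for F:\teaching\SAS\mydata.csv */
filename datafile temp;

/* Write two records (values taken from mydata.csv) to the scratch file */
data _null_;
   file datafile;
   put '"M",50,68,155';
   put '"F",23,60,101';
run;

/* Read it back through the fileref, exactly as with a quoted path */
data demographics;
   infile datafile dsd;
   input Gender $ Age Height Weight;
run;
```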


Specifying INFILE options with the DATALINES statement

data demographics;
   infile datalines dsd;
   input Gender $ Age Height Weight;
   datalines;
"M",50,68,155
"F",23,60,101
"M",65,72,220
"F",35,65,133
"M",15,71,166
;
run;


Formatted Input

Formatted input can read both character and standard numeric values, as well as nonstandard numeric values such as numbers with dollar signs and commas, and dates.

Formatted input is the most common and powerful of all input methods.

SAS Formats and Informats: an informat is a specification for how raw data should be read. A format is a layout specification for how a variable should be printed or displayed.


Formatted Input

The w.d informat reads standard numeric values. The w tells SAS how many columns to read; the optional d tells SAS that there is an implied decimal point in the value.

For example, if the data value is 123: with informat 3.0, SAS stores it as 123; with informat 3.1, SAS stores it as 12.3.

If the data value already has a decimal point in it, SAS ignores the d value. For example, if the data value is 1.23: with informat 4.1, SAS stores it as 1.23.
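The examples above can be reproduced in one step with column pointers (@n). The data set wd_demo and its variable names are hypothetical; the same 123 field in columns 1-3 is read twice, once with 3. (equivalent to 3.0) and once with 3.1, and 1.23 is read from columns 5-8 with 4.1.

```sas
data wd_demo;
   input @1 raw_a 3.       /* 123 -> 123  (no implied decimal)          */
         @1 raw_b 3.1      /* 123 -> 12.3 (one implied decimal place)   */
         @5 raw_c 4.1;     /* 1.23 -> 1.23 (explicit decimal wins)      */
   datalines;
123 1.23
;
run;
```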


Formatted Input

The $w. informat tells SAS to read w columns of character data.

The MMDDYY10. informat tells SAS that the date you are reading is in the mm/dd/yyyy form. SAS reads the date and converts the value into a SAS date.

SAS stores dates as numeric values equal to the number of days from January 1, 1960. For example, if you read 01/01/1960, SAS stores a value of 0; the date 01/02/1960 is stored as a value of 1.
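A sketch (hypothetical data set date_demo) that reads a $3. subject ID from columns 1-3 and an MMDDYY10. date from columns 5-14, confirming the stored values 0 and 1 for the first two dates.

```sas
data date_demo;
   input @1 subj $3.          /* 3 columns of character data          */
         @5 visit mmddyy10.;  /* mm/dd/yyyy converted to a SAS date   */
   datalines;
001 01/01/1960
002 01/02/1960
003 01/23/2013
;
run;
```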


The FORMAT statement

Formats are built-in SAS specifications that allow you to display data in easily readable ways. All SAS format names end either in a period or in a period followed by a number.

title "Listing of FINANCIAL";
proc print data=financial;
   format DOB mmddyy10.
          Balance dollar11.2;
run;

The dollar11.2 format tells SAS to put a $ sign in front of the number (with commas), to allow up to 11 columns to print the Balance values, and to include two decimal places after the decimal point.
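The PUT function applies a format to a value and returns the displayed text, which makes it easy to see what a format does. In this sketch the data set format_demo, its variable names, and the value 1234.5 are hypothetical.

```sas
data format_demo;
   length shown_balance $ 11 shown_dob $ 10;
   balance = 1234.5;
   dob     = 1;                                /* SAS date value 1 */
   shown_balance = put(balance, dollar11.2);   /* $ sign, comma, two decimals, right-aligned */
   shown_dob     = put(dob, mmddyy10.);        /* mm/dd/yyyy text */
run;
```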


Using a format/informat statement in a DATA step

It is usually more useful to place your FORMAT statement in a SAS DATA step: the formats are then permanently associated with the variables in the data set.

You can override any permanent format by placing a FORMAT statement in a particular procedure.
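A sketch of a permanent format and a per-procedure override (the data set visits and its values are hypothetical). The DATE9. format is stored with the data set, so the first PROC PRINT uses it; the second PROC PRINT displays MMDDYY10. for that step only, leaving the stored format unchanged.

```sas
/* The FORMAT statement in the DATA step is stored with the data set */
data visits;
   input id $ visit : mmddyy10.;
   format visit date9.;
   datalines;
001 01/23/2013
002 01/02/1960
;
run;

/* Uses the stored DATE9. format */
proc print data=visits;
run;

/* A FORMAT statement inside the procedure overrides it for this step only */
proc print data=visits;
   format visit mmddyy10.;
run;
```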


An INFORMAT statement with list input

Following the keyword INFORMAT, you list each variable and the informat you want to use to read it:

data list_example;
   informat Subj $3. Name $20. DOB mmddyy10. Salary dollar8.;
   infile 'c:\books\learning\list.csv' dsd;
   input Subj Name DOB Salary;
   format DOB date9. Salary dollar8.;
run;

The INFORMAT statement also sets the size (length) of the character variables.


An INFORMAT statement with list input (Abbreviated Form)

data list_example;
   infile 'c:\books\learning\list.csv' dsd;
   input Subj : $3. Name : $20. DOB : mmddyy10. Salary : dollar8.;
   format DOB date9. Salary dollar8.;
run;

There is a colon (called an informat modifier) preceding each informat. It tells SAS to use the informat supplied but to stop reading the value for this variable when a delimiter is met. Without it, SAS may read past a delimiter to satisfy the width specified in the informat.
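A runnable version of the abbreviated form, with the contents of list.csv replaced by hypothetical in-stream records (colon_demo is also a hypothetical name). Because of the colon modifiers, the $20. informat stops at the comma after each name instead of reading 20 columns.

```sas
data colon_demo;
   infile datalines dsd;
   /* each colon makes the informat stop at the next comma */
   input Subj : $3. Name : $20. DOB : mmddyy10. Salary : dollar8.;
   format DOB date9. Salary dollar8.;
   datalines;
001,Ann Smith,10/21/1950,$45000
002,Bob Jones,01/02/1960,$52500
;
run;
```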


Import/Export Data

To export SAS data sets:
• Go to the File menu and select “Export Data”
• Choose the data set (from the library Work)
• Locate and select the file type using the Browse button
• Save the data set and finish
• Check the log to make sure the data set was exported

This method does not require a DATA step, but any modification may require one. It is convenient for Excel files.

Importing a data set follows similar steps.


Creating and Redefining Variables

You can create and redefine variables with assignment statements of the form variable = expression;

Type of expression     Example
Numeric constant       Age = 10;
Character constant     Gender = 'Female';
An old variable        Age = age_at_baseline;
Addition               Age = age_at_baseline + 10;


Home gardener's data

Gardeners were asked to estimate the pounds they harvested for four crops: tomatoes, zucchini, peas and grapes. Here are the data:

Gregor 10 2 40 0
Molly  15 5 10 1000
Luther 50 10 15 50
Susan  20 0 . 20

Task:
• Add a new variable group with a value of 14
• Add a variable type to indicate a home gardener
• Create a new variable zucchini_1 that equals zucchini*10
• Derive the total pounds of crops for each gardener
• Derive the % of tomatoes for each gardener


Home gardener's data

DATA homegarden;
   INFILE 'F:\SAS\lecture4\Garden.txt';
   INPUT Name $ 1-7 Tomato Zucchini Peas Grapes;
   group = 14;
   Type = 'home';
   Zucchini_1 = Zucchini * 10;
   /* total pounds harvested: sum the original weights, not Zucchini_1 */
   Total = Tomato + Zucchini + Peas + Grapes;
   PerTom = (Tomato / Total) * 100;
RUN;


Home gardener's data

Check the log window: "Missing values were generated as a result of performing an operation on missing values."

Since the last gardener (Susan) has a missing value for peas, the variables Total and PerTom, which are calculated from Peas, are set to missing.
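A self-contained sketch of the same calculation, with the four records read inline through DATALINES instead of Garden.txt (the data set name homegarden_inline is hypothetical). Because Name is read from columns 1-7, shorter names are padded with blanks so the numbers start in column 8. Susan's Total and PerTom come out missing, matching the log note.

```sas
/* Names occupy columns 1-7; list input resumes at column 8 */
DATA homegarden_inline;
   INPUT Name $ 1-7 Tomato Zucchini Peas Grapes;
   Zucchini_1 = Zucchini * 10;
   Total = Tomato + Zucchini + Peas + Grapes;   /* missing if any term is missing */
   PerTom = (Tomato / Total) * 100;
   DATALINES;
Gregor 10 2 40 0
Molly  15 5 10 1000
Luther 50 10 15 50
Susan  20 0 . 20
;
RUN;
```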


Exercise 1: next week

Syllabus is online

Securing SAS outside the classroom:
• Labs (http://www.oit.umd.edu/as/cl/): Regents Drive Garage (Building #202), Room 0504. The lab is open 24 hours a day, 7 days a week.
• Desktop version from departments
• SAS Enterprise