EPIB 698C Lecture 2 Notes

Instructor: Raul Cruz

2/14/11


Creating Data in SAS, an overview

Creating datasets by hand entry (Viewtable window, CARDS, DATALINES statement)

Reading dataset from external files (not SAS data, INFILE statement)

Using Import/Export facility (a point-and-click approach)


Entering Data with Viewtable window

To open the Viewtable window, select “Table Editor” from the Tools menu. An empty Viewtable window will appear.

The letters at the tops of the columns are default variable names. Right-click on a letter and open the Column Attributes window, where you can change the variable's name, type, etc.

Entering Data with Viewtable window

Enter your data once you have defined your columns. To save your table, select “Save as” from the File menu, then select a library and specify the name of your table (SAS dataset).

To open an existing table, go to Tools → Table Editor; the Viewtable window will open. Then go to the File menu and click Open → choose a library → select the table name. To switch from browse mode (the default) to edit mode, select “Edit Mode” from the Edit menu.

With Viewtable you can easily create a data table by setting up the columns and then adding rows. However, you cannot add columns once you finish defining your columns. This is a considerable disadvantage.

Reading Data Inline, CARDS Statement

You enter the actual data points inside the PROGRAM EDITOR

Example: CARDS statement

data instructor;
input name $ gender $ age;
cards;
Jane F 30
Mary F 29
Mike M 28
;
run;

$ sign after gender means that gender is a character variable

Examples

Reading multiple observations in each line of data by adding the @@ symbol at the end of the input statement

data aaa;
input x y @@;
datalines;
1 12.4 1 11.3 1 1.4 2 2.1
2 19.4 1 10.0 2 13.2 1 11.6
;
run;

Reading Dataset from External Files: INFILE Statement

Identifies the external file that contains the data and has options that control how the records in file are read into SAS

Must be used before the input statement because it locates the data file to be read

Syntax

data data_set_name;
infile 'directory_and_file_name';
input variable_list;
run;

Reading raw data separated by spaces—list input

List input (also called free formatted input) can read data separated by at least one space. By default, SAS assumes data values are separated by one or more blanks.

It reads all the data in a record; unwanted values cannot be skipped

Any missing data must be indicated with a period

Character data must be simple, with no embedded spaces

Reading raw data separated by spaces—list input

Example:

data demographics;
infile "C:\test.txt";
input Gender $ Age Height Weight;
run;

Specify missing values with list input

We use a period to represent missing values

Complete data      With a missing value
M 50 168 155       M 50 168 155
F 23 160 101       F  . 160 101
M 65 172 220       M 65 172 220
F 35 165 133       F 35 165 133
M 15 171 166       M 15 171 166

Reading raw data separated by commas

A comma-separated values file (csv file) uses commas as data delimiters

They may or may not enclose character values in quotes.

Example: test.csv

"M",50,68,155
"F",23,60,101
"M",65,72,220
"F",35,65,133
"M",15,71,166

Reading raw data separated by commas

data demographics;
infile 'c:\test.csv' dsd;
input Gender $ Age Height Weight;
run;

DSD means delimiter-sensitive data. It has several functions:

(1) It changes the default delimiter from a blank to a comma.

(2) If there are two delimiters in a row, it assumes there is a missing value in between.

(3) If character values are placed in quotes, the quotes are stripped from the value.

INFILE statement, useful options

DSD: It recognizes two consecutive delimiters as a missing value. Example: given the record 20,30,,50, SAS will by default treat this as 20 30 50 . but with the dsd option SAS will treat it as 20 30 . 50 .

It allows you to include the delimiter within quoted strings.

Example: suppose you have a comma-separated file and your data include values like "George Bush, Jr."

With the dsd option, SAS will recognize that the comma in "George Bush, Jr." is part of the name, and not a separator indicating a new variable.
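The three DSD behaviors can be tried out with inline data. This is only a sketch: the data set name, the variable names, and the ages and heights are made up for illustration, and the colon modifier is used so that Name can hold up to 20 characters.

```sas
/* Sketch: DSD with a quoted value containing a comma and a missing value */
data names;
    infile datalines dsd;
    /* : $20. reads up to 20 characters but stops at the delimiter */
    input Name : $20. Age Height;
    datalines;
"George Bush, Jr.",54,180
Jane Doe,,165
;
run;

proc print data=names;
run;
```

In the first record the quotes are stripped and the comma stays inside Name. In the second record the two consecutive commas leave Age missing.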


INFILE statement (cont.)

DLM=

The dlm= option can be used to specify the delimiter that separates the variables in your raw data file.

dlm=',' indicates a comma is the delimiter (e.g., a comma separated file, .csv file).

dlm='09'x indicates that tabs are used to separate your variables

dlm=':' indicates a colon is the delimiter

INFILE statement (cont.)

We can use dsd and dlm at the same time:

infile 'file-description' dsd dlm=':' ;

This combination of options performs all the actions requested by the DSD option, but overrides the default delimiter (comma) with a delimiter of your choice
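A small sketch of the combination, using inline data so that no external file is needed. The data set name, variable names, and values are invented for the example.

```sas
/* Sketch: DSD behavior with a colon as the delimiter */
data scores;
    infile datalines dsd dlm=':';
    input ID $ Test1 Test2 Test3;
    datalines;
A01:90::85
A02:78:88:92
;
run;

proc print data=scores;
run;
```

For the first record the two consecutive colons are read as a missing value, so Test2 is missing and Test3 is 85.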


INFILE statement (cont.)

useful options continued…

missover: if a record contains fewer values than the input statement has variables, all remaining variables are set to missing.

obs: specifies the last record to be read into the data set

firstobs: specifies the first line of data to be read into data set. Useful if there is a header row in the dataset.
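These options can be combined on one INFILE statement. The sketch below assumes a hypothetical file C:\test_header.csv whose first line is a header row; the file name is not from the lecture.

```sas
/* Sketch: skip a header row, stop after record 4, tolerate short records */
data demographics;
    /* firstobs=2 starts at the second record (skips the header)   */
    /* obs=4 makes record 4 the last one read, so records 2-4 load */
    /* missover sets trailing variables to missing on short lines  */
    infile "C:\test_header.csv" dsd missover firstobs=2 obs=4;
    input Gender $ Age Height Weight;
run;
```

Note that obs= is a record number, not a count: with firstobs=2 and obs=4, three records are read.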


Filename statement

The FILENAME statement identifies the file and associates it with a reference name (a fileref). You then use this reference in your INFILE statement instead of the actual file name.

filename mydata 'C:\test.csv';

data demographics;
infile mydata dsd;
input Gender $ Age Height Weight;
run;

The file specification in the INFILE statement can be DATALINES, so that INFILE options such as dsd apply to inline data

data demographics;
infile datalines dsd;
input gender $ Age Height Weight;
datalines;
"M",50,68,155
"F",23,60,101
"M",65,72,220
"F",35,65,133
"M",15,71,166
;
run;

Reading data from fixed columns

Many raw data files store specific information in fixed columns

The advantages of fixed column files:

(1) you don’t need to worry about missing values

(2) you can choose which variables to read and in what order to read them

Bank data

123456789012345678901  (columns)
00110/21/1955M   1145
00211/18/2001F  18722

Column 1-3: subject ID
Column 4-13: date of birth
Column 14: gender
Column 15-21: account balance

Column Input

data financial;
infile "C:\bank.txt";
input Subj $ 1-3
      DOB $ 4-13
      Gender $ 14
      Balance 15-21;
run;
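Because each variable is tied to its columns, column input can also read only some of the variables, in any order. A minimal sketch on the same bank file (the data set name balances is made up):

```sas
/* Sketch: read only two of the four fields, balance first */
data balances;
    infile "C:\bank.txt";
    input Balance 15-21
          Subj $ 1-3;
run;
```

Date of birth and gender are simply never read, and Balance becomes the first variable in the data set.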


Formatted Input

Formatted input can read both character and standard numeric values as well as nonstandard numeric values, such as numbers with dollar signs and commas, and dates.

Formatted input is the most common and powerful of all input methods

SAS Formats and Informats: An informat is a specification for how raw data should be read. A format is a layout specification for how a variable should be printed or displayed.


Formatted Input

The bank data:

data financial;
infile "C:\bank.txt";
input @1  Subj $3.
      @4  DOB mmddyy10.
      @14 Gender $1.
      @15 Balance 7.;
run;

Formatted Input

The @ signs in the INPUT statement are called column pointers. @4 tells SAS to go to column 4.

Following the variable names are SAS informats. Informats are built-in instructions that tell SAS how to read a data value

Formatted Input

Two of the most basic informats are w.d and $w.

The w.d informat reads standard numeric values. The w tells SAS how many columns to read; the optional d tells SAS that there is an implied decimal point in the value.

For example, if the data value is 123:

With informat 3.0, SAS will save it as 123;

With informat 3.1, SAS will save it as 12.3;

If the data value already has a decimal point in it, then SAS ignores the d option. For example, if the data value is 1.23:

With informat 4.1, SAS will save it as 1.23;
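The same three cases can be checked in the log with the INPUT function, which applies an informat to a character string. This is a sketch for experimenting; the variable names a, b, and c are arbitrary.

```sas
/* Sketch: the w.d informat with and without an explicit decimal point */
data _null_;
    a = input('123', 3.0);    /* no implied decimal           */
    b = input('123', 3.1);    /* one implied decimal place    */
    c = input('1.23', 4.1);   /* explicit decimal, d ignored  */
    put a= b= c=;             /* log: a=123 b=12.3 c=1.23     */
run;
```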


Formatted Input

The $w. informat tells SAS to read w columns of character data

The MMDDYY10. informat tells SAS that the date you are reading is in the mm/dd/yyyy form. SAS reads the date and converts the value into a SAS date.

SAS stores dates as numeric values equal to the number of days from January 1, 1960. E.g., if you read 01/01/1960, SAS stores a value of 0. The date 01/02/1960 is stored as a value of 1.
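The stored values can be inspected directly in the log. A small sketch, again using the INPUT function; d0 and d1 are arbitrary names.

```sas
/* Sketch: SAS dates are day counts from January 1, 1960 */
data _null_;
    d0 = input('01/01/1960', mmddyy10.);
    d1 = input('01/02/1960', mmddyy10.);
    put d0= d1=;          /* log: d0=0 d1=1                   */
    put d1= date9.;       /* same value displayed as a date   */
run;
```

The second PUT statement shows why a format is needed when printing: the stored number is the same, only the display changes.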


The format statement

Formats are built-in SAS instructions that allow you to display data in easily readable ways. All SAS format names end either in a period or in a period followed by a number.

title "Listing of FINANCIAL";
proc print data=financial;
format DOB mmddyy10.
       Balance dollar11.2;
run;

The dollar11.2 format tells SAS to put a $ sign in front of the number and to allow up to 11 columns to print the balance values; the 2 tells SAS to include two digits after the decimal point.

Using a format/informat statement in a DATA step

It is usually more useful to place your format statement within a SAS data step. This creates a permanent association between the formats and the variables in the data set.

You can override any permanent format by placing a FORMAT statement in a particular procedure.
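A sketch of both ideas on the bank data from the earlier slides: the FORMAT statement in the DATA step stores the formats with the data set, and the FORMAT statement in PROC PRINT overrides one of them for that procedure only.

```sas
/* Sketch: permanent formats set in the DATA step */
data financial;
    infile "C:\bank.txt";
    input @1  Subj $3.
          @4  DOB mmddyy10.
          @14 Gender $1.
          @15 Balance 7.;
    format DOB mmddyy10. Balance dollar11.2;
run;

/* Override the stored DOB format for this listing only */
proc print data=financial;
    format DOB date9.;
run;
```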


An informat statement with list input

Following the keyword informat, you list each variable and the informat you want to use to read it

data list_example;
informat Subj $3.
         Name $20.
         DOB mmddyy10.
         Salary dollar8.;
infile 'c:\list.csv' dsd;
input Subj Name DOB Salary;
format DOB date9.;
run;

data list_example;
infile 'c:\list.csv' dsd;
input Subj : $3.
      Name : $20.
      DOB : mmddyy10.
      Salary : dollar8.;
format DOB date9.;
run;

(1) There is a colon (called an informat modifier) preceding each informat. It tells SAS to use the informat supplied but to stop reading the value for this variable when a delimiter is met.

(2) Without it, SAS may read past a delimiter to satisfy the width specified in the informat.

What are the Differences

Informats: used at input, usually when reading external data

Formats: used during the output cycle, to write formatted values to output

Selected date informats

Informat    Data form    Input data    INPUT statement
DATEw.      ddmmmyyyy    1Feb1961      input date date9.;
            ddmmmyy      1Feb61        input date date7.;
DDMMYYw.    ddmmyy       01-02-61      input date ddmmyy8.;
            ddmmyyyy     01/02/1961    input date ddmmyy10.;
MMDDYYw.    mmddyy       02-01-61      input date mmddyy8.;
            mmddyyyy     02-01-1961    input date mmddyy10.;

Selected date formats

Format      Data displayed    Input data    FORMAT statement          Result
DATEw.      ddmmmyyyy         366           format date date9.;       01JAN1961
            ddmmmyy           366           format date date7.;       01JAN61
MMDDYYw.    mmddyy            366           format date mmddyy8.;     01/01/61
            mmddyyyy          366           format date mmddyy10.;    01/01/1961

More examples of informat for numeric data

COMMAw.d: removes embedded commas and $, converts a left parenthesis to a minus sign

Input data    INPUT statement            Result
$1,000        input income comma6.0;     1000
(1,234)       input income comma7.0;     -1234

PERCENTw.: converts percent to numbers

Input data    INPUT statement            Result
(20%)         input value percent5.;     -0.2

w.d: reads standard numeric data

Input data    INPUT statement            Result
-12.3         input value 5.1;           -12.3

More examples of informat for character data

$CHARw.: reads character data, does not trim leading or trailing blanks

Input data    INPUT statement            Result
"my cat"      input animal $char10.;     "my cat"
"  my cat"    input animal $char10.;     "  my cat"

$w.: reads character data, trims leading blanks

Input data    INPUT statement            Result
"my cat"      input animal $10.;         "my cat"
"  my cat"    input animal $10.;         "my cat"

(The quotation marks only make the blanks visible; they are not part of the data.)

Import/Export Data

To export SAS datasets:

Go to the File menu and select “Export Data”

Choose the data file (from the library Work)

Locate and select the file type using the Browse button

Save the data set and finish

Check the log to make sure the data set was created

This method does not require a data step, but any modification may require a data step

Convenient for Excel files

Importing a data set follows similar steps

Read data in Excel file

(1) Use the Import procedure: File → Import Data → choose Microsoft Excel → click “Next” → select the worksheet using Browse → select the “Table” you want to import → click “Next” → select a library and assign a file name → click “Next” if you want PROC IMPORT SAS code generated, otherwise click “Finish”. Check the log window to make sure the data set was created successfully.

(2) Save Excel file as a csv file, then read in using the infile statement with dsd option.
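For reference, hand-written PROC IMPORT code for the csv route might look like the sketch below. It assumes the test.csv file from the earlier slides, which has no header row; the code the wizard generates for an Excel file will differ.

```sas
/* Sketch: import a csv file without a header row */
proc import datafile="C:\test.csv"
            out=work.demographics
            dbms=csv
            replace;
    getnames=no;   /* first line is data, not variable names */
run;

proc print data=work.demographics;
run;
```

With getnames=no SAS assigns default variable names (VAR1, VAR2, and so on), so a DATA step with INFILE and INPUT remains the better choice when you want to control names and informats.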