


Understanding the SAS® DATA Step and the Program Data Vector

Steven J. First, President
2997 Yarmouth Greenway Drive, Madison, WI 53711

Phone: (608) 278-9964, Web: www.sys-seminar.com


Understanding the SAS® DATA Step and the PDV

This presentation was written by Systems Seminar Consultants, Inc.

SSC specializes in SAS software and offers SAS:
• Training Services
• Consulting Services
• Help Desk Plans
• Newsletter subscriptions to The Missing Semicolon™

SAS is a registered trademark of SAS Institute Inc. in the USA and other countries. The Missing Semicolon is a trademark of Systems Seminar Consultants, Inc.


Global Warming Part 1


Abstract

The SAS system is made up of SAS PROCs and the DATA step.

The DATA step has:
• an excellent, full-fledged programming language
• statements to read and write almost any type of data value
• conversion and calculation of new data
• looping
• interfaces
• arrays
• much, much more.

In many ways, the design of the DATA step, along with its powerful statements, is what makes the SAS language so popular.


Abstract (continued)

This paper will address:
• how the DATA step fits with the rest of the SAS System
• DATA step assumptions and defaults
• internal structures such as buffers and the Program Data Vector
• compiler statements
• executable statements


Introduction

The SAS system's origins are in the 1960s and 1970s with:
• James Goodnight
• John Sall
• A. J. Barr
• others

Concepts in the design include:
• "self-defining files"
• a system of default assumptions
• procedures for commonly used routines
• a data handling step that would evolve into the SAS DATA step


Introduction (continued)

The DATA step, in my opinion:
• is extremely simple
• has an elegant design
• continues today
• has had 30 years of enhancements.


Structure of SAS

SAS consists of:
1. a data handling language (DATA step)
2. a library of pre-written procedures (PROC step)

[Diagram with boxes labeled: SAS statements, SAS supervisor, raw data, SAS DATA step, SAS dataset, SAS PROC step, report.]


Purpose of the DATA Step

"Get the data in shape" for later PROCs and DATA steps.
• SAS PROCs can only read SAS datasets.
• We might have some other type of file to process.
• A SAS dataset has a built-in descriptor that keeps track of names and attributes.
• Later steps don't have to remember as many details.

If we don't have well-defined data:
• The DATA step gives us the power to read and write virtually any kind of file.
• We can do calculations and computations on a single row of data.
• It has a very powerful data handling language.
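As a minimal sketch of "getting the data in shape", the step below reads a few raw lines into a SAS dataset, computes a value from a single row, and hands the result to a PROC. The dataset name shaped, the variable profit, and the use of in-stream DATALINES in place of an external file are assumptions made for illustration.

```sas
/* Read raw text into a SAS dataset, then pass it to a PROC. */
data shaped;
   input name $ division $ sales expense;   /* list input: blank-separated values */
   profit = sales - expense;                /* calculation on a single row        */
   datalines;
BETH H 4822.12 982.10
CHRIS H 233.11 94.12
JOHN H 678.43 150.11
;
run;

/* The PROC reads the SAS dataset; it never sees the raw text. */
proc means data=shaped sum mean;
   var sales expense profit;
run;
```

Because the dataset descriptor carries the names and types, the PROC step only has to name the variables it wants.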


SAS DATA Step Overview

DATA steps can read and write most types of data stored on your computer.

[Diagram: the DATA step sits in the middle, labeled "get the data in shape for analysis". The boxes on each side are labeled: other SAS dataset, instream data, raw disk or tape, and DBMS or program products (through SAS/ACCESS). The output side also shows reports.]

Notes:
• DATA step output is usually a SAS dataset but can be other files.
• Access to non-SAS database management systems requires a SAS/ACCESS product.


Default Assumptions

Many assumptions are made to save time and effort.

As a computer scientist, I study and use many programming languages.
• Early on I was intrigued by the cleverness and common sense of SAS.
• Many tedious programming tasks were eliminated through defaults.
• The system still provided a means to override those defaults when necessary.
• Our job as DATA step programmers, then, is different from that in other languages.

In many ways our tasks are:
• understanding the defaults
• knowing how to work with them
• overriding them as necessary.


Default Assumptions (continued)

Examples of SAS defaults:
• Handling compile and execution naming and storage details
• A dataset descriptor that makes SAS datasets "self-defining"
• Generating data set names if omitted
• When reading a data set, assuming the most recently created dataset if none is specified
• Processing all the rows and columns in a file
• Automatically opening and closing files
• Automatically controlling data initialization
• DATA step looping, data set output, and end-of-file checking
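Two of these defaults are easy to see in a short, hedged sketch: a DATA statement with no name, and a PROC with no DATA= option. The variables x and y and the dataset work.copy are hypothetical names chosen for illustration.

```sas
data;                 /* no name given: SAS generates one (DATA1, DATA2, ...) */
   input x y;
   datalines;
1 2
3 4
;
run;

proc print;           /* no DATA= option: the most recently created dataset is used */
run;

data work.copy;       /* overriding the defaults: an explicit output name ...       */
   set _last_;        /* ... an explicit reference to the most recent dataset ...   */
   keep x;            /* ... and only one of the columns                            */
run;
```

The third step shows the other half of the idea: each default can be replaced by an explicit choice when needed.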


Default Assumptions (continued)

Examples of SAS defaults:
• Automatically defining storage areas for each variable referenced, without the need to predefine them
• A default length of 8 is assumed for variables
• An assumption that a variable is numeric if not specified
• LIST input assumes that data values are separated by blanks, rather than requiring exact columns
• Subsetting IF statements, which continue processing if a condition is true and otherwise delete the observation (Ex. if rate > 10;)
• When no comparison is made in an IF statement, a test for true is assumed, that is, any value other than 0 or missing (Ex. if eof then put 'At end';)
• Abbreviated sum statements (Ex. salestot+sales;)
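The defaults listed above can be combined in one small step. This is only a sketch: the dataset name highrate, the variable names, and the three data lines are made up for illustration.

```sas
data highrate;
   input name $ rate sales;     /* list input: name is character (length 8), rate and sales numeric */
   if rate > 10;                /* subsetting IF: rows that fail the test are dropped                */
   salestot + sales;            /* sum statement: salestot starts at 0 and is retained across rows   */
   if salestot then             /* no comparison: true when salestot is nonzero and nonmissing       */
      put name= salestot=;
   datalines;
ANN 12 100
BOB 8 250
CAL 15 300
;
run;
```

No variable is declared ahead of time; each one is defined by its first reference.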


Compiling a DATA Step

As with most languages, the DATA step is first compiled, then executed.
• Languages can be compiled or interpreted; SAS is a hybrid language.
• It has some features from other languages.
• It has some unique features.
• It is sometimes difficult to separate compile-time events from execution-time events with SAS.
• The DATA step compiler examines SAS statements for syntax and data structures.
• It generates an executable program.
• Unique to SAS: the compiler checks for the existence of resources.
• It makes assumptions that it "inserts" into the source code.
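One way to observe the split between compile time and execution time is to write the PDV to the log before any assignment has executed. The sketch below assumes nothing beyond base SAS; the dataset and variable names are hypothetical.

```sas
data compile_demo;
   put 'Before assignment: ' _all_;   /* runs first, yet x and y are already defined in the PDV */
   x = 1;
   length y $ 5;                      /* LENGTH is a declaration handled by the compiler         */
   y = 'abc';
   put 'After assignment:  ' _all_;
run;
```

The first PUT should list x and y with missing values, because the compiler set up every variable before the first statement executed.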


Data Structures

"Getting the data in shape" requires data structures to hold the data as it is processed.
• All computer languages need to address this.
• Each language may name the structures differently.
• There is a lot of similarity in the way most languages store data.


A Typical SAS Job

Raw file (columns realigned to match the INPUT statement):

BETH       H  12  4822.12   982.10
CHRIS      H   2   233.11    94.12
JOHN       H   7   678.43   150.11
1234567890123456789012345678901234

Input buffer

data softsale;
   infile rawin;
   input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34;
run;

Program Data Vector:

   Name:      NAME   DIVISION   YEARS   SALES   EXPENSE   _ERROR_   _N_
   Type:      CHAR   CHAR       NUM     NUM     NUM       NUM       NUM
   Length:    10     1          8       8       8         8         8
   Format:
   Informat:
   Label:
   Flags:                                                 D         D
   Value:

Dataset descriptor portion (disk):

   Name:      NAME   DIVISION   YEARS   SALES   EXPENSE
   Type:      CHAR   CHAR       NUM     NUM     NUM
   Length:    10     1          8       8       8
   Format:
   Informat:
   Label:

Dataset data portion:

   BETH       H   12   4822.12   982.10
   CHRIS      H    2    233.11    94.12
   JOHN       H    7    678.43   150.11


Raw File Buffers

DATA steps reading or writing "raw" (non-SAS) data need memory buffers.
• Buffers are needed to temporarily hold at least one input record at a time.
• At times, multiple lines of input can be held in buffers.
• This allows the program to logically read later rows before earlier ones.
• The buffer contains the complete input or output record, regardless of whether the INPUT statement reads all of the columns.
• SAS datasets and RDBMS tables (which usually appear as SAS datasets) do not use raw buffers, as the files are already in "shape".
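The point that the buffer holds the whole record, even when INPUT reads only part of it, can be sketched with a trailing @ and the _INFILE_ variable. The dataset name buffer_demo and the use of DATALINES in place of the rawin file are assumptions for illustration.

```sas
data buffer_demo;
   infile datalines;
   input @;                          /* load the next record into the input buffer and hold it */
   put 'Buffer holds: ' _infile_;    /* _INFILE_ exposes the complete record                   */
   input name $ 1-10 years 15-16;    /* read only two fields from the held record              */
   datalines;
BETH       H  12  4822.12   982.10
CHRIS      H   2   233.11    94.12
JOHN       H   7   678.43   150.11
;
run;
```

The log shows each full line, while the output dataset keeps only name and years.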


Raw File Buffers

[This slide repeats the diagram from "A Typical SAS Job": raw file, input buffer, program, Program Data Vector, dataset descriptor portion, and dataset data portion.]

The LOGICAL PROGRAM DATA VECTOR (PDV)

A second memory area is needed for data manipulation, conversion, and refinement.

Areas are needed for:
• inputting and input formatting (informatting) desired variables
• revising existing values
• computing new variables
• system indicators and flags

This area is called the Logical Program Data Vector (PDV).
• Many languages have a similar working area (for example, COBOL Working-Storage).
• "Retained" and "non-retained" variables are stored in separate physical program data vectors, which together make up the Logical Program Data Vector.
• All variables referenced are automatically defined in the PDV by the compiler, using characteristics from the first reference to a variable.

Example: agemo=age/12;
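The "first reference" rule can be sketched as follows. The variable code and its values are hypothetical and only illustrate how the first assignment fixes the type and length.

```sas
data first_ref;
   age   = 36;
   agemo = age / 12;     /* first reference is a numeric expression: numeric, length 8   */
   code  = 'AB';         /* first reference is a 2-character literal: character, length 2 */
   code  = 'ABCDEF';     /* the later, longer value is truncated to 'AB'                  */
   put _all_;            /* write the PDV to the log                                      */
run;
```

A LENGTH statement placed before the first assignment is the usual way to override the attributes the compiler would otherwise infer.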


The LOGICAL PROGRAM DATA VECTOR (PDV)

[This slide repeats the diagram from "A Typical SAS Job": raw file, input buffer, program, Program Data Vector, dataset descriptor portion, and dataset data portion.]

The Final SAS Dataset

A "self-defining" dataset.
• The dataset descriptor contains attributes for all kept variables plus data set labeling information.
• The dataset data portion contains data values for all output rows and kept columns.
• Pseudo-variables are not kept.
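The two portions of the dataset can be inspected separately. In this sketch the data set label, the variable label, the formats, and the use of DATALINES in place of the rawin file are assumptions added for illustration.

```sas
data softsale (label='Software sales');      /* hypothetical data set label        */
   input name $ 1-10 division $ 12 years 15-16 sales 19-25 expense 27-34;
   label  sales = 'Total sales';             /* stored in the descriptor           */
   format sales expense dollar10.2;          /* stored in the descriptor           */
   datalines;
BETH       H  12  4822.12   982.10
CHRIS      H   2   233.11    94.12
JOHN       H   7   678.43   150.11
;
run;

proc contents data=softsale;                 /* prints the descriptor portion      */
run;

proc print data=softsale label;              /* prints the data portion            */
run;
```

Neither listing includes _N_ or _ERROR_, since pseudo-variables are not written to the dataset.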


The Final SAS Dataset

[This slide repeats the diagram from "A Typical SAS Job": raw file, input buffer, program, Program Data Vector, dataset descriptor portion, and dataset data portion.]

Variable Attributes

The compiler assigns each defined variable several characteristics:
• relative variable number
• position in the dataset
• name
• data type
• length in bytes
• informat
• format
• variable label
• flags to indicate dropping, retaining, and handling of missing values for each variable


Data Types and Conversion

All SAS values are one of two data types: numeric or character.
• Hundreds of different data types can be read or written via the PDV, but only two are stored.
• In the PDV and in SAS datasets, every character value is stored as a native EBCDIC or ASCII value with a length between 1 and 32,767 bytes.
• Numerics are stored as double-precision floating-point values with a length between 3 and 8 bytes.
• Storing only two data types greatly simplifies things for SAS datasets.
• It moves the complication of converting different data types (packed, binary, etc.) to the INPUT and PUT statements, along with the appropriate informats and formats.
• Floating point with a length of 8 allows storage of very large (or very small) numbers without overflow.
• Floating point does have minor mathematical issues of its own.
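A short sketch of both ideas: informats convert external representations into the numeric type, and floating-point arithmetic may not compare exactly. The variable names and the single data line are hypothetical, and the comparison result can depend on the host's floating-point representation, so the step reports what it finds instead of assuming an outcome.

```sas
data types_demo;
   input amount comma10. +1 when mmddyy10.;   /* informats turn text into numeric values    */
   format amount dollar12.2 when date9.;      /* formats control how the numbers are shown  */
   tenth = 0.1 + 0.2;
   if tenth ne 0.3 then put 'NOTE: 0.1 + 0.2 did not compare equal to 0.3';
   else put 'NOTE: 0.1 + 0.2 compared equal to 0.3 on this host';
   if round(tenth, 0.01) = 0.3 then put 'NOTE: the rounded value compares equal to 0.3';
   put amount= when=;
   datalines;
1,234.50   03/17/2018
;
run;
```

Rounding before comparing is a common way to sidestep small representation differences.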


Data Types and Conversion

[This slide repeats the diagram from "A Typical SAS Job": raw file, input buffer, program, Program Data Vector, dataset descriptor portion, and dataset data portion.]

Pseudo Variables

Special variables that the compiler creates and that are not added to the output file.

Partial list:
• _N_ contains the number of times the DATA step has looped.
• _ERROR_ is set to 0 if there were no input errors, otherwise 1.
• FIRST.variable shows the beginning of a control break.
• LAST.variable shows the end of a control break.
• _INFILE_ contains the entire input buffer.

Others can be requested by the programmer to:
• detect end of file
• access control blocks
• access and alter system information
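Several of these pseudo-variables appear together in a typical control-break step. The sketch below is an illustration only; the dataset names, the variable divsales, and the flag name lastrow are hypothetical.

```sas
data sales;
   input division $ sales;
   datalines;
H 4822.12
H 233.11
S 301.21
;
run;

proc sort data=sales;
   by division;                        /* BY processing needs sorted input              */
run;

data divtotal;
   set sales end=lastrow;              /* lastrow: requested end-of-file indicator      */
   by division;                        /* creates FIRST.division and LAST.division      */
   if first.division then divsales = 0;
   divsales + sales;                   /* running total within the division             */
   if last.division then output;       /* one row per division                          */
   if lastrow then put 'Rows read: ' _n_;
   drop sales;
run;
```

The output dataset contains only division and divsales; FIRST.division, LAST.division, lastrow, and _N_ exist only while the step runs.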


Pseudo Variables

[This slide repeats the diagram from "A Typical SAS Job": raw file, input buffer, program, Program Data Vector, dataset descriptor portion, and dataset data portion.]

Output Datasets

Output is usually a SAS data file, but raw files and reports can also be created.
• SAS data sets are "self-defining" via the dataset descriptor.
• This is a much simpler operation for the programmer.
• SAS automatically builds the structures needed.
• The record is output at DATA step return.
• In addition, all variables except dropped and pseudo variables are kept.
• Descriptor information can be printed with PROC CONTENTS or dictionary tables.

Notes:
• If a raw file or report is produced, buffers opposite those of INFILE are used.
• Raw files can pass information to other programs.
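Writing a raw file mirrors reading one: FILE and PUT take the place of INFILE and INPUT. In this sketch the fileref rawout, its TEMP device, and the output column positions are assumptions chosen for illustration.

```sas
filename rawout temp;               /* hypothetical fileref pointing at a scratch file  */

data _null_;                        /* _NULL_: run the step without building a dataset  */
   input name $ 1-10 sales 19-25;
   file rawout;                     /* output counterpart of INFILE                     */
   put @1 name $10. @12 sales 8.2;  /* output counterpart of INPUT                      */
   datalines;
BETH       H  12  4822.12   982.10
CHRIS      H   2   233.11    94.12
JOHN       H   7   678.43   150.11
;
run;
```

Replacing rawout with the reserved fileref PRINT would send the same lines to the report output instead.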


Questions About The Data Step

If traditional programming experience is applied to this DATA step, questions might be:
• Where are the opens and closes?
• Where do we write out records?
• What is looping?
• When does the program stop?


Assumptions Made In The Data Step

The default actions of the DATA step easily accommodate most programs.
• All rows are read from files, starting with the first record and continuing until the last record.
• Input values from a previous row are cleared before reading the next row.
• All variables referenced will be included in the output file.
• All records will be included in the resulting output file.
• All files should be opened at the beginning and closed at the end of the step.
• Programs should not continue to loop if no data was read in the previous pass.


Assumptions Made In The Data Step

The SAS compiler makes the following assumptions and inserts code to act on them:
• Immediately upon entry, check for infinite looping.
• All values from non-SAS files are cleared before executing any statements.
• If any reading statement would read a record after end of file, the step stops.
• If the program reaches the last statement in the step, or if a RETURN statement is executed, the current PDV contents (all columns for each row) are output to the SAS data set being built.
• A branch is executed to go to the top and enter the DATA step for another pass.


Assumptions Made In The Data Step

Another view of it is that the compiler inserts the lines marked "(inserted)" below, which appear in blue italics on the original slide.

data softsale;
   check for looping, initialize PDV          (inserted)
   infile rawin;
   if at EOF then stop                        (inserted)
   input Name $1-10 Division $12
         Years 15-16 Sales 19-25
         Expense 28-34 State $36-37;
   output to SAS dataset                      (inserted)
   go to top of DATA step                     (inserted)
run;

Notes:
• These save effort and make DATA steps virtually infinite-loop proof.
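Two of the implied actions have explicit statement forms, OUTPUT and RETURN, so the hidden loop can be partly written out by hand. This is a sketch under assumptions: DATALINES stands in for the rawin file, and the PUT line is added only to trace each pass in the log.

```sas
data softsale;
   infile datalines;
   input Name $ 1-10 Division $ 12 Years 15-16
         Sales 19-25 Expense 28-34 State $ 36-37;
                                   /* the step stops here when INPUT meets end of file   */
   put 'Pass ' _n_ 'read ' Name=;  /* trace: one log line per pass through the step      */
   output;                         /* explicit OUTPUT: replaces the automatic output     */
   return;                         /* explicit RETURN: back to the top for the next pass */
   datalines;
CHRIS      H   2   233.11    94.12 WI
MARK       H   5   298.12    52.65 WI
SARAH      S   6   301.21    65.17 MN
;
run;
```

Removing the OUTPUT and RETURN statements should produce the same dataset, which is the point of the defaults.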


Executing the DATA StepFileref rawin

1 2 3data softsale;

init buffer, PDVinfile rawin; if at EOF then stop

End

1 2 31234567890123456789012345678901234567CHRIS H 2 233.11 94.12 WIMARK H 5 298.12 52.65 WISARAH S 6 301.21 65.17 MNif at EOF then stop

input Name $1-10 Division $12 Years 15-16Sales 19-25 Expense 28-34 State $36-37;

output to SAS Datasetgoto top of Data Stepgoto top of Data Step

run; 1 2 3 4 81234567890123456789012345678901234567890...0

Input buffer

Name Division Years Sales Expense StateLogicalProgram Data Vector

(Data)

(Descriptor)Dataset work.softsale

Name Division Years Sales Expense State

Understanding the SAS® DATA Step and the Program Data Vector 33

(Data)

Page 34: Understanding the SAS® DATA Step and the Program … presentation was written by Systems Seminar Consultants IncThis presentation was written by Systems Seminar ... DBMS, program

Executing the DATA StepFileref rawin

1 2 3data softsale;

init buffer, PDVinfile rawin; if at EOF then stop

End

1 2 31234567890123456789012345678901234567CHRIS H 2 233.11 94.12 WIMARK H 5 298.12 52.65 WISARAH S 6 301.21 65.17 MNif at EOF then stop

input Name $1-10 Division $12 Years 15-16Sales 19-25 Expense 28-34 State $36-37;

output to SAS Datasetgoto top of Data Stepgoto top of Data Step

run; 1 2 3 4 81234567890123456789012345678901234567890...0

Input buffer

Name Division Years Sales Expense StateLogicalProgram Data Vector

(Data)

(Descriptor)Dataset work.softsale

Name Division Years Sales Expense State

Understanding the SAS® DATA Step and the Program Data Vector 34

(Data)

Page 35: Understanding the SAS® DATA Step and the Program … presentation was written by Systems Seminar Consultants IncThis presentation was written by Systems Seminar ... DBMS, program

Executing the DATA StepFileref rawin

1 2 3data softsale;init buffer, PDVinfile rawin; if at EOF then stop

End

1 2 31234567890123456789012345678901234567CHRIS H 2 233.11 94.12 WIMARK H 5 298.12 52.65 WISARAH S 6 301.21 65.17 MNif at EOF then stop

input Name $1-10 Division $12 Years 15-16Sales 19-25 Expense 28-34 State $36-37;

output to SAS Datasetgoto top of Data Stepgoto top of Data Step

run; 1 2 3 4 81234567890123456789012345678901234567890...0

Input buffer

Name Division Years Sales Expense StateLogicalProgram Data Vector . . .

(Data)

(Descriptor)Dataset work.softsale

Name Division Years Sales Expense State

Understanding the SAS® DATA Step and the Program Data Vector 35

(Data)

Page 36: Understanding the SAS® DATA Step and the Program … presentation was written by Systems Seminar Consultants IncThis presentation was written by Systems Seminar ... DBMS, program

Executing the DATA StepFileref rawin

1 2 3data softsale;init buffer, PDVinfile rawin;if at EOF then stop

End

1 2 31234567890123456789012345678901234567CHRIS H 2 233.11 94.12 WIMARK H 5 298.12 52.65 WISARAH S 6 301.21 65.17 MNif at EOF then stop

input Name $1-10 Division $12 Years 15-16Sales 19-25 Expense 28-34 State $36-37;

output to SAS Datasetgoto top of Data Stepgoto top of Data Step

run; 1 2 3 4 81234567890123456789012345678901234567890...0

Input buffer

Name Division Years Sales Expense StateLogicalProgram Data Vector . . .

(Data)

(Descriptor)Dataset work.softsale

Name Division Years Sales Expense State

Understanding the SAS® DATA Step and the Program Data Vector 36

(Data)

Page 37: Understanding the SAS® DATA Step and the Program … presentation was written by Systems Seminar Consultants IncThis presentation was written by Systems Seminar ... DBMS, program

Executing the DATA StepFileref rawin

1 2 3data softsale;init buffer, PDVinfile rawin; if at EOF then stop

End

1 2 31234567890123456789012345678901234567CHRIS H 2 233.11 94.12 WIMARK H 5 298.12 52.65 WISARAH S 6 301.21 65.17 MNif at EOF then stop

input Name $1-10 Division $12 Years 15-16Sales 19-25 Expense 28-34 State $36-37;

output to SAS Datasetgoto top of Data Stepgoto top of Data Step

run; 1 2 3 4 81234567890123456789012345678901234567890...0

Input buffer

Name Division Years Sales Expense StateLogicalProgram Data Vector . . .

(Data)

(Descriptor)Dataset work.softsale

Name Division Years Sales Expense State

Understanding the SAS® DATA Step and the Program Data Vector 37

(Data)

Page 38: Understanding the SAS® DATA Step and the Program … presentation was written by Systems Seminar Consultants IncThis presentation was written by Systems Seminar ... DBMS, program

Executing the DATA StepFileref rawin

1 2 3data softsale;init buffer, PDVinfile rawin; if at EOF then stop

End

1 2 31234567890123456789012345678901234567CHRIS H 2 233.11 94.12 WIMARK H 5 298.12 52.65 WISARAH S 6 301.21 65.17 MNif at EOF then stop

input Name $1-10 Division $12 Years 15-16Sales 19-25 Expense 28-34 State $36-37;

output to SAS Datasetgoto top of Data Stepgoto top of Data Step

run; 1 2 3 4 81234567890123456789012345678901234567890...0

Input buffer CHRIS H 2 233.11 94.12 WI

Name Division Years Sales Expense StateLogicalProgram Data Vector . . .

(Data)

(Descriptor)Dataset work.softsale

Name Division Years Sales Expense State

Understanding the SAS® DATA Step and the Program Data Vector 38

(Data)

Page 39: Understanding the SAS® DATA Step and the Program … presentation was written by Systems Seminar Consultants IncThis presentation was written by Systems Seminar ... DBMS, program

Executing the DATA StepFileref rawin

1 2 3data softsale;init buffer, PDVinfile rawin; if at EOF then stop

End

1 2 31234567890123456789012345678901234567CHRIS H 2 233.11 94.12 WIMARK H 5 298.12 52.65 WISARAH S 6 301.21 65.17 MNif at EOF then stop

input Name $1-10 Division $12 Years 15-16Sales 19-25 Expense 28-34 State $36-37;

output to SAS Datasetgoto top of Data Stepgoto top of Data Step

run; 1 2 3 4 81234567890123456789012345678901234567890...0

Input buffer CHRIS H 2 233.11 94.12 WI

Name Division Years Sales Expense StateLogicalProgram Data Vector . . .CHRIS

(Data)

(Descriptor)Dataset work.softsale

Name Division Years Sales Expense State

Understanding the SAS® DATA Step and the Program Data Vector 39

(Data)

Executing the DATA Step

Input buffer: CHRIS H 2 233.11 94.12 WI

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
CHRIS     H         .      .       .

Dataset work.softsale: (no observations yet)

Executing the DATA Step

Input buffer: CHRIS H 2 233.11 94.12 WI

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      .       .

Dataset work.softsale: (no observations yet)

Executing the DATA Step

Input buffer: CHRIS H 2 233.11 94.12 WI

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  .

Dataset work.softsale: (no observations yet)

Executing the DATA Step

Input buffer: CHRIS H 2 233.11 94.12 WI

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  94.12

Dataset work.softsale: (no observations yet)

Executing the DATA Step

Input buffer: CHRIS H 2 233.11 94.12 WI

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  94.12    WI

Dataset work.softsale: (no observations yet)

Executing the DATA Step

Input buffer: CHRIS H 2 233.11 94.12 WI

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  94.12    WI

Dataset work.softsale:
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  94.12    WI


Executing the DATA Step

Input buffer: (empty)

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
                    .      .       .

Dataset work.softsale:
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  94.12    WI


Executing the DATA Step

Input buffer: MARK H 5 298.12 52.65 WI

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
                    .      .       .

Dataset work.softsale:
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  94.12    WI

Executing the DATA Step

Each row is handled the same way. Let's skip through MARK's input step.

Input buffer: MARK H 5 298.12 52.65 WI

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
                    .      .       .

Dataset work.softsale:
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  94.12    WI

Executing the DATA Step

Input buffer: MARK H 5 298.12 52.65 WI

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
MARK      H         5      298.12  52.65    WI

Dataset work.softsale:
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  94.12    WI

Executing the DATA Step

Input buffer: MARK H 5 298.12 52.65 WI

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
MARK      H         5      298.12  52.65    WI

Dataset work.softsale:
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  94.12    WI
MARK      H         5      298.12  52.65    WI


Executing the DATA Step

Input buffer: (empty)

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
                    .      .       .

Dataset work.softsale:
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  94.12    WI
MARK      H         5      298.12  52.65    WI


Executing the DATA Step

Input buffer: SARAH S 6 301.21 65.17 MN

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
                    .      .       .

Dataset work.softsale:
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  94.12    WI
MARK      H         5      298.12  52.65    WI

Executing the DATA Step

Each row is handled the same way. Let's skip through SARAH's input step.

Input buffer: SARAH S 6 301.21 65.17 MN

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
                    .      .       .

Dataset work.softsale:
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  94.12    WI
MARK      H         5      298.12  52.65    WI

Executing the DATA Step

Input buffer: SARAH S 6 301.21 65.17 MN

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
SARAH     S         6      301.21  65.17    MN

Dataset work.softsale:
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  94.12    WI
MARK      H         5      298.12  52.65    WI

Executing the DATA Step

Input buffer: SARAH S 6 301.21 65.17 MN

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
SARAH     S         6      301.21  65.17    MN

Dataset work.softsale:
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  94.12    WI
MARK      H         5      298.12  52.65    WI
SARAH     S         6      301.21  65.17    MN


Executing the DATA Step

Input buffer: (empty)

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
                    .      .       .

Dataset work.softsale:
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  94.12    WI
MARK      H         5      298.12  52.65    WI
SARAH     S         6      301.21  65.17    MN


Executing the DATA Step

We're done! Go to the next step.

Input buffer: (empty)

Program Data Vector (logical):
Name      Division  Years  Sales   Expense  State
                    .      .       .

Dataset work.softsale:
Name      Division  Years  Sales   Expense  State
CHRIS     H         2      233.11  94.12    WI
MARK      H         5      298.12  52.65    WI
SARAH     S         6      301.21  65.17    MN
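To watch this cycle in a real SAS log, the walk-through can be approximated with the sketch below. It is not the presenter's program: it assumes the three sample records are supplied in-stream with DATALINES instead of through the rawin fileref, that the data lines are laid out in the columns the INPUT statement names, and it adds two PUT _ALL_ statements purely for tracing.

```sas
/* Sketch only: DATALINES stands in for the rawin fileref (an assumption). */
data softsale;
   infile datalines;
   put 'Top of pass:  ' _all_;   /* PDV just after it is reset for this pass   */
   input Name $ 1-10 Division $ 12 Years 15-16
         Sales 19-25 Expense 28-34 State $ 36-37;
   put 'After INPUT:  ' _all_;   /* PDV after INPUT has moved the fields in    */
   /* implied OUTPUT, then return to the top of the step */
datalines;
CHRIS      H   2   233.11    94.12 WI
MARK       H   5   298.12    52.65 WI
SARAH      S   6   301.21    65.17 MN
;
run;
```

The log lines written by PUT _ALL_ also include the automatic variables _ERROR_ and _N_, which live in the PDV but are not written to work.softsale.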

Overriding Data Step Assumptions

Not all programs fit the scenario above. Can we alter the behavior of the program?

• The FIRSTOBS= and OBS= system options begin processing logically after the first record and end it before the last record, respectively.
• The RETAIN compiler statement instructs the step not to initialize variables.
• The STOP statement (usually used with an IF statement) stops the step early.
• The DELETE and subsetting IF statements exit the current pass of the DATA step early and never reach OUTPUT.
• RETURN exits the current pass of the DATA step early but does output to the SAS file.
• DROP and KEEP statements and dataset options exclude or include columns.
• GOTO and various DO groups alter the looping path of the program.
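A few of these overrides can be combined in one small step. The sketch below is illustrative only: the dataset name softsale_sub is hypothetical, the records are supplied with DATALINES rather than a fileref, and FIRSTOBS= and OBS= are written as INFILE statement options rather than as the system options mentioned above.

```sas
/* Sketch only: softsale_sub is a hypothetical dataset name. */
data softsale_sub (keep=Name Sales);       /* KEEP= dataset option: write only these columns */
   infile datalines firstobs=2 obs=3;      /* start at record 2, stop after record 3         */
   input Name $ 1-10 Division $ 12 Years 15-16
         Sales 19-25 Expense 28-34 State $ 36-37;
   if State = 'WI' then delete;            /* DELETE: leave this pass without OUTPUT         */
datalines;
CHRIS      H   2   233.11    94.12 WI
MARK       H   5   298.12    52.65 WI
SARAH      S   6   301.21    65.17 MN
;
run;
```

Each option changes a different default: which records are read, which passes reach the implied OUTPUT, and which PDV variables are copied to the output dataset.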

Retaining Data Values

Sometimes variables should not be cleared upon exit/reentry.

The RETAIN compiler statement:
• Specifies that the listed variables should not be initialized
• Can also set an initial value
• If it is the first reference to a variable, sets that variable's compiler attributes
• Is essentially a flag set by the compiler that tells execution not to clear the variable
• Can be coded anywhere in the DATA step
• Variables read with SET, MERGE, UPDATE, or MODIFY are retained automatically.
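As a minimal sketch of the difference RETAIN makes, the step below keeps a running maximum in a retained variable and, for contrast, tries to accumulate in a variable that is not retained. The dataset name running and the variables MaxSales and NoRetain are hypothetical names chosen for illustration.

```sas
/* Sketch only: running, MaxSales and NoRetain are hypothetical names. */
data running;
   retain MaxSales 0;                    /* not reset at the top of each pass; starts at 0 */
   input Name $ Sales;
   if Sales > MaxSales then MaxSales = Sales;
   NoRetain = sum(NoRetain, Sales);      /* NoRetain is missing at the top of every pass,  */
                                         /* so it never accumulates across rows            */
   put _all_;                            /* trace the PDV in the log                       */
datalines;
CHRIS 233.11
SARAH 301.21
MARK 298.12
;
run;
```

Because MaxSales carries its value from one pass to the next, it can compare each new row against earlier rows; NoRetain cannot, since it is set back to missing before each INPUT.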

Retaining Data Values

Example: Read a file with CITY in the first row, RETAIN the value.

Raw file (fileref rawin), with column ruler:

MADISON
BETH H 12 4822.12 982.10
CHRIS H 2 233.11 94.12
JOHN H 7 678.43 150.11
1234567890123456789012345678901234

data softsale;
   infile rawin missover;
   if _N_ = 1 then do;
      input city $2-8;
      delete;
   end;
   input name $1-10 division $12 years 15-16
         sales 19-25 expense 27-34;
   retain city;
run;

PROGRAM DATA VECTOR
Name:      NAME  DIVISION  YEARS  SALES  EXPENSE  CITY     _ERROR_  _N_
Type:      CHAR  CHAR      NUM    NUM    NUM      CHAR     NUM      NUM
Length:    10    1         8      8      8        7        8        8
Format:
Informat:
Label:
Flags:                                            R        D        D
Value:                                            MADISON

DATASET DESCRIPTOR PORTION (DISK)
Name:      NAME  DIVISION  YEARS  SALES  EXPENSE  CITY
Type:      CHAR  CHAR      NUM    NUM    NUM      CHAR
Length:    10    1         8      8      8        7
Format:
Informat:
Label:

DATASET DATA PORTION
BETH   H  12  4822.12  982.10  MADISON
CHRIS  H   2   233.11   94.12  MADISON
JOHN   H   7   678.43  150.11  MADISON

Stopping Early

Normally jobs run until EOF; OBS= or STOP can stop the step early.

Example: Stop after 50 records

data softsale;
   infile rawin;
   if _N_ > 50 then stop;
   input name $1-10 division $12 years 15-16
         sales 19-25 expense 27-34;
run;
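The two ways of stopping early can be compared side by side on a small in-stream file. This is a sketch under assumptions: the dataset names first2_stop and first2_obs are hypothetical, the limit is 2 rather than 50 so that three sample records are enough, and OBS= is written as an INFILE option.

```sas
/* Sketch only: both steps are intended to read just the first two records. */
data first2_stop;
   infile datalines;
   if _N_ > 2 then stop;            /* third pass: stop before INPUT runs again */
   input Name $ 1-10 Sales 19-25;
datalines;
CHRIS      H   2   233.11    94.12 WI
MARK       H   5   298.12    52.65 WI
SARAH      S   6   301.21    65.17 MN
;
run;

data first2_obs;
   infile datalines obs=2;          /* treat record 2 as the last record */
   input Name $ 1-10 Sales 19-25;
datalines;
CHRIS      H   2   233.11    94.12 WI
MARK       H   5   298.12    52.65 WI
SARAH      S   6   301.21    65.17 MN
;
run;
```

The test uses > rather than >= because the check runs at the top of the pass, before INPUT: with >= the step would stop one record short of the limit.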

Stopping Late

Sometimes a program should continue even if the last record was read.

Example: Calculate a percentage of total

DATA PCTDS;
   IF _N_ = 1 THEN
      DO UNTIL(EOF);
         SET CONCAT(KEEP=NAME SALES) END=EOF;
         TOTSALES+SALES;
      END;
   SET CONCAT(KEEP=NAME SALES);
   SALESPCT=(SALES*100)/TOTSALES;
RUN;

CONCAT DATASET

OBS  NAME      YEARS  SALES
1    BETH      12     4822.12
2    CHRIS     2      233.11
3    JOHN      7      678.43
4    MARK      5      298.12
5    ANDREW    24     1762.11
6    BENJAMIN  3      201.11
7    JANET     1      98.11

PDV: Name  Sales  TOTSALES  SALESPCT
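To try this example without an existing CONCAT dataset, the sketch below first builds it from the seven rows listed on the slide (the in-stream DATALINES step and the closing PROC PRINT are additions for illustration, not part of the original program). The seven Sales values sum to 8093.11, which is the value TOTSALES should hold once the DO UNTIL loop finishes on the first pass.

```sas
/* Sketch only: builds CONCAT in-stream so the two-pass step can be run as is. */
data concat;
   input Name $ Years Sales;
datalines;
BETH 12 4822.12
CHRIS 2 233.11
JOHN 7 678.43
MARK 5 298.12
ANDREW 24 1762.11
BENJAMIN 3 201.11
JANET 1 98.11
;
run;

data pctds;
   if _N_ = 1 then
      do until (eof);                          /* first pass only: read every row once */
         set concat (keep=Name Sales) end=eof;
         TotSales + Sales;                     /* sum statement: TotSales is retained  */
      end;
   set concat (keep=Name Sales);               /* second, independent read of CONCAT   */
   SalesPct = (Sales * 100) / TotSales;
run;

proc print data=pctds;
run;
```

The step does not end when the first SET reaches end of file because each SET statement keeps its own position in the dataset; the step ends only when the second SET tries to read past the last row.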

Outputting And Deleting Observations

Default is to output the current row if the DATA step reaches the implied OUTPUT.

Statements that control which rows are output:
• DELETE (usually after an IF) leaves the current iteration without OUTPUT.
• False cases of Subsetting IF statements exit without OUTPUTting.
• RETURN ends the current iteration early but still does the implied OUTPUT.
• Explicit OUTPUT statement OUTPUTs when executed, but then there is no implied OUTPUT at bottom.
• You may prefer positive statements such as Subsetting IF, OUTPUT etc.
• You might use a negative statement such as DELETE to filter unwanted rows.

Note:
• WHERE also filters rows, but in the engine, not in the DATA step.
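The three styles listed above can be compared side by side. The sketch below uses a small made-up dataset, `sales_in`, and hypothetical output names; each of the three steps should keep only the BETH row.

```sas
/* Hypothetical input: three rows with a SALES value. */
data sales_in;
   input name $ sales;
   datalines;
BETH 4822.12
CHRIS 233.11
JOHN 678.43
;
run;

/* Negative form: DELETE drops the row and returns to the top of the step. */
data big_delete;
   set sales_in;
   if sales <= 4000 then delete;
run;

/* Positive form: a subsetting IF continues only when the condition is true. */
data big_if;
   set sales_in;
   if sales > 4000;
run;

/* Explicit OUTPUT: rows are written only where OUTPUT executes, */
/* because the implied OUTPUT at the bottom is switched off.     */
data big_output;
   set sales_in;
   if sales > 4000 then output;
run;
```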

Outputting And Deleting Observations

Example: Only output records with more than 4000 in Sales.

data softsale;             /* if no input last time thru then stop; initialize PDV */
infile rawin;
input Name $1-10           /* if at EOF then stop */
      Division $12
      Years 15-16
      Sales 19-25
      Expense 28-34
      State $36-37;
if sales > 4000;           /* if sales not gt 4000 then go to top of DATA step */
run;                       /* output to SAS dataset; go to top of DATA step */

Note:
• WHERE also filters rows in engine, not in DATA step.
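As a rough illustration of the WHERE note: WHERE works on SAS datasets rather than on raw records read with INPUT, so the sketch below filters on a SET. The dataset names `allsales` and `bigsales` are made up for this example.

```sas
/* Hypothetical unfiltered dataset. */
data allsales;
   input name $ sales;
   datalines;
BETH 4822.12
CHRIS 233.11
JOHN 678.43
;
run;

/* WHERE= is applied as rows are read, so rows that fail the */
/* test are never loaded into the PDV of this step.          */
data bigsales;
   set allsales(where=(sales > 4000));
run;
```

A subsetting IF in the same position would give the same rows here, but every row would first be read into the PDV and then tested.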

Losing the Crowd

Accumulating Totals in a DATA Step

We should be able to write formulas to count or sum across observations.

In a DATA step, read the following dataset and do the following:
• Count all employees and print running counts.
• Sum hours and print running sums.

fileref rawin

JOE    40
PETE   20
STEVE   .
TOM    35

Accumulating Totals in a DATA Step (continued)

data timecard;
infile rawin;
input Name $ Hours;
Ktr=ktr+1;
Hourstot=hourstot+hours;
run;

proc print data=timecard;
title 'SOFTCO PAYROLL';
run;

SOFTCO PAYROLL

OBS  Name   Hours  Ktr  Hourstot
1    JOE    40     .    .
2    PETE   20     .    .
3    STEVE  .      .    .
4    TOM    35     .    .

Question: Why didn't the above program work correctly?

Accumulating Totals in a DATA Step (continued)

Adding values across records has three problems:
1. Variables read by INPUT or created by assignment are reset to missing at the start of every iteration.
2. Totals normally need to start at 0, not missing.
3. When a missing value is added to any value, the result is a missing value.

The SUM statement is a special assignment statement to add values across observations.

SYNTAX:

variable+expression;

Notes:
• SUM variables start at 0, not missing.
• SUM variables are retained from pass to pass.
• Any missing values encountered are ignored.
• RETAIN and the SUM function could also be used.
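Problem 3 and the "missing values are ignored" note can be checked with a tiny step. This is only an illustrative sketch; the variable names `a` and `b` are arbitrary.

```sas
data _null_;
   a = 60 + .;        /* plain addition: the missing value propagates */
   b = sum(60, .);    /* SUM function: the missing argument is ignored */
   put a= b=;         /* writes a=. b=60 to the log */
run;
```

The SUM statement treats a missing expression the same way the SUM function does, which is why a total survives a record with missing hours.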

An Accumulation Solution

data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;

proc print data=timecard;
title 'SOFTCO PAYROLL';
run;

fileref rawin

JOE    40
PETE   20
STEVE   .
TOM    35

PDV layout, with flags set at compile time:

        Name   Hours  Ktr  Hourstot
flags                 RM   RM

The slides step through the program one statement at a time. At the top of each iteration Name and Hours are reset to missing, while Ktr and Hourstot keep their values. PDV contents after each step:

Step                              Name   Hours  Ktr  Hourstot
Compile time                             .      0    0
Iteration 1 begins                       .      0    0
input Name $ Hours;               JOE    40     0    0
Ktr+1;            (0 + 1)         JOE    40     1    0
Hourstot+hours;   (0 + 40)        JOE    40     1    40
run;   row 1 written              JOE    40     1    40
Iteration 2 begins                       .      1    40
input Name $ Hours;               PETE   20     1    40
Ktr+1;            (1 + 1)         PETE   20     2    40
Hourstot+hours;   (40 + 20)       PETE   20     2    60
run;   row 2 written              PETE   20     2    60
Iteration 3 begins                       .      2    60
input Name $ Hours;               STEVE  .      2    60
Ktr+1;            (2 + 1)         STEVE  .      3    60
Hourstot+hours;   (60 + .)        STEVE  .      3    60
run;   row 3 written              STEVE  .      3    60
Iteration 4 begins                       .      3    60
input Name $ Hours;               TOM    35     3    60
Ktr+1;            (3 + 1)         TOM    35     4    60
Hourstot+hours;   (60 + 35)       TOM    35     4    95
run;   row 4 written              TOM    35     4    95

Final PROC PRINT output:

SOFTCO PAYROLL

Obs  Name   Hours  Ktr  Hourstot
1    JOE    40     1    40
2    PETE   20     2    60
3    STEVE  .      3    60
4    TOM    35     4    95

Another Accumulation Solution

RETAIN and the SUM function can also accumulate correctly.

data timecard;
infile rawin;
input Name $ Hours;
ktr=ktr+1;
hourstot=sum(hourstot,hours);
retain Ktr Hourstot 0;
run;

proc print data=timecard;
title 'SOFTCO PAYROLL';
run;

Compiler Instructions

A series of statements that instruct the compiler to alter attributes.
• In general, declarative statements
• Can be coded in any order
• Allow the program to be very explicit in the definition of SAS structures.

Examples of these statements are:

• LENGTH statement to set a variable's internal length
• INFORMAT to set input format
• FORMAT to set output display format
• LABEL to define a variable label
• ATTRIB to define any or all of the above in one statement
• DROP to indicate which variables are to be left out of the SAS file
• KEEP to indicate which variables to include on the SAS file
• RETAIN to set initial values and instruct SAS never to reset the variable to missing
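To make the list concrete, here is a hedged sketch that sets several attributes with one ATTRIB statement and limits the output with KEEP. The label texts are invented for illustration, and the records are the sample rows used elsewhere in the deck, read in-stream so the step runs on its own.

```sas
data softsale;
   infile datalines;
   /* ATTRIB is placed before INPUT so the length of NAME is fixed */
   /* before the variable is first seen by the compiler.           */
   attrib name  length=$20      label='Employee name'
          sales format=comma10.2 label='Total sales';
   input name $1-10 division $12 years 15-16 sales 19-25
         expense 27-34;
   keep name division sales;    /* only these reach the output dataset */
   datalines;
BETH       H  12  4822.12   982.10
CHRIS      H   2   233.11    94.12
JOHN       H   7   678.43   150.11
;
run;

proc contents data=softsale;    /* lists the resulting descriptor */
run;
```

One caveat on "any order": a length has to be declared before the variable's first use, otherwise the first reference (here the `$1-10` column range) decides it.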

Compiler Statements to Alter Variable Attributes

data softsale;
infile rawin;
length name $20;
attrib division length=$2;
format sales comma10.2;
input name $1-10 division $12 years 15-16 sales 19-25
      expense 27-34;
drop sales years;
run;

input buffer

1234567890123456789012345678901234
BETH       H  12  4822.12   982.10
CHRIS      H   2   233.11    94.12
JOHN       H   7   678.43   150.11

PROGRAM DATA VECTOR

Name:      NAME  DIVISION  YEARS  SALES   EXPENSE  _ERROR_  _N_
Type:      CHAR  CHAR      NUM    NUM     NUM      NUM      NUM
Length:    20    2         8      8       8        8        8
Format:                           comma.
Informat:
Label:
Flags:                     D      D
Value:

DATASET DESCRIPTOR PORTION (DISK)

Name:      NAME  DIVISION  EXPENSE
Type:      CHAR  CHAR      NUM
Length:    20    2         8
Format:
Informat:
Label:

DATASET DATA PORTION

BETH   H  982.10
CHRIS  H   94.12
JOHN   H  150.11

Debugging Features

There are a number of debugging features in the DATA step.

• Excellent interactive DATA step debugger available.
• Simple DATA step statements that display data.
• The LIST statement can display the input buffer from the most recent INPUT.
• FILE LOG with PUT can display any text and variable from the PDV in the SAS log.
• The PUTLOG statement can also display text to the SAS log.
• PROC CONTENTS and PROC PRINT/REPORT can be used to display the dataset descriptor and data values of the final dataset.

Notes:
• Putting LIST and PUT/PUTLOG statements at strategic points in the DATA step may be the simplest and best way to debug.
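A small sketch of LIST and PUTLOG working together is shown below. The threshold of 500 and the message text are arbitrary choices for illustration; the records are the deck's sample rows, read in-stream.

```sas
data softsale;
   infile datalines;
   input name $1-10 division $12 years 15-16 sales 19-25
         expense 27-34;
   if sales < 500 then
      do;
         list;                                  /* write the current input record to the log */
         putlog 'low sales: ' _n_= name= sales=; /* write selected PDV values to the log */
      end;
   datalines;
BETH       H  12  4822.12   982.10
CHRIS      H   2   233.11    94.12
JOHN       H   7   678.43   150.11
;
run;
```

Only the CHRIS record falls under the threshold, so the log should show that one record plus one PUTLOG line.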

A Debugging Example

Example: The program below selects 0 records. Which statement is causing the problem?

1234567890123456789012345678901234
BETH       H  12  4822.12   982.10
CHRIS      H   2   233.11    94.12
JOHN       H   7   678.43   150.11

data softsale;
infile rawin missover;
input name $1-10 division $12 years 15-16 sales 19-25
      expense 27-34;
if sales > 4000;
if division='h';
run;

NOTE: The data set WORK.SOFTSALE has 0 observations and 5 variables.

A Debugging Example

Now use PUTLOG to display messages and values around the IF statements.

data softsale;
infile rawin missover;
input name $1-10 division $12 years 15-16 sales 19-25
      expense 27-34;
putlog '$$$ before sales if ' _n_= sales= division=;
if sales > 4000;
putlog '$$$ before division if ' _n_= sales= division=;
if division='h';
run;

$$$ before sales if _N_=1 sales=4822.12 division=H
$$$ before division if _N_=1 sales=4822.12 division=H
$$$ before sales if _N_=2 sales=233.11 division=H
$$$ before sales if _N_=3 sales=678.43 division=H

NOTE: The data set WORK.SOFTSALE has 0 observations and 5 variables.
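Reading the log: record 1 passes the sales test and reaches the division test, yet nothing is output, so `if division='h';` is the statement that rejects it. The data holds an uppercase H and character comparisons are case sensitive. One possible fix, sketched here with in-stream data and UPCASE (comparing against 'H' directly would work equally well):

```sas
data softsale;
   infile datalines missover;
   input name $1-10 division $12 years 15-16 sales 19-25
         expense 27-34;
   if sales > 4000;
   if upcase(division) = 'H';   /* compare without regard to case */
   datalines;
BETH       H  12  4822.12   982.10
CHRIS      H   2   233.11    94.12
JOHN       H   7   678.43   150.11
;
run;
```

With these three records only BETH should remain, since the other two fail the sales test.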

Other Data Step Statements

The SAS DATA step has many, many more statements that can read, write, and process data in almost any form.

• over 500 DATA step functions
• interfaces to the SAS macro system
• much more
• much written about those features
• beyond the scope of this paper to try to cover them all.
• the DATA step is an extremely versatile and full featured programming language.

Other Topics

As powerful and well designed as the DATA step is, it is different from other languages.

• Perhaps a more standardized language such as SQL might be more auditable and more desirable.
• Terrible DATA step code can be written.
• Good design and adequate documentation make a well written program.
• PROC SQL is also a great tool that adds that language's features to SAS.
• SAS programs can be well written programs that can be used for everything from one time programs to full fledged production programs.
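For comparison, a percentage-of-total calculation like the one on the "Stopping Late" slide can also be written in PROC SQL. This is a sketch only; the three-row `concat` dataset and the table name `pctsql` are stand-ins chosen for illustration.

```sas
/* Small stand-in for the CONCAT dataset. */
data concat;
   input name $ sales;
   datalines;
BETH 4822.12
CHRIS 233.11
JOHN 678.43
;
run;

proc sql;
   /* SUM(sales) is computed over all rows and merged back onto each row. */
   create table pctsql as
   select name,
          sales,
          sales * 100 / sum(sales) as salespct
   from concat;
quit;
```

SAS notes in the log that summary statistics are remerged with the original data, which is the SQL counterpart of the DATA step's extra pass over the dataset.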

Conclusions

The SAS DATA step is an excellent programming language with unique and extremely versatile features.

Not Good On the Flight Home

Contact Us

Questions, comments or to subscribe to the free "Missing Semicolon" newsletter.

SAS® Training, Consulting, & Help Desk Services

Steven J. First, President
(608) 278-9964
FAX (608) 278-0065
sfirst@sys-seminar.com
www.sys-seminar.com

2997 Yarmouth Greenway Drive • Madison, WI 53711