1-reading raw data in sas week1
TRANSCRIPT
-
7/27/2019 1-Reading Raw Data in SAS Week1
1/75
Powerpoint Templates
1Powerpoint Templates
SAS - Basic Concepts
http://www.powerpointstyles.com/http://www.powerpointstyles.com/http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
2/75
Powerpoint Templates
2
Statistical Application Software
The SAS system has a suite of products
Each product associated with a set of functionalities
Capable of efficiently handling very large data sets
What is SAS?
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
3/75
Powerpoint Templates
3
Some products in the SAS System
Core of the SAS System The basic software to make SAS run
Base SAS Software SAS language, DATA step and Basic Procedures
SAS/STAT Procedures for various statistical analyses
SAS/GRAPH Procedures and options to create graphs
SAS/ETS
Economic Time Series Time Series Analysis SAS/OR
Operations Research Optimization, LP etc
SAS/EIS Enterprise Information Systems for OLAP models
Enterprise Miner Mining Package with various techniques
SAS/Intrnet Web based application and portal development
Analyst/AF/FSP/Other Front End Based Features
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
4/75
Powerpoint Templates
4
Program Editor Window
Write your code inthis window
ExplorerandResultsWindow
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
5/75
Powerpoint Templates
5
Log Window
Log Window
View the Log Created by
the Program Execution
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
6/75
Powerpoint Templates
6
Output Window
Output Window
View the Values of a Datasetin this Window
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
7/75
Powerpoint Templates 7
Components of SAS Programs
DATA steps typically create or modify SAS data sets.
put your data into a SAS data set compute values
check for and correct errors in your data
produce new SAS data sets by subsetting, merging, and updating existing data sets.
PROC (procedure) steps are pre-written routines thatenable you to analyze and process the data in a SASdata set
you can use PROC steps to
create a report that lists the data
produce descriptive statistics
create a summary report
produce plots and charts.
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
8/75
Powerpoint Templates 8
Characteristics of SAS Programs
SAS programs consist ofSAS statements.
A SAS statement has two important characteristics: It usually begins with a SAS keyword.
It always ends with a semicolon.
SAS statements are in free format. This means that they can begin and end anywhere on a line
One statement can continue over several lines
Several statements can be on a line.
Blanks or special characters separate "words" in a SASstatement.
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
9/75
Powerpoint Templates 9
Overview of Data Sets
Descriptor Portion
The descriptor portion of a SAS data set contains informationabout the data set, including
the name of the data set
the date and time that the data set was created
the number of observations
the number of variables
Data Portion Contains Rows, Column and actual Value
Variable Attr i butes
Name
Type
Length
Format
Informat
Label
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
10/75
Powerpoint Templates 10
SAS Libraries
Every SAS file is stored in a SAS library, which is a
collection of SAS files. A SAS data library is the highest level of organization for
information within SAS.
General form, basic LIBNAME statement:
LIBNAME libref'SAS-data-library' ;
where
librefis 1 to 8 characters long, begins with a letter or underscore,and contains only letters, numbers, or underscores.
SAS-data-libraryis the name of a SAS data library in which SAS
data files are stored. The specification of the physical name ofthe library differs by operating environment.
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
11/75
Powerpoint Templates 11
Storing Files Temporarily or Permanently
Storing files temporarily:
If you don't specify a library name when you create a file (or if youspecify the library name Work), the file is stored in the temporary SASdata library.
When you end the session, the temporary library and all of its files aredeleted. Temporary SAS libraries last only for the current SAS session
Storing files permanently: To store files permanently in a SAS data library, you specify a
library name other than the default library name Work.
Permanent SAS libraries are available to you during subsequentSAS sessions
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
12/75
Powerpoint Templates 12
Referencing SAS Files
To reference a permanent SAS data set in your SAS programs, you use atwo-level name:
l ibref. f i lename
In the two-level name, l ibrefis the name of the SAS data library that contains the file, andf i lenameis the name of the file itself. A period separates the libref and filename.
To reference temporary SAS files, you can specify the default librefWork,
a period, and the filename. For example, the two-level name Work.Test
Alternatively, you can use a one-level name (the filename only) to reference a file in atemporary SAS library. When you specify a one-level name, the default librefWork isassumed.
If the USER library is assigned, SAS uses the Userlibrary rather than theWork library for one-level names. Useris a permanent library.
So referencing a SAS file in any library except Work indicates that the SASfile is stored permanently.
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
13/75
Powerpoint Templates 13
Example of a SAS Data set
ID NAME HT WT
1
2
3
4
5
6
Observations
Variables
ID, HT and WT are Numeric Variables
NAME is a Character Variable
Character Variables if blank are represented by a space
Numeric Variables if blank are represented by a .
53 Lucy 42 41
54 Tom 46 54
55 Dan 43 .
56 Tim 45 56
57 42 48
58 Mary 48 43
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
14/75
Powerpoint Templates 14
CREATING LIST REPORTS
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
15/75
Powerpoint Templates 15
Basic Report
You can easily list the contents of a SAS data set by
using a simple program like the one shown below.libname clinic 'your-SAS-data-library';
proc print data=clinic.admit;
run;
You can produce column totals for numeric variableswithin your report.libname clinic 'your-SAS-data-library';
proc print data=clinic.admit;
sum fee;
run;
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
16/75
Powerpoint Templates 16
Selected Observations and Variables
You can choose the observations and variables that
appear in your report. In addition, you can remove thedefault Obs column that displays observation numbers.
libname clinic 'your-SAS-data-library';
proc print data=clinic.admit noobs;
var age height weight fee;
where age>30;
run;
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
17/75
Powerpoint Templates 17
Specifying WHERE Expressions
Symbol Meaning Example
= or eq equal to where name='Jones, C.';
^= or ne not equal to where temp ne 212;
> or gt greater than where income>20000;
< or lt less than where partno lt "BG05";
>= or ge greater than or equal to where id>='1543';
-
7/27/2019 1-Reading Raw Data in SAS Week1
18/75
Powerpoint Templates 18
More Operators
Using the CONTAINS Operator
The CONTAINS operator selects observations that include thespecified substring. The mnemonic equivalent for theCONTAINS operator is ?
where firstname CONTAINS 'Jon';
where firstname ? 'Jon';
IN operatorwhere actlevel in ('LOW','MOD');
where fee in (124.80,178.20);
Between And
Where date Between 21Dec2010d And 20Jan2011d;
Like Operator Where name like _uj%;
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
19/75
Powerpoint Templates 19
CREATING SAS DATA SETSFROM RAW DATA
St t C t SAS D t S t
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
20/75
Powerpoint Templates 20
Steps to Create a SAS Data Set
To Do This Use This SAS Statement Example
Reference a SAS data library LIBNAME statement libname libref 'SAS-data-library';
Reference an external file FILENAME statement filename tests 'c:\users\tmill.dat';
Name a SAS data set DATA statement data clinic.stress;
Identify an external file INFILE statement infile tests obs=10;
Describe data INPUT statement input ID 1-4 Age 6-7 ...;
Execute the DATA step RUN statement run;
List the data PROC PRINT statement proc print data=clinic.stress;
Execute the final program step RUN statement run;
St t C t SAS D t S t
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
21/75
Powerpoint Templates 21
Steps to Create a SAS Data Set
Using a LIBNAME Statement
libname taxes 'c:\users\acct\qtr1\report'; Using a FILENAME Statement
filename exer 'c:\users\exer.dat';
Naming the Data Set
DATA SAS-data-set-1 ; Rules for SAS Names
SAS data set names
can be 1 to 32 characters long
must begin with a letter (AZ, either uppercase or lowercase) or an
underscore (_) can continue with any combination of numbers, letters, or
underscores.
St t C t SAS D t S t
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
22/75
Powerpoint Templates 22
Steps to Create a SAS Data Set
Specifying the Raw Data File
INFILEfile-specification ;
Describing the Data General form, INPUT statement using column input:
INPUT variable ; startcol-endcol . . .where
variable is the SAS name that you assign to the field
the dollar sign ($) identifies the variable type as character (if thevariable is numeric, then nothing appears here)
startcolrepresents the starting column for this variable endcolrepresents the ending column for this variable.
St t C t SAS D t S t
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
23/75
Powerpoint Templates 23
Steps to Create a SAS Data Set
filename exer 'c:\users\exer.dat';
data exercise;infile exer;
input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;
run;
When you use column input, you can read any or all fields from the raw data file
read the fields in any order
specify only the starting column for values that occupy only onecolumn.
input ActLevel $ 9-12 Sex $ 14 Age 6-7;
V if i th D t
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
24/75
Powerpoint Templates 24
Verifying the Data
Whenever you use the DATA step to read raw data
Write the DATA step using the OBS= option in the INFILEstatement.
Submit the DATA step.
Check the log for messages.
View the resulting data set.
Remove the OBS= option and re-submit the DATA step.
Check the log again.
View the resulting data set again.
C ti d M dif i V i bl
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
25/75
Powerpoint Templates 25
Creating and Modifying Variables
General form, assignment statement:
variable=expression;where
variable names a new or existing variable
expression is any valid SAS expression.
SAS Expressions An expression is a sequence of operands and operators that
form a set of instructions. The instructions are performed toproduce a new value:
Operands are variable names or constants. They can be numeric,
character, or both. Operators are special-character operators, grouping parentheses,
or functions.
U i O t i SAS E i
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
26/75
Powerpoint Templates 26
Using Operators in SAS Expressions
Operator Action Example Priority
- negative prefix negative=-x; I
** exponentiation raise=x**y; I
* multiplication mult=x*y; II
/ division divide=x/y; II
+ addition sum=x+y; III
- subtraction diff=x-y; III
When you use more than one arithmetic operator in an expression,
operations of priority I are performed before operations of priority II,and so on
consecutive operations that have the same priority are performed
from right to left within priority I
from left to right within priorities II and III
you can use parentheses to control the order of operations.
Reading In stream Data
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
27/75
Powerpoint Templates 27
Reading In stream Data
To read in stream data, you use a DATALINES
statement as the last statement in the DATA step(except for the RUN statement) and immediatelypreceding the data lines
A null statement (a single semicolon) to indicate the
end of the input data
Using Data step for Internal raw data
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
28/75
Powerpoint Templates 28
Using Data step for Internal raw data
Internal raw data
Datalines or Cards to indicate that the data isinternal
data cities;
input City $ Rank ;
datalines;
Mumbai 1
Delhi 2
Chennai 3
Calcutta 4
;
run ;
*
Steps to Create a Raw Data File
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
29/75
Powerpoint Templates 29
Steps to Create a Raw Data File
data _null_;
set clinic.stress;file 'c:\clinic\patients\stress.dat';
put id 1-4 name 6-25 resthr 27-29 maxhr 31-33
rechr 35-37 timemin 39-40 timesec 42-43
tolerance 45 totaltime 47-49;
run;
Using the_NULL_ Keyword The keyword _NULL_, which enables you to use the DATA step
without actually creating a SAS data set
A SET statement specifies the SAS data set that you want toread from.
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
30/75
Powerpoint Templates 30
UNDERSTANDING DATA STEPPROCESSING
C il ti phase
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
31/75
Powerpoint Templates 31
Compilation phase
A SAS DATA step is processed in two phases:
During the compilation phase, each statement is scanned for syntax errors. Most syntax errors
prevent further processing of the DATA step.
When the compilation phase is complete, the descriptor portionof the new data set is created.
Execution phase
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
32/75
Powerpoint Templates 32
Execution phase
If the DATA step
compiles successfully,then the executionphase begins. During the execution
phase, the DATA step
reads and processesthe input data.
The DATA stepexecutes once for eachrecord in the input file
Compile Program
Initialize variablesto missing
Execute inputstatement
Execute otherstatements
End ofFile ?
Output to SASdataset
Next step
Yes
No
Execution Phase
Compilation Phase In detail
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
33/75
Powerpoint Templates 33
Compilation Phase In detail
Input Buffer
At the beginning of the compilation phase, the input buffer(anarea of memory) is created to hold a record from the external file
The input buffer is created only when raw data is read, not whena SAS data set is read
The term input bufferrefers to a logical concept; it is not a
physical storage area
1 2 3 4 5 6 7 8 9
Program Data Vector
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
34/75
Powerpoint Templates 34
Program Data Vector
Program Data Vector
The program data vector is the area of memory where SASbuilds a data set, one observation at a time
Like the term input buffer, the term program data vectorrefersto a logical concept
The program data vector contains two automatic variables that
can be used for processing but which are not written to the dataset as part of an observation
_N_counts the number of times that the DATA step begins toexecute.
_ERROR_signals the occurrence of an error that is caused by the
data during execution. The default value is 0, which means there is no error. When one or more errors
occur, the value is set to 1.
Syntax Checking
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
35/75
Powerpoint Templates 35
Syntax Checking
During the compilation phase, SAS scans each
statement looking for following syntax errors. missing or misspelled keywords
invalid variable names
missing or invalid punctuation
invalid options.
Data Set Variables As the INPUT statement is compiled, a slot is added to the
program data vector for each variable in the new data set
variable attributes such as length and type are determined the
first time a variable is encountered. Any variables that are created with an assignment statement in
the DATA step are also added
Execution Phase In detail
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
36/75
Powerpoint Templates 36
Execution Phase In detail
Initializing Variables
At the beginning of the execution phase, the value of _N_ is 1. Because there are no data errors, the value of _ERROR_ is 0.
The remaining variables are initialized to missing.
Missing numeric values are represented by periods (.)
missing character values are represented by blanks ()
Input Data The INFILE statement identifies the location of the raw data.
Input Pointer INPUT statement uses an input pointer to keep track of its
position The input pointer starts at column 1 of the first record, unless
otherwise directed
Execution Phase - End of the DATA Step
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
37/75
Powerpoint Templates 37
Execution Phase - End of the DATA Step
1. The values in the program data vector are written to the
output data set as the first observation2. The value of _N_ is set to 2 and control returns to the
top of the DATA step
3. The variable values in the program data vector are re-
set to missing4. That the automatic variables _N_ and _ERROR_ retain
their values
5. The DATA step works like a loop, repetitively executing
statements to read data values and create observationsone by one
End-of-File Marker The ultimate End !!
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
38/75
Powerpoint Templates 38
End-of-File Marker The ultimate End !!
End-of-File Marker
The execution phase continues the iterations until the end-of-filemarker is reached in the raw data file
The order in which variables are defined in the DATA stepdetermines the order in which the variables are stored in thedata set
data perm.update;
infile invent;
input Item $ 1-13 IDnum $ 15-19
InStock 21-22 BackOrd 24-25;
Total=instock+backord;
run;
Methods for getting your data into the SAS system
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
39/75
Powerpoint Templates 39
Methods for getting your data into the SAS system
Entering data directly into SAS dataset
Creating SAS datasets from raw files Using Data step
Using Import Procedure
Converting other softwares data files into SAS datasets
Reading other softwares data files directly
Different types of input
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
40/75
Powerpoint Templates 40
Different types of input
Column Input
List Input Formatted input
Types ofinput
Columninput
List inputFormatted
input
Free-Format Data
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
41/75
Powerpoint Templates 41
Free Format Data
What is free format data
Data that is not arranged in columns The fields are often separated by blanks or by some other
delimiter
Fixed-Field Data
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
42/75
Powerpoint Templates 42
Fixed Field Data
What is Fixed-Field Data
Data is arranged in columns or fixed fields You can specify a beginning and ending column for each field
Reading Free-Format Data
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
43/75
Powerpoint Templates 43
Reading Free Format Data
Using List Input
General form, INPUT statement using list input:INPUT variable ;
where
variable specifies the variable whose value the INPUT statementis to read
$ specifies that the variable is a character variable.
Because list input, by default, does not specify columnlocations, all fields must be separated by at least one blank or other
delimiter fields must be read in orderfrom left to right
you cannot skip or re-read fields.
Using Data step for External raw data
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
44/75
Powerpoint Templates 44
External raw data
Infile statement to tell SAS the filename and the path
data cities;
infile "C:\training\sample1.txt ;
input City $ Rank ;
run ;
NOTE: The infile "C:\training\sample1.txt" is: File Name=C:\training\sample1.txt,RECFM=V, LRECL=256
NOTE: 4 records were read from the infile "C:\training\sample1.txt".
The minimum record length was 8.
The maximum record length was 10.
NOTE: The data set WORK.CITIES has 4 observations and 2 variables.
NOTE: DATA statement used:
real time 0.25 seconds
cpu time 0.11 seconds
Mumbai 1
Delhi 2
Chennai 3
Calcutta 4
Text in the external file
Using Data step for External raw data
*
Working with Delimiters
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
45/75
Powerpoint Templates 45
Working with Delimiters
Use the DLM= option in the INFILE statement to
specify a delimiter other than a blank (the default)
Example:data perm.survey;
infile credit dlm=',';input Gender $ Age Bankcard FreqBank Deptcard FreqDept;
run;
Reading Raw data separated by spaces
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
46/75
Powerpoint Templates 46
Reading Raw data separated by spaces
List Inputdata runners;
input name $ surname $ age runtime1 runtime3 ;
datalines;
Scott A 15
23.3 21.5
Mark . 13 25.2 24.1
;
run ;
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.RUNNERS has 2 observations and 5 variables.
All missing data must be indicated by a periodAll values are separated by at least one spaceCharacter data are eight characters or fewer
Should not have embedded spaces
*
Reading Raw data separated by spaces
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
47/75
Powerpoint Templates 47
Reading Raw data separated by spaces
NOTE: Invalid data for runtime2 in line 228 1-7.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-
228 Michael M 14 12 .name=Jon surname=K age=13 runtime1=25.1 runtime2=. _ERROR_=1 _N_=3
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.RUNNERS has 4 observations and 5 variables.
data runners;
input name $ surname $ age runtime1
runtime2 ;
datalines;
Scott A 15
22.0 21.9
Mark . 13 25.2 24.1
Jon K 13 25.1
Michael M 14 12 .
;
run ;
data runners;
input name $ surname $ age runtime1
runtime2 ;
datalines;
Scott A 15
22.0 21.9 Mark . 13 25.2 24.1
Michael M 14 12 .
;
run ;
*
Reading Missing Values at the End of a Record
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
48/75
Powerpoint Templates 48
Reading Missing Values at the End of a Record
Missover option:
If the missing values occur at the end of the record, you can usethe MISSOVER option in the INFILE statement to read themissing values at the end of the record
The MISSOVER option prevents SAS from going to anotherrecord if, when using list input, it does not find values in the
current line for all the INPUT statement variables At the end of the current record, values that are expected but not
found are set to missing
The MISSOVER option works only for missing values that occurat the end of the record
Reading Missing Values at the Beginning or
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
49/75
Powerpoint Templates 49
g g g gMiddle of a Record
The DSD Option
You can use the DSD option in the INFILE statement to correctlyread the raw data
sets the default delimiter to a comma
treats two consecutive delimiters as a missing value
removes quotation marks from values
If the data uses multiple delimiters or a single delimiter otherthan a comma, then simply specify the delimiter value(s) with theDLM= option
data perm.survey;
infile credit dsd dlm='*';
input Gender $ Age Bankcard FreqBank Deptcard FreqDept;
run;
The LENGTH Statement
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
50/75
Powerpoint Templates 50
The variable attributes are defined when the variable is
first encountered in the DATA step
Modifying List Input
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
51/75
Powerpoint Templates 51
y g p
The ampersand (&) modifier :
is used to read character values that contain embedded blanks. The & indicates that a character value that is being read with list
input might contain one or more single embedded blanks
The value is read until two or more consecutive blanks areencountered
The colon (:) modifier : enables you to read nonstandard data values and character
values that are longer than eight characters, but which containno embedded blanks
The colon (:) indicates that values are read until a blank (or otherdelimiter) is encountered, and then an informat is applied
input Rank City & $12. Pop86 : comma.;
Creating Free-Format Data
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
52/75
Powerpoint Templates 52
g
The PUT statement can also be used with list output to
create free-format raw data files. data _null_;
set perm.finance;
file 'c:\data\findat2'dlm=',' ;
put ssn name salary date : date9.;
run;
PROC EXPORT DATA=SAS-data-set;OUTFILE=filename ;
RUN;
Reading Raw Data in Fixed Fields
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
53/75
Powerpoint Templates 53
g
Column Input Features
It can be used to read character variable values that containembedded blanks.
input Name $ 1-25;
No placeholder is required for missing data. A blank field is readas missing and does not cause other fields to be readincorrectly.
Fields or parts of fields can be re-read.
Fields do not have to be separated by blanks or other delimiters.
Identifying Standard and Nonstandard Numeric
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
54/75
Powerpoint Templates 54
y gData
Standard numeric data values can contain only
numbers decimal points
numbers in scientific, or E, notation (23E4)
minus signs and plus signs.
Some examples of standard numeric data are 15, -15, 15.4,
+.05, 1.54E3, and -1.54E-3.
Nonstandard numeric data includes values that contain special characters, such as percent signs
(%), dollar signs ($), and commas (,)
date and time values data in fraction, integer binary, real binary, and hexadecimal
forms.
Using Formatted Input
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
55/75
Powerpoint Templates 55
g p
Whenever you encounter raw data that is organized into
fixed fields, you can use column input to read standard data only
formatted input to read both standard and nonstandard data.
General Form of the INPUT Statement UsingFormatted Input
INPUT variable informat.;
where
pointer-controlpositions the input pointer on a specified column
variable is the name of the variable that is being created
informatis the special instruction that specifies how SAS readsraw data.
@n Column Pointer Control
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
56/75
Powerpoint Templates 56
@
Using the @nColumn Pointer Control
The @n is an absolute pointer control that moves the inputpointer to a specific column number
The @moves the pointer to column n, which is the first columnof the field that is being read
You can use the @n to move a pointer forward or backward
when reading a record.INPUT @n variable informat.;
input @9 FirstName $5. @1 LastName $7.
The +n Pointer Control
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
57/75
Powerpoint Templates 57
The +nPointer Control
The +n pointer control moves the input pointer forward to acolumn number that is relative to the current position
The + moves the pointer forward ncolumns
INPUT +n variable informat.;
Using Informats
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
58/75
Powerpoint Templates 58
An informat is an instruction that tells SAS how to read
raw data SAS provides many informats for reading standard and
nonstandard data values
Note that each informat contains a wvalue to indicate the width of the raw
data field
each informat also contains a period, which is a requireddelimiter
for some informats, the optional dvalue specifies the number ofimplied decimal places
informats for reading character data always begin with a dollarsign ($).
PERCENTw.d DATEw. NENGOw.
$BINARYw. DATETIMEw. PDw.d
HEXw. PERCENTw.
$w. JULIANw. TIMEw.
COMMAw.d MMDDYYw. w.d
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
59/75
Powerpoint Templates 59
Reading Character Values
The $w. informat enables you to read character data The wrepresents the field width of the data value (the total
number of columns that contain the raw data field)
Reading Standard Numeric Data
The informat for reading standard numeric data is the w.dinformat
The wspecifies the field width of the raw data value, the periodserves as a delimiter, and the doptionally specifies the numberof implied decimal places for the value
. The w.dinformat ignores any specified dvalue if the dataalready contains a decimal point
Reading Nonstandard Numeric Data
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
60/75
Powerpoint Templates 60
The COMMAw.dinformat is used to read numeric values
and to remove embedded blanks commas
dashes
dollar signs
percent signs right parentheses
left parentheses, which are converted to minus signs
1. the informat name COMMA
2
a value that specifies the width of the field to be read (including dollar
signs, decimal places, or other special characters), followed by a period w.
3an optional value that specifies the number of implied decimal places for avalue (not necessary if the value already contains decimal places).
d
Reading Date and Time Values
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
61/75
Powerpoint Templates 61
How SAS Stores Date Values ?
When you use a SAS informat to read a date, SAS converts it toa numeric date value. A SAS date value is the number of daysfrom January 1, 1960, to the given date.
How SAS Stores Time Values ? SAS stores time values similar to the way it stores date values. A
SAS time value is stored as the number of seconds sincemidnight.
A SAS datetime is a special value that combines bothdate and time information. A SAS datetime value is
stored as the number of seconds between midnight onJanuary 1, 1960, and a given date and time.
Date and Time Informats
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
62/75
Powerpoint Templates 62
MMDDYYw. Informat
Reads mmddyyormmddyyyy
DATEw. Informat Reads ddmmmyyorddmmmyyyy
Date Expression SAS Date Informat
101599 MMDDYY6.
10/15/99 MMDDYY8.
10 15 99 MMDDYY8.
10-15-1999 MMDDYY10.
Date Expression SAS Date Informat
30May00 DATE7.
30May2000 DATE9.
30-May-2000 DATE11.
Date and Time Informats
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
63/75
Powerpoint Templates 63
TIMEw. Informat
Reads hh:mm:ss.ss where
hh is an integer from 00 to 23, representing the hour
mm is an integer from 00 to 59, representing the minute
ss.ss is an optional field that represents seconds and hundredths of
seconds.
Five is the minimum acceptable field width for theTIMEw. informat.
Time Expression SAS Time Informat
17:00:01.34 TIME11.
17:00 TIME5.
2:34 TIME5.
Date and Time Informats
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
64/75
Powerpoint Templates 64
The WEEKDATEw. Format
The WEEKDATEw. format writes date values in the form day-of-week, month-name dd, yy(oryyyy).
The WORDDATEw. Format The WORDDATEw. format is similar to the WEEKDATEw.
format, but it does not display the day of the week or the two-
digit year values.
FORMAT Statement Result
format datein weekdate3.; Mon
format datein weekdate6.; Monday
format datein weekdate17.; Monday, Apr 5, 99
format datein weekdate21.; Monday, April 5, 1999
FORMAT Statement Result
format datein worddate3.; Apr
format datein worddate5.; April
format datein worddate14.; April 15, 1999
Line Pointer Controls
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
65/75
Powerpoint Templates 65
When SAS reads raw data values, it keeps track of its
position with an input pointer We have used column pointer controls and column
specifications to determine the column placement of theinput pointer
We can also position the input pointer on a specificrecord by using a line pointer control
There are two types of line pointer controls The forward slash (/) specifies a line location that is relative to
the current one
The #nspecifies the absolute number of the line to which youwant to move the pointer
Reading Multiple Records Sequentially
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
66/75
Powerpoint Templates 66
The Forward Slash (/) Line Pointer Control
Refer to the embedded word doc for illustration
Note that the raw data file must contain the samenumber of records for each observation that is beingcreated
Reading Multiple Records Sequentially
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
67/75
Powerpoint Templates 67
The #n Line Pointer Control
The #n specifies the absolute number of the line to which youwant to move the input pointer
The #n pointer control can read records in any order
Refer to the embedded word doc for illustration
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
68/75
Creating a Single Observation from MultipleR d
-
7/27/2019 1-Reading Raw Data in SAS Week1
69/75
Powerpoint Templates69
Records
The forward slash (/) specifies a line location that is
relative to the current one. The / advances the input pointer to the next record. The / line pointer control only moves the input pointerforward
and must be specified afterthe instructions for reading thevalues in the current record.
Note that the raw data file must contain the same number ofrecords for each observation that is being created.
Reading Multiple Records Non-Sequentially
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
70/75
Powerpoint Templates70
The #n Line Pointer Control
The #n specifies the absolute number of the line to which youwant to move the input pointer.
The #n pointer control can read records in any order
It must be specified before the instructions for reading values ina specific record.
Points to Remember Because the / pointer control can only move forward, the pointer
control is specified afterthe values in the current record areread.
The #n pointer control can read records in any order and mustbe specified before the variable names are defined.
A semicolon should be placed at the end of the complete INPUTstatement.
Creating Multiple Observations from a SingleR d
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
71/75
Powerpoint Templates71
Record
SAS provides two line-hold specifiers. A Line-Hold
Specifiers hold the current record for next inputstatement.
The trailing at sign (@) holds the input record for theexecution of the next INPUT statement.
The double trailing at sign (@@) holds the input recordfor the execution of the next INPUT statement, evenacross iterations of the DATA step.
The term trailing indicates that the @ or @@ must bethe last item that is specified in the INPUT statement. E.g. input Name $20. @; or input Name $20. @@;
Using the Double Trailing At Sign (@@) toH ld th C t R d
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
72/75
Powerpoint Templates72
Hold the Current Record
Typically, each time a DATA step executes, the INPUT
statement reads a new record. When the trailing @@ is used, the INPUT statement
holds the current record and reads the next value.
The double trailing at sign (@@)
Holds the data line in the input bufferacross multipleexecutions of the DATA step
Typically is used to read multiple SAS observations from a singledata line
Should not be used with the @ pointer control, with column
input, nor with the MISSOVER option.
Using the Double Trailing At Sign (@@) toH ld th C t R d
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
73/75
Powerpoint Templates73
Hold the Current Record
A record that is being held by the double trailing at sign
(@@) is not released until one of the following eventsoccurs: The input pointer moves past the end of the record. Then the
input pointer moves down to the next record.
An INPUT statement that has no line-hold specifier executes.
Using the Single Trailing At Sign (@) to Holdth C t R d
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
74/75
Powerpoint Templates74
the Current Record
Like the double trailing @@, the single trailing @
Enables the next INPUT statement to read from the same record Releases the current record when a subsequent INPUT
statement executes without a line-hold specifier.
It's easy to distinguish between the trailing @@ and thetrailing @ by remembering that the double trailing at sign (@@) holds a record across multiple
iterations of the DATA step until the end of the record is reached.
the single trailing at sign (@) releases a record when control
returns to the top of the DATA step.
http://www.powerpointstyles.com/http://www.powerpointstyles.com/ -
7/27/2019 1-Reading Raw Data in SAS Week1
75/75
THANK YOU
HAPPY LEARNING