Copyright © 2004, SAS Institute Inc. All rights reserved.
SASHELP Datasets
A real life example
Barb Crowther
SAS Consultant
October 22, 2004
Situation
The Canadian government requires all financial institutions to report suspicious financial transactions, including:
• Cash
• Money Orders
• Casino Chips
• Real Estate ….
Situation – Data
The data is extracted from many different applications and saved to a flat file as a ‘report’.
Excluding the header and subheader, there were 9 different parts for each report.
Data formatting was very specific and varied from part to part.
The number of fields varied from 4 to 35.
Field lengths varied between 1 and 400.
The part lengths varied between 139 and 507 characters.
Types of Required Information ……
Who are you?: Name (Char), Add. (Char), Date (Char)
Susp. Txn: Date (Char), Time (Num.), Amount (Num.)
Why?: Desc. (Char)
The first kicker
Not all reports needed all parts.
Some parts were always required, others only some of the time.
Which parts were included was dependent on the data.
But if the part was required, it had to be perfect!!!
The second kicker
The users wanted to be able to edit each part before it was sent to the government – but because of the tool they used, they could not insert (or delete) a missing part.
So even if all the fields were missing from the source data, the part had to be included.
So……….
The application had to insert each required part
The only information I would get is a sequence number.
Attempt #1 – Hard code the data step
For each of the nine parts…..
…. For each of the 10 to 35 fields per part….
…I could write a “length variable ($) n;” statement.
Oh, and by the way, did I tell you that the government regularly changes part content?
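To make the cost of Attempt #1 concrete, a hard-coded template for a single part might look like the sketch below. The variable names and lengths are borrowed from the Txns example shown later in the deck; the dataset name work.txns_empty is hypothetical, and a real part would need one line for every field.

```sas
/* Hypothetical hard-coded template: every variable is declared by hand. */
data work.txns_empty;
  length Tran_date      $ 8
         Tran_Post_Date $ 8
         Tran_Currency  $ 3
         Tran_time      $ 4
         Tran_Amount      8;
  /* Explicitly set everything to missing; the step writes one observation. */
  call missing(of _all_);
run;
```

Any change to the part layout means editing this list by hand, which is the maintenance problem the later attempts try to avoid.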
Attempt #2 – Investigate system options
options obs=0;
data parta1;
  set input.parta1;
run;
options obs=max;

The problem with this code is that the output dataset has no observations, and I needed one, even if there was no data.
Attempt #3 – Look at SASHELP.datasets
The SASHELP datasets contain information about the current SAS session, including:
• all the members of all the libraries (SASHELP.VMEMBER)
• all the columns of each member (SASHELP.VCOLUMN)
VCOLUMN Contents
Variable Name   Variable Label
Create Date     …
Format          Column Format
Informat        Column Informat
Label           Column Label
Length          Column Length
Libname         Library Name
Memname         Member Name
Type            Column Type
…
Accessing V* Tables
Accessing the V* tables can be done using PROC steps, PROC SQL, or DATA steps:
• proc print data=sashelp.vtable; where libname='WORK'; run;
• proc sql; create view work.options as select * from dictionary.options; quit;
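The slide shows the PROC and SQL routes; the DATA step route is sketched below for column metadata, next to the equivalent dictionary query. The output dataset name work.work_columns is an assumption made for illustration.

```sas
/* DATA step: read column metadata straight from the SASHELP view. */
data work.work_columns;
  set sashelp.vcolumn;
  where libname = 'WORK';
  keep memname name type length format label;
run;

/* PROC SQL: the same rows from the underlying dictionary table. */
proc sql;
  select memname, name, type, length, format, label
    from dictionary.columns
    where libname = 'WORK';
quit;
```

Library names are stored in uppercase in these tables, so the WHERE value is written as 'WORK'.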
So how does this help me?
Step 1: Get a list of all the variables (and their attributes) required for the “empty dataset”.
Step 2: Move all that information into macro variables
Step 3: Create a dataset template
Step 4: Create the empty dataset.
Step 1: Variables and their attributes
proc sql noprint;
  create table &table._vars as
  select name, type, format, length, label
  from sashelp.vcolumn
  where upcase(libname) = upcase("&inset")
    and upcase(memname) = upcase("&table");
quit;
For this example, our inset will be Work and our table Txns.
Step 1: Work.Txns_vars
VCOLUMN Output
Name            Type  Format  Length  Label
Tran_date       Char          8
Tran_Post_Date  Char          8
Tran_Currency   Char  $3.00   3       Currency_code
Tran_time       Char          4
Tran_Amount     Num           8
Teller_id       Char  $15
…
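Step 2: Create macro variables
using Txns_vars from Step 1
data _null_;
  set &table._vars end=eof;
  /* var1, var2, ... hold the column names */
  call symput('var'||left(put(_n_,3.)), trim(name));
  /* fmt1, fmt2, ...: use the stored format, or build $w. for unformatted character columns */
  if format ne ' ' then
    call symput('fmt'||left(put(_n_,3.)), trim(format));
  else if upcase(type) = 'CHAR' then
    call symput('fmt'||left(put(_n_,3.)), '$'||trim(left(put(length,3.)))||'.');
  /* Note: numeric columns with no format, and columns with no label, get no macro variable here */
  if label ne ' ' then
    call symput('label'||left(put(_n_,3.)), trim(label));
  if eof then call symput('var_cnt', left(put(_n_,3.)));
run;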
Step 2: Macro Output from the SASLOG
Macro Name              Macro Variable   Value
BUILD_TABLE_TEMPLATE    VAR1             TRANSACTION_KEY
BUILD_TABLE_TEMPLATE    VAR2             TRAN_CURRENCY
. . .
BUILD_TABLE_TEMPLATE    FMT1             20
BUILD_TABLE_TEMPLATE    FMT2             $3
. . .
BUILD_TABLE_TEMPLATE    LABEL1           TRANSACTION_KEY
BUILD_TABLE_TEMPLATE    LABEL2           CURRENCY_CODE
. . .
BUILD_TABLE_TEMPLATE    VAR_CNT          21
Step 3: Create the dataset template
Work.Txn_tpl using the Step 2 macro variables
%let i = 1;
data &table._tpl;
  /* %do is only valid inside a macro definition (here, build_table_template) */
  %do i=1 %to &var_cnt;
    attrib &&var&i format=&&fmt&i label="&&label&i";
  %end;
run;
Step 3: Create the dataset template
generated code to create Work.Txn_tpl
Value of i   Generated Code
1            attrib transaction_key format=20. label="transaction_key";
2            attrib tran_currency format=$3. label="Currency Code";
…
Step 3: Create a dataset template
Get a list of the required variables
%global &table._var_list;
proc sql noprint;
  select distinct name into :&table._var_list separated by ' '
  from &table._vars;
quit;
Results in a macro variable called txn_var_list with a value of TRANSACTION_KEY TRAN_CURRENCY …
So where are we?
We have a report with a known sequence number, but no data
We know what variables are required
• &txn_var_list
We know the variables’ attributes
• &&var&i format= &&fmt&i label = "&&label&i";
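The double ampersand in &&var&i is what lets the loop counter pick the right macro variable. A minimal sketch of the two-pass resolution follows, using values taken from the Step 2 log output; the fixed value of i is chosen only for illustration.

```sas
%let var1 = TRANSACTION_KEY;
%let var2 = TRAN_CURRENCY;
%let i = 2;

/* Pass 1: && becomes &, "var" is left alone, &i becomes 2, giving &var2. */
/* Pass 2: &var2 resolves to TRAN_CURRENCY.                               */
%put &&var&i;
```

Writing &var&i instead would make SAS look for a macro variable named var, which is why the extra ampersand is needed.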
Step 4: Create the empty dataset
Inputs:
• List of variables in the Txn dataset (&txn_var_list)
• The variable attributes (&&var&i, &&fmt&i, &&label&i)
• The sequence number of the report with the missing Txn part
Output:
• Dataset with the sequence number & all the other variables
Step 4: Code to generate the dataset
data &table._miss_data;
  retain &&&table._var_list;
  set result (keep=seq_num);
  if _n_ = 1 then set &table._tpl(drop=seq_num);
run;
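The Step 4 code relies on the fact that variables read by a SET statement are retained across iterations, so the template's single row of missing values is read once and then carried onto every sequence number. A small self-contained sketch of the same pattern follows; the demo_ dataset names, the two-variable template, and the sequence numbers 101 and 102 are all hypothetical stand-ins for the real template and the result dataset.

```sas
/* Hypothetical one-row template: attributes only, all values missing. */
data work.demo_tpl;
  attrib seq_num       length=8
         Tran_Currency length=$3 format=$3. label="Currency Code"
         Tran_Amount   length=8;
  call missing(of _all_);
run;

/* Hypothetical list of reports that are missing the part. */
data work.demo_result;
  do seq_num = 101, 102;
    output;
  end;
run;

data work.demo_miss_data;
  retain seq_num Tran_Currency Tran_Amount;  /* fixes the column order */
  set work.demo_result(keep=seq_num);
  /* Read the template once; its variables keep their (missing) values afterwards. */
  if _n_ = 1 then set work.demo_tpl(drop=seq_num);
run;
```

The output has one row per sequence number, with every other variable present, correctly typed and labelled, and missing.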
Thoughts…
Writing the macros took longer than hard coding the attribute statements.
But, if there are any future changes, I won’t have to do very much (if any).
The macros can be used in other applications…
Suggested readings
The SASHELP Library: It Really Does Help You Manage Data by Melinda Thielbar
• http://support.sas.com/sassamples/bitsandbytes/sashelp.html
You Could Look It Up: An Introduction to SASHELP Dictionary Tables by Michael Davis
• http://www2.sas.com/proceedings/sugi26/p017-26.pdf