Working with Administrative
Databases: Tips and Tricks
Canadian Institute for Health Information
Emerging Issues Team
Simon Tavasoli
Administrative Databases
> Administrative databases are often used to synthesize information
about the health care system or to investigate health research
questions
> The data may be derived from population registries, vital statistics or
other records of life events, or from health claims and services data
> The Canadian Institute for Health Information (CIHI) collects and
receives essential data and prepares analyses on Canada’s health
system and the health of Canadians
> CIHI currently holds more than 27 databases with millions of
records (e.g., the National Ambulatory Care Registry contains millions of
records each year)
Working with Administrative Databases:
General Tips and Tricks
> Each day hundreds of employees conduct analyses using SAS
> Given the heavy workload on the CIHI server, using
resources wisely is important
> Efficiency can be measured in many ways
– Real time
– CPU time
– Memory
– Input/output
– Original programmer time
– Maintenance programmer time
There is always a trade-off
System options for measuring performance
> Options STIMER; (default)
  NOTE: DATA statement used:
      real time 1.16 seconds
      cpu time 0.09 seconds
> Options FULLSTIMER;
  NOTE: The SAS System used:
      real time 0.14 seconds
      user cpu time 0.01 seconds
      system cpu time 0.05 seconds
      Memory 1452k
      Page Faults 1
      Page Reclaims 2349
      Page Swaps 0
      Voluntary Context Switches 53
      Involuntary Context Switches 5
      Block Input Operations 1
      Block Output Operations 0
Optimizing performance
* Optimize performance by reducing CPU time
- Check the program using DATA _NULL_ or the OBS= option
- Use WHERE vs. IF
- Use DROP and KEEP statements
- Watch for issues when merging data
- Avoid unnecessary DATA steps or sorting
- Manipulate data with IF/THEN/ELSE statements
- Deal with resource-intensive calculations
* Keep the libraries clean
* Reduce the size of the tables using COMPRESS=YES
When checking your programs, use a null
data set or limit the number of
observations
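A minimal sketch of both checks, assuming a hypothetical input table `work.claims` with hypothetical variables `province` and `cost`. `DATA _NULL_` runs the step without writing an output data set, and `OBS=` (as a data set option or as a system option) caps how many observations are read while the logic is being tested.

```sas
/* Hypothetical table and variables: work.claims, province, cost */

/* DATA _NULL_ runs the logic but writes no output data set */
data _null_;
  set work.claims(obs=100);    /* read only the first 100 observations */
  if cost > 10000 then put "High cost: " province= cost=;
run;

/* Or cap the observations read by every step while debugging */
options obs=100;

data work.test;
  set work.claims;
  cost_k = cost / 1000;
run;

options obs=max;               /* restore the default before the full run */
```

Forgetting to reset `OPTIONS OBS=MAX` silently truncates every later step, so the reset is worth keeping next to the test code.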
Subsetting Datasets: WHERE vs. IF
statements
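A sketch of the difference, again assuming a hypothetical `work.claims` table with `province` and `cost`. WHERE filters observations as they are read, so rows that fail the condition never enter the program data vector; a subsetting IF reads every row first and discards it afterwards. IF is still required when the condition uses a variable created in the same step.

```sas
/* Hypothetical table and variables: work.claims, province, cost */

/* Subsetting IF: every row is read into the PDV, then discarded */
data work.on_if;
  set work.claims;
  if province = "ON";
run;

/* WHERE: rows are filtered as they are read */
data work.on_where;
  set work.claims;
  where province = "ON";
run;

/* The WHERE= data set option also works in PROC steps */
proc means data=work.claims(where=(province = "ON"));
  var cost;
run;

/* IF is needed for variables created in this step */
data work.high_cost;
  set work.claims;
  cost_k = cost / 1000;
  if cost_k > 10;              /* WHERE cannot reference cost_k */
run;
```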
Process only the variables that you need
Need only two variables
Social Sciences Computing Cooperative
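Assuming a hypothetical `work.claims` table where only the two hypothetical variables `province` and `cost` are needed, the `KEEP=` data set option on the input limits what is read in the first place:

```sas
/* Hypothetical table and variables: work.claims, province, cost */

/* KEEP= on the input: only two columns are read into the PDV */
data work.two_vars;
  set work.claims(keep=province cost);
run;

/* The same option works on the input to a procedure */
proc freq data=work.claims(keep=province);
  tables province;
run;
```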
Subsetting datasets
Subsetting datasets: KEEP Statement
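A sketch contrasting the KEEP statement with the KEEP= input option, using the hypothetical `work.claims` table with `province` and `cost`. The KEEP (or DROP) statement controls which variables are written to the output data set, but every variable is still read; combining KEEP= on the input with a DROP statement reads only what the step needs and then removes helper variables from the output.

```sas
/* Hypothetical table and variables: work.claims, province, cost */

/* KEEP statement: all variables are read, only two are written */
data work.summary;
  set work.claims;
  cost_k = cost / 1000;
  keep province cost_k;
run;

/* KEEP= on input, DROP statement on output:
   read two columns, write province and cost_k */
data work.summary2;
  set work.claims(keep=province cost);
  cost_k = cost / 1000;
  drop cost;
run;
```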
Some other shortcuts
Merging data
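A minimal sketch of a match-merge, assuming hypothetical tables `work.patients` and `work.visits` that share a hypothetical key `patient_id`. Both inputs must be sorted (or indexed) by the BY variable, and the `IN=` flags record which input contributed each observation. Two common pitfalls: a non-BY variable present in both tables generally ends up with the value from the table listed last, and a many-to-many merge does not produce every combination (SAS instead writes a NOTE that the MERGE statement has more than one data set with repeats of BY values).

```sas
/* Hypothetical tables and variables: work.patients, work.visits,
   patient_id, visit_date */
proc sort data=work.patients;
  by patient_id;
run;

proc sort data=work.visits;
  by patient_id;
run;

data work.matched work.no_visit;
  merge work.patients(in=in_pat)
        work.visits(in=in_vis keep=patient_id visit_date); /* avoid clashing non-BY columns */
  by patient_id;
  if in_pat and in_vis then output work.matched;   /* key found in both tables */
  else if in_pat then output work.no_visit;        /* patient with no visits */
run;
```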
When only one condition can be true for
a given observation, write
a series of IF-THEN/ELSE statements.
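A sketch of the two forms, assuming a hypothetical `work.patients` table with a hypothetical numeric `age`. Both steps assign the same groups, but with separate IF statements every condition is tested on every row, while IF-THEN/ELSE stops at the first condition that is true.

```sas
/* Hypothetical table and variable: work.patients, age */

/* Separate IF statements: all three conditions are tested on every row */
data work.age_grp1;
  set work.patients;
  length age_grp $6;
  if age < 18 then age_grp = "Child";
  if 18 <= age < 65 then age_grp = "Adult";
  if age >= 65 then age_grp = "Senior";
run;

/* IF-THEN/ELSE: testing stops at the first true condition,
   so placing the most common condition first saves comparisons */
data work.age_grp2;
  set work.patients;
  length age_grp $6;
  if age < 18 then age_grp = "Child";
  else if age < 65 then age_grp = "Adult";
  else age_grp = "Senior";
run;
```

The LENGTH statement matters in both steps: without it, `age_grp` would take the length of the first value assigned ("Child", 5 characters) and "Senior" would be truncated.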
Perform resource-intensive calculations
and comparisons only once
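A sketch of the idea, assuming a hypothetical `work.claims` table with a numeric `cost`: compute an expensive expression once, store it in a variable, and reuse the stored value instead of re-evaluating it in several statements. The log transform here is only an illustrative stand-in for any costly calculation.

```sas
/* Hypothetical table and variable: work.claims, cost */

/* The same function call is evaluated in two places */
data work.slow;
  set work.claims;
  if log(cost + 1) > 5 then flag = 1;
  else flag = 0;
  score = log(cost + 1) * 2;
run;

/* Evaluate it once and reuse the result */
data work.fast;
  set work.claims;
  log_cost = log(cost + 1);
  flag = (log_cost > 5);       /* a comparison returns 1 or 0 */
  score = log_cost * 2;
  drop log_cost;
run;
```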
Assign many values in one statement
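Two common ways to assign many values in a single statement, sketched under the assumption of a hypothetical `work.claims` table and hypothetical variables `rate`, `region` and `visits1`-`visits12`. A RETAIN statement with initial values sets several constants once, before the first iteration, rather than re-executing an assignment for each variable on every iteration. An array with a DO loop applies the same assignment to a whole list of variables.

```sas
/* Hypothetical names: work.claims, rate, region, visits1-visits12.
   Assumes rate and region are not already variables in work.claims,
   otherwise SET would overwrite the retained values. */

/* One assignment statement per variable, run on every iteration */
data work.init_slow;
  set work.claims;
  rate = 0.05;
  region = "East";
run;

/* RETAIN sets both initial values in one statement, once */
data work.init_fast;
  set work.claims;
  retain rate 0.05 region "East";
run;

/* An array applies one assignment to many variables */
data work.zero_filled;
  set work.claims;
  array v{12} visits1-visits12;
  do i = 1 to dim(v);
    if missing(v{i}) then v{i} = 0;
  end;
  drop i;
run;
```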
Dealing with Missing Values
Put missing values last in expressions
Check for missing values before using a variable in multiple
statements.
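A sketch of checking once for a missing value before a group of statements that use the variable, assuming a hypothetical `work.claims` table with a numeric `cost`. Without the check, each statement would be evaluated on missing `cost` and could add "missing values were generated" notes to the log.

```sas
/* Hypothetical table and variable: work.claims, cost */
data work.costs;
  set work.claims;
  if not missing(cost) then do;
    cost_k   = cost / 1000;
    cost_log = log(cost + 1);
    high     = (cost > 10000);
  end;
  /* rows with missing cost keep missing values for all three
     derived variables, because they are reset each iteration */
run;
```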
Avoid unnecessary sorting
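One way to avoid a sort, assuming a hypothetical `work.claims` table with `province` and `cost`: BY-group processing requires sorted input, but the CLASS statement in PROC MEANS groups the data without a prior PROC SORT.

```sas
/* Hypothetical table and variables: work.claims, province, cost */

/* BY-group processing needs the data sorted first */
proc sort data=work.claims out=work.claims_sorted;
  by province;
run;

proc means data=work.claims_sorted;
  by province;
  var cost;
run;

/* CLASS gives per-group statistics without a prior sort.
   Missing province values are excluded unless the MISSING
   option is added to the PROC MEANS statement. */
proc means data=work.claims;
  class province;
  var cost;
run;
```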
If several different subsets are needed,
avoid rereading the data for each subset
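A sketch of building several subsets in one pass, assuming a hypothetical `work.claims` table with a character `province`. Three separate DATA steps would read the input three times; one step that names several output data sets and routes each row with OUTPUT reads it once.

```sas
/* Hypothetical table and variable: work.claims, province */
data work.on work.qc work.other;
  set work.claims;              /* the input is read only once */
  select (province);
    when ("ON") output work.on;
    when ("QC") output work.qc;
    otherwise   output work.other;
  end;
run;
```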
Keep your SAS environment clean
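A sketch of cleaning up with PROC DATASETS, assuming hypothetical intermediate tables `work.temp_step1` and `work.temp_step2`. Deleting intermediate tables as soon as they are no longer needed frees WORK space on the shared server; the KILL option removes every member of the library.

```sas
/* Hypothetical intermediate tables: work.temp_step1, work.temp_step2 */

/* Delete specific tables once they are no longer needed */
proc datasets library=work nolist;
  delete temp_step1 temp_step2;
run;
quit;

/* Or empty the entire WORK library at the end of a job */
proc datasets library=work kill nolist;
run;
quit;
```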
COMPRESS=
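A sketch of the two ways to request compression, assuming a hypothetical `work.claims` table. This is another trade-off: compressed tables use less disk space and I/O but need extra CPU time to compress and decompress, and SAS writes a NOTE to the log reporting how much compression changed the table's size.

```sas
/* Hypothetical table: work.claims */

/* Compress every data set created for the rest of the session */
options compress=yes;

/* Or compress a single output table with the data set option */
data work.claims_c(compress=yes);
  set work.claims;
run;

/* COMPRESS=BINARY is often better suited to long observations,
   such as tables with many numeric variables */
data work.claims_b(compress=binary);
  set work.claims;
run;
```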