
Posted on 07-Jun-2020


Working with Administrative Databases: Tips and Tricks

Canadian Institute for Health Information

Emerging Issues Team

Simon Tavasoli

Administrative Databases

> Administrative databases are often used to synthesize information about the health care system or to investigate health research questions.

> The data may be derived from population registries, vital statistics or other records of life events, or from health claims and services data.

> The Canadian Institute for Health Information (CIHI) collects and receives essential data and prepares analyses on Canada's health system and the health of Canadians.

> Currently, CIHI holds more than 27 databases with millions of records (e.g., the National Ambulatory Care Registry contains millions of records each year).


Working with Administrative Databases: General Tips and Tricks

> Each day, hundreds of employees conduct analyses using SAS.

> Given the heavy workload on the CIHI server, using resources wisely is important.

> Efficiency can be measured in many ways:

– Real time

– CPU time

– Memory

– Input/output

– Original programmer time

– Maintenance programmer time


There is always a trade-off among these measures.

System options for measuring performance

> OPTIONS STIMER; (default)

    NOTE: DATA statement used:
          real time                        1.16 seconds
          cpu time                         0.09 seconds

> OPTIONS FULLSTIMER;

    NOTE: The SAS System used:
          real time                        0.14 seconds
          user cpu time                    0.01 seconds
          system cpu time                  0.05 seconds
          Memory                           1452k
          Page Faults                      1
          Page Reclaims                    2349
          Page Swaps                       0
          Voluntary Context Switches       53
          Involuntary Context Switches     5
          Block Input Operations           1
          Block Output Operations          0
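This is a minimal sketch for the case where you want detailed statistics for a single step only. FULLSTIMER is switched on before the step and switched off afterwards, so later steps return to the default STIMER report. The data set `work.demo` is hypothetical. It is built here only so that the example runs on its own.

```sas
/* Report detailed resource usage (memory, I/O, context switches) in the log */
options fullstimer;

/* Hypothetical step to profile: build 100,000 rows of synthetic data */
data work.demo;
  do id = 1 to 100000;
    cost = mod(id, 97) * 1.5;
    output;
  end;
run;

/* Turn FULLSTIMER off again; STIMER (the default) keeps reporting real and CPU time */
options nofullstimer;
```

The statistics that FULLSTIMER prints depend on the operating system, so the fields in your log may differ from the ones listed above.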


Optimizing performance

* Optimize performance by reducing CPU time:

  - Check the program using a _NULL_ data set or the OBS= option
  - Use WHERE instead of IF for subsetting where possible
  - Watch for issues when merging data
  - Avoid unnecessary DATA steps or sorting
  - Manipulate data efficiently with IF-THEN/ELSE statements
  - Handle resource-intensive calculations carefully

* Keep the libraries clean

* Reduce the size of tables using COMPRESS=YES

When checking your programs, use a null data set or limit the number of observations.
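The sketch below shows both checks. `DATA _NULL_` runs the step's logic without writing an output data set. The `OBS=` data set option limits how many observations are read while you debug. The table `work.visits` and its variables are hypothetical.

```sas
/* Hypothetical input data */
data work.visits;
  input id age cost;
  datalines;
1 34 120.5
2 67 980.0
3 45 310.2
4 80 1500.0
5 23 75.0
;
run;

/* 1. Check syntax and logic without creating a data set */
data _null_;
  set work.visits;
  cost_per_year = cost / age;
  put id= cost_per_year=;   /* inspect computed values in the log */
run;

/* 2. Test run that reads only the first 3 observations */
data work.visits_test;
  set work.visits(obs=3);
  cost_per_year = cost / age;
run;
```

The OBS= system option (`options obs=3;`) has the same effect on every step in the session. If you use it, reset it with `options obs=max;` before the production run.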


Subsetting Datasets: WHERE vs. IF statements
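The sketch below compares the two approaches on a hypothetical table, `work.visits`. A WHERE statement filters rows before they are loaded into the program data vector. A subsetting IF reads every row first and then discards the rows that do not match, which is why WHERE is usually cheaper on large tables.

```sas
/* Hypothetical input data */
data work.visits;
  input id age sex $;
  datalines;
1 34 F
2 67 M
3 45 F
4 80 M
;
run;

/* WHERE: non-matching rows never enter the program data vector */
data work.seniors_where;
  set work.visits;
  where age >= 65;
run;

/* Subsetting IF: every row is read, then non-matching rows are dropped */
data work.seniors_if;
  set work.visits;
  if age >= 65;
run;
```

A subsetting IF is still needed when the condition refers to a variable created in the same DATA step, because WHERE can only reference variables that exist in the input data set.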


Process only the variables that you need


Example: only two variables are needed

Social Sciences Computing Cooperative
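The following is a minimal sketch of reading only the two variables a step needs. The `KEEP=` data set option on the SET statement stops the other columns from being read into the program data vector at all. The table `work.claims` and its variables are hypothetical.

```sas
/* Hypothetical wide input data */
data work.claims;
  input id age sex $ region $ cost los;
  datalines;
1 34 F ON 120.5 1
2 67 M BC 980.0 4
3 45 F QC 310.2 2
;
run;

/* Only ID and COST are read; the other four variables are never loaded */
data work.cost_only;
  set work.claims(keep=id cost);
run;
```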

Subsetting datasets


Subsetting datasets: KEEP Statement
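The KEEP statement works differently from the KEEP= option on SET. It does not reduce what is read. Every input variable is still available inside the step, and only the listed variables are written to the output data set. The sketch below assumes a hypothetical `work.claims` table and uses AGE, COST and LOS during the step but keeps only two columns.

```sas
/* Hypothetical input data */
data work.claims;
  input id age cost los;
  datalines;
1 34 120.5 1
2 67 980.0 4
3 45 310.2 2
;
run;

/* AGE, COST and LOS are read and usable here,
   but only ID and COST_PER_DAY are written to WORK.DAILY_COST */
data work.daily_cost;
  set work.claims;
  cost_per_day = cost / los;
  keep id cost_per_day;
run;
```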


Some other Shortcuts


Merging data
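The sketch below shows a match-merge, assuming two hypothetical tables keyed by `id`. A DATA step MERGE with a BY statement requires both inputs to be sorted (or indexed) by the BY variable. The IN= data set options flag which input contributed each row, so unmatched rows can be kept or dropped on purpose.

```sas
/* Hypothetical inputs: patient demographics and hospital visits */
data work.patients;
  input id age;
  datalines;
1 34
2 67
3 45
;
run;

data work.visits;
  input id cost;
  datalines;
3 310.2
1 120.5
1 80.0
;
run;

/* MERGE with BY needs both inputs sorted by the BY variable */
proc sort data=work.patients; by id; run;
proc sort data=work.visits;   by id; run;

/* Keep only patients who appear in both tables (patient 2 has no visits) */
data work.patient_visits;
  merge work.patients(in=in_pat)
        work.visits(in=in_vis);
  by id;
  if in_pat and in_vis;
run;
```

A common merging issue is a many-to-many match, where both inputs repeat the same BY value. A DATA step MERGE does not produce all combinations in that case, so check that the key is unique in at least one input first.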


When only one condition can be true for a given observation, write a series of IF-THEN/ELSE statements.
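The sketch below assigns mutually exclusive age groups, assuming a hypothetical `work.visits` table. With ELSE, SAS stops testing as soon as one condition is true. A series of independent IF-THEN statements would evaluate every condition for every observation.

```sas
/* Hypothetical input data */
data work.visits;
  input id age;
  datalines;
1 12
2 34
3 67
4 80
;
run;

data work.age_groups;
  set work.visits;
  length age_group $ 6;
  /* Later conditions are skipped once an earlier one is true */
  if age < 18 then age_group = 'child';
  else if age < 65 then age_group = 'adult';
  else age_group = 'senior';
run;
```

SAS treats a missing numeric value as smaller than any number, so a missing AGE would fall into the first branch here. Test for it explicitly if that matters.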


Perform resource-intensive calculations and comparisons only once
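The sketch below computes a derived value once, stores it in a variable, and reuses the stored result in later comparisons. The SUBSTR call stands in for any costly expression. The table `work.claims`, its diagnosis codes, and the flag names are hypothetical.

```sas
/* Hypothetical input data */
data work.claims;
  input id diag $ cost;
  datalines;
1 I21.0 120.5
2 J18.9 980.0
3 I50.1 310.2
;
run;

data work.flags;
  set work.claims;
  length diag3 $ 3;
  /* Extract the 3-character diagnosis prefix once per observation ... */
  diag3 = substr(diag, 1, 3);
  /* ... then compare the stored value instead of calling SUBSTR again */
  cardiac   = (diag3 in ('I21', 'I50'));
  pneumonia = (diag3 = 'J18');
run;
```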


Assign many values in one statement
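One possible reading of this tip, offered here as an assumption: when several variables hold constants, a single RETAIN statement with initial values can replace separate assignment statements. RETAIN sets the values once, before the first iteration, so nothing is re-executed for each observation. The table `work.claims` and the constant names are hypothetical.

```sas
/* Hypothetical input data */
data work.claims;
  input id cost;
  datalines;
1 120.5
2 980.0
;
run;

data work.claims_flagged;
  set work.claims;
  /* One statement gives three variables their initial values; because they are
     retained and never reassigned, they keep these values on every row */
  retain fiscal_year 2020 province 'ON' high_cost_limit 500;
  high_cost = (cost > high_cost_limit);
run;
```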


Dealing with Missing Values


Put missing values last in expressions

Check for missing values before using a variable in multiple statements.
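The sketch below runs the missing-value test once and places every statement that depends on it in one DO group. Without the check, SAS still evaluates each expression and writes a "Missing values were generated" note for each one. The table `work.claims` and its variables are hypothetical.

```sas
/* Hypothetical input data with missing COST and LOS values */
data work.claims;
  input id cost los;
  datalines;
1 120.5 1
2 . 4
3 310.2 .
;
run;

data work.claims_checked;
  set work.claims;
  /* One test guards all statements that need both values */
  if not missing(cost) and not missing(los) then do;
    cost_per_day = cost / los;
    log_cost     = log(cost);
  end;
run;
```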


Avoid unnecessary sorting
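A common way to skip a sort step is to summarize with a CLASS statement instead of BY-group processing. CLASS does not require sorted input, while BY would need a PROC SORT first. The sketch assumes a hypothetical `work.claims` table with a REGION variable.

```sas
/* Hypothetical input data, not sorted by REGION */
data work.claims;
  input id region $ cost;
  datalines;
1 ON 120.5
2 BC 980.0
3 ON 310.2
4 QC 75.0
;
run;

/* CLASS groups the data without a prior PROC SORT;
   NWAY keeps only the per-region rows in the output */
proc means data=work.claims noprint nway;
  class region;
  var cost;
  output out=work.cost_by_region sum=total_cost;
run;
```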


If several different subsets are needed, avoid rereading the data for each subset.
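The sketch below names all subsets in one DATA statement and routes each row with OUTPUT statements, so the input is read only once instead of once per subset. The table `work.claims` and the subset names are hypothetical.

```sas
/* Hypothetical input data */
data work.claims;
  input id region $ cost;
  datalines;
1 ON 120.5
2 BC 980.0
3 ON 310.2
4 QC 75.0
;
run;

/* One pass over WORK.CLAIMS writes all three subsets */
data work.ontario work.bc work.other;
  set work.claims;
  if region = 'ON' then output work.ontario;
  else if region = 'BC' then output work.bc;
  else output work.other;
run;
```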


Keep your SAS environment clean
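The sketch below deletes intermediate tables with PROC DATASETS once they are no longer needed, which frees WORK space on a shared server. The table names `temp1`, `temp2` and `final` are hypothetical.

```sas
/* Hypothetical intermediate and final tables */
data work.temp1 work.temp2 work.final;
  x = 1;
run;

/* Delete only the intermediate tables; WORK.FINAL is kept */
proc datasets library=work nolist;
  delete temp1 temp2;
quit;
```

To empty the whole WORK library, `proc datasets library=work kill nolist; quit;` deletes every member in it.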


COMPRESS=
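The sketch below shows two ways to turn on compression: as a data set option on a single output table, or as a system option for every new data set in the session. The table `work.claims_big` is hypothetical. It is padded with long, mostly blank character values so that compression has something to remove.

```sas
/* Compress one output data set */
data work.claims_big(compress=yes);
  length note $ 200;
  do id = 1 to 10000;
    note = 'routine visit';   /* 13 characters followed by trailing blanks */
    output;
  end;
run;

/* Or compress every data set created after this point in the session */
options compress=yes;
```

The log reports how much compression changed each data set's size. For short records with little repetition, compression can make the file larger. COMPRESS=BINARY is an alternative that can work better for mostly numeric data.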
