tips and tricks group... · 2016-03-11 · tips & tricks with lots of help from other sug and...

33
Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

Upload: others

Post on 24-Mar-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

Tips & Tricks

With lots of help from other SUG and

SUGI presenters

SAS CALGARY USER GROUP

OCTOBER 12, 2012

1

Page 2: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

2

Page 3: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

Sorting

3

Page 4: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

THREADS

Multi-threading available if your computer has more than one

processor (CPU)

Divide and conquer: calculations are split up and run in

parallel on the processors

4

Page 5: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

SORTING

•In memory or using a utility file

– details option

•Three copies of the data

– Original data set

– Temporary sorted files

– Sorted version of the data set

– Original file not overwritten until sort is complete …

• …. unless you specify the ‘overwrite’ option

5

Page 6: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

SORTING - TAGSORT

•Useful when there is not enough disk space to sort a large data set.

•Only sort keys (listed on the by statement) and observation number (= tags) are sorted

•Tags are used to retrieve record from input dataset in sorted order.

•If sort keys small relative to total record length, temporary disk use is reduced.

•Requires reading each observation twice

•Single-threaded.

• Considerably slower

6

Page 7: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

THE ‘NOEQUALS’

OPTION AND SQL

Default sorting behavior is to retain the order of records

within each ‘by’ group

‘noequals’ does not retain the order

Faster …..

….unless you are relying on retaining the order

SQL does the equivalent of ‘noequals’

7

Page 8: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

COMPRESSION AND

SORTING

•SAS compress removes repeating blanks, characters, and

numbers from each observation

•Adds a tag, containing the information needed to

uncompress the observation.

•Sorting may be faster on an compressed data set (less I/O)

•!!Compress can result in a data set that is larger than the

original (even in v9)

– – size of tag may be larger than saved space.

8

Page 9: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

SAVING DISK SPACE

9

Page 10: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

COMPRESS (AGAIN)

options compress = yes;

• Requires CPU to uncompress each observation prior to use

• Data sets may grow rather than shrink

• Probably not a good idea to apply compression automatically

data mydata (compress = yes);

• Check log file to see if file shrunk

10

Page 11: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

DROP VARIABLES AND

OBSERVATIONS

•Drop unnecessary variables ASAP (use the keep and drop

options on the data step)

•Get rid of unnecessary observations ASAP (use the “where”

statement on input to avoid wasting CPU time processing

observations you don’t need to process)

data females; set all (where = (sex = ‘F’));

•Murphy’s law of SAS datasets

– Any variable that is dropped from a dataset will be required

two procs later.

11

Page 12: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

LENGTH STATEMENT

•Each character (including spaces!) requires 1 byte.

data bigword;

length word $10;

length flag 3;

To change the length of a variable after the fact

data smallword;

length word $4;

set mydata;

12

Page 13: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

NUMBERS

Length in

bytes

Largest integer

represented exactly

Exponential notation

3 8,192 213

4 2,097,152 221

5 536,870,912 229

6 137,438,953,472 237

7 35,184,372,088,832 245

8 9,007,119,254,740,992 253

13

Page 14: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

14

Page 15: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

THE DATA STEP

15

Page 16: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

DATA SET LISTS

Useful for reading in like-named datasets.

Can specify libraries.

WHERE, DROP, KEEP etc. can be applied in one data step.

16

Page 17: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

MULTIPLE INPUT

DATASETS

Useful when you want to read in multiple like-named datasets

that have an incremental numbering sequence as part of the

name.

e.g.

indata1, indata2, indata3…indata50

17

Page 18: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

OPTIONS

•Write Macro

– Can complicate data step

• Can type the datasets names

– Time consuming

– Prone to error

– Do you really want to do this?

data result_out;

set indata1 indata2 indata3 indata4 indata5 indata6 indata7 indata8

indata9 indata10 indata11 indata12 indata13 indata14 indata15 indata16

indata17 indata18 indata19 indata20 indata21 indata22 indata23 indata24

indata25 indata26 indata27 indata28 indata29 indata30 indata31 indata32

indata33 indata34 indata35 indata36 indata37 indata38 indata39 indata40

indata41 indata42 indata43 indata44 indata45 indata46 indata47 indata48

indata49 indata50;

run;

18

Page 19: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

USE DASH LISTS

•Dash lists make it simple

data out;

set indata1 - indata50;

run;

•Can also specify library

data out;

set cmlib.indata1 – cmlib.indata50;

run;

19

Page 20: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

MULTIPLE INPUT

DATASETS

PART 2

Useful when you want to read in multiple like named datasets

that start with the same prefix or when all the datasets to be

read are not known

e.g. Prod_region1, Prod_region4, Prod_region7,

Prod_region21…Prod_region36

20

Page 21: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

OPTIONS

• Write Macro

– Can complicate datastep

• Can type the datasets names

– Time consuming

– Prone to error

– Do you really want to do this?

data total_production;

set Prod_region1 Prod_region1 Prod_region4 Prod_region7

Prod_region20 Prod_region21 Prod_region24,Prod_region27

Prod_region31 Prod_region32 Prod_region33 Prod_region36;

run;

21

Page 22: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

USE COLON LISTS

Colon lists make it simple

data out;

set prod_region: ; run;

Can also specify library

data out;

set CMLIB.prod_region: ; run;

22

Page 23: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

23

Page 24: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

WHAT ELSE CAN THE

DATA STEP DO?

• Lots of stuff, but…here’s one example

• You already know how to read in multiple datasets…

• What if you want to write out multiple datasets?

Data east_production (where=(region=1))

west_production (where=(region=2))

north_production (where=(region=3))

south_production (where=(region=4));

Set prod_reg:;

Run;

24

Page 25: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

25

WHAT ABOUT EG?

Page 26: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

ALL THIS APPLIES TO

EG ALSO!

26

Page 27: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

OTHER TIPS FOR EG

USERS

• Autoexec in Enterprise Guide?

• Query option to limit output during development

• Use Notes node to document Project

• Organize project using multiple process flows

• Understand your data before performing analysis on it

• Frequency counts

• Charts

• Characterize data

27

Page 28: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

28

Page 29: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

29

Page 30: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

30

Page 31: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

31

Page 32: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

QUESTIONS?

32

Page 33: Tips and Tricks Group... · 2016-03-11 · Tips & Tricks With lots of help from other SUG and SUGI presenters SAS CALGARY USER GROUP OCTOBER 12, 2012 1

33