A SAS User's Guide to Storage Management Allan Page Senior Marketing Analyst Canadian Tire Financial Services


Page 1

A SAS User's Guide to Storage Management

Allan Page
Senior Marketing Analyst
Canadian Tire Financial Services

Page 2

Files are stored on all system types

PC (All flavours of Windows)

File or Application Servers (Novell, NT, Y2KServer)

Mid-Range Systems (Unix, etc.)

Mainframes (MVS)

Page 3

All systems have two things in common

When the drives are full, they’re full!

You can’t add data to a full drive!

Page 4

Disk drives come in different sizes

PC LAN MID-RANGE MAINFRAME

Page 5

Tip # 1

WILL I EVER, EVER NEED THIS FILE AGAIN?

YES? NO?

Page 6

Tip # 1

WILL I EVER, EVER NEED THIS FILE AGAIN?

YES? NO?

If NO: DELETE

Page 7

Tip # 2

What if you said yes?

Page 8

What if you said yes?

Will I need this file right away?

Will I need this file in the near future?

Will I need this file in the far or unknown future?

YES? NO?

Page 9

I need the file right away ...

Will I need this file right away?

Will I need this file in the near future?

Will I need this file in the far or unknown future?

YES? NO?

Don’t touch it

Page 10

I need it in the near future ...

Will I need this file right away?

Will I need this file in the near future?

Will I need this file in the far or unknown future?

YES? NO?

Compress the file until it is needed.

Page 11

I need it in the near future ...

MVS uses HSM System

UNIX uses GZIP

Windows uses Winzip

Windows XP with NTFS has zip and compression utilities

Page 12

I need it in the distant future ...

Will I need this file right away?

Will I need this file in the near future?

Will I need this file in the far or unknown future?

YES? NO?

Consider Alternate Storage Media

Page 13

I need it in the distant future ...

MVS HSM will migrate to tape

UNIX systems may have access to tape storage.

For Windows, consider storing on CD

Page 14

SAS Specific Storage Efficiencies

Don’t keep duplicate files or subsets

Don’t keep unnecessary rows of data

Don’t keep unnecessary columns of data
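
A minimal sketch of the last two points, assuming the sasuser.fitness data set that appears later in this talk. The output name work.fit_subset and the choice of variables are illustrative only.

```sas
/* Read only the rows and columns that are needed.             */
/* KEEP= and WHERE= are applied as the input data set is read, */
/* so unwanted columns and rows never reach the new data set.  */
data work.fit_subset;
  set sasuser.fitness (keep=age weight oxygen
                       where=(age > 50));
run;
```

Filtering on the SET statement, rather than copying everything and trimming afterwards, avoids writing a full-size intermediate file.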

Page 15

How to create a view or use Where.

There are two types of views

1. Data step views

2. SQL views

Page 16

What a view is - and is not

A view IS a MAP to read other data in a specified form.

A view IS NOT a data store.

Page 17

Creating and using a Data Step View

data sasuser.withoutact / view=sasuser.withoutact;
  /* Comma-delimited text file; skip the header row */
  infile 'X:\Pamela\PR03150\nucomm\pmd.nuc.ctac.enroll.20030505.txt'
         firstobs=2 delimiter=',' missover dsd lrecl=32767;
  input var1 $ var2 $ var3 $ var4 $ var5 $ var6 $ var7 $;
  length v1 $14 v2 $38 v3 $42 v4 $21 v5 $4 v6 $8 v7 $12;
  array grp_a {7} $ var1-var7;   /* one element per input variable */
  /* more SAS statements */
run;

/* The text file is read only when the view is used */
proc print data=sasuser.withoutact;
run;

Page 18

Creating and Using a SQL View

proc sql;
  create view sasuser.fitview as
    select *
    from sasuser.fitness
    where age > 50;
quit;

proc freq data=sasuser.fitview;
  tables age;
run;
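
To see that a view is a stored definition rather than a copy of the data, something like the following can be run. It is a sketch that assumes the sasuser.fitview view above has already been created.

```sas
proc sql;
  /* Writes the SELECT statement stored in the view to the log */
  describe view sasuser.fitview;
quit;

/* Lists the library directory; views show a member type of VIEW, */
/* stored data sets show DATA                                     */
proc datasets library=sasuser memtype=(data view);
quit;
```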

Page 19

The LENGTH Statement

Numeric variables have a default length of 8

Character variables default to the length of first use.

Use the LENGTH statement to override the default values.

Page 20

What length should I use for numeric values?

Largest integer that each length can store exactly:

Length   Windows and Unix             MVS
2        n/a                          256
3        8,192                        65,536
4        2,097,152                    16,777,216
5        536,870,912                  4,294,967,296
6        137,438,953,472              1,099,511,627,776
7        35,184,372,088,832           281,474,976,710,656
8        9,007,199,254,740,992        72,057,594,037,927,936
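
The values in this table follow a simple pattern. The sketch below assumes that Windows and Unix store numerics in IEEE double format (1 sign bit, 11 exponent bits) and that MVS uses IBM hexadecimal floating point (1 sign bit, 7 exponent bits). N(n) is a label introduced here for the largest integer that a length of n bytes holds exactly.

```latex
N_{\text{Windows/Unix}}(n) = 2^{\,8n-11}, \qquad 3 \le n \le 8

N_{\text{MVS}}(n) = 2^{\,8n-8} = 256^{\,n-1}, \qquad 2 \le n \le 8
```

For example, a length of 3 gives 2^13 = 8,192 on Windows and Unix and 2^16 = 65,536 on MVS.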

Page 21

When to use the LENGTH statement.

It is best to use the LENGTH statement before any reference to the variable is made either by reading data or assigning values.

Page 22

Why is position important?

When SAS compiles a DATA step, it determines the attributes of the variables in the data set. All statements that set an attribute, EXCEPT for the length of the variable, are applied to the variable in order.

Page 23

Why is position important?

The length attribute for numeric variables is not applied to the variable while it is being manipulated in the step. If the length of a numeric variable is shortened, the truncation does not occur until the observation is written to the output data set.
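
A small sketch of this behaviour follows. The data set name work.len_demo and the variables x and y are made up for illustration, and the value 8193 is chosen because it is just past the 3-byte limit for Windows and Unix.

```sas
data work.len_demo;
  length x 3;              /* 3 bytes of storage for x            */
  x = 8193;
  y = x + 1;               /* arithmetic uses the full value of x */
  put 'In the step:  ' x= y=;
run;

data _null_;
  set work.len_demo;       /* x is read back as it was stored     */
  put 'After output: ' x= y=;
run;
```

On Windows or Unix the first PUT should report x=8193. The second is expected to show a smaller value for x, because only 3 bytes were written, while y keeps its default length of 8.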

Page 24

Why is position important?

The length attribute for character variables is determined by its first occurrence.
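
The rule for character variables can be sketched as follows. The data set name work.char_demo and the variables code and name are illustrative only.

```sas
data work.char_demo;
  code = 'AB';             /* first occurrence: CODE gets length 2  */
  code = 'ABCDEF';         /* too long, so the value is truncated   */

  length name $ 10;        /* LENGTH comes first: NAME gets 10      */
  name = 'AB';
  name = 'ABCDEF';         /* fits within the declared length       */

  put code= name=;
run;
```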

Page 25

Let’s look at some data.

The SAS System                              08:07 Wednesday, May 7, 2003   2

Obs   age   weight   runtime   rstpulse   runpulse   maxpulse   oxygen   group

  1    57    73.37     12.63         58        174        176   39.407       2
  2    54    79.38     11.17         62        156        165   46.080       2
  3    52    76.32      9.63         48        164        166   45.441       2

Page 26

The actual contents of this file

#   Variable   Type   Len   Pos   Label
1   age        Num      8     0   Age in years
8   group      Num      8    56   Experimental group
6   maxpulse   Num      8    40   Maximum heart rate
7   oxygen     Num      8    48   Oxygen consumption
4   rstpulse   Num      8    24   Heart rate while resting
5   runpulse   Num      8    32   Heart rate while running
3   runtime    Num      8    16   Min. to run 1.5 miles
2   weight     Num      8     8   Weight in kg

Page 27

The wrong way to use the LENGTH statement

data fitness1;
  set sasuser.fitness;
  length age rstpulse runpulse maxpulse group 3;
run;

Page 28

The correct way to use the LENGTH statement

data fitness2;
  length age rstpulse runpulse maxpulse group 3;
  set sasuser.fitness;
run;
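
One way to check what each placement actually produced is to compare the stored lengths. This sketch assumes both the fitness1 and fitness2 steps have been run in the same session, so both data sets are in WORK.

```sas
/* Compare the Len column for age, rstpulse, runpulse, */
/* maxpulse and group in the two listings              */
proc contents data=work.fitness1;
run;

proc contents data=work.fitness2;
run;
```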

Page 29

Using LENGTH in a SQL Query

proc sql;
  connect to oracle (user=&user1 pass=&pwd1 &pth1);
  create table u.skudata as
    select acct_id                       length = 6,
           datepart(post_dt) as post_dt  length = 4,
           det_item_qty as quantity      length = 4
    from connection to oracle
      ( SELECT acct_id,
               post_dt,
               det_item_qty
        FROM sku_data
        WHERE acct_id_suf = 0
          and substr(dept_id,4,8) = '00111200' )
    order by acct_id;   /* ORDER BY is part of the CREATE TABLE query */
  disconnect from oracle;
quit;

Page 30

Setting length in the ATTRIB statement

data fitness;
  attrib age length=3 informat=3. format=3.
             label='Age in Years';
  set sasuser.fitness;
run;

Page 31

Data Compression

Can be set in an OPTIONS statement or as a data set option.

Page 32

Data Compression

Can be set in an OPTIONS statement or as a data set option.

options compress=yes;

data perm.comp (compress=yes);

Page 33

Data Compression

Compresses the data set by reducing repeated consecutive characters to two- or three-byte representations.

Page 34

Data Compression - Advantages

Reduced storage requirements for the data set

Fewer input and output operations necessary to read from or write to the data set during processing.

Page 35

Data Compression - Disadvantages

The data set may not compress at all (it may actually get larger), but a message detailing the amount of compression is provided.

More CPU resources are required.

Page 36

Compression - A good example

6    libname col 'x:\colleen\2002\pr02810';
NOTE: Libref COL was successfully assigned as follows:
      Engine:        V8
      Physical Name: x:\colleen\2002\pr02810
7
8    data telephone (compress=yes);
9    set col.telephone;
10   run;

NOTE: There were 1344653 observations read from the dataset COL.TELEPHONE.
NOTE: The data set WORK.TELEPHONE has 1344653 observations and 21 variables.
NOTE: Compressing data set WORK.TELEPHONE decreased size by 29.34 percent.
      Compressed is 19794 pages; un-compressed would require 28014 pages.
NOTE: DATA statement used:
      real time           7:07.65
      cpu time            20.96 seconds

Page 37

Compression - A bad example

11   data fitness (compress=yes);
12   set sasuser.fitness;
13   run;

NOTE: There were 31 observations read from the dataset SASUSER.FITNESS.
NOTE: The data set WORK.FITNESS has 31 observations and 8 variables.
NOTE: Compressing data set WORK.FITNESS increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used:
      real time           0.27 seconds
      cpu time            0.02 seconds
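
When compression is switched on for the whole session, a small data set like this one can be exempted individually, because the data set option takes precedence over the system option. A minimal sketch follows; the output name work.fitness_nc is made up for illustration.

```sas
options compress=yes;             /* session-wide default            */

data work.fitness_nc (compress=no);   /* override for this data set  */
  set sasuser.fitness;
run;
```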

Page 38

Copyright © 2003, SAS Institute Inc. All rights reserved.