The Many Ways to Effectively Utilize Array Processing

Arthur Li

INTRODUCTION

Why do we need to use arrays?
Arrays allow us to reduce the amount of coding in the DATA step.

What is essential for learning arrays?
Compilation and execution of the DATA step
How the Program Data Vector (PDV) works

REVIEW: COMPILATION AND EXECUTION PHASES

A DATA step is processed in two sequential phases:

Compilation phase: Each statement is scanned for syntax errors.

Execution phase: If there are no syntax errors, the DATA step reads and processes the input data.

REVIEW IMPLICIT AND EXPLICIT LOOPS

REVIEW IMPLICIT LOOP

The DATA step works like a loop: an implicit loop.
It repetitively executes statements, reads data values, and creates observations in the PDV one at a time.
Each loop is called an iteration.

Suppose you have the following dataset that contains patient IDs for a clinical trial:

Patient:
     ID
1    M2390
2    F2390
3    F2340
4    M1240

You would like to assign each patient to either a drug or a placebo (50% chance of each).

data trial1 (drop=rannum);
   set patient;
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
run;

PDV during the 1st iteration (D = dropped, K = kept):

Step                                                   _N_ (D)  _ERROR_ (D)  ID (K)   RANNUM (D)  GROUP (K)
_N_ is set to 1, _ERROR_ to 0, the rest to missing     1        0                     .
The SET statement copies the 1st observation           1        0            M2390    .
RANNUM is generated                                    1        0            M2390    0.36993
GROUP is set to 'P' since RANNUM is not > 0.5          1        0            M2390    0.36993     P

At the end of the 1st iteration, the implicit OUTPUT statement writes the variables marked with (K) to the final dataset, and SAS returns to the beginning of the DATA step.

Trial1:
     ID      GROUP
1    M2390   P



2nd iteration: _N_ is incremented to 2.

PDV:
_N_ (D)  _ERROR_ (D)  ID (K)   RANNUM (D)  GROUP (K)
2        0            M2390    .

Variables that exist in the input dataset (ID):
SAS sets each variable to missing in the PDV only before the 1st iteration of the execution.
These variables retain their values in the PDV until they are replaced by new values.

Variables being created in the DATA step (RANNUM, GROUP):
SAS sets each variable to missing in the PDV at the beginning of every iteration of the execution.

2nd iteration: The SET statement copies the 2nd observation to the PDV.

PDV:
_N_ (D)  _ERROR_ (D)  ID (K)   RANNUM (D)  GROUP (K)
2        0            F2390    .

The rest of the iterations are skipped here.
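One way to watch the implicit loop at work is to write the PDV to the log. The sketch below is not part of the original slides. It assumes the Patient dataset can be rebuilt with DATALINES as a single 5-character ID variable, and it adds two PUT statements with the _ALL_ keyword to the trial1 DATA step.

```sas
/* Hypothetical stand-in for the Patient dataset shown earlier */
data patient;
   input id $5.;
   datalines;
M2390
F2390
F2340
M1240
;
run;

data trial1 (drop=rannum);
   put 'Top of iteration: ' _all_;   /* PDV before SET executes */
   set patient;
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   put 'End of iteration: ' _all_;   /* PDV just before the implicit OUTPUT */
run;
```

The first PUT runs once more than the second, because the DATA step starts a fifth iteration and stops when SET finds no more observations. In the log, ID keeps its previous value at the top of each iteration, while RANNUM and GROUP are reset to missing.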


REVIEW: OUTPUT STATEMENT

The explicit OUTPUT statement:
Writes the current observation from the PDV to the SAS dataset immediately, not at the end of the DATA step.

data trial1 (drop=rannum);
   set patient;
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;
run;

The implicit OUTPUT statement:
It tells SAS to write observations to the dataset at the end of the DATA step.
Without explicit OUTPUT statements, every DATA step contains an implicit OUTPUT statement at the end of the DATA step.

data trial1 (drop=rannum);
   set patient;
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
run;

Placing an explicit OUTPUT statement in a DATA step:
It overrides the implicit OUTPUT.
SAS adds an observation to a dataset only when an explicit OUTPUT is executed.
We can use more than one OUTPUT statement in the DATA step.


REVIEW EXPLICIT LOOP

Suppose you don't have a dataset containing the patient IDs.

You are asked to give four patients, 'M2390', 'F2390', 'F2340', and 'M1240', a 50% chance of receiving either the drug or the placebo.

You can create the ID and assign each ID to a group in the same DATA step. For example:

Assigning IDs in the DATA step:

data trial2 (drop = rannum);
   id = 'M2390';
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;

   id = 'F2390';
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;

   id = 'F2340';
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;

   id = 'M1240';
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;
run;

This DATA step uses 4 explicit OUTPUT statements in 4 almost identical blocks.

Put the identical code in a loop
Loop along the IDs
Reduce the amount of coding

ITERATIVE DO LOOP

DO INDEX-VARIABLE = VALUE1, VALUE2, …, VALUEN;
   SAS STATEMENTS
END;

INDEX-VARIABLE: ID
VALUE1 - VALUEN: 'M2390', 'F2390', 'F2340', 'M1240'
SAS STATEMENTS:
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;

data trial2 (drop = rannum);
   do id = 'M2390', 'F2390', 'F2340', 'M1240';
      rannum = ranuni(2);
      if rannum > 0.5 then group = 'D';
      else group = 'P';
      output;
   end;
run;

THE ITERATIVE DO LOOP ALONG A SEQUENCE OF INTEGERS

Suppose you are using a sequence of numbers, say 1 to 4, as patient IDs.

DO INDEX-VARIABLE = START TO STOP <BY INCREMENT>;
   SAS STATEMENTS
END;

INDEX-VARIABLE: ID
START: 1
STOP: 4
INCREMENT: 1

data trial3 (drop = rannum);
   do id = 1 to 4;
      rannum = ranuni(2);
      if rannum > 0.5 then group = 'D';
      else group = 'P';
      output;
   end;
run;
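The optional BY INCREMENT part of the syntax is not used in the trial3 example, where the increment defaults to 1. As a small illustration (the dataset name odd_ids is made up for this sketch), the loop below steps through every second integer.

```sas
data odd_ids;
   do id = 1 to 7 by 2;   /* id takes the values 1, 3, 5, 7 */
      output;             /* one observation per loop iteration */
   end;
run;
```

The loop stops when ID is incremented to 9, which exceeds STOP, so the dataset holds four observations.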

PURPOSE OF USING ARRAYS

Sbp:
     sbp1  sbp2  sbp3  sbp4  sbp5  sbp6
1    141   142   137   117   116   124
2    999   141   138   119   119   122
3    142   999   139   119   120   999
4    136   140   142   118   121   123

There are 6 measurements of SBP for each patient.

The missing values are coded as 999.

Suppose you would like to recode 999 to periods (.).

data sbp1;
   set sbp;
   if sbp1 = 999 then sbp1 = .;
   if sbp2 = 999 then sbp2 = .;
   if sbp3 = 999 then sbp3 = .;
   if sbp4 = 999 then sbp4 = .;
   if sbp5 = 999 then sbp5 = .;
   if sbp6 = 999 then sbp6 = .;
run;

The IF statements are almost identical.

Only the variable names are different.

Use a DO loop?


RECALL: DO LOOP

In the trial2 example, the four repeated blocks differed only in the values of the ID variable, so they could be replaced by one DO loop:

data trial2 (drop = rannum);
   do id = 'M2390', 'F2390', 'F2340', 'M1240';
      rannum = ranuni(2);
      if rannum > 0.5 then group = 'D';
      else group = 'P';
      output;
   end;
run;

The loop iterates along a sequence of values.

The index variable holds these values.

In the sbp1 DATA step, the six IF statements differ only in the variable names (sbp1 - sbp6).

If we can group these variables into a single unit, we can loop along these variables.

ARRAY: a temporary grouping of SAS variables

SBP:   [1]  [2]  [3]  [4]  [5]  [6]

ARRAY DEFINITION AND SYNTAX

ARRAY ARRAYNAME [DIMENSION] <$> <ELEMENTS>;

ARRAYNAME:
Must be a SAS name
Cannot be the name of a SAS variable in the same DATA step
See handouts for other rules



DIMENSION is the number of elements in the array

More on DIMENSION later…



$ indicates that the elements in the array are character elements

$ is not necessary if the elements have been previously defined as character elements



ELEMENTS are the variables to be included in the array

Must be either all numeric or all character

More on ELEMENTS later…



array sbparray [6] sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;



array sbparray [*] sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;

You can use an asterisk (*) as DIMENSION

You must include ELEMENTS



DIMENSION can be enclosed in parentheses, braces, or brackets:

array sbparray (6) sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;
array sbparray {6} sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;
array sbparray [6] sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;



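If ELEMENTS are not specified, the element names are formed from the array name followed by 1, 2, and so on. For example,

array sbp [6];

is equivalent to

array sbp [6] sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;

Case 1: If sbp1 - sbp6 were previously defined in the DATA step, the array groups these existing variables.
Case 2: If sbp1 - sbp6 were not previously defined in the DATA step, they will be created by the ARRAY statement.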



array num [*] _numeric_;
array char [*] _character_;
array allvar [*] _all_;

_NUMERIC_: all numeric variables
_CHARACTER_: all character variables
_ALL_: all the variables; the variables must be either all numeric or all character



array sbp [6] sbp1 - sbp6;

A single dash format can be used to specify a range of variables


To reference an array element:

ARRAYNAME [INDEX];

INDEX:
must be enclosed in ( ), [ ], or { }
is specified as an integer, a numeric variable, or a SAS expression
must be within the lower and upper bounds of the DIMENSION of the array



With an array and a DO loop, the six IF statements of the sbp1 DATA step reduce to one:

data sbp2 (drop=i);
   set sbp;
   array sbparray[6] sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;
   do i = 1 to 6;
      if sbparray[i] = 999 then sbparray[i] = .;
   end;
run;

The ARRAY statement can also be written with a variable range:

array sbparray [6] sbp1 - sbp6;

The elements can be omitted as well, since array sbp [6]; is equivalent to array sbp [6] sbp1 - sbp6;

data sbp2 (drop=i);
   set sbp;
   array sbp [6];
   do i = 1 to 6;
      if sbp[i] = 999 then sbp[i] = .;
   end;
run;

THE DIM FUNCTION

Use the DIM function to determine the number of elements in an array:

DIM(ARRAYNAME)

It is convenient when you use _NUMERIC_, _CHARACTER_, or _ALL_ as array ELEMENTS.

data sbp3 (drop=i);
   set sbp;
   array sbparray [*] sbp1 - sbp6;
   do i = 1 to dim(sbparray);
      if sbparray[i] = 999 then sbparray[i] = .;
   end;
run;
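To illustrate why DIM pairs well with _NUMERIC_, here is a minimal sketch that recodes 999 in every numeric variable without listing any names. The Sbp dataset is rebuilt with DATALINES so the example runs on its own; the names sbp_all and nums are made up for the illustration.

```sas
/* Stand-in for the Sbp dataset used throughout */
data sbp;
   input sbp1-sbp6;
   datalines;
141 142 137 117 116 124
999 141 138 119 119 122
142 999 139 119 120 999
136 140 142 118 121 123
;
run;

data sbp_all (drop=i);
   set sbp;
   array nums [*] _numeric_;      /* all numeric variables known at this point */
   do i = 1 to dim(nums);         /* dim(nums) is 6 here */
      if nums[i] = 999 then nums[i] = .;
   end;
run;
```

Because the index variable I is first mentioned after the ARRAY statement, it is not one of the array elements.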

ASSIGNING INITIAL VALUES TO AN ARRAY

When creating a group of variables by using the ARRAY statement, you can assign initial values to the array elements

array num[3] n1 n2 n3 (1 2 3);

array chr[3] $ ('A', 'B', 'C');
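A quick way to check these initial values is to write them to the log. In this sketch the dataset name init_demo is made up; the second ARRAY statement lists no ELEMENTS, so SAS creates chr1 - chr3.

```sas
data init_demo;
   array num[3] n1 n2 n3 (1 2 3);
   array chr[3] $ ('A', 'B', 'C');   /* creates chr1, chr2, chr3 */
   put n1= n2= n3= chr1= chr2= chr3=;
run;
```

The log shows n1=1 n2=2 n3=3 chr1=A chr2=B chr3=C, and the single observation written to init_demo holds the same values.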

TEMPORARY ARRAYS

Temporary arrays contain temporary data elements.
Using temporary arrays is useful when you want to create an array only for calculation purposes.
When referring to a temporary data element, you refer to it by the ARRAYNAME and its DIMENSION.
You cannot use the asterisk (*) with temporary arrays.
They are not output to the output dataset.
They are always automatically retained.
To create a temporary array, you need to use the keyword _TEMPORARY_:

array num[3] _temporary_ (1 2 3);
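As a small sketch of a calculation-only array, the step below holds a set of weights in a temporary array and uses them to compute a weighted total. The dataset weighted, the variables q1 - q3, and the weights are all invented for the illustration.

```sas
data weighted (drop=i);
   input q1-q3;
   array q[3];                         /* groups q1 - q3 */
   array w[3] _temporary_ (1 2 3);     /* weights; never written to the dataset */
   total = 0;
   do i = 1 to 3;
      total = total + w[i] * q[i];
   end;
   datalines;
10 20 30
40 50 60
;
run;
```

The output dataset contains q1 - q3 and TOTAL (140 and 320) but no variables for the weights, since temporary elements have no variable names.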

COMPILATION AND EXECUTION PHASES

COMPILATION PHASE

data sbp2 (drop=i);
   set sbp;
   array sbparray[6] sbp1 - sbp6;
   do i = 1 to 6;
      if sbparray[i] = 999 then sbparray[i] = .;
   end;
run;

The PDV is created.
The array name SBPARRAY and the array references are not included in the PDV.
Each variable, SBP1 - SBP6, is referenced by an array reference.
Syntax errors in the ARRAY statement will be detected during the compilation phase.

PDV:
Variable:          _N_ (D)  SBP1 (K)     SBP2 (K)     SBP3 (K)     SBP4 (K)     SBP5 (K)     SBP6 (K)     I (D)
Array reference:            SBPARRAY[1]  SBPARRAY[2]  SBPARRAY[3]  SBPARRAY[4]  SBPARRAY[5]  SBPARRAY[6]

EXECUTION PHASE

Sbp:
     sbp1  sbp2  sbp3  sbp4  sbp5  sbp6
1    141   142   137   117   116   124
2    999   141   138   119   119   122
3    142   999   139   119   120   999
4    136   140   142   118   121   123

PDV columns in the steps below: _N_, SBP1, SBP2, SBP3, SBP4, SBP5, SBP6, I

1st iteration of the DATA step:

_N_ is set to 1. The rest of the variables are set to missing.
PDV:   1   .     .     .     .     .     .     .

The SET statement copies the 1st observation from Sbp to the PDV.
PDV:   1   141   142   137   117   116   124   .

The ARRAY statement is a compile-time only statement, so nothing is executed for it.
PDV:   1   141   142   137   117   116   124   .

1st iteration of the DO loop: I is set to 1.
PDV:   1   141   142   137   117   116   124   1

SBPARRAY[i] is SBPARRAY[1], which refers to SBP1. Since SBP1 ≠ 999, the assignment is not executed.
SAS reaches the end of the DO loop.

2nd iteration of the DO loop: I is incremented to 2. Since I ≤ 6, the loop continues.
PDV:   1   141   142   137   117   116   124   2

SBPARRAY[i] is SBPARRAY[2], which refers to SBP2. Since SBP2 ≠ 999, the assignment is not executed.
SAS reaches the end of the DO loop. The rest of the DO loop iterations are skipped here.

SAS reaches the end of the DATA step. The implicit OUTPUT executes.
PDV:   1   141   142   137   117   116   124   7

Sbp2:
     sbp1  sbp2  sbp3  sbp4  sbp5  sbp6
1    141   142   137   117   116   124

2nd iteration of the DATA step:

_N_ is incremented to 2. SBP1 - SBP6 are retained. I is set to missing.
PDV:   2   141   142   137   117   116   124   .

The SET statement copies the 2nd observation to the PDV.
PDV:   2   999   141   138   119   119   122   .

1st iteration of the DO loop: I is set to 1.
PDV:   2   999   141   138   119   119   122   1

SBPARRAY[i] is SBPARRAY[1], which refers to SBP1. Since SBP1 = 999, SBP1 is set to missing.
PDV:   2   .     141   138   119   119   122   1

SAS reaches the end of the DO loop. The rest of the DO loop iterations are skipped here.

SAS reaches the end of the DATA step. The implicit OUTPUT executes.
PDV:   2   .     141   138   119   119   122   7

Sbp2:
     sbp1  sbp2  sbp3  sbp4  sbp5  sbp6
1    141   142   137   117   116   124
2    .     141   138   119   119   122

The rest of the DATA step iterations are skipped here.


SOME ARRAY APPLICATIONS

CREATING A GROUP OF VARIABLES BY USING ARRAYS

Sbp2:
     Pre-treatment       Post-treatment
     sbp1  sbp2  sbp3    sbp4  sbp5  sbp6
1    141   142   137     117   116   124
2    .     141   138     119   119   122
3    142   .     139     119   120   .
4    136   140   142     118   121   123

MEAN SBP: 140 (pre-treatment), 120 (post-treatment)

Sbp4 (new variables):
     above1  above2  above3  above4  above5  above6
1    1       1       0       0       0       1
2    .       1       0       0       0       1
3    1       .       0       0       0       .
4    0       0       1       0       1       1

data sbp4 (drop=i);
   set sbp2;
   array sbp[6];
   array above[6];
   array threshhold[6] _temporary_ (140 140 140 120 120 120);
   do i = 1 to 6;
      if (not missing(sbp[i])) then above[i] = sbp[i] > threshhold[i];
   end;
run;

array sbp[6]: used to group the existing variables sbp1 - sbp6
array above[6]: used to create the variables above1 - above6
array threshhold[6]: the temporary array is for comparison purposes



THE IN OPERATOR


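The IN operator can search the elements of an array for a value. Here MISS is set to 1 when any of sbp1 - sbp6 is missing:

Sbp6:
     sbp1  sbp2  sbp3  sbp4  sbp5  sbp6  miss
1    141   142   137   117   116   124   0
2    .     141   138   119   119   122   1
3    142   .     139   119   120   .     1
4    136   140   142   118   121   123   0

data sbp6;
   set sbp2;
   array sbp [6];
   if . IN sbp then miss = 1;
   else miss = 0;
run;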

CALCULATING PRODUCTS OF MULTIPLE VARIABLES

Test:
     num1  num2  num3  num4
1    4     .     2     3
2    .     2     3     1

Approach:
1. Create an array: num[4] (used to group the existing variables num1 - num4)
2. Treat a missing value as 1
3. Set result = num[1]
   Loop: i = 2 to 4
      result = result * num[i]
   End Loop

data product (drop=i);
   set test;
   array num[4];
   if missing(num[1]) then result = 1;
   else result = num[1];
   do i = 2 to 4;
      if not missing(num[i]) then result = result * num[i];
   end;
run;
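Working the approach through by hand for the two Test observations gives 4 × 2 × 3 = 24 for the first (the missing num2 is skipped) and 1 × 2 × 3 × 1 = 6 for the second (the missing num1 is treated as 1). The sketch below rebuilds Test with DATALINES, which is an assumption about how the data are stored, and reruns the product step so the results can be checked.

```sas
data test;
   input num1-num4;
   datalines;
4 . 2 3
. 2 3 1
;
run;

data product (drop=i);
   set test;
   array num[4];
   if missing(num[1]) then result = 1;   /* a missing value is treated as 1 */
   else result = num[1];
   do i = 2 to 4;
      if not missing(num[i]) then result = result * num[i];
   end;
run;

proc print data=product;
run;
```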

RESTRUCTURING DATASETS USING ARRAYS

Restructuring datasets: converting between
data with one observation per subject (the wide format) and
data with multiple observations per subject (the long format)

Wide:
     ID    S1  S2  S3
1    A01   3   4   5
2    A02   4   .   2

Long:
     ID    TIME  SCORE
1    A01   1     3
2    A01   2     4
3    A01   3     5
4    A02   1     4
5    A02   3     2

FROM WIDE FORMAT TO LONG FORMAT (WITHOUT USING ARRAYS)

Transform wide to long:
There are 2 observations to read, so there are 2 DATA step iterations
Use multiple OUTPUT statements
Any missing values in S1 - S3 will not be output to long

data long (drop=s1-s3);
   set wide;

   time = 1;
   score = s1;
   if not missing(score) then output;

   time = 2;
   score = s2;
   if not missing(score) then output;

   time = 3;
   score = s3;
   if not missing(score) then output;
run;

FROM WIDE FORMAT TO LONG FORMAT (USING ARRAYS)

Group S1 - S3 into an array:

array s[3];

S1, S2, and S3 can then be referenced as S[1], S[2], and S[3].

Create a DO loop with TIME as the index variable:

data long (drop=s1-s3);
   set wide;
   array s[3];
   do time = 1 to 3;
      score = s[time];
      if not missing(score) then output;
   end;
run;

FROM LONG FORMAT TO WIDE FORMAT

Reading 5 observations but only creating 2 observations

You are not copying data from the PDV to the final dataset at each iteration

You only need to generate one observation once all the observations for each subject have been processed

Long:
     ID    TIME  SCORE
1    A01   1     3
2    A01   2     4
3    A01   3     5
4    A02   1     4
5    A02   3     2

Wide:
     ID    S1  S2  S3
1    A01   3   4   5
2    A02   4   .   2

REVIEW THE RETAIN STATEMENT

To prevent the VARIABLE from being reinitialized each time the DATA step executes, use the RETAIN statement:

RETAIN VARIABLE <VALUE>;

VARIABLE: Name of the variable that we want to retain
VALUE: A numeric value, used to initialize the VARIABLE only at the first iteration of the DATA step execution
If an initial value is not specified, the VARIABLE is initialized as missing.
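As a small illustration of RETAIN, the sketch below keeps a running maximum across observations. The dataset running_max and its variables are invented for the example.

```sas
data running_max;
   input score;
   retain max_score;                              /* not reset to missing at each iteration */
   if score > max_score then max_score = score;   /* a missing value sorts below any number */
   datalines;
3
5
2
;
run;
```

MAX_SCORE is 3, 5, and 5 in the three observations. Without the RETAIN statement it would be reset to missing at the top of every iteration and would simply equal SCORE.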

REVIEW: THE SUM STATEMENT

The SUM statement has the following form:

VARIABLE + EXPRESSION;

VARIABLE: The numeric accumulator variable that is to be created
It is automatically set to 0 at the beginning of the first iteration of the DATA step execution
It is retained in the following iterations

EXPRESSION: Any SAS expression
If EXPRESSION evaluates to a missing value, it is treated as 0
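A minimal sketch of the SUM statement, using an invented dataset running_total, shows both properties: the accumulator starts at 0 and a missing EXPRESSION adds nothing.

```sas
data running_total;
   input amount;
   total + amount;   /* sum statement: total starts at 0 and is retained */
   datalines;
10
.
5
;
run;
```

TOTAL is 10, 10, and 15 in the three observations. An ordinary assignment such as total = total + amount; would instead give missing from the start, because TOTAL would be neither initialized to 0 nor retained.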

REVIEW: FIRST.VARIABLE AND LAST.VARIABLE

You only output the data after you finish reading the last observation of each subject.

Thus, you need to identify the last observation.

BY-group processing method:

proc sort data=b;
   by by_variable;
run;

data a;
   set b;
   by by_variable;
   ... ...
run;

For each BY variable, SAS creates two temporary variables: FIRST.VARIABLE and LAST.VARIABLE.

FIRST.VARIABLE and LAST.VARIABLE are set to 1 at the beginning of the execution phase.

They are not output to the final dataset.


Suppose ID is the BY variable:

     ID    SCORE  FIRST.ID  LAST.ID
1    A01   3      1         0         SAS reads the 1st observation for ID = A01
2    A01   3      0         0
3    A01   2      0         1         SAS reads the last observation for ID = A01
4    A02   4      1         0
5    A02   2      0         1

Grouping based on ID: observations 1 - 3 form group 1 and observations 4 - 5 form group 2.
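Because FIRST.ID and LAST.ID are never written to the output dataset, one way to inspect them is to copy them into ordinary variables. The sketch below does this for the five observations shown; the dataset names scores and flags and the variable names first_id and last_id are made up for the illustration.

```sas
data scores;
   input id $ score;
   datalines;
A01 3
A01 3
A01 2
A02 4
A02 2
;
run;

data flags;
   set scores;            /* already sorted by id */
   by id;
   first_id = first.id;   /* copy the temporary variables so they are kept */
   last_id  = last.id;
run;

proc print data=flags;
run;
```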


REVIEW SUBSETTING IF STATEMENT

Use the subsetting IF statement to continue processing only the observations that meet the condition of the specified expression:

IF EXPRESSION;

If the EXPRESSION is true for the observation:
SAS continues to execute statements in the DATA step and includes the current observation in the data set

If the EXPRESSION is false:
no further statements are processed for that observation
the current observation is not written to the data set
the remaining program statements in the DATA step are not executed
SAS immediately returns to the beginning of the DATA step
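A short sketch of the subsetting IF, with an invented dataset high_scores, shows that statements placed after it run only for the observations that pass.

```sas
data high_scores;
   input id $ score;
   if score >= 4;        /* subsetting IF: stop here when the condition is false */
   flag = 'kept';        /* executed only for observations that pass */
   datalines;
A01 3
A02 4
A03 5
;
run;
```

The resulting dataset has two observations, A02 and A03. For A01 the condition is false, so SAS returns to the top of the DATA step without reaching the implicit OUTPUT.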

FROM LONG FORMAT TO WIDE FORMAT (WITHOUT USING ARRAYS)

Long:
     ID    TIME  SCORE
1    A01   1     3
2    A01   2     4
3    A01   3     5
4    A02   1     4
5    A02   3     2

Wide:
     ID    S1  S2  S3
1    A01   3   4   5
2    A02   4   .   2

Copy SCORE to S1, S2, or S3 according to TIME:

if time = 1 then s1 = score;
else if time = 2 then s2 = score;
else s3 = score;

RETAIN S1 - S3
Use BY-group processing: BY ID
Output to the final data when LAST.ID = 1

proc sort data=long;
   by id;
run;

data wide (drop=time score);
   set long;
   by id;
   retain s1 - s3;
   if time = 1 then s1 = score;
   else if time = 2 then s2 = score;
   else s3 = score;
   if last.id;
run;

EXECUTION PHASE

data wide (drop=time score);
   set long;
   by id;
   retain s1 - s3;
   if time = 1 then s1 = score;
   else if time = 2 then s2 = score;
   else s3 = score;
   if last.id;
run;

Long:
     ID    TIME  SCORE
1    A01   1     3
2    A01   2     4
3    A01   3     5
4    A02   1     4
5    A02   3     2

PDV after each step (D = dropped, K = kept):

Step                                                _N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
1st iteration: _N_ set to 1, FIRST.ID and
   LAST.ID set to 1, other variables missing        1        1             1                    .         .          .       .       .
SET copies the 1st observation; FIRST.ID is 1
   (1st observation for A01), LAST.ID is 0          1        1             0            A01     1         3          .       .       .
Since TIME = 1, S1 is set to SCORE (3)              1        1             0            A01     1         3          3       .       .
Since LAST.ID ≠ 1, SAS returns to the beginning
   of the DATA step to begin the 2nd iteration      1        1             0            A01     1         3          3       .       .

2nd iteration: _N_ incremented to 2                 2        1             0            A01     1         3          3       .       .
SET copies the 2nd observation; FIRST.ID is 0,
   LAST.ID is 0                                     2        0             0            A01     2         4          3       .       .
Since TIME = 2, S2 is set to SCORE (4)              2        0             0            A01     2         4          3       4       .
Since LAST.ID ≠ 1, SAS returns to the beginning
   of the DATA step to begin the 3rd iteration      2        0             0            A01     2         4          3       4       .

3rd iteration: _N_ incremented to 3                 3        0             0            A01     2         4          3       4       .
SET copies the 3rd observation; FIRST.ID is 0,
   LAST.ID is 1 (last observation for A01)          3        0             1            A01     3         5          3       4       .
Since TIME = 3, S3 is set to SCORE (5)              3        0             1            A01     3         5          3       4       5
Since LAST.ID = 1, SAS continues; the implicit
   OUTPUT executes at the end of the iteration      3        0             1            A01     3         5          3       4       5

4th iteration: _N_ incremented to 4                 4        0             1            A01     3         5          3       4       5
SET copies the 4th observation; FIRST.ID is 1
   (1st observation for A02), LAST.ID is 0          4        1             0            A02     1         4          3       4       5
Since TIME = 1, S1 is set to SCORE (4)              4        1             0            A02     1         4          4       4       5
Since LAST.ID ≠ 1, SAS returns to the beginning
   of the DATA step to begin the 5th iteration      4        1             0            A02     1         4          4       4       5

5th iteration: _N_ incremented to 5                 5        1             0            A02     1         4          4       4       5
SET copies the 5th observation; FIRST.ID is 0,
   LAST.ID is 1 (last observation for A02)          5        0             1            A02     3         2          4       4       5
Since TIME = 3, S3 is set to SCORE (2)              5        0             1            A02     3         2          4       4       2
Since LAST.ID = 1, SAS continues; the implicit
   OUTPUT executes at the end of the iteration      5        0             1            A02     3         2          4       4       2

Why the values carry over from one iteration to the next:
FIRST.ID and LAST.ID are retained; they are automatic variables.
ID, TIME, and SCORE are retained; they are from the input dataset.
S1, S2, and S3 are retained because of the RETAIN statement.

When the implicit OUTPUT executes, the variables marked with (K) are copied to the dataset wide.

Wide:
     ID    S1  S2  S3
1    A01   3   4   5
2    A02   4   4   2

S2 for A02 is 4, a value carried over from A01 by the RETAIN statement, although A02 has no score at TIME = 2.

How to fix this?

Reset S1 - S3 to missing at the first observation of each subject:

data wide (drop=time score);
   set long;
   by id;
   retain s1 - s3;
   if first.id then do;
      s1 = .;
      s2 = .;
      s3 = .;
   end;
   if time = 1 then s1 = score;
   else if time = 2 then s2 = score;
   else s3 = score;
   if last.id;
run;

FROM LONG FORMAT TO WIDE FORMAT (USING ARRAYS)

Group S1 - S3 into an array:

array s[3];

PDV:
Variable:          _N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
Array reference:                                                                    S[1]    S[2]    S[3]

Each part of the DATA step without arrays has an array counterpart:

retain s1 - s3;
   becomes
retain s;

if first.id then do; s1 = .; s2 = .; s3 = .; end;
   becomes
if first.id then do; do i = 1 to 3; s[i] = .; end; end;

if time = 1 then s1 = score; else if time = 2 then s2 = score; else s3 = score;
   becomes
s[time] = score;

For example, for the 1st observation (A01, TIME = 1, SCORE = 3), S[TIME] is S[1], so S1 is set to 3. For the 2nd observation (A01, TIME = 2, SCORE = 4), S[TIME] is S[2], so S2 is set to 4.

data wide (drop = time score i);
   set long;
   by id;
   array s[3];
   retain s;
   if first.id then do;
      do i = 1 to 3;
         s[i] = .;
      end;
   end;
   s[time] = score;
   if last.id;
run;

MULTIDIMENSIONAL ARRAYS

ARRAY ARRAYNAME[R, C, …] <$> <ELEMENTS>;

The difference between one- and multi-dimensional arrays is the DIMENSION

R: number of rows
C: number of columns
If there are 3 dimensions, the next number will refer to the number of pages

MULTIDIMENSIONAL ARRAYS

array a[2,3];

is equivalent to

array a[2,3] a1 - a6;

          1     2     3
    1     a1    a2    a3
    2     a4    a5    a6

a[2,2] refers to a5
a[1,3] refers to a3
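The layout above can be checked with a short step that walks the array row by row. This is only an illustrative sketch: the initial values 1 to 6 and the names R, C and VALUE are assumptions made for the demonstration.

```sas
data _null_;
    /* Initial values are assigned in storage order: a1=1, ..., a6=6. */
    array a[2,3] a1 - a6 (1 2 3 4 5 6);
    do r = 1 to dim(a, 1);          /* first dimension: rows     */
        do c = 1 to dim(a, 2);      /* second dimension: columns */
            value = a[r, c];
            put r= c= value=;       /* written to the SAS log    */
        end;
    end;
run;
```

Because the elements are filled one row at a time, the log line for r=2 and c=2 shows value=5, which is a5, and the line for r=1 and c=3 shows value=3, which is a3.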

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

Dat1:

         ID   G1   G2   G3   FIRST.ID   LAST.ID
    1    1    A    B    F    1          0
    2    1    B    A    C    0          1
    3    2    B    A    D    1          0
    4    2    C    B    C    0          1

Dat2:

         ID   M_G1   M_G2   M_G3   F_G1   F_G2   F_G3
    1    1    A      B      F      B      A      C
    2    2    B      A      D      C      B      C

Create ONE observation after you finish reading ALL the observations for EACH person

Use BY-group processing

The output will be generated when LAST.ID equals 1
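The FIRST.ID and LAST.ID values can be inspected directly by writing them to the log. The sketch below builds Dat1 from the table values and prints the two automatic variables for each observation; nothing here goes beyond standard BY-group processing.

```sas
/* Build DAT1 from the values shown in the Dat1 table. */
data dat1;
    input id g1 $ g2 $ g3 $;
    datalines;
1 A B F
1 B A C
2 B A D
2 C B C
;
run;

data _null_;
    set dat1;
    by id;                              /* DAT1 is already in ID order   */
    put _n_= id= first.id= last.id=;    /* one log line per observation  */
run;
```

The log should show FIRST.ID as 1, 0, 1, 0 and LAST.ID as 0, 1, 0, 1, the same pattern listed beside Dat1.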

G[3] (used to group existing variables):

          1     2     3
          G1    G2    G3

ALL_G[2,3] (used to create new variables):

          1       2       3
    1     M_G1    M_G2    M_G3
    2     F_G1    F_G2    F_G3

The elements of ALL_G must be retained (RETAIN) across iterations so that the values from the first observation of an ID are still there when the second one is read. The sum statement

i + 1;

advances the row index I for each observation within an ID.


proc sort data=dat1;
    by id;
run;

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;
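Because ALL_G is retained, the step above relies on every ID having exactly two observations. The sketch below is a hedged variation, not the presenter's code: it uses a hypothetical dataset DAT1_SHORT in which ID 2 has only one observation, and it clears ALL_G at FIRST.ID so that values from the previous ID are not carried over.

```sas
/* Hypothetical input: ID 2 has only one observation. */
data dat1_short;
    input id g1 $ g2 $ g3 $;
    datalines;
1 A B F
1 B A C
2 B A D
;
run;

data dat2_short (drop = i j g1 - g3);
    set dat1_short;
    by id;
    array all_g[2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then do;
        i = 0;
        /* Assumption: clear all six elements at the start of each ID. */
        call missing(of all_g[*]);
    end;
    i + 1;
    do j = 1 to dim(g);
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

proc print data=dat2_short;
run;
```

Without the CALL MISSING, the output observation for ID 2 would still hold B, A, C in F_G1 to F_G3 from ID 1; with it, those three variables are blank. An ID with more than two observations would still push I past the array bounds, so that case needs its own check.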



Tracing the DATA step through the PDV

Dat1:

         ID   G1   G2   G3
    1    1    A    B    F
    2    1    B    A    C
    3    2    B    A    D
    4    2    C    B    C

PDV (K = kept in the output dataset, D = dropped):

    Dropped: _N_, FIRST.ID, LAST.ID, G1, G2, G3, I, J
    Kept:    ID, M_G1, M_G2, M_G3, F_G1, F_G2, F_G3

G[J] refers to G1-G3. ALL_G[I,J] refers to M_G1-M_G3 when I = 1 and to F_G1-F_G3 when I = 2.

Step                          _N_  FIRST.ID  LAST.ID  ID  G1  G2  G3  I  J   M_G1  M_G2  M_G3  F_G1  F_G2  F_G3

1st iteration:
start of iteration            1    1         1        .   .   .   .   0  .   .     .     .     .     .     .
SET reads observation 1       1    1         0        1   A   B   F   0  .   .     .     .     .     .     .
i + 1                         1    1         0        1   A   B   F   1  .   .     .     .     .     .     .
j = 1                         1    1         0        1   A   B   F   1  1   .     .     .     .     .     .
all_g[1,1] = g[1]             1    1         0        1   A   B   F   1  1   A     .     .     .     .     .
j = 2                         1    1         0        1   A   B   F   1  2   A     .     .     .     .     .
all_g[1,2] = g[2]             1    1         0        1   A   B   F   1  2   A     B     .     .     .     .
j = 3                         1    1         0        1   A   B   F   1  3   A     B     .     .     .     .
all_g[1,3] = g[3]             1    1         0        1   A   B   F   1  3   A     B     F     .     .     .
j = 4, DO loop ends           1    1         0        1   A   B   F   1  4   A     B     F     .     .     .
if last.id: false, no output  1    1         0        1   A   B   F   1  4   A     B     F     .     .     .

2nd iteration:
start of iteration            2    1         0        1   A   B   F   1  .   A     B     F     .     .     .
SET reads observation 2       2    0         1        1   B   A   C   1  .   A     B     F     .     .     .
i + 1                         2    0         1        1   B   A   C   2  .   A     B     F     .     .     .
j = 1                         2    0         1        1   B   A   C   2  1   A     B     F     .     .     .
all_g[2,1] = g[1]             2    0         1        1   B   A   C   2  1   A     B     F     B     .     .
j = 2                         2    0         1        1   B   A   C   2  2   A     B     F     B     .     .
all_g[2,2] = g[2]             2    0         1        1   B   A   C   2  2   A     B     F     B     A     .
j = 3                         2    0         1        1   B   A   C   2  3   A     B     F     B     A     .
all_g[2,3] = g[3]             2    0         1        1   B   A   C   2  3   A     B     F     B     A     C
j = 4, DO loop ends           2    0         1        1   B   A   C   2  4   A     B     F     B     A     C
if last.id: true, output      2    0         1        1   B   A   C   2  4   A     B     F     B     A     C

Dat2 after the 2nd iteration:

         ID   M_G1   M_G2   M_G3   F_G1   F_G2   F_G3
    1    1    A      B      F      B      A      C

3rd iteration:
start of iteration            3    0         1        1   B   A   C   2  .   A     B     F     B     A     C
SET reads observation 3       3    1         0        2   B   A   D   2  .   A     B     F     B     A     C
if first.id then i = 0        3    1         0        2   B   A   D   0  .   A     B     F     B     A     C
i + 1                         3    1         0        2   B   A   D   1  .   A     B     F     B     A     C
j = 1                         3    1         0        2   B   A   D   1  1   A     B     F     B     A     C
all_g[1,1] = g[1]             3    1         0        2   B   A   D   1  1   B     B     F     B     A     C
j = 2                         3    1         0        2   B   A   D   1  2   B     B     F     B     A     C
all_g[1,2] = g[2]             3    1         0        2   B   A   D   1  2   B     A     F     B     A     C
j = 3                         3    1         0        2   B   A   D   1  3   B     A     F     B     A     C
all_g[1,3] = g[3]             3    1         0        2   B   A   D   1  3   B     A     D     B     A     C
j = 4, DO loop ends           3    1         0        2   B   A   D   1  4   B     A     D     B     A     C
if last.id: false, no output  3    1         0        2   B   A   D   1  4   B     A     D     B     A     C

4th iteration:
start of iteration            4    1         0        2   B   A   D   1  .   B     A     D     B     A     C
SET reads observation 4       4    0         1        2   C   B   C   1  .   B     A     D     B     A     C
i + 1                         4    0         1        2   C   B   C   2  .   B     A     D     B     A     C
j = 1                         4    0         1        2   C   B   C   2  1   B     A     D     B     A     C
all_g[2,1] = g[1]             4    0         1        2   C   B   C   2  1   B     A     D     C     A     C
j = 2                         4    0         1        2   C   B   C   2  2   B     A     D     C     A     C
all_g[2,2] = g[2]             4    0         1        2   C   B   C   2  2   B     A     D     C     B     C
j = 3                         4    0         1        2   C   B   C   2  3   B     A     D     C     B     C
all_g[2,3] = g[3]             4    0         1        2   C   B   C   2  3   B     A     D     C     B     C
j = 4, DO loop ends           4    0         1        2   C   B   C   2  4   B     A     D     C     B     C
if last.id: true, output      4    0         1        2   C   B   C   2  4   B     A     D     C     B     C

Dat2 after the 4th iteration:

         ID   M_G1   M_G2   M_G3   F_G1   F_G2   F_G3
    1    1    A      B      F      B      A      C
    2    2    B      A      D      C      B      C


Creating TWO observations after you finish reading ONE observation

Dat2 (input):

         ID   M_G1   M_G2   M_G3   F_G1   F_G2   F_G3
    1    1    A      B      F      B      A      C
    2    2    B      A      D      C      B      C

Dat1 (output):

         ID   G1   G2   G3
    1    1    A    B    F
    2    1    B    A    C
    3    2    B    A    D
    4    2    C    B    C

G[3] (used to create new variables):

          1     2     3
          G1    G2    G3

ALL_G[2,3] (used to group existing variables):

          1       2       3
    1     M_G1    M_G2    M_G3
    2     F_G1    F_G2    F_G3


data dat1 (drop = i j m_g1 -- f_g3);
    set dat2;
    array all_g [2,3] m_g1 -- f_g3;
    array g[3] $;
    do i = 1 to 2;
        do j = 1 to 3;
            g[j] = all_g[i,j];
        end;
        output;
    end;
run;
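The wide-to-long step can be tried on its own with the sketch below. It builds Dat2 from the table values and writes the result to a hypothetical dataset DAT1_BACK (rather than DAT1) so the restructured rows can be compared with the original. The loop bounds use DIM instead of the literals 2 and 3, and the index names R and C are illustrative choices.

```sas
/* Build DAT2 from the values shown in the Dat2 table. */
data dat2;
    input id m_g1 $ m_g2 $ m_g3 $ f_g1 $ f_g2 $ f_g3 $;
    datalines;
1 A B F B A C
2 B A D C B C
;
run;

data dat1_back (drop = r c m_g1 -- f_g3);
    set dat2;
    array all_g[2,3] m_g1 -- f_g3;   /* groups the existing variables */
    array g[3] $;                    /* creates G1-G3 as character    */
    do r = 1 to dim(all_g, 1);       /* one output row per array row  */
        do c = 1 to dim(all_g, 2);
            g[c] = all_g[r, c];
        end;
        output;                      /* explicit OUTPUT inside the loop */
    end;
run;

proc print data=dat1_back;
run;
```

The explicit OUTPUT statement inside the outer loop is what produces two observations per input observation, so the printed dataset should contain the four rows of Dat1.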

CONCLUSION

Array processing lets you write more compact and less repetitive DATA step code

To use arrays correctly, you need to grasp the array syntax and also understand how DATA steps are processed

Most array errors trace back to programming fundamentals, in particular understanding how the PDV works

ACKNOWLEDGEMENT

I would like to thank Helen Wang and Cindy Song for giving me the opportunity to present at PharmaSUG 2011

CONTACT INFORMATION

Arthur Li

City of Hope

Division of Information Science

1500 East Duarte Road

Duarte, CA 91010-3000

Phone: (626) 256-4673 ext. 65121

E-mail: [email protected]
