the many ways to effectively utilize array processing

The Many Ways to Effectively Utilize Array Processing Arthur Li


Post on 30-Nov-2014


DESCRIPTION

Utilizing array processing allows us to reduce the amount of coding in the DATA step. In addition to learning how to create one- and multi-dimensional arrays, this paper will review how to create an explicit loop in the DATA step - the prerequisite of constructing an array. You will also be exposed to what happens in the Program Data Vector (PDV) during array processing. A wide range of applications in using loop structures with array processing, such as recoding missing values for a list of variables, transforming datasets, etc., will be covered in this paper.

TRANSCRIPT

Page 1: The many ways to effectively utilize array processing

The Many Ways to Effectively Utilize Array Processing

Arthur Li

Page 2: The many ways to effectively utilize array processing

INTRODUCTION

Why do we need to use arrays?

Allows us to reduce the amount of coding in the DATA step

What is essential for learning arrays?

Compilation and execution of the DATA step

How the Program Data Vector (PDV) works

Page 3: The many ways to effectively utilize array processing

REVIEW: COMPILATION AND EXECUTION PHASES

A DATA step is processed in a two-phase sequence:

Compilation phase: Each statement is scanned for syntax errors.

Execution phase: If there is no syntax error, the DATA step reads and processes the input data.

Page 4: The many ways to effectively utilize array processing

REVIEW IMPLICIT AND EXPLICIT LOOPS

REVIEW IMPLICIT LOOP

The DATA step works like a loop – an implicit loop

It repetitively executes statements, reads data values, and creates observations in the PDV one at a time

Each loop is called an iteration

Suppose you have the following dataset that contains patient IDs for a clinical trial

Patient:

   ID
1  M2390
2  F2390
3  F2340
4  M1240

You would like to assign each patient to either a drug or a placebo (50% chance of each)
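The slides assume the Patient dataset already exists. A minimal sketch of one way to create it for experimentation follows; the character length of 5 for ID is an assumption inferred from the values shown, not something the slides state.

```sas
/* Sketch: build the Patient dataset used in the examples.      */
/* Assumption: ID is a character variable of length 5.          */
data patient;
    input id $5.;
    datalines;
M2390
F2390
F2340
M1240
;
run;
```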

Page 5: The many ways to effectively utilize array processing

REVIEW IMPLICIT LOOP

data trial1 (drop=rannum); set patient; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P';run;

1st iteration: _N_ is set to 1 and _ERROR_ is set to 0. The rest of the variables are set to missing.

Patient:ID

1 M2390

2 F2390

3 F2340

4 M1240

PDV:

_N_ (D)   _ERROR_ (D)   ID (K)   RANNUM (D)   GROUP (K)
1         0                      .

Page 6: The many ways to effectively utilize array processing

data trial1 (drop=rannum); set patient; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P';run;

REVIEW IMPLICIT LOOP

1st iteration:

The SET statement copies the 1st observation to the PDV

Patient:ID

1 M2390

2 F2390

3 F2340

4 M1240

PDV:

_N_ (D)   _ERROR_ (D)   ID (K)   RANNUM (D)   GROUP (K)
1         0             M2390    .

Page 7: The many ways to effectively utilize array processing

REVIEW IMPLICIT LOOP

1st iteration: RANNUM is generated

Patient:ID

1 M2390

2 F2390

3 F2340

4 M1240

PDV:

_N_ (D)   _ERROR_ (D)   ID (K)   RANNUM (D)   GROUP (K)
1         0             M2390    0.36993

data trial1 (drop=rannum); set patient; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P';run;

Page 8: The many ways to effectively utilize array processing

REVIEW IMPLICIT LOOP

1st iteration: GROUP is set to 'P' since RANNUM is not > 0.5

Patient:ID

1 M2390

2 F2390

3 F2340

4 M1240

PDV:

_N_ (D)   _ERROR_ (D)   ID (K)   RANNUM (D)   GROUP (K)
1         0             M2390    0.36993      P

data trial1 (drop=rannum); set patient; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P';run;

Page 9: The many ways to effectively utilize array processing

REVIEW IMPLICIT LOOP

1st iteration:

The implicit OUTPUT statement writes the variables marked with (K) to the final dataset

SAS returns to the beginning of the DATA step

Patient:ID

1 M2390

2 F2390

3 F2340

4 M1240

PDV:

_N_ (D)   _ERROR_ (D)   ID (K)   RANNUM (D)   GROUP (K)
1         0             M2390    0.36993      P

Trial1:ID GROUP

1 M2390 P

data trial1 (drop=rannum); set patient; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P';run;

Page 10: The many ways to effectively utilize array processing

REVIEW IMPLICIT LOOP

2nd iteration: _N_ ↑ 2

Patient:ID

1 M2390

2 F2390

3 F2340

4 M1240

PDV:

_N_ (D)   _ERROR_ (D)   ID (K)   RANNUM (D)   GROUP (K)
2         0             M2390    .

Trial1:ID GROUP

1 M2390 P

Variables exist in the input dataset

SAS sets each variable to missing in the PDV only before the 1st iteration of the execution

Variables will retain their values in the PDV until they are replaced by the new values

data trial1 (drop=rannum); set patient; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P';run;

Page 11: The many ways to effectively utilize array processing

REVIEW IMPLICIT LOOP

2nd iteration:

Patient:ID

1 M2390

2 F2390

3 F2340

4 M1240

PDV:

_N_ (D)   _ERROR_ (D)   ID (K)   RANNUM (D)   GROUP (K)
2         0             M2390    .

Trial1:ID GROUP

1 M2390 P

Variables being created in the DATA step

SAS sets each variable to missing in the PDV at the beginning of every iteration of the execution

data trial1 (drop=rannum); set patient; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P';run;

Page 12: The many ways to effectively utilize array processing

REVIEW IMPLICIT LOOP

2nd iteration: The SET statement copies the 2nd observation to the PDV

Patient:ID

1 M2390

2 F2390

3 F2340

4 M1240

PDV:

_N_ (D)   _ERROR_ (D)   ID (K)   RANNUM (D)   GROUP (K)
2         0             F2390    .

Trial1:ID GROUP

1 M2390 P

Skip the rest of the iterations…

data trial1 (drop=rannum); set patient; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P';run;
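One way to watch the implicit loop yourself is to write the PDV to the log with PUT _ALL_. The following is a sketch only: it assumes a Patient dataset with a character ID variable exists, and the label strings are illustrative.

```sas
/* Sketch: trace the PDV in the SAS log on every iteration.        */
/* Assumption: WORK.PATIENT exists with a character variable ID.   */
data trial1 (drop=rannum);
    set patient;
    /* Right after SET: ID holds the new value, while RANNUM and   */
    /* GROUP have been reset to missing for this iteration.        */
    put 'After SET:       ' _all_;
    rannum = ranuni(2);
    if rannum > 0.5 then group = 'D';
    else group = 'P';
    /* Just before the implicit OUTPUT at the end of the step.     */
    put 'Before OUTPUT:   ' _all_;
run;
```

PUT _ALL_ also lists the automatic variables _N_ and _ERROR_, so each log line corresponds to one of the PDV snapshots shown on the slides.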

Page 13: The many ways to effectively utilize array processing

REVIEW: OUTPUT STATEMENT

data trial1 (drop=rannum); set patient; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;run;

The explicit OUTPUT statement:

Writes the current observation from the PDV to the SAS dataset immediately

Not at the end of the DATA step

Page 14: The many ways to effectively utilize array processing

REVIEW: OUTPUT STATEMENT

The implicit OUTPUT statement:

It tells SAS to write observations to the dataset at the end of the DATA step

Without explicit OUTPUT statements, every DATA step contains an implicit OUTPUT statement at the end of the DATA step

data trial1 (drop=rannum); set patient; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P';

run;

Page 15: The many ways to effectively utilize array processing

REVIEW: OUTPUT STATEMENT

Placing an explicit OUTPUT statement in the DATA step:

Overrides the implicit OUTPUT

SAS adds an observation to a dataset only when an explicit OUTPUT is executed

We can use more than one OUTPUT statement in the DATA step
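As an illustration of using more than one OUTPUT statement, the sketch below routes each patient to one of two datasets. The dataset names DRUG and PLACEBO are hypothetical and do not appear in the original slides; the Patient dataset is assumed to exist.

```sas
/* Sketch: two explicit OUTPUT statements, each naming a target dataset. */
/* DRUG and PLACEBO are hypothetical dataset names.                      */
data drug placebo;
    set patient;
    if ranuni(2) > 0.5 then output drug;     /* written only to DRUG     */
    else output placebo;                     /* written only to PLACEBO  */
run;
```

Because an explicit OUTPUT is present, the implicit OUTPUT at the end of the DATA step is not executed, so every patient lands in exactly one of the two datasets.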

Page 16: The many ways to effectively utilize array processing

REVIEW EXPLICIT LOOP

Suppose you don’t have a dataset containing the patient IDs

You are asked to assign four patients, ‘M2390’, ‘F2390’, ‘F2340’, ‘M1240’, with a 50% chance of receiving either the drug or the placebo

You can create the ID and assign each ID to a group in the DATA step at the same time. For example

Page 17: The many ways to effectively utilize array processing

REVIEW EXPLICIT LOOP

data trial2(drop = rannum); id = 'M2390'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;

id = 'F2390'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;

id = 'F2340'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;

id = 'M1240'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;run;

Assigning IDs in the DATA step

Page 18: The many ways to effectively utilize array processing

REVIEW EXPLICIT LOOP

data trial2(drop = rannum); id = 'M2390'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;

id = 'F2390'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;

id = 'F2340'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;

id = 'M1240'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;run;

4 explicit OUTPUT statements

Page 19: The many ways to effectively utilize array processing

REVIEW EXPLICIT LOOP

data trial2(drop = rannum); id = 'M2390'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;

id = 'F2390'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;

id = 'F2340'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;

id = 'M1240'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;run;

4 almost identical blocks

Put the identical code in a loop

Loop along the IDs

Reduce amount of coding

Page 20: The many ways to effectively utilize array processing

ITERATIVE DO LOOP

data trial2(drop = rannum); id = 'M2390'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;

id = 'F2390'; ...

id = 'F2340'; ...

id = 'M1240'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;run;

DO INDEX-VARIABLE = VALUE1, VALUE2, …, VALUEN;
   SAS STATEMENTS
END;

INDEX-VARIABLE: ID

VALUE1 – VALUEN: 'M2390', 'F2390', 'F2340', 'M1240'

SAS STATEMENTS:

rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;

Page 21: The many ways to effectively utilize array processing

ITERATIVE DO LOOP

data trial2(drop = rannum); id = 'M2390'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;

id = 'F2390'; ...

id = 'F2340'; ...

id = 'M1240'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output;run;

DO INDEX-VARIABLE = VALUE1, VALUE2, …, VALUEN;
   SAS STATEMENTS
END;

data trial2 (drop = rannum); do id = 'M2390', 'F2390', 'F2340', 'M1240'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output; end;run;

Page 22: The many ways to effectively utilize array processing

THE ITERATIVE DO LOOP ALONG A SEQUENCE OF INTEGERS

data trial3 (drop = rannum); do id = 1 to 4; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output; end;run;

Suppose you are using a sequence of numbers, say 1 to 4, as patient IDs

DO INDEX-VARIABLE = START TO STOP <BY INCREMENT>;
   SAS STATEMENTS
END;

INDEX-VARIABLE: ID

START: 1

STOP: 4

INCREMENT: 1
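The optional BY INCREMENT clause can be illustrated with a small variation of the example. This is a sketch; the dataset name TRIAL_ODD and the choice of odd-numbered IDs are hypothetical.

```sas
/* Sketch: iterative DO loop with an explicit increment.   */
/* TRIAL_ODD is a hypothetical dataset name.               */
data trial_odd (drop = rannum);
    do id = 1 to 7 by 2;          /* id takes the values 1, 3, 5, 7 */
        rannum = ranuni(2);
        if rannum > 0.5 then group = 'D';
        else group = 'P';
        output;                   /* one observation per loop iteration */
    end;
run;
```

When BY is omitted, as in the slide's example, the increment defaults to 1.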

Page 23: The many ways to effectively utilize array processing

PURPOSE OF USING ARRAYS

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

6 measurements of SBP for each patient

The missing values are coded as 999

Suppose you would like to recode 999 to periods (.)

data sbp1; set sbp; if sbp1 = 999 then sbp1 = .; if sbp2 = 999 then sbp2 = .; if sbp3 = 999 then sbp3 = .; if sbp4 = 999 then sbp4 = .; if sbp5 = 999 then sbp5 = .; if sbp6 = 999 then sbp6 = .;run;

The IF statements are almost identical

Only the variable names are different

Use a DO loop?
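To try the recoding examples that follow, the Sbp dataset can be created from the values shown in the table. This is a sketch that assumes the six measurements are plain numeric variables read with list input.

```sas
/* Sketch: build the Sbp dataset from the values on the slide. */
data sbp;
    input sbp1-sbp6;
    datalines;
141 142 137 117 116 124
999 141 138 119 119 122
142 999 139 119 120 999
136 140 142 118 121 123
;
run;
```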

Page 24: The many ways to effectively utilize array processing

PURPOSE OF USING ARRAYS

RECALL: DO LOOP

data trial2(drop = rannum); id = 'M2390'; rannum = ranuni(2);

if rannum> 0.5 then group = 'D'; else group ='P'; output;

id = 'F2390'; rannum = ranuni(2);

if rannum> 0.5 then group = 'D'; else group ='P'; output;

id = 'F2340'; rannum = ranuni(2);

if rannum> 0.5 then group = 'D'; else group ='P'; output;

id = 'M1240'; rannum = ranuni(2);

if rannum> 0.5 then group = 'D'; else group ='P'; output;run;

data trial2 (drop = rannum); do id = 'M2390', 'F2390', 'F2340', 'M1240'; rannum = ranuni(2); if rannum> 0.5 then group = 'D'; else group ='P'; output; end;run;

The loop iterates along a sequence of values

The index variable holds these values

Difference: The values of the ID variable

Page 25: The many ways to effectively utilize array processing

PURPOSE OF USING ARRAYS

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

data sbp1; set sbp; if sbp1 = 999 then sbp1 = .; if sbp2 = 999 then sbp2 = .; if sbp3 = 999 then sbp3 = .; if sbp4 = 999 then sbp4 = .; if sbp5 = 999 then sbp5 = .; if sbp6 = 999 then sbp6 = .;run;

Difference: Variable names

If we can group these variables into a single unit, we can loop along these variables

SBP

1 2 3 4 5 6

ARRAY: a temporary grouping of SAS variables

Page 26: The many ways to effectively utilize array processing

ARRAY DEFINITION AND SYNTAX

ARRAY ARRAYNAME [DIMENSION] <$> <ELEMENTS>;

ARRAYNAME:

Must be a SAS name

Cannot be the name of a SAS variable in the same DATA step

See handouts for other rules

Page 27: The many ways to effectively utilize array processing

ARRAY DEFINITION AND SYNTAX

ARRAY ARRAYNAME [DIMENSION] <$> <ELEMENTS>;

DIMENSION is the number of elements in the array

More on DIMENSION later…

Page 28: The many ways to effectively utilize array processing

ARRAY DEFINITION AND SYNTAX

ARRAY ARRAYNAME [DIMENSION] <$> <ELEMENTS>;

$ indicates that the elements in the array are character elements

$ is not necessary if the elements have been previously defined as character elements
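A short sketch of the $ option: here the array creates new character variables, so $ is required. The names CODES and CODE1 to CODE3 and the length of 5 are hypothetical choices for illustration.

```sas
/* Sketch: an array that creates new character variables.          */
/* CODES, CODE1-CODE3 and the length 5 are hypothetical.           */
data codes (drop=i);
    array code[3] $ 5;            /* creates CODE1-CODE3, character, length 5 */
    do i = 1 to 3;
        code[i] = cats('C', i);   /* CATS concatenates, e.g. 'C' and 1 */
    end;
run;
```

Had CODE1 to CODE3 already been defined as character variables (for example, read from an input dataset), the $ could be omitted.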

Page 29: The many ways to effectively utilize array processing

ARRAY DEFINITION AND SYNTAX

ARRAY ARRAYNAME [DIMENSION] <$> <ELEMENTS>;

ELEMENTS are the variables to be included in the array

Must be either all numeric or all character

More on ELEMENTS later…

Page 30: The many ways to effectively utilize array processing

ARRAY DEFINITION AND SYNTAX

ARRAY ARRAYNAME [DIMENSION] <$> <ELEMENTS>;

array sbparray [6] sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;

Page 31: The many ways to effectively utilize array processing

ARRAY DEFINITION AND SYNTAX

ARRAY ARRAYNAME [DIMENSION] <$> <ELEMENTS>;

array sbparray [*] sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;

You can use an asterisk (*) as DIMENSION

You must include ELEMENTS

Page 32: The many ways to effectively utilize array processing

ARRAY DEFINITION AND SYNTAX

ARRAY ARRAYNAME [DIMENSION] <$> <ELEMENTS>;

array sbparray (6) sbp1 sbp2 sbp3 sbp4 sbp5 sbp6; array sbparray {6} sbp1 sbp2 sbp3 sbp4 sbp5 sbp6; array sbparray [6] sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;

DIMENSION can be enclosed in parentheses, braces, or brackets

Page 33: The many ways to effectively utilize array processing

ARRAY DEFINITION AND SYNTAX

ARRAY ARRAYNAME [DIMENSION] <$> <ELEMENTS>;

If ELEMENTS are not specified, for example:

array sbp [6];

this is equivalent to:

array sbp [6] sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;

Case 1: if sbp1 – sbp6 were previously defined in the DATA step, the array groups the existing variables

Case 2: if sbp1 – sbp6 were not previously defined in the DATA step, they will be created by the ARRAY statement

Page 34: The many ways to effectively utilize array processing

ARRAY DEFINITION AND SYNTAX

ARRAY ARRAYNAME [DIMENSION] <$> <ELEMENTS>;

array num [*] _numeric_; array char [*] _character_; array allvar [*] _all_;

_NUMERIC_: all numeric variables

_CHARACTER_: all character variables

_ALL_: all the variables; variables must be either all numeric or all character

Page 35: The many ways to effectively utilize array processing

ARRAY DEFINITION AND SYNTAX

ARRAY ARRAYNAME [DIMENSION] <$> <ELEMENTS>;

array sbp [6] sbp1 - sbp6;

A single dash can be used to specify a numbered range of variables

Page 36: The many ways to effectively utilize array processing

ARRAY DEFINITION AND SYNTAX

To reference an array element:

ARRAYNAME [INDEX]

INDEX:

must be enclosed in ( ), [ ], or { }

is specified as an integer, a numeric variable, or a SAS expression

must be within the lower and upper bounds of the DIMENSION of the array

Page 37: The many ways to effectively utilize array processing

ARRAY DEFINITION AND SYNTAX

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 sbp2 sbp3 sbp4 sbp5 sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

data sbp2 (drop=i); set sbp; array sbp [6]; do i = 1 to 6; if sbp [i] = 999 then sbp [i] = .; end;run;

ARRAY:

array sbparray [6] sbp1 - sbp6;

array sbp [6]; = array sbp [6] sbp1 - sbp6;

data sbp1; set sbp; if sbp1 = 999 then sbp1 = .; if sbp2 = 999 then sbp2 = .; if sbp3 = 999 then sbp3 = .; if sbp4 = 999 then sbp4 = .; if sbp5 = 999 then sbp5 = .; if sbp6 = 999 then sbp6 = .;run;

Page 38: The many ways to effectively utilize array processing

THE DIM FUNCTION

data sbp3 (drop=i); set sbp; array sbparray [*] sbp1 - sbp6; do i = 1 to dim(sbparray); if sbparray [i] = 999 then sbparray [i] = .; end;run;

Use the DIM function to determine the number of elements in an array

It is convenient when you use _NUMERIC_, _CHARACTER_, _ALL_ as array ELEMENTS

DIM(ARRAYNAME)
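The point about _NUMERIC_ can be sketched as follows. The dataset name SBP_ALL and the array name NUMS are hypothetical; the sketch assumes every numeric variable in Sbp uses 999 as its missing-value code.

```sas
/* Sketch: DIM with _NUMERIC_ so the loop bound is never hard-coded.  */
/* SBP_ALL and NUMS are hypothetical names.                           */
data sbp_all (drop=i);
    set sbp;
    /* _NUMERIC_ covers the numeric variables defined up to this      */
    /* point; the index variable I is created later, so it is not     */
    /* part of the array.                                             */
    array nums [*] _numeric_;
    do i = 1 to dim(nums);
        if nums[i] = 999 then nums[i] = .;
    end;
run;
```

If numeric variables are later added to or removed from Sbp, the loop adjusts without any change to the code.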

Page 39: The many ways to effectively utilize array processing

ASSIGNING INITIAL VALUES TO AN ARRAY

When creating a group of variables by using the ARRAY statement, you can assign initial values to the array elements

array num[3] n1 n2 n3 (1 2 3);

array chr[3] $ ('A', 'B', 'C');

Page 40: The many ways to effectively utilize array processing

TEMPORARY ARRAYS

Temporary arrays contain temporary data elements

Using temporary arrays is useful when you want to create an array only for calculation purposes

When referring to a temporary data element, you refer to it by the ARRAYNAME and its DIMENSION

You cannot use the asterisk (*) with temporary arrays

They are not written to the output dataset

They are always automatically retained

To create a temporary array, you need to use the keyword _TEMPORARY_

array num[3] _temporary_ (1 2 3);
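A small sketch of a temporary array used purely for calculation: a set of weights applied to three values. The names W, X1 to X3, and TOTAL, and all of the numbers, are hypothetical.

```sas
/* Sketch: a temporary array holding constants for a calculation.  */
/* W, X1-X3, TOTAL and the values are hypothetical.                */
data _null_;
    array w[3] _temporary_ (0.2 0.3 0.5);   /* weights: no variables created */
    array x[3] x1-x3 (10 20 30);            /* ordinary array with initial values */
    total = 0;
    do i = 1 to 3;
        total = total + w[i] * x[i];        /* weighted sum */
    end;
    put total=;                             /* writes the result to the log */
run;
```

The weights exist only for the duration of the DATA step; unlike X1 to X3, they have no variable names and would never appear in an output dataset.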

Page 41: The many ways to effectively utilize array processing

COMPILATION AND EXECUTION PHASES

COMPILATION PHASE

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

The PDV is created

The array name SBPARRAY and the array references are not included in the PDV

SBP1 – SBP6 are referenced by the array references

Syntax errors in the ARRAY statement will be detected during the compilation phase

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

Page 42: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

1 . . . . . . .

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

_N_ is set to 1

The rest of the variables are set to missing

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

1st iteration of the DATA step:

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 43: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

1 141 142 137 117 116 124 .

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

SET statement copies the 1st obs. from Sbp to the PDV

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

1st iteration of the DATA step:

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 44: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

1 141 142 137 117 116 124 .

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

The ARRAY statement is a compile-time only statement

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

1st iteration of the DATA step:

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 45: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

1 141 142 137 117 116 124 1

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

I is set to 1

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

1st iteration of the DATA step:

1st iteration of the DO loop:

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 46: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

1 141 142 137 117 116 124 1

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

SBPARRAY[i] resolves to SBPARRAY[1], and SBPARRAY[1] refers to SBP1. Since SBP1 ≠ 999, the assignment is not executed.

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

1st iteration of the DATA step:

1st iteration of the DO loop:

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 47: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

1 141 142 137 117 116 124 1

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

SAS reaches the end of the DO loop

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

1st iteration of the DATA step:

1st iteration of the DO loop:

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 48: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

1 141 142 137 117 116 124 2

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

I is incremented to 2. Since I ≤ 6, the loop continues

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

1st iteration of the DATA step:

2nd iteration of the DO loop:

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 49: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

1 141 142 137 117 116 124 2

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

1st iteration of the DATA step:

2nd iteration of the DO loop:

SBPARRAY[i] resolves to SBPARRAY[2], and SBPARRAY[2] refers to SBP2. Since SBP2 ≠ 999, the assignment is not executed.

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 50: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

1 141 142 137 117 116 124 2

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

1st iteration of the DATA step:

2nd iteration of the DO loop:

SAS reaches the end of the DO loop

Skip the rest of the iterations

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 51: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

1 141 142 137 117 116 124 7

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

1st iteration of the DATA step:

SAS reaches the end of the DATA step

The implicit OUTPUT executes

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 52: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

2 141 142 137 117 116 124 .

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

2nd iteration of the DATA step:

_N_ ↑ 2

SBP1 – SBP6 are retained

I is set to missing

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 53: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

2 999 141 138 119 119 122 .

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

2nd iteration of the DATA step:

The SET statement copies the 2nd obs. to the PDV

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 54: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

2 999 141 138 119 119 122 1

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

2nd iteration of the DATA step:

1st iteration of the DO loop:

I is set to 1

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 55: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

2 999 141 138 119 119 122 1

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2nd iteration of the DATA step:

1st iteration of the DO loop:

SBPARRAY[i] resolves to SBPARRAY[1], and SBPARRAY[1] refers to SBP1

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 56: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

2 . 141 138 119 119 122 1

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2nd iteration of the DATA step:

1st iteration of the DO loop:

SBPARRAY[i] resolves to SBPARRAY[1], and SBPARRAY[1] refers to SBP1. Since SBP1 = 999, SBP1 is set to missing.

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 57: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

2 . 141 138 119 119 122 1

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2nd iteration of the DATA step:

1st iteration of the DO loop:

SAS reaches the end of the loop

Skip the rest of the loop

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 58: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

2 . 141 138 119 119 122 7

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2nd iteration of the DATA step:

SAS reaches the end of the DATA step

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;

Page 59: The many ways to effectively utilize array processing

EXECUTION PHASE

_N_ D SBP1 K SBP2 K SBP3 K SBP4 K SBP5 K SBP6 K I D

2 . 141 138 119 119 122 7

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 999 141 138 119 119 122

3 142 999 139 119 120 999

4 136 140 142 118 121 123

SBPARRAY[1] SBPARRAY[2] SBPARRAY[3] SBPARRAY[4] SBPARRAY[5] SBPARRAY[6]

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 . 141 138 119 119 122

2nd iteration of the DATA step:

SAS reaches the end of the DATA step

The implicit OUTPUT executes

Skip the rest of the iterations

data sbp2 (drop=i); set sbp; array sbparray[6] sbp1 - sbp6; do i = 1 to 6; if sbparray[i] = 999 then sbparray[i] = .; end;run;
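The walkthrough above can be reproduced in the SAS log. The sketch below adds PUT statements to the same step; the message text is illustrative, and the Sbp dataset is assumed to exist with the values shown on the slides.

```sas
/* Sketch: the same recoding step, with log messages tracing the PDV. */
data sbp2 (drop=i);
    set sbp;
    array sbparray[6] sbp1 - sbp6;
    do i = 1 to 6;
        if sbparray[i] = 999 then do;
            /* Report which DATA step iteration and element is recoded */
            put 'Recoding: ' _n_= i=;
            sbparray[i] = .;
        end;
    end;
    /* At this point the DO loop has finished, so I has passed its    */
    /* stop value, matching the PDV shown at the end of each step.    */
    put 'End of iteration: ' _all_;
run;
```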

Page 60: The many ways to effectively utilize array processing

SOME ARRAY APPLICATIONS

CREATING A GROUP OF VARIABLES BY USING ARRAYS

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 . 141 138 119 119 122

3 142 . 139 119 120 .

4 136 140 142 118 121 123

Pre-treatment Post-treatment

MEAN SBP: 140 120

above1 above2 above3 above4 above5 above6

1 1 1 0 0 0 1

2 . 1 0 0 0 1

3 1 . 0 0 0 .

4 0 0 1 0 1 1

data sbp4 (drop=i); set sbp2; array sbp[6]; array above[6]; array threshhold[6] _temporary_ (140 140 140 120 120 120); do i = 1 to 6; if (not missing(sbp[i])) then above [i] = sbp[i] > threshhold[i]; end;run;

Used to group the existing variables: sbp1 – sbp6

Page 61: The many ways to effectively utilize array processing

CREATING A GROUP OF VARIABLES BY USING ARRAYS

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 . 141 138 119 119 122

3 142 . 139 119 120 .

4 136 140 142 118 121 123

Pre-treatment Post-treatment

MEAN SBP: 140 120

above1 above2 above3 above4 above5 above6

1 1 1 0 0 0 1

2 . 1 0 0 0 1

3 1 . 0 0 0 .

4 0 0 1 0 1 1

data sbp4 (drop=i); set sbp2; array sbp[6]; array above[6]; array threshhold[6] _temporary_ (140 140 140 120 120 120); do i = 1 to 6; if (not missing(sbp[i])) then above [i] = sbp[i] > threshhold[i]; end;run;

Used to create variables: above1 – above6

Page 62: The many ways to effectively utilize array processing

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 . 141 138 119 119 122

3 142 . 139 119 120 .

4 136 140 142 118 121 123

Pre-treatment Post-treatment

MEAN SBP: 140 120

above1 above2 above3 above4 above5 above6

1 1 1 0 0 0 1

2 . 1 0 0 0 1

3 1 . 0 0 0 .

4 0 0 1 0 1 1

data sbp4 (drop=i); set sbp2; array sbp[6]; array above[6]; array threshhold[6] _temporary_ (140 140 140 120 120 120); do i = 1 to 6; if (not missing(sbp[i])) then above [i] = sbp[i] > threshhold[i]; end;run;

The temporary array is for comparison purposes

CREATING A GROUP OF VARIABLES BY USING ARRAYS

Page 63: The many ways to effectively utilize array processing

sbp1 sbp2 sbp3 sbp4 sbp5 sbp6

1 141 142 137 117 116 124

2 . 141 138 119 119 122

3 142 . 139 119 120 .

4 136 140 142 118 121 123

Pre-treatment Post-treatment

MEAN SBP: 140 120

above1 above2 above3 above4 above5 above6

1 1 1 0 0 0 1

2 . 1 0 0 0 1

3 1 . 0 0 0 .

4 0 0 1 0 1 1

data sbp4 (drop=i); set sbp2; array sbp[6]; array above[6]; array threshhold[6] _temporary_ (140 140 140 120 120 120); do i = 1 to 6; if (not missing(sbp[i])) then above [i] = sbp[i] > threshhold[i]; end;run;

CREATING A GROUP OF VARIABLES BY USING ARRAYS

Page 64: The many ways to effectively utilize array processing

THE IN OPERATOR

   sbp1 sbp2 sbp3 sbp4 sbp5 sbp6 miss
1  141  142  137  117  116  124  0
2  .    141  138  119  119  122  1
3  142  .    139  119  120  .    1
4  136  140  142  118  121  123  0

data sbp6;
   set sbp2;
   array sbp[6];
   if . IN sbp then miss = 1;
   else miss = 0;
run;
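A possible alternative to the IN operator is the NMISS function, which counts missing numeric arguments. The sketch below is only an illustration: the DATALINES step rebuilds the sbp2 table shown on this slide, and the output dataset name sbp6_alt is hypothetical.

```sas
/* Rebuild the sbp2 table from the slide (values copied from the table) */
data sbp2;
   input sbp1-sbp6;
   datalines;
141 142 137 117 116 124
. 141 138 119 119 122
142 . 139 119 120 .
136 140 142 118 121 123
;
run;

/* sbp6_alt is a hypothetical name, not from the original slides */
data sbp6_alt;
   set sbp2;
   array sbp[6];
   /* NMISS counts the missing elements; the comparison yields 1 or 0 */
   miss = (nmiss(of sbp[*]) > 0);
run;
```

For these four observations MISS should come out as 0, 1, 1, 0, the same as the IN operator version.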

Page 65: The many ways to effectively utilize array processing

CALCULATING PRODUCTS OF MULTIPLE VARIABLES

Test:

   num1 num2 num3 num4
1  4    .    2    3
2  .    2    3    1

Approach:
1. Create an array: num[4]
2. Treat missing value as 1
3. Set result = num[1]
   Loop: i = 2 to 4
      result = result * num[i]
   End Loop

data product (drop=i);
   set test;
   array num[4];
   if missing(num[1]) then result = 1;
   else result = num[1];
   do i = 2 to 4;
      if not missing(num[i]) then result = result*num[i];
   end;
run;

The array num[4] is used to group the existing variables: num1 – num4

Page 68: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS USING ARRAYS

Restructuring datasets:

data with one observation per subject (the wide format)

   ID   S1  S2  S3
1  A01  3   4   5
2  A02  4   .   2

data with multiple observations per subject (the long format)

   ID   TIME  SCORE
1  A01  1     3
2  A01  2     4
3  A01  3     5
4  A02  1     4
5  A02  3     2

Page 69: The many ways to effectively utilize array processing

FROM WIDE FORMAT TO LONG FORMAT (WITHOUT USING ARRAYS)

Transform wide to long
2 observations to read, so 2 DATA step iterations
Use multiple OUTPUT statements
Any missing values in S1 – S3 will not be output to long

data long (drop=s1-s3);
   set wide;
   time = 1; score = s1; if not missing(score) then output;
   time = 2; score = s2; if not missing(score) then output;
   time = 3; score = s3; if not missing(score) then output;
run;

Wide:

   ID   S1  S2  S3
1  A01  3   4   5
2  A02  4   .   2

Long:

   ID   TIME  SCORE
1  A01  1     3
2  A01  2     4
3  A01  3     5
4  A02  1     4
5  A02  3     2

Page 70: The many ways to effectively utilize array processing

FROM WIDE FORMAT TO LONG FORMAT(USING ARRAYS)

Wide:

Long:

data long (drop=s1-s3); set wide;

time = 1; score = s1; if not missing(score) then output;

time = 2; score = s2; if not missing(score) then output;

time = 3; score = s3; if not missing(score) then output;run;

ID S1 S2 S3

1 A01 3 4 5

2 A02 4 . 2

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

S

[1] [2] [3]

array s[3];

S[1];

S[2];

S[3];

Page 71: The many ways to effectively utilize array processing

FROM WIDE FORMAT TO LONG FORMAT(USING ARRAYS)

Wide:

Long:

data long (drop=s1-s3); set wide;

time = 1; score = s1; if not missing(score) then output;

time = 2; score = s2; if not missing(score) then output;

time = 3; score = s3; if not missing(score) then output;run;

ID S1 S2 S3

1 A01 3 4 5

2 A02 4 . 2

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

S

[1] [2] [3]

array s[3];

S[1];

S[2];

S[3];

Create a DO loop – TIME as index variable

Page 72: The many ways to effectively utilize array processing

FROM WIDE FORMAT TO LONG FORMAT(USING ARRAYS)

Wide:

Long:

data long (drop=s1-s3); set wide;

time = 1; score = s1; if not missing(score) then output;

time = 2; score = s2; if not missing(score) then output;

time = 3; score = s3; if not missing(score) then output;run;

ID S1 S2 S3

1 A01 3 4 5

2 A02 4 . 2

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

S

[1] [2] [3]

array s[3];

S[1];

S[2];

S[3];

data long (drop=s1-s3);
   set wide;
   array s[3];
   do time = 1 to 3;
      score = s[time];
      if not missing(score) then output;
   end;
run;
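To try the array version end to end, a self-contained sketch might look like the following. The DATALINES step simply rebuilds the wide table from the slide; using DIM(s) as the loop bound instead of the literal 3 is a variation added here, not something the slides show.

```sas
/* Rebuild the wide table from the slide */
data wide;
   input id $ s1-s3;
   datalines;
A01 3 4 5
A02 4 . 2
;
run;

data long (drop=s1-s3);
   set wide;
   array s[3];
   /* DIM(s) returns the number of elements, so the bound follows the array */
   do time = 1 to dim(s);
      score = s[time];
      if not missing(score) then output;
   end;
run;

proc print data=long;
run;
```

The printed dataset should contain five rows, because the missing S2 value for A02 is never output.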

Page 73: The many ways to effectively utilize array processing

FROM LONG FORMAT TO WIDE FORMAT

Reading 5 observations but only creating 2 observations

You are not copying data from the PDV to the final dataset at each iteration

You only need to generate one observation once all the observations for each subject have been processed

Long:

   ID   TIME  SCORE
1  A01  1     3
2  A01  2     4
3  A01  3     5
4  A02  1     4
5  A02  3     2

Wide:

   ID   S1  S2  S3
1  A01  3   4   5
2  A02  4   .   2

Page 74: The many ways to effectively utilize array processing

REVIEW THE RETAIN STATEMENT

To prevent the VARIABLE from being reinitialized each time the DATA step executes, use the RETAIN statement:

RETAIN VARIABLE <VALUE>;

VARIABLE: name of the variable that we want to retain

VALUE: a numeric value used to initialize the VARIABLE only at the first iteration of the DATA step execution

If no initial value is specified, VARIABLE is initialized as missing
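As a small illustration of RETAIN, the sketch below keeps a running maximum across iterations. The dataset name running, the variable max_score, and the three input values are all made up for this example.

```sas
data running;
   input score;
   /* max_score is set to 0 once, then keeps its value across iterations */
   retain max_score 0;
   if score > max_score then max_score = score;
   datalines;
3
5
2
;
run;
```

Without the RETAIN statement, max_score would be reset to missing at the start of every iteration and would never carry the earlier maximum forward.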

Page 75: The many ways to effectively utilize array processing

REVIEW: THE SUM STATEMENT

The SUM statement has the following form:

VARIABLE + EXPRESSION;

VARIABLE: the numeric accumulator variable that is to be created

It is automatically set to 0 at the beginning of the first iteration of the DATA step execution

It is retained in the following iterations

EXPRESSION: any SAS expression

If EXPRESSION evaluates to a missing value, it is treated as 0
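A minimal sketch of the SUM statement, using a made-up dataset totals with one missing score, might be:

```sas
data totals;
   input score;
   /* total starts at 0, is retained, and a missing score adds nothing */
   total + score;
   datalines;
3
.
5
;
run;
```

Here TOTAL should be 3, 3, and 8 on the three observations, since the missing value in the second row is treated as 0.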

Page 76: The many ways to effectively utilize array processing

REVIEW: FIRST.VARIABLE AND LAST.VARIABLE

You only output the data after you finish reading the last observation of each subject

Thus, you need to identify the last observation

Long:

   ID   TIME  SCORE
1  A01  1     3
2  A01  2     4
3  A01  3     5
4  A02  1     4
5  A02  3     2

Wide:

   ID   S1  S2  S3
1  A01  3   4   5
2  A02  4   .   2

Page 77: The many ways to effectively utilize array processing

REVIEW: FIRST.VARIABLE AND LAST.VARIABLE

BY-group processing method

proc sort data=b;
   by by_variable;
run;

data a;
   set b;
   by by_variable;
   ... ...
run;

For each BY-variable, SAS creates two temporary variables: FIRST.VARIABLE and LAST.VARIABLE

FIRST.VARIABLE and LAST.VARIABLE are set to 1 at the beginning of the execution phase

They are not output to the final dataset

Page 78: The many ways to effectively utilize array processing

REVIEW: FIRST.VARIABLE AND LAST.VARIABLE

Suppose ID is the "BY" variable:

   ID   SCORE  FIRST.ID  LAST.ID
1  A01  3      1         0        SAS reads the 1st observation for ID = A01
2  A01  3      0         0
3  A01  2      0         1        SAS reads the last observation for ID = A01
4  A02  4      1         0
5  A02  2      0         1

"GROUPING": grouping based on ID gives group 1 (A01) and group 2 (A02)
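Because FIRST.ID and LAST.ID are never written to the output dataset, one way to inspect them is to copy them into ordinary variables. The sketch below does that for the five observations in the table; the dataset names scores and flags and the variables first_id and last_id are hypothetical.

```sas
/* Rebuild the ID/SCORE table from the slide */
data scores;
   input id $ score;
   datalines;
A01 3
A01 3
A01 2
A02 4
A02 2
;
run;

proc sort data=scores;
   by id;
run;

data flags;
   set scores;
   by id;
   /* copy the temporary BY-group flags into variables that are kept */
   first_id = first.id;
   last_id  = last.id;
run;

proc print data=flags;
run;
```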

Page 79: The many ways to effectively utilize array processing

REVIEW SUBSETTING IF STATEMENT

Use the IF statement to continue processing only the observations that meet the condition of the specified expression

IF EXPRESSION;

If the EXPRESSION is true for the observation, SAS continues to execute statements in the DATA step and includes the current observation in the data set

Page 80: The many ways to effectively utilize array processing

REVIEW SUBSETTING IF STATEMENT

Use the IF statement to continue processing only the observations that meet the condition of the specified expression

IF EXPRESSION;

If the EXPRESSION is false:

no further statements are processed for that observation

the current observation is not written to the data set

the remaining program statements in the DATA step are not executed

SAS immediately returns to the beginning of the DATA step
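A short sketch of the subsetting IF, with a made-up dataset passed and a made-up cutoff of 3, could look like this:

```sas
data passed;
   input id $ score;
   /* observations that fail this test are dropped at this point */
   if score >= 3;
   /* this assignment runs only for observations that met the condition */
   grade = 'pass';
   datalines;
A01 3
A02 2
A03 5
;
run;
```

Only A01 and A03 should reach the output dataset; for A02 the DATA step returns to the top before the GRADE assignment and the implicit OUTPUT.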

Page 81: The many ways to effectively utilize array processing

FROM LONG FORMAT TO WIDE FORMAT(WITHOUT USING ARRAYS)

S1

S2

S3

S1

S3

SCORE is copied to S1, S2, or S3:

if time = 1 then s1 = score;
else if time = 2 then s2 = score;
else s3 = score;

RETAIN S1, S2, and S3

Use BY-group processing: BY ID

Output to the final dataset when LAST.ID = 1

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

ID S1 S2 S3

1 A01 3 4 5

2 A02 4 . 2

Page 82: The many ways to effectively utilize array processing

FROM LONG FORMAT TO WIDE FORMAT(WITHOUT USING ARRAYS)

S1

S2

S3

S1

S3

RETAIN

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

ID S1 S2 S3

1 A01 3 4 5

2 A02 4 . 2

proc sort data=long;
   by id;
run;

data wide (drop=time score);
   set long;
   by id;
   retain s1 - s3;
   if time = 1 then s1 = score;
   else if time = 2 then s2 = score;
   else s3 = score;
   if last.id;
run;

Page 83: The many ways to effectively utilize array processing

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

1st iteration:
_N_ is set to 1
FIRST.ID and LAST.ID are set to 1
Other variables are set to missing

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

1 1 1 . . . . .

EXECUTION PHASE Long:

Page 84: The many ways to effectively utilize array processing

1st iteration: The SET statement copies the 1st observation to the PDV

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

1 1 1 A01 1 3 . . .

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

Page 85: The many ways to effectively utilize array processing

1st iteration:
The SET statement copies the 1st observation to the PDV
FIRST.ID is set to 1 since this is the 1st observation for A01
LAST.ID is set to 0 since this is not the last observation for A01

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

1 1 0 A01 1 3 . . .

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

Page 86: The many ways to effectively utilize array processing

1st iteration: Since TIME = 1, S1 is set to SCORE (3)

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

1 1 0 A01 1 3 3 . .

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

Page 87: The many ways to effectively utilize array processing

1st iteration: Since LAST.ID ≠ 1, SAS returns to the beginning of the DATA step to begin the 2nd iteration

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

1 1 0 A01 1 3 3 . .

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

Page 88: The many ways to effectively utilize array processing

2nd iteration: _N_ is incremented to 2

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

2 1 0 A01 1 3 3 . .

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

Page 89: The many ways to effectively utilize array processing

2nd iteration:
FIRST.ID and LAST.ID are retained; they are automatic variables
ID, TIME, SCORE are retained; they are from the input dataset
S1, S2, and S3 are retained because of the RETAIN statement

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

2 1 0 A01 1 3 3 . .

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

Page 90: The many ways to effectively utilize array processing

2nd iteration: The SET statement copies the 2nd observation to the PDV

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

2 1 0 A01 2 4 3 . .

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

Page 91: The many ways to effectively utilize array processing

2nd iteration:
The SET statement copies the 2nd observation to the PDV
FIRST.ID is set to 0; this is not the first observation for A01
LAST.ID is set to 0; this is not the last observation for A01 either

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

2 0 0 A01 2 4 3 . .

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

Page 92: The many ways to effectively utilize array processing

2nd iteration: Since TIME = 2, S2 is set to SCORE (4)

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

2 0 0 A01 2 4 3 4 .

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

Page 93: The many ways to effectively utilize array processing

2nd iteration: Since LAST.ID ≠ 1, SAS returns to the beginning of the DATA step to begin the 3rd iteration

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

2 0 0 A01 2 4 3 4 .

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

Page 94: The many ways to effectively utilize array processing

3rd iteration:
_N_ is incremented to 3
The rest of the variables are retained

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

3 0 0 A01 2 4 3 4 .

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

Page 95: The many ways to effectively utilize array processing

3rd iteration: The SET statement copies the 3rd observation to the PDV

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

3 0 0 A01 3 5 3 4 .

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

Page 96: The many ways to effectively utilize array processing

3rd iteration:
The SET statement copies the 3rd observation to the PDV
FIRST.ID is set to 0; this is not the first observation for A01
LAST.ID is set to 1; this is the last observation for A01

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

3 0 1 A01 3 5 3 4 .

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

Page 97: The many ways to effectively utilize array processing

3rd iteration: Since TIME = 3, S3 is set to SCORE (5)

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

3 0 1 A01 3 5 3 4 5

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

Page 98: The many ways to effectively utilize array processing

3rd iteration: Since LAST.ID = 1, SAS continues to execute statements in the DATA step

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

3 0 1 A01 3 5 3 4 5

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

Page 99: The many ways to effectively utilize array processing

3rd iteration:
SAS reaches the end of the 3rd iteration
The implicit OUTPUT executes; variables marked with (K) are copied to the dataset wide

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

3 0 1 A01 3 5 3 4 5

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

ID S1 S2 S3

1 A01 3 4 5

Page 100: The many ways to effectively utilize array processing

4th iteration:
_N_ is incremented to 4
The rest of the variables are retained

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

4 0 1 A01 3 5 3 4 5

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

ID S1 S2 S3

1 A01 3 4 5

Page 101: The many ways to effectively utilize array processing

4th iteration: The SET statement copies the 4th observation to the PDV

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

4 0 1 A02 1 4 3 4 5

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

ID S1 S2 S3

1 A01 3 4 5

Page 102: The many ways to effectively utilize array processing

4th iteration:
The SET statement copies the 4th observation to the PDV
FIRST.ID is set to 1; this is the first observation for A02
LAST.ID is set to 0; this is not the last observation for A02

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

4 1 0 A02 1 4 3 4 5

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

ID S1 S2 S3

1 A01 3 4 5

Page 103: The many ways to effectively utilize array processing

4th iteration: Since TIME = 1, S1 is set to SCORE (4)

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

4 1 0 A02 1 4 4 4 5

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

ID S1 S2 S3

1 A01 3 4 5

Page 104: The many ways to effectively utilize array processing

4th iteration: Since LAST.ID ≠ 1, SAS returns to the beginning of the DATA step to begin the 5th iteration

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

4 1 0 A02 1 4 4 4 5

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

ID S1 S2 S3

1 A01 3 4 5

Page 105: The many ways to effectively utilize array processing

5th iteration:
_N_ is incremented to 5
The rest of the variables are retained

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

5 1 0 A02 1 4 4 4 5

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

ID S1 S2 S3

1 A01 3 4 5

Page 106: The many ways to effectively utilize array processing

5th iteration: The SET statement copies the 5th observation to the PDV

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

5 1 0 A02 3 2 4 4 5

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

ID S1 S2 S3

1 A01 3 4 5

Page 107: The many ways to effectively utilize array processing

5th iteration:
The SET statement copies the 5th observation to the PDV
FIRST.ID is set to 0; this is not the first observation for A02
LAST.ID is set to 1; this is the last observation for A02

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

5 0 1 A02 3 2 4 4 5

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

ID S1 S2 S3

1 A01 3 4 5

Page 108: The many ways to effectively utilize array processing

5th iteration: Since TIME = 3, S3 is set to SCORE (2)

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

5 0 1 A02 3 2 4 4 2

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

ID S1 S2 S3

1 A01 3 4 5

Page 109: The many ways to effectively utilize array processing

5th iteration: Since LAST.ID = 1, SAS continues to execute the rest of the statements

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

5 0 1 A02 3 2 4 4 2

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

ID S1 S2 S3

1 A01 3 4 5

Page 110: The many ways to effectively utilize array processing

5th iteration:
SAS reaches the end of the 5th iteration
The implicit OUTPUT executes; variables marked with (K) are copied to the dataset wide

_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K

5 0 1 A02 3 2 4 4 2

EXECUTION PHASE

data wide (drop=time score); set long; by id; retain s1 - s3; if time = 1 then s1 = score; else if time =2 then s2 = score; else s3 = score; if last.id;run;

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

ID S1 S2 S3

1 A01 3 4 5

2 A02 4 4 2

S2 for A02 should be missing, but it still holds the value 4 retained from A01. How to fix this?

Page 111: The many ways to effectively utilize array processing

EXECUTION PHASE

data wide (drop=time score);
   set long;
   by id;
   retain s1 - s3;
   if first.id then do;
      s1 = .;
      s2 = .;
      s3 = .;
   end;
   if time = 1 then s1 = score;
   else if time = 2 then s2 = score;
   else s3 = score;
   if last.id;
run;

Page 112: The many ways to effectively utilize array processing

FROM LONG FORMAT TO WIDE FORMAT(USING ARRAYS)

ID S1 S2 S3

1 A01 3 4 5

2 A02 4 . 2

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

data wide (drop=time score); set long; by id;

retain s1 - s3;

if first.id then do; s1 = .; s2 = .; s3 = .; end;

if time = 1 then s1 = score; else if time = 2 then s2 = score; else s3 = score;

if last.id;run;

S

[1] [2] [3]

array s[3];

if first.id then do;
   do i = 1 to 3;
      s[i] = .;
   end;
end;

S[1] S[2] S[3]S[1] S[2] S[3]

S[1]S[2]

S[3]

retain s;

Page 113: The many ways to effectively utilize array processing

FROM LONG FORMAT TO WIDE FORMAT(USING ARRAYS)

ID S1 S2 S3

1 A01 3 4 5

2 A02 4 . 2

data wide (drop=time score); set long; by id;

retain s1 - s3;

if first.id then do; s1 = .; s2 = .; s3 = .; end;

if time = 1 then s1 = score; else if time = 2 then s2 = score; else s3 = score;

if last.id;run;

S

[1] [2] [3]

array s[3];

S[1] S[2] S[3]S[1] S[2] S[3]

S[1]S[2]

S[3]

_N_ D FIRST.ID D LAST.ID D

ID K TIME D SCORE D

S1 K S2 K S3 K

S[1] S[2] S[3]

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

retain s;

Page 114: The many ways to effectively utilize array processing

FROM LONG FORMAT TO WIDE FORMAT(USING ARRAYS)

ID S1 S2 S3

1 A01 3 4 5

2 A02 4 . 2

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

data wide (drop=time score); set long; by id;

retain s1 - s3;

if first.id then do; s1 = .; s2 = .; s3 = .; end;

if time = 1 then s1 = score; else if time = 2 then s2 = score; else s3 = score;

if last.id;run;

S

[1] [2] [3]

array s[3];

S[1] S[2] S[3]S[1] S[2] S[3]

S[1]S[2]

S[3]

_N_ D FIRST.ID D LAST.ID D

1 1 0

ID K TIME D SCORE D

A01 1 3

S1 K S2 K S3 K

. . .

S[1] S[2] S[3]

S[TIME]

3

retain s;

Page 115: The many ways to effectively utilize array processing

FROM LONG FORMAT TO WIDE FORMAT(USING ARRAYS)

ID S1 S2 S3

1 A01 3 4 5

2 A02 4 . 2

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

data wide (drop=time score); set long; by id;

retain s1 - s3;

if first.id then do; s1 = .; s2 = .; s3 = .; end;

if time = 1 then s1 = score; else if time = 2 then s2 = score; else s3 = score;

if last.id;run;

S

[1] [2] [3]

array s[3];

S[1] S[2] S[3]S[1] S[2] S[3]

S[1]S[2]

S[3]

_N_ D FIRST.ID D LAST.ID D

2 0 0

ID K TIME D SCORE D

A01 2 4

S1 K S2 K S3 K

. . .

S[1] S[2] S[3]

S[TIME]

3 4

retain s;

Page 116: The many ways to effectively utilize array processing

FROM LONG FORMAT TO WIDE FORMAT(USING ARRAYS)

ID S1 S2 S3

1 A01 3 4 5

2 A02 4 . 2

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

data wide (drop=time score); set long; by id;

retain s1 - s3;

if first.id then do; s1 = .; s2 = .; s3 = .; end;

if time = 1 then s1 = score; else if time = 2 then s2 = score; else s3 = score;

if last.id;run;

S

[1] [2] [3]

array s[3];

S[1] S[2] S[3]S[1] S[2] S[3]

S[1]S[2]

S[3]s[time] = score;

retain s;

Page 117: The many ways to effectively utilize array processing

FROM LONG FORMAT TO WIDE FORMAT(USING ARRAYS)

ID S1 S2 S3

1 A01 3 4 5

2 A02 4 . 2

ID TIME SCORE

1 A01 1 3

2 A01 2 4

3 A01 3 5

4 A02 1 4

5 A02 3 2

data wide (drop=time score); set long; by id;

retain s1 - s3;

if first.id then do; s1 = .; s2 = .; s3 = .; end;

if time = 1 then s1 = score; else if time = 2 then s2 = score; else s3 = score;

if last.id;run;

S

[1] [2] [3]

array s[3];

S[1] S[2] S[3]S[1] S[2] S[3]

S[1]S[2]

S[3]

data wide (drop = time score i);
   set long;
   by id;
   array s[3];
   retain s;
   if first.id then do;
      do i = 1 to 3;
         s[i] = .;
      end;
   end;
   s[time] = score;
   if last.id;
run;
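A self-contained variation of the array version is sketched below. The DATALINES step rebuilds the long table from the slides, and CALL MISSING(of s[*]) is swapped in for the inner DO loop as an alternative way to reset the array; that substitution is an assumption of this sketch, not the approach shown in the slides.

```sas
/* Rebuild the long table from the slide */
data long;
   input id $ time score;
   datalines;
A01 1 3
A01 2 4
A01 3 5
A02 1 4
A02 3 2
;
run;

proc sort data=long;
   by id;
run;

data wide (drop = time score);
   set long;
   by id;
   array s[3];
   retain s;
   /* reset every element to missing at the start of each subject */
   if first.id then call missing(of s[*]);
   s[time] = score;
   if last.id;
run;
```

With the reset in place, A02 should come out as 4, missing, 2 instead of inheriting S2 = 4 from A01. No index variable is needed, so I does not appear in the DROP= list.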

Page 118: The many ways to effectively utilize array processing

MULTIDIMENSIONAL ARRAYS

ARRAY ARRAYNAME[R, C, …] <$> <ELEMENTS>;

The difference between one- and multi-dimensional arrays is the DIMENSION

R: number of rows

C: number of columns

If there are 3 dimensions, the next number will refer to the number of pages

Page 119: The many ways to effectively utilize array processing

MULTIDIMENSIONAL ARRAYS

array a[2,3];

equivalent to …

array a[2,3] a1 - a6;

     1    2    3
1    a1   a2   a3
2    a4   a5   a6

a[2,2] refers to a5

a[1,3] refers to a3
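The row-by-row layout of a two-dimensional array can be checked with a short sketch like the one below. The initial values 1 through 6 and the variables r, c, and value are made up for this illustration.

```sas
data _null_;
   /* initial values are assigned to a1-a6 in order, filling row 1 first */
   array a[2,3] (1 2 3 4 5 6);
   do r = 1 to dim(a);        /* DIM returns the size of the 1st dimension */
      do c = 1 to dim2(a);    /* DIM2 returns the size of the 2nd dimension */
         value = a[r,c];
         put r= c= value=;    /* writes one line per element to the log */
      end;
   end;
run;
```

In the log, r=2 c=2 should show value=5 and r=1 c=3 should show value=3, matching a5 and a3 in the grid.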

Page 120: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

Dat1:

   ID  G1  G2  G3  FIRST.ID  LAST.ID
1  1   A   B   F   1         0
2  1   B   A   C   0         1
3  2   B   A   D   1         0
4  2   C   B   C   0         1

Dat2:

   ID  M_G1  M_G2  M_G3  F_G1  F_G2  F_G3
1  1   A     B     F     B     A     C
2  2   B     A     D     C     B     C

Create ONE observation after you finish reading ALL the observations for EACH person

Use BY-group processing

The output will be generated when LAST.ID equals 1

Page 121: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

2 2 B A D C B C

Dat1:

Dat2:

G[3]: used to group existing variables

1   2   3
G1  G2  G3

ALL_G[2,3]: used to create new variables

     1     2     3
1    M_G1  M_G2  M_G3
2    F_G1  F_G2  F_G3

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

RETAIN

i + 1;

Page 122: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

proc sort data=dat1;
   by id;
run;

data dat2 (drop = i j g1 - g3);
   set dat1;
   by id;
   array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
   array g[3];
   retain all_g;
   if first.id then i = 0;
   i + 1;
   do j = 1 to 3;
      all_g[i,j] = g[j];
   end;
   if last.id;
run;

Page 123: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

1 1 1 . . . . 0 .

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

. . . . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

ALL_G [I,J]

Dat1:

At the beginning of the 1st iteration:

G1 G2 G3

G [J]

ARRAY TRACKING

Page 124: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

1 1 0 1 A B F 0 .

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

. . . . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

1st iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

ALL_G [I,J]

G1 G2 G3

G [J]

data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;

Page 125: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

1 1 0 1 A B F 0 .

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

. . . . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

1st iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

ALL_G [I,J]

G1 G2 G3

G [J]

data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;

Page 126: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

1 1 0 1 A B F 1 .

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

. . . . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

1st iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

ALL_G [I,J]

G1 G2 G3

G [J]

data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;

Page 127: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

1 1 0 1 A B F 1 1

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

. . . . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

1st iteration (1st DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

ALL_G [I,J]

G1 G2 G3

G [J]

data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;

Page 128: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

1 1 0 1 A B F 1 1

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A . . . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

1st iteration (1st DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 129: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

1 1 0 1 A B F 1 1

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A . . . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

1st iteration (1st DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 130: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

1 1 0 1 A B F 1 2

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A . . . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

1st iteration (2nd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 131: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

1 1 0 1 A B F 1 2

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B . . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

1st iteration (2nd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 132: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

1 1 0 1 A B F 1 2

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B . . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

1st iteration (2nd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 133: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

1 1 0 1 A B F 1 3

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B . . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

1st iteration (3rd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 134: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

1 1 0 1 A B F 1 3

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

1st iteration (3rd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 135: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

1 1 0 1 A B F 1 3

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

1st iteration (3rd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 136: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

1 1 0 1 A B F 1 4

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

1st iteration (DO loop ends: J = 4 > 3):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;
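The PDV on this slide shows J = 4 even though the loop is written as `do j = 1 to 3`. The index is incremented at END and tested before each pass, so the body runs for J = 1, 2, 3 and the loop stops once J reaches 4. Below is a small Python sketch of that behavior; `do_loop` is a hypothetical helper, not a SAS function, and a `while` loop is used because a Python `for` loop would leave the index at 3.

```python
def do_loop(start, stop):
    """Mimic `do j = start to stop; ... end;` and return (visited, final_j)."""
    visited = []
    j = start
    while j <= stop:       # the test happens before each pass
        visited.append(j)  # loop body: here all_g[i,j] = g[j] would run
        j += 1             # the increment happens at END
    return visited, j


visited, final_j = do_loop(1, 3)
print(visited, final_j)
```

The body is executed three times, and the value left in J afterwards is 4, which is what the PDV shows before the subsetting IF is evaluated.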

Page 137: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

1 1 0 1 A B F 1 4

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

1st iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 138: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 1 0 1 A B F 1 .

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;
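At the top of the 2nd iteration only J has gone back to missing. ID and G1 to G3 keep their values because they are read by SET, I keeps its value because it is built by the sum statement `i + 1`, and the six ALL_G elements keep theirs because of `retain all_g`. The Python sketch below illustrates that rule with the PDV values from the slides; `start_iteration` and the `retained` set are hypothetical names, and `None` stands in for a missing value.

```python
def start_iteration(pdv, retained):
    """Return the PDV as it looks at the top of the next iteration (sketch)."""
    # Retained variables keep their value; all others are reset to missing.
    return {name: (value if name in retained else None)
            for name, value in pdv.items()}


# PDV at the end of the 1st iteration (None = missing).
pdv = {"id": 1, "g1": "A", "g2": "B", "g3": "F", "i": 1, "j": 4,
       "m_g1": "A", "m_g2": "B", "m_g3": "F",
       "f_g1": None, "f_g2": None, "f_g3": None}

retained = {"id", "g1", "g2", "g3",                    # read by SET
            "i",                                       # sum statement: i + 1
            "m_g1", "m_g2", "m_g3",                    # RETAIN all_g
            "f_g1", "f_g2", "f_g3"}

next_pdv = start_iteration(pdv, retained)
print(next_pdv["i"], next_pdv["j"], next_pdv["m_g1"])
```

J is the only variable in this step that is assigned by an ordinary statement (the DO loop) without being retained, which is why it is the only one shown as missing.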

Page 139: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 0 1 1 B A C 1 .

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 140: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 0 1 1 B A C 1 .

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 141: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 0 1 1 B A C 2 .

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 142: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 0 1 1 B A C 2 1

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F . . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration (1st DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 143: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 0 1 1 B A C 2 1

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration (1st DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 144: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 0 1 1 B A C 2 1

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration (1st DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 145: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 0 1 1 B A C 2 2

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B . .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration (2nd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 146: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 0 1 1 B A C 2 2

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B A .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration (2nd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 147: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 0 1 1 B A C 2 2

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B A .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration (2nd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 148: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 0 1 1 B A C 2 3

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B A .

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration (3rd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 149: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 0 1 1 B A C 2 3

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration (3rd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 150: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 0 1 1 B A C 2 3

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration (3rd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 151: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 0 1 1 B A C 2 4

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration (DO loop ends: J = 4 > 3):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 152: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 0 1 1 B A C 2 4

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 153: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 0 1 1 B A C 2 4

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ALL_G [I,J]

G [J]

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

Page 154: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

2 0 1 1 B A C 2 4

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

2nd iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

ALL_G [I,J]

G [J]

Dat2:
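The first observation of Dat2 is written at this point because LAST.ID = 1, so the subsetting IF lets the implicit OUTPUT run. The result can be cross-checked with a short Python sketch that walks through Dat1 the same way the DATA step does. This is an imitation under stated assumptions, not SAS code: `restructure` and the dictionary layout of `dat1` are hypothetical, and `None` stands in for a missing value.

```python
# Dat1 from the slides, already sorted by ID.
dat1 = [
    {"id": 1, "g": ["A", "B", "F"]},
    {"id": 1, "g": ["B", "A", "C"]},
    {"id": 2, "g": ["B", "A", "D"]},
    {"id": 2, "g": ["C", "B", "C"]},
]


def restructure(rows):
    """Collapse two rows per ID into one row: id, m_g1-m_g3, f_g1-f_g3."""
    all_g = [[None] * 3 for _ in range(2)]   # RETAIN all_g: kept across rows
    out = []
    i = 0
    for n, row in enumerate(rows):
        # BY id: flags for the first and last row of each ID group.
        first_id = n == 0 or rows[n - 1]["id"] != row["id"]
        last_id = n == len(rows) - 1 or rows[n + 1]["id"] != row["id"]
        if first_id:
            i = 0
        i += 1                               # sum statement: i + 1
        for j in range(1, 4):                # do j = 1 to 3
            all_g[i - 1][j - 1] = row["g"][j - 1]
        if last_id:                          # subsetting IF, then implicit OUTPUT
            out.append([row["id"]] + all_g[0] + all_g[1])
    return out


dat2 = restructure(dat1)
for obs in dat2:
    print(obs)
```

The first printed row corresponds to the Dat2 observation shown on the slide (ID 1 with A B F and B A C); the second row is what the same logic gives for ID 2.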

Page 155: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

3 0 1 1 B A C 2 .

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

3rd iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 156: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

3 1 0 2 B A D 2 .

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

3rd iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 157: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

3 1 0 2 B A D 0 .

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

3rd iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 158: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

3 1 0 2 B A D 1 .

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

3rd iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 159: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

3 1 0 2 B A D 1 1

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

A B F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

3rd iteration (1st DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 160: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

3 1 0 2 B A D 1 1

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B B F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

3rd iteration (1st DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]
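The PDV on this slide mixes two IDs: M_G1 already holds B from ID 2, while M_G2, M_G3 and all three F_ variables still hold the retained values from ID 1 until they are overwritten. In Dat1 every ID has exactly two rows, so every element is overwritten before the next output. The Python sketch below illustrates what the same logic would do under a different, hypothetical input in which ID 2 has only one row. The function `widen`, its `clear_on_first` switch and the input `rows` are assumptions made for illustration and are not part of the slides.

```python
def widen(rows, clear_on_first=False):
    """Sketch of the DATA step; rows are (id, [g1, g2, g3]) tuples sorted by id."""
    all_g = [[None] * 3 for _ in range(2)]       # retained across iterations
    out, i = [], 0
    for n, (id_, g) in enumerate(rows):
        first_id = n == 0 or rows[n - 1][0] != id_
        last_id = n == len(rows) - 1 or rows[n + 1][0] != id_
        if first_id:
            i = 0
            if clear_on_first:                   # hypothetical remedy, not in the slides
                all_g = [[None] * 3 for _ in range(2)]
        i += 1
        all_g[i - 1] = list(g)                   # the DO loop: all_g[i,j] = g[j]
        if last_id:
            out.append((id_, all_g[0] + all_g[1]))
    return out


# Hypothetical input: ID 2 has a single row (unlike Dat1 in the slides).
rows = [(1, ["A", "B", "F"]), (1, ["B", "A", "C"]), (2, ["B", "A", "D"])]
stale = widen(rows)
clean = widen(rows, clear_on_first=True)
print(stale[1])   # second-row values are carried over from ID 1
print(clean[1])   # second-row values are missing
```

In SAS, a comparable reset could probably be written by setting the array elements to missing when FIRST.ID is true, for example with something like `call missing(of all_g[*]);`, but that statement is a suggestion and does not appear in the original program.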

Page 161: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

3 1 0 2 B A D 1 1

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B B F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

3rd iteration (1st DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 162: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

3 1 0 2 B A D 1 2

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B B F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

3rd iteration (2nd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 163: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

3 1 0 2 B A D 1 2

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

3rd iteration (2nd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 164: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

3 1 0 2 B A D 1 2

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

3rd iteration (2nd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 165: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

3 1 0 2 B A D 1 3

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A F B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

3rd iteration (3rd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 166: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

3 1 0 2 B A D 1 3

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

3rd iteration (3rd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 167: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

3 1 0 2 B A D 1 3

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

3rd iteration (3rd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 168: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

3 1 0 2 B A D 1 4

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

3rd iteration (DO loop ends: J = 4 > 3):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 169: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

3 1 0 2 B A D 1 4

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

3rd iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 170: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

4 1 0 2 B A D 1 .

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

4th iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 171: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

4 0 1 2 C B C 1 .

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

4th iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 172: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

4 0 1 2 C B C 1 .

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

4th iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 173: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

4 0 1 2 C B C 2 .

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

4th iteration:

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 174: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

4 0 1 2 C B C 2 1

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D B A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

4th iteration (1st DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 175: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

4 0 1 2 C B C 2 1

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D C A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

4th iteration (1st DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 176: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

4 0 1 2 C B C 2 1

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D C A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

4th iteration (1st DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 177: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

4 0 1 2 C B C 2 2

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D C A C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

4th iteration (2nd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 178: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

4 0 1 2 C B C 2 2

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D C B C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

4th iteration (2nd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 179: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

4 0 1 2 C B C 2 2

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D C B C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

4th iteration (2nd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 180: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

4 0 1 2 C B C 2 3

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D C B C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

4th iteration (3rd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 181: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

4 0 1 2 C B C 2 3

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D C B C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

4th iteration (3rd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 182: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

4 0 1 2 C B C 2 3

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D C B C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

4th iteration (3rd DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 183: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

4 0 1 2 C B C 2 4

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D C B C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

4th iteration (4th DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 184: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

4 0 1 2 C B C 2 4

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D C B C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

4th iteration (4th DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

Dat2:

ALL_G [I,J]

G [J]

Page 185: The many ways to effectively utilize array processing

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;
run;

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D

4 0 1 2 C B C 2 4

M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K

B A D C B C

ALL_G[1,1]

ALL_G[1,2]

ALL_G[1,3]

ALL_G[2,1]

ALL_G[2,2]

ALL_G[2,3]

G[1] G[2] G[3]

Dat1:

4th iteration (4th DO loop):

1 2 3

1 M_G1 M_G2 M_G3

2 F_G1 F_G2 F_G3

G1 G2 G3

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

2 2 B A D C B C

Dat2:

ALL_G [I,J]

G [J]
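To check the walk-through above end to end, the step can be run against a small copy of Dat1. This is a minimal sketch: the DATALINES values are copied from the Dat1 table on the slides, and the PROC PRINT at the end is an addition for inspection only, not part of the original program.

```sas
/* Rebuild the four-row input shown as Dat1 on the slides */
data dat1;
    input id g1 $ g2 $ g3 $;
    datalines;
1 A B F
1 B A C
2 B A D
2 C B C
;
run;

/* Collapse the two rows per ID into one row with six variables */
data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;                      /* dat1 must already be sorted by id */
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;   /* new character variables */
    array g[3];                 /* groups the existing g1-g3 */
    retain all_g;               /* keep row 1 values while row 2 is read */
    if first.id then i = 0;     /* restart the row counter for each ID */
    i + 1;                      /* sum statement: i is retained automatically */
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;                 /* output only once per ID */
run;

/* Added for inspection only */
proc print data = dat2 noobs;
run;
```

With this input the printed dataset should match the Dat2 table above: one row per ID, with M_G1 to M_G3 taken from the first row of each ID and F_G1 to F_G3 from the second.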

Page 186: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

2 2 B A D C B C

Dat1:

Dat2:

Creating TWO observations after you finish reading ONE observation

Page 187: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

ID G1 G2 G3

1 1 A B F

2 1 B A C

3 2 B A D

4 2 C B C

ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3

1 1 A B F B A C

2 2 B A D C B C

Dat1:

Dat2:

G[3] (used to create new variables):

G[1] G[2] G[3]

G1 G2 G3

ALL_G[2,3] (used to group existing variables):

ALL_G[1,1] ALL_G[1,2] ALL_G[1,3]

M_G1 M_G2 M_G3

ALL_G[2,1] ALL_G[2,2] ALL_G[2,3]

F_G1 F_G2 F_G3

Page 188: The many ways to effectively utilize array processing

RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY

data dat1 (drop = i j m_g1 -- f_g3);
    set dat2;
    array all_g [2,3] m_g1 -- f_g3;
    array g[3] $;
    do i = 1 to 2;
        do j = 1 to 3;
            g[j] = all_g[i,j];
        end;
        output;
    end;
run;
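The reverse step can be tried the same way. In this sketch the input dat2 is typed in with DATALINES (values copied from the Dat2 table on the slides), the DATA step is the one shown above, and the PROC PRINT is added only to inspect the result.

```sas
/* Rebuild the two-row input shown as Dat2 on the slides */
data dat2;
    input id m_g1 $ m_g2 $ m_g3 $ f_g1 $ f_g2 $ f_g3 $;
    datalines;
1 A B F B A C
2 B A D C B C
;
run;

/* Expand each row of dat2 into two rows of dat1 */
data dat1 (drop = i j m_g1 -- f_g3);
    set dat2;
    array all_g [2,3] m_g1 -- f_g3;   /* groups the six existing variables */
    array g[3] $;                     /* creates new character variables g1-g3 */
    do i = 1 to 2;                    /* one pass per row of the 2 x 3 array */
        do j = 1 to 3;
            g[j] = all_g[i,j];
        end;
        output;                       /* explicit OUTPUT: two rows per input row */
    end;
run;

/* Added for inspection only */
proc print data = dat1 noobs;
run;
```

The name range list m_g1 -- f_g3 selects variables by their position in dat2, so it relies on the six variables being stored in exactly that order. The explicit OUTPUT inside the outer loop is what produces two observations for each observation read.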

Page 189: The many ways to effectively utilize array processing

CONCLUSION

Array processing lets you write more compact and efficient DATA step code

To use arrays correctly, you need to understand how DATA steps are processed in addition to grasping the array syntax

In the end, you will often find that most errors trace back to one programming fundamental: understanding how the PDV works
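One way to watch the PDV while debugging is to write it to the log on every iteration. The sketch below reuses the dat1 values from the slides and adds a PUT _ALL_ statement, which is not in the original program, just before the subsetting IF.

```sas
/* Same four-row input as Dat1 on the slides */
data dat1;
    input id g1 $ g2 $ g3 $;
    datalines;
1 A B F
1 B A C
2 B A D
2 C B C
;
run;

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    retain all_g;
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    put _all_;      /* debugging aid (added): write every PDV variable to the log */
    if last.id;
run;
```

Each iteration then writes one line of name=value pairs to the log, including automatic variables such as _N_ and _ERROR_, which can be compared with the PDV snapshots in the walk-through. The statement should be removed once the step behaves as expected.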

Page 190: The many ways to effectively utilize array processing

ACKNOWLEDGEMENT

I would like to thank Helen Wang and Cindy Song for giving me the opportunity to present at PharmaSUG 2011

Page 191: The many ways to effectively utilize array processing

CONTACT INFORMATION

Arthur Li

City of Hope

Division of Information Science

1500 East Duarte Road

Duarte, CA 91010-3000

Phone: (626) 256-4673 ext. 65121

E-mail: [email protected]