Introduction to Polyhedral Compilation

Akihiro Hayashi, Jun Shirako Rice University

Upload: akihiro-hayashi

Post on 12-Jan-2017


TRANSCRIPT

Page 1: Introduction to Polyhedral Compilation

Introduction to Polyhedral Compilation

Akihiro Hayashi, Jun Shirako Rice University


Page 2: Introduction to Polyhedral Compilation

Outline
- High-level Summary
- Theory
- Compilers and Tools

Page 3: Introduction to Polyhedral Compilation

HIGH-LEVEL SUMMARY

Page 4: Introduction to Polyhedral Compilation

Parallel Computing
- The first priority is “performance”

[Figure: supercomputers, personal computers, smartphones, and embedded devices. Pictures borrowed from commons.wikimedia.org and www.hirt-japan.info]

Page 5: Introduction to Polyhedral Compilation

Parallel programming is hard…

[Figure: memory hierarchy from DRAM (slowest) to registers (fastest). Multi-core CPU: DRAM, L3 cache, L2 caches, and four cores, each with a SIMD unit and an L1 cache. Many-core GPU: DRAM, L2 cache, and many cores.]

Challenges shown in the figure:
- Exploiting SIMD
- Scheduling tasks on CPUs
- Optimizing data locality
- Utilizing accelerators

Page 6: Introduction to Polyhedral Compilation

A gap between domain experts and hardware

[Figure: layered stack, from top to bottom: Application Domain (Domain Experts); Programming Languages, Compilers, Runtime; Hardware (Concurrency Experts)]

- Domain experts want to get significant performance improvement easily (performance portability)
- It is hard to exploit the full capability of the hardware
- We believe languages and compilers are very important!

Page 7: Introduction to Polyhedral Compilation

A review of literature
- Automatic Parallelizing Compilers
  - IBM XL Compilers, Intel Compilers, OSCAR, Pluto, Polly, Polaris, R-Stream, SUIF, …
- Parallel Languages
  - Language-based: Cilk, CUDA, OpenCL, C++AMP, Java, Habanero C/Java, PGAS, …
  - Directive-based: OpenMP, OpenACC, OmpSs, …
  - Library-based: Charm++, TBB, Thrust, RAJA, Kokkos, UPC++, HJLib, …

Page 8: Introduction to Polyhedral Compilation

From the perspective of compilers…

- Compilers are among the most complicated pieces of software
  - Pointer Analysis
  - Scalar Optimizations
  - Loop Transformations
  - Vectorization/SIMDization
  - Scheduling
  - Exploiting accelerators
  - …

Credits: dragon by Cassie McKown from the Noun Project, crossed swords by anbileru adaleru from the Noun Project, https://en.wikipedia.org/

Page 9: Introduction to Polyhedral Compilation

What are compilers doing?

Programs -> Parsing -> Intermediate Representation (e.g. AST) -> Optimizations -> “Optimized” Code

Program:
    x = a + b;
    y = a + b;
    z = x + y;

[Figure: AST in which two (a + b) subtrees feed a + node]

“Optimized” code:
    x = a + b;
    y = x;
    z = x + y;

Page 10: Introduction to Polyhedral Compilation

What are compilers doing?

- Compilers can modify programs (e.g. change the execution order of statements) as long as they preserve the semantics of the programs

Programs -> Intermediate Representation (e.g. AST) -> “Optimized” Code

Program:
    x = a + b;
    y = a + b;
    z = x + y;

[Figure: AST in which two (a + b) subtrees feed a + node]

“Optimized” code:
    x = a + b;
    y = x;
    z = x + y;

Reordered statements:
    z = x + y;
    x = a + b;

Page 11: Introduction to Polyhedral Compilation

Examples of optimizations:Scalar optimizations

CSE:
    Before:
        x = a + b;
        y = a + b;
        z = x + y;
    After:
        x = a + b;
        y = x;
        z = x + y;

Constant Propagation:
    Before:
        a = 0;
        if (a) {
            …
        }
    After:
        a = 0;
        if (0) {
            …
        }

Dead Code Elimination (applied to the result of Constant Propagation):
    After:
        a = 0;

Page 12: Introduction to Polyhedral Compilation

Examples of optimizations:loop permutation (interchange)

Original: offset access (faster on CPUs)
    for (i = 0; i < M; i++) {
        for (j = 0; j < N; j++) {
            b[i][j] = a[i][j];
        }
    }

Interchanged: stride access (slower on CPUs)
    for (j = 0; j < N; j++) {
        for (i = 0; i < M; i++) {
            b[i][j] = a[i][j];
        }
    }

Page 13: Introduction to Polyhedral Compilation

Examples of optimizations:loop fusion/distribution

Fused: better temporal locality on CPUs
    for (i = 0; i < N; i++) {
        a[i] = b[i] + c[i];
        d[i] = a[i] + e[i];
    }

Distributed: good for vectorization on CPUs
    for (i = 0; i < N; i++) {
        a[i] = b[i] + c[i];
    }
    for (i = 0; i < N; i++) {
        d[i] = a[i] + e[i];
    }

Which version is better depends on the loop size “N”.

Page 14: Introduction to Polyhedral Compilation

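The phase-ordering problem
- Which order is better?

Order 1: Dead Code Elimination, then Constant Propagation
    a = 0;
    if (a) {
        …
    }
  -> Dead Code Elimination ->
    a = 0;
    if (a) {
        …
    }
  -> Constant Propagation ->
    a = 0;
    if (0) {
        …
    }

Order 2: Constant Propagation, then Dead Code Elimination
    a = 0;
    if (a) {
        …
    }
  -> Constant Propagation ->
    a = 0;
    if (0) {
        …
    }
  -> Dead Code Elimination ->
    a = 0;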

Page 15: Introduction to Polyhedral Compilation

AST vs. The Polyhedral Model

AST-based flow: Programs -> AST -> “Optimized” Code

    Program:
        x = a + b;
        y = a + b;
        z = x + y;

    “Optimized” code:
        x = a + b;
        y = x;
        z = x + y;

Polyhedral flow (TODAY): Programs -> Polyhedron (Affine Inequalities) -> “Synthesized” Code

    i >= 0;
    i < N;
    …

Page 16: Introduction to Polyhedral Compilation

Why Polyhedral Model?

- One solution for tackling the phase-ordering problem
- Good for performing a set of loop transformations
  - Loop permutation
  - Loop fusion/distribution
  - Loop tiling
  - …

“The Polyhedral Model is a convenient alternative representation which combines analysis power, expressiveness and high flexibility” - OpenScop Specification and Library

Page 17: Introduction to Polyhedral Compilation

THEORY

Page 18: Introduction to Polyhedral Compilation

The polyhedral model in a nutshell
- The polyhedral transformation = “scheduling” (determining the execution order of statements)
- 3 important things:
  - Domain: a set of instances for a statement
  - Scattering (Scheduling): an instance -> time stamp
  - Access: an instance -> array element(s)
- Limitation: only applicable to Static Control Parts (SCoPs) in general
  - Loop bounds and conditionals are affine functions of the surrounding loop iterators

[Figure: Program -> Inequalities -> ILP (constraints and cost function) -> “Synthesized” Code]

Program:
    for (i=1; …) {
        S1;
        for (j=1; …)
            S2;

Inequalities:
    1 ≤ iS1 ≤ 2; 1 ≤ iS2 ≤ 2; 1 ≤ jS2 ≤ 3; iS1 = iS2;

ILP:
    Constraints: C_i − C_j ≥ 0, …
    Cost function:
\[
\delta_e(\vec{s}, \vec{t}) = \phi_{S_j}(\vec{t}) - \phi_{S_i}(\vec{s})
\]

“Synthesized” code:
    for (i=1; …) {
        S1;
    }
    for (i=1; …) {
        …;

Page 19: Introduction to Polyhedral Compilation

Representation of “Domain”

    for (i = 1; i <= 5; i++)
        for (j = 1; j <= 6; j++)
            S1;

- Observations:
  - S1 is executed 30 times (30 instances)
  - Each instance is associated with (i, j)

“The key aspect of the polyhedral model is to consider statement instances.” - OpenScop Specification and Library

Page 20: Introduction to Polyhedral Compilation

Iteration Domain

- A set of constraints to represent instances of a statement
  - Using iteration vectors (i, j)
  - If those constraints are affine -> Polyhedron

    for (i = 1; i <= 5; i++)
        for (j = 1; j <= 6; j++)
            S1;

1 ≤ i ≤ 5, 1 ≤ j ≤ 6;

\[
D_{S1} =
\begin{pmatrix}
 1 &  0 & -1 \\
-1 &  0 &  5 \\
 0 &  1 & -1 \\
 0 & -1 &  6
\end{pmatrix}
\begin{pmatrix} i \\ j \\ 1 \end{pmatrix}
\ge 0
\]

Credits: Clint (https://www.ozinenko.com/clint)
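As a minimal sketch of the matrix form of the iteration domain, the C code below tests the four affine constraints row by row and counts the integer points that satisfy them. The function names `in_domain_S1` and `count_instances_S1` and the bounding-box parameters are illustrative assumptions, not part of the slides.

```c
#include <stdbool.h>

/* Rows of D_S1: each row r encodes r[0]*i + r[1]*j + r[2] >= 0. */
static const int D_S1[4][3] = {
    { 1,  0, -1},   /*  i - 1 >= 0 */
    {-1,  0,  5},   /* -i + 5 >= 0 */
    { 0,  1, -1},   /*  j - 1 >= 0 */
    { 0, -1,  6},   /* -j + 6 >= 0 */
};

/* Returns true if (i, j) satisfies every affine inequality of the domain. */
bool in_domain_S1(int i, int j)
{
    for (int r = 0; r < 4; r++) {
        if (D_S1[r][0] * i + D_S1[r][1] * j + D_S1[r][2] < 0)
            return false;
    }
    return true;
}

/* Counts the integer points of the domain inside the box [lo, hi] x [lo, hi].
   The box is an assumption of this sketch; it only has to cover the domain. */
int count_instances_S1(int lo, int hi)
{
    int count = 0;
    for (int i = lo; i <= hi; i++)
        for (int j = lo; j <= hi; j++)
            if (in_domain_S1(i, j))
                count++;
    return count;
}
```

With a box that covers the loop bounds, the count should match the 30 instances of S1 observed for this loop nest.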

Page 21: Introduction to Polyhedral Compilation

Representation of “Scheduling”:1-dimensional schedules

- Function T: returns the logical date of each statement

    x = a + b; // S1
    y = a + b; // S2
    z = x + y; // S3

    T_S1 = 0;
    T_S2 = 1;
    T_S3 = 2;

[Figure: logical time axis with S1 at T=0, S2 at T=1, and S3 at T=2]

Page 22: Introduction to Polyhedral Compilation

Representation of “Scheduling”:multi-dimensional schedules

- Function T: returns the logical date of each statement
- Logical dates may be multi-dimensional
  - C.f. clocks (days, hours, minutes, seconds)
  - Lexicographical order: T_S1 ≺ T_S2 ≺ T_S3 ⇔ (0) ≺ (1, i) ≺ (2)

    x = a + b;       // S1
    for (i = 0; i < 2; i++) {
        a[i] = x;    // S2
    }
    z = x + y;       // S3

    T_S1 = (0);
    T_S2(0) = (1, 0);
    T_S2(1) = (1, 1);
    T_S3 = (2);

[Figure: logical time axis with S1 at T=0, S2 at T=1 (i=0, then i=1), and S3 at T=2]

Page 23: Introduction to Polyhedral Compilation

Representation of “Scheduling”:multi-dimensional schedules

- Function T: returns the logical date of each statement
- Logical dates may be multi-dimensional
  - C.f. clocks (days, hours, minutes, seconds)
  - Lexicographical order: T_S1 ≺ T_S2 ≺ T_S3 ⇔ (0) ≺ (1, i) ≺ (2)

    x = a + b;       // S1
    for (i = 0; i < 2; i++) {
        a[i] = x;    // S2
    }
    z = x + y;       // S3

    T_S1 = (0);
    T_S2(i) = (1, i);
    T_S3 = (2);

Parameterized: recall the “iteration domain” 0 ≤ i < 2

[Figure: logical time axis with S1 at T=0, S2 at T=1 (i=0, then i=1), and S3 at T=2]
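The lexicographical order on logical dates can be sketched in C as a plain vector comparison. The helper name `lex_compare` is hypothetical, and padding the shorter vector with zeros is an assumption of this sketch rather than something the slides specify.

```c
#include <stddef.h>

/* Compares two timestamps lexicographically.
   Shorter vectors are treated as if padded with zeros (sketch assumption).
   Returns -1 if a precedes b, 1 if b precedes a, and 0 if they are equal. */
int lex_compare(const int *a, size_t na, const int *b, size_t nb)
{
    size_t n = na > nb ? na : nb;
    for (size_t k = 0; k < n; k++) {
        int x = k < na ? a[k] : 0;
        int y = k < nb ? b[k] : 0;
        if (x < y) return -1;
        if (x > y) return 1;
    }
    return 0;
}
```

For the example above, comparing (0), (1, 0), (1, 1), and (2) pairwise reproduces the order S1, S2(0), S2(1), S3, in the same way that clock readings are compared field by field.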

Page 24: Introduction to Polyhedral Compilation

Representation of “Scheduling”:multi-dimensional schedules

    x = a + b;                   // S1
    for (i = 0; i < 2; i++) {
        a[i] = x;                // S2
    }
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 3; j++) {
            b[i][j] += a[i];     // S3
        }
    }

    T_S1 = (0);
    T_S2(i) = (1, i);
    T_S3(i,j) = (2, i, j);

[Figure: logical time axis with S1 at T=0, S2 at T=1 ordered by i, and S3 at T=2 ordered by i and then by j]

Page 25: Introduction to Polyhedral Compilation

Loop transformations with schedules

Original:
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 3; j++) {
            b[i][j] = ...; // S1
        }
    }
    Original schedule: T_S1(i, j) = (i, j);

New:
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 3; j++) {
            b[i][j] = ...; // S1
        }
    }
    New schedule: T_S1(i, j) = (i, j);

New schedule = transformation × iteration vector:
\[
T_{S1}(i, j) =
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} i \\ j \end{pmatrix}
=
\begin{pmatrix} i \\ j \end{pmatrix}
\]

Page 26: Introduction to Polyhedral Compilation

Loop transformations with schedules: Loop Reversal

Original:
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 3; j++) {
            b[i][j] = ...; // S1
        }
    }
    Original schedule: T_S1(i, j) = (i, j);

New:
    for (i = -1; i <= 0; i++) {
        for (j = 0; j < 3; j++) {
            b[-i][j] = ...; // S1
        }
    }
    New schedule: T_S1(i, j) = (-i, j);

New schedule = transformation × iteration vector:
\[
T_{S1}(i, j) =
\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} i \\ j \end{pmatrix}
=
\begin{pmatrix} -i \\ j \end{pmatrix}
\]

i_new = −i_old; i_old → −i_new;

Page 27: Introduction to Polyhedral Compilation

Loop transformations with schedules: Loop Permutation

Original:
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 3; j++) {
            b[i][j] = ...; // S1
        }
    }
    Original schedule: T_S1(i, j) = (i, j);

New:
    for (j = 0; j < 3; j++) {
        for (i = 0; i < 2; i++) {
            b[i][j] = ...; // S1
        }
    }
    New schedule: T_S1(i, j) = (j, i);

New schedule = transformation × iteration vector:
\[
T_{S1}(i, j) =
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} i \\ j \end{pmatrix}
=
\begin{pmatrix} j \\ i \end{pmatrix}
\]

Page 28: Introduction to Polyhedral Compilation

Loop transformations with schedules: Loop Skewing

Original:
    for (i = 1; i <= 5; i++) {
        for (j = 1; j <= 5; j++) {
            a[i][j] = a[i-1][j+1]; // S1
        }
    }
    Original schedule: T_S1(i, j) = (i, j);

New:
    for (i = 1; i <= 5; i++) {
        for (j = i+1; j <= i+5; j++) {
            a[i][j-i] = a[i-1][j-i+1]; // S1
        }
    }
    New schedule: T_S1(i, j) = (i, i+j);

New schedule = transformation × iteration vector:
\[
T_{S1}(i, j) =
\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} i \\ j \end{pmatrix}
=
\begin{pmatrix} i \\ i + j \end{pmatrix}
\]

j_new = i + j_old; j_old → j_new − i;
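One way to convince oneself that the skewed loop nest computes the same values is to run both versions on identical data and compare the arrays. The sketch below does this; the array extent `SZ`, the initial values, and the function names are assumptions chosen only for illustration.

```c
#include <string.h>

#define SZ 8  /* array extent; covers the indices 0..6 touched below */

/* Original loop nest. */
void run_original(int a[SZ][SZ])
{
    for (int i = 1; i <= 5; i++)
        for (int j = 1; j <= 5; j++)
            a[i][j] = a[i - 1][j + 1];
}

/* Skewed loop nest: the inner iterator is j_new = i + j_old,
   so every array subscript uses j_old = j - i. */
void run_skewed(int a[SZ][SZ])
{
    for (int i = 1; i <= 5; i++)
        for (int j = i + 1; j <= i + 5; j++)
            a[i][j - i] = a[i - 1][j - i + 1];
}

/* Fills two arrays with the same distinct values, runs one version on
   each, and returns 1 if the results are identical, 0 otherwise. */
int skewing_preserves_result(void)
{
    int x[SZ][SZ], y[SZ][SZ];
    for (int i = 0; i < SZ; i++)
        for (int j = 0; j < SZ; j++)
            x[i][j] = y[i][j] = i * SZ + j;
    run_original(x);
    run_skewed(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

A passing comparison on one input is evidence, not a proof; the legality argument via dependences is what guarantees equivalence in general.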

Page 29: Introduction to Polyhedral Compilation

Loop transformations with schedules: Loop Skewing (Cont’d)

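\[
T_{S1}(i, j) =
\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} i \\ j \end{pmatrix}
\]

[Figure: iteration space before and after skewing, with arrows for the dependence and for the execution order]

Original execution order:
(i,j)=(1,1);(1,2);(1,3);(1,4);(1,5);(2,1);(2,2);(2,3);(2,4);(2,5);(3,1);(3,2);(3,3);(3,4);(3,5);…

Skewed execution order:
(i,i+j)=(1,2);(1,3);(1,4);(1,5);(1,6);(2,3);(2,4);(2,5);(2,6);(2,7);(3,4);(3,5);(3,6);(3,7);(3,8);(4,5);…

Credits: Clint (https://www.ozinenko.com/clint)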

Page 30: Introduction to Polyhedral Compilation

Scalar Dimensions in schedules
- 2d+1 format (d+d+1)
- Can represent/transform imperfectly nested loops
  - e.g., Loop fusion/distribution

    for (i = 0; i < 2; i++) {
        s[i] = ...;            // S1
        for (j = 0; j < 3; j++)
            a[i][j] = ...;     // S2
    }
    for (i = 0; i < 2; i++)
        for (j = 0; j < 3; j++)
            b[i] = ...;        // S3

    T_S1(i) = (0, i, 0);
    T_S2(i,j) = (0, i, 1, j, 0);
    T_S3(i,j) = (1, i, 0, j, 0);

Page 31: Introduction to Polyhedral Compilation

Loop transformations with schedules: Loop fusion w/ scalar dimensions

Original:
    for (i = 0; i < 2; i++)
        for (j = 0; j < 3; j++)
            a[i] = ...; // S1
    for (i = 0; i < 2; i++)
        for (j = 0; j < 3; j++)
            b[i] = ...; // S2

    Original schedule:
    T_S1(i,j) = (0, i, 0, j);
    T_S2(i,j) = (1, i, 0, j);

New:
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 3; j++)
            a[i] = ...; // S1
        for (j = 0; j < 3; j++)
            b[i] = ...; // S2
    }

    New schedule:
    T_S1(i,j) = (0, i, 0, j);
    T_S2(i,j) = (0, i, 1, j);

New schedule = transformation × iteration vector + scalar dimensions:
\[
T_{S2}(i, j) =
\begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} i \\ j \end{pmatrix}
+
\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}
=
\begin{pmatrix} 0 \\ i \\ 1 \\ j \end{pmatrix}
\]

Page 32: Introduction to Polyhedral Compilation

Schedules in general

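\[
T_S(\vec{i}) =
\begin{pmatrix}
\phi_S^1(\vec{i}) \\ \phi_S^2(\vec{i}) \\ \phi_S^3(\vec{i}) \\ \phi_S^4(\vec{i}) \\ \vdots \\ \phi_S^d(\vec{i})
\end{pmatrix}
=
\begin{pmatrix}
C_{11}^S & C_{12}^S & C_{13}^S & C_{14}^S & \cdots & C_{1 m_S}^S \\
C_{21}^S & C_{22}^S & C_{23}^S & C_{24}^S & \cdots & C_{2 m_S}^S \\
C_{31}^S & C_{32}^S & C_{33}^S & C_{34}^S & \cdots & C_{3 m_S}^S \\
C_{41}^S & C_{42}^S & C_{43}^S & C_{44}^S & \cdots & C_{4 m_S}^S \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
C_{d1}^S & C_{d2}^S & C_{d3}^S & C_{d4}^S & \cdots & C_{d m_S}^S
\end{pmatrix}
\vec{i}
+
\begin{pmatrix}
C_{10}^S \\ C_{20}^S \\ C_{30}^S \\ C_{40}^S \\ \vdots \\ C_{d0}^S
\end{pmatrix}
\]

- The d × m_S matrix is a transformation for an iteration vector
- The d × 1 vector holds the scalar dimensions
- Schedules, e.g., (0, i, 0, j)
- d = 2 m_S + 1, m_S = the size of the iteration vector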

Page 33: Introduction to Polyhedral Compilation

Schedules in general


Goal: Compute the coefficients and offsets for each statement

Page 34: Introduction to Polyhedral Compilation

Legality of transformations

- Are all transformations valid? No!

Original:
    for (i = 1; i <= 10; i++) {
        s[i] = ...;            // S1
        for (j = 0; j < 3; j++)
            a[i][j] = s[i];    // S2
    }

    T_S1(i) = (0, i, 0);
    T_S2(i,j) = (0, i, 1, j, 0);

New (after the transformation):
    for (i = 1; i <= 10; i++) {
        for (j = 0; j < 3; j++)
            a[i][j] = s[i];    // S2
        s[i] = ...;            // S1
    }

    T_S2(i,j) = (0, i, 0, j, 0);
    T_S1(i) = (0, i, 1);

The new schedule is not valid: S2 now reads s[i] before S1 writes it.
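The effect of this invalid reordering can be observed by running both loop nests. In the sketch below the right-hand side of S1 is replaced by the placeholder `i * i` and all arrays start at zero; both choices, as well as the function names, are assumptions made only so that the code can run.

```c
#define N 10
#define M 3

/* Original order: S1 writes s[i] before S2 reads it. */
void run_original(int s[N + 1], int a[N + 1][M])
{
    for (int i = 1; i <= N; i++) {
        s[i] = i * i;              /* S1 (placeholder right-hand side) */
        for (int j = 0; j < M; j++)
            a[i][j] = s[i];        /* S2 */
    }
}

/* Transformed order: S2 reads s[i] before S1 has written it. */
void run_transformed(int s[N + 1], int a[N + 1][M])
{
    for (int i = 1; i <= N; i++) {
        for (int j = 0; j < M; j++)
            a[i][j] = s[i];        /* S2 */
        s[i] = i * i;              /* S1 (placeholder right-hand side) */
    }
}

/* Returns 1 if both versions produce the same array a, 0 otherwise. */
int transformation_preserves_result(void)
{
    int s1[N + 1] = {0}, s2[N + 1] = {0};
    int a1[N + 1][M] = {{0}}, a2[N + 1][M] = {{0}};
    run_original(s1, a1);
    run_transformed(s2, a2);
    for (int i = 1; i <= N; i++)
        for (int j = 0; j < M; j++)
            if (a1[i][j] != a2[i][j])
                return 0;
    return 1;
}
```

Under these assumptions the two versions disagree, because the transformed loop copies the stale value of s[i] into a[i][j].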

Page 35: Introduction to Polyhedral Compilation

Dependences
- Three types of dependence:
  - Read-After-Write: (a=1; then b=a;)
  - Write-After-Read: (b=a; then a=1;)
  - Write-After-Write: (a=1; then a=2;)
- Dependence: computed from domain, access, and schedule
  - Transformation = find a new schedule that satisfies all dependences

Page 36: Introduction to Polyhedral Compilation

Dependence polyhedron

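- Dependence polyhedron: a set of inequalities (P_e)
  - A general and accurate representation of instance-wise dependences

    for (i = 1; i <= 10; i++) {
        s[i] = ...;            // S1
        for (j = 0; j < 3; j++)
            a[i][j] = s[i];    // S2
    }

iS1 = iS2; 1 ≤ iS1 ≤ 10, 1 ≤ iS2 ≤ 10; 0 ≤ jS2 < 3;

\[
\begin{pmatrix}
 1 & -1 &  0 &  0 \\
 1 &  0 &  0 & -1 \\
-1 &  0 &  0 & 10 \\
 0 &  0 &  1 &  0 \\
 0 &  0 & -1 &  2
\end{pmatrix}
\begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix}
\;
\begin{matrix} = 0 \\ \ge 0 \\ \ge 0 \\ \ge 0 \\ \ge 0 \end{matrix}
\]

iS1 = iS2 ⇒ iS1 − iS2 ≥ 0 ∧ iS2 − iS1 ≥ 0

[Figure: iteration domains D_S1 of S1 and D_S2 of S2, with the dependences between their instances]

Credits: Clint (https://www.ozinenko.com/clint)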

Page 37: Introduction to Polyhedral Compilation

Legality of transformations
- Dependence polyhedron: P_e
- Legality:
\[
\forall\, s, t \in P_e,\ (s \in D_{S_i},\ t \in D_{S_j}),\quad T_{S_i}(s) \prec T_{S_j}(t)
\]
  - If the “source” instance must happen before the “target” instance in the original program, the transformed program must preserve this property (must satisfy the dependence)
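For a small dependence polyhedron the legality condition can be checked by brute force. The sketch below enumerates the S1 -> S2 dependence of the earlier example (i_S1 = i_S2, 1 ≤ i ≤ 10, 0 ≤ j < 3) and tests T_S1(s) ≺ T_S2(t) for each pair. The function names and the idea of passing only the scalar dimension that follows i are assumptions of this illustration, not how a polyhedral compiler would do it.

```c
#include <stdbool.h>

#define DIM 5  /* timestamps are padded with zeros to a common length */

/* Returns true if timestamp a strictly precedes b in lexicographic order. */
static bool lex_before(const int a[DIM], const int b[DIM])
{
    for (int k = 0; k < DIM; k++) {
        if (a[k] < b[k]) return true;
        if (a[k] > b[k]) return false;
    }
    return false;
}

/* Checks T_S1(s) < T_S2(t) for every pair (s, t) in
   P_e = { (i1, i2, j2) : i1 = i2, 1 <= i1 <= 10, 0 <= j2 < 3 },
   with T_S1(i) = (0, i, s1_scalar) and T_S2(i, j) = (0, i, s2_scalar, j, 0). */
bool schedule_is_legal(int s1_scalar, int s2_scalar)
{
    for (int i1 = 1; i1 <= 10; i1++) {
        int i2 = i1;                       /* equality constraint of P_e */
        for (int j2 = 0; j2 < 3; j2++) {
            int t_s1[DIM] = {0, i1, s1_scalar, 0, 0};
            int t_s2[DIM] = {0, i2, s2_scalar, j2, 0};
            if (!lex_before(t_s1, t_s2))
                return false;
        }
    }
    return true;
}
```

Here `schedule_is_legal(0, 1)` corresponds to the original schedule and `schedule_is_legal(1, 0)` to the reordered one. Real tools avoid enumeration, since parametric bounds make the set of pairs unbounded.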

Page 38: Introduction to Polyhedral Compilation

Putting it all together

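- Goal: compute all coefficients and offsets such that the legality condition holds:
\[
\forall\, s, t \in P_e,\ (s \in D_{S1},\ t \in D_{S2}),\quad T_{S1}(s) \prec T_{S2}(t)
\]

Schedules:
\[
T_{S1}(i) =
\begin{pmatrix} C_{11}^{S1} \\ C_{21}^{S1} \\ C_{31}^{S1} \end{pmatrix}
\begin{pmatrix} i_{S1} \end{pmatrix}
+
\begin{pmatrix} C_{10}^{S1} \\ C_{20}^{S1} \\ C_{30}^{S1} \end{pmatrix}
\]
\[
T_{S2}(i, j) =
\begin{pmatrix}
C_{11}^{S2} & C_{12}^{S2} \\
C_{21}^{S2} & C_{22}^{S2} \\
C_{31}^{S2} & C_{32}^{S2} \\
C_{41}^{S2} & C_{42}^{S2} \\
C_{51}^{S2} & C_{52}^{S2}
\end{pmatrix}
\begin{pmatrix} i_{S2} \\ j_{S2} \end{pmatrix}
+
\begin{pmatrix} C_{10}^{S2} \\ C_{20}^{S2} \\ C_{30}^{S2} \\ C_{40}^{S2} \\ C_{50}^{S2} \end{pmatrix}
\]

Dependence polyhedron P_e:

iS1 = iS2; 1 ≤ iS1 ≤ 10, 1 ≤ iS2 ≤ 10; 0 ≤ jS2 < 3;

\[
\begin{pmatrix}
 1 & -1 &  0 &  0 \\
 1 &  0 &  0 & -1 \\
-1 &  0 &  0 & 10 \\
 0 &  0 &  1 &  0 \\
 0 &  0 & -1 &  2
\end{pmatrix}
\begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix}
\;
\begin{matrix} = 0 \\ \ge 0 \\ \ge 0 \\ \ge 0 \\ \ge 0 \end{matrix}
\]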

Page 39: Introduction to Polyhedral Compilation

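Linearizing the legality condition (The Pluto Algorithm)
- The legality condition (for iteration vectors):
\[
\delta(\vec{s}, \vec{t}) = (c_1^{S_j}, c_2^{S_j}, \ldots, c_{m_{S_j}}^{S_j})\,\vec{t} - (c_1^{S_i}, c_2^{S_i}, \ldots, c_{m_{S_i}}^{S_i})\,\vec{s} \ge 0, \quad \vec{s}, \vec{t} \in P_e
\]
- Uniform dependences: the distance between two dependent iterations is a constant (\delta(\vec{s}, \vec{t}) is a constant), e.g. i → i + 1
- Non-uniform dependences: the distance between two dependent iterations varies (\delta(\vec{s}, \vec{t}) is a function of j), e.g. i → i + j
  - Apply the Farkas lemma:
\[
(c_1^{S_j}, c_2^{S_j}, \ldots, c_{m_{S_j}}^{S_j})\,\vec{t} - (c_1^{S_i}, c_2^{S_i}, \ldots, c_{m_{S_i}}^{S_i})\,\vec{s} \ge 0, \quad \vec{s}, \vec{t} \in P_e
\]
\[
\Leftrightarrow\quad
(c_1^{S_j}, c_2^{S_j}, \ldots, c_{m_{S_j}}^{S_j})\,\vec{t} - (c_1^{S_i}, c_2^{S_i}, \ldots, c_{m_{S_i}}^{S_i})\,\vec{s} \equiv \lambda_{e0} + \sum_{k=1}^{m_e} \lambda_{ek} P_e^k, \quad \lambda_{ek} \ge 0
\]
  where each P_e^k is one inequality in the dependence polyhedron.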

Page 40: Introduction to Polyhedral Compilation

Cost Function & Objective Function (The Pluto Algorithm)
- Compute all coefficients and offsets under the legality condition: solve an ILP problem
- Cost function = transformation policy
  - Pluto’s cost function = dependence distance:
\[
\delta(\vec{s}, \vec{t}) = (c_1^{S_j}, c_2^{S_j}, \ldots, c_{m_{S_j}}^{S_j})\,\vec{t} - (c_1^{S_i}, c_2^{S_i}, \ldots, c_{m_{S_i}}^{S_i})\,\vec{s}, \quad \vec{s}, \vec{t} \in P_e
\]
    - Fuse loops as much as possible
    - Push loops carrying dependences to inner levels
  - Also used in ISL (Polly, PPCG, …)
- Objective function:
\[
\operatorname{minimize}_{\prec}\ (u_1, w, c_1^{S_j}, c_2^{S_j})
\]
  - Iteratively find linearly independent solutions

Page 41: Introduction to Polyhedral Compilation

Step-by-step example

    for (i = 0; i < N; i++) {
        for (j = 1; j < N; j++) {
            a[i][j] = a[j][i] + a[i][j-1]; // S1
        }
    }

Statement instances in execution order:
    a[0][1] = a[1][0] + a[0][0]; // S1(0,1)
    a[0][2] = a[2][0] + a[0][1]; // S1(0,2)
    a[0][3] = a[3][0] + a[0][2]; // S1(0,3)
    ...
    a[1][1] = a[1][1] + a[1][0]; // S1(1,1)
    a[1][2] = a[2][1] + a[1][1]; // S1(1,2)
    a[1][3] = a[3][1] + a[1][2]; // S1(1,3)
    ...
    a[2][1] = a[1][2] + a[2][0]; // S1(2,1)
    a[2][2] = a[2][2] + a[2][1]; // S1(2,2)
    a[2][3] = a[3][2] + a[2][2]; // S1(2,3)
    ...
    a[3][1] = a[1][3] + a[3][0]; // S1(3,1)

[Figure legend: Dependence 1 (RAW), Dependence 2 (RAW), Dependence 3 (WAR)]
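The dependences between two instances of S1 can be sketched as predicates on their iteration vectors, derived from the accesses of `a[i][j] = a[j][i] + a[i][j-1]`. The function names are hypothetical, and both instances are assumed to lie inside the iteration domain (0 ≤ i < N, 1 ≤ j < N); the sketch does not check that.

```c
#include <stdbool.h>

/* Original execution order: does S1(i1,j1) run before S1(i2,j2)? */
static bool runs_before(int i1, int j1, int i2, int j2)
{
    return i1 < i2 || (i1 == i2 && j1 < j2);
}

/* RAW: the source S1(is,js) writes a[is][js]; the later target
   S1(it,jt) reads a[jt][it] or a[it][jt-1]. */
bool has_raw(int is, int js, int it, int jt)
{
    if (!runs_before(is, js, it, jt))
        return false;
    bool via_transpose = (is == jt && js == it);      /* read of a[jt][it]   */
    bool via_previous  = (is == it && js == jt - 1);  /* read of a[it][jt-1] */
    return via_transpose || via_previous;
}

/* WAR: the source S1(is,js) reads a[js][is] or a[is][js-1]; the later
   target S1(it,jt) writes a[it][jt]. */
bool has_war(int is, int js, int it, int jt)
{
    if (!runs_before(is, js, it, jt))
        return false;
    bool via_transpose = (js == it && is == jt);      /* read of a[js][is]   */
    bool via_previous  = (is == it && js - 1 == jt);  /* read of a[is][js-1] */
    return via_transpose || via_previous;
}
```

For instance, `has_raw(0, 1, 0, 2)` matches the a[i][j-1] flow dependence, while `has_raw(1, 2, 2, 1)` and `has_war(1, 2, 2, 1)` match the dependences through the transposed access a[j][i].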

Page 42: Introduction to Polyhedral Compilation

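Step-by-step example: Legality Constraints 1 (The Pluto Algorithm)
- Dependence 1: RAW (flow dependence), (i_s, j_s) → (i_t, j_t)

    Source: a[0][1] = a[1][0] + a[0][0]; // S1(0,1)
    Target: a[0][2] = a[2][0] + a[0][1]; // S1(0,2)
    ...

Dependence polyhedron P_e1:
\[
P_{e1} :\ i_s = i_t,\quad j_s = j_t - 1,\quad 0 \le i_t \le N - 1,\quad 2 \le j_t \le N - 1
\]

Legality constraints:
\[
\delta(\vec{s}, \vec{t}) = (c_1^{S_j}, c_2^{S_j}, \ldots, c_{m_{S_j}}^{S_j})\,\vec{t} - (c_1^{S_i}, c_2^{S_i}, \ldots, c_{m_{S_i}}^{S_i})\,\vec{s} \ge 0, \quad \vec{s}, \vec{t} \in P_e
\]

For this dependence:
\[
(c_1^{S1}, c_2^{S1}) \begin{pmatrix} i_t \\ j_t \end{pmatrix} - (c_1^{S1}, c_2^{S1}) \begin{pmatrix} i_s \\ j_s \end{pmatrix} \ge 0, \quad (i_s, j_s, i_t, j_t) \in P_{e1}
\]
\[
\Rightarrow\ c_1^{S1} i_t + c_2^{S1} j_t - (c_1^{S1} i_s + c_2^{S1} j_s) = c_1^{S1} i_t + c_2^{S1} j_t - (c_1^{S1} i_t + c_2^{S1} (j_t - 1)) \ge 0
\]
\[
\Rightarrow\ c_2^{S1} \ge 0
\]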

Page 43: Introduction to Polyhedral Compilation

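Step-by-step example: Legality Constraints 2 (The Pluto Algorithm)
- Dependence 2: RAW (flow dependence), (i_s, j_s) → (i_t, j_t)

    Source: a[1][2] = a[2][1] + a[1][1]; // S1(1,2)
    ...
    Target: a[2][1] = a[1][2] + a[2][0]; // S1(2,1)

Dependence polyhedron P_e2:
\[
P_{e2} :\ i_s = j_t,\quad j_s = i_t,\quad 2 \le i_t \le N - 1,\quad 1 \le j_t \le N - 1,\quad i_t - j_t \ge 1
\]

Legality constraints:
\[
\delta(\vec{s}, \vec{t}) = (c_1^{S_j}, c_2^{S_j}, \ldots, c_{m_{S_j}}^{S_j})\,\vec{t} - (c_1^{S_i}, c_2^{S_i}, \ldots, c_{m_{S_i}}^{S_i})\,\vec{s} \ge 0, \quad \vec{s}, \vec{t} \in P_e
\]

For this dependence:
\[
(c_1^{S1}, c_2^{S1}) \begin{pmatrix} i_t \\ j_t \end{pmatrix} - (c_1^{S1}, c_2^{S1}) \begin{pmatrix} i_s \\ j_s \end{pmatrix} \ge 0, \quad (i_s, j_s, i_t, j_t) \in P_{e2}
\]
\[
\Rightarrow\ c_1^{S1} i_t + c_2^{S1} j_t - (c_1^{S1} i_s + c_2^{S1} j_s) = c_1^{S1} i_t + c_2^{S1} j_t - (c_1^{S1} j_t + c_2^{S1} i_t) \ge 0
\]
\[
\Rightarrow\ (c_1^{S1} - c_2^{S1})\, i_t + (c_2^{S1} - c_1^{S1})\, j_t \ge 0, \quad 2 \le i_t \le N - 1,\ 1 \le j_t \le N - 1,\ i_t - j_t \ge 1
\]

Farkas Lemma + Fourier-Motzkin elimination:
\[
c_1^{S1} - c_2^{S1} \ge 0
\]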

Page 44: Introduction to Polyhedral Compilation

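Step-by-step example: Putting it all together (The Pluto Algorithm)
- Dependence 1:
\[
c_2^{S1} \ge 0, \quad w \ge c_2^{S1}
\]
- Dependence 2 & 3:
\[
c_1^{S1} - c_2^{S1} \ge 0, \quad u_1 \ge 0, \quad u_1 \ge c_1^{S1} - c_2^{S1}, \quad 3u_1 + w \ge c_1^{S1} - c_2^{S1}
\]
- Avoiding the zero vector:
\[
c_1^{S1} + c_2^{S1} \ge 1
\]
- Objective function:
\[
\operatorname{minimize}_{\prec}\ (u_1, w, c_1^{S1}, c_2^{S1}) \rightarrow (0, 1, 1, 1)
\]

The constraints on u_1 and w are the constraints using parameter N that bound the dependence distances.

Find a linearly independent answer:
\[
T_{S1}(i, j) =
\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} i \\ j \end{pmatrix}
\]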
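As a sanity check of the schedule T_S1(i, j) = (i + j, i), the sketch below executes the statement instances once in the original order and once in the transformed order, then compares the arrays. The problem size `N = 6`, the initial values, the function names, and the guard-based scanning of the transformed space are all assumptions of this illustration; a code generator such as CLooG or ISL would emit tighter loop bounds.

```c
#include <string.h>

#define N 6

/* Original order: T_S1(i, j) = (i, j). */
void run_original(unsigned a[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 1; j < N; j++)
            a[i][j] = a[j][i] + a[i][j - 1];
}

/* Transformed order: T_S1(i, j) = (i + j, i).
   The outer loop scans t = i + j, the inner loop scans i, and j = t - i. */
void run_transformed(unsigned a[N][N])
{
    for (int t = 1; t <= 2 * N - 2; t++)
        for (int i = 0; i < N; i++) {
            int j = t - i;
            if (j < 1 || j >= N)
                continue;          /* point outside the iteration domain */
            a[i][j] = a[j][i] + a[i][j - 1];
        }
}

/* Returns 1 if both orders produce the same array, 0 otherwise. */
int transformed_matches_original(void)
{
    unsigned x[N][N], y[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[i][j] = y[i][j] = (unsigned)(i * N + j + 1);
    run_original(x);
    run_transformed(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

Both hyperplanes satisfy the constraints derived earlier: (1, 1) gives c_2 = 1 ≥ 0 and c_1 − c_2 = 0 ≥ 0, and (1, 0) gives c_2 = 0 ≥ 0 and c_1 − c_2 = 1 ≥ 0.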

Page 45: Introduction to Polyhedral Compilation

Summary
- The polyhedral transformation = “scheduling” (determining the execution order of statements)
- 3 important things:
  - Domain: a set of instances for a statement
  - Scattering (Scheduling): an instance -> time stamp
  - Access: an instance -> array element(s)
- Limitation: only applicable to Static Control Parts (SCoPs) in general
  - Loop bounds and conditionals are affine functions of the surrounding loop iterators

[Figure: Program -> Inequalities -> ILP (constraints and cost function) -> “Synthesized” Code]

Program:
    for (i=1; …) {
        S1;
        for (j=1; …)
            S2;

Inequalities:
    1 ≤ iS1 ≤ 2; 1 ≤ iS2 ≤ 2; 1 ≤ jS2 ≤ 3; iS1 = iS2;

ILP:
    Constraints: C_i − C_j ≥ 0, …
    Cost function:
\[
\delta_e(\vec{s}, \vec{t}) = \phi_{S_j}(\vec{t}) - \phi_{S_i}(\vec{s})
\]

“Synthesized” code:
    for (i=1; …) {
        S1;
    }
    for (i=1; …) {
        …;

Page 46: Introduction to Polyhedral Compilation

COMPILERS AND TOOLS

Page 47: Introduction to Polyhedral Compilation

Polyhedral Compilers & Tools
- PoCC (The Polyhedral Compiler Collection)
  - http://web.cs.ucla.edu/~pouchet/software/pocc/
  - Clan: extracts a polyhedral IR from the source code
  - Candl: a dependence analyzer
  - LetSee: legal transformation space explorer
  - PLuTo: an automatic parallelizer and locality optimizer
  - CLooG: code generation from the polyhedral IR

Page 48: Introduction to Polyhedral Compilation

Polyhedral Compilers & Tools
- Polly
  - http://polly.llvm.org/
  - ISL: Integer Set Library (including code generator)
- Clay/Chlore/Clint
  - https://www.ozinenko.com/projects
  - Clay: “Chunky Loop Alteration wizardrY”
  - Chlore: “Recovering high-level syntactic description of the automatically computed polyhedral optimization”
  - Clint: “Interactive graphical interface to the manual and compiler-assisted program restructuring in the polyhedral model”

Page 49: Introduction to Polyhedral Compilation

Clint


Page 50: Introduction to Polyhedral Compilation

Further readings
- Fundamentals
  - OpenScop Specification: http://icps.u-strasbg.fr/people/bastoul/public_html/development/openscop/docs/openscop.html
  - ISL: https://lirias.kuleuven.be/bitstream/123456789/270231/1/icms2010verdoolaege.pdf
- Pluto algorithm
  - U. Bondhugula, “Effective Automatic Parallelization and Locality Optimization Using The Polyhedral Model” (PhD Dissertation, 2010)
  - U. Bondhugula, A. Hartono, J. Ramanujam, P. Sadayappan, “A Practical Automatic Polyhedral Parallelizer and Locality Optimizer.” [PLDI’08]
- Polly
  - T. Grosser, S. Verdoolaege, A. Cohen, “Polyhedral AST generation is more than scanning polyhedra” [ACM TOPLAS 2015]
- Polyhedral model + AST-based Cost Function
  - J. Shirako, L.N. Pouchet, V. Sarkar, “Oil and Water Can Mix: An Integration of Polyhedral and AST-based Transformations.” [SC’14]
- GPU Code Generation
  - S. Verdoolaege, J.C. Juega, A. Cohen, J.I. Gomez, C. Tenllado, F. Catthoor, “Polyhedral parallel code generation for CUDA” [ACM TACO 2013]
  - J. Shirako, A. Hayashi, V. Sarkar, “Optimized Two-level Parallelization for GPU Accelerators using the Polyhedral Model” [CC’17]