Post on 01-Jan-2016


1

Proactive Loop-nest Optimizations

Mei Ye

[email protected]

Acknowledgements: Dinesh Suresh, Roy Ju, Michael Lai

2

Adjacent Loops

Five little pumpkins sitting on a gate …

3

4

[Figure: example function tree with a Func root and nested If (Then/Else), Block and Loop nodes.]

5

Proactive Loop Fusion

An automated pass that iteratively applies a set of code transformations (if-merging, head/tail duplication, code motion, etc.) over the whole function, in no fixed order, to bring pairs of loops adjacent to each other so that loop fusion becomes possible.
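As a reminder of the end goal, the sketch below shows what loop fusion itself does once two loops have been brought next to each other. The function names and statement bodies are illustrative assumptions, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Two adjacent loops over the same iteration space (hypothetical bodies). */
void scale_then_offset_unfused(int *x, int *y, size_t n) {
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] * 2;
    for (size_t i = 0; i < n; i++)
        y[i] = x[i] + 1;
}

/* After fusion: a single pass. This is legal here because iteration i of
   the second body only reads x[i], which iteration i of the first body
   has already written. */
void scale_then_offset_fused(int *x, int *y, size_t n) {
    for (size_t i = 0; i < n; i++) {
        x[i] = x[i] * 2;
        y[i] = x[i] + 1;
    }
}
```

The fused version touches each x[i] once while it is still in cache, which is the benefit the transformations on the following slides try to unlock.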

6

Proactive Loop Fusion Candidates

A pair of loops are proactive loop fusion candidates iff:

1) They have a Least Common Predecessor (LCP) in the tree.
2) The paths from the two candidates to the LCP have equal length.
3) Each pair of corresponding nodes on the two paths has the same type, and paired Ifs have identical condition expressions.
4) The loops are not adjacent to each other but are otherwise good fusion candidates.

Complexity: O((depth * n)^2), where depth is the depth in the tree and n is the number of loops at that depth.

[Figure: an LCP with children If, Block, If; each If has Then and Else branches, with Loop1 under the first If and Loop2 under the second.]
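Conditions 1 to 3 can be sketched as a lockstep walk from both loops up to their common ancestor. The node layout below (ScNode, cond_id) is a hypothetical stand-in, not the compiler's actual data structure, and condition 4 is omitted because it needs dependence information.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SC_FUNC, SC_IF, SC_THEN, SC_ELSE, SC_BLOCK, SC_LOOP } ScType;

/* Hypothetical tree node. cond_id stands in for the value of an If's
   condition expression: equal ids mean identical conditions. */
typedef struct ScNode {
    ScType type;
    int cond_id;
    struct ScNode *parent;
} ScNode;

static int depth_of(const ScNode *n) {
    int d = 0;
    while (n->parent) { n = n->parent; d++; }
    return d;
}

/* True if loop1 and loop2 satisfy conditions 1-3. */
bool is_fusion_candidate_pair(const ScNode *loop1, const ScNode *loop2) {
    if (loop1 == loop2) return false;
    if (loop1->type != SC_LOOP || loop2->type != SC_LOOP) return false;
    /* Equal path length to a shared ancestor is the same as equal depth. */
    if (depth_of(loop1) != depth_of(loop2)) return false;
    const ScNode *p = loop1->parent, *q = loop2->parent;
    /* Walk both paths upward in lockstep until they meet at the LCP. */
    while (p && q && p != q) {
        if (p->type != q->type) return false;
        if (p->type == SC_IF && p->cond_id != q->cond_id) return false;
        p = p->parent;
        q = q->parent;
    }
    return p != NULL && p == q;
}
```

Checking every pair of loops at a given depth this way is quadratic in the number of loops, in line with the bound quoted on the slide.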

7

Proactive Loop Fusion Transformation Candidates

Proactive loop fusion transformation candidates, cand1 and cand2:

1. Are immediate children of the LCP of the loop fusion candidates.
2. Are each either an If or a Loop.
3. For every sibling strictly between cand1 and cand2 that is a Block or an If: a Block can be safely and legally moved above cand1 if cand1 is a Loop; an If has at least one path that has no dependency on the loop fusion candidates.
4. For every sibling in (cand1, cand2] that is an If, its preceding siblings can be legally if-merged or head-duplicated into it.
5. For every sibling in [cand1, cand2) that is an If, its succeeding siblings can be legally if-merged or tail-duplicated into it.

[Figure: LCP with children If, Block, If; the two Ifs, which contain Loop1 and Loop2, are marked cand1 and cand2.]

8

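[Figure: three stages. (1) LCP with children If (cand1), Block, If (cand2); sc1 and sc2 mark the pair of siblings being processed. (2) After tail duplication: LCP with children If (sc1) and If (sc2). (3) After if-merging: LCP with a single If.]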

9

Action Table

sc1    sc2    Action                                          Iteration continues on
Loop   Block  Safe code motion of sc2 before sc1              sc1
If     Block  Tail duplication of sc2 into sc1                sc1
Loop   If     Head duplication of sc1 into sc2                sc2
If     If     If-merging or tail duplication of sc2 into sc1  sc1
If     Loop   Tail duplication of sc2 into sc1                sc1
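The action table maps directly onto a small lookup function. The enum and function names below are illustrative assumptions; only the five (sc1, sc2) rows come from the slide, and any other combination is treated here as "no action".

```c
#include <assert.h>

typedef enum { KIND_LOOP, KIND_IF, KIND_BLOCK } Kind;

typedef enum {
    ACT_NONE,
    ACT_CODE_MOTION,          /* move sc2 before sc1 */
    ACT_TAIL_DUP,             /* duplicate sc2 into the tails of sc1 */
    ACT_HEAD_DUP,             /* duplicate sc1 into the heads of sc2 */
    ACT_IF_MERGE_OR_TAIL_DUP  /* merge two Ifs, or tail-duplicate sc2 */
} Action;

typedef enum { CONT_NONE, CONT_SC1, CONT_SC2 } Continue;

typedef struct { Action action; Continue next; } Step;

/* Pick the transformation for a pair of adjacent siblings (sc1, sc2). */
Step choose_action(Kind sc1, Kind sc2) {
    Step s = { ACT_NONE, CONT_NONE };
    if (sc1 == KIND_LOOP && sc2 == KIND_BLOCK) {
        s.action = ACT_CODE_MOTION; s.next = CONT_SC1;
    } else if (sc1 == KIND_IF && sc2 == KIND_BLOCK) {
        s.action = ACT_TAIL_DUP; s.next = CONT_SC1;
    } else if (sc1 == KIND_LOOP && sc2 == KIND_IF) {
        s.action = ACT_HEAD_DUP; s.next = CONT_SC2;
    } else if (sc1 == KIND_IF && sc2 == KIND_IF) {
        s.action = ACT_IF_MERGE_OR_TAIL_DUP; s.next = CONT_SC1;
    } else if (sc1 == KIND_IF && sc2 == KIND_LOOP) {
        s.action = ACT_TAIL_DUP; s.next = CONT_SC1;
    }
    return s;
}
```

Only the Loop/If row hands control to sc2, because head duplication absorbs sc1 into sc2; in every other row sc1 survives and the iteration continues on it.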

10

Before:

    if (a) {
        for (i = 0; i < n; i++)
            stmt1;
        if (b)
            stmt2;
    }
    if (a) {
        for (i = 0; i < n; i++)
            stmt3;
    }

After if-merging:

    if (a) {
        for (i = 0; i < n; i++)
            stmt1;
        if (b)
            stmt2;
        for (i = 0; i < n; i++)
            stmt3;
    }

[Figure: if-merging. Before: Func with two If(a) children (sc1 = cand1, sc2 = cand2); the first Then holds Loop and If(b), whose Then holds a Block, and the second Then holds Loop. After: a single If(a) whose Then holds Loop, If(b), Loop.]

11

Before:

    if (a) {
        for (i = 0; i < n; i++)
            stmt1;
        if (b)
            stmt2;
        for (i = 0; i < n; i++)
            stmt3;
    }

After head duplication:

    if (a) {
        if (b) {
            for (i = 0; i < n; i++)
                stmt1;
            stmt2;
        }
        else {
            for (i = 0; i < n; i++)
                stmt1;
        }
        for (i = 0; i < n; i++)
            stmt3;
    }

[Figure: head duplication. Before: the Then of If(a) holds Loop (sc1, cand1), If(b) (sc2) and Loop (cand2). After: the Then of If(a) holds If(b) and Loop; the Then of If(b) holds Loop and Block, and its Else holds Loop.]

12

Before:

    if (a) {
        if (b) {
            for (i = 0; i < n; i++)
                stmt1;
            stmt2;
        }
        else {
            for (i = 0; i < n; i++)
                stmt1;
        }
        for (i = 0; i < n; i++)
            stmt3;
    }

After tail duplication:

    if (a) {
        if (b) {
            for (i = 0; i < n; i++)
                stmt1;
            stmt2;
            for (i = 0; i < n; i++)
                stmt3;
        }
        else {
            for (i = 0; i < n; i++)
                stmt1;
            for (i = 0; i < n; i++)
                stmt3;
        }
    }

[Figure: tail duplication. Before: the Then of If(a) holds If(b) (sc1) and Loop (sc2). After: the Then of If(a) holds only If(b), whose Then holds Loop, Block, Loop and whose Else holds Loop, Loop.]

13

Before:

    if (a) {
        if (b) {
            for (i = 0; i < n; i++)
                stmt1;
            stmt2;
            for (i = 0; i < n; i++)
                stmt3;
        }
        else {
            for (i = 0; i < n; i++)
                stmt1;
            for (i = 0; i < n; i++)
                stmt3;
        }
    }

After code motion:

    if (a) {
        if (b) {
            stmt2;
            for (i = 0; i < n; i++)
                stmt1;
            for (i = 0; i < n; i++)
                stmt3;
        }
        else {
            for (i = 0; i < n; i++)
                stmt1;
            for (i = 0; i < n; i++)
                stmt3;
        }
    }

[Figure: code motion. Before: the Then of If(b) holds Loop (sc1, cand1), Block (sc2), Loop (cand2), and its Else holds Loop, Loop. After: the Then of If(b) holds Block, Loop, Loop, so the loops are adjacent on both paths.]
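The four steps on slides 10 to 13 can be checked end to end with a small harness. The statement bodies below are assumptions chosen so that stmt2 is independent of both loops (which is what makes the final code motion legal), and the last function applies the loop fusion that the sequence is meant to enable.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define FN 4

typedef struct { int x[FN]; int y[FN]; int s; } FusionState;

/* Hypothetical statement bodies. stmt2 only touches s. */
static void stmt1(FusionState *st, int i) { st->x[i] += 1; }
static void stmt2(FusionState *st)        { st->s += 5; }
static void stmt3(FusionState *st, int i) { st->y[i] = st->x[i] * 2; }

/* Slide 10, before any transformation. */
void fusion_original(FusionState *st, bool a, bool b) {
    if (a) {
        for (int i = 0; i < FN; i++) stmt1(st, i);
        if (b) stmt2(st);
    }
    if (a) {
        for (int i = 0; i < FN; i++) stmt3(st, i);
    }
}

/* Slide 13, after if-merging, head/tail duplication and code motion. */
void fusion_after_code_motion(FusionState *st, bool a, bool b) {
    if (a) {
        if (b) {
            stmt2(st);
            for (int i = 0; i < FN; i++) stmt1(st, i);
            for (int i = 0; i < FN; i++) stmt3(st, i);
        } else {
            for (int i = 0; i < FN; i++) stmt1(st, i);
            for (int i = 0; i < FN; i++) stmt3(st, i);
        }
    }
}

/* The now-adjacent loops fused on both paths. */
void fusion_fused(FusionState *st, bool a, bool b) {
    if (a) {
        if (b) stmt2(st);
        for (int i = 0; i < FN; i++) { stmt1(st, i); stmt3(st, i); }
    }
}

/* Byte-wise comparison is fine: the struct holds only ints. */
bool fusion_states_equal(const FusionState *p, const FusionState *q) {
    return memcmp(p, q, sizeof *p) == 0;
}
```

Running all three variants over every combination of a and b and comparing the resulting states is a cheap sanity check that no step changed behaviour.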

14

    void COMP_UNIT::Pro_loop_fusion_trans() {
        // Identify proactive loop fusion candidates and flag LCPs.
        pro_loop_fusion_trans->Classify_loops(func);
        // Start the top-down proactive loop fusion transformations.
        pro_loop_fusion_trans->Top_down_trans(func);
    }

    void PRO_LOOP_FUSION_TRANS::Top_down_trans(SC_NODE * sc) {
        if (sc is a LCP) {  // Process LCPs (pseudocode condition).
            while (1) {
                // Find proactive loop fusion transformation candidates.
                Find_cand(sc, &cand1, &cand2);
                // Invoke proactive loop fusion transformations.
                if (cand1 && cand2)
                    Traverse_trans(cand1, cand2);
                else
                    break;
            }
            if (transformation happens) {
                // Re-identify proactive loop fusion candidates.
                Classify_loops(sc);
            }
        }
        // Recursively visit child nodes.
        SC_LIST_ITER sc_list_iter;
        SC_NODE * kid;
        FOR_ALL_ELEM(kid, sc_list_iter, Init(sc->Kids()))
            Top_down_trans(kid);
    }

O(n*m) (n: number of LCPs, m: number of intervening nodes among loop fusion candidates)

15

Proactive Loop Interchange

An automated pass that applies loop unswitching, reverse loop unswitching, if-condition distribution, if-condition tree height reduction and other control flow graph transformations to eliminate the statements that intervene between the outer loop and the inner loop of a loop nest, so that loop interchange becomes possible.
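For context, the sketch below shows loop interchange on a perfect loop nest, which is the shape the transformations on the following slides try to reach. The array shape and the stored value are illustrative assumptions.

```c
#include <assert.h>

#define ROWS 3
#define COLS 4

/* Before: the inner loop walks the first dimension, so consecutive
   iterations touch elements that are COLS ints apart in memory. */
void fill_before(int m[ROWS][COLS]) {
    for (int i = 0; i < COLS; i++)
        for (int j = 0; j < ROWS; j++)
            m[j][i] = i + j;
}

/* After interchange: the inner loop walks the last dimension, so
   consecutive iterations touch adjacent elements. */
void fill_after(int m[ROWS][COLS]) {
    for (int j = 0; j < ROWS; j++)
        for (int i = 0; i < COLS; i++)
            m[j][i] = i + j;
}
```

Interchange is only this simple when nothing sits between the two loop headers; an If or a call between them has to be moved out of the way first.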

16

Before:

    for (i = 0; i < n; i++) {
        if (a & (1 << i)) {
            if (b)
                bar();
            else if (c) {
                for (j = 0; j < m; j++)
                    a[j][i] = 0;
            }
        }
    }

After if-condition tree height reduction:

    for (i = 0; i < n; i++) {
        if (a & (1 << i)) {
            if (!b && c) {
                for (j = 0; j < m; j++)
                    a[j][i] = 0;
            }
            else if (b)
                bar();
        }
    }

[Figure: if-condition tree height reduction. Before: Loop, if(a&(1<<i)), if(b) with Block in Then and if(c) in Else, whose Then holds the inner Loop. After: Loop, if(a&(1<<i)), if(!b&&c) with the inner Loop in Then and if(b) in Else, whose Then holds the Block.]

17

Before:

    for (i = 0; i < n; i++) {
        if (a & (1 << i)) {
            if (!b && c) {
                for (j = 0; j < m; j++)
                    a[j][i] = 0;
            }
            else if (b)
                bar();
        }
    }

After if-condition distribution:

    for (i = 0; i < n; i++) {
        if (!b && c) {
            if (a & (1 << i)) {
                for (j = 0; j < m; j++)
                    a[j][i] = 0;
            }
        }
        else if (b) {
            if (a & (1 << i))
                bar();
        }
    }

[Figure: if-condition distribution. Before: Loop, if(a&(1<<i)), if(!b&&c) with the inner Loop in Then and if(b) in Else. After: Loop, if(!b&&c); its Then holds if(a&(1<<i)) over the inner Loop, and its Else holds if(b) over if(a&(1<<i)) over the Block.]

18

Before:

    for (i = 0; i < n; i++) {
        if (!b && c) {
            if (a & (1 << i)) {
                for (j = 0; j < m; j++)
                    a[j][i] = 0;
            }
        }
        else if (b) {
            if (a & (1 << i))
                bar();
        }
    }

After reverse loop unswitching:

    for (i = 0; i < n; i++) {
        if (!b && c) {
            for (j = 0; j < m; j++) {
                if (a & (1 << i))
                    a[j][i] = 0;
            }
        }
        else if (b) {
            if (a & (1 << i))
                bar();
        }
    }

[Figure: reverse loop unswitching. Before: under if(!b&&c), if(a&(1<<i)) encloses the inner Loop. After: under if(!b&&c), the inner Loop encloses if(a&(1<<i)) and its Block; the if(b) branch is unchanged.]

19

Before:

    for (i = 0; i < n; i++) {
        if (!b && c) {
            for (j = 0; j < m; j++) {
                if (a & (1 << i))
                    a[j][i] = 0;
            }
        }
        else if (b) {
            if (a & (1 << i))
                bar();
        }
    }

After loop unswitching:

    if (!b && c) {
        for (i = 0; i < n; i++) {
            for (j = 0; j < m; j++) {
                if (a & (1 << i))
                    a[j][i] = 0;
            }
        }
    }
    else if (b) {
        for (i = 0; i < n; i++) {
            if (a & (1 << i))
                bar();
        }
    }

[Figure: loop unswitching. Before: the outer Loop encloses if(!b&&c). After: if(!b&&c) is outermost; its Then holds the outer Loop directly enclosing the inner Loop, and its Else holds if(b) over a copy of the outer Loop.]
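The sequence on slides 16 to 19 can be sanity-checked by comparing the first and last forms, plus the interchanged nest they enable. This is a sketch under assumptions: the scalar tested with a & (1<<i) is renamed mask and the array is renamed grid so the two uses of a on the slides do not collide, and bar() is modelled as a call counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NI 3  /* outer trip count (n on the slides) */
#define MJ 2  /* inner trip count (m on the slides) */

typedef struct { int grid[MJ][NI]; int bar_calls; } NestState;

static void bar(NestState *st) { st->bar_calls++; }

/* Fill the grid with ones so that zeroing is observable. */
void nest_init(NestState *st) {
    memset(st, 0, sizeof *st);
    for (int j = 0; j < MJ; j++)
        for (int i = 0; i < NI; i++)
            st->grid[j][i] = 1;
}

/* Slide 16, before any transformation. */
void nest_original(NestState *st, unsigned mask, bool b, bool c) {
    for (int i = 0; i < NI; i++) {
        if (mask & (1u << i)) {
            if (b)
                bar(st);
            else if (c) {
                for (int j = 0; j < MJ; j++)
                    st->grid[j][i] = 0;
            }
        }
    }
}

/* Slide 19, after loop unswitching: a perfect i/j nest under !b && c. */
void nest_unswitched(NestState *st, unsigned mask, bool b, bool c) {
    if (!b && c) {
        for (int i = 0; i < NI; i++)
            for (int j = 0; j < MJ; j++)
                if (mask & (1u << i))
                    st->grid[j][i] = 0;
    } else if (b) {
        for (int i = 0; i < NI; i++)
            if (mask & (1u << i))
                bar(st);
    }
}

/* The interchange this enables: i becomes the inner loop, so grid[j][i]
   is walked along its last dimension. */
void nest_interchanged(NestState *st, unsigned mask, bool b, bool c) {
    if (!b && c) {
        for (int j = 0; j < MJ; j++)
            for (int i = 0; i < NI; i++)
                if (mask & (1u << i))
                    st->grid[j][i] = 0;
    } else if (b) {
        for (int i = 0; i < NI; i++)
            if (mask & (1u << i))
                bar(st);
    }
}
```

The comparison relies on b and c being loop-invariant, which is also what makes the unswitching step legal.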

20

Heuristics

Proactive loop fusion:
  Maximize loop fusion.
  Large or unknown trip count loops.
  Loops on symmetric paths with same iteration spaces.
  Pre-check on transformation legality.

Proactive loop interchange:
  Fully-permutable loop-nest.
  Memory reference iterates on inner loop's dimension.
  Inner loop has large or unknown trip counts.
  Simply-nested if-regions.
  Pre-check on transformation legality.

21

Peak scores of libquantum

Binary                                                             Istanbul (1c)   Istanbul (12c)
Default peak                                                       52.8            174
Default peak + proactive loop fusion                               81.6 (1.55x)    459 (2.64x)
Default peak + proactive loop fusion + proactive loop interchange  58.8 (-28%)     632 (+38%)

The percentages in the last row are relative to the fusion-only row (58.8 vs. 81.6 and 632 vs. 459).

AMD Istanbul, 2.4GHz, 2 socket, 6 cores/socket, 64KB L1 instruction cache, 64KB L1 data cache, 512 KB L2 cache, 6MB/socket L3 cache, 32GB DDR2-800 memory, SLES10 SP2

22

Reference

Kit Barton (www.cs.ualberta.ca/~cbarton)

Gather intervening code between loops using the dominance relation.
Build a Data Dependence Graph of the intervening code.
Use a schedule queue to identify movable nodes.

23

Barton’s Non-Adjacent loops example

Original code:

    while (i < N) {
        a += i;
        i++;
    }
    b := a * 2;
    c := b + 6;
    g := 0;
    h := g + 10;
    if (c < 100)
        d := c/2;
    else
        e := c * 2;
    while (j < N) {
        f := g + 6;
        j++;
    }

Intervening code between the two loops:

    b := a * 2;
    c := b + 6;
    g := 0;
    if (c < 100)
        d := c/2;
    else
        e := c * 2;
    h := g + 10;

24

Barton’s Non-Adjacent loops example

Before:

    while (i < N) {
        a += i;
        i++;
    }
    b := a * 2;
    c := b + 6;
    g := 0;
    h := g + 10;
    if (c < 100)
        d := c/2;
    else
        e := c * 2;
    while (j < N) {
        f := g + 6;
        j++;
    }

After:

    g := 0;
    h := g + 10;
    while (i < N) {
        a += i;
        i++;
    }
    while (j < N) {
        f := g + 6;
        j++;
    }
    b := a * 2;
    c := b + 6;
    if (c < 100)
        d := c/2;
    else
        e := c * 2;
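Barton's example can be replayed in C to confirm that hoisting g and h above the first loop and sinking b, c and the If below the second loop leaves every variable with the same final value. The struct and function names are illustrative assumptions, and := is written as plain assignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct { int a, b, c, d, e, f, g, h, i, j; } Vars;

/* Original order: the two loops are separated by intervening code. */
void barton_before(Vars *v, int n) {
    while (v->i < n) { v->a += v->i; v->i++; }
    v->b = v->a * 2;
    v->c = v->b + 6;
    v->g = 0;
    v->h = v->g + 10;
    if (v->c < 100) v->d = v->c / 2;
    else            v->e = v->c * 2;
    while (v->j < n) { v->f = v->g + 6; v->j++; }
}

/* After code motion: g and h feed (or are independent of) the second
   loop, so they move up; b, c and the If depend on a, so they move down. */
void barton_after(Vars *v, int n) {
    v->g = 0;
    v->h = v->g + 10;
    while (v->i < n) { v->a += v->i; v->i++; }
    while (v->j < n) { v->f = v->g + 6; v->j++; }
    v->b = v->a * 2;
    v->c = v->b + 6;
    if (v->c < 100) v->d = v->c / 2;
    else            v->e = v->c * 2;
}

/* Byte-wise comparison is fine: the struct holds only ints. */
bool vars_equal(const Vars *p, const Vars *q) {
    return memcmp(p, q, sizeof *p) == 0;
}
```

With the loops adjacent, they are ready to be fused, which is the purpose of the code motion.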

25

Barton’s Pros & Cons

Pros:
  Powerful full-fledged code motion.

Cons:
  Loops must be control-flow equivalent.
  No finer granularity in if-regions.