Software Pipelining Creates Parallelization Opportunities

Jialu Huang, Arun Raman, Thomas B. Jablin, Yun Zhang, Tzu-Han Hung, David I. August

Liberty Research Group, Princeton University

Saturday, May 15, 2010

Page 2

[Figure: parallelization techniques DOALL, DSWP, DOACROSS, SPECDOALL, LOCALWRITE, ...; labels DSWP and DSWP+.]

Page 3: DOALL

    node = list->head;
A:  while (node != NULL) {
B:      index = calc(node->data);
C:      density[index] = update_density(density[index], node->data);
D:      node = node->next;
    }

[Figure: dependence graph over statements A, B, C, D, distinguishing intra-iteration dependences from cross-iteration dependences.]

[Figure: DOALL schedule of the instances A.1 to D.3 across Core1, Core2, Core3.]
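The loop on this slide leaves out its declarations. A minimal self-contained sketch in C follows, assuming a singly linked list of integer payloads and a fixed number of density bins. The names `Node`, `List`, `N_BINS` and `accumulate`, and the bodies of `calc` and `update_density`, are assumptions made for illustration and are not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define N_BINS 8   /* assumed size of the density array */

typedef struct Node { int data; struct Node *next; } Node;
typedef struct List { Node *head; } List;

/* Placeholder: map a payload to a bin index in [0, N_BINS). */
int calc(int data) {
    return ((data % N_BINS) + N_BINS) % N_BINS;
}

/* Placeholder: fold one payload into the running value of its bin. */
double update_density(double old, int data) {
    return old + (double)data;
}

/* The loop from the slide, with the statement labels kept as comments. */
void accumulate(const List *list, double density[N_BINS]) {
    assert(list != NULL);
    Node *node = list->head;
    while (node != NULL) {                                    /* A */
        int index = calc(node->data);                         /* B */
        density[index] = update_density(density[index],
                                        node->data);          /* C */
        node = node->next;                                    /* D */
    }
}
```

With these placeholders, a list holding 1, 9 and 2 leaves 10 in `density[1]` (both 1 and 9 map to bin 1) and 2 in `density[2]`. Two iterations landing in the same bin is the case where C depends on an earlier C.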

Pages 5-11: LOCALWRITE

    node = list->head;
A:  while (node != NULL) {
B:      index = calc(node->data);
C:      density[index] = update_density(density[index], node->data);
D:      node = node->next;
    }

    i = owner(density[index]);

[Figure, built up over seven slides: the density array is split into partitions P1, P2, P3. Core1, Core2 and Core3 each run the A, B and D instances of iterations 1, 2, 3, while each C instance (C.1, C.2) appears on a single core only.]
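The owner rule on these slides can be sketched in C as follows. Every worker walks the whole list and recomputes `index`, but only the worker that owns the target bin executes C. The block partition inside `owner` (which here takes the index rather than the array element), the bin and worker counts, the placeholder bodies of `calc` and `update_density`, and the sequential loop standing in for three cores running concurrently are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define N_BINS    9   /* assumed size of the density array */
#define N_WORKERS 3   /* Core1..Core3 on the slides */

typedef struct Node { int data; struct Node *next; } Node;

/* Placeholders for the slide's calc and update_density. */
int calc(int data) { return ((data % N_BINS) + N_BINS) % N_BINS; }
double update_density(double old, int data) { return old + (double)data; }

/* Assumed block partition: bins 0-2 belong to P1, 3-5 to P2, 6-8 to P3. */
int owner(int index) {
    assert(index >= 0 && index < N_BINS);
    return index / (N_BINS / N_WORKERS);
}

/* What one worker executes: A, B and D for every node, C only for bins it
   owns. Returns the number of updates this worker performed. */
int localwrite_worker(int self, Node *head, double density[N_BINS]) {
    int updates = 0;
    for (Node *node = head; node != NULL; node = node->next) {   /* A, D */
        int index = calc(node->data);                            /* B */
        if (owner(index) == self) {                              /* ownership test */
            density[index] = update_density(density[index],
                                            node->data);         /* C */
            updates++;
        }
    }
    return updates;
}

/* Sequential stand-in for running the workers on separate cores. */
void localwrite(Node *head, double density[N_BINS]) {
    for (int w = 0; w < N_WORKERS; w++)
        localwrite_worker(w, head, density);
}
```

Because no two workers write the same bin, C needs no locking in this scheme; the cost is that A, B and D are executed redundantly by every worker.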

Page 12: Original Loop

    node = list->head;
A:  while (node != NULL) {
B:      index = calc(node->data);
C:      density[index] = update_density(density[index], node->data);
D:      node = node->next;
    }

[Figure: dependence graph over statements A, B, C, D, distinguishing intra-iteration dependences from cross-iteration dependences.]

Page 15: Original Loop and After Partition

Original Loop:

    node = list->head;
A:  while (node != NULL) {
B:      index = calc(node->data);
C:      density[index] = update_density(density[index], node->data);
D:      node = node->next;
    }

[Figure: dependence graph over statements A, B, C, D, distinguishing intra-iteration dependences from cross-iteration dependences.]

After Partition:

    node = list->head;
A:  while (node != NULL) {
D:      node = node->next;
    }

    while (TRUE) {
E:      node = getNodeOrExit();
B:      index = calc(node->data);
    }

    while (TRUE) {
F:      node = getNodeOrExit();
G:      index = getIndex();
C:      density[index] = update_density(density[index], node->data);
    }
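The data flow between the three partitioned loops can be sketched on a single thread, with small FIFO queues playing the role of `getNodeOrExit()` and `getIndex()`. This is only an illustration of the partition: the `Queue` type, the tick-based interleaving, forwarding the node from the second loop to the third, and the placeholder bodies of `calc` and `update_density` are assumptions, and a real pipeline would run each loop on its own core with concurrent queues.

```c
#include <assert.h>
#include <stddef.h>

#define N_BINS    8    /* assumed size of the density array */
#define QUEUE_CAP 64   /* assumed upper bound on list length in this sketch */

typedef struct Node { int data; struct Node *next; } Node;

/* Hypothetical stand-in for an inter-core queue; slots are never reused. */
typedef struct {
    Node *node[QUEUE_CAP];
    int   index[QUEUE_CAP];
    int   head, tail;
} Queue;

int q_empty(const Queue *q) { return q->head == q->tail; }

void q_push(Queue *q, Node *node, int index) {
    assert(q->tail < QUEUE_CAP);
    q->node[q->tail] = node;
    q->index[q->tail] = index;
    q->tail++;
}

/* Placeholders for the slide's calc and update_density. */
int calc(int data) { return ((data % N_BINS) + N_BINS) % N_BINS; }
double update_density(double old, int data) { return old + (double)data; }

void run_pipeline(Node *head, double density[N_BINS]) {
    Queue q12 = {0};   /* first loop -> second loop: nodes */
    Queue q23 = {0};   /* second loop -> third loop: nodes and indices */
    Node *node = head;

    /* Each pass is one tick in which every stage takes at most one step. */
    while (node != NULL || !q_empty(&q12) || !q_empty(&q23)) {
        if (!q_empty(&q23)) {                       /* third loop */
            Node *n = q23.node[q23.head];           /* F: getNodeOrExit() */
            int index = q23.index[q23.head];        /* G: getIndex() */
            q23.head++;
            density[index] = update_density(density[index], n->data);  /* C */
        }
        if (!q_empty(&q12)) {                       /* second loop */
            Node *n = q12.node[q12.head++];         /* E: getNodeOrExit() */
            q_push(&q23, n, calc(n->data));         /* B */
        }
        if (node != NULL) {                         /* first loop: A */
            q_push(&q12, node, -1);                 /* index not known yet */
            node = node->next;                      /* D */
        }
    }
}
```

In this sketch the pointer-chasing dependence (D feeding A) stays inside the first loop, so the second and third loops only ever see values handed to them through a queue.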

Pages 16-18

Sequential:

    node = list->head;
A:  while (node != NULL) {
D:      node = node->next;
    }

[Figure: schedule of the instances A.1, D.1, A.2, D.2, A.3, D.3 (Core1, Core2, Core3 shown).]

DOALL:

    while (TRUE) {
E:      node = getNodeOrExit();
B:      index = calc(node->data);
    }

[Figure: DOALL schedule on Core1, Core2, Core3 with the instances E.1 to E.6 and B.1 to B.6.]

LOCALWRITE:

    while (TRUE) {
F:      node = getNodeOrExit();
G:      index = getIndex();
C:      density[index] = update_density(density[index], node->data);
    }

[Figure: LOCALWRITE schedule on Core1, Core2, Core3 with instances of F, G and C for iterations 1 to 6.]

Page 19: DSWP and DSWP+

    node = list->head;
A:  while (node != NULL) {
B:      index = calc(node->data);
C:      density[index] = update_density(density[index], node->data);
D:      node = node->next;
    }

[Figure: the dependence graph over A, B, C, D (intra-iteration and cross-iteration dependences) partitioned into Stage1, Stage2 and Stage3, labelled DSWP and DSWP+.]

Page 20: DSWP+

[Figure: pipeline schedule on Core1, Core2, Core3 with the instances A.1 to A.4, D.1 to D.4, B.1, B.2, C.1, C.2, grouped into Stage1, Stage2 and Stage3.]

Stage1: 12.5%
Stage2: 50.0%
Stage3: 37.5%

Max(12.5, 50.0, 37.5) = 50.0% => 2X (speedup)

Page 21: DSWP + DOALL

[Figure: pipeline schedule on Core1 to Core4 with the instances A.1 to A.4, D.1 to D.4, B.1 to B.4, C.1, C.2, grouped into Stage1, Stage2 and Stage3.]

Stage1: 12.5%
Stage2: (50.0/2)% = 25%
Stage3: 37.5%

Max(12.5, 50.0/2, 37.5) = 37.5% => 2.7X (speedup)

Page 22: DSWP + DOALL + LOCALWRITE

[Figure: pipeline schedule on Core1 to Core6 with the instances A.1 to A.5, D.1 to D.5, B.1 to B.5, C.1 to C.4, grouped into Stage1, Stage2 and Stage3.]

Stage1: 12.5%
Stage2: (50.0/3)% = 16.7%
Stage3: (37.5/2)% = 18.8%

Max(12.5, 50.0/3, 37.5/2) = 18.8% => 5.3X (speedup)
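The three speedup figures on slides 20 to 22 follow one rule: the slowest stage, after dividing its share by the number of cores it runs on, sets the pipeline's throughput. A small sketch of that arithmetic in C follows; the function name `pipeline_speedup` and the idealizations (perfect scaling inside a stage, no communication cost) are assumptions for illustration.

```c
#include <assert.h>

/* Speedup of a pipeline whose stage i takes pct[i] percent of the
   sequential time and is spread over cores[i] cores. Assumes perfect
   scaling inside a stage and no communication cost. */
double pipeline_speedup(const double pct[], const int cores[], int n_stages) {
    double slowest = 0.0;
    for (int i = 0; i < n_stages; i++) {
        assert(cores[i] >= 1);
        double share = pct[i] / cores[i];   /* per-core share of stage i */
        if (share > slowest)
            slowest = share;
    }
    assert(slowest > 0.0);
    return 100.0 / slowest;
}
```

With shares {12.5, 50.0, 37.5}, the core assignments {1, 1, 1}, {1, 2, 1} and {1, 3, 2} give about 2.0, 2.67 and 5.33, in line with the 2X, 2.7X and 5.3X on the slides.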

Page 23: DSWP, DSWP+ and Loop Distribution

[Figure: the dependence graph over A, B, C, D (intra-iteration and cross-iteration dependences) partitioned into Stage1, Stage2 and Stage3, shown under the labels DSWP and DSWP+ and again under the label Loop Distribution.]

Page 24

[Figure: schedule on Core1 to Core6. Stage1 (12.5%) holds the A and D instances, Stage2 (50%) holds B.1 to B.6 spread across the cores, Stage3 (37.5%) holds C.1 to C.6 spread across the cores.]

(12.5 + 50.0/6 + 37.5/6)% = 27.1% => 3.7X (speedup)
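The sum on this slide, as opposed to the Max used for the pipelined schedules, reflects stages that run one after another. A sketch of that arithmetic in C follows; the function name `loop_distribution_speedup`, the `is_parallel` flags and the assumption of perfect scaling in the parallel stages are illustrative and not taken from the slides.

```c
#include <assert.h>

/* Stages run back to back: a sequential stage keeps its full share,
   a parallel stage's share is divided by the total core count. */
double loop_distribution_speedup(const double pct[], const int is_parallel[],
                                 int n_stages, int cores) {
    assert(cores >= 1);
    double total = 0.0;
    for (int i = 0; i < n_stages; i++)
        total += is_parallel[i] ? pct[i] / cores : pct[i];
    assert(total > 0.0);
    return 100.0 / total;
}
```

For shares {12.5, 50.0, 37.5} with only the last two stages parallel, six cores give 100 / (12.5 + 8.33 + 6.25), which is about 3.69 and rounds to the 3.7X on the slide.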

Page 25

Stage Number   Execution Time (%)   Stage Type
1              12.5                 Sequential
2              50                   DOALL
3              37.5                 LOCALWRITE

Stage Number   Execution Time (%)   Stage Type
1              1                    Sequential
2              50                   DOALL
3              49                   LOCALWRITE

[Figure: Speedup (X), axis 0 to 8, against Number of Cores (3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100); series Loop Distribution and DSWP+.]

[Figure: Speedup (X), axis 0 to 100, against Number of Cores (3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100).]
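The two curves can be approximated from the stage tables alone. The sketch below compares the two schemes for a given core count. The core-allocation policy for the pipelined case (one core for the sequential stage, every split of the remaining cores tried between the two parallel stages) is an assumption, since the slides do not state how cores are assigned; the `Stages` type and both function names are likewise illustrative.

```c
#include <assert.h>

/* Stage shares in percent: one sequential stage and two parallel stages. */
typedef struct { double seq, par_a, par_b; } Stages;

/* Loop distribution: stages run back to back, parallel ones on all cores. */
double loop_distribution(Stages s, int cores) {
    assert(cores >= 1);
    return 100.0 / (s.seq + s.par_a / cores + s.par_b / cores);
}

/* Pipelined execution: one core for the sequential stage, the remaining
   cores split between the two parallel stages. Tries every split and
   keeps the best one (assumed policy). */
double dswp_plus_best(Stages s, int cores) {
    assert(cores >= 3);
    double best = 0.0;
    for (int k = 1; k <= cores - 2; k++) {
        double slowest = s.seq;
        double a = s.par_a / k;                 /* k cores for stage 2 */
        double b = s.par_b / (cores - 1 - k);   /* the rest for stage 3 */
        if (a > slowest) slowest = a;
        if (b > slowest) slowest = b;
        if (100.0 / slowest > best)
            best = 100.0 / slowest;
    }
    return best;
}
```

Under these assumptions the pipelined speedup for the first table levels off at 100/12.5 = 8 once the sequential stage is the bottleneck, and for the second table at 100/1 = 100, which appears consistent with the two axis ranges (0 to 8 and 0 to 100). At 100 cores, loop distribution on the second table reaches only about 50, because the three shares add up rather than overlap.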

Page 26

[Figure labels: DSWP+DOALL, DSWP+DOALL+LOCALWRITE, DSWP+LOCALWRITE, DSWP+SPECDOALL.]

Page 27

Questions?