StreamX10: A Stream Programming Framework on X10

Haitao Wei, School of Computer Science, Huazhong University of Science and Technology (HUST), 2012-06-14


Page 1: StreamX10: A Stream Programming Framework on X10

StreamX10: A Stream Programming Framework on X10

Haitao Wei, 2012-06-14

School of Computer Science, Huazhong University of Science and Technology (HUST)

Page 2: StreamX10: A Stream Programming Framework on X10

Outline

1. Introduction and Background
2. COStream Programming Language
3. Stream Compilation on X10
4. Experiments
5. Conclusion and Future Work

Page 3: StreamX10: A Stream Programming Framework on X10

Background and Motivation

Stream programming
• A high-level programming model that has been applied productively
• Usually depends on a specific architecture, which makes programs difficult to port between platforms

X10
• A productive parallel programming environment
• Isolates the details of different architectures
• Provides a flexible parallel programming abstraction layer for stream programming

StreamX10: an attempt to make stream programs portable by building on X10


Page 5: StreamX10: A Stream Programming Framework on X10

COStream Language

stream
• FIFO queue connecting operators

operator
• Basic functional unit: an actor node in the stream graph
• Multiple inputs and multiple outputs
• Window
  – like pop, peek, push operations
• Init and work functions

composite
• Connected operators: a subgraph of actors
• A stream program is composed of composites

Page 6: StreamX10: A Stream Programming Framework on X10

COStream and Stream Graph

    Composite Main {
        graph
            stream<int i> S = Source() {
                state : { int x; }
                init  : { x = 0; }
                work  : {
                    S[0].i = x;
                    x++;
                }
                window S: tumbling, count(1);
            }
            stream<int j> P = MyOp(S) {
                param pn: N
            }
            () as SinkOp = Sink(P) {
                state : { int r; }
                work  : {
                    r = P[0].j;
                    println(r);
                }
                window P: tumbling, count(1);
            }
    }

    Composite MyOp(output Out; input In) {
        param attribute: pn
        graph
            stream<int j> Out = Averager(In) {
                work : {
                    int sum = 0, i;
                    for (i = 0; i < pn; i++)
                        sum += In[i].j;
                    Out[0].j = (sum / pn);
                }
                window In: sliding, count(10), count(1);
                       Out: tumbling, count(1);
            }
    }

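[Figure: stream graph of the program. Source (push=1) feeds stream S into Averager (peek=10, pop=1, push=1), which feeds stream P into Sink (pop=1). The callouts label a stream, an operator, and a composite.]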
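The window semantics of the Averager (peek=10, pop=1, push=1) can be illustrated with a small model. The sketch below is Python rather than COStream, and it assumes that `pn` equals the window size of 10; the function names `source`, `averager`, and `run` are hypothetical and exist only for this illustration.

```python
from collections import deque

def source(n):
    """Emit 0, 1, 2, ... one token per firing, like the Source operator."""
    x = 0
    for _ in range(n):
        yield x
        x += 1

def averager(tokens, peek=10, pop=1):
    """Sliding window: fire once `peek` tokens are buffered, then drop `pop`."""
    window = deque()
    for t in tokens:
        window.append(t)
        while len(window) >= peek:
            # Integer division mirrors sum/pn on ints (assumes pn == peek).
            yield sum(window) // peek
            for _ in range(pop):
                window.popleft()

def run(n_tokens=12):
    """Connect source -> averager and collect what a sink would receive."""
    return list(averager(source(n_tokens)))

print(run(12))  # [4, 5, 6]
```

With 12 input tokens the operator fires three times, on the windows 0..9, 1..10, and 2..11. Fewer than 10 tokens produce no output, which is why a sliding window needs some tokens buffered before its first firing.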


Page 8: StreamX10: A Stream Programming Framework on X10

Compilation Flow of StreamX10

Phase                 Function
Front-end             Translates the COStream syntax into an abstract syntax tree.
Instantiation         Instantiates the composites hierarchically into static flattened operators.
Static Stream Graph   Constructs the static stream graph from the flattened operators.
Scheduling            Calculates initialization and steady-state execution orderings of operators.
Partitioning          Performs partitioning based on X10 parallelism models for load balance.
Code Generation       Generates X10 code for COStream programs.
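As an illustration of the Scheduling phase, the sketch below computes steady-state firing counts from push and pop rates. It assumes the scheduler solves the usual synchronous-dataflow balance equations (producer firings × push = consumer firings × pop), which the slides do not state explicitly. The function name `steady_state_repetitions` and the edge tuple layout are hypothetical, and the graph is assumed to be connected.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def steady_state_repetitions(edges):
    """edges: list of (producer, consumer, push_rate, pop_rate).

    Returns the smallest positive integer firing counts that leave every
    buffer with the same number of tokens after one steady-state iteration.
    """
    rate = {edges[0][0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative rates along edges until stable
        changed = False
        for src, dst, push, pop in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * push / pop
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * pop / push
                changed = True
    for src, dst, push, pop in edges:  # reject inconsistent rate graphs
        if rate[src] * push != rate[dst] * pop:
            raise ValueError("no steady-state schedule exists")
    # Scale the fractions to the smallest integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    counts = {n: int(r * scale) for n, r in rate.items()}
    g = reduce(gcd, counts.values())
    return {n: c // g for n, c in counts.items()}

# Rates taken from the Source/Averager/Sink example (push=1, pop=1 on both edges).
print(steady_state_repetitions([("Source", "Averager", 1, 1),
                                ("Averager", "Sink", 1, 1)]))
# {'Source': 1, 'Averager': 1, 'Sink': 1}
```

The initialization ordering is not modeled here. Under the usual StreamIt-style treatment, an operator that peeks more than it pops needs peek minus pop tokens buffered first, which for the Averager would mean 9 extra Source firings before the steady state begins.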

Page 9: StreamX10: A Stream Programming Framework on X10

The Execution Framework

[Figure: three places (Place 0, Place 1, Place 2), each running activities on a thread pool. Legend: local buffer object, global buffer object, intra-place data flow, inter-place data flow.]

• The nodes are partitioned among the places
• Each node is mapped to an activity
• The nodes execute in a pipelined fashion to exploit parallelism
• Both local and global FIFO buffers are used
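The one-activity-per-node pipeline can be modeled roughly as follows. This is a Python sketch, not the generated X10 code: threads stand in for X10 activities, `queue.Queue` stands in for the FIFO buffers, and places are not modeled at all. The `DONE` sentinel and the `doubler` stage are assumptions made only for this example.

```python
import queue
import threading

DONE = object()  # end-of-stream marker (an assumption of this sketch)

def source(out_q, n):
    """Producer node: pushes n tokens, then signals end of stream."""
    for x in range(n):
        out_q.put(x)
    out_q.put(DONE)

def doubler(in_q, out_q):
    """Middle node: pops one token, pushes one token."""
    while True:
        item = in_q.get()
        if item is DONE:
            out_q.put(DONE)
            return
        out_q.put(item * 2)

def sink(in_q, results):
    """Consumer node: pops tokens until end of stream."""
    while True:
        item = in_q.get()
        if item is DONE:
            return
        results.append(item)

def run_pipeline(n=5):
    """Run each node on its own thread, connected by bounded FIFO queues."""
    q1, q2 = queue.Queue(maxsize=4), queue.Queue(maxsize=4)
    results = []
    threads = [
        threading.Thread(target=source, args=(q1, n)),
        threading.Thread(target=doubler, args=(q1, q2)),
        threading.Thread(target=sink, args=(q2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(run_pipeline(5))  # [0, 2, 4, 6, 8]
```

Because every stage runs concurrently, the source can produce token k+1 while the middle node is still processing token k, which is the pipeline parallelism the slide refers to.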

Page 10: StreamX10: A Stream Programming Framework on X10

Work Partition Inter-place

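Objective: minimize communication and balance load (using Metis)

[Figure: a weighted stream graph split into three partitions, each with computation work = 10. Individual node and edge weights are not recoverable from the transcript.]

Speedup: 30/10 = 3
Communication: 2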
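The two figures of merit on this slide can be computed for any candidate partition as sketched below. The sketch does not call Metis; it only evaluates a partition that is already given. The function name `evaluate_partition` and the six-node graph are hypothetical, chosen so that the numbers work out like those on the slide. They are not the graph from the original figure.

```python
def evaluate_partition(work, edges, part):
    """Score a partition of a weighted stream graph.

    work:  node -> computation cost
    edges: list of (u, v, communication_cost)
    part:  node -> partition (place) id
    Returns (speedup, communication, load per partition).
    """
    load = {}
    for node, w in work.items():
        load[part[node]] = load.get(part[node], 0) + w
    # Ideal pipelined speedup: total work over the most heavily loaded place.
    speedup = sum(work.values()) / max(load.values())
    # Communication: total weight of edges that cross a partition boundary.
    communication = sum(c for u, v, c in edges if part[u] != part[v])
    return speedup, communication, load

# Hypothetical chain of six nodes, each with work 5.
work = {n: 5 for n in "abcdef"}
edges = [("a", "b", 2), ("b", "c", 1), ("c", "d", 2),
         ("d", "e", 1), ("e", "f", 2)]
part = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2}

speedup, communication, load = evaluate_partition(work, edges, part)
print(speedup, communication)  # 3.0 2
```

A graph partitioner searches for the assignment `part` that keeps `communication` low while keeping the per-partition loads close to equal.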

Page 11: StreamX10: A Stream Programming Framework on X10

Global FIFO implementation

[Figure: a producer on Place 0 pushes into a local array and a consumer on Place 1 peeks/pops from its own local array; data is copied between the local arrays and a DistArray.]

• Each producer and consumer has its own local buffer
• The producer uses the push operation to store data in its local buffer
• The consumer uses the peek/pop operations to fetch data from its local buffer
• When the producer's local buffer is full, or the consumer's local buffer is empty, the data is copied automatically
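The buffering scheme described above can be sketched as a single-threaded Python model. It is not the X10 implementation: a plain `deque` stands in for the DistArray-backed storage, the class name `GlobalFifo` and the `chunk` parameter are hypothetical, and a real consumer would block rather than fail when no data is available.

```python
from collections import deque

class GlobalFifo:
    """Producer and consumer each work on a local buffer; data moves
    through shared storage only in bulk copies."""

    def __init__(self, chunk):
        self.chunk = chunk              # local buffer capacity
        self.producer_local = []
        self.shared = deque()           # stand-in for the DistArray
        self.consumer_local = deque()
        self.copies = 0                 # number of bulk copies performed

    def push(self, item):
        self.producer_local.append(item)
        if len(self.producer_local) == self.chunk:  # local buffer full
            self.flush()

    def flush(self):
        """Copy whatever the producer has buffered into shared storage."""
        if self.producer_local:
            self.shared.extend(self.producer_local)
            self.producer_local = []
            self.copies += 1

    def _refill(self, needed):
        """Copy from shared storage until `needed` tokens are local."""
        while len(self.consumer_local) < needed and self.shared:
            for _ in range(min(self.chunk, len(self.shared))):
                self.consumer_local.append(self.shared.popleft())
            self.copies += 1

    def peek(self, i):
        self._refill(i + 1)
        return self.consumer_local[i]   # IndexError if data is missing

    def pop(self):
        self._refill(1)
        return self.consumer_local.popleft()

fifo = GlobalFifo(chunk=4)
for x in range(6):
    fifo.push(x)        # the fourth push triggers one bulk copy
print(fifo.pop())       # 0
```

Six pushes and one pop cost only two bulk copies here, which is the point of the design: per-token operations stay local, and the more expensive inter-place transfer is amortized over a whole buffer.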

Page 12: StreamX10: A Stream Programming Framework on X10

X10 code in the Back-end

• Spawn activities for each node at a place according to the partition
• Call the work function in the initial and steady schedules
• Define the work function

    // main.x10 control code
    public static def main( ) {
        ...
        finish for (p in Place.places()) async at (p) {
            switch (p.id) {
            case 0:
                val a_0 = new Source_0(rc);
                a_0.run();
                break;
            case 1:
                val a_2 = new MovingAver_2(rc);
                a_2.run();
                break;
            case 2:
                val a_1 = new Sink_1(rc);
                a_1.run();
                break;
            default:
                break;
            }
        }
        …
    }

    // Source.x10 code
    ...
    def work() {
        ...
        push_Source_0_Sink_1(0).x = x;
        x += 1.0;
        pushTokens();
        popTokens();
    }
    public def run() {
        initWork();  // init
        // initSchedule
        for (var j:Int = 0; j < Source_0_init; j++)
            work();
        // steadySchedule
        for (var i:Int = 0; i < RepeatCount; i++)
            for (var j:Int = 0; j < Source_0_steady; j++)
                work();
        flush();
    }
    ...


Page 14: StreamX10: A Stream Programming Framework on X10

Experimental Platform and Benchmarks

Platform
• Intel Xeon processor (8 cores), 2.4 GHz, with 4 GB memory
• Red Hat EL5 with Linux 2.6.18
• X10 compiler and runtime version 2.2.0

Benchmarks
• 11 benchmarks rewritten from StreamIt

Page 15: StreamX10: A Stream Programming Framework on X10

Throughput Comparison

Throughput of 4 different configurations (NPLACES*NTHREADS=8), normalized to 1 place with 8 threads

[Figure: bar chart. Y-axis: throughput normalized to 1 place with 8 threads (0 to 10). Series: NPLACES=1, NTHREADS=8; NPLACES=2, NTHREADS=4; NPLACES=4, NTHREADS=2; NPLACES=8, NTHREADS=1.]

• For most benchmarks, CPU utilization increases from 24% to 89% when the number of places goes from 1 to 4, except for benchmarks with a low computation/communication ratio

• Going from 4 to 8 places brings little benefit, or even makes throughput worse

Page 16: StreamX10: A Stream Programming Framework on X10

Observation and Analysis

• Throughput goes up as the number of places increases, because multiple places increase CPU utilization

• Multiple places expose parallelism but also bring more communication overhead

• Benchmarks with a heavier computation workload, such as DES and Serpent_full, can still benefit from an increasing number of places


Page 18: StreamX10: A Stream Programming Framework on X10

Conclusion

• We proposed and implemented StreamX10, a stream programming language and compilation system on X10

• A raw partitioning optimization is proposed to exploit parallelism based on the X10 execution model

• A preliminary experiment was conducted to study the performance

Page 19: StreamX10: A Stream Programming Framework on X10

Future Work

• How to choose the best configuration (# of places and # of threads) automatically for each benchmark

• How to decrease the thread switching overhead by mapping multiple nodes to a single activity

Page 20: StreamX10: A Stream Programming Framework on X10

Acknowledgment

• X10 Innovation Award funding support

• QiMing Teng, Haibo Lin and David P. Grove at IBM for their help on this research

Page 21: StreamX10: A Stream Programming Framework on X10

HUST