

Introduction Parallel programming examples Target architecture and benchmarks Theory of multichannel primitives Conclusion

Parallel programming with Session Java

Nicholas Ng ([email protected])

Imperial College London


Motivation

Parallel designs are difficult and error-prone (e.g. MPI)

Session types ensure communication safety in concurrent systems

So use session types to design safe parallel algorithms for high-performance clusters


Contributions

An implementation of parallel n-body simulation

1 Programmed in Session Java (SJ), a full implementation of session types

2 Uses FPGA on the AXEL heterogeneous cluster

A formal description of the multicast outwhile and inwhile SJ primitives in session types

Showed type soundness and the progress property for SJ parallel programs connected in a ring topology

Proved the SJ n-body implementation deadlock-free


Session types

Typing system for the π-calculus [HVK98]

π-calculus models structured interactions between processes

Main idea: communication primitives should have a dual

Example

(Conventional type system) int i = 9

i is type int

9 is type int

Process A: cab!〈9〉;P (send 9 to B via channel cab)

Process B: cab?(x).Q (receive x from A via channel cab)

A is type Send int (or cab : ![int])

B is type Receive int (or cab : ?[int])
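The duality idea can be sketched in plain Java (this is not SJ syntax). The names SessionType, send, recv, END, dual and compatible are hypothetical and chosen only for illustration: a send type is the dual of a receive type with the same payload, and two endpoints match when one is the dual of the other.

```java
// Hypothetical model of binary session types in plain Java; not the SJ API.
abstract class SessionType {
    // Every session type has a dual: the view from the other end of the channel.
    abstract SessionType dual();

    // The terminated session is its own dual.
    static final SessionType END = new SessionType() {
        SessionType dual() { return this; }
        public String toString() { return "end"; }
    };

    // ![payload] followed by a continuation.
    static SessionType send(final String payload, final SessionType next) {
        return new SessionType() {
            SessionType dual() { return recv(payload, next.dual()); }
            public String toString() { return "![" + payload + "]." + next; }
        };
    }

    // ?[payload] followed by a continuation.
    static SessionType recv(final String payload, final SessionType next) {
        return new SessionType() {
            SessionType dual() { return send(payload, next.dual()); }
            public String toString() { return "?[" + payload + "]." + next; }
        };
    }

    // Two endpoints are compatible when one type is the dual of the other.
    static boolean compatible(SessionType a, SessionType b) {
        return a.dual().toString().equals(b.toString());
    }
}
```

With this sketch, `SessionType.send("int", SessionType.END)` models the type of A and `SessionType.recv("int", SessionType.END)` models the type of B; `compatible` accepts that pair and rejects two senders.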


Session programming with SJ

Session Java (SJ) [HYH08]

A full implementation of binary session types in Java

Provides a socket programming interface with e.g. accept(), request(), send(), receive()

Workflow of an SJ program:

1 Declare session type/protocol of program in SJ

2 SJ compiler checks local session type conformance

3 Runtime duality check with communicating program


SJ features for parallel programming

Iteration chaining

Multi-channel inwhile and outwhile in place of reduce-scatter operations

Master: <s1,s2>.outwhile(i<42){...}

Forwarder1: s3.outwhile(s1.inwhile){...}

Forwarder2: s4.outwhile(s2.inwhile){...}

End: <s3,s4>.inwhile(){...}

Figure: Master connected to Forwarder 1 and Forwarder 2, which are both connected to End
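The four processes above can be imitated in plain Java with threads and queues. This is only a sketch of the control flow, not SJ code: each BlockingQueue is a hypothetical stand-in for one session (s1 to s4), and the helper names outwhile, inwhile, spawn and run are assumptions made for this example. The loop condition is decided once by the Master and then travels down the chain.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java imitation of iteration chaining; this is NOT SJ syntax.
final class IterationChain {
    interface Body { void run() throws InterruptedException; }

    // outwhile: send the locally evaluated condition on every outgoing channel.
    static boolean outwhile(boolean cond, List<BlockingQueue<Boolean>> outs) throws InterruptedException {
        for (BlockingQueue<Boolean> out : outs) out.put(cond);
        return cond;
    }

    // inwhile: receive the condition from every incoming channel.
    static boolean inwhile(List<BlockingQueue<Boolean>> ins) throws InterruptedException {
        boolean go = true;
        for (BlockingQueue<Boolean> in : ins) go &= in.take();
        return go;
    }

    static Thread spawn(Body body) {
        Thread t = new Thread(() -> {
            try { body.run(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        t.start();
        return t;
    }

    // Runs the four processes and returns how many loop bodies End executed.
    static int run(int iterations) {
        BlockingQueue<Boolean> s1 = new ArrayBlockingQueue<>(1), s2 = new ArrayBlockingQueue<>(1);
        BlockingQueue<Boolean> s3 = new ArrayBlockingQueue<>(1), s4 = new ArrayBlockingQueue<>(1);
        AtomicInteger endBodies = new AtomicInteger();
        List<Thread> threads = Arrays.asList(
            // Master: <s1,s2>.outwhile(i < iterations) {...}
            spawn(() -> { int i = 0; while (outwhile(i < iterations, Arrays.asList(s1, s2))) i++; }),
            // Forwarder1: s3.outwhile(s1.inwhile) {...}
            spawn(() -> { while (outwhile(inwhile(Arrays.asList(s1)), Arrays.asList(s3))) { } }),
            // Forwarder2: s4.outwhile(s2.inwhile) {...}
            spawn(() -> { while (outwhile(inwhile(Arrays.asList(s2)), Arrays.asList(s4))) { } }),
            // End: <s3,s4>.inwhile() {...}
            spawn(() -> { while (inwhile(Arrays.asList(s3, s4))) endBodies.incrementAndGet(); })
        );
        for (Thread t : threads) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return endBodies.get();
    }
}
```

Calling `IterationChain.run(42)` mirrors the `i<42` condition above: End executes its loop body the same number of times as the Master, and every process leaves its loop when the Master's condition becomes false.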


Simple example: N-body simulation

Figure: The resultant force is the vector sum of all forces

n particles following Newton’s laws of motion

Calculate the resultant force acting on each particle

Displace the particle based on the net force acting on it
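The two steps above (sum the forces, then displace) can be sketched for a 2D system as follows. This is a minimal direct-sum illustration, not the implementation from the talk: the class and method names, the softening term and the semi-implicit Euler update are all assumptions.

```java
// Direct-sum gravitational step in 2D; names and constants are illustrative assumptions.
final class NBodyStep {
    static final double G = 6.674e-11;     // gravitational constant
    static final double SOFTENING = 1e-9;  // assumed small term that avoids division by zero

    // Returns f[i] = {Fx, Fy}: the vector sum of the pulls of all other particles on particle i.
    static double[][] netForces(double[][] pos, double[] mass) {
        int n = pos.length;
        double[][] f = new double[n][2];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                double dx = pos[j][0] - pos[i][0];
                double dy = pos[j][1] - pos[i][1];
                double r2 = dx * dx + dy * dy + SOFTENING;
                double r = Math.sqrt(r2);
                double magnitude = G * mass[i] * mass[j] / r2;  // Newton's law of gravitation
                f[i][0] += magnitude * dx / r;                  // project onto the x axis
                f[i][1] += magnitude * dy / r;                  // project onto the y axis
            }
        }
        return f;
    }

    // Displaces every particle using a = F / m over a time step dt (semi-implicit Euler).
    static void displace(double[][] pos, double[][] vel, double[] mass, double[][] f, double dt) {
        for (int i = 0; i < pos.length; i++) {
            for (int k = 0; k < 2; k++) {
                vel[i][k] += f[i][k] / mass[i] * dt;
                pos[i][k] += vel[i][k] * dt;
            }
        }
    }
}
```

The inner double loop is the O(n²) part that a parallel implementation would split across nodes.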


Simple example: N-body simulation

Implemented in a ring topology

Three kinds of process: Master, Worker (multiple) and LastWorker

1 Each is allocated a partition of particles

2 Calculate resultant forces on the received set of particles

3 Forward to next node

4 Repeat until end of one time step

Figure: Ring topology connecting Master, two Workers and LastWorker
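The four steps above can be imitated sequentially to show why one trip around the ring is enough for one time step. This sketch is an assumption-laden simplification: it uses 1D positions, unit masses and G = 1, and array rotation stands in for the message passing between nodes. The names RingNBody, pull and oneTimeStep are hypothetical.

```java
// Sequential imitation of the ring; array rotation stands in for "forward to next node".
final class RingNBody {
    // 1D inverse-square pull of a unit mass at xj on a unit mass at xi (G = 1, illustrative).
    static double pull(double xi, double xj) {
        double d = xj - xi;
        return d == 0 ? 0 : d / (Math.abs(d) * d * d);
    }

    // parts[p] holds the positions owned by node p; the result has the same shape.
    static double[][] oneTimeStep(double[][] parts) {
        int nodes = parts.length;
        double[][] force = new double[nodes][];
        double[][] visiting = new double[nodes][];
        for (int p = 0; p < nodes; p++) {
            force[p] = new double[parts[p].length];
            visiting[p] = parts[p];                 // step 1: start with the node's own partition
        }
        for (int round = 0; round < nodes; round++) {
            for (int p = 0; p < nodes; p++)         // step 2: forces from the set currently held
                for (int i = 0; i < parts[p].length; i++)
                    for (double xj : visiting[p])
                        force[p][i] += pull(parts[p][i], xj);
            double[][] next = new double[nodes][];
            for (int p = 0; p < nodes; p++)         // step 3: forward the set to the next node
                next[(p + 1) % nodes] = visiting[p];
            visiting = next;
        }                                           // step 4: after `nodes` rounds every set has visited every node
        return force;
    }
}
```

After as many rounds as there are nodes, each node has accumulated the contribution of every partition, so the result matches a direct all-pairs sum.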


Another example: Jacobi method

Iteration-based method for solving the Discrete Poisson Equation

Used in physics and natural sciences

Given an initial prediction, iterate until converged or until an upper limit of iterations is reached

- edge edge -

edge value value edge

edge value value edge

- edge edge -

Figure: A sub-matrix of calculation
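A minimal single-process sketch of the iteration is given below. It assumes the homogeneous case (zero source term), so each interior value becomes the average of its four neighbours while the edge cells stay fixed; the class name, method name and parameters are hypothetical.

```java
// Jacobi sweeps on one sub-matrix; assumes a zero source term (Laplace case).
final class JacobiSketch {
    // Updates grid in place and returns the number of sweeps performed.
    // Edge cells (first/last row and column) are treated as fixed boundary values.
    static int solve(double[][] grid, double tolerance, int maxIterations) {
        int rows = grid.length, cols = grid[0].length;
        double[][] next = new double[rows][];
        for (int r = 0; r < rows; r++) next[r] = grid[r].clone();
        for (int iteration = 1; iteration <= maxIterations; iteration++) {
            double maxChange = 0;
            for (int r = 1; r < rows - 1; r++) {
                for (int c = 1; c < cols - 1; c++) {
                    // New value: average of the four neighbours from the previous sweep.
                    next[r][c] = 0.25 * (grid[r - 1][c] + grid[r + 1][c]
                                       + grid[r][c - 1] + grid[r][c + 1]);
                    maxChange = Math.max(maxChange, Math.abs(next[r][c] - grid[r][c]));
                }
            }
            // Copy the interior back so the next sweep reads only values from this sweep.
            for (int r = 1; r < rows - 1; r++) System.arraycopy(next[r], 1, grid[r], 1, cols - 2);
            if (maxChange < tolerance) return iteration;  // converged
        }
        return maxIterations;                             // upper limit reached
    }
}
```

A 4×4 array corresponds to the sub-matrix in the figure: a 2×2 block of values surrounded by edge cells.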


Another example: Jacobi method

Implemented in a mesh topology (2D decomposition)

9 kinds of process: one for each edge case and a Worker in the center

1 Each is allocated a sub-matrix of values

2 Calculate the average of neighbouring values for every element

3 Exchange edges with adjacent sub-grids

4 Repeat until converged

Figure: Mesh of 9 processes, numbered 1 to 9: Master, WorkerNorth, WorkerNorthEast, WorkerWest, Worker, WorkerEast, WorkerSouthWest, WorkerSouth, WorkerSouthEast


AXEL: a heterogeneous cluster

Axel [TL10] is a heterogeneous cluster that contains different Processing Elements (PE) on each node:

CPU: Off-the-shelf multicore x86 architecture CPU

GPU: Graphics Processing Unit, nVidia Tesla, dedicated General Purpose GPU

FPGA: Field Programmable Gate Arrays, reconfigurable hardware

AXEL is a 16-node NNUS cluster

Each node can be used as individual PC

Connected by high speed Ethernet


Performance benchmark results

Compared against MPJ Express [SCB09], an implementation of MPI in Java

Performance is competitive (Left: N-body simulation, Right: Jacobi method)

Figure (left): n-Body simulation. Runtime (milliseconds) against number of particles per node, for Multi-channel SJ, Old SJ and MPJ Express

Figure (right): Jacobi solution of the Discrete Poisson Equation. Runtime (seconds) against partition size, for Multi-channel SJ, Old SJ and MPJ Express


Performance benchmark results (with FPGA)

Better performance with more particles

Best performance: SJ+FPGA is 2x faster than the SJ implementation

Figure: Runtime (milliseconds) against number of particles, for SJ + FPGA, SJ and MPJ Express


Well-formed topology

Multichannel inwhile and outwhile are not safe on their own

Well-formed topology: topology constructed as a DAG with 1 root node and 1 sink node

Individual pairs of sessions are dual

Iteration controlled by a single condition in the Master node

Deadlock freedom for a group of processes in a well-formed topology

Figure: Example topology with nodes numbered 1 to 9
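The structural part of the well-formedness condition (a DAG with exactly one root and one sink) can be checked mechanically. The sketch below is an illustration under assumptions: the class and method names are hypothetical, nodes are numbered from 0, and it does not check session duality or the single iteration condition.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Checks only the graph shape: acyclic, exactly one root, exactly one sink.
final class TopologyCheck {
    // edges[k] = {from, to}; nodes are numbered 0..n-1.
    static boolean isWellFormedShape(int n, int[][] edges) {
        int[] inDegree = new int[n];
        int[] outDegree = new int[n];
        for (int[] e : edges) { outDegree[e[0]]++; inDegree[e[1]]++; }

        int roots = 0, sinks = 0;
        for (int v = 0; v < n; v++) {
            if (inDegree[v] == 0) roots++;   // no incoming session: a root
            if (outDegree[v] == 0) sinks++;  // no outgoing session: a sink
        }
        if (roots != 1 || sinks != 1) return false;

        // Kahn's algorithm: a graph is acyclic iff every node can be removed in topological order.
        Deque<Integer> ready = new ArrayDeque<>();
        for (int v = 0; v < n; v++) if (inDegree[v] == 0) ready.add(v);
        int removed = 0;
        while (!ready.isEmpty()) {
            int v = ready.remove();
            removed++;
            for (int[] e : edges)
                if (e[0] == v && --inDegree[e[1]] == 0) ready.add(e[1]);
        }
        return removed == n;
    }
}
```

For example, the Master, Forwarder 1, Forwarder 2 and End arrangement corresponds to the edges {0,1}, {0,2}, {1,3}, {2,3}, which this check accepts.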


Future (and ongoing) work

C based language implementing session types

Higher performance with FPGA or other acceleration hardware

Can integrate with AXEL or similar HPC applications toolchain


References

[HVK98] Kohei Honda, Vasco T. Vasconcelos, and Makoto Kubo. Language primitives and type disciplines for structured communication-based programming. In ESOP’98, volume 1381 of LNCS, pages 122–138, 1998.

[HYH08] Raymond Hu, Nobuko Yoshida, and Kohei Honda. Session-based distributed programming in Java. In ECOOP’08, volume 5142 of LNCS, pages 516–541, 2008.

[SCB09] Aamir Shafi, Bryan Carpenter, and Mark Baker. Nested Parallelism for Multi-core HPC Systems using Java. Journal of Parallel and Distributed Computing, 69(6):532–545, 2009.

[TL10] Kuen Hung Tsoi and Wayne Luk. Axel: a heterogeneous cluster with FPGAs and GPUs. In FPGA ’10: Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays, pages 115–124, New York, NY, USA, 2010. ACM.
