bulk-synchronous parallel ml implementation of the parallel superposition

30
Bulk-Synchronous Parallel ML Implementation of the Parallel Superposition rédéric Gava

Upload: gavin-hale

Post on 31-Dec-2015

41 views

Category:

Documents


1 download

DESCRIPTION

Frédéric Gava. Bulk-Synchronous Parallel ML Implementation of the Parallel Superposition. Background. Implicit. Explicit. BSML. Automatic parallelization. skeletons. Data-parallelism. Parallel extensions. Concurrent programming. Parallel programming. Projects. 2002-2004 - PowerPoint PPT Presentation

TRANSCRIPT

Bulk-Synchronous Parallel ML

Implementation of the Parallel Superposition

Frédéric Gava

2/28

Background

Parallel programming

Implicit Explicit

Data-parallelismParallel

extensionsConcurrent

programmingAutomatic

parallelizationskeletons

BSML

3/28

Projects

2002-2004

ACI Grid

LIFO, LACL, PPS, INRIA

Design of parallel and Grid librairies for OCaml.

2004-2007

ACI « Young researchers »

LIFO, LACL

Production of a programming environment in which certified parallel programs can be written and safely executed.

4/28

Outline

I. The BSML language

II. Multi-programming (superposition)

III. Implementation of the superposition

IV. Conclusion and future works

5/28

The BSML language

6/28

The BSML « spirite »Bugs grow faster than Moore’s law. (G. Berry) High-level language lines of code number of bugd

Certified library number of bugs

Small is beautiful. (R. H. Bisseling) BSML only use 5 primitives…

Who would drive a non-deterministic car ? (G. Berry) Propriety of confluence of the semantic of BSML

French Proverb : « All the roads go to Roma » But the better way is to choose the shorter

One can give BSP costs to BSML programs

Different of concurrent programming : cost and confluence

7/28

Characterized by: p Number of processors r Processors speed L Global synchronization g Phase of communication (1 word at most

sent of received by each processor)

BSP architecture:

The BSP model

P/M P/M P/M P/M P/M

Network

Unit of synchronization

8/28

wi

Model of execution

Su

per

-ste

p i+

1

ghi

L

wi+

1ghi+

1L

Su

per

-ste

p i

Beginning of the super-step iLocal computing on

each processor

Global (collective) communications

between processors

Global synchronization : exchanged data

available for the next super-step

Cost(i) = (max0x<p wx

i) + hig + L

9/28

Example : broadcast Direct broadcast (one super-step):

BSP cost = png + L Broadcast with 2 super-steps:

BSP cost = ng + 2L

10/28

-calculus ML

BS-calculus

Parallelconstructions

BSML

Parallelprimitives

Structured parallelism as an explicit parallel extension of ML

Functional language with BSP cost predictions

Allows the implementation of skeletons

Implemented as a parallel library for the "Objective Caml" language

Using a parallel data structure called parallel vector

The BSML language

11/28

fp-1…f1f0

gp-1…g1g0

Parallelpart

Sequentialpart

Replicatedpart

A BSML program

12/28

Parallel primitives of BSML

Asynchronous primitives: Creation of a vector (creation of local values)

mkpar : (int )par Parallel point-wize application

apply : ( ) par par par

Synchronous and communications primitives: Communications

put : (int) par (int) par Projection of local values (to be replicated)

proj : par (int)

13/28

Semantics

Natural semantics

Small-steps semantics

Distributed semantics

Programming modelEasy for proofs (Coq)

Easy for costs

Execution modelMake asynchronous steps

appearClose to a real implemantation

14/28

• Semantics = set of axioms and inference rules• Easy to understand, makes proofs more easy• Example:

Natural semantics

15/28

• Semantics = set of rewriting rules• Using contexts for the strategy• Easier understanding of costs and errors• Example:

Small steps semantics

Global cost

Local costs

16/28Distributed evaluation

scan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid op vec) (fun()->scan'(mid+1) lst op vec))in let com = ...(* send wm to processes m+1…p+1 *) let op’ = ...(* applies op to wm and wi, m<i<p *) in parfun2 op’ com vec’ in scan' 0 (bsp_p()-1) op vecscan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid op vec) (fun()->scan'(mid+1) lst op vec))in let com = ...(* send wm to processes m+1…p+1 *) let op’ = ...(* applies op to wm and wi, m<i<p *) in parfun2 op’ com vec’ in scan' 0 (bsp_p()-1) op vec scan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid

Prog scan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid op vec) (fun()->scan'(mid+1) lst op vec))in let com = ...(* send wm to processes m+1…p+1 *) let op’ = ...(* applies op to wm and wi, m<i<p *) in parfun2 op’ com vec’ in scan' 0 (bsp_p()-1) op vecscan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid op vec) (fun()->scan'(mid+1) lst op vec))in let com = ...(* send wm to processes m+1…p+1 *) let op’ = ...(* applies op to wm and wi, m<i<p *) in parfun2 op’ com vec’ in scan' 0 (bsp_p()-1) op vec scan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid

Prog scan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid op vec) (fun()->scan'(mid+1) lst op vec))in let com = ...(* send wm to processes m+1…p+1 *) let op’ = ...(* applies op to wm and wi, m<i<p *) in parfun2 op’ com vec’ in scan' 0 (bsp_p()-1) op vecscan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid op vec) (fun()->scan'(mid+1) lst op vec))in let com = ...(* send wm to processes m+1…p+1 *) let op’ = ...(* applies op to wm and wi, m<i<p *) in parfun2 op’ com vec’ in scan' 0 (bsp_p()-1) op vec scan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid

Prog

Distributed semantics Semantics = set of parallel rewriting rules SPMD style:

Natural

scan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid op vec) (fun()->scan'(mid+1) lst op vec))in let com = ...(* send wm to processes m+1…p+1 *) let op’ = ...(* applies op to wm and wi, m<i<p *) in parfun2 op’ com vec’ in scan' 0 (bsp_p()-1) op vecscan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid op vec) (fun()->scan'(mid+1) lst op vec))in let com = ...(* send wm to processes m+1…p+1 *) let op’ = ...(* applies op to wm and wi, m<i<p *) in parfun2 op’ com vec’ in scan' 0 (bsp_p()-1) op vec scan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid

Prog

Parallelvector

Parts of the parallel vector

17/28

Multi-programming

18/28

Several programs on the same machine

Primitive of parallel composition: Superposition

Divide-and-conquer BSP algorithms

Parallel composition

19/28

super : (unit (unit )

super E1 E2 (E1 (), E2())

Fusion of communications/synchronisations using super-threads

Keep the BSP model

Pure functional semantics

Parallel Superposition

20/28

Communications

Synchronization

Communications

Synchronization

Communications

Synchronization

Communications

Synchronization

Communications

Synchronization

E1 E2 super E1 E2

0 1 20 . . .1 . . .2 . . .

0 1 20 . . .1 . . .2 . . .

0 1 20 . . .1 . . .2 . . .

Parallel Superposition

21/28

Implementationof the

superposition

22/28

Semantics (1) Natural semantics :

Solution, the super-threads :

Small-step semantics:

23/28

Semantics (2)

Management of the superposition :

Management of the communications :

24/28

Semantics based implementation The semantics makes appear 3 low level primitives :

Send to send the data of the environment of communication

Rcv to received them

Wait to allow a super-thread to wait his brother

BSML primitives are thus simple calls of them (as in the small-steps semantics)

Super-threads could be implemented using threads

A scheduler of this threads is thus need for the special management of our super-threads

The environment of communications is just a Hashtable with pid of super-threads as keys

25/28

Example, prefixes calculus

scan : ( par par

scan (+) <v0, …, vp-1>

= <v0, v0+v1, …, v0+v1+…+ vp-1>

scan (+) <v0, …, vm, …>= < w0 , … , wm , … >

scan (+) <… ,vm+1, …, vp-1>=<…, wm+1 , … , wp+1>

< w0 , … , wm , wm+wm+1, … , wm+wp+1>= <v0, v0+v1, v0+…+vm, v0+…+vm+1,…, v0+…+vp-1>

26/28

Benchmarks

Size of the polynomials

Tim

e (s

)

Direct method (BSML+MPI)D-a-C method with superposition

D-a-C method with juxtaposition

27/28

Conclusion and future

works

28/28

Conclusion

BSML=BSP+ML

Superposition = primitive of parallel composition

Small-step semantics of the superposition

Distributed semantics as small one

Superposition implemented using threads as in the small-step semantics

29/28

Future works Implementation using continuation (transformation of source’s code with the help of a type checker) and proof of equivalence using our semantics

Implentation of bigger algorithms for better benchmarks of BSML and its superposition

Implementation of parallel skeletons (management of tasks) using the superposition ?

BSP model-checking of high-level Petri-nets (M-nets). The main difficult : find a non-trivial algorithm as the community of concurrent programming does. Possible but need more theoretical optimisations…

30/28

Thanks foryour attention