bulk-synchronous parallel ml implementation of the parallel superposition
DESCRIPTION
Frédéric Gava. Bulk-Synchronous Parallel ML Implementation of the Parallel Superposition. Background. Implicit. Explicit. BSML. Automatic parallelization. skeletons. Data-parallelism. Parallel extensions. Concurrent programming. Parallel programming. Projects. 2002-2004 - PowerPoint PPT PresentationTRANSCRIPT
2/28
Background
Parallel programming
Implicit Explicit
Data-parallelismParallel
extensionsConcurrent
programmingAutomatic
parallelizationskeletons
BSML
3/28
Projects
2002-2004
ACI Grid
LIFO, LACL, PPS, INRIA
Design of parallel and Grid librairies for OCaml.
2004-2007
ACI « Young researchers »
LIFO, LACL
Production of a programming environment in which certified parallel programs can be written and safely executed.
4/28
Outline
I. The BSML language
II. Multi-programming (superposition)
III. Implementation of the superposition
IV. Conclusion and future works
6/28
The BSML « spirite »Bugs grow faster than Moore’s law. (G. Berry) High-level language lines of code number of bugd
Certified library number of bugs
Small is beautiful. (R. H. Bisseling) BSML only use 5 primitives…
Who would drive a non-deterministic car ? (G. Berry) Propriety of confluence of the semantic of BSML
French Proverb : « All the roads go to Roma » But the better way is to choose the shorter
One can give BSP costs to BSML programs
Different of concurrent programming : cost and confluence
7/28
Characterized by: p Number of processors r Processors speed L Global synchronization g Phase of communication (1 word at most
sent of received by each processor)
BSP architecture:
The BSP model
P/M P/M P/M P/M P/M
Network
Unit of synchronization
8/28
wi
Model of execution
Su
per
-ste
p i+
1
ghi
L
wi+
1ghi+
1L
Su
per
-ste
p i
Beginning of the super-step iLocal computing on
each processor
Global (collective) communications
between processors
Global synchronization : exchanged data
available for the next super-step
Cost(i) = (max0x<p wx
i) + hig + L
9/28
Example : broadcast Direct broadcast (one super-step):
BSP cost = png + L Broadcast with 2 super-steps:
BSP cost = ng + 2L
10/28
-calculus ML
BS-calculus
Parallelconstructions
BSML
Parallelprimitives
Structured parallelism as an explicit parallel extension of ML
Functional language with BSP cost predictions
Allows the implementation of skeletons
Implemented as a parallel library for the "Objective Caml" language
Using a parallel data structure called parallel vector
The BSML language
12/28
Parallel primitives of BSML
Asynchronous primitives: Creation of a vector (creation of local values)
mkpar : (int )par Parallel point-wize application
apply : ( ) par par par
Synchronous and communications primitives: Communications
put : (int) par (int) par Projection of local values (to be replicated)
proj : par (int)
13/28
Semantics
Natural semantics
Small-steps semantics
Distributed semantics
Programming modelEasy for proofs (Coq)
Easy for costs
Execution modelMake asynchronous steps
appearClose to a real implemantation
14/28
• Semantics = set of axioms and inference rules• Easy to understand, makes proofs more easy• Example:
Natural semantics
15/28
• Semantics = set of rewriting rules• Using contexts for the strategy• Easier understanding of costs and errors• Example:
Small steps semantics
Global cost
Local costs
16/28Distributed evaluation
scan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid op vec) (fun()->scan'(mid+1) lst op vec))in let com = ...(* send wm to processes m+1…p+1 *) let op’ = ...(* applies op to wm and wi, m<i<p *) in parfun2 op’ com vec’ in scan' 0 (bsp_p()-1) op vecscan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid op vec) (fun()->scan'(mid+1) lst op vec))in let com = ...(* send wm to processes m+1…p+1 *) let op’ = ...(* applies op to wm and wi, m<i<p *) in parfun2 op’ com vec’ in scan' 0 (bsp_p()-1) op vec scan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid
Prog scan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid op vec) (fun()->scan'(mid+1) lst op vec))in let com = ...(* send wm to processes m+1…p+1 *) let op’ = ...(* applies op to wm and wi, m<i<p *) in parfun2 op’ com vec’ in scan' 0 (bsp_p()-1) op vecscan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid op vec) (fun()->scan'(mid+1) lst op vec))in let com = ...(* send wm to processes m+1…p+1 *) let op’ = ...(* applies op to wm and wi, m<i<p *) in parfun2 op’ com vec’ in scan' 0 (bsp_p()-1) op vec scan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid
Prog scan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid op vec) (fun()->scan'(mid+1) lst op vec))in let com = ...(* send wm to processes m+1…p+1 *) let op’ = ...(* applies op to wm and wi, m<i<p *) in parfun2 op’ com vec’ in scan' 0 (bsp_p()-1) op vecscan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid op vec) (fun()->scan'(mid+1) lst op vec))in let com = ...(* send wm to processes m+1…p+1 *) let op’ = ...(* applies op to wm and wi, m<i<p *) in parfun2 op’ com vec’ in scan' 0 (bsp_p()-1) op vec scan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid
Prog
Distributed semantics Semantics = set of parallel rewriting rules SPMD style:
Natural
scan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid op vec) (fun()->scan'(mid+1) lst op vec))in let com = ...(* send wm to processes m+1…p+1 *) let op’ = ...(* applies op to wm and wi, m<i<p *) in parfun2 op’ com vec’ in scan' 0 (bsp_p()-1) op vecscan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid op vec) (fun()->scan'(mid+1) lst op vec))in let com = ...(* send wm to processes m+1…p+1 *) let op’ = ...(* applies op to wm and wi, m<i<p *) in parfun2 op’ com vec’ in scan' 0 (bsp_p()-1) op vec scan op vec = let rec scan' fst lst op vec = if fst>=lst then vec else let mid=(fst+lst)/2 in let vec'= mix mid (super (fun()->scan' fst mid
Prog
Parallelvector
Parts of the parallel vector
18/28
Several programs on the same machine
Primitive of parallel composition: Superposition
Divide-and-conquer BSP algorithms
Parallel composition
19/28
super : (unit (unit )
super E1 E2 (E1 (), E2())
Fusion of communications/synchronisations using super-threads
Keep the BSP model
Pure functional semantics
Parallel Superposition
20/28
Communications
Synchronization
Communications
Synchronization
Communications
Synchronization
Communications
Synchronization
Communications
Synchronization
E1 E2 super E1 E2
0 1 20 . . .1 . . .2 . . .
0 1 20 . . .1 . . .2 . . .
0 1 20 . . .1 . . .2 . . .
Parallel Superposition
24/28
Semantics based implementation The semantics makes appear 3 low level primitives :
Send to send the data of the environment of communication
Rcv to received them
Wait to allow a super-thread to wait his brother
BSML primitives are thus simple calls of them (as in the small-steps semantics)
Super-threads could be implemented using threads
A scheduler of this threads is thus need for the special management of our super-threads
The environment of communications is just a Hashtable with pid of super-threads as keys
25/28
Example, prefixes calculus
scan : ( par par
scan (+) <v0, …, vp-1>
= <v0, v0+v1, …, v0+v1+…+ vp-1>
scan (+) <v0, …, vm, …>= < w0 , … , wm , … >
scan (+) <… ,vm+1, …, vp-1>=<…, wm+1 , … , wp+1>
< w0 , … , wm , wm+wm+1, … , wm+wp+1>= <v0, v0+v1, v0+…+vm, v0+…+vm+1,…, v0+…+vp-1>
26/28
Benchmarks
Size of the polynomials
Tim
e (s
)
Direct method (BSML+MPI)D-a-C method with superposition
D-a-C method with juxtaposition
28/28
Conclusion
BSML=BSP+ML
Superposition = primitive of parallel composition
Small-step semantics of the superposition
Distributed semantics as small one
Superposition implemented using threads as in the small-step semantics
29/28
Future works Implementation using continuation (transformation of source’s code with the help of a type checker) and proof of equivalence using our semantics
Implentation of bigger algorithms for better benchmarks of BSML and its superposition
Implementation of parallel skeletons (management of tasks) using the superposition ?
BSP model-checking of high-level Petri-nets (M-nets). The main difficult : find a non-trivial algorithm as the community of concurrent programming does. Possible but need more theoretical optimisations…