Concurrent & Multicore OCaml: A deep dive
KC Sivaramakrishnan1 & Stephen Dolan1
Leo White2, Jeremy Yallop1,3, Armaël Guéneau4, Anil Madhavapeddy1,3
1 2
3 4
Concurrency ≠ Parallelism• Concurrency
• Programming technique • Overlapped execution of processes
• Parallelism • (Extreme) Performance hack • Simultaneous execution of computations
Concurrency ≠ Parallelism• Concurrency
• Programming technique • Overlapped execution of processes
• Parallelism • (Extreme) Performance hack • Simultaneous execution of computations
Concurrency ∩ Parallelism ➔ Scalable Concurrency
Concurrency ≠ Parallelism• Concurrency
• Programming technique • Overlapped execution of processes
• Parallelism • (Extreme) Performance hack • Simultaneous execution of computations
Concurrency ∩ Parallelism ➔ Scalable Concurrency(Fibers) (Domains)
Schedulers• Multiplexing fibers over domain(s)
• Bake scheduler into the runtime system (GHC)
Schedulers• Multiplexing fibers over domain(s)
• Bake scheduler into the runtime system (GHC)
• Allow programmers to describe schedulers! • Parallel search —> LIFO work-stealing • Web-server —> FIFO runqueue • Data parallel —> Gang scheduling
Schedulers• Multiplexing fibers over domain(s)
• Bake scheduler into the runtime system (GHC)
• Allow programmers to describe schedulers! • Parallel search —> LIFO work-stealing • Web-server —> FIFO runqueue • Data parallel —> Gang scheduling
• Algebraic Effects and Handlers
Algebraic effects & handlers
• Programming and reasoning about computational effects in a pure setting. • Cf. Monads
Algebraic effects & handlers
• Programming and reasoning about computational effects in a pure setting. • Cf. Monads
• Eff — http://www.eff-lang.org/
Algebraic effects & handlers
Algebraic Effects: Example
exception Foo of int
let f () = 1 + (raise (Foo 3))
let r = try f () with Foo i -> i + 1
Algebraic Effects: Example
exception Foo of int
let f () = 1 + (raise (Foo 3))
let r = try f () with Foo i -> i + 1
Algebraic Effects: Example
exception Foo of int
let f () = 1 + (raise (Foo 3))
let r = try f () with Foo i -> i + 1
val r : int = 4
Algebraic Effects: Example
exception Foo of int
let f () = 1 + (raise (Foo 3))
let r = try f () with Foo i -> i + 1
val r : int = 4
effect Foo : int -> int
let f () = 1 + (perform (Foo 3))
let r = try f () with effect (Foo i) k -> continue k (i + 1)
Algebraic Effects: Example
exception Foo of int
let f () = 1 + (raise (Foo 3))
let r = try f () with Foo i -> i + 1
val r : int = 4
effect Foo : int -> int
let f () = 1 + (perform (Foo 3))
let r = try f () with effect (Foo i) k -> continue k (i + 1)
Algebraic Effects: Example
exception Foo of int
let f () = 1 + (raise (Foo 3))
let r = try f () with Foo i -> i + 1
val r : int = 4
effect Foo : int -> int
let f () = 1 + (perform (Foo 3))
let r = try f () with effect (Foo i) k -> continue k (i + 1)
Algebraic Effects: Example
exception Foo of int
let f () = 1 + (raise (Foo 3))
let r = try f () with Foo i -> i + 1
val r : int = 4
effect Foo : int -> int
let f () = 1 + (perform (Foo 3)) 4
let r = try f () with effect (Foo i) k -> continue k (i + 1)
Algebraic Effects: Example
exception Foo of int
let f () = 1 + (raise (Foo 3))
let r = try f () with Foo i -> i + 1
val r : int = 4
effect Foo : int -> int
let f () = 1 + (perform (Foo 3)) 4
let r = try f () with effect (Foo i) k -> continue k (i + 1)
val r : int = 5
Algebraic Effects: Example
exception Foo of int
let f () = 1 + (raise (Foo 3))
let r = try f () with Foo i -> i + 1
val r : int = 4
effect Foo : int -> int
let f () = 1 + (perform (Foo 3)) 4
let r = try f () with effect (Foo i) k -> continue k (i + 1)
val r : int = 5
fiber — lightweight stack
Scheduler Demo1
[1] https://github.com/kayceesrk/ocaml15-eff/tree/master/chameneos-redux
• Fibers: Heap allocated, dynamically resized stacks
• ~10s of bytes
• No unnecessary closure allocation costs unlike CPS
Implementation
• Fibers: Heap allocated, dynamically resized stacks
• ~10s of bytes
• No unnecessary closure allocation costs unlike CPS
• One-shot delimited continuations
• Simplifies reasoning about resources - sockets, locks, etc.
Implementation
• Fibers: Heap allocated, dynamically resized stacks
• ~10s of bytes
• No unnecessary closure allocation costs unlike CPS
• One-shot delimited continuations
• Simplifies reasoning about resources - sockets, locks, etc.
• Handlers —> Linked-list of fibers
Implementation
• Fibers: Heap allocated, dynamically resized stacks
• ~10s of bytes
• No unnecessary closure allocation costs unlike CPS
• One-shot delimited continuations
• Simplifies reasoning about resources - sockets, locks, etc.
• Handlers —> Linked-list of fibers
Implementation
handle / continue
handler
sp call chainreference
Implementation
handle / continue
handle / continue
sp
handler
call chainreference
• Fibers: Heap allocated, dynamically resized stacks
• ~10s of bytes
• No unnecessary closure allocation costs unlike CPS
• One-shot delimited continuations
• Simplifies reasoning about resources - sockets, locks, etc.
• Handlers —> Linked-list of fibers
perform
sphandle / continue
Implementation
handler
call chainreference
• Fibers: Heap allocated, dynamically resized stacks
• ~10s of bytes
• No unnecessary closure allocation costs unlike CPS
• One-shot delimited continuations
• Simplifies reasoning about resources - sockets, locks, etc.
• Handlers —> Linked-list of fibers
Native-code fibers — Vanilla
C
Native-code fibers — Vanilla
OCaml start programC
OCaml
Native-code fibers — Vanilla
OCaml start program
C call
C
OCaml
C
Native-code fibers — Vanilla
OCaml start program
C call
OCaml callback
C
OCaml
C
OCaml
Native-code fibers — Vanilla
OCaml start program
C call
OCaml callback
C call
C
OCaml
C
OCaml
C
Native-code fibers — Vanilla
OCaml start program
C call
OCaml callback
C call
OCaml callback
C
OCaml
C
OCaml
C
OCaml
C
system stack
Native-code fibers — Effects
C
system stack
Native-code fibers — Effects
OCaml heap
OCaml start program
C
system stack
Native-code fibers — Effects
OCaml heap
OCaml start programhandle
C
system stack
Native-code fibers — Effects
OCaml heap
OCaml start program
C call
handle
C
C
system stack
Native-code fibers — Effects
OCaml heap
OCaml start program
C call
handle
OCaml callback
C
C
system stack
Native-code fibers — Effects
OCaml heap
OCaml start program
C call
handle
OCaml callback
C call
C
C
C
system stack
Native-code fibers — Effects
OCaml heap
OCaml start program
C call
handle
OCaml callback
C call
C
C
1. Stack overflow checks for OCaml functions • Simple static analysis eliminates many checks
C
system stack
Native-code fibers — Effects
OCaml heap
OCaml start program
C call
handle
OCaml callback
C call
C
C
1. Stack overflow checks for OCaml functions • Simple static analysis eliminates many checks
2. FFI calls are more expensive due to stack switching • Specialise for calls which {allocate / pass arguments on stack / do neither}
Performance : Vanilla OCaml
0
0.25
0.5
0.75
1
almabench
alt-ergo-parameter_smallest_divisor
alt-ergo-carte_autorisee_3
alt-ergo-parameter_relabel
alt-ergo-OBF
__ggjj_2
alt-ergo-parameter_def
alt-ergo-parameter_def
alt-ergo-OBF
__yyll_1
alt-ergo-bbvv_351
alt-ergo-controler_carte_13
alt-ergo-div2_sub
alt-ergo-parameter_def
alt-ergo-ccgg_2055
alt-ergo-parameter_def
alt-ergo-parameter_inverse_in_place
alt-ergo-ccgg_1759
alt-ergo-parameter_def
alt-ergo-induction_step
alt-ergo-ccgg_1618
alt-ergo-ccgg_219
alt-ergo-advance_automaton_25
alt-ergo-nsec_sum
_higher_than_1s
alt-ergo-fill_assert_39_Alt-Ergo bdd
cham
eneos-async
cham
eneos-lwt
cohttp-lw
tcore_m
icro
cpdf-merge
cpdf-reformat
cpdf-squeeze
cpdf-transform
frama-c-deflate
frama-c-idct
js_of_ocam
ljsontrip-sample kb
kb-no-exc
lexifi-g2pp
menhir-fancy
menhir-sql
menhir-standard
minilight
numal-durand-kerner-aberth
numal-fft
numal-k-means
numal-levinson-durbin
numal-lu-decom
position
numal-naive-multilayer
numal-qr-decom
position
numal-rnd_access
numal-simple_access
patdiff
sauvola-contrast
sequence
sequence-cps
setrip
setrip-sm
allbuf
thread-ring-async-pipe
thread-ring-lw
t-mvar
thread-ring-lw
t-stream
thread-sleep-async
thread-sleep-lw
tvalet-async
valet-lwt
ydum
p-sample
4.02.2+effects 4.02.2+vanilla
Normalised time (lower is better)
Performance : Vanilla OCaml
0
0.25
0.5
0.75
1
almabench
alt-ergo-parameter_smallest_divisor
alt-ergo-carte_autorisee_3
alt-ergo-parameter_relabel
alt-ergo-OBF
__ggjj_2
alt-ergo-parameter_def
alt-ergo-parameter_def
alt-ergo-OBF
__yyll_1
alt-ergo-bbvv_351
alt-ergo-controler_carte_13
alt-ergo-div2_sub
alt-ergo-parameter_def
alt-ergo-ccgg_2055
alt-ergo-parameter_def
alt-ergo-parameter_inverse_in_place
alt-ergo-ccgg_1759
alt-ergo-parameter_def
alt-ergo-induction_step
alt-ergo-ccgg_1618
alt-ergo-ccgg_219
alt-ergo-advance_automaton_25
alt-ergo-nsec_sum
_higher_than_1s
alt-ergo-fill_assert_39_Alt-Ergo bdd
cham
eneos-async
cham
eneos-lwt
cohttp-lw
tcore_m
icro
cpdf-merge
cpdf-reformat
cpdf-squeeze
cpdf-transform
frama-c-deflate
frama-c-idct
js_of_ocam
ljsontrip-sample kb
kb-no-exc
lexifi-g2pp
menhir-fancy
menhir-sql
menhir-standard
minilight
numal-durand-kerner-aberth
numal-fft
numal-k-means
numal-levinson-durbin
numal-lu-decom
position
numal-naive-multilayer
numal-qr-decom
position
numal-rnd_access
numal-simple_access
patdiff
sauvola-contrast
sequence
sequence-cps
setrip
setrip-sm
allbuf
thread-ring-async-pipe
thread-ring-lw
t-mvar
thread-ring-lw
t-stream
thread-sleep-async
thread-sleep-lw
tvalet-async
valet-lwt
ydum
p-sample
4.02.2+effects 4.02.2+vanilla
Normalised time (lower is better)
4.02.2+effects ~5.4% slower
Performance : Chameneos-ReduxTi
me
(S)
0
0.45
0.9
1.35
1.8
Iterations (X100,000)
1 2 3 4 5 6 7 8 9 10
Lwt Concurrency Monad GHC Fibers
Generator from Iterator1
[1] https://github.com/kayceesrk/ocaml15-eff/blob/master/generator.ml
let rec iter f = function | Leaf -> () | Node (l, x, r) -> iter f l; f x; iter f r
type 'a t = | Leaf | Node of 'a t * 'a * 'a t
Generator from Iterator1
[1] https://github.com/kayceesrk/ocaml15-eff/blob/master/generator.ml
(* val to_gen : 'a t -> (unit -> 'a option) *) let to_gen (type a) (t : a t) = let module M = struct effect Next : a -> unit end in let open M in let step = ref (fun () -> assert false) in let first_step () = try iter (fun x -> perform (Next x)) t; None with effect (Next v) k -> step := continue k; Some v in step := first_step; fun () -> !step ()
let rec iter f = function | Leaf -> () | Node (l, x, r) -> iter f l; f x; iter f r
type 'a t = | Leaf | Node of 'a t * 'a * 'a t
Performance : GeneratorTi
me
(S)
0
1
2
3
4
Binary tree depth
15 16 17 18 19 20 21 22 23 24 25
Iterator Fiber Generator H/W Generator
Async I/O in direct style1
[1] https://github.com/kayceesrk/ocaml15-eff/tree/master/async-io
Async I/O in direct style1
Callback Hell
[1] https://github.com/kayceesrk/ocaml15-eff/tree/master/async-io
Javascript backend
• js_of_ocaml • OCaml bytecode —> Javascript
Javascript backend
• js_of_ocaml • OCaml bytecode —> Javascript
• js_of_ocaml compiler pass
• Whole-program selective CPS transformation
Javascript backend
• js_of_ocaml • OCaml bytecode —> Javascript
• js_of_ocaml compiler pass
• Whole-program selective CPS transformation
• Work-in-progress!
• Runs “hello-effects-world”!
fin.
https://github.com/kayceesrk/ocaml-eff-example