dataflow: declarative concurrency in ruby
DESCRIPTION
While Ruby is known for its flexibility due to high mutability and meta-programming capability, these features make writing thread-safe programs using manual locking very error-prone. For this reason some people are switching to languages with easier to manage concurrency paradigms, such as Erlang/Scala’s message passing, or Clojure/Haskell’s Software Transactional Memory (STM).

This talk is about Dataflow, a pure Ruby gem that adds dataflow variables to the Ruby language. Dataflow variables are write-once (or write multiple times with the same value), and suspend execution in the current thread/context if called before being assigned/bound. We will explore how this technique makes writing concurrent but thread-safe code easy, even making it possible to write tests that spawn threads without needing to worry.

Declarative concurrency is a relatively unknown programming model that is an alternative to message passing and STM. Ruby’s malleability makes it an ideal host for this model. Besides performance implications, dataflow variables also have an important impact on declarative program modeling. The talk will also go over the differences in performance and memory of the library in various Ruby implementations.

TRANSCRIPT
Dataflow
The declarative concurrent programming model
Larry Diehl
{:larrytheliquid => %w[.com github twitter]}
Outline
Purpose of presentation
Gradual explanation of concepts
Helpful tips
Purpose
Lexical Scope
foo = :foo
define_method :foo do
  foo
end
Dynamic Scope
def foo
  @foo
end
Mutability
def initialize
  @foo = :foo
end

def foo
  @foo
end
Mutability
def foo
  @foo = :foo
  @foo
end
Mutability+Concurrency
def initialize
  Thread.new { loop { @foo = :shazbot } }
end

def foo
  @foo = :foo
  @foo
end
The Declarative Model
Declarative Synchronous
my_var = :bound
my_var = :rebind # NOT ALLOWED!
Declarative Synchronous
local do |my_var|
  my_var.object_id # thread sleeps
end
Declarative Synchronous
local do |my_var|
  unify my_var, :bound
  unify my_var, :rebind
  # =>
  # Dataflow::UnificationError,
  # ":bound != :rebind"
end
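The write-once rule on this slide can be approximated in plain Ruby. The following is only a rough, single-threaded sketch of the idea: SketchVariable and SketchUnificationError are hypothetical names invented here, and nothing about it reflects how the gem is actually implemented.

```ruby
# Hypothetical stand-in for a dataflow variable; NOT the gem's implementation.
class SketchUnificationError < StandardError; end

class SketchVariable
  UNBOUND = Object.new # private marker so that nil can still be a bound value

  def initialize
    @value = UNBOUND
  end

  def bound?
    !@value.equal?(UNBOUND)
  end

  # Binds on the first call; later calls must agree with the stored value.
  def unify(value)
    if bound?
      unless @value == value
        raise SketchUnificationError, "#{@value.inspect} != #{value.inspect}"
      end
    else
      @value = value
    end
    @value
  end
end

my_var = SketchVariable.new
my_var.unify(:bound)
my_var.unify(:bound) # same value again: allowed

begin
  my_var.unify(:rebind)
rescue SketchUnificationError => e
  puts e.message
end
```

This sketch leaves out the "thread sleeps" behaviour of reading an unbound variable; it only shows why a second, different binding is rejected.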
Declarative Synchronous
class MyClass
  declare :my_var

  def initialize
    unify my_var, :bound
  end
end
Declarative Concurrent (MAGIC)
Declarative Concurrent
local do |my_var|
  Thread.new { unify my_var, :bound }
  my_var.should == :bound
end
Dependency Resolution
local do |sentence, middle, tail|
  Thread.new { unify middle, "base are belong #{tail}" }
  Thread.new { unify tail, "to us" }
  Thread.new { unify sentence, "all your #{middle}" }
  sentence.should == "all your base are belong to us"
end
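To see why the thread start order on this slide does not matter, here is a rough sketch using only Ruby's standard Mutex and ConditionVariable. BlockingVariable and resolve_sentence are hypothetical names made up for illustration, and the sketch needs an explicit .value call where the slide uses the variable directly.

```ruby
# Hypothetical blocking write-once variable built on Mutex + ConditionVariable.
# It only illustrates the idea; it is not the Dataflow gem's implementation.
class BlockingVariable
  def initialize
    @mutex = Mutex.new
    @bound_signal = ConditionVariable.new
    @bound = false
    @value = nil
  end

  def bind(value)
    @mutex.synchronize do
      if @bound && @value != value
        raise ArgumentError, "already bound to #{@value.inspect}"
      end
      @value = value
      @bound = true
      @bound_signal.broadcast # wake every thread waiting in #value
    end
    value
  end

  # Sleeps the calling thread until some thread has called #bind.
  def value
    @mutex.synchronize do
      @bound_signal.wait(@mutex) until @bound
      @value
    end
  end
end

def resolve_sentence
  sentence, middle, tail = Array.new(3) { BlockingVariable.new }
  threads = [
    Thread.new { middle.bind("base are belong #{tail.value}") },
    Thread.new { tail.bind("to us") },
    Thread.new { sentence.bind("all your #{middle.value}") }
  ]
  result = sentence.value # blocks until the whole chain has been bound
  threads.each(&:join)
  result
end

puts resolve_sentence
```

Each reader simply waits for the value it depends on, so the dependency order is resolved by the data itself rather than by the order in which threads are created.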
Asynchronous Output
def Worker.async(output=nil)
  Thread.new do
    result = # do hard work
    unify output, result if output
  end
end

local do |output|
  Worker.async(output)
  output.should == # hard work result
end
Asynchronous Output
local do |output|
  flow(output) do
    # do hard work
  end
  output.should == # hard work result
end
Anonymous variables
{
  'google.com' => Dataflow::Variable.new,
  'bing.com' => Dataflow::Variable.new
}.map do |domain, var|
  Thread.new do
    unify var, open("http://#{domain}").read
  end
  var
end
need_later
%w[google.com bing.com].map do |domain|
  need_later { open("http://#{domain}").read }
end
Chunked Sequential Processing
(1..100).each_slice(10).map do |chunk|
  sleep(1)
  chunk.inject(&:+)
end.inject(&:+) # => ~10s
Chunked Parallel Processing
(1..100).each_slice(10).map do |chunk|
  need_later do
    sleep(1)
    chunk.inject(&:+)
  end
end.inject(&:+) # => ~1s
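For comparison, roughly the same chunked parallel shape can be written with plain threads and Thread#value, which blocks until the thread has finished and then returns its result. This is only a sketch of the pattern, not of need_later itself; parallel_chunk_sum and its delay parameter are hypothetical names chosen for the example.

```ruby
# Sums each chunk in its own thread, then adds up the per-chunk results.
# parallel_chunk_sum is a hypothetical helper, not part of the Dataflow gem.
def parallel_chunk_sum(range, chunk_size, delay)
  futures = range.each_slice(chunk_size).map do |chunk|
    Thread.new do
      sleep(delay)      # stands in for slow work, as sleep(1) does on the slide
      chunk.inject(:+)
    end
  end
  # Thread#value waits for each thread and returns its block's result.
  futures.map(&:value).inject(:+)
end

puts parallel_chunk_sum(1..100, 10, 0.1)
```

Because the chunks sleep concurrently, the total wait is close to one delay rather than ten. On implementations with a global interpreter lock this helps for waiting (sleep, I/O) but not for CPU-bound work.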
Leaving Declarative via Async
Ports & Streams
local do |port, stream|
  unify port, Dataflow::Port.new(stream)
  port.send 1
  port.send 2
  stream.take(2).should == [1, 2]
end
Ports & Streams (async)
local do |port, stream|
  unify port, Dataflow::Port.new(stream)
  Thread.new do
    stream.each do |message|
      puts "received: #{message}"
    end
  end
  %w[x y z].each do |letter|
    Thread.new { port.send letter }
  end
  stream.take(3).sort.should == %w[x y z]
end
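The many-senders, one-receiver part of this slide can be loosely imitated with Ruby's built-in Queue. This is only an analogy under that assumption: collect_messages is a hypothetical helper, and a Queue hands each message to a single consumer, whereas the stream on the slide is read both by the printing thread and by take.

```ruby
require 'thread' # Queue; already loaded by default on modern Rubies

# Starts one sender thread per message and collects everything that arrives.
# collect_messages is a hypothetical helper, not part of the Dataflow gem.
def collect_messages(messages)
  port = Queue.new
  senders = messages.map do |message|
    Thread.new { port << message }
  end
  # Queue#pop blocks until a message is available, so arrival order may vary.
  received = Array.new(messages.size) { port.pop }
  senders.each(&:join)
  received
end

puts collect_messages(%w[x y z]).sort.inspect
```

As on the slide, the result is sorted before comparing, because the senders run concurrently and the arrival order is not fixed.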
FutureQueue
local do |queue, first, second, third|
  unify queue, FutureQueue.new
  queue.pop first
  queue.pop second
  queue.push 1
  queue.push 2
  queue.push 3
  queue.pop third
  [first, second, third].should == [1, 2, 3]
end
Actors
Ping = Actor.new {
  3.times {
    case receive
    when :ping
      puts "Ping"
      Pong.send :pong
    end
  }
}

Pong = Actor.new {
  3.times {
    case receive
    when :pong
      puts "Pong"
      Ping.send :ping
    end
  }
}

Ping.send :ping
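The ping/pong exchange can be mimicked with two threads that each own a Queue as a mailbox. This is a hedged sketch of the message flow only; ping_pong and the mailbox names are hypothetical, and the gem's Actor class is not used or modelled here.

```ruby
require 'thread' # Queue; already loaded by default on modern Rubies

# Runs a ping/pong exchange for the given number of rounds and returns the
# log of what each side "printed". Hypothetical helper, not the gem's Actor.
def ping_pong(rounds)
  ping_box = Queue.new
  pong_box = Queue.new
  log = Queue.new

  ping = Thread.new do
    rounds.times do
      case ping_box.pop # blocks like `receive` on the slide
      when :ping
        log << "Ping"
        pong_box << :pong
      end
    end
  end

  pong = Thread.new do
    rounds.times do
      case pong_box.pop
      when :pong
        log << "Pong"
        ping_box << :ping
      end
    end
  end

  ping_box << :ping # kick off the exchange
  [ping, pong].each(&:join)
  Array.new(log.size) { log.pop }
end

puts ping_pong(3)
```

Each side only acts after receiving a message from the other, so the log alternates strictly between "Ping" and "Pong".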
by_need
def baz(num)
  might_get_used = by_need { Factory.gen }
  might_get_used.value if num%2 == 0
end
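The idea behind by_need, computing a value only if something actually asks for it, can be sketched with a small memoizing wrapper. LazyValue is a hypothetical class written for this illustration and is not how the gem implements by_need.

```ruby
# Runs its block at most once, and only when #value is first called.
# LazyValue is a hypothetical illustration, not the gem's by_need.
class LazyValue
  def initialize(&block)
    @block = block
    @mutex = Mutex.new
    @computed = false
    @value = nil
  end

  def value
    @mutex.synchronize do
      unless @computed
        @value = @block.call
        @computed = true
      end
      @value
    end
  end
end

calls = 0
might_get_used = LazyValue.new { calls += 1; :generated }

puts calls                 # the block has not run yet
puts might_get_used.value
puts might_get_used.value  # reuses the stored result
puts calls
```

The mutex keeps two threads from running the block at the same time, so the work happens once even under concurrent reads.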
Tips
Modular
local do |my_var|
  Thread.new { unify my_var, :bound }
  # my_var.wait
  my_var.should == :bound
end
Debugging
local do |my_var|
  my_var.inspect # => #<Dataflow::Variable:2637860 unbound>
end
Class/Module methods
Dataflow.local do |my_var|
  Dataflow.async do
    Dataflow.unify my_var, :bound
  end
  my_var.should == :bound
end
Use Cases
general purpose
concurrency for elegant program structure with respect to coordination
concurrency to make use of extra processors/cores (depending on Ruby implementation)
web development
worker daemons
concurrently munging together data from various REST APIs
Ruby Implementations
Pure Ruby library, should work on any implementation
JRuby in particular has a great GC, no GIL, native threads, and a tunable threadpool option.
Rubinius has more code written in Ruby, so it proxies more method calls (e.g. Array#flatten).
class FutureQueue
  include Dataflow
  declare :push_port, :pop_port

  def initialize
    local do |pushed, popped|
      unify push_port, Dataflow::Port.new(pushed)
      unify pop_port, Dataflow::Port.new(popped)

      Thread.new {
        loop do
          barrier pushed.head, popped.head
          unify popped.head, pushed.head
          pushed, popped = pushed.tail, popped.tail
        end
      }
    end
  end

  def push(x)
    push_port.send x
  end

  def pop(x)
    pop_port.send x
  end
end
The End
sudo port install dataflow
http://github.com/larrytheliquid/dataflow
freenode: #dataflow-gem