David Chase
September, 2008
ProjectFortress: Run your whiteboard, in parallel, on the JVM
Fortress is
• Originally designed for high-performance-productivity computing as part of Sun’s DARPA HPCS contract.
• Goals
  > scales onto N cores, for 0 ≤ log(N) < 6
  > runs fast
  > uncluttered syntax, following familiar mathematical conventions whenever possible
  > extensible/growable: defined in libraries wherever possible
• Features (relevant to JVM hosting)
  > Transactions
  > Infested with parallelism
  > Generic (in T, n, and opr) functions, traits, objects
  > Multiple dispatch (dynamic, not static)
Current status
• Parsed to ASTs.
• Statically checked and transformed into a more runnable form.
• Some compilation to bytecodes; mostly interpreted. Compiler is coming soon, incrementally.
• Interpreter is
  > multi-threaded
  > somewhat scalable (we test on 64 threads)
  > supports work stealing and transactions
  > doing LOTS of type fakery behind the curtain
• Library story is mostly honest, and getting more so.
Motivation
Q: How do we make it pretty, parallel, and “growable”?
A: Types, type inference, dynamic overloading, workstealing.
Many opportunities for parallelism
[Figure: “Run Your Whiteboard in Parallel!” (ProjectFortress.sun.com), a whiteboard of formulas such as v_norm = v/‖v‖, ∑[k←1:n] a_k x^k, and y = 3x sin x cos 2x log log x.]
Aggregate objects and reductions drive parallelism with “generators”.
Operands are evaluated in parallel.
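As a rough Java analogue of “operands are evaluated in parallel”, the sketch below evaluates the four factors of y = 3x sin x cos 2x log log x as separate asynchronous tasks and then multiplies the results. It only illustrates the semantics: ParallelOperands is a hypothetical name, CompletableFuture stands in for the Fortress runtime, and for operands this cheap the task overhead would swamp any gain.

```java
import java.util.concurrent.CompletableFuture;

// Illustrative sketch: each operand of the product is its own asynchronous task
// (by default on the common ForkJoinPool); the multiplication waits for all four.
class ParallelOperands {
    static double y(double x) {
        CompletableFuture<Double> threeX = CompletableFuture.supplyAsync(() -> 3 * x);
        CompletableFuture<Double> sinX = CompletableFuture.supplyAsync(() -> Math.sin(x));
        CompletableFuture<Double> cos2X = CompletableFuture.supplyAsync(() -> Math.cos(2 * x));
        CompletableFuture<Double> logLogX = CompletableFuture.supplyAsync(() -> Math.log(Math.log(x)));
        // Same left-to-right multiplication order as the serial expression below.
        return threeX.join() * sinX.join() * cos2X.join() * logLogX.join();
    }

    // Serial reference version of the same expression.
    static double ySerial(double x) {
        return 3 * x * Math.sin(x) * Math.cos(2 * x) * Math.log(Math.log(x));
    }
}
```

Because each factor is independent, the order in which the tasks finish does not affect the result; only the final combination is ordered.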
Desugaring generators
∑[i←a, j←b, p, k←c] e               becomes  ∑(g)
⟨ e | i←a, j←b, p, k←c ⟩            becomes  ⟨g⟩
for i←a, j←b, p, k←c do e end       becomes  forLoop(g)
where
  g = (fn singleton ⇒ (a).join(b).generate(fn (i, j) ⇒
         (p).generate(fn () ⇒
           (c).generate(fn (k) ⇒ singleton(e)))))
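The desugaring can be mimicked in Java to see how the nested generate calls compose. This is a sequential sketch under stated assumptions: Monoid, Generator, range, and guard are hypothetical names, the reduction is passed in directly instead of being abstracted as singleton, join is modelled as plain nesting (a cross product), and a Boolean filter p acts as a generator of zero or one elements.

```java
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;

// Sketch only: not the Fortress library API.
class GeneratorDesugar {
    // A reduction: identity element plus an associative combining operator.
    static final class Monoid<R> {
        final R identity;
        final BinaryOperator<R> op;
        Monoid(R identity, BinaryOperator<R> op) { this.identity = identity; this.op = op; }
    }

    // A generator reduces body(x) over every element x it produces.
    interface Generator<T> {
        <R> R generate(Monoid<R> m, Function<T, R> body);
    }

    // Inclusive integer range lo:hi, reduced sequentially in this sketch.
    static Generator<Integer> range(int lo, int hi) {
        return new Generator<Integer>() {
            @Override
            public <R> R generate(Monoid<R> m, Function<Integer, R> body) {
                R r = m.identity;
                for (int i = lo; i <= hi; i++) {
                    r = m.op.apply(r, body.apply(i));
                }
                return r;
            }
        };
    }

    // A filter p generates one "unit" element if true and nothing if false.
    static <R> R guard(boolean p, Monoid<R> m, Supplier<R> body) {
        return p ? body.get() : m.identity;
    }

    // ∑[i←1:3, j←1:3, i<j, k←1:2] i·j·k, written as nested generate calls like g.
    static int example() {
        Monoid<Integer> sum = new Monoid<>(0, Integer::sum);
        Generator<Integer> a = range(1, 3), b = range(1, 3), c = range(1, 2);
        return a.generate(sum, i ->
               b.generate(sum, j ->
               guard(i < j, sum, () ->
               c.generate(sum, k -> i * j * k))));
    }
}
```

Here example() returns 33: the pairs with i < j contribute 2 + 3 + 6 = 11, and each is multiplied by 1 + 2 = 3. Because ∑, comprehensions, and for loops all go through the same generate protocol, swapping in a parallel generator changes all three at once.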
Generator, for a “blocked range”
value object BlockedRange(lo: ℤ64, hi: ℤ64, b: ℤ64) extends Generator⟦ℤ64⟧
  size = hi - lo + 1
  generate⟦R extends Monoid⟦R,⊕⟧⟧(body: ℤ64 → R): R =
    if size ≤ max(b, 1) then
      (* SERIAL LOOP *)
      r: R := coerce(Identity)
      i: ℤ64 := lo
      while i ≤ hi do
        r := r ⊕ body(i)
        i += 1
      end
      r
    else
      (* PARALLEL SPLIT *)
      mid = ⌊(lo + hi) / 2⌋
      BlockedRange(lo, mid, b).generate(body) ⊕ BlockedRange(mid + 1, hi, b).generate(body)
    end
end
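For comparison, here is a sketch of the same serial-loop/parallel-split structure on java.util.concurrent’s fork/join framework, the standardized form of the JSR166y library cited later. It is an illustration under assumptions, not the Fortress runtime: the monoid is passed as an identity value plus a BinaryOperator, and BlockedRangeTask and sumOfSquares are hypothetical names.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.BinaryOperator;
import java.util.function.LongFunction;

class BlockedRangeSum {
    // One task per BlockedRange(lo, hi, b).
    static final class BlockedRangeTask<R> extends RecursiveTask<R> {
        private final long lo, hi, b;          // inclusive bounds, block size
        private final R identity;              // monoid identity
        private final BinaryOperator<R> op;    // assumed associative (the ⊕)
        private final LongFunction<R> body;

        BlockedRangeTask(long lo, long hi, long b, R identity,
                         BinaryOperator<R> op, LongFunction<R> body) {
            this.lo = lo; this.hi = hi; this.b = b;
            this.identity = identity; this.op = op; this.body = body;
        }

        @Override
        protected R compute() {
            long size = hi - lo + 1;
            if (size <= Math.max(b, 1)) {
                // SERIAL LOOP: fold this block from lo to hi.
                R r = identity;
                for (long i = lo; i <= hi; i++) {
                    r = op.apply(r, body.apply(i));
                }
                return r;
            }
            // PARALLEL SPLIT: fork the left half (idle workers may steal it),
            // compute the right half here, then combine in lo..hi order.
            long mid = Math.floorDiv(lo + hi, 2);
            BlockedRangeTask<R> left = new BlockedRangeTask<>(lo, mid, b, identity, op, body);
            BlockedRangeTask<R> right = new BlockedRangeTask<>(mid + 1, hi, b, identity, op, body);
            left.fork();
            R rightResult = right.compute();
            return op.apply(left.join(), rightResult);
        }
    }

    // ∑[i←1:n] i², reduced in blocks of at most `block` elements.
    static long sumOfSquares(long n, long block) {
        return ForkJoinPool.commonPool().invoke(
            new BlockedRangeTask<Long>(1, n, block, 0L, Long::sum, i -> i * i));
    }
}
```

For example, sumOfSquares(1000, 16) returns 333833500. Combining the left result before the right one keeps the answer correct for monoids that are associative but not commutative.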
Generator selection with overloading
value object BlockedRange(lo: ℤ64, hi: ℤ64, b: ℤ64) extends Generator⟦ℤ64⟧
  (* Natural order: lo to hi *)
  size = hi - lo + 1
  generate⟦R extends Monoid⟦R,⊕⟧⟧(body: ℤ64 → R): R = ...
  generate⟦R extends CommutativeMonoid⟦R,⊕⟧⟧(body: ℤ64 → R): R = ...
  generate⟦R extends { Monoid⟦R,⊕⟧, LeftZero⟦R,⊕⟧ }⟧(body: ℤ64 → R): R = ...
  join⟦T⟧(Generator⟦T⟧): Generator⟦(ℤ64,T)⟧
end
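The LeftZero overload is where knowing more about the monoid pays off: if z ⊕ x = z for every x, a left-to-right loop can stop as soon as its running result reaches z. Java cannot overload on such properties, so in this sketch the two variants get different hypothetical names (generate and generateLeftZero), the monoid and its zero are passed explicitly, and a counter shows how many body evaluations each performs for a ∧-reduction, where false is the left zero.

```java
import java.util.function.BinaryOperator;
import java.util.function.LongFunction;

class LeftZeroDemo {
    static long calls = 0;  // number of body evaluations, for illustration only

    // Plain Monoid: every element must be visited.
    static <R> R generate(long lo, long hi, R identity, BinaryOperator<R> op,
                          LongFunction<R> body) {
        R r = identity;
        for (long i = lo; i <= hi; i++) r = op.apply(r, body.apply(i));
        return r;
    }

    // Monoid + LeftZero: once the running result equals z, z ⊕ x = z for
    // every remaining x, so the rest of the range cannot change the answer.
    static <R> R generateLeftZero(long lo, long hi, R identity, BinaryOperator<R> op,
                                  R zero, LongFunction<R> body) {
        R r = identity;
        for (long i = lo; i <= hi && !r.equals(zero); i++) r = op.apply(r, body.apply(i));
        return r;
    }

    static Boolean lessThan5(long i) { calls++; return i < 5; }

    // ∧ over i ← 1:100 of (i < 5); returns how many times the body ran.
    static long callsUsed(boolean useLeftZero) {
        BinaryOperator<Boolean> and = Boolean::logicalAnd;
        LongFunction<Boolean> body = LeftZeroDemo::lessThan5;
        calls = 0;
        Boolean result;
        if (useLeftZero) {
            result = generateLeftZero(1, 100, Boolean.TRUE, and, Boolean.FALSE, body);
        } else {
            result = generate(1, 100, Boolean.TRUE, and, body);
        }
        if (result) throw new IllegalStateException("expected false");
        return calls;
    }
}
```

The plain version evaluates the body 100 times; the left-zero version stops after 5. By the same reasoning, a CommutativeMonoid overload would be free to combine partial results in whatever order tasks finish.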
Work stealing
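[Figure: work stealing. A 1024-element range has been halved repeatedly: Thread A’s work queue holds chunks of 512, 256, 128, and 64 elements and its local state holds the remaining 64. Thread A pushes and pops at one end (push, pop-bottom); Thread B, stealing work, pops from the other end (pop-top).]
Arora, Blumofe, Plaxton, “ABP queues”; Doug Lea, JSR166y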
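The deque discipline in the figure can be sketched in Java. This is a single-lock simplification for illustration, not the lock-free ABP algorithm or the JSR166y code (whose descendant ships as java.util.concurrent.ForkJoinPool): the owner pushes and pops at the bottom, a thief pops at the top, and splitting a 1024-element range down to 64-element blocks reproduces the queue contents shown. WorkStealingDeque, Range, and splitDown are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class WorkStealingDeque {
    // A half-open index range [lo, hi) of pending work.
    static final class Range {
        final int lo, hi;
        Range(int lo, int hi) { this.lo = lo; this.hi = hi; }
        int size() { return hi - lo; }
    }

    // One lock guards both ends here; ABP deques avoid locking instead.
    private final Deque<Range> deque = new ArrayDeque<>();

    // Owner's end ("bottom"): LIFO, so the owner keeps working on the newest, smallest pieces.
    synchronized void push(Range r) { deque.addLast(r); }
    synchronized Range popBottom() { return deque.pollLast(); }

    // Thief's end ("top"): a thief takes the oldest, and therefore largest, piece.
    synchronized Range popTop() { return deque.pollFirst(); }

    // The owner halves its range, deferring the upper half each time,
    // until what remains fits in one block.
    static Range splitDown(WorkStealingDeque q, Range r, int block) {
        while (r.size() > block) {
            int mid = r.lo + r.size() / 2;
            q.push(new Range(mid, r.hi));
            r = new Range(r.lo, mid);
        }
        return r;
    }
}
```

After splitDown(q, new Range(0, 1024), 64) the owner holds 64 elements locally, a thief’s first popTop() returns the 512-element chunk, and the owner’s popBottom() returns the 64-element one. Taking from the top means one steal moves a lot of work, which keeps steals rare.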
Problems
• Work-stealing threads and transactions and contention managers.
• Type system mismatch
• Dispatch compilation
• Clean mapping to legacy libraries
Wishlist
• Value objects
• FP ops in different rounding modes
• Proper tail call elimination
• Self-profiling interface
By “transactions”, we mean...
• How do I know if “f()” contains a transaction or not?
  > Transactions can nest.
• How do I know if “f()” contains parallelism? (In Fortress, how could it not?)
  > Transactions may contain fork-join parallelism.
• This is largely unexplored.
• Starting point is DSTM2.
Threads, transactions, contention
• Doug Lea’s work-stealing library (jsr166y) works great as long as threads don’t block.
• Transactions with contention management sometimes “block”.
• Possible solutions:
  > steal carefully: from your spawn group, or from your transactional children.
  > accept a certain amount of conflict, and shoot down transactions when it occurs.
  > “continuations”: if a thread can push a continuation as part of its work, this is not an issue, but continuations must be very inexpensive.
  > use wait-notify to obtain “continuations” (not fast enough).
Example blocking problem
1. Work stealing in Java pushes the “join point” on the Java stack:
     setFlag(queuedwork(), ownwork())
2. If the parent finishes its own work but not all children have completed, what should it do?
   A. “wait”: but that’s not work stealing.
   B. steal work W (randomly chosen; the join point is still on the Java stack).
3. Suppose W uses a user-level abort to wait for a condition (Harris/Peyton Jones/Herlihy/Marlow advice):
     atomic do if NOT flag then abort() else ... end end
   (or it could be a Java-style wait).
4. The join is blocked by the completion of W; W is blocked by the execution of setFlag, which follows the join.
Type system
• “traits”, “objects”, “functions”, “tuples”
  > traits = interfaces + defaults
  > objects = final classes OR singletons
  > traits and objects may declare that they are “unboxed”.
  > Traits and objects describe everything except functions and tuples.
• Traits, objects and functions can be parameterized
  > by type: trait List⟦E⟧, trait Maybe⟦E⟧
  > by nat: trait Array1⟦T, nat b0, nat s0⟧
  > by opr: trait Monoid⟦T extends Monoid⟦T,opr ⊕⟧, opr ⊕⟧
• “excludes” and “comprises” attributes on traits
  > Allow compilation straight down to true enums.
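One possible JVM mapping of these ideas, sketched under assumptions rather than taken from the Fortress compiler: a trait becomes an interface whose default methods carry the trait’s defaults, a field-free object becomes a final class with a single instance, and the opr parameter ⊕ is specialized away, so Monoid⟦ℤ,+⟧ and Monoid⟦ℤ,×⟧ are distinct classes. Monoid, IntSum, IntProduct, combine, and reduce are hypothetical names, and the self-referential bound on T is dropped.

```java
// Trait → interface; the default method plays the role of a trait default.
interface Monoid<T> {
    T identity();
    T combine(T a, T b);   // the ⊕ this class was specialized for

    default T reduce(T[] xs) {
        T r = identity();
        for (T x : xs) r = combine(r, x);
        return r;
    }
}

// Monoid⟦ℤ, +⟧: an object with no fields, so a single instance suffices.
final class IntSum implements Monoid<Integer> {
    static final IntSum INSTANCE = new IntSum();
    private IntSum() {}
    @Override public Integer identity() { return 0; }
    @Override public Integer combine(Integer a, Integer b) { return a + b; }
}

// Monoid⟦ℤ, ×⟧: a separate final class, because ⊕ is specialized, not passed around.
final class IntProduct implements Monoid<Integer> {
    static final IntProduct INSTANCE = new IntProduct();
    private IntProduct() {}
    @Override public Integer identity() { return 1; }
    @Override public Integer combine(Integer a, Integer b) { return a * b; }
}
```

Specializing the operator into the class means a call to combine is an ordinary virtual call that a JIT can inline, instead of a call through an operator object passed at run time.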
Goal: “compile booleans to one bit”
trait Boolean extends BooleanAlgebra⟦Boolean,∧,∨,¬,⊻⟧
    comprises { true, false }
  opr ∧(self, other: Boolean): Boolean
  opr ∨(self, other: Boolean): Boolean
  opr ¬(self): Boolean
  opr ⊻(self, other: Boolean): Boolean
end

object true extends Boolean
  opr ∧(self, other: Boolean) = other
  opr ∨(self, other: Boolean) = self
  opr ¬(self) = false
  opr ⊻(self, other: Boolean) = ¬other
end

object false extends Boolean
  opr ∧(self, other: Boolean) = self
  opr ∨(self, other: Boolean) = other
  opr ¬(self) = true
  opr ⊻(self, other: Boolean) = other
end
Only two subtypes
Subtypes are singletons
Exactly two Boolean instances
Not the JVM’s job!
disclaimer: this example is a lie, but the principle is sound
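To make the idea concrete in JVM terms, here is one way (a sketch, not what Fortress does) the Boolean above could land on Java: because Boolean comprises exactly the two singleton objects true and false, it maps onto a two-constant enum, each constant overriding the operators just as its Fortress object does. FBool, of, and toJava are hypothetical names.

```java
// Sketch only: a Fortress-style Boolean as a Java enum.
enum FBool {
    TRUE {
        @Override FBool and(FBool other) { return other; }
        @Override FBool or(FBool other)  { return this; }
        @Override FBool not()            { return FALSE; }
        @Override FBool xor(FBool other) { return other.not(); }
    },
    FALSE {
        @Override FBool and(FBool other) { return this; }
        @Override FBool or(FBool other)  { return other; }
        @Override FBool not()            { return TRUE; }
        @Override FBool xor(FBool other) { return other; }
    };

    // The trait's abstract operators; each singleton supplies its own.
    abstract FBool and(FBool other);
    abstract FBool or(FBool other);
    abstract FBool not();
    abstract FBool xor(FBool other);

    // Two instances carry exactly one bit of information.
    boolean toJava() { return this == TRUE; }
    static FBool of(boolean b) { return b ? TRUE : FALSE; }
}
```

The enum guarantees exactly two instances, so a compiler that knows about comprises can replace FBool values with a single bit; as the slide says, deriving that is the language implementation’s job, not the JVM’s.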
Type-dependent operations
extractLeft1(l0: D1⟦E⟧, m0: D0⟦D23⟦E⟧⟧) = ...
extractLeft1(l0: D1⟦E⟧, m0: NonEmptyFingerTree⟦D23⟦E⟧⟧) = ...
extractLeft1(l0: D24⟦E⟧, m0: FingerTree⟦D23⟦E⟧⟧) = ...
cast⟦T extends Any⟧(x:Any):T = typecase x of T => x else => throw CastError end
instanceOf⟦T extends Any⟧(x:Any):Boolean = typecase x of T => true else => false end
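Both cast⟦T⟧ and instanceOf⟦T⟧ need T at run time, and the extractLeft1 overloads even distinguish instantiated types such as D0⟦D23⟦E⟧⟧. Java erases generic type arguments, so a common workaround (shown here as an assumption, not the Fortress implementation) is to pass the type explicitly as a Class<T> token. TypeCase, CastError, and erasedCast are hypothetical names. Note that a Class token cannot express an instantiated generic type, which is one reason the next slide says types cannot simply be erased.

```java
class TypeCase {
    static final class CastError extends RuntimeException {
        CastError(String message) { super(message); }
    }

    // cast⟦T⟧(x) = typecase x of T => x else => throw CastError end
    static <T> T cast(Class<T> type, Object x) {
        if (type.isInstance(x)) return type.cast(x);
        throw new CastError(x + " is not a " + type.getSimpleName());
    }

    // instanceOf⟦T⟧(x) = typecase x of T => true else => false end
    static boolean instanceOf(Class<?> type, Object x) {
        return type.isInstance(x);
    }

    // Without a token there is nothing to test against: this cast is
    // unchecked, so a wrong T goes unnoticed here and can only surface
    // later as a ClassCastException at some use site.
    @SuppressWarnings("unchecked")
    static <T> T erasedCast(Object x) { return (T) x; }
}
```

Threading a token through every generic call is exactly the kind of “type fakery” that becomes expensive when done everywhere, which motivates specializing instead.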
Type mapping plans
• Would like to avoid double-tagging.
• Cannot erase types; must specialize or maintain static parameters separately.
• Could use a custom classloader to partially specialize types.
  > Believe that opr parameters will always be specialized.
  > Not sure about type parameters; must beware of polymorphic recursion. What if a type parameter happens to be an “enum”?
  > Nat parameters will normally be maintained as final fields and extra parameters. (What Would Fortran Do?)
Multiple dispatch
• Want to say what mathematicians say:
  > scalar + scalar, matrix + matrix, matrix · matrix, scalar · matrix
  > But not: scalar + matrix
• Patterns are a crutch for a crippled language.
• Rules for overloading consistency ensure no ambiguity: “all paths” lead to the same choice, so the order can be tweaked.
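The difference between Java’s overloading and what Fortress wants shows up as soon as the static type is a supertype. In this sketch (hypothetical names throughout), Java picks plusStatic from the declared types at compile time, while plusDynamic looks up the pair of run-time classes and deliberately has no entry for scalar + matrix. It matches exact classes only; a real dispatcher must also handle subtypes and choose the most specific applicable method.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DynamicDispatch {
    interface Value {}
    static final class Scalar implements Value {}
    static final class Matrix implements Value {}

    // Java overloading: resolved at compile time from static types only.
    static String plusStatic(Scalar a, Scalar b) { return "scalar + scalar"; }
    static String plusStatic(Matrix a, Matrix b) { return "matrix + matrix"; }
    static String plusStatic(Value a, Value b)   { return "fallback (chosen statically)"; }

    // Dynamic multiple dispatch on the pair of run-time classes.
    // No entry for scalar + matrix, by design.
    private static final Map<List<Class<?>>, String> PLUS = new HashMap<>();
    static {
        PLUS.put(List.of(Scalar.class, Scalar.class), "scalar + scalar");
        PLUS.put(List.of(Matrix.class, Matrix.class), "matrix + matrix");
    }

    static String plusDynamic(Value a, Value b) {
        String r = PLUS.get(List.of(a.getClass(), b.getClass()));
        if (r == null) {
            throw new IllegalArgumentException("no + for " + a.getClass().getSimpleName()
                + ", " + b.getClass().getSimpleName());
        }
        return r;
    }
}
```

With Value s = new Scalar(), plusStatic(s, s) returns the fallback, while plusDynamic(s, s) finds “scalar + scalar”. A hash lookup per call is slow, which is what the compiled scheme on the following slides avoids.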
Compiled multiple dispatch
• Suppose f(M,M), f(M,V), f(S,M), f(S,V), f(S,S), where M, V, and S are matrix, vector, and scalar types.
• Choose a parameter dispatch order, say 2,1.
• Generate a first dispatch interface for f:
    interface I_F_2 { void F_2(I_F_1 mvs); }
• Generate a second dispatch interface for f:
    interface I_F_1 { void F_1m(M m2); void F_1v(V v2); void F_1s(S s2); }
• Add (inject?) interfaces and methods to M, V, and S:
// f(a, b) is compiled as b.F_2(a): dispatch on parameter 2, then on parameter 1.
class M implements I_F_1, I_F_2 {
  public void F_2(I_F_1 mvs) { mvs.F_1m(this); }
  public void F_1m(M m2) { F(this, m2); }  // MM
  public void F_1v(V v2) { F(this, v2); }  // MV
  public void F_1s(S s2) { fail(); }       // MS
  ...
}
class V implements I_F_1, I_F_2 {
  public void F_2(I_F_1 mvs) { mvs.F_1v(this); }
  public void F_1m(M m2) { fail(); }  // VM
  public void F_1v(V v2) { fail(); }  // VV
  public void F_1s(S s2) { fail(); }  // VS
  ...
}
class S implements I_F_1, I_F_2 {
  public void F_2(I_F_1 mvs) { mvs.F_1s(this); }
  public void F_1m(M m2) { F(this, m2); }  // SM
  public void F_1v(V v2) { F(this, v2); }  // SV
  public void F_1s(S s2) { F(this, s2); }  // SS
  ...
}
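The generated classes above call an F defined elsewhere. Below is a self-contained, runnable variant of the same dispatch-order-2,1 scheme, assuming each case returns a string naming the f it reached and that fail throws UnsupportedOperationException; CompiledDispatch and the entry point f are hypothetical names. A call f(a, b) becomes b.F_2(a), so choosing the case costs two virtual calls and no instanceof tests.

```java
class CompiledDispatch {
    interface I_F_1 {              // second stage: dispatch on parameter 1
        String F_1m(M m2);
        String F_1v(V v2);
        String F_1s(S s2);
    }
    interface I_F_2 {              // first stage: dispatch on parameter 2
        String F_2(I_F_1 mvs);
    }

    static String fail(String which) {
        throw new UnsupportedOperationException("no f" + which);
    }

    static final class M implements I_F_1, I_F_2 {
        public String F_2(I_F_1 mvs) { return mvs.F_1m(this); }
        public String F_1m(M m2) { return "f(M,M)"; }
        public String F_1v(V v2) { return "f(M,V)"; }
        public String F_1s(S s2) { return fail("(M,S)"); }
    }
    static final class V implements I_F_1, I_F_2 {
        public String F_2(I_F_1 mvs) { return mvs.F_1v(this); }
        public String F_1m(M m2) { return fail("(V,M)"); }
        public String F_1v(V v2) { return fail("(V,V)"); }
        public String F_1s(S s2) { return fail("(V,S)"); }
    }
    static final class S implements I_F_1, I_F_2 {
        public String F_2(I_F_1 mvs) { return mvs.F_1s(this); }
        public String F_1m(M m2) { return "f(S,M)"; }
        public String F_1v(V v2) { return "f(S,V)"; }
        public String F_1s(S s2) { return "f(S,S)"; }
    }

    // Entry point: the second argument's class picks F_2, which picks F_1x.
    static String f(I_F_1 a, I_F_2 b) { return b.F_2(a); }
}
```

For example, f(new M(), new V()) returns "f(M,V)", and f(new V(), new M()) throws because no f(V,M) exists. The cost is one method per (class, case) pair injected into every participating class.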
But M, V, S are nat-generic types
f⟦m,p,n⟧(M⟦m,p⟧,M⟦p,n⟧),
f⟦m,n⟧(M⟦m,n⟧,V⟦n⟧),
f⟦m,n⟧(S,M⟦m,n⟧),
f⟦n⟧(S,V⟦n⟧),
f(S,S)
• Nat params are
  > passed as hidden params to functions: M becomes M, Mm, Mn; V becomes V, Vn
  > stored in data, not in type
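A small sketch of “stored in data, passed as hidden params”, assuming dense row-major storage: the nat parameters of M⟦m,n⟧ and V⟦n⟧ live in final fields, and the nat-generic f⟦m,n⟧(M⟦m,n⟧, V⟦n⟧), here a matrix-vector product, receives m and n as extra int arguments. NatParams, Mat, Vec, and mulMV are hypothetical names.

```java
class NatParams {
    static final class Mat {            // M⟦m,n⟧
        final int m, n;                 // nat parameters, stored as data
        final double[] a;               // row-major, m*n entries
        Mat(int m, int n, double... a) {
            if (a.length != m * n) throw new IllegalArgumentException("need m*n entries");
            this.m = m; this.n = n; this.a = a;
        }
    }

    static final class Vec {            // V⟦n⟧
        final int n;
        final double[] x;
        Vec(double... x) { this.n = x.length; this.x = x; }
    }

    // f⟦m,n⟧(M⟦m,n⟧, V⟦n⟧): matrix-vector product with m and n as hidden parameters.
    static Vec mulMV(Mat mat, Vec vec, int m, int n) {
        // Fortress's type checker would prove these; here they are run-time checks.
        if (mat.m != m || mat.n != n || vec.n != n) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[] y = new double[m];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                y[i] += mat.a[i * n + j] * vec.x[j];
            }
        }
        return new Vec(y);
    }

    // Call site: the hidden arguments are read out of the data.
    static Vec mulMV(Mat mat, Vec vec) { return mulMV(mat, vec, mat.m, mat.n); }
}
```

Multiplying the 2×3 matrix [1 2 3; 4 5 6] by (1, 0, −1) gives (−2, −2); passing a 2-element vector instead fails the n check.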
class M implements I_F_1, I_F_2 {
  public void F_2(I_F_1 mvs) { mvs.F_1m(this, getM(), getN()); }
  public void F_1m(M m2, int m2m, int m2n) {
    assert getN() == m2m;            // shared dimension p of f⟦m,p,n⟧
    F(this, m2, getM(), m2m, m2n);   // MM
  }
  public void F_1v(V v2, int v2n) {
    assert getN() == v2n;
    F(this, v2, getM(), v2n);        // MV
  }
  ...
}
class V implements I_F_1, I_F_2 {
  public void F_2(I_F_1 mvs) { mvs.F_1v(this, getN()); }
  ...
}
class S implements I_F_1, I_F_2 {
  public void F_2(I_F_1 mvs) { mvs.F_1s(this); }
  public void F_1m(M m2, int m2m, int m2n) { F(this, m2, m2m, m2n); }  // SM
  public void F_1v(V v2, int v2n) { F(this, v2, v2n); }               // SV
  public void F_1s(S s2) { F(this, s2); }                              // SS
  ...
}
interface I_F_2 { void F_2(I_F_1 mvs); }
interface I_F_1 {
  void F_1m(M m2, int m2m, int m2n);
  void F_1v(V v2, int v2n);
  void F_1s(S s2);
}
Mapping to legacy libraries
• From HPCS discussions with national-lab users, this turned out to be an interesting win for Python.
• Legacy = Fortran; big users of big arrays. Converting representations could be too costly.
  > We want to call the BLAS.
• Native code in a transaction: what does that mean?
  > Maybe we do convert/copy after all.
  > Recalling launched missiles?
• What’s the “thread local” state of a work-stealing thread?
• Can native code block (either in a transaction or in a work-stealing context)?
Value types
• Necessary for performance in the HPC world.
• Need “multiple return values” (i.e., a value aggregate).
• Need to be able to create arrays of value types.
  > Could transpose an array of structs into a struct of arrays, but this is bad for mixing with other languages.
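A Java sketch of the trade-off in the last bullet, with complex numbers standing in for any small value aggregate; ComplexLayouts and its members are hypothetical names. Without value types, Complex[] holds references to separately allocated objects; the struct-of-arrays transpose keeps the numbers flat and unboxed, but a Fortran COMPLEX array stores (re, im) pairs interleaved, so handing either Java layout to such code still needs a copy.

```java
class ComplexLayouts {
    // Array of structs: each element is its own heap object.
    static final class Complex {
        final double re, im;
        Complex(double re, double im) { this.re = re; this.im = im; }
    }

    static double sumAbs2(Complex[] zs) {
        double s = 0;
        for (Complex z : zs) s += z.re * z.re + z.im * z.im;
        return s;
    }

    // Struct of arrays: two parallel, contiguous, unboxed double[] arrays.
    static final class ComplexArray {
        final double[] re, im;
        ComplexArray(int n) { re = new double[n]; im = new double[n]; }

        double sumAbs2() {
            double s = 0;
            for (int i = 0; i < re.length; i++) s += re[i] * re[i] + im[i] * im[i];
            return s;
        }
    }

    // The transpose: array of structs -> struct of arrays.
    static ComplexArray transpose(Complex[] zs) {
        ComplexArray c = new ComplexArray(zs.length);
        for (int i = 0; i < zs.length; i++) {
            c.re[i] = zs[i].re;
            c.im[i] = zs[i].im;
        }
        return c;
    }

    // Interleaved (re0, im0, re1, im1, ...) copy, the layout a Fortran COMPLEX array uses.
    static double[] interleave(ComplexArray c) {
        double[] out = new double[2 * c.re.length];
        for (int i = 0; i < c.re.length; i++) {
            out[2 * i] = c.re[i];
            out[2 * i + 1] = c.im[i];
        }
        return out;
    }
}
```

Both layouts compute the same results; the difference is memory traffic and whether a native routine such as a BLAS call can use the array in place.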
FP Ops in different rounding modes
• Static choice: we want a variant of each operation, not a mode bit. Roughly, “normal”, +, -, 0, inf.
• We have a pure Java implementation of the operations, but it is slow and misses a few corner cases.
• Special (to the JVM) methods would be good enough.
• We know this can be compiled/optimized to good code; treat this the same as sqrt.
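As a rough sketch of what a pure-Java directed-rounding operation can look like (not the Fortress implementation), addition can be built from an exact BigDecimal sum: convert it to the nearest double, then step one ulp with Math.nextUp or Math.nextDown if that landed on the wrong side. DirectedRounding, addDown, addUp, and addTowardZero are hypothetical names. Finite inputs are assumed, and overflow and the sign of an exact zero are ignored, which are exactly the kind of corner cases a complete version must handle.

```java
import java.math.BigDecimal;

class DirectedRounding {
    // Exact sum of two finite doubles (new BigDecimal(double) is exact).
    private static BigDecimal exactSum(double a, double b) {
        return new BigDecimal(a).add(new BigDecimal(b));
    }

    // Round toward −∞: take the nearest double, step down if it overshot.
    static double addDown(double a, double b) {
        BigDecimal exact = exactSum(a, b);
        double d = exact.doubleValue();
        return new BigDecimal(d).compareTo(exact) > 0 ? Math.nextDown(d) : d;
    }

    // Round toward +∞: take the nearest double, step up if it fell short.
    static double addUp(double a, double b) {
        BigDecimal exact = exactSum(a, b);
        double d = exact.doubleValue();
        return new BigDecimal(d).compareTo(exact) < 0 ? Math.nextUp(d) : d;
    }

    // Round toward zero: down for non-negative sums, up for negative ones.
    static double addTowardZero(double a, double b) {
        return exactSum(a, b).signum() >= 0 ? addDown(a, b) : addUp(a, b);
    }
}
```

For 0.1 + 0.2 the exact sum is not representable, so addUp returns the double just above addDown’s result; for 1.0 + 2.0 both return 3.0. The allocation-heavy exact arithmetic is why such a library is slow, and why JVM-recognized methods compiled to hardware rounding modes would be preferable.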
Tail call elimination
• It’s a useful optimization even on a uniprocessor.
• Fluffy stacks are not free, especially when you have a zillion of them.
• A library-defined language has layer upon layer of abstraction; why skip a chance to flatten out some of those layers?
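Without JVM support, one standard workaround (sketched here, not necessarily what Fortress does) is a trampoline: each tail call returns a description of the next step, and a driver loop runs steps until a final value appears, so the Java stack never grows. Trampoline, Step, done, call, and run are hypothetical names.

```java
import java.util.function.Supplier;

class Trampoline {
    // A step is either a finished value or a thunk that computes the next step.
    static final class Step<T> {
        final T value;                  // meaningful only when next == null
        final Supplier<Step<T>> next;
        private Step(T value, Supplier<Step<T>> next) { this.value = value; this.next = next; }
        static <T> Step<T> done(T value) { return new Step<>(value, null); }
        static <T> Step<T> call(Supplier<Step<T>> next) { return new Step<>(null, next); }
    }

    // The driver loop: every tail call returns here, so stack depth stays constant.
    static <T> T run(Step<T> step) {
        while (step.next != null) step = step.next.get();
        return step.value;
    }

    // Mutual tail recursion; called directly, this would need about n stack frames.
    static Step<Boolean> isEven(long n) {
        if (n == 0) return Step.done(true);
        return Step.call(() -> isOdd(n - 1));
    }

    static Step<Boolean> isOdd(long n) {
        if (n == 0) return Step.done(false);
        return Step.call(() -> isEven(n - 1));
    }
}
```

run(isEven(1_000_000)) returns true without a deep stack, but it allocates a Step and a lambda per call; that overhead is the price a library pays for missing tail-call elimination.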
Profiling feedback?
• High-performance scientific libraries (FFTW, ATLAS) use it for their own code selection.
• We could use it to drive specialization of Fortress types and dispatch.
  > improved multiple dispatch
  > work-stealing granularity
  > better treatment of transactions
• case fastest of ... ?
• Don’t want to reinvent the JVM’s wheel.
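A toy version of empirical selection, in the spirit of the FFTW/ATLAS approach and the “case fastest of ...?” idea: time each candidate a few times and keep the one with the best time. fastestOf is a hypothetical helper, not a Fortress construct, and timings on the JVM are noisy (JIT warm-up, GC), so this only illustrates the shape of the mechanism; candidates could be, for example, different work-stealing block sizes.

```java
import java.util.List;
import java.util.function.Supplier;

class FastestOf {
    // Results are stored here so the JIT cannot discard the measured work.
    static volatile Object sink;

    // Returns the index of the candidate with the smallest best-of-`trials` time,
    // or -1 if there are no candidates.
    static int fastestOf(List<? extends Supplier<?>> candidates, int trials) {
        int best = -1;
        long bestTime = Long.MAX_VALUE;
        for (int i = 0; i < candidates.size(); i++) {
            long t = Long.MAX_VALUE;
            for (int k = 0; k < trials; k++) {
                long start = System.nanoTime();
                sink = candidates.get(i).get();
                t = Math.min(t, System.nanoTime() - start);
            }
            if (t < bestTime) {
                bestTime = t;
                best = i;
            }
        }
        return best;
    }
}
```

A runtime that already profiles (as the JVM does) could feed such measurements back into specialization, which is the point of not reinventing the JVM’s wheel.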
Team
• Guy Steele
• Eric Allen
• Jan-Willem Maessen
• David Chase
• Christine Flood
• Sukyoung Ryu
• Victor Luchangco
• Steve Heller
Interns
• Sam Tobin-Hochstadt
• Carl Eastlund
• John Dias
• Cheryl McCosh
• Joe Hallett
• Janus Nielsen
• Dan Smith
• Angelina Lee
• Michael Spiegel
• Ryan Culpepper
• Jon Rafkind
• Justin Hilburn
• Nels Beckman
Visiting
• Andrew Black
+ a few external participants
Problems
• Work-stealing threads, transactions, and contention managers.
• Type system mismatch
• Dispatch compilation
• Clean mapping to legacy libraries
Wishlist
• Value objects IN ARRAYS!
• FP ops in different rounding modes
• Proper tail call elimination
• Self-profiling interface