David Chase September, 2008 ProjectFortress: Run your whiteboard, in parallel, on the JVM


Source: wiki.jvmlangsummit.com/pdf/16_Chase_fortress.pdf



Fortress is
• Originally designed for high-performance-productivity computing as part of Sun’s DARPA HPCS contract.
• Goals
  > scales onto N-cores, for 0 ≤ log(N) < 6.
  > runs fast
  > uncluttered syntax, following familiar mathematical conventions whenever possible
  > extensible/growable – defined in libraries wherever possible
• Features (relevant to JVM hosting)
  > Transactions
  > Infested with parallelism
  > Generic (in T, n, and opr) functions, traits, objects
  > Multiple dispatch (dynamic, not static)


Current status
• Parsed to ASTs.
• Statically checked and transformed into more runnable form.
• Some compilation to bytecodes; mostly interpreted. Compiler is coming soon, incrementally.
• Interpreter is
  > multi-threaded
  > somewhat scalable (we test on 64 threads)
  > supports workstealing and transactions
  > doing LOTS of type fakery behind the curtain
• Library story is mostly honest, and getting more so.


Motivation


Q: How do we make it pretty, parallel, and “growable”?

A: Types, type inference, dynamic overloading, workstealing.


Many opportunities for parallelism

Run Your Whiteboard in Parallel! (ProjectFortress.sun.com)

v_norm = v / ‖v‖
∑[k←1:n] a_k x^k
C = A ∪ B
y = 3x sin x cos 2x log log x

Aggregate objects and reductions drive parallelism with “generators”.

Operands are evaluated in parallel.


Desugaring generators

∑[i←a, j←b, p, k←c] e    becomes    ∑(g)

⟨ e | i←a, j←b, p, k←c ⟩    becomes    ⟨g⟩

for i←a, j←b, p, k←c do e end    becomes    forLoop(g)

where g =
  (fn singleton ⇒
    (a).join(b).generate(fn (i,j) ⇒
      (p).generate(fn () ⇒
        (c).generate(fn (k) ⇒ singleton(e)))))
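The nesting in g can be mimicked in Java as a rough illustration. This is a sketch under stated assumptions, not Fortress's implementation: the `Generator` and `Monoid` types, the `of` and `when` helpers, and the serial evaluation are all hypothetical, and the `join(b)` step is left out to keep it short. The filter `p` is modelled as a generator that yields one unit value when true and nothing when false.

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

class DesugarSketch {
    /** A monoid: an identity element plus an associative combining operator. */
    record Monoid<R>(R identity, BinaryOperator<R> op) {}

    /** Hypothetical Java rendering of a generator: reduce body(t) over all t. */
    interface Generator<T> {
        <R> R generate(Monoid<R> m, Function<T, R> body);
    }

    /** Generator over the elements of a list (evaluated serially in this sketch). */
    static <T> Generator<T> of(List<T> items) {
        return new Generator<T>() {
            @Override
            public <R> R generate(Monoid<R> m, Function<T, R> body) {
                R r = m.identity();
                for (T t : items) {
                    r = m.op().apply(r, body.apply(t));
                }
                return r;
            }
        };
    }

    /** A filter p acts as a generator of zero or one unit values. */
    static Generator<Boolean> when(boolean p) {
        return of(p ? List.of(Boolean.TRUE) : List.<Boolean>of());
    }

    static final Monoid<Integer> SUM = new Monoid<>(0, Integer::sum);

    /** Desugared form of: SUM[i <- a, i is odd, k <- c] i * k */
    static int example(List<Integer> a, List<Integer> c) {
        return of(a).generate(SUM, i ->
               when(i % 2 == 1).generate(SUM, unit ->
               of(c).generate(SUM, k -> i * k)));
    }

    public static void main(String[] args) {
        // 1*10 + 1*20 + 3*10 + 3*20
        System.out.println(example(List.of(1, 2, 3), List.of(10, 20))); // prints 120
    }
}
```

Because every level only ever combines results through the monoid, each `generate` call is free to evaluate its elements in any grouping, which is what lets a real generator run them in parallel.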


Generator, for a “blocked range”

value object BlockedRange(lo: ℤ64, hi: ℤ64, b: ℤ64) extends Generator⟦ℤ64⟧
  size = hi - lo + 1
  generate⟦R extends Monoid⟦R,⊕⟧⟧(body: ℤ64 → R): R =
    if size ≤ max(b,1) then
      (* SERIAL LOOP *)
      r : R = coerce(Identity)
      i : ℤ64 = lo
      while i ≤ hi do
        r := r ⊕ body(i)
        i += 1
      end
      r
    else
      (* PARALLEL SPLIT *)
      mid = ⌊(lo + hi) / 2⌋
      BlockedRange(lo,mid,b).generate(body) ⊕ BlockedRange(mid+1,hi,b).generate(body)
    end
end
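For readers more at home in Java, the same serial-loop versus parallel-split shape can be sketched with the JDK's fork/join framework. This is an illustrative analogue only: the class and method names are hypothetical, the monoid is narrowed to `long` values with an explicit identity, and nothing here is taken from the Fortress runtime.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.LongBinaryOperator;
import java.util.function.LongUnaryOperator;

class BlockedRangeSketch {
    /** Reduces body(lo), ..., body(hi) with an associative operator op. */
    static final class Reduce extends RecursiveTask<Long> {
        private final long lo, hi, block, identity;
        private final LongBinaryOperator op;
        private final LongUnaryOperator body;

        Reduce(long lo, long hi, long block, long identity,
               LongBinaryOperator op, LongUnaryOperator body) {
            this.lo = lo; this.hi = hi; this.block = block;
            this.identity = identity; this.op = op; this.body = body;
        }

        private Reduce child(long from, long to) {
            return new Reduce(from, to, block, identity, op, body);
        }

        @Override
        protected Long compute() {
            long size = hi - lo + 1;
            if (size <= Math.max(block, 1)) {
                // Serial loop over a small block.
                long r = identity;
                for (long i = lo; i <= hi; i++) {
                    r = op.applyAsLong(r, body.applyAsLong(i));
                }
                return r;
            }
            // Parallel split: fork the left half, compute the right half here.
            long mid = Math.floorDiv(lo + hi, 2L);
            Reduce left = child(lo, mid);
            Reduce right = child(mid + 1, hi);
            left.fork();
            long rightResult = right.compute();
            long leftResult = left.join();
            return op.applyAsLong(leftResult, rightResult); // keeps lo-to-hi order
        }
    }

    static long sumOfSquares(long lo, long hi, long block) {
        return ForkJoinPool.commonPool()
                .invoke(new Reduce(lo, hi, block, 0L, Long::sum, i -> i * i));
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1, 100, 8)); // prints 338350
    }
}
```

The forked left half sits on the worker's queue where an idle thread can take it, while the current thread continues with the right half.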


Generator selection with overloading

value object BlockedRange(lo: ℤ64, hi: ℤ64, b: ℤ64) extends Generator⟦ℤ64⟧
  (* Natural order: lo to hi *)
  size = hi - lo + 1
  generate⟦R extends Monoid⟦R,⊕⟧⟧(body: ℤ64 → R): R = ...
  generate⟦R extends CommutativeMonoid⟦R,⊕⟧⟧(body: ℤ64 → R): R = ...
  generate⟦R extends { Monoid⟦R,⊕⟧, LeftZero⟦R,⊕⟧ }⟧(body: ℤ64 → R): R = ...
  join⟦T⟧(Generator⟦T⟧): Generator⟦(ℤ64,T)⟧
end


Work stealing

[Figure: a 1024-element range split repeatedly into pieces of 512, 256, 128, 64, and 64 elements. Labels: Thread A’s local state; Thread A’s work queue (push, pop-bottom); Thread B, stealing work (pop-top).]

Arora, Blumofe, Plaxton: “ABP Queues”
Doug Lea: JSR166y
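The queue discipline in the figure can be illustrated with a small Java sketch. It is deliberately single-threaded and uses a plain `ArrayDeque`; a real ABP deque is a concurrent, non-blocking structure, and the names `splitAndPush`, `popBottom`, and `popTop` are hypothetical helpers chosen to match the figure's labels. It also assumes the owner keeps one half and pushes the other at each split, which reproduces the sizes shown.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class WorkStealingSketch {
    /**
     * The owner halves its range until the piece it keeps is at most `grain`,
     * pushing the other half onto the bottom of its own queue each time.
     * Returns the size of the piece kept as local state.
     */
    static int splitAndPush(Deque<Integer> queue, int size, int grain) {
        while (size > grain) {
            int half = size / 2;
            queue.addLast(size - half); // push (bottom of the deque)
            size = half;
        }
        return size;
    }

    /** Owner takes back its most recently pushed (smallest) piece. */
    static Integer popBottom(Deque<Integer> queue) {
        return queue.pollLast();
    }

    /** Thief takes the oldest (largest) piece from the other end. */
    static Integer popTop(Deque<Integer> queue) {
        return queue.pollFirst();
    }

    public static void main(String[] args) {
        Deque<Integer> queueA = new ArrayDeque<>();
        int local = splitAndPush(queueA, 1024, 64);
        System.out.println("Thread A keeps " + local + ", queue holds " + queueA);
        System.out.println("Thread B steals " + popTop(queueA));
        System.out.println("Thread A pops " + popBottom(queueA));
    }
}
```

Stealing from the top hands the thief the largest outstanding piece, so one steal moves a lot of work and the owner's hot end of the queue is rarely contended.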


Problems
• Work-stealing threads and transactions and contention managers.
• Type system mismatch
• Dispatch compilation
• Clean mapping to legacy libraries

Wishlist
• Value objects
• FP ops in different rounding modes
• Proper tail call elimination
• Self-profiling interface


By “transactions”, we mean...

• How do I know if “f()” contains a transaction or not?
  > Transactions can nest
• How do I know if “f()” contains parallelism? (In Fortress, how could it not?)
  > Transactions may contain fork-join parallelism.
• This is largely unexplored.
• Starting point is DSTM2.


Threads, transactions, contention.

• Doug Lea’s workstealing library (jsr166y) works great as long as threads don’t block.
• Transactions with contention management sometimes “block”.
• Possible solutions:
  > steal carefully = from your spawn group, from your transactional children.
  > accept a certain amount of conflict, shoot down transactions when it occurs.
  > “continuations” -- if a thread can push a continuation as part of its work, this is not an issue, but continuations must be very inexpensive.
  > use wait-notify to obtain “continuations” (not fast enough).


Example blocking problem
1. Workstealing in Java pushes “join point” on Java stack.
     setFlag(queuedwork(), ownwork())
2. If parent finishes its work, but not all children have completed, what to do?
   A. “wait” -- but that’s not work-stealing.
   B. steal work W (randomly chosen; join point is still on Java stack).
3. Suppose W uses user-level abort to wait for a condition (Harris/Peyton Jones/Herlihy/Marlow advice):
     atomic do
       if NOT flag then abort() else ... end
     end
   (or it could be a Java-style wait).
4. Join is blocked by completion of W; W is blocked by execution of “setFlag” which follows join.


Type system
• “traits”, “objects”, “functions”, “tuples”
  > traits = interfaces + defaults
  > objects = final classes OR singletons
  > traits and objects may declare that they are “unboxed”.
  > Traits and objects describe everything except functions and tuples.
• Traits, objects and functions can be parameterized
  > by type: trait List⟦E⟧, trait Maybe⟦E⟧
  > by N: trait Array1⟦T, nat b0, nat s0⟧
  > by Op: trait Monoid⟦T extends Monoid⟦T,opr ⊕⟧, opr ⊕⟧
• “excludes” and “comprises” attributes on traits
  > Allows compilation straight down to true enums.


Goal: “compile booleans to one bit”

trait Boolean extends BooleanAlgebra⟦Boolean,∧,∨,¬,⊻⟧
    comprises { true, false }
  opr ∧(self, other: Boolean): Boolean
  opr ∨(self, other: Boolean): Boolean
  opr ¬(self): Boolean
  opr ⊻(self, other: Boolean): Boolean
end

object true extends Boolean
  opr ∧(self, other: Boolean) = other
  opr ∨(self, other: Boolean) = self
  opr ¬(self) = false
  opr ⊻(self, other: Boolean) = ¬other
end

object false extends Boolean
  opr ∧(self, other: Boolean) = self
  opr ∨(self, other: Boolean) = other
  opr ¬(self) = true
  opr ⊻(self, other: Boolean) = other
end

Only two subtypes

Subtypes are singletons

Exactly two Boolean instances

Not the JVM’s job!

disclaimer: this example is a lie, but the principle is sound
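The closest everyday Java construct to a trait that “comprises” a fixed set of singleton objects is an enum with constant-specific method bodies. The sketch below is only an analogy to show the shape; the names are hypothetical, and a Java enum constant is still a heap object rather than a single bit.

```java
class BooleanEnumSketch {
    /** A closed type with exactly two singleton instances, each with its own method bodies. */
    enum Bool {
        TRUE {
            @Override Bool and(Bool other) { return other; }
            @Override Bool or(Bool other)  { return this; }
            @Override Bool not()           { return FALSE; }
            @Override Bool xor(Bool other) { return other.not(); }
        },
        FALSE {
            @Override Bool and(Bool other) { return this; }
            @Override Bool or(Bool other)  { return other; }
            @Override Bool not()           { return TRUE; }
            @Override Bool xor(Bool other) { return other; }
        };

        abstract Bool and(Bool other);
        abstract Bool or(Bool other);
        abstract Bool not();
        abstract Bool xor(Bool other);
    }

    public static void main(String[] args) {
        System.out.println(Bool.TRUE.xor(Bool.TRUE));  // prints FALSE
        System.out.println(Bool.FALSE.or(Bool.TRUE));  // prints TRUE
        System.out.println(Bool.values().length);      // prints 2
    }
}
```

Because the set of instances is closed and known at compile time, a compiler is free to pick a compact encoding for it, which is the point the slide makes about doing this work before the JVM sees the code.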


Type-dependent operations

extractLeft1(l0: D1⟦E⟧,  m0: D0⟦D23⟦E⟧⟧) = ...
extractLeft1(l0: D1⟦E⟧,  m0: NonEmptyFingerTree⟦D23⟦E⟧⟧) = ...
extractLeft1(l0: D24⟦E⟧, m0: FingerTree⟦D23⟦E⟧⟧) = ...

cast⟦T extends Any⟧(x: Any): T =
  typecase x of
    T => x
    else => throw CastError
  end

instanceOf⟦T extends Any⟧(x: Any): Boolean =
  typecase x of
    T => true
    else => false
  end


Type mapping plans

• Would like to avoid double-tagging.
• Cannot erase types; must specialize or maintain static parameters separately.
• Could use custom classloader to partially specialize types.
  > Believe that opr parameters will always be specialized
  > Not sure about type parameters; must beware of polymorphic recursion. What if a type parameter happens to be an “enum”?
  > Nat parameters will normally be maintained as final fields and extra parameters. (What Would Fortran Do?)
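To see why erasure is a problem, consider what a type-dependent `cast` or `instanceOf` looks like in plain Java. Java discards generic type arguments at run time, so the type has to travel as an ordinary value. The sketch below shows the “maintain static parameters separately” option using a `Class<T>` token as the extra parameter; the class and method names are hypothetical and this is not a description of what the Fortress compiler emits.

```java
class ReifiedCastSketch {
    /** cast[T](x): the static parameter T is passed explicitly as a Class token. */
    static <T> T cast(Class<T> type, Object x) {
        if (type.isInstance(x)) {
            return type.cast(x);
        }
        throw new ClassCastException("value is not a " + type.getName());
    }

    /** instanceOf[T](x): again driven by the explicit token, not by the erased T. */
    static <T> boolean instanceOf(Class<T> type, Object x) {
        return type.isInstance(x);
    }

    public static void main(String[] args) {
        Object x = "whiteboard";
        System.out.println(instanceOf(String.class, x));  // prints true
        System.out.println(instanceOf(Integer.class, x)); // prints false
        System.out.println(cast(String.class, x).length()); // prints 10
    }
}
```

A `Class` token only captures a raw class, so it cannot distinguish `List⟦ℤ64⟧` from `List⟦String⟧`; a full solution needs a richer run-time type representation or specialization, which is the trade-off the bullets above weigh.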


Multiple dispatch

• Want to say what mathematicians say:
  > scalar + scalar, matrix + matrix, matrix · matrix, scalar · matrix
  > But not: scalar + matrix
• Patterns are a crutch for a crippled language.
• Rules for overloading consistency ensure no ambiguity -- “all paths” lead to the same choice, so the order can be tweaked.


Compiled multiple dispatch

• Suppose f(M,M), f(M,V), f(S,M), f(S,V), f(S,S)
• Choose a parameter dispatch order, say 2,1
• Generate a first dispatch interface for f:
    interface I_F_2 { F_2(I_F_1 mvs); }
• Generate second dispatch interface for f:
    interface I_F_1 { F_1m(M m2); F_1v(V v2); F_1s(S s2); }
• add (inject?) interfaces and methods to M, V, and S:


class M implements I_F_1, I_F_2 {
  F_2(I_F_1 mvs) { mvs.F_1m(this); }
  F_1m(M m2) { F(this, m2); }  // MM
  F_1v(V v2) { F(this, v2); }  // MV
  F_1s(S s2) { fail(); }       // MS
  ...
}

class V implements I_F_1, I_F_2 {
  F_2(I_F_1 mvs) { mvs.F_1v(this); }
  F_1m(M m2) { fail(); }       // VM
  F_1v(V v2) { fail(); }       // VV
  F_1s(S s2) { fail(); }       // VS
  ...
}

class S implements I_F_1, I_F_2 {
  F_2(I_F_1 mvs) { mvs.F_1s(this); }
  F_1m(M m2) { F(this, m2); }  // SM
  F_1v(V v2) { F(this, v2); }  // SV
  F_1s(S s2) { F(this, s2); }  // SS
  ...
}
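A compilable rendering of this scheme may help in following the two hops. It is a sketch under stated assumptions: the identifiers are renamed to Java conventions (`SecondArg.dispatch` stands in for `F_2`, and `FirstArg.withM/withV/withS` for `F_1m/F_1v/F_1s`), each overload simply returns a string naming itself, and `fail` is modelled as an unchecked exception.

```java
class DispatchSketch {
    /** Second hop: implemented by the first argument, one method per class of the second. */
    interface FirstArg {
        String withM(M second);
        String withV(V second);
        String withS(S second);
    }

    /** First hop: implemented by the second argument (dispatch order 2,1). */
    interface SecondArg {
        String dispatch(FirstArg first);
    }

    static String fail(String pair) {
        throw new UnsupportedOperationException("no applicable f(" + pair + ")");
    }

    static final class M implements FirstArg, SecondArg {
        public String dispatch(FirstArg first) { return first.withM(this); }
        public String withM(M second) { return "f(M,M)"; }
        public String withV(V second) { return "f(M,V)"; }
        public String withS(S second) { return fail("M,S"); }
    }

    static final class V implements FirstArg, SecondArg {
        public String dispatch(FirstArg first) { return first.withV(this); }
        public String withM(M second) { return fail("V,M"); }
        public String withV(V second) { return fail("V,V"); }
        public String withS(S second) { return fail("V,S"); }
    }

    static final class S implements FirstArg, SecondArg {
        public String dispatch(FirstArg first) { return first.withS(this); }
        public String withM(M second) { return "f(S,M)"; }
        public String withV(V second) { return "f(S,V)"; }
        public String withS(S second) { return "f(S,S)"; }
    }

    /** Entry point: dispatch on argument 2 first, then on argument 1. */
    static String f(FirstArg first, SecondArg second) {
        return second.dispatch(first);
    }

    public static void main(String[] args) {
        System.out.println(f(new M(), new V())); // prints f(M,V)
        System.out.println(f(new S(), new S())); // prints f(S,S)
    }
}
```

Each call costs two interface invocations and no type tests, at the price of one method per class of the other argument.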


But M, V, S are nat-generic types

f⟦m,p,n⟧(M⟦m,p⟧, M⟦p,n⟧),
f⟦m,n⟧(M⟦m,n⟧, V⟦n⟧),
f⟦m,n⟧(S, M⟦m,n⟧),
f⟦n⟧(S, V⟦n⟧),
f(S,S)

• Nat params are
  > passed as hidden params to functions
      M becomes M, Mm, Mn
      V becomes V, Vn
  > stored in data, not in type


class M implements I_F_1, I_F_2 {
  F_2(I_F_1 mvs) { mvs.F_1m(this, getM(), getN()); }
  F_1m(M m2, int m2m, int m2n) {
    assert(getN() == m2m);
    F(this, m2, getM(), m2m, m2n);  // MM
  }
  F_1v(V v2, int v2n) {
    assert(getN() == v2n);
    F(this, v2, getM(), v2n);       // MV
  }
  ...
}

class V implements I_F_1, I_F_2 {
  F_2(I_F_1 mvs) { mvs.F_1v(this, getN()); }
  ...
}

class S implements I_F_1, I_F_2 {
  F_2(I_F_1 mvs) { mvs.F_1s(this); }
  F_1m(M m2, int m2m, int m2n) { F(this, m2, m2m, m2n); }  // SM
  F_1v(V v2, int v2n) { F(this, v2, v2n); }                // SV
  F_1s(S s2) { F(this, s2); }                              // SS
  ...
}

interface I_F_2 { F_2(I_F_1 mvs); }
interface I_F_1 {
  F_1m(M m2, int m2m, int m2n);
  F_1v(V v2, int v2n);
  F_1s(S s2);
}


Mapping to legacy libraries
• From HPCS discussions with national labs users, this turned out to be an interesting win for Python.
• Legacy = Fortran; big users of big arrays. Converting representations could be too costly.
  > We want to call the BLAS.
• Native code in a transaction, what’s that mean?
  > Maybe we do convert/copy after all.
  > Recalling launched missiles?
• What’s the “thread local” state of a work-stealing thread?
• Can native code block (either in a transaction or workstealing context)?


Value types

• Necessary for performance in HPC world.
• Need “multiple return values” (i.e., a value aggregate).
• Need to be able to create arrays of value types.
  > could transpose array of struct into struct of arrays, but this is bad for mixing with other languages.


FP Ops in different rounding modes

• Static choice -- we want a variant of the different operations, not a mode bit. Roughly, “normal”, +, -, 0, inf.

• We have a pure Java implementation of the operations, but it is slow and misses a few corner cases.

• Special (to the JVM) methods would be good enough.

• We know this can be compiled/optimized to good code; treat this the same as sqrt.
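As an illustration of what a pure-Java variant can look like, the sketch below builds round-up and round-down addition on top of Java's ordinary round-to-nearest `double` arithmetic. It uses a well-known trick (Knuth's TwoSum error term) and is not the Fortress team's implementation; the class and method names are hypothetical, and overflow, infinities, and NaN are deliberately not handled, which is exactly the sort of corner case such code tends to miss.

```java
class DirectedRoundingSketch {
    /**
     * Rounding error of the round-to-nearest sum a + b (Knuth's TwoSum).
     * Exact for finite inputs as long as nothing overflows.
     */
    static double twoSumError(double a, double b) {
        double s = a + b;
        double bVirtual = s - a;
        double aVirtual = s - bVirtual;
        return (a - aVirtual) + (b - bVirtual);
    }

    /** a + b rounded toward positive infinity. */
    static double addUp(double a, double b) {
        double s = a + b;
        // A positive error means the exact sum lies above s, so step up one ulp.
        return twoSumError(a, b) > 0 ? Math.nextUp(s) : s;
    }

    /** a + b rounded toward negative infinity. */
    static double addDown(double a, double b) {
        double s = a + b;
        // A negative error means the exact sum lies below s, so step down one ulp.
        return twoSumError(a, b) < 0 ? Math.nextDown(s) : s;
    }

    public static void main(String[] args) {
        double tiny = 0x1p-60; // far below one ulp of 1.0
        System.out.println(addUp(1.0, tiny) == Math.nextUp(1.0)); // prints true
        System.out.println(addDown(1.0, tiny) == 1.0);            // prints true
        System.out.println(addUp(1.0, 1.0) == 2.0);               // prints true
    }
}
```

Each directed add costs several dependent floating-point operations plus a branch, which gives a feel for why a JVM-recognized method that maps to a single rounded instruction would be attractive.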


Tail call elimination

• It’s a useful optimization even on a uniprocessor.
• Fluffy stacks are not free, especially when you have a zillion of them.
• A library-defined language has layer upon layer of abstraction; why skip a chance to flatten out some of those layers?


Profiling feedback?

• High performance scientific libraries (FFTW, ATLAS) use it for their own code selection.
• We could use it to drive specialization of Fortress types and dispatch.
  > improved multiple dispatch
  > workstealing granularity
  > better treatment of transactions
• case fastest of ... ?
• Don’t want to reinvent the JVM’s wheel.


Team
• Guy Steele
• Eric Allen
• Jan-Willem Maessen
• David Chase
• Christine Flood
• Sukyoung Ryu
• Victor Luchangco
• Steve Heller

Interns
• Sam Tobin-Hochstadt
• Carl Eastlund
• John Dias
• Cheryl McCosh
• Joe Hallett
• Janus Nielsen
• Dan Smith
• Angelina Lee
• Michael Spiegel
• Ryan Culpepper
• Jon Rafkind
• Justin Hilburn
• Nels Beckman

Visiting
• Andrew Black

+ a few external participants


Problems
• Work-stealing threads, transactions, and contention managers.
• Type system mismatch
• Dispatch compilation
• Clean mapping to legacy libraries

Wishlist
• Value objects IN ARRAYS!
• FP ops in different rounding modes
• Proper tail call elimination
• Self-profiling interface