texpoint fonts used in emf
DESCRIPTION
TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A A A A A A A. Taming parallelism. Task-parallelism. Message-passing. Data parallelism: Highly coarse-grained (MapReduce) Highly fine-grained (numeric computations on dense arrays) - PowerPoint PPT PresentationTRANSCRIPT
Swarat Chaudhuri Penn State
Roberto Lublinerman Pavol Cerny Penn State IST Austria
Parallel Programming with
Object Assemblies
Data parallelism:
- Highly coarse-grained (MapReduce)- Highly fine-grained (numeric computations on dense arrays)-Problem-specific methods
Taming parallelismTask-parallelism
Message-passing
Taming parallelism
Our target:
Data-parallel computations over large, unstructured, shared-memory graphs
Unknown granularity
High-level correctness as well as efficiency.
Delaunay mesh refinement• Triangulate a given set of points.• Delaunay property:
No point is contained within the circumcircle of a triangle.
• Quality property:No bad triangles—i.e., triangles with an angle > 120o.
• Mesh refinement:Fix bad triangles through an iterative algorithm.
Retriangulation
Cavity: all triangles whose circumcircle contains new point.
Quality constraint may not hold for all new triangles.
Sequential mesh refinementMesh m = /* read input mesh */Worklist wl = new Worklist(m.getBad());foreach triangle t in wl {Cavity c = new Cavity(t);c.expand();c.retriangulate();m.updateMesh(c);wl.add(c.getBad());
}
• Cavities are contiguous “regions” in the mesh.• Worst-case cavities can encompass the whole mesh.
Parallelization• Computation over complex, unstructured graphs
Mesh = Heap-allocated graph. Nodes = triangles. Edges = adjacency • Atomicity: Cavities must be retriangulated atomically.• Non-overlapping cavities can be processed in parallel.• Seems impossible to handle with static analysis:– Shape of data structure changes greatly over time.– Shape of data structure is highly input-dependent. – Without deep algorithmic knowledge, impossible to say if
statically if cavities will overlap. • Lots of recent work, notably by Pingali et al.
List of similar applications
• Delaunay mesh refinement, Delaunay triangulation• Agglomerative clustering, ray tracing• Social network maintenance• Minimum spanning tree, Maximum flow• N-body simulation, epidemiological simulation• Sparse matrix-vector multiplication, sparse Cholesky
factorization• Belief propagation, survey propagation in Bayesian inference• Iterative dataflow analysis, Petri net simulation• Finite-difference PDE solution
Locality of updates in Chorus
Cavity
• On a mesh of ~100,000 triangles from Lonestar benchmarks: Average cavity size = 3.75 triangles.Maximum cavity size = 12 triangles
• Average-case locality the essence of parallelism. • Chorus: parallel computation driven by
“neighborhoods” in heaps.
Heaps, regions, assemblies• Heap = directed graph
Nodes = objectsLabeled edges = pointers
• Region = induced subgraph• Assembly =
region + thread of control
Typically speculativeand shortlived.
• Assembly class = set of local variables + set of guarded updates + constructor + public
variables.• Program = set of classes • Synchronization happens in guard evaluation.
Programs, assembly classes
busyexecutingupdate
terminated
ready to be preemptedor execute next update
:: Guard: Update
• g is a condition on thelocal variables and owned objects of
Guards can merge assemblies:: merge (u.f): S
:: merge (u.f) when g: S
u f
• gets a bigger region, keeps local state
• dies.• must be in ready state
while merge happens
• Split into assemblies of
class T.• Other assemblies not
affected.• Not a synchronization
construct.
Updates can split an assemblysplit(T)
• Attempts to access objects outside region lead to exceptions.
Local updates
x = u.f;x.f = y;
u f
Delaunay mesh refinement• Use two assembly classes: Triangle and Cavity. – Cavity = local region in mesh.
• Each triangle:– Determines if it is bad (local check).– If so, merges with neighbors to become cavity.
• Each cavity:– Determines if it is complete (local check).– If no, merges with a neighbor.– If yes, retriangulates (locally) and splits.
Delaunay mesh refinement: sketchassembly Triangle:: ... action:: merge (v.f, Cavity) when isBad:
skip assembly Cavity:: ... action:: merge (v.f) when (not isComplete):
...
isComplete: retriangulate(); split(Triangle)
Delaunay mesh refinement: sketchassem Triangle:: ... action:: merge (v.f, Cavity, u) when bad?:
skip assem Cavity:: ... action:: merge (v.f) when (not complete?):
skip
complete?: retriangulate(); split(Triangle)
What happens on a conflict?
• Cavity i “absorbed” by cavity j.• Cavity j now has some
“unnecessary” triangles. • j will later split.
Boruvka’s algorithm for minimum spanning tree
• Assembly = spanning tree • Initially, each assembly has
one node.• As algorithm progresses, trees
merge.
Race-freedom• No aliasing, only
ownership transfer.
• can merge with only when is not in the middle of an update.
Deadlock-freedom• Classic definition: Process P waits for a resource from Q and
vice versa.• Deadlock in Chorus:– has a locally enabled merge with – has a locally enabled merge with – No other progress is possible.
• But one of the merges can always be carried out. (An assembly can always be killed at its ready state.)
u
JChorus• Chorus + sequential
Java.• Assembly classes in
addition to object classes.
7: assembly Cavity { 8: action { // expand cavity 9: merge(outgoingedges, TriangleObject t): 10: { outgoingedges.remove(t);11: frontier.add(t);12: build(); } 13: } 14: Set members; Set border;15: Queue frontier; // current frontier16: List outgoingedges; // outgoing edges on which to merge17: TriangleObject initial; ...
Division-based implementation• Division = set of assemblies
mapped to a core.• Local access:
Merge-actions within a divisionSplit-actionsLocal updates
• Remote access:Merge-actions issued across divisions
• Uses assembly-level locks.
Implementation strategies• Adaptive divisions. Heuristic for reducing
the number of remote merges.• During a merge, not only the target assembly, but
also assemblies reachable by k pointer indirections, are migrated.
• Adaptation heuristic does elementary load balancing.
• Union-find data structure to relate objects and assemblies that they belong to
• Needed for splits and merges.• Token-passing for deadlock prevention and
termination detection.
Experiments: Delaunay refinement from Lonestar benchmarks
• Large dataset from Lonestar benchmarks.– 100,364 triangles.– 47,768 initially bad.
• 1 to 8 threads.• Competing approaches:– Object-level locking– DSTM (Software transactions)
Locality: mesh snapshots
The initial mesh and divisions Mesh after several thousand retriangulations
Delaunay: Speedup over sequential
Delaunay: Self-relative speedup
Delaunay: Conflicts
Related models• Threads + explicit locking: Global heap abstraction, arbitrary
aliasing. • Software transactions: Burden of reasoning passed to
transaction manager. In most implementations, heap is viewed as global.
• Static data partitioning: Unpredictable nature of the computation makes static analysis hard.
• Actors: Based on low-level messaging. If sending references, potential of races. If copying triangles, inefficient.
• Pingali et al’s Galois: Same problem, but ours is an alternative.
More information
Parallel programming with object assemblies.Roberto Lublinerman, Swarat Chaudhuri, Pavol Cerny.OOPSLA 2009.
http://www.cse.psu.edu/~swarat/chorus