presented by high productivity language and systems: next generation petascale programming wael r....

Presented by

High Productivity Language and Systems: Next Generation Petascale Programming

Wael R. Elwasif, David E. Bernholdt, and Robert J. Harrison

Computer Science & Mathematics DivisionOak Ridge National Laboratory

2 Elwasif_0611

Guiding the development of the next generation HPC programming environments

Candidate Languages: Chapel (Cray) Fortress

(Sun) X10 (IBM)

Chart the course towards high productivity, high performance scientific

programming on Petascale machines

by the year 2010.

Joint project with: Argonne National Laboratory Lawrence Berkeley National

Laboratory Rice University

Part of DARPA HPCS program

3 Elwasif_0611

Revolutionary approach to large scale parallel programming

Emphasis on performance and productivity

Not SPMD Lightweight “threads”, LOTS of them Different approaches to locality awareness/management

High level (sequential) language constructs Rich array data types (part of the base languages) Strongly typed object oriented base design Extensible language model Generic programming

4 Elwasif_0611

Concurrency: the next generation

Single initial thread of control Parallelism through language constructs

True global view of memory, one sided access model

Support task and data parallelism

“Threads” grouped by “memory locality”

Extensible, rich distributed array capability

Advanced concurrency constructs: Parallel loops Generator-based looping and distributions Local and remote futures

5 Elwasif_0611

What about productivity?

Index sets/regions for arrays “Array language” (Chapel, X10)

Safe(r) and more powerful language constructs Atomic sections vs locks Sync variables and futures Clocks (X10)

Type inference

Leverage advanced IDE capabilities

Units and dimensions (Fortress)

Component management, testing, contracts (Fortress)

Math/science-based presentation (Fortress)

6 Elwasif_0611

An exercise in MADNESS!!

Multiresolution ADaptive NumErical Scientific Simulation Multiresolution,

multiwavelet representation for quantum chemistry and other domains

Adaptive mesh in 1-6+ dimensions

Very dynamic refinement

Distribution by subtrees == spatial decomposition

Sequential code very compactly written using recursion

Initial parallel code using MPI is about 10x larger

7 Elwasif_0611

The MADNESS continues

Explicit MPI

template <typename T> void Function<T>::_refine(OctTreeTPtr& tree) {

if (tree->islocalsubtreeparent() && isremote(tree)) {

FOREACH_CHILD(OctTreeTPtr, tree, bool dorefine;

comm()->Recv(dorefine, tree->rank(), REFINE_TAG);

if (dorefine) {

set_active(child); set_active(tree); _project(child);

}

_refine(child););

} else {

TensorT* t = coeff(tree);

if (t) {

TensorT d = filter(*t);

d(data->cdata->s0) = 0.0;

bool dorefine = (d.normf() > truncate_tol(data->thresh,tree->n()));

if (dorefine) unset_coeff(tree);

FOREACH_REMOTE_CHILD(OctTreeTPtr, tree,

comm()->Send(dorefine, child->rank(), REFINE_TAG);

set_active(child););

FORIJK(OctTreeTPtr child = tree->child(i,j,k);

if (!child && dorefine) child = tree->insert_local_child(i,j,k);

if ( child && islocal(child)) {

if (dorefine) { _project(child); set_active(child); }

_refine(child);

});

} else {


comm()->Send(false, child->rank(), REFINE_TAG););

FOREACH_LOCAL_CHILD(OctTreeTPtr, tree, _refine(child););

} } }

Explicit MPI

template <typename T> void Function<T>::_refine(OctTreeTPtr& tree) {

if (tree->islocalsubtreeparent() && isremote(tree)) {

FOREACH_CHILD(OctTreeTPtr, tree, bool dorefine;

comm()->Recv(dorefine, tree->rank(), REFINE_TAG);

if (dorefine) {

set_active(child); set_active(tree); _project(child);

}

_refine(child););

} else {


if (t) {


d(data->cdata->s0) = 0.0;

bool dorefine = (d.normf() > truncate_tol(data->thresh,tree->n()));

if (dorefine) unset_coeff(tree);


comm()->Send(dorefine, child->rank(), REFINE_TAG);

set_active(child););

FORIJK(OctTreeTPtr child = tree->child(i,j,k);

if (!child && dorefine) child = tree->insert_local_child(i,j,k);

if ( child && islocal(child)) {

if (dorefine) { _project(child); set_active(child); }

_refine(child);

});

} else {


comm()->Send(false, child->rank(), REFINE_TAG););

FOREACH_LOCAL_CHILD(OctTreeTPtr, tree, _refine(child););

} } }

Cilk-like multithreaded all of the HPLS solutions look like this

Cilk-like multithreaded all of the HPLS solutions look like this

The complexity of the MPI code mostly arises from using an SPMD model to maintain a consistent distributed data structure.

The explicitly distributed code must work on active and inactive nodes – not necessary with global-view and remote activity creation.

template <typename T> void Function::_refine (OctTreeTPtr tree) {

FOREACH_CHILD(OctTreeTPtr, tree, _project(child););



if (d.normf() > truncate_tol(data->thresh,tree->n())) {

unset_coeff(tree);

FOREACH_CHILD(OctTreeTPtr, tree, spawn _refine(child););

} }

8 Elwasif_0611

Tradeoffs in HPLS language design Emphasis on parallel safety (X10) vs. expressivity

(Chapel, Fortress)

Locality control and awareness: X10: explicit placement and access Chapel: user-controlled placement, transparent access Fortress: placement “guidance” only, local/remote access

blurry (data may move!!!) What about mental performance models?

Programming language representation: Fortress: Allow math-like representation Chapel, X10: Traditional programming language front end How much do developers gain from mathematical

representation?

Productivity/Performance tradeoff Different users have different “sweet-spots”

9 Elwasif_0611

Remaining challenges

(Parallel) I/O model

Interoperability with (existing) languages and programming models

Better (preferably portable) performance models and scalable memory models Especially for machines with 1M+ processors

Other considerations: Viable gradual adoption strategy Building a complete development ecosystem

10 Elwasif_0611

Contacts

Wael R. ElwasifComputer Science and Mathematics Division(865) [email protected]

10 Elwasif_0611

David E. BernholdtComputer Science and Mathematics Division(865) [email protected]

Robert J. HarrisonComputer Science and Mathematics Division(865) [email protected]

presented by high productivity language and systems: next generation petascale programming wael r....

Documents

refineocttreetptr tree

coefftree foreach

tag foreach

distributed code

mpi code

tag set

activechild set

child islocalchild