jefferson-coop


8/7/2019 jefferson-coop

http://slidepdf.com/reader/full/jefferson-coop 1/35

 

This work was performed under the auspices of the U.S. Department of Energy

by the University of California, Lawrence Livermore National Laboratory, under contract No. W-7405-Eng-48.

Cooperative Parallelism:

An evolutionary programming model 

for exploiting massively parallel systems

David Jefferson, John May,

Nathan Barton, Rich Becker, Jarek Knap

Gary Kumfert, James Leek, John Tannahill

Lawrence Livermore National Laboratory


 

Blue Gene / L

65,536 x 2 processors, 360 Tflops (peak)

Petaflop (peak) machine in 2 years

Petaflop (sustained) in 5 years


 

Co-op is a new programming paradigm and

components model for petascale simulation 

• Petascale performance driven by need for multiphysics, multiscale models

– fluid -- molecule

– continuum metal -- crystal

– plasma -- charged particle

– classical -- quantum

• Multiphysics, multiscale models call for a simulation components architecture

– whole, parallel simulation codes used as building blocks in larger simulations

– allows composition (federation) and reuse of codes already mature and trusted

• Multiphysics, multiscale models naturally exhibit MPMD parallelism

– different subsystems, or length and time scales, require multiphysics

– multiphysics most efficient with different codes in parallel

• Efficient use of petascale resources requires more dynamic simulation algorithms

– much more flexible use of resources: dynamic (sub)allocation of processor nodes

– adaptive sampling family of multiscale algorithms


 

Co-op allows parallel simulations to be

used as components in larger computations

• Large parallel models treated as single objects:

– coupled with little knowledge of each other’s internals

• Coupled models:

– different languages

– different parallel decomposition

– different physics

• Components:

– dynamically launched

– internally parallel

– externally parallel

– communicate in parallel

[Figure: components coupled along time, space, state space, and scale axes; ensemble coupling for parametric sensitivity or optimization]


 

Strain rate localization can be predicted

with multiscale expanding cylinder model

1/8 exploding cylinder 

• expands radially

• rings with reflecting

strain rate waves

• develops diagonal

shear bands


Classic SPMD

embedding of fine-

scale calculations

• nodes statically

allocated and

scheduled

• fine scale models executed sequentially

[Figure: one major cycle across 64 nodes; legend: fine scale physics, coarse scale model]


 

Adaptive Sampling: a class of dynamic

algorithms for multiscale simulation

• Apply fine scale model where continuum model is

invalid…

• …but just a sample of the

elements

• Elsewhere, interpolate material

response function from results

previously calculated

• Much less fine scale work;

remaining computation may be

seriously unbalanced, however.

• More than an order of magnitude

of performance improvement 

may be achieved.

• Adaptive sampling is not AMR!

[Figure annotations: regions where the coarse model is generally accurate vs. where coarse model assumptions break down]
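The adaptive-sampling decision described above can be sketched in a few lines. This is an illustrative toy, not the real FSDB interface: `fine_scale`, `ResponseDB`, and the 1-D state are stand-ins for the actual fine-scale model and material-response database.

```python
def fine_scale(x):
    """Placeholder for the expensive fine-scale model evaluation."""
    return x * x

class ResponseDB:
    """Tabulates (state, response) pairs; 1-D nearest-neighbor lookup."""
    def __init__(self):
        self.points = []                      # list of (state, response)

    def nearest(self, x):
        if not self.points:
            return None
        return min(self.points, key=lambda p: abs(p[0] - x))

    def insert(self, x, r):
        self.points.append((x, r))

def material_response(db, x, tol):
    """Interpolate (here: reuse) a nearby tabulated response if one lies
    within tol; otherwise run the fine-scale model and tabulate the result
    so later queries can avoid it."""
    hit = db.nearest(x)
    if hit is not None and abs(hit[0] - x) <= tol:
        return hit[1], "interpolated"         # cheap path
    r = fine_scale(x)                         # expensive path (a sample)
    db.insert(x, r)
    return r, "evaluated"
```

When element states cluster, only a small fraction of calls take the expensive path, which is exactly the evaluation fraction discussed later in the deck.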


 

Co-op model adds layer of dynamic MPMD

parallelism to familiar SPMD paradigm

New parallelism layer:

– MPMD federation: composed of symponents that use remote method invocation (RMI)

Familiar parallelism layers:

– SPMD symponent: composed of processes that use MPI

– Process: composed of threads that use shared variables, locks, etc.

– Thread: sequential, with vector, pipeline, or multi-issue parallelism


 

Adaptive sampling app with integrated

fine scale DB

[Diagram: ALE3D + CouplerLib + FSDB form the continuum (CSM) symponent, coupled to FSM Master / FSM Servers symponents]

n = 100 processes
z/p = 10⁴ zones/process
z = 10⁶ zones
T = 10⁴ timesteps
τ = 100 µsec/timestep
f = 10⁻² (eval fraction)


 

Co-op Architecture

• NodeSet allocate / deallocate

– Contiguous node sets only

– Suballocation from original allocation

– Algorithms somewhat like memory allocation

• Symponent launch

– Array of symponents can be launched on array of nodesets

by single call

• Component termination detection

– Parent symponent notified if child terminates

• Component kill

– Must work when target is deadlocked, looping, etc.
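The "somewhat like memory allocation" analogy can be made concrete with a first-fit sketch over contiguous node ranges. `NodeSetAllocator` is an illustrative name, not the Co-op API; the real allocator also handles suballocation hierarchies.

```python
class NodeSetAllocator:
    """First-fit suballocation of contiguous node ranges from a job's
    static allocation, analogous to a simple memory allocator."""
    def __init__(self, n_nodes):
        self.free = [(0, n_nodes)]          # sorted list of (start, length)

    def allocate(self, k):
        """Return a contiguous NodeSet of k nodes, or None if no hole fits."""
        for i, (start, length) in enumerate(self.free):
            if length >= k:
                if length == k:
                    del self.free[i]        # hole consumed exactly
                else:
                    self.free[i] = (start + k, length - k)
                return (start, k)
        return None                          # fragmentation: no hole large enough

    def deallocate(self, nodeset):
        """Return a NodeSet and coalesce adjacent free ranges."""
        self.free.append(nodeset)
        self.free.sort()
        merged = [self.free[0]]
        for s, l in self.free[1:]:
            ps, pl = merged[-1]
            if ps + pl == s:                 # adjacent: merge into one range
                merged[-1] = (ps, pl + l)
            else:
                merged.append((s, l))
        self.free = merged
```

Coalescing on free is what keeps large contiguous sets available for later symponent launches, just as in a heap allocator.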


 

Remote Method Invocation (RMI)

• General semantics

– Operation done by a thread on a symponent

– It can be nonblocking: caller gets a ticket and can later check, or wait for, completion of the RMI

– Exceptions supported

– Concurrent RMIs on same symponent executed in nondeterministic order 

• Three kinds of RMI recognized

– Sequential body, threaded execution

• Inter-thread synchronization required

• MPI in body not permitted

• Thread concurrency limited by OS

– Parallel body, serialized execution

• Atomic

• No recursion; no circularity (results in deadlock)

• MPI permitted and needed in body

– One way

• “Call” does not involve a return

• Essentially an asynchronous, one-sided “active” message

– Others might be recognized in the future
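The ticket semantics above can be sketched with ordinary threads. `Ticket` and `rmi_nonblocking` are illustrative names under the assumption of one worker thread per call; they are not Babel's or Co-op's actual API.

```python
import threading

class Ticket:
    """Handed back by a nonblocking RMI; the caller can test for or
    wait on completion, and exceptions propagate on wait()."""
    def __init__(self):
        self._done = threading.Event()
        self.result = None
        self.exception = None

    def test(self):
        """Nonblocking completion check."""
        return self._done.is_set()

    def wait(self):
        """Block until the RMI completes; re-raise any exception."""
        self._done.wait()
        if self.exception is not None:
            raise self.exception
        return self.result

def rmi_nonblocking(method, *args):
    """Invoke method(*args) asynchronously and return a Ticket."""
    t = Ticket()
    def run():
        try:
            t.result = method(*args)
        except Exception as e:
            t.exception = e                 # exceptions supported, per slide
        finally:
            t._done.set()
    threading.Thread(target=run).start()
    return t
```

A one-way RMI would simply skip creating the ticket, matching the "asynchronous, one-sided active message" description.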


 

More about RMI

• Inter-symponent synchronization

– RMIs queued, and executed only when the callee executes its AtConsistentState() method

– Last RMI signaled by special RMI: continue()

• Intra-symponent synchronization

– Sequential-body, threaded RMIs must use proper POSIX inter-thread synchronization

• Implementation

– Babel RMI over TCP

– Persistent connections at the moment (except for one-way)

• Soon to be non-persistent

– Future implementations over:

• MPI-2

• UDP

• Native packet transports
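The AtConsistentState() / continue() discipline can be sketched as a queue drained at safe points. `Symponent` here is a toy model of the queueing behavior, not the Co-op runtime; a sentinel stands in for the continue() system RMI.

```python
from queue import Queue

class Symponent:
    """Incoming RMIs are queued rather than interrupting the callee;
    they run only when the callee reaches a consistent state, and the
    continue() system RMI marks the end of the batch."""
    def __init__(self):
        self._pending = Queue()

    def rmi(self, fn, *args):
        """Caller side: enqueue the call; the callee is not interrupted."""
        self._pending.put((fn, args))

    def signal_continue(self):
        """Stand-in for the continue() system RMI: end of this batch."""
        self._pending.put(None)

    def at_consistent_state(self):
        """Callee side: invoked between timesteps; drains queued RMIs
        until continue() is seen, then returns so work can resume."""
        results = []
        while True:
            item = self._pending.get()
            if item is None:                # continue(): last RMI this round
                return results
            fn, args = item
            results.append(fn(*args))
```

This is why concurrent RMIs on the same symponent execute in nondeterministic order: ordering is whatever the queue saw, not what the callers intended.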


 

Babel and Co-op are intimately related

• Symponents are Babel objects

• Co-op RMI implemented over Babel RMI

• Symponent APIs expressed in Babel’s SIDL language

• Any thread with a reference to a symponent can call RMIs on it

• References can be passed as args, results

• Caller and callee can be in different languages

• Co-op rests totally on Babel for 

– RMI syntax

– SIDL specification language

– Language interoperability

– Parts of implementation of RMI


MPMD refactoring and parallelized fine scale models

[Figure: timeline across 64 nodes; legend: fine scale physics, coarse scale model]


Adaptive Sampling

• evaluation fraction is the most critical performance parameter

[Figure: timeline across 64 nodes; legend: full fine scale simulations, coarse scale model, interpolated fine scale behavior]
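A back-of-envelope model shows why the evaluation fraction dominates. The formula and the cost numbers below are illustrative assumptions, not measurements from the deck: with fine-scale cost c_fine per element and query/interpolation cost c_query, the per-element cost falls from c_fine to f·c_fine + (1-f)·c_query.

```python
def adaptive_sampling_speedup(f, c_fine, c_query):
    """Back-of-envelope speedup over always running the fine-scale model:
    only a fraction f of elements pay c_fine; the rest pay c_query."""
    return c_fine / (f * c_fine + (1 - f) * c_query)

# f = 1e-2 (the eval fraction quoted earlier); costs are illustrative
print(round(adaptive_sampling_speedup(1e-2, 1000.0, 1.0), 1))  # prints 91.0
```

With c_query much smaller than c_fine, speedup approaches 1/f, consistent with the "more than an order of magnitude" claim earlier in the deck.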


 


Adaptive sampling + active load balancing

yields dramatic speedup

[Figure: timeline across nodes; legend: adaptive sample fine scale simulations, coarse scale model, database retrieval and interpolation]


 

Performance of adaptive sampling

using the Co-op programming model

[Chart: wallclock time (y-axis, 0–100,000) vs. Sim Time (µsec), 0.0–25.0 on the x-axis; series: adaptive sampling, adaptive sampling with load balancing, classic model embedding]


 

Conclusions

• MP/MS simulation drives need for petascale

performance

• MP/MS simulation requires

– componentized model construction

– MPMD execution

– dynamic instantiation of components

• hence dynamic node allocation

– language interoperability

• Adaptive Sampling is amazingly powerful


 

End


 

PSI Project Overview

David Jefferson

Lawrence Livermore National Lab


Distribution of Coarse-scale and

Fine-scale Models across Processors

Coarse-scale model

Wallclock time

One coarse-scale timestep

Many instances of fine-scale

model

. . .

. . .



time

64 nodes

• remote fine scale

models

• nodes dynamically 

allocated and

scheduled

• improved performance

due to better balance

fine scale physics

coarse scale model

MPMD refactoring allows better 

scheduling of fine scale model executions


time

125 nodes

Additional parallelism then becomes

available

• fine scale model executions independent

• “nearest neighbor” DB queries are mostly independent and

easily parallelizable as well

adaptive sample fine

scale simulations

coarse scale model

database retrieval and

interpolation


Multiscale material science application


 

Multiscale material science application

with parallel FS database

[Diagram: ALE3D + CouplerLib form the CSM symponent, coupled to FSM Master / FSM Servers symponents and to replicated DB Master / DB Servers symponents (DB Clone 1 … DB Clone k)]

n = 100 processes
z/p = 10⁴ zones/process
z = 10⁶ zones
T = 10⁴ timesteps
τ = 100 µsec/timestep
f = 10⁻² (eval fraction)

query(), insert(): max = z/τ, mean = f·z/τ
runFSM(): max = z/τ, mean = f·z/τ



 

The PSI Project

• Development of the Co-op model of hybrid componentized MPMD computation

– Definition of the computational model and semantic issues

– Implementation of Co-op runtime system

– Implementation of extensions to Babel

• Development of multiscale simulation technology using Co-op

– Theory and practice of adaptive sampling

– Implementation of adaptive sampling coupler within Co-op framework

– Implementation of Fine Scale Model “database” suitable for adaptive sampling

• M-tree database with nearest neighbor queries



 

Co-op Capabilities

• NodeSet allocate/deallocate

– Suballocation of a nodeset of any size from the job’s static allocation

– Free sets of nodesets, not nodes

• Symponent launch / kill

– Any process can launch an SPMD executable as a new symponent with any number of processes on a nodeset whose size divides n.

– Parent-child hierarchy: parent process notified of child death; child killed if parent dies

– Launch uses SLURM srun

– Runaway or wedged symponent can be killed & its nodeset recovered

• Symponent remote references

– Symponents can have remote references to one another, which they use for making RMI calls

– Remote references can be used as arguments in RMI calls

• Symponents and Babel

– Symponents are Babel objects, and present SIDL interfaces

– Symponents inherit interfaces in a type hierarchy, so they can be treated in object-oriented fashion

– A symponent RMI is a Babel RMI

• Full type safety

• Language independence / interoperability
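The parent-child launch/kill semantics above can be sketched as a small state model. `Proc` and its methods are illustrative, not the Co-op API; in the real system the kill path also reclaims the child's nodeset.

```python
class Proc:
    """Toy model of the symponent hierarchy: a parent is notified when a
    child terminates, and killing a symponent takes its subtree with it."""
    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []
        self.alive, self.notifications = True, []
        if parent is not None:
            parent.children.append(self)

    def terminate(self):
        """Normal exit: the parent symponent is notified."""
        self.alive = False
        if self.parent is not None and self.parent.alive:
            self.parent.notifications.append(self.name)

    def kill(self):
        """Forced kill: must work even if the target is deadlocked or
        looping, so it acts from outside and recurses over children,
        letting their nodesets be recovered."""
        self.alive = False
        for c in self.children:
            if c.alive:
                c.kill()
```

The recursive kill is what prevents orphaned symponents from holding node allocations after their parent dies.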



 

Co-op Capabilities

• Symponent RMI & synchronization

– RMI calls are from a thread to a symponent

– RMIs are one-sided, unexpected, and by default nonblocking

– Any number of in- and out-args of any size and type can be used

– Full exception-throwing capability

– RMIs can only be executed when the callee calls atConsistentState()

– Special “system” RMIs inherited by all symponents: continue() and kill()

– Two kinds of user RMIs

• Sequential body, threaded execution, executes in Rank 0 only

– Body executes in the rank 0 process only

– Body is sequential, and does not need MPI

– Concurrent RMIs must synchronize with one another as threads

• Parallel body, serialized execution, executes in all processes

– Each may be parallel, running on all processes of the callee symponent, but multiple RMI calls are serially executed, and hence atomic

– Normally use MPI 


Adaptive Sampling substitutes DB retrieval and interpolation for full fine scale evaluation

• subscale results tabulated in a DB

• faster DB queries and interpolations substituted for slower fine scale model executions

[Figure: timeline across 64 nodes; legend: adaptive sample fine scale simulations, coarse scale model, database retrieval and interpolation]


 

Current implementation of Co-op runs multiscale models on Linux cluster

[Diagram build 1: Linux]


 

Current implementation of Co-op runs multiscale models on Linux cluster

[Diagram build 2: Co-oplib and MPI over Linux, plus the Co-opd daemon; launch via SLURM / srun]

Not shown: SLURM daemons and srun() processes


 

Current implementation of Co-op runs multiscale models on Linux cluster

[Diagram build 3: the CSM symponent runs on Babel, Co-oplib, and MPI over Linux; Co-opd handles launch via SLURM / srun; RMI runs over UDP]

Not shown: SLURM daemons and srun() processes


Current implementation of Co-op runs multiscale models on Linux cluster

[Diagram build 4: the CSM and FSM symponents run on Babel, Co-oplib, and MPI over Linux; Co-opd handles launch via SLURM / srun; RMI runs over UDP]

Not shown: SLURM daemons and srun() processes