swift: a parallel scripting for applications at the petascale and beyond

82
Prof N.B. Venkateswarlu GVPCEW, Visakhapatnam [email protected]

Upload: nagasuri-bala-venkateswarlu

Post on 26-Jan-2017

116 views

Category:

Engineering


1 download

TRANSCRIPT

Page 1: Swift: A parallel scripting for applications at the petascale and beyond

Prof N.B. VenkateswarluGVPCEW, [email protected]

Page 2: Swift: A parallel scripting for applications at the petascale and beyond

First Let me thank the organizers, especially Prof MHM Krishna Prasad Garu(For whom I am an informal Ph.D guide/mentor).

Page 3: Swift: A parallel scripting for applications at the petascale and beyond

3

Outline

• Introduction• Swift Language Introduction (Swift/K and

Swift/T)• Swift Coding examples illustrated

Apple Swift is different.

Page 4: Swift: A parallel scripting for applications at the petascale and beyond

As the theme of the workshop is research guidance, I would like to recall my case while I am doing my PhD

Page 5: Swift: A parallel scripting for applications at the petascale and beyond

Community Clusters – Do You Want to join me. It is not the place to talk more. You can be in touch with me at [email protected]

Page 6: Swift: A parallel scripting for applications at the petascale and beyond

Being an old CSE teacher, I used to ask students how Computer is used during 1970s and 1980s. Of course, today it is used mostly for

entertainment. Any one would like to share?.

Page 7: Swift: A parallel scripting for applications at the petascale and beyond

In the same sense, how really the High Performance Computing is becoming necessary today and coming years.

An example, Weka.

Page 8: Swift: A parallel scripting for applications at the petascale and beyond

Sunway TaihuLight - Sunway MPP, Sunway SW26010 260C 1.45GHz,

Sunway, 93Peta Flops

8

Page 10: Swift: A parallel scripting for applications at the petascale and beyond

The Scientific Computing Campaign

• Swift addresses most of these components10

THINK about what to run next

RUN a battery of tasks

COLLECT results

IMPROVE methods

and codes

Page 11: Swift: A parallel scripting for applications at the petascale and beyond

http://www.scidacreview.org/1002/html/swift.html

Page 12: Swift: A parallel scripting for applications at the petascale and beyond

http://swift-lang.org/main/http://swift-lang.org/tryswift/

Page 13: Swift: A parallel scripting for applications at the petascale and beyond

Swift

13

Swift is a data-flow oriented coarse grained scripting language that supports dataset typing and mapping, dataset iteration, conditional branching, and procedural composition.Swift programs (or workflows) are written in a language called Swift.Swift scripts are primarily concerned with processing (possibly large) collections of data files, by invoking programs to do that processing. Swift handles execution of such programs on remote sites by choosing sites, handling the staging of input and output files to and from the chosen sites and remote execution of programs.

Page 14: Swift: A parallel scripting for applications at the petascale and beyond

14

• Swift is a parallel scripting language for multi-cores, clusters,

grids, clouds, and supercomputers– for loosely-coupled applications - application linked by exchanging files– debug on a laptop, then run on a Cray

• Swift scripts are easy to write– simple high-level functional language with C-like syntax– small Swift scripts can do large-scale work

• Swift is easy to run: contain all services for running workflow - in one Java application

– simple command line interface• Swift is fast: based on an efficient execution engine

– scales readily to millions of tasks• Swift usage is growing:

– applications in neuroscience, proteomics, molecular dynamics, biochemistry, economics, statistics, earth systems science, and more.

Page 15: Swift: A parallel scripting for applications at the petascale and beyond

15

Clouds: Magellan,Amazon,

FutureGrid, BioNimbus

Challenge:complex variety of

computing resources

• Parallel distributed computing is HARD

• Swift harnesses diverse resources with simple scripts

• Many applications are well suited to this approach

• Motivates collaboration through libraries of pipelines and sharing of data and provenance

Challenge: Complexity of parallel computing

Page 16: Swift: A parallel scripting for applications at the petascale and beyond

16

When do you need Swift?Typical application: protein-ligand docking for drug screening

2M+ ligands

(B)

O(100K)drug

candidates

Tens of fruitfulcandidates for wetlab & APS

O(10)proteinsimplicatedin a disease

1Mcompute

jobs

X …

T1af7 T1r69T1b72

Page 17: Swift: A parallel scripting for applications at the petascale and beyond

17

Submit host(Laptop, Linux login node…)

Workflowstatus

and logs

Java applicationCompute

nodesf1

f2

f3

a1

a2

Data serverf1 f2 f3

Provenancelog

scriptAppa1

Appa2

sitelist

applist

Filetransport

Swift supports clusters, grids, and supercomputers.

Solution: parallel scripting for high level parallellism

Page 18: Swift: A parallel scripting for applications at the petascale and beyond

Goals of the Swift language

Swift was designed to handle many aspects of the computing campaign

• Ability to integrate many application components into a new workflow application

• Data structures for complex data organization• Portability- separate site-specific configuration from application logic• Logging, provenance, and

plotting features

• Today, we will focus on supporting multiple language applications at large scale

18

THINK

RUN

COLLECT

IMPROVE

Page 19: Swift: A parallel scripting for applications at the petascale and beyond

19

\A Summary of Swift in a nutshell•Swift scripts are text files ending in .swift The swift command runs on any host, and executes these scripts.• swift is a Java application, which you can install almost anywhere. On Linux, just unpack •the distribution tar file and add its bin/ directory to your PATH.

•Swift scripts run ordinary applications, just like shell scripts do. •Swift makes it easy to run these applications on parallel and remote computers(from laptops to supercomputers). If you can ssh to the system, Swift can likely run applications there.•The details of where to run applications and how to get files back and forth are described in configuration files separate from your program. Swift speaks ssh, PBS, Condor, SLURM, LSF, SGE, Cobalt, and Globus to run applications, and scp, http, ftp, and GridFTP to move data.•The Swift language has 5 main data types: boolean, int, string, float, and file.

Collections of these are dynamic, sparse arrays of arbitrary dimension and structures of scalars and/or arrays defined by thetype declaration.•Swift file variables are "mapped" to external files.• Swift sends files to and from remote systems for you automatically.•Swift variables are "single assignment": once you set them you can not change them (in a given block of code). This makes Swift a natural, "parallel data flow" language. This programming model keeps your workflow scripts simple and easy to write and understand.

Page 20: Swift: A parallel scripting for applications at the petascale and beyond

20

\A Summary of Swift in a nutshell

Swift scripts are text files ending in .swift The swift command runs on any host, and executes these scripts.• This programming model keeps your workflow scripts simple and easy to write and •understand.•Swift lets you define functions to "wrap" application programs, and to cleanly structure more complex scripts. Swift app functions take files and parameters as inputs and return files as outputs.•A compact set of built-in functions for string and file manipulation, type conversions, •high level IO, etc. is provided. •Swift’s equivalent of printf() is tracef(), with limited and slightly different format codes.

•Swift’s parallel foreach {} statement is the workhorse of the language, and executes all •iterations of the loop concurrently. The actual number of parallel tasks executed is based •on available resources and settable "throttles".•Swift conceptually executes all the statements, expressions and function calls in your• program in parallel, based on data flow. These are similarly throttled based on available •resources and settings.•Swift also has if and switch statements for conditional execution. These are seldom needed •in simple workflows but they enable very dynamic workflow patterns to be specified.

Page 21: Swift: A parallel scripting for applications at the petascale and beyond

21

Swift programming model:all progress driven by concurrent

dataflow

• F() and G() implemented in native code or external programs• F() and G()run in concurrently in different processes• r is computed when they are both done• This parallelism is automatic• Works recursively throughout the program’s call graph

(int r) myproc (int i, int j){ int x = F(i); int y = G(j); r = x + y;}

Page 22: Swift: A parallel scripting for applications at the petascale and beyond

22

Data-flow driven execution

1 (int r) myproc (int i)2 {3 j = f(i); 4 k = g(i);5 r = j + k;6 }

7 f() and g() are computed in parallel8 myproc() returns r when they are done

9 This parallelism is AUTOMATIC10 Works recursively down the program’s call graph

22

F() G()

i

+j k

r

Page 23: Swift: A parallel scripting for applications at the petascale and beyond

http://csg.csail.mit.edu/Users/arvind/

23

Aravind

Page 24: Swift: A parallel scripting for applications at the petascale and beyond

24

Swift programming model• Data typesint i = 4;string s = "hello world";file image<"snapshot.jpg">;

• Shell accessapp (file o) myapp(file f, int i){ mysim "-s" i @f @o; }

• Structured datatypedef image file;image A[];type protein_run {

file pdb_in; file sim_out;}bag<blob>[] B;

Conventional expressionsif (x == 3) { y = x+2;s = strcat("y: ", y);}

Parallel loopsforeach f,i in A { B[i] = convert(A[i]);} Data flowmerge(analyze(B[0], B[1]), analyze(B[2], B[3]));Swift: A language for distributed parallel scripting, J. Parallel

Computing, 2011

Page 25: Swift: A parallel scripting for applications at the petascale and beyond

25

Hierarchical programming model

Top-level dataflow scriptuser-workflow.swift

Distributed dataflow evaluationData dependency resolution

Work stealing / load balancing

SWIG-generated wrappers

C C++ Fortran

User libraries

MPI

MPI?

Page 26: Swift: A parallel scripting for applications at the petascale and beyond

Support calls to embedded interpreters

26

We have plugins Tcl, Julia, and QtScript

Page 27: Swift: A parallel scripting for applications at the petascale and beyond

27

A simple Swift script: functions run programs

1 type image; // Declare a “file” type.2 3 app (image output) rotate (image input) {4 {5 convert "-rotate" 180 @input @output ;6 }7 8 image oldimg <"orion.2008.0117.jpg">;9 image newimg <"output.jpg">;10 11 newimg = rotate(oldimg); // runs the “convert” app

27

Page 28: Swift: A parallel scripting for applications at the petascale and beyond

28

A simple Swift script: functions run programs

1 type image; // Declare a “file” type.2 3 app (image output) rotate (image input) {4 {5 convert "-rotate" 180 @input @output ;6 }7 8 image oldimg <"orion.2008.0117.jpg">;9 image newimg <"output.jpg">;10 11 newimg = rotate(oldimg); // runs the “convert” app

28

“application” wrapping function

Page 29: Swift: A parallel scripting for applications at the petascale and beyond

29

A simple Swift script: functions run programs

1 type image; // Declare a “file” type.2 3 app (image output) rotate (image input) {4 {5 convert "-rotate" 180 @input @output ;6 }7 8 image oldimg <"orion.2008.0117.jpg">;9 image newimg <"output.jpg">;10 11 newimg = rotate(oldimg); // runs the “convert” app

29

Input fileOutput file

Actual filesto use

Page 30: Swift: A parallel scripting for applications at the petascale and beyond

30

A simple Swift script: functions run programs

1 type image; // Declare a “file” type.2 3 app (image output) rotate (image input) {4 {5 convert "-rotate" 180 @input @output ;6 }7 8 image oldimg <"orion.2008.0117.jpg">;9 image newimg <"output.jpg">;10 11 newimg = rotate(oldimg); // runs the “convert” app

30

Invoke the “rotate” function to run the

“convert” application

Page 31: Swift: A parallel scripting for applications at the petascale and beyond

31

Parallelism via foreach { }1 type image; 2 3 (image output) flip(image input) {4 app {5 convert "-rotate" "180" @input @output;6 }7 }8 9 image observations[ ] <simple_mapper; prefix=“orion”>;10 image flipped[ ] <simple_mapper; prefix=“flipped”>;11 12 13 14 foreach obs,i in observations {15 flipped[i] = flip(obs); 16 }

31

Name outputs based on index

Process all dataset members in parallel

Map inputs from local directory

Page 32: Swift: A parallel scripting for applications at the petascale and beyond

32

foreach sim in [1:1000] { (structure[sim], log[sim]) = predict(p, 100., 25.);}result = analyze(structure)

…1000

Runs of the“predict”

application

Analyze()

T1af7

T1r69

T1b72

Large scale parallelization with simple loops

Page 33: Swift: A parallel scripting for applications at the petascale and beyond

33

Nested loops generate massive parallelism

1. Sweep( )2. {3. int nSim = 1000;4. int maxRounds = 3;5. Protein pSet[ ] <ext; exec="Protein.map">;6. float startTemp[ ] = [ 100.0, 200.0 ];7. float delT[ ] = [ 1.0, 1.5, 2.0, 5.0, 10.0 ];8. 9. foreach p, pn in pSet {10. foreach t in startTemp {11. foreach d in delT {12. ItFix(p, nSim, maxRounds, t, d);13. }14. }15. }16. }17. 18. Sweep();

33

10 proteins x 1000 simulations x3 rounds x 2 temps x 5 deltas

= 300K tasks

Page 34: Swift: A parallel scripting for applications at the petascale and beyond

34

A more complex workflow: fMRI data processing

reorientRun

reorientRun

reslice_warpRun

random_select

alignlinearRun

resliceRun

softmean

alignlinear

combinewarp

strictmean

gsmoothRun

binarize

reorient/01

reorient/02

reslice_warp/22

alignlinear/03 alignlinear/07alignlinear/11

reorient/05

reorient/06

reslice_warp/23

reorient/09

reorient/10

reslice_warp/24

reorient/25

reorient/51

reslice_warp/26

reorient/27

reorient/52

reslice_warp/28

reorient/29

reorient/53

reslice_warp/30

reorient/31

reorient/54

reslice_warp/32

reorient/33

reorient/55

reslice_warp/34

reorient/35

reorient/56

reslice_warp/36

reorient/37

reorient/57

reslice_warp/38

reslice/04 reslice/08reslice/12

gsmooth/41

strictmean/39

gsmooth/42gsmooth/43gsmooth/44 gsmooth/45 gsmooth/46 gsmooth/47 gsmooth/48 gsmooth/49 gsmooth/50

softmean/13

alignlinear/17

combinewarp/21

binarize/40

reorient

reorient

alignlinear

reslice

softmean

alignlinear

combine_warp

reslice_warp

strictmean

binarize

gsmooth

Dataset-level workflow

Expanded (10 volume) workflow

Page 35: Swift: A parallel scripting for applications at the petascale and beyond

35

Complex parallel workflows can be easily expressed Example fMRI preprocessing script below is automatically parallelized

(Run snr) functional ( Run r, NormAnat a, Air shrink ){ Run yroRun = reorientRun( r , "y" );

Run roRun = reorientRun( yroRun , "x" );Volume std = roRun[0];Run rndr = random_select( roRun, 0.1 );AirVector rndAirVec = align_linearRun( rndr, std, 12, 1000, 1000, "81 3 3" );Run reslicedRndr = resliceRun( rndr, rndAirVec, "o", "k" );Volume meanRand = softmean( reslicedRndr, "y", "null" );Air mnQAAir = alignlinear( a.nHires, meanRand, 6, 1000, 4, "81 3 3" );Warp boldNormWarp = combinewarp( shrink, a.aWarp, mnQAAir );Run nr = reslice_warp_run( boldNormWarp, roRun );Volume meanAll = strictmean( nr, "y", "null" )Volume boldMask = binarize( meanAll, "y" );snr = gsmoothRun( nr, boldMask, "6 6 6" );}

(Run or) reorientRun (Run ir, string direction) { foreach Volume iv, i in ir.v { or.v[i] = reorient(iv, direction); } }

Page 36: Swift: A parallel scripting for applications at the petascale and beyond

36

SLIM

36

Page 37: Swift: A parallel scripting for applications at the petascale and beyond

37

SLIM: Swift sketchtype DataFile;type ModelFile;type FloatFile;

// Function declarationapp (DataFile[] dFiles) split(DataFile inputFile, int noOfPartition) {}

app (ModelFile newModel, FloatFile newParameter) firstStage(DataFile data, ModelFile model) {}

app (ModelFile newModel, FloatFile newParameter) reduce(ModelFile[] model, FloatFile[] parameter){}

app (ModelFile newModel) combine(ModelFile reducedModel, FloatFile reducedParameter, ModelFile oldModel){}

app (int terminate) check(ModelFile finalModel){}

Page 38: Swift: A parallel scripting for applications at the petascale and beyond

38

SLIM: Swift sketch// Variables to hold the input dataDataFile inputData <"MyData.dat">;ModelFile earthModel <"MyModel.mdl">;

// Variables to hold the finalized modelsModelFile model[];

// Variables for reduce stageModelFile reducedModel[];FloatFile reducedFloatParameter[];

// Get the number of partition from command line parameterint n = @toint(@arg("n","1"));

model[0] = earthModel;//Partition the input dataDataFile partitionedDataFiles[] = split(inputData, n);

Page 39: Swift: A parallel scripting for applications at the petascale and beyond

39

SLIM: Swift sketch// Iterate sequentially until the terminal condition is metiterate v { // Variables to hold the output of the first stage ModelFile firstStageModel[] < simple_mapper; location = "output", prefix = @strcat(v, "_"), suffix = ".mdl" >; file_dat firstStageFloatParameter[] < simple_mapper; padding = 3, location = "output", prefix = @strcat(v, "_"), suffix = ".float" >;

// Parallel for loop foreach partitionedFile, count in partitionedDataFiles { // First stage (firstStageModel[count], firstStageFloatParameter[count]) = firstStage(model[v] , partitionedFile); }

// Reduce stage (use the files for synchronization) (reducedModel[v], reducedFloatParameter[v]) = reduce(firstStageModel, firstStageFloatParameter);

// Combine stage model[v+1] = combine(reducedModel[v] ,reducedFloatParameter[v], model[v]);

//Check the termination condition here int shouldTerminate = check(model[v+1]);} until (shouldTerminate != 1);

Page 40: Swift: A parallel scripting for applications at the petascale and beyond

40

Software stack for Swift on ALCF BG/Ps

Swift:scripting language, task coordination, throttling, data management, restart

Coaster execution provider:per-node agents for fast task dispatch across all nodes

Linux OScomplete, high performance Linux compute node OS with full fork/exec

Swift scripts

Shellscripts

Appinvocations

File System

Page 41: Swift: A parallel scripting for applications at the petascale and beyond

41

Java application

File Systemf1 f2 f3

Running Swift scripts

Coaster Service

Coaster Workers

• Start Coaster– Start coaster service– Start coaster workers

(qsub, ssh, …etc)• Run Swift script

Head node

Page 42: Swift: A parallel scripting for applications at the petascale and beyond

42

Swift vs. MapReduceWhat to compare (to maintain the level of abstraction)?• Swift framework (language, compiler, runtime, scheduler plugin

– Coasters, storage)• Map-reduce framework (Hive, Hadoop, HDFS)

Similarities:• Goals: productive programming environment, hide complexities related to parallel execution• able to scale, hide individual failures, load balancing

42

Page 43: Swift: A parallel scripting for applications at the petascale and beyond

43

Swift vs. MapReduceSwift Advantages: • Intuitive programing model.• Platform independent: from small clusters, to large ones, to large ‘exotic’ architectures – BG/P, distributed resources (grids).

- Map/reduce modes will not support all these- Can generally use the software stack available on the target machine with no changes

• More flexible data model (just your old files); - No migration pain; - Easy to share with other applications

• Optimized for coarse-granularity workflows.- E.g., files staged in a in-memory file-system; optimizations for specific patterns.

43

Page 44: Swift: A parallel scripting for applications at the petascale and beyond

• Write site-independent scripts • Automatic parallelization and data movement• Run native code, script fragments as applications• Rapidly subdivide large partitions for

MPI jobs• Move work to data locations

44

Swift control process

Swift control process

Swift/T control process

Swift worker process

C C++

Fortran

C C++

Fortran

C C++

Fortran

MPI

Swift/T worker

64K cores of Blue Waters2 billion Python tasks14 million Pythons/s

Swift/T: Enabling high-performance workflows

Page 45: Swift: A parallel scripting for applications at the petascale and beyond

45

Using Swift/Python/Numpyglobal const string numpy = "from numpy import *\n\n"; typedef matrix string;

(matrix A) eye(int n) { command = sprintf("repr(eye(%i))", n); code = numpy+command; matrix t = python(code); A = replace_all(t, "\n", "", 0); }

(matrix R) add(matrix A1, matrix A2) { command = sprintf("repr(%s+%s)", A1, A2); code = numpy+command; matrix t = python(code); R = replace_all(t, "\n", "", 0); }

a1 = eye(3); a2 = eye(3); sum = add(a1, a2); printf("2*eye(3)=%s", sum);

Page 46: Swift: A parallel scripting for applications at the petascale and beyond

Fully parallel evaluation of complex scripts

46

int X = 100, Y = 100;int A[][];int B[];foreach x in [0:X-1] { foreach y in [0:Y-1] { if (check(x, y)) { A[x][y] = g(f(x), f(y)); } else { A[x][y] = 0; } } B[x] = sum(A[x]);}

Page 47: Swift: A parallel scripting for applications at the petascale and beyond

Centralized evaluation can be a bottleneckat extreme scales

47

Had this (Swift/K): For extreme scale, we need (Swift/T):

Page 48: Swift: A parallel scripting for applications at the petascale and beyond

MPI: The Message Passing Interface

• Programming model used on large supercomputers• Can run on many networks, including sockets, or

shared memory• Standard API for C and Fortran, other languages have

working implementations• Contains communication calls for

– Point-to-point (send/recv)– Collectives (broadcast, reduce, etc.)

• Interesting concepts– Communicators: collections of

communicating processing and a context

– Data types: Language-independent data marshaling scheme

48

Page 49: Swift: A parallel scripting for applications at the petascale and beyond

ADLB: Asynchronous Dynamic Load Balancer

• An MPI library for master-worker workloads in C

• Uses a variable-size, scalable network of servers

• Servers implement work-stealing

• The work unit is a byte array• Optional work priorities, targets, types

• For Swift/T, we added:– Server-stored data– Data-dependent execution– Tcl bindings!

49

Servers

Workers

• Lusk et al. More scalability, less pain: A simple programming model and its implementation for extreme computing. SciDAC Review 17, 2010.

Page 50: Swift: A parallel scripting for applications at the petascale and beyond

One Swift server core per node:

16 n

ode

job

Flexible placement of server ranks in a Swift/T job

Four Swift nodes for the job:

16 n

ode

job

Page 51: Swift: A parallel scripting for applications at the petascale and beyond

A[3] = g(A[2]);

Example distributed execution • Code

• Evaluate dataflow operations

• Workers: execute tasks51

A[2] = f(getenv(“N”));

• Perform getenv()• Submit f

• Process f• Store A[2]

• Subscribe to A[2]

• Submit g

• Process g• Store A[3]

Task put Task put

Notific

ation

• Wozniak et al. Turbine: A distributed-memory dataflow engine for high performance many-task applications. Fundamenta Informaticae 128(3), 2013

Task get Task get

Page 52: Swift: A parallel scripting for applications at the petascale and beyond

Swift/T-specific features• Task locality: Ability to send a task to a process

– Allows for big data –type applications– Allows for stateful objects to remain resident in

the workflow– location L = find_data(D);int y = @location=L f(D, x);

• Data broadcast• Task priorities: Ability to set task priority

– Useful for tweaking load balancing• Updateable variables

– Allow data to be modified after its initial write– Consumer tasks may receive original or updated values when they emerge from the

work queue 52

Page 53: Swift: A parallel scripting for applications at the petascale and beyond

Swift/T: scaling of trivial foreach { } loop 100 microsecond to 10 millisecond tasks

on up to 512K integer cores of Blue Waters

53

Page 54: Swift: A parallel scripting for applications at the petascale and beyond

Swift/T application benchmarkson Blue Waters

54

Page 55: Swift: A parallel scripting for applications at the petascale and beyond

Swift/T Compiler and Runtime

• STC translates high-level Swiftexpressions into low-level Turbine operations:

55

– Create/Store/Retrieve typed data– Manage arrays– Manage data-dependent tasks

Page 56: Swift: A parallel scripting for applications at the petascale and beyond

x = g();if (x > 0) { n = f(x); foreach i in [0:n-1] { output(p(i)); }}

Swift code in dataflow• Dataflow definitions create nodes in the dataflow graph• Dataflow assignments create edges• In typical (DAG) workflow languages, this forms a static graph• In Swift, the graph can grow dynamically – code fragments are evaluated

(conditionally) as a result of dataflow• In its early implementation, these fragments were just tasks

56

x = g();

x

nforeach i … { output(p(i));

if (x > 0) { n = f(x); …

Page 57: Swift: A parallel scripting for applications at the petascale and beyond

Swift/T

http://swift-lang.org

Page 58: Swift: A parallel scripting for applications at the petascale and beyond

Swift/T optimization challenge: distributed vars

58http://swift-lang.org

Page 59: Swift: A parallel scripting for applications at the petascale and beyond

Swift/T optimizations improve data locality

59http://swift-lang.org

Page 60: Swift: A parallel scripting for applications at the petascale and beyond

Parallel tasks in Swift/T

• Swift expression: z = @par=32 f(x,y);• ADLB server finds 8 available workers

– Workers receive ranks from ADLB server– Performs comm = MPI_Comm_create_group()

• Workers perform f(x,y)communicating on comm

Page 61: Swift: A parallel scripting for applications at the petascale and beyond

LAMMPS parallel tasks

• LAMMPS provides a convenient C++ API

• Easily used by Swift/T parallel tasks

foreach i in [0:20] { t = 300+i; sed_command = sprintf("s/_TEMPERATURE_/%i/g", t); lammps_file_name = sprintf("input-%i.inp", t); lammps_args = "-i " + lammps_file_name; file lammps_input<lammps_file_name> = sed(filter, sed_command) => @par=8 lammps(lammps_args);}

Tasks with varying sizes packed into big MPI run Black: Compute Blue: Message White: Idle

Page 62: Swift: A parallel scripting for applications at the petascale and beyond

GeMTC: GPU-enabled Many-Task Computing

Goals:1) MTC support 2) Programmability3) Efficiency 4) MPMD on SIMD5) Increase concurrency to warp level

Approach: Design & implement GeMTC middleware:1) Manages GPU 2) Spread host/device3) Workflow system integration (Swift/T)

Motivation: Support for MTC on all accelerators!

Page 63: Swift: A parallel scripting for applications at the petascale and beyond

Logging and debugging in Swift• Traditionally, Swift programs are debugged through the log or

the TUI (text user interface)

• Logs were produced using normal methods, containing: – Variable names and values as set with respect to thread– Calls to Swift functions– Calls to application code

• A restart log could be produced to restart a large Swift run after certain fault conditions

• Methods require single Swift site: do not scale to larger runs63

Page 64: Swift: A parallel scripting for applications at the petascale and beyond

Logging in MPI• The Message Passing Environment (MPE)• Common approach to logging MPI programs• Can log MPI calls or application events – can store arbitrary data• Can visualize log with Jumpshot

• Partial logs are stored at the site of each process– Written as necessary to shared

file system• in large blocks• in parallel

– Results are merged into a big log file (CLOG, SLOG)

• Work has been done optimize the file format for various queries

64

Page 65: Swift: A parallel scripting for applications at the petascale and beyond

Swift for Really Parallel BuildsAppsapp (object_file o) gcc(c_file c, string cflags[]) {// Example:// gcc -c -O2 -o f.o f.c "gcc" "-c" cflags "-o" o c;}

app (x_file x) ld(object_file o[], string ldflags[]) {// Example:// gcc -o f.x f1.o f2.o ... "gcc" ldflags "-o" x o;}

app (output_file o) run(x_file x) { "sh" "-c" x @stdout=o;}

app (timing_file t) extract(output_file o) { "tail" "-1" o "|" "cut" "-f" "2" "-d" " " @stdout=t;}

Swift code string program_name = "programs/program1.c"; c_file c = input(program_name);

// For each foreach O_level in [0:3] { make file names… // Construct compiler flags string O_flag = sprintf("-O%i", O_level); string cflags[] = [ "-fPIC", O_flag ];

object_file o<my_object> = gcc(c, cflags); object_file objects[] = [ o ]; string ldflags[] = []; // Link the program x_file x<my_executable> = ld(objects, ldflags); // Run the program output_file out<my_output> = run(x); // Extract the run time from the program output timing_file t<my_time> = extract(out);

65

Page 66: Swift: A parallel scripting for applications at the petascale and beyond

Abstract, extensible MapReduce in Swiftmain {

file d[]; int N = string2int(argv("N")); // Map phase foreach i in [0:N-1] { file a = find_file(i); d[i] = map_function(a); } // Reduce phase file final <"final.data"> = merge(d, 0, tasks-1);}

(file o) merge(file d[], int start, int stop) { if (stop-start == 1) { // Base case: merge pair o = merge_pair(d[start], d[stop]); } else { // Merge pair of recursive calls n = stop-start; s = n % 2; o = merge_pair(merge(d, start, start+s), merge(d, start+s+1, stop)); }}

66

• User needs to implement map_function() and merge()

• These may be implemented in native code, Python, etc.

• Could add annotations• Could add additional custom

application logic

Page 67: Swift: A parallel scripting for applications at the petascale and beyond

Crystal Coordinate Transformation Workflow

• Goal: Transform 3D image stack from detector coordinates to real space coordinates– Operate on up to 50 GB data – Relatively light processing – I/O rates are critical– Core numerics in C++, Qt– Parallelism via Swift

• Challenges – Coupling C++ to Swift via Qscript– Data access to HDF file on distributed storage

67

Page 68: Swift: A parallel scripting for applications at the petascale and beyond

CCTW diagrams

68

Page 69: Swift: A parallel scripting for applications at the petascale and beyond

CCTW: Swift/T application (C++)bag<blob> M[]; foreach i in [1:n] {

blob b1= cctw_input(“pznpt.nxs”);blob b2[]; int outputId[]; (outputId, b2) = cctw_transform(i, b1); foreach b, j in b2 { int slot = outputId[j];

M[slot] += b; }} foreach g in M {

blob b = cctw_merge(g); cctw_write(b); }}

69

Page 70: Swift: A parallel scripting for applications at the petascale and beyond

Stateful external interpreters• Desire to use high-level, 3rd party algorithms in Python, R to orchestrate

Swift workflows, e.g.:– Python DEAP for evolutionary algorithms– R language GA package

• Typical control pattern: – GA minimizes the cost function– You pass the cost function to the library and wait

• We want Swift to obtain the parameters from the library– We launch a stateful interpreter on a thread– The "cost function" is a dummy that returns the

parameters to Swift over IPC– Swift passes the real cost function results back

to the library over IPC• Achieve high productivity and high scalability

– Library is not modified – unaware of framework!– Application logic extensions in high-level script

Load balancingSwift

workerPython/R

IPC GA

MPI Process

Tasks

Results

MPI

Page 71: Swift: A parallel scripting for applications at the petascale and beyond

Swift/T: Makefile example: 1app (object_file o) gcc(c_file c, string cflags[]){// gcc -c -O2 -o f.o f.c "gcc" "-c" cflags "-o" o c;}

app (x_file x) ld(object_file o[], string ldflags[]){// gcc -o f.x f1.o f2.o ... "gcc" ldflags "-o" x o;}

app (output_file o) run(x_file x){ "sh" "-c" x @stdout=o;}

71www.ci.uchicago.edu/swift www.mcs.anl.gov/exm

Page 72: Swift: A parallel scripting for applications at the petascale and beyond

Swift/T: Makefile example: 2 // For each foreach O_level in [0:3] { // Construct file names string my_object = sprintf("programs/program1-O%i.o", O_level);. . . string O_flag = sprintf("-O%i", O_level); string cflags[] = [ "-fPIC", O_flag ]; object_file o<my_object> = gcc(c, cflags); object_file objects[] = [ o ]; string ldflags[] = []; x_file x<my_executable> = ld(objects, ldflags);

72

Page 73: Swift: A parallel scripting for applications at the petascale and beyond

Native code overview• Turbine is a Tcl-based system• It is easy to call to Tcl from Swift/T

• Thus, we simply need to wrap the native function in a Tcl wrapper• You can do this manually or with SWIG

1. Build the native function as a shared library2. Wrap the native function in Tcl with SWIG3. Build a Tcl package4. Call to the Tcl function from Swift5. Run it- just make sure the package can be found at run time

73

Page 74: Swift: A parallel scripting for applications at the petascale and beyond

Simple app

type file;app (file out) echo_app (string s){ echo s stdout=filename(out);}file out <"out.txt">;out = echo_app("Hello world!");

74

Page 75: Swift: A parallel scripting for applications at the petascale and beyond

Foreach example

75

type file;app (file o) simulate_app (){ simulate stdout=filename(o);}foreach i in [1:10] { string fname=strcat("output/sim_", i, ".out"); file f <single_file_mapper; file=fname>; f = simulate_app();}

Page 76: Swift: A parallel scripting for applications at the petascale and beyond

Multiple Apps

76

type file;app (file o) simulate_app (int time){ simulate "-t" time stdout=filename(o);}

app (file o) stats_app (file s[]){ stats filenames(s) stdout=filename(o);}file sims[];int time = 5;int nsims = 10;foreach i in [1:nsims] { string fname = strcat("output/sim_",i,".out"); file simout <single_file_mapper; file=fname>; simout = simulate_app(time); sims[i] = simout;}file average <"output/average.out">;average = stats_app(sims);

Page 77: Swift: A parallel scripting for applications at the petascale and beyond

Multi stage

77

type file;app (file out) genseed_app (int nseeds){ genseed "-r" 2000000 "-n" nseeds stdout=@out;}app (file out) genbias_app (int bias_range, int nvalues){ genbias "-r" bias_range "-n" nvalues stdout=@out;}

app (file out, file log) simulation_app (int timesteps, int sim_range, file bias_file, int scale, int sim_count){ simulate "-t" timesteps "-r" sim_range "-B" @bias_file "-x" scale "-n" sim_count stdout=@out stderr=@log;}app (file out, file log) analyze_app (file s[]){ stats filenames(s) stdout=@out stderr=@log;}

Page 78: Swift: A parallel scripting for applications at the petascale and beyond

Multi stage

78

# Values that shape the runint nsim = 10; # number of simulation programs to runint steps = 1; # number of timesteps (seconds) per simulationint range = 100; # range of the generated random numbersint values = 10; # number of values generated per simulation# Main script and datatracef("\n*** Script parameters: nsim=%i range=%i num values=%i\n\n", nsim, range, values);# Dynamically generated bias for simulation ensemblefile seedfile<"output/seed.dat">;seedfile = genseed_app(1);int seedval = readData(seedfile);tracef("Generated seed=%i\n", seedval);file sims[];

Page 79: Swift: A parallel scripting for applications at the petascale and beyond

Multi stage

79

foreach i in [0:nsim-1] { file biasfile <single_file_mapper; file=strcat("output/bias_",i,".dat")>; file simout <single_file_mapper; file=strcat("output/sim_",i,".out")>; file simlog <single_file_mapper; file=strcat("output/sim_",i,".log")>; biasfile = genbias_app(1000, 20); (simout,simlog) = simulation_app(steps, range, biasfile,

1000000, values); sims[i] = simout;}file stats_out<"output/average.out">;file stats_log<"output/average.log">;(stats_out,stats_log) = analyze_app(sims);

Page 80: Swift: A parallel scripting for applications at the petascale and beyond

80

Page 81: Swift: A parallel scripting for applications at the petascale and beyond

81

Page 82: Swift: A parallel scripting for applications at the petascale and beyond

Thank You

82