From the Heroic to the Logistical
Programming Model Implications of New Supercomputing Applications
Ian Foster
Computation Institute, Argonne National Laboratory &
The University of Chicago
With thanks to: Miron Livny, Ioan Raicu, Mike Wilde, Yong Zhao, and many others.
What will we do with 1+ Exaflops and 1M+ cores?
1) Tackle Bigger and Bigger Problems
Computational Scientist as Hero
2) Tackle Increasingly Complex Problems
Computational Scientist as Logistics Officer
“More Complex Problems”
- Use ensemble runs to quantify climate model uncertainty
- Identify potential drug targets by screening a database of ligand structures against target proteins
- Study economic model sensitivity to key parameters
- Analyze turbulence datasets from multiple perspectives
- Perform numerical optimization to determine optimal resource assignment in energy problems
- Mine collections of data from advanced light sources
- Construct databases of computed properties of chemical compounds
- Analyze data from the Large Hadron Collider
- Analyze log data from 100,000-node parallel computations
Programming Model Issues
- Massive task parallelism
- Massive data parallelism
- Integrating black-box applications
- Complex task dependencies (task graphs)
- Failure, and other execution management issues
- Data management: input, intermediate, output
- Dynamic task graphs
- Dynamic data access involving large amounts of data
- Long-running computations
- Documenting provenance of data products
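Several of these issues (massive task parallelism, failure management) already arise in a minimal many-task executor. The following Python sketch is illustrative only, built on the standard library; the task function and retry policy are hypothetical stand-ins, not part of Swift, Falkon, or any other tool named in this talk:

```python
import concurrent.futures
from collections import Counter

attempts = Counter()

def run_task(task_id):
    # Stand-in for a black-box application invocation; every 7th task
    # fails once (a simulated transient failure), then succeeds.
    attempts[task_id] += 1
    if task_id % 7 == 0 and attempts[task_id] == 1:
        raise RuntimeError(f"transient failure in task {task_id}")
    return task_id * 2

def run_with_retries(task_id, retries=2):
    # Execution management: retry each failed task a bounded number of times.
    for attempt in range(retries + 1):
        try:
            return run_task(task_id)
        except RuntimeError:
            if attempt == retries:
                raise

def run_all(n_tasks, workers=8):
    # Massive task parallelism: a bag of independent tasks over a worker pool.
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run_with_retries, t): t for t in range(n_tasks)}
        return {futures[f]: f.result()
                for f in concurrent.futures.as_completed(futures)}

if __name__ == "__main__":
    results = run_all(100)
    print(len(results))  # all 100 tasks complete despite transient failures
```

Real many-task systems add what this sketch omits: data staging, provenance capture, and dispatch rates far beyond a single Python process.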
Problem Types
[Chart: problem space, with number of tasks (1, 1K, 1M) on one axis and input data size (Lo, Med, Hi) on the other]
- Few tasks, low data: heroic MPI tasks
- Few tasks, high data: data analysis, mining
- Many tasks, low data: many loosely coupled tasks
- Many tasks, high data: much data and complex tasks
An Incomplete and Simplistic View of Programming Models and Tools
- Many tasks: DAGMan+Pegasus, Karajan+Swift
- Much data: MapReduce/Hadoop, Dryad
- Complex tasks, much data: Dryad, Pig, Sawzall, Swift+Falkon
- Single task, modest data: MPI, etc., etc., etc.
Many Tasks: Climate Ensemble Simulations (using FOAM, 2005)
- NCAR computer + grad student: 160 ensemble members in 75 days
- TeraGrid + "Virtual Data System": 250 ensemble members in 4 days
Image courtesy Pat Behling and Yun Liu, UW Madison
Many, Many Tasks: Identifying Potential Drug Targets
2M+ ligands × protein target(s)
(Mike Kubal, Benoit Roux, and others)
[Workflow diagram: virtual screening pipeline, start → report → end]
- Inputs: PDB protein descriptions; 1 protein (1 MB); ZINC 3-D structures, 2M structures (6 GB); manually prepared DOCK6 and FRED receptor files (1 per protein, defining the pocket to bind to); NAB script template and parameters (defining flexible residues and # MD steps)
- DOCK6/FRED screening of ligand-receptor complexes: ~4M tasks × 60 s × 1 CPU ≈ 60K CPU-hours; select best ~5K
- Amber prep (2. AmberizeReceptor, 4. perl: generate NAB script) and Amber score (1. AmberizeLigand, 3. AmberizeComplex, 5. RunNABScript): ~10K tasks × 20 min × 1 CPU ≈ 3K CPU-hours; select best ~500
- GCMC: ~500 tasks × 10 hr × 100 CPUs ≈ 500K CPU-hours
- Total: 4 million tasks, 500K CPU-hours
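Structurally, this pipeline is a funnel: score everything with a cheap method, keep the best ~5K, rescore the survivors with a costlier method, keep the best ~500, and so on. A minimal Python sketch of that select-best-N staging; the scoring functions are toy stand-ins for DOCK6/FRED and Amber, and the sizes are scaled down from the 2M-ligand run:

```python
def funnel(candidates, stages):
    """Run each (score_fn, keep_n) stage, keeping only the top-scoring survivors."""
    survivors = list(candidates)
    for score_fn, keep_n in stages:
        survivors = sorted(survivors, key=score_fn, reverse=True)[:keep_n]
    return survivors

# Toy stand-ins for the real scoring codes (illustrative only)
def cheap_dock_score(ligand):   # fast, approximate (cf. DOCK6/FRED)
    return -abs(ligand - 500)

def amber_score(ligand):        # slower, more accurate (cf. Amber rescoring)
    return -abs(ligand - 510)

ligands = range(20_000)  # scaled down from the 2M+ ZINC ligands
best = funnel(ligands, [(cheap_dock_score, 5000), (amber_score, 500)])
print(len(best))  # 500
```

The point of the design is economic: each successive stage is far more expensive per candidate, so the cheap stages must cut the population before the expensive ones run.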
DOCK on SiCortex
- CPU cores: 5,760
- Tasks: 92,160
- Elapsed time: 12,821 s
- Compute time: 1.94 CPU-years
- Average task time: 660.3 s (does not include ~800 s to stage input data)
Ioan Raicu, Zhao Zhang
MARS Economic Model Parameter Study
[Chart: CPU cores (0-2,000) and micro-tasks (0-8M) vs. time (0-1,440 s); series: idle CPUs, busy CPUs, wait queue length, completed micro-tasks]
- 2,048 BG/P CPU cores
- Tasks: 49,152
- Micro-tasks: 7,077,888
- Elapsed time: 1,601 s
- CPU-hours: 894
Mike Wilde, Zhao Zhang
Montage in MPI and Swift (Yong Zhao, Ioan Raicu, U. Chicago)
- MPI: ~950 lines of C for one stage
- Pegasus: ~1,200 lines of C + tools to generate a DAG for a specific dataset
- SwiftScript: ~92 lines for any dataset
B. Berriman, J. Good (Caltech); J. Jacob, D. Katz (JPL)
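The SwiftScript version is short largely because it declares a dataflow over whatever images are present, rather than hard-coding one dataset. A rough Python analogue of that dataset-independence; the function names are hypothetical stand-ins, not the actual Montage or Swift APIs:

```python
from concurrent.futures import ThreadPoolExecutor

def project(image):
    # Hypothetical stand-in for a per-image Montage stage (e.g. reprojection)
    return f"projected({image})"

def coadd(projected_images):
    # Hypothetical stand-in for the final combining stage
    return " + ".join(sorted(projected_images))

def mosaic(images, workers=4):
    # Dataset-independent: the degree of parallelism follows the input data,
    # whether the list holds two images or two million.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        projected = list(pool.map(project, images))
    return coadd(projected)

print(mosaic(["a.fits", "b.fits"]))
# projected(a.fits) + projected(b.fits)
```

In Swift the same idea is expressed declaratively: independent per-image invocations are discovered from the dataflow and dispatched in parallel by the runtime.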
MapReduce/Hadoop
[Diagram: Hadoop DFS architecture. Clients send metadata ops (name, replicas, e.g. /home/sameerp/foo, 3; /home/sameerp/docs, 4) to the Namenode, and read/write data directly against Datanodes spread across Racks 1 and 2]
[Chart: word-count time (sec, log scale) vs. data size (75 MB, 350 MB, 703 MB), comparing Swift+PBS and Hadoop]
ALCF: 80 TB memory, 8 PB disk, 78 GB/s I/O bandwidth
Soner Balkir, Jing Tie, Quan Pham
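The benchmark above is MapReduce's canonical word-count example. A self-contained, single-process Python sketch of the map → shuffle → reduce structure (Hadoop of course distributes each phase across Datanodes; this only shows the programming model):

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input split
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group intermediate pairs by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

def word_count(documents):
    return reduce_phase(shuffle(chain.from_iterable(map(map_phase, documents))))

print(word_count(["to be or not to be", "be"]))
# {'to': 2, 'be': 3, 'or': 1, 'not': 1}
```

The attraction for the applications in this talk is that the programmer supplies only the two pure functions; partitioning, scheduling, and re-execution on failure belong to the runtime.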
Extreme Scale Debugging: Stack Trace Sampling Tool (STAT)
[Chart: cost per sample on BlueGene/L; latency (0-2.5 s) vs. number of application tasks (0-140,000), for 1-deep, 2-deep, and 3-deep traces in VN mode; up to 131,072 processes]
Bart Miller, Wisconsin
Summary
- Peta- and exa-scale computers enable us to tackle new types of problems at far greater scales than before
  - Parameter studies, ensembles, interactive data analysis, "workflows" of various kinds
  - Potentially an important source of new applications
- Such apps frequently stress petascale hardware and software in interesting ways
- New programming models and tools are required
  - Mixed task and data parallelism, management of many tasks, complex data management, failure, …
  - Tools for such problems (DAGMan, Swift, Hadoop, …) exist but need refinement
- Interesting connections to the distributed systems community
More info: www.ci.uchicago.edu/swift
Amiga Mars – Swift+Falkon
- 1,024 tasks (147,456 micro-tasks)
- 256 CPU cores