TRANSCRIPT
Paving the Road to Exascales with Many-Task Computing
Speaker: Ke Wang
Home page: http://datasys.cs.iit.edu/~kewang
Supervisor: Ioan Raicu
Data-Intensive Distributed Systems Laboratory
Computer Science Department
Illinois Institute of Technology
November 14th, 2012
Many-Task Computing (MTC)
• Bridges the gap between HPC and HTC
• Applications structured as DAGs
• Data dependencies are files that are written to and read from a file system
• Loosely coupled apps with HPC orientations
[Figure: quadrant chart of number of tasks (1, 1K, 1M) vs. input data size (low, med, hi) — HPC (heroic MPI tasks) at few tasks; HTC/MTC (many loosely coupled tasks) at many tasks; MapReduce/MTC (data analysis, mining) at large data; MTC (big data and many tasks) at both]
• Falkon: Fast and Lightweight Task Execution Framework (http://datasys.cs.iit.edu/projects/Falkon/index.html)
• Swift: Parallel Programming System (http://www.ci.uchicago.edu/swift/index.php)
Load Balancing
• The technique of distributing computational and communication loads evenly across the processors of a parallel machine, or across the nodes of a supercomputer
• Different scheduling strategies
– Centralized scheduling: poor scalability (Falkon, Slurm, Cobalt)
– Hierarchical scheduling: moderate scalability (Falkon, Charm++)
– Distributed scheduling: possible approach to exascales (Charm++)
• Work stealing: a distributed load balancing strategy
– Starved processors steal tasks from overloaded ones
– Various parameters affect performance:
• Number of tasks to steal (half)
• Number of neighbors (square root of the total number of nodes)
• Static or dynamic random neighbors (dynamic random neighbors)
• Stealing poll interval (exponential back-off)
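The four tuning parameters listed above can be sketched as small helper functions. This is an illustrative Python sketch with hypothetical names, not the actual Falkon or MATRIX code:

```python
import math
import random

def tasks_to_steal(victim_load):
    """Steal half of the victim's queued tasks."""
    return victim_load // 2

def num_neighbors(total_nodes):
    """Poll sqrt(N) of the N nodes per steal attempt."""
    return max(1, int(math.sqrt(total_nodes)))

def dynamic_neighbors(node_id, total_nodes):
    """Dynamic random neighbors: resample a fresh set before every steal
    attempt (static neighbors would fix the set once at startup)."""
    others = [n for n in range(total_nodes) if n != node_id]
    return random.sample(others, num_neighbors(total_nodes))

def poll_intervals(first=0.001, limit=1.0):
    """Exponential back-off of the stealing poll interval: after each
    failed steal, wait twice as long before polling again."""
    t = first
    while t <= limit:
        yield t
        t *= 2
```

Resampling neighbors dynamically avoids a partition of the network into fixed steal cliques, and the back-off keeps idle nodes from flooding busy ones with load requests.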
SimMatrix
• A light-weight and scalable discrete event SIMulator for MAny-Task computing execution fabRIc at eXascales
• Supports centralized (FIFO) and distributed (work stealing) scheduling
• Has great scalability (millions of nodes, billions of cores, trillions of tasks)
• Future extensions: task dependencies, workflow system simulation, different network topologies, data-aware scheduling
[Figure: SimMatrix event flow — a global event queue sorted by time; the simulation starts when the first node needs tasks; events Insert Event(time: t), TaskDisp, TaskRec, TaskEnd, and Steal drive the loop; on TaskEnd, a node either dispatches waiting tasks or, with no waiting tasks, frees available cores; a failed steal retries; LogVisual records the run]
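The event loop in the diagram can be sketched as a minimal discrete-event simulator built around a global event queue sorted by time. This is an illustrative Python sketch under that assumption; `Simulator` and `task_end` are hypothetical names, not SimMatrix's actual classes:

```python
import heapq

class Simulator:
    """Minimal discrete-event loop: a global event queue sorted by time."""
    def __init__(self):
        self.queue = []   # heap of (time, seq, handler, payload)
        self.now = 0.0
        self._seq = 0     # tie-breaker for events with equal timestamps

    def insert_event(self, time, handler, payload=None):
        heapq.heappush(self.queue, (time, self._seq, handler, payload))
        self._seq += 1

    def run(self):
        """Pop events in time order; each handler may insert future events."""
        while self.queue:
            self.now, _, handler, payload = heapq.heappop(self.queue)
            handler(self, payload)

def task_end(sim, node):
    """A TaskEnd event: dispatch a waiting task, or free a core."""
    if node["waiting"]:
        task = node["waiting"].pop(0)
        # dispatching a task schedules its own TaskEnd in the future
        sim.insert_event(sim.now + task["length"], task_end, node)
    else:
        node["free_cores"] += 1
```

Because handlers only run when their event is popped, simulated time advances in jumps between events, which is what lets a single process model millions of nodes cheaply.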
MATRIX
• A real implementation of a distributed MAny-Task execution fabRIc at eXascales
[Figure: MATRIX architecture — a Client, an Index Server, and Compute nodes interacting as follows:
(1) compute nodes register with the index server
(2) the index server sends the membership list to the compute nodes
(3) the client requests the membership list
(4) the client submits tasks using ZHT
(5) the client looks up task status using ZHT
(6) task status info is sent back to the client
(7) a starved compute node requests the load of its neighbors
(8) the neighbors send their loads
(9) the starved node requests tasks from the most loaded neighbor
(10) that neighbor sends the tasks]
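The work-stealing exchange in steps (7)–(10) above can be sketched as follows. This is hypothetical Python for illustration; the real MATRIX exchanges network messages (with state in ZHT) rather than direct method calls:

```python
class ComputeNode:
    """A compute node with a local task queue (illustrative, not MATRIX's API)."""
    def __init__(self, name, tasks=None):
        self.name = name
        self.tasks = tasks if tasks is not None else []

    def report_load(self):
        """Steps (7)/(8): a neighbor reports its load (queue length)."""
        return len(self.tasks)

    def send_tasks(self, count):
        """Steps (9)/(10): hand over the requested number of tasks."""
        stolen, self.tasks = self.tasks[:count], self.tasks[count:]
        return stolen

def steal_round(thief, neighbors):
    """One steal attempt: poll neighbor loads, steal half the tasks
    from the most loaded neighbor. Returns the number of tasks stolen."""
    loads = {n: n.report_load() for n in neighbors}      # (7) + (8)
    victim = max(loads, key=loads.get)
    if loads[victim] == 0:
        return 0                                         # failed steal: back off
    stolen = victim.send_tasks(loads[victim] // 2)       # (9) + (10)
    thief.tasks.extend(stolen)
    return len(stolen)
```

A failed round (all polled neighbors idle) is where the exponential back-off of the poll interval would apply.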
[Figure: SimMatrix throughput vs. MATRIX throughput (tasks/sec, 0–6000) across scales of 64, 128, 256, and 512 nodes; the difference between simulated and measured throughput stays under 5% (3.8%, 4.7%, 3.7%, 5.0%)]
Acknowledgement
• DataSys Laboratory: Ioan Raicu, Anupam Rajendran, Tonglin Li, Kevin Brandstatter
• University of Chicago: Zhao Zhang