Senior Design Project: Parallel Task Scheduling in Heterogeneous Computing Environments

Senior Design Students: Christopher Blandin and Dylan Machovec
Post-doctoral Scholar: Bhavesh Khemka
Faculty Advisor: H. J. Siegel

Senior Design Presentation

Outline

- motivation
- our system model
- problem statement
- existing work
- simulation details
- future work

Motivation

- High Performance Computing (HPC) is used by a wide variety of fields to solve challenging problems
  - physics simulations, oil and gas industry, climate modeling, computational biology, computational chemistry, and many more
- improving performance increases productivity in these fields
- we plan on improving the performance of the system by designing novel scheduling techniques
- scheduling refers to the assignment and ordering of tasks to machines for execution

System Model – Definitions

- heterogeneity
  - differing execution characteristics
- homogeneity
  - the same execution characteristics
- oversubscribed
  - more tasks arriving than the system can execute immediately

System Model – Cluster Model

- clusters have multiple homogeneous nodes
- clusters are heterogeneous from each other
- nodes may have multiple multicore processors
- each node may only have one task running at a given time
  - avoids interference between tasks
- task assignments are done at node-level
- a task cannot be spread across two clusters

System Model – Workload Characteristics

- dynamically arriving tasks
- when a task arrives, scheduler obtains the following information:
  - arrival time
  - execution time
    - different times on different clusters (because of heterogeneity)
  - number of processing cores required
  - value function
- tasks are heterogeneous
- no pre-emption

System Model – Value Function

- each task has a value function
  - represents value of the task when it completes
  - value function may be different for each task
  - monotonically decreasing functions
- value functions can be fully described with four parameters
  - a constant starting value
  - after soft deadline value decays linearly to a final value
  - after hard deadline value drops to zero
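The shape described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not name the four parameters, so `start_value`, `final_value`, `soft_deadline`, and `hard_deadline` are hypothetical names, the deadlines are assumed to be absolute times, and the final value is assumed to be reached exactly at the hard deadline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueFunction:
    """Hypothetical four-parameter value function (names are assumptions)."""
    start_value: float    # value earned if the task completes by the soft deadline
    final_value: float    # value reached at the hard deadline after linear decay
    soft_deadline: float  # time after which the value starts to decay
    hard_deadline: float  # time after which the task earns nothing

    def value_at(self, completion_time: float) -> float:
        # Constant starting value up to the soft deadline.
        if completion_time <= self.soft_deadline:
            return self.start_value
        # Zero value once the hard deadline has passed.
        if completion_time > self.hard_deadline:
            return 0.0
        # Linear decay between the soft and hard deadlines.
        span = self.hard_deadline - self.soft_deadline
        fraction = (completion_time - self.soft_deadline) / span
        return self.start_value + fraction * (self.final_value - self.start_value)


vf = ValueFunction(start_value=10.0, final_value=4.0,
                   soft_deadline=100.0, hard_deadline=200.0)
print(vf.value_at(50.0), vf.value_at(150.0), vf.value_at(250.0))
```

With `final_value` no larger than `start_value`, the function is monotonically decreasing, which matches the property stated on the slide.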

Problem Statement

- we measure the performance of a scheduler in our environment as the sum of the value earned by completing tasks over a given amount of time
- goal of heuristics: maximize total sum of value earned over a given amount of time
  - improve performance of HPC systems
- main contribution
  - design, simulation, and analysis of resource allocation heuristics for task scheduling
    - heterogeneous HPC system with multiple clusters
    - tasks with associated value functions with soft and hard deadlines
    - each task executes in parallel over multiple cores
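As a rough illustration of the performance measure, the sketch below sums the value earned by tasks that complete inside a time window. The function name, the `(completion_time, value_earned)` record layout, and the inclusive window bounds are assumptions made for this example, not details from the slides.

```python
def total_value_earned(completions, window_start, window_end):
    """Sum the value of tasks that completed within [window_start, window_end].

    completions: iterable of (completion_time, value_earned) pairs, where
    value_earned is the task's value function evaluated at its completion time.
    """
    return sum(value for done, value in completions
               if window_start <= done <= window_end)


# Three completed tasks; only the first two fall inside the window [0, 20].
log = [(5.0, 10.0), (12.0, 7.5), (30.0, 4.0)]
print(total_value_earned(log, 0.0, 20.0))
```

A heuristic that earns a larger total over the same window and workload is considered the better scheduler under this measure.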

Mapping Event

- mapping event: when task assignment decision(s) are made
- trigger mapping event whenever:
  - a node becomes available, or
  - a task arrives
- during mapping event, all tasks that have not been reserved or have not started execution are considered mappable
- only makes task assignments that can start now
  - heuristic may or may not make reservations

[Figure: example mapping event. Tasks t1 to t13 are shown either in the unmapped tasks set or on a timeline of nodes n1 to n4 of cluster 1 and nodes n1 to n2 of cluster 2, with the current time marked on the time axis.]

Planned Heuristics

- four planned heuristics
  - EASY Backfilling
  - FCFS with Multiple Queues
  - Max-Max Value
  - Max-Max Value-Per-Resource
- submit to Metaheuristics International Conference (MIC 2015)
  - submission deadline: 2/6/15

Existing Work – Dr. Siegel’s Group

- focuses on utility of tasks
  - B. Khemka, R. Friese, L. D. Briceño, H. J. Siegel, A. A. Maciejewski, G. A. Koenig, C. Groer, G. Okonski, M. M. Hilton, R. Rambharos, and S. Poole, “Utility Functions and Resource Management in an Oversubscribed Heterogeneous Computing Environment,” IEEE Transactions on Parallel and Distributed Systems, accepted 2014, to appear.
- another work that models stepped value functions
  - J.-K. Kim, S. Shivle, H. J. Siegel, A. A. Maciejewski, T. D. Braun, et al., “Dynamically Mapping Tasks with Priorities and Multiple Deadlines in a Heterogeneous Environment,” Journal of Parallel and Distributed Computing, vol. 67, no. 2, pp. 154-169, Feb. 2007.

Existing Work

- other parallel task scheduling techniques
  - EASY Backfilling
    - D. A. Lifka, “The ANL/IBM SP Scheduling System,” Proc. First Workshop Job Scheduling Strategies for Parallel Processing, pp. 295-303, 1995.
  - G. Sabin, R. Kettimuthu, A. Rajan, and P. Sadayappan, “Scheduling of Parallel Jobs in a Heterogeneous Multi-Site Environment,” Job Scheduling Strategies for Parallel Processing, pp. 87-104, 2003.

Design of Parallel Simulator for Experiments

- extends existing serial simulator from Dr. Siegel’s group
  - modified to handle scheduling of parallel tasks
- created new modules
  - cluster class
    - has nodes within it
  - methods for obtaining parallel task information from workload trace
  - created a sleep task object to model idle time within each machine
- developed an algorithm to locate slots for parallel tasks within the area occupied by sleep tasks
- developed a method that picks the nodes that create the best packing (i.e., create the fewest future restrictions)
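The slides do not show the slot-location algorithm, so the following is only a minimal sketch of one way it could work, assuming each node's idle time (its "sleep tasks") is stored as a list of `(start, end)` intervals. The function name and data layout are hypothetical. The earliest feasible start must coincide with the current time or with the start of some idle interval, so only those times are tried.

```python
import math


def find_earliest_slot(idle, nodes_needed, duration, now=0.0):
    """Return (start_time, free_nodes) for the earliest slot, or None.

    idle maps a node name to its idle intervals (the "sleep tasks"),
    each given as a (start, end) pair; end may be math.inf.
    """
    # Candidate start times: now, or the start of any idle interval still open.
    candidates = sorted({max(now, s)
                         for intervals in idle.values()
                         for s, e in intervals if e > now})
    for start in candidates:
        end = start + duration
        # A node qualifies if one idle interval covers the whole execution.
        free = [node for node, intervals in idle.items()
                if any(s <= start and end <= e for s, e in intervals)]
        if len(free) >= nodes_needed:
            return start, free
    return None


idle = {
    "n1": [(0, 5), (10, math.inf)],
    "n2": [(3, math.inf)],
    "n3": [(8, math.inf)],
}
print(find_earliest_slot(idle, nodes_needed=2, duration=4))
```

The returned list may contain more nodes than the task needs; choosing which of them to use is the job of the packing method mentioned in the last bullet.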

Workloads for Simulations

- will use Dr. Dror Feitelson’s Parallel Workload Trace to model the workload arrival
  - workload log from Curie Supercomputer in France (has 93,312 cores)
    - using last 10 months of data
- may use Downey’s model for execution time scaling

Future Work

- use simulator to implement and compare the planned heuristics
- running a post-mortem analysis
  - use a genetic algorithm to find a loose upper bound solution when we know in advance the arrival time and characteristics of all tasks
- since scheduling is NP-hard it is hard to quantify the performance of heuristics
  - this analysis will give us a better metric to compare our results with

Thank You

Questions? Feedback?


Back-up Slides

Packing Nodes Efficiently

- whenever an assignment is to be made, all heuristics pick the nodes that create the least amount of restrictions for future assignments
  - e.g., if task t8 needs 3 nodes, it will be assigned: n1, n2, n5
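The slide does not define how "least restrictions" is measured. One plausible reading, sketched here purely as an assumption, is a best-fit rule: among the nodes idle for the whole execution, prefer those whose idle window ends soonest after the task finishes, so that nodes with long idle windows stay open for later tasks. The function name and the idle-window times in the example are invented for illustration.

```python
import math


def pick_nodes_best_fit(idle_until, nodes_needed, finish_time):
    """Pick nodes whose idle window fits the task most tightly.

    idle_until maps a node name to the time its current idle window ends
    (math.inf if nothing is scheduled on it later).
    """
    # Only nodes that stay idle until the task finishes are eligible.
    eligible = [n for n, end in idle_until.items() if end >= finish_time]
    if len(eligible) < nodes_needed:
        return None
    # Smallest leftover idle time first; the node name breaks ties.
    eligible.sort(key=lambda n: (idle_until[n] - finish_time, n))
    return eligible[:nodes_needed]


# Invented idle windows chosen so that the result matches the slide's example.
idle_until = {"n1": 12, "n2": 14, "n3": math.inf, "n4": math.inf, "n5": 15}
print(pick_nodes_best_fit(idle_until, nodes_needed=3, finish_time=10))
```

With these made-up numbers the rule selects n1, n2, and n5 and leaves the fully idle nodes n3 and n4 available, which is the outcome the slide's example describes.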

[Figure: timeline of nodes n1 to n5 with the current time marked on the time axis; task t8 occupies three of the nodes.]

Bhavesh Khemka
I would say that for the sake of this 12-minute presentation, you should move the second bullet of this slide (and its associated figure) and the next two slides (i.e., the heuristic descriptions) to the backup section. You need to add slides about the other items that you guys HAVE worked on this semester, i.e., you need a slide about the simulator explaining what work you have done on it, and also a slide about the Dror F. traces that you have been looking at and what you have been trying to do. It is important to mention those because only then is it an actual "report" of what you guys have worked on this semester.

Heuristics – Overview

- EASY Backfilling
  - considers tasks in a first come first serve (FCFS) order
  - makes only one reservation for the first task that cannot fit on idle machines
  - backfills other tasks so that they do not delay the reservation
- FCFS with Multiple Queues
  - puts the tasks in three queues
  - takes 1, 4, and 8 tasks from the large, medium, and small queues respectively
  - assigns tasks if possible, and otherwise makes the earliest reservation for them
  - repeats until the queues are empty
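The EASY Backfilling steps listed above can be sketched as follows. This is a simplified illustration, not the project's implementation: it assumes a single pool of identical nodes rather than multiple heterogeneous clusters, exact run times known in advance, and hypothetical names throughout (`easy_backfill`, the tuple layouts).

```python
def easy_backfill(queue, running, free_nodes, now):
    """Return the names of tasks to start at time `now`.

    queue:   list of (name, nodes, runtime) in FCFS order
    running: list of (finish_time, nodes) for tasks already executing
    """
    queue = list(queue)
    running = list(running)
    started = []
    # Step 1: start tasks in FCFS order for as long as they fit.
    while queue and queue[0][1] <= free_nodes:
        name, nodes, runtime = queue.pop(0)
        free_nodes -= nodes
        running.append((now + runtime, nodes))
        started.append(name)
    if not queue:
        return started
    # Step 2: make one reservation for the first task that does not fit.
    head_nodes = queue[0][1]
    available, shadow_time, extra = free_nodes, None, 0
    for finish, nodes in sorted(running):
        available += nodes
        if available >= head_nodes:
            shadow_time = finish            # when the reserved task can start
            extra = available - head_nodes  # nodes it leaves unused then
            break
    if shadow_time is None:
        return started  # the head task needs more nodes than the pool has
    # Step 3: backfill later tasks that cannot delay the reservation.
    for name, nodes, runtime in queue[1:]:
        if nodes > free_nodes:
            continue
        if now + runtime <= shadow_time:
            free_nodes -= nodes             # finishes before the reservation
            started.append(name)
        elif nodes <= extra:
            free_nodes -= nodes             # uses only nodes the head does not need
            extra -= nodes
            started.append(name)
    return started


queue = [("A", 2, 5), ("B", 6, 10), ("C", 2, 8), ("D", 1, 50), ("E", 1, 1)]
print(easy_backfill(queue, running=[(10, 5)], free_nodes=5, now=0))
```

In the example, B is the task that receives the reservation; C is backfilled because it finishes before B can start, and D is backfilled because it only occupies a node that B will not need.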

Heuristics – Overview

- Max-Max Value
  - First phase: considering all tasks
    - for each task, determine the allocation choice that will earn it the highest value without delaying any place-holder task
      - if there are ties, pick the choice with the earlier completion time
  - Second phase: consider tasks from first phase
    - make an assignment or a place-holder for the choice that earns the highest value
      - this assignment should not start execution after the start of the earliest place-holder task
  - repeat the two phases until no more tasks can be mapped
- Max-Max Value-Per-Resource
  - similar to Max-Max Value
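The two-phase loop of Max-Max Value can be sketched as below. This is a deliberately reduced illustration: it omits place-holders entirely and only makes assignments that start now, it tracks each cluster as a count of idle nodes, and it breaks second-phase ties by earlier completion time, which the slide does not specify. All names and the task dictionary layout are assumptions.

```python
def max_max_value(tasks, free_nodes, now):
    """Return a list of (task_name, cluster) assignments that start at `now`.

    tasks: name -> {"nodes": int,
                    "exec_time": {cluster: time},
                    "value": callable(completion_time) -> float}
    free_nodes: cluster -> number of idle nodes
    """
    free_nodes = dict(free_nodes)
    unmapped = dict(tasks)
    assignments = []
    while unmapped:
        best = None
        for name, task in unmapped.items():
            # First phase: best allocation choice for this task.
            # Ties on value go to the earlier completion time.
            choices = [(task["value"](now + t), -(now + t), cluster)
                       for cluster, t in task["exec_time"].items()
                       if free_nodes.get(cluster, 0) >= task["nodes"]]
            if not choices:
                continue
            value, neg_done, cluster = max(choices)
            # Second phase: keep the task whose best choice earns the most.
            if best is None or (value, neg_done) > best[:2]:
                best = (value, neg_done, name, cluster)
        if best is None:
            break  # no remaining task can start now
        _, _, name, cluster = best
        free_nodes[cluster] -= unmapped.pop(name)["nodes"]
        assignments.append((name, cluster))
    return assignments


tasks = {
    "t1": {"nodes": 2, "exec_time": {"c1": 10, "c2": 20},
           "value": lambda done: 10.0 if done <= 15 else 5.0},
    "t2": {"nodes": 2, "exec_time": {"c1": 5, "c2": 8},
           "value": lambda done: 8.0},
    "t3": {"nodes": 4, "exec_time": {"c1": 5},
           "value": lambda done: 100.0},
}
print(max_max_value(tasks, {"c1": 2, "c2": 2}, now=0))
```

A Max-Max Value-Per-Resource variant could presumably rank choices by value divided by the resources consumed, but the slide gives no definition, so that is left out of the sketch.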

Simulation Study

- to model real-world system environment
- experiments run on ISTeC Cray HPC System
- uses real workload traces as inputs