
Andreea Chis, under the guidance of Frédéric Desprez and Eddy Caron

Scheduling for a Climate Forecast Application

ANR-05-CIGC-11

Contents

1. Introduction
2. Related Works
3. Scheduling Heuristics
4. Simulation Results
5. Conclusions and Future Works


General Purpose

Context: global warming and climate fluctuations

Numerical simulations using general circulation models of a climate system

• atmosphere

• ocean

• continental surfaces

Climatologists’ purpose: estimate the sensitivity of global warming simulations to the model’s parameterization

Climate forecast application provided by CERFACS within the LEGO project

Introduction

Our Goal

Analyze the application

Model its needs

• Execution model

• Data access pattern

• Computing needs

Elaborate, test and compare appropriate scheduling heuristics

Provide generic scheduling schemes for applications with similar dependence graphs

Application Description

“Scenario” simulations: current climate followed by the 21st century, for 150 years (1800 months), each with a different parameterization of the atmospheric model


One monthly simulation:
• concatenate_atmospheric_input_files(1)
• modify_parameters(1)
• process_coupled_run
• convert_output_format(60)
• compress_diagonals(30)
• extract_minimun_information(30)

atmospheric model (ARPEGE), ocean and sea-ice model (OPA), runoff pathway (TRIP), coupler (OASIS)


Related Works

• Multiple DAGs Scheduling
• Mixed Parallelism
• Pipelined Data Parallel Tasks

Multiple DAGs Scheduling

Directed Acyclic Graph (DAG)
• Nodes: tasks
• Edges: precedence constraints

Composite DAG


Group DAGs’ tasks in levels of independent tasks
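The grouping into levels can be sketched as follows. This is a minimal illustration, not the method of the cited works: the function name `dag_levels` and the toy task names are hypothetical, and a task's level is assumed to be one more than the deepest level among its predecessors.

```python
from collections import defaultdict


def dag_levels(nodes, edges):
    """Group tasks into levels of mutually independent tasks.

    Level 0 holds the tasks without predecessors; every other task sits
    one level below its deepest predecessor.
    """
    succs = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for u, v in edges:
        succs[u].append(v)
        indegree[v] += 1

    level = {n: 0 for n in nodes}
    ready = [n for n in nodes if indegree[n] == 0]
    while ready:
        u = ready.pop()
        for v in succs[u]:
            level[v] = max(level[v], level[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:  # all predecessors of v are placed
                ready.append(v)

    groups = defaultdict(list)
    for n in nodes:
        groups[level[n]].append(n)
    return [groups[k] for k in sorted(groups)]


# Two toy DAGs scheduled together (hypothetical task names).
nodes = ["a1", "b1", "c1", "d1", "a2", "b2"]
edges = [("a1", "b1"), ("a1", "c1"), ("b1", "d1"), ("c1", "d1"), ("a2", "b2")]
levels = dag_levels(nodes, edges)
```

Two tasks in the same level can never depend on each other, since an edge always pushes its target at least one level deeper than its source.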

Related Works – Multiple DAGs Scheduling

Composite DAG and round-robin policy of scheduling among DAGs

Composite DAG & ranking based composition

Mixed Parallelism

Parallel scientific applications
• Data parallelism
• Task parallelism
• Mixed parallelism

Scheduling a DAG on a finite number of resources is NP-complete, even for the simple case of mono-processor tasks

Heuristic approaches


A. Radulescu & A. van Gemund (2001): CPA (Critical Path and Area based Scheduling), a 2-step heuristic

Step 1: processor allocation to tasks, based on a compromise between the critical path length and the processor utilization

Step 2: task allocation on processors, using a list scheduling heuristic
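The allocation step can be sketched as below, in the spirit of CPA rather than as a faithful reimplementation. Assumptions: task times follow Amdahl's law, each task is described by a hypothetical `(t1, alpha)` pair, and processors are added one at a time to the critical-path task with the largest gain, until the critical path is no longer than the average processor area.

```python
def amdahl_time(t1, alpha, n):
    """Execution time on n processors (alpha = non-parallelizable fraction)."""
    return (alpha + (1.0 - alpha) / n) * t1


def critical_path(tasks, succs, alloc, order):
    """Return (length, path) of the longest path under the current allocation.

    order must be a topological order of the task names.
    """
    bottom, nxt = {}, {}
    for t in reversed(order):
        best_len, best_succ = 0.0, None
        for s in succs.get(t, []):
            if bottom[s] > best_len:
                best_len, best_succ = bottom[s], s
        bottom[t] = amdahl_time(*tasks[t], alloc[t]) + best_len
        nxt[t] = best_succ
    t = max(order, key=lambda x: bottom[x])
    length, path = bottom[t], []
    while t is not None:
        path.append(t)
        t = nxt[t]
    return length, path


def cpa_allocate(tasks, succs, order, P):
    """Step 1 only: choose how many of the P processors each task gets."""
    alloc = {t: 1 for t in tasks}

    def gain(t):
        # Reduction of time-per-processor when t receives one more processor.
        t1, alpha = tasks[t]
        n = alloc[t]
        return amdahl_time(t1, alpha, n) / n - amdahl_time(t1, alpha, n + 1) / (n + 1)

    while True:
        cp_len, path = critical_path(tasks, succs, alloc, order)
        area = sum(amdahl_time(*tasks[t], alloc[t]) * alloc[t] for t in tasks) / P
        candidates = [t for t in path if alloc[t] < P]
        if cp_len <= area or not candidates:
            return alloc
        alloc[max(candidates, key=gain)] += 1


# Hypothetical example: chain A -> B plus an independent short task C.
tasks = {"A": (500.0, 0.1), "B": (500.0, 0.1), "C": (50.0, 0.1)}
alloc = cpa_allocate(tasks, {"A": ["B"]}, ["A", "B", "C"], 8)
```

Step 2 (list scheduling of the allocated tasks) is not shown. Because the two steps are decoupled, the allocation is fixed before the actual placement is known.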

Pipelined Data Parallel Tasks

Computations consisting of a chain of data-parallel tasks that process successive data sets in a pipeline fashion – particular case of mixed parallelism

2 key metrics to be optimized:
• Latency: duration of processing a data set
• Throughput: rate at which data sets can be processed

Related Works – Pipelined Data Parallel Tasks

Aspects to be considered:

Clustering of successive stages into modules

• Reduces communications

• Improves latency

Replicating modules

• Improves throughput

• Increases latency
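The effect of replication on both metrics can be illustrated with a small model. This is a sketch under assumptions not stated on the slides: stage times follow Amdahl's law, the processors of a stage are split evenly among its replicas, each replica handles its own share of the data sets, and communication costs are ignored (so the benefit of clustering is not modeled).

```python
def stage_time(t1, alpha, n):
    """Amdahl's law: time of a stage on n processors."""
    return (alpha + (1.0 - alpha) / n) * t1


def pipeline_metrics(stages):
    """Return (latency, throughput) of a pipeline.

    stages: list of (t1, alpha, processors, replicas); processors is
    assumed to be at least replicas.
    """
    latency = 0.0
    throughput = float("inf")
    for t1, alpha, procs, replicas in stages:
        t = stage_time(t1, alpha, procs // replicas)
        latency += t                                 # one data set crosses every stage
        throughput = min(throughput, replicas / t)   # the slowest stage limits the rate
    return latency, throughput


# One stage on 8 processors, without and with replication (hypothetical values).
lat1, thr1 = pipeline_metrics([(100.0, 0.2, 8, 1)])
lat2, thr2 = pipeline_metrics([(100.0, 0.2, 8, 2)])
```

With two replicas each data set runs on only 4 processors, so its latency grows from about 30 to about 40 time units, while the stage completes 2 data sets every 40 units instead of 1 every 30.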


Scheduling Heuristics

• Climate Application Scheduling
• Generic Scheduling Heuristics

Climate Application Scheduling

Homogeneous platform composed of R resources

Communication assumed contention-free through NFS

Task execution time is assumed to include the time needed to:
• access the data
• redistribute it to processors
• perform the effective computation
• store back the data


[Figure: the monthly simulation DAG (concatenate_atmospheric_input_files, modify_parameters, process_coupled_run, convert_output_format, compress_diagonals, extract_minimun_information), with its tasks grouped into main processing and post-processing]




We divide processors into disjoint sets on which multi-processor tasks can execute

All multi-processor tasks execute on the same number of resources G, defining a certain grouping of resources

For the given application, there are 8 possible values for the parameter G (4 → 11)


[Figures: schedule layouts for Case 1 and Case 2]


The makespan is computed analytically as a function of the number of resources R, the grouping G, the number of months in an independent simulation (NM) and the number of independent simulations (NS).

The grouping G yielding the smallest makespan is chosen
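The selection of G can be sketched as an exhaustive search over the 8 candidate groupings. The makespan formula of the slides is not reproduced here, so the estimate below is a deliberately simplified stand-in and every detail of it is an assumption: post-processing is ignored, at most NS groups are useful because the months of one simulation are sequential, and the task time `main_task_time` is a hypothetical Amdahl-style model.

```python
import math


def main_task_time(G, t1=1000.0, alpha=0.1):
    """Hypothetical T[G]: time of one monthly main task on G resources."""
    return (alpha + (1.0 - alpha) / G) * t1


def estimated_makespan(R, G, NM, NS, T=main_task_time):
    """Simplified estimate: NS * NM main tasks run `groups` at a time."""
    groups = min(NS, R // G)
    if groups == 0:
        return math.inf  # not enough resources for a single group
    rounds = math.ceil(NS * NM / groups)
    return rounds * T(G)


def best_grouping(R, NM, NS, candidates=range(4, 12)):
    """Pick the grouping G (4 to 11) with the smallest estimated makespan."""
    return min(candidates, key=lambda G: estimated_makespan(R, G, NM, NS))


G = best_grouping(R=53, NM=1800, NS=10)
```

The value returned depends entirely on the assumed time model, so it should not be expected to match the grouping reported on the slides.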


The constraint of scheduling all multi-processor tasks on the same number of resources is restrictive

E.g. R = 53, NS = 10, NM = 1800:
• optimal grouping found: G = 7
– 49 resources for main processing
– 1 resource used for the corresponding post-processing
– 3 resources unused
• however, using 3 groups with 8 resources and 4 groups with 7 resources yields a gain of 4.5%


Possibilities for improvement:

Heuristic 1
• distribute the unused resources evenly among the existing groups

Heuristic 2
• use all resources for multi-processor tasks (evenly distributing the extra resources among processor groups)
• all post-processing at the end

Heuristic 3
• use all resources for multi-processor tasks and model the problem as an instance of the knapsack problem
• all post-processing at the end
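The even distribution used by Heuristics 1 and 2 can be sketched on the R = 53, G = 7 example. The helper name `distribute_evenly` and the cap of 11 resources per group (the largest grouping mentioned) are assumptions made for illustration.

```python
def distribute_evenly(group_sizes, extra, max_size=11):
    """Hand out `extra` resources one by one, round-robin over the groups."""
    sizes = list(group_sizes)
    i = 0
    while extra > 0 and any(s < max_size for s in sizes):
        if sizes[i] < max_size:
            sizes[i] += 1
            extra -= 1
        i = (i + 1) % len(sizes)
    return sizes


R, G = 53, 7
n_groups = R // G                  # 7 groups of 7 resources = 49 resources
post = 1                           # 1 resource kept for post-processing (from the example)
unused = R - n_groups * G - post   # 3 resources left idle

# Heuristic 1: give the idle resources to the existing groups.
h1 = distribute_evenly([G] * n_groups, unused)

# Heuristic 2: no resource reserved for post-processing, which runs at the end.
h2 = distribute_evenly([G] * n_groups, R - n_groups * G)
```

For this example Heuristic 1 produces 3 groups of 8 and 4 groups of 7 resources, which is the improved grouping quoted for R = 53.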


Knapsack problem model
• Items: the 8 possibilities (groupings of resources) for allocating processors to multi-processor tasks (4 → 11)
• Cost of an item: the number of resources of that grouping
• Value of a grouping G: 1/T[G], the fraction of a multi-processor task that gets executed in a time unit on G resources
• Unknowns: n_i (i = 4 → 11), the number of groups with i resources in the final solution

Constraints:

$$\sum_{i=4}^{11} i \cdot n_i \le R, \qquad \sum_{i=4}^{11} n_i \le NS$$

Goal: maximize

$$\sum_{i=4}^{11} \frac{n_i}{T[i]}$$
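A small dynamic program can solve this knapsack variant: maximize the sum of n_i / T[i], with the total number of resources used at most R and the number of groups at most NS. The sketch below is one possible formulation, not necessarily the one used by the authors, and the table `T` passed to it holds hypothetical task times.

```python
def knapsack_grouping(R, NS, T):
    """Return (value, counts) maximizing sum(n_i / T[i]).

    T maps a group size i to the time of one multi-processor task on i
    resources; counts maps each chosen size to its number of groups n_i.
    """
    # best[r][k]: best solution using at most r resources and at most k groups
    best = [[(0.0, {}) for _ in range(NS + 1)] for _ in range(R + 1)]
    for r in range(1, R + 1):
        for k in range(1, NS + 1):
            cand = max(best[r - 1][k], best[r][k - 1], key=lambda c: c[0])
            for size, t in T.items():
                if size <= r:
                    value, counts = best[r - size][k - 1]
                    value += 1.0 / t
                    if value > cand[0]:
                        new_counts = dict(counts)
                        new_counts[size] = new_counts.get(size, 0) + 1
                        cand = (value, new_counts)
            best[r][k] = cand
    return best[R][NS]


# Hypothetical task times for the groupings 4 to 11 (Amdahl-style model).
T = {i: (0.1 + 0.9 / i) * 1000.0 for i in range(4, 12)}
value, counts = knapsack_grouping(R=53, NS=10, T=T)
```

The table has (R + 1) x (NS + 1) cells and each cell tries the 8 group sizes, so the search stays cheap for platforms of this size.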


Generic Scheduling Heuristics

We propose generic scheduling heuristics for a class of applications consisting of independent identical chains of identical DAGs

First approach

Create a composite DAG
• link all entry nodes to a common entry node and all exit nodes to a common exit node

Apply mixed parallelism scheduling heuristics on the composite DAG
• CPA
– reduced complexity (O(V(V+E)R))
– drawback of being a 2-step algorithm
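Building the composite DAG can be sketched as follows. The representation (each DAG as a pair of node and edge lists), the function name `composite_dag`, the `d<k>:` prefixes and the `ENTRY`/`EXIT` node names are all assumptions made for the example.

```python
def composite_dag(dags, entry="ENTRY", exit_="EXIT"):
    """Merge several DAGs under one common entry node and one common exit node.

    dags: list of (nodes, edges) pairs; node names are prefixed with the
    DAG index so that identical DAGs do not clash.
    """
    nodes, edges = [entry, exit_], []
    for k, (dag_nodes, dag_edges) in enumerate(dags):
        def name(n, k=k):
            return f"d{k}:{n}"

        has_pred = {v for _, v in dag_edges}
        has_succ = {u for u, _ in dag_edges}
        for n in dag_nodes:
            nodes.append(name(n))
            if n not in has_pred:   # entry node of this DAG
                edges.append((entry, name(n)))
            if n not in has_succ:   # exit node of this DAG
                edges.append((name(n), exit_))
        edges.extend((name(u), name(v)) for u, v in dag_edges)
    return nodes, edges


# Two identical toy DAGs: main -> post (hypothetical task names).
dag = (["main", "post"], [("main", "post")])
nodes, edges = composite_dag([dag, dag])
```

The common entry and exit nodes carry no work; they only make the result a single DAG on which a mixed-parallelism heuristic such as CPA can be applied.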




Second approach

Exploit the knowledge on the specific structure of the application

• Exploit the pipelined structure of the application

• Separate the independent pre and post-processing tasks and schedule them with algorithms for independent malleable tasks (5/4 approximation in constant time)




Heuristic 1
• Schedule all pre-processing tasks at the beginning
• Schedule inter and main processing tasks as an interval pipeline on the same number of resources
• Schedule all post-processing tasks at the end

Heuristic 2
• Schedule all pre-processing tasks at the beginning
• Schedule inter and main processing tasks separately as a pipeline
• Schedule all post-processing tasks at the end

Heuristic 3
• Schedule inter and main processing tasks as an interval pipeline on the same number of resources
• Schedule pre and post-processing tasks simultaneously with the pipeline, on resources specially reserved for them as well as resources unused by the pipeline
• Schedule the remaining pre and post-processing tasks at the beginning and end of the pipeline respectively

Heuristic 4
• Schedule inter and main processing tasks separately as a pipeline
• Schedule pre and post-processing tasks simultaneously with the pipeline, on resources specially reserved for them as well as resources unused by the pipeline
• Schedule the remaining pre and post-processing tasks at the beginning and end of the pipeline respectively



Simulation Results

Behavior of the 4 heuristics tested against CPA applied on the composite DAG

Tasks’ execution time modeled by Amdahl’s law:

$$T(t, n) = \left(\alpha + \frac{1 - \alpha}{n}\right) T(t, 1)$$

Several configurations tested
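The time model can be evaluated with a few lines of code. The sketch assumes the usual form of Amdahl's law, T(t, n) = (α + (1 - α)/n) · T(t, 1), with α the non-parallelizable fraction of a task; the function name `amdahl_time` is hypothetical.

```python
def amdahl_time(t1, alpha, n):
    """Execution time on n processors of a task taking t1 on one processor."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (alpha + (1.0 - alpha) / n) * t1


# A task with T1 = 500 and alpha = 0.1 on 1, 2, 4 and 8 processors.
times = [amdahl_time(500.0, 0.1, n) for n in (1, 2, 4, 8)]

# With alpha = 1.0 the task does not speed up at all.
sequential = amdahl_time(500.0, 1.0, 8)
```

A larger α for the inter-processing tasks therefore means that giving them more processors brings less and less benefit.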

Configuration 1
• All tasks’ execution time on 1 processor is identical (500)
• All tasks’ coefficient α is identical (0.1)

Configuration 2
• Same as Configuration 1, with α inter-processing = 0.8

Configuration 3
• T1 pre-processing = T1 post-processing = 50, T1 main-processing = T1 inter-processing = 500
• α = 0.1, α inter-processing = 0.6

Configuration 4
• T1 pre-processing = T1 post-processing = 50, T1 main-processing = T1 inter-processing = 500
• α = 0.1, α inter-processing = 1.0


Conclusions

We found a model for the given real application

We proposed a basic heuristic for this model and 3 improved versions

We proposed 4 pipeline-based heuristics for the generalized problem and compared them with the approach of applying a mixed-parallelism algorithm to the composite DAG of the application

Future Works

Enhance the heuristics by taking into account a more precise communication model

Perform real experiments on Grid’5000 in order to validate the theoretical results

Analyze other applications using a similar approach, with the long-term goal of deriving application-dependent scheduling schemes that could eventually be implemented as DIET plug-in schedulers
