Andreea Chis, under the guidance of Frédéric Desprez and Eddy Caron
Scheduling for a Climate Forecast Application
ANR-05-CIGC-11
Contents
1. Introduction
2. Related Works
3. Scheduling Heuristics
4. Simulation Results
5. Conclusions and Future Works
1. Introduction
General Purpose
Context: global warming and climate fluctuations
Numerical simulations using general circulation models of the climate system:
• atmosphere
• ocean
• continental surfaces
Climatologists’ purpose: estimate the sensitivity of global warming simulations with respect to the model’s parameterization
Climate forecast application provided by CERFACS within the LEGO project
Our Goal
Analyze the application
Model its needs
• Execution model
• Data access pattern
• Computing needs
Design, test and compare appropriate scheduling heuristics
Provide generic scheduling schemes for applications with similar dependence graphs
Application Description
“Scenario” simulations: the current climate followed by the 21st century, over 150 years (1800 months)
Scenarios differ in the parameterization of the atmospheric model
Application Description
One monthly simulation:
concatenate_atmospheric_input_files (1), modify_parameters (1)
process_coupled_run: atmospheric model (ARPEGE), ocean and sea-ice model (OPA) and runoff pathway (TRIP), linked by the coupler (OASIS)
convert_output_format (60)
compress_diagonals (30), extract_minimum_information (30)
2. Related Works
• Multiple DAGs Scheduling
• Mixed Parallelism
• Pipelined Data Parallel Tasks
Multiple DAGs Scheduling
Directed Acyclic Graph (DAG): nodes are tasks, edges are precedence constraints
Multiple DAGs Scheduling
Composite DAG
Multiple DAGs Scheduling
Group the DAGs’ tasks into levels of independent tasks
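To make the level idea concrete, here is a minimal Python sketch. The successor-dictionary representation and the name `dag_levels` are our own illustrative choices, not taken from the surveyed works. A task's level is the length of the longest path from an entry task to it, so every edge goes to a strictly higher level and tasks sharing a level are independent.

```python
from collections import defaultdict


def dag_levels(succ):
    """Group tasks into levels of mutually independent tasks (illustrative).

    succ maps each task to the list of its successors. A task's level is the
    length of the longest path reaching it from an entry task, so a task always
    lies on a strictly higher level than each of its predecessors.
    """
    tasks = set(succ) | {v for vs in succ.values() for v in vs}
    indeg = {t: 0 for t in tasks}
    for vs in succ.values():
        for v in vs:
            indeg[v] += 1
    level = {t: 0 for t in tasks}
    ready = [t for t in tasks if indeg[t] == 0]  # entry tasks
    while ready:  # Kahn's topological traversal
        u = ready.pop()
        for v in succ.get(u, ()):
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    groups = defaultdict(list)
    for t, lvl in level.items():
        groups[lvl].append(t)
    return [sorted(groups[lvl]) for lvl in sorted(groups)]


# Two hypothetical DAGs handled together: their tasks share levels.
dags = {
    "a1": ["a2", "a3"], "a2": ["a4"], "a3": ["a4"],  # DAG A (fork-join)
    "b1": ["b2"], "b2": ["b3"],                      # DAG B (chain)
}
print(dag_levels(dags))  # [['a1', 'b1'], ['a2', 'a3', 'b2'], ['a4', 'b3']]
```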
Related Works: Multiple DAGs Scheduling
• Composite DAG with a round-robin scheduling policy among DAGs
• Composite DAG with ranking-based composition
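For the round-robin policy, a simple illustration (our own simplification, not the exact algorithm of the works surveyed here) is to build a single priority list by taking one task from each DAG in turn, then hand that list to a list scheduler. Ranking-based composition would instead order tasks by a rank value and is not shown. The function name is hypothetical.

```python
from itertools import zip_longest

_GAP = object()  # filler for DAGs that have run out of tasks


def round_robin_merge(dag_orders):
    """Interleave per-DAG task lists, taking one task from each DAG in turn.

    Each list is assumed to already respect its own DAG's precedence
    constraints (e.g. a topological order), so the merged list does too.
    """
    merged = []
    for batch in zip_longest(*dag_orders, fillvalue=_GAP):
        merged.extend(t for t in batch if t is not _GAP)
    return merged


print(round_robin_merge([["a1", "a2", "a3"], ["b1", "b2"]]))
# ['a1', 'b1', 'a2', 'b2', 'a3']
```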
Mixed Parallelism
Parallel scientific applications may exhibit:
• data parallelism
• task parallelism
• mixed parallelism (task and data parallelism combined)
Scheduling a DAG on a finite number of resources is NP-complete, even in the simple case of mono-processor tasks
Hence the use of heuristic approaches
Mixed Parallelism
A. Radulescu & A. van Gemund (2001): CPA (Critical Path and Area-based Scheduling), a two-step heuristic
• Step 1: processor allocation to tasks, based on a compromise between the critical path length and the processor utilization
• Step 2: task allocation on processors, using a list scheduling heuristic
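The allocation step can be sketched as follows. This is our simplified reading of CPA, not the authors' code: communication costs are ignored and execution times follow Amdahl's law with a hypothetical sequential fraction `alpha`. Every task starts on one processor; while the critical path length T_CP exceeds the average area T_A = (1/R) Σ p(t)·T(t, p(t)), the critical-path task that gains most from one extra processor receives it. The list-scheduling step is not shown, and all function names are illustrative.

```python
def amdahl(t1, alpha, n):
    # Execution time on n processors; alpha is the sequential fraction.
    return (alpha + (1 - alpha) / n) * t1


def topo_order(succ, tasks):
    indeg = {t: 0 for t in tasks}
    for u in tasks:
        for v in succ.get(u, ()):
            indeg[v] += 1
    order, ready = [], [t for t in tasks if indeg[t] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ.get(u, ()):
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order


def cpa_allocate(succ, t1, alpha, R):
    """CPA-style allocation step (sketch): returns {task: processors}."""
    tasks = list(t1)
    order = topo_order(succ, tasks)
    preds = {t: [] for t in tasks}
    for u in tasks:
        for v in succ.get(u, ()):
            preds[v].append(u)
    p = {t: 1 for t in tasks}  # start with one processor per task
    while True:
        w = {t: amdahl(t1[t], alpha[t], p[t]) for t in tasks}
        finish, crit_pred = {}, {}
        for t in order:  # longest-path finish times, communication ignored
            crit_pred[t] = max(preds[t], key=finish.get, default=None)
            start = finish[crit_pred[t]] if crit_pred[t] is not None else 0.0
            finish[t] = start + w[t]
        t_cp = max(finish.values())                # critical path length
        t_a = sum(p[t] * w[t] for t in tasks) / R  # average area
        if t_cp <= t_a:
            break
        node, path = max(finish, key=finish.get), []
        while node is not None:  # walk back along the critical path
            path.append(node)
            node = crit_pred[node]
        candidates = [t for t in path if p[t] < R]
        if not candidates:
            break

        def gain(t):  # decrease of T(t, p) / p when adding one processor
            return w[t] / p[t] - amdahl(t1[t], alpha[t], p[t] + 1) / (p[t] + 1)

        p[max(candidates, key=gain)] += 1
    return p


# Hypothetical fork-join DAG a -> {b, c} -> d on R = 8 processors.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
t1 = {"a": 100.0, "b": 400.0, "c": 200.0, "d": 100.0}
alpha = {t: 0.1 for t in t1}
print(cpa_allocate(succ, t1, alpha, R=8))
```

Each iteration adds one processor to one task and no task exceeds R, so the loop always terminates.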
Pipelined Data Parallel Tasks
Computations consisting of a chain of data-parallel tasks that process successive data sets in a pipelined fashion; a particular case of mixed parallelism
Two key metrics to optimize:
• Latency: the time needed to process one data set
• Throughput: the rate at which data sets can be processed
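As a small illustration (stage times are hypothetical and communication is ignored), latency is the sum of the stage times, throughput is the inverse of the slowest stage's time, and for n data sets the last one completes after latency + (n − 1) × (slowest stage time). The function names are ours; `simulate_makespan` checks that closed form by direct simulation.

```python
def pipeline_metrics(stage_times):
    """Latency and throughput of a linear pipeline (communication ignored)."""
    latency = sum(stage_times)  # time for one data set to cross all stages
    period = max(stage_times)   # the slowest stage is the bottleneck
    return latency, 1.0 / period


def simulate_makespan(stage_times, n_datasets):
    """Finish time of the last data set when each stage handles one data set
    at a time and data sets traverse the stages in order."""
    finish = [0.0] * len(stage_times)  # last finish time of each stage
    for _ in range(n_datasets):
        ready = 0.0  # the next data set is available at stage 0 immediately
        for j, t in enumerate(stage_times):
            finish[j] = max(finish[j], ready) + t
            ready = finish[j]
    return finish[-1]


stages = [2.0, 5.0, 3.0]  # hypothetical stage times
latency, throughput = pipeline_metrics(stages)
print(latency, throughput)           # 10.0 0.2
print(simulate_makespan(stages, 4))  # 25.0 = 10.0 + 3 * 5.0
```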
Related Works: Pipelined Data Parallel Tasks
Aspects to be considered:
Clustering of successive stages into modules
• Reduces communications
• Improves latency
Replicating modules
• Improves throughput
• Increases latency
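The replication trade-off can be seen with a toy model. The sketch below assumes that a module's execution time follows Amdahl's law (the model used for the simulations in this work) and that k replicas split the module's processors evenly; the values 500, 0.1 and 8 are hypothetical, as is the function name `replicate`. Each data set then runs on fewer processors, so latency rises, but k data sets are processed at once, so throughput rises too.

```python
def amdahl(t1, alpha, n):
    # Execution time on n processors; alpha is the non-parallelizable fraction.
    return (alpha + (1 - alpha) / n) * t1


def replicate(t1, alpha, processors, k):
    """Split a module's processors into k replicas working on different
    data sets concurrently. Returns (latency, throughput)."""
    if processors % k:
        raise ValueError("k must divide the number of processors")
    latency = amdahl(t1, alpha, processors // k)  # fewer processors per data set
    return latency, k / latency                   # k data sets in flight


for k in (1, 2, 4, 8):
    lat, thr = replicate(500.0, 0.1, 8, k)
    print(f"k={k}: latency={lat:.2f}, throughput={thr:.5f}")
```

For any sequential fraction 0 < α < 1, latency grows and throughput improves as k increases; with α = 0 (perfect speedup) replication would only increase latency.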
3. Scheduling Heuristics
• Climate Application Scheduling
• Generic Scheduling Heuristics
Climate Application Scheduling
Homogeneous platform composed of R resources
Communications through NFS are assumed to be contention-free
The execution time of a task is assumed to include the time needed to:
• access the data
• redistribute it to processors
• perform the effective computation
• store the data back
Climate Application Scheduling
Main processing: concatenate_atmospheric_input_files (1), modify_parameters (1), process_coupled_run
Post-processing: convert_output_format (60), compress_diagonals (30), extract_minimum_information (30)
Climate Application Scheduling
We divide processors into disjoint sets on which multi-processor tasks can execute
All multi-processor tasks execute on the same number of resources G, defining a certain grouping of resources
For the given application, G can take 8 possible values (4 to 11)
Climate Application Scheduling
Case 1
Case 2
Climate Application Scheduling
The makespan is computed analytically as a function of the number of resources R, the grouping G, the number of months in an independent simulation (NM) and the number of independent simulations (NS).
The grouping G yielding the smallest makespan is chosen
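The selection rule itself is a simple argmin over the 8 candidate groupings. The analytical makespan formula is not given here, so the sketch below substitutes a simplified model of our own: floor(R/G) groups, at most NS of them busy, each running one monthly main task at a time with months of different simulations interleaved, and post-processing ignored. The per-month times T[G] are made up from Amdahl's law; the function names are hypothetical.

```python
import math


def amdahl(t1, alpha, n):
    return (alpha + (1 - alpha) / n) * t1


def estimated_makespan(R, G, NM, NS, T):
    """Simplified model (ours, not the authors' formula): floor(R/G) groups of
    G resources, at most NS of them busy, months interleaved across the
    independent simulations. Post-processing is ignored."""
    groups = min(R // G, NS)
    if groups == 0:
        return math.inf
    return math.ceil(NS * NM / groups) * T[G]


def best_grouping(R, NM, NS, T):
    # Keep the grouping G with the smallest estimated makespan.
    return min(T, key=lambda G: estimated_makespan(R, G, NM, NS, T))


# Hypothetical per-month main-task times for G = 4..11 (not measured data).
T = {G: amdahl(1000.0, 0.1, G) for G in range(4, 12)}
print(best_grouping(R=53, NM=1800, NS=10, T=T))
```

With these made-up times the simplified model selects G = 5; the G = 7 reported for the real application comes from the actual task durations and the authors' analytical formula, which are not reproduced here.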
Climate Application Scheduling
The constraint of scheduling all multi-processor tasks on the same number of resources is restrictive. E.g., for R = 53, NS = 10, NM = 1800:
• the optimal grouping found is G = 7:
– 49 resources for main processing
– 1 resource for the corresponding post-processing
– 3 resources unused
• however, using 3 groups of 8 resources and 4 groups of 7 resources yields a 4.5% gain
Climate Application Scheduling
Possibilities for improvement:
Heuristic 1
• distribute the unused resources evenly among the existing groups
Heuristic 2
• use all resources for multi-processor tasks (distributing the extra resources evenly among processor groups)
• perform all post-processing at the end
Heuristic 3
• use all resources for multi-processor tasks and model the problem as an instance of the knapsack problem
• perform all post-processing at the end
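Heuristics 1 and 2 both come down to spreading leftover resources evenly over the existing groups. A minimal sketch (the function name is ours), applied to the R = 53, G = 7 example, where 49 resources form 7 groups, 1 resource serves post-processing and 3 are unused:

```python
def distribute_evenly(total, n_groups):
    """Split `total` resources into n_groups sizes that differ by at most 1."""
    base, extra = divmod(total, n_groups)
    return [base + 1] * extra + [base] * (n_groups - extra)


# Heuristic 1: keep 1 resource for post-processing, spread the 3 unused ones.
print(distribute_evenly(53 - 1, 7))  # [8, 8, 8, 7, 7, 7, 7]
# Heuristic 2: all 53 resources go to multi-processor tasks.
print(distribute_evenly(53, 7))      # [8, 8, 8, 8, 7, 7, 7]
```

The Heuristic 1 result is exactly the "3 groups of 8, 4 groups of 7" configuration quoted for this example.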
Climate Application Scheduling
Knapsack problem modeling:
• Items: the 8 possible groupings of resources (G = 4 to 11) for allocating processors to multi-processor tasks
• Cost of an item: the number of resources of that grouping
• Value of a grouping G: 1/T[G], the fraction of a multi-processor task executed per time unit on G resources
• Unknowns: n_i (i = 4 to 11), the number of groups with i resources in the final solution
Constraints:
$$\sum_{i=4}^{11} i \, n_i \le R \qquad \sum_{i=4}^{11} n_i \le NS$$
Goal: maximize
$$\sum_{i=4}^{11} \frac{n_i}{T[i]}$$
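Because the instance is tiny (8 item types, small R and NS), the integer program can be solved exactly by dynamic programming over (remaining resources, remaining groups). The sketch below is our own solver choice, not necessarily the one used in this work; T[G] values in the usage are hypothetical.

```python
from functools import lru_cache


def knapsack_groupings(T, R, NS):
    """Choose n_i groups of i resources (i in T) maximizing sum(n_i / T[i])
    subject to sum(i * n_i) <= R and sum(n_i) <= NS."""
    sizes = sorted(T)

    @lru_cache(maxsize=None)
    def best(r, k):
        # Best (value, chosen sizes) with at most r resources and k groups left.
        result = (0.0, ())
        if k == 0:
            return result
        for g in sizes:
            if g <= r:
                value, chosen = best(r - g, k - 1)
                if value + 1.0 / T[g] > result[0]:
                    result = (value + 1.0 / T[g], chosen + (g,))
        return result

    value, chosen = best(R, NS)
    return value, {g: chosen.count(g) for g in sizes}


# Hypothetical per-month times for G = 4..11 (Amdahl-style, made up).
T = {G: (0.1 + 0.9 / G) * 1000.0 for G in range(4, 12)}
value, n = knapsack_groupings(T, R=53, NS=10)
print(n, value)
```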
Generic Scheduling Heuristics
We propose generic scheduling heuristics for a class of applications consisting of independent, identical chains of identical DAGs
Generic Scheduling Heuristics
First approach
• Create a composite DAG: link all entry nodes to a common entry node and all exit nodes to a common exit node
• Apply mixed-parallelism scheduling heuristics to the composite DAG, e.g. CPA:
– reduced complexity: O(V(V+E)R)
– drawback: it is a two-step algorithm
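A sketch of the composite DAG construction, assuming each DAG is a successor dictionary (our representation; the function and task names are illustrative). Nodes are tagged with their DAG index so identical copies stay distinct, and the common entry and exit nodes are pseudo-tasks, assumed here to have zero cost. The resulting single DAG is what a mixed-parallelism heuristic such as CPA would receive.

```python
def composite_dag(dags, entry="ENTRY", exit_="EXIT"):
    """Merge DAGs (successor dicts) into one DAG with a common entry and exit.

    Nodes are renamed (dag_index, task) so identical DAGs stay distinct; the
    entry and exit nodes are pseudo-tasks (assumed to have zero cost)."""
    comp = {entry: [], exit_: []}
    for idx, succ in enumerate(dags):
        targets = {v for vs in succ.values() for v in vs}
        nodes = set(succ) | targets
        for u in sorted(nodes):
            comp[(idx, u)] = [(idx, v) for v in succ.get(u, ())]
            if u not in targets:  # no predecessor: entry task of this DAG
                comp[entry].append((idx, u))
            if not succ.get(u):   # no successor: exit task of this DAG
                comp[(idx, u)].append(exit_)
    return comp


# Two identical hypothetical chains of two steps each.
month_chain = {"m1": ["m2"]}
for node, successors in composite_dag([month_chain, month_chain]).items():
    print(node, successors)
```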
Generic Scheduling Heuristics
Second approach
Exploit knowledge of the specific structure of the application:
• Exploit the pipelined structure of the application
• Separate the independent pre- and post-processing tasks and schedule them with algorithms for independent malleable tasks (5/4-approximation in constant time)
Generic Scheduling Heuristics
Heuristic 1
• Schedule all pre-processing tasks at the beginning
• Schedule inter- and main processing tasks as an interval on the same number of resources
• Schedule all post-processing tasks at the end
Heuristic 2
• Schedule all pre-processing tasks at the beginning
• Schedule inter- and main processing tasks separately, as a pipeline
• Schedule all post-processing tasks at the end
Generic Scheduling Heuristics
Heuristic 3
• Schedule inter- and main processing tasks as an interval pipeline on the same number of resources
• Schedule pre- and post-processing tasks simultaneously with the pipeline, on resources specially reserved for them as well as on resources unused by the pipeline
• Schedule the remaining pre- and post-processing tasks at the beginning and end of the pipeline, respectively
Generic Scheduling Heuristics
Heuristic 4
• Schedule inter- and main processing tasks separately, as a pipeline
• Schedule pre- and post-processing tasks simultaneously with the pipeline, on resources specially reserved for them as well as on resources unused by the pipeline
• Schedule the remaining pre- and post-processing tasks at the beginning and end of the pipeline, respectively
4. Simulation Results
The behavior of the 4 heuristics is compared with that of CPA applied to the composite DAG
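Tasks’ execution times are modeled by Amdahl’s law:
$$T(t, n) = \left(\alpha + \frac{1 - \alpha}{n}\right) T(t, 1)$$
where T(t, 1) (written T1 in the configurations) is the execution time of task t on 1 processor, n is the number of processors and α is the non-parallelizable fraction of the task
Several configurations were tested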
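Amdahl's law as used here can be evaluated in a few lines; this sketch assumes α is the non-parallelizable fraction, as in the formula above, and the function name is ours. It evaluates the Configuration 1 parameters and the extreme case α = 1.0, where extra processors give no speedup at all.

```python
def amdahl_time(t1, alpha, n):
    """T(t, n) = (alpha + (1 - alpha) / n) * T(t, 1).

    t1: execution time on one processor; alpha: non-parallelizable fraction;
    n: number of processors allocated to the task."""
    if n < 1:
        raise ValueError("a task needs at least one processor")
    return (alpha + (1 - alpha) / n) * t1


# Configuration 1 parameters: T1 = 500, alpha = 0.1.
for n in (1, 2, 4, 8):
    print(n, amdahl_time(500.0, 0.1, n))
# alpha = 1.0 (inter-processing in Configuration 4): no speedup.
print(amdahl_time(500.0, 1.0, 8))  # 500.0
```

As n grows, T(t, n) approaches α·T(t, 1), so the speedup is bounded by 1/α.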
Simulation Results
Configuration 1
• All tasks have the same execution time on 1 processor: T1 = 500
• All tasks have the same coefficient: α = 0.1
Simulation Results
Configuration 2
Same as Configuration 1, except α = 0.8 for inter-processing tasks
Simulation Results
Configuration 3
• T1 = 50 for pre- and post-processing tasks, T1 = 500 for main and inter-processing tasks
• α = 0.1, except α = 0.6 for inter-processing tasks
Simulation Results
Configuration 4
• T1 = 50 for pre- and post-processing tasks, T1 = 500 for main and inter-processing tasks
• α = 0.1, except α = 1.0 for inter-processing tasks
5. Conclusions and Future Works
Conclusions
We built a model of the given real application
We proposed a basic heuristic for this model and 3 improved versions
We proposed 4 pipeline-based heuristics for the generalized problem and compared them with the approach of applying a mixed-parallelism algorithm to the composite DAG of the application
Future Works
Enhance the heuristics by taking into account a more precise communication model
Perform real experiments on Grid’5000 to validate the theoretical results
Analyze other applications using a similar approach, with the long-term goal of deriving application-dependent scheduling schemes that could eventually be implemented as DIET plug-in schedulers