Federal Department of Home Affairs FDHA
Federal Office of Meteorology and Climatology MeteoSwiss
Overview of the crCLIM project
Christophe Charpilloz
COSMO users workshop, 20th of January 2017
2© COSMO user workshop, the 20th of January Christophe Charpilloz
Cloud-resolving climate modeling on future supercomputing platforms (crCLIM) [1]
An SNF-funded Sinergia project
[1] http://www.c2sm.ethz.ch/research/crCLIM.html
An interdisciplinary project
• MeteoSwiss
• Institute for Atmospheric and Climate Science, ETHZ
• Institute for Computer Systems, ETHZ
• Swiss National Supercomputing Center
The team and subprojects
• Subproject A: Oliver Fuhrer, Andrea Arteaga, Christophe Charpilloz
• Subproject B: Torsten Hoefler, Salvatore di Girolamo, Thomas Schulthess
• Subproject C: Christoph Schaer, Linda Schlemmer, Nikolina Ban, David Leutwyler, Daniel Luethi
• Subproject D: Heini Wernli, Michael Sprenger, Nicolas Piaget, Stefan Ruedisueli*
Bold: leader; *: speaker at the current workshop
Goals of the project
1. Improve our understanding of the processes governing the water cycle in a changing climate
2. Improve the representation of the water cycle in climate models
3. Propose a computational framework that enables large-scale climate simulations and their analysis
Some numbers
• 10-year climate simulation
• On a continental scale
  - See the label “This proposal” on the figure
• Horizontal grid resolution of 2.2 km
• 60 to 80 vertical levels
Problems
• Huge computational cost
• Huge amount of data generated by the simulation
  - Currently 4.4 TB (4400 GB) per year of simulation
  - Difficult, if not impossible, to store
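As a back-of-the-envelope illustration of the storage problem, the sketch below multiplies the 4.4 TB per simulated year quoted above by the 10-year simulation length from the "Some numbers" slide. It assumes the output rate stays constant over the whole run, which the slides do not state; the function name is hypothetical.

```python
# Rough storage estimate.
# Assumption (not from the slides): the output rate is constant over the run.
TB_PER_SIMULATED_YEAR = 4.4  # from the slide: 4.4 TB per year of simulation
SIMULATED_YEARS = 10         # from the slide: 10-year climate simulation


def total_output_tb(tb_per_year: float, years: int) -> float:
    """Total output volume in TB for a run of the given length."""
    return tb_per_year * years


total_tb = total_output_tb(TB_PER_SIMULATED_YEAR, SIMULATED_YEARS)
print(f"Estimated output: {total_tb:.1f} TB ({total_tb * 1000:.0f} GB)")
```

Under that assumption the full run produces on the order of tens of terabytes, which is the volume the data-handling approach has to avoid storing.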
Solutions - Huge computational cost
• Use large supercomputer systems
  - Use hybrid CPU-GPU architectures
  - Kesch, Daint, …
• The COSMO model has already been adapted to run on these architectures [2]
  - COSMO-pompa, STELLA, CPP DyCore
[2] O. Fuhrer, C. Osuna, X. Lapillonne, T. Gysi, M. Bianco, and T. Schulthess. "Towards gpu-accelerated operational weather forecasting." In The GPU Technology Conference, GTC. 2013.
Solutions - Huge amount of data
• Being able to run the model in a reasonable amount of time is only one part of the solution
• How do we conduct the analysis if we cannot store the data?
  - We propose a trade-off between computational time and storage
  - The idea is to trade space for time
Current situation
1. The simulation runs
2. The simulation generates data
3. The generated data is read by the analysis application
Diagram: (1) Run simulation, (2) Store results, (3) Analyse the results
Solutions – The data virtualization layer (DVL)
• Use a data virtualization layer (DVL)
• Developed by Salvatore di Girolamo
  - Subproject B
• The DVL is a layer between the analysis application and the data
DVL – Original simulation
DVL – Writing save points
The DVL – Data access
The DVL – Interception of the access
The DVL – The data is available
The DVL – “Simple” read
The DVL – Data is returned
The DVL – Data not available
The DVL – Re-run
The DVL – The data is computed
The DVL – The data is returned
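The access sequence on these slides (intercept the read, return stored data if available, otherwise re-run from a save point and return the computed data) can be sketched in a few lines. This is only a toy illustration, not the actual DVL: the class `ToyDVL`, the function `step`, and the integer "state" are all hypothetical stand-ins, and a real model state would be a set of fields on disk.

```python
from typing import Dict

State = int  # toy model state (assumption: stands in for the model fields)


def step(state: State) -> State:
    """Toy deterministic time step standing in for the simulation."""
    return (state * 31 + 7) % 1_000_003


class ToyDVL:
    """Hypothetical layer between the analysis application and the data."""

    def __init__(self, initial: State, n_steps: int, save_every: int) -> None:
        self.n_steps = n_steps
        self.save_points: Dict[int, State] = {}  # sparse save points
        self.outputs: Dict[int, State] = {}      # outputs currently stored
        self.reruns = 0
        # Original simulation: keep only every `save_every`-th state.
        state = initial
        for t in range(n_steps + 1):
            if t % save_every == 0:
                self.save_points[t] = state
            state = step(state)

    def read(self, t: int) -> State:
        """Intercepted data access for time step t."""
        if not 0 <= t <= self.n_steps:
            raise IndexError(f"time step {t} outside the simulated range")
        if t in self.outputs:
            return self.outputs[t]               # data available: simple read
        # Data not available: re-run from the closest earlier save point.
        start = max(s for s in self.save_points if s <= t)
        state = self.save_points[start]
        for _ in range(start, t):
            state = step(state)
        self.reruns += 1
        self.outputs[t] = state                  # keep the recomputed output
        return state


dvl = ToyDVL(initial=1, n_steps=20, save_every=5)
first = dvl.read(7)    # triggers a re-run from save point 5
second = dvl.read(7)   # served from the stored output, no re-run
```

The design choice shown here is the space-for-time trade: only the save points are kept permanently, and every other time step costs at most `save_every - 1` recomputed steps on first access.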
Open questions and research – The DVL
• Open questions regarding data access
  - Performance?
    • Caching
    • Access pattern detection
    • Prefetching
  - Application grouping?
  - Remote Direct Memory Access (RDMA)?
Solutions – Re-runs (done by the DVL)
• The DVL has to re-run the simulation
  - Multiple times
• The DVL has to choose the optimal re-run depending on
  - The requirements of the re-run
  - The availability of the resources
Solutions – Performance model
• The optimal re-run is determined by a performance model
• For example, the cost of the first “nc_open” calls
  - Developed by Salvatore di Girolamo
Performance model - TODO
• The previous model relies on:
  - I/O model (reading and writing data results)
  - COSMO performance model
• Both are still in development (to do: use an approach like [5]?)
[5] T. Hoefler, W. Gropp, W. Kramer, and M. Snir. "Performance modeling for systematic performance tuning." In State of the Practice Reports, p. 6. ACM, 2011.
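To make the idea of choosing a re-run from a performance model concrete, here is a minimal sketch. It assumes a simple additive cost (time to restore the save point plus time per step times the number of steps), which is an illustrative assumption and not the project's model, since both sub-models are described as still in development. The names `RerunOption`, `estimated_seconds` and `choose_rerun` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RerunOption:
    """One hypothetical way to execute a re-run."""
    name: str
    restore_seconds: float   # stands in for the I/O model
    seconds_per_step: float  # stands in for the COSMO performance model
    available: bool          # whether the resources are currently free


def estimated_seconds(option: RerunOption, steps: int) -> float:
    """Assumed additive cost: restore the save point, then replay the steps."""
    return option.restore_seconds + steps * option.seconds_per_step


def choose_rerun(options: Iterable[RerunOption], steps: int) -> RerunOption:
    """Pick the available option with the lowest estimated time."""
    candidates = [o for o in options if o.available]
    if not candidates:
        raise RuntimeError("no resources available for the re-run")
    return min(candidates, key=lambda o: estimated_seconds(o, steps))
```

A caller would build one `RerunOption` per candidate machine or configuration and pass the number of time steps between the save point and the requested data.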
Proposed approach
The “classic” way: (1) Run simulation, (2) Store results, (3) Analyse the results
The crCLIM way: (1) Run simulation, (2) Store save points, (3) Analyse the results via the DVL, (4) Restore save points and do re-runs
Problem - Re-runs
• The DVL has to choose the optimal re-run depending on the availability of the resources:
  - These machines may have a CPU or a hybrid CPU-GPU architecture
  - The result of the simulation should be machine-independent
Re-runs – Why change architecture?
Example - Re-runs machine selection
• Few re-run instances, many nodes per instance: use CPU nodes
• Many re-run instances, few nodes per instance: use GPU nodes
Solution – Bit-reproducibility
• A re-run can start from any save point
• Re-runs can be executed on different hardware than the original simulation
• The results always need to be consistent (perfect match)
• We want bit-reproducibility [4]
[4] A. Arteaga, O. Fuhrer, and T. Hoefler. "Designing bit-reproducible portable high-performance applications." In Parallel and Distributed Processing Symposium, 2014 IEEE 28th International, pp. 1235-1244. IEEE, 2014.
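One well-known reason why results can differ between machines is that floating-point addition is not associative, so the order of a summation changes the low-order bits. The sketch below demonstrates this in plain Python and shows that fixing the evaluation order gives identical bit patterns. Whether this is the dominant issue for COSMO is not stated in the slides; the helper names `bits` and `ordered_sum` are hypothetical.

```python
import struct
from typing import Sequence


def bits(x: float) -> str:
    """Hexadecimal bit pattern of a double, for exact comparison."""
    return struct.pack(">d", x).hex()


def ordered_sum(values: Sequence[float]) -> float:
    """Sum in one fixed, left-to-right order."""
    total = 0.0
    for v in values:
        total += v
    return total


a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # one evaluation order
right = a + (b + c)   # another order, mathematically equal

print("same bits across orders:", bits(left) == bits(right))
print("same bits with fixed order:",
      bits(ordered_sum([a, b, c])) == bits(ordered_sum([a, b, c])))
```

A bit-reproducible code has to pin down such evaluation orders (and any other source of rounding differences) on every target architecture.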
Open question – Bit-reproducibility
• Can we prove it?
  - Unlikely
• Do we suffer from a performance penalty?
  - First results suggest that this is not the case (memory bound instead of compute bound)
Achievements
• Simulation on a continental scale at high horizontal resolution [3] (subproject C)
• An early prototype of the DVL (subproject B)
• A reproducible version of COSMO (subproject A)
  - Only tested with a meteorological configuration
[3] D. Leutwyler, O. Fuhrer, X. Lapillonne, D. Luthi, and C. Schar. "Towards European-scale convection-resolving climate simulations with GPUs: a study with COSMO 4.19." Geoscientific Model Development 9, no. 9 (2016): 3393.
Thank you for your attention
More about the crCLIM project in the next talk (S. Ruedisueli)
References
[1] http://www.c2sm.ethz.ch/research/crCLIM.html
[2] O. Fuhrer, C. Osuna, X. Lapillonne, T. Gysi, M. Bianco, and T. Schulthess. "Towards gpu-accelerated operational weather forecasting." In The GPU Technology Conference, GTC. 2013.
[3] D. Leutwyler, O. Fuhrer, X. Lapillonne, D. Luthi, and C. Schar. "Towards European-scale convection-resolving climate simulations with GPUs: a study with COSMO 4.19." Geoscientific Model Development 9, no. 9 (2016): 3393.
[4] A. Arteaga, O. Fuhrer, and T. Hoefler. "Designing bit-reproducible portable high-performance applications." In Parallel and Distributed Processing Symposium, 2014 IEEE 28th International, pp. 1235-1244. IEEE, 2014.
[5] T. Hoefler, W. Gropp, W. Kramer, and M. Snir. "Performance modeling for systematic performance tuning." In State of the Practice Reports, p. 6. ACM, 2011.
MeteoSvizzera, Via ai Monti 146, CH-6605 Locarno-Monti, T +41 58 460 92 22, www.meteosvizzera.ch
MétéoSuisse, 7bis, av. de la Paix, CH-1211 Genève 2, T +41 58 460 98 88, www.meteosuisse.ch
MétéoSuisse, Chemin de l’Aérologie, CH-1530 Payerne, T +41 58 460 94 44, www.meteosuisse.ch
MeteoSwiss, Operation Center 1, CH-8058 Zurich-Airport, T +41 58 460 91 11, www.meteoswiss.ch