Stan Posey; [email protected], NVIDIA, Santa Clara, CA, ...
TRANSCRIPT
2
Model Trends and GPU Motivation
GPU Progress of Select Models
Next Generation Models and GPUs
Agenda: Progress of GPU-Parallel NWP and Climate Models
3
Application Segments and Models
Climate Modeling
Coupled interactions of atmosphere, ocean, land surface, and ice
Atmosphere, then Ocean, are the primary computational bottlenecks;
the recent inclusion of atmospheric chemistry is an emerging bottleneck
HPC objective: As much resolution and physics as practical costs will allow
Models: NICAM (JP), CESM (US), CFSv2 (US), GEOS-5 (US), MPI-ESM (DE), etc.
Numerical Weather Prediction (NWP)
Weather prediction and forecasting using (mostly) atmospheric models
HPC objective: As much resolution and physics as forecast time will allow
Operational NWP models: UM (UK), IFS (UK), GFS (US), WRF (US), COSMO (EU), etc.
Next-gen NWP model research: NIM (US), ICON (DE), MPAS-A (US), UM/GungHo (UK), etc.
Ocean Circulation Models
Models that predict shallow and deep ocean behavior, waves, storm surge, etc.
Examples: MOM4 (US), HYCOM (US), POP (US), NEMO (EU), etc.
[Diagram: coupler linking atmosphere, ocean, land surface, and sea-ice model components]
4
Higher grid resolution with manageable compute and energy costs
Global NWP from 10-km today to global cloud-resolving scales of 1-km
Increase in ensemble use and number of ensemble members to manage uncertainty
Fewer model approximations, more features (physics, chemistry, etc.)
Accelerator technology identified as a cost-effective and practical approach to future computational challenges
Model Trends and Accelerator Motivation
[Chart: grid resolution trend from 128 km to 16 km, 10 km, and eventually 1 km (IFS), with the number of jobs increasing by 10x. Source: Project Athena – http://www.wxmaps.org/athena/home/]
5
NASA targeting GEOS global model resolution at sub-10-km to 1-km range
Computational requirements for typical 5-day operational forecast:
Grid resolution    Westmere CPU cores    Comments
10 km              12,000                Possible today
3 km               300,000               Reasonable but not available
1 km               10,000,000            Impractical, need accelerators
Source: http://data1.gfdl.noaa.gov/multi-core/
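One way to read the core counts in the table above (an approximation, not a claim from the source) is a roughly cubic scaling of cost with horizontal resolution, since halving the grid spacing refines two horizontal dimensions and, via the CFL condition, the time step:

$$\mathrm{cores}(\Delta x) \approx 12{,}000 \times \left(\frac{10\ \mathrm{km}}{\Delta x}\right)^{3}
\quad\Rightarrow\quad 12{,}000 \times 10^{3} \approx 1.2 \times 10^{7}\ \text{cores at } \Delta x = 1\ \mathrm{km}$$

This is in line with the ~10,000,000 cores listed for 1 km, and roughly consistent with the 3-km entry.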
3.5-km GEOS-5 Simulated Clouds
(CPU-Only)
The Finite-Volume Dynamical Core on GPUs within GEOS-5
- Dr. William Putman, Global Modeling and Assimilation Office, NASA GSFC
Example: NASA Global Cloud Resolving GEOS-6
6
Source: https://www2.cisl.ucar.edu/cas2k13/pgas-implementation-ecmwf-integrated-forecasting-system-ifs-nwp-model
From the 11th NCAR International Computing
for the Atmospheric Sciences Symposium (iCAS2013), Sep 2013
“A PGAS Implementation of the ECMWF Integrated Forecasting
System (IFS) NWP Model” - by George Mozdzynski, ECMWF
Removing the hydrostatic approximation in the IFS NWP operational model:
from 6K cores to 80K cores (13x)
Example: ECMWF Global Non-Hydrostatic IFS
7
Example: Feature Growth in Climate Models
From the Fifth Assessment Report of the Intergovernmental
Panel on Climate Change: IPCC Climate Change 2013
Source: http://www.ipcc.ch/report/ar5/wg1/#.Ut9V4BDTmM8
8
NWP/Climate HPC Centers in Europe and USA
Organization  Location       Models              Previous/Current Operational HPC  Current/Next Operational HPC
ECMWF         Reading, UK    IFS                 IBM Power                         Cray XC30 - x86
Met Office    Exeter, UK     UM                  IBM Power                         ? (2014 decision)
DWD           Offenbach, DE  GME, COSMO, ICON    NEC SX-9                          Cray XC30 - x86
MF            Toulouse, FR   ALADIN, AROME       NEC SX-9                          Bull - x86
NOAA/NCEP     Various, US    GFS, WRF, FIM, NIM  IBM Power                         IBM iDataPlex - x86
NCAR          Boulder, US    CESM, WRF, MPAS     IBM Power                         IBM iDataPlex - x86
DKRZ/MPI-M    Hamburg, DE    MPI-ESM             IBM Power                         Bull - x86
(ECMWF through NOAA/NCEP are operational NWP centers; NCAR and DKRZ/MPI-M are research centers.)
Motivation for x86 migration includes preparation for future accelerator deployment.
9
Early focus (~2010): climate and NWP research - early CUDA implementations
Project opportunities to refactor code with CUDA for GPU speedup demonstrations
Current focus: production research and operational models - OpenACC, libraries (a minimal OpenACC sketch follows this slide)
ESM community requires Fortran for programming, portability, maintainability, etc.
NVIDIA investments in applications engineering and strategic partnerships
Engineering collaboration in 15 models/developments and growing (list to follow)
Ongoing software development of ESM-relevant libraries and OpenACC
CUBLAS, CUSPARSE, AmgX; OpenACC collaborations with CAPS, Cray; PGI acquisition
OEM system integration and collaboration on strategic deployments
Integration: Cray x86; IBM Power 8 + GPUs with NVLink interconnect; others
Collaborations: Cray: Titan (18,688 K20X, #2 Top500) - ORNL/NOAA; Gaea - NOAA;
Blue Waters (4,224 K20) - NCSA; Piz Daint (5,272 K20X, #6 Top500) - CSCS;
IBM: Yellowstone (Geyser, Caldera) - NCAR; Discover - NASA GSFC
Evolution of GPUs for NWP/Climate Modeling
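For context on what the directive-based (OpenACC) approach looks like in practice, below is a minimal, hypothetical sketch in C for brevity; the production models discussed here are Fortran, and the function and variable names (relax_columns, field, tendency) are illustrative only, not taken from any of the codes above.

```c
/* Hypothetical OpenACC sketch: the loop nest is annotated rather than
 * rewritten as a CUDA kernel; data clauses manage host<->device copies. */
#include <stddef.h>

/* Advance a field of ncol columns x nz levels by an explicit tendency. */
void relax_columns(double *field, const double *tendency,
                   size_t ncol, size_t nz, double dt)
{
    #pragma acc parallel loop collapse(2) \
                copy(field[0:ncol*nz]) copyin(tendency[0:ncol*nz])
    for (size_t c = 0; c < ncol; ++c) {
        for (size_t k = 0; k < nz; ++k) {
            size_t idx = c * nz + k;
            field[idx] += dt * tendency[idx];
        }
    }
}
```

In Fortran the equivalent loop would carry the corresponding `!$acc parallel loop` directives, which is why the directive approach preserves a single Fortran source for the ESM community.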
10
Model                     Focus            GPU Approach           Collaboration
NCAR(US) / WRF            NWP/Climate-R    (1) OpenACC, (2) CUDA  (1) NCAR, (2) SSEC UW-M
DWD(DE) / COSMO           NWP/Climate-R    CUDA + OpenACC         CSCS, MeteoSwiss (MCH)
ORNL(US) / CAM-SE         Climate-G        CUDA-F, OpenACC        ORNL, Cray
NCAR(US) / CAM-SE         Climate-G        CUDA, CUDA-F, OpenACC  NCAR-CISL
NOAA(US) / NIM & FIM      NWP/Climate-G    F2C-ACC, OpenACC       NOAA-ESRL, PGI
NASA(US) / GEOS-5         Climate-G        CUDA-F, OpenACC        NASA, PGI
IPSL(FR) / NEMO           Ocean GCM        OpenACC                STFC
UKMO(UK) / GungHo         NWP/Climate-G    OpenACC                STFC, UKMO in future?
USNRL(US) / HYCOM         Ocean GCM        OpenACC                US Naval Research Lab
UT-JAMSTEC-RIKEN / NICAM  Climate-G        OpenACC                RIKEN, TiTech
UNC-ND(US) / ADCIRC       Storm Surge      OpenACC (AmgX?)        LSU LONI
NOAA(US) / MOM6           Ocean GCM        OpenACC                NOAA-GFDL
NASA(US) / FV-Core        Atmospheric GCM  OpenACC                NASA, NOAA-GFDL
ECMWF(UK) / IFS           NWP              OpenACC                ECMWF, CSC-FI
IPSL(FR) / DYNAMICO       Atmospheric GCM  CUDA-F, OpenACC        IPSL
NVIDIA Collaborations in 15 Model Projects
Other Evaluations: US – COAMPS, MPAS, ROMS, OLAM; Europe – ICON, HARMONIE
Asia-Pacific – ASUCA (JP), GRAPES (CN), KWRF (KR)
11
OpenACC Progress Important to NWP/Climate
Of today’s 11 non-vendor OpenACC members, all have NWP/Climate developments
NWP/Climate leads other domains by 2x on OpenACC development projects
GTC 27 Mar 2014: GTC OpenACC Roundtable for NWP and Climate Modeling
Motivation: to identify critical and common requests across an international selection of 10 different climate, weather, and ocean (CWO) models
12
Model Representatives
1. ASUCA Takashi Shimokawabe, TiTech; Michel Müller, RIKEN
2. CAM-SE Jeff Larkin, NVIDIA US; Matt Norman, ORNL
3. COSMO Peter Messmer, NVIDIA CH; Claudio Gheller, Will Sawyer, CSCS
4. FIM/NIM Mark Govett, NOAA
5. HARMONIE JC Desplat, Enda O’Brien, ICHEC
6. ICON Peter Messmer, NVIDIA CH; Claudio Gheller, Will Sawyer, CSCS
7. NEMO Jeremy Appleyard, NVIDIA UK
8. NICAM Akira Naruse, NVIDIA JP; Hisashi Yashiro, RIKEN
9. WRF Carl Ponder, NVIDIA US
10. COAMPS Dave Norton, PGI; Gopal Patnaik, US NRL
Model Contributions at GTC OpenACC Session
GTC OpenACC Roundtable for NWP and Climate Modeling
13
Model Trends and GPU Motivation
GPU Progress of Select Models
Next Generation Models and GPUs
Agenda: Progress of GPU-Parallel NWP and Climate Models
14
GPU Progress of NWP and Climate Models
Global Climate: NCAR-CISL, ORNL / CESM (CAM-SE (HOMME); LANL / POP); NASA / GEOS-5; NOAA-GFDL / CFSv2 (NOAA-GFDL / MOM6); UKMO / HadGEM3 (UM; NEMO); MPI-M / MPI-ESM (ECHAM5; MPIOM); JAMSTEC, RIKEN, UTokyo / NICAM; IPSL / DYNAMICO
Global Weather: UKMO / UM; ECMWF / IFS; DWD / GME; NOAA-NCEP / GFS; EC, CMC / GEM; USNRL / NAVGEM; NOAA-ESRL / FIM; DWD, MPI-M / ICON; NOAA-ESRL / NIM; NCAR / MPAS-A
Global Ocean: LANL / POP; NOAA-GFDL / MOM6; CNRS, STFC / NEMO; USNRL / HYCOM; MIT / MITgcm; LANL / MPAS-O; MPI-M / ICON-OCN
Regional Climate: NCAR-M3 / WRF; DWD, MCH / COSMO; UniMiami / OLAM
Regional Weather: NCAR-M3 / WRF; USNRL / COAMPS; DWD, MCH / COSMO; MFR / AROME; MFR, ICHEC / HARMONIE (HIRLAM + ALADIN); JAMSTEC-JMA / ASUCA; CAS-CMA / GRAPES; UniMiami / OLAM
Regional Ocean: Rutgers-UCLA / ROMS; UNC-ND / ADCIRC
GPU Development (8): CAM-SE, GEOS-5, NEMO, WRF, COSMO, NIM, FIM, GRAPES
GPU Evaluation (15): POP, ICON, NICAM, OLAM, GungHo, PantaRhei, ASUCA, HARMONIE, COAMPS, HYCOM, MITgcm, ROMS, ADCIRC, DYNAMICO, MOM6
GPU Not Started (7): MPAS-A, MPAS-O, GFS, GEM, NAVGEM, AROME, ICON-OCN
Next-generation models indicated in the chart: MPAS-A or NIM, MPAS-O, NIM, ICON (ICON-ATM, ICON-OCN), GungHo, PantaRhei
16
COSMO Developments
Towards GPU-accelerated Operational Weather Forecasting
- Oliver Fuhrer (MeteoSwiss), NVIDIA GTC 2013, Mar 2013
Source: http://on-demand.gputechconf.com/gtc/2013/presentations/S3417-GPU-Accelerated-Operational-Weather-Forecasting.pdf
Towards operational implementation of COSMO on accelerators at MeteoSwiss
- Oliver Fuhrer (MeteoSwiss), iCAS 2013, Sep 2013
Source: https://www2.cisl.ucar.edu/sites/default/files/20130911_fuo_cas2k13.pdf
18
COSMO End-to-End Simulation Performance
From the 11th NCAR International Computing
for the Atmospheric Sciences Symposium (iCAS2013), Sep 2013
“Towards operational implementation of COSMO on accelerators at MeteoSwiss” - by Dr. Oliver Fuhrer, MeteoSwiss
Observed ~3x speedup for the COSMO demonstrator on GPUs over the production model
(based on the COSMO-7 12 UTC forecast)
Source: https://www2.cisl.ucar.edu/sites/default/files/20130911_fuo_cas2k13.pdf
~3x speedup ... requiring ~7x less energy
19
http://irina.eas.gatech.edu/EAS8802_Spring2011/Lecture7.pdf
http://www.mmm.ucar.edu/wrf/users/workshops/WS2010/presentations/Lectures/Microphysics10.pdf
http://www.mmm.ucar.edu/wrf/users/docs/user_guide_V3.1/users_guide_chap5.htm#_Installing_WRF
http://www.mmm.ucar.edu/wrf/WG2/GPU/WSM5.htm
Jarno Mielikainen, Bormin Huang, Hung-Lung Allen Huang, and Mitchell D. Goldberg, “Improved GPU/CUDA Based Parallel
Weather and Research Forecast (WRF) Single Moment 5-Class (WSM5) Cloud Microphysics”, IEEE Journal of Selected Topics
in Applied Earth Observations and Remote Sensing, Vol. 5, No. 4, August 2012
WRF Developments
CUDA Implementation of the Weather Research and Forecasting (WRF) Model
- Bormin Huang (SSEC), Supercomputing 2013, Nov 2013
Source: http://on-demand.gputechconf.com/supercomputing/2013/presentation/SC3133-CUDA-Weather-Research-Forecasting-Model.pdf
20
WRF Operational in 21 Countries; 153 Total
Source: Welcome Remarks, 14th Annual WRF Users’ Workshop, 24-28 Jun 2013, Boulder, CO
21
Independent development paths of (I) CUDA and (II) OpenACC
I. CUDA development through funded collaboration with SSEC www.ssec.wisc.edu
CUDA WRF project began during 2010 through funding from NOAA and NASA
Project lead Dr. Bormin Huang, NVIDIA CUDA Fellow: research.nvidia.com/users/bormin-huang
NVIDIA-SSEC plan for full CPU-GPU hybrid WRF by Q3 2014 (today ~65% complete)
II. OpenACC collaboration with NCAR-MMM, NOAA-ESRL, and NOAA-NCEP
NCAR-MMM plans for OpenACC version of WRF-ARW in development trunk
NOAA-ESRL plans for OpenACC WRF physics with FIM and NIM dynamical cores
NOAA-NCEP interest in OpenACC HRRR/WRF configuration (operational in 2014)
Flexibility to combine CUDA and OpenACC modules into a WRF GPU model (see the interoperability sketch after this slide)
NCAR-MMM interest in offering a GPU-accelerated WRF-ARW on the trunk distribution site
NVIDIA Strategy for GPU-Accelerated WRF
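As a rough illustration of how CUDA and OpenACC modules can coexist in one build, below is a hypothetical sketch in C. The launcher name wsm5_launch and the field names are illustrative, not taken from the WRF source; the point is the `host_data use_device` mechanism that passes OpenACC-managed device pointers to a hand-written CUDA kernel without extra transfers.

```c
#include <stddef.h>

/* Implemented in a separate CUDA source file; expects device pointers. */
extern void wsm5_launch(double *qv_d, double *qc_d, size_t n);

void physics_step(double *qv, double *qc, size_t n)
{
    #pragma acc data copy(qv[0:n], qc[0:n])
    {
        /* Part of the physics kept as OpenACC directives. */
        #pragma acc parallel loop
        for (size_t i = 0; i < n; ++i) {
            qv[i] = qv[i] > 0.0 ? qv[i] : 0.0;   /* clip negative moisture */
        }

        /* Hand the device copies to the CUDA module. */
        #pragma acc host_data use_device(qv, qc)
        {
            wsm5_launch(qv, qc, n);
        }
    }
}
```

The same mechanism is available from Fortran via the corresponding `!$acc host_data use_device` directive.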
22
Published WRF Speedups from SSEC
Source: Bormin Huang, Space Science and Engineering Center, UW-M
NOTE: All times without CPU data transfer
Hardware: Core i7-3930K, 1 core used; GeForce GTX 590
Benchmark: CONUS 12 km for 24 Oct 2001
433 x 308, 35 levels
WRF V3.2 and V3.3
Verification: WSM5 by NREL (J. Michalakes) and NVIDIA Applications Engineering [next 2 slides]
23
Accelerator Results of WRF WSM5 Scheme
Benchmark: CONtinental United States (CONUS) domain at 12 km resolution for October 24, 2001; 433 x 308 horizontal grid points with 35 vertical levels.
Performance Results
K40 GPU vs. 2 x Sandy Bridge CPUs = 3.2x
K40 GPU vs. 2 x Ivy Bridge CPUs = 2.0x
K40 GPU vs. Xeon Phi = 1.8x
NOTE: K40 Boost Mode Provides 15% Gain
24
NEMO Developments
Accelerating NEMO with OpenACC
- Maxim Milakov (NVIDIA), NVIDIA GTC 2013, Mar 2013
Source: http://on-demand.gputechconf.com/gtc/2013/presentations/S3209-Accelerating-NEMO-with-OpenACC.pdf
NEMO on GPU-based Heterogeneous Architectures: a Case Study Using OpenACC
- Jeremy Appleyard (NVIDIA), NEMO UGM, Jul 2014
25
NEMO Model http://www.nemo-ocean.eu/
Nucleus for European Modelling of the Ocean; global and regional OGCM
Primary developers: CNRS, Mercator-Ocean, UKMO, NERC, CMCC, INGV
OCN component for 5 of 7 Earth system models in the ENES http://enes.org
European consortium of 40 projects, 400 users, and ~50 publications/year
Configurations
GYRE50: Idealized double gyres, 1/4° horizontal resolution, 31 vertical layers
ORCA025: Global high resolution, 1/4° horizontal resolution, 75 vertical layers
NVIDIA “PSG” Cluster http://psgcluster.nvidia.com/trac
PSG consists of 30 compute nodes of mixed type, each with 128 GB of system memory
This study: Each node 2 x Intel Xeon Ivy Bridge CPUs and 6 x NVIDIA K40 GPUs
NEMO tests on 8 nodes using 20 of 20 cores per node, and 2 of 6 GPUs per node
NEMO Performance with OpenACC and GPUs
26
NEMO Coupling to European Climate Models
NEMO critical for European climate models: ocean component for 5 of 7 modeling groups
UKMO has announced NEMO as the ocean component model for HadGEM3
27
NEMO Performance with OpenACC and GPUs
[Chart: NEMO GYRE 1/4° configuration; total time for 1000 time steps (sec), lower is better, vs. number of compute nodes (2, 3, 4, 6, 8); Xeon IVB only vs. 2 x IVB + 2 x Tesla K40 per PSG node; GPU speedups of roughly 3.7x at 2 nodes down to 2.5x at 8 nodes; output every 5 days; NEMO release 3.5.]
28
NEMO and GPU Performance with OpenACC
[Chart: NEMO ORCA025 configuration; total time for 600 time steps (sec), lower is better, vs. number of compute nodes (4, 6, 8, 10); Xeon IVB only vs. 2 x IVB + 2 x Tesla K40 per node; GPU speedups of roughly 2.3x at 4 nodes down to 1.7x at 10 nodes; output every 5 days; total run of 10 days; NEMO 3.5.]
30
Equal-performance configurations (same NEMO throughput):
2 nodes: 8 x K40, 4 x IVB, 8 of 40 cores used
4 nodes: 8 x K40, 8 x IVB, 8 of 80 cores used
10 nodes: 20 x IVB, 200 cores (CPU-only)
That is: 2 nodes + 8 GPUs = 4 nodes + 8 GPUs = 10 CPU-only nodes (a rough per-GPU equivalence is worked out below)
NEMO HPC Configurations at Equal Performance
• Flexibility: GPUs free-up existing HPC nodes/cores for other applications
• Efficiency: GPU-based nodes more cost effective for new HPC purchase
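A rough per-GPU equivalence implied by the configurations above (an inference from the figure, not a number given in the source): the 10-node CPU-only run uses 200 Ivy Bridge cores, while the 2-node GPU run uses 8 cores plus 8 K40s, so

$$\frac{200 - 8}{8} = 24\ \text{IVB cores per K40} \;\approx\; 2.4\ \text{Ivy Bridge sockets (10 cores each)}$$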
31
Model Trends and GPU Motivation
GPU Progress of Select Models
Next Generation Models and GPUs
Agenda: Progress of GPU-Parallel NWP and Climate Models
32
New global NH dynamical cores with icosahedral spatial discretization
UniTokyo/JAMSTEC/RIKEN - NICAM; UKMO - GungHo; NCAR - MPAS; DWD - ICON;
NOAA - NIM; IPSL - DYNAMICO (NH in development); ECMWF - IFS (Scalability Project)
Model investigations of iterative (semi-)implicit methods and linear solvers
Solution methods for elliptic PDEs arising from implicit time stepping (a schematic example follows below)
New US DOE program ACME: Accelerated Climate Model for Energy
New program to develop an Earth system model for accelerated computing
ACME based on CESM, with new co-design based on the Trinity and Coral systems
Model Directions and GPU Developments
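To make the elliptic-solver item above concrete, here is a schematic example (generic, not taken from any specific model): treating the fast gravity-wave terms of the linearized shallow-water equations implicitly over a step $\Delta t$,

$$\mathbf{u}^{n+1} = \mathbf{u}^{n} - g\,\Delta t\,\nabla \eta^{n+1}, \qquad
\eta^{n+1} = \eta^{n} - H\,\Delta t\,\nabla\cdot\mathbf{u}^{n+1}$$

and eliminating $\mathbf{u}^{n+1}$ gives a Helmholtz-type elliptic problem for the free surface each time step:

$$\left(1 - gH\,\Delta t^{2}\,\nabla^{2}\right)\eta^{n+1} = \eta^{n} - H\,\Delta t\,\nabla\cdot\mathbf{u}^{n}$$

Each step therefore requires a global elliptic solve, which is where multigrid and Krylov methods (and GPU solver libraries such as AmgX) enter.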
33
DYNAMICO (France)
Main developers: Thomas Dubos (IPSL), Yann Meurdesoif (IPSL)
Development time: about 4 years for the 3D model (hydrostatic; a non-hydrostatic model is in development for the near future)
MPAS (US)
Main developer: Bill Skamarock (NCAR)
Development time: about 7 years
ICON (Germany)
Main developers: Günther Zängl (DWD), Marco Giorgetta (MPI-M)
Development time: more than 10 years
NICAM (Japan)
Main developers: Hirofumi Tomita (AICS), Masaki Satoh (U. Tokyo)
Development time: more than 10 years
ICOMEX: GPU Developments for 3 of 4 Models
Source: ICOMEX Meeting 2014, R. Yoshida, RIKEN AICS
Next-gen global non-hydrostatic dynamical cores based on icosahedral grids
GPU versions of NICAM, ICON, and DYNAMICO are under development; MPAS is in ongoing discussions
34
NVIDIA Member of ECMWF Scalability Program
http://old.ecmwf.int/newsevents/meetings/workshops/2014/Scalability/
ECMWF Scalability Workshop, 14-15 Apr 2014
35
Jun 2013: Completed collaboration agreement with STFC that includes GungHo (and NEMO)
Nov 2013: Report on GungHo including GPU considerations: http://www.metoffice.gov.uk/media/pdf/8/o/FRTR587Tagged.pdf
Feb 2014: Completed UKMO NDA and benchmark agreement
Technical proposals by NVIDIA: OpenACC for physics parameterizations;
potential use of the NVIDIA AmgX library for GungHo and ENDGame solvers
NVIDIA AmgX – a toolkit for iterative implicit solvers
Multigrid; Krylov: GMRES, CG, BiCGStab, preconditioned and ‘flexible’ variants
Classic iterative: Block-Jacobi, Gauss-Seidel, ILUs; multi-colored versions
Flexible configuration: all methods usable as solvers, preconditioners, or smoothers; nesting (a generic sketch of the solver-plus-preconditioner pattern follows below)
NVIDIA Progress with Met Office and UM/GungHo
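The AmgX feature list above combines Krylov solvers with preconditioners and smoothers. As a generic illustration of that solver-plus-preconditioner pattern (not AmgX API code; a plain CPU sketch with illustrative names and a dense matrix for brevity), here is a Jacobi-preconditioned conjugate gradient in C, assuming a symmetric positive-definite matrix with a nonzero diagonal:

```c
#include <stddef.h>
#include <math.h>

static double dot(const double *a, const double *b, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

/* y = A*x for a dense n x n matrix stored row-major. */
static void matvec(const double *A, const double *x, double *y, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        double s = 0.0;
        for (size_t j = 0; j < n; ++j) s += A[i*n + j] * x[j];
        y[i] = s;
    }
}

/* Solve A x = b with Jacobi (diagonal) preconditioning.
 * r, z, p, q are caller-provided work arrays of length n;
 * x holds the initial guess on entry and the solution on exit. */
void pcg_jacobi(const double *A, const double *b, double *x, size_t n,
                double tol, int max_iter,
                double *r, double *z, double *p, double *q)
{
    matvec(A, x, q, n);
    for (size_t i = 0; i < n; ++i) r[i] = b[i] - q[i];
    for (size_t i = 0; i < n; ++i) z[i] = r[i] / A[i*n + i];   /* z = M^{-1} r */
    for (size_t i = 0; i < n; ++i) p[i] = z[i];
    double rz = dot(r, z, n);

    for (int k = 0; k < max_iter && sqrt(dot(r, r, n)) > tol; ++k) {
        matvec(A, p, q, n);                                     /* q = A p */
        double alpha = rz / dot(p, q, n);
        for (size_t i = 0; i < n; ++i) x[i] += alpha * p[i];
        for (size_t i = 0; i < n; ++i) r[i] -= alpha * q[i];
        for (size_t i = 0; i < n; ++i) z[i] = r[i] / A[i*n + i];
        double rz_new = dot(r, z, n);
        for (size_t i = 0; i < n; ++i) p[i] = z[i] + (rz_new / rz) * p[i];
        rz = rz_new;
    }
}
```

Libraries such as AmgX replace the dense matrix-vector product with sparse GPU kernels and swap the Jacobi step for multigrid, ILU, or colored Gauss-Seidel preconditioners, following the same overall pattern.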
36
ACME: New Climate Program from US DOE
Accelerated Climate Model for Energy
Consolidation of DOE Earth system model lab projects from 7 into 1
ACME is a development branch of CESM with its own coupling capability
Improve CESM for optimal performance on the DOE Leadership Class Facilities (LCF)
Towards a non-hydrostatic global atmosphere at 12 km, ocean at 15 km, over 80 years
Co-design with new DOE LCF systems based on the Trinity and Coral programs
New LCF programs based on heterogeneous accelerator-based HPC
Source: http://asr.science.energy.gov/meetings/stm/2014/presentations/Koch-ASR-ACME-March11.pdf
37
Opportunities exist for GPUs to provide significant performance acceleration for NWP and Climate Models
Higher resolutions possible for existing operational models
Use of more expensive physics parameterizations for same turnaround
Reduced energy consumption in IT configuration and procedures
Potential for simulations until recently considered impractical
Non-hydrostatic global models at cloud resolving scale
Parameterized physics such as radiation at more frequent time steps
Expanded and more operational use of ensemble predictions
Summary Progress of GPUs for NWP/Climate