
Stan Posey; [email protected]

NVIDIA, Santa Clara, CA, USA

2

Model Trends and GPU Motivation

GPU Progress of Select Models

Next Generation Models and GPUs

Agenda: Progress of GPU-Parallel NWP and Climate Models

3

Application Segments and Models

Climate Modeling: coupled interactions of atmosphere, ocean, land surface, and ice

Atmosphere, then ocean, are the primary computational bottlenecks; the recent inclusion of atmospheric chemistry is an emerging bottleneck

HPC objective: As much resolution and physics as practical costs will allow

Models: NICAM(JP), CESM(US), CFSv2(US), GEOS-5(US), MPI-ESM(DE), etc.

Numerical Weather Prediction (NWP): weather prediction and forecasting using (mostly) atmospheric models

HPC objective: As much resolution and physics as forecast time will allow

Operational NWP models: UM(UK), IFS(UK), GFS(US), WRF(US), COSMO(EU), etc.

Next-gen NWP model research: NIM(US), ICON(DE), MPAS-A(US), UM/GungHo(UK), etc.

Ocean Circulation Models: models that predict shallow and deep ocean behavior, waves, storm surge, etc.

Examples: MOM4(US), HYCOM(US), POP(US), NEMO(EU), etc.

[Diagram: Atmosphere, Ocean, Land Surface, and Sea Ice components linked through a Coupler; standalone Atmosphere and Ocean models]

4

Higher grid resolution with manageable compute and energy costs: global NWP from 10-km today to global cloud-resolving scales of 1-km

Increase in ensemble use and number of ensemble members to manage uncertainty

Fewer model approximations, more features (physics, chemistry, etc.)

Accelerator technology identified as a cost-effective and practical approach to future computational challenges

Model Trends and Accelerator Motivation

[Chart: grid resolution progression from 128 km to 16 km, 10 km, and toward 1 km (IFS); number of jobs increasing by 10x. Source: Project Athena – http://www.wxmaps.org/athena/home/]

5

NASA targeting GEOS global model resolution at sub-10-km to 1-km range

Computational requirements for typical 5-day operational forecast:

Grid resolution | Westmere CPU cores | Comments
10 km | 12,000 | Possible today
3 km | 300,000 | Reasonable but not available
1 km | 10,000,000 | Impractical, need accelerators

Source: http://data1.gfdl.noaa.gov/multi-core/
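A rough consistency check on these core counts (my own sketch, not from the slide), assuming compute cost scales with the inverse cube of the horizontal grid spacing (twice as many points in each horizontal direction plus a proportionally shorter time step):

$\mathrm{cores}(1\,\mathrm{km}) \approx 12{,}000 \times \left(\tfrac{10\,\mathrm{km}}{1\,\mathrm{km}}\right)^{3} = 1.2 \times 10^{7}$
$\mathrm{cores}(3\,\mathrm{km}) \approx 12{,}000 \times \left(\tfrac{10\,\mathrm{km}}{3\,\mathrm{km}}\right)^{3} \approx 4.4 \times 10^{5}$

Both are within a factor of a few of the table's 10,000,000 and 300,000 cores, so the step to 1 km is roughly a 1000x increase over today's 10-km requirement.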

3.5-km GEOS-5 Simulated Clouds

(CPU-Only)

“The Finite-Volume Dynamical Core on GPUs within GEOS-5”, Dr. William Putman, Global Modeling and Assimilation Office, NASA GSFC

Example: NASA Global Cloud Resolving GEOS-6

6

Source: https://www2.cisl.ucar.edu/cas2k13/pgas-implementation-ecmwf-integrated-forecasting-system-ifs-nwp-model

From the 11th NCAR International Computing for the Atmospheric Sciences Symposium (iCAS2013), Sep 2013:

“A PGAS Implementation of the ECMWF Integrated Forecasting System (IFS) NWP Model”, by George Mozdzynski, ECMWF

Removing the hydrostatic approximation in the IFS NWP operational model: from 6K cores to 80K (13x)

Example: ECMWF Global Non-Hydrostatic IFS

7

Example: Feature Growth in Climate Models

From the Fifth Assessment Report of the Intergovernmental Panel on Climate Change: IPCC Climate Change 2013

Source: http://www.ipcc.ch/report/ar5/wg1/#.Ut9V4BDTmM8

8

NWP/Climate HPC Centers in Europe and USA

Organization | Location | Models | Previous/Current Operational HPC | Current/Next Operational HPC
ECMWF | Reading, UK | IFS | IBM Power | Cray XC30 - x86
Met Office | Exeter, UK | UM | IBM Power | ? (2014 Decision)
DWD | Offenbach, DE | GME, COSMO, ICON | NEC SX-9 | Cray XC30 - x86
MF | Toulouse, FR | ALADIN, AROME | NEC SX-9 | Bull - x86
NOAA/NCEP | Various, US | GFS, WRF, FIM, NIM | IBM Power | IBM iDataPlex - x86
NCAR | Boulder, US | CESM, WRF, MPAS | IBM Power | IBM iDataPlex - x86
DKRZ/MPI-M | Hamburg, DE | MPI-ESM | IBM Power | Bull - x86

(Rows grouped on the original slide into Operational NWP centers and Research centers)

Motivation for x86 migration includes preparation for future accelerator deployment

9

Early focus (~2010): climate and NWP research – early CUDA implementations. Project opportunities to refactor code with CUDA for GPU speedup demonstrations

Current focus: production research and operational models – OpenACC, libraries. ESM community requires Fortran for programming, portability, maintainability, etc.

NVIDIA investments in applications engineering and strategic partnerships. Engineering collaboration in 15 models/developments and growing (list to follow)

Ongoing software development of ESM-relevant libraries and OpenACC: CUBLAS, CUSPARSE, AmgX; OpenACC collaborations with CAPS, Cray; PGI acquisition

OEM system integration and collaboration on strategic deployments
Integration: Cray x86; IBM Power 8 + GPUs with NVLink interconnect; others
Collaborations: Cray: TITAN (18,688 K20X, #2 Top500) at ORNL/NOAA; Gaea at NOAA; Blue Waters (4,224 K20) at NCSA; Piz Daint (5,272 K20X, #6 Top500) at CSCS; IBM: Yellowstone (Geyser, Caldera) at NCAR; Discover at NASA GSFC

Evolution of GPUs for NWP/Climate Modeling

10

Model | Focus | GPU Approach | Collaboration
NCAR(US) / WRF | NWP/Climate-R | (1) OpenACC, (2) CUDA | (1) NCAR, (2) SSEC UW-M
DWD(DE) / COSMO | NWP/Climate-R | CUDA + OpenACC | CSCS, MeteoSwiss (MCH)
ORNL(US) / CAM-SE | Climate-G | CUDA-F, OpenACC | ORNL, Cray
NCAR(US) / CAM-SE | Climate-G | CUDA, CUDA-F, OpenACC | NCAR-CISL
NOAA(US) / NIM & FIM | NWP/Climate-G | F2C-ACC, OpenACC | NOAA-ESRL, PGI
NASA(US) / GEOS-5 | Climate-G | CUDA-F, OpenACC | NASA, PGI
IPSL(FR) / NEMO | Ocean GCM | OpenACC | STFC
UKMO(UK) / GungHo | NWP/Climate-G | OpenACC | STFC, UKMO in future?
USNRL(US) / HYCOM | Ocean GCM | OpenACC | US Naval Research Lab
UT-JAMSTEC-RIKEN / NICAM | Climate-G | OpenACC | RIKEN, TiTech
UNC-ND(US) / ADCIRC | Storm Surge | OpenACC (AmgX?) | LSU LONI
NOAA(US) / MOM6 | Ocean GCM | OpenACC | NOAA-GFDL
NASA(US) / FV-Core | Atmospheric GCM | OpenACC | NASA, NOAA-GFDL
ECMWF(UK) / IFS | NWP | OpenACC | ECMWF, CSC-FI
IPSL(FR) / DYNAMICO | Atmospheric GCM | CUDA-F, OpenACC | IPSL

NVIDIA Collaborations in 15 Model Projects

Other Evaluations: US – COAMPS, MPAS, ROMS, OLAM; Europe – ICON, HARMONIE

Asia-Pacific – ASUCA (JP), GRAPES (CN), KWRF (KR)

11

OpenACC Progress Important to NWP/Climate

Of today’s 11 non-vendor OpenACC members, all have NWP/Climate developments

NWP/Climate leads other domains by 2x on OpenACC development projects

GTC, 27 Mar 2014: GTC OpenACC Roundtable for NWP and Climate Modeling. Motivation: identify critical and common requests from an international selection of 10 different CWO models

12

Model Representatives

1. ASUCA Takashi Shimokawabe, TiTech; Michel Müller, RIKEN

2. CAM-SE Jeff Larkin, NVIDIA US; Matt Norman, ORNL

3. COSMO Peter Messmer, NVIDIA CH; Claudio Gheller, Will Sawyer, CSCS

4. FIM/NIM Mark Govett, NOAA

5. HARMONIE JC Desplat, Enda O’Brien, ICHEC

6. ICON Peter Messmer, NVIDIA CH; Claudio Gheller, Will Sawyer, CSCS

7. NEMO Jeremy Appleyard, NVIDIA UK

8. NICAM Akira Naruse, NVIDIA JP; Hisashi Yashiro, RIKEN

9. WRF Carl Ponder, NVIDIA US

10. COAMPS Dave Norton, PGI; Gopal Patnaik, US NRL

Model Contributions at GTC OpenACC Session

GTC OpenACC Roundtable for NWP and Climate Modeling

13

Model Trends and GPU Motivation

GPU Progress of Select Models

Next Generation Models and GPUs

Agenda: Progress of GPU-Parallel NWP and Climate Models

14

GPU Progress of NWP and Climate Models

Global scale
Climate: NCAR-CISL, ORNL / CESM (CAM-SE (HOMME); LANL / POP); NASA / GEOS-5; NOAA-GFDL / CFSv2 (NOAA-GFDL / MOM6); UKMO / HadGEM3 (UM; NEMO); MPI-M / MPI-ESM (ECHAM5; MPIOM); JAMSTEC, RIKEN, UTokyo / NICAM; IPSL / DYNAMICO
Weather: UKMO / UM; ECMWF / IFS; DWD / GME; NOAA-NCEP / GFS; EC, CMC / GEM; USNRL / NAVGEM; NOAA-ESRL / FIM; DWD, MPI-M / ICON; NOAA-ESRL / NIM; NCAR / MPAS-A
Ocean: LANL / POP; NOAA-GFDL / MOM6; CNRS, STFC / NEMO; USNRL / HYCOM; MIT / MITgcm; LANL / MPAS-O; MPI-M / ICON-OCN

Regional scale
Weather: NCAR-M3 / WRF; USNRL / COAMPS; DWD, MCH / COSMO; MFR / AROME; MFR, ICHEC / HARMONIE (HIRLAM + ALADIN); JAMSTEC-JMA / ASUCA; CAS-CMA / GRAPES; UniMiami / OLAM
Climate: NCAR-M3 / WRF; DWD, MCH / COSMO; UniMiami / OLAM
Ocean: Rutgers-UCLA / ROMS; UNC-ND / ADCIRC

GPU Development (8): CAM-SE, GEOS-5, NEMO, WRF, COSMO, NIM, FIM, GRAPES
GPU Evaluation (15): POP, ICON, NICAM, OLAM, GungHo, PantaRhei, ASUCA, HARMONIE, COAMPS, HYCOM, MITgcm, ROMS, ADCIRC, DYNAMICO, MOM6
GPU Not Started (7): MPAS-A, MPAS-O, GFS, GEM, NAVGEM, AROME, ICON-OCN

Next-generation models indicated on the chart: MPAS-A or NIM, NIM, ICON (ICON-ATM, ICON-OCN), GungHo, PantaRhei, MPAS-O

15

Model | Focus | GPU Approach | Collaboration
NCAR(US) / WRF | NWP/Climate-R | (1) OpenACC, (2) CUDA | (1) NCAR, (2) SSEC UW-M
DWD(DE) / COSMO | NWP/Climate-R | CUDA + OpenACC | CSCS, MeteoSwiss (MCH)
ORNL(US) / CAM-SE | Climate-G | CUDA-F, OpenACC | ORNL, Cray
NCAR(US) / CAM-SE | Climate-G | CUDA, CUDA-F, OpenACC | NCAR-CISL
NOAA(US) / NIM & FIM | NWP/Climate-G | F2C-ACC, OpenACC | NOAA-ESRL, PGI
NASA(US) / GEOS-5 | Climate-G | CUDA-F, OpenACC | NASA, PGI
IPSL(FR) / NEMO | Ocean GCM | OpenACC | STFC
UKMO(UK) / GungHo | NWP/Climate-G | OpenACC | STFC, UKMO in future?
USNRL(US) / HYCOM | Ocean GCM | OpenACC | US Naval Research Lab
UT-JAMSTEC-RIKEN / NICAM | Climate-G | OpenACC | RIKEN, TiTech
UNC-ND(US) / ADCIRC | Storm Surge | OpenACC (AmgX?) | LSU LONI
NOAA(US) / MOM6 | Ocean GCM | OpenACC | NOAA-GFDL
NASA(US) / FV-Core | Atmospheric GCM | OpenACC | NASA, NOAA-GFDL
ECMWF(UK) / IFS | NWP | OpenACC | ECMWF, CSC-FI
IPSL(FR) / DYNAMICO | Atmospheric GCM | CUDA-F, OpenACC | IPSL

NVIDIA Collaborations in 15 Model Projects

Other Evaluations: US – COAMPS, MPAS, ROMS, OLAM; Europe – ICON, HARMONIE

Asia-Pacific – ASUCA (JP), GRAPES (CN), KWRF (KR)

16

COSMO Developments

Towards GPU-accelerated Operational Weather Forecasting

- Oliver Fuhrer (MeteoSwiss), NVIDIA GTC 2013, Mar 2013. Source: http://on-demand.gputechconf.com/gtc/2013/presentations/S3417-GPU-Accelerated-Operational-Weather-Forecasting.pdf

Towards operational implementation of COSMO on accelerators at MeteoSwiss

- Oliver Fuhrer (MeteoSwiss), iCAS 2013, Sep 2013. Source: https://www2.cisl.ucar.edu/sites/default/files/20130911_fuo_cas2k13.pdf

17

COSMO Approach and GPU Implementation


18

COSMO End-to-End Simulation Performance

From the 11th NCAR International Computing for the Atmospheric Sciences Symposium (iCAS2013), Sep 2013:

“Towards operational implementation of COSMO on accelerators at MeteoSwiss”, by Dr. Oliver Fuhrer, MeteoSwiss

Observe ~3x for COSMO demonstrator on GPUs over production model

(Based on COSMO-7 12UTC)

Source: https://www2.cisl.ucar.edu/sites/default/files/20130911_fuo_cas2k13.pdf

~3x speedup, requiring ~7x less energy

19

http://irina.eas.gatech.edu/EAS8802_Spring2011/Lecture7.pdf

http://www.mmm.ucar.edu/wrf/users/workshops/WS2010/presentations/Lectures/Microphysics10.pdf

http://www.mmm.ucar.edu/wrf/users/docs/user_guide_V3.1/users_guide_chap5.htm#_Installing_WRF

http://www.mmm.ucar.edu/wrf/WG2/GPU/WSM5.htm

Jarno Mielikainen, Bormin Huang, Hung-Lung Allen Huang, and Mitchell D. Goldberg, “Improved GPU/CUDA Based Parallel Weather and Research Forecast (WRF) Single Moment 5-Class (WSM5) Cloud Microphysics”, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 5, No. 4, August 2012

WRF Developments

CUDA Implementation of the Weather Research and Forecasting (WRF) Model

- Bormin Huang (SSEC), Supercomputing 2013, Nov 2013. Source: http://on-demand.gputechconf.com/supercomputing/2013/presentation/SC3133-CUDA-Weather-Research-Forecasting-Model.pdf

20

WRF Operational in 21 Countries; 153 Total

Source: Welcome Remarks, 14th Annual WRF Users’ Workshop, 24-28 Jun 2013, Boulder, CO

21

Independent development paths of (I) CUDA and (II) OpenACC

I. CUDA development through funded collaboration with SSEC www.ssec.wisc.edu

CUDA WRF project began during 2010 through funding from NOAA and NASA

Project lead Dr. Bormin Huang, NVIDIA CUDA Fellow: research.nvidia.com/users/bormin-huang

NVIDIA-SSEC plan for full CPU-GPU hybrid WRF by Q3 2014 (today ~65% complete)

II. OpenACC collaboration with NCAR-MMM, NOAA-ESRL, and NOAA-NCEP

NCAR-MMM plans for OpenACC version of WRF-ARW in development trunk

NOAA-ESRL plans for OpenACC WRF physics with FIM and NIM dynamical cores

NOAA-NCEP interest in OpenACC HRRR/WRF configuration (operational in 2014)

Flexibility to combine CUDA and OpenACC modules into a WRF GPU model; NCAR-MMM interest in offering a GPU-accelerated WRF-ARW on the trunk distribution site (a minimal OpenACC sketch follows this slide)

NVIDIA Strategy for GPU-Accelerated WRF
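To make the directive approach concrete, here is a minimal, hypothetical sketch in C (not WRF code, which is Fortran, and not any real WSM5 kernel): a column-physics-style loop offloaded with a single OpenACC directive, illustrating why independent grid columns map well to GPUs.

#include <stdio.h>
#include <math.h>

#define NCOL 4096   /* horizontal grid columns (illustrative size) */
#define NLEV 35     /* vertical levels, matching the CONUS benchmark */

/* Toy column-physics update: every (i,k) point is independent, which is
 * what makes loops like this natural OpenACC/GPU candidates. */
void physics_step(float t[NCOL][NLEV], float q[NCOL][NLEV])
{
    #pragma acc parallel loop collapse(2) copyin(q) copy(t)
    for (int i = 0; i < NCOL; i++) {
        for (int k = 0; k < NLEV; k++) {
            /* placeholder for a saturation-adjustment-like calculation */
            t[i][k] += 0.1f * expf(-q[i][k]);
        }
    }
}

int main(void)
{
    static float t[NCOL][NLEV], q[NCOL][NLEV];

    for (int i = 0; i < NCOL; i++)
        for (int k = 0; k < NLEV; k++) {
            t[i][k] = 280.0f;       /* arbitrary initial temperature */
            q[i][k] = 0.001f * k;   /* arbitrary moisture profile */
        }

    physics_step(t, q);
    printf("t[0][0] after physics step: %f\n", t[0][0]);
    return 0;
}

With an OpenACC compiler (e.g., PGI) the loop nest is offloaded to the GPU; with a plain C compiler the pragma is ignored and the same code runs on the CPU, which is one reason directives appeal to the ESM community's portability requirement.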

22

Published WRF Speedups from SSEC

Source: Bormin Huang, Space Science and Engineering Center, UW-M

NOTE: All times without CPU data transfer

Hardware: Core i7-3930K, 1 core used; GTX 590 GeForce

Benchmark: CONUS 12 km for 24 Oct 01

433 x 308, 35 levels

WRF V3.2 and V3.3

Verification: WSM5 by NREL (J. Michalakes) and NVIDIA Applications Engr [Next 2 slides]

23

Accelerator Results of WRF Scheme WSM5

CONtinental United States (CONUS) 12-km resolution domain for October 24, 2001; 433 x 308 horizontal grid points with 35 vertical levels.

Performance Results

K40 GPU vs. 2 x Sandy Bridge CPUs = 3.2x

K40 GPU vs. 2 x Ivy Bridge CPUs = 2.0x

K40 GPU vs. Xeon Phi = 1.8x

NOTE: K40 Boost Mode Provides 15% Gain

24

NEMO Developments

Accelerating NEMO with OpenACC

- Maxim Milakov (NVIDIA), NVIDIA GTC 2013, Mar 2013. Source: http://on-demand.gputechconf.com/gtc/2013/presentations/S3209-Accelerating-NEMO-with-OpenACC.pdf

NEMO on GPU-based Heterogeneous Architectures: a Case Study Using OpenACC

- Jeremy Appleyard (NVIDIA), NEMO UGM, Jul 2014

25

NEMO Model http://www.nemo-ocean.eu/

Nucleus for European Modelling of the Ocean, a global and regional OGCM

Primary developers: CNRS, Mercator-Ocean, UKMO, NERC, CMCC, INGV

OCN component for 5 of 7 Earth system models in the ENES http://enes.org

European consortium of 40 projects, 400 users, and ~50 publications/year

Configurations

GYRE50: Idealized double gyres, 1/4° horizontal resolution, 31 vertical layers

ORCA025: Global high resolution, 1/4° horizontal resolution, 75 vertical layers

NVIDIA “PSG” Cluster http://psgcluster.nvidia.com/trac

PSG consists of 30 compute nodes of mixed type, each with 128 GB of system memory

This study: Each node 2 x Intel Xeon Ivy Bridge CPUs and 6 x NVIDIA K40 GPUs

NEMO tests on 8 nodes using 20 of 20 cores per node, and 2 of 6 GPUs per node

NEMO Performance with OpenACC and GPUs

26

NEMO Coupling to European Climate Models

NEMO critical for European climate models: ocean component for 5 of 7 modeling groups

UKMO has announced NEMO as the ocean component model for HadGEM3*


27

[Chart: NEMO GYRE 1/4° configuration. Total time for 1000 time steps (sec, lower is better) vs. number of compute nodes (2, 3, 4, 6, 8), comparing Xeon IVB CPU-only runs with runs using Tesla K40 GPUs; GPU speedups of 3.7x, 3.2x, 2.9x, 2.6x, and 2.5x. PSG node utilization: 2 x IVB + 2 x K40. GYRE settings: output every 5 days, 1000 time steps, NEMO release 3.5.]

NEMO Performance with OpenACC and GPUs

28

[Chart: NEMO ORCA025 configuration. Total time for 600 time steps (sec, lower is better) vs. number of compute nodes (4, 6, 8, 10), comparing Xeon IVB CPU-only runs with runs using Tesla K40 GPUs; GPU speedups of 2.3x, 2.1x, 1.8x, and 1.7x. Node utilization: 2 x IVB + 2 x K40. ORCA025 settings: output every 5 days, total run 10 days, 600 time steps, NEMO 3.5.]

NEMO and GPU Performance with OpenACC


30

Equal performance: 2 nodes + 8 GPUs = 4 nodes + 8 GPUs = 10 CPU-only nodes

2 nodes: 8 x K40, 4 x IVB, 8 of 40 cores used
4 nodes: 8 x K40, 8 x IVB, 8 of 80 cores used
10 nodes: 20 x IVB, 200 cores

NEMO HPC Configurations at Equal Performance

• Flexibility: GPUs free-up existing HPC nodes/cores for other applications

• Efficiency: GPU-based nodes more cost effective for new HPC purchase

31

Model Trends and GPU Motivation

GPU Progress of Select Models

Next Generation Models and GPUs

Agenda: Progress of GPU-Parallel NWP and Climate Models

32

New global NH dynamical cores with icosahedral spatial discretization: UniTokyo/JAMSTEC/RIKEN - NICAM; UKMO - GungHo; NCAR - MPAS; DWD - ICON; NOAA - NIM; IPSL - DYNAMICO (NH in development); ECMWF - IFS (Scalability Project)

Model investigations of iterative (semi-)implicit methods and linear solvers: solution methods for elliptic PDEs with implicit time stepping

New US DOE Program ACME: Accelerated Climate Model for Energy
New program to develop Earth system model for accelerated computing
ACME based on CESM, new co-design based on Trinity and Coral systems

Model Directions and GPU Developments

33

DYNAMICO (France)
Main developers: Thomas Dubos (IPSL), Yann Meurdesoif (IPSL)
Development time: about 4 years for the 3D model (hydrostatic; non-hydrostatic model in development for the near future)

MPAS (US)
Main developer: Bill Skamarock (NCAR)
Development time: about 7 years

ICON (Germany)
Main developers: Günther Zängl (DWD), Marco Giorgetta (MPI-M)
Development time: more than 10 years

NICAM (Japan)
Main developers: Hirofumi Tomita (AICS), Masaki Satoh (U. Tokyo)
Development time: more than 10 years

ICOMEX: GPU Developments for 3 of 4 Models

Source: ICOMEX Meeting 2014, R. Yoshida, RIKEN AICS

Next-gen global non-hydrostatic dynamical cores based on icosahedral grids

NICAM, ICON, DYNAMICO under development, MPAS ongoing discussions

34

NVIDIA Member of ECMWF Scalability Program

http://old.ecmwf.int/newsevents/meetings/workshops/2014/Scalability/

ECMWF Scalability Workshop, 14-15 Apr 2014

35

Jun 2013: Completed collaboration agreement with STFC that includes GungHo (and NEMO)

Nov 2013: Report on GungHo including GPU considerations: http://www.metoffice.gov.uk/media/pdf/8/o/FRTR587Tagged.pdf

Feb 2014: Completed UKMO NDA and benchmark agreement

Technical proposals by NVIDIA: OpenACC for physics parameterizations

GungHo and ENDGame potential use of NVIDIA AmgX library

NVIDIA AmgX – a toolkit for iterative implicit solvers

Multigrid; Krylov: GMRES, CG, BiCGStab, preconditioned and ‘flexible’ variants

Classic iterative: Block-Jacobi, Gauss-Seidel, ILUs; multi-colored versions

Flexible configuration: All methods as solvers, preconditioners, or smoothers; nesting
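As an illustration of what such a solver does, here is a generic sketch in plain C (explicitly not the AmgX API): Jacobi-preconditioned Conjugate Gradient on a small 1-D Poisson matrix, the kind of symmetric elliptic system that arises from implicit time stepping.

/* Minimal Jacobi-preconditioned Conjugate Gradient in plain C.
 * Generic illustration of a preconditioned Krylov solver; NOT AmgX code.
 * Matrix A is the 1-D Poisson operator (tridiagonal 2, -1, -1). */
#include <stdio.h>
#include <math.h>

#define N 64

static void apply_A(const double x[N], double y[N]) {
    for (int i = 0; i < N; i++) {
        y[i] = 2.0 * x[i];
        if (i > 0)     y[i] -= x[i - 1];
        if (i < N - 1) y[i] -= x[i + 1];
    }
}

static double dot(const double a[N], const double b[N]) {
    double s = 0.0;
    for (int i = 0; i < N; i++) s += a[i] * b[i];
    return s;
}

int main(void) {
    double b[N], x[N] = {0}, r[N], z[N], p[N], Ap[N];
    for (int i = 0; i < N; i++) b[i] = 1.0;              /* right-hand side */

    /* r = b - A*x (x = 0, so r = b); Jacobi preconditioner M = diag(A) = 2 */
    for (int i = 0; i < N; i++) { r[i] = b[i]; z[i] = r[i] / 2.0; p[i] = z[i]; }
    double rz = dot(r, z);

    for (int it = 0; it < 1000; it++) {
        apply_A(p, Ap);
        double alpha = rz / dot(p, Ap);
        for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        if (sqrt(dot(r, r)) < 1e-10) { printf("converged in %d iterations\n", it + 1); break; }
        for (int i = 0; i < N; i++) z[i] = r[i] / 2.0;   /* apply preconditioner */
        double rz_new = dot(r, z);
        double beta = rz_new / rz;
        rz = rz_new;
        for (int i = 0; i < N; i++) p[i] = z[i] + beta * p[i];
    }
    printf("x[N/2] = %f\n", x[N / 2]);
    return 0;
}

In a library such as AmgX the same roles (Krylov method, preconditioner, smoother) are configurable components and the sparse linear algebra runs on the GPU; the sketch above only shows the underlying iteration on the CPU.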

NVIDIA Progress with Met Office and UM/GungHo

36

ACME: New Climate Program from US DOE

Accelerated Climate Model for Energy: consolidation of DOE Earth system model lab projects from 7 into 1

ACME a development branch of CESM with its own coupling capability
Improve CESM for optimal performance on DOE Leadership Class Facility (LCF)
Towards non-hydrostatic global atmosphere 12 km, ocean 15 km, 80 years

Co-design with new DOE LCF systems based on Trinity and Coral programs
New LCF programs based on heterogeneous accelerator-based HPC

Source: http://asr.science.energy.gov/meetings/stm/2014/presentations/Koch-ASR-ACME-March11.pdf

37

Opportunities exist for GPUs to provide significant performance acceleration for NWP and Climate Models

Higher resolutions possible for existing operational models

Use of more expensive physics parameterizations for same turnaround

Reduced energy consumption in IT configuration and procedures

Potential for simulations recently considered impractical

Non-hydrostatic global models at cloud resolving scale

Parameter physics such as radiation at more frequent time steps

Expanded and more operational use of ensemble predictions

Summary: Progress of GPUs for NWP/Climate

Stan Posey; [email protected]

NVIDIA, Santa Clara, CA, USA