Porting Climate Models to NEC SX-Aurora TSUBASA

Panagiotis Adamidis, Deutsches Klimarechenzentrum (DKRZ)

Post on 20-Oct-2020

TRANSCRIPT

  • Porting Climate Models to NEC SX-Aurora TSUBASA

    Panagiotis Adamidis (DKRZ)

    Deutsches Klimarechenzentrum (DKRZ)

  • NEC SX-Aurora TSUBASA @ DKRZ

    People who contributed: Panagiotis Adamidis, Jan Frederik Engels, Hendryk Bockelmann (DKRZ); Günther Zängl (DWD); Rene Redler (MPI-Met)

  • Climate Models @ DKRZ

    High Resolution: resolving small-scale physical processes

    Coarse Resolution: simulating longer periods (80000 – 100000 years), complete glacial cycles

  • ICON Grid Resolutions

    grid     number of cells    avg. resolution
    R2B04            20480      158 km
    R2B05            81920      79 km
    R2B06           327680      40 km
    R2B07          1310720      20 km
    R2B09         20971520      5 km
    R2B10         83886080      2.5 km
    R2B11        335544320      1.25 km
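
The cell counts in the grid table are consistent with the usual ICON naming scheme, in which an RnBk grid has 20 · n² · 4^k triangular cells, and the average resolution matches the square root of the mean cell area. A minimal Python sketch, assuming a mean Earth radius of 6371 km (a value not given on the slide) and using hypothetical helper names, reproduces the listed numbers up to rounding:

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, not stated on the slide


def icon_cells(n: int, k: int) -> int:
    """Number of triangular cells of an RnBk grid: 20 * n^2 * 4^k."""
    return 20 * n * n * 4 ** k


def icon_resolution_km(n: int, k: int) -> float:
    """Average resolution as the square root of the mean cell area on the sphere."""
    sphere_area = 4.0 * math.pi * EARTH_RADIUS_KM ** 2
    return math.sqrt(sphere_area / icon_cells(n, k))


if __name__ == "__main__":
    # Grids listed on the slide (R2B08 is not listed there)
    for k in (4, 5, 6, 7, 9, 10, 11):
        print(f"R2B{k:02d} {icon_cells(2, k):>10d} {icon_resolution_km(2, k):7.2f} km")
```

Each additional bisection (k + 1) quadruples the number of cells and halves the average resolution, which is why R2B11 needs 16384 times as many cells as R2B04.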

  • Very High Resolution Climate Modelling HD(CP)²

  • HDCP2 Model - Sustained Performance

    [Figure: simulated days per day (axis 0.0 to 0.7) versus number of cores (axis 0 to 64800). Series: DE 3 Domains; DE-3Dom-Shifted-Night; DE-3Dom-Shifted-Day; Estimation; Without Output]

  • Some Statistics (01.2016-02.2018)

    Total number of / amount of
    simulated days              26
    variables written out       169
    meteorological stations     36
    data output                 1.5 PB + meteogram: 1 TB
    node hours used             3.25 million node hours

  • NEC SX-Aurora TSUBASA: ICON-NWP (R2B06L90)

    • Successful compilation with -O2 -finline-functions

    Time in sec    Aurora without IVDEP    DWD-Cray 36core-BDW    Aurora with IVDEP
    total          2412.74                 433.21                 1633.41
    nh_solve       902.22                  226.43                 124.89
    physics        1170.16                 128.46                 1177.07

    Factors marked on the slide: 7.2x and 1.8x. They match the nh_solve ratios 902.22 / 124.89 (Aurora without vs. with IVDEP) and 226.43 / 124.89 (Cray BDW node vs. Aurora with IVDEP).

    (G. Zängl, DWD)
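
The two factors on this slide can be checked against the listed timings. The sketch below is only an illustration in Python; the dictionary layout and the `ratio` helper are hypothetical, while the numbers are copied from the slide:

```python
# Timings in seconds, copied from the slide
timings = {
    "Aurora without IVDEP": {"total": 2412.74, "nh_solve": 902.22, "physics": 1170.16},
    "DWD-Cray 36core-BDW": {"total": 433.21, "nh_solve": 226.43, "physics": 128.46},
    "Aurora with IVDEP": {"total": 1633.41, "nh_solve": 124.89, "physics": 1177.07},
}


def ratio(slow: str, fast: str, timer: str) -> float:
    """How many times longer `slow` takes than `fast` for the given timer."""
    return timings[slow][timer] / timings[fast][timer]


ivdep_gain = ratio("Aurora without IVDEP", "Aurora with IVDEP", "nh_solve")
vs_cray = ratio("DWD-Cray 36core-BDW", "Aurora with IVDEP", "nh_solve")
physics_gap = ratio("Aurora with IVDEP", "DWD-Cray 36core-BDW", "physics")

print(f"nh_solve, IVDEP vs. no IVDEP on Aurora: {ivdep_gain:.1f}x")
print(f"nh_solve, Cray BDW vs. Aurora (IVDEP):  {vs_cray:.1f}x")
print(f"physics, Aurora (IVDEP) vs. Cray BDW:   {physics_gap:.1f}x slower")
```

The physics timer barely changes with IVDEP and stays roughly nine times slower than on the Cray node, so in these numbers it dominates the total time on Aurora.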

  • NEC SX-Aurora TSUBASA

    • Compiler shows reluctant behaviour towards vectorization
    • Many unvectorizable dependencies detected
    • Using pointer arrays with the CONTIGUOUS attribute leads to unvectorizable dependencies

  • NEC SX-Aurora TSUBASA: ICON-NWP (R2B06L90)

    nh_solve                                              1 node NEC SX-Aurora TSUBASA,    1 node Cray Broadwell,
                                                          16 cores (time in sec)           36 cores (time in sec)
    nh_solve.cellcomp (direct addressing, memory-bound)   3.16                             36.05
    nh_solve.veltend (indirect addressing)                14                               31.7
    nh_solve.vimpl (good cache utilization)               11.9                             53.5

    (G. Zängl, DWD)
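
To compare the three kernels, the per-node timings can be turned into speedup factors of the Aurora node over the Broadwell node. This is a small Python sketch with the numbers from the slide; the variable names are illustrative only:

```python
# Kernel timings in seconds: (Aurora node with 16 cores, Cray Broadwell node with 36 cores)
kernels = {
    "nh_solve.cellcomp": (3.16, 36.05),  # direct addressing, memory-bound
    "nh_solve.veltend": (14.0, 31.7),    # indirect addressing
    "nh_solve.vimpl": (11.9, 53.5),      # good cache utilization
}

# Speedup of the Aurora node relative to the Broadwell node, per kernel
speedup = {name: cray / aurora for name, (aurora, cray) in kernels.items()}

for name, factor in sorted(speedup.items(), key=lambda item: -item[1]):
    print(f"{name:<20s} {factor:5.1f}x")
```

In these numbers the memory-bound kernel with direct addressing gains the most, and the kernel with indirect addressing gains the least.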

  • DYAMOND: ICON-AES / R2B09L90 / 1 Simulated Day

    mistral BDW    100 nodes    200 nodes    400 nodes    600 nodes    900 nodes
    total          8731.34      4072.34      2192.81      1446.71      1007.58

    [Figure: Strong Scaling. Speedup (axis 0 to 10) versus # nodes (100, 200, 400, 600, 900); curves: linear, speedup]
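
The strong-scaling curve follows directly from the total times on this slide. A minimal Python sketch, assuming the 100-node run as the baseline (so that ideal speedup at 900 nodes is 9) and using a hypothetical `strong_scaling` helper:

```python
nodes = [100, 200, 400, 600, 900]
total_time = [8731.34, 4072.34, 2192.81, 1446.71, 1007.58]  # as listed on the slide


def strong_scaling(nodes, times):
    """Return (nodes, speedup, parallel efficiency) relative to the first entry."""
    base_nodes, base_time = nodes[0], times[0]
    rows = []
    for n, t in zip(nodes, times):
        speedup = base_time / t
        ideal = n / base_nodes
        rows.append((n, speedup, speedup / ideal))
    return rows


rows = strong_scaling(nodes, total_time)
for n, speedup, efficiency in rows:
    print(f"{n:4d} nodes  speedup {speedup:5.2f}  efficiency {efficiency:6.1%}")
```

With this baseline the efficiency stays above 96% up to 900 nodes, and the 200-node run is slightly superlinear.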

  • DYAMOND: ICON-AES / R2B09L90 / 1 Simulated Day

    Time Distribution (stacked shares, 0% to 100%, per # nodes)

    # nodes    100    200    400    600    900
               56%    57%    58%    57%    57%
               25%    28%    31%    30%    31%
               18%    15%    11%    13%    12%

    Legend: rest, physics/total, nh_solve/total

  • NEC SX-Aurora TSUBASA: ICON-coupled ATM/OCE

    • Successful compilation with optimization level O3

    Hardware                          MPI tasks                  Total (time in sec)    echam_bcs (time in sec)    Difference of total time in sec    Slowdown factor
    1 mistral node: 24 HSW cores      22 Atmosphere + 2 Ocean    65.8                   4.12
    1 Aurora node: 16 vector cores    14 Atmosphere + 2 Ocean    296.7                  16.7                       230.9                              4.5x

    • 1 Simulated Day
    • Input seems to be a bottleneck

    (R. Redler, MPI-Met)

  • NEC SX-Aurora TSUBASA: ECHAM6

  • NEC SX-Aurora TSUBASA: ECHAM6

    • Successful compilation only when using low optimization level O1

    Number of MPI tasks    HSW, O3 + hiopt (time in sec)    NEC SX-Aurora, O1 (time in sec)    Difference (time in sec)    Slowdown
    16                     48.657                           210.49                             161.83                      13.15x
    32                     26.122                           166.43                             140.30                      6.37x
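
The difference and slowdown columns can be recomputed from the two timing columns. The sketch below is a Python illustration with hypothetical names. Note that for 16 tasks the ratio of the two listed times comes out near 4.3, not the 13.15 printed on the slide; the printed value coincides with 210.49 / 16, so it may be a slip on the slide.

```python
# MPI tasks -> (HSW O3 + hiopt time, NEC SX-Aurora O1 time), both in seconds
runs = {16: (48.657, 210.49), 32: (26.122, 166.43)}

results = {}
for tasks, (hsw, aurora) in runs.items():
    results[tasks] = {
        "difference": aurora - hsw,  # extra seconds on Aurora
        "slowdown": aurora / hsw,    # ratio of the two listed times
    }

# Speedup when going from 16 to 32 MPI tasks on each platform
hsw_scaling = runs[16][0] / runs[32][0]
aurora_scaling = runs[16][1] / runs[32][1]

for tasks, r in results.items():
    print(f"{tasks} tasks: difference {r['difference']:.2f} s, slowdown {r['slowdown']:.2f}x")
print(f"16 -> 32 tasks: HSW {hsw_scaling:.2f}x, Aurora {aurora_scaling:.2f}x")
```

Doubling the task count nearly halves the runtime on the HSW side but yields only about a quarter more speed on Aurora, which is why the slowdown factor is larger at 32 tasks.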

  • NEC SX-Aurora TSUBASA: ftrace

    Overhead of ftrace is huge

                    with ftrace (time in sec)    no ftrace (time in sec)
    ICON-AES        478                          28
    ICON-Coupled    1600                         297
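
The size of the ftrace overhead can be quantified from the two columns. A short Python sketch with the slide's numbers; the `ftrace_overhead` helper is hypothetical:

```python
# Model -> (time with ftrace, time without ftrace), both in seconds
runs = {"ICON-AES": (478.0, 28.0), "ICON-Coupled": (1600.0, 297.0)}


def ftrace_overhead(with_ftrace: float, without_ftrace: float):
    """Return (slowdown factor, extra seconds) caused by instrumentation."""
    return with_ftrace / without_ftrace, with_ftrace - without_ftrace


overhead = {name: ftrace_overhead(*times) for name, times in runs.items()}

for name, (factor, extra) in overhead.items():
    print(f"{name:<13s} {factor:5.1f}x slower, {extra:6.0f} s extra")
```

An instrumented ICON-AES run takes roughly 17 times as long as the plain run, so timings collected under ftrace should not be read as absolute runtimes.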

  • Conclusion & Outlook

    • NEC SX-Aurora architecture has good potential to deliver sustained performance
    • Compiler with efficient vectorization is essential
    • Good scaling over hundreds/thousands of nodes is necessary
    • Efficient parallel I/O is vital for high resolution simulations
    • The performance of the file system is very important
