Fujitsu’s Architectures and Collaborations for Weather Prediction and Climate Research

Ross Nobes Fujitsu Laboratories of Europe Fujitsu’s Architectures and Collaborations for Weather Prediction and Climate Research ECMWF Workshop on HPC in Meteorology, October 2014 Copyright 2014 FUJITSU LIMITED


Page 1

Ross Nobes

Fujitsu Laboratories of Europe

Fujitsu’s Architectures and Collaborations for Weather Prediction and Climate Research

ECMWF Workshop on HPC in Meteorology, October 2014 Copyright 2014 FUJITSU LIMITED

Page 2

Fujitsu HPC - Past, Present and Future

[Figure: timeline of Fujitsu HPC systems from 1977 onwards]

Milestones shown on the timeline:
• Japan’s first vector (array) supercomputer (1977)
• No. 1 in Top500 (Nov. 1993)
• Gordon Bell Prize (1994, 95, 96)
• World’s fastest vector processor (1999)
• World’s most scalable supercomputer (2003)
• Japan’s largest cluster in Top500 (July 2004)
• Most efficient performance in Top500 (Nov. 2008)
• No. 1 in Top500 (June and Nov. 2011)

Systems shown on the timeline: F230-75APU, VP Series, VPP500, NWT* (developed with NAL), VPP300/700, VPP5000, AP1000, AP3000, PRIMEPOWER HPC2500, PRIMEQUEST, SPARC Enterprise, HX600 cluster node, PRIMERGY RX200 cluster node, PRIMERGY BX900 cluster node, FX1, K computer, FX10, Post-FX10, PRIMERGY CX400, next x86 generation, Exa. Also labelled: National project.

*NWT: Numerical Wind Tunnel (image ©JAXA)

Fujitsu has been developing advanced supercomputers since 1977

Page 3

Fujitsu’s Approach to the HPC Market

[Figure: HPC market segments and Fujitsu product lines]
• Market segments: Supercomputer, Divisional, Departmental, Work Group
• Product lines: SPARC64-based supercomputers, x86 PRIMERGY HPC solutions, powerful workstations

Page 4

The K computer

• June 2011: No. 1 in TOP500 List (ISC11)
• November 2011: Consecutive No. 1 in TOP500 List (SC11)
• November 2011: ACM Gordon Bell Prize Peak-Performance (SC11)
• November 2012: No. 1 in Three HPC Challenge Award Benchmarks (SC12)
• November 2012: ACM Gordon Bell Prize (SC12)
• November 2013: No. 1 in Class 1 and 2 of the HPC Challenge Awards (SC13)
• June 2014: No. 1 in Graph 500 “Big Data” Supercomputer Ranking (ISC14)

Build on the success of the K computer

Page 5

Evolution of PRIMEHPC

| | K computer | PRIMEHPC FX10 | Post-FX10 |
|---|---|---|---|
| CPU | SPARC64 VIIIfx | SPARC64 IXfx | SPARC64 XIfx |
| Peak perf. | 128 GFLOPS | 236.5 GFLOPS | 1 TFLOPS ~ |
| # of cores | 8 | 16 | 32 + 2 |
| Memory | DDR3 SDRAM | ← (same as K computer) | HMC |
| Interconnect | Tofu Interconnect | ← (same as K computer) | Tofu Interconnect 2 |
| System size | 11 PFLOPS | Max. 23 PFLOPS | Max. 100 PFLOPS |
| Link BW | 5 GB/s, bidirectional | ← (same as K computer) | 12.5 GB/s, bidirectional |

Page 6

Continuity in Architecture for Compatibility
• Upwards compatible CPU:
  • Binary-compatible with the K computer & PRIMEHPC FX10
  • Good byte/flop balance
• New features:
  • New instructions
  • Improved micro architecture
• For distributed parallel executions:
  • Compatible interconnect architecture
  • Improved interconnect bandwidth

Page 7

32 + 2 Core SPARC64 XIfx
• Rich micro architecture improves single thread performance
• 2 additional assistant cores for avoiding OS jitter and for non-blocking MPI functions

| | K | FX10 | Post-FX10 |
|---|---|---|---|
| Peak FP performance | 128 GF | 236.5 GF | 1-TF class |
| Core config.: execution unit | FMA x 2 | FMA x 2 | FMA x 2 |
| Core config.: SIMD | 128 bit | 128 bit | 256 bit wide |
| Core config.: dual SP mode | NA | NA | 2x DP performance |
| Core config.: integer SIMD | NA | NA | Support |
| Single thread performance enhancement | - | - | Improved OOO execution, better branch prediction, larger cache |
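The peak figures in the table can be reproduced from the core configuration. The sketch below is an illustration only: core counts, FMA unit counts and SIMD widths are taken from the slide, while the clock frequencies (2.0 GHz for the K computer, 1.848 GHz for FX10) are assumptions that do not appear in the slides, and the function names are hypothetical. For Post-FX10 it only derives the clock that 32 compute cores would need to reach 1 TF, assuming the two assistant cores do not contribute.

```python
# Peak double-precision performance = cores x clock x flops per cycle per core.
# Core counts, FMA units and SIMD widths come from the slide; the clock
# frequencies below are ASSUMPTIONS added for illustration.

def flops_per_cycle(fma_units: int, simd_bits: int, word_bits: int = 64) -> int:
    """Each FMA unit does a multiply and an add (2 flops) on every SIMD lane."""
    lanes = simd_bits // word_bits
    return fma_units * lanes * 2


def peak_gflops(cores: int, clock_ghz: float, fma_units: int, simd_bits: int) -> float:
    return cores * clock_ghz * flops_per_cycle(fma_units, simd_bits)


k_peak = peak_gflops(cores=8, clock_ghz=2.0, fma_units=2, simd_bits=128)
fx10_peak = peak_gflops(cores=16, clock_ghz=1.848, fma_units=2, simd_bits=128)

# Clock that 32 compute cores with 256-bit SIMD would need for 1000 GFLOPS
# (assistant cores assumed not to contribute).
min_clock_ghz = 1000.0 / (32 * flops_per_cycle(fma_units=2, simd_bits=256))

print(k_peak, round(fx10_peak, 1), min_clock_ghz)
```

Doubling the SIMD width from 128 to 256 bits doubles the flops per cycle per core, which together with the doubled core count accounts for most of the step from FX10 to the 1-TF class.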

Page 8

Flexible SIMD Operations
• New 256-bit wide SIMD functions enable versatile operations
  • Four double-precision calculations
  • Strided load/store, indirect (list) load/store, permutation, concatenation

[Figure: panels illustrating double-precision simultaneous calculation (reg S1, reg S2 to reg D; 128 regs), strided load (memory to reg D with a specified stride), indirect load (memory to reg D via reg S), and SIMD permutation (arbitrary shuffle of reg S into reg D)]
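The data-movement patterns named on this slide can be emulated in scalar code to show what each one selects. This is a sketch of the semantics only, not of the SPARC64 XIfx instructions: the function names are hypothetical, and a register is modelled as a Python list of four double-precision lanes (256 bits / 64 bits).

```python
# Scalar emulation of three SIMD data-movement patterns on a 4-lane register.
# Function names are hypothetical; this models semantics, not real instructions.
LANES = 4  # 256-bit register / 64-bit double precision


def strided_load(memory, base, stride):
    """Load LANES elements starting at `base`, `stride` elements apart."""
    return [memory[base + lane * stride] for lane in range(LANES)]


def indirect_load(memory, indices):
    """Load the elements whose positions are listed in an index register."""
    return [memory[i] for i in indices]


def permute(src, pattern):
    """Arbitrary shuffle: lane i of the result takes lane pattern[i] of src."""
    return [src[p] for p in pattern]


memory = [float(i) for i in range(16)]

strided = strided_load(memory, base=1, stride=3)
gathered = indirect_load(memory, indices=[9, 2, 15, 0])
shuffled = permute([10.0, 11.0, 12.0, 13.0], pattern=[3, 3, 0, 1])

print(strided, gathered, shuffled)
```

Strided and indirect access matter for weather and climate codes because data laid out by level or by grid column is often not contiguous in the loop that needs vectorising.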

Page 9

Hybrid Memory Cube (HMC)

| Peak performance/node | K | FX10 | Post-FX10 |
|---|---|---|---|
| DP performance (Gflops) | 128 | 236.5 | Over 1 TF |
| Memory bandwidth (GB/s) | 64 | 85 | 480 |
| Interconnect link bandwidth (GB/s) | 5 | 5 | 12.5 |

• The increased arithmetic performance of the processor needs higher memory bandwidth
• The required memory bandwidth can almost be met by HMC (480 GB/s)
• Interconnect is boosted to 12.5 GB/s x 2 (bi-directional) with optical link
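The byte/flop balance behind these statements follows directly from the per-node figures. A minimal sketch, assuming 1000 Gflops for Post-FX10 (the slide says only "over 1 TF", so the resulting ratio is an upper bound):

```python
# Memory bandwidth (GB/s) and peak DP performance (Gflops) per node, from the slide.
nodes = {
    "K": (64.0, 128.0),
    "FX10": (85.0, 236.5),
    "Post-FX10": (480.0, 1000.0),  # ASSUMPTION: "over 1 TF" taken as 1000 Gflops
}

# Byte/flop ratio: bytes of memory traffic available per floating-point operation.
byte_per_flop = {name: bw / gflops for name, (bw, gflops) in nodes.items()}

for name, ratio in byte_per_flop.items():
    print(f"{name}: {ratio:.2f} B/F")
```

Under these assumptions the ratio falls from 0.50 on the K computer to about 0.36 on FX10, and HMC brings it back to at most 0.48, which is consistent with the bandwidth requirement being "almost" met.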

Page 10

HMC (2)

| Amount per processor | Capacity | Memory BW |
|---|---|---|
| HMC x8 | 32 GB | 480 GB/s |
| DDR4-DIMM x8 | 32~128 GB | 154 GB/s |
| GDDR5 x16 | 8 GB | 320 GB/s |

• HMC was adopted for main memory since its bandwidth is about three times that of DDR4
• HMC can deliver the required bandwidth for high performance multi-core processors

Page 11

Tofu2 Interconnect
• Successor to Tofu Interconnect
• Highly scalable, 6-dimensional mesh/torus topology
• Logical 3D, 2D or 1D torus network from the user’s point of view
• Link bandwidth increased 2.5 times to 100 Gbps

[Figure: three-dimensional torus unit (2×3×2) within a scalable three-dimensional torus; well-balanced shapes are available]
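From the user's point of view a job sees a logical torus, in which every node has a neighbour in each direction and the edges wrap around. The sketch below illustrates that wrap-around for a logical 3D torus; it is a generic illustration, not Fujitsu's mapping of the logical torus onto the 6-dimensional network, and the function name and the 4×4×4 shape are hypothetical. The last lines convert the quoted link rate to bytes.

```python
def torus_neighbours(coord, shape):
    """Return the six neighbours of `coord` in a 3D torus with wrap-around."""
    neighbours = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % shape[axis]  # modulo gives the wrap-around
            neighbours.append(tuple(n))
    return neighbours


# Hypothetical 4 x 4 x 4 logical torus; a node on the boundary still has
# six neighbours because the indices wrap.
corner = torus_neighbours((0, 0, 3), shape=(4, 4, 4))

# Link rate from the slide: 100 Gbps = 12.5 GB/s, i.e. 2.5 times 5 GB/s.
link_gb_per_s = 100 / 8
previous_gb_per_s = link_gb_per_s / 2.5

print(corner, link_gb_per_s, previous_gb_per_s)
```

Nearest-neighbour halo exchanges in grid-point models map naturally onto such a torus, since each exchange partner is one hop away.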

Page 12

A Next Generation Interconnect
• Optical-dominant: 2/3 of network links are optical

[Figure: bar chart of gross raw bitrate per node (Tbps, axis from 0 to 1.2), split into optical and electrical network links, for IBM BlueGene/Q, IB-FDR Fat-Tree, Cray Cascade, Tofu1 and Tofu2. Bar segments are annotated with bit rates of 6.25, 25, 25, 12.5, 14, 14, 14, 10 and 5 Gbps.]

Page 13

Reduction in the Chip Area Size
• Process technology shrinks from 65 to 20 nm
• System-on-chip integration eliminates the host bus interface
• Chip area shrinks to 1/3 size

[Figure: chip floorplans]
• Tofu1, < 300 mm²: InterConnect Controller (65 nm) containing the Tofu network interface (TNI) and Tofu barrier interface (TBI), the Tofu network router (TNR), a PCI Express root complex and a host bus interface
• Tofu2, < 100 mm²: part of the SPARC64™ XIfx (20 nm), a 34 core processor with 8 HMC links, containing TNI and TBI, TNR and a PCI Express root complex

Page 14

Throughput of Single Put Transfer
• Achieved 11.46 GB/s of throughput, which is 92% efficiency

[Figure: throughput of Put transfer (GB/s, axis ticks from 0.03 to 16.00) versus message size (4 to 4096 bytes) for Tofu2 and Tofu1; labelled values are 11.46 GB/s and 4.66 GB/s]
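The 92% figure can be checked against the link bandwidth quoted earlier in the deck. A small sketch, assuming that efficiency means measured throughput divided by the per-direction link bandwidth (12.5 GB/s for Tofu2, 5 GB/s for Tofu1) and that the 4.66 GB/s label in the chart belongs to Tofu1:

```python
def link_efficiency(measured_gb_s: float, link_gb_s: float) -> float:
    """Fraction of the raw link bandwidth delivered to a single Put transfer."""
    return measured_gb_s / link_gb_s


tofu2_eff = link_efficiency(11.46, 12.5)
tofu1_eff = link_efficiency(4.66, 5.0)  # ASSUMPTION: 4.66 GB/s is the Tofu1 value

print(f"Tofu2: {tofu2_eff:.1%}, Tofu1: {tofu1_eff:.1%}")
```

On these assumptions both generations deliver a similar fraction of their link bandwidth, so the 2.5 times faster link translates almost fully into application-visible throughput.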

Page 15

Throughput of Concurrent Put Transfers
• Linear increase in throughput without the host bus bottleneck

[Figure: throughput of Put transfer (GB/s, axis from 0 to 50) for 1-way to 4-way concurrent transfers, Tofu2 versus Tofu1; labelled values are 45.82 GB/s and 17.63 GB/s, with an annotation "Host bus bottleneck"]
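"Linear increase" can be quantified by comparing the concurrent throughput with the single-transfer throughput multiplied by the number of concurrent transfers. The sketch below assumes that the two labelled values are the 4-way results (45.82 GB/s for Tofu2, 17.63 GB/s for Tofu1) and reuses the single-transfer values 11.46 and 4.66 GB/s; the chart itself is not reproduced in the transcript, so this pairing is an assumption.

```python
def scaling_efficiency(concurrent_gb_s: float, single_gb_s: float, ways: int) -> float:
    """Concurrent throughput relative to perfect linear scaling."""
    return concurrent_gb_s / (ways * single_gb_s)


# ASSUMPTION: the labelled chart values are the 4-way measurements.
tofu2_scaling = scaling_efficiency(45.82, 11.46, ways=4)
tofu1_scaling = scaling_efficiency(17.63, 4.66, ways=4)

print(f"Tofu2: {tofu2_scaling:.3f}, Tofu1: {tofu1_scaling:.3f}")
```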

Page 16

Communication Latencies

| Pattern | Method | Latency |
|---|---|---|
| One-way | Put 8-byte to memory | 0.87 μsec |
| One-way | Put 8-byte to cache | 0.71 μsec |
| Round-trip | Put 8-byte ping-pong by CPU | 1.42 μsec |
| Round-trip | Put 8-byte ping-pong by session | 1.41 μsec |
| Round-trip | Atomic RMW 8-byte | 1.53 μsec |

Page 17

Communication Intensive Applications

• Fujitsu’s math library provides 3D-FFT functionality with improved scalability for massively parallel execution
  • Significantly improved interconnect bandwidth
  • Optimised process configuration enabled by Tofu interconnect and job manager
  • Optimised MPI library provides high-performance collective communications

Page 18

Enhanced Software Stack for Post-FX10

Page 19

Enduring Programming Model

Page 20

Smaller, Faster, More Efficient

• Highly integrated components with high-density packaging
• Performance of 1 chassis corresponds to approx. 1 cabinet of K computer
• Efficient in space, time, and power

[Figure: packaging hierarchy]
• Fujitsu designed SPARC64™ XIfx: 32 + 2 core CPU, Tofu2 integrated
• CPU Memory Board: three CPUs, 3 x 8 HMCs (Hybrid Memory Cube), 8 optical modules (25 Gbps x 12 ch)
• Chassis (12 CPUs): 1 CPU/1 node, 12 nodes/2U chassis, water cooled
• Cabinet: over 200 nodes/cabinet
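The claim that one chassis roughly matches one K computer cabinet can be checked with simple arithmetic. In the sketch below the node counts per chassis, the per-node peak figures and the optical module configuration come from the slides; the number of compute nodes in a K computer cabinet (96) is an assumption that the slides do not state, and 1 TF per Post-FX10 node is used as a lower bound.

```python
NODES_PER_CHASSIS = 12       # from the slide: 12 nodes / 2U chassis
NODE_PEAK_TF = 1.0           # "1-TF class", used as a lower bound
K_NODE_PEAK_TF = 0.128       # 128 GFLOPS per K computer node
K_NODES_PER_CABINET = 96     # ASSUMPTION: not stated in the slides

chassis_tf = NODES_PER_CHASSIS * NODE_PEAK_TF
k_cabinet_tf = K_NODES_PER_CABINET * K_NODE_PEAK_TF

# Raw capacity of one optical module: 12 channels at 25 Gbps each.
optical_module_gbps = 25 * 12

print(chassis_tf, round(k_cabinet_tf, 3), optical_module_gbps)
```

Under these assumptions a 2U chassis delivers about 12 TF against roughly 12.3 TF for a K computer cabinet, in line with the "approx." on the slide.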

Page 21


Fujitsu Server PRIMERGY CX400 M1 (CX2550 M1 / CX2570 M1)

Scale-Out Smart for HPC and Cloud Computing

Page 22

PRIMERGY x86 Cluster Series

[Figure: the Fujitsu HPC timeline from Page 2 is repeated on this slide]

Page 23

Fujitsu PRIMERGY Portfolio

• Product families: TX Family, RX Family, BX Family, CX Family
• Chassis: CX400, CX420
• Server nodes: CX2550, CX2570

Page 24

Fujitsu PRIMERGY CX400 M1

Feature Overview

• 4 dual-socket nodes in 2U density
• Up to 24 storage drives
• Choice of server nodes
• Dual socket servers with Intel® Xeon® processor E5-2600 v3 product family (Haswell)
• Optionally up to two GPGPU or co-processor cards
• DDR4 memory technology
• Cool-safe® Advanced Thermal Design enables operation in a higher ambient temperature

Page 25

Fujitsu PRIMERGY CX2550 M1

Feature Overview

Highest performance & density
• Condensed half-width 1U server
• Up to 4x CX2550 M1 in a CX400 M1 2U chassis
• Intel Xeon E5-2600 v3 product family, 16 DIMMs per server node with up to 1,024 GB DDR4 memory

High reliability & low complexity
• Variable local storage: 6x 2.5” drives per node, 24 in total
• Support for up to 2x PCIe SSD per node for fast caching
• Hot-plug for server nodes, power supplies and disk drives enables enhanced availability and easy serviceability

Page 26

Fujitsu PRIMERGY CX2570 M1

Feature Overview

HPC optimization
• Intel Xeon E5-2600 v3 product family, 16 DIMMs per server node with up to 1,024 GB DDR4 memory
• Optional two high-end GPGPU/co-processor cards (Nvidia Tesla, Grid or Intel Xeon Phi)

High reliability & low complexity
• Variable local storage: 6x 2.5” drives per node, 12 in total
• Support for up to 2x PCIe SSD per node for fast caching
• Hot-plug for server nodes, power supplies and disk drives enables enhanced availability and easy serviceability
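The per-node and per-chassis figures on the two server node slides imply a few numbers that are not printed. The sketch below derives them; it uses only values from the slides, and the derived node count for the CX2570 M1 is an inference from "6 per node, 12 in total", not a statement from the deck.

```python
MAX_MEMORY_GB = 1024
DIMMS_PER_NODE = 16
DRIVES_PER_NODE = 6

# Largest DIMM implied by 1,024 GB across 16 slots.
max_dimm_gb = MAX_MEMORY_GB // DIMMS_PER_NODE

# Nodes per CX400 M1 chassis implied by the drive totals on each slide.
cx2550_nodes_per_chassis = 24 // DRIVES_PER_NODE
cx2570_nodes_per_chassis = 12 // DRIVES_PER_NODE  # inferred, not stated

print(max_dimm_gb, cx2550_nodes_per_chassis, cx2570_nodes_per_chassis)
```

The CX2550 M1 result agrees with the "up to 4x CX2550 M1" bullet; the lower CX2570 M1 count would be consistent with the space taken by the two GPGPU or co-processor cards.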

Page 27

Extreme Weather
• Strong interest within Wales in impacts of extreme weather events
• Fujitsu has established collaborations in this area
  • HPC Wales studentships
  • Knowledge Transfer Partnership with Swansea University

Page 28

HPC Wales – Fujitsu PhD Studentships

• 20 PhD studentships in Welsh Government priority sectors including “Energy and Environment”
• Cardiff University, Dr Michaela Bray: Extreme weather events in Wales, from weather modelling to flood inundation modelling. Unified Model linked to river-estuary numerical flow model
• Bangor University, Dr Reza Hashemi: Simulating the impacts of climate change on estuarine dynamics. Integrated catchment-to-coast model
• Bangor University, Dr Simon Creer: Biogeochemical analysis of the effects of drought on CO2 release from peat bogs and fens

Page 29

Knowledge Transfer Partnership

• Met Office Unified Model performance
• Modelling of extreme weather and flood risk
• UK Government program to promote skills uptake
• Partnership between Swansea University and Fujitsu
• Funding from Welsh Government including overseas visits

[Figure: partnership structure]
• Two Associates: UM optimisation, coupling with hydrology codes
• Academic Supervisor (Swansea); Knowledge Base: Engineering, Swansea
• Company Supervisor (FLE); Company: improved application performance, new solutions/services

Page 30

Collaboration with NCI

• Planned team of 10: 4 funded positions + 2 in-kind (NCI) + 2 in-kind (BoM) + 2 in-kind (Fujitsu)
• Contribute to the development of NG-ACCESS

Next Generation Australian Community Climate and Earth-System Simulator (NG-ACCESS): A Roadmap 2014-2019

Page 31

ACCESS

[Figure: components of ACCESS: UM, MOM, UKCA, CICE, Cable, 4DVAR, EnOI, WW3, OASIS-MCT]

Page 32

Collaborative Activities

Activities within the ACCESS Optimisation Project

Atmosphere
• Scaling and optimisation of 2-3 global configurations of UM
• Benchmark configurations of UM provided by the Bureau

Ocean
• Scaling and optimisation of two global resolution configurations of the MOM ocean model
• Coupled MOM-CICE utilising the OASIS3-MCT coupler

Coupled Climate
• Performance characteristics of ACCESS-CM 1.3 and then introducing a 0.25 degree global ocean into ACCESS-CM 1.4

Data Assimilation
• Collect performance and scalability data for the various data assimilation codes used in ACCESS Parallel Suites
• Initial work will focus on scalability of 4DVAR

• Understanding of scalability bottlenecks
• IO server optimisation
• Improved use of thread (OpenMP) parallelism
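Scalability data of the kind these activities collect is usually summarised as speedup and parallel efficiency relative to a baseline run. The sketch below shows one common way to compute both for a strong-scaling series; the function names are hypothetical and the timings are made-up placeholder values, not measurements of UM, MOM or any ACCESS component.

```python
def speedup(t_base: float, t_n: float) -> float:
    """How many times faster the larger run is than the baseline run."""
    return t_base / t_n


def parallel_efficiency(t_base: float, cores_base: int, t_n: float, cores_n: int) -> float:
    """Speedup divided by the increase in core count (1.0 is ideal strong scaling)."""
    return speedup(t_base, t_n) / (cores_n / cores_base)


# HYPOTHETICAL wall-clock times in seconds, keyed by core count.
timings = {64: 1000.0, 128: 520.0, 256: 290.0}

base_cores = min(timings)
efficiency = {
    cores: parallel_efficiency(timings[base_cores], base_cores, t, cores)
    for cores, t in timings.items()
}

for cores, eff in sorted(efficiency.items()):
    print(f"{cores:4d} cores: efficiency {eff:.2f}")
```

A falling efficiency curve points to where a bottleneck such as I/O or insufficient thread parallelism starts to dominate, which is the starting point for the optimisation work listed above.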

Page 33

Summary
• Fujitsu is committed to developing high-end HPC platforms
  • Fujitsu selected as RIKEN’s partner in FLAGSHIP 2020 project to develop an exascale-class supercomputer
  • Post-FX10 is a step on the way to exascale computing
• Fujitsu also offers x86 cluster solutions across the range from departmental to petascale
• Fujitsu is promoting collaborative research and development programmes with key partners and customers

[News clipping, October 1, 2014] RIKEN Selects Fujitsu to Develop New Supercomputer. Oct. 1: Following an open bidding process, Fujitsu Ltd. has been selected to work with RIKEN to develop the basic design for Japan’s next-generation supercomputer.

Page 34