
Page 1:

Scaling to Petaflop

Ola Torudbakken, Distinguished Engineer

Sun Microsystems, Inc

Page 2:

HPC Market growth is strong

The market in 2007 was double that of 2003 (Source: IDC, 2007)

CAGR increased from 9.2% (2006) to 15.5% (2007)

Page 3:

IDC Server Data (2007)

Year                  2003      2004      2005      2006      CAGR
Total CPU Shipped     8,663M    10,135M   11,713M   12,779M   13.8%
HPC CPU Shipped       1,005M    1,658M    2,643M    3,333M    49.1%
Non-HPC CPUs          7,658M    8,477M    9,070M    9,446M    7.2%
HPC % of Total CPU    11.6%     16.4%     22.6%     26.1%     31.0%

Technical Computing is the Server Growth Engine
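
As a quick check on the table's CAGR column (an added worked example, not part of the original deck), the compound annual growth rate over the three year-on-year steps from 2003 to 2006 is (end/start)^(1/3) - 1:

```python
# CAGR over 2003-2006 (three growth periods): (end / start) ** (1/3) - 1
def cagr(start, end, periods=3):
    return (end / start) ** (1 / periods) - 1

rows = {
    "Total CPU Shipped": (8_663, 12_779),   # table says 13.8%
    "HPC CPU Shipped":   (1_005, 3_333),    # table says 49.1%
    "Non-HPC CPUs":      (7_658, 9_446),    # table says 7.2%
}
for name, (start, end) in rows.items():
    print(f"{name:20s} CAGR = {cagr(start, end):.1%}")
```

The computed values match the table to one decimal place.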

Page 4:

Market Observations
• HPC is now 25% of all CPU units shipped
  - Projected to grow to 33% by 2010
  - Key growth opportunity for the entire industry
• Clustered systems are now 65% of the HPC market
  - IB is the fastest growing interconnect
• Consolidation
  - Driven by energy, infrastructure, and real-estate management
  - Server virtualization is becoming increasingly popular
  - Fabric convergence through FCoIB/FCoE
• Management
  - Cluster managed as a single machine

Page 5:

PetaScale Fabric Requirements
• Fabric Performance is critical to Scalability
  - High Bandwidth (0.1 B/F or better; see the worked example after this list)
  - Low Latency (< 1-2 usec unidirectional ping)
  - Low overhead (more cycles left for computation)
  - High Messaging Rate (N/2)
  - Ability to handle Fabric Congestion
  - Multi-path routing
• Host Adaptor must support multi-threading
  - Otherwise performance does not scale with cores
  - Must support multiple outstanding sends/receives and optimized collectives to scale performance
  - Support legacy (MPI, Sockets) and emerging PGAS-style programming models
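
To make the 0.1 B/F rule concrete, here is a minimal sketch of the arithmetic; the node peak used in the example is a hypothetical input, not a figure from this deck:

```python
# Injection bandwidth needed for a given bytes-per-flop target.
# The 100 GF node peak below is a hypothetical example.
def required_bandwidth_GBps(node_peak_GFlops, bytes_per_flop=0.1):
    return node_peak_GFlops * bytes_per_flop

print(required_bandwidth_GBps(100.0))        # 10.0 GB/s per node at 0.1 B/F
print(required_bandwidth_GBps(100.0, 0.05))  # 5.0 GB/s at a looser 0.05 B/F
```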

Page 6:

PetaScale Fabric Requirements (2)
• Consolidation: unified fabric for Clustering and I/O
  - Loss-less, reliable delivery
  - Service differentiation
  - High throughput
• High Reliability to avoid interruptions
  - Extremely low undetected error rate
  - Link-level and end-to-end CRC mechanisms (toy sketch below)
  - Efficient re-transmission schemes in HW
  - Path failover support
• High Availability & Serviceability
  - Quickly root-cause and isolate the fault, then service the component
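
InfiniBand specifies a per-hop VCRC and an end-to-end ICRC; the sketch below is only a toy illustration of the end-to-end idea using Python's zlib.crc32, not the actual IB field coverage, polynomial handling, or bit ordering:

```python
import zlib

def send(payload: bytes) -> bytes:
    # Append a CRC-32 so the far end can detect corruption end to end.
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def receive(frame: bytes) -> bytes:
    payload, crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != crc:
        # In hardware this would trigger the retransmission path instead.
        raise ValueError("CRC mismatch: drop frame")
    return payload

frame = send(b"MPI eager message")
assert receive(frame) == b"MPI eager message"
```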

Page 7:

Why InfiniBand?
• Open standard
• Open source
• High performance
• High message rate
• Low latency
• Reliable
• QoS
• Congestion control
• Cost effective

IB excels in all areas

Page 8:

IB in Cluster Storage
• Significantly better performance
  - No packet drops, plus RDMA performance
  - Demonstrated scaling to 100 GB/s I/O
• File system software exists today
  - Lustre and other parallel file systems
  - Scaling requires multiple object stores (see the sketch below)
• The same fabric can be used for MPI and I/O
  - No additional cost for a storage network
• Lots of interest in InfiniBand for storage
  - Major performance gains over Fibre Channel
• Many customers are evaluating
  - Cost-performance is a critical consideration
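
To illustrate why reaching a target like 100 GB/s takes multiple object stores, here is a minimal sketch; the per-server throughput is a hypothetical assumption, not a number from this deck:

```python
import math

# Hypothetical per-OSS (object storage server) throughput; real values depend
# on the disk backend and the IB link feeding it.
PER_OSS_GBPS = 2.0

def object_stores_needed(target_GBps, per_oss_GBps=PER_OSS_GBPS):
    # Aggregate Lustre bandwidth scales roughly with the number of OSSes,
    # assuming clients stripe files across them.
    return math.ceil(target_GBps / per_oss_GBps)

print(object_stores_needed(100.0))  # 50 object stores for a 100 GB/s target
```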

Page 9:

Fabric Architecture
• Torus Topologies
  - Used in BlueGene/L and Cray XT3/4
  - Each node connects to its neighbours in the X, Y, and Z directions
  - Pros: easy to build large fabrics
  - Pros: good for nearest-neighbour type applications
  - Cons: blocking fabric and variable latency require deployed applications to be aware of node locality
• Clos Topologies (sized in the sketch after this list)
  - Each node has constant bandwidth
  - Pros: lowest-latency MPI communication
  - Pros: no need to consider locality
  - Cons: more difficult to construct
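
A quick way to see how Clos (fat-tree) fabrics grow is the standard folded-Clos port count; this is an added illustration, with the radix-24 building block chosen to match the 24-port groupings that appear elsewhere in this deck (3 x 24 ports, C48 leaves):

```python
# Maximum end nodes in a non-blocking folded Clos (fat tree) built from
# radix-k switch elements with n switching levels: 2 * (k / 2) ** n.
def fat_tree_endpoints(radix, levels):
    return 2 * (radix // 2) ** levels

for levels in (2, 3):
    print(levels, "levels:", fat_tree_endpoints(24, levels), "ports")
# 2 levels:  288 ports  (the 288-node cluster on page 17)
# 3 levels: 3456 ports  (the Magnum core switch on page 10)
```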

Page 10:

The Sun Magnum 3456-port IB Switch
• World's largest Clos switch
  - 3,456 DDR ports
  - 110 Tbps total capacity (see the check below)
  - 700 ns latency (DDR)
• Major improvement in reliability
  - 6x cable reduction vs. leaf & core switches
  - New 12x connector and cable system
• Major improvement in manageability
  - Single centralized switch with known topology
  - Dual redundant subnet managers
• Dual-Wide Rack Chassis
  - Redundant Power and Cooling
  - 36 kW power consumption
• Line Cards and Fabric Cards
  - 24 Line Cards, each with 144 4x ports realized through 48 12x connectors
  - 18 Fabric Cards
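
The 110 Tbps figure is consistent with simple port arithmetic (an added worked check; a 4x DDR port carries 16 Gbit/s of data per direction after 8b/10b encoding):

```python
PORTS = 3456
GBPS_PER_PORT_PER_DIR = 16   # 4x DDR: 4 lanes x 5 Gbit/s signalling, 8b/10b encoded

total_tbps = PORTS * GBPS_PER_PORT_PER_DIR * 2 / 1000   # both directions
print(total_tbps)            # ~110.6 Tbps
```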

Page 11:

12x Connector and Cable
• 3x denser than the InfiniBand 4x connector
• Electrically and mechanically superior
• Supports active copper cables and optics
• Cable serial number; local/remote cable-inserted detection
• Designed for QDR
• Proposed as the next-gen IBTA 12x connector (12x CSFP)

12x Cable: equivalent to 3 CX-4 cables
12x-to-CX4 Splitter: breaks out to 3 CX-4 cables

Page 12:

3456-port Switch Comparison

Metric            Traditional    Magnum        Ratio
Switch Chassis    300            1             1/300
Rack Space        12 Racks       Dual Racks    1/6
Weight            10 Tons        1.6 Tons      1/6
# Cables          6,912          1,152         1/6

300:1 reduction in management points
6:1 reduction in cables
6:1 reduction in weight and rack space
Order-of-magnitude improvement in reliability
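
The cable counts follow from the port counts (an added worked check, under one plausible accounting consistent with the table: the traditional build uses one 4x cable per node-to-leaf link and one per leaf-to-core uplink, while the Magnum side uses 12x cables that each carry three 4x links):

```python
NODES = 3456

# Traditional: discrete leaf + core switches, all 4x cables.
traditional = NODES + NODES            # node-to-leaf links + non-blocking leaf-to-core uplinks
print(traditional)                     # 6912

# Magnum: one core chassis, 12x cables carrying three 4x links each.
magnum = NODES // 3
print(magnum, traditional / magnum)    # 1152, 6.0x reduction
```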

Page 13:

Multi-Path Routing
• Optimized multi-path routing for load balancing and path redundancy
• ~90% efficiency under uniformly distributed random load

Notes on the measurement (Simula Research Center):
1) Variable packet-size distribution: 5% 1472-byte packets, 65% 320-byte packets, and 30% 64-byte packets
2) Uniform traffic: every destination has the same probability of being chosen for every packet
3) Round-robin: every source sends x packets to its neighbour, then x packets to neighbour+1, and so forth
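
A minimal sketch of generating the two traffic patterns and the packet-size mix described in the notes above, for feeding a simple fabric simulator; the pattern definitions come from the notes, everything else (node counts, x, the simulator itself) is an assumption:

```python
import random

SIZES, WEIGHTS = [1472, 320, 64], [0.05, 0.65, 0.30]   # packet-size mix from note 1

def uniform_traffic(src, n_nodes, n_packets):
    # Note 2: every destination is equally likely for every packet.
    for _ in range(n_packets):
        dst = random.choice([d for d in range(n_nodes) if d != src])
        yield dst, random.choices(SIZES, WEIGHTS)[0]

def round_robin_traffic(src, n_nodes, n_packets, x=8):
    # Note 3: x packets to the neighbour, then x to neighbour+1, and so forth.
    sent, offset = 0, 1
    while sent < n_packets:
        dst = (src + offset) % n_nodes
        if dst != src:
            for _ in range(min(x, n_packets - sent)):
                yield dst, random.choices(SIZES, WEIGHTS)[0]
                sent += 1
        offset += 1

print(list(uniform_traffic(0, 8, 5)))
print(list(round_robin_traffic(0, 8, 5, x=2)))
```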

Page 14:

Nano-Magnum

[M1 Block Diagram: three groups of 24 ports, 24 12x iPASS connectors]

• An ultra-dense 72-port IB Core Switch
• Switch performance
  - 3 x 24 ports, SDR or DDR
  - 140 ns latency (DDR)
• Ideal for cluster configurations up to 288 nodes
  - Just 4 Nano-Magnums and C48 leaf switches
• 1RU 19" enclosure
  - Redundant Power and Cooling
  - 150 W power consumption
• Management
  - Embedded enclosure management
  - 100T & Serial OOB connections

Page 15:

The Sun Constellation Supercomputer: an Open-Standards PetaScale Supercomputer

• Compute: Ultra-dense Blade Platform
  - Fastest processors: SPARC, AMD Opteron, Intel Xeon
  - High-density 2S and 4S blades
  - Fastest Host Channel Adaptor
• Networking: Large-Scale and Ultra-dense IB Switches
  - 72-port (Sun Datacenter Switch 3x24) and 3456-port (Sun Datacenter Switch 3456) InfiniBand switches
  - Unrivaled cable simplification
  - Most economical InfiniBand cost per port
• Storage: Ultra-dense Storage Solution
  - Most economical and scalable storage system with Lustre
  - Up to 48 TB in 4RU
  - Direct cabling to the IB switch
• Software: Comprehensive stack on Linux
  - Integrated developer tools
  - Integrated Grid Engine
  - Infrastructure provisioning, monitoring, and patching
  - Simplified inventory management

504 TF peak performance

Page 16:

Constellation Rack
• Density-Optimized Blade Rack
  - Redundant Power and Cooling
• 192 Sockets per Rack
  - 36 kW per rack
• Unibody Rack Design
  - Saves 480 lbs per Rack
  - Less weight per socket
• InfiniBand Leaf Switches
  - Supports 7-stage fabrics
  - Max 13,824 Nodes
• Optional heat exchanger

Page 17:

288-node cluster...where is the switch?

6 Constellation Racks + 4 Nano-Magnums (NM) + 96 12x Cables
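
The parts list adds up (an added worked check; the 48 nodes per rack is derived from the deck's 192 sockets per rack with 4-socket blades, and each 12x cable carries three 4x links):

```python
RACKS, NODES_PER_RACK = 6, 48          # 192 sockets/rack with 4-socket blades
NANO_MAGNUMS, PORTS_PER_NM = 4, 72
CABLES_12X, LINKS_PER_12X = 96, 3

print(RACKS * NODES_PER_RACK)          # 288 nodes
print(NANO_MAGNUMS * PORTS_PER_NM)     # 288 core switch ports
print(CABLES_12X * LINKS_PER_12X)      # 288 4x links carried by the 12x cables
```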

Page 18:

Massive Scale from 3,456 to 13,824 Servers
• 1 Core Switch: 3,456 servers, 0.4 PFLOPS
• 2 Core Switches: 6,912 servers, 0.9 PFLOPS
• 3 Core Switches: 10,368 servers, 1.3 PFLOPS
• 4 Core Switches: 13,824 servers, 1.7 PFLOPS
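
The progression is linear in the number of core switches (an added sketch; the per-server peak of roughly 0.125 TF is inferred from the deck's own 1.7 PF at 13,824 servers, not an independently sourced figure):

```python
SERVERS_PER_CORE_SWITCH = 3456
TFLOPS_PER_SERVER = 0.125      # inferred from 1.7 PF / 13,824 servers

for core_switches in range(1, 5):
    servers = core_switches * SERVERS_PER_CORE_SWITCH
    pflops = servers * TFLOPS_PER_SERVER / 1000
    print(f"{core_switches} core switch(es): {servers:,} servers, {pflops:.1f} PFLOPS")
```

The printed values reproduce the slide's 0.4, 0.9, 1.3, and 1.7 PFLOPS figures after rounding.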

Page 19:

.....horns optional!