TRANSCRIPT

Page 1: High Performance Interconnects: Assessment & Rankings

High Performance Interconnects: Landscape, Assessments & Rankings

Dan Olds

Partner, OrionX

September 21st, 2016

Page 2: High Performance Interconnects: Assessment & Rankings

High Performance Interconnects (HPI)

The very top end of the networking market – for when you absolutely need high bandwidth and low latency

Without HPI, you don’t really have a cluster – or at least not one that works very well

Performance has been rising at 30% annually

Spending on HPI is also rising significantly

Page 3: High Performance Interconnects: Assessment & Rankings

Current HPI line

[Figure: chart positioning the HPI market segment within the networking landscape – link speeds from 1G to 100G, network protocols (TCP/IP, InfiniBand, Specialized, OPA), application communication layers (MPI, JDBC, RMI, IIOP, SOAP, etc.), and single-rack vs. multi-rack scale.]

Page 4: High Performance Interconnects: Assessment & Rankings

Three Types of HPI

Ethernet – Sold by a host of providers: Cisco, HPE, Juniper, and many others

– Tried and true interconnect, the easiest to implement

– While its bandwidth matches the others, latency is much higher (ms rather than ns)

Proprietary – Primarily sold by Cray, SGI, and IBM, plus a few others

– You have to purchase a system in order to get their brand of HPI

– Intel is a new entrant in this segment of the market, although without an accompanying system

InfiniBand – Mellanox has emerged as the de facto leader

– Highest performance based on published numbers: 100 Gb/s, 150 million messages/s, 90 ns latency

Page 5: High Performance Interconnects: Assessment & Rankings

Key Differences in HPI: Product Maturity/Position

Ethernet

Ethernet has been around longer than any other HPI, but has been surpassed in performance

– Still many installations, but has lost much of its share at the high end

– Latency (measured in ms, not ns) is the problem, not bandwidth

Page 6: High Performance Interconnects: Assessment & Rankings

Key Differences in HPI: Product Maturity/Position

Intel Omni-Path Architecture – Intel and Omni-Path are still in their infancy in this market, with very few installations

– A handful of customers (although some big names); few, if any, in production

– Claims bandwidth/latency/message rate same or better than InfiniBand (covered later)

Page 7: High Performance Interconnects: Assessment & Rankings

Key Differences in HPI: Product Maturity/Position

InfiniBand

– Has been in the HPI market since the early 2000s

– Thousands of customers, millions of nodes

– Now makes up almost half of the TOP500 list

– Synonymous with Mellanox these days

Page 8: High Performance Interconnects: Assessment & Rankings

Key Differences in HPI: Technology

Onload vs. Offload

Onload: the main CPU handles all network processing chores; the adapter and switches just pass messages along. Examples: Intel Omni-Path Architecture, Ethernet

– Analogous to PC servers and old UNIX systems, where the CPU handled every task and took interrupts on communications

Offload: the HCA and switches handle all network processing tasks with little or no need for main CPU cycles, leaving the CPU free to keep processing applications. Example: Mellanox InfiniBand

– Analogous to mainframes, which used communication-assist processors so the CPU could process applications, not communications
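
To make the difference concrete, here is a minimal MPI sketch (illustrative only, not from the original deck; compute() and the buffer sizes are placeholders): a non-blocking all-reduce is posted and the host does other work before waiting on it. With an offload-capable HCA the collective can progress in the adapter while the host computes; with an onload design, the host cores must also supply the cycles that move and process those messages.

```c
/* Minimal sketch: overlap computation with a non-blocking collective.
 * Illustrative only -- compute() and the buffer sizes are placeholders.
 * Build: mpicc overlap.c -o overlap   Run: mpirun -np <ranks> ./overlap
 */
#include <mpi.h>
#include <stdio.h>

enum { N = 1 << 20 };   /* ~8 MB of doubles per buffer, arbitrary */

static double local_vals[N], global_vals[N], scratch[N];

static void compute(double *x, int n) {
    /* Stand-in for application work the CPU can do while the collective
     * is (ideally) progressed by the adapter under an offload design. */
    for (int i = 0; i < n; i++)
        x[i] = x[i] * 1.0000001 + 0.5;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < N; i++)
        local_vals[i] = rank + i * 1e-6;

    /* Post the collective, then keep the host CPU busy with other work. */
    MPI_Request req;
    MPI_Iallreduce(local_vals, global_vals, N, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);
    compute(scratch, N);                 /* the overlap window */
    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* collective is complete here */

    if (rank == 0)
        printf("allreduce done, global_vals[0] = %f\n", global_vals[0]);

    MPI_Finalize();
    return 0;
}
```

How much of that overlap is actually realized depends on the MPI library and the adapter – which is exactly the onload-vs-offload argument the following slides make.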

Page 9: High Performance Interconnects: Assessment & Rankings

Offload Details

Network protocol load includes:

– Link layer: packet layout, packet forwarding, flow control, data integrity, QoS

– Network layer: adds headers, routes packets from one subnet to another

– Transport layer: in-order packet delivery, divides data into packets, receiver reassembles packets, sends/receives acknowledgements

– MPI operations: scatter, gather, broadcast, etc.

With offload, ALL of these operations are handled by the adapter hardware – for example, an InfiniBand HCA

Page 10: High Performance Interconnects: Assessment & Rankings

Onload Details

Network protocol load includes:

– Link layer: packet layout, packet forwarding, flow control, data integrity, QoS

– Network layer: adds headers, routes packets from one subnet to another

– Transport layer: in-order packet delivery, divides data into packets, receiver reassembles packets, sends/receives acknowledgements

– MPI operations: scatter, gather, broadcast, etc.

With onload, ALL of these operations are performed by the host processor, using host memory
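
For reference, here is a small, self-contained sketch (not from the deck) of the MPI operations both of these slides list – broadcast, scatter, and gather. Whether the packets, acknowledgements, and reassembly behind these calls are handled by the adapter or by the host cores is precisely the offload/onload distinction.

```c
/* Minimal sketch of the MPI collectives named on the slide:
 * broadcast, scatter, gather. Sizes and values are arbitrary placeholders. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

enum { CHUNK = 4 };   /* elements handed to each rank */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Broadcast: root sends one parameter to every rank. */
    int param = (rank == 0) ? 42 : 0;
    MPI_Bcast(&param, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Scatter: root deals out CHUNK elements to each rank. */
    int *table = NULL;
    if (rank == 0) {
        table = malloc((size_t)size * CHUNK * sizeof *table);
        for (int i = 0; i < size * CHUNK; i++)
            table[i] = i;
    }
    int mine[CHUNK];
    MPI_Scatter(table, CHUNK, MPI_INT, mine, CHUNK, MPI_INT, 0, MPI_COMM_WORLD);

    /* Local work on the scattered piece. */
    for (int i = 0; i < CHUNK; i++)
        mine[i] += param;

    /* Gather: root collects the processed pieces -- the many-to-one
     * pattern that loads the root ("head") node as clusters grow. */
    int *result = NULL;
    if (rank == 0)
        result = malloc((size_t)size * CHUNK * sizeof *result);
    MPI_Gather(mine, CHUNK, MPI_INT, result, CHUNK, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("first gathered element: %d\n", result[0]);
        free(table);
        free(result);
    }

    MPI_Finalize();
    return 0;
}
```

The final MPI_Gather is the many-to-one pattern the later slides point to as the pressure point on the head node as clusters grow.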

Page 11: High Performance Interconnects: Assessment & Rankings

Onload vs. Offload

Onload vs. Offload isn’t a big deal when the cluster is small…

Page 12: High Performance Interconnects: Assessment & Rankings

Onload vs. Offload

But it becomes a very big deal as the cluster gets larger

It is a particular problem for scatter/gather-type collective operations, where the head node is overrun trying to process messages

Page 13: High Performance Interconnects: Assessment & Rankings

Onload vs. Offload

As node count increases, performance of Onload will drop

– Higher node count = more messaging, pressure on head node

Node counts are increasing significantly

Dedicated hardware ASICs

– Much faster than general purpose CPUs

MPI is not highly parallel

– With Onload, this means speed is limited by the speed of the slowest core

– This has no bearing on Offload speed

Page 14: High Performance Interconnects: Assessment & Rankings

FUD War Rampant

Cost of HPI in a cluster budget is typically ~25% of the total

Prices in high tech typically don’t increase over time

Price points for new products are typically the same as those of the former high-end products they replace…ex: high-end PCs

From: The Next Platform, “Intel Stretches Deep Learning On Scalable System Framework”, 5/10/16

Page 15: High Performance Interconnects: Assessment & Rankings

More FUD War…

All images provided by Intel, all from The Next Platform story “Intel Stretches Deep Learning on Scalable System Framework”, May 10th, 2016

What else do these images have in common?

Page 16: High Performance Interconnects: Assessment & Rankings

FUD Wars – Behind the Numbers

It’s all in the fine print, right?

Here’s Intel’s fine print for the graphs on the last slide…

“dapl” is key – it’s an Intel MPI mechanism that doesn’t allow for offload operations à la InfiniBand

“…..48 port (B0 silicon). IOU Non-posted Prefetch disabled in BIOS. Snoop hold-off timer = 9. EDR based on internal testing: Intel MPI 5.1.3, shm:dapl fabric, RHEL 7.2 -genv I_MPI_DAPL_EAGER_MESSAGE_AGGREGATION off. Mellanox EDR ConnectX-4 Single Port Rev 3 MCX455A HCA. Mellanox SB7700 – 36 Port EDR InfiniBand switch. MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0). IOU Non-posted Prefetch enabled in BIOS. 1. osu_latency 8 B message. 2. osu_bw 1 MB message. 3. osu_mbw_mr, 8 B………

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.”
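
For readers who have not run them, osu_latency, osu_bw, and osu_mbw_mr are standard OSU micro-benchmarks. The kind of ping-pong measurement osu_latency performs can be sketched in a few lines of MPI – this is a simplified illustration under my own assumptions (iteration count, no warm-up loop), not the OSU code or Intel’s test setup:

```c
/* Simplified ping-pong latency sketch, in the spirit of osu_latency.
 * Run with at least two ranks; ranks beyond 0 and 1 sit idle.
 * Warm-up iterations and the other refinements of the real benchmark
 * are deliberately omitted -- this only illustrates the measurement. */
#include <mpi.h>
#include <stdio.h>

enum { ITERS = 10000, MSG = 8 };   /* 8 B messages, as in the fine print */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char buf[MSG] = {0};

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < ITERS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("approx. one-way latency: %.2f microseconds\n",
               (t1 - t0) / ITERS / 2.0 * 1e6);

    MPI_Finalize();
    return 0;
}
```

Results from such a test depend heavily on the settings buried in the fine print above (fabric selection such as shm:dapl, BIOS options, message size) – which is the point of this slide.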

Page 17: High Performance Interconnects: Assessment & Rankings

“dapl” Strikes Again!

100% CPU core utilization on an Offload HCA?!!

Does anyone believe this?!!

If that were true, about half of the TOP500 systems would be absolutely useless

“dapl” is a key component of this ‘benchmark’ once again

Page 18: High Performance Interconnects: Assessment & Rankings

FUD Aside, Here Are the Numbers…

                Intel OPA          Mellanox InfiniBand
Bandwidth       100 Gb/sec         100 Gb/sec
Latency         0.93 µs            0.85 µs or less
Message rate    89 million/sec*    150 million/sec

* This number, provided by Intel, has dropped from >150 million in 2015
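
A quick back-of-envelope check (my arithmetic, not from the slide) shows why message rate, not bandwidth, is the differentiator for small messages:

150 × 10^6 msg/sec × 8 B = 1.2 GB/sec ≈ 9.6 Gb/sec, far below the 100 Gb/sec both fabrics provide

so small-message performance is limited by how fast messages can be issued, not by link bandwidth.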

Page 19: High Performance Interconnects: Assessment & Rankings

HPI Roadmaps

InfiniBand roadmap shows EDR now (100 Gb/s) and HDR in 2017 (200 Gb/s)

Can’t find a solid Intel OPA roadmap

Ethernet roadmap shows 200 Gb/s in 2018–19

Page 20: High Performance Interconnects: Assessment & Rankings

Major HPI Choices

Vendor            Market                        Customer                       Product
                  Presence   Trends   Overall   Readiness   Needs   Overall    Capabilities   Roadmap   Overall
Mellanox          9          9        9         8           9       8.5        9              10        9.5
Ethernet vendors  7          7        7         9           6       7.5        7              6         6.5
Intel             6          8        7         6           7       6.5        7              8         7.5

Page 21: High Performance Interconnects: Assessment & Rankings

OrionX Constellation

[Chart: OrionX Constellation positioning of Mellanox, Intel, and the Ethernet vendors, summarized in the table below.]

Vendor     Market   Product   Customer
Ethernet   7        6.5       7.5
Mellanox   9        9.5       8.5
Intel      7        7.5       6.5

Page 22: High Performance Interconnects: Assessment & Rankings

OrionX Constellation™ reports