CONFIDENTIAL Copyright 2017 All rights reserved. 1

Unified PCI Express Network

High Performance Applications

Roy Nordstrøm

Who is Dolphin?

▪ Norwegian company founded in 1992

▪ Dolphin has more than two decades of multi-host computing and clustering experience

▪ Developed a complete software and hardware infrastructure for multi-host computing and IO

– Software

– Host Adapter cards

– Switches

PCI Express Network

What is PCIe Networking or PCIe over Cable?

▪ Extends PCIe between systems and I/O using cables or backplanes

▪ Supports copper and fiber cables

– Copper cables up to 9 meters*

– Fiber cables between 10 and 100 meters*

▪ Two types of bridging models

– Transparent bridging to I/O devices: supported in hardware, no software needed

– Non-transparent bridging (NTB), used to connect two or more root complexes such as processors or GPUs: software is required to transfer data

▪ No changes to the PCIe protocol: standard PCIe transactions

* Copper and fiber cable lengths vary based on boards, switches, and speed of interconnect

PCIe is the dominant IO bus technology in computers today, and is also gaining traction as a high-bandwidth low-latency interconnect

PCI-SIG. PCI Express 3.1 Base Specification, 2010. http://www.eetimes.com/document.asp?doc_id=1259778

[Chart: PCIe bandwidth in gigabytes per second (GB/s), axis 0 to 35, for Gen 2, Gen 3 and Gen 4 at link widths PCIe x4, PCIe x8 and PCIe x16]
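The bandwidth figures on this slide can be approximated from the per-lane signalling rate and the line encoding of each PCIe generation. The Python sketch below is an illustration only: it assumes the chart shows usable bandwidth in one direction (the slide does not say), and the function and table names are made up for this example.

```python
# Per-direction PCIe link bandwidth from signalling rate and line encoding.
# Assumption: protocol overhead (TLP headers, flow control) is ignored.

# generation -> (transfer rate per lane in GT/s, encoding efficiency)
GENERATIONS = {
    "Gen 2": (5.0, 8 / 10),      # 8b/10b encoding
    "Gen 3": (8.0, 128 / 130),   # 128b/130b encoding
    "Gen 4": (16.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_gbytes(generation, lanes):
    """Return usable bandwidth in GB/s for one direction of the link."""
    rate_gt, efficiency = GENERATIONS[generation]
    return rate_gt * efficiency * lanes / 8  # 8 bits per byte


if __name__ == "__main__":
    for generation in GENERATIONS:
        for lanes in (4, 8, 16):
            bandwidth = link_bandwidth_gbytes(generation, lanes)
            print(f"{generation} x{lanes}: {bandwidth:.1f} GB/s")
```

Under these assumptions a Gen 4 x16 link comes out just under 32 GB/s, which fits the 0 to 35 GB/s axis of the chart.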

Goal of PCI Express Network

▪ Reduce network latency and overhead to accelerate applications

▪ Take advantage of the standardization, technology and performance of PCI Express to build an efficient, powerful local network (Gen2, Gen3 and beyond)

▪ Take advantage of the features and functions within PCI Express

▪ Provide a low-cost solution that leverages the PCI Express ecosystem

▪ Combine host-to-host and host-to-I/O networking and sharing

Unified PCI Express Network

▪ Combination of two elements

– PCIe Clustering

– PCIe SmartIO technology

▪ PCIe Clustering

– Designed for tightly coupled distributed systems

• Low latency

• High throughput

– Scale-out capability

• Node scaling from 2 nodes to 128 nodes (128 nodes based on new technology)

• Performance scaling from Gen3 x4 PCIe to x16 PCIe

▪ PCIe SmartIO technology

– Create pool of devices

• Device lending enables devices to be shared in a PCIe cluster

• Direct peer-to-peer communication

• Bypass local CPU and system memory

– Enhance capabilities

• Create MR-IOV capabilities with SR-IOV devices

PCIe Network Markets

High Performance Markets that benefit from Low Latency

Electronic Trading Applications

• Trading Desks

• High Availability systems

Storage

• NVMe drive interconnect

• Replication

• Low Latency Storage

Real-time Applications

• Military / Aerospace

• Medical imaging

• Test Equipment

• Video + Rendering

Simulation

• GPU based simulation

• Reflective memory

Clustered File and Storage Systems

• Gluster/GFS

• Hadoop

• DRBD Replication

Parallel Computation

• Reflective memory systems

• HPC Libraries

• CUDA applications

CLUSTERING TECHNOLOGY

Using Dolphin

PCI Express Network Gen3 Hardware

PXH810/812 HOST ADAPTER

▪ PCI Express Small form factor

▪ Gen3 x8 switch

▪ 64 Gbps bidirectional throughput, 0.54 µs latency

▪ x8 Edge connector

▪ x8 iPass cable connector

▪ 5 Meter copper cables

▪ 100 Meter optical cables

▪ Gen1 and Gen2 support

▪ Compliant with Dolphin Software

▪ Transparent and non-transparent bridging

▪ Host and target support

▪ Transparent only version: PXH812

▪ Available Now

PXH830/832 HOST ADAPTER

▪ PCI Express low profile half length form factor

▪ Gen3 x16 switch

▪ 128 Gbps bidirectional throughput, 0.54 µs latency

▪ x16 Edge connector

▪ 4 – x4 Cable Ports, SFF-8644

▪ PCI-SIG Ext. Cable 3.0 or MiniSAS-HD

▪ 1-x16 port or 2- x8 ports

▪ 9 Meter copper cables

▪ 100 Meter optical cables

▪ Compliant with Dolphin Software

▪ Transparent and non-transparent bridging

▪ Host and target support

▪ Transparent only version: PXH832

▪ Available Now

Dolphin Switchtec PCIe Gen3 Adapter

MXH832 PCIe HOST ADAPTER

▪ PCI Express low profile, half length form factor

▪ Gen3 32 lane Microsemi Switchtec chipset

▪ 128 Gbps bidirectional throughput

▪ Host-to-host latency 0.5 µs*

▪ x16 Edge connector

▪ 4 – x4 Cable Ports, SFF-8644

▪ PCI-SIG Ext. Cable 3.0 or MiniSAS-HD

▪ 9 Meter copper cables*

▪ 100 Meter optical cables

▪ Configurations

– 1 x16 port

– 2 x8 ports

– 4 x4 ports

▪ Transparent

▪ Host and target support

▪ Available Q4 2017

* Project target, may change

Dolphin/Samtec PCIe Gen3 Fiber Adapter

PXH840/PXH842 PCIe HOST ADAPTER

▪ PCI Express low profile, half length form factor

▪ Gen3 32 lane Broadcom Switch chipset

▪ 128 Gbps bidirectional throughput

▪ Host-to-host latency 0.5 µs

▪ x16 Edge connector

▪ Up to 4 – x4 Firefly optical engines

▪ 100 Meter optical cable Support

▪ MTP connector support

▪ Compliant with Dolphin Software

▪ Transparent and non-transparent bridging

▪ Host and target support

▪ Transparent only version: PXH842

▪ Available Q4-2017

Dolphin Gen 3 Switch Hardware

IXS600 8 port PCIe SWITCH

▪ 1U 19 inch rackmount switch

▪ Gen3 64 lane Chipset

▪ 8 Ports

▪ x8 iPass cable connectors

▪ 2 Meter copper cables

▪ 64 Gbps bidirectional throughput per port

▪ 200ns port to port latency

▪ Supports transparent or non-transparent switching / reflective memory

▪ Gen1 and Gen2 backward compatible

▪ Ethernet management, firmware upgrade and monitoring

▪ Available now

MXS824 24 port PCIe SWITCH

▪ 1U 19 inch rackmount switch

▪ Gen3 96 lane Microsemi Switchtec Chipset

▪ 24 – x4 Cable Ports, SFF-8644

▪ PCI-SIG Ext. Cable 3.0 / MiniSAS-HD cables

▪ 9 Meter copper cables*

▪ 100 Meter optical cables

▪ 32 Gbps bidirectional throughput per port

▪ < 200ns port to port latency

▪ Flexible port merging x4, x8, x16

▪ Supports transparent or non-transparent switching / reflective memory

▪ Cascadable to larger systems: 64 / 128 nodes

▪ Ethernet management, firmware upgrade and monitoring

▪ Available Q1 2018
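The per-port figures quoted for these switches (64 Gbps for an x8 port, 32 Gbps for an x4 port) line up with the raw Gen3 signalling rate of 8 GT/s per lane multiplied by the lane count. The short Python sketch below works through that arithmetic and the effect of port merging on the MXS824. It assumes the quoted numbers are raw rates in each direction before 128b/130b encoding overhead, which the slides do not state, and the function names are illustrative only.

```python
GEN3_GT_PER_LANE = 8  # PCIe Gen3 raw signalling rate per lane, in GT/s


def raw_port_gbps(lanes):
    """Raw signalling rate of one port in Gbps (assumed per direction)."""
    return GEN3_GT_PER_LANE * lanes


def merged_port_count(x4_ports, lanes_per_port):
    """Ports available when every x4 cable port is merged to the same width."""
    if lanes_per_port not in (4, 8, 16):
        raise ValueError("port width must be x4, x8 or x16")
    return x4_ports * 4 // lanes_per_port


if __name__ == "__main__":
    for width in (4, 8, 16):
        ports = merged_port_count(24, width)
        print(f"x{width}: {ports} ports at {raw_port_gbps(width)} Gbps each")
```

With 24 x4 cable ports, uniform merging would give 12 x8 ports or 6 x16 ports; mixed configurations are not modelled here.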

IXH620 + IXS600 configuration

PXH – IXH Technology Comparison

| Feature | PXH830 | PXH810 | IXH610 / IXH620 |
|---|---|---|---|
| PCIe technology | x16 Gen3 | x8 Gen3 | x8 Gen2 |
| Connector | SFF-8644 | iPass | iPass |
| Latency | 0.54 µs | 0.54 µs | 0.74 µs |
| PIO throughput | 10 Gigabytes/s | 5.3 Gigabytes/s | 2.9 Gigabytes/s |
| DMA throughput | 11 Gigabytes/s | 6.6 Gigabytes/s | 3.5 Gigabytes/s |
| Multicast groups | 16 (default 4) | 16 (default 4) | 4 |
| Max multicast PIO performance | 10 Gigabytes/s | 5.3 Gigabytes/s | 2.9 Gigabytes/s |
| Max nodes | 3 (switch 2017) | 8 | 20 (56 multicast) |
| Max cable length | 9 m copper, 100 m fiber | 5 m copper, 100 m fiber | 7 m copper, 300 m fiber |

1) System limitations apply

2) Scaling dependent on system resources available

eXpressWare Software Components

Dolphin Software Components

Standard Software Components

▪ SISCI Shared Memory API

▪ SuperSockets Berkeley Sockets API

▪ Optimized TCP/IP Driver

▪ Network Management

▪ IRM – Interconnect Resource Manager

eXpressWare Software portability

▪ OSIF API – operating system dependent code in separate libraries: Linux OSIF Lib, Windows OSIF Lib, VxWorks OSIF Lib, RTX OSIF Lib

▪ IRM – Interconnect Resource Manager

▪ GENIF API – interface to other drivers

▪ PAL API – hardware dependent code in separate libraries

▪ Drivers: IDT Driver, PLX Driver, NTB Driver, PFX Driver

▪ Hardware: IDT PCIe, PLX PCIe, Intel NTB, Microsemi

SuperSockets

▪ Berkeley Sockets API compliant

▪ No changes to applications: plug and play

▪ Implementation optimized over shared memory

• PIO for low latency, RDMA for low CPU utilization

• Adaptive protocols to reduce system load

• Failover to Ethernet
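To make the "no changes to applications" claim concrete, the sketch below shows the kind of ordinary Berkeley-sockets program the slide is talking about: a TCP echo exchange written in Python against the standard `socket` module. Nothing in it is Dolphin-specific. The slide's claim is that code written this way keeps working unchanged when SuperSockets carries the traffic; how SuperSockets is enabled for a process is not covered by the slides and is not shown here. The helper names are illustrative.

```python
import socket
import threading


def _serve_once(listener):
    """Accept one connection and echo everything back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def echo_roundtrip(payload, host="127.0.0.1"):
    """Send payload to a local echo server and return what comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind((host, 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=_serve_once, args=(listener,))
        server.start()
        with socket.create_connection((host, port)) as client:
            client.sendall(payload)
            received = b""
            while len(received) < len(payload):
                chunk = client.recv(4096)
                if not chunk:
                    break
                received += chunk
        server.join()  # the server exits once the client has closed
    return received


if __name__ == "__main__":
    print(echo_roundtrip(b"ping"))
```

The example runs over loopback TCP so that it is self-contained; on a real cluster the two ends would be on different hosts.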

SuperSockets Availability

Windows

▪ Windows XP to Windows 10, plus Server editions

▪ Layered Service Provider / WinSock2 API

▪ Support for TCP

Linux

▪ Linux 2.6/4.x distributions

▪ Dynamic, transparent fail-over and fail-back to Ethernet

▪ Support for TCP / RDSv1

▪ UDP / UDP multicast

▪ Compliant with the Linux kernel-space socket API

Halfway Ping Pong Latency – SISCI API

▪ Latency is reported as half of the ping-pong round-trip time

▪ PCIe Gen3 Starts at 0.54 µs

▪ PCIe Gen2 Starts at 0.74 µs

[Chart: SCIPP latency. Latency in µs (axis 0.00 to 4.00) versus message size (4 to 8192); series: PXH830 x16 Gen3, PXH810 x8 Gen3, IXH610 x8 Gen2]
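The "half of the round trip" metric can be illustrated with a small ping-pong loop: one side sends a message, the other sends it straight back, and the average round-trip time is divided by two. The Python sketch below does this over loopback TCP using only the standard library. It is not the SISCI API and not the SCIPP benchmark, so its numbers are not comparable to the figures on this slide; it only shows how the metric is formed. All function names are illustrative.

```python
import socket
import threading
import time


def _recv_exact(sock, size):
    """Read exactly size bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def _pong(listener, size, rounds):
    """Echo side: return each message as soon as it has fully arrived."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(rounds):
            conn.sendall(_recv_exact(conn, size))


def half_roundtrip_us(size, rounds=1000):
    """Average half round-trip time in microseconds for size-byte messages."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        peer = threading.Thread(target=_pong, args=(listener, size, rounds))
        peer.start()
        with socket.create_connection(listener.getsockname()) as ping:
            ping.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            message = bytes(size)
            start = time.perf_counter_ns()
            for _ in range(rounds):
                ping.sendall(message)
                _recv_exact(ping, size)
            elapsed_ns = time.perf_counter_ns() - start
        peer.join()
    # round trip per message, halved, converted from ns to µs
    return elapsed_ns / rounds / 2 / 1000


if __name__ == "__main__":
    for size in (4, 64, 1024, 8192):
        print(f"{size:5d} bytes: {half_roundtrip_us(size):.2f} µs")
```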

SISCI – PIO throughput

▪ Streaming PIO bandwidth

[Chart: SCIBench throughput. Throughput in MBps (axis 0 to 12000) versus message size (4 to 524K); series: PXH810, IXH610, PXH830]

DMA Throughput

[Chart: DMA Bench throughput. Throughput in MBps (axis 0 to 12000) versus message size; series: PXH830 x16 Gen3, PXH810 x8 Gen3, IXH610 x8 Gen2]

DEVICE LENDING

Dolphin SmartIO Technology

PCIe Device Lending

▪ PCIe devices attached to separate servers are logically available at one server

– No changes to device drivers

[Diagram labels: Physical Connection (PCIe Gen3 Link), Logical View]

Resource Pool with Device Lending

▪ Hosts on a PCIe network can borrow regular PCIe devices attached to remote hosts or to a central switch

– Lend PCIe devices between systems

▪ Supports GPUs, FPGAs, NVMe drives, and other PCIe devices

▪ Scale out to multiple systems with PCIe switches

▪ No Linux kernel patches

▪ No application software modifications necessary

▪ Virtually no performance difference between local and remote resources

▪ Supports hot-pluggable devices

▪ Supports run-time reconfiguration and bring-up: no power-on sequencing required between systems

[Diagram labels: GPU, PCIe Switch, NVMe]

PCIe Device lending

▪ Lending and borrowing software runs on multiple hosts

▪ The lending system makes the borrowing system aware of available devices

▪ The borrowing system borrows a device, which is then hot-added to the borrowing system

▪ Supports MSI and MSI-X interrupts

▪ No changes to device drivers: standard transparent drivers are used with the Dolphin SmartIO setup

▪ Devices look like part of the borrowing system and act like locally attached devices
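The lend and borrow sequence described on this slide can be pictured with a small bookkeeping model. The Python sketch below is purely illustrative: the `Lender` and `Borrower` classes, their methods and the device names are hypothetical and are not the Dolphin SmartIO interface. It models only the steps named above (advertise, borrow, hot-add, return) and leaves out everything the real system does at the PCIe level.

```python
class Lender:
    """Host that owns devices and offers them to the cluster (hypothetical)."""

    def __init__(self, name, devices):
        self.name = name
        self.available = set(devices)

    def advertise(self):
        """Devices that borrowing systems are currently told about."""
        return sorted(self.available)

    def lend(self, device):
        if device not in self.available:
            raise LookupError(f"{device} is not available on {self.name}")
        self.available.remove(device)

    def take_back(self, device):
        self.available.add(device)


class Borrower:
    """Host that borrows remote devices (hypothetical)."""

    def __init__(self, name, local_devices=()):
        self.name = name
        self.devices = list(local_devices)  # what the OS would enumerate
        self.borrowed = {}                  # device -> lending host

    def borrow(self, lender, device):
        lender.lend(device)
        self.devices.append(device)         # modelled hot-add
        self.borrowed[device] = lender

    def give_back(self, device):
        lender = self.borrowed.pop(device)
        self.devices.remove(device)         # modelled hot-remove
        lender.take_back(device)


if __name__ == "__main__":
    host_a = Lender("hostA", ["gpu0", "nvme0"])
    host_b = Borrower("hostB", ["nic0"])
    host_b.borrow(host_a, "gpu0")
    print(host_b.devices, host_a.advertise())
```

In this model a borrowed device simply joins the borrower's device list next to its local devices, mirroring the slide's point that borrowed devices look like part of the borrowing system.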