
Mellanox Technologies, Inc.

InfiniBridge™: An Integrated InfiniBand Switch and Channel Adapter

Chris Eddington

Director of Technical Marketing

Mellanox Technologies

[email protected]


Agenda

InfiniBand Overview

Virtual Lanes and Virtual Fabrics

Network Stack and Reliable Connections

Virtual Interface Architecture

InfiniBridge™ Transport Protocol Engines

InfiniPCI Technology

Summary


InfiniBand Switch Fabric

HCA (Host Channel Adapter): connects a CPU to the InfiniBand fabric

TCA (Target Channel Adapter): connects I/O controllers such as Ethernet, SCSI, and Fibre Channel to InfiniBand

Switches: basic building block of InfiniBand subnets

Routers: connect IB subnets; connect IB to SAN / LAN / WAN


Network stack

[Figure: protocol stacks for an IB end node, IB switch, IB router, and legacy end node (reached through a legacy router). Each end node shows Application, Upper Layer Protocols, Transport Layer, Network Layer, Link Layer, and Physical Layer; the switch and routers show a packet relay above their Link and PHY layers. The stack is annotated with user code, kernel code, and hardware regions.]

Packet fields (bytes): LRH (8), GRH (40), Transport (12-40), Payload (0-4096), ICRC (4), VCRC (2)


Packet Format

General IB Request Packet Structure (field sizes in bytes):

LRH(8) <GRH>(40) BTH(12) RETH(16) Payload(0-4096) ICRC(4) VCRC(2)

LRH – Local Route Header (8 bytes), field sizes in bits:

VL(4) LVER(4) SL(4) X(2) LNH(2) DLID(16) X(5) PktLen(11) SLID(16)

BTH – Base Transport Header (12 bytes), field sizes in bits:

Opcode(8) SE(1) MR(1) PadCnt(2) TVER(4) PKEY(16) X(8) DestQP(24) A(1) X(7) PSN(24)

RETH – RDMA Extended Transport Header (16 bytes), field sizes in bits:

VA(64) RKEY(32) DmaLength(32)
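
The header layouts on this slide can be illustrated with a short decoding sketch. It is only a sketch under stated assumptions: fields are taken to be packed most-significant-bit first in network byte order, in the order VL, LVER, SL, reserved, LNH, DLID, reserved, PktLen, SLID for the LRH and Opcode, SE, MR, PadCnt, TVER, PKEY, reserved, DestQP, A, reserved, PSN for the BTH. The names `parse_lrh`, `parse_bth`, and `parse_reth` are hypothetical and do not come from the slides.

```python
import struct
from typing import NamedTuple


class LRH(NamedTuple):
    vl: int
    lver: int
    sl: int
    lnh: int
    dlid: int
    pkt_len: int
    slid: int


class BTH(NamedTuple):
    opcode: int
    se: int
    mr: int
    pad_cnt: int
    tver: int
    pkey: int
    dest_qp: int
    ack_req: int
    psn: int


def parse_lrh(data: bytes) -> LRH:
    """Decode the 8-byte Local Route Header (assumed MSB-first packing)."""
    b0, b1, dlid, len_word, slid = struct.unpack(">BBHHH", data[:8])
    return LRH(
        vl=b0 >> 4,                  # VL(4)
        lver=b0 & 0xF,               # LVER(4)
        sl=b1 >> 4,                  # SL(4), then 2 reserved bits
        lnh=b1 & 0x3,                # LNH(2)
        dlid=dlid,                   # DLID(16)
        pkt_len=len_word & 0x7FF,    # 5 reserved bits, then PktLen(11)
        slid=slid,                   # SLID(16)
    )


def parse_bth(data: bytes) -> BTH:
    """Decode the 12-byte Base Transport Header (assumed MSB-first packing)."""
    opcode, flags, pkey, qp_word, psn_word = struct.unpack(">BBHII", data[:12])
    return BTH(
        opcode=opcode,                # Opcode(8)
        se=flags >> 7,                # SE(1)
        mr=(flags >> 6) & 0x1,        # MR(1)
        pad_cnt=(flags >> 4) & 0x3,   # PadCnt(2)
        tver=flags & 0xF,             # TVER(4)
        pkey=pkey,                    # PKEY(16)
        dest_qp=qp_word & 0xFFFFFF,   # 8 reserved bits, then DestQP(24)
        ack_req=psn_word >> 31,       # A(1), then 7 reserved bits
        psn=psn_word & 0xFFFFFF,      # PSN(24)
    )


def parse_reth(data: bytes) -> tuple:
    """Decode the 16-byte RETH as (VA, RKEY, DmaLength)."""
    return struct.unpack(">QII", data[:16])
```

The field widths sum to 64, 96, and 128 bits, which matches the 8, 12, and 16 byte header sizes given on the slide.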


Virtual Lanes

Multiplex independent data streams onto a single physical link:

Dedicated management lane

Differentiated services on a packet-boundary basis

Alleviates head-of-line blocking

Allows VL-based load balancing across multiple paths

[Figure: packets on VL0 through VL14, plus the management lane, are multiplexed onto one link and demultiplexed at the far end.]


Link Level Flow Control

Credit-based link-level flow control

Link receivers grant packet receive buffer space credits per VL

Separate flow control per VL enables Virtual Fabrics

Multiple protocols on a unified physical network

Congestion and latency on one VL do not impact traffic with guaranteed QoS on another VL

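[Figure: on the transmit side, a mux with arbitration and link control sends packets; on the receive side, a de-mux feeds the receive buffers and link control returns credits. A second panel shows InfiniBand virtual fabrics carrying storage, LAN & WAN, and clustering traffic over shared IB links.]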
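
The per-VL credit scheme described on this slide can be sketched in a few lines. This is a minimal illustration, not the InfiniBand-specified mechanism: credits are counted in whole packets for simplicity, and the class name `CreditLink` and its methods are hypothetical.

```python
from collections import deque


class CreditLink:
    """One direction of a link with an independent credit pool per virtual lane."""

    def __init__(self, buffers_per_vl):
        # The receiver grants one credit per free receive buffer on each VL.
        self.credits = dict(buffers_per_vl)
        self.rx_buffers = {vl: deque() for vl in buffers_per_vl}

    def send(self, vl, packet):
        """Transmit only if the receiver has granted a credit on this VL."""
        if self.credits[vl] == 0:
            return False          # this VL is stalled; other VLs are unaffected
        self.credits[vl] -= 1
        self.rx_buffers[vl].append(packet)
        return True

    def consume(self, vl):
        """Receiver drains a buffer and returns the credit to the sender."""
        packet = self.rx_buffers[vl].popleft()
        self.credits[vl] += 1
        return packet
```

With `CreditLink({0: 2, 1: 1})`, a third `send` on VL 0 is refused until the receiver consumes a packet, while a `send` on VL 1 still succeeds. That independence is what lets congestion on one VL leave traffic on another VL untouched.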


Reliable Transport

What makes a Reliable Connection?

Reliability (Acknowledgement)

Packets must be in-order

No missing packets

Flow Control – prevents end point buffer overflow

Connections

End-to-end associations between user space processes (called sockets in TCP/IP)

Requires de-multiplexing of datagrams

Putting the data where it needs to be

Message copying from kernel to user space


Reliable Transport

[Figure: reliable transport receive path. Datagrams arriving out of order (for example 6, 9, 14, 13, 4, 7) are demultiplexed per connection, then pass through reordering and retransmission (a missing packet triggers "Resend 5"), then data movement and memory protection checks, and finally a copy from kernel space to user space.]
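
The demultiplex, reorder, and resend stages of this slide can be sketched as follows. This is a generic illustration of the steps, not the behavior of the InfiniBridge hardware or of the InfiniBand specification: out-of-order packets are simply buffered, sequence-number wraparound is ignored, and the names `ReliableReceiver` and `demux` are hypothetical.

```python
class ReliableReceiver:
    """Per-connection receiver that delivers packets strictly in sequence."""

    def __init__(self, first_psn=0):
        self.expected = first_psn   # next sequence number to deliver
        self.pending = {}           # out-of-order packets held back

    def receive(self, psn, payload):
        """Return (payloads now deliverable in order, sequence number to resend or None)."""
        if psn < self.expected:
            return [], None         # duplicate of something already delivered
        self.pending[psn] = payload
        delivered = []
        while self.expected in self.pending:
            delivered.append(self.pending.pop(self.expected))
            self.expected += 1
        # Anything still pending means there is a gap at self.expected.
        resend = self.expected if self.pending else None
        return delivered, resend


def demux(datagrams, receivers):
    """Route (connection, psn, payload) datagrams to their connection's receiver."""
    delivered = {conn: [] for conn in receivers}
    for conn, psn, payload in datagrams:
        out, _ = receivers[conn].receive(psn, payload)
        delivered[conn].extend(out)
    return delivered
```

Mirroring the figure, a receiver that starts at 4 and sees 4 then 6 delivers 4, holds 6, and asks for 5 to be resent; once 5 arrives, 5 and 6 are delivered together.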


InfiniBand and Virtual Interface


InfiniBridge™ Features

Mellanox InfiniBridge™ MT21108

Integrated channel adapter and switch

Key Features:

Supports both 1X (2.5 Gb/s) and 4X (10 Gb/s) InfiniBand links

Hardware Transport Protocol Engines deliver reliable in-order connections

Multiple Virtual Lanes plus a dedicated management lane

Multicast support

Maximum Transfer Unit (MTU) up to 2K/4K bytes

Greater than 100 Gb/s internal bandwidth

InfiniPCI™: transparent PCI-to-PCI bridge


InfiniBridge™ High Level Block Diagram

[Figure: InfiniBridge™ MT21108 block diagram. Blocks shown: InfiniBand link interface with eight IB ports; non-blocking crossbar switch and advanced scheduling engine; Subnet Management Agent (SMA); channel adapter; PCI controller with PCI 64b/66 MHz interface; peripheral controller; general purpose I/O; I2C; 8-bit CPU port; JTAG boundary scan.]


InfiniBand Port Logic

The InfiniBand Trade Association defines only the serial interface

SerDes use a parallel interface to the ASIC

Point to point, 125 MHz, source synchronous, DDR

SSTL2

10 pins + clock and reference voltage in each direction

[Figure: the InfiniBand port connects to the SerDes over the parallel interface (TXCLK, TXDATA(10), RXCLK, RXDATA(10), and VREF in each direction); the SerDes drives the serial interface's transmit pair and receive pair.]
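
As a quick check on the numbers above, the parallel interface figures line up with the 1X link rate quoted earlier in the deck. The sketch below assumes that DDR means two transfers per clock cycle on each data pin; the function name is hypothetical.

```python
def parallel_rate_gbps(data_pins=10, clock_mhz=125, transfers_per_clock=2):
    """Raw bit rate of the SerDes parallel interface in one direction, in Gb/s."""
    return data_pins * clock_mhz * transfers_per_clock / 1000


one_x = parallel_rate_gbps()   # 10 pins x 125 MHz x 2 transfers per clock
four_x = 4 * one_x             # four bonded 1X links
```

Under that assumption, `one_x` evaluates to 2.5 and `four_x` to 10.0, matching the 1X (2.5 Gb/s) and 4X (10 Gb/s) link speeds listed among the key features.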


InfiniBand Switch

Layer 2 Forwarding

Decode incoming packet header (LRH) to get DLID and SL

Look up destination port in FDB (Forwarding Database)

Look up VL from SL

Output scheduler decides priority based on VL and integrity checks
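
The forwarding steps listed above can be sketched as two small functions. This is a simplified illustration under assumptions: the FDB and the SL table are modeled as plain dictionaries, the scheduler is strict priority over a caller-supplied VL order, integrity checks are not modeled, and the names `forward` and `next_vl` are hypothetical.

```python
def forward(dlid, sl, fdb, sl_to_vl):
    """Map a decoded (DLID, SL) pair to (output port, VL), or None if DLID is unknown."""
    port = fdb.get(dlid)            # FDB lookup: DLID -> destination port
    if port is None:
        return None                 # no forwarding entry, so the packet is not forwarded
    return port, sl_to_vl[sl]       # SL table lookup: SL -> VL


def next_vl(queues, priority):
    """Pick the first non-empty VL queue in priority order (strict priority)."""
    for vl in priority:
        if queues.get(vl):
            return vl
    return None
```

For example, with `fdb = {7: 3}` and `sl_to_vl = {0: 0, 5: 2}`, a packet with DLID 7 and SL 5 goes to port 3 on VL 2. In the scheduler, placing VL 15 first in `priority` is one way to model a management lane that always goes ahead of the data lanes; treating the management lane as VL 15 is an assumption consistent with the VL0 to VL14 data lanes shown earlier.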


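InfiniBand Switch (cont.)

[Figure: for each input port 1..n, the incoming packet is placed in memory and passed to packet decode. The decoded DLID indexes the FDB to give the port number, and the decoded SL indexes the SL table to give the VL; both feed the scheduler for output port 1..n.]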


InfiniBridge™ Transport Engine Block Diagram

[Figure: transport engine block diagram. Blocks shown: PCI 64b/66 MHz interface; PCI target with transport engine target and channel lookup table; PCI master with transport DMA engines, transport engine master, and channel lookup table; InfiniRISC™ embedded RISC processor; non-blocking full wire speed switch with packet buffering; eight link layer controllers; 150KB distributed link buffering.]

Four 1X links may optionally be bonded together to form a 4X (10 Gb/s) link


InfiniBridge™ Transport Protocol Engine

Channel-based user space communication

DMA engine supports up to 16M channels and is able to sustain more than 100K I/O ops/s. The number of concurrent I/O ops is limited by external memory.

[Figure: transport DMA engines on the PCI 64b/66 MHz bus fetch descriptors and data from system memory. Channel detail: channels 1, 2, ... 16M, spanning kernel and user space, each with a linked list of descriptors (D0, D1, D2, ending in a NULL pointer). Descriptor detail: IB channel info followed by a scatter/gather list (S/G 0, S/G 1, S/G 2) pointing to data.]
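
The descriptor structure in this figure, a linked list of descriptors that each carry channel information and a scatter/gather list, can be modeled as below. The field names, the `gather` helper, and the use of a `bytearray` to stand in for system memory are all illustrative assumptions; the slides do not give the actual descriptor format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SGEntry:
    addr: int      # start offset of one buffer fragment in system memory
    length: int    # fragment length in bytes


@dataclass
class Descriptor:
    channel: int                          # stands in for the "IB channel info"
    sg_list: List[SGEntry]                # S/G 0, S/G 1, S/G 2, ...
    next: Optional["Descriptor"] = None   # None plays the role of the NULL pointer


def gather(memory: bytearray, head: Optional[Descriptor]):
    """Walk the descriptor chain and gather each descriptor's fragments into one message."""
    messages = []
    desc = head
    while desc is not None:
        data = b"".join(bytes(memory[e.addr:e.addr + e.length]) for e in desc.sg_list)
        messages.append((desc.channel, data))
        desc = desc.next
    return messages
```

A chain such as `D0 -> D1 -> None` is processed in order, and each descriptor's fragments are concatenated in list order, which is the essence of a gather operation on the send side.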


InfiniPCI™ Technology

Transparent PCI to PCI bridging over standard InfiniBand fabrics

Functions with existing OS, BIOS, PCI software and hardware without modifications

Use PCI semantics to create multi-segment backplanes, fully switched chassis, and multi-chassis fabrics


InfiniPCI™ System View

[Figure: a PCI bus on each side of the InfiniBand switch fabric, each attached through a channel adapter (CA) containing a PCI target and a PCI master, each with a channel lookup table. Labels on the PCI side: address, CMD, "claims the cycle". Labels on the InfiniBand side: segment header, WQPN.]

PCI attributes:

CMD: Mem, IO, Config, Read/Write

ReadLine, ReadMultiple

Interrupts

InfiniBand attributes:

Layer 2 address (LID)

Connection number (WQPN)

Opcode: RDMA RD/WR, Send

Address: VA = f(PCI address)
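
The mapping from PCI attributes to InfiniBand attributes can be sketched as a table lookup. The slides state only that VA = f(PCI address) and that a channel lookup table is involved, so everything specific here is an assumption made for illustration: the address-window table, the linear offset used for f, the opcode strings, and the names `Channel` and `translate`.

```python
from typing import List, NamedTuple, Optional


class Channel(NamedTuple):
    pci_base: int   # start of the PCI address window claimed by the bridge
    size: int       # window size in bytes
    lid: int        # layer 2 address of the remote channel adapter
    wqpn: int       # connection number
    va_base: int    # remote virtual address that the window maps to


def translate(cmd: str, pci_addr: int, table: List[Channel]) -> Optional[dict]:
    """Turn a PCI memory read or write into InfiniBand attributes, or None if unclaimed."""
    for ch in table:
        if ch.pci_base <= pci_addr < ch.pci_base + ch.size:
            return {
                "lid": ch.lid,
                "wqpn": ch.wqpn,
                "opcode": "RDMA_READ" if cmd == "read" else "RDMA_WRITE",
                # Assumed form of f: a linear offset into the remote window.
                "va": ch.va_base + (pci_addr - ch.pci_base),
            }
    return None   # address is outside every window, so the cycle is not claimed
```

Because the lookup is driven entirely by the address and command seen on the bus, existing PCI software would not need to know that the transaction is being carried over a fabric, which is the transparency the previous slide describes.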


Chassis-to-Chassis Interconnect

[Figure: three CompactPCI chassis, each with a PCI bus; one chassis holds the CPU. Each chassis carries a 6U InfiniBand channel adapter card (InfiniBridge™ MT21108 with four 1X ports, J1 and J2 connectors, EEPROM, oscillator, and LEDs). One card is configured for primary P2P mode; the other cards are configured for secondary P2P mode.]

InfiniBand links can be direct or through a switched fabric, using copper or fiber


Remote I/O Application

[Figure: a server with a 64-bit 66 MHz PCI InfiniBand NIC configured for primary P2P mode connects over 4X InfiniBand links, through an InfiniBand switch, to a CompactPCI chassis and a CPCI storage chassis whose cards are configured for secondary P2P mode.]


Summary

InfiniBridge™ Architecture

Integrated 45 Gb/s non-blocking switch and channel adapter

Reliable transport in hardware

Transport Protocol Engines support up to 16M connections with concurrency

InfiniRISC™ embedded RISC processor

Virtual fabrics enable multi-protocol networks

InfiniPCI™ technology implements transparent PCI bridging