TRANSCRIPT
Competing Technologies and Architectures for
Networked Flash Storage
Ethernet/InfiniBand/OmniPath/PCIe/Fibre Channel/SAS
Asgeir Eiriksson
Chelsio Communications
Flash Memory Summit 2015
Santa Clara, CA
Introduction
APIs are evolving for optimal use of SSDs
FC and SAS are falling behind the speed curve
Ethernet, IB, and OmniPath are on the same PHY curve
PCIe is on a different, slightly slower PHY curve
Ethernet, IB, and OmniPath
Have different reach
Have the same protocol stack efficiencies
Introduction: speeds and feeds
Technology             Bandwidth (Gbps)                 Reach
SAS                    3, 6, 12                         Rack
Fibre Channel          4, 8, 16, 32                     Rack, Data Center
PCIe x1/x2/x4/x8/x16   8, 16, 32, 64, 128               Rack
Ethernet               1, 2.5, 5, 10, 25, 40, 50, 100   Rack, Data Center, LAN, MAN, WAN
InfiniBand             8, 16, 32, 56, 112               Rack, Data Center
OmniPath               100, 200                         Rack, Data Center
PHY SERDES (single lane) curves
Single-lane SERDES speed (b/s) by year:

             2010   2015   2018
SAS            6G    12G    24G
InfiniBand    10G    25G    50G
Ethernet      10G    25G    50G

[Chart: single-lane PHY speed vs. year, showing SAS on the 6G/12G/24G curve and InfiniBand and Ethernet together on the 10G/25G/50G curve]

InfiniBand and Ethernet
• Same PHY curve
• Same speed curve
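The multi-lane link speeds in the table above follow directly from these single-lane SERDES rates. A minimal C sketch of that arithmetic (the per-lane rates and lane counts are the commonly quoted ones, used here as illustrative assumptions, not values taken from the slides):

#include <stdio.h>

/* Aggregate link speed = single-lane SERDES rate x lane count.
   Rates and lane counts below are illustrative assumptions based on
   the generations in the table above. */
int main(void) {
    const double lane_gbps[] = { 10.0, 25.0, 50.0 }; /* ~2010, 2015, 2018 */
    const int lane_counts[] = { 1, 4 };              /* common lane widths */
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 2; j++)
            printf("%d x %.0fG lanes -> %.0f Gb/s link\n",
                   lane_counts[j], lane_gbps[i],
                   lane_counts[j] * lane_gbps[i]);
    return 0; /* e.g. 4 x 25G lanes -> 100 Gb/s, i.e. 100GbE */
}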
Traditional Scale Out Storage
[Diagram: application controllers attached to a data center network, storage controllers attached to a storage cluster network, and a disaster-recovery site 60-300+ miles away]
Traditional Scale Out Storage
Preserves software investment
Realizes some of the SSD speedup benefits
Disaster Recovery (DR) requires MAN or WAN
Shared Server Flash
[Diagram: NVMe storage servers interconnected by an Ethernet, InfiniBand, or OmniPath fabric, with a disaster-recovery site 60-300+ miles away]
Shared Server Flash
Ethernet, IB, or OmniPath fabric
A PCIe fabric lacks sufficient reach and scaling
RDMA is required for sufficient efficiency
IB and OmniPath use RDMA
Ethernet has RoCE, iWARP, and iSCSI with RDMA
Disaster Recovery (DR) requires MAN or WAN
API
[Diagram: software stack with Applications on top of APIs and Libraries, on top of the File System (Lustre, XFS, ext3, ext4, NVMFS), on top of I/O and Primitives (Atomics, etc.) and Memory/Persistent Memory, on top of Flash Devices; optimizations are either Flash-Aware or Transparent]
API
Preserve software investment
Alternatively jump directly to native SSD API
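To make the two paths concrete, here is a minimal C sketch (device path and block size are placeholders, not from the slides): the first read keeps the existing POSIX software investment intact, while the O_DIRECT read bypasses the page cache as a step toward the device-native path.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    /* Path 1: unchanged POSIX I/O. Existing software keeps working and
       any flash optimizations happen transparently below this API. */
    char buf[4096];
    int fd = open("/dev/nvme0n1", O_RDONLY); /* placeholder device */
    if (fd >= 0) {
        if (read(fd, buf, sizeof buf) < 0)
            perror("buffered read");
        close(fd);
    }

    /* Path 2: a step toward the native path. O_DIRECT bypasses the
       page cache; buffers must be aligned to the logical block size. */
    void *dbuf = NULL;
    if (posix_memalign(&dbuf, 4096, 4096) == 0) {
        int dfd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
        if (dfd >= 0) {
            if (read(dfd, dbuf, 4096) < 0)
                perror("direct read");
            close(dfd);
        }
        free(dbuf);
    }
    return 0;
}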
Ethernet, InfiniBand, OmniPath
InfiniBand, OmniPath
• Reliable link layer
• Credit-based flow control
Ethernet
• Ubiquitous
• Pause and Prioritized Pause (PFC) for lossless operation; propagates through some switches and fewer routers
• Flow control and reliability at a higher layer, e.g. TCP, and the IB transport layer for RoCE
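A toy C sketch of the credit-based scheme (structure and names are invented for illustration, not taken from the InfiniBand or OmniPath specs): the sender transmits only while it holds credits, so the receiver's buffers can never be overrun and the link stays lossless without retransmission.

#include <stdio.h>

/* Toy model of credit-based link-layer flow control. The receiver
   grants one credit per free buffer slot; the sender transmits only
   while holding credits, so nothing is ever dropped for lack of
   buffering. All names and sizes are illustrative. */
enum { RX_BUFFERS = 4 };

typedef struct { int credits; } sender_t;
typedef struct { int free_slots; } receiver_t;

static int try_send(sender_t *s, receiver_t *r, int pkt) {
    if (s->credits == 0)
        return 0;            /* out of credits: wait, do not drop */
    s->credits--;
    r->free_slots--;         /* packet occupies a receive buffer */
    printf("sent packet %d (credits left: %d)\n", pkt, s->credits);
    return 1;
}

static void drain_one(sender_t *s, receiver_t *r) {
    r->free_slots++;         /* receiver consumes one buffer ... */
    s->credits++;            /* ... and returns a credit to the sender */
}

int main(void) {
    sender_t tx = { RX_BUFFERS };
    receiver_t rx = { RX_BUFFERS };
    for (int pkt = 0; pkt < 6; pkt++) {
        while (!try_send(&tx, &rx, pkt)) {
            printf("packet %d blocked, waiting for a credit\n", pkt);
            drain_one(&tx, &rx);
        }
    }
    return 0;
}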
Comparing Ethernet Options
          DCB Required   Reach                                               IP Routable   RDMA
FCoE      Yes            Rack, LAN                                           No            No
iSCSI     No             Rack, datacenter, LAN, MAN, WAN (wired, wireless)   Yes           Yes (with offload)
iWARP     No             Rack, datacenter, LAN, MAN, WAN (wired, wireless)   Yes           Yes
RoCEv2    Yes            Rack, LAN, datacenter                               Yes           Yes
Comparing Ethernet Options
iSCSI, iWARP
• Use DCB when it is available, but it is not required for high performance
iSCSI
• Has RDMA WRITE and accomplishes RDMA READ by using an RDMA WRITE from the other end-point
• Concurrent support for legacy soft-iSCSI
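A self-contained C sketch of that READ-via-WRITE pattern (all names are invented and a direct function call stands in for the wire exchange; this is not the iSCSI protocol itself): the initiator cannot issue an RDMA READ, so it describes its buffer to the target, and the target satisfies the request with a direct-placement WRITE.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative emulation of RDMA READ using only WRITE-style direct
   data placement, as offloaded iSCSI does. */
typedef struct {
    void    *dest;     /* initiator buffer where the data must land */
    uint32_t length;   /* number of bytes requested */
} read_request_t;

static const char target_data[] = "data living on the target";

/* Target side: satisfy the "READ" by writing straight into the
   initiator's described buffer (stands in for an RDMA WRITE). */
static void target_handle_read(const read_request_t *req) {
    uint32_t n = req->length < sizeof target_data
                     ? req->length : (uint32_t)sizeof target_data;
    memcpy(req->dest, target_data, n); /* direct data placement */
}

int main(void) {
    char buf[64] = { 0 };
    /* Initiator side: no READ opcode available, so send the target a
       description of the local buffer and let it WRITE into it. */
    read_request_t req = { buf, sizeof buf };
    target_handle_read(&req);
    printf("initiator received: %s\n", buf);
    return 0;
}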
Comparing Ethernet Options
RDMA bypasses the host software stack
RoCE
iWARP
iSCSI with offload
NVMe over RDMA fabrics
[Diagram: initiator and target host stacks (application buffer, sockets buffer, TCP/IP buffer, NIC driver buffer); with iWARP/RoCE/IB offload, RDMA bypasses the intermediate buffers and moves data directly between the application buffers]
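For concreteness, a condensed libibverbs sketch of that bypass (queue-pair setup and the out-of-band exchange of the peer's address and rkey are omitted; assume qp and cq belong to an already-connected connection): one RDMA WRITE work request moves the registered application buffer straight into the peer's memory, touching none of the intermediate buffers above.

#include <infiniband/verbs.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: post one RDMA WRITE from a registered application buffer
   directly into the peer's registered memory, then wait for it to
   complete. Connection setup/teardown is omitted. */
int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp,
                       struct ibv_cq *cq, char *app_buf, size_t len,
                       uint64_t remote_addr, uint32_t rkey) {
    /* Register the buffer so the NIC can DMA from it directly. */
    struct ibv_mr *mr = ibv_reg_mr(pd, app_buf, len,
                                   IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)app_buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .opcode     = IBV_WR_RDMA_WRITE,  /* direct placement at peer */
        .sg_list    = &sge,
        .num_sge    = 1,
        .send_flags = IBV_SEND_SIGNALED,
    };
    wr.wr.rdma.remote_addr = remote_addr; /* learned out of band */
    wr.wr.rdma.rkey        = rkey;

    struct ibv_send_wr *bad = NULL;
    if (ibv_post_send(qp, &wr, &bad)) {   /* NIC does the transfer; no
                                             sockets or TCP/IP buffers
                                             are touched on either host */
        ibv_dereg_mr(mr);
        return -1;
    }

    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;                                 /* spin until the WRITE completes */
    ibv_dereg_mr(mr);
    return wc.status == IBV_WC_SUCCESS ? 0 : -1;
}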
iSCSI with offload
[Diagram: the same initiator and target stacks with iSCSI offload; the offloaded path likewise bypasses the sockets, TCP/IP, and driver buffers, placing data directly in the application buffers]
iSER with offload
[Diagram: the same initiator and target stacks with iWARP/RoCE offload carrying iSER; RDMA again bypasses the intermediate buffers]
NVMe over fabrics Option 1
• Control Plane on host or ASIC/FPGA/SoC
• Data Plane PCIe-host-PCIe or PCIe only

[Diagram: an Intel/ARM host and an ASIC/FPGA/SoC (control and data plane) connect through a PCIe fabric to NVMe SSDs and to RDMA NICs on an InfiniBand, OmniPath, or Ethernet (RoCEv2/iWARP) fabric]
NVMe over fabrics Option 2
• Control Plane on the Intel/ARM host
• Data Plane PCIe-ASIC/FPGA/SoC-PCIe

[Diagram: the Intel/ARM host handles only the control plane; an ASIC/FPGA/SoC data plane sits between the PCIe fabric of NVMe SSDs and the RDMA NICs on the InfiniBand, OmniPath, or Ethernet (RoCEv2/iWARP) fabric]
NVMe over fabrics comparison
Option 1
• Flexible
• Extra latency incurred by copy/copies
Option 2
• Minimizes latency by removing the host and host memory system from the data path
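Either option presents the same NVMe-over-Fabrics target to its clients. As a usage sketch on a Linux initiator (addresses, port, and NQN are placeholders; assumes the nvme-cli tool and an RDMA-capable NIC), discovery and attach over RoCEv2/iWARP look like:

nvme discover -t rdma -a 192.0.2.10 -s 4420
nvme connect -t rdma -a 192.0.2.10 -s 4420 -n nqn.2015-09.example:nvme-target

after which the remote namespaces appear as ordinary local /dev/nvme* block devices.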
Summary
APIs are evolving for optimal use of SSDs
Ethernet, IB, and OmniPath
Are on the same SERDES PHY (single lane) curve
Have different reach
Have the same protocol stack efficiencies