an approach for implementing nvmeof based solutions - tcs... · an approach for implementing nvmeof...

24
1 | Copyright © 2018 Tata Consultancy Services Limited An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy Services 25 May 2018 SDC India 2018

Upload: others

Post on 15-Jun-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

1| Copyright © 2018 Tata Consultancy Services Limited

An Approach for Implementing NVMeOF based Solutions

Sanjeev Kumar

Software Product Engineering, HiTech, Tata Consultancy Services

25 May 2018

SDC India 2018

Page 2: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

2

Agenda

Implementing NVMeOF based solution

Network Fabric – Which one to Choose ?

Drivers support for NVMeOF

Recap of NVMeOF Evolution

Page 3: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

3

Click to edit Master title styleRecap of NVMeOF Evolution

Page 4: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

4

Recap of Communication Protocols

• SAS and SATA performance increased over time but protocols have not changed much

• NVMe was created to allow direct access to the logical block based architecture of SSDs and to do highly-parallel IO enabling SSD to execute more IO threads than any other protocol before.

• NVMe Protocol Specification latest Version 1.3b launched on May 2018.

Page 5: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

5

Data Flow with Enterprise Storage Over Network : Limitations

Flash Storage

Linux

Windows

SCSI

SCSI

SCSI

SCSI

SCSI

NVMe SSDNVMe Host Software

SCSI

NVMe

PCIe

NVM SubsystemSCSI NVMe

NVM SubsystemSCSI NVMe

NVM SubsystemSCSI NVMe

Protocol conversion bridge is required to access the data over network which increases the NVMe latency

iSCSIFC

iSCSIFC

iSCSI

Page 6: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

6

What is a NVM Subsystem ?

NVM Media

NVM I/F

NVMeNamespace

NVMeNamespace

NVMe Controller

NVMe Controller

Fabric Port

Fabric Port

Network Fabric

NVM Subsystem

Page 7: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

7

NVMe Over Fabric Solution

Launched a new specification “NVMe Over Fabric 1.0” on June 2016.

Is a way to send NVMe commands over networking protocols (“Fabrics”). E.g.– RDMA (Infiniband, iWarp, RoCE, ..)– Fibre Channel

Defines a common architecture that supports a range of storage networking fabrics for NVMe block storage protocol over a storage networking fabric

Inherent parallelism of NVMe multiple I/O Queues is exposed to the host (64k Queues & 64k commands per Q)

NVMe commands and structures are transferred end-to-end

Maintains the NVMe architecture across a range of fabric types

Maintains architecture and software consistency between fabric types by standardizing a common abstraction and encapsulation definition

Design goal of NVMe over Fabrics : Provide distance connectivity to NVMe devices with no more than 10 microseconds (μs) of additional latency

Page 8: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

8

Click to edit Master title styleNetwork Fabric – Which one to Choose ?

Page 9: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

9

NVMe Over fabric Protocol – Transport Mode

Source: nvmexpress.org

Page 10: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

10

NVMe Over Fabric Stack Architecture

Fabric Physical(i.e. Ethernet, Infiniband, Fiber Channel)

Fabric Protocol

NVMe Transport

NVMe Transport Binding Services

Fabric Specific PropertiesTransport Specific Features/ Specilization

NVMe Architecture, Queuing Interface, Admin Command & I/O Command Sets, Properties

Fabric

NVMe Transport

Transport BindingSpecifications

NVMe Over Fabric

Page 11: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

11

Network Fabrics

NVMe Over Fabric describe how to transport the NVMe interface across several scalable fabrics

NVMe Over Fabric initially defines two type of fabrics for NVMe transport as Fiber Channel and RDMA

NVMe Host Software

Host Side Transport Abstraction

Controller side Transport Abstraction

NVMe SSDs

Fiber ChannelRoCEv2 iWARP Infiniband FCoE NextGen

Fabric

Page 12: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

12

Comparative Analysis

RoCEv2 iWARP Infiniband FCoE FC iNVMe

RDMA over Converged Ethernet v2

Internet Wide Area RDMA Protocol

Infiniband Fiber Channel over Ethernet

Fiber Channel NVMe Over TCP

Promoted by Mellanox

Promoted by Intel Promoted by Mellanox

Promoted by CISCO Promoted by CISCO, Brocade

Promoted by Facebook, DELL EMC and Intel

RDMA based RDMA based RDMA based Leverage FC- NVMe Non RDMA based Non RDMA based

Not Compatible with Other Ethernet Options

Not Compatible with Other Ethernet Options

Very Rarely Deployed. Special usecases like HPC or Server-Server cluster

Not Compatible with Other Ethernet Options

Compatible with both SCSI and NVMe protocol

Not Yet a part of NVMe oF standard

Lossless Ethernet support ( ECN-Explicit Congestion Notification)

Lossless Ethernet not required

Already a lossless protocol

Require a DCB network(Lossless Ethernet)

Already a lossless protocol

Leverage s/wimplementation of NVMe

Page 13: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

13

NVMe Over Fabric Protocol Stacks

Three dominant protocol Fiber Channel is a lossless network fabric,

NVMe is just a new upper layer protocol where RDMA is not needed

RDMA + IP + Lossless Ethernet (DCB) layer add complexity to RoCEv2 & iWARPprotocols

Ethernet /IP based protocol are using commodity and internet scale.

Fiber Channel RoCEv2 iWARP

FC Physical Ethernet Physical Ethernet Physical

Encoding

Framing & Flow Control

FC - NVMe

Ethernet DCB

IP (with ECN)

TCP(with ECN)

MPA

DDP Protocol

RDMA Verbs

NVMe Over RDMA

Ethernet DCB

IP (with ECN)

UDP

IB Transport(with ECN)

RDMA Verbs

NVMe Over RDMA

NVMe Over Fabric

Page 14: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

14

Business Drivers to Adapt NVMeOF based Solution

AI/ Machine Learning

Hyper Converged Infrastructure

Analytics Video Processing

High Performance Computing

Page 15: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

15

Click to edit Master title styleImplementing NVMeOF based Solution

Page 16: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

16

Core Elements for Implementing NVMe Over Fabric Solutions

NVMe Enabled Host System

NVMe Supported Network

NVMe based Storage Sub System

Windows Linux Unix Solaris VMware

Fiber Channel RoCE v2 iWARP

NVMe Controller NSID (NVMe Name

Space ID) NVM Interface MVM Media(Chip,

SSD, NVDIMM)

1 2 3

Page 17: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

17

Reference Architecture for NVMeOF Solution for Enterprise Storage

Flash Storage

LinuxHost

WindowsHost

SCSI

SCSI

SCSI

SCSI

NVMe

HOST

NVMe SSD

NVMe Subsystem

PCIe

iSCSISMBFC

iSCSIiSERSRPFC

NVMe Over Fabric

NVMe Optimized

Host

NVMe

NVMe Over Fabric NVMe Bridge

SCSI NVMe

SCSI NVMe

NVMe Subsystem

NVMe BridgeNVMe SSD

PCIe

NVMe Subsystem

NVMe BridgeNVMe SSD

PCIe

NVMe Subsystem

NVMe BridgeNVMe SSD

PCIe

Front End Fabric Back End Fabric

Page 18: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

18

Click to edit Master title styleLinux Based NVMeOF Driver

Page 19: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

19

NVMe Over Fabric Support in Linux

Current functionality implemented for NVMe Host Driver• Support for RDMA transport ( Infiniband/ RoCE /iWARP ) • Connect/Disconnect to multiple controllers• Transport of NVMe commands/data generated by NVMe core• Initial Discovery service implementation • Multi-Path

Current functionality implemented for NVMe Target Driver• Support for mandatory NVMe core and Fabrics commands• Support for multiple hosts/subsystems/controls/namespaces• Namespaces backed by Linux block devices• Initial Discovery service; Discovery Subsystem/Controller(s)• Target Configuration interface using Linux configfs• Create NVM and Discovery Subsystems

Linux Fabrics Driver is a part of Linux Kernel 4.8 onwards

Page 20: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

20

NVMe Over Fabric Host Driver Implementation in Linux

Infiniband / RoCE / iWARP

PCI Express

Host Channel Adapter Driver

RDMA StackPCIe Stack

NVMe Fabric Driver(New)

NVMe PCIeDriver (minus

common code)

Common Code

Block Layer

File System

Application

FS Operations Block Device

I/O

NVMe Command by IOCTL

Generic Block Command

New Fabric Driver uses the existing common code Additional it is split into a small common fabrics library

and the actual transport driver Existing user space APIs of the PCIe driver are also

supported when using fabrics Uses new sub-command of the existing nvme-cli tool to

connect to remote controllers

Page 21: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

21

NVMe Over Fabric Target Driver Implementation in Linux

NVMe Target Core

NVMe Admin Command

Implementation

NVMe I/O Command Implementation

NVMe Over Fabric Support

NVMe Over Fabric Discovery

Service

Defines and manages the NVMe entities (subsystems, controllers, namespaces, ...) and their allocation

Responsible for initial commands processing and correct orchestration of the stack setup and tear down

Admin: Responsible for parsing and servicing admin commands such as controller identify, set features, keep-alive, log page, ...)

I/O : Responsible for performing the actual I/O (Read, Write, Flush, Deallocate.

Responsible for servicing Fabrics commands (connect, property get/set)

Responsible for discovering NVMeSubsystems available to Host, Namespace and multipath

RDMA Transport

Linux RDMA Drivers

iWA

RP

Ro

CE

v2

Infi

nib

an

d

Page 22: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

22

NVMe Drive

How Data Flows from Host to Target in NVMeOF?

Host

Linux File System

Linux Block IO

NVMe Device Driver

NVMe Fabrics

RDMA Verbs

RDMA Driver

RDMA Offload

TCP OffloadNIC

Kernel

Target

NVMe Fabrics

RDMA Verbs

RDMA Driver

RDMA Offload

TCP OffloadNIC

Kernel

NVMe Device Driver

NVMe Drive

Page 23: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

23

NVMe Supported Storage Array Today

• NetApp – E570 All Flash Array (FC -NVMe)• PureStorage - DirectFlash• DELL EMC• IBM• Supermicro• Mangstor• E8 Storage• Pavillion Data• Excelero• Aperion

Page 24: An Approach for Implementing NVMeOF based Solutions - TCS... · An Approach for Implementing NVMeOF based Solutions Sanjeev Kumar Software Product Engineering, HiTech, Tata Consultancy

24

Click to edit Master title styleQ&Amail us @ [email protected]