2016 Storage Developer Conference. © IBM. All Rights Reserved.

NVMe virtualization ideas for Machines on Cloud

Sangeeth Keeriyadath, IBM

[email protected]

@ksangeek

day2dayunix.sangeek.com

https://www.linkedin.com/in/sangeek

Disclaimer

All of the contents of this presentation are based on my understanding of NVM Express® and my academic interests to explore the possibilities of virtualizing NVMe. The contents in this presentation do not necessarily represent IBM’s positions, strategies or opinions.

Topics

- NVMe primer
- Past storage virtualization learnings
- Virtualization considerations
- NVMe virtualization approaches
  - Blind Mode: SCSI to NVMe translation
  - Virtual Mode: pure NVMe virtual stack
  - Physical Mode: SR-IOV
- NVMe over Fabrics virtualization
- Other deliberations

NVMe primer [1] [2]

NVM Express® (NVMe) is an inherently parallel and high-performing interface and command set designed for non-volatile memory based storage.

- The interface provides optimized command submission and completion paths, resulting in:
  - Lower latency
  - Increased bandwidth
- Simple command set
- Support for parallel operation (no lock contention) by supporting up to 65,535 I/O Queues with up to 64K outstanding commands per I/O Queue
- Suitable for complementing the benefits of multi-core CPU systems and application parallelism: a command Submission/Completion Queue (SQ/CQ) pair can be bound to a core, process or thread
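The per-core queue binding described above can be illustrated with a small simulation. This is only a sketch, not driver code: `QueuePair`, `submit`, `process`, `reap` and `queue_pairs_per_core` are hypothetical names, and plain Python deques stand in for the host-memory rings and doorbell registers of a real controller.

```python
from collections import deque

MAX_IO_QUEUES = 65_535     # I/O queue limit quoted above
MAX_QUEUE_DEPTH = 65_536   # "64K" commands per I/O queue, as quoted above


class QueuePair:
    """Hypothetical model of one submission/completion queue (SQ/CQ) pair."""

    def __init__(self, qid, depth):
        if not 1 <= qid <= MAX_IO_QUEUES:
            raise ValueError("I/O queue id out of range")
        if not 2 <= depth <= MAX_QUEUE_DEPTH:
            raise ValueError("queue depth out of range")
        self.qid = qid
        self.depth = depth
        self.sq = deque()  # commands waiting for the controller
        self.cq = deque()  # completions waiting for the host

    def submit(self, command):
        """Host side: queue a command (a real driver then writes the SQ doorbell)."""
        if len(self.sq) >= self.depth:
            raise BufferError("submission queue full")
        self.sq.append(command)

    def process(self):
        """Controller side: consume queued commands and post completions."""
        while self.sq:
            self.cq.append((self.sq.popleft(), "Success"))

    def reap(self):
        """Host side: drain completions (a real driver then writes the CQ doorbell)."""
        done = list(self.cq)
        self.cq.clear()
        return done


def queue_pairs_per_core(num_cores, depth=1024):
    """Give every core a private queue pair, so no lock is shared between cores."""
    return {core: QueuePair(core + 1, depth) for core in range(num_cores)}


pairs = queue_pairs_per_core(4)
pairs[0].submit("read LBA 0")
pairs[3].submit("write LBA 8")
for qp in pairs.values():
    qp.process()
core0_done = pairs[0].reap()
core1_done = pairs[1].reap()
```

Because each core only ever touches its own pair, the submit and reap paths need no cross-core locking, which is the property the primer highlights.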

NVMe view [3]

[Figure: an NVMe SSD containing an NVM Express Controller with Namespace A (NSID 1), Namespace B (NSID 2) and Namespace C (NSID 3); controller management through the Admin Submission / Completion Queues; I/O Submission Queues A, X and Y; I/O Completion Queues A and Z.]

NVMe benefits

- Storage stack reduction (lower CPU utilization, 2 register writes for command submit/complete)
- Designed considering next generation NVM based devices (65,535 I/O Queues)
- Elastic buffer of big size (64K commands)
- Bind interrupts to CPUs (support for multiple MSI-X interrupts)
- Interconnect choices:
  - NVMe over PCIe
  - NVMe over Fabrics (RDMA, Fibre Channel)

Why virtualize?

- Key component of cloud computing
- Optimizes utilization of physical resources
- Improves storage utilization
- Simplified storage management: provides a simple and consistent interface to complex functions
- Easier high availability (HA) solutions

Learnings: IBM PowerVM success story

IBM® PowerVM® provides the industrial-strength virtualization solution for IBM Power Systems™ servers and blades that run IBM AIX®, IBM i and Linux workloads. The Virtual I/O Server (VIOS) facilitates the sharing of physical I/O resources among guest Virtual Machines (VMs). [4]

- NPIV [5]: N-Port ID Virtualization (NPIV) is a T11 standard technology for virtualization of Fibre Channel networks. It enables connection of multiple VMs to one physical port of a Fibre Channel adapter. VIOS facilitates storage adapter sharing.
- vSCSI [6]: Using virtual SCSI, VMs can share disk storage, tape or optical devices that are assigned to the VIOS. VIOS does the storage virtualization, performs SCSI emulation and acts as the SCSI target.

PowerVM storage virtualization view

[Figure: a Host containing the Virtual IO Server, a vSCSI VM and a vFC VM above the Power Hypervisor, with adapters labelled vFC_C, vS_C, vS_S and vFC_S, attached to a Storage box holding disks A to H.]

Why virtualize NVMe?

- Limited PCIe slots and the need to share the benefits amongst VMs (high VM density)
- Use cases:
  - Server side caching of VM data
  - VM boot disk

NVMe, being highly scalable, is well suited for various virtualization exploitations.

NVMe virtualization approaches

- Blind Mode: implement a SCSI to NVMe translation layer on the hypervisor.
- Virtual Mode: a pure virtual NVMe stack, built by distributing I/O queues amongst the hosted VMs.
- Physical Mode: SR-IOV based NVMe controllers, one per virtual function.

Blind Mode: SCSI to NVMe translation

- Motivation
  - A large storage stack ecosystem is built around the SCSI architecture model, protocol and interfaces
  - Preserve software infrastructure investments
- Method
  - Implement a “SCSI to NVMe translator” layer built logically below the operating system SCSI storage stack and above the NVM Express driver
  - Detailed in the “NVM Express: SCSI Translation Reference” document [7]:
    - SCSI primary/block commands to NVMe command mapping
    - Common SCSI field translations, e.g. the “PRODUCT IDENTIFICATION” field is translated to the first 16 bytes of the Model Number (MN) field within the “Identify Controller Data”
    - NVMe “Status Code” to SCSI “Status Code, Sense Key, Additional Sense Code”
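The field and status translations listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the translator from the reference document [7]: the function names are hypothetical, the model string is made up, and only two status entries are filled in. The byte offsets are those of the NVMe Identify Controller data structure (MN at bytes 24 to 63) and of the SCSI standard INQUIRY data (PRODUCT IDENTIFICATION at bytes 16 to 31).

```python
MN_OFFSET, MN_LENGTH = 24, 40   # Model Number (MN) in Identify Controller data
PRODUCT_ID_LENGTH = 16          # PRODUCT IDENTIFICATION field width in INQUIRY data

# NVMe generic status code -> (SCSI status, sense key, ASC, ASCQ).
# Only two illustrative entries; the full table lives in reference [7].
STATUS_MAP = {
    0x00: (0x00, 0x0, 0x00, 0x00),  # Success -> GOOD / NO SENSE
    0x01: (0x02, 0x5, 0x20, 0x00),  # Invalid Command Opcode -> CHECK CONDITION /
                                    # ILLEGAL REQUEST / INVALID COMMAND OPERATION CODE
}


def product_identification(identify_controller):
    """Return the first 16 bytes of MN for use as PRODUCT IDENTIFICATION."""
    mn = identify_controller[MN_OFFSET:MN_OFFSET + MN_LENGTH]
    return bytes(mn[:PRODUCT_ID_LENGTH])


def translate_status(nvme_status_code):
    """Map an NVMe status code to SCSI status and sense data."""
    try:
        return STATUS_MAP[nvme_status_code]
    except KeyError:
        raise NotImplementedError("status code not covered by this sketch")


# Made-up Identify Controller payload: 4096 zero bytes with a fake model name.
identify = bytearray(4096)
identify[MN_OFFSET:MN_OFFSET + MN_LENGTH] = b"ExampleModel 1234".ljust(MN_LENGTH)

product_id = product_identification(identify)
scsi_result = translate_status(0x01)
```

Note the truncation: a 40-byte model number does not fit the 16-byte SCSI field, so the guest sees only its leading portion.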

OS storage stack: SCSI to NVMe translated

[Figure: storage stack layers: Application, File System, Logical Volume Manager, SCSI Block disk driver, SCSI to NVMe translator, NVM Express Driver, NVMe SSD.]

Example translations shown in the figure:

SCSI command                             NVMe command
INQUIRY, REPORT LUNS, READ CAPACITY      Identify command
READ(6), READ(10), READ(12), READ(16)    Read

NVMe status               SCSI status / sense key / additional sense code
Success Completion        GOOD / NO SENSE
Invalid Command Opcode    CHECK CONDITION / ILLEGAL REQUEST / INVALID COMMAND OPERATION CODE
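As a rough sketch of the READ mapping shown in the figure, the snippet below converts a SCSI READ(10) or READ(16) command descriptor block (CDB) into the fields of an NVMe Read. The function name and the returned dictionary are hypothetical, READ(6) and READ(12) are left out for brevity, and a real translator would also have to honour the controller's maximum transfer size.

```python
import struct

NVME_OPC_READ = 0x02        # NVM command set opcode for Read
SCSI_READ_10 = 0x28
SCSI_READ_16 = 0x88
MAX_BLOCKS = 65_536         # NLB is a 16-bit, zero-based field


def translate_read_cdb(cdb, nsid):
    """Translate a READ(10)/READ(16) CDB into NVMe Read fields (hypothetical helper)."""
    opcode = cdb[0]
    if opcode == SCSI_READ_10:
        # bytes 2-5: LBA, byte 6: group number (skipped), bytes 7-8: transfer length
        lba, blocks = struct.unpack_from(">IxH", cdb, 2)
    elif opcode == SCSI_READ_16:
        # bytes 2-9: LBA, bytes 10-13: transfer length
        lba, blocks = struct.unpack_from(">QI", cdb, 2)
    else:
        raise ValueError("CDB opcode not handled by this sketch")
    if blocks == 0:
        return None  # zero transfer length: no data to move, no NVMe command
    if blocks > MAX_BLOCKS:
        raise ValueError("transfer too large for a single NVMe Read")
    return {
        "opcode": NVME_OPC_READ,
        "nsid": nsid,
        "slba": lba,          # starting LBA
        "nlb": blocks - 1,    # number of logical blocks, zero-based
    }


# READ(10) of 8 blocks starting at LBA 0x1000
read10 = bytes([0x28, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x08, 0x00])
nvme_read = translate_read_cdb(read10, nsid=1)
```

The off-by-one on `nlb` is the detail most worth noticing: SCSI carries a block count, while the NVMe field is zero-based.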

Blind Mode view: NVMe unaware VM

[Figure: VM A and VM B attach to the VIOS, which contains the vSCSI host, the translator stack (block disk driver, SCSI-NVMe translator, NVMe driver) and three NVMe SSDs.]

- SCSI to NVMe translation is performed on the storage hypervisor
- The Virtual Machine (VM)'s Operating System (OS) storage stack is unmodified
- Performance consideration: the “storage hypervisor” could use a “hardware accelerated SCSI to NVMe translator”

Virtual Mode: pure NVMe virtual stack

- Motivation
  - Share NVM storage capacity amongst Virtual Machines (VMs)
  - VMs with a pure NVMe stack retain the various NVMe performance characteristics:
    - Low latency
    - Parallelism
  - Easier on-demand scaling (grow/shrink) implementation
  - Suitable for future “NVMe over Fabrics” implementations in data centers

This model is suitable for server side storage virtualization solutions and preserves the NVMe benefits.

- Method
  - The “storage hypervisor” (VIOS) owns the physical NVMe controller and facilitates device sharing amongst guest VMs
  - VIOS governs the admin SQ/CQ of the NVMe device
  - VIOS manages the namespaces (storage partitions) for use by the VMs
  - VIOS manages a light-weight “virtual NVMe host adapter”, i.e. vNVMeHost
  - The VM sees an NVMe device through a “virtual NVMe client driver”, i.e. vNVMeClient
  - The hypervisor manages the pairing between the vNVMeClient and vNVMeHost adapters
  - The storage stack on the VM operating system above the vNVMeClient driver is agnostic to this virtualization and gets to use NVMe disks
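The division of responsibilities in the method above can be modelled as a small bookkeeping sketch. Everything here is hypothetical apart from the vNVMeHost and vNVMeClient terms taken from the slides: the class and method names are invented for illustration, and handing each namespace to exactly one vNVMeHost is an assumption of this sketch rather than something the slides state.

```python
class VirtualIOServer:
    """Hypothetical model of the VIOS: owns namespaces and vNVMeHost adapters."""

    def __init__(self, namespaces):
        self.free_namespaces = set(namespaces)  # namespaces on the physical controller
        self.hosts = {}                         # vNVMeHost name -> assigned namespaces

    def create_vnvme_host(self, name):
        self.hosts[name] = set()

    def assign_namespace(self, host, namespace):
        # Assumption of this sketch: a namespace goes to one vNVMeHost only.
        if namespace not in self.free_namespaces:
            raise ValueError("namespace unknown or already assigned")
        self.free_namespaces.remove(namespace)
        self.hosts[host].add(namespace)


class Hypervisor:
    """Hypothetical model of the hypervisor: pairs vNVMeClient with vNVMeHost."""

    def __init__(self, vios):
        self.vios = vios
        self.pairs = {}  # VM (vNVMeClient side) -> vNVMeHost name

    def pair(self, vm, host):
        if host not in self.vios.hosts:
            raise KeyError("no such vNVMeHost")
        if host in self.pairs.values():
            raise ValueError("vNVMeHost is already paired")
        self.pairs[vm] = host

    def visible_namespaces(self, vm):
        """Namespaces the VM's unmodified NVMe stack would enumerate."""
        return sorted(self.vios.hosts[self.pairs[vm]])


vios = VirtualIOServer(["NS_X", "NS_Y", "NS_Z"])
vios.create_vnvme_host("vNVMeHost A")
vios.create_vnvme_host("vNVMeHost B")
vios.assign_namespace("vNVMeHost A", "NS_X")
vios.assign_namespace("vNVMeHost B", "NS_Y")
vios.assign_namespace("vNVMeHost B", "NS_Z")

hypervisor = Hypervisor(vios)
hypervisor.pair("VM A", "vNVMeHost A")
hypervisor.pair("VM B", "vNVMeHost B")
vm_a_view = hypervisor.visible_namespaces("VM A")
vm_b_view = hypervisor.visible_namespaces("VM B")
```

Growing or shrinking a VM's storage then reduces to the VIOS adding or removing namespaces on the paired vNVMeHost, which is where the on-demand scaling benefit comes from.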

Virtual Mode view: pure NVMe stack on VM

[Figure: VM A and VM B, each with a vNVMeClient, attach to vNVMeHost A and vNVMeHost B on the VIOS; the VIOS NVMe driver performs Admin CQ/SQ mux / demux; namespaces NS_X, NS_Y and NS_Z reside on three NVMe SSDs.]

Physical Mode: SR-IOV [8]

- Motivation
  - Single Root I/O Virtualization (SR-IOV) is a specification that allows a PCIe device to appear as multiple separate physical PCIe devices
  - Inherent QoS capabilities and configuration
- Method
  - The adapter capability is logically partitioned into multiple separate PCI functions called “virtual functions”
  - Each “virtual function” can be individually assigned to a Virtual Machine (VM)
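One way to picture the partitioning step is as simple arithmetic over the controller's resources. The sketch below splits a pool of I/O queues across virtual functions and hands one function to each VM. The even-split policy, the number of queues reserved for the physical function, and all names are assumptions made for illustration; real devices expose their own resource-assignment mechanisms.

```python
def partition_queues(total_io_queues, num_vfs, reserved_for_pf=1):
    """Split I/O queues as evenly as possible among virtual functions.

    Hypothetical policy: `reserved_for_pf` queues stay with the physical
    function and the rest are shared out, earlier VFs taking any remainder.
    """
    if num_vfs < 1:
        raise ValueError("at least one virtual function is required")
    available = total_io_queues - reserved_for_pf
    if available < num_vfs:
        raise ValueError("not enough queues for one per virtual function")
    share, remainder = divmod(available, num_vfs)
    return [share + (1 if index < remainder else 0) for index in range(num_vfs)]


def assign_virtual_functions(vms, num_vfs):
    """Give each VM its own virtual function (illustrative labels only)."""
    if len(vms) > num_vfs:
        raise ValueError("more VMs than virtual functions")
    return {vm: f"VF {index}" for index, vm in enumerate(vms)}


queues_per_vf = partition_queues(total_io_queues=12, num_vfs=3)
assignment = assign_virtual_functions(["VM A", "VM B"], num_vfs=3)
```

Unlike Virtual Mode, nothing in the data path passes through a hypervisor component once the assignment is made, which is why the slides call this the physical mode.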

NVMe SR-IOV view

[Figure: an NVMe SSD whose NVM Express Controller exposes a Physical Function and two Virtual Functions (VF A and VF B); VM A and VM B each run an NVMe driver with its own Admin Queues and I/O Queues above the Hypervisor.]

NVMe over Fabrics

- Extends NVMe (beyond PCIe) onto data center fabrics such as Ethernet, Fibre Channel and InfiniBand
  - Distance connectivity to storage systems with NVMe devices
  - Scaling out of the NVMe devices in large solutions
  - Eliminates unnecessary protocol translation
- Two types of fabric transports for NVMe are currently under development:
  - NVMe over Fabrics using RDMA (InfiniBand, RoCE and iWARP)
    - RDMA verbs
  - NVMe over Fabrics using Fibre Channel (FC-NVMe) [10]
    - Standard defining the mapping of NVMe commands over FC
    - Backward compatible with Fibre Channel Protocol (FCP) transporting SCSI

NVMe over Fabrics view

[Figure: a Host running Virtual Machines A, B and C connects through InfiniBand, RoCE, iWARP and FC ports, across an InfiniBand fabric, an Ethernet fabric and a Fibre Channel fabric, to an NVMe Storage box with InfiniBand, RoCE, iWARP and FC ports; the storage box holds three NVMe SSDs with namespaces NS_X, NS_Y and NS_Z.]

NVMe over Fabrics virtualization

- NVMe over Fabrics using FC (NPIV) [9]
- NVMe over Fabrics using RDMA

Other deliberations

- SSD endurance: Drive Writes Per Day (DWPD)
- Virtual Machine migration
- Interrupt driven model or polling
- Virtualization support (VM_IDs in frame)
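The endurance point matters more once a drive is shared, because the writes of every VM count against the same DWPD rating. A minimal arithmetic sketch follows; the workload figures, the capacity and the rating are made-up numbers, the function names are hypothetical, and write amplification inside the SSD is ignored.

```python
def drive_writes_per_day(daily_writes_gb, capacity_gb):
    """DWPD: how many times the full drive capacity is written in one day."""
    return daily_writes_gb / capacity_gb


def shared_drive_dwpd(vm_daily_writes_gb, capacity_gb):
    """Aggregate DWPD when several VMs write to the same drive."""
    return drive_writes_per_day(sum(vm_daily_writes_gb.values()), capacity_gb)


# Made-up example: three VMs sharing one 1600 GB drive rated at 3 DWPD.
workload_gb_per_day = {"VM A": 400, "VM B": 800, "VM C": 400}
rated_dwpd = 3.0

observed_dwpd = shared_drive_dwpd(workload_gb_per_day, capacity_gb=1600)
within_rating = observed_dwpd <= rated_dwpd
```

A placement policy could use a check like this before adding another write-heavy VM to an already busy drive.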

References

[1] NVM Express™ Infrastructure - Exploring Data Center PCIe® Topologies: http://www.nvmexpress.org/wp-content/uploads/NVMe_Infrastructure_final1.pdf
[2] A Comparison of NVMe and AHCI, Don H Walker: https://sata-io.org/system/files/member-downloads/NVMe%20and%20AHCI_%20_long_.pdf
[3] NVM Express Overview: http://www.nvmexpress.org/nvm-express-overview/
[4] IBM PowerVM Virtual I/O Server overview: http://www.ibm.com/support/knowledgecenter/POWER8/p8hb1/p8hb1_vios_virtualioserveroverview.htm
[5] IBM PowerVM “Virtual Fibre Channel”: http://www.ibm.com/support/knowledgecenter/POWER8/p8hb1/p8hat_vfc.htm
[6] IBM PowerVM “Virtual SCSI”: http://www.ibm.com/support/knowledgecenter/POWER8/p8hb1/p8hb1_vios_concepts_stor.htm
[7] NVM Express: SCSI Translation Reference: http://www.nvmexpress.org/wp-content/uploads/NVM-Express-SCSI-Translation-Reference-1_1-Gold.pdf
[8] I/O Virtualization in Enterprise SSDs: http://www.snia.org/sites/default/files/SDC15_presentations/virt/ZhiminDing_IO_Virtualization_eSSD.pdf
[9] FC-NVMe NVMe over Fabrics, QLOGIC: http://www.qlogic.com/Resources/Documents/WhitePapers/Adapters/WP_FC-NVMe.pdf
[10] Networking Flash Storage? Fibre Channel Was Always The Answer!: http://www.snia.org/sites/default/files/DSI/2016/presentations/stor_networks/MarkJonesNetworking_Flash_Fibre_Channel_Jun16-nc.pdf

Credits [IBM Systems]

- Hemanta Dutta, AIX Storage & IO SW
- Mallesh Lepakshaiah, Virtual IO Server
- Ninad Palsule, AIX Virtual Server Storage
- Sanket Rathi, AIX Storage Device Drivers
- Sudhir Maddali, AIX Storage Device Drivers
- Venkata Anumula, AIX Storage Device Drivers

Thank You
