How the Streamlined Architecture of NVM Express Enables High Performance PCIe SSDs

Peter Onufryk, Director of Engineering, IDT

Flash Memory Summit 2012, Santa Clara, CA



The Need for a Large Number of Parallel Commands

[Figure: an NVMe NAND flash controller attached to an array of NAND devices, with a PCIe x8 Gen3 host link (BW ~6 GBps)]

8KB Page

TREAD = 75 µs, 109 MBps Read BW

TPROG = 1 ms, 8 MBps Write BW

Need:

55 parallel 8KB reads

732 parallel 8KB writes
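The two "Need" figures follow from dividing the link bandwidth by the bandwidth a single NAND device can sustain. A minimal sketch of that arithmetic, assuming 8KB means 8192 bytes, ~6 GBps means 6 × 10^9 bytes per second, and one page operation in flight per device (the names below are illustrative, not from the slides):

```python
PAGE_BYTES = 8 * 1024        # 8KB page (assumed to mean 8192 bytes)
T_READ_S = 75e-6             # TREAD = 75 microseconds
T_PROG_S = 1e-3              # TPROG = 1 millisecond
LINK_BYTES_PER_S = 6e9       # PCIe x8 Gen3, ~6 GBps (assumed decimal GB)


def device_bandwidth(page_bytes: int, op_time_s: float) -> float:
    """Bytes per second one NAND device sustains with one page operation in flight."""
    return page_bytes / op_time_s


def parallel_ops_needed(link_bw: float, page_bytes: int, op_time_s: float) -> int:
    """Concurrent page operations needed to fill the link, rounded to nearest."""
    return round(link_bw / device_bandwidth(page_bytes, op_time_s))


read_bw = device_bandwidth(PAGE_BYTES, T_READ_S)
write_bw = device_bandwidth(PAGE_BYTES, T_PROG_S)
reads_needed = parallel_ops_needed(LINK_BYTES_PER_S, PAGE_BYTES, T_READ_S)
writes_needed = parallel_ops_needed(LINK_BYTES_PER_S, PAGE_BYTES, T_PROG_S)

print(f"read BW per device:  {read_bw / 1e6:.0f} MBps")
print(f"write BW per device: {write_bw / 1e6:.0f} MBps")
print(f"parallel reads needed:  {reads_needed}")
print(f"parallel writes needed: {writes_needed}")
```

Under these assumptions the script reproduces the slide's 109 MBps, 8 MBps, 55 reads, and 732 writes. Writes need roughly 13 times more parallelism than reads because a page program takes roughly 13 times longer than a page read.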

Scalable Queuing Interface

[Figure: on the host, each core (Core 0, Core 1, ... Core N) owns one or more I/O submission queues and an I/O completion queue; controller management uses a separate admin submission queue and admin completion queue. The NVMe controller signals each completion queue through its own MSI-X interrupt.]

• Enables NUMA-optimized drivers

One or more I/O submission queues, a completion queue, and an MSI-X interrupt per core

High performance and low latency command issue

No locking between cores

• Up to 2^32 outstanding commands

Support for up to 64K I/O submission and completion queues

Each queue supports up to 64K outstanding commands

Efficient Queuing Interface

Command Submission

1. Host writes command to submission queue

2. Host writes updated submission queue tail pointer to doorbell

Command Processing

3. Controller fetches command

4. Controller processes command

Command Completion

5. Controller writes completion to completion queue

6. Controller generates MSI-X interrupt

7. Host processes completion

8. Host writes updated completion queue head pointer to doorbell

[Figure: submission queue and completion queue rings, each with head and tail pointers, in host memory; submission queue tail doorbell and completion queue head doorbell in the NVMe controller. Numbered arrows: 1 Queue Command, 2 Ring Doorbell New Tail, 3 Fetch Command, 4 Process Command, 5 Queue Completion, 6 Generate Interrupt, 7 Process Completion, 8 Ring Doorbell New Head.]
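The eight steps can be traced in a small simulation. This is a toy sketch, not driver code: the class and attribute names are hypothetical, "processing" a command always succeeds, there is no queue-full check (keep each batch smaller than the queue depth), and the host detects new completions by peeking at the controller's tail, where real hardware would rely on information carried in the completion entry.

```python
class Controller:
    """Toy controller: owns the doorbell registers, reads rings in host memory."""

    def __init__(self, sq, cq):
        self.sq, self.cq = sq, cq      # both rings live in host memory
        self.sq_tail_doorbell = 0      # written by the host in step 2
        self.cq_head_doorbell = 0      # written by the host in step 8
        self.sq_head = 0               # next submission slot to fetch
        self.cq_tail = 0               # next completion slot to fill
        self.interrupts = 0            # number of interrupts raised

    def ring_sq_tail(self, new_tail):
        self.sq_tail_doorbell = new_tail
        while self.sq_head != self.sq_tail_doorbell:
            cmd = self.sq[self.sq_head]                       # 3. fetch command
            self.sq_head = (self.sq_head + 1) % len(self.sq)
            status = 0                                        # 4. process command
            self.cq[self.cq_tail] = {"cid": cmd["cid"], "status": status,
                                     "sq_head": self.sq_head}  # 5. write completion
            self.cq_tail = (self.cq_tail + 1) % len(self.cq)
            self.interrupts += 1                              # 6. generate interrupt

    def ring_cq_head(self, new_head):
        self.cq_head_doorbell = new_head


class Host:
    """Toy host: owns the rings and its own tail/head indexes."""

    def __init__(self, depth=8):
        self.sq = [None] * depth
        self.cq = [None] * depth
        self.sq_tail = 0
        self.cq_head = 0
        self.ctrl = Controller(self.sq, self.cq)

    def submit(self, *commands):
        for cid, opcode in commands:
            self.sq[self.sq_tail] = {"cid": cid, "opcode": opcode}  # 1. queue command
            self.sq_tail = (self.sq_tail + 1) % len(self.sq)
        self.ctrl.ring_sq_tail(self.sq_tail)                  # 2. ring doorbell once

    def reap(self):
        done = []
        while self.cq_head != self.ctrl.cq_tail:              # simulation shortcut
            done.append(self.cq[self.cq_head])                # 7. process completion
            self.cq_head = (self.cq_head + 1) % len(self.cq)
        self.ctrl.ring_cq_head(self.cq_head)                  # 8. ring doorbell
        return done


host = Host(depth=4)
host.submit((1, "read"), (2, "write"))
print([c["cid"] for c in host.reap()])
```

Note that `submit` writes several commands and then rings the doorbell once, and `reap` drains several completions before a single head update. The only register writes on the host side are the two doorbells.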

NVMe Command Arbitration

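[Figure: command arbitration. The admin submission queue (ASQ) is served at high strict priority. Urgent submission queues (SQ) are served round robin (RR) at medium strict priority. High priority, medium priority, and low priority submission queues are each served round robin within their class, and the three classes are combined by weighted round robin (WRR) as high, medium, and low WRR priority; the WRR output is served at low strict priority.]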
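The arbitration structure on this slide can be sketched as follows. This is one possible reading, not the controller's actual algorithm: the credit-based weighting, the default weights, and every name below are assumptions made for illustration. Each submission queue is modeled as a `deque` of commands.

```python
from collections import deque


class RoundRobin:
    """Round robin over a group of submission queues (deques of commands)."""

    def __init__(self, queues):
        self.queues = list(queues)
        self.next = 0

    def pop(self):
        n = len(self.queues)
        for i in range(n):
            q = self.queues[(self.next + i) % n]
            if q:
                self.next = (self.next + i + 1) % n   # resume after the queue served
                return q.popleft()
        return None                                   # every queue in the group is empty


class Arbiter:
    """Strict priority: admin, then urgent, then WRR over high/medium/low."""

    def __init__(self, admin, urgent, high, medium, low, weights=(3, 2, 1)):
        self.admin = admin                            # single admin submission queue
        self.urgent = RoundRobin(urgent)
        self.classes = [RoundRobin(high), RoundRobin(medium), RoundRobin(low)]
        self.weights = tuple(weights)                 # hypothetical WRR weights
        self.credits = list(weights)

    def next_command(self):
        if self.admin:                                # high strict priority
            return self.admin.popleft()
        cmd = self.urgent.pop()                       # medium strict priority
        if cmd is not None:
            return cmd
        for _ in range(2):                            # low strict priority: WRR
            for i, group in enumerate(self.classes):
                if self.credits[i] > 0:
                    cmd = group.pop()
                    if cmd is not None:
                        self.credits[i] -= 1
                        return cmd
            self.credits = list(self.weights)         # credits spent or unusable: refill
        return None


arb = Arbiter(
    admin=deque(["A0"]),
    urgent=[deque(["U0"])],
    high=[deque(["H0", "H1", "H2"]), deque(["h0"])],
    medium=[deque(["M0", "M1"])],
    low=[deque(["L0", "L1"])],
    weights=(2, 1, 1),
)
order = []
while (cmd := arb.next_command()) is not None:
    order.append(cmd)
print(order)
```

With weights (2, 1, 1), the high class gets two commands for every one from the medium and low classes, and within the high class the two queues alternate while both have work.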

Fixed Sized Commands & Completions

[Figure: DWord-by-DWord layouts (bit 31 to bit 0, Byte 3 to Byte 0) of the two queue entry formats.

Submission Queue Entry (64B), DWords 0 to 15: Command Identifier, FUSE, Opcode, Namespace Identifier, Metadata Pointer, PRP Entry 1, PRP Entry 2.

Completion Queue Entry (16B), DWords 0 to 3: SQ Identifier, SQ Head Pointer, Status Field, P, Command Identifier.

Legend: Standard Fields Used By All Commands; Standard Fields Optionally Used By Commands.]
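Because both entry types have a fixed size, each can be described by a single format string. The sketch below packs a 64-byte submission entry and unpacks a 16-byte completion entry with Python's `struct` module. The exact DWord positions are an assumption based on my reading of the NVMe 1.0 specification rather than on the slide, and the helper names and sample values are hypothetical.

```python
import struct

# Assumed little-endian layout of the submission queue entry:
#   DW0      opcode (byte 0), FUSE in bits 1:0 of byte 1, command identifier (bytes 2-3)
#   DW1      namespace identifier
#   DW2-3    reserved
#   DW4-5    metadata pointer
#   DW6-7    PRP entry 1
#   DW8-9    PRP entry 2
#   DW10-15  command specific
SQE = struct.Struct("<BBHI8xQQQ6I")

# Assumed little-endian layout of the completion queue entry:
#   DW0  command specific
#   DW1  reserved
#   DW2  SQ head pointer (low half), SQ identifier (high half)
#   DW3  command identifier (low half), phase tag P (bit 16), status field (bits 31:17)
CQE = struct.Struct("<IIHHHH")


def pack_sqe(opcode, cid, nsid, prp1, prp2=0, mptr=0, fuse=0, cdw10_15=(0,) * 6):
    """Build one fixed-size submission queue entry."""
    return SQE.pack(opcode, fuse & 0x3, cid, nsid, mptr, prp1, prp2, *cdw10_15)


def unpack_cqe(raw):
    """Decode one fixed-size completion queue entry into a dict."""
    dw0, _reserved, sq_head, sq_id, cid, status_phase = CQE.unpack(raw)
    return {
        "dw0": dw0,
        "sq_head": sq_head,
        "sq_id": sq_id,
        "cid": cid,
        "phase": status_phase & 0x1,
        "status": status_phase >> 1,
    }


sqe = pack_sqe(opcode=0x02, cid=7, nsid=1, prp1=0x1000)   # sample values only
cqe = unpack_cqe(CQE.pack(0, 0, 5, 1, 7, 0x0001))         # hand-built sample completion
print(len(sqe), CQE.size, cqe["cid"], cqe["phase"])
```

With a fixed 64-byte entry, slot `i` of a submission queue always starts at byte `64 * i`, so a controller can fetch any entry without first parsing the ones before it.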

Benefit of Fixed Sized Commands

Fixed Sized Commands Simplify Command Parsing, Arbitration, and Error Handling

[Figure: NVMe controller front end. Submission queues in PCIe memory feed a candidate queue selector, an arbiter and element fetch stage, element buffers 0 to N, command issue logic, and command processing / firmware. An inset compares the queue elements held in an element buffer: with variable sized commands, a command (Cmd 0 to Cmd 3) may span several queue elements; with fixed sized commands, each queue element holds exactly one command (Cmd 0 to Cmd 7).]

Simple Optimized Command Set


Admin Commands

Create I/O Submission Queue

Delete I/O Submission Queue

Create I/O Completion Queue

Delete I/O Completion Queue

Get Log Page

Identify

Abort

Set Features

Get Features

Asynchronous Event Request

Firmware Activate (optional)

Firmware Image Download (optional)

NVM Admin Commands

Format NVM (optional)

Security Send (optional)

Security Receive (optional)

NVM I/O Commands

Read

Write

Flush

Write Uncorrectable (optional)

Compare (optional)

Dataset Management (optional)

10 Required Admin Commands

3 Required NVM I/O Commands

NVM Creates New Challenges and Opportunities

[Figure: NVM Controller with Tiered Storage. An NVMe controller on PCIe fronts DRAM, SLC NAND flash, MLC (2-bit) NAND flash, TLC NAND flash, and other NVM (MRAM, PCM …). Its flash translation layer performs logical to physical mapping, from the storage logical block address range to physical NAND flash pages, and wear leveling.]

NVMe Data Set Management Hints

[Figure: two host-to-controller command streams. Traditional Storage Command Set: Read and Write commands carry only an LBA and Num LB. NVMe Command Set with Data Set Management (DSM): Read and Write commands carry an LBA, Num LB, and a DSM field, and standalone DSM commands are interleaved with them.]

NVMe Data Set Management Range Attributes

• Overall DSM Command

Deallocate

Integral write dataset

Integral read dataset

• Per DSM Range

Access size (in logical blocks)

Written in near future

Sequential read

Sequential write

Access latency (longer, typical, small)

Access frequency

o Typical read and write

o Infrequent read and write

o Infrequent write, frequent read

o Frequent write, infrequent read

o Frequent read and write

[Figure: Read and Write commands each carry an LBA, Num LB, and a DSM field; a standalone DSM command carries 1 to 256 LBA ranges, each paired with its own DSM attributes.]
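The per-range attributes listed on this slide can be sketched as a small encoder. The bit positions, the 16-byte range layout (attributes, length in logical blocks, starting LBA), and the numeric values of the enums are assumptions based on my reading of the NVMe specification, not on the slide, and should be checked against the specification before use. All names are hypothetical.

```python
import struct
from enum import IntEnum


class AccessFrequency(IntEnum):
    NO_INFO = 0
    TYPICAL_READ_WRITE = 1
    INFREQUENT_READ_WRITE = 2
    INFREQUENT_WRITE_FREQUENT_READ = 3
    FREQUENT_WRITE_INFREQUENT_READ = 4
    FREQUENT_READ_WRITE = 5


class AccessLatency(IntEnum):
    NO_INFO = 0
    LONGER = 1
    TYPICAL = 2
    SMALL = 3


def context_attributes(freq, latency, seq_read=False, seq_write=False,
                       written_soon=False, access_size=0):
    """Pack the per-range hints into one 32-bit word (assumed bit positions)."""
    return (
        (freq & 0xF)                      # bits 3:0   access frequency
        | ((latency & 0x3) << 4)          # bits 5:4   access latency
        | (int(seq_read) << 8)            # bit 8      sequential read
        | (int(seq_write) << 9)           # bit 9      sequential write
        | (int(written_soon) << 10)       # bit 10     written in near future
        | ((access_size & 0xFF) << 24)    # bits 31:24 access size in logical blocks
    )


def pack_ranges(ranges):
    """Serialize 1 to 256 (attributes, num_blocks, start_lba) tuples, 16 bytes each."""
    if not 1 <= len(ranges) <= 256:
        raise ValueError("a DSM command carries 1 to 256 ranges")
    return b"".join(struct.pack("<IIQ", attrs, nlb, slba)
                    for attrs, nlb, slba in ranges)


hot = context_attributes(AccessFrequency.FREQUENT_READ_WRITE, AccessLatency.SMALL,
                         seq_read=True, access_size=8)
payload = pack_ranges([(hot, 256, 0x1000)])
print(hex(hot), len(payload))
```

A controller with tiered storage could use a hint like `hot` to place the range in a faster tier, and the opposite hint (infrequent read and write, longer latency) to place a range in a denser, slower one.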

Out-Of-Order Data

Possible Sources of Out-Of-Order Data

NAND or page TRead variation

Target/LUN conflict

o Operations associated with same command (e.g., multiple reads to NAND)

o Different operation (e.g., previously issued program or erase)

NAND error handling

o ECC correction time variation, read-retry, …

Flash channel conflict

[Figure: a Read(7-0) command at the NVMe NAND flash controller fans out to the NAND devices holding data 0 through 7 (one device holds 0, 1, 2; one device is busy with an Erase). Between the controller's buffer and PCIe, the data is shown in the order D7 D0 D6 D5 D1 D2 D4 D3.]

Traditional Scatter Gather List (SGL)

[Figure: a Read moves NAND data D0 to D7 from the controller to chunks C0 to C5 of host physical memory. The SGL consists of entries a through f, each holding an Address and a Length, that describe C0 through C5; entries a to d (C0 to C3) and entries e and f (C4 and C5) are shown as two separate lists.]

I/O Operation and Host Memory

[Figure: a read(bufPtr, numBytes) call. In process virtual memory the buffer is contiguous: it starts at bufPtr, at a page offset inside page C0, and extends numBytes through page C8. In host physical memory the same pages C0 to C8 are scattered.]

NVMe Physical Region Pages (PRPs)

[Figure: the same read(bufPtr, numBytes) buffer, contiguous in process virtual memory and scattered across pages C0 to C8 of host physical memory, now described by PRPs while a Read returns NAND data D0 to D8. Each PRP entry is a Page Address plus an Offset: the entry for C0 carries the buffer's page offset, and the entries for C1 through C8 carry no offset (shown as "-").]
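The benefit for out-of-order data can be sketched in a few lines. Since every PRP entry after the first covers exactly one whole page, the host address for any byte of the transfer is found by indexing, with no walk over variable-length entries as a scatter gather list would need. The function names, the `page_table` dictionary (virtual page number to physical page address), and the 4 KB page size are assumptions for illustration; splitting entries between PRP Entry 1, PRP Entry 2, and PRP lists is not modeled.

```python
PAGE_SIZE = 4096  # assumed memory page size (a power of two)


def build_prps(buf_ptr, num_bytes, page_table, page_size=PAGE_SIZE):
    """Return one PRP entry per physical page touched by the buffer.

    Only the first entry keeps an offset within its page; the rest are page aligned.
    """
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    offset = buf_ptr % page_size
    first_vpn = buf_ptr // page_size
    last_vpn = (buf_ptr + num_bytes - 1) // page_size
    prps = []
    for vpn in range(first_vpn, last_vpn + 1):
        phys_page = page_table[vpn]
        prps.append(phys_page + offset if vpn == first_vpn else phys_page)
    return prps


def host_address(prps, byte_index, page_size=PAGE_SIZE):
    """Physical address for byte `byte_index` of the transfer, found by indexing."""
    offset = prps[0] % page_size
    page, within = divmod(byte_index + offset, page_size)
    page_base = prps[page] - prps[page] % page_size   # strip the first entry's offset
    return page_base + within


# Three virtually contiguous pages that are scattered in physical memory.
page_table = {16: 0x9000, 17: 0x3000, 18: 0x7000}
prps = build_prps(buf_ptr=16 * PAGE_SIZE + 0x200, num_bytes=8192, page_table=page_table)
print([hex(p) for p in prps])
print(hex(host_address(prps, 0xE00)))
```

If the NAND data for the middle of the transfer arrives first, `host_address` still resolves its destination in constant time, which is the sense in which PRPs simplify out-of-order data delivery.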

Summary

• Scalable and Efficient Queuing Interface

Low overhead command issue and completion

Parallel command execution

• Fixed Sized Commands

Straightforward command fetch, parsing and arbitration

• Simple Command Set (3 required I/O commands)

Fast command processing

• Data Set Management Hints

Controller optimization of data placement

• Physical Region Pointers

Simplified out-of-order data delivery
