K2: Work-Constraining Scheduling of NVMe-Attached Storage
Till Miemietz, Hannes Weisbach, Michael Roitzsch and Hermann Härtig
Presentation at the 40th IEEE Real-Time Systems Symposium
Hong Kong · 4th of December 2019
What are the implications of fast storage devices for real-time systems?
What May a Modern Storage Stack Look Like?
[Figure: the Linux storage stack. Applications (e.g., file systems) running on CPUs 0-2 submit bios to the block layer; an I/O scheduler-specific staging queue scheme turns them into requests, which pass through FIFO-only dispatching queues to the driver; the driver sends NVMe commands to the SSD via NVMe queue pairs.]
What May a Modern Storage Stack Look Like?
[Figure: inside the SSD. NVMe commands from the driver arrive through submission/completion queue pairs (SQ/CQ) at a controller with a Flash Translation Layer (FTL), which maps requests onto flash packages organized into blocks and pages.]
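The NVMe queue pairs in these figures are rings shared between host and controller. The fragment below is a minimal C model of the host-side bookkeeping for one pair; field names and sizes are simplified assumptions, not the layout from the NVMe specification:

    #include <stdbool.h>
    #include <stdint.h>

    #define QDEPTH 32                /* entries per ring (illustrative) */

    struct sq_entry { uint8_t opcode; uint64_t slba; uint16_t nlb; };
    struct cq_entry { uint16_t sq_head; uint16_t status; };

    struct queue_pair {
        struct sq_entry sq[QDEPTH];  /* submission ring, host writes   */
        struct cq_entry cq[QDEPTH];  /* completion ring, device writes */
        uint16_t sq_tail;            /* next free SQ slot (host-owned) */
        uint16_t sq_head;            /* mirrored from completions      */
        uint16_t cq_head;            /* next CQ entry the host reads   */
    };

    /* The SQ is full when advancing the tail would hit the head. */
    static bool sq_full(const struct queue_pair *qp)
    {
        return (uint16_t)((qp->sq_tail + 1) % QDEPTH) == qp->sq_head;
    }

    /* Host side of a submission: fill the slot and advance the tail;
     * a real driver would then write sq_tail to the SQ doorbell. */
    static bool sq_submit(struct queue_pair *qp, struct sq_entry e)
    {
        if (sq_full(qp))
            return false;
        qp->sq[qp->sq_tail] = e;
        qp->sq_tail = (qp->sq_tail + 1) % QDEPTH;
        return true;
    }

Completion entries report how far the controller has consumed the submission ring, which is how sq_head is updated on the host side.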
Latency Characteristics of SSDs – Motivation
● The gap between CPU and storage devices is shrinking
→ Speedup of 1000x compared to HDDs
→ Up to 40% of storage latency is caused by software
● The high degree of abstraction (FTL) is a source of non-determinism
→ Garbage Collection
→ Caching
→ Scheduling of multiple NVMe queue pairs
● Can host-sided I/O schedulers still be used to enforce latency goals?
Dissecting Linux' Block I/O Schedulers
● Performing micro-benchmarks in a simulated real-time scenario
→ Samsung 970 EVO (250 GB, NVMe 1.3) on a desktop system
● Multiple background processes used to create load on the drive
→ Goal is bandwidth maximization
● Single foreground high-priority process that periodically issues requests
→ Goal is fast access to the drive
● Analyse latency characteristics of the RT process under different I/O schedulers (a fio sketch of this setup follows)
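Such a load pattern can be approximated with fio. The job file below is a rough sketch of the setup, not the authors' actual configuration; the device path, job count, and timing values are assumptions:

    ; hypothetical fio job approximating the benchmark scenario
    [global]
    filename=/dev/nvme0n1
    ioengine=libaio
    direct=1
    runtime=60
    time_based

    ; background jobs: maximize drive bandwidth
    [background]
    rw=randread
    bs=4k
    iodepth=32
    numjobs=4

    ; foreground RT job: one periodic request at a time
    [realtime]
    rw=randread
    bs=4k
    iodepth=1
    prioclass=1
    thinktime=10ms

Here prioclass=1 marks the job as real-time for I/O priority-aware schedulers, and thinktime inserts a pause between requests to emulate periodic issuing.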
Block I/O Schedulers in Linux 4.15
● None (default)
● mq-deadline
→ Orders requests by target block address
● Kyber
→ Core-local, balances latencies of coarse-grained request classes
● Budget Fair Queueing (BFQ)
→ Bandwidth control per process
→ Aware of I/O priorities
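For reference, the active scheduler of a block device can be inspected and switched at run time through sysfs (nvme0n1 is an example device name):

    cat /sys/block/nvme0n1/queue/scheduler
    echo kyber > /sys/block/nvme0n1/queue/scheduler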
Latency Characteristics for Random Reads
● Both real-time and background processes issue small random reads (4K)
→ The x-axis shows achieved bandwidth; the colour of the plotting symbols depicts the targeted throughput
● Plots look similar for larger block sizes
Latency Characteristics for Write-Only Workloads
● Writing is very fast as long as the drive-internal SLC cache absorbs the writes
● Garbage collection drastically increases the storage latency of the RT process
Dissecting Linux’ I/O Schedulers – Lessons Learned
● Reading is often slower than writing
→ Write caching avoids synchronous access to the second-level flash cells
● No performance isolation
→ Real-time process faces high latency when SSD is fully loaded
→ BFQ is unable to enforce priorities correctly
● From a latency viewpoint, the I/O schedulers show little difference
→ However, complex implementations incur latency penalties of up to 10%
K2: Work-Constraining I/O Scheduling
● Work-conserving behavior of current schedulers is not optimal w.r.t. latencies
→ Stalls high-priority read requests
→ Amplifies effects of garbage collection
● Concept: Limit the number of requests that are served in parallel (sketched in the code below)
→ Device-wide limit of inflight requests
→ Requests are stored in per-priority FIFO queues
→ Submit new requests on completion of previous ones
● Queue length as a tunable parameter
→ Trade global throughput for softly bounded I/O latency of high-priority processes
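A user-space sketch of this dispatch discipline follows. It is illustrative only; the names, the number of priority levels, and the limit of 8 are assumptions, and the real K2 runs as an I/O scheduler inside the Linux block layer:

    /* Sketch of work-constraining dispatch: per-priority FIFO
     * staging queues plus a device-wide inflight limit. */
    #include <stddef.h>

    #define NUM_PRIOS      4   /* priority levels; 0 = highest (assumed) */
    #define INFLIGHT_LIMIT 8   /* device-wide cap, the tunable parameter */

    struct request {
        struct request *next;  /* FIFO chaining */
        int prio;              /* 0 = highest priority */
    };

    static struct request *head[NUM_PRIOS], *tail[NUM_PRIOS];
    static int inflight;       /* requests currently at the device */

    /* Stage a new request in the FIFO queue of its priority. */
    static void k2_enqueue(struct request *rq)
    {
        rq->next = NULL;
        if (tail[rq->prio])
            tail[rq->prio]->next = rq;
        else
            head[rq->prio] = rq;
        tail[rq->prio] = rq;
    }

    /* Oldest request of the highest non-empty priority, or NULL. */
    static struct request *k2_pick(void)
    {
        for (int p = 0; p < NUM_PRIOS; p++) {
            struct request *rq = head[p];
            if (rq) {
                head[p] = rq->next;
                if (!head[p])
                    tail[p] = NULL;
                return rq;
            }
        }
        return NULL;
    }

    /* Work-constraining dispatch: stop at the inflight limit even
     * while staged requests remain, so an arriving high-priority
     * request always finds a short device queue. */
    static void k2_dispatch(void (*submit)(struct request *))
    {
        while (inflight < INFLIGHT_LIMIT) {
            struct request *rq = k2_pick();
            if (!rq)
                break;
            inflight++;
            submit(rq);
        }
    }

    /* A completion frees one slot; refill from the staging queues. */
    static void k2_complete(void (*submit)(struct request *))
    {
        inflight--;
        k2_dispatch(submit);
    }

With this structure, a high-priority request waits for at most INFLIGHT_LIMIT in-flight requests plus its own service time, which is what bounds the tail latency at the cost of peak throughput.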
Evaluation of K2 – Random Reads (64K)
● Limiting the length of the device queues enforces the correct service order
→ Tested K2 with queue lengths of 8, 16, and 32
→ Note the different scales of the y-axes!
● Host gains flexibility to enforce quick submission of real-time requests
Evaluation of K2 – Sequential Write Operations (64K)
● Performance for very fast operations similar to mq-deadline and Kyber
● K2 cannot avoid garbage collection but mitigates its impact on RT applications
Evaluation of K2 – Application Benchmark
● Tested the read-only OLTP benchmarks of sysbench with read/write background load
Summary
● Fast SSDs impose new challenges on the OS to enforce timely access to storage
→ Complex abstractions cause non-determinism of performance parameters
● Current I/O schedulers are not suitable for real-time demands
→ No performance isolation
→ Overridden by drive-internal scheduler
● K2: work-constraining I/O scheduling to limit storage access latency
→ Trade throughput for lower tail-latencies
→ Improve worst-case latency up to 10x for reading, 6.8x for writing
→ Works with off-the-shelf components
Additional Slides
Latency Characteristics for Random Reads (All Percentiles)
● Both real-time and background processes issue small random reads (4K)
Latency Characteristics for Write-Only Workloads
● Garbage collection also affects lower percentiles
Latency Characteristics for Mixed Workloads (64K)
● Access times of real-time application are similar to reading
Impact of Scheduler Complexity
● For small random writes, complex policies have a notable impact on overall latency
Evaluation of K2 – Random Reads (64K, All Percentiles)
● Limiting the length of the device queues enforces correct service order
Evaluation of K2 – Mixed Workloads (64K)
● Reduction of tail latencies is also present for mixed read / write requests
● Bandwidth penalty reduced by fast write operations
Evaluation – Comparison of I/O Schedulers
● Table shows the 99.9th percentile of latency at maximum throughput
→ The throughput loss of K2 is mitigated by large block sizes
Evaluation – Comparison of I/O Schedulers
● Table shows throughput at the maximum 99.9th percentile of latency