fstream: managing flash streams in the file system · eunhee rho, kanchan joshi, seung-uk shin,...

19
FStream: Managing Flash Streams in the File System Eunhee Rho, Kanchan Joshi, Seung-Uk Shin, Nitesh Jagadeesh Shetty, Joo-Young Hwang, Sangyeun Cho, Daniel DG Lee, Jaeheon Jeong Memory Division, Samsung Electronics Co., Ltd.

Upload: dokhanh

Post on 04-May-2018

214 views

Category:

Documents


1 download

TRANSCRIPT

FStream: Managing Flash Streams

in the File System

Eunhee Rho, Kanchan Joshi, Seung-Uk Shin, Nitesh Jagadeesh Shetty, Joo-Young Hwang, Sangyeun Cho, Daniel DG Lee, Jaeheon Jeong

Memory Division, Samsung Electronics Co., Ltd.

Table of Contents

• Flash-based SSDs

• Garbage Collection & WAF

• Multi-stream

• FStream

• Workload Analysis & Experimental Results

• Conclusion

2/17

Flash-based Solid State Drives Replacement of HDDs

• Flash Translation Layer (FTL) allows SSDs to maintain traditional block interface

Application

OS

File System

Application

OS

File System

HDD SSD

Block Layer Block Layer

Common Interface

Faster

Energy Efficient

More Reliable

FTL

3/18

Garbage Collection & WAF Garbage Collection (GC) Overheads

• Reclaiming space for empty blocks requires valid page copy

• Media write amplified due to garbage collection

• Shortens SSD lifetime and hampers performance

Write Amplification Factor (WAF)

• Ratio of the actual media writes to the user I/O

4/18

Multi-stream Managing data placement on a SSD with streams

• Mapping data to separate stream by their life expectancy

Standardization status

• T10 (SCSI) standard & NVME 1.3 “directives”

5/18

Multi-stream Cont’d Data Placement Comparison

H1

C1

H2

C2

C3

H3

H4

C4

Writes =

H1, C1, H2, C2,

C3, H3, H4, C4

1

H1

C1

H2

C2

C3

H3

H4

C4

H1’

H2’

H4’

H3’

H1

H2

H4

H3

C1

C2

C3

C4

New Writes =

H1’, H2’, H3’, H4’

2

H1

H2

H3

H4

C1

C2

C3

C4

Stream = X Stream = Y

Stream = Y

H1’

H2’

H4’

H3’

Stream = X

Conventional SSD Multi-streamed SSD

Free Space Fragmentation!

Valid page copy required to reclaim the free space.

Return to free block

6/18

FStream Motivation

• We need easier, general method of stream assignment.

• Block device layer has limited information about data lifetime.

• File system metadata has different lifetime from user data, need be separated.

Our Approach

• File system level stream assignment.

• Separate streams for file system metadata, journal, and user data.

• Implemented FStream in existing file systems.

[1]

[2]

[1] Kang, JU et al., “The Multi-streamed Solid-State Drive”, HotStorage ’14 [2] Yang, Jingpei et al., “AutoStream: automatic stream management for multi-streamed SSDs”, SYSTOR ‘17

7/18

EXT4 metadata and journaling

• EXT4 on-disk layout: block groups with data and metadata related to it

• EXT4 journal: write ordering in ‘data=ordered’ mode

Ext4

8/18

Ext4Stream Mount options

• Journal-stream

• Separate journal writes

• Inode-stream

• Separate inode writes

• Dir-stream

• Separate directory blocks

• Misc-stream

• Inode/block bitmap and group-descriptor

• Fname-stream

• Distinct stream to file(s) with specific names

• Extn-stream

• File-extension based stream

9/18

XFS XFS metadata and journaling

• Parallel metadata operations, metadata buffering (page cache not used)

• Mixture of logical and physical journaling

• Minimum inode update size is a chunk of 64 inodes.

10/18

XFStream Mount options

• Log-stream

• Separate journal writes

• Inode-stream

• Separate inode writes

• Fname-stream

• Distinct stream to file(s) with specific names

11/18

Application Specific Data Separation

Upon Write Request..

1) Writes to Commit Log - For error recovery

2) Writes to Memtable - In memory space

3) Returns success

4) Data in Memtable are

flushed to SSTable

5) SSTables go through compaction

Memory

Commit Log

Memtable

SSTable 1 K1 K2

SSTable 2 K1 K3

SSTable 3 K2 K3

SSTable 4 K1 K3

SSTable 5 K1 K2 K3

SSTable 6 SSTable 7

SSTable 21

Flush

ing

Write

Request

Stream for Cassandra’s commit log file.

• Fname_stream option: commitlog-*

12/18

Experimental Setup

OS:

• Linux kernel v4.5 with io-streamid support

System:

• Dell PowerEdge R720 server with 32 cores and 32GB memory

SSD:

• Samsung PM963 480GB with streams support

Benchmarks:

• Filebench: Varmail & Fileserver

• YCSB on Cassandra

13/18

Filebench Workload Analysis Varmail

Fileserver

Journal 61.0%

Inode 8.0%

Directory 4.0%

Other meta 0.2%

Data 26.8%

EXT4

Journal 60% Inode

9%

Data 31%

XFS

Inode 21.0% Directory

15.8%

Other meta 0.2%

Data 63.0%

EXT4 No Journal

Journal 26%

Inode 16%

Directory 3%

Other meta 0%

Data 55%

EXT4

Journal 16%

Inode 32%

Data 52%

XFS Workload Conditions - Filled 80% of disk before test

- Number of test files: 900,000 (14GB)

- Varmail : fsync-intensive

- Runtime: 2 hours

XFS writes more inodes (random writes) than Ext4.

14/18

Filebench: Performance Fstream achieved 5 ~ 35% performance improvements.

0

20000

40000

60000

80000

100000

120000

Ext4

Ext4Stream

Ext4-NJ

Ext4Stream-NJ

XFS

XFStream

0

20000

40000

60000

80000

100000

120000

Ext4

Ext4Stream

XFS

XFStream

Ops/

sec

Ops/

sec

Varmail Fileserver

15/18

Filebench: WAF

1.3

1.04

1.6

1.09 1.2

1.05

0

0.5

1

1.5

2

Ext4

Ext4Stream

Ext4-NJ

Ext4Stream-NJ

XFS

XFStream

1.1 1.02 1.2

1.05

0

0.5

1

1.5

2

Ext4

Ext4Stream

XFS

XFStream

WA

F

WA

F

Varmail Fileserver

Fstream achieved WAF of close to one.

Ext4’s WAF < Ext4NJ’s WAF

• Journal is written in a circular fashion, so is invalidated periodically.

16/18

YCSB on Cassandra Results Data intensive workload

• Load phase: 1KB record x 120 million inserts

• Run phase: 1KB record x 80 million inserts

0

10000

20000

30000

40000

50000

60000

70000

Ext4

Ext4Stream

XFS

XFStream

2

1.1

2

1.2

0

0.5

1

1.5

2

2.5

Ext4

Ext4Stream

XFS

XFStream

Ops/

sec

WA

F 17/18

Conclusion and Acknowledgements SSD Performance & Lifetime

• The less FTL garbage collection overheads, the longer SSD lives and the faster SSD performs.

Streams: SSD interface for separating data with different lifetimes

FStream: stream assignment in file system

• Separate streams for file system metadata, journal, and user data.

• Provide filename and extension based user data separation.

• Achieved 5~35% performance improvement and near 1 WAF for filebench.

Acknowledgements

• We thank Cristian Ungureanu, our shepherd, and anonymous reviewers for their feedbacks.

18/18

T H A N K

Y O U