NVMW 2014: Extending Main Memory with Flash – the Optimized SWAP Approach

Jihyung Park, Hyuck Han and Sangyeun Cho, Memory Solutions Lab, Memory Business: Extending Main Memory with Flash – the Optimized SWAP Approach

Upload: benoit-hudzia

Post on 20-Jun-2015


DESCRIPTION

Title: Extending Main Memory with Flash-the Optimized SWAP Approach Author: Jihyung Park, Hyuck Han, Sangyeun Cho Memory Solutions Lab, Memory Business, Samsung Electronics

TRANSCRIPT

Page 1: NVMW 2014, Extending Main Memory with Flash – the Optimized SWAP Approach

Jihyung Park, Hyuck Han and Sangyeun Cho
Memory Solutions Lab
Memory Business

Extending Main Memory with Flash – the Optimized SWAP Approach

Page 2

1. Introduction
2. Optimized SWAP
3. Evaluation
4. Future Work
5. Conclusion

Page 3: Introduction

Why extend main memory with flash?
• To overcome DRAM scaling limitations and offer a large working memory
• To reduce total cost of ownership (acquisition and operation)
• Flash has no seek time
• Flash has lower latency than HDD

Two approaches toward memory extension
• Non-transparent approach: the application has to change
• Transparent approach: the application is NOT aware of the underlying flash

Page 4: Motivation

The current swap algorithm is optimized for HDD

Paging for a fast device
• Fast and simple vs. heavy and accurate

Page 5: Optimized SWAP

Swap entry search
• A new search algorithm

I/O path optimization
• Swap read-ahead
• I/O scheduler
• Swappiness

Swap device as backing store: inclusive vs. exclusive
• We adjust the swap entry free policy to enforce that the swap device “includes” all swapped-out pages
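The inclusive policy can be illustrated with a toy model. This is only a sketch under assumptions: the class `InclusiveSwap` and its methods are hypothetical names, and the idea that keeping a swap entry lets a clean page be evicted again without a second write is one plausible reading of the slide, not something the slides state.

```python
class InclusiveSwap:
    """Toy model: a page keeps its swap slot after swap-in (inclusive policy)."""

    def __init__(self):
        self.slot_of = {}    # page id -> swap slot that holds a copy
        self.dirty = set()   # pages modified since their last write-out
        self.writes = 0      # number of writes issued to the swap device
        self._next_slot = 0

    def swap_out(self, page):
        if page in self.slot_of and page not in self.dirty:
            return  # the copy on the device is still valid: no write needed
        if page not in self.slot_of:
            self.slot_of[page] = self._next_slot
            self._next_slot += 1
        self.writes += 1
        self.dirty.discard(page)

    def swap_in(self, page):
        # Inclusive: the slot is NOT freed here, so the device copy stays valid.
        return self.slot_of[page]

    def touch_write(self, page):
        # The in-memory page now differs from its copy on the swap device.
        self.dirty.add(page)


swap = InclusiveSwap()
swap.swap_out("a")      # first eviction writes the page
swap.swap_in("a")
swap.swap_out("a")      # clean page, slot kept: no second write
writes_clean = swap.writes
swap.swap_in("a")
swap.touch_write("a")
swap.swap_out("a")      # dirty page must be written again, into the same slot
```

An exclusive policy would free the slot in `swap_in`, so every later eviction would have to allocate a slot and write the page again.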

Page 6: Optimized SWAP

Tree search
• “Bit tree”: no pointers; each node is just one byte
• Fan-out degree is 8 (each bit points to one child node)
• An 8-level tree covers multiple terabytes of swap space
• Search cost: 2O(log N)
• Reduces swap structure size
  – Roughly 10MB for the current swap mechanism vs. 2MB for O-Swap (to support 32GB of swap space)

[Figure: example bit tree with nodes numbered 0 to 9]
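A minimal sketch of how such a bit tree could work, assuming a set bit means "this child still has a free swap slot" and that nodes are stored level by level in flat byte arrays, so children are located by index arithmetic instead of pointers. The class `BitTree` and its methods are hypothetical; the slides do not show the actual O-Swap code.

```python
FANOUT = 8  # one bit per child, so a node fits in a single byte


class BitTree:
    """Pointer-free index of free slots; a set bit marks a child with free space."""

    def __init__(self, levels):
        self.levels = levels
        # Level d holds 8**d one-byte nodes; every slot starts out free.
        self.nodes = [bytearray([0xFF]) * (FANOUT ** d) for d in range(levels)]

    @property
    def capacity(self):
        return FANOUT ** self.levels  # bits of the last level are the slots

    def alloc(self):
        """Return the lowest free slot and mark it used, or None if full."""
        if self.nodes[0][0] == 0:
            return None
        idx = 0
        for d in range(self.levels):  # downward pass: follow lowest set bit
            byte = self.nodes[d][idx]
            bit = (byte & -byte).bit_length() - 1
            idx = idx * FANOUT + bit
        self._set(idx, free=False)
        return idx

    def free(self, slot):
        self._set(slot, free=True)

    def _set(self, slot, free):
        idx = slot
        for d in range(self.levels - 1, -1, -1):  # upward pass
            idx, bit = divmod(idx, FANOUT)
            node = self.nodes[d]
            if free:
                was_empty = node[idx] == 0
                node[idx] |= 1 << bit
                if not was_empty:
                    break  # ancestors already advertise free space
            else:
                node[idx] &= ~(1 << bit) & 0xFF
                if node[idx] != 0:
                    break  # subtree still has free slots


tree = BitTree(levels=3)                    # 8**3 = 512 slots
first = [tree.alloc() for _ in range(10)]   # slots 0..9
tree.free(3)
reused = tree.alloc()                       # lowest free slot is 3 again
```

In this sketch an allocation makes one downward pass to find a slot and at most one upward pass to update the ancestors, each bounded by the tree depth. That may be what the slide's 2O(log N) refers to, but this reading is an assumption.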

Page 7: Optimized SWAP

Read-ahead
• No read-ahead (swap accesses are random)
• Note also that an SSD has no seek time

I/O scheduler
• NOOP (due to randomness and fast response requirements)
• Bypass

Swappiness
• swappiness: 0

Swap entry reclaim policy
• Avoid freeing swap entries as far as possible
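On a stock Linux kernel, some of these choices can be approximated with existing tunables. The sketch below only builds the settings and the matching shell commands; it does not apply them. The helper names and the device name `nvme0n1` are hypothetical, and the slides' O-Swap changes live in the kernel itself, so this is an approximation rather than the authors' configuration.

```python
def oswap_like_settings(device="nvme0n1"):
    """Map stock Linux tunables to values in the spirit of the slide."""
    return {
        # page-cluster is a log2 value: 0 means one page per swap-in,
        # which disables swap read-ahead.
        "/proc/sys/vm/page-cluster": "0",
        "/proc/sys/vm/swappiness": "0",
        # "noop" exists on the legacy block layer; multi-queue kernels
        # call the equivalent choice "none".
        f"/sys/block/{device}/queue/scheduler": "noop",
    }


def render_commands(settings):
    """Render each setting as a shell command (to be run as root)."""
    return [f"echo {value} > {path}" for path, value in sorted(settings.items())]


commands = render_commands(oswap_like_settings())
```

The swap entry reclaim policy and the "Bypass" item have no equivalent tunable in this sketch.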

Page 8: Evaluation - Memcached

System
• CPU: Xeon E5-2665 (HT disabled)
• Cores: 16
• Network: 10Gb Ethernet
• SSD: Samsung XS1715 (NVMe)

Workload: YCSB
• DB size: 30GB
• Value length: 2048B
• memcached threads: 64
• Clients: 320
• Get : Update: 95% : 5%

Memory
• SWAP: DRAM 8GB, SSD swap 32GB
• OSWAP: DRAM 8GB, SSD swap 32GB
• Full DRAM: DRAM 32GB

Page 9: Evaluation - Memcached

[Figure: Memcached (NVMe, 10Gb network). Chart of operations per second (x10,000) for SWAP, OSWAP and Full DRAM; y-axis from 0 to 14.]

Page 10: Evaluation - Memcached

[Figure: SWAP performance by latency segment. Operations per second (x1,000), y-axis from 0 to 8, across latency segments 256us, 512us, 1024us, 2ms, 4ms, 8ms, 16ms, 32ms, 64ms, 128ms, 256ms and 512ms; the segments below 1ms are marked “< 1ms QoS”.]

Page 11: Evaluation - Memcached

[Figure: OSWAP performance by latency segment. Operations per second (x1,000), y-axis from 0 to 25, across latency segments 256us, 512us, 1024us, 2ms, 4ms, 8ms, 16ms, 32ms, 64ms, 128ms, 256ms and 512ms; the segments below 1ms are marked “< 1ms QoS”.]

Page 12: Evaluation - Memcached

[Figure: Full DRAM performance by latency segment. Operations per second (x10,000), y-axis from 0 to 12, across latency segments 256us, 512us, 1024us, 2ms, 4ms, 8ms, 16ms, 32ms, 64ms and 128ms; the segments below 1ms are marked “< 1ms QoS”.]
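The latency-segment charts group operations into buckets whose bounds double from 256us upward. A minimal sketch of that bucketing follows, assuming each label is an upper bound in microseconds and that the “< 1ms QoS” region ends at the 1024us bucket. The function names are hypothetical and the sample latencies are invented for illustration; they are not measurements from the slides.

```python
from collections import Counter

# Upper bounds in microseconds: 256, 512, 1024, ..., 524288 (about 512ms).
BOUNDS_US = [256 * 2 ** i for i in range(12)]


def segment(latency_us):
    """Return the upper bound of the first bucket that holds the latency."""
    for bound in BOUNDS_US:
        if latency_us <= bound:
            return bound
    return BOUNDS_US[-1]  # clamp outliers into the last bucket


def histogram(latencies_us):
    """Count operations per latency segment."""
    return Counter(segment(x) for x in latencies_us)


def share_within(latencies_us, limit_us=1024):
    """Fraction of operations that land in buckets up to limit_us."""
    hist = histogram(latencies_us)
    within = sum(n for bound, n in hist.items() if bound <= limit_us)
    return within / len(latencies_us)


samples_us = [100, 300, 900, 1500, 70000]   # made-up values
hist = histogram(samples_us)
qos_share = share_within(samples_us)
```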

Page 13: Evaluation - Linkbench

System
• CPU: Xeon E5-2665 (HT disabled)
• Cores: 16
• Network: 10Gb Ethernet
• SSD: Samsung XS1715 (NVMe)

Workload: Linkbench
• DB size: 30GB
• Clients: 400

Memory
• SWAP: DRAM 8GB, SSD swap 32GB
• OSWAP: DRAM 8GB, SSD swap 32GB
• Full DRAM: DRAM 32GB

Page 14: Evaluation - Linkbench

[Figure: Linkbench. Chart of requests per second (x1,000) for SWAP, OSWAP and Full DRAM; y-axis from 0 to 14.]

Page 15: Future Work

Rack scale architecture

High performance memory + high capacity memory

[Figure: rack-scale architecture diagram. Labels: Compute (CPUs, DRAM), PCIe <-> Ctrl, Memory cable, Memory Device (Ctrl, Memory).]

Page 16: Conclusion

Cost-effective memory capacity

Exploit flash memory transparently
