solid state drives (ssds) - androbenchcsl.skku.edu/uploads/eee3052f17/15-ssd.pdf · 2017. 11....
EEE3052: Introduction to Operating Systems, Fall 2017, Jinkyu Jeong ([email protected])
Solid State Drives (SSDs)
Jinkyu Jeong ([email protected])
Computer Systems Laboratory
Sungkyunkwan University
http://csl.skku.edu
Memory Types
• ROM
  – High-density
  – Reliable
  – Low-cost
  – Suitable for high production with stable code
• EPROM
  – Non-volatile
  – High-density
  – Ultraviolet light for erasure
• EEPROM
  – Non-volatile
  – Lower reliability
  – Higher cost
  – Lowest density
  – Electrically byte-erasable
• DRAM
  – High-density
  – Low-cost
  – High-speed
  – High-power
• FLASH
  – High-density
  – Low-cost
  – High-speed
  – Low-power
  – High reliability
Source: Intel Corporation.
Flash Memory Characteristics
• Erase-before-write
  – Read
  – Write or Program: 1 → 0
  – Erase: 0 → 1
• Bulk erase
  – Program unit:
    • NOR: byte or word
    • NAND: page
  – Erase unit: block
[Figure: an erased page (1 1 1 1 1 1 1 1) becomes 1 1 0 1 1 0 1 0 after a write (program) and returns to 1 1 1 1 1 1 1 1 after an erase]
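The erase-before-write rule can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a page is modeled as an 8-bit integer, and the names `program`, `erase`, and `ERASED` are hypothetical, not part of any real flash interface.

```python
PAGE_BITS = 8                      # toy page width (assumption for illustration)
ERASED = (1 << PAGE_BITS) - 1      # an erased page reads as all 1s

def program(cell: int, data: int) -> int:
    """Programming can only clear bits (1 -> 0), modeled here as a bitwise AND."""
    return cell & data

def erase() -> int:
    """Erasing resets every bit to 1 (in real NAND this happens per block)."""
    return ERASED

page = erase()                              # 0b11111111
page = program(page, 0b11011010)            # bits cleared where data has 0s
overwrite = program(page, 0b11111111)       # cannot turn any 0 back into 1
page_after_erase = erase()                  # only an erase restores the 1s

print(f"{page:08b} {overwrite:08b} {page_after_erase:08b}")
```

Because a program operation can never set a bit back to 1, updating data in place would require erasing the whole block first, which is why the later slides introduce address remapping.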
Logical View of NAND Flash
• A collection of blocks
• Each block has a number of pages
• The size of a block or a page depends on the technology (and both keep getting larger)
[Figure: Block 0 to Block n-1, each containing Page 0 to Page m-1; each page consists of a data area and a spare area]
NAND Flash Types
• SLC NAND
  – Single Level Cell
  – 1 bit/cell
• MLC NAND
  – Multi-level Cell (misnomer)
  – 2 bits/cell
• TLC NAND
  – Triple-level Cell
  – 3 bits/cell
• 3D NAND
Source: Micron Technology, Inc.
NAND Applications
• USB Flash Drives (UFDs)
• Flash cards
  – CompactFlash, MMC, SD, Memory Stick, …
• Smartphones
  – eMMC (Embedded MMC)
  – UFS (Universal Flash Storage)
• SSDs (Solid State Drives)
• Other embedded devices
  – MP3 players, digital TVs, set-top boxes, car navigators, …
Commercial SSDs
http://www.enuri.com (As of May 14, 2016)
Anatomy of an SSD
• Samsung 850 Evo
[Figure: board photo with labels for the SSD controller, NAND flash, and DRAM]
http://www.anandtech.com/show/9451/the-2tb-samsung-850-pro-evo-ssd-review
SSD Internals
HDDs vs. SSDs
Feature | SSD (Samsung) | HDD (Seagate)
Model | MZ-75E2T0B (850 Evo) | ST2000LM003 (SpinPoint M9T)
Capacity | 2TB (128Gb 32-layer 3D V-NAND TLC x 16 dies/channel x 8 channels) | 2TB (3 discs, 6 heads, 5400 RPM)
Form factor | 2.5", 66g | 2.5", 130g
DRAM | 2GB | 32MB
Host interface | SATA-3 (6.0 Gbps) | SATA-3 (6.0 Gbps)
Power consumption (Active/Idle/Sleep) | 3.7, 4.7W / 0.5W / 0.05W | 2.3W / 0.7W / 0.18W
Sequential read | 544 MB/s | 124 MB/s
Sequential write | 520 MB/s | 124 MB/s
Random read | 97,687 IOPS (11,335 IOPS at QD1) | 56 IOPS
Random write | 89,049 IOPS (38,433 IOPS at QD1) | 98 IOPS
Other | (none listed) | Power-on to ready: 3.5 sec; average seek: 12/14 ms; average latency: 5.6 ms
Price (3) | 1,009,380 won (505 won/GB) | 117,060 won (59 won/GB)

Test conditions: 850 Evo (1): sequential 128KB/QD2, random 4KB/QD32. M9T (2): sequential 2MB, random 4KB.

1 http://www.tomshardware.com/reviews/samsung-850-evo-850-pro-2tb-ssd,4205.html
2 http://www.storagereview.com/samsung_spinpoint_m9t_hard_drive_review
3 http://www.enuri.com (As of Sep. 27, 2015)
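The capacity and price-per-GB figures in the comparison can be cross-checked with simple arithmetic. The sketch below assumes the per-GB prices were computed against a nominal 2,000 GB; that divisor is an assumption, though it does reproduce the quoted numbers.

```python
# SSD raw capacity: 128 Gbit per die x 16 dies per channel x 8 channels
die_gbit = 128
dies_per_channel = 16
channels = 8
raw_gbit = die_gbit * dies_per_channel * channels   # total gigabits
raw_gbyte = raw_gbit // 8                           # 8 bits per byte

# Price per GB, assuming a nominal capacity of 2,000 GB (assumption)
NOMINAL_GB = 2000
ssd_won_per_gb = round(1_009_380 / NOMINAL_GB)
hdd_won_per_gb = round(117_060 / NOMINAL_GB)

# How many times more random-read IOPS the SSD delivers (QD32 vs. HDD figure)
random_read_ratio = 97_687 / 56

print(raw_gbyte, ssd_won_per_gb, hdd_won_per_gb, round(random_read_ratio))
```

The raw NAND works out to 2,048 GB, which matches the advertised 2TB, and the rounded prices match the 505 won/GB and 59 won/GB shown in the table.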
State of the Art
• World’s first 2.5” SAS 32TB SSD @ Flash Memory Summit 2016
Source: THESSDREVIEW, Samsung Newsroom
State of the Art
• Z-SSD @ Flash Memory Summit 2016
  – 4 times faster than NVMe Flash SSDs
Source: Samsung Newsroom
NAND Constraints
• No in-place update
  – Requires sector remapping (or address translation)
• Bit errors
  – Require the use of error correction codes (ECCs)
• Bad blocks
  – Factory-marked and run-time bad blocks
  – Require bad block remapping
• Limited program/erase cycles
  – < 100K for SLCs, < 3K for MLCs, < 1K for TLCs
  – Require wear-leveling
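One way to picture wear-leveling is an allocator that always hands out the least-erased block. The sketch below is a hypothetical toy, not how any particular SSD implements it: the class `WearLeveler`, the retirement policy, and the limit of 1,000 cycles (chosen to echo the TLC figure above) are all assumptions for illustration.

```python
class WearLeveler:
    """Toy wear-leveling allocator: always reuse the least-erased live block."""

    def __init__(self, num_blocks: int, max_pe_cycles: int):
        self.erase_counts = [0] * num_blocks
        self.max_pe_cycles = max_pe_cycles
        self.retired = set()          # blocks treated as run-time bad blocks

    def pick_block(self) -> int:
        live = [b for b in range(len(self.erase_counts)) if b not in self.retired]
        if not live:
            raise RuntimeError("all blocks are worn out")
        # Ties are broken by the lowest block number.
        return min(live, key=lambda b: self.erase_counts[b])

    def erase(self, pbn: int) -> None:
        self.erase_counts[pbn] += 1
        if self.erase_counts[pbn] >= self.max_pe_cycles:
            self.retired.add(pbn)     # assumed policy: retire at the cycle limit

wl = WearLeveler(num_blocks=4, max_pe_cycles=1000)
for _ in range(10):
    wl.erase(wl.pick_block())

print(wl.erase_counts)
```

With this policy the erase counts of any two live blocks never differ by more than one, so no single block reaches its program/erase limit far ahead of the others.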
Flash Translation Layer (FTL)
• A software layer to make NAND flash fully emulate traditional block devices (e.g. disks)
[Figure, left: File System (Read Sectors / Write Sectors) + Device Driver on top of Flash Memory (Read / Write / Erase): Mismatch!]
[Figure, right: File System (Read Sectors / Write Sectors) + Device Driver + FTL (Read Sectors / Write Sectors) on top of Flash Memory]
Source: Zeen Info. Tech.
Address Mapping
• Required since flash pages cannot be overwritten
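[Figure: a write to the LBA address space (as seen by the host) goes through the mapping table; the new data lands in a different NAND flash page, and the page holding the old data is left behind]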
Example: Page Mapping
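• Flash configuration
  – Page size: 4KB
  – # of pages / block = 4
• Current state
  – Written to page 0, 1, 2, 8, 4, 5
• Reading page 5
  – Logical page #5 (0000000101) is looked up in the page map table and found at PPN 5
[Figure: Page map table (LPN → PPN): 0 → 0, 1 → 1, 2 → 2, 4 → 4, 5 → 5, 8 → 3; LPNs 3, 6, 7, 9, 10, 11 unmapped]
[Figure: Data blocks (PPN: LPN): PBN 0 = 0: 0, 1: 1, 2: 2, 3: 8; PBN 1 = 4: 4, 5: 5, 6 and 7 free; PBN 2 (PPN 8 to 11) and PBN 3 (PPN 12 to 15) free]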
Example: Page Mapping
• Flash configuration
  – Page size: 4KB
  – # of pages / block = 4
• Current state
  – Written to page 0, 1, 2, 8, 4, 5
• New requests (in order)
  – Write to page 9
  – Write to page 3
  – Write to page 5
[Figure: Page map table (LPN → PPN): 0 → 0, 1 → 1, 2 → 2, 4 → 4, 5 → 5, 8 → 3; LPNs 3, 6, 7, 9, 10, 11 unmapped]
[Figure: Data blocks (PPN: LPN): PBN 0 = 0: 0, 1: 1, 2: 2, 3: 8; PBN 1 = 4: 4, 5: 5, 6 and 7 free; PBN 2 (PPN 8 to 11) and PBN 3 (PPN 12 to 15) free]
Example: Page Mapping
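• Flash configuration
  – Page size: 4KB
  – # of pages / block = 4
• Current state
  – Written to page 0, 1, 2, 8, 4, 5
• New requests (in order)
  – Write to page 9
  – Write to page 3
  – Write to page 5
[Figure, after the write to page 9: Page map table (LPN → PPN): 0 → 0, 1 → 1, 2 → 2, 4 → 4, 5 → 5, 8 → 3, 9 → 6; LPNs 3, 6, 7, 10, 11 unmapped]
[Figure: Data blocks (PPN: LPN): PBN 0 = 0: 0, 1: 1, 2: 2, 3: 8; PBN 1 = 4: 4, 5: 5, 6: 9, 7 free; PBN 2 (PPN 8 to 11) and PBN 3 (PPN 12 to 15) free]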
Example: Page Mapping
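• Flash configuration
  – Page size: 4KB
  – # of pages / block = 4
• Current state
  – Written to page 0, 1, 2, 8, 4, 5
• New requests (in order)
  – Write to page 9
  – Write to page 3
  – Write to page 5
[Figure, after the write to page 3: Page map table (LPN → PPN): 0 → 0, 1 → 1, 2 → 2, 3 → 7, 4 → 4, 5 → 5, 8 → 3, 9 → 6; LPNs 6, 7, 10, 11 unmapped]
[Figure: Data blocks (PPN: LPN): PBN 0 = 0: 0, 1: 1, 2: 2, 3: 8; PBN 1 = 4: 4, 5: 5, 6: 9, 7: 3; PBN 2 (PPN 8 to 11) and PBN 3 (PPN 12 to 15) free]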
Example: Page Mapping
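• Flash configuration
  – Page size: 4KB
  – # of pages / block = 4
• Current state
  – Written to page 0, 1, 2, 8, 4, 5
• New requests (in order)
  – Write to page 9
  – Write to page 3
  – Write to page 5
[Figure, after the write to page 5: Page map table (LPN → PPN): 0 → 0, 1 → 1, 2 → 2, 3 → 7, 4 → 4, 5 → 8 (was 5), 8 → 3, 9 → 6; LPNs 6, 7, 10, 11 unmapped]
[Figure: Data blocks (PPN: LPN): PBN 0 = 0: 0, 1: 1, 2: 2, 3: 8; PBN 1 = 4: 4, 5: 5 (invalidated old page), 6: 9, 7: 3; PBN 2 = 8: 5 (updated page write), 9 to 11 free; PBN 3 (PPN 12 to 15) free]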
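The page-mapping walkthrough can be reproduced with a small simulation. This is a minimal sketch under the example's configuration (4 blocks of 4 pages, pages allocated in PPN order); the class `PageMapFTL` and its fields are hypothetical names, and garbage collection is deliberately left out.

```python
PAGES_PER_BLOCK = 4
NUM_BLOCKS = 4

class PageMapFTL:
    """Toy page-mapping FTL without garbage collection (illustration only)."""

    def __init__(self):
        self.l2p = {}                                        # page map table: LPN -> PPN
        self.flash = [None] * (NUM_BLOCKS * PAGES_PER_BLOCK) # PPN -> (LPN, valid) or None
        self.next_ppn = 0                                    # next free physical page

    def write(self, lpn: int) -> int:
        if self.next_ppn >= len(self.flash):
            raise RuntimeError("out of free pages; garbage collection needed")
        old = self.l2p.get(lpn)
        if old is not None:
            self.flash[old] = (lpn, False)   # invalidate the old page, no overwrite
        ppn = self.next_ppn
        self.flash[ppn] = (lpn, True)        # program the new copy
        self.l2p[lpn] = ppn                  # point the map at the new copy
        self.next_ppn += 1
        return ppn

    def read(self, lpn: int) -> int:
        return self.l2p[lpn]                 # the PPN that holds the data

ftl = PageMapFTL()
for lpn in [0, 1, 2, 8, 4, 5]:               # current state
    ftl.write(lpn)
ppn_of_page_5 = ftl.read(5)                  # reading page 5

new_ppns = [ftl.write(lpn) for lpn in [9, 3, 5]]   # new requests, in order
print(ppn_of_page_5, new_ppns, ftl.l2p[5], ftl.flash[5])
```

The run matches the slides: page 5 is first read from PPN 5, the three new writes land in PPN 6, 7, and 8, and the old copy of page 5 in PPN 5 is marked invalid rather than overwritten.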
Garbage Collection
• Garbage collection (GC)
  – Eventually, FTL will run out of blocks to write to
  – GC must be performed to reclaim free space
  – Actual GC procedure depends on the mapping scheme
• GC in page-mapping FTL
  – Select victim block(s)
  – Copy all valid pages of victim block(s) to free block
  – Erase victim block(s)
  – Note: At least one free block should be reserved for GC
Example: GC in Page Mapping
• Current state
  – Written to page 0, 1, 2, 8, 4, 5
  – Written to page 9, 3, 5
• New requests (in order)
  – Write to page 8
  – Write to page 9
  – Write to page 3
  – Write to page 1
  – Write to page 4
[Figure: Page map table (LPN → PPN): 0 → 0, 1 → 1, 2 → 2, 3 → 7, 4 → 4, 5 → 8, 8 → 3, 9 → 6; LPNs 6, 7, 10, 11 unmapped]
[Figure: Data blocks (PPN: LPN): PBN 0 = 0: 0, 1: 1, 2: 2, 3: 8; PBN 1 = 4: 4, 5: 5 (invalid), 6: 9, 7: 3; PBN 2 = 8: 5, 9 to 11 free; PBN 3 = spare block]
Example: GC in Page Mapping
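• Current state
  – Written to page 0, 1, 2, 8, 4, 5
  – Written to page 9, 3, 5
• New requests (in order)
  – Write to page 8
  – Write to page 9
  – Write to page 3
  – Write to page 1
  – Write to page 4
[Figure, after the write to page 8: Page map table (LPN → PPN): 0 → 0, 1 → 1, 2 → 2, 3 → 7, 4 → 4, 5 → 8, 8 → 9, 9 → 6; LPNs 6, 7, 10, 11 unmapped]
[Figure: Data blocks (PPN: LPN): PBN 0 = 0: 0, 1: 1, 2: 2, 3: 8 (invalid); PBN 1 = 4: 4, 5: 5 (invalid), 6: 9, 7: 3; PBN 2 = 8: 5, 9: 8, 10 and 11 free; PBN 3 = spare block]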
Example: GC in Page Mapping
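• Current state
  – Written to page 0, 1, 2, 8, 4, 5
  – Written to page 9, 3, 5
• New requests (in order)
  – Write to page 8
  – Write to page 9
  – Write to page 3
  – Write to page 1
  – Write to page 4
[Figure, after the write to page 9: Page map table (LPN → PPN): 0 → 0, 1 → 1, 2 → 2, 3 → 7, 4 → 4, 5 → 8, 8 → 9, 9 → 10; LPNs 6, 7, 10, 11 unmapped]
[Figure: Data blocks (PPN: LPN): PBN 0 = 0: 0, 1: 1, 2: 2, 3: 8 (invalid); PBN 1 = 4: 4, 5: 5 (invalid), 6: 9 (invalid), 7: 3; PBN 2 = 8: 5, 9: 8, 10: 9, 11 free; PBN 3 = spare block]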
Example: GC in Page Mapping
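• Current state
  – Written to page 0, 1, 2, 8, 4, 5
  – Written to page 9, 3, 5
• New requests (in order)
  – Write to page 8
  – Write to page 9
  – Write to page 3
  – Write to page 1
  – Write to page 4
[Figure, after the write to page 3: Page map table (LPN → PPN): 0 → 0, 1 → 1, 2 → 2, 3 → 11, 4 → 4, 5 → 8, 8 → 9, 9 → 10; LPNs 6, 7, 10, 11 unmapped]
[Figure: Data blocks (PPN: LPN): PBN 0 = 0: 0, 1: 1, 2: 2, 3: 8 (invalid); PBN 1 = 4: 4, 5: 5 (invalid), 6: 9 (invalid), 7: 3 (invalid); PBN 2 = 8: 5, 9: 8, 10: 9, 11: 3; PBN 3 = spare block]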
Example: GC in Page Mapping
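• Current state
  – Written to page 0, 1, 2, 8, 4, 5
  – Written to page 9, 3, 5
• New requests (in order)
  – Write to page 8
  – Write to page 9
  – Write to page 3
  – Write to page 1
  – Write to page 4
[Figure, during the write to page 1: Page map table (LPN → PPN): 0 → 0, 1 → 13, 2 → 2, 3 → 11, 4 → 12, 5 → 8, 8 → 9, 9 → 10; LPNs 6, 7, 10, 11 unmapped]
[Figure: Data blocks (PPN: LPN): PBN 0 = 0: 0, 1: 1 (invalid), 2: 2, 3: 8 (invalid); PBN 1 (victim) = 4: 4 (copied out), 5: 5 (invalid), 6: 9 (invalid), 7: 3 (invalid); PBN 2 = 8: 5, 9: 8, 10: 9, 11: 3; PBN 3 = 12: 4 (valid page copy), 13: 1 (updated page write), 14 and 15 free]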
Example: GC in Page Mapping
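• Current state
  – Written to page 0, 1, 2, 8, 4, 5
  – Written to page 9, 3, 5
• New requests (in order)
  – Write to page 8
  – Write to page 9
  – Write to page 3
  – Write to page 1
  – Write to page 4
[Figure, after the write to page 4: Page map table (LPN → PPN): 0 → 0, 1 → 13, 2 → 2, 3 → 11, 4 → 14, 5 → 8, 8 → 9, 9 → 10; LPNs 6, 7, 10, 11 unmapped]
[Figure: Data blocks (PPN: LPN): PBN 0 = 0: 0, 1: 1 (invalid), 2: 2, 3: 8 (invalid); PBN 1 = erased (PPN 4 to 7 free), now the spare block; PBN 2 = 8: 5, 9: 8, 10: 9, 11: 3; PBN 3 = 12: 4 (invalid), 13: 1, 14: 4, 15 free]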
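The whole GC example can be replayed with a toy page-mapping FTL. This is a sketch under stated assumptions, not the course's reference implementation: victim selection is greedy (fewest valid pages), one free block is always kept in reserve, and the class and method names are hypothetical.

```python
class PageMapFTL:
    """Toy page-mapping FTL with greedy garbage collection (illustration only)."""

    def __init__(self, num_blocks=4, pages_per_block=4):
        self.ppb = pages_per_block
        self.l2p = {}                            # page map table: LPN -> PPN
        self.p2l = {}                            # valid pages only: PPN -> LPN
        self.free_blocks = list(range(num_blocks))
        self.active = self.free_blocks.pop(0)    # block currently being filled
        self.offset = 0                          # next free page in the active block
        self.used_blocks = []                    # fully written blocks (GC candidates)

    def _valid_pages(self, pbn):
        first = pbn * self.ppb
        return [p for p in range(first, first + self.ppb) if p in self.p2l]

    def _place(self, lpn):
        """Program lpn into the next free page and update both maps."""
        ppn = self.active * self.ppb + self.offset
        self.offset += 1
        old = self.l2p.get(lpn)
        if old is not None:
            del self.p2l[old]                    # invalidate the old copy
        self.l2p[lpn] = ppn
        self.p2l[ppn] = lpn
        return ppn

    def _open_new_block(self):
        self.used_blocks.append(self.active)
        if len(self.free_blocks) > 1:            # keep one block reserved for GC
            self.active, self.offset = self.free_blocks.pop(0), 0
            return
        # Only the spare block is left: select a victim (greedy policy, assumed).
        victim = min(self.used_blocks, key=lambda b: len(self._valid_pages(b)))
        if len(self._valid_pages(victim)) == self.ppb:
            raise RuntimeError("no invalid pages to reclaim")
        self.used_blocks.remove(victim)
        self.active, self.offset = self.free_blocks.pop(0), 0
        for ppn in self._valid_pages(victim):
            self._place(self.p2l[ppn])           # copy valid pages to the spare block
        self.free_blocks.append(victim)          # erase victim; it is the new spare

    def write(self, lpn):
        if self.offset == self.ppb:
            self._open_new_block()
        return self._place(lpn)

ftl = PageMapFTL()
for lpn in [0, 1, 2, 8, 4, 5, 9, 3, 5]:          # current state
    ftl.write(lpn)
gc_ppns = [ftl.write(lpn) for lpn in [8, 9, 3, 1, 4]]   # new requests, in order
print(gc_ppns, ftl.l2p, ftl.free_blocks)
```

The write to page 1 finds no free page outside the spare block, so PBN 1 (one valid page) is chosen as the victim, page 4 is copied to PPN 12, and page 1 then goes to PPN 13. After the final write the map shows 4 → 14 and PBN 1 is the new spare block, as in the last slide of the example.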
OS Implications
• NAND flash has different characteristics compared to disks
  – No seek time
  – Asymmetric read/write access times
  – No in-place-update
  – Good sequential read/write and random read performance, but bad random write performance
  – Wear-leveling
  – …
  – Traditional operating systems have been optimized for disks. What should be changed?
SSD Support in OS
• Turn off “defragmentation” for SSDs
• New “TRIM” command
  – Remove-on-delete
• Simpler I/O scheduler
• Align file system partition with SSD layout
• Flash-aware file systems (e.g. F2FS in Linux)
• Larger block size (4KB)
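Why TRIM helps can be shown with a toy model: without it, the FTL keeps treating the pages of a deleted file as valid and would copy them during garbage collection. The sketch below is hypothetical (the class `TinyFTL` and its `trim` method are illustrative names, not a real device or OS interface).

```python
class TinyFTL:
    """Toy FTL that only tracks which physical pages hold live data."""

    def __init__(self):
        self.l2p = {}          # LPN -> PPN
        self.valid = set()     # PPNs the FTL believes are still in use
        self.next_ppn = 0

    def write(self, lpn: int) -> None:
        old = self.l2p.get(lpn)
        if old is not None:
            self.valid.discard(old)
        self.l2p[lpn] = self.next_ppn
        self.valid.add(self.next_ppn)
        self.next_ppn += 1

    def trim(self, lpn: int) -> None:
        """The host says this logical page no longer holds useful data."""
        old = self.l2p.pop(lpn, None)
        if old is not None:
            self.valid.discard(old)

def pages_gc_must_copy(ftl: TinyFTL) -> int:
    return len(ftl.valid)

no_trim, with_trim = TinyFTL(), TinyFTL()
for ftl in (no_trim, with_trim):
    for lpn in range(4):               # a file occupying logical pages 0..3
        ftl.write(lpn)

# The file system deletes the part of the file stored in pages 2 and 3.
# Without TRIM the device is never told; with TRIM it can drop the mappings.
with_trim.trim(2)
with_trim.trim(3)

print(pages_gc_must_copy(no_trim), pages_gc_must_copy(with_trim))
```

In this run the device without TRIM still considers four pages valid, while the one that received TRIM considers only two valid, so garbage collection has less data to copy.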
Beauty and the Beast
• NAND Flash memory is a beauty
  – Small, light-weight, robust, low-cost, low-power non-volatile device
• NAND Flash memory is a beast
  – Much slower program/erase operations
  – No in-place-update
  – Erase unit > write unit
  – Limited lifetime
  – Bit errors, bad blocks, …
• Software support is essential for performance and reliability!
Beyond Flash
• Resistance-based memory technologies
Source: IEEE Computer, August 2013.