di-ssds: desymmetrized interconnection architecture and

27
DI-SSD: Desymmetrized Interconnection Architecture and Dynamic Timing Calibration for Solid-State Drives Ren-Shuo Liu and Jian-Hao Huang System and Storage Design Lab Department of Electrical Engineering National Tsing Hua University, Taiwan 1

Upload: others

Post on 23-Feb-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DI-SSDs: DeSymmetrized Interconnection Architecture and

DI-SSD: Desymmetrized Interconnection Architecture and Dynamic Timing Calibration for Solid-State Drives

Ren-Shuo Liu and Jian-Hao Huang

System and Storage Design Lab

Department of Electrical Engineering

National Tsing Hua University, Taiwan

1

Page 2: DI-SSDs: DeSymmetrized Interconnection Architecture and

Conventional SSD Architecture

Flash

Symmetric Interconnection (SI) Architecture

SSD

USB, SATA, PCIe, etc.

Flash Controller

2

Page 3: DI-SSDs: DeSymmetrized Interconnection Architecture and

Our Key Contribution

Flash

Symmetric Interconnection (SI) Architecture

Desymmetrized Interconnection (DI) Architecture

SSD

Flash Controller

Flash

SSD

USB, SATA, PCIe, etc. USB, SATA, PCIe, etc.

Flash Controller

3

Page 4: DI-SSDs: DeSymmetrized Interconnection Architecture and

Train Station Stair Analogy

• Desymmetrized widths• Up direction is wider

• Down is narrower

Can accommodate the rush of passengers heading upstairs when a train arrives

Best utilizes the limited space of the train platform

4

Page 5: DI-SSDs: DeSymmetrized Interconnection Architecture and

Outline

• Contributions

• SI vs. DI SSD architecture

• Proof of concept-dynamic timing calibration

• Evaluation

• Conclusions

5

Page 6: DI-SSDs: DeSymmetrized Interconnection Architecture and

Cloud

End Users6

SSDs

Page 7: DI-SSDs: DeSymmetrized Interconnection Architecture and

Read-Dominant Usage is important to SSDs

7

Page 8: DI-SSDs: DeSymmetrized Interconnection Architecture and

Flash chips

O

I

SSD

Co

ntr

olle

r

Page

bu

ffer

WW

Page

bu

ffer

Flash Memory Characteristics

• Sensing a page: 0.1 ms

• Programming a page: 1.3~2.6 ms20× gap

20X

20X

8

Page 9: DI-SSDs: DeSymmetrized Interconnection Architecture and

9

Multiple flash chips

O

I

Flash drive

I/O

Co

ntr

olle

r

Page

bu

ffer

WW

Page

bu

ffer

20X

20X

Symmetric Interconnection (SI)

• Interestingly, the interconnections are symmetric • Standardized by JEDEC and ONFi

Clock Frequency Grades (MHz)

SDR 10 20 28 33 40 50

DDR 20 33 50 66 83 100

DDR2/DDR3 33 40 66 83 100 133 166 200 266 333 400

Page 10: DI-SSDs: DeSymmetrized Interconnection Architecture and

Baseline

10

Multiple flash chips

O

I

Flash drive

I/O

Co

ntr

olle

r

Page

bu

ffer

WW

Page

bu

ffer

SI's Dilemma & Compromise

• Dilemma• High bandwidth overdesign for writes• Low bandwidth bottleneck for reads

• Compromise• Select a speed somewhere between the 20× gap• Sacrifice the read strength of flash memory

Page 11: DI-SSDs: DeSymmetrized Interconnection Architecture and

Real USB Drive Measurement

11

16% Memory sensing

72% Interconnection

16 GB, USB 3.0

DDR 100 MHz clock(200 MT/s)

Ready/Busy

Sequential Read

Page 12: DI-SSDs: DeSymmetrized Interconnection Architecture and

Read-dominateusage scenario

12

Multiple flash chips

I

Interconnect

Co

ntr

olle

r

Page

bu

ffer

WW

Page

bu

ffer

Desymmetrized Interconnection (DI)

• Overcome the dilemma

• Allow dedicating resources to speeding up the flash-to-controller direction• Avoid the cost of speeding up the other

• DI breaks the norm of SSD design for decades!

Page 13: DI-SSDs: DeSymmetrized Interconnection Architecture and

DI Implementation

• DI is architecture-level design

• DI applies to various technologies• Serial, parallel, SDR, DDR, raw flash, managed flash,

etc.

13

2x

1x

2x

1x

Page 14: DI-SSDs: DeSymmetrized Interconnection Architecture and

Dynamic Timing Calibration (DTC)

• Proof-of-concept of DI-SSD on top of SDR flash

Flash Memory

CLKn

m

Write Clock

Read Clock

SATA

, PC

Ie, e

tc.

Rea

d B

uff

er

Data

Flash Controller

DTC Manager Routine

D D D

Fine Offset

Coarse Offset

CLKW

CLKR

14

Page 15: DI-SSDs: DeSymmetrized Interconnection Architecture and

DTC’s Goals

Data 1 2

Clock Cycles1 2 3 4 5 6

Coarse offset

CLKR

Fine offset

3

15

Sweep clock frequencies

Sweep (coarse – fine) offsets

Page 16: DI-SSDs: DeSymmetrized Interconnection Architecture and

DTC’s Benefits

• Reclaim flash’s underutilized frequency margins• Fast/slow process corners

• ±10% supply voltage

• 0~70oC environments

• Exploit the fact that outputs can be overlapped

Data 1 2

Clock Cycles1 2 3 4 5 6

CLKR

1st byte2nd byte

3rd byte

16

Page 17: DI-SSDs: DeSymmetrized Interconnection Architecture and

Correctness Guarantees

• DTC only raises flash output frequency• Data transferring into the flash are intact

• Data retention is intact

• Fallback mechanism• Controller can always fall back to a conservative

output frequency (CLKR)

• Raw bit error rate is intact

17

Page 18: DI-SSDs: DeSymmetrized Interconnection Architecture and

Insignificant Overhead

• Coarse and fine offsets• Few tens of logic gates and a multiplexer

• Few flip flops

• Clock divider• Counter

• DTC routine• Upon power-on or

significant temperaturechanges

• On the processor that perform the FTL

18

Flash Memory

CLKn

m

Write Clock

Read Clock

SATA

, PC

Ie, e

tc.

Rea

d B

uff

er

Data

Flash Controller

DTC Manager Routine

D D D

Fine Offset

Coarse Offset

Page 19: DI-SSDs: DeSymmetrized Interconnection Architecture and

Outline

• Contributions

• SI vs. DI SSD architecture

• Proof of concept-dynamic timing calibration

• Evaluation

• Conclusions

19

Page 20: DI-SSDs: DeSymmetrized Interconnection Architecture and

Experimental Setup

• Chip level (DTC)• Industrial strength IC tester

• Real 20 nm MLC NAND flash

• Emulate DTC and measure its benefits

• System level (DI-SSD)• Extended SSDSim simulator

• Synthetic workloads

• Datacenter workloads

20

Page 21: DI-SSDs: DeSymmetrized Interconnection Architecture and

Swept by DTC

Flash-to-Controller Clock Frequency (MHz)

Off

set

(Cyc

les)

50 100 150 200

1.0

1.5

2.0

2.5

3.0

Chip-Level (DTC) Results

21

Baseline

Page 22: DI-SSDs: DeSymmetrized Interconnection Architecture and

1 .0

0 .1 1 .0

0 .3 0 .3

0 .8 0 .3

0 .7 0 .7 0 .8

0 .6

0 .4 0 .8

0 .3

0 .0 0 .8

0 .0 0 .3 1 .0

0 .0 0 .0 0 .8

0 .0 0 .0 0 .1

0 .0 0 .0 0 .0

0 .0 0 .0 0 .0 0 .0

0 .0 0 .0 0 .0

0 .0 0 .0 0 .0 0 .1

0 .0 0 .0 0 .0 0 .1

0 .0 0 .0 0 .0 0 .0 0 .4

0 .0 0 .0 0 .0 0 .0 0 .1

0 .0 0 .0 0 .0 0 .0 0 .0 0 .4

0 .0 0 .0 0 .0 0 .0 0 .0

0 .0 0 .0 0 .0 0 .1 0 .4

0 .0 0 .0 0 .0 0 .6

0 .0 0 .0 0 .0

Chip-Level (DTC) Results

22

Baseline

Flash-to-Controller Clock Frequency (MHz)

Off

set

(Cyc

les)

50 100 150 200

1.0

1.5

2.0

2.5

3.0

Page 23: DI-SSDs: DeSymmetrized Interconnection Architecture and

Synthetic Workloads

• DI-SSD can yield up to 1.9x read response time speedup

23

Page 24: DI-SSDs: DeSymmetrized Interconnection Architecture and

Datacenter Workloads

• DI-SSD can yield 1.7x read response time speedup on average

24

Page 25: DI-SSDs: DeSymmetrized Interconnection Architecture and

Outline

• Contributions

• SI vs. DI SSD architecture

• Proof of concept-dynamic timing calibration

• Evaluation

• Conclusions

25

Page 26: DI-SSDs: DeSymmetrized Interconnection Architecture and

Conclusions

• Symmetric interconnections (SI) are widely used and accepted in SSD architecture design

• However, SI is suboptimal• Flash sensing is 20× faster than programming

• We propose desymmetrized interconnection SSD architecture (DI-SSD)• Dynamic timing calibration (DTC) as a proof of

concept

• 4× higher flash-to-controller speed

• 1.7 to 1.9× read response time improvement

26

Page 27: DI-SSDs: DeSymmetrized Interconnection Architecture and

DI-SSD: Desymmetrized Interconnection Architecture and Dynamic Timing Calibration for Solid-State Drives

Ren-Shuo Liu and Jian-Hao Huang

27