a flexible and practical open-source infrastructure for ...safari/pubs/softmc_hpca17-talk.pdf ·...

35
SoftMC A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies Hasan Hassan , Nandita Vijaykumar, Samira Khan, Saugata Ghose, Kevin Chang, Gennady Pekhimenko, Donghyuk Lee, Oguz Ergin, Onur Mutlu 1

Upload: hoanghanh

Post on 14-Mar-2019

217 views

Category:

Documents


0 download

TRANSCRIPT

SoftMCA Flexible and Practical

Open-Source Infrastructure for Enabling Experimental DRAM Studies

Hasan Hassan,

Nandita Vijaykumar, Samira Khan,

Saugata Ghose, Kevin Chang,

Gennady Pekhimenko, Donghyuk Lee,

Oguz Ergin, Onur Mutlu

1

Executive Summary• Two critical problems of DRAM: Reliability and Performance

‒ Recently-discovered bug: RowHammer

• Characterize, analyze, and understand DRAM cell behavior

• We design and implement SoftMC, an FPGA-based DRAM

testing infrastructure

‒ Flexible and Easy to Use (C++ API)

‒ Open-source (github.com/CMU-SAFARI/SoftMC)

• We implement two use cases

‒ A retention time distribution test

‒ An experiment to validate two latency reduction mechanisms

• SoftMC enables a wide range of studies

2

Outline

3

1. DRAM Basics & Motivation

2. SoftMC

3. Use Cases

– Retention Time Distribution Study

– Evaluating Recently-Proposed Ideas

4. Future Research Directions

5. Conclusion

DRAM Operations

4

CPU

Memory Controller

DRAM Row

Sense Amplifier

ActivateReadPrecharge

Memory Bus

DRAM Cell

DRAM Latency

5

Activate

time

Read Precharge

Activation Latency

Ready-to-access Latency Precharge

Latency

Activate

0 (refresh) 64 msDRAM

Cell

Sense Amplifier

Retention Time: The interval during which the data

is retained correctly in the DRAM cell without accessing it

Latency vs. Reliability

6

Activate

time

Read Precharge

Activation Latency

Ready-to-access Latency Precharge

Latency

Activate

DRAMCell

Sense Amplifier

Violating latencies negatively affects DRAM reliability

Other Factors Affecting Reliability and Latency

• Temperature

• Voltage

• Inter-cell Interference

• Manufacturing Process

• Retention Time

• …

7

To develop new mechanisms improving reliability and latency,

we need to better understand the effects of these factors

Characterizing DRAM

Many of the factors

affecting DRAM reliability and latency

cannot be properly modeled

8

We need to perform experimental studies

of real DRAM chips

Outline

9

1. DRAM Basics & Motivation

2. SoftMC

3. Use Cases

– Retention Time Distribution Study

– Evaluating Recently-Proposed Ideas

4. Future Research Directions

5. Conclusion

Goals of a DRAM Testing Infrastructure

• Flexibility

‒ Ability to test any DRAM operation

‒ Ability to test any combination of DRAM operations and custom timing parameters

• Ease of use

‒ Simple programming interface (C++)

‒ Minimal programming effort and time

‒ Accessible to a wide range of users

• who may lack experience in hardware design10

SoftMC: High-level View

11

FPGA-based memory characterization infrastructure

Prototype using Xilinx ML605

Easily programmable using

the C++ API

SoftMC: Key Components

12

1. SoftMC API

2. PCIe Driver

3. SoftMC Hardware

SoftMC API

InstructionSequence iseq;

iseq.insert(genACT(bank, row));

iseq.insert(genWAIT(tRCD));

iseq.insert(genWR(bank, col, data));

iseq.insert(genWAIT(tCL + tBL + tWR));

iseq.insert(genPRE(bank));

iseq.insert(genWAIT(tRP));

iseq.insert(genEND());

iseq.execute(fpga);

13

Writing data to DRAM:

Instruction generator functions

SoftMC: Key Components

14

1. SoftMC API

2. PCIe Driver*

3. SoftMC Hardware

Communicates raw data with the FPGA

* Jacobsen, Matthew, et al. "RIFFA 2.1: A reusable integration framework for FPGA accelerators." TRETS, 2015

SoftMC Hardware

15

PC

IeC

on

tro

ller

Instruction ReceiverInstruction

Queue

Inst

ruct

ion

D

isp

atch

er

DDR PHY

Auto-refresh

Controller

Calibration Controller

Read Capture

SoftMC Hardware (FPGA)

Instructions

ActivateReadWait (Ready-to-access Latency)

Data

Host Machine

DRAM

Outline

16

1. DRAM Basics & Motivation

2. SoftMC

3. Use Cases

– Retention Time Distribution Study

– Evaluating Recently-Proposed Ideas

4. Future Research Directions

5. Conclusion

Retention Time Distribution Study

17

Increase the refresh interval

Observe Errors

Read Back

Wait (Refresh Interval)

Write Reference Data to a Row

Can be implemented with just ~100 lines of code

Retention Time Test: Results

18

0

2000

4000

6000

8000

0 1 2 3 4 5 6 7 8

Nu

mb

er o

f E

rro

neo

us

By

tes

Refresh Interval (s)

@ ~20⁰C (room temperature)

Module A

Module B

Module C

Validates the correctness of the SoftMC Infrastructure

Outline

19

1. DRAM Basics & Motivation

2. SoftMC

3. Use Cases

– Retention Time Distribution Study

– Evaluating Recently-Proposed Ideas

4. Future Research Directions

5. Conclusion

Accessing Highly-charged Cells Faster

20

NUAT(Shin+, HPCA 2014)

ChargeCache(Hassan+, HPCA 2016)

A highly-charged cell can be accessed with low latency

How a Highly-Charged Cell Is Accessed Faster?

21

Activate

DRAMCell

Sense Amplifier

time

Read Precharge

Activation Latency

Ready-to-access Latency Precharge

Latency

Activate

0 (refresh) 64 ms

Ready-to-access Latency Test

22

Change the Wait Interval

Observe Errors

Read Back

Wait for theWait Interval

Write Reference

Data

Longer wait

Shorter wait Higher cell charge

Lower cell charge

With custom ready-to-access latency parameter

Can be implemented with just ~150 lines of code

Ready-to-access Latency: Results

23

0

100

200

300

400

500

83

25

68

01

04

12

81

52

17

62

00

22

42

48

27

22

96

32

03

44

36

83

92

41

64

40

46

44

88N

um

be

r o

f E

rro

ne

ou

s B

yte

s

Wait Interval (ms)

6 5 4 3

Latency (cycles)

@ 80⁰C temperature

Real Curves

We do not observe the expected

latency reduction effect in existing DRAM chipsN

um

be

r o

f E

rro

ne

ou

s B

yte

s

Refresh Interval

6 5 4 3

Latency (cycles)

Expected Curves

Why Don’t We See the Latency Reduction Effect?

• The memory controller cannot externally control when a

sense amplifier gets enabled in existing DRAM chips

24

Data 0

Data 1

Cell

time

cha

rge

Sense Amp

R/WACT

Ready to AccessCharge Level

Ready to Access

Fixed Latency!

Enabling the Sense Amplifier

Potential Reduction

Outline

25

1. DRAM Basics & Motivation

2. SoftMC

3. Use Cases

– Retention Time Distribution Study

– Evaluating Recently-Proposed Ideas

4. Future Research Directions

5. Conclusion

Future Research Directions

• More Characterization of DRAM

‒ How are the cell characteristics changing with different generations of technology nodes?

‒ What types of usage accelerate aging?

• Characterization of Non-volatile Memory

• Extensions

‒ Memory Scheduling

‒ Workload Analysis

‒ Testbed for in-memory Computation26

Outline

27

1. DRAM Basics & Motivation

2. SoftMC

3. Use Cases

– Retention Time Distribution Study

– Evaluating Recently-Proposed Ideas

4. Future Research Directions

5. Conclusion

Conclusion• SoftMC: First publicly-available FPGA-based DRAM

testing infrastructure

• Flexible and Easy to Use

• Implemented two use cases

‒ Retention Time Distribution Study

‒ Evaluation of two recently-proposed latency reduction mechanisms

• SoftMC can enable many other studies, ideas, and

methodologies in the design of future memory systems

• Download our first prototype

28

github.com/CMU-SAFARI/SoftMC

SoftMCA Flexible and Practical

Open-Source Infrastructure for Enabling Experimental DRAM Studies

Hasan Hassan,

Nandita Vijaykumar, Samira Khan,

Saugata Ghose, Kevin Chang,

Gennady Pekhimenko, Donghyuk Lee,

Oguz Ergin, Onur Mutlu

29

Backup Slides

30

Key SoftMC Instructions

31

SoftMC @ Github

32

Ready-to-Access Latency Test Results

33

1

10

100

1000

10000

100000

1000000

8

56

10

4

15

2

20

0

24

8

29

6

34

4

39

2

44

0

48

8

Nu

mb

er

of

Err

on

eo

us

By

tes

Wait Interval (ms)

6 5 4 3

Latency (cycles)

0

100

200

300

400

500

8

64

12

0

17

6

23

2

28

8

34

4

40

0

45

6

Nu

mb

er

of

Err

on

eo

us

By

tes

Wait Interval (ms)

6 5 4 3

Latency (cycles)

110

1001000

10000100000

100000010000000

8

48

88

12

8

16

8

20

8

24

8

28

8

32

8

36

8

40

8

44

8

48

8

Nu

mb

er

of

Err

on

eo

us

By

tes

Wait Interval (ms)

6 5 4 3

Latency (cycles)

Module A

Module C

Module B

Activation Latency Test

34

Change the wait interval

Observe Errors

ACT-PRE

Wait for theWait Interval

Write Reference

Data

With low activation latency parameter

Wait for theWait Interval

Read Back

Activation Latency Test Results

35

0

10000

20000

30000

40000

8

64

12

0

17

6

23

2

28

8

34

4

40

0

45

6Nu

mb

er

of

Err

on

eo

us

By

tes

Wait Interval (ms)

14 11 8 5 2

Latency (cycles)

0

200

400

600

800

1000

1200

8

56

10

4

15

2

20

0

24

8

29

6

34

4

39

2

44

0

48

8Nu

mb

er

of

Err

on

eo

us

By

tes

Wait Interval (ms)

14 11 8 5 2

Latency (cycles)

0

400

800

1200

1600

2000

2400

8

48

88

12

8

16

8

20

8

24

8

28

8

32

8

36

8

40

8

44

8

48

8

Nu

mb

er

of

Err

on

eo

us

By

tes

Wait Interval (ms)

14 11 8 5 2

Latency (cycles)

Module A

Module C

Module B