on the off-chip memory latency of real-time systems: is...

49
On the Off-chip Memory Latency of Real-Time Systems: Is DDR DRAM Really the Best Option? Mohamed Hassan

Upload: others

Post on 25-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

On the Off-chip Memory Latency of Real-Time

Systems: Is DDR DRAM Really the Best Option?

Mohamed Hassan

Page 2: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Outline

01 02 0403 05Motivation DRAMs PREDICTABILITY RLDRAM Results

Page 3: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

SRAMs MOTIVATION

1

• Historically, SRAMs have been the option for

real-time safety-critical embedded systems

• With the increase in data demand

→ the cost became unaffordable

Page 4: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Work in Off-chip Memory MOTIVATION

[Burchard et al, DATE][Heithecker and Ernst, DAC]

Smart PhonesIBM’s Acorn

2005

20112007

2009 2013

Predator[Akesson et al., CODES+ISSS]

AMC[Paolieri et al., ESL]

PRET[Reineke et al., CODES+ISSS]

COP [Goossens et al.]MultiChannel [Gomony et al.]

ORP [Zheng et al.]

MCMC[Ecco et al., RTCSA2014]DCmc [Jalle et al., RTSS2014]

[Kim et al., RTAS2014]ROC[Krishnapillai et al., ECRTS2014]

RTMem[Li et al., ECRTS2014]ReOrder[Ecco et al., ECRTS2016]

[Guo and Pellizzoni, RTAS2017]

PMC [Hassan et al., RTAS2015][Kim et al., RTAS2015]

MEDUSA[Valsan and Yun, CPSNA2015][Yun et al., ECRTS2015]

ReoRorder[Ecco and Ernst, RTSS2015]

[Hassan and Pellizzoni,EMSOFT2018]

2014

2015

2016/17

2018

1

Page 5: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Work in Off-chip Memory MOTIVATION

[Burchard et al, DATE][Heithecker and Ernst, DAC]

Smart PhonesIBM’s Acorn

2005

20112007

2009 2013

Predator[Akesson et al., CODES+ISSS]

AMC[Paolieri et al., ESL]

PRET[Reineke et al., CODES+ISSS]

COP [Goossens et al.]MultiChannel [Gomony et al.]

ORP [Zheng et al.]

MCMC[Ecco et al., RTCSA2014]DCmc [Jalle et al., RTSS2014]

[Kim et al., RTAS2014]ROC[Krishnapillai et al., ECRTS2014]

RTMem[Li et al., ECRTS2014]ReOrder[Ecco et al., ECRTS2016]

[Guo and Pellizzoni, RTAS2017]

PMC [Hassan et al., RTAS2015][Kim et al., RTAS2015]

MEDUSA[Valsan and Yun, CPSNA2015][Yun et al., ECRTS2015]

ReoRorder[Ecco and Ernst, RTSS2015]

[Hassan and Pellizzoni,EMSOFT2018]

2014

2015

2016/17

2018

All in Double Data Rate (DDR) DRAMs

1

Page 6: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

A Context about DDRs MOTIVATION

• Low cost

• Large capacity

• High BW

• DDR DRAM is the commodity off-chip memory, Why?

• What is the most important requirement for real-time/safety-critical systems?

• Yes, Predictability

• How is DDRx for predictability?

• DDRx Random Access Memories are not Random at all!!

• Access latency varies notably based on many factors• access patterns

• transaction type (read or write)

• DRAM state from previous accesses

2

Page 7: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

DRAM

• Multiplexed address mode:

• The address bits are split into two segments provided to the

device in two stages:

1. Row address → row decoder

2. Column address → column decoder

✓ Low cost (less pin count)

High latency

Huge variability

Background

3

Page 8: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

DRAM

• A request in general can consist of one, two, or three

commands:

• ACTIVATE (A) command:

• Bring data row from cells into sense amplifiers

Background

3

Page 9: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

DRAM

• DRAM Consists of multiple banks

• The memory controller (MC) manages accesses to DRAM

• A request in general consists of:

• ACTIVATE (A) command:

• Bring data row from cells into sense amplifiers

• Read/Write (R/W) commands:

• To read/write from specific columns in

the sense amplifiers

Background

3

Page 10: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

DRAM

• DRAM Consists of multiple banks

• The memory controller (MC) manages accesses to DRAM

• A request in general consists of:

• ACTIVATE (A) command:

• Bring data row from cells into sense amplifiers

• Read/Write (R/W) commands:

• To read/write from specific columns in

the sense amplifiers

• PRECHARGE (P) command:

• to write back a previous row in the sense

amplifiers before bringing the new one

Background

3

Page 11: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

DRAM

• DRAM Consists of multiple banks

• The memory controller (MC) manages accesses to DRAM

• A request in general consists of:

• ACTIVATE (A) command:

• Bring data row from cells into sense amplifiers

• Read/Write (R/W) commands:

• To read/write from specific columns in

the sense amplifiers

• PRECHARGE (P) command:

• to write back a previous row in the sense

amplifiers before bringing the new one

Background

Row Conflict: P+ A + R/W

Row Idle/Close: A + R/W

Row Hit: R/W

3

Page 12: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Background DRAM

• DRAM Consists of multiple banks

• The memory controller (MC) manages accesses to DRAM

• A request in general consists of:

• ACTIVATE command

• R/W commands

• PRECHARGE command

• All commands have associated timing constraints that have

to be satisfied by the controller (20+ timing constraints)

R1

DATAtRL

0 37 41

A0

DATAtWL tWTR

27

A1

W0

3 9

tRCD

3

Page 13: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Work in Off-chip Memory MOTIVATION

[Burchard et al, DATE][Heithecker and Ernst, DAC]

Smart PhonesIBM’s Acorn

2005

20112007

2009 2013

Predator[Akesson et al., CODES+ISSS]

AMC[Paolieri et al., ESL]

PRET[Reineke et al., CODES+ISSS]

COP [Goossens et al.]MultiChannel [Gomony et al.]

ORP [Zheng et al.]

MCMC[Ecco et al., RTCSA2014]DCmc [Jalle et al., RTSS2014]

[Kim et al., RTAS2014]ROC[Krishnapillai et al., ECRTS2014]

RTMem[Li et al., ECRTS2014]ReOrder[Ecco et al., ECRTS2016]

[Guo and Pellizzoni, RTAS2017]

PMC [Hassan et al., RTAS2015][Kim et al., RTAS2015]

MEDUSA[Valsan and Yun, CPSNA2015][Yun et al., ECRTS2015]

ReoRorder[Ecco and Ernst, RTSS2015]

[Hassan and Pellizzoni,EMSOFT2018]

2014

2015

2016/17

2018

1. It can not address the variability in the access latency of the DDRx chips.

2. Still suffers from high WCLs due to the complex interactions between DDRx commands

4

Page 14: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Assessing DDRs for Predictability MOTIVATION

• Low cost

• high capacity

• High BW

• DDR DRAM is the commodity off-chip memory, Why?

• What is the most important requirement for real-time/safety-critical systems?

• Yes, Predictability

• How is DDRx for predictability?

• DDRx Random Access Memories are not Random at all!!

• access latency varies notably based on many factors• access patterns

• transaction type (read or write)

• DRAM state from previous accesses Comprehensively study these factorsሽ

5

Page 15: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Assessing DDRs for Predictability PREDICTABILITY

• How is DDRx for predictability?

• Predictability has different definitions in the real-time literature

• One important measure is the relative difference between best- and worst-case execution times (or latencies in case of memories) [Wilhelm et al, TECS08]

• We define “Variability Window” (VW) to quantitatively measure the

DRAM predictability

𝑉𝑊 =𝑊𝐶𝐿 − 𝐵𝐶𝐿

𝑊𝐶𝐿× 100

5

Page 16: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Assessing DDRs for Predictability PREDICTABILITY

R1

DATAtRL

0 10 14

R1

DATAtRL

0 27 31

W0

DATAtWL tWTR

17

• Targets an open row (only R command)

• (a) is best-case

• (e) arrives after a write to same rank

15 2

1

19

.5

19

.5

40

.5

30 3

6

34

.5

34

.5

55

.5

45

79

.5

93 94

.5

10

8

0

20

40

60

80

100

120

a b c d e f g h i j k l m n o

Late

ncy

[n

s]

6

(a)

(e)

Page 17: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Assessing DDRs for Predictability PREDICTABILITY

R1

DATAtRL

0 10 14

R1

DATAtRL

0 27 31

W0

DATAtWL tWTR

17

• Targets an open row (only R command)

• (a) is best-case

• (e) arrives after a write to same rank

15 2

1

19

.5

19

.5

40

.5

30 3

6

34

.5

34

.5

55

.5

45

79

.5

93 94

.5

10

8

0

20

40

60

80

100

120

a b c d e f g h i j k l m n o

Late

ncy

[n

s]

6

(a)

(e)

Page 18: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Assessing DDRs for Predictability PREDICTABILITY

• Targets a close row (A + R command)

• (f) is best-case

• (j) arrives after a closed write to same rank

R1

DATAtRL

0 10 24

R1

DATAtRL

0 37 41

A0

DATAtWL tWTR

27

A1

A1

W0

3 9

tRCD

15 2

1

19

.5

19

.5

40

.5

30 3

6

34

.5

34

.5

55

.5

45

79

.5

93 94

.5

10

8

0

20

40

60

80

100

120

a b c d e f g h i j k l m n o

Late

ncy

[n

s]

6

20

(f)

(j)

Page 19: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Assessing DDRs for Predictability PREDICTABILITY

• Targets a close row (A + R command)

• (f) is best-case

• (j) arrives after a closed write to same rank

R1

DATAtRL

0 10 24

R1

DATAtRL

0 37 41

A0

DATAtWL tWTR

27

A1

A1

W0

3 9

tRCD

15 2

1

19

.5

19

.5

40

.5

30 3

6

34

.5

34

.5

55

.5

45

79

.5

93 94

.5

10

8

0

20

40

60

80

100

120

a b c d e f g h i j k l m n o

Late

ncy

[n

s]

6

(f)

(j)

20

Page 20: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Assessing DDRs for Predictability PREDICTABILITY

• Targets a conflict row (P + A + R command)

• (k) is best-case

• (o) arrives after a conflict write to same rank

R1

DATAtRL

0 20 34

0 42 72

A0

DATAtWL

27

A1

W0

9

tRCD

P1

10

P0

R1

DATAtRL

A1

P1

tRL

15 2

1

19

.5

19

.5

40

.5

30 3

6

34

.5

34

.5

55

.5

45

79

.5

93 94

.5

10

8

0

20

40

60

80

100

120

a b c d e f g h i j k l m n o

Late

ncy

[n

s]

6

(k)

(o)

30

Page 21: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Assessing DDRs for Predictability PREDICTABILITY

• Targets a conflict row (P + A + R command)

• (k) is best-case

• (o) arrives after a conflict write to same rank

R1

DATAtRL

0 20 34

0 42 72

A0

DATAtWL

27

A1

W0

9

tRCD

P1

10

P0

R1

DATAtRL

A1

P1

tRL

15 2

1

19

.5

19

.5

40

.5

30 3

6

34

.5

34

.5

55

.5

45

79

.5

93 94

.5

10

8

0

20

40

60

80

100

120

a b c d e f g h i j k l m n o

Late

ncy

[n

s]

6

(k)

(o)

30

Page 22: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Assessing DDRs for Predictability PREDICTABILITY

15 2

1

19

.5

19

.5

40

.5

30 3

6

34

.5

34

.5

55

.5

45

79

.5

93 94

.5

10

8

0

20

40

60

80

100

120

a b c d e f g h i j k l m n o

Late

ncy

[n

s]

VW= 620%

WCL=108ns

• 15 cases for a read request

• Another 15 for a write request

6

Page 23: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Assessing DDR Controllers for Predictability PREDICTABILITY

0

200

400

600

800

1000

1200

VW

%

• We calculate the VW for 8 of the state-of-the-art DDRx DRAM Controllers

7

Page 24: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Assessing DDR Controllers for Predictability PREDICTABILITY

0

200

400

600

800

1000

1200

VW

%

• We calculate the VW for 8 of the state-of-the-art DDRx DRAM Controllers

6 out of the 8 exceed 800%

7

Page 25: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Assessing DDR Controllers for predictability PREDICTABILITY

0

200

400

600

800

1000

1200

VW

%

• We calculate the VW for 8 of the state-of-the-art DDRx DRAM Controllers

6 out of the 8 exceed 800%Achieve less variability at the expense of

1. complexity:

• Bank partitioning

• Rank switching

2. Conservatism:

• e.g. using close-page (MCMC)

7

Page 26: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Assessing DDR Controllers for predictability PREDICTABILITY

0

200

400

600

800

1000

1200

VW

%

• We calculate the VW for 8 of the state-of-the-art DDRx DRAM Controllers

6 out of the 8 exceed 800%Achieve less variability at the expense of

1. complexity:

• Bank partitioning

• Rank switching

2. Conservatism:

• e.g. using close-page (MCMC)

Even with the pessimism and complexity, 261% is still a significant variability for safety-critical systems

7

Page 27: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Assessing DDR Controllers for predictability PREDICTABILITY

0

200

400

600

800

1000

1200

VW

%

• We calculate the VW for 8 of the state-of-the-art DDRx DRAM Controllers

6 out of the 8 exceed 800%Achieve less variability at the expense of

1. complexity:

• Bank partitioning

• Rank switching

2. Conservatism:

• e.g. using close-page (MCMC)

Even with the pessimism and complexity, 261% is still a significant variability for safety-critical systems

Exploring other types of memories that address these limitations is unavoidable towards providing more predictable memory performance with less variability and tighter bounds

7

Page 28: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

RLDRAM: An Alternative RLDRAM

RLDRAM Introducedby Infinion

1999

RLDRAM2 was introduced by Infinion and Micron

RLDRAM3 was introduced by Micron

2003

2012

+8

Page 29: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Why RLDRAM? RLDRAM

DDRDRAM

12

RASRow Address

CASCol Address

Row Conflict: P+ A + R/W

Row Idle/Close: A + R/W

Row Hit: R/W

RLDRAMAll Address

Bits

R/W

9Multiplexed Address Mode Non-Multiplexed Address Mode

Page 30: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Assessing RLDRAM for Predictability RLDRAM

R1

DATAtRL

0 13 17

W1

DATAtWL

0 14 18

R1

DATAtRL

3 16 20

R0

0

W1

DATAtWL

2 16 20

R0

0

R1

DATAtRL

4 17 21

W0

0

R1

DATAtWL

3 17 21

W0

0

R1

DATAtRL

5 18 22

C0

0

tRC

W1

DATAtWL

5 23

C0

0

tRC

19

10

Page 31: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Assessing RLDRAM for Predictability RLDRAM

R1

DATA

tRL

0 13 17

W1

DATA

tWL

0 14 18

R1

DATA

tRL

3 16 20

R0

0

W1

DATA

tWL

2 16 20

R0

0

R1

DATA

tRL

4 17 21

W0

0

R1

DATA

tWL

3 17 21

W0

0

R1

DATA

tRL

5 18 22

C0

0

tRC

W1

DATA

tWL

5 23

C0

0

tRC

19

• A total of 8 cases• Variability window is 46.2%

(13.4x reduction)• WCL=28.5ns (3.79x reduction)

19

.5 21

24

24 2

5.5

25

.5 27

28

.5

0

5

10

15

20

25

30

a b c d e f g h

Late

ncy

[n

s]46.2%

10

Page 32: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

RLDC: A Predictable Controller for RLDRAM RLDRAM

Pro

cess

or

Dec

od

er

Command Generation

Address Mapping

Timing Checker

RR

Arb

iter

RLD

RA

M

type

address

perPEBuffers

cmd

ba

addr

• RLDC to predictably manage accesses to RLDRAM

• Round Robin• Support both bank sharing and

bank partitioning• Simple timing checker → good for

analyzability, V&V, Certification

11

Page 33: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Bounding Memory Latency RLDRAM

R

DATAtRL

6 12 18

W

0

RW

31

21 43

11 11

: Processor

: Bank

• Bank Sharing Scheme:

𝑊𝐶𝐿𝑠ℎ𝑎𝑟𝑒 = 𝑁 − 1 × 𝑡𝑅𝐶+𝑡𝐶𝐿

N is number of processing elements

tRC

12

Page 34: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Bounding Memory Latency RLDRAM

N is number of processing elements

R

DATAtRL

5 8 13

W

0

RW

26

21 43

21 43

: Processor

: Bank

𝑊𝐶𝐿𝑝𝑎𝑟𝑡 =𝑁 − 1

2× 𝑡𝑊𝐿 − 𝑡𝑅𝐿 +

𝐵𝐿

2

+𝑁 − 1

2× 𝑡𝑅𝐿 − 𝑡𝑊𝐿 +

𝐵𝐿

2

+ 𝑡𝐶𝐿

W-to-R Delay

R-to-W Delay

• Bank Partitioning Scheme:

12

Page 35: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Evaluation Setup RESULTS

PEs

4 Processorsin-order pipelinea private 16KB L1 a shared 1MB L2 cache

DRAM Either RLDRAM or DDRx

RLDRAMRLDRAM3-1600RLDC manages accesses to RLDRAM

DDRxDDR3-1600 AMC, PMC, RTMem, DCmc, ORP, MCMC, ROC, or ReOrder manages access to DDR3

Bank Management

We experiment with both bank partitioning and bank sharing among PEs for RLDC

Benchmarks EEMBC Automotive

13

Page 36: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Worst-Case Latency RESULTS

0

50

100

150

200

250

300

WC

L (n

s)

experimental analytical

1. DDRx MC has 2.5x to 6.46xworse analytical WCL than RLDC→Very similar numbers for exp.

2. Relatively low WCL of MCMC, ROC, ReOrder is due to 4 ranks!

3. For RLDC: bank partitioning provide tighter WCL than sharing→ at the expense of flexibility

4. Gap between exp. vs analytical WCL is much higher for DDR →again due to inherent variability

14

Page 37: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Variability Window RESULTS

1. Already discussed analytical VW2. Exp. VW for DDRx MCS:

• >400% for 4 MCs, • 300%-400% for 3 MCs and• ~200% for 1 MC.

3. for RLDC:• 76.9% for partitioned banks• 84.6% for shared banks

0

200

400

600

800

1000

1200

VW

experimental analytical

15

Page 38: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Scalability: # Processors RESULTS

1. The WCL latency gap between RLDC and majority of DDRx MCs increases drastically

16

Page 39: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Scalability: # Processors RESULTS

1. The WCL latency gap between RLDC and majority of DDRx MCs increases drastically

2. DDRx MCs can be categorized into three categories

cat1 cat2 cat3

Bank

sharing

Bank

partBank part +

multi-rank

16

Page 40: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Scalability: # Processors RESULTS

1. The WCL latency gap between RLDC and majority of DDRx MCs increases drastically

2. DDRx MCs can be categorized into three categories

3. RLDC’s WCL is less than all categories for all #PEs → for both part and sharing→ without complex arbitration/ reorderings (better analyzability and composability)

cat1 cat2 cat3

Bank

sharing

Bank

partBank part +

multi-rank

16

Page 41: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Discussion RESULTS

1. How mature is RLDRAM?

• Has been there since 1999

• Long-term Supported by Micron

• The main off-chip memory in networking

and other low-latency needs

17

Page 42: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Discussion RESULTS

2. Can I buy RLDRAM off-the-shelf?

• Definitely

• They are sold as discrete components (which is the norm for

embedded systems anyway)

• There are also specialized DDR-compatible sockets for

RLDRAM

17

Page 43: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Discussion RESULTS

3. Are there any platforms/boards/test-beds?

ATCA-9305-NSPProcessing blade

Zynq UltraScale/+ MPSoC

Arria10 SoC

17

Page 44: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Discussion RESULTS

4. If it is that good, why it did not take off then?

• Take off = replaces DDR? → It is not meant to be!

• It is a specialized type of memory for low-latency guarantees

• Commodity general-purpose market is looking for high BW, capacity,

cost

• We should ask ourselves, what are we looking for?

• An analogy: FPGAs have been there for long-time, why are they

taking off now? → they satisfy a “new” need in the AI market

• Will they replace CPUs then?

• It is not an either-or decision

17

Page 45: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Discussion RESULTS

5. Does that mean we no longer need DRAMs in real-time systems?

• No

• Use-case dependent

• A heterogenous memory system?

• Mixed Criticality with different requirements?

17

Page 46: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Discussion RESULTS

6. As a scheduling researcher, why should I care?

• Taking the shared resources interference into account is

unavoidable to provide more accurate numbers

• DDR DRAM is very complex to account for its details (rd vs

wr, conf vs hit, ..etc) → explodes the analysis

• RLDRAM alleviates this complexity..which can make the

analysis more feasible

17

Page 47: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

Discussion RESULTS

7. Limitations/trade-off

17

COST

DDRx RLDRAM SRAM

DDRx RLDRAM SRAM

Latency

Page 48: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

15 21

19

.51

9.5 4

0.5

30 36

34

.53

4.5 5

5.5

45

79

.5 93 94

.5 10

8

0

50

100

150

a c e g i k m o

Late

ncy

[n

s] VW= 620%

0

200

400

600

800

1000

1200

VW

%

19

.5 21

24

24 2

5.5

25

.5 27

28

.5

0

5

10

15

20

25

30

a b c d e f g h

Late

ncy

[n

s]

46.2%

Pro

cess

or

Dec

od

er Command Generation

Address Mapping

Timing Checker

RR

Arb

iter

RLD

RA

M

type

address

perPEBuffers

cmd

ba

addr

11× less timing variability6.4× less WCL

Main Lessons:1. DDR DRAM is not

designed for predictability,RLDRAM is.

2. Looking for solns that address our needs insteadof starting from the mainstream soln?

3. Not either-or: A heterogenous memoryto address conflictingneeds of MCS

Page 49: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li

15 21

19

.51

9.5 4

0.5

30 36

34

.53

4.5 5

5.5

45

79

.5 93 94

.5 10

8

0

50

100

150

a c e g i k m o

Late

ncy

[n

s] VW= 620%

0

200

400

600

800

1000

1200

VW

%

19

.5 21

24

24 2

5.5

25

.5 27

28

.5

0

5

10

15

20

25

30

a b c d e f g h

Late

ncy

[n

s]

46.2%

Pro

cess

or

Dec

od

er Command Generation

Address Mapping

Timing Checker

RR

Arb

iter

RLD

RA

M

type

address

perPEBuffers

cmd

ba

addr

11× less timing variability6.4× less WCL

Main Lessons:1. DDR DRAM is not

designed for predictability,RLDRAM is.

2. Looking for solns that address our needs insteadof starting from the mainstream soln?

3. Not either-or: A heterogenous memoryto address conflictingneeds of MCS