![Page 1: On the Off-chip Memory Latency of Real-Time Systems: Is ...2018.rtss.org/wp-content/uploads/2018/12/12-4.pdf · [Kim et al., RTAS2014] ROC[Krishnapillai et al., ECRTS2014] RTMem[Li](https://reader033.vdocuments.net/reader033/viewer/2022060323/5f0da7187e708231d43b6a95/html5/thumbnails/1.jpg)
On the Off-chip Memory Latency of Real-Time
Systems: Is DDR DRAM Really the Best Option?
Mohamed Hassan
Outline
01 Motivation, 02 DRAMs, 03 Predictability, 04 RLDRAM, 05 Results
SRAMs MOTIVATION
• Historically, SRAMs have been the option of choice for real-time safety-critical embedded systems
• With the increase in data demand → their cost became unaffordable
Work in Off-chip Memory MOTIVATION
[Timeline figure, 2005 to 2018; also labeled "Smart Phones" and "IBM's Acorn"]
• 2005 to 2013: [Burchard et al., DATE], [Heithecker and Ernst, DAC], Predator [Akesson et al., CODES+ISSS], AMC [Paolieri et al., ESL], PRET [Reineke et al., CODES+ISSS], COP [Goossens et al.], MultiChannel [Gomony et al.], ORP [Zheng et al.]
• 2014: MCMC [Ecco et al., RTCSA2014], DCmc [Jalle et al., RTSS2014], [Kim et al., RTAS2014], ROC [Krishnapillai et al., ECRTS2014], RTMem [Li et al., ECRTS2014]
• 2015: PMC [Hassan et al., RTAS2015], [Kim et al., RTAS2015], MEDUSA [Valsan and Yun, CPSNA2015], [Yun et al., ECRTS2015], ReOrder [Ecco and Ernst, RTSS2015]
• 2016/17: ReOrder [Ecco et al., ECRTS2016], [Guo and Pellizzoni, RTAS2017]
• 2018: [Hassan and Pellizzoni, EMSOFT2018]
All of these works target Double Data Rate (DDR) DRAMs
A Context about DDRs MOTIVATION
• DDR DRAM is the commodity off-chip memory. Why?
  • Low cost
  • Large capacity
  • High bandwidth (BW)
• What is the most important requirement for real-time/safety-critical systems?
  • Predictability
• How is DDRx for predictability?
  • DDRx Random Access Memories are not random at all!
  • Access latency varies notably based on many factors:
    • access patterns
    • transaction type (read or write)
    • DRAM state from previous accesses
Background DRAM
• Multiplexed address mode:
  • The address bits are split into two segments provided to the device in two stages:
    1. Row address → row decoder
    2. Column address → column decoder
✓ Low cost (lower pin count)
✗ High latency
✗ Huge variability
Background DRAM
• DRAM consists of multiple banks
• The memory controller (MC) manages accesses to DRAM
• A request in general consists of one, two, or three of the following commands:
  • ACTIVATE (A) command: brings a data row from the cells into the sense amplifiers
  • Read/Write (R/W) commands: read from or write to specific columns in the sense amplifiers
  • PRECHARGE (P) command: writes back the previous row held in the sense amplifiers before a new one is brought in
• Which commands a request needs depends on the row state:
  • Row conflict: P + A + R/W
  • Row idle/closed: A + R/W
  • Row hit: R/W
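The mapping from row state to command sequence can be sketched in a few lines of Python. This is only an illustration: the `Bank` class and the `commands_for` function are hypothetical names, and the sketch assumes an open-page policy (the accessed row stays open afterwards) and ignores all timing constraints.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bank:
    """Hypothetical model of one DRAM bank: it tracks only the open row."""
    open_row: Optional[int] = None  # None means idle/closed (no open row)


def commands_for(bank: Bank, row: int, is_write: bool) -> List[str]:
    """Return the commands a request needs, then update the bank state."""
    access = "W" if is_write else "R"
    if bank.open_row == row:
        cmds = [access]              # row hit: R/W
    elif bank.open_row is None:
        cmds = ["A", access]         # row idle/closed: A + R/W
    else:
        cmds = ["P", "A", access]    # row conflict: P + A + R/W
    bank.open_row = row              # assumed open-page policy
    return cmds


bank = Bank()
print(commands_for(bank, row=7, is_write=False))  # ['A', 'R']
print(commands_for(bank, row=7, is_write=True))   # ['W']
print(commands_for(bank, row=3, is_write=False))  # ['P', 'A', 'R']
```

The same request type therefore costs one, two, or three commands depending only on what the bank did before, which is the root of the latency variability discussed in the rest of the talk.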
Background DRAM
• All commands have associated timing constraints that the controller has to satisfy (20+ timing constraints)
[Figure: command timing diagram of a read (A1, R1) issued after a write (A0, W0), annotated with tRCD, tWL, tWTR, and tRL; cycle ticks at 0, 3, 9, 27, 37, and 41]
Limitations of the existing DDRx controller work:
1. It cannot address the variability in the access latency of the DDRx chips themselves.
2. It still suffers from high worst-case latencies (WCLs) due to the complex interactions between DDRx commands.
Assessing DDRs for Predictability MOTIVATION
• DDRx access latency varies notably with the access pattern, the transaction type (read or write), and the DRAM state left by previous accesses
• This work comprehensively studies these factors
Assessing DDRs for Predictability PREDICTABILITY
• How is DDRx for predictability?
• Predictability has different definitions in the real-time literature
• One important measure is the relative difference between best- and worst-case execution times (or latencies, in the case of memories) [Wilhelm et al., TECS08]
• We define the "Variability Window" (VW) to quantitatively measure DRAM predictability:
$$VW = \frac{WCL - BCL}{BCL} \times 100$$
where WCL and BCL are the worst-case and best-case latencies. For example, the DDR3 read latencies reported in this talk (BCL = 15 ns, WCL = 108 ns) give VW = (108 - 15) / 15 × 100 = 620%.
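As a quick check of the VW metric, here is a minimal Python sketch. The function name `variability_window` is a hypothetical choice, and the sketch normalizes by the best-case latency, which is the choice that reproduces the VW values reported in this talk (620% for BCL = 15 ns and WCL = 108 ns).

```python
def variability_window(wcl: float, bcl: float) -> float:
    """VW in percent: worst-to-best latency spread relative to the best case."""
    if bcl <= 0 or wcl < bcl:
        raise ValueError("expected 0 < BCL <= WCL")
    return (wcl - bcl) / bcl * 100


# DDR3 read request: best case 15 ns, worst case 108 ns (values from the talk)
print(round(variability_window(wcl=108, bcl=15), 1))  # 620.0
```

A memory with identical best- and worst-case latency would score 0%, so lower values mean more predictable behavior.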
Assessing DDRs for Predictability PREDICTABILITY
• Targets an open row (only an R command)
• (a) is the best case
• (e) arrives after a write to the same rank
[Figure: command timing diagrams for cases (a) and (e), annotated with tRL, tWL, and tWTR; the bar chart of latency [ns] for cases (a) to (o) highlights (a) and (e)]
Assessing DDRs for Predictability PREDICTABILITY
• Targets a closed row (A + R commands)
• (f) is the best case
• (j) arrives after a write to a closed row in the same rank
[Figure: command timing diagrams for cases (f) and (j), annotated with tRCD, tRL, tWL, and tWTR; the bar chart of latency [ns] for cases (a) to (o) highlights (f) and (j)]
Assessing DDRs for Predictability PREDICTABILITY
• Targets a conflict row (P + A + R commands)
• (k) is the best case
• (o) arrives after a write to a conflict row in the same rank
[Figure: command timing diagrams for cases (k) and (o), annotated with tRCD, tRL, and tWL; the bar chart of latency [ns] for cases (a) to (o) highlights (k) and (o)]
Assessing DDRs for Predictability PREDICTABILITY
[Chart: latency [ns] of the read cases: (a) 15, (b) 21, (c) 19.5, (d) 19.5, (e) 40.5, (f) 30, (g) 36, (h) 34.5, (i) 34.5, (j) 55.5, (k) 45, (l) 79.5, (m) 93, (n) 94.5, (o) 108]
• 15 cases for a read request
• Another 15 for a write request
• WCL = 108 ns
• VW = 620%
Assessing DDR Controllers for Predictability PREDICTABILITY
• We calculate the VW for 8 state-of-the-art DDRx DRAM controllers
[Chart: VW (%) per controller, y-axis from 0 to 1200]
• 6 out of the 8 exceed 800%
• The others achieve less variability at the expense of:
  1. Complexity:
    • Bank partitioning
    • Rank switching
  2. Conservatism:
    • e.g. using close-page (MCMC)
• Even with this pessimism and complexity, 261% is still a significant variability for safety-critical systems
• Exploring other types of memories that address these limitations is therefore unavoidable on the way to more predictable memory performance, with less variability and tighter bounds
RLDRAM: An Alternative RLDRAM
• 1999: RLDRAM was introduced by Infineon
• 2003: RLDRAM2 was introduced by Infineon and Micron
• 2012: RLDRAM3 was introduced by Micron
Why RLDRAM? RLDRAM
• DDR DRAM: multiplexed address mode
  • The address is sent in two steps: (1) RAS with the row address, (2) CAS with the column address
  • Row conflict: P + A + R/W
  • Row idle/closed: A + R/W
  • Row hit: R/W
• RLDRAM: non-multiplexed address mode
  • All address bits are sent together with the R/W command
Assessing RLDRAM for Predictability RLDRAM
[Figure: command timing diagrams for the 8 cases: a read (R1) or write (W1) issued alone, after a read (R0), after a write (W0), or after a command C0 separated by tRC; annotated with tRL and tWL]
• A total of 8 cases
• Variability window is 46.2% (13.4x reduction)
• WCL = 28.5 ns (3.79x reduction)
[Chart: latency [ns] of cases (a) to (h): (a) 19.5, (b) 21, (c) 24, (d) 24, (e) 25.5, (f) 25.5, (g) 27, (h) 28.5]
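The reduction factors quoted on this slide can be re-derived from the per-case latencies. The sketch below is a hedged illustration: the RLDRAM latencies are the bar values read from the chart, the DDR3 figures (WCL = 108 ns, VW = 620%) are the ones reported earlier in the talk, and `vw_percent` is a hypothetical helper that normalizes by the best-case latency.

```python
from typing import Sequence


def vw_percent(latencies_ns: Sequence[float]) -> float:
    """VW = (WCL - BCL) / BCL * 100 over a set of per-case latencies."""
    bcl, wcl = min(latencies_ns), max(latencies_ns)
    return (wcl - bcl) / bcl * 100


# Per-case RLDRAM latencies in ns, cases (a) to (h), as read from the chart.
RLDRAM_NS = [19.5, 21, 24, 24, 25.5, 25.5, 27, 28.5]
# DDR3 figures reported earlier in the talk.
DDR3_WCL_NS, DDR3_VW = 108, 620

rldram_vw = round(vw_percent(RLDRAM_NS), 1)
print(rldram_vw)                               # 46.2
print(round(DDR3_VW / rldram_vw, 1))           # 13.4
print(round(DDR3_WCL_NS / max(RLDRAM_NS), 2))  # 3.79
```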
RLDC: A Predictable Controller for RLDRAM RLDRAM
[Figure: RLDC block diagram. Processor requests (type, address) pass through a decoder into per-PE buffers; a round-robin (RR) arbiter, address mapping, command generation, and a timing checker issue cmd, ba, and addr to the RLDRAM]
• RLDC predictably manages accesses to RLDRAM
• Round-robin arbitration
• Supports both bank sharing and bank partitioning
• Simple timing checker → good for analyzability, V&V, and certification
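Round-robin arbitration over per-PE buffers can be sketched as follows. This is not RLDC's actual implementation: the class name, the request tuple layout, and the work-conserving behavior (empty buffers are skipped) are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Request = Tuple[int, str, int]  # (pe_id, "R" or "W", address): assumed layout


class RoundRobinArbiter:
    """Hypothetical work-conserving round-robin arbiter over per-PE buffers."""

    def __init__(self, num_pes: int) -> None:
        self.buffers: List[Deque[Request]] = [deque() for _ in range(num_pes)]
        self.next_pe = 0  # PE that gets the first chance in the next round

    def enqueue(self, pe_id: int, kind: str, address: int) -> None:
        self.buffers[pe_id].append((pe_id, kind, address))

    def pick(self) -> Optional[Request]:
        """Return the oldest request of the next non-empty PE, if any."""
        n = len(self.buffers)
        for offset in range(n):
            pe = (self.next_pe + offset) % n
            if self.buffers[pe]:
                self.next_pe = (pe + 1) % n  # served PE goes to the back
                return self.buffers[pe].popleft()
        return None


arb = RoundRobinArbiter(num_pes=4)
arb.enqueue(0, "R", 0x10)
arb.enqueue(0, "W", 0x20)
arb.enqueue(2, "R", 0x30)
print([arb.pick() for _ in range(4)])
```

Because each PE is served at most once per round, a request waits for at most N - 1 other requests, which is what makes a simple closed-form latency bound possible.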
Bounding Memory Latency RLDRAM
• Bank Sharing Scheme:
$$WCL_{share} = (N - 1) \times t_{RC} + t_{CL}$$
where N is the number of processing elements
[Figure: timing diagram of four processors sharing one bank; consecutive requests are spaced by tRC (ticks at 0, 6, 12, 18) and the last one completes at 31]
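The bank-sharing bound is easy to evaluate numerically. In the sketch below the function name is hypothetical, and the example values (N = 4, tRC = 6 cycles, tCL = 13 cycles) are assumptions read off the slide's timing diagram rather than datasheet values.

```python
def wcl_share(n_pes: int, t_rc: int, t_cl: int) -> int:
    """Worst-case latency in cycles under bank sharing: (N - 1) * tRC + tCL."""
    if n_pes < 1:
        raise ValueError("need at least one processing element")
    return (n_pes - 1) * t_rc + t_cl


# Assumed example values taken from the slide's diagram.
print(wcl_share(n_pes=4, t_rc=6, t_cl=13))  # 31
```

The bound grows linearly in N with slope tRC, since in the worst case every other PE accesses the same bank first.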
Bounding Memory Latency RLDRAM
• Bank Partitioning Scheme:
$$WCL_{part} = \frac{N-1}{2} \times \underbrace{\left(t_{WL} - t_{RL} + \frac{BL}{2}\right)}_{\text{W-to-R delay}} + \frac{N-1}{2} \times \underbrace{\left(t_{RL} - t_{WL} + \frac{BL}{2}\right)}_{\text{R-to-W delay}} + t_{CL}$$
where N is the number of processing elements
[Figure: timing diagram of four processors, each with its own bank; alternating W and R commands are issued at 0, 5, 8, and 13, and the last one completes at 26]
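The bank-partitioning bound can be evaluated the same way. Two things in this sketch are assumptions, not statements from the talk: the rounding of (N - 1)/2 (ceiling for the W-to-R term, floor for the R-to-W term, chosen because it reproduces the 26-cycle completion in the slide's diagram for N = 4), and the example timing values (tRL = 13, tWL = 14, BL = 8, tCL = 13 cycles), which are read off the timing diagrams. The function name is hypothetical.

```python
def wcl_part(n_pes: int, t_rl: int, t_wl: int, bl: int, t_cl: int) -> int:
    """Worst-case latency in cycles under bank partitioning (assumed rounding)."""
    if n_pes < 1:
        raise ValueError("need at least one processing element")
    w_to_r = t_wl - t_rl + bl // 2   # write-to-read turnaround delay
    r_to_w = t_rl - t_wl + bl // 2   # read-to-write turnaround delay
    others = n_pes - 1               # interfering requests from other PEs
    n_w_to_r = (others + 1) // 2     # ceil((N - 1) / 2): assumption
    n_r_to_w = others // 2           # floor((N - 1) / 2): assumption
    return n_w_to_r * w_to_r + n_r_to_w * r_to_w + t_cl


# Assumed example values taken from the slides' diagrams.
print(wcl_part(n_pes=4, t_rl=13, t_wl=14, bl=8, t_cl=13))  # 26
```

With these values the turnaround delays are 5 and 3 cycles, both shorter than the tRC spacing that bank sharing pays per interfering request, which is why partitioning yields the tighter bound.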
Evaluation Setup RESULTS
• PEs: 4 processors, in-order pipeline, a private 16KB L1 cache and a shared 1MB L2 cache
• DRAM: either RLDRAM or DDRx
  • RLDRAM: RLDRAM3-1600; RLDC manages accesses to RLDRAM
  • DDRx: DDR3-1600; AMC, PMC, RTMem, DCmc, ORP, MCMC, ROC, or ReOrder manages accesses to DDR3
• Bank management: we experiment with both bank partitioning and bank sharing among PEs for RLDC
• Benchmarks: EEMBC Automotive
Worst-Case Latency RESULTS
[Chart: experimental and analytical WCL (ns) per controller, y-axis from 0 to 300]
1. DDRx MCs have a 2.5x to 6.46x worse analytical WCL than RLDC → very similar numbers for the experimental WCL
2. The relatively low WCL of MCMC, ROC, and ReOrder is due to their use of 4 ranks
3. For RLDC: bank partitioning provides a tighter WCL than sharing → at the expense of flexibility
4. The gap between experimental and analytical WCL is much higher for DDR → again due to its inherent variability
Variability Window RESULTS
[Chart: experimental and analytical VW (%) per controller, y-axis from 0 to 1200]
1. Analytical VW: already discussed
2. Experimental VW for DDRx MCs:
  • >400% for 4 MCs
  • 300%-400% for 3 MCs
  • ~200% for 1 MC
3. For RLDC:
  • 76.9% for partitioned banks
  • 84.6% for shared banks
Scalability: # Processors RESULTS
1. The WCL gap between RLDC and the majority of DDRx MCs increases drastically with the number of PEs
2. DDRx MCs can be categorized into three categories:
  • cat1: bank sharing
  • cat2: bank partitioning
  • cat3: bank partitioning + multi-rank
3. RLDC's WCL is lower than that of all categories for all #PEs → for both partitioning and sharing → without complex arbitration/reordering (better analyzability and composability)
Discussion RESULTS
1. How mature is RLDRAM?
  • It has been around since 1999
  • It has long-term support from Micron
  • It is the main off-chip memory in networking and other low-latency applications
2. Can I buy RLDRAM off-the-shelf?
  • Definitely
  • RLDRAM devices are sold as discrete components (which is the norm for embedded systems anyway)
  • There are also specialized DDR-compatible sockets for RLDRAM
3. Are there any platforms/boards/test-beds?
  • ATCA-9305-NSP processing blade
  • Zynq UltraScale+ MPSoC
  • Arria 10 SoC
4. If it is that good, why did it not take off?
  • Take off = replace DDR? → It is not meant to!
  • It is a specialized type of memory for low-latency guarantees
  • The commodity general-purpose market is looking for high BW, high capacity, and low cost
  • We should ask ourselves: what are we looking for?
  • An analogy: FPGAs have been around for a long time, so why are they taking off now? → They satisfy a "new" need in the AI market
    • Will they replace CPUs then?
    • It is not an either-or decision
5. Does that mean we no longer need DDR DRAMs in real-time systems?
  • No
  • Use-case dependent
  • A heterogeneous memory system?
  • Mixed criticality with different requirements?
6. As a scheduling researcher, why should I care?
  • Taking shared-resource interference into account is unavoidable to provide more accurate numbers
  • DDR DRAM is very complex to account for in detail (rd vs. wr, conflict vs. hit, etc.) → this explodes the analysis
  • RLDRAM alleviates this complexity, which can make the analysis more feasible
7. Limitations/trade-offs
  • [Figure: a cost axis and a latency axis, each ordering DDRx, RLDRAM, SRAM; RLDRAM sits between DDRx and SRAM on both]
[Figures recapped: DDRx read latency per case (VW = 620%), VW (%) of the DDRx controllers, RLDRAM latency per case (VW = 46.2%), and the RLDC block diagram]
11× less timing variability, 6.4× lower WCL
Main Lessons:
1. DDR DRAM is not designed for predictability; RLDRAM is.
2. Should we look for solutions that address our needs instead of starting from the mainstream solution?
3. Not either-or: a heterogeneous memory system can address the conflicting needs of mixed-criticality systems (MCS)