understanding and improving latency of dram-based memory...
TRANSCRIPT
![Page 1: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/1.jpg)
Understanding and Improving Latency of DRAM-Based Memory Systems
Thesis Oral
Kevin Chang
Committee:Prof. Onur Mutlu (Chair)Prof. James HoeProf. Kayvon FatahalianProf. Stephen Keckler (NVIDIA, UT Austin)Prof. Moinuddin Qureshi (Georgia Tech.)
Committee:Prof. Onur Mutlu (Chair)Prof. James HoeProf. Kayvon FatahalianProf. Stephen Keckler (NVIDIA, UT Austin)Prof. Moinuddin Qureshi (Georgia Tech.)
![Page 2: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/2.jpg)
The March For “Moore”
2
Processor
4K transistors
4B transistors
Main Memory
1Kb
8Gb
DRAM (Dynamic Random Access Memory)
Intel 8080, 1974 Intel 1103, 1970
or
![Page 3: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/3.jpg)
3
PROBLEMDRAM latency has been relatively stagnant
![Page 4: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/4.jpg)
1
10
100
1999 2003 2006 2008 2011 2013 2014 2015 2016 2017
DR
AM
Impr
ovem
ent
(log)
Capacity Bandwidth Latency
Main Memory Latency Lags Behind
4
128x
20x
1.3x
Memory latency remains almost constant
![Page 5: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/5.jpg)
DRAM Latency Is Critical for Performance
5
In-Memory Data Analytics [Clapp+ (Intel), IISWC’15; Awan+, BDCloud’15]
Datacenter workloads [Kanev+ (Google), ISCA’15]
In-memory Databases [Mao+, EuroSys’12; Clapp+ (Intel), IISWC’15]
Graph/Tree Processing [Xu+, IISWC’12; Umuroglu+, FPL’15]
Long memory latency → performance bottleneck
![Page 6: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/6.jpg)
6
GoalImprove latency of DRAM (main memory)
![Page 7: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/7.jpg)
Different DRAM Latency Problems
7
DRAM
CPU
1. Slow bulk data movement between two memory locations
2. Refresh delays memory accesses 4. Voltage affects latency
Voltage
chip
3. High standard latency to mitigate cell variation
![Page 8: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/8.jpg)
Thesis Statement
8
Memory latency can be significantly reduced with a multitude of low-cost architectural techniques that aim to reduce different causes of long latency
![Page 9: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/9.jpg)
9
Contributions
![Page 10: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/10.jpg)
10
DRAM
CPU
1. Slow bulk data movement between two memory locations
2. Refresh delays memory accesses 4. Voltage affects latency
Voltage
Low-Cost Inter-Linked Subarrays (LISA) [HPCA’16]
Mitigating Refresh Latency by Parallelizing Accesses with Refreshes (DSARP) [HPCA’14]
3. High standard latency to mitigate cell variation
Understanding and Exploiting Latency Variation in DRAM(FLY-DRAM) [SIGMETRICS’16]
Understanding and Exploiting Latency-Voltage Trade-Off (Voltron) [SIGMETRICS’17]
Low-Cost ArchitecturalFeatures in DRAM
Understanding and overcoming the latency limitation in DRAM
![Page 11: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/11.jpg)
DRAM BackgroundWhat’s inside a DRAM chip? How to access DRAM?How long does accessing data take?
11
![Page 12: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/12.jpg)
High-Level DRAM Organization
12
DRAM Channel
DIMM(Dual in-line memory module)
DRAMchip
![Page 13: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/13.jpg)
13
BankBitline
Row Buffer
Sense amplifierS
S S S S
Precharge unitP
P P P P
Row
D
ecod
er Wordline
Subarray
512 x 8Kb
I/O
64b
Inte
rnal
Dat
a B
us
Subarray
ChipsDRAM Cell
![Page 14: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/14.jpg)
To
Bank
I/O
Reading Data From DRAM
14
ACTIVATE: Store the row into the row buffer
READ: Select the target column and drive to CPU
PRECHARGE: Reset the bitlines for a new ACTIVATE
SP
SP
SP
SP
1111 1
2
3
![Page 15: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/15.jpg)
DRAM Access Latency
15
Command
Data
Duration
ACTIVATE READ PRECHARGE
1 1 1 1Cache line (64B)
NextACT
Activation latency: tRCD(13ns / 50 cycles)
1
Precharge latency: tRP(13ns / 50 cycles)
2
![Page 16: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/16.jpg)
16
DRAM
CPU
1. Slow bulk data movement between two memory locations
2. Refresh delays memory accesses 4. Voltage affects latency
Voltage
Low-Cost Inter-Linked Subarrays (LISA) [HPCA’16]
Mitigating Refresh Latency by Parallelizing Accesses with Refreshes (DSARP) [HPCA’14]
3. High standard latency to mitigate cell variation
Understanding and Exploiting Latency Variation in DRAM(FLY-DRAM) [SIGMETRICS’16]
Understanding and Exploiting Latency-Voltage Trade-Off (Voltron) [SIGMETRICS’17]
Low-Cost ArchitecturalFeatures in DRAM
Understanding and overcoming the latency limitation in DRAM
![Page 17: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/17.jpg)
Problem: Inefficient Bulk Data Movement
17
Bulk data movement is a key operation in many applications–memmove & memcpy: 5% cycles in Google’s datacenter [Kanev+, ISCA’15]
Mem
ory
Con
trol
ler
CPU Memory
Channeldst
src
Long latency and high energy
LLCC
ore
Cor
e
Cor
eC
ore
64 bits
![Page 18: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/18.jpg)
Move Data inside DRAM?
18
![Page 19: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/19.jpg)
Moving Data Inside DRAM?
19
Subarray 1Subarray 2Subarray 3
Subarray N
…
Internal Data Bus (64b)
8Kb512rows
Bank
Bank
Bank
Bank
DRAM
…
Low connectivity in DRAM is the fundamental bottleneck for bulk data movement
Goal: Provide a new substrate to enable wide connectivity between subarrays
![Page 20: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/20.jpg)
20
Our proposal:Low-Cost Inter-Linked SubArrays (LISA)
![Page 21: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/21.jpg)
Observations
21
SP
SP
SP
SP
SP
SP
SP
SP
…
Bitlines serve as a bus that is as wide as a row
1
Bitlines between subarrays areclose but disconnected2
Inte
rnal
Dat
a Bu
s (6
4b)
![Page 22: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/22.jpg)
Low-Cost Interlinked Subarrays (LISA)
22
SP
SP
SP
SP
SP
SP
SP
SP
…
Interconnect bitlines of adjacent subarrays in a bank using
isolation transistors (links)
ON
64b
8kb
![Page 23: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/23.jpg)
Low-Cost Interlinked Subarrays (LISA)
23
SP
SP
SP
SP
SP
SP
SP
SP
…
64b
8kbON
Row Buffer Movement (RBM): Move a row of data in an activated row buffer to a precharged one• 4KB data in 8ns→ 500 GB/s, 26x bandwidth of a DDR4-2400 channel• 0.8% DRAM chip area overhead
ChargeSharing
![Page 24: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/24.jpg)
Three New Applications of LISA to Reduce Latency
24
1 Fast bulk data copy
![Page 25: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/25.jpg)
1. Rapid Inter-Subarray Copying (RISC)
• Goal: Efficiently copy a row across subarrays• Key idea: Use RBM to form a new command sequence
25
SP
SP
SP
SP
SP
SP
SP
SP
Subarray 1
Subarray 2
Activate dst row(write row buffer into dst row)3
RBM SA1→SA22
Activate src row1 src row
dst rowReduces row-copy latency by 9x,DRAM energy by 48x
![Page 26: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/26.jpg)
Methodology
• Cycle-level simulator: Ramulator [Kim+, CAL’15]
• Four out-of-order cores
• Two DDR3-1600 channels
• Benchmarks: TPC, STREAM, SPEC2006, DynoGraph,
random, bootup, forkbench, shell script
26
![Page 27: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/27.jpg)
RISC Outperforms Prior Work
27
0
0.5
1
1.5
2
Nor
mal
ized
Spe
edup
RowClone RISC[Seshadri+, MICRO’13]
0
0.2
0.4
0.6
0.8
1
Nor
mal
ized
DR
AM
Ene
rgy
-24%
RowClone limits bank-level parallelism
66%
-55%
-5%
50 workloads 50 workloads
Rapid Inter-Subarray Copying (RISC) using LISA improves system performance
![Page 28: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/28.jpg)
Three New Applications of LISA to Reduce Latency
28
1 Fast bulk data copy
2 In-DRAM caching
![Page 29: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/29.jpg)
2.Variable Latency DRAM (VILLA)
• Goal: Reduce access latency with low area overhead• Motivation: Trade-off between area and latency
29
High area overhead
Long Bitline Short Bitline
High latency Low latency
Lower resistance and capacitance
![Page 30: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/30.jpg)
2. Variable Latency DRAM (VILLA)
• Key idea: Heterogeneous DRAM design by adding a few fast subarrays as a low-cost cache in each bank
• Benefits: Reduce access latency for frequently-accessed data
30
Slow Subarray
Slow Subarray
Fast Subarray LISA: Cache rows rapidly from slow to fast subarrays
32rows
512rows
Reduces hot data access latency by 2.2x at only 1.6% area overhead
Challenge: How to move data efficiently from slow to fast subarrays?
![Page 31: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/31.jpg)
VILLA Improves System Performance by Caching Hot Data
31
0
10
20
30
40
50
60
70
80
1
1.02
1.04
1.06
1.08
1.1
1.12
1.14
1.16
VILLA
Cache H
it Rate (%
)N
orm
aliz
ed S
peed
up
Workloads (50)
VILLAVILLA Cache Hit Rate
Avg: 5%
Max: 16%
LISA enables an effective in-DRAM caching scheme
![Page 32: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/32.jpg)
Three New Applications of LISA to Reduce Latency
32
1 Fast bulk data copy
2 In-DRAM caching
3 Fast precharge
![Page 33: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/33.jpg)
3. Linked Precharge (LIP)
33
• Problem: The precharge time is limited by the strength of one precharge unit
• Linked Precharge (LIP): LISA precharges a subarray using multiple precharge units
SP
SP
SP
SP
SP
SP
SP
SP
SP
SP
SP
SP
PrechargingSP
SP
SP
SP
Activatedrow
on on
on
LinkedPrecharging
Conventional DRAM LISA DRAM
Reduces precharge latency by 2.6x
Activatedrow
![Page 34: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/34.jpg)
LIP Improves System Performance by Accelerating Precharge
34
1
1.02
1.04
1.06
1.08
1.1
1.12
1.14
1.16
Nor
mal
ized
Spe
edup
Workloads (50)
LIP
Avg: 8%
Max: 13%
LISA reduces precharge latency
![Page 35: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/35.jpg)
Latency Reduction of LISA
35
Latency of Operations
ACTIVATE PRECHARGE
VILLA VILLA1.7x 1.5x
LIP2.6x
READ WRITE
x64x64
9x
4KB data copying
LISA is a versatile substrate that enables many new techniques
![Page 36: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/36.jpg)
36
DRAM
CPU
1. Slow bulk data movement between two memory locations
2. Refresh delays memory accesses 4. Voltage affects latency
Voltage
Low-Cost Inter-Linked Subarrays (LISA) [HPCA’16]
Mitigating Refresh Latency by Parallelizing Accesses with Refreshes (DSARP) [HPCA’14]
3. High standard latency to mitigate cell variation
Understanding and Exploiting Latency Variation in DRAM(FLY-DRAM) [SIGMETRICS’16]
Understanding and Exploiting Latency-Voltage Trade-Off (Voltron) [SIGMETRICS’17]
Low-Cost ArchitecturalFeatures in DRAM
Understanding and overcoming the latency limitation in DRAM
![Page 37: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/37.jpg)
What Does DRAM Latency Mean to You?
• DRAM latency: Delay as specified in DRAM standards
• Memory controllers use these standardized latency to access DRAM
• Key question: How does reducing latency affect DRAM accesses?
37
“The purpose of this Standard is to define the minimum set of requirements for JEDEC compliant … SDRAM devices” (p.1) JEDEC DDRx standard
![Page 38: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/38.jpg)
Goals
38
1 Understand and characterize reduced-latency behavior in modern DRAM chips
2 Develop a mechanism that exploits our observation to improve DRAM latency
2
![Page 39: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/39.jpg)
Experimental Setup
• Custom FPGA-based infrastructure– Existing systems: Commands are generated and controlled by HW
39
PCIe DDR3
PC FPGA DIMM
![Page 40: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/40.jpg)
Experiments
• Swept each timing parameter to read data– Time step of 2.5ns (FPGA cycle time)
• Check the correctness of data read back from DRAM– Any errors (bit flips)?
• Tested 240 DDR3 DRAM chips from three vendors– 30 DIMMs– Capacity: 1GB
40
![Page 41: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/41.jpg)
Experimental ResultsActivation Latency
41
![Page 42: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/42.jpg)
Variation in Activation Errors
42
2.5 5.0 7.5 10.0 12.5t5CD (ns)
100
10-2
10-4
10-6
10-8
10-10
Bit
(rr
Rr 5
Dte
(B(
5)
Different characteristics across cells
Results from 7500 rounds over 240 chips
Very few errors
Many errors
<10 bits
8KB (one row)
step size
Rife with errors No Errors
13.1nsstandard
Activation Latency/tRCD (ns)Modern DRAM chips exhibit significant variation in activation latency
![Page 43: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/43.jpg)
DRAM Latency Variation
43
HighLowDRAM Latency
DRAM BDRAM A DRAM C
Slow cells
Imperfect manufacturing process→latency variation in timing parameters
![Page 44: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/44.jpg)
Experimental ResultsPrecharge Latency
44
![Page 45: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/45.jpg)
Variation in Precharge Errors
2.5 5.0 7.5 10.0 12.5t5P (ns)
100
10-2
10-4
10-6
Bit
(rr
Rr 5
ate
(B(
5)
45
Results from 4000 rounds over 240 chips
Very few errors
Many errors
8KB (one row)
step size
Rife with errorsNo Errors
13.1nsstandard
100 rows
Precharge Latency/tRP (ns)Modern DRAM chips exhibit significant variation in precharge latency
![Page 46: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/46.jpg)
Spatial Locality of Slow Cells
46
Slow cells are concentrated at certain regions
One DIMM: tRCD=7.5ns One DIMM: tRP=7.5ns
![Page 47: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/47.jpg)
47
FLY
Mechanism:Flexible-Latency (FLY) DRAM
![Page 48: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/48.jpg)
Mechanism to Reduce DRAM Latency
• Observation: DRAM timing errors (slow DRAM cells) are concentrated on certain regions
• Flexible-LatencY (FLY) DRAM– A memory controller design that reduces latency
• Key idea:1) Divide memory into regions of different latencies
2) Memory controller: Use lower latency for regions without slow cells; higher latency for other regions
• Latency profile through DRAM vendors or online tests48
![Page 49: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/49.jpg)
0.9
0.95
1
1.05
1.1
1.15
1.2
1.25
Normalize
dPerfo
rmance
40Workloads
Baseline(DDR3)FLY-DRAM(DIMM1)FLY-DRAM(DIMM2)UpperBound
Benefits of FLY-DRAM
49
17.6%19.5% 19.7%
FLY-DRAM improves performance by exploiting latency variation in DRAM
83%
99%
0%
100%
Fast cells (%)
![Page 50: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/50.jpg)
Latency Reduction of FLY-DRAM
50
Latency of Operations
ACTIVATE PRECHARGE
VILLA VILLA1.7x 1.5x
LIP2.6x
READ WRITE
x64x64
9x
4KB data copying
FLY FLY1.7x 1.7x
Experimental demonstration of latency variation enables techniques to reduce latency
![Page 51: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/51.jpg)
51
DRAM
CPU
1. Slow bulk data movement between two memory locations
2. Refresh delays memory accesses 4. Voltage affects latency
Voltage
Low-Cost Inter-Linked Subarrays (LISA) [HPCA’16]
Mitigating Refresh Latency by Parallelizing Accesses with Refreshes (DSARP) [HPCA’14]
3. High standard latency to mitigate cell variation
Understanding and Exploiting Latency Variation in DRAM(FLY-DRAM) [SIGMETRICS’16]
Understanding and Exploiting Latency-Voltage Trade-Off (Voltron) [SIGMETRICS’17]
Low-Cost ArchitecturalFeatures in DRAM
Understanding and overcoming the latency limitation in DRAM
![Page 52: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/52.jpg)
• DRAM voltage is an important factor that affects:latency, power, and reliability
• Goal: Understand the relationship between latency and DRAM voltage and exploit this trade-off
52
Motivation
![Page 53: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/53.jpg)
• FPGA platform
• Tested 124 DDR3L DRAM chips (31 DIMMs)
Methodology
53
VoltageControllerDIMM
![Page 54: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/54.jpg)
Key Result: Voltage vs. Latency
54
Circuit-level SPICE simulation
Trade-off between access latency and voltage
Potential latency range
![Page 55: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/55.jpg)
Goal and Key Observation
• Goal: Exploit the trade-off between voltage and latency to reduce energy consumption
• Approach: Reduce voltage– Performance loss due to increased latency– Energy: Function of time (performance) and power (voltage)
• Observation: Application’s performance loss due to higher latency has a strong linear relationship with its memory intensity
55
![Page 56: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/56.jpg)
Mechanism: Voltron
• Build a performance (linear) model to predict performance loss based on the selected voltage value
• Use the model to select a minimum voltage that satisfies a performance loss target specified by the user
• Results: Reduces system energy by 7.3% with a small performance loss of 1.8%
56
![Page 57: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/57.jpg)
Reducing Latency by Exploiting Voltage-Latency Trade-Off• Voltron exploits the latency-voltage trade-off to
improve energy efficiency
• Another perspective: Increase voltage to reduce latency
57
![Page 58: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/58.jpg)
58
CPU
1. Slow bulk data movement between two memory locations
2. Refresh delays memory accesses
3. High standard latency to mitigate cell variation
4. Voltage affects latency
Voltage
Low-Cost Inter-Linked Subarrays (LISA) [HPCA’16]
Mitigating Refresh Latency by Parallelizing Accesses with Refreshes (DSARP) [HPCA’14]
DRAM
Understanding and Exploiting Latency Variation in DRAM(FLY-DRAM) [SIGMETRICS’16]
Understanding and Exploiting Latency-Voltage Trade-Off (Voltron) [SIGMETRICS’17]
Low-Cost ArchitecturalFeatures in DRAM
Understanding and overcoming the latency limitation in DRAM
![Page 59: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/59.jpg)
• Problem: Refreshing DRAM blocks memory accesses – Prolongs latency of memory requests
• Goal: Reduce refresh-induced latency on demand requests
• Key observation: Some subarrays and I/O remain completely idle during refresh
• Dynamic Subarray Access-Refresh Parallelization (DSARP):– DRAM modification to enable idle DRAM subarrays to serve
accesses during refresh– 0.7% DRAM area overhead
• 20.2% system performance improvement for 8-core systems using 32Gb DRAM
59
Summary of DSARP
![Page 60: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/60.jpg)
Prior Work on Low-Latency DRAM• Uniform short-bitlines DRAM: FCRAM, RLDRAM– Large area overhead (30% - 80%)
• Heterogeneous bitline design– TL-DRAM: Intra-subarray [Lee+, HPCA’13]
Requires two fast rows to cache one slow row– CHARM: Inter-bank [Son+, ISCA’13]
High movement cost between slow and fast banks• SRAM cache in DRAM [Hidaka+, IEEE Micro’90]
– Large area overhead (38% for 64KB) and complex control
• Our work:– Low cost – Detailed experimental understanding via characterization of
commodity chips60
![Page 61: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/61.jpg)
61
CONCLUSION
![Page 62: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/62.jpg)
Conclusion
• Memory latency has remained mostly constant over the past decade– System performance bottleneck for modern applications
• Simple and low-cost architectural mechanisms – New DRAM substrate for fast inter-subarray data movement– Refresh architecture to mitigate refresh interference
• Understanding latency behavior in commodity DRAM– Experimental characterization of:
1) Latency variation inside DRAM 2) Relationship between latency and DRAM voltage
62
![Page 63: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/63.jpg)
Thesis Statement
63
Memory latency can be significantly reduced with a multitude of low-cost architectural techniques that aim to reduce different causes of long latency
![Page 64: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/64.jpg)
Future Research Direction
• Latency characterization and optimization for other memory technologies– eDRAM– Non-volatile memory: PCM, STT-RAM, etc.
• Understanding other aspects of DRAM– Variation in power/energy consumption– Security/reliability
64
![Page 65: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/65.jpg)
Other Areas Investigated
65
Energy Efficient Networks-On-Chip[NOCS’12, SBACPAD’12, SBACPAD’14]
DRAM Testing Platform[HPCA’17]
Low-latency DRAM Architecture[HPCA’15]
Memory Schedulers for Heterogeneous Systems[ISCA’12, TACO’16]
![Page 66: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/66.jpg)
Acknowledgements• Onur Mutlu• James Hoe, Kayvon Fatahalian, Moinuddin Qureshi, and
Steve Keckler• Safari group: Rachata Ausavarungnirun, Amirali Boroumand, Chris Fallin, Saugata
Ghose, Hasan Hassan, Kevin Hsieh, Ben Jaiyen, Abhijith Kashyap, Samira Khan, YoonguKim, Donghyuk Lee, Yang Li, Jamie Liu, Yixin Luo, Justin Meza, Gennady Pekhimenko, Vivek Seshadri, Lavanya Subramanian, NanditaVijaykumar, HanbinYoon, Hongyi Xin
• Georgia Tech. collaborators: Prashant Nair, Jaewoong Sim
• CALCM group• Friends• Family — parents, sister, and girlfriend• Intern mentors and industry collaborators:
66
![Page 67: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/67.jpg)
Sponsors
• Intel and SRC for my fellowship• NSF and DOE• Facebook, Google, Intel, NVIDIA, VMware, Samsung
67
![Page 68: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/68.jpg)
Thesis Related Publications• Improving DRAM Performance by Parallelizing Refreshes with Accesses
Kevin Chang, Donghyuk Lee, Zeshan Chishti, Alaa Alameldeen, Chris Wilkerson, Yoongu Kim, Onur MutluHPCA 2014
• Low-Cost Inter-Linked Subarrays (LISA): Enabling Fast Inter-Subarray Data Movement in DRAMKevin Chang, Prashant J. Nair, Donghyuk Lee, Saugata Ghose, Moinuddin K. Qureshi, and Onur MutluHPCA 2016
• Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and OptimizationKevin Chang, Abhijith Kashyap, Hasan Hassan Samira Khan, Kevin Hsieh, Donghyuk Lee, Saugata Ghose, Gennady Pekhimenko, Tianshi Li, Onur MutluSIGMETRICS 2016
• Understanding Reduced-Voltage Operation in Modern DRAM Devices: Experimental Characterization, Analysis, and MechanismsKevin Chang, Abdullah Giray Yaglikçi, Saugata Ghose, Aditya Agrawal, Niladrish Chatterjee, Abhijith Kashyap, Donghyuk Lee, Mike O’connor, Hasan Hassan, Onur MutluSIGMETRICS 2017
68
![Page 69: Understanding and Improving Latency of DRAM-Based Memory ...safari/thesis/kchang_defense_slides.pdf · •Linked Precharge (LIP): LISA prechargesa subarray using multiple precharge](https://reader033.vdocuments.net/reader033/viewer/2022060601/6055a8d2e1ad6717064afb02/html5/thumbnails/69.jpg)
Understanding and Improving Latency of DRAM-Based Memory Systems
Thesis Oral
Kevin Chang
Committee:Prof. Onur Mutlu (Chair)Prof. James HoeProf. Kayvon FatahalianProf. Stephen Keckler (NVIDIA, UT Austin)Prof. Moinuddin Qureshi (Georgia Tech.)
Committee:Prof. Onur Mutlu (Chair)Prof. James HoeProf. Kayvon FatahalianProf. Stephen Keckler (NVIDIA, UT Austin)Prof. Moinuddin Qureshi (Georgia Tech.)