embedded system lab. 김해천 [email protected] memory scaling: a system architecture...

Embedded System Lab.


김해천 [email protected]

Memory Scaling: A System Architecture Perspective

Onur Mutlu, Carnegie Mellon University

[email protected]  users.ece.cmu.edu/~omutlu/

김 해 천


DRAM Refresh
- DRAM capacitor charge leaks over time
- The memory controller needs to refresh each row periodically to restore charge
- Downsides of refresh
  - Energy consumption: each refresh consumes energy
  - Performance degradation: the DRAM rank/bank is unavailable while it is being refreshed
  - Refresh rate limits DRAM capacity scaling

[Figures: Refresh Overhead: Energy; Refresh Overhead: Performance]
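
The performance downside can be made concrete with a rough calculation. The sketch below is not from the slides: it assumes the standard DDR timing names tRFC (time one refresh command keeps the rank busy) and tREFI (interval between refresh commands), and the numeric values are illustrative placeholders only.

```python
def refresh_busy_fraction(t_rfc_ns: float, t_refi_ns: float) -> float:
    """Fraction of time a rank is unavailable because it is being refreshed."""
    if t_refi_ns <= 0:
        raise ValueError("tREFI must be positive")
    return t_rfc_ns / t_refi_ns


# Illustrative values (assumptions, not measurements from the talk).
T_REFI_NS = 7800.0   # one refresh command roughly every 7.8 us
T_RFC_NS = 350.0     # rank busy for 350 ns per refresh command

fraction = refresh_busy_fraction(T_RFC_NS, T_REFI_NS)
print(f"rank unavailable {fraction:.1%} of the time")

# If a denser chip needed twice as long per refresh command,
# the unavailable fraction would double as well.
denser = refresh_busy_fraction(2 * T_RFC_NS, T_REFI_NS)
print(f"with doubled tRFC: {denser:.1%}")
```

The second call hints at why refresh limits capacity scaling: more rows per chip means more refresh work in the same interval.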



Reducing Refresh Impact
- Observation: the high refresh rate is caused by a few weak DRAM cells
- Problem: all cells are refreshed at the same high rate
- Idea: RAIDR decreases the refresh rate for most DRAM cells while refreshing weak cells at a higher rate
  - Group parts of DRAM into different bins depending on their required refresh rate
  - Rows containing leaky cells are refreshed more frequently; most rows are refreshed less frequently
- Steps: Profiling → Binning → Refreshing
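
The profiling, binning, and refreshing steps can be sketched as follows. This is a simplified illustration, not RAIDR's actual hardware mechanism: the bin intervals (64/128/256 ms), the `profile` dictionary, and all function names are assumptions made for the example.

```python
import bisect

# Assumed refresh intervals for the bins, in milliseconds (illustrative).
BIN_INTERVALS_MS = [64, 128, 256]


def bin_for_retention(retention_ms: float) -> int:
    """Pick the longest refresh interval that does not exceed the row's retention time."""
    idx = bisect.bisect_right(BIN_INTERVALS_MS, retention_ms) - 1
    if idx < 0:
        raise ValueError("row cannot hold data even at the fastest refresh rate")
    return BIN_INTERVALS_MS[idx]


def bin_rows(profile: dict) -> dict:
    """Binning step: map each refresh interval to the rows assigned to it."""
    bins = {interval: [] for interval in BIN_INTERVALS_MS}
    for row, retention_ms in profile.items():
        bins[bin_for_retention(retention_ms)].append(row)
    return bins


def refreshes_per_window(bins: dict, window_ms: int = 256) -> int:
    """Refreshing step: count row refreshes issued in one window."""
    return sum(len(rows) * (window_ms // interval) for interval, rows in bins.items())


# Profiling step (hypothetical result): row id -> measured retention time in ms.
profile = {0: 70, 1: 300, 2: 1000, 3: 130, 4: 5000}
bins = bin_rows(profile)
baseline = len(profile) * (256 // 64)   # every row refreshed at the fastest rate
print(bins, refreshes_per_window(bins), baseline)
```

In this toy profile only row 0 is weak, so most rows land in the slowest bin and the total number of refreshes per window drops.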



DRAM Latency and Energy
- The DRAM industry's priority: provide maximum capacity for a given cost
- To mitigate the high area overhead of DRAM sensing structures, many DRAM cells are connected to each sense amplifier through a wire (bitline)
- The resulting long bitline has high parasitic capacitance



Reducing DRAM Latency and Energy
- Challenge 1: How to efficiently migrate a row between segments?
- Challenge 2: How to efficiently manage the cache?



RowClone: Accelerating Page Copy and Initialization
- Bulk data copy and initialization
  - Unnecessarily move data over the memory channel
  - Degrade system performance and energy efficiency
- RowClone: perform the copy inside DRAM at low cost
  - Uses the row buffer to copy a large quantity of data
  - Source row → row buffer → destination row
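
The difference between copying over the memory channel and copying through the row buffer can be illustrated with a toy model. The `Subarray` class, its fields, and the byte counter are hypothetical constructs for this sketch; they only model where the data travels, not real DRAM commands or timing.

```python
class Subarray:
    """Toy model of DRAM rows that share one row buffer."""

    def __init__(self, num_rows: int, row_size: int):
        self.rows = [bytearray(row_size) for _ in range(num_rows)]
        self.row_buffer = bytearray(row_size)
        self.channel_bytes = 0  # bytes moved over the memory channel

    def copy_over_channel(self, src: int, dst: int) -> None:
        """Conventional copy: data goes out to the processor and comes back."""
        data = bytes(self.rows[src])      # read the row over the channel
        self.channel_bytes += len(data)
        self.rows[dst][:] = data          # write it back over the channel
        self.channel_bytes += len(data)

    def copy_in_dram(self, src: int, dst: int) -> None:
        """In-DRAM copy: source row -> row buffer -> destination row."""
        self.row_buffer[:] = self.rows[src]   # activate the source row
        self.rows[dst][:] = self.row_buffer   # activate the destination row


sa = Subarray(num_rows=4, row_size=8)
sa.rows[0][:] = b"ABCDEFGH"
sa.copy_in_dram(0, 2)
print(bytes(sa.rows[2]), sa.channel_bytes)   # no channel traffic
sa.copy_over_channel(0, 3)
print(bytes(sa.rows[3]), sa.channel_bytes)   # row read plus row write
```

The in-DRAM path leaves the channel counter untouched, which is the source of the performance and energy savings the slide describes.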



RowClone: Accelerating Page Copy and Initialization

[Figure panels: Fast Parallel Mode; Pipelined Serial Mode]



SALP: Reducing DRAM Bank Conflicts
- Problem: bank conflicts are costly for performance and energy
  - Serialized requests, wasted energy
- Goal: reduce bank conflicts without adding more banks (low cost)
- Key idea: exploit the internal subarray structure of a DRAM bank to parallelize bank conflicts to different subarrays

[Figures: Logical hierarchy of main memory; DRAM bank organization]
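
To make the key idea concrete, the sketch below classifies a pair of requests by whether subarray-level parallelism could help. The number of rows per subarray and the `(bank, row)` request format are assumptions chosen for illustration, not values from the slides.

```python
ROWS_PER_SUBARRAY = 512  # assumed subarray size (illustrative)


def subarray_of(row: int) -> int:
    """Subarray index of a row, assuming rows are laid out contiguously."""
    return row // ROWS_PER_SUBARRAY


def classify(first: tuple, second: tuple) -> str:
    """Classify two (bank, row) requests issued back to back."""
    bank_a, row_a = first
    bank_b, row_b = second
    if bank_a != bank_b:
        return "different banks"        # already served in parallel
    if row_a == row_b:
        return "row hit"                # same open row, no conflict
    if subarray_of(row_a) != subarray_of(row_b):
        return "conflict, different subarrays"   # SALP can overlap these
    return "conflict, same subarray"    # still serialized


print(classify((0, 10), (1, 10)))
print(classify((0, 10), (0, 10)))
print(classify((0, 10), (0, 600)))
print(classify((0, 10), (0, 20)))
```

Only the third case is a bank conflict that the subarray structure can parallelize; conflicts within one subarray remain serialized.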



SALP: Reducing DRAM Bank Conflicts
Subarray-Level Parallelism



Hybrid Main Memory
- PCM is a promising technology that can offer higher capacity than DRAM
  - But its access latency and energy are higher than DRAM's
  - Its endurance is lower
- Row buffers are the same in DRAM and PCM
  - Row buffer hit latency is the same in DRAM and PCM
  - Row buffer miss latency is small in DRAM, large in PCM



Hybrid Main Memory

- Idea: cache in DRAM only those rows that
  - Frequently cause row buffer conflicts → because row-conflict latency is smaller in DRAM
  - Are reused many times → to reduce cache pollution and bandwidth waste
- Simplified rule of thumb
  - Streaming accesses: better to place in PCM
  - Other accesses (with some reuse): better to place in DRAM
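
The rule of thumb can be sketched as a small placement policy. The counters, the two thresholds, and the class name are hypothetical; the slides do not specify how conflicts and reuse are measured, so this only illustrates the decision logic.

```python
from collections import defaultdict


class PlacementPolicy:
    """Decide whether a row belongs in DRAM or PCM from simple access counters."""

    def __init__(self, miss_threshold: int = 2, reuse_threshold: int = 4):
        # Both thresholds are illustrative assumptions.
        self.miss_threshold = miss_threshold
        self.reuse_threshold = reuse_threshold
        self.misses = defaultdict(int)     # row buffer misses per row
        self.accesses = defaultdict(int)   # total accesses per row

    def record(self, row: int, row_buffer_hit: bool) -> None:
        self.accesses[row] += 1
        if not row_buffer_hit:
            self.misses[row] += 1

    def placement(self, row: int) -> str:
        conflicts_often = self.misses[row] >= self.miss_threshold
        reused_often = self.accesses[row] >= self.reuse_threshold
        return "DRAM" if conflicts_often and reused_often else "PCM"


policy = PlacementPolicy()

# Streaming row: opened once, then read sequentially out of the row buffer.
policy.record(1, row_buffer_hit=False)
for _ in range(7):
    policy.record(1, row_buffer_hit=True)

# Reused row that keeps getting evicted from the row buffer.
for _ in range(5):
    policy.record(2, row_buffer_hit=False)

print(policy.placement(1), policy.placement(2))
```

The streaming row stays in PCM because its accesses hit the row buffer, where PCM and DRAM latencies are the same; the conflicting, reused row is the one worth caching in DRAM.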



The End Thank you



Reference: http://www.anandtech.com/show/3851/everything-you-always-wanted-to-know-about-sdram-memory-but-were-afraid-to-ask/2



Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture

Inter-Segment Migration
- Our way:
  - Source and destination cells share bitlines
  - Transfer data from source to destination across shared bitlines concurrently

[Figure labels: Near Segment, Far Segment, Isolation Transistor, Sense Amplifier, Source, Destination]



Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture



SALP

ACTIVATE Row: the process of reading an entire row into the row buffer
- Before a DRAM row is activated, the bank must be in the precharged state. In this state, all bitlines are held at a voltage level of 1/2 VDD.
- As soon as the ACTIVATE command is sent (together with the row address), the corresponding wordline is raised to VPP, connecting the row's cells to the bitlines.
- Depending on whether each cell is charged (Q) or uncharged (0), the bitline voltage is then slightly perturbed toward VDD or 0.
- The row buffer senses this perturbation and amplifies it.
- While the bitline voltage is in transition, the state is undefined.
- Once the bitline voltage stabilizes, the cell charges are restored to their original values. The total time for this is called tRAS.

READ/WRITE Column
- After the activation, the memory controller issues a READ or WRITE command together with the column address.
- The timing constraint between an ACTIVATE and the subsequent column command (READ/WRITE) is called tRCD.
- If the next request to the bank accesses the same row, only a column command is needed, because the row is already activated.
- This is therefore faster than activating a new row.

PRECHARGE Bank
- To activate a new row, the memory controller must first put the bank into the precharged state.
- This happens in two steps:
  1. The wordline of the currently activated row is lowered to zero, disconnecting the cells from the bitlines.
  2. The bitlines are driven to 1/2 VDD. The time this takes is called tRP.
- tRP: the time needed to guarantee that the previously activated subarray is fully precharged.

tWR: after a WRITE is issued, additional time called the write recovery latency is required. The row buffer needs time to stabilize while it is driven to the new voltage values. If the bank is precharged while the row buffer is still driving the new voltages, the voltages change and the data cannot be stored completely.
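
The ACTIVATE, READ/WRITE, and PRECHARGE sequence above can be sketched as a tiny bank model that reports how long a request waits before its column command can be issued. The timing values are illustrative placeholders, and the model is a simplification: it ignores tRAS and tWR and tracks only which row is open.

```python
class Bank:
    """Minimal model of one DRAM bank with a single row buffer."""

    def __init__(self, t_rcd_ns: float = 15.0, t_rp_ns: float = 15.0):
        # Timing values are assumptions for illustration only.
        self.t_rcd_ns = t_rcd_ns
        self.t_rp_ns = t_rp_ns
        self.open_row = None   # None means the bank is precharged

    def access(self, row: int) -> float:
        """Return the delay (ns) before the column command can be issued."""
        if self.open_row == row:
            return 0.0                    # row hit: column command only
        delay = self.t_rcd_ns             # ACTIVATE, then wait tRCD
        if self.open_row is not None:
            delay += self.t_rp_ns         # PRECHARGE first, wait tRP
        self.open_row = row
        return delay


bank = Bank()
print(bank.access(3))   # bank was precharged: ACTIVATE only
print(bank.access(3))   # same row again: row hit
print(bank.access(7))   # different row: PRECHARGE + ACTIVATE
```

The third access shows the cost of a bank conflict: the request pays tRP on top of tRCD, which is the serialization that SALP targets.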



SALP1, SALP2

SALP1
- tRP (row-precharge time) is the time needed to ensure that the previously activated subarray (subarray X) is in the precharged state.
- If another subarray is already in the precharged state, that subarray can start its activation.
- SALP1 cannot overlap the write recovery time with an activation.
  - The write recovery of one subarray and the activation of another each need their own wordline, but the existing global row-address latch is shared by all subarrays, so it can serve the wordline of only one subarray at a time.

SALP2
- SALP2 adds row-address latches in hardware to make this possible, so that tWR and the activation can be overlapped.
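
The difference between the schemes can be illustrated with a simplified timing comparison. The scenario is an assumption for this sketch: a WRITE has just been issued to an open row in one subarray, and the next request targets a row in a different, already precharged subarray of the same bank. The timing values are illustrative placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    # Illustrative values in nanoseconds (assumptions, not from the slides).
    t_rcd: float = 15.0   # ACTIVATE to column command
    t_rp: float = 15.0    # precharge time
    t_wr: float = 15.0    # write recovery time


def next_row_ready(t: Timing, scheme: str) -> float:
    """Time (ns) after the WRITE until the next row's column command can issue."""
    if scheme == "baseline":
        start = t.t_wr + t.t_rp   # write recovery, then a full precharge
    elif scheme == "SALP1":
        start = t.t_wr            # precharge overlaps with the new activation
    elif scheme == "SALP2":
        start = 0.0               # activation also overlaps write recovery
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return start + t.t_rcd


timing = Timing()
for scheme in ("baseline", "SALP1", "SALP2"):
    print(scheme, next_row_ready(timing, scheme))
```

Under these assumptions SALP1 hides tRP and SALP2 additionally hides tWR, matching the overlap described above.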



https://www.ece.cmu.edu/~safari/pubs/raidr_isca12.pdf


