VLSI System Design EE577B
2009 Fall Semester
Naehyuck Chang
Visiting Professor
Dept. of Electrical Engineering
Computer Engineering Division
Viterbi School of Engineering
University of Southern California
High Speed Memories: Evolution of Dynamic Memories
DRAM, DDR2 and DDR3
-
DRAM Architecture
General memory cell interconnection structure
-
DRAM Architecture
DRAM (Dynamic RAM) cell
  A capacitor holds the value
  The capacitor is leaky
    Reliable for 64 ms after charging or discharging
    Repeat charging or discharging every 64 ms
  Volatile storage
  Cheap memory
-
DRAM Architecture
Wordline
  Selects (addresses) the cells connected to the same wordline
  Exclusively selected by a decoder
Bitline
  Cell values are transferred via the bitlines
  Parallel architecture
Address encoding
  Non-redundant
  Address values are encoded as binary numbers
Address decoding
  Large binary decoder: a 28-to-268435456 decoder for a 256 Mb memory
Figure: Addressing
-
DRAM Architecture
NOR structure
  Capable of fast random access
Figure: discharged and charged cells
-
DRAM Architecture
Multiplexed addressing
-
DRAM Architecture
Sense amplifier
  A memory cell is a small capacitor (with an access transistor)
  The bitline is a large capacitor
  Connecting a memory cell to the bitline causes only a small voltage change
  Differential sense amplifier
-
DRAM Architecture
DRAM bus interface
  Multiplexed addressing
    Row address and column address share the same pins
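A minimal sketch of multiplexed addressing, assuming the 28-bit cell address of the 256 Mb example and an even split into a 14-bit row and a 14-bit column. The even split and the function names are illustrative assumptions, not taken from a specific device.

```python
ADDRESS_BITS = 28          # 2**28 = 268435456 cells, as in the 256 Mb example
ROW_BITS = 14              # assumed even split; real devices may differ
COL_BITS = ADDRESS_BITS - ROW_BITS


def split_address(address):
    """Return (row, column) as they would be sent over the shared pins."""
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError("address out of range")
    row = address >> COL_BITS                 # sent first, latched by RAS*
    column = address & ((1 << COL_BITS) - 1)  # sent second, latched by CAS*
    return row, column


def join_address(row, column):
    """Rebuild the full cell address from its row and column halves."""
    return (row << COL_BITS) | column


row, column = split_address(0x0ABCDEF)
print("cells:", 2 ** ADDRESS_BITS)
print("address pins needed:", max(ROW_BITS, COL_BITS))
print(f"row={row:#x} column={column:#x}")
```

Under these assumptions the device needs 14 address pins instead of 28, at the cost of sending the address in two steps.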
-
DRAM Operations
Precharge
-
DRAM Operations
Row access
  Assert RAS (row address strobe)
-
DRAM Operations
Column access
  Assert CAS (column address strobe)
-
DRAM Operations
Refresh
  DRAM cells are leaky capacitors
    They can hold charge for 64 ms
  A read operation restores the correct data for the whole wordline
    Dummy read operations are used for refresh
    Refreshes are evenly spaced, one every 15.6 µs
  Refresh overhead
    Time for the refresh operation itself
    Need to close the currently open row if it is not already precharged
  If uniform read operations over all cells are guaranteed, no dummy read is needed
    Video frame buffer
-
DRAM Evolution
FPM: Fast Page Mode
EDO: Extended Data Out
P/B EDO: Pipelined Burst EDO
SDRAM: Synchronous DRAM
ESDRAM: Enhanced Synchronous DRAM
DDR: Double Data Rate
VCDRAM: Virtual Channel DRAM
FCRAM: Fast Cycle RAM
MoSys: Memory on System, distributed DRAMs for SoC and FPGAs
Serial out: VRAM
-
Conventional DRAM
Protocol
  RAS*
    Row address strobe that latches the row address
  CAS*
    Column address strobe that latches the column address
  RAS* and CAS* go high to precharge and get ready for the next cycle
  Read or write is determined by WE* when CAS* is asserted
    Variations such as read-modify-write are allowed
-
Conventional DRAM
Read operation
  WE* should be high while CAS* is asserted
  OE* can be enabled whenever data is needed
    Three-state buffer enable
Figure: DRAM read operation
-
Conventional DRAM
Write operation
  WE* should be low when CAS* is asserted
    Data is latched with CAS*
  OE* is don't care
Figure: DRAM write operation
-
Conventional DRAM
Refresh
  RAS*-only refresh
    A refresh address must be supplied
    Selective refresh is feasible
  Refresh overhead
    In general, one refresh every 15.6 µs
Figure: DRAM RAS*-only refresh
-
Conventional DRAM
Refresh
  CAS*-before-RAS* (CBR) refresh
    Uses a signal order that the normal DRAM protocol prohibits
    Uses an internal refresh address counter
    Makes the refresh control circuit simpler
Figure: DRAM CAS*-before-RAS* operation
-
Conventional DRAM
Refresh
  Hidden refresh
    A CBR refresh is hidden after a read/write operation
    WE* should be high when the second (hidden) CAS* is asserted
    Data is available while RAS* is asserted
    Reduces refresh overhead
Figure: DRAM hidden refresh
-
Conventional DRAM
Refresh
  Self refresh
    The system can suspend its DRAM controller to save power without losing the data stored in DRAM
Figure: DRAM self refresh
-
Pseudo SRAM
  Consists of a DRAM macro core with a traditional SRAM interface
  On-chip refresh circuit
  Higher density, higher speed, smaller die size than SDRAM
  DRAM-compatible process
  Around 70 ns access time or a 133 MHz synchronous interface
  Low active standby current
    E.g., < 100 µA
  Deep power-down mode
    E.g., < 5 µA
  Mobile phone applications
-
Fast Page Mode (FPM)
Keep RAS* active for a while
  E.g., 10 µs
  Keep the page (row) open after a CAS* cycle completes
Repeat CAS* assertions while changing the column address
  Continue to access different cells connected to the same wordline
Significant reduction of access latency if spatial locality exists
  Saves the precharge and RAS delay
-
Extended Data Out (EDO)
Motivation
  In FPM, the output data disappears when CAS* becomes inactive
  CAS* must be held until the CPU latches the output data
EDO
  A latch extends the output data even after CAS* becomes inactive for the next column address
  A very simple modification, but it enhances throughput by up to 27% (5-2-2-2 @ 66 MHz)
    Roughly 96 MB/s with a 32-bit bus
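The 96 MB/s figure can be checked with a short calculation. This is a sketch that assumes back-to-back four-transfer bursts; the 5-3-3-3 FPM baseline used for the 27% comparison is an assumption, since the slide does not state which baseline it compares against.

```python
def burst_bandwidth(timing, clock_hz, bus_bytes):
    """Bytes per second for back-to-back bursts with the given cycle counts.

    timing is a burst pattern such as (5, 2, 2, 2): the clock cycles spent
    on each successive transfer of the burst.
    """
    cycles = sum(timing)
    bytes_per_burst = len(timing) * bus_bytes
    return bytes_per_burst * clock_hz / cycles


CLOCK_HZ = 66_000_000   # 66 MHz bus clock from the slide
BUS_BYTES = 4           # 32-bit bus

edo = burst_bandwidth((5, 2, 2, 2), CLOCK_HZ, BUS_BYTES)
# Assumed FPM baseline of 5-3-3-3; not stated in the slide.
fpm = burst_bandwidth((5, 3, 3, 3), CLOCK_HZ, BUS_BYTES)

print(f"EDO: {edo / 1e6:.0f} MB/s")                              # EDO: 96 MB/s
print(f"gain over assumed FPM: {(edo / fpm - 1) * 100:.0f}%")    # 27%
```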
-
Extended Data Out (EDO)
70 ns, 60 ns and 50 ns speeds
Figure: timing comparison with the same data-out time; shorter CAS* duration versus longer CAS* duration
-
Burst EDO
EDO + built-in column address counter
Spatial locality changes as caches appear
  Page mode does not always enhance speed: wrong prediction
  A cache line fill is guaranteed spatial locality
2-bit binary counter without carry (wrap-around)
  No need to supply a new column address if the next column address is a simple increment within the burst boundary
  Introduces the memory controller concept
5-1-1-1 access can improve throughput by up to 50% over EDO
Figure: Burst EDO and pipelined burst EDO
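The wrap-around behaviour of the 2-bit counter can be sketched as follows. This assumes a simple sequential burst order; actual devices may also offer an interleaved order, and the function name is hypothetical.

```python
BURST_LENGTH = 4  # a 2-bit counter gives four column addresses per burst


def burst_addresses(start_column):
    """Column addresses of one burst, wrapping inside the 4-column block."""
    base = start_column & ~(BURST_LENGTH - 1)   # upper bits stay fixed (no carry)
    offset = start_column & (BURST_LENGTH - 1)  # start value of the 2-bit counter
    return [base | ((offset + i) % BURST_LENGTH) for i in range(BURST_LENGTH)]


print(burst_addresses(4))  # [4, 5, 6, 7]
print(burst_addresses(6))  # [6, 7, 4, 5]
```

Because the counter never carries into the upper bits, a burst that starts in the middle of a cache line still stays inside that line.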
-
VRAM (Video DRAM)
EDO does not provide enough bandwidth for frame buffers
  132 MB/s with a 32-bit bus (5-1-1-1 @ 66 MHz)
  1280 by 1024 with 32 bpp at an 80 Hz refresh rate requires 400 MB/s for screen refresh alone
Dual-port DRAM for video frame buffers
  Two ports that can be used simultaneously
    DRAM port: random access port like standard DRAMs
    Video port: serial port with SLK
  Typical DRAM arrays access a full row of bits (i.e., a wordline), up to 1024 bits at a time
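Both bandwidth figures follow from simple arithmetic. The sketch below assumes the 400 MB/s figure uses binary megabytes (2^20 bytes) and the 132 MB/s figure uses decimal megabytes; the function name is illustrative.

```python
def scanout_bandwidth(width, height, bits_per_pixel, refresh_hz):
    """Bytes per second needed just to read the frame buffer for display."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * refresh_hz


needed = scanout_bandwidth(1280, 1024, 32, 80)
# Peak of a 5-1-1-1 burst on a 32-bit bus at 66 MHz: 16 bytes per 8 cycles.
available = 16 * 66_000_000 // 8

print(needed / 2**20)      # 400.0 (binary MB/s)
print(available / 10**6)   # 132.0 (decimal MB/s)
```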
-
Synchronous DRAM (SDRAM)
Motivation
  Asynchronous bus protocols exhibit significantly low efficiency
    Handshake protocol: active and inactive
  Microprocessors now have synchronous bus protocols
  Burst-mode EDO is ready to move to a synchronous interface
    CAS* is actually synchronized to the 66 MHz bus clock
Synchronous bus protocol
  All inputs are latched at the clock edge
  No need to repeat active and inactive phases, which occupy at least two clock cycles
Multiplexed addressing
  CAS* and RAS* are preserved
  Temporal redundancy still exists due to the structure of the memory organization
  Pin count is important for chip cost
Figure: non-zero CAS duration
-
Synchronous DRAM (SDRAM)
Command scheme is introduced
  Patterns of RAS*, CAS* and WE* determine the commands
On-chip circuit for control and timing
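As an illustration of the command scheme, the table below maps signal patterns to commands. The encoding is the one commonly used by single-data-rate SDRAM parts with chip select active; it is not given in the slides, so treat it as an assumption and check the data sheet of the actual device.

```python
# (RAS*, CAS*, WE*) sampled at the clock edge with chip select active.
# 0 = low (asserted), 1 = high. Encoding assumed from common SDR SDRAM parts.
COMMANDS = {
    (1, 1, 1): "NOP",
    (0, 1, 1): "ACTIVE",             # open a row (row address on the pins)
    (1, 0, 1): "READ",               # column address on the pins
    (1, 0, 0): "WRITE",
    (1, 1, 0): "BURST TERMINATE",
    (0, 1, 0): "PRECHARGE",          # close the open row
    (0, 0, 1): "AUTO REFRESH",
    (0, 0, 0): "LOAD MODE REGISTER",
}


def decode_command(ras_n, cas_n, we_n):
    """Return the command name for one sampled signal pattern."""
    return COMMANDS[(ras_n, cas_n, we_n)]


print(decode_command(0, 1, 1))  # ACTIVE
print(decode_command(1, 0, 1))  # READ
```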
-
Synchronous DRAM (SDRAM)
Generally, a dedicated memory controller is used
  Converts microprocessor-compatible signals and timing to SDRAM signals and timing
-
Synchronous DRAM (SDRAM)
SDRAM protocol
  Command and data are synchronized with the clock
    The bus clock and the SDRAM delay do not exactly match
    The bus clock used to be synchronous with the CPU core clock (related by an integer multiple)
Figure: SDRAM 4-beat burst read
-
Synchronous DRAM (SDRAM)
Read/write efficiency
  It takes time to reverse the direction of the data bus
    A read-to-write switch costs 2 cycles
    A write-to-read switch costs 4 cycles
  Extra NOPs are inserted between requests
  Switch frequency depends on the traffic
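The effect of bus turnaround on efficiency can be estimated with a small model. This sketch assumes every request transfers data for a fixed number of cycles (four here, an assumption) and that the only idle cycles are the turnaround costs listed above.

```python
READ_TO_WRITE = 2   # turnaround cycles from the slide
WRITE_TO_READ = 4


def bus_efficiency(requests, burst_cycles=4):
    """Fraction of cycles that carry data for a request string such as "RRWW".

    burst_cycles is an assumed data-transfer length per request.
    """
    data = len(requests) * burst_cycles
    idle = 0
    for prev, cur in zip(requests, requests[1:]):
        if prev == "R" and cur == "W":
            idle += READ_TO_WRITE
        elif prev == "W" and cur == "R":
            idle += WRITE_TO_READ
    return data / (data + idle)


print(f"{bus_efficiency('RRRRWWWW'):.3f}")  # one switch: 0.941
print(f"{bus_efficiency('RWRWRWRW'):.3f}")  # switch on every request: 0.615
```

Grouping reads and writes keeps the bus busy; alternating them on every request loses more than a third of the cycles in this model.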
-
Synchronous DRAM (SDRAM)
SDRAM refresh
  It is possible to refresh a RAM chip by opening and closing (activating and precharging) each row in each bank
    RAS*-only refresh
  To simplify the memory controller, SDRAM chips support an auto refresh command
    Performs auto refresh on one row in each bank simultaneously
    Maintains an internal refresh counter
    The memory controller simply issues a sufficient number of auto refresh commands
    Typical tREF = 64 ms with 4096 rows gives one refresh every 15.6 µs
  A refresh command executes in 75 ns on a DDR2-400 256 Mb device
    This corresponds to roughly 1% of the time
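The refresh interval and overhead can be worked out directly. With the 4096-row figure the overhead comes out near 0.5%; the "roughly 1%" figure is closer to what the same formula gives for 8192 rows, which is an assumption here and not something the slide states.

```python
def refresh_interval_ns(retention_ms, rows):
    """Time between refresh commands when they are spread evenly."""
    return retention_ms * 1_000_000 / rows


def refresh_overhead(refresh_ns, retention_ms, rows):
    """Fraction of time the device is busy refreshing."""
    return refresh_ns / refresh_interval_ns(retention_ms, rows)


print(refresh_interval_ns(64, 4096))            # 15625.0 ns, about 15.6 us
print(f"{refresh_overhead(75, 64, 4096):.2%}")  # 0.48%
# Assumption: 8192 rows, which halves the interval to about 7.8 us.
print(f"{refresh_overhead(75, 64, 8192):.2%}")  # 0.96%
```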
-
Synchronous DRAM (SDRAM)
On-chip multi-banking
  Multiple semi-independent banks
  Typical configuration
    4 to 8 banks
    16K rows/bank
    1024 columns/row
    4 to 16 bits/column
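The device capacity implied by such a configuration is the product of the four numbers. A quick check of the two ends of the typical range, with an illustrative function name:

```python
def capacity_bits(banks, rows_per_bank, columns_per_row, bits_per_column):
    """Total number of storage bits in the device."""
    return banks * rows_per_bank * columns_per_row * bits_per_column


ROWS = 16 * 1024  # 16K rows per bank

small = capacity_bits(4, ROWS, 1024, 4)
large = capacity_bits(8, ROWS, 1024, 16)

print(small // 2**20, "Mb")  # 256 Mb
print(large // 2**30, "Gb")  # 2 Gb
```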
-
Synchronous DRAM (SDRAM)
Timing and bus clock frequency
  SDRAM has a DRAM architecture and a synchronous bus interface
  Internal timing requirements have nothing to do with the bus clock
  To satisfy the internal delays, an appropriate integer number of wait clock cycles must be inserted
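Converting a fixed internal delay into wait cycles is a ceiling division by the clock period. The 20 ns delay below is a hypothetical value chosen only to show how the cycle count changes with the bus clock.

```python
def wait_cycles(delay_ps, clock_hz):
    """Smallest whole number of clock cycles covering delay_ps picoseconds."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-delay_ps * clock_hz // 10**12)


DELAY_PS = 20_000  # hypothetical 20 ns internal delay, for illustration only

for clock_hz in (33_000_000, 66_000_000):
    print(clock_hz // 10**6, "MHz ->", wait_cycles(DELAY_PS, clock_hz), "cycle(s)")
```

With these numbers one cycle is enough at 33 MHz but two are needed at 66 MHz, even though the internal delay itself has not changed.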
-
Synchronous DRAM (SDRAM)
Pipelined memory access
  The SDRAM interface has separate data and command buses
  Different actions in different banks are overlapped through the command scheme
    While data is transferred to or from one bank, other banks are activated and precharged (bank preparation)
  Pipelining memory accesses increases efficiency and throughput
-
Synchronous DRAM (SDRAM)
Timing and bus clock frequency
  Figure: write to precharge at a 33 MHz bus clock
  Figure: write to precharge at a 66 MHz bus clock
-
Enhanced SDRAM (ESDRAM)
Made by Enhanced Memory Systems
Includes a small static RAM in the SDRAM chip
  Many accesses will be served from the faster SRAM
  A wide bus between the SRAM and the SDRAM
    On-chip bus
Belongs to the category of cache DRAM; used mainly for L1 and L2
-
Two Directions
Structural modification
  Reduce the access latency
  Break down the classical structure of DRAM
  Get rid of the CAS* and RAS* scheme
Interface modification
  Improve the bus protocol
  High-speed signaling
  Serialization of data
  Special I/O drivers
  Source-synchronous signaling
-
Virtual Channel DRAM (VCDRAM)
Designed by NEC
Contains SRAM caches
  Contains 16 virtual channels, i.e., 16 SRAM caches of 1 KB each
While the ESDRAM module handles caching internally, the VC SDRAM cache is managed by the chipset
-
Fast Cycle RAM
Developed by Fujitsu Corporation
Approaches the DRAM/processor speed problem in a different way
  Technologies such as EDO and SDRAM attacked the problem with enhanced logic circuitry and peripherals that access the DRAM core
  FCRAM seeks to change the DRAM core itself
    Core segmentation and pipelined operation
    Ability to send row and column information at the same time
-
Memory on System (MoSys)
Surrounds the bit cell with control circuitry that makes the memory functionally equivalent to SRAM
  Controller hides all DRAM-specific operations such as precharging and refresh
Closer in size and density to embedded DRAM
SoC applications
-
Double Data Rate (DDR) DRAM memories
DDR memories employ stub series terminated logic (SSTL_2) I/O drivers
  Signal swing of 2.5 V
  JEDEC standard JESD8-9B
Synchronous bus
  The sender of the data sends a reference strobe signal along with the data
  The edges of the strobe are used to capture the valid data
Double data rate
  Data is transferred on both the positive and negative edges of the clock
  Data rate of 266 Mbps/pin
-
DDR2 SDRAM
Employs an I/O buffer between the memory and the data bus
  The data bus can run at twice the speed of the memory clock
The two factors combine to achieve a total of 4 data transfers per memory clock cycle
  For a 64-bit bus and a 100 MHz memory clock:
  Peak transfer rate = 100 MHz (memory clock rate) × 2 (bus clock multiplier) × 2 (dual rate) × 64 (number of bits transferred) / 8 (number of bits/byte) = 3200 MB/s
-
DDR3 SDRAM
Double-data-rate three synchronous dynamic random access memory
An improvement over DDR2 SDRAM
  For a 64-bit bus and a 100 MHz memory clock:
  Peak transfer rate = 100 MHz (memory clock rate) × 4 (bus clock multiplier) × 2 (dual rate) × 64 (number of bits transferred) / 8 (number of bits/byte) = 6400 MB/s
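The DDR2 and DDR3 peak rates differ only in the bus clock multiplier, so one function covers both. A minimal sketch with an illustrative function name, taking the memory clock in MHz:

```python
def peak_transfer_mb_s(memory_clock_mhz, bus_multiplier, bus_bits=64):
    """Peak rate in MB/s: clock x bus multiplier x 2 edges x bus width in bytes."""
    transfers_per_clock = bus_multiplier * 2  # both clock edges carry data
    return memory_clock_mhz * transfers_per_clock * bus_bits // 8


print(peak_transfer_mb_s(100, 2))  # DDR2: 3200
print(peak_transfer_mb_s(100, 4))  # DDR3: 6400
```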
-
DDR3 SDRAM
SDRAM/DDR2/DDR3 CAS latency
-
Direct Rambus DRAM
A wide internal bus connected via a high-speed interface to a narrow external bus
  An 18-bit-wide bidirectional data field
  An 8-bit-wide field carrying commands and row and column addresses
  The narrow bus is serialized and deserialized to provide a 144-/128-bit data path into the core, which provides 16 bytes every 10 ns internally
  A 2-byte external bus with a 1.25 ns transfer time yields a 1,600 Mbyte/s bandwidth
    Transfers are accomplished on the rising and falling edges of the clock
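The internal and external figures describe the same bandwidth, which a short calculation confirms. Times are given in picoseconds so that the arithmetic stays in integers; the function name is illustrative.

```python
def bandwidth_bytes_per_s(bytes_per_transfer, transfer_time_ps):
    """Bytes per second for one transfer of the given size per time slot."""
    return bytes_per_transfer * 10**12 // transfer_time_ps


external = bandwidth_bytes_per_s(2, 1_250)    # 2 bytes every 1.25 ns
internal = bandwidth_bytes_per_s(16, 10_000)  # 16 bytes every 10 ns

print(external, internal)  # 1600000000 1600000000
```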
-
Direct Rambus DRAM
Deep pipeline
  High throughput
  High latency