
  • VLSI System Design EE577B

    2009 Fall Semester

    Naehyuck Chang

    Visiting Professor

    [email protected]

    Dept. of Electrical Engineering

    Computer Engineering Division

    Viterbi School of Engineering

    University of Southern California

    High Speed Memories: Evolution of Dynamic Memories (DRAM, DDR2 and DDR3)

    1

  • Naehyuck Chang

    DRAM Architecture

    General memory cell interconnection structure

    2

  • Naehyuck Chang 3

    DRAM Architecture

    DRAM (Dynamic RAM) cell: a capacitor keeps the value, but the capacitor is leaky

    Reliable for 64 ms after charging or discharging; repeat charging or discharging every 64 ms. Volatile storage

    Cheap memory

  • Naehyuck Chang

    DRAM Architecture

    Wordline: selection (addressing) of cells connected to the same wordline; exclusively selected by a decoder

    Bitline: cell values are transferred via the bitlines; parallel architecture

    Address encoding: non-redundant address value encoding using binary numbers

    Address decoding: a large binary decoder, e.g., a 28-to-268,435,456 (2^28) decoder for a 256 Mb memory (see the sketch below)

    4

    Addressing
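    A quick sanity check on the decoder size quoted above (an illustrative Python sketch; the numbers are those on the slide):

```python
# Size of a flat binary address decoder for a 256 Mb memory if every cell
# needed its own decoder output (the 28-to-268,435,456 figure above).

capacity_bits = 256 * 2**20                      # 256 Mb = 268,435,456 cells
address_bits = (capacity_bits - 1).bit_length()  # 28 address lines needed
print(address_bits, 2**address_bits)             # 28 268435456
```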

  • Naehyuck Chang

    DRAM Architecture

    NOR structure: fast random access capable

    5

    [Figure: charged vs. discharged cell states]


  • Naehyuck Chang

    DRAM Architecture

    Multiplexed addressing

    6

  • Naehyuck Chang 7

    DRAM Architecture

    Sense amplifier: the memory cell is a small transistor and capacitor

    The bitline is a big capacitor, so connecting a memory cell to the bitline causes only a small voltage change; a differential sense amplifier resolves it (see the sketch below)
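    A rough charge-sharing estimate of that small bitline swing; the capacitance and voltage values below are assumed, representative numbers rather than figures from the slide:

```python
# Charge sharing between a small cell capacitor and a large, precharged bitline:
# delta_v = (v_cell - v_precharge) * c_cell / (c_cell + c_bitline)

c_cell = 30e-15     # cell capacitance, ~30 fF (assumed)
c_bl   = 300e-15    # bitline capacitance, ~300 fF (assumed)
vdd    = 2.5        # supply voltage (assumed)
v_pre  = vdd / 2    # bitline precharged to VDD/2

delta_v = (vdd - v_pre) * c_cell / (c_cell + c_bl)
print(f"{delta_v * 1000:.0f} mV swing on the bitline")   # ~114 mV
```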

  • Naehyuck Chang

    DRAM Architecture

    8

    DRAM bus interface: multiplexed addressing

    Row address and column address share the same pins

  • Naehyuck Chang

    DRAM Operations

    Precharge

    9

  • Naehyuck Chang

    DRAM Operations

    Row access Assert RAS

    (row address strobe)

    10

  • Naehyuck Chang

    DRAM Operations

    Column access Assert CAS*

    (column address strobe)

    11

  • Naehyuck Chang

    DRAM Operations

    Refresh: DRAM cells are leaky capacitors and can hold charge for 64 ms. Read operations restore correct data for the whole wordline, so dummy read operations are used for refresh, spaced 15.6 µs apart (see the calculation below). Refresh overhead:

    Time for the refresh operation; need to close the row currently open if it is not precharged already

    If uniform read operations over all cells are guaranteed, no dummy read is needed, e.g., a video frame buffer

    12
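    Where the 15.6 µs figure comes from, assuming the 64 ms tREF and 4096-row configuration quoted on the SDRAM refresh slide later on:

```python
# Spread 4096 row refreshes evenly over the 64 ms retention period.
t_ref_ms = 64
rows = 4096
interval_us = t_ref_ms * 1000 / rows
print(f"{interval_us:.3f} us between refresh operations")   # 15.625 us
```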

  • Naehyuck Chang

    DRAM Evolution

    13

    FPM: Fast Page Mode
    EDO: Extended Data Out
    P/B EDO: Pipelined Burst EDO
    SDRAM: Synchronous DRAM
    ESDRAM: Enhanced Synchronous DRAM
    DDR: Double Data Rate
    VCDRAM: Virtual Channel DRAM
    FCRAM: Fast Cycle RAM
    MoSys: Memory on System, distributed DRAMs for SoCs and FPGAs

    VRAM: serial out

  • Naehyuck Chang

    Conventional DRAM

    Protocol

    RAS*: row address strobe that latches the row address

    CAS*: column address strobe that latches the column address. RAS* and CAS* go high to precharge and get ready for the next cycle. Read vs. write is determined by WE* when CAS* is asserted

    Variations are allowed, such as read-modify-write

    14

  • Naehyuck Chang

    Conventional DRAM

    Read operation: WE* should be high while CAS* is asserted; OE* can be enabled whenever data is needed

    Three-state buffer enable

    15

    DRAM read operation

  • Naehyuck Chang

    Conventional DRAM

    Write operation: WE* should be low when CAS* is asserted

    Data is latched with CAS*; OE* is don't care

    16

    DRAM write operation

  • Naehyuck Chang

    Conventional DRAM

    Refresh: RAS*-only refresh

    Refresh address should be given; selective refresh is feasible

    Refresh overhead: in general, every 15.6 µs

    17

    DRAM RAS*-only refresh

  • Naehyuck Chang

    Conventional DRAM

    Refresh: CAS* before RAS* refresh (CBR)

    Uses an otherwise prohibited DRAM protocol sequence; uses an internal refresh address counter, which makes the refresh control circuit simpler

    18

    DRAM CAS* before RAS* operation

  • Naehyuck Chang

    Conventional DRAM

    Refresh: hidden refresh

    A CBR refresh is hidden after a read/write operation; WE* should be high when the second (hidden) CAS* is asserted. Data is available while RAS* is asserted. Reduces refresh overhead

    19

    DRAM hidden refresh

  • Naehyuck Chang

    Conventional DRAM

    Refresh: self refresh

    Lets the DRAM controller suspend operation to save power without losing the data stored in DRAM

    20

    DRAM self refresh

  • Naehyuck Chang

    Pseudo SRAM

    Pseudo SRAM: consists of a DRAM macro core with a traditional SRAM interface. On-chip refresh circuit. Higher density, higher speed, smaller die size than SDRAM. DRAM-compatible process. Around 70 ns access time or a 133 MHz synchronous interface. Low active standby current

    E.g., < 100 µA. Deep power-down mode

    E.g., < 5 µA. Mobile phone applications

    21

  • Naehyuck Chang

    Fast Page Mode (FPM)

    Keep RAS* active for a while, e.g., 10 µs; keep the page or row open after a CAS* cycle completes

    Repeat CAS* assertions while changing the column addresses; continue to access different cells connected to the same wordline

    Significant reduction of the access latency if spatial locality exists: the precharge and RAS delays are avoided

    22

  • Naehyuck Chang

    Extended Data Out (EDO)

    Motivation: during FPM, data disappears when CAS* becomes inactive, so CAS* must be kept asserted until the CPU latches the output data

    EDO: extend the output data with a latch even after CAS* becomes inactive for the next column-address processing. A very simple modification, but it enhances throughput by up to 27% (5-2-2-2 @ 66 MHz)

    Roughly 96 MB/s with a 32-bit bus (see the calculation below)

    23
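    A check of the 96 MB/s and 27% figures; the FPM 5-3-3-3 burst pattern used as the baseline is an assumption, not stated on the slide:

```python
# Burst throughput for a 4-beat access pattern on a 32-bit, 66 MHz bus.
bus_mhz = 66
bus_bytes = 4                          # 32-bit bus

def burst_mb_per_s(pattern):
    return len(pattern) * bus_bytes * bus_mhz / sum(pattern)   # MB/s

fpm = burst_mb_per_s((5, 3, 3, 3))     # ~75 MB/s (assumed FPM baseline)
edo = burst_mb_per_s((5, 2, 2, 2))     # 96 MB/s
print(f"EDO {edo:.0f} MB/s, {edo / fpm - 1:.0%} over FPM")     # EDO 96 MB/s, 27% over FPM
```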

  • Naehyuck Chang

    Extended Data Out (EDO)

    70ns, 60ns and 50ns speeds

    24

    [Timing diagrams: the same data-out time with a shorter vs. a longer CAS* duration]

  • Naehyuck Chang

    Burst EDO

    EDO + built-in column address counter; spatial locality changes as caches appear

    Page mode does not always enhance speed (wrong prediction), but a cache line fill is a guaranteed spatial locality

    2-bit binary counter w/o carry (wrap around): no need to supply a new column address if the next column address is a simple increment within the burst boundary (see the counter sketch below). Initiates the memory controller concept

    5-1-1-1 access can improve throughput by up to 50% over EDO

    25

    Burst EDO / Pipelined burst EDO
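    A minimal sketch of the 2-bit wrap-around burst counter described above; the function name and column width are illustrative:

```python
# Starting anywhere inside a 4-beat burst, only the low 2 column bits count up
# and wrap; the upper column-address bits never see a carry.
def burst_columns(start_col):
    upper = start_col & ~0b11          # upper column bits stay fixed
    low = start_col & 0b11             # 2-bit counter, wraps around
    return [upper | ((low + i) & 0b11) for i in range(4)]

print(burst_columns(0b10110))          # [22, 23, 20, 21]
```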

  • Naehyuck Chang

    VRAM (Video DRAM)

    EDO does not provide enough bandwidth for frame buffers: 132 MB/s with a 32-bit bus (5-1-1-1 @ 66 MHz), while 1280 by 1024 with 32 bpp @ 80 Hz refresh requires about 400 MB/s for display refresh alone (see the calculation below)

    Dual-port DRAM for video frame buffers: two ports that can be used simultaneously

    DRAM port: random-access port like standard DRAMs. Video port: serial port clocked by a serial clock (SCLK)

    Typical DRAM arrays normally access a full row of bits (i.e. a word line) at up to 1024 bits at a time

    26
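    The frame-buffer bandwidth figure above, recomputed from the stated resolution, depth, and refresh rate:

```python
# Scan-out bandwidth for a 1280x1024, 32 bpp frame buffer refreshed at 80 Hz.
width, height = 1280, 1024
bytes_per_pixel = 32 // 8
refresh_hz = 80

mb_per_s = width * height * bytes_per_pixel * refresh_hz / 1e6
print(f"{mb_per_s:.0f} MB/s")          # ~419 MB/s, i.e. roughly 400 MB/s
```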

  • Naehyuck Chang

    Synchronous DRAM (SDRAM)

    Motivation: asynchronous bus protocols exhibit significantly low efficiency; the handshake protocol requires an active and an inactive phase

    Microprocessors now have synchronous bus protocols

    Burst-mode EDO is ready to move to a synchronous interface: CAS* is actually synchronized to the 66 MHz bus clock

    Synchronous bus protocol: all inputs are latched at the edge of the clock; no need to repeat active and inactive phases, which occupy a minimum of two clock cycles

    Multiplexed addressing with CAS* and RAS* is preserved; temporal redundancy still exists due to the structure of the memory organization; pin counts are important for the chip cost

    27

    Non-zero CAS duration

  • Naehyuck Chang

    Synchronous DRAM (SDRAM)

    A command scheme is introduced: patterns of RAS*, CAS* and WE* determine the commands

    On-chip circuit for control and timing

    28

  • Naehyuck Chang

    Synchronous DRAM (SDRAM)

    Generally, a dedicated memory controller is used to convert microprocessor-compatible signals and timing to SDRAM signals and timing

    29

  • Naehyuck Chang

    Synchronous DRAM (SDRAM)

    30

    SDRAM protocol: an SDRAM 4-beat burst read; command and data are synchronized with the clock

    The bus clock and the SDRAM delay do not exactly match; the bus clock used to be synchronous (an integer multiple) with the CPU core clock

  • Naehyuck Chang

    Synchronous DRAM (SDRAM)

    Read/write efficiency: it takes time to reverse the direction of the data bus. A read-to-write switch costs 2 cycles; a write-to-read switch costs 4 cycles. Extra NOPs are inserted between requests; the switch frequency depends on the traffic

    31

  • Naehyuck Chang

    Synchronous DRAM (SDRAM)

    SDRAM refresh: it is possible to refresh a RAM chip by opening and closing (activating and precharging) each row in each bank, i.e., RAS*-only refresh. To simplify the memory controller, SDRAM chips support an auto refresh command

    Performs auto refresh to one row in each bank simultaneously; maintains an internal refresh counter. The memory controller simply issues a sufficient number of auto refresh commands. Typical tREF = 64 ms and 4096 rows, so one every 15.6 µs

    A refresh command executes in 75 ns on a DDR2-400 256 Mb device; this corresponds to roughly 1% of the time (see the estimate below)

    32
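    A back-of-the-envelope version of that overhead estimate, using the 64 ms / 4096-row / 75 ns numbers quoted above:

```python
# Fraction of time spent on auto refresh: 4096 commands of ~75 ns per 64 ms.
t_ref_s = 64e-3
rows = 4096
t_rfc_s = 75e-9

overhead = rows * t_rfc_s / t_ref_s
print(f"{overhead:.2%} of the time")   # 0.48%, on the order of the ~1% quoted above
```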

  • Naehyuck Chang

    Synchronous DRAM (SDRAM)

    On-chip multi-banking: multiple semi-independent banks. Typical configuration (see the capacity sketch below):

    4 to 8 banks, 16K rows/bank, 1024 columns/row, 4 to 16 bits/column

    33
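    The per-chip capacity implied by that configuration range, assuming capacity = banks × rows × columns × bits per column:

```python
# Device capacity for the low and high ends of the configuration above.
def capacity_mbit(banks, rows, cols, bits):
    return banks * rows * cols * bits / 2**20

print(capacity_mbit(4, 16 * 1024, 1024, 4))    # 256.0 Mb (low end)
print(capacity_mbit(8, 16 * 1024, 1024, 16))   # 2048.0 Mb, i.e. 2 Gb (high end)
```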

  • Naehyuck Chang

    Synchronous DRAM (SDRAM)

    Timing and bus clock frequency: SDRAM has a DRAM architecture and a synchronous bus interface. Internal timing requirements have nothing to do with the bus clock, so to cover the internal delay, a proper integer number of wait clock cycles should be applied

    34

  • Naehyuck Chang

    Synchronous DRAM (SDRAM)

    Pipelined memory access: the SDRAM interface has separate data and command buses, allowing the overlapping of different actions in different banks through the command scheme

    While data is transferred to or from one bank, other banks are activated and precharged (bank preparation)

    Pipelining memory accesses increases efficiency and throughput

    35

  • Naehyuck Chang

    Synchronous DRAM (SDRAM)

    Timing and bus clock frequency Write to precharge at 33 MHz bus clock

    Write to precharge at 66 MHz bus clock

    36

  • Naehyuck Chang

    Enhanced SDRAM (ESDRAM)

    Made by Enhanced Memory Systems; includes a small static RAM in the SDRAM chip

    Many accesses will be served from the faster SRAM; a wide bus between the SRAM and the SDRAM

    On-chip bus. A category of cache DRAM, used mainly for L1 and L2

    37

  • Naehyuck Chang

    Two Directions

    Structural modification: reduce the access latency; break down the classical structure of DRAM; get rid of the CAS* and RAS* scheme

    Interface modification: improve the bus protocol; high-speed signaling; serialization of data; special I/O drivers; source-synchronous signaling

    38

  • Naehyuck Chang

    Virtual Channel DRAM (VCDRAM)

    Designed by NEC; contains SRAM caches: 16 virtual channels, i.e., 16 × 1 KB SRAM caches. While the ESDRAM module handles caching internally, the VC SDRAM cache is managed by the chipset

    39

  • Naehyuck Chang

    Fast Cycle RAM

    Developed by the Fujitsu Corporation; approaches the DRAM/processor speed problem in a different way

    Various technologies such as EDO and SDRAM have attacked the problem with enhanced logic circuitry and peripherals that accessed the DRAM core

    FCRAM seeks to change the DRAM core itself: core segmentation and pipelined operation; the ability to send row and column information at the same time

    40

  • Naehyuck Chang

    Memory on System (MoSys)

    41

    Surrounds the bit cell with control circuitry that makes the memory functionally equivalent to SRAM; the controller hides all DRAM-specific operations such as precharging and refresh

    Closer in size and density to embedded DRAM; SoC applications

  • Naehyuck Chang

    Double Data Rate (DDR) DRAM memories

    Double Data Rate (DDR) memories employ stub series terminated logic (SSTL_2) I/O drivers

    Signal swing of 2.5 V, JEDEC Standard JESD8-9B

    Synchronous bus: the sender of the data sends a reference strobe signal along with the data; the edges of the strobe are used to capture the valid data

    Double data rate: the data is transferred on both positive and negative edges of the clock; data rate of 266 Mbps/pin

    42

  • Naehyuck Chang

    DDR2 SDRAM

    Employs an I/O buffer between the memory and the data bus; the data bus can be run at twice the speed of the memory clock

    The two factors combine to achieve a total of 4 data transfers per memory clock cycle. For a 64-bit bus and a 100 MHz memory clock (see the calculation below):

    Peak transfer rate = (memory clock rate) × 2 (bus clock multiplier) × 2 (dual data rate) × 64 (bits transferred) / 8 (bits per byte) = 3200 MB/s

    43
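    The DDR2 peak-rate formula evaluated numerically (variable names are only for readability):

```python
# DDR2 peak transfer rate for a 100 MHz memory clock and a 64-bit bus.
mem_clock_mhz = 100
bus_multiplier = 2          # I/O bus runs at twice the memory clock
dual_rate = 2               # data transferred on both clock edges
bus_bits = 64

peak_mb_s = mem_clock_mhz * bus_multiplier * dual_rate * bus_bits / 8
print(peak_mb_s)            # 3200.0 MB/s
```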

  • Naehyuck Chang

    DDR3 SDRAM

    Double-data-rate three synchronous dynamic random access memory; an improvement over DDR2 SDRAM. For a 64-bit bus and a 100 MHz memory clock (see the calculation below):

    Peak transfer rate = (memory clock rate) × 4 (bus clock multiplier) × 2 (dual data rate) × 64 (bits transferred) / 8 (bits per byte) = 6400 MB/s

    44
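    The same arithmetic with DDR3's 4× bus clock multiplier:

```python
# DDR3 peak transfer rate: 100 MHz memory clock, 4x bus clock, dual data rate, 64-bit bus.
peak_mb_s = 100 * 4 * 2 * 64 / 8
print(peak_mb_s)            # 6400.0 MB/s
```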

  • Naehyuck Chang

    DDR3 SDRAM

    SDRAM/DDR2/DDR3 CAS latency

    45

  • Naehyuck Chang

    Direct Rambus DRAM

    A wide internal bus connected via a high-speed interface to a narrow external bus: an 18-bit-wide bidirectional data field and an 8-bit-wide field carrying commands and row and column addresses. The narrow bus is serialized and deserialized to provide a 144-/128-bit data path into the core, which delivers 16 bytes every 10 ns internally. A 2-byte external bus with a 1.25 ns cycle yields a 1,600-Mbyte/s bandwidth (see the check below)

    Transfers are accomplished on the rising and falling edges of the clock

    46
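    Checking that the external and internal Direct Rambus bandwidth figures agree:

```python
# Direct Rambus bandwidth: narrow, fast external bus vs. wide, slower internal path.
external_mb_s = 2 / 1.25e-9 / 1e6      # 2 bytes every 1.25 ns
internal_mb_s = 16 / 10e-9 / 1e6       # 16 bytes every 10 ns
print(external_mb_s, internal_mb_s)    # 1600.0 1600.0 MB/s
```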

  • Naehyuck Chang

    Direct Rambus DRAM

    Deep pipeline: high throughput, high latency

    47