
Page 1: Dezső  Sima September 2008

Dezső Sima

September 2008

(Ver. 1.0) © Sima Dezső, 2008

2. Challenges/limiters of parallel connected synchronous memories

Page 2: Dezső  Sima September 2008

Overview

1. Key challenges facing main memories

2. Main limiters of increasing the transfer rate of main memories - Overview

3. Challenges in increasing the rate of sourcing/sinking data from/to the memory cell array

4. Challenges in increasing the transfer rate between the memory controller and the DRAM parts

5. Challenges in increasing the rate of capturing data in the memory controller/DRAM parts

6. Main limiters of increasing the memory size

7. References

Page 3: Dezső  Sima September 2008

1. Key challenges facing main memories

Page 4: Dezső  Sima September 2008

Key challenges facing main memories

• Increasing (single core) processor performance (the past)

1. Key challenges facing main memories (1)

Page 5: Dezső  Sima September 2008

Figure 1.2: Integer performance growth of Intel’s x86 processors

(Chart, summarized: SPECint92 performance vs. year for Intel x86 processors, from the 8088/5 and 8088/8 through the 286, 386, 486, Pentium, Pentium Pro, PII, PIII and Pentium 4 lines up to Prescott, 1979-2005; growth of roughly 100x/10 years, levelling off towards the end of the period.)

1. Key challenges facing main memories (2)

Integer performance grows

Page 6: Dezső  Sima September 2008

Key challenges facing main memories

• Increasing (single core) processor performance (the past)

• Multicore/manycore processors with core counts doubling about every two years (the present and near future)

1. Key challenges facing main memories (3)

Page 7: Dezső  Sima September 2008

Figure: Evolution of Intel’s process technology [1]

1. Key challenges facing main memories (4)

Evolution of Intel’s process technology

Shrinking: ~ 0.7/2 Years

Page 8: Dezső  Sima September 2008

Figure: The actual rise of IC complexity in DRAMs and microprocessors [2]

1. Key challenges facing main memories (5)

The evolution of IC complexity (Moore's law)

Page 9: Dezső  Sima September 2008

Figure: Rapid spreading of Intel’s multicore processors

1. Key challenges facing main memories (6)

Rapid spreading of multicore processors in Intel’s processor portfolio

Page 10: Dezső  Sima September 2008

EIB: Element Interconnect Bus

Figure: Block diagram of the Cell BE [3]

SPE: Synergistic Processing Element
SPU: Synergistic Processor Unit
SXU: Synergistic Execution Unit
LS: Local Store of 256 KB
SMF: Synergistic Mem. Flow Unit

PPE: Power Processing Element
PPU: Power Processing Unit
PXU: POWER Execution Unit

MIC: Memory Interface Contr.
BIC: Bus Interface Contr.

XDR: Rambus DRAM

1. Key challenges facing main memories (7)

The Cell BE (2006)

Page 11: Dezső  Sima September 2008

Assuming that the IC process technology will evolve in the near future at a similar rate as now (shrinking of characteristic feature sizes at a rate of ~ 0.7/2 years)

the number of cores will also double about every two years.

1. Key challenges facing main memories (8)

Page 12: Dezső  Sima September 2008

Higher processor performance/more cores

Higher memory performance requirements in terms of

• larger memory size

• higher memory bandwidth

• lower memory latency

1. Key challenges facing main memories (9)

Page 13: Dezső  Sima September 2008

Higher processor performance/more cores

Higher memory performance requirements in terms of

• larger memory size

• higher memory bandwidth

• lower memory latency

Depends on

• characteristics of the application
• cache architecture
• ...

1. Key challenges facing main memories (10)

Page 14: Dezső  Sima September 2008

Higher processor performance/more cores

Higher memory performance requirements in terms of

• larger memory size

• higher memory bandwidth

• lower memory latency

Depends on

• characteristics of the application
• cache architecture
• ...

Interesting research area

1. Key challenges facing main memories (11)

Page 15: Dezső  Sima September 2008

Higher processor performance/more cores

Higher memory performance requirements in terms of

• larger memory size

• higher memory bandwidth

• lower memory latency

Depends on

• characteristics of the application
• cache architecture
• ...

Limitations of recent implementations

1. Key challenges facing main memories (12)

Page 16: Dezső  Sima September 2008

Higher processor performance/more cores

Higher memory performance requirements in terms of

• larger memory size

• higher memory bandwidth

• lower memory latency

Depends on

• characteristics of the application
• cache architecture
• ...

Limitations of recent implementations

1. Key challenges facing main memories (13)

Page 17: Dezső  Sima September 2008

2. Main limiters of increasing the transfer rate of main memories - Overview

Page 18: Dezső  Sima September 2008

Memory Cell Array

I/O Buffers

Memory controller

DRAM device

Figure: Main components of the main memory

Main components of the main memory

2. The transfer rate of main memories (1)

Page 19: Dezső  Sima September 2008

• The rate of sourcing/sinking data from/to the memory array, (problem of reducing the Column Cycle Time of the memory cell array)

Main limitations of recent commodity DRAMs (synchronous main memories) in increasing transfer rates

2. The transfer rate of main memories (2)

Memory Cell Array

I/O Buffers

Memory controller

DRAM device

Sourcing/Sinking

Figure: Schematic view of the structure of the main memory

Page 20: Dezső  Sima September 2008

• The rate of transmitting data between memory controller and memory modules (transmission line termination problem),

Main limitations of recent commodity DRAMs (synchronous main memories) in increasing transfer rates

2. The transfer rate of main memories (3)

Memory Cell Array

I/O Buffers

Memory controller

DRAM device

Sourcing/Sinking Transferring

Figure: Schematic view of the structure of the main memory

Page 21: Dezső  Sima September 2008

• The rate of capturing data in the memory controller/memory module. (signaling and synchronization problem).

Main limitations of recent commodity DRAMs (synchronous main memories) in increasing transfer rates

2. The transfer rate of main memories (4)

Memory Cell Array

I/O Buffers

Memory controller

DRAM device

Sourcing/Sinking Transferring

Capturing Capturing

Figure: Schematic view of the structure of the main memory

Page 22: Dezső  Sima September 2008

• The rate of sourcing/sinking data from/to the memory array, (problem of reducing the Column Cycle Time of the memory cell array)

• The rate of transmitting data between memory controller and memory modules (transmission line termination problem),

• The rate of capturing data at the memory controller/memory module. (signaling and synchronization problem).

Main limitations of recent commodity DRAMs (synchronous main memories) in increasing transfer rates

The most serious limitation constrains the achievable transfer rate.

2. The transfer rate of main memories (5)

Page 23: Dezső  Sima September 2008


3. Challenges in increasing the rate of sourcing/sinking data from/to the memory cell array

Page 24: Dezső  Sima September 2008

Basic operation speed of recent synchronous DRAMs

The memory cell array sources/sinks data to/from the I/O buffers at a rate of T (at a data width of x4/x8/x16).

T = 1/tCCD x FW

with tCCD: Min. column cycle time of the memory cell array

FW: Fetch width of the memory cell array

3. The rate of sourcing/sinking data (1)

Page 25: Dezső  Sima September 2008

Figure: The interpretation of tCCD [4]

3. The rate of sourcing/sinking data (2)

The min. column cycle time (tCCD) of the memory cell array

tCCD (Core column delay)

is the min. time interval between consecutive Reads or Writes.

Remark

tCCD is designated also as the Read/Write command to Read/Write command delay

Page 26: Dezső  Sima September 2008

Figure: The evolution of the column cycle time (tCCD) in different SDRAM types (ns) [5]

3. The rate of sourcing/sinking data (3)

Note: The min. column cycle time (tCCD) of synchronous DRAMs is:

SDRAM: 7.5 ns
DDR/DDR2/DDR3: 5 ns

Page 27: Dezső  Sima September 2008

The fetch width (FW) specifies how many times more bits the cell array fetches per column cycle than the data width of the device.

E.g. an x4 DRAM chip with a fetch width of 4 (actually a DDR2 DRAM) fetches 4 × 4 that is 16 bits from the memory cell array per column cycle.

The fetch width (FW) of the memory cell array of synchronous DRAMs is typically:

DRAM type / FW:
SDRAM: 1
DDR: 2
DDR2: 4
DDR3: 8

3. The rate of sourcing/sinking data (4)

The fetch width (FW) of the memory cell array

Page 28: Dezső  Sima September 2008

3. The rate of sourcing/sinking data (5)

(Diagrams, summarized:

SDRAM-100: DRAM core clock 100 MHz, clock (CK) 100 MHz (fCK), n bits fetched from the cell array per column cycle; data transferred on the rising edges of CK over the data lines (DQ0 - DQn-1): 100 MT/s.

DDR-200: DRAM core clock 100 MHz (fCK), clock (CK/CK#) 100 MHz, data strobe (DQS) 100 MHz, 2 x n bits fetched per column cycle; data transferred on both edges of DQS over the data lines: 200 MT/s.

DDR2-400: DRAM core clock 100 MHz (fCK/2), clock (CK/CK#) 200 MHz, data strobe (DQS) 200 MHz, 4 x n bits fetched per column cycle; data transferred on both edges of DQS over the data lines: 400 MT/s.

DDR3-800: DRAM core clock 100 MHz (fCK/4), clock (CK/CK#) 400 MHz, data strobe (DQS) 400 MHz, 8 x n bits fetched per column cycle; data transferred on both edges of DQS over the data lines: 800 MT/s.)

Figure: Fetch width of synchronous DRAM generations

Page 29: Dezső  Sima September 2008

SDRAM: 1/7.5 ns x 1 = 133 MT/s
DDR: 1/5 ns x 2 = 400 MT/s
DDR2: 1/5 ns x 4 = 800 MT/s
DDR3: 1/5 ns x 8 = 1600 MT/s (not yet achieved)

The peak rates of sourcing/sinking data to/from the I/O buffers are:

According to Tmax = 1/tCCD x FW

3. The rate of sourcing/sinking data (6)

The main limitation in increasing the rates of sourcing/sinking data from/to the memory array is tCCD (the column cycle time).

The column cycle time (tCCD) resulting from a DRAM design depends on a number of architectural choices, like column decoder layout, array block size, array partitioning, decisions to share resources between array banks etc. [32]. Its reduction below 5 ns is an intricate circuit design task that is out of the scope of our discussion. For an insight into the subject see [32].

Remark

GDDR3 and GDDR4 devices, with peak transfer rates of 1.6 and 2.5 GT/s, respectively, achieve min. column cycle times (tCCD) of 2.5 and 3.2 ns, respectively [32].
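The peak rate figures above follow directly from Tmax = 1/tCCD x FW. A minimal sketch (Python) that reproduces them from the tCCD and fetch width values quoted on this and the previous slides; the GDDR fetch widths (4 and 8) are assumptions inferred from the quoted rates, not stated explicitly here:

```python
# Peak rate of sourcing/sinking data between the memory cell array and the
# I/O buffers, Tmax = FW / tCCD, for the devices discussed above.

devices = {
    # name: (tCCD in ns, fetch width FW)
    "SDRAM": (7.5, 1),
    "DDR":   (5.0, 2),
    "DDR2":  (5.0, 4),
    "DDR3":  (5.0, 8),
    "GDDR3": (2.5, 4),   # tCCD per [32], FW assumed
    "GDDR4": (3.2, 8),   # tCCD per [32], FW assumed
}

for name, (tccd_ns, fw) in devices.items():
    tmax_mts = fw / tccd_ns * 1000      # 1/ns = GT/s, *1000 gives MT/s
    print(f"{name}: {tmax_mts:.0f} MT/s")
# SDRAM: 133, DDR: 400, DDR2: 800, DDR3: 1600, GDDR3: 1600, GDDR4: 2500 MT/s
```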

Page 30: Dezső  Sima September 2008

4. Challenges in increasing the transfer rate between the memory controller and the DRAM parts

Page 31: Dezső  Sima September 2008

4. The transfer rate between the MC and the DRAM parts (1)

Memory controller

Memory modules

Motherboard trace

Figure: The data path connecting the memory controller and the DRAM chips (based on [6])

The data path connecting the memory controller and the DRAM chips

Page 32: Dezső  Sima September 2008

4. The transfer rate between the MC and the DRAM parts (2)

Memory controller

Memory modules

Motherboard trace

Figure: The data path connecting the memory controller and the DRAM chips (based on [6])

The data path connecting the memory controller and the DRAM chips

For higher data rates PCB traces behave as transmission lines

Page 33: Dezső  Sima September 2008

Basic behaviour of transmission lines (TL)

TL

Driver Receiver

Principle of operation

• A signal front given at the input of the TL travels down the TL from the driver side to the receiver side.

• Arriving at the receiver side, the signal is reflected back to the driver side, then

• at the driver side, the signal will be reflected again toward the receiver side etc.

4. The transfer rate between the MC and the DRAM parts (3)

Page 34: Dezső  Sima September 2008

PC board traces (microstrips) behave above ~ 100 MT/s like transmission lines with

Transmission lines (TL)

• a characteristic impedance (ZO)
• and a trace velocity

4. The transfer rate between the MC and the DRAM parts (4)

Page 35: Dezső  Sima September 2008

Characteristic impedance of PCB traces (ZO) [7]

Table: Typical characteristic impedance values of PCB traces [8]

4. The transfer rate between the MC and the DRAM parts (5)

Page 36: Dezső  Sima September 2008

Trace velocity

Table: Typical trace velocity values of PCB traces [8]

Remark

With 1 ft = 30.48 cm, the equivalent values in cm/ns are:

2.0 ns/ft equals ~ 15 cm/ns
2.2 ns/ft equals ~ 14 cm/ns
1.6 ns/ft equals ~ 19 cm/ns

4. The transfer rate between the MC and the DRAM parts (6)

Page 37: Dezső  Sima September 2008

Behaviour of an ideal TL

Ideal TL: no attenuation, no capacitive or inductive loading.

Figure: Equivalent circuit of an ideal transmission line (neglecting attenuation along the TL as well as capacitive and inductive loading of the TL)

with

ZD: Internal impedance of the driver
ZO: Characteristic impedance of the TL
ZT: Impedance terminating the TL
T: Flight time over the TL

VO(t): Generator voltage
VD(t): Voltage at the driver output
VrD(t): Reflected voltage at the driver
VR(t): Voltage at the receiver
VrR(t): Reflected voltage at the receiver

4. The transfer rate between the MC and the DRAM parts (7)

Page 38: Dezső  Sima September 2008

Characteristic equations

describing the reflections and the driver/receiver side voltages (based on [9])

At t = 0, driver side:

VO(t=0) = VO

VD(t=0) = VD(0) = VO * ZO / (ZO + ZD)

VrD(t=0) = VD(t=0)

At t = T (T: propagation time across the TL), receiver side:

VR(T) = VD(0) * (1 + rR)

VrR(T) = VD(0) * rR

where

rR = (ZT - ZO) / (ZT + ZO)

4. The transfer rate between the MC and the DRAM parts (8)

Page 39: Dezső  Sima September 2008

4. The transfer rate between the MC and the DRAM parts (9)

Characteristic equations (cont.)

At t = nT (n > 1):

Receiver side:

VR(nT) = VR((n-2)T) + VrD((n-1)T)*(1 + rR)

VrR(nT) = VrD((n-1)T)*rR

Driver side:

VD((n+1)T) = VD((n-1)T) + VrR(nT)*(1 + rD)

VrD((n+1)T) = VrR(nT)*rD

where:

rD = (ZD - ZO) / (ZD + ZO)

At t → ∞ (steady state):

VR(t→∞) = VO * ZT / (ZT + ZD)

Page 40: Dezső  Sima September 2008

Example 1: Open ended ideal TL

VO(t=0) = 2 V, ZD = 25 Ω, ZO = 50 Ω, ZT >> ZO (open end)

4. The transfer rate between the MC and the DRAM parts (10)

Figure: Equivalent circuit of an open ended ideal TL

Page 41: Dezső  Sima September 2008

Figure: Ladder diagram and VD(t), VR(t) waveforms of an open ended ideal TL (based on [6])

(Values, summarized: the driver-side voltage VD(t) steps at t = 0, 2T, 4T, 6T, ... through 1.33 V, 2.22 V, 1.93 V, 2.02 V, ...; the reflected components decay as 1.333, -0.444, 0.148, -0.049, 0.002, ... The receiver-side voltage VR(t) steps at t = T, 3T, 5T, 7T, ... through 2.67 V, 1.78 V, 2.07 V, 1.98 V, ... Both waveforms ring around and settle towards 2.0 V.)

4. The transfer rate between the MC and the DRAM parts (11)
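The ladder values above can be reproduced directly from the characteristic equations. A minimal sketch (Python; ideal lossless line, the open end modelled as a very large ZT):

```python
# Bounce ("ladder") diagram of an ideal, lossless transmission line, following
# the characteristic equations given earlier (based on [9]).
# Example 1: VO = 2 V, ZD = 25 Ohm, ZO = 50 Ohm, open end (ZT -> infinity).

def bounce_diagram(VO, ZD, ZO, ZT, roundtrips=4):
    rD = (ZD - ZO) / (ZD + ZO)        # reflection coefficient at the driver
    rR = (ZT - ZO) / (ZT + ZO)        # reflection coefficient at the receiver
    VD = VO * ZO / (ZO + ZD)          # driver-side voltage at t = 0
    VrD = VD                          # wavefront launched towards the receiver
    VR = 0.0
    vd_steps, vr_steps = [VD], []
    for _ in range(roundtrips):
        VR += VrD * (1 + rR)          # wavefront reaches the receiver after T
        VrR = VrD * rR                # part reflected back to the driver
        vr_steps.append(VR)
        VD += VrR * (1 + rD)          # reflected wave arrives back at the driver
        VrD = VrR * rD                # part reflected again towards the receiver
        vd_steps.append(VD)
    return vd_steps, vr_steps

vd, vr = bounce_diagram(VO=2.0, ZD=25.0, ZO=50.0, ZT=1e9)
print([round(v, 3) for v in vd])      # [1.333, 2.222, 1.926, 2.025, 1.992]
print([round(v, 3) for v in vr])      # [2.667, 1.778, 2.074, 1.975]
```

Running it with ZT = 50 Ohm instead reproduces the perfectly terminated case of Example 3 below (a single step to 1.33 V, no further reflections).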

Page 42: Dezső  Sima September 2008

D: Driver, R: Receiver, O: Output, I: Input

Figure: Open ended real TL (differential connection) [10]

Reflections at both ends (R-end, D-end)

4. The transfer rate between the MC and the DRAM parts (12)

Page 43: Dezső  Sima September 2008

Figure: Reflections shown on an eye diagram due to termination mismatch [11]

4. The transfer rate between the MC and the DRAM parts (13)

Reflections

Page 44: Dezső  Sima September 2008

Implications of the reflections on a TL

Reflections limit the max. data transfer rate of a TL.

• When a data signal is given at the driver side of the TL, a signal wavefront travels down the TL and will be ping-ponged between both ends of the TL until the steady state condition is reached.

4. The transfer rate between the MC and the DRAM parts (14)

• But until the signal is at least nearly settled, no further wavefront can be put onto the TL, otherwise inter-symbol interference (ISI) arises.

Page 45: Dezső  Sima September 2008

Example

Open ended TL of the length of 10 cm

• Signal velocity on the TL is 20 cm/ns.

• Reflections settle to an acceptable level after three roundtrips (6T).

Assumptions:

Then the wavefront of a signal settles nearly after 6×0.5 ns = 3 ns.

Half of the min. cycle time is then 3 ns, so the min. cycle time is 6 ns and the max. transfer rate of the above open ended TL is ~ 166 MHz.

The max. data transfer rate is limited primarily by the time until the signal settles, that is, it depends both on

• the number of signal round trips until the signal settles, and
• the length of the TL.

4. The transfer rate between the MC and the DRAM parts (15)

T = 0.5 ns
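The example above is easy to parameterize. A minimal sketch (Python) of the same estimate, assuming as above that the reflections are acceptable after six one-way flight times and that settling may use half of the cycle time:

```python
# Settling-time limited transfer rate of an open ended transmission line,
# following the example above.

def max_rate_mhz(length_cm, velocity_cm_per_ns, oneway_trips=6):
    T_ns = length_cm / velocity_cm_per_ns     # one-way flight time T
    settle_ns = oneway_trips * T_ns           # reflections die out after ~6T
    min_cycle_ns = 2 * settle_ns              # settling may use half a cycle
    return 1e3 / min_cycle_ns

print(max_rate_mhz(length_cm=10, velocity_cm_per_ns=20))   # ~166.7 MHz
```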

Page 46: Dezső  Sima September 2008

Open ended TLs may be used only for

• relatively low transfer rates (up to ~ 100 MHz), that is up to SDRAM devices, and

• short distances (up to ~ 10 cm).

For higher transfer rates or longer distances the TL needs to be terminated by its characteristic impedance ZO.

4. The transfer rate between the MC and the DRAM parts (16)

Page 47: Dezső  Sima September 2008

4. The transfer rate between the MC and the DRAM parts (17)

Reducing reflections by a series resistor

A series resistor put before the TL reduces reflections

Improved signal integrity, higher transfer rates

Page 48: Dezső  Sima September 2008

4. The transfer rate between the MC and the DRAM parts (18)

Figure: Equivalent circuit of an open ended TL with a series resistor (R3 in the figure) included between the driver and the TL (Micro-Cap 9.0.5.0)

Example 2: Using series resistors to reduce reflections

Page 49: Dezső  Sima September 2008

4. The transfer rate between the MC and the DRAM parts (19)

Figure: Driver (Vout) and Receiver (Vin) voltages of an open ended TL with a series resistor R3. The value of R3 is varied from 0 to 25 Ohm.

R3:

R3 = 0 Ω

R3 = 25 Ω

Page 50: Dezső  Sima September 2008

(Schematic, summarized: a memory controller drives two SDR DIMM slots (Slot 1, Slot 2) over LVTTL; the Command/Control/Address, DQ/DQS and DM lines run to both slots, with series resistors RS on the modules.)

Figure: Series resistors on an SDRAM module inserted into the DQ, DQS, DM lines

(Rs = 10 or 22 Ω)

4. The transfer rate between the MC and the DRAM parts (20)

Page 51: Dezső  Sima September 2008

Matched TLs

Needed above ~ 100 MHz (i.e. for DDR/DDR2/DDR3 memories).

Basic scheme for unidirectional signals (assuming SSTL signaling)

VREF: 0.5 Output voltage

VT: Termination Voltage = VREF

Figure: Termination of a TL with its characteristic impedance [12]

RT: 50 Ohm

VREF

VT

ZO = 50 Ohms

Transmitter Receiver

RT

4. The transfer rate between the MC and the DRAM parts (21)

SSTL: Stub Series Termination Logic
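To see why matching matters, here is a minimal sketch (Python) evaluating the receiver-side reflection coefficient rR = (ZT - ZO) / (ZT + ZO) from the characteristic equations, for a 50 Ohm trace with an open end, with the 56 Ohm termination that appears later in the DDR example, and with a perfectly matched 50 Ohm termination:

```python
# Receiver-side reflection coefficient for a ZO = 50 Ohm trace: the closer
# rR is to 0, the smaller the reflected wave that must settle before the
# next symbol can be launched.

def r_receiver(ZT, ZO=50.0):
    return (ZT - ZO) / (ZT + ZO)

cases = [("open end", 1e9), ("RT = 56 Ohm", 56.0), ("RT = 50 Ohm (matched)", 50.0)]
for label, ZT in cases:
    print(f"{label}: rR = {r_receiver(ZT):+.3f}")
# open end: +1.000,  RT = 56 Ohm: +0.057,  RT = 50 Ohm (matched): +0.000
```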

Page 52: Dezső  Sima September 2008

Example 3: Perfectly terminated ideal TL

VO(t=0) = 2 V, ZD = 25 Ω, ZO = 50 Ω, ZT = 50 Ω (matched termination)

4. The transfer rate between the MC and the DRAM parts (22)

Figure: Equivalent circuit of a perfectly terminated ideal TL

Page 53: Dezső  Sima September 2008

Figure: Ladder diagram and waveforms VD(t), VR(t) of a perfectly matched ideal TL (based on [6])

(Values, summarized: the driver-side voltage VD(t) steps to 1.33 V at t = 0 and stays there; the receiver-side voltage VR(t) steps to 1.33 V at t = T and stays there; no further reflections occur.)

4. The transfer rate between the MC and the DRAM parts (23)

Page 54: Dezső  Sima September 2008

Figure: Perfectly matched real TL (differential connection) [10]

No reflections from the receiver end

RT = ZO

4. The transfer rate between the MC and the DRAM parts (24)

Page 55: Dezső  Sima September 2008

Figure: Discontinuities of TLs connecting the memory controller and the memory modules

(based on [6])

• The TL connecting the memory controller and the DRAM devices is not homogeneous; it consists of multiple sections.

The problem of TL inhomogeneity

Memory controller

Memory modules

Motherboard/transmission line

4. The transfer rate between the MC and the DRAM parts (25)

Page 56: Dezső  Sima September 2008

4. The transfer rate between the MC and the DRAM parts (26)

Figure: Discontinuities of TLs connecting the slot to the particular DRAM devices assuming stub-bus topology and a registered memory module [5]

Page 57: Dezső  Sima September 2008

Figure: Discontinuities of TLs connecting the memory controller and the memory modules

(based on [6])

• Between different TL sections there are discontinuities that give rise to reflections.

• The TL connecting the memory controller and the DRAM devices is not homogeneous; it consists of multiple sections.

The problem of TL inhomogeneity

Memory controller

Memory modules

Motherboard/transmission line

4. The transfer rate between the MC and the DRAM parts (27)

Page 58: Dezső  Sima September 2008

Addressing the problem of TL discontinuities

SSTL termination (Stub Series Termination Logic)

Principle

VREF: 0.5 x Output voltage
VT: Termination Voltage = VREF

Figure: SSTL termination of a unidirectional signal [12]

Use both perfect termination and a series resistor (RS) to increase the TL attenuation and thus reduce reflections from the memory module back to the memory controller [6].

Used in DDR/DDR2/DDR3 devices

4. The transfer rate between the MC and the DRAM parts (28)

RS: 22/25 Ohm

RT: 50 Ohm

RS

VREF

VT

ZO = 50 Ohms

Transmitter Receiver

RT

Page 59: Dezső  Sima September 2008

4. The transfer rate between the MC and the DRAM parts (29)

Figure: Equivalent circuit of two TLs (T1, T2) with slightly different characteristic impedances and a series resistor (R3), while T2 is terminated by 50 Ohm and 3 pF.

Page 60: Dezső  Sima September 2008

4. The transfer rate between the MC and the DRAM parts (30)

Figure: Driver (Vout) and Receiver (Vin) voltages of the previous equivalent circuit

Discontinuities of the transmission line generate reflections

Page 61: Dezső  Sima September 2008

R3 = 0 … 25

4. The transfer rate between the MC and the DRAM parts (31)

Figure: Driver (Vout) and Receiver (Vin) voltages of the previous equivalent circuit. The value of R3 is varied from 0 to 25 Ohm.

R3 = 0 Ω

R3 = 25 Ω

Higher series resistor values attenuate reflections but lower the steady state output voltage

Page 62: Dezső  Sima September 2008

C3 = 0 … 9 pF

4. The transfer rate between the MC and the DRAM parts (32)

Figure: Driver (Vout) and Receiver (Vin) voltages of the previous equivalent circuit. The value of C3 is varied from 0 to 9 pF.

C3=0 pF

C3=9 pF

Higher output capacitance values lower the reflections

Page 63: Dezső  Sima September 2008

Note

With increasing value of Rs (from 2 Ohm to 22 Ohm) the amplitude of the reflected voltage at the receiver side clearly decreases.

4. The transfer rate between the MC and the DRAM parts (33)

Page 64: Dezső  Sima September 2008

Example 1: Line terminations in a DDR memory

Figure: Line terminations in a DDR memory

(RS1: 7.5 Ω for 4 devices, 5.1 Ω for 8 devices, 3 Ω for 16 devices; RS2 = 22 Ω, RT = 56 Ω)

(Schematic, summarized: a memory controller drives two DDR DIMM slots (Slot 1, Slot 2) over SSTL_2; the Command/Control/Address and DQ, DQS/#, DM lines carry series resistors RS1/RS2 and are terminated by RT to VTT.)

4. The transfer rate between the MC and the DRAM parts (34)

Page 65: Dezső  Sima September 2008

In order to achieve higher transfer rates, more and more sophisticated line terminations are needed.

Examples: Synchronous DRAMs (commodity DRAMs)

4. The transfer rate between the MC and the DRAM parts (35)

Page 66: Dezső  Sima September 2008

(Schematic, summarized: a memory controller drives two DDR2 DIMM slots (Slot 1, Slot 2) over SSTL_18; the lines carry series resistors RS1/RS2 on the modules and are terminated by RTT to VTT as well as by on-die termination (ODT) in the DRAM devices.)

Figure: Line terminations in a DDR2 memory

(RS1: 10 Ω for 4 devices, 5.1-10 Ω for 8 devices, 7.5 Ω for 16 devices; RS2 = 22 Ω, RTT = 47 Ω)

Example 2: Line terminations in a DDR2 memory

On-Die Termination (ODT)

4. The transfer rate between the MC and the DRAM parts (36)

Page 67: Dezső  Sima September 2008

Figure: Line terminations in a DDR3 memory

(Rs = 10-15 Ω, RT = 36-39 Ω, RZQ = 240 Ω ±1%)

(Schematic, summarized: a memory controller drives two DDR3 DIMM slots over SSTL_15; the lines carry series resistors Rs, termination resistors RT to VTT, dynamic on-die termination in the DRAM devices and ZQ calibration resistors RZQ to ground.)

Remark: Due to the fly-by module topology no series resistors are needed for the Command/Control/Address lines.

Example 3: Line terminations in a DDR3 memory

Dynamic On-Die Termination (ODT) option: to optimize the termination resistors along with each write command.

ZQ calibration: to adjust the "on" and the "termination" impedances of the merged drivers every 128 ms.

4. The transfer rate between the MC and the DRAM parts (37)

Page 68: Dezső  Sima September 2008

Table: Implementation details of SDRAM types

4. The transfer rate between the MC and the DRAM parts (38)

(columns: SDRAM | DDR SDRAM | DDR2 SDRAM | DDR3 SDRAM)

Signaling
C/C/A: LVTTL | SSTL_2 | SSTL_18 | SSTL_15
Clock (CLK/CK): LVTTL | SSTL_2 Diff. | SSTL_18 Diff. | SSTL_15 Diff.
DQ, DQM: LVTTL | SSTL_2 | SSTL_18 | SSTL_15
DQS: -- | SSTL_2 | SSTL_18 / SSTL_18 Diff. | SSTL_15 Diff.

Terminations
C/C/A, RS: No RS | RS on module | RS on module | No RS
C/C/A, RT: No RT | RT on board | RT on board | RT on module
DQ/DQS/DM, RS: RS on module (all four types)
DQ/DQS/DM, RT: -- | RT on board | ODT (RT on die) | Dyn. ODT (RT on die)

Driver architecture: Separate output/termination drivers (SDRAM, DDR, DDR2) | Merged output/termination drivers with ZQ calibration during power-up/periodically (DDR3)

Synchronization
Basic scheme: Central clock (SDRAM) | Source synchronization (DDR, DDR2, DDR3)
Aligning DQS with CK: No DQS | DLL | DLL | DLL + read/write leveling to compensate fly-time skews between DQS and CK (during power-up)

Posted reads/writes: No | No | Yes | Yes
Reset pin: No | No | No | Yes
DIMM topology: Stub architecture (SDRAM, DDR, DDR2) | Fly-by architecture (DDR3)
Packaging: TSOP-54 | TSOP-54 / BGA-60 | BGA-60 for x4/x8, BGA-84 for x16 | BGA-78 for x4/x8, BGA-96 for x16

C/C/A: Command/Control/Address   DQ: Data   DQM: Data Mask   DQS: Data Strobe

Page 69: Dezső  Sima September 2008

Line terminations of recent commodity DRAMs

have already achieved a rather high grade of sophistication;

there is not too much headroom remaining for further improvements.

4. The transfer rate between the MC and the DRAM parts (39)

Page 70: Dezső  Sima September 2008

5. Challenges in increasing the rate of capturing data in the memory controller/DRAM parts

Page 71: Dezső  Sima September 2008

5. Challenges in increasing the rate of capturing data in the memory controller/DRAM parts

5.1 Coping with capturing data

5.2 Using more advanced signalling

5.3 Using more advanced synchronization

Page 72: Dezső  Sima September 2008

• Data and commands are latched by D Flip-Flops.

Basics of capturing data

• For correctly capturing data or commands,

input signals need to be held valid for specified periods of time before and after the clock pulse,

termed as the setup time (tS) and

the hold time (tH) as shown in the figure.

Data/Commands

Clock

Clock

Data

Q

tS

tH

Figure: Temporal requirements for correctly capturing data

5.1 Coping with capturing data (1)

Page 73: Dezső  Sima September 2008

Hold Time (tH)

the minimum time interval for which the input signal must remain valid (high or low) following the clock edge in order to capture the data bit correctly.

Setup time (tS)

the minimum time interval for which the input signal must remain valid (high or low) prior to the clock edge in order to capture the data bit correctly.

5.1 Coping with capturing data (2)

Page 74: Dezső  Sima September 2008

Table: Excerpt from the specification of the dynamic parameters of a DDR-400 device [13]

Specification of the setup time (tS) and the hold time (tH)

In device datasheets, e.g. in case of a DDR-400 device:

Note: A DDR-400 device is clocked at 200 MHz, so its half clock period is 1.25 ns. By contrast, its setup and hold times are 0.4 ns each (designated as tDS and tDH in the table).

5.1 Coping with capturing data (3)

Page 75: Dezső  Sima September 2008

Minimum data valid window (DVW)

the minimum time interval for which the input signal must remain valid (high or low) before and after the clock edge in order to capture the data bits correctly.

The minimum DVW has two characteristics,

Figure: Interpretation of the minimum DVW for ideal signals

Data

CK

tS

tH

Min. DVW

5.1 Coping with capturing data (4)

a size, that is, the sum of the setup time (tS) and the hold time (tH), and a correct phase relative to the clock edge, to satisfy both the tS and the tH requirements.

Page 76: Dezső  Sima September 2008

In a DDR-400 SDRAM tS = tH = 0.4 ns [13], then

If both tS = tH, the clock edge needs to be center aligned with the DVW, as indicated below.

Data

CK

tS

tH

Min. DVW

Figure: Center aligned clock edge within the min. DVW

Example

• the min. DVW is 0.8 ns, i.e. roughly 2/3 of the clock period (1.25 ns), and

• the clock edge needs to be center aligned in the min. DVW.

5.1 Coping with capturing data (5)
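A minimal sketch (Python) of the resulting budget, using the tDS = tDH = 0.4 ns figures quoted from [13] and taking the unit interval simply as 1/(transfer rate); whatever exceeds the min. DVW is the margin left to absorb skew and jitter:

```python
# Rough data valid window (DVW) budget for a DDR-400 device.

def dvw_budget(rate_mts, t_setup_ns, t_hold_ns):
    unit_interval_ns = 1e3 / rate_mts        # duration of one transferred bit
    min_dvw_ns = t_setup_ns + t_hold_ns      # window the receiver needs (tS + tH)
    margin_ns = unit_interval_ns - min_dvw_ns
    return unit_interval_ns, min_dvw_ns, margin_ns

ui, dvw, margin = dvw_budget(rate_mts=400, t_setup_ns=0.4, t_hold_ns=0.4)
print(ui, dvw, margin)    # 2.5 ns unit interval, 0.8 ns min. DVW, 1.7 ns left
```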

Page 77: Dezső  Sima September 2008

Available DVW

the time interval for which the input signal remains valid (high or low).

Figure: Interpretation of the minimum DVW and the available DVW for ideal signals

Data

CK

tS

tH

Min. DVW

Available DVW

For correctly capturing data, two requirements need to be fulfilled:

5.1 Coping with capturing data (6)

• the available DVW ≥ the min. DVW, and

• the clock edge needs to be properly aligned (usually center aligned) within the available DVW.

Page 78: Dezső  Sima September 2008

Note

Assuming tS = tH (as usual), for the highest transfer rate the clock signal needs to be center aligned with the data.

5.1 Coping with capturing data (7)

Page 79: Dezső  Sima September 2008

5.1 Coping with capturing data (8)

• skews and

• jitter.

Reduction of the available DVW in real systems

In real systems the available DVW is reduced due to

Page 80: Dezső  Sima September 2008

Skew

is a time offset of the signal edges

• between different occurrences of the same signal, such as a clock, at different locations on a chip or a PC board (as shown in the figure below), or

• between different bit lines of a parallel bus at a given location.

Skews arise mainly due to

- propagation delays in the PC-board traces, also termed time of flight (TOF) (about 170 ps/inch), as indicated in the figure below [14],

- capacitive loading of a PC-board trace (about 50 ps per pF), as indicated in the subsequent figure [14],

- SSO (Simultaneous Switching Output) occurring due to parasitic inductances when a number of bit lines simultaneously change their output states.

Figure: Skew due to propagation delay [15]

5.1 Coping with capturing data (9)
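A minimal sketch (Python) of a skew estimate based on the two rules of thumb quoted above (~170 ps/inch of trace and ~50 ps per pF of loading); the trace lengths and capacitances used below are made-up illustration figures, not values from the slides:

```python
# Rough skew between two copies of the same signal, from routing length
# mismatch (time of flight) and from different capacitive loading.

TOF_PS_PER_INCH = 170     # propagation delay rule of thumb [14]
LOAD_PS_PER_PF = 50       # delay added per pF of capacitive load [14]

def skew_ps(len_a_in, len_b_in, load_a_pf, load_b_pf):
    flight = abs(len_a_in - len_b_in) * TOF_PS_PER_INCH
    loading = abs(load_a_pf - load_b_pf) * LOAD_PS_PER_PF
    return flight + loading

# e.g. 2 inch routing mismatch and 3 pF loading difference:
print(skew_ps(5.0, 3.0, 6.0, 3.0))    # 2*170 + 3*50 = 490 ps
```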

Page 81: Dezső  Sima September 2008

Figure: Skew due to capacitive loading of signal lines [14]

CK-1

CK-2

Skew

5.1 Coping with capturing data (10)

Page 82: Dezső  Sima September 2008

Available DVW

Data

CK

tS

tH

Min. DVW

Data

CK

tS

tH

Min. DVW

Available DVW

Center aligned clock Skewed clock

Figure: Reduction of operational tolerances due to clock skew (ideal signals assumed)

A larger than indicated skew would even jeopardize or prevent correct operation.

Deskewing of clock distribution is needed.

Reduction of operational tolerances due to skews

5.1 Coping with capturing data (11)

Page 83: Dezső  Sima September 2008

• phase uncertainty causing ambiguity in the rising and falling edges of a signal,

as shown in the figure below,

• has a stochastic nature,

Figure: Jitter of signal edges [15]

Jitter

5.1 Coping with capturing data (12)

The main sources of jitter are

• Crosstalk caused by coupling between adjacent traces on the board or in the DRAM device,

• ISI (Inter-Symbol Interference) caused by cycling the bus faster than it can settle,

• Reflection noise due to mismatched termination of signal lines,

• EMI (Electromagnetic Interference) caused by electromagnetic radiation emitted from external sources.

Page 84: Dezső  Sima September 2008

Jitter obviously narrows the available DVW, as shown in the following example for DDR-200 devices.

(DDR-200 devices are clocked by 100 MHz, thus their half clock period is 5 ns).

5.1 Coping with capturing data (13)

~ 5 ns

Av. DVW with jitter

Av. DVW without jitter

Figure: Narrowing the available DVW due to jitter

DQ

Narrowing the available DVW due to jitter

Page 85: Dezső  Sima September 2008

The timing budget of the available DVW

The available DVW needs to cover

• the min. requested DVW (tS + tH),

• all possible sources of skews,

• all possible sources of jitter.

Skews/jitters Skews/jitters

Available DVW

min.DVW

Figure: Interpretation of the timing budget of the available DVW

Note

The white areas before and after the min. DVW represent available timing margins

5.1 Coping with capturing data (14)

Page 86: Dezső  Sima September 2008

Table: Timing budget of a DDR-266 memory [16]

Remark

The table uses partly different terminology, as follows:

Total skew: Available DVW
Transmitter skew: Setup time
Receiver skew: Hold time
VREF noise: OSS
CIN mismatch: Skew due to different capacitive loading

Example

Timing budget of a DDR-266 memory

5.1 Coping with capturing data (15)

Page 87: Dezső  Sima September 2008

Note

The crucial sources and actual extent of occurring skews and jitters depend on

• the frequency range in question,

• DRAM type used,

• mainboard and memory module implementation details.

Timing budget tuning is a main task of developing DRAM devices/modules and mainboards.

5.1 Coping with capturing data (16)

Page 88: Dezső  Sima September 2008

tDV: Width of the available DVW

Shrinking the available DVW for higher transfer rates

Higher data rates → shorter clock periods → shorter available DVWs

This is one of the key problems to be handled for achieving higher data rates.

Figure: Shrinking the available DVW while raising the data rate from PC-133 to DDR-400 and DDR2-800 [17]

5.1 Coping with capturing data (17)

Page 89: Dezső  Sima September 2008

Addressing the problem of shrinking (available) DVWs in order to raise DRAM speed

5.1 Coping with capturing data (18)

Reducing skews and jitters by

• using more advanced signaling techniques, such as

• SSTL (Stub Series Terminated Logic) or

• LVDS (Low Voltage Differential Signaling)

instead of open-ended LVTTL (Low Voltage TTL),

• using more efficient synchronisation schemes than central clocking, such as source-synchronous synchronisation,

• using DLLs/PLLs to align clock or data strobe edges.

Page 90: Dezső  Sima September 2008

Using more advanced signaling techniques

5.2 Using more advanced signaling (1)

Page 91: Dezső  Sima September 2008

Signal types

Ground referenced: TTL (5 V): PCI; LVTTL (3.3 V): SDRAM, PCI, PCI-X, AGP1.0

Voltage referenced, single ended: SSTL single ended signals: SSTL_2 (2.5 V) (DDR), SSTL_18 (1.8 V) (DDR2), SSTL_15 (1.5 V) (DDR3); AGP2.0 (1.5 V); AGP3.0 (0.8 V)

Voltage referenced, differential: SSTL differential signals; LVDS: Hypertransport, SATA, Ultra-2 SCSI and later, PCI-E; HVDS: SCSI-1

(Trend indicated in the figure: higher data rates.)

LVTTL: Low Voltage TTL; LVDS: Low Voltage Differential Signaling; HVDS: High Voltage Differential Signaling; SSTL: Stub Series Terminated Logic; VREF: Reference Voltage; VCM: Common Mode Voltage

5.2 Using more advanced signaling (2)

Figure: Overview of signal types

Page 92: Dezső  Sima September 2008

Figure: Signal types used in mainstream DRAMs

(Earliest DRAMs (1K/4K) omitted)

Signal types used in mainstream DRAMs

Signal type: TTL | LVTTL (Low Voltage TTL) | SSTL (Stub Series Termination Logic)
Voltage: 5 V | 3.3 V | 2.5/1.8/1.5 V
Referencing: Ground referenced | Ground referenced | Voltage referenced, single ended/differential
Termination: Open ended | Open ended | Terminated
Used in the DRAM types: Page Mode, FPM, EDO | FPM, EDO, SDRAM | DDR, DDR2, DDR3

5.2 Using more advanced signaling (3)

Page 93: Dezső  Sima September 2008

Table: Signal types of the main signal groups in synchronous DRAM devices

(columns: SDRAM | DDR SDRAM | DDR2 SDRAM | DDR3 SDRAM)

Comm./Control/Addr./Data (DQ)/Data Mask (DM): LVTTL | SSTL_2 | SSTL_18 | SSTL_15

Clock (CLK/CK): LVTTL | SSTL_2 Diff. | SSTL_18 Diff. | SSTL_15 Diff.

Data Strobe (DQS): -- | SSTL_2 | SSTL_18 / SSTL_18 Diff. | SSTL_15 Diff.

5.2 Using more advanced signaling (4)

Page 94: Dezső  Sima September 2008

Figure: Input/output characteristics of TTL signals as used in PM/FPM/EDO devices (based on [6])

TTL inverter

5.2 Using more advanced signaling (5)

(Transfer characteristic (Vout vs. Vin) of a TTL inverter, 5 V supply; VIL max = 0.8 V, VIH min = 2.0 V, VOL max = 0.4 V, VOH min = 2.4 V.)

Page 95: Dezső  Sima September 2008

Figure: Input/output characteristics of LVTTL signals (based on [6])

LVTTL inverter

(Transfer characteristic (Vout vs. Vin) of an LVTTL inverter, 3.3 V supply; VIL max = 0.8 V, VIH min = 2.0 V, VOL max = 0.4 V, VOH min = 2.4 V.)

5.2 Using more advanced signaling (6)

Page 96: Dezső  Sima September 2008

SSTL_2: VDDQ = 2.5 V, JESD8-9 (Sept. 1998), used in DDR SDRAMs
SSTL_18: VDDQ = 1.8 V, JESD8-15A (Oct. 2002), used in DDR2 SDRAMs
SSTL_15: VDDQ = 1.5 V, used in DDR3 SDRAMs

Stub Series Terminated Logic (SSTL)

5.2 Using more advanced signaling (7)

Three generations

Types of SSTL signals:

Single ended, used as: Command/Control/Address, Data (DQ), Data Mask (DM), Data Strobe (DQS) in DDR/DDR2.

Differential, used as: Clock (CK), Data Strobe (DQS) in DDR2/DDR3.

Figure: Types of SSTL signals

Page 97: Dezső  Sima September 2008

Figure: Input/output characteristics of single ended SSTL signals (based on [6])

The static view

SSTL inverter

5.2 Using more advanced signaling (8)

(Transfer characteristic (Vout vs. Vin) of a single ended SSTL_2 inverter, 2.5 V supply, VREF = 1.25 V; VIL max = VREF - 150 mV, VIH min = VREF + 150 mV, VOL max = 0.375 V, VOH min = 2.125 V.)

Page 98: Dezső  Sima September 2008

AC values: define the timing specifications the receiver needs to meet (e.g. slew rate).

DC values: define the final logic state.

A certain amount of time after the device has crossed the DC threshold and then also the AC threshold (hold time), the device will switch state and will not switch back as long as the input stays beyond the DC threshold [18].

The dynamic view

Figure: Interpretation of characteristic input levels of single ended SSTL signals [18]

5.2 Using more advanced signaling (9)

State changes

Page 99: Dezső  Sima September 2008

Figure: Using AC values for defining the falling and rising slew rates of single ended SSTL signals [19]

5.2 Using more advanced signaling (10)

Page 100: Dezső  Sima September 2008

Table: Characteristic input levels of single ended SSTL signals in DDR/DDR2/DDR3 devices [20], [21], [22]

DDR DDR2 DDR3

VDDQ 2.5 V 1.8 V 1.5 V

VREF 1.25 V 0.9 V 0.75 V

VIH (ac) min. VREF + 310 mV VREF + 250 mV VREF + 175 mV

VIH (dc) min. VREF + 150 mV VREF + 125 mV VREF + 100 mV

VIL (dc) max. VREF - 150 mV VREF - 125 mV VREF - 100 mV

VIL (ac) max. VREF - 310 mV VREF - 250 mV VREF - 175 mV

VSS Ground Ground Ground

5.2 Using more advanced signaling (11)
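A minimal sketch (Python) applying the DDR column of the table above together with the AC/DC threshold semantics described on the previous slides; the classification labels are an informal interpretation, not JEDEC wording:

```python
# Classify a single ended SSTL input voltage against the DDR input levels:
# VREF = 1.25 V, AC threshold +/- 310 mV, DC threshold +/- 150 mV.

VREF, AC, DC = 1.25, 0.310, 0.150

def classify(vin):
    if vin >= VREF + AC:
        return "switches to high (AC threshold crossed)"
    if vin >= VREF + DC:
        return "keeps a previously captured high (only DC threshold exceeded)"
    if vin <= VREF - AC:
        return "switches to low (AC threshold crossed)"
    if vin <= VREF - DC:
        return "keeps a previously captured low (only DC threshold exceeded)"
    return "indeterminate region around VREF"

for v in (1.60, 1.45, 1.30, 0.90):
    print(f"{v:.2f} V -> {classify(v)}")
```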

Page 101: Dezső  Sima September 2008

Figure: Interpretation of characteristic input levels of differential SSTL signals [19]

Table: Characteristic input levels of differential SSTL signals in DDR/DDR2/DDR3 devices [20], [22], [19]

VTR: True level

VCP: Compl. level

(CK/CK#, DQS/DQS#)

DDR DDR2 DDR3

VDDQ 2.5 V 1.8 V 1.5 V

VREF 1.25 V 0.9 V 0.75 V

VID 620 mV 500 mV 400 mV

VIX VREF VREF VREF

VSS Ground Ground Ground

5.2 Using more advanced signaling (12)

Page 102: Dezső  Sima September 2008

Skew reduction by differential data strobes (DQS, DQS#)

Figure: Skew reduction while using differential strobes instead of single ended strobes [23]

5.2 Using more advanced signaling (13)

Page 103: Dezső  Sima September 2008

Figure: Eye diagram of an ideal and a real signal

The eye diagram

5.2 Using more advanced signaling (14)

Visualizes both signal traces (belonging to the H and L levels) by overlapping subsequent symbols in time, as indicated below for both an ideal and real signal.

The eye diagram is a favorable way

Jitter

Reflections

• to visualize reflections, jitter and

• to contrast expected and available values both for the DVW and voltage levels.

Page 104: Dezső  Sima September 2008

Figure: Eye diagram of an ideal signal showing both min. and available DVW and voltage levels

5.2 Using more advanced signaling (15)

(Annotations, summarized: VIH min and VIL max levels, the min. DVW (tS + tH) centred in the data eye, and the timing and voltage margins around it.)

Visualizing both min. and available DVWs and voltage margins by means of an eye diagram

Page 105: Dezső  Sima September 2008

5.2 Using more advanced signaling (16)

Min.DVW

min

max

Figure: Eye diagram of a real signal showing both min. and available DVW and voltage levels [24]

Page 106: Dezső  Sima September 2008

For a correct operation

available DVW and voltage values ≥ required values

A stable operation needs reasonable temporal margins (timing budget) and voltage margins.

5.2 Using more advanced signaling (17)

Page 107: Dezső  Sima September 2008

Improving the basic synchronisation scheme

Basic synchronisation scheme

Central clock synchronisation Source synchronisation

A central clock is used to latch (capture) addresses, commands and data from the respective buses or to send fetched data.

Figure: Basic synchronisation schemes

5.3 Using more advanced synchronisation (1)

Page 108: Dezső  Sima September 2008

Figure: Contrasting central clocking (SDRAMs) and source synchronised clocking (DDR SDRAMs) while writing random data [25], [13]

(tDQSS: Write command to first DQS latching transition)

Address, command and data lines are latched by the central clock (CLK)

Central clock synchronization (SDRAMs)

5.3 Using more advanced synchronisation (2)

Page 109: Dezső  Sima September 2008

Improving the basic synchronisation scheme

Basic synchronisation scheme

Central clock synchronisation Source synchronisation

A central clock is used to latch (capture) addresses, commands and data from the respective buses or to send fetched data.

• Leads to high skews due to propagation delays (time of flight), different path lengths, different loading of the traces etc.

• SDRAMs and earlier DRAMs are centrally clocked.

Figure: Basic synchronisation schemes

5.3 Using more advanced synchronisation (3)

Page 110: Dezső  Sima September 2008

Improving the basic synchronisation scheme

Basic synchronisation scheme

Central clock synchronisation Source synchronisation

A central clock is used to latch (capture) addresses, commands and data from the respective buses or to send fetched data.

An extra data strobe signal (DQS) is provided to accompany the data sent from the driving unit to the receiving unit.

• Leads to high skews due to propagation delays (time of flight), different path lengths, different loading of the traces etc.

• SDRAMs and earlier DRAMs are centrally clocked.

Figure: Basic synchronisation schemes

5.3 Using more advanced synchronisation (4)

Page 111: Dezső  Sima September 2008

Figure: Contrasting central clocking (SDRAMs) and source synchronised clocking (DDR SDRAMs) while writing random data [25], [13]

(tDQSS: Write command to first DQS latching transition)

Address, command and data lines are latched by the central clock (CLK)

Central clock synchronization (SDRAMs)

Command and address lines are latched by the differential clock (CK, CK#) but data are latched by the source synchronous data strobe DQS

Source synchronization (DDR SDRAMs)

5.3 Using more advanced synchronisation (5)

Page 112: Dezső  Sima September 2008

Improving the basic synchronisation scheme

Basic synchronisation scheme

Central clock synchronisation Source synchronisation

A central clock is used to latch (capture) addresses, commands and data from the respective buses or to send fetched data.

An extra data strobe signal (DQS) is provided to accompany the data sent from the driving unit to the receiving unit.

• Leads to high skews due to propagation delays (time of flight), different path lengths, different loading of the traces etc.

• The data strobe signal eliminates propagation delay differences (skews) between the strobe and the data lines.

• The data strobe signal (DQS) is bidirectional to reduce pin count.

• SDRAMs and earlier DRAMs are centrally clocked.

• DDR SDRAMs are source synchronised.

Figure: Basic synchronisation schemes

5.3 Using more advanced synchronisation (6)

Page 113: Dezső  Sima September 2008

Required phase alignments for synchronous DRAM devices, controllers and modules

• Memory controllers of devices need to perform the following alignments:

• for data writes

SDRAM devices do not perform any alignment on the data sent to the controller, it is the task of the controller to shift the CLK edge to the center of the data eye.

• for data reads

center align data signals (DQ) with the clock (CLK),

• for all commands center align address, command and control signals with the clock (CLK).

5.3 Using more advanced synchronisation (7)

In case of SDRAM devices

• SDRAM devices do not need to perform any phase alignments; however,

they have to guarantee that the required minimal data hold time (tOH) is satisfied, see Figure.

• for data reads

• SDRAM modules need to perform clock deskewing for the clock (CLK) distribution circuitry.

Page 114: Dezső  Sima September 2008

• Memory controllers of devices need to perform the following alignments:

• for all commands center align address, command and control signals with the clock (CK).

In case of DDRx SDRAM devices

5.3 Using more advanced synchronisation (8)

Page 115: Dezső Sima September 2008

5.3 Using more advanced synchronisation (9)

Figure: Required phase alignments in case of DDRx devices

Page 116: Dezső  Sima September 2008

• Memory controllers of devices need to perform the following alignments:

• for data writes center align data signals (DQ) with the data strobe (DQS),

• for all commands center align address, command and control signals with the clock (CK).

In case of DDRx SDRAM devices

5.3 Using more advanced synchronisation (10)

Page 117: Dezső Sima September 2008

5.3 Using more advanced synchronisation (11)

Figure: Required phase alignments in case of DDRx devices

Page 118: Dezső  Sima September 2008

• Memory controllers of devices need to perform the following alignments:

• for data writes

(DDRx devices send edge aligned data strobe signals (DQS) with the data signals (DQ).)

• for data reads

center align data signals (DQ) with the data strobe (DQS),

• for all commands center align address, command and control signals with the clock (CK).

In case of DDRx SDRAM devices

5.3 Using more advanced synchronisation (12)

Page 119: Dezső Sima September 2008

5.3 Using more advanced synchronisation (13)

Figure: Required phase alignments in case of DDRx devices

Page 120: Dezső  Sima September 2008

• Memory controllers of devices need to perform the following alignments:

• for data writes

(DDRx devices send edge aligned data strobe signals (DQS) with the data signals (DQ).) It is then the task of the controller to shift the DQS edge to the center of the data eye.

• for data reads

center align data signals (DQ) with the data strobe (DQS),

• for all commands center align address, command and control signals with the clock (CK).

In case of DDRx SDRAM devices

5.3 Using more advanced synchronisation (14)

Page 121: Dezső Sima September 2008

5.3 Using more advanced synchronisation (15)

Figure: DDR2 write operation at 800 MT/s showing a 90° shift of the differential DQS into the center of the data eye [27]

Example: Shifting DQS into the center of DQ

Page 122: Dezső  Sima September 2008

• Memory controllers of devices need to perform the following alignments:

• for data writes

(DDRx devices send edge aligned data strobe signals (DQS) with the data signals (DQ).) It is then the task of the controller to shift the DQS edge to the center of the data eye.

• for data reads

center align data signals (DQ) with the data strobe (DQS),

• for all commands center align address, command and control signals with the clock (CK).

In case of DDRx SDRAM devices

• DDRx devices perform the following alignment:

they edge align the data strobe signal (DQS) with the data signal (DQ).

• for data reads

5.3 Using more advanced synchronisation (16)

Page 123: Dezső Sima September 2008

The rationale of this alignment scheme

is to keep DRAM devices as simple as possible and to put the complexity into the memory controller [27], by centralizing the DLL circuitry needed to accomplish the alignments in a single place, that is, in the memory controller, and thus avoiding the need to replicate DLLs in every DRAM device (except the DLLs needed in the DRAMs to edge align DQS with CK for reads) [26].

5.3 Using more advanced synchronisation (17)

Page 124: Dezső  Sima September 2008

5.3 Using more advanced synchronisation (18)

• DDRx modules need to perform clock deskewing for the clock (CK) distribution circuitry.

Furthermore

Page 125: Dezső  Sima September 2008

DLLs (Delay Locked Loops)

used to

• edge align or deskew two signals, or

• center align the data strobe signal (DQS) with the data signal (DQ).

5.3 Using more advanced synchronisation (19)

Figure: Deskewing the CLK signal with reference to the CLKREF signal by means of a DLL

Delay

CLKREF

CLK

CLKD

CLKOUT

Delay

DQ

DQS

DQS

DQS

Figure: Shifting the data strobe (DQS) to the center of the data signal (DQ) by means of a DLL

Page 126: Dezső  Sima September 2008

Simplified block diagram and principle of operation of a DLL

(Block diagram, summarized: the clock CLK is fed through a chain of delay elements; a phase delay control unit compares the delayed clock CLKOUT, returned through the clock distribution network, with the reference clock CLKREF and sets the delay accordingly.)

Figure: Block diagram and principle of operation of the DLL by deskewing the clock signal CLK

The DLL is built up mainly of a delay line and a phase delay control unit. The phase delay control unit inserts delay on the clock signal (CLK) until the rising edge of the clock signal (CLK) is in phase with the rising edge of the reference clock signal (CLKREF).

5.3 Using more advanced synchronisation (20)

based on [28]
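A minimal behavioural sketch (Python) of the principle just described; it assumes a discrete delay line with fixed-size taps and is only an illustration, not any specific DRAM or controller implementation:

```python
# DLL principle: keep inserting delay on CLK until its rising edge lines up
# (within one delay tap) with the rising edge of CLKREF.

def lock_dll(clk_edge_ps, clkref_edge_ps, period_ps, tap_ps=25):
    delay_ps = 0
    while (clk_edge_ps + delay_ps - clkref_edge_ps) % period_ps > tap_ps:
        delay_ps += tap_ps          # add one more delay element
    return delay_ps

# e.g. a 5 ns clock (DDR-400, 200 MHz) whose edge lags CLKREF by 1.2 ns:
print(lock_dll(clk_edge_ps=1200, clkref_edge_ps=0, period_ps=5000))   # 3800 ps
```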

Page 127: Dezső  Sima September 2008

"Warm up" time of DLLs

In a DRAM device the DLL is activated during initialization (the power-up procedure). After enabling, however, the DLL needs about 200 clock cycles to lock [13] before any read command can be issued.

Remark [6]

• PLLs and DLLs fulfill similar tasks. However,

• PLLs include a voltage controlled oscillator (VCO) that generates a new clock signal whose phase is adjustable.

• DLLs include a delay line that inserts a voltage controlled phase delay between the input and the output signal.

While DLLs just delay the incoming signal to achieve a phase alignment, PLLs actually synthesize a new clock signal whose phase is adjustable.

• Since DLLs do not incorporate a VCO, they are cheaper to implement than PLLs.

Memory controllers and DRAM devices of synchronous DRAMs make use of DLLs to implement phase alignments. In contrast, memory modules use PLLs to deskew clock distribution networks.

5.3 Using more advanced synchronisation (21)

Page 128: Dezső  Sima September 2008

6. Main limiters of increasing the memory size

Page 129: Dezső  Sima September 2008

Memory size (CM)

The memory size is basically given by the amount of memory installed in the memory system:

CM = nCU x nCH x nM x nR x CR

with

nCU: No. of north bridges/memory control units

nCH: No. of memory channels per north bridge/control unit

nM: No. of memory modules per channel

nR: No. of ranks per memory module

CR: Rank capacity (device density x no. of DRAM devices)

E.g. the Core 2 based P35 chipset supports up to two memory channels with up to two dual-ranked memory modules per channel, with 8 x8 devices of 512 Mb or 1 Gb density per rank.

The resulting maximum memory capacity is:

CMmax = 1 x 2 x 2 x 2 x (8 x 1 Gb) = 8 GB

6. Main limiters of increasing the memory size (1)
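A minimal sketch (Python) of the same calculation, reproducing the P35 example (densities in Gbit, result in GByte):

```python
# Maximum installable memory: CM = nCU * nCH * nM * nR * CR, where the rank
# capacity CR = devices per rank * device density.

def max_memory_gb(n_cu, n_ch, n_m, n_r, devices_per_rank, density_gbit):
    cr_gbit = devices_per_rank * density_gbit     # rank capacity in Gbit
    return n_cu * n_ch * n_m * n_r * cr_gbit / 8  # 8 bit = 1 byte

# P35: one MCH, 2 channels, 2 dual-ranked modules/channel, 8 x8 devices of 1 Gb
print(max_memory_gb(1, 2, 2, 2, 8, 1))            # 8.0 GB
```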

Page 130: Dezső  Sima September 2008

6. Main limiters of increasing the memory size (2)

Crucial factors limiting the maximum size of main memories

• nM: No. of memory modules supported per memory channel

• CR: Rank capacity (device density x no. of DRAM devices/rank).

Beyond the max. installable memory, the max. memory size may be limited by particular constraints, such as the supported max. addressable space due to the number of address pins on the FSB, like in the 925X and 925XE desktop chipsets [31].

Remark

Page 131: Dezső  Sima September 2008

Number of memory modules supported per memory channel

Modules connected via a parallel bus (SDRAM, DDR, DDR2, DDR3 modules): 1-4 memory modules per channel; higher transfer rates limit the number of memory modules typically to one or two.

Modules connected via a serial bus (e.g. FBDIMM modules): 6-8 memory modules per channel.

Figure: Number of memory modules supported by memory channel

6. Main limiters of increasing the memory size (3)

Page 132: Dezső  Sima September 2008

Figure: Max. number of supported memory modules (slots)/channel in Intel’s desktop chipsets

(Chart, summarized: max. number of supported memory slots per channel (1-4) plotted against the transfer rate, 133-1600 MT/s.)

6. Main limiters of increasing the memory size (4)

Page 133: Dezső  Sima September 2008

Figure: Max. number of supported memory modules (slots)/channel in Intel’s server chipsets

(Chart, summarized: max. number of supported memory slots per channel (1-4) plotted against the transfer rate, 133-800 MT/s; values at introduction and later.)

6. Main limiters of increasing the memory size (5)

Page 134: Dezső  Sima September 2008

Notes

1. Servers prefer memory size over memory speed. E.g.

• current desktop chipsets support

speed grades of up to DDR3-1333 (even DDR3-1600 with strong size restrictions) and memory sizes of up to 4 GB/channel,

• current server chipsets using parallel connected main memory support

speed grades of up to DDR2-667 but

memory sizes of up to 16/24 GB/channel.

2. Servers expect registered memory modules rather than the unbuffered modules desktops use. Registered modules provide buffering for the address and control lines; by reducing signal loading they increase the number of supported memory slots (memory modules) and thus the supported memory size.

3. At higher transfer rates the next wavefront arrives earlier on the transmission line, so less time remains for settling the reflections of the previous wavefront, and inter-symbol interference (ISI) rises.

Thus, at higher frequencies reflections, and also skews and jitter, impede signal integrity more and more. This limits the number of supported memory modules/channel.

Recent desktop chipsets typically support 1-2, whereas server chipsets with a parallel communication path typically support 2-3 memory modules (slots)/channel.

6. Main limiters of increasing the memory size (6)

Page 135: Dezső  Sima September 2008

Rank capacity (CR)

CR = nD x D

with nD: Number of DRAM devices/rank

D: Device density

Number of DRAM devices/rank

E.g. A one-sided (single rank) DDR3 memory module built up of 8 devices

A 64-bit wide rank consists of 8 x8 or 16 x4 devices, and occupies usually one module side.

6. Main limiters of increasing the memory size (7)

Page 136: Dezső  Sima September 2008

Remark

Figure: Double sided DDR SDRAM DIMM with 16 stacked devices on each side [30]

A few Intel server chipsets, such as the E7500 and E7501, supported stacked devices as well. E.g. the E7500 server chipset supported double-sided dual-rank DIMMs with 16 stacked devices (a rank) mounted on each side, yielding a total module size of 2 GB.

6. Main limiters of increasing the memory size (8)

Page 137: Dezső  Sima September 2008

Figure: Evolution of DRAM densities (Mbit) and no. of units shipped/year (Based on [29])

6. Main limiters of increasing the memory size (9)

Device density

(Chart, summarized: DRAM device density from 16 Kbit to 1 Gbit and units shipped per year (up to ~2000 million), 1980-2015; density grows at roughly 4x/4 years.)

Page 138: Dezső  Sima September 2008

Figure: Supported max. device size and max memory size/channel in Intel’s desktop chipsets

(Charts, summarized: supported max. device density (512 Mb to 1 Gb) and max. memory size per channel (1 GB to 4 GB) plotted against the transfer rate (133-1600 MT/s), for the 845 (1/02), 875P (4/03), 925X (6/04), 975X (11/05), P35 (6/07) and X48 (3/08) chipsets.)

6. Main limiters of increasing the memory size (10)

Page 139: Dezső  Sima September 2008

Figure: Supported max. device size and max memory size/channel in Intel’s server chipsets

(Charts, summarized: supported max. device density (512 Mb to 2 Gb) and max. memory size per channel (8 GB to 24 GB) plotted against the transfer rate (133-800 MT/s), for the E7501 (12/02), E7520 (8/04) and 5100 (1/08) chipsets; values at introduction and later.)

6. Main limiters of increasing the memory size (11)

Page 140: Dezső  Sima September 2008

Notes

1. As the figures indicate, recent desktops provide up to 4 GB/channel memory size, whereas recent servers (with parallel bus attachment) offer 4-8 times larger sizes.

2. Servers achieve larger memory sizes by

• supporting more memory modules (with registering expected) than desktop chipsets do, and

• using higher density DRAM devices at the same speed grade (e.g. 1 Gb devices instead of 512 Mb devices, or 2 Gb devices instead of 1 Gb devices) than desktop chipsets do.

3. Recent server chipsets supporting main memories with serial bus attachment (like Intel's 5000 and 7000 DP and MP-family chipsets) support both more channels and more modules/channel, providing much higher main memory sizes of up to 192 GB or more (see Section Main memories with serial bus attachment).

6. Main limiters of increasing the memory size (12)

Page 141: Dezső  Sima September 2008

The rate of increasing DRAM densities

In accordance with Moore's law (saying that the transistor count per chip doubles about every 24 months),

DRAM densities evolve at about 4x/4 years.

For the same numbers of control units/modules/ranks, the maximum size of main memories would thus also increase at about 4x/4 years.

6. Main limiters of increasing the memory size (13)

But as the number of modules/channel decreases with higher transfer rates,

the maximum size of main memories increases at a rate < 4x/4 years.

Page 142: Dezső  Sima September 2008

7. References (1)

[2]: Moore G. E., No Exponential is Forever... ISSCC 2003, ftp://download.intel.com/research/silicon/Gordon_Moore_ISSCC_021003.pdf

[3]: Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006, http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf

[4]: 16 Mb Synchronous DRAM, MT48LC4M4A1/A2, MT48LC2M8A1/A2 Micron, http://datasheet.digchip.com/297/297-04447-0-MT48LC2M8A1.pdf

[5]: Rhoden D., „The Evolution of DDR”, Via Technology Forum, 2005, http://www.via.com.tw/en/downloads/presentations/events/vtf2005/vtf05hd_inphi.pdf

[6]: Jacob B., Ng S. W., Wang D. T., Memory Systems, Elsevier, 2008

[7]: Backplane Designer’s Guide, Section 9 - Layout Considerations, Fairchild Semiconductor, Apr. 2002, http://www.fairchildsemi.com/ms/MS/MS-569.pdf

[1]: D. Bhandarkar: „The Dawn of a New Era”, 11. EMEA, May, 2006.

[8]: PC133 SDRAM Registered DIMM Design Specification, Rev. 1.1, Aug. 1999, IBM & Reliance Computer Corp., http://www.simmtester.com/PAGE/memory/techdata_pc133rev1_1.pdf

[9]: Horna O. A., „Pulse Reflection in Transmission Lines,” IEEE Transactions on Computers, Vol. C-20, No. 12, Dec. 1971, pp. 1558-1563

Page 143: Dezső  Sima September 2008

7. References (2)

[10]: Vo J., „A Comparison of Differential Termination Techniques,” Application Note 903, Aug. 1993, National Semiconductor, http://www1.control.com/PLCArchive/RS485_3.pdf

[11]: Allan G., „The outlook for DRAMs in consumer electronics”, EETIMES Europe Online, 01/12/2007, http://eetimes.eu/showArticle.jhtml?articleID=196901366&queryText=calibrated

[12]: Interfacing to DDR SDRAM with CoolRunner-II CPLDs, Application Note XAPP384, Febr. 2003, Xilinx Inc.

[13]: Double Data Rate (DDR) SDRAM MT46V128M4, MT46V64M8, MT46V32M16, Micron Techn. Inc, 2000, http://download.micron.com/pdf/datasheets/dram/ddr/512MBDDRx4x8x16.pdf

[14]: Kirstein B., „Practical timing analysis for 100-MHz digital design,”, EDN, Aug. 8, 2002, www.edn.com

[15]: Ebeling C., Koontz T., Krueger R., „System Clock Management Simplified with Virtex-II Pro FPGAs”, WP190, Febr. 25 2003, Xilinx, http://www.xilinx.com/support/documentation/white_papers/wp190.pdf

[16]: DDR Simulation Process Introduction, TN-46-11, July 2005, Micron, http://download.micron.com/pdf/technotes/DDR/TN4611.pdf

[17]: Allan G., „DDR Integration,” Chip Design Magazine, June/July 2007

Page 144: Dezső  Sima September 2008

7. References (3)

[19]: Stub Series Terminated Logic for 1.8 Volts (SSTL-18), JEDEC Standard JESD8-15A, Sept. 2003

[20]: Double Data Rate (DDR) SDRAM Specification, JEDEC Standard JESD79E, May 2005

[22]: DDR3 SDRAM Standard, JEDEC Standard JESD79-3, June 2007

[21]: DDR2 SDRAM Specification, JEDEC Standard JESD79-2, May 2006

[23]: DDR2 (Point-to-Point) Features and Functionality, TN-47-19, Micron,2003, http://download.micron.com/pdf/technotes/ddr2/TN4719.pdf

[24]: Ahn J.-H., „Memory Design Overview,” March 2007, Hynix, http://netro.ajou.ac.kr/~jungyol/memory2.pdf

[18]: Stub Series Terminated Logic for 2.5 Volts (SSTL-2), EIA/JEDEC Standard JESD8-9, Sept. 1998

[25]: Micron Synchronous DRAM, 64 Mbit, MT48LC16M4A2, MT48LC16M8A2, MT48LC16M16A2, Micron Technology, Inc. http://www.micron.com/products/dram/sdram/partlist.aspx Oct. 2000

[26] General DDR SDRAM Functionality, TN-46-05, Micron Techn. Inc., July 2001, http://download.micron.com/pdf/technotes/TN4605.pdf

[27]: Haskill, „The Love/Hate relationship with DDR SDRAM Controllers,” Mosaid, Oct. 2006, http://www.mosaid.com/corporate/products-services/ip/SDRAM_Controller_whitepaper_Oct_2006.pdf

Page 145: Dezső  Sima September 2008

7. References (4)

[28]: Introduction to Xilinx, Xilinx FPGA Design Workshop, http://www.eas.asu.edu/~kchatha/cse320_f07/xilinx_intro.ppt

[29]: DRAM Pricing – A White Paper, Tachyon Semiconductors, http://www.tachyonsemi.com/about/papers/dram_pricing.pdf

[30]: Intel E7500 MCH A2 x4/x8 DDR Memory Limitations, Application Note AP-722, March 2002, Intel

[31]: Intel 925X/925XE Express Chipset, Datasheet, Rev. 001, Jun. 2004, Intel

[32]: Keeth B., Baker R. J., Johnson B., Lin F., DRAM Circuit Design, Wiley-Interscience, 2008