cse241a vlsi digital circuits winter 2003 lecture 02: datapath and memory

35
CSE241 L2 Datapath/Memory.1 Kahng & Cichy, UCSD ©2003 CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath and Memory

Upload: miriam

Post on 24-Jan-2016

35 views

Category:

Documents


0 download

DESCRIPTION

CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath and Memory. Introduction: Basic Building Blocks. Datapath Execution units Adder, multiplier, divider, shifter, etc. Register file and pipeline registers Multiplexers, decoders Control - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.1 Kahng & Cichy, UCSD ©2003

CSE241AVLSI Digital Circuits

Winter 2003

Lecture 02: Datapath

and Memory

Page 2: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.2 Kahng & Cichy, UCSD ©2003

Introduction: Basic Building Blocks

Datapath Execution units

- Adder, multiplier, divider, shifter, etc.

Register file and pipeline registers Multiplexers, decoders

Control Finite state machines (PLA, ROM, random logic)

Interconnect Switches, arbiters, buses

Memory Caches (SRAMs), TLBs, DRAMs, buffers

Page 3: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.3 Kahng & Cichy, UCSD ©2003

The 1-bit Binary Adder

1-bit Full Adder(FA)

A

BS

Cin

S = A B Cin

Cout = A&B | A&Cin | B&Cin (majority function)

How can we use it to build a 64-bit adder?

How can we modify it easily to build an adder/subtractor?

How can we make it better (faster, lower power, smaller)?

A B Cin CoutS carry status

0 0 0 0 0 kill

0 0 1 0 1 kill

0 1 0 0 1 propagate

0 1 1 1 0 propagate

1 0 0 0 1 propagate

1 0 1 1 0 propagate

1 1 0 1 0 generate

1 1 1 1 1 generate

Cout

G = A&BP = A BK = !A & !B

= P Cin

= G | P&Cin

Slide courtesy of Mary Jane Irwin, Penn state

Page 4: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.4 Kahng & Cichy, UCSD ©2003

FA Gate Level Implementations

A B

S

Cout

Cin

t1 t0t2 t0

t1

A B

S

Cout

Cin

t2

Slide courtesy of Mary Jane Irwin, Penn state

Page 5: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.5 Kahng & Cichy, UCSD ©2003

Review: XOR FA

Cout

S

Cin

A

B

16 transistors

Slide courtesy of Mary Jane Irwin, Penn state

Page 6: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.6 Kahng & Cichy, UCSD ©2003

Ripple Carry Adder (RCA)

A0 B0

S0

C0=CinFA

A1 B1

S1

FA

A2 B2

S2

FA

A3 B3

S3

FACout=C4

T = O(N) worst case delay

Tadder TFA(A,BCout) + (N-2)TFA(CinCout) + TFA(CinS)

Real Goal: Make the fastest possible carry pathMax delay = tdelay = tsum + (N-1) tcarry

Slide courtesy of Mary Jane Irwin, Penn state

Page 7: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.7 Kahng & Cichy, UCSD ©2003

Inversion Property

A B

S

CinFA

!Cout (A, B, Cin) = Cout (!A, !B, !Cin)

Cout

A B

S

FACout Cin

!S (A, B, Cin) = S(!A, !B, !Cin)

Inverting all inputs to a FA results in inverted values for all outputs

Slide courtesy of Mary Jane Irwin, Penn state

Page 8: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.8 Kahng & Cichy, UCSD ©2003

Exploiting the Inversion Property

A0 B0

S0

C0=CinFA’

A1 B1

S1

FA’

A2 B2

S2

FA’

A3 B3

S3

FA’Cout=C4

Now need two “flavors” of FAs

regular cellinverted cell

• Minimizes the critical path (the carry chain) by eliminating inverters between the FAs (will need to increase the transistor sizing on the carry chain portion of the mirror adder).

Slide courtesy of Mary Jane Irwin, Penn state

Page 9: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.10 Kahng & Cichy, UCSD ©2003

Fast Carry Chain Design

The key to fast addition is a low latency carry network

What matters is whether in a given position a carry is generated Gi = Ai & Bi = AiBi

propagated Pi = Ai Bi (sometimes use Ai | Bi) annihilated (killed) Ki = !Ai & !Bi

Giving a carry recurrence of

Ci+1 = Gi | PiCi

C1 = G0 | P0C0

C2 = G1 | P1G0 | P1P0 C0

C3 = G2 | P2G1 | P2P1G0 | P2P1P0 C0

C4 = G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0 C0

Slide courtesy of Mary Jane Irwin, Penn state

Page 10: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.11 Kahng & Cichy, UCSD ©2003

Binary Adder Landscape

synchronous word parallel adders

ripple carry adders (RCA) carry prop min adders

signed-digit fast carry prop residue adders adders adders

Manchester carry parallel conditional carry carry chain select prefix sum skip

T = O(N), A = O(N)

T = O(1), A = O(N)

T = O(log N)A = O(N log N)

T = O(N), A = O(N)T = O(N)

A = O(N)

Page 11: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.12 Kahng & Cichy, UCSD ©2003

Parallel Prefix Adders (PPAs) Define carry operator € on (G,P) signal pairs

€ is associative, i.e.,

[(g’’’,p’’’) € (g’’,p’’)] € (g’,p’) = (g’’’,p’’’) € [(g’’,p’’) € (g’,p’)]

(G’’,P’’) (G’,P’)

(G,P)

where G = G’’ P’’G’ P = P’’P’

€ €

G’

!G

G’’

P’’

Slide courtesy of Mary Jane Irwin, Penn state

Page 12: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.13 Kahng & Cichy, UCSD ©2003

PPA General Structure Given P and G terms for each bit position, computing all

the carries is equal to finding all the prefixes in parallel

(G0,P0) € (G1,P1) € (G2,P2) € … € (GN-2,PN-2) € (GN-1,PN-1)

Since € is associative, we can group them in any order but note that it is not commutative

Measures to consider number of € cells tree cell depth (time) tree cell area cell fan-in and fan-out max wiring length wiring congestion delay path variation

(glitching)

Pi, Gi logic (1 unit delay)

Si logic (1 unit delay)

Ci parallel prefix logic tree (1 unit delay per level)

Slide courtesy of Mary Jane Irwin, Penn state

Page 13: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.14 Kahng & Cichy, UCSD ©2003

Adder Types

RCA = Ripple Carry

MCC = Manchester Carry Chain

CCSka = Carry-Chain haSave

VCSka =

CCSia = Carry Save with Invert

BK = Brent Kung

Others: Ling-Ling ELM Kogge-Stone

Page 14: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.18 Kahng & Cichy, UCSD ©2003

Review: Basic Building Blocks

Datapath Execution units

- Adder, multiplier, divider, shifter, etc. Register file and pipeline registers Multiplexers, decoders

Control Finite state machines (PLA, ROM, random

logic)

Interconnect Switches, arbiters, buses

Memory Caches (SRAMs), TLBs, DRAMs, buffers

Page 15: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.19 Kahng & Cichy, UCSD ©2003

Parallel Programmable Shifters

Dat

a In

Control =

Dat

a O

ut

Shift amountShift directionShift type (logical, arith, circular)

Shifters used in multipliers, floating point units

Consume lots of area if done in random logic gates

Slide courtesy of Mary Jane Irwin, Penn state

Page 16: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.20 Kahng & Cichy, UCSD ©2003

Shifters - Applications

Linear shifting Concatenate 2 words (N-bits) and pull out a contiguous N-bit word.

Take an portion of a word and shift to to the left or right- Multiply by 2M

- Pad the emptied position with 0’s or 1’s

- Arithmetic shifts

– Left shift, pad 0’s

– Right shift, pad 1’s

Barrel shifting Emptied position filled with bit dropped off. Rotational shifting… circular convolution.

wordA wordB

wordC

Slide courtesy of Ken Yang, UCLA

Page 17: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.21 Kahng & Cichy, UCSD ©2003

A Programmable Binary Shifter

rgt nop left

Ai

Ai-1 Bi-1

Bi

Ai Ai-1rgt nop left Bi Bi-1

A1 A00 1 0 A1 A0

A1 A01 0 0 0 A1

A1 A00 0 1 A0

0

•Slide courtesy of Mary Jane Irwin, Penn state

Page 18: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.23 Kahng & Cichy, UCSD ©2003

4-bit Barrel Shifter

A0

A1

A2

A3

B0

B1

B2

B3

Sh1

Sh2

Sh3

Sh0 Sh1 Sh2 Sh3

Example: Sh0 = 1 B3B2B1B0 = A3A2A1A0

Sh1 = 1 B3B2B1B0 = A3A3A2A1

Sh2 = 1 B3B2B1B0 = A3A3A3A2

Sh3 = 1 B3B2B1B0 = A3A3A3A3

Area dominated by wiring

•Slide courtesy of Mary Jane Irwin, Penn state

Page 19: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.25 Kahng & Cichy, UCSD ©2003

4-bit Barrel Shifter Layout

BufferSh3Sh2Sh1Sh0

A3

A2

A1

A0

Widthbarrel ~ 2 pm N N = max shift distance, pm = metal pitchDelay ~ 1 fet + N diff caps

Widthbarrel

Only one Sh#active at a timel

•Slide courtesy of Mary Jane Irwin, Penn state

Page 20: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.26 Kahng & Cichy, UCSD ©2003

Review: Basic Building Blocks

Datapath Execution units

- Adder, multiplier, divider, shifter, etc.

Register file and pipeline registers

Memories SRAM cell

- 6T

DRAM - 1T

Other types- 1T SRAM

Page 21: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.27 Kahng & Cichy, UCSD ©2003

Semiconductor Memories

RWM

Read Write Memory

NVRWM

Non Volatile

ROM

Read Only Random Access

Non-Random Access

EPROM Mask-programmed

SRAM (cache,

register file)

FIFO/LIFO E2PROM

DRAM Shift Register

CAM

FLASH Electrically-programmed

(PROM)

Slide courtesy of Mary Jane Irwin, Penn state

Page 22: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.28 Kahng & Cichy, UCSD ©2003

SecondLevelCache

(SRAM)

A Typical Memory Hierarchy

Control

Datapath

SecondaryMemory(Disk)

On-Chip Components

Reg

File

MainMemory(DRAM)

Data

Cach

eIn

strC

ache

ITL

BD

TL

B

eDRAM

Speed (ns): .1’s 1’s 10’s 100’s 1,000’sSize (bytes): 100’s K’s 10K’s M’s T’s

Cost: highest lowest

By taking advantage of the principle of locality: Present the user with as much memory as is available in the

cheapest technology. Provide access at the speed offered by the fastest technology.

Slide courtesy of Mary Jane Irwin, Penn state

Page 23: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.29 Kahng & Cichy, UCSD ©2003

Access Time comparison

Type Time (ns)

RDRAM 35ns

DRAM 30ns

SRAM 2ns

FLASH 90ns

FRAM 100ns

ROM (read) 50ns

Latency•Time to read

Bandwidth•Throughput of system

(Generalized)

Page 24: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.30 Kahng & Cichy, UCSD ©2003

Read-Write Memories (RAMs) Static – SRAM

data is stored as long as supply is applied large cells (6 fets/cell) – so fewer bits/chip fast – so used where speed is important (e.g., caches) differential outputs (output BL and !BL) use sense amps for performance compatible with CMOS technology

Dynamic – DRAM periodic refresh required small cells (1 to 3 fets/cell) – so more bits/chip slower – so used for main memories single ended output (output BL only) need sense amps for correct operation not typically compatible with CMOS technology

Slide courtesy of Mary Jane Irwin, Penn state

Page 25: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.31 Kahng & Cichy, UCSD ©2003

6-transistor SRAM Cell

!BL BL

WL

M1

M2

M3

M4

M5M6Q

!Q

Slide courtesy of Mary Jane Irwin, Penn state

Page 26: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.32 Kahng & Cichy, UCSD ©2003

SRAM Cell Analysis (Read)

!BL=1 BL=1

WL=1

M1

M4

M5M6

Q=1!Q=0

CbitCbit

Read-disturb (read-upset): must carefully limit the allowed voltage rise on !Q to a value that prevents the read-upset condition from occurring while simultaneously maintaining acceptable circuit speed and area constraints

Slide courtesy of Mary Jane Irwin, Penn state

Page 27: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.33 Kahng & Cichy, UCSD ©2003

SRAM Cell Analysis (Read)

!BL=1 BL=1

WL=1

M1

M4

M5M6

Q=1!Q=0

CbitCbit

Cell Ratio (CR) = (WM1/LM1)/(WM5/LM5)

V!Q = [(Vdd - VTn)(1 + CR (CR(1 + CR))]/(1 + CR)

Slide courtesy of Mary Jane Irwin, Penn state

Page 28: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.34 Kahng & Cichy, UCSD ©2003

Read Voltages Ratios

0

0.2

0.4

0.6

0.8

1

1.2

0.3 0.6 0.9 1.2 1.5 1.8 2.1 2.4

Cell Ratio (CR)

Vo

ltag

e R

ise

on

!Q

Vdd = 2.5VVTn = 0.5V

Slide courtesy of Mary Jane Irwin, Penn state

Page 29: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.35 Kahng & Cichy, UCSD ©2003

SRAM Cell Analysis (Write)

!BL=1 BL=0

WL=1

M1

M4

M5M6

Q=1!Q=0

Pullup Ratio (PR) = (WM4/LM4)/(WM6/LM6)

VQ = (Vdd - VTn) ((Vdd – VTn)2 – (p/n)(PR)((Vdd – VTn - VTp)2)

Slide courtesy of Mary Jane Irwin, Penn state

Page 30: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.36 Kahng & Cichy, UCSD ©2003

Write Voltages Ratios

0

0.2

0.4

0.6

0.8

1

0.3 0.6 0.9 1.2 1.5 1.8 2.1 2.4

Pullup Ratio (PR)

Wri

te V

olt

age

(VQ

)

Vdd = 2.5V|VTp| = 0.5V

p/n = 0.5

Slide courtesy of Mary Jane Irwin, Penn state

Page 31: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.37 Kahng & Cichy, UCSD ©2003

Cell Sizing

Keeping cell size minimized is critical for large caches

Minimum sized pull down fets (M1 and M3) Requires minimum width and longer than minimum channel

length pass transistors (M5 and M6) to ensure proper CR But sizing of the pass transistors increases capacitive load on

the word lines and limits the current discharged on the bit lines both of which can adversely affect the speed of the read cycle

Minimum width and length pass transistors Boost the width of the pull downs (M1 and M3) Reduces the loading on the word lines and increases the

storage capacitance in the cell – both are good! – but cell size may be slightly larger

Slide courtesy of Mary Jane Irwin, Penn state

Page 32: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.38 Kahng & Cichy, UCSD ©2003

6T-SRAM Layout

VDD

GND

QQ

WL

BLBL

M1 M3

M4M2

M5 M6

Slide courtesy of Mary Jane Irwin, Penn state

Page 33: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.39 Kahng & Cichy, UCSD ©2003

1-Transistor DRAM Cell

M1 X

BL

WL

X Vdd-Vt

WLwrite“1”

BL Vdd

Write: Cs is charged (or discharged) by asserting WL and BLRead: Charge redistribution occurs between CBL and Cs

Cs

read“1”

Vdd/2 sensing

Read is destructive, so must refresh after read

CBL

Slide courtesy of Mary Jane Irwin, Penn state

Page 34: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.40 Kahng & Cichy, UCSD ©2003

1-T DRAM Cell

(a) Cross-section

(b) Layout

Diffusedbit line

Polysiliconplate

M1 wordline

Capacitor

Polysilicongate

Metal word line

SiO2

n+ Field Oxide

Inversion layerinduced by plate bias

n+

poly

poly

Used Polysilicon-Diffusion Capacitance

Expensive in Area

Slide courtesy of Mary Jane Irwin, Penn state

Page 35: CSE241A VLSI Digital Circuits Winter 2003 Lecture 02: Datapath  and Memory

CSE241 L2 Datapath/Memory.41 Kahng & Cichy, UCSD ©2003

DRAM Cell Observations

DRAM memory cells are single ended (complicates the design of the sense amp)

1T cell requires a sense amp for each bit line due to charge redistribution read

1T cell read is destructive; refresh must follow to restore data

1T cell requires an extra capacitor that must be explicitly included in the design

A threshold voltage is lost when writing a 1 can be circumvented by bootstrapping the word lines to a higher

value than Vdd

Not usually available on chip, unless analog elements are present