performance_characterization.pdf
TRANSCRIPT
-
7/27/2019 Performance_characterization.pdf
1/46
1
ECE 261 Krish Chakrabarty 1
Performance Characterization
Delay analysis
Transistor sizing
Logical effort
Power analysis
Delay Definitions
tpdr: rising propagation delay, from input to rising output crossing VDD/2
tpdf: falling propagation delay, from input to falling output crossing VDD/2
tpd: average propagation delay, tpd = (tpdr + tpdf)/2
tr: rise time, from output crossing 0.2 VDD to 0.8 VDD
tf: fall time, from output crossing 0.8 VDD to 0.2 VDD
Simulated Inverter Delay
Solving differential equations by hand is too hard.
A SPICE simulator solves the equations numerically.
It uses more accurate I-V models too!
But simulations take time to write.
Delay Estimation
We would like to be able to estimate delay easily.
Not as accurate as simulation, but easier to ask "What if?"
The step response usually looks like a first-order RC response with a decaying exponential.
Use RC delay models to estimate delay:
C = total capacitance on the output node
Use an effective resistance R so that tpd = RC
Characterize transistors by finding their effective R, which depends on the average current as the gate switches.
RC Delay Models
Use equivalent circuits for MOS transistors: an ideal switch plus capacitance and ON resistance.
A unit nMOS has resistance R and capacitance C.
A unit pMOS has resistance 2R and capacitance C.
Capacitance is proportional to width; resistance is inversely proportional to width.
Example: 3-input NAND
Sketch a 3-input NAND with transistor widths chosen to achieve effective rise and fall resistances equal to those of a unit inverter (R).
3-input NAND Caps
Annotate the 3-input NAND gate with gate and diffusion capacitance.
Elmore Delay
ON transistors look like resistors.
The pullup or pulldown network is modeled as an RC ladder.
Elmore delay of an RC ladder: tpd ≈ R1·C1 + (R1 + R2)·C2 + ... + (R1 + R2 + ... + RN)·CN
Example: 2-input NAND
Estimate the worst-case rising and falling delay of a 2-input NAND driving h identical gates.
Delay Components
Delay has two parts:
Parasitic delay: 6 or 7 RC, independent of load
Effort delay: 4h RC, proportional to load capacitance
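As a rough check of the parasitic and effort terms above, the Elmore delay of an RC ladder can be sketched in Python. The function names, the normalization R = C = 1, and the node capacitances (2C on the internal node, 6C of diffusion plus 4hC of gate load on the output) are assumptions chosen to match a 2-input NAND built from width-2 transistors; the original figures are not available in this transcript.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder.

    resistances[i] is the resistor feeding node i from the previous node
    (or from the supply for i = 0); capacitances[i] is the capacitance at
    node i. Each capacitor contributes C_i times the total resistance
    between it and the source.
    """
    delay = 0.0
    r_to_source = 0.0
    for r, c in zip(resistances, capacitances):
        r_to_source += r
        delay += r_to_source * c
    return delay


def nand2_delays(h, R=1.0, C=1.0):
    """Worst-case (rising, falling) delay of a NAND2 driving h copies of itself.

    Assumed sizing: nMOS and pMOS all have width 2, so each series nMOS
    has resistance R/2 and a single conducting pMOS has resistance R.
    """
    c_out = (6 + 4 * h) * C  # 6C diffusion + 4hC gate load (assumed)
    rising = elmore_delay([R], [c_out])                      # one pMOS on
    falling = elmore_delay([R / 2, R / 2], [2 * C, c_out])   # two series nMOS
    return rising, falling


for h in (0, 1, 4):
    rise, fall = nand2_delays(h)
    print(f"h={h}: rising = {rise:g} RC, falling = {fall:g} RC")
```

Under these assumptions the rising delay comes out as (6 + 4h)RC and the falling delay as (7 + 4h)RC, which is where the "6 or 7 RC" parasitic term and the "4h RC" effort term come from.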
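Contamination Delay
Best-case (contamination) delay can be substantially less than propagation delay.
Ex: If both inputs fall simultaneously, both pMOS transistors pull up in parallel, so the rising delay drops to (R/2)(6 + 4h)C = (3 + 2h)RC.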
Diffusion Capacitance
We assumed contacted diffusion on every source/drain.
Good layout minimizes diffusion area.
Ex: NAND3 layout shares one diffusion contact, which reduces output capacitance by 2C.
Merged uncontacted diffusion might help too.
Layout Comparison
Which layout is better?
Resizing the Inverter
[Figure: inverter layout showing n-diffusion, p-diffusion, and poly, with dimensions 2, 3, and 9]
Minimum-sized transistor: W = 3, L = 2
To get equal rise and fall times, βn = βp, so Wp = 3Wn, assuming that electron mobility is three times that of holes. Hence Wp = 9.
Sometimes the function being implemented makes resizing unnecessary!
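Analyzing the NAND Gate
[Figure: 3-input NAND schematic. Parallel pMOS p1, p2, p3 connect VDD to output F; series nMOS n1, n2, n3 connect F to Gnd; inputs a, b, c]
βn,eff = 1 / (1/βn1 + 1/βn2 + 1/βn3)
Resistances are in series (conductances combine like resistors in parallel).
If βn1 = βn2 = βn3 = βn, then βn,eff = βn/3.
The pull-down circuit has three times the resistance, one-third the conductance.
For pull-up, only one transistor has to be on: βp,eff = min{βp1, βp2, βp3}.
If βp1 = βp2 = βp3 = βp, then βp,eff = βp.
Equal rise and fall times need βn,eff = βp,eff, i.e. βp = βn/3, which equal-width transistors already satisfy (electron mobility is three times hole mobility), so no resizing is necessary.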
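Analyzing the NOR Gate
βp,eff = 1 / (1/βp1 + 1/βp2 + 1/βp3)
Resistances are in series (conductances combine like resistors in parallel).
If βp1 = βp2 = βp3 = βp, then βp,eff = βp/3.
The pull-up circuit has three times the resistance, one-third the conductance.
For pull-down, only one transistor has to be on: βn,eff = min{βn1, βn2, βn3}.
If βn1 = βn2 = βn3 = βn, then βn,eff = βn.
With equal widths βn = 3βp, so βn,eff = 9βp,eff: considerable resizing is necessary, Wp = 9Wn!
[Figure: 3-input NOR schematic. Series pMOS p1, p2, p3 connect VDD to the output; parallel nMOS n1, n2, n3 connect the output to Gnd; inputs a, b, c]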
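The NAND and NOR sizing arguments can be checked with a small Python sketch. It assumes β is proportional to mobility times width (all other factors cancel) and that electron mobility is three times hole mobility, as on the inverter slide; the function names are illustrative only.

```python
MOBILITY_RATIO = 3.0  # assumed: electron mobility = 3x hole mobility


def series_beta(betas):
    """Effective gain factor of series transistors (adds like parallel resistors)."""
    return 1.0 / sum(1.0 / b for b in betas)


def worst_case_parallel_beta(betas):
    """Only one parallel transistor is guaranteed to be on: take the weakest."""
    return min(betas)


def pmos_to_nmos_width_ratio(gate, n_inputs):
    """Wp/Wn that makes beta_p,eff equal beta_n,eff for an n-input gate."""
    if gate == "NAND":   # series nMOS, parallel pMOS: mu_n*Wn/n = mu_p*Wp
        return MOBILITY_RATIO / n_inputs
    if gate == "NOR":    # parallel nMOS, series pMOS: mu_n*Wn = mu_p*Wp/n
        return MOBILITY_RATIO * n_inputs
    raise ValueError(f"unknown gate type: {gate}")


print("series of three beta=3 devices:", series_beta([3.0, 3.0, 3.0]))
print("NAND3 Wp/Wn:", pmos_to_nmos_width_ratio("NAND", 3))
print("NOR3  Wp/Wn:", pmos_to_nmos_width_ratio("NOR", 3))
```

For three inputs the ratio is 1 for the NAND (no resizing) and 9 for the NOR, matching the slides.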
Effect of Series Transistors
[Figure: three series transistors, each with poly gate length L and diffusion width W, shown alongside a single transistor of length 3L and width W]
Effect of Series Transistors
[Figure: transistor resizing example. Pull-up pMOS transistors a, b, c, each of strength βp, between VDD and the pull-down network]
Resize the pull-up transistors to make pull-up times equal.
After resizing: a: 2βp, b: 2βp, c: βp
Transistor Placement (Series Stack)
Body effect: Vt increases with Vsb
[Figure: pull-up stack driving output F above series nMOS transistors ta, tb, tc (inputs a, b, c) down to Gnd, with node capacitances Ca, Cb, Cc]
At time t = 0, a = b = c = 0, F = 1, and the capacitances are charged.
Ideally Vta = Vtb = Vtc ≈ 0.8 V. However, Vta > Vtb > Vtc because of body effect.
If a, b, c become 1 at the same time, which transistor will switch on first?
How to order transistors in a series stack?
tc will switch on first (Vsb for tc is zero); Cc will discharge, pulling Vsb for tb to zero.
If signals arrive at different times, how should the transistors be ordered?
Design strategy: place the latest-arriving signal nearest to the output; early signals will discharge internal nodes.
Transistor Placement
[Figure: two arrangements of the series stack ta, tb, tc (inputs a, b, c; node capacitances Ca, Cb, Cc) between the pull-up stack, output F, and Gnd; primary inputs change simultaneously; devices labeled with size 2]
Some Design Guidelines
Use NAND gates (instead of NOR) wherever possible.
Place inverters (buffers) at high-fanout nodes to improve drive capability.
Avoid use of NOR completely in high-speed circuits: (A1 + A2 + ... + An)' = A1'.A2'. ... .An', where ' denotes complement.
Some Design Guidelines Use limited fan-in (
Logical Effort
Chip designers face a bewildering array of choices:
What is the best circuit topology for a function?
How many stages of logic give least delay?
How wide should the transistors be?
Logical effort is a method to make these decisions:
Uses a simple model of delay
Allows back-of-the-envelope calculations
Helps make rapid comparisons between alternatives
Emphasizes remarkable symmetries
Delay in a Logic Gate
Express delays in a process-independent unit: τ = 3RC
12 ps in a 180 nm process
40 ps in a 0.6 µm process
Delay has two components: d = f + p
Effort delay f = gh (a.k.a. stage effort), which again has two components:
g: logical effort. Measures relative ability of the gate to deliver current; g ≡ 1 for an inverter.
h: electrical effort = Cout / Cin. Ratio of output to input capacitance; sometimes called fanout.
Parasitic delay p: represents the delay of the gate driving no load; set by internal parasitic capacitance.
Delay Plots
d = f + p = gh + p
Computing Logical Effort
Definition: Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current.
Measure from delay vs. fanout plots
Or estimate by counting transistor widths
Catalog of Gates

Logical effort of common gates:

                  Number of inputs
Gate type         1     2     3     4     n
Inverter          1
NAND                    4/3   5/3   6/3   (n+2)/3
NOR                     5/3   7/3   9/3   (2n+1)/3
Tristate / mux    2     2     2     2     2

Parasitic delay of common gates, in multiples of pinv (≈ 1):

                  Number of inputs
Gate type         1     2     3     4     n
Inverter          1
NAND                    2     3     4     n
NOR                     2     3     4     n
Tristate / mux    2     4     6     8     2n
XOR, XNOR               4     6     8
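The closed-form columns of the two catalogs, together with d = gh + p, can be captured in a short Python sketch. The function names and the lowercase gate labels are illustrative assumptions; XOR/XNOR is left out because the catalog gives no general formula for it.

```python
from fractions import Fraction


def logical_effort(gate, n=1):
    """Logical effort g of an n-input gate, per the catalog above."""
    if gate == "inverter":
        return Fraction(1)
    if gate == "nand":
        return Fraction(n + 2, 3)
    if gate == "nor":
        return Fraction(2 * n + 1, 3)
    if gate == "mux":          # tristate / mux
        return Fraction(2)
    raise ValueError(f"unknown gate type: {gate}")


def parasitic_delay(gate, n=1):
    """Parasitic delay p in multiples of p_inv, per the catalog above."""
    if gate == "inverter":
        return Fraction(1)
    if gate in ("nand", "nor"):
        return Fraction(n)
    if gate == "mux":
        return Fraction(2 * n)
    raise ValueError(f"unknown gate type: {gate}")


def stage_delay(gate, n, h):
    """Normalized stage delay d = g*h + p for electrical effort h."""
    return logical_effort(gate, n) * h + parasitic_delay(gate, n)


print("NAND2 driving h=3:", stage_delay("nand", 2, 3))
print("NOR3  driving h=3:", stage_delay("nor", 3, 3))
```

Exact fractions are used so that values such as 4/3 and 7/3 are not rounded.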
Example: Ring Oscillator
Estimate the frequency of an N-stage ring oscillator.
Logical Effort: g = 1
Electrical Effort: h = 1
Parasitic Delay: p = 1
Stage Delay: d = 2
Frequency: fosc = 1/(2*N*d) = 1/(4N), in units of 1/τ
A 31-stage ring oscillator in a 0.6 µm process has a frequency of ~200 MHz.
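A quick numeric check of the ring-oscillator estimate, as a Python sketch. It assumes τ = 40 ps for the 0.6 µm process (the value quoted on the earlier "Delay in a Logic Gate" slide); the function name is illustrative.

```python
def ring_oscillator_frequency(n_stages, tau_seconds, g=1.0, h=1.0, p=1.0):
    """Oscillation frequency in Hz of an N-stage inverter ring.

    Each stage has normalized delay d = g*h + p, and one period needs the
    edge to travel around the ring twice: T = 2 * N * d * tau.
    """
    d = g * h + p
    period = 2 * n_stages * d * tau_seconds
    return 1.0 / period


f_osc = ring_oscillator_frequency(31, 40e-12)
print(f"31-stage ring, tau = 40 ps: {f_osc / 1e6:.1f} MHz")
```

With these inputs the estimate lands close to the ~200 MHz figure quoted on the slide.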
Example: FO4 Inverter
Estimate the delay of a fanout-of-4 (FO4) inverter.
Logical Effort: g = 1
Electrical Effort: h = 4
Parasitic Delay: p = 1
Stage Delay: d = 5
The FO4 delay is about
200 ps in a 0.6 µm process
60 ps in a 180 nm process
f/3 ns in an f µm process
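The FO4 figures follow directly from d = gh + p = 5 and the τ values quoted earlier (40 ps at 0.6 µm, 12 ps at 180 nm). A minimal Python sketch, with illustrative names:

```python
def fo4_delay_ps(tau_ps, g=1.0, h=4.0, p=1.0):
    """Absolute delay in ps of a fanout-of-4 inverter: (g*h + p) * tau."""
    return (g * h + p) * tau_ps


def fo4_rule_of_thumb_ps(feature_um):
    """The slide's f/3 ns rule of thumb, returned in picoseconds."""
    return feature_um / 3.0 * 1000.0


for name, tau_ps, feature_um in (("0.6 um", 40.0, 0.6), ("180 nm", 12.0, 0.18)):
    print(name, fo4_delay_ps(tau_ps), "ps (model),",
          round(fo4_rule_of_thumb_ps(feature_um)), "ps (f/3 rule)")
```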
Multistage Logic Networks
Logical effort generalizes to multistage networks.
Path Logical Effort: G = ∏ gi
Path Electrical Effort: H = Cout-path / Cin-path
Path Effort: F = ∏ fi = ∏ gi·hi
Can we write F = GH?
Paths that Branch
No! Consider paths that branch:
G = 1
H = 90 / 5 = 18
GH = 18
h1 = (15 + 15) / 5 = 6
h2 = 90 / 15 = 6
F = g1·g2·h1·h2 = 36 = 2GH
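The branching numbers can be reproduced with a few lines of Python. The circuit is inferred from the arithmetic, since the figure is missing from this transcript: an inverter with input capacitance 5 drives two inverters of input capacitance 15, one of which drives a load of 90. The factor b1 anticipates the branching effort introduced next.

```python
def branching_example():
    """Efforts for a two-inverter path whose first stage drives two branches."""
    g1 = g2 = 1.0                 # both stages are inverters
    c_in = 5.0                    # input capacitance of stage 1
    c_on = 15.0                   # stage-2 input capacitance on the path
    c_off = 15.0                  # identical branch off the path
    c_load = 90.0                 # load on stage 2

    G = g1 * g2
    H = c_load / c_in
    h1 = (c_on + c_off) / c_in    # stage 1 sees both branches
    h2 = c_load / c_on
    F = g1 * h1 * g2 * h2
    b1 = (c_on + c_off) / c_on    # fraction of drive wasted off the path
    return {"G": G, "H": H, "GH": G * H, "h1": h1, "h2": h2, "F": F, "b1": b1}


r = branching_example()
print(r)
print("F == b1 * G * H:", r["F"] == r["b1"] * r["GH"])
```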
Branching Effort
Introduce branching effort: b = (Con-path + Coff-path) / Con-path, with path branching effort B = ∏ bi
Accounts for branching between stages in path
Now we compute the path effort F = GBH
Note: ∏ hi = BH
Multistage Delays
Path Effort Delay: DF = Σ fi
Path Parasitic Delay: P = Σ pi
Path Delay: D = Σ di = DF + P
Designing Fast Circuits
Delay is smallest when each stage bears the same effort: f̂ = gi·hi = F^(1/N)
Thus the minimum delay of an N-stage path is D = N·F^(1/N) + P
This is a key result of logical effort:
Find fastest possible delay
Doesn't require calculating gate sizes
Gate Sizes
How wide should the gates be for least delay?
Working backward, apply the capacitance transformation to find the input capacitance of each gate given the load it drives: Cin,i = gi·Cout,i / f̂
Check work by verifying that the input capacitance spec is met.
Example: 3-stage path
Select gate sizes x and y for least delay from A to B.
Logical Effort G = (4/3)*(5/3)*(5/3) = 100/27
Electrical Effort H = 45/8
Branching Effort B = 3 * 2 = 6
Path Effort F = GBH = 125
Best Stage Effort f̂ = F^(1/3) = 5
Parasitic Delay P = 2 + 3 + 2 = 7
Delay D = 3*5 + 7 = 22 = 4.4 FO4
Work backward for sizes:
y = 45 * (5/3) / 5 = 15
x = (15*2) * (5/3) / 5 = 10
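The whole 3-stage calculation can be sketched in Python. The per-stage values below are inferred from the arithmetic on the slides (logical efforts 4/3, 5/3, 5/3; branching 3 and 2; parasitic delays 2, 3, 2; input capacitance 8; load 45), because the circuit figure is not in this transcript. The function name `size_path` is illustrative.

```python
from math import prod


def size_path(gs, bs, ps, c_in, c_load):
    """Minimum-delay sizing of a path by the method of logical effort.

    gs, bs, ps: per-stage logical effort, branching effort, parasitic delay.
    Returns (F, f_hat, D, input capacitance of each stage).
    """
    n = len(gs)
    F = prod(gs) * prod(bs) * (c_load / c_in)   # F = G * B * H
    f_hat = F ** (1.0 / n)                      # best stage effort
    D = n * f_hat + sum(ps)                     # D = N * F^(1/N) + P

    # Work backward: Cin_i = g_i * Cout_i / f_hat, where Cout_i is the
    # external load for the last stage and b_i copies of the next stage's
    # input capacitance otherwise.
    c_ins = [0.0] * n
    for i in reversed(range(n)):
        c_out = c_load if i == n - 1 else bs[i] * c_ins[i + 1]
        c_ins[i] = gs[i] * c_out / f_hat
    return F, f_hat, D, c_ins


F, f_hat, D, c_ins = size_path(
    gs=[4 / 3, 5 / 3, 5 / 3], bs=[3, 2, 1], ps=[2, 3, 2], c_in=8, c_load=45
)
print(f"F = {F:.1f}, f_hat = {f_hat:.2f}, D = {D:.1f}")
print("input caps (stage 1, x, y):", [round(c, 2) for c in c_ins])
```

The first entry of the returned capacitances should come back as the specified input capacitance of 8, which is the "check your work" step from the Gate Sizes slide.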
Best Number of Stages
How many stages should a path use?
Minimizing the number of stages is not always fastest.
Example: drive a 64-bit datapath with a unit inverter
D = N·F^(1/N) + P = N(64)^(1/N) + N
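Tabulating D = N(64)^(1/N) + N for a few stage counts shows why a single stage is far from optimal. This Python sketch assumes every stage is an inverter with p = 1, as in the formula above; the function name is illustrative.

```python
def path_delay(n_stages, path_effort=64.0, p_inv=1.0):
    """Delay of an N-inverter chain: D = N * F^(1/N) + N * p_inv."""
    return n_stages * path_effort ** (1.0 / n_stages) + n_stages * p_inv


for n in range(1, 7):
    print(f"N = {n}: D = {path_delay(n):.2f}")

best_n = min(range(1, 7), key=path_delay)
print("fastest stage count:", best_n)
```

The delay drops sharply from one stage to three and then rises slowly, consistent with the later remark that path delay is only weakly sensitive to the number of stages near the optimum.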
Derivation
Consider adding inverters to the end of a path.
How many give least delay?
Define best stage effort ρ = F^(1/N)
Best Stage Effort
The best stage effort ρ satisfies pinv + ρ(1 − ln ρ) = 0, which has no closed-form solution.
Neglecting parasitics (pinv = 0), we find ρ = 2.718 (e).
For pinv = 1, solve numerically for ρ = 3.59.
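The numerical solution can be sketched with a simple bisection in Python. The equation solved is pinv + ρ(1 − ln ρ) = 0; the bracket [e, 10] and the function name are assumptions that hold for small non-negative pinv, since the left-hand side is non-negative at ρ = e and decreases for ρ > 1.

```python
import math


def best_stage_effort(p_inv, lo=math.e, hi=10.0, tol=1e-9):
    """Solve p_inv + rho * (1 - ln(rho)) = 0 for rho by bisection."""
    def residual(rho):
        return p_inv + rho * (1.0 - math.log(rho))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid      # root lies to the right
        else:
            hi = mid      # root lies at or to the left of mid
    return 0.5 * (lo + hi)


for p in (0.0, 1.0):
    print(f"p_inv = {p}: rho = {best_stage_effort(p):.3f}")
```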
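Review of Definitions

Term                Stage                                  Path
number of stages    1                                      N
logical effort      g                                      G = ∏ gi
electrical effort   h = Cout / Cin                         H = Cout-path / Cin-path
branching effort    b = (Con-path + Coff-path) / Con-path  B = ∏ bi
effort              f = gh                                 F = GBH
effort delay        f                                      DF = Σ fi
parasitic delay     p                                      P = Σ pi
delay               d = f + p                              D = Σ di = DF + P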
Method of Logical Effort
1) Compute path effort: F = GBH
2) Estimate best number of stages: N = log4 F
3) Sketch path with N stages
4) Estimate least delay: D = N·F^(1/N) + P
5) Determine best stage effort: f̂ = F^(1/N)
6) Find gate sizes: Cin,i = gi·Cout,i / f̂
Limits of Logical Effort
Chicken and egg problem: need the path to compute G, but don't know the number of stages without G
Simplistic delay model: neglects input rise time effects
Interconnect: iteration required in designs with wire
Maximum speed only: not minimum area/power for constrained delay
Summary
Logical effort is useful for thinking of delay in circuits:
Numeric logical effort characterizes gates
NANDs are faster than NORs in CMOS
Paths are fastest when effort delays are ~4
Path delay is weakly sensitive to stages, sizes
But using fewer stages doesn't mean faster paths
Delay of path is about log4 F FO4 inverter delays
Inverters and NAND2 best for driving large caps
Provides language for discussing fast circuits, but requires practice to master
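Power and Energy
Power is drawn from a voltage source attached to the VDD pin(s) of a chip.
Instantaneous Power: P(t) = iDD(t)·VDD
Energy: E = ∫[0,T] P(t) dt = ∫[0,T] iDD(t)·VDD dt
Average Power: Pavg = E/T = (1/T) ∫[0,T] iDD(t)·VDD dt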
Dynamic Power
Dynamic power is required to charge and discharge load capacitances when transistors switch.
One cycle involves a rising and falling output.
On rising output, charge Q = C·VDD is required.
On falling output, charge is dumped to GND.
This repeats T·fsw times over an interval of T.
Dynamic Power Cont.
Pdynamic = (1/T) ∫[0,T] iDD(t)·VDD dt = (VDD/T)·(T·fsw·C·VDD) = C·VDD²·fsw
Activity Factor
Suppose the system clock frequency = f
Let fsw = αf, where α = activity factor
If the signal is a clock, α = 1
If the signal switches once per cycle, α = 1/2
Dynamic gates: switch either 0 or 2 times per cycle, α = 1/2
Static gates: depends on design, but typically α = 0.1
Dynamic power: Pdynamic = α·C·VDD²·f
Short Circuit Current
When transistors switch, both nMOS and pMOS networks may be momentarily ON at once.
Leads to a blip of "short circuit" current.
< 10% of dynamic power if rise/fall times are comparable for input and output
Example
200 Mtransistor chip
20M logic transistors, average width: 12 λ
180M memory transistors, average width: 4 λ
1.2 V 100 nm process
Cg = 2 fF/µm
Dynamic Example
Static CMOS logic gates: activity factor = 0.1
Memory arrays: activity factor = 0.05 (many banks!)
Estimate dynamic power consumption per MHz. Neglect wire capacitance and short-circuit current.
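One way to work this estimate is sketched below in Python, using P = α·C·VDD²·f summed over logic and memory. It assumes the average widths are given in λ with λ = 0.05 µm (half the 100 nm feature size), which is not stated explicitly in this transcript, and it counts gate capacitance only. All names are illustrative.

```python
LAMBDA_UM = 0.05        # assumed: lambda = half of the 100 nm feature size
CG_F_PER_UM = 2e-15     # gate capacitance, 2 fF/um (from the example)
VDD = 1.2               # volts (from the example)


def gate_capacitance(n_transistors, avg_width_lambda):
    """Total gate capacitance in farads for a population of transistors."""
    return n_transistors * avg_width_lambda * LAMBDA_UM * CG_F_PER_UM


def dynamic_power_mw_per_mhz():
    """Dynamic power per MHz of clock frequency, in mW."""
    c_logic = gate_capacitance(20e6, 12)     # logic transistors
    c_mem = gate_capacitance(180e6, 4)       # memory transistors
    switched_cap = 0.1 * c_logic + 0.05 * c_mem   # alpha-weighted capacitance
    watts_per_hz = switched_cap * VDD ** 2
    return watts_per_hz * 1e6 * 1e3          # W/Hz -> mW/MHz


print(f"C_logic = {gate_capacitance(20e6, 12) * 1e9:.0f} nF")
print(f"C_mem   = {gate_capacitance(180e6, 4) * 1e9:.0f} nF")
print(f"P_dynamic = {dynamic_power_mw_per_mhz():.2f} mW/MHz")
```

Under these assumptions the logic contributes 24 nF and the memory 72 nF of gate capacitance, and the estimate is a little under 9 mW per MHz.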
Static Power
Static power is consumed even when the chip is quiescent.
Ratioed circuits burn power in the fight between ON transistors.
Leakage draws power from nominally OFF devices.
Ratio Example
The chip contains a 32 word x 48 bit ROM.
Uses pseudo-nMOS decoder and bitline pullups
On average, one wordline and 24 bitlines are high
Find static power drawn by the ROM
β = 75 µA/V², Vtp = -0.4 V
Solution:
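The worked solution did not survive in this transcript. One plausible way to set it up is sketched below in Python, under these assumptions: each pseudo-nMOS pull-up whose output is low conducts its saturation current β(VDD − |Vtp|)²/2; the low outputs are the 31 unselected wordlines plus 24 low bitlines; and VDD = 1.2 V as in the earlier example. The function name is illustrative.

```python
def pseudo_nmos_static_power(beta, vdd, vtp, n_low_outputs):
    """Static power in watts of pseudo-nMOS pull-ups fighting ON pull-downs.

    Assumes each pull-up with a low output carries its saturation current
    I = beta * (VDD - |Vtp|)^2 / 2 drawn from VDD.
    """
    i_pullup = 0.5 * beta * (vdd - abs(vtp)) ** 2
    return n_low_outputs * i_pullup * vdd


# 32 wordlines with one high -> 31 low; 48 bitlines with 24 high -> 24 low
p_static = pseudo_nmos_static_power(75e-6, 1.2, -0.4, 31 + 24)
print(f"P_static = {p_static * 1e3:.2f} mW")
```

Under these assumptions each pull-up draws about 24 µA and the ROM burns roughly 1.6 mW of static power.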
Leakage Example
The process has two threshold voltages and two oxide thicknesses.
Subthreshold leakage: 20 nA/µm for low Vt, 0.02 nA/µm for high Vt
Gate leakage: 3 nA/µm for thin oxide, 0.002 nA/µm for thick oxide
Memories use low-leakage transistors everywhere.
Gates use low-leakage transistors on 80% of logic.
Leakage Example Cont.
Estimate static power:
High leakage:
Low leakage:
If no low leakage devices, Pstatic = 749 mW (!)
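The leakage estimate can be sketched in Python. The intermediate figures are missing from this transcript, so the setup rests on assumptions: widths are in λ with λ = 0.05 µm, VDD = 1.2 V, high-leakage devices are low-Vt with thin oxide, low-leakage devices are high-Vt with thick oxide, and on average half the transistors are OFF and contribute subthreshold leakage. With those assumptions the all-high-leakage case reproduces the 749 mW figure quoted above.

```python
LAMBDA_UM = 0.05      # assumed: lambda = half of the 100 nm feature size
VDD = 1.2             # volts, from the earlier example
OFF_FRACTION = 0.5    # assumed fraction of devices leaking subthreshold current


def total_width_um(n_transistors, avg_width_lambda):
    """Total transistor width in micrometres."""
    return n_transistors * avg_width_lambda * LAMBDA_UM


def leakage_current(width_um, sub_na_per_um, gate_na_per_um):
    """Leakage current in amperes for a given total width."""
    return width_um * (OFF_FRACTION * sub_na_per_um + gate_na_per_um) * 1e-9


logic_w = total_width_um(20e6, 12)
mem_w = total_width_um(180e6, 4)
high_w = 0.2 * logic_w             # 20% of logic uses high-leakage devices
low_w = 0.8 * logic_w + mem_w      # everything else is low leakage

i_mixed = leakage_current(high_w, 20.0, 3.0) + leakage_current(low_w, 0.02, 0.002)
p_mixed = i_mixed * VDD
p_all_high = leakage_current(logic_w + mem_w, 20.0, 3.0) * VDD

print(f"high-leakage width: {high_w / 1e6:.1f}e6 um, low-leakage width: {low_w / 1e6:.1f}e6 um")
print(f"P_static with low-leakage devices: {p_mixed * 1e3:.0f} mW")
print(f"P_static with no low-leakage devices: {p_all_high * 1e3:.0f} mW")
```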
Low Power Design
Reduce dynamic power
α: clock gating, sleep mode
C: small transistors (esp. on clock), short wires
VDD: lowest suitable voltage
f: lowest suitable frequency
Reduce static power
Selectively use ratioed circuits
Selectively use low Vt devices
Leakage reduction: stacked devices, body bias, low temperature