driver sizing - eda.ee.ucla.edu
ECE902 VLSI Interconnects
Fall 1999, Prof. He 1
Device Layout Optimization*
■ Driver sizing
■ Transistor/gate sizing
■ Transistor ordering
■ Buffer Insertion
* Some of the slides were provided by Prof. Sapatnekar of the U. of Minnesota.
Driver Sizing
■ Given: A chain of cascaded drivers driving a load
  • Negligible interconnect between drivers
■ Problem: Optimize the driver sizes to minimize delay, or minimize total area while meeting target delay
■ References: [Lin-Linholm, JSSC’75] [Veendrick, JSSC’84] [Hedenstierna-Jeppson, TCAD’87] [Zhou-Liu, JHSES’96]
An Early Work on Driver Sizing [Lin-Linholm, JSSC’75]
■ Constant stage ratio: $d_{i+1}/d_i = (C_L/C_g)^{1/k}$
■ If the number of drivers is not fixed: $d_{i+1}/d_i = e$
■ Interconnect is modeled as a lumped capacitor
[Figure: chain of drivers d1, d2, ..., dk driving the load CL]
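The constant-stage-ratio rule, d(i+1)/d(i) = (CL/Cg)^(1/k), can be evaluated directly. The sketch below is illustrative only: the function names are made up here, and the capacitance values (20 pF load, 10 fF gate) are borrowed from the Rabaey example later in these slides.

```python
import math

def constant_stage_ratio(c_load, c_gate, k):
    """Stage ratio d_{i+1}/d_i for a chain of k drivers: (C_L/C_g)^(1/k)."""
    return (c_load / c_gate) ** (1.0 / k)

def stages_for_ratio_e(c_load, c_gate):
    """If k is free and the ratio is e, then e^k = C_L/C_g, so k = ln(C_L/C_g)."""
    return math.log(c_load / c_gate)

# Assumed values: 20 pF load, 10 fF input gate capacitance, 7 drivers
ratio = constant_stage_ratio(20e-12, 10e-15, 7)
k_free = stages_for_ratio_e(20e-12, 10e-15)
```

With these numbers the ratio comes out near 2.96 for seven drivers, and the unconstrained stage count is about 7.6 before rounding to an integer.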
Further Studies on Driver Sizing
■ Hedenstierna-Jeppson, TCAD’87
■ Assumptions:
  • Load cap of D_i = output cap of D_i + input gate cap of D_{i+1}
  • Considering ramp input
■ Results:
  • Opt. P/N transistor ratio: $W_{pi}/W_{ni} = \sqrt{\mu_n/\mu_p}$, where $\mu_n$, $\mu_p$ = electron and hole mobilities
  • For unlimited number of drivers, opt. stage ratio is given by
    $f = e^{(\alpha + f)/f}$, where $\alpha$ = (output cap. of D_i) / (input gate cap. of D_i)
    $\Rightarrow N = \ln(C_L/C_0)/\ln f$ & $t_{total} = N \cdot t_0 \cdot (\alpha + f)$
  • For N drivers
    − first (N-1) stages: constant f
    − last stage: $f_N = (1 + b/a) f$, with $f^N \cdot (1 + b/a) = C_L/C_0$
    − typical values: b/a = 0.75 & α = 1
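The optimal stage ratio for an unlimited number of drivers is defined only implicitly, by f = exp((α + f)/f). A minimal sketch of solving it numerically follows; it assumes bisection on the equivalent form f·ln f − f − α = 0, and the function name and bracketing interval are choices made here, not taken from the paper.

```python
import math

def optimal_stage_ratio(alpha, lo=1.5, hi=20.0, tol=1e-10):
    """Solve f = exp((alpha + f) / f) by bisection on g(f) = f*ln(f) - f - alpha."""
    def g(f):
        return f * math.log(f) - f - alpha
    # g'(f) = ln(f) > 0 for f > 1, so g is increasing on the bracket [lo, hi]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

f_no_self_load = optimal_stage_ratio(0.0)  # alpha = 0 reduces to f = e
f_typical = optimal_stage_ratio(1.0)       # alpha = 1, the typical value quoted
```

For α = 0 this recovers the classical ratio e, and for α = 1 it gives a ratio of roughly 3.6, so self-loading pushes the optimum toward fewer, larger steps.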
Driver Sizing with Power Minimization
■ Problem Formulation: min total power (= min total driver areas) s.t. $t_{total} \le t_B$
■ Example [Rabaey, 1996]:
  • An on-chip min-size inverter under 1.2um CMOS, with C0=10fF, t0=0.2ns, drives an off-chip load CL=20pF, tB=10ns, CL/C0=2000
  • Delay optimal driver sizing: $N = 7$ (from $\ln(C_L/C_0) = 7.6$), $f = 2.96$, $t_{total} = 4.1$ ns
  • Power optimal driver sizing: find min N s.t. $N \cdot (C_L/C_0)^{1/N} \cdot t_0 \le t_B$, i.e., $N \cdot 2000^{1/N} \le 50$
    $\Rightarrow N = 3$, $f = 12.6$ & $t_{total} = 7.56$ ns
  • $W_p/W_n = \sqrt{\mu_n/\mu_p} = 1.58$
Power-Optimal Driver Sizing Example (Cont’d)
■ Min. delay driver sizing solution
Transistor sizes for optimally-sized cascaded drivers
  Stage      1     2     3      4      5       6       7
  Wn (um)    1.8   5.3   15.8   47.7   138.2   409     1211
  Wp (um)    2.8   8.4   24.9   73.8   218.3   646.2   1913
■ Min. area (power) driver sizing solution
Transistor sizes of redesigned cascaded buffer
  Stage      1     2      3
  Wn (um)    1.8   22.7   286
  Wp (um)    2.8   35.9   445
20X reduction in total device area!
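The numbers in this example can be reproduced with a few lines. This is a sketch under one stated assumption: the chain delay is taken as t_total = N · f · t0 with f = (CL/C0)^(1/N), which is what the quoted figures imply (7 × 2.96 × 0.2 ns ≈ 4.1 ns). The function names are illustrative.

```python
def driver_chain(c_ratio, t0, n):
    """Stage ratio and total delay for n equal-ratio stages (t_total = n * f * t0)."""
    f = c_ratio ** (1.0 / n)
    return f, n * f * t0

def min_stages_meeting(c_ratio, t0, t_bound, n_max=20):
    """Smallest n whose chain delay meets t_bound, or None if none does."""
    for n in range(1, n_max + 1):
        f, t = driver_chain(c_ratio, t0, n)
        if t <= t_bound:
            return n, f, t
    return None

C_RATIO, T0_NS, TB_NS = 2000.0, 0.2, 10.0   # CL/C0, t0 and tB from the example

f_delay, t_delay = driver_chain(C_RATIO, T0_NS, 7)                       # delay-optimal
n_power, f_power, t_power = min_stages_meeting(C_RATIO, T0_NS, TB_NS)    # power-optimal
```

Fewer stages means a much smaller final driver, which is where the area and power saving comes from; the price is a delay closer to the bound tB.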
Area/Delay Trade-Off in Driver Sizing
• Area is more sensitive to f than delay is [Hedenstierna-Jeppson, TCAD’87]
• Low-power driver sizing with increasing stage ratio [Zhou-Liu, JHSES’96]
  [Equation: stage ratios $f_i$ that increase along the chain, written in terms of $f_0$, $\gamma$ and $C_L/C_0$; $\gamma = 0.2$ for 0.5um CMOS]
Transistor/Gate Sizing Optimization
■ Given: Logic network with or without cell library
■ Find: Optimal size for each transistor/gate to minimize area or power, both under delay constraint
  • Static sizing: based on timing analysis; considers all paths at once [Fishburn-Dunlop, ICCAD’85] [Sapatnekar et al., TCAD’93] [Berkelaar-Jess, EDAC’90] [Chen-Onodera-Tamaru, ICCAD’95]
  • Dynamic sizing: based on timing simulation; considers paths activated by given patterns [Conn et al., ICCAD’96]
■ Transistor sizing versus gate sizing
The Transistor Sizing Problem
Problem statement
minimize Area(x)
subject to Delay(x) ≤ Tspec
or
minimize Power(x)
subject to Delay(x) ≤ Tspec
[Figure: combinational logic block]
Mathematical Background
■ n-dimensional space
  • Any ordered n-tuple x = (x1, x2, ... , xn) can be thought of as a point in an n-dimensional space
  • f(x1, x2, ..., xn) is a function on the n-dimensional space
■ Convex functions
  f(x) is a convex function if, given any two points xa and xb, the line segment joining (xa, f(xa)) and (xb, f(xb)) lies on or above the function
[Figure: a convex f(x) and a nonconvex f(x), each plotted with the chord between xa and xb]
Math Background (Contd.)
■ Convex functions in two dimensions
  $f(x_1, x_2) = x_1^2 + x_2^2$
Formally, f(x) is convex if
  $f(\alpha x_a + [1 - \alpha] x_b) \le \alpha f(x_a) + [1 - \alpha] f(x_b)$ for $0 \le \alpha \le 1$
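The formal definition can be spot-checked numerically. The sketch below samples random chords and looks for a violation of the inequality; it is a sanity check, not a proof, and the helper name, sampling range and seed are arbitrary choices.

```python
import random

def f(x1, x2):
    """The two-dimensional convex example: f(x1, x2) = x1^2 + x2^2."""
    return x1 ** 2 + x2 ** 2

def violates_convexity(func, trials=10000, seed=0):
    """Return True if some sampled chord lies below the function."""
    rng = random.Random(seed)
    for _ in range(trials):
        xa = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        xb = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        a = rng.random()
        point = (a * xa[0] + (1 - a) * xb[0], a * xa[1] + (1 - a) * xb[1])
        chord = a * func(*xa) + (1 - a) * func(*xb)
        if func(*point) > chord + 1e-9:   # small tolerance for rounding
            return True
    return False

convex_ok = not violates_convexity(f)
nonconvex_found = violates_convexity(lambda x1, x2: -(x1 ** 2) - x2 ** 2)
```

The paraboloid passes every sampled chord, while its negation (a concave function) fails almost immediately.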
Math Background (Contd.)
■ Convex sets
  A set S is a convex set if, given any two points xa and xb in the set, the line joining the two points lies entirely within the set
■ Examples: shape of Wyoming; shape of a pizza
■ Nonconvex sets: shape of CA; silhouette of the Taj Mahal
Math Background (Contd.)
■ Mathematical characterization of a convex set S
  • If x1, x2 ∈ S, then α x1 + (1 - α) x2 ∈ S, for 0 ≤ α ≤ 1
■ If f(x) is a convex function, the set {x : f(x) ≤ c} is a convex set
■ An intersection of convex sets is a convex set
[Figure: segment between x1 and x2 inside a convex set]
Math Background (Contd.)
■ Convex programming problem
minimize convex function f(x)
such that x ∈ ∩ [fi(x) ≤ ci]
■ Global minimum value is unique!
  (Nonrigorous) explanation (from “The Handwaver’s Guide to the Galaxy”)
[Figure: plot of f(x) with points xa and xb]
Math Background (Contd. in English)
■ A posynomial is like a polynomial except
  • all coefficients are positive
  • exponents could be real numbers (positive or negative)
■ Are these posynomials?
  $6.023\, x_1^{1.23} + 4.56\, x_1^{3.4} x_2^{7.89} x_3^{-0.12}$    YES
  $x_1 - 9.78\, x_2^{4.2} x_3^{-9.1}$    NO
  $(x_1 + 2 x_2 + 2 x_3 + 5)/x_1 + (x_3 + 2 x_4 + 3)/x_3$    YES
Math Background (Contd.)
■ In any posynomial function f(x1, x2, ... , xn), substitute xi = exp(zi) to get F(z1, z2, ... , zn)
■ Then F(z1, z2, ... , zn) is a convex function in (z1, ... , zn)!
  minimize (posynomial objective in xi’s)
  s.t. (posynomial function in xi’s)_i ≤ K for 1 ≤ i ≤ m
  [xi = exp(zi)] ⇒
  minimize (convex objective) over a convex set
Therefore, any local minimum is a global minimum!
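The posynomial test and the exp substitution can be tried on the three quiz expressions. In this sketch a posynomial is represented as a list of (coefficient, exponent-tuple) terms over (x1, x2, x3, x4); that representation, the helper names and the two sample points are assumptions made for illustration. The third expression is expanded term by term first.

```python
import math

def is_posynomial(terms):
    """terms: list of (coefficient, exponents). Posynomial iff every coefficient > 0."""
    return all(coef > 0 for coef, _ in terms)

def evaluate(terms, x):
    """Evaluate sum_k c_k * prod_i x_i ** a_ki at a positive point x."""
    return sum(c * math.prod(xi ** a for xi, a in zip(x, exps)) for c, exps in terms)

def F(terms, z):
    """Transformed function F(z) = f(exp(z1), ..., exp(zn))."""
    return evaluate(terms, [math.exp(zi) for zi in z])

p1 = [(6.023, (1.23, 0, 0, 0)), (4.56, (3.4, 7.89, -0.12, 0))]
p2 = [(1.0, (1, 0, 0, 0)), (-9.78, (0, 4.2, -9.1, 0))]
# (x1 + 2x2 + 2x3 + 5)/x1 + (x3 + 2x4 + 3)/x3, expanded into monomials
p3 = [(1, (0, 0, 0, 0)), (2, (-1, 1, 0, 0)), (2, (-1, 0, 1, 0)), (5, (-1, 0, 0, 0)),
      (1, (0, 0, 0, 0)), (2, (0, 0, -1, 1)), (3, (0, 0, -1, 0))]
answers = [is_posynomial(p) for p in (p1, p2, p3)]

# Midpoint convexity of F for p3 at two arbitrary points in z-space
za, zb = (0.3, -1.0, 0.5, 2.0), (-0.7, 0.4, 1.5, -1.0)
zm = [(a + b) / 2 for a, b in zip(za, zb)]
midpoint_ok = F(p3, zm) <= 0.5 * F(p3, za) + 0.5 * F(p3, zb)
```

After the substitution every term becomes the exponential of a linear function of z, which is why the sum is convex.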
Properties of Tr. Sizing under the Elmore Model
■ x is the set (vector) of transistor sizes
minimize Area(x) subject to Delay(x) ≤ Tspec
■ Area(x) = Σ_{i=1}^{n} x_i (posynomial!)
■ Each path delay = Σ R C
  • R ∝ x_i^{-1}, C ∝ x_i ⇒ posynomial path delay function
  • Delay(x) ≤ Tspec ≡ Pathdelay(x) ≤ Tspec for all paths
■ Therefore, problem has a unique global min. value
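To see why the path delay is a posynomial, consider a toy chain of stages in which stage i has resistance proportional to 1/x_i and drives its own output capacitance plus the next stage's input capacitance, both proportional to size. The model below is an illustration with made-up unit constants (`r_unit`, `c_unit`, `c_load`), not a calibrated Elmore model.

```python
def area(x):
    """Area(x) = sum of transistor sizes."""
    return sum(x)

def chain_delay(x, r_unit=1.0, c_unit=1.0, c_load=20.0):
    """Sum of R*C over stages: R_i = r_unit / x[i]; the capacitance seen is the
    stage's own output cap (c_unit * x[i]) plus the next input (or the load)."""
    total = 0.0
    for i, xi in enumerate(x):
        next_cap = c_unit * x[i + 1] if i + 1 < len(x) else c_load
        total += (r_unit / xi) * (c_unit * xi + next_cap)
    return total

d_min_size = chain_delay([1.0, 1.0, 1.0])
d_tapered = chain_delay([1.0, 2.0, 4.0])
```

Each term expands to a constant plus a ratio such as x_{i+1}/x_i, all with positive coefficients, so the delay is a posynomial in the sizes.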
TILOS™ (TImed LOgic Synthesis)
■ Philosophy
  • Since min. value is unique, a simple method should find it!
■ Problem
  minimize Area(x) subject to Delay(x) ≤ Tspec
■ Strategy
  • Set all transistors in the circuit to minimum size
  • Find the critical path (largest delay path)
  • Reduce delay of critical path, but with a minimal increase in the objective function value
(TILOS™ is a registered trademark of Lucent Technologies)
TILOS (Contd.)
minimize Area(x) subject to Delay(x) ≤ Tspec
■ Find ∂D/∂A for all transistors on critical path
■ Bump up the size of the transistor with the largest delay reduction per unit area, |∂D/∂A|
  x_i → M x_i + a (default: M = 1; a = 1 contact head width)
[Figure: circuit with the critical path from IN to OUT highlighted]
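The greedy loop can be sketched on a toy single-path circuit. This is not the TILOS implementation: the delay model is a made-up chain with unit constants, the sensitivity is estimated by a trial bump rather than an analytic ∂D/∂A, and `tilos_like` is a hypothetical name.

```python
def chain_delay(x, c_load=20.0):
    """Toy delay: stage i has R = 1/x[i], self cap x[i], and drives the next stage."""
    total = 0.0
    for i, xi in enumerate(x):
        next_cap = x[i + 1] if i + 1 < len(x) else c_load
        total += (xi + next_cap) / xi
    return total

def tilos_like(n, t_spec, step=1.0, max_iter=10000):
    """Start at minimum size; repeatedly bump the transistor that buys the most
    delay reduction per unit of added area until the spec is met."""
    x = [1.0] * n
    for _ in range(max_iter):
        d = chain_delay(x)
        if d <= t_spec:
            return x
        best_i, best_gain = None, 0.0
        for i in range(n):
            trial = x[:]
            trial[i] += step                        # x_i -> M*x_i + a with M = 1, a = step
            gain = (d - chain_delay(trial)) / step  # delay saved per unit area
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:                          # no single bump helps
            return None
        x[best_i] += step
    return None

sizes = tilos_like(3, t_spec=12.5)
```

On this example the loop first grows the last stage, which sees the large load, and only later touches the stage before it.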
Sensitivity Computation
■ D(w) = K + Rprev · (Cu · w) + Ru · C / w
■ ∂D/∂w = Rprev · Cu - Ru · C / w^2
■ Could minimize path delay by setting derivative to zero
■ Problem: may cause another path delay to become very high!
[Figure: transistor of width w (unit-width device labeled “1”), driven through Rprev and driving capacitance C]
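Setting the derivative to zero gives the delay-minimizing width for this one path, w* = sqrt(Ru · C / (Rprev · Cu)). The sketch below codes the two expressions and that closed form; the numeric values are arbitrary and chosen only so the result is easy to check by hand.

```python
import math

def path_delay(w, k, r_prev, c_u, r_u, c):
    """D(w) = K + R_prev * (C_u * w) + R_u * C / w."""
    return k + r_prev * c_u * w + r_u * c / w

def sensitivity(w, r_prev, c_u, r_u, c):
    """dD/dw = R_prev * C_u - R_u * C / w**2."""
    return r_prev * c_u - r_u * c / w ** 2

def delay_minimizing_width(r_prev, c_u, r_u, c):
    """Width where dD/dw = 0."""
    return math.sqrt(r_u * c / (r_prev * c_u))

# Arbitrary illustrative values
w_star = delay_minimizing_width(r_prev=2.0, c_u=1.0, r_u=4.0, c=8.0)
```

Growing w speeds up this stage but loads the previous one through Rprev · Cu · w, which is the same mechanism that can push another path's delay up.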
Why Isn’t This THE Perfect Solution?
■ Problems with interacting paths
  (1) Better to size A than to size all of B, C and D
  (2) If X-E is near-critical and A-D is critical, size A (not D)
■ False paths, layout considerations not incorporated
■ AND YET...
  • TILOS (the commercial tool) gives good solutions
  • It has handled circuits with 250K transistors
  • It has linear time performance with increasing circuit size
[Figure: example circuit with gates A, B, C, D, X and E]
CONTRAST
■ Solves the convex optimization problem exactly
■ Uses an interior point method that is guaranteed to find the optimal solution
■ Can handle circuits with about a thousand transistors
[Figure: region where the delay spec. is satisfied, with the optimal solution marked]
(Convex) Polytopes
■ Polytope = n-dimensional convex polygon
  • Half-space: a^T x ≥ b (a^T x = b is a hyperplane)
    e.g. a1 x1 + a2 x2 ≥ b (in two dimensions)
  • Polytope = intersection of half-spaces, i.e.,
    a_1^T x ≥ b_1 AND a_2^T x ≥ b_2 AND ... AND a_m^T x ≥ b_m
    Represented as A x ≥ b
Convex Optimization Algorithm (Vaidya)
(1) Enclose solution within a polytope (invariant)
  • Typically, take a “box” represented by wi ≤ wMAX and wi ≥ wMIN as the starting polytope.
(2) Find center of polytope, wc
(3) Does wc satisfy constraints (timing specs)?
  • Take transistor widths corresponding to wc and perform a static timing analysis
(4) Add a hyperplane through the center so that the solution lies entirely in one half-space
  • Hyperplane equation depends on feasibility of wc
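Equation of the New Half-Space
Half-space: ∇f(wc) · w ≤ ∇f(wc) · wc
  (f convex ⇒ f(w) ≥ f(wc) + ∇f(wc) · (w - wc), so every w with ∇f(wc) · (w - wc) > 0 can be discarded)
■ If wc is feasible, then f = objective function
  Find gradient of area function
■ If wc is infeasible, then f = violated constraint
  Find gradient of critical path delay
Illustrative Example
[Figure: polytope in the (w1, w2) plane cut through its center wc by successive hyperplanes; contours f(w) = c with f decreasing toward the solution S]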
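A cut through the center can be stored in the same A x ≥ b form as the rest of the polytope. The sketch assumes the retained side is the one where the linearized f does not increase, i.e. ∇f(wc) · w ≤ ∇f(wc) · wc, rewritten with a = -∇f(wc). The helper names and the sample center are illustrative.

```python
def new_half_space(grad_at_center, center):
    """Return (a, b) such that the kept half-space is a . w >= b.
    This is grad . w <= grad . center rewritten in A x >= b form."""
    a = [-g for g in grad_at_center]
    b = sum(ai * ci for ai, ci in zip(a, center))
    return a, b

def keeps(a, b, w):
    """True if w lies in the half-space a . w >= b."""
    return sum(ai * wi for ai, wi in zip(a, w)) >= b

# Feasible center: cut with the gradient of Area(w) = w1 + w2, which is (1, 1)
a, b = new_half_space([1.0, 1.0], center=[3.0, 2.0])
```

For an infeasible center the same helper would be called with the gradient of the violated delay constraint instead of the area gradient.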
Calculating the Polytope Center
■ Finding exact centroid is computationally expensive
■ Estimate center by minimizing log-barrier function
  $F(x) = -\sum_{i=1}^{m} \log(a_i^T x - b_i)$
  Happy “coincidence”: F(x) is a convex function!
■ Physical meaning: maximize product of perpendicular distances to each hyperplane that defines the polytope
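The log-barrier center can be approximated with a few lines of plain Python. The sketch below uses gradient descent with a backtracking (Armijo) line search, which is a simple stand-in chosen here for brevity; production solvers typically use Newton steps. The polytope is the starting "box" with assumed bounds 1 ≤ w1 ≤ 5 and 1 ≤ w2 ≤ 3.

```python
import math

def slacks(A, b, x):
    """a_i . x - b_i for every half-space of A x >= b."""
    return [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]

def barrier(A, b, x):
    """F(x) = -sum_i log(a_i . x - b_i); infinite outside the polytope."""
    s = slacks(A, b, x)
    return math.inf if min(s) <= 0 else -sum(math.log(si) for si in s)

def barrier_gradient(A, b, x):
    s = slacks(A, b, x)
    return [-sum(row[j] / si for row, si in zip(A, s)) for j in range(len(x))]

def analytic_center(A, b, x0, iters=500):
    """Minimize F from a strictly interior start x0."""
    x = list(x0)
    for _ in range(iters):
        g = barrier_gradient(A, b, x)
        gg = sum(gi * gi for gi in g)
        fx, step = barrier(A, b, x), 1.0
        while step > 1e-12:
            trial = [xi - step * gi for xi, gi in zip(x, g)]
            if barrier(A, b, trial) <= fx - 0.25 * step * gg:  # sufficient decrease
                x = trial
                break
            step *= 0.5
        else:
            break    # no acceptable step left: x is numerically the center
    return x

# Box 1 <= w1 <= 5, 1 <= w2 <= 3 written as A x >= b
A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
b = [1, -5, 1, -3]
center = analytic_center(A, b, x0=[1.5, 1.2])
```

For a box the barrier minimizer is the geometric midpoint, which makes this a convenient check before adding cutting planes.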
Linear Programming Methods
■ LP-based approaches
  • Model gate delay as a piecewise linear function
    Parameters:
    − transistor widths wn, wp
    − fanout transistor widths
    − input transition time
  • Formulate problem as a linear program (LP)
  • Use an efficient simplex package to solve LP
[Figure: piecewise linear approximation of delay versus wn]
Power-Delay Sizing
minimize Power(w)
subject to Delay(w) ≤ Tspec
Area ≤ Aspec
Each gate size ≥ Minsize
Power = dynamic power + short-circuit power
Dynamic Power
■ Dynamic Power
  • Power required to charge/discharge capacitances
    $P_{dynamic} = C_L V_{dd}^2 f p_T$
    CL = load capacitance, f = clock frequency, pT = transition probability
  • Posynomial function in w’s (if pT constant)
  • Constitutes dominant part of power in a well-designed circuit
  • Minimize dynamic power ≡ minimize CL ≡ minimize all transistor sizes! RIGHT? (Unfortunately not!)
Short-Circuit Power
■ Short-circuit Power
  • Power dissipated when a direct Vdd-ground path exists
  • Approximate formula by Veendrick (many assumptions)
    $P_{short-ckt} = \frac{\beta}{12} (V_{dd} - 2 V_T)^3 \, \tau f p_T$
    β = transconductance, τ = transition time
  • Posynomial function in w’s (if pT const)
  • Other (more accurate) models: table lookup, curve-fitting
  • “Less than 10-20% of total power in a well-designed circuit”
  • So what’s the catch?
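Both power components are simple products, so a worked number helps calibrate them. The sketch below codes the dynamic-power formula and Veendrick's estimate with its cubic dependence on (Vdd - 2VT). The operating point is hypothetical, chosen only to exercise the formulas, with all quantities in SI units.

```python
def dynamic_power(c_load, vdd, freq, p_t):
    """P_dynamic = C_L * Vdd^2 * f * p_T (watts)."""
    return c_load * vdd ** 2 * freq * p_t

def short_circuit_power(beta, vdd, v_t, tau, freq, p_t):
    """Veendrick-style estimate: (beta / 12) * (Vdd - 2*V_T)^3 * tau * f * p_T."""
    return beta / 12.0 * (vdd - 2.0 * v_t) ** 3 * tau * freq * p_t

# Hypothetical operating point: 100 fF load, 5 V supply, 50 MHz, p_T = 0.5,
# beta = 100 uA/V^2, V_T = 1 V, 1 ns transition time
p_dyn = dynamic_power(c_load=100e-15, vdd=5.0, freq=50e6, p_t=0.5)
p_sc = short_circuit_power(beta=100e-6, vdd=5.0, v_t=1.0, tau=1e-9, freq=50e6, p_t=0.5)
share = p_sc / (p_dyn + p_sc)
```

At this operating point the short-circuit share is under 10% of the total, but it scales linearly with τ, so a slow input edge raises it quickly.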
The Catch
■ Delay of gate A is large
  • Therefore, the value of τ for B, C, ... , H is large
  • Therefore short-circuit power for B, C, ... , H is large
  • Can be reduced by reducing the delay of A
  • In other words, size A!
■ Tradeoff dynamic and short-circuit power!
■ Minpower ≠ minsize
[Figure: example circuit with gates A, B, C, D, X, E, F, G and H]
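A one-variable toy model makes the tradeoff concrete. It assumes that the dynamic power attributable to A grows linearly with its size, and that the fan-out gates' short-circuit power falls as 1/size because their input transition time τ tracks A's delay. The constants `k_dyn` and `k_sc` are arbitrary.

```python
import math

def total_power(w_a, k_dyn=1.0, k_sc=9.0):
    """Toy model: dynamic term k_dyn * w_a plus short-circuit term k_sc / w_a."""
    return k_dyn * w_a + k_sc / w_a

def min_power_size(k_dyn=1.0, k_sc=9.0):
    """Minimizer of k_dyn * w + k_sc / w, from setting the derivative to zero."""
    return math.sqrt(k_sc / k_dyn)

w_best = min_power_size()
p_at_min_size = total_power(1.0)
p_at_best = total_power(w_best)
```

With these constants the minimum-power size is three times the minimum size, which is the sense in which minpower ≠ minsize.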
Solution Technique
(1) Begin: calculate pT’s for minsized gates
(2) Solve gate sizing problem for current pT
(3) Calculate pT’s for new sizes; error = ||old pT - new pT||
(4) error < ε? Yes: end. No: go back to (2)
Problem: inaccuracies in short-ckt. power model
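The flowchart is a fixed-point iteration between sizing and transition-probability estimation. The skeleton below shows only the control flow. `toy_probs` and `toy_sizing` are hypothetical stand-ins for the real pT estimator and sizing engine, and the max-norm is one possible choice for the error measure.

```python
def iterate_sizing(initial_sizes, transition_probs, solve_sizing, eps=1e-6, max_iter=100):
    """Alternate between sizing and p_T estimation until p_T stops changing."""
    sizes = list(initial_sizes)
    p_t = transition_probs(sizes)            # p_T's for min-sized gates
    for _ in range(max_iter):
        sizes = solve_sizing(p_t)            # size for the current p_T
        new_p_t = transition_probs(sizes)    # p_T's for the new sizes
        error = max(abs(o - n) for o, n in zip(p_t, new_p_t))
        p_t = new_p_t
        if error < eps:
            break
    return sizes, p_t

# Hypothetical stand-ins, chosen so the iteration contracts
def toy_probs(sizes):
    return [0.5 / (1.0 + 0.1 * s) for s in sizes]

def toy_sizing(p_t):
    return [1.0 + 2.0 * p for p in p_t]

sizes, p_t = iterate_sizing([1.0, 1.0], toy_probs, toy_sizing)
```

Convergence of the real loop depends on how strongly pT reacts to size changes; the toy functions here were picked so that it converges in a handful of passes.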
Transistor/Gate Sizing [Borah-Owens-Irwin, ISLPD’95, TCAD’96]
[Equations: gate power P written in terms of Vdd, f, CL, W, µ and τ, with coefficients α, σ and γ]
Optimal transistor size
[Equations: closed-form power-optimal widths Wn* and Wp*]
CI = int. cap
Power Optimal Sizes and Corresponding Power Savings
Power-Delay Optimization
Power, Delay and Power-Delay Curves
Power-Delay Optimal Transistor Sizing Algorithm
■ Power-optimal initial sizing
■ Timing analysis
■ While there exists a path with path-delay > target-delay
  • Power-delay optimal sizing of the critical path
  • if path-delay > target-delay
    − upsize transistor with minimum power-delay slope
  • if path-delay < target-delay
    − downsize transistor with minimum power-delay slope
  • Incremental timing analysis
Effect of Transistor Sizing
Transistor Ordering
■ Problem: Find the best ordering of transistors in each gate, s.t. delay and/or power is minimized
■ Comment: No (or little) penalty on circuit area !
■ Example [Prasad-Roy, IWLPD’94]:
How to Determine the Best Transistor Order
■ No easy answer!
■ Need to evaluate using SPICE or switch-level simulation
■ Example [Carlson-Chen, DAC’93]:
  • CL = 0.2pF, all transistors W/L = 7
  • Rise time = 5ns: d1/d2 = 1.23
  • Rise time = 1ns: d1/d2 = 0.92
Determine the Best Transistor Order at Each Gate
■ Exhaustive Search:
  • Enumerate all possible permutations [Prasad-Roy, IWLPD’94]
  • Use SP-BDD to enumerate all possible orderings for serial-parallel circuits [Glebov-Blaauw-Jones, ISLPD’95]
■ Heuristic Search
  • Try top-critical (slowest input closest to output node) and bottom-critical (slowest input closest to power node) => Choose the best [Carlson-Chen, DAC’93]
  • Pre-characterize each cell in a fixed library => Connect the slowest input to pins with the smallest delay [Prasad-Roy, IWLPD’94]
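For a single gate, exhaustive search is just a loop over permutations with a delay evaluator plugged in. In the sketch below the evaluator is a toy stand-in for SPICE or a pre-characterized library: each pin position has a fixed pin-to-output delay, and the gate output time is the latest of (input arrival + pin delay). The arrival times and pin delays are hypothetical.

```python
from itertools import permutations

def output_time(order, arrival, pin_delay):
    """Latest (arrival + pin delay) over the inputs; order[p] is the input at position p."""
    return max(arrival[name] + pin_delay[p] for p, name in enumerate(order))

def best_order(arrival, pin_delay):
    """Enumerate every assignment of inputs to positions and keep the fastest."""
    return min(permutations(arrival), key=lambda o: output_time(o, arrival, pin_delay))

arrival = {"a": 1.0, "b": 3.0, "c": 2.0}   # hypothetical input arrival times (ns)
pin_delay = [0.5, 0.8, 1.2]                # hypothetical delays; position 0 is the fastest pin
order = best_order(arrival, pin_delay)
t_out = output_time(order, arrival, pin_delay)
```

The search puts the latest-arriving input on the fastest pin, matching the library heuristic, though enumeration grows factorially with the number of inputs.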
Optimal Transistor Ordering for Entire Circuit
■ Iterative approach, local optimal ordering at each gate
■ Example [Prasad-Roy, IWLPD’94]:
  • Phase 1: Delay minimization
    − Forward traversal to compute delay to each gate
    − Backward traversal to compute slack at each gate
      When encountering a gate with negative slack: optimal transistor ordering for this gate, then forward traversal to update delay & slack
  • Phase 2: Power minimization (similar to phase 1)
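The two traversals of phase 1 can be sketched on a small gate-level DAG. The code assumes fixed per-gate delays and a single delay target at the outputs. The netlist, delays and function name are made up for illustration, and the reordering step itself is not modeled.

```python
def arrival_and_slack(gates, fanin, delay, t_target):
    """gates must be in topological order; fanin[g] lists the gates feeding g."""
    arrival = {}
    for g in gates:                                   # forward traversal
        arrival[g] = delay[g] + max((arrival[p] for p in fanin[g]), default=0.0)
    fanout = {g: [] for g in gates}
    for g in gates:
        for p in fanin[g]:
            fanout[p].append(g)
    required = {}
    for g in reversed(gates):                         # backward traversal
        required[g] = min((required[s] - delay[s] for s in fanout[g]), default=t_target)
    slack = {g: required[g] - arrival[g] for g in gates}
    return arrival, slack

gates = ["g1", "g2", "g3"]
fanin = {"g1": [], "g2": [], "g3": ["g1", "g2"]}
delay = {"g1": 2.0, "g2": 1.0, "g3": 3.0}
arrival, slack = arrival_and_slack(gates, fanin, delay, t_target=4.5)
negative = [g for g in gates if slack[g] < 0]
```

Gates with negative slack are the ones the algorithm would visit for reordering, followed by another forward pass to refresh delays.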
Experimental Results [Prasad-Roy, IWLPD’94]
[Table: experimental results]
∆0 = power of a gate driving a min inverter at 1 MHz.
Delay target = 5ns
Buffer Insertion
■ Motivation
  • Reduce quadratic growth of Rint·Cint w.r.t. wirelength
■ Classical approach
  • Insert equally-spaced buffers along a long wire
■ Buffer insertion for tree topology [van Ginneken, ISCAS’90]
  • Two-phase algorithm based on dynamic programming
  • Bottom-up to compute irredundant options at buffer candidate locations
  • Top-down to select the optimal locations
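The motivation can be made quantitative with a deliberately simplified model. It assumes the delay of an unbuffered distributed RC wire is 0.5 · r · c · L², and that each of n equally spaced buffers adds a fixed delay `t_buf` while its input and output loading are ignored. All constants are illustrative.

```python
def wire_delay(length, r=1.0, c=1.0):
    """Distributed-RC delay of an unbuffered wire: 0.5 * r * c * length^2."""
    return 0.5 * r * c * length ** 2

def buffered_delay(length, n_buffers, t_buf=2.0, r=1.0, c=1.0):
    """n equally spaced buffers split the wire into n + 1 segments; each buffer
    contributes a fixed delay t_buf (its loading is ignored in this sketch)."""
    segments = n_buffers + 1
    return segments * wire_delay(length / segments, r, c) + n_buffers * t_buf

unbuffered = buffered_delay(20.0, 0)
best_n = min(range(0, 11), key=lambda n: buffered_delay(20.0, n))
best_delay = buffered_delay(20.0, best_n)
```

Splitting the wire turns the quadratic term into one that falls as 1/(n + 1), so total delay grows roughly linearly with length once buffers are spaced well.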
Buffer Insertion for Tree Interconnect [van Ginneken, ISCAS’90]
■ Given topology, buffer types, and candidate buffer locations, insert buffers to
  (i) Minimize maximum sink delay
  (ii) Meet target delay for each sink
[Figure: routing tree with the DC-connected subtree for node i]
Optimal Buffer Insertion by Dynamic Programming
■ Bottom-up computation of irredundant set of options (c,q)’s at each buffer candidate location
  • a possible solution = option (c,q)
    c: cap. of DC-connected subtree
    q: req. arrival time corresponding to c
  • Pruning rule: given options (c,q) and (c’,q’), (c’,q’) is redundant if c’ ≥ c and q’ < q
  • Given sorted option lists for two sub-trees, it takes linear time to merge them at p.
■ Top-down selection of optimal buffer types and buffer locations
[Figure: DC-connected subtree]
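The pruning rule and the linear-time merge can be sketched as follows. Options are (c, q) tuples. This `prune` also drops options that tie on q with a larger c, which is slightly stricter than the rule as stated. `merge` assumes both inputs are already pruned and sorted by increasing c. Wire and buffer updates are left out.

```python
def prune(options):
    """Keep only irredundant (c, q) options. The result is sorted by increasing c
    and, as a consequence, strictly increasing q."""
    kept = []
    for c, q in sorted(options, key=lambda o: (o[0], -o[1])):
        if not kept or q > kept[-1][1]:
            kept.append((c, q))
    return kept

def merge(left, right):
    """Merge pruned, sorted option lists of two subtrees joined at node p:
    capacitances add and the required time is the tighter of the two."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        (c1, q1), (c2, q2) = left[i], right[j]
        merged.append((c1 + c2, min(q1, q2)))
        # Advance the list that limits q; advancing the other only adds capacitance
        if q1 < q2:
            i += 1
        elif q2 < q1:
            j += 1
        else:
            i += 1
            j += 1
    return prune(merged)

pruned = prune([(2, 3), (1, 5), (3, 8), (4, 7)])
joined = merge([(1, 5), (3, 8)], [(2, 4), (4, 9)])
```

Because each step advances at least one pointer, the merge produces at most len(left) + len(right) options instead of the full cross product.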