driver sizing - eda.ee.ucla.edu

ECE902 VLSI Interconnects

Fall 1999, Prof. He

Device Layout Optimization*

■ Driver sizing

■ Transistor/gate sizing

■ Transistor ordering

■ Buffer Insertion

* Some of these slides are provided by Prof. Sapatnekar of the U. of Minnesota.

Driver Sizing

■ Given: A chain of cascaded drivers driving a load, with negligible interconnect between drivers

Problem: Optimize the driver sizes to minimize delay, or to minimize total area while meeting a target delay

■ [Lin-Linholm, JSSC’75]

[Veendrick, JSSC’84]

[Hedenstierna-Jeppson, TCAD’87]

[Zhou-Liu, JHSES’96]



An Early Work on Driver Sizing [Lin-Linholm, JSSC’75]

■ Constant stage ratio, $d_{i+1}/d_i = (C_L/C_g)^{1/k}$ (C_g: input gate capacitance of $d_1$)

■ If the number of drivers is not fixed, $d_{i+1}/d_i = e$

■ Interconnect is modeled as a lumped capacitor

[Figure: driver chain d1, d2, ..., dk driving the load CL]
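A minimal Python sketch of the constant-stage-ratio rule, assuming driver sizes are normalized so that d_1 = 1 and that gate capacitance scales linearly with size (the lumped interconnect capacitance is folded into C_L). The function names are hypothetical.

```python
import math

def tapered_chain(c_load, c_gate, k):
    """Relative sizes d_1..d_k for a chain with a constant stage ratio.

    Assumes d_1 = 1 (input gate capacitance c_gate) and that every stage
    scales by f = (C_L / C_g) ** (1 / k), so that f ** k = C_L / C_g.
    """
    f = (c_load / c_gate) ** (1.0 / k)
    return f, [f ** i for i in range(k)]

def unconstrained_stage_count(c_load, c_gate):
    """If k is free, the optimal ratio is f = e, so k = ln(C_L / C_g), rounded."""
    return max(1, round(math.log(c_load / c_gate)))

f, sizes = tapered_chain(c_load=1000.0, c_gate=1.0, k=3)
print(round(f, 3), [round(s, 1) for s in sizes])
print(unconstrained_stage_count(1000.0, 1.0))   # ln(1000) = 6.9, rounded to 7
```

With k left free, rounding ln(C_L/C_g) gives the stage count, and each stage then scales by roughly e.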

Further Studies on Driver Sizing

■ Hedenstierna-Jeppson, TCAD’87

■ Assumptions:

• Load cap of $D_i$ = output cap of $D_i$ + input gate cap of $D_{i+1}$

• Ramp input is considered

■ Results:

• Opt. P/N transistor ratio: $W_{p_i}/W_{n_i} = \sqrt{\mu_n/\mu_p}$, where $\mu_n$, $\mu_p$ = electron and hole mobilities

• For an unlimited number of drivers, the opt. stage ratio f follows from

$N = \ln(C_L/C_0)/\ln f$ and $t_{total} = N \cdot t_0 \cdot (\alpha + f) \Rightarrow f = e^{(\alpha + f)/f}$

where $\alpha$ = (output cap. of $D_i$)/(input gate cap. of $D_i$)

• For N drivers:

− first (N−1) stages: constant f

− last stage: $f_N = (1 + b/a) \cdot f$, so that $(1 + b/a) \cdot f^N = C_L/C_0$

− typical values: b/a = 0.75 & α = 1
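The optimal-ratio condition f = e^{(α+f)/f} has no closed form for α > 0. A hedged sketch that solves it by bisection (our choice of method; the slide gives only the condition) and builds the N-driver ratios with the enlarged last stage; the helper names are hypothetical.

```python
import math

def optimal_stage_ratio(alpha, tol=1e-12):
    """Solve f = exp((alpha + f) / f), i.e. ln f - 1 - alpha / f = 0, by bisection.

    g(f) = ln f - 1 - alpha / f is increasing for f > 0 and negative at f = 1,
    so exactly one root lies above 1.
    """
    def g(f):
        return math.log(f) - 1.0 - alpha / f

    lo, hi = 1.0, 2.0
    while g(hi) < 0:            # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def n_stage_ratios(c_ratio, n, b_over_a):
    """First N-1 stages share f; the last is (1 + b/a) f, with (1 + b/a) f^N = C_L/C_0."""
    f = (c_ratio / (1.0 + b_over_a)) ** (1.0 / n)
    return [f] * (n - 1) + [(1.0 + b_over_a) * f]

print(round(optimal_stage_ratio(0.0), 3))   # alpha = 0 recovers f = e
print(round(optimal_stage_ratio(1.0), 2))   # the typical alpha = 1
print(n_stage_ratios(2000.0, 6, 0.75))
```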



Driver Sizing with Power Minimization

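■ Problem formulation:

minimize total power (≡ minimize total driver area) subject to $t_{total} \le t_B$

■ Example [Rabaey, 1996]:

• An on-chip min-size inverter in 1.2um CMOS, with C0 = 10fF and t0 = 0.2ns, drives an off-chip load CL = 20pF; tB = 10ns and CL/C0 = 2000

• Delay-optimal driver sizing: $N = \ln(C_L/C_0) = 7.6$, rounded to $N = 7$; $f = (C_L/C_0)^{1/N} = 2.96$; $t_{total} = N \cdot f \cdot t_0 = 4.1$ ns

• Power-optimal driver sizing: find the min N s.t. $t_{total} = N \cdot (C_L/C_0)^{1/N} \cdot t_0 \le t_B$, i.e., $N \cdot 2000^{1/N} \cdot 0.2\,\mathrm{ns} \le 10\,\mathrm{ns} \Rightarrow N = 3$, $f = 12.6$ & $t_{total} = 7.56$ ns

• Both solutions use $W_p/W_n = \sqrt{\mu_n/\mu_p} = 1.58$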

■ Min. delay driver sizing solution: transistor sizes for optimally-sized cascaded drivers

Stage      1     2     3      4      5       6       7
Wn (um)    1.8   5.3   15.8   46.7   138.2   409     1211
Wp (um)    2.8   8.4   24.9   73.8   218.3   646.2   1913

■ Min. area (power) driver sizing solution: transistor sizes of the redesigned cascaded buffer

Stage      1     2      3
Wn (um)    1.8   22.7   286
Wp (um)    2.8   35.9   445

About 6X reduction in total device area (sum of all Wn and Wp: roughly 4700 um vs. 794 um)!
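The example's numbers can be reproduced with the simple α = 0 model used above, t_total = N·(C_L/C_0)^{1/N}·t_0. This is a sketch under that assumption; the helper names are hypothetical.

```python
def total_delay(c_ratio, n, t0):
    """t_total = N * f * t0 with f = (C_L/C_0)^(1/N) (output cap ignored, alpha = 0)."""
    return n * c_ratio ** (1.0 / n) * t0

def min_stages_for_budget(c_ratio, t0, t_budget, n_max=50):
    """Smallest N whose delay meets t_B: fewer stages means less driver area and power."""
    for n in range(1, n_max + 1):
        if total_delay(c_ratio, n, t0) <= t_budget:
            return n
    return None

ratio = 2000.0        # C_L / C_0 = 20 pF / 10 fF
t0, t_b = 0.2, 10.0   # ns

n_pow = min_stages_for_budget(ratio, t0, t_b)
print(n_pow, round(ratio ** (1 / n_pow), 1), round(total_delay(ratio, n_pow, t0), 2))
print(round(total_delay(ratio, 7, t0), 2))   # the delay-optimal choice N = 7
```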

Power-Optimal Driver Sizing Example (Cont’d)



Area/Delay Trade-Off in Driver Sizing

• Area is more sensitive to f than delay [Hedenstierna-Jeppson, TCAD’87]

• Low-power driver sizing with increasing stage ratio [Zhou-Liu, JHSES’96]

[Equation: stage-ratio rule in terms of f_i, C_L, C_0 and γ; not recoverable from the source. γ = 0.2 for 0.5um CMOS]
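A rough sketch of why area reacts more strongly to f than delay does, assuming the α = 0 model (delay ∝ f/ln f for a fixed C_L/C_0) and counting total driver capacitance as the geometric sum C_0(1 + f + ... + f^{N−1}) = (C_L − C_0)/(f − 1), with N treated as continuous.

```python
import math

def rel_delay(f):
    """Chain delay relative to f = e: (f / ln f) / e (alpha = 0 model)."""
    return (f / math.log(f)) / math.e

def rel_area(f):
    """Total driver capacitance ~ (C_L - C_0) / (f - 1), relative to f = e."""
    return (math.e - 1.0) / (f - 1.0)

for f in (math.e, 4.0, 6.0, 8.0):
    print(f"f={f:4.2f}  delay x{rel_delay(f):.2f}  area x{rel_area(f):.2f}")
```

Under these assumptions, moving from f = e to f = 4 costs about 6% in delay but cuts driver area by more than 40%.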

Transistor/Gate Sizing Optimization

■ Given: Logic network, with or without a cell library

Find: Optimal size for each transistor/gate to minimize area or power, both under a delay constraint

• Static sizing: based on timing analysis; considers all paths at once [Fishburn-Dunlop, ICCAD’85] [Sapatnekar et al., TCAD’93] [Berkelaar-Jess, EDAC’90] [Chen-Onodera-Tamaru, ICCAD’95]

• Dynamic sizing: based on timing simulation; considers the paths activated by given patterns [Conn et al., ICCAD’96]

■ Transistor sizing versus gate sizing



The Transistor Sizing Problem

Problem statement

minimize Area(x)

subject to Delay(x) ≤ Tspec

or

minimize Power(x)

subject to Delay(x) ≤ Tspec

[Figure: combinational logic block]

Mathematical Background

■ n-dimensional space

• Any ordered n-tuple x = (x1, x2, ... , xn) can be thought of as a point in an n-dimensional space

• f(x1, x2, ..., xn) is a function on the n-dimensional space

■ Convex functions

f(x) is a convex function if, given any two points xa and xb, the line joining (xa, f(xa)) and (xb, f(xb)) lies on or above the function

[Figure: a convex f(x) and a nonconvex f(x), each with a chord drawn between xa and xb]



Math Background (Contd.)

■ Convex functions in two dimensions

$f(x_1, x_2) = x_1^2 + x_2^2$

Formally, f(x) is convex if

$f(\alpha x_a + [1 - \alpha] x_b) \le \alpha f(x_a) + [1 - \alpha] f(x_b), \quad 0 \le \alpha \le 1$
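A numerical spot-check of the definition above for f(x1, x2) = x1² + x2², sampling random segments. Random sampling can only find violations, not prove convexity; the sampling range, seed and tolerance are arbitrary choices.

```python
import random

def f(x1, x2):
    return x1 ** 2 + x2 ** 2

def convexity_violations(func, trials=10_000, seed=0):
    """Count random (xa, xb, alpha) triples that break
    f(alpha*xa + (1-alpha)*xb) <= alpha*f(xa) + (1-alpha)*f(xb)."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        xa = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        xb = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        a = rng.random()
        mix = (a * xa[0] + (1 - a) * xb[0], a * xa[1] + (1 - a) * xb[1])
        if func(*mix) > a * func(*xa) + (1 - a) * func(*xb) + 1e-9:
            bad += 1
    return bad

print(convexity_violations(f))                              # x1^2 + x2^2 is convex
print(convexity_violations(lambda x1, x2: -f(x1, x2)) > 0)  # -f is not convex
```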


■ Convex sets: a set S is a convex set if, given any two points xa and xb in the set, the line joining the two points lies entirely within the set

■ Examples: shape of Wyoming, shape of a pizza

■ Nonconvex sets: shape of CA, silhouette of the Taj Mahal




■ Mathematical characterization of a convex set S

• If x1, x2 ∈ S, then α x1 + (1 - α) x2 ∈ S, for 0 ≤ α ≤ 1

■ If f(x) is a convex function, {x : f(x) ≤ c} is a convex set

■ An intersection of convex sets is a convex set

[Figure: points x1 and x2 of a convex set joined by a line segment]

■ Convex programming problem

minimize convex function f(x)

such that x ∈ ∩i {x : fi(x) ≤ ci}, with each fi convex

■ The global minimum value is unique! (Nonrigorous) explanation (from “The Handwaver’s Guide to the Galaxy”)

[Figure: convex f(x) with points xa and xb]



Math Background (Contd. in English)

■ A posynomial is like a polynomial except

• all coefficients are positive

• exponents could be real numbers (positive or negative)

■ Are these posynomials?

$6.023\, x_1^{1.23} + 4.56\, x_1^{3.4} x_2^{7.89} x_3^{-0.12}$: YES

$x_1 - 9.78\, x_2^{4.2} x_3^{-9.1}$: NO (negative coefficient)

$(x_1 + 2x_2 + 2x_3 + 5)/x_1 + (x_3 + 2x_4 + 3)/x_3$: YES

■ In any posynomial function f(x1, x2, ... , xn), substitute xi = exp(zi) to get F(z1, z2, ... , zn)

■ Then F(z1, z2, ... , zn) is a convex function in (z1, ... , zn)!

minimize (posynomial objective in xi’s)

s.t. (posynomial function in xi’s)_i ≤ K for 1 ≤ i ≤ m

[substitute xi = exp(zi)] ⇒

minimize (convex objective) over a convex set

Therefore, any local minimum is a global minimum!
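A sketch that spot-checks the claim on the third posynomial above: after x_i = exp(z_i) no random midpoint test fails, while one concrete segment in the original x variables violates convexity. The sampling range, seed and test points are arbitrary illustrations.

```python
import math
import random

def posy(x1, x2, x3, x4):
    """(x1 + 2x2 + 2x3 + 5)/x1 + (x3 + 2x4 + 3)/x3."""
    return (x1 + 2 * x2 + 2 * x3 + 5) / x1 + (x3 + 2 * x4 + 3) / x3

def F(z):
    """The same function after the substitution x_i = exp(z_i)."""
    return posy(*(math.exp(zi) for zi in z))

def midpoint_violations(func, dim, trials=20_000, seed=1):
    """Count segments where func(midpoint) exceeds the endpoint average."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        za = [rng.uniform(-3, 3) for _ in range(dim)]
        zb = [rng.uniform(-3, 3) for _ in range(dim)]
        zm = [(a + b) / 2 for a, b in zip(za, zb)]
        fa, fb = func(za), func(zb)
        if func(zm) > (fa + fb) / 2 + 1e-9 * (1 + abs(fa) + abs(fb)):
            bad += 1
    return bad

print(midpoint_violations(F, 4))     # convex in z: no violations expected

# In x itself the posynomial is not convex: one explicit counterexample.
xa, xb = (1.0, 1.0, 1.0, 1.0), (3.0, 20.0, 1.0, 1.0)
xm = tuple((a + b) / 2 for a, b in zip(xa, xb))
print(posy(*xm) > (posy(*xa) + posy(*xb)) / 2)   # True: 21 > 19.33
```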




Properties of Tr. Sizing under the Elmore Model

■ x is the set (vector) of transistor sizes

minimize Area(x) subject to Delay(x) ≤ Tspec

■ $\mathrm{Area}(x) = \sum_{i=1}^{n} x_i$ (posynomial!)

■ Each path delay = Σ R·C

• $R \propto x_i^{-1}$, $C \propto x_i$ ⇒ posynomial path delay function

• Delay(x) ≤ Tspec ≡ Pathdelay(x) ≤ Tspec for all paths

■ Therefore, the problem has a unique global min. value

TILOS™ (TImed LOgic Synthesis)

■ Philosophy

• Since the min. value is unique, a simple method should find it!

■ Problem: minimize Area(x) subject to Delay(x) ≤ Tspec

■ Strategy

• Set all transistors in the circuit to minimum size

• Find the critical path (largest-delay path)

• Reduce the delay of the critical path, but with a minimal increase in the objective function value

(TILOS™ is a registered trademark of Lucent Technologies)



TILOS (Contd.)


■ Find ∂D/∂A for all transistors on the critical path

■ Bump up the size of the transistor with the largest delay reduction per unit area (most negative ∂D/∂A):

$x_i \rightarrow M x_i + a$ (default: M = 1; a = 1 contact head width)

[Figure: circuit with its critical path from IN to OUT]
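A simplified TILOS-style loop, assuming a single chain of gates (so the chain is always the critical path), an Elmore delay in which gate i has resistance ru/w_i and input capacitance cu·w_i, and Area = Σ w_i. The real tool works on transistors in a full circuit with timing analysis; names and parameters here are hypothetical.

```python
def path_delay(w, r_in, ru, cu, c_load):
    """Elmore delay of a chain of gates with widths w (hypothetical model).

    The input driver r_in charges the first gate (cu * w[0]); gate i has
    resistance ru / w[i] and charges the next gate (cu * w[i+1]) or c_load.
    """
    d = r_in * cu * w[0]
    for i, wi in enumerate(w):
        c_next = cu * w[i + 1] if i + 1 < len(w) else c_load
        d += (ru / wi) * c_next
    return d

def tilos_path(n, t_spec, r_in, ru, cu, c_load, w_min=1.0, a=1.0, max_iter=10_000):
    """Greedy sizing: start at minimum size; while the delay misses t_spec,
    bump (x_i -> x_i + a, i.e. M = 1) the width with the most negative dD/dA,
    estimated by finite differences (area grows by exactly a per bump)."""
    w = [w_min] * n
    d = path_delay(w, r_in, ru, cu, c_load)
    for _ in range(max_iter):
        if d <= t_spec:
            break
        best_i, best_sens = None, 0.0
        for i in range(n):
            trial = w.copy()
            trial[i] += a
            sens = (path_delay(trial, r_in, ru, cu, c_load) - d) / a   # dD/dA
            if sens < best_sens:
                best_i, best_sens = i, sens
        if best_i is None:      # no single bump reduces the delay: give up
            break
        w[best_i] += a
        d = path_delay(w, r_in, ru, cu, c_load)
    return w, d

w, d = tilos_path(n=3, t_spec=20.0, r_in=1.0, ru=1.0, cu=1.0, c_load=64.0)
print(w, d)
```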

Sensitivity Computation

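■ $D(w) = K + R_{prev}\,(C_u w) + R_u C / w$

■ $\partial D/\partial w = R_{prev} C_u - R_u C / w^2$

■ Could minimize the path delay by setting the derivative to zero, i.e., $w^* = \sqrt{R_u C / (R_{prev} C_u)}$

■ Problem: this may cause another path delay to become very high!

[Figure: transistor of width w (relative to a unit-size “1” transistor) driven through Rprev and driving load C]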
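A small sketch of the sensitivity formulas above, with arbitrary example values; w_star is the zero-derivative width, which, as noted, ignores the effect of the larger gate load on other paths through Rprev.

```python
import math

def delay(w, k, r_prev, cu, ru, c):
    """D(w) = K + Rprev*(Cu*w) + Ru*C/w."""
    return k + r_prev * cu * w + ru * c / w

def d_delay_dw(w, r_prev, cu, ru, c):
    """dD/dw = Rprev*Cu - Ru*C/w^2."""
    return r_prev * cu - ru * c / w ** 2

def w_star(r_prev, cu, ru, c):
    """Setting dD/dw = 0 gives w* = sqrt(Ru*C / (Rprev*Cu))."""
    return math.sqrt(ru * c / (r_prev * cu))

ws = w_star(r_prev=2.0, cu=1.0, ru=8.0, c=4.0)    # sqrt(32 / 2) = 4.0
print(ws, d_delay_dw(ws, 2.0, 1.0, 8.0, 4.0))     # 4.0 0.0
```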



Why Isn’t This THE Perfect Solution?

■ Problems with interacting paths

(1) Better to size A than to size all of B, C and D

(2) If X-E is near-critical and A-D is critical, size A (not D)

■ False paths and layout considerations are not incorporated

■ AND YET...

• TILOS (the commercial tool) gives good solutions

• It has handled circuits with 250K transistors

• Its run time grows linearly with circuit size

[Figure: example circuit with gates A, B, C, D, X and E]

CONTRAST

■ Solves the convex optimization problem exactly

■ Uses an interior point method that is guaranteed to find the optimal solution

■ Can handle circuits with about a thousand transistors

[Figure: progress toward the optimum; labels “Delay spec. satisfied” and “Optimal solution”]



(Convex) Polytopes

■ Polytope = n-dimensional convex polygon

• Half-space: $a^T x \ge b$ ($a^T x = b$ is a hyperplane), e.g., $a_1 x_1 + a_2 x_2 \ge b$ (in two dimensions)

• Polytope = intersection of half-spaces, i.e.,

$a_1^T x \ge b_1$ AND $a_2^T x \ge b_2$ AND ... AND $a_m^T x \ge b_m$

Represented as $A x \ge b$

Convex Optimization Algorithm (Vaidya)

(1) Enclose the solution within a polytope (invariant)

• Typically, take a “box” represented by wi ≤ wMAX and wi ≥ wMIN as the starting polytope

(2) Find the center of the polytope, wc

(3) Does wc satisfy the constraints (timing specs)?

• Take the transistor widths corresponding to wc and perform a static timing analysis

(4) Add a hyperplane through the center so that the solution lies entirely in one half-space

• The hyperplane equation depends on the feasibility of wc

(Repeat steps (2) to (4) until the polytope is small enough.)



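Equation of the New Half-Space

Half-space: $\nabla f(w_c) \cdot w \le \nabla f(w_c) \cdot w_c$ (i.e., $-\nabla f(w_c) \cdot w \ge -\nabla f(w_c) \cdot w_c$ in the $A x \ge b$ form)

■ If wc is feasible, then f = objective function: find the gradient of the area function

■ If wc is infeasible, then f = violated constraint: find the gradient of the critical path delay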
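A minimal sketch of the cut in the polytope's A w ≥ b form. The gradient is supplied by the caller (area gradient if wc is feasible, critical-path-delay gradient otherwise); Area(w) = w1 + w2 is a hypothetical example.

```python
def cutting_plane(grad, wc):
    """Return (a, b) for the new half-space a . w >= b.

    By convexity the solution satisfies grad . (w - wc) <= 0,
    i.e. (-grad) . w >= (-grad) . wc.
    """
    a = [-g for g in grad]
    b = sum(ai * wi for ai, wi in zip(a, wc))
    return a, b

def satisfies(a, b, w):
    """True if w lies in the half-space a . w >= b."""
    return sum(ai * wi for ai, wi in zip(a, w)) >= b

a, b = cutting_plane(grad=[1.0, 1.0], wc=[3.0, 3.0])   # Area(w) = w1 + w2
print(a, b)                                            # [-1.0, -1.0] -6.0
print(satisfies(a, b, [2.0, 2.0]), satisfies(a, b, [4.0, 4.0]))   # True False
```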

Illustrative Example

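[Figure: (w1, w2) plane with contours f(w) = c (f decreasing toward the solution) and successive cuts S that shrink the polytope around the solution]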



Calculating the Polytope Center

■ Finding the exact centroid is computationally expensive

■ Estimate the center by minimizing the log-barrier function

$F(x) = -\sum_{i=1}^{m} \log(a_i^T x - b_i)$

Happy “coincidence”: F(x) is a convex function!

■ Physical meaning: maximize the product of the perpendicular distances to each hyperplane that defines the polytope
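A hedged NumPy sketch of the log-barrier center estimate, using a damped Newton method (our choice of solver; the slide does not specify one). The test polytopes are illustrative.

```python
import math
import numpy as np

def analytic_center(A, b, x0, max_iter=100, tol=1e-10):
    """Approximate polytope center: minimize F(x) = -sum_i log(a_i^T x - b_i).

    A is m x n, b has length m, and x0 must satisfy A @ x0 > b strictly.
    The damped step x += dx / (1 + lam) keeps the iterates of this
    self-concordant barrier strictly inside the polytope.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = A @ x - b                               # slacks, all > 0 inside
        grad = -A.T @ (1.0 / s)                     # gradient of F
        hess = A.T @ (A / s[:, None] ** 2)          # sum_i a_i a_i^T / s_i^2
        dx = np.linalg.solve(hess, -grad)           # Newton direction
        lam = math.sqrt(max(float(-grad @ dx), 0.0))   # Newton decrement
        if lam < tol:
            break
        x = x + dx / (1.0 + lam)
    return x

# Triangle x >= 0, y >= 0, x + y <= 1 written as A x >= b
A = [[1, 0], [0, 1], [-1, -1]]
b = [0, 0, -1]
print(analytic_center(A, b, x0=[0.1, 0.1]))   # close to [1/3, 1/3]
```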

Linear Programming Methods

■ LP-based approaches

• Model gate delay as a piecewise linear function of the parameters:

− transistor widths wn, wp

− fanout transistor widths

− input transition time

• Formulate the problem as a linear program (LP)

• Use an efficient simplex package to solve the LP

[Figure: piecewise-linear delay vs. wn]
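One way to make a convex delay curve LP-friendly is to keep the maximum of its tangent lines, since each tangent becomes one linear constraint d ≥ slope·w + intercept. A sketch assuming a one-dimensional model d(w) = a + b/w (the actual LP methods use wn, wp, fanout widths and input transition time); the breakpoints are arbitrary.

```python
def tangent_lines(a, b, breakpoints):
    """Tangents (slope, intercept) of the convex curve d(w) = a + b/w at the
    given widths; d'(w0) = -b / w0**2."""
    lines = []
    for w0 in breakpoints:
        slope = -b / w0 ** 2
        intercept = a + b / w0 - slope * w0
        lines.append((slope, intercept))
    return lines

def pwl_delay(lines, w):
    """Piecewise-linear model: the max over tangents (an LP uses one constraint per line)."""
    return max(s * w + c for s, c in lines)

lines = tangent_lines(1.0, 4.0, [1.0, 2.0, 4.0, 8.0])
for w in (1.0, 3.0, 8.0):
    print(w, pwl_delay(lines, w), 1.0 + 4.0 / w)   # the PWL model never exceeds d(w)
```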



Power-Delay Sizing

minimize Power(w)

subject to Delay(w) ≤ Tspec

Area ≤ Aspec

Each gate size ≥ Minsize

Power = dynamic power + short-circuit power

Dynamic Power

■ Dynamic power

• Power required to charge/discharge capacitances

$P_{dynamic} = C_L V_{dd}^2 f p_T$

CL = load capacitance, f = clock frequency, pT = transition probability

• Posynomial function in w’s (if pT is constant)

• Constitutes the dominant part of power in a well-designed circuit

• Minimize dynamic power ≡ minimize CL ≡ minimize all transistor sizes! RIGHT? (Unfortunately not!)
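The dynamic-power formula as a one-line function, with hypothetical example values (100 fF, 3.3 V, 100 MHz, pT = 0.5).

```python
def dynamic_power(c_load, vdd, freq, p_t):
    """P_dynamic = C_L * Vdd^2 * f * p_T (watts, with SI inputs)."""
    return c_load * vdd ** 2 * freq * p_t

p = dynamic_power(c_load=100e-15, vdd=3.3, freq=100e6, p_t=0.5)
print(f"{p * 1e6:.2f} uW")
```

Halving CL halves this term, which is why minimum sizing looks attractive at first sight.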



Short-Circuit Power

■ Short-circuit power

• Power dissipated while a direct Vdd-to-ground path exists

• Approximate formula by Veendrick (many assumptions)

$P_{short\text{-}ckt} = \frac{\beta}{12} (V_{dd} - 2V_T)^3\, \tau\, f\, p_T$

β = transconductance, τ = transition time

• Posynomial function in w’s (if pT is constant)

• Other (more accurate) models: table lookup, curve fitting

• “Less than 10-20% of total power in a well-designed circuit”

• So what’s the catch?
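A sketch of Veendrick's estimate, which is cubic in (Vdd − 2VT); the clamp to zero when Vdd ≤ 2VT and the example values are assumptions for illustration.

```python
def short_circuit_power(beta, vdd, vt, tau, freq, p_t):
    """beta/12 * (Vdd - 2*VT)^3 * tau * f * p_T.

    Returns 0 when Vdd <= 2*VT (the n and p devices never conduct together).
    """
    overdrive = max(vdd - 2.0 * vt, 0.0)
    return beta / 12.0 * overdrive ** 3 * tau * freq * p_t

p_sc = short_circuit_power(beta=100e-6, vdd=3.3, vt=0.7, tau=1e-9, freq=100e6, p_t=0.5)
print(f"{p_sc * 1e6:.2f} uW")
```

Because the term is linear in τ, slow input transitions (driven by a slow previous gate) raise it directly.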

The Catch

■ Delay of gate A is large

• Therefore, the value of τ for B, C, ... , H is large

• Therefore, the short-circuit power for B, C, ... , H is large

• Can be reduced by reducing the delay of A

• In other words, size A!

■ Trade off dynamic and short-circuit power!

■ Min power ≠ min size

[Figure: example circuit with gate A driving gates B through H, plus gate X]



Solution Technique

(1) begin: calculate pT’s for min-sized gates

(2) Solve the gate sizing problem for the current pT’s

(3) Calculate pT’s for the new sizes; error = ||old pT − new pT||

(4) error < ε? Yes: end. No: go back to (2)

Problem: inaccuracies in the short-ckt. power model
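The flow above as a fixed-point loop. solve_sizing and compute_pt are hypothetical hooks standing in for the gate-sizing optimizer and the pT estimator; the toy versions below exist only to show convergence, and the max-norm is an assumed choice for ||old pT − new pT||.

```python
def iterate_sizing(solve_sizing, compute_pt, min_sizes, eps=1e-6, max_iter=100):
    """Alternate between pT estimation and gate sizing until the pT's settle."""
    pt = compute_pt(min_sizes)              # (1) pT's for min-sized gates
    sizes = min_sizes
    for _ in range(max_iter):
        sizes = solve_sizing(pt)            # (2) sizing for the current pT's
        new_pt = compute_pt(sizes)          # (3) pT's for the new sizes
        error = max(abs(o - n) for o, n in zip(pt, new_pt))
        pt = new_pt
        if error < eps:                     # (4) converged
            break
    return sizes, pt

def toy_solve(pt):
    """Hypothetical: more active nets get larger gates."""
    return [1.0 + 2.0 * p for p in pt]

def toy_pt(sizes):
    """Hypothetical: larger (faster) gates produce fewer glitch transitions."""
    return [0.1 + 0.5 / s for s in sizes]

sizes, pt = iterate_sizing(toy_solve, toy_pt, min_sizes=[1.0, 1.0])
print(sizes, pt)
```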

Transistor/Gate Sizing [Borah-Owens-Irwin, ISLPD’95, TCAD’96]

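[Equation: total power P as the sum of a dynamic term (Vdd², f, CL and the widths Wi) and a short-circuit term (µ, τ and the widths); not recoverable from the source]

Optimal transistor size

[Equations: closed-form optimal widths Wn* and Wp* in terms of the internal capacitance CI, the widths Wi, µ and τ; not recoverable from the source]

CI = internal capacitance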



Power Optimal Sizes and Corresponding Power Savings

Power-Delay Optimization



Power, Delay and Power-Delay Curves

Power-Delay Optimal Transistor Sizing Algorithm

■ Power-optimal initial sizing

■ Timing analysis

■ While there exists a path with path-delay > target-delay:

• Power-delay optimal sizing of the critical path

− if path-delay > target-delay: upsize the transistor with the minimum power-delay slope

− if path-delay < target-delay: downsize the transistor with the minimum power-delay slope

• Incremental timing analysis



Effect of Transistor Sizing

Transistor Ordering

■ Problem: Find the best ordering of transistors in each gate, s.t. delay and/or power is minimized

■ Comment: No (or little) penalty on circuit area!

■ Example [Prasad-Roy, IWLPD’94]:



How to Determine the Best Transistor Order

■ No easy answer!

■ Need to evaluate using SPICE or switch-level simulation

■ Example [Carlson-Chen, DAC’93]:

■ CL=0.2pF, all transistor W/L=7

■ Input rise time = 5ns: d1/d2 = 1.23

■ Input rise time = 1ns: d1/d2 = 0.92

Determine the Best Transistor Order at Each Gate

■ Exhaustive search:

• Enumerate all possible permutations [Prasad-Roy, IWLPD’94]

• Use SP-BDD to enumerate all possible orderings for serial-parallel circuits [Glebov-Blaauw-Jones, ISLPD’95]

■ Heuristic search:

• Try top-critical (slowest input closest to the output node)

• Try bottom-critical (slowest input closest to the power node)

=> Choose the best [Carlson-Chen, DAC’93]

• Pre-characterize each cell in a fixed library

=> Connect the slowest input to the pins with the smallest delay [Prasad-Roy, IWLPD’94]
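A sketch of the pin-assignment heuristic under a simplified, assumed timing model: each pin has a pre-characterized delay, and the gate's output time is max(arrival + pin delay). Under that model, pairing the slowest input with the fastest pin matches exhaustive enumeration (an exchange argument shows it is optimal); real gates still need SPICE-level evaluation, as noted above. The arrival times and pin delays are hypothetical.

```python
from itertools import permutations

def assign_by_sorting(arrivals, pin_delays):
    """Latest-arriving input -> pin with the smallest delay.
    Returns perm with perm[i] = pin used by input i."""
    inputs = sorted(range(len(arrivals)), key=lambda i: arrivals[i], reverse=True)
    pins = sorted(range(len(pin_delays)), key=lambda p: pin_delays[p])
    perm = [0] * len(arrivals)
    for i, p in zip(inputs, pins):
        perm[i] = p
    return perm

def output_arrival(arrivals, pin_delays, perm):
    """Gate output time = max over inputs of (arrival + delay of its pin)."""
    return max(a + pin_delays[p] for a, p in zip(arrivals, perm))

def best_by_enumeration(arrivals, pin_delays):
    """Exhaustive search over all input-to-pin permutations."""
    return min(permutations(range(len(pin_delays))),
               key=lambda perm: output_arrival(arrivals, pin_delays, perm))

arr = [0.0, 0.5, 1.2]        # ns
dly = [0.30, 0.22, 0.15]     # ns, pre-characterized per pin
print(assign_by_sorting(arr, dly), output_arrival(arr, dly, assign_by_sorting(arr, dly)))
```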



Optimal Transistor Ordering for Entire Circuit

■ Iterative approach: locally optimal ordering at each gate

■ Example [Prasad-Roy, IWLPD’94]:

• Phase 1: Delay minimization

− Forward traversal to compute the delay to each gate

− Backward traversal to compute the slack at each gate. When encountering a gate with negative slack: optimal transistor ordering for this gate, then a forward traversal to update delay & slack

• Phase 2: Power minimization (similar to phase 1)

Experimental Results [Prasad-Roy, IWLPD’94]

Δ0 = power of a gate driving a min. inverter at 1 MHz.

Delay target = 5ns



Buffer Insertion

■ Motivation

• Reduce the quadratic growth of Rint·Cint w.r.t. wirelength

■ Classical approach

• Insert equally-spaced buffers along a long wire

■ Buffer insertion for tree topology [van Ginneken, ISCAS’90]

• Two-phase algorithm based on dynamic programming

• Bottom-up to compute irredundant options at buffer candidate locations

• Top-down to select the optimal locations
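A sketch of the classical equally-spaced approach, assuming an Elmore model in which each of the k+1 segments is driven by an identical buffer (resistance r_buf, input cap c_buf) and intrinsic buffer delay is ignored. The wire and buffer values are hypothetical (µm, Ω/µm, fF/µm, Ω, fF, so delays come out in fs).

```python
def buffered_wire_delay(length, k, r, c, r_buf, c_buf):
    """Elmore delay of a wire split into k+1 equal segments by k identical buffers.

    Per segment: the buffer drives the segment cap c*l plus the next c_buf,
    and the distributed wire adds r*l*(c*l/2 + c_buf). The source driver is
    assumed to be the same buffer type, and the sink load to equal c_buf.
    """
    n = k + 1
    seg = length / n
    per_segment = r_buf * (c * seg + c_buf) + r * seg * (c * seg / 2 + c_buf)
    return n * per_segment

def best_buffer_count(length, r, c, r_buf, c_buf, k_max=200):
    """Scan k = 0..k_max and keep the count with the smallest delay."""
    return min(range(k_max + 1),
               key=lambda k: buffered_wire_delay(length, k, r, c, r_buf, c_buf))

params = dict(r=0.08, c=0.2, r_buf=200.0, c_buf=10.0)
k_best = best_buffer_count(10_000.0, **params)
print(buffered_wire_delay(10_000.0, 0, **params), k_best,
      buffered_wire_delay(10_000.0, k_best, **params))
```

Splitting the wire turns the quadratic r·c·L² term into one that is linear in L, at the price of one buffer delay per segment.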

Buffer Insertion for Tree Interconnect [van Ginneken, ISCAS’90]

■ Given the topology, buffer types, and candidate buffer locations, insert buffers to

(i) Minimize the maximum sink delay

(ii) Meet the target delay for each sink

[Figure: DC-connected subtree for node i]



Optimal Buffer Insertion by Dynamic Programming

■ Bottom-up computation of the irredundant set of options (c, q) at each buffer candidate location

• A possible solution = option (c, q): c = cap. of the DC-connected subtree; q = req. arrival time corresponding to c

• Pruning rule: given options (c, q) and (c’, q’), (c’, q’) is redundant if c’ ≥ c and q’ < q

• Given sorted option lists for two subtrees, it takes linear time to merge them at a branch point p

■ Top-down selection of the optimal buffer types and buffer locations

[Figure: DC-connected subtree]
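A minimal sketch of the pruning rule and the linear-time merge at a branch point, covering only these two steps (no wire delays or buffer options). Capacitances add and required times take the minimum; as a labeled variant, the sketch also drops options that tie on q. The option values are hypothetical.

```python
from itertools import product

def prune(options):
    """Keep irredundant (c, q) options: drop (c', q') if another option has
    c <= c' and q >= q' (ties on q also dropped in this variant).
    Survivors come back sorted by c, with q strictly increasing."""
    kept, best_q = [], float("-inf")
    for c, q in sorted(options, key=lambda o: (o[0], -o[1])):
        if q > best_q:
            kept.append((c, q))
            best_q = q
    return kept

def merge(left, right):
    """Linear-time merge of two pruned, c-sorted lists at a branch point."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        (cl, ql), (cr, qr) = left[i], right[j]
        out.append((cl + cr, min(ql, qr)))
        if ql < qr:          # left side limits q: advance to its next option
            i += 1
        elif qr < ql:        # right side limits q
            j += 1
        else:
            i += 1
            j += 1
    return out

L = prune([(2, 5), (3, 4), (4, 8), (6, 9)])
R = prune([(1, 6), (5, 7), (5, 10)])
print(L, R, merge(L, R))
```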
