
Dominant Critical Gate Identification for Power and Yield Optimization in Logic Circuits

Mihir Choudhury, Masoud Rostami, and Kartik Mohanram
Department of Electrical and Computer Engineering, Rice University, Houston


Abstract

With increasing process variations, low-VT swapping is an effective technique that can be used to improve timing yield without having to modify a design following placement and routing. Gate criticality, defined as the probability that a gate lies on a critical path, forms the basis for existing low-VT swapping techniques. This paper presents a simulation-based study that challenges the effectiveness of low-VT swapping based on the conventional definition of gate criticality, especially as random process variations increase with technology scaling. We introduce dominant gate criticality to address the drawbacks of the conventional definition of gate criticality, and formulate dominant critical gate ranking in the presence of process variations as an optimization problem. Simulation results for 12 benchmark circuits from the ISCAS and OpenSPARC suites to achieve timing yields of 95% and 98% indicate that low-VT swapping based on dominant gate criticality reduces leakage power overhead by 61% and 42% for independent and correlated process variations, respectively, over low-VT swapping based on conventional gate criticality.

Categories and Subject Descriptors: B.6.3 [Logic design]: Design Aids - Optimization
General Terms: Algorithms, Design, Reliability
Keywords: Process variations, Yield, Low-VT

1. Introduction

Process variations cause significant degradation in the yield of manufactured chips [1], and these effects are expected to worsen with technology scaling. Process variations consist of a correlated component arising from wafer-to-wafer, die-to-die, and spatially correlated within-die variations, and an independent component arising from random variations. As random variations increase with technology scaling [2], guard-banding approaches to improve timing yield result in pessimistic designs. Since leakage power is also an important factor in determining the yield [1], improving timing yield with minimal impact on leakage power is a significant challenge for the future.

Statistical optimization techniques to improve timing yield by optimizing circuit parameters such as gate size, VT, and VDD early in the design cycle have been proposed in the literature (see [3]). However, since the impact of process variations can be predicted more accurately after place-and-route, engineering change order (ECO) techniques based on logic restructuring, buffer insertion, gate resizing, and low-VT swapping have been proposed to improve yield by fine-tuning the design [4, 5]. Since leakage power is also strongly influenced by process variations, these techniques try to enhance timing yield with minimum impact on leakage power.

This research was supported by NSF CAREER Award CCF-0746850.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. GLSVLSI’10, May 16–18, 2010, Providence, Rhode Island, USA. Copyright 2010 ACM 978-1-4503-0012-4/10/06 ...$10.00.

Low-VT swapping is a preferred ECO technique for improving timing yield since it can be applied without modifying a design following placement and routing. Several optimization-based low-VT swapping techniques to improve yield have been proposed in the literature [6–8]. The dynamic programming approach in [6] stores the best low-VT swapping choices, but becomes computationally expensive for circuits with a large number of reconvergent fanout paths. Techniques based on solving a continuous-VT optimization problem, followed by heuristic techniques to discretize the VT assignments [7, 8], either do not produce good VT assignments or become computationally demanding as more complex discretization strategies are used. Given these limitations, practical techniques for low-VT swapping based on the concept of gate criticality have been proposed [5, 9]. Gate criticality is defined as the probability that a gate lies on a critical path, and several techniques for gate criticality computation have been proposed [10–13].

Conventional techniques for low-VT swapping use metrics that combine gate criticality with leakage to rank and process candidates for timing yield enhancement. However, the effectiveness of such rank-and-swap techniques diminishes with each swap since the criticality of all the gates in the design changes after every swap. This is because the distribution of critical paths, and hence critical gates, changes after every low-VT swap. Although it is possible to repeat criticality computation after every swap or set of swaps, the need to run a statistical timing and yield analyzer for criticality computation makes this approach computationally exorbitant.

To address these shortcomings, we propose the concept of dominant critical gate ranking in this paper. Dominant critical gate ranking ensures that the set of top ranked gates is a critical set of gates, i.e., it ensures that the set of gates is highly effective in improving the timing yield of the circuit. We formulate dominant critical gate ranking in the presence of process variations as an optimization problem. This optimization problem has to be solved only once to determine a ranking of the critical gates that can be effectively used to improve the timing yield of a circuit. The effectiveness of dominant critical gate ranking is illustrated by considering low-VT swapping of the top ranked gates to improve the timing yield to 95% and 98%. For 12 benchmarks from the ISCAS and OpenSPARC suites, the results indicate that low-VT swapping based on dominant critical gate ranking requires 57% and 32% fewer swaps than conventional gate criticality for independent and correlated process variations, respectively. The reduced number of low-VT swaps translates to 61% and 42% reductions in leakage power overhead for achieving the same timing yield for independent and correlated process variations, respectively.

This paper is organized as follows. Sec. 2 motivates dominant gate criticality. Sec. 3 describes the optimization for dominant critical gate ranking for independent process variations. Sec. 4 extends dominant critical gate ranking to correlated process variations. Sec. 5 presents results for yield improvement and power reduction using low-VT swapping. Sec. 6 concludes the paper.


2. Motivation

In this section, we present results and observations for low-VT swapping based on the conventional definition of gate criticality to motivate the problem addressed in this paper. Although we use Monte Carlo simulations to compute the criticality of each gate in the circuit in the presence of process variations, techniques such as [10–13] can also be used.

Process variations: Our framework considers process variations arising from random dopant fluctuations (RDF), variations in oxide thickness, and variations in gate length. Variations due to RDF and oxide thickness are assumed to be independent, resulting in independent variations in the threshold voltage of the gates. Variations in gate lengths are assumed to be spatially correlated. The correlation coefficient between gate $g_i$ and gate $g_j$ is given by the exponential correlation function [14]:

\[
\rho(g_i, g_j) = e^{-\alpha d_{g_i,g_j}} \tag{1}
\]

where $d_{g_i,g_j}$ is the distance between gate $g_i$ and gate $g_j$ obtained after placement and $\alpha$ is the correlation function decay factor. $\alpha$ determines the degree of spatial correlation, with $\alpha = 0$ and $\alpha = \infty$ representing the completely correlated and independent cases, respectively. The 3σ of the variations for each parameter is assumed to be 25% of the mean value.
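As a minimal sketch of Eqn. 1, the snippet below builds a pairwise correlation matrix from gate coordinates. The coordinates, the value of `alpha`, and the use of Euclidean distance are assumptions made for illustration; the text only states that the distance is obtained after placement.

```python
import math

def correlation(pos_i, pos_j, alpha):
    """Exponential spatial correlation rho = exp(-alpha * d), as in Eqn. 1."""
    # Euclidean distance is an assumption; the paper does not name the metric.
    distance = math.dist(pos_i, pos_j)
    return math.exp(-alpha * distance)

def correlation_matrix(positions, alpha):
    """Pairwise correlation coefficients for a list of (x, y) gate positions."""
    return [[correlation(p, q, alpha) for q in positions] for p in positions]

# Hypothetical placement coordinates for three gates.
positions = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
rho = correlation_matrix(positions, alpha=0.5)

for row in rho:
    print(["%.3f" % value for value in row])
```

With `alpha=0.0` every entry equals 1 (the completely correlated case), and larger values of `alpha` push the off-diagonal entries toward 0 (the independent case).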

Figure 1: Comparison of delay distributions with independent and correlated process variations. [Plot: number of chips (y-axis) versus delay (x-axis), with one curve for correlated variations and one for independent variations.]

Before we present our observations on gate criticality, we examine the effect of process variations and correlations on the delay of a circuit with ten critical paths. Assume that the delay of each path has a Gaussian distribution with a mean of 15 and unit variance. Fig. 1 shows the delay distribution for 100K instances of the circuit in the presence of independent and correlated process variations. The critical path delay of the circuit is the max of the delays of the ten paths in the circuit. In the presence of independent process variations, the mean of the delay distribution is greater than 15. As the correlated component of process variations increases, the mean of the delay distribution shifts closer to its nominal value of 15, but the variance of the distribution shows an increasing trend. Hence, as the correlations increase, fewer chips fail to meet timing constraints, but they fail to meet timing by a larger value. This is not a new observation and has been noted in previous works, e.g., [14]. This observation will be useful in explaining limiting trends in timing yield improvement for low-VT swapping based on the conventional definition of gate criticality [5, 9].
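The trend described above can be reproduced with a small Monte Carlo sketch. The correlation model here is an assumption chosen for simplicity: every path delay shares one common Gaussian factor, so that any two paths have correlation `rho`. The sample count and the value `rho=0.8` are likewise illustrative and are not taken from the paper.

```python
import random
import statistics

def circuit_delay_samples(rho, n_paths=10, n_chips=20000, mean=15.0, seed=0):
    """Sample the circuit delay (max over paths) for n_chips simulated chips.

    Each path delay is Gaussian with the given mean and unit variance.
    A shared factor gives every pair of paths correlation rho (assumed model).
    """
    rng = random.Random(seed)
    shared_weight = rho ** 0.5
    local_weight = (1.0 - rho) ** 0.5
    samples = []
    for _ in range(n_chips):
        shared = rng.gauss(0.0, 1.0)
        path_delays = [
            mean + shared_weight * shared + local_weight * rng.gauss(0.0, 1.0)
            for _ in range(n_paths)
        ]
        samples.append(max(path_delays))
    return samples

independent = circuit_delay_samples(rho=0.0)
correlated = circuit_delay_samples(rho=0.8)

for name, samples in (("independent", independent), ("correlated", correlated)):
    print(name, round(statistics.mean(samples), 2),
          round(statistics.variance(samples), 2))
```

Under this model the independent case should show the larger mean and the correlated case the larger variance, which matches the qualitative behavior attributed to Fig. 1.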

Simulation setup: Each circuit is optimized and mapped to a 45nm gate library based on the predictive technology model [15]. Static gate sizing based on geometric programming [16] is used to obtain a power-optimal gate size assignment for a target delay $T_{spec}$. Placement of the optimized circuit is performed using CAPO [17]. At this point, the circuit satisfies the target delay constraint $T_{spec}$ under nominal process conditions. However, in the presence of process variations, the circuit has a low timing yield for target delay $T_{spec}$. Techniques such as logic restructuring, buffer insertion, gate resizing, and low-VT swapping have been proposed to improve the yield in the presence of process variations [4, 5]. Low-VT swapping is preferred for improving timing yield since it can be applied without modifying a place-and-routed design. Since low-VT swapping increases leakage power, it is important to minimize the leakage power overhead during low-VT swapping to improve timing yield.

Figure 2: Yield improvement using low-VT swapping based on the conventional definition of criticality for independent process variations (Ind), process variations with low spatial correlations (Low corr.), and high spatial correlations (High corr.). The initial design has a low timing yield for a target delay $T_{spec}$ in the presence of process variations. [Plot panels: (i) C499, (ii) C2670, (iii) sparc_ifu_dec, (iv) sparc_lsu_ctl; x-axis: number of low-VT swaps; y-axis: yield.]

Gate criticality: Gate criticality, defined as the probability that a gate lies on a critical path, has been used in the literature for low-VT swapping [5, 9]. We will show that low-VT swapping based on conventional gate criticality results in wasteful swapping of gates because the criticality of the gates changes after every low-VT swap. For each benchmark circuit, we use Monte Carlo simulations to obtain the critical probability of each gate in the circuit. We then rank the gates in decreasing order of criticality for low-VT swapping. The improvement in yield obtained after each swap is graphically represented in Fig. 2 for two benchmark circuits from the ISCAS benchmark suite and two modules from the OpenSPARC T1 processor. The graph for each benchmark circuit has three yield improvement curves: (i) only independent process variations, (ii) process variations with low spatial correlation, and (iii) process variations with high spatial correlation. We make two observations about the yield improvement curves.

First, the improvement in yield occurs in steps, i.e., discrete jumps in yield improvement are interspersed with regions of little or no yield improvement (flat regions). These steps are more prominent for independent process variations. The reason for the step-like improvement in yield lies in the definition of gate criticality. Since the critical probability of a gate is the probability that the gate lies on a critical path, when the criticality of a path p is translated to gate criticality, all the gates on p are affected equally by the critical probability of p. However, for speeding up the path to improve the yield using low-VT swapping, only a few dominant gates on the path need to be chosen. Hence, yield enhancement by low-VT swapping of gates in the order of their criticality leads to wasteful swapping of multiple gates on the same path instead of swapping gates on other critical paths that can lead to better improvements in yield, resulting in steps in the yield improvement curve.


Second, the number of swaps required to achieve the same timing yield increases as the correlated component of process variations increases, i.e., the slope of the yield improvement curve decreases. However, the steps in yield improvement become less prominent as the correlated component of variations increases, i.e., the discrete jumps become smaller and the flat regions become shorter. As we observed in Fig. 1, when the independent component of variations dominates, many chips violate timing, but only by a small margin. Hence, low-VT swapping of only the dominant gates on a critical path suffices and swapping based on conventional gate criticality leads to more wasteful swapping of gates. As the correlated component of variations increases, fewer chips violate timing, but by a larger margin. Hence, low-VT swapping of a gate on a critical path leads to smaller improvements in yield (smaller jumps) and it becomes necessary to swap multiple gates on a path, resulting in less wasteful swapping of gates.
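The conventional criticality metric discussed in this section can be illustrated with a small Monte Carlo sketch. The two-path toy circuit, the gate names, the nominal delays, and the choice to apply a 3σ of 25% directly to each gate delay are all assumptions for illustration; the paper applies variations to process parameters and uses full benchmark circuits.

```python
import random
from collections import Counter

def gate_criticality(paths, nominal, n_samples=5000, sigma_frac=0.25 / 3, seed=1):
    """Estimate the probability that each gate lies on the critical path.

    paths: list of paths, each a list of gate names (hypothetical circuit).
    nominal: nominal delay of each gate; delays vary independently.
    """
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(n_samples):
        delay = {g: rng.gauss(d, sigma_frac * d) for g, d in nominal.items()}
        critical = max(paths, key=lambda p: sum(delay[g] for g in p))
        hits.update(set(critical))  # every gate on the critical path is counted
    return {g: hits[g] / n_samples for g in nominal}

# Two disjoint paths with nominal delays 10.0 and 9.8 (illustrative values).
nominal = {"a": 4.0, "b": 3.0, "c": 3.0, "d": 5.0, "e": 4.8}
paths = [["a", "b", "c"], ["d", "e"]]

crit = gate_criticality(paths, nominal)
ranking = sorted(crit, key=crit.get, reverse=True)
print(crit)
print(ranking)
```

Because all gates on a path inherit the same critical probability, the three gates of the first path tie at the top of the ranking, so a rank-and-swap flow would spend its first swaps on a single path. This is the wasteful behavior described in the first observation.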

3. Dominant critical gates

Sec. 2 described the limitations of ranking the gates of a place-and-routed design for low-VT swapping based on their critical probability. In the following sections, we describe the formulation of the dominant critical gate ranking problem. This section introduces the problem formulation for independent process variations. Sec. 4 generalizes the formulation to handle correlations.

Consider a place-and-routed design such that the nominal critical path delay is equal to the target path delay $\delta$, i.e., the effect of process variations on the timing of the design has not been taken into account. In the presence of variations, to achieve a desired yield $\gamma$ for a target path delay $\delta$, the nominal critical path delay must be less than or equal to $\delta/s$, where $s \ge 1$ is called the speed-up factor. The value of the speed-up factor depends on the desired yield $\gamma$ and the process variations affecting the gates on the path. The computation of the speed-up factor is discussed at the end of this section. For this discussion, it is assumed that a speed-up factor $s$ is known.

To achieve a speed-up factor of $s$ for the path delay, each gate $g_i$ on the path must be sped up by a factor of $s_i \ge 1$. We propose an optimization problem, dominant critical gate (DCG), for computing the speed-up $s_i$ for all the gates in the circuit. The optimization problem is set up in such a manner that the speed-up values obtained by solving it reflect the dominant criticality of each gate, i.e., by ranking gates in the order of their speed-up for low-VT swapping, wasteful swapping of gates (as explained in Sec. 2) can be eliminated.

We start by analyzing the case of a single path. This will then be generalized to multiple paths and finally to a circuit later in this discussion. Consider a path with $n$ gates and a target path delay $\delta$. Let the mean delays of the gates on the path be $\delta_1, \delta_2, \ldots, \delta_n$. The speed-up of each gate $g_i$ is a variable $s_i$ in the optimization problem SP and the desired speed-up of the path is a known value $s$.

\[
\text{SP}: \quad \text{Minimize } \prod_{i=1}^{n} s_i \quad \text{subject to} \quad \sum_{i=1}^{n} \frac{\delta_i}{s_i} \le \frac{\delta}{s} \quad \text{and} \quad s_i \ge 1, \; i = 1, 2, \ldots, n \tag{2}
\]

We define the domination factor, $d_i$, of a gate $g_i$ as the ratio of the contribution of the gate to reducing the path delay to the increase in the objective function when gate $g_i$ is incrementally sped up from $\delta_i/s_i$ to $\delta_i/(s_i + \epsilon)$. Incrementally speeding up the gate with the largest domination factor takes the solution closest to feasibility per unit increase in the objective function, and thus the optimization algorithm will choose to incrementally speed up the gate on the path with the largest domination factor. For a single path, the domination factor $d_i$ of gate $g_i$ on the path is

\begin{align}
d_i &= \frac{\delta_i/s_i - \delta_i/(s_i + \epsilon)}{s_1 s_2 \cdots (s_i + \epsilon) \cdots s_n - s_1 s_2 \cdots s_n} \notag \\
    &= \frac{\delta_i \epsilon/(s_i^2 + s_i \epsilon)}{s_1 s_2 \cdots s_n + \epsilon (s_1 s_2 \cdots s_n / s_i) - s_1 s_2 \cdots s_n} \notag \\
    &= \frac{\delta_i \epsilon / s_i^2}{\epsilon (s_1 s_2 \cdots s_n / s_i)} \quad (\text{for } \epsilon \to 0) \notag \\
    &= \frac{\delta_i}{(s_1 s_2 \cdots s_n)\, s_i} \tag{3}
\end{align}

The factor $s_1 s_2 \cdots s_n$ is common to the domination factor of all gates on the path, and thus the domination factor $d_i$ for gate $g_i$ is proportional to the delay of the gate with speed-up, i.e., $\delta_i/s_i$. Hence, the optimization problem SP will incrementally speed up gates with the largest delay until the target speed-up, $s$, for the path is achieved. This can be argued to be intuitively correct since, in the presence of independent process variations, the only systematic information available is the mean delays of the gates on the path, and hence for maximum improvement in timing yield the gate with the largest delay must be sped up. Note that the objective function of minimizing the product of speed-ups, $\prod_{i=1}^{n} s_i$, ensures that only a small number of gates are assigned a speed-up value greater than 1 and thus, only the gates that dominate critical paths are sped up.
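The behavior of SP on a single path can be sketched directly. The capping rule below is not the paper's algorithm; it follows from writing SP in terms of the sped-up delays $x_i = \delta_i/s_i$, where minimizing $\prod s_i$ becomes maximizing $\sum \log x_i$ subject to $\sum x_i \le \delta/s$ and $x_i \le \delta_i$, so the largest delays are clipped to a common level. The function name and the example delays are hypothetical.

```python
def solve_single_path(delays, target_delay, speedup):
    """Speed-ups s_i >= 1 that solve SP for one path (illustrative sketch).

    The largest gate delays are capped at a common level `cap`, chosen so
    that the sped-up delays sum to target_delay / speedup.
    """
    n = len(delays)
    budget = target_delay / speedup
    if sum(delays) <= budget:
        return [1.0] * n  # constraint already met, nothing to speed up

    order = sorted(range(n), key=lambda i: delays[i], reverse=True)
    rest = sum(delays)
    for k in range(1, n + 1):
        rest -= delays[order[k - 1]]       # delays of the gates left uncapped
        cap = (budget - rest) / k          # common delay of the k capped gates
        next_delay = delays[order[k]] if k < n else 0.0
        if cap >= next_delay:              # uncapped gates are all below cap
            break
    return [max(1.0, d / cap) for d in delays]

# Hypothetical path: delays sum to the target delay 10, desired speed-up 1.25.
speedups = solve_single_path([6.0, 3.0, 1.0], target_delay=10.0, speedup=1.25)
print(speedups)
```

In this example only the gate with the largest delay receives a speed-up greater than 1, which agrees with the argument that SP speeds up the gates with the largest delay first.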

Next, we generalize this to the case of $k$ paths, $p_1, p_2, \ldots, p_k$, each with delay $\delta$ converging at a single gate $g$. Consider an optimization problem that generalizes Eqn. 3. Now there is one constraint for each path and the delay of the circuit is the maximum delay over all paths. If the gate $g$ has the maximum delay on all paths, then gate $g$ would be chosen for incremental speed-up by the same argument used for a single path. Next, if there is at least one path $p_i$ (but not all paths) on which $g$ has the maximum delay, the optimization algorithm would again choose to incrementally speed up gate $g$. This is because any other gate on $p_i$ would have a sub-optimal domination factor. Speeding up gates on other paths would be wasteful because $p_i$ would dominate the delay of the circuit. Finally, the only case that remains is when each path $p_i$ has a gate $g_i$ (different from gate $g$) with the maximum delay on path $p_i$. Similar to Eqn. 3, the domination factor, $d$, when gates $g_1, g_2, \ldots, g_k$ are incrementally sped up by $\epsilon_1, \epsilon_2, \ldots, \epsilon_k$ is given by Eqn. 4. Note that the numerator $\min_{i=1}^{k}(\delta_i \epsilon_i / s_i^2)$ is the incremental reduction in the delay at the output of gate $g$ for an incremental reduction in the delay of the $k$ paths $p_1, p_2, \ldots, p_k$. The summation in the denominator represents the first-order $\epsilon_i$ terms of the increase in the objective function. The higher-order $\epsilon_i$ terms, i.e., $\epsilon_i^2, \epsilon_i^3, \ldots$, are neglected because $\epsilon_i \to 0$.

\[
d = \frac{\min_{i=1}^{k}(\delta_i \epsilon_i / s_i^2)}{(s_1 s_2 \cdots s_n) \sum_{i=1}^{k} \epsilon_i / s_i} \quad (\text{for } \epsilon_i \to 0) \tag{4}
\]

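Since the factor $s_1 s_2 \cdots s_n$ in the denominator is common to the domination factor of all gates, it can be dropped from the expression. Further, the equality in the expression can be converted into an inequality by replacing $\sum_{i=1}^{k} \epsilon_i / s_i$ by the lower bound $k \times \min_{i=1}^{k}(\epsilon_i / s_i)$. Finally, the expression can be simplified using the scalar product inequality ($a \cdot b \le |a||b|$) for the infinity norm. Thus,

\begin{align}
d &\le \frac{\min_{i=1}^{k}(\delta_i \epsilon_i / s_i^2)}{k \times \min_{i=1}^{k}\big((s_i/\delta_i)\, \delta_i \epsilon_i / s_i^2\big)} \notag \\
d &\le \frac{\min_{i=1}^{k}(\delta_i \epsilon_i / s_i^2)}{k \times \min_{i=1}^{k}(s_i/\delta_i) \times \min_{i=1}^{k}(\delta_i \epsilon_i / s_i^2)} \notag \\
d &\le \frac{1}{k \times \min_{i=1}^{k}(s_i/\delta_i)} = \frac{1}{k} \max_{i=1}^{k}(\delta_i / s_i) \tag{5}
\end{align}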


Figure 3: With independent process variations (a) Critical weights of top ranked gates obtained from DCG and (b) Comparison of yield improvement obtained using low-VT swapping based on DCG and the conventional metric for gate criticality MC. The initial design satisfies timing for a target delay $T_{spec}$, but has a low timing yield in the presence of process variations. [Plot panels: (i) C499, (ii) C2670, (iii) sparc_ifu_dec, (iv) sparc_lsu_ctl; (a) x-axis: top ranked gates from DCG, y-axis: critical weight; (b) x-axis: number of low-VT swaps, y-axis: yield, curves: DCG and MC.]

The domination factor, $d_g$, of gate $g$ where paths $p_1, p_2, \ldots, p_k$ converge, is given by $\delta_g/s_g$. Eqn. 5 shows that the largest domination factor among $g_1, g_2, \ldots, g_k$ must exceed the domination factor of $g$ by at least a factor of $k$ in order for gate $g$ to not be chosen for incremental speed-up. The factor of $k$ arises because the gates $g_1, g_2, \ldots, g_k$ are topologically inferior to gate $g$ since $g_1, g_2, \ldots, g_k$ lie on only one critical path whereas $g$ lies on $k$ critical paths. Thus, the optimization problem is formulated so that the domination factor of each gate is scaled by the number of critical paths passing through that gate, i.e., the domination factor arising due to the topology of the circuit. Thus, the problem formulation is correctly directed towards speeding up gates that dominate the critical paths.
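The bound of Eqn. 5 can be checked numerically with a short sketch. The random ranges for the delays, speed-ups, and increments are arbitrary assumptions, and the common factor $s_1 s_2 \cdots s_n$ is dropped as in the text.

```python
import random

def domination_factor(delays, speedups, eps):
    """Eqn. 4 with the common factor s_1 s_2 ... s_n dropped."""
    numerator = min(d * e / s ** 2 for d, s, e in zip(delays, speedups, eps))
    denominator = sum(e / s for s, e in zip(speedups, eps))
    return numerator / denominator

def upper_bound(delays, speedups):
    """Right-hand side of Eqn. 5: (1/k) * max_i(delta_i / s_i)."""
    k = len(delays)
    return max(d / s for d, s in zip(delays, speedups)) / k

rng = random.Random(0)
violations = 0
for _ in range(1000):
    k = rng.randint(1, 6)  # number of converging paths
    delays = [rng.uniform(0.5, 5.0) for _ in range(k)]
    speedups = [rng.uniform(1.0, 2.0) for _ in range(k)]
    eps = [rng.uniform(1e-6, 1e-3) for _ in range(k)]
    # Small relative tolerance guards against floating-point rounding.
    if domination_factor(delays, speedups, eps) > upper_bound(delays, speedups) * (1 + 1e-9):
        violations += 1

print("violations:", violations)
```

For `k = 1` the two sides coincide, which is the single-path result $\delta_i/s_i$ obtained from Eqn. 3 once the common factor is dropped.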

Finally, we generalize the optimization problem to a circuit with $n$ gates. Since various paths in the circuit share gates, the structural properties of the circuit will play a crucial role in determining the speed-up of each gate in the optimization problem. The path-based constraints are converted into node-based arrival time constraints [16]. The optimization problem DCG is set up such that a speed-up, $s$, in the circuit delay is achieved collectively by speeding up the dominant critical gates in the circuit.

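\begin{align}
\text{DCG}: \quad & \text{Minimize } \prod_{i=1}^{n} s_i \text{ subject to} \notag \\
& \delta_i / s_i + T_j \le T_i, && j \in \text{fanin}(i) \text{ and } i = 1, 2, \ldots, n \notag \\
& T_i \le T_{spec}/s, && i \in \text{primary outputs} \notag \\
& s_i \ge 1, && i = 1, 2, \ldots, n \tag{6}
\end{align}

where
1. $s_i$ is the speed-up factor of the $i$th gate,
2. $\delta_i$ is the delay of the $i$th gate,
3. $T_i$ is the arrival time at the output of the $i$th gate, and
4. $T_{spec}$ is a specified circuit delay.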

The result of this optimization problem is a speed-up value, $s_i$, for each gate in the circuit, where $s_i$ is the dominant criticality of the gate. We have observed that setting the speed-up, $s$, to a value in the range 1.1–1.4 gives the best results in most cases. There are two interesting observations about the problem formulation DCG.

1. DCG is a geometric program (GP) optimization problem in the continuous domain. However, it is used to optimize the problem of low-VT swapping that is inherently a discrete optimization problem. A GP-based problem formulation ensures that the technique is computationally efficient and scalable to full-chip optimization.

2. DCG does not contain the notion of statistical yield or statistical timing. This is because when independent process variations dominate, the only systematic information that can be used during design is the nominal delay. DCG only uses the nominal gate delays in the problem formulation.

The solution to the optimization problem DCG provides a speed-up for each gate in the circuit. The dominant critical gate ranking is obtained by ranking the gates in decreasing order of speed-up.
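The node-based constraints of Eqn. 6 and the final ranking step can be sketched as follows. This does not solve the geometric program; it only checks a given speed-up assignment against the constraints and ranks the gates. The three-gate circuit, its delays, and the hand-picked speed-ups are hypothetical and are not the output of a GP solver.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def arrival_times(delay, fanin, speedup):
    """Tightest T_i satisfying delta_i/s_i + T_j <= T_i for every fanin j."""
    arrival = {}
    for gate in TopologicalSorter(fanin).static_order():
        latest_input = max((arrival[j] for j in fanin.get(gate, ())), default=0.0)
        arrival[gate] = latest_input + delay[gate] / speedup[gate]
    return arrival

def meets_dcg_constraints(delay, fanin, speedup, outputs, t_spec, s):
    """True if s_i >= 1 for all gates and T_i <= T_spec/s at primary outputs."""
    if any(value < 1.0 for value in speedup.values()):
        return False
    arrival = arrival_times(delay, fanin, speedup)
    return all(arrival[o] <= t_spec / s for o in outputs)

def dcg_ranking(speedup):
    """Gates in decreasing order of speed-up (dominant critical gate ranking)."""
    return sorted(speedup, key=speedup.get, reverse=True)

# Hypothetical circuit: g1 and g2 both drive g3, the only primary output.
delay = {"g1": 4.0, "g2": 3.0, "g3": 3.0}
fanin = {"g1": [], "g2": [], "g3": ["g1", "g2"]}
speedup = {"g1": 1.25, "g2": 1.0, "g3": 1.5}  # hand-picked for illustration

feasible = meets_dcg_constraints(delay, fanin, speedup, ["g3"], t_spec=7.0, s=1.25)
ranking = dcg_ranking(speedup)
print(feasible, ranking)
```

Computing each $T_i$ as the latest fanin arrival plus $\delta_i/s_i$ gives the smallest arrival times that satisfy the inequality constraints, so checking the primary outputs is sufficient to decide feasibility for a fixed assignment.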

To compare algorithm DCG with conventional gate criticality based on Monte Carlo simulations, we plot a histogram of the critical weight for the top ranked gates obtained using dominant critical gate ranking (see Fig. 3(a)). Critical weight is defined based on conventional gate criticality as the critical probability of a gate normalized by the highest critical probability among all gates. For three circuits, C2670, sparc_ifu_dec, and sparc_lsu_ctl, the top ranked gates in DCG have a wide range of critical weights. The circuit C499 is an exception because the paths in C499 are well-balanced, resulting in a critical weight close to 1 for most of the gates. This wide range of critical weights arises because algorithm DCG assigns a high rank to only the dominant critical gates on critical paths and then ranks other gates on less critical paths that offer higher potential for timing yield improvement, even though they may have a small critical weight. We support this claim by plotting the timing yield improvement of the circuits using low-VT swapping. This plot is shown in Fig. 3(b) for the four benchmark circuits and a target yield of 98%. The plot marked MC represents the timing yield improvement obtained by ranking gates based on the conventional criticality metric and the plot marked DCG represents the timing yield improvement obtained by ranking gates based on algorithm DCG. On average, algorithm DCG requires only about half the number of low-VT swaps as compared to MC for the same yield. The lower number of low-VT swaps translates to lower leakage power overhead to achieve the same timing yield, as reported in Tables 1 and 2 for all benchmark circuits.
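The definition of critical weight used in Fig. 3(a) amounts to a simple normalization, sketched below in Python. The critical probabilities are made-up values for illustration only; in the paper they are obtained from Monte Carlo simulation. Returning all-zero weights when no gate is ever critical is an assumption added to keep the sketch well defined.

```python
def critical_weights(critical_prob):
    """Normalize each gate's critical probability by the largest one."""
    peak = max(critical_prob.values())
    if peak == 0.0:
        # No gate was ever critical; define all weights as zero (assumption).
        return {gate: 0.0 for gate in critical_prob}
    return {gate: prob / peak for gate, prob in critical_prob.items()}


# Hypothetical critical probabilities for four gates.
critical_prob = {"g1": 0.80, "g2": 0.40, "g3": 0.08, "g4": 0.00}
weights = critical_weights(critical_prob)
print(weights)
```

A gate with weight 1 lies on a critical path as often as the most critical gate in the circuit, while a gate with a small weight is rarely critical under the conventional definition.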


[Figure 4 plots, panels (i) C499, (ii) C2670, (iii) sparc_ifu_dec, (iv) sparc_lsu_ctl. (a) Critical weight versus top ranked gates from DCG. (b) Yield versus number of low-VT swaps, with curves DCG and MC.]

Figure 4: With correlated process variations: (a) critical weights of top ranked gates obtained from DCG and (b) comparison of yield improvement obtained using low-VT swapping based on DCG and on the conventional metric for gate criticality, MC. The initial design satisfies timing for a target delay Tspec, but has a low timing yield in the presence of process variations.

4. DCG with correlated process variations

In this section, we extend algorithm DCG to handle correlated process variations. The use of nominal gate delays, δi, in algorithm DCG is justified when independent variations are a dominant component of the total process variations. This is because gate delay variations are decoupled in the presence of independent process variations and the only systematic information about post-manufacturing delay of the gates available during design is their nominal delay. However, when the correlated component in process variations is also considered, the nominal gate delays do not accurately capture the post-manufacturing delay distribution of the gates, and thus we would expect the algorithm DCG to be less effective. For the exponential spatial correlation model, we observed that DCG was less effective when the correlation between two gates separated by a unit distance accounted for more than 15–20% of the total variations at each gate.

Algorithm DCG can be extended to leverage correlation information in the process variations. This is accomplished by using correlated gate delay distribution samples instead of the nominal gate delay, δi, in algorithm DCG. The correlated gate delay distribution samples can be generated from the correlation matrix of the process variations. The speed-up si obtained for each gate gi from algorithm DCG is then averaged over multiple correlated gate delay distribution samples to obtain the speed-up in the presence of correlated process variations. The final speed-up value defines the dominant gate criticality ranking of the gates. In this approach, the quality of the solution depends on the number of correlated gate delay distribution samples used. However, we observed that for the largest benchmark circuits, using more than a few hundred samples offers diminishing returns: the quality of the solution improves only marginally while the runtime continues to increase. In this paper, we average the speed-up of each gate over 1000 correlated delay distribution samples to obtain the final speed-up for each gate.
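The sampling-and-averaging procedure can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the paper: the exponential correlation is assumed to take the form exp(-distance / corr_length), the delay of each gate is assumed to be its nominal delay scaled by a correlated Gaussian factor, the gate coordinates and parameter values are hypothetical, and `toy_solver` is only a stand-in for solving DCG on one delay sample (the geometric program itself is not reproduced here).

```python
import math
import random


def exp_correlation(positions, corr_length):
    """Assumed model: rho_ij = exp(-distance_ij / corr_length)."""
    return [[math.exp(-math.hypot(p[0] - q[0], p[1] - q[1]) / corr_length)
             for q in positions] for p in positions]


def cholesky(matrix):
    """Lower-triangular factor L with L * L^T == matrix (positive definite)."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def correlated_delays(nominal, sigma, low, rng):
    """One sample: nominal delay scaled by (1 + sigma * correlated normal)."""
    z = [rng.gauss(0.0, 1.0) for _ in nominal]
    return [d * (1.0 + sigma * sum(low[i][k] * z[k] for k in range(i + 1)))
            for i, d in enumerate(nominal)]


def average_speedup(nominal, sigma, corr, solver, samples, seed=0):
    """Average the per-gate speed-up returned by `solver` over all samples."""
    rng = random.Random(seed)
    low = cholesky(corr)
    total = [0.0] * len(nominal)
    for _ in range(samples):
        speedup = solver(correlated_delays(nominal, sigma, low, rng))
        total = [t + s for t, s in zip(total, speedup)]
    return [t / samples for t in total]


def toy_solver(delays):
    """Stand-in only: NOT the DCG geometric program."""
    slowest = max(delays)
    return [1.0 + 0.4 * d / slowest for d in delays]


# Hypothetical placement (x, y) and nominal delays for three gates.
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
nominal = [1.0, 1.2, 1.5]
corr = exp_correlation(positions, corr_length=2.0)
avg = average_speedup(nominal, 0.1, corr, toy_solver, samples=1000)
ranking = sorted(range(len(avg)), key=lambda i: -avg[i])
print(ranking)
```

The Cholesky factor turns independent standard normal draws into draws with the requested correlation matrix, which is one common way to generate correlated samples. Replacing `toy_solver` with a real DCG solve would leave the averaging loop unchanged.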

As noted in Fig. 1, from the perspective of timing yield, correlated process variations cause fewer chips to fail, but by a larger timing margin. Hence, from a dominant gate criticality perspective, multiple gates on a path would have a high dominant gate criticality rank in order to counter the effect of process variations on the path. This effect is captured by the correlated delay distribution samples and thus incorporated in the final speed-up of each gate. The plot of critical weights shown in Fig. 4(a) illustrates this effect. Although the overall distribution of critical weights is similar to the case of independent process variations shown in Fig. 3(a), the number of gates chosen for each critical weight is higher. Fig. 4(b) compares the yield improvement curves for the four benchmark circuits using algorithm DCG and conventional gate criticality.

5. Results

In this section, we present and compare results for yield improvement using low-VT swapping based on gate criticality ranking obtained using algorithm DCG and the conventional definition of gate criticality. The effectiveness of each criticality ranking will be assessed based on the number of low-VT swaps and leakage power overhead required to achieve a target yield. The comparison will demonstrate the effectiveness of algorithm DCG in identifying small sets of dominant critical gates to achieve the same timing yield with a lower leakage power overhead as compared to ranking them using conventional gate criticality.
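The evaluation loop implied by this comparison (swap gates in rank order until the target yield is met, then report the number of swaps and the leakage overhead) can be sketched as below. Everything about the circuit is hypothetical: the three paths, the nominal delays, the independent Gaussian delay model, and the assumption that all gates leak equally in the base design. The 20% delay improvement and 11X leakage increase are the library averages quoted in this section.

```python
import random

# Hypothetical toy circuit: each path is a list of gates; delays are nominal.
PATHS = [["a", "c", "e"], ["b", "c", "e"], ["b", "d", "e"]]
NOMINAL = {"a": 1.0, "b": 1.2, "c": 1.0, "d": 1.5, "e": 0.8}
DELAY_GAIN = 0.20       # low-VT cell is about 20% faster (library average)
LEAKAGE_FACTOR = 11.0   # low-VT cell leaks about 11X more (library average)


def timing_yield(swapped, t_spec, sigma, runs=2000, seed=1):
    """Monte Carlo yield: fraction of samples whose longest path meets t_spec.

    The same seed is reused on every call so that successive swap sets are
    compared on identical random samples.
    """
    rng = random.Random(seed)
    passed = 0
    for _ in range(runs):
        delay = {}
        for gate, nominal in NOMINAL.items():
            base = nominal * (1.0 - DELAY_GAIN) if gate in swapped else nominal
            delay[gate] = base * (1.0 + sigma * rng.gauss(0.0, 1.0))
        if max(sum(delay[g] for g in path) for path in PATHS) <= t_spec:
            passed += 1
    return passed / runs


def swap_until_yield(ranking, target, t_spec, sigma):
    """Swap gates in rank order until the estimated yield reaches target."""
    swapped = []
    achieved = timing_yield(swapped, t_spec, sigma)
    for gate in ranking:
        if achieved >= target:
            break
        swapped.append(gate)
        achieved = timing_yield(swapped, t_spec, sigma)
    # Leakage overhead in percent, assuming equal base leakage per gate.
    overhead = 100.0 * len(swapped) * (LEAKAGE_FACTOR - 1.0) / len(NOMINAL)
    return swapped, achieved, overhead


ranking = ["b", "d", "e", "c", "a"]  # hypothetical criticality ranking
swapped, achieved, overhead = swap_until_yield(ranking, 0.95, 3.5, 0.05)
print(len(swapped), achieved, overhead)
```

Running the same loop with two different rankings and comparing the resulting swap counts and overheads mirrors the MC versus DCG comparison reported in the tables.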

The techniques are compared using 12 benchmark circuits from the ISCAS benchmark suites and modules from the OpenSPARC T1 processor. The simulation setup used for comparison was described in Sec. 2. On average over various gates in the library, low-VT gate cells improve the delay by 20% and increase the leakage power dissipation of a gate by 11X. Tables 1 and 2 present results for independent and correlated process variations, respectively. The name and number of gates for each benchmark circuit are reported in the first two columns of the tables. The critical probabilities of the gates are obtained using 100K Monte Carlo runs for each benchmark circuit. The gates are then ranked in decreasing order of critical probability for performing yield improvement based on low-VT swapping. The results for this technique are reported in the column MC in the tables for target yields of 95% and 98%. Results for yield improvement based on dominant critical gate ranking obtained from algorithm DCG for the same target yields are reported in the column DCG. The number of low-VT swaps and the leakage power overhead over the base design (without any low-VT swaps) are reported in the columns “No. swaps” and “Leakage ovh.”, respectively, for each technique and yield combination. The runtime for each technique in seconds is indicated under “Runtime”.
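For readers unfamiliar with the MC baseline, the following sketch estimates critical probabilities by Monte Carlo simulation on a small directed acyclic netlist. The netlist, the nominal delays, and the independent Gaussian delay model are hypothetical and serve only to show the counting procedure. The paper uses 100K runs per benchmark circuit, whereas the example call uses far fewer.

```python
import random

# Hypothetical toy netlist: gate -> fan-in gates, listed in topological order.
FANIN = {"a": [], "b": [], "c": ["a", "b"], "d": ["b"], "e": ["c", "d"]}
NOMINAL = {"a": 1.0, "b": 1.2, "c": 1.0, "d": 1.5, "e": 0.8}
OUTPUTS = ["e"]


def critical_path(delay):
    """Return the gates on the longest path for one set of gate delays."""
    arrival, pred = {}, {}
    for gate in FANIN:  # dict order is topological by construction
        latest = max(FANIN[gate], key=lambda f: arrival[f], default=None)
        pred[gate] = latest
        arrival[gate] = delay[gate] + (arrival[latest] if latest else 0.0)
    gate = max(OUTPUTS, key=lambda o: arrival[o])
    path = []
    while gate is not None:  # walk back from the latest output
        path.append(gate)
        gate = pred[gate]
    return path


def critical_probabilities(runs, sigma, seed=0):
    """Fraction of Monte Carlo runs in which each gate is on the critical path."""
    rng = random.Random(seed)
    counts = dict.fromkeys(FANIN, 0)
    for _ in range(runs):
        delay = {g: max(0.0, rng.gauss(NOMINAL[g], sigma * NOMINAL[g]))
                 for g in FANIN}
        for gate in critical_path(delay):
            counts[gate] += 1
    return {g: counts[g] / runs for g in FANIN}


prob = critical_probabilities(runs=5000, sigma=0.1)
mc_ranking = sorted(prob, key=lambda g: -prob[g])
print(mc_ranking)
```

Sorting the gates by decreasing critical probability gives the conventional ranking that the MC columns of the tables are based on.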

Results indicate that timing yield improvement using low-VT swapping of gates based on algorithm DCG requires 57% and 32% fewer swaps than the conventional metric of gate critical probability for independent and correlated process variations, respectively. The reduced number of low-VT swaps translates to 61% and 42% reductions in leakage power overhead for achieving the same timing yield in the presence of independent and correlated process variations, respectively. Note that we report the reduction in average leakage power. The actual reductions would be much higher in the presence of process variations due to the exponential dependence of leakage power on process variation parameters [18].

Table 1: Comparison of yield improvement using low-VT swapping for independent process variations

                         MC                                                                  DCG
                         Yield = 0.95                 Yield = 0.98                 Runtime   Yield = 0.95                 Yield = 0.98                 Runtime
Circuit           Gates  No. swaps  Leakage ovh. (%)  No. swaps  Leakage ovh. (%)  (sec)     No. swaps  Leakage ovh. (%)  No. swaps  Leakage ovh. (%)  (sec)
b9                95     11         118               12         125               14.8      5          41                6          52                0.03
C432              212    17         120               19         132               35.8      14         95                18         130               0.22
C880              348    5          28                10         45                64.9      6          28                7          30                0.26
C499              574    95         199               96         200               111.3     32         94                41         110               0.75
C2670             767    62         88                63         90                152.8     29         40                31         42                0.45
C5315             1450   56         51                57         59                388.9     23         16                27         19                1.63
C7552             1782   75         43                143        86                593.3     16         11                18         12                2.24
sparc_tlu_intctl  253    53         354               53         354               45.8      19         95                20         98                0.25
sparc_ifu_dec     884    79         103               91         117               185.8     23         23                26         27                1.25
lsu_excpctl       729    52         83                64         100               147.8     16         22                18         26                0.34
lsu_stb_rwctl     652    53         177               54         179               129.9     21         72                22         73                0.67
lsu_stb_ctl       897    117        220               148        248               180.1     22         43                26         49                1.09
Average           –      –          132               –          145               –         –          48                –          56                –

Table 2: Comparison of yield improvement using low-VT swapping for spatially correlated process variations

                         MC                                                                  DCG
                         Yield = 0.95                 Yield = 0.98                 Runtime   Yield = 0.95                 Yield = 0.98                 Runtime
Circuit           Gates  No. swaps  Leakage ovh. (%)  No. swaps  Leakage ovh. (%)  (sec)     No. swaps  Leakage ovh. (%)  No. swaps  Leakage ovh. (%)  (sec)
b9                95     12         127               13         137               18.9      6          52                10         96                46.2
C432              212    110        469               134        570               80.6      33         177               38         196               166.2
C880              348    32         107               38         122               233.4     26         73                34         94                232
C499              574    211        410               233        450               636.9     142        260               182        323               585
C2670             767    79         110               84         116               1251.7    52         73                82         105               581.5
C5315             1450   176        118               178        119               3870      91         56                112        69                1518.4
C7552             1782   153        86                156        88                9620.6    53         29                72         38                2246.6
sparc_tlu_intctl  253    60         375               68         402               119.2     33         160               41         207               138.5
sparc_ifu_dec     884    99         125               127        155               1403.3    88         96                114        125               1270
lsu_excpctl       729    67         103               82         121               1035.2    48         67                63         85                476
lsu_stb_rwctl     652    71         202               97         236               845.8     45         125               67         172               390.4
lsu_stb_ctl       897    147        248               155        254               1515.6    47         78                60         103               692.9
Average           –      –          207               –          231               –         –          104               –          134               –
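As a worked check, the relative reduction in average leakage overhead can be recomputed from the Average rows of Tables 1 and 2. The sketch below does only that arithmetic. For the 98% yield target it gives about 61% and 42%, which agrees with the reductions quoted above; whether the headline figures were derived from exactly these entries is an assumption on our part.

```python
def reduction_pct(mc, dcg):
    """Relative reduction of DCG over MC, in percent."""
    return 100.0 * (mc - dcg) / mc


# Average leakage overhead (%) from Tables 1 and 2: (MC, DCG).
AVERAGE_LEAKAGE_OVH = {
    ("independent", 0.95): (132, 48),
    ("independent", 0.98): (145, 56),
    ("correlated", 0.95): (207, 104),
    ("correlated", 0.98): (231, 134),
}

reductions = {key: reduction_pct(mc, dcg)
              for key, (mc, dcg) in AVERAGE_LEAKAGE_OVH.items()}
for (variation, target), value in reductions.items():
    print(f"{variation:12s} yield={target:.2f} reduction={value:.1f}%")
```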

6. Conclusions

Power and yield optimization techniques in the presence of process variations rely on the identification of a set of critical gates that offer the best improvement in timing yield with the least increase in power. This paper demonstrated that improving timing yield using low-VT swapping based on conventional criticality ranking of gates is ineffective and results in high leakage power overhead. To address this problem, we proposed dominant critical gate ranking and demonstrated that low-VT swapping based on dominant critical gate ranking significantly reduces the leakage power overhead required to achieve the same timing yield in the presence of both independent and spatially correlated process variations.

References

[1] S. Borkar et al., “Parameter variations and impact on circuits and microarchitecture,” in Proc. Design Automation Conference, pp. 338–342, 2003.

[2] G. Roy et al., “Simulation study of individual and combined sources of intrinsic parameter fluctuations in conventional nano-MOSFETs,” IEEE Trans. Electron Devices, vol. 53, pp. 3063–3070, 2006.

[3] A. Srivastava, D. Sylvester, and D. Blaauw, Statistical analysis and optimization for VLSI: Timing and power. New York, NY: Springer, 2006.

[4] N. Hanchate et al., “Statistical gate sizing for yield enhancement at post layout level,” in Proc. Annual Symposium on VLSI, pp. 245–252, 2007.

[5] T. Luo et al., “Total power optimization combining placement, sizing and multi-VT through slack distribution management,” in Proc. Asia and South Pacific Design Automation Conference, pp. 352–357, 2008.

[6] A. Davoodi et al., “Variability inspired implementation selection problem,” in Proc. Intl. Conference Computer-aided Design, pp. 423–427, 2004.

[7] H. Chou et al., “Fast and effective gate-sizing with multiple-VT assignment using generalized lagrangian relaxation,” in Proc. Asia and South Pacific Design Automation Conference, pp. 381–386, 2005.

[8] S. Shah et al., “Discrete VT assignment and gate sizing using a self-snapping continuous formulation,” in Proc. Intl. Conference Computer-aided Design, pp. 705–712, 2005.

[9] D. Nguyen et al., “Minimization of dynamic and static power through joint assignment of threshold voltages and sizing optimization,” in Proc. Intl. Symposium on Low Power Electronics and Design, pp. 158–163, 2003.

[10] F. Wane et al., “A novel criticality computation method in statistical timing analysis,” in Proc. Design Automation and Test in Europe, pp. 1–6, 2007.

[11] J. Xiong et al., “Incremental criticality and yield gradients,” in Proc. Design Automation and Test in Europe, pp. 1130–1135, 2008.

[12] X. Li et al., “Defining statistical timing sensitivity for logic circuits with large-scale process and environmental variations,” IEEE Trans. Computer-aided Design, vol. 27, pp. 1041–1054, 2008.

[13] H. Mogul et al., “Fast and accurate statistical criticality computation under process variations,” IEEE Trans. Computer-aided Design, vol. 28, pp. 350–363, 2009.

[14] B. Liu, “Spatial correlation extraction via random field simulation and production chip performance regression,” in Proc. Design Automation and Test in Europe, pp. 527–532, 2008.

[15] “Predictive Technology Model (PTM).” Please visit the URL http://www.eas.asu.edu/~ptm/ for further details.

[16] S. Boyd et al., “Digital circuit optimization via geometric programming,” Operations Research, vol. 53, no. 6, pp. 899–932, 2005.

[17] “CAPO Place and Route tool.” Please visit the URL http://vlsicad.eecs.umich.edu/BK/PDtools/ for further details.

[18] H. Chang et al., “Full-chip analysis of leakage power under process variations, including spatial correlations,” in Proc. Design Automation Conference, pp. 523–528, 2005.
