
914 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 55, NO. 3, APRIL 2008

On-Chip Power-Grid Simulation Using Latency Insertion Method

Subramanian N. Lalgudi, Madhavan Swaminathan, Fellow, IEEE, and Yaron Kretchmer, Member, IEEE

Abstract—Ensuring the integrity of the power supply in the power distribution networks (PDNs) of a chip is essential for building reliable high-performance chips. To ensure power integrity, accurate and memory- and time-efficient approaches for simulating the power-supply noise in the on-chip PDN are essential. In this paper, a finite-difference formulation based on the latency insertion method (LIM) has been employed for simulating the power-supply noise in the on-chip PDN. A new common-mode type equivalent circuit has been proposed. In this equivalent circuit, a capacitance to ideal ground may not be present at all the nodes. Further, the nodes can be capacitively coupled to each other. To avoid inverting a large nonbanded matrix, a small capacitance to ground is added to a node that did not have any capacitance to ground, and a small series inductance is added to any floating capacitor that did not have any series inductance. Approximate closed-form expressions to compute the values of these capacitances to ground and series inductances have been proposed. The accuracy of the LIM-enabled transient simulation and the accuracy of the proposed closed-form expressions have been demonstrated. The memory and time complexity of the simulation for each time step have been shown to be $O(N)$ each, where $N$ is the number of nodes in the equivalent circuit. A stability condition is derived for the first time for multidimensional inhomogeneous RLC circuits. An upper bound on the time step is derived from the stability condition. Using this bound on the time step, the runtime of the overall transient simulation has been estimated to be approximately proportional to […] for $N$ in the order of millions.

Index Terms—Computational complexity, explicit, floating capacitor, implicit, latency insertion method (LIM), power distribution network (PDN).

I. INTRODUCTION

THE power-supply noise [1] in the on-chip power distribution networks (PDNs) is becoming increasingly worse with the increase in the clock frequency and the density of the switching circuits, and with the decrease in the supply-voltage levels. Therefore, ensuring power integrity is a challenge in designing future high-performance chips.

Analyzing the on-chip PDN through modeling and simulation in the various stages of the design is a cost-effective way [2] of ensuring the power integrity. In a pre-layout design stage, the on-chip PDN analysis involves constructing simplified electrical equivalent circuits [3] of the on-chip PDN and the switching sources, and solving the resulting circuit problem. Implicit or explicit circuit simulation methods [4] can be used for discretizing Kirchhoff's current and voltage equations. A direct solver [5] or an iterative solver [6] can be employed to solve the resulting sparse, nonbanded linear system. Since repeated "what-if" analysis is required in a pre-layout stage, the simulation methods and the solvers have to be memory- and time-efficient (computationally efficient), and accurate. The biggest challenge in the on-chip PDN simulation is the problem size (number of nodes), which is usually large (in the order of millions). For such large problems, general-purpose circuit simulators such as SPICE (Simulation Program with Integrated Circuit Emphasis) [7] are computationally inefficient. Therefore, there is a need for developing simulation approaches for the on-chip PDN analysis that are computationally more efficient than SPICE.

Manuscript received April 23, 2006; revised May 10, 2007, and October 8, 2007. This work was supported partly by Altera Corporation and partly by Semiconductor Research Corporation. This paper was recommended by Associate Editor R. Puri.

S. N. Lalgudi and M. Swaminathan are with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA (e-mail: [email protected]; [email protected]).

Y. Kretchmer is with Altera Corporation, San Jose, CA 95134 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TCSI.2008.918223

Implicit (numerical integration) methods for the on-chip PDN transient simulation have been proposed in [8]–[13]. These methods are unconditionally stable (i.e., there is no restriction on the time step). They require solving a matrix system at every time step of the transient simulation. The methods in [8]–[10] are based on iterative solvers and can suffer from accuracy or slow-convergence problems. The rest are based on direct solvers and/or on statistical solvers. The hierarchical approach [11] manages the complexity, but can compromise accuracy. The random-walk method [12] guarantees a linear run-time and memory as long as the constant of proportionality is much smaller than the number of nodes, $N$. This method can become inefficient [13] when the voltages of all the nodes have to be computed or when a high accuracy is needed.

Explicit (numerical integration) methods based on a finite-difference formulation for the on-chip PDN transient simulation have been proposed in [15]–[17]. The methods in [16] and [17] provide a transient solution (for equivalent circuits arising out of on-chip PDNs) that:

1) is as accurate as the transient solution from SPICE;

2) requires $O(N)$ memory;

3) requires $O(N)$ time per time step;

4) guarantees (1)–(3):

a) independent of the on-chip PDN geometry (regular or irregular);

b) independent of the packaging technologies (flip-chip or wire-bond);

c) even if all node voltages are computed at each time step (required for tracking the noise propagation with time).



However, these methods are not unconditionally stable (i.e., these methods have an upper bound on the time step). The maximum time step, $\Delta t_{\max}$, is dependent on the smallest inductance and capacitance in the circuit, and can become small (in the order of femtoseconds). In this paper, an explicit method is used for the simulation.

The explicit method in [16] and [17] has two problems in the on-chip PDN equivalent circuit that can potentially affect the $O(N)$ memory and time complexities per time step of the method. In this paper, these two problems have been solved while still maintaining the original advantages of the method. In [16] and [17], a distributed $\pi$-type RLC equivalent circuit has been employed for modeling the power and the ground lines in the on-chip PDN. Loop-based quantities are used in this equivalent circuit, and the capacitance is dropped to ideal ground. The per-unit-length (p.u.l.) loop resistance, loop inductance, and capacitance of a line have been extracted assuming that the return currents flow in the lines in the coplanar metal layer and in the lines in the alternate metal layers. Since the lines in the adjacent layers are routed orthogonally, there is no inductive coupling between lines in the adjacent layers, and therefore, their effect is ignored in this extraction. There are two problems in this procedure. 1) Though the lines in the adjacent layers do not affect the inductance and the resistance extraction, they can affect the capacitance extraction: the lines in the adjacent metal layer can shield the electric flux and hence can block the flux from reaching the lines in the alternate metal layers; therefore, the capacitance is overestimated when the effect of the lines in the adjacent layers is not considered. 2) Since the capacitance is actually between a line and its return path, and since the lines comprising the return path are nonideal, the capacitance should not be dropped to ideal ground; instead, it has to be inserted between two nonideal nodes.

These two problems have been addressed by modifying the capacitance extraction procedure and the equivalent circuit. The first problem can be corrected by considering only the capacitances between lines in the coplanar metal layer and between lines in the adjacent metal layers. The second problem can be corrected by placing these capacitances between nonideal nodes, i.e., these capacitances are now floating capacitances. Therefore, a new equivalent circuit for the on-chip PDN has been proposed. Two problems were foreseen in simulating the power-supply noise in the new corrected equivalent circuit using the approach proposed in [17]. 1) In an interdigitated power grid (power and ground lines alternate), which is the most common type of power grid, the coplanar line-to-line capacitances and the adjacent-layer line-to-line capacitances couple all nodes in a cross section of the on-chip PDN. As a result, the number of such nodes can be a function of $N$. Since in [17] the voltages of nodes that are capacitively coupled are solved simultaneously, a linear system whose size is a function of $N$ has to be solved. Since there can be many such sets of capacitively coupled nodes (the number of sets can also be a function of $N$), the node voltages of all nodes are updated by solving many (dependent on $N$) linear systems whose sizes are dependent on $N$. As a result, the memory and time complexity of updating the voltages of all nodes for a particular time step cannot be strictly guaranteed to be $O(N)$. 2) Since the line-to-line capacitances are now floating, there can be some nodes in the new equivalent circuit that are connected to their neighbors only through series resistor-inductor branches and do not have a capacitance to (ideal) ground. For such a node, Kirchhoff's current law (KCL) at the node would not relate the derivative of the node voltage to the currents in the branches connected to the node. Such a relation is essential for updating the node voltages and the branch currents independently, and is required by the approaches in [15]–[17]. In this paper, these two simulation problems have been overcome while maintaining the $O(N)$ memory and time complexity of the simulation for each time step.

Recently, an explicit simulation approach called the latency insertion method (LIM) [18] has been proposed for large networks in which the nodes need not have capacitance to ground and can be coupled directly to the other nodes through either a resistive or a capacitive branch. In the LIM, a small capacitance to ideal ground is added to a node that did not have a capacitance to ground, and a small series inductance is added to a branch that did not have a series inductance. By adding these extra circuit elements (referred to as fictitious elements in this paper), the method avoids inverting a large nonbanded system. The formulations in [16] and [17] were originally based on the LIM, but did not have to use these extra elements. However, to solve the two simulation problems above, the fictitious elements are added in this paper. Adding fictitious elements causes two problems, one related to the accuracy and the other related to the time complexity.

When fictitious elements are added, accuracy can be compromised. As a result, their values have to be kept small. In the original LIM [18], the values of the fictitious elements are computed by repeating the simulation with successively reduced values of these elements until the accuracy is no longer compromised. Such a trial-and-error approach to computing these values can become prohibitive in terms of time in circuits with a large $N$, especially in the on-chip PDN equivalent circuits. To avoid this trial-and-error approach, the values of the fictitious elements have to be computed prior to the simulation. However, for generic circuits, such computation is difficult. In this paper, closed-form expressions for computing the values of the fictitious elements have been proposed for the new equivalent circuits of the on-chip PDN. Therefore, the fictitious element values are known before the transient simulation, avoiding the cumbersome trial-and-error approach to computing them.
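The trial-and-error procedure attributed above to the original LIM can be sketched as follows. This is only a hedged illustration: the function name, the `simulate` callback, `shrink`, `tol`, and `max_runs` are hypothetical, and the stopping rule (two successive runs agree within a tolerance) is an assumption rather than the criterion used in [18]. The point of the sketch is that every candidate value costs one full transient simulation.

```python
def refine_by_trial_and_error(simulate, value, shrink=0.5, tol=1e-3, max_runs=50):
    """Shrink a fictitious element value until the simulated waveform stops changing.

    simulate(value) is a hypothetical callback that runs one full transient
    simulation and returns the waveform as a list of samples.
    Returns the accepted value and the number of simulations that were run.
    """
    previous = simulate(value)
    runs = 1
    while runs < max_runs:
        value *= shrink                      # try a smaller fictitious element
        current = simulate(value)            # one more full transient run
        runs += 1
        change = max(abs(a - b) for a, b in zip(current, previous))
        if change < tol:                     # waveform no longer sensitive to the value
            return value, runs
        previous = current
    raise RuntimeError("fictitious value did not converge within max_runs")


# Toy stand-in for a simulator: the error grows linearly with the fictitious value.
accepted, n_runs = refine_by_trial_and_error(lambda c: [1.0 + c], 1.0)
```

With the toy model, eleven simulations are needed before the value is accepted, which is the kind of repeated cost that a closed-form expression evaluated before the simulation avoids.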

As the values of the fictitious elements are kept small, the maximum time step of the simulation would be affected. Consequently, the number of time steps can become large. Since the maximum time step (and hence the number of time steps) is dependent only on the smallest inductance and capacitance values, it is independent of $N$. Therefore, the time complexity of the transient simulation is theoretically linear in $N$. However, in practice, the time complexity is usually more and is not known quantitatively. This is because a realistic estimate for the time step is not known. However, estimating the time complexity is important in assessing the relative merits and demerits of the approach and in finding methods that improve it. It has been shown in this paper that the runtime of the whole transient simulation is approximately proportional to […] for practical problems (defined as […]).



Finally, in the original LIM [19], the upper bound for the time step is derived only for one-dimensional homogeneous RLC circuits. However, the equivalent circuit of the on-chip PDN is a 3-D inhomogeneous RLC circuit. For such circuits, it is difficult to extend the stability analysis proposed in [19]. This difficulty is due to the use of the von Neumann method [20] for the stability analysis. This method requires the circuits to be homogeneous and infinitely long. In this paper, an expression for the upper bound of the time step of the transient simulation is proposed for multidimensional inhomogeneous RLC circuits. The circuit need not be infinitely long and can also be discontinuous. This derivation is based on the recently introduced stability analysis procedure [21], [22] for the FDTD method using the energy method [23]. The new derivation does not suffer from the limitations of the von Neumann method.

In this paper, the solution to the two simulation problems (discussed earlier in this section) that arose in the explicit method [17] due to the changes in the on-chip PDN equivalent circuit has been presented by adding fictitious elements as in the LIM. The presented solution preserves the original advantages (listed earlier in this section) of the explicit method [17] even for the new on-chip PDN equivalent circuit. Closed-form expressions for computing the values of the fictitious elements have been proposed. Therefore, the fictitious element values can be computed prior to the simulation, and hence, the time that would otherwise be incurred in the trial-and-error approach of finding the fictitious element values is avoided. The effect of the fictitious elements on the maximum time step has been found. It has been found that the maximum time step is reduced further, and this time step can be in the order of femtoseconds. The runtime for the overall simulation has been estimated to be proportional to […] for $N$ in the order of millions of nodes. Finally, an analytical stability condition for the LIM has been derived using the energy method for multidimensional inhomogeneous RLC circuits.

The contributions of this paper are as follows.

1) A new common-mode type equivalent circuit for the on-chip power grids.

2) A LIM-enabled formulation for the power-grid transient simulation for the new common-mode type equivalent circuit, guaranteeing $O(N)$ complexity per time step.

3) Avoiding the trial-and-error-based approach of computing the fictitious element values through a closed-form-based approach.

4) Derivation of an upper bound on the time step of the semi-implicit scheme-enabled LIM for multidimensional inhomogeneous RLC circuits.

5) An estimate of the practical runtime of the proposed LIM-based transient simulation on the proposed equivalent circuit.

The rest of the paper is organized as follows. In Section II, the new equivalent circuit of the on-chip PDN has been described. In Section III, the LIM has been described. In Section IV, the analytical stability condition for the semi-implicit scheme-enabled LIM for inhomogeneous RLC circuits has been derived. In Section V, the proposed LIM-enabled formulation for the power-grid simulation has been described. In Section VI, the approximate closed-form expressions for the values of the fictitious elements have been derived. In Section VII, the memory and time complexities of the LIM for the on-chip PDN transient simulation have been derived. In Section VIII, the accuracy of the LIM-enabled transient simulation and the accuracy of the proposed closed-form expressions have been demonstrated. Finally, in Section IX, the conclusions have been reported.

Fig. 1. Simplified 3-D view of an on-chip PDN with 3 metal layers; M1 is the metal layer closest to the silicon substrate; M3 is the metal layer farthest from the substrate; and M2 is the metal layer between M1 and M3.

Fig. 2. The equivalent circuit of the on-chip PDN shown in Fig. 1.

II. EQUIVALENT CIRCUIT MODELS OF THE ON-CHIP PDN AND THE SWITCHING SOURCES

In this section, the new equivalent circuit of the on-chip PDN and the equivalent circuits employed for the switching sources have been described. A simplified 3-D view of an on-chip PDN with three metal layers, shown in Fig. 1, has been considered for the simulation. The equivalent circuit of the on-chip PDN in Fig. 1 is shown in Fig. 2. The conducting ground plane (see Fig. 1) beneath the substrate is assumed to be ideal. The power and the ground lines, marked as Vdd and Vss, respectively, in Fig. 1, are modeled separately by a distributed series resistor-inductor model. The number of segments in a line in a metal layer is determined by the spacing of lines in the metal layers above and below, and by the minimum wavelength present in the switching sources (specifically, by one-tenth of the minimum wavelength). The vias are modeled by a lumped series resistor-inductor model.



The line-to-line capacitance between lines in adjacent layers, referred to as the crossover capacitance [16] (see Fig. 1), comprises both the overlap-area capacitance and the fringing capacitance. Since the metal layers below and above a metal layer usually shield the electric flux lines, the crossover capacitances between lines in nonadjacent layers are usually small and are therefore not modeled. The crossover capacitor is present at locations where lines in adjacent layers cross over. This capacitor has a much higher impedance than the via, even at the highest frequency of operation. Between power-power (ground-ground) lines in adjacent layers, this capacitor is in parallel with a low-impedance power (ground) via; hence, its effect is not felt, and therefore the crossover capacitors between power-power (ground-ground) lines in adjacent layers are not modeled. However, the crossover capacitance between power-ground lines in adjacent layers does not have any low-impedance path in parallel, and hence, its effect might be felt. Since these capacitances are between the power and ground lines, they can act as decoupling capacitances. The crossover capacitance increases with a decrease in the inter-layer thickness, and with an increase in the line width and thickness [16].

The line-to-line capacitance between a power line and its nearest ground line in the coplanar layer is directly proportional to their thicknesses and is inversely proportional to the distance between them. Since the distance between a power line and its nearest ground line in a metal layer (typical values may lie between 10 and 100 $\mu$m) is usually much greater than the distance between adjacent metal layers (typical values may lie between 1 and 5 $\mu$m), the coplanar line-to-line capacitance is usually very small compared to the crossover capacitance and is therefore not modeled.

Besides the line-to-line capacitances described above, there is also a line-to-ground capacitance (see Fig. 2) between lines in M1 and the conducting ground plane (assumed to be ideal) beneath the silicon substrate. Apart from the line capacitances described above, there are built-in and added on-chip decoupling capacitances [3]. These capacitances are not modeled in this paper. Moreover, the conductivity of the substrate could affect the dc voltages of the ground nodes [24]. However, in this work, this effect is not modeled.

Besides the coupling capacitances, there are also coupling inductances between lines. Since the coupling inductance between two lines is inversely proportional to the distance between them, the error introduced by ignoring coupling between distant lines would not be large. However, ignoring the far-away coupling and considering only the nearby coupling is fraught with stability issues [25] and should be dealt with cautiously (see [26] and the references therein). In this work, though an uncoupled inductance model is proposed, the coupling between neighboring lines is partially addressed. This is because the inductance is a loop inductance (which is a function of both the self and the mutual partial inductances) assuming nearby return paths (see [16] for more details regarding the inductance extraction), so the effect of coupling between nearby lines is already taken into account. Since only the self loop inductance is used in the model, the simulation does not suffer from the stability issues.

Ideal voltage sources are assumed at the locations of the power and the ground bumps. The switching transistors are modeled as linear triangular current sources (see Fig. 3). A switching current source exists between a power node and the ground node closest to the power node. Apart from the dynamic power dissipation (which is modeled by the switching sources), there is also static leakage power dissipation. Since the leakage power is significant in today's transistors, its effect has been modeled. Since the leakage current through a transistor is present even if the transistor does not switch, and is present as long as power is supplied to the transistor, this current produces a fixed resistive voltage drop (also referred to as IR-drop) in the power and the ground lines of the PDN. When the transistor switches, there will be variation in the voltages of the nodes of this transistor, which can affect the magnitude of the leakage current. However, when this voltage variation is not a significant fraction of the dc voltage of this node, the magnitude of the leakage current can be assumed to be constant and equal to the magnitude obtained with a clean power supply. Therefore, the leakage current has been modeled as a constant dc current source (see Fig. 3). A leakage current source, like the switching current source, exists between a power node and the ground node closest to this power node.

Fig. 3. Switching and leakage current models.
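The source models described above (a linear triangular switching current plus a constant dc leakage current) can be sketched as a pair of small functions. The parameter names `t_start`, `t_rise`, `t_fall`, `i_peak`, and `i_leak`, and the numerical values in the usage lines, are assumptions made for illustration; they are not taken from Fig. 3.

```python
def triangular_current(t, t_start, t_rise, t_fall, i_peak):
    """Piecewise-linear triangular pulse.

    The current ramps from 0 to i_peak over t_rise, then back to 0 over
    t_fall, and is zero outside the pulse.
    """
    t_peak = t_start + t_rise
    t_end = t_peak + t_fall
    if t <= t_start or t >= t_end:
        return 0.0
    if t <= t_peak:
        return i_peak * (t - t_start) / t_rise      # rising edge
    return i_peak * (t_end - t) / t_fall            # falling edge


def source_current(t, i_leak, t_start, t_rise, t_fall, i_peak):
    """Total current drawn between a power node and its closest ground node."""
    return i_leak + triangular_current(t, t_start, t_rise, t_fall, i_peak)


# Illustrative values only: 2 mA peak, 1 ns rise and fall, 0.1 mA leakage.
mid_rise = triangular_current(0.5e-9, 0.0, 1e-9, 1e-9, 2e-3)
after_pulse = source_current(5e-9, 1e-4, 0.0, 1e-9, 1e-9, 2e-3)
```

Outside the switching pulse only the leakage term remains, which corresponds to the fixed IR-drop contribution described in the text.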

III. LIM

In this section, the LIM has been described. The LIM was developed to simulate the high-frequency response of a large network in the time domain. In this method, a finite-difference formulation is used to update branch currents and node voltages in a leapfrog manner, similar to the Yee algorithm used in the finite-difference time-domain (FDTD) method [27]. As a result, the LIM has linear computational complexity. The LIM is readily enabled in networks with latency. A network has latency if each node in it has a shunt capacitance to ground and each branch in it has a series inductance. Such networks are observed in distributed RLC-based transmission-line circuits. If latency is missing in some parts of the network, then latency is inserted to enable the LIM. Like the FDTD method, the LIM has an upper bound on the time step, dictated by the stability requirements of the update algorithm. In the rest of the section, the formulation, accuracy, computational complexity, and stability of the LIM have been described for a network containing linear sources.

In Fig. 4, a sample circuit is shown for which the LIM is enabled. This type of circuit is also common in the proposed equivalent circuit of the on-chip power grids. The symbols in Fig. 4 mean the following: $i$ and $j$ refer to the nodes; the subscript $ij$ refers to a branch between nodes $i$ and $j$; $R_{ij}$ and $L_{ij}$ are the series resistance and inductance of the branch between nodes $i$ and $j$, respectively; $C_i$ refers to the shunt capacitance from node $i$ to ideal ground; $V_i^{n+1/2}$ refers to the voltage at node $i$ and time $(n+1/2)\Delta t$; $I_{ij}^{n}$ refers to the current in the branch between nodes $i$ and $j$; $I_{s,k}^{i,n}$ refers to the current due to the $k$th current source connected to node $i$. The leapfrog scheme is a second-order integration method for solving differential equations. This scheme relies on staggering the voltages and the currents by half a space step and half a time step. By defining currents for all branches and voltages for all nodes, the spatial staggering needed for the leapfrog scheme is accomplished. By defining branch currents (and source currents) at time instants $n\Delta t$ and node voltages at $(n+1/2)\Delta t$, where $n = 0, 1, 2, \ldots$, the temporal staggering needed for the leapfrog scheme is also met. The leapfrog scheme is referred to as a semi-implicit scheme in [19]. The LIM can also be formulated using first-order schemes such as the fully explicit and fully implicit schemes [19].

Fig. 4. Typical equivalent circuit to enable LIM.

Fig. 5. Conceptual equivalent circuit at node $i$.

In the LIM, the transient simulation is accomplished by updating the node voltages and the branch currents at each time step. The update expressions are derived for the circuit shown in Fig. 4, starting with the update expressions for the node voltages. The conceptual equivalent circuit at node $i$ in Fig. 4 is shown in Fig. 5. In Fig. 5, $I_{k}^{i,n}$ refers to the $k$th branch current entering node $i$ at time $n\Delta t$; and $I_{s,k}^{i,n}$ is the $k$th source current entering node $i$ at time $n\Delta t$. From KCL at node $i$,

$$C_i \frac{dV_i(t)}{dt} = \sum_{k=1}^{N_b^i} I_{k}^{i}(t) + \sum_{k=1}^{N_s^i} I_{s,k}^{i}(t) \qquad (1)$$

can be obtained, where $N_b^i$ is the number of branches incident on node $i$, and $N_s^i$ is the number of sources incident on node $i$.

When the derivative in (1) is discretized at time $n\Delta t$, the voltage at time $(n+1/2)\Delta t$, $V_i^{n+1/2}$, is given by

$$V_i^{n+1/2} = V_i^{n-1/2} + \frac{\Delta t}{C_i}\left(\sum_{k=1}^{N_b^i} I_{k}^{i,n} + \sum_{k=1}^{N_s^i} I_{s,k}^{i,n}\right) \qquad (2)$$

In (2), $V_i^{n+1/2}$ is expressed in terms of only the quantities known at or before $t = n\Delta t$. Consequently, (2) is an explicit expression for updating the voltage for any node whose conceptual equivalent circuit is as shown in Fig. 5.

Fig. 6. Equivalent circuit of a branch between nodes $i$ and $j$.

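The update expression for the branch currents can be derived following a procedure similar to that for the node voltages. The equivalent circuit of a branch is shown in Fig. 6. When Kirchhoff's voltage law (KVL) is applied along this branch, the equation

$$V_i(t) - V_j(t) = L_{ij} \frac{dI_{ij}(t)}{dt} + R_{ij} I_{ij}(t) \qquad (3)$$

can be obtained. When the derivative in (3) is discretized at time $(n+1/2)\Delta t$, an explicit update expression for $I_{ij}^{n+1}$ can be obtained as

$$I_{ij}^{n+1} = \frac{\left(\dfrac{L_{ij}}{\Delta t} - \dfrac{R_{ij}}{2}\right) I_{ij}^{n} + V_i^{n+1/2} - V_j^{n+1/2}}{\dfrac{L_{ij}}{\Delta t} + \dfrac{R_{ij}}{2}} \qquad (4)$$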

At each time step in the transient simulation, first, all node voltages are updated through (2), and next, all branch currents are updated through (4). The accuracy of the transient solution scales as $O(\Delta t^2)$. The memory complexity of the LIM is $O(N + N_b)$, and its time complexity per time step is $O(N + N_b)$, where $N_b$ is the total number of branches in the network, hence yielding an optimally efficient algorithm. The time step, $\Delta t$, in the LIM has an upper bound. This restriction on the time step follows from the need to keep the simulation numerically stable. In the next section, an analytical stability condition is derived for inhomogeneous RLC circuits. This condition in turn leads to an upper bound on the time step.
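The two-phase update described above (all node voltages first, then all branch currents) can be illustrated on a small uniform RLC ladder. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `lim_ladder`, the normalized element values, the pinned source node, and the constant load current are all hypothetical. The resistive term is assumed to be averaged over the time step, which is the usual meaning of a semi-implicit update.

```python
def lim_ladder(n_nodes, r, l, c, vdd, i_load, dt, n_steps):
    """Leapfrog (LIM-style) transient of a uniform RLC ladder.

    Node 0 is pinned to an ideal source vdd, and a constant current i_load is
    drawn from the last node. Branch k carries current from node k to node k+1.
    """
    v = [vdd] * n_nodes            # node voltages, held at half-integer time steps
    i = [0.0] * (n_nodes - 1)      # branch currents, held at integer time steps
    a = l / dt + r / 2.0           # coefficients of the semi-implicit KVL update
    b = l / dt - r / 2.0
    for _ in range(n_steps):
        # KCL: C dV/dt = (current entering) - (current leaving); node 0 stays at vdd
        for k in range(1, n_nodes):
            i_out = i[k] if k < n_nodes - 1 else i_load
            v[k] += dt / c * (i[k - 1] - i_out)
        # KVL: L dI/dt + R I = V_k - V_{k+1}, with R I averaged over the step
        for k in range(n_nodes - 1):
            i[k] = (b * i[k] + v[k] - v[k + 1]) / a
    return v, i


# Normalized, illustrative values (not from the paper).
v, i = lim_ladder(n_nodes=5, r=0.5, l=1.0, c=1.0, vdd=1.0, i_load=0.1,
                  dt=0.1, n_steps=20000)
```

Each time step touches every node once and every branch once, with no matrix solve, which is where the linear cost per time step comes from. In this example the ladder settles to the dc solution, so the last node ends up at the supply voltage minus the total IR-drop along the ladder.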

IV. STABILITY CONDITION OF SEMI-IMPLICIT SCHEME-ENABLED LIM FOR INHOMOGENEOUS RLC CIRCUITS

In this section, the analytical stability condition of the LIM formulated using a semi-implicit scheme [19] is derived for inhomogeneous RLC circuits. This scheme, as explained in the previous section, results in update expressions (2) and (4). The derived condition is valid for RLC circuits where each node has an arbitrary nonzero capacitance to ground and each branch has an arbitrary nonzero series inductance. Each branch can also have a resistance. The RL values (C values) of branches (of nodes) can be different from each other. Moreover, each node can be connected to any number of branches. Since the equivalent circuit of the on-chip PDN is such a circuit (see Fig. 2 or Fig. 7), deriving this stability condition becomes critical.

Fig. 7. Equivalent circuit of the on-chip PDN shown in Fig. 1 with fictitious elements; a fictitious capacitance to ground is added to nodes in M2 and M3, and a fictitious series inductance is added to each crossover capacitor.

The goal of this section can be mathematically described asfollows. Consider an inhomogeneous RLC circuit in which theLIM can be enabled (see Fig. 4). Let be the capacitance toground from node , and be the diagonal ma-trix comprising of ’s. Let and be the inductance andresistance of the branch , respectively, and letand be the corresponding diagonal matrices ofbranch inductances and resistances, respectively. Letbe the voltage of node at time instant , and let

be the vector of node voltages. Similarly,be the current in branch at time instant , and let

be the vector of branch currents. Letbe the incidence matrix of edges to nodes. An entry in this ma-trix corresponding to branch and node is defined as

if is flowing out of nodeif is flowing into nodeotherwise.
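A branch-to-node incidence matrix of the kind defined above can be built in a few lines. The sketch below is only an illustration: the sign convention (+1 when the branch current flows out of a node, -1 when it flows into a node, 0 otherwise) is an assumption, and the helper names and the three-branch example are hypothetical.

```python
def incidence_matrix(n_nodes, branches):
    """Rows are branches, columns are nodes.

    Each branch is a pair (src, dst) whose current flows from src to dst.
    """
    a = [[0] * n_nodes for _ in branches]
    for b, (src, dst) in enumerate(branches):
        a[b][src] = 1       # assumed sign: current flows out of src
        a[b][dst] = -1      # assumed sign: current flows into dst
    return a


def matvec(m, x):
    """Plain matrix-vector product for small dense lists."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


branches = [(0, 1), (1, 2), (0, 2)]
a = incidence_matrix(3, branches)
drops = matvec(a, [5, 3, 2])                 # voltage drop along each branch
leaving = matvec(transpose(a), [1, 2, 3])    # net branch current leaving each node
```

With this convention, multiplying the matrix by the node voltages gives the voltage drop along each branch (the quantity KVL needs), and multiplying its transpose by the branch currents gives the net current leaving each node (the quantity KCL needs).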

In the absence of current sources, the discretized version ofKCLs at all nodes can be written as

(5)

where the symbol is the transpose of . Similarly, in the ab-sence of voltage sources, the discretized version of KVLs usinga semi-implicit integration scheme [19] in all branches can bewritten as

(6)

The node voltages and branch currents updated according to (5) and (6), respectively, have bounded values only for some values of the time step $\Delta t$. In other words, the transient simulation defined by (5) and (6) is only conditionally stable. The objective of this section is to find the condition on $\Delta t$ that keeps the simulation defined by (5) and (6) stable.
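Conditional stability can be observed numerically on a small source-free chain. The sketch below assumes that, for an incidence matrix $A$ with +1/-1 entries for currents leaving/entering a node, the discretized KCL and KVL take the form $C(V^{n+1/2} - V^{n-1/2})/\Delta t = -A^T I^n$ and $L(I^{n+1} - I^n)/\Delta t + R(I^{n+1} + I^n)/2 = A V^{n+1/2}$; this form and its sign convention are assumptions, as are the function name and the normalized element values. The chain here is lossless and homogeneous, which is a much simpler case than the inhomogeneous circuits analyzed in this section.

```python
def peak_voltage(dt, n_steps, n_nodes=10, l=1.0, c=1.0, r=0.0):
    """Run the leapfrog update on a source-free RLC chain and return max |V| seen.

    Every node has capacitance c to ground; branch k (node k -> node k+1) has
    series resistance r and inductance l. Node 0 starts at 1 V, all else at 0.
    """
    v = [0.0] * n_nodes
    v[0] = 1.0
    i = [0.0] * (n_nodes - 1)
    a = l / dt + r / 2.0
    b = l / dt - r / 2.0
    peak = 0.0
    for _ in range(n_steps):
        for k in range(n_nodes):                       # discretized KCL
            i_in = i[k - 1] if k > 0 else 0.0
            i_out = i[k] if k < n_nodes - 1 else 0.0
            v[k] += dt / c * (i_in - i_out)
        for k in range(n_nodes - 1):                   # discretized KVL
            i[k] = (b * i[k] + v[k] - v[k + 1]) / a
        peak = max(peak, max(abs(x) for x in v))
    return peak


bounded = peak_voltage(dt=0.5, n_steps=2000)   # small time step
diverged = peak_voltage(dt=1.5, n_steps=100)   # large time step
```

With the smaller time step the node voltages stay of the order of the initial condition, while with the larger one they grow by many orders of magnitude within a hundred steps. The analysis in this section is what turns this observation into an explicit bound on $\Delta t$ for general circuits.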

Prior work on the stability analysis of the LIM [19] has only been able to prove a stability condition for the RLC circuits described earlier in this section under the following assumptions: 1) the RLC values have to be the same everywhere in the circuit; 2) the circuit has to be infinitely long; and 3) each node has to be connected to only two branches. None of these assumptions holds for on-chip power-grid equivalent circuits. The restrictions stated above are a consequence of using the von Neumann method [20]. In this method, the condition on $\Delta t$ is derived by requiring the amplitudes of the Fourier transforms (with respect to space) of the voltages and currents to be bounded. The first two assumptions are crucial for the Fourier transformation.

Since the LIM is the circuit analog of the well-known FDTD method [27], existing stability analysis procedures for the FDTD method can be explored for proving the stability of the LIM. In [28], stability conditions of the FDTD method are derived for a 3-D lossless homogeneous dielectric medium using a complex-frequency analysis approach. However, it is difficult to extend this approach when the medium (or circuit) is lossy and inhomogeneous.

In [21], a sufficient condition for the stability of the FDTD method is derived for a 3-D lossy inhomogeneous dielectric medium. This derivation is not based on the von Neumann method but is instead based on the energy method [23], introduced to the FDTD community in [22]. This derivation does not suffer from any of the restrictions of the von Neumann method and of the complex-frequency analysis approach. The energy method is known to the control systems community as the direct Lyapunov method for continuous [29], [30] or discrete [31] dynamical systems.

There are four important differences between the LIM problem discussed in this section and the FDTD problem [21]. 1) The loss in the circuit problem (same as the LIM problem) is due to the conductors (power and ground conductors), while it is due to the dielectric in [21]. 2) The LIM discretizes only the circuits, even when the circuits are discontinuous, while the FDTD problem discretizes both the dielectric medium and the free space. Therefore, the FDTD problem always solves a continuous problem domain. 3) Unlike the FDTD problem, the LIM problem can have more than three dimensions. In the LIM problem, the dimensions loosely refer to the number of branches connected to a node, which can be more than three. 4) Unlike the FDTD problem, the circuit problem is nonuniform with respect to the number of branches connected to a node. In the rest of the section, the approach in [21] is adapted to derive stability conditions for inhomogeneous RLC circuits, starting with the statement of Lyapunov's stability theorem.

The direct method of Lyapunov for a discrete-time system can be stated as follows [29], [31]. Let $x^{n}$ be the vector of states of the system at time instant $n$, and let $x = 0$ be the equilibrium point. Suppose there exists a scalar function $V(x)$, continuous in $x$, such that

$$V(x^{n}) > 0 \quad \forall\, x^{n} \neq 0 \tag{7}$$

$$\Delta V(x^{n}) = V(x^{n+1}) - V(x^{n}) \leq 0. \tag{8}$$

Then, $x = 0$ is stable. Moreover, if

$$\Delta V(x^{n}) < 0 \quad \forall\, x^{n} \neq 0 \tag{9}$$

then $x = 0$ is asymptotically stable. If $V(x)$ satisfies (7) and (9) along with the condition that

$$V(x) \to \infty \quad \text{as} \quad \|x\| \to \infty \tag{10}$$

then $x = 0$ is globally asymptotically stable. The symbol $\|x\|$ in (10) stands for the $p$-norm of the vector $x$, where $p \geq 1$. A continuous scalar function satisfying (7) and (8) is called a Lyapunov function. Existence of a Lyapunov function is a sufficient condition for the stability of $x = 0$. In the following, an energy-like function is chosen as the scalar function, and the conditions for this function to be a Lyapunov function are determined. These conditions in turn result in an upper bound for the time step.

Define the discrete energy of the voltages and currents at a given time instant as

(11)

The function in (11) is a scalar and has a form similar to that of the energy stored in an LC circuit, $\frac{1}{2}CV^{2} + \frac{1}{2}LI^{2}$. The function can be rewritten as

(12)

The function is shown in the following to satisfy the condition in (8). Using (11), the difference in energy between successive time instants can be written as

which can be simplified using (5) and (6) as

(13)

The inequality in (13) holds because the matrix of branch resistances is positive semidefinite (note that resistances can be zero).

For the energy function to satisfy (7), the matrix in (12) has to be positive definite. In the following, the conditions for this matrix to be positive definite are found. The stability conditions are derived as a result. The stability conditions are found first when each node is connected to only two branches. These conditions are then extended to the case when the number of these branches is arbitrary and when the circuit is discontinuous.

A. Condition on $\Delta t$ When Two Branches Are Connected to Every Node

Let the subscript denote a node, and let the subscriptsand denote the two branches connected to node

. Let and denote the branch currents that enterand leave node , respectively. Let denote a branch, and let thetwo nodes of this branch be denoted by and , withbranch current flowing from node to node . Thequantity in (12) can also be written as

(14)

where

(15)

From (14), is positive (in other words, satisfies (7)) if is positive for all . Expressing as a quadratic form

(16)

The quantity is positive if the matrix is positive definite. For to be positive definite, all the upper left submatrices , where denotes the size of the upper left submatrix, should have positive determinants [32]. The determinant of the first upper left matrix should then satisfy

(17)



So all branch inductances should be nonzero and positive. Similarly, it can be shown that the condition is true if

(18)

Since is non-negative for any real , from (18), it can be concluded that , i.e., all capacitances to ground should be positive. Finally, it can be shown that the condition is true if

(19)

Making use of the fact that

the condition in (19) is satisfied if

(20)

Since the condition in (20) is stricter than (18), the matrix is positive definite if all inductances and capacitances are positive and (20) is satisfied. When this analysis is repeated for all nodes, the condition (20) becomes

(21)

resulting in a condition for as

(22)

When (22) is simplified, the well-known Courant time step for a 1-D circuit is obtained

(23)

The proof for (10) for a positive function like is shown in Appendix B of [22]. Therefore, 1) when and satisfies (22), the discrete system (5), (6) is globally asymptotically stable; 2) when and satisfies (22), the discrete system is stable.
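The two-branch case can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes a uniform, lossless ladder with free ends, a leapfrog update of node voltages and branch currents, a semi-implicit treatment of the series resistance, and hypothetical element values. It compares the peak node voltage for a time step just below and just above the 1-D Courant limit $\sqrt{LC}$.

```python
import numpy as np

def lim_ladder_peak(n_nodes, L, C, R, dt, n_steps):
    """Leapfrog update of a uniform RLC ladder; returns the largest |V| seen."""
    v = np.zeros(n_nodes)
    v[n_nodes // 2] = 1.0            # initial voltage kick at the middle node
    i = np.zeros(n_nodes - 1)        # branch k carries current from node k to k+1
    peak = 0.0
    for _ in range(n_steps):
        # Node update (KCL): C dV/dt = current arriving - current leaving.
        net = np.zeros(n_nodes)
        net[1:] += i
        net[:-1] -= i
        v = v + (dt / C) * net
        # Branch update (KVL): L dI/dt + R I = V_from - V_to.
        # The resistance is averaged over the step (an assumption of this sketch).
        i = ((L / dt - R / 2.0) * i + (v[:-1] - v[1:])) / (L / dt + R / 2.0)
        peak = max(peak, float(np.max(np.abs(v))))
    return peak

L, C = 1e-12, 1e-15                  # hypothetical per-cell inductance and capacitance
dt_courant = np.sqrt(L * C)          # 1-D Courant limit

stable_peak = lim_ladder_peak(41, L, C, 0.0, 0.9 * dt_courant, 2000)
unstable_peak = lim_ladder_peak(41, L, C, 0.0, 1.1 * dt_courant, 200)
print(stable_peak, unstable_peak)
```

With the smaller time step the peak voltage stays of the order of the initial kick, whereas with the larger time step the highest spatial mode grows geometrically, which is the behavior the stability condition is meant to exclude.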

B. Condition on $\Delta t$ When an Arbitrary Number of Branches Are Connected to a Node

Let denote the number of branches connected to node . The generic condition on can be easily obtained by letting and in (15) and repeating the derivation from (14) through (22). For a generic case, the condition on in (22) can be shown to be

(24)

where denotes the value of th inductor connected to node . The time step condition in (24) is valid even if the power grids have discontinuous power/ground lines. As can be observed, the derivation described thus far does not require the circuit to be homogeneous or infinitely long.
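A per-node bound of this kind is straightforward to evaluate for an inhomogeneous circuit. The sketch below is only illustrative: the expression it uses, $\Delta t < \sqrt{2}\,\min_i \sqrt{C_i \min_p L_{i,p} / N_b^i}$, is an assumed form chosen because it reduces to the 1-D Courant limit $\sqrt{LC}$ when every node has two identical branches. It is not a transcription of (24), and the function name and element values are hypothetical.

```python
import math

def lim_time_step_bound(node_caps, node_branch_inductances):
    """Assumed bound: sqrt(2) * min over nodes of sqrt(C_i * min_p(L_ip) / N_b_i)."""
    bound = math.inf
    for c, inductances in zip(node_caps, node_branch_inductances):
        n_branches = len(inductances)          # branches connected to this node
        node_bound = math.sqrt(2.0 * c * min(inductances) / n_branches)
        bound = min(bound, node_bound)
    return bound

L, C = 1e-12, 1e-15                            # hypothetical element values

# Two identical branches per node: the bound collapses to sqrt(L*C).
dt_two_branch = lim_time_step_bound([C, C], [[L, L], [L, L]])

# A node with four branches, one of them a small (fictitious) inductance.
dt_four_branch = lim_time_step_bound([C, C], [[L, L], [L, L, L, 1e-15]])
print(dt_two_branch, dt_four_branch)
```

The second call illustrates why small fictitious elements matter: a single small inductance at one node lowers the bound for the whole circuit.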

V. ON-CHIP POWER GRID TRANSIENT SIMULATION USING LIM

To simulate the temporal fluctuation in the power supply due to switching sources, a transient simulation is preferred. In this section, a new LIM-enabled formulation for this transient simulation is described. The principal advantage of this new formulation is that it guarantees a computational complexity per time step of the transient simulation that is linear in the number of nodes. This advantage is realized through the artificial insertion of latency in the circuit, which distinguishes it from the formulations in [15]–[17].

The on-chip power grid shown in Fig. 1, which has an equivalent circuit as shown in Fig. 2, has been used as a reference for describing the formulation. First, it can be noticed that many parts of the equivalent circuit shown in Fig. 2 have a form similar to the one shown in Fig. 4. The current source in Fig. 4 can be used to represent the contribution of both the switching and the leakage currents. The power and the ground voltage supplies at the C4 locations are taken into account by enforcing these supply voltages for the node voltages corresponding to the C4 locations. Owing to this similarity, the update expressions developed in the LIM can be readily used in most cases. In fact, in Fig. 2, the nodes in M1 at the end points of the vias, namely the nodes 1, 3, 4, 6, 7, 9, 11, and 14, have the same conceptual equivalent circuit as in Fig. 5, and hence, their voltages can be updated using (2).

However, for the rest of the nodes in Fig. 2, the node voltages cannot be updated using (2), as the latency is missing in all these nodes. Latency is missing because these nodes either do not have a shunt capacitance to ground or have branch capacitors connected to them. To see why latency is important for updating the node voltage, consider node 18 in M2. This node does not have a shunt capacitance to ground, for reasons described earlier in this paper. The conceptual equivalent circuit at this node is the same as in Fig. 5 except for the shunt capacitance. If this capacitance is missing in Fig. 5, the KCL equation in (1) would not involve the unknown node voltage at the next time instant, and the explicit expression in (2) cannot be obtained in the first place.

In [16] and [17], a shunt capacitance to ground is assumed to be present in all nodes in the on-chip PDN. However, as described earlier in this paper, this assumption may not be true. Therefore, the formulation presented in these approaches cannot be applied to the new equivalent circuits shown in Fig. 2.

To enable the LIM, latency is inserted. For nodes with a missing capacitance to ground (nodes in M2 and above), a small fictitious shunt capacitance to ground is added. With this addition, the equivalent circuit shown in Fig. 2 is modified as shown in Fig. 7. In the modified equivalent circuit, even the branch capacitors are changed (more about this later).

With the modified equivalent circuit (see Fig. 7), the nodes in M2 and above that are not connected to any branch capacitors, namely the nodes 18, 20, 23, 25, 27, and 29 in M2, and the nodes 29, 33 and 35 in M3, have the same conceptual equivalent circuit shown in Fig. 5. Therefore, their voltages can be updated using (2). Adding a fictitious shunt capacitance to ground affects accuracy, so the values of the fictitious elements have to be small. The choice of values for the fictitious capacitance to ground that does not compromise the accuracy much is described in Section VI.

Fig. 8. Conceptual equivalent circuit at the two end nodes of the crossover capacitor.

When the crossover capacitors are not modeled, there will be no branch capacitors in the equivalent circuit shown in Fig. 7. In such a scenario, the rest of the nodes, which are also the end points of the (missing) crossover capacitors, also have the same conceptual equivalent circuit as shown in Fig. 5. Therefore, even their voltages can be updated using (2).

When the crossover capacitors are modeled, the rest of the nodes are connected either to one or to two branch capacitors. In such a scenario, the conceptual equivalent circuits at these nodes are not the same as the one in Fig. 5. Therefore, their voltages cannot be updated using (2), necessitating the development of a new update expression for these nodes.

Towards this end, a conceptual equivalent circuit at the end points of a branch capacitor is considered. In Fig. 8, a typical case is shown. There are two ways to derive the update expressions for the voltages of the two end nodes in Fig. 8. These ways differ in the decision to insert a latency (by way of a small series inductance) in the branch capacitor. When the latency is not introduced in the branch capacitors, the linear complexity per time step of the approaches [16], [17] may not be guaranteed (shown below). However, when the latency is introduced, this complexity can be guaranteed. Both ways are described next.

In the first way, latency is not introduced in the branch capac-itors. This is the way adopted in [16], [17]. The update expres-sions for the voltages for nodes and in Fig. 8 are derived asfollows. The process starts with the KCL at node . The KCLat node would involve terms and . Whenthese derivatives are discretized, is given by

(25)

In (25), the updated voltage of one end node is related to the unknown updated voltage of the other end node. Hence, it cannot be obtained using (2). In fact, the updated voltages of both end nodes have to be solved together. The extra equation needed to find this solution is obtained from the KCL equation at the other end node and is given by

(26)

Fig. 9. New equivalent circuit of a floating capacitor.

Equations (25) and (26) are solved together to find the two updated node voltages. When an end node in Fig. 8 is also connected capacitively to some other node in the PDN, the size of the system to be solved increases, increasing the computational complexity of updating these node voltages. However, in [17], it has been shown that the linear computational complexity per time step is preserved when only the crossover capacitances are considered.
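The coupled solve in the first way can be illustrated with a two-node example. This is a minimal sketch under stated assumptions, not a transcription of (25) and (26): it writes KCL at both end nodes of a floating capacitor with a backward difference in time, treats the net current flowing into each node from its inductive branches and sources as known, and uses hypothetical names and values throughout.

```python
import numpy as np

def coupled_node_update(v_i, v_j, c_i, c_j, c_x, i_net_i, i_net_j, dt):
    """Update two node voltages joined by a floating capacitor c_x in one solve."""
    # Discretized KCL at the two nodes:
    #   c_i*dVi/dt + c_x*d(Vi - Vj)/dt = i_net_i
    #   c_j*dVj/dt + c_x*d(Vj - Vi)/dt = i_net_j
    cap = np.array([[c_i + c_x, -c_x],
                    [-c_x, c_j + c_x]])
    rhs = dt * np.array([i_net_i, i_net_j])
    dv = np.linalg.solve(cap, rhs)   # singular if both shunt capacitances are zero
    return v_i + dv[0], v_j + dv[1]

dt = 1e-15                           # hypothetical time step
# Without coupling (c_x = 0) the two updates decouple into explicit ones.
vi_free, vj_free = coupled_node_update(1.0, 0.9, 2e-15, 1e-15, 0.0, 1e-6, -2e-6, dt)
# With a 5 fF floating capacitor the two voltages must be solved together.
vi_cpl, vj_cpl = coupled_node_update(1.0, 0.9, 2e-15, 1e-15, 5e-15, 1e-6, -2e-6, dt)
print(vi_free, vj_free, vi_cpl, vj_cpl)
```

Each additional capacitively coupled node adds one row and one column to this system, which is why a chip-wide web of coupled nodes leads to a large sparse solve rather than a set of independent explicit updates.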

However, when decoupling capacitors are present, the linear computational complexity may be compromised using this way. When the on-chip decoupling capacitors are spread uniformly across the chip, the number of nodes that are capacitively coupled could be proportional to the number of nodes $N$. When the number of capacitively coupled nodes is proportional to $N$, a large sparse system whose size is proportional to $N$ needs to be solved. The complexity of the solution to such a system cannot be guaranteed to be linear, necessitating efforts to avoid this complexity problem.

In the second way, latency is inserted in all branch capacitors (whether crossover or decoupling capacitors) by adding a small series inductance to them. The choice of the fictitious series inductance in the crossover capacitor is described in Section VI. Consequently, the crossover capacitor is represented by a series resistor-inductor-capacitor model as shown in Fig. 9. Further, the current through the crossover capacitor as well as the voltage of its internal node (see Fig. 9) are maintained. In Fig. 7, the on-chip PDN equivalent circuit with the modified crossover capacitor model has been shown. With the new crossover capacitor model, the conceptual equivalent circuit shown in Fig. 8 becomes like the one in Fig. 10. Since the current through the crossover capacitor (see Fig. 10) is maintained, the conceptual equivalent circuit at the end node that is separated from the capacitor by the series inductance is the same as the one shown in Fig. 5. Therefore, the updated voltage of this node in Fig. 10 can be obtained using the expression in (2). As a result, it is solved independently of the voltage at the other end of the crossover capacitor.

However, a similar procedure cannot be employed for the two nodes at the terminals of the capacitor itself: since these nodes are capacitively connected, their voltages have to be solved together using the procedure described in the first way. However, unlike in the first way, the linear computational complexity per time step can be guaranteed. The size of the system to be solved for updating a node voltage is equal to the number of floating capacitors connected to the node. Since only the capacitive coupling between a node and its neighbors is modeled in the equivalent circuit shown in Fig. 2, the maximum size of the system to be solved is equal to the number of neighboring nodes of a node. This number is independent of the total number of nodes (for equivalent circuits shown in Fig. 2), even if on-chip decoupling capacitors were to be included.

Fig. 10. Conceptual equivalent circuit at the two end nodes with the new model for the crossover capacitor.

Unlike the procedure for updating the node voltages, the procedure for updating the branch currents is relatively simple. As all branches, including the branch capacitors, have the same conceptual equivalent circuit as shown in Fig. 6, the branch currents are updated using (4).

The time step of the transient simulation has an upper bound. A stricter upper bound for the time step than (24) is used

(27)

where the branch count appearing in (27) is the total number of branches connected to the node under consideration. The time step in (27) is observed to work even if there are coupling capacitances in the equivalent circuit.

VI. FICTITIOUS LATENCY ELEMENTS

In this section, approximate closed-form expressions for computing the fictitious latency elements have been derived.

A. Fictitious Series Inductance

The fictitious series inductance added to a floating capacitor (see Fig. 9) makes this capacitance a series inductor-capacitor resonance circuit. The objective is to choose the fictitious inductance such that it does not affect the accuracy of the results (obtained without it) much. At low frequencies, the inductance acts as a short circuit, and its effect is not felt. However, at high frequencies, its impedance is high, and hence, it can affect the results. The objective stated above will be met if the inductance is chosen such that even at the maximum frequency of interest, the inductance has a significantly smaller impedance compared to the impedance of the rest of the capacitor circuit; i.e.,

or

(28)

where , and . Then, the fictitious series inductance can be computed from

(29)

From (29), it can be observed that the fictitious inductance decreases with an increase in the maximum frequency, with an increase in the capacitance, and with a decrease in the accuracy-control factor. The smaller the fictitious series inductance, the more accurate the result. This factor can be employed to control the accuracy. Since the time step depends on the fictitious series inductance (see (27)), it would be useful to compute this inductance for some practical values of the maximum frequency and the capacitance

. If , where is the rise time of the (triangular)switching current source, then GHz GHz for

ps ps. The crossover capacitance is usually less than 10 fF. (The crossover capacitance, , can be approximately computed by , where is the permittivity of silicon dioxide; and are the widths of lines in adjacent metal layers, and is the distance between the adjacent metal layers; if µm, µm, and µm, then fF.) The thin-oxide on-chip decoupling capacitor for a 1-µm² area with a gate length of 90 nm and with an oxide thickness of 1.3 nm is approximately 26 fF. Therefore, the fictitious series inductance, , is computed using (29) for

GHz GHz, fF fF,, and . In Fig. 11, the plots for fF, 1 fF,

10 fF, and 100 fF are shown. From Fig. 11, it can be observed that the fictitious series inductance can be low or high depending on the values of and . This inductance has been found to be as low as 2.5 fH for fF and GHz, and as high as 1000 pH for fF and GHz.
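The impedance criterion above can be turned into a short calculation. The sketch below is an illustration under explicit assumptions: it neglects any series resistance of the capacitor branch, requires the inductive reactance at the maximum frequency to be a fixed fraction `ratio` of the capacitive reactance, and uses hypothetical names and a hypothetical fraction of $10^{-4}$. It is not a transcription of (29).

```python
import math

def fictitious_series_inductance(c_branch, f_max, ratio):
    """Inductance whose reactance at f_max is `ratio` times that of c_branch."""
    omega_max = 2.0 * math.pi * f_max
    # omega*L = ratio / (omega*C)  =>  L = ratio / (omega^2 * C)
    return ratio / (omega_max ** 2 * c_branch)

# Hypothetical operating point: 100 fF floating capacitor, 100 GHz, ratio 1e-4.
l_small = fictitious_series_inductance(100e-15, 100e9, 1e-4)
# A smaller capacitor at a lower frequency tolerates a much larger inductance.
l_large = fictitious_series_inductance(0.1e-15, 10e9, 1e-4)
print(l_small, l_large)
```

Under these assumed values the first result is about 2.5 fH, the same order as the smallest inductance quoted above, although the parameter values the authors used for that figure are not recoverable from this text.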

B. Fictitious Capacitance to Ground

Fig. 11. Variation of fictitious inductance with maximum frequency of the excitation and with floating capacitance. (a) Branch capacitance of 0.1 and 1 fF; (b) branch capacitance of 0.1 and 1 fF; (c) branch capacitance of 10 and 100 fF; (d) branch capacitance of 10 and 100 fF.

The procedure for computing the fictitious capacitance to ground (in a node that did not have it before) is similar to the one followed for computing the fictitious series inductance. Since all nodes in the power (ground) rails of the on-chip PDN will have a dc path to the power (ground) supply, the equivalent circuit from a node (with no capacitance to ground) to the power (ground) supply can be represented as shown in Fig. 12(a). Fig. 12(a) shows the power/ground supply voltage source, the resistance and inductance between the node and the voltage source, and the impedance observed from the node looking into the dc voltage source in the absence of a capacitance to ground from the node. When a fictitious capacitance to ground is added, the equivalent circuit shown in Fig. 12(a) can be transformed to the one in Fig. 12(b). The objective is to choose this capacitance in such a way that it does not affect the accuracy of the results (obtained without it) much. At low frequencies, the fictitious capacitance acts as an open circuit, and therefore, its effect is not felt. However, at high frequencies, it may have a smaller impedance than the impedance looking into the supply, and its effect might be felt. This impedance is given by

(30)

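So as not to affect this impedance, the fictitious capacitance can be chosen such that it presents a much higher impedance than the supply-side impedance at the maximum frequency of interest; i.e.,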

or

(31)

where . The capacitance to ground is then computed from

(32)

for all nodes that do not have a capacitance to ground. Since the resistance and inductance to the supply vary from node to node and are difficult to compute at each node, the maximum of their values can be used. Accordingly, the fictitious capacitance to ground can be computed for any node by

(33)

If there is more than one power (ground) supply, then only the nearest power (ground) supply is considered. Approximate estimates for the maximum resistance and inductance terms in (33) are derived below.

Fig. 12. Equivalent circuit as seen from a node in an on-chip PDN to the power/ground supply terminal, showing the power/ground supply, the net resistance and inductance between the node and the supply voltage, and the fictitious capacitance to ground from the node. (a) Without fictitious capacitance to ground. (b) With fictitious capacitance to ground.

Fig. 13. Calculation of the maximum distance between a node and its nearest power (ground) supply.

The distance between a node and its nearest power- (ground-) supply bump is usually bounded by the pitch of the bumps. This has been illustrated in Fig. 13, which shows the typical bump arrangement. In Fig. 13, refers to the th power (ground) bump, and and refer to the spacing between adjacent power (ground) bumps in the x- and the y-directions, respectively. Then, the maximum distance between any node P in the area bounded by rectangle to its nearest power (ground) bump (B in Fig. 13) obeys the relation

(34)

Then

(35)

(36)

where and are the resistance and inductance per unit length, respectively, of lines in metal layer . The fictitious capacitance to ground is computed for different values of and . In this computation, is chosen as it was for the fictitious inductance; the maximum inductance was computed assuming a p.u.l. inductance of 2 and a worst-case distance of 20 mm from any node to the power supply; therefore, nH; and since usually, the effect of on the fictitious capacitance to ground was ignored. In Fig. 14, the variation of the fictitious capacitance to ground with increasing and for nH has been shown. From Fig. 14, it can be observed that the fictitious capacitance to ground, , is less than 0.25 fF when nH. The capacitance can be as small as at 100 GHz when nH.

Fig. 14. Variation of fictitious capacitance to ground with maximum frequency of operation.

VII. COMPUTATIONAL COMPLEXITY OF TRANSIENT SIMULATION

In this section, the memory and the time complexities of the LIM-enabled transient simulation have been derived taking the effect of fictitious elements into account.

Since the transient simulation is based on the LIM, the com-putational complexity of the transient simulation is same that ofthe LIM. Since and differ only by a constant factor inthe equivalent circuit shown in Fig. 7, the memory complexityof the transient simulation is , and the time complexity ofthe transient simulation is . Since (and hence )is independent of from (27), the overall computational com-plexity of the transient simulation is . However, in prac-tice, the time complexity can be more. This is because, though

is independent of , its value can be comparable to , es-pecially with the presence of fictitious latency elements. In thefollowing, it is estimated that the runtime of the LIM-enabledtransient simulation for power grid simulation is approximatelyproportional to for in the order of millions.

To find the overall time complexity, the typical range of has to be estimated. Since the total time , the worst-case values for have to be estimated first. For equivalent circuits such as in Fig. 2, the maximum time step, , is independent of (see (27)). Since is dependent on the smallest L and C in the circuit from (27), and since the smallest L and C are usually the fictitious series inductance and shunt capacitance, the effect of the values of the fictitious elements on the is studied. For this study, the from (27) was computed for different values of fictitious capacitance to ground shown in Fig. 14 and for different values of series inductance of branch shown in Fig. 11. When there are no decoupling capacitors, the maximum number of branches connected to a node is only four in the equivalent circuit shown in Fig. 2. Therefore, . In Fig. 15, the has been plotted for different values of fictitious capacitance to ground and series inductance of branch. From Fig. 15, it can be observed that decreases with the decrease in the smallest fictitious capacitance to ground and with the decrease in the smallest fictitious series inductance of branch. The maximum time step can be as low as 0.01 fs for a series inductance of 1 fH and a fictitious capacitance to ground of , and this scenario happens for GHz and nH from the discussions in Section VI.

If a more conservative estimate of fs is considered, and if ns is chosen, then . Therefore, when is in the order of millions, the runtime of the whole transient simulation is approximately proportional to . For ns and fs, then . When is in the order of millions, the runtime is proportional to . Therefore, for in the order of millions, the total time complexity of the runtime of the overall transient simulation is approximately proportional to . For differential-mode equivalent circuits such as in [3], the capacitance to ground from a node is in the range of a couple of femtofarads. In such cases, . For these equivalent circuits, the runtime of the overall transient simulation is approximately proportional to for in the order of millions.

Fig. 15. Variation of the maximum time step with the smallest capacitance to ground and with the smallest series inductance.

A. Remarks

In the proposed transient simulation formulation, the advantages mentioned in Section I are preserved with a numerical robustness associated with a direct solver-based implicit method. However, unlike the direct solver, the optimal memory complexity is preserved irrespective of the numbering of the nodes.

Fig. 16. Cross-sectional view of the on-chip PDN.

TABLE I. P.U.L. R, L, C PARAMETERS OF POWER-GROUND LINES IN DIFFERENT LAYERS OF THE ON-CHIP PDN

The drawback of the proposed method is its high time complexity. This high complexity is only due to the small time step of the transient simulation observed in these equivalent circuits. This complexity is expected to be alleviated by using time-step relaxation schemes such as the alternating direction implicit (ADI) methods without compromising the memory complexity. Using an ADI-based method, the dependence of the time step on the element values is removed, yielding a number of time steps that is small compared to the number of nodes. Such a fix using ADI methods is common for relaxing the restriction on the time step in the FDTD method [33], [34]. There have also been efforts to employ ADI-based methods for on-chip power-grid simulation in mesh-type power grids [10], [35].

VIII. RESULTS

In this section, the transient results that demonstrate the accuracy of the LIM-enabled power-grid simulation and the accuracy of the proposed closed-form expressions have been presented. The rest of the section is organized as follows. First, the transient results have been obtained for a small problem. Second, the transient results pertaining to the accuracy of the closed-form expressions have been presented for a large problem.

A. Small Problem

1) Test Setup: The test setup consists of an on-chip PDN like the one in Fig. 1, with three metal layers; M1 is the metal layer closest to the substrate, M3 is the metal layer farthest from the substrate, and M2 is the metal layer between M1 and M3. In Fig. 16, the cross-sectional view of this on-chip PDN has been shown. Only a 400 µm × 400 µm region starting from (0, 0) has been considered for this test. The total number of nodes is 1900. The p.u.l. parameters of the lines in the different layers have been listed in Table I. In Table I, the lines in M1 have a capacitance to ground, while the lines in M2 and M3 do not have a capacitance to ground. The via resistances and inductances, and the crossover capacitances between different metal layers have been listed in Table II. The arrangement of the power- and the ground-supply bumps in M3 has been shown in Fig. 17. The leakage and the switching power densities have been chosen as 125 each. The leakage current has been modeled as a dc current source; these sources have been distributed uniformly in M1; each leakage current has an amplitude of 49 µA. The switching current has been modeled as a periodic triangular pulse stream with ps, ps, , ps, and ; all sources in the rectangular area bounded by the locations ( µm, µm) and ( µm, µm) have been assumed to be switching starting from . A total simulation time of 300 ps has been chosen.

TABLE II. VIA RESISTANCE AND INDUCTANCE AND CROSSOVER CAPACITANCE BETWEEN DIFFERENT METAL LAYERS

Fig. 17. Arrangement of power- and ground-supply bumps in M3.

Since a constant current source model has been used for the leakage current, it would produce a fixed IR-drop in the nodes of the on-chip PDN. Thus, a dc analysis is performed before the transient simulation. At dc, the inductors act as short circuits, and the capacitors act as open circuits. For the resulting resistor-only network, the dc analysis has been performed using the modified nodal analysis (MNA) approach. An iterative method has been used for solving the linear system of equations. The transpose-free quasi-minimal residual (TFQMR) algorithm [36] has been used as the iterative algorithm.
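The dc step can be sketched on a toy network. The example below is a hedged illustration and not the authors' setup: it builds the nodal conductance matrix of a single resistive rail fed from one supply pad, draws a 49 µA leakage current from every node, and solves the system with SciPy's `tfqmr` using default tolerances. The rail length, segment resistance, and supply voltage are hypothetical.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import tfqmr

def dc_ir_drop(n_nodes, r_seg, i_leak, vdd):
    """Nodal analysis of a resistive rail: supply - r - node0 - r - node1 - ..."""
    g = 1.0 / r_seg
    main = np.full(n_nodes, 2.0 * g)
    main[-1] = g                         # the last node has a single neighbour
    off = np.full(n_nodes - 1, -g)
    G = diags([off, main, off], offsets=[-1, 0, 1], format="csr")
    b = np.full(n_nodes, -i_leak)        # leakage current drawn out of each node
    b[0] += g * vdd                      # supply injected through the first resistor
    v, info = tfqmr(G, b)                # info == 0 signals convergence
    return G, b, v, info

G, b, v, info = dc_ir_drop(10, 0.1, 49e-6, 1.0)
residual = np.linalg.norm(G @ v - b) / np.linalg.norm(b)
print(info, residual, v[-1])
```

The resulting node voltages would serve as the initial condition of the transient run. An iterative solver only needs matrix-vector products with the sparse conductance matrix, so it avoids the fill-in a direct factorization would create.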

2) Accuracy of the LIM-Enabled Transient Simulation: The accuracy of the transient results using the LIM is compared with that from HSPICE. To enable the LIM, a fictitious capacitance of 1 fF is added to ground from all nodes in M2 and M3, and [see (28)] for all crossover capacitances. The differential transient voltages were computed at ( µm, µm) using both the LIM and HSPICE, and these results have been compared in Fig. 18. A time step of 17.8 fs was chosen. From Fig. 18, it can be observed that the result from the LIM matches very well with that from HSPICE. The maximum instantaneous relative error was less than 0.06%. This test demonstrates the accuracy of the LIM.

3) Accuracy of the Proposed Closed-Form Expressions for Fictitious Elements: The computation of the fictitious inductance is relatively easy compared to that of the fictitious capacitance to ground. Since all the terms in (29) except the accuracy-control factor are known before the simulation, the choice of this factor completes the computation of the fictitious inductance. It has been observed from many simulations that guarantees an accurate result for all problems. The computation of the fictitious capacitance to ground, however, is not straightforward. The term . From Fig. 17, it can be observed that the maximum distance between a node and its nearest power supply is less than 280 µm. Using (33), (35), and (36), the fictitious capacitance to ground from any node was found to be 0.0634 fF. Therefore, the capacitance to ground can be chosen as any value less than or equal to 0.0634 fF. Using a fictitious capacitance to ground of 0.01 fF (< 0.0634 fF), the differential transient node voltage has been computed at ( µm, µm). The time step is computed through (27) as 1.78 fs (see Fig. 19). It can be noticed in Fig. 19 that the resistances and inductances connected to the node are not the same in all branches. Therefore, the circuit is inhomogeneous. In Fig. 20, the differential node voltage at ( µm, µm) has been plotted with and without the fictitious capacitance to ground. From Fig. 20(a) and (b), it can be observed that the results are bounded. This demonstrates the validity of the upper bound of the time step shown in (27). From Fig. 20(a), it can be observed that the result with a fictitious capacitance of 0.01 fF agrees well with the result without the fictitious capacitance (this result was obtained using HSPICE). The maximum relative error between the two results is 0.4%. The result in Fig. 20(a) demonstrates the accuracy of the closed-form expressions proposed in Section VI. From Fig. 20(b), it can be observed that the result with a fictitious capacitance to ground of 0.1 fF (> 0.0634 fF) differs from the result without this capacitance. The maximum relative error between the two results is 5.2%. The time step fs when the fictitious capacitance to ground is 0.1 fF. Thus, the fictitious capacitance to ground has to be computed carefully if accuracy is not to be compromised.

Fig. 18. Comparison of the differential voltage in M1 from the LIM with that from HSPICE.



Fig. 19. Time-step calculation. Shown is the equivalent circuit near the node that has the smallest time step.

B. Large Problem

1) Test Setup: The test setup remains the same except for the following: the size of the chip is increased to 4000 µm × 4000 µm; ; the new bump locations are as shown in Fig. 21; the leakage current sources are distributed in the 4000 µm × 4000 µm area in M1; the switching current sources are confined to the center of M1 in the rectangular area bounded by locations ( µm, µm) and ( µm, µm).

2) Accuracy of the Proposed Closed-Form Expressions for Fictitious Elements: The computation of the fictitious series inductance remains unchanged from that in the small problem (since the crossover capacitance and excitation remain unchanged). However, the fictitious capacitance to ground changes, as the bump locations have changed. From Fig. 21, it can be observed that the maximum distance between a node and its nearest power supply is less than 1040 µm. Using this distance, and using (33)–(36), the fictitious capacitance to ground is computed to be any value less than 0.033 fF. Since it is a challenge to run this problem in HSPICE, the accuracy of the transient results is shown by observing the convergence of these results with the fictitious capacitance to ground. The results obtained with a fictitious capacitance to ground of 0.01 fF (< 0.033 fF) have been used as the reference result for showing the convergence. In Fig. 22, the differential transient voltages with a fictitious capacitance of 0.01 fF are compared with those obtained with a fictitious capacitance of 0.1 fF. The time steps with both these capacitances are the same as those in the small problem. From Fig. 22, it can be observed that the transient results are almost the same. The maximum relative error between the results obtained with the fictitious capacitance of 0.01 fF and the results obtained with the fictitious capacitance of 0.1 fF is 5.4%. This error is 22% when 1 fF was used and is 36% when 10 fF was used. Therefore, the maximum relative error keeps reducing with the decrease in the fictitious capacitance. Therefore, the transient results with a fictitious capacitance to ground of 0.033 fF would have a maximum relative error of less than 5.4%. Thus, this result also demonstrates the accuracy of the proposed closed-form expressions.

Fig. 20. Comparison of the transient results obtained with and without the fictitious capacitance to ground. (a) Fictitious capacitance to ground of 0.01 fF; (b) fictitious capacitance to ground of 0.1 fF.

Fig. 21. The new arrangement of power- and ground-supply bumps in M3.



Fig. 22. Convergence of transient results with the reduction in the fictitious capacitance to ground. (a) and (b) Differential transient voltage at two different locations on the grid.

TABLE III
TIME AND MEMORY REQUIREMENTS OF THE PROPOSED TRANSIENT SIMULATION APPROACH

C. Memory and Time Requirements

In this section, the memory and time taken by the simulation for the two problems are described. In Table III, the time and memory requirements of the proposed method are shown. For the small problem, the time taken per time step of the transient simulation is approximately 0.058 s. The memory required is 1.02 MB, which includes the memory required for storing the geometry (0.76 MB) and the memory required for the dc solution (0.26 MB). For the transient solution, no additional memory is required, as the node voltages (and branch currents) are solved independently. For the large problem, the time taken per time step of the transient simulation is 5.8 s. The memory required again consists of the memory for storing the geometry and the memory for the dc solution. It can be observed that both the memory requirement and the time taken per time step of the transient simulation scale linearly with the problem size and are, therefore, optimal in complexity. The total time taken for the whole transient simulation is affected by the number of time steps. Since the time step is in the femtosecond range and the total simulation time is 0.300 ns, the number of time steps runs into the thousands. Such a large number of time steps increases the total simulation time. However, the time step (and therefore the number of time steps) is independent of the problem size, as the time step depends only on the smallest inductance and capacitance values. Therefore, the proposed method is advantageous in terms of overall run time when the number of time steps is sufficiently small relative to the problem size. Such a situation arises either when the total simulation time is small and/or the time step is large, or when the problem size is large.
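Because each node voltage and each branch current is updated from previously computed neighbouring values, one time step needs no matrix solve. The sketch below illustrates this on a 1-D RLC ladder with a generic leapfrog update in the style of the LIM [18]. The function name `lim_step`, the ladder topology, and the normalized element values are illustrative assumptions, not the paper's formulation or test cases.

```python
import numpy as np

def lim_step(v, i, c, g, l, r, src, dt):
    """Advance a 1-D RLC ladder by one leapfrog step (illustrative sketch).

    v   : node voltages at t_(n-1/2), shape (N,)
    i   : branch currents at t_n, shape (N-1,); i[k] flows from node k to k+1
    c, g: capacitance and conductance to ground at each node, shape (N,)
    l, r: series inductance and resistance of each branch, shape (N-1,)
    src : current injected into each node, shape (N,)
    """
    # Net branch current leaving each node.
    out = np.zeros_like(v)
    out[:-1] += i
    out[1:] -= i
    # Node update: each voltage depends only on its own past value and on
    # the currents of the branches attached to it.
    v_new = ((c / dt - g / 2.0) * v + src - out) / (c / dt + g / 2.0)
    # Branch update: each current depends only on its two end-node voltages.
    i_new = i + (dt / l) * (v_new[:-1] - v_new[1:] - r * i)
    return v_new, i_new

# Hypothetical normalised test case: 1000 nodes, unit L and C, light loss.
n_nodes = 1000
c = np.ones(n_nodes)
g = np.zeros(n_nodes)
l = np.ones(n_nodes - 1)
r = 0.1 * np.ones(n_nodes - 1)
src = np.zeros(n_nodes)
dt = 0.5  # well below the lossless leapfrog limit sqrt(L*C) = 1 for this ladder

v = np.zeros(n_nodes)
v[0] = 1.0  # initial charge on the first node
i = np.zeros(n_nodes - 1)

n_steps = 2000
for _ in range(n_steps):
    v, i = lim_step(v, i, c, g, l, r, src, dt)

print("total charge:", float(np.sum(c * v)))
print("peak |v|:", float(np.max(np.abs(v))))
```

Each call touches every node and every branch once, so the work and storage per step grow linearly with the number of nodes, which is consistent with the linear scaling reported in this section. With no conductance to ground and no source, the total charge stays constant, which gives a quick sanity check on the update.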

In HSPICE, the memory required for the small and the large problems is 21.76 MB and 1.5 GB, respectively. While the small problem was completed faster than with the proposed method (by a factor of two for the same time step), the large problem was not completed due to the large memory requirements. This high memory requirement in HSPICE is primarily due to the memory requirements of the direct solver in HSPICE. Since the memory and time requirements of a direct solver depend on the way the nodes are numbered, these requirements can be improved with careful node numbering. One of the better complexities with a direct solver is obtained with a nested dissection node ordering [37]. For problems arising out of discretizing partial differential equations in a regular 2-D grid, it has been shown [37] that ordering the nodes in a nested dissection manner makes the memory complexity $O(N \log N)$ and the time complexity $O(N^{3/2})$, where $N = n^2$, and $n$ is the total number of nodes in one dimension. For a 3-D grid, the corresponding complexities are shown to be $O(N^{4/3})$ and $O(N^2)$, respectively, where $N = n^3$ [38]. In a power grid, the number of nodes in a line along the direction of the height of the chip is usually a constant (approximately the number of metal layers). Therefore, for power grids, $N$ is proportional to $n^2$, where $n$ is the number of nodes in a single power/ground line. Therefore, one of the better memory and time complexities achievable for a power grid problem can be in between the complexities of the nested dissection-based direct solvers in two and three dimensions. However, the proposed method guarantees $O(N)$ memory complexity and $O(N)$ time complexity per time step for the power grid problem, independent of the way the nodes are numbered. Moreover, the proposed method is as robust as a direct solver in terms of accuracy and convergence. Also, the time step is independent of the problem size. Therefore, for sufficiently large problems, the proposed method may also be advantageous in terms of runtime.
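To make the trade-off concrete, the linear per-step cost of the proposed method can be compared numerically with the standard leading-order nested-dissection factorization costs for regular grids (roughly $N^{1.5}$ operations in 2-D and $N^2$ in 3-D). The sketch below is a rough illustration only: it ignores all constant factors and the per-step triangular solves of a direct method, and the step count used in the example (0.300 ns divided by an assumed 1 fs time step) is hypothetical. The function names are likewise illustrative.

```python
def lim_total_ops(n_nodes, n_steps):
    """O(N) work per time step, repeated for every time step."""
    return float(n_nodes) * float(n_steps)

def nested_dissection_ops(n_nodes, dim):
    """Leading-order factorization cost with nested-dissection ordering."""
    exponent = {2: 1.5, 3: 2.0}
    if dim not in exponent:
        raise ValueError("dim must be 2 or 3")
    return float(n_nodes) ** exponent[dim]

def breakeven_nodes(n_steps, dim):
    """Problem size above which N * N_t drops below the factorization cost."""
    # N * N_t < N**1.5  <=>  N > N_t**2   (2-D grid)
    # N * N_t < N**2    <=>  N > N_t      (3-D grid)
    if dim == 2:
        return float(n_steps) ** 2
    if dim == 3:
        return float(n_steps)
    raise ValueError("dim must be 2 or 3")

n_steps = round(0.300e-9 / 1e-15)  # hypothetical: assumes a 1 fs time step
for n_nodes in (10**4, 10**6, 10**8):
    lim = lim_total_ops(n_nodes, n_steps)
    nd2 = nested_dissection_ops(n_nodes, 2)
    nd3 = nested_dissection_ops(n_nodes, 3)
    print(f"N={n_nodes:.0e}: LIM={lim:.1e}, ND 2-D={nd2:.1e}, ND 3-D={nd3:.1e}")
```

Under these crude assumptions the break-even problem size depends strongly on which bound applies, and a power grid lies somewhere between the 2-D and 3-D cases, so the comparison indicates a trend rather than a prediction.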

IX. CONCLUSION

The on-chip power-grid simulation has been performed using the LIM in equivalent circuits of the on-chip PDN in which some of the nodes did not have a capacitance to ideal ground and some of the nodes had a floating capacitance between them. A small capacitance to ground was added to those nodes that did not have this capacitance, and a small series inductance was added to those capacitive branches that did not have this inductance. Closed-form expressions for the fictitious capacitance to ground and the fictitious series inductance have been proposed. The accuracy of the LIM-enabled power-grid simulation has been shown, and the accuracy of the proposed closed-form expressions has been demonstrated. It has been shown that the memory complexity for the overall transient simulation is $O(N)$ and that the time complexity per time step of the transient simulation is $O(N)$, where $N$ is the number of nodes in the equivalent circuit. An expression for the upper bound on the time step has been derived for the first time for a multidimensional inhomogeneous RLC circuit. It has been found that, due to the very small values of the fictitious elements, the maximum time step of the transient simulation becomes small, and therefore the time requirements for the overall transient simulation become very high. The dependence of the runtime of the overall LIM-enabled transient simulation on the problem size has been estimated for problem sizes on the order of millions of nodes.

ACKNOWLEDGMENT

The authors would like to thank the Associate Editor and the anonymous reviewers for their suggestions to improve the quality of this manuscript.

REFERENCES

[1] K. Shepard and V. Narayanan, “Noise in deep submicron digital design,” in Proc. Int. Conf. Computer-Aided Design, Nov. 1996, pp. 524–531.

[2] A. Dharchoudhury, R. Panda, D. Blaauw, and R. Vaidyanathan, “Design and analysis of power distribution networks in PowerPC microprocessors,” in Proc. Design Autom. Conf., 1998, pp. 738–743.

[3] H. H. Chen and J. S. Neely, “Interconnect and circuit modeling techniques for full-chip power supply noise analysis,” IEEE Trans. Compon., Packag., Manuf. Technol. B, vol. 21, no. 4, pp. 209–215, Aug. 1998.

[4] L. T. Pillage, R. A. Rohrer, and C. Visweswariah, Electronic Circuit and System Simulation Methods. New York: McGraw-Hill, 1994.

[5] G. H. Golub and C. F. Van Loan, Matrix Computations, 2nd ed. Baltimore, MD: Johns Hopkins Univ. Press, 1989.

[6] Y. Saad, Iterative Methods for Linear Systems, 2nd ed. New York: Springer, 2000.

[7] L. W. Nagel, “SPICE2, A computer program to simulate semiconductor circuits,” Univ. California, Berkeley, Tech. Rep. ERL-M520, 1975.

[8] T. H. Chen and C. C. P. Chen, “Efficient large-scale power grid analysis based on preconditioned Krylov-subspace iterative methods,” in Proc. Design Autom. Conf., Aug. 2001, pp. 559–562.

[9] J. N. Kozhaya, S. R. Nassif, and F. N. Najm, “A multigrid-like technique for power grid analysis,” IEEE Trans. Comput.-Aided Des. Integr. Circuits, vol. 21, no. 10, pp. 1148–1160, Oct. 2002.

[10] W. Guo and S. X. D. Tan, “Circuit-level alternating-direction implicit approach to transient analysis of power distribution networks,” in Proc. Int. Conf. Application-Specific Integrated Circuits, Oct. 2003, vol. 1, pp. 246–249.

[11] M. Zhao, R. V. Panda, S. S. Sapatnekar, and D. Blaauw, “Hierarchical analysis of power distribution networks,” IEEE Trans. Comput.-Aided Des. Integr. Circuits, vol. 21, no. 2, pp. 159–168, Feb. 2002.

[12] H. Qian, S. R. Nassif, and S. S. Sapatnekar, “Power grid analysis using random walks,” IEEE Trans. Comput.-Aided Des. Integr. Circuits, vol. 24, no. 5, pp. 1204–1224, Aug. 2005.

[13] W. Guo, S. X. D. Tan, Z. Luo, and X. Hong, “Partial random walk for large linear network analysis,” in Proc. Int. Symp. Circuits Syst., May 2004, vol. 5, pp. 173–177.

[14] Y. Zhong and M. D. F. Wong, “Fast algorithms for IR drop analysis in large power grid,” in Proc. IEEE Conf. Comput.-Aided Des. Integr. Circuits, Nov. 2005, pp. 351–357.

[15] J. Choi, L. Wan, M. Swaminathan, B. Beker, and R. Master, “Modeling of realistic on-chip power grid using the FDTD method,” in Proc. IEEE Int. Symp. Electromagn. Compat., Aug. 2002, vol. 1, pp. 238–243.

[16] J. Mao, “Modeling of simultaneous switching noise in on-chip and power distribution networks using conformal mapping, finite-difference time-domain and cavity resonator methods,” Ph.D. dissertation, School of Elect. Comput. Eng., Georgia Inst. Technol., Atlanta, Oct. 2004.

[17] S. N. Lalgudi, Y. Kretchmer, and M. Swaminathan, “Simulation of simultaneous switching noise in on-chip power distribution networks of FPGAs,” in Proc. IEEE 14th Top. Meeting Elect. Perf. Electron. Packag., Oct. 2005, pp. 319–322.

[18] J. E. Schutt-Aine, “Latency Insertion Method (LIM) for the fast transient simulation of large networks,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 48, no. 1, pp. 81–89, Jan. 2001.

[19] Z. Deng and J. E. Schutt-Aine, “Stability analysis of Latency Insertion Method (LIM),” in Proc. IEEE 13th Top. Meeting Elect. Perf. Electron. Packag., Oct. 2004, pp. 167–170.

[20] W. Thiel and L. P. B. Katehi, “Some aspects of stability and numerical dissipation of the finite-difference time-domain (FDTD) technique including passive and active lumped elements,” IEEE Trans. Microw. Theory Tech., vol. 50, no. 9, pp. 2159–2165, Sep. 2002.

[21] F. Edelvik, R. Schuhmann, and T. Weiland, “A general stability analysis of FIT/FDTD applied to lossy dielectrics and lumped elements,” Int. J. Numer. Model., vol. 17, pp. 407–419, 2004.

[22] F. Kung and H. T. Chuah, “Stability of classical finite-difference time-domain (FDTD) formulation with nonlinear elements—A new perspective,” Progr. Electromagn. Res., vol. 42, pp. 49–89, 2003.

[23] B. Gustafsson, H. O. Kreiss, and J. Oliger, Time Dependent Problems and Difference Methods. New York: Wiley-Interscience, 1995.

[24] Z. Panda, S. Sundareswaran, and D. Blaauw, “Impact of low-impedance substrate on power supply integrity,” in Proc. IEEE Conf. Design and Test of Computers, May–June 2003, vol. 20, pp. 16–22.

[25] Z. He, M. Celik, and L. Pileggi, “SPIE: Sparse partial inductance extraction,” in Proc. Design Autom. Conf., Jun. 1997, pp. 137–140.

[26] A. Devgan, H. Ji, and W. Dai, “How to efficiently capture on-chip inductance effects: Introducing a new circuit element K,” in Proc. IEEE Conf. Comput.-Aided Des. Integr. Circuits, Nov. 2000, pp. 150–155.

[27] K. S. Yee, “Numerical solution of initial boundary value problems involving Maxwell’s equation in isotropic media,” IEEE Trans. Antennas Propag., vol. AP-14, no. 3, pp. 302–307, May 1966.

[28] A. Taflove and S. Hagness, Computational Electrodynamics. Natick, MA: Artech House, 2000.

[29] H. K. Khalil, Nonlinear Systems. New York: Macmillan Publishing Company, 1992.

[30] W. L. Brogan, Modern Control Theory, 3rd ed. Upper Saddle River, NJ: Prentice-Hall, 1992.

[31] K. Ogata, Discrete-Time Control Systems. Upper Saddle River, NJ: Prentice-Hall, 1987.

[32] G. Strang, Linear Algebra and Its Applications, 3rd ed. New York: Thomson Learning, 1998.

[33] T. Namiki and K. Ito, “A new FDTD algorithm free from the CFL condition restraint for a 2D-TE wave,” in Dig. 1999 Antennas Propagat. Symp., Jul. 1999, pp. 192–195.

[34] T. Namiki, “A 3-D ADI-FDTD method—Unconditionally stable time-domain algorithm for solving vector Maxwell’s equations,” IEEE Trans. Microw. Theory Tech., vol. 48, no. 10, pp. 1743–1748, Oct. 2000.

[35] Y. M. Lee and C. C. P. Chen, “The power grid transient simulation in linear time based on 3-D alternating-direction-implicit method,” IEEE Trans. Comput.-Aided Des. Integr. Circuits, vol. 22, no. 11, pp. 1545–1550, Nov. 2003.

[36] R. W. Freund, “A transpose-free quasi-minimal residual algorithm for non-hermitian linear systems,” SIAM J. Sci. Comput., vol. 14, pp. 470–482, Mar. 1993.

[37] A. George, “Nested dissection of a regular finite difference mesh,” SIAM J. Numer. Anal., vol. 10, no. 2, pp. 345–363, Apr. 1973.

[38] C. Ashcraft and J. W. H. Liu, “Robust ordering of sparse matrices using multisection,” SIAM J. Matrix Anal. Appl., vol. 19, no. 3, pp. 816–832, 1998.



Subramanian N. Lalgudi received the B.E. degree in electronics and communication engineering from Anna University, Chennai, India, in 1999 and the M.S. degree in electrical engineering from Iowa State University, Ames, IA, in 2003. He is working toward the Ph.D. degree in electrical engineering at the Georgia Institute of Technology, Atlanta.

From July 1999 to July 2000, he was a Software Engineer at Future Software Pvt. Ltd, Chennai, India, where he was involved in the implementation of data networking protocols. From August 2000 to August 2002, he was a Research Assistant in the Department of Electrical and Computer Engineering, Iowa State University. From August 2002 to December 2002, he was a Visiting Scholar in the Department of Electrical and Computer Engineering, Michigan State University. From January 2003 to May 2003, he was a Teaching Assistant in the Department of Electrical and Computer Engineering, Iowa State University. Since August 2003, he has been a Research Assistant in the Department of Electrical and Computer Engineering, Georgia Institute of Technology. His research interests include numerical methods, computational electromagnetics, fast algorithms, parasitic extraction, circuit simulation, interconnect analysis, and signal and power integrity.

Madhavan Swaminathan (M’95–SM’98–F’06) received the B.E. degree in electronics and communication from the University of Madras, Chennai, India, and the M.S. and Ph.D. degrees in electrical engineering from Syracuse University, Syracuse, NY.

He is currently the Joseph M. Pettit Professor in Electronics in the School of Electrical and Computer Engineering, Georgia Institute of Technology (Georgia Tech), Atlanta, and the Deputy Director of the Packaging Research Center, Georgia Tech. He is the Co-Founder of Jacket Micro Devices, a company specializing in integrated devices and modules for wireless applications, where he serves as the Chief Scientist. Prior to joining Georgia Tech, he was with the Advanced Packaging Laboratory at IBM working on packaging for supercomputers. He has over 300 publications in refereed journals and conferences, has co-authored 3 book chapters, has 15 issued patents, and has several patents pending. While at IBM, he reached the second invention plateau.

Dr. Swaminathan served as the Co-Chair for the 1998 and 1999 IEEE Topical Meeting on Electrical Performance of Electronic Packaging (EPEP), served as the Technical and General Chair for the IMAPS Next Generation IC & Package Design Workshop, serves as the Chair of TC-12, the Technical Committee on Electrical Design, Modeling and Simulation within the IEEE CPMT Society, and was the Co-Chair for the 2001 IEEE Future Directions in IC and Package Design Workshop. He is the co-founder of the IMAPS Next Generation IC & Package Design Workshop and the IEEE Future Directions in IC and Package Design Workshop. He also serves on the technical program committees of DAC, EPEP, the Signal Propagation on Interconnects (SPI) workshop, the Solid State Devices and Materials Conference (SSDM), the Electronic Components and Technology Conference (ECTC), and the International Symposium on Quality Electronic Design (ISQED). He is also the author of the book Power Integrity Modeling and Design for Semiconductors and Systems (Prentice Hall, Nov. 2007) and co-editor of the book Introduction to SOC, SIP and SOP (McGraw Hill, to appear in 2008). He was a Guest Editor of the IEEE TRANSACTIONS ON ADVANCED PACKAGING and IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES. He was the Associate Editor of the IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGY. Dr. Swaminathan is the recipient of the 2002 Outstanding Graduate Research Advisor Award from the School of Electrical and Computer Engineering, Georgia Tech, and the 2003 Outstanding Faculty Leadership Award for the mentoring of graduate research assistants from Georgia Tech. He is also the recipient of the 2003 Presidential Special Recognition Award from the IEEE CPMT Society for his leadership of TC-12 and the IBM Faculty Award in 2004 and 2005. He has also served as the coauthor and advisor for a number of outstanding student paper awards at EPEP’00, EPEP’02, EPEP’03, EPEP’04, ECTC’98, APMC’05 and the 1997 IMAPS Education Award. Dr. Swaminathan is the recipient of the Shri. Mukhopadyay best paper award at the International Conference on Electromagnetic Interference and Compatibility (INCEMIC), Chennai, India, 2003, the 2004 best paper award in the IEEE TRANSACTIONS ON ADVANCED PACKAGING, the 2004 commendable paper award in the IEEE TRANSACTIONS ON ADVANCED PACKAGING, and the best poster paper award at ECTC 2004 and 2006. In 2007, Dr. Swaminathan and his students were recognized for their research by the Technical Excellence Award given by Semiconductor Research Corporation (SRC) and Global Research Corporation (GRC). His research interests are in mixed signal micro-system and nano-system integration.

Yaron Kretchmer (M’07) is with the ICCAD-Engineering division of Altera Corporation, San Jose, CA.

