
IEEE 2006 International Symposium on VLSI Design, Automation and Test, Ambassador Hotel, Hsinchu (2006.4.26-2006.4.26)

A Supply-Gating Scheme for Both Data-Retention and Spike-Reduction in Power Management and Test Scheduling

Tsung-Chu Huang, Jing-Chi Tzeng, Yuan-Wei Chao, Ji-Jan Chen, Wei-Ting Liu and Kuen-Jong Lee

Department of Electronic Engineering, National Changhua University of Education, 1 Chinde Rd., Changhua, Taiwan
SoC Technology Center, Industrial Technology Research Institute, 195-4, Sec. 4, Chung Hsing Rd., Hsinchu, Taiwan
Department of Electrical Engineering, National Cheng Kung University, 1 Da-Hsueh Rd., Tainan, Taiwan

INTRODUCTION

In the system-on-a-chip (SoC) revolution, power dissipation has become one of the most important issues [1]. This issue has received much attention in circuit design [2], while increasing effort has recently been devoted to low-power testing of CMOS logic circuits [3]. For reducing dynamic power dissipation, much work has been devoted to vector or chain reordering, clock gating, activity controlling or inhibiting, and circuit or chain partitioning [4, 5]. However, in the deep-submicron revolution, where the supply and threshold voltages are scaled down, static leakage becomes a critical issue. In recent technology, leakage accounts for about 20% of the total chip power dissipation [6] and even over 40% for some high-performance SoC designs [7]. Without advanced reduction techniques, leakage will dominate the power dissipation in the coming nanometer era.

In leakage-reduction techniques, sleep transistors are usually inserted to turn off idle circuits and thus reduce the total power dissipation in both power management and test scheduling. However, when switching to and from the sleeping mode, the supply-gated circuits are simultaneously discharged or charged, so that a fatal peak power bursts out instantaneously. This on-and-off peak power, called a spike for short in this paper, causes problems such as IR drop, electromigration, malfunction and even a risk of damage. In power-constrained concurrent systems, such as power management and test scheduling in an SoC mobile system, even though the spikes may stay under the power constraint, they are often much higher than the estimated peak power in the settled (power-steady) state. The sleep/wake-up time also seriously affects other active modules and makes power management and test scheduling either unreliable or inefficient [8]. For instance, Fig. 1 shows an example of power management or test scheduling, in which the power-gating spikes of session D push the total power dissipation well above the constraint. Therefore, reducing the power-switching spikes through the sleep transistors becomes another critical issue.

To reduce the spikes and wake-up time, the authors in [9] propose a zigzag super-cutoff structure for supply-gating CMOS circuits. Fig. 2 shows the basic concept. The circuit is partitioned into bipartites according to the logic values of the gate outputs under a given pattern at the primary inputs (PIs). The gates with an output of 0 or 1 are then clustered with an alternating unilateral virtual supply (VDD') or virtual ground (GND'), respectively. Although this sleep transistor allocation structure can easily reduce the spike-wakeup-time product (STP), it has the following drawbacks. The unilaterally unbalanced structure tends to result in unequal noise margins and rise/fall times, and it usually has a more serious performance impact in low-voltage normal operation.

[Fig. 2 diagram: rails VDD, GND' and GND with SLEEP and WAKE controls]
Fig. 2 A zigzag structure in [9].

For circuits in which the sleep transistors have been clustered under other criteria of unilateral sleep transistor allocation [10, 11], the authors in [12] develop an automatic control pattern generator (ACPG) [13] to minimize the power-gating spikes during the post-gating phase, for both power management in normal operation and test scheduling in test mode [14]. This scheme avoids changing the original physical design; however, its control efficiency is usually lower than that of [9]. Additionally, in both previous works [9, 12], since the flipflops are pulled unilaterally in sleeping mode, data retention becomes a problem, which may make session slicing difficult [15].
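The bipartition step of the zigzag structure in [9] can be illustrated with a small sketch: simulate a tiny netlist under one PI pattern and group the gates by output value, with output-0 gates assigned to the virtual supply VDD' and output-1 gates to the virtual ground GND', as described above. The netlist, its gate names and the dictionary format are hypothetical and only show the grouping rule; this is not the implementation of [9].

```python
from typing import Dict, List, Tuple

# Hypothetical netlist: gate name -> (function, input names), listed in topological order.
NETLIST: Dict[str, Tuple[str, List[str]]] = {
    "g1": ("NAND", ["a", "b"]),
    "g2": ("NOR", ["b", "c"]),
    "g3": ("NOT", ["g1"]),
    "g4": ("NAND", ["g2", "g3"]),
}

# Boolean functions on 0/1 input lists.
OPS = {
    "NOT": lambda xs: 1 - xs[0],
    "AND": lambda xs: min(xs),
    "OR": lambda xs: max(xs),
    "NAND": lambda xs: 1 - min(xs),
    "NOR": lambda xs: 1 - max(xs),
}

def simulate(netlist, pattern):
    """Zero-delay logic simulation of every gate for one primary-input pattern."""
    values = dict(pattern)
    for gate, (func, inputs) in netlist.items():  # dict order is the topological order here
        values[gate] = OPS[func]([values[i] for i in inputs])
    return values

def zigzag_bipartition(netlist, pattern):
    """Output 0 -> gated through virtual supply VDD'; output 1 -> through virtual ground GND'."""
    values = simulate(netlist, pattern)
    vdd_gated = [g for g in netlist if values[g] == 0]
    gnd_gated = [g for g in netlist if values[g] == 1]
    return vdd_gated, gnd_gated

if __name__ == "__main__":
    print(zigzag_bipartition(NETLIST, {"a": 1, "b": 1, "c": 0}))
```

For the pattern a = 1, b = 1, c = 0 this prints (['g1', 'g2'], ['g3', 'g4']). A different PI pattern generally yields a different grouping, which is why the partition in [9] is tied to a given pattern.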

[Fig. 1 plot: power vs. test time, with the power-up (D) and shut-down (D) spikes of session D]
Fig. 1 A power-gating problem during scheduling.

This paper proposes a self-selectively bilateral structure for sleep transistor allocation. For combinational circuits, dummy sleep transistors are allocated in the opposite rail for balance. For each flipflop, the output automatically selects the least-spike sleep transistor to turn off and is pulled to the other side to hold the data. The rest of this paper is organized as follows. In the next section, a power switching model is presented to explain the concept of spike reduction. Based on this model, a switching-activity-based heuristic is then developed in Section PROPOSED SCHEME to partition the circuit into clusters, and the proposed structure is illustrated. Prior to the conclusions, the experiments compare the STP with previous work and show the novel features of our work.

This work is supported by the Industrial Technology Research Institute, Taiwan, R.O.C., under Contract No. S3-4S0*.

1-4244-0180-1/06/$20.00 ©2006 IEEE

MODELING

In this paper, the power dissipation is classified into power-steady dissipation and power-gating dissipation. For completeness and comparison, the power-steady dissipation models are briefly reviewed.

The supply-steady power dissipation in CMOS logic circuits consists of dynamic and static parts. The dynamic power dissipation of gate g at clock t is approximated as follows.

$$P_d(g,t) = \frac{1}{2}\,\tau(g,t)\,f\,C_L(g)\,V_{DD}^2 \qquad (1)$$

where f, C_L and τ(g,t) are, respectively, the frequency, the loading capacitance and the Boolean value for the transition on g at t. The short-circuit power dissipation is usually approximated as below.

$$P_{sc}(g,t) = \frac{\beta}{12}\,\tau(g,t)\,\gamma\,(V_{DD} - 2V_T)^3 \qquad (2)$$

where β is the transistor gain, γ is the ratio of the rise (fall) time to the period, and V_T is the threshold voltage.

As for the static power, this paper estimates the ratio of leakage to the total power dissipation by measuring a set of small circuits. Thus the total supply-steady power dissipation of the circuit with G gates can be estimated by our simulation-based tool as

$$P = P_d + P_{sc} + P_s \approx \frac{P_{d,1}\sum_{g=1}^{G}\tau_g F_g + P_{sc,1}\sum_{g=1}^{G}\tau_g}{1-L} \qquad (3)$$

where F_g is the fanout count of gate g and L is the leakage rate.

Power-Gating Dissipation

Without loss of generality, consider only the circuit with a p-type sleep transistor, as illustrated in Fig. 3(a). When the sleep transistor is turned on while the p-network is still off, the first spike occurs with a time constant of about R_sC_s, where R_s is the equivalent output resistance seen from the drain of the sleep transistor. The power-switching capacitance C_s is the equivalent capacitance including C_dg, C_db and C_ds (subscripts d, g, b and s denote drain, gate, bulk and source, respectively) of the sleep transistor and C_sg, C_sb and C_sd of the pioneer transistors (the transistors nearest to the sleep transistor) in the p-network. If the p-network is turned on after an input delay t_d, the loading capacitance C_L is then charged to V_DD. Fig. 3(b) shows the power-up spikes in this case.

[Fig. 3 panels: (a) power-gated circuit (SLEEP, VDD, P-Network, N-Network, C_L); (b) typical power-up spikes]
Fig. 3 A power-gated circuit and the typical spikes.

Clearly, the gate delay is critical to the accumulated spike, but this model is sufficient to illustrate the effect of spike reduction using our scheme. Based on the peak-power model using energy per cycle under the assumption of zero delay, only the total charge during the supply switching time is considered. As a result, the spike defined in this paper is proportional to the total equivalent capacitance seen from the sleep transistor:

$$P \propto C_s + (n \oplus v)\,C_L \qquad (4)$$

where v is the gate output value and n is the sleep transistor type (0 for p-type and 1 for n-type). Note that the loading capacitance C_L is also assumed to be proportional to the fanout count F_g. The spike through an n_g-type sleep transistor is then estimated as

$$\hat{P} \approx \sum_{g=1}^{G}\left(K_S + K_L\,(n_g \oplus v_g)\,F_g\right)P_\Delta \qquad (5)$$

where the spike unit P_Δ, the power-gating dissipation factor K_S and the loading dissipation factor K_L are estimated by comparison with SPICE simulation results on a set of small circuits. Note that the weighted transition count (WTC) of Eq. (5) is not accurate for all circuits, but it is sufficient as a heuristic.

PROPOSED SCHEME

Fig. 4 shows the basic concept of the proposed structure for a primitive gate. Unlike the unilateral structure in previous work [9, 12], both the p-type and n-type supply switches (sleep transistors S_p and S_n) are allocated for potential use; which one is used depends on the output value in the last state of normal operation. When OUT = 0 (1), S_n (S_p) remains on and pulls the output through the N- (P-) network. This positive feedback loop holds the output value and provides a stronger drive to the downstream gates in sleeping mode. Since S_p (S_n) is turned off, the leakage is greatly reduced.

[Fig. 4 diagram: WAKE, VDD, S_p, P-Network, OUT, N-Network, SLEEP, S_n]
Fig. 4 The basic concept of the proposed structure.

Note that the extra NOR and NAND gates can also be supply-gated in clusters without data retention. Additionally, the extra NOR and NAND gates in the conceptual figure impose a high overhead on a primitive gate but a negligible one on a cluster of gates. A pair of supply switches should be built in individually for every cluster with data retention. For instance, a D-flipflop can be bipartitioned into 3 shadowed and 2 blank primitive gates, as shown in Fig. 5, and only one pair of bilateral supply switches is built into the flipflop. Regardless of the output value, the state is held even when the supply is cut off instantly.
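The supply-steady estimate of Eq. (3) and the WTC spike estimate of Eq. (5) can be sketched side by side. This is a minimal illustration, not the authors' simulation-based tool: the Gate record, the unit powers p_d1 and p_sc1 (read here as per-unit dynamic and short-circuit power) and the sample gates are assumptions, while K_S = 0.5 and K_L = 1 are the values assumed in the experiments. The helper self_selected_types encodes our reading of the bilateral structure: the switch that turns off is S_p when OUT = 0 and S_n when OUT = 1, so n_g = v_g and the load term of Eq. (5) vanishes.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Gate:
    fanout: int   # F_g: fanout count (C_L is assumed proportional to it)
    value: int    # v_g: output value held from the last normal-mode state (0 or 1)
    toggles: int  # tau_g: 1 if the output makes a transition in the cycle, else 0

def steady_power(gates: List[Gate], p_d1: float, p_sc1: float, leak_rate: float) -> float:
    """Eq. (3): dynamic plus short-circuit power, scaled by 1/(1 - L) for leakage."""
    dynamic = p_d1 * sum(g.toggles * g.fanout for g in gates)
    short_circuit = p_sc1 * sum(g.toggles for g in gates)
    return (dynamic + short_circuit) / (1.0 - leak_rate)

def wtc_spike(gates: List[Gate], sleep_types: List[int],
              k_s: float = 0.5, k_l: float = 1.0, p_unit: float = 1.0) -> float:
    """Eq. (5): weighted transition count of the power-gating spike.

    sleep_types[i] is n_g of gate i (0 = p-type, 1 = n-type switched transistor);
    the load term K_L * F_g only counts when n_g XOR v_g == 1.
    """
    return sum((k_s + k_l * (n ^ g.value) * g.fanout) * p_unit
               for g, n in zip(gates, sleep_types))

def self_selected_types(gates: List[Gate]) -> List[int]:
    """Assumed bilateral rule: the switch turned off has type n_g = v_g."""
    return [g.value for g in gates]

if __name__ == "__main__":
    gates = [Gate(2, 1, 1), Gate(1, 0, 0), Gate(3, 1, 1)]
    print(steady_power(gates, p_d1=1.0, p_sc1=0.1, leak_rate=0.2))  # about 6.5
    print(wtc_spike(gates, [0, 0, 0]))                   # all p-type headers: 6.5
    print(wtc_spike(gates, self_selected_types(gates)))  # self-selected: 1.5
```

With all three sample gates gated by p-type headers, the WTC is 6.5 units; the self-selected allocation leaves only the K_S terms, 1.5 units. This toy case only illustrates why the bilateral choice lowers the estimated spike; it is not a reported result.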



Fig. 5 A clustered D-flipflop.

As for the combinational part of the circuit, n random logic simulations are first applied to the combinational circuit N. The ith simulation separates N into bipartites N_i^0 and N_i^1. A well-known Hamming-distance-based clustering algorithm is then modified to partition N into clusters with similar gate counts [16], using the public tool in [17]. The modifications are the weighted Hamming distance based on the heuristic of Eq. (5), the termination of simulations by greedy criteria, and the dropping of qualified clusters to accelerate the partitioning. The gate nearest to the Hamming center of each cluster is elected as its delegate.

Fig. 6 shows the proposed supply-gating structure for the other clusters of the circuit. Control rails WAKE and SLEEP are provided for each cluster in the whole circuit. In each cluster, a pair of virtual supply (VDD0 or VDD1) and virtual ground (GND0 or GND1) rails is controlled by the delegate. Considering placement and routing, opposite clusters, such as clusters i and j, tend to be enclosed in one block, as the dashed rectangle shows.

[Fig. 6 diagram: clusters between rails VDD, VDD0, GND1, GND0 and GND, with WAKE and SLEEP control rails]
Fig. 6 Proposed supply-gating structure in this paper.

When a control pattern can be applied at the PIs and pseudo-PIs (PPIs), either by scanning or by resetting, the bipartites in a cluster are distinctly separated and a 100% control rate can be achieved. For some applications, such as accidental switching, even when no control pattern is applied to the PIs and PPIs, the flipflop clusters can still be controlled with data retention, and most gates in the combinational circuit are expected to be controlled by the feedback from the delegate gates.

In the first part of our experiments, we verify the data-retention ability of our structure. Some small sequential circuits are simulated in SPICE. Fig. 7 shows one simulated result for a 4-bit ripple counter consisting of four T-flipflops. Each T-flipflop is also partitioned into bipartites like the D-flipflop in Fig. 5 and supply-gated bilaterally. Only the three least-significant bits are shown, which is enough to validate data retention during the sleep mode. The short wake-up time of several nanoseconds also makes power management and test scheduling feasible.

[Fig. 7 waveforms: Clk, Sleep, Q0 to Q2 over 0 to 200 ns]
Fig. 7 Illustrating data retention of a ripple supply-gated counter.

In the second part of our experiments, we run SPICE simulations for small circuits, including some cascades of identical inverters and the ISCAS89 benchmark circuits. These simulations also measure the related parameters of the heuristic model and show their positive correlation. Although the spike depends on the size of each sleep transistor, we directly assume K_S = 0.5 and K_L = 1 here. Fig. 8 shows the SPICE simulations of a cascade of 8 identical inverters. The peak value of the supply current spike is reduced from 827 µA to 557 µA when the input is controlled. In terms of energy dissipation, the charge is measured by the integral of the curves, and 88% of the charge dissipation can be eliminated.

[Fig. 8 plot: supply step current response, uncontrolled vs. controlled; time axis 0 to 2 ns]
Fig. 8 Simulation results of a cascade of 8 inverters.

In the last part of our experiments, we only verify the spike reduction, without comparison with the previous work in [9]. Due to the time complexity of circuit-level simulation, only small ISCAS89 benchmark circuits are simulated in SPICE to validate the power-gating dissipation model. Table 1 shows the experimental results for ten ISCAS89 benchmark circuits using the weighted transition count defined in Eq. (5), together with SPICE simulation results for three small circuits. The unit of each column is shown in parentheses. Column 1 shows the FF count and gate count of each circuit. In Column 2, circuits with randomly allocated sleep transistors are simulated for comparison. Column 3 shows the results of our scheme. In Columns 2 and 3, the results based on the WTC heuristic model and on the SPICE BSIM3 model are listed in Subcolumns WTC and SPICE Simulation, respectively. Sub-subcolumns Spike and Twakeup give the maximum spike current and the time needed to settle down to the normal mode.

From Rows s27-s510, we find that the correlation between the peaks in the WTC calculation and the Spike in SPICE simulation can be above 87%. Therefore, the WTC model is applied to the large circuits in Rows s1238-s38584. Columns labeled with a percent sign (%) show the associated rates: %Spike gives the spike of our scheme as a percentage of that of random allocation, while %Time and %STP give reduction rates. As a result, if the sleep transistors can be allocated in advance, the WTC spike drops to 83.2% of the random-allocation value on average, and the SPICE-simulated STP is reduced by 93.1% on average.
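The clustering step of Section PROPOSED SCHEME can be sketched in simplified form. Each gate gets a signature, the vector of its output values over the n random simulations; gates are grouped greedily by Hamming distance into clusters of a fixed size, and the member nearest to the Hamming center (here taken as the bitwise majority of the members' signatures) is elected delegate. This is an assumption-laden stand-in, not the authors' partitioner: it uses a plain, unweighted Hamming distance instead of the Eq. (5)-weighted one, omits the greedy termination and cluster-dropping rules, and does not use hMETIS [17]. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Signature = Tuple[int, ...]

def signatures(runs: List[Dict[str, int]]) -> Dict[str, Signature]:
    """runs[i][g] is the output of gate g in the i-th random simulation."""
    return {g: tuple(run[g] for run in runs) for g in runs[0]}

def hamming(a: Signature, b: Signature) -> int:
    return sum(x != y for x, y in zip(a, b))

def hamming_center(sigs: List[Signature]) -> Signature:
    """Bitwise majority of the signatures (ties go to 0)."""
    return tuple(int(2 * sum(bits) > len(sigs)) for bits in zip(*sigs))

def greedy_clusters(sig: Dict[str, Signature], size: int) -> List[List[str]]:
    """Greedily group gates into clusters of `size` gates with close signatures."""
    remaining = sorted(sig)
    clusters = []
    while remaining:
        seed = remaining.pop(0)
        # Stable sort: ties keep their previous order, so the result is deterministic.
        remaining.sort(key=lambda g: hamming(sig[g], sig[seed]))
        clusters.append([seed] + remaining[:size - 1])
        remaining = remaining[size - 1:]
    return clusters

def delegate(cluster: List[str], sig: Dict[str, Signature]) -> str:
    """Elect the member nearest to the cluster's Hamming center."""
    center = hamming_center([sig[g] for g in cluster])
    return min(cluster, key=lambda g: (hamming(sig[g], center), g))

if __name__ == "__main__":
    runs = [
        {"g1": 0, "g2": 0, "g3": 1, "g4": 1},
        {"g1": 0, "g2": 0, "g3": 1, "g4": 0},
        {"g1": 1, "g2": 0, "g3": 1, "g4": 1},
    ]
    sig = signatures(runs)
    for c in greedy_clusters(sig, size=2):
        print(c, "delegate:", delegate(c, sig))
```

For the three sample runs this prints ['g1', 'g2'] with delegate g2 and ['g4', 'g3'] with delegate g4. The bitwise majority is used because, for binary vectors, it minimizes the total Hamming distance to the members.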



CONCLUSIONS

Power gating using sleep transistors is a trend in power management and test scheduling in the deep-submicron and even nanometer revolutions. This paper develops a sleep transistor allocation structure that can not only reduce the spike-time product with data retention but also balance the noise margins and timing in active mode. A switching-activity-based model is developed as a heuristic for sleep transistor clustering. Under the proposed model, the spike estimated by the WTC heuristic averages about 83% of that of random unilateral allocation, while the SPICE-simulated STP of the three small circuits is reduced by about 93% on average.

REFERENCES

[1] L. Whetsel. Addressable Test Ports: An Approach to Testing Embedded Cores. In Proc. IEEE Int'l Test Conference, pages 1055-1064, 1999.
[2] A. P. Chandrakasan, S. Sheng and R. W. Brodersen. Low-Power CMOS Digital Design. IEEE Journal of Solid-State Circuits, 27(4):473-484, Apr. 1992.
[3] P. Girard. Low Power Testing of VLSI Circuits: Problems and Solutions. In Proc. IEEE Int'l Symp. on Quality Electronic Design, pages 173-179, 2000.
[4] T.-C. Huang. Low-Power Testing for CMOS Logic Circuits. PhD Dissertation, Nat'l Cheng Kung Univ., Taiwan, Jan. 2002.
[5] N. Nicolici and B. M. Al-Hashimi. Power-Constrained Testing of VLSI Circuits. Kluwer, NY, 2003.
[6] E. Acar, A. Devgan, R. Rao, Y. Liu, H. Su, S. Nassif and J. Burns. Leakage and Leakage Sensitivity Computation for Combinational Circuits. In Proc. ISLPED, pages 96-99, Aug. 2003.
[7] W. Liao, J. M. Basile and L. He. Leakage Power Modeling and Reduction with Data Retention. In Proc. Int'l Conf. on Computer-Aided Design, Nov. 2002.
[8] S. Kim, S. V. Kosonocky and D. R. Knebel. Understanding and Minimizing Ground Bounce during Mode Transition of Power Gating Structures. In Proc. Int'l Symp. on Low-Power Electronics and Design, pages 22-25, 2003.
[9] K.-S. Min, H. Kawaguchi and T. Sakurai. Zigzag Super Cut-off CMOS (ZSCCMOS) Block Activation with Self-Adaptive Voltage Level Controller: An Alternative to Clock-Gating Scheme in Leakage Dominant Era. ISSCC Digest of Technical Papers, pages 400-401, Feb. 2003.
[10] R. Vilangudipitchai and P. T. Balsara. Power Switch Network Design for MTCMOS. In Proc. Int'l Conf. on VLSI Design, pages 836-839, 2005.
[11] M. Anis, S. Areibi, M. Mahmoud and M. Elmasry. Dynamic and Leakage Power Reduction in MTCMOS Circuits Using an Automated Efficient Gate Clustering Technique. In Proc. Design Automation Conf., pages 480-485, June 2002.
[12] J.-C. Tzeng and T.-C. Huang. Vector Control Technique and Sleep-Transistor Allocation for Supply-Gating Current Spike Reduction in Power Management. To be published in IEE Proc. Mobility Conference, Nov. 2005.
[13] T.-C. Huang and K.-J. Lee. Reduction of Power Consumption in Scan-based Circuits during Test Application by an Input Control Technique. IEEE Trans. on CAD of Circuits and Systems, 20(7):911-917, Jul. 2001.
[14] R. M. Chou, K. K. Saluja and V. D. Agrawal. Scheduling Tests for VLSI Systems under Power Constraints. IEEE Trans. on VLSI Systems, 5(2):175-185, June 1997.
[15] W. Liao, J. M. Basile and L. He. Leakage Power Modeling and Reduction with Data Retention. In Proc. IEEE Int'l Conf. CAD, pages 714-719, Nov. 2002.
[16] A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. NY: Prentice Hall, 1988.
[17] G. Karypis and V. Kumar. hMETIS 1.5: A Hypergraph Partitioning Package. Technical report, Department of Computer Science, University of Minnesota, 1998.

Table 1: Power-gating spike reduction using our scheme.

Column groups: ISCAS89 Benchmark (Circuit, #FFs, #Gates) | Random Unilateral Allocation (WTC Spike; SPICE Simulation: Spike, Twakeup, STP) | Our Scheme (WTC Spike, %Spike; SPICE Simulation: Spike, %Spike, Twakeup, %Time, STP, %STP)

Circuit | #FFs | #Gates | WTC Spike | Spike (µA) | Twakeup (ns) | STP (µA·ns) | WTC Spike | %Spike | Spike (µA) | %Spike | Twakeup (ns) | %Time | STP (µA·ns) | %STP
s27 | 3 | 13 | 11.0 | 1014 | 5.3 | 5374.2 | 8.5 | 77.3 | 438 | 43.2 | 0.97 | 81.7 | 424.9 | 92.1
s444 | 21 | 202 | 110.0 | 3299 | 11.4 | 37608.6 | 88.9 | 80.8 | 1781 | 54.0 | 1.42 | 87.5 | 2529.0 | 93.3
s510 | 6 | 217 | 104.0 | 4178 | 12.8 | 53478.4 | 84.8 | 81.5 | 2152 | 51.5 | 1.54 | 88.0 | 3314.1 | 93.8
s1238 | 18 | 526 | 294.5 | NA | NA | NA | 240.9 | 81.8 | NA | NA | NA | NA | NA | NA
s1423 | 74 | 731 | 543.0 | NA | NA | NA | 449.6 | 82.8 | NA | NA | NA | NA | NA | NA
s5378 | 179 | 2958 | 1747.5 | NA | NA | NA | 1459.2 | 83.5 | NA | NA | NA | NA | NA | NA
s9234 | 228 | 5825 | 3468.0 | NA | NA | NA | 2927.0 | 84.4 | NA | NA | NA | NA | NA | NA
s13207 | 669 | 8620 | 5446.0 | NA | NA | NA | 4694.5 | 86.2 | NA | NA | NA | NA | NA | NA
s15850 | 597 | 10369 | 6405.5 | NA | NA | NA | 5540.8 | 86.5 | NA | NA | NA | NA | NA | NA
s38584 | 1452 | 20706 | 14226.0 | NA | NA | NA | 12447.8 | 87.5 | NA | NA | NA | NA | NA | NA
Average | | | 3236 | 2830 | 9.8 | 32153.7 | 2794.2 | 83.2 | 1457 | 49.6 | 1.3 | 85.7 | 2089.3 | 93.1

NA: Not Available
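The percentage columns of Table 1 can be cross-checked against its raw columns. The sketch below encodes our reading of the table, not a definition given by the authors: STP = Spike × Twakeup, %Spike is the spike of our scheme as a percentage of the random-allocation spike, and %Time and %STP are reduction rates. The data are copied from Table 1; the function names are hypothetical.

```python
# Values from Table 1: (circuit, random Spike, random Twakeup, our Spike, our Twakeup)
SPICE_ROWS = [
    ("s27", 1014, 5.3, 438, 0.97),
    ("s444", 3299, 11.4, 1781, 1.42),
    ("s510", 4178, 12.8, 2152, 1.54),
]
# WTC spikes (random allocation, our scheme) for the ten circuits
WTC_ROWS = [(11.0, 8.5), (110.0, 88.9), (104.0, 84.8), (294.5, 240.9), (543.0, 449.6),
            (1747.5, 1459.2), (3468.0, 2927.0), (5446.0, 4694.5), (6405.5, 5540.8),
            (14226.0, 12447.8)]

def spice_rates(rnd_spike, rnd_t, our_spike, our_t):
    """Return (random STP, our STP, %Spike, %Time, %STP), rounded to one decimal."""
    rnd_stp = rnd_spike * rnd_t              # STP = Spike x Twakeup
    our_stp = our_spike * our_t
    pct_spike = 100 * our_spike / rnd_spike  # our spike relative to random allocation
    pct_time = 100 * (1 - our_t / rnd_t)     # wake-up time reduction
    pct_stp = 100 * (1 - our_stp / rnd_stp)  # STP reduction
    return tuple(round(x, 1) for x in (rnd_stp, our_stp, pct_spike, pct_time, pct_stp))

def mean_wtc_ratio(rows):
    """Average of our WTC spike as a percentage of the random-allocation spike."""
    return sum(100 * ours / rnd for rnd, ours in rows) / len(rows)

if __name__ == "__main__":
    for name, *vals in SPICE_ROWS:
        print(name, spice_rates(*vals))
    print("mean WTC %Spike:", round(mean_wtc_ratio(WTC_ROWS), 1))
```

Running it prints (5374.2, 424.9, 43.2, 81.7, 92.1) for s27 and a mean WTC %Spike of 83.2, matching the corresponding entries of Table 1 under this reading.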
