invited paper extended dynamic voltage scaling for low...

Invited Paper

Extended Dynamic Voltage Scaling for Low Power DesignBo Zhai, David Blaauw, Dennis Sylvester, *Krisztian Flautner

{bzhai, blaauw, dennis}@umich.edu, University of Michigan, Ann Arbor, MI* [email protected], ARM Ltd., Cambridge, UK

AbstractDynamic voltage scaling (DVS) is a popular approach for energy

reduction of integrated circuits. Current processors that use DVS typ-ically have an operating voltage range from full to half of the maxi-mum Vdd. However, it is possible to construct designs that operateover a much larger voltage range: from full Vdd to subthreshold volt-ages. This possibility raises the question of whether a larger voltagerange improves the energy efficiency of DVS. First, from a theoreti-cal point of view, we show that for subthreshold supply voltagesleakage energy becomes dominant, making “just in time completion”energy inefficient. We derive an analytical model for the minimumenergy optimal voltage and study its trends with technology scaling.Second, we compare several different low-power approaches includ-ing MTCMOS, standard DVS and extended DVS to subthresholdoperation. Study of real applications on commercial processor showsthat extended DVS has the best energy efficiency. Therefore, we con-clude that extending the voltage range below Vdd/2 will improve theenergy efficiency for most processor designs.

1 Introduction

Due to technology scaling, microprocessor performance hasincreased tremendously albeit at the cost of higher power consump-tion. Energy efficient operation has therefore become a very pressingissue, particularly in mobile applications that are battery operated.Dynamic voltage scaling (DVS) was proposed as an effectiveapproach to reduce energy use and is now used in a number of low-power processor designs [1][2][3].

Most applications do not always require the peak performancefrom the processor. Hence, in a system with a fixed performancelevel, certain tasks complete ahead of their deadline and the proces-sor enters a low-leakage sleep mode [4] for the remainder of the time.This operation is illustrated in Figure 1(a).

In DVS systems however, the performance level is reduced duringperiods of low utilization such that the processor finishes each task“just in time,” stretching each task to its deadline, as shown in Figure1(b). As the processor frequency is reduced, the supply voltage canbe reduced. As shown by the equations below1, the reduction in fre-quency [5] combined with a quadratic reduction from the supplyvoltage results in an approximately cubic reduction of power con-sumption. However, with reduced frequency the time to complete atask increases, leading to an overall quadratic reduction in the energyto complete a task.

DVS is therefore an effective method to reduce the energy consump-tion of a processor, especially under wide variations in workload thatare increasingly common in mobile applications. Hence, extensivework has been performed on how to determine voltage schedules thatmaximize the energy savings obtained from DVS [4][8].

In most current DVS processor designs, the voltage range is lim-ited from full Vdd to approximately Vdd/2. In Table 1, the availablerange of operating voltages and associated performance levels areshown for three commercial designs. The lower limit of voltage scal-

ing is typically dictated by voltage and noise-sensitive circuits, suchas pass-gates, PLLs, and sense amps and results from applying DVSto a processor “as is” without special redesign to accommodate oper-ation over a wide range of voltage levels. However, it is well knownthat CMOS circuits can operate over a very large range of voltagelevels down to less then 200mV. In such “subthreshold” operatingregimes, the supply voltage lies below the threshold voltage and thecircuit operates using leakage currents. Work has been reported ondesigns that operate at subthreshold voltages [6][7] and it wasreported that the minimum allowable supply voltage of a functionalCMOS inverter is 36mV [9]. An example of a commercial productthat uses subthreshold operation for extremely low power applica-tions is shown in [10].

With some additional design effort, it is possible to significantlyextend the operating voltage range of processors. One issue thatneeds to be addressed is the determination of a lower limit of thevoltage range for optimal energy efficiency. The optimal voltage limitdepends on two factors: the power/delay trade-offs at low operatingvoltages and the workload characteristics of the specific processor. Inthis paper we address both of these issues.

First, we show that the quadratic relationship between energy andVdd deviates as Vdd is scaled down into the subthreshold region ofMOSFETs. In subthreshold operation the “on-current” takes the formof subthreshold current, which is exponential with Vdd, causing thedelay to increase exponentially with voltage scaling. Since leakageenergy is linear with the circuit delay, the fraction of leakage energyincreases with supply voltage reduction in the subthreshold regime.Although dynamic energy reduces quadratically, at very low voltageswhere dynamic and leakage energy become comparable, the totalenergy can increase with voltage scaling due to the increased circuitdelay. In this paper, we derive an analytical model for the voltage thatminimizes energy and we show that it lies well above the previouslyreported [9] minimal allowable operating voltage of 36mV. We verifyour model using SPICE and also study its trends as a function of dif-ferent design and process parameters. As one of the results, our workshows that operation at voltages well below threshold is neverenergy-efficient.

A second issue that determines whether to apply such aggressive,or extended, DVS (which we refer to as INSOMNIAC) is based onthe workload characteristics of the processor. Clearly it is not neces-sary to extend the voltage range below that which is needed based on

1. The 1.3-power [5] scaling of current is only valid for high supply voltages when car-rier velocity saturates. Subthreshold scaling of the supply voltage with performancefor low voltage operation will be extensively discussed in Section 3.

Figure 1. Illustration of optimal task scheduling(a) (b)

t

t0

0

Vdd2

Vdd1

Vd

dF

req

f2

f1

t

t

Freq

0

fnormal

0

Vddnormal

Vd

d

task1 task2 task1 task2

Delay 1f---=

CsV ddIdsat

------------------V dd

V dd V th–( )1.3---------------------------------------∝= Power fV dd

2∝ Energy V dd2∝

Table 1. Commercial processor designs and range of voltage scaling

Voltage Range Frequency Range

IBM PowerPC405LP [3]

1.0V-1.8V 153M-333M

TransMeta Cru-soe TM5800 [1]

0.8V-1.3V 300M-1G

Intel XScale80200 [2]

0.95V-1.55V 333M-733M

the expected workload of the processor. To analyze the energy effi-ciency of different low-power schemes, including MTCMOS andINSOMNIAC, we compare the energy consumption for a number ofworkload traces obtained from a processor running a wide range ofapplications. Our results show that most applications benefit signifi-cantly from an operating voltage range that is wider than what isavailable in most current DVS processors. Only very low activityapplications with long idle times benefit greatly from pure MTC-MOS.

The remainder of this paper is organized as follows. Section 2 pro-vides an overview of the voltage limit for functionally correct CMOSlogic. Section 3 presents our analysis of the minimum voltage scalinglimit for optimal energy efficiency. Section 4 presents our energymodeling for different low-power approaches. Section 5 presents thecomputation and analysis based on our workload energy modeling inSection 4. Finally, Section 6 contains our conclusions.

2 Circuit Behavior at Ultra Low Voltages

Before we derive the energy optimal operating voltage in Section3, we first briefly review the minimum operating voltage that isrequired for functional correctness of CMOS logic. The minimumoperating voltage was first derived by Swanson and Meindl in [9] andis given as follows:

(EQ 1)

where Cfs is the fast surface state capacitance per unit area, Cox is thegate-oxide capacitance per unit area, and Cd is the channel depletionregion capacitance per unit area. For bulk CMOS technology, weknow that subthreshold swing can be expressed as follows:

(EQ 2)

From this, we can rewrite EQ1 as follows:

(EQ 3)

For 0.18um technology Ss is typically in the range of 90mV/decade,and therefore

. (EQ 4)

Hence, it is theoretically possible to operate circuits deep into thesubthreshold regime given that typical threshold voltages are muchlarger than 48mV. In fact, SPICE simulation confirms that it is possi-ble to construct an inverter chain that works properly at 48mV,although at this point the internal signal swing is reduced to less than30mV. Based on SPICE simulations, we find that it is possible tooperate a wide range of standard library gates at similar operatingvoltages and that their delays track relatively closely to that of theinverter. However it is clear that there are practical reasons why oper-ating circuits at the minimum voltage is not desirable, such as sus-ceptibility to noise and process variations [13]. We show in the nextsection that the minimum operating voltage for functionally correct-ness also does not provide energy optimal operation.

3 Minimum Energy Analysis

We first illustrate the energy dependence on supply voltage using asimple inverter chain consisting of 50 inverters. A single transition isused as a stimulus and energy is measured over the time period nec-essary to propagate the transition through the chain. The energy-Vddrelation is plotted in Figure 2. It is seen that the dynamic energy com-ponent Eactive reduces quadratically while the leakage energy, Eleak,increases with voltage scaling. The reason for the increase in leakageenergy in the subthreshold operating regime is that as the voltage isscaled below the threshold voltage, the on-current (and hence the cir-cuit delay) increases exponentially with voltage scaling while the off-current is reduced less strongly. Hence, the leakage energy Eleak willrise and supersede the dynamic energy Eactive at 180mV. This effect

creates a minimum energy point in the inverter circuit that lies at200mV, as shown in Figure 2.

In the previous example, if the inverter chain is pipelined logicbetween two registers we are implicitly assuming that there is alwaysone input transition per clock cycle. However, the switching activityvaries across circuits so we include the input activity factor α, whichis the average number of times a node makes a power consumingtransition in one clock period. We now derive an analytical expres-sion for the energy of an inverter chain as a function of the supplyvoltage. Suppose we have an n-stage inverter chain with activity fac-tor of α. The standard expression for subthreshold current is given by[11]:

(EQ 5)

where,

(EQ 6)

In EQ6 we again assume Ss is 90mV/decade. We now express thetotal energy E per clock cycle as the sum of dynamic, leakageenergy1:

(EQ 7)

where

First, we focus on finding an accurate estimate of tp. Let tp,stepdenote the ideal inverter delay with a step input and tp,actual denotethe actual inverter delay with an input rising time of tr. We can com-pute tp,step based on a simple charge-based expression:

(EQ 8)

where Ion is the average on-current of a inverter. Furthermore, fornormal operating voltages, the step delay can be extended to theactual delay as follows [15],

(EQ 9)

V dd limit, 2kTq

------- 1C fs

Cox Cd+------------------------+ 2

CdCox----------+

ln⋅ ⋅ ⋅=

2 V T 2CdCox----------+

ln⋅ ⋅≅

Ss 10ln V T 1CdCox----------+

⋅ ⋅=

V dd limit, 2 V T 1Ss

10ln V T⋅------------------------+

ln⋅ ⋅=

52mV 1Ss

59.87mV-----------------------+

at 300Kln⋅≅

V dd limit, 48mV=

1. Note that we assume that short circuit power is negligible and can be ignored. Thisassumption is known to hold for well-designed circuits in normal (super-threshold)operation [12]. Using SPICE simulations we have found that this assumption holdsin subthreshold operation as well.

Figure 2. Energy as a function of supply voltage for an inverter chain.

0 0.5 1 1.5 210

−20

10−18

10−16

10−14

10−12

Vdd

En

erg

y

different energy components (log scale)

Etotal

Eactive

Eleak

0 0.1 0.2 0.3 0.4 0.50

0.5

1

1.5

2

2.5x 10

−14

Vdd

En

erg

y

different energy components (linear scale)

Etotal

Eactive

Eleak

I sub µeff CoxW

Leff----------- m 1–( )V T

2e

V gs V th– V off–

mV T---------------------------------------------

1 e

V dsV T----------–

–

=

mSs

10ln V T⋅------------------------ 90

10ln 26×------------------------ 1.51= = =

E Eactive E+leak

≅

α n Eswitch inv,⋅ ⋅ Pleak+ td⋅=

α n12--- Cs V dd

2⋅ ⋅ n V dd I leak⋅ ⋅( ) n t p⋅( )⋅+⋅ ⋅=

α - circuit switching factorn - number of stagesEswitch,inv- switching energy of a single inverterPleak - total leakage power of the entire inverter chaintd - delay of the entire inverter chainCs - total switched capacitance of a single inverterIleak - leakage current of a single invertertp - delay of a single inverter

t p step,

12--- Cs V dd⋅ ⋅

Ion------------------------------=

t pHL actual, t pHL step,2 tr

2----

2

+=

It is shown in [12] that if tr > tpHL,actual (which is satisfied when aninverter drives another one of the same size, as in our modelling),

(EQ 10)

Substituting EQ10 into EQ9 gives,(EQ 11)

Similar results hold for tpLH [12]. We then can estimate the averagetp,actual as:

(EQ 12)

However, we need to test if this linear model is valid for sub-threshold operation. To justify the linear modelling of tp,actual withtp,step at such a wide supply voltage range, we plot the calculated η asa function of Vdd, based on SPICE simulation in Figure 3.

From Figure 3, it is clear that the coefficient η increases as thesupply voltage is reduced to the subthreshold regime. Other factorsaffecting the accuracy are that EQ5 does not perfectly model Isub in

subthreshold operation1 and that voltage swing degrades at ultra lowsupply voltages. Taking these factors into account, for the target tech-nology we set an effective η=2.1 for subthreshold operation.

As the supply voltage reduces, the total energy consumptionreaches a minimum at some supply voltage (referred to as Vmin) sincethe delay of the circuit increases and the circuit now leaks over alarger amount of time. Substituting the equation for circuit delayEQ12 into EQ7, we obtain the following expression for total energy:

(EQ 13)

Note that Ion here is subthreshold “on” current because we are focus-ing on the subthreshold region where Vmin occurs. By substituting

EQ5 into EQ13, we now arrive at our final expression for the totalenergy as a function of supply voltage for subthreshold operation:

(EQ 14)

Based on this simple expression of total energy, we can find theoptimal minimum energy voltage Vmin by setting / = 0. Letu=η⋅n/α and t=Vdd/mVT, we obtain:

(EQ 15)

We rewrite the above equation as:

(EQ 16)

By doing this, we can easily find that only if u≥2e3(t =3) can E have aminimum, which means the lowest Vmin is 3mVT. This corresponds ton≅4 if η=2.1, α=0.2.

Since EQ15 is a non-linear equation, it is impossible to solve itanalytically. Hence, we use curve-fitting to arrive at the followingclosed-form expression:

(EQ 17)Substituting the original variables gives the following final expres-sion for the energy optimal voltage:

(EQ 18)

Note that in the presented model, the only parameters that aretechnology-dependent are η and m. Hence, when we switch from onetechnology to another, it is only required to determine these twoparameters which is readily accomplished. Interestingly, the totalenergy in EQ14 and the optimal energy voltage Vmin do not dependon the threshold voltage Vth, as verified using SPICE. This indepen-dence is caused by the fact that in subthreshold operation both leak-age and delay have similar dependencies on Vth, and hence the effect

1. We find that over the entire subthreshold region(0<Vdd<Vth), Isub deviates from thesimple exponential equation(EQ5) by at most 20% if we treat mobility µ as constant.

t pHL actual, 0.84tr=

t pHL actual, 1.2445 t pHL step,⋅=

t p actual, 1.2445 t p step,⋅=

η t p step,⋅=

Figure 3. The ratio η in EQ11 with Vdd (SPICE)

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.81

1.5

2

2.5

Vdd

η=t p

,act

ual

/tp

,ste

p

inverter step delay and actual delay analysis

E12--- α n Cs V dd

2⋅ ⋅ ⋅ ⋅ n V dd I⋅leak

nηCsV dd

2 Ion----------------------⋅ ⋅ ⋅+=

12---nCsV dd

2 α η nI leakIon

--------------⋅ ⋅+

⋅=

0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.50.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8x 10

−14

Vdd

En

erg

y

Energy−Vdd with different α (n=20)

α=1

α=0.8

α=0.5

α=0.2

Figure 4. Energy-Vdd for an inverter chain(n=20)

E12---nCsV dd

2 α η n e

V ddmV T-------------–

⋅ ⋅+

⋅=

∂E ∂V dd

et u

2--- t u–⋅=

u et

t2--- 1–------------=

t 1.587 uln 2.355–=

V min 1.587 η nα---⋅

2.355–ln mV T⋅=

Figure 5. Inverter chain Energy-Vdd (analytical model vs. SPICE)

0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.450

1

2

3

4

5x 10

−14

Vdd

En

erg

y

Energy−Vdd

simple modelSPICE simulation

neff

=80

neff

=70

neff

=60

neff

=50

neff

=40

neff

=30

neff

=20

neff

=10

Figure 6. Minimal energy Vmin with inverter effective stage number neff

10 20 30 40 50 60 70 80 900.1

0.12

0.14

0.16

0.18

0.2

0.22

0.24

neff

Vm

in

Vmin

− neff

Vmin

of model numerical computationV

min of closed form expression

Vmin

of SPICE simulation

of Vth on the total energy cancels out. Also, we find that the mini-mum energy voltage is strongly dependent on the number of stages inthe inverter chain. This is due to the fact that in a longer inverterchain more gates are leaking relative to the dynamic energy compo-nent, causing Vmin to occur at a higher voltage. Finally, we point outthat Vmin is strongly related to the activity factor α. In a circuit with alower α, Vmin occurs at a larger voltage than in a circuit with higherα, because a lower α gives the circuit more time to leak and effec-tively increases the stage number, as shown in Figure 4. We therefore

introduce the notation of effective stage number as .

In order to verify the accuracy of the proposed model, we com-pared the results from EQ14 with SPICE simulations for inverterchains of different lengths. In Figure 5, we compare the energy-Vddrelationship predicted by the proposed analytical model in the sub-threshold region with SPICE simulation results for an industrial0.18um process. The plot shows a range of effective inverter chainlengths (neff). As shown in Figure 5, the analytical model matchesSPICE well, except at voltages less than 100mV. In this region, themodel tends to underestimate the rise in energy consumption due tothe dramatic increase of η from Figure 3, resulting in a delay that isgreater than expected. However, this is not a severe problem since theimportant region of modeling is around Vmin, where the proposedmodel shows good accuracy.

In Figure 6, we compare the predicted minimum energy voltageVmin based on our model with that measured by SPICE simulation. Inthe plot, the results using the fitted closed-form expression of EQ18are shown as well as the numerical solution of the non-linear equa-tion in EQ15. As can be seen, both match SPICE with a high degreeof accuracy for a wide range of effective inverter chain lengths neff.

We now consider the energy optimal voltage for more complexgates, such as NANDs and NORs. Based on SPICE simulations for a2-input NAND (NAND2), we find that the minimum voltage Vminshifts upward compared with the inverter chain. This is caused by thefact that for a chain of NAND2 gates, the number of leaking PMOStransistors is doubled in every other gate and NMOS transistors aretwice the size compared to the inverter. The capacitance increasedoes not affect the Vmin because the delay and the switching energyare proportional to the loading Cs. Now we introduce n’eff,inv as theequivalent stage number of a inverter chain that gives the same Vminas a NAND2 chain with neff,nand2. The n’eff,inv proves a little smallerthan twice neff,nand2 due to the stack affect in the nmos transistors anda slightly larger driving ability of the pull-down nmos. We thereforecompute n’eff,inv value for the NAND2 chain:

(EQ 19)

Using this n’eff,inv, we obtain an accurate match between the mod-eled Vmin and SPICE simulation as shown in Figure 7. Other complexgates can be modeled in a similar way by calculating an appropriate

n’eff,inv value.

4 Energy Modelling for Different Low-PowerApproaches

In this section, we compare the energy efficiency of MTCMOS-based power gating and DVS. First, we define five different systems:

• Sbasic, a basic system with clock gating but without power-gatingor DVS.

• Smtcmos, a system with the ability of power-gating during idlemode (clock gating is implied).

• Sdvspg, a partial DVS system with power-gating ability where theminimum scalable voltage is Vlimit, set to be Vdd/2 in the next sec-tion.

• Sdvsonly, a system similar to Sdvspg but without power-gating.• Sinsom, an INSOMNIAC system with unlimited voltage scaling

ability, down to the energy optimal voltage.For Sbasic, the energy consumption during idle mode is leakage

and we can model the totally energy for workload Ebasic as:(EQ 20)

where ton is the time the processor stays busy and toff is the idle time.For Smtcmos, the energy consumption is modeled as:

(EQ 21)

where Eoverhead is the overhead energy when gating the power sup-ply, the Cpowerrail is the virtual supply rail capacitance, Cinternal is thetotal internal node cap, Csleep is the gate capacitance of the sleeptransistors. Eoverhead results from three sources: the power rail chargeloss, the circuit internal node charge loss and sleep transistor gatecharge needed to conduct power gating. Depending on whether aheader or footer transistor is used in Smtcmos, the processor will losethe voltage level on virtual Vdd or virtual ground respectively, shortlyafter it enters sleep mode [18]. Without loss of generality, we use theaverage between virtual ground and virtual Vdd capacitance. To con-sider the internal node capacitances, we assume half of internal nodesare in at 1/0. Hence, half of the internal nodes are charged high ordischarged low.

According to recent research in [18][19], the virtual power rail canbe restored in several cycles if the circuits is carefully designed andthus the wakeup process can be treated as instantaneous compared tothe thousands/millions of cycles of useful operation.

In order to model Sdvspg and Sdvsonly, we must know how a DVSsystem implements the voltage scaling process. There are severalways to design a DVS system [16][17]. In this paper, we assume amethod similar to the IBM405LP [17], which is illustrated in Figure8. First, we define some system constants:

Suppose the system scales from state (V1, f1) to (V2, f2), thenthere are two possible cases:1. Scaling voltage down -- V2 ≤ V1

For Sdvsonly, the energy consumption for a certain application is:

neffnα---=

n ′eff inv,neff nand2,-------------------------------

I leak nand2,I leak inv,

----------------------------------Ion inv,

Ion nand2,-----------------------------⋅=

1.911.1----------≅

1.74=

Figure 7. Minimal energy Vmin with NAND2 effective stage number neff

10 20 30 40 50 60 70 80 900.1

0.15

0.2

0.25

0.3

neff

Vm

in

Vmin

− neff

Vmin

of model numerical computationV

min of closed form expression

Vmin

of SPICE simulation

Ebasic Pact vdd, ton Pleak ton toff+( )⋅+⋅=

Emtcmos Pact vdd, ton⋅ Pleak ton Eoverhead+⋅+=

Eoverhead12--- C powerrail V dd

2⋅ ⋅ 12--- Cinternal V dd

2⋅ ⋅ Csleep V dd2⋅+ +=

τf = 1us, the constant time it takes to scale the frequencykv = 2mV/us, the speed of the regulator to scale the voltage

Figure 8. Illustration of scaling process for a DVS system

Voltage Scales Up

Voltage Scales Down

tfscaletvscale trest

(V 2,f2)(V 1,f1)

tfscale tvscale trest

(V 2,f2)(V 1,f1)

tinterval

(V 1 V2,f1)

tinterval

(V1 V2,f2)

tidle

(V 2,f2)

(V 2,f2)

tidle

(V 1,f1 f2)

(V 2,f1 f2)

(EQ 22)

For Sdvspg,the energy consumption is

(EQ 23)

In both cases,

where V2,request is the target voltage request by the application.2. Scaling voltage up -- V2 > V1, for both Sdvsonly and Sdvspg

(EQ 24)

For an optimal voltage up-scaling process, tidle is always zero imply-ing that Sdvsonly and Sdvspg have the same expression of energy con-sumption.

A key difference between scaling up and scaling down is that scal-ing up involves extra energy consumption caused by voltage levelchange as it draws current from the power supply to charge up thelevel difference.

In order to make a practical comparison, we extract detailed phys-ical parameters from an existing Alpha processor from its layout asthe basis for the following discussion (shown in Table 2).

In leading-edge processors leakage power is much more substantialthan that shown in the baseline 0.18um Alpha processor. Therefore,we intentionally reduce the Vth in this process to make the leakagepower around 11% of active power, which is reasonable for modernprocessors.

Due to the fact that a DVS system involves a wide range of operat-ing voltages, the regulator efficiency (usually a buck dc-dc converter)will degrade if the loading power is small due to the internal loss ofregulator becoming relatively large. We developed a regulator effi-ciency (ηregulator) model based on [20][21][22] to consider thiseffect.

5 Energy Optimality for Different Work Loads

As discussed earlier, the energy optimal voltage depends on bothcircuit and technology characteristics. At the same time, the bestchoice for the minimum allowed voltage for a processor depends onits workload distribution. If the workload of a processor is such thatlow performance levels are never or rarely required, the minimumoperating voltage for energy-efficient operating will be larger thanthe minimum voltage Vmin computed in Section 3. Furthermore, forMTCMOS, the length of idle periods determine whether the switch-ing overhead can be amortized.

Hence, we studied a number of different applications running onLinux with a Transmeta Crusoe TM5600 processor with dynamic

voltage scaling and recorded traces of the minimum necessary per-formance levels for each application. The applications comprise bothmultimedia and interactive applications:

• emacs is a trace of user activity using the editor performing lighttext editing tasks

• konqueror and netscape are traces of web browsing sessions usingthe two browsers

• fs contains a record of filesystem-intensive operations• mpeg is a trace using MPEG2 video playback

To make a fair comparison, we convert these traces to fit into theAlpha processor mentioned earlier. By applying four different low-power schemes to these workloads, we can compute the energy sav-ings relative to Sbasic based on the models in Section 4. The resultsare shown in Figure 9. As the bar graph shows, the largest savings forall five applications are seen on Sinsom. Note that for a less leaky pro-cess, Sinsom will be even more efficient. Also, the graph shows that itis usually helpful if we apply power gating when the processor is notable to scale as needed, because for all these applications the proces-sor runs long enough to amortize the switching overhead and thuscan save significant leakage energy.

To obtain a more general evaluation for these approaches, we ana-lyze the energy savings under artificial workload. We characterize aworkload by two parameters: N, activity, where the N is the totalnumber of cycles that the processor will run before the deadline atnormal operation mode, and the activity is the ratio of the number ofworking cycles to N, as shown in Figure 10. Both of these two factors

Edvsonly sd, Pact V 1, Pleak V 1,+( ) τ f⋅

Evscale f 2 V 1, , V 2→ Pact V 2, Pleak V 2,+( )+ trest Pleak V 2, tidle⋅+⋅+

=

Edvspg sd, Pact V 1, Pleak V 1,+( ) τ f⋅

Evscale f 2 V 1, , V 2→ Pact V 2, Pleak V 2,+( )+ trest

δ 12--- Cinternal V 2

2⋅ ⋅ Csleep V 22⋅ 1

2--- C powerrail V 2

2⋅ ⋅+ + ⋅+

⋅+

=

if V 2 request, V limit≥( ) V 2 V 2 request, tidle, 0 δ, 0= = =

if V 2 request, V limit<( ) V 2 V limit tidle 0 δ,>, 1= =

Edvs su, Pact V 2, Pleak V 2,+( ) τ f⋅ Evscale f 1 V 1, , V 2→

Pact V 2, Pleak V 2,+( )+ trest12--- C powerrail V 2

2V 1

2–( )⋅ ⋅

12--- Cinternal V 2

2V 1

2–( )⋅ ⋅+

+⋅

+=

Table 2. Physical parameters from an Alpha processor in a0.18um technology (without caches considered)

voltage 1.8 volts

total # of transistors 554,052

total # of gates 39,703

power rail capacitance(Vdd & ground) 447.34 pF

internal node capacitance(interconnect included)

3519.669 pF

total gate capacitance 948.079 pF

sleep transistor gate cap.(assumed to be 10% gate cap)

95 pF

Figure 9. Comparison of energy consumption under differentlow-power schemes

fs emacs konqueror netscape mpeg0

10

20

30

40

50

60

70

80

90

100

applications

norm

aliz

ed e

nerg

y co

nsum

ptio

n(%

)

Sbasic

Smtcmos

Sdvs

without mtcmosS

dvs with mtcmos

Sinsom

Figure 10. Illustration of cycle number N and activity

BUSY IDLE

N*activity N*(1- activity )

N cycles

Figure 11. Energy savings with Smtcmos for general workloads.

102

103

104

00.2

0.40.6

0.81

−40

−20

0

20

40

60

80

100

N

energy savings with the number of cycles and activity (Smtcmos

)

activity

ener

gy

savi

ng

(%)

influence the design choice of low-power systems.The energy savings for Smtcmos with general workloads are shown

in Figure 11. Smtcmos is useful when activity is very low and the num-ber of cycles is large, which gives savings close to 100% and evenbetter savings than Sdvspg. This is because energy is almost com-pletely due to leakage in the Sbasic system. The reason that we findnegative savings for Smtcmos is because the extra power gating energyoutweighs the potential leakage energy in Sbasic.

Figure 12 contains the results for both Sdvsonly and Sdvspg. A sys-tem without power gating is independent of the total number ofcycles, which can be easily derived from the models in Section 4.When activity is high and the application never requests a voltageless than half Vdd, there will be no idle period if an optimal schedul-ing is used. Therefore no difference exists between Sdvsonly andSdvspg at this activity range. When activity drops below a certainvalue and leads to requesting voltages lower than Vlimit, Sdvspgbecomes better than Sdvsonly because of leakage saving. This con-firms again that for modern state-of-the-art partial-DVS systems, it ishelpful if the system also includes power-gating to avoid unwantedleakage at run-time.

For Sinsom, the energy savings are independent of N, but for con-sistency we still plot the savings in three dimensions as shown in Fig-ure 13. It is clearly shown that Sinsom can easily provide energy gainsfor a very wide range of activity, and gives much improved savingsthan Sdvspg or Smtcmos. Only when activity is very low does theenergy savings become saturated, which occurs since the leakagedominates the overall system energy consumption. The system hasscaled below Vmin at this point, as described in Section 3.

6 Conclusions

In this paper, we developed analytical models for the most energyefficient supply voltage (Vmin) for CMOS circuits. A number of inter-esting conclusions can be drawn: 1) Energy shows clear minimum inthe subthreshold region since the time over which a circuit is leaking(delay) grows exponentially in this region while leakage current itselfdoes not drop as rapidly with reduced Vdd, 2) Vmin does not depend

on Vth if Vmin is smaller than Vth, 3) the circuit logic depth andswitching factor impacts Vmin since it relates to the relative contribu-tions of leakage energy and active energy and 4) the only relevanttechnology parameters to Vmin are subthreshold swing and the depen-dency of delay on input transition time. The analytical models pre-sented are shown to match very well with SPICE simulations.

In the second part of the paper, we compare the energy savings ofdifferent low-power schemes, namely, pure MTCMOS, partial-DVS,partial-DVS with power gating, and INSOMNIAC (extended-DVS).The comparison for five applications traces recorded on a commer-cial processor shows that INSOMNIAC provides the largest savings.A comparison for arbitrary workloads shows that for the majority ofdifferent application activity ratios INSOMNIAC continues to yieldthe largest energy improvements.

AcknowledgementsThis research was supported by ARM, NSF, SRC, GSRC-DARPA

References[1] Transmeta Crusoe. http://www.transmeta.com/[2] Intel XScale. http://www.intel.com/design/intelxscale/[3] IBM PowerPC. http://www.chips.ibm.com/products/powerpc/[4] K. Flautner, S. Reinhardt, and T. Mudge, “Automatic Perfor-

mance Setting for Dynamic Voltage Scaling,” In Proc. of the 7thAnnual International Conference on Mobile Computing andNetworking (MobiCom’01), May 2001.

[5] T. Sakurai and A. Newton, “Alpha-Power Law MOSFET Modeland Its Applications to CMOS Inverter Delay and other Formu-las”,IEEE JSSCC, Vol. 25, No. 2, April 1990.

[6] M. Miyazaki, J. Kao, A. Chandrakasan, “A 175mV Multiply-Accumulate Unit using an Adaptive Supply Voltage and BodyBias (ASB) Architecture”, ISSCC 2002, pp. 58-59.

[7] A. Wang, A. Chandrakasan, “A 180mV FFT Processor UsingSubthreshold Circuits Techniques”, ISSCC 2004, pp. 292-294.

[8] K. Flautner and T. Mudge, “Vertigo: automatic performance-setting for Linux,” In 5th Symp. Operating Systems Design &Implementation, pp. 105-116, Dec 2002.

[9] J. D. Meindl and J. A. Davis, “The fundamental limit on binaryswitching energy for terascale integration (TSI),” IEEE JSSCC,vol. 35, pp. 1515-1516, Oct. 2000.

[10] F. Møller, “Algorithm and architecture of a 1-V low-powerhearing instrument DSP”, ISLPED, pp. 711, Aug. 1999

[11] BSIM3. http://www-device.eecs.berkeley.edu/~bsim3/get.html[12] J. Rabaey, “Digital Integrated Circuits: A Design Perspective”,

Prentice Hall, 1996.[13] H. Soeleman, K. Roy, “Digital CMOS logic operation in the

sub-threshold region”, in GVLSI, pp. 107-112, March 2000.[14] A. Wang, A.P. Chandrakasan and S.V. Kosonocky, “Optimal

supply and threshold scaling for subthreshold CMOS circuits,”IEEE Symposium on VLSI, pp. 5-9, April 2002

[15] D. Hodges and H. Jackson, Analysis and Design of Digital Inte-grated Circuits, McGraw-Hill, 1988

[16] “Energy Efficient Microprocessor Design”, T. Burd and R.Brodersen, Kluwer Academic Publishers

[17] K. Nowka, G. Carpenter, et al, “A 32-bit PowerPC System-on-a-Chip With Support for Dynamic Voltage Scaling and DynamicFrequency Scaling”, JSSCC, pp. 1441-1447, vol. 37, Nov. 2002

[18] J. Tschanz, S. Narendra, Y. Ye, et al, “Dynamic Sleep Transistorand Body Bias for Active Leakage Power Control of Micropro-cessors“, JSSCC, pp.1838-1845, vol. 38, Nov. 2003

[19] S. Kim, S. Kosonocky, D. Knebel, “Understanding and Mini-mizing Groudn Bounce during Mode Transition of Power Gat-ing Structures”, ISLPED pp. 22-25, Aug. 2003.

[20] R. Erickson and D. Maksimovic, “High Efficiency DC-DC Con-verters for Battery-Operated Systems with Energy Manage-ment,” Worldwide Wireless Communications, Annual Reviewson Telecommunications, 1995.

[21] A. Dancy and A. Chandrakasan, “Techniques for aggressivesupply voltage scaling and efficient regulation,” in Proc. IEEECustom Integrated Circuits Conf. (CICC), 1997, pp. 579-586

[22] J. Kim and M. A. Horowitz, “An Efficient Digital Sliding Con-troller for Adaptive Power-Supply Regulation,” in JSSCC, May2002, vol. 37, pp. 639-647

Figure 12. Energy savings with Sdvspg and Sdvsonly for general workload

Figure 13. Energy savings with Sinsom for general workload

102

104

106

00.2

0.40.6

0.81

0

20

40

60

80

100

N

energy savings with the number of cycles and activity(Sinsom

)

activity

ener

gy

savi

ng

(%)

invited paper extended dynamic voltage scaling for low...

Documents