low voltage dual mode logic model analysis and parameter ... · unsurprisingly, cmos logic has also...

Microelectronics Journal 44 (2013) 553–560

Contents lists available at SciVerse ScienceDirect

Microelectronics Journal

0026-26http://d

n CorrE-m

journal homepage: www.elsevier.com/locate/mejo

Low voltage dual mode logic: Model analysis and parameter extraction

I. Levi a,n, A. Kaizerman a, A. Fish a,b

a Low Power Circuits & Systems Lab, The VLSI Systems Center, Department of Electrical Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israelb Bar-Ilan University, Israel

a r t i c l e i n f o

Article history:Received 5 April 2012Received in revised form7 March 2013Accepted 11 March 2013Available online 8 May 2013

Keywords:Dual mode logicDMLLow powerLow voltage logic

92/$ - see front matter & 2013 Elsevier Ltd. Ax.doi.org/10.1016/j.mejo.2013.03.005

esponding author. Tel.: +972 503 460385.ail addresses: [email protected], leit@bgu

a b s t r a c t

The Dual Model Logic (DML) family, which was recently introduced by our group for sub-thresholdoperation, provides an alternative design methodology to the existing low power digital designtechniques. DML gates allow switching between static and dynamic modes of operation on-the-flyaccording to system requirements, presenting better tradeoff between Energy consumption andperformance. In static mode, low voltage DML gates achieve very low Energy consumption withmoderate performance, while in dynamic mode they achieve high performance, albeit with higherEnergy consumption. In this paper we analyze DML gates operation in the sub- and near-thresholdregions by employing a recently proposed transregional model for low supply voltages. The sizingmethodology of low voltage DML is discussed and classical Logical Effort parameters are calculated forthe 40 nm DML basic gates. The design example of a DML full adder, implemented in a 40 nm low powerstandard CMOS technology, is shown to compare the proposed method with its CMOS and Dominocounterparts. Monte Carlo simulations are shown to demonstrate the DML immunity to processvariations.

& 2013 Elsevier Ltd. All rights reserved.

1. Introduction

The progress achieved in recent years in portable devices, theirever increasing popularity, and the fact that battery technologycannot keep up the pace, motivates researchers to explore thefields of low power design [1]. Recently, low voltage digital circuitdesign has become a very promising method for ultra-low powerapplications [2–4]. Usually, the circuits operate in the sub-threshold (ST) and/or near-threshold (NT) regions, operating froma supply voltage (VDD) that is close to or even less than thethreshold voltages of the transistors. This aggressive reduction ofsupply voltage leads to significant savings both in dynamic andstatic power. Unfortunately, low voltage circuits suffer fromincreased sensitivity to process variations and device mismatch[5]. Extensive research has been conducted to overcome thesesensitivities and to allow the ability to work under voltage regimessuch as adaptive body bias, or dynamic voltage scaling for adaptivepower applications [9–12].

CMOS has been known for a few decades as the most efficientVLSI design methodology. CMOS gates are very robust, achieve railto rail logic levels, strong on and off states, and until recentprocesses they also featured very low static power consumption.Unsurprisingly, CMOS logic has also become the most commondesign logic family used for low voltage operation. It was shown

ll rights reserved.

.ac.il (I. Levi).

that in many cases it achieves a more robust operation than itsPass Transistor Logic (PTL) and Dynamic Logic counterparts.However, it is also known that the delay of low voltage CMOShas been significantly degraded, making ST and NT CMOS designsunpractical for many applications.

Low voltage dynamic logic, such as Domino logic was proposedas an alternative to low voltage CMOS designs to achieve betterperformance [8]. However, due to known problems of dynamicgates such as charge sharing, susceptibility to glitches and tocrosstalk noise, and sensitivity to process variations in nanoscaledtechnologies have prevented utilization of Domino logic in lowvoltage designs.

Recently, we have proposed sub-threshold Dual Mode Logic(DML) to allow both ultra-low power dissipation and high perfor-mance under low voltage operation [20]. In fact, the DML gatespresent the advantages of both CMOS and dynamic gates, allowingoperation in both static and dynamic modes. This allows thesegates to achieve ultra-low power dissipation while operating inthe static mode, and high performance during the dynamicoperation. It was shown, by both simulations and measurementsof an 80 nm test chip, that DML gates are fully functional at supplyvoltages as low as 300 mV. We have also shown that chains of sub-threshold basic DML NAND/NOR gates achieved an improvementin speed of up to 10� in the dynamic mode compared to astandard CMOS, while dissipating only 150% more power. In thestatic mode, an 80% reduction of power dissipation was presented,compared to a basic Domino, at the expense of a magnitudedecrement in performance. The switching between the modes is

www.elsevier.com/locate/mejo

www.elsevier.com/locate/mejo

http://dx.doi.org/10.1016/j.mejo.2013.03.005



mailto:[email protected]

mailto:[email protected]


I. Levi et al. / Microelectronics Journal 44 (2013) 553–560554

very easy and can be performed on-the-fly, making the DML idealfor computing platforms where flexible workload is required. Andmost importantly, the implementation of DML designs is verysimple and intuitive.

In this paper we show that the DML nanoscaled designs arefunctional and robust both in the ST and NT regions. This isachieved by analysis of the DML gates sizing methodology andLogical Effort using the transregional model [15]. The efficiency ofthe DML method is presented through implementation of a DMLfull adder (FA) in low power 40 nm technology and its comparisonto equivalent CMOS and Domino FAs. For example, we show thatwhen operated in dynamic mode and compared to its CMOScounterpart, the DML adder shows an average improvement of47% in speed. Switching from the DML Static to its Dynamic modeachieves an average improvement of 300% in speed. Whenoperated in static mode and compared to its Domino counterpart,it achieves a 50% energy reduction. Dynamic DML consumes 30%less Energy, while increasing the delay by 15–20%, compared to itsDynamic counterpart. Energy consumption of the DML static modeallows reduction of energy consumption and lower minimumenergy point (MEP), as compared to conventional CMOS operation.These metrics present the flexibility of the DML family to achievehigh-performance or low-power operation, according to time-dependent system requirements.

The rest of this paper is composed as follows: Section 2overviews the basic DML logic gate and method of operation.In Section 3 we analyze sizing of low voltage DML gates using thetransregional model and calculate the Logical Effort parameters.A comparison of 40 nm DML FAs performance with CMOS andDomino counterparts is shown in Section 4. Section 5 concludesthis paper.

Fig. 1. The proposed basic DML gate topologies: (a) type A unfooted

2. DML logic – principle of operation

The structure of the basic DML gate, shown in Fig. 1, is verysimple: it consists of a conventional CMOS gate with an additionalpre\dis-charge transistor to allow dynamic operation. During staticoperation, this transistor is disabled and the DML gate operatessimilar to a standard CMOS gate. As in other dynamic families,DML gates can be designed with or without a footer. Fig. 1 showsall possible DML configurations; type A with pre-charge operationand type B with dis-charge operation. The footer is used todecrease pre-charge time by eliminating the ripple effect of thedata advancing through the cascade and allowing faster pre-charge. Switching the DML gate to operate in CMOS-like (i.e. staticmode) operation is fairly intuitive: the global Clk should be fixedhigh for Type A topology and constant low for Type B topology.As a result, the gate attains a similar topology to CMOS, except forthe extra parasitic capacitance, which is usually negligible. Theprocedure of creating a DML node based on a CMOS gate is alsovery simple: “gluing” an additional transistor for the pre-chargephase, and, in the case of a footed gate, adding an additional nMOStransistor as a footer in Type A gates and a pMOS transistor as aheader in Type B gates.

In addition to the unique capability to switch between differentmodes of operation, DML nodes which are operating in dynamicmode have a number of salient advantages over conventionaldynamic nodes, due to DML transistors sizing and topology, asdiscussed in the rest of this paper.

In this paper, the DML gates were optimized to improve speedin dynamic mode, while still maintaining very low minimumenergy point during static operation. As will be seen, this way,the DML gates will be still preferable for minimum energy

, (b) type B un-footed, (c) Type A footed, and (d) Type B footed.

I. Levi et al. / Microelectronics Journal 44 (2013) 553–560 555

operation, as compared to their CMOS counterparts, while sig-nificantly improving the performance in the dynamic mode. Sincethe performance in the dynamic mode is mainly determined bythe evaluation speed, the evaluation is preferred to be performedthrough the network of parallel transistors and minimal sizedtransistors are utilized in the active self-restore network. Thisallows capacitance reduction at the gate output, especially for thegates with large fan-in. The strength of the evaluation network isset to be equivalent to one minimal sized NMOS transistor,similarly to a standard CMOS methodology. It should be notedthat all gates can be designed as either “Type A” or “Type B”,ignoring the optimization guidelines mentioned above. The opti-mal design methodology when designing with DML gates is toconsequently connect “Type A” and “Type B” gates, exactly as innp-CMOS gates. Even though this design methodology will allowmaximum performance, minimizing of area and power efficiency,it is also possible to connect gates of the same type by using aninverter, buffering between them, in a similar way to how it isdone in Domino logic. Connecting gates of the same type withoutinverters is also possible when a footer/header is used at eachstage; however, this structure will cause glitching after pre-chargeends and until the evaluation data ripples through the chain.These are standard problems when designing with dynamic gates[17]; however, in contrast to the standard dynamic logic, DML‘sinherent keeper helps to recover the logical value.

Low voltage logic has not been widely used yet, mainly due tosignificant degradation in performance. Domino low voltage logicwas introduced as a possible solution [8]; however, it has not beenin use due to high sensitivity to process variations. Moreover, withprocess scaling, dynamic logic is being abandoned even in thesuper-threshold regime, due to very low yield and logic failures.The main issues and drawbacks of low voltage dynamic logic areelegantly solved or avoided when using DML. Charge leakage andcharge sharing are not issues in DML, since the complementarypart acts as a keeper and restores the logical level, without theneed for a high power consuming bleeder or an area and powerconsuming keeper. The ability to properly restore the logical levelsalso prevents the back gate coupling issue. In the followingsection, we will optimize the DML gates, and derive the requiredtransistor sizes.

3. Logical effort parameters calculations for lowvoltage DML gates

The optimization space of DML gates, similar to the majority ofVLSI designs, is composed of area, power and speed. Since ST andNT designs suffer from reduced performance, in this paper wetarget the DML gate optimization for speed. To optimize theperformance of DML gates, we will use the Logical Effort (LE)technique to evaluate the delay [13]. LE is a simple and efficientway to estimate and optimize the delay of digital circuits. Accord-ing to the LE technique, the gate delay (d) can be expressed as asum of the stage effort (f) and the parasitic capacitance (p), asnoted in the following equation:

d¼ f þ p ð1Þwhere f¼g∙h∙b; g is the logical effort of the stage; h is the electricaleffort; and b is the branching effort.

In order to use LE, we will need to evaluate the logical effort (g)of the gate, which is defined as the ratio of the input capacitance ofthe gate to that of an inverter, assuming that both gates drive thesame current. g is an intrinsic property of the gate and is constant.In order to evaluate g we need to set the transistor widths, so theDML gate can deliver the same amount of output current as aninverter. Section 3.1 presents the chosen transregional current

model for low voltage operation, i.e. ST and NT regions. This modelis used to evaluate the stacked transistors fitting parameters toachieve the same current as driven by a single transistor. DMLsizing architecture is presented in Section 3.2 based on the fittingparameters derived in Section 3.1. Logical effort parameters arecalculated in Section 3.3.

3.1. Modeling ion using the transregional model

In 2006, Keane et al. [14] offered a sub-threshold Logical Effort(LE) framework, yielding unconventional device sizing. We wouldlike to apply this observation to DML gates, and extend itsapplication to also include the NT region, and to comply withstate-of-the-art nanoscale processes. Since characteristics of MOStransistors operating in the ST and NT regions are substantiallydifferent from transistors operating in strong inversion, we use thetransregional model, recently developed by Harris et al. [15]. Thismodel is especially suited to fit the ST and NT regions. According tothe model, the on-current (Ion) of the transistor is modeled by thefollowing equation:

Ion ¼ I0WeVDT−αV2DT =nvT ð2Þ

where VDT is an abbreviation for VDD−VT; α and n are empiricalfitting parameters; and vT is the thermal voltage. The modelparameters were derived by curve-fitting Specter simulations forthe chosen low power 40 nm technology. Ion is a function of VDT;and therefore changes in VT caused by process variations or bodybiasing do not require re-fitting of I0, α and n. VDD was swept toextract the model parameters for (2), as shown in Fig. 2(a). In thisfigure, the simulation results are plotted versus the calculatedmodel. The model must be accurate in the functional ST and NTregions, in order to be used to derive the required transistorwidths. Indeed, throughout the entire range, from the functionalsub-threshold region (VDD40.25 V) to the NT vicinity, the leastsquare error fit has an average error of less than 1% and amaximum error of 9%.

In order to prove that the transregional model is not onlysuitable for modeling a single transistor, it was also examined for astack of two and three transistors. Fig. 2(b) and (c) clearly showsthat the model fits accurately for the operational regions, exhibit-ing an average least square fit error of 3% and a maximal error of14%. Modeling of stacks of several transistors will be importantduring LE development for DML. Note that the accuracy is veryhigh due to the fact that the transregional model is a “curvefitting” based model. The max error is 9–14% of the analytic modelversus simulated data for 250 mV supply voltage and for supplyvoltages higher than 250 mV the error drops very rapidly andbecomes negligible.

3.2. Low voltage DML sizing methodology

In the dynamic mode, attaining a fast evaluation period iscritical and therefore the analysis will be performed on the DMLtopology, where the pre-charge transistor is always preferred to beplaced in parallel to the stacked transistors (preferably: NOR in“Type_ A” and NAND in “Type_ B”). In addition, a footer is rarelyemployed in DML gates, and an analysis will be performed to sizethe footer and evaluation transistors (i.e. a stack of two transistorsin an optimal parallel evaluation net). It is important to under-stand that the vast majority of gates would be unfooted, as wasmentioned. Moreover, in complex logic gates (i.e. AOI\IOA) theevaluation net might comprise more than one transistor, evenwithout a footer. The complementary serial transistors, which areparallel to the pre-charge transistor, will be sized to minimalwidth, to decrease gate capacitances and intrinsic delay, and thus

0.1 0.2 0.3 0.4 0.5 0.610-14

10-12

10-10

10-8

10-6

10-4

VDD[V]

Cur

rent

[A]

Model

Simulation

0.1 0.2 0.3 0.4 0.5 0.610-14

10-12

10-10

10-8

10-6

10-4

VDD[V]

Cur

rent

[A]

Model

Simulation

0.1 0.2 0.3 0.4 0.5 0.6

10-12

10-10

10-8

10-6

VDD[V]

Cur

rent

[A]

Model

Simulation

Fig. 2. ION current versus VDD: theoretical calculation using transregional model versus simulation results. (a) single transistor fitting, (b) two stack transistor fitting, and(c) three stack transistor fitting.

0 0.1 0.2 0.3 0.4 0.5 0.62.5

3

3.5

4

4.5

5

5.5

6

VDD[V]

W/W

sing

le

Stack 2 calculated widthStack 3 calculated width

Fig. 3. Ratio between calculated widths of transistors in a stack and the width of asingle transistor.


allow fast dynamic operation. The pre-charge transistor will alsohave a minimal width, to decrease leakage currents. The pre-charge transistor could be sized even bigger to gain robustness inpre-charging, if required. The gain in output capacitance wasshown to be negligible.

Based on the transregional current model, we will calculate thewidths of the footer and evaluation transistors (W‘) required todrive the same on-current as a single transistor (W). The Ion,single ofa single transistor was equated to Ion,2 stack and Ion,3 stack, andWwasextracted as a function of the fitting parameters, as can be seen inEq. (3). The variables marked with a tick (′′′′′) are for the stackedtransistors.

W′W

¼ I0I′0

� exp VDT−αV2DT

nvT−V ′DT−α′V

′2DT

n′vT

!ð3Þ

Fig. 3 shows the ratio between the calculated widths oftransistors in a stack and the width of a single transistor. It shouldbe noted that the widths of the transistors in a stack are notconstant, but a function of VDD. Thus, the optimal width varies

Table 1Optimal transistor width for CMOS, dynamic logic and DML in all topologies.

Footed DML Type_ A Footed DMLType_ B

UnfootedDML Type_ A

UnfootedDML Type_ B

Dynamic CMOS

NOR 3

NAND 3

Table 2Transistor width and LE parameter calculations.

Gate Technology PMOS width NMOS width Clock transistor width p g d¼ghb+p for: h ¼3, b¼1

NOR 3 CMOS 8.4 1 – 11.4/3 9.4/3 13.2Domino\Dynamic 1 3.3 1 10.9/3 3.3/3 7.2DML_A_unfooted 1 1 1 5/3 2/3 3.667DML_B_unfooted 8.4 1 1 12.4/3 9.4/3 13.533DML_A_footed 1 3.3 1 11.9/3 4.3/3 8.267DML_B_footed 15 1 1 19/3 16/3 22.33

NAND 3 CMOS 1.5 5.6 10.1/3 7.1/3 10.467Domino\Dynamic 1 10 1 11/3 10/3 13.667DML_A_unfooted 1 5.6 1 9.6/3 6.6/3 9.8DML_B_unfooted 1.5 1 1 6.5/3 2.5/3 4.667DML_A_footed 1 10 1 14/3 11/3 15.667DML_B_footed 5 1 1 17/3 6/3 11.667

Note: LE for all dynamic logics takes into consideration only the evaluate transition (i.e. these values are a measure for comparison only in the dynamic mode).

0

5

10

15

20

25

0 1 2 3 4 5

CMOS NOR3

Domino NOR3

DML_ A unfooted NOR3

Nor

mal

lizec

Del

ay (d

)

Electrical Effort (h)

Fig. 4. Normalized delay as a function of the Electrical Effort. The delay has a linearrelation to g and an offset of p.


with the supply voltage and is different from region to region. Thisoptimization is worthwhile for the ST and NT region circuits, sincetraditional sizing (ratio of two and three, for the two and threetransistors stacks, respectively) is not precise at low voltages. Forexample, at 300 mV supply voltage, the sizing factor required toachieve the same current through 2-stack transistors as in the caseof a single minimal transistor is 3.3 (for 3-stack-5.6, and for4-stack-10). Nevertheless, for the strong inversion regime thecalculated width converges to the nominal values (not shownin Fig. 3).

Using the presented analysis, we have calculated the optimaltransistors sizing of basic DML gates. Table 1 presents an examplefor optimized transistor sizing, normalized to the minimal tran-sistor width for NAND and NOR gates with Fan-In¼3 for VDD¼0.3 V for DML, CMOS and Domino designs for both footed andunfooted topology. The same stacked-transistor analysis was alsoused to calculate the widths of CMOS/Dynamic transistors. Thesizing factor β, which is defined as the optimal ratio for transistorsin the pull up to the pull down network, was simulated and foundto be ∼1.5 for the 40 nm process, which is the factor for allcalculations henceforth.

3.3. Logical effort parameters

Using the transistor widths from Section 3.2, we have calcu-lated the LE parameters, as shown in Table 2. Note that the derivedvalues are smaller than their CMOS/Domino counterparts for theunfooted Type A NOR3 and Type B NAND3 gates. These delaysrelate to the DML operation in the dynamic mode. For the static

mode, it is clear that DML gates would be a bit slower than CMOSdue to the unsymmetrical sizing, yet not drastically due to verysmall input and output capacitances. A DML designer is urged toconstruct logic in such a manner that gates would preferably beconstructed with a high-stack pull up network in Type A and high-stack pull down network in Type B. The design should only rarelyincorporate other combinations like unfooted Type A NAND orType B NOR. Such a design approach would gain very fast circuits.As was mentioned before, footed gates would be very rarely usedand are presented for the matter of completeness. Fig. 5 shows theachievable frequency of 40 nm NAND-NOR DML chain (a chain of


20 gates, 10 NAND gates in Type B; 10 NOR gates in Type A), CMOSand Domino chains with FO3 of the same length. It should benoted that a very noticeable gain is achieved when testing morecomplicated designs, as will be shown in Section 4.

A NOR DML gate operating in static mode is on average 33%slower than a CMOS gate. Switching a DML gate from static modeto dynamic mode offers an average speed improvement of 2� inthe footed topology (for example, at VDD¼0.3 V, dynamic DMLachieves 66 MHz, whereas CMOS achieves only 50 MHz and staticDML 35 MHz). In the unfooted topology an improvement of up to14� was measured [20]. As expected, Domino logic can operate atthe highest frequency, but it suffers from susceptibility to processvariations. On average, dynamic DML operation consumes 100%more energy than static DML, as will be discussed in Section 4.

In this section we have derived the logical effort parametersunder the modeling of the current through a single transistor.

~Cout ~Sum

a b

c

c

a

b

ab a b c

a

b

a b c a

b

c

c

b

a

~

a b

c

c

a

b

ab a

a

b

a

clk

Fig. 6. Full Adder implementation: (a) CCM

0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6106

107

108

109

1010

Vdd[V]

Freq

uenc

y[H

z]

DynamicDML - DynamicCMOSDML - Static Mode

Fig. 5. Measured speed for CMOS, Domino and DML (static and dynamic).

An example of a normalized delay by LE analysis based CMOS,Domino and unfooted DML_ A NOR gates with Fan-In¼3 isillustrated in Fig. 4. The CMOS NOR gate has the biggest LE value,and therefore it exhibits the highest delay for any given h, whilethe DML unfooted NOR has the smallest LE and parasitic delay, andtherefore the smallest normalized delay for a given h. This is asexpected.

4. Simulations and performance

Low voltage 40 nm DML FA was designed to prove the func-tionality and robustness of the proposed low voltage DML meth-odology. The adder was compared to CMOS and Dynamic FAs [18].The design of the DML FA was based on the optimization devel-oped in Section 3. Sizing of the stacked transistors in all designswas based on the analysis introduced in the previous section. Fig. 6shows the implementation of the FA in the different logic families.The Conventional CMOS design has 24 transistors, whereas theDynamic design has 20, and DML has 30 transistors. It should benoted that the increased transistor count in the DML FA does notoccupy more area, since it employs more than 50% of minimumsized transistors. All the adders were designed and characteri-zed in a standard low power 40 nm process using the CadenceVirtuoso based Specter simulator. Power supplies between 150 mVand 600 mV were tested for energy estimation. Monte Carlostatistical simulations were performed at 300 mV to compare thesensitivity of the simulated adders to process variations andmismatch.

4.1. Energy and delay analysis

Each FA was tested for delay. DML gates were tested both instatic and dynamic modes of operation. Each FA implementationwas analyzed in order to find its critical path. The simulation setupwith the Device Under Test (DUT) is shown in Fig. 7. Theconfiguration of the connection between the FAs in the chainwas selected to highly strain the chain. We performed a 5 k point

~Cout ~Sum

a b

c

a

b

b c

c

b

aa

clk

clk~

Cout ~Sum

' b' c'

' b' c' a'

b'

c'

c'

b'

a'

clk~

OS, (b) Dynamic logic, and (c) DML.

Fig. 8. Monte Carlo full adder delay analysis, VDD¼300 mV. μCMOS¼155 n sCMOS¼84 n. μDynamic¼150 n, sDynamic¼141 n. μDML_Static¼190 n sDML_Static¼80 n. μDML_Dynamic¼96 nsDML_Dynamic¼48 n.

0.2V0.25V0.3V

0.35V0.4V

0.45V

0.5V0.55V

0.2V0.25V

0.3V0.35V

0.4V

0.45V

0.5V

0.2V

0.25V

0.3V

0.35V

0.4V

0.45V

0.5V

0.55V

0.2V0.25V0.3V

0.35V0.4V

0.45V

0.5V

0.55V

0.4

0.9

1.4

1.9

2.4

2.9

3.4

3.9

0.019 0.19 1.9 19

Ener

gy\o

pera

tion

[fJ]

log (Delay[10e-7]) [Sec]

CMOS

DML-Dynamic

Dynamic

DML-Static

Fig. 9. Full adder E–D plots (per operation) of CMOS, dynamic, DML dynamic andstatic as a function of Vdd.

Fig. 7. DUT configuration for delay analysis.


Monte Carlo transient simulation to measure delay, which isshown in Fig. 8. As expected, low voltage dynamic FA failed inmost of the tests. When it did succeed, it performed poorly (with ahigher mean delay than CMOS). Moreover, it displayed the highestoverall variance. On the other hand, Dynamic DML featured thelowest overall delay, while CMOS and Static mode DML providedvery similar, slower delays.

To analyze performance and energy consumption we con-structed a 20-bit ripple adder using the FAs illustrated in Fig. 6.

The Delay and Energy (E–D) analysis of a single block of the rippleadder chain is shown in Fig. 9. It should be noted that static DMLconsumes less energy than CMOS with degraded performance,while dynamic DML consumes less energy than dynamic logic andachieves boosted performance as compared to its CMOS counter-part. Also, in all cases the minimum energy points (MEP) arelocated in the ST region [6,7]. An interesting observation is that,operation at the MEP of static DML (250 mV) offers an iso-frequency of 1.62 MHz with less than 0.72 fJ of energy consump-tion, while switching to dynamic mode provides about 2.3�frequency boost to 3.69 MHz with less than a 40% increase inenergy consumption. Note, plotting the analytical delay graphsversus the measurements delay graphs shows no practicallyvisible error for voltages higher than 300 mV and an average leastsquare fit error of less than1–2% (for all 1–3 stacks). This makes themodel very accurate and attractive.

The chosen activity factor for the E–D graph shown in Fig. 9 is 1.Activity factor typically reveals tradeoffs regarding switching andleakage energy to analyze them separately. The reason that thisactivity factor was chosen is due to worst case considerations forDML gates (in terms of the objective: Energy). At all times and inall operation modes DML logic has at least one minimal sizedtransistor network. It is well known that the Energy dissipation inthese voltage regions is dominated by the sub-threshold and gateleakage currents and that they are linear dependent on thetransistors width which makes the DML leakage substantiallylower. Therefore, activity factor of 1 is quite harsh and presentsthe pessimistic factor.

Fig. 10. Monte Carlo SNM analysis, VDD¼300 mV. μDML_Static¼91 m, sDML_Static¼8 m.μCMOS¼86 u, sCMOS¼6 m.


4.2. Static noise margin analysis

In order to investigate the performance and robustness of theFAs, static noise margin (SNM) analysis was performed. This testquantitatively estimates the robustness of the design, and itssusceptibility to process variations. SNM was measured in a verysimilar way to that introduced by Kwong et al. [16]. The methodconsists of butterfly plot analysis, and verifies proper VOL and VOH

values. The inputs of two FAs are cross-coupled, so they will act asinverters, and feed each other. Then the input voltage is swept, andthe two output curves of the FAs are superimposed to create abutterfly curve. SNM is measured as the inscribed biggest squarein the smaller lobe of the butterfly plot. Fig. 10 shows the MonteCarlo SNM results. The SNM received from static DML is slightlyhigher than CMOS (86 u in CMOS, versus 91 u in DML), and overall,DML in static mode seems to be as robust as CMOS. As mentionedin Section 4.1 the Dynamic version of the FA failed to operate at300 mV.

5. Conclusions and discussion

This paper introduced a novel logic family called DML that canbe switched between static and dynamic operation via a simplecontrol signal. The basic DML gate structure and operation werereviewed, and then an optimization method for DML gates wasintroduced and employed. This optimization consisted of stacksizing analysis for the ST and NT regions. We implemented a DMLFA and compared it to its CMOS and Domino counterparts. Finallywe used the FAs to construct a 20-bit ripple adder. Monte Carlosimulations were performed, and we showed that in dynamicmode, the DML FA is more robust and consumes less power thanDomino logic, and is faster than the CMOS gate with a higherpower figure. In conclusion, dynamic operation of the DML FAprovides 200% more speed than CMOS, while static operationprovides very similar speed and power performance to its CMOScounterpart. Future work includes implementation of a morecomplex Low Voltage DML test bench. For example, we plan to

design a high performance Low Voltage DML CLA adder which isbased on the concept presented in [19]. According to this concept,the CLA presents the ability to constantly evaluate the criticalpaths and dynamically speed up these routes. This way a moreefficient tradeoff between performance and power dissipation isachieved.

References

[1] A.P. Chandrakasan, S. Sheng, R.W. Brodersen, Low-power CMOS digital design,IEEE J. Solid-States Circuits 27 (2002) 473–484.

[2] J. Kwong, Y.K. Ramadass, N. Verma, A.P. Chandrakasan, A 65 nm sub-Vtmicrocontroller with integrated SRAM and switched capacitor DC-DC con-verter, IEEE J. Solid-States Circuits 44 (2009) 115–126.

[3] D. Markovic, C.C. Wang, L.P. Alarcon, J.M. Rabaey, Ultralow-power design innear-threshold region, Proc. IEEE 98 (2010) 237–252.

[4] G. Gammie, A. Wang, H. Mair, R. Lagerquist, M. Chau, P. Royannez,S. Gururajarao, U. Ko, SmartReflex power and performance managementtechnologies for 90 nm, 65 nm, and 45 nm mobile application processors,Proc. IEEE 98 (2010) 144–159.

[5] B. Zhai, S. Hanson, D. Blaauw, D. Sylvester, Analysis andmitigation of variability insubthreshold design, in: Proceedings of the 2005 International Symposium onLow Power Electronics and Design, 2005, pp. 20–25.

[6] B.H. Calhoun, A. Wang, A. Chandrakasan, Modeling and sizing for minimumenergy operation in subthreshold circuits, IEEE J. Solid-State Circuits 40 (2005)1778–1786.

[7] S. Hanson, B. Zhai, K. Bernstein, D. Blaauw, A. Bryant, L. Chang, K.K. Das,W. Haensch, E.J. Nowak, D. Sylvester, Ultralow-voltage, minimum-energyCMOS, IBM J. Res. Dev. 50 (2006) 469–490.

[8] H. Soeleman, K. Roy, B. Paul, Sub-domino logic: ultra-low power dynamic sub-threshold digital logic, VLSI Des. (2001) 211.

[9] H. Mair, A. Wang, G. Gammie, D. Scott, P. Royannez, S. Gururajarao, M. Chau,R. Lagerquist, L. Ho, M. Basude, A 65-nm mobile multimedia applicationsprocessor with an adaptive power management scheme to compensate forvariations, IEEE Symp. VLSI Circuits (2007) 224–225.

[10] G. Gammie, A. Wang, M. Chau, S. Gururajarao, R. Pitts, F. Jumel, S. Engel, P.Royannez, R. Lagerquist, H. Mair, A 45 nm 3.5 g baseband-and-multimediaapplication processor using adaptive body-bias and ultra-low-power techni-ques, in: Solid-State Circuits Conference, 2008. ISSCC 2008. Digest of TechnicalPapers, IEEE International, 2008, pp. 258–611.

[11] B.H. Calhoun, A.P. Chandrakasan, Ultra-dynamic voltage scaling (UDVS) usingsub-threshold operation and local voltage dithering, IEEE J. Solid-StatesCircuits 41 (2006) 238–245.

[12] M.E. Hwang, K. Roy, ABRM: adaptive β-ratio modulation for process-tolerantultradynamic voltage scaling, IEEE Trans. VLSI Syst. 18 (2010) 281–290.

[13] I.E. Sutherland, R.F. Sproull, D.F. Harris, Logical Effort: Designing Fast CMOSCircuits, Morgan Kaufmann, San Francisco, 1999.

[14] J. Keane, H. Eom, T.H. Kim, S. Sapatnekar, C. Kim, Subthreshold logical effort: asystematic framework for optimal subthreshold device sizing, in: DesignAutomation Conference, 2006 43rd ACM/IEEE, pp. 425–428.

[15] D. Harris, B. Keller, J. Karl, S. Keller, A transregional model for near-thresholdcircuits with application to minimum-energy operation, in: 2010 InternationalConference on Microelectronics (ICM), December 2010, pp. 64–67.

[16] J. Kwong, A.P. Chandrakasan, Variation-driven device sizing for minimumenergy sub-threshold circuits, in: Proceedings of the 2006 InternationalSymposium on Low Power Electronics and Design, pp. 8–13.

[17] R. Hossain, High Performance ASIC Design: Using Synthesizable Domino Logicin an ASIC Flow, Cambridge University Press, New York, 2008.

[18] C.H. Chang, J. Gu, M. Zhang, A review of 0.18 μm full adder performance fortree structure arithmetic circuits, IEEE Trans. VLSI Syst. 13 (6) (2005).

[19] I. Levi, O. Bass, A. Kaizerman, A. Belenky and A. Fish, "High speed Dual ModeLogic Carry Look Ahead Adder," in Circuits and Systems (ISCAS), 2012 IEEEInternational Symposium on, pp. 3037–3040, 2012.

[20] A. Kaizerman, S. Fisher and A. Fish, "Subthreshold Dual Mode Logic," VeryLarge Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 21, pp. 979–983, 2013.

low voltage dual mode logic model analysis and parameter ... · unsurprisingly, cmos logic has also...

Documents