
Effect of Technology Scaling on Digital CMOS Logic Styles

Mohamed Allam, Mohab Anis and Mohamed Elmasry*

Abstract

In this paper, the main challenges of technology scaling are reviewed in depth. Five popular logic families, namely Conventional CMOS, CPL, Domino, DCVS and MCML, are presented, highlighting their advantages and drawbacks. The behavior of each logic style in deep submicron technologies is analyzed and predicted for future generations. To verify the qualitative analysis, simulations were performed on the basic logic gates, a full adder and a 16-bit Carry Look Ahead adder. The circuits were implemented in 0.8, 0.6, 0.35 and 0.25 µm CMOS technologies.

1 Introduction

Ever since the invention of the first integrated circuit, device dimensions, supply voltage, threshold voltage and oxide thickness have been scaled down at a dramatic rate over the past three decades [1]. This scaling is considered the main stimulus to the growth of the microelectronics industry. But as technology scales down, phenomena such as short channel effects, hot carriers and subthreshold leakage currents dominate the functionality of CMOS logic circuits. Depending on the application, the kind of circuit to be implemented and the technology used, different performance aspects vary significantly from one logic style to another. Choosing the appropriate logic style for a certain application is becoming a challenge, where the designer undergoes exhaustive simulations to evaluate the various implementations.

Considerable potential for high speed and power savings exists by means of a proper choice of logic style for implementing combinational circuits. This is because the parameters governing power dissipation and performance are strongly influenced by the chosen logic style. Power dissipation is governed by the supply voltage, operating frequency, nodal switching activity and device sizes. Speed, on the other hand, is affected by the number of inversion levels, the number of devices in series, supply voltage, device sizes and interconnect wiring capacitance. The circuit's robustness with respect to voltage and device scaling, process variations and compatibility with surrounding circuits is also affected by the logic style used for implementation. These parameters are also influenced by the technology used for implementation, so a logic style that is favorable for a certain application in one technology is not necessarily favorable as the technology is varied.
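As a rough illustration of how these parameters interact, the following sketch uses the standard first-order model for switching power, P = a * C * VDD^2 * f. The model and every numerical value below (load capacitance, clock frequency, activity factor) are assumptions chosen for illustration; they are not taken from the simulations reported in this paper.

```python
def dynamic_power(activity, c_load, vdd, freq):
    """First-order switching power in watts: activity * C * VDD^2 * f."""
    return activity * c_load * vdd ** 2 * freq


# Hypothetical node: 50 fF load, 100 MHz clock, switching activity of 0.2.
p_5v = dynamic_power(0.2, 50e-15, 5.0, 100e6)
p_2v5 = dynamic_power(0.2, 50e-15, 2.5, 100e6)

# Halving the supply voltage cuts the switching power by a factor of four.
print(f"{p_5v * 1e6:.2f} uW at 5 V, {p_2v5 * 1e6:.2f} uW at 2.5 V")
```

The quadratic dependence on VDD is the reason supply scaling is the most effective lever on switching power in this model, while activity and capacitance enter only linearly.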

A metric that is heavily influenced by technology scaling, and that describes the efficiency of the circuit in terms of performance and power dissipation, is the Energy-Delay Product (EDP) [2]. In order to illustrate the influence of technology scaling on the behavior of digital circuits, Conventional CMOS, CPL, Domino, DCVS and CML logic styles are used to implement the basic logic gates, full adders, and a 16-bit Carry-Look-Ahead (CLA) adder. These circuits are implemented in 0.8, 0.6, 0.35 and 0.25 µm CMOS technologies under nominal operating conditions, and are all optimized for minimum EDP values. An overview of the most important logic styles is first presented, followed by how logic styles are affected by technology scaling. Finally, simulation results are presented to verify the qualitative analysis.

*M.W. Allam, M.H. Anis and M.I. Elmasry are with the VLSI Research Group, Department of Electrical and Computer Engineering, University of Waterloo, ON N2L 3G1, Canada.

2 Logic Styles

2.1 Conventional CMOS

Logic gates in conventional CMOS are built from an N block and a P block. An AND-OR-Invert (AOI) CMOS gate is shown in Figure 1(a). The N block implements a sum-of-products function to evaluate the "0" state by creating a path from the output to GND. The P block evaluates the "1" state of the output by implementing a product-of-sums function to create a path from VDD to the output node. This is equivalent to stating that the output node is always a low-impedance node in steady state. The N and P networks should be designed so that, whatever the value of the inputs, one and only one of the networks is conducting at steady state. The main drawback of CMOS circuits is the existence of the P block, due to its low mobility. The PMOS devices therefore have to be sized up. Furthermore, the input capacitance of a CMOS gate is large because each input is connected to the gates of at least one PMOS transistor and one NMOS transistor. This also degrades the gate's speed. However, the best gate performance is achieved with a PMOS/NMOS width ratio of $\sqrt{\mu_n/\mu_p}$ [3]. This ratio will eventually approach 1 in Deep-Submicron (DSM) technologies, where the carrier drift velocities in NMOS and PMOS transistors become almost equal due to velocity saturation. Another drawback of CMOS is the relatively weak output driving capability due to series transistors in the output stage.

Another impact of the large input capacitance of a CMOS gate is high power dissipation. However, static CMOS circuits have a smaller switching activity and short-circuit current compared to the other logic styles. CMOS is also robust against voltage and transistor scaling, and thus operates reliably at low voltages and minimal transistor sizes. This is attributed to the presence of a static path that restores the correct logic state in the case of noise. Throughout this paper, the term "CMOS" refers to "Conventional CMOS".

0-7803-5809-0/00/$10.00 © 2000 IEEE. IEEE 2000 Custom Integrated Circuits Conference.



2.2 Complementary Pass Logic (CPL)

A CPL gate [4] consists of two NMOS logic networks (one for each signal rail), two small pull-up PMOS transistors for swing restoration, and two output inverters for the complementary output signals. Figure 1(b) shows an AOI circuit implemented using CPL. Unlike CMOS logic, the CPL gate creates a path from the output node to one of the input nodes of the gate instead of the power lines. Because the MOS networks are connected to variable gate inputs rather than constant power lines, only one signal path through each network must be active at a time in order to avoid shorts between the inputs. Therefore, each pass-transistor network must realize a multiplexer (MUX) structure. All two-input functions (AND, OR and XOR) are therefore implemented by this basic MUX structure. This is relatively expensive for simple monotonic gates such as AND and OR.

In most cases, CPL uses fewer and smaller transistors, especially in XOR- and MUX-based functions. CPL therefore presents small input loads, good output driving capability due to the output inverters, and a fast differential stage due to the cross-coupled PMOS pull-up transistors. However, most CPL gates require all the inputs and their complements, which increases the routing complexity and overhead and ultimately increases power and delay. Since the CPL gate is constructed mainly from N transistors, the output voltage swing will be lower than the input swing by the NMOS threshold voltage VTHN. This could cause DC current to flow through the inverter. A swing-restoring circuit should therefore be added after every two or three cascaded gates to restore the full output swing. This in turn adds to the power of the circuit. The layout of pass-transistor cells is not as straightforward and efficient as CMOS due to the rather irregular transistor arrangements and the high wiring requirements caused by the double rails.

2.3 Domino Logic

The AOI structure of a domino logic gate [5] is shown in Figure 1(c). It is a non-inverting structure and consists of a dynamic gate stage, a static CMOS inverter, which provides the circuit's output, and a PMOS keeper transistor, which restores the logic at the Domino output node. The dynamic gate stage consists of an NMOS transistor network, which implements the required function, and two transistors (one NMOS and one PMOS) to which the clock signal is applied to synchronize the operation of the circuit. The CMOS inverter is included for the proper operation of a chain of domino gates, and to increase the driving capability of the gate. The keeper transistor restores the logic and gives the domino gate immunity against charge sharing and charge loss [6].

Any number of logic stages can be cascaded, provided that the sequence can evaluate within the evaluate clock phase. The input signal to a domino gate must therefore satisfy some setup and hold timing constraints for correct operation of the gate [7].

Domino logic has a low transistor count and low input capacitance, which enhances its speed. Furthermore, since the logic block is constructed only from high-mobility N transistors, the evaluation is fast. Domino logic, however, consumes large power. This is attributed to its high switching activity, because all the output nodes are precharged to VDD each clock cycle, as well as to the large clock load switching at full rate.
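The difference in switching activity between a static gate and a precharged gate can be illustrated with a small sketch. It assumes independent, uniformly random inputs and counts only transitions of a single node; the function names are hypothetical helpers, not part of any tool used in this work. For a static gate, the power-consuming 0-to-1 transition probability is P(out=0) * P(out=1). For a precharged node, every cycle in which the pull-down network conducts discharges the node, which must then be recharged.

```python
from itertools import product


def static_activity(f, n):
    """0->1 transition probability of a static gate with n random inputs."""
    outs = [f(*bits) for bits in product((0, 1), repeat=n)]
    p1 = sum(outs) / len(outs)
    return p1 * (1 - p1)


def dynamic_activity(pulldown, n):
    """Probability that a precharged node discharges in a given cycle."""
    outs = [pulldown(*bits) for bits in product((0, 1), repeat=n)]
    return sum(outs) / len(outs)


def or2(a, b):
    return a | b


# Two-input OR: the static gate output rises in 3/16 of the cycles, while the
# precharged node of a dynamic OR discharges in 3/4 of them.
print(static_activity(or2, 2), dynamic_activity(or2, 2))
```

The sketch ignores the clock load and the output inverter, both of which add further switching in a real Domino gate.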

Domino logic is very susceptible to noise. A voltage at the input as low as VTH could turn on the NMOS pull-down transistor, and the output will eventually reach GND. This translates to a noise margin (NM) of VTH, which is quite low compared to static versions. Some subthreshold leakage current can flow through the NMOS even when the input is "0". This effect becomes more pronounced when the input is not completely "0" but approaches VTH in the presence of noise, causing the N-devices to turn ON. To compensate for the low noise margins, the size of the PMOS keeper must increase, in turn increasing the contention current during evaluation and consequently reducing the gate's performance. This is the typical Speed-Noise Margin trade-off in Domino logic circuits. Another problem of Domino circuits is that only noninverting logic can be implemented. This is a problem in the implementation of XOR gates and full adders (FA). A Domino style which overcomes this problem is NP-Domino [8]. NP-Domino was used to realize the simulated XOR and FAs in this work.

2.4 Differential Cascode Voltage Switch (DCVS)

The static and dynamic DCVS logic were first proposed by Heller et al. as a high performance logic family [9]. The static version suffered from major drawbacks: (1) high dynamic power, (2) limited driving capability and (3) complex design. On the other hand, the dynamic version experienced speed-noise margin trade-offs similar to Domino logic. The dynamic DCVSL (DDCVSL) was therefore proposed.

Figure 1(d) presents the architecture of an AOI gate implemented in DDCVSL logic. During the precharge phase (CLK=0), both keeper transistors Q1,2 will be OFF. Unlike domino logic, the keeper transistors will be OFF at the start of the evaluation phase (CLK=1), which reduces the power and delay caused by contention. One branch will implement the required function, while the other branch implements its inverse. DDCVSL is considered a general-purpose logic style because it may be used to implement inverting and noninverting logic circuits. DCVS is more area efficient in implementing complex logic gates. Most complex logic functions may be implemented using one gate only, which makes DCVS logic much faster than CMOS circuits. It is also suitable for implementing gates with XOR functionality, like arithmetic circuits and MUX-style logic gates. Over the past fifteen years, many flavors of Cascode Voltage Switch Logic (CVSL) were introduced. Differential Cascode Voltage Switch with Pass Logic family (DCVSPG) uses pass logic to implement the logic function of each branch [10]. It avoids the problem of the floating output node that exists in DCVS logic. Switched Output Differential Structure (SODS) replaces the PMOS latch with a clocked latch to avoid the contention [11]. Charge Recycling Differential Logic (CRDL) reduces power dissipation by shorting the output nodes before each evaluation phase [12].

2.5 MOS Current Mode Logic (MCML)

Figure 2(a) shows the architecture of an MCML inverter/buffer. Transistor Q1 acts as a DC current source controlled by Vref. Resistors R1 and R2 are pull-up resistors. The logic function is implemented by the logic block connected between the resistors and the current source. For an inverter/buffer, the logic block is the differential pair constructed by transistors Q2 and Q3. The operation of the CML is based on the differential pair circuit. Each differential input variable is connected to a differential pair circuit. The value of the input variable controls the flow of current through the two branches. For example, if VGS(Q2) is higher than VGS(Q3), the current passing through Q2 will be higher than that passing through Q3. Therefore, the voltage of node N1 will start to drop until reaching a steady state where the current going through the resistor R1 matches the current going through transistor Q2. The amount of current going through the ON branch (Q2 in the previous case) controls the discharge delay of the logic gate, while the load resistor controls the charging of the output nodes. The output voltage swing VS is defined as the voltage difference between N1 and N2. The small output swing of MCML circuits reduces crosstalk between adjacent signals. The constant current source reduces the switching noise and supply fluctuations. For those reasons, MCML is recommended for mixed signal design to reduce the interference between the digital and analog blocks [13], [14]. The reduced output swing also reduces the dynamic power dissipation for long busses. Therefore, MCML may be used in the implementation of bus transceivers. Another important feature of CML circuits is their noise immunity due to the differential nature, which is recommended at high operating frequencies.

Figure 1: Full Swing Logic Styles. (a) CMOS, (b) CPL, (c) Domino, (d) DCVS.

However, MCML has some major drawbacks which limit its use in digital systems. First is the static power dissipation due to the constant current source, which is independent of the operating frequency. Therefore, MCML is preferred in high frequency applications only, to reduce the overhead of its static biasing power. MCML circuits are not suitable for power-down mode because of the DC current source. MCML circuits also require special fabrication technologies to implement the large load resistors in a reasonable area, which increases the cost and area of the chip. A reference voltage distribution tree has to be included in the design to distribute Vref, leading to larger chip area and more complex routing. Finally, matching the rise and fall delays is not an easy task because of their dependence on the load of each gate.
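The frequency argument above can be made concrete with a first-order sketch. It assumes that an MCML gate dissipates VDD * I continuously, that its swing is I * R when the tail current is fully steered into one branch, and that a CMOS gate driving a comparable load dissipates a * C * VDD^2 * f. The operating point (supply, bias current, load resistor, capacitance, activity) is hypothetical and is not taken from the simulations in this paper.

```python
def mcml_static_power(vdd, i_bias):
    """Static power of one MCML gate: the tail current flows continuously."""
    return vdd * i_bias


def mcml_swing(i_bias, r_load):
    """Output swing when the tail current is fully steered into one branch."""
    return i_bias * r_load


def cmos_dynamic_power(activity, c_load, vdd, freq):
    """First-order CMOS switching power."""
    return activity * c_load * vdd ** 2 * freq


def crossover_frequency(i_bias, activity, c_load, vdd):
    """Frequency at which CMOS switching power equals MCML static power."""
    return i_bias / (activity * c_load * vdd)


# Hypothetical operating point (assumed values, same supply for both styles).
VDD, I_BIAS, R_LOAD = 2.5, 50e-6, 8e3
ACTIVITY, C_LOAD = 0.5, 50e-15

f_x = crossover_frequency(I_BIAS, ACTIVITY, C_LOAD, VDD)
print(f"MCML swing: {mcml_swing(I_BIAS, R_LOAD):.2f} V")
print(f"CMOS power exceeds MCML power above about {f_x / 1e6:.0f} MHz")
```

Below the crossover frequency the constant bias current is pure overhead, which matches the statement that MCML pays off only at high operating frequencies.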

Figure 2: MCML. (a) Inverter, (b) AOI.

3 Effect of Technology on Logic Styles

3.1 Velocity Saturation and Mobility Degradation

In order to evaluate the output logic of a gate implemented in some logic style, a series of charging and discharging processes occur at the output node (at which the logic is determined). As the input of a logic gate changes, it causes the output node(s) to either charge or discharge. This is true for logic styles consisting of an N logic block. A static CMOS inverter is a simple example, the delay of which is the time taken for the output node to fully charge or discharge.



For full swing logic styles, the NMOS will go through all the operating phases (cut-off, saturation and linear modes) while discharging the output node. The transistor is initially in the cut-off mode, when the input is "0". As the input increases, the NMOS operates in two regions: saturation and linear. The NMOS will first operate in saturation, where the drain current IDS is large ($I_{DS} \propto (V_{GS} - V_{TH})^{\alpha}$), which discharges the O/P node quickly. Here α is the velocity saturation index [15], which takes a value of 2 for long-channel devices and around 1.3 for short-channel devices. The NMOS will operate along a constant-VGS curve in the saturation region of the typical IDS/VDS characteristics plot. When the output node reaches VDD - VTHN, the NMOS moves from the saturation to the linear region. IDS in the linear region is less than in the saturation region for the same VGS [15], which causes the discharge to slow down.
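The effect of the velocity saturation index on the saturation current can be sketched numerically. The function below evaluates only the proportionality I_DS ∝ (V_GS - V_TH)^α; the constant k and the bias values are arbitrary assumptions used for illustration.

```python
def saturation_current(vgs, vth, alpha, k=1.0):
    """Alpha-power-law saturation current, k * (VGS - VTH)^alpha (arbitrary units)."""
    if vgs <= vth:
        return 0.0  # device is cut off
    return k * (vgs - vth) ** alpha


VTH = 0.5  # hypothetical threshold voltage in volts

# Ratio of drive currents when the gate overdrive is doubled (1 V -> 2 V).
ratio_long = saturation_current(2.5, VTH, 2.0) / saturation_current(1.5, VTH, 2.0)
ratio_short = saturation_current(2.5, VTH, 1.3) / saturation_current(1.5, VTH, 1.3)

print(f"long channel (alpha = 2):    x{ratio_long:.2f}")
print(f"short channel (alpha = 1.3): x{ratio_short:.2f}")
```

With α near 1.3 the saturation current responds much more weakly to gate overdrive than in the long-channel case, so the saturation phase contributes less to the discharge as devices shrink.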

The slowest transition, however, is from cut-off → saturation, because all the charge stored in the depletion region of the NMOS device has to sink before the channel is constructed between the drain and the source. MCML is therefore faster than other logic styles (refer to Figure 2(a)). This is because Q2 and Q3 are never totally OFF, and experience a transition from the saturation → linear region and vice versa, which takes a short time.

The speed advantage of CML over other logic styles will start to fade as we move deeper into the DSM regime, where saturation currents are reduced compared to the linear currents and no longer follow the long channel behavior (α approaches 1). Not only will the carrier velocity tend to saturate as the channel length is scaled down, but the device's mobility will start to degrade as well. Figures 3(a) and 3(b) show the velocity saturation and mobility degradation of the electron, respectively.

Figure 3: Velocity Saturation & Mobility Degradation. (a) Velocity Saturation, (b) Mobility Degradation.

In NMOS, the saturation velocity is reached at a lower critical electric field compared to PMOS. This indicates that µn is degraded at a much faster rate than µp [16]. Eventually, a point is reached where both NMOS and PMOS have comparable mobilities and switching speeds. This is particularly important for the implementation of CMOS structures, for two reasons. Firstly, CMOS suffers from degraded performance because of the low mobility PMOS transistors. This speed disadvantage will gradually decrease as the technology scales down and µn approaches µp. This enhances the performance of CMOS in terms of delay, power and area. Secondly, the optimum noise margin in CMOS is achieved when µp equals µn [17]. With µp = µn, the CMOS noise margin is enhanced, and equal driving capability is achieved, which keeps the short-circuit current within bounds [3]. Thus CMOS performance and robustness are both enhanced relative to other styles as technology scales down.

3.2 Hot carrier effect (HCE)

Another phenomenon that takes place as the technology is scaled down is the hot carrier effect (HCE) [16]. The scaling down of the gate oxide thickness TOX at a higher rate than the supply voltage causes the electric field across the gate to increase, which increases the electron velocity. Electrons that reach a high enough energy level leave the silicon and tunnel into the gate oxide. Electrons trapped in the oxide change VTH, typically increasing the VTH of NMOS devices (VTHN) while decreasing the VTH of PMOS devices. MCML may have some trouble with the VTH variation caused by the HCE, because the devices have to be matched for correct functionality. HCE is another reason that makes low voltage operation favorable. Logic families that can work at a lower supply voltage, like MCML (with no degradation in functionality), will be preferred because this reduces the HCE and the punch-through phenomenon, leading to better reliability and lifetime.

Logic styles that can tolerate minor changes in VTH will gain more importance because the HCE and electromigration tend to increase VTHN over time. For Domino and DCVS logic, this translates into a small variation in delay and a better noise margin. On the other hand, the higher VTHN may cause MCML to cease functioning. This is attributed to the fact that increasing VTHN would decrease the discharge current, limiting the voltage swing VS. When VS is small, it might cause the following CML stages to malfunction. Circuits implemented using CPL also have degraded performance when affected by HCE, as a larger voltage drop (VTHN) is produced across the pass transistor. The pass transistor and output inverter will therefore have lower switching speeds, because the current is reduced. Short-circuit currents also take place, adding to the CPL's power dissipation.

3.3 Leakage currents

The performance of dynamic styles, particularly Domino, will degrade in DSM technologies. As explained in section 2.3, Domino logic is particularly susceptible to noise, due to the effect of leakage currents. Leakage currents become more pronounced as we move deeper into the DSM regime. This deteriorates the gate's noise margin. To compensate for the low noise margins, the size of the PMOS keeper must increase, in turn increasing the contention current during evaluation, as well as the loading of the O/P node. This reduces the gate's performance.

The rate of improvement in the Domino's performance will therefore gradually decrease as we go deeper into DSM technologies. This is another reason that the performance of CMOS circuits is expected to approach that of the dynamic logic gates without compromising noise margins. Figure 4 [18] plots the optimal VTH versus process technology for the static and dynamic cases. It is clear that the optimal VTH values used in static and dynamic circuits diverge. Static circuits need lower VTH to maintain gate drive with lower VDD, while in dynamic circuits it becomes difficult to scale VTH due to noise limits.

Figure 4: Optimal threshold voltage for static and dynamic circuits versus technology (horizontal axis: Technology, µm)

3.4 The Drain-Induced Barrier Lowering (DIBL)

DIBL causes VTH to be a function of the operating voltage. VTH decreases with Leff for short-channel devices, while an increase in the drain-source voltage VDS causes VTH to decrease. This effect is called DIBL. It is a problem especially for dynamic circuits, where it reduces the noise margin, and particularly for Domino logic implementations. As mentioned previously, maintaining sufficient noise margin would come at the expense of reduced performance.

3.5 Scaling down VDD/VTH ratio

VTH is scaled down at a relatively slower rate than VDD, as shown in Figure 5. The scaling of VDD is attributed to reliability restrictions that limit the electric field applied to the gate.

Hence, the ratio VDD/VTH drops with technology scaling until it reaches a minimum value of 3 at a feature size of 0.07 µm. This again explains the performance and power degradation of CPL logic styles. To further illustrate this, a section of the CPL circuit is shown in Figure 6.

The voltage at the output of the driver circuit is VDD, while the pass transistor is initially OFF. As transistor Q1 turns ON, it will start operating in the saturation mode, where its current $I_1 \propto (V_{GS1} - V_{THN})^{\alpha}$. In the case of CPL, $V_{GS1} = V_{DD} - V_{THN}$ (due to the $V_{THN}$ drop), thus $I_1 \propto (V_{DD} - 2V_{THN})^{\alpha}$. If VDD were to take the worst-case value of 3VTHN [19], then $I_1 \propto V_{THN}^{\alpha}$. I1 is thus significantly reduced, and the switching speed of Q1 is largely degraded. Furthermore, this will increase the short-circuit current flowing from VDD to GND in the inverter. A further speed degradation occurs as Q2 passes through the saturation and then linear phases while discharging the O/P node. Q2 starts discharging in the saturation mode when $V_{GS2} = V_{THN}$. Thus $I_2 \propto (V_{GS2} - V_{THN})^{\alpha}$, with $V_{GS2}$ initially at $V_{THN}$ as the O/P node starts to discharge. This produces a very small discharging current, hence a large time delay. This goes on until the output node goes down to $V_{DD} - V_{TH}$, where the keeper turns ON, pulling up the internal node to VDD and hence accelerating the discharge process. This provides an additional delay constraint to CPL. Another problem associated with decreasing the VDD/VTH ratio is the reduction in gate robustness, because the noise margins will dwindle. CPL is also sensitive to voltage and device scaling [20], which again influences the gate's performance, power consumption and robustness.

Figure 5: Scaling trend for VDD and VTH (horizontal axis: Technology, µm)

Figure 6: Section of a gate implemented using CPL (labels: Driver Circuit, VDD - VTH)
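To give a feel for how strongly the VDD/VTH ratio affects the pass transistor, the sketch below compares the CPL drive term (VDD - 2*VTHN)^α with the drive term of a transistor whose gate receives the full swing, (VDD - VTHN)^α. The full-swing reference and the use of the velocity saturation index α from section 3.1 as the exponent are assumptions made for this illustration; the function name is hypothetical.

```python
def cpl_relative_drive(vdd_over_vth, alpha):
    """Ratio (VDD - 2*VTH)^alpha / (VDD - VTH)^alpha for a given VDD/VTH."""
    if vdd_over_vth <= 2.0:
        return 0.0  # the pass transistor no longer turns on in this model
    return ((vdd_over_vth - 2.0) / (vdd_over_vth - 1.0)) ** alpha


for ratio in (10.0, 5.0, 3.0):
    long_ch = cpl_relative_drive(ratio, 2.0)
    short_ch = cpl_relative_drive(ratio, 1.3)
    print(f"VDD/VTH = {ratio:4.1f}: alpha=2 -> {long_ch:.2f}, alpha=1.3 -> {short_ch:.2f}")
```

In this simplified model the relative drive of the pass transistor falls steadily as VDD/VTH approaches the worst-case value of 3, which is the trend the text describes for CPL.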

3.6 Scaling of Interconnects

CPL has a complex structure and a high wiring overhead due to the dual-rail signals. The wiring (interconnect) capacitance is high, causing the power and delay to grow as well. This becomes worse in the DSM regime, where the RC delay of the interconnects occupies a large fraction of the clock cycle time, reaching over 30% in the 0.25 µm technology [21], as shown in Figure 7. This is another reason for the degradation in CPL's performance. Complex structures implemented with DCVS also suffer from interconnect scaling.

Figure 7: Trend of the ratio of the interconnect RC delay and the clock cycle (horizontal axis: Technology, µm)

4 Area Considerations

4.1 Technology Scaling and Area

Metal interconnects are needed to connect transistors, route signals and supply power across the integrated circuit chip. As technology scales down, transistor feature sizes scale down linearly, while this is not true for metal wire interconnects due to physical limitations on the metal deposition. The interconnect pitch (metal width + space) is decreasing to exploit integration. However, the interconnect length is kept constant because more transistors are used per circuit. This leads to an increase in parasitic capacitances and line resistance. This degrades the chip's performance, and more power is dissipated per unit area, which consequently raises the chip's temperature.

In older technologies, poly layers were used for routing because of their reasonable resistance. This is not the case in DSM technologies, where the impedance of the poly layer grows and makes it unsuitable for long interconnects. Such limitations lead to the use of extra vias and metal wires in routing, which adds overhead. Copper interconnects are particularly used to reduce the interconnect area, since the physical limitations on copper size are more relaxed. Copper also has lower resistivity, allowing wires to have small widths and thus lower interconnect delays. However, many problems are associated with the use of copper wiring, which makes it an expensive alternative [22]. The use of a larger number of metal layers and stacked vias is a technique for improving interconnect density without reducing pitch. DSM devices use six or more levels of metal, whereas older technologies used only two or three.

Finally, the interconnect height is scaled at a slower rate than its width. This increases the wire's aspect ratio, and consequently reduces the wire resistance. This, however, increases line coupling, which causes crosstalk, increased power dissipation and degradation in performance.
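The trade-off between wire resistance and line coupling can be sketched with two textbook estimates: R = ρL/(W·H) for a rectangular wire and a parallel-plate approximation C = εLH/S for the capacitance to one neighbouring wire. Both formulas, the material constants and the geometry below are assumptions made for illustration; fringing fields and the capacitance to other layers are ignored.

```python
def wire_resistance(rho, length, width, height):
    """Resistance of a rectangular wire: rho * L / (W * H)."""
    return rho * length / (width * height)


def sidewall_capacitance(eps, length, height, spacing):
    """Parallel-plate estimate of the coupling capacitance to one neighbour."""
    return eps * length * height / spacing


RHO_AL = 2.7e-8           # ohm*m, approximate resistivity of aluminium
EPS_OX = 3.9 * 8.854e-12  # F/m, approximate permittivity of SiO2
LENGTH = 1e-3             # 1 mm wire (hypothetical)
BASE = 0.8e-6             # hypothetical starting width, spacing and height


def scaled_wire(width_scale, height_scale):
    """Return (R, C) after scaling width/spacing and height independently."""
    width = spacing = BASE * width_scale
    height = BASE * height_scale
    r = wire_resistance(RHO_AL, LENGTH, width, height)
    c = sidewall_capacitance(EPS_OX, LENGTH, height, spacing)
    return r, c


r_full, c_full = scaled_wire(0.5, 0.5)  # height scaled together with width
r_slow, c_slow = scaled_wire(0.5, 0.8)  # height scaled more slowly

print(f"resistance: {r_full:.1f} ohm -> {r_slow:.1f} ohm")
print(f"coupling:   {c_full * 1e15:.1f} fF -> {c_slow * 1e15:.1f} fF")
```

Keeping the wire taller lowers its resistance but raises the sidewall capacitance by the same factor, which is the coupling penalty mentioned above.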

4.2 Logic Style and Area

The choice of logic style affects the area in two ways: cell area and routing area. Cell area is a function of the number and size of the devices. It also depends on the complexity of the logic cell, since complex gates require more area for connecting the devices of the gate. Generally, the differential logic styles CPL, DCVS and CML are area efficient in implementing arithmetic circuits and XOR-based logic systems. For simple gates such as AND and OR, the single-ended logic styles CMOS and Domino are preferred. Input signals are connected to transistor gates only, which facilitates the usage and characterization of logic cells. The layout of CMOS gates is straightforward and efficient due to the complementary transistor pairs. Routing area is the wire interconnect area for connecting the gates together. Differential logic styles have twice the number of inputs and outputs compared to single-ended logic families, leading to larger interconnect areas. As a rule of thumb, differential logic should be used only for complex gates, especially XOR gates, where it will reduce the total number of logic gates.

5 Results and Analysis

The power and delay results for the logic gates are divided into two groups. The first includes the NAND, NOR and AOI gates (Group I). The second includes the MUX, XOR and FA (Group II). Group I gates are usually implemented using single-ended structures. Generally, CMOS is the most efficient style to implement Group I. Its low power consumption and relatively good delay contribute to its low EDP values. The three dynamic styles follow CMOS in terms of minimum EDP. Among them, CML is the most efficient due to its high speed and limited power. It is followed by DCVS, then Domino logic. Domino proves to be the fastest for the NOR gate, but consumes a large amount of power. The high dynamic power associated with dynamic circuits is partly attributed to their high switching activity. CPL is considered the least efficient logic style to implement Group I gates. This is attributed to its exceptionally long delay and considerably high power, confirming that AND and OR gates are the least efficient gates that could be realized by CPL.

As for the complex structures in Group II, logic styles having inverted signals and dual rails are usually used to implement these functions efficiently. CML and DCVS are the most efficient styles to implement Group II gates. This is attributed to their differential nature, inverted signal structures, sufficient speed and tolerable power dissipation. Despite NP-Domino's high speed, its large power degrades its EDP value when implementing XOR and FA. Both static styles, CMOS and CPL, implement Group II gates inefficiently. However, MUXes are best realized using CPL, while CML tops other styles in implementing XOR and FA gates. XOR and MUX are considered the least efficient gates that could be realized in CMOS because they require inverted signals as inputs.

Figures 8, 9 and 10 present the average normalized delay, power and EDP of Group I gates, while Figures 11, 12 and 13 present the average normalized delay, power and EDP of Group II gates. In Figure 8 it is clear that the speed enhancement for the logic styles decreases as technology scales down. CMOS, however, has the best enhancement. In Figure 13, CPL had the best EDP values in the 0.8 µm technology, but gradually experiences a relative increase in EDP as we move deeper into the DSM regime. It is also worth noting that CML had high EDP values in the 0.8 µm technology (Figures 10 and 13), but achieves low EDPs as technology is scaled down. This is consistent with [14], because MCML works efficiently in power down technologies. Finally, all six graphs verify that both the delay and power of CMOS gates are relatively enhanced in DSM technologies.

Figure 8: Average Normalized Delay for Group I (horizontal axis: Technology, µm)

Figure 9: Average Normalized Power/MHz for Group I (horizontal axis: Technology, µm)

Figure 10: Average Normalized EDP for Group I (horizontal axis: Technology, µm)

Figure 11: Average Normalized Delay for Group II (horizontal axis: Technology (µm), 0.25 to 0.8)

Figure 12: Average Normalized Power/MHz for Group II (horizontal axis: Technology (µm), 0.25 to 0.8)

Table 1 shows the results for the CLA adder. Conventional CMOS proves to have the worst delay, while attaining a roughly average power dissipation value. Conventional CMOS is, therefore, the least efficient way to implement the CLA adder. Domino logic comes as the second worst implementation because of its single-ended structure. All the differential structures have the best EDP when implementing the CLA adder. This is because of the numerous AOI and XOR structures that are used to build the CLA adder. It should be noted that the CPL CLA adder was implemented with single-branch structures. This is the main reason for CPL's limited power consumption.




Figure 13: Average Normalized EDP for Group II (horizontal axis: Technology (µm), 0.25 to 0.8)

Table 1: CLA Comparison (column headings under each quantity are the technology in µm)

Logic    Power (Norm.)               Delay (Norm.)               EDP (Norm.)
Style    0.25   0.35   0.6    0.8    0.25   0.35   0.6    0.8    0.25   0.35   0.6    0.8
CMOS     1      2.12   5.82   26.6   1      1.65   3.1    3.62   1      5.78   56     348
CPL      1.23   1.43   2.49   14.8   0.62   1.58   1.95   3.16   0.48   3.57   9.4    148
Domino   1.33   7.86   11.6   50.5   0.67   0.81   1.62   1.75   0.59   5.2    30.7   154
DCVS     1.57   2.96   3.9    14.6   0.74   0.91   1.17   1.54   0.85   2.46   5.3    34.2
CML      1.96   3.31   4.22   21.5   0.6    0.81   1.15   1.88   0.71   2.17   5.58   75.4
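The EDP columns of Table 1 can be cross-checked against the power and delay columns. The sketch below assumes EDP = power × delay² (energy taken as power × delay, multiplied once more by delay). This relation is inferred from the tabulated numbers rather than stated in this section, and the names `TABLE1`, `max_relative_error` and `ranking_at` are illustrative.

```python
# Table 1 values (normalized). Tuples follow the 0.25, 0.35, 0.6, 0.8 um nodes.
NODES_UM = (0.25, 0.35, 0.6, 0.8)

TABLE1 = {
    "CMOS":   {"power": (1.0, 2.12, 5.82, 26.6),
               "delay": (1.0, 1.65, 3.1, 3.62),
               "edp":   (1.0, 5.78, 56.0, 348.0)},
    "CPL":    {"power": (1.23, 1.43, 2.49, 14.8),
               "delay": (0.62, 1.58, 1.95, 3.16),
               "edp":   (0.48, 3.57, 9.4, 148.0)},
    "Domino": {"power": (1.33, 7.86, 11.6, 50.5),
               "delay": (0.67, 0.81, 1.62, 1.75),
               "edp":   (0.59, 5.2, 30.7, 154.0)},
    "DCVS":   {"power": (1.57, 2.96, 3.9, 14.6),
               "delay": (0.74, 0.91, 1.17, 1.54),
               "edp":   (0.85, 2.46, 5.3, 34.2)},
    "CML":    {"power": (1.96, 3.31, 4.22, 21.5),
               "delay": (0.6, 0.81, 1.15, 1.88),
               "edp":   (0.71, 2.17, 5.58, 75.4)},
}


def edp_from_power_delay(power, delay):
    """Assumed relation: EDP = (power * delay) * delay."""
    return power * delay ** 2


def max_relative_error(table=TABLE1):
    """Largest relative gap between recomputed and tabulated EDP."""
    worst = 0.0
    for row in table.values():
        for p, d, reported in zip(row["power"], row["delay"], row["edp"]):
            gap = abs(edp_from_power_delay(p, d) - reported) / reported
            worst = max(worst, gap)
    return worst


def ranking_at(node_um, table=TABLE1):
    """Logic styles ordered from lowest (best) to highest EDP at one node."""
    i = NODES_UM.index(node_um)
    return sorted(table, key=lambda style: table[style]["edp"][i])


if __name__ == "__main__":
    print(f"max relative error: {max_relative_error():.3f}")
    print("best-to-worst EDP at 0.8 um:", ranking_at(0.8))
```

Under this assumption the recomputed values agree with the tabulated EDP to within about 2%, which is consistent with rounding, and the 0.8µm ordering places CMOS last and Domino second to last, matching the discussion of the CLA adder.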

6 Conclusions

As technology scales down, CMOS is the least affected logic style. Its performance and robustness are enhanced compared to other logic styles. Domino's performance and power will deteriorate because of the leakage currents and the contention caused by the keeper transistor, while DCVS will also suffer from leakage power but does not have any contention problems during evaluation. Because interconnects are not scaled linearly with technology, the percentage of power consumed in the clock tree will grow. CPL performance degrades much faster than other logic styles because of the reduction of the ratio VDD/VTH with technology scaling. The hot carrier effect makes it even worse by increasing VTH over the long term. CPL area will tend to grow, with more power dissipation for the larger area and the complex routing. Although CML tops the logic styles in many circuit implementations in terms of minimum EDP, it is not yet very widely used. This is attributed to the fact that CML cannot be used as standard cells, because the RC delay of each gate varies according to its fan-in and fan-out. MCML may also have some trouble with VTH variations caused by the hot carrier effects. But if MCML is used at a lower supply voltage, the effect of the hot carriers will be less significant.
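The sensitivity to the ratio VDD/VTH can be illustrated with the alpha-power law of [15], in which gate delay is proportional to C·VDD/(VDD − VTH)^α. The sketch below is a rough illustration only: the supply and threshold values and α = 1.3 are hypothetical, not taken from the paper, and the load capacitance is held constant so that only the voltage term is visible.

```python
def voltage_delay_factor(vdd, vth, alpha=1.3):
    """Voltage-dependent part of the alpha-power-law delay:
    VDD / (VDD - VTH) ** alpha. The alpha default is an assumed value."""
    if vdd <= vth:
        raise ValueError("VDD must exceed VTH for the gate to switch")
    return vdd / (vdd - vth) ** alpha


# Hypothetical (VDD, VTH) pairs in volts, chosen only so that VDD/VTH shrinks.
HYPOTHETICAL_POINTS = [(5.0, 0.8), (3.3, 0.7), (2.5, 0.6), (1.8, 0.5)]


def normalized_factors(points=HYPOTHETICAL_POINTS, alpha=1.3):
    """Return (VDD/VTH, delay factor relative to the first point) pairs."""
    base = voltage_delay_factor(points[0][0], points[0][1], alpha)
    return [(vdd / vth, voltage_delay_factor(vdd, vth, alpha) / base)
            for vdd, vth in points]


if __name__ == "__main__":
    for ratio, factor in normalized_factors():
        print(f"VDD/VTH = {ratio:.2f} -> relative voltage delay factor {factor:.2f}")
```

For these assumed points the voltage term grows as VDD/VTH shrinks. In an actual technology the load capacitance also scales, so this term describes only one contribution to the delay trend.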

References

[1] M. Bohr et al., “A high-performance 0.25-µm logic technology optimized for 1.8V operation”, IEDM, pp. 847-850, 1996.

[2] R. Gonzalez et al., “Supply and threshold voltage scaling for low power CMOS”, IEEE JSSC, pp. 1210-1216, 1997.

[3] J.M. Rabaey, Digital Integrated Circuits, Prentice Hall, 1996.

[4] R. Zimmermann and W. Fichtner, “Low-Power Logic Styles: CMOS Versus Pass-Transistor Logic”, IEEE JSSC, pp. 1079-1090, July 1997.

[5] R.H. Krambeck et al., “High-speed Compact Circuits with CMOS”, IEEE JSSC, pp. 614-619, 1982.

[6] P. Srivastava et al., “Issues in the Design of Domino Logic Circuits”, Proc. of IEEE GLSVLSI, pp. 108-112, 1998.

[7] Ruchir Puri, “Design Issues in Mixed Static-Domino Circuit Implementations”, Proc. IEEE International Conf. on Computer Design, pp. 270-275, Oct. 1998.

[8] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design, Addison-Wesley Publishing Company, 1994.

[9] William R. Griffin and Lawrence G. Heller, “Cascode Voltage Switch Logic: A Differential CMOS Logic Family”, ISSCC, pp. 16-17, 1984.

[10] Wei Hwang and Fang-shi Lai, “Design and Implementation of Differential Cascode Switch with Pass-Gate (DCVSPG) Logic for High-Performance Digital Systems”, JSSC, pp. 563-573, April 1997.

[11] A. Barriga, M.J. Bellido, J.L. Huertas, A.J. Acosta and M. Valencia, “SODS: A New CMOS Differential Type Structure”, JSSC, pp. 835-838, July 1995.

[12] B. Kong, J. Choi, S. Lee and K. Lee, “Charge Recycling Differential Logic CRDL for Low Power Applications”, JSSC, pp. 1267-1276, September 1996.

[13] M. Mizuno et al., “A GHz MOS Adaptive Pipeline Technique Using MOS Current-Mode Logic”, JSSC, pp. 784-791, June 1996.

[14] M. Yamashina and H. Yamada, “MOS current mode logic MCML circuit for low-power GHz processors”, NEC Research & Development, vol. 36, n. 1, pp. 54-63, Jan 1995.

[15] T. Sakurai et al., “Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas”, IEEE JSSC, pp. 584-594, 1990.

[16] T. Hayashi et al., “Hot carrier injection in PMOSFETs”, OKI Technical Review, pp. 59-62, 1991.

[17] A. Bellaouar and M.I. Elmasry, Low-Power Digital VLSI Design Circuits and Systems, Kluwer Academic Publishers, 1995.

[18] S. Thompson et al., “Dual Threshold Voltage and Substrate Bias: Keys to High Performance, Low Power, 0.1µm Logic Designs”, IEEE Symposium on VLSI Technology Tech. Dig., pp. 69-70, 1997.

[19] S. Thompson et al., “MOS Scaling: Transistor Challenges for the 21st Century”, Intel Technology Journal, Q3, 1998.

[20] K. Yano et al., “Top-Down Pass-Transistor Logic Design”, IEEE JSSC, pp. 792-803, June 1996.

[21] M. Bohr and Y. Elmansy, “Technology for Advanced High-Performance Microprocessors”, IEEE Trans. on Electron Devices, vol. 45, pp. 620-625, 1998.

[22] Mark Bohr, “Technology development strategies for the 21st century”, Applied Surface Science, vol. 100/101, pp. 534-540, 1996.
