Effect of Technology Scaling on Digital CMOS Logic Styles
Mohamed Allam, Mohab Anis and Mohamed Elmasry*
Abstract
In this paper, the main challenges of technology scaling are
reviewed in depth. Five popular logic families, namely
Conventional CMOS, CPL, Domino, DCVS and MCML, are presented,
highlighting their advantages and drawbacks. The behavior of
each logic style in deep submicron technologies is analyzed and
predicted for future generations. To verify the qualitative
analysis, simulations were performed on the basic logic gates,
a full adder and a 16-bit Carry Look Ahead adder. The circuits
were implemented in 0.8, 0.6, 0.35 and 0.25 µm CMOS
technologies.
1 Introduction
Ever since the invention of the first integrated circuit,
device dimensions, supply voltage, threshold voltage and oxide
thickness have been scaled down at a dramatic rate over the
past three decades [1]. This scaling is considered the main
stimulus to the growth of the microelectronics industry. But as
technology scales down, phenomena such as short channel
effects, hot carriers and subthreshold leakage currents come to
dominate the functionality of CMOS logic circuits. Depending on
the application, the kind of circuit to be implemented and the
technology used, different performance aspects vary
significantly from one logic style to another. Choosing the
appropriate logic style for a certain application is becoming a
challenge in which the designer undergoes exhaustive
simulations to evaluate the various implementations.
Considerable potential for high speed and power savings exists
in the proper choice of a logic style for implementing
combinational circuits, because the parameters governing power
dissipation and performance are strongly influenced by the
chosen logic style. Power dissipation is governed by the supply
voltage, operating frequency, nodal switching activity and
device sizes. Speed, on the other hand, is affected by the
number of inversion levels, the number of devices in series,
the supply voltage, device sizes and interconnect wiring
capacitance. The circuit's robustness with respect to voltage
and device scaling, process variations and compatibility with
surrounding circuits is also affected by the logic style used.
These parameters are in turn influenced by the technology used
for implementation, so a logic style that is favorable for a
certain application in one technology is not necessarily
favorable when the technology is varied.
* M.W. Allam, M.H. Anis and M.I. Elmasry are with the VLSI Research Group, Department of Electrical and Computer Engineering, University of Waterloo, ON N2L 3G1, Canada.
A metric that is heavily influenced by technology scaling, and
that describes the efficiency of the circuit in terms of
performance and power dissipation, is the Energy-Delay Product
(EDP) [2]. In order to illustrate the influence of technology
scaling on the behavior of digital circuits, Conventional CMOS,
CPL, Domino, DCVS and CML logic styles are used to implement
the basic logic gates, full adders, and a 16-bit
Carry-Look-Ahead (CLA) adder. These circuits are implemented in
0.8, 0.6, 0.35 and 0.25 µm CMOS technologies under nominal
operating conditions, and are all optimized for minimum EDP. An
overview of the most important logic styles is first presented,
followed by how logic styles are affected by technology
scaling. Finally, simulation results are presented to verify
the qualitative analysis.
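As a rough illustration of how EDP ranks two implementations, the sketch below computes it from an average power and a propagation delay. It assumes that the energy per operation can be approximated as power times delay, so that EDP = P * t^2. The function name and all numeric values are hypothetical and are not taken from the simulations in this paper.

```python
def energy_delay_product(avg_power_w: float, delay_s: float) -> float:
    """EDP = (energy per operation) * delay, with energy approximated as P * t."""
    energy_j = avg_power_w * delay_s
    return energy_j * delay_s


# Hypothetical gates, for illustration only (not data from the paper).
slow_low_power = energy_delay_product(avg_power_w=20e-6, delay_s=100e-12)
fast_high_power = energy_delay_product(avg_power_w=60e-6, delay_s=50e-12)

print(f"slow, low-power gate : EDP = {slow_low_power:.3e} J*s")
print(f"fast, high-power gate: EDP = {fast_high_power:.3e} J*s")
```

With these made-up values the faster gate burns three times the power yet has the lower EDP, because delay enters the metric squared.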
2 Logic Styles
2.1 Conventional CMOS
Logic gates in conventional CMOS are built from an N block and
a P block. An AND-OR-Invert (AOI) CMOS gate is shown in
Figure 1(a). The N block implements a sum-of-products function
to evaluate the "0" state by creating a path from the output to
GND. The P block evaluates the "1" state of the output by
implementing a product-of-sums function to create a path from
VDD to the output node. This is equivalent to stating that the
output node is always a low-impedance node in steady state. The
N and P networks should be designed so that, whatever the value
of the inputs, one and only one of the networks is conducting
in steady state. The main drawback of CMOS circuits is the
existence of the P block, due to its low mobility. The PMOS
devices therefore have to be sized up. Furthermore, the input
capacitance of a CMOS gate is large because each input is
connected to the gates of at least one PMOS transistor and one
NMOS transistor.
This also degrades the gate's speed. However, the best gate
performance is achieved with a PMOS/NMOS width ratio of
$\sqrt{\mu_n/\mu_p}$ [3]. This ratio will eventually approach 1
in Deep-Submicron (DSM) technologies, where the carrier drift
velocities in NMOS and PMOS transistors become almost equal due
to velocity saturation. Another drawback of CMOS is the
relatively weak output driving capability due to series
transistors in the output stage.
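The effect of mobility on sizing can be sketched numerically. The snippet assumes the common first-order textbook result that, when wiring capacitance is neglected, the delay-optimal PMOS/NMOS width ratio is the square root of the NMOS-to-PMOS mobility ratio. The mobility ratios used are illustrative assumptions, not values from the paper.

```python
import math


def delay_optimal_width_ratio(mu_n: float, mu_p: float) -> float:
    """First-order delay-optimal Wp/Wn, assumed here to be sqrt(mu_n / mu_p)."""
    return math.sqrt(mu_n / mu_p)


# Hypothetical effective mobility ratios (mu_n / mu_p), for illustration only.
# As the ratio tends to 1 (velocity saturation), so does the width ratio.
for mobility_ratio in (3.0, 2.0, 1.0):
    wp_over_wn = delay_optimal_width_ratio(mobility_ratio, 1.0)
    print(f"mu_n/mu_p = {mobility_ratio:.1f} -> Wp/Wn = {wp_over_wn:.2f}")
```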
Another consequence of the large input capacitance of a CMOS
gate is high power dissipation. However, static CMOS circuits
have a smaller switching activity and short-circuit current
than the other logic styles. CMOS is also robust against
voltage and transistor scaling, and thus operates reliably at
low voltages and minimal transistor sizes. This is attributed
to the presence of a static path that restores the correct
logic state in the case of noise. Throughout this paper the
term "CMOS" will be used to mean "Conventional CMOS".
0-7803-5809-0/00/$10.00 © 2000 IEEE. IEEE 2000 CUSTOM INTEGRATED CIRCUITS CONFERENCE, p. 401
2.2 Complementary Pass Logic (CPL)
A CPL gate [4] consists of two NMOS logic networks (one for
each signal rail), two small pull-up PMOS transistors for swing
restoration, and two output inverters for the complementary
output signals. Figure 1(b) shows an AOI circuit implemented
using CPL. Unlike CMOS logic, the CPL gate creates a path from
the output node to one of the input nodes of the gate instead
of to the power lines. Because the MOS networks are connected
to variable gate inputs rather than constant power lines, only
one signal path through each network may be active at a time,
in order to avoid shorts between the inputs. Therefore, each
pass-transistor network must realize a multiplexer (MUX)
structure. All two-input functions (AND, OR and XOR) are
therefore implemented by this basic MUX structure. This is
relatively expensive for simple monotonic gates such as AND and
OR.
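The MUX-based construction described above can be sketched behaviourally. The model below is only a logic-level abstraction, assuming ideal switches and ignoring the threshold drop and swing restoration. The function names are hypothetical.

```python
def cpl_mux(sel, in0, in1):
    """Dual-rail 2:1 MUX: passes in1 when sel is true, else in0.

    Returns (out, out_bar), mirroring the two complementary CPL rails.
    """
    out = bool(in1) if sel else bool(in0)
    return out, not out


def cpl_and(a, b):
    # b selects: pass a when b = 1, otherwise pass b itself (which is 0).
    return cpl_mux(b, b, a)


def cpl_or(a, b):
    # b selects: pass b (which is 1) when b = 1, otherwise pass a.
    return cpl_mux(b, a, b)


def cpl_xor(a, b):
    # b selects: pass the complement of a when b = 1, otherwise pass a.
    return cpl_mux(b, a, not a)


for a in (False, True):
    for b in (False, True):
        print(a, b, cpl_and(a, b)[0], cpl_or(a, b)[0], cpl_xor(a, b)[0])
```

All three functions cost one MUX per rail, which is why the structure is cheap for XOR but comparatively expensive for AND and OR.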
In most cases, CPL uses fewer and smaller transistors,
especially in XOR and MUX based functions. CPL gates also offer
small input loads, good output driving capability due to the
output inverters, and a fast differential stage due to the
cross-coupled PMOS pull-up transistors. However, most CPL gates
require all the inputs and their complements, which increases
the routing complexity and overhead, and ultimately increases
power and delay. Since the CPL gate is constructed mainly from
N transistors, the output voltage swing will be lower than the
input swing by the NMOS threshold voltage VTHN. This could
cause DC current to flow through the inverter. A swing
restoring circuit should therefore be added after every two or
three cascaded gates to restore the full output swing. This in
turn adds to the power of the circuit. The layout of
pass-transistor cells is not as straightforward and efficient
as CMOS because of the rather irregular transistor arrangements
and the high wiring requirements of the double rails.
2.3 Domino Logic
The AOI structure of a domino logic gate [5] is shown in
Figure 1(c). It is a non-inverting structure and consists of a
dynamic gate stage, a static CMOS inverter, which provides the
circuit's output, and a PMOS keeper transistor, which restores
the logic at the Domino output node. The dynamic gate stage
consists of an NMOS transistor network, which implements the
required function, and two transistors (NMOS and PMOS) to which
the clock signal is applied and which synchronize the operation
of the circuit. The CMOS inverter is included for the proper
operation of a chain of domino gates, and to increase the
driving capability of the gate. The keeper transistor restores
the logic and gives the domino gate immunity against charge
sharing and charge loss [6].
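A minimal behavioural sketch of the precharge/evaluate operation described above, assuming an ideal clocked stage whose inverter output computes AB + CD. The class and method names are hypothetical, and the keeper, charge sharing and timing are not modelled.

```python
class DominoAOI:
    """Logic-level model of a domino stage: dynamic node plus output inverter."""

    def __init__(self):
        self.dynamic_node = True  # precharged high

    def clock(self, clk, a, b, c, d):
        if not clk:
            # Precharge phase: the clocked PMOS pulls the dynamic node high.
            self.dynamic_node = True
        elif (a and b) or (c and d):
            # Evaluate phase: the NMOS network conducts and discharges the node.
            # Once discharged, the node stays low until the next precharge.
            self.dynamic_node = False
        # The static inverter makes the overall gate non-inverting.
        return not self.dynamic_node


gate = DominoAOI()
gate.clock(False, True, True, False, False)        # precharge
print(gate.clock(True, True, True, False, False))  # evaluate with A = B = 1
```

Because a discharged node cannot recover within the same evaluate phase, the inputs may only make rising transitions during evaluation, which is why only non-inverting stages can be chained directly.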
Any number of logic stages can be cascaded, provided that the
sequence can evaluate within the evaluate clock phase. The
input signal to a domino gate must therefore satisfy some setup
and hold timing constraints for correct operation of the
gate [7].
Domino logic has a low transistor count and input capacitance,
which enhances its speed. Furthermore, since the logic block is
constructed only from high-mobility N transistors, the
evaluation is fast. However, Domino logic consumes large power.
This is attributed to its high switching activity, because all
the output nodes are precharged to VDD each clock cycle, as
well as to the large clock load switching at full rate.
Domino logic is very susceptible to noise. A voltage at the
input as low as VTH could turn on the NMOS pull-down
transistor, and the output will eventually reach GND. This
translates to a noise margin (NM) of VTH, which is quite low
compared to static versions. Some subthreshold leakage current
can flow through the NMOS even when the input is "0". This
effect becomes more pronounced when the input is not completely
"0", but approaches VTH in the presence of noise, causing the
N-devices to turn ON. To compensate for the low noise margins,
the size of the PMOS keeper must increase, in turn increasing
the contention current during evaluation and consequently
reducing the gate's performance. This is the typical
Speed-Noise Margin trade-off in Domino logic circuits. Another
problem of Domino circuits is that only noninverting logic can
be implemented. This is a problem in the implementation of XOR
gates and full adders (FA). A Domino style which overcomes this
problem is the NP-Domino [8]. NP-Domino was used to realize the
simulated XORs and FAs in this work.
2.4 Differential Cascode Voltage Switch (DCVS)
Static and dynamic DCVS logic were first proposed by Heller et
al. as a high performance logic family [9]. The static version
suffered from major drawbacks: (1) high dynamic power,
(2) limited driving capability and (3) complex design. The
dynamic version, on the other hand, experienced speed-noise
margin trade-offs similar to Domino logic. The dynamic DCVSL
(DDCVSL) was therefore proposed.
Figure 1(d) presents the architecture of an AOI gate
implemented in DDCVSL logic. During the precharge phase
(CLK=0), both keeper transistors Q1,2 will be OFF. Unlike
domino logic, the keeper transistors will still be OFF at the
start of the evaluation phase (CLK=1), which reduces the power
and delay caused by contention. One branch implements the
required function, while the other branch implements its
inverse. DDCVSL is considered a general purpose logic style
because it may be used to implement both inverting and
non-inverting logic circuits. DCVS is more area efficient in
implementing complex logic gates. Most complex logic functions
may be implemented using one gate only, which makes DCVS logic
much faster than CMOS circuits. It is also suitable for
implementing gates with XOR functionality, such as arithmetic
circuits and MUX style logic gates. Over the past fifteen
years, many flavors of Cascode Voltage Switch Logic (CVSL) were
introduced. The Differential Cascode Voltage Switch with Pass
Logic family (DCVSPG) uses pass logic to implement the logic
function of each branch [10]. It avoids the problem of the
floating output node that exists in DCVS logic. The Switched
Output Differential Structure (SODS) replaces the PMOS latch
with a clocked latch to avoid the contention [11]. Charge
Recycling Differential Logic (CRDL) reduces power dissipation
by shorting the output nodes before each evaluation phase [12].
2.5 MOS Current Mode Logic (MCML)
Figure 2(a) shows the architecture of an MCML inverter/buffer.
Transistor Q1 acts as a DC current source controlled by Vref.
Resistors R1 and R2 are pull-up resistors. The logic function
is implemented by the logic block connected between the
resistors and the current source. For an inverter/buffer, the
logic block is the differential pair constructed by transistors
Q2 and Q3. The operation of the CML is based on the
differential pair circuit. Each differential input variable is
connected to a differential pair circuit. The value of the
input variable controls the flow of current through the two
branches. For example, if VGS(Q2) is higher than VGS(Q3), the
current passing through Q2 will be higher than that passing
through Q3. Therefore, the voltage of node N1 will start to
drop until reaching a steady state where the current going
through the resistor R1 matches the current going through
transistor Q2. The amount of current going through the ON
branch (Q2 in the previous case) controls the discharge delay
of the logic gate, while the load resistor controls the
charging of the output nodes. The output voltage swing VS is
defined as the voltage difference between N1 and N2. The small
output swing of MCML circuits reduces crosstalk between
adjacent signals. The constant current source reduces the
switching noise and supply fluctuations. For those reasons,
MCML is recommended for mixed signal design to reduce the
interference between the digital and analog blocks [13], [14].
The reduced output swing also reduces the dynamic power
dissipation for long busses. Therefore, MCML may be used in the
implementation of bus transceivers. Another important feature
of CML circuits is their noise immunity due to their
differential nature, which is recommended at high operating
frequencies.
Figure 1: Full Swing Logic Styles. (a) CMOS, (b) CPL, (c) Domino, (d) DCVS
However, MCML has some major drawbacks which limit its use in
digital systems. First is the static power dissipation due to
the constant current source, which is independent of the
operating frequency. Therefore, MCML is preferred in high
frequency applications only, where the overhead of its static
biasing power is reduced. MCML circuits are not suitable for
power-down mode because of the DC current source. MCML circuits
also require special fabrication technologies to implement the
large load resistors in a reasonable area, which increases the
cost and area of the chip. A reference voltage distribution
tree has to be included in the design to distribute Vref,
leading to larger chip area and more complex routing. Finally,
the matching of the rise and fall delays is not an easy task
because of its dependence on the load of each gate.
Figure 2: MCML. (a) Inverter, (b) AOI
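The frequency argument for MCML can be made concrete with a back-of-the-envelope comparison. The sketch assumes the usual first-order expressions, static MCML power P = VDD * Ibias and CMOS dynamic power P = a * C * VDD^2 * f, uses the same supply for both, and ignores leakage and short-circuit currents. Every numeric value is a hypothetical placeholder.

```python
def mcml_static_power(vdd, i_bias):
    """Constant-current gate: power does not depend on frequency."""
    return vdd * i_bias


def cmos_dynamic_power(vdd, c_load, freq, activity=1.0):
    """First-order switching power of a full-swing CMOS gate."""
    return activity * c_load * vdd ** 2 * freq


def crossover_frequency(vdd, i_bias, c_load, activity=1.0):
    """Frequency above which the CMOS gate dissipates more than the MCML gate."""
    return i_bias / (activity * c_load * vdd)


# Hypothetical operating point, for illustration only.
VDD, I_BIAS, C_LOAD, ACTIVITY = 2.5, 100e-6, 50e-15, 0.5
f_cross = crossover_frequency(VDD, I_BIAS, C_LOAD, ACTIVITY)
print(f"MCML power       : {mcml_static_power(VDD, I_BIAS) * 1e6:.1f} uW")
print(f"crossover at about {f_cross / 1e9:.2f} GHz")
```

Below the crossover the static bias current is pure overhead, which is the sense in which MCML only pays off at high operating frequencies.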
3 Effect of Technology on Logic Styles
3.1 Velocity Saturation and Mobility Degradation
In order to evaluate the output logic of a gate implemented in
some logic style, a series of charging and discharging
processes occur at the output node (at which the logic is
determined). As the input of a logic gate changes, it causes
the output node(s) to either charge or discharge. This is true
for logic styles consisting of an N logic block. A static CMOS
inverter is a simple example: its delay is the time taken for
the output node to fully charge or discharge.
For full swing logic styles, the NMOS will go through all the
operating phases (cut-off, saturation and linear modes) while
discharging the output node. The transistor is initially in the
cut-off mode, when the input is "0". As the input increases,
the NMOS operates in two regions: saturation and linear. The
NMOS will first operate in saturation, where the drain current
IDS is large ($I_{DS} \propto (V_{GS} - V_{TH})^{\alpha}$),
which discharges the O/P node quickly. $\alpha$ is the velocity
saturation index [15], which takes a value of 2 for
long-channel devices, and around 1.3 for short-channel devices.
The NMOS will operate along a constant VGS curve in the
saturation region of the typical IDS/VDS characteristics plot.
When the output node reaches VDD - VTHN, the NMOS moves from
the saturation to the linear region. IDS in the linear region
is less than in the saturation region for the same VGS [15],
which causes the discharge to slow down.
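To see what the velocity saturation index does to the drive current, the sketch below evaluates the proportionality IDS ~ (VGS - VTH)^alpha for the two values of alpha quoted above. The scale factor k and the voltages are hypothetical, chosen only to show the trend.

```python
def saturation_current(v_gs, v_th, alpha, k=1.0):
    """Alpha-power-law saturation current, up to the scale factor k."""
    overdrive = max(v_gs - v_th, 0.0)  # no current below threshold in this model
    return k * overdrive ** alpha


# Hypothetical voltages: halve the gate overdrive from 1.0 V to 0.5 V.
V_TH = 0.5
for alpha in (2.0, 1.3):
    high = saturation_current(1.5, V_TH, alpha)
    low = saturation_current(1.0, V_TH, alpha)
    print(f"alpha = {alpha}: current falls to {low / high:.2f} of its value")
```

A smaller alpha makes the saturation current less sensitive to the gate overdrive, which is the short-channel behavior referred to in the text.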
The slowest transition, however, is from cut-off to saturation,
because all the charge stored in the depletion region of the
NMOS device has to sink before the channel is formed between
the drain and the source. MCML is therefore faster than other
logic styles (refer to Figure 2(a)). This is because Q2 and Q3
are never totally OFF, and only experience transitions from the
saturation to the linear region and vice versa, which take a
short time.
The speed advantage of CML over other logic styles will start
to fade as we move deeper into the DSM regime, where saturation
currents are reduced compared to the linear currents and no
longer follow the long channel behavior ($\alpha$ approaches
1). Not only will the carrier velocity tend to saturate as the
channel length is scaled down, but the device's mobility will
start to degrade as well. Figures 3(a) and 3(b) show the
velocity saturation and mobility degradation of the electron,
respectively.
Figure 3: Velocity Saturation & Mobility Degradation. (a) Velocity Saturation, (b) Mobility Degradation
In NMOS, the saturation velocity is reached at a lower critical
electric field compared to PMOS. This indicates that $\mu_n$ is
degraded at a much faster rate than $\mu_p$ [16]. Eventually, a
point is reached where both NMOS and PMOS have comparable
mobilities and switching speeds. This is particularly important
for the implementation of CMOS structures, for two reasons.
Firstly, CMOS suffers from degraded performance because of the
low mobility PMOS transistors. This speed disadvantage will
gradually decrease as the technology scales down and $\mu_n$
approaches $\mu_p$. This enhances the performance of CMOS in
terms of delay, power and area. Secondly, the optimum noise
margin in CMOS is achieved when $\mu_p$ equals $\mu_n$ [17].
With $\mu_p = \mu_n$, the CMOS noise margin is enhanced, and
equal driving capability is achieved, which keeps the
short-circuit current within bounds [3]. Thus CMOS performance
and robustness are both enhanced relative to other styles as
technology scales down.
3.2 Hot carrier effect (HCE)
Another phenomenon that takes place as the technology is scaled
down is the hot carrier effect (HCE) [16]. Scaling down the
gate oxide thickness TOX at a higher rate than the supply
voltage causes the electric field across the gate to increase,
which increases the electron velocity. Electrons leave the
silicon and tunnel into the gate oxide upon reaching
sufficiently high energy levels. Electrons trapped in the oxide
change VTH, typically increasing VTH of NMOS devices (VTHN)
while decreasing VTH of PMOS devices. MCML may have some
trouble with the VTH variation caused by the HCE, because the
devices have to be matched for correct functionality. HCE is
another reason that makes low voltage operation favorable.
Logic families that can work at a lower supply voltage, like
MCML (with no degradation in functionality), will be preferred
because this will reduce the HCE and the punch-through
phenomenon, leading to better reliability and lifetime.
Logic styles that can tolerate minor changes in VTH will gain
more importance because the HCE and electromigration tend to
increase VTHN over time. For Domino and DCVS logic, this
translates into a small variation in delay and a better noise
margin. On the other hand, the higher VTHN may cause MCML to
cease functioning. This is attributed to the fact that
increasing VTHN would decrease the discharge current, limiting
the voltage swing VS. When VS is small, it might cause the
following CML stages to malfunction. Circuits implemented using
CPL also have degraded performance when affected by HCE, as a
larger voltage drop (VTHN) is produced across the pass
transistor. The pass transistor and output inverter will
therefore have lower switching speeds, because the current is
reduced. Short-circuit currents also flow, adding to the CPL's
power dissipation.
3.3 Leakage currents
The performance of dynamic styles, particularly Domino, will
degrade in DSM technologies. As explained in section 2.3,
Domino logic is particularly susceptible to noise, due to the
effect of leakage currents. Leakage currents become more
pronounced as we move deeper into the DSM regime. This
deteriorates the gate's noise margin. To compensate for the low
noise margins, the size of the PMOS keeper must increase, in
turn increasing the contention current during evaluation, as
well as the loading of the O/P node. This reduces the gate's
performance.
The rate of improvement in Domino's performance will therefore
gradually decrease as we go deeper into DSM technologies. This
is another reason that the performance of CMOS circuits is
expected to approach that of dynamic logic gates, without
compromising noise margins. Figure 4 [18] plots the optimal VTH
versus process technology for the static and dynamic cases. It
is clear that the optimal VTH values used in static and dynamic
circuits diverge. Static circuits need lower VTH to maintain
gate drive with lower VDD, while in dynamic circuits it becomes
difficult to scale VTH due to noise limits.
Figure 4: Optimal threshold voltage for static and dynamic circuits versus technology (x-axis: Technology (µm), 1 to 0.15)
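The sensitivity of leakage to VTH can be illustrated with the standard first-order subthreshold model, in which the off current falls by one decade for every S volts of threshold voltage (S being the subthreshold swing). This model is an assumption brought in for illustration. The values of I0 and S below are hypothetical.

```python
def subthreshold_leakage(v_th, i0=1e-6, swing=0.085):
    """Off-state current at VGS = 0: I = I0 * 10 ** (-VTH / S).

    i0 (amperes) and swing (volts per decade) are hypothetical placeholders.
    """
    return i0 * 10 ** (-v_th / swing)


# Lowering VTH by one subthreshold swing raises the leakage tenfold.
for v_th in (0.470, 0.385, 0.300):
    print(f"VTH = {v_th:.3f} V -> leakage = {subthreshold_leakage(v_th):.2e} A")
```

This exponential dependence is why dynamic nodes, which must hold their charge against leakage, cannot follow static circuits to lower VTH.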
3.4 The Drain-Induced Barrier Lowering (DIBL)
DIBL causes VTH to be a function of the operating voltage. VTH
decreases with Leff for short-channel devices, and an increase
in the drain-source voltage VDS causes VTH to decrease further.
This effect is called DIBL. It is a problem especially for
dynamic circuits, where it reduces the noise margin, and
particularly for Domino logic implementations. As mentioned
previously, maintaining a sufficient noise margin comes at the
expense of reduced performance.
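A common first-order way to express this dependence, assumed here rather than taken from the paper, is a linear model VTH = VTH0 - eta * VDS, where eta is a DIBL coefficient. The sketch below uses hypothetical values to show how the Domino noise margin, taken as roughly VTH in section 2.3, shrinks with drain bias.

```python
def effective_vth(v_th0, v_ds, dibl_coeff):
    """Linear DIBL model: threshold voltage lowered in proportion to VDS."""
    return v_th0 - dibl_coeff * v_ds


# Hypothetical device: VTH0 = 0.5 V at zero drain bias, eta = 40 mV/V.
V_TH0, ETA = 0.5, 0.04
for v_ds in (0.0, 1.0, 2.5):
    v_th = effective_vth(V_TH0, v_ds, ETA)
    print(f"VDS = {v_ds:.1f} V -> VTH (and Domino NM) of about {v_th:.2f} V")
```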
3.5 Scaling down VDD/VTH ratio
VDD is scaled down at a relatively faster rate than VTH, as
shown in Figure 5. This is attributed to reliability
restrictions that limit the electric field applied to the
gate.
Hence, the ratio VDD/VTH drops with technology scaling until it
reaches a minimum value of 3 at a feature size of 0.07 µm. This
again explains the performance and power degradation of CPL
logic styles. To further illustrate this, a section of the CPL
circuit is shown in Figure 6.
Figure 5: Scaling trend for VDD and VTH (x-axis: Technology (µm))
Figure 6: Section of a gate implemented using CPL
The voltage at the output of the driver circuit is VDD, while
the pass transistor is initially OFF. As transistor Q1 turns
ON, it will start operating in the saturation mode, where its
current $I_1 \propto (V_{GS1} - V_{THN})^{\alpha}$. In the case
of CPL, $V_{GS1} = V_{DD} - V_{THN}$ (due to the VTHN drop),
thus $I_1 \propto (V_{DD} - 2V_{THN})^{\alpha}$. If VDD were to
take the worst case value of 3VTHN [19], then
$I_1 \propto V_{THN}^{\alpha}$. I1 is thus significantly
reduced, and the switching speed of Q1 is largely degraded.
Furthermore, this will increase the short-circuit current
flowing from VDD to GND in the inverter. A further speed
degradation arises as Q2 passes through the saturation and then
linear phases while discharging the O/P node. Q2 starts
discharging in the saturation mode when VGS2 reaches VTHN. Thus
$I_2 \propto (V_{GS2} - V_{THN})^{\alpha}$, and VGS2 is
initially at VTHN to discharge the O/P node. This produces a
very small discharging current, hence a large time delay. This
goes on until the output node goes down to VDD - VTH, where
the keeper turns ON, pulling up the internal node to VDD, and
hence accelerating the discharge process. This provides an
additional delay constraint to CPL. Another problem associated
with decreasing the VDD/VTH ratio is the reduction in gate
robustness, because the noise margins will dwindle. CPL is also
sensitive to voltage and device scaling [20], which again
influences the gate's performance, power consumption and
robustness.
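The worst-case argument above can be checked numerically. The sketch compares the CPL pass-transistor drive, proportional to (VDD - 2VTHN)^alpha, against a full-swing gate drive, proportional to (VDD - VTHN)^alpha, as a function of the VDD/VTHN ratio. Comparing against a full-swing drive is an assumption made here for illustration, and the function name is hypothetical.

```python
def cpl_to_full_swing_drive_ratio(vdd_over_vth, alpha):
    """Ratio (VDD - 2*VTHN)**alpha / (VDD - VTHN)**alpha, with VDD in units of VTHN."""
    r = vdd_over_vth
    if r <= 2.0:
        return 0.0  # the degraded gate voltage no longer exceeds the threshold
    return ((r - 2.0) / (r - 1.0)) ** alpha


# Sweep the supply-to-threshold ratio down to the worst case of 3 quoted above.
for r in (6.0, 4.0, 3.0):
    for alpha in (2.0, 1.3):
        ratio = cpl_to_full_swing_drive_ratio(r, alpha)
        print(f"VDD/VTHN = {r:.0f}, alpha = {alpha}: relative drive = {ratio:.2f}")
```

At VDD = 3VTHN the overdrive is halved, so the relative drive is 0.5^alpha, and it collapses to zero as the ratio approaches 2.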
3.6 Scaling of Interconnects
CPL has a complex structure and a high wiring overhead due to
the dual-rail signals. The wiring (interconnect) capacitance is
high, causing the power and delay to grow as well. This becomes
worse in the DSM regime, where the RC delay of the
interconnects occupies a large fraction of the clock cycle
time, reaching over 30% in the 0.25 µm technology [21], as
shown in Figure 7. This is another reason for the degradation
in CPL's performance. Complex structures implemented with DCVS
also suffer from interconnect scaling.
Figure 7: Trend of the ratio of the interconnect RC delay and the clock cycle (x-axis: Technology (µm); y-axis: %)
4 Area Considerations
4.1 Technology Scaling and Area
Metal interconnects are needed to connect transistors, route
signals and supply power across the integrated circuit chip. As
technology scales down, transistor feature sizes scale down
linearly, while this is not true for metal wire interconnects
due to physical limitations on the metal deposition. The
interconnect pitch (metal width + space) is decreasing to
exploit integration. However, the interconnect length is kept
constant because more transistors are used per circuit. This
leads to an increase in parasitic capacitances and line
resistance, which degrades the chip's performance. Higher power
is also dissipated per unit area, which consequently raises the
chip's temperature.
In older technologies, poly layers were used for routing
because of their reasonable resistance. This is not the case in
DSM technologies, where the impedance of the poly layer grows
and makes it unsuitable for long interconnects. Such
limitations lead to the use of extra vias and metal wires in
routing, which adds overhead. Copper interconnects are used in
particular to reduce the interconnect area, since the physical
limitations on copper size are more relaxed. Copper also has
lower resistivity, allowing wires to have small widths and thus
lower interconnect delays. However, many problems are
associated with the use of copper wiring, which makes it an
expensive alternative [22]. Using a larger number of metal
layers and stacked vias is a technique for improving
interconnect density without reducing pitch. For DSM devices,
six or more levels of metal are used. Older technologies used
only two or three levels of metal.
Finally, the interconnect height is scaled at a slower rate
than its width. This increases the wire's aspect ratio, and
consequently reduces the wire resistance. This, however,
increases line coupling, which causes crosstalk, increased
power dissipation, and degradation in performance.
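The trade-off in the last paragraph can be sketched with two first-order formulas, assumed here for illustration: wire resistance R = rho * L / (W * H), and a parallel-plate estimate of the sidewall coupling capacitance C = eps * H * L / S that ignores fringing fields. All dimensions and material constants below are hypothetical round numbers.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def wire_resistance(rho, length, width, height):
    """Resistance of a rectangular wire: rho * L / (W * H)."""
    return rho * length / (width * height)


def sidewall_coupling_cap(eps_r, length, height, spacing):
    """Parallel-plate estimate of the capacitance to one neighbouring wire."""
    return EPS0 * eps_r * height * length / spacing


# Hypothetical 1 mm wire, 0.35 um wide and spaced, in a dielectric with eps_r = 3.9.
RHO, LENGTH, WIDTH, SPACING, EPS_R = 2.7e-8, 1e-3, 0.35e-6, 0.35e-6, 3.9
for height in (0.35e-6, 0.70e-6):  # aspect ratio 1, then 2
    r = wire_resistance(RHO, LENGTH, WIDTH, height)
    c = sidewall_coupling_cap(EPS_R, LENGTH, height, SPACING)
    print(f"H = {height * 1e6:.2f} um: R = {r:.0f} ohm, coupling C = {c * 1e15:.0f} fF")
```

In this simple model doubling the height halves the resistance but doubles the coupling to each neighbour, which is the source of the crosstalk mentioned above.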
4.2 Logic Style and Area
The choice of logic style affects the area in two ways: cell
area and routing area. Cell area is a function of the number
and size of the devices. It also depends on the complexity of
the logic cell, since complex gates require more area for
connecting the devices of the gate. Generally, the differential
logic styles CPL, DCVS and CML are area efficient in
implementing arithmetic circuits and XOR based logic systems.
For simple gates such as AND and OR, the single ended logic
styles CMOS and Domino are preferred. Input signals are
connected to transistor gates only, which facilitates the usage
and characterization of logic cells. The layout of CMOS gates
is straightforward and efficient due to the complementary
transistor pairs. Routing area is the wire interconnect area
for connecting the gates together. Differential logic styles
have twice the number of inputs and outputs compared to single
ended logic families, leading to larger interconnect areas. As
a rule of thumb, differential logic should be used only for
complex gates, especially XOR gates, where it will reduce the
total number of logic gates.
5 Results and Analysis
The performance of the logic gates in terms of power and delay
is divided into two groups. The first includes the NAND, NOR
and AOI gates (Group I). The second group includes the MUX, XOR
and FA (Group II). Group I gates are usually implemented using
single ended structures. Generally, CMOS is the most efficient
style to implement Group I. Its low power consumption and
relatively good delay contribute to its low EDP values. The
three dynamic styles follow CMOS in terms of minimum EDP. Of
these, CML is the most efficient due to its high speed and
limited power. It is followed by DCVS and then Domino logic.
Domino, though, proves to be the fastest for the NOR gate, but
consumes a large amount of power. The high dynamic power
associated with dynamic circuits is partly attributed to their
high switching activity. CPL is considered the least efficient
logic style to implement Group I gates. This is attributed to
its exceptionally long delay and considerably high power,
proving that AND and OR gates are the least efficient gates
that could be realized by CPL.
As for the complex structures in Group II, logic styles having
inverted signals and dual rails are usually used to implement
these functions efficiently. CML and DCVS are the most
efficient styles to implement Group II gates. This is
attributed to their differential nature, inverted signal
structures, sufficient speed, and tolerable power dissipation.
Despite NP-Domino's high speed, its large power degrades its
EDP value when implementing XOR and FA. Both static styles,
CMOS and CPL, implement Group II gates inefficiently. However,
MUXes are best realized using CPL, while CML tops other styles
in implementing XOR and FA gates. XOR and MUX are considered
the least efficient gates that could be realized using the CMOS
implementation because they require inverted signals as
inputs.
Figures 8, 9 and 10 present the average normalized delay, power
and EDP of Group I gates, while Figures 11, 12 and 13 present
the average normalized delay, power and EDP of Group II gates.
In Figure 8 it is clear that the speed enhancement for the
logic styles decreases as technology scales down. CMOS,
however, has the best enhancement. In Figure 13, CPL had the
best EDP values in the 0.8 µm technology, but gradually
experiences a relative increase in EDP as we move deeper into
the DSM regime. It is also worth noting that CML had high EDP
values in the 0.8 µm technology (Figures 10 and 13), but
achieves low EDPs as technology is scaled down. This is
consistent with [14], because MCML works efficiently in
power down technologies. Finally, all six graphs verify that
both the delay and power of CMOS gates are relatively enhanced
in DSM technologies.
Figure 8: Average Normalized Delay for Group I (x-axis: Technology (µm), 0.25 to 0.8)
Figure 9: Average Normalized Power/MHz for Group I (x-axis: Technology (µm), 0.25 to 0.8)
Figure 10: Average Normalized EDP for Group I (x-axis: Technology (µm), 0.25 to 0.8)
Figure 11: Average Normalized Delay for Group II (x-axis: Technology (µm), 0.25 to 0.8)
Figure 12: Average Normalized Power/MHz for Group II (x-axis: Technology (µm), 0.25 to 0.8)
Table 1 shows the results of the CLA adder. Conventional
CMOS proves to have the worst delay, while attaining
a roughly average power dissipation value. Conventional
CMOS is therefore the least efficient way to implement the
CLA adder. Domino logic comes as the second worst
implementation because of its single-ended structure. All the
differential structures have the best EDP when implementing
the CLA adder. This is because of the numerous AOI and
XOR structures that are used to build the CLA adder. It
should be noted that the CPL CLA adder was implemented
with single branch structures. This is the main reason for
CPL's limited power consumption.
Table 1: CLA Comparison (columns are technology nodes in µm)

Logic    Power (Norm.)              Delay (Norm.)              EDP (Norm.)
Style    0.25  0.35  0.6   0.8      0.25  0.35  0.6   0.8      0.25  0.35  0.6   0.8
CMOS     1     2.12  5.82  26.6     1     1.65  3.1   3.62     1     5.78  56    348
CPL      1.23  1.43  2.49  14.8     0.62  1.58  1.95  3.16     0.48  3.57  9.4   148
Domino   1.33  7.86  11.6  50.5     0.67  0.81  1.62  1.75     0.59  5.2   30.7  154
DCVS     1.57  2.96  3.9   14.6     0.74  0.91  1.17  1.54     0.85  2.46  5.3   34.2
CML      1.96  3.31  4.22  21.5     0.6   0.81  1.15  1.88     0.71  2.17  5.58  75.4

Figure 13: Average Normalized EDP for Group II (x-axis: Technology (µm), 0.25 to 0.8)
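As a cross-check on Table 1, the Python sketch below re-enters the tabulated values, tests whether each EDP entry matches normalized power times the square of normalized delay, and ranks the styles by EDP at each technology node. The relation EDP = power x delay^2 is an assumption inferred from the numbers (energy taken as power times delay), not a formula stated in this section. The names `TABLE1`, `edp_from_power_delay`, `max_relative_error` and `rank_by_edp` are illustrative.

```python
# Table 1 values re-entered by hand. Each style maps to three tuples:
# (power, delay, edp), all normalized, ordered by technology node.
NODES_UM = (0.25, 0.35, 0.6, 0.8)

TABLE1 = {
    "CMOS":   ((1.00, 2.12, 5.82, 26.6), (1.00, 1.65, 3.10, 3.62), (1.00, 5.78, 56.0, 348.0)),
    "CPL":    ((1.23, 1.43, 2.49, 14.8), (0.62, 1.58, 1.95, 3.16), (0.48, 3.57, 9.40, 148.0)),
    "Domino": ((1.33, 7.86, 11.6, 50.5), (0.67, 0.81, 1.62, 1.75), (0.59, 5.20, 30.7, 154.0)),
    "DCVS":   ((1.57, 2.96, 3.90, 14.6), (0.74, 0.91, 1.17, 1.54), (0.85, 2.46, 5.30, 34.2)),
    "CML":    ((1.96, 3.31, 4.22, 21.5), (0.60, 0.81, 1.15, 1.88), (0.71, 2.17, 5.58, 75.4)),
}


def edp_from_power_delay(power, delay):
    """Assumed relation: energy = power * delay, so EDP = power * delay**2."""
    return power * delay ** 2


def max_relative_error(table):
    """Largest relative gap between the tabulated and the recomputed EDP."""
    worst = 0.0
    for power, delay, edp in table.values():
        for p, d, e in zip(power, delay, edp):
            worst = max(worst, abs(edp_from_power_delay(p, d) - e) / e)
    return worst


def rank_by_edp(table, node_index):
    """Style names sorted from lowest (best) to highest EDP at one node."""
    return sorted(table, key=lambda style: table[style][2][node_index])


if __name__ == "__main__":
    print(f"max relative error: {max_relative_error(TABLE1):.3f}")
    for i, node in enumerate(NODES_UM):
        print(f"{node} um:", " < ".join(rank_by_edp(TABLE1, i)))
```

On these values the recomputed EDP stays within 2% of every tabulated entry, which is compatible with rounding in the table, and conventional CMOS has the highest EDP at all four nodes.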
6 Conclusions
As technology scales down, CMOS is the least affected logic
style. Its performance and robustness are enhanced compared
to other logic styles. Domino's performance and power will
deteriorate because of the leakage currents and contention
caused by the keeper transistor, while DCVS will also suffer
from leakage power, but doesn't have any contention problems
during evaluation. Because interconnects are not scaled
linearly with technology, the percentage of power consumed
in the clock tree will grow. CPL performance degrades much
faster than other logic styles because of the reduction of the
ratio VDD/VTH with technology scaling. The hot carrier effect
makes it even worse by increasing VTH over the long term.
CPL area will tend to grow, with more power dissipation for
the larger area and the complex routing. Although CML tops
the logic styles in many circuit implementations in terms of
minimum EDP, it is not yet very widely used. This is
attributed to the fact that CML cannot be used as standard cells,
because the RC delay of each gate varies according to
its fan-in and fan-out. MCML may also have some trouble
with VTH variations caused by the hot carrier effects. But if
MCML is used at a lower supply voltage, the effect of the hot
carrier will be less significant.
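The remark about CPL and the VDD/VTH ratio can be illustrated with a first-order sketch. It assumes the usual NMOS pass-transistor behavior, in which a logic high is passed only up to about VDD - VTH (body effect ignored). The supply and threshold pairs are hypothetical round numbers chosen to show the trend, not values from this paper, and `high_level_fraction` and `swing_table` are illustrative helpers.

```python
def high_level_fraction(vdd, vth):
    """Fraction of VDD reached by a logic high passed through an NMOS
    pass transistor, using the first-order estimate V_high = VDD - VTH."""
    if vdd <= 0 or not 0 <= vth < vdd:
        raise ValueError("expected vdd > 0 and 0 <= vth < vdd")
    return (vdd - vth) / vdd


# Hypothetical (VDD, VTH) pairs in volts; assumptions, not measured data.
ASSUMED_POINTS = [(5.0, 0.8), (3.3, 0.7), (2.5, 0.6), (1.8, 0.5)]


def swing_table(points):
    """Return (VDD/VTH ratio, high-level fraction) for each (VDD, VTH) pair."""
    return [(vdd / vth, high_level_fraction(vdd, vth)) for vdd, vth in points]


if __name__ == "__main__":
    for ratio, frac in swing_table(ASSUMED_POINTS):
        print(f"VDD/VTH = {ratio:.2f} -> passed high = {frac:.0%} of VDD")
```

Since the fraction equals 1 - VTH/VDD, it shrinks whenever VDD/VTH shrinks, so under this simple model the degraded high level takes a growing share of the available swing as the supply is scaled faster than the threshold.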
References
[1] M. Bohr et al., "A high-performance 0.25-µm logic technology
optimized for 1.8V operation", IEDM, pp. 847-850, 1996.
[2] R. Gonzalez et al., "Supply and threshold voltage scaling for
low power CMOS", IEEE JSSC, pp. 1210-1216, 1997.
[3] J. M. Rabaey, Digital Integrated Circuits, Prentice Hall, 1996.
[4] R. Zimmermann and W. Fichtner, "Low-Power Logic Styles:
CMOS Versus Pass-Transistor Logic", IEEE JSSC, pp. 1079-1090,
July 1997.
[5] R. H. Krambeck et al., "High-speed Compact Circuits with
CMOS", IEEE JSSC, pp. 614-619, 1982.
[6] P. Srivastava et al., "Issues in the Design of Domino Logic
Circuits", Proc. of IEEE GLSVLSI, pp. 108-112, 1998.
[7] Ruchir Puri, "Design Issues in Mixed Static-Domino Circuit
Implementations", Proc. IEEE International Conf. on Computer
Design, pp. 270-275, Oct. 1998.
[8] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design,
Addison-Wesley Publishing Company, 1994.
[9] William R. Griffin and Lawrence G. Heller, "Cascode Voltage
Switch Logic: A Differential CMOS Logic Family", ISSCC,
pp. 16-17, 1984.
[10] Wei Hwang and Fang-shi Lai, "Design and Implementation of
Differential Cascode Switch with Pass-Gate (DCVSPG) Logic for
High-Performance Digital Systems", JSSC, pp. 563-573, April
1997.
[11] A. J. Acosta, M. Valencia, A. Barriga, M. J. Bellido and
J. L. Huertas, "SODS: A New CMOS Differential Type Structure",
JSSC, pp. 835-838, July 1995.
[12] B. Kong, J. Choi, S. Lee and K. Lee, "Charge Recycling
Differential Logic CRDL for Low Power Applications", JSSC,
pp. 1267-1276, September 1996.
[13] M. Mizuno et al., "A GHz MOS Adaptive Pipeline Technique
Using MOS Current-Mode Logic", JSSC, pp. 784-791, June
1996.
[14] M. Yamashina and H. Yamada, "MOS current mode logic
MCML circuit for low-power GHz processors", NEC Research
& Development, vol. 36, n. 1, pp. 54-63, Jan 1995.
[15] T. Sakurai et al., "Alpha-power law MOSFET model and its
applications to CMOS inverter delay and other formulas",
IEEE JSSC, pp. 584-594, 1990.
[16] T. Hayashi et al., "Hot carrier injection in PMOSFETs",
OKI Technical Review, pp. 59-62, 1991.
[17] A. Bellaouar and M. I. Elmasry, Low-Power Digital VLSI
Design Circuits and Systems, Kluwer Academic Publishers,
1995.
[18] S. Thompson et al., "Dual Threshold Voltage and Substrate
Bias: Keys to High Performance, Low Power, 0.1µm Logic
Designs", IEEE Symposium on VLSI Technology Tech. Dig.,
pp. 69-70, 1997.
[19] S. Thompson et al., "MOS Scaling: Transistor Challenges for
the 21st Century", Intel Technology Journal, Q3, 1998.
[20] K. Yano et al., "Top-Down Pass-Transistor Logic Design",
IEEE JSSC, pp. 792-803, June 1996.
[21] M. Bohr and Y. Elmansy, "Technology for Advanced
High-Performance Microprocessors", IEEE Trans. on Electron
Devices, pp. 620-625, vol. 45, 1998.
[22] Mark Bohr, "Technology development strategies for the 21st
century", Applied Surface Science, pp. 534-540, 100/101, 1996.