Post on 14-Dec-2015
Jan M. Rabaey
Low Power Design Essentials ©2008 Chapter 4
Optimizing Power @ Design Time
Circuits
Dejan Marković
Borivoje Nikolić
Chapter Outline
Optimization framework for energy-delay trade-off
Dynamic power optimization
– Multiple supply voltages
– Transistor sizing
– Technology mapping
Static power optimization
– Multiple thresholds
– Transistor stacking
Energy/Power Optimization Strategy
For given function and activity, an optimal operation point can be derived in the energy-performance space
Time of optimization depends upon activity profile
Different optimizations apply to active and static power
[Table: rows Active and Static; columns give the time of optimization per activity profile]
Fixed activity: design time
Variable activity: run time
No activity (standby): sleep
Energy-Delay Optimization and Trade-off
Maximize throughput for given energy, or minimize energy for given throughput
[Figure: energy/op versus delay, showing the unoptimized design and the trade-off space bounded by Emin, Emax, Dmin, and Dmax]
Other important metrics: Area, Reliability, Reusability
The Design Abstraction Stack
A very rich set of design parameters to consider!
It helps to consider options in relation to their abstraction layer
System/Application: choice of algorithm
Software: amount of concurrency
(Micro-)Architecture: parallel versus pipelined, general purpose versus application specific
Logic/RT: logic family, standard cell versus custom
Circuit: sizing, supply, thresholds
Device: bulk versus SOI
[Figure marker: "This Chapter" highlights part of the stack]
Optimization Can/Must Span Multiple Levels
[Figure: Architecture, Micro-Architecture, and Circuit (Logic & FFs) levels]
Design optimization combines top-down and bottom-up: “meet-in-the-middle”
Energy-Delay Optimization
Globally optimal energy-delay curve for a given function
[Figure: energy/op versus delay curves for topology A and topology B]
Some Optimization Observations
Energy-Delay Sensitivities
$S_A = \left. \dfrac{\partial E / \partial A}{\partial D / \partial A} \right|_{A = A_0}$
[Figure: energy versus delay; curves f(A0,B) and f(A,B0) meet at the point (A0,B0) with delay D0, with slopes SA and SB]
[Ref: V. Stojanovic, ESSCIRC’02]
Finding the Optimal Energy-Delay Curve
$\Delta E = S_A \cdot (-\Delta D) + S_B \cdot \Delta D$
On the optimal curve, all sensitivities must be equal
Pareto-optimal: the best that can be achieved without disadvantaging at least one metric.
[Figure: energy versus delay; curves f(A0,B), f(A,B0), and f(A1,B), the point (A0,B0) at delay D0, and a delay step ∆D]
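As a small numeric illustration of the balance condition, the sketch below evaluates the energy change when one tuning variable is tightened by ∆D and another is relaxed by the same ∆D, so that total delay is unchanged. The sensitivity values are hypothetical, and treating the sensitivities as slopes dE/dD (negative along a trade-off curve) is an assumption about the sign convention.

```python
def net_energy_change(s_a, s_b, delta_d):
    """Energy change when A speeds up by delta_d and B slows down by delta_d.

    s_a, s_b are energy-delay slopes dE/dD (assumed negative:
    more delay means less energy).
    """
    return s_a * (-delta_d) + s_b * delta_d


# Hypothetical design point: B is four times more sensitive than A.
unbalanced = net_energy_change(s_a=-1.0e-3, s_b=-4.0e-3, delta_d=10e-12)
# Equal sensitivities: shifting delay between A and B gains nothing.
balanced = net_energy_change(s_a=-2.0e-3, s_b=-2.0e-3, delta_d=10e-12)

print(unbalanced)  # negative: energy drops at constant total delay
print(balanced)
```

As long as the two sensitivities differ, delay can be moved from the less sensitive variable to the more sensitive one for a net energy gain, which is why the optimum requires them to be equal.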
Reducing Active Energy @ Design Time
$E_{active} \sim \alpha \cdot C_L \cdot V_{swing} \cdot V_{DD}$
$P_{active} \sim \alpha \cdot C_L \cdot V_{swing} \cdot V_{DD} \cdot f$
Reducing voltages
– Lowering the supply voltage (VDD) at the expense of clock speed
– Lowering the logic swing (Vswing)
Reducing transistor sizes (CL)
– Slows down logic
Reducing activity (α)
– Reducing switching activity through transformations
– Reducing glitching by balancing logic
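The proportionalities above can be tried out with a few lines of Python. This is only a sketch: the node values (10 fF load, 10% activity, 1 GHz clock) are hypothetical, and the constant of proportionality is taken as 1.

```python
def active_energy(alpha, c_load, v_swing, v_dd):
    """Energy per cycle, up to a constant: alpha * C_L * V_swing * V_DD."""
    return alpha * c_load * v_swing * v_dd


def active_power(alpha, c_load, v_swing, v_dd, freq):
    """Active power: energy per cycle times clock frequency."""
    return active_energy(alpha, c_load, v_swing, v_dd) * freq


# Hypothetical node: 10 fF load, 10% activity, full swing.
full_swing = active_energy(0.1, 10e-15, 1.2, 1.2)
low_supply = active_energy(0.1, 10e-15, 0.9, 0.9)
power_at_1ghz = active_power(0.1, 10e-15, 1.2, 1.2, 1e9)

# When the swing tracks the supply, the saving is quadratic in VDD.
print(low_supply / full_swing)
```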
Observation
Downsizing and/or lowering the supply on the critical path lowers the operating frequency
Downsizing non-critical paths reduces energy for free, but
– Narrows down the path delay distribution
– Increases impact of variations, impacts robustness
[Figure: two histograms of number of paths versus path delay tp, each marked with the target delay]
Circuit Optimization Framework
minimize Energy (VDD, VTH, W)
subject to Delay (VDD, VTH, W) ≤ Dcon
Constraints
– VDDmin < VDD < VDDmax
– VTHmin < VTH < VTHmax
– Wmin < W
Reference case
– Dmin sizing @ VDDmax, VTHref
[Figure: energy/op versus delay for topology A and topology B]
[Ref: V. Stojanovic, ESSCIRC’02]
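To make the constrained formulation concrete, here is a minimal brute-force sketch for a single gate driving a fixed load. It is not the optimizer used in the referenced work: the delay and energy expressions are simplified stand-ins, VTH is left out, and the load, grid, and limits are hypothetical, normalized values.

```python
import itertools

V_ON, ALPHA_D, GAMMA = 0.37, 1.53, 1.35   # fit values quoted for the 90 nm delay model
V_MIN, V_MAX = 0.6, 1.2                   # hypothetical supply limits (V)
W_MIN, W_MAX = 1.0, 16.0                  # hypothetical sizing limits (normalized)
C_LOAD = 16.0                             # hypothetical load (normalized)


def delay(vdd, w):
    # Alpha-power supply dependence times an assumed (1 + fanout / gamma) load term.
    return vdd / (vdd - V_ON) ** ALPHA_D * (1.0 + C_LOAD / (GAMMA * w))


def energy(vdd, w):
    # Parasitic plus load capacitance, charged to vdd.
    return (GAMMA * w + C_LOAD) * vdd ** 2


def minimize_energy(d_con, n=60):
    """Grid search: lowest-energy (energy, vdd, w) with delay <= d_con, or None."""
    vdds = [V_MIN + i * (V_MAX - V_MIN) / n for i in range(n + 1)]
    widths = [W_MIN + i * (W_MAX - W_MIN) / n for i in range(n + 1)]
    feasible = [(energy(v, w), v, w)
                for v, w in itertools.product(vdds, widths)
                if delay(v, w) <= d_con]
    return min(feasible) if feasible else None


d_ref = delay(V_MAX, W_MAX)            # fastest point of this toy model
best = minimize_energy(1.2 * d_ref)    # allow a 20% delay increase
print(best, energy(V_MAX, W_MAX))
```

A grid search is used only because it is easy to verify; a real flow would use a proper constrained solver and include VTH as a third variable.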
Optimization Framework: Generic Network
[Figure: stages i and i+1 with supplies VDD,i and VDD,i+1; capacitances γCi, Ci, Cw, and Ci+1]
Gate in stage i loaded by fanout (stage i+1)
Alpha-power based Delay Model
$t_p = \dfrac{K_d V_{DD}}{(V_{DD} - V_{on})^{\alpha_d}} \cdot \dfrac{\gamma C_i + C_w + C_{i+1}}{C_i} = \tau_{nom} \left( 1 + \dfrac{C_w + C_{i+1}}{\gamma C_i} \right)$
Fit parameters: Von, αd, Kd, γ
VDDref = 1.2V, technology 90 nm
[Figure: FO4 delay (norm.) versus VDD / VDDref, simulation and model; Von = 0.37 V, αd = 1.53]
[Figure: delay tp (ps) versus fanout (Ci+1/Ci), simulation and model; τnom = 6 ps, γ = 1.35 (90nm technology)]
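The fit values quoted on this slide can be plugged into a short script. This is a sketch under assumptions: the supply dependence is taken as VDD / (VDD − Von)^αd normalized to VDDref, and the load term is assumed to have the form 1 + fanout/γ, which is consistent with the plotted ranges but is a reconstruction rather than a verified formula.

```python
V_ON = 0.37      # V, fit value from the slide
ALPHA_D = 1.53   # fit value from the slide
TAU_NOM = 6.0    # ps, fit value from the slide
GAMMA = 1.35     # fit value from the slide
VDD_REF = 1.2    # V


def supply_factor(vdd):
    """Alpha-power supply dependence of delay (unnormalized)."""
    return vdd / (vdd - V_ON) ** ALPHA_D


def normalized_delay(vdd):
    """Delay relative to its value at VDD_REF."""
    return supply_factor(vdd) / supply_factor(VDD_REF)


def stage_delay_ps(fanout, vdd=VDD_REF):
    """Stage delay in ps; the (1 + fanout / GAMMA) form is an assumption."""
    return TAU_NOM * (1.0 + fanout / GAMMA) * normalized_delay(vdd)


print(stage_delay_ps(4.0))     # FO4 delay at the reference supply
print(normalized_delay(0.6))   # slowdown at half the reference supply
```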
Combined with Logical Effort Formulation
For Complex Gates
$t_p = \tau_{nom} (p_i + f_i g_i)$
Parasitic delay pi – depends upon gate topology
Electrical effort fi ≈ Si+1/Si
Logical effort gi – depends upon gate topology
Effective fanout hi = figi
[Ref: I. Sutherland, Morgan-Kaufmann’99]
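A path delay in this formulation is simply the sum of p + f·g over the stages. The sketch below computes it in units of τnom; the inverter values p = 1 and g = 1 are assumptions for illustration, and the sizes and load are hypothetical.

```python
def path_delay(gates, sizes, c_load):
    """Sum p_i + g_i * f_i over a path, in units of tau_nom.

    gates:  list of (p_i, g_i) per stage
    sizes:  input size S_i of each stage
    c_load: final load, in the same units as the sizes
    """
    total = 0.0
    for i, (p, g) in enumerate(gates):
        next_size = sizes[i + 1] if i + 1 < len(sizes) else c_load
        f = next_size / sizes[i]   # electrical effort f_i ~ S_{i+1} / S_i
        total += p + g * f
    return total


inverter = (1.0, 1.0)   # assumed p and g for an inverter
chain_delay = path_delay([inverter] * 3, [1.0, 4.0, 16.0], 64.0)
print(chain_delay)      # three stages, each with effective fanout 4
```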
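Dynamic Energy
[Figure: stages i and i+1 with supplies VDD,i and VDD,i+1; capacitances γCi, Ci, Cw, and Ci+1]
$E_{dyn} = (\gamma C_i + C_w + C_{i+1}) V_{DD,i}^2 = C_i (\gamma + f_i) V_{DD,i}^2$
$C_i = K_e S_i, \quad f_i = (C_w + C_{i+1}) / C_i \approx S_{i+1} / S_i$
$E_i = K_e S_i (V_{DD,i-1}^2 + \gamma V_{DD,i}^2)$ = energy consumed by logic gate i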
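The per-gate energy can be sketched as below, assuming the form E_i = K_e S_i (V_DD,i-1² + γ V_DD,i²): the gate's input capacitance is charged from the previous stage's supply and its parasitic capacitance from its own supply. K_e is normalized to 1 here, and the sizes and supplies are hypothetical.

```python
GAMMA = 1.35   # parasitic-to-input capacitance ratio (fit value quoted for 90 nm)
K_E = 1.0      # energy coefficient, normalized (assumption)


def gate_energy(size, vdd_prev, vdd_own):
    """E_i = K_e * S_i * (V_DD,i-1^2 + gamma * V_DD,i^2)."""
    return K_E * size * (vdd_prev ** 2 + GAMMA * vdd_own ** 2)


def chain_energy(sizes, vdds):
    """Total over a chain; the first gate's input is assumed driven at its own supply."""
    total = 0.0
    for i, size in enumerate(sizes):
        vdd_prev = vdds[i - 1] if i > 0 else vdds[0]
        total += gate_energy(size, vdd_prev, vdds[i])
    return total


uniform = chain_energy([1.0, 4.0, 16.0], [1.0, 1.0, 1.0])
tapered = chain_energy([1.0, 4.0, 16.0], [1.0, 1.0, 0.8])
print(uniform, tapered)   # lowering the last supply cuts the largest term
```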
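Optimizing Return on Investment (ROI)
Depends on Sensitivity (∂E/∂D)
Gate Sizing
$-\dfrac{\partial E / \partial S_i}{\partial D / \partial S_i} = \dfrac{E_i}{\tau_{nom} (h_i - h_{i-1})}$
– ∞ for equal h (Dmin)
Supply Voltage
$-\dfrac{\partial E / \partial V_{DD}}{\partial D / \partial V_{DD}} = \dfrac{E}{D} \cdot \dfrac{2 (1 - V_{on} / V_{DD})}{\alpha_d - 1 + V_{on} / V_{DD}}$
– max at VDD(max) (Dmin)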
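The supply-voltage sensitivity can be checked numerically. The sketch below assumes E ∝ VDD² and D ∝ VDD / (VDD − Von)^αd, for which the closed form is (E/D) · 2(1 − x)/(αd − 1 + x) with x = Von/VDD, and compares it against a finite-difference estimate. The Von and αd values are the 90 nm fit values quoted earlier in the chapter.

```python
V_ON = 0.37
ALPHA_D = 1.53


def supply_sensitivity(vdd):
    """-(dE/dVDD) / (dD/dVDD), expressed in units of E/D."""
    x = V_ON / vdd
    return 2.0 * (1.0 - x) / (ALPHA_D - 1.0 + x)


def energy(vdd):
    return vdd ** 2


def delay(vdd):
    return vdd / (vdd - V_ON) ** ALPHA_D


def numeric_sensitivity(vdd, h=1e-6):
    """Central-difference estimate of the same quantity."""
    d_e = energy(vdd + h) - energy(vdd - h)
    d_d = delay(vdd + h) - delay(vdd - h)
    return -(d_e / d_d) * delay(vdd) / energy(vdd)


for v in (0.8, 1.0, 1.2):
    print(v, supply_sensitivity(v), numeric_sensitivity(v))
```

The value grows with VDD, which matches the slide's remark that the supply sensitivity is largest at the maximum supply.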
Example: Inverter Chain
[Figure: chain of N inverters with sizes S1 = 1, S2, S3, …, SN driving load CL]
Properties of inverter chain
– Single path topology
– Energy increases geometrically from input to output
Goal
– Find optimal sizing S = [S1, S2, …, SN], supply voltage, and buffering strategy to achieve the best energy-delay tradeoff
Inverter Chain: Gate Sizing
$S_i^2 = \dfrac{S_{i-1} S_{i+1}}{1 + \mu S_{i-1}}, \quad \mu = \dfrac{2 K_e V_{DD}^2}{\tau_{nom} F_S}$
$F_S$: energy-delay sensitivity to gate sizing
Variable taper achieves minimum energy
Reduce number of stages at large dinc
[Figure: effective fanout h versus stage (1 to 7) for dinc = 0%, 1%, 10%, 30%, 50%; nominal and optimal curves]
[Ref: Ma, JSSC’94]
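The variable taper can be reproduced with a simple fixed-point iteration. This sketch assumes the sizing recurrence S_i² = S_{i-1} S_{i+1} / (1 + μ S_{i-1}), with μ treated as a given hypothetical parameter rather than derived from K_e, VDD, and the sensitivity. With μ = 0 it falls back to the familiar geometric (minimum-delay) taper.

```python
import math


def taper(n_stages, c_load, mu, sweeps=500):
    """Gauss-Seidel iteration on S_i^2 = S_{i-1} * S_{i+1} / (1 + mu * S_{i-1}).

    S_1 is fixed at 1 and the final load acts as S_{N+1}.
    """
    s = [1.0] * n_stages + [c_load]
    for _ in range(sweeps):
        for i in range(1, n_stages):
            s[i] = math.sqrt(s[i - 1] * s[i + 1] / (1.0 + mu * s[i - 1]))
    return s[:n_stages]


def fanouts(sizes, c_load):
    """Effective fanout of each stage."""
    nxt = sizes[1:] + [c_load]
    return [b / a for a, b in zip(sizes, nxt)]


min_delay = taper(3, 64.0, mu=0.0)    # geometric taper
relaxed = taper(3, 64.0, mu=0.05)     # hypothetical mu > 0: delay traded for energy
print(min_delay, fanouts(min_delay, 64.0))
print(relaxed, fanouts(relaxed, 64.0))
```

For μ > 0 the inner gates shrink and the fanout increases toward the output, which is the variable taper the slide refers to.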
Inverter Chain: VDD Optimization
VDD reduces energy of the final load first
Variable taper achieved by voltage scaling
[Figure: VDD / VDDnom versus stage (1 to 7) for dinc = 0%, 1%, 10%, 30%, 50%; nominal and optimal curves]
Inverter Chain: Optimization Results
Parameter with the largest sensitivity has the largest potential for energy reduction
Two discrete supplies mimic per-stage VDD
[Figures: energy reduction (%) versus dinc (%), and sensitivity (norm.) versus dinc (%); curve labels cVDD, S, gVDD, 2VDD]
Example: Kogge-Stone Tree Adder
Tree adder
– Long wires
– Re-convergent paths
– Multiple active outputs
[Figure: adder tree with inputs (A0, B0) … (A15, B15) and Cin, outputs S0 … S15]
[Ref: P. Kogge, Trans. Comp’73]
Tree Adder: Sizing vs. Dual-VDD Optimization
Reference design: all paths are critical
Internal energy makes sizing (S) more effective than VDD
– S: E(-54%), 2Vdd: E(-27%) at dinc = 10%
[Figure: three energy maps: reference (D = Dmin), sizing (E -54% at dinc = 10%), and 2Vdd (E -27% at dinc = 10%)]
Tree Adder: Multi-dimensional Search
Can get pretty close to optimum with only 2 variables
Getting the minimum delay (maximum speed) is very expensive
[Figure: Energy / Eref versus Delay / Dmin; curves for the reference, (S, VDD), (VDD, VTH), (S, VTH), and (S, VDD, VTH)]
Multiple Supply Voltages
Block-level supply assignment
– Higher throughput/lower latency functions are implemented in higher VDD
– Slower functions are implemented with lower VDD
– This leads to so-called “voltage islands” with separate supply grids
– Level conversion performed at block boundaries
Multiple supplies inside a block
– Non-critical paths moved to lower supply voltage
– Level conversion within the block
– Physical design challenging
Using Three VDD’s
V1 = 1.5V, VTH = 0.3V
[Figure: power reduction ratio as a function of V2 (V) and V3 (V), shown as a contour plot and as a surface plot; the optimum is marked "+"]
[Ref: T. Kuroda, ICCAD’02]
© IEEE 2002
Optimum Number of VDD’s
The more VDD’s the less power, but the effect saturates
Power reduction effect decreases with scaling of VDD
Optimum V2/V1 is around 0.7
[Figure: three panels for the supply sets { V1, V2 }, { V1, V2, V3 }, and { V1, V2, V3, V4 }; each plots the VDD ratios (V2/V1, V3/V1, V4/V1) and the power ratio (P2/P1, P3/P1, P4/P1) versus V1 (V)]
[Ref: M. Hamada, CICC’01]
© IEEE 2001
Lessons: Multiple Supply Voltages
Two supply voltages per block are optimal
Optimal ratio between the supply voltages is 0.7
Level conversion is performed on the voltage boundary, using a level-converting flip-flop (LCFF)
An option is to use an asynchronous level converter
– More sensitive to coupling and supply noise
Distributing Multiple Supply Voltages
[Figure: Conventional layout versus Shared N-well layout; each shows a VDDH circuit (i1, o1) and a VDDL circuit (i2, o2) with VDDH, VDDL, and VSS rails]
Conventional
[Figure: VDDH circuit and VDDL circuit separated by N-well isolation; VDDH, VDDL, and VSS rails]
(a) Dedicated row: VDDH rows and VDDL rows
(b) Dedicated region: VDDH region and VDDL region
Shared N-Well
[Figure: VDDH circuit and VDDL circuit in a shared N-well; VDDH, VDDL, and VSS rails]
(a) Floor plan image: VDDL circuit and VDDH circuit
[Shimazaki et al, ISSCC’03]
Example: Multiple Supplies in a Block
“Clustered voltage scaling”
Lower VDD portion is shared
[Figure: Conventional Design versus CVS Structure; flip-flops (FF), the critical path, and level-shifting F/Fs]
[Ref: M. Takahashi, ISSCC’98]
© IEEE 1998
Level Converting Flip-Flops (LCFFs)
Pulsed Half-Latch versus Master-Slave LCFFs
– Smaller # of MOSFETs / clock loading
– Faster level conversion using half-latch structure
– Shorter D-Q path from pulsed circuit
[Figure: Master-Slave and Pulsed Half-Latch LCFF schematics, with the level conversion stage marked]
[Ref: F. Ishihara, ISLPED’03]
© IEEE 2003
Dynamic Realization of Pulsed LCFF
Pulsed precharge LCFF (PPR)
– Fast level conversion by precharge mechanism
– Suppressed charge/discharge toggle by conditional capture
– Short D-Q path
[Figure: Pulsed Precharge Latch schematic, with the level conversion stage marked]
[Ref: F. Ishihara, ISLPED’03]
© IEEE 2003
Case Study: ALU for 64-bit μProcessor
[Figure: ALU block diagram with carry gen., partial sum, gp gen., logical unit, sum sel., 9:1, 5:1, and 2:1 MUXes, clock gen., and a long loop-back bus (sumb, 0.5pF) driven through INV1 and INV2; blocks are marked as VDDH or VDDL circuits]
[Ref: Y. Shimazaki, ISSCC’03]
© IEEE 2003
Low-Swing Bus and Level Converter
INV2 is placed near 9:1 MUX to increase noise immunity
Level conversion is done by a domino 9:1 MUX
[Figure: low-swing bus (sum, sumb) driven by INV1 and INV2 at VDDL into the domino level converter (9:1 MUX) at VDDH, with keeper and precharge (pc)]
[Ref: Y. Shimazaki, ISSCC’03]
© IEEE 2003
Measured Results: Energy and Delay
[Figure: energy [pJ] versus TCYCLE [ns] at room temperature, single-supply versus shared well (VDDH = 1.8V); 1.16GHz point marked]
VDDL = 1.4V: Energy -25.3%, Delay +2.8%
VDDL = 1.2V: Energy -33.3%, Delay +8.3%
[Ref: Y. Shimazaki, ISSCC’03]
© IEEE 2003
Practical Transistor Sizing
Continuous sizing of transistors only an option in custom design
In ASIC design flows, options set by available library
Discrete sizing options made possible in standard-cell design methodology by providing multiple options for the same cell
– Leads to larger libraries (> 800 cells)
– Easily integrated into technology mapping
Technology Mapping
Larger gates reduce capacitance, but are slower
[Figure: logic network with signals a, b, c, d, and f; slack = 1]
Technology Mapping
Example: 4-input AND
(a) Implemented using 4-input NAND + INV
(b) Implemented using 2-input NAND + 2-input NOR
Gate type | Area (cell unit) | Input cap. (fF) | Average delay (ps), Library 1: High-Speed | Average delay (ps), Library 2: Low-Power
INV | 3 | 1.8 | 7.0 + 3.8 CL | 12.0 + 6.0 CL
NAND2 | 4 | 2.0 | 10.3 + 5.3 CL | 16.3 + 8.8 CL
NAND4 | 5 | 2.0 | 13.6 + 5.8 CL | 22.7 + 10.2 CL
NOR2 | 3 | 2.2 | 10.7 + 5.4 CL | 16.7 + 8.9 CL
(delay formula: CL in fF)
(numbers calibrated for 90 nm)
Technology Mapping – Example
4-input AND | (a) NAND4 + INV | (b) NAND2 + NOR2
Area | 8 | 11
HS: Delay (ps) | 31.0 + 3.8 CL | 32.7 + 5.4 CL
LP: Delay (ps) | 53.1 + 6.0 CL | 52.4 + 8.9 CL
Sw Energy (fF) | 0.1 + 0.06 CL | 0.83 + 0.06 CL
Area
– 4-input more compact than 2-input (2 gates vs. 3 gates)
Timing
– both implementations are 2-stage realizations
– 2nd stage INV (a) is better driver than NOR2 (b)
– For more complex blocks, simpler gates will show better performance
Energy
– Internal switching increases energy in the 2-input case
– Low-power library has worse delay, but lower leakage (see later)
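The area and delay rows of the comparison can be recomputed from the library table. The sketch below assumes that the first-stage load is the input capacitance of the second-stage gate and that mapping (b) uses two NAND2 gates feeding one NOR2; the dictionary layout and function names are illustrative only.

```python
# gate: (area, input cap in fF, high-speed (d0, k), low-power (d0, k))
# with delay = d0 + k * CL in ps
LIB = {
    "INV":   (3, 1.8, (7.0, 3.8),  (12.0, 6.0)),
    "NAND2": (4, 2.0, (10.3, 5.3), (16.3, 8.8)),
    "NAND4": (5, 2.0, (13.6, 5.8), (22.7, 10.2)),
    "NOR2":  (3, 2.2, (10.7, 5.4), (16.7, 8.9)),
}
HIGH_SPEED, LOW_POWER = 0, 1


def two_stage_delay(first, second, library):
    """Return (intercept, slope) so that path delay = intercept + slope * CL."""
    d0_first, k_first = LIB[first][2 + library]
    d0_second, k_second = LIB[second][2 + library]
    c_in_second = LIB[second][1]   # load seen by the first stage
    return d0_first + k_first * c_in_second + d0_second, k_second


area_a = LIB["NAND4"][0] + LIB["INV"][0]
area_b = 2 * LIB["NAND2"][0] + LIB["NOR2"][0]
print(area_a, area_b)
for lib in (HIGH_SPEED, LOW_POWER):
    print(two_stage_delay("NAND4", "INV", lib), two_stage_delay("NAND2", "NOR2", lib))
```

Rounded to one decimal, the intercepts agree with the delay rows of the comparison table.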
Gate-Level Tradeoffs for Power
Technology mapping
– Gate selection
– Sizing
– Pin assignment
Logical Optimizations
– Factoring
– Restructuring
– Buffer insertion/deletion
– Don’t care optimization
Logic Restructuring
Logic restructuring to minimize spurious transitions
Buffer insertion for path balancing
[Figure: example logic networks with annotated signal values]
Algebraic Transformations
Idea: Modify network to reduce capacitance
Caveat: This may increase activity!
pa = 0.1; pb = 0.5; pc = 0.5
[Figure: two implementations of f from inputs a, b, c, with internal signal probabilities p1 = 0.05, p2 = 0.05, p3 = 0.075, p4 = 0.75, p5 = 0.075]
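The quoted signal probabilities can be reproduced by enumerating the input combinations. The sketch below assumes the two networks are f = ab + ac and f = a(b + c), with statistically independent inputs; that reading is consistent with the quoted numbers, but the assignment of p1 to p5 to specific nodes is an inference, not something the slide text states.

```python
from itertools import product

P_ONE = {"a": 0.1, "b": 0.5, "c": 0.5}   # input signal probabilities from the slide


def probability(fn):
    """P(fn = 1) by enumerating all input combinations (inputs assumed independent)."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        weight = 1.0
        for name, bit in zip("abc", (a, b, c)):
            weight *= P_ONE[name] if bit else 1.0 - P_ONE[name]
        if fn(a, b, c):
            total += weight
    return total


nodes = {
    "a AND b": lambda a, b, c: a and b,
    "a AND c": lambda a, b, c: a and c,
    "b OR c": lambda a, b, c: b or c,
    "f = ab + ac": lambda a, b, c: (a and b) or (a and c),
    "f = a(b + c)": lambda a, b, c: a and (b or c),
}
probs = {name: probability(fn) for name, fn in nodes.items()}
for name, p in probs.items():
    print(name, round(p, 3))
```

The factored form needs fewer gates, but its internal OR node has a probability of 0.75 instead of 0.05, which illustrates the caveat that reducing capacitance may increase activity.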
Lessons from Circuit Optimization
Joint optimization over multiple design parameters possible using sensitivity-based optimization framework
– Equal marginal costs ⇔ Energy-efficient design
Peak performance is VERY power inefficient
– About 70% energy reduction for 20% delay penalty
– Additional variables for higher energy-efficiency
Two supply voltages in general sufficient; 3 or more supply voltages only offer small advantage
Choice between sizing and supply voltage parameters depends upon circuit topology
But … leakage not considered so far
Considering Leakage @ Design Time
Considering leakage as well as dynamic power is essential in sub-100 nm technologies
Leakage is not necessarily a bad thing
– Increased leakage leads to improved performance, allowing for lower supply voltages
– Again a trade-off issue …
Leakage – Not Necessarily a Bad Thing
$\left( \dfrac{E_{Lk}}{E_{Sw}} \right)_{opt} = \dfrac{2}{\ln \left( \dfrac{L_d}{\alpha_{avg}} \right) - K}$
Topology | Inv | Add | Dec
(ELk/ESw)opt | 0.8 | 0.5 | 0.2
Optimal designs have high leakage (ELk/ESw ≈ 0.5)
Must adapt to process and activity variations
[Figure: normalized energy Enorm versus Estatic / Edynamic (10^-2 to 10^1) for Version 1 and Version 2; marked points at Vthref-180mV, 0.81VDDmax and Vthref-140mV, 0.52VDDmax]
[Ref: D. Markovic, JSSC’04]
© IEEE 2004
Refining the Optimization Model
Switching energy
$E_{dyn} = \alpha_{0 \to 1} K_e S (\gamma + f) V_{DD}^2$
Leakage energy
$E_{stat} = S \, I_0(\Psi) \, e^{-(V_{TH} - \lambda_d V_{DD}) / (kT/q)} \, V_{DD} \, T_{cycle}$
with: I0(Ψ): normalized leakage current with inputs in state Ψ
Reducing Leakage @ Design Time
Using longer transistors
– Limited benefit
– Increase in active current
Using higher thresholds
– Channel doping
– Stacked devices
– Body biasing
Reducing the voltage!!
Longer Channels
10% longer gates reduce leakage by 50%
– Increases switching power by 18% with W/L = const.
Doubling L reduces leakage by 5x
– Impacts performance
– Attractive when don’t have to increase W (e.g. memory)
[Figure: normalized switching energy and normalized leakage power versus transistor length (100 to 200 nm), 90 nm CMOS]
Using Multiple Thresholds
There is no need for level conversion
Dual thresholds can be added to standard design flows
– High-VTh and Low-VTh libraries are a standard in sub-0.18 μm processes
– For example: can synthesize using only high-VTh and then swap in low-VTh cells in place to improve timing
– Second VTh insertion can be combined with resizing
Only two thresholds are needed per block
– Using more than two yields small improvements
Three VTH’s
VDD = 1.5V, VTH.1 = 0.3V
[Figure: leakage reduction ratio as a function of VTH.2 (V) and VTH.3 (V), shown as a contour plot and as a surface plot; the optimum is marked "+"]
Impact of third threshold very limited
[Ref: T. Kuroda, ICCAD’02]
© IEEE 2002
Using Multiple Thresholds
Cell-by-cell VTH assignment (not at block level)
Achieves all-low-VTH performance with substantial reduction in leakage
[Figure: logic between flip-flops (FF) with low-VTH and high-VTH cells marked]
[Ref: S. Date, SLPE’94]
Dual-VT Domino
Low-threshold transistors used only in critical paths
[Figure: domino stages with clocks Clkn and Clkn+1, data Dn and Dn+1, device P1, and inverters Inv1, Inv2, Inv3; shaded transistors are low threshold]
Multiple Thresholds and Design Methodology
Easily introduced in standard cell design methodology by extending cell libraries with cells with different thresholds
– Selection of cells during technology mapping
– No impact on dynamic power
– No interface issues (as was the case with multiple VDD’s)
Impact: Can reduce leakage power substantially
Dual-VTH Design for High-Performance Design
 | High-VTH Only | Low-VTH Only | Dual VTH
Total Slack | -53 psec | 0 psec | 0 psec
Dynamic Power | 3.2 mW | 3.3 mW | 3.2 mW
Static Power | 914 nW | 3873 nW | 1519 nW
All designs synthesized automatically using Synopsys Flows
[Courtesy: Synopsys, Toshiba, 2004]
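A quick calculation on the static power numbers in this table shows what the dual-VTH design buys. The figures are taken from the table; only the variable names are made up for this sketch.

```python
static_nw = {"high_vth": 914.0, "low_vth": 3873.0, "dual_vth": 1519.0}

# Leakage saved by dual-VTH relative to the all-low-VTH design (both meet timing).
saving_vs_low = 100.0 * (1.0 - static_nw["dual_vth"] / static_nw["low_vth"])
# Leakage cost of dual-VTH relative to the all-high-VTH design (which misses timing).
ratio_vs_high = static_nw["dual_vth"] / static_nw["high_vth"]

print(round(saving_vs_low, 1), round(ratio_vs_high, 2))
```

So the dual-VTH design reaches the same zero slack as the low-VTH design at roughly 40% of its static power.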
Example: High- vs. Low-Threshold Libraries
[Figure: leakage power (nW, 0 to 8000) for the benchmarks i10, des, C7552, seq, pair, and AVER, comparing LVth, LVth+HVth, HVth, and HVth+LVth]
Selected combinational tests, 130 nm CMOS
[Courtesy: Synopsys 2004]
Complex Gates Increase Ion/Ioff Ratio
Ion and Ioff of single NMOS versus stack of 10 NMOS transistors
Transistors in stack are sized up to give similar drive
[Figure: Ioff (nA) versus VDD (V) for stack and no stack (90nm technology)]
[Figure: Ion (μA) versus VDD (V) for stack and no stack (90nm technology)]
Complex Gates Increase Ion/Ioff Ratio
Stacking transistors suppresses submicron effects
– Reduced velocity saturation
– Reduced DIBL effect
– Allows for operation at lower thresholds
[Figure: Ion/Ioff ratio (x 10^5) versus VDD (V) for stack and no stack; annotated "Factor 10!" (90nm technology)]
Complex Gates Increase Ion/Ioff Ratio
Example: 4-input NAND
Fan-in (2) versus Fan-in (4)
With transistors sized for similar performance: Leakage of Fan-in(2) = Leakage of Fan-in(4) x 3 (averaged over all possible input patterns)
[Figure: leakage current (nA) versus input pattern (1 to 16) for Fan-in (2) and Fan-in (4)]
Example: 32 bit Kogge-Stone Adder
[Figure: % of input vectors versus standby leakage current (μA); annotated "factor 18"]
Reducing the threshold by 150 mV increases leakage of single NMOS transistor by factor 60
[Ref: S.Narendra, ISLPED’01]
© Springer 2001
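The factor-60 figure can be turned into an implied subthreshold swing. The sketch below assumes leakage depends purely exponentially on the threshold voltage, so that a threshold shift of one swing changes leakage by one decade; the function name is illustrative.

```python
import math


def implied_swing_mv_per_decade(delta_vth_mv, leakage_ratio):
    """Threshold shift (mV) needed per decade of leakage change."""
    return delta_vth_mv / math.log10(leakage_ratio)


swing = implied_swing_mv_per_decade(150.0, 60.0)
print(round(swing, 1))   # mV per decade implied by the 150 mV / factor-60 figure
```

Under this assumption the quoted numbers correspond to a swing in the mid-80s of mV per decade.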
Summary
Circuit optimization can lead to substantial energy reduction at limited performance loss
Energy-delay plots are the perfect mechanism for analyzing energy-delay trade-offs
Well-defined optimization problem over W, VDD and VTH parameters
Increasingly better support by today’s CAD flows
Observe: leakage is not necessarily bad – if appropriately managed.
References
Books:
A. Bellaouar, M.I. Elmasry, Low-Power Digital VLSI Design Circuits and Systems, Kluwer Academic Publishers, 1st Ed, 1995.
D. Chinnery, K. Keutzer, Closing the Gap Between ASIC and Custom, Springer, 2002.
D. Chinnery, K. Keutzer, Closing the Power Gap Between ASIC and Custom, Springer, 2007.
J. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed, Prentice Hall 2003.
I. Sutherland, B. Sproull, D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan Kaufmann, 1st Ed, 1999.
Articles:
R.W. Brodersen, M.A. Horowitz, D. Markovic, B. Nikolic, V. Stojanovic, “Methods for True Power Minimization,” Int. Conf. on Computer-Aided Design (ICCAD), pp. 35-42, Nov. 2002.
S. Date, N. Shibata, S. Mutoh, and J. Yamada, “1V 30MHz Memory-Macrocell-Circuit Technology with a 0.5μm Multi-Threshold CMOS,” Proceedings of the 1994 Symposium on Low Power Electronics, San Diego, CA, pp. 90-91, Oct. 1994.
M. Hamada, Y. Ootaguro, T. Kuroda, “Utilizing Surplus Timing for Power Reduction,” IEEE Custom Integrated Circuits Conf. (CICC), pp. 89-92, Sept. 2001.
F. Ishihara, F. Sheikh, B. Nikolic, “Level conversion for dual-supply systems,” Int. Conf. Low Power Electronics and Design (ISLPED), pp. 164-167, Aug. 2003.
P.M. Kogge and H.S. Stone, “A Parallel Algorithm for the Efficient Solution of General Class of Recurrence Equations,” IEEE Trans. Comput., vol. C-22, no. 8, pp. 786-793, Aug. 1973.
T. Kuroda, “Optimization and control of VDD and VTH for low-power, high-speed CMOS design,” Proceedings ICCAD 2002, pp. , San Jose, Nov. 2002.
H.C. Lin and L.W. Linholm, “An Optimized Output Stage for MOS Integrated Circuits,” IEEE J. Solid-State Circuits, vol. SC-10, no. 2, pp. 106-109, Apr. 1975.
S. Ma and P. Franzon, “Energy Control and Accurate Delay Estimation in the Design of CMOS Buffers,” IEEE J. Solid-State Circuits, vol. 29, no. 9, pp. 1150-1153, Sept. 1994.
D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Methods for True Energy-Performance Optimization,” IEEE Journal of Solid-State Circuits, vol. 39, no. 8, pp. 1282-1293, Aug. 2004.
MathWorks, http://www.mathworks.com
S. Narendra, S. Borkar, V. De, D. Antoniadis, A. Chandrakasan, “Scaling of stack effect and its applications for leakage reduction,” Int. Conf. Low Power Electronics and Design (ISLPED), pp. 195-200, Aug. 2001.
T. Sakurai and R. Newton, “Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas,” IEEE J. Solid-State Circuits, vol. 25, no. 2, pp. 584-594, Apr. 1990.
Y. Shimazaki, R. Zlatanovici, B. Nikolic, “A shared-well dual-supply-voltage 64-bit ALU,” Int. Conf. Solid-State Circuits (ISSCC), pp. 104-105, Feb. 2003.
V. Stojanovic, D. Markovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Energy-Delay Tradeoffs in Combinational Logic using Gate Sizing and Supply Voltage Optimization,” European Solid-State Circuits Conf. (ESSCIRC), pp. 211-214, Sept. 2002.
M. Takahashi et al., “A 60mW MPEG video codec using clustered voltage scaling with variable supply-voltage scheme,” IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 36-37, Feb. 1998.