fundamentals of vlsi cmos power consumption...fundamentals of vlsi cmos power consumption andreas...
TRANSCRIPT
-
Prof. Andreas Burg 1
Fundamentals of VLSI CMOS Power Consumption
Andreas Burg Telecommunications Circuits Laboratory
-
Prof. Andreas Burg 2
Energy Bottleneck
• Mobile devices: energy-efficiency
• Battery capacity grows only very slowly
• Boost in the 1990s due to Mobile Phone introduction
• Capacity growth stalled
since 2000 at the limit
of Li-ion
• Only 3%-7% annual
improvement
-
Prof. Andreas Burg 3
Power Bottleneck
• Thermal Design Power: upper limit on power consumption
• Microprocessors for servers: ~30-100 W/cm2
• Mobile devices: ~3W total (handheld)
Package cost
• 4-5W for cheap packages
• 100 W/cm2 for air cooling
• 7.5kW/rack
Power delivery
• > 1000 pins for power delivery
on a 100W processor
Performance penalty
• 25 -> 100 deg. C : 30%
-
Prof. Andreas Burg 4
Power Bottleneck
Complex, large and costly power supply circuits: Three-phase step-down
converter built from toroidal coils, power MOSFETs, and electrolytic capacitors.
-
Prof. Andreas Burg 5
Motivation for Low Power Design
• High-performance circuits
– How to get the heat out?
– How to supply massive currents at very low voltages?
– How to avoid critical voltage drops on supply rails?
• Battery-operated circuits
– How long can we operate a device on a battery charge?
-
Prof. Andreas Burg 6
Table of Contents
Basics in CMOS Power Consumption and Low Power Design
• Active Power Consumption in CMOS
• Leakage Power Consumption
• Voltage Scaling and Sub-VT Design
-
Prof. Andreas Burg 7
Sources of Power Consumption in CMOS
• Four phenomena dissipate energy in digital CMOS circuits
– Charging and discharging of capacitive loads
– Crossover currents
– Leakage currents
– (Driving of resistive loads)
Power
Energy
-
Prof. Andreas Burg 8
Active Power
Consumption in CMOS
-
Prof. Andreas Burg 9
Basic Inverter in Steady State (ideal)
– PMOS performs pull-up to 𝑉𝐷𝐷
– NMOS performs pull-down to 𝐺𝑁𝐷
• Complementary gate: output connected to either 𝑉𝐷𝐷 or GND
– Static (steady state)
– Ideally, no current path from 𝑉𝐷𝐷 to 𝐺𝑁𝐷
– Ideally, no static power consumption
𝑉𝐺𝑆 = 0
𝑉𝐺𝑆 = −𝑉𝐷𝐷
𝑉𝐷𝐷
0: 𝐺𝑁𝐷
𝐺𝑁𝐷
1: 𝑉𝐷𝐷 o
pen
c
los
ed
𝑉𝐺𝑆 = 𝑉𝐷𝐷
𝑉𝐺𝑆 = 0
𝑉𝐷𝐷
1: 𝑉𝐷𝐷
𝐺𝑁𝐷
0: 𝐺𝑁𝐷
op
en
c
los
ed
-
Prof. Andreas Burg 10
CMOS Transistors: Parameters
The gain factor 𝛽 is a function of process parameters and layout geomerty
𝛽 =𝜇𝜀𝑂𝑋𝑡𝑂𝑋
𝑊
𝐿
where
𝑡𝑂𝑋 gate dielectric thikness
𝜀𝑂𝑋 gate dielectric permitivity
𝜇 effective carrier mobility in inversion layer
𝑊 channel (gate) width
𝐿 channel (gate) length
• Designer sets the drive-strength by controlling width and length of the transistor
Design parameters
-
Prof. Andreas Burg 11
CMOS Inverter: In-Out Transfer Characteristic (Static)
Inverter as non-linear amplifier with a large, but finite gain in the transition region
• Cross-over currents lead to power consumption during transients
(a) Transfer characteristic (b) Crossover current (c) Logic states
Dominant during
transition region
𝑉𝑖𝑛𝑝 𝑉𝑜𝑢𝑝
-
Prof. Andreas Burg 12
Minimizing transient currents
– Fast input slow output: driving device quickly shuts of completely
– Slow input fast output: driving device remains long in linear region
• Input of one device is output of the other device: balance input-output delay
for optimum power consumption
Pmos long in
saturation while
nMOS already
turns on during
input transition
Pmos remains in
linear region during
most of the input
transition (until it turns
off by the rising input)
-
Prof. Andreas Burg 13
CMOS Gates With Capacitive Load
• Various capacitances are merged into a single load capacitor 𝐶𝐿
– Intrainsic MOS transistor capacitors (driver)
– Extrinsig (fanout) MOS transistor capacitances
– Interconnect capacitance
𝐶𝐿
𝐶𝑊
Wider transistors
increase the gain
factor (drive) but
also increase the
load (capacitance)
-
Prof. Andreas Burg 14
Reminder: Power/Energy of an Inverter with Capacitive Load
• Energy consumed during one pair of transitions 𝐸↓↑:
– Cross-over currents
– Charge pumped onto the capacitive load
(dominant):
• 𝐸↓↑ = 𝐶𝐿𝑉𝑑𝑑 𝑉𝑑𝑑 = 𝐶𝐿𝑉𝑑𝑑2
• independent of transistor geometry (width/length)
• Independent of the waveforms
• quadratic dependency on voltage
• Energy/transition
• 𝐸𝑡 = 𝐶𝐿𝑉𝑑𝑑2 /2
• Power consumption = Energy/transition * transition/cycle (𝛼) * frequency (𝑓𝑐𝑙𝑘)
• 𝑃 =𝛼
2𝐶𝐿𝑉𝑑𝑑
2 𝑓𝑐𝑙𝑘
𝑉𝐷𝐷
𝐺𝑁𝐷
𝑽𝑫𝑫 → 𝑮𝑵𝑫
𝑮𝑵𝑫 → 𝑽𝑫𝑫 𝐶𝐿
-
Prof. Andreas Burg 15
Gate-Level Power Modeling
• Power consumption is divided into
– Net switching power
– Internal power:
• Internal power depends on actual input values
• Power is consumed even if output does not change
• Library files: internal energy characterization for each cell at given supply
voltage
– Internal energy (cross-current, switching) per change in each input and output
(as functions of input slope 𝑡𝑟𝑓 and output load 𝐶)
– Contribution to capacitance of the connected net (input/output load)
𝐸𝐴𝑂𝐼𝐴 𝐵, 𝐶, 𝐷, 𝑡𝑟𝑓
𝐸𝐴𝑂𝐼𝐵 𝐴, 𝐶, 𝐷, 𝑡𝑟𝑓
𝐸𝐴𝑂𝐼𝐶 𝐴, 𝐵, 𝐷, 𝑡𝑟𝑓
𝐸𝐴𝑂𝐼𝐷 𝐴, 𝐵, 𝐶, 𝑡𝑟𝑓
𝐸𝐴𝑂𝐼𝑍 𝐶
𝐶𝐴𝑂𝐼𝑍
𝐶𝐼𝑁𝑉𝐴 𝐶𝑛𝑒𝑡
𝐶 = 𝐶𝐴𝑂𝐼𝑍 + 𝐶𝑛𝑒𝑡 + 𝐶𝐼𝑁𝑉
𝐴
-
Prof. Andreas Burg 16
Methods for Power Analysis
What about the activity factor(s)?
• Fixed activity:
Assume a constant activity factor for all nodes in the circuit
– Very rough estimate and highly inaccurate
• Statistical power analysis:
Assumes a given toggle activity at the input and propagates the activity
throughout the circuit using statistical models of the gates
– Does not account for correlation between signal values
– No accounting for glitching activity
• Simulation based:
Obtains toggle statistics from gate level simulations
– Most accurate method
– Slow
-
Prof. Andreas Burg 17
Gate-Level Power Analysis Flow
RTL
code
Gate-level
netlist
Delay
annotation
Parasitic/wiring
capacitances
Switching activity
Waveform trace
Backend
design & CTGen
Final gate-
level netlist
Power
analysis
-
Prof. Andreas Burg 18
Leakage Power Consumption
-
Prof. Andreas Burg 19
Leakage Power
• Transistors leak currents even when in off-state
• Sources for leakage
– Sub-threshold leakage
• Dominant component in most circuits
– Gate tunneling
• Generally low, even in modern technologies due to
high-k gate dielectrics
• Decreases very rapidly with decreasing 𝑉𝑑𝑑
– Junction current
• Generally low
• Decreases very rapidly with decreasing 𝑉𝑑𝑑
G
S
D
B
G
S
D
B
G
S
D
B
-
Prof. Andreas Burg 20
Leakage Power
• Long channel deices (>130nm): 𝐼𝐷𝑆 = 𝐼0𝑒𝑉𝐺𝑆−𝑉𝑡ℎ𝑣𝑡𝑛
– 𝐼𝐷𝑆 mostly independent from Drain-Source Voltage
– Leakage current depends strongly on 𝑉𝐺𝑆 − 𝑉𝑡ℎ
• Decreasing threshold voltage increases leakage
• Impact of technology scaling on sub-threshold
leakage (
-
Prof. Andreas Burg 21
Leakage Power over Temperature
Drain current depends exponentially on thermal voltage 𝑣𝑡 = 𝒌𝑻/𝒒
𝐼𝐷𝑆 = 𝐼0𝑒𝑉𝐺𝑆−𝑉𝑡ℎ𝑣𝑡𝑛
• Exponential 𝐼𝐷𝑆 increase with temperature
Vivek De, Intel
Example: 0.7V, 100nm
process, 15mm2 die
n
-
Prof. Andreas Burg 22
Leakage in Transistor Stacks
• Stacking occurs
– In many logic gates (> 1 input)
– When introduced intentionally for leakage reduction
𝑉𝑀 𝑉𝑀
Small speed
penalty: ~25%
𝐼𝑙𝑒𝑎𝑘,𝑀1 = 𝐼0𝑒−𝑉𝑀−𝑉𝑡ℎ+𝝀𝑫𝑺 𝑉𝒅𝒅−𝑉𝑴
𝑣𝑡𝑛
𝐼𝑙𝑒𝑎𝑘,𝑀2 = 𝐼0𝑒−𝑉𝑡ℎ+𝝀𝑫𝑺𝑉𝑴
𝑣𝑡𝑛
-
Prof. Andreas Burg 23
Modern technologies offer different device flavors
• Devices with different threshold voltages => can often be combined on
same die/wafer
• Different process flavors (can typically not be mixed on same wafer)
M. Bohr, Intel Developer Forum 2009
-
Prof. Andreas Burg 24
Modern technologies offer different device flavors
• Sometimes IO transistors are an interesting option: low-leakage, high-VT
but large distance to core transistors in the layout required
-
Prof. Andreas Burg 25
Threshold Voltage Selection
• Modern process technologies support devices with different threshold voltages
– Typically three flavors: low-VT, standard-VT, high-VT
– Often all three flavors can be mixed in the same design
• VT-selection: tradeoff between speed and leakage
• Example: 55nm process
– Small increase in speed comes with a significant leakage penalty
𝑡𝑝𝑑 =𝑡𝑂𝑋𝜇𝜀𝑂𝑋
𝐿
𝑊𝐶𝐿
𝑉𝐷𝐷𝑉𝐷𝐷 − 𝑉𝑡ℎ
𝛼 𝐼𝑙𝑒𝑎𝑘 = 𝐼0𝑒−𝑉𝑡ℎ+𝝀𝑫𝑺𝑉𝑫𝑆
𝑣𝑡𝑛
HVT SVT LVT
Delay 20ps 16ps 14ps
Leakage 30nW 60nW 200nW
-
Prof. Andreas Burg 26
Multi-VT Design
• Design tradeoff when choosing a VT flavor:
– Less leakage (high-VT) increases delay and vice versa
– Threshold voltage types can often be mixed
• Multi-VT design
– Use low-VT cells only on critical paths
– High-VT cells are used in all other paths
• Methodology:
– Either done by replacing non-critical cells in the backend OR already during
synthesis by providing multiple libraries (HVT/SVT and LVT)
D Q D Q
D Q
Caveat: can be very problematic
for near-VT or sub-VT design:
path delays scale very differently
-
Prof. Andreas Burg 27
Body Bias Modulates Threshold Voltage
• Body of the transistor is often connected to the source (no body bias)
• Introducing a body bias modulates threshold voltage
– Forward Body Bias (FBB): increases threshold voltage
– Reverse Body Bias (RBB): reduces threshold voltage
• 𝑉𝑡ℎ = 𝑉𝑡ℎ0 − 𝜆𝐵𝑆𝑉𝐵𝑆
• BULK CMOS:
– Effect of body bias decreases for
technologies below 100nm
– FBB is limited to ~300mV to avoid
operating junction diodes in forward
direction
𝑉𝐵𝑆 < 0
𝐷
𝑆
𝐺 𝑉𝐵𝑆 > 0
𝐺
𝑆
𝐷
𝑉𝑡ℎ0 < 0 𝑉𝑡ℎ0 > 0
𝑉𝐵𝑆 > 0
𝐷
𝑆
𝐺 𝑉𝐵𝑆 < 0
𝐺
𝑆
𝐷
FB
B
RB
B
-
Prof. Andreas Burg 28
Power Gating
• Avoid leakage almost completely when individual design units are not used:
– Disconnect entire modules from the supply with headers and/or footers)
• Objectives with conflicting requirements
– Sleep mode: large off-resistance to avoid leakage (stacking)
• PMOS preferred over NMOS and HVT over LVT, header+footer
– Active mode: minimize on-resistance to reduce negative impact on timing
• Sleep transistors require large area
• NMOS preferred over PMOS, LVT over HVT, footer-only
Logic
sleep
sleep
Logic
sleep
Logic
sleep
-
Prof. Andreas Burg 29
Power Mode Transition
• Rapid re-activation of a power gated block can cause large spikes on the
supply network of the entire circuit
• Popular solutions:
Suhwan Kim, Stephen V. Kosonocky, and Daniel R. Knebel. 2003.
Understanding and minimizing ground bounce during mode transition of
power gating structures. In Proceedings of the 2003 international
symposium on Low power electronics and design (ISLPED '03). ACM,
New York, NY, USA, 22-25. DOI=10.1145/871506.871515
http://doi.acm.org/10.1145/871506.871515
-
Prof. Andreas Burg 30
Voltage Scaling and Sub-VT Design
-
Prof. Andreas Burg 31
Voltage Scaling
𝐸 = 𝐶𝐿𝑉𝑑𝑑
2
2𝛽𝑓𝑐𝑙𝑘 + 𝑉𝑑𝑑𝐼𝑙𝑒𝑎𝑘𝑑𝑡
𝑉𝑑𝑑𝐼𝑙𝑒𝑎𝑘𝑑𝑡 𝐶𝐿𝑉𝑑𝑑
2
2𝛽𝑓𝑐𝑙𝑘𝑑𝑡
Total Energy
Consumption
Static Energy
Consumption
Dynamic Energy
Consumption
G
S
D
S G
D
VDD
C
Minimize leakage energy by:
Reducing voltage
Reducing Vdd to
GND paths
Minimize active energy by:
Reducing voltage
Switching activity
Capacitance
-
Prof. Andreas Burg 32
Voltage Scaling
• Reducing supply voltage below nominal
– Most popular and most effective low-power strategy
– Voltage-scaling
• Reduces active power
• Reduces leakage power (but not necessarily energy/Op)
• Reduces speed : need to compensate with architectural changes (e.g., parallel
processing)
𝐸 = 𝐶𝐿𝑉𝑑𝑑
2
2𝛽𝑓𝑐𝑙𝑘 + 𝑉𝑑𝑑𝐼𝑙𝑒𝑎𝑘𝑑𝑡
𝑉𝑑𝑑𝐼𝑙𝑒𝑎𝑘𝑑𝑡 𝐶𝐿𝑉𝑑𝑑
2
2𝛽𝑓𝑐𝑙𝑘𝑑𝑡
Total Energy
Consumption
Static Energy
Consumption
Dynamic Energy
Consumption
-
Prof. Andreas Burg 33
Voltage Scaling (Impact on Speed)
• Inverter Delay
• Delay is a function of the supply
voltage above VTh – Depends strongly on the overdrive – Decreases as overdrive decreases
• Delay in the sub-VTh regime – Exponential dependency on
overdrive
33
𝒕𝒑𝒅 ∝𝑽𝑫𝑫𝑪𝒍𝒐𝒂𝒅
𝐼0𝑒𝑉𝑫𝑫−𝑉𝑡ℎ𝑣𝑡𝑛
𝑡𝑝𝐻𝐿 = 0.69 3
4
𝑉𝐷𝐷𝐶𝑙𝑜𝑎𝑑𝐼𝐷𝑆𝐴𝑇𝑛
= 0.52𝑉𝐷𝐷𝐶𝑙𝑜𝑎𝑑
𝑘𝑛𝑉𝐷𝑆𝐴𝑇𝑛 𝑉𝐷𝐷 − 𝑉𝑇𝐻𝑛 −𝑉𝐷𝑆𝐴𝑇𝑛
2
-
Prof. Andreas Burg 34
Example: StrongARM SA-1100 processor
Energy per operation as a function of voltage and clock rate
Observation: better energy efficiency for higher frequency at constant voltage.
– Reduced overhead per operation due to constant (leakage) currents
-
Prof. Andreas Burg 35
Ultra-Low-Power Design: Sub-Threshold Operation
• Near/below VT operation:
– Exponential delay/leakage increase
– Minimum energy voltage: balance
between leakage and active power
consumption
J. Rodrigues, PATMOS 2011, Keynote
Relatively flat around EMV
-
Prof. Andreas Burg 36
Ultra-Low-Power Design: Sub-Threshold Operation
• Real-time embedded system requirements
– Handle a given workload with lowest power consumption
• Optimum solution
– Operation at the energy minimum voltage with power gating during
idle periods to avoid leakage
– But, power gating is only effective when idle periods are long and
memories can often not be power gated and are the major source of
leakage
Operation @ EMV
with power gating
without power gating
Operation below EMV
without power gating
-
Prof. Andreas Burg 37
Example:
Power Consumption and Energy
Efficiency in Memories
-
Prof. Andreas Burg 38
Example: Memories Consume a Large Amount of Power
• For embedded processors, memories occupy a large percentage of the silicon area
• Active mode:
– Data and program memory can consume 2/3 of total power
– Low-frequency: SRAM leakage becomes visible
• Sleep modes:
– Generally, no power gating to retain SRAM content
– SRAM leakage becomes dominates system power consumption (3-4 pJ/bit in 180nm):
32kByte -> 400nW @ 1.8V
typical embedded processor
-
Prof. Andreas Burg 39
Example: Standard Cell Based Low Power Memory
Write Logic
Clock-gates (b): smaller and less power than enable flip-flops (a)
Read Logic
Above-VT
Multiplexers (c): smaller, faster, and less power than tri-state buffers
Sub-VT
Tri-state buffers (d): less leakage (energy) than multiplexers
Array of Storage Cells
Latch arrays smaller than flip-flop arrays, but longer write-address setup time
(a) Enable flip-flops (b) Clock
gates
(d) Tri-state buffers (c) Multiplexers
-
Prof. Andreas Burg 40
EMV
EMV
Sub-VT SCM: Insights and Leakage Breakdown
Large memory arrays:
little switching activity Total energy is
dominated by leakage
Active energy
negligible, except for
smallest SCMs
Only smallest SCMs
reach EMV in sub-VT
domain
→ Minimize leakage!
P. Meinerzhagen et al., JETCAS’11
Latch
Mux
Rest SCM
Leakage
Breakdown
-
Prof. Andreas Burg 41
Tri-state-
enabled output
Custom Cell: Low-Leakage Latch with Tri-State Output
Best practice for low leakage
1. Lowest number of VDD-ground paths
2. Highest resistance on each such
path
Tri-state buffers
Stacking & stretching
for inverters
Stacking factor: max 2
Channel length
stretching: 2Lmin
Convert output buffer to tri-state
buffer to avoid static CMOS muxes
Stacking &
stretching
-
Prof. Andreas Burg 42
4kb SCM Test Chip in LP-HVT 65nm CMOS
Chip microphotograph and zoomed-in layout picture
Area cost of 12.7 µm2 per bit (including peripherals) Scan-chain test interface
Functionality verification: W/R random and checker-board patterns
Oven to control temperature: 27 or 37°C
315µm
165µm
-
Prof. Andreas Burg 43
Silicon Measurements: Active Energy is 14 fJ/bit-access
Measured energy per bit-access performed at maximum speed
Measured energy minimum is 14fJ/bit at 500mV, 110kHz
-
Prof. Andreas Burg 44
Silicon Measurements: Leakage Power is 500fW/bit
At VDDhold=220mV, data is correctly held with a leakage power of 425-
500fW per bit (best and worst out of 4 measured dies)
At 37°C (typical for
biomedical implants)
• VDDmin=400mV
(instead of 420mV
at 27°C)
• Maximum operating
frequency doubles
• But: higher leakage
power
• Low retention voltage
is key for low power