L16: Power Dissipation in Digital Systems
web.mit.edu/6.111/www/s2006/lectures/l16.pdf ·...
TRANSCRIPT
L16: 6.111 Spring 2006 1Introductory Digital Systems Laboratory
L16: Power Dissipation in Digital Systems
Problem #1: Power Dissipation/Heat
[Figure: Power (Watts, log scale from 0.1 to 100,000) vs. year (1971 to 2008) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, and Pentium® processors, with annotated levels of 500 W, 1.5 kW, 5 kW, and 18 kW.]
[Figure: Power density (W/cm², log scale from 1 to 10,000) vs. year (1970 to 2010) for the 4004 through the Pentium® and P6 processors, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface.]
Courtesy Intel (S. Borkar)
How do you cool these chips??
[Figure: chip with heat sink.]
Problem #2: Energy Consumption
[Figure: battery, labeled "(40+ lbs)".]
[Figure: Nominal capacity (Watt-hours/lb, 0 to 50) vs. year ('65 to '00) for Nickel-Cadmium, Ni-Metal Hydride, and rechargeable Lithium batteries. (From Jon Eager, Gates Inc., and S. Watanabe, Sony Inc.)]
No Moore's law for batteries…
Today: Understand where power goes and ways to manage it
The Energy Problem
AA battery (7.5 cm³), alkaline: ~10,000 J
What can one Joule of energy do?
Send a 1 Megabyte file over 802.11b
Operate a processor for ~7 s
Mow your lawn for 1 ms
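As a rough back-of-the-envelope check, the slide's figures can be turned into implied power levels and battery runtimes. This is only a sketch: the inputs are the numbers quoted on the slide, while the derived quantities (implied average power, hours of runtime) are not stated in the lecture and assume the load draws constant power.

```python
# Figures quoted on the slide.
AA_ENERGY_J = 10_000.0           # alkaline AA battery, ~10,000 J
PROCESSOR_SECONDS_PER_J = 7.0    # one joule operates the processor for ~7 s
MOWER_SECONDS_PER_J = 1e-3       # one joule mows the lawn for ~1 ms


def implied_power_w(seconds_per_joule: float) -> float:
    """Average power (W) of a load that runs this long on one joule."""
    return 1.0 / seconds_per_joule


def runtime_hours(energy_j: float, power_w: float) -> float:
    """Hours a constant-power load can run from the given energy budget."""
    return energy_j / power_w / 3600.0


processor_w = implied_power_w(PROCESSOR_SECONDS_PER_J)
mower_w = implied_power_w(MOWER_SECONDS_PER_J)

print(f"processor: {processor_w * 1e3:.0f} mW, "
      f"{runtime_hours(AA_ENERGY_J, processor_w):.1f} h on one AA")
print(f"lawn mower: {mower_w:.0f} W, "
      f"{runtime_hours(AA_ENERGY_J, mower_w) * 3600:.0f} s on one AA")
```

Under these assumptions the processor draws roughly a seventh of a watt, so a single AA cell would last most of a day, while the lawn mower would drain it in seconds.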
Dynamic Energy Dissipation
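[Figure: CMOS inverter between $V_{DD}$ and ground driving a load capacitance $C_L$, shown twice with pull-up resistance $R_P$ and pull-down resistance $R_N$: once charging, once discharging.]
Charging (IN = 0): the supply current $i_{DD}$ charges $C_L$ through $R_P$
$E_{0\to1} = C_L V_{DD}^2$ (drawn from the supply)
$E_{cap} = \frac{1}{2} C_L V_{DD}^2$ (stored on $C_L$)
$E_{diss,R_P} = \frac{1}{2} C_L V_{DD}^2$ (dissipated in $R_P$)
Discharging (IN = 1): $C_L$ discharges through $R_N$
$E_{diss,R_N} = \frac{1}{2} C_L V_{DD}^2$
$P = C_L V_{DD}^2 f_{clk}$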
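The energy bookkeeping above can be checked numerically with a minimal sketch. The function names and the component values (1 pF load, 2 V supply, 100 MHz clock) are hypothetical and chosen only for illustration.

```python
def switching_energy(c_load_f: float, vdd_v: float):
    """Energy split for one 0->1 output transition.

    Returns (energy drawn from the supply, energy stored on C_L,
    energy dissipated in the pull-up resistance R_P).
    """
    e_supply = c_load_f * vdd_v ** 2
    e_cap = 0.5 * c_load_f * vdd_v ** 2
    e_diss_rp = e_supply - e_cap  # the other half is lost as heat in R_P
    return e_supply, e_cap, e_diss_rp


def dynamic_power(c_load_f: float, vdd_v: float, f_clk_hz: float) -> float:
    """P = C_L * V_DD^2 * f_clk, assuming one charge/discharge cycle per clock."""
    return c_load_f * vdd_v ** 2 * f_clk_hz


# Hypothetical example values (not from the lecture).
C_LOAD = 1e-12   # 1 pF
VDD = 2.0        # volts
F_CLK = 100e6    # 100 MHz

e_supply, e_cap, e_diss = switching_energy(C_LOAD, VDD)
print(f"per transition: {e_supply * 1e12:.1f} pJ from supply, "
      f"{e_cap * 1e12:.1f} pJ stored, {e_diss * 1e12:.1f} pJ dissipated")
print(f"dynamic power: {dynamic_power(C_LOAD, VDD, F_CLK) * 1e3:.2f} mW")
```

Note that the result does not depend on the values of $R_P$ or $R_N$: the resistances set how fast the node charges, not how much energy is lost.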
The Transition Activity Factor $\alpha_{0\to1}$
[Figure: two-input gate with inputs A, B and output Z.]
Current Input   Next Input   Output Transition
00              00           1 -> 1
00              01           1 -> 1
00              10           1 -> 1
00              11           1 -> 0
01              00           1 -> 1
01              01           1 -> 1
01              10           1 -> 1
01              11           1 -> 0
10              00           1 -> 1
10              01           1 -> 1
10              10           1 -> 1
10              11           1 -> 0
11              00           0 -> 1
11              01           0 -> 1
11              10           0 -> 1
11              11           0 -> 0
Assume inputs (A, B) arrive at rate f and are uniformly distributed. What is the average power dissipation?
$\alpha_{0\to1} = 3/16$
$P = \alpha_{0\to1} C_L V_{DD}^2 f$
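The 3/16 figure can be reproduced by enumerating every (current input, next input) pair and counting the pairs whose output rises. The sketch below assumes the tabulated gate is a NAND (Z is 0 only when A = B = 1, as in the table); the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def nand(a: int, b: int) -> int:
    """Gate from the table: output is 0 only when both inputs are 1."""
    return 1 - (a & b)


def activity_factor(gate) -> Fraction:
    """Fraction of equally likely (current, next) input pairs giving a 0->1 output."""
    inputs = list(product((0, 1), repeat=2))
    rising = sum(
        1
        for current in inputs
        for nxt in inputs
        if gate(*current) == 0 and gate(*nxt) == 1
    )
    return Fraction(rising, len(inputs) ** 2)


def average_power(alpha: Fraction, c_load_f: float, vdd_v: float, f_hz: float) -> float:
    """P = alpha * C_L * V_DD^2 * f."""
    return float(alpha) * c_load_f * vdd_v ** 2 * f_hz


alpha = activity_factor(nand)
print(alpha)  # 3/16
```

Passing a different gate function to `activity_factor` gives that gate's activity factor under the same uniform-input assumption.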
Junction (Silicon) Temperature
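Simple Scenario
[Figure: thermal circuit in which the dissipated power $P_D$ flows from the silicon junction ($T_J$) through $R_{\theta JA}$ to ambient ($T_A$).]
$T_J - T_A = R_{\theta JA} P_D$, so $T_J = T_A + R_{\theta JA} P_D$
$R_{\theta JA}$ is the thermal resistance between silicon and ambient. Make this as low as possible.
Realistic Scenario
[Figure: silicon, case, and sink stack; $P_D$ flows from the junction ($T_J$) through $R_{\theta JC}$ to the case ($T_C$), through $R_{\theta CS}$ to the sink ($T_S$), and through $R_{\theta SA}$ to ambient ($T_A$).]
$R_{\theta CA} = R_{\theta CS} + R_{\theta SA}$ is minimized by facilitating heat transfer (bolt the case to an extended metal surface, i.e., a heat sink)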
Intel Pentium 4 Thermal Guidelines
Pentium 4 @ 3.06 GHz dissipates 81.8 W!
Maximum $T_C$ = 69 °C
$R_{CA}$ < 0.23 °C/W for 50 °C ambient
Typical chips dissipate 0.5-1 W (cheap packages without forced air cooling)
[Figure: die temperature map, Temp (°C). Labels: execution core, 120 °C; cache, 70 °C; integer & FP ALUs.]
Courtesy of Intel (Ram Krishnamurthy)
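The case-to-ambient limit quoted for the Pentium 4 follows from the same series thermal model used for the junction temperature: $T_C - T_A = R_{CA} P_D$. A minimal sketch of that arithmetic is shown below; the function names are illustrative, and the second example uses hypothetical values for a small chip in a cheap package.

```python
def junction_temp(t_ambient_c: float, r_theta_ja: float, power_w: float) -> float:
    """T_J = T_A + R_thetaJA * P_D (steady state)."""
    return t_ambient_c + r_theta_ja * power_w


def max_case_to_ambient_resistance(t_case_max_c: float,
                                   t_ambient_c: float,
                                   power_w: float) -> float:
    """Largest R_CA (degC/W) that keeps the case at or below its limit."""
    return (t_case_max_c - t_ambient_c) / power_w


# Numbers from the slide: 81.8 W, T_C max = 69 degC, 50 degC ambient.
r_ca = max_case_to_ambient_resistance(69.0, 50.0, 81.8)
print(f"R_CA must stay below about {r_ca:.3f} degC/W")

# Hypothetical small chip: 1 W, R_thetaJA = 50 degC/W, 25 degC ambient.
print(f"T_J = {junction_temp(25.0, 50.0, 1.0):.0f} degC")
```

The first result lands at roughly 0.23 °C/W, matching the guideline on the slide.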
Power Reduction Strategies
$P = \alpha_{0\to1} C_L V_{DD}^2 f$
Reduce Transition Activity or Switching Events
Reduce Capacitance (e.g., keep wires short)
Reduce Power Supply Voltage
Frequency is typically fixed by the application, though this can be adjusted to control power
Optimize at all levels of design hierarchy
Clock Gating is a Good Idea!
[Figure: the global clock is gated by Enable_Adder to produce the adder clock and by Enable_Multiplier to produce the multiplier clock; in the example the adder (+) is off and the multiplier (X) is on.]
Clock gating reduces activity and is the most common low-power technique used today
100's of different clocks in a microprocessor
Clock Gating Reduces Energy, does it reduce Power?
Does your GHz Processor run at a GHz?
Use of Thermal Feedback
[Figure: processor chip with an on-chip thermal sensor driving an activity control block.]
On-chip thermal sensor (diode based) measures the silicon temperature
If the silicon junction gets too hot (say 125 °C), then the activity is reduced (e.g., reduce clock rate or use clock gating)
Note that there is a difference between average and peak power
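To see why a nominally "GHz" part may not sustain its full clock rate, one can estimate the steady-state operating point that thermal feedback would settle toward. The sketch below is a simplification, not the control scheme of any real processor: it assumes dissipated power scales linearly with clock rate (as in $P = \alpha C_L V_{DD}^2 f$), uses the steady-state relation $T_J = T_A + R_{\theta JA} P_D$, and all numeric values are hypothetical.

```python
def max_clock_fraction(t_limit_c: float,
                       t_ambient_c: float,
                       r_theta_ja: float,
                       full_speed_power_w: float) -> float:
    """Largest fraction of the full clock rate that keeps T_J at or below the limit.

    Assumes power is proportional to clock rate and the chip is in thermal
    steady state (T_J = T_A + R_thetaJA * P_D).
    """
    power_budget_w = (t_limit_c - t_ambient_c) / r_theta_ja
    return max(0.0, min(1.0, power_budget_w / full_speed_power_w))


# Hypothetical values: 125 degC limit, 50 degC ambient, 1.0 degC/W to ambient.
print(max_clock_fraction(125.0, 50.0, 1.0, 100.0))  # 100 W part must throttle
print(max_clock_fraction(125.0, 50.0, 1.0, 50.0))   # 50 W part runs at full rate
```

Because the limit applies to temperature, which responds slowly, short bursts above the sustainable power level are possible; this is the distinction between average and peak power noted on the slide.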
Power Supply Resonance
[Figure: power delivery network for a 200 MHz design, with board inductance $L_{board}$, package inductance $L_{package}$, grid resistance $R_{grid}$, board decap, on-die decap, and switching currents. Courtesy of Motorola (David Blaauw).]
Can write a Virus to Activate Power Supply Resonance!
Number Representation: Two's Complement vs. Sign Magnitude
[Figure: two number wheels mapping the 4-bit codes 0000 through 1111 to signed values, one for two's complement and one for sign-magnitude (+0 to +7 and -0 to -7).]
Consider a 16-bit bus where the input toggles between +1 and -1 (i.e., a small noise input).
Which representation is more energy efficient?
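One way to answer the question is to encode +1 and -1 in both representations and count how many of the 16 bus wires change between them. The helper names below are illustrative.

```python
BITS = 16


def twos_complement(value: int, bits: int = BITS) -> int:
    """Two's complement bit pattern of a signed integer."""
    return value & ((1 << bits) - 1)


def sign_magnitude(value: int, bits: int = BITS) -> int:
    """Sign-magnitude bit pattern: MSB is the sign, the rest is |value|."""
    sign = (1 << (bits - 1)) if value < 0 else 0
    return sign | abs(value)


def toggles(a: int, b: int) -> int:
    """Number of bus wires that change when the word goes from a to b."""
    return bin(a ^ b).count("1")


tc = toggles(twos_complement(+1), twos_complement(-1))
sm = toggles(sign_magnitude(+1), sign_magnitude(-1))
print(f"two's complement: {tc} bits toggle, sign-magnitude: {sm} bit toggles")
```

In two's complement, +1 is 0x0001 and -1 is 0xFFFF, so 15 wires switch on every sign change; in sign-magnitude only the sign bit switches. For this particular input, sign-magnitude therefore has far lower switching activity.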
Time Sharing is a Bad Idea
Time Sharing Increases Switching Activity
Not just a 6-1 Issue: “Cool” Software ???
[Figure: CPU driving a 16-bit address bus to MEMORY.]
Array addresses:
a[0]  0111111100000000
a[1]  0111111100000001
a[2]  0111111100000010
a[3]  0111111100000011
b[0]  1000000000000000
b[1]  1000000000000001
b[2]  1000000000000010
b[3]  1000000000000011
Version 1 (separate loops):
float a[256], b[256];
float pi = 3.14;
for (i = 0; i < 256; i++) { a[i] = sin(pi * i / 256); }
for (i = 0; i < 256; i++) { b[i] = cos(pi * i / 256); }
Address bus: 2(8) + 2(2+4+8+16+32+64+128+256) = 1036 bit transitions
Version 2 (merged loop):
float a[256], b[256];
float pi = 3.14;
for (i = 0; i < 256; i++) {
    a[i] = sin(pi * i / 256);
    b[i] = cos(pi * i / 256);
}
Address bus: 512(8) + 2+4+8+16+32+64+128+256 = 4606 bit transitions
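The two access patterns can also be compared by simulating the address sequence and counting the bits that flip between consecutive addresses. This is a sketch under stated assumptions: each loop visits all 256 elements, only the array accesses appear on the address bus, and the arrays sit at the word addresses shown on the slide. The exact counts it produces differ slightly from the slide's estimates, which approximate the toggles of the low-order counter bits.

```python
A_BASE = 0b0111111100000000  # address of a[0] from the slide
B_BASE = 0b1000000000000000  # address of b[0] from the slide
N = 256                      # elements per array


def bus_transitions(addresses) -> int:
    """Total bit flips on the address bus over a sequence of accesses."""
    return sum(bin(prev ^ cur).count("1")
               for prev, cur in zip(addresses, addresses[1:]))


# Separate loops: all of a[], then all of b[].
separate = [A_BASE + i for i in range(N)] + [B_BASE + i for i in range(N)]

# Merged loop: a[i] and b[i] accessed back to back.
merged = [base + i for i in range(N) for base in (A_BASE, B_BASE)]

print("separate loops:", bus_transitions(separate))  # 1020
print("merged loop:   ", bus_transitions(merged))    # 4590
```

The merged loop flips the upper eight address bits on every access because it alternates between the two arrays, which is why it costs roughly 4.5 times as many transitions even though it performs the same work.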
Glitching Transitions
Balancing paths reduces glitching transitions
Structures such as multipliers have a lot of glitching transitions
Keeping logic depths short (e.g., pipelining) reduces glitching
[Figure: two adder arrangements for summing A, B, C, and D.]
Chain Topology: (((A+B) + C) + D)
Tree Topology: (A+B) + (C+D)
Reduce Supply Voltage: But is it Free?
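[Figure: at t = 0+ the switching transistor (terminals G, S, D, with $V_{DD}$ on its gate) is modeled as a current source $\frac{k}{2}(V_{DD} - V_T)^2$ driving the load $C_L$ between IN and OUT.]
$$\text{Delay} = \frac{C_L \, \Delta V}{i_D} = \frac{C_L V_{DD}}{\frac{k}{2}(V_{DD} - V_T)^2} \propto \frac{V_{DD}}{(V_{DD} - V_T)^2} \approx \frac{1}{V_{DD}}$$
(the last step holds for $V_{DD} \gg V_T$)
$V_{DD}$ from 2 V to 1 V: energy ↓ by 4x, delay ↑ by 2x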
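The 4x energy and 2x delay figures can be checked with a short sketch of the two scaling laws: energy per transition proportional to $V_{DD}^2$ and delay proportional to $V_{DD}/(V_{DD} - V_T)^2$. The slide's 2x delay figure corresponds to neglecting $V_T$; the 0.4 V threshold in the second call is a hypothetical value used only to show how a nonzero $V_T$ makes the slowdown worse.

```python
def relative_energy(vdd_v: float) -> float:
    """Energy per transition, up to the constant factor C_L."""
    return vdd_v ** 2


def relative_delay(vdd_v: float, vt_v: float = 0.0) -> float:
    """Delay up to a constant factor: V_DD / (V_DD - V_T)^2."""
    return vdd_v / (vdd_v - vt_v) ** 2


energy_saving = relative_energy(2.0) / relative_energy(1.0)
slowdown_ideal = relative_delay(1.0) / relative_delay(2.0)
slowdown_with_vt = relative_delay(1.0, vt_v=0.4) / relative_delay(2.0, vt_v=0.4)

print(f"energy reduced by {energy_saving:.1f}x")
print(f"delay increased by {slowdown_ideal:.1f}x (V_T neglected)")
print(f"delay increased by {slowdown_with_vt:.2f}x (hypothetical V_T = 0.4 V)")
```

So lowering the supply is not free: the quadratic energy saving is paid for in speed, and the penalty grows as $V_{DD}$ approaches $V_T$.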
Transistors Are Free… (What do you do with a Billion Transistors?)
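Trade Area for Low Power
[Figure: serial design with one multiplier (X) between IN and OUT at f = 1 GHz, $V_{DD}$ = 2 V; parallel design with two multipliers (X), each fed from IN at f = 500 MHz, $V_{DD}$ = 1 V, with a SELECT multiplexer driving OUT.]
$P_{serial} = C_{mult} \cdot 2^2 \cdot f$
$P_{parallel} = 2 C_{mult} \cdot 1^2 \cdot f/2 = P_{serial}/4$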
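The serial versus parallel comparison can be written out as a minimal sketch using $P = C V_{DD}^2 f$. The multiplier capacitance is a hypothetical value (it cancels in the ratio), and the overhead of the output multiplexer and extra wiring is ignored.

```python
def dynamic_power(c_f: float, vdd_v: float, f_hz: float) -> float:
    """P = C * V_DD^2 * f."""
    return c_f * vdd_v ** 2 * f_hz


C_MULT = 1e-12  # hypothetical switched capacitance of one multiplier (1 pF)

# Serial: one multiplier at 1 GHz and 2 V.
p_serial = dynamic_power(C_MULT, 2.0, 1e9)

# Parallel: two multipliers, each at 500 MHz and 1 V.
p_parallel = 2 * dynamic_power(C_MULT, 1.0, 500e6)

# Both designs deliver the same number of results per second.
throughput_serial = 1 * 1e9
throughput_parallel = 2 * 500e6

print(f"P_parallel / P_serial = {p_parallel / p_serial:.2f}")
```

The halved clock rate is what permits the lower supply voltage, so the doubled area buys a 4x power reduction at equal throughput.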
Algorithmic Workload
Compare Current Image... ...to Previous Image
Receiver just updates
[Figure: histogram of frequency of occurrence (0.00 to 0.06) vs. number of IDCTs per frame (0 to 2000).]
Exploit Time Varying Algorithmic Workload To Vary the Power Supply Voltage
Dynamic Voltage Scaling (DVS)
[Figure: timelines for a fixed power supply and a variable power supply, showing ACTIVE and IDLE intervals.]
$E_{FIXED} = \frac{1}{2} C V_{DD}^2$
$E_{VARIABLE} = \frac{1}{2} C (V_{DD}/2)^2 = E_{FIXED}/4$
[Figure: normalized energy (0 to 1.0) vs. normalized workload (0 to 1.0) for a fixed supply and a variable supply.]
[Gutnik97]
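The shape of the energy versus workload comparison can be sketched with a simple model. The assumptions here are mine, not stated on the slide: with a fixed supply the circuit runs at full speed and then idles at zero energy, so energy is proportional to the workload; with a variable supply the clock is slowed to just meet the workload and $V_{DD}$ is scaled in proportion to the clock rate (delay roughly proportional to $1/V_{DD}$, threshold voltage and converter losses ignored).

```python
def fixed_supply_energy(workload: float) -> float:
    """Normalized energy at full V_DD: proportional to the operations performed."""
    return workload


def variable_supply_energy(workload: float) -> float:
    """Normalized energy with V_DD scaled in proportion to workload.

    Operations scale with workload and energy per operation with V_DD^2,
    giving workload * workload^2 under the assumptions above.
    """
    return workload * workload ** 2


for w in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"workload {w:.1f}: fixed {fixed_supply_energy(w):.3f}, "
          f"variable {variable_supply_energy(w):.3f}")
```

At half workload the variable supply runs at $V_{DD}/2$ and uses one quarter of the fixed-supply energy, which matches the $E_{VARIABLE} = E_{FIXED}/4$ relation on the slide.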
DVS on a Processor
Digitally adjustable DC-DC converter powers SA-1110 core
µOS selects appropriate clock frequency based on workload and latency constraints
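[Figure: block diagram in which a controller converts a 3.6 V input into Vout for the SA-1110, with the µOS driving the controller through a control signal labeled 5.]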
Hardware vs. Software
[Figure: flexibility vs. energy per operation for an embedded processor, DSP, FPGA, and direct mapped hardware, with energy annotations of 1 nJ/Op, 0.25 nJ/Op, and 0.1-1 pJ/Op.]
Courtesy of R. Brodersen, J. Rabaey, TI, ARM/StrongARM
Energy Efficiency of Software
[Figure: Processor (StrongARM-1100) power breakdown (%, 0 to 45) across Cache, Control, GCLK, EBOX, and I/O, PLL. [Montanaro, JSSC ‘96] [A. Sinha, DAC]]
[Figure: FPGA (Xilinx) power breakdown: Interconnect 65%, Clock 21%, I/O 9%, CLB 5%. [Kusse ‘98, UCB]]
“Software” Energy Dissipation has Large Overhead
Trends: Leakage and Power Gating
[Figure: total energy / switching energy vs. duty cycle (%), from 0 to 1.]
[Figure: two circuits, each a supply $V_{DD}$ with a load capacitance C, one switching and one leaking.]
Switching (computing): $E = C V_{DD}^2$
Leakage (standby): $E = V_{DD} I_0 10^{-V_T/S}$
Low $V_T$ devices are leaky. A high $V_T$ device, controlled by a Sleep signal, is used to gate the leakage current.
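The exponential dependence of leakage on threshold voltage is what makes a high $V_T$ sleep device effective. The sketch below evaluates the leakage current term $I_0 10^{-V_T/S}$ from the slide for two thresholds; the values of $I_0$, $S$, and both thresholds are hypothetical and chosen only to illustrate the scaling.

```python
def leakage_current(i0_a: float, vt_v: float, s_v_per_decade: float) -> float:
    """Subthreshold leakage current: I_0 * 10^(-V_T / S)."""
    return i0_a * 10 ** (-vt_v / s_v_per_decade)


# Hypothetical parameters (not from the lecture).
I0 = 1e-6        # amps
S = 0.1          # volts per decade of current
LOW_VT = 0.2     # volts, fast but leaky logic device
HIGH_VT = 0.4    # volts, sleep (power gating) device

low = leakage_current(I0, LOW_VT, S)
high = leakage_current(I0, HIGH_VT, S)
print(f"low-V_T leakage is {low / high:.0f}x the high-V_T leakage")
```

With these numbers, raising the threshold by 0.2 V cuts leakage by two decades, so placing a high $V_T$ device in series with the low $V_T$ logic sharply reduces standby current when Sleep is asserted.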
Trends: Energy Scavenging
MEMS Generator (Jose Mur Miranda / Jeff Lang): Vibration-to-Electric Conversion, ~10 µW
Power Harvesting Shoes (Joe Paradiso, Media Lab): after 3-6 steps, it provides 3 mA for 0.5 sec, ~10 mW