2014-2-11 john lazzaro (not a prof - “john” is always ok)
DESCRIPTION
www-inst.eecs.berkeley.edu/~cs152/. CS 152 Computer Architecture and Engineering. Lecture 7 -- Power and Energy. 2014-2-11 John Lazzaro (not a prof - “John” is always OK). TA: Eric Love. Play:. Today: Power and Energy. Metrics: Power and energy (intro). Short Break. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/1.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
2014-2-11
John Lazzaro(not a prof - “John” is always OK)
CS 152Computer Architecture and Engineering
www-inst.eecs.berkeley.edu/~cs152/
TA: Eric Love
Lecture 7 -- Power and Energy
Play:
![Page 2: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/2.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Today: Power and Energy
Metrics: Power and energy (intro)
Short Break.
Metrics: Power and energy (technique)
![Page 3: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/3.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Power and Energy
Universal: Power and energy units are comparable across all of applied physics.
So, we use automobilesto introduce terminology.
![Page 4: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/4.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
The Joule: Unit of energy. A 1 Gallon gas container holds 130 MJ of energy.
1 J = 1 W s.
1 W = 1 J/s.
The Watt: Unit of power.A rate of energy (J/s). A gas pump hose delivers 6 MW.
120 KW: The power delivered by a Tesla Supercharger. Tesla Model S has a306 MJ battery (good for 265 miles).
![Page 5: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/5.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Energy and Performance
Sad fact: Computers turn electrical energy into heat. Computation is a byproduct.
Air or water carries heat away, or chip melts.
![Page 6: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/6.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
+1V -
1 Ohm Resistor
1A
1 Joule heats 1 gram of water 0.24 degree C
This is how electric tea pots work ...
1 Joule of Heat Energy per Second
1 Watt
20 W rating: Maximum power the package is able to transfer to the air. Exceed rating and resistor burns.
The Watt: Unit of power. The amount of energy burned in the resistor in 1 second.
The Joule: Unit of energy.Can also be expressed as Watt-Seconds. Burning 1 Watt for 100 seconds uses 100 Watt-Seconds of energy.
![Page 7: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/7.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Cooling an iPod nano ...
Like resistor on last slide, iPod relies on passive transfer of heat from case to the air.
Why? Users don’t want fans in their pocket ...
To stay “cool to the touch” via passive cooling,
power budget of 5 W.
If iPod nano used 5W all the time, its battery would last 15 minutes ...
![Page 8: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/8.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Powering an iPod nano (2005 edition)
1.2 W-hour battery: Can supply 1.2 watts of power for 1 hour.
1.2 W-hr / 5 W ≈ 15 minutes.
Real specs for iPod nano : 14 hours for music,
4 hours for slide shows.
85 mW for music.300 mW for slides.
More W-hours require bigger battery and thus bigger “form factor” -- it wouldn’t be “nano” anymore :-).
![Page 9: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/9.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Finding the (2005) iPod nano CPU ...
Two 80 MHz CPUs. One CPU used for audio, one for slides.
Low-power ARM roughly 1mW per MHz ... variable clock, sleep modes.
85 mW system power realistic ...
A close relative ...
![Page 10: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/10.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
The CPU is only part of power budget!
“Amdahl’s Law for Power”
If our CPU took no power at all to run, that would only double battery life!
CPULCD Backlight
“other”
LCD
GPU
2004-era notebook running a full
workload.
![Page 11: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/11.jpg)
2010 nano0.74
ounces(50% of
2005 Nano)
What’s happened since 2005?
“Up to” 24 hours audio playback. 70% improvement from 2005 nano. 0.39 W Hr
(33% of 2005 Nano)
![Page 12: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/12.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Processors and Energy
![Page 13: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/13.jpg)
Moore’s Law2.6 Billion
1 Million
2 Thousand
![Page 14: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/14.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Main driver: device scaling ...
From: “Facing the Hot Chips Challenge Again”, Bill Holt, Intel, presented at Hot Chips 17, 2005.
![Page 15: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/15.jpg)
1974: Dennard Scaling
If we scale the gate length by a factor 𝞳, how should we scale other aspects of transistor to get the “best” results?
notscaled
𝞳 = 5 scaling
![Page 16: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/16.jpg)
Dennard Scaling
notscaled
𝞳 = 5 scalingThings we do:
scale dimensions, doping, Vdd.
What we get: 𝞳2 as many transistors at the same power density!
Whose gates switch times faster!𝞳
Power density scaling ended in 2003 (Pentium 4: 3.2GHz, 82W, 55M FETs). Why? We could no longer scale Vdd.
![Page 17: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/17.jpg)
The Power Wall
Why? We can no longer fully scale Vdd ... because MOS transistor leakage current is no longer a negligible part of the power budget.
![Page 18: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/18.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Switching Energy: Fundamental Physics
Every logic transition dissipates energy.
How can we limit
switching energy?
(1) Reduce # of clock transitions. But we have work to do ...(2) Reduce Vdd. But lowering Vdd limits the clock speed ...(3) Fewer circuits. But more transistors can do more work.(4) Reduce C per node. One reason why we scale processes.
Vdd
12
C
Vdd
E0-
>1=
2
Vdd
12
C
Vdd
E1-
>0=
2
C
Strong result: Independent of technology.
![Page 19: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/19.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Scaling switching energy per gate ...
IC process scaling (“Moore’s Law”)
From: “Facing the Hot Chips Challenge Again”, Bill Holt, Intel, presented at Hot Chips 17, 2005.
Due to reducing V and C (length and width of Cs decrease, but plate distance gets smaller).
Recent slope more shallow because V is being scaled less aggressively.
![Page 20: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/20.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
0V =
Second Factor: Leakage Currents
Even when a logic gate isn’t switching, it burns power.
Igate: Ideal capacitors have zero DC current. But modern transistor gates are a few atoms thick, and are not ideal.
Isub: Even when this nFet
is off, it passes an Ioff leakage current.
We can engineer any Ioffwe like, but a lower Ioff
also results in a lower Ion, and thus a lower
maximum clock speed.Intel’s 2006 processor designs, leakage vs switching power A lot of work
was done to get a ratio this good ... 50/50 is common.Bill Holt, Intel, Hot Chips 17.
![Page 21: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/21.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Engineering “On” Current at 25 nm ...
I
dsVs
Vd
V g
0.7 = Vdd
0.25 ≈
Vt
I
ds
1.2 mA = Ion
Ioff
= 0 ???
We can increase Ion by
raising Vdd and/or lowering Vt.
![Page 22: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/22.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Plot on a “Log” Scale to See “Off” Current
I
dsVs
Vd
V g
I
ds
Ioff
≈ 10 nA
We can decrease Ioff by
raising Vt - but that lowers Ion.
0.25 ≈
Vt
1.2 mA = Ion
0.7 = Vdd
![Page 23: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/23.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Ioff? Ion? Recall: Timing Lecture ...
I
ds
0.25 ≈
Vt
1.2 mA = Ion
0.7 = Vdd
Ioff
≈ 10 nA
A “on” n-FET
empties the
bucket.
An “off” n-FET while bucket
fills.Ioff
Ion
Why on?
Vdd >>
Vt
Why
open?
Gnd <<
Vt
I ds
= Current through
n-FET
![Page 24: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/24.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Device engineers trade speed and power
From: Silicon Device Scaling to the Sub-10-nm RegimeMeikei Ieong,1* Bruce Doris,2 Jakub Kedzierski,1 Ken Rim,1 Min Yang1
We can reduce leakage
(Pstandby) by raising Vt.
We can increase speed
by raising Vdd and lowering Vt.
We can reduce CV (Pactive)
by lowering Vdd.
2
![Page 25: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/25.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Customize processes for product types ...
From: “Facing the Hot Chips Challenge Again”, Bill Holt, Intel, presented at Hot Chips 17, 2005.
![Page 26: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/26.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Vgs
IdsIntel 22nm Process
Transistor channel is a raised fin.Gate controls
channel from sides and top.Channel depth is fin
width. 12-15nm for L=22nm.
![Page 27: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/27.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Clock rates have flattened out, but ...
![Page 28: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/28.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Performance: Put more transistors to work
![Page 29: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/29.jpg)
Moore’s Law2.6 Billion
1 Million
2 Thousand
We still scale to get more
transistors per unit area ... but we use design techniques to
reduce power.
![Page 30: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/30.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Takeaway Abstractions
From Part I ...
![Page 31: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/31.jpg)
Small circuits can go very fast in standard CMOS ...
This oscillator runs at 210 GHz in a 32 nm SOI CMOS logic process, and consumes 42 mW ...
But if we used these techniques for a CPU, our 150 W air-cooled power limit would limit our design to using about ten thousand transistors ...... but we are used to using 100s of millions!
![Page 32: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/32.jpg)
The Power Wall
Dennard scaling stopped working in 2004.Why? MOSFET off currents became non-negligible. We limit clock speed to prevent chip from melting.
![Page 33: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/33.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Dynamic Power: 4 ways to reduce it ...
Every logic transition dissipates energy.
How can we limit
switching energy?
(1) Reduce # of clock transitions. But we have work to do ...(2) Reduce Vdd. But lowering Vdd limits the clock speed ...(3) Fewer circuits. But more transistors can do more work.(4) Reduce C per node. One reason why we scale processes.
Vdd
12
C
Vdd
E0-
>1=
2
Vdd
12
C
Vdd
E1-
>0=
2
C
Strong result: Independent of technology.
![Page 34: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/34.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
0V =
Static power: We trade off speed for power.
Even when a logic gate isn’t switching, it burns power.
Igate: Ideal capacitors have zero DC current. But modern transistor gates are a few atoms thick, and are not ideal.
Isub: Even when this nFet
is off, it passes an Ioff leakage current.
We can engineer any Ioffwe like, but a lower Ioff
also results in a lower Ion, and thus a lower
maximum clock speed.Intel’s 2006 processor designs, leakage vs switching power A lot of work
was done to get a ratio this good ... 50/50 is common.Bill Holt, Intel, Hot Chips 17.
![Page 35: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/35.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Factor of 60 in leakage vs. 3.5 in speed
(40, 45, 50) are transistor channel lengths (in nm)
Chart shows 9 different NAND gates for an IC process, each with a different speed vs. static power tradeoff.
![Page 36: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/36.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
FO4 (Delay, Power) vs Vdd -- 65nm
FO4 is the delay of one inverter driving four additional inverters. Power in this plot includes dynamic and static.
![Page 37: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/37.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Six low-power design techniques
Power-down idle transistors
Parallelism and pipelining
Slow down non-critical paths
Clock gating
Thermal management
Data-dependent processing
![Page 38: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/38.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Trading Hardware for Power
via Parallelism and Pipelining ...
Design Technique #1 (of 6)
![Page 39: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/39.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Gate delay roughly
linear with Vdd
This magic trick brought to you by Cory Hall ...
And so, we can transform this:
Block processes stereo audio. 1/2
of clocks for “left”, 1/2 for “right”.
P ~ F ⨯ Vdd2
P ~ 1 ⨯ 12
Into this: Top block processes “left”, bottom “right”.
CV2 power only
P ~ #blks ⨯ F ⨯ Vdd 2
P ~ 2 ⨯ 1/2 ⨯ 1/4 = 1/4
![Page 40: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/40.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Chandrakasan & Brodersen (UCB, 1992)
Simple
Pipelined
Parallel
From:
![Page 41: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/41.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Multiple Cores for Low Power
Trade hardware for power, on a large
scale ...
![Page 42: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/42.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Cell: The PS3 chip
2006
![Page 43: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/43.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Cell (PS3 Chip): 1 CPU + 8 “SPUs”
PowerPC
L2 Cache512 KB
Synergistic Processing
Units(SPUs)
8
![Page 44: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/44.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
One Synergistic Processing Unit (SPU)
256 KB Local Store, 128 128-bit Registers
SPU issues 2 inst/cycle (in order) to 7 execution units
SPU fills Local Store using DMA to DRAM and network
![Page 45: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/45.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
A “Schmoo” plot for a Cell SPU ...
The lower Vdd, the longer the maximum clock period, the slower the clock frequency.
12
C
Vdd
E1-
>0=
212
C
Vdd
E0-
>1=
2
The lower Vdd, the less dynamic energy consumption.
![Page 46: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/46.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Clock speed alone doesn’t help E/op ...
12
C
Vdd
E1-
>0=
212
C
Vdd
E0-
>1=
But, lowering clock frequency while keeping voltage constant spreads the same amount of work over a longer time, so chip stays cooler ... 2
![Page 47: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/47.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Scaling V and f does lower energy/op
7W to reliably get 4.4 GHz performance. 47C die temp.
1 W to get 2.2 GHz performance. 26 C die temp. If a program that needs a
4.4 Ghz CPU can be recoded to use two 2.2 Ghz CPUs ... big win.
![Page 48: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/48.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
How iPod nano 2005 puts its 2 cores to use ...
Two 80 MHz CPUs. Was used in several nano generations, with one CPU doing audio decoding, the other doing photos, etc.
![Page 49: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/49.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
2013 Macbook Air
Haswell CPU/GPU
Voltage range: 0.655V to 1.041V ... 2.5x in CV2 energy
![Page 50: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/50.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Powering down idle circuits
Design Technique #2 (of 6)
![Page 51: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/51.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Add “sleep” transistors to logic ...
Example: Floating point unit logic.When running fixed-point instructions, put logic “to sleep”.
+++ When “asleep”, leakage power is dramatically reduced.--- Presence of sleep transistors slows down the clock rate when the logic block is in use.
![Page 52: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/52.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Intel example: Sleeping cache blocks
From: “Facing the Hot Chips Challenge Again”, Bill Holt, Intel, presented at Hot Chips 17, 2005.
A tiny current supplied in “sleep” maintains SRAM state.
![Page 53: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/53.jpg)
Intel Medfield
![Page 54: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/54.jpg)
Intel Medfield
Switches 45 power “islands.”
Fine-grained control of leakage power, to track user activity.
“Race to idle” strategy -- finish tasks quickly, to get to power down.
![Page 55: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/55.jpg)
Playing a game ...
![Page 56: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/56.jpg)
Watching a video ...
![Page 57: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/57.jpg)
Looking at phone screen, not doing anything ...
![Page 58: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/58.jpg)
Phone in your pocket, waiting for a call ...
![Page 59: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/59.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Slow down “slack paths”
Design Technique #3 (of 6)
![Page 60: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/60.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Fact: Most logic on a chip is “too fast”
From “The circuit and physical design of the POWER4 microprocessor”, IBM J Res and Dev, 46:1, Jan 2002, J.D. Warnock et al.
Most wires have hundreds of picoseconds to spare.The critical path
![Page 61: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/61.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Use several supply voltages on a chip ...
Why use multi-Vdd? We can reduce dynamic power by using low-power Vdd for logic off the critical path.
From: “Facing the Hot Chips Challenge Again”, Bill Holt, Intel, presented at Hot Chips 17, 2005.
What if we can’t do a multi-Vdd design? In a multi-Vt process, we can reduce leakage power on the slow logic by using high-Vth transistors.
![Page 62: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/62.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
From a chapter from new book on ASIC design by Chinnery and Keutzer (UCB).
Logical partition into 0.8V and 1.0V nets
done manually to meet 350 MHz spec
(90nm).
Level-shifter insertion and placement done
automatically.
Dynamic power in 0.8V section cut 50%
below baseline.
Leakage power in 1.0V section cut 70%
below baseline.
![Page 63: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/63.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Gating clocks to save power
Design Technique #4 (of 6)
![Page 64: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/64.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
On a CPU, where does the power go?
From: Bose, Martonosi, Brooks: Sigmetrics-2001 Tutorial
Half of the power goes to latches (Flip-Flops).
Most of the time, the latches don’t change state.
So (gasp) gated clocks are a big win. But, done with CAD tools in a disciplined way.
![Page 65: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/65.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Synopsis Design Compiler can do this ...
“Up to 70% power savings at the block level, for applicable circuits” Synopsis Data Sheet
<=
![Page 66: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/66.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Power Compiler also can do this ...
10-20% push-buttonpower savings, using techniques like this one.
![Page 67: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/67.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Data-Dependent Processing
Design Technique #5 (of 6)
![Page 68: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/68.jpg)
Example: Video Decode Transform
Most of the time, the inputs flip between small positive and negative integers.
In 2’s complement, wastes power:
+1: 0b00001-1: 0b11110
![Page 69: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/69.jpg)
Solution: Add bias value to all inputs
30+% power reduction for a bias of 64. For this linear transform, correcting the output for the bias is trivial.
![Page 70: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/70.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Thermal Management
Design Technique #6 (of 6)
![Page 71: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/71.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
Keep chip cool to minimize leakage power
A recipe for thermal runaway
![Page 72: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/72.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
IBM Power 4: How does die heat up?
4 dies on amulti-chip
module
2 CPUsper die
![Page 73: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/73.jpg)
UC Regents Spring 2014 © UCBCS 152: L7: Power and Energy
115 Watts: Concentrated in “hot spots”
Fixedpointunits
Hot spots
Cachelogic
66.8 C == 152 F 82 C == 179.6 F
![Page 74: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/74.jpg)
Idea: Monitor temperature, servo clock speed
TDP = Thermal Design Point
![Page 75: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/75.jpg)
iPad Air, iPad Mini retina, and iPhone
5S all use the A7
Repeatedly running the same benchmark on three Apple
products.
The TDP of each form factor dictates how long it can run at
“top speed”TDP = Thermal Design Point
![Page 76: 2014-2-11 John Lazzaro (not a prof - “John” is always OK)](https://reader035.vdocuments.net/reader035/viewer/2022062807/568150cf550346895dbef339/html5/thumbnails/76.jpg)
On Thursday
Time to market via chip verification.
Have fun in section !