
Scaling, Power and the Future of CMOS

Mark Horowitz, Elad Alon, Dinesh Patil, Stanford

Samuel Naffziger, Rajesh Kumar, Intel

Kerry Bernstein, IBM

Slide 2

A Long Time Ago

In a building far away

A man made a prediction

On surprisingly little data

That has defined an industry

Slide 3

Moore’s Law

Slide 4

CMOS Computer Performance

[Figure: performance on a log scale (1 to 10,000) versus year, 85 to 07. Processors plotted: Intel 386, 486, Pentium, Pentium 2, Pentium 3, Pentium 4, Itanium; Alpha 21064, 21164, 21264; Sparc, SuperSparc, Sparc64; Mips; HP PA; Power PC; AMD K6, K7, x86-64]

Slide 5

Moore’s Original Issues

• Design cost

• Power dissipation

• What to do with all the functionality possible

ftp://download.intel.com/research/silicon/moorespaper.pdf

Slide 6

Outline

• How designers will deal with poor power scaling

• Origins of the power problem

• An optimization perspective

• Low power circuits and architectures

• Cost of variability

• Future scenarios

• What device characteristics matter (to me)

Slide 7

The 80’s Power Problem

• Until mid 80s technology was mixed

• nMOS, bipolar, some CMOS

• Supply voltage was not scaling / power was rising

• nMOS, bipolar gates dissipate static power

From Roger Schmidt, IBM Corp

Slide 8

Solution: Move to CMOS

• And then scale Vdd

[Figure: feature size (µm) and Vdd versus time, Jan-85 to Jan-03; log axis from 0.1 to 10]

Slide 9

Scaling MOS Devices

• In this ideal scaling

• V scales to αV, L scales to αL

• So C scales to αC, i scales to αi (i/µm is stable)

• Delay = CV/I scales as α

• Energy = CV² scales as α³

JSSC Oct 74, pg 256
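The scaling rules on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the shrink factor of 0.7 and the function names are assumptions, and the fixed-Vdd case simply holds V constant while C still shrinks.

```python
def ideal_scaling(alpha):
    """Scale factors for one generation of ideal constant-field scaling."""
    v = alpha  # V scales to alpha * V
    c = alpha  # C scales to alpha * C
    i = alpha  # i scales to alpha * i
    return {
        "delay": c * v / i,    # Delay = CV/I  scales as alpha
        "energy": c * v ** 2,  # Energy = CV^2 scales as alpha^3
    }


def energy_scale(alpha, v_scale):
    """Energy = CV^2 scale factor when C scales by alpha and V by v_scale."""
    return alpha * v_scale ** 2


if __name__ == "__main__":
    alpha = 0.7  # assumed shrink per generation, for illustration only
    print(ideal_scaling(alpha))
    print("energy factor, Vdd scaled:", energy_scale(alpha, alpha))
    print("energy factor, Vdd fixed: ", energy_scale(alpha, 1.0))
```

With Vdd scaled, energy per operation falls by roughly 3x per generation in this model; with Vdd held fixed it falls only by the capacitance factor.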

Slide 10

Processor Power

• Continued to grow, even when Vdd was scaled

[Figure: processor power versus year, 85 to 03; log axis from 1 to 100]

Slide 11

Why Power Increased

• Growing die size, fast frequency scaling

[Figure: clock frequency (MHz) versus year, 85 to 05; log axis from 10 to 10,000]

Slide 12

Good News

• Die growth & super frequency scaling have stopped

[Figure: cycle time in FO4 versus year, 85 to 05; log axis from 10 to 100]

Slide 13

Processor Power

• They were high power too

[Figure: processor power versus year, 85 to 03; log axis from 1 to 100]

Slide 14

Bad News

• Voltage scaling has stopped as well

• kT/q does not scale

• Vth scaling has power consequences

• If Vdd does not scale

• Energy scales slowly

Ed Nowak, IBM

Slide 15

Energy – Performance Space

• Every design is a point on a 2-D plane

[Figure: design points on a plane with axes Performance and Energy]

Slide 18

Trade-offs for an Adder

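[Figure: trade-off curves for adder designs on a log axis from 10^1 to 10^5. Curves: Stat. Carr. Chain, Stat. Carr. Sel, Stat. KS, Dom. BK, Dom. KS, Dom. LF, Stat. LF, Stat. BK, Stat. 84421 (Stat. = static, Dom. = domino)]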

Slide 19

Key Observation:

• Define the Energy/Delay sensitivity of a parameter

• For example Vdd:

$$\mathrm{Sens}(V_{dd}) = -\left.\frac{\partial E / \partial V_{dd}}{\partial D / \partial V_{dd}}\right|_{V_{dd} = V_{dd}^{*}}$$

• At optimal point, all sensitivities should be the same

• Must equal the slope of the Pareto optimal curve
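To make the sensitivity definition concrete, here is a minimal numerical sketch. The energy and delay models are assumptions chosen for illustration (E = C·Vdd² and an alpha-power-law delay with Vth = 0.3 and exponent 1.3); they are not taken from the slides.

```python
def energy(vdd, c=1.0):
    """Switching energy, E = C * Vdd^2 (arbitrary units)."""
    return c * vdd ** 2


def delay(vdd, vth=0.3, a=1.3, k=1.0):
    """Gate delay under an assumed alpha-power-law model (arbitrary units)."""
    return k * vdd / (vdd - vth) ** a


def sensitivity(vdd, h=1e-6):
    """Sens(Vdd) = -(dE/dVdd) / (dD/dVdd), estimated by central differences."""
    de = (energy(vdd + h) - energy(vdd - h)) / (2 * h)
    dd = (delay(vdd + h) - delay(vdd - h)) / (2 * h)
    return -de / dd


if __name__ == "__main__":
    for v in (0.8, 1.0, 1.2):
        print(f"Vdd={v:.1f}  Sens={sensitivity(v):.3f}")
```

In this toy model the sensitivity grows with Vdd: at high supply, each unit of delay bought costs more energy, which is why the optimal Vdd depends on where the design sits on the curve.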

Slide 20

What This Means

• Vdd and Vth are not directly set by scaling

• Instead set by slope of Pareto optimal curve

• Leakage rose to lower total system power!

[Figure: Pareto optimal curve on a log axis from 10^1 to 10^5, annotated with operating points [Vdd=0.68, VtN=0.31], [Vdd=1, VtN=0.30], [Vdd=1.2, VtN=0.26], [Vdd=1.3, VtN=0.22]]

Slide 21

Low Power Design Techniques

Three main classes of methods to reduce energy:

• Cheating

• Reducing the performance of the design

• Reducing waste

• Stop using energy for stuff that does not produce results

• Stop waiting for stuff that you don’t need (parallelism)

• Problem reformulation

• Reduce work (less energy and less delay)

Slide 22

Cheating

• Many low-power papers talk only about energy

• Don’t consider performance

• Reducing performance can always reduce energy

• But there are many ways to reduce performance

• Good technique must lower the optimal curve

• “Sensitivity” of technique

• Must be better than current curve

• This depends on location on the curve

Slide 23

Reducing Energy Waste

• Clock gating

• If a section is idle, remove clock

• Removes clock power

• Prevents any internal node from transitioning

• Create system power states

• Turn on subsystems only when they are needed

• Can have different “off” states

• Power vs. wakeup time

• Disk (do you stop it from spinning?)

Slide 24

Embedded Power Gating

• Can reduce leakage

• 250x reported

• But costs

• Performance

• Drop in Vdd, Gnd

• Since transistors still leak when power is off

[Figure: embedded power switches placed among rows of standard cells, with power switch control signals]

Royannez, et al, 90nm Low Leakage SoC Design Techniques for Wireless Applications, ISSCC 2005

Slide 25

Range of Applicability

• Power supply gating

• Done to remove leakage power

• But slows down the circuit

• Adds series resistance to the supply

[Figure: Energy/op versus Performance. Annotation: makes circuit worse when energy sensitivity is high]

Slide 26

Parallelism

• If the application has data parallelism

• Parallelism is a way to improve performance

• With low additional energy cost
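One way to see why parallelism is cheap in energy: N units can each run N times slower at equal throughput, which permits a lower Vdd. The sketch below assumes an alpha-power-law delay model (Vth = 0.3, exponent 1.3), energy/op proportional to Vdd², and no overhead for the extra units; all of these are illustrative assumptions rather than figures from the slides.

```python
def delay(vdd, vth=0.3, a=1.3):
    """Gate delay under an assumed alpha-power-law model (arbitrary units)."""
    return vdd / (vdd - vth) ** a


def vdd_for_parallel(n_units, v_nom=1.0, vth=0.3, a=1.3):
    """Lowest Vdd at which n_units slower units match one unit at v_nom."""
    target = n_units * delay(v_nom, vth, a)  # each unit may be n times slower
    lo, hi = vth + 1e-6, v_nom
    for _ in range(60):  # bisection; delay falls as Vdd rises
        mid = 0.5 * (lo + hi)
        if delay(mid, vth, a) > target:
            lo = mid  # too slow, need more voltage
        else:
            hi = mid  # fast enough, try lower voltage
    return hi


def energy_per_op_ratio(n_units, v_nom=1.0):
    """Energy/op relative to one unit at v_nom, taking E/op ~ Vdd^2."""
    return (vdd_for_parallel(n_units, v_nom) / v_nom) ** 2


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, round(vdd_for_parallel(n), 3), round(energy_per_op_ratio(n), 3))
```

The same trade can be spent the other way: keep Vdd and take the extra throughput at roughly constant energy/op.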

Slide 27

Existing Processors

[Figure: Watts/(Spec*L*Vdd²) versus Spec2000*L for existing processors, log-log axes. Annotation: 10 processors working in parallel]

Slide 28

Parallel Server Chip

• Power 5 from IBM

Slide 29

Problem Reformulation

• Best way to save energy is to do less work

• Energy directly reduced by the reduction in work

• But required time for the function decreases as well

• Convert this into extra power gains

• Shifts the optimal curve down and to the right

[Figure: Energy/op versus User Performance]

Slide 30

Cost of Variation

• Variability changes position of the optimal curves

• Need to margin Vth, Vdd to ensure circuit always works

[Figure: Energy versus Performance, log-log axes, with optimal curves for ∆Vth = 0 mV and ∆Vth = 120 mV]

Slide 31

Partial Compensation

• Adjust Vdd after you get part back

• Compensates very well for small deviations in Vth

[Figure: Energy versus Performance, log-log axes, with optimal curves for ∆Vth = 0 mV and ∆Vth = 120 mV]

Slide 32

Reducing Voltage Margins

• At test time determine Vdd for that part

• Have private DC-DC converter already

[Figure annotation: 20%]

Slide 33

Variable Application Demands

• Try to provide a couple of operating points

• Application can control speed and energy

• Hard question is what are valid Vdd, F pairs

• Usually determined during test

• Dynamic voltage scaling

• Intel SpeedStep in laptop processors

• 2 performance/power points

• Transmeta LongRun Technology

• Many operating points. Test data + formula
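Once valid Vdd, F pairs are known, choosing an operating point is a small lookup. The sketch below is hypothetical: the table values and the function name are made up for illustration and do not describe SpeedStep or LongRun.

```python
# Hypothetical (Vdd in volts, F in MHz) pairs, as might be determined during test.
OPERATING_POINTS = [(0.9, 300), (1.0, 500), (1.1, 700), (1.2, 900)]


def pick_operating_point(required_mhz, points=OPERATING_POINTS):
    """Return the lowest-Vdd (Vdd, F) pair with F >= required_mhz, or None."""
    feasible = [p for p in points if p[1] >= required_mhz]
    return min(feasible, key=lambda p: p[0]) if feasible else None


if __name__ == "__main__":
    print(pick_operating_point(600))   # lowest-voltage point that still meets 600 MHz
    print(pick_operating_point(1000))  # None: no point is fast enough
```

A scheme with many operating points would replace the table with test data plus a formula, but the selection rule stays the same: the lowest voltage that meets the demand.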

Slide 34

Constant Power Scaling

• Foxton controller on next-gen. Itanium II

• Raises Vdd/boosts F when most units idle

• Lowers Vdd for parallel code to stay in budget

Slide 35

Self Checking Hardware

• Razor (Austin/Blaauw, U of Mich)

• Use the actual hardware to check for errors

• Latch the input data twice

• Once on the clock edge, and then a little later

• If the data is not the same, you are going too fast
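The double-sampling check can be modeled behaviorally. This is a timing-level sketch under assumed names and units, not the Razor circuit itself: the main flip-flop captures at the clock edge, the shadow latch captures a little later, and a mismatch flags an error.

```python
def razor_sample(arrival_time, clock_edge, shadow_delay):
    """Model one Razor flip-flop capture.

    The main flip-flop holds the new data only if it arrived by clock_edge;
    the shadow latch samples shadow_delay later. Returns (main_ok, error).
    Data arriving after the shadow sample is missed by both, so it is
    not detected in this model.
    """
    main_has_new = arrival_time <= clock_edge
    shadow_has_new = arrival_time <= clock_edge + shadow_delay
    error = main_has_new != shadow_has_new
    return main_has_new, error


if __name__ == "__main__":
    print(razor_sample(0.9, 1.0, 0.2))  # on time: no error
    print(razor_sample(1.1, 1.0, 0.2))  # late but caught by the shadow latch
    print(razor_sample(1.3, 1.0, 0.2))  # later than the shadow sample: undetected
```

The third case shows why the delayed clock bounds how aggressively the chip can be run: paths slower than the shadow sample escape detection.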

[Figure: Razor flip-flop. A main flip-flop clocked by clk and a shadow latch clocked by clk_del both sample the input; a comparator on their outputs drives the error signal]

Slide 36

With Error Recovery

• Can run the chip so it makes some errors

• Chip gets right answer 99.9% of the time

• 0.1% of the time, the chip must rerun operation

[Figure: voltage at 0.1% error rate (y) versus voltage at first failure (x), both axes 1.4 to 1.8, one point per chip, with linear fit y = 0.78685x + 0.22117]
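The linear fit from the plot gives a quick estimate of what tolerating errors buys. In the sketch below, the fit coefficients come from the slide; the assumptions that energy scales as Vdd² and that each error costs one extra operation are mine, for illustration only.

```python
def vdd_at_error_rate(v_first_failure):
    """Linear fit from the slide: voltage at 0.1% error rate."""
    return 0.78685 * v_first_failure + 0.22117


def relative_energy(v_first_failure, error_rate=0.001, rerun_cost=1.0):
    """Energy/op at the 0.1% error point relative to the first-failure point.

    Assumes energy ~ Vdd^2 and that each error costs rerun_cost extra
    operations (both are assumptions, not from the slides).
    """
    v = vdd_at_error_rate(v_first_failure)
    return (v / v_first_failure) ** 2 * (1.0 + error_rate * rerun_cost)


if __name__ == "__main__":
    x = 1.6  # example first-failure voltage within the plotted range
    print(vdd_at_error_rate(x), relative_energy(x))
```

For a chip whose first failure is at 1.6, the fit puts the 0.1% error point near 1.48, and the rerun overhead barely dents the energy saved by the lower voltage.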

Slide 37

Adjusting Vth

• In theory want to adjust Vth too

• Very hard to do with modern transistors

[Figure: Leakage vs Vdd with Different Body Bias. Leakage versus Vdd (V), 0.9 to 1.35, with one curve per body bias from Vbb=0 to Vbb=0.8 in steps of 0.1]

Slide 38

Future Systems

• Some simple math

• Assume scaling continues

• Dies don’t shrink in size

• Average power/gate must decrease by 2x / generation

• Since gates are shrinking in size

• Get 1.4x from capacitive reduction

• Where is the other factor of 1.4x ?
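The arithmetic behind the missing factor is short enough to write out. This sketch treats the slide's 1.4x as √2, which is an assumption about the rounding; the function name is illustrative.

```python
import math


def remaining_factor(required=2.0, from_capacitance=math.sqrt(2.0)):
    """Power/gate reduction still needed after the capacitance gain."""
    return required / from_capacitance


if __name__ == "__main__":
    # With twice as many gates on a same-size die and a fixed power budget,
    # power per gate must halve; capacitance scaling supplies only part of it.
    print(remaining_factor())          # about 1.41 when 1.4x is read as sqrt(2)
    print(remaining_factor(2.0, 1.4))  # about 1.43 using the rounded 1.4x
```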

Slide 39

Exploit Parallelism / Scale Vdd

• If you have parallelism

• Add more function units

• Fill up new die (2x)

• Lower energy/op

• ∆E/∆P will decrease

• Vdd, sizes, etc will reduce

• Build simpler architectures

• Works well when ∆E/∆P is large

• Per unit performance decrease is small

[Figure: Energy/op versus Performance]

Slide 40

Exploit Specialization

• Optimize execution units for different applications

• Reformulate the hardware to reduce needed work

• Can improve energy efficiency for a class of applications

• Stream / Vector processing is a current example

• Exploit locality, reuse

• High compute density

[Figure: Imagine block diagram (Bill Dally et al, Stanford): µ-controller, Clusters, SRF, Memory System, HI, SC, NI]

Slide 41

Exploit Integration

• If both those techniques don’t work

• Still can increase integration by at least 1.4x

• Moving units onto one chip

• Reduces the number of I/Os on system

• I/O can take significant power today

• Allows even larger integration

Slide 42

TI - OMAP2420

• Specialization

• And power domains

• Most units are off

• OMAP 2420

• 5 Power Domains

• #1: MCU Core

• #2: DSP Core

• #3: Graphic Accelerator

• #4: Core + Periph.

• #5: Always On logic

Royannez, et al, 90nm Low Leakage SoC Design Techniques for Wireless Applications, ISSCC 2005

Slide 43

Low-Power PowerPC

Nowka et al., Low-power PowerPC, ISSCC

Slide 44

What All This Means

• As long as $/function and cap continue to scale

• Moving to the new technology will be profitable

• And will allow designs to be better systems

• In the worst case, active die area will decrease

• Scale gates by the decrease in gate capacitance

• In most cases, we will do much better

• But how to optimize devices in this new domain?

Slide 45

Radical Idea:

• Scaling channel length may no longer be critical

• I still want small (i.e. dense) devices

• But I also want lower variations & external control of Vth

• Longer Leff may actually improve energy efficiency

• Less variability → lower energy penalty

• Especially as we move to lower performance (parallelism)

[Figure: Energy versus Performance, log-log axes, comparing Lnom with Lnom + 10%; curves labeled 120 mV, 110 mV, 55 mV, 60 mV]

Slide 46

Conclusions

• Unfortunately power is an old problem

• Magic bullets have mostly been spent

• Power will be addressed by application-level optimization, parallelism/specialized functional units, and more adaptive control

• Need to rethink scaling

• Still makes things cheaper

• But what do we want from scaled transistors?

Slide 47

Technology Scaling

Seems simple:

• Every 1 / 1.5 / 2 years

• Number of transistors doubles

• Transistors get faster

• Gates become lower power (CMOS)

• Life just gets better and better

Slide 48

Reality is a Little Different

• While scaling has been smooth

• Almost nothing else has been

• Device and circuit technology has changed

• DTL, ECL, TTL, pMOS, nMOS, CMOS

• Power periodically becomes a critical issue

• It is critical again

Slide 49

nMOS, TTL, ECL Were King

• 1978 – Started in VLSI

• First design was bipolar/ECL

• 3µm nMOS was hot

[Figure: Intel 8086, Intel 286, DEC µVax, BIT Sparc, HP Focus]

Slide 50

MOS Scaling Was Understood

• MOS devices operate on electric fields

• If E fields are the same

• Relation between E and J is the same

• So if all voltages and lengths scale

• iV curve retains the same shape, scaled in V

Bob Dennard worked all the math in 74

JSSC Oct 74, pg 256

Slide 51

Dilemma

• Processors today are power limited

• As are many other chips

• Technology scaling will not save us

• With Vdd fixed, energy scaling will be modest

• How does one build more powerful processors?

• Or other types of chips

When constrained, optimize!

Slide 52

Optimizing the Right Thing

• Given systems are power limited

• Highest performance system is not interesting

• Will dissipate too much power

• Lowest energy solution is also not interesting

• Will not have enough performance

• Want constrained optimization

• Highest performance for 20 Watts

• Lowest power for 100 SPEC
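The two constrained questions on this slide map directly onto two small selection functions. The design points below are hypothetical numbers invented for illustration; only the 20 W and 100 SPEC constraints come from the slide.

```python
# Hypothetical design points: (name, performance in SPEC, power in watts).
DESIGNS = [("A", 60, 8), ("B", 100, 18), ("C", 120, 25), ("D", 150, 45)]


def fastest_within_power(designs, max_watts):
    """Highest-performance design whose power fits the budget, or None."""
    ok = [d for d in designs if d[2] <= max_watts]
    return max(ok, key=lambda d: d[1]) if ok else None


def lowest_power_meeting(designs, min_spec):
    """Lowest-power design that reaches the performance target, or None."""
    ok = [d for d in designs if d[1] >= min_spec]
    return min(ok, key=lambda d: d[2]) if ok else None


if __name__ == "__main__":
    print(fastest_within_power(DESIGNS, 20))   # highest performance for 20 Watts
    print(lowest_power_meeting(DESIGNS, 100))  # lowest power for 100 SPEC
```

Neither the unconstrained fastest design (D) nor the unconstrained lowest-energy design (A) is the answer to either question.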

Slide 53

Leakage Trends

[Figure: active power density and subthreshold power density versus gate length (µm)]

Slide 54

Design Parameters To Adjust

• Circuit (sizing, supply, threshold)

• Circuit topology (adder: CLA, CSA, …)

• Logic style (domino, pass-gate, …)

• Micro-architecture (pipelining, cache design, branch architecture, etc.)

[Figure: Energy/op versus Performance]

Slide 55

Energy Efficient Designs

• Are on the Pareto optimal curve

• On this curve design parameters are constrained

[Figure: Energy/op versus Performance with the Pareto optimal curve; one side of the curve is labeled infeasible, the other wasting energy]
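Extracting the Pareto optimal curve from a set of candidate designs is a simple dominance filter. The sketch below assumes each design is a (performance, energy per op) pair, with higher performance and lower energy being better; the sample points are invented.

```python
def pareto_front(points):
    """Return the (performance, energy_per_op) points not dominated by another.

    A point is dominated if some other point has performance >= and
    energy <= it, and differs in at least one of the two.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] <= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)


if __name__ == "__main__":
    designs = [(1, 1), (2, 2), (2, 3), (3, 5), (1, 2)]
    print(pareto_front(designs))
```

Points that are filtered out, such as (2, 3) here, are the designs that waste energy: another design delivers the same performance for less.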

Slide 56

Leakage Energy

• Matching marginal costs for Vdd and Vth

[Figure: two panels on a log axis from 10^-2 to 10^-1 (activity factor): leakage ratio, 0 to 0.5; and voltage, 0 to 1.2, with curves for Vdd and Vth]

Slide 57

Measured Leakage Data

[Figure: measured leakage ratio, axis from 0 to 0.5]

Slide 58

IBM Cell Processor

Slide 59

Vth Variation

• Since leakage is exponential in Vth

• Average Vth for leakage is not the expected Vth

[Figure: two panels. Cumulative probability (0 to 1) versus relative leakage (5 to 40); relative leakage contribution (0 to 2.5) versus Vth (0.2 to 0.5), with curves labeled Leakage and Vth]
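The effect of an exponential dependence can be quantified in closed form if Vth is assumed to be Gaussian. The sketch below assumes leakage proportional to exp(-Vth / (n·kT/q)) with n = 1.5 and kT/q = 25.9 mV; those values, the Gaussian assumption, and the function names are illustrative, not from the slides.

```python
import math


def leakage_penalty(sigma_vth, n=1.5, kt_q=0.0259):
    """Mean leakage divided by leakage at the mean Vth, for Gaussian Vth.

    Follows from the log-normal mean: E[exp(-V/s)] = exp(-mu/s + sigma^2/(2 s^2)).
    """
    s = n * kt_q
    return math.exp(sigma_vth ** 2 / (2.0 * s ** 2))


def effective_vth_shift(sigma_vth, n=1.5, kt_q=0.0259):
    """How far below the mean Vth the leakage-equivalent Vth sits (volts)."""
    return sigma_vth ** 2 / (2.0 * n * kt_q)


if __name__ == "__main__":
    sigma = 0.030  # assumed 30 mV standard deviation
    print(leakage_penalty(sigma), effective_vth_shift(sigma))
```

Under these assumptions, the low-Vth tail dominates total leakage, so the chip leaks as if every device had a Vth somewhat below the mean.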

Slide 60

How Else to Save Energy?

• Running faster than needed wastes energy

• Forces you to run higher on performance curve

• Why do you run faster than needed?

• Need margins to account for variability

• From application, environment, or technology

Variations cause waste

Slide 61

Dynamic Voltage Scaling

Burd et al ISSCC 2000

Slide 62

Dynamic Voltage Scaling

• Dynamic voltage scaling

• Adjusts Vdd to the “right” value for desired performance

• Big problem is how to find the “right” Vdd

• Need to know the relationship between Vdd and F

• Need to have a circuit that matches the critical path

• How do you do this with variations?
