the return of silicon efficiency - page d'accueil / lirmm ... · arith18 montpellier june 07...

Post on 21-Jun-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Arith18 Montpellier June 07 simon@icerasemi.com

The Return of Silicon Efficiency

Simon KnowlesIcera

Arith18 Montpellier June 07 simon@icerasemi.com

AGENDA

• Chip Market• Design Objectives• Modern Transistors• Optimality of Designs• The Design Hull• Implications for Design

Arith18 Montpellier June 07 simon@icerasemi.com

“The Cheap will Outsell the Good”

Customer Priorities

1. Availability2. Price

Features3. Power

Size

Design Choices

1. Hardware � Software2. Process Technology3. MHz and Volts4. Custom � Synthesized

Arith18 Montpellier June 07 simon@icerasemi.com

The Chip Market

Most logic chips are processors or processor-centric.

Computers have driven MOS chip evolution – the first“product superclass”.

The second product superclass will be “cellphones”…• Consumer driven – more cost sensitive than computers.• High mobility – more power sensitive than computers.• High (specialized) performance – eg. HSDPA cellular

modem ~50GOPs.

ICs $211bnMOS $174bn

Memory $59bnMicro $54bn Logic $60bnMPU ASSP

SIA 2006

FPGAASICDSPMCU (inc GPU)

0

500

1000

1500

'04 '05 '06 '07 '08 '09

Cellphone units:

Arith18 Montpellier June 07 simon@icerasemi.com

What’s New(-ish) in Logic Chip Physics?

Small dopant populations (VT variability)

Subthreshold leakage (VT reduction is constrained, so Vdd likewise)

Gate leakage (end of SiO2 Tox scaling)

Channel Stress (engineered and STI-induced)

OPC rules

Well Proximity Effect

Line Edge Roughness (gate and wire)

NBTI (RAM VT drift)

All important for designers, but not (directly) the main influencers of design evolution

Arith18 Montpellier June 07 simon@icerasemi.com

Top 2 Design Influencers

1. Most power-sensitive designs will set their own supply voltage.

• Design at typical, turn manufacturing speed spread (functionality risk) into power efficiency spread (battery life).

• Offset manufacturing leakage spread against speed spread.

• Dynamically trade performance vs power.

• Disconnect power in standby (efficient regulators are switch-mode).

2. Processors are evolving to displace fixed-function hardware

• Target multiple market sockets per design.

• Hardware-efficient (dynamic re-purposing of resources).

• Mitigation of specification uncertainty.

• Work around bugs by software change.

• Well-practiced idiom for speed.

Arith18 Montpellier June 07 simon@icerasemi.com

Modern Transistors

Arith18 Montpellier June 07 simon@icerasemi.com

Modern Transistors

ITRS distinguishes 3 roadmaps…

• High Performance (HP)

• Low Operating Power (LOP)

• Low Standby Power (LSTP)

VT (mV)Tox (nm)Lg (nm)

5241.945LSTP

2851.232LOP

1651.125HP

ITRS2005 for 65nm 2007

“cellphones”

computers

TSMC offers 12 logic T’s at 65nm…

• 2 x HP

• 6 x LOP

• 4 x LSTP

Arith18 Montpellier June 07 simon@icerasemi.com

65nm LOP DelayN

orm

aliz

ed �

Temp (°C)Vdd (V)

high VT

mid VT

low VT

Arith18 Montpellier June 07 simon@icerasemi.com

65nm LSTP vs LOP Delay

LST

P �

/ LO

P �

Temp (°C)Vdd (V)

Arith18 Montpellier June 07 simon@icerasemi.com

65nm LOP Sub-Threshold LeakageN

orm

aliz

ed I S

UB

Temp (°C)Vdd (V)

high VT

mid VT

low VT

Arith18 Montpellier June 07 simon@icerasemi.com

65nm LSTP vs LOP Sub-Threshold LeakageLS

TP

I SU

B/ L

OP

I SU

B

Temp (°C)Vdd (V

)

Arith18 Montpellier June 07 simon@icerasemi.com

High Temp ISUB Still Dominates Ig at 65nm (LOP and LSTP)

Temp (°C)Vdd (V

)

(LOP high VT)

Nor

mal

ized

I SU

B, I

g

Arith18 Montpellier June 07 simon@icerasemi.com

LOP 65nm ISUB vs 90nm ISUB65

nm I S

UB

/ 90n

m I S

UB

Temp (°C)Vdd (V

)

(LOP high VT)

Arith18 Montpellier June 07 simon@icerasemi.com

LOP 65nm Ig vs 90nm Ig65

nm I g

/ 90n

m I g

Temp (°C)Vdd (V

)

(LOP high VT)

Arith18 Montpellier June 07 simon@icerasemi.com

Manufacturing Spread: 65nm mid VT LOP DelayN

orm

aliz

ed �

Temp (°C)Vdd (V)

SS

FF

Arith18 Montpellier June 07 simon@icerasemi.com

Manufacturing Spread: 65nm mid VT LOP ISUBN

orm

aliz

ed I S

UB

Temp (°C)Vdd (V

)

SS

FF

Arith18 Montpellier June 07 simon@icerasemi.com

Optimality of Design

Arith18 Montpellier June 07 simon@icerasemi.com

Power, Money and Performance

Excellent recent work on Power vs Performance (among many)…

• Srinivasan, Brooks, Gschwind, Bose, Zyuban, Strenski and Emma, “Optimimum Pipelines for Power and Performance”, Micro ’02.

• Zyuban and Strenski, “Unified Methodology for Resolving Power-Performance Tradeoffs at the Microarchitectural and Circuit Levels”, ISLPED ‘02.

• Harstein, Puzak, “Optimum Power/Performance Pipeline Depth”, Micro ’03.

• Dao, Zeydel, Oklobdzija, “Energy Optimization of Digital Pipelined Systems Using Circuit Sizing and Supply Scaling”, Trans. VLSI Systems ‘05.

But outside the computer market, few chips sell on MHz.

Most designs set out to minimize cost (and power) at fixed performance…

� Decode an MPEG video without glitching.

� Execute an ADSL modem at 8Mbps.

� etc…

Arith18 Montpellier June 07 simon@icerasemi.com

Design Levers

1. Pipelining

2. Replication (parallelism)

3. Speculation

4. Logical redundancy (eg. carry-skip adder)

5. Arithmetic redundancy (eg. carry-save)

6. Vdd

7. VT

8. tOX

9. Lg

10. Manhattan cell height (transistor sizing)

11. Wire geometry

12. Wire screening

13. Structured layout (wire control)

14. Logic circuit engineering

15. Clock circuit engineering

16. Package engineering

(micro)architecture

transistors

geometry

circuits

Arith18 Montpellier June 07 simon@icerasemi.com

Electro-Thermal Model

PackageRSUPPLY

LeakageModel

CapacitanceModel

DeviceDelayModel

Package�JA

+•+

N�

freq

TA

TJ

�req

Vpins

Idd

Istat

Idyn

Required Vdd

Pco-packaged

Ptot

(thermal loop)

PipelineModel

Arith18 Montpellier June 07 simon@icerasemi.com

Pipeline Model

Flux � = nC/τC (T/i/b)

Performance � = KnCf (T/s)

Cost N = K(nC+nS) (T)

Frequency f = 1/τ� (Hz)

nC nS

τC τS

Clock period τ

K “pipettes”…

1 bitAverage nC transistors of combinatorial logic per pipe stage per pipeline bit.

nS transistors per pipeline latch.

Units of τ are “i” � “FO4”; 1i = � seconds.

Arbitrary width/depth mix of pipettes make up the pipelined logic.

RAM

non-pipeline latches

pipelined logic

IO (f

ixed

Vdd

)

Define…

Chip…

Arith18 Montpellier June 07 simon@icerasemi.com

Reality not Included…

• Performance � (TCOMB/s) tends to over-state effective performance at low τ, because logical redundancy is introduced to speed up logic paths and mitigate latency by speculation.

• Flux � (T/i/b) varies a lot locally; tends to rise at low τ for the same redundancy reason, and because pipe cuts have less freedom to avoid high-flux logic.

PrefixAdder

1 wire/bit

3 wires/bit2 wires/bit

Local �sensitivity

Eg. adders…

01234567

flux

at o

utpu

t

Ladner-Fischer Kogge-Stone

0

5

10

15

20

8b 16b 32b 64b

dela

y (i)

Arith18 Montpellier June 07 simon@icerasemi.com

Pipeline Flux in Icera’s DXP®

-

1

2

3

4

All logic macros except regfiles

Flux(T/i/b)

� ~ 3-4 for arithmetic and random translation logic.

� ~ 1-2 for steering logic.

Arith18 Montpellier June 07 simon@icerasemi.com

Required � to meet specified Performance � and Cost N

( )����

���

ττΦ+1�

���

τΨ = ∆

���������� �

��

Specification

Latch delay 4iLatch cost 40TPerformance 10PT/sDesign Cost 10MT

0

5

10

15

20

25

30

35

40

4 9 14 19 24 29 34 39 44 49 54 59 64 69 74

Pipe stage delay τ (number of �’s)

Req

uire

d �

(ps)

� = 2

� = 4

Arith18 Montpellier June 07 simon@icerasemi.com

Constant Flux Implications (� = 3, τS = 4i)

0

5

10

15

20

25

30

35

8 13 18 23 28 33 38 43 48 53 58 63 68 73 78

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

8 13 18 23 28 33 38 43 48 53 58 63 68 73 78

0

5

10

15

20

25

30

35

0.00 1.00 2.00 3.00 4.00 5.00

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

0.00 0.50 1.00 1.50 2.00 2.50 3.00 3.50 4.00 4.50 5.00

Pipe stage delay τ

Req

uire

d �

(ps)

Com

bina

toria

l fra

ctio

n

Clock frequency (GHz)

Transistors

Delay

Arith18 Montpellier June 07 simon@icerasemi.com

DXP® Logic Stage Population, by #Transistors

Clock distrib.(mostly invs)

All other logic <0.5%

Regfiles(latch-like)

22%

29%

18%

11%

9%

8%

(mostly pipeline)Manhattan cells are ubiquitous in all digital design methodologies.

Complex cells can be important for performance but account for negligible % of total transistors.

Designs are characterized (capacitance, leakage, delay, area) by a few simple Manhattan stages.

Arith18 Montpellier June 07 simon@icerasemi.com

The Design Hull

Arith18 Montpellier June 07 simon@icerasemi.com

Example Design Spec

Performance � = 15PT/sFlux � = 4T/i/bLatch Cost nS = 40TLatch Delay τS = 4iNon-Pipe Logic = 5MTRAM = 1MBLatch Activity = 3x Combinatorial ActivityRAM Activity = 0.2x Combinatorial Activity65nm high VT LOP ProcessAmbient Temp = 85°CMax Die Temp = 125°CPackage �JC = 15°C/WSupply R = 50m�Fixed Parasitic Power = 100mWVdd Limit Range = 0.7V – 1.2V

Arith18 Montpellier June 07 simon@icerasemi.com

Design Hull

Pipe Stage Depth (i)

Pow

er (m

W)

Pipe Cost (MT)

Vdd Ceiling

Thermal Ceiling

Vdd Floor

from here Min Power Pipes

Arith18 Montpellier June 07 simon@icerasemi.com

Feasible DesignsP

ipe

Sta

ge D

epth

(i)

Pipe Cost (MT)

Vdd

Cei

ling

Thermal Ceiling (Thermal Ceiling)

Arith18 Montpellier June 07 simon@icerasemi.com

Minimum Power PipelinesP

ipe

Sta

ge D

epth

ττ ττ(i)

for M

inim

um P

ower

Pipeline Cost (MT)

Vdd

@ fl

oor

Vdd above floor

Arith18 Montpellier June 07 simon@icerasemi.com

The Cost of Minimizing Power

Min

imum

Fea

sibl

e P

ower

(mW

)

Pipeline Cost (MT)

Clearly a design choice

Arith18 Montpellier June 07 simon@icerasemi.com

Minimum Power PipelinesP

ower

(mW

)

Pipe Stage Depth ττττ (i)

13MT

15MT

17MT19MT

Arith18 Montpellier June 07 simon@icerasemi.com

Sensitivity to Pipeline FluxP

ipe

Sta

ge D

epth

ττ ττ(i)

for M

inim

um P

ower

Pipeline Cost (MT)

� = 2

� = 3

� = 4

Arith18 Montpellier June 07 simon@icerasemi.com

LOP vs LSTP

Min

imum

Fea

sibl

e P

ower

(mW

)

Pipeline Cost (MT)

LOP high VT

LSTP mid VT

Arith18 Montpellier June 07 simon@icerasemi.com

LOP vs LSTPP

ipe

Sta

ge D

epth

ττ ττ(i)

for M

inim

um P

ower

Pipeline Cost (MT)

LOP high VT

LSTP mid VT

Arith18 Montpellier June 07 simon@icerasemi.com

Feasible Solutions in Cost-Frequency Space

Clo

ck F

requ

ency

(GH

z)

Pipeline Cost (MT)

65nm LOP high VT

Arith18 Montpellier June 07 simon@icerasemi.com

Implications for Design

Arith18 Montpellier June 07 simon@icerasemi.com

Implications for Chip Design

Every logic chip will have control over its supply voltage.

• Integrated regulators?

• Switch off in standby – limited call for LSTP processes?

We can’t minimize cost and power simultaneously.

Good cost-power points require fairly high speed.

• Beyond synthesis and auto-P&R today � more Structured Custom?

Short pipelines expose the basic TA assumption (single path sensitization).

• Wavefront timing analysis tools?

• Voltage agility makes TA sign-off harder.

Arith18 Montpellier June 07 simon@icerasemi.com

Structured Custom

…is enough for predictable auto-routingNatural cell placement by engineers…

Arith18 Montpellier June 07 simon@icerasemi.com

Thank You

Enjoy the Symposium�

top related