vlsi scaling for architects
TRANSCRIPT
-
8/14/2019 VLSI Scaling for Architects
1/41
VLSI Scaling for ArchitectsMAH 1
VLSI Scaling for Architects
Mark Horowitz
Computer Systems LaboratoryStanford University
-
8/14/2019 VLSI Scaling for Architects
2/41
VLSI Scaling for ArchitectsMAH 2
The Buzz is VLSI Wires are Bad
A lot of talk about VLSI wiresbeing a problem:
Delay
Noise coupling
What are the characteristics ofchip wires?
How do they compare toscaled gates?
And what does it really mean?
To CAD developers
To architects
A very popular figure
-
8/14/2019 VLSI Scaling for Architects
3/41
VLSI Scaling for ArchitectsMAH 3
How Will Scaling Change Computer Design?
To answer this question:
First look at what changes when technology scales
Surprisingly less changes than you might think
Components get faster (both wires and gates)
Mostly it allows one to build more complex devices
Then look at how computing devices use silicon technology
How architects and circuit designers use the transistors
What are the looming problems with scaling
What can be done to help
Lets start by looking at scaling CMOS technology
-
8/14/2019 VLSI Scaling for Architects
4/41
VLSI Scaling for ArchitectsMAH 4
Predicting the Future
(without making a fool of yourself)
Is very difficult
The only guarantee is:
The future will happen, and you will be wrong
Two approaches
Think about limitations
SIA 1994 Roadmap Limited oxide thickness, small clock frequency growth, etc.
Industry hit points above the curve
Project from current trends
SIA 1997 Roadmap
Allow miracles to occur, continue trends Project clock rates higher than physically possible
So use a range of technology scalings
Better chance of covering the correct answer
-
8/14/2019 VLSI Scaling for Architects
5/41
VLSI Scaling for ArchitectsMAH 5
Technology Scaling
In scaling there are really two issues
Devices
Can we build smaller devices
What will their performance be
Wires
Try to avoid the wet noodle effect
There is concern about our ability to scale both of these
components
-
8/14/2019 VLSI Scaling for Architects
6/41
VLSI Scaling for ArchitectsMAH 6
Device Scaling Limits
Limitations to device scaling has been around since I started
I started working in 3 nMOS, 22 years ago (actually bipolar)
Worries were
Short channel effect
Punchthrough
drain control of current rather than gate
Hot electrons
Parasitic resistances
Now worries are a little different
Oxide tunnel currents Punchthrough
Parameter control
Parasitic resistances
-
8/14/2019 VLSI Scaling for Architects
7/41
VLSI Scaling for ArchitectsMAH 7
Transistor Scaling
People are building very short channel devices
Shown are I-V curves for 15nm L pMOS
And a short channel nMOS
The structure is strange
FinFET
But you can make them work
-
8/14/2019 VLSI Scaling for Architects
8/41
VLSI Scaling for ArchitectsMAH 8
Logic Gate Speed
How does the speed of a gate depend on technology?
Use a Fanout of 4 inverter metric
Measure the delay of an inverter with Cout/Cin = 4
Divide speed of a circuit by speed of FO4 inverter
Get delay of circuit in measured in FO4 inverters
Metric pretty stable, over process, temp, and voltage
1 4 16
FO4
-
8/14/2019 VLSI Scaling for Architects
9/41
VLSI Scaling for ArchitectsMAH 9
Delay Tracking
1.4
2
Power Supply (V)
8.4
12
2 2.5 3 3.5
Inverter
De
lays
(log
)
Process Corner
1.4
2
8.4
12
TT FF SS FS SF
-
8/14/2019 VLSI Scaling for Architects
10/41
VLSI Scaling for ArchitectsMAH 10
Using FO4 Metric
Is very useful way to estimate/compare circuit/logic
Can compare two designs and normalize out technology
Can set lower bound on speed of a circuit block
If you know fanout of gate, you can estimate delay
Complete theory is called logical effort
Total Fanout = Electrical fanout * Logical Effort of gate
Use an adder as an example
Fastest 64 bit adders run in around 7 FO4 delays
Dynamic adders
Hasnt changed very much recently (HP at ISSCC in 96)
Can find bound on the delay
Need to have fanout of 64 for Cin
L.E. has a 2 input XOR and some carry logic
-
8/14/2019 VLSI Scaling for Architects
11/41
VLSI Scaling for ArchitectsMAH 11
Memory Performance
Can use FO4 delays
to look at memory
access time too.
Delay is log(Size) foran optimized design.
Wire delay is
important at largermemories, but not adominant factor.
Partition array,use fewer thicker
wires. Notice access times
10-20 FO4 64Kb 256Kb 1Mb 4Mb 16MbSize
0.0
10.0
20.0
30.0
40.0
Delay
(fo4
)
Total Delay (no wire res.)
Decoder Delay (no wire res.)Output Path Delay (no wire res.)Total Delay (with wire res.)
-
8/14/2019 VLSI Scaling for Architects
12/41
VLSI Scaling for ArchitectsMAH 12
FO4 Inverter Delay Under Scaling
Device performance will scale
FO4 delay has been linear with tech
Approximately 0.36 nS/m*Ldrawn at TT
(0.5nS/m under worst-case conditions)
Easy to predict gate performance
We can measure them Labs have built 0.04m devices
Key issue is voltage scaling
Need to scale Vdd for power
Hard since Vth does not scale
Gate speed improvement might slow down
Vth issues and gate oxide limitations
1
3
5
7
00.511.52
FO4dela
y(nS)
Feature size (m)
0.36 * Ldrawn
-
8/14/2019 VLSI Scaling for Architects
13/41
VLSI Scaling for ArchitectsMAH 13
Circuit Power
Is very much tied to voltage scaling
If the power supply scales with technology
For a fixed complexity circuit
Power scales down as ^3 if you run as same frequency
Power scales down as ^2 if you run it 1/ times faster
Power scaling is a problem because
Freq has been scaling at faster than 1/
Complexity of machine has been growing
This will continue to be an issue in future chips
Remember scaling the technology makes a chip lower power!
-
8/14/2019 VLSI Scaling for Architects
14/41
VLSI Scaling for ArchitectsMAH 14
Wire Scaling
More uncertainty than transistor scaling
Many options with complex trade-offs
For each metal layer
Need to set H, TT, TB, 1, 2, conductivity of the metal
W
SRSL
TB
TT
H
1
2
ILDTop
ILDBot
Metal, ILDmiddle
-
8/14/2019 VLSI Scaling for Architects
15/41
VLSI Scaling for ArchitectsMAH 15
Wire Capacitance
Capacitance per micron is roughly constant
Simple approx. = fringe (0.07fF/m today) + 4 parallel plates
Depends only on the ratio of the parameters
Coupling becomes a large issue with increased H/S ratio
ILDTop
ILDBot
Metal, ILDmiddle
1W/TT
1W/TB
2H/SR
2H/SL
-
8/14/2019 VLSI Scaling for Architects
16/41
VLSI Scaling for ArchitectsMAH 16
Wire Resistance
Resistance is simpler
R/ m = /wh
Scales up as the technology shrinks
Main reason that wire height has not scaled much
Tradeoffs between height, width and wire pitch
H1
W1
R
W1
H12R
1.4R
W1
H1
-
8/14/2019 VLSI Scaling for Architects
17/41
VLSI Scaling for ArchitectsMAH 17
Wire Layers
Not all wiring layers have the same characteristics
Today have three types of levels
M1
Finest pitch, highest resistance, local interconnection in a cell
M2-4?
Normal routing level, most of the wires
M5+
Thick coarse metal, used for global wires
When scaling forces thinner metal, create new top layer
-
8/14/2019 VLSI Scaling for Architects
18/41
VLSI Scaling for ArchitectsMAH 18
Noise Issues
Two main noise sources for wires
Capacitance coupling
Inductive coupling
Capacitance coupling is mostly a nearest neighbor issue
High aspect ratio wires make this worse
Real push for low-dielectric between wires
Inductive coupling
Is much more complex to analyze
Depends on where the return currents flow
Reduce these problem by design constraints
Gnd returns in buses, power and gnd planes (e.g. 21264)
-
8/14/2019 VLSI Scaling for Architects
19/41
VLSI Scaling for ArchitectsMAH 19
0
0.2
0.4
0.6
0.25 0.18 0.13 0.1 0.07 0.05 0.035
pF
Technology Ldrawn (um)
Semi-global wire capacitance, 1mm long
Aggressive scalingConservative scaling
0
0.1
0.2
0.3
0.4
0.25 0.18 0.13 0.1 0.07 0.05 0.035
Kohms
Technology Ldrawn (um)
Semi-global wire resistance, 1mm long
Aggressive scalingConservative scaling
Scaling Global Wires
R gets uite a bit worse with scaling; C basically constant
-
8/14/2019 VLSI Scaling for Architects
20/41
VLSI Scaling for ArchitectsMAH 20
0
0.2
0.4
0.6
0.25 0.18 0.13 0.1 0.07 0.05 0.035
pF
Technology Ldrawn (um)
Semi-global wire capacitance, scaled length
Aggressive scalingConservative scaling
0
0.1
0.2
0.3
0.4
0.25 0.18 0.13 0.1 0.07 0.05 0.035
Kohms
Technology Ldrawn (um)
Semi-global wire resistance, scaled length
Aggressive scalingConservative scaling
Scaling Module Wires
R is basically constant, and C falls linearly with scaling
-
8/14/2019 VLSI Scaling for Architects
21/41
VLSI Scaling for ArchitectsMAH 21
0
0.1
0.2
0.3
0.4
0.5
0.25 0.18 0.13 0.1 0.07 0.05 0.035
Wirede
lay/Gatedelay
Technology Ldrawn (um)
Semi-global wire, scaled length
Aggressive scalingConservative scaling
Scaled wire delays stay pretty constant relative to gates
Not a very big change
Module Wires
These wires scale fairly well:
-
8/14/2019 VLSI Scaling for Architects
22/41
VLSI Scaling for ArchitectsMAH 22
Is There a Module-Level Wire Problem?
This first cut seems to imply that scaled wires arent a problem
Delay of these wires are scaling (mostly) with gate speed
Long wires get worse, but pretty slowly
So the job a designer (or CAD tool) see stays the same, right?
So what are we missing?
Why are people working so hard on developing new tools?
Die complexity: what if the number of modulesdoubles?
-
8/14/2019 VLSI Scaling for Architects
23/41
VLSI Scaling for ArchitectsMAH 23
19 modules, 49 exceptions19 modules, 24 exceptions
Implications of Complexity Growth
What can we do? Larger design teams?
Longer design times?
Better tools that have fewer exceptions per module
9 modules, 22 exceptions
-
8/14/2019 VLSI Scaling for Architects
24/41
VLSI Scaling for ArchitectsMAH 24
0
100
200
300
0.25 0.18 0.13 0.10 0.07 0.05 0.035
Lengthingatep
itches
Technology Ldrawn (um)
100K gate module75K gate module
50K gate moduleCurrent trend
Keeping Design Time Reasonable
Suppose we want the total CAD tool exceptions to stay constant
At a module level, we need 1/2 as many exceptions
The required threshold for exceptions will grow The delay of those lines increases dramatically
So CAD tools need to handle increasingly longer wires
Is this important?
THIS is important!
-
8/14/2019 VLSI Scaling for Architects
25/41
VLSI Scaling for ArchitectsMAH 25
0
1
2
3
4
5
67
0.25 0.18 0.13 0.1 0.07 0.05 0.035
Wirede
lay/Gatedelay
Technology Ldrawn (um)
Semi-global wire, 1mm long
Aggressive scalingConservative scaling
Global Wire Scaling
Fixed-length wires, relative to gates, worsen by 2x per generation
This is a big problem
Now we examine global wire delay relative to gate delay
-
8/14/2019 VLSI Scaling for Architects
26/41
VLSI Scaling for ArchitectsMAH 26
Designer Responses
Wire engineering -- use wider wires or thicker wires
Making wires wider will improve performance
Resistance = k/W; Capacitance = C0 + b*W
RC = kC0/W + kb; b
-
8/14/2019 VLSI Scaling for Architects
27/41
VLSI Scaling for ArchitectsMAH 27
Designer Responses
Circuit solution -- use repeaters
Break the wire into segments
Delay becomes linear with length
Signal velocity = k ( FO4 Rw Cw )1/2
CloadCw4
Cw4
Rw/2
CloadCw4
Cw4
Rw/2
Cw4
Rw/2
Cw4
-
8/14/2019 VLSI Scaling for Architects
28/41
VLSI Scaling for ArchitectsMAH 28
When to Add Repeaters
Repeaters have overhead and can introduce logic complexity
Add when they reduce the delay; wire RC > 8 FO4
Add sooner to reduce noise coupling issues Plot for 0.25m minimum width wires:
0
2
4
6
8
1012
14
0 5 10 15 20
De
lay
(FO4)
Distance (mm)
M3
M5
Repeaters
Repeaters
-
8/14/2019 VLSI Scaling for Architects
29/41
VLSI Scaling for ArchitectsMAH 29
Signal Velocity for Repeated wires
Under SIA scaling, pretty constant over many generations
Under conservative scaling, slow change at sub-0.1m techs
Makes wire delay increase slowly
20
40
60
80
100
120140
160
180
0.050.10.150.20.25
De
lay
(ps
/mm
)
Feature s ize (m )
M 1
M 3
M 5
S IA
Conservat ive
-
8/14/2019 VLSI Scaling for Architects
30/41
VLSI Scaling for ArchitectsMAH 30
A Different View
Wires are not as bad as they are painted in the media
If you take a chip and scale it
Keeping the same exact design The ratio of the wire delay to gate delay will change slowly
Currently they both scale by roughly the scaling factor
Wire get a little slower each generation
Not a big issue
Key is the logical span of the wire remained constant
The problem is if you make the chip more complex
Wires in general need to connect up more gates
Logical span increases
Get slower
Circuit Tricks (repeaters) help some, but not enough
-
8/14/2019 VLSI Scaling for Architects
31/41
VLSI Scaling for ArchitectsMAH 31
The World is Growing
The problem associated with wires is really due to complexity
Diagram shows the logical span you reach in a cycle
It also show the logical span of a chip
Old view: a chip looks small to a wire
Logical chip size
Distance I can go in 1 cycle
New view: a chip looks really big to a wire
How is this going to affect chip design?
-
8/14/2019 VLSI Scaling for Architects
32/41
VLSI Scaling for ArchitectsMAH 32
Computer Performance
How is scaling going to effect computer design?
Use Hennessy Patterson Formula Delay = Number of Instructions * Cycles/Inst * Clock Period
Performance = Clock Frequency/CPI
Look at how these factors have been improving, and why
0.01
0.1
1
10
100
Jan-85 Jan-88 Jan-91 Jan-94 Jan-97
8038680486PentiumPentium II
SpecInt9
ISPEC Performance vs Year
1
10
100
1993 1994 1995 1996 1997 1998 1999
Year
ISPEC
ISPEC
-
8/14/2019 VLSI Scaling for Architects
33/41
VLSI Scaling for ArchitectsMAH 33
Computer Architects Job
Convert transistors to performance
Use transistors to
Exploit parallelism
Or create it (speculate)
Processor generations
Simple machine
Reuse hardware
Pipelined Separate hardware for each stage
Super-scalar
Multiple port mems, function units
Out-of-order
Mega-ports, complex scheduling
Speculation
Each design has more logic toaccomplish same task (but faster)
-
8/14/2019 VLSI Scaling for Architects
34/41
VLSI Scaling for ArchitectsMAH 34
Architecture Scaling
Plot of IPC
Compiler + IPC
1.5x / generation
Used all known tricks
OOO is old idea
Uses lots of wires
What next?
Wider machines
Threads
Speculation Guess answers to
create parallelism
Have high wire costs
0.00
0.01
0.02
0.03
0.04
0.05
Dec-83 Dec-86 Dec-89 Dec-92 Dec-95 Dec-98
8038680486PentiumPentium II
SpecInt95/MHz
-
8/14/2019 VLSI Scaling for Architects
35/41
VLSI Scaling for ArchitectsMAH 35
Architecture Scaling Issues
Wire scaling is an issue for architecture
Need to support a old program model
Modest number of global resources Registers, memory port
To execute in parallel
Need to find the parallelism in instruction stream
Many instruction decoders, needing to communicate
Need to predict instruction stream well
Large memory for prediction tables
Need multiple functional units
Need large ported central registerfiles
This will not scale:
Machines are already starting to use internal clustering
-
8/14/2019 VLSI Scaling for Architects
36/41
VLSI Scaling for ArchitectsMAH 36
Clock Frequency
Most of performance comes from clock scaling
Clock frequency double each generation
Two factors contribute: technology (1.4x/gen), circuit design
10
100
1000
Dec-83 Dec-86 Dec-89 Dec-92 Dec-95 Dec-98
8038680486PentiumPentium II
MH
-
8/14/2019 VLSI Scaling for Architects
37/41
VLSI Scaling for ArchitectsMAH 37
Gates Per Clock
Clock speed has been scaling faster than base technology
Number of FO4 delays in a cycle has been falling
Number of gates decrease1.4x each generation
Caused by:
Faster circuit families(dynamic logic)
Better optimization
Better micro-architecture
Better adder/mem arch
All this generally re uiresmore transistors
10.00
100.00
Dec-83 Dec-86 Dec-89 Dec-92 Dec-95 Dec-98
8038680486PentiumPentium II
FO4inverter
de
lays
/cyc
le
-
8/14/2019 VLSI Scaling for Architects
38/41
VLSI Scaling for ArchitectsMAH 38
Gates Per Clock Limits
Current SOA machines are at 16 FO4 gates per cycle
Historical low values (Cray) were at this level
Overhead for short tick machines grows rapidly Power
Increases clock power per logic function
Latency
Flops are already 10-20% of cycle today
Logic reach grows smaller
What fits in a cycle (how many bits/gates) decreases
Difficult to generate a clock at less than 8 FO4 gates
Continued scaling of gates/clock will be hard
Guessing slope will change soon
-
8/14/2019 VLSI Scaling for Architects
39/41
VLSI Scaling for ArchitectsMAH 39
Performance Scaling
Remember processor performance plots used to have two lines
Microprocessors and mainframes
Mainframes had maxed out and improved at technology rate
What will happen with microprocessors?
uP
Mainframe
-
8/14/2019 VLSI Scaling for Architects
40/41
VLSI Scaling for ArchitectsMAH 40
Will Processor Performance Slow Down?
Yes and No
Uniprocessor performance growth will slow down
Lastest jump is getting to the 16ish FO4 cycles
People will change the benchmarks to fix this problem
More data parallel application
Multi-media / streaming applications
More threaded applications
Explicitly parallel machine scale uite nicely
-
8/14/2019 VLSI Scaling for Architects
41/41
VLSI Scaling for ArchitectsMAH 41
VLSI Scaling Summary
Scaling allows people to build more complex machines
That run faster too
It does not to first order change the difficulty of module design
Module wires will get worse, but only slowly
You dont think to rethink your wires in your adder, memory
Or even your super-scalar processor core
It does let you design more modules
Continued scaling of uniprocessor performance is getting hard
Machines using global resources run into wire limitations
Faster clock ticks (in FO4) is getting very hard
Power and logical reach issues in going much under 16 FO4s
Machines will have to become more explicitly parallel