VLSI Scaling for Architects

Upload: mani

Post on 30-May-2018


TRANSCRIPT

  • 8/14/2019 VLSI Scaling for Architects

    1/41

    VLSI Scaling for ArchitectsMAH 1

    VLSI Scaling for Architects

    Mark Horowitz

    Computer Systems Laboratory, Stanford University

    [email protected]

  • 8/14/2019 VLSI Scaling for Architects

    The Buzz is VLSI Wires are Bad

    A lot of talk about VLSI wires being a problem:

    Delay

    Noise coupling

    What are the characteristics of chip wires?

    How do they compare to scaled gates?

    And what does it really mean?

    To CAD developers

    To architects

    A very popular figure

  • 8/14/2019 VLSI Scaling for Architects

    How Will Scaling Change Computer Design?

    To answer this question:

    First look at what changes when technology scales

    Surprisingly, less changes than you might think

    Components get faster (both wires and gates)

    Mostly it allows one to build more complex devices

    Then look at how computing devices use silicon technology

    How architects and circuit designers use the transistors

    What are the looming problems with scaling

    What can be done to help

    Let's start by looking at scaling CMOS technology

  • 8/14/2019 VLSI Scaling for Architects

    Predicting the Future

    (without making a fool of yourself)

    Is very difficult

    The only guarantee is:

    The future will happen, and you will be wrong

    Two approaches

    Think about limitations

    SIA 1994 Roadmap

    Limited oxide thickness, small clock frequency growth, etc.

    Industry hit points above the curve

    Project from current trends

    SIA 1997 Roadmap

    Allow miracles to occur, continue trends

    Project clock rates higher than physically possible

    So use a range of technology scalings

    Better chance of covering the correct answer

  • 8/14/2019 VLSI Scaling for Architects

    Technology Scaling

    In scaling there are really two issues

    Devices

    Can we build smaller devices

    What will their performance be

    Wires

    Try to avoid the wet noodle effect

    There is concern about our ability to scale both of these components

  • 8/14/2019 VLSI Scaling for Architects

    Device Scaling Limits

    Limits to device scaling have been around since I started

    I started working in 3 um nMOS, 22 years ago (actually bipolar)

    Worries were

    Short channel effect

    Punchthrough

    drain control of current rather than gate

    Hot electrons

    Parasitic resistances

    Now worries are a little different

    Oxide tunnel currents

    Punchthrough

    Parameter control

    Parasitic resistances

  • 8/14/2019 VLSI Scaling for Architects

    Transistor Scaling

    People are building very short channel devices

    Shown are I-V curves for 15nm L pMOS

    And a short channel nMOS

    The structure is strange

    FinFET

    But you can make them work

  • 8/14/2019 VLSI Scaling for Architects

    Logic Gate Speed

    How does the speed of a gate depend on technology?

    Use a Fanout of 4 inverter metric

    Measure the delay of an inverter with Cout/Cin = 4

    Divide delay of a circuit by delay of FO4 inverter

    Get delay of circuit measured in FO4 inverter delays

    Metric pretty stable, over process, temp, and voltage

    [Figure: inverter chain with relative sizes 1, 4, 16, used to measure the FO4 delay]

  • 8/14/2019 VLSI Scaling for Architects

    Delay Tracking

    [Figure: inverter delays (log scale) versus power supply (2 to 3.5 V) and versus process corner (TT, FF, SS, FS, SF)]

  • 8/14/2019 VLSI Scaling for Architects

    Using FO4 Metric

    Is a very useful way to estimate/compare circuits/logic

    Can compare two designs and normalize out technology

    Can set lower bound on speed of a circuit block

    If you know fanout of gate, you can estimate delay

    Complete theory is called logical effort

    Total Fanout = Electrical fanout * Logical Effort of gate

    Use an adder as an example

    Fastest 64 bit adders run in around 7 FO4 delays

    Dynamic adders

    Hasn't changed very much recently (HP at ISSCC in '96)

    Can find bound on the delay

    Need to have fanout of 64 for Cin

    L.E. has a 2 input XOR and some carry logic
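
    A minimal sketch of the delay bound described on this slide. It assumes (this is not stated on the slide) that a well-sized path uses a stage effort of about 4, so every factor of 4 in total fanout costs roughly one FO4 delay. The function name and the logical effort of 2.0 in the usage lines are illustrative assumptions, not values from the talk.

```python
import math


def min_delay_fo4(electrical_fanout, logical_effort=1.0):
    """Rough lower bound on path delay, in FO4 inverter delays.

    Total fanout = electrical fanout * logical effort (from the slide).
    Assumption: each stage carries an effort of about 4, so the bound
    is log base 4 of the total fanout.
    """
    total_fanout = electrical_fanout * logical_effort
    if total_fanout < 1.0:
        raise ValueError("total fanout must be at least 1")
    return math.log(total_fanout) / math.log(4.0)


# Carry-in of a 64-bit adder has to reach 64 bit positions.
print(round(min_delay_fo4(64), 2))       # electrical fanout only
print(round(min_delay_fo4(64, 2.0), 2))  # with a hypothetical logical effort of 2
```

    The bound is only a floor: real adders pay extra for parasitics and for the carry logic itself, which is consistent with the roughly 7 FO4 quoted above for the fastest designs.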

  • 8/14/2019 VLSI Scaling for Architects

    Memory Performance

    Can use FO4 delays to look at memory access time too.

    Delay is log(Size) for an optimized design.

    Wire delay is important at larger memories, but not a dominant factor.

    Partition array, use fewer thicker wires.

    Notice access times: 10-20 FO4

    [Figure: delay (FO4, 0 to 40) versus memory size (64Kb to 16Mb); curves: total delay (no wire res.), decoder delay (no wire res.), output path delay (no wire res.), total delay (with wire res.)]

  • 8/14/2019 VLSI Scaling for Architects

    FO4 Inverter Delay Under Scaling

    Device performance will scale

    FO4 delay has been linear with tech

    Approximately 0.36 ns/um * Ldrawn at TT

    (0.5 ns/um under worst-case conditions)

    Easy to predict gate performance

    We can measure them

    Labs have built 0.04 um devices

    Key issue is voltage scaling

    Need to scale Vdd for power

    Hard since Vth does not scale

    Gate speed improvement might slow down

    Vth issues and gate oxide limitations

    [Figure: FO4 delay (ns) versus feature size (um), with the fit 0.36 * Ldrawn]
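
    As a rough illustration of the linear rule on this slide, the sketch below turns a drawn gate length into an FO4 delay and converts a delay counted in FO4s into nanoseconds. The two coefficients are the ones quoted on the slide; the function names and the sample gate lengths are assumptions for illustration.

```python
FO4_NS_PER_UM_TYPICAL = 0.36  # TT corner, from the slide
FO4_NS_PER_UM_WORST = 0.5     # worst-case conditions, from the slide


def fo4_delay_ns(ldrawn_um, worst_case=False):
    """Estimate the FO4 inverter delay (ns) from drawn gate length (um)."""
    coeff = FO4_NS_PER_UM_WORST if worst_case else FO4_NS_PER_UM_TYPICAL
    return coeff * ldrawn_um


def circuit_delay_ns(delay_fo4, ldrawn_um, worst_case=False):
    """Convert a delay expressed in FO4 units into nanoseconds."""
    return delay_fo4 * fo4_delay_ns(ldrawn_um, worst_case)


# A 7 FO4 adder (the figure quoted earlier in the talk) at a few gate lengths.
for ldrawn in (0.25, 0.18, 0.13):
    print(ldrawn, round(fo4_delay_ns(ldrawn), 4), round(circuit_delay_ns(7, ldrawn), 4))
```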

  • 8/14/2019 VLSI Scaling for Architects

    Circuit Power

    Is very much tied to voltage scaling

    If the power supply scales with technology (linear scaling factor α < 1)

    For a fixed complexity circuit

    Power scales down as α^3 if you run at the same frequency

    Power scales down as α^2 if you run it 1/α times faster

    Power scaling is a problem because

    Freq has been scaling faster than 1/α

    Complexity of machine has been growing

    This will continue to be an issue in future chips

    Remember scaling the technology makes a chip lower power!
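
    The cube and square laws on this slide follow from dynamic power P ~ C * V^2 * f. The sketch below assumes that both the capacitance and the supply voltage of a fixed-complexity circuit scale by the same linear factor, called alpha here (alpha < 1). The symbol name and the value 0.7 are assumptions for illustration.

```python
def relative_power(alpha, freq_multiplier=1.0):
    """Power of a scaled, fixed-complexity circuit relative to the original.

    Uses P ~ C * V^2 * f, assuming capacitance and supply voltage both
    scale by alpha; freq_multiplier is the clock frequency ratio.
    """
    capacitance = alpha
    voltage = alpha
    return capacitance * voltage ** 2 * freq_multiplier


alpha = 0.7  # one hypothetical technology generation
print(round(relative_power(alpha), 3))               # same frequency: alpha^3
print(round(relative_power(alpha, 1.0 / alpha), 3))  # 1/alpha faster: alpha^2
print(round(relative_power(alpha, 2.0), 3))          # clock doubled instead
```

    The last line shows why power is still a problem: if frequency grows faster than 1/alpha, most of the saving disappears even before complexity grows.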

  • 8/14/2019 VLSI Scaling for Architects

    Wire Scaling

    More uncertainty than transistor scaling

    Many options with complex trade-offs

    For each metal layer

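    Need to set H, TT, TB, ε1, ε2, conductivity of the metal

    [Figure: wire cross-section labeling width W, spacings SR and SL, dielectric thicknesses TB and TT, height H, permittivities ε1 and ε2, and the ILD top, ILD bottom, and metal / ILD middle layers]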

  • 8/14/2019 VLSI Scaling for Architects

    Wire Capacitance

    Capacitance per micron is roughly constant

    Simple approx. = fringe (0.07 fF/um today) + 4 parallel plates

    Depends only on the ratio of the parameters

    Coupling becomes a large issue with increased H/S ratio

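    [Figure: wire cross-section with ILD top, ILD bottom, and metal / ILD middle layers; the four parallel-plate terms are ε1 W/TT, ε1 W/TB, ε2 H/SR, and ε2 H/SL]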
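
    A hedged sketch of the "fringe plus four parallel plates" approximation. The fringe value is the one on the slide; the wire dimensions and the relative permittivity of 3.9 (oxide-like) are hypothetical numbers chosen only to exercise the formula. Shrinking every dimension by the same factor leaves the result unchanged, which is the sense in which capacitance per micron depends only on ratios.

```python
EPS0_FF_PER_UM = 8.854e-3  # vacuum permittivity, in fF per um


def wire_cap_ff_per_um(w, h, tt, tb, sl, sr, k_vert, k_horiz, fringe=0.07):
    """Capacitance per micron of wire length (fF/um).

    w, h   : wire width and height (um)
    tt, tb : dielectric thickness to the layers above and below (um)
    sl, sr : spacing to the left and right neighbors (um)
    k_vert, k_horiz : relative permittivity above/below and between wires
    fringe : fringe term in fF/um (0.07 on the slide)
    """
    plates = k_vert * (w / tt + w / tb) + k_horiz * (h / sl + h / sr)
    return fringe + EPS0_FF_PER_UM * plates


# Hypothetical geometry, in um.
dims = dict(w=0.4, h=0.6, tt=0.8, tb=0.8, sl=0.4, sr=0.4)
base = wire_cap_ff_per_um(**dims, k_vert=3.9, k_horiz=3.9)

# Shrink every dimension by 0.7: all the ratios stay the same.
scaled = {name: 0.7 * value for name, value in dims.items()}
shrunk = wire_cap_ff_per_um(**scaled, k_vert=3.9, k_horiz=3.9)

print(round(base, 4), round(shrunk, 4))
```

    Keeping H fixed while S shrinks raises the H/S terms, which is where the coupling concern on the slide comes from.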

  • 8/14/2019 VLSI Scaling for Architects

    Wire Resistance

    Resistance is simpler

    R/um = ρ/(w h)

    Scales up as the technology shrinks

    Main reason that wire height has not scaled much

    Tradeoffs between height, width and wire pitch

    [Figure: wire cross-sections of width W1 and height H1 at different scalings, labeled with resistances R, 1.4R, and 2R]
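
    A small sketch of the resistance-per-length relation, resistivity divided by cross-section area. The resistivity is an assumed bulk copper value and the dimensions are hypothetical. Shrinking only the width by 0.7 gives about 1.4 times the resistance, and shrinking both width and height gives about 2 times, which appears to match the R, 1.4R, and 2R labels that survive from the slide's figure.

```python
RHO_OHM_UM = 0.017  # assumed bulk copper resistivity, in ohm * um


def wire_res_ohm_per_um(width_um, height_um, rho=RHO_OHM_UM):
    """Resistance per micron of wire length: rho / (width * height)."""
    return rho / (width_um * height_um)


w1, h1 = 0.4, 0.6  # hypothetical starting cross-section, in um
r_base = wire_res_ohm_per_um(w1, h1)
r_narrow = wire_res_ohm_per_um(0.7 * w1, h1)           # scale width only
r_shrunk = wire_res_ohm_per_um(0.7 * w1, 0.7 * h1)     # scale width and height

print(round(r_narrow / r_base, 2), round(r_shrunk / r_base, 2))
```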

  • 8/14/2019 VLSI Scaling for Architects

    Wire Layers

    Not all wiring layers have the same characteristics

    Today there are three types of levels

    M1

    Finest pitch, highest resistance, local interconnection in a cell

    M2-4?

    Normal routing level, most of the wires

    M5+

    Thick coarse metal, used for global wires

    When scaling forces thinner metal, create new top layer

  • 8/14/2019 VLSI Scaling for Architects

    Noise Issues

    Two main noise sources for wires

    Capacitance coupling

    Inductive coupling

    Capacitance coupling is mostly a nearest neighbor issue

    High aspect ratio wires make this worse

    Real push for low-permittivity dielectrics between wires

    Inductive coupling

    Is much more complex to analyze

    Depends on where the return currents flow

    Reduce these problems with design constraints

    Gnd returns in buses, power and gnd planes (e.g. 21264)

  • 8/14/2019 VLSI Scaling for Architects

    Scaling Global Wires

    R gets quite a bit worse with scaling; C basically constant

    [Figure: semi-global wire capacitance (pF) and resistance (Kohms), 1mm long, versus technology Ldrawn (0.25 to 0.035 um), under aggressive and conservative scaling]

  • 8/14/2019 VLSI Scaling for Architects

    Scaling Module Wires

    R is basically constant, and C falls linearly with scaling

    [Figure: semi-global wire capacitance (pF) and resistance (Kohms), scaled length, versus technology Ldrawn (0.25 to 0.035 um), under aggressive and conservative scaling]

  • 8/14/2019 VLSI Scaling for Architects

    Module Wires

    These wires scale fairly well:

    Scaled wire delays stay pretty constant relative to gates

    Not a very big change

    [Figure: wire delay / gate delay for a semi-global wire of scaled length versus technology Ldrawn (0.25 to 0.035 um), under aggressive and conservative scaling]

  • 8/14/2019 VLSI Scaling for Architects

    Is There a Module-Level Wire Problem?

    This first cut seems to imply that scaled wires aren't a problem

    Delays of these wires are scaling (mostly) with gate speed

    Long wires get worse, but pretty slowly

    So the job a designer (or CAD tool) sees stays the same, right?

    So what are we missing?

    Why are people working so hard on developing new tools?

    Die complexity: what if the number of modules doubles?

  • 8/14/2019 VLSI Scaling for Architects

    Implications of Complexity Growth

    What can we do?

    Larger design teams?

    Longer design times?

    Better tools that have fewer exceptions per module

    [Figure: three die diagrams labeled "9 modules, 22 exceptions", "19 modules, 49 exceptions", and "19 modules, 24 exceptions"]

  • 8/14/2019 VLSI Scaling for Architects

    Keeping Design Time Reasonable

    Suppose we want the total CAD tool exceptions to stay constant

    At a module level, we need 1/2 as many exceptions

    The required threshold for exceptions will grow

    The delay of those lines increases dramatically

    So CAD tools need to handle increasingly long wires

    [Figure: wire length in gate pitches (0 to 300) versus technology Ldrawn (0.25 to 0.035 um) for 100K, 75K, and 50K gate modules, with the current trend]

    Is this important?

    THIS is important!

  • 8/14/2019 VLSI Scaling for Architects

    Global Wire Scaling

    Now we examine global wire delay relative to gate delay

    [Figure: wire delay / gate delay for a 1mm semi-global wire versus technology Ldrawn (0.25 to 0.035 um), under aggressive and conservative scaling]

    Fixed-length wires, relative to gates, worsen by 2x per generation

    This is a big problem

  • 8/14/2019 VLSI Scaling for Architects

    Designer Responses

    Wire engineering -- use wider wires or thicker wires

    Making wires wider will improve performance

    Resistance = k/W; Capacitance = C0 + b*W

    RC = kC0/W + kb; b
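
    A minimal sketch of the widening trade-off, using the slide's model of resistance k/W and capacitance C0 + b*W. The numeric values of k, C0, and b are hypothetical and in arbitrary units. The product falls as the wire widens but never drops below k*b, so widening gives diminishing returns.

```python
def wire_rc(width, k, c0, b):
    """RC product for a wire of the given width.

    Resistance = k / width, capacitance = c0 + b * width (slide's model),
    so RC = k * c0 / width + k * b.
    """
    resistance = k / width
    capacitance = c0 + b * width
    return resistance * capacitance


# Hypothetical constants, arbitrary units.
k, c0, b = 2.0, 3.0, 0.5
for width in (1.0, 2.0, 4.0, 8.0):
    print(width, round(wire_rc(width, k, c0, b), 3))

print("floor:", k * b)  # limit as the wire gets very wide
```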

  • 8/14/2019 VLSI Scaling for Architects

    Designer Responses

    Circuit solution -- use repeaters

    Break the wire into segments

    Delay becomes linear with length

    Delay per unit length (inverse of signal velocity) = k (FO4 Rw Cw)^(1/2)

    [Figure: a wire broken into repeated segments, with circuit elements labeled Rw/2, Cw/4, and Cload]
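
    The square-root expression on this slide has units of time per length, so the sketch below treats it as a delay per unit length. It compares an unrepeated wire, whose delay grows with length squared, with a repeated one, whose delay grows linearly. The 0.4 coefficient for a distributed RC line is a common approximation and is an assumption here, as are k = 1.0 (the slide gives no value) and all the electrical values.

```python
import math


def unrepeated_delay_s(rw_per_mm, cw_per_mm, length_mm):
    """Distributed RC wire: roughly 0.4 * R_total * C_total (assumed coefficient)."""
    return 0.4 * (rw_per_mm * length_mm) * (cw_per_mm * length_mm)


def repeated_delay_s(rw_per_mm, cw_per_mm, length_mm, fo4_s, k=1.0):
    """Repeated wire: k * sqrt(FO4 * Rw * Cw) per mm, times the length."""
    return k * math.sqrt(fo4_s * rw_per_mm * cw_per_mm) * length_mm


# Hypothetical values: 100 ohm/mm, 0.25 pF/mm, 90 ps FO4.
rw, cw, fo4 = 100.0, 0.25e-12, 90e-12
for length in (2.0, 5.0, 10.0, 20.0):
    plain = unrepeated_delay_s(rw, cw, length)
    repeated = repeated_delay_s(rw, cw, length, fo4)
    print(length, round(plain * 1e12, 1), round(repeated * 1e12, 1))  # in ps
```

    With these numbers the repeated wire loses on short runs and wins on long ones, because doubling the length doubles its delay but quadruples the delay of the plain wire.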

  • 8/14/2019 VLSI Scaling for Architects

    When to Add Repeaters

    Repeaters have overhead and can introduce logic complexity

    Add when they reduce the delay; wire RC > 8 FO4

    Add sooner to reduce noise coupling issues

    Plot for 0.25 um minimum width wires:

    [Figure: delay (FO4) versus distance (0 to 20 mm) for M3 and M5 wires, with repeater curves]
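
    The "wire RC > 8 FO4" rule of thumb above can be sketched as a simple check. The threshold of 8 is from the slide; the per-millimeter resistance and capacitance and the FO4 delay are hypothetical values, and the function names are illustrative.

```python
import math


def needs_repeaters(rw_per_mm, cw_per_mm, length_mm, fo4_s, threshold_fo4=8.0):
    """True when the total wire RC exceeds threshold_fo4 FO4 delays."""
    wire_rc_s = (rw_per_mm * length_mm) * (cw_per_mm * length_mm)
    return wire_rc_s > threshold_fo4 * fo4_s


def max_unrepeated_length_mm(rw_per_mm, cw_per_mm, fo4_s, threshold_fo4=8.0):
    """Longest wire whose RC stays at or below the threshold."""
    return math.sqrt(threshold_fo4 * fo4_s / (rw_per_mm * cw_per_mm))


# Hypothetical values: 100 ohm/mm, 0.25 pF/mm, 90 ps FO4.
rw, cw, fo4 = 100.0, 0.25e-12, 90e-12
print(needs_repeaters(rw, cw, 2.0, fo4))
print(needs_repeaters(rw, cw, 10.0, fo4))
print(round(max_unrepeated_length_mm(rw, cw, fo4), 2))
```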

  • 8/14/2019 VLSI Scaling for Architects

    Signal Velocity for Repeated Wires

    Under SIA scaling, pretty constant over many generations

    Under conservative scaling, slow change at sub-0.1 um techs

    Makes wire delay increase slowly

    [Figure: delay (ps/mm) versus feature size (0.25 to 0.05 um) for M1, M3, and M5, under SIA and conservative scaling]

  • 8/14/2019 VLSI Scaling for Architects

    A Different View

    Wires are not as bad as they are painted in the media

    If you take a chip and scale it

    Keeping the same exact design

    The ratio of the wire delay to gate delay will change slowly

    Currently they both scale by roughly the scaling factor

    Wires get a little slower each generation

    Not a big issue

    Key is that the logical span of the wire remains constant

    The problem is if you make the chip more complex

    Wires in general need to connect up more gates

    Logical span increases

    Get slower

    Circuit Tricks (repeaters) help some, but not enough

  • 8/14/2019 VLSI Scaling for Architects

    The World is Growing

    The problem associated with wires is really due to complexity

    Diagram shows the logical span you reach in a cycle

    It also shows the logical span of a chip

    Old view: a chip looks small to a wire

    Logical chip size

    Distance I can go in 1 cycle

    New view: a chip looks really big to a wire

    How is this going to affect chip design?

  • 8/14/2019 VLSI Scaling for Architects

    Computer Performance

    How is scaling going to affect computer design?

    Use Hennessy Patterson Formula

    Delay = Number of Instructions * Cycles/Inst * Clock Period

    Performance = Clock Frequency/CPI

    Look at how these factors have been improving, and why

    [Figure: SpecInt performance (log scale) versus year, Jan-85 to Jan-97, for the 80386, 80486, Pentium, and Pentium II; ISPEC performance versus year (log scale), 1993 to 1999]
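
    A short sketch of the formula on this slide: execution time is instruction count times cycles per instruction times clock period, so for a fixed program performance goes as clock frequency over CPI. The function names and the two sample machines are hypothetical.

```python
def execution_time_s(instructions, cpi, clock_hz):
    """Delay = number of instructions * cycles/instruction * clock period."""
    return instructions * cpi / clock_hz


def relative_performance(clock_hz, cpi):
    """For a fixed instruction count, performance ~ clock frequency / CPI."""
    return clock_hz / cpi


# Two hypothetical machines running the same one-billion-instruction program.
slow = execution_time_s(1e9, cpi=1.5, clock_hz=500e6)
fast = execution_time_s(1e9, cpi=1.0, clock_hz=1000e6)
print(slow, fast, slow / fast)
print(relative_performance(1000e6, 1.0) / relative_performance(500e6, 1.5))
```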

  • 8/14/2019 VLSI Scaling for Architects

    Computer Architect's Job

    Convert transistors to performance

    Use transistors to

    Exploit parallelism

    Or create it (speculate)

    Processor generations

    Simple machine

    Reuse hardware

    Pipelined

    Separate hardware for each stage

    Super-scalar

    Multiple port mems, function units

    Out-of-order

    Mega-ports, complex scheduling

    Speculation

    Each design has more logic to accomplish same task (but faster)

  • 8/14/2019 VLSI Scaling for Architects

    Architecture Scaling

    Plot of IPC

    Compiler + IPC

    1.5x / generation

    Used all known tricks

    OOO is old idea

    Uses lots of wires

    What next?

    Wider machines

    Threads

    Speculation

    Guess answers to create parallelism

    Have high wire costs

    [Figure: SpecInt95/MHz (0.00 to 0.05) versus year, Dec-83 to Dec-98, for the 80386, 80486, Pentium, and Pentium II]

  • 8/14/2019 VLSI Scaling for Architects

    Architecture Scaling Issues

    Wire scaling is an issue for architecture

    Need to support an old program model

    Modest number of global resources

    Registers, memory port

    To execute in parallel

    Need to find the parallelism in instruction stream

    Many instruction decoders, needing to communicate

    Need to predict instruction stream well

    Large memory for prediction tables

    Need multiple functional units

    Need large ported central register files

    This will not scale:

    Machines are already starting to use internal clustering

  • 8/14/2019 VLSI Scaling for Architects

    Clock Frequency

    Most of performance comes from clock scaling

    Clock frequency doubles each generation

    Two factors contribute: technology (1.4x/gen), circuit design

    [Figure: clock frequency (MHz, log scale) versus year, Dec-83 to Dec-98, for the 80386, 80486, Pentium, and Pentium II]

  • 8/14/2019 VLSI Scaling for Architects

    Gates Per Clock

    Clock speed has been scaling faster than base technology

    Number of FO4 delays in a cycle has been falling

    Number of gates decreases 1.4x each generation

    Caused by:

    Faster circuit families (dynamic logic)

    Better optimization

    Better micro-architecture

    Better adder/mem arch

    All this generally requires more transistors

    [Figure: FO4 inverter delays per cycle (log scale) versus year, Dec-83 to Dec-98, for the 80386, 80486, Pentium, and Pentium II]

  • 8/14/2019 VLSI Scaling for Architects

    Gates Per Clock Limits

    Current state-of-the-art machines are at 16 FO4 gates per cycle

    Historical low values (Cray) were at this level

    Overhead for short tick machines grows rapidly

    Power

    Increases clock power per logic function

    Latency

    Flops are already 10-20% of cycle today

    Logic reach grows smaller

    What fits in a cycle (how many bits/gates) decreases

    Difficult to generate a clock at less than 8 FO4 gates

    Continued scaling of gates/clock will be hard

    Guessing slope will change soon
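
    To see how little room is left, the sketch below projects the historical trend from the talk (FO4 delays per cycle falling about 1.4x per generation) from today's 16 FO4 toward the 8 FO4 clocking limit on this slide. It assumes the trend simply continues unchanged, which the slide itself doubts; the function names are illustrative.

```python
def fo4_per_cycle_after(generations, start_fo4=16.0, shrink_per_gen=1.4):
    """FO4 delays per cycle after a number of generations on the old trend."""
    return start_fo4 / shrink_per_gen ** generations


def generations_until(limit_fo4=8.0, start_fo4=16.0, shrink_per_gen=1.4):
    """Generations of the old trend needed to reach or pass the limit."""
    generations = 0
    fo4 = start_fo4
    while fo4 > limit_fo4:
        fo4 /= shrink_per_gen
        generations += 1
    return generations


for gen in range(4):
    print(gen, round(fo4_per_cycle_after(gen), 2))
print(generations_until())
```

    After two generations the cycle is already within a few percent of 8 FO4, so the old slope cannot hold for long.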

  • 8/14/2019 VLSI Scaling for Architects

    Performance Scaling

    Remember processor performance plots used to have two lines

    Microprocessors and mainframes

    Mainframes had maxed out and improved at technology rate

    What will happen with microprocessors?

    uP

    Mainframe

  • 8/14/2019 VLSI Scaling for Architects

    Will Processor Performance Slow Down?

    Yes and No

    Uniprocessor performance growth will slow down

    Latest jump is getting to the 16ish FO4 cycles

    People will change the benchmarks to fix this problem

    More data parallel applications

    Multi-media / streaming applications

    More threaded applications

    Explicitly parallel machines scale quite nicely

  • 8/14/2019 VLSI Scaling for Architects

    VLSI Scaling Summary

    Scaling allows people to build more complex machines

    That run faster too

    It does not to first order change the difficulty of module design

    Module wires will get worse, but only slowly

    You don't need to rethink the wires in your adder, memory

    Or even your super-scalar processor core

    It does let you design more modules

    Continued scaling of uniprocessor performance is getting hard

    Machines using global resources run into wire limitations

    Faster clock ticks (in FO4) are getting very hard

    Power and logical reach issues in going much under 16 FO4s

    Machines will have to become more explicitly parallel