3D Microprocessor Design: Stacking at Different Granularities



Page 1: 3D Microprocessor Design: Stacking at different granularities

3D Microprocessor Design

Stacking at different granularities

Alberto Villegas Erce

Seminar on Computer Systems, Turku University

April 2010

Alberto Villegas Erce (Seminar on Computer Systems, Turku University) 3D Microprocessor Design April 2010 1 / 29

Page 2: 3D Microprocessor Design: Stacking at different granularities

Introduction

Concepts review: previously on 3D world...

Industry trends

Make it faster, smaller and cuter, but do not forget the price.

3D Design

Benefits: shorter wire length, speed increase, lower power consumption.

Challenges: risk of defects, heat problems, design complexity.

Through Silicon Vias (TSVs)

Vertical electrical connection passing completely through a silicon die.

Low power consumption

Low latency

Increasing integration level (10k-100k per cm²)
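To give a feel for the density figure above, here is a minimal sketch that converts it into an average spacing between vias. It assumes the figure means TSVs per cm² and that the vias sit on a uniform square grid. Both are illustrative assumptions, and the helper name `tsv_pitch_um` is hypothetical.

```python
import math


def tsv_pitch_um(tsvs_per_cm2: float) -> float:
    """Centre-to-centre pitch in micrometres of a uniform square TSV grid.

    Assumption: one TSV per grid cell, so each cell has area 1 / density.
    """
    cm_in_um = 1e4  # 1 cm expressed in micrometres
    return cm_in_um / math.sqrt(tsvs_per_cm2)


for density in (10_000, 100_000):
    print(f"{density:>7} per cm^2 -> pitch of about {tsv_pitch_um(density):.1f} um")
```

Under these assumptions the quoted range corresponds to a pitch of roughly 100 µm down to about 32 µm.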


Today: a three-dimensional puzzle

How to approach 3D design? Decompose the 2D design at different granularities.

1 Entire cores, cache: add functionality with high 2D reuse.

2 Functional unit blocks: performance improvement and power reduction. Must re-floorplan and retime paths.

3 Logic gates (block splitting): reduce latency and power on routes at every level. Need new 3D circuit designs, methodologies and layout tools.


Index

1 Stacking Complete Modules

2 Stacking Functional Unit Blocks

3 Splitting Functional Unit Blocks

4 Conclusions


Stacking Complete Modules


Stacking Complete Modules: Idea

Three-Dimensional Stacked Caches

Idea

Break and stack existing modules.

Baseline: a conventional dual-core processor featuring a 4 MB L2 cache.

Design options for 3D stacking

Reduce space.

Increase storage.


Stacking Complete Modules: Increasing storage

L2 cache controller in 3D

Objective

Add more storage to the L2 cache.

Stacking a second silicon layer

Additional 8 MB of cache.

Almost no impact on L2 access latency.

Traditional 2D solution

Double the silicon area.

Increased latency.


L2 cache controller in 3D (cont.)

DRAM Solution

Much greater storage density.

Greater latency (50-150 cycles).

Reduces silicon area by half.

Hybrid solution

SRAM stores only the tags.

DRAM stores the actual data.
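The hybrid organization can be reasoned about with a simple average-latency model. The sketch below is only illustrative. It assumes every access first checks the SRAM tags, that a hit then pays the DRAM data latency, and that a miss goes to main memory without touching the stacked DRAM. The function name and all cycle counts except the 50-150 cycle DRAM range are hypothetical.

```python
def hybrid_l2_latency(hit_rate, tag_cycles, dram_cycles, memory_cycles):
    """Average access latency (cycles) of an SRAM-tag / DRAM-data L2 cache.

    Assumed behaviour: the tag check always happens first; hits then read
    the data from DRAM, misses skip the DRAM and go to main memory.
    """
    hit_cost = tag_cycles + dram_cycles
    miss_cost = tag_cycles + memory_cycles
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost


# Hypothetical numbers: 10-cycle tag lookup, 100-cycle DRAM data access
# (inside the 50-150 cycle range quoted above), 300-cycle main memory.
for rate in (0.0, 0.5, 1.0):
    print(rate, hybrid_l2_latency(rate, 10, 100, 300))
```

The model makes the trade-off explicit: the larger DRAM only pays off when its extra capacity raises the hit rate enough to outweigh the slower hits.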


L2 cache controller in 3D (testing)

Three test programs:

Program A: small working set that fits in the 4 MB SRAM cache.

Program B: larger working set that does not fit in the 4 MB SRAM cache but does fit within the 32 MB DRAM cache.

Program C: streaming memory access pattern. Poor cache hit rate for both configurations.
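The qualitative behaviour of the three programs can be reproduced with a toy cache model. The sketch below is not the evaluation behind the slides. It assumes a fully associative LRU cache, scales the 4 MB and 32 MB capacities down to 4 and 32 blocks, and uses made-up access traces. All names are hypothetical.

```python
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Hit rate of a fully associative LRU cache holding `capacity` blocks."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[block] = True
    return hits / len(trace)


def looping_trace(working_set, passes):
    """Repeatedly sweep over `working_set` distinct blocks."""
    return [b for _ in range(passes) for b in range(working_set)]


SMALL, LARGE = 4, 32  # stand-ins for the 4 MB SRAM and 32 MB DRAM caches
programs = {
    "A": looping_trace(3, 100),   # working set fits in the small cache
    "B": looping_trace(20, 100),  # fits only in the large cache
    "C": list(range(2000)),       # streaming: no block is ever reused
}
results = {
    name: (lru_hit_rate(trace, SMALL), lru_hit_rate(trace, LARGE))
    for name, trace in programs.items()
}
for name, (small, large) in results.items():
    print(f"Program {name}: small cache {small:.2f}, large cache {large:.2f}")
```

In this model only Program B benefits from the larger cache. Program A already hits in the small one, and Program C misses in both.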


Stacking Complete Modules: 3D optionality

3D Integration... for everyone?

3D Integration:

Increases the silicon required for the chip (layers) ⇒ increases manufacturing cost.

Extra manufacturing steps for bonding.

Impact on yield rates.

3D is not the general answer!

Use 3D stacking as a means to optionally augment the processor with some additional functionality.


Introspective 3D Processors

Objective

Access to more dynamic information about the internal state of a microprocessor.


Reliable 3D Processors

Problem

Small feature sizes in modern processors make them vulnerable to data corruption.

Solutions

Redundancy: two or three copies of the processor operating in lock-step ⇒ multiple pipelines increase cost.

Leading execution/trailing checking cores: the trailing core re-executes instructions (not in lock-step) ⇒ the additional pipeline still increases area.

Stack it!

Extra wires eliminated.

Optional checker core.

Unutilized silicon area.
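The leading/trailing scheme can be illustrated with a toy model. The sketch below is a deliberately simplified assumption, not the checker design from the references. The "leading core" evaluates a list of arithmetic instructions, a single-bit fault can be injected into one result, and the "trailing checker" re-executes each instruction and reports mismatches. All names are hypothetical.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def execute(instr):
    """Evaluate one (op, a, b) instruction."""
    op, a, b = instr
    return OPS[op](a, b)


def leading_core(program, fault_at=None):
    """Run the program; optionally flip the lowest bit of one result."""
    results = [execute(instr) for instr in program]
    if fault_at is not None:
        results[fault_at] ^= 1  # models a transient data corruption
    return results


def trailing_checker(program, leading_results):
    """Re-execute every instruction and return indices that disagree."""
    return [
        idx
        for idx, (instr, result) in enumerate(zip(program, leading_results))
        if execute(instr) != result
    ]


program = [("add", 1, 2), ("mul", 3, 4), ("sub", 10, 7)]
print(trailing_checker(program, leading_core(program)))              # no fault injected
print(trailing_checker(program, leading_core(program, fault_at=1)))  # fault in instruction 1
```

Because the checker only compares results after the fact, it does not need to run in lock-step with the leading core, which is what makes it a candidate for an optional stacked layer.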


Stacking Functional Unit Blocks


Stacking Functional Unit Blocks: Introduction

Stacking Functional Unit Blocks

Nowadays

These technologies are at an early stage of development.

3D integration will require

Design automation tools.

Layout support.

Verification and validation methodologies.

Future

Reorganize the processor pipeline in new ways.


Stacking Functional Unit Blocks: Removing wires

Removing Wires: Pentium III & IV branch misprediction

Problem

Wire delays have not improved as fast as transistor speeds.

PIII branch misprediction

PIV branch misprediction

Solution

3D implementation, so that distant blocks are now vertically stacked on top of each other.


Removing Wires: Alpha 21264

Problem

A superscalar processor with multiple execution units (EUs) requires a bypass network to forward results between all of the EUs ⇒ wiring.

2D Solution

Divide the EUs into two groups, or clusters, each with its own bypass network, with communication between the clusters.

3D Solution

Stack the clusters.
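A rough geometric sketch shows why stacking the clusters shortens the forwarding wires. It assumes the execution units sit side by side in a single row, that a bypass wire runs between unit centres, and that crossing layers costs one short vertical hop. The unit count, widths and TSV length are hypothetical, not Alpha 21264 data.

```python
def worst_case_span_um(units_in_row, unit_width_um):
    """Centre-to-centre distance between the two outermost units of a row."""
    return (units_in_row - 1) * unit_width_um


# Hypothetical layout parameters, chosen only for illustration.
UNITS = 8            # execution units in total
UNIT_WIDTH_UM = 100  # width of one execution unit
TSV_LENGTH_UM = 20   # vertical hop between the two stacked layers

# 2D: both clusters side by side, so the farthest forward crosses all units.
flat_2d = worst_case_span_um(UNITS, UNIT_WIDTH_UM)

# 3D: one cluster per layer; the farthest forward crosses half the units
# horizontally and then takes one vertical hop.
stacked_3d = worst_case_span_um(UNITS // 2, UNIT_WIDTH_UM) + TSV_LENGTH_UM

print(f"2D worst case: {flat_2d} um, 3D worst case: {stacked_3d} um")
```

With these made-up numbers the worst-case forwarding distance drops from 700 µm to 320 µm. The exact ratio depends entirely on the real floorplan.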


Stacking Functional Unit Blocks: Trade-offs

Removing Wires: Trade-offs

Pros

Opportunities to optimize the processor pipeline.

Physical reduction in the amount of wiring.

Cons

Non-trivial engineering effort.

Modify the pipeline.

Verify and validate the new design.

Additional costs.


Stacking Functional Unit Blocks: TSV Reality

Removing Wires: TSV Reality

Problem

After stacking two blocks, room is still needed for placing the TSVs.

Solution

Different layouts of the TSVs.

Wire overhead reintroduction

Reintroduced wires do not completely cancel the 3D wire reduction benefits.


Splitting Functional Unit Blocks


Splitting Functional Unit Blocks: Introduction

Splitting Functional Unit Blocks

Last level

Logic gates

Split individual functional units across multiple layers.

Reorganize the functional unit block ⇒ more compact 3D arrangement.

Benefits

Reduce the length of intra-block wiring.

Improve operating frequencies.

We will introduce a starting point for thinking about it.


Splitting Functional Unit Blocks: 3D Cache Organizations

3D Cache Organizations: First view

Problem

The L2 cache consumes about half of the overall die area.

Worst-case routing distance: 2x+4y

Two stacking possibilities.

Banks on cores

Half the footprint.

Access distance unchanged.

Banks on banks

Half the footprint.

Access distance reduced.


Splitting Functional Unit Blocks: Splitting the cache

3D Splitting the cache

Problem

Wires within each bank also impact overall latency.

Split individual cache banks across multiple layers.

Columns on columns

Best latency.

Rows on rows

Energy reduction.
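The two splitting options differ in which wire of the SRAM array they shorten. The sketch below is a purely geometric model under stated assumptions: a wordline runs across all columns of a row, a bitline runs across all rows of a column, cells are square, and "columns on columns" is read as dividing the columns between layers (likewise for rows). The function name and array size are hypothetical.

```python
def wire_lengths(rows, cols, cell_um, layers=1, split=None):
    """Return (wordline_um, bitline_um) for one layer of an SRAM array.

    split="columns": columns are divided across layers (columns on columns).
    split="rows":    rows are divided across layers (rows on rows).
    """
    if split == "columns":
        cols //= layers
    elif split == "rows":
        rows //= layers
    wordline_um = cols * cell_um  # a wordline crosses every column of a row
    bitline_um = rows * cell_um   # a bitline crosses every row of a column
    return wordline_um, bitline_um


# Hypothetical 256 x 256 array with 1 um cells, split over two layers.
print("2D:                ", wire_lengths(256, 256, 1.0))
print("columns on columns:", wire_lengths(256, 256, 1.0, layers=2, split="columns"))
print("rows on rows:      ", wire_lengths(256, 256, 1.0, layers=2, split="rows"))
```

In this model, stacking columns on columns halves the wordlines, while stacking rows on rows halves the bitlines. How that translates into latency and energy depends on the circuit details that the SPICE experiments capture.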


3D Splitting the cache: Testing

Experimental results

SPICE simulation.

Column on column organization.

SRAM implementations in 65-nm process.


Splitting Functional Unit Blocks: 3D Adders

3D Adders: Classic Carry-Lookahead Adder

Carry-Lookahead Adder

n = 16 bits

Critical path runs from bit[0] to bit[n-1]

Several ways to split the adder

Based on inputs

x on the bottom layer; y on the top layer.

First level of the propagate layer is split.

Half the wire length.

By significance

Least significant bits on the bottom layer; most significant bits on the top layer.

TSV between root nodes.
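The split by significance can be sketched in software. The model below is a behavioural illustration, not a circuit. It computes per-bit generate/propagate signals, reduces each 8-bit half to a group (G, P) pair, and passes a single carry from the low half (bottom layer) to the high half (top layer), which stands in for the TSV between the root nodes. Carries inside each half are evaluated sequentially for brevity, whereas hardware would unroll the same recurrence into a lookahead tree. All function names are hypothetical.

```python
def generate_propagate(x, y, width):
    """Per-bit generate (x AND y) and propagate (x XOR y), LSB first."""
    g = [(x >> i) & (y >> i) & 1 for i in range(width)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(width)]
    return g, p


def group_gp(g, p):
    """Reduce per-bit signals to one group (G, P) pair, LSB to MSB."""
    G, P = 0, 1
    for gi, pi in zip(g, p):
        G = gi | (pi & G)
        P = pi & P
    return G, P


def sum_bits(g, p, carry_in):
    """Sum bits of one half, using c[i+1] = g[i] | (p[i] & c[i])."""
    total, c = 0, carry_in
    for i, (gi, pi) in enumerate(zip(g, p)):
        total |= (pi ^ c) << i
        c = gi | (pi & c)
    return total


def add16_split(x, y, carry_in=0):
    """16-bit add: low byte on the 'bottom layer', high byte on the 'top'."""
    g, p = generate_propagate(x, y, 16)
    G_lo, P_lo = group_gp(g[:8], p[:8])
    carry_mid = G_lo | (P_lo & carry_in)  # the one signal crossing layers
    G_hi, P_hi = group_gp(g[8:], p[8:])
    carry_out = G_hi | (P_hi & carry_mid)
    low = sum_bits(g[:8], p[:8], carry_in)
    high = sum_bits(g[8:], p[8:], carry_mid)
    return low | (high << 8), carry_out


print(add16_split(0x00FF, 0x0001))
print(add16_split(0xFFFF, 0x0001))
```

The point of the sketch is that only `carry_mid` travels between the two halves, which is why a single vertical connection at the root of the lookahead tree is enough for this split.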


Conclusions


Benefits of organizing components in 3D

Can significantly reduce wire lengths.

Devices from different technologies can be tightly integrated and combined.

3D organizations may be required, depending on the exact design constraints and objectives.


Cons

Finer granularity ⇒ more redesign.

Stacking can increase heat.

Long technological development still ahead.

Every redesign process leads to a cost increase.


The end: Questions

Thank you.

Questions? Please be nice.
