3D Microprocessor Design: Stacking at Different Granularities
Alberto Villegas Erce
Seminar on Computer Systems, Turku University
April 2010
Alberto Villegas Erce (Seminar on Computer Systems Turku University )3D Microprocessor Design April 2010 1 / 29
Introduction
Concepts review: previously on 3D world...
Industry trends
Make it faster, smaller and cuter, but do not forget the price.
3D Design
Benefits: shorter wire length, speed increase, lower power consumption.
Challenges: risk of defects, heat problems, design complexity.
Through Silicon Vias (TSVs)
Vertical electrical connection passing completely through a silicon die.
Low power consumption
Low latency
Increasing integration level (10k-100k per cm²)
Today: a three-dimensional puzzle
How to approach 3D design? Decompose the 2D design at different granularities.
1 Entire cores, cache: add functionality with high 2D reuse.
2 Functional unit blocks: performance improvement and power reduction. Must re-floorplan and retime paths.
3 Logic gates (block splitting): reduce latency and power on routes at every level. Needs new 3D circuit designs, methodologies and layout tools.
Index
1 Stacking Complete Modules
2 Stacking Functional Unit Blocks
3 Splitting Functional Unit Blocks
4 Conclusions
Stacking Complete Modules
Stacking Complete Modules: Idea
Three-Dimensional Stacked Caches
Idea
Break & stack existing modules.
Conventional dual-core processor featuring a 4MB L2 cache.
Design options for 3D stacking:
Reduce space.
Increase storage.
Stacking Complete Modules: Increasing storage
L2 cache controller in 3D
Objective
Add more storage to the L2 cache.
Stacking a second silicon layer
Additional 8MB of cache
Almost no impact on L2 access latency
Traditional 2D solution
Double silicon area.
Latency increased.
L2 cache controller in 3D (cont.)
DRAM Solution
Much greater storage density.
Greater latency (50-150 cycles).
Reduces silicon area by half.
Hybrid solution
SRAM to store only the tags.
DRAM to store the actual data.
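The hybrid organization can be read as a two-step lookup: the tag check runs against the fast SRAM array, and the slower stacked DRAM is accessed only on a hit. The following is a minimal sketch of that idea. The class name, the direct-mapped layout and all latency constants are illustrative assumptions, not figures from the talk.

```python
from dataclasses import dataclass, field

# Hypothetical latencies in cycles, chosen only for illustration.
SRAM_TAG_LATENCY = 4
DRAM_DATA_LATENCY = 50
MEMORY_LATENCY = 200


@dataclass
class HybridCache:
    """Direct-mapped cache: tags in SRAM, data in stacked DRAM (assumed layout)."""
    num_lines: int
    line_bytes: int = 64
    tags: dict = field(default_factory=dict)  # models the SRAM tag array
    data: dict = field(default_factory=dict)  # models the DRAM data array

    def access(self, address: int) -> int:
        """Return the latency in cycles of one access and update the cache."""
        block = address // self.line_bytes
        index = block % self.num_lines
        tag = block // self.num_lines
        latency = SRAM_TAG_LATENCY  # the tag check always goes to SRAM first
        if self.tags.get(index) == tag:
            return latency + DRAM_DATA_LATENCY  # hit: fetch the line from DRAM
        # Miss: the DRAM data array is skipped; fetch from memory and fill.
        self.tags[index] = tag
        self.data[index] = block
        return latency + MEMORY_LATENCY


cache = HybridCache(num_lines=1024)
first = cache.access(0x1000)   # cold miss
second = cache.access(0x1000)  # hit on the same line
```

In this toy model a miss is resolved after the SRAM tag check alone, so the long DRAM latency is paid only when the data is actually present.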
L2 cache controller in 3D (testing)
Three test programs:
Program A: small working set that fits in the 4MB SRAM cache.
Program B: larger working set that does not fit in the 4MB SRAM cache but does fit within the 32MB DRAM cache.
Program C: streaming memory access pattern. Poor cache hit rate for both configurations.
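The qualitative behaviour of the three programs can be reproduced with a toy cache simulation. The sketch below assumes a fully associative LRU cache and scaled-down capacities (one cache line stands in for 1MB). The traces and the helper names `hit_rate` and `looping_trace` are hypothetical and are not the benchmarks used in the talk.

```python
from collections import OrderedDict


def hit_rate(trace, capacity_lines):
    """Hit rate of a fully associative LRU cache holding `capacity_lines` lines."""
    cache = OrderedDict()
    hits = 0
    for line in trace:
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(trace)


# Toy capacities: one line in this model stands in for 1MB of real cache.
SRAM_LINES = 4    # stands in for the 4MB SRAM cache
DRAM_LINES = 32   # stands in for the 32MB DRAM cache


def looping_trace(working_set, passes=10):
    """Repeatedly sweep over a working set of the given size."""
    return [i for _ in range(passes) for i in range(working_set)]


program_a = looping_trace(3)    # working set fits in both caches
program_b = looping_trace(16)   # working set fits only in the larger cache
program_c = list(range(1000))   # streaming: no line is ever reused

for name, trace in [("A", program_a), ("B", program_b), ("C", program_c)]:
    print(name, hit_rate(trace, SRAM_LINES), hit_rate(trace, DRAM_LINES))
```

Under these assumptions program A hits well in both configurations, program B thrashes the small cache but hits in the large one, and program C misses in both.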
Stacking Complete Modules: 3D optionality
3D Integration... for everyone?
3D Integration:
Increases the silicon required for the chip (layers) ⇒ increases manufacturing cost.
Extra manufacturing steps for bonding.
Impact on yield rates.
3D is not the general answer!
Use 3D stacking as a means to optionally augment the processor with additional functionality.
Introspective 3D Processors
Objective
Access to more dynamic information about the internal state of a microprocessor.
Reliable 3D Processors
Problem
The small transistor sizes in modern processors make them vulnerable to data corruption.
Solutions
Redundancy: two or three copies of the processor operating in lock-step ⇒ multiple pipelines increase cost.
Leading execution/trailing checking cores: the trailing core re-executes instructions (not in lock-step) ⇒ the additional pipeline still increases area.
Stack it!
Extra wires eliminated.
Optional checker core.
Unutilized silicon area.
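The leading/trailing scheme can be sketched in software: the leading core produces results, and the checker re-executes the same instructions later and flags any disagreement. The three-operation instruction set, the function names and the single-bit fault injection below are hypothetical and serve only to illustrate the principle.

```python
def execute(op, a, b):
    """Tiny ALU shared by both cores in this sketch."""
    if op == "add":
        return a + b
    if op == "sub":
        return a - b
    if op == "mul":
        return a * b
    raise ValueError(f"unknown op {op}")


def leading_core(program, fault_at=None):
    """Run the program; optionally flip one result bit to model a soft error."""
    results = []
    for i, (op, a, b) in enumerate(program):
        value = execute(op, a, b)
        if i == fault_at:
            value ^= 1  # injected single-bit upset
        results.append(value)
    return results


def trailing_checker(program, leading_results):
    """Re-execute each instruction and return the indices that disagree."""
    return [
        i
        for i, ((op, a, b), seen) in enumerate(zip(program, leading_results))
        if execute(op, a, b) != seen
    ]


program = [("add", 1, 2), ("mul", 3, 4), ("sub", 10, 7)]
clean = trailing_checker(program, leading_core(program))
faulty = trailing_checker(program, leading_core(program, fault_at=1))
```

Because the checker only consumes the leading core's results after the fact, it does not need to run in lock-step, which is what allows it to be an optional stacked layer.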
Stacking Functional Unit Blocks
Stacking Functional Unit Blocks: Introduction
Stacking Functional Unit Blocks
Nowadays
Early stage of development for these technologies.
3D integration will require
Design automation tools.
Layout support.
Verification and validation methodologies.
Future
Reorganize the processor pipeline in new ways.
Stacking Functional Unit Blocks: Removing wires
Removing Wires: Pentium III & IV branch misprediction
Problem
Wire delays have not improved as fast as transistor speeds.
PIII branch misprediction
PIV branch misprediction
Solution
3D implementation, so that distant blocks are now vertically stacked on top of each other.
Removing Wires: Alpha 21264
Problem
A superscalar processor with multiple execution units (EUs) requires a bypass network to forward results between all of the EUs ⇒ wiring.
2D Solution
Divide the EUs into two groups or clusters, each with its own bypass network, with communication between the clusters.
3D Solution
Stack the clusters.
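A rough count shows why clustering relieves the bypass network. The sketch below assumes that every execution unit can forward its result to every unit in the same network, itself included, so a network of n units has n × n producer-to-consumer paths. The function names are hypothetical, and inter-cluster links are deliberately left out of the count.

```python
def bypass_paths(num_units: int) -> int:
    """Producer-to-consumer forwarding paths in one full bypass network."""
    return num_units * num_units


def clustered_paths(num_units: int, clusters: int = 2) -> int:
    """Intra-cluster forwarding paths when units are split evenly into clusters."""
    if num_units % clusters != 0:
        raise ValueError("units must divide evenly into clusters")
    per_cluster = num_units // clusters
    return clusters * bypass_paths(per_cluster)


full = bypass_paths(8)      # one monolithic network of 8 units
split = clustered_paths(8)  # two clusters of 4 units each
```

Under this assumption two clusters halve the number of intra-network paths. The remaining inter-cluster communication is where stacking helps: with the clusters placed on top of each other, those links become short vertical connections.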
Stacking Functional Unit Blocks: Trade-offs
Removing Wires: Trade-offs
Pros
Opportunities to optimize the processor pipeline.
Physical reduction in the amount of wiring.
Cons
Non-trivial engineering effort.
Modify the pipeline.
Verify and validate the new design.
Additional costs.
Stacking Functional Unit Blocks: TSV Reality
Removing Wires: TSV Reality
Problem
After stacking two blocks there may not be enough room for placing the TSVs.
Solution
Different layouts of the TSVs.
Wire overhead reintroduction
Reintroduced wires do not completely cancel the 3D wire reduction benefits.
Splitting Functional Unit Blocks
Splitting Functional Unit Blocks: Introduction
Splitting Functional Unit Blocks
Last level
Logic gates
Split individual functional units across multiple layers.
Reorganize the functional unit block ⇒ more compact 3D arrangement.
Benefits
Reduce length of intra-block wiring.
Improve operating frequencies.
We introduce a starting point for thinking about these designs.
Splitting Functional Unit Blocks: 3D Cache Organizations
3D Cache Organizations: First view
Problem
L2 cache consumes about half of the overall die area.
Worst case routing distance: 2x+4y
Two stack possibilities.
Banks on cores
Half the space.
Access distance: equal.
Banks on banks
Half the space.
Access distance: reduced.
Splitting Functional Unit Blocks: Splitting the cache
3D Splitting the cache
Problem
Wires within each bank also impact overall latency.
Split individual cache banks across multiple layers.
Columns on columns
Best latency.
Rows on rows
Energy reduction.
3D Splitting the cache: Testing
Experimental results
SPICE simulation.
Column on column organization.
SRAM implementations in 65-nm process.
Splitting Functional Unit Blocks: 3D Adders
3D Adders: Classic Carry Look-ahead Adder
Carry Look-ahead Adder
n = 16 bits
Critical path runs from bit[0] to bit[n-1]
Several ways to split the adder
Based on inputs
x on the bottom layer; y on the top layer.
First level of the propagate layer is split.
Half the wire length.
By significance
Least significant bits on the bottom layer; most significant bits on the top layer.
TSV between root nodes.
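The significance-based split can be illustrated functionally. The sketch below computes per-bit generate and propagate signals, applies the carry recurrence c[i+1] = g[i] OR (p[i] AND c[i]), and then partitions the 16-bit adder into a low half and a high half connected by a single carry signal, which stands in for the TSV between the two layers. The function names are hypothetical. The loop evaluates the recurrence serially, whereas real look-ahead hardware evaluates it with a tree, so this models only the logic, not the timing.

```python
N = 16  # adder width from the slide


def gp_bits(x, y, width):
    """Per-bit generate (x AND y) and propagate (x XOR y) signals."""
    g = [(x >> i) & (y >> i) & 1 for i in range(width)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(width)]
    return g, p


def cla_add(x, y, width=N, carry_in=0):
    """Add two `width`-bit values using c[i+1] = g[i] OR (p[i] AND c[i])."""
    g, p = gp_bits(x, y, width)
    carry = carry_in
    total = 0
    for i in range(width):
        total |= (p[i] ^ carry) << i       # sum bit i
        carry = g[i] | (p[i] & carry)      # carry into bit i + 1
    return total, carry


def split_by_significance(x, y, width=N):
    """Low half on one layer, high half on the other; one carry crosses layers."""
    half = width // 2
    mask = (1 << half) - 1
    # Bottom layer: least significant bits.
    low, carry_between = cla_add(x & mask, y & mask, half)
    # Top layer: most significant bits, fed by the carry that crosses layers.
    high, carry_out = cla_add(x >> half, y >> half, half, carry_between)
    return (high << half) | low, carry_out
```

For inputs that fit in 16 bits, `split_by_significance` returns the same sum and carry-out as `cla_add`, showing that only one signal has to cross between the layers in this partition.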
Conclusions
Benefits of organizing components in 3D
Can significantly reduce wire lengths.
Devices from different technologies can be tightly integrated and combined.
3D organizations may be required depending on the exact design constraints and objectives.
Cons
Finer granularity ⇒ more redesign.
Stacking can increase heat.
Long technological development still required.
Every redesign process leads to a cost increase.
The end: Questions
Thank you.
Questions? Please be nice.