3D Interconnect: Architectural Challenges and Opportunities
UC SANTA BARBARA
Tim Sherwood
The Role of Architecture
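[Figure: the system stack, from top to bottom: Applications, Runtime System, Architecture, 3D Integration, Circuit, Device, Package. Software (SW) places demands on the architecture from above (Battery Life, Performance, Programmability); hardware (HW) places constraints on it from below (Noise, Thermal, Yield).]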
Lab Overview
[Figure: Intrusion Detection System, Server Farm, Processor Core, Caches, etc.]
• Prototype Acceleration Primitives
• High Speed Programmable Routers
• Intrusion Detection and Prevention
• Adaptive Hardware Profiling Engines integrated On-Chip
• Memory Hierarchy
• Software Defined Wireless Access Point
• Reconfigurable Security on FPGAs
• High Throughput MEMS controllers
Potential for Impact from 3D
[Figure: the Lab Overview projects (Software Defined Wireless Access Point, Intrusion Detection System, Server Farm, Processor Core, Caches, Prototype Acceleration Primitives, High Speed Programmable Routers, Intrusion Detection and Prevention, Adaptive Hardware Profiling Engines integrated On-Chip, Memory Hierarchy) annotated with the 3D opportunities they could exploit:]
• 3D Integration for Latency
• 3D Bandwidth
• 3D Integration for Mixed Signal
• 3D Integration for Mixed Technology
• 3D Specialization
Presented Works
• Shashidhar Mysore, Banit Agrawal, Sheng-Chih Lin, Navin Srivastava, Kaustav Banerjee, and Timothy Sherwood. Introspective 3D Chips. Proceedings of the Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 2006. San Jose, CA.
• Gian Luca Loi, Banit Agrawal, Navin Srivastava, Sheng-Chih Lin, Timothy Sherwood, and Kaustav Banerjee. A Thermally-Aware Performance Analysis of Vertically Integrated (3-D) Processor-Memory Hierarchy. Proceedings of the 43rd Design Automation Conference (DAC), June 2006. San Francisco, CA.
Two Specific Opportunities
1) 3D Integration for Performance
  - Bring memory closer to those that use it
  - More bandwidth and lower latency
  - Tricky system-level tradeoffs
2) 3D Integration for Specialization
  - Integration offers a unique specialization opportunity
  - Decouple commodity from niche
The ramifications of any radical change require a careful evaluation that considers all the parameters.
A Simple Performance “Ecosystem”
[Figure: a feedback graph connecting app, OS or runtime, parallelism, frequency, V, utilized area, communication, dynamic power, leakage, total power, package, temperature, and performance]
Even this simple model has no multicore, no spatial variance, no temporal variance, and no metrics of cost, error, or yield.
Basic Savings in 3D
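1 layer: Area 4, Dist √8 ≈ 2.8, BW √8 ≈ 2.8
2 layers: Area 2 each, Dist √4 = 2 + 1L, BW 2√4 = 4
4 layers: Area 1 each, Dist √2 ≈ 1.4 + 3L, BW 4√2 ≈ 5.6
(L: delay of one vertical layer-to-layer hop)
On-chip latency improves, and bandwidth could improve even more.
But what about real wires? What about applications? What about temperature?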
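Read as a formula, the slide's numbers suggest that a fixed total area A = 4 is split evenly over n layers, that the planar distance is the diagonal of one layer, √(2A/n), with n − 1 vertical hops of cost L added on top, and that bandwidth is one diagonal cut per layer, n·√(2A/n) = √(2An). That reading is our interpretation rather than something stated on the slide; the sketch (function names are ours) matches the slide's values up to rounding:

```python
import math

def diag_distance(total_area: float, layers: int) -> float:
    """Diagonal across one layer's square footprint: sqrt(2 * A / n)."""
    return math.sqrt(2 * total_area / layers)

def savings(total_area: float, layers: int) -> tuple[float, int, float]:
    """Return (planar distance, vertical hops, cut bandwidth) for n layers.

    Vertical hops are reported separately because their cost L depends
    on the via technology, not on the footprint.
    """
    d = diag_distance(total_area, layers)
    hops = layers - 1          # cross every layer boundary once
    bandwidth = layers * d     # one diagonal cut per layer
    return d, hops, bandwidth

for n in (1, 2, 4):
    d, hops, bw = savings(4.0, n)
    print(f"{n} layer(s): dist {d:.1f} + {hops}L, BW {bw:.1f}")
```

Distance shrinks as 1/√n while bandwidth grows as √n, which is why the slide concludes that bandwidth could improve more than latency.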
Example Technology Node
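[Figure: cross-section of a two-layer 3D stack. Each layer (Layer 1, Layer 2) has a silicon substrate, CMOS devices, metal layers, and dioxide; the layers are joined by vertical interconnect. Dimensions marked: 30-40 µm, 2-5 µm, 50 µm.]
Banerjee et al. IEEE 2001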
3D Wire Delay
[Figure: delay (s, ×10⁻¹¹, 0 to 1.4) versus wire length L (µm, 0 to 800), comparing a vertical via model with a horizontal line model; annotations: distributed RC delay, horizontal wire length L, vertical wire length]
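The “distributed RC delay” annotation refers to the standard result that an unbuffered distributed RC line has a 50% delay of roughly 0.38·r·c·L², so delay grows with the square of wire length. A minimal sketch; the per-micron resistance and capacitance are hypothetical placeholders, not the technology node on the slide:

```python
def distributed_rc_delay(length_um: float, r_ohm_per_um: float,
                         c_f_per_um: float) -> float:
    """50% delay (seconds) of an unbuffered distributed RC wire: ~0.38 * r * c * L^2."""
    return 0.38 * r_ohm_per_um * c_f_per_um * length_um ** 2

# Hypothetical wire parameters (not taken from the slide).
R_PER_UM = 0.1       # ohm per micron
C_PER_UM = 0.2e-15   # farad per micron

for length in (50, 200, 800):
    print(length, "um:", distributed_rc_delay(length, R_PER_UM, C_PER_UM), "s")
```

Doubling the length quadruples the delay, which is why replacing a long horizontal route with a short vertical one can pay off even when the via itself is not free.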
A “Typical” 2D System Design
[Figure: a CPU core with L1 I-Cache, L1 D-Cache, and an L2 unified cache, connected through a memory controller and an L2-to-main-memory external bus on the board to seven DRAM chips; the path to DRAM is labeled “Memory Bottleneck”]
A 3D Memory System
[Figure: Layer 1: CPU core with L1 I-Cache and L1 D-Cache. Layer 2: L2 unified cache, reached over an L1-to-L2 vertical interlayer bus. Layers 3 to 18: stacked three-dimensional main memory, reached over an L2-to-main-memory vertical interlayer bus.]
Bus parameters: 8 bytes to 128 bytes wide, 200 MHz to 2 GHz
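The width and frequency ranges translate directly into peak bus bandwidth. A minimal sketch, assuming one transfer per bus cycle (no double data rate); the helper name is ours:

```python
def peak_bus_bandwidth(width_bytes: int, freq_hz: float) -> float:
    """Peak bandwidth in bytes/s, assuming one transfer per bus cycle."""
    return width_bytes * freq_hz

low = peak_bus_bandwidth(8, 200e6)    # narrowest, slowest bus in the sweep
high = peak_bus_bandwidth(128, 2e9)   # widest, fastest bus in the sweep
print(f"{low / 1e9:.1f} GB/s to {high / 1e9:.0f} GB/s")
```

The sweep therefore spans a 160x range in peak bandwidth, most of which is only practical when the bus is a set of short vertical interlayer connections rather than board traces.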
System-Level Simulation
• Simulator: Sim-Alpha
• Processor: Alpha 21264
• Benchmarks: mcf, parser, twolf, with MinneSPEC reduced inputs
Main memory accesses per instruction: mcf 1.7%, parser 0.258472%, twolf 0.00062%
Effect of Bus Width and Frequency
[Figure: mcf execution time (s, 0 to 7) versus L2 cache size (KB, 10 to 10,000, log scale) for 3-D bus widths of 8, 16, 32, 64, and 128 bytes and a 2-D bus width of 8 bytes]
Only a few vias required
Effect of Clock Frequency: mcf
[Figure: execution time per instruction (ns, 0 to 3) versus clock frequency (MHz, 600 to 3000) for mcf (3-D) and mcf (2-D)]
Effect of Clock Frequency: parser
[Figure: execution time per instruction (ns, 0 to 1.4) versus clock frequency (MHz, 600 to 3000) for parser (3-D) and parser (2-D)]
Effect of Clock Frequency: twolf
[Figure: execution time per instruction (ns, 0 to 1.4) versus clock frequency (MHz, 600 to 3000) for twolf (3-D) and twolf (2-D)]
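Why does the gap between 2-D and 3-D depend on the benchmark and on the clock? A first-order model splits time per instruction into core time, which shrinks as the clock rises, and memory stall time, which is fixed in nanoseconds. This is a toy model of our own, not the Sim-Alpha setup: only mcf's 1.7% access rate comes from the simulation slide; the core CPI and both DRAM latencies are hypothetical, and stalls are assumed not to overlap with other work.

```python
def time_per_instruction_ns(clock_mhz: float, core_cpi: float,
                            mem_refs_per_instr: float, mem_latency_ns: float) -> float:
    """Core time shrinks with the clock; memory stall time does not."""
    core_ns = core_cpi * 1000.0 / clock_mhz
    stall_ns = mem_refs_per_instr * mem_latency_ns
    return core_ns + stall_ns

MCF_MEM_REFS = 0.017   # 1.7% main memory accesses per instruction (from the slide)
CORE_CPI = 1.0         # hypothetical
LAT_2D_NS = 100.0      # hypothetical off-chip DRAM latency
LAT_3D_NS = 20.0       # hypothetical stacked DRAM latency

for mhz in (600, 1800, 3000):
    t2 = time_per_instruction_ns(mhz, CORE_CPI, MCF_MEM_REFS, LAT_2D_NS)
    t3 = time_per_instruction_ns(mhz, CORE_CPI, MCF_MEM_REFS, LAT_3D_NS)
    print(f"{mhz} MHz: 2-D {t2:.2f} ns, 3-D {t3:.2f} ns")
```

In this model a benchmark with almost no main memory traffic, like twolf, gains almost nothing from a faster memory path, while a memory-bound one, like mcf, flattens out at high clock rates unless memory latency also drops.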
An Example Memory System
[Figure: a stack of CPU & L1 cache, L2 cache, and five DRAM layers with a heat sink, showing the thermal gradient through the stack]
Self-consistent Thermal Modeling
1. Insert the initial values of leakage and dynamic power for each layer.
2. Calculate the first thermal profile.
3. Based on the previous thermal profile, calculate the new power dissipation, considering that:
  - I_on decreases with temperature
  - I_leakage increases with temperature
4. Calculate the new temperature profile.
5. Is it convergent? If yes, finish; if no, return to step 3.
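The flow is a fixed-point iteration: power sets temperature, temperature sets leakage power, and the loop repeats until the temperature profile stops changing. A minimal sketch under loudly simplified assumptions of our own: a lumped 1-D stack with one thermal resistance per layer and the heat sink under layer 0, leakage growing as exp(β·ΔT), and no model of I_on falling with temperature. All parameter values and function names are hypothetical.

```python
import math

AMBIENT_K = 318.0  # hypothetical heat-sink reference temperature

def thermal_profile(power_w, r_th_k_per_w):
    """Per-layer temperatures for a lumped 1-D stack.

    Layer 0 sits on the heat sink. All heat generated in layers k..n-1
    must cross thermal resistance k on its way to the sink.
    """
    temps = []
    t = AMBIENT_K
    for k in range(len(power_w)):
        t += r_th_k_per_w[k] * sum(power_w[k:])
        temps.append(t)
    return temps

def self_consistent_temps(dynamic_w, leak_ref_w, r_th_k_per_w,
                          beta_per_k=0.02, tol_k=1e-3, max_iter=100):
    """Iterate power -> temperature -> power until the profile converges.

    Leakage is modeled as leak_ref * exp(beta * (T - AMBIENT_K)); the
    effect of I_on falling with temperature is not modeled here.
    """
    # Initial leakage and dynamic power, then the first thermal profile.
    power = [d + l for d, l in zip(dynamic_w, leak_ref_w)]
    temps = thermal_profile(power, r_th_k_per_w)
    for iteration in range(1, max_iter + 1):
        # New power dissipation based on the previous thermal profile.
        leak = [l0 * math.exp(beta_per_k * (t - AMBIENT_K))
                for l0, t in zip(leak_ref_w, temps)]
        power = [d + l for d, l in zip(dynamic_w, leak)]
        # New temperature profile, then the convergence check.
        new_temps = thermal_profile(power, r_th_k_per_w)
        if max(abs(a - b) for a, b in zip(new_temps, temps)) < tol_k:
            return new_temps, iteration
        temps = new_temps
    raise RuntimeError("no convergence (possible thermal runaway)")

# Hypothetical 3-layer stack: a hot logic layer on the sink, two cooler layers above.
temps, iters = self_consistent_temps([10.0, 2.0, 1.0], [2.0, 0.5, 0.5],
                                     [0.5, 0.2, 0.2])
print([round(t, 1) for t in temps], iters)
```

If leakage rises fast enough with temperature, the loop gain exceeds one and the iteration diverges; the sketch reports that as possible thermal runaway instead of returning a meaningless profile.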
3D Thermally-aware Performance Analysis: mcf
[Figure: execution time per instruction (1 to 3, left axis) and temperature (K, 330 to 400, right axis) versus frequency (MHz, 600 to 3000), showing 2-D and 3-D max chip temperature, the temperature constraint, and the minimum execution time in 2-D and in 3-D]
3D Thermally-aware Performance Analysis: twolf
[Figure: execution time per instruction (0.3 to 1.1, left axis) and temperature (K, 330 to 390, right axis) versus frequency (MHz, 600 to 3000), showing 2-D and 3-D max chip temperature, the temperature constraint, the maximum frequency allowed by that constraint, and the minimum execution time in 2-D and in 3-D]
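The thermally-aware plots pick an operating point by a simple rule: among frequencies whose maximum chip temperature stays within the constraint, take the one with the lowest execution time. A minimal sketch of that selection rule; the type, function name, and sweep values are invented for illustration and should not be read as the 2-D or 3-D results:

```python
from typing import NamedTuple, Optional

class OperatingPoint(NamedTuple):
    freq_mhz: int
    exec_ns_per_instr: float
    max_temp_k: float

def best_point(points: list[OperatingPoint],
               temp_limit_k: float) -> Optional[OperatingPoint]:
    """Fastest operating point whose peak temperature meets the constraint."""
    feasible = [p for p in points if p.max_temp_k <= temp_limit_k]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p.exec_ns_per_instr)

# Hypothetical sweep: faster clocks run hotter.
sweep = [
    OperatingPoint(1000, 1.20, 340.0),
    OperatingPoint(1800, 0.80, 355.0),
    OperatingPoint(2600, 0.62, 372.0),
    OperatingPoint(3000, 0.58, 385.0),
]
print(best_point(sweep, temp_limit_k=375.0))
```

Because a stacked design runs hotter at a given frequency, the constraint can cap 3-D at a lower clock than 2-D; whether 3-D still wins depends on how much memory latency it saves at that capped frequency.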
3D Memory Integration
• Many unaccounted-for effects
  - Effect of multiple cores and memory banks
  - Spatial variation
  - Temporal variation (thermal load balancing)
  - All of these are intimately tied to the integration method and packaging
• How to manage them?
  - Architecture and software will be increasingly involved
  - Exposing variation to higher levels
  - Huge demand for “models”, “sensors”, and “knobs”
  - Thermal, packaging, application, and architecture are all tangled
• Need to build models that capture all of these aspects
  - Models need to be “self-consistent”
3D Integration for Introspection
• Complex interactions across levels of abstraction make debugging, optimization, security, and analysis in general difficult
• The first requirement is visibility
  - Not just data capture: we need the ability to put together a cohesive picture of system interactions and correlate between them in a sound and non-intrusive manner
• The hardware/software boundary is uniquely situated
  - Piece the picture together from low-level events
• What would the programmer's wish list look like?
[Figure: processor floorplan (L1_BPU, Decode, Trace Cache, L2_BPU, Bus Control, MOB, ITLB, DTLB, L1 Cache, L2 Cache, FP Exec, UROM, FP Reg, Alloc, Rename, InstrQ1, InstrQ2, Sched, Int Reg, Retire, Int Exec, MemCtl) with signals routed “To Integrated Monitoring Hardware”]
What programmers want
Everything:
• 32-bit memory address
• 32-bit memory value
• 10-bit opcodes
• 2 × 5-bit register names
• 2 × 32-bit register values
• 10 bits of “status”
[Slide annotations: multipliers of 3x and 4x on these fields]
1892 bits per cycle ≈ 1 terabyte/sec at 4 GHz
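The headline figure is a unit conversion from bits per cycle to bytes per second. A quick check using the slide's 1892 bits per cycle and 4 GHz clock (decimal units; the helper name is ours):

```python
def trace_bandwidth_bytes_per_s(bits_per_cycle: int, clock_hz: float) -> float:
    """Raw bandwidth needed to export bits_per_cycle on every clock cycle."""
    return bits_per_cycle * clock_hz / 8

bw = trace_bandwidth_bytes_per_s(1892, 4e9)  # figures from the slide
print(f"{bw / 1e12:.2f} TB/s")
```

The result, about 0.95 × 10¹² bytes/s, is consistent with the slide's rounded “1 terabyte/sec”.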
Why programmers can't have it
• Interconnect is not free: huge cross-chip buses (OptBuf 285 µm, 20,000 buffers)
• Analysis is not free: significant processing required
• Extra cost of added heat: $15 budget for cooling
• Used by developers
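Reading “OptBuf 285um” as an optimal repeater spacing of 285 µm, the buffer count of a wide monitoring bus follows from how many segments each wire is cut into. A minimal sketch; only the 285 µm spacing comes from the slide, while the 1892-wire width (one wire per monitored bit) and the 3 mm route length are hypothetical, so the result illustrates the scale rather than reproducing the slide's 20,000:

```python
import math

def repeater_count(num_wires: int, length_um: float, spacing_um: float) -> int:
    """Repeaters needed if every wire is split into segments no longer than spacing_um.

    A wire of length L needs ceil(L / spacing) segments and therefore one
    fewer repeater than segments (the source driver is not counted).
    """
    segments = math.ceil(length_um / spacing_um)
    return num_wires * max(segments - 1, 0)

# Hypothetical bus: 1892 wires routed 3 mm across the die.
print(repeater_count(1892, 3000.0, 285.0))
```

Because the count scales with both bus width and route length, a monitoring bus that spans the die quickly reaches tens of thousands of repeaters, all of which burn area and power in every shipped chip.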
Cake + Eating It Too
• Need a way to provide cheap (or high-margin) HW to the masses
  - No paying for developer functionality
• Get developers the powerful analysis they crave
  - See everything at execution rate
• Provide “snap-on” functionality for developers
  - Separate chip for the analysis engine
  - Only hook it onto “developer” systems
• The idea is not limited to development systems
  - Security, error correction, confidentiality, accelerators, …
• 3D integration offers the potential
Thermal Impact
Conclusion: Opportunities + Challenges
• 3D Integration for Performance
  - Bring memory closer to those that use it
  - More bandwidth and lower latency
  - Requires few vias for big impact
  - Tricky system-level tradeoffs
• 3D Integration for Specialization
  - Integration offers a unique specialization opportunity
  - Requires rethinking of the integration process
  - Decouple commodity from niche
• Challenges
  - Cross-layer models: from app to package
  - Cross-layer optimization: both static and dynamic
  - Thermal management is everybody's problem
http://www.cs.ucsb.edu/~arch/
NSF CNS 0524771, NSF CCF 0702798, NSF CCF 0448654