Post on 21-Dec-2015
![Page 1: 1 Exploring Design Space for 3D Clustered Architectures Manu Awasthi, Rajeev Balasubramonian University of Utah](https://reader031.vdocuments.net/reader031/viewer/2022032704/56649d595503460f94a399c8/html5/thumbnails/1.jpg)
Exploring Design Space for 3D Clustered Architectures
Manu Awasthi, Rajeev Balasubramonian
University of Utah

3D Technologies
• Multiple layers of active devices
• Vertical interconnects between layers
[Figure: 2D chip (one device layer on silicon) versus 3D chip (Device Layer 1 and Device Layer 2 joined by a vertical interconnect, labeled "Very small, ~10 µm"). Courtesy: K. Bernstein, IBM]

Benefits of 3D
• Reduction of global interconnect
• Delay/Power reduction
• Higher bandwidth
• Mixed-technology integration
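
The reduction in global interconnect can be illustrated with a rough wire-length estimate. This is a minimal sketch, not the authors' model: it assumes Manhattan routing within a die, and it assumes the "~10 µm" label on the 3D Technologies slide is the die-to-die distance. The `Block` type, `wire_length_um`, and the 2000 µm spacing are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

VERTICAL_HOP_UM = 10.0  # assumed inter-die distance, taken from the "~10 µm" label


@dataclass(frozen=True)
class Block:
    x_um: float  # horizontal position of the block centre
    y_um: float
    die: int     # 0 for the bottom die, 1 for the top die


def wire_length_um(a: Block, b: Block) -> float:
    """Manhattan wire length between two blocks, including die-to-die hops."""
    horizontal = abs(a.x_um - b.x_um) + abs(a.y_um - b.y_um)
    vertical = abs(a.die - b.die) * VERTICAL_HOP_UM
    return horizontal + vertical


# Hypothetical floorplans: the same two blocks side by side vs. stacked.
side_by_side = wire_length_um(Block(0, 0, 0), Block(2000, 0, 0))
stacked = wire_length_um(Block(0, 0, 0), Block(0, 0, 1))
print(side_by_side, stacked)
```

Under these assumptions the stacked pair needs only the vertical hop, which is the intuition behind the delay and power benefits listed above.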

Previous Proposals
• Previously in 3D…
– Break and stack (Folding) [Puttaswamy et al.]
• Vertical stacking of active devices
[Figure: a RegFile block broken and stacked across layers. Intra-block latency is reduced, but all layers are active: HEAT!!!]

An alternative approach?
[Figure: 2D chip versus 3D chip with blocks placed on Die 0 and Die 1]
Prudent stacking can:
• Improve performance
• Result in a better thermal profile

Wire Delays and Performance
[Figure: Impact of wire delays. x-axis: extra delay (in clock cycles), 0 to 8; y-axis: percent slowdown, 0 to 50. One curve per wire: DCACHE-INTALU, IQ-INTALU, RENAME-IQ, L1D-L2, BPRED-ICACHE, ICACHE-DECODE, DECODE-RENAME, DCACHE-FPALU, FPALU-INTALU]
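
The slowdown plotted for each wire can be read as the relative loss in IPC when extra cycles are added to that wire. The slides do not state the exact formula, so this is a minimal sketch under that assumption; the function name and the sample IPC values are hypothetical and are not measurements from the talk.

```python
def percent_slowdown(base_ipc: float, delayed_ipc: float) -> float:
    """Relative IPC loss, in percent, caused by adding wire delay."""
    if base_ipc <= 0:
        raise ValueError("base_ipc must be positive")
    return (base_ipc - delayed_ipc) / base_ipc * 100.0


# Illustrative values only (not read off the chart).
print(percent_slowdown(2.0, 1.5))  # 25.0
```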

Clustered Architectures
• Centralized front-end
– I-Cache & D-Cache
– LSQ, Rename, Decode
– Branch Predictor
• Clustered back-end
– Issue Queue
– Regfile, FUs
[Figure: front-end, L1 DCache, and clusters connected by a crossbar/router]
Higher clock frequency, high ILP!!

Decentralized Cache Banks
[Figure: multiple L1 DCache banks]
Possibly better performance

Decentralized Cache Banks
Replicated Cache Banks
[Figure: replicated L1 DCache banks]
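
A toy model can make the replicated organization concrete. The slides do not describe the write policy, so this sketch assumes every store is broadcast to all banks and every load is served by the bank nearest the requesting cluster. `ReplicatedL1` and its methods are hypothetical names, not part of any simulator.

```python
class ReplicatedL1:
    """Toy model: every bank holds a full copy of the L1 data."""

    def __init__(self, num_banks: int) -> None:
        self.banks = [dict() for _ in range(num_banks)]

    def store(self, addr: int, value: int) -> int:
        # Assumed policy: broadcast the write so all replicas stay identical.
        for bank in self.banks:
            bank[addr] = value
        return len(self.banks)  # number of banks written

    def load(self, addr: int, cluster: int):
        # A cluster reads only from its own (nearest) bank.
        return self.banks[cluster % len(self.banks)].get(addr)


cache = ReplicatedL1(num_banks=4)
banks_written = cache.store(0x40, 7)
print(banks_written, cache.load(0x40, cluster=3))
```

Under this assumption loads stay local, while each store costs one write per bank.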

Decentralized Cache Banks
Word Interleaved Cache Banks
[Figure: two L1 DCache banks, one holding odd words and one holding even words]
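
The odd/even split amounts to selecting a bank from the low bits of the word index. This is a minimal sketch assuming an 8-byte word (the slides do not give a word size); `WORD_BYTES` and `bank_for_address` are hypothetical names.

```python
WORD_BYTES = 8  # assumed word size; the slides do not give one


def bank_for_address(addr: int, num_banks: int = 2) -> int:
    """Return the bank that holds the word containing byte address addr."""
    word_index = addr // WORD_BYTES
    return word_index % num_banks


# With two banks, bank 0 holds even words and bank 1 holds odd words.
for addr in (0, 8, 16, 24):
    print(hex(addr), "-> bank", bank_for_address(addr))
```

Unlike replication, each word lives in exactly one bank, so an access may have to travel to a bank that is not the nearest one.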

Outline
• Introduction
– Motivation
– 3D Architectures
– Clustered Architectures
• Proposals
• Results
• Conclusions

Architecture 1
Cache-on-cluster
[Figure: two-die layout (Die 0, Die 1). Legend: cache bank, cluster, inter-die interconnect, intra-die interconnect]

Architecture 2
Cluster-on-cluster
[Figure: two-die layout (Die 0, Die 1). Legend: cache bank, cluster, inter-die interconnect, intra-die interconnect]

Architecture 3
Staggered
[Figure: two-die layout (Die 0, Die 1). Legend: cache bank, cluster, inter-die interconnect, intra-die interconnect]

Experimental Setup
• Framework
– SimpleScalar, Wattch and HotSpot 3.0
– Wire model: 8X global metal plane
• Benchmarks
– SPEC 2K, single threaded
• Processor Configuration
– 8 Clusters
– 64 kB L1 I/D Caches, 2-way set-assoc
• L1 Data cache Word-Interleaved or Replicated
• 2D Centralized Cache – Base Case

Base Case Performances
[Figure: Performance improvement wrt 2D centralized cache. x-axis: cache bank type (Replicated, WI); y-axis: performance improvement, 0.0 to 9.0. Annotation: Best Case 2D Config]

The 3D Effect
[Figure: Average performance improvement, 3D Replicated vs 2D Centralized. x-axis: Arch 1, Arch 2, Arch 3; y-axis: percentage improvement over 2D centralized, 0 to 16]

The 3D Effect
[Figure: Average performance improvement, 3D WI vs 2D Centralized. x-axis: Arch 1, Arch 2, Arch 3; y-axis: percentage improvement over centralized, 0 to 25]

Comparisons
[Figure: three panels, each plotting IPC improvement on a 0 to 25 scale.
– 3D Replicated: average performance improvement wrt 2D centralized for Arch 1, Arch 2, Arch 3 (annotation: Best Case 3D - Rep)
– 3D WI: average performance improvement wrt 2D centralized for Arch 1, Arch 2, Arch 3 (annotation: Best Case 3D - WI)
– 2D Case: base case performance comparisons for Replicated and WI (annotation: Best Case 2D)]
12% improvement for best case 3D vs best case 2D
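
The charts report each configuration relative to the 2D centralized baseline, while the headline figure compares best-case 3D directly against best-case 2D. Converting between the two is a ratio, not a subtraction. This is a minimal sketch, assuming improvements are ratios of IPC; the input percentages are hypothetical and are not read off the charts.

```python
def relative_improvement(pct_a: float, pct_b: float) -> float:
    """Improvement of A over B (percent), given each one's
    improvement over a common baseline (percent)."""
    speedup_a = 1.0 + pct_a / 100.0
    speedup_b = 1.0 + pct_b / 100.0
    return (speedup_a / speedup_b - 1.0) * 100.0


# Hypothetical inputs: A is 32% over the baseline, B is 10% over it.
print(round(relative_improvement(32.0, 10.0), 1))  # 20.0
```

Subtracting the two percentages (22 in this example) would overstate the gain of A over B.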

Thermal Analysis
• Wattch for power numbers
• HotSpot 3.0 for thermal model (grid)
– 500x500 grid resolution
• Interconnect power modeling
– Attributed to functional units
– 8X plane wires
– Router + Crossbar modeled as separate entity
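
Once a grid of temperatures is available, the peak temperature of the hottest unit can be found by scanning the cells each unit covers. This is a minimal post-processing sketch and does not use HotSpot's interface or output format; the 4x4 grid, the unit rectangles, and all temperature values are made up for illustration (the talk uses a 500x500 grid).

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive


def peak_per_unit(grid: List[List[float]],
                  units: Dict[str, Rect]) -> Dict[str, float]:
    """Hottest grid cell inside each unit's rectangle."""
    peaks = {}
    for name, (r0, c0, r1, c1) in units.items():
        peaks[name] = max(grid[r][c]
                          for r in range(r0, r1)
                          for c in range(c0, c1))
    return peaks


# Tiny grid with made-up temperatures in Celsius.
grid = [
    [60.0, 61.0, 70.0, 71.0],
    [60.5, 62.0, 72.0, 75.0],
    [58.0, 59.0, 65.0, 66.0],
    [57.0, 58.5, 64.0, 63.0],
]
units = {"cluster0": (0, 2, 2, 4), "cache0": (2, 0, 4, 2)}
peaks = peak_per_unit(grid, units)
hottest = max(peaks, key=peaks.get)
print(hottest, peaks[hottest])
```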

Thermal Profiles
[Figure: Peak temperature of the hottest on-chip unit (Celsius), 0 to 120, for Base, Arch 1, Arch 2, Arch 3]

Conclusions
• Wire delays are critical to performance
– Some are more important than others.
• Prudent block stacking
– Performance improvement up to 12% over 2D
• WI banks + Arch 3 (3D)
– Better thermal profiles compared to folding
Backup Slides

4 Cluster Arrangements
[Figure: (a) Arch-1 (cache-on-cluster), (b) Arch-2 (cluster-on-cluster), (c) Arch-3 (staggered), each shown across Die 0 and Die 1. Legend: cluster, cache bank, intra-die horizontal wire, inter-die vertical wire]