energy/performance design
DESCRIPTION
Energy/Performance Design. Memory Hierarchies in Intelligent Memories:. Wei Huang, Jose Renau, Seung-Moon Yoo and Josep Torrellas. University of Illinois at Urbana-Champaign. Motivation. Advances in technology: Processor and Memory integration Many processors on a chip - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/1.jpg)
Energy/Performance Design
Wei Huang, Jose Renau, Seung-Moon Yoo and Josep Torrellas
Memory Hierarchies in Intelligent Memories:
University of Illinois at Urbana-Champaign
![Page 2: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/2.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 20002
Motivation
Advances in technology: Processor and Memory integration Many processors on a chip
How to design for high performance Energy consumption is a big concern Problems in cooling system
![Page 3: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/3.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 20003
Goals of this work
Evaluate trade-offs in memory hierarchy Energy consumption Performance Area requirements
Detailed energy consumption analysis
![Page 4: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/4.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 20004
Findings of the Work
Modest cache size is necessary Easy modifications in memory reduce
energy consumption
![Page 5: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/5.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 20005
The FlexRAM Architecture
Yi Kang, Wei Huang, Seung-Moon Yoo, Diana Keen, Zhenzhou Ge, Vinh Lam, Pratap Pattanaik and Josep Torrellas - ICCD99
![Page 6: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/6.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 20006
Chip Architecture
64 nodes, each one includes: 2-issue processor @800Mhz 1MByte DRAM (12 clk) Row Buffers (6 clk) Cache (1 clk)
![Page 7: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/7.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 20007
How a Memory Bank Works
4 Memory sub-banks, each 256KBytes 5 Row Buffers, each 1KByte 1 Data Buffer 256bits
![Page 8: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/8.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 20008
Small Area Memory Banks+More Energy
+Less Spatial Locality
+More Localities
![Page 9: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/9.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 20009
Pipelining the requests
Faster memory system without increased energy consumption
![Page 10: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/10.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200010
Advanced Memory Banks I
-Less Energy and Contention
+More Area
![Page 11: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/11.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200011
Advanced Memory Banks II
-Less Energy and Contention
+More area
![Page 12: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/12.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200012
Terminology for Memory Systems
Trad(i,j): Traditional S(i,j): Segmented IS(i,j): Interleaved Segmented ISP(i,j): Interleaved Segmented Pipelined
i : Degree of interleaving j : Number of sub-banks per interleaving
way
![Page 13: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/13.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200013
Energy and Area Issues
Access Type Trad(1,4) S(1,4) IS(2,4) IS(2,8)Cache hit (8KB) 191pj 191pj 191pj 191pjRB Hit 468pj 468pj 506pj 517pjBank Access 6999pj 3729pj 2287pj 1556pj
•More advanced configurations:
•More Area
•Less Energy
Trad(1,4) S(1,4) IS(2,4) IS(2,8)Area (.18m) 4.25mm2 4.25mm2 4.83mm2 5.23mm2
![Page 14: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/14.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200014
Evaluation Environment
Fixed parameters: 2-issue processor @800MHz Prefetch Cache, RB, Bank latencies (1,6,12 cycles)
Variable parameters: Cache sizes (256B,1KB,8KB,16KB) Memory Banks:
Trad(1,4),S(1,4),SP(1,4)
IS(2,4),ISP(2,4),IS(2,8),ISP(2,8)
![Page 15: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/15.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200015
ApplicationsApplic What It Does Cache Hit Rate
(%)GTree DM Tree Generation 50.7
DTree DM Tree Deployment 98.6
BSOM BSOM Neural Network 94.7
BLAST Protein Matching 96.9
Mpeg Mpeg-2 Motion Estimation 99.9
FIC Fractal Image Compressor 97.8
![Page 16: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/16.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200016
Performance: Memory Banks
•Small performance improvement in advanced configurations with 1KByte cache
![Page 17: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/17.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200017
Performance: Cache Effect
•Modest cache size is required for performance
![Page 18: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/18.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200018
Energy-Delay Product
•Big improvement in energy-delay product with more advanced memory configurations
![Page 19: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/19.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200019
Energy-Delay Product: Cache
•8KBytes have the best energy-delay product
![Page 20: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/20.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200020
Conclusions
Modest size cache is enough (8KBytes) Improves performance Reduces energy consumption
Segmentation S(1,4) Reduces energy consumption
When area is available: use interleaving IS(2,4) increases by 14% the area
![Page 21: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/21.jpg)
Backup Slides
![Page 22: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/22.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200022
Area-Delay Product: MB
•SP(1,4) best are utilization
![Page 23: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/23.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200023
Area-Delay Product: Cache
•8KBytes is a sweet point for area-delay product
![Page 24: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/24.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200024
Power Consumption: Cache
•Power is a bad metric, only useful as a constraint
![Page 25: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/25.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200025
Memory Access TimingOperation Cycles
X-address buffer 1
X-address decoder 1
Wordline enabling 2
Charge sharing 2
Bit line sensing 2
DRAM Data buffer 2
L1 Cache 1
Total 11
![Page 26: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/26.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200026
Area RequirementsCache size Area256B 0.071K 0.168K 0.6016K 1.15
Bank size Area(1,4) 4.25(2,4) 4.83(2,8) 5.23
![Page 27: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/27.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200027
Small Area Memory Banks+Energy
+Spatial Locality
+Localities
![Page 28: Energy/Performance Design](https://reader036.vdocuments.net/reader036/viewer/2022081516/5681401a550346895dab6893/html5/thumbnails/28.jpg)
Workshop on Scalable Shared Memory Multiprocessors - June 10, 200028
Advanced Memory Banks
Less Energy and Contention
More area
Less Energy and Contention
Even more area