BALANCED CACHE
Ayşe BAKIR, Zeynep ZENGİN
Ayse Bakır,CMPE 511,Bogazici University
Outline
• Introduction
• Motivation
• The B-Cache Organization
• Experimental Methodology and Results
• Programmable Decoder Design
• Analysis
• Related Work
• Conclusion
Introduction
The increasing gap between memory latency and processor speed is a critical bottleneck to achieving a high-performance computing system.
A multilevel memory hierarchy has been developed to hide the memory latency.
Introduction
[Figure: memory hierarchy with processor, level 1 cache, level 2 cache, and main memory]
Because the level one cache normally resides on a processor's critical path, fast access to the level one cache is important for improved processor performance.
Introduction
Two cache organization models have been developed:
• Direct-mapped cache
• Set-associative cache
Introduction
1. Direct-Mapped Cache:
Introduction
2. Set Associative Cache:
Introduction
Direct-Mapped Cache
• faster access time
• consumes less power per access
• consumes less area
• easy to implement
• simple to design
• higher miss rate
Set-Associative Cache
• longer access time
• consumes more power per access
• consumes more area
• reduces conflict misses
• has a replacement policy
Introduction
Frequent hit sets have many more cache hits than other sets.
Cache misses occur more frequently in frequent miss sets.
Less accessed sets account for less than 1% of the total cache references.
Introduction
Balanced Cache (B-Cache): a mechanism that provides the benefit of cache block replacement while maintaining the constant access time of a direct-mapped cache.
Introduction
1. The decoder length of a traditional direct-mapped cache is increased by three bits:
  • accesses to heavily used sets can be reduced to 1/8th of those in the original design.
  • only 1/8th of the memory address space has a mapping to the cache sets.
2. A replacement policy is added.
3. A programmable decoder is used.
Motivation - Example
8-bit addresses
Access sequence: 0, 1, 8, 9, ... 0, 1, 8, 9
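The slide's figure is not preserved in this transcript, so the following is only a minimal sketch of why such a sequence is a problem. It assumes the numbers are block addresses, that the cache holds 8 blocks, and that the set-associative variant uses LRU replacement; none of these details are stated on the slide, and `count_misses` is a hypothetical helper.

```python
from collections import OrderedDict

def count_misses(trace, num_sets, ways):
    """Count misses for a cache with `num_sets` sets of `ways` blocks each (LRU)."""
    # One LRU-ordered dict per set; keys are the block addresses currently cached.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in trace:
        lines = sets[block % num_sets]      # low-order bits select the set
        if block in lines:
            lines.move_to_end(block)        # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the least recently used block
            lines[block] = True
    return misses

trace = [0, 1, 8, 9, 0, 1, 8, 9]            # assumed to be block addresses
dm_misses = count_misses(trace, num_sets=8, ways=1)       # direct-mapped, 8 blocks
two_way_misses = count_misses(trace, num_sets=4, ways=2)  # 2-way, same capacity
print(dm_misses, two_way_misses)
```

Under these assumptions, blocks 0 and 8 (and likewise 1 and 9) compete for the same direct-mapped set and keep evicting each other, while the 2-way cache can hold both members of each pair at once.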
Motivation - Example
8-bit address
Same as in the 2-way cache
X: invalid PD entry
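To illustrate how a programmable decoder (PD) could produce the same behaviour as a 2-way cache, here is a toy model. It is a sketch under stated assumptions, not the authors' hardware design: the class name `BCacheSketch`, the field layout (low bits for the non-programmable index, next bits for the PD index), the choice of 2 + 2 index bits with 2 lines per group, LRU victim selection, and the rule that a PD hit with a tag mismatch replaces the selected line in place are all assumptions made for this example.

```python
class BCacheSketch:
    """Toy B-Cache model on block addresses (assumed layout, illustrative only)."""

    def __init__(self, npi_bits, pi_bits, bas):
        self.npi_bits, self.pi_bits, self.bas = npi_bits, pi_bits, bas
        # Each group selected by the non-programmable index holds `bas` lines.
        # A line is [pd_entry, tag, last_use]; pd_entry None models an invalid
        # PD entry (the "X" on the slide).
        self.groups = [[[None, None, 0] for _ in range(bas)]
                       for _ in range(1 << npi_bits)]
        self.clock = 0

    def access(self, block):
        """Return True on a hit, False on a miss (the block is then installed)."""
        self.clock += 1
        group = self.groups[block & ((1 << self.npi_bits) - 1)]
        pd_bits = (block >> self.npi_bits) & ((1 << self.pi_bits) - 1)
        tag = block >> (self.npi_bits + self.pi_bits)
        for line in group:
            if line[0] == pd_bits:               # PD hit: one line is selected
                hit = line[1] == tag
                line[1], line[2] = tag, self.clock   # tag mismatch: replace in place
                return hit
        # PD miss: certainly a cache miss. Prefer an invalid line, else the LRU
        # line, and reprogram its PD entry for the new block.
        victim = min(group, key=lambda l: (l[0] is not None, l[2]))
        victim[0], victim[1], victim[2] = pd_bits, tag, self.clock
        return False

cache = BCacheSketch(npi_bits=2, pi_bits=2, bas=2)   # 4 groups x 2 lines = 8 blocks
results = [cache.access(b) for b in [0, 1, 8, 9, 0, 1, 8, 9]]
bcache_misses = results.count(False)
print(results, bcache_misses)
```

With these assumed parameters, blocks 0 and 8 fall into the same group but program different PD entries, so both stay resident and only the first four accesses miss.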
B-Cache Organization - Terminology
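Memory address mapping factor (MF): $\mathrm{MF} = 2^{PI+NPI} / 2^{OI}$, where $\mathrm{MF} \geq 1$
B-Cache associativity (BAS): $\mathrm{BAS} = 2^{OI} / 2^{NPI}$, where $\mathrm{BAS} \geq 1$
PI: index length of the PD (programmable decoder)
NPI: index length of the NPD (non-programmable decoder)
OI: index length of the original direct-mapped cache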
B-Cache Organization
$\mathrm{MF} = 2^{PI+NPI} / 2^{OI} = 2^{6+6} / 2^{9} = 8$
$\mathrm{BAS} = 2^{OI} / 2^{NPI} = 2^{9} / 2^{6} = 8$
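The two definitions can be checked with a few lines of code. This is a minimal sketch; the function name `bcache_parameters` and its error handling are illustrative choices, not part of the slides.

```python
def bcache_parameters(pi, npi, oi):
    """Return (MF, BAS) for PD index length pi, NPD index length npi,
    and original direct-mapped index length oi (all in bits)."""
    if pi + npi < oi:
        raise ValueError("MF would be below 1: need PI + NPI >= OI")
    if npi > oi:
        raise ValueError("BAS would be below 1: need NPI <= OI")
    mf = 2 ** (pi + npi) // 2 ** oi     # memory address mapping factor
    bas = 2 ** oi // 2 ** npi           # B-Cache associativity
    return mf, bas

mf, bas = bcache_parameters(pi=6, npi=6, oi=9)
print(mf, bas)
```

Multiplying the two definitions gives MF × BAS = 2^PI, so for a fixed PD length a larger MF implies a smaller BAS and vice versa; with PI = 6 the product is 64.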
B-Cache Organization – Replacement Policy
Random policy: simple to design and needs very little extra hardware.
Least Recently Used (LRU): may achieve a better hit rate but has more area overhead than the random policy.
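The difference in bookkeeping between the two policies can be sketched as follows. The helper names, the eight candidate lines, and the timestamps are made up for illustration; the slides do not describe how either policy is implemented in hardware.

```python
import random

def pick_victim_random(candidates, rng=random):
    """Random policy: any candidate may be evicted; no usage history is kept."""
    return rng.choice(candidates)

def pick_victim_lru(candidates, last_use):
    """LRU policy: evict the candidate with the oldest access time,
    which requires extra state (`last_use`) per line."""
    return min(candidates, key=lambda line: last_use[line])

lines = [0, 1, 2, 3, 4, 5, 6, 7]     # 8 candidate lines (illustrative)
last_use = {0: 17, 1: 4, 2: 23, 3: 9, 4: 30, 5: 12, 6: 8, 7: 21}  # made-up timestamps

lru_victim = pick_victim_lru(lines, last_use)
random_victim = pick_victim_random(lines, random.Random(0))
print(lru_victim, random_victim)
```

The per-line `last_use` table is the software analogue of the extra area that LRU costs, while the random policy only needs a source of random numbers.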
Experimental Methodology and Results
The miss rate is used as the primary metric to measure the effectiveness of the B-Cache and to determine the MF and BAS parameters.
Results are compared with a baseline level one cache (a direct-mapped 16kB cache with a line size of 32 bytes, for both the instruction and data caches).
A 4-issue out-of-order processor simulator is used to collect the miss rates. 26 SPEC2K benchmarks are run using the SimpleScalar tool set.
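As a quick check of the baseline geometry, the sketch below derives the address fields of a 16kB direct-mapped cache with 32-byte lines. The 32-bit address width is an assumption (the slides do not state it), and `direct_mapped_fields` is a hypothetical helper that expects power-of-two sizes.

```python
def direct_mapped_fields(cache_bytes, line_bytes, address_bits=32):
    """Return (num_lines, offset_bits, index_bits, tag_bits) for a
    direct-mapped cache; sizes are assumed to be powers of two."""
    num_lines = cache_bytes // line_bytes
    offset_bits = line_bytes.bit_length() - 1   # log2 of the line size
    index_bits = num_lines.bit_length() - 1     # log2 of the number of lines
    tag_bits = address_bits - index_bits - offset_bits
    return num_lines, offset_bits, index_bits, tag_bits

fields = direct_mapped_fields(16 * 1024, 32)    # baseline from the slide
print(fields)
```

The resulting 9 index bits agree with the 2^9 that appears in the MF and BAS calculation earlier in the slides.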
Experimental Methodology and Results
[Figure: comparison of a 16-entry victim buffer, set-associative caches, and B-Caches with different MFs]
Experimental Methodology and Results
[Figure: comparison of a 16-entry victim buffer, set-associative caches, and B-Caches with different MFs]
The miss rate reduction of the B-Cache is as good as that of a 4-way cache for the data cache.
For the instruction cache, the miss rate reduction is on average 5% better than that of a 4-way cache.
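For readers unfamiliar with the metric, the sketch below shows how a miss rate reduction is typically computed. It assumes the usual relative definition against a baseline; the slides do not spell out the formula, and the two miss rates used here are made-up numbers, not results from the study.

```python
def miss_rate_reduction(baseline_miss_rate, new_miss_rate):
    """Relative miss rate reduction in percent (assumed definition)."""
    return (baseline_miss_rate - new_miss_rate) / baseline_miss_rate * 100.0

# Hypothetical values for illustration only, not measured data.
reduction = miss_rate_reduction(baseline_miss_rate=0.05, new_miss_rate=0.02)
print(f"{reduction:.1f}%")
```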
Zeynep Zengin, CMPE511, Bogazici Univ.
Programmable Decoder Design
• Latency
• Storage
• Power costs
Timing Analysis
• Critical path
  – Direct mapped: tag side
  – B-Cache: may be on the tag side or the data side
• The B-Cache modifies the local decoder
Timing Analysis
Storage Overhead
• The B-Cache additionally uses CAM cells
• A CAM cell is 25% larger than the SRAM cell used by the data and tag memory
Power Overhead
• Extra power consumption: the PD of each subarray
• Power reduction:
  – 3-bit data length reduction
  – Removal of 3-input NAND gates
ANALYSIS
• Overall Performance
• Overall Energy
• Design Tradeoffs for MF and BAS for a Fixed Length of PD
• Balance Evaluation
• The Effect of L1 Cache Sizes
• Comparison
Overall Performance
Overall Energy
• Static and dynamic power dissipation
• Charging and discharging of the load capacitance
• Memory related
  – On-chip caches
  – Off-chip memory
Design Tradeoffs for MF and BAS for a Fixed Length of PD
The question is which design achieves a higher miss rate reduction.
Design Tradeoffs for MF and BAS for a Fixed Length of PD
Balance Evaluation
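• Frequent hit sets: hits 2 times higher than the average
• Frequent miss sets: misses 2 times higher than the average
• Less accessed sets: accesses below half of the average

| | fhs | ch | fms | cm | las | tca |
|---|---|---|---|---|---|---|
| DM AVE | 7.5 | 57.2 | 5.6 | 36.5 | 50.2 | 10.5 |
| BC | 7.6 | 39.8 | 2.2 | 15.7 | 32.4 | 8.4 |

fhs: frequent hit sets, ch: cache hits, fms: frequent miss sets, cm: cache misses, las: less accessed sets, tca: total cache accesses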
• The miss rate reductions increase when the MF is increased
• For the B-Cache, the design with MF = 8 and BAS = 8 is the best
Comparison
• With a victim buffer: the miss rate reduction of the B-Cache is higher than that of the victim buffer
• With a highly associative cache (HAC):
  – The HAC is for low-power embedded systems
  – The HAC is an extreme case of the B-Cache, where the decoder of the HAC is fully programmable
RELATED WORK
• Reduce the miss rate of direct mapped caches
• Reduce the access time of set associative caches
Reducing Miss Rate of Direct Mapped Caches
Techniques:
• Page allocation
• Column associative cache
• Adaptive group associative cache
• Skewed associative cache
Reducing Access Time of Set-associative Caches
• Partial address matching: predicting the hit way
• Difference bit cache
B-CACHE SUMMARY
• B-cache can be applied to both high performance and low-power embedded systems.
• Balanced without any software intervention.
• Feasible and easy to implement
Conclusion
• The B-Cache allows the accesses to cache sets to be balanced by increasing the decoder length and incorporating a replacement policy into a direct-mapped cache design.
• Programmable decoders dynamically determine which memory addresses have a mapping to the cache sets.
• A 16kB level one B-Cache outperforms a traditional direct-mapped cache of the same size by 64.5% and 37.8% for the instruction and data caches, respectively.
• Average IPC improvement: 5.9%
• Energy reduction: 2%
• Access time: same as a traditional direct-mapped cache
References
1. C. Zhang, "Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches through Programmable Decoders", ISCA 2006, IEEE.
2. C. Zhang, "Balanced Instruction Cache: Reducing Conflict Misses of Direct-Mapped Caches through Balanced Subarray Accesses", IEEE Computer Architecture Letters, May 2005.
3. B. Wilkinson, "Computer Architecture: Design and Performance", Prentice Hall Europe, 1996.
4. University of Maryland, http://www.cs.umd.edu/class/fall2001/cmsc411/proj01/cache/cache.html