![Page 1: Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines](https://reader036.vdocuments.net/reader036/viewer/2022062314/56813061550346895d962c14/html5/thumbnails/1.jpg)
1
Line Distillation: Increasing Cache Capacity by
Filtering Unused Words in Cache Lines
Moinuddin K. Qureshi, M. Aater Suleman, Yale N. Patt
HPCA 2007
2
Introduction
Caches are organized at line-size granularity
Helps when spatial locality is high
Leaves unused words when spatial locality is low
Unused words occupy space without contributing to cache hits
Filtering unused words allows cache to store more cache lines
3
Problem: Not all words are useful
On average, fewer than 60% of words are used (4.7 of 8)
Cache line (64B) divided into 8 words of 8B each (1MB 8-way L2 cache)
[Figure: words used per line (avg)]
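A quick check of the "less than 60%" figure, using only the numbers on this slide (4.7 used words out of 8 per line):

```latex
\frac{4.7}{8} = 0.5875 \approx 58.8\% \ \text{(used)}, \qquad
1 - \frac{4.7}{8} = 0.4125 \approx 41.2\% \ \text{(unused)}
```

Under these numbers, roughly 41% of each 64B line (about 26 of 64 bytes) holds words that are never referenced.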
4
Goal: Improving cache performance
Smaller linesize can result in fewer unused words
Smaller linesize degrades cache performance
Linesize of 32B increases MPKI for 14 of 16 benchmarks
Average MPKI increases by 25%
Insight: Word usage stabilizes as a line traverses from MRU to LRU
Goal: Improving cache performance by filtering unused words
5
Insight
Footprint = 8 bits per line that track word usage
Most footprint updates occur early in the recency stack
[Figure: max recency position before footprint update, plotted over the recency stack (MRU, Pos 1, Pos 2, Pos 3, Pos 4, Pos 5, Pos 6, LRU); labeled values: 78%, 5%, 6%, 11%]
Line Distillation (LDIS): Evict unused words when a line crosses a certain recency position
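A minimal sketch of the 8-bit footprint described above, assuming one bit per 8B word of a 64B line. The helper names (`word_index`, `touch`, `used_words`) and the example addresses are hypothetical, not taken from the talk.

```python
from typing import List

WORDS_PER_LINE = 8  # 64B line / 8B words, as on the slides
WORD_BYTES = 8


def word_index(address: int) -> int:
    """Index (0..7) of the 8B word that `address` falls in within its 64B line."""
    return (address // WORD_BYTES) % WORDS_PER_LINE


def touch(footprint: int, address: int) -> int:
    """Return the footprint with the bit for the accessed word set."""
    return footprint | (1 << word_index(address))


def used_words(footprint: int) -> List[int]:
    """List the word indices whose footprint bit is set."""
    return [i for i in range(WORDS_PER_LINE) if footprint & (1 << i)]


footprint = 0
for addr in (0x1000, 0x1004, 0x1038):  # two accesses to word 0, one to word 7
    footprint = touch(footprint, addr)
```

With this footprint, only words 0 and 7 would survive distillation; the other six words are the ones LDIS would evict.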
6
Outline
Background
Line Distillation
Experimental Evaluation
Interaction with Compression
Related Work and Summary
7
Framework for LDIS
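[Figure: PROCESSOR with ICACHE and DCACHE above the L2 Cache. The L2 is a Distill Cache made up of a Line Organized Cache (LOC) and a Word Organized Cache (WOC). Labels on the paths between DCACHE and L2: footprint, valid bits (sectored). Lines from memory enter the L2.]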
8
Distill Cache (Operation)
Traditional cache (4-way) vs. Distill Cache (LOC + WOC)
[Figure: lines B, A, C ordered from MRU to LRU; after the example, D is in the LOC and A0, A7 are in the WOC]
On eviction from LOC: words used? (A0, A7 used) Evict A[1:6], install A0, A7 in WOC
Four cases:
1. Cache Miss (access to line D): Install line D in LOC and update LRU state
2. LOC Hit (access to line B): Same as traditional cache
3. WOC Hit (access to line A, word A0): Send A0 and A7 to L1 along with valid bits
4. Hole Miss (access to line A, word A1): Invalidate all words of A in WOC. Fetch A from memory and install in LOC
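The four cases can be sketched for a single cache set as follows. This is a simplified model under stated assumptions, not the hardware design: the class name `DistillSet` is hypothetical, the footprint is tracked directly in the LOC, and the WOC is given unbounded capacity.

```python
from collections import OrderedDict


class DistillSet:
    """One cache set: an LRU-managed LOC plus a WOC holding distilled words."""

    def __init__(self, loc_ways):
        self.loc_ways = loc_ways
        self.loc = OrderedDict()  # line -> set of used word indices; last item is MRU
        self.woc = {}             # line -> word indices kept after distillation

    def _install_in_loc(self, line, word):
        if len(self.loc) >= self.loc_ways:
            victim, used = self.loc.popitem(last=False)  # evict the LRU line
            self.woc[victim] = used                      # distill: keep only used words
        self.loc[line] = {word}

    def access(self, line, word):
        if line in self.loc:                   # case 2: LOC hit
            self.loc[line].add(word)
            self.loc.move_to_end(line)         # promote to MRU
            return "LOC hit"
        if line in self.woc:
            if word in self.woc[line]:         # case 3: WOC hit
                return "WOC hit"
            del self.woc[line]                 # case 4: hole miss, invalidate in WOC
            self._install_in_loc(line, word)   # refetch the full line into LOC
            return "hole miss"
        self._install_in_loc(line, word)       # case 1: cache miss
        return "cache miss"


s = DistillSet(loc_ways=3)
for line, word in (("A", 0), ("A", 7), ("C", 0), ("B", 0)):
    s.access(line, word)                       # A (words 0 and 7 used) ends up LRU
results = [s.access("D", 0), s.access("B", 0), s.access("A", 0), s.access("A", 1)]
```

The last four accesses mirror the slide's example: D misses and pushes A out of the LOC (leaving A0 and A7 in the WOC), B hits in the LOC, A0 hits in the WOC, and A1 is a hole miss.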
9
Median Threshold Filtering
A line with many used words can evict several lines from WOC
Example: WOC holds A0 B0 C0 D0 E0 F0 G0 H0
Line X has all 8 words used
Installing X0 X1 X2 X3 X4 X5 X6 X7 evicts 8 lines from WOC
Increase the number of lines in WOC by not installing lines whose used-word count exceeds a threshold K
K = median number of words used per LOC line (computed at runtime)
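A minimal sketch of the filtering rule, assuming K is taken over a sample of per-line used-word counts. The slides do not say how the median is computed in hardware; the function names are hypothetical, and `statistics.median_low` is used here only so that K is always an integer word count.

```python
import statistics


def median_threshold(words_used_per_line):
    """K = median used-word count over the sampled LOC lines."""
    return statistics.median_low(words_used_per_line)


def should_install_in_woc(used_count, k):
    """Skip lines with more than K used words; install the rest in the WOC."""
    return used_count <= k


sample = [1, 2, 2, 3, 8, 8, 1, 4]   # made-up used-word counts for eight lines
k = median_threshold(sample)
decisions = [should_install_in_woc(n, k) for n in sample]
```

For this made-up sample, a line like X with all 8 words used is filtered out, so it cannot displace eight single-word lines from the WOC.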
10
Outline
Background
Line Distillation
Experimental Evaluation
Interaction with Compression
Related Work and Summary
11
Methodology
Configuration:
L2 cache: 1MB 8-way 64B linesize
(Distill cache gives 6 ways to LOC and 2 ways to WOC)
Out-of-order processor with 16KB 2-way L1s
400-cycle memory latency
Benchmarks:
15 SPEC2K benchmarks + health from the Olden suite
(250M-instruction slice chosen with SimPoint for the SPEC2K benchmarks)
12
Results
[Figure: % reduction in L2 MPKI for LDIS (No MT) and LDIS (with MT)]
LDIS (MT) reduces MPKI by 25%
13
Reverter Circuit (RC)
Tournament selection: Distill cache vs. traditional cache
Dynamic set sampling with 32 sets [Qureshi+ ISCA’06]
For sets A, C, D, F, H:
if (SCTR > 75%) Enable LDIS
if (SCTR < 25%) Disable LDIS
[Figure: an ATD-LRU holding sampled sets B, E, G next to the Distill cache with sets A through H; the sampled sets B, E, G of both structures update the SCTR counter (- / +)]
(storage overhead of ATD: 1KB)
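A minimal sketch of the SCTR decision rule with the 75% / 25% thresholds from the slide. The counter width, the starting value, and the update polarity (count up when only the ATD-LRU misses, count down when only the distill cache misses) are assumptions made for illustration; the class name `Reverter` is hypothetical.

```python
class Reverter:
    """Saturating counter that enables or disables LDIS for the follower sets."""

    def __init__(self, bits=10):               # counter width is an assumption
        self.max = (1 << bits) - 1
        self.sctr = self.max // 2              # start in the middle (assumption)
        self.ldis_enabled = True

    def update(self, atd_lru_miss, distill_miss):
        """Update SCTR from one access to a sampled set and return the LDIS state."""
        if atd_lru_miss and not distill_miss:      # assumed polarity: LDIS did better
            self.sctr = min(self.max, self.sctr + 1)
        elif distill_miss and not atd_lru_miss:    # traditional LRU did better
            self.sctr = max(0, self.sctr - 1)
        if self.sctr > 0.75 * self.max:
            self.ldis_enabled = True
        elif self.sctr < 0.25 * self.max:
            self.ldis_enabled = False
        return self.ldis_enabled                   # unchanged between the thresholds


rc = Reverter(bits=4)                              # small counter for the example
after_distill_misses = [rc.update(False, True) for _ in range(4)]
after_lru_misses = [rc.update(True, False) for _ in range(9)]
```

The gap between the two thresholds acts as hysteresis: LDIS keeps its current state until the counter moves decisively in one direction.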
14
Results with RC
[Figure: % reduction in L2 MPKI for LDIS (MT, No RC) and LDIS (MT, RC)]
RC disables LDIS when it increases MPKI.
LDIS (MT, RC) reduces MPKI by 30%
15
Overheads
Storage: Tags for WOC + footprint bits: 12.2% overhead
Latency: Tag access (LOC+WOC) increases by one cycle
Latency: WOC hits incur two cycles to rearrange words
Power: Additional power of WOC tag-store
16
IPC Results
LDIS improves average IPC by 12%
[Figure: % IPC improvement]
17
Outline
Background
Line Distillation
Experimental Evaluation
Interaction with Compression
Related Work and Summary
18
Compression vs. LDIS
Several proposals to increase capacity via compression
Compression and LDIS are fundamentally different
Compression exploits redundancy in stored data
LDIS leverages unused words for spare capacity
Footprint Aware Compression (FAC) combines both
FAC compresses used words before installing in WOC
19
Results for FAC
Compression and LDIS interact positively.
FAC reduces MPKI by 50%
[Figure: % reduction in L2 MPKI for LDIS, Compression, and FAC; y-axis from 0 to 50]
20
Outline
Background
Line Distillation
Experimental Evaluation
Interaction with Compression
Related Work and Summary
21
Related work
Spatial-Temporal Cache - Gonzales+ [ICS’95]
Spatial Locality Prediction - Johnson+ [ISCA’97]
Variable Linesize Cache - Veidenbaum+ [ICS’99]
Spatial Footprint Prediction - Kumar+ [ISCA’98], Pujara+ [HPCA’06]
Spatial Pattern Prediction - Chen+ [HPCA’05]
LDIS is particularly suited for large caches and outperforms predictor-based techniques without requiring a separate structure for tracking spatial footprint
22
Contributions
Line Distillation: Filter unused words without a separate footprint predictor
Distill cache: Utilize extra capacity created by LDIS
Median Threshold Filtering and Reverter Circuit: Improve performance and robustness of LDIS
Result: LDIS (MT+RC) reduces MPKI by 30%
Footprint Aware Compression: LDIS + compression
Result: FAC reduces MPKI by 50%
23
Questions
24
Result comparing capacity
25
Line Size vs. MPKI
26
Distribution of Hit-Miss
27
Average words usage (detailed)
28
Result for 3 types of LDIS
29
Replacement
LRU in LOC
WOC needs variable sized replacement
Only power-of-two sizes allowed in WOC
Placement constrained to alignment boundary
Random selection in case of multiple candidates
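A minimal sketch of the placement constraints listed above: sizes rounded up to a power of two, slots aligned to their own size, and a random pick among candidates. It assumes each WOC way holds 8 word slots and treats every aligned slot as a candidate; how the hardware narrows the candidate list is not stated on the slide, and all function names are hypothetical.

```python
import random

WORDS_PER_WAY = 8  # assumption: one WOC way holds 8 word slots (64B / 8B)


def slot_size(used_words):
    """Round a used-word count (1..8) up to the next power of two."""
    size = 1
    while size < used_words:
        size *= 2
    return size


def aligned_candidates(num_ways, size):
    """All (way, start) positions whose start is a multiple of `size`."""
    return [(way, start)
            for way in range(num_ways)
            for start in range(0, WORDS_PER_WAY, size)]


def choose_slot(num_ways, used_words, rng=random):
    """Pick one aligned slot at random; returns (way, start, size)."""
    size = slot_size(used_words)
    way, start = rng.choice(aligned_candidates(num_ways, size))
    return way, start, size


# A line with 3 used words needs a 4-word slot in one of the 2 WOC ways.
way, start, size = choose_slot(num_ways=2, used_words=3, rng=random.Random(0))
```

Restricting sizes to powers of two on aligned boundaries keeps the number of possible victim positions small, at the cost of some internal fragmentation (here, one unused word in the 4-word slot).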
30
Background (pictorial)
31
Result LDIS vs. FAC (detailed)
32
Comparison with SFP
33
Appendix A: Other SPEC Benchmarks
34
Appendix B: Cache Size vs. Density
35
Summary
Many words in cache lines remain unused
Unused words are unlikely to be accessed in the less recent part of the LRU stack; this motivates Line Distillation (LDIS)
Distill-cache utilizes extra capacity created by LDIS
LDIS reduces MPKI by 30% and improves IPC by 12%
“Footprint Aware Compression” combines LDIS and compression to reduce MPKI by 50%