Adaptive Subset Based Replacement Policy for High Performance Caching
Liqiang He Yan Sun Chaozhong Zhang
College of Computer Science, Inner Mongolia University
Hohhot, Inner Mongolia, P. R. China
JWAC-1: Cache Replacement Championship, 2010-06-20
ISCA-2010
Background
• Cache replacement policy plays an important role in cache design.
• The LRU policy is widely used in today's microprocessors.
• The LLC has poor locality because the L1 already filters out temporal locality.
• LRU causes thrashing when the working set > cache size.
Possible solution
• If the working set > cache size, retain part of the working set [Qureshi et al., ISCA’07]
• Record part of a longer cache access history
How do we do it?
Divide a cache set into groups and keep part of the access history in each group.
Inspired by Pierre's thread migration paper at HPCA’04
[Figure: L2 caches of cores C0, C1, …, Cn, shown next to L2 groups g0, g1, …, gn]
Overview
Proposal: Subset Based Replacement Policy (SRP)
SRP reduces misses by retaining part of a longer access history in the groups.
But a static SRP is not suitable for all programs.
To adapt to the diversity of programs and to behavior changes within a program, we propose the Adaptive SRP policy (ASRP).
ASRP obtains a 4.5% geometric average miss reduction over LRU.
Outline
Introduction
Static Subset Based Replacement Policy
Adaptive Subset Based Replacement Policy
Summary
Static Subset Based Replacement Policy
[Figure: a cache set divided into four subsets, each with a local LRU stack. The active subset accepts insertions; the other subsets are non-active.]
Insertion scheme in SRP
Insertion only occurs in the active subset
Choose the victim at the LRU position. Do NOT promote the new block to MRU
Blocks in the active subset (MRU to LRU): a b c d
Reference to ‘i’
Blocks in the active subset afterwards (MRU to LRU): a b c i
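The insertion rule can be sketched in a few lines of Python. This is only an illustration of the slide's example, not the authors' implementation: the list-based stack (index 0 is MRU, last index is LRU) and the name `srp_insert` are assumptions made for the sketch.

```python
def srp_insert(active_stack, new_block):
    """Insert new_block into the active subset on a miss.

    active_stack is a list ordered from MRU (index 0) to LRU (last index).
    The block at the LRU position is the victim, and the new block takes
    over that same LRU position instead of being promoted to MRU.
    Returns the evicted block.
    """
    victim = active_stack[-1]
    active_stack[-1] = new_block
    return victim


stack = ["a", "b", "c", "d"]        # MRU ... LRU, as on the slide
evicted = srp_insert(stack, "i")    # the reference to 'i' misses
print(stack, evicted)
```

Leaving the new block at the LRU position means the next miss in the same subset replaces it again unless it is reused first, so most of the subset's older contents survive a burst of misses.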
Operation on cache hit in SRP
On a hit in any (active or non-active) subset, move the block to the local MRU position
Before (MRU to LRU): a b c d
Reference to ‘c’
After (MRU to LRU): c a b d
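The hit rule is ordinary LRU promotion restricted to the subset that holds the block. A minimal sketch follows; the list representation (index 0 is MRU) and the name `srp_hit` are assumptions for illustration.

```python
def srp_hit(subset_stack, block):
    """Move a block that hit to the MRU position of its own subset.

    subset_stack is ordered from MRU (index 0) to LRU (last index).
    Only the local stack changes; other subsets are untouched.
    """
    subset_stack.remove(block)
    subset_stack.insert(0, block)


stack = ["a", "b", "c", "d"]   # MRU ... LRU, as on the slide
srp_hit(stack, "c")            # the reference to 'c' hits
print(stack)
```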
Changing the active subset
When the number of misses in a set exceeds a threshold X, change the active subset
Thus:
A. X consecutive misses are forced to replace only blocks in the active subset
B. With N subsets, a subset can become active again ONLY after (N-1)*X misses
C. The greater the value of X, the longer blocks in non-active subsets can stay in the set
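Point B can be checked with a small sketch of the rotation. The class name `ActiveSubsetSelector` and the round-robin order are assumptions, and the sketch switches once X misses have been counted, which is how the deck's worked example with X = 6 behaves.

```python
class ActiveSubsetSelector:
    """Tracks which subset of one cache set is active (hypothetical helper)."""

    def __init__(self, num_subsets, threshold):
        self.num_subsets = num_subsets
        self.threshold = threshold
        self.active = 0
        self.misses = 0

    def on_miss(self):
        """Return the subset that absorbs this miss, then update the state."""
        target = self.active
        self.misses += 1
        if self.misses >= self.threshold:
            # X misses have been absorbed: rotate to the next subset.
            self.misses = 0
            self.active = (self.active + 1) % self.num_subsets
        return target


N, X = 4, 6
selector = ActiveSubsetSelector(N, X)
targets = [selector.on_miss() for _ in range(2 * N * X)]

first_end = X                              # subset 0 absorbs misses 0 .. X-1
next_start = targets.index(0, first_end)   # first later miss sent to subset 0
gap = next_start - first_end               # misses absorbed by other subsets
print(gap)
```

With N = 4 and X = 6 the gap equals (N-1)*X, so blocks left in subset 0 are safe from replacement for that many misses.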
Thrashing access pattern in SRP
Access pattern: b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b15 b16 b17 … b24
X = 6
Assume the working set is 24 blocks and the LLC is 16-way, with 4 subsets of 4 blocks each
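[Figure: local LRU stacks of Subset 0 and Subset 1. Subset 0 first holds b1 b2 b3 b4 and ends with b6 b2 b3 b4 after b5 and b6 arrive; Subset 1 receives b7 to b12 in the same way.]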
Blocks in a set with SRP: b2 b3 b4 b6 | b8 b9 b10 b12 | b14 b15 b16 b18 | b20 b21 b22 b24
Blocks in a set with LRU: b9 … b24
When b2 b3 b4 b6 b8 are accessed again, SRP hits but LRU misses
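The example can be replayed with a small simulation of one cache set. This is a sketch under stated assumptions rather than the authors' simulator: the class names are hypothetical, cold fills of empty ways are placed at the MRU end (chosen so the result matches the slide), and the active subset rotates once X misses have been counted.

```python
class SRPSet:
    """One cache set managed by static SRP (illustrative sketch)."""

    def __init__(self, num_subsets=4, ways_per_subset=4, threshold=6):
        self.ways_per_subset = ways_per_subset
        self.threshold = threshold
        # Each subset is a local LRU stack: index 0 is MRU, last is LRU.
        self.subsets = [[] for _ in range(num_subsets)]
        self.active = 0
        self.misses = 0

    def access(self, block):
        """Return True on a hit, False on a miss."""
        for stack in self.subsets:
            if block in stack:
                stack.remove(block)
                stack.insert(0, block)      # hit: move to local MRU
                return True
        stack = self.subsets[self.active]
        if len(stack) < self.ways_per_subset:
            stack.insert(0, block)          # assumption: cold fill at MRU end
        else:
            stack[-1] = block               # replace LRU block, stay at LRU
        self.misses += 1
        if self.misses >= self.threshold:   # rotate after X misses
            self.misses = 0
            self.active = (self.active + 1) % len(self.subsets)
        return False


class LRUSet:
    """One cache set managed by plain LRU, for comparison."""

    def __init__(self, ways=16):
        self.ways = ways
        self.stack = []                     # index 0 is MRU

    def access(self, block):
        hit = block in self.stack
        if hit:
            self.stack.remove(block)
        elif len(self.stack) == self.ways:
            self.stack.pop()                # evict the LRU block
        self.stack.insert(0, block)
        return hit


srp, lru = SRPSet(), LRUSet()
for b in range(1, 25):                      # first pass over b1 .. b24
    srp.access(b)
    lru.access(b)

srp_contents = [sorted(s) for s in srp.subsets]
lru_contents = sorted(lru.stack)

reuse = [2, 3, 4, 6, 8]                     # blocks re-referenced on the slide
srp_hits = [srp.access(b) for b in reuse]
lru_hits = [lru.access(b) for b in reuse]
print(srp_contents, srp_hits, lru_hits)
```

After the first pass the SRP set still holds blocks from every part of the working set, while LRU only holds the 16 most recent blocks, which is why the re-references hit under SRP and miss under LRU.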
Case Study of thrashing workload
[Figure: misses per 1K instructions versus threshold (1 to 2k) for SRP and LRU]
Different static thresholds have different abilities to reduce misses
Hardware implementation
[Figure: hardware implementation diagram (surviving labels: MRU, LRU)]
Results
[Figure: improvement of misses over LRU (%) for SRP with threshold 2, threshold 4, and threshold 8]
• SRP reduces misses for thrashing workloads but increases them for LRU-friendly ones.
• No single threshold is suitable for all benchmarks.
Outline
Introduction
Static Subset Based Replacement Policy
Adaptive Subset Based Replacement Policy
Summary
Adaptive SRP policy
Different programs prefer different thresholds.
In ASRP policy:
• Victim selection and insertion policy are the same as in SRP.
• ONLY difference: the threshold is selected dynamically from a pool of values, according to which one causes the fewest misses.
• The maximum threshold is 128. Pick eight values: 2^0, 2^1, …, 2^7.
• Apply the best threshold value to the cache.
ASRP policy via “Set Dueling”
• Divide the cache sets into two types: sampling sets (eight thresholds * 4 sets per threshold) and follower sets
• Eight counters: a miss in one of threshold X's sampling sets increments counter_x
• The counters decide the threshold for the follower sets: the one whose counter has the smallest value
[Figure: sampling sets Thres-2^0-sets, Thres-2^1-sets, …, Thres-2^7-sets feed their misses into Cntr_0 … Cntr_7 (eight thresholds); the follower sets use the threshold selected by the counters]
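The selection logic can be sketched as follows. The slides give the pool of thresholds, four sampling sets per threshold, and one miss counter per threshold; how sampling sets are chosen is not stated, so the mapping below (the first 32 set indices, four per threshold) and the class name `ThresholdDuel` are assumptions for illustration.

```python
THRESHOLDS = [2 ** k for k in range(8)]   # 1, 2, 4, ..., 128
SETS_PER_THRESHOLD = 4


class ThresholdDuel:
    """Miss counters for the sampling sets plus follower selection (sketch)."""

    def __init__(self):
        self.counters = [0] * len(THRESHOLDS)

    def sampling_index(self, set_index):
        """Counter index for a sampling set, or None for a follower set.

        Assumption: sets 0..31 are sampling sets, four per threshold.
        """
        if set_index < len(THRESHOLDS) * SETS_PER_THRESHOLD:
            return set_index // SETS_PER_THRESHOLD
        return None

    def on_miss(self, set_index):
        """Count a miss only if it happens in a sampling set."""
        idx = self.sampling_index(set_index)
        if idx is not None:
            self.counters[idx] += 1

    def threshold_for(self, set_index):
        """Sampling sets keep their fixed threshold; followers use the best."""
        idx = self.sampling_index(set_index)
        if idx is not None:
            return THRESHOLDS[idx]
        best = min(range(len(THRESHOLDS)), key=self.counters.__getitem__)
        return THRESHOLDS[best]


duel = ThresholdDuel()
# Made-up miss counts: every threshold's sampling sets miss 10 times,
# except the sets dedicated to threshold 8 (index 3), which miss 3 times.
for idx in range(len(THRESHOLDS)):
    misses = 3 if idx == 3 else 10
    for _ in range(misses):
        duel.on_miss(idx * SETS_PER_THRESHOLD)

print(duel.threshold_for(100))   # set 100 is a follower set in this sketch
```

Only 32 sets pay the cost of running a possibly poor threshold; every other set follows whichever threshold is currently missing least.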
Resetting mechanism
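[Figure: flowchart for the eight thresholds. last_follow is compared with global_follow; if they are equal (Y) a count is incremented, otherwise (N) it is decremented. When the count exceeds a threshold, Cntr_0 … Cntr_7 are reset.]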
To avoid the cumulative effect of a large value in a specific Cntr_x:
• Record the number of times the same threshold is selected by the follower sets
• When that number exceeds a threshold, reset all the Cntr_x counters
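A possible reading of the resetting mechanism is sketched below. The names `last_follow` and `global_follow` come from the slide; the reset limit of 4, the floor of zero on the repeat count, and clearing the repeat count on reset are assumptions, since the slides do not give these details.

```python
class ResettingCounters:
    """Miss counters that are cleared when one threshold wins too often."""

    def __init__(self, num_thresholds=8, reset_limit=4):
        self.counters = [0] * num_thresholds
        self.reset_limit = reset_limit      # hypothetical value
        self.last_follow = None
        self.repeat = 0

    def select(self):
        """Pick the index with the fewest misses and apply the reset rule."""
        global_follow = min(range(len(self.counters)),
                            key=self.counters.__getitem__)
        if global_follow == self.last_follow:
            self.repeat += 1                        # same winner again
        else:
            self.repeat = max(0, self.repeat - 1)   # assumption: floor at 0
        self.last_follow = global_follow
        if self.repeat > self.reset_limit:
            # Forget the accumulated history so stale counts cannot dominate.
            self.counters = [0] * len(self.counters)
            self.repeat = 0
        return global_follow


rc = ResettingCounters(reset_limit=4)
rc.counters = [50, 7, 90, 90, 90, 90, 90, 90]   # made-up miss counts
picks = [rc.select() for _ in range(6)]
print(picks, rc.counters)
```

In this run index 1 wins every time, and the sixth selection pushes the repeat count past the limit, so all counters return to zero and the thresholds compete again from a clean start.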
Budget
45K bits in total
Only 70% of the budget used by the LRU policy, and 35% of the total budget provided by this championship
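The 70% figure can be sanity-checked with a little arithmetic. The 1 MB, 16-way LLC comes from the results slide; the 64-byte line size, the 4 bits of LRU state per line, and reading "45K" as 45 * 1024 are assumptions made for this check.

```python
CACHE_BYTES = 1 * 1024 * 1024      # 1 MB LLC (from the results slide)
LINE_BYTES = 64                    # assumption: 64-byte cache lines
LRU_BITS_PER_LINE = 4              # assumption: log2(16) bits per line for 16 ways

lines = CACHE_BYTES // LINE_BYTES
lru_bits = lines * LRU_BITS_PER_LINE
asrp_bits = 45 * 1024              # "45K bits", reading K as 1024

ratio = asrp_bits / lru_bits
print(lines, lru_bits, round(ratio, 2))
```

Under these assumptions the ASRP state is about 0.70 of what full LRU would need, which agrees with the figure on the slide.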
Results
[Figure: improvement of misses over LRU (%) for DIP and ASRP, per benchmark: astar, bwaves, bzip2, cactusADM, calculix, dealII, gamess, gcc, GemsFDTD, gobmk, gromacs, h264ref, hmmer, lbm, leslie3d, libquantum, mcf, milc, namd, omnetpp, perlbench, povray, sjeng, soplex, sphinx3, tonto, xalancbmk, zeusmp, and the average]
For a 1 MB, 16-way LLC, ASRP achieves a geometric average speedup of 4.5% over LRU
Analysis
[Figure: two panels, xalancbmk and GemsFDTD, each plotting misses per 1K instructions versus threshold (1 to 2k) for SRP, LRU, and ASRP]
The sampling mechanism does help ASRP to find the best thresholds for different programs
Conclusion
Keeping part of the working set in the cache helps reduce misses when the cache suffers from thrashing
The retained part of a longer access history helps SRP capture the frequently used blocks more accurately
Different programs, and different phases of a program, prefer different thresholds to maximize cache hits
“Set Dueling” helps ASRP dynamically select a suitable threshold
The experimental results show the effectiveness of the ASRP policy
Thank you!
Any questions?
Result on multi-core processor
[Figure: two bar charts of improvement of misses over LRU (%) for DIP and ASRP on multi-programmed workload mixes drawn from astar, bwaves, bzip2, GemsFDTD, gobmk, hmmer, omnetpp, sphinx3, and xalancbmk]
Case Study of LRU-friendly workload
[Figure: misses per 1K instructions versus threshold (1 to 2k) for SRP and LRU]
Explanation of active subset changing
A simple example of SRP policy