Post on 16-Oct-2021
5/30/20 1
Temporal Ancestry Prefetching
Nathan Gober, Gino Chacon, Daniel Jiménez, Paul V. Gratz
Texas A&M University
Block Dead Times in L1I
[Figure: block dead times, with regions labeled "Mostly Alive" and "Mostly Dead"]
Outline
● Background
– Ancestry Prediction
– Temporal Prefetching
● Design
● Results
● Conclusion
Ancestry prediction
● Many paths of varying lengths from A to B
● Blocks in I-cache are long-lived
● Short histories miss the connection between A and B
[Figure: paths from A to B]
Temporal Prefetching
● Replay old misses on cache access
● Large metadata requirements
– State of the art uses off-chip memory [Wenisch HPCA’09, Bakhshalipour HPCA’18]
– Structural address space? [Wu ISCA’19]
● IPC1 gives large hardware budget
TAP Model (Training)
[Figure: History Buffer holding blocks L, E, B, A]
● Maintain path history
● De-duplicated
● A single PC has many histories
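The history buffer bullets above can be pictured with a minimal Python sketch. The slides do not give the buffer size or say what happens when a block repeats, so the capacity of 4, the move-to-newest policy, and the names `HistoryBuffer`, `record`, and `ancestors` are all assumptions made for illustration.

```python
from collections import OrderedDict


class HistoryBuffer:
    """Fixed-size record of recently fetched blocks, each stored at most once."""

    def __init__(self, capacity=4):  # capacity is an illustrative assumption
        self.capacity = capacity
        self._blocks = OrderedDict()  # oldest first, newest last

    def record(self, block):
        # De-duplicate: a repeated block moves to the newest position
        # instead of taking a second slot (assumed policy).
        if block in self._blocks:
            self._blocks.move_to_end(block)
            return
        if len(self._blocks) == self.capacity:
            self._blocks.popitem(last=False)  # drop the oldest block
        self._blocks[block] = True

    def ancestors(self):
        """Return the buffered blocks, newest first."""
        return list(reversed(self._blocks))
```

Because each block occupies one slot, a tight loop cannot flush older ancestors out of the buffer, which is one plausible reason to de-duplicate.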
TAP Model (Training)
[Figure: table of Ancestors (A, B, C, D, E, F) and their Descendents; descendent rows shown include "Q V W", "F R L Y", and "T A M U"]
● Descendents are captured misses
● The path is not recorded, only the eventual predicted miss
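A rough Python sketch of this training step, under stated assumptions: the table is a plain dictionary from ancestor to a list of descendents, the list length limit `max_descendents` is invented, and a full list is simply left alone. The function name `train_on_miss` and the example contents are hypothetical, not taken from the slides.

```python
from collections import defaultdict


def train_on_miss(table, history, missed_block, max_descendents=4):
    """Record missed_block as a descendent of every block in the history.

    Only the miss itself is stored; the path between an ancestor and
    the miss is not recorded.
    """
    for ancestor in history:
        descendents = table[ancestor]
        if missed_block in descendents:
            continue  # already associated with this ancestor
        if len(descendents) < max_descendents:
            descendents.append(missed_block)
        # A full list is left unchanged here; a real design needs a
        # replacement policy, which this sketch does not model.
    return table


# Hypothetical table contents, for illustration only.
table = defaultdict(list, {"A": ["Q", "V", "W"], "B": ["F", "R", "L", "Y"]})
train_on_miss(table, history=["E", "B", "A"], missed_block="L")
```

After the call, "L" is a descendent of every block that was in the history, regardless of which path actually led to the miss.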
TAP Model (Training)
[Figure: Ancestors/Descendents table with descendent rows "Q V W L", "F R L Y", and "T A M U"; History Buffer holding E, B, A; incoming block L]
● Increment weights on cache access
● Decrement weights on eviction if prefetch not useful
– If the weight is already zero, invalidate the entry instead.
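The weight rules above can be sketched as a small saturating counter per descendent. The counter range (0 to 3), the initial weight, and the names `DescendentEntry`, `on_access`, and `on_eviction` are assumptions; the slides only state the increment, decrement, and invalidate behavior.

```python
class DescendentEntry:
    """One descendent of an ancestor, with a confidence weight."""

    def __init__(self, block, weight=1, max_weight=3):
        # Initial weight and saturation limit are illustrative assumptions.
        self.block = block
        self.weight = weight
        self.max_weight = max_weight
        self.valid = True

    def on_access(self):
        # The block was accessed in the cache: strengthen the entry.
        self.weight = min(self.weight + 1, self.max_weight)

    def on_eviction(self, prefetch_was_useful):
        # Penalize only prefetches that were evicted without being used.
        if prefetch_was_useful:
            return
        if self.weight == 0:
            self.valid = False  # already at zero: drop the entry
        else:
            self.weight -= 1
```

An entry therefore survives one useless prefetch per unit of weight before it is invalidated.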
TAP Model (Prediction)
[Figure: PC looked up in the Ancestors/Descendents table]
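Prediction can be sketched as a table lookup keyed by the PC. This is only an illustration: the dictionary layout, the `(block, weight)` pairs, the optional `min_weight` threshold, and the name `predict` are assumptions, and the addresses are made up.

```python
def predict(table, pc, min_weight=0):
    """Return the descendent blocks to prefetch when `pc` is fetched."""
    entries = table.get(pc, [])  # no entry means no prefetches
    return [block for block, weight in entries if weight >= min_weight]


# Hypothetical table: ancestor address -> list of (descendent, weight).
table = {0x400: [(0x1000, 2), (0x2000, 0)]}
candidates = predict(table, 0x400)
```

The candidates come back in stored order; no attempt is made to sequence them.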
Prefetch Filtering
● Long descendency lists produce large numbers of prefetch hits.
● Filter prefetches by tag check.
● With shadow cache of 12-bit partial tags, filtering is 99% accurate.
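A minimal sketch of a partial-tag shadow cache used as a prefetch filter. Only the 12-bit partial tag comes from the slides; the geometry (64 sets, 8 ways, 64-byte blocks), the LRU ordering inside each set, and the names `ShadowCache`, `fill`, and `should_prefetch` are assumptions.

```python
class ShadowCache:
    """Mirror of the cache contents that keeps only partial tags."""

    def __init__(self, num_sets=64, ways=8, block_bits=6, tag_bits=12):
        self.num_sets = num_sets
        self.ways = ways
        self.block_bits = block_bits
        self.tag_mask = (1 << tag_bits) - 1
        self.sets = [[] for _ in range(num_sets)]  # per set: LRU first

    def _index_and_tag(self, addr):
        block = addr >> self.block_bits
        index = block % self.num_sets
        tag = (block // self.num_sets) & self.tag_mask  # low 12 tag bits only
        return index, tag

    def fill(self, addr):
        """Track a block that was brought into the cache."""
        index, tag = self._index_and_tag(addr)
        tags = self.sets[index]
        if tag in tags:
            tags.remove(tag)
        elif len(tags) == self.ways:
            tags.pop(0)  # evict the least recently used partial tag
        tags.append(tag)

    def should_prefetch(self, addr):
        """False when the block appears to be cached already."""
        index, tag = self._index_and_tag(addr)
        return tag not in self.sets[index]
```

Two blocks whose tags agree in the low 12 bits alias to the same partial tag, so the filter can occasionally drop a prefetch that was actually needed; this is the kind of error that keeps the filter below 100% accuracy.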
Page Compression
● Much metadata space is wasted on storing a few pages.
● Use buffer index as a proxy for page number.
● NRU replacement
[Figure: address fields "Page | Offset" stored as "Index | Offset"]
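The page compression idea can be sketched as a small page buffer whose slot index stands in for the full page number. The buffer size, the 12-bit byte offset, the particular NRU variant (clear all reference bits when none is free), and the names `PageBuffer`, `compress`, and `decompress` are assumptions for illustration.

```python
class PageBuffer:
    """Small table of page numbers; metadata stores (index, offset) pairs."""

    def __init__(self, entries=4, offset_bits=12):
        self.offset_bits = offset_bits
        self.pages = [None] * entries
        self.used = [False] * entries  # NRU reference bits

    def _victim(self):
        if None in self.pages:
            return self.pages.index(None)  # fill empty slots first
        if all(self.used):
            self.used = [False] * len(self.used)  # restart the NRU epoch
        return self.used.index(False)  # first not-recently-used slot

    def compress(self, addr):
        """Split an address into (buffer index, offset within the page)."""
        page = addr >> self.offset_bits
        offset = addr & ((1 << self.offset_bits) - 1)
        if page in self.pages:
            index = self.pages.index(page)
        else:
            index = self._victim()
            self.pages[index] = page
        self.used[index] = True
        return index, offset

    def decompress(self, index, offset):
        """Rebuild the full address from a stored (index, offset) pair."""
        return (self.pages[index] << self.offset_bits) | offset
```

With 4 entries the index needs 2 bits in place of a full page number. Note that in this sketch, replacing a page silently changes what older `(index, offset)` pairs decode to; a real design has to tolerate or invalidate such stale metadata.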
Accuracy
Coverage
Conclusion
● Tracking cache evictions is useful
● Prefetch sequencing is unimportant
● Future Work
– Replacement in L1I
– Stricter throttling on poor-performing workloads