Evaluating Associativity in CPU Caches

Mark D. Hill, Alan Jay Smith
IEEE Transactions on Computers (1989)

CPU Caches

● Direct mapped
● Fully associative
● Set associative
  – Block size
  – Number of sets
  – Associativity (elements in one set)
  – Set mapping function (block -> set)
    ● Usually bit selection (modulo)
  – Replacement policy

Data and related work

● Trace-driven simulation
  – Trace samples must be short because of space and time limits (1989)
  – Metric: miss ratio
● Related work
  – Simulation algorithms
    ● Mattson et al. introduced the inclusion property of alternative caches (not multilevel inclusion)
      – Requires a total ordering of the pages in a set before eviction
      – Stack simulation for caches
  – Associativity
    ● 32K and smaller caches
    ● Overall design of bigger caches
    ● Few papers focus solely on associativity

Stack simulation

● Usually a linked-list implementation
  – There are more complex implementations
    ● AVL trees, hash tables, ...
  – Good for CPU caches (few links are traversed)
    ● CPU references have a high degree of locality
    ● Caches have a large number of sets and limited associativity
  – With LRU, references that hit the most recently used element can be deleted without affecting the number of misses
  – We record the stack distance of every reference:
    ● n-way cache after K references: miss_ratio = 1 – (Σ_{i<n} distance[i]) / K
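
The stack simulation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulator: the function names and the use of plain lists as LRU stacks are assumptions, block addresses are assumed to be integers, and the set is chosen by bit selection (modulo).

```python
from collections import defaultdict

def stack_simulate(blocks, num_sets):
    """One pass over a block-address trace; returns (K, distance).

    distance[i] counts the references found at depth i of the LRU
    stack of their set (depth 0 is the most recently used block).
    """
    stacks = defaultdict(list)    # set index -> blocks, most recent first
    distance = defaultdict(int)   # stack depth -> number of references found there
    refs = 0
    for block in blocks:
        refs += 1
        stack = stacks[block % num_sets]   # bit selection (modulo)
        if block in stack:
            depth = stack.index(block)
            distance[depth] += 1
            stack.pop(depth)
        stack.insert(0, block)             # block becomes most recently used
    return refs, distance

def miss_ratio(refs, distance, assoc):
    """miss_ratio = 1 - (sum of distance[i] for i < assoc) / K."""
    hits = sum(distance.get(i, 0) for i in range(assoc))
    return 1 - hits / refs
```

A single call to `stack_simulate` yields the miss ratio of every associativity for the chosen number of sets: `miss_ratio(refs, distance, 1)` is the direct-mapped case, `miss_ratio(refs, distance, 2)` the 2-way case, and so on.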

Trace Data

Methods and Traces

● Primary metric: miss ratio
  – Effective access time: t_cache + miss_ratio * t_memory
  – An increase in associativity can increase cache access time and so degrade performance
  – Miss ratio is easy to define, interpret, and compute, and is implementation independent
● Traces:
  – Five-trace group (5 x 500,000 references)
  – 23-trace group (170k-400k references)
  – Include instruction fetch references
  – Both cold and warm start
  – Trace limitations: results for large caches are subject to errors
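
The effective access time formula makes the associativity trade-off concrete. The sketch below is only an illustration: the function names are hypothetical and the timing numbers in the usage note are invented, not taken from the paper.

```python
def effective_access_time(t_cache, miss_ratio, t_memory):
    """Effective access time: t_cache + miss_ratio * t_memory."""
    return t_cache + miss_ratio * t_memory

def break_even_slowdown(miss_ratio_low_assoc, miss_ratio_high_assoc, t_memory):
    """Largest increase in t_cache that higher associativity can afford
    before its lower miss ratio no longer pays off."""
    return (miss_ratio_low_assoc - miss_ratio_high_assoc) * t_memory
```

With made-up values of a 5% direct-mapped miss ratio, a 4% two-way miss ratio, and a 20-cycle memory access, `break_even_slowdown(0.05, 0.04, 20.0)` is about 0.2 cycles: the two-way cache wins only if it lengthens the cache access by less than that.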

Figure 2

Figure 3

Simulation of alternative DM and SA caches

● Useful properties
  – Set refinement
    ● Function f refines g if f(x) = f(y) => g(x) = g(y), for all blocks x, y
    ● Cache C2 refines C1 if the set-mapping function of C2 refines that of C1
  – Inclusion
    ● Cache C2 includes C1 if, after any series of references, for any block x:
      x is resident in C1 => x is resident in C2
● Theorem
  – Same block size, LRU replacement:
    ● C2 includes C1 <=> F2 refines F1 AND assoc2 >= assoc1
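
The inclusion theorem can be checked empirically on small caches. The following is a sketch under stated assumptions: `LRUCache` and `inclusion_holds` are hypothetical names, blocks are integers, and both caches use bit selection (modulo) with the same block size.

```python
class LRUCache:
    """Set-associative cache with LRU replacement and modulo set mapping."""

    def __init__(self, num_sets, assoc):
        self.num_sets = num_sets
        self.assoc = assoc
        self.sets = [[] for _ in range(num_sets)]  # each list: most recent first

    def access(self, block):
        lines = self.sets[block % self.num_sets]
        hit = block in lines
        if hit:
            lines.remove(block)
        elif len(lines) == self.assoc:
            lines.pop()                            # evict the least recently used block
        lines.insert(0, block)
        return hit

    def resident(self):
        return {b for lines in self.sets for b in lines}

def inclusion_holds(trace, c1, c2):
    """True if, after every reference, each block resident in c1 is resident in c2."""
    for block in trace:
        c1.access(block)
        c2.access(block)
        if not c1.resident() <= c2.resident():
            return False
    return True
```

For example, a 2-way cache with 8 sets should include a 2-way cache with 4 sets on any trace, because modulo 8 refines modulo 4 and the associativity is not smaller. A direct-mapped cache with 8 sets does not include the 2-way cache with 4 sets: the trace `[0, 8]` leaves block 0 resident in the 2-way cache only.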

Useful implications

● C2 refines C1 && mapping functions differ => C2 has a greater number of sets
● Bit selection: 2^i sets refines 2^j sets for all i >= j
● C2 must be strictly larger than a different C1 in order to include it
● Refinement implies inclusion in direct-mapped caches
● Inclusion holds for direct-mapped caches using bit selection
● Inclusion does not hold between every pair of different set-associative caches (for example, when the cache with more sets has lower associativity)
● Inclusion is a partial ordering of the set of caches
● Refinement is a partial ordering
● Refinement can be used to speed up simulation of alternative caches that use LRU replacement
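
The refinement claim for bit selection can be verified by brute force over a range of block numbers. This is a minimal sketch; `refines` and `bit_selection` are hypothetical helper names, and checking a finite range is only evidence, not a proof.

```python
def bit_selection(num_sets):
    """Set-mapping function that keeps the low-order bits (modulo)."""
    return lambda block: block % num_sets

def refines(f, g, blocks):
    """f refines g on blocks: f(x) == f(y) implies g(x) == g(y)."""
    g_value_of_class = {}
    for x in blocks:
        fx, gx = f(x), g(x)
        # All blocks that share f(x) must also share g(x).
        if g_value_of_class.setdefault(fx, gx) != gx:
            return False
    return True
```

Here `refines(bit_selection(8), bit_selection(4), range(256))` is true, while swapping the arguments gives false: blocks 0 and 4 share a set under modulo 4 but not under modulo 8.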

Simulating Direct-Mapped Caches

● Forest simulation
  – Requires that the mapping functions obey set refinement
    ● Refinement implies inclusion, which is what makes the method efficient
  – The data structure uses trees to simulate alternative direct-mapped caches
  – Each level corresponds to one cache
  – Key idea:
    ● Start at the top and proceed down until the reference is found
    ● Increment distance[i] if it is found on level i
    ● miss_ratio = 1 – (Σ_{i<n} distance[i]) / K
  – Can be extended to n-way caches by replacing nodes with stacks
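
A compact way to see the forest idea is to store each level as a flat array of sets rather than as linked tree nodes. The sketch below makes that simplification, so it is not the paper's data structure; the names are hypothetical, and `set_counts` is assumed to be an increasing list of powers of two (smallest cache first) so that each level refines the one above it.

```python
def forest_simulate(blocks, set_counts):
    """Simulate several direct-mapped caches in one pass.

    Returns (K, distance), where distance[i] counts the references first
    found at level i (level 0 is the cache with the fewest sets).
    """
    levels = [[None] * s for s in set_counts]
    distance = [0] * len(set_counts)
    refs = 0
    for block in blocks:
        refs += 1
        for i, num_sets in enumerate(set_counts):
            slot = block % num_sets
            if levels[i][slot] == block:
                # Found: by inclusion, every larger cache already holds the block.
                distance[i] += 1
                break
            levels[i][slot] = block   # miss at this level: block replaces the occupant
    return refs, distance

def dm_misses(refs, distance, level):
    """Misses of the cache at `level`: hits are references found at this level or above."""
    return refs - sum(distance[:level + 1])
```

The search stops at the first level that holds the block, which is why the cost per reference is low when locality is high: most references are found near the top.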

Forest example

Simulating Set-Associative Caches

● New all-associativity simulation
  – Simulates alternative DM and SA caches that have the same block size, LRU replacement, and no prefetching
  – Generalization of earlier work
  – Unique accesses can usually be stored in memory
    ● Storage space can be reclaimed if not used by any cache
  – Single run for all alternative caches
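
The single-run idea can be illustrated with one global LRU stack: the LRU distance of a block within its set equals the number of more recently used blocks that map to the same set. The code below is a simplified, unoptimized sketch of that idea and not the paper's algorithm; the names, the `max_assoc` cut-off, and modulo set mapping are assumptions.

```python
def all_assoc_simulate(blocks, set_counts, max_assoc):
    """One pass for every (number of sets, associativity) pair.

    Returns (K, hist), where hist[s][d] counts the references whose LRU
    distance within their set is d (0 = most recent) for a cache with s sets.
    """
    stack = []                                    # global LRU order, most recent first
    hist = {s: [0] * max_assoc for s in set_counts}
    refs = 0
    for block in blocks:
        refs += 1
        if block in stack:
            pos = stack.index(block)
            for s in set_counts:
                # Count more recently used blocks that share the set under s sets.
                d = sum(1 for other in stack[:pos] if other % s == block % s)
                if d < max_assoc:
                    hist[s][d] += 1
            stack.pop(pos)
        stack.insert(0, block)
    return refs, hist

def misses(refs, hist, num_sets, assoc):
    """Misses of an assoc-way LRU cache with num_sets sets."""
    return refs - sum(hist[num_sets][:assoc])
```

With `set_counts=[1, 2]` the same run covers the fully associative case (`num_sets=1`) and the two-set caches, for every associativity up to `max_assoc`.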

Figure 6

Figure 7

Figure 8

Trace performance

Size    Number of sets (32B block)
        DM      2-way   4-way
16K     512     256     128
32K     1024    512     256
64K     2048    1024    512
128K    4096    2048    1024

● All methods are comparable when simulating a single cache
● Forest simulation is fastest for DM caches
● All-associativity simulation is fastest for general caches (DM + SA)

Associativity and Miss Ratio

● Relationships exist independent of cache size
● Categorizing cache misses
  – Conflict misses (no more room in the same set)
    ● miss_ratio – miss_ratio(fully associative)
  – Capacity misses (no more space in the cache)
    ● miss_ratio(fully associative) – miss_ratio(infinite)
  – Compulsory misses (first reference to the data)
    ● miss_ratio(infinite)
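
The three categories can be computed from a trace by simulating the cache of interest, a fully associative LRU cache of the same capacity, and an infinite cache. This is a sketch with hypothetical function names, assuming integer block addresses, modulo set mapping, and LRU replacement throughout.

```python
def lru_misses(blocks, num_sets, assoc):
    """Count the misses of an assoc-way LRU cache with num_sets sets."""
    sets = [[] for _ in range(num_sets)]   # each list: most recent first
    misses = 0
    for block in blocks:
        lines = sets[block % num_sets]
        if block in lines:
            lines.remove(block)
        else:
            misses += 1
            if len(lines) == assoc:
                lines.pop()                # evict the least recently used block
        lines.insert(0, block)
    return misses

def classify_misses(blocks, num_sets, assoc):
    """Split the miss ratio into conflict, capacity, and compulsory parts."""
    blocks = list(blocks)
    refs = len(blocks)
    total = lru_misses(blocks, num_sets, assoc) / refs
    fully = lru_misses(blocks, 1, num_sets * assoc) / refs   # same capacity, one set
    infinite = len(set(blocks)) / refs                       # only first references miss
    return {
        "conflict": total - fully,
        "capacity": fully - infinite,
        "compulsory": infinite,
    }
```

For the trace `[0, 4, 0, 4]` on a direct-mapped cache with 4 sets, blocks 0 and 4 keep evicting each other, so half of the references are conflict misses and the other half are compulsory.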

Table III

Set associative vs. Fully associative

● p_i(s): probability that a reference is made to the i-th most recently referenced block of its set, in a cache with s sets
● q_i: probability that a reference is made to the i-th most recently referenced block in an FA cache
● Miss ratio, n-way with s sets: 1 – Σ_{i=1..n} p_i(s)
● Miss ratio, FA with n blocks: 1 – Σ_{i=1..n} q_i
● Bayes rule:
  – p_n(s) = Σ_i Prob(LRU distance n with s sets | LRU distance i with 1 set) q_i
● Assumption: the probability of a set conflict is 1/s
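
One way to turn the Bayes-rule expression into numbers is to assume that each of the i−1 more recently used blocks falls into the same set independently with probability 1/s. The conditional probability is then binomial. The sketch below implements that reading; the binomial form, the function names, and the truncation of q to a finite list are assumptions of this example.

```python
from math import comb

def predict_set_assoc(q, num_sets):
    """Predict p_n(s) from q_i under independent 1/s set conflicts.

    q[i-1] holds q_i for i = 1..len(q); the result p[n-1] holds p_n(s).
    """
    c = 1 / num_sets
    p = [0.0] * len(q)
    for i, q_i in enumerate(q, start=1):
        for n in range(1, i + 1):
            # n-1 of the i-1 intervening blocks land in the same set.
            weight = comb(i - 1, n - 1) * c ** (n - 1) * (1 - c) ** (i - n)
            p[n - 1] += weight * q_i
    return p

def predicted_miss_ratio(p, assoc):
    """Miss ratio of an assoc-way cache: 1 - sum of p_i(s) for i = 1..assoc."""
    return 1 - sum(p[:assoc])
```

With one set the prediction reduces to the fully associative distribution, and for any number of sets the total hit probability is preserved; only its distribution over LRU distances shifts toward smaller distances.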

Figure 11

It works!

● Predictions are accurate
  – Error usually less than 5%
● Predictions are usually pessimistic (predicted miss ratios are higher than measured)
  – With bit selection, collisions are slightly less likely than random because of locality
● Error gets smaller with increased associativity
● The prediction itself is not the point (miss ratios can be measured directly)
● IMPORTANT:
  – The increase in miss ratio is nearly identical to results that assume independent and equally probable set conflicts

Comparing SA Caches

● Miss ratio spread
  – Two caches, same capacity, n-way vs. 2n-way:
    M_n / M_2n – 1 = (M_n – M_2n) / M_2n
  – Data smoothed using a weighted average
    ● 0.15 for distance 2
    ● 0.20 for distance 1
    ● 0.30 for current
  – Large caches again subject to errors
    ● > 64K

Figure 12

Figure 13

Table IV

Trends

● Spread is larger at low associativity
  – DM to 2-way
● Except for instruction caches, size does not matter
  – Spread in small instruction caches is smaller
  – Sequential behavior of instruction references
● Positively correlated with block size
  – Larger blocks => fewer sets
● Miss ratio spread of data and unified caches is similar
  – Smaller for instruction caches
● 8->4, 4->2, 2->1: spread of 5%, 10%, and 30%
  – Regardless of size, type, block size!!
● Design target miss ratios
  – Rule of thumb: miss ratio is inversely proportional to the square root of the cache size
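
The spread figures and the square-root rule can be combined into a rough estimator. This sketch only restates the rules of thumb above in code: the function names and the example miss ratios are hypothetical, and reading the size rule as miss ratio proportional to 1/sqrt(size) is an assumption.

```python
# Spread between an n-way and a 2n-way cache of equal capacity, keyed by n.
RULE_OF_THUMB_SPREAD = {1: 0.30, 2: 0.10, 4: 0.05}

def miss_ratio_spread(m_n, m_2n):
    """Miss ratio spread: M_n / M_2n - 1."""
    return m_n / m_2n - 1

def estimate_from_8way(m_8):
    """Estimate 4-way, 2-way, and direct-mapped miss ratios from an 8-way one."""
    estimate = {8: m_8}
    for n in (4, 2, 1):
        estimate[n] = estimate[2 * n] * (1 + RULE_OF_THUMB_SPREAD[n])
    return estimate

def scale_with_size(miss_ratio, size_factor):
    """Square-root rule: a cache size_factor times larger divides the miss ratio by sqrt(size_factor)."""
    return miss_ratio / size_factor ** 0.5
```

Starting from an invented 8-way miss ratio of 2%, `estimate_from_8way(0.02)` gives roughly 2.1% for 4-way, 2.3% for 2-way, and 3.0% for direct mapped.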

Conclusion

● Both set refinement and cache inclusion are useful for developing fast simulation algorithms
  – Forest simulation for direct-mapped caches
  – All-associativity simulation
● Miss classification
  – Conflict, capacity, compulsory
● Difference between FA and SA caches can be predicted
● Miss ratio spread is invariant to cache size and original miss ratio
  – 5, 10, 30 percent
● Trace size limitations skewed the results for large caches
  – 64K, 128K