An Introduction to
Cache Design
112/04/20 \course\cpeg323-08F\Topic7a 1
Cache
A safe place for hiding and storing things.
Webster Dictionary
Even with the inclusion of cache, almost all CPUs are
still mostly strictly limited by the cache access-time:
In most cases, if the cache access time were decreased,
the machine would speed up accordingly.
- Alan Smith -
Even more so for multiprocessors (MPs)!
While one can imagine reference patterns that can defeat
existing cache memory designs, it is the author’s
experience that cache memories improve performance for
any program or workload which actually does useful
computation.
Optimizing the design of a cache memory generally has four aspects:
1. Maximizing the probability of finding a memory reference’s target in the cache (the hit ratio).
2. Minimizing the time to access information that is indeed in the cache (access time).
3. Minimizing the delay due to a miss.
4. Minimizing the overheads of updating main memory, maintaining cache coherence, etc.
Key Factor in Design Decision for VM and Cache
$\dfrac{\text{Access-time}_{\text{MainMem}}}{\text{Access-time}_{\text{Cache}}} = 4 \sim 20$
$\dfrac{\text{Access-time}_{\text{SecondaryMem}}}{\text{Access-time}_{\text{MainMem}}} = 10^{4} \sim 10^{6}$
Cache control is usually implemented in hardware!
Technology in 1990s:
Memory Technology    Typical Access Time            $ per Mbyte
SRAM                 10-20 ns                       200-400
DRAM                 90-120 ns                      50-100
Magnetic disk        10,000,000 - 20,000,000 ns     2-5
Technology in 2000s?
Technology in 2004:
Memory Technology    Typical Access Time            $ per Gbyte
SRAM                 0.5 - 5 ns                     4,000 - 10,000
DRAM                 50 - 70 ns                     100 - 200
Magnetic disk        10,000,000 - 20,000,000 ns     0.5 - 2
Technology in 2008?
See P&H Fig., pg. 469, 3rd Ed.
Technology in 2008:
Memory Technology    Typical Access Time            $ per Gbyte
SRAM                 0.5 - 2.5 ns                   2,000 - 5,000
DRAM                 50 - 70 ns                     20 - 75
Magnetic disk        5,000,000 - 20,000,000 ns      0.2 - 2
See P&H Fig., pg. 453, 4th Ed.
Cache in Memory Hierarchy
[Figure: Processor, Cache, Main Memory, Secondary Memory]
Emerging Memory Device Technologies
Source: “Emerging Nanoscale Memory and Logic Devices: A Critical Assessment”, Hutchby et al., IEEE Computer, May 2008
Source: Kogge, Peter, ACS Productivity Workshop 2008
Four Questions for Classifying Memory Hierarchies:
The fundamental principles that drive all memory
hierarchies allow us to use terms that transcend the levels
we are talking about. These same principles allow us to
pose four questions about any level of the hierarchy:
Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)
These questions will help us gain
an understanding of the different
tradeoffs demanded by the
relationships of memories at
different levels of a hierarchy.
Concept of Cache miss and Cache hit
ADDRESS    DATA
01173      30
Line:      0 1 2 3 4 5 6 7
TAGS       DATA
0117X      35, 72, 55, 30, 64, 23, 16, 14
7620X      11, 31, 26, 22, 55, …
3656X      71, 72, 44, 50, …
1741X      33, 35, 07, 65, …
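The hit/miss idea in the figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the last octal digit of the address is taken to select the word within the line, the remaining digits form the tag, and the names `CACHE` and `lookup` are hypothetical, not part of the slides. Only the one line that the slide lists in full is modeled.

```python
# Hypothetical model of the slide's example: each tag maps to one 8-word line.
# Only the fully listed line from the slide is filled in here.
CACHE = {
    "0117": [35, 72, 55, 30, 64, 23, 16, 14],
}

def lookup(address: str):
    """Return (hit, data) for an octal address string such as '01173'."""
    # Assumption: last octal digit = word position, leading digits = tag.
    tag, offset = address[:-1], int(address[-1], 8)
    line = CACHE.get(tag)
    if line is None:
        return False, None      # miss: no stored tag matches
    return True, line[offset]   # hit: select the word within the line

print(lookup("01173"))  # hit: word 3 of the line tagged 0117
print(lookup("05553"))  # miss: tag 0555 is not in the cache
```

Address 01173 matches tag 0117 and selects position 3, which holds the value 30 shown in the ADDRESS/DATA pair.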
Access Time
$t_{eff}$: effective cache access time
$t_{cache}$: cache access time
$t_{main}$: main memory access time
$h$: hit ratio
$t_{eff} = h\,t_{cache} + (1-h)\,t_{main}$
Example
Let $t_{cache}$ = 10 ns (1-4 clock cycles)
$t_{main}$ = 50 ns (8-32 clock cycles)
$h$ = 0.95
$t_{eff}$ = ?
$t_{eff} = 10 \times 0.95 + 50 \times 0.05 = 9.5 + 2.5 = 12$ ns
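The arithmetic of this example can be checked with a short Python sketch. The function name `effective_access_time` and its parameter names are assumptions chosen for illustration; the formula is the one given on the Access Time slide.

```python
def effective_access_time(h, t_cache, t_main):
    """t_eff = h * t_cache + (1 - h) * t_main (all times in the same unit)."""
    return h * t_cache + (1 - h) * t_main

# Values from the slide: 10 ns cache, 50 ns main memory, hit ratio 0.95.
t_eff = effective_access_time(h=0.95, t_cache=10, t_main=50)
print(round(t_eff, 6), "ns")
```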
Hit Ratio
• Needs to be high enough (say > 90%) to obtain a desirable level of performance
• Amplifying effect of changes
• Never a constant, even for the same machine
Sensitivity of Performance w.r.t. h (hit ratio)
$t_{eff} = h\,t_{cache} + (1-h)\,t_{main}$
$= t_{cache}\left[h + (1-h)\,\dfrac{t_{main}}{t_{cache}}\right]$
$\approx t_{cache}\left[1 + (1-h)\,\dfrac{t_{main}}{t_{cache}}\right]$
Since $\dfrac{t_{main}}{t_{cache}} \approx 10$, the magnification factor of changes in h is 10 times.
Conclusion: very sensitive
• Remember: $h \approx 1$
• Example:
Let h = 0.90
if $\Delta h = 0.05$ (0.90 → 0.95)
then (1 - h) = 0.05
then, with $t_{main}/t_{cache} \approx 10$, $t_{eff} \approx t_{cache}(1 + 0.05 \times 10) = t_{cache}(1 + 0.5)$
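To make the sensitivity concrete, the following Python sketch evaluates $t_{eff}/t_{cache}$ at h = 0.90 and h = 0.95, both exactly and with the approximation used above. The function names and the constant `RATIO` are hypothetical; the value 10 for $t_{main}/t_{cache}$ is the one assumed on the slide.

```python
def relative_t_eff(h, ratio):
    """Exact t_eff / t_cache = h + (1 - h) * ratio, with ratio = t_main / t_cache."""
    return h + (1 - h) * ratio

def approx_relative_t_eff(h, ratio):
    """Approximation for h close to 1: t_eff / t_cache ~ 1 + (1 - h) * ratio."""
    return 1 + (1 - h) * ratio

RATIO = 10  # assumed t_main / t_cache, as on the slide

for h in (0.90, 0.95):
    exact = round(relative_t_eff(h, RATIO), 3)
    approx = round(approx_relative_t_eff(h, RATIO), 3)
    print(f"h={h:.2f}  exact={exact}  approx={approx}")
```

Under the approximation, raising h from 0.90 to 0.95 moves the effective access time from about 2.0 to about 1.5 times $t_{cache}$, so a 0.05 change in h shifts the result by 0.5 $t_{cache}$.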
Basic Terminology
• Cache line (block) - size of a room: 1 ~ 16 words
  “A collection of contiguous data that are treated as a single entity of cache storage.”
• Cache directory - key of rooms
  “The portion of a cache that holds the access keys that support associative access.”
Cache may use associativity to find the “right directory” by matching
Cache Organization
• Fully associative: an element can be in any block
• Direct mapping: an element can be in only one block
• Set-associative: an element can be in a group of blocks
An Example
Mem size = 256 k words × 4 B/word = 1 MB
Cache size = 2 k words = 8 k bytes
Block size = 16 words/block = 64 bytes/block
So
Main memory has 256K / 16 = 16 k blocks (16,384)
Cache has 2K / 16 = 128 blocks
Address = 18 bits (word address) + 2 bits = 20 bits (byte address), since 256 k words = $(2^{8} \times 2^{10})$ words and each word is $2^{2}$ bytes
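The figures in this example follow from a few divisions and base-2 logarithms, as the Python sketch below shows. The variable names are assumptions made for illustration; the sizes are the ones stated on the slide.

```python
WORD_BYTES = 4               # 4 bytes per word
mem_words = 256 * 1024       # 256 k words of main memory
cache_words = 2 * 1024       # 2 k words of cache
block_words = 16             # 16 words per block

mem_blocks = mem_words // block_words        # blocks in main memory
cache_blocks = cache_words // block_words    # block frames in the cache

# For exact powers of two, bit_length() - 1 equals log2.
word_addr_bits = mem_words.bit_length() - 1
byte_addr_bits = word_addr_bits + (WORD_BYTES.bit_length() - 1)

print(mem_blocks, cache_blocks, word_addr_bits, byte_addr_bits)
```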
Fully Associative
Feature
Any block in main memory can be in any block frame in the cache.
All entries (block frames) are compared simultaneously (by associative search).
A Special Case
Simplest example: a block = a word; the entire memory word address becomes the “tag”.
[Figure: an address 027560 (bits 0-17) is matched against a cache entry holding tag 027560 (bits 0-17) and its data]
Adv: no thrashing (quick reorganizing); very “flexible” and higher probability to reside in cache.
Disadv: overhead of associative search: cost + time
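A software sketch of this special case (one word per block, full word address as tag) might look as follows. It is not a description of any real hardware: a Python dict lookup stands in for the parallel tag comparison, the class name `FullyAssociativeCache` is hypothetical, and the eviction rule (drop the oldest entry) is an assumption, since replacement policy is not covered at this point in the slides.

```python
class FullyAssociativeCache:
    """Toy model: any word can occupy any frame; the tag is the full address."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.frames = {}  # tag (full word address) -> data

    def read(self, address, memory):
        """Return (hit, data); on a miss the word is loaded into the cache."""
        if address in self.frames:             # stands in for associative search
            return True, self.frames[address]
        if len(self.frames) >= self.num_frames:
            # Assumed replacement policy: evict the oldest inserted entry.
            oldest = next(iter(self.frames))
            del self.frames[oldest]
        self.frames[address] = memory[address]  # any free frame will do
        return False, self.frames[address]

memory = {0o27560: 99}
cache = FullyAssociativeCache(num_frames=4)
print(cache.read(0o27560, memory))  # first access misses
print(cache.read(0o27560, memory))  # second access hits
```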
Fully associative cache organization
Direct Mapping
• No associative match
• From the memory address, “directly” indexed to the block frame in cache where the block should be located. A comparison is then used to determine if it is a miss or a hit.
Direct Mapping (cont’d)
Advantage: simplest
  Fast (less logic)
  Low cost (only one set comparator is needed, hence the cache can be in the form of standard memory)
Disadvantage: “thrashing”
Example
Since the cache only has 128 block frames, the degree of multiplexing is:
$\dfrac{\text{Main memory size}}{128} = \dfrac{16384 \text{ (blocks)}}{128\ (2^{7})} = 128 = 2^{7}$ blocks/frame
i.e. $2^{7}$ blocks “fall” in one block frame.
The high-order 7 bits are used as tag; the remaining 7 bits of the block address are used for addressing the corresponding frame (a set of size 1).
Disadv: “thrashing”
Direct Mapping
Direct Mapping (cont’d)
Mapping (indexing): block addr mod (# of blocks in cache); in this case, mod $2^{7}$
Adv: the low-order $\log_2$(cache size in blocks) bits can be used for indexing
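The mapping rule can be sketched in Python for the running example (128 block frames, 16,384 memory blocks). The function name `direct_map` and the constant `NUM_FRAMES` are hypothetical names introduced for illustration.

```python
NUM_FRAMES = 128  # 2**7 block frames in the cache (from the example)

def direct_map(block_addr):
    """Split a 14-bit block address into (tag, frame index)."""
    index = block_addr % NUM_FRAMES   # low-order 7 bits select the frame
    tag = block_addr // NUM_FRAMES    # high-order 7 bits are stored as tag
    return tag, index

# Blocks 5 and 133 compete for the same frame, which is how thrashing arises.
print(direct_map(5))
print(direct_map(133))
```

Because the number of frames is a power of two, the modulo and division are equivalent to taking the low-order and high-order bit fields of the block address.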
Set-Associative
• A compromise between direct and fully associative
• The cache is divided into S sets (S = # of buildings available for indexing)
  S = 2, 4, 8, …
• If the cache has M blocks then, all together, there are
  $E = \dfrac{M}{S}$ blocks/set
In our example, with 2 blocks per set, S = 128/2 = 64 sets
2-way set associative
The 6-bit set field indexes the right set; the 8-bit tag is then used for an associative match.
Associativity with 8-block cache
A 2-way set associative organization:
Address fields (bit widths): Tag 8 | Set 6 | Word 4 | Byte 2
Memory blocks per set available for indexing: $\dfrac{2^{14}\ (16\text{k})}{2^{6}} = 2^{8}$ blocks/set, i.e. $2^{8}$ blocks per set of 2 blocks.
6 bits are used to index into the right “set”; the higher-order 8 bits are used as tag.
Hence an associative match of 8 bits with the tags of the 2 blocks is required.
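The field split for this 2-way organization can be sketched with shifts and masks. The layout assumed here is a 20-bit byte address with, from high to low order, an 8-bit tag, a 6-bit set index, a 4-bit word offset, and a 2-bit byte offset; the function name `split_address` is hypothetical.

```python
def split_address(addr):
    """Split a 20-bit byte address into (tag, set_index, word, byte)."""
    byte = addr & 0x3                # low 2 bits: byte within word
    word = (addr >> 2) & 0xF         # next 4 bits: word within block
    set_index = (addr >> 6) & 0x3F   # next 6 bits: one of 64 sets
    tag = (addr >> 12) & 0xFF        # high 8 bits: compared with both tags in the set
    return tag, set_index, word, byte

# Build an address from known fields and split it again.
addr = (0xAB << 12) | (0x2A << 6) | (0x9 << 2) | 0x1
print(split_address(addr))
```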
Sector Mapping Cache
• Sector (IBM 360/85) - 16 sectors x 16 blocks/sector
  - 1 sector = consecutive multiple blocks
  - Cache miss: sector replacement
  - Valid bit - one block is moved on demand
• Example:
  Field:   Sector (tag)   Block   Word
  Bits:    0-6            7-13    14-17
  Width:   7              7       4
A sector in memory can be in any sector in cache
Sector Mapping Cache
Sector mapping cache (cont’d)
Cache has $\dfrac{128 \text{ blocks}}{16 \text{ blocks/sector}}$ = 8 sectors
Main memory has $\dfrac{16\text{k}}{16}$ = 1K sectors
Example
See P&H Fig. 7.7 3rd Ed or 5.7 4th Ed
Total # of Bits in a Cache
Total # of bits = Cache size (in blocks) x (# of bits of a tag + # of bits of a block + # of bits in valid field)
For the example:
Direct mapped cache with 4 kB of data, 1-word blocks and 32-bit address
4 kB = 1k words = $2^{10}$ words = $2^{10}$ blocks
# of bits of tag = 32 – (10 + 0 + 2) = 20, for $2^{10}$ blocks, $2^{0}$ words/block, $2^{2}$ bytes/word
Total # of bits = $2^{10}$ x (20 + 32*1 + 1) = 53 * $2^{10}$ = 53 kbits = 6.625 kBytes
Another example: FastMATH
Fast embedded microprocessor that uses the MIPS architecture and a simple cache implementation.
16 kB of data, 16-word blocks and 32-bit address
$2^{14}$ bytes * 1 word/4 bytes * 1 block/16 words = $2^{14} / (2^{2} \times 2^{4}) = 2^{8}$ blocks
# of bits of tag = 32 – (8 + 4 + 2) = 18, for $2^{8}$ blocks, $2^{4}$ words/block, $2^{2}$ bytes/word
Total # of bits = $2^{8}$ x (18 + 32*16 + 1) = 531 * $2^{8}$ = 135,936 bits
= 132.75 kbits ≈ 16.6 kBytes
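Both bit-count examples can be reproduced with one small Python function. This is a sketch under the slides' assumptions (direct-mapped cache, one valid bit per block, power-of-two sizes); the function name `cache_total_bits` and its parameters are hypothetical.

```python
def cache_total_bits(data_bytes, words_per_block, addr_bits=32, word_bytes=4):
    """Total storage bits (tag + data + valid) of a direct-mapped cache."""
    blocks = data_bytes // (words_per_block * word_bytes)
    # For exact powers of two, bit_length() - 1 equals log2.
    index_bits = blocks.bit_length() - 1
    word_offset_bits = words_per_block.bit_length() - 1
    byte_offset_bits = word_bytes.bit_length() - 1
    tag_bits = addr_bits - (index_bits + word_offset_bits + byte_offset_bits)
    data_bits = 8 * word_bytes * words_per_block
    return blocks * (tag_bits + data_bits + 1)   # +1 for the valid bit

small = cache_total_bits(4 * 1024, 1)        # 4 kB of data, 1-word blocks
fastmath = cache_total_bits(16 * 1024, 16)   # 16 kB of data, 16-word blocks
print(small, small / 8 / 1024, "kBytes")
print(fastmath, fastmath / 8 / 1024, "kBytes")
```

Dividing the bit total by 8 and then by 1024 gives the size in kBytes, which makes the overhead relative to the nominal data capacity easy to see.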
Example FastMATH
See P&H Fig. 7.9 3rd Ed or 5.9 4th Ed