Caching
Chapter 7
Memory Hierarchy
CPU → L1 → L2 Cache → DRAM
Level      Speed     Size      Cost/bit  Tech
L1         Fastest   Smallest  Highest   SRAM (logic)
L2 Cache   ↓         ↓         ↓         SRAM (logic)
DRAM       Slowest   Largest   Lowest    DRAM (capacitors)
Two design decisions
• What shall we put in the cache?
• How shall we organize cache to
  – find things quickly
  – hold the most important data
  – freezer or backpack…
What to put in cache?
Try to apply a similar problem’s solution
• Can we predict what data we will use?
  – Instead of predicting branch direction, predict next memory address request
  – Like branch prediction, use previous behavior
• Keep a prediction for every load?
  – Fetch stage for load is *TOO LATE*
• Keep a prediction per-memory address?
  – Given address, guess next likely address
  – Too many choices – table too large or fits too few
Program Characteristics
Find out more about programs
• Temporal Locality
  – If you use one item, you are likely to use it again soon
• Spatial Locality
  – If you use one item, you are likely to use its neighbors soon
Locality
• Programs tend to exhibit spatial & temporal locality. Just a fact of life.
• How can we use this knowledge of program behavior to design a cache?
What does that mean?!?
• 1. Design cache that takes advantage of spatial & temporal locality
• 2. When you program, place data together that is used together to increase spatial & temporal locality
  – Java - difficult to do
  – C - more control over data placement
• Note: Caches exploit locality. Programs have varying degrees of locality. Caches do not have locality!
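To make point 2 concrete, here is a minimal sketch (not from the slides) that compares two ways of walking the same row-major 2D array. The function names, the 4-byte element size, and the 16-byte block size are all assumptions chosen for illustration. It only counts how often consecutive accesses land in a different block, which is a rough proxy for spatial locality, not a cache model.

```python
def touched_addresses(rows, cols, by_rows, elem_bytes=4):
    """Byte addresses visited when walking a row-major rows x cols array."""
    if by_rows:
        order = [(r, c) for r in range(rows) for c in range(cols)]
    else:
        order = [(r, c) for c in range(cols) for r in range(rows)]
    # Row-major layout: element (r, c) lives at (r * cols + c) * elem_bytes.
    return [(r * cols + c) * elem_bytes for r, c in order]


def block_changes(addrs, block_bytes=16):
    """Count consecutive accesses that land in a different block."""
    blocks = [a // block_bytes for a in addrs]
    return sum(1 for prev, cur in zip(blocks, blocks[1:]) if prev != cur)


# 4 x 4 array of 4-byte elements: each row fills exactly one 16-byte block.
row_walk = block_changes(touched_addresses(4, 4, by_rows=True))
col_walk = block_changes(touched_addresses(4, 4, by_rows=False))
```

Under these assumptions the row-by-row walk changes block 3 times, while the column-by-column walk changes block on every one of its 15 steps. The data is identical; only the order of use differs.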
Cache Design
• Temporal Locality
  – When we obtain the data, store it in the cache.
• Spatial Locality
  – Transfer large block of contiguous data to get item’s neighbors.
  – Block (Line): Amount of data transferred for a single miss (data plus neighbors)
Where do we put data?
• Searching whole cache takes time & power
• Direct-mapped
  – Limit each piece of data to one possible position
• Search is quick and simple
What is our “key” for lookup?
• Tools are sorted by tool-type
• Books are sorted by subject (Dewey-Decimal)
• Old LISP machine sorted by data type
• Modern machines have no information – can only sort by address
Direct-Mapped
[Figure: a four-entry cache (Index 00, 01, 10, 11) drawn beside memory at byte addresses 000000, 000100, 010000, 010100, 100000, 100100, 110000, 110100. Each box corresponds to one word (4 bytes); one block (line) is marked.]
Draw on the board!!! Show what addresses go where
Direct-Mapped cache
Block (Line) size = 2 words or 8 bytes
Byte Address: 0b100100100
• Where do we look in the cache?
  – BlockAddress mod #slots
  – BlockAddress & (#slots-1), which is the same thing when #slots is a power of two
• How do we know if it is there?
  – We need a tag & valid bit
• Where is it within the block?
  – Block offset
Index  Valid  Tag   Data
00     1      1001  M[292-295] M[288-291]
01
10
11
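The lookup questions above can be sketched in a few lines. This is an illustrative helper, not code from the slides: the name split_address and its default parameters (4 slots, 2 words per block, 4 bytes per word) are assumptions taken from the cache drawn on this slide.

```python
def split_address(addr, num_slots=4, words_per_block=2, bytes_per_word=4):
    """Split a byte address into (tag, index, block_offset, byte_offset)."""
    byte_offset = addr % bytes_per_word        # which byte within the word
    word_addr = addr // bytes_per_word
    block_offset = word_addr % words_per_block  # which word within the block
    block_addr = word_addr // words_per_block
    index = block_addr % num_slots              # where we look in the cache
    tag = block_addr // num_slots               # all of the remaining upper bits
    return tag, index, block_offset, byte_offset


tag, index, block_offset, byte_offset = split_address(0b100100100)
```

For the byte address 0b100100100 (292) this gives tag 0b1001, index 0b00, block offset 1, and byte offset 0, which matches the entry shown in the cache. Because the slot count is a power of two, block_addr % num_slots and block_addr & (num_slots - 1) give the same index.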
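Splitting the Address
[Figure: direct-mapped cache (Index 00, 01, 10, 11; columns Valid, Tag, Data; all valid bits 0) with the address 0b1010001 divided, from high bits to low bits, into Tag, Index, Block Offset, Byte Offset.]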
Definitions
• Byte Offset: Which byte within word?
• Block Offset: Which word within block?
• Set: Group of blocks checked each access
• Index: Which set within cache?
• Tag: Is this the right one? (All of the upper bits)
Definitions
• Block (Line) - unit of data transfer – bytes/words
• Hit - data found in this cache
• Miss - data not found in this cache
  – Send request to lower level
• Hit time / Access time
  – Time to access this cache – look for item, return data
• Miss Penalty
  – Time to receive block from lower level
  – Not always constant
Example 1 – Direct-Mapped
Block size = 2 words
Address fields: Tag (2 bits) | Index (2 bits) | Block Offset (1 bit) | Byte Offset (2 bits)
The cache starts empty (all valid bits 0).
Reference Stream   Tag  Index  Block Offset  Hit/Miss
0b1001000          10   01     0             M
0b0010100          00   10     1             M
0b0111000          01   11     0             M
0b0010000          00   10     0             H
0b0010100          00   10     1             H
0b0100100          01   00     1             M
Miss Rate: 4 / 6 = 67%
Hit Rate: 2 / 6 = 33%
Final cache contents:
Index  Valid  Tag  Data
00     1      01   M[36-39] M[32-35]
01     1      10   M[76-79] M[72-75]
10     1      00   M[20-23] M[16-19]
11     1      01   M[60-63] M[56-59]
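Example 1 can be replayed with a small simulator. This is a sketch under the example's stated parameters (4 cache entries, 8-byte blocks, cache initially empty); the function name simulate and the list STREAM are hypothetical, and only valid bits and tags are modeled, not the data.

```python
def simulate(addresses, num_sets=4, block_bytes=8):
    """Replay byte addresses through a direct-mapped cache; return outcomes and final tags."""
    valid = [False] * num_sets
    tags = [None] * num_sets
    outcomes = []
    for addr in addresses:
        block_addr = addr // block_bytes
        index = block_addr % num_sets
        tag = block_addr // num_sets
        if valid[index] and tags[index] == tag:
            outcomes.append("H")
        else:
            # Miss: fetch the whole block and replace whatever was in this entry.
            outcomes.append("M")
            valid[index] = True
            tags[index] = tag
    return outcomes, tags


STREAM = [0b1001000, 0b0010100, 0b0111000, 0b0010000, 0b0010100, 0b0100100]
outcomes, final_tags = simulate(STREAM)
miss_rate = outcomes.count("M") / len(outcomes)
```

The outcomes come out as M, M, M, H, H, M, so the miss rate is 4 / 6, and the final tags for indices 00 through 11 are 01, 10, 00, 01.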
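Implementation
[Figure: the byte address 0b100100100 is split into Tag, Index, Block offset, and Byte Offset. The Index selects one cache entry (Valid, Tag, Data). A comparator (=) checks the stored tag against the address tag to produce Hit?. A MUX controlled by the Block offset selects the word that goes out as Data.]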
Example 2
• You are implementing a 64-Kbyte cache
• The block size (line size) is 16 bytes.
• Each word is 4 bytes, address 32 bits
• How many bits is the block offset?
  – 16 / 4 = 4 words -> 2 bits
• How many bits is the index?
  – 64*1024 / 16 = 4096 blocks -> 12 bits
• How many bits is the tag?
  – 32 - (2 + 12 + 2) = 16 bits (block offset, index, and a 2-bit byte offset for 4-byte words)
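The arithmetic in Example 2 can be checked with a short helper. This is a sketch that assumes a direct-mapped cache (one block per index, as the 4096-entry calculation implies) and power-of-two sizes; the names log2_exact and field_widths are hypothetical.

```python
def log2_exact(n):
    """Return log2(n) for a power of two; raise otherwise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def field_widths(cache_bytes, block_bytes, word_bytes=4, address_bits=32):
    """Bit widths of each address field for a direct-mapped cache."""
    byte_offset = log2_exact(word_bytes)                   # bytes per word
    block_offset = log2_exact(block_bytes // word_bytes)   # words per block
    index = log2_exact(cache_bytes // block_bytes)         # blocks in the cache
    tag = address_bits - (block_offset + index + byte_offset)
    return {"tag": tag, "index": index,
            "block_offset": block_offset, "byte_offset": byte_offset}


widths = field_widths(cache_bytes=64 * 1024, block_bytes=16)
```

For the 64-Kbyte cache this yields a 16-bit tag, 12-bit index, 2-bit block offset, and 2-bit byte offset, which add up to the 32-bit address.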
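How caches work
• Classic abstraction
• Each level of hierarchy has no knowledge of the configuration of lower level
[Figure: two views of the L1 / L2 Cache / DRAM stack. L1 cache’s perspective: "Me" is L1, and everything below it (L2 Cache and DRAM) is "Memory". L2 cache’s perspective: "Me" is L2 Cache, and DRAM is "Memory".]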
Memory operation at any level
[Figure: "Me" sends an Address to the Cache, which sits in front of Memory; Data comes back to "Me". The arrows are numbered 1 to 5 to match the steps below.]
1. Cache receives request
2. Look for item in cache
   – Hit - return data
   – Miss - continue with steps 3 to 5
3. Request memory
4. Receive data
5. Update cache and return data
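The "any level" idea can be sketched as objects that each hold a reference to the level below and nothing else. This is an illustrative model, not the slides' design: the class names, the read_block method, and the simplification that every level uses the same block size and a direct-mapped layout are all assumptions.

```python
class MainMemory:
    """Bottom of the hierarchy: every request is satisfied here."""

    def __init__(self):
        self.requests = 0

    def read_block(self, block_addr):
        self.requests += 1
        return f"block {block_addr}"


class CacheLevel:
    """A direct-mapped cache that only knows 'memory' is whatever sits below it."""

    def __init__(self, num_sets, lower):
        self.num_sets = num_sets
        self.lower = lower   # could be another cache or main memory
        self.lines = {}      # index -> (tag, data)
        self.hits = 0
        self.misses = 0

    def read_block(self, block_addr):          # 1. cache receives request
        index = block_addr % self.num_sets
        tag = block_addr // self.num_sets
        line = self.lines.get(index)           # 2. look for item in cache
        if line is not None and line[0] == tag:
            self.hits += 1
            return line[1]                     # hit: return data
        self.misses += 1
        data = self.lower.read_block(block_addr)  # 3-4. request memory, receive data
        self.lines[index] = (tag, data)           # 5. update cache
        return data                               # 5. return data


dram = MainMemory()
l2 = CacheLevel(num_sets=8, lower=dram)
l1 = CacheLevel(num_sets=2, lower=l2)
for block in [0, 2, 0, 2]:
    l1.read_block(block)
```

In this run blocks 0 and 2 fight over the same entry in the tiny L1, so L1 misses all four times. L2 is large enough to hold both, so it misses twice and hits twice, and DRAM sees only two requests. L1 never needs to know whether its "memory" is a cache or DRAM.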
Timing
[Figure: the same Me / Cache / Memory diagram with two time spans marked, Access Time and Miss Penalty.]
1. Cache receives request
2. Look for item in cache
   – Hit - return data
   – Miss - request memory, receive block, update cache, return data
• Access Time: time spent in this cache looking for the item and returning data
• Miss Penalty: extra time on a miss to request memory, receive the block, and update the cache
Performance
• Hit: latency = access time
• Miss: latency = access time + miss penalty
• Goal: minimize misses!!!
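To see why misses dominate, here is a small sketch that applies the two latency rules to the hit/miss outcomes of Example 1. The 1-cycle access time and 100-cycle miss penalty are hypothetical numbers chosen only for illustration, and the function names are assumptions.

```python
def access_latency(hit, access_time, miss_penalty):
    """Latency of one access: a miss pays the access time plus the miss penalty."""
    return access_time if hit else access_time + miss_penalty


def average_latency(outcomes, access_time, miss_penalty):
    """Mean latency over a list of 'H' / 'M' outcomes."""
    total = sum(access_latency(o == "H", access_time, miss_penalty) for o in outcomes)
    return total / len(outcomes)


# Outcomes from Example 1 (4 misses, 2 hits); timing values are hypothetical.
example_outcomes = ["M", "M", "M", "H", "H", "M"]
avg = average_latency(example_outcomes, access_time=1, miss_penalty=100)
```

With these assumed numbers the total is 4 × 101 + 2 × 1 = 406 cycles over 6 accesses. The average equals access time + miss rate × miss penalty, so lowering the miss rate is the most direct way to lower it.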