Recap: Memory Hierarchy
1
Recap: Memory Hierarchy
2
Memory Hierarchy - the Big Picture
• Problem: memory is too slow and/or too small
• Solution: memory hierarchy
[Diagram: the memory hierarchy, from the Processor (Control, Datapath, Registers) through the L1 On-Chip Cache, the L2 Off-Chip Cache, and Main Memory (DRAM) down to Secondary Storage (Disk). Speed: fastest to slowest. Size: smallest to biggest. Cost: highest to lowest.]
3
Why Hierarchy Works
• The principle of locality
– Programs access a relatively small portion of the address space at any instant of time.
– Temporal locality: recently accessed instructions/data are likely to be used again.
– Spatial locality: instructions/data near recently accessed instructions/data are likely to be used soon.
• Result: the illusion of large, fast memory
[Figure: probability of reference plotted across the address space (0 to 2^n - 1), showing references clustering in a few regions.]
4
Example of Locality
int A[100], B[100], C[100], D;
for (int i = 0; i < 100; i++) {
    C[i] = A[i] * B[i] + D;
}
[Diagram: A[0]..A[99], B[0]..B[99], C[0]..C[99], and D laid out contiguously in memory, four array elements per block.]
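The loop above runs as-is once `i` is declared and the arrays hold values; a minimal runnable sketch (the initial values of A, B, and D are assumptions, since the slide does not give them):

```c
#include <assert.h>

/* Runnable sketch of the slide's loop. A[i], B[i], C[i] are touched in
 * consecutive-address order (spatial locality), while D is re-read on
 * every iteration (temporal locality). Initial values are assumed. */
int locality_demo(void) {
    int A[100], B[100], C[100], D = 1;
    for (int i = 0; i < 100; i++) { A[i] = i; B[i] = 2; }
    for (int i = 0; i < 100; i++)
        C[i] = A[i] * B[i] + D;   /* streams through A, B, C; reuses D */
    return C[99];                  /* 99 * 2 + 1 = 199 */
}
```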
5
Four Key Cache Questions:
1. Where can a block be placed in the cache? (block placement)
2. How can a block be found in the cache? …using a tag (block identification)
3. Which block should be replaced on a miss? (block replacement)
4. What happens on a write? (write strategy)
6
Q1: Block Placement
• Where can a block be placed in the cache?
– In one predetermined place - direct-mapped
• Use a fragment of the address to calculate the block location in the cache
• Compare the cache block's tag to test if the block is present
– Anywhere in the cache - fully associative
• Compare the tag to every block in the cache
– In a limited set of places - set-associative
• Use an address fragment to calculate the set
• Place in any block within the set
• Compare the tag to every block in the set
• Hybrid of direct-mapped and fully associative
7
Direct Mapped Block Placement
[Diagram: a four-block direct-mapped cache (columns *0, *4, *8, *C) below a memory of blocks at addresses 00, 04, 08, ..., 4C; every memory block maps to exactly one cache block.]
Address maps to block: location = (block address) MOD (# blocks in cache)
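The placement rule is just a modulo; a small illustrative sketch (function and parameter names are my own, not the slide's):

```c
#include <assert.h>

/* Direct-mapped placement per the slide's formula:
 * location = (block address) MOD (# blocks in cache). */
unsigned dm_location(unsigned block_addr, unsigned blocks_in_cache) {
    return block_addr % blocks_in_cache;
}

/* With a 4-block cache, memory blocks 0, 4, 8, ... all collide in
 * cache block 0, matching the repeating columns in the diagram. */
int dm_demo(void) {
    return dm_location(0, 4) == 0 &&
           dm_location(4, 4) == 0 &&
           dm_location(9, 4) == 1;
}
```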
8
[Diagram: a two-entry direct-mapped cache with a one-bit index and a five-bit tag (columns: Tag | Index | Data). Memory values 0x0F (address 000000) and 0xAA (address 111110) compete for index 0; 0x55 (address 000001) and 0xF0 (address 111111) compete for index 1.]
Direct mapping: a memory value can only be placed at a single corresponding location in the cache.
9
Fully Associative Block Placement
[Diagram: memory blocks at addresses 00, 04, 08, ..., 4C above a cache; any memory block may be placed in any cache block.]
Arbitrary block mapping: location = any
10
Fully Associative Mapping
[Diagram: a fully associative cache in which each entry stores a full six-bit tag alongside its data (Tag | Data); the values 0x0F, 0x55, 0xAA, and 0xF0 from addresses 000000, 000001, 111110, and 111111 may each occupy any entry.]
Fully associative mapping: a memory value can be anywhere in the cache.
11
Set-Associative Block Placement
[Diagram: memory blocks at addresses 00, 04, 08, ..., 4C mapping into a cache of four sets; addresses ending in *0 map to Set 0, *4 to Set 1, *8 to Set 2, *C to Set 3, with arbitrary placement inside a set.]
Address maps to set: location = (block address) MOD (# sets in cache), arbitrary location within the set
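The set-mapping rule can also be written out; a small illustrative sketch (names and the 16-block, 4-way geometry are assumptions for the example):

```c
#include <assert.h>

/* Set-associative placement per the slide: the block address picks a
 * set (MOD # sets), and the block may then occupy any way in that set. */
unsigned sa_set(unsigned block_addr, unsigned num_blocks, unsigned ways) {
    unsigned num_sets = num_blocks / ways;   /* # sets = blocks / ways */
    return block_addr % num_sets;
}

/* With 16 blocks organized 4-way there are 4 sets: memory blocks
 * 0, 4, 8, ... share set 0 but may sit in any of its four ways. */
int sa_demo(void) {
    return sa_set(0, 16, 4) == 0 &&
           sa_set(4, 16, 4) == 0 &&
           sa_set(6, 16, 4) == 2;
}
```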
12
Set Associative Mapping (2-Way)
[Diagram: a 2-way set-associative cache with a two-bit index and a four-bit tag, organized as Way 0 and Way 1 (columns: Tag | Index | Data). The values 0x0F, 0x55, 0xAA, and 0xF0 each map to a fixed set but may occupy either way of that set.]
Set-associative mapping: a memory value can be placed in any of a set of corresponding locations in the cache.
13
Q2: Block Identification
• Every cache block has an address tag and index that identify its location in memory
• Hit when the tag and index of the desired word match (comparison done by hardware)
• Q: What happens when a cache block is empty?
A: Mark this condition with a valid bit
[Diagram: a cache line with Valid = 1, Tag/index = 0x00001C0, Data = 0xff083c2d.]
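The valid-bit answer can be sketched in software (the struct layout is illustrative, not the slide's hardware):

```c
#include <assert.h>
#include <stdint.h>

/* A cache line needs a valid bit: an empty line must never match,
 * whatever stale tag bits it happens to contain. */
typedef struct {
    int      valid;
    uint32_t tag;    /* tag/index of the block held in this line */
    uint32_t data;
} cache_line;

int is_hit(const cache_line *line, uint32_t addr_tag) {
    return line->valid && line->tag == addr_tag;
}

int hit_demo(void) {
    cache_line line = { 1, 0x00001C0, 0xff083c2d };  /* slide's example */
    int hit  = is_hit(&line, 0x00001C0);   /* valid + matching tag */
    line.valid = 0;                         /* now an "empty" line  */
    int miss = !is_hit(&line, 0x00001C0);  /* same tag, but invalid */
    return hit && miss;
}
```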
14
Direct-Mapped Cache Design
[Diagram: a direct-mapped cache built from SRAM, with sample lines such as V = 1, Tag = 0x00001C0, Data = 0xff083c2d. The ADDRESS is split into Tag, Cache Index, and Byte Offset; the index selects a line, the stored tag (DATA[58:32]) is compared with the address tag, and DATA HIT is asserted when they match and the valid bit (DATA[59]) is set.]
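The address split that feeds this hardware can be sketched in software; the bit widths below (2-bit byte offset, 3-bit index, remaining bits as tag) are assumptions chosen for a small example, not the slide's exact geometry:

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 2   /* 4-byte blocks (assumed)  */
#define INDEX_BITS  3   /* 8-line cache (assumed)   */

uint32_t byte_offset(uint32_t addr) {
    return addr & ((1u << OFFSET_BITS) - 1);          /* low bits     */
}
uint32_t cache_index(uint32_t addr) {
    return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); /* middle */
}
uint32_t addr_tag(uint32_t addr) {
    return addr >> (OFFSET_BITS + INDEX_BITS);        /* the rest     */
}

int split_demo(void) {
    uint32_t a = 0x46;               /* binary 100_0110 */
    return byte_offset(a) == 2 &&    /* low 2 bits: 10   */
           cache_index(a) == 1 &&    /* next 3 bits: 001 */
           addr_tag(a)    == 2;      /* remaining: 10    */
}
```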
15
Set Associative Cache Design
• Key idea:
– Divide the cache into sets
– Allow a block anywhere within a set
• Advantage:
– Better hit rate
• Disadvantages:
– More tag bits
– More hardware
– Higher access time
[Figure 7.17: A Four-Way Set-Associative Cache. The 32-bit address supplies a 22-bit tag and an 8-bit index selecting one of 256 sets (0..255); the four ways' valid bits and tags (V | Tag | Data, four times) are compared in parallel, and a 4-to-1 multiplexor selects the hitting way's data to produce Hit and Data.]
16
Fully Associative Cache Design
• Key idea: set size of one block
– One comparator required for each block
– No address decoding
– Practical only for small caches due to hardware demands
[Diagram: the incoming tag 11110111 is compared in parallel against every stored tag; the matching entry drives its data (1111000011110000101011) onto the data-out lines.]
17
Cache Replacement Policy
• Random
– Replace a randomly chosen line
• LRU (Least Recently Used)
– Replace the least recently used line
18
LRU Policy
[Diagram: a four-entry LRU stack ordered MRU, MRU-1, LRU+1, LRU, starting as A B C D:
Access C -> C A B D (hit)
Access D -> D C A B (hit)
Access E -> E D C A (MISS, replacement needed)
Access C -> C E D A (hit)
Access G -> G C E D (MISS, replacement needed)]
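The access sequence on this slide can be reproduced with a small simulation (a sketch; the MRU-to-LRU array is one of several ways to implement LRU):

```c
#include <assert.h>
#include <string.h>

/* Minimal LRU sketch of the slide's 4-entry example: on a hit the
 * line moves to the MRU end; on a miss the LRU line is evicted. */
#define WAYS 4
static char lru_stack[WAYS];   /* [0] = MRU ... [WAYS-1] = LRU */

/* Returns 1 on a hit, 0 on a miss (with replacement of the LRU entry). */
int access_lru(char block) {
    int pos = WAYS - 1, hit = 0;
    for (int i = 0; i < WAYS; i++)
        if (lru_stack[i] == block) { pos = i; hit = 1; break; }
    for (int i = pos; i > 0; i--)     /* shift others toward LRU */
        lru_stack[i] = lru_stack[i - 1];
    lru_stack[0] = block;             /* accessed block becomes MRU */
    return hit;
}

int lru_demo(void) {
    memcpy(lru_stack, (char[]){'A', 'B', 'C', 'D'}, WAYS);  /* MRU..LRU */
    int misses = 0;
    const char *seq = "CDECG";        /* the slide's access sequence */
    for (int i = 0; seq[i]; i++)
        if (!access_lru(seq[i])) misses++;
    return misses;                    /* E and G miss -> 2 misses */
}
```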
19
Cache Write Strategies
• Need to keep the cache consistent with main memory
– Reads are easy - they require no modification
– Writes - when does the update occur?
1. Write-through: data is written to both the cache block and a block of main memory. The lower level always has the most up-to-date data - an important feature for I/O and multiprocessing. Easier to implement than write-back.
2. Write-back: data is written or updated only in the cache block. The modified (dirty) cache block is written to main memory only when it is replaced. Writes occur at the speed of the cache, and less memory bandwidth is used than with write-through.
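The two policies can be contrasted in a few lines (a hedged sketch using a single one-block "cache"; all names and values besides the slide's are illustrative):

```c
#include <assert.h>

/* Write-through updates cache and memory together; write-back only
 * sets a dirty bit and defers the memory update until eviction. */
typedef struct { unsigned data; int dirty; } block;

static unsigned memory_wt, memory_wb;   /* backing "main memory" */
static block    cache_wt,  cache_wb;

void write_through(unsigned v) {
    cache_wt.data = v;
    memory_wt = v;                 /* memory is always up to date */
}

void write_back(unsigned v) {
    cache_wb.data  = v;
    cache_wb.dirty = 1;            /* memory is now stale */
}

void evict_wb(void) {
    if (cache_wb.dirty) memory_wb = cache_wb.data;  /* write on eviction */
    cache_wb.dirty = 0;
}

int wt_demo(void) {                /* memory updated immediately */
    write_through(0x5678);
    return memory_wt == 0x5678;
}

int wb_demo(void) {                /* memory updated only at eviction */
    write_back(0x5678);
    int stale = (memory_wb != 0x5678);
    evict_wb();
    return stale && memory_wb == 0x5678;
}
```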
20
Write-Through Policy
[Diagram: the processor writes 0x5678 over 0x1234; both the cache block and the main-memory block are updated immediately.]
21
Write-Back Policy
[Diagram: the processor writes 0x5678 and later 0x9ABC; only the cache block is updated, and memory keeps the stale 0x1234 until the dirty block is written back on replacement.]
22
Write Buffer for Write Through
• A write buffer is needed between the cache and memory
– Processor: writes data into the cache and the write buffer
– Memory controller: writes the contents of the buffer to memory
• The write buffer is just a FIFO:
– Typical number of entries: 4
– Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle
[Diagram: Processor and Cache feed a Write Buffer, which drains to DRAM.]
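The FIFO behaviour described above can be sketched as a four-entry circular buffer (an illustrative sketch; the names and the full/empty return codes are my own):

```c
#include <assert.h>

/* Hypothetical 4-entry FIFO write buffer: the processor enqueues
 * stores; the memory controller drains them to DRAM in order. */
#define ENTRIES 4
static unsigned buf[ENTRIES];
static int head, tail, count;

int enqueue_store(unsigned v) {        /* processor side */
    if (count == ENTRIES) return 0;    /* buffer full: processor stalls */
    buf[tail] = v;
    tail = (tail + 1) % ENTRIES;
    count++;
    return 1;
}

int drain_to_dram(unsigned *out) {     /* memory-controller side */
    if (count == 0) return 0;          /* nothing pending */
    *out = buf[head];
    head = (head + 1) % ENTRIES;
    count--;
    return 1;
}

int fifo_demo(void) {
    for (unsigned i = 0; i < ENTRIES; i++)
        if (!enqueue_store(i)) return -1;   /* first four fit */
    if (enqueue_store(99)) return -2;       /* fifth must stall */
    unsigned v;
    drain_to_dram(&v);
    return (int)v;                          /* FIFO: oldest store (0) first */
}
```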
23
Unified vs. Separate Level 1 Cache
• Unified Level 1 Cache (Princeton Memory Architecture):
A single level 1 cache is used for both instructions and data.
• Separate instruction/data Level 1 caches (Harvard Memory Architecture):
The level 1 (L1) cache is split into two caches, one for instructions (the L1 I-cache) and the other for data (the L1 D-cache).
[Diagram, left - Unified Level 1 Cache (Princeton Memory Architecture): the Processor (Control, Datapath, Registers) connects to a single unified L1 cache.]
[Diagram, right - Separate Level 1 Caches (Harvard Memory Architecture): the Processor (Control, Datapath, Registers) connects to separate L1 I-cache and L1 D-cache.]