16memhier1 - brown universitycs.brown.edu/courses/cs033/lecture/16memhier1x.pdf · cs33 intro to...
TRANSCRIPT
![Page 1: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/1.jpg)
CS33 Intro to Computer Systems XVI–1 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
CS 33Architecture and Optimization (3)
![Page 2: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/2.jpg)
CS33 Intro to Computer Systems XVI–2 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Hyper Threading
Execution
FunctionalUnits
Integer/Branch
FPAdd
FPMult/Div Load Store
DataCache
DataData
Addr. Addr.
GeneralInteger
Operation Results
Instruction Control
InstructionCache
FetchControl
InstructionDecode
Address
Instructions
RetirementUnit
RegisterFile
Instruction Control
InstructionCache
FetchControl
InstructionDecode
Address
Instructions
RetirementUnit
RegisterFile
![Page 3: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/3.jpg)
CS33 Intro to Computer Systems XVI–3 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Chip
Multiple Cores
Execution
FunctionalUnits
Instruction Control
Integer/Branch
FPAdd
FPMult/Div Load Store
InstructionCache
DataCache
FetchControl
InstructionDecode
Address
Instructions
Operations
DataData
Addr. Addr.
GeneralInteger
Operation Results
RetirementUnit
RegisterFile
Execution
FunctionalUnits
Instruction Control
Integer/Branch
FPAdd
FPMult/Div Load Store
InstructionCache
DataCache
FetchControl
InstructionDecode
Address
Instructions
Operations
DataData
Addr. Addr.
GeneralInteger
Operation Results
RetirementUnit
RegisterFile
MoreCacheOther Stuff Other Stuff
![Page 4: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/4.jpg)
CS33 Intro to Computer Systems XVI–4 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
CS 33Memory Hierarchy I
![Page 5: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/5.jpg)
CS33 Intro to Computer SystemsXVI–5
Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Random-Access Memory (RAM)
• Key features
– RAM is traditionally packaged as a chip
– basic storage unit is normally a cell (one bit per cell)
– multiple RAM chips form a memory
• Static RAM (SRAM)
– each cell stores a bit with a four- or six-transistor circuit
– retains value indefinitely, as long as it is kept powered
– relatively insensitive to electrical noise (EMI), radiation, etc.
– faster and more expensive than DRAM
• Dynamic RAM (DRAM)
– each cell stores bit with a capacitor; transistor is used for access
– value must be refreshed every 10-100 ms
– more sensitive to disturbances (EMI, radiation,…) than SRAM
– slower and cheaper than SRAM
![Page 6: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/6.jpg)
CS33 Intro to Computer Systems XVI–6 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
SRAM vs DRAM Summary
Trans. Access Needs Needsper bit time refresh? EDC? Cost Applications
SRAM 4 or 6 1X No Maybe 100x Cache memories
DRAM 1 10X Yes Yes 1X Main memories,frame buffers
• EDC = error detection and correction• to cope with noise, etc.
![Page 7: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/7.jpg)
CS33 Intro to Computer Systems XVI–7 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Conventional DRAM Organization• d x w DRAM:
– dw total bits organized as d supercells of size wbits
cols
rows
0 1 2 3
0
1
2
3
Internal row buffer
16 x 8 DRAM chip
addr
data
supercell(2,1)
2 bits/
8 bits/
Memorycontroller
(to/from CPU)
![Page 8: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/8.jpg)
CS33 Intro to Computer Systems XVI–8 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Reading DRAM Supercell (2,1)Step 1(a): row access strobe (RAS) selects row 2Step 1(b): row 2 copied from DRAM array to row buffer
Cols
Rows
RAS = 20 1 2 3
0
1
2
Internal row buffer
16 x 8 DRAM chip
3
addr
data
2/
8/
Memorycontroller
![Page 9: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/9.jpg)
CS33 Intro to Computer Systems XVI–9 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Reading DRAM Supercell (2,1)Step 2(a): column access strobe (CAS) selects column 1Step 2(b): supercell (2,1) copied from buffer to data lines, and
eventually back to the CPU
Cols
Rows
0 1 2 3
0
1
2
3
Internal row buffer
16 x 8 DRAM chip
CAS = 1
addr
data
2/
8/
Memorycontroller
supercell(2,1)
supercell(2,1)
To CPU
![Page 10: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/10.jpg)
CS33 Intro to Computer Systems XVI–10 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Memory Modules
: supercell (i,j)
64 MB memory moduleconsisting ofeight 8Mx8 DRAMs
addr (row = i, col = j)
Memorycontroller
DRAM 7
DRAM 0
031 78151623243263 394047485556
64-bit doubleword at main memory address A
bits0-7
bits8-15
bits16-23
bits24-31
bits32-39
bits40-47
bits48-55
bits56-63
64-bit doubleword
031 78151623243263 394047485556
![Page 11: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/11.jpg)
CS33 Intro to Computer Systems XVI–11 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Enhanced DRAMs
• Basic DRAM cell has not changed since its invention in 1966
– commercialized by Intel in 1970
• DRAMs with better interface logic and faster I/O:– synchronous DRAM (SDRAM)
» uses a conventional clock signal instead of asynchronous control» allows reuse of the row addresses (e.g., RAS, CAS, CAS, CAS)
– double data-rate synchronous DRAM (DDR SDRAM)» DDR1
• twice as fast» DDR2
• four times as fast» DDR3
• eight times as fast
![Page 12: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/12.jpg)
CS33 Intro to Computer Systems XVI–12 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Enhanced DRAMs
DRAM Cell
Array
fSDR: n B/sec
fDRAM
Cell Array
I/O Buffer
DDR1: 2n B/sec
DRAM Cell
Array
I/O Buffer
2fDDR2: 4n B/sec
DRAM Cell
Array
I/O Buffer
4fDDR3: 8n B/sec
![Page 13: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/13.jpg)
CS33 Intro to Computer Systems XVI–13 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Summary
• Memory transfer speed increased by a factor of 8
– no increase in DRAM Cell Array speed– 8 times more data transferred at once
» 64 adjacent bytes fetched from DRAM
![Page 14: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/14.jpg)
CS33 Intro to Computer Systems XVI–14 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Quiz 1
A program is loading randomly selected bytes
from memory. These bytes will be delivered to
the processor on a DDR3 system at a speed
thatʼs n times that of an SDR system, where n
is:
a) 1
b) 2
c) 4
d) 8
![Page 15: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/15.jpg)
CS33 Intro to Computer Systems XVI–15 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Traditional Bus Structure Connecting CPU and Memory
• A bus is a collection of parallel wires that carry address, data, and control signals
• Buses are typically shared by multiple devices
Mainmemory
I/O bridgeBus interface
ALU
Register file
CPU chip
System bus Memory bus
![Page 16: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/16.jpg)
CS33 Intro to Computer Systems XVI–16 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Memory Read Transaction (1)
• CPU places address A on the memory bus
ALU
Register file
Bus interfaceA 0
Ax
Main memoryI/O bridge
%rax
Load operation: movq A, %rax
![Page 17: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/17.jpg)
CS33 Intro to Computer Systems XVI–17 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Memory Read Transaction (2)
• Main memory reads A from the memory bus, retrieves word x, and places it on the bus
ALU
Register file
Bus interface
x 0
Ax
Main memory
%rax
I/O bridge
Load operation: movq A, %rax
![Page 18: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/18.jpg)
CS33 Intro to Computer Systems XVI–18 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Memory Read Transaction (3)
• CPU reads word x from the bus and copies it into register %rax
xALU
Register file
Bus interface x
Main memory0
A
%rax
I/O bridge
Load operation: movq A, %rax
![Page 19: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/19.jpg)
CS33 Intro to Computer Systems XVI–19 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Memory Write Transaction (1)
• CPU places address A on bus. Main memory reads it and waits for the corresponding data word to arrive
yALU
Register file
Bus interfaceA
Main memory0
A
%rax
I/O bridge
Store operation: movq %rax, A
![Page 20: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/20.jpg)
CS33 Intro to Computer Systems XVI–20 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Memory Write Transaction (2)
• CPU places data word y on the bus
yALU
Register file
Bus interfacey
Main memory0
A
%rax
I/O bridge
Store operation: movq %rax, A
![Page 21: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/21.jpg)
CS33 Intro to Computer Systems XVI–21 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Memory Write Transaction (3)
• Main memory reads data word y from the bus and stores it at address A
yALU
register file
Bus interface y
main memory0
A
%rax
I/O bridge
Store operation: movq %rax, A
![Page 22: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/22.jpg)
CS33 Intro to Computer Systems XVI–22 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
A Mismatch
• A processor clock cycle is ~0.3 nsecs– SunLab machines (Intel Core i5-4690) run at 3.5 GHz
• Basic operations take 1 – 10 clock cycles– .3 – 3 nsecs
• Accessing memory takes 70-100 nsecs• How is this made to work?
![Page 23: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/23.jpg)
CS33 Intro to Computer Systems XVI–23 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Caching to the Rescue
CPU
Cache
![Page 24: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/24.jpg)
CS33 Intro to Computer Systems XVI–24 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Cache Memories
• Cache memories are small, fast SRAM-based memories managed automatically in hardware
– hold frequently accessed blocks of main memory• CPU looks first for data in caches (e.g., L1, L2, and L3),
then in main memory• Typical system structure:
Mainmemory
I/ObridgeBus interface
ALU
Register fileCPU chip
System bus Memory bus
Cache memories
![Page 25: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/25.jpg)
CS33 Intro to Computer Systems XVI–25 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
General Cache Concepts
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
8 9 14 3Cache
MemoryLarger, slower, cheaper memoryviewed as partitioned into “blocks”
Data is copied in block-sized transfer units
Smaller, faster, more expensivememory caches a subset ofthe blocks
4
4
4
10
10
10
![Page 26: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/26.jpg)
CS33 Intro to Computer Systems XVI–26 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
General Cache Concepts: Hit
0 1 2 34 5 6 78 9 10 11
12 13 14 15
8 9 14 3Cache
Memory
Data in block b is neededRequest: 14
14Block b is in cache:Hit!
![Page 27: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/27.jpg)
CS33 Intro to Computer Systems XVI–27 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
General Cache Concepts: Miss
0 1 2 34 5 6 78 9 10 11
12 13 14 15
8 9 14 3Cache
Memory
Data in block b is neededRequest: 12
Block b is not in cache:Miss!
Block b is fetched frommemoryRequest: 12
12
12
12
Block b is stored in cache• placement policy:
determines where b goes• replacement policy:
determines which blockgets evicted (victim)
![Page 28: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/28.jpg)
CS33 Intro to Computer Systems XVI–28 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
General Caching Concepts:
Types of Cache Misses
• Cold (compulsory) miss
– cold misses occur because the cache is empty
• Conflict miss
– most caches limit blocks to a small subset (sometimes a
singleton) of the block positions in RAM
» e.g., block i in RAM must be placed in block (i mod 4) in the cache
– conflict misses occur when the cache is large enough, but
multiple data objects all map to the same cache block
» e.g., referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time
• Capacity miss
– occurs when the set of active cache blocks (working set) is
larger than the cache
![Page 29: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/29.jpg)
CS33 Intro to Computer Systems XVI–29 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
General Cache Organization (S, E, B)E = 2e lines per set
S = 2s sets
setline
0 1 2 B-1tagv
B = 2b bytes per cache block (the data)
Cache size:C = S x E x B data bytes
valid bit
![Page 30: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/30.jpg)
CS33 Intro to Computer Systems XVI–30 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Cache ReadE = 2e lines per set
S = 2s sets
0 1 2 B-1tagv
valid bitB = 2b bytes per cache block (the data)
t bits s bits b bitsAddress of word:
tag setindex
blockoffset
data begins at this offset
• Locate set• Check if any line in set
has matching tag• Yes + line valid: hit• Locate data starting
at offset
![Page 31: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/31.jpg)
CS33 Intro to Computer Systems XVI–31 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Example: Direct Mapped Cache (E = 1)
S = 2s sets
Direct mapped: one line per setAssume: cache block size 8 bytes
t bits 0…01 100Address of int:
0 1 2 7tagv 3 654
0 1 2 7tagv 3 654
0 1 2 7tagv 3 654
0 1 2 7tagv 3 654
find set
![Page 32: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/32.jpg)
CS33 Intro to Computer Systems XVI–32 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Example: Direct Mapped Cache (E = 1)Direct mapped: one line per setAssume: cache block size 8 bytes
t bits 0…01 100Address of int:
0 1 2 7tagv 3 654
match: assume yes = hitvalid? +
block offset
tag
![Page 33: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/33.jpg)
CS33 Intro to Computer Systems XVI–33 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Example: Direct Mapped Cache (E = 1)Direct mapped: one line per setAssume: cache block size 8 bytes
t bits 0…01 100Address of int:
0 1 2 7tagv 3 654
match: assume yes = hitvalid? +
int (4 Bytes) is here
block offset
No match: old line is evicted and replaced
![Page 34: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/34.jpg)
CS33 Intro to Computer Systems XVI–34 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Direct-Mapped Cache SimulationM=16 byte addresses, B=2 bytes/block,
S=4 sets, E=1 Blocks/set
Address trace (reads, one byte per read):
0 [00002], 1 [00012], 7 [01112], 8 [10002], 0 [00002]
x
t=1 s=2 b=1
xx x
0 ? ?
v Tag Block
miss
1 0 M[0-1]
hitmiss
1 0 M[6-7]
miss
1 1 M[8-9]
miss
1 0 M[0-1]Set 0Set 1Set 2Set 3
![Page 35: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/35.jpg)
CS33 Intro to Computer Systems XVI–35 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
A Higher-Level Exampleint sum_array_rows(double a[16][16]){
int i, j;double sum = 0;
for (i = 0; i < 16; i++)for (j = 0; j < 16; j++)
sum += a[i][j];return sum;
}
32 B = 4 doubles
assume: cold (empty) cache,a[0][0] goes here
int sum_array_cols(double a[16][16]){
int i, j;double sum = 0;
for (j = 0; i < 16; i++)for (i = 0; j < 16; j++)
sum += a[i][j];return sum;
}
Ignore the variables sum, i, j
![Page 36: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/36.jpg)
CS33 Intro to Computer Systems XVI–36 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
A Higher-Level Exampleint sum_array_rows(double a[16][16]){
int i, j;double sum = 0;
for (i = 0; i < 16; i++)for (j = 0; j < 16; j++)
sum += a[i][j];return sum;
}
32 B = 4 doubles
int sum_array_cols(double a[16][16]){
int i, j;double sum = 0;
for (j = 0; i < 16; i++)for (i = 0; j < 16; j++)
sum += a[i][j];return sum;
}
a0,0 a0,1 a0,2 a0,3
a0,4 a0,5 a0,6 a0,7
a0,8 a0,9 a0,10 a0,11
a0,12 a0,13 a0,14 a0,15
a1,0 a1,1 a1,2 a1,3
a1,4 a1,5 a1,6 a1,7
a1,8 a1,9 a1,10 a1,11
a1,12 a1,13 a1,14 a1,15
![Page 37: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/37.jpg)
CS33 Intro to Computer Systems XVI–37 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
A Higher-Level Exampleint sum_array_rows(double a[16][16]){
int i, j;double sum = 0;
for (i = 0; i < 16; i++)for (j = 0; j < 16; j++)
sum += a[i][j];return sum;
}
32 B = 4 doubles
int sum_array_cols(double a[16][16]){
int i, j;double sum = 0;
for (j = 0; j < 16; i++)for (i = 0; i < 16; j++)
sum += a[i][j];return sum;
}
a0,0 a0,1 a0,2 a0,3
a1,0 a1,1 a1,2 a1,3
a2,0 a2,1 a2,2 a2,3
a3,0 a3,1 a3,2 a3,3
![Page 38: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/38.jpg)
CS33 Intro to Computer Systems XVI–38 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Conflict Misses: Aligned
double dotprod(double x[8], double y[8]) {double sum = 0.0;int i;
for (i=0; i<8; i++)sum += x[i] * y[i];
return sum;}
32 B = 4 doubles
x0 x1 x2 x3y0 y1 y2 y3x0 x1 x2 x3y0 y1 y2 y3x0 x1 x2 x3y1 y2 y3y0x0 x1 x2 x3y0 y1 y2 y3
![Page 39: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/39.jpg)
CS33 Intro to Computer Systems XVI–39 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Different Alignments
double dotprod(double x[8], double y[8]) {double sum = 0.0;int i;
for (i=0; i<8; i++)sum += x[i] * y[i];
return sum;}
32 B = 4 doubles
x0 x1 x2 x3
y0 y1 y2 y3x4 x5 x6 x7
y4 y5 y6 y7
![Page 40: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/40.jpg)
CS33 Intro to Computer Systems XVI–40 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
E-way Set-Associative Cache (Here: E = 2)E = 2: two lines per setAssume: cache block size 8 bytes
t bits 0…01 100Address of short int:
0 1 2 7tagv 3 654 0 1 2 7tagv 3 654
compare both
valid? + match: yes = hit
block offset
tag
![Page 41: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/41.jpg)
CS33 Intro to Computer Systems XVI–41 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
E-way Set-Associative Cache (Here: E = 2)E = 2: two lines per set
Assume: cache block size 8 bytes
t bits 0…01 100
Address of short int:
0 1 2 7tagv 3 654 0 1 2 7tagv 3 654
compare both
valid? + match: yes = hit
block offset
short int (2 Bytes) is here
No match:
• One line in set is selected for eviction and replacement
• Replacement policies: random, least recently used (LRU), …
![Page 42: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/42.jpg)
CS33 Intro to Computer Systems XVI–42 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Quiz 2 100 01 100Address of int:
8 8 8 9tag=2v 8 999
4 4 4 5tag=0v 4 555
0 0 0 1tag=0v 0 111
c c c dtag=4v c ddd
0
1
2
3
tag=3v
tag=4v
tag=2v
tag=av
2 2 2 32 333
6 6 6 76 777
a a a ba bbb
e e e fe fff
Given the address above and the cache contents as shown, what is the value of the int at the given address?
a) 1111b) 3333c) 4444d) 7777
![Page 43: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/43.jpg)
CS33 Intro to Computer Systems XVI–43 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
2-Way Set-Associative Cache Simulation
M=16 byte addresses, B=2 bytes/block, S=2 sets, E=2 blocks/set
Address trace (reads, one byte per read):0 [00002], 1 [00012], 7 [01112], 8 [10002], 0 [00002]
xxt=2 s=1 b=1
x x
0 ? ?v Tag Block
0
00
miss
1 00 M[0-1]
hitmiss
1 01 M[6-7]
miss
1 10 M[8-9]
hit
Set 0
Set 1
![Page 44: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/44.jpg)
CS33 Intro to Computer Systems XVI–44 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
A Higher-Level Exampleint sum_array_rows(double a[16][16]){
int i, j;double sum = 0;
for (i = 0; i < 16; i++)for (j = 0; j < 16; j++)
sum += a[i][j];return sum;
}
32 B = 4 doubles
assume: cold (empty) cache,a[0][0] goes here
int sum_array_rows(double a[16][16]){
int i, j;double sum = 0;
for (j = 0; j < 16; i++)for (i = 0; i < 16; j++)
sum += a[i][j];return sum;
}
Ignore the variables sum, i, j
![Page 45: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/45.jpg)
CS33 Intro to Computer Systems XVI–45 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
A Higher-Level Example
32 B = 4 doubles
Ignore the variables sum, i, j
a0,0 a0,1 a0,2 a0,3
a0,4 a0,5 a0,6 a0,7
a0,8 a0,9 a0,10 a0,11
a0,12 a0,13 a0,14 a0,15
a1,0 a1,1 a1,2 a1,3
a1,4 a1,5 a1,6 a1,7
a1,8 a1,9 a1,10 a1,11
a1,12 a1,13 a1,14 a1,15
int sum_array_rows(double a[16][16]){
int i, j;double sum = 0;
for (i = 0; i < 16; i++)for (j = 0; j < 16; j++)
sum += a[i][j];return sum;
}
int sum_array_cols(double a[16][16]){
int i, j;double sum = 0;
for (j = 0; i < 16; i++)for (i = 0; j < 16; j++)
sum += a[i][j];return sum;
}
![Page 46: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/46.jpg)
CS33 Intro to Computer Systems XVI–46 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
A Higher-Level Example
32 B = 4 doubles
Ignore the variables sum, i, j
a0,0 a0,1 a0,2 a0,3 a1,0 a1,1 a1,2 a1,3a2,0 a2,1 a2,2 a2,3 a3,0 a3,1 a3,2 a3,3
int sum_array_rows(double a[16][16]){
int i, j;double sum = 0;
for (i = 0; i < 16; i++)for (j = 0; j < 16; j++)
sum += a[i][j];return sum;
}
int sum_array_cols(double a[16][16]){
int i, j;double sum = 0;
for (j = 0; i < 16; i++)for (i = 0; j < 16; j++)
sum += a[i][j];return sum;
}
![Page 47: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/47.jpg)
CS33 Intro to Computer Systems XVI–47 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Conflict Misses
double dotprod(double x[8], double y[8]) {double sum = 0.0;int i;
for (i=0; i<8; i++)sum += x[i] * y[i];
return sum;}
32 B = 4 doubles
x0 x1 x2 x3 y0 y1 y2 y3
x4 x5 x6 x7 y4 y5 y6 y7
![Page 48: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/48.jpg)
CS33 Intro to Computer Systems XVI–48 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
Intel Core i5 and i7 Cache Hierarchy
Regs
L1 d-cache
L1 i-cache
L2 unified cache
Core 0
Regs
L1 d-cache
L1 i-cache
L2 unified cache
Core 3
…
L3 unified cache(shared by all cores)
Main memory
Processor packageL1 i-cache and d-cache:
32 KB, 8-way, Access: 4 cycles
L2 unified cache:256 KB, 8-way,
Access: 11 cycles
L3 unified cache:8 MB, 16-way,Access: 30-40 cycles
Block size: 64 bytes for all caches
![Page 49: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved](https://reader036.vdocuments.net/reader036/viewer/2022081606/5c2f653109d3f2d20b8d5ce5/html5/thumbnails/49.jpg)
CS33 Intro to Computer Systems XVI–49 Copyright © 2018 Thomas W. Doeppner. All rights reserved.
What About Writes?
• Multiple copies of data exist:– L1, L2, main memory, disk
• What to do on a write-hit?– write-through (write immediately to memory)
– write-back (defer write to memory until replacement of line)» need a dirty bit (line different from memory or not)
• What to do on a write-miss?– write-allocate (load into cache, update line in cache)
» good if more writes to the location follow
– no-write-allocate (writes immediately to memory)
• Typical– write-through + no-write-allocate
– write-back + write-allocate