CSC/ECE 506: Architecture of Parallel Computers
The Cache-Coherence Problem
Chapter 7

Example 1: The Cache-Coherence Problem

    sum = 0;
    begin parallel
    for (i=0; i<2; i++) {
      lock(id, myLock);
      sum = sum + a[i];
      unlock(id, myLock);
    }
    end parallel
    print sum;

Suppose a[0] = 3 and a[1] = 7.
[Figure: processors P1, P2, ..., Pn, each with its own cache.]
• Will it print sum = 10?

Cache-Coherence Problem Illustration
[Figure: P1, P2, and P3, each with a private cache, share a bus with the main-memory controller.]
Trace: P1 Read Sum; P2 Read Sum; P1 Write Sum = 3; P2 Write Sum = 7; P1 Read Sum.
Cache-line flags: V = valid, D = dirty.
• Start state. All caches empty and main memory has Sum = 0.
• P1 reads Sum from memory. P1 cache: Sum = 0 (V). Main memory: Sum = 0.
• P2 reads. Let’s assume this comes from memory too. P1 cache: Sum = 0 (V); P2 cache: Sum = 0 (V). Main memory: Sum = 0.
• P1 writes. This write goes to the cache. P1 cache: Sum = 3 (D); P2 cache: Sum = 0 (V). Main memory: Sum = 0.
• P2 writes. P1 cache: Sum = 3 (D); P2 cache: Sum = 7 (D). Main memory: Sum = 0.
• P1 reads. P1 cache: Sum = 3 (D); P2 cache: Sum = 7 (D). Main memory: Sum = 0.

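
The trace above can be replayed in a few lines of Python. This is a minimal sketch, not a model of any real hardware: the `Cache` class, its `read`/`write` methods, and the final read by P2 are assumptions added for illustration. Each cache is write-back and has no coherence mechanism, so a hit never consults memory or the other cache.

```python
class Cache:
    """Private write-back cache with no coherence support (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # address -> [value, flag]; flag is "V" (valid) or "D" (dirty)

    def read(self, addr):
        if addr not in self.lines:               # miss: fetch the block from memory
            self.lines[addr] = [self.memory[addr], "V"]
        return self.lines[addr][0]               # hit: memory is never consulted

    def write(self, addr, value):
        self.lines[addr] = [value, "D"]          # write-back: memory is not updated


memory = {"Sum": 0}
p1, p2 = Cache(memory), Cache(memory)

p1.read("Sum")                 # P1 Read Sum
p2.read("Sum")                 # P2 Read Sum
p1.write("Sum", 3)             # P1 Write Sum = 3
p2.write("Sum", 7)             # P2 Write Sum = 7
seen_by_p1 = p1.read("Sum")    # P1 Read Sum
seen_by_p2 = p2.read("Sum")    # extra read, not in the slide trace

print(seen_by_p1, seen_by_p2, memory["Sum"])  # prints: 3 7 0
```

Three different values of Sum exist at once (3 in P1's cache, 7 in P2's cache, 0 in memory), which is the incoherence the illustration is meant to expose.
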
Cache-Coherence Problem
• Do P1 and P2 see the same sum?
• Does it matter if we use a write-through (WT) cache?
• What if we do not have caches, or sum is uncacheable? Will it work?

Write-Through Cache Does Not Work
[Figure: P1, P2, and P3, each with a private cache, share a bus with the main-memory controller.]
Trace: P1 Read Sum; P2 Read Sum; P1 Write Sum = 3; P2 Write Sum = 7; P1 Read Sum.
• P1 reads. P1 cache: Sum = 3; P2 cache: Sum = 7. Main memory: Sum = 7.
• With write-through, both writes reach main memory, so memory ends at Sum = 7. P1's final read still hits its own stale copy and returns 3, because nothing invalidated or updated that copy.

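
To compare the two write policies directly, the same five-operation trace can be run with and without write-through. This is a hedged sketch: `run_trace` and its `write_through` flag are hypothetical names, and the caches are plain dictionaries with no snooping.

```python
def run_trace(write_through):
    """Replay the slide trace on two private caches that never snoop the bus."""
    memory = {"Sum": 0}
    caches = {"P1": {}, "P2": {}}

    def read(proc, addr):
        if addr not in caches[proc]:          # miss: fill from memory
            caches[proc][addr] = memory[addr]
        return caches[proc][addr]             # hit: return the local copy

    def write(proc, addr, value):
        caches[proc][addr] = value
        if write_through:                     # write-through also updates memory
            memory[addr] = value

    read("P1", "Sum")
    read("P2", "Sum")
    write("P1", "Sum", 3)
    write("P2", "Sum", 7)
    # Final values seen by P1 and P2, plus the contents of memory.
    return read("P1", "Sum"), read("P2", "Sum"), memory["Sum"]


print(run_trace(write_through=False))  # prints: (3, 7, 0)
print(run_trace(write_through=True))   # prints: (3, 7, 7)
```

Write-through only fixes memory. P1 still reads 3 from its own cache in both runs, so the policy alone does not make the caches coherent.
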
Coherence with Write-Through Caches

    sum = 0;
    begin parallel
    for (i=0; i<2; i++) {
      lock(id, myLock);
      sum = sum + a[i];
      unlock(id, myLock);
    }
    end parallel
    print sum;

Suppose a[0] = 3 and a[1] = 7.
[Figure: processors P1, P2, ..., Pn, each with a cache; a marked symbol denotes a snooper.]
– Bus-based SMP implementation choice in the mid-’80s
– What happens when we snoop a write? Invalidate or update the block.
• No new states or bus transactions in this case
• Write-update protocol: the write is immediately propagated
• Write-invalidation protocol: causes a miss on later access, and memory is up-to-date via write-through

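
The snooping idea can be sketched as follows. All names here (`Bus`, `SnoopingCache`, `run`, the `policy` argument) are hypothetical, and the model makes simplifying assumptions: every write is written through and broadcast atomically, the lock is modeled by running the two critical sections one after the other, and both caches are preloaded with Sum = 0 so that a stale copy exists.

```python
class Bus:
    """Shared bus: carries every write to memory and to the other snoopers."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def broadcast_write(self, writer, addr, value):
        self.memory[addr] = value                 # write-through keeps memory current
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(addr, value)


class SnoopingCache:
    def __init__(self, bus, policy):
        self.bus = bus
        self.policy = policy                      # "invalidate" or "update"
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                # miss: memory is up to date
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)

    def snoop_write(self, addr, value):
        if addr in self.lines:
            if self.policy == "invalidate":
                del self.lines[addr]              # a later access will miss
            else:
                self.lines[addr] = value          # write-update: take the new value


def run(policy):
    bus = Bus({"Sum": 0})
    p1, p2 = SnoopingCache(bus, policy), SnoopingCache(bus, policy)
    p1.read("Sum")
    p2.read("Sum")                                # both caches now hold Sum = 0
    for cache, a_i in ((p1, 3), (p2, 7)):         # critical sections run one at a time
        cache.write("Sum", cache.read("Sum") + a_i)
    return p1.read("Sum"), p2.read("Sum"), bus.memory["Sum"]


print(run("invalidate"))  # prints: (10, 10, 10)
print(run("update"))      # prints: (10, 10, 10)
```

Under either policy P2 no longer adds to a stale 0, so both processors and memory agree on 10.
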
Bus-Based Coherent Multiprocessors
Chapter 8

MSI: Processor-Initiated Transactions
[State diagram over states M (modified), S (shared), and I (invalid). Edges are labeled processor request/bus transaction; "–" means no bus transaction.]
• I → S: PrRd/BusRd
• I → M: PrWr/BusRdX
• S → S: PrRd/–
• S → M: PrWr/BusRdX
• M → M: PrRd/–, PrWr/–
Why does a PrWr in state S induce a BusRdX?
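
The processor-initiated half of the MSI diagram can be written as a lookup table. This is a sketch under the usual textbook reading of the diagram; the names `PROCESSOR_TRANSITIONS` and `on_processor_request` are hypothetical, and `None` stands for the "–" (no bus transaction) label.

```python
# (current state, processor request) -> (next state, bus transaction or None)
PROCESSOR_TRANSITIONS = {
    ("I", "PrRd"): ("S", "BusRd"),
    ("I", "PrWr"): ("M", "BusRdX"),
    ("S", "PrRd"): ("S", None),
    ("S", "PrWr"): ("M", "BusRdX"),   # other sharers must be invalidated first
    ("M", "PrRd"): ("M", None),
    ("M", "PrWr"): ("M", None),
}


def on_processor_request(state, request):
    """Return (next_state, bus_transaction) for a request from the local processor."""
    return PROCESSOR_TRANSITIONS[(state, request)]


# Walk one cache line through read, write, read, starting from Invalid.
state = "I"
history = []
for request in ("PrRd", "PrWr", "PrRd"):
    state, bus_op = on_processor_request(state, request)
    history.append((request, state, bus_op))

print(history)
# prints: [('PrRd', 'S', 'BusRd'), ('PrWr', 'M', 'BusRdX'), ('PrRd', 'M', None)]
```

The S-state write is the entry that answers the slide's question: a cache holding the block in S cannot know whether other caches hold it too, so it must announce the write on the bus before modifying its copy.
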
MSI: Bus-Initiated Transactions
[State diagram over states M, S, and I. Edges are labeled snooped bus transaction/action; "–" means no action.]
• M → S: BusRd/Flush
• M → I: BusRdX/Flush
• S → S: BusRd/–
• S → I: BusRdX/–
• I → I: BusRd/–, BusRdX/–
Thus, valid data must be supplied by memory.
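
The snooper's side of MSI can be tabulated in the same style. This is again a sketch of the usual textbook reading; `SNOOP_TRANSITIONS` and `on_snooped_transaction` are hypothetical names, and `None` stands for the "–" (no action) label.

```python
# (current state, snooped bus transaction) -> (next state, action or None)
SNOOP_TRANSITIONS = {
    ("M", "BusRd"): ("S", "Flush"),    # supply the dirty block, keep a shared copy
    ("M", "BusRdX"): ("I", "Flush"),   # supply the dirty block, then give it up
    ("S", "BusRd"): ("S", None),       # a sharer stays silent
    ("S", "BusRdX"): ("I", None),
    ("I", "BusRd"): ("I", None),
    ("I", "BusRdX"): ("I", None),
}


def on_snooped_transaction(state, bus_op):
    """Return (next_state, action) when another cache's transaction is snooped."""
    return SNOOP_TRANSITIONS[(state, bus_op)]


def data_supplier(states, bus_op):
    """Who supplies the block: the cache that flushes, or memory if none does."""
    for name, state in states.items():
        if on_snooped_transaction(state, bus_op)[1] == "Flush":
            return name
    return "memory"


print(data_supplier({"P1": "M", "P2": "I"}, "BusRd"))  # prints: P1
print(data_supplier({"P1": "S", "P2": "S"}, "BusRd"))  # prints: memory
```

Only a cache in M flushes. When every other copy is in S or I, nobody answers, so the valid data has to come from memory.
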
MSI Visualization – Start State
[Figure: P1, P2, and P3, each with a cache and a snooper, share a bus with the main-memory controller.]
Trace: P1 Read A; P1 Write A = 2; P3 Read A; P3 Write A = 3; P1 Read A; P3 Read A; P2 Read A.
• Start state. All caches empty and main memory has A = 1.

Processor P1 Reads A
Bus events: P1 PrRd A; P1 BusRd A; Mem returns data.
• Processor P1 attempts to read A from its cache.
• Processor P1 issues a BusRd.
• Main memory returns data to processor P1, which updates its cache.
• Read operation completes.
State afterwards: P1 cache: A = 1 (S); P2 and P3 caches empty. Main memory: A = 1.

Processor P1 Writes A = 2
Bus events: P1 PrWr A; P1 BusRdX.
• Processor P1 writes to its cache.
• Processor P1 issues a BusRdX request, and its line moves from S to M.
• Write operation completes.
State afterwards: P1 cache: A = 2 (M); P2 and P3 caches empty. Main memory: A = 1.

Processor P3 Reads A
Bus events: P3 PrRd A; P3 BusRd A; P1 snoops BusRd; P1 Flush.
• Processor P3 attempts to read A from its cache.
• Processor P3 issues a BusRd.
• Processor P1 snoops the BusRd from processor P3. P1's line moves from M to S.
• Processor P1 flushes, sending updated data to P3 and main memory.
• Read operation completes.
State afterwards: P1 cache: A = 2 (S); P2 cache empty; P3 cache: A = 2 (S). Main memory: A = 2.

Processor P3 Writes A = 3
Bus events: P3 PrWr A; P3 BusRdX; P1 snoops BusRdX.
• Processor P3 writes to its cache.
• Processor P3 issues a BusRdX request.
• Processor P1 snoops the BusRdX and invalidates its cache line.
• Write operation completes.
State afterwards: P1 cache: A = 2 (I); P2 cache empty; P3 cache: A = 3 (M). Main memory: A = 2.

Processor P1 Reads A
Bus events: P1 PrRd A; P1 BusRd A; P3 snoops BusRd; P3 Flush.
• Processor P1 attempts to read A from its cache. The line is invalid (I), so the read misses.
• Processor P1 issues a BusRd request.
• Processor P3 snoops the BusRd.
• Processor P3 flushes, updating processor P1, main memory, and its own cache state (M to S).
• Read operation completes.
State afterwards: P1 cache: A = 3 (S); P2 cache empty; P3 cache: A = 3 (S). Main memory: A = 3.

Processor P3 Reads A
Events: P3 PrRd A; P3 returns data. No bus transaction is needed.
• Processor P3 reads from its cache.
• Processor P3 returns valid data from its cache.
• Read operation completes.
State afterwards: P1 cache: A = 3 (S); P2 cache empty; P3 cache: A = 3 (S). Main memory: A = 3.

Processor P2 Reads A
Bus events: P2 PrRd A; P2 BusRd A; MemCntr observes BusRd; Mem returns data.
• Processor P2 attempts to read A from its cache. The cache is empty, so the read misses.
• Processor P2 issues a BusRd request.
• Main memory controller observes the BusRd.
• Main memory returns valid data.
• Operation completes.
State afterwards: P1 cache: A = 3 (S); P2 cache: A = 3 (S); P3 cache: A = 3 (S). Main memory: A = 3.

Example: Rd/Wr to a single line

| Proc Action | State P1 | State P2 | State P3 | Bus Action | Data From |
|---|---|---|---|---|---|
| R1 | S | – | – | BusRd | Mem |
| W1 | M | – | – | BusRdX* | Mem |
| R3 | S | – | S | BusRd | P1 cache |
| W3 | I | – | M | BusRdX* | Mem |
| R1 | S | – | S | BusRd | P3 cache |
| R3 | S | – | S | – | Local Cache |
| R2 | S | S | S | BusRd | Mem |

*or, BusUpgr (data from own cache)
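
The table can be reproduced with a small simulator. This is an illustrative sketch, not the course's reference model: `MSISystem` and `run_trace` are hypothetical names, the bus is treated as atomic, BusUpgr is not modeled (every upgrade uses BusRdX), and an ASCII "-" stands for the table's "–".

```python
class MSISystem:
    """Toy MSI model: one block per address, atomic bus, no BusUpgr."""

    def __init__(self, processors, memory):
        self.memory = dict(memory)
        self.state = {p: {} for p in processors}  # processor -> addr -> "M"/"S"/"I"
        self.data = {p: {} for p in processors}   # processor -> addr -> cached value

    def _snoop(self, requester, addr, bus_op):
        """Apply bus_op to every other cache; return the cache that flushed, if any."""
        flusher = None
        for p in self.state:
            st = self.state[p].get(addr, "I")
            if p == requester or st == "I":
                continue
            if st == "M":                          # owner flushes the dirty block
                self.memory[addr] = self.data[p][addr]
                flusher = p
            self.state[p][addr] = "S" if bus_op == "BusRd" else "I"
        return flusher

    def read(self, p, addr):
        if self.state[p].get(addr, "I") != "I":
            return None, "Local cache"             # hit: no bus transaction
        flusher = self._snoop(p, addr, "BusRd")
        self.data[p][addr] = self.memory[addr]     # memory is current after any flush
        self.state[p][addr] = "S"
        return "BusRd", (flusher + " cache") if flusher else "Mem"

    def write(self, p, addr, value):
        bus_op, source = None, "Local cache"
        if self.state[p].get(addr, "I") != "M":
            flusher = self._snoop(p, addr, "BusRdX")
            bus_op = "BusRdX"
            source = (flusher + " cache") if flusher else "Mem"
        self.data[p][addr] = value
        self.state[p][addr] = "M"
        return bus_op, source


def run_trace():
    system = MSISystem(["P1", "P2", "P3"], {"A": 1})
    trace = [("R", "P1", None), ("W", "P1", 2), ("R", "P3", None),
             ("W", "P3", 3), ("R", "P1", None), ("R", "P3", None),
             ("R", "P2", None)]
    rows = []
    for op, p, value in trace:
        if op == "R":
            bus_op, source = system.read(p, "A")
        else:
            bus_op, source = system.write(p, "A", value)
        states = tuple(system.state[q].get("A", "-") for q in ("P1", "P2", "P3"))
        rows.append((op + p[1],) + states + (bus_op or "-", source))
    return rows, system.memory["A"]


rows, final_memory = run_trace()
for row in rows:
    print(*row)
```

Each printed row lists the action, the three cache states, the bus transaction, and the data source, in the same column order as the table.
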