M4 – Parallelism
Directory based Cache Coherence Protocol
Outline
● Parallelism
● Flynn’s classification
● Vector Processing
– Subword Parallelism
● Symmetric Multiprocessors, Distributed Memory Machines
– Shared Memory Multiprocessing, Message Passing
● Synchronization Primitives
– Locks, LL-SC
● Cache coherence
Shared Memory vs. Distributed Memory
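[Figure: left, a shared memory machine with four processors (P), each with a cache (C), attached to one main memory. Right, a distributed memory machine with four processors (P), each with its own memory (M), connected by an interconnect.]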
Directory Based Cache Coherence
[Figure: three nodes (CPU A, CPU B, CPU C), each with a private cache, a memory (M) and a directory (D), connected by an interconnection network.]
[Figure, A: Read X. CPU A issues a read of block X. A's private cache then holds X in the Shared state, and the directory entry for X records S: A.]
[Figure, B: Read X. CPU B issues a read of block X and a Read X message travels to the directory that tracks X (entry S: A). The directory entry becomes S: A, B, the block X is returned to B, and B's private cache holds X in the Shared state alongside A's.]
[Figure, A: Write X. With the directory entry at S: A, B and both copies Shared, CPU A issues a write to X. An Inv X message is sent to B, B's copy becomes Invalid, and B replies with an ACK. The directory entry then becomes M: A and A's private cache holds X in the Modified state.]
[Figure, C: Write X. Labels recoverable from the slide: directory entries S: A, B and M: A; cache states Shared and Modified; an Invalidate message; requests B: Read X and C, A: Write X.]
Directory Based Cache Coherence
● Broadcast-based snooping protocols do not scale well to large multiprocessors
● Distributed Memory Machines
– Physical memory is distributed among all processors
● Directory tracks sharing status of a block of memory
– Each node has a directory
● Physical address determines data location
● Coherence messages are sent between nodes over the interconnection network (ICN)
– Point-to-point messages (no broadcast)
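The points above can be sketched as a small Python model. This is an illustrative sketch, not the protocol of any particular machine: the names `home_node`, `DirEntry` and `DirectoryProtocol`, the 64-byte block size, the "U" (uncached) directory state, and the "Fetch" and "Data" message names are all assumptions made for the example. Data values and write-back traffic are not modeled; only states, sharer lists and point-to-point messages are.

```python
from dataclasses import dataclass, field

NUM_NODES = 3      # assumption: nodes 0, 1, 2 host CPUs A, B, C
BLOCK_SIZE = 64    # assumption: 64-byte blocks


def home_node(addr):
    """Pick the node whose memory and directory hold this address (block interleaving)."""
    return (addr // BLOCK_SIZE) % NUM_NODES


@dataclass
class DirEntry:
    state: str = "U"                            # U (uncached), S (shared), M (modified)
    sharers: set = field(default_factory=set)   # CPUs that hold a copy


class DirectoryProtocol:
    def __init__(self):
        self.entries = {}   # block address -> DirEntry
        self.caches = {}    # (cpu, addr) -> "S" or "M"; a missing key means Invalid
        self.log = []       # point-to-point messages as (source, destination, kind)

    def _send(self, src, dst, kind):
        self.log.append((src, dst, kind))

    def read(self, cpu, addr):
        if (cpu, addr) in self.caches:
            return                              # read hit, no messages
        home = f"dir{home_node(addr)}"
        entry = self.entries.setdefault(addr, DirEntry())
        self._send(cpu, home, "Read")
        if entry.state == "M":
            (owner,) = entry.sharers            # the single Modified copy
            self._send(home, owner, "Fetch")    # owner downgrades to Shared
            self.caches[(owner, addr)] = "S"
        entry.state = "S"
        entry.sharers.add(cpu)
        self.caches[(cpu, addr)] = "S"
        self._send(home, cpu, "Data")

    def write(self, cpu, addr):
        if self.caches.get((cpu, addr)) == "M":
            return                              # write hit on a Modified block
        home = f"dir{home_node(addr)}"
        entry = self.entries.setdefault(addr, DirEntry())
        self._send(cpu, home, "Write")
        for other in sorted(entry.sharers - {cpu}):
            self._send(home, other, "Inv")      # invalidate only the recorded sharers
            self.caches.pop((other, addr), None)
            self._send(other, home, "ACK")
        entry.state = "M"
        entry.sharers = {cpu}
        self.caches[(cpu, addr)] = "M"


X = 0x1000                  # hypothetical address standing in for block X
p = DirectoryProtocol()
p.read("A", X)              # directory entry: S: A
p.read("B", X)              # directory entry: S: A, B
p.write("A", X)             # Inv to B, ACK from B, directory entry: M: A
print(p.entries[X], p.log)
```

Note that the write sends an invalidation only to B, the one other recorded sharer; C never sees a message. That is the contrast with a broadcast-based snooping protocol.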
Slides Contents
● Rajeev Balasubramonian, CS6810, University of Utah.
Extra
Shared Memory vs. Message Passing
● Shared Memory Machine: processors share the same physical address space
– Implicit communication, hardware-controlled cache coherence
● Message Passing Machine
– Explicit communication (programmed)
– No cache coherence (simpler hardware)
– Message passing libraries: MPI
[Figure: left, a shared memory machine with four processors (P), each with a cache (C), attached to one main memory. Right, a message passing machine with four processors (P), each with its own memory (M), connected by an interconnect.]
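The difference between implicit and explicit communication can be illustrated with a software analogy. The sketch below uses Python threads on one machine, not separate hardware nodes and not MPI; the function names `run_shared_memory` and `run_message_passing` are made up for the example. In the first function the threads communicate simply by updating the same variable; in the second, data moves only through explicit put and get calls on a channel.

```python
import queue
import threading


def run_shared_memory(workers=4, increments=1000):
    """Implicit communication: every thread reads and writes the same variable."""
    total = 0
    lock = threading.Lock()

    def work():
        nonlocal total
        for _ in range(increments):
            with lock:              # the read-modify-write still needs synchronization
                total += 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def run_message_passing(items=100):
    """Explicit communication: the producer sends values, the consumer receives them."""
    channel = queue.Queue()
    result = []

    def producer():
        for i in range(items):
            channel.put(i)          # explicit send
        channel.put(None)           # sentinel: no more messages

    def consumer():
        total = 0
        while (item := channel.get()) is not None:   # explicit receive
            total += item
        result.append(total)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


print(run_shared_memory(), run_message_passing())
```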
Cache Coherence
● Consistency
– When should a written value be available to read
– Memory Consistency Models
● Coherence
– Which value to return on a read
● A memory system is coherent if:
– Write Propagation
  ● A write is visible after a sufficient time lapse
– Write Serialization
  ● All writes to a location are seen by every processor in the same order
Multiprocessor Cache Coherence
● A read by a processor P to a location X that follows a write by P to X, with no writes of X by another processor occurring between the write and the read by P, always returns the value written by P.
● A read by a processor to location X that follows a write by another processor to X returns the written value if the read and write are sufficiently separated in time and no other writes to X occur between the two accesses.
● Writes to the same location are serialized; that is, two writes to the same location by any two processors are seen in the same order by all processors.
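The write serialization condition can be checked mechanically on a recorded trace. The sketch below is a minimal illustration under two assumptions that are not in the slides: every write to X stores a distinct value, and the order in which the writes were serialized is known. A processor may miss intermediate values, but it must never observe an older value after a newer one. The function names are hypothetical.

```python
from itertools import groupby


def collapse_repeats(values):
    """Drop consecutive duplicates: re-reading the same value is always allowed."""
    return [v for v, _ in groupby(values)]


def respects_order(observed, write_order):
    """True if the observed values appear in the same relative order as the writes."""
    remaining = iter(write_order)
    # "v in remaining" advances the iterator, so each match must come after the previous one
    return all(v in remaining for v in collapse_repeats(observed))


def writes_serialized(write_order, observations):
    """observations maps a processor name to the values it read from X, in program order."""
    return all(respects_order(obs, write_order) for obs in observations.values())


order = [1, 2, 3]                                    # writes to X in serialized order
ok = {"P1": [1, 2, 2, 3], "P2": [1, 3]}              # P2 misses value 2, still consistent
bad = {"P1": [1, 2], "P2": [2, 1]}                   # P2 sees the writes in the opposite order
print(writes_serialized(order, ok), writes_serialized(order, bad))
```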
Write Invalidate Coherence Protocol
Writeback / Writethrough
Enforcing write serialization
• Bus Arbitration
Tag Contention, Duplication
SMP Cache Coherence
● MSI Protocol
● MESI Protocol
– Exclusive state: a write to a block held Exclusive needs no invalidate messages.
– Intel i7 uses MESIF
● MOESI Protocol
– Owned state: the owner holds the up-to-date (dirty) copy, which other caches may share. Main memory copy is stale.
– Owner supplies data on a miss.
SMP Example
[Figure: four processors (A, B, C, D), each with its own caches, connected to main memory and the I/O system.]
Access sequence: A: Rd X, B: Rd X, C: Rd X, A: Wr X, A: Wr X, C: Wr X, B: Rd X, A: Rd X, A: Rd Y, B: Wr X, B: Rd Y, B: Wr X, B: Wr Y
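One way to work through the access sequence is to step a simple MSI write-invalidate model over it. The sketch below assumes the plain MSI protocol (no Exclusive or Owned state) and caches that never evict a block. The bus transaction names BusRd, BusRdX and BusUpgr follow common textbook usage and do not appear in the slides; `msi_step` and `run_trace` are hypothetical names.

```python
CPUS = ["A", "B", "C", "D"]
TRACE = ("A: Rd X, B: Rd X, C: Rd X, A: Wr X, A: Wr X, C: Wr X, B: Rd X, "
         "A: Rd X, A: Rd Y, B: Wr X, B: Rd Y, B: Wr X, B: Wr Y")


def msi_step(states, cpu, op, block):
    """Apply one access. states maps (cpu, block) -> "M" or "S"; missing means Invalid."""
    current = states.get((cpu, block), "I")
    others = [c for c in CPUS if c != cpu]
    if op == "Rd":
        if current != "I":
            return "hit"
        for c in others:                        # a Modified copy elsewhere is downgraded
            if states.get((c, block)) == "M":
                states[(c, block)] = "S"
        states[(cpu, block)] = "S"
        return "BusRd"
    if current == "M":                          # write to a block already owned
        return "hit"
    for c in others:                            # write-invalidate: drop all other copies
        states.pop((c, block), None)
    states[(cpu, block)] = "M"
    return "BusUpgr" if current == "S" else "BusRdX"


def run_trace(trace):
    states, actions = {}, []
    for item in trace.split(", "):
        cpu, access = item.split(": ")
        op, block = access.split()
        actions.append(msi_step(states, cpu, op, block))
    return states, actions


final_states, bus_actions = run_trace(TRACE)
for item, action in zip(TRACE.split(", "), bus_actions):
    print(f"{item:10s} {action}")
print(final_states)
```

Under these assumptions, the second A: Wr X and the second B: Wr X are hits that need no bus transaction, and at the end of the sequence only B holds valid copies, with both X and Y in the Modified state.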