cache coherence protocols are hardeasy · nicolai oswald, vijay nagarajan, daniel j. sorin s i m i...
TRANSCRIPT
![Page 1: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/1.jpg)
Cache Coherence Protocols Are Hard Easy
Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin
S
I
M
I
IMAD
M
IMA
IMADI
IMADS
IMAI
IMAS
IMADSI
IMASI
S
ProtoGen
![Page 2: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/2.jpg)
Directory Cache Coherence
Interconnect
Directory Memory
Core
Cache X = 0
Core
Cache X = 0
Interconnect
Directory Memory
Core
Cache X = 1
Core
Cache X = 0
Interconnect
Directory Memory
Core
Cache X = 1
Core
Cache
![Page 3: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/3.jpg)
“Cache coherence protocols are notoriously
difficult to implement” [DSL 1997]“Sophisticated cache coherence protocols are notoriously
difficult to get right” [ICS 1999]
“Cache coherence protocols for distributed shared
memory multiprocessors are notoriously difficult
to design” [ICFS 1996]
“Cache coherence protocols are notoriously
difficult to design and verify” [High Perf.
Memory Systems, 2004]
“… directory-based cache coherence protocols
are notoriously complex” [PACT 2011]
“The coherence problem is difficult, because it
requires coordinating events across nodes”
[IEEE Concurrency 2000]
“Coherence protocols are notoriously difficult to
design and implement correctly” [ASPLOS 2017]
“… designing and verifying a new hardware coherence
protocol is difficult”
[Spandex: A Flexible Interface for Efficient Heterogeneous
Coherence - ISCA 2018]
![Page 4: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/4.jpg)
From AnandTech: “… coherency was broken and manually disabled on the Galaxy S 4. The implications are serious from a power consumption (and performance) standpoint.”
Bugs in the Wild
![Page 5: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/5.jpg)
S
I
M
I
IMAD
M
store / send GetM to Dir
IMA
recv Data + Ackrecv Data
recv Acks
IMADI recv Fwd-GetM
IMADS recv Fwd-GetS
IMAI
IMAS
recv Fwd-GetM
recv Fwd-GetS
recv Data
recv Data
IMADSI
IMASI
recv Acks
recv Inv
recv Inv
recv Data
recv Acks recv Data + Ack
recv Data + Ack
Srecv Data + Ack
recv Acks
physical atomicity logical atomicity
![Page 6: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/6.jpg)
S Mstore / send GetM / recv Ack
Atomic S to M Transition
![Page 7: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/7.jpg)
S SMAD Mstore / send GetM recv Ack
Transient States
non-atomic transaction
![Page 8: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/8.jpg)
SMAD
recv Inv recv Fwd-GetS
S Mstore / send GetM recv Ack
Concurrent Transactions
![Page 9: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/9.jpg)
SMAD
recv Inv recv Fwd-GetS
S Mstore / send GetM recv Ack
recv Inv recv Fwd-GetS
Concurrent Transactions
non-atomic transactions + concurrency = complexity
![Page 10: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/10.jpg)
To Summarize…
§ Stable state protocols assume physically atomic transactions
§ Need to support concurrency for performance
§ Transient states required to provide logically atomic transactions
![Page 11: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/11.jpg)
Thus…
§ Stable state protocol is a sequential specification
§ The final protocol is a non-blocking concurrent implementation
§ Transient states are synchronization operations
![Page 12: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/12.jpg)
Insight
S
I
M
I
IMAD
M
IMA
IMADI
IMADS
IMAI
IMAS
IMADSI
IMASI
S
Sequential object {………}
Non-blocking concurrent {………}
No wonder cache coherence protocols are Hard!
![Page 13: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/13.jpg)
Insightconcurrent Method-1 {…RMW(…); //linearization point…}
timeMethod-1()
Method-2()
Method-1()Method-2()
Interconnect
Directory Memory
Core
Cache X = 0
Core
Cache X = 0
Directory is the Linearization point!
![Page 14: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/14.jpg)
Demystifying Transient States
How do transient states provide logical atomicity?
§ Convey directory serialization order to caches
§ Transient states ensure that caches obey this order
ProtoGen automates by leveraging this insight!
![Page 15: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/15.jpg)
How does cache infer serialization order?
recv Inv recv Fwd-GetS
S Mstore / send GetM recv Ack
recv Inv recv Fwd-GetS
S MAD
![Page 16: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/16.jpg)
recv Fwd-GetM
recv Fwd-GetM
O Mstore / send GetM recv Data + Ack
recv Fwd-GetM
recv Fwd-GetM
O MAC
How to resolve name conflicts?
![Page 17: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/17.jpg)
recv Fwd-GetM-O
recv Fwd-GetM-M
O Mstore / send GetM recv Data + Ack
recv Fwd-GetM-O
recv Fwd-GetM-M
O MAC
Rename Messages
![Page 18: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/18.jpg)
ProtoGen Summary
§ Infer serialization order from incoming messages
§ Rename messages in order to achieve this
§ React like in stable state
![Page 19: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/19.jpg)
ProtoGen Tool
S
I
M
I
IMAD
M
IMA
IMADI
IMADS
IMAI
IMAS
IMADSI
IMASI
S
ProtoGen Murϕ(DSL)
ProtoGen IR for protocolsProtoGen DSL
![Page 20: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/20.jpg)
ProtoGen Verification
Verified Protocols
MSI ✓MESI ✓MOSI ✓MOESI ✓TSO-CC ✓
Verified Protocols
MSI ✓MESI ✓MOSI ✓MOESI ✓
![Page 21: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/21.jpg)
How good are ProtoGen protocols?
§ Protocol specifications from Primer
§ Stalling protocols: Almost identical
§ Non-stalling MSI protocol: 5 fewer stalls
ProtoGen as good (or better) than manually generated protocols
![Page 22: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/22.jpg)
ProtoGen is work in progress…
• Only directory protocols (we believe snooping is possible)
• Needs a correct SSP (working on autocorrecting SSP protocols)
• Only flat protocols (working on hierarchical)
• Needs virtual channel assignment (working on automating it)
![Page 23: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/23.jpg)
ProtoGen makes coherence protocols easy!
§ https://github.com/icsa-caps/ ProtoGen
N. Oswald, V. Nagarajan and D.J. SorinProtoGen: Automatically Generating Directory Cache Coherence Protocols from Atomic Specifications ISCA 2018.
![Page 24: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/24.jpg)
![Page 25: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/25.jpg)
Own Transaction after Remote Transaction
recv Inv-Ack
I MAD
M
I
S
store / send Upgrade
store /send GetM
recv Data
recv Data recv Last(Inv-Ack)
recv Last(Inv-Ack)
recv Inv / send Inv-Ack
recv Ack
recv DataNoAck
recv Inv-Ack
IMA
recv Inv-Ack
SMA
S MAD
recv Inv / send Inv-Ack
recv Inv-Ack
![Page 26: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/26.jpg)
Own Transaction before Remote Transaction
M
S
store / send Upgrade
recv Data recv Last(Inv-Ack)
recv Ack
recv Fwd-GetM / send DataNoAck
S
recv Inv-Ack
S MAD
SMADIrecv Ack /
send DataNoAck
recv Fwd-GetM
recv Inv-Ack
SMA
recv Data
recv Last(Inv-Ack) / send DataNoAck
SMAI
recv Inv-Ack
SMAII
![Page 27: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/27.jpg)
S
recv Inv
Mstore / send GetM recv Ack
recv Fwd-GetM
recv Fwd-GetS
S MAD
recv Inv
recv Fwd-GetM
recv Fwd-GetS
SMA
recv Fwd-GetM
recv Fwd-GetS
recv Data recv Last(Inv-Ack)
recv Inv-Ack
![Page 28: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/28.jpg)
Bluespec
§ Idea: Guarded atomic actions§ Atomic updates of multiple participants
§ Dave et al. implemented non-blocking coherence protocol§ Input was not SSP protocol, but complete non-blocking MSI protocol
description
![Page 29: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/29.jpg)
Teapot
§ Language Support for Writing Memory Coherence Protocols§ Similar to ProtoGen DSL§ Does not automatically generate transient states nor transitions
§ Input is not SSP protocol, but complete non-blocking MSI protocol description
![Page 30: Cache Coherence Protocols Are HardEasy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin S I M I IMAD M IMA IMADI IMADS IMAI IMAS IMADSI IMASI S ProtoGen. Directory Cache Coherence](https://reader034.vdocuments.net/reader034/viewer/2022051805/5ff351f80d04ed26f568b4c9/html5/thumbnails/30.jpg)
Atomic Coherence
§ Atomic Coherence: Leveraging Nanophotonics Similar to Build Race-Free Cache Coherence Protocols
§ Atomic transactions § Mutex based approach
§ Performance achieved by leveraging optical interconnects