DDM – A Cache Only Memory Architecture
Hagersten, Landin, and Haridi (1991)
Presented by Patrick Eibl
Outline
• Basics of Cache-Only Memory Architectures
• The Data Diffusion Machine (DDM)
• DDM Coherence Protocol
• Examples of Replacement, Reading, Writing
• Memory Overhead
• Simulated Performance
• Strengths and Weaknesses
• Alternatives to DDM Architecture
The Big Idea: UMA → NUMA → COMA
• UMA: Centralized shared memory feeds data through the network to individual caches
  – Uniform access time to all memory
• NUMA: Shared memory is distributed among processors (DASH)
  – Data can move from its home memory to other caches as needed
• COMA: No notion of “home” for data; it moves to wherever it is needed
  – Individual memories behave like caches
COMA: The Basics
• Individual memories are called Attraction Memories (AM) – each processor “attracts” its working data set
• AM also contains data that has never been accessed (a possible strength or weakness)
• Uses shared memory programming model, but with no pressure to optimize static partitioning
• Limited duplication of shared memory
• The Data Diffusion Machine is the specific COMA presented here
Data Diffusion Machine
• Directory hierarchy allows scaling to an arbitrary number of processors
• Branch factors and bottlenecks are a consideration
  – Hierarchy can be split into different address domains to improve bandwidth
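The cost of the hierarchy shows up as how far a request must climb. The sketch below assumes a hypothetical uniform tree with `branch` children per directory and a simple numbering in which processor `i` sits under directory `i // branch` at level 1. It counts the levels a request climbs before reaching a directory whose subtree also contains the processor holding the data.

```python
def processors(branch: int, levels: int) -> int:
    """Leaf count of a uniform hierarchy with `branch` children per directory."""
    return branch ** levels


def levels_to_climb(requester: int, holder: int, branch: int) -> int:
    """Directory levels a request climbs from `requester` before it reaches a
    directory whose subtree also contains `holder`.

    Hypothetical numbering: processor i sits under directory i // branch at
    level 1, i // branch**2 at level 2, and so on.
    """
    level = 0
    while requester != holder:
        requester //= branch
        holder //= branch
        level += 1
    return level
```

For example, `levels_to_climb(0, 255, 16)` is 2: the request must reach the top of a 256-leaf, two-level tree. All such long-distance traffic funnels through the top directory, which is why splitting the hierarchy into several address domains (several tops) helps bandwidth.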
Coherence Protocol
• Transient states support a split-transaction bus
• Fairly standard protocol, with the important exception of replacement, which must be managed carefully (example to come)
• Sequential consistency is guaranteed, at the cost that writes must wait for an acknowledgment before continuing
Item States: I: Invalid, E: Exclusive, S: Shared, R: Reading, W: Waiting, RW: Reading and Waiting
Bus Transactions: e: erase, x: exclusive, r: read, d: data, i: inject, o: out
Replacement Example
[Figure: processors (P), caches, and directories, annotated with item states (I: Invalid, S: Shared) and transactions (o: out, i: inject)]
1. A block needs to be brought into a full AM, which forces a replacement and an out transaction
2. Out propagates up until it finds another copy of the block in S, R, W, or RW
3. Here no other copy exists, so out reaches the top and is converted to inject, meaning this is the last copy of the data and it needs a new home
4. Inject finds space in a new AM
5. Data is transferred to its new home
6. States change accordingly
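The replacement steps above can be sketched as follows, under several assumptions. AMs are modeled as dicts from address to state letter, and the tree is uniform with `branch` children per directory. Scanning the AMs under each directory stands in for the state bits a real directory would consult. The policy of injecting into the first AM with a free slot is a hypothetical choice made for illustration, not the paper's.

```python
COPY_STATES = {"S", "R", "W", "RW"}  # states that count as "another copy"


def replace_shared(ams: list[dict[int, str]], node: int, addr: int,
                   branch: int, capacity: int) -> tuple[str, int]:
    """Evict shared item `addr` from the full AM at `node`.

    Returns ("out absorbed", n) if another copy exists at node n, or
    ("injected", n) if this was the last copy and AM n became its new home.
    The out climbs one directory level per loop iteration; scanning the AMs
    stands in for the state bits a real directory would consult.
    """
    del ams[node][addr]                      # step 1: make room, send out
    size = 1
    while size < len(ams):                   # step 2: climb the hierarchy
        size *= branch
        base = node - node % size            # first processor under this directory
        for other in range(base, min(base + size, len(ams))):
            if ams[other].get(addr) in COPY_STATES:
                return ("out absorbed", other)
    # Step 3: out reached the top, so it becomes an inject (last copy).
    for other, am in enumerate(ams):         # step 4: find an AM with space
        if other != node and len(am) < capacity:
            am[addr] = "S"                   # steps 5-6: data moves, state set
            return ("injected", other)
    raise RuntimeError("no AM has room for the injected item")
```

The final `RuntimeError` marks the case the protocol must never allow: dropping the last copy of an item, since a COMA has no backing home memory to fall back on.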
Multilevel Read Example
[Figure: processors (P), caches, and directories, annotated with item states (I: Invalid, R: Reading, A: Answering, S: Shared) and transactions (r: read, d: data)]
1. First cache issues read request
2. Read propagates up hierarchy
3. Read reaches directory with block in shared state
4. Directories change to answering state while waiting for data
5. Data moves back along same path, changing states to shared as it goes
Concurrent second request:
2. Second cache issues request for same block
3. Request for the same block is encountered; the directory simply waits for the data reply to the other request
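The read steps above, including the combining of the second request, can be sketched as follows. This assumes a per-item map from `(level, index)` to a directory state letter in a uniform tree, and uses hypothetical helpers `issue_read` and `deliver_data`. Only the requester's side is modeled: the upward climb and the return of the data. The downward leg from the answering directory to the copy holder is omitted.

```python
def issue_read(dirs: dict[tuple[int, int], str], proc: int,
               branch: int, top: int) -> tuple[str, int]:
    """Send a read for one item up from `proc`; return (outcome, level).

    dirs maps (level, index) -> directory state for this item, default "I".
    Directory (level, proc // branch**level) lies on proc's path upward.
    """
    for level in range(1, top + 1):
        key = (level, proc // branch**level)
        state = dirs.get(key, "I")
        if state == "I":
            dirs[key] = "R"                # step 2: read passes through, marks Reading
        elif state == "R":
            return ("combined", level)     # earlier read pending: wait for its data
        elif state == "S":
            dirs[key] = "A"                # steps 3-4: subtree has a copy; will answer
            return ("answering", level)
        else:
            raise ValueError(f"state {state!r} not modeled in this sketch")
    return ("no copy", top)


def deliver_data(dirs: dict[tuple[int, int], str], proc: int,
                 branch: int, level: int) -> None:
    """Step 5: data returns down proc's path; those directories become Shared."""
    for lv in range(1, level + 1):
        dirs[(lv, proc // branch**lv)] = "S"
```

For instance, with eight processors (`branch=2`, `top=3`) and the only copy at processor 7, a read from processor 0 climbs to the top directory, which answers. A later read from processor 2 stops at the level-2 directory already in R and is served by the same data reply.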
Multilevel Write Example
[Figure: processors (P), caches, and directories, annotated with item states (I: Invalid, R: Reading, W: Waiting, E: Exclusive, S: Shared) and transactions (e: erase, x: exclusive)]
1. Cache issues write request
2. Erase propagates up hierarchy and back down, invalidating all other copies
4. Top of hierarchy responds with acknowledge
5. ACK propagates back down, changing states from Waiting to Exclusive
Competing write:
2. Second cache issues write to same block
3. Second exclusive request encounters the other write to the same block; the first one wins because it arrived first, and the other erase is propagated back down
4. State of second cache changes to RW, and it will issue a read request before another erase (not shown)
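The race resolution above can be sketched as a single arbitration step. This assumes the directory where the competing erases meet sees them in a well-defined arrival order, and uses a hypothetical `resolve_erases` helper. The first writer receives the acknowledgment. Every other writer is pushed to RW, and plain sharers are invalidated.

```python
def resolve_erases(arrivals: list[int], holders: set[int]) -> dict[int, str]:
    """Arbitrate competing erases for one item at the directory where they meet.

    arrivals: writers whose erase reached that directory, in arrival order.
    holders:  every processor currently holding a copy of the item.
    Returns the resulting AM state of each holder.
    """
    if not arrivals:
        raise ValueError("no erase to arbitrate")
    winner = arrivals[0]                   # first to arrive wins
    result: dict[int, str] = {}
    for p in sorted(holders):
        if p == winner:
            result[p] = "E"                # gets x (acknowledge): W -> E
        elif p in arrivals:
            result[p] = "RW"               # lost the race: must re-read, then erase again
        else:
            result[p] = "I"                # ordinary sharer: invalidated by the erase
    return result
```

Ordering all writes to an item through one arbitration point is what lets the protocol provide sequential consistency, at the price of each writer waiting for its acknowledgment.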
Memory Overhead
• Inclusion is necessary for directories, but not for data
• Directories only need state bits and address tags
• For the two sample configurations given, overheads were 6% for a one-level 32-processor machine and 16% for a two-level 256-processor machine
• Larger item size reduces overhead
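A back-of-the-envelope formula shows why larger items reduce overhead. The sketch below assumes, hypothetically, that every directory level holds one entry (address tag plus state bits) for each item stored in the AMs beneath it, as inclusion requires, and no data. It counts only directory entries. The parameter values in the example are illustrative, not the paper's, and the sketch is not meant to reproduce the 6% and 16% figures.

```python
def directory_overhead(levels: int, item_bytes: int,
                       tag_bits: int, state_bits: int) -> float:
    """Rough directory storage as a fraction of AM data storage.

    Assumes each directory level holds one entry (tag + state) per item
    stored in the AMs below it (inclusion) and no data at all.
    """
    entry_bits = tag_bits + state_bits
    return levels * entry_bits / (item_bytes * 8)
```

With a hypothetical 8-bit entry, `directory_overhead(1, 16, 6, 2)` is 0.0625. Doubling the item size to 32 bytes halves it to 0.03125, while adding a second directory level doubles it.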
Simulated Performance
• Little benefit for programs in which each processor operates on the entire shared data set
• MP3D was rewritten to improve performance by exploiting the fact that data has no home
• OS, hardware, and emulator were still in development at the time
• A different DDM topology was used for each program (a weakness)
Strengths
• Each processor attracts the data it’s using into its own memory space
• Data doesn’t need to be duplicated at a home node
• Ordinary shared memory programming model
• No need to optimize static partitioning (there is none)
• Directory hierarchy scales reasonably well
• Good when data is moved around in smaller chunks
Weaknesses
• Attraction memories hold data that isn’t being used, making them bigger and slower
• Different DDM hierarchy topology was used for each program in simulations
• Does not fully exploit large spatial locality; an OS-managed, page-granularity design (S-COMA) wins in that case
• Branching hierarchy is prone to bottlenecks and hotspots
• No way to locate data except by an expensive tree traversal (NUMA wins here, since every block has a fixed home)
Alternatives to COMA/DDM
• Flat-COMA
  – Blocks are free to migrate, but have home nodes with directories corresponding to physical address
• Simple-COMA
  – Allocation managed by OS and done at page granularity
• Reactive-NUMA
  – Switches between S-COMA and NUMA with remote cache on a per-page basis
Good summary of COMAs: http://ieeexplore.ieee.org/iel5/2/16679/00769448.pdf?tp=&isnumber=&arnumber=769448
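Flat-COMA's use of home nodes can be sketched by contrast with DDM's tree search. The class below is hypothetical. It assumes blocks are interleaved across home nodes by block number, and a single dict stands in for the per-home directories. Data copies still migrate freely. Only the bookkeeping of where they currently live is pinned to a home, so locating a block takes one lookup instead of a walk up and down a hierarchy.

```python
class FlatComaDirectory:
    """Hypothetical Flat-COMA bookkeeping: each block has a fixed home node
    (chosen from its physical address) whose directory records where the
    migrating copies currently live."""

    def __init__(self, nodes: int, block_bytes: int) -> None:
        self.nodes = nodes
        self.block_bytes = block_bytes
        # One dict stands in for all per-home directories: block number -> holders.
        self.holders: dict[int, set[int]] = {}

    def home(self, addr: int) -> int:
        # Assumption: blocks are interleaved across nodes by block number.
        return (addr // self.block_bytes) % self.nodes

    def record_copy(self, addr: int, node: int) -> None:
        self.holders.setdefault(addr // self.block_bytes, set()).add(node)

    def locate(self, addr: int) -> set[int]:
        # One lookup at the home replaces a walk up and down a directory tree.
        return set(self.holders.get(addr // self.block_bytes, set()))
```

For example, with four nodes and 64-byte blocks, address 70 belongs to block 1, whose home is node 1. A copy recorded at node 3 is found by querying that home directly.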