Distributed Shared Memory (part 1)
Distributed Shared Memory (DSM)
[Figure: processors proc0 to procN, each with local memory mem0 to memN, connected by a network and presented to programs as one shared memory]
Shared memory programming
• Standard: pthreads
• Synchronization:
– Barriers
– Locks
– Semaphores
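A minimal pthreads sketch of two of these primitives, assuming a POSIX system with unnamed semaphores (e.g., Linux): a mutex serializes increments of a shared counter, and a semaphore lets the main thread wait until every worker has finished. The names `worker`, `count_in_parallel`, and `MAX_THREADS` are hypothetical; barriers are provided by the same API as `pthread_barrier_t`.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>

#define MAX_THREADS 64               /* hypothetical upper bound for the sketch */

static long counter;                 /* shared data protected by counter_lock */
static int incs_per_thread;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t done;                   /* posted once by each finished worker */

static void *worker(void *arg) {
    (void)arg;
    for (int k = 0; k < incs_per_thread; k++) {
        pthread_mutex_lock(&counter_lock);    /* lock: one increment at a time */
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    sem_post(&done);                          /* semaphore V(): "I am finished" */
    return NULL;
}

/* Runs nthreads workers, each adding incs to the counter; returns the
   final count, or -1 if the arguments are bad or a thread could not start. */
static long count_in_parallel(int nthreads, int incs) {
    pthread_t tid[MAX_THREADS];
    int started = 0;
    if (nthreads < 1 || nthreads > MAX_THREADS) return -1;
    counter = 0;
    incs_per_thread = incs;
    if (sem_init(&done, 0, 0) != 0) return -1;
    while (started < nthreads &&
           pthread_create(&tid[started], NULL, worker, NULL) == 0)
        started++;
    for (int t = 0; t < started; t++)
        sem_wait(&done);                      /* semaphore P(): wait for one worker */
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    sem_destroy(&done);
    return started == nthreads ? counter : -1;
}
```

Without the mutex, concurrent `counter++` operations could be lost and the total would come out short.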
Sequential SOR
for some number of timesteps/iterations {
  for (i=1; i<n; i++)
    for (j=1; j<n; j++)
      temp[i][j] = 0.25 *
        (grid[i-1][j] + grid[i+1][j] +
         grid[i][j-1] + grid[i][j+1]);
  for (i=1; i<n; i++)
    for (j=1; j<n; j++)
      grid[i][j] = temp[i][j];
}
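The loop above can be made runnable as a C function. This sketch assumes, consistent with the row bounds used by the parallel version, that `grid` is (n+1) x (n+1) with fixed boundary rows and columns 0 and n, so only interior points 1..n-1 are updated; the name `sor_sequential` is hypothetical. Because every new value is computed from the old grid into `temp` before being copied back, this is strictly a Jacobi-style sweep; the slides' name SOR is kept.

```c
#include <stdlib.h>

/* grid is (n+1) x (n+1); rows/columns 0 and n are fixed boundary values.
   Returns 0 on success, -1 if the scratch array cannot be allocated. */
static int sor_sequential(int n, double grid[n + 1][n + 1], int iterations) {
    double (*temp)[n + 1] = malloc(sizeof(double[n + 1][n + 1]));
    if (temp == NULL) return -1;
    for (int t = 0; t < iterations; t++) {
        /* compute every new interior value from the old grid ... */
        for (int i = 1; i < n; i++)
            for (int j = 1; j < n; j++)
                temp[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                     grid[i][j - 1] + grid[i][j + 1]);
        /* ... then copy back, so no update reads a partially new grid */
        for (int i = 1; i < n; i++)
            for (int j = 1; j < n; j++)
                grid[i][j] = temp[i][j];
    }
    free(temp);
    return 0;
}
```

For example, with n = 2 there is a single interior point, and one sweep sets it to the average of its four boundary neighbours.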
Parallel SOR with Barriers (1 of 2)
void* sor(void* arg) {
  int slice = (int)(intptr_t)arg;
  int from = (slice * (n-1))/p + 1;
  int to = ((slice+1) * (n-1))/p + 1;
  for some number of iterations { … }
}
Parallel SOR with Barriers (2 of 2)
for (i=from; i<to; i++)
  for (j=1; j<n; j++)
    temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] + grid[i][j-1] + grid[i][j+1]);
barrier();
for (i=from; i<to; i++)
  for (j=1; j<n; j++)
    grid[i][j] = temp[i][j];
barrier();
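A runnable sketch of these two slides, using POSIX `pthread_barrier_wait` for `barrier()`. The globals `grid`, `temp`, `n`, and `p` mirror the slides; the size limits `MAXN`/`MAXP` and the driver `parallel_sor` are hypothetical additions. The first barrier keeps any thread from overwriting `grid` while another is still reading its neighbours' rows; the second keeps the next sweep from reading rows that have not been copied back yet.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define MAXN 64   /* hypothetical grid-size limit */
#define MAXP 16   /* hypothetical thread limit */

/* As on the slides: grid/temp are (n+1) x (n+1), p threads share the rows. */
static double grid[MAXN + 1][MAXN + 1], temp[MAXN + 1][MAXN + 1];
static int n, p, iterations;
static pthread_barrier_t bar;

static void *sor(void *arg) {
    int slice = (int)(intptr_t)arg;
    int from = (slice * (n - 1)) / p + 1;      /* first row of this slice */
    int to = ((slice + 1) * (n - 1)) / p + 1;  /* one past its last row */
    for (int it = 0; it < iterations; it++) {
        for (int i = from; i < to; i++)
            for (int j = 1; j < n; j++)
                temp[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                     grid[i][j - 1] + grid[i][j + 1]);
        pthread_barrier_wait(&bar);  /* all of temp is ready before grid changes */
        for (int i = from; i < to; i++)
            for (int j = 1; j < n; j++)
                grid[i][j] = temp[i][j];
        pthread_barrier_wait(&bar);  /* all of grid is updated before the next sweep */
    }
    return NULL;
}

/* Runs iters sweeps over the global grid with `threads` workers. */
static int parallel_sor(int size, int threads, int iters) {
    pthread_t tid[MAXP];
    if (size < 2 || size > MAXN || threads < 1 || threads > MAXP) return -1;
    n = size; p = threads; iterations = iters;
    if (pthread_barrier_init(&bar, NULL, (unsigned)threads) != 0) return -1;
    for (int s = 0; s < threads; s++)
        if (pthread_create(&tid[s], NULL, sor, (void *)(intptr_t)s) != 0)
            abort();  /* started workers would wait at the barrier forever */
    for (int s = 0; s < threads; s++)
        pthread_join(tid[s], NULL);
    pthread_barrier_destroy(&bar);
    return 0;
}
```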
Differences between SMP and Software DSM
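• Delay: much higher communication latency changes tradeoffs, such as block size.
• Software: misses are detected by traps, which raises the cost of read/write misses.
• Goals of caching: in a multiprocessor, performance; in a distributed system, transparency.
• Bus vs. long networks: an SMP relies on the bus for serialization and broadcast, which a long network does not provide cheaply.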
Consequent differences in protocols and applications
• Bigger block size
– Cost amortization: higher hit ratio for larger blocks?
– Reduced overhead
• But therefore:
– Migration vs. replication
– False sharing increases
• The DSM protocol is more complex: it must handle lost, corrupted, and out-of-order packets.
• Together with the cost of traps, this makes SDSM consistency much more expensive.
Results of high consistency costs
• Manage sharing more carefully.
• Align data to page boundaries.
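One way to follow this advice in C, sketched for a POSIX system: give each independently shared object its own page-aligned, page-padded allocation, so two such objects never share a page (the consistency unit). `page_size` and `alloc_page_aligned` are hypothetical helpers built on `sysconf(_SC_PAGESIZE)` and `posix_memalign`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* The OS page size; falling back to 4096 bytes is an assumption. */
static size_t page_size(void) {
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 ? (size_t)ps : 4096;
}

/* Allocates `bytes` starting on a page boundary and rounded up to whole
   pages, so two such allocations never share a page.
   Release with free(); returns NULL on failure. */
static void *alloc_page_aligned(size_t bytes) {
    size_t ps = page_size();
    size_t rounded = (bytes + ps - 1) / ps * ps;   /* round up to whole pages */
    void *mem = NULL;
    if (rounded == 0) rounded = ps;
    if (posix_memalign(&mem, ps, rounded) != 0) return NULL;
    return mem;
}
```

The cost is memory: even a small object consumes a whole page, which is the price of avoiding false sharing at page granularity.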
Consistency Models
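• Sequential consistency
– All processors observe the same order of memory operations.
– That order must correspond to some serial order (an interleaving of all processors' operations).
– The only ordering constraint is that each processor's reads and writes (e.g., those of P1) appear in program order; there is no restriction on the relative ordering of operations from different processors.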
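To make the definition concrete, this sketch enumerates every interleaving of a small two-processor program that keeps each processor's operations in program order, and records which results are possible. The litmus program (commonly called store buffering) and the function names are illustrative choices, not from the slides. Under sequential consistency, r0 = r1 = 0 is impossible: if P0's read of `y` comes before P1's write of `y`, then P0's earlier write of `x` also precedes P1's read of `x`.

```c
#include <stdbool.h>

/* Litmus program:   P0: x = 1; r0 = y;      P1: y = 1; r1 = x;
   Bit `step` of `order` set means P0 performs its next operation at that step. */
static void run_interleaving(unsigned order, int *r0, int *r1) {
    int x = 0, y = 0, pc0 = 0, pc1 = 0;
    for (int step = 0; step < 4; step++) {
        if (order & (1u << step)) {          /* P0's next op, in program order */
            if (pc0++ == 0) x = 1; else *r0 = y;
        } else {                             /* P1's next op, in program order */
            if (pc1++ == 0) y = 1; else *r1 = x;
        }
    }
}

/* Sets outcomes[r0][r1] for every result some SC interleaving produces. */
static void sc_outcomes(bool outcomes[2][2]) {
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++)
            outcomes[a][b] = false;
    for (unsigned order = 0; order < 16; order++) {
        unsigned p0_steps = (order & 1u) + ((order >> 1) & 1u) +
                            ((order >> 2) & 1u) + ((order >> 3) & 1u);
        if (p0_steps != 2) continue;         /* each processor runs exactly 2 ops */
        int r0 = 0, r1 = 0;
        run_interleaving(order, &r0, &r1);
        outcomes[r0][r1] = true;
    }
}
```

Weaker memory models, in which a write can wait in a store buffer while later reads proceed, can produce r0 = r1 = 0.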
Common consistency protocols
• Write update
– Multicast the update to all replicas.
• Write invalidate
– Invalidate the cached copies in p2 and p3.
– A later access to X by p2 or p3 is a cache miss.
– The valid data comes from the other cache.
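A toy model of the two protocols for one item X replicated at p1, p2, and p3, counting protocol messages. The processor count, the `Replicas` type, and the message accounting (one message per update or invalidation, two for a miss's request and reply) are assumptions for illustration.

```c
#include <stdbool.h>

#define NPROC 4  /* hypothetical: processors p0..p3 */

typedef struct {
    bool valid[NPROC];  /* processor i holds a valid copy of X */
    int value[NPROC];   /* processor i's copy of X */
    int messages;       /* protocol messages sent so far */
} Replicas;

/* X starts out replicated (value v) at p1, p2 and p3. */
static Replicas make_shared(int v) {
    Replicas r;
    for (int i = 0; i < NPROC; i++) { r.valid[i] = (i != 0); r.value[i] = v; }
    r.messages = 0;
    return r;
}

/* Write update: write locally, then multicast the new value to every replica. */
static void write_update(Replicas *r, int writer, int v) {
    r->value[writer] = v;
    r->valid[writer] = true;
    for (int i = 0; i < NPROC; i++)
        if (i != writer && r->valid[i]) { r->value[i] = v; r->messages++; }
}

/* Write invalidate: invalidate every other copy, then write locally. */
static void write_invalidate(Replicas *r, int writer, int v) {
    for (int i = 0; i < NPROC; i++)
        if (i != writer && r->valid[i]) { r->valid[i] = false; r->messages++; }
    r->value[writer] = v;
    r->valid[writer] = true;
}

/* Read: a hit if the local copy is valid; otherwise a miss fetches the valid
   data from another cache (request + reply). Assumes some copy is valid. */
static int read_x(Replicas *r, int reader) {
    if (!r->valid[reader])
        for (int i = 0; i < NPROC; i++)
            if (r->valid[i]) {
                r->value[reader] = r->value[i];
                r->valid[reader] = true;
                r->messages += 2;
                break;
            }
    return r->value[reader];
}
```

Both protocols send two messages on p1's write here; the difference appears afterwards. Update keeps p2's copy valid, while invalidate turns p2's next read into a miss that fetches the value from p1's cache.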
Conventional Implementation
• As proposed by Li & Hudak, TOCS ’86.
• Use virtual memory to implement sharing.
• Divide shared memory into virtual memory pages.
• Use a single-writer, multiple-reader, write-invalidate coherence protocol.
• Keep each page in one of three states:
– invalid, read-only, read-write
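Using virtual memory means the MMU detects the accesses that need protocol action. This Linux-oriented sketch maps one page with no access rights (invalid), catches the resulting fault, and upgrades the protection to read-only on the first fault and read-write on the next, so the three states map onto `PROT_NONE`, `PROT_READ`, and `PROT_READ | PROT_WRITE`. The handler only counts faults where a real DSM would fetch the page or invalidate other copies. It assumes a system where a protection fault delivers SIGSEGV or SIGBUS with `si_addr` set and where calling `mprotect` from the handler works; that is common practice in page-based DSMs, although POSIX does not list `mprotect` as async-signal-safe. `run_demo`, `on_fault`, and the other names are hypothetical.

```c
#define _DEFAULT_SOURCE 1          /* for MAP_ANONYMOUS on glibc */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *page;                 /* the single "shared" page of this sketch */
static size_t page_len;
static volatile sig_atomic_t state;        /* 0 invalid, 1 read-only, 2 read-write */
static volatile sig_atomic_t read_faults, write_faults;

/* This sketch cannot portably tell reads from writes, so it assumes the first
   fault on an invalid page is a read; a write to an invalid page faults twice. */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    uintptr_t a = (uintptr_t)info->si_addr, base = (uintptr_t)page;
    if (a < base || a >= base + page_len)
        _exit(1);                              /* not our page: a genuine crash */
    if (state == 0) {                          /* read fault: fetch a copy here */
        read_faults++;
        state = 1;
        mprotect(page, page_len, PROT_READ);
    } else {                                   /* write fault: invalidate copies here */
        write_faults++;
        state = 2;
        mprotect(page, page_len, PROT_READ | PROT_WRITE);
    }
}                                  /* returning retries the faulting access */

/* Maps an invalid page, then reads and writes it. Returns 0 on success. */
static int run_demo(char *first_byte) {
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0) return -1;
    page_len = (size_t)ps;
    void *m = mmap(NULL, page_len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED) return -1;
    page = m;
    state = 0; read_faults = 0; write_faults = 0;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    volatile char *mem = page;
    *first_byte = mem[0];   /* invalid -> read fault -> read-only */
    mem[0] = 'x';           /* read-only -> write fault -> read-write */
    mem[1] = 'y';           /* read-write: plain hit, no fault */
    return 0;
}
```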
Example
[Figure: processors proc0 to procN above one shared memory]
Example: Read Access Hit
[Figure: processors proc0 to procN; a read access]
Example: Write Access Hit
[Figure: processors proc0 to procN; a write access]
Example: Read Access Miss
[Figure: processors proc0 to procN; a read access]
Example: Read Fault
[Figure: processors proc0 to procN; a read fault]
Example: Replication on Read
[Figure: processors proc0 to procN; a read access]
Example: Write Access Miss
[Figure: processors proc0 to procN; a write access]
Example: Write Fault
[Figure: processors proc0 to procN; a write fault]
Example: Write Invalidation
[Figure: processors proc0 to procN; a write access]
Example: Write Access to Read-Only
[Figure: processors proc0 to procN; a write access]
Example: Write Fault
[Figure: processors proc0 to procN; a write fault]
Example: Write Invalidation
[Figure: processors proc0 to procN; a write access]
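The figures for these examples did not survive transcription, so their exact scenarios are not reproduced here. As a hedged sketch, the transitions their titles name (read and write hits, a read fault with replication on read, a write fault with write invalidation, and a write to a read-only page) can be written as a per-processor state table for one page under single-writer, multiple-reader write-invalidate. `Page`, `dsm_read`, `dsm_write`, and `NPROC` are hypothetical names.

```c
#include <stdbool.h>

#define NPROC 4   /* hypothetical: proc0..proc3 */

typedef enum { INVALID, READ_ONLY, READ_WRITE } PageState;

typedef struct {
    PageState state[NPROC];  /* each processor's access to the page */
    int faults;              /* read or write faults taken so far */
} Page;

/* Read: a hit on READ_ONLY or READ_WRITE. Otherwise a read fault replicates
   the page, and a writable copy elsewhere is downgraded to READ_ONLY. */
static void dsm_read(Page *pg, int p) {
    if (pg->state[p] != INVALID) return;             /* read hit */
    pg->faults++;                                     /* read fault */
    for (int q = 0; q < NPROC; q++)
        if (pg->state[q] == READ_WRITE) pg->state[q] = READ_ONLY;
    pg->state[p] = READ_ONLY;                         /* replication on read */
}

/* Write: a hit only on READ_WRITE. Otherwise (invalid or read-only) a write
   fault invalidates every other copy and makes the local copy writable. */
static void dsm_write(Page *pg, int p) {
    if (pg->state[p] == READ_WRITE) return;           /* write hit */
    pg->faults++;                                     /* write fault */
    for (int q = 0; q < NPROC; q++)
        if (q != p) pg->state[q] = INVALID;           /* write invalidation */
    pg->state[p] = READ_WRITE;
}
```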
How to Remember Locations?
• Broadcast on a miss (as in an SMP).
• Static home.
• Dynamic home or owner.
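Of these options, a static home is the simplest to sketch: the home node is a fixed function of the page number, so locating it needs neither a broadcast nor forwarding. Round-robin assignment by page number is an assumption here, and `home_of` is a hypothetical name.

```c
#include <stdint.h>

/* Static home: page k of the shared region lives at node k mod nprocs. */
static int home_of(uintptr_t addr, uintptr_t page_size, int nprocs) {
    uintptr_t page = addr / page_size;          /* page number of the address */
    return (int)(page % (uintptr_t)nprocs);     /* fixed, computable anywhere */
}
```

The tradeoff is locality: the home never moves, so a page used mostly by one processor may still live elsewhere, which is what dynamic homes or owners address.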
Ownership and Owner Location
• The owner is the last writer.
• The owner maintains the copyset.
• Every processor maintains a probable owner (not always the real owner).
Ownership Location
• Every read or write miss is sent to the probable owner recorded locally.
• If the receiving processor is the owner, it handles the request; otherwise it forwards the request to its own probable owner.
Ownership Modification
• On a write miss, the new writer becomes the owner, and every processor that forwarded the request sets its probable owner to the requester.
• On a read miss, the requester sets its probable owner to the processor that responded.
Example
• Initially, owner(page0) = p0, and probable owner(page0) = p0 on every processor.
• p1 takes a write miss and sends a request to its probable owner, p0, which handles it: the new owner is p1, and probable owner(page0) on p0 becomes p1.
• p2 takes a read miss and sends a request to its probable owner, p0, which forwards it to its own probable owner, p1; p1 handles it, and probable owner(page0) on p2 becomes p1.
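The location and modification rules above can be sketched for a single page as follows; the test reproduces this example. The `OwnerTable` layout, the copyset as a bitmask, and the choice that a new owner points at itself are assumptions not stated on the slides.

```c
#include <stdbool.h>

#define NPROC 4   /* hypothetical: processors p0..p3, one page */

typedef struct {
    int prob_owner[NPROC];  /* each processor's probable owner of the page */
    bool is_owner[NPROC];   /* true only at the real owner */
    unsigned copyset;       /* owner's copyset: bit i set if p_i holds a copy */
} OwnerTable;

static OwnerTable init_table(int owner) {
    OwnerTable t;
    for (int i = 0; i < NPROC; i++) {
        t.prob_owner[i] = owner;
        t.is_owner[i] = (i == owner);
    }
    t.copyset = 1u << owner;
    return t;
}

/* Write miss: forward along probable-owner pointers to the owner. Every
   processor on the way, the old owner included, now points at req, and req
   becomes the owner. Returns the number of forwards, or -1 on a broken chain. */
static int write_miss(OwnerTable *t, int req) {
    int cur = t->prob_owner[req], hops = 0;
    while (!t->is_owner[cur]) {
        int next = t->prob_owner[cur];
        t->prob_owner[cur] = req;       /* forwarder points at the requester */
        cur = next;
        if (++hops > NPROC) return -1;
    }
    t->prob_owner[cur] = req;           /* old owner points at the new owner */
    t->is_owner[cur] = false;
    t->is_owner[req] = true;
    t->prob_owner[req] = req;           /* assumption: the owner points at itself */
    t->copyset = 1u << req;             /* all other copies are invalidated */
    return hops;
}

/* Read miss: forwarded the same way, but forwarders keep their pointers; the
   owner adds req to its copyset, and req records the responder as owner. */
static int read_miss(OwnerTable *t, int req) {
    int cur = t->prob_owner[req], hops = 0;
    while (!t->is_owner[cur]) {
        cur = t->prob_owner[cur];
        if (++hops > NPROC) return -1;
    }
    t->copyset |= 1u << req;
    t->prob_owner[req] = cur;
    return hops;
}
```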
Implementing synchronization
• Use messages to implement synchronization primitives.
Barriers
• Designate one processor as the barrier manager.
• When a process waits at a barrier, it sends an arrival message to the barrier manager and waits.
• When the barrier manager has received all arrival messages, it sends a departure message to all processes.
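A single-process simulation of this centralized barrier, with message sends modeled as counter increments; `BarrierManager`, `barrier_init`, `barrier_arrive`, and `NPROC` are hypothetical. Under this model, each barrier episode costs one arrival message per process plus one departure message per process.

```c
#include <stdbool.h>

#define NPROC 4   /* hypothetical number of processes */

typedef struct {
    int arrived;            /* arrival messages received in this episode */
    bool departed[NPROC];   /* departure message delivered to process i */
    int messages;           /* arrival + departure messages so far */
} BarrierManager;

static BarrierManager barrier_init(void) {
    BarrierManager m;
    m.arrived = 0;
    m.messages = 0;
    for (int i = 0; i < NPROC; i++) m.departed[i] = false;
    return m;
}

/* Process p reaches the barrier: it sends an arrival message and waits.
   When all NPROC arrivals are in, the manager sends a departure message to
   every process and resets. Returns true when that release happens. */
static bool barrier_arrive(BarrierManager *m, int p) {
    m->departed[p] = false;         /* p now waits for its departure message */
    m->arrived++;
    m->messages++;                  /* arrival message to the manager */
    if (m->arrived < NPROC)
        return false;
    for (int i = 0; i < NPROC; i++) {
        m->departed[i] = true;      /* departure message to process i */
        m->messages++;
    }
    m->arrived = 0;                 /* ready for the next barrier */
    return true;
}
```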
Locks
• Designate one process as the lock manager for a particular lock.
• To acquire a lock, a process sends an acquire message to the manager and waits.
• The manager forwards the request to the last acquirer and records the requester as the new last acquirer.
– If the lock is free, the last acquirer sends a lock grant message.
– If the lock is held, it holds on to the request until the lock is free, and then sends the lock grant message.
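A simulation of this scheme, with messages modeled as direct state updates. Because the manager always forwards to the last acquirer, pending requests form a queue in which each waiting process is parked at its predecessor. `LockState`, `lock_init`, `lock_acquire`, `lock_release`, and treating the manager (process 0) as the initial last acquirer are assumptions of this sketch; the lock is not reentrant.

```c
#include <stdbool.h>

#define NPROC 4   /* hypothetical: process 0 is the lock manager */

typedef struct {
    int last_acquirer;       /* kept by the manager */
    bool wants[NPROC];       /* has requested and not yet released */
    bool holding[NPROC];     /* has received the lock grant */
    int next_waiter[NPROC];  /* request parked at process i, or -1 */
} LockState;

static LockState lock_init(void) {
    LockState s;
    s.last_acquirer = 0;     /* assumption: the manager starts as last acquirer */
    for (int i = 0; i < NPROC; i++) {
        s.wants[i] = false;
        s.holding[i] = false;
        s.next_waiter[i] = -1;
    }
    return s;
}

/* p's acquire message reaches the manager, which forwards it to the last
   acquirer and records p as the new last acquirer. */
static void lock_acquire(LockState *s, int p) {
    int last = s->last_acquirer;
    bool busy = s->wants[last];      /* last acquirer holds or awaits the lock */
    s->last_acquirer = p;
    s->wants[p] = true;
    if (busy)
        s->next_waiter[last] = p;    /* held: park the request until release */
    else
        s->holding[p] = true;        /* free: send the lock grant now */
}

/* On release, a parked request (if any) receives the lock grant. */
static void lock_release(LockState *s, int p) {
    int w = s->next_waiter[p];
    s->holding[p] = false;
    s->wants[p] = false;
    if (w != -1) {
        s->next_waiter[p] = -1;
        s->holding[w] = true;
    }
}
```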
Problem: False Sharing
• Concurrent access to different data within the same consistency unit.
• With the page as the consistency unit, there is much opportunity for false sharing.
• Two flavors:
– read-write
– write-write
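To see why page granularity matters, this sketch counts faults under single-writer, multiple-reader write-invalidate for two hypothetical traces in which p0 only touches x and p1 only touches y, one for each flavor. When x and y share a page, every access in these traces faults; with x and y on separate pages, only the two initial faults remain. The traces, `count_faults`, and the page layouts are illustrative assumptions, not the slides' scenarios.

```c
#include <stddef.h>

#define NPROC 2      /* hypothetical: processors p0 and p1 */
#define MAXPAGES 4

typedef enum { INVALID, READ_ONLY, READ_WRITE } PageState;
typedef struct { int proc; char op; int var; } Access;   /* op is 'r' or 'w' */

/* Counts faults for a trace where variable v lives on page page_of[v]. */
static int count_faults(const Access *trace, size_t len,
                        const int *page_of, int npages) {
    PageState st[MAXPAGES][NPROC];
    int faults = 0;
    if (npages < 1 || npages > MAXPAGES) return -1;
    for (int g = 0; g < npages; g++)
        for (int q = 0; q < NPROC; q++) st[g][q] = INVALID;
    for (size_t k = 0; k < len; k++) {
        int p = trace[k].proc, g = page_of[trace[k].var];
        if (trace[k].op == 'r') {
            if (st[g][p] != INVALID) continue;          /* read hit */
            faults++;                                   /* read fault */
            for (int q = 0; q < NPROC; q++)
                if (st[g][q] == READ_WRITE) st[g][q] = READ_ONLY;
            st[g][p] = READ_ONLY;
        } else {
            if (st[g][p] == READ_WRITE) continue;       /* write hit */
            faults++;                                   /* write fault */
            for (int q = 0; q < NPROC; q++) st[g][q] = INVALID;
            st[g][p] = READ_WRITE;
        }
    }
    return faults;
}

/* Hypothetical traces: variable 0 is x, variable 1 is y. */
static const Access read_write_trace[] = {
    {0, 'w', 0}, {1, 'r', 1}, {0, 'w', 0}, {1, 'r', 1}, {0, 'w', 0}, {1, 'r', 1},
};
static const Access write_write_trace[] = {
    {0, 'w', 0}, {1, 'w', 1}, {0, 'w', 0}, {1, 'w', 1}, {0, 'w', 0}, {1, 'w', 1},
};
static const int same_page[] = {0, 0};       /* x and y on page 0 */
static const int separate_pages[] = {0, 1};  /* x on page 0, y on page 1 */
```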
Read-Write False Sharing
[Figure: data items x and y]
Read-Write False Sharing (Cont.)
[Figure: timeline of accesses w(x); r(y), r(y), r(x); synch; w(x), w(x)]
Write-Write False Sharing
[Figure: timeline of accesses w(x); w(y), w(y), r(x); synch; w(x), w(x)]
Summary
• Software shared memory on distributed-memory hardware
– Uses virtual memory.
• Home migration to improve locality
– Important because of high latencies.
• Sequential consistency suffers from false sharing.