CPSC 668 Set 16: Distributed Shared Memory 1
CPSC 668: Distributed Algorithms and Systems
Fall 2006
Prof. Jennifer Welch
Distributed Shared Memory
• A model for inter-process communication
• Provides illusion of shared variables on top of message passing
• Shared memory is often considered a more convenient programming platform than message passing
• Formally, give a simulation of the shared memory model on top of the message passing model
• We'll consider the special case of
– no failures
– only read/write variables to be simulated
Shared Memory Issues
• A process will invoke a shared memory operation at some time
• The simulation algorithm running on the same node will execute some code, possibly involving exchanges of messages
• Eventually the simulation algorithm will inform the process of the result of the shared memory operation.
• So shared memory operations are not instantaneous!
– Operations (invoked by different processes) can overlap
• What should be returned by operations that overlap other operations?
– defined by a memory consistency condition
Sequential Specifications
• Each shared object has a sequential specification: specifies behavior of object in the absence of concurrency.
• Object supports operations
– invocations
– matching responses
• Set of sequences of operations that are legal
Sequential Spec for R/W Registers
• Operations are reads and writes
• Invocations are read_i(X) and write_i(X,v)
• Responses are return_i(X,v) and ack_i(X)
• A sequence of operations is legal iff each read returns the value of the latest preceding write.
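The legality condition above can be checked mechanically. A minimal Python sketch (not from the slides; the encoding is our own): a single-register history is a list of ("write", v) / ("read", v) pairs, and the register is assumed to start at `initial`.

```python
# One-variable legality check, matching the sequential specification above.
# Assumption (ours, not the slides'): ops is a list of ("write", v) or
# ("read", v) pairs; the register starts at `initial`.

def is_legal(ops, initial=0):
    current = initial
    for kind, value in ops:
        if kind == "write":
            current = value            # the write becomes the latest value
        elif value != current:         # a read must return the latest write
            return False
    return True
```

For example, `is_legal([("write", 1), ("read", 1)])` holds, while a read returning a stale value does not.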
Memory Consistency Conditions
• Consistency conditions tie together the sequential specification with what happens in the presence of concurrency.
• We will study two well-known conditions:– linearizability– sequential consistency
• We will only consider read/write registers, in the absence of failures.
Definition of Linearizability
• Suppose σ is a sequence of invocations and responses.
– an invocation is not necessarily immediately followed by its matching response
• σ is linearizable if there exists a permutation π of all the operations in σ (now each invocation is immediately followed by its matching response) s.t.
– π|X is legal (satisfies sequential spec) for all X, and
– if response of operation O1 occurs in σ before invocation of operation O2, then O1 occurs in π before O2 (π respects real-time order of non-concurrent operations in σ).
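The definition suggests a brute-force check: try every permutation of the operations. A hypothetical Python sketch (exponential, for tiny histories only); the tuple encoding (proc, var, kind, value, inv_time, resp_time) is our own, not from the slides:

```python
from itertools import permutations

# Brute-force linearizability check per the definition above.
# Assumption (ours): each operation is a tuple
# (proc, var, kind, value, inv_time, resp_time), kind in {"read", "write"}.

def linearizable(history, initial=0):
    for perm in permutations(history):
        # real-time order: if O1's response precedes O2's invocation,
        # O1 must come before O2 in the permutation
        if any(perm[j][5] < perm[i][4]
               for i in range(len(perm)) for j in range(i + 1, len(perm))):
            continue
        # legality per variable: each read returns the latest preceding write
        vals, legal = {}, True
        for _, var, kind, value, _, _ in perm:
            if kind == "write":
                vals[var] = value
            elif vals.get(var, initial) != value:
                legal = False
                break
        if legal:
            return True
    return False
```

A read returning 0 strictly after a write of 1 completes is rejected, but the same read is accepted if it overlaps the write.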
Linearizability Examples
• Suppose there are two shared variables, X and Y, both initially 0.
[Timeline diagram, time marks 0-4: p0 performs write(X,1) … ack(X) and then read(Y) … return(Y,1); p1 performs write(Y,1) … ack(Y) and then read(X) … return(X,1), with the operations overlapping.]
• Is this sequence linearizable? Yes (a valid permutation is marked by green triangles on the slide).
• What if p1's read returns 0? No (an arrow on the slide marks the real-time order that is violated).
Definition of Sequential Consistency
• Suppose σ is a sequence of invocations and responses.
• σ is sequentially consistent if there exists a permutation π of all the operations in σ s.t.
– π|X is legal (satisfies sequential spec) for all X, and
– if response of operation O1 occurs in σ before invocation of operation O2 at the same process, then O1 occurs in π before O2 (π respects real-time order of operations by the same process in σ).
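The same brute-force idea works here, with the weaker constraint: only operations by the same process must keep their relative order. A hypothetical Python sketch; the tuple encoding (proc, var, kind, value, inv_time, resp_time) is our own:

```python
from itertools import permutations

# Brute-force sequential-consistency check per the definition above.
# Assumption (ours): each operation is a tuple
# (proc, var, kind, value, inv_time, resp_time), kind in {"read", "write"}.

def sequentially_consistent(history, initial=0):
    for perm in permutations(history):
        # per-process order only: if O1 responds before O2 is invoked AT THE
        # SAME PROCESS, O1 must come before O2 in the permutation
        if any(perm[j][0] == perm[i][0] and perm[j][5] < perm[i][4]
               for i in range(len(perm)) for j in range(i + 1, len(perm))):
            continue
        vals, legal = {}, True
        for _, var, kind, value, _, _ in perm:
            if kind == "write":
                vals[var] = value
            elif vals.get(var, initial) != value:
                legal = False
                break
        if legal:
            return True
    return False
```

On the example below (each proc writes one variable then reads the other), a history where one read returns the stale 0 passes this check, while both reads returning 0 fails it.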
Sequential Consistency Examples
• Suppose there are two shared variables, X and Y, both initially 0.
[Timeline diagram: p0 performs write(X,1) … ack(X) and then read(Y) … return(Y,1); p1 performs write(Y,1) … ack(Y) and then read(X) … return(X,0).]
• Is this sequence sequentially consistent? Yes (a valid permutation is marked by green numbers 1-4 on the slide).
• What if p0's read returns 0? No (see the arrows on the slide).
Specification of Linearizable Shared Memory Comm. System
• Inputs are invocations on the shared objects
• Outputs are responses from the shared objects
• A sequence σ is in the allowable set iff
– Correct Interaction: each proc. alternates invocations and matching responses
– Liveness: each invocation has a matching response
– Linearizability: σ is linearizable
Specification of Sequentially Consistent Shared Memory
• Inputs are invocations on the shared objects
• Outputs are responses from the shared objects
• A sequence σ is in the allowable set iff
– Correct Interaction: each proc. alternates invocations and matching responses
– Liveness: each invocation has a matching response
– Sequential Consistency: σ is sequentially consistent
Algorithm to Implement Linearizable Shared Memory
• Uses totally ordered broadcast as the underlying communication system.
• Each proc keeps a replica for each shared variable
• When read request arrives:
– send bcast msg containing request
– when own bcast msg arrives, return value in local replica
• When write request arrives:
– send bcast msg containing request
– upon receipt, each proc updates its replica's value
– when own bcast msg arrives, respond with ack
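The steps above can be sketched in Python. This is a single-threaded simulation, not the real distributed algorithm: a shared list stands in for totally ordered broadcast (every process applies log entries in the same order), and "waiting for our own message" becomes applying the log up to our own entry. All names here are our own.

```python
# Single-threaded sketch of the replica-based linearizability algorithm.
# Assumption (ours): a shared list `log` simulates totally ordered broadcast;
# appending = to-bc-send, applying entries in list order = to-bc-recv.

class Process:
    def __init__(self, pid, log):
        self.pid, self.log, self.applied = pid, log, 0
        self.replica = {}                      # local copy of each variable

    def _deliver_up_to(self, index):
        # apply every broadcast message up to and including position `index`
        while self.applied <= index:
            sender, op, var, val = self.log[self.applied]
            if op == "write":
                self.replica[var] = val        # replicas updated in TO order
            self.applied += 1

    def read(self, var):
        self.log.append((self.pid, "read", var, None))   # to-bc-send
        self._deliver_up_to(len(self.log) - 1)           # wait for own msg
        return self.replica.get(var, 0)                  # then respond

    def write(self, var, val):
        self.log.append((self.pid, "write", var, val))   # to-bc-send
        self._deliver_up_to(len(self.log) - 1)           # own msg = ack
```

Because a read also goes through the broadcast, a read invoked after a write completes is guaranteed to see that write at every process.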
The Simulation
[Layered diagram: the user of read/write shared memory sits on top. On each node i, the simulation algorithm alg_i accepts read/write invocations from the user and issues return/ack responses; below, it uses to-bc-send and to-bc-recv events of the Totally Ordered Broadcast layer. Together the algorithms alg_0 … alg_n-1 implement Shared Memory.]
Correctness of Linearizability Algorithm
• Consider any admissible execution α of the algorithm
– underlying totally ordered broadcast behaves properly
– users interact properly
• Show that σ, the restriction of α to the events of the top interface, satisfies Liveness and Linearizability.
Correctness of Linearizability Algorithm
• Liveness (every invocation has a response): by the Liveness property of the underlying totally ordered broadcast.
• Linearizability: define the permutation π of the operations to be the order in which the corresponding broadcasts are received.
– π is legal: because all the operations are consistently ordered by the TO bcast.
– π respects real-time order of operations: if O1 finishes before O2 begins, O1's bcast is ordered before O2's bcast.
Why is Read Bcast Needed?
• The bcast done for a read causes no changes to any replicas, just delays the response to the read.
• Why is it needed?
• Let's see what happens if we remove it.
Why Read Bcast is Needed
[Timeline diagram: p0 performs write(1), issuing its to-bc-send; p1 reads and returns 1 (its replica already updated), and afterwards p2 reads and returns 0 (the broadcast has not yet reached p2). This non-linearizable outcome is what can happen if reads skip the broadcast.]
Algorithm for Sequential Consistency
• The linearizability algorithm, without doing a bcast for reads:
• Uses totally ordered broadcast as the underlying communication system.
• Each proc keeps a replica for each shared variable
• When read request arrives:
– immediately return the value stored in the local replica
• When write request arrives:
– send bcast msg containing request
– upon receipt, each proc updates its replica's value
– when own bcast msg arrives, respond with ack
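A Python sketch of this variant, under the same simulation assumption as before (a shared list stands in for totally ordered broadcast; all names are our own). The only change is that a read returns the local replica immediately, with no broadcast, so it may return a stale value:

```python
# Single-threaded sketch of the SC algorithm: local reads, broadcast writes.
# Assumption (ours): a shared list `log` simulates totally ordered broadcast.

class SCProcess:
    def __init__(self, pid, log):
        self.pid, self.log, self.applied = pid, log, 0
        self.replica = {}                      # local copy of each variable

    def _deliver_pending(self):
        # apply all broadcast writes not yet reflected in our replica
        while self.applied < len(self.log):
            sender, var, val = self.log[self.applied]
            self.replica[var] = val            # replicas updated in TO order
            self.applied += 1

    def read(self, var):
        return self.replica.get(var, 0)        # local, no broadcast

    def write(self, var, val):
        self.log.append((self.pid, var, val))  # to-bc-send
        self._deliver_pending()                # own msg arrival = ack
```

This illustrates why the algorithm is sequentially consistent but not linearizable: after p0's write of X completes, p1's local read may still return the old value until p1 next delivers broadcasts (here, on its next write).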
Correctness of SC Algorithm
Lemma (9.3): The local copies at each proc. take on all the values appearing in write operations, in the same order, which preserves the per-proc. order of writes.
Lemma (9.4): If pi writes Y and later reads X, then pi's update of its local copy of Y (on behalf of that write) precedes its read of its local copy of X (on behalf of that read).
Correctness of the SC Algorithm
(Theorem 9.5) Why does SC hold?
• Given any admissible execution α, must come up with a permutation π of the shared memory operations that is
– legal and
– respects per-proc. ordering of operations
The Permutation π
• Insert all writes into π in their to-bcast order.
• Consider each read R in σ in the order of invocation:
– suppose R is a read by pi of X
– place R in π immediately after the later of
• the operation by pi that immediately precedes R in σ, and
• the write that R "read from" (caused the latest update of pi's local copy of X preceding the response for R)
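The construction can be sketched directly, assuming the caller supplies, for each read, the same process's immediately preceding operation and the write the read "read from" (these parameter names are our own, purely illustrative):

```python
# Sketch of the permutation construction above.
# Assumptions (ours): `writes` is a list of write names in to-bcast order;
# `reads` is a list of (name, prev_op, read_from) in invocation order, where
# prev_op is the same process's immediately preceding operation (None if
# first) and read_from is the write the read returned (None = initial value).

def build_permutation(writes, reads):
    pi = list(writes)                          # writes in to-bcast order
    for name, prev_op, read_from in reads:
        # place the read immediately after the later of its two anchors
        prev = pi.index(prev_op) if prev_op is not None else -1
        src = pi.index(read_from) if read_from is not None else -1
        pi.insert(max(prev, src) + 1, name)
    return pi
```

For instance, a read of the initial value with no predecessor lands at the front of π, before any write.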
Permutation Example
[Timeline diagram: p0 performs write(1) … ack and p1 performs write(2) … ack, each with its own to-bc-send; one process's read returns 2 while another's returns 1. The resulting permutation π is given by the green numbers 1-4 on the slide.]
Permutation Respects Per-Proc. Ordering
For a specific proc:
• Relative ordering of two writes is preserved by Lemma 9.3
• Relative ordering of two reads is preserved by the construction of π
• If write W precedes read R in exec. σ, then W precedes R in π by construction
• Suppose read R precedes write W in σ. Show same is true in π.
Permutation Respects Ordering
• Suppose R and W are swapped in π:
– there is a read R' by pi that equals or precedes R in σ
– there is a write W' that equals W or follows W in the to-bcast order
– and R' "reads from" W'.
• But:
– R' finishes before W starts in σ, and
– updates are done to local replicas in to-bcast order (Lemma 9.3), so update for W' does not precede update for W
– so R' cannot read from W'.
σ|pi: … R' … R … W …
π: … W … W' … R' … R …
Permutation is Legal
• Consider some read R by pi and some write W s.t. R reads from W in α.
• Suppose in contradiction some other write W' falls between W and R in π:
π: … W … W' … R …
• Why does R follow W' in π?
Permutation is Legal
Case 1: R follows W' in π because W' is also by pi and R follows W' in σ.
• Update for W at pi precedes update for W' at pi in α (Lemma 9.3).
• Thus R does not read from W, contradiction.
Permutation is Legal
Case 2: R follows W' in π due to some operation O by pi s.t.
– O precedes R in σ, and
– O is placed between W' and R in π
π: … W … W' … O … R …
Case 2.1: O is a write.
• update for W' at pi precedes update for O at pi in α (Lemma 9.3)
• update for O at pi precedes pi's local read for R in α (Lemma 9.4)
• So R does not read from W, contradiction.
Permutation is Legal
π: … W … W' … O' … O … R …
Case 2.2: O is a read.
• A recursive argument shows that there exists a read O' by pi (which might equal O) that
– reads from W' in α and
– appears in π between W' and O
• Update for W at pi precedes update for W' at pi in α (Lemma 9.3).
• Update for W' at pi precedes the local read for O' at pi in α (otherwise O' would not read from W').
• Recall that O' equals or precedes O (from above) and O precedes R (by assumption for Case 2) in σ.
• Thus R cannot read from W, contradiction.
Performance of SC Algorithm
• Read operations are implemented "locally", without requiring any inter-process communication.
• Thus reads can be viewed as "fast": time between invocation and response is that needed for some local computation.
• Time for writes is time for delivery of one totally ordered broadcast (depends on how to-bcast is implemented).
Alternative SC Algorithm
• It is possible to have an algorithm that implements sequentially consistent shared memory on top of totally ordered broadcast with the reverse performance:
– writes are local/fast (even though bcasts are sent, don't wait for them to be received)
– reads can require waiting for some bcasts to be received
• Like the previous SC algorithm, this one does not implement linearizable shared memory.
Time Complexity for DSM Algorithms
• One complexity measure of interest for DSM algorithms is how long it takes for operations to complete.
• The linearizability algorithm required D time for both reads and writes, where D is the maximum time for a totally-ordered broadcast message to be received.
• The sequential consistency algorithm required D time for writes and C time for reads, where C is the time for doing some local computation.
• Can we do better? To answer this question, we need some kind of timing model.
Timing Model
• Assume the underlying communication system is the point-to-point message passing system (not totally ordered broadcast).
• Assume that every message has delay in the range [d-u,d].
• Claim: Totally ordered broadcast can be implemented in this model so that D, the maximum time for delivery, is O(d).
Time and Clocks in Layered Model
• Timed execution: associate an occurrence time with each node input event.
• Times of other events are "inherited" from the time of the triggering node input
– recall assumption that local processing time is negligible.
• Model hardware clocks as before: run at same rate as real time, but not synchronized
• Notions of view, timed view, shifting are the same:
– Shifting Lemma still holds (relates h/w clocks and msg delays between original and shifted execs)
Lower Bound for SC
Let T_read = worst-case time for a read to complete
Let T_write = worst-case time for a write to complete
Theorem (9.7): In any simulation of sequentially consistent shared memory on top of point-to-point message passing, T_read + T_write ≥ d.
SC Lower Bound Proof
• Consider any SC simulation with T_read + T_write < d.
• Let X and Y be two shared variables, both initially 0.
• Let α0 be an admissible execution whose top layer behavior is: write0(X,1) ack0(X) read0(Y) return0(Y,0)
– write begins at time 0, read ends before time d
– every msg has delay d
• Why does α0 exist?
– The alg. must respond correctly to any sequence of invocations.
– Suppose the user at p0 wants to do a write, immediately followed by a read.
– By SC, the read must return 0.
– By assumption, total elapsed time is less than d.
SC Lower Bound Proof
• Similarly, let α1 be an admissible execution whose top layer behavior is: write1(Y,1) ack1(Y) read1(X) return1(X,0)
– write begins at time 0, read ends before time d
– every msg has delay d
• α1 exists for a similar reason.
• Now merge p0's timed view in α0 with p1's timed view in α1 to create admissible execution α'.
• But α' is not SC, contradiction!
SC Lower Bound Proof
[Timeline diagrams, time axis from 0 to d: in α0, p0 performs write(X,1) then read(Y,0); in α1, p1 performs write(Y,1) then read(X,0); merging the two timed views yields α', in which p0 performs write(X,1), read(Y,0) while p1 performs write(Y,1), read(X,0).]
Linearizability Write Lower Bound
Theorem (9.8): In any simulation of linearizable shared memory on top of point-to-point message passing, T_write ≥ u/2.
Proof: Consider any linearizable simulation with T_write < u/2.
• Let α be an admissible exec. whose top layer behavior is: p1 writes 1 to X, p2 writes 2 to X, p0 reads 2 from X.
• Shift α to create an admissible exec. in which p1's and p2's writes are swapped, causing p0's read to violate linearizability.
Linearizability Write Lower Bound
[Timeline diagram of α, time axis 0 to u: p1 performs "write 1" and p2 performs "write 2", each completing within u/2; p0 then performs "read 2". The delay pattern assigns most messages delay d - u/2, with the remaining channels at d and d - u.]
Linearizability Write Lower Bound
[Timeline diagram of the shifted execution: shifting p1 later by u/2 and p2 earlier by u/2 makes every delay d or d - u, so the execution is still admissible; but now the two writes swap their real-time order, while p0's read still returns 2 after both writes complete, violating linearizability.]
Linearizability Read Lower Bound
• Approach is similar to the write lower bound.
• Assume in contradiction there is an algorithm with T_read < u/4.
• Identify a particular execution:
– fix a pattern of read and write invocations, occurring at particular times
– fix the pattern of message delays
• Shift this execution to get one that is
– still admissible
– but not linearizable
Linearizability Read Lower Bound
Original execution:
• p1 reads X and gets 0 (old value).
• Then p0 starts writing 1 to X.
• When write is done, p0 reads X and gets 1 (new value).
• Also, during the write, p1 and p2 alternate reading X.
• At some point, the reads stop getting the old value (0) and start getting the new value (1)
Linearizability Read Lower Bound
• Set all delays in this execution to be d - u/2.
• Now shift p2 earlier by u/2.
• Verify that result is still admissible (every delay either stays the same or becomes d or d - u).
• But in shifted execution, sequence of values read is
0, 0, …, 0, 1, 0, 1, 1, …, 1
[Timeline diagrams: p0 performs "write 1" while p1 and p2 alternate reads of X. Before the shift, the reads return 0, 0, …, 0 and then 1, 1, …, 1. After shifting p2 earlier by u/2, the interleaved reads return 0, 0, …, 0, 1, 0, 1, 1, …, 1 — a read of the old value after a read of the new value, which is not linearizable.]