
Page 1: CPSC 668 Distributed Algorithms and Systems

CPSC 668 Distributed Algorithms and Systems
Set 16: Distributed Shared Memory
Fall 2006
Prof. Jennifer Welch

Page 2: CPSC 668 Distributed Algorithms and Systems


Distributed Shared Memory
• A model for inter-process communication
• Provides the illusion of shared variables on top of message passing
• Shared memory is often considered a more convenient programming platform than message passing
• Formally, we give a simulation of the shared memory model on top of the message passing model
• We'll consider the special case of:
  – no failures
  – only read/write variables to be simulated

Page 3: CPSC 668 Distributed Algorithms and Systems


Shared Memory Issues
• A process will invoke a shared memory operation at some time
• The simulation algorithm running on the same node will execute some code, possibly involving exchanges of messages
• Eventually the simulation algorithm will inform the process of the result of the shared memory operation
• So shared memory operations are not instantaneous!
  – Operations (invoked by different processes) can overlap
• What should be returned by operations that overlap other operations?
  – defined by a memory consistency condition

Page 4: CPSC 668 Distributed Algorithms and Systems


Sequential Specifications

• Each shared object has a sequential specification: specifies behavior of object in the absence of concurrency.

• Object supports operations:
  – invocations
  – matching responses

• Set of sequences of operations that are legal

Page 5: CPSC 668 Distributed Algorithms and Systems


Sequential Spec for R/W Registers

• Operations are reads and writes

• Invocations are read_i(X) and write_i(X,v)

• Responses are return_i(X,v) and ack_i(X)

• A sequence of operations is legal iff each read returns the value of the latest preceding write.
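To make the legality condition concrete, here is a minimal Python sketch (illustrative only; the Op record and is_legal name are not from the slides) that checks whether a sequence of read/write operations is legal, assuming each variable starts at a given initial value:

```python
from dataclasses import dataclass

@dataclass
class Op:
    kind: str    # "read" or "write"
    var: str     # variable name, e.g. "X"
    value: int   # value written, or value returned by the read

def is_legal(ops, initial=0):
    """Sequential spec for R/W registers: every read returns the value of the
    latest preceding write to the same variable (or the initial value)."""
    current = {}                      # latest written value per variable
    for op in ops:
        if op.kind == "write":
            current[op.var] = op.value
        elif op.value != current.get(op.var, initial):
            return False
    return True

# write(X,1) read(X)->1 read(Y)->0 is legal; read(X)->2 with no such write is not
assert is_legal([Op("write", "X", 1), Op("read", "X", 1), Op("read", "Y", 0)])
assert not is_legal([Op("read", "X", 2)])
```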

Page 6: CPSC 668 Distributed Algorithms and Systems


Memory Consistency Conditions

• Consistency conditions tie together the sequential specification with what happens in the presence of concurrency.

• We will study two well-known conditions:
  – linearizability
  – sequential consistency

• We will only consider read/write registers, in the absence of failures.

Page 7: CPSC 668 Distributed Algorithms and Systems


Definition of Linearizability
• Suppose σ is a sequence of invocations and responses.
  – an invocation is not necessarily immediately followed by its matching response
• σ is linearizable if there exists a permutation π of all the operations in σ (now each invocation is immediately followed by its matching response) s.t.
  – π|X is legal (satisfies the sequential spec) for all X, and
  – if the response of operation O1 occurs in σ before the invocation of operation O2, then O1 occurs in π before O2 (π respects the real-time order of non-concurrent operations in σ).
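The definition can be checked directly, if very inefficiently, by trying every permutation of the completed operations. A brute-force sketch (illustrative only: operations are assumed to already be matched invocation/response pairs carrying inv and resp times; none of these names come from the slides):

```python
from itertools import permutations

def legal(seq, initial=0):
    """Each read sees the latest preceding write to its variable."""
    cur = {}
    for op in seq:
        if op["kind"] == "write":
            cur[op["var"]] = op["value"]
        elif op["value"] != cur.get(op["var"], initial):
            return False
    return True

def linearizable(ops, initial=0):
    """sigma is linearizable iff some permutation pi is legal and preserves the
    real-time order of non-overlapping operations (resp of O1 before inv of O2)."""
    n = len(ops)
    for perm in permutations(range(n)):
        pos = {i: p for p, i in enumerate(perm)}
        if all(pos[i] < pos[j]
               for i in range(n) for j in range(n)
               if i != j and ops[i]["resp"] < ops[j]["inv"]):
            if legal([ops[i] for i in perm], initial):
                return True
    return False

# the first example from the next slide, with made-up timestamps for illustration
ops = [dict(kind="write", var="X", value=1, inv=0, resp=2),
       dict(kind="read",  var="Y", value=1, inv=3, resp=5),
       dict(kind="write", var="Y", value=1, inv=1, resp=4),
       dict(kind="read",  var="X", value=1, inv=4, resp=6)]
assert linearizable(ops)
```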

Page 8: CPSC 668 Distributed Algorithms and Systems


Linearizability Examples

Suppose there are two shared variables, X and Y, both initially 0.

[Timeline diagram: p0 performs write(X,1), gets ack(X), then performs read(Y), which returns 1; p1 performs write(Y,1), gets ack(Y), then performs read(X), which returns 1; the two processes' operations overlap in real time.]

Is this sequence linearizable? Yes (an ordering is indicated by the green triangles on the slide).

What if p1's read returns 0? No (see arrow on the slide): write(X,1) completes before read(X) begins, so linearizability forces that read to return 1.

Page 9: CPSC 668 Distributed Algorithms and Systems


Definition of Sequential Consistency

• Suppose σ is a sequence of invocations and responses.
• σ is sequentially consistent if there exists a permutation π of all the operations in σ s.t.
  – π|X is legal (satisfies the sequential spec) for all X, and
  – if the response of operation O1 occurs in σ before the invocation of operation O2 at the same process, then O1 occurs in π before O2 (π respects the real-time order of operations by the same process in σ).
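Sequential consistency replaces the real-time condition with a per-process one, so a brute-force checker differs from the linearizability sketch above only in which pairs of operations must stay ordered (same illustrative operation records as before):

```python
from itertools import permutations

def sequentially_consistent(ops, initial=0):
    """Some permutation is legal and preserves each process's own operation
    order; real-time order across different processes is ignored."""
    n = len(ops)
    for perm in permutations(range(n)):
        pos = {i: p for p, i in enumerate(perm)}
        per_process_ok = all(
            pos[i] < pos[j]
            for i in range(n) for j in range(n)
            if i != j and ops[i]["proc"] == ops[j]["proc"]
            and ops[i]["resp"] < ops[j]["inv"])
        if not per_process_ok:
            continue
        cur, ok = {}, True
        for k in perm:                       # replay the permutation per variable
            op = ops[k]
            if op["kind"] == "write":
                cur[op["var"]] = op["value"]
            elif op["value"] != cur.get(op["var"], initial):
                ok = False
                break
        if ok:
            return True
    return False
```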

Page 10: CPSC 668 Distributed Algorithms and Systems


Sequential Consistency Examples

Suppose there are two shared variables, X and Y, both initially 0.

[Timeline diagram: p0 performs write(X,1), gets ack(X), then performs read(Y), which returns 1; p1 performs write(Y,1), gets ack(Y), then performs read(X), which returns 0.]

Is this sequence sequentially consistent? Yes (an ordering is indicated by the green numbers on the slide): placing p1's operations before p0's makes read(X) legally return 0.

What if p0's read returns 0? No (see arrows on the slide): then each read would have to precede the other process's write in the permutation, which contradicts the per-process order that puts each write before its own process's read.

Page 11: CPSC 668 Distributed Algorithms and Systems


Specification of Linearizable Shared Memory Comm. System
• Inputs are invocations on the shared objects
• Outputs are responses from the shared objects
• A sequence σ is in the allowable set iff:
  – Correct Interaction: each proc. alternates invocations and matching responses
  – Liveness: each invocation has a matching response
  – Linearizability: σ is linearizable

Page 12: CPSC 668 Distributed Algorithms and Systems


Specification of Sequentially Consistent Shared Memory
• Inputs are invocations on the shared objects
• Outputs are responses from the shared objects
• A sequence σ is in the allowable set iff:
  – Correct Interaction: each proc. alternates invocations and matching responses
  – Liveness: each invocation has a matching response
  – Sequential Consistency: σ is sequentially consistent

Page 13: CPSC 668 Distributed Algorithms and Systems


Algorithm to Implement Linearizable Shared Memory
• Uses totally ordered broadcast as the underlying communication system.
• Each proc keeps a replica for each shared variable
• When a read request arrives:
  – send bcast msg containing the request
  – when own bcast msg arrives, return the value in the local replica
• When a write request arrives:
  – send bcast msg containing the request
  – upon receipt, each proc updates its replica's value
  – when own bcast msg arrives, respond with ack
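A sketch of this algorithm in Python (names like LinearizableRegisters, to_bc_send, and the respond callback are illustrative assumptions, not from the slides; one instance runs at each process on top of a totally ordered broadcast that delivers every message to every process in the same order):

```python
class LinearizableRegisters:
    """One instance per process, layered on a totally ordered broadcast."""

    def __init__(self, pid, tobc, respond):
        self.pid = pid          # this process's id
        self.tobc = tobc        # totally ordered broadcast layer
        self.respond = respond  # callback that hands return/ack back to the user
        self.replica = {}       # local replica of every shared variable (initially 0)

    # invocations from the user
    def read(self, x):
        # even a read is broadcast, so it gets ordered w.r.t. every write
        self.tobc.to_bc_send(("read", self.pid, x))

    def write(self, x, v):
        self.tobc.to_bc_send(("write", self.pid, x, v))

    # delivery from the broadcast layer: same order at every process
    def to_bc_recv(self, msg):
        if msg[0] == "write":
            _, sender, x, v = msg
            self.replica[x] = v              # every process applies every write
            if sender == self.pid:
                self.respond(("ack", x))     # ack once the writer's own bcast arrives
        else:  # "read"
            _, sender, x = msg
            if sender == self.pid:
                self.respond(("return", x, self.replica.get(x, 0)))
```

Because every operation takes effect only when its own broadcast is delivered, the delivery order itself is the permutation used in the correctness argument on the following slides.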

Page 14: CPSC 668 Distributed Algorithms and Systems


The Simulation

[Architecture diagram: the user of read/write shared memory sits on top; at each process a local algorithm (alg0, …, algn-1) accepts read/write invocations and issues return/ack responses; the algorithms communicate via to-bc-send and to-bc-recv through a Totally Ordered Broadcast layer; together these layers simulate Shared Memory.]

Page 15: CPSC 668 Distributed Algorithms and Systems


Correctness of Linearizability Algorithm

• Consider any admissible execution α of the algorithm:
  – the underlying totally ordered broadcast behaves properly
  – the users interact properly
• Show that σ, the restriction of α to the events of the top interface, satisfies Liveness and Linearizability.

Page 16: CPSC 668 Distributed Algorithms and Systems


Correctness of Linearizability Algorithm
• Liveness (every invocation has a response): by the Liveness property of the underlying totally ordered broadcast.
• Linearizability: define the permutation π of the operations to be the order in which the corresponding broadcasts are received.
  – π is legal: because all the operations are consistently ordered by the TO bcast.
  – π respects the real-time order of operations: if O1 finishes before O2 begins, O1's bcast is ordered before O2's bcast.

Page 17: CPSC 668 Distributed Algorithms and Systems


Why is Read Bcast Needed?

• The bcast done for a read causes no changes to any replicas, just delays the response to the read.

• Why is it needed?

• Let's see what happens if we remove it.

Page 18: CPSC 668 Distributed Algorithms and Systems


Why Read Bcast is Needed

[Timeline diagram: p0 invokes write(1) and performs to-bc-send; p1's read returns the new value 1, while a later read by p2 returns the old value 0 because the write's broadcast has not yet reached p2. The new value is observed before the old one, so the resulting behavior is not linearizable.]

Page 19: CPSC 668 Distributed Algorithms and Systems


Algorithm for Sequential Consistency
• The linearizability algorithm, without doing a bcast for reads:
• Uses totally ordered broadcast as the underlying communication system.
• Each proc keeps a replica for each shared variable
• When a read request arrives:
  – immediately return the value stored in the local replica
• When a write request arrives:
  – send bcast msg containing the request
  – upon receipt, each proc updates its replica's value
  – when own bcast msg arrives, respond with ack
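The same sketch with the one change this slide makes: reads are answered immediately from the local replica and nothing is broadcast for them (illustrative names as before):

```python
class SequentiallyConsistentRegisters:
    """SC version: local reads, broadcast writes (one instance per process)."""

    def __init__(self, pid, tobc, respond):
        self.pid = pid
        self.tobc = tobc
        self.respond = respond
        self.replica = {}

    def read(self, x):
        # no broadcast: answer immediately from the local replica
        self.respond(("return", x, self.replica.get(x, 0)))

    def write(self, x, v):
        self.tobc.to_bc_send(("write", self.pid, x, v))

    def to_bc_recv(self, msg):
        _, sender, x, v = msg        # only writes are broadcast now
        self.replica[x] = v
        if sender == self.pid:
            self.respond(("ack", x))
```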

Page 20: CPSC 668 Distributed Algorithms and Systems


Correctness of SC Algorithm

Lemma (9.3): The local copies at each proc. take on all the values appearing in write operations, in the same order, which preserves the per-proc. order of writes.

Lemma (9.4): If pi writes Y and later reads X, then pi's update of its local copy of Y (on behalf of that write) precedes its read of its local copy of X (on behalf of that read).

Page 21: CPSC 668 Distributed Algorithms and Systems


Correctness of the SC Algorithm

(Theorem 9.5) Why does SC hold?

• Given any admissible execution α, we must come up with a permutation π of the shared memory operations that is
  – legal, and
  – respects the per-proc. ordering of operations

Page 22: CPSC 668 Distributed Algorithms and Systems


The Permutation π
• Insert all writes into π in their to-bcast order.
• Consider each read R in α in the order of invocation:
  – suppose R is a read by pi of X
  – place R in π immediately after the later of:
    • the operation by pi that immediately precedes R in α, and
    • the write that R "read from" (i.e., the write that caused the latest update of pi's local copy of X preceding the response for R)
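A sketch of this construction in Python (illustrative only; it assumes each read record points to the same process's previous operation and to the write it read from, which is information the proof extracts from the execution):

```python
def build_permutation(writes_in_tobcast_order, reads_in_invocation_order):
    """writes_in_tobcast_order: write records in to-bcast order.
    reads_in_invocation_order: read records in invocation order; each read carries
    'prev_op' (the same process's previous operation, or None) and 'read_from'
    (the write whose update it returned, or None for the initial value)."""
    pi = list(writes_in_tobcast_order)          # all writes first, in to-bcast order
    for r in reads_in_invocation_order:
        # R must follow its process's previous operation and the write it read
        # from; both are already in pi, so place R right after the later of them
        anchors = [op for op in (r.get("prev_op"), r.get("read_from"))
                   if op is not None and op in pi]
        idx = max((pi.index(a) for a in anchors), default=-1)
        pi.insert(idx + 1, r)
    return pi
```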

Page 23: CPSC 668 Distributed Algorithms and Systems


Permutation Example

[Timeline diagram: processes p0, p1, p2; a write(1) and a write(2) are each broadcast with to-bc-send and acked, one read returns 2, and another read returns 1; the permutation produced by the construction above is indicated by the green numbers 1-4 on the slide.]

Page 24: CPSC 668 Distributed Algorithms and Systems


Permutation Respects Per-Proc. Ordering
For a specific proc:
• Relative ordering of two writes is preserved by Lemma 9.3
• Relative ordering of two reads is preserved by the construction of π
• If write W precedes read R in exec. α, then W precedes R in π by construction
• Suppose read R precedes write W in α. Show the same is true in π.

Page 25: CPSC 668 Distributed Algorithms and Systems


Permutation Respects Ordering
• Suppose R and W are swapped in π:
  – There is a read R' by pi that equals or precedes R in α
  – There is a write W' that equals W or follows W in the to-bcast order
  – And R' "reads from" W'.
• But:
  – R' finishes before W starts in α, and
  – updates are done to local replicas in to-bcast order (Lemma 9.3), so the update for W' does not precede the update for W
  – so R' cannot read from W'.

α|pi : … R' … R … W …
π : … W … W' … R' … R …

Page 26: CPSC 668 Distributed Algorithms and Systems


Permutation is Legal

• Consider some read R by pi and some write W s.t. R reads from W in α.
• Suppose, in contradiction, some other write W' falls between W and R in π:

π : … W … W' … R …

• Why does R follow W' in π?

Page 27: CPSC 668 Distributed Algorithms and Systems


Permutation is Legal

Case 1: R follows W' in π because W' is also by pi and R follows W' in α.

• Update for W at pi precedes update for W' at pi in α (Lemma 9.3).

• Thus R does not read from W, contradiction.

Page 28: CPSC 668 Distributed Algorithms and Systems


Permutation is Legal
Case 2: R follows W' in π due to some operation O by pi s.t.
  – O precedes R in α, and
  – O is placed between W' and R in π

π : … W … W' … O … R …

Case 2.1: O is a write.
• update for W' at pi precedes update for O at pi in α (Lemma 9.3)
• update for O at pi precedes pi's local read for R in α (Lemma 9.4)
• So R does not read from W, contradiction.

Page 29: CPSC 668 Distributed Algorithms and Systems


Permutation is Legal

π : … W … W' … O' … O … R …

Case 2.2: O is a read.
• A recursive argument shows that there exists a read O' by pi (which might equal O) that
  – reads from W' in α, and
  – appears in π between W' and O
• Update for W at pi precedes update for W' at pi in α (Lemma 9.3).
• Update for W' at pi precedes the local read for O' at pi in α (otherwise O' would not read from W').
• Recall that O' equals or precedes O (from above) and O precedes R (by assumption for Case 2) in π.
• Thus R cannot read from W, contradiction.

Page 30: CPSC 668 Distributed Algorithms and Systems


Performance of SC Algorithm

• Read operations are implemented "locally", without requiring any inter-process communication.

• Thus reads can be viewed as "fast": time between invocation and response is that needed for some local computation.

• Time for writes is time for delivery of one totally ordered broadcast (depends on how to-bcast is implemented).

Page 31: CPSC 668 Distributed Algorithms and Systems


Alternative SC Algorithm

• It is possible to have an algorithm that implements sequentially consistent shared memory on top of totally ordered broadcast that has the reverse performance:
  – writes are local/fast (even though bcasts are sent, don't wait for them to be received)
  – reads can require waiting for some bcasts to be received

• Like the previous SC algorithm, this one does not implement linearizable shared memory.
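The slides don't give this algorithm's details; the following is a hedged sketch of the standard local-write idea (a write broadcasts and acks immediately; a read waits until all of the reader's own earlier write broadcasts have been delivered back to it), with illustrative names and a thread-based wait:

```python
import threading

class FastWriteSCRegisters:
    """Sketch of the reverse trade-off: writes ack immediately, reads may wait.
    Same illustrative to-bc interface as the earlier sketches."""

    def __init__(self, pid, tobc, respond):
        self.pid = pid
        self.tobc = tobc
        self.respond = respond
        self.replica = {}
        self.pending = 0                  # own writes whose broadcast hasn't returned
        self.cond = threading.Condition()

    def write(self, x, v):
        with self.cond:
            self.pending += 1
        self.tobc.to_bc_send(("write", self.pid, x, v))
        self.respond(("ack", x))          # fast: don't wait for the broadcast

    def read(self, x):
        with self.cond:
            # wait until all of this process's earlier writes have been applied
            while self.pending > 0:
                self.cond.wait()
            self.respond(("return", x, self.replica.get(x, 0)))

    def to_bc_recv(self, msg):
        _, sender, x, v = msg
        with self.cond:
            self.replica[x] = v
            if sender == self.pid:
                self.pending -= 1
                self.cond.notify_all()
```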

Page 32: CPSC 668 Distributed Algorithms and Systems


Time Complexity for DSM Algorithms
• One complexity measure of interest for DSM algorithms is how long it takes for operations to complete.

• The linearizability algorithm required D time for both reads and writes, where D is the maximum time for a totally-ordered broadcast message to be received.

• The sequential consistency algorithm required D time for writes and C time for reads, where C is the time for doing some local computation.

• Can we do better? To answer this question, we need some kind of timing model.

Page 33: CPSC 668 Distributed Algorithms and Systems


Timing Model

• Assume the underlying communication system is the point-to-point message passing system (not totally ordered broadcast).

• Assume that every message has delay in the range [d-u,d].

• Claim: Totally ordered broadcast can be implemented in this model so that D, the maximum time for delivery, is O(d).

Page 34: CPSC 668 Distributed Algorithms and Systems


Time and Clocks in Layered Model
• Timed execution: associate an occurrence time with each node input event.
• Times of other events are "inherited" from the time of the triggering node input
  – recall the assumption that local processing time is negligible.
• Model hardware clocks as before: run at the same rate as real time, but not synchronized
• Notions of view, timed view, and shifting are the same:
  – the Shifting Lemma still holds (relates h/w clocks and msg delays between original and shifted execs)

Page 35: CPSC 668 Distributed Algorithms and Systems


Lower Bound for SC

Let Tread = worst-case time for a read to complete

Let Twrite = worst-case time for a write to complete

Theorem (9.7): In any simulation of sequentially consistent shared memory on top of point-to-point message passing, Tread + Twrite ≥ d.

Page 36: CPSC 668 Distributed Algorithms and Systems


SC Lower Bound Proof
• Consider any SC simulation with Tread + Twrite < d.
• Let X and Y be two shared variables, both initially 0.
• Let α0 be an admissible execution whose top-layer behavior is
  write_0(X,1) ack_0(X) read_0(Y) return_0(Y,0)
  – the write begins at time 0, the read ends before time d
  – every msg has delay d
• Why does α0 exist?
  – The alg. must respond correctly to any sequence of invocations.
  – Suppose the user at p0 wants to do a write, immediately followed by a read.
  – By SC, the read must return 0.
  – By assumption, the total elapsed time is less than d.

Page 37: CPSC 668 Distributed Algorithms and Systems


SC Lower Bound Proof

• Similarly, let α1 be an admissible execution whose top-layer behavior is
  write_1(Y,1) ack_1(Y) read_1(X) return_1(X,0)
  – the write begins at time 0, the read ends before time d
  – every msg has delay d
• α1 exists for a similar reason.
• Now merge p0's timed view in α0 with p1's timed view in α1 to create an admissible execution α'.
• But α' is not SC, contradiction!
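Spelling out why α' cannot be sequentially consistent (a worked step consistent with the slides' notation):

```latex
In $\alpha'$ the top-layer behavior is
$p_0$: $write_0(X,1)\ ack_0(X)\ read_0(Y)\ return_0(Y,0)$ and
$p_1$: $write_1(Y,1)\ ack_1(Y)\ read_1(X)\ return_1(X,0)$.
A witnessing permutation $\pi$ would have to satisfy
\begin{align*}
write_0(X,1) &<_\pi read_0(Y)    && \text{(per-process order at $p_0$)}\\
write_1(Y,1) &<_\pi read_1(X)    && \text{(per-process order at $p_1$)}\\
read_0(Y)    &<_\pi write_1(Y,1) && \text{(legality, since $read_0(Y)$ returns 0)}\\
read_1(X)    &<_\pi write_0(X,1) && \text{(legality, since $read_1(X)$ returns 0)}
\end{align*}
which chain into the cycle
$write_0(X,1) <_\pi read_0(Y) <_\pi write_1(Y,1) <_\pi read_1(X) <_\pi write_0(X,1)$,
so no such $\pi$ exists.
```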

Page 38: CPSC 668 Distributed Algorithms and Systems


SC Lower Bound Proof

[Timeline diagrams (time axis from 0 to d): in α0, p0 performs write(X,1) and then read(Y), which returns 0; in α1, p1 performs write(Y,1) and then read(X), which returns 0; in the merged execution α', p0 performs write(X,1), read(Y,0) while p1 performs write(Y,1), read(X,0).]

Page 39: CPSC 668 Distributed Algorithms and Systems


Linearizability Write Lower Bound

Theorem (9.8): In any simulation of linearizable shared memory on top of point-to-point message passing, Twrite ≥ u/2.

Proof: Consider any linearizable simulation with Twrite < u/2.
• Let α be an admissible exec. whose top-layer behavior is:
  p1 writes 1 to X, p2 writes 2 to X, p0 reads 2 from X
• Shift α to create an admissible exec. in which p1's and p2's writes are swapped, causing p0's read to violate linearizability.
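As a worked instance of the shifting arithmetic used here (a sketch; the shift amounts and delays are the ones shown on the next two slides, and the sign convention is stated explicitly):

```latex
Shift vector: $s_0 = 0$, $s_1 = +u/2$, $s_2 = -u/2$ (positive means that process's
events happen later). A message from $p_i$ to $p_j$ with delay $\delta$ in $\alpha$
gets delay $\delta' = \delta + s_j - s_i$ after the shift, so:
\begin{align*}
p_1 \to p_2:&\quad \delta = d       &&\Rightarrow\ \delta' = d - u,\\
p_2 \to p_1:&\quad \delta = d - u   &&\Rightarrow\ \delta' = d,\\
\text{pairs involving } p_0:&\quad \delta = d - u/2 &&\Rightarrow\ \delta' \in \{d - u,\ d\}.
\end{align*}
Every $\delta'$ stays in $[d-u,\,d]$, so the shifted execution is admissible, while
$p_1$'s write (moved $u/2$ later) and $p_2$'s write (moved $u/2$ earlier), each shorter
than $u/2$, now occur in the opposite real-time order; $p_0$ is not shifted, so its
read still returns 2.
```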

Page 40: CPSC 668 Distributed Algorithms and Systems


Linearizability Write Lower Bound

[Timeline diagram for α (time axis 0, u/2, u): p1 performs write 1, then p2 performs write 2, then p0 performs read 2; the delay pattern assigns delay d - u/2 to the message links involving p0, delay d from p1 to p2, and delay d - u from p2 to p1.]

Page 41: CPSC 668 Distributed Algorithms and Systems


Linearizability Write Lower Bound

[Timeline diagram for the shifted execution (time axis 0, u/2, u): p1 is shifted later by u/2 and p2 is shifted earlier by u/2, so p2's write 2 now precedes p1's write 1 in real time; every message delay becomes either d or d - u, so the shifted execution is still admissible, yet p0 still reads 2, violating linearizability.]

Page 42: CPSC 668 Distributed Algorithms and Systems


Linearizability Read Lower Bound

• The approach is similar to the write lower bound.
• Assume in contradiction there is an algorithm with Tread < u/4.
• Identify a particular execution:
  – fix a pattern of read and write invocations, occurring at particular times
  – fix the pattern of message delays
• Shift this execution to get one that is
  – still admissible
  – but not linearizable

Page 43: CPSC 668 Distributed Algorithms and Systems


Linearizability Read Lower Bound

Original execution:

• p1 reads X and gets 0 (old value).

• Then p0 starts writing 1 to X.

• When write is done, p0 reads X and gets 1 (new value).

• Also, during the write, p1 and p2 alternate reading X.

• At some point, the reads stop getting the old value (0) and start getting the new value (1)

Page 44: CPSC 668 Distributed Algorithms and Systems


Linearizability Read Lower Bound

• Set all delays in this execution to be d - u/2.

• Now shift p2 earlier by u/2.

• Verify that result is still admissible (every delay either stays the same or becomes d or d - u).

• But in the shifted execution, the sequence of values read is
  0, 0, …, 0, 1, 0, 1, 1, …, 1
  i.e., a read of the new value is followed by a read of the old value, which is not linearizable — contradiction.
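The admissibility check is the same delay arithmetic as before (a worked step under the slide's uniform d - u/2 delays; only messages to or from p2 change):

```latex
With every delay equal to $d - u/2$ and $p_2$ shifted earlier by $u/2$
($s_0 = s_1 = 0$, $s_2 = -u/2$), a message from $p_i$ to $p_j$ gets delay
$\delta' = (d - u/2) + s_j - s_i$:
\[
p_i \to p_2:\ \delta' = d - u, \qquad
p_2 \to p_j:\ \delta' = d, \qquad
\text{all other delays stay } d - u/2 .
\]
All delays remain in $[d-u,\,d]$, so the shifted execution is admissible, but each of
$p_2$'s reads now starts $u/2$ earlier, which produces the new-value-then-old-value
inversion listed above.
```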

Page 45: CPSC 668 Distributed Algorithms and Systems


Linearizability Read Lower Bound

[Timeline diagram: p0 performs write 1 while p1 and p2 alternate reads of X; after p2 is shifted earlier by u/2, the interleaved sequence of values read becomes 0, 0, …, 0, 1, 0, 1, 1, …, 1, i.e., a read of the new value is followed by a read of the old value.]