memory models


Upload: manukumar-kumar

Post on 20-Jul-2016

DESCRIPTION

Memory models in Advanced Computer Architecture

TRANSCRIPT

Page 1: Memory Models

Shared Memory – Consistency of Shared Variables

The ideal picture of shared memory:

[Figure: CPU0, CPU1, CPU2 and CPU3 all read and write one shared memory directly.]

The actual architecture of shared memory systems:

Symmetric Multi-Processor (SMP):

[Figure: CPU0, CPU1, CPU2 and CPU3 each read and write a local cache; only the reads/writes of misses, plus cache invalidations, travel between the local caches and the single shared memory.]

Distributed Shared Memory (DSM):

[Figure: CPU0, CPU1, CPU2 and CPU3 each have a local memory module; the modules are connected by a network.]

Page 2: Memory Models

The Million $$s Question: How/When Does One Process Read Another Process’s Writes?

CPUi: W V,x (write value x to the local copy of shared variable V)

CPUj: R V,0? or R V,x? (read V from the local copy)

Assumption: the initial value of shared variables is always 0.

Why is this a question? Because temporal order relations like “before/after” do not necessarily hold in a distributed system.

Page 3: Memory Models

Why Memory Model?

Initially: a=0, b=0

Thread 1: Print(b); a=1

Thread 2: Print(a); b=1

Printed: 0,0? Printed: 1,0? Printed: 1,1?

A memory model answers the question: “Which writes by a process are seen by which reads of the other processes?”

Page 4: Memory Models

Memory Consistency Models

A consistency/memory model is an “agreement” between the execution environment (H/W, OS, middleware) and the processes. The runtime guarantees to the application certain properties of the way values written to shared variables become visible to reads. This determines the memory model: what is valid and what is not.

Example program:
Pi: R V; W V,7; R V; R V
Pj: R V; W V,13; R V; R V

Example execution:
Pi: R V,0; W V,7; R V,7; R V,13
Pj: R V,0; W V,13; R V,13; R V,7

Order of writes to V as seen by Pi: (1) W V,7; (2) W V,13
Order of writes to V as seen by Pj: (1) W V,13; (2) W V,7

Page 5: Memory Models

Memory Model: Coherence

Coherence is the memory model in which (the runtime guarantees to the program that) the writes performed by the processes to each specific variable are viewed by all processes in the same full order.

The Register Property: the view of a process consists of the values it “sees” in its reads, and the writes it performs. If a R V in P which comes later than W V,x in P sees a value different from x, then a later R V in P cannot see x.

Example program:
Pi: W V,7; R V; R V
Pj: W V,13; R V; R V

All valid executions under Coherence:

1. Pi: W V,7; R V,7; R V,7      Pj: W V,13; R V,13; R V,7

2. Pi: W V,7; R V,7; R V,7      Pj: W V,13; R V,7; R V,7

3. Pi: W V,7; R V,7; R V,13     Pj: W V,13; R V,13; R V,13

4. Pi: W V,7; R V,13; R V,13    Pj: W V,13; R V,13; R V,13

5. Pi: W V,7; R V,7; R V,7      Pj: W V,13; R V,13; R V,13
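The list of valid executions can be cross-checked mechanically. The following is only a sketch, assuming a single shared variable V with initial value 0 and treating every interleaving that respects program order as a candidate serialization; the names PROGRAM, interleavings and coherent_executions are illustrative and not part of the slides.

```python
from itertools import permutations

# Program from the slide: each process writes V once, then reads V twice.
PROGRAM = {
    "Pi": [("W", 7), ("R", None), ("R", None)],
    "Pj": [("W", 13), ("R", None), ("R", None)],
}


def interleavings(program):
    """Yield every total order of the operations that respects program order."""
    names = list(program)
    # One slot per operation, labelled with the process that owns it.
    slots = [name for name in names for _ in program[name]]
    for order in set(permutations(slots)):
        next_op = {name: 0 for name in names}
        schedule = []
        for name in order:
            schedule.append((name, program[name][next_op[name]]))
            next_op[name] += 1
        yield schedule


def coherent_executions(program, initial=0):
    """Collect the values read in every legal serialization of the program."""
    results = set()
    for schedule in interleavings(program):
        value = initial
        seen = {name: [] for name in program}
        for proc, (kind, arg) in schedule:
            if kind == "W":
                value = arg
            else:
                seen[proc].append(value)  # a read returns the latest write
        results.add(tuple((name, tuple(seen[name])) for name in program))
    return results


if __name__ == "__main__":
    for execution in sorted(coherent_executions(PROGRAM)):
        print(execution)
```

Under these assumptions the enumeration yields five distinct executions, matching the five listed on the slide.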

Page 6: Memory Models

Formal definition of Coherence

Program Order: The order in which instructions appear in each process. This is a partial order on all the instructions in the program.

A serialization: A full order on all the instructions (reads/writes) of all the processes, which is consistent with the program order.

A legal serialization: A serialization in which each read X returns the value written by the latest write X in the full order.

Let P be a program; let PX be the “sub-program” of P which contains only the read X/write X operations on X.

Coherence: P is said to be coherent if for every variable X there exists a legal serialization of PX. (Note: a process cannot distinguish one such serialization from another for a given execution.)

Page 7: Memory Models

Examples

Example 1:
Process 1: read x,1; write y,1
Process 2: read y,1; write x,1
Coherent. Serializations:
x: write x,1, read x,1
y: write y,1, read y,1

Example 2:
Process 1: write x,1; write x,2
Process 2: read x,2; read x,1
Not coherent. Cannot be serialized.

Example 3:
Process 1: read x,1; write x,2
Process 2: read x,2; write x,1
Not coherent. Cycle of dependencies. Cannot be serialized.
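The coherence verdicts for these examples can be checked with a small search for a legal serialization per variable. This is a minimal sketch, assuming all variables start at 0; the tuple encoding of operations and the helper names has_legal_serialization and is_coherent are assumptions made for illustration.

```python
def has_legal_serialization(processes, initial=0):
    """Return True if some interleaving that respects program order is legal.

    `processes` is a list of operation lists; each operation is a tuple
    (kind, variable, value) with kind "r" (read) or "w" (write).
    """
    def search(positions, memory):
        if all(pos == len(ops) for pos, ops in zip(positions, processes)):
            return True
        for i, ops in enumerate(processes):
            pos = positions[i]
            if pos == len(ops):
                continue
            kind, var, value = ops[pos]
            if kind == "r" and memory.get(var, initial) != value:
                continue  # this read would not return the latest write
            new_memory = dict(memory)
            if kind == "w":
                new_memory[var] = value
            new_positions = positions[:i] + (pos + 1,) + positions[i + 1:]
            if search(new_positions, new_memory):
                return True
        return False

    return search((0,) * len(processes), {})


def is_coherent(processes, initial=0):
    """Coherence: each variable's sub-program has a legal serialization."""
    variables = {var for ops in processes for _, var, _ in ops}
    return all(
        has_legal_serialization(
            [[op for op in ops if op[1] == v] for ops in processes], initial
        )
        for v in variables
    )


# The three example executions, one list of operations per process.
crossed_xy = [[("r", "x", 1), ("w", "y", 1)], [("r", "y", 1), ("w", "x", 1)]]
reversed_reads = [[("w", "x", 1), ("w", "x", 2)], [("r", "x", 2), ("r", "x", 1)]]
cyclic_x = [[("r", "x", 1), ("w", "x", 2)], [("r", "x", 2), ("w", "x", 1)]]

if __name__ == "__main__":
    for name, ex in [("crossed_xy", crossed_xy),
                     ("reversed_reads", reversed_reads),
                     ("cyclic_x", cyclic_x)]:
        print(name, "coherent" if is_coherent(ex) else "not coherent")
```

Only the execution that touches two different variables passes: each variable can be serialized on its own, even though the execution as a whole cannot.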

Page 8: Memory Models

Sequential Consistency [Lamport 1979]

Sequential Consistency is the memory model in which all reads/writes performed by the processes are viewed by all processes in the same full order.

Example 1:
Process 1: write x,1; write y,1
Process 2: read y,1; read x,0
Coherent. Not sequentially consistent.

Example 2:
Process 1: read x,1; write y,1
Process 2: read y,1; write x,1
Coherent. Not sequentially consistent.

Page 9: Memory Models

Strict (Strong) Memory Models

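Initially: a=0, b=0

Thread 1: Print(b); a=1

Thread 2: Print(a); b=1

Printed: 0,0 or 0,1 or 1,0 (possible under Sequential Consistency)

Printed: 1,1 (possible under Coherence, but not under Sequential Consistency)

Sequential Consistency: Given an execution, there exists an order of all reads/writes which is consistent with all program orders.

Coherence: For any variable x, there exists an order of the read x/write x operations which is consistent with all program orders.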

Page 10: Memory Models

Formal definition of Sequential Consistency

Let P be a program.

Sequential Consistency: P is said to be sequentially consistent if there exists a legal serialization of all reads/writes in P.

Observation: Every program which is sequentially consistent is also coherent.

Conclusion: Sequential Consistency has stronger requirements, and we thus say that it is stronger than Coherence.

In general: A consistency model A is said to be (strictly) stronger than B if all executions which are valid under A are also valid under B.
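The two definitions differ only in what is serialized: the whole program for Sequential Consistency, each variable's sub-program for Coherence. The sketch below makes that difference concrete. It assumes all variables start at 0; find_legal_serialization, is_sequentially_consistent and is_coherent are illustrative names, and the three sample executions are written out here so the block stands alone.

```python
def find_legal_serialization(processes, initial=0):
    """Return one legal serialization as a list of operations, or None."""
    def search(positions, memory, prefix):
        if all(pos == len(ops) for pos, ops in zip(positions, processes)):
            return prefix
        for i, ops in enumerate(processes):
            pos = positions[i]
            if pos == len(ops):
                continue
            kind, var, value = ops[pos]
            if kind == "read" and memory.get(var, initial) != value:
                continue  # a read must return the latest write in the order
            new_memory = {**memory, var: value} if kind == "write" else memory
            found = search(positions[:i] + (pos + 1,) + positions[i + 1:],
                           new_memory, prefix + [(i + 1, kind, var, value)])
            if found is not None:
                return found
        return None

    return search((0,) * len(processes), {}, [])


def is_sequentially_consistent(processes):
    """SC: one legal serialization of all reads/writes in the program."""
    return find_legal_serialization(processes) is not None


def is_coherent(processes):
    """Coherence: one legal serialization per variable, taken separately."""
    variables = {var for ops in processes for _, var, _ in ops}
    return all(
        find_legal_serialization(
            [[op for op in ops if op[1] == v] for ops in processes]
        ) is not None
        for v in variables
    )


# Process 2 sees the later write to y but misses the earlier write to x.
stale_x = [[("write", "x", 1), ("write", "y", 1)],
           [("read", "y", 1), ("read", "x", 0)]]
# Each process reads the value the other one writes afterwards.
crossed = [[("read", "x", 1), ("write", "y", 1)],
           [("read", "y", 1), ("write", "x", 1)]]
# Process 2 sees both writes of process 1.
in_order = [[("write", "x", 1), ("write", "y", 1)],
            [("read", "y", 1), ("read", "x", 1)]]

if __name__ == "__main__":
    for name, ex in [("stale_x", stale_x), ("crossed", crossed),
                     ("in_order", in_order)]:
        print(name, "SC:", is_sequentially_consistent(ex),
              "coherent:", is_coherent(ex))
```

The first two executions are coherent without being sequentially consistent, while the third satisfies both, in line with the observation that every sequentially consistent execution is also coherent.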

Page 11: Memory Models

The problem of strong consistency models

The runtime system must ensure the existence of a legal serialization, and the same consistent view for all processes.

This requires a lot of expensive coordination, which degrades performance!

P1: Print(U); Write V,1

P2: Print(V); Write U,1

SC: The hardware cannot reorder operations locally in each thread, for this would make it possible to print 1,1. The hardware may reorder anyway and postpone the writes, but then why reorder in the first place?
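The effect of local reordering on this program can be seen by enumerating interleavings. The sketch below assumes that U and V start at 0, that operations are atomic, and that "reordering" simply means swapping the two operations inside each thread; the function name outcomes and the tuple encoding are illustrative.

```python
def outcomes(thread1, thread2):
    """Run every interleaving of two threads and collect what gets printed.

    Each result is a pair (value printed by P1, value printed by P2).
    """
    results = set()

    def run(i, j, memory, printed):
        if i == len(thread1) and j == len(thread2):
            results.add((printed["P1"], printed["P2"]))
            return
        for name, ops, idx in (("P1", thread1, i), ("P2", thread2, j)):
            if idx == len(ops):
                continue
            kind, var = ops[idx]
            mem, out = dict(memory), dict(printed)
            if kind == "write":
                mem[var] = 1
            else:  # "print": read the current value of the variable
                out[name] = mem[var]
            if name == "P1":
                run(i + 1, j, mem, out)
            else:
                run(i, j + 1, mem, out)

    run(0, 0, {"U": 0, "V": 0}, {})
    return results


P1 = [("print", "U"), ("write", "V")]
P2 = [("print", "V"), ("write", "U")]

in_order = outcomes(P1, P2)               # program order respected
reordered = outcomes(P1[::-1], P2[::-1])  # each thread's two operations swapped

if __name__ == "__main__":
    print("program order:", sorted(in_order))
    print("reordered:    ", sorted(reordered))
```

With program order respected, no interleaving prints 1,1; once each thread's write is moved ahead of its print, 1,1 becomes reachable.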

Page 12: Memory Models

Coherence Forbids Reordering

Initially: p.x = 0 (q.x is aliased to p.x)

Thread 1: p.x = 1

Thread 2: a = p.x; b = q.x; assert(a ≤ b)

Once a thread sees an update, it cannot “forget” that it has seen it. Two reads of the same memory location cannot be reordered.

Reordering may make the assignment to b happen early (seeing 0) and the assignment to a happen late (seeing 1). The right thread would then see an order of writes different from the one the left thread sees.
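A tiny enumeration shows why the two reads must stay in program order. It takes the slide's assertion to mean that a must not exceed b (an assumption about the garbled original), models p.x and q.x as one location that starts at 0, and lets the other thread's single write land before, between, or after the two reads. The function name possible_results is illustrative.

```python
def possible_results(issue_order):
    """Return all (a, b) pairs when the two reads are issued in `issue_order`.

    The other thread's write of 1 may take effect before the first read,
    between the two reads, or after both of them.
    """
    results = set()
    for reads_before_write in range(3):
        seen = {}
        for position, target in enumerate(issue_order):
            # Reads issued before the write see 0, later ones see 1.
            seen[target] = 0 if position < reads_before_write else 1
        results.add((seen["a"], seen["b"]))
    return results


program_order = possible_results(["a", "b"])  # a = p.x; b = q.x
swapped = possible_results(["b", "a"])        # b = q.x issued first

if __name__ == "__main__":
    print("program order:", sorted(program_order))
    print("reads swapped:", sorted(swapped))
```

In program order every result satisfies a ≤ b; with the reads swapped, the pair a = 1, b = 0 appears, which is the "forgotten" update the slide describes.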

Page 13: Memory Models

Coherence makes reads prevent common compiler optimizations

p and q might point to the same object.

Initially: p.x = 0

Thread 1: p.x = 1

Thread 2: a = p.x; b = q.x; c = p.x; assert(p == q ⇒ a ≤ b ≤ c)

Cannot replace c = p.x with c = a.

Reads can make a process see writes by another process. The read “kills” later reuse of local values.

Page 14: Memory Models

Release Consistency [Gharachorloo et al. 1990, DASH]

• Introduces a special type of variables, called synchronization variables or locks.

• Locks cannot be read or written. They can be acquired and released, denoted acquire(L) and release(L) for a lock L.

• A process that acquired a lock L but has not released it, holds it.

• No more than one process can hold a lock L, while others wait.

K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J.L. Hennessy. Memory consistency and event ordering in scalable shared-memory multiprocessors. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 15--26. IEEE, May 1990.

Page 15: Memory Models

Using release and acquire to define execution-flow synchronization primitives

• Let a set of processes release tokens by reaching the operation Release in their program order.

• Let another set (possibly overlapping) acquire those tokens by performing an acquire operation, where acquire can proceed only when all tokens have already arrived from all releasing processes.

• 2-way synchronization = lock-unlock: 1 release, 1 acquire

• n-way synchronization = barrier: n releases, n acquires

• PARC’s synch = k-way synchronization

Page 16: Memory Models

(Almost) Formal Definition of Release Consistency

Assuming atomic read/write/acquire/release (this is never the case; it is assumed for simplicity only):

(A) Before a read or write access is allowed to perform, all preceding (in program order) acquire accesses must be performed, and

(B) Before a release access is allowed to perform, all preceding (in program order) read or write accesses must be performed, and

(C) acquire and release accesses are sequentially consistent.

Page 17: Memory Models

Understanding RC

[Figure: timelines of two processes. A: w(x)1, rel(L1). B: r(x)0, r(x)?, r(x)1, acq(L1), r(x)1.]

Annotations in the figure:

• At rel(L1): from this point in this execution, all processes must see the value 1 in x.

• At r(x)?: it is undefined what value is read here. It can be any value written by some process. Here it can be 0 or 1.

• At B's r(x)1 before acq(L1): according to rule (B), 1 is read in the current execution. However, the programmer cannot be sure 1 will be read in all executions.

• At B's r(x)1 after acq(L1): according to rules (A) and (C), the programmer knows that in all executions this read returns 1.

Page 18: Memory Models

Acquire and Release

• release serves as a memory-synch operation, or a flush of the local modifications to all other processes.

• acquire and release are not only used for synchronization of execution, but also for synchronization of memory, i.e. for propagation of writes from/to other processes.

– This allows the two expensive types of synchronization to overlap.

– This also turns out to be simpler for the programmer from a semantic point of view.

Page 19: Memory Models

Acquire and Release (cont.)

• A release followed by an acquire of the same lock guarantees to the programmer that all writes previous to the release will be seen by all reads following the acquire.

• The idea is to let the programmer decide which blocks of operations need to be synchronized, and put them between a matching pair of acquire-release operations.

• In the absence of release/acquire pairs, there is no assurance that modifications will ever propagate between processes.

Page 20: Memory Models

Happened-Before relation induced by acquire/release

[Figure: timelines of processes A, B and C performing reads and writes of x and y (w(x), r(x), w(y), r(y)) interleaved with acq/rel operations on locks L1 and L2.]

Page 21: Memory Models

Data Races in RC

Release Consistency does not guarantee anything about ordered propagation of updates.

Initially: grades = oldDatabase; updated = false;

Thread T.A.:
grades = newDatabase;
updated = true;

Thread Lecturer:
while (updated == false);
X := grades.gradeOf(lecturersSon);

• If the modification of variable updated is passed to Lecturer while the modification of grades is not, then Lecturer looks at the old database!

• This is possible in Release Consistency, but not in Sequential Consistency.
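The race can be replayed in a toy model of an eager release-consistent memory. Everything in this sketch is an assumption made for illustration: each process keeps a private copy of every shared variable, the system may propagate any single pending write at any time, and a release pushes all pending writes (rule (B)). Lock ownership and waiting are not modeled, and the class and function names are hypothetical.

```python
class ToyRC:
    """Toy eager release consistency over per-process local copies."""

    def __init__(self, names, initial):
        self.local = {n: dict(initial) for n in names}  # private copies
        self.pending = {n: {} for n in names}           # unpropagated writes

    def write(self, proc, var, value):
        self.local[proc][var] = value
        self.pending[proc][var] = value

    def read(self, proc, var):
        return self.local[proc][var]

    def propagate(self, proc, var):
        """The system may push any single pending write at any moment."""
        value = self.pending[proc].pop(var)
        for copy in self.local.values():
            copy[var] = value

    def release(self, proc):
        """Rule (B): all earlier writes are performed before the release."""
        for var in list(self.pending[proc]):
            self.propagate(proc, var)

    def acquire(self, proc):
        """Nothing to fetch in this eager model: releases already pushed."""


INITIAL = {"grades": "oldDatabase", "updated": False}


def racy_run():
    m = ToyRC(["TA", "Lecturer"], INITIAL)
    m.write("TA", "grades", "newDatabase")
    m.write("TA", "updated", True)
    m.propagate("TA", "updated")          # only this write has arrived
    assert m.read("Lecturer", "updated")  # the Lecturer's spin loop exits
    return m.read("Lecturer", "grades")


def synchronized_run():
    m = ToyRC(["TA", "Lecturer"], INITIAL)
    m.acquire("TA")
    m.write("TA", "grades", "newDatabase")
    m.write("TA", "updated", True)
    m.release("TA")                       # flushes both writes
    m.acquire("Lecturer")                 # same lock, after the release
    return m.read("Lecturer", "grades")


if __name__ == "__main__":
    print("racy:        ", racy_run())
    print("synchronized:", synchronized_run())
```

In the racy run the Lecturer leaves the loop and still reads the old database; once both writes sit between an acquire and a release of a common lock, the Lecturer's reads after its own acquire see the new one.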

Page 22: Memory Models

Expressiveness of Release Consistency [Gharachorloo et al. 1990]

Theorem: RC = SC for programs having no data races.

Given a properly-labeled program P, the set of valid executions for P is the same on systems providing RC and on those providing SC.

Conclusion (OpenMP, Java, C++, etc.):
• The system provides RC (performance)
• The programmer avoids data races (program verification)
Best of both worlds!

Page 23: Memory Models

Lazy Release Consistency [Keleher et al., TreadMarks 1992] (*)

• Postpone modifications until a remote process “really” needs them

• More relaxed than RC

(*) P. Keleher, A. L. Cox, S. Dwarkadas, and W. Zwaenepoel. TreadMarks: Distributed shared memory on standard workstations and operating systems. In Proceedings of the 1994 Winter Usenix Conference, pages 115--132, Jan. 1994.

Page 24: Memory Models

Formal Definition of Lazy Release Consistency

(A) Before a read or write access is allowed to perform with respect to any other process, all previous acquire accesses must be performed with respect to that other process, and

(B) Before a release access is allowed to perform with respect to any other process, all previous read or write accesses must be performed with respect to that other process, and

(C) acquire and release accesses are sequentially consistent.

Page 25: Memory Models

Understanding the LRC Memory Model

[Figure: timelines of three processes. A: w(x)1, rel(L1). B: r(x)0, r(x)?, r(x)?, acq(L1), r(x)1. C: r(x)0, r(x)?, r(x)?, acq(L2), r(x)?.]

• It is guaranteed that the acquirer of the same lock sees the modifications that precede the release in program order.

Page 26: Memory Models

Understanding the LRC Memory Model: Transitivity

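• Process C sees the modification of x by A.

[Figure: timelines of three processes. A performs w(x)1 and rel(L1). B's operations include acq(L1), rel(L1), acq(L2), w(y)1 and rel(L2). C performs acq(L2) and then reads r(x)1 and r(y)1.]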
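The same-lock guarantee and its transitive extension can be sketched with a toy lazy model in which writes travel only along lock hand-offs. This is an illustration under strong assumptions, not the TreadMarks protocol: a global counter stamps writes so that newer values win on merge, a lock simply remembers everything its last releaser knew, unsynchronized reads always return the local copy (real LRC leaves them undefined), and mutual exclusion is not modeled. The class name LazyRC and its methods are hypothetical.

```python
import itertools


class LazyRC:
    """Toy lazy release consistency: writes move only at acquire time."""

    def __init__(self, names):
        self._clock = itertools.count(1)     # orders writes (simplification)
        self.known = {n: {} for n in names}  # process -> var -> (stamp, value)
        self.lock_state = {}                 # lock -> writes known to last releaser

    def write(self, proc, var, value):
        self.known[proc][var] = (next(self._clock), value)

    def read(self, proc, var, initial=0):
        return self.known[proc].get(var, (0, initial))[1]

    def release(self, proc, lock):
        # Nothing is sent anywhere; just record what the releaser knew.
        self.lock_state[lock] = dict(self.known[proc])

    def acquire(self, proc, lock):
        # Pull the last releaser's writes, keeping the newer stamp per variable.
        for var, (stamp, value) in self.lock_state.get(lock, {}).items():
            if stamp > self.known[proc].get(var, (0, None))[0]:
                self.known[proc][var] = (stamp, value)


def scenario():
    m = LazyRC(["A", "B", "C"])
    trace = {}
    m.write("A", "x", 1)
    m.release("A", "L1")
    trace["B before acq(L1)"] = m.read("B", "x")
    m.acquire("B", "L1")
    trace["B after acq(L1)"] = m.read("B", "x")
    m.acquire("C", "L2")                     # L2 was never released by anyone
    trace["C after unrelated acq(L2)"] = m.read("C", "x")
    m.write("B", "y", 1)
    m.release("B", "L2")
    m.acquire("C", "L2")                     # chain: A -L1-> B -L2-> C
    trace["C after chained acq(L2)"] = (m.read("C", "x"), m.read("C", "y"))
    return trace


if __name__ == "__main__":
    for step, value in scenario().items():
        print(step, "->", value)
```

Because a release hands over everything the releaser has learned, C picks up A's write to x through B even though A and C never use a common lock.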

Page 27: Memory Models

Implementation of LRC

• Satisfying the happened-before relation between all operations is enough to satisfy LRC.

– Maintenance and usage of such a detailed ordering would be expensive.

• Instead, the ordering is applied to process intervals.

– Intervals are segments of time in the execution of a single process.

– A new interval begins each time a process executes a synchronization operation.

Page 28: Memory Models

Intervals

[Figure: timelines of processes P1, P2 and P3, each divided into numbered intervals by its synchronization operations (acq/rel on locks L1, L2 and L3).]

Page 29: Memory Models

Happened-before of Intervals

A happened-before partial order is defined between intervals.

An interval i1 precedes an interval i2 according to happened-before of intervals, if all accesses in i1 precede accesses in i2 according to the happened-before of accesses.

Page 30: Memory Models

Vector Timestamps

• An interval is said to be performed at a process if all of the interval’s accesses have been performed at that process.

• Each process p has a vector timestamp Vp that tracks which intervals have been performed at that process.

– A vector timestamp consists of a set of interval indices, one per process in the system.
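A vector timestamp of this kind can be sketched as a small class. The slide only states that there is one interval index per process; the update rules used here (incrementing the owner's entry when a new interval starts, taking the element-wise maximum on acquire, and listing the intervals that are still missing) are assumptions in the style of common vector-clock schemes, and all names are illustrative.

```python
class VectorTimestamp:
    """One interval index per process.

    Entry q holds the latest interval of process q that has been
    performed at the owning process.
    """

    def __init__(self, n_procs, owner):
        self.owner = owner
        self.entries = [0] * n_procs

    def start_interval(self):
        """Called whenever the owner executes a synchronization operation."""
        self.entries[self.owner] += 1

    def covers(self, proc, interval):
        """Has interval number `interval` of `proc` been performed here?"""
        return self.entries[proc] >= interval

    def missing_from(self, other):
        """Intervals recorded in `other` that are not yet performed here."""
        return [
            (proc, index)
            for proc, (mine, theirs) in enumerate(zip(self.entries, other.entries))
            for index in range(mine + 1, theirs + 1)
        ]

    def merge(self, other):
        """On acquire: adopt everything the releaser had already performed."""
        self.entries = [max(a, b) for a, b in zip(self.entries, other.entries)]


if __name__ == "__main__":
    v0 = VectorTimestamp(3, owner=0)
    v1 = VectorTimestamp(3, owner=1)
    v0.start_interval()
    v0.start_interval()            # process 0 is now in its interval 2
    v1.start_interval()            # process 1 is in its interval 1
    print(v1.missing_from(v0))     # intervals process 1 still has to obtain
    v1.merge(v0)                   # process 1 acquires a lock released by 0
    print(v1.entries)
```

In this sketch, the difference between the acquirer's and the releaser's timestamps tells the acquirer exactly which intervals' modifications it still needs, which is what keeps propagation lazy.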