Designing Correct Concurrent Applications: An Algorithmic View
Hagit Attiya, Technion
Seminar in Distributed Algorithms (236825), Spring 2013
Concurrent Systems
Hard to design correct (& efficient!) applications
[Figure: application; concurrent system; Programming Languages (PL); Distributed Computing (DC)]
Main Admin Issues
• Mandatory participation
  – At most 1 absence without explanation
• List of papers published later this week
  – First come, first served
  – Student lectures start after Passover (in 3 weeks)
• Slides in English (encouraged…)
Algorithmic View of Concurrent Systems
• A collection of processes
  – Each a sequential thread of execution
• Communicating through shared data structures
Abstract Data Types (ADT)
• Cover most concurrent applications
  – At least encapsulate their data needs
  – An object-oriented programming point of view
• Abstract representation of data & set of methods (operations) for accessing it
  – Signature
  – Specification
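A minimal Python sketch of the two ingredients: the abstract class fixes the signature of a LIFO stack, and a single-threaded implementation serves as an executable sequential specification. The class names and the use of `None` for an empty stack are assumptions of this sketch, not part of the slides.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class Stack(ABC, Generic[T]):
    """Signature of a LIFO stack ADT: method names, arguments and results."""

    @abstractmethod
    def push(self, item: T) -> None:
        """Place item on top of the stack."""

    @abstractmethod
    def pop(self) -> Optional[T]:
        """Remove and return the top item, or None if the stack is empty."""


class SequentialStack(Stack[T]):
    """Single-threaded reference implementation.

    Its behaviors are exactly the legal sequential behaviors of the ADT,
    so it doubles as an executable sequential specification.
    """

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> Optional[T]:
        return self._items.pop() if self._items else None
```

The abstract class cannot be instantiated on its own; only an implementation that supplies every method of the signature can.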
Implementing High-Level ADT
• Using lower-level ADTs & methods
Lower-Level Operations
• High-level operations translate into primitives on base objects
  – Obvious: read, write (restrictions?)
  – Common: compare&swap (CAS)
  – LL/SC, Double-CAS (2CAS, DCAS), kCAS, …
  – Generic: read-modify-write (RMW), kRMW
• Low-level operations are often implemented from more primitive operations
  – A hierarchy of implementations
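The semantics of these primitives can be sketched in Python as follows. This is only a model under stated assumptions: a `threading.Lock` stands in for the atomicity that hardware gives CAS and RMW, and `BaseObject` and `fetch_and_increment` are hypothetical names. The last function illustrates the implementation hierarchy: a higher-level operation built from read and CAS.

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class BaseObject(Generic[T]):
    """A shared base object offering read, write, CAS and generic RMW.

    The lock only simulates the atomicity that hardware provides for these
    primitives; it is not how CAS is actually implemented.
    """

    def __init__(self, value: T) -> None:
        self._value = value
        self._lock = threading.Lock()

    def read(self) -> T:
        with self._lock:
            return self._value

    def write(self, value: T) -> None:
        with self._lock:
            self._value = value

    def compare_and_swap(self, expected: T, new: T) -> bool:
        """Atomically: if the value equals `expected`, set it to `new`; report success."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def read_modify_write(self, f: Callable[[T], T]) -> T:
        """Generic RMW: atomically replace the value v by f(v) and return v."""
        with self._lock:
            old = self._value
            self._value = f(old)
            return old


def fetch_and_increment(counter: "BaseObject[int]") -> int:
    """Higher-level operation built from read + CAS: retry until the CAS succeeds."""
    while True:
        old = counter.read()
        if counter.compare_and_swap(old, old + 1):
            return old
```

Four threads each incrementing 1000 times leave the counter at 4000, since a CAS that loses a race simply retries with a fresh read.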
Executing Operations
[Figure: processes P1, P2, P3 executing operations; each operation spans from its invocation to its response]
Interleaving Operations
Concurrent (interleaved) execution
(External behavior)
Interleaving Operations, or Not
Almost complete non-interleaved execution
• Sequential behavior: invocations and responses alternate and match (on process and object)
• Sequential specification: all the legal sequential behaviors, satisfying the semantics of the ADT
  – E.g., for a (LIFO) stack: pop returns the most recently pushed item that has not yet been popped
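One way to make a sequential specification executable is a legality check that replays a sequential behavior. The sketch below assumes operations are recorded as `(method, argument, result)` triples and that a pop on an empty stack returns `None`; both conventions are ours, not the slides'.

```python
from typing import List, Optional, Sequence, Tuple

# One operation of a sequential behavior: (method, argument, result).
# push carries its argument; pop carries the value it returned (None = "empty").
Op = Tuple[str, Optional[int], Optional[int]]


def is_legal_stack_behavior(ops: Sequence[Op]) -> bool:
    """Check a sequential behavior against the LIFO stack specification.

    Replays the operations on a plain list and compares each pop's
    recorded result with the item that is actually on top.
    """
    items: List[Optional[int]] = []
    for method, arg, result in ops:
        if method == "push":
            items.append(arg)
        elif method == "pop":
            top = items.pop() if items else None
            if result != top:
                return False
        else:
            raise ValueError(f"unknown method {method!r}")
    return True
```

For example, push(4), push(7), pop():7 is legal, while push(4), push(7), pop():4 is not.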
Correctness: Sequential consistency
[Lamport, 1979]
• For every concurrent execution there is a sequential execution that
  – Contains the same operations
  – Is legal (obeys the sequential specification)
  – Preserves the order of operations by the same process
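For tiny executions, sequential consistency can be checked by brute force: enumerate every merge of the per-process histories that keeps each process's order, and look for a legal one. This is an illustrative sketch with hypothetical histories (not the ones in the figures), using the same `(method, argument, result)` convention with `None` for "empty"; its cost grows exponentially, so it is only useful for small examples.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

Op = Tuple[str, Optional[int], Optional[int]]  # (method, argument, result); None = "empty"


def interleavings(histories: Sequence[List[Op]]) -> Iterator[List[Op]]:
    """Yield every merge of the per-process histories that keeps each process's order."""
    if not any(histories):
        yield []
        return
    for p, history in enumerate(histories):
        if history:
            rest = list(histories)
            rest[p] = history[1:]
            for tail in interleavings(rest):
                yield [history[0]] + tail


def legal_stack(ops: Sequence[Op]) -> bool:
    """LIFO stack specification: each pop returns the current top (or None)."""
    items: List[Optional[int]] = []
    for method, arg, result in ops:
        if method == "push":
            items.append(arg)
        elif result != (items.pop() if items else None):
            return False
    return True


def sc_witness(histories: Sequence[List[Op]],
               legal: Callable[[Sequence[Op]], bool] = legal_stack) -> Optional[List[Op]]:
    """Return a legal sequential execution preserving per-process order, if one exists."""
    for candidate in interleavings(histories):
        if legal(candidate):
            return candidate
    return None
```

With P1 = push(4), pop():7 and P2 = push(7), the witness is push(4), push(7), pop():7. If P1's pop instead returns empty, no legal merge exists, so that execution is not sequentially consistent.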
Sequential Consistency: Examples
[Figure: concurrent (LIFO) stack execution with push(4), pop():4 and push(7), next to a Last In First Out sequential execution push(4), pop():4, push(7)]
[Figure: concurrent (LIFO) stack execution with push(4), pop():7 and push(7); Last In First Out]
Example 1: Treiber’s stack

Pop(Stack S)
  do forever
    top := S.top
    if top = null
      return empty
    if compare&swap(S, top, top.next)
      return top.val
  od
[Figure: linked list Top → (val, next) → (val, next) → … → (val, next)]
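A Python sketch of Treiber's stack, assuming a lock-protected `_cas_top` method as a stand-in for a hardware compare&swap on the top pointer. `pop` follows the slide's pseudocode; `push` is not shown on the slide and is added here in the usual Treiber style (link the new node to the current top, then CAS the top to it). Returning `None` for "empty" is an assumption of this sketch.

```python
import threading
from typing import Any, Optional


class Node:
    __slots__ = ("val", "next")

    def __init__(self, val: Any, next: Optional["Node"]) -> None:
        self.val = val
        self.next = next


class TreiberStack:
    """Treiber's stack: a linked list whose top pointer is changed only by CAS."""

    def __init__(self) -> None:
        self.top: Optional[Node] = None
        self._cas_lock = threading.Lock()  # simulates the atomicity of a hardware CAS

    def _cas_top(self, expected: Optional[Node], new: Optional[Node]) -> bool:
        with self._cas_lock:
            if self.top is expected:  # compare pointers (object identity), not values
                self.top = new
                return True
            return False

    def push(self, val: Any) -> None:
        node = Node(val, None)
        while True:
            top = self.top
            node.next = top
            if self._cas_top(top, node):
                return

    def pop(self) -> Any:
        while True:  # "do forever"
            top = self.top
            if top is None:
                return None  # "return empty"
            if self._cas_top(top, top.next):
                return top.val  # the successful CAS orders this pop
```

Because Python nodes are never reused while a thread still references them, the ABA problem that affects implementations with manual memory reclamation does not arise in this sketch.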
Treiber’s Stack: Proof
• Create sequential execution:
  – Place pop operations in the order of their successful compare&swap primitives
• A pop may get stuck (its CAS can fail forever while other operations succeed) & it uses CAS
Example 2: Multi-Writer Registers
• Add logical time (Lamport timestamps) to values

Write(v,X)
  read TS1,..., TSn
  TSi = max TSj + 1
  write v,TSi

Read(X)   (read only own value)
  read v,TSi
  return v

Once in a while: read TS1,..., TSn and write to TSi
Using (multi-reader) single-writer registers
Need to ensure writes are eventually visible
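The algorithm can be simulated step by step in Python, with `regs[i]` playing the single-writer register of process i. Two details are assumptions of this sketch: timestamps are `(counter, writer id)` pairs so that concurrent writers with equal counters are still ordered, and the "once in a while" step is read as copying the entry with the largest timestamp into the process's own register.

```python
from typing import Any, List, NamedTuple, Tuple

Timestamp = Tuple[int, int]  # (counter, writer id); the id breaks ties between writers


class Entry(NamedTuple):
    value: Any
    ts: Timestamp


class SCMultiWriterRegister:
    """Step-by-step sequential simulation of the sequentially consistent algorithm.

    regs[i] models the single-writer register of process i: only process i
    writes it, while every process may read it.
    """

    def __init__(self, n: int, initial: Any = None) -> None:
        self.regs: List[Entry] = [Entry(initial, (0, -1)) for _ in range(n)]

    def write(self, i: int, value: Any) -> None:
        counter = max(e.ts[0] for e in self.regs) + 1  # read TS1, ..., TSn
        self.regs[i] = Entry(value, (counter, i))      # write v, TSi

    def read(self, i: int) -> Any:
        return self.regs[i].value  # read only own value: may be stale

    def refresh(self, i: int) -> None:
        """'Once in a while': adopt the entry with the largest timestamp."""
        latest = max(self.regs, key=lambda e: e.ts)
        if latest.ts > self.regs[i].ts:
            self.regs[i] = latest
```

A read by process 1 right after process 0's write may still return the initial value; sequential consistency allows this, and `refresh` eventually makes the write visible.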
Timestamps
1. The timestamps of two write operations by the same process are ordered
2. If a write operation completes before another one starts, it has a smaller timestamp
Multi-Writer Registers: Proof
• Create sequential execution:
  – Place writes in timestamp order
  – Insert reads after the appropriate write
• Legality is immediate
• Per-process order is preserved since a read returns a value (with timestamp) at least as large as that of the preceding write by the same process
The Happened-Before Relation
a → b means that event a happened before event b:
• If a and b are events by the same process and a occurs before b, then a → b
• If event b obtains information from event a, then a → b
  – Usually defined through message passing
  – But can be extended to read / write
• Transitive closure: if a → b and b → c, then a → c
If events a and b by different processes do not exchange information (directly or indirectly), then neither a → b nor b → a is true
Timestamps Capture the Happened-Before Relation
For timestamps generated as in the previous algorithm:
• If a → b then TS(a) < TS(b)
• But not vice versa: we can have TS(a) < TS(b) without a → b
• Need to use vector timestamps to capture happened-before exactly
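The gap between Lamport timestamps and happened-before, and how vector timestamps close it, can be seen in a small message-passing sketch (the slides also allow read/write; message passing keeps the example short). The `Process` class and the scenario are hypothetical.

```python
from typing import List, Tuple

Stamp = Tuple[int, Tuple[int, ...]]  # (Lamport timestamp, vector timestamp)


class Process:
    """Keeps a Lamport clock and a vector clock side by side."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.lamport = 0
        self.vector: List[int] = [0] * n

    def local_event(self) -> Stamp:
        self.lamport += 1
        self.vector[self.pid] += 1
        return self.lamport, tuple(self.vector)

    def send(self) -> Stamp:
        return self.local_event()  # the message carries the sender's stamps

    def receive(self, msg: Stamp) -> Stamp:
        lamport, vector = msg
        self.lamport = max(self.lamport, lamport) + 1
        self.vector = [max(a, b) for a, b in zip(self.vector, vector)]
        self.vector[self.pid] += 1
        return self.lamport, tuple(self.vector)


def happened_before(u: Tuple[int, ...], v: Tuple[int, ...]) -> bool:
    """Vector order: u <= v componentwise and u != v."""
    return all(a <= b for a, b in zip(u, v)) and u != v
```

Process 1's first event gets Lamport timestamp 1 and process 0's second event gets 2, yet the two are concurrent; their vector timestamps (0, 1) and (2, 0) are incomparable. After process 0 sends to process 1, the vector timestamp of the send is below that of the receive.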
Causality Captures the Essence of the Computation
• If two executions have the same happened-before relation
  – They disagree only on the order of events a and b such that neither a → b nor b → a
• Then the executions are indistinguishable to the processes
  – Each process obtains the same results when invoking primitives on the base objects
Sequential Consistency is not Composable
P1: enq(Q1,R)  enq(Q2,R)  deq(Q1,G)
P2: enq(Q2,G)  enq(Q1,G)  deq(Q2,R)
The execution is not sequentially consistent
Q1: P1: enq(Q1,R) deq(Q1,G); P2: enq(Q1,G)
Q2: P1: enq(Q2,R); P2: enq(Q2,G) deq(Q2,R)
The execution projected on each object is sequentially consistent
• Must have a common object broker
• Not modular
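The non-composability example can be verified by brute force, taking the first three operations as process P1's and the last three as process P2's (our reading of the slide) and assuming FIFO queue semantics. The checker enumerates every merge that preserves each process's order; all names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterator, List, Sequence, Tuple

Op = Tuple[str, str, str]  # (queue, "enq" or "deq", value enqueued / value returned)


def interleavings(histories: Sequence[List[Op]]) -> Iterator[List[Op]]:
    """Every merge of the per-process histories that keeps each process's order."""
    if not any(histories):
        yield []
        return
    for p, history in enumerate(histories):
        if history:
            rest = list(histories)
            rest[p] = history[1:]
            for tail in interleavings(rest):
                yield [history[0]] + tail


def legal(seq: Sequence[Op]) -> bool:
    """FIFO queue specification, applied to each queue independently."""
    queues: Dict[str, Deque[str]] = {}
    for q, method, value in seq:
        items = queues.setdefault(q, deque())
        if method == "enq":
            items.append(value)
        elif not items or items.popleft() != value:
            return False
    return True


def sequentially_consistent(histories: Sequence[List[Op]]) -> bool:
    return any(legal(seq) for seq in interleavings(histories))


def project(histories: Sequence[List[Op]], queue: str) -> List[List[Op]]:
    return [[op for op in h if op[0] == queue] for h in histories]


P1 = [("Q1", "enq", "R"), ("Q2", "enq", "R"), ("Q1", "deq", "G")]
P2 = [("Q2", "enq", "G"), ("Q1", "enq", "G"), ("Q2", "deq", "R")]
```

For the whole execution, deq(Q1) returning G forces enq(Q1,G) before enq(Q1,R), and deq(Q2) returning R forces enq(Q2,R) before enq(Q2,G); together with each process's order this forms a cycle, so no legal merge exists, while each projection has one.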
Correctness: Linearizability
[Herlihy & Wing, 1990]
• For every concurrent execution there is a sequential execution that
  – Contains the same operations
  – Is legal (obeys the specification of the ADTs)
  – Preserves the real-time order of non-overlapping operations
• Each operation appears to take effect instantaneously at some point between its invocation and its response (atomicity)
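The only difference between the two conditions is which pairs of operations must keep their order. The brute-force sketch below (hypothetical histories, with times as plain numbers and `None` for "empty") encodes that difference in `must_precede`: sequential consistency constrains only operations of the same process, while linearizability constrains every pair of non-overlapping operations.

```python
from itertools import permutations
from typing import NamedTuple, Optional, Sequence


class Op(NamedTuple):
    proc: str
    invoke: float         # invocation time
    respond: float        # response time
    method: str           # "push" or "pop"
    value: Optional[int]  # pushed value, or value returned by pop (None = "empty")


def legal_stack(seq: Sequence[Op]) -> bool:
    items = []
    for op in seq:
        if op.method == "push":
            items.append(op.value)
        elif op.value != (items.pop() if items else None):
            return False
    return True


def must_precede(a: Op, b: Op, real_time: bool) -> bool:
    """True if a must come before b in any witness sequence."""
    if a.respond >= b.invoke:
        return False  # overlapping (or b first): no constraint
    return real_time or a.proc == b.proc


def has_witness(history: Sequence[Op], real_time: bool) -> bool:
    """Brute force over all orders: only feasible for tiny histories."""
    for seq in permutations(history):
        respects_order = all(not must_precede(later, earlier, real_time)
                             for i, later in enumerate(seq)
                             for earlier in seq[:i])
        if respects_order and legal_stack(seq):
            return True
    return False
```

A pop that returns 7 and completes before push(7) is invoked is sequentially consistent (the operations belong to different processes, so their order may be swapped) but not linearizable; if the two operations overlap, the history is linearizable.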
Linearizability: Examples
[Figure: concurrent (LIFO) stack execution with push(4), pop():4 and push(7), next to a Last In First Out sequential execution push(4), pop():4, push(7)]
Example 3: Linearizable Multi-Writer Registers
• Add logical time to values

Write(v,X)
  read TS1,..., TSn
  TSi = max TSj + 1
  write v,TSi

Read(X)
  read TS1,...,TSn
  return value with max TS
Using (multi-reader) single-writer registers [Vitanyi & Awerbuch, 1987]
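A Python simulation of the linearizable version, under assumptions of this sketch: `(counter, writer id)` timestamps break ties between concurrent writers, and a write is split into `begin_write` (read all timestamps) and `finish_write` (write value and timestamp) so that two writers can be interleaved. The method names are hypothetical.

```python
from typing import Any, List, Tuple

Timestamp = Tuple[int, int]  # (counter, writer id); the id breaks ties (assumption)


class LinearizableMWRegister:
    """Sequential simulation; regs[i] is process i's single-writer register."""

    def __init__(self, n: int, initial: Any = None) -> None:
        self.regs: List[Tuple[Any, Timestamp]] = [(initial, (0, -1))] * n

    # Write(v, X) is split into its two phases so that a caller can
    # interleave the phases of concurrent writers.
    def begin_write(self) -> int:
        return max(ts[0] for _, ts in self.regs) + 1  # read TS1, ..., TSn

    def finish_write(self, i: int, value: Any, counter: int) -> None:
        self.regs[i] = (value, (counter, i))  # write v, TSi

    def write(self, i: int, value: Any) -> None:
        self.finish_write(i, value, self.begin_write())

    def read(self) -> Any:
        """Read(X): read all registers and return the value with the largest timestamp."""
        value, _ = max(self.regs, key=lambda entry: entry[1])
        return value
```

Two writers that start concurrently can pick the same counter; the writer id orders them, so later reads return the value of the writer with the larger id until a newer write lands.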
Multi-writer registers: Linearization order
• Create linearization:
  – Place writes in timestamp order
  – Insert each read after the appropriate write
Multi-Writer Registers: Proof
• Legality is immediate
• Real-time order is preserved since a read returns a value with a timestamp at least as large as those of all operations that completed before it started
Linearizability is Composable
• The whole system is linearizable if and only if each object is linearizable
• Allows each object to be implemented and verified separately
Example 4: Atomic Snapshot
• n components
• Update a single component
• Scan all the components “at once” (atomically)
Provides an instantaneous view of the whole memory
update → ok
scan → v1,…,vn
Atomic Snapshot Algorithm
Update(v,k)
  A[k] = v, seqi, i

Scan()
  repeat
    read A[1],…,A[n]
    read A[1],…,A[n]     (double collect)
    if equal
      return A[1,…,n]

Linearize:
• Updates with their writes
• Scans inside the double collects

[Afek, Attiya, Dolev, Gafni, Merritt, Shavit, JACM 1993]
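A sequential simulation of the double-collect scan, assuming component k is written only by process k and that each update's sequence number makes its write distinct. The optional `between_collects` hook (hypothetical, for testing) lets a caller run updates between the two collects; interference inside a single collect is not modeled.

```python
from typing import Any, Callable, List, Optional, Tuple

Cell = Tuple[Any, int]  # (value, sequence number): every write stores a fresh seq#


class DoubleCollectSnapshot:
    """Sequential simulation of the double-collect atomic snapshot (not wait-free)."""

    def __init__(self, n: int) -> None:
        self.A: List[Cell] = [(None, 0)] * n

    def update(self, k: int, v: Any) -> None:
        _, seq = self.A[k]
        self.A[k] = (v, seq + 1)  # a single write: the update's linearization point

    def collect(self) -> List[Cell]:
        return [self.A[k] for k in range(len(self.A))]  # n separate reads

    def scan(self, between_collects: Optional[Callable[[int], None]] = None) -> List[Any]:
        attempt = 0
        while True:
            first = self.collect()
            if between_collects is not None:
                between_collects(attempt)  # testing hook: run other operations here
            second = self.collect()
            if first == second:  # no write between the collects: a "safe zone"
                return [value for value, _ in second]
            attempt += 1
```

An update between the collects forces a retry; a scan that is interfered with on every attempt never returns, which is why this version is not wait-free.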
Atomic Snapshot: Linearizability
• Double collect (read a set of values twice)
• If equal, there is no write between the collects
  – Assuming each write has a new value (seq#)
• Creates a “safe zone”, where the scan can be linearized
[Figure: two collects read A[1],…,A[n]; a write A[j] occurring between them]
Liveness Conditions (Eventual)
• Wait-free: every operation completes within a finite number of (its own) steps
  – Analog of no starvation for mutual exclusion
• Nonblocking: some operation completes within a finite number of steps (possibly of some other process)
  – Analog of deadlock-freedom for mutual exclusion
• Obstruction-free: an operation (eventually) running solo completes within a finite number of (its own) steps
  – Also called solo termination
wait-free ⇒ nonblocking ⇒ obstruction-free
Liveness Conditions (Bounded)
• Wait-free: every operation completes within a bounded number of (its own) steps
  – Analog of no starvation for mutual exclusion
• Nonblocking: some operation completes within a bounded number of steps (possibly of some other process)
  – Analog of deadlock-freedom for mutual exclusion
• Obstruction-free: an operation (eventually) running solo completes within a bounded number of (its own) steps
  – Also called solo termination
bounded wait-free ⇒ bounded nonblocking ⇒ bounded obstruction-free
Wait-free Atomic Snapshot [Afek, Attiya, Dolev, Gafni, Merritt, Shavit, JACM 1993]
• Embed a scan within the Update.
Update(v,k)
  V = scan
  A[k] = v, seqi, i, V

Scan()
  repeat
    read A[1],…,A[n]
    read A[1],…,A[n]
    if equal
      return A[1,…,n]                  (direct scan)
    else record diff
      if pj changed twice, return Vj   (borrowed scan)

Linearize:
• Updates with their writes
• Direct scans as before
• Borrowed scans in place of the embedded scan they return
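The wait-free version differs from the double-collect scan only in the embedded scan and the borrowing rule. This is again a sequential simulation under assumptions of our own: component k belongs to process k, `between_collects` is a hypothetical testing hook, and `moved` records which processes were seen changing their component.

```python
from typing import Any, Callable, List, NamedTuple, Optional, Set


class Cell(NamedTuple):
    value: Any
    seq: int
    view: List[Any]  # snapshot taken by the embedded scan of the update that wrote this cell


class WaitFreeSnapshot:
    """Sequential simulation of the wait-free snapshot with embedded scans."""

    def __init__(self, n: int) -> None:
        self.A: List[Cell] = [Cell(None, 0, [None] * n) for _ in range(n)]

    def collect(self) -> List[Cell]:
        return [self.A[k] for k in range(len(self.A))]

    def update(self, k: int, v: Any) -> None:
        view = self.scan()  # embed a scan within the update
        self.A[k] = Cell(v, self.A[k].seq + 1, view)

    def scan(self, between_collects: Optional[Callable[[int], None]] = None) -> List[Any]:
        moved: Set[int] = set()
        attempt = 0
        while True:
            first = self.collect()
            if between_collects is not None:
                between_collects(attempt)  # testing hook: run other operations here
            second = self.collect()
            changed = [j for j in range(len(self.A)) if first[j].seq != second[j].seq]
            if not changed:
                return [cell.value for cell in second]  # direct scan
            for j in changed:
                if j in moved:
                    return list(second[j].view)  # borrowed scan: pj moved twice
                moved.add(j)
            attempt += 1
```

Each unsuccessful double collect either returns or adds a new process to `moved`, so a scan performs at most n + 1 double collects. In the test, process 1 updates during both attempts, so the scan returns the view embedded in its second update, a scan that ran entirely inside the scanner's interval.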
Atomic Snapshot: Borrowed Scans
Interference by process pj
And another one… pj does a scan in between
Linearizing with the borrowed scan is OK.
[Figure: the scanner's collects read A[j] while pj writes A[j]; before its second write, pj runs an embedded scan that lies between the scanner's reads]