conflict-free replicated data types

67
Conflict-free Replicated Data Types MARC SHAPIRO, NUNO PREGUIÇA, CARLOS BAQUERO AND MAREK ZAWIRSKI Presented by: Ron Zisman

Upload: nowles

Post on 25-Feb-2016

65 views

Category:

Documents


0 download

DESCRIPTION

Conflict-free Replicated Data Types. Presented by: Ron Zisman. Marc Shapiro, Nuno Preguiça , Carlos Baquero and Marek Zawirski. Replication and Consistency - essential features of large distributed systems such as www, p2p, and cloud computing Lots of replicas - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Conflict-free Replicated Data Types

Conflict-free Replicated Data Types

MARC SHAPIRO, NUNO PREGUIÇA, CARLOS BAQUERO AND MAREK ZAWIRSKI

Presented by: Ron Zisman

Page 2: Conflict-free Replicated Data Types

2Motivation

Replication and Consistency - essential features of large distributed systems such as www, p2p, and cloud computing

Lots of replicas Great for fault-tolerance and read latency× Problematic when updates occur

• Slow synchronization• Conflicts in case of no synchronization

Page 3: Conflict-free Replicated Data Types

3Motivation

We look for an approach that: supports Replication guarantees Eventual Consistency is Fast and Simple

Conflict-free objects = no synchronization whatsoever

Is this practical?

Page 4: Conflict-free Replicated Data Types

4Contributions

TheoryStrong Eventual Consistency (SEC)

A solution to the CAP problem Formal definitions Two sufficient conditions Strong equivalence between

the two Incomparable to sequential

consistency

PracticeCRDTs = Convergent or Commutative Replicated Data Types

Counters Set Directed graph

Page 5: Conflict-free Replicated Data Types

5Strong Consistency

Ideal consistency: all replicas know about the update immediately after it executes

Preclude conflicts Replicas update in the same

total order Any deterministic object

Consensus Serialization bottleneck Tolerates < n/2 faults Correct, but doesn’t scale

Page 6: Conflict-free Replicated Data Types

6Strong Consistency

Ideal consistency: all replicas know about the update immediately after it executes

Preclude conflicts Replicas update in the same

total order Any deterministic object

Consensus Serialization bottleneck Tolerates < n/2 faults Correct, but doesn’t scale

Page 7: Conflict-free Replicated Data Types

7Strong Consistency

Ideal consistency: all replicas know about the update immediately after it executes

Preclude conflicts Replicas update in the same

total order Any deterministic object

Consensus Serialization bottleneck Tolerates < n/2 faults Correct, but doesn’t scale

Page 8: Conflict-free Replicated Data Types

8Strong Consistency

Ideal consistency: all replicas know about the update immediately after it executes

Preclude conflicts Replicas update in the same

total order Any deterministic object

Consensus Serialization bottleneck Tolerates < n/2 faults Correct, but doesn’t scale

Page 9: Conflict-free Replicated Data Types

9Strong Consistency

Ideal consistency: all replicas know about the update immediately after it executes

Preclude conflicts Replicas update in the same

total order Any deterministic object

Consensus Serialization bottleneck Tolerates < n/2 faults Correct, but doesn’t scale

Page 10: Conflict-free Replicated Data Types

10Eventual Consistency

Update local and propagate No foreground synch Eventual, reliable delivery

On conflict Arbitrate Roll back

Consensus moved to background Better performance× Still complex

Page 11: Conflict-free Replicated Data Types

11Eventual Consistency

Update local and propagate No foreground synch Eventual, reliable delivery

On conflict Arbitrate Roll back

Consensus moved to background Better performance× Still complex

Page 12: Conflict-free Replicated Data Types

12Eventual Consistency

Update local and propagate No foreground synch Eventual, reliable delivery

On conflict Arbitrate Roll back

Consensus moved to background Better performance× Still complex

Page 13: Conflict-free Replicated Data Types

13Eventual Consistency

Update local and propagate No foreground synch Eventual, reliable delivery

On conflict Arbitrate Roll back

Consensus moved to background Better performance× Still complex

Page 14: Conflict-free Replicated Data Types

14Eventual Consistency

Update local and propagate No foreground synch Eventual, reliable delivery

On conflict Arbitrate Roll back

Consensus moved to background Better performance× Still complex

Page 15: Conflict-free Replicated Data Types

15Eventual Consistency

Update local and propagate No foreground synch Eventual, reliable delivery

On conflict Arbitrate Roll back

Consensus moved to background Better performance× Still complex

Page 16: Conflict-free Replicated Data Types

16Eventual Consistency

Reconcile

Update local and propagate No foreground synch Eventual, reliable delivery

On conflict Arbitrate Roll back

Consensus moved to background Better performance× Still complex

Page 17: Conflict-free Replicated Data Types

17Strong Eventual Consistency

Update local and propagate No synch Eventual, reliable delivery

No conflict deterministic outcome of

concurrent updates No consensus: ≤ n-1 faults Solves the CAP problem

Page 18: Conflict-free Replicated Data Types

18Strong Eventual Consistency

Update local and propagate No synch Eventual, reliable delivery

No conflict deterministic outcome of

concurrent updates No consensus: ≤ n-1 faults Solves the CAP problem

Page 19: Conflict-free Replicated Data Types

19Strong Eventual Consistency

Update local and propagate No synch Eventual, reliable delivery

No conflict deterministic outcome of

concurrent updates No consensus: ≤ n-1 faults Solves the CAP problem

Page 20: Conflict-free Replicated Data Types

20Strong Eventual Consistency

Update local and propagate No synch Eventual, reliable delivery

No conflict deterministic outcome of

concurrent updates No consensus: ≤ n-1 faults Solves the CAP problem

Page 21: Conflict-free Replicated Data Types

21Strong Eventual Consistency

Update local and propagate No synch Eventual, reliable delivery

No conflict deterministic outcome of

concurrent updates No consensus: ≤ n-1 faults Solves the CAP problem

Page 22: Conflict-free Replicated Data Types

22Definition of EC

Eventual delivery: An update delivered at some correct replica is eventually delivered to all correct replicas

Termination: All method executions terminate

Convergence: Correct replicas that have delivered the same updates eventually reach equivalent state Doesn’t preclude roll backs and reconciling

Page 23: Conflict-free Replicated Data Types

23Definition of SEC

Eventual delivery: An update delivered at some correct replica is eventually delivered to all correct replicas

Termination: All method executions terminate

Strong Convergence: Correct replicas that have delivered the same updates have equivalent state

Page 24: Conflict-free Replicated Data Types

24

System model

System of non-byzantine processes interconnected by an asynchronous network

Partition-tolerance and recovery

What are the two simple conditions that guarantee strong convergence?

Page 25: Conflict-free Replicated Data Types

25Query

Client sends the query to any of the replicas Local at source replica Evaluate synchronously, no side effects

Page 26: Conflict-free Replicated Data Types

26Query

Client sends the query to any of the replicas Local at source replica Evaluate synchronously, no side effects

Page 27: Conflict-free Replicated Data Types

27Query

Client sends the query to any of the replicas Local at source replica Evaluate synchronously, no side effects

Page 28: Conflict-free Replicated Data Types

28State-based approach

An object is a tuple

Local queries, local updates Send full state; on receive, merge

Update is said ‘delivered’ at some replica when it is included in its casual history

Causal History:

payload setinitial state quer

y

updatemerge

Page 29: Conflict-free Replicated Data Types

29State-based replication

Local at source .u(a), .u(b), … Precondition, compute Update local payload

Causal History: on query: on update:

Page 30: Conflict-free Replicated Data Types

30State-based replication

Local at source .u(a), .u(b), … Precondition, compute Update local payload

Causal History: on query: on update:

Page 31: Conflict-free Replicated Data Types

31State-based replication

Local at source .u(a), .u(b), … Precondition, compute Update local payload

Convergence Episodically: send payload On delivery: merge payloads

Causal History: on query: on update: on merge:

Page 32: Conflict-free Replicated Data Types

32State-based replication

Local at source .u(a), .u(b), … Precondition, compute Update local payload

Convergence Episodically: send payload On delivery: merge payloads

Causal History: on query: on update: on merge:

Page 33: Conflict-free Replicated Data Types

33State-based replication

Local at source .u(a), .u(b), … Precondition, compute Update local payload

Convergence Episodically: send payload On delivery: merge payloads

Causal History: on query: on update: on merge:

Page 34: Conflict-free Replicated Data Types

34Semi-lattice

A poset is a join-semilattice if for all x,y in S a LUB exists

LUB = Least Upper Bound Associative: Commutative: Idempotent:

Examples:

Page 35: Conflict-free Replicated Data Types

35State-based: monotonic semi-lattice CvRDT

If:

then replicas converge to LUB of last values

payload type forms a semi-lattice updates are increasing merge computes Least Upper Bound

Page 36: Conflict-free Replicated Data Types

36Operation-based approach

An object is a tuple

prepare-update Precondition at source 1st phase: at source, synchronous, no side effects

effect-update Precondition against downstream state (P) 2nd phase, asynchronous, side-effects to downstream state

payload setinitial state quer

yprepare-update

effect-updatedelivery precondition

Page 37: Conflict-free Replicated Data Types

37Operation-based replication

Local at source Precondition, compute Broadcast to all replicas

Causal History: on query/prepare-update:

Page 38: Conflict-free Replicated Data Types

38Operation-based replication

Local at source Precondition, compute Broadcast to all replicas

Eventually, at all replicas: Downstream precondition Assign local replica

Causal History: on query/prepare-update: on effect-update:

Page 39: Conflict-free Replicated Data Types

39Operation-based replication

Local at source Precondition, compute Broadcast to all replicas

Eventually, at all replicas: Downstream precondition Assign local replica

Causal History: on query/prepare-update: on effect-update:

Page 40: Conflict-free Replicated Data Types

40Op-based: commute CmRDT

If:

then replicas converge

Liveness: all replicas execute all operations in delivery order where the downstream precondition (P) is true

Safety: concurrent operations all commute

Page 41: Conflict-free Replicated Data Types

41Monotonic semi-lattice

Commutative

A state-based object can emulate an operation-based object, and vice-versa

Use state-based reasoning and then covert to operation based for better efficiency

Page 42: Conflict-free Replicated Data Types

42Comparison

State-based Update ≠ merge

operation Simple data types State includes preceding

updates; no separate historical information

Inefficient if payload is large

File systems (NFS, Dynamo)

Operation-based Update operation Higher level, more

complex More powerful, more

constraining Small messages Collaborative editing

(Treedoc), Bayou, PNUTS

State-based or op-based, as convenient

Page 43: Conflict-free Replicated Data Types

43SEC is incomparable to sequential consistency

There is a SEC object that is not sequentially-consistent:

Consider a Set CRDT S with operations add(e) and remove(e) remove(e) → add(e) e ∈ S add(e) ║ remove(e’) e ∈ S ∧ e’ S add(e) ║ remove(e) e ∈ S (suppose add wins)

Consider the following scenario with replicas: , 1. [add(e); remove(e’)] ║ [add(e’); remove(e)] 2. merges the states from and : e ∈ S ∧ e’∈ S

The state of replica will never occur in a sequentially-consistent execution (either remove(e) or remove(e’) must be last)

Page 44: Conflict-free Replicated Data Types

44SEC is incomparable to sequential consistency

There is a sequentially-consistent object that is not SEC: If no crashes occur, a sequentially-consistent object is

SEC Generally, sequential consistency requires consensus to

determine the single order of operations – cannot be solved if n-1 crashes occur (while SEC can tolerate n-1 crashes)

Page 45: Conflict-free Replicated Data Types

45

Example CRDTs

Multi-master counter

Observed-Remove Set

Directed Graph

Page 46: Conflict-free Replicated Data Types

46Multi-master counter

Increment Payload: Partial order: value() = increment() = ++

merge(x,y) = =

Page 47: Conflict-free Replicated Data Types

47Multi-master counter

Increment / Decrement Payload: , Partial order: value() = - increment() = ++ decrement() = ++ merge(x,y) = = (

)

Page 48: Conflict-free Replicated Data Types

48Set design alternatives

Sequential specification: {true} add(e) {e ∈ S} {true} remove(e) {e ∈ S}

Concurrent: {true} add(e) ║ remove(e) {???} linearizable? error state? last writer wins? add wins? remove wins?

Page 49: Conflict-free Replicated Data Types

49Observed-Remove Set

Page 50: Conflict-free Replicated Data Types

50Observed-Remove Set

Payload: added, removed (element, unique token) add(e) =

Page 51: Conflict-free Replicated Data Types

51Observed-Remove Set

Payload: added, removed (element, unique token) add(e) =

Page 52: Conflict-free Replicated Data Types

52Observed-Remove Set

Payload: added, removed (element, unique token) add(e) =

Page 53: Conflict-free Replicated Data Types

53Observed-Remove Set

Payload: added, removed (element, unique token) add(e) =

Page 54: Conflict-free Replicated Data Types

54Observed-Remove Set

Payload: added, removed (element, unique token) add(e) = Remove all unique elements observed: remove(e) = lookup(e) = merge(S,S’) =

Page 55: Conflict-free Replicated Data Types

55Observed-Remove Set

Payload: added, removed (element, unique token) add(e) = Remove all unique elements observed: remove(e) = lookup(e) = merge(S,S’) =

Page 56: Conflict-free Replicated Data Types

56Observed-Remove Set

Payload: added, removed (element, unique token) add(e) = Remove all unique elements observed: remove(e) = lookup(e) = merge(S,S’) =

Page 57: Conflict-free Replicated Data Types

57OR-Set + Snapshot

Read consistent snapshots despite concurrent, incremental updates

Vector clock for each process (global time) Payload: a set of (event, timestamp) pairs Snapshot: vector clock value lookup(e,t):

Garbage Collection: retain tombstones until not needed log entry discarded as soon as its timestamp is less than all

remote vector clocks (delivered to all processes)

Page 58: Conflict-free Replicated Data Types

58Sharded OR-Set

Very large objects Independent shards Static: hash, Dynamic: consensus

Statically-Sharded CRDT Each shard is a CRDT Update: single shard No cross-object invariants A combination of independent CRDTs remains a CRDT

Statically-Sharded OR-Set Combination of smaller OR-Sets Consistent snapshots: clock cross shards

Page 59: Conflict-free Replicated Data Types

59Directed Graph – Motivation

Design a web search engine compute page rank by a

directed graph Efficiency and

scalability Asynchronous processing

Responsiveness Incremental processing,

as fast as each page is crawled

Operations Find new pages: add vertex Parse page links: add/remove

arc Add URLs of linked pages to

be crawled: add vertex Deleted pages: remove

vertex (lookup masks incident arcs)

Broken links allowed: add arc works even if tail vertex doesn’t exist

Page 60: Conflict-free Replicated Data Types

60Graph design alternatives

Graph = (V,A) where A ⊆ V V Sequential specification:

{v’,v’’ V} addArc(v’,v’’) {…} {(v’,v’’) A} removeVertex(v’) {…}

Concurrent: removeVertex(v) ║ addArc(v’,v’’) linearizable? last writer wins? addArc(v’,v’’) wins? – v’ or v’’ restored if removed removeVertex(v) wins? - all edges to or from v are

removed

Page 61: Conflict-free Replicated Data Types

61Directed Graph (op-based)

Payload: OR-Set V (vertices), OR-Set A (arcs)

Page 62: Conflict-free Replicated Data Types

62Directed Graph (op-based)

Payload: OR-Set V (vertices), OR-Set A (arcs)

Page 63: Conflict-free Replicated Data Types

63Summary

Principled approach Strong Eventual Consistency

Two sufficient conditions: State: monotonic semi-lattice Operation: commutativity

Useful CRDTs Multi-master counter, OR-Set, Directed Graph

Page 64: Conflict-free Replicated Data Types

64Future Work

Theory Class of computations accomplished by CRDTs Complexity classes of CRDTs Classes of invariants supported by a CRDT CRDTs and self-stabilization, aggregation, and so onPractice Library implementation of CRDTs Supporting non-critical synchronous operations

(commiting a state, global reset, etc) Sharding

Page 65: Conflict-free Replicated Data Types

65Extras: MV-Register and the Shopping Cart Anomaly

MV-Register ≈ LWW-Set Register Payload = { (value, versionVector) } assign: overwrite value, vv++ merge: union of every element in each input set that is not

dominated by an element in the other input set A more recent assignment overwrites an older one Concurrent assignments are merged by union (VC merge)

Page 66: Conflict-free Replicated Data Types

66Extras: MV-Register and the Shopping Cart Anomaly

Shopping cart anomaly deleted element reappears

MV-Register does not behave like a set Assignment is not an alternative to proper

add and remove operations

Page 67: Conflict-free Replicated Data Types

67

The problem with eventual consistency jokes is that you can't tell who doesn't get it from who hasn't gotten it.