CAP theorem by Ali Ghodsi
TRANSCRIPT
![Page 1: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/1.jpg)
Life after CAP
![Page 2: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/2.jpg)
CAP conjecture [reminder]
• Can only have two of:
– Consistency
– Availability
– Partition-tolerance
• Examples
– Databases, 2PC, centralized algo (C & A)
– Distributed databases, majority protocols (C & P)
– DNS, Bayou (A & P)
![Page 3: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/3.jpg)
CAP theorem
• Formalization by Gilbert & Lynch
• What does impossible mean?
– There exists an execution which violates one of CAP
– Not possible to guarantee that an algorithm has all three at all times
• Shard data with different CAP tradeoffs
• Detect partitions and weaken consistency
![Page 4: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/4.jpg)
Partition-tolerance & availability
• What is partition-tolerance?
– Consistency and Availability are provided by algo
– Partitions are external events (scheduler/oracle)
• Partition-tolerance is really a failure model
• Partition-tolerance is equivalent to omission failures
• In the CAP theorem
– Proof rests on partitions that never heal
– Datacenters can guarantee recovery of partitions!
• Can guarantee that conflict resolution eventually happens
![Page 5: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/5.jpg)
How do we ensure consistency
• Main technique to be consistent
– Quorum principle
– Example: Majority quorums
• Always write to and read from a majority of nodes
• At least one node knows most recent value
[Figure: WRITE(v) and READ v on 9 nodes; majority(9) = 5]
![Page 6: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/6.jpg)
Quorum Principle
• Majority Quorum
– Pro: tolerate up to ⌈N/2⌉ - 1 crashes
– Con: Have to read/write ⌊N/2⌋ + 1 values
• Read/write quorums (Dynamo, ZooKeeper, Chain Repl)
– Read R nodes, write W nodes, s.t. R + W > N (W > N/2)
– Pro: adjust performance of reads/writes
– Con: availability can suffer
• Maekawa Quorum
– Arrange nodes in an MxM grid
– Write to row+col, read cols (always overlap)
– Pro: Only need to read/write O( sqrt(N) ) nodes
– Con: Tolerate at most O( sqrt(N) ) crashes (reconfiguration)
P1 P2 P3
P4 P5 P6
P7 P8 P9
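The three quorum constructions on this slide can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function names are hypothetical, and grid cells are addressed as (row, col) pairs, which is an assumption about how the P1..P9 grid is indexed.

```python
def majority(n):
    """Smallest node count that is a strict majority of n nodes."""
    return n // 2 + 1


def rw_quorums_ok(n, r, w):
    """Slide condition for read/write quorums: R + W > N and W > N/2."""
    return r + w > n and 2 * w > n


def maekawa_write_quorum(m, row, col):
    """Cells of one full row plus one full column of an m x m grid."""
    return {(row, c) for c in range(m)} | {(r, col) for r in range(m)}


def maekawa_read_quorum(m, col):
    """Cells of one full column of an m x m grid."""
    return {(r, col) for r in range(m)}


if __name__ == "__main__":
    print(majority(9))                         # 5, as on the previous slide
    print(rw_quorums_ok(9, r=2, w=8))          # True: cheap reads, costly writes
    print(len(maekawa_write_quorum(4, 1, 2)))  # 7 nodes out of 16
    # Every write quorum shares a cell with every read quorum,
    # because the written row crosses every column.
    print(all(maekawa_write_quorum(3, r, c) & maekawa_read_quorum(3, k)
              for r in range(3) for c in range(3) for k in range(3)))
```

The last check is why the grid scheme "always overlaps": a written row touches every column, so any column read sees at least one written node.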
![Page 7: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/7.jpg)
Probabilistic Quorums
• Quorum size α√N (α > 1) intersects with probability at least 1 - exp(-α²)
– Example: N=16 nodes, quorum size 7, intersects 95%, tolerates 9 failures
– Maekawa: N=16 nodes, quorum size 7, intersects 100%, tolerates 4 failures
– Pro: Small quorums, high fault-tolerance
– Con: Could fail to intersect, N usually large
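The numbers in the example can be reproduced with a short calculation. The sketch below assumes that each quorum is a uniformly random set of q nodes chosen independently of the others; that sampling model and the function names are assumptions for illustration, not something the slide states.

```python
import math


def intersection_lower_bound(n, q):
    """Bound 1 - exp(-alpha^2) for quorum size q = alpha * sqrt(n)."""
    alpha = q / math.sqrt(n)
    return 1.0 - math.exp(-alpha ** 2)


def exact_intersection_probability(n, q):
    """Probability that two independent uniform q-subsets of n nodes overlap."""
    if 2 * q > n:
        return 1.0  # two such subsets cannot be disjoint
    return 1.0 - math.comb(n - q, q) / math.comb(n, q)


def tolerated_failures(n, q):
    """A quorum of size q can still be formed while at most n - q nodes are down."""
    return n - q


if __name__ == "__main__":
    n, q = 16, 7
    print(round(intersection_lower_bound(n, q), 3))
    print(round(exact_intersection_probability(n, q), 3))
    print(tolerated_failures(n, q))
```

For N=16 and quorum size 7, α = 7/4, so the bound is about 0.95, matching the slide. Under the uniform-sampling assumption the exact overlap probability is higher than the bound, and N - q = 9 failures still leave enough nodes for a quorum.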
![Page 8: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/8.jpg)
Quorums and CAP
• With quorums we can get
– C & P: partition can make quorum unavailable
– C & A: no-partition ensures availability and atomicity
• Faced with a decision when failing to get a quorum [Brewer'11]
– Sacrifice availability by waiting for merger
– Sacrifice atomicity by ignoring the quorum
• Can we get CAP for weaker consistency?
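The decision faced when a quorum cannot be reached can be made concrete with a small sketch. Everything here is hypothetical scaffolding (the `Policy` enum, the `QuorumUnavailable` exception, the `write` function and its return shape); it only illustrates the two choices listed above for a majority quorum.

```python
from enum import Enum


class Policy(Enum):
    WAIT = "sacrifice availability"
    PROCEED = "sacrifice atomicity"


class QuorumUnavailable(Exception):
    """Raised when the caller chose to wait for the partition to heal."""


def write(value, reachable_nodes, n, policy):
    """Attempt a majority-quorum write from one side of a partition."""
    quorum = n // 2 + 1
    reachable = sorted(reachable_nodes)
    if len(reachable) >= quorum:
        # Normal case: a full quorum acknowledges, the write stays atomic.
        return {"value": value, "acked_by": reachable[:quorum], "atomic": True}
    if policy is Policy.WAIT:
        # C & P: refuse to answer until enough nodes are reachable again.
        raise QuorumUnavailable(f"only {len(reachable)} of {quorum} nodes reachable")
    # A & P: answer anyway; other partitions may not see this write.
    return {"value": value, "acked_by": reachable, "atomic": False}


if __name__ == "__main__":
    minority_side = {"P1", "P2", "P3", "P4"}   # 4 of 9 nodes reachable
    print(write(5, minority_side, 9, Policy.PROCEED)["atomic"])  # False
    try:
        write(5, minority_side, 9, Policy.WAIT)
    except QuorumUnavailable as exc:
        print("unavailable:", exc)
```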
![Page 9: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/9.jpg)
What does atomicity really mean?
• Linearization Points
– Read ops appear as if they happened instantaneously at all nodes at some time between invocation and response
– Write ops appear as if they happened instantaneously at all nodes at some time between invocation and response
[Figure: timeline for processes P1, P2, P3 with writes W(5) and W(6) and two reads R; the invocation and response of an operation are marked]
![Page 10: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/10.jpg)
Definition of Atomicity
• Linearization Points
– Read ops appear as if they happened instantaneously at all nodes at some time between invocation and response
– Write ops appear as if they happened instantaneously at all nodes at some time between invocation and response
[Figure: timeline for processes P1, P2, P3 with writes W(5) and W(6) and reads returning R:5 and R:6; labeled atomic]
![Page 11: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/11.jpg)
Definition of Atomicity
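[Figure: timeline for processes P1, P2, P3 with writes W(5) and W(6) and reads returning R:6 and R:6; labeled atomic]
[Figure: timeline for processes P1, P2, P3 with writes W(5) and W(6) and reads returning R:5 and R:6; labeled not atomic]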
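The definition can be checked mechanically for small histories of a single register. The sketch below is a brute-force check, assuming each operation carries explicit invocation and response timestamps. The `Op` record, the function names, and the concrete timings and process assignments in the usage example are hypothetical, since the exact timings in the figures are not recoverable from the transcript.

```python
from itertools import permutations
from typing import NamedTuple


class Op(NamedTuple):
    proc: str     # issuing process, e.g. "P1"
    kind: str     # "W" for write, "R" for read
    value: int    # value written, or value the read returned
    inv: float    # invocation time
    resp: float   # response time


def legal_register_order(order, initial=None):
    """True if every read returns the most recent preceding write."""
    current = initial
    for op in order:
        if op.kind == "W":
            current = op.value
        elif op.value != current:
            return False
    return True


def is_atomic(history, initial=None):
    """Search for a total order that respects real time and register semantics."""
    for order in permutations(history):
        # If an op responded before another was invoked, it must come first.
        real_time_ok = all(
            not later.resp < earlier.inv
            for i, earlier in enumerate(order)
            for later in order[i + 1:]
        )
        if real_time_ok and legal_register_order(order, initial):
            return True
    return False


if __name__ == "__main__":
    writes = [Op("P2", "W", 5, 0, 1), Op("P2", "W", 6, 2, 3)]
    # R:5 overlaps W(6) and can be linearized before it; R:6 comes later.
    print(is_atomic(writes + [Op("P3", "R", 5, 2.5, 3.5), Op("P1", "R", 6, 4, 5)]))
    # R:6 completes before R:5 is even invoked: no valid linearization.
    print(is_atomic(writes + [Op("P3", "R", 6, 4, 5), Op("P1", "R", 5, 6, 7)]))
```

The search tries every permutation, so it is only practical for a handful of operations; it is meant to make the definition concrete, not to serve as a production checker.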
![Page 12: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/12.jpg)
Atomicity too strong?
[Figure: timeline for processes P1, P2, P3 with writes W(5) and W(6) and reads returning R:6 and R:5; labeled not atomic]
• Linearization points too strong?
– Why not just have R:5 appear atomically right after W(5)?
– Lamport: "If P2’s operator phones P1 and tells her I just read 6"
![Page 13: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/13.jpg)
Atomicity too strong?
[Figure: timeline for processes P1, P2, P3 with writes W(5) and W(6) and reads returning R:6 and R:5; labeled not atomic, sequentially consistent]
• Sequential consistency
– Weaker than atomicity
– Sequential consistency removes this "real-time" requirement
– Any global ordering OK as long as it respects local ordering
– Does Gilbert’s proof fall apart for sequential consistency?
• Causal memory
– Weaker than sequential
– No need to have global view, each process different view
– Local, read/writes immediately return to caller
– CAP theorem does not apply to causal memory
[Figure: timeline for processes P1 and P2 with operations W(1), W(0), R:0 and R:1; labeled causally consistent]
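Sequential consistency drops the real-time constraint and keeps only each process's local order, which a small search can illustrate. In the sketch below, a history is given as one list of operations per process, and the function looks for any interleaving in which every read returns the latest write. The function name and the assignment of operations to P1, P2 and P3 are assumptions for illustration, loosely following the R:6 / R:5 example.

```python
def is_sequentially_consistent(programs, initial=None):
    """programs maps a process name to its ops, in program order.

    Each op is a pair ("W", value) or ("R", value_returned).
    """
    procs = list(programs)

    def search(pos, current):
        if all(pos[p] == len(programs[p]) for p in procs):
            return True  # every op was placed in one legal global order
        for p in procs:
            i = pos[p]
            if i == len(programs[p]):
                continue
            kind, value = programs[p][i]
            if kind == "R" and value != current:
                continue  # this read cannot be scheduled next
            advanced = dict(pos)
            advanced[p] = i + 1
            if search(advanced, value if kind == "W" else current):
                return True
        return False

    return search({p: 0 for p in procs}, initial)


if __name__ == "__main__":
    # Reads by different processes: R:5 may be ordered between W(5) and W(6),
    # regardless of when it happened in real time.
    print(is_sequentially_consistent(
        {"P2": [("W", 5), ("W", 6)], "P3": [("R", 6)], "P1": [("R", 5)]}))
    # Same process reads 6 and then 5: local order forbids any legal ordering.
    print(is_sequentially_consistent(
        {"P2": [("W", 5), ("W", 6)], "P1": [("R", 6), ("R", 5)]}))
```

The first history is accepted even if R:6 finished before R:5 started in real time, which is exactly the case that atomicity rejects.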
![Page 14: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/14.jpg)
Going really weak
• Eventual consistency
– When network non-partitioned, all nodes eventually have the same value
– I.e. don’t be "consistent" at all times, but only after partitions heal!
• Based on powerful technique: gossiping
– Periodically exchange "logs" with one random node
– Exchange must be constant-sized packets
– Set reconciliation, Merkle trees, etc
– Use (clock, node_id) to break ties of events in log
• Properties of gossiping
– All nodes will have the same value in O(log N) time
– No positive-feedback cycles that congest the network
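A toy simulation makes the gossip idea and the (clock, node_id) tie-break concrete. This is a sketch under simplifying assumptions that are not from the talk: each node holds a single (clock, node_id, value) entry rather than a log, nodes gossip in synchronous rounds, each exchange is push-pull with one uniformly random peer, and there are at least two nodes. All names are hypothetical.

```python
import random


def newer(a, b):
    """Pick the later of two entries (clock, node_id, value).

    The clock decides; node_id breaks ties between equal clocks.
    """
    return max(a, b, key=lambda e: (e[0], e[1]))


def gossip_rounds(n, seed=0, max_rounds=1000):
    """Rounds of push-pull gossip until all n >= 2 nodes hold the newest entry."""
    rng = random.Random(seed)
    state = [(0, i, "old") for i in range(n)]
    newest = (1, 0, "new")
    state[0] = newest                     # one node has seen a newer write
    for rounds in range(1, max_rounds + 1):
        for node in range(n):
            peer = rng.randrange(n - 1)
            if peer >= node:
                peer += 1                 # uniform peer other than node itself
            merged = newer(state[node], state[peer])
            state[node] = state[peer] = merged
        if all(entry == newest for entry in state):
            return rounds
    raise RuntimeError("did not converge within max_rounds")


if __name__ == "__main__":
    for n in (16, 256, 4096):
        print(n, gossip_rounds(n))
```

Running it for growing N shows the round count rising slowly compared with N, in line with the O(log N) claim; the exact counts depend on the seed.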
![Page 15: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/15.jpg)
BASE
• Catch-all for any consistency model C’ that enables C’-A-P
– Eventual consistency
– PRAM consistency
– Causal consistency
• Main ingredients
– Stale data
– Soft-state (regenerable state)
– Approximate answers
![Page 16: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/16.jpg)
Summary
• No need to ensure CAP at all times
– Switch between algorithms or satisfy subset at different times
• Weaken consistency model
– Choose weaker consistency:
• Causal memory (relatively strong) works around CAP
– Only be consistent when network isn’t partitioned:
• Eventual consistency (very weak) works around CAP
• Weaken partition-tolerance
– Some environments never partition, e.g. datacenters
– Tolerate unavailability in small quorums
– Some env. have recovery guarantees (partitions heal within X hours), perform conflict resolution
![Page 17: CAP theorem by Ali Ghodsi](https://reader033.vdocuments.net/reader033/viewer/2022060204/55a05e0d1a28ab3c2e8b45cc/html5/thumbnails/17.jpg)
Related Work (ignored in talk)
• PRAM consistency (Pipelined RAM)
– Weaker than causal and non-blocking
• Eventual Linearizability (PODC’10)
– Becomes atomic after quiescent periods
• Gossiping & set reconciliation
– Lots of related work