pwlsd.1 a critique of the cap theorem - martin kleppmann

A Critique of the CAP Theorem Martin Kleppmann Papers We Love San Diego Daniel Norman – August 4th, 2016 @DreamingInCode



A Critique of the CAP Theorem
Martin Kleppmann

Papers We Love San Diego
Daniel Norman – August 4th, 2016

@DreamingInCode

PWLSD Presents:

GREETINGS PROFESSOR BREWER.

SHALL WE PLAY A GAME?

Love to. How about Global Total Order?

CAP Theorem

You’ve probably heard it before:

Consistency, Availability, Partition tolerance:

“pick any two”

Meet Our Players:

Eric Brewer, Seth Gilbert, Nancy Lynch, Martin Kleppmann

Chronology of CAP

2000: Brewer publicizes his conjecture in various talks and papers – “The CAP Principle”.

2002: Gilbert and Lynch formally prove Brewer’s conjecture, and the CAP Theorem is born.

1970s, 80s, and 90s: Absolutely[1] Nothing[2] happened[3] Nothing[4] to[5] see[6] here[7] folks.

[1] Paul R Johnson and Robert H Thomas. RFC 677: The maintenance of duplicate databases. Network Working Group, January 1975. URL https://tools.ietf.org/html/rfc677.
[2] Jim N Gray, Raymond A Lorie, Gianfranco R Putzolu, and Irving L Traiger. Granularity of locks and degrees of consistency in a shared data base. In G M Nijssen, editor, Modelling in Data Base Management Systems: Proceedings of the IFIP Working Conference on Modelling in Data Base Management Systems, pages 364–394. Elsevier/North Holland, 1976.
[3] Leslie Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Transactions on Computers, 28(9):690–691, September 1979. doi:10.1109/TC.1979.1675439.
[4] Bruce G Lindsay, Patricia Griffiths Selinger, C Galtieri, Jim N Gray, Raymond A Lorie, Thomas G Price, Gianfranco R Putzolu, Irving L Traiger, and Bradford W Wade. Notes on distributed databases. Technical Report RJ2571(33471), IBM Research, July 1979.
[5] Bowen Alpern and Fred B Schneider. Defining liveness. Information Processing Letters, 21(4):181–185, October 1985. doi:10.1016/0020-0190(85)90056-0.
[6] Susan B Davidson, Hector Garcia-Molina, and Dale Skeen. Consistency in partitioned networks. ACM Computing Surveys, 17(3):341–370, September 1985. doi:10.1145/5505.5508.
[7] Maurice P Herlihy and Jeannette M Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3), July 1990.

“Consistency”


“I’m totally pro-choice.” (Fox News, October 31, 1999)

“I’m pro-life.” (CPAC, February 10, 2011)

“I wanted to do this for myself. I had to do it for myself.” (Time, August 18, 2015)

“I don’t want it for myself. I don’t need it for myself.” (ABC News, November 20, 2015)

“I think the institution of marriage should be between a man and a woman.” (The Advocate, February 15, 2000)

“If two people dig each other, they dig each other.” (Trump University “Trump Blog,” December 22, 2005)

“I’m against gay marriage.” (Fox News, April 14, 2011)

What is “Consistency”

● Brewer defines consistency as one-copy-serializability (1SR)

● Gilbert and Lynch define consistency as linearizability

Linearizability

Kyle Kingsbury 2014 – https://aphyr.com/posts/313-strong-consistency-models


Put Simply:

Linearizability can be viewed as a special case of strict serializability where transactions are restricted to consist of a single operation applied to a single object.[1]

[1] Linearizability: A Correctness Condition for Concurrent Objects. By M P. Herlihy and J M. Wing. ACM Transactions on Programming Languages and Systems, Vol. 12, No. 3, July 1990.
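To make the single-object case concrete, here is a minimal brute-force sketch of a linearizability check for one read/write register. It is an illustration only, not an algorithm from the paper or from Herlihy and Wing: the `Op` record, its fields, and the function names are all hypothetical, and the search tries every ordering, so it is only usable on tiny histories.

```python
from itertools import permutations
from typing import NamedTuple, Optional


class Op(NamedTuple):
    """One completed client operation (hypothetical record for this sketch)."""
    start: float           # time the client invoked the operation
    end: float             # time the client received the response
    kind: str              # "write" or "read"
    value: Optional[int]   # value written, or value the read returned


def respects_real_time(order):
    # If an operation finished before another one started, it must come first.
    for i, first in enumerate(order):
        for second in order[i + 1:]:
            if second.end < first.start:
                return False
    return True


def behaves_like_register(order, initial):
    # Replay the operations one at a time against a single register.
    current = initial
    for op in order:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def is_linearizable(history, initial=None):
    # Linearizable if SOME total order satisfies both conditions.
    return any(
        respects_real_time(order) and behaves_like_register(order, initial)
        for order in permutations(history)
    )


fresh = [Op(0, 1, "write", 1), Op(2, 3, "read", 1)]
stale = [Op(0, 1, "write", 1), Op(2, 3, "read", None)]
overlapping = [Op(0, 3, "write", 1), Op(1, 2, "read", None)]

print(is_linearizable(fresh))        # True
print(is_linearizable(stale))        # False: the read began after the write finished
print(is_linearizable(overlapping))  # True: the read may take effect before the write
```

The stale read is the interesting case: the same returned value is acceptable while the write is still in flight, and unacceptable once the write has been acknowledged.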

Linearizability vs Serializability

Meaningfully different,

But close enough for our purposes

From Martin Kleppmann’s “Sequential Consistency versus Linearizability”


I ALREADY F#$KING READ IT!

(Eventual consistency)

(Terminal inconsistency)

“Availability”

Gilbert & Lynch defined availability differently.

Property of algorithm, or observed metric?

Brewer: “availability is obviously continuous from 0 to 100 percent”

Gilbert & Lynch: “For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response”

“Availability”

The Server is DOWN! – What is “UP” anyway?

● Is a server down if it fails to respond within 1000 ms?

● What if it responds 5 minutes later?

● What if it responds, but with invalid data?

● What if it responds but the response is not received?

● What if your request packet is dropped, but all others are fine?

● What is the sound of one hand clapping?

“Availability”

Hush you! I’ll get around to responding.

“Availability”

If everybody fails, that’s availability.

WUT?

“Availability”

● Gilbert & Lynch definition – Contradictory and counter-intuitive

● Brewer’s definition is ok

● Nonsensical to call an algorithm “Available”
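The “if everybody fails, that’s availability” punchline follows from the wording of the Gilbert & Lynch definition, which only constrains non-failing nodes. A minimal sketch of that predicate (the `Node` class and its fields are made up for illustration) shows that it is vacuously satisfied when no node is left standing:

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node state, just enough to evaluate the definition."""
    failed: bool     # has this node crashed?
    responds: bool   # does it answer the requests it receives?


def gl_available(nodes):
    # "Every request received by a non-failing node must result in a response."
    return all(node.responds for node in nodes if not node.failed)


healthy = [Node(failed=False, responds=True), Node(failed=False, responds=True)]
one_silent = [Node(failed=False, responds=True), Node(failed=False, responds=False)]
all_dead = [Node(failed=True, responds=False), Node(failed=True, responds=False)]

print(gl_available(healthy))     # True
print(gl_available(one_silent))  # False
print(gl_available(all_dead))    # True: no non-failing node exists to break the rule
```

Note that the predicate says nothing about how long a response may take, which is the other oddity the slides poke at.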

“Partition Tolerance”

A network partition is:

“a communication failure in which the network is split into disjoint sub-networks, with no communication possible across sub-networks”

“Partition Tolerance”

Most networks are fair-loss links

A fair-loss link is a link where the probability that a message you send is delivered is non-zero.

(and the probability of delivery is less than 100%)
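A rough simulation of what a fair-loss link means in practice: if each send is delivered with some fixed non-zero probability, retransmitting eventually gets the message through, and the cost shows up as delay, not as a hard failure. The sketch below assumes independent losses with a constant delivery probability, which real networks do not promise; the function names and the `max_attempts` cutoff are invented for illustration.

```python
import random


def send_until_delivered(p_deliver, rng, max_attempts=10_000):
    """Retransmit over a simulated lossy link; return how many sends it took."""
    for attempt in range(1, max_attempts + 1):
        if rng.random() < p_deliver:   # this copy of the message got through
            return attempt
    raise RuntimeError("gave up: from here the link looks partitioned")


def mean_attempts(p_deliver, trials=20_000, seed=42):
    """Average number of sends per delivered message over many trials."""
    rng = random.Random(seed)
    total = sum(send_until_delivered(p_deliver, rng) for _ in range(trials))
    return total / trials


# p = 1.0 is the idealised lossless baseline, not a fair-loss link.
for p in (1.0, 0.5, 0.1):
    print(f"p={p}: about {mean_attempts(p):.2f} sends per message (theory: {1 / p:.1f})")
```

As the delivery probability falls, the expected number of sends grows as 1/p, so where “lossy” ends and “partitioned” begins depends on how long the sender is willing to keep retrying.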

“Partition Tolerance”

A meditation:

What fraction of messages must go undelivered before it’s a “Partition”?

CA AP CP CCCP

Using Gilbert & Lynch definitions, essentially only two options:

CA - Avoid the gulag dear comrade, coordinate carefully with your party official.

AP - You’re on your own, capitalist swine!

CP - Not really any different from CA

CCCP - Purge all your data and start over every few years

PROVE IT

From “Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services”, Gilbert & Lynch 2002

PROVE IT

G&L offer irrefutable proof that:

The operation of their algorithm is non-linearizable only if the partition lasts forever.

But:

Partitions don’t last forever!

Sometimes faults aren’t detected!

Maybe you don’t even want linearizability!

GREETINGS PROFESSOR BREWER

CAP IS A STRANGE GAME.

THE ONLY WINNING MOVE IS NOT TO PLAY.

HELLO

HOW ABOUT A NICE GAME OF CHESS?

In my humble opinion... there’s a simpler way to say it:

● Some consistency models require total event ordering

● Any up-to-date list can only exist at a single point in space

● We are not optimistic about FTL information transfer

Be patient, or be self-sufficient. No whining.

Kleppmann proposes a “Delay sensitivity framework”

In a nutshell: Travel takes time, how patient can you be?

So What’s better?

“Delay sensitivity framework”

● Network latency is lumpy / uncertain

● Very few scenarios entail actually reliable networks - packet loss is common

● Understanding lower bounds for different consistency models
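A rough sketch of the lower-bounds bullet, assuming the summary as I understand it from Kleppmann’s paper: linearizable reads and writes both take time proportional to the network delay d, sequential consistency lets one of the two be fast, and causal consistency lets both complete locally. Treating O(d) as exactly d and O(1) as zero is a simplification for illustration, and the table and function names are hypothetical.

```python
# Order-of-magnitude latency per operation, as a multiple of the network delay d.
# 0 stands in for O(1): the operation can complete without waiting on the network.
LOWER_BOUNDS = {
    "linearizability":        {"write": 1, "read": 1},
    "sequential consistency": {"write": 1, "read": 0},  # or the other way round
    "causal consistency":     {"write": 0, "read": 0},
}


def min_latency_ms(model, op, network_delay_ms):
    """Rough floor on operation latency for a given consistency model."""
    return LOWER_BOUNDS[model][op] * network_delay_ms


for model in LOWER_BOUNDS:
    write_ms = min_latency_ms(model, "write", 80)
    read_ms = min_latency_ms(model, "read", 80)
    print(f"{model:24s} write >= ~{write_ms} ms, read >= ~{read_ms} ms")
```

Seen this way, a partition is just the case where d becomes very large, and the question for each operation is whether it can afford to wait.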

So What’s better?

Proposed Terminology

● Availability - Empirical measurements only!

● Delay Sensitive - How patient can we be for a given operation?

● Network Faults - All kinds of weird shit happens on networks.

● Fault Tolerance - Under what specific failure modes can we guarantee our invariants?

● Consistency - Not just one, but many different consistency models to choose from
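One way to read “empirical measurements only” is to treat availability as a ratio computed from observed requests: the fraction that returned a correct response within some deadline. The sketch below is an illustration under that assumption; the observation format and the function name are invented here, and the sample log mirrors the “what is UP anyway?” questions from earlier in the talk.

```python
def availability(observations, deadline_ms):
    """Fraction of requests that got a correct response within the deadline.

    observations: list of (latency_ms, ok) pairs, where latency_ms is None
    if no response was ever received and ok says whether the data was valid.
    """
    if not observations:
        raise ValueError("no requests observed, so availability is undefined")
    good = sum(
        1
        for latency_ms, ok in observations
        if ok and latency_ms is not None and latency_ms <= deadline_ms
    )
    return good / len(observations)


log = [
    (120, True),       # answered correctly and promptly
    (300_000, True),   # answered correctly, but 5 minutes later
    (80, False),       # answered quickly, but with invalid data
    (None, False),     # response never received
]

print(availability(log, deadline_ms=1000))  # 0.25
```

Changing `deadline_ms` changes the answer, which is the point: availability measured this way depends on how patient the client is.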

So What’s better?

Strong vs Weak

Consistency models galore:

Consistency models are a continuum.

“Strong” and “Weak” are conversational terms, not formal.

“Weak” does not mean unsafe.

It’s all about selecting which invariants you want.

Kyle Kingsbury 2014 – https://aphyr.com/posts/313-strong-consistency-models

In Conclusion:

● Irrefutable truths may be predicated on misleading definitions

● Consistency is subjective.

● CAP has outlived its usefulness

● We can do better!

● Let’s reason about latency

● Let’s fix our terminology

● Rigor is good, CS needs moar of it

Thank you!
