verification of the chord protocol, a with a · extended successor list (esl) of 29 (with l = 2):...
TRANSCRIPT
AAA
A
A
A
VERIFICATION OF THE CHORD PROTOCOL,
WITH A
SURPRISING INVARIANT
Pamela Zave
AT&T Laboratories—Research
Bedminster, New Jersey, USA
1
8
14
21
32
42
51
THE PROTOCOL ISINTERESTING
no centraladministration(almost)
communicationin the networkis fast
protocoloperations aresimple and fast:
no timingconstraints(almost)
no multi-nodeatomicoperations
m = 6
THE CHORD PROTOCOL MAINTAINS A PEER-TO-PEER NETWORK identifier of a node (assumedunique) is an m-bit hashof its IP address
nodes are arranged ina ring, each nodehaving a successorpointer to the nextnode (in integer orderwith wraparound at 0)
the protocol preservesthe ring structure asnodes join, leave silently,or fail
redundant pointerssupport fault-tolerance(extra successors,predecessors)
successor
successor2
predecessor
WHY IS CHORD IMPORTANT?
the 2001 SIGCOMM paper introducing Chordis one of the most-referenced
papers in computer science, . . .
. . . and won SIGCOMM’s 2011 Test of Time Award
APPLICATIONS
“Three features that distinguish Chordfrom many other peer-to-peer lookupprotocols are . . .
. . . its simplicity,
. . . provable correctness,
. . . and provable performance.”
RESEARCH ON PROPERTIES ANDEXTENSIONS
allows millions of ad hoc peers tocooperate
often used to build distributedkey-value stores (where the keyspace is the same as the Chordidentifier space)
the best-known application isBitTorrent
protection against malicious peers
key consistency (all nodes agreeon which node owns which key),replicated data consistency
used as a building block in fault-tolerant applications
OPERATIONS OF THE PROTOCOL
710
16
10JOINS
10STABILIZES
710
16
16RECTIFIES
10RECTIFIES
7
10
16
710
16
7STABILIZES
7
10
16
an operation changesthe state of one member
Join and Stabilize are scheduled autonomously,Rectify is caused by another member’s Stabilize
in addition, a membercan Fail (or leave) silently
there is perfect failuredetection
Stabilizing memberdetects a dead successorand promotes its nextsuccessor
THE CLAIMS
THE REALITY
even with simple bugs fixed andoptimistic assumptions aboutatomicity, the original protocol isnot correct
of the seven properties claimedinvariant of the original version, notone is actually an invariant
Correctness Property:
In any execution state, IF there areno subsequent Join or Fail events, . . .
. . . THEN eventually . . .
. . . all pointers in the network will beglobally correct, and remain so.
not surprisingly, all due to sloppyinformal specification and proof
I found these problems byanalyzing a small Alloy model
Chris Newcombe and others atAWS credit this work withovercoming their bias againstformal methods, which they nowuse to find bugs.
[CACM, April 2015]
16
3 has no successor2yet (it is not requiredto have all successorsfilled in)
16
16
A TYPICAL BUG IN ORIGINAL CHORD
16FAILS
3STABILIZES
3
20 3
20
3
20
3 has replaced a pointerto a live node with apointer to a dead one
now 3 is disconnectedfrom the ring, and thering may be broken
BASIC CORRECTNESS STRATEGY 1
7
19
16
13
263
29
29
55
55
41
41
6
appendagesbest successor
(first livesuccessor)
ring
dead
29 2932
extended successor list (ESL)of 29 (with L = 2):
memberitself
Original operating assumption:
No failure leaves a member without a live successor.
But if an ESL with L = 2 is . . .
. . . then 32 cannot fail!Clearly, with L = 2, Chord isintended to tolerate one failurein a neighborhood, but not two.
BASIC CORRECTNESS STRATEGY 2
7
19
16
13
263
29
29
55
55
41
41
6
appendages
best successor(first live
successor)
ring
dead
extended successor list (ESL)of 29 (with L = 2):
memberitself
Definition of FullSuccessorLists:The extended successor list of eachmember has L+1 distinct entries.
New operating assumption:
If a Chord network has the propertyFullSuccessorLists, then no failure leavesa member without a live successor.
if not satisfied for the real failure rate,
. . . increase rate of stabilization,
. . . or increase redundancy
BASIC CORRECTNESS STRATEGY 3
7
19
16
13
263
29
55
41
6
appendages
ring
TO MAKE ORIGINAL CHORD CORRECT:
alter the initialization to satisfy FullSuccessorListswith all members live
alter the operations to populate successor lists moreeagerly, so that they always have L entries
now it isroughly correct(in hindsight)
but how do weprove it
without aninvariant?
requires L+1 members
w
x
y
u
z
ringx
fails
w
y
u
z
ring
WHY IS FINDING AN INVARIANT SO DIFFICULT?
there is a ring of best successors
there is no more than one ring
on the unique ring, the membersare in identifier order
from each appendage member, thering is reachable through bestsuccessors
THE KNOWN, NECESSARY PROPERTIES ARE STATED IN TERMS OF THE RING . . .
about the ring ofbest successors
about the appendages
. . . BUT “RING VERSUS APPENDAGE” IS CONTEXT-DEPENDENT AND FLUID:
AN INTERMEDIATE RESULT
THE INDUCTIVE INVARIANT:
AtLeastOneRing
AtMostOneRing
OrderedRing
ConnectedAppendages
BaseNotSkipped
and
and
and
and
ANOTHER OPERATING ASSUMPTION:
A chord network has a stable baseof L+1 nodes that are alwaysmembers.
no successor list skips overa member of the stable base
THE PROOF OF CORRECTNESS:
by exhaustive enumeration, in Alloy,for all model instances up to N = 9, L = 3
expensive to implement thesehigh-availability nodes!
a stable base would have 3-6members, while a Chord networkcan have millions of members—what is the base doing?
I believe it is just preventinganomalies in small networks,
but how can we know for sure?
THE FINAL RESULT
THE INDUCTIVE INVARIANT:
OneLiveSuccessor
and SufficientPrincipals
ANOTHER OPERATING ASSUMPTION:
THE PROOF OF CORRECTNESS:
None
informal and intuitive, but
. . . a real proof (no size limits)
. . . backed up by an Alloy model checked up to N = 9, L = 3 (as a protection against human error)
. . .
this is just a formalizationof the original operatingassumption
Definition of a principal member:A member that is not skipped by anymember’s successor list.
Definition of SufficientPrincipals: There are at least L+1 principal nodes.
the “stable base” has become something we can prove, rather thanan assumption!
identifierspace
hypothesize adisordered extended
successor list[. . . x, . . . y, . . . z, . . .]
the only identifier in theentire space that is notskipped by [x, . . . y, . . . z]is y
principal nodes cannotbe skipped by asuccessor list
Proof of OrderedSuccessorLists
picture
definition
SufficientPrincipals
CONTRADICTION!
there must be at leastL + 1 principal nodes,where L is greater thanor equal to 1
ORDERED (AND FULL) SUCCESSOR LISTS . . .. . . ARE IMPLIED BY THE INVARIANT
x
y z
Definition of OrderedSuccessorLists:For all sublists [x, y, z] of an ESL(whether the sublist is contiguousor not) . . . between [x, y, z].
every member has a bestsuccessor (first live successor)
there are sufficientprincipal nodes
here is a graph of bestsuccessors:
7
7
7
23
23
23
46
46
460
0
0
15
15
9
9
12
12
37
37
24
24
35
35
33
33
51
51
ssss
s
d d d d
these paths . . .
. . . do not skipprincipal nodes
. . . are acyclic
. . . are ordered by identifiers
each tree has exactly one p , whichis unique to it
so the re-arrangedgraph mustlook likethis
automaticallysatisfyingOneOrderedRingandConnected-Appendages
THE PRINCIPAL NODES MAKE THE SHAPE OF THE RING
CONCURRENCY AND COMMUNICATION
node X node Y
atomic step at X
X changes state
step can beassumed to occur
at this instant
AN OPERATION IS A SEQUENCE OFATOMIC STEPS
EACH ATOMIC STEP IS AN INTERNALSTATE CHANGE OR THIS:
querymessage
replymessage
formal model has ashared-stateabstraction
while waiting for a reply (or timeout),X cannot answer queries about its state
because of the structure of operations,queries cannot form circular waits
extended successor list ofstabilizing node, before Stabilize
extended successor listof its new successor
some Chord operations need multiple atomic steps
in the new, provably correct, specification,every intermediate operation state is alsoconstructed in this safe way
N N N NS S SS
S
0
0 0
00
01
1
12 23 3
N N N0 0 1 2
because invariant holds, nocurrent principal nodes are skippedhere or here
precondition guaranteesthat between [S , N , S ],so no current principalnodes are skipped fromS to N
therefore, no former principalnodes are skipped by this newsuccessor list, and the numberof principal nodes has notdecreased
HOW STABILIZE PRESERVES THE INVARIANT JOIN ANDRECTIFY
ARE SIMILAR
HOW FAIL PRESERVES THE INVARIANT
PRESERVATION OFOneLiveSuccessor
PRESERVATION OFSufficientPrincipals
The operatingassumption is thatno failure leaves amember with no livesuccessor, . . .
. . . so the invariantis assumed to bepreserved.
Why can’t failure of a principal node leave the networkwith fewer than L+1 principals?
Lemma: The only operation that can cause a node tochange from principal to non-principal is its own failure.
The life history of a long-lived member:
Joinbecome principal because all neighbors know youenjoy life as a principal nodeFail
1234
Therefore the number of principal nodes isproportional to the number of nodes.
Once the network has grown (especially to millions ofmembers!) it is overwhelmingly improbable that it willhave fewer than 3-6 principal nodes.
PROVING PROGRESS
IF THERE ARE NO MOREJOIN OR FAIL EVENTS . . .
dead successors areremoved, so that everymember’s firstsuccessor is live
. . . WHILE MEMBERSCONTINUE TO STABILIZE . . .
every member’s firstsuccessor andpredecessor becomeglobally correct
tails of all successorlists become correct
1
2
3
as with construction of intermediatesuccessor lists, operations must bespecified precisely to ensurecorrectness
here preconditions must ensurethat no operation reverses theprogress of a past or current phase
CONCLUSIONS
THE PRODUCT THE PROCESS
initialization is more difficult thanoriginal Chord, but a simpleprotocol will get networks off toa safe start
otherwise correct Chord is justas efficient as original Chord
these peer-to-peer protocolshave a (justified) reputation forunreliability
it is an impressivepattern for fault-tolerance
a correct specification couldpave the way for a new generation of reliable, moreuseful implementations
it also provides a firm foundation for work on better failure detection
and security
Chord is a very interesting protocol—note that the invariantlooks nothing like the propertieswe care about!
results would have been impossibleto find without model-checking toexplore bizarre cases and get ideasfrom them
the best result was impossible tofind without the insights that camefrom the proof process
that is where the idea of astable base came from
www.research.att.com/~pamela> How to Make Chord Correct
AAA
A
A
A
PATTERNS IN NETWORK ARCHITECTURE:
PRINCIPLES FOR LAYERING
PERFORMANCE SERVICES
SECURITY SERVICES
Simplifying Assumption:
There is no multihoming.
QoS
reliability
mobility
data integrity
packet filtering
AAA
A
A
A
an overlay link is implemented by an underlay session
links of a network can be implemented by sessions in oneor several networks
BASIC LAYERING
session
link
AAA
A
A
A
BEST-EFFORT POINT-TO-POINT SERVICE (QUANTIFIED)
session
link
packets are sent at the rate of B
with probability P, a packet arrives within time L
in addition to loss or delay, packets can bere-ordered or duplicated in transit
PROPERTIES OF THE LINK
THE UNDERLAY SESSION IMPLEMENTS THIS SERVICE
AAA
A
A
A
PRINCIPLES FOR LAYERING
session
link
As we have seen in this class, there areMANY services a network could implement.
How do these services composein a network architecture?
service session
Where in a network architectureshould a service be implemented?
AAA
A
A
A
for each link in thepath of the session,let Sj be the fractionof the link’s bandwidthallocated to the session
minimum bandwidth = minimum ( Sj (Bj) )j
jminimum success probability = product ( Pj )
j jmaximum latency = sum ( Lj ) + sum ( Dj )
(QUANTIFIED SERVICE WITH GUARANTEES)
D is delay here
COMPOSITION OF “QUALITY OF SERVICE” SERVICE
min. bandwidth B1min. success prob. P1maximum latency L1
min. bandwidth B2min. success prob. P2maximum latency L2
min. bandwidth B3min. success prob. P3maximum latency L3
AAA
A
A
A
COMPOSITION OF RELIABILITY SERVICE(FOR EXAMPLE, TCP)
packets are deliveredreliably and in theorder they were sent
packets are deliveredreliably and in theorder they were sent
packets are deliveredreliably and in theorder they were sent
this session is reliable if . . .
. . . all the links in its path are reliable, and
. . . all the network nodes in its path are reliable
AAA
A
A
A
COMPOSITION OF DATA-INTEGRITY SERVICE(FOR EXAMPLE, IPsec)
this session has data integrity if . . .
. . . all the links in the path have data integrity, and
. . . all the network nodes in the path can be trusted to preserve integrity
no node except thesession endpointscan read or changethe data
no node except thesession endpointscan read or changethe data
no node except thesession endpointscan read or changethe data
AAA
A
A
A
SERVICE WHERENEEDED?
WHEREIMPLEMEN-TABLE?
HOWPROPA-GATED?
RESULT?
QoS topsession
above anyuntrustedlinks ornodes
where a network can provide sessions withdedicated paths, shorterpaths, higher-qualitypaths, etc.
piecewise
piecewise
piecewise
mustpropagate upto BE2EUS
reliability top session
any session protocol
dataintegrity
any session protocol
bottom end-to-endunshared session(BE2EUS)
dynamic link
all properties of thissession propagatepiecewise to the top
there can also be abottom end-to-endsession (BE2ES)
put in BE2ESor above
put in bottomsession over
untrusted span, or above
AAA
A
A
A
network implementssession-locationmobility
session preserveddespite mobility
session preserveddespite mobility
these sessions are preserveddespite mobility if . . .
. . . the links at their endpointsare preserved despite mobility
networkimplementsdynamic-routingmobility
COMPOSITION OF MOBILITY SERVICE
AAA
A
A
A
COMPOSITION OF PACKET-FILTERING SERVICE
member receives nosession-initiation packetsthat are undesirable for X
member receives no undesirablesession-initiation packets if . . .
. . . it does not receive any undesirablesession-initiation packets on its inlinks
networkX
networkY
implemented by filteringall the inlinks of E
E
E
x
y
y
how does Y know what isundesirable for X?
it does not matterwhether inlinks arestatic or dynamic
AAA
A
A
A
packetfiltering
all networkmembersin amachine
at any level whereit is possible todiscriminate
filter high-volumeattacks at a lowlevel, low-volumeattacks wherediscrimination isbetter
SERVICE WHERENEEDED?
WHEREIMPLEMEN-TABLE?
HOWPROPA-GATED?
RESULT?
topsession
only where thechange ofattachment occurs
endpointattachment
endpointattachment
propagationupward is automatic
mobility
AAA
A
A
A
PACKETFILTERING
QoS
RELIABILITY
RELIABILITY
INTEGRITY(ENCRYPTION)
INTEGRITY
reliability convertsbandwidth,probability, latencyto goodput, whichalso propagatespiecewise
encryptiondecreasesbandwidth andincreaseslatency, both bad
filteringincreaseslatency (bad)and bandwidth(good)
reliabilityaboveencryption
reliabilityabovemobility
encryptionabovemobility
filtering andencryption areindependentonly if sessioninitiations arenot encrypted
MOBILITY
MOBILITY
SERVICEINTERACTIONS
filtering abovemobility orfar fromendpoint