communication and data sharing for dynamic distributed systems


Post on 12-Jan-2016


Page 1: Communication and Data Sharing  for Dynamic Distributed Systems

Communication and Data Sharing for Dynamic Distributed Systems

Nancy Lynch, MIT

Alex Shvartsman, UConn

Page 2: Communication and Data Sharing  for Dynamic Distributed Systems

Motivation and Focus

• Constructing distributed applications for highly dynamic environments is difficult
• In practice, considerable effort is required to make applications resilient to
– changes in client requirements
– evolution of the underlying computing medium
• Focus of our work
– design and analysis of distributed services
– that provide useful guarantees and
– that make the construction of sophisticated distributed applications easier.

Page 3: Communication and Data Sharing  for Dynamic Distributed Systems

Our Approach

• Traditionally
– research on distributed services emphasized specification and correctness, while
– research on distributed algorithms emphasized complexity and performance
• We combine these concerns leading to
– algorithms that perform efficiently and degrade gracefully in dynamic distributed settings, and
– whose correctness, performance, and fault-tolerance guarantees are expressed by precisely-defined global services.

Page 4: Communication and Data Sharing  for Dynamic Distributed Systems

Research Direction Summary

• Develop and analyze algorithms to solve problems of communication and data sharing in highly dynamic distributed environments
• “Dynamic” encompasses
– Changes in network topology
– Processor mobility
– Changing sets of participants
– Wide range of failures
– Timing variations

Page 5: Communication and Data Sharing  for Dynamic Distributed Systems

Research Direction (cont’d)

• The properties we study include
– ordering and reliability guarantees for communication
– coherence guarantees for data sharing
• The algorithmic results will be accompanied by
– lower bound and impossibility results,
– which describe inherent limitations on what problems can be solved, and at what cost.

Page 6: Communication and Data Sharing  for Dynamic Distributed Systems

RAMBO

Reconfigurable Atomic Memory for Read/Write Objects

Nancy Lynch
Alex Shvartsman

Page 7: Communication and Data Sharing  for Dynamic Distributed Systems

Design Goals

• RAMBO
– Reconfigurable Atomic Memory for Basic Objects (Read/Write) for message-passing systems
• Dynamic replication for availability and survivability
• Loosely-coupled on-the-fly reconfiguration
• High concurrency
• Low latency
• Safety for any patterns of asynchrony and failures
• Good performance under partial asynchrony and for moderate failures

Page 8: Communication and Data Sharing  for Dynamic Distributed Systems

Algorithmic Ideas

• Reconfigurable quorum systems
– Quorums maintain consistency during modest and transient changes
– Reconfigurations accommodate more drastic and permanent changes
• Read/write operations are frequent
– Use quorum access and allow concurrency
– Isolate from reconfiguration
• Reconfigurations are infrequent
– Use consensus to impose total order (Paxos)
– Optimistic dissemination without formal installation
– Conservative garbage collection of obsolete configs

Page 9: Communication and Data Sharing  for Dynamic Distributed Systems

Related Prior Work

• Atomic read/write memory in message-passing models
– Upfal Wigderson 86
– Attiya Bar-Noy Dolev 91, 95
– Lynch Shvartsman 97
– Englert Shvartsman 01
•
– Lamport 89, 98
• Quorums
– Gifford 79, Thomas 79
– and many many others

Page 10: Communication and Data Sharing  for Dynamic Distributed Systems

Methodology

• Specify algorithm
– Interacting state machines
– Using non-deterministic “gossip”
• Show correctness/safety for
– arbitrary patterns of asynchrony
– assuming arbitrary crash-failures and message loss
• Analyze performance for a subset of timed executions
– Bounded message delay, 0-time local processing
– Some “gossip” becomes deliberate, some periodic
– Non-failure of certain quorums for certain periods
– Reason about operation latency
– (Of course none of this impacts safety)

Page 11: Communication and Data Sharing  for Dynamic Distributed Systems

Showing Read/Write Atomicity

• We show atomicity using a partial order
• Atomicity of a sequence β of reads/writes
– Let ≺ be an irreflexive PO of all op-s in β. Show:
– For any π, finitely many φ with φ ≺ π
– If π precedes φ in β, then not φ ≺ π
– If π is write then either π ≺ φ or φ ≺ π
– Any read returns value written by last write, per ≺ [Lynch, Lemma 13.16]

Page 12: Communication and Data Sharing  for Dynamic Distributed Systems

Approach: Values and Tags

• Each value v has an associated tag t
– A tag is a pair ⟨sequence number, processor id⟩
• Reads:
– a set of value-tag pairs is obtained
– the result is the value with the maximum tag
• Writes:
– a set of value-tag pairs is obtained
– new-value is propagated with a new-tag that is a lexicographic increment of the maximum tag obtained:

new-tag := ⟨tag.seq + 1, pid⟩
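The tag rules on this slide can be illustrated with a minimal Python sketch. The names `Tag`, `max_pair`, and `next_tag` are hypothetical and not part of RAMBO; the sketch only assumes that tags are compared lexicographically, first by sequence number and then by processor id.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    seq: int  # sequence number
    pid: int  # processor id, breaks ties between equal sequence numbers


def max_pair(pairs):
    """Read rule: pick the (value, tag) pair with the largest tag."""
    return max(pairs, key=lambda pair: pair[1])


def next_tag(pairs, pid):
    """Write rule: increment the largest observed sequence number, attach the writer's pid."""
    _, best = max_pair(pairs)
    return Tag(best.seq + 1, pid)


# Value-tag pairs as they might be collected from several replicas.
pairs = [("a", Tag(3, 1)), ("b", Tag(3, 2)), ("c", Tag(2, 9))]
read_value, read_tag = max_pair(pairs)
write_tag = next_tag(pairs, pid=1)
```

Because named tuples compare field by field, `Tag(3, 2)` beats `Tag(3, 1)`, and any tag produced by `next_tag` is strictly larger than every tag the writer observed.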

Page 13: Communication and Data Sharing  for Dynamic Distributed Systems

Using Quorum Systems

• Given a set I (a set of processor ids)
• A quorum system is a pair
– < read-quorums, write-quorums >
• Where
– read-quorums is a collection of subsets of I
– write-quorums is a collection of subsets of I
• Such that
– For any R in read-quorums and W in write-quorums, R ∩ W ≠ ∅
– For any W1 and W2 in write-quorums, W1 ∩ W2 ≠ ∅
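The two intersection requirements can be checked mechanically. The sketch below is illustrative only: `is_quorum_system` is a hypothetical helper, and the majority quorums over three processors are an assumed example rather than something the slides prescribe.

```python
from itertools import combinations, product


def is_quorum_system(ids, read_quorums, write_quorums):
    """Check that every read quorum meets every write quorum,
    and that every pair of write quorums intersects."""
    ids = set(ids)
    rqs = [set(q) for q in read_quorums]
    wqs = [set(q) for q in write_quorums]
    if not all(q <= ids for q in rqs + wqs):
        return False  # quorums must be subsets of the id set I
    read_write_ok = all(r & w for r, w in product(rqs, wqs))
    write_write_ok = all(w1 & w2 for w1, w2 in product(wqs, wqs))
    return read_write_ok and write_write_ok


ids = {1, 2, 3}
majorities = [set(c) for c in combinations(sorted(ids), 2)]
good = is_quorum_system(ids, majorities, majorities)
bad = is_quorum_system(ids, [{1}], [{2, 3}])
```

Any two majorities of three processors share a member, so `good` holds; in the second call the read quorum `{1}` misses the write quorum `{2, 3}`, so the check fails.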

Page 14: Communication and Data Sharing  for Dynamic Distributed Systems

High-Level Functions

• Joiner
– Introduces new participants to the system
• Reader-Writer
– Routine read and write operations
– Two-phased algorithm using all “known” configurations
– Using tags
• Reconfiguration
– Chooses new next configuration
– Informs members of the previous configuration
• Garbage collection (“packaged” with Reader-Writer)
– Identify and remove obsolete configurations

Page 15: Communication and Data Sharing  for Dynamic Distributed Systems

RAMBO System

[Figure: the RAMBO system, composed of Joiner, Reader-Writer, Recon, Cons, and Network components]

Page 16: Communication and Data Sharing  for Dynamic Distributed Systems

Architectural View

• Each component is formally specified
– Input/Output Automata [Lynch, Tuttle]
• Joiners are specified as Joiner_i for i in I
• Reader-Writers are Reader-Writer_i for i in I
• Reconfigurers are Recon_i for i in I
• Consensus instances are Cons(k,c) for k in N, c in C
– Where the members of configuration c decide on the configuration number k
• Network is specified in terms of Channel_i,j for i, j in I
– Assumed only to be “honest”
• The System is then the composition of all automata

Page 17: Communication and Data Sharing  for Dynamic Distributed Systems

Configurations and Config Maps

• Configuration c
– members(c) -- set of members of configuration c
– read-quorums(c) -- set of read quorums
– write-quorums(c) -- set of write quorums
• Configuration map cm
– mapping from naturals to configurations
– cm(k) is the configuration k, and it can be
– defined, undefined (⊥), garbage-collected (±)

[Figure: a configuration map laid out by index, ± ± c c c c . . ., with regions labeled G-C-ed, Defined, “Mixed”, Undefined]
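One way to picture a configuration map is as a dictionary from indices to entries, where an entry is a configuration, the undefined marker ⊥, or the garbage-collected marker ±. The sketch below rests on assumptions: the string encodings and the `merge` rule (undefined < defined < garbage-collected, applied entry by entry when two maps meet) are a plausible illustration, not something the slides state.

```python
GC = "±"     # garbage-collected marker (hypothetical encoding)
UNDEF = "⊥"  # undefined marker (hypothetical encoding)


def lookup(cmap, k):
    """cm(k): indices that were never written are undefined."""
    return cmap.get(k, UNDEF)


def rank(entry):
    """Information order assumed here: undefined < defined < garbage-collected."""
    if entry == UNDEF:
        return 0
    return 2 if entry == GC else 1


def merge(a, b):
    """Entry-wise merge of two configuration maps, keeping the more informed entry."""
    return {k: max(lookup(a, k), lookup(b, k), key=rank)
            for k in sorted(set(a) | set(b))}


def defined_indices(cmap):
    """Indices whose entry is an actual configuration."""
    return [k for k in sorted(cmap) if rank(cmap[k]) == 1]


local = {0: GC, 1: "c1", 2: "c2"}
remote = {0: "c0", 1: GC, 3: "c3"}
merged = merge(local, remote)
```

After the merge, indices 0 and 1 are garbage-collected and only configurations 2 and 3 remain defined, matching the left-to-right shape in the figure.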

Page 18: Communication and Data Sharing  for Dynamic Distributed Systems

Configuration Maps

[Figure: a configuration map evolving over TIME, one row per snapshot]

c0
c0 c1
c0 c1 c2 . . . ck
± c1 c2 . . . ck
± ± c2 . . . ck
± ± ± c3 . . . ck . . .
± ± ± ± ± c c c c . . .

Page 19: Communication and Data Sharing  for Dynamic Distributed Systems

Reader-Writer Protocol

• One “gossip” message
– < World, value, tag, cmap, ns, nr >
• Message from a sender s to a receiver r is such that
– World is s’s set of participants, and r ∈ World
– value and tag are the object value and its tag at s
– cmap is the configuration map at s
– ns and nr are the sender’s and the best known receiver’s phase numbers, used to identify “fresh” messages
• These messages are
– Sent non-deterministically
– For performance analysis we impose an additional deterministic send policy
• Certain actions are taken when “enough” info is gathered

Page 20: Communication and Data Sharing  for Dynamic Distributed Systems

Read/Write Protocol

[Figure: nodes RAMBO_i, RAMBO_j, . . ., RAMBO_n, each containing Reader-Writer and Recon components that exchange gossip; labeled actions at node i are read_i, read-ack(v)_i, write(v)_i, write-ack_i, and new-config(c,k)_i]

Page 21: Communication and Data Sharing  for Dynamic Distributed Systems

Reader-Writer Code

[Figure: Reader-Writer automaton code, with transitions labeled Start read, Start write, End read, End write, New cfg, Receive, Send, Query fix, Prop fix]

Page 22: Communication and Data Sharing  for Dynamic Distributed Systems

The Phase Pattern

• Send to a collection of processes in “known” configs
• Collect responses and update configuration information
• Continue until a certain predicate is satisfied

[Figure: phase flowchart with steps Start, Send, Recv, Collect responses, and the test “Fixpoint reached?”; on “no” continue sending, on “yes” End]
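The loop structure of a phase can be sketched as a fixed-point computation. Everything here is a simplifying assumption: `run_phase` and `contact` are hypothetical names, and `contact(c)` stands in for "enough members of configuration c have replied", returning whatever newer configurations those replies reveal.

```python
def run_phase(initial_configs, contact):
    """Keep contacting known configurations until none is left uncontacted.

    contact(c) models collecting responses from configuration c and
    returns the set of configurations learned from those responses.
    """
    known = set(initial_configs)
    done = set()
    rounds = 0
    while known - done:  # fixed point not yet reached: continue sending
        rounds += 1
        for c in sorted(known - done):
            known |= contact(c)  # responses may reveal newer configurations
            done.add(c)
    return known, rounds


# Hypothetical discovery chain: c0's members know c1, c1's members know c2.
discovered = {"c0": {"c1"}, "c1": {"c2"}, "c2": set()}
known, rounds = run_phase({"c0"}, lambda c: discovered[c])
```

The phase ends only once every configuration it has heard of has been contacted, which is why learning about a new configuration mid-phase extends the phase.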

Page 23: Communication and Data Sharing  for Dynamic Distributed Systems

Read and Write Operations

• Reads and Writes use Query and Propagation phases involving known quorum configurations
– Query obtains information about “latest” operations from read quorums & updates configurations
– Propagation disseminates the results of “latest” operation to write quorums & updates configurations
• Fixed point must be reached -- discovery of new configurations requires new quorums to be reached

[Figure: timeline of a Read or Write operation: Start Query, End Query, Start Prop., End Prop.]
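The Query and Propagation phases can be sketched for the simplest case. This is not the RAMBO algorithm itself: it assumes a single fixed configuration, in-memory replicas accessed synchronously, and caller-chosen quorums, and it omits gossip, the fixed-point loop, and reconfiguration. The names `Replica`, `query`, `propagate`, `write`, and `read` are hypothetical.

```python
class Replica:
    """Holds one copy of the object together with its (seq, pid) tag."""

    def __init__(self):
        self.value, self.tag = None, (0, 0)

    def update(self, value, tag):
        if tag > self.tag:  # adopt only strictly newer information
            self.value, self.tag = value, tag


def query(replicas, read_quorum):
    """Query phase: return the value with the maximum tag in a read quorum."""
    best = max((replicas[i] for i in read_quorum), key=lambda r: r.tag)
    return best.value, best.tag


def propagate(replicas, write_quorum, value, tag):
    """Propagation phase: push a value-tag pair to a write quorum."""
    for i in write_quorum:
        replicas[i].update(value, tag)


def write(replicas, pid, value, read_quorum, write_quorum):
    _, (seq, _) = query(replicas, read_quorum)
    propagate(replicas, write_quorum, value, (seq + 1, pid))


def read(replicas, read_quorum, write_quorum):
    value, tag = query(replicas, read_quorum)
    propagate(replicas, write_quorum, value, tag)  # readers propagate too
    return value


replicas = {i: Replica() for i in (1, 2, 3)}
write(replicas, pid=1, value="v1", read_quorum={1, 2}, write_quorum={2, 3})
result = read(replicas, read_quorum={1, 2}, write_quorum={1, 2})
```

In this run the write reaches replicas 2 and 3 only; the read finds `v1` at replica 2 and, by propagating, also brings replica 1 up to date before returning.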

Page 24: Communication and Data Sharing  for Dynamic Distributed Systems

Reader-Writer: Send/Recv

Page 25: Communication and Data Sharing  for Dynamic Distributed Systems

Reader-Writer: Fixed Points

Page 26: Communication and Data Sharing  for Dynamic Distributed Systems

Why Readers Propagate

• If the readers do not propagate, atomicity can be easily violated:

[Figure: during a slow Write of v1, replicas hold v0, v0, v1; a Read of v1 is followed by a Read of v0]

Page 27: Communication and Data Sharing  for Dynamic Distributed Systems

Joining Protocol

[Figure: nodes RAMBO_i and RAMBO_j, each containing Joiner, Reader-Writer, and Recon components; labeled actions and messages are join(J)_i, join, join-ack, and gossip]

Page 28: Communication and Data Sharing  for Dynamic Distributed Systems

Garbage Collection

• When a process has the following configuration map cmap, it can garbage-collect configuration cmap(k) = c_k

± ± c_k c_{k+1} . . .

• Two-phase protocol using the “gossip” messages
– Update own tag & value by obtaining the “best” tag and value from a read- and write-quorum of cmap(k)
– Propagate tag & value to a write-quorum of cmap(k+1)
– Set cmap(k) to ±
• This “bootstraps” configuration k in case it is “too new”
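The two garbage-collection phases can be sketched as follows. The data layout is an assumption made for illustration: each configuration is reduced to one chosen read quorum and one chosen write quorum, `store` maps processor ids to `(tag, value)` pairs, and `garbage_collect` is a hypothetical function that runs both phases synchronously rather than through gossip.

```python
GC = "±"  # marker for a garbage-collected entry (hypothetical encoding)


def garbage_collect(cmap, k, store, local):
    """Retire configuration k once its state has been carried over to k+1.

    cmap:  index -> {"read_quorum": set, "write_quorum": set} or GC
    store: processor id -> (tag, value), with tag = (seq, pid)
    local: the collecting process's own (tag, value)
    """
    old, new = cmap[k], cmap[k + 1]
    # Phase 1: learn the best tag and value from a read- and a write-quorum of cmap(k).
    contacted = old["read_quorum"] | old["write_quorum"]
    best = max([local] + [store[i] for i in contacted], key=lambda tv: tv[0])
    # Phase 2: propagate that tag and value to a write-quorum of cmap(k+1).
    for i in new["write_quorum"]:
        if best[0] > store[i][0]:
            store[i] = best
    cmap[k] = GC
    return best


cmap = {
    0: {"read_quorum": {1, 2}, "write_quorum": {2, 3}},
    1: {"read_quorum": {4, 5}, "write_quorum": {5, 6}},
}
store = {1: ((0, 0), None), 2: ((2, 1), "x"), 3: ((1, 3), "w"),
         4: ((0, 0), None), 5: ((0, 0), None), 6: ((0, 0), None)}
best = garbage_collect(cmap, 0, store, local=((0, 0), None))
```

Once the latest value has been installed at a write quorum of the newer configuration, later operations no longer need to contact the old one, which is what makes marking it ± safe in this simplified setting.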

Page 29: Communication and Data Sharing  for Dynamic Distributed Systems

Reconfiguration

• Very simple protocol for Recon_i
– Reconfiguration is free of atomicity concerns
• Initiator i (multiple initiators are allowed)
– Accepts reconfiguration request recon(c,c’)_i from environment: reconfigure from c to c’
– If c is the locally-known “latest” configuration k-1, informs members of c of the reconfiguration
– Calls Paxos for k to decide on “next” configuration c’
– Informs Reader-Writer_i of the new configuration
• Participants i
– Learn about the initiation of reconfiguration
– Participate in Paxos
– Inform Reader-Writer_i of the new configuration

Page 30: Communication and Data Sharing  for Dynamic Distributed Systems

Latency Analysis

• Certain gossip and messages become “important”
– Messages to members of “active” configurations when read or write is performed
– Messages to configurations k and k+1 when garbage collection is performed
– Specific messages when joining and reconfiguring
– Responses to such messages
• Consider “good” timed executions
– Bounded message delay d
– 0 local processing time
– Environment is well-formed

Page 31: Communication and Data Sharing  for Dynamic Distributed Systems

Additional Assumptions

• These assumptions are used in some results
• Configuration-viability for time parameter e
– If c becomes “known” as configuration k anywhere
– Then either one read- and one write-quorum of c stays alive forever
– Or if by time t another configuration is decided upon by non-faulty members of c, then one read- and one write-quorum of c stays alive until t+e
• Reconfiguration-spacing for time parameter e
– recon(c,*)_i occurs at least e time after report(c)_i
• Join-connectivity for time parameter e
– If i and j join by time t then they learn about each other by time t+e

Page 32: Communication and Data Sharing  for Dynamic Distributed Systems

Latency Bounds (selected)

• Joining:
– 2d, provided “joiner” and “joinee” do not fail
• Reconfiguration:
– In 0-configuration-viable executions
– If recon(c,c’)_i action occurs by time t and no members of c fail after t, then recon-ack_i occurs at t+12d+
• Garbage-collection of c_k at non-faulty i:
– 4d, if R in read-quorums(c_k), W1 in write-quorums(c_k), and W2 in write-quorums(c_{k+1}) do not fail
• Read and write operations in “stable” systems
– If no reconfig-s in progress, then process with “up-to-date” config map completes its operation in 4d
• (These do not depend on “gossip”)

Page 33: Communication and Data Sharing  for Dynamic Distributed Systems

More Latency (1)

• These bounds depend on periodic gossip
• Learning new configurations
– If i and j are “old enough” and do not fail, then information from i is conveyed to j within time 2d
• Garbage-collection when reconfigurations are 6d-spaced and executions are 6d-configuration-viable
– If recon(c,*) occurs before t and c is “known” by t-6d then any non-faulty process that is “old enough” learns about c and garbage-collects any older configuration by time t+6d
– All non-faulty “old enough” processes have one or two defined configurations in their configuration maps

Page 34: Communication and Data Sharing  for Dynamic Distributed Systems

More Latency (2)

• Read and write operations (with periodic gossip)
– Complete in time 8d for non-faulty processes that are “old enough”, provided execution satisfies 12.1d-recon-spacing and 6d-configuration-viability
• Learning in failure-free executions
– Let J be the set of processes that joined by time t
1. Then by time t + log|J|, J ⊆ world_i for any i in J
2. If i in J “knows” a configuration at time t’, then any j in J learns about it by max(t + log|J|, t’) + 2d

Page 35: Communication and Data Sharing  for Dynamic Distributed Systems

Algorithmic Innovations

• Dynamic owners of data:
– Any and all owners may request reconfiguration
– the set of owners can be changed dynamically
• Dynamic configurations:
– Arbitrary configurations can be installed
– no constraints on intersection of quorum sets or member sets in distinct configurations.
• Loosely-coupled reconfiguration:
– Concurrent reads, writes and reconfiguration
– If finitely many reconfigurations occur during a read or write operation, then its completion does not depend on whether any reconfigurations complete

Page 36: Communication and Data Sharing  for Dynamic Distributed Systems

Algorithmics (cont’d)

• Efficient “steady-state”:
– Assuming bounded delays, infrequent reconfig-s, and periodic gossip, reads and writes complete in time that is a constant multiple of the message delay
– Assuming periodic garbage collection, readers/writers only deal with 1 or 2 configurations
• Fast “catch-up”:
– New “joiners” with out-of-date configurations can catch up after a logarithmic number of message exchanges provided the “joiners graph” is connected

Page 37: Communication and Data Sharing  for Dynamic Distributed Systems

Comparison with Other Approaches

• Paxos or a similar consensus service can be used to agree on global order of operations
– We only agree on the sequence of configurations
– Consensus termination impacts only Recon
– Reads/writes are not affected by consensus
• Group communication systems can also be used
– Our algorithm is “from scratch”: low-level send-receive, no hidden/relative costs
– Reads/writes work during “new view” establishment
• Dynamic quorums / dynamic configurations work
– We allow arbitrary new configurations - no static
• Our earlier work also solves this problem
– New work: concurrent recon-s and garbage-collect

Page 38: Communication and Data Sharing  for Dynamic Distributed Systems

Work in Progress and Futures

• Full-fledged implementation is under development
• Additional analysis in progress
– “Normal timing” starts at some point
– Trade-off between configuration-viability and garbage collection
– Analysis of “join-connectivity” graphs
• Algorithmic refinements
– Elimination of unnecessary communication
– Explicit “leave” protocol
– Gossip: “owners” vs. “users” of objects