
Communication in Distributed Systems

CS 230 Distributed Systems (with adapted slides/animations from Cambridge Univ, Petri Maaranen and Indranil Gupta)

Messaging in Distributed Systems

● Communication using messages
● Synchronous and asynchronous communication, e.g. RPC-based
● Message-Oriented Middleware (MOM)
  ● Messages are stored in message queues
  ● Message servers decouple client and server
● Various assumptions about message content

[Figure: a client application and a server application, each with local message queues, connected over the network through message servers that hold their own message queues.]

Middleware, cf: www.cl.cam.ac.uk/teaching/0910/ConcDistS/

Properties of MOM

Asynchronous interaction
● Client and server are only loosely coupled
● Messages are queued
● Good for application integration

Support for reliable delivery service
● Keep queues in persistent storage

Processing of messages by intermediate message server(s)
● May do filtering, transforming, logging, …
● Networks of message servers

Natural for database integration

Today: middleware for message queues and message brokers (IBM MQ Series, Java JMS)

Middleware, cf: www.cl.cam.ac.uk/teaching/0910/ConcDistS/
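The loose coupling that a message queue gives can be sketched in a few lines. This is only an illustration using Python's standard `queue.Queue` as a stand-in for a message server's queue; the names `run_demo`, `mailbox` and the message strings are made up for the example and do not correspond to any MOM product API.

```python
import queue
import threading


def run_demo():
    """Client enqueues requests and returns at once; the server drains them later."""
    mailbox = queue.Queue()      # stands in for a queue held by a message server
    handled = []

    def server():
        while True:
            msg = mailbox.get()  # blocks until a message is available
            if msg is None:      # sentinel: no more messages
                break
            handled.append(msg.upper())

    # The client sends before the server thread even exists:
    # the queue decouples the two in time.
    for msg in ["order-1", "order-2", "order-3"]:
        mailbox.put(msg)
    mailbox.put(None)

    worker = threading.Thread(target=server)
    worker.start()
    worker.join()
    return handled
```

A real MOM would also keep the queue in persistent storage so that messages survive a crash; the in-memory queue here does not.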

Generalizing communication

● Group communication
  ● Synchrony of messaging to multiple recipients is a critical issue
● Publish-subscribe systems
  ● A form of asynchronous messaging; sender and receiver need not know each other.

Group Communication

● Communication to a collection of processes (a process group)
● Group communication can be exploited to provide
  ● Simultaneous execution of the same operation in a group of workstations
  ● Software installation in multiple workstations
  ● Consistent network table management
● Who needs group communication?
  ● Reliable storage systems and databases, e.g. Cassandra
  ● Highly available servers: infrastructure control, financial applications
  ● Conferencing, online scoreboards and gaming leaderboards
  ● Cluster management, datacenters
  ● Distributed logging, …

Group communication - Types

● Peer
  ● All members are equal
  ● All members send messages to the group
  ● All members receive all the messages
● Client-Server
  ● Common communication pattern
  ● Replicated servers
  ● Client may or may not care which server answers
● Diffusion group
  ● Servers send to other servers and clients
● Hierarchical
  ● Highly and easily scalable

[Figure: servers and clients]

Message Passing Basics

● A system is said to be asynchronous if there is no fixed upper bound on how long it takes a message to be delivered or how much time elapses between consecutive steps
● Point-to-point messages (unicast)
  ● snd_i(m)
  ● rcv_i(m, j)
● Group communication
  ● Broadcast
    ● one-to-all relationship
  ● Multicast
    ● one-to-many relationship
    ● A variation of broadcast where an object can target its messages to a specified subset of objects

Using Traditional Transport Protocols

● TCP/IP
  ● Automatic flow control, reliable delivery, connection service, complexity
    • linear degradation in performance
● Unreliable broadcast/multicast
  ● UDP, IP-multicast - assumes h/w support
  ● Message losses high (30%) during heavy load
    • Reliable IP-multicast very expensive

Demo: Unicast vs. Multicast

Modeling Message Passing Systems

● A system consists of n objects a_0, …, a_{n-1}
● Each object a_i is modeled as a (possibly infinite) state machine with state set Q_i
● The edges incident on a_i are labeled arbitrarily with integers 1 through r, where r is the degree of a_i
● Each state of a_i contains 2r special components, outbuf_i[l] and inbuf_i[l], for every 1 ≤ l ≤ r
● A configuration is a vector C = (q_0, …, q_{n-1}), where q_i is the state of a_i

[Figure: a graph of four objects a_0 through a_3; each object labels its incident edges with integers starting from 1.]

Group Communication Issues

● Ordering and Delivery Guarantees
● Membership
● Failure

Ordering Service

● Unordered
● Single-Source FIFO (SSF)
● Causally Ordered
● Totally Ordered
● Hybrid
  ● SSF + Total
  ● Causal + Total

Single-source FIFO ordering

• Multicasts from each sender are received in the order they are sent, at all receivers
• Don’t worry about multicasts from different senders
• Formally
  • For all messages m1, m2 and all objects a_i, a_j, if a_i sends m1 before it sends m2, then m2 is not received at a_j before m1 is
  • If a correct process issues (sends) multicast(g,m) to group g and then multicast(g,m’), then every correct process that delivers m’ would already have delivered m.

Single-source FIFO Ordering

[Figure: timelines of processes P1 to P4. P1 multicasts M1:1 and then M1:2; P3 multicasts M3:1.]

• M1:1 and M1:2 should be received in that order at each receiver
• Order of delivery of M3:1 and M1:2 could be different at different receivers

Causal Ordering

• Multicasts whose send events are causally related must be received in the same causality-obeying order at all receivers
• Formally
  – For all messages m1, m2 and all objects a_i, a_j, if m1 happens before m2, then m2 is not received at a_i before m1 is
  – If multicast(g,m) → multicast(g,m’) then any correct process that delivers m’ would already have delivered m, where → is Lamport’s happens-before relation

Causal Ordering: Example

[Figure: timelines of processes P1 to P4. P1 multicasts M1:1; P3 multicasts M3:1 and then M3:2; P2 multicasts M2:1.]

• M3:1 → M3:2, and so should be received in that order at each receiver
• M1:1 → M3:1, and so should be received in that order at each receiver
• M3:1 and M2:1 are concurrent and thus ok to be received in different orders at different receivers

Causal vs. FIFO

• Causal Ordering => FIFO Ordering
• Why?
  – If two multicasts M and M’ are sent by the same process P, and M was sent before M’, then M → M’
  – Then a multicast protocol that implements causal ordering will obey FIFO ordering since M → M’
• Reverse is not true! FIFO ordering does not imply causal ordering.
• A variety of systems implement causal ordering: social networks, bulletin boards, comments on websites, etc.

Total Ordering

• Also known as “Atomic Broadcast”
• Unlike FIFO and causal, this does not pay attention to order of multicast sending
• Ensures all receivers receive all multicasts in the same order
• Formally
  – For all messages m1, m2 and all objects a_i, a_j, if m1 is received at a_i before m2 is, then m2 is not received at a_j before m1 is.
  – If a correct process P delivers message m before m’ (independent of the senders), then any other correct process P’ that delivers m’ would already have delivered m.
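The total-order condition can be checked mechanically on recorded delivery logs. The sketch below is an illustration only: the function name `is_totally_ordered` and the log format (a dict from process name to its list of delivered message ids, with no repeats) are assumptions made for the example. It compares only messages that both processes delivered, so it does not check whether every process eventually delivers everything.

```python
from itertools import combinations


def is_totally_ordered(deliveries):
    """deliveries maps a process name to the list of message ids it delivered, in order."""
    for log_a, log_b in combinations(deliveries.values(), 2):
        pos_b = {m: i for i, m in enumerate(log_b)}
        # Messages delivered by both processes, in the order the first one saw them.
        common = [m for m in log_a if m in pos_b]
        for m1, m2 in combinations(common, 2):
            # m1 precedes m2 at the first process, so it must also precede it at the second.
            if pos_b[m1] > pos_b[m2]:
                return False
    return True
```

For instance, two processes that both deliver M1:1, M2:1, M3:1 in that order pass the check, while a pair that delivers M1:1 and M2:1 in opposite orders does not.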

Total Ordering: Example

[Figure: timelines of processes P1 to P4. P1 multicasts M1:1, P2 multicasts M2:1, and P3 multicasts M3:1 and then M3:2.]

• The order of receipt of multicasts is the same at all processes: M1:1, then M2:1, then M3:1, then M3:2
• May need to delay delivery of some messages

Hybrid Variants

• Since FIFO/Causal are orthogonal to Total, can have hybrid ordering protocols too
  – FIFO-total hybrid protocol satisfies both FIFO and total orders
  – Causal-total hybrid protocol satisfies both Causal and total orders

FIFO Multicast: Implementation

Data Structures

Each receiver maintains a per-sender sequence number (integers)
– Processes P1 through PN
– Pi maintains a vector of sequence numbers Pi[1…N] (initially all zeroes)
– Pi[j] is the latest sequence number Pi has received from Pj

Update Rules

• Send multicast at process Pj:
  – Set Pj[j] = Pj[j] + 1
  – Include new Pj[j] in multicast message as its sequence number
• Receive multicast: If Pi receives a multicast from Pj with sequence number S in message
  – if (S == Pi[j] + 1) then
    • deliver message to application
    • Set Pi[j] = Pi[j] + 1
  – else buffer this multicast until above condition is true
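The receive rule above can be sketched as a small class. This is a minimal illustration, not the course's reference code: the class name `FifoReceiver`, the 0-based process indices (the slides number processes from 1) and the duplicate check are assumptions added for the example.

```python
class FifoReceiver:
    """Receiver-side state for FIFO multicast among processes 0..n-1."""

    def __init__(self, n):
        self.seq = [0] * n    # seq[j]: latest sequence number delivered from sender j
        self.buffer = {}      # (sender, seq) -> payload, held until deliverable
        self.delivered = []   # payloads handed to the application, in order

    def receive(self, sender, s, payload):
        if s <= self.seq[sender]:
            return            # duplicate of something already delivered
        self.buffer[(sender, s)] = payload
        # Deliver the next expected message from this sender, then any
        # buffered successors that have now become deliverable.
        while (sender, self.seq[sender] + 1) in self.buffer:
            self.seq[sender] += 1
            self.delivered.append(self.buffer.pop((sender, self.seq[sender])))
```

The sender side is just a counter: increment the sender's own entry and attach the new value to the outgoing multicast.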

FIFO Ordering: Example

[Figure, built up over several animation steps: timelines of processes P1 to P4, each starting with vector [0,0,0,0].]

• P1 multicasts <P1, seq: 1> and its vector becomes [1,0,0,0]. Receivers that get it next in sequence update to [1,0,0,0] and deliver.
• P1 multicasts <P1, seq: 2> and its vector becomes [2,0,0,0]. A receiver still at [0,0,0,0] buffers it. When <P1, seq: 1> arrives, that receiver delivers it (vector [1,0,0,0]), then delivers the buffered <P1, seq: 2> and updates to [2,0,0,0].
• P3 multicasts <P3, seq: 1> and its vector becomes [2,0,1,0]. Every receiver delivers it on arrival, including one that is still at [1,0,0,0] (it moves to [1,0,1,0] and later to [2,0,1,0]), because FIFO ordering only constrains multicasts from the same sender.
Total Ordering: Sequencer-based Approach

• All receivers receive all multicasts in the same order
• Special process elected as leader or sequencer
• Send multicast at process Pi:
  – Send multicast message M to group and sequencer
• Sequencer:
  – Maintains a global sequence number S (initially 0)
  – When it receives a multicast message M, it sets S = S + 1, and multicasts <M, S>
• Receive multicast at process Pi:
  – Pi maintains a local received global sequence number Si (initially 0)
  – If Pi receives a multicast M from Pj, it buffers it until both
    1. Pi receives <M, S(M)> from sequencer, and
    2. Si + 1 = S(M)
  – Then deliver the message to the application and set Si = Si + 1
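The sequencer-based approach can be sketched with two small classes. This is an illustration under simplifying assumptions: message ids such as "m1" are hypothetical, the class and method names are invented for the example, and the network is replaced by direct method calls.

```python
class Sequencer:
    """Assigns consecutive global sequence numbers to multicast ids."""

    def __init__(self):
        self.s = 0

    def order(self, msg_id):
        self.s += 1
        return (msg_id, self.s)          # the <M, S> announcement


class TotalOrderReceiver:
    def __init__(self):
        self.si = 0                      # last global sequence number delivered
        self.payloads = {}               # msg_id -> payload, received from the sender
        self.order = {}                  # global seq -> msg_id, received from the sequencer
        self.delivered = []

    def on_multicast(self, msg_id, payload):
        self.payloads[msg_id] = payload
        self._try_deliver()

    def on_sequencer(self, msg_id, s):
        self.order[s] = msg_id
        self._try_deliver()

    def _try_deliver(self):
        # Deliver while the next global number is known and its message has arrived.
        while self.si + 1 in self.order and self.order[self.si + 1] in self.payloads:
            self.si += 1
            self.delivered.append(self.payloads.pop(self.order.pop(self.si)))
```

A receiver that gets the second multicast first simply holds it until both the first multicast and its sequencer announcement have arrived, so all receivers end up with the same delivery order.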

Causal Multicast: Implementation

Multicasts whose send events are causally related must be received in the same causality-obeying order at all receivers

Data Structures

Each receiver maintains a vector of per-sender sequence numbers (integers)
• Similar to FIFO Multicast, but updating rules are different
• Processes P1 through PN
• Pi maintains a vector Pi[1…N] (initially all zeroes)
• Pi[j] is the latest sequence number Pi has received from Pj

Causal Multicast: Updating Rules

• Send multicast at process Pj:
  – Set Pj[j] = Pj[j] + 1
  – Include new entire vector Pj[1…N] in multicast message as its sequence number
• Receive multicast: If Pi receives a multicast from Pj with vector M[1…N] (= Pj[1…N]) in message, buffer it until both:
  1. This message is the next one Pi is expecting from Pj, i.e.,
     • M[j] = Pi[j] + 1
  2. All multicasts, anywhere in the group, which happened-before M have been received at Pi, i.e.,
     • For all k ≠ j: M[k] ≤ Pi[k]
     • i.e., Receiver satisfies causality
• When the above two conditions are satisfied, deliver M to application and set Pi[j] = M[j]
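The two delivery conditions can be sketched as follows. This is an illustrative sketch only: the class name `CausalProcess`, the 0-based process ids and the tuple message format are assumptions for the example, and a process is assumed never to receive its own multicast.

```python
class CausalProcess:
    """One group member running vector-based causal multicast (ids 0..n-1)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n        # v[j]: latest sequence number delivered from process j
        self.pending = []       # buffered (sender, vector, payload) messages
        self.delivered = []

    def multicast(self, payload):
        self.v[self.pid] += 1
        return (self.pid, list(self.v), payload)   # message carries a copy of the vector

    def _deliverable(self, sender, m):
        next_from_sender = m[sender] == self.v[sender] + 1
        causality_ok = all(m[k] <= self.v[k] for k in range(len(m)) if k != sender)
        return next_from_sender and causality_ok

    def receive(self, msg):
        self.pending.append(msg)
        progress = True
        while progress:         # one delivery may unblock other buffered messages
            progress = False
            for sender, m, payload in list(self.pending):
                if self._deliverable(sender, m):
                    self.v[sender] = m[sender]
                    self.delivered.append(payload)
                    self.pending.remove((sender, m, payload))
                    progress = True
```

A process that receives multicasts from P2 and P4 before the P1 multicast they both depend on will buffer both, and deliver all three once P1's multicast arrives.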

Causal Ordering: Example

[Figure, built up over several animation steps: timelines of processes P1 to P4, each starting with vector [0,0,0,0].]

• P1 multicasts with vector [1,0,0,0]. The receivers that get it update to [1,0,0,0] and deliver.
• A multicast is then sent with vector [1,1,0,0], i.e. by P2 after it has delivered P1's multicast. A receiver that has already delivered P1's multicast delivers it. A receiver that has not yet received P1's multicast is missing 1 from P1 and buffers it.
• A multicast is sent with vector [1,0,0,1], i.e. by P4 after it has delivered P1's multicast. Receivers that satisfy causality deliver it; the receiver that is still missing 1 from P1 buffers it as well.
• When P1's multicast finally reaches that receiver, it delivers P1's multicast, finds that causality is now satisfied for the buffered multicasts, and delivers P2's buffered multicast and P4's buffered multicast.
More Delivery Options

• Agreed Delivery
  ● Guarantees total order of message delivery and allows a message to be delivered as soon as all of its predecessors in the total order have been delivered.
• Safe Delivery
  ● Requires in addition that, if a message is delivered by the GC to any of the processes in a configuration, this message has been received and will be delivered to each of the processes in the configuration unless it crashes.

Reliable Group Communication

• Reliable Multicast
  – Every process in the group receives all multicasts
• What happens with failures?
  – First identify fault model
    • Message omission and delay
      – Discover message omission and recover lost messages
    • Processor crashes and recoveries
    • Network partitions and re-merges

Failure Model: Assumptions

● Assume that faults do not corrupt messages (or that message corruption can be detected)
  ● Most systems do not deal with Byzantine behavior
● Faults are detected using an unreliable fault detector, based on a timeout mechanism
● Note: Reliability is orthogonal to ordering
  ● Can implement Reliable-FIFO, or Reliable-Causal, or Reliable-Total, or Reliable-Hybrid protocols

GC Concept: Membership

Messages addressed to the group are received by all group members
● Each member/process maintains a membership list or View
● An update to the membership list is called a View Change
  ● Process join, leave, or failure
● If processes are added to a group or deleted from it (due to process crash, changes in the network or the user's preference), the change must be reported to all active group members, while keeping consistency among them
● Every message is delivered in the context of a certain configuration, which is not always accurate. However, we may want to guarantee some properties (GC properties)...

GC Properties

● Atomic Multicast
  ● Message is delivered to all processes or to none at all. May also require that messages are delivered in the same order to all processes.
● Failure Atomicity
  ● Failures do not result in incomplete delivery of multicast messages or holes in the causal delivery order
● Uniformity
  ● A view change reported to a member is reported to all other members
● Liveness
  ● A machine that does not respond to messages sent to it is removed from the local view of the sender within a finite amount of time.

Virtual Synchrony

Preserve multicast ordering and reliability in spite of failures
● Combines a membership protocol with a multicast protocol
● Introduced in ISIS System (Cornell Univ.)
  ● Orders group membership changes along with the regular messages
  ● Users: NYSE, French Air Traffic Control System, Swiss Stock Exchange
● Ensures that failures do not result in incomplete delivery of multicast messages or holes in the causal delivery order (failure atomicity)
● Ensures that, if two processes observe the same two consecutive membership changes, they receive the same set of regular multicast messages between the two changes
● A view change acts as a barrier across which no multicast can pass
● Does not constrain the behavior of faulty or isolated processes


More Interesting GC Properties

● There exists a mapping k from the set of messages appearing in all rcv_i(m) for all i, to the set of messages appearing in snd_i(m) for all i, such that each message m in a rcv() is mapped to a message with the same content appearing in an earlier snd() and:
  ● Integrity
    ● k is well defined, i.e. every message received was previously sent.
  ● No Duplicates
    ● k is one-to-one, i.e. no message is received more than once
  ● Liveness
    ● k is onto, i.e. every message sent is received
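The three properties of the mapping k can be checked on a finite recorded trace. The sketch below is a simplification made for illustration: it assumes a single receiver, unique message ids, and a trace that has run to completion (liveness is an eventual property, so on a partial trace it can only be approximated). The function name `check_trace` and the event format are hypothetical.

```python
def check_trace(events):
    """events: ordered list of ("snd", m) or ("rcv", m) with unique message ids m."""
    sent = set()
    received = []
    integrity = True
    for kind, m in events:
        if kind == "snd":
            sent.add(m)
        else:
            if m not in sent:        # received without an earlier send
                integrity = False
            received.append(m)
    return {
        "integrity": integrity,                                 # k is well defined
        "no_duplicates": len(received) == len(set(received)),   # k is one-to-one
        "liveness": sent <= set(received),                      # k is onto
    }
```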

Reliability Service

● A service is reliable (in the presence of f faults) if there exists a partition of the object indices into faulty and non-faulty such that there are at most f faulty objects and the mapping k satisfies:
  ● Integrity
  ● No Duplicates
    ● No message is received more than once at any single object
  ● Liveness
    ● Non-faulty liveness
      • When restricted to non-faulty objects, k is onto, i.e. all messages broadcast by a non-faulty object are eventually received by all non-faulty objects
    ● Faulty liveness
      • Every message sent by a faulty object is either received by all non-faulty objects or by none of them

Faults and Partitions

● When we detect a processor P from which we have not heard for a certain timeout, we issue a fault message
● When we get a fault message, we adopt it (and issue our copy)
● Problem: maybe P is only slow
● When a partition occurs, we cannot always completely determine who received which messages (there is no solution to this problem)
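A timeout-based detector, and the reason it is unreliable, can be sketched briefly. This is a hypothetical illustration: the class name `TimeoutFaultDetector` and its methods are invented for the example, and the current time is passed in explicitly rather than read from a clock so the behaviour is easy to follow.

```python
class TimeoutFaultDetector:
    """Suspects any process not heard from within `timeout` time units."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heard = {}            # process name -> time of last message

    def heard_from(self, process, now):
        self.last_heard[process] = now

    def suspects(self, now):
        return {p for p, t in self.last_heard.items() if now - t > self.timeout}
```

If P's last message was at time 0 and the timeout is 5, P is suspected at time 6. A message from P at time 7 clears the suspicion, which is exactly the "maybe P is only slow" case: the earlier fault report was wrong.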

Extended virtual synchrony

● Failures
  ● Processes can fail and recover
  ● Networks can partition and remerge
● Virtual synchrony handles recovered processes as new processes
  ● Can cause inconsistencies with network partitions
● Network partitions are real
  ● Gateways, bridges, wireless communication
● Extended VS (introduced in Totem)
  ● Does not solve all the problems of recovery in fault-tolerant distributed systems, but avoids inconsistencies

Extended Virtual Synchrony Model

● Network may partition into a finite number of components
  ● Two or more may merge to form a larger component
● Each membership with a unique identifier is a configuration.
● Membership ensures that all processes in a configuration agree on the membership of that configuration

Regular and Transitional Configurations

● To achieve safe delivery with partitions and remerges, the EVS model defines:
  ● Regular Configuration
    ● New messages are broadcast and delivered
    ● Sufficient for FIFO and causal communication modes
  ● Transitional Configuration
    ● No new messages are broadcast, only remaining messages from prior regular configuration are delivered.
● A regular configuration may be followed and preceded by several transitional configurations.

Configuration change

● A process in a regular or transitional configuration can deliver a configuration change message such that it
  • Follows delivery of every message in the terminated configuration and precedes delivery of every message in the new configuration.
● Algorithm for determining transitional configuration
  ● When a membership change is identified
    • Regular configuration members (that are still connected) start exchanging information
    • If another membership change is spotted (e.g. failure cascade), this process is repeated all over again.
    • Upon reaching a decision (on members and messages), the process delivers a transitional configuration message to members with the agreed list of messages.
    • After delivery of all messages, the new configuration is delivered.

Totem

● Provides a reliable totally ordered multicast service over LAN
● Intended for complex applications in which fault-tolerance and soft real-time performance are critical
  ● High throughput and low predictable latency
  ● Rapid detection of, and recovery from, faults
  ● System-wide total ordering of messages
  ● Scalable via hierarchical group communication
  ● Exploits hardware broadcast to achieve high performance
● Provides 2 delivery services
  ● Agreed
  ● Safe
● Uses timestamps to ensure total order and sequence numbers to ensure reliable delivery

ISIS

● Tightly coupled distributed system developed over loosely coupled processors
● Provides a toolkit mechanism for distributed programming, whereby a DS is built by interconnecting fairly conventional non-distributed programs, using tools drawn from the kit
● Defines
  ● how to create, join and leave a group
  ● group membership
  ● virtual synchrony
● Initially point-to-point (TCP/IP)
● Fail-stop failure model

Horus

● Aims to provide a very flexible environment to configure a group of protocols specifically adapted to the problem at hand
● Provides efficient support for virtual synchrony
● Replaces point-to-point communication with group communication as the fundamental abstraction, which is provided by stacking protocol modules that have a uniform (upcall, downcall) interface
● Not every combination of protocol blocks makes sense
● HCPI - Horus Common Protocol Interface for protocol composition
  ● Stability of messages
  ● Membership
● Electra
  ● CORBA-compliant interface
  ● Method invocation transformed into multicast


Transis

● How different components of a partition network can operate autonomously and then merge operations when they become reconnected ?

● Are different protocols for fast-local and slower-cluster communication needed ?

● A large-scale multicast service designed with the following goals● Tackling network partitions and providing tools for recovery from them● Meeting needs of large networks through hierarchical communication● Exploiting fast-clustered communication using IP-Multicast

● Communication modes
  ● FIFO
  ● Causal
  ● Agreed
  ● Safe
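To make the weakest of these modes concrete, here is a minimal sketch of FIFO delivery: messages from the same sender are delivered in the order they were sent, with no ordering guarantee across senders. It is not the Transis implementation; the class `FifoReceiver`, the per-sender sequence numbers, and the hold-back buffer are assumptions chosen only for illustration.

```python
from collections import defaultdict


class FifoReceiver:
    """Delivers each sender's messages in the order they were sent (hypothetical sketch)."""

    def __init__(self):
        self.next_seq = defaultdict(int)  # sender -> next expected sequence number
        self.held = defaultdict(dict)     # sender -> {seq: payload} that arrived too early
        self.delivered = []               # (sender, payload) in delivery order

    def receive(self, sender, seq, payload):
        if seq < self.next_seq[sender]:
            return  # duplicate of a message that was already delivered
        self.held[sender][seq] = payload
        # Deliver while the next expected message from this sender is buffered.
        while self.next_seq[sender] in self.held[sender]:
            expected = self.next_seq[sender]
            self.delivered.append((sender, self.held[sender].pop(expected)))
            self.next_seq[sender] = expected + 1


receiver = FifoReceiver()
receiver.receive("p1", 1, "b")  # arrives early, so it is held back
receiver.receive("p2", 0, "x")  # delivered immediately
receiver.receive("p1", 0, "a")  # releases "a" and then the held "b"
print(receiver.delivered)       # [('p2', 'x'), ('p1', 'a'), ('p1', 'b')]
```

The stronger modes add constraints on top of this: causal delivery also respects the happened-before relation across senders, agreed delivery imposes one total order at all receivers, and safe delivery additionally waits until the message is known to have reached the other members.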


Summary and Future

Summary: The ordering of multicasts and the delivery options affect the correctness of distributed systems that use multicasts

Other Challenges
● Secure group communication architecture
● New systems - big data, data centers
● New applications - social media, IoT, mobile
● New needs - secure group communication

● Next Generations
  ● Spread
  ● Ensemble
  ● MaelStrom, Ricochet - for cloud data centers

● Wireless networks
● VSync - ISIS2 (VS + Paxos): https://www.youtube.com/watch?v=3o81K1olx0Q


Distributed Publish/Subscribe

Nalini Venkatasubramanian (with slides from Roberto Baldoni, Pascal Felber, Hojjat Jafarpour etc.)


Hojjat Jafarpour, CCD: Efficient Customized Content Dissemination in Distributed Pub/Sub

Publish/Subscribe (pub/sub) systems

[Figure: publishers and subscribers connected through a Pub/Sub Service]

Subscriptions:
Stock ( Name=‘IBM’; Price < 100 ; Volume > 10000 )
Stock ( Name=‘IBM’; Price < 110 ; Volume > 10000 )
Stock ( Name=‘HP’; Price < 50 ; Volume > 1000 )
Football ( Team=‘USC’; Event=‘Touch Down’ )

Publication:
Stock ( Name=‘IBM’; Price = 95 ; Volume = 50000 )

■ What is Publish/Subscribe (pub/sub)?
• Asynchronous communication
• Selective dissemination
• Push model
• Decoupling publishers and subscribers
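The selective dissemination in the stock example above can be sketched as content-based matching. This is only an illustration: the tuple encoding of subscriptions, the subscription ids `s1` to `s4`, and the function `matches` are assumptions, not part of any particular pub/sub system.

```python
import operator

# Comparison operators allowed in a subscription constraint.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def matches(subscription, event):
    """Return True if the event satisfies every constraint of the subscription."""
    topic, constraints = subscription
    event_topic, attrs = event
    if topic != event_topic:
        return False
    return all(
        name in attrs and OPS[op](attrs[name], value)
        for name, op, value in constraints
    )


# Hypothetical encoding of the four subscriptions from the slide.
subscriptions = {
    "s1": ("Stock", [("Name", "=", "IBM"), ("Price", "<", 100), ("Volume", ">", 10000)]),
    "s2": ("Stock", [("Name", "=", "IBM"), ("Price", "<", 110), ("Volume", ">", 10000)]),
    "s3": ("Stock", [("Name", "=", "HP"), ("Price", "<", 50), ("Volume", ">", 1000)]),
    "s4": ("Football", [("Team", "=", "USC"), ("Event", "=", "Touch Down")]),
}

# The publication from the slide.
event = ("Stock", {"Name": "IBM", "Price": 95, "Volume": 50000})

recipients = [sid for sid, sub in subscriptions.items() if matches(sub, event)]
print(recipients)  # ['s1', 's2']
```

Only the two IBM subscriptions receive the publication; the publisher never needs to know who the subscribers are, which is the decoupling listed above.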



Publish/Subscribe (pub/sub) systems

● Applications:
  ● News alerts
  ● Online stock quotes
  ● Internet games
  ● Sensor networks
  ● Location-based services
  ● Network management
  ● Internet auctions
  ● …


Scalable Publish/Subscribe Architectures & Algorithms (P. Felber)

Publish/subscribe architectures

● Centralized
  ● Single matching engine
  ● Limited scalability
● Broker overlay
  ● Multiple P/S brokers
  ● Participants connected to some broker
  ● Events routed through overlay
● Peer-to-peer
  ● Publishers & subscribers connected in P2P network
  ● Participants collectively filter/route events, can be both producer & consumer
● …


Distributed pub/sub systems

● Broker-based pub/sub
  ● A set of brokers forming an overlay
  ● Clients use the system through brokers
● Benefits
  • Scalability, fault tolerance, cost efficiency

Dissemination Tree


Challenges in distributed pub/sub systems

Broker overlay architecture
• How to form the broker network
• How to route subscriptions and publications

Broker internal operations
• Subscription management
  • How to store subscriptions in brokers
• Content matching in brokers
  • How to match a publication against subscriptions

Broker Responsibility
• Subscription management
• Matching: determining the recipients for an event
• Routing: delivering a notification to all the recipients


MINEMA Summer School - Klagenfurt (Austria) July 11-15, 2005

EVENT vs SUBSCRIPTION ROUTING

● Extreme solutions
  ● Sol 1 (event flooding)
    ● Events are flooded through the notification event box
    ● Each subscription is stored in only one place within the notification event box
    ● The number of matching operations equals the number of brokers
  ● Sol 2 (subscription flooding)
    ● Each subscription is stored at every place within the notification event box
    ● Each event is matched directly at the broker where it enters the notification event box
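The trade-off between the two extremes can be sketched with a toy overlay of three brokers. Everything here is an assumption for illustration (the `Broker` class, the broker names `b0` to `b2`, the subscribers `alice` and `bob`, and counting one matching operation per broker that evaluates the event); it is a sketch of the idea, not of any real system's routing.

```python
class Broker:
    def __init__(self, name):
        self.name = name
        self.subscriptions = {}  # subscriber id -> predicate over events

    def match(self, event):
        """One matching operation: test the event against every local subscription."""
        return sorted(s for s, pred in self.subscriptions.items() if pred(event))


def build_overlay(names, home_of, subs, flood_subscriptions):
    """Store each subscription at its home broker only, or at every broker."""
    brokers = {name: Broker(name) for name in names}
    for sid, pred in subs.items():
        targets = names if flood_subscriptions else [home_of[sid]]
        for name in targets:
            brokers[name].subscriptions[sid] = pred
    return brokers


def event_flooding(brokers, event):
    """Sol 1: the event reaches every broker and is matched at each of them."""
    recipients, matching_ops = [], 0
    for broker in brokers.values():
        recipients.extend(broker.match(event))
        matching_ops += 1
    return sorted(recipients), matching_ops


def subscription_flooding(brokers, entry, event):
    """Sol 2: the event is matched once, at the broker where it enters."""
    return brokers[entry].match(event), 1


names = ["b0", "b1", "b2"]
subs = {"alice": lambda e: e["price"] < 100, "bob": lambda e: e["price"] < 50}
home_of = {"alice": "b0", "bob": "b2"}
event = {"price": 95}

sol1 = build_overlay(names, home_of, subs, flood_subscriptions=False)
sol2 = build_overlay(names, home_of, subs, flood_subscriptions=True)

print(event_flooding(sol1, event))               # (['alice'], 3)
print(subscription_flooding(sol2, "b1", event))  # (['alice'], 1)
print(sum(len(b.subscriptions) for b in sol1.values()),
      sum(len(b.subscriptions) for b in sol2.values()))  # 2 6
```

Both solutions find the same recipient. Event flooding pays with one matching operation per broker for every event, while subscription flooding pays with one stored copy of every subscription per broker.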


Major distributed pub/sub approaches

● Tree-based:
  ● Brokers form a tree overlay [SIENA, PADRES, GRYPHON]
● DHT-based:
  ● Brokers form a structured P2P overlay [Meghdoot, Baldoni et al.]
● Channel-based:
  ● Multiple multicast groups [Phillip Yu et al.]
● Probabilistic:
  ● Unstructured overlay [Picco et al.]


Extra Slides


Horus

A Flexible Group Communication Subsystem


Horus: A Flexible Group Communication System

● Offers a flexible group communication model to application developers.

1. System interface
2. Properties of the protocol stack
3. Configuration of Horus
  ● Run in user space
  ● Run in OS kernel/microkernel


Architecture

● Central protocol => Lego blocks
● Each Lego block implements a communication feature.
● Standardized top and bottom interface (HCPI)
  ● Allows blocks to communicate
  ● A block has entry points for upcalls and downcalls
  ● Upcall = receive message, downcall = send message
● Create a new protocol by rearranging blocks.


Message_send

● Looks up the entry in the topmost block and invokes the function.
● The function adds a header.
● Message_send is recursively sent down the stack.
● The bottommost block invokes a driver to send the message.
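The send path described above can be sketched as follows. This is a rough illustration and not the Horus API: the classes `Block`, `Driver` and `Application`, the helper `build_stack`, and the block names `ORDER`, `MEMBERSHIP` and `FRAG` are all hypothetical. Each block pushes its header on a downcall and pops it again on the matching upcall at the receiver.

```python
class Block:
    """A protocol block with a uniform downcall/upcall interface (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.above = None
        self.below = None

    def downcall(self, message):
        message["headers"].append(self.name)  # push this block's header
        self.below.downcall(message)          # pass the message further down

    def upcall(self, message):
        header = message["headers"].pop()     # pop the header added by the peer block
        if header != self.name:
            raise ValueError(f"{self.name} got header {header}")
        self.above.upcall(message)            # pass the message further up


class Driver:
    """Bottom of the stack; stands in for the network driver."""

    def __init__(self):
        self.above = None
        self.wire = []  # messages handed to the "network"

    def downcall(self, message):
        self.wire.append(message)

    def upcall(self, message):
        self.above.upcall(message)


class Application:
    """Top of the stack; collects delivered payloads."""

    def __init__(self):
        self.below = None
        self.inbox = []

    def upcall(self, message):
        self.inbox.append(message["payload"])


def build_stack(block_names):
    app, driver = Application(), Driver()
    chain = [app] + [Block(n) for n in block_names] + [driver]
    for upper, lower in zip(chain, chain[1:]):
        upper.below, lower.above = lower, upper
    return app, driver


def message_send(app, payload):
    app.below.downcall({"headers": [], "payload": payload})


layout = ["ORDER", "MEMBERSHIP", "FRAG"]  # illustrative block names only
sender_app, sender_driver = build_stack(layout)
receiver_app, receiver_driver = build_stack(layout)

message_send(sender_app, "hello")
on_wire = sender_driver.wire[0]
wire_headers = list(on_wire["headers"])   # copy before the receiver pops them
receiver_driver.upcall(on_wire)           # simulate arrival at the receiver

print(wire_headers)        # ['ORDER', 'MEMBERSHIP', 'FRAG']
print(receiver_app.inbox)  # ['hello']
```

Because every block exposes the same two entry points, a different protocol is obtained just by passing a different list to `build_stack`, provided sender and receiver use the same arrangement.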


● Stacks are shielded from one another.
● Each stack has its own threads and memory scheduler.


Endpoints, Group, and Message Objects

● Endpoint
  ● Models the communicating entity
  ● Has an address (used for membership); sends and receives messages
● Group
  ● Maintains local state on an endpoint
  ● Group address: the address to which messages are sent
  ● View: list of destination endpoint addresses of the accessible group members
● Message
  ● Local storage structure
  ● Interface includes operations to push/pop headers
  ● Passed by reference
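One way to picture how these three objects relate is the sketch below. It assumes simplified, hypothetical Python classes (`Endpoint`, `Group`, `Message`) and a plain dictionary `directory` that maps addresses to endpoints; none of these names come from Horus itself.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Local storage structure with push/pop operations on headers."""
    payload: str
    headers: list = field(default_factory=list)

    def push_header(self, header):
        self.headers.append(header)

    def pop_header(self):
        return self.headers.pop()


@dataclass
class Endpoint:
    """The communicating entity: has an address and receives messages."""
    address: str
    inbox: list = field(default_factory=list)

    def receive(self, message):
        self.inbox.append(message)


@dataclass
class Group:
    """Local group state held on an endpoint: group address plus current view."""
    address: str
    view: list = field(default_factory=list)  # addresses of accessible members

    def send(self, message, directory):
        # Deliver to every member in the current view; the same object is
        # handed to each receiver, i.e. the message is passed by reference.
        for member in self.view:
            directory[member].receive(message)


a, b, c = Endpoint("a"), Endpoint("b"), Endpoint("c")
directory = {e.address: e for e in (a, b, c)}

group = Group(address="g1", view=["a", "b"])  # "c" is not currently accessible
msg = Message("update")
msg.push_header("seq=1")
group.send(msg, directory)

print(len(a.inbox), len(b.inbox), len(c.inbox))  # 1 1 0
print(a.inbox[0] is b.inbox[0])                  # True
```

The view, not the full membership, decides who receives a message, which is why view changes matter so much in the partition discussion that follows.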


Transis

A Group Communication Subsystem


Transis: Group Communication System

● Network partitions and recovery tools.
  ● Multiple disconnected components in the network operate autonomously.
  ● These components are merged upon recovery.
● Hierarchical communication structure.
● Fast cluster communication.


Systems that depend on primary component:

● Isis System: Designates one component as primary and shuts down the non-primary components.
  ● In the period before the partition is detected, non-primaries can continue to operate.
  ● Their operations are inconsistent with the primary.
● Trans/Total System and Amoeba:
  ● Allow continued operations.
  ● Inconsistent operations may occur in different parts of the system.
  ● Don’t provide a recovery mechanism.


Group Service

● Work of the collection of group modules.
● Manager of group messages and group views
● A group module maintains
  ● Local View: list of currently connected and operational participants
  ● Hidden View: like a local view, but indicates a view that has failed locally yet may have formed in another part of the system


Network partition wishlist

1. At least one component of the network should be able to continue making updates.

2. Each machine should know about the update messages that reached all of the other machines before they were disconnected.

3. Upon recovery, only the missing messages should be exchanged to bring the machines back into a consistent state.
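Item 3 of the wishlist can be illustrated with a small sketch in which two components that were partitioned exchange only the messages the other side is missing. The message ids of the form `(sender, sequence number)`, the dictionaries used as logs, and the function `merge_components` are assumptions for illustration; this is not the Transis recovery protocol.

```python
def merge_components(log_a, log_b):
    """Return the merged log plus what each side had to receive.

    Each log maps a message id to its payload. Only ids absent from one
    side are transferred to it; messages both sides already hold stay put.
    """
    missing_in_a = {mid: log_b[mid] for mid in log_b.keys() - log_a.keys()}
    missing_in_b = {mid: log_a[mid] for mid in log_a.keys() - log_b.keys()}
    merged = {**log_a, **missing_in_a}
    return merged, missing_in_a, missing_in_b


# Both components saw ("p1", 0) and ("p2", 0) before the partition.
log_a = {("p1", 0): "x=1", ("p2", 0): "y=2", ("p1", 1): "x=3"}
log_b = {("p1", 0): "x=1", ("p2", 0): "y=2", ("p3", 0): "z=4", ("p3", 1): "z=5"}

merged, to_a, to_b = merge_components(log_a, log_b)
print(sorted(to_a))                  # [('p3', 0), ('p3', 1)]
print(sorted(to_b))                  # [('p1', 1)]
print(len(merged))                   # 5
print(len(to_a) + len(to_b))         # 3
```

Three messages cross the healed link instead of all seven log entries, which is possible only because each side knows (item 2) what the others had already received before the disconnection.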


Transis supports partition

● Not every application's progress depends on a primary component.
● In Transis, local views can be merged efficiently.
  ● A representative replays messages upon merging.
● Supports recovering a primary component.
  ● A non-primary component can remain operational and wait to merge with the primary.
  ● A non-primary component can generate a new primary if the primary is lost.
  ● Members can totally order past view-change events and recover from possible loss.
● Transis reports hidden views.
